Sahara desert
© Luca Galuzzi, via Wikimedia Commons
In a previous article I described the evidence that co-option faces insurmountable mathematical challenges in explaining the origins of such complex molecular machines as the bacterial flagellum. Part of my argument relies on the evidence that novel proteins are exceedingly difficult to evolve. In discussions with critics, several important questions were raised, which led me to examine further research studies addressing the effect of mutations on protein stability. I found that the consistent results of key studies decisively confirm the conclusion of Doug Axe that most natural proteins are too rare to evolve through an undirected search.

Accumulating Mutations

As I noted earlier, evolutionists argue that each protein comprising a flagellum resulted from the duplication of an existing gene which then continuously mutated until it stumbled upon a new flagellar function. However, research over the past several years has shown this claim to be implausible. Understanding why will require a few steps of analysis. To begin, experiments on such model proteins as β-lactamase and HisA demonstrate that their activity declines at an accelerating rate as amino-acid-changing mutations accumulate, and that the deleterious effect of each additional mutation also increases.

As a specific example, the Tokuriki and Tawfik study demonstrated the following effects of accumulating mutations:
  • After only a few random mutations (1-2) under weak selection, around a third of subsequent changes to a protein completely disable it.
  • After several more mutations accumulate (5-6), the protein is inactivated by slightly under two-thirds of subsequent changes.
  • After random alteration of less than 10 percent of the protein's initial sequence, it becomes permanently nonfunctional (fitness approaches zero).
The corresponding rarity (ratio of functional to nonfunctional amino acid sequences) can be calculated by working backward. The number of sequences that differ from an optimal one by a given number of amino acids increases almost exponentially with that number, so a random search would find a barely functional sequence long before an optimized one. Therefore, estimating the upper limit for the probability of a successful trial must focus on this neighborhood of sequence space (the map of all possible sequences).

In the study, after 5-10 mutations, roughly 2 in 3 mutations inactivate a protein. Therefore, 1 in 3 amino acids at each position on average would correspond to a functional sequence. The rarity would then be less than 1/3 to the power of the sequence length. This estimate closely matches the result from Axe's 2004 β-lactamase experiment that only 1 in 10^77 sequences corresponds to a functional protein. The actual rarity is much more extreme, since almost no sequences are functional after 10 percent of a protein randomly changes.
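The arithmetic behind this estimate can be checked with a short sketch. It assumes, as the paragraph does, that about one in three amino acids is tolerated at each position and that positions are independent. The function names are illustrative and do not come from the cited studies.

```python
import math


def log10_rarity(fraction_tolerated: float, length: int) -> float:
    """Return log10 of fraction_tolerated ** length.

    Working in log space avoids floating-point underflow for long sequences.
    """
    return length * math.log10(fraction_tolerated)


def length_for_rarity(fraction_tolerated: float, log10_target: float) -> float:
    """Return the sequence length at which fraction_tolerated ** length
    equals 10 ** log10_target."""
    return log10_target / math.log10(fraction_tolerated)


if __name__ == "__main__":
    # Length at which (1/3)^L equals 1 in 10^77 (the figure quoted in the text).
    print(round(length_for_rarity(1 / 3, -77)))
```

Under these assumptions, a sequence of roughly 160 amino acids yields a rarity near 1 in 10^77, and longer sequences yield smaller fractions still.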

Losing Structural Stability

The authors demonstrate that their results apply generally to globular proteins (e.g., enzymes and components of molecular machines). The reason is that the stability of a protein's structure decreases on average with each added mutation. As a result, increasingly small percentages of sequences in the local region of sequence space are functional. The drop in stability can be slow at first, but after a certain threshold is reached, the structure rapidly destabilizes. The corresponding region of sequence space then becomes almost entirely devoid of functional sequences. Axe came to the same conclusions previously, but the authors neglected to give him due credit.

This trend in destabilization is remarkably consistent across different types of proteins, as confirmed by both experiment and the FoldX computational algorithm. Of particular importance, the average drop in stability (change in free energy of folding) per mutation is approximately 1 kcal/mol, and the difference in free energy between the folded and unfolded protein states is typically only 3-10 kcal/mol. In addition, roughly two-thirds of mutations are destabilizing (stability drop > 1 kcal/mol) which explains the observation that the same fraction of mutations to a barely functional protein inactivate it. Moreover, the related rarity of proteins has been further confirmed by a residue-residue co-evolutionary statistical model. As a result, a few dozen random mutations would completely disable (unfold) most proteins.

This definitive evidence refutes one of the most common criticisms of Axe's protein research. Namely, critics often complain that he did not consider the possibility that the protein he studied might have performed some other function than the one for which he tested. In reality, since the functional loss is due to the loss of structural stability, all other functions dependent on a stable structure must also cease. An analogy would be a demolition crew blowing up the foundation of a building and then watching it start to collapse. They would not need video cameras in every room to know that all activities would soon come to an end.

Visualizing the Challenge

The implications of these results can be better understood with an analogy. Evolving a novel protein through random mutations in a section of DNA is analogous to a blindfolded man meandering across some terrain and randomly bumping into a dentist. The chance of finding a functional protein sequence neighboring a highly optimized natural protein could correspond in a best-case scenario to the chance of the man finding a dentist if he starts in the middle of a national dental convention. The probability for success would be relatively high.

In contrast, finding a sequence in a region near a protein with a few random mutations would be like the man finding a dentist if he started in the middle of Times Square in New York City, a more difficult challenge. A few more mutations would correspond to the man starting in the middle of Kansas, and nearly any 10 percent sequence change would correspond to the man starting in the middle of the Sahara desert. As I said in my previous article, evolving a flagellar protein from its hypothetical ancestor would be equivalent to initiating the search from a random sequence, and this challenge would correspond to starting in the Sahara desert. The task would be next to impossible.

Assessing the Number of Targets

A common response to such probabilistic analyses is that vast numbers of different proteins might serve some particular function. Therefore, finding a specific protein might be quite unlikely, but finding any one of a multitude that perform a particular task could be a tractable problem. This objection fails as a result of studies examining the distribution of proteins in all of nature, for they determined that known proteins reside within only a few hundred thousand protein families.

The difficulty of a search finding a target corresponding to a protein family can be assessed using the timescale study, described in my earlier article, by Chatterjee et al. Proteins in the same family tend to have similar structures and functions, and they typically have amino acid sequences that are at least 35 percent identical. The similarity in their DNA sequences would then calculate to roughly 50 percent identity, if the commonly redundant third nucleotide in a codon is not counted. This figure includes the 35 percent of codons (amino acids) that are the same and a quarter of the nucleotides (A, C, T, and G) in the remaining 65 percent that would match a "center sequence" purely by chance. The sequence-space region encompassing a protein family would then roughly correspond in the timescale study to a target with c=.50.
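The 50 percent figure follows from a one-line calculation, sketched here. The sketch assumes that non-matching codons agree with the center sequence at each counted nucleotide with probability one quarter, as the paragraph states. The function name and the default parameter are illustrative.

```python
def nucleotide_identity(aa_identity: float, chance_match: float = 0.25) -> float:
    """Estimate DNA identity to a center sequence.

    aa_identity: fraction of codons (amino acids) that are identical.
    chance_match: assumed probability that a nucleotide in a non-identical
        codon matches by chance (one of A, C, T, G).
    """
    return aa_identity + (1.0 - aa_identity) * chance_match


if __name__ == "__main__":
    # 35 percent amino acid identity, the threshold given in the text.
    print(round(nucleotide_identity(0.35), 4))
```

With 35 percent amino acid identity the result is a little over 0.51, which the text rounds to c=.50.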

The timescale analysis concluded that the chance of any organism finding a c=.50 target for sequences of modest length in the entire history of the earth would be less than 1 in 10^26. Therefore, a search would have any hope of success only if over a trillion trillion targets existed. This figure dramatically exceeds the number of protein families that have been identified in all of life. Moreover, protein families can be classified by their members sharing the same combination of "sequence profiles," which are related to structural domains (independent folding units in proteins). The relationship is not exact since one domain might correspond to multiple profiles and vice versa. Significantly, only around 15,000 sequence profiles have been identified. The number of families is growing rapidly with the explosion of protein sequencing, but the count of sequence profiles is increasing very slowly. As a consequence, the number of possible targets is minuscule compared to what would be required for a search to find even one. So a randomly mutating sequence would likely never even enter the neighborhood of any protein family.
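The gap between the number of targets required and the number known can be expressed in orders of magnitude. The sketch below assumes that the chance of hitting any one of several targets is at most the sum of the individual chances, so that roughly 10^26 targets would be needed for one expected success. The family count of 300,000 is only a stand-in for "a few hundred thousand," and all names are illustrative.

```python
import math

LOG10_ODDS_PER_TARGET = -26        # from the text: less than 1 in 10^26 per c=.50 target
KNOWN_SEQUENCE_PROFILES = 15_000   # from the text
ASSUMED_FAMILIES = 300_000         # assumption: stand-in for "a few hundred thousand"


def log10_shortfall(known_targets: int,
                    log10_odds: float = LOG10_ODDS_PER_TARGET) -> float:
    """Orders of magnitude by which the known target count falls short of
    the count needed for about one expected success."""
    return -log10_odds - math.log10(known_targets)


if __name__ == "__main__":
    print(round(log10_shortfall(KNOWN_SEQUENCE_PROFILES), 1))
    print(round(log10_shortfall(ASSUMED_FAMILIES), 1))
```

On these inputs the known targets fall short by more than twenty orders of magnitude, whether counted as sequence profiles or as families.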

Overly Optimistic Estimate

Actually, the previous analysis is overly optimistic since, as demonstrated, most proteins become entirely nonfunctional after less than 10 percent of their sequences randomly change. A trial would often be over 10^300 times less likely to find such a target (c=.10) than to find the protein family (c=.50) target. That number is a 1 with 300 zeros behind it, which is greater than the ratio of the volume of the entire universe to that of a single proton. As a result, the protein targets represent remote islands of functionality very diffusely scattered throughout the larger target region, which dramatically decreases the probability for success.

One caveat is that some negative mutations can be compensated for by other mutations, usually ones that increase stability. As a consequence, some sequences with more than a 10 percent difference might still be functional. However, the percentage of compensating mutations is sufficiently small (approximately 5 percent) that only narrow corridors of functionality would exist through sequence space, which would not alter the conclusion of extreme rarity.

To illustrate why, imagine a protein 500 amino acids long in which two out of the twenty amino acids at each position allow for some function. Note that two out of twenty represents a 10 percent flexibility, which is twice the percentage of mutations that increase stability. Two sequences could have no sequence similarity (differ by 100 percent) yet both function. However, the probability of finding any functional sequence would still only correspond to a 10 percent target (c=.10) in the timescale study. The fact that sequences could potentially differ dramatically does not necessarily translate to a large increase in the probability for finding the corresponding target.
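The hypothetical protein in this illustration can be put into numbers. The sketch below assumes positions are independent and uses only the figures in the paragraph (length 500, two of twenty amino acids allowed per position). The function name is illustrative.

```python
import math


def log10_counts(length: int, allowed: int, alphabet: int = 20):
    """Return log10 of (functional sequences, all sequences, functional fraction)
    when `allowed` of `alphabet` amino acids work at each position."""
    log_functional = length * math.log10(allowed)
    log_total = length * math.log10(alphabet)
    return log_functional, log_total, log_functional - log_total


if __name__ == "__main__":
    functional, total, fraction = log10_counts(500, 2)
    print(f"functional ~10^{functional:.1f}, total ~10^{total:.1f}, "
          f"fraction ~10^{fraction:.1f}")
```

The count of functional sequences is enormous (about 10^150), yet the functional fraction of sequence space is still only about 1 in 10^500.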

Calculating the Odds

Compounding the challenge, a search could never realistically find a functional sequence even if it were confined to within the neighborhood of a known protein. And natural selection could not assist until after a marginally functional protein originated since all nonfunctional sequences are equally useless.

A simple calculation will demonstrate the problem. As I mentioned, after fewer than 10 mutations in the Tokuriki and Tawfik experiment, the rarity of functional sequences approximates to 1/3 to the power of the sequence length. The HisA article reported an even faster drop in fitness, indicating even greater rarity. Several other studies demonstrated a similar rarity. Using Tokuriki and Tawfik's results, the probability of a trial finding a protein the size of that forming the flagellar filament (L=498) would equate to less than 1 in 10^200.
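This calculation can be reproduced in a few lines. The sketch takes the one-in-three tolerated fraction and the filament length L=498 from the text and assumes positions are independent. The constant and function names are illustrative.

```python
import math

FILAMENT_LENGTH = 498        # L given in the text
FRACTION_TOLERATED = 1 / 3   # from the text: about 1 in 3 changes tolerated


def log10_probability(length: int = FILAMENT_LENGTH,
                      fraction: float = FRACTION_TOLERATED) -> float:
    """Return log10 of fraction ** length, computed in log space."""
    return length * math.log10(fraction)


if __name__ == "__main__":
    print(f"log10(probability) is about {log10_probability():.1f}")
```

The exponent comes out near -237.6, so the probability is indeed below the 1 in 10^200 bound quoted above.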

This estimate represents an upper limit, for the probability of success in a region with known functional sequences is much higher than in a region chosen at random. As an analogy, one would expect to have greater luck finding a dandelion in an area immediately neighboring a dandelion patch than at a random location in North America. Therefore, the difficulty of a search finding a viable protein would be vastly greater starting at a random location within the entire protein-family neighborhood, and the chance of success would be even worse starting at a random location within all of sequence space, which would often be the case.

Understanding the Implications

The proteins in such complex molecular machines as a bacterial flagellum demonstrate a clear negative response to mutations. For instance, even a few mutations in those comprising the flagellum's basal body, hook, or filament can degrade performance or disable operation. These observations strongly confirm that a flagellar protein would be completely unusable before 10 percent of its sequence randomly changed. So the possibility of its evolving by chance is less than remote. As a consequence, the core arguments of Doug Axe, and by extension those of Michael Behe, are decisively confirmed.