Evolution requires the generation and optimization of new traits (“adaptation”) and involves the selection of mutations that improve cellular function. These mutations were assumed to arise by selection of neutral mutations present at all times in the population. Here we review recent evidence that indicates that deleterious mutations are more frequent in the population than previously recognized and these mutations play a significant role in protein evolution through continuous positive selection. Positively selected mutations include adaptive mutations, i.e. mutations that directly affect enzymatic function, and compensatory mutations, which suppress the pleiotropic effects of adaptive mutations. Compensatory mutations are by far the most frequent of the two and would allow potentially adaptive but deleterious mutations to persist long enough in the population to be positively selected during episodes of adaptation. Compensatory mutations are, by definition, context-dependent and thus constrain the paths available for evolution. This provides a mechanistic basis for the examples of highly constrained evolutionary landscapes and parallel evolution reported in natural and experimental populations. The present review article describes these recent advances in the field of protein evolution and discusses their implications for understanding the genetic basis of disease and for protein engineering in vitro.
Understanding evolution at the molecular level is a central goal in modern biology. The rates of protein sequence evolution provide information about the contribution of individual proteins to fitness, the location of functional sites within these proteins, and are relevant to understanding genetic diversity in disease. The present article reviews recent advances in the field of protein evolution and describes their implications for protein engineering and for understanding the genetic basis of disease.
Mutations occur at an approximately constant rate. They can be fixed stochastically (by random drift), eliminated by negative or “purifying” selection, or they can be positively selected. The present discussion will focus exclusively on point mutations, specifically missense mutations, and will not consider other genetic mechanisms contributing to protein evolution such as duplications/shuffling and horizontal transfer. This discussion will also focus on natural evolution, and will only consider “in vitro” evolution as a model for natural evolution. Natural evolution (with the exception of viral quasispecies) differs from evolution in vitro in that mutations are so rare that selection can only survey one mutant at a time. Therefore, in natural evolution fitness valleys limit the possible trajectories available for evolution.
Biological functions have been optimized for the conditions under which they evolved. When these conditions change, existing activities need to be fine-tuned or new traits need to arise. This process, known as “adaptation”, occurs through the selection of mutations in genes controlling biological activities. Some sort of gain of function is usually involved, which is a challenge considering that mutations occur randomly and that the sequence space is astronomically large.
The potential of biological activities or of specific genes for molecular adaptation is of critical evolutionary and biotechnological importance. This property, known as “evolvability”, is at least partially inherent to proteins. In general, evolvability increases with mutation robustness (the ability of proteins to tolerate mutations), because it allows a wider exploration of sequence space. Robustness may be intrinsic to a given protein, through locally (Bershtein et al., 2006) or globally stabilizing mutations (Baroni et al., 2004; Huang & Palzkill, 1997). It can also be provided extrinsically, through chaperones or other interacting proteins (Ellis, 2007). Promiscuity, the recognition of alternative substrates or catalysis of alternative chemical reactions, is likely a major mechanism of functional diversification because enhancing preexisting activities requires far fewer amino acid changes than generating new ones. Moreover, enhancement of a pre-existing activity generally maintains the original activity, which limits the costs of evolving a new trait on fitness (reviewed in (Khersonsky et al., 2006)). Finally, modularity, i.e. the presence of functionally independent motifs, also contributes to increased evolvability. Given that evolvability appears to be an intrinsic property of proteins, it could be subject to selective pressures. Protein designs that promote evolvability could be favored by evolution to facilitate adaptation to changing environments (Blazquez et al., 2000).
Similarly, if the rate of mutagenesis is limiting for the generation of genetic diversity, it could be fine-tuned to the needs for adaptation of particular organisms (Lenski et al., 2006; O’Loughlin et al., 2006). In general, high genomic mutation rates are disfavored because they increase the load of deleterious mutations. In asexual populations, where linkage disequilibrium is strong, alleles conferring increased mutagenesis may be selected because of their increased probability of generating beneficial mutations. Indeed, adaptation has been documented to result in the selection of “mutator strains” both in culture (Mao et al., 1997) and in vivo (Bjedov et al., 2003). This selection, however, is very inefficient, as it operates at the population level rather than at the individual level. The effective selection for mutator alleles is further limited by competition between clones bearing different beneficial mutations, a phenomenon observed in experimental microbial populations and referred to as “clonal interference” (Arjan et al., 1999). Thus, evolvability may be a selectable trait in asexual populations in situations in which beneficial mutations are extremely infrequent: in the presence of bottlenecks (Levin et al., 2000), or when a population is initially well adapted (Arjan et al., 1999). Alternatively, evolvability may be simply incidental, except in situations that increase the ratio of beneficial versus deleterious mutations such as antigenic variation in loci under intense immune pressure (Sniegowski & Murphy, 2006).
For a long time, the general assumption has been that adaptive mutations are selected from a pool of neutral or nearly neutral mutations, i.e. mutations with little or no effect on fitness. The minimal effect of these mutations on the fitness of the organism would allow sequence space to drift enough to alter protein function so its performance can be adjusted to the demands of evolution. This view has been recently challenged by three different approaches: biophysical evidence that most missense mutations should affect protein function, phylogenetic evidence that the dispensability of proteins has little effect on rates of protein evolution, and the presence of signatures indicative of positive selection in the amino acid sequences of diverse genomes.
Proteins appear to have little thermodynamic stability. Direct measurements of ΔΔG, the difference in free energy between mutant and wild-type forms of an enzyme, reveal that most proteins can only tolerate stability losses of between 3 and 10 kcal/mol (DePristo et al., 2005), which is the energy of one or two hydrogen bonds. Too much stability may also decrease activity, as proteins “breathe” and require mechanical flexibility for catalysis (James & Tawfik, 2003). Such a tradeoff between stability and activity has been demonstrated in in vitro studies on the evolution of thermostability (Akanuma et al., 1998) and drug resistance (Wang et al., 2002) and may well be a more general phenomenon (DePristo et al., 2005). Given that the effect of amino acid substitutions on ΔG is on the order of 0.5 to 5 kcal/mol, a significant fraction of missense mutations would be expected to affect enzyme function (DePristo et al., 2005). Screening and selection from libraries with random mutations provides a direct experimental approach to determine the probability that a random amino acid substitution inactivates the enzymatic activity of a given protein. This approach, however, is not sensitive to moderate or subtle effects on activity, and therefore these deleterious mutation estimates represent only a lower limit. Studies of three different proteins established that approximately one-third of random mutations in proteins have severe deleterious effects on their function (>90% loss of activity). Two of these are monomers: human 3-methyladenine DNA glycosylase (AAG, 298 amino acids long) (Guo et al., 2004a), and the 430 amino acids of the E. coli Pol I Klenow fragment (Loh et al., 2007b). The third one is the E. coli lac repressor, a tetramer of four 360 amino acid polypeptides.
The fact that these three studies independently estimated a probability of inactivation of ~33% suggests that this may be an intrinsic property of individual proteins based on general principles of protein folding and solubility. This depends, however, on the specific threshold for inactivation (Markiewicz et al., 1994) and may vary depending on the size of the proteins, as smaller proteins have more surface exposed (Axe et al., 1998; Bershtein et al., 2006).
Chaperones are proteins that assist protein folding and thus would be expected to suppress the deleterious effects of destabilizing mutations. Consistent with this prediction, the chaperone HSP90 was shown to buffer inherent genetic variation in D. melanogaster (Rennell et al., 1991), and A. thaliana (Rutherford & Lindquist, 1998). This means that a significant number of variants present in a population are nearly neutral under basal conditions but only with folding assistance. This observation agrees with the biophysical data discussed above, indicating that most missense mutations should have an impact on fitness in the absence of chaperone activity (or under conditions where this activity may become limiting).
Dispensability has only a marginal effect on rates of protein evolution in S. cerevisiae (Queitsch et al., 2002) and in rodents (Wall et al., 2005). Pleiotropy or “fitness density”, i.e. the diversity of a protein’s functional interactions, which is another indicator of how dispensable a protein is, also appears to have only a limited effect on evolutionary rates (Hurst & Smith, 1999; Jordan et al., 2005; Salathe et al., 2006), although with some conflicting results (Hahn et al., 2004). These results are at odds with the “near neutrality” scenario, which predicts that, by setting the rate of purifying selection, the dispensability of proteins should determine how fast they evolve.
Particularly surprising is the apparent lack of correlation between the number of functional interactions of a given protein and its rate of evolution. Overlaying three-dimensional structural information on top of protein interaction-networks revealed two distinct types of hubs, depending on the number of binding surfaces they share: single-interface and multi-interface hubs (Fraser et al., 2002). Single-interface hubs are the more transient of the two, due to competition for ligands. Flexibility is also built in the linear motifs mediating these interactions, typically 3 to 10 amino acids in length, of which only 2 or 3 are critical for function. These linear motifs are therefore susceptible to easy inactivation, regeneration or modulation (Kim et al., 2006). The transient nature and plasticity of these protein-protein interactions would allow selective pressures to operate on integrated functional complexes rather than on exact binary interactions. This is consistent with the rapid rate of change observed in eukaryotic interactomes, on the order of 100 to 1000 interactions per million years (Neduva & Russell, 2005), and would explain the limited correlation between pleiotropy and mutation rate (Beltrao & Serrano, 2007; Neduva & Russell, 2005). Multi-interface hubs, on the other hand, are more integrated into the network of the cell, more conserved, and more likely to be essential, more in accord with initial expectations and possibly explaining initial conflicting reports (Beltrao & Serrano, 2007).
Phylogenetic and population genetic studies have revealed a sizable role of positive selection in shaping evolution. In phylogenetic studies, an increase in the ratio of non-synonymous to synonymous substitutions (dN/dS) is indicative of positive selection, although this test has very low sensitivity and doesn’t take into account possible contributions of negative selection. The most sensitive tests quantify amino acid divergence by normalizing the ratio of non-synonymous to synonymous mutations within a given species to the dN/dS ratio compared to a different but closely related species. An excess of amino acid divergence (suggestive of positive selection) was found comparing D. melanogaster and D. simulans, comparing different D. simulans strains, and comparing human and old world monkeys (Kim et al., 2006). In D. melanogaster, this increase was limited to only a fraction of the genes, ruling out selective constraint. Using a more sophisticated variation of this test, Smith et al. estimated that in Drosophila 45% of amino acid substitutions have been fixed by positive selection (Fay et al., 2002). A different, highly sensitive test for positive selection relies on comparing changes in codons involving more than one position between closely related species. The increased presence of certain types of non-synonymous codons in the same lineage is evidence of positive selection. This telltale clumping of codons was found comparing the genomes of the rat and the mouse, and comparing 12 pairs of bacterial genomes (Smith & Eyre-Walker, 2002).
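The normalization described above can be made concrete with a short sketch. It assumes the commonly used McDonald-Kreitman-type estimator, alpha = 1 - (Ds x Pn) / (Dn x Ps), where Dn and Ds are non-synonymous and synonymous differences between species and Pn and Ps are non-synonymous and synonymous polymorphisms within a species. The function name and the counts below are hypothetical and are not taken from any of the studies cited here.

```python
def alpha_mk(dn, ds, pn, ps):
    """Estimate the fraction of amino acid substitutions fixed by positive selection.

    dn, ds: non-synonymous / synonymous differences between two species (divergence)
    pn, ps: non-synonymous / synonymous polymorphisms within one species
    Assumed estimator (not spelled out in the text): alpha = 1 - (ds * pn) / (dn * ps)
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive to form the ratio")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, chosen only to illustrate the arithmetic.
alpha = alpha_mk(dn=60, ds=120, pn=20, ps=80)
print(f"estimated fraction fixed by positive selection: {alpha:.2f}")
```

With these made-up counts the between-species dN/dS (0.5) is twice the within-species ratio (0.25), so half of the amino acid divergence is attributed to positive selection. A value near zero would be compatible with neutrality, and a negative value would point to segregating deleterious polymorphisms.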
In sum, biophysical, phylogenetic and whole-genomic analyses point to a critical role of positive selection in shaping protein evolution. Positively-selected mutations, however, are not necessarily adaptive, i.e. do not necessarily improve protein function. They can maintain function in response to challenges, upholding the status quo. Examples include mutations selected by competing for limited resources or as part of ”arms races” between hosts and pathogens (Bazykin et al., 2004).
Compensatory mutations would also fall into the category of “non-adaptive mutations” that are subject to positive selection. Adaptive mutations frequently have deleterious, pleiotropic effects, and these are suppressed by compensatory mutations. E. coli DNA polymerase I (Pol I) serves as an example to illustrate the pleiotropic effects of adaptive mutations and the need for compensatory mutations. Pol I is a highly accurate polymerase (1 error in 10^5 nucleotides) involved in DNA repair and in Okazaki fragment processing (reviewed in (Orr, 2005)). The fidelity of a panel of Pol I active mutants and their level of activity was determined. Fig. 1 shows that mutations in Pol I frequently increase or decrease polymerase fidelity relative to the wild type, and also that this change in fidelity comes at a cost in overall activity ((Camps & Loeb, 2005), with permission). Thus, substantially changing the level of fidelity of Pol I would be expected to have pleiotropic effects, calling for compensatory mutations to restore the level of activity.
A remarkable convergence of directed evolution, population genetics, experimental adaptation, and whole genome comparison studies suggests that compensatory mutations may constitute the bulk of positively-selected mutations. This proposition rests on at least 4 lines of argument:
In genetically tractable organisms, mutations that modify the phenotypic effects of a given mutation (known as “suppressor mutations”) have been used to reveal functional interactions both within and between proteins.
A significant fraction of suppressor mutations occurs between interacting residues. The presence of one mutation was found to significantly increase the chances of another mutation at a structurally-interacting site. In the case of ionic interactions, the increase was almost 4-fold (Loh et al., 2007b). This is strong evidence of a role of compensatory mutations in shaping protein evolution.
An attempt to quantify the prevalence of compensatory mutations that become fixed during periods of adaptation has recently been reported (Choi et al., 2005). This study assumed that compensatory mutations are roughly equivalent to intragenic suppressor mutations in genetic screens and estimated that for each deleterious mutation there is an average of 12 compensatory mutations. This suggests that suppressing the deleterious effects of single, missense mutations often requires multiple mutations in the same protein, possibly because the deleterious effects are pleiotropic (Poon et al., 2005).
Mutations that are pathogenic in one species are often fixed in another, suggesting that the deleterious effects of missense mutations have been compensated by other mutations (DePristo et al., 2005; Kulathinal et al., 2004; Weinreich & Chao, 2005; see also the discussion on sign epistasis below). This phenomenon, known as Compensated Pathogenic Deviation, appears to be widespread. For example, Kondrashov et al. compared 32 mammalian proteins with amino acid sites producing pathogenic deviation in humans. Of these pathogenic mutations, 10% are present in at least one non-human mammal. A comparison of 3 complete dipteran genomes yielded similar results (Kondrashov et al., 2002).
Compensation needs to occur shortly after the deleterious mutation appears if it is to succeed in retaining the deleterious mutation in the population. Indeed, different lines of evidence show that compensatory mutations become fixed rapidly, creating signature “mutation bursts” in the sequence.
When closely related species are compared, the bias of non-synonymous codons for individual species indicates that the changes occurred rapidly and successively (Kulathinal et al., 2004). Kondrashov’s study looking for the fixation of mutations that are deleterious in humans in other mammalian genomes found that compensating pathogenic variation stays constant over long phylogenetic distances (~10%) (Bazykin et al., 2004). Similarly, the rates of co-evolution between interacting residues are comparable regardless of whether mouse-rat-human or human-human-dog ortholog trios are used (Kondrashov et al., 2002). Thus, three different tests of positive selection independently found that positively-selected mutations occurred within a much shorter time frame than the evolutionary time separating these species.
Speciation involves extensive adaptation. Pagel et al. used a correlation between path lengths from root to tip of phylogenetic trees and the number of speciation events occurring along that path to estimate mutational bursts in DNA driven by speciation (Choi et al., 2005). They found that ~22% of substitutional changes fall into this category. While there is no direct evidence that these bursts correspond to compensatory selection, the timing (during speciation) and fast rate of fixation (bursts) suggest that they may.
Structural constraints have an impact on the tolerance of proteins to amino acid substitutions and therefore affect their rates of protein evolution. Several studies have shown that principles of protein folding and solubility constrain protein evolution in similar manners. In general, residues located on the surface of globular proteins are more tolerant of amino acid substitutions (Loh et al., 2007b; Pagel et al., 2006; Suckow et al., 1996). The hydrophobic protein core tends to be less tolerant to mutations due to the need for stable packing of atoms, to the limited availability of stabilizing bonds, and to the increased likelihood that these residues are involved in catalysis. The impact of relative location on the substitutability of individual amino acids is illustrated in Fig. 2 and Table 1. Fig. 2 shows the substitutability profile of the E. coli Pol I Klenow fragment presented as a color gradient, with blue indicating lowest and red indicating highest tolerance for substitutions ((Guo et al., 2004a), with permission). The most highly substitutable residues locate predominantly on the surface of the protein. Table 1 presents the average substitutability indexes for different structural motifs of Pol I and also of the enzyme 3-methyladenine DNA glycosylase (AAG). Note that in the case of Pol I, surface amino acids are almost twice as tolerant to amino acid substitutions as internal residues. Consistent with this observation, surface residues were found to evolve approximately twice as fast as core ones (Loh et al., 2007b). Secondary structure also affects mutation tolerance and the rate of evolution. β-strands tend to have low tolerance for amino acid substitutions (substitutability of 0.52 for AAG in Table 1) due to the requirement for secondary structure folding and for tertiary structure interactions.
Pol I appears to be an exception (β-strand substitutability of 0.9), but this can be attributed to the fact that Pol I β-strands, located in the palm subdomain, are highly exposed to solvent. The substitutability of mobile regions of the protein, such as loops and turns, depends largely on whether catalytically active residues are present or whether they adopt a specific secondary or tertiary structure. Both studies found significant disparities between evolutionary conservation and mean substitutability (Goldman et al., 1998; Loh et al., 2007a), presumably because artificial selection in culture of proteins expressed from multicopy plasmids doesn’t accurately recapitulate the stringencies of natural selection (Guo et al., 2004a; Guo et al., 2004b).
The first indication that protein stability may be a critical variable limiting the repertoire of active mutants came from studies of protein inactivation. In these studies, misfolding was found to be the main cause of protein inactivation rather than loss of catalytic activity (Loh et al., 2007b; Pakula et al., 1986). The role of thermodynamic stability in limiting mutation tolerance has been confirmed in later studies using thermostable proteins and modeling in silico. For example, the AroQ chorismate mutase from the thermophile Methanococcus jannaschii tolerates ~10-fold more mutations relative to its E. coli counterpart (Loeb et al., 1989). Parisi et al. modeled protein evolution under stability constraints (Besenmatter et al., 2007). At each step, this sequential in silico model introduced mutations and selected against structural perturbation. Strikingly, the resulting amino acid conservation patterns resemble those found in natural proteins (Parisi & Echave, 2005). The implication of these studies is that thermodynamic stability requirements may severely restrict the evolvability of a given protein. Direct, experimental proof of this concept has been provided by Bloom et al. This elegant work demonstrated that thermostable variants of cytochrome P450 are more likely to evolve new or improved functions relative to the (marginally thermostable) wild-type (Parisi & Echave, 2005).
Overall, these four different lines of evidence suggest that positive selection as a force driving evolution had previously been underestimated. Further, while mutations that generate new traits during adaptation (adaptive mutations) are the ones driving positive selection, adaptation involves a large number of concomitant compensatory mutations. These compensatory mutations appear to have been positively selected to suppress pleiotropic deleterious effects of adaptive mutations. The deleterious effects of adaptive mutations likely stem from the narrow window of thermodynamic stability of proteins, and the non-specific, pleiotropic nature of the effects frequently calls for multiple suppressor mutations, often within the same gene product.
The effects of individual mutations may depend on the genetic context in which they occur. This phenomenon, known as epistasis, is illustrated in Table 2. This table presents several mutants of E. coli β-lactamase and the levels of resistance they confer to aztreonam, a monobactam antibiotic that is not the preferred substrate for β-lactamase. These mutants have different combinations of the following three mutations: E104K, R164H, and G267R. Mutations E104K and R164H by themselves increase resistance to aztreonam by ~2.5 fold. In combination, though, their effect on resistance is 40-fold. The third mutation, G267R has no effect on its own or in the presence of E104K or R164H, but it doubles the level of aztreonam resistance in the presence of both E104K and R164H (Bloom et al., 2006). In this case all three mutations have epistatic effects because their effect on aztreonam resistance varies depending on the presence of the other two mutations.
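The size of the interaction between E104K and R164H can be worked out from the fold changes quoted above. The sketch below assumes a multiplicative null model (independent mutations multiply their fold effects); that choice of null model and the function name are assumptions made for illustration, not something stated in the cited study.

```python
import math

# Approximate fold increases in aztreonam resistance quoted in the text.
FOLD_E104K = 2.5
FOLD_R164H = 2.5
FOLD_DOUBLE = 40.0


def epistasis_log2(fold_a, fold_b, fold_ab):
    """log2 deviation of the double mutant from the multiplicative expectation.

    Zero means no epistasis under the assumed multiplicative null model;
    positive values mean the double mutant does better than expected.
    """
    expected = fold_a * fold_b
    return math.log2(fold_ab / expected)


expected_double = FOLD_E104K * FOLD_R164H   # 6.25-fold if the effects were independent
excess = FOLD_DOUBLE / expected_double      # 6.4-fold above that expectation
eps = epistasis_log2(FOLD_E104K, FOLD_R164H, FOLD_DOUBLE)
print(f"expected {expected_double}-fold, observed {FOLD_DOUBLE}-fold, log2 epistasis {eps:.2f}")
```

Under this assumption the double mutant is about 6.4 times more resistant than the product of the single-mutant effects would predict, which is what makes the pair a clear case of positive epistasis.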
There are two types of epistasis: magnitude epistasis and sign epistasis. In magnitude epistasis, the magnitude of the effect of individual mutations on fitness varies depending on the genetic background, but the effect always goes in the same direction. In the example presented above, E104K and R164H would fall into this category, as they always have a positive effect on aztreonam resistance. In sign epistasis, not only the magnitude but also the sign of the effect (i.e. positive, negative or neutral) changes depending on the genetic context (Camps et al., 2003). G267R in the example above illustrates this form of epistasis, as its effect is neutral, slightly negative, or positive depending on the presence of E104K and R164H. Sign epistasis limits the number of mutational trajectories available to selection because some paths to an optimum contain fitness decreases (Weinreich & Chao, 2005). This was shown experimentally in the model enzyme β-lactamase. This study investigated all possible mutational pathways leading to five point mutations controlling resistance to cefotaxime. Strikingly, of the 120 possible direct mutational trajectories linking these alleles, only 18 were found to be accessible to selection (Weinreich & Chao, 2005). These constraints on the mutational trajectories matched the structure of sign epistasis of the five mutations.
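The figure of 120 trajectories is simply the number of orders in which five mutations can be acquired one at a time (5! = 120). The sketch below enumerates those orders on two toy landscapes and counts the ones in which every step increases fitness. Both landscapes are hypothetical and are not the measured β-lactamase data: in the second, site 0 stands in for a compensatory mutation that is slightly deleterious unless site 1 is already present.

```python
from itertools import permutations

N_SITES = 5  # five point mutations, as in the study described above


def additive_fitness(genotype):
    """Toy landscape without epistasis: every mutation adds one unit of fitness."""
    return float(len(genotype))


def sign_epistatic_fitness(genotype):
    """Toy landscape with sign epistasis (hypothetical values).

    Sites 1-4 each add one unit. Site 0 adds one unit only when site 1
    is present; on its own it costs half a unit.
    """
    fitness = float(len(genotype - {0}))
    if 0 in genotype:
        fitness += 1.0 if 1 in genotype else -0.5
    return fitness


def count_accessible(fitness, n_sites=N_SITES):
    """Return (total orders, orders in which every single step raises fitness)."""
    total = 0
    accessible = 0
    for order in permutations(range(n_sites)):
        total += 1
        genotype = frozenset()
        ok = True
        for site in order:
            nxt = genotype | {site}
            if fitness(nxt) <= fitness(genotype):
                ok = False  # a fitness valley: selection cannot follow this order
                break
            genotype = nxt
        accessible += ok
    return total, accessible


print(count_accessible(additive_fitness))        # (120, 120)
print(count_accessible(sign_epistatic_fitness))  # (120, 60)
```

A single sign-epistatic pair already removes half of the orders in this toy case, because site 0 can only be gained after site 1. The much sharper restriction reported experimentally (18 of 120) would require several such dependencies acting together.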
In this study, the mechanistic basis of sign epistasis was traced back to one compensatory mutation, M182T. Alone, this mutation modestly reduced cefotaxime hydrolysis. M182T, however, suppressed the reduced thermodynamic stability associated with G238S, the mutation increasing cefotaxime hydrolysis. Thus, M182T has a dramatically different effect on cefotaxime resistance depending on the presence or absence of G238S. As illustrated by the M182T mutation, compensatory mutations are expected to exhibit frequent sign epistasis because they are selected to suppress effects of other mutations, which makes them context-dependent.
Sign epistasis associated with compensatory mutations arising during periods of adaptation has the following two implications: a) It “locks in” deleterious mutations, precluding reversion to wild-type sequence; b) It constrains possible trajectories to an optimum, dramatically limiting genetic diversity resulting from adaptation.
The presence of compensatory mutations rapidly creates selective valleys that preclude reversion to the ancestral, wild-type sequence. In phage, fixation of compensatory mutations was estimated to be twice as likely as reversion to wild-type sequence (Weinreich et al., 2006). In experimental microbial cultures, mutants selected under drug pressure often exhibit reduced fitness. Examples include HIV resistance to protease inhibitors (Poon & Chao, 2005), streptomycin resistance in E. coli or Salmonella (Borman et al., 1996; Schrag et al., 1997), isoniazid or rifampicin resistance in mycobacteria (Maisnier-Patin et al., 2002), fucidin resistance in Staphylococcus aureus (Gillespie, 2001), and resistance to fluconazole in Saccharomyces cerevisiae (Nagaev et al., 2001). In all these cases, growth in the absence of selective pressure resulted in a partial compensation of the fitness defect but not in reversion, indicating the presence of additional (presumably compensatory) mutations that created an adaptive valley before the original mutation had a chance to revert.
As discussed above, sign epistasis associated with compensatory mutations severely limits the number of evolutionary pathways available for selection. This has two consequences: it restricts the diversity of mutants coming out of positive selections and it increases the reproducibility of adaptation, at least under identical conditions and with a large population.
Selections for drug resistance typically yield a very limited number of mutants. For example, only 8 extended-spectrum mutants of β-lactamase (out of more than 90 known mutants from clinical isolates) were obtained in a selection in vitro under conditions resembling natural selection, and many of them shared mutations (Anderson et al., 2003). Another example of limited allelic representation following positive selection is an experiment replacing the thermostable adenylate kinase of Geobacillus stearothermophilus (a thermophilic organism) with adenylate kinase from Bacillus subtilis (a mesophile) to monitor adaptation to growth at high temperature at the level of a single gene. Only 6 alleles exhibiting increased thermal stability were observed, representing less than 1% of the total possible (Barlow & Hall, 2003). In both cases, the observed limited allelic representation likely reflects restrictions in the pathways available for selection, as each allele involves more than one mutation and a much larger number of mutants producing the desired effects is known. Multiple selective pressures, simultaneous or sequential, further restrict the outcome of positive selections. In the case of β-lactamase, this scenario would arise with alternating exposure to different β-lactam antibiotics in the clinic. Selections using a single antibiotic typically result in the isolation of resistant mutants that are different from those isolated in the clinic (Counago et al., 2006; Orencia et al., 2001). Exposing a culture of E. coli to amoxicillin and ceftazidime, however, resulted in the isolation of only naturally occurring β-lactamase mutants (Blazquez et al., 2000), suggesting that “fluctuating selection” likely contributed to restricting the allelic repertoire observed in clinical isolates. Fluctuating selection should favor “generalist” mutations, i.e. mutations that increase resistance to multiple antibiotics.
More intriguingly, it would also select mutations with strong positive epistatic effects for resistance to one of the antibiotics, even if their effect on resistance to the other antibiotics that bacteria are regularly exposed to is neutral or detrimental (Blazquez et al., 2000). This could be an example of a protein whose evolvability is shaped by selective pressures.
Parallel evolution, i.e. the independent occurrence of the same substitution in two independent lineages, has been observed in natural and experimental populations of insects, bacteria and phage. It is commonly seen in selections for drug resistance under conditions that mimic natural evolution (Anderson et al., 2003; Blazquez et al., 1998). Similarly, during adaptation of adenylate kinase to thermostability, the same three major mutants were observed in two independent experimental runs (Barlow & Hall, 2003). However, the most striking examples of parallel evolution come from adaptation studies in two closely related phages of E. coli, ΦX174 and S13. Adaptation of ΦX174 to higher temperature and to a new host (Salmonella) resulted in 50% identical changes between independent lineages (Counago et al., 2006) and long-term adaptation to culture in the laboratory resulted in 40% identical independent changes (Wichman et al., 1999). In contrast, no such reproducibility has been observed in more complex situations such as the adaptation of E. coli to growth in liquid culture (Wichman et al., 2005) or to growth in glycerol (Woods et al., 2006), or in the development of human cancer (Herring et al., 2006). This lack of reproducibility is likely due to the plasticity built into networks of functional interactions.
A recent convergence of directed evolution, population genetics, genomics analysis and experimental adaptation experiments has challenged traditional notions about protein evolution. Positive selection has been recognized as an important driver of evolution. A large portion of positively-selected mutations appears to be compensatory, suppressing deleterious pleiotropic effects of adaptive mutations.
The realization that a significant fraction of adaptive mutations have detrimental, pleiotropic effects underscores the role of deleterious mutations in the generation of novel traits during evolution. While neutral or nearly-neutral mutations should have a long lifespan within a given population, they should rarely generate novel properties because of smaller phenotypic effects. In other words, neutral mutations are neutral because they have little effect on function (Thomas et al., 2007). On the other hand, pleiotropic mutations, while they may be short-lived in the population because of purifying selection, are more likely to modify protein function. The relative contributions to evolution of nearly neutral mutations versus pleiotropic ones may thus depend on a trade-off between the lifespan of missense mutations in the population and their functional impact. Compensatory mutations would play a key role in tipping the balance toward pleiotropic mutations by allowing them to persist longer in the population.
This emerging picture of protein evolution is summarized in Fig. 3 and Table 3. Mutations that generate novel traits (adaptive mutations) often have pleiotropic effects. Deleterious effects of adaptive mutations are subsequently suppressed through the fixation of compensatory mutations. Typically, suppression occurs within the time frame of adaptation and often involves multiple compensatory mutations. These compensatory mutations facilitate adaptation to novel environments by precluding reversion to wild-type once selective pressures are removed and by prolonging the lifespan of adaptive mutations in the population. The telltale signs of neutral, positive and negative selection, and the implications of each type of selection history for protein evolution are summarized in Table 3.
These new concepts in protein evolution have far-reaching implications:
Rapid progress is being made in the area of protein evolution. A clearer picture of protein evolution that integrates observations from a variety of biological disciplines should emerge in the not too distant future. It would be extremely useful to find ways to distinguish between adaptive and compensatory mutations. This may be based on biophysical predictions, on the presence of some sort of sequence signature, or on a combination of biophysical and phylogenetic methods. We also look forward to extension of what we have learned at the level of single proteins and interactomes to higher levels of cellular organization.
The authors would like to thank Drs. David Haussler (UC Santa Cruz, CA), Ann Blank (University of Washington, WA), and Eddie Fox (St. Vincent’s University Hospital, Dublin, Ireland) for critical reading of the manuscript and for useful comments, and Cole Bower for help with the manuscript. Support was obtained from the National Institutes of Health grants K08 CA116429 (MC), CA102029 (LAL), CA788885 (LAL), and from a fellowship from the Cora May Poncin Foundation (EL).