Understanding the mechanisms of evolution requires information on the rate of appearance of new mutations and their effects at the molecular and phenotypic levels. Although procuring such data has been technically challenging, high-throughput genome sequencing is rapidly expanding knowledge in this area. With information on spontaneous mutations now available in a variety of organisms, general patterns have emerged for the scaling of the mutation rate with genome size and for the likely mechanisms driving this pattern. Support is presented for the hypothesis that natural selection pushes mutation rates down to a lower limit set by the power of random genetic drift rather than by intrinsic physiological limitations, and that this has resulted in reduced levels of replication, transcription, and translation fidelity in eukaryotes relative to prokaryotes.
Because mutation is the ultimate source of all variation, both adaptive and deleterious, a mechanistic understanding of the evolutionary process will be incomplete until a detailed account has been made of the rate of origin, molecular nature, and phenotypic consequences of spontaneous alterations for a diversity of organisms. Owing to the extreme rarity of mutational events and their frequent elimination by selection in natural environments, most prior insights into the molecular aspects of mutation have been derived from a few reporter constructs in a handful of model species (Drake 2006). This situation is now rapidly changing as the application of high-throughput genome sequencing to mutation accumulation experiments allows the identification of de novo mutations in an essentially unbiased manner.
At least two broad generalizations now seem possible. First, there is a dramatic reversal in the directional relationship between the mutation rate and genome size from viruses to cellular microbes to multicellular species, with prokaryotes having higher levels of fidelity than eukaryotes at the levels of replication, transcription, and translation. Second, in multicellular species, somatic mutation rates are notably higher than germline rates, whereas on a cell division basis the latter are not much different from rates observed in unicellular species. With these observations in hand, we are now in a better position to understand the causes and consequences of mutation rate evolution in various phylogenetic lineages.
In one of the first attempts to understand the patterning of mutation rates across various organisms, Drake (1991) concluded that the mutation rate/nucleotide site/generation (u) scales inversely with genome size (G) in DNA-based microbes, which further implies that the mutation rate/genome/generation (uG) is essentially constant across all microbial life. Because this early analysis was based on just seven taxa, four of which were bacteriophage, there was room for skepticism over the initial findings, but additional mutation rate assays performed in recent years have allowed for a substantial extension of this previous analysis. Although most microbial mutation rate estimates still rely on single reporter constructs, the approaches advocated by Drake (1991) can be used to translate per locus rates to a per nucleotide site scale (Supplemental Material). The focus here will be on base substitution mutations alone, as considerably less work has been done on insertions and deletions.
For double-stranded DNA viruses and prokaryotes, strong support for Drake’s conjecture remains (Figure 1a), with the mutation rate/site/generation scaling with the −1.1 power of total genome size, although an obvious remaining concern is that the pattern is largely dependent on the inclusion of bacteriophage genomes. Mutation rates for RNA viruses are greater than those for double-stranded DNA genomes of comparable size, but the negative scaling is retained (although this is entirely dependent on a single data point). For the prokaryotic genomes for which data are available, the sampling error of the mutation rate is so large and the range in variation for genome size is so small that only a weak scaling of u with G can be discerned. To more clearly resolve this matter, whole genome mutation accumulation assays, like those now available for several eukaryotes (Lynch et al. 2008; Denver et al. 2009; Keightley et al. 2009; Ossowski et al. 2010), are desirable for a range of prokaryotes, particularly those with extreme genome sizes.
In striking contrast to the preceding pattern, when attention is confined to cellular species, mutation rates scale positively with genome size, with vertebrates having nearly 100× higher per generation rates than prokaryotes, and with the rates for unicellular eukaryotes, invertebrates, and land plants being intermediate (Figure 1b). Most of the eukaryotic estimates are based on surveys of substantial genomic regions (including complete genome sequences in four cases), which greatly reduces the sampling variance associated with locus-specific peculiarities, and a scaling of u with the ~2/3 power of genome size has very strong statistical support (Figure 1b). Note that on a per generation basis, the average mammalian mutation rate is nearly equal to that per replication in double-stranded DNA viruses.
Drake (1991) suggested that the constant rate of total genomic mutation in microbes “is likely to be determined by deep general forces, perhaps by a balance between the usually deleterious effects of mutation and the physiological costs of further reducing mutation rates.” The implicit assumption here is that organisms under strong selection for rapid replication cannot maximize the fidelity of DNA replication without limiting the rate of DNA synthesis necessary for daughter cell production. This general idea has been promoted broadly (Kimura 1967; Kondrashov 1995; Dawson 1998, 1999; Drake et al. 1998; Sniegowski et al. 2000; André and Godelle 2006; Baer et al. 2007), and although it has not been the subject of empirical investigation, it is known that replication fidelity in microbial systems can be improved (Quiñones and Piechocki 1985; Loh et al. 2010).
If the cost-of-repair hypothesis is correct, then we would infer a higher cost of replication in multicellular species (where mutation rates are high) than in prokaryotes. However, the time necessary for the replication of large eukaryotic genomes is compensated by the population of chromosomes with multiple origins of replication (in contrast to the single origin in most bacterial chromosomes). Moreover, as will be discussed below, the burden of somatic mutations imposes a downward selective pressure on mutation rates in multicellular species which is not shared by unicellular species. Thus, an alternative explanation must be sought for the elevated rates of mutation in eukaryotes.
One possibility is that the lower bound on the mutation rate is not set by physiological or biochemical limitations, but by the intrinsic inability of selection to push the rate any lower. The power of random genetic drift (1/(2Ne) for diploid organisms, where Ne is the genetic effective population size) ultimately constrains what natural selection can accomplish with any trait, and once the mutation rate is pushed to such a low level that any further incremental improvement conveys a fitness advantage smaller than the power of drift, selection will be incapable of reducing the rate any further. Therefore, a key to understanding mutation rate evolution is determining the degree to which evolved mutation rates approach the barriers imposed by drift.
By producing a correlated genetic load through the recurrent influx of deleterious mutations at linked and unlinked sites, even the weakest of mutator alleles suffer an indirect selective disadvantage associated with the excess mutational burden contained within the genomes of carrier individuals (Kimura 1967; Kondrashov 1995; Dawson 1999; Lynch 2008). This disadvantage can be quite small, however, having a maximum value equal to twice the product of the average deleterious effect of a heterozygous mutation (sd) and the diploid genome-wide increase in the deleterious mutation rate (ΔU, where U is in the range of 0.01 to 1.0 per generation for multicellular eukaryotes, Lynch and Walsh 1998; Baer et al. 2007; and perhaps an order of magnitude lower in yeast, Wloch et al. 2001; Joseph and Hall 2004). The factor of two arises because most induced mutations arise on chromosomes unlinked to the mutator, retaining an association with the latter for an average of just two generations.
Two factors will reduce the selective advantage of an antimutator allele below 2sdΔU, where ΔU is now the genome-wide reduction in the deleterious mutation rate. First, the full long-term advantage of an antimutator is not realized until it has reached selection-mutation balance with respect to its reduced mutation load (Johnson 1999), thus making it more difficult for selection to initially promote such an allele towards fixation. Second, if there is any “cost of replication” associated with the antimutator (sr), the maximum selective advantage becomes 2sdΔU - sr. Thus, because allelic variants with selection coefficients much smaller than the power of random genetic drift evolve in an effectively neutral manner (Kimura 1983), an antimutator allele will be insensitive to selection unless the change in the genome-wide deleterious mutation rate is considerably greater than [1/(2Ne) + sr]/(2sd). Assuming that sd and sr are independent of Ne, this suggests that the mutation rate should scale negatively with Ne up to the point where U is so low that further incremental reductions cannot overcome the drift barrier.
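The drift-barrier threshold just described can be sketched numerically. The function below rearranges the condition 2sdΔU − sr > 1/(2Ne) to give the smallest ΔU at which an antimutator's advantage matches the power of drift. The function name and the input values are illustrative assumptions, not estimates taken from any particular species.

```python
def min_selectable_delta_u(ne, sd, sr=0.0):
    """Return the reduction in the genome-wide deleterious mutation rate
    (delta U) at which the antimutator advantage, 2*sd*dU - sr, equals the
    power of drift, 1/(2*Ne). Selection is effective only well above this.

    ne: effective population size (diploid)
    sd: average heterozygous effect of a deleterious mutation
    sr: cost of replication associated with the antimutator (0 if none)
    """
    if ne <= 0 or sd <= 0:
        raise ValueError("ne and sd must be positive")
    return (1.0 / (2.0 * ne) + sr) / (2.0 * sd)


# Hypothetical inputs chosen only to illustrate the arithmetic.
threshold_no_cost = min_selectable_delta_u(ne=1e5, sd=0.01)
threshold_with_cost = min_selectable_delta_u(ne=1e5, sd=0.01, sr=1e-5)

print(f"no replication cost:   dU must greatly exceed {threshold_no_cost:.2e}")
print(f"with replication cost: dU must greatly exceed {threshold_with_cost:.2e}")
```

Because Ne appears in the denominator, a ten-fold drop in Ne raises the threshold ten-fold when sr is zero, which is the sense in which declining Ne weakens selection on replication fidelity.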
Are eukaryotic mutation rates driven to such low levels? Although a definitive answer cannot yet be given, it is known that Ne is typically in the range of 10⁵ to 10⁶ for the nuclear genomes of multicellular species (Lynch 2007), and that the average value of sd generally ranges from 10⁻³ to 10⁻² (Lynch and Walsh 1998). This implies that a selectable antimutator must reduce the deleterious genome-wide mutation rate in a multicellular lineage by an amount much greater than 10⁻⁴ to 10⁻². Because these values are ~1% of the genome-wide deleterious mutation rates known for multicellular species, it follows that an antimutator allele would have to reduce U by much more than 1%, perhaps an order of magnitude more, to be promoted by selection. Although not impossible, given that DNA replication and repair are functions of dozens of loci, single amino acid altering mutations at such loci might only rarely have such large effects. Thus, the drift hypothesis appears to be quantitatively plausible.
The drift hypothesis derives further support from the distribution of average Ne among phylogenetic lineages (Lynch 2006, 2007). Under the assumption that nucleotide diversity at silent sites in natural populations is effectively neutral (due to the lack of impact at the amino acid level), the equilibrium level of heterozygosity (πs) at such sites is ~4Neu in diploid species (and 2Neu in haploids), where u is the mutation rate per site. Using previously summarized data on πs from major phylogenetic groupings (Lynch 2006), and factoring out the average mutation rates provided in Figure 1, the average Ne in these groups can be approximated. One then finds a significant negative correlation between u and Ne in accordance with the drift hypothesis (Figure 2a).
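As a minimal sketch of how Ne is "factored out" of silent-site diversity, the function below inverts the neutral expectation πs ≈ 4Neu (diploid) or πs ≈ 2Neu (haploid). The function name and the example values of πs and u are hypothetical and serve only to show the calculation.

```python
def effective_population_size(pi_s, u, diploid=True):
    """Approximate Ne from silent-site heterozygosity under neutrality.

    pi_s: equilibrium nucleotide diversity at silent sites
    u:    mutation rate per site per generation
    Uses pi_s ~ 4*Ne*u for diploids and pi_s ~ 2*Ne*u for haploids.
    """
    if pi_s < 0 or u <= 0:
        raise ValueError("pi_s must be non-negative and u must be positive")
    ploidy_factor = 4.0 if diploid else 2.0
    return pi_s / (ploidy_factor * u)


# Hypothetical inputs, not values from Figure 1 or Lynch (2006).
ne_diploid = effective_population_size(pi_s=0.01, u=1e-8)
ne_haploid = effective_population_size(pi_s=0.01, u=1e-9, diploid=False)

print(f"diploid example: Ne ~ {ne_diploid:.2e}")
print(f"haploid example: Ne ~ {ne_haploid:.2e}")
```

Note that any error in u propagates directly into the inferred Ne, so a correlation between u and Ne estimated this way should be read with that dependence in mind.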
A similar pattern is found for mammalian mitochondrial genomes using data from Piganeau and Eyre-Walker (2009). Here, Ne is the effective number of females, as the mammalian mitochondrion is maternally inherited. The mutation rate was inferred indirectly from phylogenetic estimates of divergence at silent sites (assumed to be neutral), estimated times of divergence from the fossil record, and estimated mean generation times. Despite the greater degree of uncertainty in these data, the log-log regression of lineage-specific estimates of u on Ne has a slope nearly identical to that for the nuclear data described above (Figure 2b).
Because the indirect estimates of Ne in both of these analyses are associated with a considerable (but unknown) degree of sampling error, the true scaling of u and Ne might be more extreme than the observed −0.6 power. Nevertheless, that two analyses based on different phylogenetic groups, types of data analysis, and genomic compartments yield essentially the same result provides strong support for the hypothesis that declines in Ne compromise the ability of selection to maintain high-fidelity replication and/or repair mechanisms. Still further support derives from a body of studies suggesting that several aspects of replication fidelity in eukaryotes are compromised relative to the situation in prokaryotes (Lynch 2008), although some aspects of DNA repair seem to be enhanced in mammals relative to microbes (Saparbaev et al. 2000).
These observations help explain a long-standing conundrum in evolutionary genetics – the near independence of nuclear molecular heterozygosity levels across phylogenetic groups with presumably large disparities in Ne. Lewontin (1974) dubbed this pattern “the paradox of variation,” although Nei (1983) later pointed out a weak positive correlation between levels of variation and Ne. We now see that the relative phylogenetic stability of πs across broad domains of life is not a reflection of relatively constant Ne, but of an inverse relationship between u and Ne. This inverse relationship appears also to be responsible for the relative invariance of πs in the mitochondrial genomes of diverse animals (Bazin et al. 2006; Nabholz et al. 2008, 2009).
The preceding arguments also provide a plausible explanation for the opposite scaling pattern of the mutation rate with genome size in viruses and prokaryotes. The case has been made that an upper bound to Ne, in the neighborhood of 10⁹ to 10¹¹, might exist in cellular species, dictated by the physical (linked) nature of the genome (Lynch 2007). Assuming this upper bound is approximated in non-eukaryotic microbes, and the genome-wide deleterious mutation rate is driven to the lower limit compatible with the associated magnitude of drift, then because selection operates on the genome-wide deleterious mutation rate, any reduction in genome size would increase the lower limit of the achievable per site mutation rate by reducing the number of mutational targets, yielding the inverse scaling suggested by Drake. Such a response is quite notable in the endosymbiotic bacterium Buchnera aphidicola (Moran et al. 2009), which has a highly reduced genome size and the highest known mutation rate for a prokaryote (left-most eubacterial data point in Figure 1).
It also follows that if the average effect of a deleterious mutation (sd) were to increase, the lower limit to the achievable mutation rate would decrease. Drawing from observations that mutations that are benign at low temperatures often have elevated deleterious effects at high temperatures, Drake (2009) has argued that an elevation in sd has promoted the evolution of reduced base-substitution mutation rates in thermophilic bacteria.
Finally, it should be noted that despite the similar scaling of the per site mutation rate with Ne in both nuclear and mitochondrial genomes, the absolute values of u are much greater for mitochondria (Figure 2). Such a pattern is also in agreement with the expectations of the drift hypothesis, as the number of mutational targets in the animal mitochondrion (e.g., just 13 protein-coding genes) is far below the number in nuclear genomes. Thus, although it is often argued that elevated mitochondrial mutation rates in metazoans are an inevitable consequence of a highly oxidative mitochondrial environment, the drift hypothesis provides an explanation based purely on the efficiency of selection. Nonetheless, a remaining puzzle with respect to organelle mutation rates concerns the apparent ~ten-fold reduction in land plants relative to nuclear rates (Lynch 2007). Plant organelle genomes can be up to ten-fold larger than animal mitochondrial genomes, but they are still vastly smaller than nuclear genomes, and the effective population sizes of such organelles do not appear to be unusually large (Lynch 2007). Thus, to be consistent with the drift hypothesis, the average deleterious effects of organelle mutations in land plants must be unusually large, some aspects of the repair machinery must be driven by nuclear functions and/or there must be mechanisms for reducing plant organelle mutation rates in much smaller increments than in nuclear environments.
Along with the burden of deleterious germline mutations, multicellular species experience transient somatic mutations, which influence the reproductive output of parental genomes via the development of cancer, senescence, and a large number of other disorders. Although almost no theory exists on the consequences of somatic mutations for the evolution of the mutation rate, because the same basic repair pathway machinery appears to be deployed in all cells, there must be a direct connection between selection to reduce the somatic mutation rate and the evolution of the germline mutation rate, and vice versa (Lynch 2008).
To evaluate the evolutionary consequences of somatic mutations, it is first instructive to put things on an equal footing by standardizing the germline mutation rates of multicellular lineages to a per cell division basis. Such a comparison shows that although selection has been incapable of maintaining per generation germline mutation rates for base substitutions at the levels observed in microbes, the rates per cell division have been kept low, and in humans, perhaps even suppressed (Table 1). However, this degree of conservation seems not to apply to all forms of mutation, as germline mutations at microsatellite loci arise five times more frequently per cell division in Caenorhabditis elegans than in yeast or slime mold, with mammalian and land plant rates being ~14× those in C. elegans (Seyfert et al. 2008; Marriage et al. 2009).
Although the maintenance of somatic integrity is critical to germline transmission, metazoan somatic mutation rates are consistently greater than germline rates. In humans, the average mutation rate for four somatic cell types, 1.02 × 10⁻⁹/site/cell division (SE = 0.27 × 10⁻⁹), is 17× higher than the germline rate and 3.5× higher than the average for yeast and Escherichia coli (Lynch 2010). Assays of a wide range of tissue types in mouse and rat lines engineered to carry reporter constructs show that somatic cells accumulate two- to six-fold more mutations than do cells in the testes at the age of maturity, and considerably more later in life (Table 1). On an absolute time scale, somatic mutation rates are also higher than germline rates in the medaka fish (Winn et al. 2000), and in Drosophila melanogaster per generation somatic rates average ~80× those in the germline (Garcia et al. 2007; Edman et al. 2009). Thus, without the advantages of germline protection, the precise nature of which remains to be determined, the heritable per generation mutation rates for animal species would be several-fold higher.
The enormity of the somatic mutation problem can be roughly estimated in humans, where the per generation rate of mutation for intestinal epithelium is ~13× that in the germline, and by extrapolation, that in fibroblasts and lymphocytes is likely to be ~5× higher again (Table 1). Thus, with a human germline mutation rate of ~10⁻⁸ base substitutions/site/generation, a site in a somatic nucleus will be mutated with a probability of 10⁻⁷ to 10⁻⁶ by the average age of reproduction, with the burden being higher in older individuals. With a diploid genome size of 6 × 10⁹ sites and ~10¹³ cells per soma, the body of a middle-aged human might then contain >10¹⁶ mutations (not including insertions, deletions, or other larger scale mutations). Only about 1% of the human genome consists of coding DNA, so a substantial fraction of somatic mutations will be inconsequential, but even if just 1% of coding mutations had significant fitness effects, the total body burden of mutations would be of order 10¹². Diploidy might mask the effects of many deleterious mutations, but most mutations with small effects act in a nearly additive fashion (Lynch and Walsh 1998), and although processes such as apoptosis might remove some cells with major mutational defects, it is unlikely that cells with incremental levels of incapacitation could be selectively eliminated. The net result is a progressive lifetime accumulation of somatic mutations, as clearly revealed in the mouse where the germline DNA remains relatively stable within an environment of degrading somatic cell genomes (Figure 3).
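The back-of-envelope estimate above can be reproduced in a few lines. The sketch below multiplies the per-site somatic mutation probability by the diploid genome size and cell number quoted in the text, then applies the two 1% fractions. The constant and function names are assumptions introduced for illustration, and the result is only an order-of-magnitude guide.

```python
# Quantities quoted in the text.
DIPLOID_SITES = 6e9            # sites per diploid human genome
CELLS_PER_SOMA = 1e13          # approximate number of somatic cells
CODING_FRACTION = 0.01         # share of the genome that is coding DNA
CONSEQUENTIAL_FRACTION = 0.01  # share of coding mutations with fitness effects


def somatic_burden(per_site_probability):
    """Return (total mutations, consequential mutations) per body for a
    given probability that a somatic site is mutated by reproductive age."""
    total = per_site_probability * DIPLOID_SITES * CELLS_PER_SOMA
    consequential = total * CODING_FRACTION * CONSEQUENTIAL_FRACTION
    return total, consequential


# Lower and upper per-site probabilities given in the text.
low_total, low_consequential = somatic_burden(1e-7)
high_total, high_consequential = somatic_burden(1e-6)

print(f"total mutations per body:   {low_total:.1e} to {high_total:.1e}")
print(f"consequential per body:     {low_consequential:.1e} to {high_consequential:.1e}")
```

The calculation treats every cell and site as equivalent, which is a simplification; the point is only that the totals land near the 10¹⁶ and 10¹² orders of magnitude cited.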
Without details on the absolute fitness effects of somatic mutations, only qualitative statements can be made on their consequences for the evolution of the germline mutation rate (Lynch 2008). One central question is the degree to which the efficiency of selection operating on the mutation rate via the consequences of somatic mutations changes with the level of multicellularity. At low levels of organismal complexity, the reduction in individual fitness associated with somatic mutations can be described roughly as the product of four factors, 2usTnss, where 2us is the diploid somatic mutation rate per nucleotide site per generation, T is the number of sites influencing fitness, n is the number of cells influencing fitness, and ss is the reduction in fitness per somatic mutation. (A more explicit form of this expression would not treat all cells equally, but sum over independent tissues; Lynch 2008).
Although T and n must increase with increasing levels of multicellularity, ss might decrease, depending on aspects of cellular surveillance and the buffering effects of multicellularity on individual mutant cells. By contrast, as noted in Figure 2, increased multicellularity is generally associated with a reduction in Ne, which in turn reduces the efficiency of selection. Thus, a key to understanding the degree to which the burden of somatic mutations impacts selection on the mutation rate itself is analogous to the situation noted above for germline mutations. If the increase in 1/(2Ne) with increasing multicellularity exceeds the increase in 2usTnss, the ability of selection to reduce the somatic mutation rate (and likely the correlated effect on the germline rate) will become progressively compromised.
Moreover, a scenario can be envisioned whereby a critical level of multicellularity is eventually reached, beyond which the ability of selection to reduce the somatic mutation rate begins to decline (Lynch 2008). Such behavior is expected because the strength of selection depends on relative rather than absolute fitness effects. Although the absolute negative consequences of somatic mutations might continue to increase indefinitely with increasing multicellularity, once a level has been reached at which the fraction of affected individuals approaches saturation, the relative selective disadvantage of a further increase in the mutation rate must begin to decline. It is unclear where organisms with various levels of multicellularity reside on this continuum. However, it is clear that if the somatic mutation load plays a role in the evolution of the germline mutation rate, it has generally been incapable of keeping the somatic rate at levels observed in unicellular species.
Although somatic nuclear mutations permanently influence a host cell and all of its descendants, two more transient forms of mutations are also of relevance – errors in transcription and translation. The average error rate estimate for RNA polymerase in E. coli is 1 × 10⁻⁵ per base incorporation (Blank et al. 1986; Ninio 1991; Goldsmith and Tawfik 2009), whereas that for RNA polymerase II (the polymerase involved in transcription of coding mRNAs) in Saccharomyces cerevisiae is in the range of 2 × 10⁻⁶ to 3 × 10⁻⁴ (Shaw et al. 2002; Kireeva et al. 2008), and the single estimate for a multicellular species is 1 × 10⁻³ in wheat (de Mercoyrol et al. 1992). These rough estimates are based on a variety of methodologies and have a restricted phylogenetic range. Nevertheless, they indicate that transcription error rates per nucleotide transaction are orders of magnitude higher than the replication error rates noted above. They also suggest that transcriptional fidelity is reduced in eukaryotes, perhaps substantially so in multicellular lineages.
A similar pattern is observed for translation, with the overall level of fidelity appearing to be even lower than that for transcription. Although there can be considerable variation among codon types, the average translation error rate per codon in E. coli is 6 × 10⁻⁴ (Ortego et al. 2007; Kramer and Farabaugh 2007; Willensdorfer et al. 2007), whereas the average rates for yeast, rabbit reticulocytes, and mouse liver cells are, respectively, 2 × 10⁻³ (Stansfield et al. 1998; Salas-Marco and Bedwell 2005), 3 × 10⁻⁴ (Loftfield and Vanderjagt 1972), and 1 × 10⁻³ (Mori et al. 1985). Taken together, with an average protein length of ~300 amino acid residues (Lynch 2007), these observations suggest that, without removal by post-translational surveillance, >20% of individual proteins will contain at least one inappropriate amino acid (Drummond and Wilkie 2009).
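The link between a per-codon error rate and the fraction of flawed proteins can be sketched as follows, assuming errors occur independently and at the same rate at every codon (a simplification, since the text notes considerable variation among codon types). The function and variable names are illustrative; the rates and the 300-residue length are those quoted above.

```python
def fraction_with_error(error_rate_per_codon, protein_length=300):
    """Probability that a protein contains at least one mistranslated
    residue, assuming independent and uniform per-codon errors."""
    return 1.0 - (1.0 - error_rate_per_codon) ** protein_length


# Average per-codon translation error rates quoted in the text.
rates = {
    "E. coli": 6e-4,
    "yeast": 2e-3,
    "rabbit reticulocytes": 3e-4,
    "mouse liver": 1e-3,
}

fractions = {name: fraction_with_error(rate) for name, rate in rates.items()}
mean_rate = sum(rates.values()) / len(rates)
mean_fraction = fraction_with_error(mean_rate)

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} of proteins with at least one error")
print(f"mean rate across the four estimates: {mean_fraction:.1%}")
```

Under these assumptions the mean of the four rates implies that roughly a quarter of 300-residue proteins carry at least one error before any post-translational surveillance.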
Like DNA polymerases, RNA polymerases have a proof-reading capacity (Sydow and Cramer 2009), and there is no obvious reason why they (or the translational machinery) should be intrinsically constrained from operating at the level of efficiency of DNA polymerases. However, because individual loci generally produce multiple transcripts, and mRNAs and individual proteins have transient residence times within cells, transcriptional and translational errors are expected to have less severe effects on cell integrity than genome-level errors. Thus, the strength of selection operating on the transcriptional and translational machinery is likely to be less stringent, and consistent with the drift hypothesis, this might explain the greatly elevated error rates of these processes.
Germline mutation rate data provide a critical basis for interpreting patterns of molecular diversity within species and divergence among species. Indeed, inferences regarding selection have been historically derived by assuming certain classes of sites (e.g., synonymous coding-region positions) to be effectively neutral and hence to evolve at the mutation rate, thereby providing clear predictions of evolutionary patterns expected in the absence of selection (Kimura 1983). However, the direct estimates of mutation rates and molecular spectra obtained in the studies reviewed herein are often substantially different from those derived by indirect inference from natural populations, at least in part because selection is more pervasive than formerly believed (e.g., Eöry et al. 2009). Thus, to be fully reliable, future molecular investigations with a goal of interpreting evolutionary mechanisms should take advantage of direct estimates of mutation rates. One might argue that laboratory estimates are subject to their own peculiar biases, but the consistent patterns noted above suggest that we are close to developing a general understanding of the rates at which base substitutions arise in various phylogenetic lineages.
Although there are strong scaling patterns between the mutation rate per generation and genome size, these patterns are not a function of direct causality, i.e., large eukaryotic genomes do not intrinsically engender low fidelity of DNA replication. Rather, because there is a general insertional bias in most eukaryotic genomes, as effective population sizes decline and the efficiency of selection against excess DNA is relaxed, genome size increases in a passive fashion (Lynch 2007), along with the mutation rate. By contrast, effective population sizes in viruses and prokaryotes might often be so close to their maxima that the lower limit to the evolvable genome-wide mutation rate has been reached. Once this point has been reached, any events that lead to a further reduction in genome size (e.g., loss of non-essential genes in endosymbionts and parasites) will generally increase the minimum evolvable per site mutation rate, as the product of the latter and the genome-wide number of selected sites is equal to the genome-wide deleterious rate. However, why the total genomic mutation rate for microbes converges on ~0.003 per cell division (Drake 1991) is a mystery that remains to be solved.
An additional unsolved problem concerns the long-term stability of the mutation rate in low-Ne lineages. Just as random genetic drift can inhibit the fixation of an antimutator with an insufficiently large effect on the genomic mutation rate, it can permit the fixation of sufficiently mild mutator alleles. What then prevents the gradual accumulation of very mildly deleterious mutations at DNA repair loci and a slow but progressive increase in the mutation rate in multicellular lineages? Given the preceding arguments, it might be premature to assume that the mutation rate in such lineages has actually attained equilibrium.
This work was funded by NIH grant GM36827 and NSF grant EF-0827411, and by the MetaCyte program derived from Lilly Foundation funding to Indiana University. I am grateful to J. Drake for helpful comments.