Biol Lett. 2009 June 23; 5(3): 421–424.
Published online 2009 February 25. doi:  10.1098/rsbl.2008.0729
PMCID: PMC2679908

An examination of phylogenetic models of substitution rate variation among lineages

Abstract

Molecular evolutionary rates can show significant variation among lineages, complicating the task of estimating substitution rates and divergence times using phylogenetic methods. Accordingly, relaxed molecular clock models have been developed to accommodate such rate heterogeneity, but these often make the assumption of rate autocorrelation among lineages. In this paper, I examine the validity of this assumption.

Keywords: rate autocorrelation, relaxed clocks, divergence dating, mutation rate, substitution rate, life history

1. Introduction

Rates of molecular evolution can vary substantially among sites, genes and lineages. In the past two decades, phylogenetic methods have been modified to take these forms of rate heterogeneity into account. For example, a number of ‘relaxed-clock’ models have been developed, which allow substitution rates to vary among lineages in a phylogenetic tree, without the need to assign a separate rate parameter for each branch (for a general overview, see Rutschmann 2006). These models enable the estimation of divergence times and lineage-specific substitution rates from sequence data that do not conform to a strict molecular clock. To make the estimation procedure tractable, relaxed-clock models place limitations on how rates are able to vary throughout the tree. Many of the widely used models assume that the substitution rate is indirectly heritable because it is correlated with a variety of inherited characteristics, including those associated with cellular environment, physiology and life history. Such patterns are then assumed to lead to some degree of autocorrelation between molecular rates in adjacent branches of the tree.

2. Autocorrelated relaxed-clock models

In practice, the assumption of rate autocorrelation is applied in one of several ways. In autocorrelated relaxed-clock models, the various biological factors are encapsulated in a single function describing the behaviour of rates throughout the tree. Some relaxed-clock methods employ an algorithm to minimize the rate changes between adjacent branches (Sanderson 1997, 2002), while others implement an explicit model of rate variation in which substitution rates can change or ‘evolve’ along branches (e.g. Huelsenbeck et al. 2000; Kishino et al. 2001; Aris-Brosou & Yang 2002; Lepage et al. 2006; Rannala & Yang 2007). After reviewing the various relaxed-clock models in detail, Lepage et al. (2006, 2007) proposed that the Cox–Ingersoll–Ross process possesses a number of desirable statistical properties that make it suitable for describing rate evolution. In this model, the expected value of the rate R(t) at time t is

$$E[R(t)] = R(0)\,e^{-\theta t} + \mu\,\left(1 - e^{-\theta t}\right),$$

where μ is the stationary mean of the rate and θ determines the speed of the decay in rate autocorrelation.
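
As a rough numerical illustration of this expression, the following Python sketch evaluates the expected rate at several times. It assumes the standard form of the Cox–Ingersoll–Ross mean, with negative exponents; the function name `cir_mean_rate`, the parameter values and the units are hypothetical and chosen only for illustration.

```python
import math


def cir_mean_rate(r0, mu, theta, t):
    """Expected rate at time t for a CIR process started at rate r0.

    Assumes E[R(t)] = r0 * exp(-theta * t) + mu * (1 - exp(-theta * t)).
    """
    decay = math.exp(-theta * t)
    return r0 * decay + mu * (1.0 - decay)


# Hypothetical values: rates in substitutions/site/Myr, time in Myr.
r0, mu, theta = 0.02, 0.01, 0.5
for t in (0.0, 1.0, 5.0, 50.0):
    print(f"t = {t:5.1f}  E[R(t)] = {cir_mean_rate(r0, mu, theta, t):.5f}")
```

The expected rate starts at R(0) and relaxes towards the stationary mean μ; a larger θ makes the influence of the ancestral rate fade more quickly.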

Studies of simulated and real data have demonstrated that estimates of substitution rates and divergence times are sensitive to the choice of relaxed-clock model (Ho et al. 2005; Drummond et al. 2006; Lepage et al. 2007), highlighting the need for careful model selection.

3. Biological motivation

The biological motivation behind autocorrelated relaxed clocks can be summarized in the form of two key assumptions. The first assumption is that mutation rates are influenced by life-history characteristics such as generation time, metabolic rate and DNA repair efficiency (Gillespie 1991; Baer et al. 2007). For example, herbaceous plants generally have shorter generation times than woody plants, and so exhibit higher rates of molecular evolution (Smith & Donoghue 2008).

The second assumption is that rates of mutation and substitution are correlated. Unless evolution is proceeding in an effectively neutral manner, substitution rates are somewhat removed from mutation rates. It is possible, however, that closely related species experience similar selection intensities, with comparable fitness distributions for mutations. In some empirical studies of rate variation among lineages, the two steps are not separated, and substitution rates are taken as a proxy for mutation rates. This is primarily due to the difficulty in obtaining reliable estimates of mutation rates. In any case, the substitution rate in each lineage depends on the interplay of mutation, selection and drift.

Among mammals, substitution rates have been found to be correlated with body size (Lanfear et al. 2007) or metabolic rate (Gillooly et al. 2005), synonymous rates with generation time (Nikolaev et al. 2007) and maximum lifespan (Welch et al. 2008), and non-synonymous rates with population size (Nikolaev et al. 2007) and several other traits (Welch et al. 2008). It is not clear, however, whether such patterns extend to other taxonomic groups; Lanfear et al. (2007) found no evidence of a metabolic rate effect on substitution rates across a variety of metazoan taxa. Investigations of the correlations between rates and biological traits in plants have yielded mixed results (e.g. Barraclough & Savolainen 2001; Davies et al. 2004; Smith & Donoghue 2008).

4. Rate autocorrelation in practice

The biological assumptions underlying autocorrelated relaxed clocks warrant closer examination. The first assumption, that mutation rates are closely linked to heritable traits, receives support from studies of mammalian data. Nevertheless, even these trends differ between mammalian mitochondrial and nuclear genomes (Welch et al. 2008). Studies of other taxa have indicated that the correlations observed in mammals cannot be readily extended to other metazoans (Thomas et al. 2006; Lanfear et al. 2007).

Another pertinent question, related to the first biological assumption, concerns the taxonomic scale of the sequence data that are being analysed with autocorrelated relaxed-clock models. In a study of the cytochrome b gene in mammals, Nabholz et al. (2008) found that family-level categorization explained the greatest amount of rate variation. Overall, one would predict the highest degree of autocorrelation to be observed at intermediate levels of the taxonomic hierarchy. At one extreme, we would expect a very high degree of underlying rate autocorrelation within a species, such that any rate variation among lineages would be primarily due to stochastic, uninherited factors (Drummond et al. 2006); indeed, many population genetic and coalescent-based approaches assume a strict molecular clock.

At the other end of the continuum, autocorrelation in life-history traits (or any other factor that might be strongly correlated with mutation/substitution rates) would inevitably break down at higher taxonomic levels (Gittleman & Kot 1990; Drummond et al. 2006). The differences among lineages would be amplified by very incomplete taxon sampling, and the degree of autocorrelation would decrease as sampling becomes sparser. In cases where a dataset consists of distantly related taxa, there is little reason to expect any appreciable autocorrelation among the rates on different lineages. Consequently, it would be difficult to defend a priori assumptions about the manner in which rates vary among lineages.

Autocorrelated rate methods have been used to analyse sequences at various taxonomic scales, ranging from viral sequences obtained from a single host, to sequences acquired from representatives of different kingdoms of life. To investigate the trends in the application of autocorrelated relaxed clocks, a survey was conducted of all 46 studies that used such methods and were published in Royal Society journals prior to November 2008 (table 1).

Table 1
Summary of all 46 studies that have used autocorrelated relaxed clocks and have been published in Royal Society journals.

The sequence data examined in these studies spanned a broad range of taxonomic levels (figure 1). Five studies analysed datasets in which the majority of nodes in the tree represented ordinal divergences or higher. At the other extreme, nine studies involved the analyses of datasets that included large numbers of sequences from conspecific individuals, with three conducted entirely at the population level.

Figure 1
Plot of the approximate taxonomic levels spanned by 48 datasets that have been analysed using autocorrelated relaxed-clock models. Details of the individual studies are given in table S1 in the electronic supplementary material.

For the methods of analysis to be applicable to all of these datasets, they would need to be flexible enough to accommodate widely varying levels of rate change and autocorrelation. For small, sparsely sampled datasets, it is doubtful whether any rate autocorrelation should be expected at all.

The second assumption behind autocorrelated relaxed-clock models is that mutation and substitution rates are strongly correlated. This is reasonable for sequences that are evolving neutrally. In analyses of sequences under selection, however, such an assumption is far more questionable. This relates particularly to non-mammalian mitochondrial sequence data, of which the evolutionary history appears to have been driven substantially by adaptive evolution (Bazin et al. 2006). If rates of adaptive substitution are not tied to inherited factors, then the presence of such substitutions can seriously weaken the link between life-history traits and substitution rates. As mentioned above, however, closely related species could experience similar selection intensities, as implied under covarion models of sequence evolution (e.g. Tuffley & Steel 1998). The extent to which such processes could lead to rate autocorrelation among lineages is not known.

In a comprehensive study of mammals, no correlation was found between non-synonymous mitochondrial rates and life-history traits (Welch et al. 2008). This suggests that autocorrelated relaxed-clock models might be inappropriate for analyses of amino acid sequences. It might therefore be desirable to employ autocorrelated models of among-lineage rate variation for non-coding sequences and uncorrelated models for coding or amino acid sequences.

5. Detecting autocorrelation

Rate autocorrelation among branches can be detected in a Bayesian framework, e.g. by Bayes factor comparison of autocorrelated and uncorrelated relaxed clocks (Lepage et al. 2007). Using this approach, Lepage et al. (2007) found that autocorrelated models provided a significantly better fit than uncorrelated models to three protein-coding DNA alignments, but not when the number of taxa was small.
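
The arithmetic of such a comparison can be sketched in a few lines of Python. This is only an illustration: the log marginal likelihoods below are made-up numbers, the function names are hypothetical, and the verbal categories follow the commonly used Kass & Raftery scale on 2 ln(BF), which is an assumption here and not something specified by the studies cited above.

```python
def log_bayes_factor(log_ml_a, log_ml_b):
    """ln Bayes factor of model A over model B from log marginal likelihoods."""
    return log_ml_a - log_ml_b


def support_category(log_bf):
    """Verbal label for the strength of support, based on 2 * |ln BF|."""
    two_ln_bf = 2.0 * abs(log_bf)
    if two_ln_bf < 2.0:
        return "barely worth mentioning"
    if two_ln_bf < 6.0:
        return "positive"
    if two_ln_bf < 10.0:
        return "strong"
    return "very strong"


# Hypothetical log marginal likelihoods for the two relaxed-clock models.
log_ml_autocorrelated = -12345.6
log_ml_uncorrelated = -12350.1

log_bf = log_bayes_factor(log_ml_autocorrelated, log_ml_uncorrelated)
favoured = "autocorrelated" if log_bf > 0 else "uncorrelated"
print(f"ln BF = {log_bf:.2f}; {support_category(log_bf)} support for the {favoured} model")
```

In a real analysis the marginal likelihoods would have to be estimated from the data under each model, which is the computationally demanding step.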

Rate autocorrelation can also be measured by using relaxed-clock models that do not make an a priori assumption of autocorrelation (Drummond et al. 2006; Rannala & Yang 2007). In these models, the rate on each branch is sampled from a single underlying distribution (such as a lognormal distribution), of which the parameters are estimated in the analysis. Comparison of the posterior and prior distributions of the covariance in (estimated) rates in neighbouring branches can then be used to provide an indication of rate autocorrelation in a dataset. Analyses performed in this framework have failed to detect rate autocorrelation in DNA/RNA sequences of influenza virus, dengue virus, marsupials (Drummond et al. 2006), plants (Moore & Donoghue 2007) and birds (Brown et al. 2008).
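
To make the covariance statistic concrete, the following Python sketch simulates branch rates on a fixed tree under two assumed models and computes the covariance between each branch's rate and that of its parent branch. It is a toy simulation, not a Bayesian analysis: the balanced tree, the lognormal parameters, the simple autocorrelated model (each rate drawn from a lognormal centred on the parent rate) and all function names are hypothetical choices made for illustration.

```python
import math
import random


def balanced_tree_parents(n_branches):
    """Parent index of each branch; the two root branches have no parent."""
    return [None, None] + [(i - 2) // 2 for i in range(2, n_branches)]


def simulate_rates(parent, autocorrelated, rng, median_rate=0.01, sd_log=0.5):
    """Draw one lognormal rate per branch (parents are listed before children)."""
    rates = []
    for p in parent:
        if autocorrelated and p is not None:
            centre = math.log(rates[p])      # centred on the parent branch rate
        else:
            centre = math.log(median_rate)   # single shared distribution
        rates.append(math.exp(rng.gauss(centre, sd_log)))
    return rates


def parent_child_covariance(rates, parent):
    """Sample covariance between parent-branch and child-branch rates."""
    pairs = [(rates[p], rates[c]) for c, p in enumerate(parent) if p is not None]
    n = len(pairs)
    mean_p = sum(x for x, _ in pairs) / n
    mean_c = sum(y for _, y in pairs) / n
    return sum((x - mean_p) * (y - mean_c) for x, y in pairs) / (n - 1)


def mean_covariance(parent, autocorrelated, rng, replicates=200):
    """Average the covariance statistic over independent simulated rate sets."""
    total = 0.0
    for _ in range(replicates):
        rates = simulate_rates(parent, autocorrelated, rng)
        total += parent_child_covariance(rates, parent)
    return total / replicates


rng = random.Random(1)
parent = balanced_tree_parents(254)  # branches of a balanced tree with 128 tips
cov_uncorrelated = mean_covariance(parent, False, rng)
cov_autocorrelated = mean_covariance(parent, True, rng)
print(f"uncorrelated:   {cov_uncorrelated:.3e}")
print(f"autocorrelated: {cov_autocorrelated:.3e}")
```

Under the uncorrelated model the covariance should scatter around zero, whereas the autocorrelated model should give a clearly positive value. In an actual analysis the rates are estimated rather than known, so the statistic is judged by comparing its posterior distribution with its prior.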

6. Concluding remarks

Given the considerations described above, it is clear that further investigations of among-lineage rate variation are critically required. In view of the rapidly growing amount of sequence data, it should become possible to employ mixed relaxed-clock models when analysing datasets that comprise both selected and putatively neutral sites. For example, it might be preferable to employ relaxed-clock models that separate synonymous and non-synonymous rates (Seo et al. 2004; Lemey et al. 2007). Such an approach has the potential to provide a better fit to the data, and to be more illuminating with respect to the molecular evolutionary process.

Acknowledgements

I am grateful to Rob Lanfear and three anonymous referees for their constructive comments. This research was supported by the Australian Research Council.

Footnotes

One contribution of 11 to a Special Feature on ‘Whole organism perspectives on understanding molecular evolution’.

Supplementary Material

Table S1:

List of studies using autocorrelated relaxed-clock methods and published in Royal Society journals, complete as of 19 November, 2008

References

  • Aris-Brosou S., Yang Z. Effects of models of rate evolution on estimation of divergence dates with special reference to the metazoan 18S ribosomal RNA phylogeny. Syst. Biol. 2002;51:703–714. doi:10.1080/10635150290102375
  • Baer C.F., Miyamoto M.M., Denver D.R. Mutation rate variation in multicellular eukaryotes: causes and consequences. Nat. Rev. Genet. 2007;8:619–631. doi:10.1038/nrg2158
  • Barraclough T.G., Savolainen V. Evolutionary rates and species diversity in flowering plants. Evolution. 2001;55:677–683. doi:10.1554/0014-3820(2001)055[0677:ERASDI]2.0.CO;2
  • Bazin E., Glemin S., Galtier N. Population size does not influence mitochondrial genetic diversity in animals. Science. 2006;312:570–572. doi:10.1126/science.1122033
  • Brown J.W., Rest J.S., Garcia-Moreno J., Sorenson M.D., Mindell D.P. Strong mitochondrial DNA support for a Cretaceous origin of modern avian lineages. BMC Biol. 2008;6:6.
  • Davies T.J., Savolainen V., Chase M.W., Moat J., Barraclough T.G. Environmental energy and evolutionary rates in flowering plants. Proc. R. Soc. Lond. B. 2004;271:2195–2200. doi:10.1098/rspb.2004.2849
  • Drummond A.J., Ho S.Y.W., Phillips M.J., Rambaut A. Relaxed phylogenetics and dating with confidence. PLoS Biol. 2006;4:e88. doi:10.1371/journal.pbio.0040088
  • Gillespie J.H. The causes of molecular evolution. Oxford, UK: Oxford University Press; 1991.
  • Gillooly J.F., Allen A.P., West G.B., Brown J.H. The rate of DNA evolution: effects of body size and temperature on the molecular clock. Proc. Natl Acad. Sci. USA. 2005;102:140–145. doi:10.1073/pnas.0407735101
  • Gittleman J.L., Kot M. Statistics and a null model for estimating phylogenetic effects. Syst. Zool. 1990;39:227–241. doi:10.2307/2992183
  • Ho S.Y.W., Phillips M.J., Drummond A.J., Cooper A. Accuracy of rate estimation using relaxed-clock models with a critical focus on the early metazoan radiation. Mol. Biol. Evol. 2005;22:1355–1363. doi:10.1093/molbev/msi125
  • Huelsenbeck J.P., Larget B., Swofford D.L. A compound Poisson process for relaxing the molecular clock. Genetics. 2000;154:1879–1892.
  • Kishino H., Thorne J.L., Bruno W.J. Performance of a divergence time estimation method under a probabilistic model of rate evolution. Mol. Biol. Evol. 2001;18:352–361.
  • Lanfear R., Thomas J.A., Welch J.J., Brey T., Bromham L. Metabolic rate does not calibrate the molecular clock. Proc. Natl Acad. Sci. USA. 2007;104:15 388–15 393. doi:10.1073/pnas.0703359104
  • Lemey P., Kosakovsky Pond S.L., Drummond A.J., Pybus O.G., Shapiro B., Barroso H., Taveira N., Rambaut A. Synonymous substitution rates predict HIV disease progression as a result of underlying replication dynamics. PLoS Comput. Biol. 2007;3:e29. doi:10.1371/journal.pcbi.0030029
  • Lepage T., Tupper P., Bryant D., Lawi S. Continuous and tractable models for the variation of evolutionary rates. Math. Biosci. 2006;199:216–233. doi:10.1016/j.mbs.2005.11.002
  • Lepage T., Bryant D., Philippe H., Lartillot N. A general comparison of relaxed molecular clock models. Mol. Biol. Evol. 2007;24:2669–2680. doi:10.1093/molbev/msm193
  • Moore B.R., Donoghue M.J. Correlates of diversification in the plant clade Dipsacales: geographic movement and evolutionary innovations. Am. Nat. 2007;170:S28–S55. doi:10.1086/519460
  • Nabholz B., Glemin S., Galtier N. Strong variations of mitochondrial mutation rate across mammals—the longevity hypothesis. Mol. Biol. Evol. 2008;25:120–130. doi:10.1093/molbev/msm248
  • Nikolaev S.I., Montoya-Burgos J.I., Popadin K., Parand L., Margulies E.H., Antonarakis S.E. Life-history traits drive the evolutionary rates of mammalian coding and noncoding genomic elements. Proc. Natl Acad. Sci. USA. 2007;104:20 443–20 448. doi:10.1073/pnas.0705658104
  • Rannala B., Yang Z. Inferring speciation times under an episodic molecular clock. Syst. Biol. 2007;56:453–466. doi:10.1080/10635150701420643
  • Rutschmann F. Molecular dating of phylogenetic trees: a brief review of current methods that estimate divergence times. Divers. Distrib. 2006;12:35–48. doi:10.1111/j.1366-9516.2006.00210.x
  • Sanderson M.J. A nonparametric approach to estimating divergence times in the absence of rate constancy. Mol. Biol. Evol. 1997;14:1218–1231.
  • Sanderson M.J. Estimating absolute rates of molecular evolution and divergence times: a penalized likelihood approach. Mol. Biol. Evol. 2002;19:101–109.
  • Seo T.K., Kishino H., Thorne J.L. Estimating absolute rates of synonymous and nonsynonymous nucleotide substitution in order to characterize natural selection and date species divergences. Mol. Biol. Evol. 2004;21:1201–1213. doi:10.1093/molbev/msh088
  • Smith S.A., Donoghue M.J. Rates of molecular evolution are linked to life history in flowering plants. Science. 2008;322:86–89. doi:10.1126/science.1163197
  • Thomas J.A., Welch J.J., Woolfit M., Bromham L. There is no universal molecular clock for invertebrates, but rate variation does not scale with body size. Proc. Natl Acad. Sci. USA. 2006;103:7366–7371. doi:10.1073/pnas.0510251103
  • Tuffley C., Steel M. Modeling the covarion hypothesis of sequence evolution. Math. Biosci. 1998;147:63–91. doi:10.1016/S0025-5564(97)00081-3
  • Welch J.J., Bininda-Emonds O.R., Bromham L. Correlates of substitution rate variation in mammalian protein-coding sequences. BMC Evol. Biol. 2008;8:53.

Articles from Biology Letters are provided here courtesy of The Royal Society