Search tips
Search criteria 


Logo of transbThe Royal Society PublishingPhilosophical Transactions BAboutBrowse By SubjectAlertsFree Trial
Philos Trans R Soc Lond B Biol Sci. 2008 December 27; 363(1512): 3985–3995.
Published online 2008 October 7. doi:  10.1098/rstb.2008.0176
PMCID: PMC2607419

Fast, accurate and simulation-free stochastic mapping


Mapping evolutionary trajectories of discrete traits onto phylogenies receives considerable attention in evolutionary biology. Given the trait observations at the tips of a phylogenetic tree, researchers are often interested where on the tree the trait changes its state and whether some changes are preferential in certain parts of the tree. In a model-based phylogenetic framework, such questions translate into characterizing probabilistic properties of evolutionary trajectories. Current methods of assessing these properties rely on computationally expensive simulations. In this paper, we present an efficient, simulation-free algorithm for computing two important and ubiquitous evolutionary trajectory properties. The first is the mean number of trait changes, where changes can be divided into classes of interest (e.g. synonymous/non-synonymous mutations). The mean evolutionary reward, accrued proportionally to the time a trait occupies each of its states, is the second property. To illustrate the usefulness of our results, we first employ our simulation-free stochastic mapping to execute a posterior predictive test of correlation between two evolutionary traits. We conclude by mapping synonymous and non-synonymous mutations onto branches of an HIV intrahost phylogenetic tree and comparing selection pressure on terminal and internal tree branches.

Keywords: stochastic mapping, counting processes, evolutionary rewards, posterior predictive diagnostics

1. Introduction and background

Reconstructing evolutionary histories from present-day observations is a central problem in quantitative biology. Phylogenetic estimation is one example of such reconstruction. However, phylogenetic reconstruction alone does not provide a full picture of an evolutionary history, because evolutionary paths (mappings) describing trait states along the phylogenetic tree remain hidden. Although one is rarely interested in detailed reconstruction of such mappings, certain probabilistic properties of the paths are frequently used in evolutionary hypotheses testing (Nielsen 2002; Huelsenbeck et al. 2003; Leschen & Buckley 2007). For example, given a tree and a Markov model of amino acid evolution, one can compute the expected number of times a transition from a hydrophobic to a hydrophilic state occurs, conditional on the observed amino acid sequence alignment. Such expectations can inform researchers about model adequacy and provide insight into the features of the evolutionary process overlooked by standard phylogenetic techniques (Dimmic et al. 2005).

Nielsen (2002) introduced stochastic mapping of trait states on trees and employed this new technique in a model-based evolutionary hypothesis testing context. The author starts with a discrete evolutionary trait X that attains m states. He further assumes that this trait evolves according to an evolutionary model described by a parameter vector θ, where θ consists of a tree τ with n tips and branch lengths T=(t1,,tBn), root distribution π=(π1,,πm) and a continuous-time Markov chain (CTMC) generator Q={qij} for i,j=1,,m. Let mapping Mθ=({X1t},,{XBnt}) be a collection of CTMC trajectories along all branches of τ and H(Mθ) be a real-valued summary of Mθ. Clearly, even when parameters θ are fixed, h(Mθ) remains a random variable. Nielsen (2002) proposed to test evolutionary hypotheses using prior and posterior expectations E[H(Mθ)] and E[H(Mθ)|D], where D=(D1,,Dn) are trait values observed at the n tips of τ. Since these expectations are deterministic functions of θ and D, they can be used as discrepancy measures for posterior predictive p-value calculations (Meng 1994; Gelman et al. 1996).

A major advantage of Nielsen's stochastic mapping framework is its ability to account for uncertainty in model parameters, including phylogenies. A major limitation of stochastic mapping is its current implementation that relies on time-consuming simulations. In describing his method for calculating E[H(Mθ)] and E[H(Mθ)|D], Nielsen (2002) wrote ‘In general, we can not evaluate sums in equations 5 and 6 directly, because the set [of all possible mappings] is not of finite size.’ However, the infinite number of possible mappings does not prevent one from explicitly calculating E[H(Mθ)] for some choices of H. For example, if


then E[H(Mθ)]=Pr(D), the familiar phylogenetic likelihood that can be evaluated without simulations (Felsenstein 2004). Therefore, hope remains that other choices of H may also permit evaluation of E[H(Mθ)] and E[H(Mθ)|D] without simulations.

In this paper, we consider a class of additive mapping summaries of the form

H(Mθ)=b[set membership]Ωh({Xbt}),

where h({Xbt}) is a one-dimensional summary of the Markov chain path along branch b and Ω is an arbitrary subset of all branches of τ. Moreover, we restrict our attention to the two most popular choices of function h. Let L[subset or is implied by]{1,,m}2 be a set of ordered index pairs that label transitions of trait X. For each Markov path {Xt} and interval [0,t), we count the number of labelled transitions in this interval and arrive at


where we omit dependence on θ and t for brevity. Although our second choice of h is more abstract, it is motivated by Huelsenbeck et al. (2003), who used Nielsen's stochastic mapping algorithm to calculate the mean dwelling time of a trait in a particular state. Let w=(w1,,wm) be a set of rewards assigned to each trait state. Trait X is ‘rewarded’ the amount t×wi for spending time t in state i. We obtain the total reward of Markov path {Xt} by summing up all rewards that X accumulates during interval [0,t),


To obtain dwelling times of X in a predefined set of trait states, we set wi=1 if i belongs to the set of interest and wi=0 otherwise.

For these two choices of function h, we provide an algorithm for exact, simulation-free computation of E[H(Mθ)] and E[H(Mθ)|D]. Similar to phylogenetic likelihood calculations of Pr(D), this algorithm relies on the eigen decomposition of Q and requires traversing τ. Our work generalizes and unifies exact calculations previously developed for stochastic mapping (Holmes & Rubin 2002; Guindon et al. 2004; Dutheil et al. 2005). Despite the restricted form of the considered mapping summaries, our results cover nearly all current applications of stochastic mapping. We conclude with two applications of stochastic mapping illustrating the capabilities of exact computation. In our first example, we examine coevolution of two binary traits and demonstrate that a previously developed simulation-based test of independent evolution can be executed without simulations. We then turn to a large codon Markov state space, on which simulation-based stochastic mapping generally experiences severe computational limitations. Using our exact computations, we study temporal patterns of synonymous and non-synonymous mutations in intrahost HIV evolution.

2. Local, one branch calculations

In this section, we provide mathematical details needed for calculating expectations of stochastic mapping summaries on one branch of a phylogenetic tree. We first motivate the need for such local computations by making further analogies between calculations of Pr(D) and expectations of stochastic mapping summaries. The additive form of H reduces calculation of E[H(Mθ)] and E[H(Mθ)|D] to compute branch-specific expectations E[h({Xbt})] and E[h({Xbt)}|D]. Recall that according to the most phylogenetic models, trait X evolves independently on each branch of τ, conditional on trait states at all internal nodes of τ. This conditional independence is the key behind the dynamic programming algorithm that allows for efficient calculation of Pr(D) (Felsenstein 1981). For this likelihood calculation algorithm, it suffices to compute finite-time transition probabilities P(t)={pij(t)}, where


for arbitrary branch length t. Similarly, to obtain E[h({Xt})] and E[h({Xt})|D], we require means of computing local expectations E(h,t)={eij(h,t)}, where


and 1{·} is the indicator function. After illustrating how to compute E(NL,t) and E(Rw,t) without resorting to simulations, we provide an algorithm that efficiently propagates local expectations E(h, t) and finite-time transition probabilities P(t) along τ to arrive at E[h({Xt})] and E[h({Xt})|D].

(a) Expected number of labelled Markov transitions

Abstracting from phylogenetics, let NL(t) count the number of labelled transitions of a CTMC {Xt} during time interval [0,t). It follows from the theory of Markov chain-induced counting processes that


where QL={qij×1{(i,j)[set membership]L}} (Ball & Milne 2005). Since most evolutionary models are locally reversible, we can safely assume that Q is diagonalizable with eigen decomposition Q=U×diag(d1,,dm)×U1, where eigenvectors of Q form the columns of U, d1,,dm are the real eigenvalues of Q, and diag(d1,,dm) is a diagonal matrix with elements d1,,dm on its main diagonal. Such analytic or numeric diagonalization procedure permits calculation of finite-time transition probabilities P(t)=U×diag(ed1t,,edmt)×U1, needed for likelihood calculations (Lange 2004). Minin & Suchard (2008) have shown that one can use the same eigen decomposition of Q to calculate local expectations


where Si=UEiU1, Ei is a matrix with zero entries everywhere except at the ii-th entry, which is one, and


(b) Expected Markov rewards

For the reward process Rw(t), we define a matrix cumulative distribution function V(x,t)={Vij(x,t)}, where


Neuts (1995, ch. 5) demonstrated that local reward expectations can be expressed as




is the Laplace–Stieltjes transform of V(x, t). It is easy to see that the matrix exponential in equation (2.8) satisfies the following differential equation:


Differentiating this matrix differential equation with respect to s, exchanging the order of differentiation and evaluating both sides of the resulting equation at s=0, we arrive at the differential equation for local expectations


where E(Rw, 0) is the m×m zero matrix. Multiplication of both sides of equation (2.10) by integrating factor eQt from the right and integration with respect to t produces the solution,


Similarity between equations (2.3) and (2.11) invites calculation of the expected Markov rewards via spectral decomposition of Q,


In summary, formulae (2.4) and (2.12) provide a recipe for exact calculations of local expectations for the number of labelled transitions and rewards.

3. Assembling pieces together over a tree

(a) Notation for tree traversal

Let us label the internal nodes of τ with integers {1, …, n−1} starting from the root of the tree. Recall that we have already arbitrarily labelled the tips of τ with integers {1, …, n}. Let I be the set of internal branches and E be the set of terminal branches of τ. For each branch b[set membership]I, we denote the internal node labels of the parent and child of branch b by p(b) and c(b), respectively. We use the same notation for each terminal branch b except p(b), which is an internal node index, while c(b) is a tip index. Let i=(i1,,in1) denote the internal node trait states. Then, the complete likelihood of unobserved internal node states and the observed states at the tips of τ is

Pr(i,D)=πi1[product]b[set membership]Ipip(b)ic(b)(tb)[product]b[set membership]Epip(b)Dc(b)(tb).

We form the likelihood of the observed data by summing over all possible states of internal nodes,

Pr(D)=i1=1m(...)in1=1mπi1[product]b[set membership]Ipip(b)ic(b)(tb)[product]b[set membership]Epip(b)Dc(b)(tb).

Clearly, when data on the tips are not observed, the prior distribution of internal nodes becomes

Pr(i)=πi1[product]b[set membership]Ipip(b)ic(b)(tb).

(b) Posterior expectations of mapping summaries

Consider an arbitrary branch b* connecting parent internal node p(b*) to its child c(b*). First, we introduce restricted moments,


The expectation (3.4) integrates over all evolutionary mappings consistent with D on the tips of τ. Invoking the law of total expectation and the definition of conditional probability, we deduce

E[h({Xb*t})1D]=E[h({Xb*t)}|D]×Pr(D)=iE[h({Xb*t})|i,D]×Pr(i,D)=iE[h({Xb*t})|ip(b*),ic(b*)]πi1[product]b[set membership]Ipip(b)ic(b)(tb)[product]b[set membership]Epip(b)Dc(b)(tb)=ieip(b*)ic(b*)(h,tb*)πi1[product]b[set membership]I\{b*}pip(b)ic(b)(tb)[product]b[set membership]E\{b*}pip(b)Dc(b)(tb).

The last expression in derivation (3.5) illustrates that in order to calculate the posterior restricted moment (3.4) along branch b*[set membership]I, we merely need to replace finite-time transition probability pip(b*)ic(b*)(tb*) with local expectation eip(b*)ic(b*)(h,tb*) in the likelihood formula (3.2). Similarly, if b*[set membership]E, we substitute eip(b*)Dc(b*)(h,tb*) for pip(b*)Dc(b*)(tb*) in (3.2). Given matrices P(tb) for bb* and E(h,tb*), we can sum over internal node states using Felsenstein's pruning algorithm to arrive at the restricted mean E[h({Xb*t})1D] and then divide this quantity by Pr(D) to obtain E[h({Xb*t})|D].

This procedure is efficient for calculating the posterior expectations of mapping summaries for one branch of τ. However, in practice, we need to calculate the mapping expectations over many branches and consequently execute the computationally intensive pruning algorithm many times. A similar problem is encountered in maximum-likelihood phylogenetic estimation during differentiation of the likelihood with respect to branch lengths. An algorithm, reviewed by Schadt et al. (1998), allows for computationally efficient, repeated replacement of one of the finite-time transition probabilities with an arbitrary function of the corresponding branch length in equation (3.2). This algorithm finds informal use since the 1980s in pedigree analysis (Cannings et al. 1980) and is implemented in most maximum-likelihood phylogenetic reconstruction software such as PAUP (J. Huelsenbeck 2007, personal communication).

Let Fu=(Fu1,,Fum) be the vector of forward, often called partial or fractional, likelihoods at node u. Element Fui is the probability of the observed data at only the tips that descend from the node u, given that the state of u is i. If u is a tip, then we initialize partial likelihoods via Fui=1{i=Du}. In case of missing or ambiguous data, Du denotes the subset of possible trait states, and forward likelihoods are set to Fui=1{i[set membership]Du}. During the first, upward traversal of τ, we compute forward likelihoods for each internal node u using the recursion


where b1 and b2 are indices of the branches descending from node u and c(b1) and c(b2) are the corresponding children of u. We denote the directional likelihoods in square brackets of equation (3.6) by Sb1i and Sb2i and record Sb=(Sb1,,Sbm) together with Fu. Finally, we define backward likelihoods Gu=(Gu1,,Gum), where Gui is the probability of observing state i at node u together with other tip states on the subtree of τ obtained by removing all lineages downstream of node u. A second, downward traversal of τ yields Gu given the precomputed Sb.

For each branch b*, we can sandwich pij(tb*) among the forward, directional and backward likelihoods and write


where b′ is the second branch descending from the parent node p(b*). Therefore, with Fu, Sb and Gu precomputed for all nodes and branches of τ, we can replace pij(tb*) with any other quantity for any arbitrary branch b without repeatedly traversing τ. In particular, the posterior restricted moment for branch b* can be expressed as


In figure 1, we use an example tree to illustrate the correspondence between each quantity in sandwich formula (3.8) and the part of the tree involved in this quantity computation. We summarize all steps that lead to the computation of the global mean E[H(Mθ)|D] in algorithm 1. Note that Pr(D), needed to transition between conditional and restricted expectations in formula (3.4), is computed with virtually no additional cost in step (iv) of the algorithm.

Figure 1
Sandwich formula illustration. (a) An example phylogenetic tree in which we label internal nodes numerically and two branches b* and b′. We break this tree at nodes 3 and 4 into the subtrees shown in (b). Assuming that the trait states are i and ...
Algorithm 1
Calculating posterior expectations E[H(Mθ)|D].

(c) Pulley principle for evolutionary mappings

Suppose that we are interested in a mean mapping summary E[H(Mθ)|D], obtained as a sum of local mapping summaries over all branches of the phylogenetic tree τ. We would like to know whether quantity E[H(Mθ)|D] changes when we move the root of τ to a different location.

Recall that reversibility of the Markov chain {Xt} makes Pr(D) invariant to the root placement in τ if the root distribution π is the stationary distribution of {Xt} (Felsenstein 1981). Felsenstein's pulley principle rests on the detailed balance condition


and Chapman–Kolmogorov relationship


Applying both formulae (3.9) and (3.10) once in equation (3.2) allows one to move the root to any position along the two root branches without changing Pr(D). Therefore, Pr(D) is invariant to moving the root to any position on any branch of τ since we can repeatedly apply the detailed balance condition and the Chapman–Kolmogorov equation.

Invariance of Pr(D) to root placement together with formula (3.4) suggests that root position invariance of conditional expectations E[H(Mθ)|D] holds if and only if invariance of joint expectations E[H(Mθ)1D] is satisfied. Consider a two-tip phylogeny with branches of length t1 and t2 leading to observed trait states D1 and D2, respectively. According to formulae (1.2) and (3.5), we may expect that


depends only on the sum t1+t2. Therefore, we can move the root anywhere on this phylogeny without altering the expectations if {Xt} is reversible. It is easy to see that repeated application of derivation (3.11) readily allows for extension of the root invariance principle to n-tip phylogenies.

In derivation (3.11), we use identities




Equation (3.12) splits computation of the expected summary on interval [0, t1+t2) into calculations bound to intervals [0,t1) and [t1,t1+t2) with the help of the total expectation law and Markov property. This derivation parallels that of Chapman–Kolmogorov equation (3.10). Identity (3.13) requires more care as it does not hold for all choices of function h. Using equation (2.12), detailed balance condition (3.13) holds for h2=Rw. However, equation (2.3) suggests that we only can guarantee the detailed balanced condition (3.13) for h1=NL when (i,j)[set membership]L if and only if (j,i)[set membership]L.

(d) Prior expectations of mapping summaries

In many applications of stochastic mapping, one wishes to compare the prior to posterior expectations of summaries (Nielsen 2002). In this section, we derive the formulae necessary for computing prior expectations. Similar to our derivation of posterior expectations, we begin by considering an arbitrary branch b* and use the law of total expectation to arrive at

E[h({Xb*t})]={ieip(b*)ic(b*)(h,tb*)πi1[product]b[set membership]I\{b*}pip(b)ic(b)(tb)ifb*[set membership]Iieip(b*)(h,tb*)πi1[product]b[set membership]Ipip(b)ic(b)(tb)ifb*[set membership]E,

where ei(h,t)=j=1meij(h,t) is the marginal local expectation of the mapping summary.

Identity P(t)1=1 allows us to eliminate summation over some internal node states in formula (3.14) and consider only those internal nodes that lie on the path connecting the root of τ and c(b*). If π is the stationary distribution of {Xt}, then formula (3.14) together with identities πTP(t)=πT and P(t)1=1 simplifies prior local expectations even further,


The fact that prior local expectations at stationarity compute with virtually no additional burden has immediate practical implications. In the context of the posterior predictive model checking, researchers often need to simulate L independent and identically distributed (iid) realizations D1,,DL of data at the tips of τ and then calculate the summary-based discrepancy measure 1/Ll=1LE[h({Xt})|Dl] (Nielsen 2002). Since we know the ‘true’ model under simulation, we can approximate this discrepancy measure with the prior expectation


where D* ranges over all possible trait values at the tips of τ. In molecular evolution applications, L is of the order of 103–105, and hence approximation (3.16) should work well.

(e) Comparison with simulation-based stochastic mapping

We comment earlier that the Monte Carlo algorithm for stochastic mapping is an alternative and a very popular way to compute expectations of mapping summaries. This algorithm consists of two major steps. In the first step, the tree is traversed once to compute Fu for each node. In the second step, internal node states i are simulated conditional on D (Pagel 1999). Then, conditional on i, one simulates CTMC trajectories on each branch of the tree and computes summaries of interest. This second step is repeated N times producing a Monte Carlo sample of mapping summaries whose averages approximate the branch-specific expectations.

The running time of the algorithm depends on N and the computational efficiency of generating CTMC trajectories. The number of samples N, required for accurate estimation, varies with Q and T. Unfortunately, this aspect of stochastic mapping is largely ignored in the literature and by practitioners. Although more than one way exists to simulate trajectories conditional on starting and ending states (Rodrigue et al. 2008), rejection sampling is the most common (Nielsen 2002). Assessing the efficiency of rejection sampling is complicated, because the efficiency depends not only on the choice of CTMC parameters, but also on the observed data patterns D. We illustrate these difficulties using a CTMC on a state space of 64 codon triplets. We take two sites from an alignment of 129 HIV sequences that we discuss later in one of our examples. We call the first site slow evolving as it obtains only three different codons. The second, fast evolving site emits nine different codons. We take one-parameter slice of our Markov chain Monte Carlo (MCMC) sample and run rejection sampling to estimate the expected number of mutations for all branches of τ. For each trial, we record the total number of rejections on all branches of τ and the absolute errors of the estimated mean number of synonymous mutations summed over all branches. We summarize the results of this experiment in table 1. We see a 400× increase in the number of rejections required to simulate trajectories conditional on the fast site compared to equivalent simulations based on the slow site. The Monte Carlo error decreases as the number of Monte Carlo iterations increases, but not as fast as one would hope.

Table 1
Efficiency and accuracy of stochastic mapping. (For each number of iterations, we report the median number of rejected CTMC trajectories over the entire phylogenetic tree per iteration and the sum of absolute errors (SAE) of simulation-based estimates ...

In summary, simulation-based stochastic mapping requires simulation of CTMC trajectories; this is not a trivial computational task. Assessing accuracy of methods is cumbersome and difficult to automate. Our algorithm 1 replaces both simulation components from stochastic mapping calculations and therefore should be a preferred way of calculating expectations of mapping summaries.

4. Examples

(a) Detecting coevolution via dwelling times

In this section, we reformulate a previously developed simulation-based method for detection of correlated evolution (Huelsenbeck et al. 2003) in terms of a Markov reward process. We consider two primate evolutionary traits, oestrus advertisement (EA) and multimale mating system (MS), analysed by Pagel & Meade (2006). These authors first use cytochrome b molecular sequences to estimate the posterior distribution of the phylogenetic relationship among 60 Old World monkeys and ape species. Using 500 MCMC samples from the posterior distribution of phylogenetic trees, Pagel and Meade run another MCMC chain, this time with a reversible jump component that explores a number of CTMC models involving EA and MS traits and assess the models' posterior probabilities to learn about EA/MS coevolution. The authors find support in favour of a hypothesis stating that EA presence correlates with MS presence. The trait data are shown in figure 2 together with a phylogenetic tree, randomly chosen from the posterior sample. While this is not the case for Pagel & Meade (2006), reversible jump MCMC for model selection can be difficult to implement, especially as the number of trait states grows. Methods that simply require data fitting under the null model are warranted. Consequentially, we revisit this dataset and apply posterior predictive model diagnostics to test the hypothesis of independent evolution between EA and MS traits.

Figure 2
Primate trait data. We plot a phylogenetic tree, randomly chosen from the posterior sample, of 60 primate species. Branches of the sampled tree are not drawn to scale, nor is the tree ultrametric. Taxa names and trait values (‘0’, absence; ...

Our null model assumes that EA and MS evolve independently as 2 two-state Markov chains Xt(1), Xt(2)[set membership]{0,1}, where 0 and 1, respectively, stand for trait absence and presence. Let the infinitesimal generators of the EA and MS CTMCs be


We form a product Markov chain Yt=(Xt(1),Xt(2)) on the state space {(0,0),(0,1),(1,0),(1,1)} that keeps track of presence/absence of the two traits simultaneously assuming that they evolve independently. The generator of the product chain is obtained via the Kronecker sum ([plus sign in circle]),

Φ=QMS[plus sign in circle]QEA=((α0+β0)β0α00β1(α0+β1)0α0α10(α1+β0)β00α1β1(α1+β1)).

The Kronecker sum representation extends to general finite state-space Markov chains and to an arbitrary number of independently evolving traits (Neuts 1995). Computationally, this representation is advantageous, because eigenvalues and eigenvectors of a potentially high-dimensional product chain generator derive analytically from eigenvalues and eigenvectors of low-dimensional individual generators (Laub 2004).

To test the independent evolution model fit via posterior predictive diagnostics, we need a discrepancy measure (Meng 1994). Following Huelsenbeck et al. (2003), we employ mean dwelling times to form a discrepancy measure. Let Z=b=1Btb be the tree length of τ. We define the mean dwelling times Zi(1) and Zi(2) of traits Xt(1) and Xt(2) in state i, and the mean dwelling time Zij of the product chain Yt in state (i, j) for i, j=0, 1. More formally, we set


where w1=(1,0), w2=(0,1), w00=(1,0,0,0), w01=(0,1,0,0), w10=(0,0,1,0), w11=(0,0,0,1) and D(1), D(2) are observations of the two traits on the tips of τ.

Using the dwelling times, we define ‘expected’ ϵij and ‘observed’ ηij fractions of time the two traits spend in states i and j,


We use quotation marks because both quantities are not observed. To quantify the deviation from the null hypothesis of independence seen in the data, we introduce the discrepancy measure,


This measure returns small values under the null and implicitly depends on τ and branch lengths T. We account for this dependence and phylogenetic uncertainty by averaging our results over a finite sample from the posterior distribution of τ and T, obtained from the molecular sequence data.

We use the software package BayesTraits to accomplish this averaging and to produce a MCMC sample from the posterior distribution of QEA and QMS, assuming the null model of independent evolution of the two traits (Pagel et al. 2004). Each iteration sample from the output of BayesTraits consists of (τ,T,α0,α1,β0,β1) drawn from their posterior distribution. Given these model parameters, we generate a new dataset Drep and compute the observed Δ(QEA,QMS,D) and predicted Δ(QEA,QMS,Drep) discrepancies for each iteration. We then compare their marginal distributions by plotting their corresponding histograms (figure 3). In figure 3, we also plot the observed against predicted discrepancies to display the correlation between these two random variables. The apparent disagreement between observed and predicted discrepancies is a manifestation of poor fit of the independent model of evolution. The observed discrepancy consistently exceeds the predicted quantity. To illustrate the performance of predictive diagnostics when the independent model fits data well, we simulate trait data under this model along one of the 500 a posteriori supported phylogenetic trees. Figure 3c,d depicts the result of the posterior model diagnostics applied to the simulated data. In contrast to the observed primate data, the simulated data do not exhibit disagreement between the observed and predicted discrepancies.

Figure 3
Testing coevolution. (a,c) Plots depicting observed (white bars) and predicted (grey bars) distributions of the discrepancy measure for the (a,b) primate and (c,d) simulated data. (b,d) The scatter plots of these distributions.

Disagreement between the observed and predicted discrepancies can be quantified using a tail probability, called a posterior predictive p value,


where the tail probability is taken over the posterior distribution of the independent model. In practice, given N MCMC iterations, one estimates posterior predictive p values via


where QEA(g),QMS(g) are the parameter values realized at iteration g, and Drep.g is a dataset simulated using these parameter values. Following this recipe, we estimate ppp for the primate data and the artificial data. The disagreement between the ‘observed’ and ‘predicted’ discrepancies is reflected in a low ppp=0.0128. By contrast, the ppp=0.3139, for the simulated data, supports agreement between the observed and predicted distributions of Δ.

(b) Mapping synonymous and non-synonymous mutations

In this section, we consider the important task of mapping synonymous and non-synonymous mutations onto branches of a phylogenetic tree. Our point of departure is a recent ambitious analysis of HIV intrahost evolution by Lemey et al. (2007), who use sequence data originally reported by Shankarappa et al. (1999). The authors attempt to estimate branch-specific synonymous and non-synonymous mutation rates and then project these measurements onto a time axis. This projection enables one to relate the time evolution of selection processes with clinical covariates. Lemey et al. (2007) find fitting codon models computationally prohibitive in this case. Instead, they first fit a DNA model to the intrahost HIV sequences, obtain a posterior sample of phylogenies with branch lengths, and then use these phylogenies to fit a codon model to the same DNA sequence alignment. Instead of fitting two different models to the data, we propose to use just a DNA model and exploit mapping summaries to predict synonymous and non-synonymous mutation rates.

Suppose we observe a multiple DNA sequence alignment D=(D1,,DL) of a protein-coding region with L sites and that all C=L/3 codons are aligned to each other such that the coding region starts at site 1 of D (in other words, there is no frameshift). Following the codon partitioning model of Yang (1996), we assume that sites corresponding to all first codon positions, D1,D4,,DL2, evolve according to a standard HKY model with generator QHKY(κ1,π1), where κ1 is the transition–transversion ratio and π1 is the stationary distribution, both just for the first codon position appropriately constrained. Similarly, we define CTMC generators QHKY(κ2,π2) and QHKY(κ3,π3) for the other two codon positions with independent parameters. Assuming that all L nucleotide sites in D evolve independently together with the three codon position HKY models induces a product Markov chain model on the space of codons (AAA,AAG,,TTT), where codons are arranged in lexicographic order with respect to our nucleotide order A<G<C<T, the generator of this product CTMC is

Qcodon=QHKY(κ1,π1)[plus sign in circle]QHKY(κ2,π2)[plus sign in circle]QHKY(κ3,π3).

With this Markov chain on the codon space, we define a labelling L(s) that contains all possible pairs of codons that translate into the same amino acid. All other codon pairs are collected into a labelling set L(n). Clearly, transitions between elements of L(s) constitute synonymous mutations, and non-synonymous mutations are represented by transitions between elements of L(n). In this manner, counting processes map synonymous and non-synonymous mutations onto specific branches of τ. We consider HIV sequences from patient 1 of Shankarappa et al. (1999) and approximate the posterior distribution of our DNA model parameters Pr(Qcodon,τ,T|D) using MCMC sampling implemented in the software package BEAST (Drummond & Rambaut 2007). The serially sampled HIV sequences permit us to estimate the branch lengths T in units of clock time, months in this case. For each saved MCMC sample, we compute branch-specific rates of synonymous and non-synonymous mutations,


where we denote data at codon site c by Dc:c+2. We also record the fraction rb(n)/(rb(s)+rb(n)) of non-synonymous mutations. Similar to Lemey et al. (2007), we summarize these measurements by projecting them on the time axis. To this end, we form a finite time grid and produce a density profile of the synonymous and non-synonymous rates, and of the non-synonymous mutation fractions for each time interval between grid points (figure 4). Both synonymous and non-synonymous rate density profiles are consistently bimodal across time. Interestingly, the modes also stay appreciably constant. The density profile of the non-synonymous mutation fractions is multimodal and fairly complex. There are a considerable number of branches that exhibit strong negative ((rb(n))/(rb(s)+rb(n))~0) and positive ((rb(n))/(rb(s)+rb(n))~1) selection. In general, when comparing branches that occur earlier in the tree to those that occur later, the non-synonymous mutation fraction has first a modest upward trend through time and then descends to lower values, consistent with other patterns of evolutionary diversity reported by Shankarappa et al. (1999).

Figure 4
Time evolution of synonymous and non-synonymous rates. (a) A representative phylogeny of 129 intrahost HIV sequences. The three heat maps depict the marginal posterior densities of the (b) synonymous and (c) non-synonymous rates, and (d) the proportion ...

Intrigued by the multimodality observed in figure 4, we investigate this issue further. Lemey et al. (2007) consider several branch categories in their analysis, e.g. internal and terminal. We decide to test whether differences exist between selection forces acting on internal and terminal branches of τ. We define the fractions of non-synonymous mutations on internal and terminal branches as

ρE(Qcodon,τ,T,D)=b[set membership]Ec=1CE[NL(n)(tb)|Dc:c+2]b[set membership]Ec=1C{E[NL(s)(tb)|Dc:c+2]+E[NL(n)(tb)|Dc:c+2]}


ρI(Qcodon,τ,T,D)=b[set membership]Ic=1CE[NL(n)(tb)|Dc:c+2]b[set membership]Ic=1C{E[NL(s)(tb)|Dc:c+2]+E[NL(n)(tb)|Dc:c+2]}.

We plot their posterior histograms in figure 5. These histograms do not overlap, suggesting different fractions of non-synonymous mutations for internal and external branches. To test this hypothesis more formally, we form a discrepancy measure


As in our previous example, we compare the observed discrepancy Δ(Qcodon,τ,T,D) with the expected discrepancy Δ(Qcodon,τ,T,Drep), where Drep is a multiple sequence alignment simulated under the codon partitioning model with parameters Qcodon, τ and T. Evoking approximation (3.16) and recalling that our model assumes the same substitution rates for each branch of τ, we deduce that


for all parameter values Qcodon, T and τ and replicated data Drep. Plugging in our new discrepancy measures into equation (4.7), we find that ppp<0.001. Therefore, our posterior predictive test suggests that there is a significant heterogeneity in selective pressure among the branches of τ.

Figure 5
Bimodality of the fraction of non-synonymous mutations. We plot the predicted fraction of non-synonymous mutations computed for terminal (grey bars) and internal (white bars) branches.

5. Discussion

In this paper, we develop a computationally efficient framework for mapping evolutionary trajectories onto phylogenies. Although we aim to keep this mathematical framework fairly general, our main interest with evolutionary mappings lies in computing the mean number of labelled trait transitions and the mean evolutionary reward that depends linearly on the time a trait occupies each of its states. These two mapping summaries have been the most promising building blocks for constructing statistical tests.

We build upon our earlier work involving single branch calculations for Markov-induced counting processes (Minin & Suchard 2008). In our extension, we introduce single branch calculations for evolutionary reward processes and devise algorithms to extend single branch calculations to mapping expectations of counting and reward processes onto branches across an entire phylogeny. Our main result generalizes Felsenstein's pruning algorithm that forms the workhorse of modern phylogenetic computation. The generalized pruning algorithm warrants two comments about its efficiency for performing simulation-free stochastic mapping. First, when the infinitesimal rates are unknown, a traditionally slow component of phylogenetic inference is the eigen decomposition of their matrix. Fortunately, this decomposition finds immediate reuse in our algorithm to calculate posterior expectations of mappings. Second, the algorithm requires only two traversals of the phylogenetic tree, and is therefore at most two times slower than the standard likelihood evaluation algorithm. In practice, we find our algorithm approximately 1.5 times slower than the likelihood calculation. We achieve this advantage because during the second traversal n terminal branches are not visited. Finally, the Felsenstein's algorithm analogy yields a pulley principle for stochastic mapping and reduction in computation for prior expectations.

Our examples demonstrate how our novel algorithm facilitates phylogenetic exploratory analysis and hypothesis testing. First, we use simulation-free stochastic mapping of occupancy times to re-implement Huelsenbeck et al.'s (2003) posterior predictive test of independent evolution. In our second example, we attempt to recover synonymous and non-synonymous mutation rates without resorting to codon models and instead use an independent codon partitioning model. We overcome this gross model misspecification with stochastic mapping, find intriguing multimodality of synonymous and non-synonymous rates, and use a posterior predictive model check to test differences in selection pressures between terminal and internal branches. We stress that our predictions are only as good as the model we use. For example, the terminal/internal branch differences may be due to a general bad fit of our purposely misspecified model to the intrahost HIV data. However, we find this scenario unlikely in light of the recent demonstration of the excellent performance of codon partitioning models during analyses of protein coding regions (Shapiro et al. 2006).

Our examples illustrate importance of hypothesis testing in statistical phylogenetics. In recent years, it has become clear that an evolutionary analysis almost never ends with tree estimation. Importantly, phylogenetic inference enables evolutionary biologists to tackle scientific hypotheses, appropriately accounting for ancestry-induced correlation in observed trait values (Huelsenbeck et al. 2000; Pagel & Lutzoni 2002). Several authors demonstrate that mapping evolutionary histories onto inferred phylogenies provides a convenient and probabilistically grounded basis for designing statistically rigorous tests of evolutionary hypotheses (Nielsen 2002; Huelsenbeck et al. 2003; Dimmic et al. 2005). Unfortunately, this important statistical technique has been hampered by the high computational cost of stochastic mapping. Our general mathematical framework and fast algorithms should secure a central place for stochastic mapping in the statistical toolbox of evolutionary biologists.


We would like to thank Philippe Lemey for sharing his HIV codon alignments. We are also grateful to Andrew Meade for his help on using the BayesTraits software and Ziheng Yang and two anonymous reviewers for their constructive criticism. M.A.S. is supported by an Alfred P. Sloan Research Fellowship, John Simon Guggenheim Memorial Fellowship and NIH R01 GM086887.


One contribution of 17 to a Discussion Meeting Issue ‘Statistical and computational challenges in molecular phylogenetics and evolution’.


  • Ball F, Milne R. Simple derivations of properties of counting processes associated with Markov renewal processes. J. Appl. Prob. 2005;42:1031–1043. doi:10.1239/jap/1134587814
  • Cannings, C., Thompson, E. & Skolnick, M. 1980 Pedigree analysis of complex models. In Current developments in anthropological genetics, pp. 251–298. New York, NY: Plenum Press.
  • Dimmic M, Hubisz M, Bustamante C, Nielsen R. Detecting coevolving amino acid sites using Bayesian mutational mapping. Bioinformatics. 2005;21:i126–i135. doi:10.1093/bioinformatics/bti1032 [PubMed]
  • Drummond A, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 2007;7:214. doi:10.1186/1471-2148-7-214 [PMC free article] [PubMed]
  • Dutheil J, Pupko T, Marie A, Galtier N. A model-based approach for detecting coevolving positions in a molecule. Mol. Biol. Evol. 2005;22:1919–1928. doi:10.1093/molbev/msi183 [PubMed]
  • Felsenstein J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 1981;13:93–104. [PubMed]
  • Felsenstein J. Sinauer Associates; Sunderland, MA: 2004. Inferring phylogenies.
  • Gelman A, Meng X, Stern H. Posterior predictive assessment of model fitness via realized discrepancies. Stat. Sin. 1996;6:733–807.
  • Guindon S, Rodrigo A, Dyer K, Huelsenbeck J. Modeling the site-specific variation of selection patterns along lineages. Proc. Natl Acad. Sci. USA. 2004;101:12 957–12 962. doi:10.1073/pnas.0402177101 [PubMed]
  • Holmes I, Rubin G.M. An expectation maximization algorithm for training hidden substitution models. J. Mol. Biol. 2002;317:753–764. doi:10.1006/jmbi.2002.5405 [PubMed]
  • Huelsenbeck J, Rannala B, Masly J. Accommodating phylogenetic uncertainty in evolutionary studies. Science. 2000;288:2349–2350. doi:10.1126/science.288.5475.2349 [PubMed]
  • Huelsenbeck J, Nielsen R, Bollback J. Stochastic mapping of morphological characters. Syst. Biol. 2003;52:131–158. doi:10.1080/10635150390192780 [PubMed]
  • Lange K. Springer; New York, NY: 2004. Applied probability.
  • Laub A. SIAM; Philadelphia, PA: 2004. Matrix analysis for scientists and engineers.
  • Lemey P, Pond S.K, Drummond A, Pybus O, Shapiro B, Barroso H, Taveira N, Rambaut A. Synonymous substitution rates predict HIV disease progression as a result of underlying replication dynamics. PLoS Comput. Biol. 2007;3:e29. doi:10.1371/journal.pcbi.0030029 [PubMed]
  • Leschen R, Buckley T. Multistate characters and diet shifts: evolution of Erotylidae (Coleoptera) Syst. Biol. 2007;56:97–112. doi:10.1080/10635150701211844 [PubMed]
  • Meng X. Posterior predictive p-values. Ann. Stat. 1994;22:1142–1160. doi:10.1214/aos/1176325622
  • Minin V, Suchard M. Counting labeled transitions in continuous-time Markov models of evolution. J. Math. Biol. 2008;56:391–412. doi:10.1007/s00285-007-0120-8 [PubMed]
  • Neuts M. Chapman and Hall; London, UK: 1995. Algorithmic probability: a collection of problems.
  • Nielsen R. Mapping mutations on phylogenies. Syst. Biol. 2002;51:729–739. doi:10.1080/10635150290102393 [PubMed]
  • Pagel M. The maximum likelihood approach to reconstructing ancestral character states of discrete characters on phylogenies. Syst. Biol. 1999;48:612–622. doi:10.1080/106351599260184
  • Pagel, M. & Lutzoni, F. 2002 Accounting for phylogenetic uncertainty in comparative studies of evolution and adaptation. Biological evolution and statistical physics, pp. 148–161. Berlin, Germany: Springer.
  • Pagel M, Meade A. Bayesian analysis of correlated evolution of discrete characters by reversible-jump Markov chain Monte Carlo. Am. Nat. 2006;167:808–825. doi:10.1086/503444 [PubMed]
  • Pagel M, Meade A, Barker D. Bayesian estimation of ancestral character states on phylogenies. Syst. Biol. 2004;53:673–684. doi:10.1080/10635150490522232 [PubMed]
  • Rodrigue N, Philippe H, Lartillot N. Uniformization for sampling realizations of Markov processes: applications to Bayesian implementations of codon substitution models. Bioinformatics. 2008;24:56–62. doi:10.1093/bioinformatics/btm532 [PubMed]
  • Schadt E, Sinsheimer J, Lange K. Computational advances in maximum likelihood methods for molecular phylogeny. Genome Res. 1998;8:222–233. [PubMed]
  • Shankarappa R, et al. Consistent viral evolutionary changes associated with the progression of human immunodeficiency virus type 1 infection. J. Virol. 1999;73:10 489–10 502. [PMC free article] [PubMed]
  • Shapiro B, Rambaut A, Drummond A. Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences. Mol. Biol. Evol. 2006;23:7–9. doi:10.1093/molbev/msj021 [PubMed]
  • Yang Z. Maximum-likelihood models for combined analyses of multiple sequence data. J. Mol. Evol. 1996;42:587–596. doi:10.1007/BF02352289 [PubMed]

Articles from Philosophical Transactions of the Royal Society B: Biological Sciences are provided here courtesy of The Royal Society