
Philos Trans R Soc Lond B Biol Sci. 2008 December 27; 363(1512): 3985–3995.

Published online 2008 October 7. doi: 10.1098/rstb.2008.0176

PMCID: PMC2607419

Copyright © 2008 The Royal Society


Mapping evolutionary trajectories of discrete traits onto phylogenies receives considerable attention in evolutionary biology. Given the trait observations at the tips of a phylogenetic tree, researchers are often interested in where on the tree the trait changes its state and whether some changes are preferential in certain parts of the tree. In a model-based phylogenetic framework, such questions translate into characterizing probabilistic properties of evolutionary trajectories. Current methods of assessing these properties rely on computationally expensive simulations. In this paper, we present an efficient, simulation-free algorithm for computing two important and ubiquitous evolutionary trajectory properties. The first is the mean number of trait changes, where changes can be divided into classes of interest (e.g. synonymous/non-synonymous mutations). The mean evolutionary reward, accrued proportionally to the time a trait occupies each of its states, is the second property. To illustrate the usefulness of our results, we first employ our simulation-free stochastic mapping to execute a posterior predictive test of correlation between two evolutionary traits. We conclude by mapping synonymous and non-synonymous mutations onto branches of an HIV intrahost phylogenetic tree and comparing selection pressure on terminal and internal tree branches.

Reconstructing evolutionary histories from present-day observations is a central problem in quantitative biology. Phylogenetic estimation is one example of such reconstruction. However, phylogenetic reconstruction alone does not provide a full picture of an evolutionary history, because evolutionary paths (mappings) describing trait states along the phylogenetic tree remain hidden. Although one is rarely interested in detailed reconstruction of such mappings, certain probabilistic properties of the paths are frequently used in evolutionary hypothesis testing (Nielsen 2002; Huelsenbeck *et al*. 2003; Leschen & Buckley 2007). For example, given a tree and a Markov model of amino acid evolution, one can compute the expected number of times a transition from a hydrophobic to a hydrophilic state occurs, conditional on the observed amino acid sequence alignment. Such expectations can inform researchers about model adequacy and provide insight into the features of the evolutionary process overlooked by standard phylogenetic techniques (Dimmic *et al*. 2005).

Nielsen (2002) introduced stochastic mapping of trait states on trees and employed this new technique in a model-based evolutionary hypothesis testing context. The author starts with a discrete evolutionary trait *X* that attains *m* states. He further assumes that this trait evolves according to a continuous-time Markov chain (CTMC) model described by a parameter vector **θ**, comprising the chain's infinitesimal rates together with a phylogenetic tree *τ* and its branch lengths.

A major advantage of Nielsen's stochastic mapping framework is its ability to account for uncertainty in model parameters, including phylogenies. A major limitation is that its current implementation relies on time-consuming simulations. In describing his method for calculating $E[H({\mathit{M}}_{\mathit{\theta}})]$ and $E[H({\mathit{M}}_{\mathit{\theta}})|\mathit{D}]$, Nielsen (2002) wrote ‘In general, we can not evaluate sums in equations 5 and 6 directly, because the set [of all possible mappings] is not of finite size.’ However, the infinite number of possible mappings does not prevent one from explicitly calculating $E[H({\mathit{M}}_{\mathit{\theta}})]$ for some choices of *H*. For example, if

$$H({\mathit{M}}_{\mathit{\theta}})=\{\begin{array}{cc}1& \text{if}\phantom{\rule{0.25em}{0ex}}{\mathit{M}}_{\mathit{\theta}}\phantom{\rule{0.25em}{0ex}}\text{is}\phantom{\rule{0.25em}{0ex}}\text{consistent}\phantom{\rule{0.25em}{0ex}}\text{with}\phantom{\rule{0.25em}{0ex}}\mathit{D}\\ 0& \text{if}\phantom{\rule{0.25em}{0ex}}{\mathit{M}}_{\mathit{\theta}}\phantom{\rule{0.25em}{0ex}}\text{is}\phantom{\rule{0.25em}{0ex}}\text{inconsistent}\phantom{\rule{0.25em}{0ex}}\text{with}\phantom{\rule{0.25em}{0ex}}\mathit{D},\end{array}$$

(1.1)

then $E[H({\mathit{M}}_{\mathit{\theta}})]=\text{Pr}(\mathit{D})$, the familiar phylogenetic likelihood that can be evaluated without simulations (Felsenstein 2004). Therefore, hope remains that other choices of *H* may also permit evaluation of $E[H({\mathit{M}}_{\mathit{\theta}})]$ and $E[H({\mathit{M}}_{\mathit{\theta}})|\mathit{D}]$ without simulations.

In this paper, we consider a class of additive mapping summaries of the form

$$H({\mathit{M}}_{\mathit{\theta}})=\sum _{b\in \Omega}h(\{{X}_{bt}\}),$$

(1.2)

where $h(\{{X}_{bt}\})$ is a one-dimensional summary of the Markov chain path along branch *b* and *Ω* is an arbitrary subset of all branches of *τ*. Moreover, we restrict our attention to the two most popular choices of function *h*. Let $\mathcal{L}\subset {\{1,\dots ,m\}}^{2}$ be a set of ordered index pairs that label transitions of trait *X*. For each Markov path {*X*_{t}} and interval [0,*t*), we count the number of labelled transitions in this interval and arrive at

$${h}_{1}(\{{X}_{t}\})={N}_{\mathcal{L}},\ \text{the number of state transitions labelled by set}\ \mathcal{L},$$

(1.3)

where we omit dependence on **θ**, and

$${h}_{2}(\{{X}_{t}\})={R}_{\mathit{w}},\ \text{the evolutionary reward defined by vector}\ \mathit{w}=({w}_{1},\dots ,{w}_{m}).$$

(1.4)

To obtain dwelling times of *X* in a predefined set of trait states, we set *w*_{i}=1 if *i* belongs to the set of interest and *w*_{i}=0 otherwise.

For these two choices of function *h*, we provide an algorithm for exact, simulation-free computation of $E[H({\mathit{M}}_{\mathit{\theta}})]$ and $E[H({\mathit{M}}_{\mathit{\theta}})|\mathit{D}]$. Similar to phylogenetic likelihood calculations of Pr(**D**), this algorithm relies on the eigen decomposition of the infinitesimal generator of the trait's Markov chain.

In this section, we provide mathematical details needed for calculating expectations of stochastic mapping summaries on one branch of a phylogenetic tree. We first motivate the need for such local computations by making further analogies between calculations of Pr(**D**) and expectations of stochastic mapping summaries. The additive form of $H({\mathit{M}}_{\mathit{\theta}})$ in equation (1.2) implies that, just as likelihood calculations require only the finite-time transition probabilities

$${p}_{ij}(t)=\text{Pr}({X}_{t}=j|{X}_{0}=i),$$

(2.1)

for arbitrary branch length *t*. Similarly, to obtain $E[h(\left\{{X}_{t}\right\})]$ and $E[h(\left\{{X}_{t}\right\})|\mathit{D}]$, we require means of computing local expectations $\mathit{E}(h,t)=\{{e}_{ij}(h,t)\}$, where

$${e}_{ij}(h,t)=E[h(\left\{{X}_{t}\right\}){1}_{\{{X}_{t}=j\}}|{X}_{0}=i]$$

(2.2)

and 1_{{·}} is the indicator function. After illustrating how to compute $\mathit{E}({N}_{\mathcal{L}},t)$ and $\mathit{E}({R}_{\mathit{w}},t)$ without resorting to simulations, we provide an algorithm that efficiently propagates local expectations $\mathit{E}(h,t)$ over all branches of the tree to produce the global expectations $E[H({\mathit{M}}_{\mathit{\theta}})]$ and $E[H({\mathit{M}}_{\mathit{\theta}})|\mathit{D}]$.

Abstracting from phylogenetics, let ${N}_{\mathcal{L}}(t)$ count the number of labelled transitions of a CTMC {*X*_{t}} during time interval [0,*t*). It follows from the theory of Markov chain-induced counting processes that

$$\mathit{E}({N}_{\mathcal{L}},t)={\int}_{0}^{t}{\text{e}}^{\mathit{Q}z}{\mathit{Q}}_{\mathcal{L}}{\text{e}}^{\mathit{Q}(t-z)}\phantom{\rule{0.25em}{0ex}}\text{d}z,$$

(2.3)

where ${\mathit{Q}}_{\mathcal{L}}=\{{q}_{ij}{1}_{\{(i,j)\in \mathcal{L}\}}\}$ (Ball & Milne 2005). Since most evolutionary models are time reversible, we can safely assume that **Q** is diagonalizable with eigen decomposition $\mathit{Q}=\mathit{U}\times \text{diag}({d}_{1},\dots ,{d}_{m})\times {\mathit{U}}^{-1}$, where the columns of **U** are the eigenvectors of **Q** and ${d}_{1},\dots ,{d}_{m}$ are its eigenvalues. Substituting this decomposition into equation (2.3) and integrating yields

$$\mathit{E}({N}_{\mathcal{L}},t)=\sum _{i=1}^{m}\sum _{j=1}^{m}{\mathit{S}}_{i}{\mathit{Q}}_{\mathcal{L}}{\mathit{S}}_{j}{I}_{ij}(t),$$

(2.4)

where ${\mathit{S}}_{i}=\mathit{U}{\mathit{E}}_{i}{\mathit{U}}^{-1}$, *E*_{i} is a matrix with zero entries everywhere except at the *ii*-th entry, which is one, and

$${I}_{ij}(t)=\{\begin{array}{ll}t{\text{e}}^{{d}_{i}t}\hfill & \text{if}\phantom{\rule{0.25em}{0ex}}{d}_{i}={d}_{j},\hfill \\ \frac{{\text{e}}^{{d}_{i}t}-{\text{e}}^{{d}_{j}t}}{{d}_{i}-{d}_{j}}\hfill & \text{if}\phantom{\rule{0.25em}{0ex}}{d}_{i}\ne {d}_{j}.\hfill \end{array}$$

(2.5)
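To make the recipe concrete, the following minimal sketch (a toy two-state chain with rates of our own choosing, not code from the paper) evaluates the spectral formula (2.4) with the integral factors (2.5) and checks the result against a direct midpoint Riemann-sum approximation of the integral representation (2.3).

```python
import numpy as np

# Toy two-state chain; rates alpha, beta are hypothetical example values.
alpha, beta = 0.7, 0.3
Q = np.array([[-alpha, alpha],
              [beta, -beta]])
labels = {(0, 1)}                        # count only 0 -> 1 transitions
QL = np.array([[Q[i, j] if (i, j) in labels else 0.0 for j in range(2)]
               for i in range(2)])
t = 2.0

d, U = np.linalg.eig(Q)                  # Q = U diag(d) U^{-1}
Uinv = np.linalg.inv(U)
S = [U @ np.diag(np.eye(2)[i]) @ Uinv for i in range(2)]  # S_i = U E_i U^{-1}

def I_factor(i, j):
    # integral factors I_ij(t) of formula (2.5)
    if np.isclose(d[i], d[j]):
        return t * np.exp(d[i] * t)
    return (np.exp(d[i] * t) - np.exp(d[j] * t)) / (d[i] - d[j])

# spectral formula (2.4)
E_spec = sum(S[i] @ QL @ S[j] * I_factor(i, j)
             for i in range(2) for j in range(2))

# brute-force midpoint approximation of the integral (2.3)
def expm(A):
    dd, V = np.linalg.eig(A)
    return (V @ np.diag(np.exp(dd)) @ np.linalg.inv(V)).real

n = 2000
zs = (np.arange(n) + 0.5) * t / n
E_num = sum(expm(Q * z) @ QL @ expm(Q * (t - z)) for z in zs) * (t / n)

assert np.allclose(E_spec, E_num, atol=1e-6)
```

The spectral version touches the eigen decomposition of **Q** only once, which is what makes the tree-wide computation cheap relative to repeated numerical integration.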

For the reward process *R*_{w}(*t*), we define a matrix cumulative distribution function $\mathit{V}(x,t)=\{{V}_{ij}(x,t)\}$, where

$${V}_{ij}(x,t)=\text{Pr}({R}_{\mathit{w}}(t)\le x,\phantom{\rule{0.25em}{0ex}}{X}_{t}=j|{X}_{0}=i).$$

(2.6)

Neuts (1995, ch. 5) demonstrated that local reward expectations can be expressed as

$$\mathit{E}({R}_{\mathit{w}},t)=-{\frac{\text{d}}{\text{d}s}{\mathit{V}}^{*}(s,t)|}_{s=0},$$

(2.7)

where

$${\mathit{V}}^{*}(s,t)={\int}_{0}^{\infty}{\text{e}}^{-sx}\phantom{\rule{0.25em}{0ex}}\text{d}\mathit{V}(x,t)={\text{e}}^{[\mathit{Q}-\text{diag}({w}_{1},\dots ,{w}_{m})s]t}$$

(2.8)

is the Laplace–Stieltjes transform of $\mathit{V}(x,t)$. Differentiating equation (2.8) with respect to *t* shows that this transform satisfies

$$\frac{\text{d}}{\text{d}t}{\mathit{V}}^{*}(s,t)={\mathit{V}}^{*}(s,t)[\mathit{Q}-\text{diag}({w}_{1},\dots ,{w}_{m})s].$$

(2.9)

Differentiating this matrix differential equation with respect to *s*, exchanging the order of differentiation and evaluating both sides of the resulting equation at *s*=0, we arrive at the differential equation for local expectations

$$\frac{\text{d}}{\text{d}t}\mathit{E}({R}_{\mathit{w}},t)=\mathit{E}({R}_{\mathit{w}},t)\mathit{Q}+{\text{e}}^{\mathit{Q}t}\text{diag}({w}_{1},\dots ,{w}_{m}),$$

(2.10)

where $\mathit{E}({R}_{\mathit{w}},0)=\mathbf{0}$ provides the initial condition. Solving this linear matrix differential equation yields

$$\mathit{E}({R}_{\mathit{w}},t)={\int}_{0}^{t}{\text{e}}^{\mathit{Q}z}\text{diag}({w}_{1},\dots ,{w}_{m}){\text{e}}^{\mathit{Q}(t-z)}\phantom{\rule{0.25em}{0ex}}\text{d}z.$$

(2.11)

Similarity between equations (2.3) and (2.11) invites calculation of the expected Markov rewards via spectral decomposition of **Q**,

$$\mathit{E}({R}_{\mathit{w}},t)=\sum _{i=1}^{m}\sum _{j=1}^{m}{\mathit{S}}_{i}\text{diag}({w}_{1},\dots ,{w}_{m}){\mathit{S}}_{j}{I}_{ij}(t).$$

(2.12)

In summary, formulae (2.4) and (2.12) provide a recipe for exact calculations of local expectations for the number of labelled transitions and rewards.
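As a quick sanity check on formula (2.12), note that with reward vector *w*=(1,…,1) the accrued reward is the total dwelling time, which equals *t* with probability one, so $\mathit{E}({R}_{\mathit{w}},t)$ must reduce to $t\,{\text{e}}^{\mathit{Q}t}$ entrywise. The sketch below (toy two-state chain, our own example values) verifies this.

```python
import numpy as np

# Toy two-state generator; rates are hypothetical example values.
Q = np.array([[-0.7, 0.7],
              [0.3, -0.3]])
t = 1.5
d, U = np.linalg.eig(Q)
Uinv = np.linalg.inv(U)
S = [U @ np.diag(np.eye(2)[i]) @ Uinv for i in range(2)]

def I_factor(i, j):
    # integral factors (2.5)
    if np.isclose(d[i], d[j]):
        return t * np.exp(d[i] * t)
    return (np.exp(d[i] * t) - np.exp(d[j] * t)) / (d[i] - d[j])

def reward_expectation(w):
    # formula (2.12): E(R_w, t) = sum_ij S_i diag(w) S_j I_ij(t)
    W = np.diag(w)
    return sum(S[i] @ W @ S[j] * I_factor(i, j)
               for i in range(2) for j in range(2))

Pt = U @ np.diag(np.exp(d * t)) @ Uinv           # exp(Qt)
assert np.allclose(reward_expectation([1.0, 1.0]), t * Pt)
# dwelling times in the two states also add up to t * exp(Qt)
total = reward_expectation([1.0, 0.0]) + reward_expectation([0.0, 1.0])
assert np.allclose(total, t * Pt)
```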

Let us label the internal nodes of *τ* with integers {1,…,*n*−1} starting from the root of the tree. Recall that we have already arbitrarily labelled the tips of *τ* with integers {1,…,*n*}. Let $\mathcal{I}$ be the set of internal branches and $\mathcal{E}$ be the set of terminal branches of *τ*. For each branch $b\in \mathcal{I}$, we denote the internal node labels of the parent and child of branch *b* by *p*(*b*) and *c*(*b*), respectively. We use the same notation for each terminal branch *b* except that, while *p*(*b*) remains an internal node index, *c*(*b*) is a tip index. Let $\mathit{i}=({i}_{1},\dots ,{i}_{n-1})$ denote the internal node trait states. Then, the complete likelihood of unobserved internal node states and the observed states at the tips of *τ* is

$$\text{Pr}(\mathit{i},\mathit{D})={\pi}_{{i}_{1}}\prod _{b\in \mathcal{I}}{p}_{{i}_{p(b)}{i}_{c(b)}}({t}_{b})\prod _{b\in \mathcal{E}}{p}_{{i}_{p(b)}{D}_{c(b)}}({t}_{b}).$$

(3.1)

We form the likelihood of the observed data by summing over all possible states of internal nodes,

$$\text{Pr}(\mathit{D})=\sum _{{i}_{1}=1}^{m}\cdots \sum _{{i}_{n-1}=1}^{m}{\pi}_{{i}_{1}}\prod _{b\in \mathcal{I}}{p}_{{i}_{p(b)}{i}_{c(b)}}({t}_{b})\prod _{b\in \mathcal{E}}{p}_{{i}_{p(b)}{D}_{c(b)}}({t}_{b}).$$

(3.2)

Clearly, when data on the tips are not observed, the prior distribution of internal nodes becomes

$$\text{Pr}(\mathit{i})={\pi}_{{i}_{1}}\prod _{b\in \mathcal{I}}{p}_{{i}_{p(b)}{i}_{c(b)}}({t}_{b}).$$

(3.3)

Consider an arbitrary branch *b*^{*} connecting parent internal node *p*(*b*^{*}) to its child *c*(*b*^{*}). First, we introduce restricted moments,

$$E[h(\left\{{X}_{{b}^{*}t}\right\}){1}_{\mathit{D}}]=E[h(\left\{{X}_{{b}^{*}t}\right\})|\mathit{D}]\times \text{Pr}(\mathit{D}).$$

(3.4)

The expectation (3.4) integrates over all evolutionary mappings consistent with **D** at the tips of *τ*. Applying the law of total expectation and the Markov property, we expand the restricted moment as

$$\begin{array}{rl}E[h(\{{X}_{{b}^{*}t}\}){1}_{\mathit{D}}]&=E[h(\{{X}_{{b}^{*}t}\})|\mathit{D}]\times \text{Pr}(\mathit{D})=\sum _{\mathit{i}}E[h(\{{X}_{{b}^{*}t}\})|\mathit{i},\mathit{D}]\times \text{Pr}(\mathit{i},\mathit{D})\\ &=\sum _{\mathit{i}}E[h(\{{X}_{{b}^{*}t}\})|{i}_{p({b}^{*})},{i}_{c({b}^{*})}]\,{\pi}_{{i}_{1}}\prod _{b\in \mathcal{I}}{p}_{{i}_{p(b)}{i}_{c(b)}}({t}_{b})\prod _{b\in \mathcal{E}}{p}_{{i}_{p(b)}{D}_{c(b)}}({t}_{b})\\ &=\sum _{\mathit{i}}{e}_{{i}_{p({b}^{*})}{i}_{c({b}^{*})}}(h,{t}_{{b}^{*}})\,{\pi}_{{i}_{1}}\prod _{b\in \mathcal{I}\backslash \{{b}^{*}\}}{p}_{{i}_{p(b)}{i}_{c(b)}}({t}_{b})\prod _{b\in \mathcal{E}\backslash \{{b}^{*}\}}{p}_{{i}_{p(b)}{D}_{c(b)}}({t}_{b}).\end{array}$$

(3.5)

The last expression in derivation (3.5) illustrates that in order to calculate the posterior restricted moment (3.4) along branch ${b}^{*}\in \mathcal{I}$, we merely need to replace finite-time transition probability ${p}_{{i}_{p({b}^{*})}{i}_{c({b}^{*})}}({t}_{{b}^{*}})$ with local expectation ${e}_{{i}_{p({b}^{*})}{i}_{c({b}^{*})}}(h,{t}_{{b}^{*}})$ in the likelihood formula (3.2). Similarly, if ${b}^{*}\in \mathcal{E}$, we substitute ${e}_{{i}_{p({b}^{*})}{D}_{c({b}^{*})}}(h,{t}_{{b}^{*}})$ for ${p}_{{i}_{p({b}^{*})}{D}_{c({b}^{*})}}({t}_{{b}^{*}})$ in (3.2). Given matrices $\mathit{P}({t}_{b})$ for $b\ne {b}^{*}$ and $\mathit{E}(h,{t}_{{b}^{*}})$, we can sum over internal node states using Felsenstein's pruning algorithm to arrive at the restricted mean $E[h(\left\{{X}_{{b}^{*}t}\right\}){1}_{\mathit{D}}]$ and then divide this quantity by Pr(**D**) to obtain $E[h(\left\{{X}_{{b}^{*}t}\right\})|\mathit{D}]$.

This procedure is efficient for calculating the posterior expectations of mapping summaries for one branch of *τ*. However, in practice, we need to calculate the mapping expectations over many branches and consequently execute the computationally intensive pruning algorithm many times. A similar problem is encountered in maximum-likelihood phylogenetic estimation during differentiation of the likelihood with respect to branch lengths. An algorithm, reviewed by Schadt *et al*. (1998), allows for computationally efficient, repeated replacement of one of the finite-time transition probabilities with an arbitrary function of the corresponding branch length in equation (3.2). This algorithm has found informal use since the 1980s in pedigree analysis (Cannings *et al*. 1980) and is implemented in most maximum-likelihood phylogenetic reconstruction software such as PAUP (J. Huelsenbeck 2007, personal communication).

Let ${\mathit{F}}_{u}=({F}_{u1},\dots ,{F}_{um})$ be the vector of forward, often called partial or fractional, likelihoods at node *u*. Element *F*_{ui} is the probability of the observed data at only the tips that descend from the node *u*, given that the state of *u* is *i*. If *u* is a tip, then we initialize partial likelihoods via ${F}_{ui}={1}_{\{i={D}_{u}\}}$. In the case of missing or ambiguous data, *D*_{u} denotes the subset of possible trait states, and forward likelihoods are set to ${F}_{ui}={1}_{\{i\in {D}_{u}\}}$. During the first, upward traversal of *τ*, we compute forward likelihoods for each internal node *u* using the recursion

$${F}_{ui}=\left[\sum _{j=1}^{m}{F}_{c({b}_{1})j}{p}_{ij}({t}_{{b}_{1}})\right]\times \left[\sum _{j=1}^{m}{F}_{c({b}_{2})j}{p}_{ij}({t}_{{b}_{2}})\right],$$

(3.6)

where *b*_{1} and *b*_{2} are indices of the branches descending from node *u* and *c*(*b*_{1}) and c(*b*_{2}) are the corresponding children of *u*. We denote the directional likelihoods in square brackets of equation (3.6) by ${S}_{{b}_{1}i}$ and ${S}_{{b}_{2}i}$ and record ${\mathit{S}}_{b}=({S}_{b1},\dots ,{S}_{bm})$ together with *F*_{u}. Finally, we define backward likelihoods ${\mathit{G}}_{u}=({G}_{u1},\dots ,{G}_{um})$, where *G*_{ui} is the probability of observing state *i* at node *u* together with other tip states on the subtree of *τ* obtained by removing all lineages downstream of node *u*. A second, downward traversal of *τ* yields *G*_{u} given the precomputed *S*_{b}.
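The upward pass can be illustrated on a small example. The sketch below (a hypothetical three-tip tree with made-up branch lengths and a toy two-state chain; none of these values come from the paper) computes Pr(**D**) by the recursion (3.6) and checks it against the brute-force sum (3.2) over internal node states.

```python
import numpy as np

# Tree (our own example): root r has children internal node v (branch t_rv)
# and tip C (branch t_rC); node v has children tips A (t_vA) and B (t_vB).
Q = np.array([[-0.7, 0.7],
              [0.3, -0.3]])
pi = np.array([0.3, 0.7])                 # stationary distribution of Q

def expm(A):
    d, U = np.linalg.eig(A)
    return (U @ np.diag(np.exp(d)) @ np.linalg.inv(U)).real

t_rv, t_rC, t_vA, t_vB = 0.5, 1.0, 0.4, 0.9
DA, DB, DC = 0, 1, 0                      # observed tip states

P_rv, P_rC, P_vA, P_vB = (expm(Q * t) for t in (t_rv, t_rC, t_vA, t_vB))

# upward pass: forward likelihoods at v, then at the root (recursion (3.6))
F_v = P_vA[:, DA] * P_vB[:, DB]
F_r = (P_rv @ F_v) * P_rC[:, DC]
prD = pi @ F_r

# brute-force sum over the states of the two internal nodes, equation (3.2)
brute = sum(pi[ir] * P_rv[ir, iv] * P_vA[iv, DA] * P_vB[iv, DB] * P_rC[ir, DC]
            for ir in range(2) for iv in range(2))
assert np.isclose(prD, brute)
```

The pruning recursion trades the exponential sum over all internal state configurations for one pass over the nodes, which is what makes the sandwich substitution below cheap.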

For each branch *b*^{*}, we can sandwich ${p}_{ij}({t}_{{b}^{*}})$ among the forward, directional and backward likelihoods and write

$$\text{Pr}(\mathit{D})=\sum _{i=1}^{m}\sum _{j=1}^{m}{G}_{p({b}^{*})i}{S}_{{b}^{\prime}i}{p}_{ij}({t}_{{b}^{*}}){F}_{c({b}^{*})j},$$

(3.7)

where *b*′ is the second branch descending from the parent node *p*(*b*^{*}). Therefore, with *F*_{u}, *S*_{b} and *G*_{u} precomputed for all nodes and branches of *τ*, we can replace ${p}_{ij}({t}_{{b}^{*}})$ with any other quantity on any branch *b*^{*} without repeatedly traversing *τ*. In particular, the posterior restricted moment for branch *b*^{*} can be expressed as

$$E[h(\left\{{X}_{{b}^{*}t}\right\}){1}_{\mathit{D}}]=\sum _{i=1}^{m}\sum _{j=1}^{m}{G}_{p({b}^{*})i}{S}_{{b}^{\prime}i}{e}_{ij}(h,{t}_{{b}^{*}}){F}_{c({b}^{*})j}.$$

(3.8)

In figure 1, we use an example tree to illustrate the correspondence between each quantity in sandwich formula (3.8) and the part of the tree involved in computing this quantity. We summarize all steps that lead to the computation of the global mean $E[H({\mathit{M}}_{\mathit{\theta}})|\mathit{D}]$ in algorithm 1. Note that Pr(**D**), needed to transition between conditional and restricted expectations in formula (3.4), is computed at virtually no additional cost in step (iv) of the algorithm.

Figure 1. Sandwich formula illustration. (*a*) An example phylogenetic tree in which we label internal nodes numerically and two branches *b*^{*} and *b*′. We break this tree at nodes 3 and 4 into the subtrees shown in (*b*). Assuming that the trait states are *i* and **...**

Suppose that we are interested in a mean mapping summary $E[H({\mathit{M}}_{\theta})|\mathit{D}]$, obtained as a sum of local mapping summaries over *all branches* of the phylogenetic tree *τ*. We would like to know whether quantity $E[H({\mathit{M}}_{\mathit{\theta}})|\mathit{D}]$ changes when we move the root of *τ* to a different location.

Recall that reversibility of the Markov chain {*X*_{t}} makes Pr(**D**) invariant to the placement of the root in *τ*. This invariance rests on the detailed balance condition

$${\pi}_{i}{p}_{ij}(t)={\pi}_{j}{p}_{ji}(t)$$

(3.9)

and Chapman–Kolmogorov relationship

$${p}_{ij}({t}_{1}+{t}_{2})=\sum _{k=1}^{m}{p}_{ik}({t}_{1}){p}_{kj}({t}_{2}).$$

(3.10)

Applying both formulae (3.9) and (3.10) once in equation (3.2) allows one to move the root to any position along the two root branches without changing Pr(**D**). Therefore, Pr(**D**) remains the same for any placement of the root on *τ*.

Invariance of Pr(**D**) to root placement together with formula (3.4) suggests that root position invariance of conditional expectations $E[H({\mathit{M}}_{\mathit{\theta}})|\mathit{D}]$ holds if and only if invariance of joint expectations $E[H({\mathit{M}}_{\mathit{\theta}}){1}_{\mathit{D}}]$ is satisfied. Consider a two-tip phylogeny with branches of lengths *t*_{1} and *t*_{2} and trait values *D*_{1} and *D*_{2} observed at the tips. The joint expectation,

$$\begin{array}{rl}E[H({\mathit{M}}_{\mathit{\theta}}){1}_{\mathit{D}}]&=\sum _{k=1}^{m}{\pi}_{k}{e}_{k{D}_{1}}(h,{t}_{1}){p}_{k{D}_{2}}({t}_{2})+\sum _{k=1}^{m}{\pi}_{k}{e}_{k{D}_{2}}(h,{t}_{2}){p}_{k{D}_{1}}({t}_{1})\\ &=\sum _{k=1}^{m}{\pi}_{{D}_{1}}{e}_{{D}_{1}k}(h,{t}_{1}){p}_{k{D}_{2}}({t}_{2})+\sum _{k=1}^{m}{\pi}_{{D}_{1}}{e}_{k{D}_{2}}(h,{t}_{2}){p}_{{D}_{1}k}({t}_{1})\\ &={\pi}_{{D}_{1}}{e}_{{D}_{1}{D}_{2}}(h,{t}_{1}+{t}_{2}),\end{array}$$

(3.11)

depends only on the sum *t*_{1}+*t*_{2}. Therefore, we can move the root anywhere on this phylogeny without altering the expectations if {*X*_{t}} is reversible. It is easy to see that repeated application of derivation (3.11) readily allows for extension of the root invariance principle to *n*-tip phylogenies.
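This invariance is easy to check numerically. The sketch below (toy reversible two-state chain; rates, branch lengths and tip states are our own example values) evaluates both sides of derivation (3.11) for a dwelling-time reward, computing the local expectations $e_{ij}(h,t)$ by a midpoint Riemann sum of the integral (2.11).

```python
import numpy as np

# Toy reversible two-state chain (hypothetical rates).
Q = np.array([[-0.7, 0.7],
              [0.3, -0.3]])
pi = np.array([0.3, 0.7])                 # stationary distribution
W = np.diag([1.0, 0.0])                   # reward: dwelling time in state 0

def expm(A):
    d, U = np.linalg.eig(A)
    return (U @ np.diag(np.exp(d)) @ np.linalg.inv(U)).real

def E_R(t, n=4000):
    # midpoint Riemann-sum approximation of the integral (2.11)
    zs = (np.arange(n) + 0.5) * t / n
    return sum(expm(Q * z) @ W @ expm(Q * (t - z)) for z in zs) * (t / n)

t1, t2, D1, D2 = 0.8, 1.3, 0, 1           # branch lengths and tip states
e1, e2 = E_R(t1), E_R(t2)
P1, P2 = expm(Q * t1), expm(Q * t2)

# root placed between the two tips: first line of (3.11)
lhs = sum(pi[k] * (e1[k, D1] * P2[k, D2] + e2[k, D2] * P1[k, D1])
          for k in range(2))
# root moved onto tip 1: last line of (3.11)
rhs = pi[D1] * E_R(t1 + t2)[D1, D2]
assert np.isclose(lhs, rhs, atol=1e-6)
```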

In derivation (3.11), we use identities

$${e}_{ij}(h,{t}_{1}+{t}_{2})=\sum _{k=1}^{m}[{e}_{ik}(h,{t}_{1}){p}_{kj}({t}_{2})+{e}_{kj}(h,{t}_{2}){p}_{ik}({t}_{1})]$$

(3.12)

and

$${\pi}_{i}{e}_{ij}(h,t)={\pi}_{j}{e}_{ji}(h,t).$$

(3.13)

Equation (3.12) splits computation of the expected summary on interval [0,*t*_{1}+*t*_{2}) into calculations bound to intervals [0,*t*_{1}) and [*t*_{1},*t*_{1}+*t*_{2}) with the help of the law of total expectation and the Markov property. This derivation parallels that of the Chapman–Kolmogorov equation (3.10). Identity (3.13) requires more care, as it *does not hold* for all choices of function *h*. Using equation (2.12), one can show that the detailed balance condition (3.13) holds for *h*_{2}=*R*_{w}. However, equation (2.3) suggests that we can guarantee the detailed balance condition (3.13) for ${h}_{1}={N}_{\mathcal{L}}$ only when $(i,j)\in \mathcal{L}$ if and only if $(j,i)\in \mathcal{L}$.
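The caveat about label sets can be seen numerically. The sketch below (toy reversible two-state chain, our own example rates) confirms that detailed balance (3.13) holds for the symmetric label set {(0,1),(1,0)} but fails for the asymmetric set {(0,1)}, with local expectations computed by a midpoint Riemann sum of the integral (2.3).

```python
import numpy as np

# Toy reversible two-state chain (hypothetical rates).
Q = np.array([[-0.7, 0.7],
              [0.3, -0.3]])
pi = np.array([0.3, 0.7])                 # stationary distribution
t = 1.2

def expm(A):
    d, U = np.linalg.eig(A)
    return (U @ np.diag(np.exp(d)) @ np.linalg.inv(U)).real

def E_N(labels, n=4000):
    # midpoint Riemann-sum approximation of the integral (2.3)
    QL = np.array([[Q[i, j] if (i, j) in labels else 0.0 for j in range(2)]
                   for i in range(2)])
    zs = (np.arange(n) + 0.5) * t / n
    return sum(expm(Q * z) @ QL @ expm(Q * (t - z)) for z in zs) * (t / n)

E_sym = E_N({(0, 1), (1, 0)})             # symmetric label set
E_asym = E_N({(0, 1)})                    # asymmetric label set

# detailed balance pi_i e_ij = pi_j e_ji: holds only in the symmetric case
assert np.isclose(pi[0] * E_sym[0, 1], pi[1] * E_sym[1, 0])
assert not np.isclose(pi[0] * E_asym[0, 1], pi[1] * E_asym[1, 0])
```

Intuitively, reversing a path from *i* to *j* swaps each 0→1 jump for a 1→0 jump, so only reversal-symmetric label sets preserve the count.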

In many applications of stochastic mapping, one wishes to compare the prior to posterior expectations of summaries (Nielsen 2002). In this section, we derive the formulae necessary for computing prior expectations. Similar to our derivation of posterior expectations, we begin by considering an arbitrary branch *b*^{*} and use the law of total expectation to arrive at

$$E[h(\{{X}_{{b}^{*}t}\})]=\{\begin{array}{ll}\sum _{\mathit{i}}{e}_{{i}_{p({b}^{*})}{i}_{c({b}^{*})}}(h,{t}_{{b}^{*}}){\pi}_{{i}_{1}}\prod _{b\in \mathcal{I}\backslash \{{b}^{*}\}}{p}_{{i}_{p(b)}{i}_{c(b)}}({t}_{b})& \text{if}\phantom{\rule{0.25em}{0ex}}{b}^{*}\in \mathcal{I},\\ \sum _{\mathit{i}}{e}_{{i}_{p({b}^{*})}}(h,{t}_{{b}^{*}}){\pi}_{{i}_{1}}\prod _{b\in \mathcal{I}}{p}_{{i}_{p(b)}{i}_{c(b)}}({t}_{b})& \text{if}\phantom{\rule{0.25em}{0ex}}{b}^{*}\in \mathcal{E},\end{array}$$

(3.14)

where ${e}_{i}(h,t)={\sum}_{j=1}^{m}{e}_{ij}(h,t)$ is the marginal local expectation of the mapping summary.

Identity $\mathit{P}(t)\mathbf{1}=\mathbf{1}$ allows us to eliminate summation over some internal node states in formula (3.14) and consider only those internal nodes that lie on the path connecting the root of *τ* and *c*(*b*^{*}). If *π* is the stationary distribution of {*X*_{t}}, then formula (3.14) together with identities ${\pi}^{T}\mathit{P}(t)={\pi}^{T}$ and $\mathit{P}(t)\mathbf{1}=\mathbf{1}$ simplifies prior local expectations even further,

$$E[h(\left\{{X}_{{b}^{*}t}\right\})]=\sum _{i}\sum _{j}{\pi}_{i}{e}_{ij}(h,{t}_{{b}^{*}})={\pi}^{T}\mathit{E}(h,{t}_{{b}^{*}})\mathbf{1}=\{\begin{array}{cc}{\pi}^{T}{\mathit{Q}}_{\mathcal{L}}\mathbf{1}{t}_{{b}^{*}}& \text{if}\phantom{\rule{0.25em}{0ex}}h={N}_{\mathcal{L}}\\ \sum _{i=1}^{m}{\pi}_{i}{w}_{i}{t}_{{b}^{*}}& \text{if}\phantom{\rule{0.25em}{0ex}}h={R}_{\mathit{w}}\end{array}$$

(3.15)
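The labelled-transition case of (3.15) is a one-line identity: since ${\pi}^{T}{\text{e}}^{\mathit{Q}z}={\pi}^{T}$ and ${\text{e}}^{\mathit{Q}(t-z)}\mathbf{1}=\mathbf{1}$, the integrand of (2.3) collapses to the constant ${\pi}^{T}{\mathit{Q}}_{\mathcal{L}}\mathbf{1}$. The sketch below (toy two-state chain at stationarity, our own rates) verifies this numerically.

```python
import numpy as np

# Toy two-state chain (hypothetical rates alpha = 0.7, beta = 0.3).
Q = np.array([[-0.7, 0.7],
              [0.3, -0.3]])
pi = np.array([0.3, 0.7])                 # stationary: pi Q = 0
QL = np.array([[0.0, 0.7],
               [0.0, 0.0]])               # label only 0 -> 1 jumps
t = 2.5

def expm(A):
    d, U = np.linalg.eig(A)
    return (U @ np.diag(np.exp(d)) @ np.linalg.inv(U)).real

# midpoint Riemann-sum approximation of E(N_L, t), integral (2.3)
n = 4000
zs = (np.arange(n) + 0.5) * t / n
E_N = sum(expm(Q * z) @ QL @ expm(Q * (t - z)) for z in zs) * (t / n)

# first case of formula (3.15): pi^T Q_L 1 * t  (= pi_0 * alpha * t here)
closed_form = pi @ QL @ np.ones(2) * t
assert np.isclose(pi @ E_N @ np.ones(2), closed_form, atol=1e-6)
```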

The fact that prior local expectations at stationarity can be computed with virtually no additional burden has immediate practical implications. In the context of posterior predictive model checking, researchers often need to simulate *L* independent and identically distributed (iid) realizations ${\mathit{D}}_{1},\dots ,{\mathit{D}}_{L}$ of data at the tips of *τ* and then calculate the summary-based discrepancy measure $1/L{\sum}_{l=1}^{L}E[h(\left\{{X}_{t}\right\})|{\mathit{D}}_{l}]$ (Nielsen 2002). Since we know the ‘true’ model under simulation, we can approximate this discrepancy measure with the prior expectation

$$\frac{1}{L}\sum _{l=1}^{L}E[h(\left\{{X}_{t}\right\})|{\mathit{D}}_{l}]\approx \sum _{{\mathit{D}}^{*}}E[h(\left\{{X}_{t}\right\})|{\mathit{D}}^{*}]\text{Pr}({\mathit{D}}^{*})=E[h(\left\{{X}_{t}\right\})],$$

(3.16)

where *D*^{*} ranges over all possible trait values at the tips of *τ*. In molecular evolution applications, *L* is of the order of 10^{3}–10^{5}, and hence approximation (3.16) should work well.

We commented earlier that the Monte Carlo algorithm for stochastic mapping is an alternative and very popular way to compute expectations of mapping summaries. This algorithm consists of two major steps. In the first step, the tree is traversed once to compute *F*_{u} for each node. In the second step, internal node states **i** are simulated conditional on the observed tip data **D**; CTMC trajectories are then simulated along each branch conditional on the sampled states at the branch ends, and mapping summaries are averaged over the *N* simulated mappings.

The running time of the algorithm depends on *N* and the computational efficiency of generating CTMC trajectories. The number of samples *N*, required for accurate estimation, varies with **Q** and branch lengths **T**.

Efficiency and accuracy of stochastic mapping. (For each number of iterations, we report the median number of rejected CTMC trajectories over the entire phylogenetic tree per iteration and the sum of absolute errors (SAE) of simulation-based estimates **...**

In summary, simulation-based stochastic mapping requires simulation of CTMC trajectories, which is not a trivial computational task. Assessing the accuracy of simulation-based estimates is cumbersome and difficult to automate. Our algorithm 1 removes both simulation components from stochastic mapping calculations and therefore should be the preferred way of calculating expectations of mapping summaries.

In this section, we reformulate a previously developed simulation-based method for detection of correlated evolution (Huelsenbeck *et al*. 2003) in terms of a Markov reward process. We consider two primate evolutionary traits, oestrus advertisement (EA) and multimale mating system (MS), analysed by Pagel & Meade (2006). These authors first use cytochrome *b* molecular sequences to estimate the posterior distribution of the phylogenetic relationship among 60 Old World monkey and ape species. Using 500 MCMC samples from the posterior distribution of phylogenetic trees, Pagel and Meade run another MCMC chain, this time with a reversible jump component that explores a number of CTMC models involving the EA and MS traits, and assess the models' posterior probabilities to learn about EA/MS coevolution. The authors find support in favour of a hypothesis stating that EA presence correlates with MS presence. The trait data are shown in figure 2 together with a phylogenetic tree, randomly chosen from the posterior sample. While this is not the case for Pagel & Meade (2006), reversible jump MCMC for model selection can be difficult to implement, especially as the number of trait states grows. Methods that simply require data fitting under the null model are warranted. Consequently, we revisit this dataset and apply posterior predictive model diagnostics to test the hypothesis of independent evolution between EA and MS traits.

Figure 2. Primate trait data. We plot a phylogenetic tree, randomly chosen from the posterior sample, of 60 primate species. Branches of the sampled tree are not drawn to scale, nor is the tree ultrametric. Taxa names and trait values (‘0’, absence; **...**

Our null model assumes that EA and MS evolve independently as two two-state Markov chains ${X}_{t}^{(1)},{X}_{t}^{(2)}\in \{0,1\}$, where 0 and 1, respectively, stand for trait absence and presence. Let the infinitesimal generators of the EA and MS CTMCs be

$${\mathit{Q}}_{\text{EA}}=\left(\begin{array}{cc}-{\alpha}_{0}& {\alpha}_{0}\\ {\alpha}_{1}& -{\alpha}_{1}\end{array}\right)\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{\mathit{Q}}_{\text{MS}}=\left(\begin{array}{cc}-{\beta}_{0}& {\beta}_{0}\\ {\beta}_{1}& -{\beta}_{1}\end{array}\right).$$

(4.1)

We form a product Markov chain ${Y}_{t}=\left({X}_{t}^{(1)},{X}_{t}^{(2)}\right)$ on the state space $\{(0,0),(0,1),(1,0),(1,1)\}$ that keeps track of presence/absence of the two traits simultaneously, assuming that they evolve independently. The generator of the product chain is obtained via the Kronecker sum ($\oplus$),

$$\mathit{\Phi}={\mathit{Q}}_{\text{MS}}\oplus {\mathit{Q}}_{\text{EA}}=\left(\begin{array}{cccc}-({\alpha}_{0}+{\beta}_{0})& {\beta}_{0}& {\alpha}_{0}& 0\\ {\beta}_{1}& -({\alpha}_{0}+{\beta}_{1})& 0& {\alpha}_{0}\\ {\alpha}_{1}& 0& -({\alpha}_{1}+{\beta}_{0})& {\beta}_{0}\\ 0& {\alpha}_{1}& {\beta}_{1}& -({\alpha}_{1}+{\beta}_{1})\end{array}\right).$$

(4.2)

The Kronecker sum representation extends to general finite state-space Markov chains and to an arbitrary number of independently evolving traits (Neuts 1995). Computationally, this representation is advantageous, because eigenvalues and eigenvectors of a potentially high-dimensional product chain generator derive analytically from eigenvalues and eigenvectors of low-dimensional individual generators (Laub 2004).
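The following sketch (hypothetical rate values of our own choosing) assembles the product generator with `np.kron`, matches it against the explicit matrix displayed in (4.2), and confirms the eigenvalue property: every eigenvalue of the product generator is a pairwise sum of factor-chain eigenvalues.

```python
import numpy as np

# Hypothetical EA/MS rates for illustration only.
a0, a1, b0, b1 = 0.4, 0.9, 0.2, 0.6
Q_EA = np.array([[-a0, a0], [a1, -a1]])
Q_MS = np.array([[-b0, b0], [b1, -b1]])
I2 = np.eye(2)

# With the state ordering {(0,0),(0,1),(1,0),(1,1)} of (4.2), the Kronecker
# sum can be realized as kron(Q_EA, I) + kron(I, Q_MS).
Phi = np.kron(Q_EA, I2) + np.kron(I2, Q_MS)

# matches the explicit 4x4 matrix displayed in (4.2)
expected = np.array([
    [-(a0 + b0), b0, a0, 0.0],
    [b1, -(a0 + b1), 0.0, a0],
    [a1, 0.0, -(a1 + b0), b0],
    [0.0, a1, b1, -(a1 + b1)],
])
assert np.allclose(Phi, expected)

# eigenvalues of Phi are all pairwise sums of the factor eigenvalues
pair_sums = sorted(d1 + d2 for d1 in np.linalg.eigvals(Q_EA)
                   for d2 in np.linalg.eigvals(Q_MS))
assert np.allclose(sorted(np.linalg.eigvals(Phi).real), pair_sums)
```

The eigenvalue property is what makes the spectral formulae of §2 cheap for product chains: the decomposition of a potentially large $\mathit{\Phi}$ never has to be computed directly.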

To test the independent evolution model fit via posterior predictive diagnostics, we need a discrepancy measure (Meng 1994). Following Huelsenbeck *et al*. (2003), we employ mean dwelling times to form a discrepancy measure. Let $Z={\sum}_{b=1}^{B}{t}_{b}$ be the tree length of *τ*. We define the mean dwelling times ${Z}_{i}^{(1)}$ and ${Z}_{i}^{(2)}$ of traits ${X}_{t}^{(1)}$ and ${X}_{t}^{(2)}$ in state *i*, and the mean dwelling time *Z*_{ij} of the product chain *Y*_{t} in state (*i*,*j*) for *i*, *j*=0, 1. More formally, we set

$${Z}_{i}^{(1)}=E\left[{R}_{{\mathit{w}}_{i}}|{\mathit{D}}^{(1)}\right],\phantom{\rule{1em}{0ex}}{Z}_{i}^{(2)}=E\left[{R}_{{\mathit{w}}_{i}}|{\mathit{D}}^{(2)}\right]\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{Z}_{ij}=E\left[{R}_{{\mathit{w}}_{ij}}|{\mathit{D}}^{(1)},{\mathit{D}}^{(2)}\right],$$

(4.3)

where ${\mathit{w}}_{1}=(1,0)$, ${\mathit{w}}_{2}=(0,1)$, ${\mathit{w}}_{00}=(1,0,0,0)$, ${\mathit{w}}_{01}=(0,1,0,0)$, ${\mathit{w}}_{10}=(0,0,1,0)$, ${\mathit{w}}_{11}=(0,0,0,1)$ and *D*^{(1)}, *D*^{(2)} are observations of the two traits on the tips of *τ*.

Using the dwelling times, we define ‘expected’ *ϵ*_{ij} and ‘observed’ *η*_{ij} fractions of time the two traits spend in states *i* and *j*,

$${\u03f5}_{ij}=\frac{{Z}_{i}^{(1)}}{Z}\times \frac{{Z}_{j}^{(2)}}{Z}\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{\eta}_{ij}=\frac{{Z}_{ij}}{Z}.$$

(4.4)

We use quotation marks because neither quantity is actually observed. To quantify the deviation from the null hypothesis of independence seen in the data, we introduce the discrepancy measure,

$$\Delta ({\mathit{Q}}_{\text{EA}},{\mathit{Q}}_{\text{MS}},\mathit{D})=\sum _{i=0}^{1}\sum _{j=0}^{1}{({\u03f5}_{ij}-{\eta}_{ij})}^{2}.$$

(4.5)

This measure returns small values under the null and implicitly depends on *τ* and branch lengths **T**. We account for this dependence and for phylogenetic uncertainty by averaging our results over a finite sample from the posterior distribution of (*τ*, **T**).

We use the software package BayesTraits to accomplish this averaging and to produce an MCMC sample from the posterior distribution of *Q*_{EA} and *Q*_{MS}, assuming the null model of independent evolution of the two traits (Pagel *et al*. 2004). Each iteration sample from the output of BayesTraits consists of $(\tau ,\mathit{T},{\alpha}_{0},{\alpha}_{1},{\beta}_{0},{\beta}_{1})$ drawn from their posterior distribution. Given these model parameters, we generate a new dataset *D*^{rep} and compute the observed $\Delta ({\mathit{Q}}_{\text{EA}},{\mathit{Q}}_{\text{MS}},\mathit{D})$ and predicted $\Delta ({\mathit{Q}}_{\text{EA}},{\mathit{Q}}_{\text{MS}},{\mathit{D}}^{\text{rep}})$ discrepancies for each iteration. We then compare their marginal distributions by plotting their corresponding histograms (figure 3). In figure 3, we also plot the observed against predicted discrepancies to display the correlation between these two random variables. The apparent disagreement between observed and predicted discrepancies is a manifestation of poor fit of the independent model of evolution. The observed discrepancy consistently exceeds the predicted quantity. To illustrate the performance of predictive diagnostics when the independent model fits data well, we simulate trait data under this model along one of the 500 *a posteriori* supported phylogenetic trees. Figure 3*c*,*d* depicts the result of the posterior model diagnostics applied to the simulated data. In contrast to the observed primate data, the simulated data do not exhibit disagreement between the observed and predicted discrepancies.

Testing coevolution. (*a*,*c*) Plots depicting observed (white bars) and predicted (grey bars) distributions of the discrepancy measure for the (*a*,*b*) primate and (*c*,*d*) simulated data. (*b*,*d*) The scatter plots of these distributions.

Disagreement between the observed and predicted discrepancies can be quantified using a tail probability, called a posterior predictive *p* value,

$$\text{ppp}=\text{Pr}\left(\Delta ({\mathit{Q}}_{\text{EA}},{\mathit{Q}}_{\text{MS}},{\mathit{D}}^{\text{rep}})>\Delta ({\mathit{Q}}_{\text{EA}},{\mathit{Q}}_{\text{MS}},\mathit{D})|\mathit{D},{H}_{0}\right),$$

(4.6)

where the tail probability is taken over the posterior distribution of the independent model. In practice, given *N* MCMC iterations, one estimates posterior predictive *p* values via

$$\text{ppp}\approx \frac{1}{N}\sum _{g=1}^{N}{1}_{\left\{\Delta \left({\mathit{Q}}_{\text{EA}}^{(g)},{\mathit{Q}}_{\text{MS}}^{(g)},{\mathit{D}}^{\text{rep},g}\right)>\Delta \left({\mathit{Q}}_{\text{EA}}^{(g)},{\mathit{Q}}_{\text{MS}}^{(g)},\mathit{D}\right)\right\}},$$

(4.7)

where ${\mathit{Q}}_{\text{EA}}^{(g)},{\mathit{Q}}_{\text{MS}}^{(g)}$ are the parameter values realized at iteration *g*, and *D*^{rep,g} is a dataset simulated using these parameter values. Following this recipe, we estimate ppp for both the primate data and the artificial data. The disagreement between the observed and predicted discrepancies is reflected in a low ppp=0.0128. By contrast, ppp=0.3139 for the simulated data supports agreement between the observed and predicted distributions of *Δ*.
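The Monte Carlo estimator (4.7) is just the fraction of MCMC iterations in which the predicted discrepancy exceeds the observed one. A minimal sketch, with hypothetical per-iteration discrepancy values standing in for a real MCMC run:

```python
def posterior_predictive_p(observed, predicted):
    """Monte Carlo estimate of the posterior predictive p value,
    equation (4.7): the fraction of iterations in which the predicted
    discrepancy exceeds the observed one."""
    assert len(observed) == len(predicted)
    return sum(pred > obs
               for obs, pred in zip(observed, predicted)) / len(observed)

# Hypothetical per-iteration discrepancies (5 MCMC samples)
obs = [0.90, 0.80, 1.10, 0.95, 1.00]
pred = [0.20, 0.30, 1.20, 0.10, 0.25]
print(posterior_predictive_p(obs, pred))  # 0.2
```

Values of ppp near 0 or 1 flag systematic disagreement between data and model, while intermediate values (as for the simulated data above) indicate adequate fit.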

In this section, we consider the important task of mapping synonymous and non-synonymous mutations onto branches of a phylogenetic tree. Our point of departure is a recent ambitious analysis of HIV intrahost evolution by Lemey *et al*. (2007), who use sequence data originally reported by Shankarappa *et al*. (1999). The authors attempt to estimate branch-specific synonymous and non-synonymous mutation rates and then project these measurements onto a time axis. This projection enables one to relate the time evolution of selection processes with clinical covariates. Lemey *et al*. (2007) find fitting codon models computationally prohibitive in this case. Instead, they first fit a DNA model to the intrahost HIV sequences, obtain a posterior sample of phylogenies with branch lengths, and then use these phylogenies to fit a codon model to the same DNA sequence alignment. Instead of fitting two different models to the data, we propose to use just a DNA model and exploit mapping summaries to predict synonymous and non-synonymous mutation rates.

Suppose we observe a multiple DNA sequence alignment $\mathit{D}=({D}_{1},\dots ,{D}_{L})$ of a protein-coding region with *L* sites and that all *C*=*L*/3 codons are aligned to each other such that the coding region starts at site 1 of ** D** (in other words, there is no frameshift). Following the codon partitioning model of Yang (1996), we assume that sites corresponding to all first codon positions, ${\mathit{D}}_{1},{\mathit{D}}_{4},\dots ,{\mathit{D}}_{L-2}$, evolve according to a standard HKY model with generator ${\mathit{Q}}_{\text{HKY}}({\kappa}_{1},{\pi}_{1})$, and that second and third codon positions evolve analogously under ${\mathit{Q}}_{\text{HKY}}({\kappa}_{2},{\pi}_{2})$ and ${\mathit{Q}}_{\text{HKY}}({\kappa}_{3},{\pi}_{3})$. The three independently evolving positions induce a Markov chain on the codon space whose generator is the Kronecker sum

$${\mathit{Q}}_{\text{codon}}={\mathit{Q}}_{\text{HKY}}({\kappa}_{1},{\pi}_{1})\oplus {\mathit{Q}}_{\text{HKY}}({\kappa}_{2},{\pi}_{2})\oplus {\mathit{Q}}_{\text{HKY}}({\kappa}_{3},{\pi}_{3}).$$

(4.8)
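Reading the composition in (4.8) as the Kronecker sum — the generator of jointly independent component chains, $A\oplus B=A\otimes I+I\otimes B$ — we can sketch the construction of the codon generator. The toy 2×2 matrices below stand in for the real 4×4 HKY generators purely for illustration:

```python
def kron(A, B):
    """Kronecker product of two matrices given as nested lists."""
    m, n, p, q = len(A), len(A[0]), len(B), len(B[0])
    return [[A[i][j] * B[k][l] for j in range(n) for l in range(q)]
            for i in range(m) for k in range(p)]

def eye(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def kron_sum(A, B):
    """Kronecker sum A (+) B = A x I + I x B, the generator of two
    independently evolving chains with generators A and B."""
    left = kron(A, eye(len(B)))
    right = kron(eye(len(A)), B)
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(left, right)]

# Toy two-state generators standing in for the per-position HKY
# generators Q_HKY(kappa_i, pi_i); the real ones are 4x4
Q1 = [[-1.0, 1.0], [2.0, -2.0]]
Q2 = [[-0.5, 0.5], [0.5, -0.5]]
Q3 = [[-3.0, 3.0], [1.0, -1.0]]
Q_codon = kron_sum(kron_sum(Q1, Q2), Q3)  # 8x8 generator on "codon" space
print(all(abs(sum(row)) < 1e-12 for row in Q_codon))  # True: rows sum to 0
```

Because each position generator has zero row sums, so does the Kronecker sum, confirming that `Q_codon` is itself a valid rate matrix on the product state space.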

With this Markov chain on the codon space, we define a labelling $\mathcal{L}$(*s*) that contains all possible pairs of codons that translate into the same amino acid. All other codon pairs are collected into a labelling set $\mathcal{L}$(*n*). Clearly, transitions between elements of $\mathcal{L}$(*s*) constitute synonymous mutations, and non-synonymous mutations are represented by transitions between elements of $\mathcal{L}$(*n*). In this manner, counting processes map synonymous and non-synonymous mutations onto specific branches of *τ*. We consider HIV sequences from patient 1 of Shankarappa *et al*. (1999) and approximate the posterior distribution of our DNA model parameters $\text{Pr}({\mathit{Q}}_{\text{codon}},\tau ,\mathit{T}|\mathit{D})$ using MCMC sampling implemented in the software package BEAST (Drummond & Rambaut 2007). The serially sampled HIV sequences permit us to estimate the branch lengths ** T** in units of clock time, months in this case. For each saved MCMC sample, we compute branch-specific rates of synonymous and non-synonymous mutations,

$${r}_{b}(s)=\frac{\frac{1}{C}{\sum}_{c=1}^{C}E[{N}_{\mathcal{L}(s)}({t}_{b})|{\mathit{D}}_{c:c+2}]}{{t}_{b}}\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{r}_{b}(n)=\frac{\frac{1}{C}{\sum}_{c=1}^{C}E\left[{N}_{\mathcal{L}(n)}({t}_{b})|{\mathit{D}}_{c:c+2}\right]}{{t}_{b}},$$

(4.9)

where we denote the data at codon site *c* by ${\mathit{D}}_{c:c+2}$. We also record the fraction ${r}_{b}(n)/({r}_{b}(s)+{r}_{b}(n))$ of non-synonymous mutations. Similar to Lemey *et al*. (2007), we summarize these measurements by projecting them onto the time axis. To this end, we form a finite time grid and produce a density profile of the synonymous and non-synonymous rates, and of the non-synonymous mutation fractions, for each time interval between grid points (figure 4). Both synonymous and non-synonymous rate density profiles are consistently bimodal across time. Interestingly, the modes also stay appreciably constant. The density profile of the non-synonymous mutation fractions is multimodal and fairly complex. A considerable number of branches exhibit strong negative $({r}_{b}(n)/({r}_{b}(s)+{r}_{b}(n))\approx 0)$ or strong positive $({r}_{b}(n)/({r}_{b}(s)+{r}_{b}(n))\approx 1)$ selection. In general, when comparing branches that occur earlier in the tree with those that occur later, the non-synonymous mutation fraction first shows a modest upward trend through time and then descends to lower values, consistent with other patterns of evolutionary diversity reported by Shankarappa *et al*. (1999).
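The labelling sets and the branch summaries of (4.9) are straightforward to compute once the per-codon expected counts are available. The sketch below uses a small illustrative fragment of the genetic code (a full table has 64 codons) and hypothetical expected counts in place of real stochastic mapping output:

```python
from itertools import permutations

# Illustrative fragment of the standard genetic code, used to build
# the labelling sets L(s) (synonymous pairs) and L(n) (all others)
CODE = {"TTT": "Phe", "TTC": "Phe", "TTA": "Leu", "TTG": "Leu",
        "ATG": "Met", "TGG": "Trp"}
L_s = {(a, b) for a, b in permutations(CODE, 2) if CODE[a] == CODE[b]}
L_n = {(a, b) for a, b in permutations(CODE, 2) if CODE[a] != CODE[b]}

def branch_rates(syn_counts, nonsyn_counts, t_b):
    """Branch-specific rates of equation (4.9): per-codon averages of
    expected labelled counts, divided by the branch length t_b
    (in months for the serially sampled HIV data)."""
    C = len(syn_counts)
    return sum(syn_counts) / (C * t_b), sum(nonsyn_counts) / (C * t_b)

# Hypothetical expected counts E[N_L(t_b) | D_{c:c+2}] for C = 3 codons
r_s, r_n = branch_rates([0.30, 0.10, 0.20], [0.06, 0.02, 0.04], t_b=2.0)
print(("TTT", "TTC") in L_s)        # True: both codons encode Phe
print(round(r_n / (r_s + r_n), 3))  # 0.167, the non-synonymous fraction
```

In a real analysis the expected counts come from the simulation-free mapping algorithm applied to each saved MCMC sample, and the resulting per-branch fractions are binned on the time grid to produce the density profiles of figure 4.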

Time evolution of synonymous and non-synonymous rates. (*a*) A representative phylogeny of 129 intrahost HIV sequences. The three heat maps depict the marginal posterior densities of the (*b*) synonymous and (*c*) non-synonymous rates, and (*d*) the proportion **...**

Intrigued by the multimodality observed in figure 4, we investigate this issue further. Lemey *et al*. (2007) consider several branch categories in their analysis, e.g. internal and terminal. We decide to test whether differences exist between selection forces acting on internal and terminal branches of *τ*. We define the fractions of non-synonymous mutations on internal and terminal branches as

$${\rho}_{\mathcal{E}}({\mathit{Q}}_{\text{codon}},\tau ,\mathit{T},\mathit{D})=\frac{{\sum}_{b\in \mathcal{E}}{\sum}_{c=1}^{C}E[{N}_{\mathcal{L}(n)}({t}_{b})|{\mathit{D}}_{c:c+2}]}{{\sum}_{b\in \mathcal{E}}{\sum}_{c=1}^{C}(E[{N}_{\mathcal{L}(s)}({t}_{b})|{\mathit{D}}_{c:c+2}]+E[{N}_{\mathcal{L}(n)}({t}_{b})|{\mathit{D}}_{c:c+2}])}$$

(4.10)

and

$${\rho}_{\mathcal{I}}({\mathit{Q}}_{\text{codon}},\tau ,\mathit{T},\mathit{D})=\frac{{\sum}_{b\in \mathcal{I}}{\sum}_{c=1}^{C}E[{N}_{\mathcal{L}(n)}({t}_{b})|{\mathit{D}}_{c:c+2}]}{{\sum}_{b\in \mathcal{I}}{\sum}_{c=1}^{C}(E[{N}_{\mathcal{L}(s)}({t}_{b})|{\mathit{D}}_{c:c+2}]+E[{N}_{\mathcal{L}(n)}({t}_{b})|{\mathit{D}}_{c:c+2}])}.$$

(4.11)

We plot their posterior histograms in figure 5. These histograms do not overlap, suggesting different fractions of non-synonymous mutations for internal and external branches. To test this hypothesis more formally, we form a discrepancy measure

$$\Delta ({\mathit{Q}}_{\text{codon}},\tau ,\mathit{T},\mathit{D})={\rho}_{\mathcal{E}}({\mathit{Q}}_{\text{codon}},\tau ,\mathit{T},\mathit{D})-{\rho}_{\mathcal{I}}({\mathit{Q}}_{\text{codon}},\tau ,\mathit{T},\mathit{D}).$$

(4.12)
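Reading (4.10) and (4.11) as ratios of summed expected non-synonymous counts to summed total counts over each branch set, the discrepancy (4.12) can be computed as sketched below, with hypothetical per-branch expected counts in place of real mapping output:

```python
def nonsyn_fraction(branches):
    """Fraction of non-synonymous mutations over a branch set: summed
    expected non-synonymous counts divided by summed total counts
    (our reading of equations (4.10) and (4.11))."""
    n = sum(b["nonsyn"] for b in branches)
    s = sum(b["syn"] for b in branches)
    return n / (s + n)

# Hypothetical per-branch expected counts, split by branch type
terminal = [{"syn": 1.0, "nonsyn": 1.0}, {"syn": 0.5, "nonsyn": 0.5}]
internal = [{"syn": 2.0, "nonsyn": 0.5}, {"syn": 1.5, "nonsyn": 0.5}]

# Discrepancy of equation (4.12): terminal minus internal fraction
delta = nonsyn_fraction(terminal) - nonsyn_fraction(internal)
print(round(delta, 3))  # 0.278
```

A positive discrepancy, as in this hypothetical example, would indicate relatively stronger non-synonymous activity on terminal branches.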

As in our previous example, we compare the observed discrepancy $\Delta ({\mathit{Q}}_{\text{codon}},\tau ,\mathit{T},\mathit{D})$ with the expected discrepancy $\Delta ({\mathit{Q}}_{\text{codon}},\tau ,\mathit{T},{\mathit{D}}^{\text{rep}})$, where *D*^{rep} is a multiple sequence alignment simulated under the codon partitioning model with parameters *Q*_{codon}, *τ* and ** T**. Invoking approximation (3.16) and recalling that our model assumes the same substitution rates on every branch of *τ*, we expect

$$\Delta ({\mathit{Q}}_{\text{codon}},\tau ,\mathit{T},{\mathit{D}}^{\text{rep}})\approx 0,$$

(4.13)

for all parameter values *Q*_{codon}, ** T** and *τ*.

In this paper, we develop a computationally efficient framework for mapping evolutionary trajectories onto phylogenies. Although we aim to keep this mathematical framework fairly general, our main interest with evolutionary mappings lies in computing the mean number of labelled trait transitions and the mean evolutionary reward that depends linearly on the time a trait occupies each of its states. These two mapping summaries have been the most promising building blocks for constructing statistical tests.

We build upon our earlier work involving single branch calculations for Markov-induced counting processes (Minin & Suchard 2008). In our extension, we introduce single branch calculations for evolutionary reward processes and devise algorithms that extend single branch calculations to mapping expectations of counting and reward processes onto branches across an entire phylogeny. Our main result generalizes Felsenstein's pruning algorithm, the workhorse of modern phylogenetic computation. The generalized pruning algorithm warrants two comments about its efficiency for performing simulation-free stochastic mapping. First, when the infinitesimal rates are unknown, a traditionally slow component of phylogenetic inference is the eigendecomposition of their matrix. Fortunately, this decomposition finds immediate reuse in our algorithm to calculate posterior expectations of mappings. Second, the algorithm requires only two traversals of the phylogenetic tree and is therefore at most two times slower than the standard likelihood evaluation algorithm. In practice, we find our algorithm approximately 1.5 times slower than the likelihood calculation. We achieve this advantage because the *n* terminal branches are not visited during the second traversal. Finally, the analogy with Felsenstein's algorithm yields a pulley principle for stochastic mapping and a reduction in computation for prior expectations.

Our examples demonstrate how our novel algorithm facilitates phylogenetic exploratory analysis and hypothesis testing. First, we use simulation-free stochastic mapping of occupancy times to re-implement Huelsenbeck *et al*.'s (2003) posterior predictive test of independent evolution. In our second example, we attempt to recover synonymous and non-synonymous mutation rates without resorting to codon models and instead use an independent codon partitioning model. We overcome this gross model misspecification with stochastic mapping, find intriguing multimodality of synonymous and non-synonymous rates, and use a posterior predictive model check to test differences in selection pressures between terminal and internal branches. We stress that our predictions are only as good as the model we use. For example, the terminal/internal branch differences may be due to a general bad fit of our purposely misspecified model to the intrahost HIV data. However, we find this scenario unlikely in light of the recent demonstration of the excellent performance of codon partitioning models during analyses of protein coding regions (Shapiro *et al*. 2006).

Our examples illustrate the importance of hypothesis testing in statistical phylogenetics. In recent years, it has become clear that an evolutionary analysis almost never ends with tree estimation. Importantly, phylogenetic inference enables evolutionary biologists to tackle scientific hypotheses, appropriately accounting for ancestry-induced correlation in observed trait values (Huelsenbeck *et al*. 2000; Pagel & Lutzoni 2002). Several authors demonstrate that mapping evolutionary histories onto inferred phylogenies provides a convenient and probabilistically grounded basis for designing statistically rigorous tests of evolutionary hypotheses (Nielsen 2002; Huelsenbeck *et al*. 2003; Dimmic *et al*. 2005). Unfortunately, this important statistical technique has been hampered by the high computational cost of stochastic mapping. Our general mathematical framework and fast algorithms should secure a central place for stochastic mapping in the statistical toolbox of evolutionary biologists.

We would like to thank Philippe Lemey for sharing his HIV codon alignments. We are also grateful to Andrew Meade for his help on using the BayesTraits software and Ziheng Yang and two anonymous reviewers for their constructive criticism. M.A.S. is supported by an Alfred P. Sloan Research Fellowship, John Simon Guggenheim Memorial Fellowship and NIH R01 GM086887.

One contribution of 17 to a Discussion Meeting Issue ‘Statistical and computational challenges in molecular phylogenetics and evolution’.

- Ball F, Milne R. Simple derivations of properties of counting processes associated with Markov renewal processes. J. Appl. Prob. 2005;42:1031–1043. doi:10.1239/jap/1134587814
- Cannings C, Thompson E, Skolnick M. Pedigree analysis of complex models. In: Current developments in anthropological genetics. New York, NY: Plenum Press; 1980. pp. 251–298.
- Dimmic M, Hubisz M, Bustamante C, Nielsen R. Detecting coevolving amino acid sites using Bayesian mutational mapping. Bioinformatics. 2005;21:i126–i135. doi:10.1093/bioinformatics/bti1032 [PubMed]
- Drummond A, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 2007;7:214. doi:10.1186/1471-2148-7-214 [PMC free article] [PubMed]
- Dutheil J, Pupko T, Marie A, Galtier N. A model-based approach for detecting coevolving positions in a molecule. Mol. Biol. Evol. 2005;22:1919–1928. doi:10.1093/molbev/msi183 [PubMed]
- Felsenstein J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 1981;13:93–104. [PubMed]
- Felsenstein J. Sinauer Associates; Sunderland, MA: 2004. Inferring phylogenies.
- Gelman A, Meng X, Stern H. Posterior predictive assessment of model fitness via realized discrepancies. Stat. Sin. 1996;6:733–807.
- Guindon S, Rodrigo A, Dyer K, Huelsenbeck J. Modeling the site-specific variation of selection patterns along lineages. Proc. Natl Acad. Sci. USA. 2004;101:12957–12962. doi:10.1073/pnas.0402177101 [PubMed]
- Holmes I, Rubin G.M. An expectation maximization algorithm for training hidden substitution models. J. Mol. Biol. 2002;317:753–764. doi:10.1006/jmbi.2002.5405 [PubMed]
- Huelsenbeck J, Rannala B, Masly J. Accommodating phylogenetic uncertainty in evolutionary studies. Science. 2000;288:2349–2350. doi:10.1126/science.288.5475.2349 [PubMed]
- Huelsenbeck J, Nielsen R, Bollback J. Stochastic mapping of morphological characters. Syst. Biol. 2003;52:131–158. doi:10.1080/10635150390192780 [PubMed]
- Lange K. Springer; New York, NY: 2004. Applied probability.
- Laub A. SIAM; Philadelphia, PA: 2004. Matrix analysis for scientists and engineers.
- Lemey P, Pond S.K, Drummond A, Pybus O, Shapiro B, Barroso H, Taveira N, Rambaut A. Synonymous substitution rates predict HIV disease progression as a result of underlying replication dynamics. PLoS Comput. Biol. 2007;3:e29. doi:10.1371/journal.pcbi.0030029 [PubMed]
- Leschen R, Buckley T. Multistate characters and diet shifts: evolution of Erotylidae (Coleoptera) Syst. Biol. 2007;56:97–112. doi:10.1080/10635150701211844 [PubMed]
- Meng X. Posterior predictive *p*-values. Ann. Stat. 1994;22:1142–1160. doi:10.1214/aos/1176325622
- Minin V, Suchard M. Counting labeled transitions in continuous-time Markov models of evolution. J. Math. Biol. 2008;56:391–412. doi:10.1007/s00285-007-0120-8 [PubMed]
- Neuts M. Chapman and Hall; London, UK: 1995. Algorithmic probability: a collection of problems.
- Nielsen R. Mapping mutations on phylogenies. Syst. Biol. 2002;51:729–739. doi:10.1080/10635150290102393 [PubMed]
- Pagel M. The maximum likelihood approach to reconstructing ancestral character states of discrete characters on phylogenies. Syst. Biol. 1999;48:612–622. doi:10.1080/106351599260184
- Pagel M, Lutzoni F. Accounting for phylogenetic uncertainty in comparative studies of evolution and adaptation. In: Biological evolution and statistical physics. Berlin, Germany: Springer; 2002. pp. 148–161.
- Pagel M, Meade A. Bayesian analysis of correlated evolution of discrete characters by reversible-jump Markov chain Monte Carlo. Am. Nat. 2006;167:808–825. doi:10.1086/503444 [PubMed]
- Pagel M, Meade A, Barker D. Bayesian estimation of ancestral character states on phylogenies. Syst. Biol. 2004;53:673–684. doi:10.1080/10635150490522232 [PubMed]
- Rodrigue N, Philippe H, Lartillot N. Uniformization for sampling realizations of Markov processes: applications to Bayesian implementations of codon substitution models. Bioinformatics. 2008;24:56–62. doi:10.1093/bioinformatics/btm532 [PubMed]
- Schadt E, Sinsheimer J, Lange K. Computational advances in maximum likelihood methods for molecular phylogeny. Genome Res. 1998;8:222–233. [PubMed]
- Shankarappa R, et al. Consistent viral evolutionary changes associated with the progression of human immunodeficiency virus type 1 infection. J. Virol. 1999;73:10489–10502. [PMC free article] [PubMed]
- Shapiro B, Rambaut A, Drummond A. Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences. Mol. Biol. Evol. 2006;23:7–9. doi:10.1093/molbev/msj021 [PubMed]
- Yang Z. Maximum-likelihood models for combined analyses of multiple sequence data. J. Mol. Evol. 1996;42:587–596. doi:10.1007/BF02352289 [PubMed]

Articles from Philosophical Transactions of the Royal Society B: Biological Sciences are provided here courtesy of **The Royal Society**
