Methods Mol Biol. Author manuscript; available in PMC 2012 August 2.


PMCID: PMC3410445

NIHMSID: NIHMS385225

The publisher's final edited version of this article is available at Methods Mol Biol


Spliceosomal introns are one of the principal distinctive features of eukaryotes. Nevertheless, different large-scale studies disagree about even the most basic features of their evolution. In order to come up with a more reliable reconstruction of intron evolution, we developed a model that is far more comprehensive than previous ones. This model is rich in parameters, and estimating them accurately is infeasible by straightforward likelihood maximization. Thus, we have developed an expectation-maximization algorithm that allows for efficient maximization. Here, we outline the model and describe the expectation-maximization algorithm in detail. Since the method works with intron presence–absence maps, it is expected to be instrumental for the analysis of the evolution of other binary characters as well.

In eukaryotes, many protein-coding genes have their coding sequence broken into pieces – the exons – separated by the non-coding spliceosomal introns. These introns are removed from the nascent pre-mRNA and the exons are spliced together to form the intronless mRNA by the spliceosome, a large and elaborate macromolecular complex comprising several small RNA molecules and numerous proteins. No spliceosomal introns have ever been found in prokaryotes, and no eukaryote with a completely sequenced genome, not even the most basal ones, lacks introns (1–3) and the accompanying splicing machinery (4).

Despite the introns being such a remarkable idiosyncrasy of eukaryotic genomes, their origin and evolution are not thoroughly understood (5, 6). It is generally accepted that introns can be regarded as units of evolution and that their presence/absence pattern is a result of stochastic processes of loss and gain. However, the nature of these processes is vigorously debated. Recent large-scale attempts to study these processes using extant eukaryotic genomes led to incongruent conclusions.

In a study on reconstruction of intron evolution, Rogozin et al. (7) analyzed ~700 sets of intron-bearing orthologous genes from eight eukaryotic species. The multiple alignment of the orthologs within each set was computed, and the intron positions were projected on the alignments to form presence/absence maps. Using Dollo parsimony to infer ancestral states, these authors observed a diverse repertoire of behaviors. Some lineages endured extensive losses, while others experienced mostly gain events. Early forebears, such as the last common ancestor of multicellular life, were shown to be relatively intron-rich. This work suggested that both gain and loss of introns played significant roles in shaping the modern eukaryotic gene structure. However, as these inferences rely upon the Dollo parsimony reconstruction, the number of gains in terminal branches (leaves of the phylogenetic tree) is overestimated, resulting in underestimation (potentially, significant) of the number of introns in ancient lineages.

The same data set was analyzed by Roy and Gilbert (8, 9) using a different methodology. They adopted a simple evolutionary model, according to which different lineages are associated with different loss and gain probabilities. Using a variation on maximum likelihood estimation, they obtained considerably higher estimates for the number of introns in early eukaryotes and a correspondingly lower level of gains in all lineages, i.e., a clear dominance of loss events in the evolution of eukaryotic genes. Roy and Gilbert have substantially simplified the mathematics involved in the estimation procedure, at the expense of introducing into the computation considerations of parsimony, which yielded an inference technique that is a hybrid between parsimony and maximum likelihood. This hybrid, however, excludes from consideration different evolutionary scenarios, resulting in inflated estimates of the number of introns in early eukaryotes (10).

The model of Roy and Gilbert is *branch-specific*, i.e., it assumes that the gain and loss rates depend only on the branch, thus tacitly presuming that all genes behave identically with respect to intron gain and loss. Exactly the inverse approach was adopted by Qiu et al. (11). These authors developed a *gene-specific* model, whereby different gene families are characterized by different rates of intron gain and loss, but for a particular gene these rates are constant across the entire phylogenetic tree. Using a different data set combined with a Bayesian estimation technique, they concluded that almost all extant introns were gained during eukaryotic evolution, i.e., that evolution is dominated by intron gain events with few losses. However, the validity of a gene-specific model is disputable, as it is hard to reconcile with the accumulating evidence on large differences between lineages (12–15).

Recently, two maximum likelihood estimation techniques have been developed for essentially the same branch-specific evolutionary model as the one of Roy and Gilbert. Csuros (10) used a direct approach, while Nguyen et al. (16) developed an expectation-maximization algorithm. Both methods encountered the same problem of estimating the number of unobserved intronless sites, and each employed a technically different but conceptually similar method to evaluate this number. Both techniques were applied to the eight-species data of Rogozin et al. (7), yielding very close estimates. As expected, these methods predict intron occupancy levels of ancient lineages that are higher than those predicted by Dollo parsimony and lower than those predicted by the hybrid technique of Roy and Gilbert. Notably, these estimates are generally closer to those obtained using Dollo parsimony, and they imply an evolutionary landscape comprising both losses and gains, with some excess of gains.

While the Dollo parsimony (7) and the hybrid technique of Roy and Gilbert (8, 9) showed some methodological biases, the other analyses of intron evolution (10, 11, 16) used well-established estimation techniques. Nevertheless, these studies kept yielding widely diverging inferences. The reason seems to be the differences in the underlying evolutionary models, neither of which is sufficient to describe the complex reality of intron evolution. The branch-specific model fails to account for important differences between genes, whereas the gene-specific model ignores the sharp differences between lineages. Additionally, rate variability between sites, known to be an important factor in other fields of molecular evolution (17, 18), should also be taken into account in the evolution of gene structure. This is particularly important for intron gain in light of the accumulating evidence in favor of the proto-splice model, according to which new introns are preferentially inserted inside certain sequence motifs (19–21). This means that sites could dramatically differ in their gain rate depending on their position relative to a proto-splice site.

Here we describe a model of evolution that takes into consideration all of the above factors. In order to efficiently estimate the model parameters by maximum likelihood, we have developed an expectation-maximization algorithm. We also compiled a data set that is considerably larger than previously used ones, consisting of 400 sets of orthologous genes from 19 eukaryotic species. Applying our algorithm to this data set, we obtained high-precision estimates, revealing a fascinating evolutionary history of gene structure, where both losses and gains played significant roles, although the contribution of losses was somewhat greater. Moreover, we identified novel properties of intron evolution: (i) all eukaryotic lineages share a common, universal mode of intron evolution, whereby the loss and gain processes are positively correlated, suggesting that intron gain and loss share common mechanistic components. In some lineages, additional forces come into play, resulting either in an elevated loss rate or in an elevated gain rate. Lineages exhibiting an increased loss rate are dispersed throughout the entire phylogenetic tree. In contrast, lineages with excessive gains are much rarer, and all of them are ancient. (ii) Intron loss rates of individual genes show no correlation with any other genomic property. By contrast, the intron gain rate of individual genes shows several remarkable relationships, not always easily explained. In brief, intron gain rate is positively correlated with expression level, negatively correlated with sequence evolution rate, and negatively correlated with gene length. Moreover, genes of apparent bacterial origin have significantly lower rates of intron gain than genes of archaeal origin. (iii) We showed that the remarkable conservation of intron positions is mainly (~90%) due to shared ancestry, and only in a minority of the cases (~10%) due to parallel gain at the same location. 
(iv) We determined that the density of potential intron insertion sites is about 1 site per 7 nucleotides.

The algorithm learns the parameters of the model by comparing the structure of orthologous genes in extant species. To carry out this comparison, it requires two sets of input data, to be described in this section. The first is a phylogenetic tree, defining topological relationships between a set of eukaryotic species. The second is a collection of genes, for which one can identify orthologs in at least a subset of the species above.

Suppose that we have *G* sets of aligned orthologous genes from *S* species. To represent the gene structure, we transform these alignments into intron presence–absence maps by substituting for each nucleotide (or amino acid) position 1 or 0, depending on whether an intron is present or absent in the respective position. We allow for missing data by using a third symbol (*), and consequently a gene might be included in the input data even if it is missing in part of the species. Every site in an alignment, called a *pattern*, is a vector of length *S* over the alphabet (0, 1, *). Let Ω be the total number of unique patterns in the entire set of *G* alignments, denoted ω_{1}, . . . , ω_{Ω}, and let *n _{gp}* count the number of times pattern ω_{p} appears in the alignment of gene *g*.
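
To make the bookkeeping concrete, the conversion of presence–absence maps into unique patterns with counts *n _{gp}* can be sketched in a few lines of Python (the function name and input layout are our own illustrative choices, not part of the published method):

```python
def site_patterns(alignment_maps):
    """Collapse per-gene intron presence-absence maps into unique site
    patterns and their counts (the n_gp of the text).

    alignment_maps: dict mapping gene name -> list of site columns, where
    each column is a string over the alphabet {'0', '1', '*'} with one
    symbol per species ('*' marks missing data).
    Returns (patterns, counts): patterns is the sorted list of unique
    columns over all genes, and counts[g][p] is n_gp.
    """
    patterns = sorted({col for cols in alignment_maps.values() for col in cols})
    index = {p: i for i, p in enumerate(patterns)}
    counts = {}
    for gene, cols in alignment_maps.items():
        n_g = [0] * len(patterns)
        for col in cols:
            n_g[index[col]] += 1
        counts[gene] = n_g
    return patterns, counts

# three species; gene 'g1' has four sites, one with missing data
patterns, counts = site_patterns({"g1": ["010", "010", "000", "0*1"]})
```

Only the unique patterns need to be processed downstream; the counts enter the likelihood as exponents, which is what makes the decomposition in **Eq. [4]** efficient.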

Let *T* be a rooted bifurcating phylogenetic tree with *S* leaves (terminal nodes) corresponding to the *S* species above. The total number of nodes in *T* is *N* = 2*S* – 1, and we index them by *t* = 0, 1, . . . , *N* – 1, with the convention that zero is the root node. The state of node *t* is described by the random variable *q _{t}*, which can take the values 0 and 1 (and * in leaves). We use ${q}_{t}^{P}$ to denote the state of the parent of node *t*, and ${V}_{t}$ to denote the set of observed leaves in the subtree rooted at node *t*.

A graphical model is a mathematical graph whose nodes symbolize random variables, and whose branches describe dependence relationships between them (22). A bifurcating phylogenetic tree, when viewed as a graphical model, depicts the probabilistic model

$$\mathrm{Pr}\left({q}_{0}\right)\prod _{t=1}^{N-1}\mathrm{Pr}({q}_{t}\mid {q}_{t}^{P}).$$

[1]

We use the notation π_{i} = Pr(*q*_{0} = *i*) to describe the prior probability of the root, and ${A}_{ij}(g,t)=\mathrm{Pr}({q}_{t}=j\mid {q}_{t}^{P}=i,g)$ to describe the transition probability for gene *g* along branch *t*. In our model, we assume that the transition probability depends on both the gene and the branch, and that it takes the explicit form

$$A(g,t)=\left(\begin{array}{cc}\hfill 1-{\xi}_{t}(1-{e}^{-{\eta}_{g}{\Delta}_{t}})\hfill & \hfill {\xi}_{t}(1-{e}^{-{\eta}_{g}{\Delta}_{t}})\hfill \\ \hfill 1-(1-{\varphi}_{t}){e}^{-{\theta}_{g}{\Delta}_{t}}\hfill & \hfill (1-{\varphi}_{t}){e}^{-{\theta}_{g}{\Delta}_{t}}\hfill \end{array}\right).$$

[2]

Here, *η _{g}* and *θ _{g}* are the gene-specific gain and loss rates, respectively, Δ_{t} is the length of branch *t*, and ξ_{t} and φ_{t} are the branch-specific gain and loss parameters.
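
As an illustration, the transition matrix of **Eq. [2]** is straightforward to evaluate; the following Python sketch (with our own, hypothetical parameter names) spells it out:

```python
import math

def transition_matrix(xi_t, phi_t, eta_g, theta_g, delta_t):
    """Transition probabilities of Eq. [2]: row = parent state (0 = no
    intron, 1 = intron), column = child state.  xi_t and phi_t are the
    branch-specific gain/loss parameters, eta_g and theta_g the
    gene-specific rates, and delta_t the branch length."""
    gain = xi_t * (1.0 - math.exp(-eta_g * delta_t))       # 0 -> 1
    keep = (1.0 - phi_t) * math.exp(-theta_g * delta_t)    # 1 -> 1
    return [[1.0 - gain, gain],
            [1.0 - keep, keep]]

A = transition_matrix(xi_t=0.2, phi_t=0.1, eta_g=0.5, theta_g=1.0, delta_t=0.3)
```

Each row is a probability distribution over the child's state, so the rows sum to one by construction.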

In other fields of molecular evolution, it was long realized that analysis precision improves if one allows for rate variability across sites (17, 18). Typically, such rate variability is modeled by introducing a *rate variable*, *r*, which scales, for each site, the time units of the phylogenetic tree, Δ_{t} ← *r* · Δ_{t}. This rate variable is a random variable, distributed according to a distribution function with nonnegative domain and unit mean, typically the unit-mean gamma distribution. The rate variability reflects the idea that sites differ in their rate of evolution. Specifically, there are fast-evolving sites (*r* >> 1), as well as slow-evolving ones (*r* << 1). In our model of intron evolution, we extend this idea by assuming that the gain and loss processes are subject to rate variability, independently of each other. Hence, a site can have any combination of gain and loss rates. To accommodate this idea, we use two independent rate variables, *r ^{η}* and *r ^{θ}*, for the gain and loss processes, respectively, distributed as

$$\begin{array}{cc}\hfill {r}^{\eta}& \sim \nu \delta \left(\eta \right)+(1-\nu )\Gamma (\eta ;{\lambda}_{\eta})\hfill \\ \hfill {r}^{\theta}& \sim \Gamma (\theta ;{\lambda}_{\theta}).\hfill \end{array}$$

[3]

Here, Γ(*x*; λ) is the unit-mean gamma distribution of variable *x* with shape parameter λ, δ(*x*) is the Dirac delta-function, and ν is the fraction of sites that are assumed to have zero gain rate. These latter sites, termed *invariant sites*, correspond to sites that are not proto-splice sites (19–21). Intron loss does not have an invariant counterpart, as the assumption is that once an intron is gained, it can always be lost. Therefore, the loss rate variable is assumed to be distributed according to a gamma distribution, which is by far the most popular distribution for describing rate variability (17, 18, 23).

In practice, the rate distributions in **Eq. [3]** are rendered discrete (24). We assume that the gain rate variable can take *K _{η}* discrete values ${r}_{1}^{\eta}=0,{r}_{2}^{\eta},\dots ,{r}_{{K}_{\eta}}^{\eta}$ with probabilities ${f}_{1}^{\eta}=\nu ,{f}_{2}^{\eta},\dots ,{f}_{{K}_{\eta}}^{\eta}$ such that ${\sum}_{k=1}^{{K}_{\eta}}{f}_{k}^{\eta}=1$. Analogously, we assume that the loss rate variable can take *K _{θ}* discrete values ${r}_{1}^{\theta},\dots ,{r}_{{K}_{\theta}}^{\theta}$ with probabilities ${f}_{1}^{\theta},\dots ,{f}_{{K}_{\theta}}^{\theta}$ such that ${\sum}_{{k}^{\prime}=1}^{{K}_{\theta}}{f}_{{k}^{\prime}}^{\theta}=1$.
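
The discretization of the gain-rate mixture can be sketched as follows. This is a minimal Python illustration assuming the mean-of-quantile-bin scheme for the gamma categories; the numerical grid and tail cutoff are ad hoc choices of ours, and the cited discretization (24) may differ in detail:

```python
import math

def discretize_gain_rates(nu, lam, K, grid=200000):
    """Discretize the gain-rate mixture of Eq. [3]: category 1 is the
    invariant class (rate 0, probability nu); the remaining K-1 categories
    are equal-probability quantile bins of a unit-mean gamma(lam), each
    represented by its conditional mean rate."""
    scale = 1.0 / lam                      # unit mean: shape lam, scale 1/lam
    x_max = 10.0 + 50.0 / lam              # crude upper cutoff of the support
    norm = math.gamma(lam) * scale ** lam  # gamma-density normalization
    dx = x_max / grid
    n_bins = K - 1
    bin_mass = [0.0] * n_bins
    bin_mean = [0.0] * n_bins
    cdf = 0.0
    for i in range(grid):                  # midpoint-rule sweep over the density
        x = (i + 0.5) * dx
        p = x ** (lam - 1.0) * math.exp(-x / scale) / norm * dx
        b = min(int(cdf * n_bins), n_bins - 1)   # quantile bin of this slice
        bin_mass[b] += p
        bin_mean[b] += x * p
        cdf += p
    rates, probs = [0.0], [nu]             # the invariant category
    for m, s in zip(bin_mean, bin_mass):
        rates.append(m / s)
        probs.append((1.0 - nu) * s)
    return rates, probs

rates, probs = discretize_gain_rates(nu=0.25, lam=1.0, K=4)
```

Note that the expected value of the discretized gain rate is (1 – ν), not 1, because the invariant category pins a fraction ν of the mass at zero while the gamma part alone has unit mean.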

For notational clarity, we aggregate the model parameters into a small number of sets. To this end, let Ξ_{t} = {ξ_{t}, φ_{t}} be the set of parameters that are specific for branch *t*, and let Ξ = (Ξ_{1}, . . . , Ξ_{N – 1}) be the set of all branch-specific parameters. Similarly, let ${\Psi}_{g}=({\eta}_{g},{\theta}_{g})$ be the set of parameters that are specific for gene *g*, and let Ψ = (Ψ_{1}, . . . , Ψ_{G}) be the set of all gene-specific parameters. Additionally, we denote by Λ = (*ν, λ _{η}, λ_{θ}*) the parameters that determine the rate variability. When the distinction between the different sets of parameters is irrelevant, we shall use Θ = (Ξ, Ψ, Λ) as the set of all the model's parameters. We achieve further succinctness in notations by denoting the actual gene-specific rate values for particular values ${r}_{k}^{\eta}$ and ${r}_{{k}^{\prime}}^{\theta}$ of the rate variables as ${\Psi}_{k{k}^{\prime}g}=({\eta}_{kg},{\theta}_{{k}^{\prime}g})$.

For each site, the *S* leaves form a set of observed random variables, their states being described by the corresponding pattern ω_{p}. The states of all the internal nodes, denoted σ, form a set of hidden random variables, that is, random variables whose state is not observed. In order to account for rate variability across sites, we associate with each pattern two hidden random variables, ${\rho}_{p}^{\eta}$ and ${\rho}_{p}^{\theta}$, that determine the value of the rate variables in that site. To sum up, the observed random variables are ω_{p}, and the hidden random variables are $(\sigma ,{\rho}_{p}^{\eta},{\rho}_{p}^{\theta})$.

We assume that sites within a gene, as well as the genes themselves, evolve independently. Therefore, the total likelihood can be decomposed as

$$L({M}_{1},\dots ,{M}_{G}\mid \u03f4)=\prod _{g=1}^{G}L({M}_{g}\mid \Xi ,{\Psi}_{g},\Lambda )=\prod _{g=1}^{G}\prod _{p=1}^{\Omega}L{({\omega}_{p}\mid \Xi ,{\Psi}_{g},\Lambda )}^{{n}_{gp}}.$$

and so

$$\text{log}\phantom{\rule{thickmathspace}{0ex}}L({M}_{1},\dots ,{M}_{G}\mid \u03f4)=\sum _{g=1}^{G}\sum _{p=1}^{\Omega}{n}_{gp}\phantom{\rule{thickmathspace}{0ex}}\text{log}\phantom{\rule{thickmathspace}{0ex}}L({\omega}_{p}\mid \Xi ,{\Psi}_{g},\Lambda ).$$

[4]

According to the well-known EM paradigm (25), log *L*(*M*_{1}, . . . , *M _{G}*|Θ) is guaranteed to increase as long as we maximize the auxiliary function

$$\mathcal{Q}(\u03f4,{\u03f4}^{0})=\sum _{g=1}^{G}\sum _{p=1}^{\Omega}{n}_{gp}{\mathcal{Q}}_{gp}(\Xi ,{\Psi}_{g},\Lambda ,{\Xi}^{0},{\Psi}_{g}^{0},{\Lambda}^{0}),$$

[5]

where

$${\mathcal{Q}}_{gp}(\Xi ,{\Psi}_{g},\Lambda ,{\Xi}^{0},{\Psi}_{g}^{0},{\Lambda}^{0})=\sum _{\sigma ,{\rho}_{p}^{\eta},{\rho}_{p}^{\theta}}\mathrm{Pr}(\sigma ,{\rho}_{p}^{\eta},{\rho}_{p}^{\theta}\mid {\omega}_{p},{\Xi}^{0},{\Psi}_{g}^{0},{\Lambda}^{0})\phantom{\rule{thickmathspace}{0ex}}\text{log}\phantom{\rule{thickmathspace}{0ex}}\mathrm{Pr}({\omega}_{p},\sigma ,{\rho}_{p}^{\eta},{\rho}_{p}^{\theta}\mid \Xi ,{\Psi}_{g},\Lambda ).$$

[6]

Using some manipulations (see **Note 1**), this can be written as

$${\mathcal{Q}}_{gp}\left(\Xi ,{\Psi}_{g},\Lambda ,{\Xi}^{0},{\Psi}_{g}^{0},{\Lambda}^{0}\right)=\sum _{k=1}^{{K}_{\eta}}\sum _{{k}^{\prime}=1}^{{K}_{\theta}}\left[\mathrm{Pr}\left({\rho}_{p}^{\eta}=k,{\rho}_{p}^{\theta}={k}^{\prime}\mid {\omega}_{p},{\Xi}^{0},{\Psi}_{g}^{0},{\Lambda}^{0}\right)\right].\left[\sum _{\sigma}\mathrm{Pr}\left(\sigma \mid {\omega}_{p},{\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0}\right)\cdot \{\text{log}{f}_{k}^{\eta}+\text{log}{f}_{{k}^{\prime}}^{\theta}+\text{log}\phantom{\rule{thinmathspace}{0ex}}\mathrm{Pr}({\omega}_{p},\sigma \mid \Xi ,{\Psi}_{gk{k}^{\prime}})\}\right].$$

Denoting by ${w}_{gpk{k}^{\prime}}$ and ${\mathcal{Q}}_{gpk{k}^{\prime}}$ the first and second square brackets, respectively, this expression becomes

$${\mathcal{Q}}_{gp}(\Xi ,{\Psi}_{g},\Lambda ,{\Xi}^{0},{\Psi}_{g}^{0},{\Lambda}^{0})=\sum _{k=1}^{{K}_{\eta}}\sum _{{k}^{\prime}=1}^{{K}_{\theta}}{w}_{gpk{k}^{\prime}}{\mathcal{Q}}_{gpk{k}^{\prime}},$$

[7]

and consequently

$$\mathcal{Q}(\u03f4,{\u03f4}^{0})=\sum _{g=1}^{G}\sum _{p=1}^{\Omega}\sum _{k=1}^{{K}_{\eta}}\sum _{{k}^{\prime}=1}^{{K}_{\theta}}{n}_{gp}{w}_{gpk{k}^{\prime}}{\mathcal{Q}}_{gpk{k}^{\prime}}.$$

[8]

In this step we compute the function $\mathcal{Q}(\u03f4,{\u03f4}^{0})$, or, equivalently, the set of coefficients ${w}_{gpk{k}^{\prime}}$ and ${\mathcal{Q}}_{gpk{k}^{\prime}}$. We accomplish this with the aid of an inward–outward recursion on the tree.

Here we propose a variation on the well-known Felsenstein's pruning algorithm (26). Let us associate with each node *t* (except for the root) a vector ${\gamma}_{i}^{gpk{k}^{\prime}}\left(t\right)=\mathrm{Pr}({V}_{t}\mid {q}_{t}^{P}=i,{\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0})$. In words, ${\gamma}_{i}^{gpk{k}^{\prime}}\left(t\right)$ is the probability of observing the nodes *V _{t}* (which are a subset of the pattern ω_{p}), given that the parent of node *t* is in state *i*. For the leaves, γ is initialized directly from the definition,

$$\gamma (t\in {V}_{0})=\{\begin{array}{cc}\left(\begin{array}{c}\hfill 1-{\xi}_{t}(1-{e}^{-{\eta}_{gk}{\Delta}_{t}})\hfill \\ \hfill 1-(1-{\varphi}_{t}){e}^{-{\theta}_{g{k}^{\prime}}{\Delta}_{t}}\hfill \end{array}\right)\hfill & {q}_{t}=0\hfill \\ \left(\begin{array}{c}\hfill {\xi}_{t}(1-{e}^{-{\eta}_{gk}{\Delta}_{t}})\hfill \\ \hfill (1-{\varphi}_{t}){e}^{-{\theta}_{g{k}^{\prime}}{\Delta}_{t}}\hfill \end{array}\right)\hfill & {q}_{t}=1.\hfill \end{array}\phantom{\}}$$

[9]

Here, and in the derivations to follow, we omit the superscript from γ. For all internal nodes (except for the root), γ is computed using the recursion

$${\gamma}_{i}\left(t\right)=\sum _{j=0}^{1}{A}_{ij}(g,t){\stackrel{~}{\gamma}}_{j}\left(t\right),$$

[10]

where ${\stackrel{~}{\gamma}}_{j}\left(t\right)$ is defined as γ_{j}[L(*t*)]γ_{j}[R(*t*)] (see **Note 2**).

The γ-recursion allows for computing the likelihood of any observed pattern ω_{p}, given the values of the rate variables:

$$\begin{array}{rl}\mathrm{Pr}({\omega}_{p}\mid {\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0})&=\mathrm{Pr}({V}_{0}\mid {\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0})=\mathrm{Pr}({V}_{0}^{L},{V}_{0}^{R}\mid {\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0})\\ &=\sum _{i=0}^{1}\mathrm{Pr}({V}_{0}^{L},{V}_{0}^{R},{q}_{0}=i\mid {\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0})\\ &=\sum _{i=0}^{1}\mathrm{Pr}({q}_{0}=i\mid {\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0})\cdot \mathrm{Pr}({V}_{0}^{L}\mid {q}_{0}=i,{\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0})\cdot \mathrm{Pr}({V}_{0}^{R}\mid {V}_{0}^{L},{q}_{0}=i,{\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0}).\end{array}$$

Given *q*_{0}, ${V}_{0}^{R}$ is independent of ${V}_{0}^{L}$, and so

$$\mathrm{Pr}({V}_{0}^{R}\mid {V}_{0}^{L},{q}_{0}=i,{\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0})=\mathrm{Pr}({V}_{0}^{R}\mid {q}_{0}=i,{\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0}),$$

and

$$\mathrm{Pr}({\omega}_{p}\mid {\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0})=\sum _{i=0}^{1}{\pi}_{i}{\stackrel{~}{\gamma}}_{i}\left(0\right).$$

[11]

This γ-recursion can be easily modified to incorporate missing data (see **Note 3**).
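
For concreteness, the γ-recursion with missing data and the root likelihood of **Eq. [11]** can be sketched as follows. The tree encoding and function names are our own illustrative choices, and a single transition matrix is reused on every branch for brevity:

```python
def pruning_likelihood(tree, pi, A_of, pattern):
    """Pattern likelihood Pr(omega_p) via the gamma-recursion (Eqs. [9]-[11]).

    tree: nested tuples; a leaf is an int index into `pattern`, an internal
    node is (node_id, left_child, right_child), with node_id 0 at the root.
    A_of(node) returns the 2x2 transition matrix of Eq. [2] for the branch
    entering `node`; pi = (pi_0, pi_1) is the root prior.  A '*' in the
    pattern is treated as missing data (Note 3)."""

    def gamma_tilde(node):
        # leaf: indicator of the observed state (all-ones if missing)
        if isinstance(node, int):
            s = pattern[node]
            if s == '*':
                return [1.0, 1.0]
            return [1.0 if j == int(s) else 0.0 for j in (0, 1)]
        # internal node: product of the children's gamma vectors, Eq. [10]
        _, left, right = node
        gl, gr = gamma(left), gamma(right)
        return [gl[j] * gr[j] for j in (0, 1)]

    def gamma(node):
        # gamma_i(node) = Pr(leaves below node | parent of node in state i)
        A = A_of(node)
        t = gamma_tilde(node)
        return [A[i][0] * t[0] + A[i][1] * t[1] for i in (0, 1)]

    t0 = gamma_tilde(tree)                 # the root is handled by Eq. [11]
    return pi[0] * t0[0] + pi[1] * t0[1]

# toy example: three species with topology ((sp0, sp1), sp2) and one shared
# transition matrix on every branch
A = [[0.9, 0.1], [0.3, 0.7]]
tree = (0, (1, 0, 1), 2)
lik = pruning_likelihood(tree, (0.8, 0.2), lambda n: A, "01*")
```

A convenient correctness check is that the likelihoods of all fully observed patterns sum to one, and that the all-missing pattern has likelihood exactly one.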

Once the γ-recursion is computed, we can use it to compute a second, complementary, recursion. To this end, let us associate with each node *t* (except for the root node) a matrix ${\alpha}_{ij}^{gpk{k}^{\prime}}\left(t\right)=\mathrm{Pr}({q}_{t}=j,{q}_{t}^{P}=i\mid {\omega}_{p},{\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0})$. It is beneficial to define for each node *t* (except for the root node) a vector ${\beta}_{j}^{gpk{k}^{\prime}}\left(t\right)={\sum}_{i=0}^{1}{\alpha}_{ij}^{gpk{k}^{\prime}}\left(t\right)=\mathrm{Pr}({q}_{t}=j\mid {\omega}_{p},{\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0})$. Upon the computation of α, β is readily computed too. Again, omitting the superscripts, α can be initialized from its definition on the two direct descendants of the root,

$$\alpha \left(\mathrm{D}(0)\right)=\frac{1}{\mathrm{Pr}({\omega}_{p}\mid {\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0})}\left\{\begin{array}{ll}\left(\begin{array}{cc}{\pi}_{0}{\gamma}_{0}\left(\overline{\mathrm{D}}(0)\right){A}_{00}(g,\mathrm{D}(0)) & 0\\ {\pi}_{1}{\gamma}_{1}\left(\overline{\mathrm{D}}(0)\right){A}_{10}(g,\mathrm{D}(0)) & 0\end{array}\right) & \mathrm{D}(0)\in {V}_{0},\phantom{\rule{thickmathspace}{0ex}}{q}_{0}^{\mathrm{D}}=0\\ \left(\begin{array}{cc}0 & {\pi}_{0}{\gamma}_{0}\left(\overline{\mathrm{D}}(0)\right){A}_{01}(g,\mathrm{D}(0))\\ 0 & {\pi}_{1}{\gamma}_{1}\left(\overline{\mathrm{D}}(0)\right){A}_{11}(g,\mathrm{D}(0))\end{array}\right) & \mathrm{D}(0)\in {V}_{0},\phantom{\rule{thickmathspace}{0ex}}{q}_{0}^{\mathrm{D}}=1\\ \left(\begin{array}{cc}{\pi}_{0}{\gamma}_{0}\left(\overline{\mathrm{D}}(0)\right){\stackrel{~}{\gamma}}_{0}\left(\mathrm{D}(0)\right){A}_{00}(g,\mathrm{D}(0)) & {\pi}_{0}{\gamma}_{0}\left(\overline{\mathrm{D}}(0)\right){\stackrel{~}{\gamma}}_{1}\left(\mathrm{D}(0)\right){A}_{01}(g,\mathrm{D}(0))\\ {\pi}_{1}{\gamma}_{1}\left(\overline{\mathrm{D}}(0)\right){\stackrel{~}{\gamma}}_{0}\left(\mathrm{D}(0)\right){A}_{10}(g,\mathrm{D}(0)) & {\pi}_{1}{\gamma}_{1}\left(\overline{\mathrm{D}}(0)\right){\stackrel{~}{\gamma}}_{1}\left(\mathrm{D}(0)\right){A}_{11}(g,\mathrm{D}(0))\end{array}\right) & \mathrm{D}(0)\notin {V}_{0}.\end{array}\right.$$

[12]

Here, D(0) stands for any one of the direct descendants of the root, and D̄(0) is its sibling. For any other internal node, α is computed using the outward-recursion

$$\alpha \left(t\right)=\left(\begin{array}{cc}\hfill {\beta}_{0}\left(\mathrm{P}\left(t\right)\right){\stackrel{~}{\gamma}}_{0}\left(t\right){A}_{00}(g,t)\u2215{\gamma}_{0}\left(t\right)\hfill & \hfill {\beta}_{0}\left(\mathrm{P}\left(t\right)\right){\stackrel{~}{\gamma}}_{1}\left(t\right){A}_{01}(g,t)\u2215{\gamma}_{0}\left(t\right)\hfill \\ \hfill {\beta}_{1}\left(\mathrm{P}\left(t\right)\right){\stackrel{~}{\gamma}}_{0}\left(t\right){A}_{10}(g,t)\u2215{\gamma}_{1}\left(t\right)\hfill & \hfill {\beta}_{1}\left(\mathrm{P}\left(t\right)\right){\stackrel{~}{\gamma}}_{1}\left(t\right){A}_{11}(g,t)\u2215{\gamma}_{1}\left(t\right)\hfill \end{array}\right)$$

[13]

(see **Note 4**).

Finally, for each leaf that is not a descendant of the root,

$$\alpha \left(t\right)=\{\begin{array}{cc}\left(\begin{array}{cc}\hfill {\beta}_{0}\left(\mathrm{P}\left(t\right)\right)\hfill & \hfill 0\hfill \\ \hfill {\beta}_{1}\left(\mathrm{P}\left(t\right)\right)\hfill & \hfill 0\hfill \end{array}\right)\phantom{\rule{1em}{0ex}}\hfill & {q}_{t}=0\hfill \\ \left(\begin{array}{cc}\hfill 0\hfill & \hfill {\beta}_{0}\left(\mathrm{P}\left(t\right)\right)\hfill \\ \hfill 0\hfill & \hfill {\beta}_{1}\left(\mathrm{P}\left(t\right)\right)\hfill \end{array}\right)\phantom{\rule{1em}{0ex}}\hfill & {q}_{t}=1.\hfill \end{array}\phantom{\}}\phantom{\rule{1em}{0ex}}t\in {V}_{0},\mathrm{P}\left(t\right)\ne 0$$

[14]

Again, this recursion can be straightforwardly modified when missing data are present (see **Note 5**).

These inward–outward recursions are the phylogenetic equivalent of the forward–backward recursions known from hidden Markov models, and other versions of them have already been developed (27, 28). The version that we developed here can be shown to be a realization of the junction tree algorithm (29) on rooted bifurcating trees (see **Note 6**).
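
A useful way to see what the α, β-recursion computes is to evaluate the same posteriors by brute-force enumeration of the hidden states. The following sketch does this on a fixed three-leaf toy tree (our own simplification, with one shared transition matrix on all branches); it is feasible only on tiny trees, but handy for sanity-checking a recursion implementation:

```python
from itertools import product

def posterior_by_enumeration(pi, A, pattern):
    """Brute-force reference for the beta-vector of the outward recursion,
    beta_j(t) = Pr(q_t = j | omega_p), on the fixed topology ((sp0, sp1), sp2).
    Hidden nodes: the root q0 and the ancestor q3 of sp0 and sp1.  The same
    2x2 matrix A is used on every branch for simplicity."""
    def leaf(i, parent_state):
        # probability of the observed leaf state given its parent's state
        return A[parent_state][int(pattern[i])]
    joint = {}
    for q0, q3 in product((0, 1), repeat=2):
        joint[(q0, q3)] = (pi[q0] * A[q0][q3]            # root prior, branch to q3
                           * leaf(0, q3) * leaf(1, q3)   # sp0, sp1 hang off q3
                           * leaf(2, q0))                # sp2 hangs off the root
    Z = sum(joint.values())                              # = Pr(omega_p)
    beta_root = [sum(v for (a, b), v in joint.items() if a == j) / Z for j in (0, 1)]
    beta_anc = [sum(v for (a, b), v in joint.items() if b == j) / Z for j in (0, 1)]
    return beta_root, beta_anc

beta_root, beta_anc = posterior_by_enumeration(
    (0.5, 0.5), [[0.9, 0.1], [0.1, 0.9]], "111")
```

The recursion replaces this exponential enumeration with two linear passes over the tree, while producing the same posterior vectors.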

Here we show that the γ-recursion is sufficient to compute the coefficients *w _{gpkk′}*. From the definition, ${w}_{gpk{k}^{\prime}}=\mathrm{Pr}({\rho}_{p}^{\eta}=k,{\rho}_{p}^{\theta}={k}^{\prime}\mid {\omega}_{p},{\Xi}^{0},{\Psi}_{g}^{0},{\Lambda}^{0})$. Using the Bayes formula, Pr(*A*|*B*) = Pr(*A*, *B*)/Pr(*B*), we can write

$$\begin{array}{cc}\hfill {w}_{gpk{k}^{\prime}}& =\frac{\mathrm{Pr}({\rho}_{p}^{\eta}=k,{\rho}_{p}^{\theta}={k}^{\prime},{\omega}_{p}\mid {\Xi}^{0},{\Psi}_{g}^{0},{\Lambda}^{0})}{\sum _{h,{h}^{\prime}}\mathrm{Pr}({\rho}_{p}^{\eta}=h,{\rho}_{p}^{\theta}={h}^{\prime},{\omega}_{p}\mid {\Xi}^{0},{\Psi}_{g}^{0},{\Lambda}^{0})}\hfill \\ \hfill & =\frac{\mathrm{Pr}({\rho}_{p}^{\eta}=k\mid {\Xi}^{0},{\Psi}_{g}^{0},{\Lambda}^{0})\cdot \mathrm{Pr}({\rho}_{p}^{\theta}={k}^{\prime}\mid {\Xi}^{0},{\Psi}_{g}^{0},{\Lambda}^{0})\cdot \mathrm{Pr}({\omega}_{p}\mid {\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0})}{\sum _{h,{h}^{\prime}}\mathrm{Pr}({\rho}_{p}^{\eta}=h\mid {\Xi}^{0},{\Psi}_{g}^{0},{\Lambda}^{0})\cdot \mathrm{Pr}({\rho}_{p}^{\theta}={h}^{\prime}\mid {\Xi}^{0},{\Psi}_{g}^{0},{\Lambda}^{0})\cdot \mathrm{Pr}({\omega}_{p}\mid {\Xi}^{0},{\Psi}_{gh{h}^{\prime}}^{0})}.\hfill \end{array}$$

But $\mathrm{Pr}({\rho}_{p}^{\eta}=k\mid {\Xi}^{0},{\Psi}_{g}^{0},{\Lambda}^{0})$ is just the current estimate of the probability that the gain rate variable has the value ${r}_{k}^{\eta}$, namely ${\left({f}_{k}^{\eta}\right)}^{0}$. Similarly, $\mathrm{Pr}({\rho}_{p}^{\theta}={k}^{\prime}\mid {\Xi}^{0},{\Psi}_{g}^{0},{\Lambda}^{0})$ is just ${\left({f}_{{k}^{\prime}}^{\theta}\right)}^{0}$. Therefore, the expression for the coefficients ${w}_{gpk{k}^{\prime}}$ reduces to

$${w}_{gpk{k}^{\prime}}=\frac{{\left({f}_{k}^{\eta}\right)}^{0}{\left({f}_{{k}^{\prime}}^{\theta}\right)}^{0}\mathrm{Pr}({\omega}_{p}\mid {\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0})}{\sum _{h,{h}^{\prime}}{\left({f}_{h}^{\eta}\right)}^{0}{\left({f}_{{h}^{\prime}}^{\theta}\right)}^{0}\mathrm{Pr}({\omega}_{p}\mid {\Xi}^{0},{\Psi}_{gh{h}^{\prime}}^{0})}.$$

[15]

The function $\mathrm{Pr}({\omega}_{p}\mid {\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0})$ is the likelihood of observing pattern ω_{p} for gain and loss rate variables ${r}_{k}^{\eta}$ and ${r}_{{k}^{\prime}}^{\theta}$, respectively. This is readily computed upon completion of the γ-recursion, using **Eq. [11]**.
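
Given the per-category pattern likelihoods, **Eq. [15]** is a simple normalization; a minimal sketch (with our own argument names) is:

```python
def posterior_rate_weights(f_eta, f_theta, lik):
    """Posterior rate-category weights w_{gpkk'} of Eq. [15].  f_eta and
    f_theta hold the current category probabilities (f_k^eta)^0 and
    (f_k'^theta)^0; lik[k][kp] is the pattern likelihood of Eq. [11]
    computed under rate categories (k, kp)."""
    w = [[f_eta[k] * f_theta[kp] * lik[k][kp]
          for kp in range(len(f_theta))]
         for k in range(len(f_eta))]
    Z = sum(sum(row) for row in w)       # the denominator of Eq. [15]
    return [[x / Z for x in row] for row in w]

# two gain and two loss categories; the likelihoods favour the fast-gain row
w = posterior_rate_weights([0.3, 0.7], [0.5, 0.5],
                           [[0.001, 0.002], [0.004, 0.008]])
```

When all category likelihoods are equal, the weights fall back to the outer product of the priors, as they should.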

Here we show that computing the coefficients ${\mathcal{Q}}_{gpk{k}^{\prime}}$ requires the α, β-recursion. By definition,

$${\mathcal{Q}}_{gpk{k}^{\prime}}=\sum _{\sigma}\mathrm{Pr}(\sigma \mid {\omega}_{p},{\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0})\cdot [\text{log}{f}_{k}^{\eta}+\text{log}{f}_{{k}^{\prime}}^{\theta}+\text{log}\phantom{\rule{thinmathspace}{0ex}}\mathrm{Pr}({\omega}_{p},\sigma \mid \Xi ,{\Psi}_{gk{k}^{\prime}})].$$

The probability $\mathrm{Pr}({\omega}_{p},\sigma \mid \Xi ,{\Psi}_{gk{k}^{\prime}})$ is just the likelihood of a particular realization of the tree, thus from **Eq. [1]**

$$\text{log}\phantom{\rule{thinmathspace}{0ex}}\mathrm{Pr}({\omega}_{p},\sigma \mid \Xi ,{\Psi}_{gk{k}^{\prime}})=\sum _{i=0}^{1}\delta ({q}_{0},i)\cdot \text{log}\phantom{\rule{thinmathspace}{0ex}}{\pi}_{i}+\sum _{i,j=0}^{1}\sum _{t=1}^{N-1}\delta ({q}_{t},j)\delta ({q}_{t}^{P},i)\cdot \text{log}\phantom{\rule{thinmathspace}{0ex}}{A}_{ij}(g,t).$$

[16]

Here, δ(*a, b*) is the Kronecker delta function, which is 1 for *a* = *b* and 0 otherwise. Denote the expectation over $\mathrm{Pr}(\sigma \mid {\omega}_{p},{\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0})$ by E_{σ}. Applying it to **Eq. [16]**, we get

$${E}_{\sigma}\left[\text{log}\phantom{\rule{thinmathspace}{0ex}}\mathrm{Pr}({\omega}_{p},\sigma \mid \Xi ,{\Psi}_{gk{k}^{\prime}})\right]=\sum _{i=0}^{1}\text{log}\phantom{\rule{thinmathspace}{0ex}}{\pi}_{i}\cdot {E}_{\sigma}\left[\delta ({q}_{0},i)\right]+\sum _{i,j=0}^{1}\sum _{t=1}^{N-1}\text{log}\phantom{\rule{thinmathspace}{0ex}}{A}_{ij}(g,t)\cdot {E}_{\sigma}\left[\delta ({q}_{t},j)\delta ({q}_{t}^{P},i)\right].$$

But ${E}_{\sigma}\left[\delta ({q}_{0},i)\right]=\mathrm{Pr}({q}_{0}=i\mid {\omega}_{p},{\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0})={\beta}_{i}\left(0\right)$, and similarly ${E}_{\sigma}\left[\delta ({q}_{t},j)\delta ({q}_{t}^{P},i)\right]={\alpha}_{ij}\left(t\right)$. Hence, ${\mathcal{Q}}_{gpk{k}^{\prime}}$ is given by

$$\begin{array}{cc}\hfill {\mathcal{Q}}_{gpk{k}^{\prime}}& =\sum _{\sigma}\mathrm{Pr}(\sigma \mid {\omega}_{p},{\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0})[\text{log}{f}_{k}^{\eta}+\text{log}{f}_{{k}^{\prime}}^{\theta}+\text{log}\phantom{\rule{thinmathspace}{0ex}}\mathrm{Pr}({\omega}_{p},\sigma \mid \Xi ,{\Psi}_{gk{k}^{\prime}})]\hfill \\ \hfill & =\text{log}{f}_{k}^{\eta}+\text{log}{f}_{{k}^{\prime}}^{\theta}+\sum _{i=0}^{1}{\beta}_{i}\left(0\right)\text{log}\phantom{\rule{thinmathspace}{0ex}}{\pi}_{i}+\sum _{i,j=0}^{1}\sum _{t=1}^{N-1}{\alpha}_{ij}\left(t\right)\text{log}\phantom{\rule{thinmathspace}{0ex}}{A}_{ij}(g,t).\hfill \end{array}$$

[17]

Substituting **Eq. [17]** in **Eq. [8]**, we obtain an explicit form of the function whose maximization guarantees stepping uphill in the likelihood landscape,

$$\begin{array}{cc}\hfill \mathcal{Q}=& \sum _{g=1}^{G}\sum _{p=1}^{\Omega}\sum _{k=1}^{{K}_{\eta}}\sum _{{k}^{\prime}=1}^{{K}_{\theta}}{n}_{gp}{w}_{gpk{k}^{\prime}}(\text{log}{f}_{k}^{\eta}+\text{log}{f}_{{k}^{\prime}}^{\theta})+\hfill \\ \hfill & +\sum _{g=1}^{G}\sum _{p=1}^{\Omega}\sum _{k=1}^{{K}_{\eta}}\sum _{{k}^{\prime}=1}^{{K}_{\theta}}{n}_{gp}{w}_{gpk{k}^{\prime}}[{\beta}_{0}^{gpk{k}^{\prime}}\left(0\right)\text{log}{\pi}_{0}+{\beta}_{1}^{gpk{k}^{\prime}}\left(0\right)\text{log}\phantom{\rule{thinmathspace}{0ex}}{\pi}_{1}]+\hfill \\ \hfill & +\sum _{g=1}^{G}\sum _{p=1}^{\Omega}\sum _{k=1}^{{K}_{\eta}}\sum _{{k}^{\prime}=1}^{{K}_{\theta}}\sum _{t=1}^{N-1}{n}_{gp}{w}_{gpk{k}^{\prime}}{\alpha}_{00}^{gpk{k}^{\prime}}\left(t\right)\text{log}[1-{\xi}_{t}(1-{e}^{-{\eta}_{gk}{\Delta}_{t}})]+\hfill \\ \hfill & +\sum _{g=1}^{G}\sum _{p=1}^{\Omega}\sum _{k=1}^{{K}_{\eta}}\sum _{{k}^{\prime}=1}^{{K}_{\theta}}\sum _{t=1}^{N-1}{n}_{gp}{w}_{gpk{k}^{\prime}}{\alpha}_{01}^{gpk{k}^{\prime}}\left(t\right)[\text{log}{\xi}_{t}+\text{log}(1-{e}^{-{\eta}_{gk}{\Delta}_{t}})]+\hfill \\ \hfill & +\sum _{g=1}^{G}\sum _{p=1}^{\Omega}\sum _{k=1}^{{K}_{\eta}}\sum _{{k}^{\prime}=1}^{{K}_{\theta}}\sum _{t=1}^{N-1}{n}_{gp}{w}_{gpk{k}^{\prime}}{\alpha}_{10}^{gpk{k}^{\prime}}\left(t\right)\text{log}[1-(1-{\varphi}_{t}){e}^{-{\theta}_{g{k}^{\prime}}{\Delta}_{t}}]+\hfill \\ \hfill & +\sum _{g=1}^{G}\sum _{p=1}^{\Omega}\sum _{k=1}^{{K}_{\eta}}\sum _{{k}^{\prime}=1}^{{K}_{\theta}}\sum _{t=1}^{N-1}{n}_{gp}{w}_{gpk{k}^{\prime}}{\alpha}_{11}^{gpk{k}^{\prime}}\left(t\right)[\text{log}(1-{\varphi}_{t})-{\theta}_{g{k}^{\prime}}{\Delta}_{t}].\hfill \end{array}$$

[18]

Actually, any increase in $\mathcal{Q}$ is sufficient to guarantee an increase in the likelihood, so a precise maximization of $\mathcal{Q}$ is not essential. We therefore speed up the computations by performing a low-tolerance maximization with respect to each of the parameters individually. Except for the parameters λ_{η} and λ_{θ}, it is easy to differentiate $\mathcal{Q}$ twice with respect to any parameter, which lends itself to using simple zero-finding algorithms; we chose the Newton–Raphson algorithm (30). Maximizing $\mathcal{Q}$ with respect to the shape parameters λ_{η} and λ_{θ} is more involved, as $\mathcal{Q}$ depends on these parameters only through the discrete approximation of the rate variability distributions, **Eq. [3]** (see **Note 7**).
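As an illustration of this coordinate-wise scheme, the sketch below maximizes $\mathcal{Q}$ with respect to a single loss-rate parameter θ on one branch, keeping only the two terms of **Eq. [18]** that depend on it and setting the φ_{t} factor to zero. It is a minimal sketch, not the implementation used in the study; the coefficients `c1` and `c2` are hypothetical stand-ins for the weighted counts multiplying the α_{10} and α_{11} terms, and `delta` for the branch length Δ_{t}.

```python
import math

def newton_maximize_theta(c1, c2, delta, theta0=1.0, tol=1e-10, max_iter=100):
    """Maximize Q(theta) = c1*log(1 - exp(-theta*delta)) - c2*theta*delta
    by Newton-Raphson on its first derivative; Q is strictly concave in theta."""
    theta = theta0
    for _ in range(max_iter):
        u = math.exp(-theta * delta)
        d1 = delta * (c1 * u / (1.0 - u) - c2)        # Q'(theta)
        d2 = -c1 * delta ** 2 * u / (1.0 - u) ** 2    # Q''(theta) < 0
        theta = max(theta - d1 / d2, 1e-12)           # Newton step, kept positive
        if abs(d1) < tol:
            break
    return theta

# For this toy objective, Q' vanishes in closed form at
# theta* = -log(c2 / (c1 + c2)) / delta, so the iteration can be checked.
theta_hat = newton_maximize_theta(c1=3.0, c2=1.0, delta=0.5)
```

A loose tolerance suffices in practice, because any increase in $\mathcal{Q}$ already guarantees an uphill step in the likelihood.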

^{1}If we replace the formal summing over all states of ${\rho}_{p}^{\eta}$ and ${\rho}_{p}^{\theta}$ in **Eq. [6]** by a direct sum, we get

$${\mathcal{Q}}_{gp}(\Xi ,{\Psi}_{g},\Lambda ,{\Xi}^{0},{\Psi}_{g}^{0},{\Lambda}^{0})=\sum _{k=1}^{{K}_{\eta}}\sum _{{k}^{\prime}=1}^{{K}_{\theta}}\sum _{\sigma}\mathrm{Pr}(\sigma ,{\rho}_{p}^{\eta}=k,{\rho}_{p}^{\theta}\phantom{\rule{0ex}{0ex}}={k}^{\prime}\mid {\omega}_{p},{\Xi}^{0},{\Psi}_{g}^{0},{\Lambda}^{0})\phantom{\rule{0ex}{0ex}}\text{log}\phantom{\rule{thinmathspace}{0ex}}\mathrm{Pr}({\omega}_{p},\sigma ,{\rho}_{p}^{\eta}=k,{\rho}_{p}^{\theta}={k}^{\prime}\mid \Xi ,{\Psi}_{g},\Lambda ).$$

[19]

Using our notational conventions, we can write the first term in **Eq. [19]** as

$$\mathrm{Pr}(\sigma ,{\rho}_{p}^{\eta}=k,{\rho}_{p}^{\theta}={k}^{\prime}\mid {\omega}_{p},{\Xi}^{0},{\Psi}_{g}^{0},{\Lambda}^{0})=\mathrm{Pr}({\rho}_{p}^{\eta}=k,{\rho}_{p}^{\theta}={k}^{\prime}\mid {\omega}_{p},{\Xi}^{0},{\Psi}_{g}^{0},{\Lambda}^{0})\cdot \mathrm{Pr}(\sigma \mid {\omega}_{p},{\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0}),$$

[20]

and the second term as

$$\text{log}\phantom{\rule{thinmathspace}{0ex}}\mathrm{Pr}({\omega}_{p},\sigma ,{\rho}_{p}^{\eta}=k,{\rho}_{p}^{\theta}={k}^{\prime}\mid \Xi ,{\Psi}_{g},\Lambda )\phantom{\rule{0ex}{0ex}}=\text{log}\phantom{\rule{thinmathspace}{0ex}}\mathrm{Pr}({\rho}_{p}^{\eta}=k\mid \Xi ,{\Psi}_{g},\Lambda )+\phantom{\rule{0ex}{0ex}}+\text{log}\phantom{\rule{thinmathspace}{0ex}}\mathrm{Pr}({\rho}_{p}^{\theta}={k}^{\prime}\mid \Xi ,{\Psi}_{g},\Lambda )\phantom{\rule{0ex}{0ex}}+\text{log}\phantom{\rule{thinmathspace}{0ex}}\mathrm{Pr}({\omega}_{p},\sigma \mid \Xi ,{\Psi}_{gk{k}^{\prime}})\phantom{\rule{0ex}{0ex}}=\text{log}{f}_{k}^{\eta}+\text{log}{f}_{{k}^{\prime}}^{\theta}+\text{log}\phantom{\rule{thinmathspace}{0ex}}\mathrm{Pr}({\omega}_{p},\sigma \mid \Xi ,{\Psi}_{gk{k}^{\prime}}).$$

[21]

Substituting **Eqs. [20]** and **[21]** back in **Eq. [19]** gives the desired result.

^{2}We expand

$$\begin{array}{cc}\hfill {\gamma}_{i}\left(t\right)& =\mathrm{Pr}({V}_{t}\mid {q}_{t}^{P}=i)=\mathrm{Pr}({V}_{t}^{L},{V}_{t}^{R}\mid {q}_{t}^{P}=i)\hfill \\ \hfill & =\sum _{j=0}^{1}\mathrm{Pr}({V}_{t}^{L},{V}_{t}^{R},{q}_{t}=j\mid {q}_{t}^{P}=i)=\hfill \\ \hfill & =\sum _{j=0}^{1}\mathrm{Pr}({q}_{t}=j\mid {q}_{t}^{P}=i)\cdot \mathrm{Pr}({V}_{t}^{L}\mid {q}_{t}=j,{q}_{t}^{P}=i)\cdot \mathrm{Pr}({V}_{t}^{R}\mid {V}_{t}^{L},{q}_{t}=j,{q}_{t}^{P}=i).\hfill \end{array}$$

[22]

The first term is simply the definition of ${A}_{ij}(g,t)$. Given *q _{t}*, ${V}_{t}^{L}$ is independent of ${q}_{t}^{P}$, so the second term is just $\mathrm{Pr}({V}_{t}^{L}\mid {q}_{t}=j)={\gamma}_{j}\left({t}^{L}\right)$. By a similar argument, the third term is just $\mathrm{Pr}({V}_{t}^{R}\mid {q}_{t}=j)={\gamma}_{j}\left({t}^{R}\right)$. Substituting these results back into **Eq. [22]** yields the desired recursion.

^{3}One of the appealing features of this recursion is that it makes missing data easy to handle. Only a single option has to be added to the initialization phase, **Eq. [9]**,

$$\gamma (t\in {V}_{0})=\left(\begin{array}{c}\hfill 1\hfill \\ \hfill 1\hfill \end{array}\right)\phantom{\rule{1em}{0ex}}{q}_{t}=\ast .$$
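Together with this initialization, the post-order recursion of **Eq. [22]** amounts to ordinary Felsenstein-style pruning on a binary presence/absence character. The sketch below is illustrative only: the tree encoding (`children`), the per-branch transition matrices `A`, and the use of `None` for a missing leaf (the case *q _{t}* = ∗) are hypothetical stand-ins for the paper's data structures.

```python
def subtree_likelihoods(node, children, A, leaf_state):
    """Return the vector (Pr(V_t | q_t = 0), Pr(V_t | q_t = 1)) by post-order
    recursion; A[c][i][j] is the 0/1 transition probability on the branch to c."""
    if node not in children:                       # leaf
        s = leaf_state[node]
        if s is None:                              # missing data, q_t = *
            return [1.0, 1.0]
        return [1.0 if j == s else 0.0 for j in (0, 1)]
    g = [1.0, 1.0]
    for c in children[node]:
        gc = subtree_likelihoods(c, children, A, leaf_state)
        for j in (0, 1):                           # Eq. [22]: sum over child state
            g[j] *= A[c][j][0] * gc[0] + A[c][j][1] * gc[1]
    return g

# Two leaves hanging off the root; pattern "absent, present".
children = {"root": ["x", "y"]}
A = {"x": [[0.9, 0.1], [0.2, 0.8]], "y": [[0.9, 0.1], [0.2, 0.8]]}
pi = [0.5, 0.5]
g_root = subtree_likelihoods("root", children, A, {"x": 0, "y": 1})
lik = pi[0] * g_root[0] + pi[1] * g_root[1]
```

Marking a leaf `None` makes the same call marginalize over its unobserved state, exactly as prescribed by the added initialization option.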

^{4}To prove this recursion, let us start with the definition of α,

$$\begin{array}{cc}\hfill {\alpha}_{ij}\left(t\right)& =\mathrm{Pr}({q}_{t}=j,{q}_{t}^{P}=i\mid {\omega}_{p})=\mathrm{Pr}({q}_{t}=j,{q}_{t}^{P}=i\mid {V}_{0})\hfill \\ \hfill & =\mathrm{Pr}({q}_{t}^{P}=i\mid {V}_{0})\cdot \mathrm{Pr}({q}_{t}=j\mid {q}_{t}^{P}=i,{V}_{0})\hfill \\ \hfill & ={\beta}_{i}\left(\mathrm{P}\left(t\right)\right)\cdot \mathrm{Pr}({q}_{t}=j\mid {q}_{t}^{P}=i,{V}_{0}).\hfill \end{array}$$

[23]

Let us make the decomposition ${V}_{0}={V}_{t}+{\overline{V}}_{t}$, where ${\overline{V}}_{t}$ denotes the observed data outside the subtree rooted at *t*. Given ${q}_{t}^{P}$, *q _{t}* is independent of ${\overline{V}}_{t}$, so **Eq. [23]** becomes

$${\alpha}_{ij}\left(t\right)={\beta}_{i}\left(P\left(t\right)\right)\cdot \mathrm{Pr}({q}_{t}=j\mid {q}_{t}^{P}=i,{V}_{t}).$$

[24]

From Bayes' formula,

$$\begin{array}{cc}\hfill \mathrm{Pr}({q}_{t}=j\mid {q}_{t}^{P}=i,{V}_{t})& =\frac{\mathrm{Pr}({q}_{t}=j,{V}_{t}\mid {q}_{t}^{P}=i)}{\mathrm{Pr}({V}_{t}\mid {q}_{t}^{P}=i)}\hfill \\ \hfill & =\frac{\mathrm{Pr}({q}_{t}=j\mid {q}_{t}^{P}=i)\cdot \mathrm{Pr}({V}_{t}\mid {q}_{t}=j,{q}_{t}^{P}=i)}{{\gamma}_{i}\left(t\right)}\hfill \\ \hfill & =\frac{{A}_{ij}(g,t)}{{\gamma}_{i}\left(t\right)}\cdot \mathrm{Pr}({V}_{t}\mid {q}_{t}=j,{q}_{t}^{P}=i).\hfill \end{array}$$

[25]

But given *q _{t}*, *V _{t}* is independent of ${q}_{t}^{P}$:

$$\mathrm{Pr}({V}_{t}\mid {q}_{t}=j,{q}_{t}^{P}=i)=\mathrm{Pr}({V}_{t}\mid {q}_{t}=j)={\stackrel{~}{\gamma}}_{j}\left(t\right).$$

[26]

Combining **Eqs. [25]** and **[26]** in **Eq. [24]**, we get

$${\alpha}_{ij}\left(t\right)=\frac{{\stackrel{~}{\gamma}}_{j}\left(t\right){\beta}_{i}\left(\mathrm{P}\left(t\right)\right)}{{\gamma}_{i}\left(t\right)}{A}_{ij}(g,t),$$

which is just another form of writing **Eq. [13]**.
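Once β, γ, and $\stackrel{~}{\gamma}$ have been tabulated, this identity gives each α(*t*) in constant time per branch. The helper below is a sketch with hypothetical argument names, not the paper's code; it also illustrates the consistency check Σ_{j} α_{ij}(*t*) = β_{i}(P(*t*)), which follows from ${\gamma}_{i}\left(t\right)={\sum}_{j}{A}_{ij}(g,t){\stackrel{~}{\gamma}}_{j}\left(t\right)$.

```python
def alpha_matrix(beta_parent, A, gamma_tilde):
    """alpha[i][j] = gamma_tilde[j] * beta_parent[i] * A[i][j] / gamma[i],
    where gamma[i] = sum_j A[i][j] * gamma_tilde[j] (cf. Eq. [13])."""
    gamma = [sum(A[i][j] * gamma_tilde[j] for j in (0, 1)) for i in (0, 1)]
    return [[gamma_tilde[j] * beta_parent[i] * A[i][j] / gamma[i]
             for j in (0, 1)] for i in (0, 1)]

alpha = alpha_matrix(beta_parent=[0.3, 0.7],
                     A=[[0.9, 0.1], [0.2, 0.8]],
                     gamma_tilde=[0.5, 0.25])
```

Row *i* of α sums to β_{i}(P(*t*)), and all four entries sum to one whenever β does, which makes the function easy to unit-test.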

^{5}When missing data are present, two simple modifications are required. First, we have to add to the initialization phase **Eq. [12]** an option

$$\alpha \left(\mathrm{D}\left(0\right)\right)=\frac{1}{\mathrm{Pr}({\omega}_{p}\mid {\Xi}^{0},{\Psi}_{gk{k}^{\prime}}^{0})}\left\{\begin{array}{cc}\hfill {\pi}_{0}{\gamma}_{0}\left[\overline{\mathrm{D}}\left(0\right)\right]{A}_{00}[g,\mathrm{D}\left(0\right)]\hfill & \hfill {\pi}_{0}{\gamma}_{0}\left[\overline{\mathrm{D}}\left(0\right)\right]{A}_{01}[g,\mathrm{D}\left(0\right)]\hfill \\ \hfill {\pi}_{1}{\gamma}_{1}\left[\overline{\mathrm{D}}\left(0\right)\right]{A}_{10}[g,\mathrm{D}\left(0\right)]\hfill & \hfill {\pi}_{1}{\gamma}_{1}\left[\overline{\mathrm{D}}\left(0\right)\right]{A}_{11}[g,\mathrm{D}\left(0\right)]\hfill \end{array}\right\}\phantom{\rule{1em}{0ex}}\mathrm{D}\left(0\right)\in {V}_{0},{q}_{0}^{D}=\ast $$

Second, we have to add to the finalization phase **Eq. [14]** an option

$$\alpha \left(t\right)=\left\{\begin{array}{cc}\hfill {\beta}_{0}\left[\mathrm{P}\left(t\right)\right]{A}_{00}(g,t)\hfill & \hfill {\beta}_{0}\left[\mathrm{P}\left(t\right)\right]{A}_{01}(g,t)\hfill \\ \hfill {\beta}_{1}\left[\mathrm{P}\left(t\right)\right]{A}_{10}(g,t)\hfill & \hfill {\beta}_{1}\left[\mathrm{P}\left(t\right)\right]{A}_{11}(g,t)\hfill \end{array}\right\}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}{q}_{t}=\ast .$$

^{6}The junction tree algorithm is a scheme to compute the marginal probabilities of the maximal cliques of a graph by means of belief propagation on a modified junction tree. Indeed, the matrix α computes the marginal probabilities of the pairs (*t*, P(*t*)), and such pairs are precisely the maximal cliques of rooted bifurcating trees.

^{7}In our implementation, we used Yang's quantile method (24) to compute the discrete levels of the gamma distributions such that each level has equal probability. Formally, ${f}_{1}^{\eta}=\nu ,{f}_{k}^{\eta}=(1-\nu )\u2215({K}_{\eta}-1)$ for *k* = 2, . . . , *K _{η}*, and ${f}_{k}^{\theta}=1\u2215{K}_{\theta}$ for *k* = 1, . . . , *K _{θ}*.
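The discretization itself can be sketched as follows: an equal-probability (median) variant of Yang's quantile method (24) for a mean-one gamma distribution with shape parameter λ. This is an illustrative, self-contained implementation (the regularized incomplete gamma function is computed by its series expansion and inverted by bisection), not the code used in the study.

```python
import math

def gamma_cdf(x, a):
    """Regularized lower incomplete gamma function P(a, x), by series expansion."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    n = 1
    while abs(term) > 1e-16 * total:
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def gamma_ppf(p, a):
    """Invert gamma_cdf by bisection."""
    hi = 1.0
    while gamma_cdf(hi, a) < p:        # bracket the quantile
        hi *= 2.0
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, a) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def discrete_gamma_rates(shape, K):
    """K equal-probability rate categories for a mean-one gamma distribution:
    each category is represented by its median, then renormalized to mean one."""
    raw = [gamma_ppf((2 * k - 1) / (2.0 * K), shape) / shape
           for k in range(1, K + 1)]
    mean = sum(raw) / K
    return [r / mean for r in raw]
```

For λ = 1 the gamma distribution reduces to the exponential, whose quantiles −log(1 − *p*) give a convenient check on the routine.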

1. Nixon JE, Wang A, Morrison HG, McArthur AG, Sogin ML, Loftus BJ, Samuelson J. A spliceosomal intron in Giardia lamblia. Proc Natl Acad Sci U S A. 2002;99:3359–3361. [PubMed]

2. Vanacova S, Yan W, Carlton JM, Johnson PJ. Spliceosomal introns in the deep-branching eukaryote Trichomonas vaginalis. Proc Natl Acad Sci U S A. 2005;102:4430–4435. [PubMed]

3. Simpson AG, MacQuarrie EK, Roger AJ. Early origin of canonical introns. Nature. 2002;419:270. [PubMed]

4. Collins L, Penny D. Complex spliceosomal organization ancestral to extant eukaryotes. Mol Biol Evol. 2005;22:1053–1066. [PubMed]

5. Lynch M, Richardson AO. The evolution of spliceosomal introns. Curr Opin Genet Dev. 2002;12:701–710. [PubMed]

6. Roy SW, Gilbert W. The evolution of spliceosomal introns: patterns, puzzles and progress. Nat Rev Genet. 2006;7:211–221. [PubMed]

7. Rogozin IB, Wolf YI, Sorokin AV, Mirkin BG, Koonin EV. Remarkable interkingdom conservation of intron positions and massive, lineage-specific intron loss and gain in eukaryotic evolution. Curr Biol. 2003;13:1512–1517. [PubMed]

8. Roy SW, Gilbert W. Complex early genes. Proc Natl Acad Sci U S A. 2005;102:1986–1991. [PubMed]

9. Roy SW, Gilbert W. Rates of intron loss and gain: implications for early eukaryotic evolution. Proc Natl Acad Sci U S A. 2005;102:5773–5778. [PubMed]

10. Csuros M. Likely scenarios of intron evolution. In: McLysaght A, Huson D, editors. Proc. RECOMB 2005 Comparative Genomics International Workshop (RCG 2005). Lecture Notes in Bioinformatics. 2005;3678:47–60.

11. Qiu WG, Schisler N, Stoltzfus A. The evolutionary gain of spliceosomal introns: sequence and phase preferences. Mol Biol Evol. 2004;21:1252–1263. [PubMed]

12. Fedorov A, Roy SW, Fedorova L, Gilbert W. Mystery of intron gain. Genome Res. 2003;13:2236–2241. [PubMed]

13. Cho S, Jin SW, Cohen A, Ellis RE. A phylogeny of caenorhabditis reveals frequent loss of introns during nematode evolution. Genome Res. 2004;14:1207–1220. [PubMed]

14. Roy SW, Hartl DL. Very little intron loss/gain in Plasmodium: intron loss/gain mutation rates and intron number. Genome Res. 2006;16:750–756. [PubMed]

15. Jeffares DC, Mourier T, Penny D. The biology of intron gain and loss. Trends Genet. 2006;22:16–22. [PubMed]

16. Nguyen HD, Yoshihama M, Kenmochi N. New maximum likelihood estimators for eukaryotic intron evolution. PLoS Comput Biol. 2005;1:e79. [PMC free article] [PubMed]

17. Nei M, Chakraborty R, Fuerst PA. Infinite allele model with varying mutation rate. Proc Natl Acad Sci U S A. 1976;73:4164–4168. [PubMed]

18. Uzzell T, Corbin KW. Fitting discrete probability distributions to evolutionary events. Science. 1971;172:1089–1096. [PubMed]

19. Dibb NJ. Proto-splice site model of intron origin. J Theor Biol. 1991;151:405–416. [PubMed]

20. Dibb NJ, Newman AJ. Evidence that introns arose at proto-splice sites. EMBO J. 1989;8:2015–2021. [PubMed]

21. Sverdlov AV, Rogozin IB, Babenko VN, Koonin EV. Reconstruction of ancestral protosplice sites. Curr Biol. 2004;14:1505–1508. [PubMed]

22. Jordan MI, editor. Learning in Graphical Models. Kluwer Academic Publishers; Boston, MA: 1998.

23. Jin L, Nei M. Limitations of the evolutionary parsimony method of phylogenetic analysis. Mol Biol Evol. 1990;7:82–102. [PubMed]

24. Yang Z. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J Mol Evol. 1994;39:306–314. [PubMed]

25. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Statist Soc B. 1977;39:1–38.

26. Felsenstein J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol. 1981;17:368–376. [PubMed]

27. Friedman N, Ninio M, Pe'er I, Pupko T. A structural EM algorithm for phylogenetic inference. J Comput Biol. 2002;9:331–353. [PubMed]

28. Siepel A, Haussler D. Phylogenetic estimation of context-dependent substitution rates by maximum likelihood. Mol Biol Evol. 2004;21:468–488. [PubMed]

29. Castillo E, Gutierrez JM, Hadi AS. Expert systems and probabilistic network models (Monographs in Computer Science) Springer; New York: 1996.

30. Press WH, Flannery BP, Teukolsky SA, Vetterling WT. Numerical recipes in C: The art of scientific computing. 2nd ed. Cambridge University Press; New York: 1992.
