Transmission lies at the interface of human immunodeficiency virus type 1 (HIV-1) evolution within and among hosts and separates distinct selective pressures that impose differences in both the mode of diversification and the tempo of evolution. In the absence of comprehensive direct comparative analyses of the evolutionary processes at different biological scales, our understanding of how fast within-host HIV-1 evolutionary rates translate into lower rates at the between-host level remains incomplete. Here, we address this by analyzing pol and env data from a large HIV-1 subtype C transmission chain for which both the timing and the direction are known for most transmission events. To this end, we develop a new transmission model in a Bayesian genealogical inference framework and demonstrate how to constrain the viral evolutionary history to be compatible with the transmission history while simultaneously inferring the within-host evolutionary and population dynamics. We show that accommodating a transmission bottleneck affords the best fit to our data, but the sparse within-host HIV-1 sampling prevents accurate quantification of the concomitant loss in genetic diversity. We draw inference under the transmission model to estimate HIV-1 evolutionary rates among epidemiologically related patients and demonstrate that they lie between fast intra-host rates and lower rates among epidemiologically unrelated individuals infected with HIV subtype C. Using a new molecular clock approach, we quantify and find support for a lower evolutionary rate along branches that accommodate a transmission event or branches that represent the entire backbone of transmitted lineages in our transmission history.
Finally, we recover the rate differences at the different biological scales for both synonymous and non-synonymous substitution rates, which is only compatible with the ‘store and retrieve’ hypothesis positing that viruses stored early in latently infected cells preferentially transmit or establish new infections upon reactivation.
Since HIV was discovered three decades ago, its epidemic has unfolded into one of the most devastating pandemics in human history. When HIV replication cannot be completely inhibited, the fast-evolving retrovirus continuously evades intra-host immune and drug selective pressure, but diversifies according to more neutral epidemiological dynamics at the interhost level. Limited evidence suggests that the virus may evolve faster within a single host than in a population of hosts, and various hypotheses have been put forward to explain this phenomenon. Here, we develop a new computational approach aimed at integrating host transmission information with pathogen genealogical reconstructions. We apply this approach to comprehensive sequence data sets sampled from a large HIV-1 subtype C transmission chain, and in addition to providing several insights into the reconstruction of HIV-1 transmission histories and their associated population dynamics, we find that transmission decreases the HIV-1 evolutionary rate. The fact that we also identify this decline for substitutions that do not alter the encoded amino acid provides evidence against hypotheses that invoke selective forces. Instead, our findings support earlier reports that new infections start preferentially with less evolved variants, which may be stored in latently infected cells; this may vary among HIV-1 subtypes.
The factors that determine the origin and fate of cross-species transmission events remain unclear for the majority of human pathogens, despite being central for the development of predictive models and assessing the efficacy of prevention strategies. Here, we describe a flexible Bayesian statistical framework to reconstruct virus transmission between different host species based on viral gene sequences, while simultaneously testing and estimating the contribution of several potential predictors of cross-species transmission. Specifically, we use a generalized linear model extension of phylogenetic diffusion to perform Bayesian model averaging over candidate predictors. By further extending this model with branch partitioning, we allow for distinct host transition processes on external and internal branches, thus discriminating between recent cross-species transmissions, many of which are likely to result in dead-end infections, and host shifts that reflect successful onwards transmission in the new host species. Our approach corroborates genetic distance between hosts as a key determinant of both host shifts and cross-species transmissions of rabies virus in North American bats. Furthermore, our results indicate that geographical range overlap is a modest predictor for cross-species transmission, but not for host shifts. Although our evolutionary framework focused on the multi-host reservoir dynamics of bat rabies virus, it is applicable to other pathogens and to other discrete state transition processes.
Bayesian diffusion models; branch partitioning; cross-species transmission; rabies virus
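The generalized linear model (GLM) extension with branch partitioning can be sketched as follows. The sketch assumes the usual log-linear form, in which each host-transition rate is the exponential of a weighted sum of predictors. Binary indicators switch predictors in or out for Bayesian model averaging, and external and internal branches get separate coefficient sets. The host names, predictor values, and the `internal`/`external` parameter layout are hypothetical. In a real analysis the coefficients and indicators would be sampled by MCMC jointly with the phylogeny rather than fixed by hand.

```python
import math


def glm_rates(predictors, beta, delta):
    """Off-diagonal rates q_ij = exp(sum_k delta_k * beta_k * x_ijk).

    predictors maps an ordered host pair (donor, recipient) to its predictor values;
    delta_k in {0, 1} switches predictor k in or out (Bayesian model averaging).
    """
    return {
        pair: math.exp(sum(d * b * x for x, b, d in zip(xs, beta, delta)))
        for pair, xs in predictors.items()
    }


def partitioned_rates(predictors, params, external):
    """Branch partitioning: external and internal branches use their own (beta, delta)."""
    beta, delta = params["external" if external else "internal"]
    return glm_rates(predictors, beta, delta)


def inclusion_probability(delta_samples, k):
    """Posterior inclusion probability of predictor k from MCMC samples of delta."""
    return sum(d[k] for d in delta_samples) / len(delta_samples)


# Hypothetical standardized predictors: [genetic distance between hosts, range overlap]
predictors = {("bat_A", "bat_B"): [1.0, 0.5], ("bat_B", "bat_A"): [-1.0, 0.0]}
params = {"external": ([-0.5, 0.8], [1, 1]), "internal": ([-0.5, 0.8], [1, 0])}
print(partitioned_rates(predictors, params, external=True))
```

In this layout, a predictor such as range overlap can be switched on for external branches (cross-species transmissions) and off for internal branches (host shifts). That is the kind of contrast the branch-partitioned model is designed to test.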
Bayesian phylogeographic methods simultaneously integrate geographical and evolutionary modelling, and have demonstrated value in assessing spatial spread patterns of measurably evolving organisms. We improve on existing phylogeographic methods by combining information from multiple phylogeographic datasets in a hierarchical setting. Consider N exchangeable datasets or strata consisting of viral sequences and locations, each evolving along its own phylogenetic tree and according to a conditionally independent geographical process. At the hierarchical level, a random graph summarizes the overall dispersion process by informing which migration rates between sampling locations are likely to be relevant in the strata. This approach provides an efficient and improved framework for analysing inherently hierarchical datasets. We first examine the evolutionary history of multiple serotypes of dengue virus in the Americas to showcase our method. Additionally, we explore an application to intrahost HIV evolution across multiple patients.
Bayesian statistics; phylodynamics; phylogenetics; random graphs; HIV; dengue
Effective population size is fundamental in population genetics and characterizes genetic diversity. To infer past population dynamics from molecular sequence data, coalescent-based models have been developed for Bayesian nonparametric estimation of effective population size over time. Among the most successful is a Gaussian Markov random field (GMRF) model for a single gene locus. Here, we present a generalization of the GMRF model that allows for the analysis of multilocus sequence data. Using simulated data, we demonstrate the improved performance of our method to recover true population trajectories and the time to the most recent common ancestor (TMRCA). We analyze a multilocus alignment of HIV-1 CRF02_AG gene sequences sampled from Cameroon. Our results are consistent with HIV prevalence data and uncover some aspects of the population history that go undetected in Bayesian parametric estimation. Finally, we recover an older and more reconcilable TMRCA for a classic ancient DNA data set.
coalescent; smoothing; effective population size; Gaussian Markov random fields
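Here is a rough sketch of the smoothing prior. It assumes a first-order random walk on log effective population size over a regular grid, so there is no time-aware weighting between grid points. The per-locus log-likelihood callables are hypothetical placeholders for coalescent likelihoods. The point illustrated is that all loci share one trajectory, so their log-likelihoods add on top of a single GMRF prior.

```python
import math


def gmrf_log_prior(log_ne, precision):
    """First-order random-walk GMRF on log effective population sizes, up to a constant:
    (K - 1)/2 * log(tau) - tau/2 * sum_i (gamma_{i+1} - gamma_i)^2."""
    sq = sum((b - a) ** 2 for a, b in zip(log_ne[:-1], log_ne[1:]))
    return 0.5 * (len(log_ne) - 1) * math.log(precision) - 0.5 * precision * sq


def multilocus_log_posterior(log_ne, precision, locus_log_likelihoods):
    """Unnormalized log posterior: all loci share one log Ne trajectory, so their
    (hypothetical) coalescent log-likelihoods simply add to the GMRF log prior."""
    return gmrf_log_prior(log_ne, precision) + sum(f(log_ne) for f in locus_log_likelihoods)


# A smooth trajectory scores higher under the prior than a jagged one.
print(gmrf_log_prior([1.0, 1.2, 1.4], 1.0), gmrf_log_prior([1.0, 3.0, 1.0], 1.0))
```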
Information on global human movement patterns is central to spatial epidemiological models used to predict the behavior of influenza and other infectious diseases. Yet it remains difficult to test which modes of dispersal drive pathogen spread at various geographic scales using standard epidemiological data alone. Evolutionary analyses of pathogen genome sequences increasingly provide insights into the spatial dynamics of influenza viruses, but to date they have largely neglected the wealth of information on human mobility, mainly because no statistical framework exists within which viral gene sequences and empirical data on host movement can be combined. Here, we address this problem by applying a phylogeographic approach to elucidate the global spread of human influenza subtype H3N2 and assess its ability to predict the spatial spread of human influenza A viruses worldwide. Using a framework that estimates the migration history of human influenza while simultaneously testing and quantifying a range of potential predictive variables of spatial spread, we show that the global dynamics of influenza H3N2 are driven by air passenger flows, whereas at more local scales spread is also determined by processes that correlate with geographic distance. Our analyses further confirm a central role for mainland China and Southeast Asia in maintaining a source population for global influenza diversity. By comparing model output with the known pandemic expansion of H1N1 during 2009, we demonstrate that predictions of influenza spatial spread are most accurate when data on human mobility and viral evolution are integrated. In conclusion, the global dynamics of influenza viruses are best explained by combining human mobility data with the spatial information inherent in sampled viral genomes. The integrated approach introduced here offers great potential for epidemiological surveillance through phylogeographic reconstructions and for improving predictive models of disease control.
What explains the geographic dispersal of emerging pathogens? Reconstructions of evolutionary history from pathogen gene sequences offer qualitative descriptions of spatial spread, but current approaches are poorly equipped to formally test and quantify the contribution of different potential explanatory factors, such as human mobility and demography. Here, we use a novel phylogeographic method to evaluate multiple potential predictors of viral spread in human influenza dynamics. We identify air travel as the predominant driver of global influenza migration, whilst also revealing the contribution of other mobility processes at more local scales. We demonstrate the power of our inter-disciplinary approach by using it to predict the global pandemic expansion of H1N1 influenza in 2009. Our study highlights the importance of integrating evolutionary and ecological information when studying the dynamics of infectious disease.
Influenza viruses undergo continual antigenic evolution allowing mutant viruses to evade host immunity acquired to previous virus strains. Antigenic phenotype is often assessed through pairwise measurement of cross-reactivity between influenza strains using the hemagglutination inhibition (HI) assay. Here, we extend previous approaches to antigenic cartography, and simultaneously characterize antigenic and genetic evolution by modeling the diffusion of antigenic phenotype over a shared virus phylogeny. Using HI data from influenza lineages A/H3N2, A/H1N1, B/Victoria and B/Yamagata, we determine patterns of antigenic drift across viral lineages, showing that A/H3N2 evolves faster and in a more punctuated fashion than other influenza lineages. We also show that year-to-year antigenic drift appears to drive incidence patterns within each influenza lineage. This work makes possible substantial future advances in investigating the dynamics of influenza and other antigenically-variable pathogens by providing a model that intimately combines molecular and antigenic evolution.
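Here is a minimal sketch of the cartography step. It assumes the classic convention in which a virus-serum distance is the serum's highest log2 titer minus the log2 titer against that virus, and map coordinates are scored by squared error (stress). The titer table is hypothetical. Threshold titers such as "<10" are not handled, and the phylogenetic diffusion prior that this work places on antigenic locations is not reproduced.

```python
import math


def hi_table_to_distances(titers):
    """Map an HI table {(virus, serum): titer} to target antigenic distances.

    d_ij = log2(highest titer of serum j) - log2(titer of virus i against serum j),
    so each twofold drop in titer adds one antigenic unit.
    """
    best = {}
    for (_, serum), titer in titers.items():
        best[serum] = max(best.get(serum, 0), titer)
    return {(v, s): math.log2(best[s]) - math.log2(t) for (v, s), t in titers.items()}


def stress(distances, virus_xy, serum_xy):
    """Sum of squared differences between map distances and HI-derived distances."""
    return sum(
        (math.dist(virus_xy[v], serum_xy[s]) - d) ** 2 for (v, s), d in distances.items()
    )


# Hypothetical table: one serum titrated against three viruses.
titers = {("A", "s1"): 1280, ("B", "s1"): 640, ("C", "s1"): 80}
dists = hi_table_to_distances(titers)
print(dists)
```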
Every year, seasonal influenza, commonly called flu, infects up to one in five people around the world, and causes up to half a million deaths. Even though the human immune system can detect and destroy the virus that causes influenza, people can catch flu many times throughout their lifetimes because the virus keeps evolving in ways that let it escape the immune system. This antigenic drift (so called because the antigens displayed by the virus keep changing) also explains why influenza vaccines become less effective over time and need to be reformulated every year.
It is possible to determine which antigens are displayed by a new strain of the virus by observing how blood samples that respond to known strains respond to the new strain. This information about the “antigenic phenotype” of the virus can be plotted on an antigenic map in which strains with similar antigens cluster together. Gene sequencing has shown that there are four subtypes of the flu virus that commonly infect people; but the relationship between changes in antigenic phenotype and changes in gene sequences of the influenza virus is poorly understood.
Bedford et al. have now developed an approach to combine antigenic maps with genetic information about the four subtypes of the human flu virus. This revealed that the antigenic phenotype of H3N2—a subtype that is becoming increasingly common—evolved faster than the other three subtypes. Further, a correlation was observed between antigenic drift and the number of new influenza cases per year for each flu strain. This suggests that knowing which antigenic phenotypes are present at the start of flu season could help predict which strains of the virus will predominate later on.
The work of Bedford et al. provides a useful framework to study influenza, and could help to pinpoint which changes in viral genes cause the changes in antigens. This information could potentially speed up the development of new flu vaccines for each flu season.
influenza; evolution; antigenic cartography; phylogenetics; Bayesian inference; multidimensional scaling; viruses
Recent implementations of path sampling (PS) and stepping-stone sampling (SS) have been shown to outperform the harmonic mean estimator (HME) and a posterior simulation-based analog of Akaike's information criterion through Markov chain Monte Carlo (AICM) in Bayesian model selection of demographic and molecular clock models. Almost simultaneously, a Bayesian model averaging approach was developed that avoids conditioning on a single model but averages over a set of relaxed clock models. This approach returns estimates of the posterior probability of each clock model through which one can estimate the Bayes factor in favor of the maximum a posteriori (MAP) clock model; however, this Bayes factor estimate may suffer when the posterior probability of the MAP model approaches 1. Here, we compare these two recent developments with the HME, stabilized/smoothed HME (sHME), and AICM, using both synthetic and empirical data. Our comparison shows reassuringly that MAP identification and its Bayes factor provide similar performance to PS and SS and that these approaches considerably outperform HME, sHME, and AICM in selecting the correct underlying clock model. We also illustrate the importance of using proper priors on a large set of empirical data sets.
model comparison; marginal likelihood; Bayes factors; path sampling; stepping-stone sampling; model averaging; molecular clock; Bayesian inference; phylogeny; BEAST
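The MAP Bayes factor can be sketched from model-averaging output as posterior odds divided by prior odds. The sketch assumes that the posterior probability of each clock model is estimated by its sampling frequency in the MCMC chain. The model labels, counts, and uniform prior are hypothetical. The last branch of the function shows why the estimate degrades as the MAP posterior probability approaches 1: once every sample visits the MAP model, the frequency-based Bayes factor is unbounded.

```python
from collections import Counter


def map_bayes_factor(sampled_models, prior):
    """Bayes factor for the MAP model against all alternatives.

    BF = [p / (1 - p)] / [pi / (1 - pi)], with p the posterior frequency of the MAP
    model in the MCMC samples and pi its prior probability.
    """
    model, count = Counter(sampled_models).most_common(1)[0]
    post = count / len(sampled_models)
    if post == 1.0:
        # every sample visited the MAP model: the frequency-based estimate is unbounded
        return model, float("inf")
    pi = prior[model]
    return model, (post / (1 - post)) / (pi / (1 - pi))


# Hypothetical MCMC output over four clock models with a uniform prior
samples = ["A"] * 90 + ["B"] * 6 + ["C"] * 4
prior = {m: 0.25 for m in "ABCD"}
print(map_bayes_factor(samples, prior))
```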
Multidrug-resistant (MDR) HIV-1 presents a challenge to the efficacy of antiretroviral therapy (ART). To examine mechanisms leading to MDR variants in infected individuals, we studied recombination between single viral genomes from the genital tract and plasma of a woman initiating ART. We determined HIV-1 RNA sequences and drug resistance profiles of 159 unique viral variants obtained before ART and semiannually for 4 years thereafter. Soon after initiating zidovudine, lamivudine, and nevirapine, resistant variants and intrapatient HIV-1 recombinants were detected in both compartments; the recombinants had inherited genetic material from both genital and plasma-derived viruses. Twenty-three unique recombinants were documented during 4 years of therapy, comprising ∼22% of variants. Most recombinant genomes displayed similar breakpoints and clustered phylogenetically, suggesting evolution from common ancestors. Longitudinal analysis demonstrated that MDR recombinants were common and persistent, indicating that recombination, in addition to point mutation, can contribute to the evolution of MDR HIV-1 in viremic individuals.
Motivation: Statistical methods for comparing relative rates of synonymous and non-synonymous substitutions maintain a central role in detecting positive selection. To identify selection, researchers often estimate the ratio of these relative rates (dN/dS) at individual alignment sites. Fitting a codon substitution model that captures heterogeneity in dN/dS across sites provides a reliable way to perform such estimation, but it remains computationally prohibitive for massive datasets. By using crude estimates of the numbers of synonymous and non-synonymous substitutions at each site, counting approaches scale well to large datasets, but they fail to account for ancestral state reconstruction uncertainty and to provide site-specific estimates.
Results: We propose a hybrid solution that borrows the computational strength of counting methods, but augments these methods with empirical Bayes modeling to produce a relatively fast and reliable method capable of estimating site-specific dN/dS values in large datasets. Importantly, our hybrid approach, set in a Bayesian framework, integrates over the posterior distribution of phylogenies and ancestral reconstructions to quantify uncertainty about site-specific dN/dS estimates. Simulations demonstrate that this method competes well with more principled statistical procedures and, in some cases, even outperforms them. We illustrate the utility of our method using human immunodeficiency virus, feline panleukopenia and canine parvovirus evolution examples.
Availability: Renaissance counting is implemented in the development branch of BEAST, freely available at http://code.google.com/p/beast-mcmc/. The method will be made available in the next public release of the package, including support to set up analyses in BEAUti.
Supplementary data are available at Bioinformatics online.
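The counting side of the hybrid can be sketched as a per-site ratio of conditioned to unconditioned counts. The sketch assumes the robust-counting convention, in which conditioned counts (C) are expected substitution counts given the data and unconditioned counts (U) are expected counts under the model alone. The input numbers are hypothetical. The empirical Bayes modeling and the integration over posterior trees described above are not reproduced. Applying the function to each posterior sample would give a distribution of values per site.

```python
def site_dnds(cond_n, cond_s, uncond_n, uncond_s):
    """Per-site ratio (C_N / U_N) / (C_S / U_S).

    C_*: expected substitution counts conditioned on the data (e.g. from stochastic mapping);
    U_*: expected counts under the model without conditioning, acting as the normalizer.
    """
    ratios = []
    for cn, cs, un, us in zip(cond_n, cond_s, uncond_n, uncond_s):
        # with no synonymous signal at a site the plain ratio is undefined; flag it as inf
        ratios.append((cn / un) / (cs / us) if cs > 0 else float("inf"))
    return ratios


# Hypothetical counts for three sites
print(site_dnds([3.0, 0.5, 1.0], [1.0, 2.0, 0.0], [2.0, 1.0, 1.0], [1.0, 1.0, 1.0]))
```

The third site shows why raw counting ratios are fragile at sparsely substituted sites, which is where additional statistical modeling on top of counting is most useful.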
Recent developments in marginal likelihood estimation for model selection in the field of Bayesian phylogenetics and molecular evolution have emphasized the poor performance of the harmonic mean estimator (HME). Although these studies have shown the merits of new approaches applied to standard normally distributed examples and small real-world data sets, little is currently known about the performance and computational issues of these methods when fitting complex evolutionary and population genetic models to empirical real-world data sets. Further, these approaches have not yet seen widespread application in the field due to the lack of implementations of these computationally demanding techniques in commonly used phylogenetic packages. Here, we investigate the performance of some of these new marginal likelihood estimators, specifically path sampling (PS) and stepping-stone (SS) sampling, for comparing models of demographic change and relaxed molecular clocks, using synthetic data and real-world examples for which unexpected inferences were made using the HME. Given the drastically increased computational demands of PS and SS sampling, we also investigate a posterior simulation-based analogue of Akaike's information criterion (AIC) through Markov chain Monte Carlo (MCMC), the AICM, a model comparison approach that shares with the HME the appealing feature of having a low computational overhead over the original MCMC analysis. We confirm that the HME systematically overestimates the marginal likelihood and fails to yield reliable model classification, and we show that the AICM performs better and may be a useful initial evaluation of model choice but that it is also, to a lesser degree, unreliable. We show that PS and SS sampling substantially outperform these estimators, and we adjust the conclusions made concerning previous analyses for the three real-world data sets that we reanalyzed.
The methods used in this article are now available in BEAST, a powerful user-friendly software package to perform Bayesian evolutionary analyses.
model comparison; marginal likelihood; Bayes factors; path sampling; stepping-stone sampling; demographic models; molecular clock; Bayesian inference; phylogeny; BEAST
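To make the contrast between estimators concrete, here is a toy sketch on a conjugate normal model whose marginal likelihood is known exactly. The sketch assumes Gaussian data with known standard deviation and a normal prior on the mean, so every power posterior can be sampled directly. The stepping-stone estimator multiplies ratios of successive normalizing constants along a ladder of powers β, running from 0 (prior) to 1 (posterior). The β values are spaced as quantiles of a Beta(0.3, 1) distribution, a common choice that is assumed here. The harmonic mean estimator uses posterior draws only. Data and settings are hypothetical, and real phylogenetic analyses would replace the exact draws with MCMC.

```python
import math
import random


def log_likelihood(theta, y, sigma):
    """Gaussian log-likelihood of data y given mean theta and known sd sigma."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2) - (v - theta) ** 2 / (2 * sigma**2) for v in y)


def sample_power_posterior(beta, y, sigma, tau, rng):
    """Exact draw from p_beta(theta) proportional to L(theta)^beta * Normal(theta; 0, tau^2)."""
    precision = 1 / tau**2 + beta * len(y) / sigma**2
    mean = (beta * sum(y) / sigma**2) / precision
    return rng.gauss(mean, 1 / math.sqrt(precision))


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def stepping_stone(y, sigma, tau, n_stones=50, n_samples=1000, alpha=0.3, seed=1):
    """log Z = sum_k log E_{beta_(k-1)}[L^(beta_k - beta_(k-1))], starting from the prior (Z_0 = 1)."""
    rng = random.Random(seed)
    betas = [(k / n_stones) ** (1 / alpha) for k in range(n_stones + 1)]  # Beta(alpha, 1) quantiles
    log_z = 0.0
    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        draws = [sample_power_posterior(b_prev, y, sigma, tau, rng) for _ in range(n_samples)]
        log_z += log_mean_exp([(b_next - b_prev) * log_likelihood(t, y, sigma) for t in draws])
    return log_z


def harmonic_mean(y, sigma, tau, n_samples=1000, seed=1):
    """HME: log Z = -log E_posterior[1 / L(theta)], using posterior draws only."""
    rng = random.Random(seed)
    draws = [sample_power_posterior(1.0, y, sigma, tau, rng) for _ in range(n_samples)]
    return -log_mean_exp([-log_likelihood(t, y, sigma) for t in draws])


def exact_log_marginal(y, sigma, tau):
    """Closed form: y ~ Normal(0, sigma^2 I + tau^2 11^T)."""
    n, s, ss = len(y), sum(y), sum(v * v for v in y)
    quad = (ss - tau**2 * s**2 / (sigma**2 + n * tau**2)) / sigma**2
    return -0.5 * n * math.log(2 * math.pi * sigma**2) - 0.5 * math.log(1 + n * tau**2 / sigma**2) - 0.5 * quad


y = [0.3, -0.1, 1.2, 0.8, 0.5, 0.9, 0.2, 1.1, 0.4, 0.7]  # hypothetical data
print(exact_log_marginal(y, 1.0, 1.0), stepping_stone(y, 1.0, 1.0), harmonic_mean(y, 1.0, 1.0))
```

The stepping-stone value should land close to the exact one. The harmonic mean estimate, in contrast, is driven by rarely sampled low-likelihood draws and tends to come out too high, in line with the overestimation reported above.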
Changes in Dengue virus (DENV) disease patterns in the Americas over recent decades have been attributed, at least in part, to repeated introduction of DENV strains from other regions, resulting in a shift from hypoendemicity to hyperendemicity. Using newly sequenced DENV-1 and DENV-3 envelope (E) gene isolates from 11 Caribbean countries, along with sequences available on GenBank, we sought to document the population genetic and spatiotemporal transmission histories of the four main invading DENV genotypes within the Americas and investigate factors that influence the rate and intensity of DENV transmission. For all genotypes, there was an initial invasion phase characterized by rapid increases in genetic diversity, which coincided with the first confirmed cases of each genotype in the region. Rapid geographic dispersal occurred upon each genotype's introduction, after which individual lineages were locally maintained, and gene flow was primarily observed among neighboring and nearby countries. There were, however, centers of viral diversity (Barbados, Puerto Rico, Colombia, Suriname, Venezuela, and Brazil) that were repeatedly involved in gene flow with more distant locations. For DENV-1 and DENV-2, we found that a “distance-informed” model, which posits that the intensity of virus movement between locations is inversely proportional to the distance between them, provided a better fit than a model assuming equal rates of movement between all pairs of countries. However, for DENV-3 and DENV-4, the more stochastic “equal rates” model was preferred.
dengue virus; gene flow; Bayesian phylogeography; Americas; population dynamics; evolution; coalescent
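The two movement models compared above can be sketched as alternative rate matrices between locations. For simplicity, the sketch assumes planar coordinates and Euclidean distance; a real analysis would use geographic distances between countries. Rates are rescaled to a mean of 1, so the two models differ only in how rate is distributed across location pairs. The location names and coordinates are hypothetical.

```python
import math


def migration_rates(coords, model="distance"):
    """Off-diagonal migration rates between locations.

    'distance': q_ij proportional to 1 / d_ij (closer locations exchange virus faster);
    'equal':    q_ij identical for every pair.
    Rates are rescaled to a mean of 1.
    """
    raw = {}
    for i in coords:
        for j in coords:
            if i != j:
                raw[(i, j)] = 1.0 / math.dist(coords[i], coords[j]) if model == "distance" else 1.0
    mean = sum(raw.values()) / len(raw)
    return {pair: r / mean for pair, r in raw.items()}


# Hypothetical planar coordinates for three locations
coords = {"X": (0.0, 0.0), "Y": (1.0, 0.0), "Z": (3.0, 0.0)}
print(migration_rates(coords))
```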
John Maynard Smith compared protein evolution to the game where one word is converted into another a single letter at a time, with the constraint that all intermediates are words: WORD→WORE→GORE→GONE→GENE. In this analogy, epistasis constrains evolution, with some mutations tolerated only after the occurrence of others. To test whether epistasis similarly constrains actual protein evolution, we created all intermediates along a 39-mutation evolutionary trajectory of influenza nucleoprotein, and also introduced each mutation individually into the parent. Several mutations were deleterious to the parent despite becoming fixed during evolution without negative impact. These mutations were destabilizing, and were preceded or accompanied by stabilizing mutations that alleviated their adverse effects. The constrained mutations occurred at sites enriched in T-cell epitopes, suggesting they promote viral immune escape. Our results paint a coherent portrait of epistasis during nucleoprotein evolution, with stabilizing mutations permitting otherwise inaccessible destabilizing mutations which are sometimes of adaptive value.
During evolution, the effect of one mutation on a protein can depend on whether another mutation is also present. This phenomenon is similar to the game in which one word is converted to another word, one letter at a time, subject to the rule that all the intermediate steps are also valid words: for example, the word WORD can be converted to the word GENE as follows: WORD→WORE→GORE→GONE→GENE. In this example, the D must be changed to an E before the W is changed to a G, because GORD is not a valid word.
Similarly, during the evolution of a virus, a mutation that helps the virus evade the human immune system might only be tolerated if the virus has acquired another mutation beforehand. This type of mutational interaction would constrain the evolution of the virus, since its capacity to take advantage of the second mutation depends on the first mutation having already occurred.
Gong et al. examined whether such interactions have indeed constrained evolution of the influenza virus. Between 1968 and 2007, the nucleoprotein—which acts as a scaffold for the replication of genetic material—in the human H3N2 influenza virus underwent a series of 39 mutations. To test whether all of these mutations could have been tolerated by the 1968 virus, Gong et al. introduced each one individually into the 1968 nucleoprotein. They found that several mutations greatly reduced the fitness of the 1968 virus when introduced on their own, which strongly suggests that these ‘constrained mutations’ became part of the virus’s genetic makeup as a result of interactions with ‘enabling’ mutations.
The constrained mutations decreased the stability of the nucleoprotein at high temperatures, while the enabling mutations counteracted this effect. It may, therefore, be possible to identify enabling mutations based on their effects on thermal stability. Intriguingly, the constrained mutations helped the virus overcome one form of human immunity to influenza, suggesting that interactions between mutations might limit the rate at which viruses evolve to evade the immune system.
Overall, these results show that interactions among mutations constrain the evolution of the influenza nucleoprotein in a fashion that can be largely understood in terms of protein stability. If the same is true for other proteins and viruses, this work could lead to a deeper understanding of the constraints that govern evolution at the molecular level.
epistasis; protein evolution; influenza; protein stability; viruses
Unprecedented global surveillance of viruses will result in massive sequence data sets that require new statistical methods. These data sets press the limits of Bayesian phylogenetics as the high-dimensional parameters that comprise a phylogenetic tree increase the already sizable computational burden of these techniques. This burden often results in partitioning the data set, for example, by gene, and inferring the evolutionary dynamics of each partition independently, a compromise that results in stratified analyses that depend only on data within a given partition. However, parameter estimates inferred from these stratified models are likely strongly correlated, considering they rely on data from a single data set. To overcome this shortfall, we exploit the existing Monte Carlo realizations from stratified Bayesian analyses to efficiently estimate a nonparametric hierarchical wavelet-based model and learn about the time-varying parameters of effective population size that reflect levels of genetic diversity across all partitions simultaneously. Our methods are applied to complete genome influenza A sequences that span 13 years. We find that broad peaks and trends, as opposed to seasonal spikes, in the effective population size history distinguish individual segments from the complete genome. We also address hypotheses regarding intersegment dynamics within a formal statistical framework that accounts for correlation between segment-specific parameters.
phylogenetics; Bayesian nonparametrics; wavelets; importance sampling; influenza A
Human immunodeficiency virus type 2 (HIV-2) emerged in West Africa and has spread further to countries that share socio-historical ties with this region. However, viral origins and dispersal patterns at a global scale remain poorly understood. Here, we adopt a Bayesian phylogeographic approach to investigate the spatial dynamics of HIV-2 group A (HIV-2A) using a collection of 320 partial pol and 248 partial env sequences sampled throughout 19 countries worldwide. We extend phylogenetic diffusion models that simultaneously draw information from multiple loci to estimate location states throughout distinct phylogenies and explicitly attempt to incorporate human migratory fluxes. Our study highlights that Guinea-Bissau, together with Côte d’Ivoire and Senegal, acted as the main viral sources in the early stages of the epidemic. We show that convenience sampling can obfuscate the estimation of the spatial root of HIV-2A. We explicitly attempt to circumvent this by incorporating rate priors that reflect the ratio of human flow from and to West Africa. We recover four main routes of HIV-2A dispersal that are laid out along colonial ties: Guinea-Bissau and Cape Verde to Portugal, and Côte d’Ivoire and Senegal to France. Within Europe, we find strong support for epidemiological linkage from Portugal to Luxembourg and to the UK. We demonstrate that probabilistic models can uncover global patterns of HIV-2A dispersal provided that sampling bias is taken into account, and we provide a scenario for the international spread of this virus.
Probabilistic inference of a phylogenetic tree from molecular sequence data is predicated on a substitution model describing the relative rates of change between character states along the tree for each site in the multiple sequence alignment. Commonly, one assumes that the substitution model is homogeneous across sites within large partitions of the alignment, assigns these partitions a priori, and then fixes their underlying substitution model to the best-fitting model from a hierarchy of named models. Here, we introduce an automatic model selection and model averaging approach within a Bayesian framework that simultaneously estimates the number of partitions, the assignment of sites to partitions, the substitution model for each partition, and the uncertainty in these selections. This new approach is implemented as an add-on to the BEAST 2 software platform. We find that this approach dramatically improves the fit of the nucleotide substitution model compared with existing approaches, and we show, using a number of example data sets, that as many as nine partitions are required to explain the heterogeneity in nucleotide substitution process across sites in a single gene analysis. In some instances, this improved modeling of the substitution process can have a measurable effect on downstream inference, including the estimated phylogeny, relative divergence times, and effective population size histories.
across-site rate variation; Dirichlet process mixture model; Bayesian model selection
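The Dirichlet process part of this approach can be illustrated by drawing site-to-partition assignments from a Chinese restaurant process, the sequential form of a Dirichlet process prior. This is only a prior-simulation sketch. In the approach described above, each partition also carries its own substitution model and the assignments are updated by MCMC given the alignment. The concentration value and seed below are arbitrary assumptions.

```python
import random


def crp_partition(n_sites, alpha, rng):
    """Assign alignment sites to partitions under a Chinese restaurant process,
    the sequential form of a Dirichlet process prior with concentration alpha."""
    assignments, sizes = [], []
    for _ in range(n_sites):
        # join partition k with weight size_k, or open a new one with weight alpha
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
        assignments.append(k)
    return assignments, sizes


def expected_partitions(n_sites, alpha):
    """Prior expected number of partitions: sum_i alpha / (alpha + i), i = 0..n-1."""
    return sum(alpha / (alpha + i) for i in range(n_sites))


sites, sizes = crp_partition(1000, 1.0, random.Random(42))
print(len(sizes), expected_partitions(1000, 1.0))
```

The number of partitions grows only logarithmically with the number of sites under this prior. This lets the data, rather than an a priori scheme, decide how many partitions are needed.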
Phylogeographic approaches help uncover the imprint that spatial epidemiological processes leave in the genomes of fast evolving viruses. Recent Bayesian inference methods that consider phylogenetic diffusion of discretely and continuously distributed traits offer a unique opportunity to explore genotypic and phenotypic evolution in greater detail. To provide a taste of the recent advances in viral diffusion approaches, we highlight key findings arising at the intra-host, local and global epidemiological scales. We also outline future areas of research and discuss how these may contribute to a quantitative understanding of the phylodynamics of RNA viruses.
Multiple origins indicate this serotype was introduced in several episodes.
Dengue virus serotype 4 (DENV-4) reemerged in Roraima State, Brazil, 28 years after it was last detected in the country in 1982. To study the origin and evolution of this reemergence, full-length sequences were obtained for 16 DENV-4 isolates from northern (Roraima, Amazonas, Pará States) and northeastern (Bahia State) Brazil during the 2010 and 2011 dengue virus seasons, and for an isolate from the 1982 epidemic in Roraima. The spatiotemporal dynamics of DENV-4 introductions into Brazil were reconstructed from envelope gene and full-genome sequences using Bayesian phylogeographic analyses. An introduction of genotype I into Brazil from Southeast Asia was confirmed. Full-genome phylogeographic analyses also revealed multiple introductions of DENV-4 genotype II into Brazil, providing evidence for at least 3 introductions of this genotype within the last decade: 2 from Venezuela to Roraima and 1 from Colombia to Amazonas. Phylogeographic analysis of full-genome data thus helped trace the origins of DENV-4 throughout Brazil.
dengue virus; serotype 4; molecular epidemiology; phylogeography; Brazil; viruses; reemergence; genetic characterization; spatiotemporal patterns
A birth-death process is a continuous-time Markov chain that counts the number of particles in a system over time. In the general process with n current particles, a new particle is born with instantaneous rate λ_n and a particle dies with instantaneous rate μ_n, where both rates may depend arbitrarily on n. Currently no robust and efficient method exists to evaluate the finite-time transition probabilities in a general birth-death process with arbitrary birth and death rates. In this paper, we first revisit the theory of continued fractions to obtain expressions for the Laplace transforms of these transition probabilities, and we make explicit an important derivation connecting transition probabilities and continued fractions. We then develop an efficient algorithm for computing these probabilities, together with an analysis of the error associated with approximations in the method. We demonstrate that this error-controlled method agrees with known solutions and outperforms previous approaches to computing these probabilities. Finally, we apply our novel method to several important problems in ecology, evolution, and genetics.
General birth-death process; Continuous-time Markov chain; Transition probabilities; Population genetics; Ecology; Evolution
Host species switches by bacterial pathogens leading to new endemic infections are important evolutionary events that are difficult to reconstruct over the long term. We investigated the host switching of Staphylococcus aureus over a long evolutionary timeframe by developing Bayesian phylogenetic methods that account for uncertainty about past host associations, and by using estimates of evolutionary rates from serially sampled whole-genome data. Results suggest multiple jumps back and forth between humans and bovids, with the first switch from humans to bovids taking place around 5500 BP, coinciding with the expansion of cattle domestication throughout the Old World. The first switch to poultry is estimated at around 275 BP, long after domestication but still preceding large-scale commercial farming. These results are consistent with a central role for anthropogenic change in the emergence of new endemic diseases.
Bayesian phylogenetics; molecular clocks; bacterial evolution; host switching
The interplay between C-C chemokine receptor type 5 (CCR5) host genetic background, disease progression, and intrahost HIV-1 evolutionary dynamics remains unclear because differences in viral evolution between hosts limit the ability to draw conclusions across hosts stratified into clinically relevant populations. Similar inference problems are proliferating across many measurably evolving pathogens for which intrahost sequence samples are readily available. To this end, we propose novel hierarchical phylogenetic models (HPMs) that incorporate fixed effects to test for differences in dynamics across host populations in a formal statistical framework employing stochastic search variable selection and model averaging. To clarify the role of CCR5 host genetic background and disease progression on viral evolutionary patterns, we obtain gp120 envelope sequences from clonal HIV-1 variants isolated at multiple time points in the course of infection from populations of HIV-1–infected individuals who only harbored CCR5-using HIV-1 variants at all time points. Presence or absence of a CCR5 wt/Δ32 genotype and progressive or long-term nonprogressive course of infection stratify the clinical populations in a two-way design. Compared with the standard approach of analyzing sequences from each patient independently, the HPM provides more efficient estimation of evolutionary parameters such as nucleotide substitution rates and dN/dS rate ratios, as shown by significant shrinkage of the estimator variance. The fixed effects also correct for nonindependence of data between populations and result in even further shrinkage of individual patient estimates. Model selection suggests an association between nucleotide substitution rate and disease progression, but a role for CCR5 genotype remains elusive.
Given the absence of clear dN/dS differences between patient groups, delayed onset of AIDS symptoms appears to be solely associated with lower viral replication rates rather than with differences in selection on amino acid fixation.
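The variance shrinkage that the HPM provides can be illustrated with the simplest hierarchical model: a conjugate normal-normal model. Each patient's parameter is drawn around a group mean that includes a fixed effect. The sketch below treats the hyperparameters as known and uses made-up numbers for two hypothetical patients. An actual HPM estimates these hyperparameters jointly with the phylogenies by MCMC, so this shows only the pooling mechanism, not the authors' model.

```python
def shrink(estimate, sampling_var, group_mean, between_var):
    """Conjugate normal-normal partial pooling of one patient's estimate.

    Model: estimate ~ N(theta, sampling_var), theta ~ N(group_mean, between_var).
    Returns the posterior mean and variance of theta.
    """
    precision = 1.0 / sampling_var + 1.0 / between_var
    post_mean = (estimate / sampling_var + group_mean / between_var) / precision
    return post_mean, 1.0 / precision

# All values below are hypothetical, on the log substitution-rate scale.
baseline_log_rate = -5.0    # population mean for nonprogressors
progressor_effect = 0.3     # fixed effect added for progressors
between_patient_var = 0.04  # between-patient variance
# patient id -> (independent estimate, its sampling variance, progressor indicator)
patients = {"p1": (-4.6, 0.09, 1), "p2": (-5.3, 0.16, 0)}

shrunk = {}
for pid, (estimate, sampling_var, progressor) in patients.items():
    group_mean = baseline_log_rate + progressor_effect * progressor
    shrunk[pid] = shrink(estimate, sampling_var, group_mean, between_patient_var)
    print(pid, shrunk[pid])
```

Each posterior variance is smaller than the corresponding independent-analysis variance. Each posterior mean is pulled toward the mean of the patient's own group, and the fixed effect determines which group mean that is.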
CCR5; envelope; HIV-1; hierarchical phylogenetic models; disease progression; Bayesian inference
Staphylococcus aureus is a common cause of infections that has undergone rapid global spread over recent decades. Formal phylogeographic methods have not yet been applied to the molecular epidemiology of bacterial pathogens because the limited genetic diversity of data sets based on individual genes usually results in poor phylogenetic resolution. Here, we investigated a whole-genome single nucleotide polymorphism (SNP) data set of health care-associated methicillin-resistant S. aureus sequence type 239 (HA-MRSA ST239) strains, which we analyzed using Markov spatial models that incorporate geographical sampling distributions. The reconstructed timescale indicated a temporal origin of this strain shortly after the introduction of methicillin, followed by global pandemic spread. The estimate of the temporal origin was robust to the choice of molecular clock and coalescent prior, to the inclusion of full, intergenic, or synonymous SNPs, and to correction for excluded invariant site patterns. Finally, phylogeographic analyses statistically supported the role of human movement in the global dissemination of HA-MRSA ST239, although they were unable to conclusively resolve the location of the root. This study demonstrates that bacterial genomes can contain sufficient evolutionary information to elucidate the temporal and spatial dynamics of transmission. Future applications of this approach to other bacterial strains may provide valuable epidemiological insights that could justify the cost of genome-wide typing.
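A SNP alignment omits invariant sites, so an uncorrected likelihood treats the data as if constant patterns could never have occurred. One standard correction conditions each site's likelihood on the site being variable, dividing by 1 − P(constant pattern). The sketch below illustrates that conditioning under an assumed two-taxon JC69 model. It is offered as one common approach and is not necessarily the correction used in this study. The function names are hypothetical, and with more taxa P(constant pattern) would come from Felsenstein's pruning algorithm.

```python
import math

def jc69_p_constant(distance):
    """P(both sequences show the same base) under JC69 for a pairwise distance
    measured in expected substitutions per site."""
    return 0.25 + 0.75 * math.exp(-4.0 * distance / 3.0)

def conditioned_log_likelihood(site_log_likelihoods, p_constant):
    """Log-likelihood of variable-only sites, conditioned on each site being variable.

    Each site likelihood is divided by P(variable) = 1 - p_constant.
    """
    n_sites = len(site_log_likelihoods)
    return sum(site_log_likelihoods) - n_sites * math.log1p(-p_constant)

# Hypothetical example: 5 SNP sites between two sequences at distance 0.1.
d = 0.1
# P(a specific differing pattern x != y) = (1/4) * (1/4) * (1 - exp(-4d/3)) under JC69
site_ll = [math.log(0.0625 * (1.0 - math.exp(-4.0 * d / 3.0)))] * 5
uncorrected = sum(site_ll)
corrected = conditioned_log_likelihood(site_ll, jc69_p_constant(d))
print(uncorrected, corrected)
```

In this two-taxon toy case, each conditioned site probability equals 1/12 whatever the value of d. SNPs alone therefore carry no information about the distance, which shows why excluded invariant sites must be handled explicitly.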
Bayesian inference; phylogeography; phylogenetics; measurably evolving population
But Tuffley and Steel (1997) introduced a model called No Common Mechanism (NCM), in which characters may, but are not required to, vary their relative rates independently, both within and between branches. Because the independent variation is taken only as a possibility, not as a requirement, NCM would apply to almost any situation, and so may be accepted as realistic. This is useful because Tuffley and Steel also showed that maximum likelihood under NCM selects the same trees as does parsimony. With the realistic NCM in the background, then, the most parsimonious trees have the greatest power to explain available observations.
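A "most parsimonious tree" is one that minimizes the total number of character-state changes needed to explain the observations. For a fixed rooted binary tree, that number is usually computed with Fitch's algorithm. The sketch below is a minimal version of it. The nested-tuple tree encoding and the function names are assumptions made for illustration. The code only scores trees and says nothing about the NCM likelihood equivalence itself.

```python
def fitch_score(tree, states):
    """Fitch's small-parsimony count for one character on a rooted binary tree.

    tree: nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D")).
    states: mapping from leaf name to its observed character state.
    Returns (candidate state set at this node, number of changes below it).
    """
    if isinstance(tree, str):  # leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_score(left, states)
    right_set, right_cost = fitch_score(right, states)
    common = left_set & right_set
    if common:  # children can agree without an extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def tree_length(tree, alignment):
    """Total parsimony length: sum of Fitch counts over all characters (columns)."""
    return sum(fitch_score(tree, column)[1] for column in alignment)

# Two characters scored on two alternative topologies for four taxa
alignment = [
    {"A": "0", "B": "0", "C": "1", "D": "1"},
    {"A": "G", "B": "G", "C": "T", "D": "G"},
]
tree_ab_cd = (("A", "B"), ("C", "D"))
tree_ac_bd = (("A", "C"), ("B", "D"))
print(tree_length(tree_ab_cd, alignment), tree_length(tree_ac_bd, alignment))
```

Among candidate topologies, the most parsimonious is the one with the smallest total length. Here the first topology requires fewer changes than the second.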
Computational evolutionary biology, statistical phylogenetics and coalescent-based population genetics are becoming increasingly central to the analysis and understanding of molecular sequence data. We present the Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software package version 1.7, which implements a family of Markov chain Monte Carlo (MCMC) algorithms for Bayesian phylogenetic inference, divergence time dating, coalescent analysis, phylogeography and related molecular evolutionary analyses. This package includes an enhanced graphical user interface program called Bayesian Evolutionary Analysis Utility (BEAUti) that enables access to advanced models for molecular sequence and phenotypic trait evolution that were previously available to developers only. The package also provides new tools for visualizing and summarizing multispecies coalescent and phylogeographic analyses. BEAUti and BEAST 1.7 are open source under the GNU lesser general public license and available at http://beast-mcmc.googlecode.com and http://beast.bio.ed.ac.uk
Bayesian phylogenetics; evolution; phylogenetics; molecular evolution; coalescent theory
Since its introduction in 2001, MrBayes has grown in popularity as a software package for Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) methods. With this note, we announce the release of version 3.2, a major upgrade over the previous official release, presented in 2003. The new version provides convergence diagnostics and allows multiple analyses to be run in parallel with convergence progress monitored on the fly. The introduction of new proposals and automatic optimization of tuning parameters has improved convergence for many problems. The new version also sports significantly faster likelihood calculations through streaming single-instruction-multiple-data extensions (SSE) and support of the BEAGLE library, allowing likelihood calculations to be delegated to graphics processing units (GPUs) on compatible hardware. Speedup factors range from around 2 with SSE code to more than 50 with BEAGLE for codon problems. Checkpointing across all models allows long runs to be completed even when an analysis is prematurely terminated. New models include relaxed clocks, dating, model averaging across time-reversible substitution models, and support for hard, negative, and partial (backbone) tree constraints. Inference of species trees from gene trees is supported by full incorporation of the Bayesian estimation of species trees (BEST) algorithms. Marginal model likelihoods for Bayes factor tests can be estimated accurately across the entire model space using the stepping stone method. The new version provides more output options than previously, including samples of ancestral states, site rates, site dN/dS ratios, branch rates, and node dates. A wide range of statistics on tree parameters can also be output for visualization in FigTree and compatible software.
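The stepping-stone method estimates the marginal likelihood as a product of ratios between successive "power posteriors", prior × likelihood^β, for 0 = β_0 < ... < β_K = 1. Each ratio r_k = E[L^(β_k − β_{k−1})] is taken over draws from the power posterior at β_{k−1}. The sketch below applies the estimator to a toy conjugate normal model, where the power posteriors can be sampled exactly and the true log marginal likelihood is known, so the estimate can be checked. The β schedule (quantiles of a Beta(0.3, 1) distribution), the sample sizes, and all names are assumptions. This is not MrBayes code, which samples each power posterior by MCMC on trees.

```python
import math
import random

def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))

def stepping_stone(betas, loglik_samples):
    """Log marginal likelihood from per-stone log-likelihood samples.

    loglik_samples[k] holds log-likelihoods of draws from the power posterior
    at betas[k], for k = 0..K-1; the last beta (= 1) needs no samples.
    """
    total = 0.0
    for k in range(1, len(betas)):
        step = betas[k] - betas[k - 1]
        total += log_mean_exp([step * ll for ll in loglik_samples[k - 1]])
    return total

# Toy model: y_i ~ N(theta, 1), theta ~ N(0, 1).
rng = random.Random(7)
data = [rng.gauss(0.8, 1.0) for _ in range(10)]
n, sum_y = len(data), sum(data)

def loglik(theta):
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (y - theta) ** 2 for y in data)

K = 32
betas = [(k / K) ** (1 / 0.3) for k in range(K + 1)]  # Beta(0.3, 1) quantiles
samples = []
for beta in betas[:-1]:
    precision = 1.0 + beta * n  # power posterior is N(beta*sum_y/precision, 1/precision)
    mean = beta * sum_y / precision
    draws = [rng.gauss(mean, 1 / math.sqrt(precision)) for _ in range(2000)]
    samples.append([loglik(th) for th in draws])

estimate = stepping_stone(betas, samples)
# Exact: y ~ N(0, I + 11^T), with det = 1 + n and inverse I - 11^T / (1 + n).
exact = (-0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(1 + n)
         - 0.5 * (sum(y * y for y in data) - sum_y ** 2 / (1 + n)))
print(estimate, exact)
```

Placing most stones near β = 0 reflects the fact that the power posterior changes most rapidly there, as it moves away from the prior.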
Bayes factor; Bayesian inference; MCMC; model averaging; model choice