PMCC

1.  Reconstructing the Dynamics of HIV Evolution within Hosts from Serial Deep Sequence Data 
PLoS Computational Biology  2012;8(11):e1002753.
At the early stage of infection, human immunodeficiency virus (HIV)-1 predominantly uses the CCR5 coreceptor for host cell entry. The subsequent emergence of HIV variants that use the CXCR4 coreceptor in roughly half of all infections is associated with an accelerated decline of CD4+ T-cells and rate of progression to AIDS. The presence of a ‘fitness valley’ separating CCR5- and CXCR4-using genotypes is postulated to be a biological determinant of whether the HIV coreceptor switch occurs. Using phylogenetic methods to reconstruct the evolutionary dynamics of HIV within hosts enables us to discriminate between competing models of this process. We have developed a phylogenetic pipeline for the molecular clock analysis, ancestral reconstruction, and visualization of deep sequence data. These data were generated by next-generation sequencing of HIV RNA extracted from longitudinal serum samples (median 7 time points) from 8 untreated subjects with chronic HIV infections (Amsterdam Cohort Studies on HIV-1 infection and AIDS). We used the known dates of sampling to directly estimate rates of evolution and to map ancestral mutations to a reconstructed timeline in units of days. HIV coreceptor usage was predicted from reconstructed ancestral sequences using the geno2pheno algorithm. We determined that the first mutations contributing to CXCR4 use emerged about 16 (per subject range 4 to 30) months before the earliest predicted CXCR4-using ancestor, which preceded the first positive cell-based assay of CXCR4 usage by 10 (range 5 to 25) months. CXCR4 usage arose in multiple lineages within 5 of 8 subjects, and ancestral lineages following alternate mutational pathways before going extinct were common. We observed highly patient-specific distributions and time-scales of mutation accumulation, implying that the role of a fitness valley is contingent on the genotype of the transmitted variant.
Author Summary
At the start of infection, human immunodeficiency virus (HIV) generally requires a specific protein receptor (CCR5) on the cell surface to bind and enter the cell. In roughly half of all HIV infections, the virus population eventually switches to using a different receptor (CXCR4). This ‘HIV coreceptor switch’ is associated with an accelerated rate of progression to AIDS. Although it is not known why this switch occurs in some infections and not others, it is thought to be shaped by constraints on how HIV can evolve from one mode to another. In this study, we test this hypothesis by reconstructing the evolutionary histories of HIV within 8 patients known to have undergone an HIV coreceptor switch. Each history is recreated from samples of HIV genetic sequences that were derived from repeated blood samples by next-generation sequencing, an emerging technology that is quickly becoming an essential tool in the study of rapidly evolving populations such as viruses or cancerous cells. Because we have samples from different points in time, we can use models of evolution to extrapolate back in time to the ancestors of each infection. Our analysis reveals patient-specific dynamics in HIV evolution that shed new light on the determinants of the coreceptor switch.
doi:10.1371/journal.pcbi.1002753
PMCID: PMC3486858  PMID: 23133358
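The abstract notes that known sampling dates were used to estimate rates of evolution directly. One simple way to see how dated serial samples constrain a molecular clock is a root-to-tip regression, sketched below under stated assumptions: root-to-tip distances are taken as already computed from a rooted tree, and the example values are hypothetical. This is a simpler technique than the full molecular clock analysis the authors describe, not their pipeline.

```python
def root_to_tip_regression(days, distances):
    """Least-squares fit of root-to-tip distance against sampling time.

    days: sampling times, e.g. days since the first sample (hypothetical input)
    distances: root-to-tip distances in substitutions/site, one per sequence
    Returns (rate in substitutions/site/day, estimated root date in days).
    """
    n = len(days)
    if n != len(distances) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_t = sum(days) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    if sxx == 0:
        raise ValueError("sampling times must differ to estimate a rate")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(days, distances))
    rate = sxy / sxx  # slope: substitutions per site per day
    if rate <= 0:
        raise ValueError("no positive temporal signal in these data")
    intercept = mean_d - rate * mean_t
    # x-intercept: where the fitted line reaches zero divergence, a crude
    # estimate of when the root (most recent common ancestor) existed.
    root_date = -intercept / rate
    return rate, root_date


# Hypothetical serial samples from one host.
rate, root = root_to_tip_regression([0, 100, 200, 300], [0.01, 0.02, 0.03, 0.04])
print(f"rate = {rate:.2e} subs/site/day, root at day {root:.0f}")
```

A negative root date simply means the inferred ancestor predates the first sample, which is the extrapolation back in time that serial sampling makes possible.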
2.  Dynamics of a Sex-Linked Deleterious Mutation in Populations Subject to Sex Reversal 
PLoS ONE  2011;6(10):e25362.
The heterogametic sex chromosomes (i.e. mammalian Y and avian W) do not usually recombine with the homogametic sex chromosomes, which is known to lead to rapid degeneration of Y and W due to the accumulation of deleterious mutations. On the other hand, some 96% of amphibian species have homomorphic, i.e. non-degenerate, sex chromosomes. Nicolas Perrin's fountain-of-youth hypothesis states that this is a result of recombination between X and Y chromosomes in sex-reversed individuals. In this study, I model the consequences of such recombination for the dynamics of a deleterious mutation occurring in Y chromosomes. As expected, even relatively low levels of sex reversal help to purge deleterious mutations. However, the population-dynamic consequences of this depend on the type of selection that operates on the population undergoing sex reversal. Under fecundity selection, sex reversal can be beneficial for some parameter values, whereas under survival selection, it seems to be always harmful.
doi:10.1371/journal.pone.0025362
PMCID: PMC3189978  PMID: 22016765
3.  Datamonkey 2010: a suite of phylogenetic analysis tools for evolutionary biology 
Bioinformatics  2010;26(19):2455-2457.
Summary: Datamonkey is a popular web-based suite of phylogenetic analysis tools for use in evolutionary biology. Since the original release in 2005, we have expanded the analysis options to include recently developed algorithmic methods for recombination detection, evolutionary fingerprinting of genes, codon model selection, co-evolution between sites, identification of sites that rapidly escape host immune pressure, and HIV-1 subtype assignment. The traditional selection tools have also been augmented to include recent developments in the field. Here, we summarize the analysis options currently available on Datamonkey and provide guidelines for their use in evolutionary biology.
Availability and documentation: http://www.datamonkey.org
Contact: spond@ucsd.edu
doi:10.1093/bioinformatics/btq429
PMCID: PMC2944195  PMID: 20671151
4.  Selection in Coastal Synechococcus (Cyanobacteria) Populations Evaluated from Environmental Metagenomes 
PLoS ONE  2011;6(9):e24249.
Environmental metagenomics provides snippets of genomic sequences from all organisms in an environmental sample and is an unprecedented source of information for investigating microbial population genetics. Current analytical methods, however, are poorly equipped to handle metagenomic data, particularly short, unlinked sequences. A custom analytical pipeline was developed to calculate dN/dS ratios, a common metric for evaluating the role of selection in the evolution of a gene, from environmental metagenomes of flow-sorted populations of marine Synechococcus, the dominant cyanobacteria in coastal environments, sequenced using 454 technology. The large majority of genes (98%) have evolved under purifying selection (dN/dS<1). The metagenome sequence coverage of the reference genomes was not uniform, and genes that were highly represented in the environment (i.e. high read coverage) tended to be more evolutionarily conserved. Of the genes that may have evolved under positive selection (dN/dS>1), 77 out of 83 (93%) were hypothetical. Notable among annotated genes, ribosomal protein L35 appears to be under positive selection in one Synechococcus population. Other annotated genes, in particular a possible porin, a large-conductance mechanosensitive channel, an ATP-binding component of an ABC transporter, and a homologue of a pilus retraction protein, had regions with elevated dN/dS. With the increasing use of next-generation sequencing in metagenomic investigations of microbial diversity and ecology, analytical methods need to accommodate the peculiarities of these data streams. By developing a means to analyze population diversity data from these environmental metagenomes, we have provided the first insight into the role of selection in the evolution of Synechococcus, a globally significant primary producer.
doi:10.1371/journal.pone.0024249
PMCID: PMC3170327  PMID: 21931665
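The dN/dS ratio used above compares the rate of nonsynonymous substitution per nonsynonymous site with the rate of synonymous substitution per synonymous site. A minimal sketch of the final arithmetic, assuming the site and difference counts (as in the Nei-Gojobori approach) have already been tallied and applying a Jukes-Cantor correction; the counts in the example are hypothetical, and the study's own pipeline for short, unlinked reads is not reproduced here.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_sites, syn_sites, nonsyn_diffs, syn_diffs):
    """dN/dS from pre-computed site and difference counts."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("no synonymous differences: dN/dS is undefined")
    return dn / ds


# Hypothetical counts for one gene: few nonsynonymous changes relative to
# synonymous ones, the pattern expected under purifying selection (dN/dS < 1).
print(dn_ds(nonsyn_sites=300, syn_sites=100, nonsyn_diffs=3, syn_diffs=10))
```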
5.  Social Complexity and Nesting Habits Are Factors in the Evolution of Antimicrobial Defences in Wasps 
PLoS ONE  2011;6(7):e21763.
Microbial diseases are important selective agents in social insects, and one major defence mechanism is the secretion of cuticular antimicrobial compounds. We hypothesized that, given differences in group size, social complexity, and nest type, the secretion of these antimicrobials would be under different selective pressures. To test this, we extracted secretions from nine wasp species of varying social complexity and nesting habits and assayed them for antimicrobial activity against cultures of Staphylococcus aureus. These data were then combined with phylogenetic data to provide an evolutionary context. Social species showed significantly higher (18x) antimicrobial activity than solitary species, and species with paper nests showed significantly higher (11x) antimicrobial activity than those which excavated burrows. Mud-nest species showed no antimicrobial activity. Solitary, burrow-provisioning wasps diverged at more basal nodes of the phylogenetic trees, while social wasps diverged from the most recent nodes. These data suggest that antimicrobial defences may have evolved in response to ground-dwelling pathogens, but the most important variable leading to increased antimicrobial strength was an increase in group size and social complexity.
doi:10.1371/journal.pone.0021763
PMCID: PMC3130748  PMID: 21754998
6.  In Vitro Selection of Clinically Relevant Bevirimat Resistance Mutations Revealed by “Deep” Sequencing of Serially Passaged, Quasispecies-Containing Recombinant HIV-1
Journal of Clinical Microbiology  2010;49(1):201-208.
Initial in vitro studies of bevirimat resistance failed to observe mutations in the clinically significant QVT motif in SP1 of HIV-1 gag. This study presents a novel screening method involving mixed, clinically derived gag-protease recombinant HIV-1 samples to more accurately mimic the selection of resistance seen in vivo. Bevirimat resistance was investigated via population-based sequencing performed with a large, initially antiretroviral-naïve cohort before (n = 805) and after (n = 355) standard HIV therapy (without bevirimat). The prevalence of any polymorphism in the motif comprising Q, V, and T was ∼6%, 29%, and 12%, respectively, and did not change appreciably over the course of therapy. From these samples, three groups of 10 samples whose bulk sequences were wild type at the QVT motif were used to generate gag-protease recombinant viruses that captured the existing diversity. Groups were mixed and passaged with various bevirimat concentrations for 9 weeks. gag variations were assessed by amplicon-based “deep” sequencing using a GS FLX sequencer (Roche). Unscreened mutations were present in all groups, and a V370A minority not originally detected by bulk sequencing was present in one group. V370A, occurring together with another preexisting, unscreened resistance mutation, was selected in all groups in the presence of a bevirimat concentration above 0.1 μM. For the two groups with V370A levels below consistent detectability by deep sequencing, the initial selection of V370A required 3 to 4 weeks of exposure to a narrow range of bevirimat concentrations, whereas for the group with the V370A minority, selection occurred immediately. This approach provides quasispecies diversity that facilitates the selection of mutations observed in clinical trials and, coupled with deep sequencing, could represent an efficient in vitro screening method for detecting resistance mutations.
doi:10.1128/JCM.01868-10
PMCID: PMC3020451  PMID: 21084518
7.  Transmitted Drug Resistance in the CFAR Network of Integrated Clinical Systems Cohort: Prevalence and Effects on Pre-Therapy CD4 and Viral Load 
PLoS ONE  2011;6(6):e21189.
Human immunodeficiency virus type 1 (HIV-1) genomes often carry one or more mutations associated with drug resistance upon transmission into a therapy-naïve individual. We assessed the prevalence and clinical significance of transmitted drug resistance (TDR) in chronically infected, therapy-naïve patients enrolled in a multi-center cohort in North America. Pre-therapy clinical significance was quantified by plasma viral load (pVL) and CD4+ cell count (CD4) at baseline. Naïve bulk sequences of HIV-1 protease and reverse transcriptase (RT) were screened for resistance mutations as defined by the World Health Organization surveillance list. The overall prevalence of TDR was 14.2%. We used a Bayesian network to identify co-transmission of TDR mutations in clusters associated with specific drugs or drug classes. Aggregate effects of mutations by drug class were estimated by fitting linear models of pVL and CD4 on weighted sums over TDR mutations according to the Stanford HIV Database algorithm. Transmitted resistance to both classes of reverse transcriptase inhibitors was significantly associated with lower CD4, but had opposing effects on pVL. In contrast, position-specific analyses of TDR mutations revealed substantial effects on CD4 and pVL at several residue positions that were masked in the aggregate analyses, as well as significant interaction effects. Residue positions in RT with predominant effects on CD4 or pVL (D67 and M184) were re-evaluated in causal models using an inverse probability-weighting scheme to address confounding by other mutations and by demographic or risk factors. We found that causal effect estimates of mutations M184V/I ( pVL) and D67N/G ( and pVL) were compensated by K103N/S and K219Q/E/N/R.
As TDR becomes an increasing problem in the current era of highly active antiretroviral therapy, these results have immediate significance for the clinical management of HIV-1 infections and for our understanding of the ongoing adaptation of HIV-1 to human populations.
doi:10.1371/journal.pone.0021189
PMCID: PMC3118815  PMID: 21701595
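The causal re-evaluation above relies on inverse probability weighting. A minimal sketch of the idea, assuming propensity scores (the probability that a patient carries the mutation given confounders) have already been estimated, for example by logistic regression; the function name and toy numbers are hypothetical, and the study's exact weighting scheme may differ (it may, for instance, use stabilized weights).

```python
def ipw_effect(exposed, outcome, propensity):
    """Average causal effect of a binary exposure via inverse probability weighting.

    exposed: 0/1 per patient (e.g. mutation absent/present)
    outcome: numeric outcome per patient (e.g. log10 pVL or CD4 count)
    propensity: estimated P(exposed = 1 | confounders), strictly between 0 and 1
    Returns the difference of weighted (Hajek) means, exposed minus unexposed.
    """
    num1 = den1 = num0 = den0 = 0.0
    for a, y, p in zip(exposed, outcome, propensity):
        if not 0 < p < 1:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if a:
            w = 1.0 / p  # up-weight exposed patients who were unlikely to be exposed
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - p)  # likewise for unexposed patients
            num0 += w * y
            den0 += w
    if den1 == 0 or den0 == 0:
        raise ValueError("need both exposed and unexposed patients")
    return num1 / den1 - num0 / den0


# Hypothetical data: exposure is more likely in some strata than others.
print(ipw_effect([1, 0, 1, 0], [4.0, 2.0, 6.0, 2.0], [0.8, 0.8, 0.2, 0.2]))
```

Weighting each patient by the inverse probability of the exposure they actually had creates a pseudo-population in which exposure is independent of the measured confounders.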
8.  jMOTU and Taxonerator: Turning DNA Barcode Sequences into Annotated Operational Taxonomic Units 
PLoS ONE  2011;6(4):e19259.
Background
DNA barcoding and other DNA sequence-based techniques for investigating and estimating biodiversity require explicit methods for associating individual sequences with taxa, as it is at the taxon level that biodiversity is assessed. For many projects, the bioinformatic analyses required pose problems for laboratories whose prime expertise is not in bioinformatics. User-friendly tools are required for both clustering sequences into molecular operational taxonomic units (MOTU) and for associating these MOTU with known organismal taxonomies.
Results
Here we present jMOTU, a Java program for the analysis of DNA barcode datasets that uses an explicit, determinate algorithm to define MOTU. We demonstrate its usefulness for both individual specimen-based Sanger sequencing surveys and bulk-environment metagenetic surveys using long-read next-generation sequencing data. jMOTU is driven through a graphical user interface, and can analyse tens of thousands of sequences in a short time on a desktop computer. A companion program, Taxonerator, that adds traditional taxonomic annotation to MOTU, is also presented. Clustering and taxonomic annotation data are stored in a relational database, and are thus amenable to subsequent data mining and web presentation.
Conclusions
jMOTU efficiently and robustly identifies the molecular taxa present in survey datasets, and Taxonerator decorates the MOTU with putative identifications. jMOTU and Taxonerator are freely available from http://www.nematodes.org/.
doi:10.1371/journal.pone.0019259
PMCID: PMC3081837  PMID: 21541350
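jMOTU is described as defining MOTU with an explicit, determinate algorithm. One determinate approach is single-linkage clustering at a fixed base-difference cutoff, sketched below assuming pre-aligned sequences of equal length; this is an illustration rather than jMOTU's own code, and the cutoff and sequences are hypothetical.

```python
def cluster_motus(seqs, cutoff):
    """Single-linkage clustering of aligned, equal-length sequences.

    Two sequences fall in the same MOTU if they are joined by a chain of
    pairs that differ at no more than `cutoff` positions.
    Returns a sorted list of MOTUs, each a list of sequence indices.
    """
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences must be aligned to equal length")
    parent = list(range(len(seqs)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            diffs = sum(a != b for a, b in zip(seqs[i], seqs[j]))
            if diffs <= cutoff:
                parent[find(i)] = find(j)  # merge the two MOTUs

    motus = {}
    for i in range(len(seqs)):
        motus.setdefault(find(i), []).append(i)
    return sorted(motus.values())


# Hypothetical barcode fragments with a cutoff of 1 base difference.
print(cluster_motus(["AAAA", "AAAT", "AATT", "CCCC"], cutoff=1))
```

Because single linkage chains pairs together, "AAAA" and "AATT" end up in the same MOTU even though they differ at two positions; the result depends only on the data and the cutoff, which is what makes it determinate.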
9.  Phylogenetic Analysis of Population-Based and Deep Sequencing Data to Identify Coevolving Sites in the nef Gene of HIV-1 
Molecular Biology and Evolution  2009;27(4):819-832.
Rapidly evolving viruses such as HIV-1 display extensive sequence variation in response to host-specific selection, while simultaneously maintaining functions that are critical to replication and infectivity. This apparent conflict between diversifying and purifying selection may be resolved by an abundance of epistatic interactions such that the same functional requirements can be met by highly divergent sequences. We investigate this hypothesis by conducting an extensive characterization of sequence variation in the HIV-1 nef gene, which encodes a highly variable multifunctional protein. Population-based sequences were obtained from 686 patients enrolled in the HOMER cohort in British Columbia, Canada, from which the distribution of nonsynonymous substitutions in the phylogeny was reconstructed by maximum likelihood. We used a phylogenetic comparative method on these data to identify putative epistatic interactions between residues. Two interactions (Y120/Q125 and N157/S169) were chosen to further investigate within-host evolution using HIV-1 RNA extracted from plasma samples from eight patients. Clonal sequencing confirmed strong linkage between polymorphisms at these sites in every case. We used massively parallel pyrosequencing (MPP) to reconstruct within-host evolution in these patients. Experimental error associated with MPP was quantified by performing replicates at two different stages of the protocol, which were pooled prior to analysis to reduce this source of variation. Phylogenetic reconstruction from these data revealed correlated substitutions at Y120/Q125 or N157/S169 repeated across multiple lineages in every host, indicating convergent within-host evolution shaped by epistatic interactions.
doi:10.1093/molbev/msp289
PMCID: PMC2877536  PMID: 19955476
coevolution; epistasis; HIV-1; next-generation sequencing; ancestral reconstruction; sequencing error
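The abstract reports that clonal sequencing confirmed strong linkage between polymorphisms at paired sites. A common summary of such linkage is the disequilibrium coefficient D and its normalized form r², sketched below for two sites in a set of clonal haplotypes. The toy haplotypes hold only the two residues of interest (a hypothetical encoding), and this is not the phylogenetic comparative method used in the study.

```python
def linkage_disequilibrium(haplotypes, site_i, allele_i, site_j, allele_j):
    """Return (D, r^2) for allele_i at site_i and allele_j at site_j.

    haplotypes: aligned sequences, one per clone
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_i = sum(h[site_i] == allele_i for h in haplotypes) / n
    p_j = sum(h[site_j] == allele_j for h in haplotypes) / n
    p_ij = sum(h[site_i] == allele_i and h[site_j] == allele_j for h in haplotypes) / n
    d = p_ij - p_i * p_j  # excess co-occurrence over independence
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    # r^2 is undefined when either site is monomorphic.
    r_squared = d * d / denom if denom > 0 else float("nan")
    return d, r_squared


# Hypothetical clones: position 0 stands in for residue 120, position 1 for 125.
clones = ["YQ", "YQ", "FR", "FR"]
print(linkage_disequilibrium(clones, 0, "Y", 1, "Q"))
```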
10.  Evolutionary Fingerprinting of Genes 
Molecular Biology and Evolution  2009;27(3):520-536.
Over time, natural selection molds every gene into a unique mosaic of sites evolving rapidly or resisting change: an “evolutionary fingerprint” of the gene. Aspects of this evolutionary fingerprint, such as the site-specific ratio of nonsynonymous to synonymous substitution rates (dN/dS), are commonly used to identify genetic features of potential biological interest; however, no framework exists for comparing evolutionary fingerprints between genes. We hypothesize that protein-coding genes with similar protein structure and/or function tend to have similar evolutionary fingerprints and that comparing evolutionary fingerprints can be useful for discovering similarities between genes in a way that is analogous to, but independent of, discovery of similarity via sequence-based comparison tools such as BLAST.
To test this hypothesis, we develop a novel model of coding sequence evolution that uses a general bivariate discrete parameterization of the evolutionary rates. We show that this approach provides a better fit to the data using a smaller number of parameters than existing models. Next, we use the model to represent evolutionary fingerprints as probability distributions and present a methodology for comparing these distributions in a way that is robust against variations in data set size and divergence. Finally, using sequences of three rapidly evolving RNA viruses (HIV-1, hepatitis C virus, and influenza A virus), we demonstrate that genes within the same functional group tend to have similar evolutionary fingerprints. Our framework provides a sound statistical foundation for efficient inference and comparison of evolutionary rate patterns in arbitrary collections of gene alignments, clustering homologous and nonhomologous genes, and investigation of biological and functional correlates of evolutionary rates.
doi:10.1093/molbev/msp260
PMCID: PMC2877558  PMID: 19864470
adaptive evolution; codon models; evolutionary distance; machine classification
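The framework above represents evolutionary fingerprints as probability distributions over discrete rate classes and compares them. To illustrate what such a comparison computes, the sketch below uses the Jensen-Shannon divergence between two discrete distributions keyed by (synonymous rate, nonsynonymous rate) classes. The choice of divergence and the rate classes are assumptions made for illustration; the paper's own comparison method may differ.

```python
import math


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (in bits) between two discrete distributions.

    p, q: dicts mapping a rate class, e.g. (alpha, beta), to its probability.
    Returns a value in [0, 1]; 0 means identical fingerprints.
    """
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(a):
        # Kullback-Leibler divergence from a to the mixture m.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in keys if a.get(k, 0.0) > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


# Hypothetical fingerprints: weight on a conserved class (1.0, 0.2)
# versus a rapidly evolving class (1.0, 2.0).
gene_a = {(1.0, 0.2): 0.6, (1.0, 2.0): 0.4}
gene_b = {(1.0, 0.2): 0.2, (1.0, 2.0): 0.8}
print(jensen_shannon(gene_a, gene_b))
```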
11.  Protease polymorphisms in HIV-1 subtype CRF01_AE represent selection by antiretroviral therapy and host immune pressure 
AIDS (London, England)  2010;24(3):411-416.
Background
Most of our knowledge about how antiretrovirals and host immune responses influence the HIV-1 protease gene is derived from studies of subtype B virus. We investigated the effect of protease resistance-associated mutations (PRAMs) and population-based HLA haplotype frequencies on polymorphisms found in CRF01_AE pro.
Methods
We used all CRF01_AE protease sequences retrieved from the LANL database and obtained regional HLA frequencies from the dbMHC database. Polymorphisms and major PRAMs in the sequences were identified using the Stanford HIV Drug Resistance Database, and we performed phylogenetic and selection analyses using HyPhy. HLA binding affinities were estimated using the Immune Epitope Database and Analysis Resource.
Results
Overall, 99% of CRF01_AE sequences had at least 1 polymorphism and 10% had at least 1 major PRAM. Three polymorphisms (L10V, K20RMI and I62V) were associated with the presence of a major PRAM (P < 0.05). Compared with the subtype B consensus, six additional polymorphisms (I13V, E35D, M36I, R41K, H69K, L89M) were identified in the CRF01_AE consensus; all but L89M were located within epitopes recognized by HLA class I alleles. Of the predominant HLA haplotypes in the Asian regions of CRF01_AE origin, 80% were positively associated with the observed polymorphisms, and HLA binding affinity was estimated to decrease 19- to 40-fold with the observed polymorphisms at positions 35, 36 and 41.
Conclusion
Polymorphisms in the CRF01_AE protease gene were common; polymorphisms at residues 10, 20 and 62 most likely represent selection by use of protease inhibitors, whereas R41K and H69K were more likely attributable to recognition of epitopes by the HLA haplotypes of the host population.
doi:10.1097/QAD.0b013e3283350eef
PMCID: PMC2913588  PMID: 20009919
CRF01_AE; HIV; HLA; polymorphisms; protease; resistance
12.  Purging Deleterious Mutations under Self Fertilization: Paradoxical Recovery in Fitness with Increasing Mutation Rate in Caenorhabditis elegans 
PLoS ONE  2010;5(12):e14473.
Background
The accumulation of deleterious mutations can drastically reduce population mean fitness. Self-fertilization is thought to be an effective means of purging deleterious mutations. However, widespread linkage disequilibrium generated and maintained by self-fertilization is predicted to reduce the efficacy of purging when mutations are present at multiple loci.
Methodology/Principal Findings
We tested the ability of self-fertilizing populations to purge deleterious mutations at multiple loci by exposing obligately self-fertilizing populations of Caenorhabditis elegans to a range of elevated mutation rates, and found that mutations accumulated in each population, as evidenced by a reduction in mean fitness. Therefore, purging in obligate selfing populations is overwhelmed by an increase in mutation rate. Surprisingly, we also found that obligate and predominantly self-fertilizing populations exposed to very high mutation rates exhibited consistently greater fitness than those subject to lesser increases in mutation rate, which contradicts the assumption that increases in mutation rate are negatively correlated with fitness. The high levels of genetic linkage inherent in self-fertilization could drive this fitness increase.
Conclusions
Compensatory mutations can be more frequent under high mutation rates and may alleviate a portion of the fitness lost due to the accumulation of deleterious mutations through epistatic interactions with deleterious mutations. The prolonged maintenance of tightly linked compensatory and deleterious mutations facilitated by self-fertilization may be responsible for the fitness increase as linkage disequilibrium between the compensatory and deleterious mutations preserves their epistatic interaction.
doi:10.1371/journal.pone.0014473
PMCID: PMC3013097  PMID: 21217820
13.  jsPhyloSVG: A Javascript Library for Visualizing Interactive and Vector-Based Phylogenetic Trees on the Web 
PLoS ONE  2010;5(8):e12267.
Background
Many software packages have been developed to address the need for generating phylogenetic trees intended for print. With an increased use of the web to disseminate scientific literature, there is a need for phylogenetic trees to be viewable across many types of devices and feature some of the interactive elements that are integral to the browsing experience. We propose a novel approach for publishing interactive phylogenetic trees.
Methods/Principal Findings
We present a JavaScript library, jsPhyloSVG, which facilitates constructing interactive phylogenetic trees from raw Newick or phyloXML formats directly within the browser in Scalable Vector Graphics (SVG) format. It is designed to work across all major browsers and renders an alternative format for browsers that do not support SVG. The library provides tools for building rectangular and circular phylograms with integrated charting. Interactive features may be integrated and made to respond to events such as clicks on any element of the tree, including labels.
Conclusions/Significance
jsPhyloSVG is an open-source solution for rendering dynamic phylogenetic trees. It is capable of generating complex and interactive phylogenetic trees across all major browsers without the need for plugins. It is novel in interpreting tree inference formats directly, exposing the underlying markup to data-mining services. The library source code, extensive documentation and live examples are freely accessible at www.jsphylosvg.com.
doi:10.1371/journal.pone.0012267
PMCID: PMC2923619  PMID: 20805892
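jsPhyloSVG builds trees from raw Newick strings in the browser. The parsing step it has to perform can be illustrated with a small recursive-descent Newick parser, sketched below in Python rather than JavaScript; it is not the library's code, and it assumes a simplified Newick subset (no whitespace inside the string, no quoted labels, no comments).

```python
def parse_newick(text):
    """Parse a simplified Newick string into nested dicts of the form
    {"name": str, "length": float or None, "children": [...]}."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":  # internal node: parse a comma-separated subtree list
            pos += 1
            while True:
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
        start = pos  # optional label runs up to the next delimiter
        while text[pos] not in ":,();":
            pos += 1
        name = text[start:pos]
        length = None
        if text[pos] == ":":  # optional branch length
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    tree = parse_node()
    if text[pos] != ";":
        raise ValueError(f"trailing characters at position {pos}")
    return tree


tree = parse_newick("((A:0.1,B:0.2)AB:0.3,C:0.4);")
print([child["name"] for child in tree["children"]])
```

A renderer would then walk this nested structure to compute node coordinates and emit SVG elements.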
14.  The Re-Emergence of H1N1 Influenza Virus in 1977: A Cautionary Tale for Estimating Divergence Times Using Biologically Unrealistic Sampling Dates 
PLoS ONE  2010;5(6):e11184.
In 1977, H1N1 influenza A virus reappeared after a 20-year absence. Genetic analysis indicated that this strain was missing decades of nucleotide sequence evolution, suggesting an accidental release of a frozen laboratory strain into the general population. Recently, this strain and its descendants were included in an analysis attempting to date the origin of pandemic influenza virus without accounting for the missing decades of evolution. Here, we investigated the effect of using viral isolates with biologically unrealistic sampling dates on estimates of divergence dates. Not accounting for missing sequence evolution produced biased results and increased the variance of date estimates of the most recent common ancestor of the re-emergent lineages and across the entire phylogeny. Reanalysis of the H1N1 sequences excluding isolates with unrealistic sampling dates indicates that the 1977 re-emergent lineage was circulating for approximately one year before detection, making it difficult to determine the geographic source of reintroduction. We suggest that a new method is needed to account for viral isolates with unrealistic sampling dates.
doi:10.1371/journal.pone.0011184
PMCID: PMC2887442  PMID: 20567599
15.  Compensatory mutations are repeatable and clustered within proteins 
Compensatory mutations improve fitness in genotypes that contain deleterious mutations but have no beneficial effects otherwise. As such, compensatory mutations represent a very specific form of epistasis. We show that intragenic compensatory mutations occur non-randomly over gene sequence. Compensatory mutations are more likely to appear at some sites than others. Moreover, the sites of compensatory mutations are more likely than expected by chance to be near the site of the original deleterious mutation. Furthermore, compensatory mutations tend to occur more commonly in certain regions of the protein even when controlling for clustering around the site of the deleterious mutation. These results suggest that compensatory evolution at the protein level is partially predictable and may be convergent.
doi:10.1098/rspb.2008.1846
PMCID: PMC2674493  PMID: 19324785
compensatory mutation; deleterious mutations; experimental evolution; epistasis; primary structure
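The claim that compensatory mutations lie closer to the original deleterious mutation than expected by chance can be framed as a permutation test. A minimal sketch, assuming paired residue positions and a null model in which each compensatory site falls uniformly at random along the protein; both the null model and the toy positions are assumptions, not the analysis reported in the paper.

```python
import random


def mean_distance(deleterious, compensatory):
    """Mean absolute residue distance between paired mutation sites."""
    return sum(abs(c - d) for d, c in zip(deleterious, compensatory)) / len(deleterious)


def clustering_test(deleterious, compensatory, protein_length, n_perm=10000, seed=1):
    """One-sided permutation test for clustering of compensatory sites.

    deleterious, compensatory: paired 1-based residue positions
    protein_length: number of residues in the protein
    Returns (observed mean distance, permutation p-value).
    """
    if len(deleterious) != len(compensatory) or not deleterious:
        raise ValueError("need non-empty, paired lists of sites")
    rng = random.Random(seed)
    observed = mean_distance(deleterious, compensatory)
    hits = 0
    for _ in range(n_perm):
        # Null model (assumption): each compensatory site is a uniformly random residue.
        null_sites = [rng.randint(1, protein_length) for _ in deleterious]
        if mean_distance(deleterious, null_sites) <= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical pairs of (deleterious, compensatory) sites in a 250-residue protein.
print(clustering_test([50, 120, 200, 80, 150], [52, 118, 203, 81, 149], 250, n_perm=2000))
```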
16.  Atlantic Cod Piscidin and Its Diversification through Positive Selection 
PLoS ONE  2010;5(3):e9501.
Piscidins constitute a family of cationic antimicrobial peptides that are thought to play an important role in the innate immune response of teleosts. On the one hand, they show a remarkable diversity, which indicates that they are shaped by positive selection; on the other hand, they are ancient and have specific targets, suggesting that they are constrained by purifying selection. Until now, piscidins had only been found in fish species from the superorder Acanthopterygii, but we have recently identified a piscidin gene in Atlantic cod (Gadus morhua), thus showing that these antimicrobial peptides are not restricted to evolutionarily modern teleosts. Nucleotide diversity was much higher in the regions of the piscidin gene that code for the mature peptide and its pro domain than in the signal peptide. Maximum likelihood analyses with different evolutionary models revealed that the piscidin gene is under positive selection. Charge- or hydrophobicity-changing amino acid substitutions observed in positively selected sites within the mature peptide influence its amphipathic structure and can have a marked effect on its function. This diversification might be associated with adaptation to new habitats or rapidly evolving pathogens.
doi:10.1371/journal.pone.0009501
PMCID: PMC2830478  PMID: 20209162
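The abstract compares nucleotide diversity between regions of the piscidin gene. Nucleotide diversity (π) is the average proportion of differing sites over all pairs of aligned sequences; a minimal sketch follows, assuming equal-length aligned sequences with gaps and ambiguity codes treated as ordinary characters. The alleles and region boundaries in the example are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average proportion of differing sites over all pairs of aligned sequences (pi)."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and aligned to equal length")
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) / length for x, y in pairs)
    return total / len(pairs)


# Hypothetical alleles; the first 3 bases stand in for a signal-peptide region
# and the remainder for a mature-peptide region.
alleles = ["ATGAAACCC", "ATGAAGCTC", "ATGTAGCCA"]
print(nucleotide_diversity([s[:3] for s in alleles]))
print(nucleotide_diversity([s[3:] for s in alleles]))
```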
17.  An Evolutionary Model-Based Algorithm for Accurate Phylogenetic Breakpoint Mapping and Subtype Prediction in HIV-1 
PLoS Computational Biology  2009;5(11):e1000581.
Genetically diverse pathogens (such as human immunodeficiency virus type 1, HIV-1) are frequently stratified into phylogenetically or immunologically defined subtypes for classification purposes. Computational identification of such subtypes is helpful in surveillance, epidemiological analysis and detection of novel variants, e.g., circulating recombinant forms in HIV-1. A number of conceptually and technically different techniques have been proposed for determining the subtype of a query sequence, but no single approach is universally optimal. We present a model-based phylogenetic method for automatically subtyping an HIV-1 (or other viral or bacterial) sequence, mapping the location of breakpoints and assigning parental sequences in recombinant strains, as well as computing confidence levels for the inferred quantities. Our Subtype Classification Using Evolutionary ALgorithms (SCUEAL) procedure is shown to perform very well in a variety of simulation scenarios, runs in parallel when multiple sequences are being screened, and matches or exceeds the performance of existing approaches on typical empirical cases. We applied SCUEAL to all available polymerase (pol) sequences from two large databases, the Stanford Drug Resistance database and the UK HIV Drug Resistance Database. Comparison with previously assigned subtypes revealed that a minor but substantial (≈5%) fraction of sequences classified as pure subtypes may in fact be within- or inter-subtype recombinants. A free implementation of SCUEAL is provided as a module for the HyPhy package and the Datamonkey web server. Our method is especially useful when an accurate automatic classification of an unknown strain is desired, and is positioned to complement and extend faster but less accurate methods.
Given the increasingly frequent use of HIV subtype information in studies focusing on the effect of subtype on treatment, clinical outcome, pathogenicity and vaccine design, the importance of accurate, robust and extensible subtyping procedures is clear.
Author Summary
There are nine different subtypes of the main group of HIV-1, each originating as a distinct subepidemic of HIV-1. The distribution of subtypes is often unique to a given geographic region of the world and constitutes a useful epidemiological and surveillance resource. The effects of viral subtype on disease progression, treatment outcome and vaccine design are being actively researched, and the importance of accurate subtyping procedures is clear. In HIV-1, subtype assignment is complicated by frequent recombination among co-circulating strains, creating new genetic mosaics or recombinant forms: 43 have been characterized to date, and many more likely exist. We present an automated phylogenetic method (SCUEAL) to accurately characterize both simple and complex HIV-1 mosaics. Using computer simulations and biological data, we demonstrate that SCUEAL performs very well under various conditions, especially when some of the existing classification procedures fail. Furthermore, we show that a small but noticeable proportion of subtype assignments stored in public databases may be incomplete or incorrect. The computational technique introduced here should provide a much more accurate characterization of HIV-1 strains, especially novel recombinants, and lead to new insights into the molecular history, epidemiology and geographical distribution of the virus.
doi:10.1371/journal.pcbi.1000581
PMCID: PMC2776870  PMID: 19956739
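The core idea of mapping a breakpoint and assigning parental sequences can be illustrated with a much simpler technique than SCUEAL itself. The sketch below is a distance-based single-breakpoint scan, not SCUEAL's phylogenetic likelihood search with evolutionary algorithms: it assumes the query and the subtype references are already aligned to equal length, scores each segment by mismatch count, and charges a hypothetical `penalty` per breakpoint so that a recombinant explanation must beat the best pure subtype by a margin. The function names are illustrative only.

```python
from itertools import product


def mismatch_prefix(query, ref):
    """prefix[i] = number of mismatches between query[:i] and ref[:i]."""
    prefix = [0]
    for q, r in zip(query, ref):
        prefix.append(prefix[-1] + (q != r))
    return prefix


def best_single_breakpoint(query, refs, penalty=1):
    """Hypothetical helper: return (left_ref, right_ref, breakpoint, score).

    breakpoint is None when a pure (non-recombinant) assignment scores best.
    Assumes query and every reference sequence are aligned and equal length.
    """
    n = len(query)
    pre = {name: mismatch_prefix(query, seq) for name, seq in refs.items()}
    # Pure subtype: the single reference with the fewest whole-length mismatches.
    best = min(((name, name, None, p[n]) for name, p in pre.items()),
               key=lambda t: t[3])
    # Recombinant: left part from reference a, right part from reference b.
    for (a, pa), (b, pb) in product(pre.items(), repeat=2):
        if a == b:
            continue
        for k in range(1, n):
            score = pa[k] + (pb[n] - pb[k]) + penalty
            if score < best[3]:
                best = (a, b, k, score)
    return best


refs = {"A": "AAAAAAAAAA", "B": "CCCCCCCCCC"}
print(best_single_breakpoint("AAAAACCCCC", refs))  # ('A', 'B', 5, 1)
```

Prefix sums make each candidate breakpoint an O(1) lookup; a real subtyping tool would replace raw mismatch counts with a substitution model and attach confidence levels to the inferred breakpoint.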
18.  Parsing Social Network Survey Data from Hidden Populations Using Stochastic Context-Free Grammars 
PLoS ONE  2009;4(9):e6777.
Background
Human populations are structured by social networks, in which individuals tend to form relationships based on shared attributes. Certain attributes that are ambiguous, stigmatized or illegal can create a ‘hidden’ population, so-called because its members are difficult to identify. Many hidden populations are also at an elevated risk of exposure to infectious diseases. Consequently, public health agencies are presently adopting modern survey techniques that traverse social networks in hidden populations by soliciting individuals to recruit their peers, e.g., respondent-driven sampling (RDS). The concomitant accumulation of network-based epidemiological data, however, is rapidly outpacing the development of computational methods for analysis. Moreover, current analytical models rely on unrealistic assumptions, e.g., that the traversal of social networks can be modeled by a Markov chain rather than a branching process.
Methodology/Principal Findings
Here, we develop a new methodology based on stochastic context-free grammars (SCFGs), which are well-suited to modeling tree-like structure of the RDS recruitment process. We apply this methodology to an RDS case study of injection drug users (IDUs) in Tijuana, México, a hidden population at high risk of blood-borne and sexually-transmitted infections (i.e., HIV, hepatitis C virus, syphilis). Survey data were encoded as text strings that were parsed using our custom implementation of the inside-outside algorithm in a publicly-available software package (HyPhy), which uses either expectation maximization or direct optimization methods and permits constraints on model parameters for hypothesis testing. We identified significant latent variability in the recruitment process that violates assumptions of Markov chain-based methods for RDS analysis: firstly, IDUs tended to emulate the recruitment behavior of their own recruiter; and secondly, the recruitment of like peers (homophily) was dependent on the number of recruits.
Conclusions
SCFGs provide a rich probabilistic language that can articulate complex latent structure in survey data derived from the traversal of social networks. Such structure, which has no representation in Markov chain-based models, can interfere with the estimation of the composition of hidden populations if left unaccounted for, with critical implications for the prevention and control of infectious disease epidemics.
doi:10.1371/journal.pone.0006777
PMCID: PMC2734164  PMID: 19738904
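The inside half of the inside-outside algorithm mentioned above computes the probability that a grammar generates a given string by summing over all parse trees. The sketch below implements the inside pass for an SCFG in Chomsky normal form; the toy grammar (S → S S with probability 0.4, S → "r" with probability 0.6) and the token "r" are hypothetical stand-ins, since the abstract does not give the paper's actual grammar or string encoding of RDS recruitment trees. The outside pass and the expectation-maximization updates used for parameter estimation are omitted.

```python
from collections import defaultdict


def inside_probability(tokens, binary_rules, lexical_rules, start="S"):
    """Inside algorithm for an SCFG in Chomsky normal form.

    binary_rules:  {A: [(B, C, prob), ...]} for rules A -> B C
    lexical_rules: {A: {terminal: prob}}    for rules A -> terminal
    Returns P(tokens | grammar), summed over every parse rooted at `start`.
    """
    n = len(tokens)
    # beta[(i, j)][A] = probability that nonterminal A derives tokens[i..j].
    beta = defaultdict(lambda: defaultdict(float))
    for i, tok in enumerate(tokens):
        for a, emissions in lexical_rules.items():
            if tok in emissions:
                beta[(i, i)][a] += emissions[tok]
    # Build longer spans from shorter ones, splitting at every k.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for a, rules in binary_rules.items():
                for b, c, p in rules:
                    for k in range(i, j):
                        beta[(i, j)][a] += p * beta[(i, k)][b] * beta[(k + 1, j)][c]
    return beta[(0, n - 1)][start]


binary = {"S": [("S", "S", 0.4)]}
lexical = {"S": {"r": 0.6}}
print(inside_probability(["r", "r"], binary, lexical))  # approximately 0.144
```

Because every span is built from strictly shorter spans, each entry of the chart is complete before it is used, giving the usual cubic running time in string length.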
19.  Spidermonkey: rapid detection of co-evolving sites using Bayesian graphical models 
Bioinformatics  2008;24(17):1949-1950.
Spidermonkey is a new component of the Datamonkey suite of phylogenetic tools that provides methods for detecting coevolving sites from a multiple alignment of homologous nucleotide or amino acid sequences. It reconstructs the substitution history of the alignment by maximum likelihood-based phylogenetic methods, and then analyzes the joint distribution of substitution events using Bayesian graphical models to identify significant associations among sites.
Availability: Spidermonkey is publicly available both as a web application at http://www.data-monkey.org and as a stand-alone component of the phylogenetic software package HyPhy, which is freely distributed on the web (http://www.hyphy.org) as precompiled binaries and open source.
Contact: afpoon@ucsd.edu
doi:10.1093/bioinformatics/btn313
PMCID: PMC2732215  PMID: 18562270
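The scoring step behind a Bayesian graphical model of co-evolving sites can be shown for the smallest possible case: two sites, each summarized as a binary vector recording whether a substitution was inferred on each branch. The sketch below compares a network with an edge X → Y against one without, using the K2 marginal likelihood (uniform Dirichlet prior, pseudocounts of 1). This is an assumption-laden simplification: Spidermonkey searches over network structures for many sites at once, and the function names here are hypothetical.

```python
from collections import Counter
from math import lgamma


def k2_log_score(child, parent=None, r=2):
    """Log K2 marginal likelihood of a discrete node with r states,
    optionally conditioned on a single parent node."""
    if parent is None:
        parent = [0] * len(child)  # one shared "parent configuration"
    score = 0.0
    for pv in set(parent):
        counts = Counter(c for c, p in zip(child, parent) if p == pv)
        n_j = sum(counts.values())
        score += lgamma(r) - lgamma(n_j + r)
        # Unobserved states contribute lgamma(1) = 0, so they can be skipped.
        score += sum(lgamma(n + 1) for n in counts.values())
    return score


def log_bayes_factor(x, y):
    """log P(data | X -> Y) - log P(data | X, Y independent).

    The marginal score of X is shared by both models and cancels.
    """
    return k2_log_score(y, x) - k2_log_score(y)


# Substitutions on the same 10 of 20 branches at both sites: strong association.
x = [0] * 10 + [1] * 10
print(log_bayes_factor(x, x) > 0)  # True
```

A positive log Bayes factor favours a dependency between the two sites; a full analysis would also need a prior over network structures and a search or sampling procedure over them.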
20.  A Maximum Likelihood Method for Detecting Directional Evolution in Protein Sequences and Its Application to Influenza A Virus 
Molecular Biology and Evolution  2008;25(9):1809-1824.
We develop a model-based phylogenetic maximum likelihood test for evidence of preferential substitution toward a given residue at individual positions of a protein alignment—directional evolution of protein sequences (DEPS). DEPS can identify both the target residue and sites evolving toward it, help detect selective sweeps and frequency-dependent selection—scenarios that confound most existing tests for selection, and achieve good power and accuracy on simulated data. We applied DEPS to alignments representing different genomic regions of influenza A virus (IAV), sampled from avian hosts (H5N1 serotype) and human hosts (H3N2 serotype), and identified multiple directionally evolving sites in 5/8 genomic segments of H5N1 and H3N2 IAV. We propose a simple descriptive classification of directionally evolving sites into 5 groups based on the temporal distribution of residue frequencies and document known functional correlates, such as immune escape or host adaptation.
doi:10.1093/molbev/msn123
PMCID: PMC2515872  PMID: 18511426
directional selection; evolution of influenza; maximum likelihood; episodic selection
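To see how a preference for a target residue shifts a site's long-run composition, consider a deliberately simple substitution model. The sketch below is not the DEPS parameterization (which the abstract does not give); it assumes an F81-style model in which the rate from any residue i to another residue j equals f_j, with f = 1 for every residue except a hypothetical target whose weight is `bias`. Under this model the transition probabilities have the closed form P_ij(t) = e^(-St) δ_ij + (1 - e^(-St)) f_j / S, where S is the sum of all weights, so the equilibrium frequency of the target is bias / (bias + 19) for 20 amino acids.

```python
from math import exp


def biased_transition_prob(i, j, t, target, bias, n_states=20):
    """P(residue j at time t | residue i at time 0) under an F81-style model
    whose rate into state j (j != i) is f_j, with f_target = bias, else 1."""
    f = [1.0] * n_states
    f[target] = bias
    total = sum(f)              # S; equilibrium frequencies are f_j / S
    decay = exp(-total * t)     # probability mass not yet redistributed
    return decay * (i == j) + (1 - decay) * f[j] / total


# With no bias the target is reached at the uniform rate 1/20;
# a bias of 5 raises its equilibrium frequency to 5/24.
print(biased_transition_prob(0, 3, t=10.0, target=3, bias=1.0))  # 0.05
```

A likelihood ratio test in this spirit would fit the model with and without the bias parameter on a phylogeny and compare the maximized log-likelihoods; that step requires a tree and is not shown.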
21.  A Comparative Proteomic Analysis of the Simple Amino Acid Repeat Distributions in Plasmodia Reveals Lineage Specific Amino Acid Selection 
PLoS ONE  2009;4(7):e6231.
Background
Microsatellites have been used extensively in the field of comparative genomics. Microsatellites in coding regions provide a simple model of how genotypic changes undergo selection, because they are directly expressed in the phenotype as altered proteins. The simplest of these tandem repeats in coding regions are the tri-nucleotide repeats, which produce a repeat of a single amino acid when translated into proteins. Tri-nucleotide repeats are often disease-associated and are also known to be unstable to both expansion and contraction. This makes them sensitive markers for studying proteome evolution in closely related species.
Results
The evolutionary history of the malaria-causing parasites of the genus Plasmodium is complex because of the organism's life cycle, in which it interacts with a number of different hosts and passes through a series of tissue-specific stages. This study shows that the divergence between the primate and rodent malarial parasites has resulted in a lineage-specific change in the distribution of simple amino acid repeats (SAARs) that is correlated with A–T content. The paper also shows that this altered use of amino acids in SAARs is consistent with the repeat distributions being under selective pressure.
Conclusions
The study shows that simple amino acid repeat distributions can be used to group related species and to examine their phylogenetic relationships. It also shows that an outgroup species with a similar A–T content can be distinguished based only on amino acid usage in repeats, and suggests that this might be a useful feature for proteome clustering. The lineage-specific use of amino acids in repeat regions suggests that comparative studies of SAAR distributions between proteomes give insight into the mechanisms of expansion and the selective pressures acting on the organism.
doi:10.1371/journal.pone.0006231
PMCID: PMC2705789  PMID: 19597555
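The two quantities this study relates, single amino acid repeats in a proteome and the A–T content of the underlying DNA, are straightforward to compute. The sketch below finds runs of one residue at or above a minimum length and computes A–T content; the threshold `min_len=5` and the function names are hypothetical choices for illustration, as the abstract does not state the repeat-length cutoff used in the paper.

```python
from collections import Counter
from itertools import groupby


def find_saars(protein, min_len=5):
    """Return (residue, start, length) for every run of a single amino acid
    at least min_len long (min_len is an illustrative threshold)."""
    saars, pos = [], 0
    for residue, run in groupby(protein):
        length = len(list(run))
        if length >= min_len:
            saars.append((residue, pos, length))
        pos += length
    return saars


def saar_usage(proteins, min_len=5):
    """Count repeat residues across a proteome, weighted by repeat length."""
    usage = Counter()
    for seq in proteins:
        for residue, _, length in find_saars(seq, min_len):
            usage[residue] += length
    return usage


def at_content(dna):
    """Fraction of A and T among unambiguous bases (A, C, G, T)."""
    dna = dna.upper()
    acgt = sum(dna.count(b) for b in "ACGT")
    return (dna.count("A") + dna.count("T")) / acgt if acgt else 0.0


print(find_saars("MQQQQQKLNNNNNNP"))  # [('Q', 1, 5), ('N', 8, 6)]
```

Comparing `saar_usage` profiles between proteomes, alongside their genomic A–T content, gives the kind of per-lineage repeat distribution the abstract describes.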
22.  Immune-driven recombination and loss of control after HIV superinfection 
The Journal of Experimental Medicine  2008;205(8):1789-1796.
After acute HIV infection, CD8+ T cells are able to control viral replication to a set point. This control is often lost after superinfection, although the mechanism behind this remains unclear. In this study, we illustrate in an HLA-B27+ subject that loss of viral control after HIV superinfection coincides with rapid recombination events within two narrow regions of Gag and Env. Screening for CD8+ T cell responses revealed that each of these recombination sites (∼50 aa) encompassed distinct regions containing two immunodominant CD8 epitopes (B27-KK10 in Gag and Cw1-CL9 in Env). Viral escape and the subsequent development of variant-specific de novo CD8+ T cell responses against both epitopes were illustrative of the significant immune selection pressures exerted by both responses. Comprehensive analysis of the kinetics of CD8 responses and viral evolution indicated that the recombination events quickly facilitated viral escape from both dominant WT- and variant-specific responses. These data suggest that the ability of a superinfecting strain of HIV to overcome preexisting immune control may be related to its ability to rapidly recombine in critical regions under immune selection pressure. These data also support a role for cellular immune pressures in driving the selection of new recombinant forms of HIV.
doi:10.1084/jem.20080281
PMCID: PMC2525594  PMID: 18625749
23.  Herpes Simplex Virus Type 2 Acquisition During Recent HIV Infection Does Not Influence Plasma HIV Levels 
Summary
We assessed the effect of herpes simplex virus type 2 (HSV-2) acquisition on the plasma HIV RNA and CD4 cell levels among individuals with primary HIV infection using a retrospective cohort analysis. We studied 119 adult, antiretroviral-naive, recently HIV-infected men with a negative HSV-2–specific enzyme immunoassay (EIA) result at enrollment. HSV-2 acquisition was determined by seroconversion on HSV-2 EIA, confirmed by Western blot analysis. Ten men acquired HSV-2 infection a median of 1.3 years after HIV infection (HSV-2 incidence rate of 7.4 per 100 person-years of follow-up). The median time of follow-up after acquiring HSV-2 infection was 303 days. All men except 1 were asymptomatic during HSV-2 acquisition, and only 1 HSV-2 seroconverter, who was asymptomatic, had a transient increase in blood HIV load (0.5 log10 copies/mL over 11 days). The HSV-2 incidence rate was high in our cohort of recently HIV-infected individuals; however, HSV-2 acquisition did not significantly change the plasma HIV dynamics and CD4 cell levels.
doi:10.1097/QAI.0b013e318163bd87
PMCID: PMC2630881  PMID: 18197122
HIV RNA; incident herpes simplex virus-2; viral dynamics
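The summary reports 10 HSV-2 seroconversions and an incidence rate of 7.4 per 100 person-years, but not the total follow-up time. The sketch below shows the standard rate arithmetic and back-calculates the implied follow-up, roughly 135 person-years; that figure is derived from the two reported numbers (and affected by rounding of the rate), not reported by the authors.

```python
def incidence_per_100py(events, person_years):
    """Incidence rate expressed per 100 person-years of follow-up."""
    return 100.0 * events / person_years


def implied_person_years(events, rate_per_100py):
    """Follow-up time implied by an event count and a reported rate."""
    return 100.0 * events / rate_per_100py


follow_up = implied_person_years(10, 7.4)
print(round(follow_up, 1))  # 135.1
```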
24.  Mapping Protease Inhibitor Resistance to Human Immunodeficiency Virus Type 1 Sequence Polymorphisms within Patients
Journal of Virology  2007;81(24):13598-13607.
Resistance genotyping provides an important resource for the clinical management of patients infected with human immunodeficiency virus type 1 (HIV-1). However, resistance to protease (PR) inhibitors (PIs) is a complex phenotype shaped by interactions among nearly half of the residues in HIV-1 PR. Previous studies of the genetic basis of PI resistance focused on fixed substitutions among populations of HIV-1, i.e., host-specific adaptations. Consequently, they are susceptible to a high false discovery rate due to founder effects. Here, we employ sequencing “mixtures” (i.e., ambiguous base calls) as a site-specific marker of genetic variation within patients that is independent of the phylogeny. We demonstrate that the transient response to selection by PIs is manifested as an excess of nonsynonymous mixtures. Using a sample of 5,651 PR sequences isolated from both PI-naive and -treated patients, we analyze the joint distribution of mixtures and eight PIs as a Bayesian network, which distinguishes residue-residue interactions from direct associations with PIs. We find that selection for resistance is associated with the emergence of nonsynonymous mixtures in two distinct groups of codon sites clustered along the substrate cleft and distal regions of PR, respectively. Within-patient evolution at several positions is independent of PIs, including those formerly postulated to be involved in resistance. These positions are under strong positive selection in the PI-naive patient population, implying that other factors can produce spurious associations with resistance, e.g., mutational escape from the immune response.
doi:10.1128/JVI.01570-07
PMCID: PMC2168824  PMID: 17913806
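A sequencing "mixture" is an ambiguous base call, such as R for A/G, and whether it is synonymous or nonsynonymous depends on whether the resolved codons encode the same amino acid. The sketch below expands IUPAC ambiguity codes within a codon and classifies the mixture using the standard genetic code. The abstract does not describe the paper's exact classification rules, so this is an illustrative assumption, and `classify_mixture` is a hypothetical helper.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# IUPAC nucleotide ambiguity codes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
         "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}


def classify_mixture(codon):
    """Return 'none', 'synonymous' or 'nonsynonymous' for one codon call."""
    options = [IUPAC[base] for base in codon.upper()]
    resolved = ["".join(p) for p in product(*options)]
    if len(resolved) == 1:
        return "none"  # unambiguous call: no mixture at this codon
    aminos = {CODON_TABLE[c] for c in resolved}
    return "synonymous" if len(aminos) == 1 else "nonsynonymous"


print(classify_mixture("GAR"))  # synonymous (GAA and GAG both encode E)
print(classify_mixture("RTG"))  # nonsynonymous (ATG encodes M, GTG encodes V)
```

Tallying these labels per codon position across many patients yields the site-by-patient mixture indicators that could then be analysed jointly with drug exposure.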
25.  The 2006 NESCent Phyloinformatics Hackathon: A Field Report 
In December, 2006, a group of 26 software developers from some of the most widely used life science programming toolkits and phylogenetic software projects converged on Durham, North Carolina, for a Phyloinformatics Hackathon, an intense five-day collaborative software coding event sponsored by the National Evolutionary Synthesis Center (NESCent). The goal was to help researchers to integrate multiple phylogenetic software tools into automated workflows. Participants addressed deficiencies in interoperability between programs by implementing “glue code” and improving support for phylogenetic data exchange standards (particularly NEXUS) across the toolkits. The work was guided by use-cases compiled in advance by both developers and users, and the code was documented as it was developed. The resulting software is freely available for both users and developers through incorporation into the distributions of several widely-used open-source toolkits. We explain the motivation for the hackathon, how it was organized, and discuss some of the outcomes and lessons learned. We conclude that hackathons are an effective mode of solving problems in software interoperability and usability, and are underutilized in scientific software development.
PMCID: PMC2684128
phylogenetics; phyloinformatics; open source software; analysis workflow