Polyadenylation is present in all three domains of life, making it more widely conserved than the other major post-transcriptional processes, splicing and 5'-capping. Although most mammalian poly(A) sites contain a highly conserved hexanucleotide upstream and a far less conserved U/GU-rich sequence downstream, there are many exceptions. Furthermore, poly(A) sites in other species, such as plants and invertebrates, deviate substantially from this arrangement, which makes constructing a general poly(A) site recognition model challenging. We surveyed nine poly(A) site prediction methods published between 1999 and 2011. All of them use the skewed nucleotide profile across poly(A) sites and the highly conserved poly(A) signal as their primary recognition features. These methods typically use a large number of features, which inflates model dimensionality to unwieldy levels, and they are typically not validated against many kinds of genomes.
We propose a poly(A) site model that uses minimal features to capture the essence of poly(A) sites, yet produces better prediction accuracy across diverse species. Our model consists of three di- or trinucleotide profiles identified through principal component analysis, together with the predicted nucleosome occupancy flanking the poly(A) sites. We validated the model with two machine learning methods: logistic regression and linear discriminant analysis. The resulting models achieve 85-92% sensitivity and 85-96% specificity in seven animals and plants. When we applied a model trained on one species to predict poly(A) sites in other species, the sensitivity scores correlated with phylogenetic distance.
A four-feature model geared towards small motifs was sufficient to accurately learn and predict poly(A) sites across eukaryotes.
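The features above are short nucleotide profiles scored by standard classifiers and evaluated by sensitivity and specificity. The following is a minimal illustrative sketch, not the authors' pipeline: it computes a di- or trinucleotide frequency profile for a sequence window and derives sensitivity and specificity from predicted versus known sites. The function names are hypothetical, and the PCA-based feature selection and nucleosome-occupancy prediction described above are not reproduced.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq: str, k: int) -> dict[str, float]:
    """Relative frequency of each DNA k-mer in seq, using overlapping windows.

    Hypothetical helper: windows containing non-ACGT characters (e.g. N) count
    towards the total but are not reported as a k-mer of their own.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    # Enumerate all 4**k k-mers so that profiles from different windows align.
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product("ACGT", repeat=k)
    }


def sensitivity_specificity(y_true: list[bool], y_pred: list[bool]) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


# Dinucleotide profile of the canonical AATAAA signal, plus a toy evaluation.
profile = kmer_profile("AATAAA", 2)
print(profile["AA"])  # 0.6 (3 of 5 overlapping windows)
print(sensitivity_specificity([True, True, False, False], [True, False, False, False]))  # (0.5, 1.0)
```

Each profile becomes one row of numeric features, which is the input shape that logistic regression and linear discriminant analysis expect.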
Viral codon usage bias may be the product of a number of synergistic or antagonistic factors, including genomic nucleotide composition, translational selection, genomic architecture, and mutational or repair biases. Most studies of viral codon bias evaluate only the relative importance of genomic base composition and translational selection, ignoring other possible factors. We analyzed the codon preferences of ssRNA (luteoviruses and potyviruses) and ssDNA (geminiviruses) plant viruses that infect translationally distinct monocot and dicot hosts. We found that neither genomic base composition nor translational selection satisfactorily explains their codon usage biases. Furthermore, we observed a strong relationship between the codon preferences of viruses in the same family or genus, regardless of host or genomic nucleotide content. Our results suggest that analyzing codon bias as either due to base composition or translational selection is a false dichotomy that obscures the role of other factors. Constraints such as genomic architecture and secondary structure can and do influence codon usage in plant viruses, and likely in viruses of other hosts.
synonymous codon usage; translational selection; genomic content; mutational bias
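Relative synonymous codon usage (RSCU) is one widely used way to quantify the kind of codon preferences compared above. The abstract does not say which index the authors used, so treat this as an illustrative sketch. RSCU divides the observed count of a codon by the count expected if all synonymous codons for that amino acid were used equally, so values above 1 mark preferred codons. The two-family codon table and the function names are assumptions made for brevity.

```python
from collections import Counter

# Two synonymous codon families from the standard genetic code (assumed subset);
# a full analysis would cover every amino acid encoded by more than one codon.
SYNONYMOUS_FAMILIES = {
    "Phe": ("TTT", "TTC"),
    "Gly": ("GGT", "GGC", "GGA", "GGG"),
}


def split_codons(cds: str) -> list[str]:
    """Split a coding sequence into complete in-frame codons."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def rscu(cds: str, families=SYNONYMOUS_FAMILIES) -> dict[str, float]:
    """Relative synonymous codon usage: observed count divided by the count
    expected if all synonymous codons were used equally (1.0 = no preference)."""
    counts = Counter(split_codons(cds))
    values = {}
    for codons in families.values():
        n_aa = sum(counts[c] for c in codons)
        if n_aa == 0:
            continue  # amino acid absent, so RSCU is undefined for its codons
        expected = n_aa / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


print(rscu("TTTTTTTTCGGT"))
```

Comparing such per-codon values between a virus and its host, or between related viruses, is one way to test whether codon preferences track host, family, or base composition.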
Grapevine leafroll-associated viruses are a problem for grape production globally. Symptoms are caused by a number of distinct viral species. During a survey of Napa Valley vineyards (California, USA), we found evidence of a new variant of Grapevine leafroll-associated virus 3 (GLRaV-3). We isolated its genome from a symptomatic greenhouse-raised plant and fully sequenced it.
In a maximum likelihood analysis of representative GLRaV-3 gene sequences, the isolate grouped most closely with a recently sequenced variant from South Africa and a partial sequence from New Zealand. These highly divergent GLRaV-3 variants have predicted proteins that are more than 10% divergent from other GLRaV-3 variants, and appear to be missing an open reading frame for the p6 protein.
This divergent GLRaV-3 phylogroup is already present in grape-growing regions worldwide and is capable of causing symptoms of leafroll disease without the p6 protein.
Ampelovirus; Wine; Mealybug
Viruses are exceedingly diverse in the strategies they have evolved to manipulate hosts for viral replication. However, despite these differences, most virus populations occasionally face two common challenges: growth in variable host environments and growth under fluctuating population sizes. We used the segmented RNA bacteriophage ϕ6 as a model for studying the evolutionary genomics of virus adaptation in the face of host switches and parametrically varying population sizes. To do so, we created a bifurcating deme structure that reflected lineage splitting in natural populations, allowing us to test whether phylogenetic algorithms could accurately resolve this ‘known phylogeny’. The resulting tree yielded 32 clones at the tips and internal nodes; these strains were fully sequenced and measured for phenotypic changes in selected traits (fitness on original and novel hosts).
We observed that RNA segment size was negatively correlated with the extent of molecular change in the imposed treatments; molecular substitutions tended to cluster on the Small and Medium RNA chromosomes of the virus, and not on the Large segment. Our study yielded a very large molecular and phenotypic dataset, enabling tentative inferences about genotype-phenotype associations. Using further experimental evolution, we confirmed one such inference: the unanticipated role of an allelic switch in a viral assembly protein, which governed viral performance across host environments.
Our study demonstrated that several sources of complexity can be incorporated simultaneously into experimental evolution to examine the combined effects of population size and adaptation to novel environments. The imposed bifurcating structure revealed that some methods for phylogenetic reconstruction failed to resolve the true phylogeny, owing to a paucity of molecular substitutions separating the RNA viruses that evolved in our study.
Adaptation; Bacteria; Bacteriophage; Experimental evolution; Known phylogeny; Pseudomonas; Virus
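A ‘known phylogeny’ design makes it possible to score reconstructed trees against the true one. One common score, not necessarily the one used in this study, is the Robinson-Foulds distance: the number of clades found in one tree but not the other. The sketch below implements the rooted, clade-based variant for trees written as nested tuples of leaf names. The encoding, the function names, and the eight-leaf example are hypothetical, and the toy encoding cannot represent the sampled internal-node clones the study included.

```python
def clusters(tree):
    """Return (all leaves, set of non-root clades) for a rooted tree written as
    nested tuples of leaf-name strings, e.g. (("A", "B"), ("C", "D"))."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])  # single leaves are trivial, not recorded
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree
    return all_leaves, found


def robinson_foulds(t1, t2) -> int:
    """Count clades present in one rooted tree but not in the other."""
    leaves1, c1 = clusters(t1)
    leaves2, c2 = clusters(t2)
    if leaves1 != leaves2:
        raise ValueError("trees must share the same leaf set")
    return len(c1 ^ c2)


known = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
inferred = ((("A", "C"), ("B", "D")), (("E", "F"), ("G", "H")))
print(robinson_foulds(known, known))     # 0
print(robinson_foulds(known, inferred))  # 4
```

A distance of 0 means the reconstruction recovered every imposed split; when few substitutions separate lineages, methods may instead return wrong or poorly resolved clades, which raises the score.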
The literature is rife with phylogenetic estimates of nucleotide substitution rates, especially for measurably evolving species such as RNA viruses. However, it is not known how robust these estimates are to inaccuracies in the data, particularly in the sampling dates used for molecular clock calibration. Here we report on the rate of evolution of the emerging pathogen Rabbit hemorrhagic disease virus (RHDV), for which significantly different rates of evolution have been published for the same outer capsid (VP60) gene. To reconcile the conflicting data and further elucidate details of RHDV’s evolutionary history, we undertook fresh Bayesian analyses and employed jackknife control methods to produce robust estimates of the substitution rate and time to most recent common ancestor (TMRCA) for RHDV based on the VP60 and RNA-dependent RNA polymerase genes.
Through these control methods, we were able to identify a single misdated taxon, a passaged lab strain used for vaccine production, which was responsible for depressing the RHDV capsid gene’s rate of evolution by 65%. Without this isolate, the polymerase and the capsid protein genes had nearly identical rates of evolution: 1.90 × 10⁻³ nucleotide substitutions/site/year (ns/s/y; 95% highest probability density (HPD) 1.25 × 10⁻³ to 2.55 × 10⁻³) and 1.91 × 10⁻³ ns/s/y (95% HPD 1.50 × 10⁻³ to 2.34 × 10⁻³), respectively.
After excluding the misdated taxon, both genes support a significantly higher substitution rate and a relatively recent emergence of RHDV, obviating the need for the previously hypothesized decades of unobserved diversification of the virus. The control methods show that including even one misdated taxon in a large dataset can significantly skew estimates of evolutionary parameters, and suggest that it is better practice to use smaller datasets composed of taxa with unequivocal isolation dates. These jackknife controls would be useful for future tip-calibrated rate analyses that include taxa with ambiguous dates of isolation.
RHDV; Substitution rate; Tip-calibrated; BEAST; Misdated taxon
Picornaviruses have some of the highest nucleotide substitution rates among viruses, but there have been no comparisons of evolutionary rates within this broad family. We combined our own Bayesian coalescent analyses of VP1 regions from four picornaviruses with 22 published VP1 rates to produce the first within-family meta-analysis of viral evolutionary rates. Similarly, we compared our rate estimates for the RNA polymerase 3Dpol gene from five viruses to four published 3Dpol rates. Both a structural and a nonstructural gene show that enteroviruses are evolving, on average, a half order of magnitude faster than members of other genera within the Picornaviridae family.
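To translate the headline comparison into a ratio: a difference of half an order of magnitude on a base-10 scale corresponds to roughly a threefold difference in mean rates. Here $\bar{r}$ denotes a mean substitution rate, a symbol introduced only for this illustration.

```latex
\log_{10}\bar{r}_{\text{entero}} - \log_{10}\bar{r}_{\text{other}} \approx 0.5
\quad\Longrightarrow\quad
\frac{\bar{r}_{\text{entero}}}{\bar{r}_{\text{other}}} \approx 10^{0.5} \approx 3.16
```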
Grapevine leafroll disease (GLD) is caused by a complex of several virus species (grapevine leafroll-associated viruses, GLRaV) in the family Closteroviridae. Because of its increasing importance, it is critical to determine which species of GLRaV is predominant in each region where this disease is occurring. A structured sampling design, utilizing a combination of RT-PCR-based testing and sequencing methods, was used to survey GLRaVs in Napa Valley (California, USA) vineyards (n = 36). Of the 216 samples tested for GLRaV-1, -2, -3, -4, -5, and -9, 62% (n = 134) were GLRaV positive. Of the positives, 81% (n = 109) were single infections with GLRaV-3, followed by GLRaV-2 (4%, n = 5), while the remaining samples (15%, n = 20) were mixed infections of GLRaV-3 with GLRaV-1, -2, -4, or -9. Additionally, 468 samples were tested for genetic variants of GLRaV-3, and of the 65% (n = 306) of samples positive for GLRaV-3, 22% were infected with multiple GLRaV-3 variants. Phylogenetic analysis utilizing sequence data from the single-infection GLRaV-3 samples produced seven well-supported GLRaV-3 variants, of which three represented 71% of all GLRaV-3 positive samples in Napa Valley. Furthermore, two novel variants, which grouped with a divergent isolate from New Zealand (NZ-1), were identified, and these variants comprised 6% of all positive GLRaV-3 samples. Spatial analyses showed that GLRaV-3a, 3b, and 3c were not homogeneously distributed across Napa Valley. Overall, 86% of all blocks (n = 31) were positive for GLRaVs and 90% of positive blocks (n = 28) had two or more GLRaV-3 variants, suggesting complex disease dynamics that might include multiple insect-mediated introduction events.
Viral codon usage is shaped by the conflicting forces of mutational pressure and selection to match host codon usage patterns for optimal expression. We examined whether genomic architecture (single- or double-stranded DNA) influences the degree to which bacteriophage codon usage differs from that of their primary bacterial hosts and from each other. Although both groups correlated equally with their hosts’ genomic nucleotide content, the coat genes of ssDNA phages were less well adapted than those of dsDNA phages to their hosts’ codon usage profiles, owing to their preference for codons ending in thymine. No specific biases were detected in dsDNA phage genomes. In all nine of the ten cases of codon redundancy in which a specific codon was overrepresented, ssDNA phages favored the NNT codon. A cytosine-to-thymine mutational bias, working in conjunction with strong selection against non-synonymous mutations, appears to be shaping codon usage bias in ssDNA viral genomes.
bacteriophage; codon usage bias; evolution; genome; genomic adaptation; genomic architecture; single-stranded DNA
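A simple way to see a preference for thymine-ending codons is to tabulate nucleotide composition at the third codon position and compare a phage gene against its host. The following is a minimal sketch under that assumption; it does not reproduce the study's comparison of codon usage profiles, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def third_position_composition(cds: str) -> dict[str, float]:
    """Fraction of complete in-frame codons ending in each nucleotide."""
    cds = cds.upper()
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    if not thirds:
        raise ValueError("sequence shorter than one codon")
    counts = Counter(thirds)
    return {base: counts[base] / len(thirds) for base in "ACGT"}


def t3_excess(phage_cds: str, host_cds: str) -> float:
    """How much more often phage codons end in T than host codons do."""
    return (third_position_composition(phage_cds)["T"]
            - third_position_composition(host_cds)["T"])


print(third_position_composition("ATGTTTGGT"))  # T = 2/3, G = 1/3, A and C = 0
```

A positive `t3_excess` that persists across many genes would be consistent with the NNT preference described above, though on its own it cannot separate mutational bias from selection.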
Short-form publications such as Plant Disease reports serve essential functions: rapid dissemination of information on the geography of established plant pathogens, on the incidence and symptomology of pathogens in new hosts, and on the discovery of novel pathogens. Many of these sentinel publications include viral sequence data, but most use that information only to confirm the identity of the virus species. When researchers use the standard technique of per cent nucleotide identity to show that a new sequence is closely related to another sequence, potentially erroneous conclusions can be drawn. Multiple introductions of the same pathogen into a country are being overlooked because researchers know that fast-evolving plant viruses can accumulate substantial sequence divergence over time, even from a single introduction. Greater use of phylogenetic methods in short-form publications could speed our understanding of these cryptic second introductions and aid in the control of epidemics.
BLAST; phylogeny; biogeography; molecular epidemiology; per cent nucleotide identity
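Per cent nucleotide identity reduces a comparison to a single pairwise number. The sketch below shows how that number is computed for two sequences that are already aligned. It assumes gap-containing columns are excluded from the denominator; tools differ on this convention, and the function name is hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Per cent identity of two pre-aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


print(percent_identity("ACGT-A", "ACGA-A"))  # 80.0
```

Because the score summarizes one pair of sequences, it cannot by itself distinguish a second introduction from divergence accumulated after a single introduction. Making that distinction requires placing the new sequence in a phylogeny with other isolates.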
Current knowledge of plant virus diversity is biased towards agents of visible and economically important diseases. Less is known about viruses that have not caused major diseases in crops, or viruses from native vegetation, which are a reservoir of biodiversity that can contribute to viral emergence. Discovery of these plant viruses is hindered by the traditional approach of sampling individual symptomatic plants. Since many damaging plant viruses are transmitted by insect vectors, we have developed “vector-enabled metagenomics” (VEM) to investigate the diversity of plant viruses. VEM involves sampling of insect vectors (in this case, whiteflies) from plants, followed by purification of viral particles and metagenomic sequencing. The VEM approach exploits the natural ability of highly mobile adult whiteflies to integrate viruses from many plants over time and space, and leverages the capability of metagenomics for discovering novel viruses. This study utilized VEM to describe the DNA viral community from whiteflies (Bemisia tabaci) collected from two important agricultural regions in Florida, USA. VEM successfully characterized the active and abundant viruses that produce disease symptoms in crops, as well as the less abundant viruses infecting adjacent native vegetation. PCR assays designed from the metagenomic sequences enabled the complete sequencing of four novel begomovirus genome components, as well as the first discovery of plant virus satellites in North America. One of the novel begomoviruses was subsequently identified in symptomatic Chenopodium ambrosioides from the same field site, validating VEM as an effective method for proactive monitoring of plant viruses without a priori knowledge of the pathogens. This study demonstrates the power of VEM for describing the circulating viral community in a given region, which will enhance our understanding of plant viral diversity, and facilitate emerging plant virus surveillance and management of viral diseases.
Porcine circovirus 2 (PCV2) is the primary etiological agent of postweaning multisystemic wasting syndrome (PMWS), one of the most economically important emerging swine diseases worldwide. Virulent PCV2 was first identified following nearly simultaneous outbreaks of PMWS in North America and Europe in the 1990s and has since achieved global distribution. However, the processes responsible for the emergence and spread of PCV2 remain poorly understood. Here, phylogenetic and cophylogenetic inferences were utilized to address key questions on the time scale, processes, and geographic diffusion of emerging PCV2. The results of these analyses suggest that the two genotypes of PCV2 (PCV2a and PCV2b) are likely to have emerged from a common ancestor approximately 100 years ago and have been on independent evolutionary trajectories since that time, despite cocirculating in the same host species and geographic regions. The patterns of geographic movement of PCV2 that we recovered appear to mimic those of the global pig trade and suggest that the movement of asymptomatic animals is likely to have facilitated the rapid spread of virulent PCV2 around the globe. We further estimated the rate of nucleotide substitution for PCV2 to be on the order of 1.2 × 10⁻³ substitutions/site/year, the highest yet recorded for a single-stranded DNA virus. This high rate of evolution may allow PCV2 to maintain evolutionary dynamics closer to those of single-stranded RNA viruses than to those of double-stranded DNA viruses, further facilitating the rapid emergence of PCV2 worldwide.
Maize streak virus (MSV), which causes maize streak disease (MSD), is one of the most serious biotic threats to African food security. Here, we use whole MSV genomes sampled over 30 years to estimate the dates of key evolutionary events in the 500-year association of MSV and maize. The substitution rates implied by our analyses agree closely with those estimated previously in controlled MSV evolution experiments, and we use them to infer the date when the maize-adapted strain, MSV-A, was generated by recombination between two grass-adapted MSV strains. Our results indicate that this recombination event occurred in the mid-1800s, ∼20 years before the first credible reports of MSD in South Africa and centuries after the introduction of maize to the continent in the early 1500s. This suggests a causal link between MSV recombination and the emergence of MSV-A as a serious pathogen of maize.
Despite the demonstration that geminiviruses, like many other single-stranded DNA viruses, are evolving at rates similar to those of RNA viruses, a recent study has suggested that grass-infecting species in the genus Mastrevirus may have co-diverged with their hosts over millions of years. This "co-divergence hypothesis" requires that long-term mastrevirus substitution rates be at least 100,000-fold lower than their basal mutation rates and 10,000-fold lower than their observable short-term substitution rates. The credibility of this hypothesis, therefore, hinges on the testable claim that negative selection during mastrevirus evolution is so potent that it effectively purges 99.999% of all mutations that occur.
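The 99.999% figure follows from a standard neutral-theory relation. For neutral mutations, the long-term substitution rate $k$ equals the mutation rate $\mu$. If, as a simplification, purifying selection lets only a fraction $f$ of mutations reach fixation and the rest essentially never fix, then $k \approx f\mu$. Substituting the 100,000-fold ratio required by the co-divergence hypothesis gives:

```latex
f \approx \frac{k}{\mu} \le \frac{1}{100\,000} = 10^{-5}
\quad\Longrightarrow\quad
1 - f \ge 1 - 10^{-5} = 0.99999 = 99.999\%
```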
We conducted long-term evolution experiments lasting between 6 and 32 years, in which we determined substitution rates of between 2 × 10⁻⁴ and 3 × 10⁻⁴ substitutions/site/year for the mastreviruses Maize streak virus (MSV) and Sugarcane streak Réunion virus (SSRV). We further show that mutation biases are similar across different geminivirus genera, suggesting that the mutational processes driving high basal mutation rates are conserved across the family. Rather than displaying signs of extremely severe negative selection, as implied by the co-divergence hypothesis, our evolution experiments indicate that MSV and SSRV are predominantly evolving under neutral genetic drift.
The absence of strong negative selection signals within our evolution experiments and the uniformly high geminivirus substitution rates that we and others have reported suggest that mastreviruses cannot have co-diverged with their hosts.
Geminiviruses are devastating viruses of plants that possess single-stranded DNA (ssDNA) genomes. Despite the importance of this class of phytopathogen, there have been no estimates of the rate of nucleotide substitution in the geminiviruses. We report here the evolutionary rate of the tomato yellow leaf curl disease-causing viruses, an intensively studied group of monopartite begomoviruses. Sequences from GenBank, isolated from diseased plants between 1988 and 2006, were analyzed using Bayesian coalescent methods. The mean genomic substitution rate was estimated to be 2.88 × 10⁻⁴ nucleotide substitutions per site per year (subs/site/year), although this rate could be confounded by frequent recombination within Tomato yellow leaf curl virus genomes. A recombinant-free data set comprising the coat protein (V1) gene in isolation yielded a similar mean rate (4.63 × 10⁻⁴ subs/site/year), validating the order of magnitude of the genomic substitution rate for protein-coding regions. The intergenic region, which is known to be more variable, was found to evolve even more rapidly, with a mean substitution rate of ∼1.56 × 10⁻³ subs/site/year. Notably, these substitution rates, the first reported for a plant DNA virus, are in line with those estimated previously for mammalian ssDNA viruses and RNA viruses. Our results therefore suggest that the high evolutionary rate of the geminiviruses is not primarily due to frequent recombination and may explain their ability to emerge in novel hosts.
A phylogenetic analysis of three genomic regions revealed that Tomato yellow leaf curl virus (TYLCV) from western North America is distinct from TYLCV isolated in eastern North America and the Caribbean. This analysis supports a second introduction of this Old World begomovirus into the New World, most likely from Asia.
Mathematical models were developed to predict the probability of yeast spoilage of cold-filled ready-to-drink beverages as a function of beverage formulation. A Box-Behnken experimental design included five variables, each at three levels: pH (2.8, 3.3, and 3.8), titratable acidity (0.20, 0.40, and 0.60%), sugar content (8.0, 12.0, and 16.0 °Brix), sodium benzoate concentration (100, 225, and 350 ppm), and potassium sorbate concentration (100, 225, and 350 ppm). Duplicate samples were inoculated with a yeast cocktail (100 μl/50 ml) consisting of equal proportions of Saccharomyces cerevisiae, Zygosaccharomyces bailii, and Candida lipolytica (∼5.0 × 10⁴ CFU/ml each). The inoculated samples were plated on malt extract agar after 0, 1, 2, 4, 6, and 8 weeks. Logistic regression was used to create the predictive models. The pH and the concentrations of sodium benzoate and potassium sorbate were found to be significant factors controlling the probability of yeast growth. Interaction terms for pH and each preservative were also significant in the predictive model. Neither the titratable acidity nor the sugar content of the model beverages was a significant predictor of yeast growth in the ranges tested.
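A Box-Behnken design like the one above can be enumerated in coded units (-1, 0, +1) and then mapped to the real factor levels listed in the text. The following is a minimal sketch that assumes the textbook pairwise construction used for three to five factors and six center points. The abstract does not state the number of center points or the run order, and the function names are hypothetical.

```python
from itertools import combinations, product

# Low, center and high levels for each factor, as listed above.
FACTORS = {
    "pH": (2.8, 3.3, 3.8),
    "titratable_acidity_pct": (0.20, 0.40, 0.60),
    "sugar_brix": (8.0, 12.0, 16.0),
    "sodium_benzoate_ppm": (100, 225, 350),
    "potassium_sorbate_ppm": (100, 225, 350),
}


def box_behnken_coded(k: int, n_center: int = 6) -> list[list[int]]:
    """Coded Box-Behnken runs for k = 3 to 5 factors: every pair of factors at
    (+/-1, +/-1) with the rest at 0, followed by n_center center points."""
    if not 3 <= k <= 5:
        raise ValueError("pairwise construction shown here covers 3 to 5 factors")
    runs = []
    for i, j in combinations(range(k), 2):
        for a, b in product((-1, 1), repeat=2):
            row = [0] * k
            row[i], row[j] = a, b
            runs.append(row)
    runs.extend([0] * k for _ in range(n_center))
    return runs


def decode(row: list[int], factors: dict = FACTORS) -> dict[str, float]:
    """Map coded levels -1/0/+1 to the real factor settings."""
    return {name: levels[code + 1] for (name, levels), code in zip(factors.items(), row)}


design = box_behnken_coded(len(FACTORS))
print(len(design))  # 40 edge runs + 6 center runs = 46
print(decode(design[0]))
```

Each decoded run is one beverage formulation. In the study, each formulation was prepared in duplicate, and its growth or no-growth outcome is what the logistic models are fitted to.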
Probabilistic models were used as a systematic approach to describe the response of Escherichia coli O157:H7 populations to combinations of commonly used preservation methods in unpasteurized apple cider. Using a complete factorial experimental design, the effect of pH (3.1 to 4.3), storage temperature and time (5 to 35°C for 0 to 6 h or 12 h), preservatives (0, 0.05, or 0.1% potassium sorbate or sodium benzoate), and freeze-thaw (F-T; −20°C, 48 h and 4°C, 4 h) treatment combinations (a total of 1,600 treatments) on the probability of achieving a 5-log10-unit reduction in a three-strain E. coli O157:H7 mixture in cider was determined. Using logistic regression techniques, pH, temperature, time, and concentration were modeled in separate segments of the data set, resulting in prediction equations for: (i) no preservatives, before F-T; (ii) no preservatives, after F-T; (iii) sorbate, before F-T; (iv) sorbate, after F-T; (v) benzoate, before F-T; and (vi) benzoate, after F-T. Statistical analysis revealed a highly significant (P < 0.0001) effect of all four variables, with cider pH being the most important, followed by temperature and time, and finally by preservative concentration. All models predicted 92 to 99% of the responses correctly. To ensure safety, use of the models is most appropriate at a 0.9 probability level, where the percentage of false positives, i.e., falsely predicting a 5-log10-unit reduction, is the lowest (0 to 4.4%). The present study demonstrates the applicability of logistic regression approaches to describing the effectiveness of multiple treatment combinations in pathogen control in cider making. The resulting models can serve as valuable tools in designing safe apple cider processes.
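The decision rule above, which predicts a 5-log10 reduction only when the model probability reaches 0.9, can be sketched with the standard logistic form. The fitted equations are not given in the abstract, so the coefficients below are hypothetical placeholders. The three-predictor layout also ignores the separate preservative and freeze-thaw segments, and the function names are hypothetical.

```python
import math


def log10_reduction(n0_cfu: float, n_cfu: float) -> float:
    """Log10 reduction from an initial to a surviving population (CFU/ml)."""
    return math.log10(n0_cfu / n_cfu)


def predicted_probability(intercept: float, coefs: list[float], x: list[float]) -> float:
    """Logistic model: P(5-log reduction) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    # Evaluate the logistic function in a form that cannot overflow math.exp.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_5log(p: float, threshold: float = 0.9) -> bool:
    """Declare a 5-log reduction only when the model probability is >= threshold."""
    return p >= threshold


print(log10_reduction(1e7, 1e2))
# HYPOTHETICAL coefficients for (pH, temperature in C, time in h), not from the study.
p = predicted_probability(10.0, [-4.0, 0.2, 0.5], [3.3, 25.0, 6.0])
print(round(p, 3), predict_5log(p))
```

Raising the threshold from 0.5 to 0.9 makes the rule more conservative: fewer treatments are declared safe, which lowers the rate of falsely predicting a 5-log reduction at the cost of rejecting some treatments that would have worked.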