Related Articles

1.  Amplification of Uncultured Single-Stranded DNA Viruses from Rice Paddy Soil
Applied and Environmental Microbiology  2008;74(19):5975-5985.
Viruses are known to be the most numerous biological entities in soil; however, little is known about their diversity in this environment. In order to explore the genetic diversity of soil viruses, we isolated viruses by centrifugation and sequential filtration before performing a metagenomic investigation. We adopted multiple-displacement amplification (MDA), an isothermal whole-genome amplification method with φ29 polymerase and random hexamers, to amplify viral DNA and construct clone libraries for metagenome sequencing. By the MDA method, the diversity of both single-stranded DNA (ssDNA) viruses and double-stranded DNA viruses could be investigated at the same time. On the contrary, by eliminating the denaturing step in the MDA reaction, only ssDNA viral diversity could be explored selectively. Irrespective of the denaturing step, more than 60% of the soil metagenome sequences did not show significant hits (E-value criterion, 0.001) with previously reported viral sequences. Those hits that were considered to be significant were also distantly related to known ssDNA viruses (average amino acid similarity, approximately 34%). Phylogenetic analysis showed that replication-related proteins (which were the most frequently detected proteins) related to those of ssDNA viruses obtained from the metagenomic sequences were diverse and novel. Putative circular genome components of ssDNA viruses that are unrelated to known viruses were assembled from the metagenomic sequences. In conclusion, ssDNA viral diversity in soil is more complex than previously thought. Soil is therefore a rich pool of previously unknown ssDNA viruses.
PMCID: PMC2565953  PMID: 18708511
2.  Raw Sewage Harbors Diverse Viral Populations 
mBio  2011;2(5):e00180-11.
At this time, about 3,000 different viruses are recognized, but metagenomic studies suggest that these viruses are a small fraction of the viruses that exist in nature. We have explored viral diversity by deep sequencing nucleic acids obtained from virion populations enriched from raw sewage. We identified 234 known viruses, including 17 that infect humans. Plant, insect, and algal viruses as well as bacteriophages were also present. These viruses represented 26 taxonomic families and included viruses with single-stranded DNA (ssDNA), double-stranded DNA (dsDNA), positive-sense ssRNA [ssRNA(+)], and dsRNA genomes. Novel viruses that could be placed in specific taxa represented 51 different families, making untreated wastewater the most diverse viral metagenome (genetic material recovered directly from environmental samples) examined thus far. However, the vast majority of sequence reads bore little or no sequence relation to known viruses and thus could not be placed into specific taxa. These results show that the vast majority of the viruses on Earth have not yet been characterized. Untreated wastewater provides a rich matrix for identifying novel viruses and for studying virus diversity.
Importance
At this time, virology is focused on the study of a relatively small number of viral species. Specific viruses are studied either because they are easily propagated in the laboratory or because they are associated with disease. The lack of knowledge of the size and characteristics of the viral universe and the diversity of viral genomes is a roadblock to understanding important issues, such as the origin of emerging pathogens and the extent of gene exchange among viruses. Untreated wastewater is an ideal system for assessing viral diversity because virion populations from large numbers of individuals are deposited and because raw sewage itself provides a rich environment for the growth of diverse host species and thus their viruses. These studies suggest that the viral universe is far more vast and diverse than previously suspected.
PMCID: PMC3187576  PMID: 21972239
3.  Ecology and evolution of viruses infecting uncultivated SUP05 bacteria as revealed by single-cell- and meta-genomics 
eLife  2014;3:e03125.
Viruses modulate microbial communities and alter ecosystem functions. However, due to cultivation bottlenecks, specific virus–host interaction dynamics remain cryptic. In this study, we examined 127 single-cell amplified genomes (SAGs) from uncultivated SUP05 bacteria isolated from a model marine oxygen minimum zone (OMZ) to identify 69 viral contigs representing five new genera within dsDNA Caudovirales and ssDNA Microviridae. Infection frequencies suggest that ∼1/3 of SUP05 bacteria is viral-infected, with higher infection frequency where oxygen-deficiency was most severe. Observed Microviridae clonality suggests recovery of bloom-terminating viruses, while systematic co-infection between dsDNA and ssDNA viruses posits previously unrecognized cooperation modes. Analyses of 186 microbial and viral metagenomes revealed that SUP05 viruses persisted for years, but remained endemic to the OMZ. Finally, identification of virus-encoded dissimilatory sulfite reductase suggests SUP05 viruses reprogram their host's energy metabolism. Together, these results demonstrate closely coupled SUP05 virus–host co-evolutionary dynamics with the potential to modulate biogeochemical cycling in climate-critical and expanding OMZs.
eLife digest
Microorganisms help to drive a number of processes that recycle energy and nutrients, including elements such as carbon, nitrogen, and sulfur, around the Earth's ecosystems. Viruses that infect microbes can also affect these cycles by killing and breaking open microbial cells, or by reprogramming the cell's metabolism. However, as there are many different species of microbes and viruses —the vast majority of which cannot easily be grown in the laboratory— little is known about most virus–host interactions in natural ecosystems, especially in the oceans.
In the world's oceans, the concentration of oxygen dissolved in the water changes in different regions and at different depths. ‘Oxygen minimum zones’ occur globally throughout the oceans at depths of 200–1000 meters, and climate change is causing these zones to expand and intensify. Although a lack of oxygen is sometimes considered detrimental to living organisms, oxygen minimum zones appear to be rich with microbial life that is adapted to thrive under oxygen-starved conditions.
Sulfur-oxidizing bacteria are one of the most abundant groups of microbes in these oxygen minimum zones, and several of these bacteria are known to influence the recycling of chemical substances. Now, Roux et al. introduce a new method to identify viruses that infect the microbes in this environment, including those microbes that cannot be grown in the laboratory and which have previously remained largely unexplored.
The genomes of 127 individual bacterial cells —collected from an oxygen minimum zone in western Canada— were examined. Roux et al. estimate that about a third of the sulfur-oxidizing bacterial cells are infected by at least one virus, but often multiple viruses infected the same bacterium. Five new genera (groups of one or more species) of viruses were also discovered and found to infect these bacteria. Looking for these new viral sequences in the DNA of this oxygen minimum zone's microbial community revealed that these newly discovered viruses persist in this region over several years. It also revealed that these viruses appear to only be found within the oxygen minimum zone. Roux et al. uncovered that these viruses carry genes that could manipulate how an infected bacterium processes sulfur-containing compounds; this is similar to previous observations showing that other viruses also influence cellular process (such as photosynthesis) in infected bacteria. As such, these newly discovered viruses might also influence the recycling of chemical elements within oxygen minimum zones.
Together, Roux et al.'s findings provide an unprecedented look into a wild virus community using a method that can be generalized to uncover viruses in a data type that is quickly becoming more widespread: single cell genomes. This effort to understand virus–host interactions by looking in the genomes of individual cells now sets the stage for future efforts aimed to uncover the impact of viruses on bacteria in other environments across the globe.
PMCID: PMC4164917  PMID: 25171894
SUP05; bacteriophages; viruses; single cell genomics; oxygen minimum zone; viral dark matter; other
4.  Patterns of Evolution and Host Gene Mimicry in Influenza and Other RNA Viruses 
PLoS Pathogens  2008;4(6):e1000079.
It is well known that the dinucleotide CpG is under-represented in the genomic DNA of many vertebrates. This is commonly thought to be due to the methylation of cytosine residues in this dinucleotide and the corresponding high rate of deamination of 5-methylcytosine, which lowers the frequency of this dinucleotide in DNA. Surprisingly, many single-stranded RNA viruses that replicate in these vertebrate hosts also have a very low presence of CpG dinucleotides in their genomes. Viruses are obligate intracellular parasites and the evolution of a virus is inexorably linked to the nature and fate of its host. One therefore expects that virus and host genomes should have common features. In this work, we compare evolutionary patterns in the genomes of ssRNA viruses and their hosts. In particular, we have analyzed dinucleotide patterns and found that the same patterns are pervasively over- or under-represented in many RNA viruses and their hosts, suggesting that many RNA viruses evolve by mimicking some of the features of their host's genes (DNA) and likely also their corresponding mRNAs. When a virus crosses a species barrier into a different host, the pressure to replicate, survive, and adapt leaves a footprint in dinucleotide frequencies. For instance, since human genes seem to be under higher pressure to eliminate CpG dinucleotide motifs than avian genes, this pressure might be reflected in the genomes of human viruses (DNA and RNA viruses) when compared to those of the same viruses replicating in avian hosts. To test this idea, we have analyzed the evolution of the influenza virus since 1918. We find that the influenza A virus, which originated from an avian reservoir and has been replicating in humans over many generations, evolves in a direction strongly selected to reduce the frequency of CpG dinucleotides in its genome.
Consistent with this observation, we find that the influenza B virus, which has spent much more time in the human population, has adapted to its human host and exhibits an extremely low CpG dinucleotide content. We believe that these observations directly show that the evolution of RNA viral genomes can be shaped by pressures observed in the host genome. As a possible explanation, we suggest that the strong selection pressures acting on these RNA viruses are most likely related to the innate immune response and to nucleotide motifs in the host DNA and RNAs.
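The abstract does not state which statistic was used to score dinucleotide representation. As a minimal sketch, assuming the commonly used observed/expected odds ratio f(xy) / (f(x) f(y)), CpG representation in a single sequence could be estimated as below. The function name `dinucleotide_odds_ratio` is hypothetical and is not taken from the paper; values well below 1 indicate under-representation.

```python
from collections import Counter


def dinucleotide_odds_ratio(seq: str, dinuc: str = "CG") -> float:
    """Observed/expected ratio f(xy) / (f(x) * f(y)) for one dinucleotide.

    Illustrative only: the source abstract does not specify its exact
    statistic, so this odds ratio is an assumption.
    """
    # Treat RNA and DNA alike by mapping U to T.
    seq = seq.upper().replace("U", "T")
    dinuc = dinuc.upper().replace("U", "T")
    n = len(seq)
    if n < 2 or len(dinuc) != 2:
        raise ValueError("need a sequence of length >= 2 and a 2-letter motif")

    base_counts = Counter(seq)
    f_x = base_counts[dinuc[0]] / n
    f_y = base_counts[dinuc[1]] / n
    if f_x == 0 or f_y == 0:
        # The expected frequency is zero, so the ratio is undefined.
        return float("nan")

    # Count overlapping occurrences of the dinucleotide.
    pair_count = sum(1 for i in range(n - 1) if seq[i:i + 2] == dinuc)
    f_xy = pair_count / (n - 1)
    return f_xy / (f_x * f_y)


if __name__ == "__main__":
    # Toy sequences, not real viral genomes.
    print(dinucleotide_odds_ratio("CATGCATGCATG"))  # no CpG present
    print(dinucleotide_odds_ratio("ACGUACGU"))      # RNA input is accepted
```

Applied to dated influenza A segments, a ratio of this kind falling over time would correspond to the trend the abstract describes; real analyses typically also account for sequence composition and sampling effects.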
Author Summary
Viruses are obligate intracellular parasites that use different strategies to sequester host cell machinery and avoid the host immune system. In this paper we explore the genomes of viruses that encode their genetic information in single-stranded RNA, a different material than the one used by their hosts (double-stranded DNA). It is interesting to observe that these viruses share some of the host's characteristics. For instance, one of the most underrepresented motifs in the DNA of vertebrates is the dinucleotide CpG. This is commonly thought to be due to methylation and deamination of cytosine residues in this dinucleotide. Surprisingly, the same CpG suppression is observed in vertebrate RNA viruses but not in RNA phages. We show that RNA viruses present similar dinucleotide pressures as their host genes. We find that the influenza A virus, which originated from an avian reservoir and replicated in humans over many generations, evolves to reduce the frequency of CpG dinucleotides mimicking the human genes. Influenza B, which has been in humans longer, exhibits an extremely low CpG dinucleotide content. These observations suggest that the evolution of RNA viruses is shaped by pressures observed in the host genome.
PMCID: PMC2390760  PMID: 18535658
5.  The Characterization of RNA Viruses in Tropical Seawater Using Targeted PCR and Metagenomics 
mBio  2014;5(3):e01210-14.
Viruses have a profound influence on the ecology and evolution of plankton, but our understanding of the composition of the aquatic viral communities is still rudimentary. This is especially true of those viruses having RNA genomes. The limited data that have been published suggest that the RNA virioplankton is dominated by viruses with positive-sense, single-stranded (+ss) genomes that have features in common with those of eukaryote-infecting viruses in the order Picornavirales (picornavirads). In this study, we investigated the diversity of the RNA virus assemblages in tropical coastal seawater samples using targeted PCR and metagenomics. Amplification of RNA-dependent RNA polymerase (RdRp) genes from fractions of a buoyant density gradient suggested that the distribution of two major subclades of the marine picornavirads was largely congruent with the distribution of total virus-like RNA, a finding consistent with their proposed dominance. Analyses of the RdRp sequences in the library revealed the presence of many diverse phylotypes, most of which were related only distantly to those of cultivated viruses. Phylogenetic analysis suggests that there were hundreds of unique picornavirad-like phylotypes in one 35-liter sample that differed from one another by at least as much as the differences among currently recognized species. Assembly of the sequences in the metagenome resulted in the reconstruction of six essentially complete viral genomes that had features similar to viruses in the families Bacillarna-, Dicistro-, and Marnaviridae. Comparison of the tropical seawater metagenomes with those from other habitats suggests that +ssRNA viruses are generally the most common types of RNA viruses in aquatic environments, but biases in library preparation remain a possible explanation for this observation.
Marine plankton account for much of the photosynthesis and respiration on our planet, and they influence the cycling of carbon and the distribution of nutrients on a global scale. Despite the fundamental importance of viruses to plankton ecology and evolution, most of the viruses in the sea, and the identities of their hosts, are unknown. This report is one of very few that delves into the genetic diversity within RNA-containing viruses in the ocean. The data expand the known range of viral diversity and shed new light on the physical properties and genetic composition of RNA viruses in the ocean.
PMCID: PMC4068258  PMID: 24939887
6.  The Footprint of Genome Architecture in the Largest Genome Expansion in RNA Viruses 
PLoS Pathogens  2013;9(7):e1003500.
The small size of RNA virus genomes (2-to-32 kb) has been attributed to high mutation rates during replication, which is thought to lack proof-reading. This paradigm is being revisited owing to the discovery of a 3′-to-5′ exoribonuclease (ExoN) in nidoviruses, a monophyletic group of positive-stranded RNA viruses with a conserved genome architecture. ExoN, a homolog of canonical DNA proof-reading enzymes, is exclusively encoded by nidoviruses with genomes larger than 20 kb. All other known non-segmented RNA viruses have smaller genomes. Here we use evolutionary analyses to show that the two- to three-fold expansion of the nidovirus genome was accompanied by a large number of replacements in conserved proteins at a scale comparable to that in the Tree of Life. To unravel common evolutionary patterns in such genetically diverse viruses, we established the relation between genomic regions in nidoviruses in a sequence alignment-free manner. We exploited the conservation of the genome architecture to partition each genome into five non-overlapping regions: 5′ untranslated region (UTR), open reading frame (ORF) 1a, ORF1b, 3′ORFs (encompassing the 3′-proximal ORFs), and 3′ UTR. Each region was analyzed for its contribution to genome size change under different models. The non-linear model statistically outperformed the linear one and captured >92% of data variation. Accordingly, nidovirus genomes were concluded to have reached different points on an expansion trajectory dominated by consecutive increases of ORF1b, ORF1a, and 3′ORFs. Our findings indicate a unidirectional hierarchical relation between these genome regions, which are distinguished by their expression mechanism. In contrast, these regions cooperate bi-directionally on a functional level in the virus life cycle, in which they predominantly control genome replication, genome expression, and virus dissemination, respectively. 
Collectively, our findings suggest that genome architecture and the associated region-specific division of labor leave a footprint on genome expansion and may limit RNA genome size.
Author Summary
RNA viruses include many major pathogens. The adaptation of viruses to their hosts is facilitated by fast mutation and constrained by small genome sizes, which are both due to the extremely high error rate of viral polymerases. Using an innovative computational approach, we now provide evidence for additional forces that may control genome size and, consequently, affect virus adaptation to the host. We analyzed nidoviruses, a monophyletic group of viruses that populate the upper ∼60% of the RNA virus genome size scale. They evolved a conserved genomic architecture, and infect vertebrate and invertebrate species. Those nidoviruses that have the largest known RNA genomes uniquely encode a 3′-to-5′exoribonuclease, a homolog of canonical DNA proof-reading enzymes that improves their replication fidelity. We show that nidoviruses accumulated mutations on par with that observed in the Tree of Life for comparable protein datasets, although the time scale of nidovirus evolution remains unknown. Extant nidovirus genomes of different size reached particular points on a common trajectory of genome expansion. This trajectory may be shaped by the division of labor between open reading frames that predominantly control genome replication, genome expression, and virus dissemination, respectively. Ultimately, genomic architecture may determine the observed genome size limit in contemporary RNA viruses.
PMCID: PMC3715407  PMID: 23874204
7.  Previously unknown and highly divergent ssDNA viruses populate the oceans 
The ISME Journal  2013;7(11):2169-2177.
Single-stranded DNA (ssDNA) viruses are economically important pathogens of plants and animals, and are widespread in oceans; yet, the diversity and evolutionary relationships among marine ssDNA viruses remain largely unknown. Here we present the results from a metagenomic study of composite samples from temperate (Saanich Inlet, 11 samples; Strait of Georgia, 85 samples) and subtropical (46 samples, Gulf of Mexico) seawater. Most sequences (84%) had no evident similarity to sequenced viruses. In total, 608 putative complete genomes of ssDNA viruses were assembled, almost doubling the number of ssDNA viral genomes in databases. These comprised 129 genetically distinct groups, each represented by at least one complete genome that had no recognizable similarity to each other or to other virus sequences. Given that the seven recognized families of ssDNA viruses have considerable sequence homology within them, this suggests that many of these genetic groups may represent new viral families. Moreover, nearly 70% of the sequences were similar to one of these genomes, indicating that most of the sequences could be assigned to a genetically distinct group. Most sequences fell within 11 well-defined gene groups, each sharing a common gene. Some of these encoded putative replication and coat proteins that had similarity to sequences from viruses infecting eukaryotes, suggesting that these were likely from viruses infecting eukaryotic phytoplankton and zooplankton.
PMCID: PMC3806263  PMID: 23842650
ssDNA viruses; microbial diversity; viral diversity
8.  Complex Recombination Patterns Arising during Geminivirus Coinfections Preserve and Demarcate Biologically Important Intra-Genome Interaction Networks 
PLoS Pathogens  2011;7(9):e1002203.
Genetic recombination is an important process during the evolution of many virus species and occurs particularly frequently amongst begomoviruses in the single-stranded DNA virus family, Geminiviridae. As in many other recombining viruses, it is apparent that non-random recombination breakpoint distributions observable within begomovirus genomes sampled from nature are the product of variations both in basal recombination rates across genomes and in the overall viability of different recombinant genomes. Whereas factors influencing basal recombination rates might include local degrees of sequence similarity between recombining genomes, nucleic acid secondary structures, and genomic sensitivity to nuclease attack or breakage, the viability of recombinant genomes could be influenced by the degree to which their co-evolved protein-protein, protein-nucleotide, and nucleotide-nucleotide interactions are disruptable by recombination. Here we investigate patterns of recombination that occur over 120-day-long experimental infections of tomato plants with the begomoviruses Tomato yellow leaf curl virus and Tomato leaf curl Comoros virus. We show that patterns of sequence exchange between these viruses can be extraordinarily complex and present clear evidence that factors such as local degrees of sequence similarity, but not genomic secondary structure, strongly influence where recombination breakpoints occur. It is also apparent from our experiment that overall patterns of recombination are strongly influenced by selection against individual recombinants displaying disrupted intra-genomic interactions such as those required for proper protein and nucleic acid folding. Crucially, we find that selection favoring the preservation of co-evolved longer-range protein-protein and protein-DNA interactions is so strong that its imprint can even be used to identify the exact sequence tracts involved in these interactions.
Author Summary
Genetic recombination between viruses is a form of parasexual reproduction during which two parental viruses each contribute genetic information to an offspring, or recombinant, virus. Unlike with sexual reproduction, however, recombination in viruses can even involve the transfer of sequences between the members of distantly related species. When parental genomes are very distantly related, it is anticipated that recombination between them runs the risk of producing defective offspring. The reason for this is that the interactions between different parts of genomes and the proteins they encode (such as between different viral proteins or between viral proteins and the virus genomic DNA or RNA) often depend on particular co-evolved binding sites that recognize one another. When in a recombinant genome the partners in a binding site pair are each inherited from different parents there is a possibility that they will not interact with one another properly. Here we examine recombinant genomes arising during experimental mixed infections of two distantly related viruses to detect evidence that intra-genome interaction networks are broadly preserved in these genomes. We show this preservation is so strict that patterns of recombination in these viruses can even be used to identify the interacting regions within their genomes.
PMCID: PMC3174254  PMID: 21949649
9.  Discovery of a Novel Single-Stranded DNA Virus from a Sea Turtle Fibropapilloma by Using Viral Metagenomics
Journal of Virology  2008;83(6):2500-2509.
Viral metagenomics, consisting of viral particle purification and shotgun sequencing, is a powerful technique for discovering viruses associated with diseases with no definitive etiology, viruses that share limited homology with known viruses, or viruses that are not culturable. Here we used viral metagenomics to examine viruses associated with sea turtle fibropapillomatosis (FP), a debilitating neoplastic disease affecting sea turtles worldwide. By means of purifying and shotgun sequencing the viral community directly from the fibropapilloma of a Florida green sea turtle, a novel single-stranded DNA virus, sea turtle tornovirus 1 (STTV1), was discovered. The single-stranded, circular genome of STTV1 was approximately 1,800 nucleotides in length. STTV1 has only weak amino acid level identities (25%) to chicken anemia virus in short regions of its genome; hence, STTV1 may represent the first member of a novel virus family. A total of 35 healthy turtles and 27 turtles with FP were tested for STTV1 using PCR, and only 2 turtles severely afflicted with FP were positive. The affected turtles were systemically infected with STTV1, since STTV1 was found in blood and all major organs. STTV1 exists as a quasispecies, with several genome variants identified in the fibropapilloma of each positive turtle, suggesting rapid evolution of this virus. The STTV1 variants were identical over the majority of their genomes but contained a hypervariable region with extensive divergence. This study demonstrates the potential of viral metagenomics for discovering novel viruses directly from animal tissue, which can enhance our understanding of viral evolution and diversity.
PMCID: PMC2648252  PMID: 19116258
10.  Conserved Region of Mammalian Retrovirus RNA 
Journal of Virology  1979;32(3):925-933.
The viral RNAs of various mammalian retroviruses contain highly conserved sequences close to their 3′ ends. This was demonstrated by interviral molecular hybridization between fractionated viral complementary DNA (cDNA) and RNA. cDNA near the 3′ end (cDNA3′) from a rat virus (RPL strain) was fractionated by size and mixed with mouse virus RNA (Rauscher leukemia virus). No hybridization occurred with total cDNA (cDNAtotal), in agreement with previous results, but a cross-reacting sequence was found with the fractionated cDNA3′. The sequences between 50 to 400 nucleotides from the 3′ terminus of heteropolymeric RNA were most hybridizable. The rat viral cDNA3′ hybridized with mouse virus RNA more extensively than with RNA of remotely related retroviruses. The related viral sequence of the rodent viruses (mouse and rat) showed as much divergence in heteroduplex thermal denaturation profiles as did the unique sequence DNA of these two rodents. This suggests that over a period of time, rodent viruses have preserved a sequence with changes correlated to phylogenetic distance of hosts. The cross-reacting sequence of replication-competent retroviruses was conserved even in the genome of the replication-defective sarcoma virus and was also located in these genomes near the 3′ end of 30S RNA. A fraction of RD114 cDNA3′, corresponding to the conserved region, cross-hybridized extensively with RNA of a baboon endogenous virus (M7). Fractions of similar size prepared from cDNA3′ of MPMV, a primate type D virus, hybridized with M7 RNA to a lesser extent. Hybridization was not observed between Mason-Pfizer monkey virus and M7 if total cDNA's were incubated with viral RNAs. The degree of cross-reaction of the shared sequence appeared to be influenced by viral ancestral relatedness and host cell phylogenetic relationships. 
Thus, the strikingly high extent of cross-reaction at the conserved region between rodent viruses and simian sarcoma virus and between baboon virus and RD114 virus may reflect ancestral relatedness of the viruses. Slight cross-reaction at the site between type B and C viruses of rodents (mouse mammary tumor virus and RPL virus, 58-2T) or type C and D viruses of primates (M7, RD114, and Mason-Pfizer monkey virus) may have arisen at the conserved region through a mechanism that depends more on the phylogenetic relatedness of the host cells than on the viral type or origin. Determining the sequence of the conserved region may help elucidate this mechanism. The conserved sequences in retroviruses described here may be an important functional unit for the life cycle of many retroviruses.
PMCID: PMC525941  PMID: 229272
11.  Metagenomic and whole-genome analysis reveals new lineages of gokushoviruses and biogeographic separation in the sea 
Much remains to be learned about single-stranded (ss) DNA viruses in natural systems, and the evolutionary relationships among them. One of the eight recognized families of ssDNA viruses is the Microviridae, a group of viruses infecting bacteria. In this study we used metagenomic analysis, genome assembly, and amplicon sequencing of purified ssDNA to show that bacteriophages belonging to the subfamily Gokushovirinae within the Microviridae are genetically diverse and widespread members of marine microbial communities. Metagenomic analysis of coastal samples from the Gulf of Mexico (GOM) and British Columbia, Canada, revealed numerous sequences belonging to gokushoviruses and allowed the assembly of five putative genomes with an organization similar to chlamydiamicroviruses. Fragment recruitment to these genomes from different metagenomic data sets is consistent with gokushovirus genotypes being restricted to specific oceanic regions. Conservation among the assembled genomes allowed the design of degenerate primers that target an 800 bp fragment from the gene encoding the major capsid protein. Sequences could be amplified from coastal temperate and subtropical waters, but not from samples collected from the Arctic Ocean, or freshwater lakes. Phylogenetic analysis revealed that most sequences were distantly related to those from cultured representatives. Moreover, the sequences fell into at least seven distinct evolutionary groups, most of which were represented by one of the assembled metagenomes. Our results greatly expand the known sequence space for gokushoviruses, and reveal biogeographic separation and new evolutionary lineages of gokushoviruses in the oceans.
PMCID: PMC3871881  PMID: 24399999
biogeography; ssDNA viruses; Microviridae; Gokushovirinae; virus diversity; ocean viruses
12.  The Fecal Virome of Pigs on a High-Density Farm
Journal of Virology  2011;85(22):11697-11708.
Swine are an important source of proteins worldwide but are subject to frequent viral outbreaks and to numerous infections by viruses capable of infecting humans. Modern farming conditions may also increase viral transmission and potential zoonotic spread. We describe here the metagenomics-derived virome in the feces of 24 healthy and 12 diarrheic piglets on a high-density farm. An average of 4.2 different mammalian viruses were shed by healthy piglets, reflecting a high level of asymptomatic infections. Diarrheic pigs shed an average of 5.4 different mammalian viruses. Ninety-nine percent of the viral sequences were related to the RNA virus families Picornaviridae, Astroviridae, Coronaviridae, and Caliciviridae, while 1% were related to the small DNA virus families Circoviridae and Parvoviridae. Porcine RNA viruses identified, in order of decreasing number of sequence reads, consisted of kobuviruses, astroviruses, enteroviruses, sapoviruses, sapeloviruses, coronaviruses, bocaviruses, and teschoviruses. The near-full genomes of multiple novel species of porcine astroviruses and bocaviruses were generated and phylogenetically analyzed. Multiple small circular DNA genomes encoding replicase proteins plus two highly divergent members of the Picornavirales order were also characterized. The possible origin of these viral genomes from pig-infecting protozoans and nematodes, based on closest sequence similarities, is discussed. In summary, an unbiased survey of viruses in the feces of intensely farmed animals revealed frequent coinfections with a highly diverse set of viruses, providing favorable conditions for viral recombination. Viral surveys of animals can readily document the circulation of known and new viruses, facilitating the detection of emerging viruses and prospective evaluation of their pathogenic and zoonotic potentials.
PMCID: PMC3209269  PMID: 21900163
13.  Unexpected Inheritance: Multiple Integrations of Ancient Bornavirus and Ebolavirus/Marburgvirus Sequences in Vertebrate Genomes 
PLoS Pathogens  2010;6(7):e1001030.
Vertebrate genomes contain numerous copies of retroviral sequences, acquired over the course of evolution. Until recently they were thought to be the only type of RNA viruses to be so represented, because integration of a DNA copy of their genome is required for their replication. In this study, an extensive sequence comparison was conducted in which 5,666 viral genes from all known non-retroviral families with single-stranded RNA genomes were matched against the germline genomes of 48 vertebrate species, to determine if such viruses could also contribute to the vertebrate genetic heritage. In 19 of the tested vertebrate species, we discovered as many as 80 high-confidence examples of genomic DNA sequences that appear to be derived, as long ago as 40 million years, from ancestral members of 4 currently circulating virus families with single strand RNA genomes. Surprisingly, almost all of the sequences are related to only two families in the Order Mononegavirales: the Bornaviruses and the Filoviruses, which cause lethal neurological disease and hemorrhagic fevers, respectively. Based on signature landmarks some, and perhaps all, of the endogenous virus-like DNA sequences appear to be LINE element-facilitated integrations derived from viral mRNAs. The integrations represent genes that encode viral nucleocapsid, RNA-dependent-RNA-polymerase, matrix and, possibly, glycoproteins. Integrations are generally limited to one or very few copies of a related viral gene per species, suggesting that once the initial germline integration was obtained (or selected), later integrations failed or provided little advantage to the host. 
The conservation of relatively long open reading frames for several of the endogenous sequences, the virus-like protein regions represented, and a potential correlation between their presence and a species' resistance to the diseases caused by these pathogens, are consistent with the notion that their products provide some important biological advantage to the species. In addition, the viruses could also benefit, as some resistant species (e.g. bats) may serve as natural reservoirs for their persistence and transmission. Given the stringent limitations imposed in this informatics search, the examples described here should be considered a low estimate of the number of such integration events that have persisted over evolutionary time scales. Clearly, the sources of genetic information in vertebrate genomes are much more diverse than previously suspected.
Author Summary
Vertebrate genomes contain numerous copies of retroviral sequences, acquired over the course of evolution. Until recently they were thought to be the only type of RNA viruses to be so represented. In this comprehensive study, we compared sequences representing all known non-retroviruses containing single stranded RNA genomes, with the genomes of 48 vertebrate species. We discovered that as long ago as 40 million years, almost half of these species acquired sequences related to the genes of certain of these RNA viruses. Surprisingly, almost all of the nearly 80 integrations identified are related to only two viral families, the Ebola/Marburgviruses and Bornaviruses, which are deadly pathogens that cause lethal hemorrhagic fevers and neurological disease, respectively. The conservation and expression of some of these endogenous sequences, and a potential correlation between their presence and a species' resistance to the diseases caused by the related viruses, suggest that they may afford an important selective advantage in these vertebrate populations. The related viruses could also benefit, as some resistant species may provide natural reservoirs for their persistence and transmission. This first comprehensive study of its kind demonstrates that the sources of genetic inheritance in vertebrate genomes are considerably more diverse than previously appreciated.
PMCID: PMC2912400  PMID: 20686665
14.  The Fecal Viral Flora of Wild Rodents 
PLoS Pathogens  2011;7(9):e1002218.
The frequent interactions of rodents with humans make them a common source of zoonotic infections. To obtain an initial unbiased measure of the viral diversity in the enteric tract of wild rodents we sequenced partially purified, randomly amplified viral RNA and DNA in the feces of 105 wild rodents (mouse, vole, and rat) collected in California and Virginia. We identified, in decreasing frequency, sequences related to the mammalian virus families Circoviridae, Picobirnaviridae, Picornaviridae, Astroviridae, Parvoviridae, Papillomaviridae, Adenoviridae, and Coronaviridae. Seventeen small circular DNA genomes containing one or two replicase genes distantly related to the Circoviridae representing several potentially new viral families were characterized. In the Picornaviridae family two new candidate genera as well as a close genetic relative of the human pathogen Aichi virus were characterized. Fragments of the first mouse sapelovirus and picobirnaviruses were identified and the first murine astrovirus genome was characterized. A mouse papillomavirus genome and fragments of a novel adenovirus and adenovirus-associated virus were also sequenced. The next largest fraction of the rodent fecal virome was related to insect viruses of the Densoviridae, Iridoviridae, Polydnaviridae, Dicistroviridae, Bromoviridae, and Virgaviridae families followed by plant virus-related sequences in the Nanoviridae, Geminiviridae, Phycodnaviridae, Secoviridae, Partitiviridae, Tymoviridae, Alphaflexiviridae, and Tombusviridae families reflecting the largely insect and plant rodent diet. Phylogenetic analyses of full and partial viral genomes therefore revealed many previously unreported viral species, genera, and families. The close genetic similarities noted between some rodent and human viruses might reflect past zoonoses. This study increases our understanding of the viral diversity in wild rodents and highlights the large number of still uncharacterized viruses in mammals.
Author Summary
Rodents are the natural reservoir of numerous zoonotic viruses causing serious diseases in humans. We used an unbiased metagenomic approach to characterize the viral diversity in rodent feces. In addition to diet-derived insect and plant viruses, mammalian viral sequences were abundant and diverse. Most notably, multiple new circular viral DNA families, two new Picornaviridae genera, and the first murine astrovirus and picobirnaviruses were characterized. A mouse kobuvirus was a close relative of the human pathogen Aichi virus. This study significantly increases the known genetic diversity of eukaryotic viruses in rodents and provides an initial description of their enteric viromes.
PMCID: PMC3164639  PMID: 21909269
15.  Endogenous Viral Elements in Animal Genomes 
PLoS Genetics  2010;6(11):e1001191.
Integration into the nuclear genome of germ line cells can lead to vertical inheritance of retroviral genes as host alleles. For other viruses, germ line integration has only rarely been documented. Nonetheless, we identified endogenous viral elements (EVEs) derived from ten non-retroviral families by systematic in silico screening of animal genomes, including the first endogenous representatives of double-stranded RNA, reverse-transcribing DNA, and segmented RNA viruses, and the first endogenous DNA viruses in mammalian genomes. Phylogenetic and genomic analysis of EVEs across multiple host species revealed novel information about the origin and evolution of diverse virus groups. Furthermore, several of the elements identified here encode intact open reading frames or are expressed as mRNA. For one element in the primate lineage, we provide statistically robust evidence for exaptation. Our findings establish that genetic material derived from all known viral genome types and replication strategies can enter the animal germ line, greatly broadening the scope of paleovirological studies and indicating a more significant evolutionary role for gene flow from virus to animal genomes than has previously been recognized.
Author Summary
The presence of retrovirus sequences in animal genomes has been recognized since the 1970s, but is readily explained by the fact that these viruses integrate into chromosomal DNA as part of their normal replication cycle. Unexpectedly, however, we identified a large and diverse population of sequences in animal genomes that are derived from non-retroviral viruses. Analysis of these sequences—which represent all known virus genome types and replication strategies—reveals new information about the evolutionary history of viruses, in many cases providing the first and only direct evidence for their ancient origins. Additionally, we provide evidence that the functionality of one of these sequences has been maintained in the host genome over many millions of years, raising the possibility that captured viral sequences may have played a larger than expected role in host evolution.
PMCID: PMC2987831  PMID: 21124940
16.  The ancient Virus World and evolution of cells 
Biology Direct  2006;1:29.
Recent advances in genomics of viruses and cellular life forms have greatly stimulated interest in the origins and evolution of viruses and, for the first time, offer an opportunity for a data-driven exploration of the deepest roots of viruses. Here we briefly review the current views of virus evolution and propose a new, coherent scenario that appears to be best compatible with comparative-genomic data and is naturally linked to models of cellular evolution that, from independent considerations, seem to be the most parsimonious among the existing ones.
Several genes coding for key proteins involved in viral replication and morphogenesis as well as the major capsid protein of icosahedral virions are shared by many groups of RNA and DNA viruses but are missing in cellular life forms. On the basis of this key observation and the data on extensive genetic exchange between diverse viruses, we propose the concept of the ancient virus world. The virus world is construed as a distinct contingent of viral genes that continuously retained its identity throughout the entire history of life. Under this concept, the principal lineages of viruses and related selfish agents emerged from the primordial pool of primitive genetic elements, the ancestors of both cellular and viral genes. Thus, notwithstanding the numerous gene exchanges and acquisitions attributed to later stages of evolution, most, if not all, modern viruses and other selfish agents are inferred to descend from elements that belonged to the primordial genetic pool. In this pool, RNA viruses would evolve first, followed by retroid elements, and DNA viruses. The Virus World concept is predicated on a model of early evolution whereby emergence of substantial genetic diversity antedates the advent of full-fledged cells, allowing for extensive gene mixing at this early stage of evolution. We outline a scenario of the origin of the main classes of viruses in conjunction with a specific model of precellular evolution under which the primordial gene pool dwelled in a network of inorganic compartments. Somewhat paradoxically, under this scenario, we surmise that selfish genetic elements ancestral to viruses evolved prior to typical cells, to become intracellular parasites once bacteria and archaea arrived at the scene. Selection against excessively aggressive parasites that would kill off the host ensembles of genetic elements would lead to early evolution of temperate virus-like agents and primitive defense mechanisms, possibly, based on the RNA interference principle. 
The emergence of the eukaryotic cell is construed as the second melting pot of virus evolution from which the major groups of eukaryotic viruses originated as a result of extensive recombination of genes from various bacteriophages, archaeal viruses, plasmids, and the evolving eukaryotic genomes. Again, this vision is predicated on a specific model of the emergence of eukaryotic cell under which archaeo-bacterial symbiosis was the starting point of eukaryogenesis, a scenario that appears to be best compatible with the data.
The existence of several genes that are central to virus replication and structure, are shared by a broad variety of viruses but are missing from cellular genomes (virus hallmark genes) suggests the model of an ancient virus world, a flow of virus-specific genes that went uninterrupted from the precellular stage of life's evolution to this day. This concept is tightly linked to two key conjectures on evolution of cells: existence of a complex, precellular, compartmentalized but extensively mixing and recombining pool of genes, and origin of the eukaryotic cell by archaeo-bacterial fusion. The virus world concept and these models of major transitions in the evolution of cells provide complementary pieces of an emerging coherent picture of life's history.
Reviewers: W. Ford Doolittle, J. Peter Gogarten, and Arcady Mushegian.
PMCID: PMC1594570  PMID: 16984643
17.  Temporal order of evolution of DNA replication systems inferred by comparison of cellular and viral DNA polymerases 
Biology Direct  2006;1:39.
The core enzymes of the DNA replication systems show striking diversity among cellular life forms and more so among viruses. In particular, and counter-intuitively, given the central role of DNA in all cells and the mechanistic uniformity of replication, the core enzymes of the replication systems of bacteria and archaea (as well as eukaryotes) are unrelated or extremely distantly related. Viruses and plasmids, in addition, possess at least two unique DNA replication systems, namely, the protein-primed and rolling circle modalities of replication. This unexpected diversity makes the origin and evolution of DNA replication systems a particularly challenging and intriguing problem in evolutionary biology.
I propose a specific succession for the emergence of different DNA replication systems, drawing argument from the differences in their representation among viruses and other selfish replicating elements. In a striking pattern, the DNA replication systems of viruses infecting bacteria and eukaryotes are dominated by the archaeal-type B-family DNA polymerase (PolB) whereas the bacterial replicative DNA polymerase (PolC) is present only in a handful of bacteriophage genomes. There is no apparent mechanistic impediment to the involvement of the bacterial-type replication machinery in viral DNA replication. Therefore, I hypothesize that the observed, markedly unequal distribution of the replicative DNA polymerases among the known cellular and viral replication systems has a historical explanation. I propose that, among the two types of DNA replication machineries that are found in extant life forms, the archaeal-type, PolB-based system evolved first and had already given rise to a variety of diverse viruses and other selfish elements before the advent of the bacterial, PolC-based machinery. Conceivably, at that stage of evolution, the niches for DNA-viral reproduction have been already filled with viruses replicating with the help of the archaeal system, and viruses with the bacterial system never took off. I further suggest that the two other systems of DNA replication, the rolling circle mechanism and the protein-primed mechanism, which are represented in diverse selfish elements, also evolved prior to the emergence of the bacterial replication system. This hypothesis is compatible with the distinct structural affinities of PolB, which has the palm-domain fold shared with reverse transcriptases and RNA-dependent RNA polymerases, and PolC that has a distinct, unrelated nucleotidyltransferase fold. 
I propose that PolB is a descendant of polymerases that were involved in the replication of genetic elements in the RNA-protein world, prior to the emergence of DNA replication. By contrast, PolC might have evolved from an ancient non-templated polymerase, e.g., polyA polymerase. The proposed temporal succession of the evolving DNA replication systems does not depend on the specific scenario adopted for the evolution of cells and viruses, i.e., whether viruses are derived from cells or virus-like elements are thought to originate from a primordial gene pool. However, arguments are presented in favor of the latter scenario as the most parsimonious explanation of the evolution of DNA replication systems.
Comparative analysis of the diversity of genomic strategies and organizations of viruses and cellular life forms has the potential to open windows into the deep past of life's evolution, especially, with the regard to the origin of genome replication systems. When complemented with information on the evolution of the relevant protein folds, this comparative approach can yield credible scenarios for very early steps of evolution that otherwise appear to be out of reach.
Reviewers: Eric Bapteste, Patrick Forterre, and Mark Ragan.
PMCID: PMC1766352  PMID: 17176463
18.  A bioinformatic analysis of ribonucleotide reductase genes in phage genomes and metagenomes 
Ribonucleotide reductase (RNR), the enzyme responsible for the formation of deoxyribonucleotides from ribonucleotides, is found in all domains of life and many viral genomes. RNRs are also amongst the most abundant genes identified in environmental metagenomes. This study focused on understanding the distribution, diversity, and evolution of RNRs in phages (viruses that infect bacteria). Hidden Markov Model profiles were used to analyze the proteins encoded by 685 completely sequenced double-stranded DNA phages and 22 environmental viral metagenomes to identify RNR homologs in cultured phages and uncultured viral communities, respectively.
RNRs were identified in 128 phage genomes, nearly tripling the number of phages known to encode RNRs. Class I RNR was the most common RNR class observed in phages (70%), followed by class II (29%) and class III (28%). Twenty-eight percent of the phages contained genes belonging to multiple RNR classes. RNR class distribution varied according to phage type, isolation environment, and the host’s ability to utilize oxygen. The majority of the phages containing RNRs are Myoviridae (65%), followed by Siphoviridae (30%) and Podoviridae (3%). The phylogeny and genomic organization of phage and host RNRs reveal several distinct evolutionary scenarios involving horizontal gene transfer, co-evolution, and differential selection pressure. Several putative split RNR genes interrupted by self-splicing introns or inteins were identified, providing further evidence for the role of frequent genetic exchange. Finally, viral metagenomic data indicate that RNRs are prevalent and highly dynamic in uncultured viral communities, necessitating future research to determine the environmental conditions under which RNRs provide a selective advantage.
This comprehensive study describes the distribution, diversity, and evolution of RNRs in phage genomes and environmental viral metagenomes. The distinct distributions of specific RNR classes amongst phages, combined with the various evolutionary scenarios predicted from RNR phylogenies suggest multiple inheritance sources and different selective forces for RNRs in phages. This study significantly improves our understanding of phage RNRs, providing insight into the diversity and evolution of this important auxiliary metabolic gene as well as the evolution of phages in response to their bacterial hosts and environments.
PMCID: PMC3653736  PMID: 23391036
Ribonucleotide reductase; Phage; Metagenome; Phage metadata; Phylogenetics; Evolution; Split gene
19.  Recombination in Enteroviruses Is a Biphasic Replicative Process Involving the Generation of Greater-than Genome Length ‘Imprecise’ Intermediates 
PLoS Pathogens  2014;10(6):e1004191.
Recombination in enteroviruses provides an evolutionary mechanism for acquiring extensive regions of novel sequence, is suggested to have a role in genotype diversity and is known to have been key to the emergence of novel neuropathogenic variants of poliovirus. Despite the importance of this evolutionary mechanism, the recombination process remains relatively poorly understood. We investigated heterologous recombination using a novel reverse genetic approach that resulted in the isolation of intermediate chimeric intertypic polioviruses bearing genomes with extensive duplicated sequences at the recombination junction. Serial passage of viruses exhibiting such imprecise junctions yielded progeny with increased fitness which had lost the duplicated sequences. Mutations or inhibitors that changed polymerase fidelity or the coalescence of replication complexes markedly altered the yield of recombinants (but did not influence non-replicative recombination) indicating both that the process is replicative and that it may be possible to enhance or reduce recombination-mediated viral evolution if required. We propose that extant recombinants result from a biphasic process in which an initial recombination event is followed by a process of resolution, deleting extraneous sequences and optimizing viral fitness. This process has implications for our wider understanding of ‘evolution by duplication’ in the positive-strand RNA viruses.
Author Summary
The rapid evolution of most positive-sense RNA viruses enables them to escape immune surveillance and adapt to new hosts. Genetic variation arises due to their error-prone RNA polymerases and by recombination of viral genomes in co-infected cells. We have developed a novel approach to analyse the poorly understood mechanism of recombination using a poliovirus model system. We characterised the initial viable recombinants and demonstrate the majority are longer than genome length due to an imprecise crossover event that duplicates part of the genome. These viruses are unfit, but rapidly lose the duplicated material and regain full fitness upon serial passage, a process we term resolution. We show this is a replicative recombination process by modifying the fidelity of the viral polymerase, or replication complex coalescence, using methods that have no influence on a previously reported, less efficient, non-replicative recombination mechanism. We conclude that recombination is a biphasic process involving separate generation and resolution events. These new insights into an important evolutionary mechanism have implications for our understanding of virus evolution through partial genome duplication; they suggest ways in which recombination might be modified and provide an approach that may be exploited to analyse recombination in other RNA viruses.
PMCID: PMC4055744  PMID: 24945141
20.  Recombination in Eukaryotic Single Stranded DNA Viruses 
Viruses  2011;3(9):1699-1738.
Although single stranded (ss) DNA viruses that infect humans and their domesticated animals do not generally cause major diseases, the arthropod borne ssDNA viruses of plants do, and as a result seriously constrain food production in most temperate regions of the world. Besides the well known plant and animal-infecting ssDNA viruses, it has recently become apparent through metagenomic surveys of ssDNA molecules that there also exist large numbers of other diverse ssDNA viruses within almost all terrestrial and aquatic environments. The host ranges of these viruses probably span the tree of life and they are likely to be important components of global ecosystems. Various lines of evidence suggest that a pivotal evolutionary process during the generation of this global ssDNA virus diversity has probably been genetic recombination. High rates of homologous recombination, non-homologous recombination and genome component reassortment are known to occur within and between various different ssDNA virus species and we look here at the various roles that these different types of recombination may play, both in the day-to-day biology, and in the longer term evolution, of these viruses. We specifically focus on the ecological, biochemical and selective factors underlying patterns of genetic exchange detectable amongst the ssDNA viruses and discuss how these should all be considered when assessing the adaptive value of recombination during ssDNA virus evolution.
PMCID: PMC3187698  PMID: 21994803
parvovirus; geminivirus; anellovirus; circovirus; nanovirus
21.  Genome signature analysis of thermal virus metagenomes reveals Archaea and thermophilic signatures 
BMC Genomics  2008;9:420.
Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant Genbank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses.
From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw; a finding consistent with those of Genbank BLAST. When using a viral database, the majority of the Octopus metagenome is predicted to belong to archaeal virus Families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. As expected, when microbial and viral databases are combined, each of the Octopus and Bear Paw metagenomic contigs are predicted to belong to viruses rather than to any Bacteria or Archaea, consistent with the apparent viral origin of both metagenomes.
That BLAST searches identify no significant homologs for most metagenome contigs, while GSPC suggests their origin as archaeal viruses or bacteriophages, indicates GSPC provides a complementary approach in viral metagenomic analysis.
PMCID: PMC2556352  PMID: 18798991
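The abstract of entry 21 describes classifying metagenomic fragments by comparing their oligonucleotide signatures against a reference database. The general idea can be illustrated with a minimal sketch. This is not the authors' GSPC implementation: the choice of tetranucleotides (k = 4), the Euclidean distance, the function names, and the toy reference sequences are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
import math


def signature(seq, k=4):
    """Normalized k-mer frequency vector over the ACGT alphabet."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # k-mers containing N or other symbols are simply ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def distance(a, b):
    """Euclidean distance between two signature vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(fragment, references, k=4):
    """Return the name of the reference whose signature is closest to the fragment's."""
    frag_sig = signature(fragment, k)
    return min(
        references,
        key=lambda name: distance(frag_sig, signature(references[name], k)),
    )


# Toy reference "genomes", made up for illustration only.
references = {
    "at_rich_ref": "ATATATTTAAATATTA" * 10,
    "gc_rich_ref": "GCGCGGCCGCGGGCCC" * 10,
}
fragment = references["gc_rich_ref"][5:85]
print(classify(fragment, references))
```

A composition-based comparison like this needs no alignment, which is why such an approach can still assign a fragment when a BLAST search returns no significant homolog.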
22.  Genomic Fossils Calibrate the Long-Term Evolution of Hepadnaviruses 
PLoS Biology  2010;8(9):e1000495.
Ancient hepadnavirus sequences found in bird genomes reveal that this family of viruses, which includes the human hepatitis B virus, is much older than previously thought.
Because most extant viruses mutate rapidly and lack a true fossil record, their deep evolution and long-term substitution rates remain poorly understood. In addition to retroviruses, which rely on chromosomal integration for their replication, many other viruses replicate in the nucleus of their host's cells and are therefore prone to endogenization, a process that involves integration of viral DNA into the host's germline genome followed by long-term vertical inheritance. Such endogenous viruses are highly valuable as they provide a molecular fossil record of past viral invasions, which may be used to decipher the origins and long-term evolutionary characteristics of modern pathogenic viruses. Hepadnaviruses (Hepadnaviridae) are a family of small, partially double-stranded DNA viruses that include hepatitis B viruses. Here we report the discovery of endogenous hepadnaviruses in the genome of the zebra finch. We used a combination of cross-species analysis of orthologous insertions, molecular dating, and phylogenetic analyses to demonstrate that hepadnaviruses repeatedly infiltrated the germline genome of passerine birds. We provide evidence that some of the avian hepadnavirus integration events are at least 19 My old, which reveals a much deeper ancestry of Hepadnaviridae than could be inferred based on the coalescence times of modern hepadnaviruses. Furthermore, the remarkable sequence similarity between endogenous and extant avian hepadnaviruses (up to 75% identity) suggests that long-term substitution rates for these viruses are on the order of 10⁻⁸ substitutions per site per year, which is 1,000-fold slower than short-term rates estimated based on the sequences of circulating hepadnaviruses. Together, these results imply a drastic shift in our understanding of the time scale of hepadnavirus evolution, and suggest that the rapid evolutionary dynamics characterizing modern avian hepadnaviruses do not reflect their mode of evolution on a deep time scale.
Author Summary
Paleovirology is the study of ancient viruses and the way they have shaped the innate immune system of their hosts over millions of years. One way to reconstruct the deep evolution of viruses is to search for viral sequences “fossilized” at different evolutionary time points in the genome of their hosts. Besides retroviruses, few virus families are known to have deposited molecular relics in their hosts' genomes. Here we report on the discovery of multiple fragments of viruses belonging to the Hepadnaviridae family (which includes the human hepatitis B viruses) fossilized in the genome of the zebra finch. We show that some of these fragments infiltrated the germline genome of passerine birds more than 19 million years ago, which implies that hepadnaviruses are much older than previously thought. Based on this age, we can infer a long-term avian hepadnavirus substitution rate, which is 1,000-fold slower than all short-term substitution rates calculated based on extant hepadnavirus sequences. These results call for a reevaluation of the long-term evolution of Hepadnaviridae, and indicate that some exogenous hepadnaviruses may still be circulating today in various passerine birds.
PMCID: PMC2946954  PMID: 20927357
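The order of magnitude quoted in entry 22 can be checked with a back-of-the-envelope calculation from the two figures the abstract gives (up to 75% identity, at least 19 million years). The sketch below is only a plausibility check and not the authors' dating analysis. It assumes that all divergence accumulated over a single 19 My span, and it optionally applies the Jukes-Cantor (JC69) correction for multiple substitutions at the same site. The function names are illustrative.

```python
import math


def jukes_cantor_distance(p):
    """Correct an observed proportion of differing sites p for multiple hits (JC69)."""
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def long_term_rate(identity, age_years, correct=True):
    """Substitutions per site per year implied by a sequence identity and an age."""
    p = 1.0 - identity
    d = jukes_cantor_distance(p) if correct else p
    return d / age_years


identity = 0.75   # "up to 75% identity", from the abstract
age_years = 19e6  # "at least 19 My old", from the abstract

raw = long_term_rate(identity, age_years, correct=False)
jc = long_term_rate(identity, age_years, correct=True)
print(f"uncorrected:    {raw:.2e} substitutions/site/year")
print(f"JC69-corrected: {jc:.2e} substitutions/site/year")
```

Both estimates fall between 1 × 10⁻⁸ and 2 × 10⁻⁸ substitutions per site per year. Dividing by twice the age instead (divergence shared between two lineages) would halve them, and the result stays on the same order of magnitude.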
23.  Virus Identification in Unknown Tropical Febrile Illness Cases Using Deep Sequencing 
Dengue virus is an emerging infectious agent that infects an estimated 50–100 million people annually worldwide, yet current diagnostic practices cannot detect an etiologic pathogen in ∼40% of dengue-like illnesses. Metagenomic approaches to pathogen detection, such as viral microarrays and deep sequencing, are promising tools to address emerging and non-diagnosable disease challenges. In this study, we used the Virochip microarray and deep sequencing to characterize the spectrum of viruses present in human sera from 123 Nicaraguan patients presenting with dengue-like symptoms but testing negative for dengue virus. We utilized a barcoding strategy to simultaneously deep sequence multiple serum specimens, generating on average over 1 million reads per sample. We then implemented a stepwise bioinformatic filtering pipeline to remove the majority of human and low-quality sequences to improve the speed and accuracy of subsequent unbiased database searches. By deep sequencing, we were able to detect virus sequence in 37% (45/123) of previously negative cases. These included 13 cases with Human Herpesvirus 6 sequences. Other samples contained sequences with similarity to sequences from viruses in the Herpesviridae, Flaviviridae, Circoviridae, Anelloviridae, Asfarviridae, and Parvoviridae families. In some cases, the putative viral sequences were virtually identical to known viruses, and in others they diverged, suggesting that they may derive from novel viruses. These results demonstrate the utility of unbiased metagenomic approaches in the detection of known and divergent viruses in the study of tropical febrile illness.
Author Summary
Dengue virus infection is a global health concern, affecting as many as 100 million people annually worldwide. A critical first step to proper treatment and control of any virus infection is a correct diagnosis. Traditional diagnostic tests for viruses depend on amplification of conserved portions of the viral genome, detection of the binding of antibodies to viral proteins, or replication of the virus in cell cultures. These methods have a major shortcoming: they are unable to detect divergent or novel viruses for which a priori sequence, serological, or cellular tropism information is not known. In our study, we use two approaches to virus identification, microarrays and deep sequencing, that are less susceptible to such shortcomings. We used these unbiased tools to search for viruses in blood collected from Nicaraguan children with clinical symptoms indicating dengue virus infection, but for whom current dengue virus detection assays yielded negative results. We were able to identify both known and divergent viruses in about one third of previously negative samples, demonstrating the utility of these approaches to detect viruses in cases of unknown dengue-like illness.
PMCID: PMC3274504  PMID: 22347512
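The barcoding and stepwise filtering described in entry 23 can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the barcode length, the mean-quality cutoff, and the k-mer test for host-derived reads are all assumptions chosen to keep the example small. It only shows the general idea of assigning pooled reads to samples by barcode, then discarding low-quality reads and reads that resemble the host before any database search.

```python
from collections import defaultdict


def demultiplex(reads, barcodes, barcode_len):
    """Group (sequence, qualities) reads by leading barcode and trim it off.

    Reads whose prefix matches no known barcode are dropped.
    """
    by_sample = defaultdict(list)
    for seq, quals in reads:
        sample = barcodes.get(seq[:barcode_len])
        if sample is not None:
            by_sample[sample].append((seq[barcode_len:], quals[barcode_len:]))
    return dict(by_sample)


def passes_quality(quals, min_mean_q=20):
    """Hypothetical quality filter: keep reads with mean quality >= cutoff."""
    return bool(quals) and sum(quals) / len(quals) >= min_mean_q


def looks_like_host(seq, host_kmers, k=8, max_fraction=0.5):
    """Hypothetical host filter: flag a read if most of its k-mers occur in
    a set of host k-mers (a stand-in for alignment to a host genome)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return False
    hits = sum(1 for kmer in kmers if kmer in host_kmers)
    return hits / len(kmers) > max_fraction


def filter_reads(reads, host_kmers):
    """Apply the filters stepwise: quality first, then host removal."""
    kept = [(s, q) for s, q in reads if passes_quality(q)]
    kept = [(s, q) for s, q in kept if not looks_like_host(s, host_kmers)]
    return [s for s, _ in kept]


# Toy data: two barcoded samples plus one read with an unknown barcode.
HOST = "GATTACAGATTACAGATTACA"
HOST_KMERS = {HOST[i:i + 8] for i in range(len(HOST) - 8 + 1)}
BARCODES = {"ACGT": "s1", "TTAA": "s2"}
READS = [
    ("ACGT" + "GATTACAGATTACA", [30] * 18),  # host-like, removed
    ("ACGT" + "CCCCGGGGCCCCGG", [30] * 18),  # kept
    ("TTAA" + "CCCCGGGGAAAATT", [5] * 18),   # low quality, removed
    ("GGGG" + "CCCCGGGGAAAATT", [30] * 18),  # unknown barcode, dropped
]

DEMUX = demultiplex(READS, BARCODES, barcode_len=4)
```

Calling `filter_reads(DEMUX["s1"], HOST_KMERS)` on the toy data leaves only the read that is neither low quality nor host-like. Running the cheap quality filter before the host comparison mirrors the stepwise idea in the abstract: each stage shrinks the input to the more expensive stage that follows.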
24.  Complete Genome Sequence of the Shrimp White Spot Bacilliform Virus 
Journal of Virology  2001;75(23):11811-11820.
We report the first complete genome sequence of a marine invertebrate virus. White spot bacilliform virus (WSBV; or white spot syndrome virus) is a major shrimp pathogen with a high mortality rate and a wide host range. Its double-stranded circular DNA genome of 305,107 bp contains 181 open reading frames (ORFs). Nine homologous regions containing 47 repeated minifragments that include direct repeats, atypical inverted repeat sequences, and imperfect palindromes were identified. This is the largest animal virus that has been completely sequenced. Although WSBV is morphologically similar to insect baculovirus, the two viruses are not detectably related at the amino acid level. Rather, some WSBV genes are more homologous to eukaryotic genes than viral genes. In fact, sequence analysis indicates that WSBV differs from all known viruses, although a few genes display a weak homology to herpesvirus genes. Most of the ORFs encode proteins that bear no homology to any known proteins, either suggesting that WSBV represents a novel class of viruses or perhaps implying a significant evolutionary distance between marine and terrestrial viruses. The most unique feature of WSBV is the presence of an intact collagen gene, a gene encoding an extracellular matrix protein of animal cells that has never been found in any viruses. Determination of the genome of WSBV will facilitate a better understanding of the molecular mechanism underlying the pathogenesis of the WSBV virus and will also provide useful information concerning the evolution and divergence of marine and terrestrial animal viruses at the molecular level.
PMCID: PMC114767  PMID: 11689662
25.  Assessing the Diversity and Specificity of Two Freshwater Viral Communities through Metagenomics 
PLoS ONE  2012;7(3):e33641.
Transitions between saline and fresh waters have been shown to be infrequent for microorganisms. Based on host-specific interactions, the presence of specific clades among hosts suggests the existence of freshwater-specific viral clades. Yet, little is known about the composition and diversity of the temperate freshwater viral communities, and even if freshwater lakes and marine waters harbor distinct clades for particular viral sub-families, this distinction remains to be demonstrated on a community scale.
To help identify the characteristics and potential specificities of freshwater viral communities, such communities from two lakes differing by their ecological parameters were studied through metagenomics. Both the cluster richness and the species richness of the Lake Bourget virome were significantly higher than those of Lake Pavin, highlighting a trend similar to the one observed for microorganisms (i.e. the species richness observed in mesotrophic lakes is greater than the one observed in oligotrophic lakes). Using 29 previously published viromes, the cluster richness was shown to vary between different environment types and appeared significantly higher in marine ecosystems than in other biomes. Furthermore, significant genetic similarity between viral communities of related environments was highlighted as freshwater, marine and hypersaline environments were separated from each other despite the vast geographical distances between sample locations within each of these biomes. An automated phylogeny procedure was then applied to marker genes of the major families of single-stranded (Microviridae, Circoviridae, Nanoviridae) and double-stranded (Caudovirales) DNA viruses. These phylogenetic analyses all spotlighted a very broad diversity and previously unknown clades undetectable by PCR analysis, clades that gathered sequences from the two lakes. Thus, the two freshwater viromes appear closely related, despite the significant ecological differences between the two lakes. Furthermore, freshwater viral communities appear genetically distinct from other aquatic ecosystems, demonstrating the specificity of freshwater viruses at a community scale for the first time.
PMCID: PMC3303852  PMID: 22432038
