Viral metagenomics, consisting of viral particle purification and shotgun sequencing, is a powerful technique for discovering viruses associated with diseases with no definitive etiology, viruses that share limited homology with known viruses, or viruses that are not culturable. Here we used viral metagenomics to examine viruses associated with sea turtle fibropapillomatosis (FP), a debilitating neoplastic disease affecting sea turtles worldwide. By means of purifying and shotgun sequencing the viral community directly from the fibropapilloma of a Florida green sea turtle, a novel single-stranded DNA virus, sea turtle tornovirus 1 (STTV1), was discovered. The single-stranded, circular genome of STTV1 was approximately 1,800 nucleotides in length. STTV1 has only weak amino acid level identities (25%) to chicken anemia virus in short regions of its genome; hence, STTV1 may represent the first member of a novel virus family. A total of 35 healthy turtles and 27 turtles with FP were tested for STTV1 using PCR, and only 2 turtles severely afflicted with FP were positive. The affected turtles were systemically infected with STTV1, since STTV1 was found in blood and all major organs. STTV1 exists as a quasispecies, with several genome variants identified in the fibropapilloma of each positive turtle, suggesting rapid evolution of this virus. The STTV1 variants were identical over the majority of their genomes but contained a hypervariable region with extensive divergence. This study demonstrates the potential of viral metagenomics for discovering novel viruses directly from animal tissue, which can enhance our understanding of viral evolution and diversity.
The non-enveloped bacilliform viruses are the second group of plant viruses known to possess a genome consisting of circular double-stranded DNA. We have characterized the viral transcript and determined the complete sequence of the genome of Commelina yellow mottle virus (CoYMV), a member of this group. Analysis of the viral transcript indicates that the virus encodes a single terminally-redundant genome-length plus 120 nucleotide transcript. A fraction of the transcripts is polyadenylated, although the majority of the transcript is not polyadenylated. Analysis of the genome sequence indicates that the genome is 7489 bp in size and that the transcribed strand contains three open reading frames capable of encoding proteins of 23, 15 and 216 kd. The function of the 23 and 15 kd proteins is unknown. Similarities between the 216 kd polypeptide and the cauliflower mosaic virus coat protein and protease/reverse transcriptase polyprotein suggest that the 216 kd polypeptide is a polyprotein that is proteolytically processed to yield the virion coat protein, a protease, and replicase (reverse transcriptase and ribonuclease H). Each strand of the CoYMV genome is interrupted by site-specific discontinuities. The locations of the 5'-ends of these discontinuities, and the presence and location of a region on the CoYMV transcript capable of annealing with the 3'-end of cytosolic initiator methionine tRNA are consistent with replication by reverse transcription. We have demonstrated that a construct containing 1.3 CoYMV genomes is infective when introduced into Commelina diffusa, the host for CoYMV, using Agrobacterium-mediated infection.
Sulfolobus turreted icosahedral virus (STIV) was the first icosahedral virus characterized from an archaeal host. It infects Sulfolobus species that thrive in the acidic hot springs (pH 2.9 to 3.9 and 72 to 92°C) of Yellowstone National Park. The overall capsid architecture and the structure of its major capsid protein are very similar to those of the bacteriophage PRD1 and eukaryotic viruses Paramecium bursaria Chlorella virus 1 and adenovirus, suggesting a viral lineage that predates the three domains of life. The 17,663-base-pair, circular, double-stranded DNA genome contains 36 potential open reading frames, whose sequences generally show little similarity to other genes in the sequence databases. However, functional and evolutionary information may be suggested by a protein's three-dimensional structure. To this end, we have undertaken structural studies of the STIV proteome. Here we report our work on A197, the product of an STIV open reading frame. The structure of A197 reveals a GT-A fold that is common to many members of the glycosyltransferase superfamily. A197 possesses a canonical DXD motif and a putative catalytic base that are hallmarks of this family of enzymes, strongly suggesting a glycosyltransferase activity for A197. Potential roles for the putative glycosyltransferase activity of A197 and their evolutionary implications are discussed.
Viral particles in stool samples from wild-living chimpanzees were analysed using random PCR amplification and sequencing. Sequences encoding proteins distantly related to the replicase protein of single-stranded circular DNA viruses were identified. Inverse PCR was used to amplify and sequence multiple small circular DNA viral genomes. The viral genomes were related in size and genome organization to vertebrate circoviruses and plant geminiviruses but with a different location for the stem–loop structure involved in rolling circle DNA replication. The replicase genes of these viruses were most closely related to those of the much smaller (∼1 kb) plant nanovirus circular DNA chromosomes. Because the viruses have characteristics of both animal and plant viruses, we named them chimpanzee stool-associated circular viruses (ChiSCV). Further metagenomic studies of animal samples will greatly increase our knowledge of viral diversity and evolution.
At this time, about 3,000 different viruses are recognized, but metagenomic studies suggest that these viruses are a small fraction of the viruses that exist in nature. We have explored viral diversity by deep sequencing nucleic acids obtained from virion populations enriched from raw sewage. We identified 234 known viruses, including 17 that infect humans. Plant, insect, and algal viruses as well as bacteriophages were also present. These viruses represented 26 taxonomic families and included viruses with single-stranded DNA (ssDNA), double-stranded DNA (dsDNA), positive-sense ssRNA [ssRNA(+)], and dsRNA genomes. Novel viruses that could be placed in specific taxa represented 51 different families, making untreated wastewater the most diverse viral metagenome (genetic material recovered directly from environmental samples) examined thus far. However, the vast majority of sequence reads bore little or no sequence relation to known viruses and thus could not be placed into specific taxa. These results show that the vast majority of the viruses on Earth have not yet been characterized. Untreated wastewater provides a rich matrix for identifying novel viruses and for studying virus diversity.
Importance
At this time, virology is focused on the study of a relatively small number of viral species. Specific viruses are studied either because they are easily propagated in the laboratory or because they are associated with disease. The lack of knowledge of the size and characteristics of the viral universe and the diversity of viral genomes is a roadblock to understanding important issues, such as the origin of emerging pathogens and the extent of gene exchange among viruses. Untreated wastewater is an ideal system for assessing viral diversity because virion populations from large numbers of individuals are deposited and because raw sewage itself provides a rich environment for the growth of diverse host species and thus their viruses. These studies suggest that the viral universe is far more vast and diverse than previously suspected.
Viruses are known to be the most numerous biological entities in soil; however, little is known about their diversity in this environment. In order to explore the genetic diversity of soil viruses, we isolated viruses by centrifugation and sequential filtration before performing a metagenomic investigation. We adopted multiple-displacement amplification (MDA), an isothermal whole-genome amplification method with φ29 polymerase and random hexamers, to amplify viral DNA and construct clone libraries for metagenome sequencing. With the MDA method, the diversity of both single-stranded DNA (ssDNA) viruses and double-stranded DNA viruses could be investigated at the same time. In contrast, when the denaturing step was eliminated from the MDA reaction, ssDNA viral diversity alone could be explored selectively. Irrespective of the denaturing step, more than 60% of the soil metagenome sequences did not show significant hits (E-value criterion, 0.001) with previously reported viral sequences. Even the hits that were considered significant were only distantly related to known ssDNA viruses (average amino acid similarity, approximately 34%). Phylogenetic analysis showed that replication-related proteins (which were the most frequently detected proteins) related to those of ssDNA viruses obtained from the metagenomic sequences were diverse and novel. Putative circular genome components of ssDNA viruses that are unrelated to known viruses were assembled from the metagenomic sequences. In conclusion, ssDNA viral diversity in soil is more complex than previously thought. Soil is therefore a rich pool of previously unknown ssDNA viruses.
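The E-value criterion above can be illustrated with a minimal Python sketch. It assumes the search results have already been reduced to (read identifier, E-value) pairs and that a hit counts as significant when its E-value is at or below 0.001; the function name, the input layout, and the example values are hypothetical and are not taken from the study.

```python
def fraction_without_hits(read_ids, hits, max_evalue=1e-3):
    """Return the fraction of reads with no hit at or below max_evalue.

    read_ids: iterable of all read identifiers in the metagenome.
    hits: iterable of (read_id, evalue) pairs from a similarity search.
    Treating "<= max_evalue" as significant is an assumption of this sketch.
    """
    reads = set(read_ids)
    if not reads:
        raise ValueError("read_ids must not be empty")
    # A read is significant if at least one of its hits passes the cutoff.
    significant = {read_id for read_id, evalue in hits if evalue <= max_evalue}
    return len(reads - significant) / len(reads)


# Made-up example data: five reads, two of which have a significant hit.
example_reads = ["r1", "r2", "r3", "r4", "r5"]
example_hits = [("r1", 1e-10), ("r2", 0.5), ("r3", 1e-3), ("r2", 0.01)]
print(fraction_without_hits(example_reads, example_hits))
```

In this made-up example three of the five reads lack a significant hit, which corresponds to the kind of proportion (more than 60%) reported for the soil metagenome.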
The human gut is known to be a reservoir of a wide variety of microbes, including viruses. Many RNA viruses are known to be associated with gastroenteritis; however, the enteric RNA viral community present in healthy humans has not been described. Here, we present a comparative metagenomic analysis of the RNA viruses found in three fecal samples from two healthy human individuals. For this study, uncultured viruses were concentrated by tangential flow filtration, and viral RNA was extracted and cloned into shotgun viral cDNA libraries for sequencing analysis. The vast majority of the 36,769 viral sequences obtained were similar to plant pathogenic RNA viruses. The most abundant fecal virus in this study was pepper mild mottle virus (PMMV), which was found in high concentrations, up to 10^9 virions per gram of dry weight fecal matter. PMMV was also detected in 12 (66.7%) of 18 fecal samples collected from healthy individuals on two continents, indicating that this plant virus is prevalent in the human population. A number of pepper-based foods tested positive for PMMV, suggesting dietary origins for this virus. Intriguingly, the fecal PMMV was infectious to host plants, suggesting that humans might act as a vehicle for the dissemination of certain plant viruses.
A comparative metagenomic analysis of RNA viruses in the human gut identifies the vast majority as plant pathogens.
Viruses are the most common biological entities in the marine environment. There has not been a global survey of these viruses, and consequently, it is not known what types of viruses are in Earth's oceans or how they are distributed. Metagenomic analyses of 184 viral assemblages collected over a decade and representing 68 sites in four major oceanic regions showed that most of the viral sequences were not similar to those in the current databases. There was a distinct “marine-ness” quality to the viral assemblages. Global diversity was very high, presumably several hundred thousand species, and regional richness varied on a North-South latitudinal gradient. The marine regions had different assemblages of viruses. Cyanophages and a newly discovered clade of single-stranded DNA phages dominated the Sargasso Sea sample, whereas prophage-like sequences were most common in the Arctic. However, most viral species were found to be widespread. With a majority of shared species between oceanic regions, most of the differences between viral assemblages seemed to be explained by variation in the occurrence of the most common viral species and not by exclusion of different viral genomes. These results support the idea that viruses are widely dispersed and that local environmental conditions enrich for certain viral types through selective pressure.
An extensive metagenomic survey of viral diversity in the marine environment is presented. Many phages are widely distributed, although location-specific selection results in enrichment of some viruses.
Metagenomics can be used to determine the diversity of complex, often unculturable, viral communities with various nucleic acid compositions. Here, we report the use of hydroxyapatite chromatography to efficiently fractionate double-stranded DNA (dsDNA), single-stranded DNA (ssDNA), dsRNA, and ssRNA genomes from known bacteriophages. Linker-amplified shotgun libraries were constructed to generate sequencing reads from each hydroxyapatite fraction. Greater than 90% of the reads displayed significant similarity to the expected genomes at the nucleotide level. These methods were applied to marine viruses collected from the Chesapeake Bay and the Dry Tortugas National Park. Isolated nucleic acids were fractionated using hydroxyapatite chromatography followed by linker-amplified shotgun library construction and sequencing. Taxonomic analysis demonstrated that the majority of environmental sequences, regardless of their source nucleic acid, were most similar to dsDNA viruses, reflecting the bias of viral metagenomic sequence databases.
There are no known RNA viruses that infect Archaea. Filling this gap in our knowledge of viruses will enhance our understanding of the relationships between RNA viruses from the three domains of cellular life and, in particular, could shed light on the origin of the enormous diversity of RNA viruses infecting eukaryotes. We describe here the identification of novel RNA viral genome segments from high-temperature acidic hot springs in Yellowstone National Park in the United States. These hot springs harbor low-complexity cellular communities dominated by several species of hyperthermophilic Archaea. A viral metagenomics approach was taken to assemble segments of these RNA virus genomes from viral populations isolated directly from hot spring samples. Analysis of these RNA metagenomes demonstrated unique gene content that is not generally related to known RNA viruses of Bacteria and Eukarya. However, genes for RNA-dependent RNA polymerase (RdRp), a hallmark of positive-strand RNA viruses, were identified in two contigs. One of these contigs is approximately 5,600 nucleotides in length and encodes a polyprotein that also contains a region homologous to the capsid protein of nodaviruses, tetraviruses, and birnaviruses. Phylogenetic analyses of the RdRps encoded in these contigs indicate that the putative archaeal viruses form a unique group that is distinct from the RdRps of RNA viruses of Eukarya and Bacteria. Collectively, our findings suggest the existence of novel positive-strand RNA viruses that probably replicate in hyperthermophilic archaeal hosts and are highly divergent from RNA viruses that infect eukaryotes and even more distant from known bacterial RNA viruses. These positive-strand RNA viruses might be direct ancestors of RNA viruses of eukaryotes.
The Microviridae comprises icosahedral lytic viruses with circular single-stranded DNA genomes. The family is divided into two distinct groups based on genome characteristics and virion structure. Viruses infecting enterobacteria belong to the genus Microvirus, whereas those infecting obligate parasitic bacteria, such as Chlamydia, Spiroplasma and Bdellovibrio, are classified into a subfamily, the Gokushovirinae. Recent metagenomic studies suggest that members of the Microviridae might also play an important role in marine environments. In this study we present the identification and characterization of Microviridae-related prophages integrated in the genomes of species of the Bacteroidetes, a phylum not previously known to be associated with microviruses. Searches against metagenomic databases revealed the presence of highly similar sequences in the human gut. This is the first report indicating that viruses of the Microviridae lysogenize their hosts. The absence of associated integrase-coding genes and apparent recombination with dif-like sequences suggest that Bacteroidetes-associated microviruses are likely to rely on the cellular chromosome dimer resolution machinery. Phylogenetic analysis of the putative major capsid proteins places the identified proviruses into a group separate from the previously characterized microviruses and gokushoviruses, suggesting that the genetic diversity and host range of bacteriophages in the family Microviridae are wider than currently appreciated.
Geminiviruses with small circular single-stranded DNA genomes replicate in plant cell nuclei by using various double-stranded DNA (dsDNA) intermediates: distinct open circular and covalently closed circular as well as heterogeneous linear DNA. Their DNA may be methylated partially at cytosine residues, as detected previously by bisulfite sequencing and subsequent PCR. In order to determine the methylation patterns of the circular molecules, the DNAs of tomato yellow leaf curl Sardinia virus (TYLCSV) and Abutilon mosaic virus were investigated utilizing bisulfite treatment followed by rolling circle amplification. Shotgun sequencing of the products yielded a randomly distributed 50% rate of C maintenance after the bisulfite reaction for both viruses. However, controls with unmethylated single-stranded bacteriophage DNA resulted in the same level of C maintenance. Only one short DNA stretch within the C2/C3 promoter of TYLCSV showed hyperprotection of C, with the protection rate exceeding the threshold of the mean value plus 1 standard deviation. Similarly, the use of methylation-sensitive restriction enzymes suggested that geminiviruses escape silencing by methylation very efficiently, by either a rolling circle or recombination-dependent replication mode. In contrast, attempts to detect methylated bases positively by using methylcytosine-specific antibodies detected methylated DNA only in heterogeneous linear dsDNA, and methylation-dependent restriction enzymes revealed that the viral heterogeneous linear dsDNA was methylated preferentially.
Although single stranded (ss) DNA viruses that infect humans and their domesticated animals do not generally cause major diseases, the arthropod borne ssDNA viruses of plants do, and as a result seriously constrain food production in most temperate regions of the world. Besides the well known plant and animal-infecting ssDNA viruses, it has recently become apparent through metagenomic surveys of ssDNA molecules that there also exist large numbers of other diverse ssDNA viruses within almost all terrestrial and aquatic environments. The host ranges of these viruses probably span the tree of life and they are likely to be important components of global ecosystems. Various lines of evidence suggest that a pivotal evolutionary process during the generation of this global ssDNA virus diversity has probably been genetic recombination. High rates of homologous recombination, non-homologous recombination and genome component reassortment are known to occur within and between various different ssDNA virus species and we look here at the various roles that these different types of recombination may play, both in the day-to-day biology, and in the longer term evolution, of these viruses. We specifically focus on the ecological, biochemical and selective factors underlying patterns of genetic exchange detectable amongst the ssDNA viruses and discuss how these should all be considered when assessing the adaptive value of recombination during ssDNA virus evolution.
parvovirus; geminivirus; anellovirus; circovirus; nanovirus
Virophages, e.g., Sputnik, Mavirus, and Organic Lake virophage (OLV), are unusual parasites of giant double-stranded DNA (dsDNA) viruses, yet little is known about their diversity. Here, we describe the global distribution, abundance, and genetic diversity of virophages based on analyzing and mapping comprehensive metagenomic databases. The results reveal a distinct abundance and worldwide distribution of virophages, involving almost all geographical zones and a variety of unique environments. These environments ranged from deep ocean to inland, iced to hydrothermal lakes, and human gut- to animal-associated habitats. Four complete virophage genomic sequences (Yellowstone Lake virophages [YSLVs]) were obtained, as was one nearly complete sequence (Ace Lake Mavirus [ALM]). The genomes obtained were 27,849 bp long with 26 predicted open reading frames (ORFs) (YSLV1), 23,184 bp with 21 ORFs (YSLV2), 27,050 bp with 23 ORFs (YSLV3), 28,306 bp with 34 ORFs (YSLV4), and 17,767 bp with 22 ORFs (ALM). The homologous counterparts of five genes, including putative FtsK-HerA family DNA packaging ATPase and genes encoding DNA helicase/primase, cysteine protease, major capsid protein (MCP), and minor capsid protein (mCP), were present in all virophages studied thus far. They also shared a conserved gene cluster comprising the two core genes of MCP and mCP. Comparative genomic and phylogenetic analyses showed that YSLVs, having a closer relationship to each other than to the other virophages, were more closely related to OLV than to Sputnik but distantly related to Mavirus and ALM. These findings indicate that virophages appear to be widespread and genetically diverse, with at least 3 major lineages.
The aim of this study was to develop and demonstrate an approach for describing the diversity of human pathogenic viruses in an environmentally isolated viral metagenome.
Methods and Results
In silico bioinformatic experiments were used to select an optimum annotation strategy for discovering human viruses in virome datasets, and the selected strategy was applied to annotate a class B biosolids virome. Results from the in silico study indicated that an error rate below 1% in virus identification could be achieved when nucleotide-based search programs (BLASTn or tBLASTx), viral-genome-only databases, and sequence reads greater than 200 nt were considered. Within the 51,925 annotated sequences, 94 DNA and 19 RNA sequences were identified as human viruses. Virus diversity included environmentally transmitted agents such as parechovirus, coronavirus, adenovirus, and aichi virus, as well as viruses associated with chronic human infections such as human herpes and hepatitis C viruses.
This study provided a bioinformatic approach for identifying pathogens in a virome dataset, and demonstrated the human virus diversity in a relevant environmental sample.
Significance and Impact of Study
As the costs of next generation sequencing decrease, the pathogen diversity described by virus metagenomes will provide an unbiased guide for subsequent cell-culture and quantitative pathogen analyses, and will ensure that highly enriched and relevant pathogens are not neglected in exposure and risk assessments.
virus; bioinformatics; biosolids; next generation DNA sequencing; viral metagenome; pathogen; virome
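One element of the annotation strategy in the biosolids abstract above, restricting the analysis to sequence reads greater than 200 nt, can be sketched in Python as follows. This is an illustrative sketch only: the function name, the dictionary layout, and the example reads are hypothetical, and the abstract does not describe how the authors implemented the filter.

```python
def filter_reads_by_length(reads, min_length=200):
    """Keep reads strictly longer than min_length nucleotides.

    reads: dict mapping a read name to its nucleotide sequence.
    The strict ">" follows the abstract's wording "greater than 200 nt".
    """
    return {name: seq for name, seq in reads.items() if len(seq) > min_length}


# Made-up example reads of 40, 200, and 240 nt.
example_reads = {
    "short": "ACGT" * 10,
    "edge": "A" * 200,
    "long": "ACGT" * 60,
}
kept = filter_reads_by_length(example_reads)
print(sorted(kept))
```

Only the 240 nt read passes here; a read of exactly 200 nt is excluded under the strict reading of the threshold.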
Chlorella viruses (or chloroviruses) are very large, plaque-forming viruses. The viruses are multilayered structures containing a large double-stranded DNA genome, a lipid bilayered membrane, and an outer icosahedral capsid shell. The viruses replicate in certain isolates of the coccal green alga, Chlorella. Sequence analysis of the 330-kbp genome of Paramecium bursaria Chlorella virus 1 (PBCV-1), the prototype of the virus family Phycodnaviridae, reveals approximately 365 protein-encoding genes and 11 tRNA genes. Products of about 40% of these genes resemble proteins of known function, including many that are unexpected for a virus. Among these is a virus-encoded protein, called Kcv, which forms a functional K+ channel. This chapter focuses on the initial steps in virus infection and provides a plausible role for the function of the viral K+ channel in lowering the turgor pressure of the host. This step appears to be a prerequisite for delivery of the viral genome into the host.
Circoviruses are small, nonenveloped icosahedral animal viruses characterized by circular single-stranded DNA genomes. Their genomes are the smallest possessed by animal viruses. Infections with circoviruses, which can lead to economically important diseases, frequently result in virus-induced damage to lymphoid tissue and immunosuppression. Within the family Circoviridae, different genera are distinguished by differences in genomic organization. Thus, Chicken anemia virus is in the genus Gyrovirus, while porcine circoviruses and Beak and feather disease virus belong to the genus Circovirus. Little is known about the structures of circoviruses. Accordingly, we investigated the structures of these three viruses with a view to determining whether they are related. Three-dimensional maps computed from electron micrographs showed that all three viruses have a T=1 organization with capsids formed from 60 subunits. Porcine circovirus type 2 and beak and feather disease virus show similar capsid structures with flat pentameric morphological units, whereas chicken anemia virus has strikingly different protruding pentagonal trumpet-shaped units. It thus appears that the structures of viruses in the same genus are related but that those of viruses in different genera are unrelated.
Viruses, most of which are phage, are extremely abundant in marine sediments, yet almost nothing is known about their identity or diversity. We present the metagenomic analysis of an uncultured near-shore marine-sediment viral community. Three-quarters of the sequences in the sample were not related to anything previously reported. Among the sequences that could be identified, the majority belonged to double-stranded DNA phage. Temperate phage were more common than lytic phage, suggesting that lysogeny may be an important lifestyle for sediment viruses. Comparisons between the sediment sample and previously sequenced seawater viral communities showed that certain phage phylogenetic groups were abundant in all marine viral communities, while other phage groups were under-represented or absent. This 'marineness' suggests that marine phage are derived from a common set of ancestors. Several independent mathematical models, based on the distribution of overlapping shotgun sequence fragments from the library, were used to show that the diversity of the viral community was extremely high, with at least 10^4 viral genotypes per kilogram of sediment and a Shannon index greater than 9 nats. Based on these observations we propose that marine-sediment viral communities are one of the largest unexplored reservoirs of sequence space on the planet.
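To make the Shannon index figure in the sediment abstract above concrete, the following Python sketch computes the index in nats from a list of genotype abundances. It is not the authors' estimation procedure, which relied on models of overlapping shotgun fragments; the function name and the perfectly even example community are assumptions chosen purely for illustration.

```python
import math


def shannon_index_nats(abundances):
    """Shannon index H = -sum(p_i * ln p_i), in nats.

    abundances: non-negative counts (or weights), one per genotype.
    Genotypes with zero abundance contribute nothing to the sum.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    return -sum((n / total) * math.log(n / total) for n in abundances if n > 0)


# Hypothetical, perfectly even community of 10^4 genotypes.
even_community = [1] * 10_000
print(round(shannon_index_nats(even_community), 2))
```

For a perfectly even community the index equals the natural logarithm of the number of genotypes, so 10^4 equally abundant genotypes give about 9.21 nats. An index above 9 nats therefore implies on the order of 10^4 genotypes or more, in line with the richness estimate quoted above.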
Mutations introduced into the capsid gene of duck hepatitis B virus (DHBV) were tested for their effects on viral DNA synthesis and assembly of enveloped viruses. Four classes of mutant phenotypes were observed among a series of deletions covering the 3' end of the capsid open reading frame. Class I mutant capsids were able to support normal single-stranded and relaxed circular viral DNA synthesis; class II mutant capsids supported normal single-stranded DNA synthesis but not relaxed circular DNA synthesis; class III mutant capsids resembled class II capsids, but viral DNA synthesis was inhibited 5- to 10-fold; and class IV capsids were severely restricted in their ability to support viral DNA synthesis. Class I capsids were assembled into enveloped virions, but class II, III, and IV capsids were not. Viral DNA synthesized inside class II capsids was normal with respect to minus-strand DNA initiation, plus-strand DNA initiation, and circularization of the DNA, but plus strands failed to be elongated to mature 3-kb DNA. The results suggest that a function of the capsid protein specifically required for viral DNA maturation is also required for assembly of nucleocapsids into envelopes. Thus, class II mutants appear to be defective in the appearance of the "packaging signal" for virus assembly (J. Summers and W. Mason, Cell 29:403-415, 1982).
In a recent BMC Evolutionary Biology article, Huiquan Liu and colleagues report two new genomes of double-stranded RNA (dsRNA) viruses from fungi and use these as a springboard to perform an extensive phylogenomic analysis of dsRNA viruses. The results support the old scenario of polyphyletic origin of dsRNA viruses from different groups of positive-strand RNA viruses and additionally reveal extensive horizontal gene transfer between diverse viruses, consistent with a network-like rather than tree-like mode of viral evolution. Together with the unexpected discoveries of the first putative archaeal RNA virus and an RNA-DNA virus hybrid, this work shows that RNA viral genomics has major surprises to deliver.
See research article: http://www.biomedcentral.com/1471-2148/12/91
Human B19 erythrovirus is a ubiquitous viral pathogen, commonly infecting individuals before adulthood. As with all autonomous parvoviruses, its small single-stranded DNA genome is replicated with host cell machinery. While the mechanism of parvovirus genome replication has been studied in detail, the rate at which B19 virus evolves is unknown. By inferring the phylogenetic history and evolutionary dynamics of temporally sampled B19 sequences, we observed a surprisingly high rate of evolutionary change, at approximately 10^-4 nucleotide substitutions per site per year. This rate is more typical of RNA viruses and suggests that high mutation rates are characteristic of the Parvoviridae.
Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant GenBank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses.
From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts that most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw, a finding consistent with those of GenBank BLAST. When using a viral database, the majority of the Octopus metagenome is predicted to belong to the archaeal virus families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. As expected, when microbial and viral databases are combined, each of the Octopus and Bear Paw metagenomic contigs is predicted to belong to viruses rather than to any Bacteria or Archaea, consistent with the apparent viral origin of both metagenomes.
That BLAST searches identify no significant homologs for most metagenome contigs, while GSPC suggests their origin as archaeal viruses or bacteriophages, indicates that GSPC provides a complementary approach in viral metagenomic analysis.
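The signature-based comparison described above can be illustrated with a minimal sketch. It is not the authors' GSPC implementation: the abstract does not state the oligonucleotide length, the distance measure, or the database layout, so the tetranucleotide length (k = 4), the Euclidean distance, the function names, and the two toy reference sequences below are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product
import math


def signature(seq, k=4):
    """Normalized k-mer frequency vector of a DNA sequence.

    k = 4 is an assumption; the abstract does not give the oligonucleotide length.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made of unambiguous bases contribute to the total.
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        raise ValueError("sequence too short or contains no valid k-mers")
    return [counts[kmer] / total for kmer in kmers]


def distance(sig_a, sig_b):
    """Euclidean distance between two signatures (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))


def classify(fragment, reference_signatures):
    """Return the label of the reference signature nearest to the fragment."""
    frag_sig = signature(fragment)
    return min(
        reference_signatures,
        key=lambda label: distance(frag_sig, reference_signatures[label]),
    )


# Hypothetical toy "database": two synthetic sequences, not real genomes.
AT_UNIT = "ATATTTAAATATTAATTTAAATAT"
GC_UNIT = "GCGGCCGCGCCGGGCCGCGCGGCC"
references = {
    "AT-rich reference (toy)": signature(AT_UNIT * 5),
    "GC-rich reference (toy)": signature(GC_UNIT * 5),
}

# A fragment with the same composition as the AT-rich toy reference.
print(classify(AT_UNIT * 3, references))
```

A real analysis would replace the toy dictionary with signatures computed from sequenced bacterial, archaeal, and viral genomes, and short fragments would give noisier signatures than the long contigs the abstract describes.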
Little is known about the viruses infecting most species. Even in groups as well-studied as Drosophila, only a handful of viruses have been well-characterized. A viral metagenomic approach was used to explore viral diversity in 83 wild-caught Drosophila innubila, a mushroom-feeding member of the quinaria group. A single fly that was injected with, and died from, Drosophila C Virus (DCV) was added to the sample as a control. Two-thirds of reads in the infected sample had DCV as the best BLAST hit, suggesting that the protocol developed is highly sensitive. In addition to the DCV hits, several sequences had Oryctes rhinoceros Nudivirus, a double-stranded DNA virus, as a best BLAST hit. The virus associated with these sequences was termed Drosophila innubila Nudivirus (DiNV). PCR screens of natural populations showed that DiNV was both common and widespread taxonomically and geographically. Electron microscopy confirmed the presence in fly fecal material of virions similar in structure to other described Nudiviruses. In two species, D. innubila and D. falleni, the virus is associated with a severe (∼80–90%) loss of fecundity and a significantly decreased lifespan.
Freshwater lakes and ponds present an ecological interface between humans and a variety of host organisms. They are a habitat for the larval stage of many insects and may serve as a medium for intraspecies and interspecies transmission of viruses such as avian influenza A virus. Furthermore, freshwater bodies are already known repositories for disease-causing viruses such as Norwalk Virus, Coxsackievirus, Echovirus, and Adenovirus. While RNA virus populations have been studied in marine environments, to date there has been very limited analysis of the viral community in freshwater. Here we present a survey of RNA viruses in Lake Needwood, a freshwater lake in Maryland, USA. Our results indicate that, just as in studies of other aquatic environments, the majority of nucleic acid sequences recovered did not show any significant similarity to known sequences. The remaining sequences are mainly from viral types with significant similarity to approximately 30 viral families. We speculate that these novel viruses may infect a variety of hosts, including plants, insects, fish, domestic animals, and humans. Among these viruses we have discovered a previously unknown dsRNA virus closely related to Banna Virus, which is responsible for a febrile illness and is endemic to Southeast Asia. Moreover, we found multiple viral sequences distantly related to Israeli Acute Paralysis virus, which has been implicated in honeybee colony collapse disorder. Our data suggest that, due to their direct contact with humans and with domestic and wild animals, freshwater ecosystems might serve as repositories of a wide range of viruses (both pathogenic and non-pathogenic) and possibly be involved in the spread of emerging and pandemic diseases.
The genomic DNA sequence of a novel enteric uncultured microphage, ΦCA82, from a turkey gastrointestinal system was determined using metagenomics techniques. The entire circular, single-stranded nucleotide sequence of the genome was 5,514 nucleotides. The ΦCA82 genome is quite different from those of other microviruses, as indicated by comparisons of nucleotide similarity, predicted protein similarity, and functional classifications. Only three genes showed significant similarity to microviral proteins as determined by local alignments using BLAST analysis. ORF1 encoded a predicted phage F capsid protein that was phylogenetically most similar to the major coat protein of the Microviridae member ΦMH2K. The ΦCA82 genome also encoded a predicted minor capsid protein (ORF2) and a putative replication initiation protein (ORF3) most similar to those of the microviral bacteriophage SpV4. The distant evolutionary relationship of ΦCA82 suggests that the divergence of this novel turkey microvirus from other microviruses may reflect unique evolutionary pressures encountered within the turkey gastrointestinal system.
microphage; Microviridae; turkey; enteric; metagenomics