This study extends the known sequence space for viral DNA polymerase genes to reveal that podoviruses are diverse and widespread members of marine communities. Sequence analysis revealed the existence of at least three previously unrecognized broad evolutionary groups of environmental podovirus polymerase sequences that have no cultured representatives, while the genetic richness of other groups has been expanded markedly.
Our study investigated the genetic richness of marine T7-like phages by targeting family A DNA pol with the degenerate primers Podo-F and Podo-R2. These primers amplified the expected 1,430-bp product from bacteriophage T7 and generated products of ca. 1,150 to 1,250 bp (data not shown) from natural marine virus communities. Podoviruses that harbor family B DNA pol, such as the marine phage VpV262, would not be captured with this primer set.
The PCR products from each of the four water column and four sediment samples collected from the west coast of Canada and the western Gulf of Mexico were cloned, and 20 clones were analyzed from each sample. Restriction digests of 160 cloned fragments yielded 29 different RFLP patterns that were sequenced (Fig. ); of these, 17 were at least 5% different from all others at the nucleotide level. The sequences from patterns B and K were removed from further analysis because they were more than 95% identical to the sequence from pattern D. Moreover, multiple sequences of clones representing the same RFLP pattern recovered the same genotype.
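The 95% identity cutoff used to collapse near-identical sequences can be illustrated with a short sketch. The text does not state how identity was computed, so the approach below (percent identity over ungapped columns of a pre-computed alignment, followed by greedy clustering) is an assumption, and the function names and toy sequences are hypothetical.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def collapse_near_identical(aligned: Dict[str, str],
                            threshold: float = 95.0) -> Dict[str, List[str]]:
    """Greedy clustering: each sequence joins the first representative
    to which it is more than `threshold` percent identical."""
    clusters: Dict[str, List[str]] = {}
    for name, seq in aligned.items():
        for rep in clusters:
            if percent_identity(seq, aligned[rep]) > threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]  # no close representative: start a new cluster
    return clusters


# Toy aligned sequences (invented for illustration, not data from the study).
toy = {
    "D": "ACGT" * 10,
    "B": "ACGT" * 9 + "ACGA",   # one mismatch in 40 positions
    "K": "TCGT" + "ACGT" * 9,   # one mismatch in 40 positions
    "X": "TTGGCCAA" * 5,        # unrelated sequence
}
clusters = collapse_near_identical(toy)
```

Greedy clustering depends on input order, so the representative chosen for each cluster is simply the first member encountered; a hierarchical method would remove that dependence at the cost of more code.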
FIG. 1. RFLP patterns generated from the MboII digestion of natural virus DNA pol clones. Shown are marine surface sediment samples from Sechelt Inlet (A), Malaspina Inlet sites 1 (B) and 4 (C), and Nanoose Bay (D) and water column samples from Howe Sound (E), ...
Similar to results from other studies (5), the frequency with which specific genotypes occurred varied among samples, implying that viral communities differ among sites. For example, in the sediments, five genotypes were recovered from Nanoose Bay, whereas Malaspina site 1 was dominated by a single genotype (Fig. and , respectively). Similarly, within the clone library of 20 PCR fragments from each sample, some genotypes occurred more frequently than others. For example, sequence SOG-W-A (i.e., Strait of Georgia, water sample A) was found in three different samples, Malaspina Inlet 442 and 443, as well as Howe Sound 430 (Fig. ). On the other hand, GOM-W-J was found only in water from the Gulf of Mexico (Fig. ). Within samples from British Columbia, six sequences occurred in more than one sample, whereas all the sequences from the Gulf of Mexico were unique to that location (Fig. and ). Although interpretations of relative genetic richness among samples are suspect because of biases introduced by PCR amplification, every environment differed in terms of the sequences recovered.
FIG. 2. Abundances of each phylotype in the clone libraries from water and sediment samples. The stacked bars identify specific sample distribution in terms of numbers of clones for each respective sequence. B and K were >95% similar at the sequence ...
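The kind of tally summarized in the stacked bars, and the distinction between genotypes shared among samples and those restricted to one, can be sketched as follows. The sample and genotype labels are taken from the text, but the clone counts are invented for illustration and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One (sample, genotype) pair per clone. Counts are made up for illustration.
clones: List[Tuple[str, str]] = [
    ("Malaspina Inlet 442", "SOG-W-A"),
    ("Malaspina Inlet 442", "SOG-W-A"),
    ("Malaspina Inlet 443", "SOG-W-A"),
    ("Howe Sound 430", "SOG-W-A"),
    ("Gulf of Mexico", "GOM-W-J"),
    ("Gulf of Mexico", "GOM-W-J"),
]


def genotype_distribution(pairs: List[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Return {genotype: {sample: number of clones}}."""
    table: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for sample, genotype in pairs:
        table[genotype][sample] += 1
    return {g: dict(s) for g, s in table.items()}


def shared_and_unique(table: Dict[str, Dict[str, int]]) -> Tuple[List[str], List[str]]:
    """Split genotypes into those seen in several samples and those seen in one."""
    shared = sorted(g for g, s in table.items() if len(s) > 1)
    unique = sorted(g for g, s in table.items() if len(s) == 1)
    return shared, unique


table = genotype_distribution(clones)
shared, unique = shared_and_unique(table)
```

Because clone frequencies are affected by PCR amplification, a table like this is better read as presence or absence across samples than as a measure of relative abundance.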
There was little genotypic overlap between OTUs in the sediment and the water (Fig. ), implying that the virus assemblages in these environments are distinct. This is consistent with different bacterial communities being present in the water and sediments as well (15). Nonetheless, some very similar sequences (58% at the amino acid level) were found in geographically distant locations (the Gulf of Mexico and British Columbia inlets) and different environments (water and sediment). This is similar to previous results (4), where the same Podoviridae sequences were found in very different environments. Consequently, a larger sequencing effort, or an approach using nondegenerate primers, might reveal that sequences that were not detected are present but rare. Similarly, some Myoviridae g20 and cyanophage DNA pol sequences are dispersed among a variety of environments, while others appear to be restricted in distribution.
Despite the high degeneracy of the primers, all of the sequences recovered were family A DNA pol genes that clustered within the Podoviridae. Sequences from T7-like phages (Fig. ) were consistently 200 bp shorter than those of the marine phages, with nucleotides being lost throughout the gene fragment, yet the amino acid sequences retained many regions of strong conservation. Assuming that marine phages are ancestral (21), it appears that the enterophage pol genes have become more streamlined.
The amino acid sequence alignments showed multiple regions of conservation, as well as regions that were confined to specific groups. Phylogenetic analysis using both Bayesian (Fig. ) and maximum-likelihood algorithms produced similar trees, with several branches clustering with known cultured representatives, including cyanophage P60, Syn5, and SCBP-1, as well as roseophage SIO1, but distinct from the T7 enterophages. Most sequences fell into the clusters ENV1 and ENV2, which are not closely related to cultured representatives. HECTOR and PARIS are PCR products obtained with nondegenerate primers that amplify a shorter part of the DNA polymerase and are representatives of podovirus sequences found in a number of environments (4); they resolve as a sister clade to the group containing roseophage SIO1 (Fig. ). None of the HECTOR and PARIS sequences fell into other groups, and none of the sequences from this study fell into the PARIS and HECTOR groups.
FIG. 3. Maximum-likelihood and Bayesian (1,000,000 MCMC generations; 25% burn-in) analyses of DNA pol inferred amino acid sequence alignments. Sequences from the major cultured representatives from each group are underlined. Sequences from the Global ...
A BLAST search of the CAMERA database (35) recovered many family A podovirus DNA pol sequences. The CAMERA database contains the metagenomic data from the Global Ocean Survey, a study of 23 sites along a track from the northwest Atlantic through the eastern tropical Pacific (51) that contains primarily bacterial sequences, but also many sequences from viruses (48). Only sequences that included the reverse primer Podo-R2 and had a podovirus as the first hit in GenBank were kept for phylogenetic analysis. This was important, as DNA pol sequences from some bacteria, such as Azorhizobium caulinodans ORS 571, Burkholderia oklahomensis EO147, and Pseudomonas putida KT2440, are very similar to viral DNA pol sequences. For example, there is 43% identity between the targeted fragments of the DNA polymerase of A. caulinodans and that of the cyanophage SCBP-3. However, the top hits for the bacterial sequences were not viral and, when analyzed, clustered in a separate group near the cyanophages. To exclude such bacterial sequences, the samples collected in our study were filtered through 0.2-μm-pore-size filters.
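The two screening criteria applied to the metagenomic sequences (coverage of the Podo-R2 primer region and a podovirus as the top GenBank hit) amount to a simple filter, sketched below. The record layout, the field names, and the idea that each hit already carries a podovirus flag from a separate taxonomy lookup are all assumptions made for illustration; the text does not describe the actual data handling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    subject: str        # description of the matched GenBank entry
    bit_score: float    # BLAST bit score of the match
    is_podovirus: bool  # assumed to come from a taxonomy lookup done elsewhere


@dataclass
class Candidate:
    read_id: str
    covers_reverse_primer: bool  # True if the read spans the Podo-R2 region
    hits: List[Hit]


def passes_filter(candidate: Candidate) -> bool:
    """Keep a read only if it covers the reverse primer region and its
    best-scoring hit is a podovirus."""
    if not candidate.covers_reverse_primer or not candidate.hits:
        return False
    top = max(candidate.hits, key=lambda h: h.bit_score)
    return top.is_podovirus


# Hypothetical reads; identifiers and scores are invented.
reads = [
    Candidate("read_1", True, [Hit("cyanophage P60", 310.0, True),
                               Hit("Pseudomonas putida KT2440", 150.0, False)]),
    Candidate("read_2", True, [Hit("Azorhizobium caulinodans", 280.0, False),
                               Hit("cyanophage P60", 200.0, True)]),
    Candidate("read_3", False, [Hit("roseophage SIO1", 400.0, True)]),
]
kept = [r.read_id for r in reads if passes_filter(r)]
```

Requiring the top hit, rather than any hit, to be a podovirus is what excludes reads such as the hypothetical `read_2`, where a bacterial polymerase scores higher than the nearest viral match.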
Phylogenetic analysis showed that none of the Global Ocean Survey sequences fell outside of the groups circumscribed by the sequences recovered using primers. This demonstrates that these primers can be used to target the genetic diversity of Podoviridae family A pol sequences found in marine systems. The environmental sequences established three new groups, ENV1, ENV2, and ENV3, that have no cultured representatives.
This study presents a non-culture-based method using new primers that can be used to explore the diversity, evolution, and distribution of podoviruses. These results demonstrate that the genetic richness of family A DNA pol sequences associated with podoviruses is much greater than previously recognized and that some sequences are most frequently recovered from specific environments. The results also reveal the existence of several groups of distantly related and previously unknown podoviruses. Bringing host-virus systems that include representatives of these groups into culture will be essential to understanding host-virus interactions in the sea.