Male-specific single-stranded RNA (FRNA) coliphages belong to the family Leviviridae. They are classified into two genera (Levivirus and Allolevivirus), which can be subdivided into four genogroups (genogroups I and II in Levivirus and genogroups III and IV in Allolevivirus). Relatively few strains have been completely characterized, and hence, a detailed knowledge of this virus family is lacking. In this study, we sequenced and characterized the complete genomes of 19 FRNA strains (10 Levivirus strains and 9 Allolevivirus strains) and compared them to the 11 complete genome sequences available in GenBank. Nucleotide similarities among strains of Levivirus genogroups I and II were 75% to 99% and 83 to 94%, respectively, whereas similarities among strains of Allolevivirus genogroups III and IV ranged from 70 to 96% and 75 to 95%, respectively. Although genogroup I strain fr and genogroup III strains MX1 and M11 share only 70 to 78% sequence identity with strains in their respective genogroups, phylogenetic analyses of the complete genome and the individual genes suggest that strain fr should be grouped in Levivirus genogroup I and that the MX1 and M11 strains belong in Allolevivirus genogroup III. Strains within each genus share >50% sequence identity, whereas between the two genera, strains have <40% nucleotide sequence identity. Overall, amino acid composition, nucleotide similarities, and replicase catalytic domain location contributed to phylogenetic assignments. A conserved eight-nucleotide signature at the 3′ end of the genome distinguishes leviviruses (5′ ACCACCCA 3′) from alloleviviruses (5′ TCCTCCCA 3′).
Male-specific RNA (FRNA) coliphages are single-stranded RNA (ssRNA) viruses that are found throughout the world in bacteria associated with sewage and mammalian feces (12). They possess a positive-sense genome ranging from 3.8 to 4.2 kb in size enclosed by a nonenveloped 26-nm icosahedral capsid (5). The natural host range is restricted to gram-negative bacteria (17) expressing a male factor (F+, Hfr, or F′) (44). For successful infection, the host must possess a fertility (F) sex pilus, encoded on the F plasmid of Escherichia coli (28) or by the chromosomal marker Hfr (6), because infection occurs by attachment to this receptor site (7).
FRNA phages belong to the family Leviviridae, which is further subdivided into two genera (Levivirus and Allolevivirus). Levivirus is subdivided into genogroups I and II, and Allolevivirus is subdivided into genogroups III and IV. Historically, separation into subgroups was based on serological properties (33), sedimentation, density, and molecular weight (36). Recently, genomic data have provided an additional subgrouping tool (41).
Based on a limited number of completely sequenced FRNA genomes, four genes were identified (4). These genes code for an assembly or maturation protein, capsid protein, lysis protein, and replicase protein in the leviviruses, whereas the lysis protein is replaced by a read-through protein in alloleviviruses. Nonstructural and structural proteins are encoded by the Leviviridae viral genome (5). Each FRNA virion contains 1 copy of positive-sense ssRNA, 180 copies of the capsid or coat protein, 1 copy of the assembly or maturation protein, and, in the alloleviviruses, approximately 15 copies of the read-through protein (38, 39, 42).
In this study, the complete genomes of 19 FRNA strains representing the four known genogroups were sequenced and compared to 11 FRNA sequences available in the National Center for Biotechnology Information (NCBI) GenBank, for a total analysis of 30 FRNA genomes. Phylogenetic profiles, nucleotide sequence similarity, amino acid compositions, open reading frame (ORF) positions, and subsequent gene locations were compared. The results of this study will contribute to a better understanding of the ecology of FRNA coliphages as well as provide a more substantial genetic database to design molecular FRNA detection and identification methods.
FRNA prototype strains used in this study, MS2, GA, Qβ, FI, and SP, were kindly provided by K. Furuse (Tokai University, Japan). Prototype strain fr was provided by A. Boehm (Stanford University, Stanford, CA), and prototype strains MX1 and M11 were obtained from the University of North Carolina, Chapel Hill, collection. FRNA strains ST4, TW18, VK, and BZ1 were a gift from J. van Duin (Leiden University, The Netherlands). Field-collected strains BR1, BR8, and BR12 were generously provided by Brian Robinson (NOAA, Charleston, SC), and strain R17 was obtained from the Felix D'Herelle Reference Centre for Bacterial Viruses, Université Laval, Quebec, Canada. Additional field strains DL1, DL2, DL13, DL16, J20, T72, DL10, DL20, HL4-9, HB-P22, and HB-P24, collected from wastewater, surface waters, swine lagoons, and chicken litter, were used in this study (11). Preliminary subgrouping of all 19 strains was conducted by reverse line blot hybridization (41).
Each strain was plaque purified and further enriched using Escherichia coli HS(pFamp)R as the host (41). Aliquots of approximately 1 to 2 ml of the purified viral supernatant were frozen at −75°C.
Coliphage RNA was extracted from purified virus as described by Stewart et al. (32) by using a QIAamp viral RNA minikit (Qiagen, Valencia, CA). Purified RNA was stored frozen at −20°C.
For cDNA synthesis, strain MS2 was used as the positive control. First, viral RNA was 3′ polyadenylated with yeast poly(A) polymerase (USB, Inc., Cleveland, OH) and 25 mM ATP in a 50-μl reaction volume (USB). The 50-μl reaction volume was prepared with 10 μl of 5× poly(A) polymerase reaction buffer, 10 μl RNA, 2 μl of 25 mM ATP, 0.7 μl of 600 U poly(A) polymerase, and 27.3 μl nuclease-free water. The mixture was incubated at 37°C for 5 min and then placed on ice for enzymatic termination. Polyadenylated RNA was either immediately frozen or used as a template for cDNA synthesis.
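The component volumes listed above sum to the stated 50-μl reaction. The short Python sketch below is a hypothetical illustration and is not part of the published protocol. It checks that total and scales the shared (non-template) components into a master mix for several samples. The component labels and the 10% pipetting overage are assumptions.

```python
from typing import Dict, Iterable

# Per-reaction volumes (ul) for the 50-ul poly(A) tailing reaction described in the text.
POLYA_RECIPE_UL: Dict[str, float] = {
    "5x poly(A) polymerase buffer": 10.0,
    "viral RNA": 10.0,
    "25 mM ATP": 2.0,
    "poly(A) polymerase (600 U)": 0.7,
    "nuclease-free water": 27.3,
}


def total_volume(recipe: Dict[str, float]) -> float:
    """Sum component volumes, rounding away floating-point noise."""
    return round(sum(recipe.values()), 3)


def scale_master_mix(recipe: Dict[str, float], n_reactions: int,
                     overage: float = 0.10,
                     exclude: Iterable[str] = ("viral RNA",)) -> Dict[str, float]:
    """Scale shared components for n reactions plus a fractional overage.

    The template is excluded because it is added per sample; the 10% overage
    is an illustrative assumption, not part of the published protocol.
    """
    excluded = set(exclude)
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2)
            for name, vol in recipe.items() if name not in excluded}


if __name__ == "__main__":
    print(total_volume(POLYA_RECIPE_UL))  # 50.0
    for name, vol in scale_master_mix(POLYA_RECIPE_UL, n_reactions=4).items():
        print(f"{name}: {vol} ul")
```

In this sketch, the template RNA is left out of the master mix because it differs for each strain.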
Full-length cDNA was prepared using an oligo(dT) reverse primer supplied with the reverse transcriptase MonsterScript 1st Strand cDNA synthesis kit (Epicentre, Madison, WI) as outlined by the manufacturer. The single-stranded cDNA was used as a template for the PCR. To verify the successful generation of full-length cDNA, amplification of a small region of the 5′ end of strain MS2 was used as a positive control (19).
To amplify the 1-kb region between the replicase gene and the 3′ end of the genome, strain-specific forward primers were designed based on a 200-nucleotide (nt) region of the replicase gene (41) and utilized along with an oligo(dT) reverse primer. To amplify the upstream region of the genome, reverse primers were designed based on the replicase gene sequence of each strain and forward primers were designed based on available FRNA coliphage sequences in GenBank. As sequences were generated (Sequetech, Mountain View, CA), reverse primers were designed to amplify overlapping sections of the genome. The majority of the genome was sequenced by primer walking.
The nucleotide sequence of the 5′ region was determined by rapid amplification of cDNA ends by using a SMART RACE cDNA amplification kit (Clontech, Mountain View, CA) with only minor modifications. First-strand cDNA synthesis was set up on ice in a 250-μl thin-walled PCR tube by combining 3 μl RNA, 1 μl of 10 μM gene-specific reverse primer, and 1 μl SMART oligonucleotide. The 5-μl reaction volume was briefly centrifuged, and the following components were added: 2 μl of 5× First Strand buffer (Invitrogen, Carlsbad, CA), 1 μl of 20 mM dithiothreitol, 1 μl of 10 mM deoxynucleoside triphosphate, and 1 μl SuperScript II (Invitrogen, Carlsbad, CA). Following a brief centrifugation, the mixture was incubated for 90 min at 42°C. To dilute the first-strand cDNA, 20 μl of Tricine-EDTA buffer was added, and the mixture was heated for 7 min at 72°C. The reaction generated double-stranded cDNA. The cDNA was frozen at −20°C and used for subsequent PCRs. In all experiments, a spectrophotometer (NanoDrop Technologies, Wilmington, DE) was used to determine nucleic acid concentrations.
The cDNA was amplified by using Phusion DNA polymerase (New England Biolabs, Ipswich, MA) in a master mix containing 10 μl of 5× Phusion buffer, 0.2 mM deoxynucleoside triphosphate, 1 μl of 10 μM forward primer, 1 μl of 10 μM reverse primer, 3% dimethyl sulfoxide, 2 μl cDNA, and 0.5 μl Phusion DNA polymerase in a 50-μl reaction volume. The cycling parameters were one cycle of denaturation at 98°C (1 min), followed by 35 cycles of 98°C (30 s), 48°C (1 min), and 72°C (3 min), and a final 10-min extension at 72°C. For each reaction, positive controls were prepared using primers MJV82 and JV81 for leviviruses and MJV82 and JV41 for alloleviviruses (41). A no-template negative control was included.
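The following Python sketch writes the cycling program out as data and sums the time spent at each set temperature. Ramp times depend on the instrument and are ignored, so the totals are lower bounds. For comparison, the sketch also includes the whole-cell colony PCR program used to screen TOPO clones (98°C for 3 min; 35 cycles of 98°C for 10 s, 57°C for 30 s, and 72°C for 30 s; 72°C for 10 min). The `Step` and `Stage` structures are a hypothetical representation and do not correspond to any thermocycler file format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    temp_c: float   # set temperature (deg C)
    seconds: int    # hold time at that temperature


@dataclass(frozen=True)
class Stage:
    cycles: int
    steps: Tuple[Step, ...]


# Phusion amplification: 98 C 1 min; 35 x (98 C 30 s, 48 C 1 min, 72 C 3 min); 72 C 10 min.
AMPLIFICATION = (
    Stage(1, (Step(98, 60),)),
    Stage(35, (Step(98, 30), Step(48, 60), Step(72, 180))),
    Stage(1, (Step(72, 600),)),
)

# Whole-cell colony PCR screen: 98 C 3 min; 35 x (98 C 10 s, 57 C 30 s, 72 C 30 s); 72 C 10 min.
COLONY_SCREEN = (
    Stage(1, (Step(98, 180),)),
    Stage(35, (Step(98, 10), Step(57, 30), Step(72, 30))),
    Stage(1, (Step(72, 600),)),
)


def hold_time_minutes(program) -> float:
    """Total time at set temperatures, in minutes; ramping between steps is ignored."""
    seconds = sum(stage.cycles * sum(step.seconds for step in stage.steps)
                  for stage in program)
    return seconds / 60


if __name__ == "__main__":
    print(hold_time_minutes(AMPLIFICATION))  # 168.5
    print(round(hold_time_minutes(COLONY_SCREEN), 1))
```

Under these assumptions, the amplification program holds for 60 s + 35 × 270 s + 600 s = 10,110 s, or 168.5 min, before ramping is added.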
PCR products were separated by electrophoresis in a 1.5% agarose gel, stained with SYBR Gold nucleic acid gel stain (Molecular Probes, Carlsbad, CA), and visualized under blue light (Dark Reader transilluminators; Clare Chemical Research, Dolores, CO).
Blunt-end PCR products were excised using a gel extraction tool (USA Scientific Plastics, Ocala, FL) and purified according to the instructions of the manufacturer (QuickClean 5 M gel extraction kit; GenScript Corporation, Piscataway, NJ).
Gel-purified DNA was cloned using a Zero Blunt TOPO PCR cloning kit (Invitrogen, Carlsbad, CA). Colonies of transformed E. coli cells were screened for positive inserts by using whole-cell PCR and Phusion DNA polymerase as described above with the following cycle modifications: one cycle of denaturation at 98°C (3 min) followed by 35 cycles at 98°C (10 s), 57°C (30 s), and 72°C (30 s) followed by a 10-min extension at 72°C. Amplicons were separated by electrophoresis in 1.5% agarose gel in 0.5× Tris-acetate-EDTA, stained with 20 μg/ml ethidium bromide, and visualized under UV light (UVP, Upland, CA). Clones with the appropriate-size PCR amplicon were selected for plasmid purification (QIAprep Spin Miniprep kit; Qiagen, Valencia, CA).
Each PCR amplicon was cloned, and three to five clones were sequenced. PCR products from reactions from 5′ rapid amplification of cDNA ends were sequenced directly. To achieve publication-quality sequence data, both forward and reverse strands were sequenced (Sequetech, Mountain View, CA).
To avoid contamination, master mixes were prepared in a PCR hood (AirClean 600; AirClean Systems, Raleigh, NC) located in a designated clean room. PCR amplification, electrophoresis, and template and/or viral preparations (35) were each conducted in a separate room dedicated to that use.
Raw sequences from three to five individual clones were imported and aligned using BioEdit v7.0.1 (14) followed by Basic Local Alignment Search Tool (BLAST; National Center for Biotechnology Information) analyses for sequence and phylogenetic confirmations. Full-length sequences from all strains were aligned with those of prototype strains (GenBank) by using ClustalW.
Similarity analyses were evaluated using SimPlot version 3.5.1 (18). The percent similarity was calculated within a sliding window of 200 bp with a step size of 20 bp between plots.
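SimPlot's exact calculation depends on settings that are not reported here, such as the distance model and gap handling. The Python function below is only a sketch under stated assumptions. It computes the uncorrected fraction of identical sites between two pre-aligned sequences in a 200-bp window advanced in 20-bp steps. Alignment columns in which either sequence has a gap are skipped.

```python
def window_identity(seq_a, seq_b, window=200, step=20):
    """Return (window midpoint, fraction identical) pairs for two aligned sequences.

    Positions are 0-based alignment columns. Columns with a gap ('-') in
    either sequence are skipped, and windows with no comparable columns are
    omitted; both choices are assumptions of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(seq_a) - window + 1, step):
        pairs = [(a, b) for a, b in zip(seq_a[start:start + window].upper(),
                                        seq_b[start:start + window].upper())
                 if a != "-" and b != "-"]
        if not pairs:
            continue  # nothing comparable in this window
        identical = sum(a == b for a, b in pairs)
        results.append((start + window // 2, identical / len(pairs)))
    return results
```

Multiplying each fraction by 100 gives the percent similarity plotted for that window.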
Deduced amino acid sequences corresponding to each of the four genes were determined using the ExPASy DNA-to-protein translation tool (http://ca.expasy.org/). Predicted protein sequence motifs were identified by PROSITE (http://ca.expasy.org/prosite/), and protein families and domains were modeled in Pfam (http://pfam.janelia.org). Genetic distance was calculated for each protein within the respective genogroup by ClustalW alignment and neighbor-joining analysis.
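For readers who want to reproduce the translation step offline, the Python sketch below translates an ORF with the standard genetic code. It is not the ExPASy implementation. It assumes an unambiguous RNA or DNA sequence that begins at the start codon. Because GUG and UUG start codons are reported for some strains, the sketch accepts AUG, GUG, or UUG and translates the initiating codon as methionine.

```python
BASES = "UCAG"
# Standard genetic code; codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
START_CODONS = {"AUG", "GUG", "UUG"}


def translate_orf(seq):
    """Translate from the first codon up to (not including) the first stop codon."""
    rna = seq.upper().replace("T", "U")
    first = rna[:3]
    if first not in START_CODONS:
        raise ValueError(f"{first!r} is not an AUG, GUG, or UUG start codon")
    protein = ["M"]  # an initiating GUG or UUG is also read as (f)Met
    for pos in range(3, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, `translate_orf("AUGAACCCAGCAUAUUGA")` returns `"MNPAY"`.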
Sequence data were analyzed using BioNumerics software, version 3.5 (Applied Maths, Saint-Martens-Latem, Belgium). Phylogenetic trees were built by global cluster analysis performed on multiple aligned sequences and clustered by unweighted-pair group method using arithmetic averages. A bootstrap analysis, based on 10,000 substitutions, was used to measure cluster significance. The reliability of each cluster was expressed on a percentage basis.
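The UPGMA step can be illustrated with a minimal Python sketch. This is generic textbook UPGMA, not the BioNumerics implementation. It takes a symmetric distance matrix and repeatedly merges the closest pair of clusters. After each merge, it recomputes distances as size-weighted averages. The result is a Newick topology without branch lengths, and bootstrap resampling is not included.

```python
def upgma(names, dist):
    """Return a UPGMA topology in Newick format (no branch lengths).

    names: taxon labels; dist: symmetric matrix (list of lists) of pairwise distances.
    """
    # Active clusters: id -> (Newick label, number of taxa in the cluster).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), _ = min(d.items(), key=lambda item: item[1])
        label_a, size_a = clusters.pop(a)
        label_b, size_b = clusters.pop(b)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two merged clusters' distances.
        new_d = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {key: value for key, value in d.items() if a not in key and b not in key}
        d.update(new_d)
        clusters[next_id] = (f"({label_a},{label_b})", size_a + size_b)
        next_id += 1
    (newick, _), = clusters.values()
    return newick + ";"
```

Because new cluster ids always exceed existing ids, every stored key remains ordered as (smaller id, larger id).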
The accession numbers of some full-length Leviviridae sequences (for genogroup I, NC_001417 [MS2], AF195778 [M12], and X15031 [fr]; for genogroup II, NC_001426 [GA] and AF227250 [KU1]; for genogroup III, AF052431 [M11], AY099114 [Qβ], and AF059242 [MX1]; and for genogroup IV, X07489 [SP], AF059243 [NL95], and EF068134 [FI]) and a partial sequence of genogroup II strain TL2 (AB218927) were available in GenBank.
The GenBank accession numbers generated in this study are as follows: for genogroup I, EF107159 (DL1), EF108464 (DL16), EF108465 (R17), EF204939 (J20), and EF204940 (ST4); for genogroup II, FJ483837 (DL10), FJ483838 (T72), and FJ483839 (DL20); for genogroup III, FJ483840 (TW18), FJ483841 (HL4-9), FJ483842 (BR12), FJ483843 (VK), and FJ483844 (BZ1); and for genogroup IV, FJ539132 (HB-P22), FJ539133 (HB-P24), FJ539134 (BR1), and FJ539135 (BR8).
Full-length genome sequences of 19 FRNA strains were determined in this study and compared to 11 strains previously published in GenBank (Table 1).
Seven genogroup I strains (DL1, DL2, DL13, DL16, ST4, R17, and J20) were sequenced and compared to genogroup I prototype strains MS2, M12, and fr (Table 2). Genogroup I strains DL2 and DL13 were omitted from Table 2, as they were >99% identical, differing by only 4 nt from the DL16 genome. MS2 and ST4 were 98.7% similar to each other. Sequence similarity among genogroup I strains ranged from 75.3 to 98.7%, with strain fr forming a separate subgroup (Table 2; Fig. 1).
Sequences of three genogroup II strains (DL10, DL20, and T72) were compared to the sequences of genogroup II prototype strains GA and KU1. Among genogroup II strains, nucleotide sequence similarities ranged from 83.3 to 93.8%, with strains DL10, DL20, and GA having the highest sequence identities (93.4 to 93.7%), whereas strains T72 and KU1 formed a separate subcluster (Table 2; Fig. 1). Strains in genogroup I had only 50% sequence similarity (range of 46.7 to 53.9%) with strains in genogroup II (Table 2; Fig. 2A). Interestingly, all Levivirus strains (genogroups I and II) had an identical 8-nt sequence at the 3′ terminus, 5′ ACCACCCA 3′ (Table 3).
Allolevivirus genogroup III strains formed two different subclusters (Fig. 1). Strains VK, HL4-9, BR12, BZ1, TW18, and Qβ, having nucleotide sequence similarities ranging from 91.9 to 95.7% (Table 2), formed the first subcluster (Fig. 1). The second subcluster was formed with genogroup III strains MX1 and M11, as these strains shared 87% nucleotide similarity. The nucleotide similarities of strains between the two genogroup III subclusters ranged from 69.8 to 71.3% (Table 2). Genogroup III strains shared <40% sequence identity (29.7 to 39.1%) with Levivirus genogroups I and II (Table 2). All Allolevivirus strains had an identical 3′ terminus signature sequence of 5′ TCCTCCCA 3′ (Table 3).
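Because the two 3′ signatures differ, the terminal 8 nt alone can sort a complete genome by genus. The Python sketch below does this. It assumes the sequence ends exactly at the viral 3′ terminus, with no poly(A) tail added during library preparation and no vector sequence. Any other ending is reported as unassigned.

```python
LEVIVIRUS_3PRIME = "ACCACCCA"
ALLOLEVIVIRUS_3PRIME = "TCCTCCCA"


def genus_from_3prime_end(genome):
    """Assign a genus from the terminal 8 nt of a sense-strand sequence.

    Accepts RNA (U) or DNA (T) alphabets; the comparison uses the DNA form.
    """
    tail = genome.strip().upper().replace("U", "T")[-8:]
    if tail == LEVIVIRUS_3PRIME:
        return "Levivirus"
    if tail == ALLOLEVIVIRUS_3PRIME:
        return "Allolevivirus"
    return "unassigned"
```

A partial GenBank entry or a sequence with residual poly(A) would return "unassigned" rather than a misleading genus call.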
Sequences of genogroup IV strains BR1, BR8, HB-P22, and HB-P24 were compared to the sequences of genogroup IV prototype strains SP, FI, and NL95. Genogroup IV Allolevivirus strains shared nucleotide sequence identities ranging from 74.9 to 95.0%, with the closest identities being 95.0% between strains BR8 and BR1 (Table 2). Strain HB-P24 shared 90.2% nucleotide identity with prototype strain NL95, whereas strains BR8 and BR1 grouped with prototype strain SP (91.1 to 91.7%). In contrast, strain FI formed a unique phylogenetic subcluster (Fig. 1). Genogroup IV nucleotide sequence identity was 53.5 to 57.9% with Allolevivirus genogroup III (Table 2; Fig. 2B) and <40% (31.8 to 38.7%) with Levivirus genogroups I and II.
In Levivirus genogroup I, the ORF start and stop codons were located at nucleotide positions identical or very similar to those previously reported for strain MS2 (9). With the exceptions of strain DL1 and fr, all MS2-like strains (DL1, DL2, DL13, DL16, J20, ST4, R17, and M12) had their four ORFs located at similar nucleotide positions (Table 1). AUG was found to be the start codon for all four ORFs within genogroup I with two exceptions: (i) MS2, ST4, and fr had GUG as the start codon for ORF1, and (ii) the strain fr lysis gene start codon was UUG (Table 4).
The nucleotide start and stop positions of genogroup II genes of strains T72 and KU1 were similar to each other, whereas the start and stop positions of strains DL10 and DL20 were similar to those of strain GA (Tables 1 and 4). The start codon for all ORFs in genogroup II strains GA, DL10, and DL20 was AUG (Table 4). The lysis gene start codon in strains T72 and KU1 was UUG (Table 4). In addition, strains T72 and KU1 carried an 18-nt insertion between the capsid gene stop codon (ORF2) and the lysis gene start codon (ORF3) that was absent in the other genogroup II strains. In contrast, genogroup II strains GA, DL10, and DL20 had a translationally coupled stop-start codon (UAAUG) at the start of the lysis gene.
The Allolevivirus genome possesses four genes and three start codons, as the capsid and read-through genes share a single ORF (ORF2/ORF3) (Table 4). The ORF alignment positions for all genogroup III strains except MX1 and M11 were very similar if not identical. Although ORFs of strain Qβ aligned perfectly with those of the other genogroup III strains, the GenBank-acquired Qβ sequences were not complete. Thus, individually mapped ORF positions varied slightly (Table 1). ORF and Shine-Dalgarno positions of the assembly, capsid, and read-through genes of MX1 and M11 were similar to those of the other genogroup III strains, but the replicase gene ORF and Shine-Dalgarno positions differed (Table 4).
In prokaryotes, the Shine-Dalgarno core consensus sequence (GGAGG) or slight variations of the core sequence are located upstream from the ORF start codon (20, 30). Since variations of the core Shine-Dalgarno sequences were observed in this study, spacing between the start codon and Shine-Dalgarno sequence was defined as the number of nucleotides between the last base of the Shine-Dalgarno sequence and the first base of the start codon. For Levivirus genogroups I and II, Shine-Dalgarno sequences were located 4 to 8 nt upstream from ORF1 and ORF2, 8 to 15 nt upstream from ORF3, and 6 nt upstream from ORF4. Shine-Dalgarno sequences for Allolevivirus genogroups III and IV were located 4 or 5 nt upstream from ORF1, 11 nt upstream from ORF2/ORF3, and 5 to 8 nt upstream from ORF4 (Table 4).
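The spacing definition above can be written as a small Python function. This is a sketch under stated assumptions:

- It matches a single exact motif (GGAGG by default) rather than the core variants observed in this study.
- It searches only a hypothetical 20-nt window upstream of the start codon.
- It reports the occurrence closest to the start codon.

```python
def sd_spacing(seq, start_codon_pos, sd_motif="GGAGG", search_window=20):
    """Nucleotides between the last Shine-Dalgarno base and the first start-codon base.

    seq: sense-strand sequence (RNA or DNA alphabet); start_codon_pos: 0-based
    index of the start codon's first base. Returns None if the motif is absent
    from the searched upstream window (window size is an assumption).
    """
    seq = seq.upper().replace("U", "T")
    motif = sd_motif.upper().replace("U", "T")
    upstream_start = max(0, start_codon_pos - search_window)
    region = seq[upstream_start:start_codon_pos]
    hit = region.rfind(motif)  # closest occurrence to the start codon
    if hit == -1:
        return None
    sd_end = upstream_start + hit + len(motif)  # index just past the last SD base
    return start_codon_pos - sd_end
```

For the sequence "AAAGGAGGAAAAAAATGGCT", with the start codon at index 14, the function returns 6.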
The maturation/assembly proteins in genogroups I and II were 393 and 390 amino acids in length, respectively, and the Levivirus capsid protein was 130 amino acids in length (Table 1). The lysis protein of strain fr consisted of 71 amino acids, whereas the remaining genogroup I strains had a lysis protein 75 amino acids in length. An amino acid deletion was observed in the genogroup II lysis protein of strains DL10, DL20, TL2, and GA relative to T72 and KU1 (Table 1). The YGDD motif conserved in ssRNA virus replicases was located in the replicase protein, and the lengths of these proteins in genogroups I and II were 545 and 532 amino acids, respectively.
Pfam assigns a query protein to one or more protein families and shows the species distribution of those families (10). Levivirus capsid protein sequences were placed into the “Levi_coat” domain, which included all Leviviridae strains from genogroups I, II, III, and IV and bacteriophage PRR1 in the Pfam species tree. Pfam searches of genogroup I and II replicase proteins returned phages from the Leviviridae family plus bacteriophages PRR1, ZR, and BO1, as well as Acinetobacter phage AP205. The replicase protein was placed into the “RNA replicase, beta-chain” domain. The genogroup I lysis protein was not assigned to a family or domain in a PfamA search. A subsequent PfamB search linked the genogroup I lysis protein to a lysis domain, and the results matched Levivirus genogroup I strains fr, M12, MS2, and JP501. A PfamA maturation protein search generated the “phage_mat-A” domain along with a Pfam species tree including all Leviviridae strains plus three additional bacteriophages, PRR1, PP7, and AP205.
Several predicted protein motifs occurred frequently in the FRNA coliphages: casein kinase II phosphorylation, cyclic AMP (cAMP)- and cGMP-dependent protein kinase phosphorylation, protein kinase C phosphorylation, N myristoylation, N glycosylation, and tyrosine kinase phosphorylation. Unique to strain fr were a leucine zipper in the lysis protein and an amidation motif in the replicase. The RNA-dependent RNA polymerase catalytic domain of the replicase occurred at amino acid positions 243 to 373 and 245 to 375 for genogroups I and II, respectively. Common to every genogroup II strain was a prenyl group binding site (CAAX box) at amino acid positions 529 to 532 in the replicase.
Excluding those of strain fr, genogroup I amino acid sequences were highly conserved, as the genetic distances were small (data not shown). The capsid protein was the most conserved (distance of 0.0000 to 0.0411), followed by the maturation protein (0.0046 to 0.0889), replicase (0.0033 to 0.0887), and lysis protein (0.0000 to 0.3416). The capsid protein was identical among strains DL1, DL2, DL13, DL16, and J20. As shown by the distance values (0.2316 to 0.5685), the amino acid sequences of all four proteins in strain fr were not conserved relative to those of the other genogroup I strains.
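The genetic distances above come from ClustalW alignments and neighbor-joining analysis, and the distance correction settings used are not stated. For orientation only, the Python sketch below computes the simplest protein distance, the uncorrected p-distance (the proportion of differing sites), from two pre-aligned sequences with gap columns excluded. It will not necessarily reproduce the values reported here.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Uncorrected p-distance between two aligned protein sequences.

    Columns with a gap in either sequence are excluded (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
                if a != gap and b != gap]
    if not compared:
        raise ValueError("no gap-free columns to compare")
    mismatches = sum(a != b for a, b in compared)
    return mismatches / len(compared)
```

A distance of 0.0000, as reported for the identical capsid proteins, corresponds to no differing sites among the compared columns.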
In genogroup II strains, the capsid protein was the most conserved, with a genetic distance of 0.0135 to 0.1116, followed by the lysis (0.0000 to 0.1521), replicase (0.0415 to 0.2160), and maturation (0.0235 to 0.2036) proteins.
The lengths of the maturation proteins of genogroups III and IV ranged from 420 to 450 amino acids (Table 1). The genogroup IV maturation protein in strains HB-P22, HB-P24, NL95, and FI had a nine-amino-acid deletion compared to those of the other genogroup IV strains. The capsid proteins were 133 and 132 amino acids in length for genogroups III and IV, respectively. Read-through proteins were 328 to 329 and 329 to 332 amino acids in length for genogroups III and IV, respectively. The replicase was 576 to 592 amino acids in length and contained the YGDD sequence.
Similar to the case for Levivirus, an Allolevivirus maturation protein search generated the “phage_mat-A” domain, which matched all four genogroups of Leviviridae phages described in this study plus the non-FRNA bacteriophage strains PRR1, PP7, and AP205. The PfamA capsid protein search resulted in the family “Levi_coat” and matched all four genogroups in the Leviviridae family, along with FRNA phages ZR, TH1, TL2, SD, f2, and BO1 plus the Pseudomonas bacteriophage PRR1. Non-FRNA strains PP7 and AP205 were not detected in the capsid search results.
Read-through proteins were grouped as “A1-protein coat readthrough” with PfamB, generating a five-member FRNA strain match of SP, Qβ, NL95, MX1, and M11. As with the Levivirus protein, the Allolevivirus replicase protein was sorted into the “RNA replicase, beta-chain” family, which included Leviviridae strains of all four genogroups with additional FRNA strains ZR and BO1 plus non-FRNA bacteriophages PRR1, PP7, and AP205.
As observed for the Levivirus genus, the most prevalent protein motifs for Allolevivirus were casein kinase II phosphorylation, cAMP- and cGMP-dependent protein kinase phosphorylation, protein kinase C phosphorylation, N myristoylation, N glycosylation, and tyrosine kinase phosphorylation motifs. With the exception of genogroup III strains MX1 and M11 and genogroup IV strain HB-P24, a cell attachment motif (RGD) was present in the maturation protein. Genogroup IV strains SP, BR8, BR1, and HB-P22 had an additional cell attachment motif in the read-through protein.
The catalytic domain of the RNA-dependent RNA polymerase (replicase protein) was located at amino acid positions 262 to 394 in genogroup III strains, with the exception of strains M11 and MX1. The M11 and MX1 catalytic domain was located at amino acid positions 259 to 391. The catalytic domain of genogroup IV was located at amino acid positions 259 to 391.
Among genogroup III proteins, the capsid was the most conserved (genetic distance, 0.0000 to 0.3734; data not shown), followed by the read-through protein (0.0444 to 0.5128) and the replicase (0.0278 to 0.6571), with the greatest genetic distance occurring in the maturation protein (0.0347 to 0.8289). Strains BR12 and VK shared identical capsid proteins (distance of 0.0000).
In genogroup IV, the capsid protein was the most conserved (distance, 0.0535 to 0.2569), followed by the replicase (0.0474 to 0.3382) and the read-through protein (0.0555 to 0.5072). The greatest genetic distance was observed in the maturation protein (0.0607 to 0.5646).
Phylogenetic trees from nucleotide sequences for genogroup I strains clustered into two branches, one branch with nine strains clustered as MS2-like and a second branch with strain fr (Fig. 1). For genogroup II nucleotide sequences, strains KU1 and T72 formed one branch and strains DL10, DL20, and GA formed a second branch. Genogroup III nucleotide sequences clustered into two branches, one branch containing MX1 and M11 and a second branch with Qβ-like strains BR12, VK, BZ1, HL4-9, TW18, and prototype Qβ. Nucleotide sequence analysis revealed three branches in genogroup IV strains as follows: (i) HB-P24, HB-P22, and prototype NL95, (ii) BR1, BR8, and prototype SP, and (iii) prototype FI.
Overall, phylogenetic trees from amino acid sequences of individual proteins were similar to nucleotide phylogenetic clustering (Fig. 3). For example, each genogroup III protein formed two branches, (i) MX1 and M11 and (ii) Qβ-like (six strains). Genogroup IV generated three or four subclusters for each individual protein tree.
Nucleotide SimPlot analysis for all genogroup I and II strains showed that the replicase genes were most similar (approximately 0.5 to 0.6, or 50 to 60% similarity), whereas the assembly or maturation genes were the most dissimilar (<10%) (Fig. 2A). When full-length nucleotide sequences were compared between genogroups III and IV, SimPlot graphs showed similar regions in the capsid (approximately 60%) and the 5′ portion of the replicase (approximately 60%) but increased dissimilarity in the assembly and 3′ region of the replicase (Fig. 2B).
We report the characterization of 30 complete FRNA genomes, 19 of which were newly sequenced in this study. Phylogenetic analyses confirmed that the family Leviviridae contains two genera, each with two distinct genogroups. In some cases, the genetic similarity of FRNA strains collected from different continents was greater than 90% within each genogroup, which suggests that geography does not play a significant role in sequence variability (37). The sequences showed great uniformity (i.e., protein length, ORF positions, and replicase catalytic domain) throughout the Leviviridae family (15). For example, the number of amino acids in the capsid or coat protein is highly conserved, ranging from 130 to 133 amino acids. This indicates that the size of this protein is apparently constant in Leviviridae phages (15) and critical for proper capsid configuration (25).
Unlike the case for genogroup II, the genogroup I lysis gene initiation codon was embedded at the 3′ end of the capsid gene. In both genogroups I and II, the lysis gene was out of frame with the capsid gene, and its termination codon was located in the 5′ region of the replicase gene. Likewise, the replicase (ORF4) initiation codon was embedded in the lysis gene and was read in a different frame from the lysis gene.
Notably, the genogroup II lysis gene was unique within the Leviviridae. Genogroup II strains DL10, DL20, and GA had an overlapping, translationally coupled stop-start codon (UAAUG) (16) at the lysis gene start, which was not observed in other Leviviridae strains. In this case, the out-of-frame lysis start codon was coupled to the capsid stop codon. In strain T72, by contrast, the coupled stop-start codon was disrupted between the A and U by an 18-nt insertion, leaving 17 nt between the capsid stop codon and the out-of-frame lysis start codon. The T72 sequence matched an earlier observation that strain KU1 also carries this 18-nt insert (13). In addition, the UUG lysis start codon (13) was unique to strains T72 and KU1. UUG has been reported to serve as an alternative start codon at a frequency of approximately 3% in prokaryotes (3). Despite the differences in the genogroup II lysis start codon, the translated lysis protein was conserved (93.8 to 100% amino acid similarity). This supports the earlier finding that start codon variability is not involved in gene regulation (13).
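The coupled (UAAUG) and insertion-separated arrangements can be distinguished by positional arithmetic, as in the hypothetical Python sketch below. Positions are 0-based indices of the first base of each codon. In a coupled stop-start, the two codons share one nucleotide, so the start codon begins at the last base of the stop codon. A frame offset of 0 would mean the two codons are in the same frame. The test sequences are synthetic, not the GA or T72 sequences.

```python
def stop_start_relation(seq, stop_pos, start_pos):
    """Describe how a start codon is positioned relative to an upstream stop codon."""
    rna = seq.upper().replace("T", "U")
    return {
        "stop": rna[stop_pos:stop_pos + 3],
        "start": rna[start_pos:start_pos + 3],
        # True for UAAUG-type coupling, where the codons share one base.
        "coupled_overlap": start_pos == stop_pos + 2,
        # Nucleotides between the stop codon's last base and the start codon's first base
        # (negative when the codons overlap).
        "gap_nt": start_pos - (stop_pos + 3),
        # 0 means same reading frame; 1 or 2 means out of frame.
        "frame_offset": (start_pos - stop_pos) % 3,
    }
```

With a synthetic UAA followed by 17 nt and a UUG codon, the sketch reports a 17-nt gap and an out-of-frame start. This matches the arrangement described for strain T72.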
All 30 FRNA strains as well as bacteriophages PP7, AP205, and PRR1 contained the replicase YGDD motif, which is conserved among all positive-sense ssRNA viruses (27). Interestingly, all Leviviridae genomic sequences possessed a conserved 8-nt signature at the 3′ end of the genome, which distinguishes alloleviviruses (5′ TCCTCCCA 3′) from leviviruses (5′ ACCACCCA 3′). This observation confirms and extends previous reports on a CCCA stretch at the 3′ end of the genome (15).
The Shine-Dalgarno sequence, its spacing from the initiation codon, and RNA secondary structure together initiate protein translation by aligning the ribosome with the start codon (30). In this study, Shine-Dalgarno sequences of FRNA viruses were located 4 to 15 nt upstream from the start codon(s). By comparison, more than 80% of prokaryotic Shine-Dalgarno sequences occur within 5 to 13 bases upstream of the start codon (20).
Stop codons UAG, UAA, and UGA serve as signals for peptide chain termination. During translation of the viral RNA coat protein cistron, the UGA stop codon can be read through, resulting in an additional translated product (43). In Allolevivirus, a read-through protein is translated when a leaky UGA stop codon is misread as a tryptophan codon (UGG) (39), thereby influencing regulatory control and efficiency of gene expression (1). By contrast, alignment of genogroup III viruses revealed that the Allolevivirus maturation protein stop codon is also a UGA; however, in this instance, it is a nonleaky codon. This may occur because the 5′ and 3′ codons flanking the UGA stop codon influence translation termination efficiency (2, 23, 31). A study on genogroup III strain Qβ proposed that programmed read-through was regulated by the 3′ nucleotides flanking the stop codon, specifically an A nucleotide (8). However, alignment of genogroup III and IV nucleotide data in the present study did not reveal a 3′ flanking pattern downstream of the UGA stop codon. Nor was the 3′ A present immediately after the stop codon in all genogroup III strains. Notably, Qβ-like strains contained the 3′ A, whereas it was absent in genogroup III strains MX1 and M11. Interestingly, in all Allolevivirus sequences, a 5′ pattern emerged at the read-through UGA stop codon but was absent at the maturation gene UGA stop codon. Beginning 12 nt upstream from the read-through UGA stop codon, the sequence AAY CCR GCR UAY UGA was observed in genogroup III and AAY CCW GCN UAC UGA in genogroup IV. Nucleotides present in both genogroups III and IV are underlined; these nucleotide triplets coded for the amino acids LNPAY. These findings suggest that the upstream sequences may reduce translation termination efficiency of the UGA read-through stop codon in Allolevivirus spp.
A cell attachment motif, Arg-Gly-Asp (RGD), was identified in the maturation and/or read-through proteins in the majority of Allolevivirus strains but was absent in Levivirus strains. The function of the RGD motif in FRNA coliphages has yet to be experimentally demonstrated. It may, however, relate to the observation that Levivirus strains attach to the host's pili via the maturation protein, whereas in Allolevivirus strains both the maturation and read-through proteins are required for phage infection (37). The RGD motif has been shown to be involved in cell-to-cell adhesion in other RNA viruses. Examples include passaged foot-and-mouth disease virus (21), enterovirus, echovirus 9 strain Barty, coxsackievirus A9, and echovirus 22 (24), as well as bluetongue virus (34). In nearly all astroviruses, an RGD or similar integrin-recognition motif has been identified (40).
Previous reports have compared the ssRNA bacteriophages PP7, PRR1, and AP205 to FRNA phages. Pseudomonas aeruginosa ssRNA phage PP7 shares secondary regulatory RNA structures with FRNA viruses and has been classified into the genus Levivirus (39), despite lacking sequence similarity (27, 39) and amino acid clustering (29) with FRNA phages. The Pfam protein domain profile supports this observation: the lysis and capsid proteins of phage PP7 do not cluster with the corresponding proteins of leviviruses or with the capsid and read-through proteins of alloleviviruses. However, the PP7 replicase clusters with the replicase of alloleviviruses but not with that of leviviruses, and the PP7 maturation protein clusters phylogenetically with the maturation proteins of both Levivirus and Allolevivirus.
Phage PRR1 adsorbs to host pili and has a genetic map similar to those of viruses of the family Leviviridae (29). Although it is propagated in Pseudomonas aeruginosa, phage PRR1 has a broad host range (26). Phage PRR1 shared approximately 43 to 48% sequence identity with other ssRNA Leviviridae phages but clustered outside the Levivirus and Allolevivirus genera (29). On the basis of its similar genetic map, phage PRR1 was subsequently grouped into the genus Levivirus (29). Our data show that phage PRR1 shares Pfam domains with the maturation, capsid, and replicase proteins of Leviviridae. However, PRR1 lacks the Levivirus 3′-terminal signature ACCACCCA.
Phage AP205 from Acinetobacter shares Pfam domains only with the maturation and replicase proteins of Levivirus and Allolevivirus. Moreover, the AP205 coat, maturation, lysis, and replicase proteins all clustered outside the Levivirus and Allolevivirus clades of the phylogenetic tree (29). Although AP205 lacks significant sequence similarity to Leviviridae, it shares important structural features with them (15). Like phages PRR1 and PP7, phage AP205 lacks both the Levivirus and the Allolevivirus 3′-terminal signatures. NCBI GenBank taxonomy lists bacteriophage PRR1 as “unclassified Leviviridae” and places bacteriophages PP7 and AP205 in an “unclassified Levivirus” category. The inclusion of these non-FRNA phages in the family Leviviridae should be reconsidered.
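Because the two genera differ at their 3′ termini, a complete genome can be screened against the eight-nucleotide signatures given in the abstract (5′ ACCACCCA 3′ for leviviruses and 5′ TCCTCCCA 3′ for alloleviviruses). This is a minimal sketch, assuming a complete genome written 5′→3′ in either RNA or DNA letters; the function name and the example tails are hypothetical. A result of None corresponds to phages such as PRR1, PP7, and AP205, which carry neither signature.

```python
# 3' terminal signatures from this article, written 5'->3' in DNA letters.
SIGNATURES = {
    "Levivirus": "ACCACCCA",
    "Allolevivirus": "TCCTCCCA",
}


def three_prime_signature(genome):
    """Return the genus whose 8-nt 3' signature ends the genome, or None.

    Accepts RNA (U) or DNA (T) letters, given 5'->3'.
    """
    normalized = genome.strip().upper().replace("U", "T")
    for genus, signature in SIGNATURES.items():
        if normalized.endswith(signature):
            return genus
    return None


# Hypothetical genome tails, NOT real sequences.
print(three_prime_signature("gggaaaccacccA"))  # Levivirus
print(three_prime_signature("GGGAAUCCUCCCA"))  # Allolevivirus
print(three_prime_signature("GGGAAAUUUCCCA"))  # None
```

A partial sequence that is missing its 3′ end would also return None, so a negative result is meaningful only for complete genomes.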
FRNA phages were originally assigned to four major Leviviridae genogroups (I, II, III, and IV), with subgroups a, b, and c in genogroup III and subgroups a and b in genogroup IV, on the basis of the template specificity of the RNA replicase (22). To clarify Leviviridae classification, we suggest designating specific type strains and using their names rather than alphabetical subgroups. Likely candidates for type strain designation are (i) phages MS2 and fr in genogroup I, (ii) phages GA and KU1 in genogroup II, (iii) phages Qβ and MX1 in genogroup III, and (iv) phages NL95, SP, and FI in genogroup IV. Genogroup I phages, for example, would then be placed into MS2-like or fr-like categories, and the other genogroups would be treated likewise.
In conclusion, the findings of this study agree with previously described FRNA features and with earlier phylogenetic analyses, which concluded that viruses in the family Leviviridae form two genera and four distinct genogroups. Although genogroup I strain fr and genogroup III strains MX1 and M11 share only 70 to 78% sequence identity with other strains in their respective genogroups, our analyses suggest that fr belongs in Levivirus genogroup I and that MX1 and M11 belong in Allolevivirus genogroup III. Strains within each genus share more than 50% nucleotide sequence identity, whereas strains in different genera share <40%; within each genogroup, strains share approximately 70 to 99% nucleotide identity.
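The identity ranges summarized above can be illustrated with a simple percent-identity calculation over a pairwise alignment. This sketch assumes the two sequences are already aligned (gaps written as '-') and divides identical non-gap columns by all columns that are not gaps in both sequences; published tools use several different denominators, so values may not match those reported here. The 50% and 40% cutoffs in the hypothetical identity_range helper are a simplification of the summary figures only: the genus and genogroup assignments in this study also relied on phylogenetic analyses, amino acid composition, and replicase domain location.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical positions in a pairwise alignment ('-' = gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")  # drop columns that are gaps in both
    ]
    if not columns:
        raise ValueError("alignment contains no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def identity_range(identity):
    """Place a value relative to the genus-level figures (hypothetical cutoffs)."""
    if identity > 50.0:
        return "within-genus range"
    if identity < 40.0:
        return "between-genus range"
    return "indeterminate"


# Hypothetical aligned fragments, NOT real phage sequences.
score = percent_identity("ACGT-ACGTAC", "ACGTTACGTTT")
print(round(score, 1), identity_range(score))  # 72.7 within-genus range
```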
This research was funded, in part, through the Environmental Protection Agency's (EPA's) New England Regional Applied Research Effort. We gratefully acknowledge the assistance of Jack Paar III, U.S. EPA New England Regional Laboratory, for initiating and sponsoring this program.
We thank Emilie Cooper for SimPlot analysis, Syed Muaz Khalil for providing a portion of the sequence data, and Greg Lovelace and David Love for isolating and providing some of the strains used in this study.
The information in this document has been funded wholly (or in part) by the U.S. EPA. It has been subjected to review by the National Health and Environmental Effects Research Laboratory and approved for publication. Approval does not signify that the contents reflect the views of the Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use.
This is contribution number 1354 from the Gulf Ecology Division.
The findings and conclusions in this article are those of the authors and do not necessarily represent the views of the funding agency or the Centers for Disease Control and Prevention. This article received clearance through the appropriate channels at the Centers for Disease Control and Prevention prior to submission.
Published ahead of print on 26 August 2009.