Group A human rotaviruses (RVs) remain the most frequently detected viral agents associated with acute gastroenteritis in infants and young children. Despite their medical importance, relatively few complete genome sequences have been determined for commonly circulating G/P-type strains (i.e., G1P, G2P, G3P, G4P, and G9P). In the current study, we sequenced the genomes of 11 G4P isolates from stool specimens that were collected in Washington, DC during the years of 1974-1991. We found that the VP7-VP4-VP6-VP1-VP2-VP3-NSP1-NSP2-NSP3-NSP4-NSP5/6-encoding genes of all 11 G4P RVs have the genotypes of G4-P-I1-R1-C1-M1-A1-N1-T1-E1-H1. By constructing phylogenetic trees for each gene, we revealed extensive intra-genotypic diversity among the G4P RVs and identified new sub-genotype gene alleles. Several of these alleles are nearly identical to those of G3P isolates previously sequenced from this same Washington, DC collection, strongly suggesting that the RVs underwent gene reassortment. On the other hand, we observed that some G4P RVs exhibit completely different allele-based genome constellations, despite being collected during the same epidemic season; there was no evidence of gene reassortment between these strains. This observation extends our previous findings and supports the notion that stable, genetically-distinct clades of human RVs with the same G/P-type can co-circulate in a community. Interestingly, the sub-genotype gene alleles found in some of the DC RVs share a close evolutionary relationship with genes of more contemporary human strains. Thus, archival human RVs sequenced in this study might represent evolutionary precursors to modern-day strains.
Group A rotaviruses (RVs) are important pathogens that cause severe, watery diarrhea in infants and young children (Parashar et al., 2009; Parashar et al., 2006). The genome consists of 11 double-stranded (ds) RNA segments and is surrounded by a triple-layered protein capsid (Estes and Kapikian, 2007). In general, each genome segment encodes a single polypeptide, allowing RVs to express 6 structural proteins (VP1-VP4, VP6-7) and 5 or 6 non-structural proteins (NSP1-NSP5, sometimes NSP6) (Estes and Kapikian, 2007). Neutralizing antibodies generated against outer capsid proteins VP7 and VP4 correlate with protection from subsequent RV-induced disease (Offit, 1994). As such, serotypes defined by VP7 (G-serotypes) and VP4 (P-serotypes) antigenicity, as well as the G/P-genotypes based on gene sequences, are a principal method of classifying RV isolates (Estes and Kapikian, 2007). While numerous G/P-type combinations have been detected in nature, strains with specificities of G1P, G2P, G3P, G4P, and G9P are responsible for the majority of cases of RV gastroenteritis in humans (Desselberger et al., 2001; Gentsch et al., 2005; Santos and Hoshino, 2005). Recently, a RV classification system was developed that assigns a genotype to each of the 9 internal protein genes (i.e., those encoding VP1-VP3, VP6, NSP1-NSP5/6) in addition to G/P-genotypes (Matthijnssens et al., 2008a; Matthijnssens et al., 2008b). Now, the acronym Gx-P[x]-Ix-Rx-Cx-Mx-Ax-Nx-Tx-Ex-Hx is used to classify the VP7-VP4-VP6-VP1-VP2-VP3-NSP1-NSP2-NSP3-NSP4-NSP5/6-encoding segments. Using this system, it was shown that the internal protein genes of RVs with G1, G3, G4, G9 and P specificities are usually genotype 1, and are therefore said to belong to the Wa genogroup (Heiman et al., 2008; Matthijnssens et al., 2008a; Matthijnssens et al., 2008b). 
In contrast, the internal protein genes of G2 and P strains are generally genotype 2, assigning them into the DS-1 genogroup (Heiman et al., 2008; Matthijnssens et al., 2008a; Matthijnssens et al., 2008b). Inter-genogroup reassortant RVs containing both genotype 1 and 2 genes can be generated via reassortment following co-infection with two ‘pure’ strains (e.g., G1P and G2P) (Ward et al., 1990). Such ‘mixed’ genotype strains have occasionally been isolated from humans, usually in the context of rare G/P-type combinations (Heiman et al., 2008; Matthijnssens et al., 2008a). However, the lack of sequence information for common human G/P-type strains (i.e., G1P, G2P, G3P, G4P, and G9P) limits our understanding of RV genotypic and intra-genotypic diversity, notably for the genes encoding proteins other than VP7 and VP4.
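The genotype descriptor described above can be illustrated with a short Python sketch. It splits a descriptor into its 11 fields in the order given in the text and labels the internal protein genes as Wa-like (all genotype 1), DS-1-like (all genotype 2), or mixed. The function names `parse_constellation` and `internal_genogroup` are hypothetical helpers introduced here for illustration only; they are not part of RotaC or any published tool.

```python
# Gene order of the Gx-P[x]-Ix-Rx-Cx-Mx-Ax-Nx-Tx-Ex-Hx descriptor, as stated in the text.
GENES = ["VP7", "VP4", "VP6", "VP1", "VP2", "VP3",
         "NSP1", "NSP2", "NSP3", "NSP4", "NSP5/6"]
PREFIXES = ["G", "P", "I", "R", "C", "M", "A", "N", "T", "E", "H"]


def parse_constellation(descriptor):
    """Map each gene to its genotype label (hypothetical helper)."""
    parts = descriptor.split("-")
    if len(parts) != len(GENES):
        raise ValueError("expected 11 hyphen-separated genotype fields")
    constellation = {}
    for gene, prefix, part in zip(GENES, PREFIXES, parts):
        if not part.startswith(prefix):
            raise ValueError(f"field {part!r} does not start with {prefix!r}")
        constellation[gene] = part
    return constellation


def internal_genogroup(constellation):
    """Label the 9 internal protein genes (hypothetical helper).

    Assumption: 'all genotype 1' is reported as Wa-like and
    'all genotype 2' as DS-1-like; anything else is called mixed/other.
    """
    internal = [g for g in GENES if g not in ("VP7", "VP4")]
    numbers = {constellation[g][1:] for g in internal}
    if numbers == {"1"}:
        return "Wa-like"
    if numbers == {"2"}:
        return "DS-1-like"
    return "mixed or other"
```

For example, `internal_genogroup(parse_constellation("G4-P-I1-R1-C1-M1-A1-N1-T1-E1-H1"))` gives `"Wa-like"`. The P field is accepted with or without a bracketed number.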
To better understand the level of genetic variation among human RVs, we have an ongoing contract with the J. Craig Venter Institute (JCVI) in Rockville, MD to determine the sequences of archival strains in a Washington, DC stool collection. This collection contains 428 G1P-, 27 G2P-, 150 G3P-, and 32 G4P-positive fecal specimens, all of which were isolated from children with RV gastroenteritis at Children’s Hospital National Medical Center during the years of 1974-1991 (Table 1). We have previously reported on the analysis of RVs from 51 of the 150 G3P-positive specimens (McDonald et al., 2009). The results of that study showed that the 51 G3P RVs all belong to the Wa genogroup and have identical genotypes (G3-P-I1-R1-C1-M1-A1-N1-T1-E1-H1); however, we found that there is significant intra-genotypic variation among their genes. Specifically, by generating phylogenetic trees using the open-reading frame (ORF) sequences, 4 or 5 sub-genotype alleles were identified for each gene. We observed that some G3P RVs maintained completely different allele-based genome constellations, even though they were collected during the same epidemic season. This result suggested that (i) several clades of G3P RVs co-circulated in Washington, DC, and that (ii) minimal gene reassortment occurred among viruses belonging to different clades.
Here, we report on the sequencing and analysis of 11 archival G4P RVs from the same Washington, DC stool collection. To our knowledge, these are the first near-complete genome sequences available for this medically-important serotype. We found that all 11 of the sequenced G4P RVs have the genotypes G4-P-I1-R1-C1-M1-A1-N1-T1-E1-H1, and that most share sub-genotype gene allele designations with those of the G3P strains. However, for the VP4, VP1, and NSP4 genes, new sub-genotype alleles were identified in the G4P RVs; these alleles are absent in the G3P strains analyzed thus far. Consistent with the previous study, some of the G4P RVs isolated during the same epidemic season exhibit completely different allele-based genome constellations, supporting the idea that several RV clades can circulate concurrently. Interestingly, the genes found in one of the G4P clades, as well as in some of the G3P isolates, are closely related to those of recently-circulating human strains and possibly represent ancestral gene precursors.
The methods of sample collection, RNA extraction, and G/P-typing were reported previously (McDonald et al., 2009). Briefly, fecal specimens were collected from children <2 years of age who were hospitalized with diarrhea at Children’s Hospital National Medical Center, Washington, DC. Specimens were tested for RV using electron microscopy and enzyme-linked immunosorbent assay (ELISA). RNA was extracted from RV-positive samples using TRIzol (Invitrogen), and samples were classified into G/P-types based on the results of a microtiter plate hybridization-based PCR-ELISA.
RT-PCR and nucleotide sequencing were performed at JCVI as described previously (McDonald et al., 2009). Briefly, RT-PCR was performed with 1 ng of RNA using OneStep RT-PCR kits (Qiagen). Reactions were scaled down to 1/5 the recommended volumes, the RNA templates were denatured in 50% DMSO at 95°C for 5 min, and 1.6 units RNase Out (Invitrogen) was added. Following RT-PCR cycling, the reactions were treated with 0.5 units of shrimp alkaline phosphatase and 1 unit of exonuclease I (USB) and then incubated at 37°C for 60 min to degrade remaining dNTPs and digest the single-stranded primers. Enzymes were heat inactivated by incubation at 72°C for 15 min. The RT-PCR products were sequenced with an ABI Prism BigDye v3.1 terminator cycle sequencing kit (Applied Biosystems). The dye terminator was removed using Performa DTR cartridges (Edge Biosystems), and sequences were obtained with a 3730 DNA Analyzer (Applied Biosystems). Raw sequence data were trimmed to remove any primer-derived sequence, as well as low quality sequence, and gene sequences were assembled using the Elvira and TIGR assemblers (www.jcvi.org/cms/research/software). The gene sequences were then manually edited using CloE (Closure Editor; JCVI) and polymorphisms were re-analyzed by sequencing. The nucleotide sequences deduced for the 11 genome segments of the 11 G4P RVs were deposited into GenBank. Table 2 lists the accession numbers and the full strain name for each G4P isolate according to the guidelines of the RCWG (Matthijnssens et al., 2011). For simplicity, we will refer to viruses by their abbreviated common names (DC2241, DC4996, DC5064, DC5115, DC827, DC1285, DC4613, DC1359, DC1208, DC4608, and DC4320).
Nucleotide alignments were constructed using MacVector 8.1.2 and ClustalW with the set defaults (open gap penalty of 15.0, extended gap penalty of 6.66, delay divergence of 30%, and transitions weighted). Phylogenetic trees were generated using MacVector 8.1.2 and the neighbor-joining method (1000 bootstrap repetitions and Kimura-2 correction parameter). GenBank accession numbers and genotypes of the previously sequenced RV genes used in the phylogenetic analyses are provided in supplemental materials (Tables S1 and S2).
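To make the Kimura-2 correction mentioned above concrete, the following Python sketch computes the standard Kimura two-parameter distance, d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], between two pre-aligned nucleotide sequences, where P and Q are the proportions of transitions and transversions. This is an illustration of the textbook formula, not a reproduction of the MacVector implementation; the function name `k2p_distance` and the choice to skip gapped or ambiguous columns are assumptions made here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Assumption: columns containing a gap or a non-ACGT symbol are
    skipped (pairwise deletion). Sequences are cDNA-style (T, not U).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; all other changes are transversions.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

A matrix of such pairwise distances is the input that a neighbor-joining algorithm would then use to build each gene tree.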
RNA was extracted from all 32 G4P-positive specimens of the Washington, DC stool collection and used as template for RT-PCR and sequencing. Of the 32 samples, 21 yielded partial genome sequences, while 11 contained viral RNA of sufficient quality to obtain near-complete genome sequences. Specifically, the entire ORFs, and in some cases portions of the untranslated regions (UTRs), were determined for each of the 11 genome segments of the 11 G4P RVs. The sequence chromatograms showed strong, single peaks, suggesting that each stool specimen contained a dominant G4P RV (data not shown). Using RotaC, a web-based RV genotyping tool (http://rotac.regatools.be), we confirmed the G/P-genotypes of the 11 G4P RVs and identified the other 9 genes of each isolate as genotype 1 (Maes et al., 2009). Thus, the 11 G4P RVs have the complete genotype descriptor of G4-P-I1-R1-C1-M1-A1-N1-T1-E1-H1, which is similar to that of the 51 previously sequenced G3P RVs (G3-P-I1-R1-C1-M1-A1-N1-T1-E1-H1) (McDonald et al., 2009). Therefore, all of the archival RVs sequenced thus far from the Washington, DC stool collection belong to the Wa genogroup.
We next constructed neighbor-joining phylogenetic trees for each gene of the 62 sequenced archival DC RVs to examine the intra-genotypic diversity of the G4P strains and their genetic relationships to the G3P strains (Fig. 1). Similar to the previous study, we defined each major phylogenetic grouping of each tree as a sub-genotype gene allele and, for purposes of clarity, assigned it a specific color descriptor (red, orange, green, cyan, navy, or brown) (McDonald et al., 2009). Because there were a limited number of G4P RVs in this study, we were not able to define sub-genotype gene alleles for VP7 (Fig. 1A). For the remaining ten genes (VP1-VP4, VP6, NSP1-NSP5/6), we found that most of the G4P sequences cluster with those of the G3P RVs in the phylogenetic trees, and therefore, they can be given the same color-coded allele designations (Fig. 1B-K). For example, the VP6 genes of G4P isolates DC1285, DC4613, DC1359, DC827, and DC2241 (highlighted in yellow) cluster tightly with the orange VP6 alleles of G3P isolates DC1563, DC1730, DC5751, DC4996, DC5064, and DC5115 (Fig. 1F). However, the VP4, VP1, NSP1, and NSP4 genes of a few G4P RVs do not cluster with those of the G3P RVs (Fig. 1B-C, 1G, and 1J). Instead, they comprise new phylogenetic groupings and sub-genotype alleles, which we designated with the color brown. For VP4, the isolate DC1285 from 1980 is the only virus to show a brown allele (Fig. 1B). For VP1, the brown allele is seen in isolates DC1208 and DC4608 from 1980 and is quite divergent from those of the cyan, red, green, and orange alleles seen for other archival Washington, DC RVs (Fig. 1C). For NSP1 and NSP4, the brown alleles are only detected in viruses isolated in 1977 (DC2241, DC4996, DC5064, and DC5115) (Fig. 1G and 1J). The color-coded allele assignments of the G3P RV genes were consistent with what we had defined previously, with two exceptions (McDonald et al., 2009). 
First, we found that the NSP2 genes of G3P isolates DC1563 and DC1730 cluster with the red, rather than the orange, allele grouping when analyzed along with the G4P RV sequences (McDonald et al., 2009) (Fig. 1H). Second, we found that the NSP3 genes previously described as green and orange alleles share a very close phylogenetic relationship when analyzed in the context of the G4P sequences (Fig. 1I). Therefore, the NSP3 genes of isolates DC2262, DC2239, DC130, and DC135 were redefined as orange, rather than green, alleles.
The color-coded allele-based genome constellations of the G3P and G4P RVs sequenced thus far from the Washington, DC stool collection are summarized in Fig. 2. The results show that all of the G4P RVs have genes, and in some cases genomes, quite similar to those of the G3P RVs when compared at the allele level. The orange VP4 allele of the G3P 1979 isolate DC1730 is 99.4% identical to that of the G4P isolate DC4613 collected the following year. Likewise, the green VP1 alleles of the 1976 G3P isolates DC2262, DC2239, and DC130 are ~99.3% identical to those of 1977 G4P isolates DC2241, DC4996, DC5064, and DC5115. Given their genetic similarities, we think it is likely that the G3P and G4P RVs directly reassorted their genes. However, we cannot exclude the possibility that the G3P and G4P RVs independently acquired similar genes by reassorting with strains not yet analyzed, such as the G1P strains. Strikingly, G3P isolates from 1974 and 1979 (DC1563 and DC1730, respectively) show genome constellations indistinguishable at the allele level from the 1980 G4P isolates DC4613 and DC1359 (Fig. 2). Yet, the two other G4P strains from 1980 (DC1208 and DC4608) are very divergent from DC4613 and DC1359 and, instead, are much more genetically similar to the G3P strains (DC1600 and DC792) isolated in the same year (Fig. 2). Moreover, the single 1988 G4P isolate DC4320 is identical at the allele level to the majority of 1991 G3P viruses (Fig. 2). It is quite probable that these particular G4P viruses, which are more similar to G3P strains, represent examples of VP7 gene reassortants. Together, the analyses allowed us to identify at least three different clades of G4P RVs that circulated in Washington, DC during the years of collection, all of which have some genes closely related to G3P RVs isolated from the same location. 
These data are consistent with our previous findings for the G3P strains and clearly show that viruses (i) isolated from the same epidemic season, (ii) collected at the same geographical location, and (iii) having the same G/P-type specificity (i.e., G4P) can contain completely different sub-genotype allele-based genome constellations (McDonald et al., 2009). The results presented here are also in accordance with a previous study, which used electrophoretic analyses of viral genomes to show that co-circulating strains with the same serotype can have distinctly migrating internal protein gene segments (Nakagomi et al., 1988). The biological reason for the existence of clades with preferred allele-based genome constellations remains to be determined. Yet, it is interesting to speculate that, in addition to reassortment biases at the genotype level, preferences may also exist for certain gene sets at the sub-genotype level.
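The allele-level comparison of genome constellations described above can be sketched in Python as follows. Each strain is represented as a mapping from gene to color-coded allele; strains with identical mappings are grouped together, and shared alleles between any two strains can be listed. The strain names and allele assignments in `EXAMPLE_STRAINS` are hypothetical placeholders covering only three genes; they are not the assignments shown in Fig. 2, and the function names are introduced here for illustration.

```python
def shared_alleles(const_a, const_b):
    """Return the genes for which two strains carry the same allele label."""
    return [gene for gene in const_a
            if gene in const_b and const_a[gene] == const_b[gene]]


def group_by_constellation(strains):
    """Group strain names whose allele assignments match for every gene."""
    groups = {}
    for name, constellation in strains.items():
        # A sorted tuple of (gene, allele) pairs serves as a hashable key.
        key = tuple(sorted(constellation.items()))
        groups.setdefault(key, []).append(name)
    return list(groups.values())


# Hypothetical data for illustration only (NOT taken from Fig. 2).
EXAMPLE_STRAINS = {
    "strain_A": {"VP4": "orange", "VP1": "orange", "NSP4": "orange"},
    "strain_B": {"VP4": "orange", "VP1": "orange", "NSP4": "orange"},
    "strain_C": {"VP4": "red", "VP1": "brown", "NSP4": "orange"},
}
```

With these placeholder data, `group_by_constellation(EXAMPLE_STRAINS)` places `strain_A` and `strain_B` in one group and `strain_C` in another, while `shared_alleles` reports that `strain_A` and `strain_C` share only the NSP4 allele.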
Having defined relationships among the sequenced G3P and G4P RVs from the Washington, DC collection, we next sought to determine how similar these archival strains are to other sequenced human and animal RVs. Therefore, we created neighbor-joining phylogenetic trees using the ORF sequences of representative DC RVs and those of various other RVs available in GenBank (Fig. 3). For VP7, the genes of the 11 G4P strains sequenced in this study were compared to those of published human and porcine RVs with known G4 specificities (Fig. 3A). We found that the DC G4P VP7 genes cluster with genes of other human G4 strains within lineage I (Fig. 3A) (Stupka et al., 2009). More specifically, the VP7 genes of the DC strains seem most closely related to those of older, cell culture-adapted human strains (e.g., Hochi, Odelia, and ST3) and to primary human RVs isolated before 2002 (e.g., MW4086/00, RMC100, and Kagawa/90-544) (Fig. 3A) (Kudo et al., 2001; Heiman et al., 2008; Page et al., 2010). Such genetic similarities might be a reflection of the general time period in which the viruses were isolated (during the years of 1977-2002). In support of this notion, the majority of VP7 genes from more recently circulating strains, isolated between 2002-2010, comprise a phylogenetic grouping distinct from that containing the archival DC RV VP7 genes (Fig. 3A). This grouping has been proposed to represent an emerging (i.e., modern) G4 sub-lineage (Stupka et al., 2009). However, the VP7 genes of G4 human RVs isolated in Vietnam in 2003 (strains VN-5, VN-9, VN-21, and VN-565) do not group with the modern sub-lineage and instead cluster with the 1988 G4P isolate DC4320 (Fig. 3A) (Trinh et al., 2010). This result suggests that the VP7 genes of currently circulating G4 strains may show more intra-genotypic diversity than previously thought. 
At the amino acid level, the archival DC RV VP7 proteins are nearly identical (92.2 to 100% amino acid identity) to those of other human G4 strains, including those of recently isolated RVs. This observation suggests that, irrespective of nucleotide variation, the VP7 proteins of G4 strains have changed very little over the last 30+ years. We did find that the VP7 protein of the 1978 G4P isolate DC827 has two unique amino acid changes (M63L and A183V) not seen in any other human G4 strain for which a sequence is available in GenBank. Likewise, the VP7 protein of the 1988 isolate DC4320 shows a single unique amino acid substitution (V235I) (Fig. S1). These changes are conservative, and none are located in domains predicted to be bound by neutralizing antibodies; this result suggests that the unique changes do not affect VP7 function or antigenicity (data not shown).
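The amino acid comparisons above use two simple quantities: percent identity and substitutions written in the form M63L (reference residue, position, new residue). A minimal Python sketch of both is given below, assuming the two protein sequences are already aligned to the same length with "-" as the gap character. The function names are hypothetical, and the choice to number positions by the reference sequence and to ignore gapped columns is an assumption of this sketch.

```python
def percent_identity(reference, query):
    """Percent of ungapped aligned positions with identical residues."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(r, q) for r, q in zip(reference, query) if "-" not in (r, q)]
    if not pairs:
        raise ValueError("no comparable positions")
    matches = sum(1 for r, q in pairs if r == q)
    return 100.0 * matches / len(pairs)


def list_substitutions(reference, query):
    """List substitutions such as 'M63L', numbered by reference residue."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = []
    position = 0  # 1-based residue number in the ungapped reference
    for ref_aa, qry_aa in zip(reference, query):
        if ref_aa != "-":
            position += 1
        if "-" in (ref_aa, qry_aa):
            continue  # insertions and deletions are not reported here
        if ref_aa != qry_aa:
            substitutions.append(f"{ref_aa}{position}{qry_aa}")
    return substitutions
```

For the toy alignment `"MKLV"` versus `"MRLI"`, `list_substitutions` returns `["K2R", "V4I"]` and `percent_identity` returns `50.0`.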
For VP4, the genes of the G4P strains sequenced in this study were compared phylogenetically to those of previously-published human RVs with P specificities, including 5 archival G3P DC RVs (Fig. 3B). As was seen for VP7, the VP4 genes of archival DC RVs generally cluster with those of older, cell culture-adapted P human strains (e.g., Wa, D, KU, IAL28, and YO), which may reflect the general time period of isolation (during the years of 1977-2002) (Fig. 3B) (Heiman et al., 2008). Nonetheless, we did observe that the orange VP4 alleles found in some archival DC RVs from 1974, 1979, and 1980 (strains DC1563, DC1359, and DC4613) grouped more closely with VP4 genes of P human strains isolated after 2002. This modern lineage includes VP4 genes of the 2006 Indian strain, 0613158, the 2008-2009 Belgium strains, BE00029 and BE00036, the 2008 German strain, GER126-08, and the 2009 United States strains, 2008747332 and 2008747336 (Arora and Chitambar, 2011; Pietsch et al., 2011). At the amino acid level, the orange allele-encoded VP4 is also very similar to proteins of the modern lineage strains (Fig. S2). In particular, the VP4 proteins of 1980 G4P strains DC1359 and DC4613 share five amino acid changes (S144G, N/D195G, I580V, L604V, and K617N) with VP4 proteins of modern lineage strains when compared to the proteins of older, pre-2002 RVs. Amino acids at positions 144 and 195 are located in putative antigenic domain 8-1 of VP8*, the distal cleavage fragment of trypsin-activated VP4 (Fig. S3) (Dormitzer et al., 2002; Dormitzer et al., 2004). It is possible that these changes provided a selective advantage to RVs with the orange allele-encoded VP4 proteins, thus allowing their emergence in the human population.
For VP1-VP3, VP6, and NSP1-NSP5/6, we found that most of the archival DC RV genes cluster phylogenetically with the genotype 1 genes of human strains and are separate from the genotype 1 genes of porcine RVs (strains OSU, Gottfried, YM, A131, and A253) and the porcine-like bovine RVs (strains KJ25-1 and KJ44) (Fig. 3C-K). However, we did notice that the newly-designated brown VP1 allele of DC G4P isolates DC1208 and DC4608 comprises a phylogenetic grouping along with porcine RV genotype 1 VP1 genes (Fig. 3C). In fact, the VP1 gene of DC1208 is 93.2% identical at the nucleotide level to the VP1 gene of porcine strain YM, but only ~86.4% identical to the VP1 genes of the Washington, DC RVs (Almanza et al., 1994). A few other alleles of the archival DC strains also showed a close evolutionary relationship with genes of porcine RVs: the cyan and navy VP2 alleles, the cyan NSP1 allele, and the navy NSP3 allele (Fig. 3D, 3G, and 3I). It is not clear whether this phylogenetic pattern is indicative of direct reassortment among human and porcine strains or simply reflects a shared ancestral origin (Matthijnssens et al., 2008a).
Interestingly, when compared phylogenetically to other human strains, the orange alleles of many internal protein genes are closely related to genes of primary human RVs isolated in many different parts of the world during the years of 2002-2010 (e.g., 6361, 061060, BE00036, BE00029, Dhaka16-03, AM06-I, 2008747332, 2008747336, B3458, Matlab36-02, Dhaka12-03, Matlab13-03, GER172-08, GER126-08, B4633-03, and Dhaka25-02) (Fig. 3C-K) (Rahman et al., 2007; Matthijnssens et al., 2008a; Arora and Chitambar, 2011; Matthijnssens et al., 2010; Pietsch et al., 2011). This pattern is similar to what we observed for the orange VP4 alleles (Fig. 3B). In contrast, the red, navy, cyan, green, and brown alleles of the archival DC RVs seem more similar to older, cell culture-adapted RVs and to primary human strains isolated before 2002 (Fig. 3C-K). For the NSP1 and NSP4 trees, the phylogenetic pattern is more complicated; there is no clear grouping of orange alleles with genes of contemporary human RVs (Fig. 3G and 3J). Specifically, while the orange NSP1 alleles cluster with genes of the 2008 United States strains 2008747332 and 2008747336, they are more distantly related to genes of other recent isolates (Fig. 3G). The contemporary Belgium and German strains, BE00039 and GER126-08, respectively, instead group with the green and navy NSP1 alleles (Fig. 3G) (Pietsch et al., 2011). For NSP4, the genes of many modern isolates comprise a phylogenetic grouping with the orange alleles of G4P isolates DC1359, DC1285, and DC4613 (Fig. 3J). However, the NSP4 gene of G3P isolate DC1563, which is also classified as orange, does not cluster with those of the G4P isolates, and the branching separation shows a weak bootstrap value (<70%) (Fig. 3J). Nonetheless, it is clear that the sub-genotype alleles designated as orange in this study are genetically similar to genes of recently circulating human strains.
The results of this study reveal that extensive intra-genotypic diversity existed among 11 G4P and 51 G3P archival RVs from Washington, DC. These different G/P-type strains were found to share sub-genotype gene allele designations, suggesting that they might have undergone reassortment. Still, we found some G4P strains that were isolated during the same epidemic season but exhibit completely different allele-based genome constellations. This observation is similar to what we have previously reported for the G3P RVs and supports the notion that stable, genetically-distinct clades of human RVs with the same G/P-type can co-circulate without reassorting genes. Interestingly, the genes designated as orange alleles in this study share a close evolutionary relationship with genes of more contemporary human strains. Based on these data, we hypothesize that human RVs with predominantly orange allele-like genes emerged in the population over the last few decades. If so, the archival DC RVs sequenced in this and our previous study might represent the oldest known ancestors of modern-day strains. However, we acknowledge that this study analyzes only a small subset of RVs relative to the vast number that infect children every day. Future, large-scale RV genomics studies are certainly required to fully elucidate the evolutionary relationship among archival and contemporary strains.
We would like to thank members of the Patton laboratory for scientific and editorial suggestions. SMM, KD, and JTP were supported by the Intramural Research Program of the National Institute of Allergy and Infectious Diseases, National Institutes of Health during the time of this study.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.