Noroviruses are a major cause of epidemic gastroenteritis in children and adults, and GII.4 has been the predominant genotype since its first documented occurrence in 1987. This study examined the evolutionary dynamics of GII.4 noroviruses over more than three decades to investigate possible mechanisms by which these viruses have emerged to become predominant. Stool samples (n = 5,424) from children hospitalized at the Children's Hospital in Washington, DC, between 1974 and 1991 were screened for the presence of noroviruses by a custom multiplex real-time reverse transcription-PCR. The complete genome sequences of five GII.4 noroviruses (three of which predate 1987 by more than a decade) in this archival collection were determined and compared to the sequences of contemporary strains. Evolutionary analysis determined that the GII.4 VP1 capsid gene evolved at a rate of 4.3 × 10⁻³ nucleotide substitutions/site/year. Only six sites in the VP1 capsid protein were found to evolve under positive selection, most of them located in the shell domain. No unique mutations were observed in or around the two histoblood group antigen (HBGA) binding sites in the P region, indicating that these sites have been conserved since the 1970s. The VP1 proteins from the 1974 to 1977 noroviruses contained a unique sequence of four consecutive amino acids in the P2 region, which formed an exposed protrusion on the modeled capsid structure. This protrusion and other observed sequence variations did not affect the HBGA binding profiles of recombinant virus-like particles derived from representative 1974 and 1977 noroviruses compared with more recent noroviruses. Our analysis of archival GII.4 norovirus strains suggests that this genotype has been circulating for more than three decades and provides new ancestral strain sequences for the analysis of GII.4 evolution.
Noroviruses are a major cause of epidemic gastroenteritis in children and adults. They are responsible for nearly half of all gastroenteritis cases and for more than 90% of nonbacterial gastroenteritis epidemics worldwide (38). In addition, noroviruses have emerged as the second most important cause of diarrhea in children, following rotaviruses (38). Noroviruses have been described as quasispecies on the basis of their population diversity, and the constant generation of genetic and antigenic heterogeneity may allow them to persist in human populations by evading the immune response (4, 12, 19). Although noroviruses were first detected in samples derived from an outbreak at a school in Norwalk, OH, in 1968 (30), little is known about the epidemiological dynamics responsible for the wide range of genetic and antigenic variation. A thorough understanding of the evolution and mechanism of persistence of these viruses is critical to developing adequate control strategies such as antiviral drugs and vaccines.
The genus Norovirus belongs to the family Caliciviridae along with three genera, Sapovirus, Vesivirus, and Lagovirus. Norovirus virions are nonenveloped and approximately 30 to 35 nm in diameter. The icosahedral capsid surrounds a positive-sense single-stranded RNA genome covalently linked to VPg at the 5′ end, polyadenylated at the 3′ end, and approximately 7.7 kb in length. The RNA genome is organized into three open reading frames (open reading frame 1 [ORF1], ORF2, and ORF3). ORF1 (in genogroup II noroviruses) encodes a large nonstructural polyprotein that is processed by the virus-encoded proteinase into intermediate precursors and final nonstructural protein products (NS1-2N-term, NS3NTPase, NS4p20, NS5VPg, NS6Pro, and NS7Pol). ORF2 and ORF3 encode major (VP1) and minor (VP2) capsid proteins, respectively. Noroviruses bind saccharides of the histoblood group antigens (HBGAs) as part of the mechanism that mediates viral entry into the epithelial cells of the gastrointestinal tract (25). The protruding (P) domain of the VP1 protein has been cocrystallized with HBGAs to identify the amino acids involved in this interaction, and two sites (interaction site 1 and 2) have been mapped that participate in trisaccharide A and B binding (8).
The genus Norovirus has been divided into five major genogroups (G), designated genogroup I (GI) through GV (49). GI and GII contain the majority of norovirus strains associated with human disease. In order to facilitate studies of the molecular epidemiology of the noroviruses, a genetic typing system was developed by Zheng et al. that further grouped strains in GI and GII into 8 and 17 genotypes, respectively (49). A single genotype, GII.4, has been associated with the majority of global outbreaks since the mid-1990s, when active surveillance was initiated using molecular diagnostic techniques (32, 36). The Foodborne Viruses in Europe network reported that the majority of gastroenteritis outbreaks in 13 countries during 5 years of surveillance (July 2001 to June 2006) were due to this single genotype (31). Genotype II.4 has also played a predominant role in sporadic gastroenteritis occurring in both developed and developing countries, as shown in recent reports from Italy and Nicaragua, where it was responsible for 81% and 68% of norovirus gastroenteritis cases, respectively (6, 42).
The predominance of GII.4 noroviruses, along with the periodic emergence of GII.4 variants associated with a sharp increase in the number of reported illnesses, has prompted intense study of this genotype since the mid-1990s, when it was first recognized as a major epidemic strain (1, 12, 32, 36, 43, 45). Lindesmith et al. analyzed the temporal evolution and HBGA binding characteristics of representative GII.4 noroviruses circulating since 1987 and proposed the existence of five clusters that have likely evolved through antigenic drift and that are serially replaced over time with intermediate periods of stasis (32). The evolutionary pattern described in their study began with a cluster represented by the Camberwell virus, associated with gastroenteritis in Australia in 1994 (10), and the MD145 virus, associated with gastroenteritis in a Maryland nursing home in 1987 (21). Little is known about the GII.4 noroviruses that may have circulated prior to this time, limiting the understanding of the evolution of this important genotype. In order to investigate whether the GII.4 genotype had been an important cause of gastroenteritis earlier than presently documented and to study its evolution over time, we analyzed one of the oldest epidemiologic studies of infantile gastroenteritis available. The Children's Hospital National Medical Center in Washington, DC (CHDC) study was initiated in 1974 in order to investigate the etiology of severe gastroenteritis in infants and young children (5). From the analysis of CHDC archival stool specimens, we identified GII.4 noroviruses dating back to 1974, which enabled a high-resolution phylogenetic analysis of GII.4 noroviruses spanning 34 years (1974 to 2007). Our data support the emergence of an ancestral GII.4 genotype in the 1960s that has evolved at a rate of approximately 4.3 × 10⁻³ nucleotide substitutions/site/year. 
In addition, modeling predictions of the GII.4 capsid showed evidence for structural shifts in the capsid architecture over time which did not affect HBGA binding but may have contributed to the emergence of new epidemic strains.
A study was conducted at the Children's Hospital National Medical Center in Washington, DC (CHDC, previously referred to as CHNMC in reference 5), from 1974 to 1991 to examine the etiology of gastroenteritis in infants and young children (5). One or more stool specimens, diaper scrapings, and/or rectal swab samples were collected from control cases (infants and young children hospitalized for reasons other than diarrhea) and from diarrhea cases (infants and young children hospitalized with diarrhea and/or vomiting of 4 days' duration or less). Samples were stored as undiluted stool samples or 2% fecal filtrate samples at −80°C (29). Samples collected from 1974 to 1981 and from 1987 to 1991 were available for analysis in the present study.
The study at the Children's Hospital National Medical Center in Washington, DC, was originally supported by several grants (including AI01528-17/10) and contracts (NIH-NIAID-71-2091) with the National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health (NIH). The NIH Institutional Review Board was notified of the research plan, and the samples were considered exempt from board approval for this study.
Stool samples were prepared as 10% suspensions in phosphate-buffered saline (PBS). Stool sample material contained in diaper scrapings and rectal swabs was resuspended in PBS. Available fecal filtrate samples were processed without further manipulation. For diagnostic purposes, the RNA present in 5,424 samples was extracted using Trizol following the manufacturer's instructions (Invitrogen, Carlsbad, CA). Further RNA extraction was required for samples used to sequence the complete norovirus genome as described previously (21). Briefly, 400 μl of the 10% dilution or filtration material was extracted in an equal volume of Genetron (1,1,2-trichloro-1,2,2-trifluorethane; Aqua Solutions, Deer Park, TX). The RNA present in 100 μl of the Genetron-extracted sample was further purified according to instructions in the RNeasy kit (Qiagen, Valencia, CA). The purified RNA was stored at −70°C.
A total of 5,424 samples were screened for the presence of norovirus GI and GII RNA using a multiplex real-time reverse transcription-PCR (RT-PCR) in a one-step format. Because of the sequence diversity among norovirus genotypes, only short stretches of conserved sequences were available for probe design. Two 13-nucleotide probes were designed on the basis of homologous sequences in the polymerase region within GI and GII sequences, respectively. Each probe was then conjugated to minor groove binding groups (MGB; Applied Biosystems, Foster City, CA) in order to increase the specificity and annealing temperature of the probes and to two different fluorophores, 5-carboxyfluorescein (FAM) and VIC, to allow for GI and GII discrimination in a single reaction (Applied Biosystems) (Table 1). The reaction was optimized for MgCl2, primer and probe concentration, and annealing temperature. The Brilliant II core one-step RT-PCR kit (Stratagene, La Jolla, CA) was used to prepare 800-sample reaction mix batches containing the following: 1× core RT-PCR buffer, 3 mM MgCl2, 200 μM of each deoxynucleoside triphosphate, 300 nM ROX reference dye, 1.25 U of SureStart Taq DNA polymerase, 1 μl of reverse transcriptase, 800 nM each of primer MGBGI_F and MGBGI_R, 500 nM primer MGBGII_F, 300 nM primer MGBGII_R, and 250 nM of each MGB probe (GI and GII; Table 1). Each reaction mix batch was tested by establishing a triplicate standard curve with known concentrations of plasmids containing the full-length genome of GI and GII noroviruses (20, 21), together with known positive RNA samples representing both genogroups. The reaction mix was then aliquoted in 1-ml portions and tested for stability after one freeze-thawing cycle. 
Finally, 5 μl of RNA was mixed with 20 μl of reaction mix and then incubated at 45°C for 30 min and then at 95°C for 10 min, followed by 45 cycles of PCR, with 1 cycle consisting of 30 s at 95°C, 1 min at 52°C, and 30 s at 72°C, in an ABI 7900 HT instrument (Applied Biosystems).
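The batch size and thermal profile above imply some simple bookkeeping, sketched here as a rough check. The constant names are illustrative and not from the study; the calculation assumes no pipetting overage and ignores instrument ramp times, so the real run time would be somewhat longer.

```python
# Back-of-the-envelope check of the batch and cycling figures quoted in the text.
# Constant names are illustrative; the values themselves come from the Methods.
REACTIONS_PER_BATCH = 800      # "800-sample reaction mix batches"
MIX_PER_REACTION_UL = 20       # 20 µl of reaction mix per 5 µl of RNA
ALIQUOT_UL = 1000              # mix was stored in 1-ml portions

# Assumption: no extra volume was prepared to cover pipetting losses.
batch_volume_ul = REACTIONS_PER_BATCH * MIX_PER_REACTION_UL
aliquots_per_batch = batch_volume_ul // ALIQUOT_UL
reactions_per_aliquot = ALIQUOT_UL // MIX_PER_REACTION_UL

# Thermal profile, in seconds.
RT_STEP_S = 30 * 60            # 45°C for 30 min (reverse transcription)
ACTIVATION_S = 10 * 60         # 95°C for 10 min
CYCLES = 45
CYCLE_STEPS_S = (30, 60, 30)   # 95°C, 52°C, 72°C

# Programmed hold time only; ramping between temperatures is not included.
programmed_minutes = (RT_STEP_S + ACTIVATION_S + CYCLES * sum(CYCLE_STEPS_S)) / 60

print(f"batch volume: {batch_volume_ul / 1000:.0f} ml in {aliquots_per_batch} aliquots")
print(f"reactions per 1-ml aliquot: {reactions_per_aliquot}")
print(f"programmed run time: {programmed_minutes:.0f} min")
```

Under these assumptions, one batch corresponds to 16 ml of mix, each 1-ml aliquot serves 50 reactions, and a plate takes a little over two hours of programmed hold time.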
Samples that showed an amplification signal in the multiplex real-time RT-PCR were further evaluated by a regular RT-PCR using primer pairs Mon431/Mon432 and Mon 433/Mon434 as previously described (21). Amplicons obtained from norovirus-positive stool samples were isolated in agarose gels and further purified with a gel extraction kit (Qiagen). The nucleotide sequence was determined directly from the purified DNA amplicon using the Big Dye terminator cycle sequencing ready reaction kit (Applied Biosystems), and the sequencing products were resolved on an ABI PRISM 3730 automated DNA sequencer (Applied Biosystems). Noroviruses were provisionally genotyped by submitting the partial polymerase sequence to a BLAST search (35).
Five out of eight GII.4-positive samples were selected for complete genome sequencing on the basis of the quality and quantity of the available RNA: Hu/NoV/GII.4/CHDC2094/1974/US (CHDC2094/1974), Hu/NoV/GII.4/CHDC5191/1974/US, Hu/NoV/GII.4/CHDC4871/1977/US, Hu/NoV/GII.4/CHDC4108/1987/US, and Hu/NoV/GII.4/CHDC3967/1988/US. The oldest GII.4 strains were CHDC2094/1974, collected at the end of the 1973 to 1974 winter season, and Hu/NoV/GII.4/CHDC5191/1974, collected at the beginning of the next winter season (1974 to 1975). Consensus sequences were obtained for each sample via primer walking every ~500 nucleotides. The 5′ and 3′ end sequences were obtained using 5′ or 3′ rapid amplification of cDNA end kits (Roche Applied Science, Indianapolis, IN) according to the manufacturer's instructions. The sequence fragments were aligned into a contig to obtain the final sequence with SeqMan v8.0 (Lasergene, Madison, WI). Complete genome sequences were determined independently two times for each sample: once from the RNA in the original Trizol extraction and once from RNA purified again from the original stool specimen by using the RNeasy kit as described above.
The five GII.4 genome sequences from this study were aligned to norovirus genome sequences available in the GenBank database that represented all norovirus genogroups (for a total of 56 norovirus genomes). Sequences were aligned using Clustal X 1.8, and alignment was manually edited with MegAlign v8.0 (Lasergene, Madison, WI).
The evolutionary analysis was performed on a total of 185 VP1 nucleotide sequences corresponding to GII.4 noroviruses from this study together with sequences available in the GenBank database as of August 2008 (see Table S1 in the supplemental material). The alignment was created and edited as described above, and the corresponding dates were assigned using the BEAUTi application, part of the BEAST package (14, 15).
A phylogenetic tree was inferred by maximum likelihood reconstruction based on the nucleotide alignment of the whole-genome sequences as implemented in the PhyML software (23). The statistical significance of phylogenies constructed was estimated by bootstrap analysis with 100 pseudoreplicate data sets. The tree was displayed with the Treeview program (37).
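The bootstrap step can be illustrated with a minimal sketch of how pseudoreplicate data sets are generated: alignment columns are resampled with replacement until the replicate has the same length as the original. This is not the PhyML implementation; the function name and the toy alignment are hypothetical, and tree inference on each replicate is left out.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one pseudoreplicate: alignment columns resampled with replacement."""
    n_sites = len(next(iter(alignment.values())))
    picked = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in picked) for name, seq in alignment.items()}

# Hypothetical toy alignment (not data from the study).
toy_alignment = {
    "strain_A": "ATGAAGATGGCG",
    "strain_B": "ATGAAGATGGCA",
    "strain_C": "ATGAAAATGGCG",
    "strain_D": "ATGAGGATGACG",
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(100)]

# Each replicate would then be passed to the tree-building program, and the
# support for a clade is the fraction of replicate trees that contain it.
print(len(replicates), "pseudoreplicates of length", len(replicates[0]["strain_A"]))
```

Because whole columns are resampled, each pseudoreplicate keeps the sequences aligned to one another while varying which sites contribute to the tree.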
The parameter values for the best-fit model of nucleotide substitutions were determined using the Akaike information criterion as implemented in MODELTEST (41). Rates of nucleotide substitution per site and the time to the most recent common ancestor (MRCA) were estimated using Bayesian Markov chain Monte Carlo as implemented in the BEAST software package (14, 15). The model of substitution was the general time-reversible model, and the data set was partitioned by codon positions. Since no demographic model is available for organisms showing cyclic annual/seasonal behavior, the Bayesian Skyline model for population growth was selected (16). BEAST files were run three times each (for a total of 9 runs), either assuming a constant rate of evolution across the tree (constant molecular clock) or using relaxed-clock models (lognormal and exponential), which model a molecular rate that varies among lineages (13). In all cases, statistical uncertainty in parameter values across the sampled trees is given by the 95% highest posterior density values. For all models, chains were run until convergence was achieved (as assessed using the TRACER program, part of the BEAST package). The BEAST analysis was also used to infer a maximum clade credibility (MCC) tree, where the branch length was calibrated to reflect temporal patterns.
Site-specific positive selection was also evaluated as rates of nonsynonymous or synonymous substitutions among 185 sequences present in the GII.4 alignment. Our analysis was performed using the single likelihood ancestor counting method as available in the HyPhy package (40). The general time reversible substitution model was chosen, and a neighbor-joining tree was used as an input, with a limit of significance level of 0.25. The presence of recombination was evaluated using the Recombination Detection Program (34).
The amino acid differences between Hu/NoV/GII.4/CHDC2094/1974/US and VA387 were inserted into the pdb files for the corresponding P-domain structure (2obr: crystal structure of the P domain of norovirus VA387; 2obt: crystal structure of the P domain of norovirus VA387 in complex with blood group trisaccharide type B) (8). The new structure files were loaded and prepared for minimization using SYBYL8.0 (Tripos International, St. Louis, MO). Protein preparation for minimization consisted of (i) adding AMBER7 ff99 atom types to all atoms (for standard residues, SYBYL8.0 used the AMBER7 ff99 parameter file; for the polysaccharides, SYBYL8.0 used an atom-typing algorithm to assign AMBER7 ff99 atom types), (ii) adding AMBER7 ff99 charges to the protein and Gasteiger-Marsili charges to the polysaccharides, and (iii) optimizing H-bonding of side chain amides.
The modified protein files were then minimized using the Powell conjugate gradient method with the AMBER 7 ff99 force field and terminated at a gradient of 0.05 kcal/mol (SYBYL8.0). Minimization was done with an effective dielectric of ~32 to reproduce the shielding of charges. A cutoff of 8 Å for nonbonded interactions was used.
In order to express recombinant virus-like particles (rVLPs), the ORF2 and ORF3 genes of norovirus strains Hu/NoV/GII.4/CHDC5191/1974/US (nucleotides 5085 to 7510), Hu/NoV/GII.4/CHDC4871/1977/US (nucleotides 5085 to 7510), Hu/NoV/GI.1/Norwalk virus/1968/US [nucleotides 5358 to 7654 plus poly(A)10], and Hu/NoV/GII.1/Hawaii/1971/USA (nucleotides 5085 to 7471) were RT-PCR amplified and cloned into either the FastBac or the Gateway vector, pENTR (Invitrogen). Recombination of plasmid DNA with baculovirus DNA was performed using the Baculodirect kit (Invitrogen), and a baculovirus stock was obtained following transfection of the recombination product into Sf-9 cells as recommended by the manufacturer (serum-free SF9 cells; Invitrogen). The baculovirus stock was used to infect Sf-9 suspension cultures for VLP production. Culture medium from baculovirus-infected cells was layered onto a 25% sucrose cushion and subjected to centrifugation in a SW28 rotor at 76,200 × g for 4 h at 4°C (22). The resulting pellets were dissolved in PBS (pH 7.5) and further purified through a cesium chloride (CsCl) gradient by centrifugation in a SW55 rotor at 218,400 × g overnight. The collected fractions (~1.3 g/ml density) were dialyzed against PBS, and the protein concentration was determined with a commercial Bradford assay kit (Pierce, Rockford, IL). The yields ranged from 2 to 3 mg of each VLP/150 ml of cell culture infected at a multiplicity of infection of ≥5. The presence of VLPs was confirmed by electron microscopy.
The rVLPs were screened for binding to a panel of HBGA-associated oligosaccharides as described elsewhere (8, 32). Briefly, 96-well plates (polyvinyl microtiter plates, Dynatech, Chantilly, VA; neutravidin-coated plates, Pierce) were coated for 2 h at room temperature with either human serum albumin coupled to specific synthetic oligosaccharides or synthetic oligosaccharide-biotin conjugates containing human HBGA epitopes at a concentration of 20 and 10 μg/ml, respectively. After blocking with 5% Blotto, 1.25 μg/ml of norovirus rVLPs was added and incubated for 1 h at room temperature. The binding of captured rVLPs was determined by incubation with genotype-specific hyperimmune sera generated in rVLP-immunized guinea pigs (1:10,000 dilution), followed by incubation with a peroxidase-conjugated goat anti-guinea pig serum (KPL, Gaithersburg, MD). The peroxidase substrate 2,2′-azinobis(3-ethylbenthiazolinesulfonic acid) (ABTS) (KPL) was used, and plates were read at 405 nm in a Dynex Technologies Revelation 4.25 plate reader (Dynatech). The oligosaccharides examined in this study were biotin conjugates type A, type B, Lea, Leb, Lex, Ley, H type 1, H type 3, and human serum albumin conjugate H type 2 (GlycoTech Corporation, Rockville, MD).
The complete genome sequences were submitted to the GenBank database under the following accession numbers: FJ537134 (Hu/NoV/GII.4/CHDC5191/1974/US), FJ537135 (Hu/NoV/GII.4/CHDC2094/1974/US), FJ537136 (Hu/NoV/GII.4/CHDC3967/1988/US), FJ537137 (Hu/NoV/GII.4/CHDC4108/1987/US), and FJ537138 (Hu/NoV/GII.4/CHDC4871/1977/US).
A total of 5,424 stool samples from infants and young children hospitalized with gastroenteritis or other conditions (controls) in the CHDC study were screened for the presence of norovirus RNA by a multiplex one-step real-time RT-PCR designed to detect GI and GII noroviruses (for primer and probe sequences, see Table 1). Three hundred five (5.6%) samples yielded a positive signal in the real-time RT-PCR, but only 50 samples could be confirmed as norovirus positive by RT-PCR amplification of a portion of the polymerase gene and direct sequence analysis of the PCR products. It was possible that long-term storage or multiple freeze-thaw cycles of the samples decreased the efficiency of the diagnostic RT-PCR, and an accurate assessment of the incidence of norovirus-associated illness was not established in this study.
The norovirus-positive stool samples were obtained from children hospitalized with diarrhea, with the exception of two strains detected in stool specimens from the two control cases hospitalized for other conditions. Rotavirus was not detected in the norovirus-positive samples by RT-PCR (data not shown). Sequence analysis of norovirus-specific RT-PCR products showed evidence for the presence of several genotypes in the collection, with the distribution as follows: GII.3 (48%), GII.4 (16%), GII.7 (14%), GII.6 (6%), GII.1 (4%), GI.3 (4%), GII.2 (2%), and GII.8 (2%) (Fig. 1). The viruses from the two control cases belonged to GII.3, and two samples (4%) could not be assigned to a known genotype, although they clustered most closely with human GII.3 noroviruses (data not shown). These data indicate that GII noroviruses have likely been the predominant genogroup for several decades in the human population and in addition show that GII.4 noroviruses were associated with disease as early as 1974.
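As a consistency check, the reported percentages can be converted back into sample counts. The sketch below assumes that all percentages are fractions of the 50 sequence-confirmed samples; the dictionary keys and variable names are illustrative.

```python
# Convert the reported genotype percentages into sample counts.
# Assumption: every percentage refers to the 50 sequence-confirmed samples.
CONFIRMED_SAMPLES = 50

reported_percent = {
    "GII.3": 48, "GII.4": 16, "GII.7": 14, "GII.6": 6,
    "GII.1": 4, "GI.3": 4, "GII.2": 2, "GII.8": 2,
    "unassigned": 4,  # two samples that clustered nearest to GII.3
}

# Integer division is exact here because every product is a multiple of 100.
counts = {g: pct * CONFIRMED_SAMPLES // 100 for g, pct in reported_percent.items()}

typed_gii = sum(n for g, n in counts.items() if g.startswith("GII."))
typed_gi = sum(n for g, n in counts.items() if g.startswith("GI."))

print(counts)
print("typed GII:", typed_gii, "typed GI:", typed_gi, "total:", sum(counts.values()))
```

Under that assumption the counts sum to 50, and the 16% GII.4 share corresponds to 8 samples, matching the eight GII.4-positive samples from which five were chosen for genome sequencing.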
The detection of GII.4 noroviruses in the archival CHDC specimens allowed the evaluation of five new complete GII.4 genome sequences for comparison with those of approximately 50 noroviruses representing GI (human), GII (human and swine), GIII (bovine), and GV (murine) strains. A complete genome sequence for human and feline noroviruses in GIV was not available in the public database. The CHDC GII.4 viruses clustered most closely with other GII.4 strains, confirming that they belong to this genotype (Fig. 2). The CHDC samples isolated in the 1970s clustered as the ancestors of all GII.4 viruses detected subsequently, with CHDC2094/1974 representing the oldest GII.4 virus characterized thus far. In addition, the two CHDC samples isolated during the 1980s (CHDC4108/1987 and CHDC3967/1988) grouped together with cocirculating noroviruses in that time period, such as the MD145-12 virus, associated with a 1987 gastroenteritis outbreak in a nearby Maryland nursing home (21). Norovirus sequences showed an average of 20% and not more than 40% nucleotide diversity within a genotype, while differences between genogroups were over 40% (measured as percent nucleotide distances [data not shown]).
One hundred eighty-five GII.4 VP1 sequences from noroviruses spanning a 34-year period were submitted to an evolutionary analysis (see Table S1 in the supplemental material). A Bayesian coalescent method was used to infer the rate of evolutionary change expressed as nucleotide substitutions per site per year, using three different molecular clock models (Table 2). The most conservative method (strict clock) calculated that GII.4 noroviruses evolved at a rate of 4.3 × 10⁻³ nucleotide substitutions/site/year. This rate of evolution was comparable to the relaxed-clock estimations (uncorrelated exponential deviation and uncorrelated log normal), varying from 4.4 × 10⁻³ to 6.4 × 10⁻³ nucleotide substitutions/site/year when considering the highest posterior density intervals. The age of the analyzed population was also estimated using the same Bayesian approach, expressed as the time to the most recent common ancestor. The mean age of the GII.4 population was 40 to 41 years considering the three molecular clock estimations. The strict-clock estimation dated the ancestor to 1966, which was similar to that of the relaxed-clock model, 1967.
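To give a sense of scale for these estimates, the following sketch converts the strict-clock rate into expected change along a single lineage and converts the population age into a calendar year. It is simple arithmetic, not a reanalysis; it assumes that ages are counted back from the most recent sampling year (2007) and that VP1 spans 540 codons, and the variable names are illustrative.

```python
# Rough scale of the strict-clock estimate; illustrative names, values from the text.
RATE_SUBS_PER_SITE_PER_YEAR = 4.3e-3
SAMPLING_SPAN_YEARS = 34          # "a 34-year period", as stated in the text
VP1_CODONS = 540                  # VP1 length in amino acids
LAST_SAMPLING_YEAR = 2007         # assumption: ages are measured back from 2007

# Expected substitutions per site accumulated along one lineage over the span.
subs_per_site_over_span = RATE_SUBS_PER_SITE_PER_YEAR * SAMPLING_SPAN_YEARS

# Expected substitutions per year across the VP1 coding region of one lineage.
vp1_nucleotides = VP1_CODONS * 3
vp1_subs_per_year = RATE_SUBS_PER_SITE_PER_YEAR * vp1_nucleotides

# Calendar year of the most recent common ancestor for ages of 40 and 41 years.
mrca_years = [LAST_SAMPLING_YEAR - age for age in (40, 41)]

print(f"{subs_per_site_over_span:.3f} substitutions/site over {SAMPLING_SPAN_YEARS} years")
print(f"about {vp1_subs_per_year:.1f} VP1 substitutions per lineage per year")
print("MRCA year range:", min(mrca_years), "to", max(mrca_years))
```

Under these assumptions a lineage accumulates roughly 0.15 substitutions per site over the sampled period, or about seven nucleotide changes in VP1 per year, and ages of 40 to 41 years map onto 1966 to 1967.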
On the basis of the evolutionary analysis of the GII.4 noroviruses, an MCC tree was inferred, showing the phylogenetic relationship of capsid sequences (Fig. 3). The MCC tree showed the CHDC GII.4 sequences from the 1970s forming a cluster that likely evolved into the GII.4 clusters previously described by Lindesmith et al. (32) that are indicated in the tree. Camberwell is the cluster with the highest intra-amino acid and nucleotide divergence value (2.8% and 7%, respectively), followed by the CHDC, Den Haag, and Hunter clusters (4.7%, 3.3%, and 3.2% nucleotide distance, respectively) (Fig. 3). The largest sequence clusters, Grimsby and Farmington Hills, showed the most conserved nucleotide distance values (2.5% and 0.8%, respectively) and minimal amino acid variation within each cluster (1.1% and 0.4%, respectively). The previous assignment of Grimsby and Farmington Hills to independent clusters might simply be due to the larger numbers of outbreak strain sequences available for analysis at that time because of the predominance of these viruses in a period of high disease prevalence and not to the actual presence of two separate clusters (32).
The VP1 amino acid sequences from the five CHDC GII.4 noroviruses in this study were aligned with representative viruses from the other GII.4 clusters (spanning 1974 to 2005). Seventy-three variable sites were detected, representing 13.5% of the total VP1 amino acid sequence of 540 residues, and their locations in the structure of the capsid protein are shown in Fig. 4. Approximately half the variable sites (36 positions) were located in the P2 region of the capsid. Fifteen amino acid residues (indicated by an asterisk in the sequence alignment) were present in one or more CHDC viruses and were not present in viruses from subsequent years. Of these, four consecutive amino acids between positions 292 and 295 (RVGI) represented a conserved sequence that was found exclusively in the CHDC strains isolated in the 1970s. The RVGI sequence was replaced with the sequence HIVG after 1978 and has undergone further evolution at residue 294. An analysis of the VP1 amino acid residues with evidence for positive selection pressure identified four residues in the shell domain (amino acids [aa] 6, 9, 15, and 47), one in the P1 domain (aa 534), and one in the P2 region (aa 395). Four of these residues (aa 6, 9, 395, and 534) were identical to those identified previously by Lindesmith et al. (32). The CHDC GII.4 noroviruses lacked an amino acid insertion at position 394, first detected in GII.4 noroviruses of the Farmington Hills cluster (11). According to a recombination detection analysis, none of the CHDC sequences showed evidence for recombination in the polymerase-VP1 region with other genotypes (7).
A comparable rate of amino acid substitutions was observed in an alignment of sequences corresponding to the minor structural protein, VP2 (Fig. 5). The highest number of overall variations was observed between CHDC VP2 sequences and those of the clusters described in the VP1 phylogeny (26.5% [71/268 aa]). Four sites (aa 4, 155, 158, and 187) likely evolved under positive selection pressure. CHDC2094/1974 presented unique amino acid sequences in approximately half the variable sites. A possible recombination point was identified immediately upstream of the ORF3 at nucleotide 6564 in CHDC2094/1974 by the recombination detection program. The possibility of intragenotypic recombination between ORF2 and ORF3 had been previously described for GII.4 noroviruses (28).
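The variable-site counts for VP1 and VP2 come from column-wise comparison of aligned protein sequences. A minimal sketch of that idea follows; the helper functions are hypothetical and are not the software used in the study. The only sequence data used are the four-residue motifs at VP1 positions 292 to 295 reported in the text (RVGI in the 1970s strains, HIVG after 1978, HIAG in VA387).

```python
def variable_sites(sequences, first_position=1):
    """Return alignment positions at which the sequences are not all identical."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [
        first_position + i
        for i, column in enumerate(zip(*sequences))
        if len(set(column)) > 1
    ]

def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Motifs at VP1 positions 292-295 as reported in the text.
all_motifs = variable_sites(["RVGI", "HIVG", "HIAG"], first_position=292)
later_motifs = variable_sites(["HIVG", "HIAG"], first_position=292)

# Summary fractions quoted in the Results.
vp1_variable_pct = percent(73, 540)   # variable VP1 sites
p2_share_pct = percent(36, 73)        # share of variable sites located in P2
vp2_variable_pct = percent(71, 268)   # variable VP2 sites

print(all_motifs, later_motifs)
print(vp1_variable_pct, p2_share_pct, vp2_variable_pct)
```

Comparing only the two later motifs isolates position 294, in line with the statement that the HIVG sequence underwent further evolution at that residue. The computed fractions reproduce the reported 13.5% and 26.5%, and 36 of 73 is just under half.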
The lower number of full-length genomic sequences for GII.4 noroviruses hampered detailed evolutionary analysis of ORF1. The same positive selection detection algorithm was employed to analyze the GII.4 whole-genome sequences available. The single likelihood ancestor counting analysis indicated that amino acid residues 79 (NS1Nterm), 773 (NS4p20), 846 (NS4p20), and 1091 (NS5Pro) evolved under positive selection. The significance of variation at these residues (if any) will require further investigation of norovirus nonstructural protein structure and function.
It has been suggested that noroviruses bind HBGA molecules as part of the mechanism that mediates viral entry into the epithelial cells of the gastrointestinal tract (25). The structural basis of this recognition has been recently described through the cocrystallization of a GII.4 strain (VA387) with trisaccharides A and B. The first interaction site was located in the P2 region of the capsid, where the trisaccharide interacted via hydrogen bonding with the P2 domain of the viral capsid (aa 343, 344, 345, 374, 441, 442, and 443) (8). A second site also located in the P2 domain was thought to stabilize the interaction between the ligands (aa 390, 391, 392, 393, and 395). The interaction sites remained the same either with trisaccharide A or B (8). The amino acid substitutions unique to the oldest CHDC sequence (CHDC2094/1974) were modeled onto the solved capsid protein structure of the GII.4 norovirus complexed with trisaccharide B (VA387 [Fig. 6A]). With the exception of positions 534, 537, and 539, which were not resolved in the original particle structure, all CHDC2094 substitutions were found to be surface exposed, but not directly adjacent to the mapped HBGA binding sites (8). The unique RVGI sequence formed a protrusion from the P region neighboring HBGA binding site 1. The difference in conformation and the relationship between the RVGI protrusion and the trisaccharide B binding site is illustrated in Fig. 6B, in which the two predicted protein structures (VA387 and CHDC2094) are superimposed. The RVGI sequence does not appear to overlap with the HBGA binding site, but it does show a change in structure from the HIAG sequence present in VA387.
Several studies have proposed that human noroviruses are under selective pressure for binding to a wide diversity of HBGA-containing receptors in the gut (12, 32). In order to compare the HBGA binding properties of the GII.4 historical samples with those of more recently circulating strains, we examined the binding of rVLPs derived from the oldest GII.4 norovirus strains to a panel of HBGA oligosaccharides. rVLPs were generated for the CHDC5191/1974 and CHDC4871/1977 noroviruses and studied in direct HBGA binding assays (Fig. 7). Norwalk virus (GI.1) and Hawaii virus (GII.1) rVLPs were included as controls. The rVLPs from both archival GII.4 viruses exhibited binding patterns similar to those described previously for the Camberwell GII.4 cluster (32), recognizing H type 3, Ley, and B antigens. These results confirmed that the archival norovirus carbohydrate binding properties were conserved over time. In addition, these data indicate that the unique RVGI sequence does not affect the carbohydrate binding pattern.
Noroviruses are increasingly recognized as important pathogens of gastroenteritis worldwide, but the mechanisms responsible for the emergence of successful predominant strains are poorly understood (24, 27). A one-tube multiplex real-time RT-PCR assay was developed for a large-scale analysis of RNA extracted from samples obtained up to 34 years ago from children hospitalized with diarrhea. This study reports identification of the earliest known strains of several predominant norovirus genotypes belonging to genogroups I and II. The diversity of these viruses and their ancestral relatedness to contemporary noroviruses suggest that several norovirus genotypes, including GII.4, have been circulating concurrently for at least the last few decades.
The GII.4 genotype has been the predominant pathogen in global gastroenteritis outbreaks among the 25 or more norovirus genotypes associated with human disease. It has been postulated that the GII.4 Camberwell cluster noroviruses emerged in the 1980s, when these viruses encountered a naïve population that allowed them to establish a niche in individuals that expressed H type 3 and Ley antigens (12, 32). Proposed mechanisms for the persistence of the GII.4 noroviruses in the human population have included receptor switching and antigenic variation or a combination of the two (12). Our study showed that GII.4 noroviruses were circulating and associated with severe illness more than a decade prior to the emergence of the Camberwell cluster and that they already had the ability to bind H type 3 and Ley antigens, a property that remained intact until the appearance of the Farmington Hills cluster. This finding argues against a major shift in the HBGA binding profile of GII.4 noroviruses in the 1980s. We further examined the mechanisms by which these viruses may have evolved into the present GII.4 clusters by structural modeling. An evolutionary trend analysis was performed on the basis of current structural knowledge of the HBGA binding sites in the viral capsid (8, 47, 48). The unique amino acid residues that were present only in sequences from the 1970s were all located in surface-exposed areas of the P particle that have not been implicated in carbohydrate binding. Moreover, the HBGA binding sites and surrounding amino acids have remained constant since the 1970s, with the exception of positions 393 to 395 that showed variation in 1997, when the Camberwell cluster was replaced with the Grimsby cluster (11). Tan et al. have performed mutational analysis of the HBGA binding sites of VA387 (47, 48) and demonstrated that binding of VA387 to HBGAs could be inhibited by mutation of position 338 from Thr to Ala, which also was unchanged in the archival GII.4 samples. 
In addition, they examined amino acids in the capsid that are sterically close and likely involved in interaction with the HBGAs and found that changes in certain positions (aa 331, 348, 346, and 389) altered the binding pattern (47, 48). Comparison of the CHDC VP1 sequences with these critical HBGA binding residues showed that these amino acids have remained conserved over the decades, with the exception of position 389 where an isoleucine was replaced with valine in the Farmington Hills and Hunter clusters, only to revert to isoleucine in the Sakai cluster. This observation suggests that the HBGA binding sites remained constant, but it does not rule out a role for evolution at other sites that could affect the overall conformation of binding to carbohydrate ligands or the receptor. In addition, a major limitation in the study of norovirus evolution is the inability to analyze functional neutralization sites in the VP1 protein. When these assays become available, it may be possible to visualize the mechanisms of immune selection in light of the modeled structural changes described in this study.
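The residue positions cited in this discussion can be cross-referenced against the two mapped HBGA binding sites with a few set operations. The sketch below is illustrative only: the site residues and the variable positions (389 and 393 to 395) are taken from the text, whereas the function name `classify_position` and the one-residue `window` are hypothetical choices, and adjacency in the primary sequence is not a measure of structural proximity.

```python
# HBGA binding site residues as listed in the text for VA387 (reference 8).
SITE_1 = {343, 344, 345, 374, 441, 442, 443}
SITE_2 = {390, 391, 392, 393, 395}
SITES = (("site 1", SITE_1), ("site 2", SITE_2))


def classify_position(pos, window=1):
    """Label a VP1 residue relative to the two mapped HBGA binding sites.

    `window` is a hypothetical cutoff counted in primary-sequence positions;
    it is not a structural distance.
    """
    # First pass: the residue is itself part of a mapped site.
    for name, site in SITES:
        if pos in site:
            return name
    # Second pass: the residue lies within `window` positions of a site residue.
    for name, site in SITES:
        if any(abs(pos - s) <= window for s in site):
            return "flanking " + name
    return "outside"


# Positions reported as variable in the discussion: 389 and 393 to 395.
for pos in (389, 393, 394, 395):
    print(pos, classify_position(pos))
```

Under these assumptions, positions 393 and 395 fall within site 2, while 389 and 394 flank it in the primary sequence, which is consistent with the statement that variation near the binding sites was limited to this stretch.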
A key component in establishing evolutionary mechanisms of the noroviruses is knowledge of the rate at which noroviruses generate genetic diversity that leads to fixed mutations in the viral population. Most RNA viruses evolve at a rate of approximately 10^-3 nucleotide substitutions/site/year (17). We found that the capsid region of GII.4 noroviruses evolved at a comparable rate over the last 34 years, which might have allowed the virus to generate replacement clusters within the same genotype and constantly evade the immune response. However, in many viruses, an immune-driven evolution pattern correlates with the presence of positively selected sites in exposed regions of the capsid due to the selective pressure of neutralizing antibodies (44). Of interest, four of the six positively selected positions of the capsid protein identified in our study were located in the S domain of the capsid, while only one position (aa 395) was found in the P2 region exposed on the surface of the virus. Amino acid residue 395 was also identified as under positive selection by Lindesmith et al. (32) in an independent analysis. Considering the recognized role of antibody pressure in the evolution of many RNA viruses (2, 18), it was striking that so few positively selected sites were observed on the predicted GII.4 capsid surface over the 34-year period. Little is known about norovirus immunity, although antibodies do not correlate with resistance to illness in adult volunteers challenged with Norwalk virus, and immunity is at best short term (34). The role of immune-driven evolution in the emergence and persistence of the GII.4 noroviruses may be complex, and factors beyond (but not ruling out) antibody pressure may be important.
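To give a sense of scale for the rate discussed above, the following sketch multiplies the VP1 rate reported in the abstract (4.3 × 10^-3 substitutions/site/year) by the 34-year sampling span. The linear product is only a rough expectation along a single lineage. The Jukes-Cantor correction used to account for repeated substitutions at the same site is an assumption introduced here for illustration and is not the model used in the study; the function names are likewise hypothetical.

```python
import math

RATE = 4.3e-3   # substitutions/site/year, VP1 rate reported in the abstract
YEARS = 34      # sampling span cited in the text


def expected_divergence(rate, years):
    """Expected substitutions per site accumulated along one lineage."""
    return rate * years


def jukes_cantor_observed(d):
    """Proportion of sites expected to differ after d substitutions/site.

    Uses the Jukes-Cantor model, which is an illustrative assumption here.
    """
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


d = expected_divergence(RATE, YEARS)
p = jukes_cantor_observed(d)
print(f"expected substitutions/site over {YEARS} years: {d:.4f}")
print(f"expected proportion of differing sites (Jukes-Cantor): {p:.4f}")
```

The product is about 0.146 substitutions per site, and the corrected proportion of differing sites is slightly lower (about 0.13) because some sites are expected to change more than once.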
The most striking difference between the CHDC GII.4 sequences from the 1970s and the Camberwell cluster from the 1980s was the presence of a conserved 4-aa RVGI sequence in the archival strains that was replaced with the sequence HIVG. The RVGI sequence resulted in a detectable change in the predicted surface of the capsid, creating a unique protrusion. However, the presence of this predicted protrusion did not affect the HBGA binding pattern of the archival noroviruses compared to those of the Camberwell cluster, suggesting that other selective pressures may be involved in the evolution of GII.4 noroviruses. Additional studies will be needed to establish whether the GII.4 cluster replacement in the 1980s was associated with a change in this exposed 4-aa sequence.
The study of norovirus evolution has focused primarily on phylogenetic inference from the coding sequence of the VP1 protein (9, 26, 45). The VP2 protein, a minor component of the capsid, has been less studied (10, 28). The VP2 protein is essential for generating infectious particles during feline calicivirus replication, and its presence confers greater stability to recombinant norovirus VLPs (3, 46). The VP2 protein of the CHDC noroviruses showed greater amino acid variation than VP1 since the 1970s, with amino acid substitutions distributed across the entire VP2 sequence. No conclusions could be drawn about the locations of these mutations in the viral particle because a norovirus VP2 structure is not yet available. One possibility is that the high degree of diversity in this region might reflect a role for VP2 in population adaptation by improving viral fitness as the mutations become fixed in the VP1 sequence. Alternatively, VP2 may itself have a role in replication subject to selective pressure.
Several epidemiological studies have reported the use of genetic markers to track the emergence of new GII.4 norovirus variants that were responsible for an increase in the number of outbreaks diagnosed (1, 33), and it was of interest to examine whether these genetic markers existed in the archival strains. Lopman et al. detected an unusual set of mutations that they noted as present only in GII.4 polymerase sequences analyzed worldwide after 2002 (33). The “new variant” changed from AACTTG to AATCTG during that year, and it was proposed that the AATCTG sequence might be used as a genetic marker for the shift in prevalence, although the amino acid sequence (NL) remained the same. The analysis of our archival specimens showed that the CHDC4108/1987 genome contained the AATCTG set of mutations, suggesting that this sequence was present in viruses circulating several years prior to 2002. The three samples sequenced from the 1970s contained GATCTC at this site, which results in one nonsynonymous amino acid substitution (DL). Allen et al., in an analysis of the amino acid variation in the P2 domain of GII.4 capsid sequences, identified two sites (aa 296 to 298 and aa 393 to 395) as “hot spots” that impacted the biochemical properties of P2 and changed between epidemic waves (1). Both sites remained unchanged from 1974 until 1997, when the Grimsby cluster became predominant, indicating that these hot spots were not involved in earlier cluster replacements. These two examples show that the inference of new genetic markers based on limited sequencing data should be interpreted with caution. On the other hand, the archival norovirus analysis confirmed that some predictions were consistent over time. A study by Phan et al. described a classification system for noroviruses circulating in children in Japan (39), where four amino acids at positions 101 to 104 of the S domain and two amino acids at positions 522 to 523 of the C terminus represented an identification code specific for each genotype of virus. The GII.4 CHDC samples shared the “NGYA-NQ” signature of this genotype, showing that this genetic marker remained consistent for over 3 decades.
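The polymerase marker comparison described above (AACTTG, AATCTG, and GATCTC) can be checked by translating each hexamer and classifying the codon changes. The sketch below is a minimal illustration: it assumes the reading frame begins at the first nucleotide of each hexamer, which is consistent with the NL and DL translations given in the text, and the function names `translate` and `compare` are hypothetical.

```python
# Standard genetic code, restricted to the six codons found in the marker sequences.
CODON_TO_AA = {
    "AAC": "N", "AAT": "N",              # asparagine
    "GAT": "D",                          # aspartic acid
    "TTG": "L", "CTG": "L", "CTC": "L",  # leucine
}


def translate(nt):
    """Translate a nucleotide string in frame, starting at its first base."""
    return "".join(CODON_TO_AA[nt[i:i + 3]] for i in range(0, len(nt), 3))


def compare(ref, alt):
    """Classify each codon change between two equal-length, in-frame sequences."""
    changes = []
    for i in range(0, len(ref), 3):
        r, a = ref[i:i + 3], alt[i:i + 3]
        if r == a:
            kind = "identical"
        elif CODON_TO_AA[r] == CODON_TO_AA[a]:
            kind = "synonymous"
        else:
            kind = "nonsynonymous"
        changes.append((r, a, kind))
    return changes


for label, seq in [("before 2002", "AACTTG"),
                   ("new variant and CHDC4108/1987", "AATCTG"),
                   ("1970s CHDC strains", "GATCTC")]:
    print(label, seq, translate(seq))

print(compare("AACTTG", "AATCTG"))  # both codon changes are synonymous
print(compare("AATCTG", "GATCTC"))  # first codon changes N to D
```

Under this frame assumption, the change from AACTTG to AATCTG leaves NL intact, whereas the 1970s sequence GATCTC encodes DL, matching the single nonsynonymous substitution noted in the text.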
In summary, this study provides insight into the complex evolutionary patterns of GII.4 noroviruses since 1974. We report the first estimate of the evolutionary rate of the GII.4 norovirus capsid, based on an analysis that includes the oldest available strains that were circulating at the time of the discovery of noroviruses. This study demonstrates that GII.4 noroviruses emerged prior to the 1980s and suggests that the ability of GII.4 noroviruses to predominate over time may involve unrecognized virus and host interactions beyond antibody selection and HBGA binding properties. Efforts are in progress to identify norovirus strains in even earlier archival collections (1960s) and to compare the evolutionary characteristics of the other genotypes identified in the CHDC study with those of the GII.4 noroviruses. The success of a potential norovirus vaccine or antiviral drug may depend on our understanding of the evolutionary dynamics that allow the GII.4 genotype to persist over time.
We thank Etsuko Utagawa, Norma Santos, and Taka Hoshino (Laboratory of Infectious Diseases, NIAID) for their contributions to this study. We also thank Michael Dolan, Ana P. Goncalvez, and German Añez-Gutierrez for their assistance with structure modeling. We also thank Lashanda Long-Croal for assistance in managing the RNA samples and the database and the Children's Hospital National Medical Center Research Group led by H. W. Kim, which originally obtained the specimens and the clinical data (5).
This research was supported by the Intramural Research Program of the NIH, NIAID.
Published ahead of print on 16 September 2009.
†Supplemental material for this article may be found at http://jvi.asm.org/.