Human noroviruses (NoVs) of genogroup II, genotype 4 (GII.4) are the most common strains detected in outbreaks of acute gastroenteritis worldwide. To gain insight into the epidemiology and genetic variation of GII.4 strains, we analyzed 773 NoV outbreaks reported to the CDC from 1994 to 2006. Of these NoV outbreaks, 629 (81.4%) were caused by GII viruses and 342 (44.2%) were caused by GII.4 strains. The proportion of GII.4 outbreaks increased from 5% in 1994 to 85% in 2006, but distinct annual differences were noted, including sharp increases in 1996, 2003, and 2006 each associated with newly emerging GII.4 strains. Sequence analysis of the full-length VP1 gene of GII.4 strains identified in this study and from GenBank segregated these viruses into at least 9 distinct subclusters which had 1.3 to 3.2% amino acid variation between strains in different subclusters. We propose that GII.4 subclusters be defined as having >5% sequence variation between strains. Our data confirm other studies on the rapid emergence and displacement of highly virulent GII.4 strains.
Noroviruses (NoVs), members of the genus Norovirus of the family Caliciviridae (13), are the leading cause of acute gastroenteritis (AGE) worldwide. In the United States, 81 to 96% of the nonbacterial AGE outbreaks with fecal specimens sent to the Viral Gastroenteritis Unit at the Centers for Disease Control and Prevention (CDC) were attributable to NoVs (4, 11, 12). In Europe, surveillance data from 10 countries demonstrated that >85% of all AGE outbreaks were associated with NoVs (22), and in Japan, NoV outbreaks were responsible for 93 to 97% of all viral gastroenteritis outbreaks (Infectious Disease Surveillance Center [IDSC], National Institute of Infectious Diseases [NIID] website http://idsc.nih.go.jp/iasr/iasrcnt-e.html#srsv-e).
In the United States, NoVs account for an estimated 30 to 50% of all food-borne outbreaks (7a, 47). Norovirus outbreaks occur in a wide variety of settings year-round but are particularly common and protracted in the winter months in closed settings (e.g., hospitals and long-term care facilities [LTCFs]) where transmission is predominantly from person to person. Immunity to NoVs is poorly understood due to the lack of a cell culture system to measure neutralizing antibodies (9). Data from volunteer challenge studies suggest that immunity to NoVs is short-term, and reinfections with heterologous and homologous strains have been documented (13). However, these conclusions are suspect because the challenge dose of virus used may have been 4 to 6 log₁₀ units higher than the infectious dose (39).
NoVs have a 7.5- to 7.7-kb single-stranded genome of positive-sense RNA which contains three open reading frames (ORFs). ORF2 codes for the major capsid protein (VP1). NoVs are classified into five distinct genogroups (genogroup I [GI] to genogroup V [GV]) on the basis of VP1 sequencing analysis. These genogroups are further subdivided into at least 32 genetic clusters (13, 24, 44, 48). Of these, GI, GII, and GIV strains have been detected in humans, and viruses belonging to GII, genetic cluster or genotype 4 (GII.4) have emerged as the predominant strain over the last decade (4, 5, 10, 14, 15, 21, 23, 28, 33, 43, 46). The global distribution of a distinct GII.4 strain was first recognized in 1995 and 1996 (28). Between 1997 and 2002, NoV activity was moderate to low with different strains cocirculating without a distinct epidemic strain (6, 22). In 2002, 2004, and 2006, NoV outbreaks increased sharply and were associated with the emergence of new GII.4 strains (5, 7, 21, 46).
To understand the epidemiologic patterns of GII.4 outbreaks in the United States and to determine the genetic variation of GII.4 NoV strains over time, we systematically analyzed NoV outbreaks and specimens reported to the CDC from 1994 to 2006.
(This work was presented in part at the Third International Calicivirus Meeting, 10 to 13 November 2007, Cancun, Mexico, and at the 26th Annual Meeting of the American Society for Virology, 14 to 18 July 2007, Corvallis, OR.)
We analyzed epidemiologic data (setting, transmission route, and onset date) for all confirmed NoV outbreaks for which specimens were sent to the CDC from 1994 through 2006. A confirmed NoV outbreak was defined as one from which at least 2 stools tested positive for NoV by reverse transcription-PCR (RT-PCR) (2, 4, 11, 27) or real-time TaqMan RT-PCR (40). Epidemiologic data of NoV-positive outbreaks were analyzed for statistical significance by using the chi-square test (SAS, version 9.1).
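The chi-square comparison described above was run in SAS. As a minimal illustrative sketch (not the authors' code), the Pearson chi-square statistic for a contingency table of outbreak counts could be computed as follows. The function name `chi_square_statistic` and the counts in `example_table` are hypothetical and are not data from this study.

```python
from typing import List


def chi_square_statistic(table: List[List[float]]) -> float:
    """Pearson chi-square statistic for an r x c table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows = genotype group, columns = (LTCF, other setting).
example_table = [[40, 60], [20, 80]]
statistic = chi_square_statistic(example_table)
dof = (len(example_table) - 1) * (len(example_table[0]) - 1)
print(f"chi-square = {statistic:.2f}, degrees of freedom = {dof}")
```

A P value would then be read from the chi-square distribution with the given degrees of freedom, for example with a statistics package. The sketch assumes every row and column total is nonzero.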
All NoV-positive specimens had previously been typed into genogroup or genotype as described previously (4, 11, 12, 27, 28). Strains that had been genotyped as GII.4 by sequence analyses of partial regions of the polymerase gene or VP1 gene (2, 4, 42) were included in the study for further characterization.
We retested the selected specimens by previously described standard methods. In brief, viral RNA was extracted from a 10% stool suspension using the NucliSens automated nucleic acid extraction system (BioMérieux) (4). A 322-nucleotide (nt) region of the 5′ end of the VP1 gene of all strains was amplified with primer set Mon381-Mon383 (4, 11, 27). The full-length VP1 gene of 11 strains, each representing a different GII.4 subcluster (Table 1), was also amplified using the Qiagen One-Step RT-PCR kit (Qiagen Inc., Valencia, CA) with forward primer DPZ-F291 (5′-ACT CAG ACA AAT GTA TTG GAC-3′) and reverse primer DPZ-R2457 (5′-ACC CTC TAG GAG CAT CGC CTG-3′). RT-PCR products were purified from 2% agarose gels and sequenced on an automated DNA sequencer (model 3130xl; Applied Biosystems, Foster City, CA) using the BigDye terminator cycle sequencing ready reaction kit (Applied Biosystems).
All sequences generated in this study were edited with Sequencher 4.8 (Gene Codes Corporation, Ann Arbor, MI) and analyzed with several GCG programs (Wisconsin Package, version 11.1.2; Accelrys Inc.). The FastA program was used to search for the closest sequence match to identify unique sequences, the Distances program was used to calculate the uncorrected pairwise distances and to draw the phylogenetic trees, and the PAUP Search was used for bootstrapping the consensus tree.
Nucleotide sequences that were not identical to any other sequences in our database were assigned a unique identification number (sequence identification [SeqID]). Strains that had more than 85% sequence similarity to GII.4 reference strains (i.e., Bristol or Farmington Hills) were initially classified as GII.4 (28, 48) and included for further analysis. Phylogenetic trees were displayed with TreeView software (30).
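The sequence-handling steps described above (uncorrected pairwise distances, assignment of a SeqID to each unique sequence, and the 85% similarity screen against reference strains) can be sketched as follows. This is an illustration under stated assumptions, not the GCG workflow used in the study: the function names are hypothetical, the sequences are assumed to be pre-aligned to equal length, gaps and ambiguity codes are not handled, and the 20-nt toy sequences are invented for the example.

```python
from typing import Dict, List


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected pairwise distance: fraction of aligned sites that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


def assign_seq_ids(sequences: List[str]) -> Dict[str, int]:
    """Give each distinct sequence (differing by >= 1 nt) a SeqID in order of first appearance."""
    seq_ids: Dict[str, int] = {}
    for seq in sequences:
        if seq not in seq_ids:
            seq_ids[seq] = len(seq_ids) + 1
    return seq_ids


def is_gii4_candidate(seq: str, references: List[str], min_similarity: float = 0.85) -> bool:
    """True if the sequence is more than `min_similarity` identical to any reference."""
    return any(1.0 - p_distance(seq, ref) > min_similarity for ref in references)


# Toy 20-nt sequences, invented for illustration only (not real norovirus data).
reference = "ATGAAGATGGCGTCGAATGA"
samples = [
    "ATGAAGATGGCGTCGAATGA",  # identical to the reference
    "ATGAAGATGGCGTCGAATGC",  # 1 mismatch
    "ATGAAGATGGCGTCGAATGA",  # duplicate of the first sample
    "TTCTTGATCCCGTCGTTTGA",  # 8 mismatches
]
seq_ids = assign_seq_ids(samples)
for seq, seq_id in seq_ids.items():
    print(seq_id, f"{p_distance(seq, reference):.2f}", is_gii4_candidate(seq, [reference]))
```

In this sketch, duplicates collapse onto a single SeqID, so four input sequences yield three unique variants, and only variants above the similarity threshold would go forward for further analysis.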
To compare GII.4 sequences from our study with strains that have been detected globally, a GenBank BLAST search was conducted using the 11 full-length GII.4 VP1 sequences that were generated in this study.
The VP1 sequences identified in this study were deposited in GenBank and have been assigned accession numbers as shown in Table 1.
Between 1994 and 2006, the CDC received fecal specimens from 997 outbreaks of acute gastroenteritis that had been submitted by state and local health departments for NoV testing and genotyping. Of these, 773 (77.5%) outbreaks were confirmed as NoV positive by conventional or real-time RT-PCR with at least two positive specimens per outbreak. The number of NoV-positive outbreaks varied from 19 to 90 each year, with the number of GI outbreaks ranging between 2 and 15 per year except for 1999 (n = 22; 24.4%) and 2000 (n = 32; 39.0%), when a significant increase in the number of GI outbreaks was reported. The number of GII outbreaks ranged from 16 to 77 per year and accounted for 58 to 96% of all NoV outbreaks. Of the 773 NoV outbreaks, only two outbreaks (one in 1999 and one in 2000) were typed as GIV. Among the GII outbreaks, the number of GII.4 outbreaks ranged from 1 to 69 by year, with 3 remarkable seasonal increases in 1995 to 1996 (5 to 43), 2002 to 2003 (18 to 44), and 2005 to 2006 (34 to 69). Since 2002, the proportion of GII.4 outbreaks never dropped below 52% (2005) and steadily increased to 85.2% in 2006 (Fig. 1).
Overall, 342 (44.2%) of the 773 NoV outbreaks were typed as GII.4, 287 (37.1%) were typed as other GII strains, 128 (16.6%) were typed as GI, and 16 (2.1%) had mixed genotypes (Fig. 2A).
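As a quick arithmetic check of the breakdown above, the reported counts can be converted back to percentages of the 773 confirmed outbreaks. The counts are those given in the text; the variable names are illustrative only.

```python
# Outbreak counts by type as reported in the text.
outbreak_counts = {
    "GII.4": 342,
    "other GII": 287,
    "GI": 128,
    "mixed genotypes": 16,
}

total = sum(outbreak_counts.values())
percentages = {name: round(100 * n / total, 1) for name, n in outbreak_counts.items()}

# All GII outbreaks = GII.4 plus other GII strains.
gii_total = outbreak_counts["GII.4"] + outbreak_counts["other GII"]
gii_percentage = round(100 * gii_total / total, 1)

print(f"total outbreaks: {total}")
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
print(f"all GII: {gii_total} ({gii_percentage}%)")
```

The four categories sum to the 773 confirmed outbreaks, and the combined GII count reproduces the 629 (81.4%) figure given in the abstract.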
GII.4 outbreaks occurred more frequently in long-term care facilities (LTCFs) (e.g., nursing homes) (42.7%) and cruise ships (25.1%) (P < 0.001) than in other settings, whereas GI and other GII viruses were more often associated with outbreaks in restaurants and parties (37.8 and 37.2%, respectively) than in LTCFs (7.1 and 27.6%, respectively) (Fig. 2B, C, and D) (P < 0.001).
GII outbreaks (including GII.4) demonstrated a marked seasonal peak during the cooler months (December to March), whereas the GI outbreaks occurred year-round and did not exhibit any apparent seasonality (Fig. 3).
A total of 726 sequences were obtained from 342 GII.4 outbreaks. Among these, 198 unique sequences (differing by at least 1 nt) were identified, and these sequences were plotted by frequency, geographic distribution, and year (Fig. 4A to C).
Overall, 66 (33.3%) unique GII.4 sequences were detected from 1994 to 2001 and 132 (66.7%) unique GII.4 sequences were detected from 2002 to 2006 (Fig. 4C). The largest numbers of unique GII.4 sequences were detected in 1996 (n = 24), 2002 and 2003 (n = 95), and 2006 (n = 25). In contrast, during 1997 to 2001, when >50% of the NoV outbreaks were associated with non-GII.4 viruses, the number of unique GII.4 sequences ranged from 4 to 9 per year, and these sequences did not cause many outbreaks or spread widely.
The majority of the unique GII.4 sequences were associated with only one outbreak (80%) (Fig. 4A), in one state (88%) (Fig. 4B), and in one year (92%) (Fig. 4C). Some of the GII.4 sequences caused more outbreaks, had a larger geographic distribution, and circulated for a longer period of time. For example, GII.4 sequence 15 was associated with 11 outbreaks in 6 states but circulated only in 1996, whereas sequence 72, which first emerged in 2002, was detected in 30 outbreaks in 18 states during 2002 to 2005 (Fig. 4A to C).
An unrooted phylogenetic distance tree was generated based on analysis of 198 unique partial ORF2 sequences from GII.4 strains collected during 1994 to 2006 in the United States (Fig. 5). These sequences fell into 9 subclusters of which 3 subclusters (Richmond, QM2CS, and SSCS) consisted of only 2 strains. Viruses of most subclusters circulated for 2 to 3 years before becoming extinct except for viruses in the subcluster 95/96-US which were detected for up to 7 years from 1995 to 2002 (28). The 60 unique GII.4 sequence variants formed 2 separate groups (Houston and Lonaconing) with <5% sequence diversity (Fig. 5 and Table 1). The GII.4 sequence variants detected after 2001 (n = 136) grouped into 7 subclusters. Viruses from subcluster Warren/Farmington Hills, which were detected from 2002 to 2005, demonstrated the greatest number of unique sequences (n = 83) and caused a remarkable increase of GII.4 outbreaks in 2002 (Fig. 1) (46). Viruses from subclusters Henry and Cumberland included 16 and 12 unique sequences, respectively, which were detected in 22 GII.4 outbreaks in 2000 to 2005 (Henry) and 13 GII.4 outbreaks in 2003 to 2006 (Cumberland). Viruses belonging to subclusters CLCS/Laurens and OSDCS/Minerva were responsible for a significant increase in the number of GII.4 outbreaks in 2006 (Fig. 1) (7). Unique GII.4 sequences that grouped into subclusters QM2CS and SSCS circulated for only 1 to 2 years in 2001 and 2005 to 2006, respectively, and did not cause a large number of outbreaks.
A BLAST search of the GenBank database using the 11 strains representing different GII.4 subclusters yielded 1,043 sequences of which 141 were unique at the nucleotide level. Of these, 73 strains differed at the amino acid level and were used to assess the phylogenetic relationship of the GII.4 strains detected in this study with the strains circulating in other parts of the world. Strains in each subcluster had a maximum amino acid variation of 1.3 to 4.1% (Table 1). The strain diversity among all GII.4 VP1 sequences was 10.6%, which is <15%, the cutoff for viruses classified into a cluster (48). Thus, all strains belong to the GII.4 cluster and can be grouped into 9 distinct subclusters with robust bootstrap values (85 to 100%), including Bristol, US95/96, Henry, Farmington Hills, Hunter, Chiba, Yerseke, Osaka, and Den Haag (Fig. 6). Two strains (Richmond and Erfurt007) could not be grouped in any of the 9 subclusters and tentatively represent additional subclusters. Viruses in subclusters US95/96 and Farmington Hills were associated with sequence variants 15 and 72, respectively (Fig. 4), which caused pandemics in 1996 and 2002 (21, 28, 46); viruses in Hunter and Chiba subclusters caused outbreaks during 2004 to 2006 with Hunter viruses detected around the globe (5), whereas Chiba viruses were mainly detected in Asia. Viruses in subclusters Yerseke and Den Haag both had a global distribution in 2006 (7, 31, 41). In contrast, viruses in subclusters Henry and Osaka were only found in 3 (China, Spain, and United States) and 2 (Japan, United States) countries, respectively. Overall, GII.4 viruses of a particular subcluster circulated between 2 and 6 years.
We systematically analyzed the epidemiology and strain diversity of GII.4 NoV outbreaks reported to the CDC from 1994 to 2006. Overall, the proportion of GII.4 outbreaks increased from 5.3% in 1994 to 85.2% in 2006, but distinct annual differences with sharp increases in pandemic years were noted. The GII.4 outbreaks were most often identified in LTCFs (43%) and cruise ships (25%) and had distinct winter seasonality, whereas GI viruses caused outbreaks throughout the year, which also has been reported in Europe (19).
Comparison of GII.4 sequences circulating in the United States with sequences submitted to GenBank demonstrated that most GII.4 strains have a worldwide distribution. Phylogenetic analysis of full-length VP1 sequences demonstrated that all known GII.4 viruses can be grouped into at least 9 distinct subclusters (Fig. 6) with tentatively 2 additional subclusters (Richmond and Erfurt) represented by only one full-length VP1 sequence. On the basis of the maximum diversity within each subcluster (1.3 to 4.1%) (Table 1), we propose that 5% of amino acid variation in the full-length VP1 be the cutoff for the classification of GII.4 subclusters. For naming of the subclusters, we employed the same approach as has been used for naming of genotypes, i.e., the first publicly available VP1 protein sequence (48), which was adopted by participants of the Third International Calicivirus Meeting, Cancun, Mexico, 10 to 13 November 2007.
We detected 198 unique GII.4 sequence variants, the majority of which were associated with only one outbreak. A few sequence variants were associated with the emergence of new GII.4 subclusters (US95/96 in 1996, Farmington Hills in 2002, Hunter in 2004, and Yerseke and Den Haag in 2006) that gradually displaced previous GII.4 viruses in the population (20, 31, 34), an evolutionary pattern similar to the pattern described for influenza virus (17). An important feature of RNA virus replication is the lack of the proofreading function of its RNA polymerase (8). Due to the error-prone nature of RNA polymerases, the mutation rate for NoVs can be expected to be in the same range as estimated for RNA viruses, such as poliovirus and hepatitis C virus, with 1.44 × 10⁻³ to 3 × 10⁻³ base substitutions per site per year (29, 45). This could ultimately lead to the emergence of new GII.4 strains. Rapid spread of these newly emerging strains in an immunologically naïve population could result in large-scale outbreaks leading to a pandemic within a short amount of time (5, 7, 21, 23, 26, 28, 32, 41, 43, 46).
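To give a sense of scale for the substitution rates quoted above, the following sketch multiplies the rate range by sequence lengths mentioned in the text (the 322-nt VP1 region and the 7.5- to 7.7-kb genome). It assumes, as a simplification not made in the text, that the poliovirus and hepatitis C virus range applies uniformly to every site; the function name is hypothetical.

```python
# Substitution-rate range quoted in the text (substitutions per site per year).
RATE_LOW = 1.44e-3
RATE_HIGH = 3e-3


def expected_substitutions(length_nt: int, rate: float, years: float = 1.0) -> float:
    """Expected number of substitutions = sites x rate x time."""
    return length_nt * rate * years


# Lengths taken from the text: the 322-nt VP1 region and the 7.5- to 7.7-kb genome.
regions = [("322-nt VP1 region", 322), ("7.5-kb genome", 7500), ("7.7-kb genome", 7700)]
for label, length_nt in regions:
    low = expected_substitutions(length_nt, RATE_LOW)
    high = expected_substitutions(length_nt, RATE_HIGH)
    print(f"{label}: {low:.2f} to {high:.2f} substitutions per year")
```

Under these assumptions, the whole genome would be expected to accumulate on the order of 10 to 25 substitutions per year, whereas the 322-nt region would accumulate fewer than one.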
Recent studies have demonstrated that susceptibility to NoV infection, including GII.4 viruses, is associated with the presence of certain histo-blood group antigens (HBGA) on host cells (36, 37). Moreover, the surface-exposed carbohydrate ligand-binding domain in the NoV capsid is under heavy immune selection and likely evolves by antigenic drift. Since no simple cell culture or small animal model is available to study NoVs, the exact mechanism behind the evolution and subsequent selection of a particular strain and its transmissibility remains unknown. Several research groups have used three-dimensional (3D) homology models to study the interactions between HBGA receptors and the protruding (P) domain of the NoV capsid monomer, the putative receptor-binding domain, to assess the host susceptibility to viral infections, which might help to understand the mechanism of GII.4 predominance (1, 35-38).
Although the CDC started to document outbreaks of acute viral gastroenteritis in 1982, the true burden of NoV outbreaks in the United States is unknown and is likely substantially underreported due to the passive surveillance system (4). In this study, we analyzed only outbreaks for which samples were submitted to the CDC. Prior to 1995, most outbreak samples were tested for NoV by electron microscopy or serology (3, 25); with the introduction of RT-PCR and real-time RT-PCR methods, the NoV detection rate has increased greatly (3, 25, 40).
To strengthen the current NoV outbreak surveillance and to obtain a better assessment of the true burden of NoV illness as well as the early detection of newly emerging GII.4 strains, real-time RT-PCR assays have been implemented in most laboratories of state health departments. In addition, several electronic surveillance systems for NoV outbreaks have recently been initiated. CaliciNet, a network of state and local public health and food regulatory agency laboratories in the United States, is coordinated by the CDC and allows us to rapidly detect and link epidemiologic and NoV sequence data to identify multistate and international food-borne outbreaks. A global norovirus surveillance network, NoroNet, which includes data from CaliciNet as well as data from other surveillance networks (18, 26), will allow more timely identification of increased norovirus activity and emergence of novel pandemic strains.
The findings of this study highlight the importance of continued NoV outbreak surveillance as well as molecular characterization of strains to better understand the mechanism(s) driving the rapidly evolving GII.4 viruses, to determine why they have an advantage over other NoV genotypes, and to assess whether other genotypes follow similar evolutionary patterns (16).
We thank all state and local health departments for providing us with epidemiologic data and outbreak specimens; Lenee H. Blanton, R. Suzanne Beard, and Belinda Vermeulen for their assistance with outbreak and laboratory investigations; and Larry Anderson, Jon Gentsch, and Olen Kew for their constructive comments.
The findings and conclusions in this article are those of the authors and do not necessarily represent the views of the Centers for Disease Control and Prevention. This article did receive clearance through the appropriate channels at the CDC prior to submission.
Published ahead of print on 28 October 2009.