Noroviruses are important causative agents of acute viral gastroenteritis in both children and adults worldwide. Their persistence in the human population may be due, in part, to the genetic diversity of these viruses. The majority of studies on the evolution of NoVs have focused broadly on all genotypes or on a few specific genotypes, such as GII.4, because of their high prevalence in outbreaks (28). However, GII.3 NoVs, which are responsible for a large number of infections, particularly in children and in settings in which these viruses are endemic, have not been thoroughly studied (6). The goal of this study was to compare differences in the evolutionary dynamics of these viruses to inform the development of control strategies to prevent NoV infection and dissemination.
In this study, we have identified and described the VP1 region of the earliest known GII.3 NoV strains isolated from children hospitalized in Washington, DC. Prior to this study, the oldest GII.3 full-length VP1 sequence reported (Hu/NoV/GII.3/Goulburn Valley G5175A/1983/AUS) was collected in 1983 (42). Our findings show that GII.3 NoVs were present and causing acute gastroenteritis in the human population at least 8 years earlier, in 1975. Previously, and in unpublished data, we showed that GII.3 was the most prevalent NoV genotype detected in children hospitalized with acute gastroenteritis at the Children's Hospital National Medical Center of Washington, DC, from 1974 to 1991 (2). It has been proposed that the most successful NoV genotypes, which currently are responsible for the majority of gastroenteritis cases (GII.4 and GII.3), emerged during the first peak detected in the 1980s (GII.4) (28) and 1990s (GII.3) (6). Our studies of norovirus samples collected prior to those years in the Children's Hospital study now have shown that both GII.4 and GII.3 viruses have been circulating for at least several decades (2; K. L. Shumansky, E. J. Abente, S. V. Sosnovtsev, A. Z. Kapikian, K. Y. Green, and K. Bok, unpublished data). This suggests that regardless of the mechanism of evolution of different NoV genotypes and the cumulative host immunity acquired after each gastroenteritis episode, certain genotypes, such as GII.4 and GII.3, remain predominant. Unlocking the key to this evolutionary advantage might be essential for the development of adequate control strategies.
One of the key features of RNA viruses that has allowed them to persist in human populations is their ability to undergo genetic changes which may lead to a more fit generation of viruses (10). It has been reported that RNA viruses evolve at an approximate rate of 10^-3 nucleotide substitutions/site/year (18). Consistent with this, it was previously reported that GII.4 NoVs evolved at a rate of 4.3 × 10^-3 to 6.5 × 10^-3 nucleotide substitutions/site/year (2). The GII.3 NoVs in our study evolved at a rate of 4.16 × 10^-3 to 7.39 × 10^-3 nucleotide substitutions/site/year (strict- and relaxed-molecular-clock models), which is comparable to the rate for GII.4 viruses. These results differ from those of a previous study that calculated a lower rate of evolution for GII.3 than for GII.4 (5). Our data suggest that the overall lower prevalence of GII.3 compared to GII.4 NoVs cannot be attributed to differences in the rate of nucleotide evolution in the VP1 region, since the rates of evolution of the two genotypes were remarkably similar for both strict- and relaxed-molecular-clock models. The addition of the CHDC samples to both the GII.3 and the GII.4 (2) evolutionary analyses, together with the use of more statistically advanced algorithms (Bayesian estimation), may have afforded better estimations of evolutionary rates.
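As a rough illustration of what these rates imply, the sketch below multiplies the reported GII.3 rate bounds by the 31-year sampling window (1975 to 2006). It assumes a constant rate along a single lineage and ignores multiple substitutions at the same site, so it is only a back-of-the-envelope figure, not a reproduction of the Bayesian analysis; the function name is hypothetical.

```python
# Back-of-the-envelope sketch: expected substitutions per site along one
# lineage, assuming a constant rate (an assumption, not the paper's model).

def expected_subs_per_site(rate_per_site_per_year: float, years: float) -> float:
    """Rate (substitutions/site/year) multiplied by elapsed time in years."""
    return rate_per_site_per_year * years

YEARS = 31                       # sampling window, 1975 to 2006
GII3_RATES = (4.16e-3, 7.39e-3)  # rate bounds quoted in the text

low, high = (expected_subs_per_site(r, YEARS) for r in GII3_RATES)
print(f"{low:.3f} to {high:.3f} substitutions/site over {YEARS} years")
```

Under these assumptions the expected accumulation is on the order of 0.13 to 0.23 substitutions per site along a lineage over the study period.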
The nucleotide and amino acid variations both within and between the three clusters identified in our GII.3 phylogenetic tree were analyzed and compared to values previously described for GII.4. The GII.3 clusters I, II, and III had intracluster nucleotide variation of 2.8, 2.1, and 1.8%, respectively. These values fall toward the lower end of the range of intracluster nucleotide variation seen in GII.4 NoVs, which ranged from 0.8 to 7% (2). However, this difference might also be due to the availability of a larger number of GII.4 VP1 sequences in the public databases. Intercluster nucleotide variation increased with increasing distance between clusters in the tree, with clusters I and II, II and III, and I and III demonstrating variation of 9.5, 10.8, and 11.1%, respectively. It was hypothesized that intercluster amino acid variation would follow a similar pattern; however, the intercluster variation between clusters II and III (3.5%) was higher than the intercluster variation between clusters I and II (2.8%) and I and III (2.8%). These results indicate that while the nucleotide composition of GII.3 NoVs continues to change over time, viruses in the earliest cluster are more similar to viruses in the most recent cluster in their amino acid composition than the two most modern clusters are to each other.
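The intracluster and intercluster comparisons above can be sketched as mean pairwise percent differences over aligned sequences. This is a minimal illustration that assumes an uncorrected p-distance and uses made-up toy sequences; the distance model, software, and function names used in the actual analysis are not stated in this section and are assumptions here.

```python
from itertools import combinations, product

def p_distance(a: str, b: str) -> float:
    """Percent of differing positions between two aligned sequences,
    skipping columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * diffs / compared

def mean_intracluster(seqs: list[str]) -> float:
    """Mean pairwise distance among sequences of one cluster."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

def mean_intercluster(c1: list[str], c2: list[str]) -> float:
    """Mean pairwise distance between sequences of two clusters."""
    pairs = list(product(c1, c2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

# Toy aligned sequences (hypothetical, for illustration only).
clusters = {
    "I": ["ACGTACGTAC", "ACGTACGTAT"],
    "II": ["ACGTTCGTAC", "ACGTTCGAAC"],
}
intra_i = mean_intracluster(clusters["I"])
inter_i_ii = mean_intercluster(clusters["I"], clusters["II"])
print(f"intracluster I: {intra_i:.1f}%, intercluster I vs II: {inter_i_ii:.1f}%")
```

The same functions apply to amino acid alignments, which is how nucleotide and amino acid patterns can be compared side by side.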
The percent amino acid distances between the two most distant GII.3 strains in time (Hu/NoV/GII.3/CHDC2005/1975/US, cluster I, and Hu/NoV/GII.3/RotterdamP1D88/2006/NL, cluster III) were analyzed. Only 30% of the amino acids in VP1 changed over time, whereas 67.1% of amino acids changed in a similar analysis of GII.4 NoVs (Hu/NoV/GII.4/CHDC2094/1974/US compared to Hu/NoV/GII.4/Sakai/2005/US) (2). Additionally, the percent distance of all GII.3 NoVs individually compared to that of Hu/NoV/GII.3/CHDC2005/1975/US demonstrated a stable rate of approximately 4 to 6% amino acid distance over time, with sample isolation dates ranging from 1975 to 2006. In contrast, GII.4 NoVs showed a linear rate of amino acid change that continually increased over time, from 1974 to 2005, to greater than 10%, which appeared to correlate with the emergence of the GII.4 clusters. Taken together, these data demonstrate a striking difference in the evolution of GII.3 and GII.4 NoVs at the amino acid level, despite their similar rates of nucleotide substitution. It seems that while certain amino acid residues have undergone change over time in GII.3 NoVs, many residues in cluster III that had undergone mutation between clusters I and II are the same as the residues in cluster I. Our sample size was not large enough to determine whether these residues remained fixed from cluster I, nor could we demonstrate that these positions actually mutated twice. Nevertheless, this sharing of amino acids between only clusters I and III may reflect an advantage in viral fitness in relation to these particular residues. Alternatively, perhaps only a small repertoire of amino acids can be tolerated by the virus at these positions, which may have undergone mutation due to pressure to evade the host immune response or because of an error-prone RNA-dependent RNA polymerase (RdRp). Since these residues were present in the viral population in the 1970s, and assuming only short periods of antibody protection are elicited after infection, reverting to those particular residues may allow these viruses to infect current populations, which may have gained immunity to more recently circulating strains.
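The pattern described above, in which cluster III shares a residue with cluster I that differs in cluster II, can be screened for with a simple per-column consensus comparison. The sketch below is a hypothetical illustration on toy peptide sequences, not the method used in the study; as noted in the text, such a screen cannot by itself distinguish a true reversion from a residue retained in an unsampled lineage.

```python
from collections import Counter

def consensus(column: list[str]) -> str:
    """Most common residue in one alignment column."""
    return Counter(column).most_common(1)[0][0]

def apparent_reversions(cluster_i: list[str],
                        cluster_ii: list[str],
                        cluster_iii: list[str]) -> list[int]:
    """Return 1-based alignment positions where the consensus residue is the
    same in clusters I and III but different in cluster II."""
    sites = []
    for pos in range(len(cluster_i[0])):
        res_i = consensus([s[pos] for s in cluster_i])
        res_ii = consensus([s[pos] for s in cluster_ii])
        res_iii = consensus([s[pos] for s in cluster_iii])
        if res_i == res_iii and res_i != res_ii:
            sites.append(pos + 1)
    return sites

# Toy aligned peptide fragments (hypothetical, for illustration only).
toy_i = ["MKTS", "MKTS"]
toy_ii = ["MRTS", "MRTS"]
toy_iii = ["MKTA", "MKTA"]
sites = apparent_reversions(toy_i, toy_ii, toy_iii)
print(sites)
```

Positions are alignment columns in this sketch; mapping them to VP1 residue numbers would require the numbering of the reference sequence.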
In contrast, GII.4 NoVs seem to evolve in a linear fashion without much reversion to previously utilized amino acids. This difference may reflect the difference in hosts that these two genotypes tend to infect. For GII.4 to cause epidemics in adult populations, it needs to continue to reinfect previously exposed hosts, which requires constant changes in the capsid protein to evade the host immune response developed in response to previous infections. Conversely, GII.3 NoVs have been shown to be prevalent in young children and infants (1). These young naïve hosts may not have established immune-associated protection from previous infections, and therefore GII.3 NoVs may not need to continually adapt to evade the immune response, as the pool of young children and infants is continually renewed. Why these two genotypes have a tendency to infect different age groups in the population has yet to be determined, but it may be due to a variety of factors, including seroprotection from heterologous strains or the minimal infectious dose necessary to cause disease.
The alignment of VP1 sequences representative of all three GII.3 clusters revealed that greater than 50% of the amino acid changes occurred in the P2 domain. Bull and colleagues identified 15 evolutionary hotspots on the GII.4 NoV capsid protein (5). These hotspots vary between each pandemic cluster within this genotype. Interestingly, they found that six (aa 310, 312, 389, 392, 395, and 404) of these GII.4 hotspots also were hypervariable sites in GII.3 sequences, and that they clustered onto four exposed loops in the P2 domain (5). Consistent with those results, our analysis found variation over time at the same positions with the exception of residue 395. Residue 312 was identified as a residue which had undergone mutation over time but had reverted to the most ancestral amino acid in the most recent isolates. Residue 389 was predicted, in our analysis, to be under positive selective pressure. Our data support the view that, with the exception of amino acid 395, these hotspots also vary between GII.3 clusters.
Our analysis of positively selected sites within the VP1 region of GII.3 NoVs identified only seven residues; however, all seven were located in the P2 domain. Previous analysis of the GII.4 NoVs over the same time period identified six positively selected sites; however, only one was located in the P2 domain, while four were identified in the shell (S) domain and one in the P1 domain (2). These differences in the location of sites under positive selection between GII.3 and GII.4 NoVs indicate that the two genotypes are under differing host selection pressures, such as possible coreceptors or immune-driven selection.
The structural modeling of the Hu/NoV/GII.3/CHDC2005/1975/US P domain sequence demonstrated that all seven of the positively selected amino acid sites were surface-exposed residues on the GII.3 minimized structure. It is important to note that although no capsid structure has been solved for GII.3 viruses, modeling against the solved GII.4 structure may give interesting preliminary insight into the location of certain residues. By mapping the same sites on the better-understood GII.4 minimized model, Hu/NoV/GII.4/VA387/1998/US, we showed that all six residues that could be mapped remained in surface-exposed locations. Additionally, these residues mapped to regions surrounding the predicted HBGA binding sites. This suggests that these sites directly interact with host ligands, including HBGAs. Moreover, these residues may be undergoing positive selection as a direct strategy for evading the host immune response and neutralizing antibodies. However, we cannot differentiate between mutations occurring due to host-driven pressure and mutations occurring due to RdRp error. Mutations that remain fixed in the population, such as those predicted to be undergoing positive selection, can be hypothesized to be due to host pressure, but further investigation will be necessary to better describe this dynamic process.
The role of the immune-driven evolution of GII.3 NoVs merits further study, particularly with the investigation of HBGA binding in relationship to the evolutionary changes of the P2 domain over time. Previous studies have found that GII.4 NoVs bind HBGA types A, B, H3, Le^b, and Le^y in oligosaccharide-based binding assays, while GII.3 NoVs bound only types A and B strongly and Le^b). The fewer HBGAs bound by GII.3 NoVs in comparison to GII.4 NoVs may explain why GII.3 NoVs are not as prevalent as GII.4 NoVs in adults (28), in that there may be fewer hosts who are susceptible to GII.3 NoV infection. The studies on HBGA binding in the GII.3 NoVs were performed on the more contemporary clusters II and III (Hu/NoV/GII.3/Mexico/1989/MX, cluster II; Hu/NoV/GII.3/ParisIsland/2003/US, cluster III). Our binding results agree with the lower level of amino acid variation found in GII.3 NoVs, since all of the clusters analyzed typically bound the same types of carbohydrates, contrasting with a wider and continually changing repertoire of HBGAs bound by GII.4 NoVs over time. Regardless of the binding pattern differences between GII.4 clusters, cluster-specific sera were able to block the interaction of heterologous GII.4 VLPs with HBGAs, even between clusters three decades apart. These data suggest the presence of several conserved antigenic sites within strains of a particular genotype. Several studies show evidence for GII.4 NoVs escaping the immune response through cluster replacement (2), but our HBGA-blocking binding assay data did not support these predictions. If GII.4 NoVs are indeed escaping the immune response by cluster replacement, the “blockade” assay might not adequately predict major antigenic shifts within a genotype.
Our study provides novel insight into the evolution of the VP1 protein of GII.3 NoVs over a 31-year period and includes the characterization of three sequences older than any previously described. These data demonstrate that GII.3 NoVs were circulating and causing severe disease in humans prior to 1983, the date of the oldest GII.3 NoV VP1 sequence previously available in public databases, and that the currently circulating GII.3 strains probably have a common ancestor dating to between 1970 and 1973. Our study also provides insight into the evolution of GII.3 NoVs compared to that of their more prevalent relative, the GII.4 NoVs. While both of these genotypes had similar rates of nucleotide evolution, the fixation of amino acid substitutions over time was strikingly different, with GII.3 NoVs demonstrating more similarity in amino acid sequence between samples isolated three decades apart than between samples isolated in the same decade. This, together with the difference in location of the positively selected amino acids, suggests that these two genotypes are under differing host-specific selective pressures. A clear and thorough characterization of the various patterns of evolution employed by different NoV genotypes is critical for the development of suitable vaccine candidates, which need to be formulated to target the strains most likely to be circulating at a given point in time.