Sixty-six human enterovirus serotypes have been identified by serum neutralization, but the molecular determinants of the serotypes are unknown. Since the picornavirus VP1 protein contains a number of neutralization domains, we hypothesized that the VP1 sequence should correspond with neutralization (serotype) and, hence, with phylogenetic lineage. To test this hypothesis and to analyze the phylogenetic relationships among the human enteroviruses, we determined the complete VP1 sequences of the prototype strains of 47 human enterovirus serotypes and 10 antigenic variants. Our sequences, together with those available from GenBank, comprise a database of complete VP1 sequences for all 66 human enterovirus serotypes plus additional strains of seven serotypes. Phylogenetic trees constructed from complete VP1 sequences produced the same four major clusters as published trees based on partial VP2 sequences; in contrast to the VP2 trees, however, in the VP1 trees strains of the same serotype were always monophyletic. In pairwise comparisons of complete VP1 sequences, enteroviruses of the same serotype were clearly distinguished from those of heterologous serotypes, and the limits of intraserotypic divergence appeared to be about 25% nucleotide sequence difference or 12% amino acid sequence difference. Pairwise comparisons suggested that coxsackie A11 and A15 viruses should be classified as strains of the same serotype, as should coxsackie A13 and A18 viruses. Pairwise identity scores also distinguished between enteroviruses of different clusters and enteroviruses from picornaviruses of different genera. The data suggest that VP1 sequence comparisons may be valuable in enterovirus typing and in picornavirus taxonomy by assisting in the genus assignment of unclassified picornaviruses.
Human enteroviruses (family Picornaviridae) infect millions of people worldwide each year, resulting in clinical outcomes ranging from inapparent infection to mild respiratory illness (common cold), hand-foot-and-mouth disease, acute hemorrhagic conjunctivitis, aseptic meningitis, myocarditis, severe neonatal sepsis-like disease, and acute flaccid paralysis (reviewed in references 43 and 45). In the United States, enteroviruses are responsible for 30,000 to 50,000 meningitis hospitalizations per year as a result of 30 million to 50 million infections. Serologic studies have distinguished 66 human enterovirus serotypes on the basis of an antibody neutralization test (43), and additional antigenic variants have been defined within several of the serotypes on the basis of reduced or nonreciprocal cross-neutralization between prototype and variant strains (6, 8, 68, 71, 72). On the basis of their pathogenesis in humans and experimental animals, the enteroviruses were originally classified into four groups, polioviruses, coxsackie A viruses (CA), coxsackie B viruses (CB), and echoviruses, but it was quickly realized that there were significant overlaps in the biological properties of viruses in the different groups (8). The more recently isolated enteroviruses have been named with a system of consecutive numbers: EV68, EV69, EV70, and EV71 (42).
A comparison of nucleotide and deduced amino acid sequences at the 5′ end of VP2 has identified four major phylogenetic groups within the Enterovirus genus: CA16-like viruses (cluster A), a CB-like group containing all CB and echoviruses as well as CA9 and EV69 (cluster B), poliovirus-like viruses (cluster C), and EV68 and EV70 (cluster D) (23, 24, 49, 53, 54, 73). However, pairwise alignments and phylogenetic analyses within these groups demonstrated that the VP2 sequence does not fully correlate with serotype, as viruses known to belong to the same serotype often failed to cluster together (2, 49). (E22 and E23 are genetically distinct from enteroviruses, and their reclassification into a separate genus has been proposed.)
VP1 is the most external and immunodominant of the picornavirus capsid proteins (58). A number of major neutralization sites reside in the VP1 proteins of many picornaviruses (reviewed in references 40 and 44), but the specific epitopes responsible for serotype specificity and intratypic variation have not been identified. Similarly, the genetic correlates of serotype identity remain unknown. If the important serotype-specific neutralization sites reside in VP1, then the VP1 sequence or some portion thereof would be predicted to correlate with serotype. Studies on the three serotypes of poliovirus have shown that a partial VP1 sequence correlates well with serotype (32). In addition, genetic lineages based on the VP1 sequence can be used to define poliovirus reservoirs and chains of transmission (reviewed in reference 30). To test whether the VP1 sequence might be applied to the classification of nonpolio enteroviruses and to the analysis of the phylogenetic relationships among the human enteroviruses, we determined the complete VP1 nucleotide sequences for 47 human enterovirus prototypes and 10 well-characterized antigenic variants. These data, together with previously available sequences, comprise a database of complete VP1 sequences for all known human enterovirus serotypes and 12 natural antigenic variants. This database will be useful for molecular epidemiologic studies of enteroviral disease outbreaks, to obtain a better understanding of the genetic correlates of serotype, and for the development of enteroviral molecular diagnostic reagents.
The viruses used for sequence analysis and phylogenetic reconstruction are listed in Table 1. RNA isolation and reverse transcription-PCR were carried out as described previously (48). Briefly, viral RNA was extracted from infected cell culture supernatant or 10% infected mouse brain homogenate with Trizol LS (Life Technologies, Inc., Gaithersburg, Md.), and cDNA was synthesized by use of a random hexamer primer and a SuperScript preamplification kit (Life Technologies, Inc.). From each viral cDNA, an amplicon of approximately 900 to 950 bp, encompassing the 3′ end of VP3, all of VP1, and the 5′ end of 2A, was amplified by PCR with primers for VP3 and 2A (Table 2). For some viruses, VP1 was amplified as two overlapping fragments with internal VP1 primers as well as the VP3 and 2A primers. The PCR products were gel isolated and purified for sequencing with a QIAquick gel extraction kit (Qiagen, Inc., Santa Clarita, Calif.) and sequenced on an automated DNA sequencer with fluorescent dideoxy chain terminators (PE-Applied Biosystems, Foster City, Calif.). Complete VP1 PCR products of viruses for which VP1 primers were not available were cloned into pGEM-T (Promega Corp., Madison, Wis.), and nested-deletion subclones were constructed with an Erase-a-Base kit (Promega). For each virus, at least two independent clones were sequenced by automated methods as described above.
Pairwise nucleotide and amino acid sequence identities were calculated by alignment of all possible sequence pairs with the program Gap (Wisconsin Sequence Analysis Package, version 9.1; Genetics Computer Group, Inc., Madison, Wis.). For phylogenetic reconstructions, groups of nucleotide sequences were aligned with Pileup (Genetics Computer Group). The alignment was manually adjusted to account for codon boundaries, optimal alignment among closely related sequences, and optimal alignment of highly conserved amino acid motifs. Phylogenetic relationships were inferred with the programs Neighbor and DNApars (PHYLIP version 3.57) and Puzzle (version 4.0). The maximum-likelihood method of Kishino and Hasegawa (33), with a transition/transversion (Ts/Tv) ratio of 8.0, was used to construct a distance matrix for neighbor-joining analysis. The statistical significance of phylogenies constructed with Neighbor and DNApars was estimated by bootstrap analysis with 100 pseudoreplicate data sets. Puzzle was executed by use of the distance method of Kishino and Hasegawa (33), with a Ts/Tv ratio of 8.0, and the reliability of phylogenetic reconstructions was estimated by use of 1,000 puzzling steps. Branch lengths of the neighbor-joining trees were calculated by the maximum-likelihood method with Puzzle. The robustness of subtrees within cluster B (CB-like group) was confirmed by construction of a Puzzle tree containing each of the subtree taxa, as well as their nearest sibling taxa, for a total of 14 sequences. The sibling taxa were those with the highest amino acid identity scores compared with the taxa in the subtree of interest. For subtree analysis, Puzzle was executed with Ts/Tv ratios of 1.0, 6.0, and 10.0, and well-supported Puzzle trees were taken as evidence in support of the subtree in the original phylogram.
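The all-against-all identity calculation described above can be illustrated with a minimal Python sketch. It is not the Gap program: it assumes the two sequences have already been aligned (gaps written as "-"), and it assumes that identity is counted only over columns in which neither sequence has a gap, which may differ from the definition used by Gap. The function names and the toy sequences are hypothetical and are not data from this study.

```python
from itertools import combinations
from typing import Dict, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences.

    Assumption: columns containing a gap ('-') in either sequence are
    skipped, so identity is computed over ungapped columns only.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def all_pairwise(aligned: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Identity score for every unordered pair of named sequences."""
    return {
        (n1, n2): percent_identity(aligned[n1], aligned[n2])
        for n1, n2 in combinations(sorted(aligned), 2)
    }


# Toy alignment (hypothetical sequences, for illustration only).
toy = {
    "strainA": "ATGGCA-TTA",
    "strainB": "ATGGCAGTTA",
    "strainC": "ATGACA-TCA",
}
scores = all_pairwise(toy)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 1))
```

For n sequences this yields n(n − 1)/2 scores, which is the set of values summarized in the pairwise identity distributions.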
The sequences reported here were deposited in the GenBank sequence database under accession no. AF081293 to AF081349.
Complete VP1 nucleotide sequences were determined for 57 human enterovirus strains for which VP1 sequences were not previously available (Table 1). Forty-seven of the strains were prototype strains for recognized human enterovirus serotypes (43). The other 10 sequenced strains were well-characterized antigenic variants which, while antigenically distinct from their respective prototype strains, were similar enough to the prototype strains to have been considered of the same serotype (8, 43). Combined with the 21 previously available complete enterovirus VP1 sequences, the 57 sequences reported here comprise the first collection of complete gene sequences representing each of the 66 human enterovirus serotypes.
The boundaries of the newly sequenced VP1 genes were predicted by comparison of the nucleotide and deduced amino acid sequences with those of previously characterized enteroviruses. Human enterovirus VP1 sequences varied in length from 834 to 951 nucleotides (278 to 317 amino acids). The CB had the shortest predicted VP1 amino acid sequences (278 to 298 amino acids), while EV68 and EV70 had the longest ones (312 and 317 amino acids). The newly determined enterovirus sequences were compared with previously available human enterovirus VP1 sequences and with the sequences of other closely related picornaviruses, including E22 and E23, provisionally reclassified as the only members of the genus Parechovirus (45); porcine enterovirus 9; bovine enterovirus types 1 and 2; human rhinovirus types 2 and 14; and human hepatitis A virus.
To assess the broad relationships among the enteroviruses and other human picornaviruses, a phylogenetic tree was constructed with representatives of each of the four human enterovirus clusters (A: CA2, CA12, and CA16; B: CA9, CB1, E26, and EV69; C: PV1, CA19, and CA24; D: EV68 and EV70), the nonhuman enteroviruses (BEV1, BEV2a, BEV2b, and PEV9), and other human picornaviruses (E22, E23, HAV, HRV2, and HRV14). As expected, the human enteroviruses clustered into four major groups (Fig. 1), consistent with published enterovirus phylogenies (23, 49, 53, 54, 73). Picornaviruses from the genera Parechovirus and Hepatovirus were distinct from the enterovirus clusters. The human rhinoviruses HRV2 and HRV14 clustered among the human enteroviruses, consistent with previous phylogenies, but the precise position of HRV2 was poorly supported (bootstrap value, 37%). The nonhuman enteroviruses, PEV9, BEV1, BEV2a, and BEV2b, formed a monophyletic group distinct from but related to cluster A. The human enterovirus clusters were very strongly supported, with bootstrap values of 100%, and the relationship between clusters B and C was also well supported (80%). The relationship of the nonhuman enteroviruses to cluster A was supported by a bootstrap value of 62%. In some cases, PEV9 fell outside this group, resulting in the low bootstrap value.
To determine the phylogenetic relationships among individual prototype viruses and, where available, prototypes and their antigenic variants, intracluster phylogenetic trees were constructed with several phylogeny reconstruction algorithms (neighbor-joining, maximum parsimony, and maximum likelihood) and included one sequence from each of the heterologous human enterovirus clusters as an outgroup. Within each of the four major clusters, trees constructed by different methods were congruent in overall structure and in general clustering patterns but often differed slightly in the order of one or more distal branches or in the bootstrap support for specific nodes. Viruses in cluster A segregated into three distinct subgroups, (i) CA7, CA14, CA16, and EV71; (ii) CA3, CA4, CA6, CA8, and CA10; and (iii) CA5 and CA12, with 59 to 99% bootstrap support (Fig. 2A). CA2 appeared to be distinct from the other viruses in cluster A. Cluster C viruses segregated into four subgroups, (i) CA1, CA19, and CA22; (ii) CA21, CA24, CA24v, and E34; (iii) CA11 and CA15; and (iv) CA13, CA17, CA18, CA20, PV1, PV2, and PV3, with bootstrap support of 67 to 100% (Fig. 2C). Within cluster B, some of the viruses clustered into subgroups, but few of the subgroups were well supported by bootstrap analysis (Fig. 2B). Stable subgroups included (i) E3 and E12; (ii) E11, E11′, and E19; (iii) E2 and E15; (iv) E13 and EV69; (v) the six CB serotypes; (vi) E1, E8, E4-Pesacek, E4-DuToit, and E4-Shropshire; (vii) E6-D’Amori, E6-Charles, E6′-Cox, and E6″-Burgess; and (viii) E21, E25, E29, E30-Bastianni, E30-Frater, E30-Giles, and E30-PR-17. The other viruses could not be reliably subgrouped, as the bootstrap values were extremely low. For every cluster, all serotypes which were represented by more than one isolate were monophyletic. In most cases, bootstrap support was strong (60 to 98%), but the value for CA24 strains was only 43%.
Sequence relationships within a serotype, within a cluster, between clusters, and between human enteroviruses and other picornaviruses were analyzed by comparison of the nucleotide and deduced amino acid sequences of all possible sequence pairs. The relationships were visualized by plotting the frequency of pairwise identity scores versus percent identity, rounded down to the nearest integer, as a histogram (Fig. 3). For both the nucleotide (Fig. 3A) and amino acid (Fig. 3B) pairwise identity distributions, the scores fell into four categories. The highest scores (nucleotide identity, ≥75%; amino acid identity, ≥88%) depicted relationships among viruses of the same serotype (e.g., the four E30 strains) or among prototype viruses that have been proposed to be homologous based on antigenic relatedness (e.g., CA13 and CA18). Nucleotide identity scores for pairwise comparisons within a major cluster ranged from 48.9 to 73.2% and defined a peak that was clearly delineated from that of the homologous pairs and from the peak of scores comparing viruses of different phylogenetic clusters (Fig. 3A). Cluster A scores ranged from 58.5 to 73.2%, while cluster C scores ranged from 55.9 to 70.6%. Viruses in cluster B appeared to be somewhat more heterogeneous, with scores ranging from 48.9 to 71.8%. Scores for the heterologous comparison peak ranged from 42.1 to 64.5% nucleotide identity. The final peak, containing the lowest scores, represented comparisons of viruses of different genera within the family Picornaviridae. In the amino acid identity distribution (Fig. 3B), the heterologous cluster peak appeared to be composed of two overlapping peaks. The peak with higher scores represented comparisons of viruses from phylogenetically related clusters (e.g., clusters B and C), whereas the peak with lower scores represented comparisons of viruses from more distant clusters (e.g., clusters A and B).
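The binning used for the frequency distributions (each identity score rounded down to the nearest integer, then counted) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name is hypothetical, and the short input list simply reuses a handful of nucleotide identity values quoted in this article rather than the full set of pairwise scores.

```python
import math
from collections import Counter
from typing import Iterable


def identity_histogram(scores: Iterable[float]) -> Counter:
    """Count pairwise identity scores per integer percent bin (floor)."""
    return Counter(math.floor(s) for s in scores)


# A few nucleotide identity values quoted in the text, for illustration;
# the real distribution is built from all possible sequence pairs.
example_scores = [76.0, 80.1, 77.2, 73.2, 48.9, 42.1, 64.5]
hist = identity_histogram(example_scores)

# Plain-text rendering: one row per occupied bin.
for percent in sorted(hist):
    print(f"{percent:3d}% {'#' * hist[percent]}")
```

Rounding down rather than to the nearest integer means that a score of 48.9% is counted in the 48% bin, which matters when reading the boundaries between adjacent peaks.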
VP1 is the major surface-accessible protein in the mature picornavirus virion; it is arrayed around the fivefold axis of symmetry of the icosahedral virion (1, 21, 39, 58). VP2 and VP3 comprise the remainder of the virion surface. Each of the capsid proteins is composed of conserved elements that form the β-barrel structural elements of the capsid, with variable loops between the β-barrel structures (reviewed in reference 44). Many of the loops are exposed on the virion surface, and studies of monoclonal antibody-resistant mutants have shown that a number of the loops contribute to specific antigenic neutralization sites. VP1 contributes to all three of the major neutralization sites that have been identified on the poliovirus surface, whereas VP2 and VP3 contribute to two and one of the sites, respectively. For example, the B-C loop is one of five VP1 loops forming VP1 antigenic site 1. Replacement of the VP1 B-C loop of CB3 by that of CB4 through site-directed mutagenesis produced a viable virus with a mixed neutralization phenotype, demonstrating the presence of a serotype-specific antigenic neutralization site in the B-C loop (57). Our sequencing studies confirmed the presence of sequence domains that are conserved among all members of the Enterovirus genus, as well as intervening domains that vary in sequence between strains of different serotypes and in some cases within a serotype. Due to the complexity of the three-dimensional structure of the enterovirus capsid and the fact that most of the neutralization sites are discontinuous, it is not possible to correlate specific VP1 residues with antigenic sites responsible for serotype specificity. Sequence comparisons and phylogenetic reconstructions suggest that VP1 contains serotype-specific information that can be used for virus identification and evolutionary studies. For the polioviruses, serotype-specific sequences in VP1 have been exploited to produce type-specific molecular diagnostic reagents (see below).
Pairwise sequence identity has been applied to the taxonomy of plant viruses in the family Potyviridae, with clear distinction of viruses of the same strain, viruses of different strains within a genus, and viruses of different genera (67). Our data suggest that a similar system based on the VP1 sequence can be used to classify viruses within the family Picornaviridae (Fig. 3). Enteroviruses of the same serotype were clearly distinguished from those of heterologous serotypes, and the limits of intraserotypic divergence appeared to be about 25% nucleotide sequence difference or 12% amino acid sequence difference. Likewise, strains of a homologous serotype clustered together in phylogenetic analyses (Fig. 2). Sequencing of additional strains within several different serotypes will provide more extensive data to determine whether this distinction is valid and will provide a more accurate measure of the serotype boundaries. Since enterovirus serotypic differentiation is based on neutralization and the VP1 sequence correlates with neutralization type, it is logical to assume that molecular diagnostics targeted to the VP1 coding region should give typing results that also correlate with the serotype determined by neutralization with type-specific antisera. The molecular distinction between serotypes has obvious applications to the typing of enterovirus isolates in the clinical laboratory. Molecular assays directed to specific sequences in VP1 have already been applied to the serotyping, genotyping, and group identification of polioviruses (12, 13, 30–32, 69, 70). Pairwise identity scores have also placed viruses of the same cluster in a single frequency peak, demonstrating that gross intercluster sequence differences exist, whereas differences within a cluster occur on a smaller scale.
The molecular distinction between clusters suggests that it may be useful to extend the taxonomic classification of enteroviruses to include each of the clusters as a subgenus. Similarly, VP1 sequence comparisons may prove valuable in the assignment of unclassified picornaviruses to one of the existing genera or point out the need for the introduction of new genera.
The CB were originally distinguished from the CA and polioviruses on the basis of differences in pathogenesis when inoculated intracerebrally into newborn mice. The echoviruses were, by definition, apathogenic in newborn mice; however, there are examples of echovirus isolates that are mouse virulent. In phylogenetic trees based on the partial sequence of VP2 (or VP4-VP2), the CB failed to cluster together, being interspersed among the echoviruses in a large CB-echovirus group (23, 49). In addition, analysis of 5′ nontranslated region sequence of seven CB5 clinical isolates showed that there is little or no genetic linkage between the 5′ nontranslated region sequences and serotype, probably due to a high frequency of recombination, whereas VP1-VP2A junction sequences are monophyletic and are correlated with serotype (35). In the present work, VP1 trees contained a monophyletic group consisting only of CB, suggesting that biological properties exclusive to the CB may be partially or completely encoded within the VP1 gene. Although bootstrap support for this subgroup was relatively low in the overall cluster B tree (Fig. 2B), additional tree reconstruction with only sequences most closely related to the CB supported the existence of a distinct CB subgroup within cluster B (data not shown). The nature of the CB-specific determinants remains unknown but could include receptor specificity and, hence, host range, cell or tissue tropism, and pathogenesis, as structural studies have shown that VP1 forms part of the picornavirus receptor-binding pocket (58).
CA1, CA19, and CA22 are the only enterovirus serotypes that have never been successfully adapted to growth in cell cultures, instead requiring passage by intracranial inoculation of suckling mice (18, 29, 43). In parsimony and neighbor-joining analyses with partial VP2 sequences, these three viruses formed a monophyletic cluster within the poliovirus-like group, but with low bootstrap support (49). The maximum-likelihood method produced a VP2 tree with a star topology within the poliovirus-like cluster and failed to show specific clustering of CA1, CA19, and CA22. When complete VP1 sequences were used, the same algorithms consistently produced trees in which CA1, CA19, and CA22 formed a well-resolved monophyletic cluster within the poliovirus-like group, with bootstrap support of 77 to 100%, suggesting that VP1 may play a role in the common host range and/or cell tropism of these three viruses.
E8 (strain Bryson) was isolated in 1953 and initially classified as a separate enterovirus serotype (8, 56). Later serologic studies demonstrated the antigenic relatedness between E8 and E1 (9, 20), and E8 has been designated a strain of the E1 serotype (45). Our genetic comparison of E1 and E8 agrees with the previous antigenic studies, with E1 and E8 being 76.0% identical in nucleotide sequence and 93.2% identical in amino acid sequence, and supports the reclassification of E8 as a variant of E1, based on the homologous serotype boundary described above. The VP1 nucleotide sequences of CA11 (Belgium-1) and CA15 (G-9) were 80.1% identical to one another (96.7% amino acid identity). Partial VP2 sequences had shown that the two viruses were 100% identical in the 50 amino acids at the amino terminus of VP2 (49). Similarly, the complete VP1 nucleotide sequences of CA13 (Flores) and CA18 (G-13) were 77.2% identical to one another (95.1% amino acid identity). Antigenic cross-reactivity has been noted between CA11 and CA15 and between CA13 and CA18 (9). Taken together, the serologic and genetic relationships suggest that CA15 may be a strain of CA11 and that CA18 may be a strain of CA13. In contrast, antigenic cross-reactivity has been reported for CA3 and CA8 (10), but their VP1 amino acid sequences were only 83.3% identical, suggesting that they are genuinely distinct serotypes that share a common epitope(s).
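The reasoning applied to these pairs can be summarized as a simple threshold check, sketched here in Python. The cutoffs are the approximate serotype boundaries reported above (at least 75% nucleotide identity and at least 88% amino acid identity); requiring both criteria when both values are available is an assumption of this sketch, not a rule stated in the study, and the function name is hypothetical. The identity values are those reported in the preceding paragraph; no VP1 nucleotide identity is quoted for CA3 and CA8, so only the amino acid criterion is evaluated for that pair.

```python
from typing import Dict, Optional, Tuple

# Approximate serotype boundaries suggested by the pairwise comparisons.
NT_THRESHOLD = 75.0  # percent nucleotide identity
AA_THRESHOLD = 88.0  # percent amino acid identity


def same_serotype(aa_identity: float,
                  nt_identity: Optional[float] = None) -> bool:
    """Return True if a pair falls within the proposed serotype boundary.

    Assumption: when both values are known, both must meet their cutoff.
    """
    if aa_identity < AA_THRESHOLD:
        return False
    if nt_identity is not None and nt_identity < NT_THRESHOLD:
        return False
    return True


# (amino acid identity, nucleotide identity) as reported in the text.
reported: Dict[Tuple[str, str], Tuple[float, Optional[float]]] = {
    ("E1", "E8"): (93.2, 76.0),
    ("CA11", "CA15"): (96.7, 80.1),
    ("CA13", "CA18"): (95.1, 77.2),
    ("CA3", "CA8"): (83.3, None),  # nucleotide identity not quoted
}

calls = {pair: same_serotype(aa, nt) for pair, (aa, nt) in reported.items()}
for pair, homologous in calls.items():
    print(pair, "same serotype" if homologous else "distinct serotypes")
```

Because the boundaries are described as approximate and are based on a limited number of strains per serotype, pairs that fall close to a cutoff (such as E1 and E8 at 76.0% nucleotide identity) would still warrant confirmation by neutralization.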
We have demonstrated the general utility of the enterovirus VP1 sequence database by its application to basic problems in enterovirus classification, phylogeny, and evolution. Based on the success of poliovirus molecular diagnostics targeted to VP1 (12, 13, 31, 32, 69, 70) and the shortcomings of current molecular methods for identifying nonpoliovirus enteroviruses (2), we conclude that future enterovirus molecular diagnostic development efforts should be targeted to the genomic region encoding VP1. Hybridization, PCR, and sequencing assays targeted to other regions of the genome may still be useful in identifying and characterizing intertypic enterovirus recombinants. The existence of a complete VP1 sequence database that includes representatives of all human enterovirus serotypes will facilitate the development of generic and serotype-specific molecular reagents for the rapid diagnosis of enterovirus infections, for the detailed molecular epidemiologic study of enterovirus disease outbreaks, for the characterization of newly discovered enteroviruses, and for the study of enterovirus evolution within and among the 66 human enterovirus serotypes.