The complete VP1, 2A, and 3C and partial 5'UTR regions of 395 EV71 and 6 coxsackievirus A 16-like (CA16-like) strains isolated in Taiwan in 1998 to 2003 were determined. The aligned lengths of VP1, 2A, and 3C were 891, 450, and 549 nucleotides, respectively. The aligned 5'UTR region including indels was 561 nucleotides long, which spans 151 to 701 nucleotides of EV71-A (BrCr, U22521). Therefore, 2451 nucleotides were sequenced from each of the 401 viral isolates. The strains examined in this study are listed in Additional file 1
, Table S1, with the year of isolation and associated clinical symptoms, if known. According to sequence similarities of VP1, genotypes of the recovered viral strains are listed in Table . As previously reported [19
], EV71-C2 was the major etiologic group in 1998, and EV71-B4 became predominant during the period from 2000 to 2003. Our large-scale survey also revealed that in addition to the major genotype, there were genotypes with very low frequencies co-circulating each year. For example, in addition to the previously reported EV71-C2 and -B4, EV71-C1 and CA16-like were also recovered in 1998. In addition, genotype C2 was found in 1999, and genotypes C4 and CA16-like were discovered in 2001.
List of viral strains based on the VP1 region collected from 1998 to 2003
Phylogeny and recombination of EV71
Within each segment, the homoplasy test [20
], GENECONV [21
], and GARD [22
] failed to identify any recombination event for either within- or between-genotype comparisons. In order to test for possible recombinations between sequenced regions, we combined the four segments for further analysis. The concatenated dataset was divided into 10 lineages at the 1% level of divergence. According to sequence similarities of VP1, genotype B4 included four lineages (1 to 4); genotype C contained three lineages, i.e., lineages 5 (genotype C4), 6 (C1), and 7 (C2); and CA16-like had two lineages, 8 and 9. The last lineage which contained only one viral isolate, TW-2051-98, was identified as a recombinant of lineages 4 and 7 (p
< 0.01; KH test after GARD and homoplasy test) with the breaking point located between 2A and 3C, and thus it was excluded from further analysis (Additional file 2
, Figure S1). No other recombination was detected within any lineage. The occurrences of major lineages from 1998 to 2003 are given in Figure . With the exception of 2000, EV71 outbreaks in each season contained only one major lineage. In addition, each lineage was unique to that sampled season with no recurrence in later outbreaks. For example, lineage 4 was from 1998 and 1999, lineages 2 and 3 contained viral strains recovered mainly from 2000, and lineage 1 included viruses from 2001 to 2003.
Figure 1 Occurrences (left y-axis) and temporal changes in the relative genetic diversity (right y-axis) of major enterovirus (EV) 71 lineages each month. Occurrences of major lineages, lineages 1~4 (EV71-B4) and lineage 7 (EV71-C2), are shown as a bar plot. The (more ...)
Phylogenetic relationships of viral strains derived from different genomic regions are given in Figure which only shows abridged phylogenies. The complete phylogenetic trees are available in Additional file 3
, Figure S2. With the exception of the 2A region, phylogenies derived from different regions, including the 5'UTR, VP1, and 3C, support monophyletic origins of the different lineages (> 70% bootstrap replicates) in most cases, further supporting no recombination existing between them. Since essentially identical relationships of different EV71 lineages were yielded by the maximum-likelihood (ML), neighbor-joining (NJ), and Bayesian inference (BI) methods, we only present the phylogenetic trees constructed by NJ method with nodal supports derived from NJ, ML, and BI analyses given at the nodes.
Figure 2 Neighbor-joining (NJ) trees showing the phylogenetic relationships among human enterovirus (HEV)-A and enterovirus (EV) 71 isolates. The trees were constructed based on the nucleotide substitution model determined by the Akaike information criterion (AIC) (more ...)
In the VP1 region, EV71-B4 recovered from different years closely agreed with the phylogenetic positions. Lineage 4 was at the base of the B4 clade followed by lineages 3, 2, and 1. A monophyletic origin of genotype C, including lineages 5, 6, and 7 was strongly supported by the NJ method but with only a 50% bootstrap value in the ML method (Figure ). These three lineages together with EV71-A and -B4 formed a monophyletic group with a high bootstrap value (99%). CA16 and its related strain, CA16-like, formed a sister group to EV71. In the 2A region (Figure ), all above mentioned lineages are still formed but with low bootstrap support. CA16 did not form a sister group to lineages 8 and 9 as shown in Figure . In the 3C region, lineages 5, 6, and 7 were not clustered together as in Figure and . Instead, lineages 6 and 7 were a sister group to CA8 with 99% bootstrap support in both the ML and NJ methods and 73% probability for BI analysis (Figure ). Furthermore, CA16 was not a sister group to lineages 8 and 9. The former was clustered with CA4, and the latter pair was clustered at the bottom of the tree with CA2, CA3, CA6, CA10, CA12, and EV71-A (with 99% bootstrap support in both the ML and NJ trees and 73% probability for BI analysis). In the NJ tree constructed using the 5'UTR, lineages 6 and 7 together were a sister group to CA8 (Figure ).
Inconsistent phylogenies derived from different genomic regions imply frequent recombinations between different species of human enterovirus species (HEV)-A as previously noted [23
]. In Figure , lineages 5 and 6 clustered tightly with CA8 in the 3C and 5'UTR regions, but they were recognized as EV71 genotype C (C1 and C2, respectively) based on the VP1 sequences. It is likely that genotypes C1 and C2 were generated by acquiring the structural domain of ancestral EV71. We further used the bootscan program to detect the region of possible recombination. When EV71-C1 and -C2 were compared against different viruses of HEV-A, two recombination events between CA8 and EV71 (either genotype A or B or both combined) in the 5'UTR (within nucleotide 400~600) and 2A regions (within nucleotide 3400~3600) were clearly demonstrated (Figure ). Other examples of sequence acquisition were provided by the analyses of the EV71-C4 (Figure ) and CA16-like (Figure ) viral strains, the structural domains of which only resembled those of EV71-C1+C2 and CA16, respectively. Interestingly, in the above three cases, the break points of recombination were all located at the 3' end of the 5'UTR and 2A regions which neatly covers the entire structural (= P1) domain of the enterovirus.
Figure 3 Identification of recombined regions between different species of human enterovirus (HEV)-A. The upper panel shows the genomic structure of enterovirus (EV) 71. The results from a bootscan analysis indicated the likelihood of recombination of (a) EV71-C1+C2, (more ...)
Nucleotide diversity and divergence
Average numbers of mutation per site for each year are given in Table . We focused only on the major genotypes of each year, including EV71-C2 from 1998 and EV71-B4 from 2000 to 2002. The nucleotide polymorphisms (θ) of the 5'UTR in different years were 1.52%~2.46% which were significantly lower than those of synonymous sites (θs) of the three protein coding regions (p = 0.014; Mann-Whitney test). In addition to low polymorphism, the 5'UTR had higher proportions of singletons than did synonymous sites of protein coding regions. Proportions of non-singletons polymorphisms, i.e., the number of mutations which occurred more than once divided by the number of total mutations, in the 5'UTR was significant lower than those of synonymous sites of VP1 (p = 0.021), 2A (p = 0.041), and 3C (p = 0.041). θs values derived from different coding regions ranged 8.11%~10.03% for VP1, 8.37%~12.33% for 2A, and 9.34%~11.38% for 3C, and the differences among them were not significant (p > 0.05). Nevertheless, nucleotide polymorphisms at nonsynonymous sites (θa) of VP1, ranging 0.19%~0.40%, were significantly lower than those of 2A and 3C which were 0.46%~0.97% (p = 0.014). Consequently, the ratios of nonsynonymous to synonymous nucleotide polymorphisms (θa/θs) of VP1, ranging 0.024~0.04, were 2~3-times smaller than those of 2A and 3C (p = 0.014). We also analyzed differences in ratios of nonsynonymous (N) to synonymous (S) mutations among the three coding regions. The N/S ratios for VP1, at 0.08~0.14, were smaller than those of 2A (p = 0.001) and 3C (p = 0.014). The difference between VP1 and 2A became insignificant after the singletons were removed.
Polymorphisms in coding and non-coding regions of enterovirus (EV) 71 derived from viruses collected in different years.
Ka and Ks values between EV71 and HEV-A are shown in Table . Since the outgroup of EV71 was difficult to identify, other than for the VP1 region (Figure ), results are expressed as the average divergence between EV71 and different species of HEV-A. Divergences of the 5'UTR (K), ranging 0.116~0.139, were much lower than Ks values of the protein coding regions, which ranged 1.1~1.8 (Table ). The Ka of VP1 between EV71 and other HEV-As, being 0.384 for EV71-B4 and 0.383 for EV71-C2, were higher than those of 2A and 3C which were 0.038~0.057. As a result, rates of protein evolution after calibration by the neutral mutation rate (as presented by Ka/Ks) for VP1, at 0.211 and 0.221, were much higher than those of 2A and 3C, at 0.023~0.034. Small Ka and Ka/Ks values between EV71-B4 and -C2, at 0.016 and 0.015, respectively, indicated that among different genotypes of EV71, VP1 was extremely conserved.
Divergences in coding and non-coding regions between different genotypes and species of human enterovirus (HEV)-A
Tests for positive selection
Since Ks was saturated, the Ka/Ks ratio should be interpreted with care. We tested whether different protein coding regions had different selection regimes by HyPhy (see "Methods"). The results of likelihood ratio test (LRT) revealed no sign for codon under positive selection. In addition, compared to VP1, 2A and 3C were under stronger purifying selection (p < 10-3 for VP1 vs. 2A, p < 0.05 for VP1 vs. 3C, and p = 0.91 for 2A vs. 3C). Therefore, higher Ka/Ks ratio in VP1 cannot be solely imputed to the saturation on synonymous sites, which suggests that different evolutionary forces may have dominated different regions of the genome.
Different approaches were applied to test the hypothesis of positive selection in the VP1 region. We first performed site model tests implemented in PAML to search for evidence of positive selection. The results of LRTs for M1a vs. M2a and M7 vs. M8 were both insignificant, suggesting that the evidence of positive selection working on a particular codon was limited. Using model M2a, we estimated that 98% of the codons were subject to negative selection with an average Ka/Ks ratio of 0.01, 2% had evolved under near neutrality with Ka/Ks ≈ 1, and there was no site with Ka/Ks > 1. We also used branch-site model A to test for positive selection on individual sites along specific lineages (foreground branches). Different lineages were sequentially assigned as foreground branches to test against the null model with Ka/Ks = 1. This test also failed to detect a signature of positive selection on specific branches. PAML also implemented a Suzuki and Gojobori-style [24
] method to test the effects of selection at individual sites with changes according to ancestral states reconstructed by the likelihood method. The results suggested that 98% of the sites were under negative selection, and that only 2% had Ka/Ks values of > 1, but they were all insignificant. Similar results were noted in the analysis of the poliovirus surface protein [25
]. Therefore, within enterovirus, evidence of positive selection at individual sites was limited.
Because synonymous substitutions were saturated (Ks > 1; Table ), the standard Ka/Ks analysis might not be a suitable statistical test for positive selection. We, therefore, performed computer simulations. In each of 10,000 simulations, the number of nucleotide changes was drawn from a Poisson distribution until Ks was equal to the average Ks between EV71 and HEV-A (see "Methods"). Ratios of nonsynonymous (N) to synonymous (S) mutations were chosen according to the observed N/S ratios in the polymorphisms, which were 0.10, 0.24, and 0.21 for VP1, 2A, and 3C, respectively (Table ). The objective of this test is to detect natural selection in the sequences. If the selection intensity is constant over the evolutionary time scale, the ratio of N to S fixations should be equal to the ratio of N to S polymorphisms. The pattern will differ, however, when other forms of selection act on nonsynonymous mutations. Positive selection will increase the divergence relative to polymorphisms at selected sites, whereas negative selection is expected to result in the opposite pattern.
Simulations were done for both EV71-C2 and B4, and the results were essentially the same. For illustrative purposes, we only present the results of genotype B4 (Figure ). For VP1, the observed Ka and Ka/Ks values were significantly higher than the simulated results (p
; Figure ), indicating accelerated fixation of amino acid replacement substitutions. Such a pattern was reversed for 2A and 3C (Figure ). In the 3C region, the observed Ka and Ka/Ks values were lower than the predictions, but the difference was insignificant. Significantly lower Ka and Ka/Ks values in the simulations than in observations for 2A suggest the effect of negative selection preventing deleterious mutations from becoming fixed. Deleterious mutations are usually at a low frequency and inflate the N/S ratio of polymorphisms. This problem can be partially overcome by removing rare mutations from the sample [26
]. The N/S ratio of 2A was largely reduced, from 0.24 to 0.12 (Table ), after singletons were removed. We then reset the N/S ratio from 0.24 to 0.12 for 2A in the simulation, and differences in Ka and Ka/Ks between observations and predictions became non-significant (Figure ). In short, when singletons were not counted, 2A and 3C showed good agreement with the neutral mutation hypothesis.
Figure 4 Observed (filled box) and simulated (open box) Ka and Ka/Ks values for VP1 (A), 2A (B), and 3C (C), respectively, between enterovirus (EV) 71-B4 and human enterovirus(HEV)-A. Simulated Ks values were set equal to the observations. The proportions of nonsynonymous (more ...)
Reduced levels of polymorphism and divergence in the 5'UTR resemble general patterns of protein evolution and indicate that the 5'UTR was either under selection or was subjected to a lower mutation rate than were synonymous sites. A framework, proposed by McDonald and Kreitman (MK), of comparing levels of polymorphism within and divergence between different viruses can distinguish neutrality from selection in the sequences [27
]. If reduced levels of polymorphism and divergence in the 5'UTR can be explained by a lower mutation rate, the ratio of polymorphisms to divergences should be similar to that for synonymous sites. Although, the MK test was originally developed to detect selection at classes of different sites that have the same evolutionary history, it can also be generalized to consider other classes of selected sites from different regions, including non-coding regions [26
], even when they are completely unlinked [28
]. We then applied the MK test to the 5'UTR of EV71-C2 vs. CA8 and EV71-C2 vs. -C1 using polymorphisms and divergence in the 3C region as a reference, because CA8 was strongly supported as the outgroup of EV71-C1 and -C2 in these two regions (Figure ) and because Ks values between CA8 and EV71-C2 at 0.73 and between EV71-C1 and -C2 at 0.42 were not saturated.
We first applied the MK test using all polymorphisms, and the non-significant results suggested a neutral pattern of molecular evolution which seemed to support a low mutation rate in the 5'UTR of HEV-A. Nevertheless, since the 5'UTR exhibited many rare variants (Table ) which may be deleterious and are likely to be a factor limiting the power to detect selection in the sequences [29
], we next removed all singletons from the sequences. Applying this approach revealed a significant excess of divergence in the 5'UTR of EV71-C2 compared to CA8 (p
= 0.045; Chi-square test) and EV71-C1 (p
= 0.026; Table ). This result suggested that a significant fraction of divergence in the 5'UTR was probably driven to fixation by positive selection.
Polymorphism and divergence in the 5' untranslated region (UTR) and 3C regions of enterovirus (EV) 71-C
Population demographics and the rate of change
Evolutionary dynamics, represented by the VP1 region, were estimated using an established Bayesian Markov chain Monte Carlo (MCMC) approach that incorporates the month of viral sampling. In the absence of natural selection, the genetic diversity obtained reflects the change in the effective number of infections over time [30
]. Nevertheless, a possible role of selection was demonstrated for VP1 of EV71 in a previous section; the plot was interpreted as a measure of the relative genetic diversity. Changing patterns of genetic diversity clearly showed temporal dynamics of EV71 isolated from 1998 to 2003 (Figure ). Two peaks, from left to right, respectively represent the outbreak from late 1999 to 2000 (lineages 2 and 3) and the epidemic seasons of 2001 to 2003 (lineage 1). Genetic diversities were largely reduced at the end of each epidemic season demonstrating genetic bottlenecks. Because EV71-C2 (lineage 7) was mainly recovered in 1998 which yielded a large uncertainty in the diversity estimation, we did not include lineage 7 in Figure .
For each region, the time to the most recent common ancestor (TMRCA) recovered from different outbreaks was separately estimated (Table ). For lineage 1, and lineages 2 and 3, the TMRCAs derived from different segments were very close, indicating that within these epidemic seasons, different genomic regions may have shared similar evolutionary histories. For lineage 7, the TMRCAs of four different segments exhibited greater variation than the above three lineages. All the TMRCAs predated the beginning of the outbreaks. For example, the TMRCA of VP1 of lineages 2 and 3, both of which caused the epidemic from late 1999 to 2000, was 1998.6 (i.e., June 1998) with the 95% highest posterior density (HPD) ranging 1997.9~1999.2. The TMRCA of VP1 for viral strains recovered from 2001 to 2003 (lineage 1) was 2000.8 (95% HPD, 2000.5~2001.1). Since our sequences were collected within a relatively short period of time which may have constrained the posterior distribution of that TMRCA, we next included VP1 sequences collected between the 1990 s and 2003 from surrounding areas for TMRCA estimation (see "Methods"). The results shown in Table are close to our original estimations with narrower 95% HPDs. We also performed a Bayesian MCMC analysis using randomly associated sequences and dates. For the VP1 region, the TMRCAs of the generated dataset were 1852 (95% HPD, 1480~1991) for lineage 1 and 1725 (95% HPD, 1063~1985) for lineages 2 and 3. Therefore, our estimation of TMRCAs is sound.
The time to the most recent ancestor (TMRCA) of different segments of enterovirus 71 derived from this study.
The evolutionary rates of different genomic segments of EV71-B4 recovered in this study between 1998 and 2003 were estimated. The overall rates of change estimated by the linear regression method for the VP1, 2A, 3C, and 5'UTR regions were 4.6 × 10-3
, 6.8 × 10-3
, 7.1 × 10-3
, and 5.3 × 10-3
per site per year (/site/year), respectively. These values are generally in good agreement with those derived from the Bayesian MCMC method which were 5.7 × 10-3
, 6.7 × 10-3
, 6.3 × 10-3
, and 5.5 × 10-3
/site/year for VP1, 2A, 3C, and 5'UTR, respectively (Table ). In addition, evolutionary rates estimated by the linear regression fell within the 95% HPDs derived from the Bayesian MCMC method. Similar values derived from different approaches indicated that our estimates were authentic. For the protein coding regions, approximately 70%~90% of all changes were synonymous changes (Table ). We then estimated the rate of change for the third codon position by Bayesian MCMC and on synonymous sites by the linear regression method (see "Methods"). The evolutionary rate for third codon position for VP1, 2A, and 3C were 1.48 × 10-2
, 1.66 × 10-2
, and 1.51 × 10-2
(/site/year), respectively. Because not all changes on the third codon position are synonymous, these values are smaller than synonymous mutation rates which were 2.2 × 10-2
, 3.5 × 10-2
, and 3.2 × 10-2
/site/year for VP1, 2A, and 3C, respectively. For the 2A and 3C regions, synonymous mutation rates are close to that of the poliovirus (3.96 × 10-2
], but higher than estimates by Hanada et al. (1.00 × 10-2
] and Brown et al. (1.35 × 10-2
] of enterovirus.
Rates of change in different genomic regions of enterovirus (EV) 71-B4 derived from this study