The eastern equine encephalitis (EEE) complex consists of four distinct genetic lineages: one that circulates in North America (NA EEEV) and the Caribbean and three that circulate in Central and South America (SA EEEV). Differences in their geographic, pathogenic, and epidemiologic profiles prompted evaluation of their genetic diversity and evolutionary histories. The structural polyprotein open reading frames of all available SA EEEV and recent NA EEEV isolates were sequenced and used in evolutionary and phylogenetic analyses. The nucleotide substitution rate per year for SA EEEV (1.2 × 10⁻⁴) was lower and more consistent than that for NA EEEV (2.7 × 10⁻⁴), which exhibited considerable rate variation among constituent clades. Estimates of time since divergence varied widely depending upon the sequences used, with NA and SA EEEV diverging ca. 922 to 4,856 years ago and the two main SA EEEV lineages diverging ca. 577 to 2,927 years ago. The single, monophyletic NA EEEV lineage exhibited mainly temporally associated relationships and was highly conserved throughout its geographic range. In contrast, SA EEEV comprised three divergent lineages, two consisting of highly conserved geographic groupings that completely lacked temporal associations. A phylogenetic comparison of SA EEEV and Venezuelan equine encephalitis viruses (VEEV) demonstrated similar genetic and evolutionary patterns, consistent with the well-documented use of mammalian reservoir hosts by VEEV. Our results emphasize the evolutionary and genetic divergences between members of the NA and SA EEEV lineages, consistent with major differences in pathogenicity and ecology, and we propose that NA and SA EEEV be reclassified as distinct species in the EEE complex.
Eastern equine encephalitis virus (EEEV) is an important veterinary and human pathogen belonging to one of seven antigenic complexes in the Alphavirus genus, family Togaviridae (32). Isolated throughout the Americas, EEEV is classified as the only species in the eastern equine encephalitis (EEE) complex (9, 10), which was originally divided into North and South American varieties based on antigenic properties (11). However, additional antigenic and phylogenetic analyses have refined its classification to include four subtypes that correspond to four major genetic lineages (I to IV) (7, 55). North American EEEV (NA EEEV) strains and most strains from the Caribbean comprise subtype/lineage I, while subtypes/lineages II to IV include South and Central American EEEV (SA EEEV) strains. The EEEV genome consists of a nonsegmented, single-stranded, positive-sense RNA of approximately 11.7 kb, which includes a 5′ cap and a 3′ poly(A) tail. The 5′ end of the genome encodes four nonstructural proteins (nsP1 to -4), while a subgenomic RNA (26S) is encoded by the 3′ end and ultimately produces three main structural proteins: capsid and envelope glycoproteins E1 and E2 (46).
Despite considerable nucleotide sequence divergence between NA and SA EEEV lineages, NA EEEV is highly conserved throughout its geographic and temporal spectra. Multiple robust analyses have demonstrated less than 2% nucleotide sequence divergence among NA EEEV strains isolated between 1933 and 2007 (5, 7, 64, 68, 69). An overall temporal trend of genetic conservation is also maintained, with newer isolates differing most from ancestral strains at the base of the North American clade (7, 64). In contrast, SA EEEV is highly divergent both between and among the three lineages/subtypes. Although less robust than previous NA EEEV phylogenetic analyses, those of SA EEEV show a tendency for geographic clustering of isolates rather than temporal relationships (7). Differing patterns of genetic conservation between NA and SA EEEV may be the result of differences in their ecology and adaptation to different mosquito and vertebrate hosts (65).
Transmission of NA EEEV occurs in an enzootic cycle involving the ornithophilic mosquito vector Culiseta melanura and passerine birds in hardwood swamp habitats (32, 43). The broad geographic distribution and distinctly ornithophagic behavior of Cs. melanura result in a close relationship between NA EEEV and avian vertebrate hosts, which is one proposed mechanism for its highly conserved genetic nature. Infected birds provide for efficient geographic dispersal and the mixing of strains with distant origins. While genetic drift tends to have less impact on large, panmictic populations, competition and natural selection may periodically constrain genetic diversity in the NA EEEV population, resulting in the antigenic and genetic conservation observed (64, 66). Transmission of NA EEEV by bridge vectors probably does not impact viral evolution; however, it does result in sporadic outbreaks of severe disease in humans, equids, and other domestic animals, including game birds, swine, and dogs that are considered dead-end hosts (22, 23, 43, 50).
Although they are associated with equine disease, SA strains of EEEV are not clearly associated with human disease (4, 17, 18, 40). This lack of human pathogenicity has limited research to expand our epidemiologic and ecologic understanding of SA strains. EEEV isolations from Culex (Melanoconion) spp. in the Spissipes section (Culex pedroi in South America and Culex taeniopus in Central America) suggest that they are the primary enzootic, and potentially epizootic, vectors (28, 33, 53, 58). Movement of these vectors beyond their tropical forest habitat is typically limited (29), which may influence the focality of transmission. However, these species are relatively catholic in their feeding behavior, which broadens the potential transmission cycles used by SA EEEV. Greater vector diversity in tropical regions may also contribute to genetic diversity among the SA EEEV lineages, although vector competence data are limited.
The vertebrate ecology of SA EEEV is not well described, with serological associations including wild birds, ground-dwelling rodents, marsupials, and reptiles (12, 17, 31, 45, 56, 57, 58). The observed genetic divergence and geographic clustering of the SA EEEV phylogeny could reflect the use of ground-dwelling mammals as primary hosts for enzootic transmission (43, 65). With limited mobility, these vector and vertebrate species may restrict the distribution of SA EEEV to geographically defined regions, thus limiting competition among distant strains and allowing for the independent evolution of genetic lineages (65). Geographically delineated transmission foci may also be more susceptible to the impacts of genetic drift, thus constraining genetic diversity locally. Venezuelan equine encephalitis viruses (VEEV), which also utilize Culex (Melanoconion) sp. vectors and small mammals as primary vertebrate hosts (15, 42, 51, 52, 59, 67), exhibit a similar genetic pattern of independent evolution and multiple, cocirculating subtypes in Central and South America (60). However, a robust comparison of the evolutionary patterns between SA EEEV and VEEV has not been conducted.
Elucidation of patterns of enzootic transmission and dispersal of zoonotic, arboviral pathogens is critical for understanding and predicting the risk to human health. Therefore, we studied the evolutionary progression of the EEE complex to clarify the extent of divergence between NA and SA EEEV. Because previous analyses of SA EEEV were either limited in their geographic scope or utilized only partial, concatenated sequences, conclusions regarding the genetic relationships of members within and among EEEV lineages were limited. In addition, previous analyses utilized linear regression and were based on few representatives of a single SA EEEV lineage. Here we exploited contemporary techniques to sequence and analyze the structural protein open reading frames (ORFs) of all available SA EEEV and additional NA EEEV isolates and phylogenetically compared SA EEEV and VEEV. Our results support evolutionary and ecological diversity between NA and SA EEEV and suggest that NA and SA lineages be considered independent species in the EEE complex.
Table 1 lists all EEEV strains included in this study, which were either from our collection or kindly provided by Robert Tesh (UTMB) from the World Reference Center for Emerging Viruses and Arboviruses. RNA was extracted using a QIAamp viral RNA extraction kit (Qiagen Inc., Valencia, CA), according to the manufacturer's protocol. The cDNA synthesis and PCR amplification reactions were conducted simultaneously using a Titan one-tube RT-PCR kit (Roche Diagnostics Corp., Indianapolis, IN), according to the manufacturer's protocol. The complete structural polyprotein ORFs of all EEEV strains were amplified by producing three overlapping fragments (primer sequences available upon request). SA EEEV strain GU68 required the use of additional strain-specific primers to fill gaps, and random hexamer primers were used to produce cDNA, followed by PCR in two-step RT-PCRs for strains BR75, BR76, BR77, PE75, and GU68. The PCR amplifications included 35 cycles, with annealing temperatures set to 3 to 5°C below the lowest melting temperature of each primer pair, and a 1-min extension step per kb of genome amplified.
PCR amplicons were extracted using agarose gel electrophoresis and purified using the QIAquick PCR purification kit (Qiagen). DNA sequencing was performed using the BigDye Terminator version 3.1 cycle sequencing kit (Roche) and an Applied Biosystems 3100 genetic analyzer (Foster City, CA). Independent sequencing reactions used both the forward and reverse amplification primers (3.2 pmol) and multiple internal sequencing primers.
Nucleotide sequences were aligned using ClustalW (48) in the MacVector 9.0 software package (MacVector, Inc.). The final sequence alignments were manually adjusted according to the translated ORF alignment. Pairwise comparisons were performed using MacVector; phylogenetic analyses were performed with multiple methods using the PAUP* version 4.0b10 (47) and BEAST version 1.4.7 (20) software packages, and bootstrap resampling was performed with 1,000 replicates (25). The heuristic search algorithm was used in maximum parsimony (MP) analyses, and the neighbor-joining (NJ) distance matrix algorithm was used with the Hasegawa-Kishino-Yano 85, Kimura 3-parameter, and general time-reversible (GTR) substitution models. Maximum likelihood (ML) analyses were performed with the heuristic search method using the GTR plus gamma distribution plus a proportion of invariant sites (GTR+G+I) model, as recommended by Modeltest 3.7 (35), and refined with multiple iterations of parameter estimates. The resultant ML substitution model parameters were also applied to NJ analyses for additional validation and bootstrapping. BEAST was used to implement a Bayesian Markov chain Monte Carlo (MCMC) method using the codon-based SRD06 nucleotide substitution model (44). Further details of the Bayesian analysis are provided below. As the most closely related alphavirus, VEEV was used as an outgroup to root some EEEV trees.
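The bootstrap resampling step mentioned above can be illustrated with a minimal Python sketch. PAUP* performs this resampling internally; the function name, the toy sequences, and the random seed below are assumptions made for illustration and are not taken from the study. Each replicate draws alignment columns with replacement until it has as many sites as the original alignment; a tree would then be inferred from every replicate.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns resampled with replacement."""
    n_sites = len(next(iter(alignment.values())))
    # Draw as many column indices as there are sites, with replacement.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


# Toy aligned sequences (hypothetical, not real EEEV data).
toy = {"strainA": "ATGGCA", "strainB": "ATGACA", "strainC": "ATAACG"}

rng = random.Random(1)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
print(len(replicates), "replicates of", len(replicates[0]["strainA"]), "sites")
```

Because whole columns are resampled, every strain in a replicate keeps the same site order, which is what allows support values to be computed per clade.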
The BEAST software package was used to conduct Bayesian evolutionary analyses, including phylogenetic and coalescent analyses, from data sets compiled using the BEAUti interface. BEAST analyses produce rooted phylogenetic trees that incorporate a time scale based on rates of evolution estimated for each tree branch or group of related sequences. Rates of evolution were independently estimated as substitutions per nucleotide site per year (s/n/y), assuming both the relaxed and strict molecular clock models. Appropriate single or variable rates were then used to estimate divergence times (i.e., time since most recent common ancestor [TMRCA]) of the EEEV complex and of individual lineages. When available, dates of isolation for each strain were provided to the month; otherwise, they were designated as midway through the calendar year. All analyses were initially run with the relaxed molecular clock model using the uncorrelated lognormal distribution (UCLD) (19) to account for rate heterogeneity among lineages and to indicate the degree to which the data fit a clock-like model of evolution. If clock-like evolution could not be rejected (as measured by the UCLD standard deviation [UCLD.STDEV] and coefficient of variation parameters), the analyses were then conducted under the strict molecular clock model to further refine the rate of evolution and divergence dates.
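The tip-date convention described above can be sketched in a few lines of Python. The function name is hypothetical, and placing a month-resolved isolate at the midpoint of its month is an assumption; the text states only that dates were given to the month when available and set to midyear otherwise.

```python
def decimal_date(year, month=None):
    """Convert an isolation date to a fractional year for use as a tip date.

    month given   -> midpoint of that month (an assumption of this sketch)
    month unknown -> midway through the calendar year, as in the text
    """
    if month is None:
        return year + 0.5
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return year + (month - 0.5) / 12


print(decimal_date(1968))     # year only: 1968.5
print(decimal_date(2003, 7))  # July 2003, mid-month
```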
The Bayesian skyline coalescent model (21) was used in all strict and relaxed molecular clock analyses. The SRD06 model parameters were applied because they have been shown to impose a reasonable balance of prior information to fit coding nucleotide data (44). This model links first and second codon positions but allows the third position to differ in the rate of nucleotide substitution, the transition/transversion (Ti/Tv) ratio, and gamma-distributed rate heterogeneity. Convergence was monitored using the Tracer version 1.4 (http://beast.bio.ed.ac.uk/Tracer) software program, and the MCMC algorithm was run for a number of generations sufficient to obtain effective sample size (ESS) values of at least 200 for each parameter in the model. At least two independent runs were performed for each data set. While chain length varied among analyses, the chains generally consisted of 10,000,000 to 50,000,000 generations, with parameters sampled and logged every 1,000 generations. Maximum clade credibility trees were generated (with 10% burn-in) to display median node heights using TreeAnnotator version 1.4.7 and visualized using FigTree version 1.2.2 (http://tree.bio.ed.ac.uk/software/figtree).
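As a rough illustration of the sampling scheme described above, the following Python sketch counts how many logged samples remain after burn-in and checks the ESS threshold of 200. The function names and the ESS values are hypothetical, and the count ignores the initial state logged at generation 0, so it is an approximation rather than a description of BEAST's exact behavior.

```python
def retained_samples(chain_length, sample_every=1000, burn_in_percent=10):
    """Approximate number of logged samples kept after discarding burn-in."""
    total = chain_length // sample_every
    discarded = total * burn_in_percent // 100
    return total - discarded


def all_parameters_converged(ess_by_parameter, threshold=200):
    """True if every parameter's ESS meets the threshold used in the text."""
    return all(ess >= threshold for ess in ess_by_parameter.values())


for chain_length in (10_000_000, 50_000_000):
    print(chain_length, "generations ->", retained_samples(chain_length), "samples")

# Hypothetical ESS values, for illustration only.
example_ess = {"ucld.mean": 512.3, "treeModel.rootHeight": 187.0}
print(all_parameters_converged(example_ess))  # one value is below 200
```

Under these assumptions, the shortest and longest chains mentioned in the text would retain about 9,000 and 45,000 samples, respectively.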
The complete structural polyprotein ORF of approximately 3.7 kb was sequenced for 25 SA EEEV strains and 4 NA EEEV strains. These new sequences were combined with all homologous EEEV sequences available from GenBank for a data set comprising 29 SA EEEV and 22 NA EEEV strains (Table 1; Fig. 1). The monophyletic nature of the EEE complex within the Alphavirus genus and the presence of four major EEEV lineages were validated using all phylogenetic methods (Fig. 2A). Consistent with previous findings (7), lineage I included isolates from North America, lineages II and III included isolates from Central and South America, and lineage IV contained a single strain from Brazil. The inclusion of longer and additional sequences in our analysis further supported the sister grouping of SA EEEV lineages II and III and the polyphyletic nature of all three Central/South American clades.
Pairwise comparisons of both nucleotide and amino acid sequences were used to determine the genetic relatedness among members of the EEE complex as well as their relatedness to VEEV (Table 2). The NA and SA EEEV lineages consistently showed 23 to 24% nucleotide and 9 to 11% amino acid sequence divergence. The SA EEEV were only slightly more conserved than the overall EEE complex, with 17 to 21% nucleotide divergence between the two main lineages (II and III) but only 3 to 5% amino acid divergence, indicating a high proportion of synonymous nucleotide changes. Greater divergence was observed between SA EEEV lineage IV and the other two SA lineages, particularly at the amino acid sequence level.
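The distinction between nucleotide and amino acid divergence can be illustrated with an uncorrected pairwise distance (the proportion of differing sites). This Python sketch is not the MacVector procedure, whose exact settings are not stated here; the function name, the gap handling, and the toy sequences are assumptions.

```python
def p_distance(seq_a, seq_b, gap_chars="-"):
    """Proportion of differing positions between two aligned sequences.

    Positions where either sequence has a gap character are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


# Toy example: two codons each, both encoding Leu-Ala.
nt_1, nt_2 = "CTGGCA", "TTAGCC"
aa_1, aa_2 = "LA", "LA"
print(p_distance(nt_1, nt_2))  # 3 of 6 nucleotides differ
print(p_distance(aa_1, aa_2))  # identical proteins
```

In the toy pair, half of the nucleotides differ while the encoded amino acids are identical, which is the pattern of predominantly synonymous change described for lineages II and III.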
The degree of genetic divergence within each EEEV lineage varied greatly. NA EEEV lineage I was highly conserved, with less than 3% nucleotide divergence throughout its temporal and geographic range. The independent clades comprising SA EEEV lineage II differed from one another by approximately 5% and from the basal isolate (GU68) by 11 to 12%. SA EEEV lineage III was more highly conserved, with only 4 to 5% sequence divergence among strains. Consistent with previous alphavirus intercomplex comparisons (37), all EEEV lineages, and each of their members, differed from subtype I VEEV by 41 to 43% in both nucleotides and amino acids.
The temporally dominated evolution and monophyletic nature of the NA EEEV lineage were robustly supported by MP and Bayesian analyses, which placed the older isolates (1933 to 1977) at the base of the clade, followed by subsequent divergence into two distinct, cocirculating groups in the 1970s (Fig. 2B). However, the use of some NJ and ML models resulted in either the placement of MD90/FL93-939 isolates basal to the NA lineage or the paraphyletic codivergence of those isolates from the older isolates. While this arrangement supports the early cocirculation of two monophyletic groups in North America prior to 1970, low bootstrap values and the lack of basal resolution (polytomies) with these methods limited confidence in this theory. Similar inconsistencies in NA EEEV topology were encountered in earlier analyses (64). However, the limited sequence data and lack of early sequences led to the conclusion that NA EEEV evolves as a single lineage. Our robust MP and Bayesian phylogenies validated these previous assumptions. The basal inconsistencies we observed may reflect the inherent limitations of various phylogenetic methods to resolve relationships among very highly conserved sequences.
Although the placement of the MD90/FL93-939 group was inconsistent, the divergence of the NA EEEV lineage into additional monophyletic groups after 1970 was robustly supported in all analyses (Fig. 2B). Previously termed group A and group B by Weaver et al. in 1994 (64), the sympatric cocirculation of these two groups was further validated by our distinct phylogenetic placement of two newly sequenced group A Florida 1993 strains, FL93-969 and FL93-1637, from the group B FL93-939 strain. FL93-969 and FL93-939 were isolated from two different mosquito species that were collected simultaneously from the same county (30).
A temporally dominated pattern of NA EEEV evolution was also evident in the terminal groupings of our most recent isolates, GA01, TX03, MA06, and TN08 (Fig. 2B). The grouping of all recent isolates from Georgia, Tennessee, and Florida supported regional EEEV evolution, with only occasional geographic dispersal (5, 7, 64, 69). While other regional clusters (TX91/MX97/TX95, GL91/FL96, and MD85/CT90) also supported regionally confined transmission, their persistence appeared to be limited, and their topological placement generally followed a temporal trend. However, the basal relationship of a Massachusetts isolate (MA06) to the most terminal Southern grouping also emphasized the wide geographic dispersal and temporal conservation of NA EEEV.
The phylogeny of SA EEEV was stable regardless of the methods and models used and demonstrated an evolutionary pattern very different from that of NA EEEV. Multiple highly divergent lineages of SA EEEV have coevolved and continue to cocirculate in overlapping geographic regions (Fig. 1). A temporal trend of evolution was lacking, and multiple geographic clusters were evident within both of the main SA EEEV lineages (Fig. 2A). The inclusion of longer, contiguous genomic sequences provided the robust support that had been lacking for previously recognized clades (7, 28), and the addition of more recent isolates revealed newly recognized geographic groupings that also lacked a temporal association.
Despite its limited representation, lineage II consisted of multiple genetically divergent SA clades. Brazilian (BR65/BR67) and Peruvian (PE70/PE3.0815-96/PE18.0172-99/PE18.0140-99) groups exhibited a high degree of localized genetic conservation, particularly exemplified by the isolates collected in the Amazon basin of Peru over a span of 30 years. Although lineage III was more highly conserved overall, it was more extensive in its geographic scope and contained numerous geographically based groupings. One such northern South/Central American cluster included isolates from Panama, Colombia, and Ecuador, with a collection time span from 1962 to 1992. Argentinean isolates collected from 1936 to 1959 also formed a robust grouping on the most terminal branches of the lineage, further emphasizing the lack of widespread EEEV dispersal in SA. Finally, a Peruvian clade (PE75/PE16.0050-98/PE0.0155-96) similar to that in lineage II further supported the genetic conservation among isolates from the same geographic area over the same period of time. Most interesting was the apparent cocirculation and persistence of subtypes II and III for multiple decades.
Interestingly, some of the highly conserved geographic SA EEEV clades were closely related to geographically distant isolates. For example, the Peruvian isolates of lineage II grouped with a distant Brazilian isolate (BR56), and those from Argentina consistently grouped with BR83 in lineage III. While long-term geographic groupings could indicate maintenance by vertebrate hosts with limited mobility, these distant relationships could represent historical introductions, perhaps via alternative vector or vertebrate hosts. Sampling bias is also inherent in these analyses, as the majority of SA EEEV isolates originated from equine epizootics, structured arbovirus surveillance, or focused scientific research studies. EEEV circulation in sparsely inhabited tropical regions may go undetected, resulting in an incomplete representation of the SA EEEV phylogeny.
Because VEEV ecology, and especially reservoir host use, is better understood in SA, the phylogenetic patterns of enzootic VEEV subtypes ID and IE were compared to those of SA EEEV. VEEV subtypes IAB and IC utilize fundamentally different epizootic cycles of limited duration and were therefore not considered. To provide an accurate comparison of the topologies and scales of divergence, the phylogeny in Fig. 3 was generated using the structural polyprotein ORFs of both VEE and EEE complex viruses (see Table S1 in the supplemental material). Representative members of all VEE subtypes and two NA EEEV representatives (VA33 and MA06) were included in the tree for context and to provide an accurate topology of the VEE and EEE complexes.
Similar evolutionary patterns were observed between SA EEEV and VEEV subtypes ID and IE, which overlap both geographically and temporally. Many geographic clusters of SA EEEV and VEEV subtype ID/IE isolates were analogous in their spatial and temporal scales and their degree of genetic conservation (Fig. 3). For example, the SA EEEV lineage III grouping that included Panama/Colombia/Ecuador isolates and the Mexican/Guatemala VEEV IE grouping were comparable in their geographic dimensions, their collection times spanned 30 to 40 years, and they maintained similar levels of genetic conservation at approximately 98 to 99%. The lineage II and III Peruvian and lineage III Argentinean EEEV clusters were spatially more focal but equally conserved, which corresponded in geographic and collection time span to the VEEV subtype ID Venezuelan and VEEV subtype IE Mexican (MX63/MX08) and Guatemalan clusters. Although isolated decades apart, the viruses within each group differed in nucleotide sequences by less than 2%.
Despite the well-established role of rodent hosts with limited mobility in the transmission of enzootic VEEV (2, 62), examples of closely related viruses with distant geographic origin were also observed in the VEEV phylogeny (e.g., VEEV subtype ID PA61/PE98). Although fewer sequences are available for other subtypes of the VEE complex, the recent phylogeny (6) of VEE complex subtype IIIA (Mucambo virus) generally agreed with those observed with SA EEEV and VEEV subtypes ID and IE.
The rates of evolution of the EEE complex, NA EEEV lineage I, and SA EEEV lineages II to IV were independently analyzed under the relaxed molecular clock model of evolution (Table 3). We observed a high degree of rate heterogeneity in all three data sets, which signified that these data sets were best modeled with the relaxed molecular clock; therefore, the use of a strict molecular clock model of evolution was rejected. Mean substitution rates (UCLD.mean) were 2.1 × 10⁻⁴ s/n/y for the entire EEE complex, 2.7 × 10⁻⁴ s/n/y for the NA EEEV lineage, and 1.2 × 10⁻⁴ s/n/y for SA EEEV lineages II to IV.
Branch rate variation within the SA EEEV data set was not surprising because it included diverse SA EEEV lineages. Therefore, lineages II and III were individually analyzed using the relaxed clock model to determine the degree of intraclade variation. UCLD.STDEV parameter estimates abutting zero indicated that a strict molecular clock could not be rejected for SA EEEV lineages II and III. A strict clock model applied to the analysis of each lineage (Table 4) yielded a median substitution rate (clock rate) of lineage II (1.5 × 10⁻⁴ s/n/y) that was approximately 1.5 times higher than that of lineage III (1.0 × 10⁻⁴ s/n/y). Both the strict and relaxed clock models yielded similar rates of nucleotide substitution for each SA EEEV lineage, further supporting the robustness of the groupings within these lineages and their clock-like evolution (20).
There was considerable rate variation estimated with the relaxed molecular clock model analysis of NA EEEV; therefore, clades were analyzed individually via the relaxed and strict clock models. Based on NA EEEV phylogenetic analyses conducted in this study and those of previous studies (64), the individual groups analyzed consisted of the following: (i) all strains isolated prior to 1977, designated “pre-1977”; (ii) all strains isolated after 1977, designated “post-1977”; and (iii) post-1977 strains minus MD90 and FL93-939, termed “group B,” which corresponds with that of Weaver et al. in 1994 (64). Because only 2 isolates from group A were included in the present study, substitution rates were not estimated for these isolates.
Analysis of the pre-1977 group did not efficiently reach convergence for all parameters using the relaxed clock model, suggesting a poor fit of this model to the data. In contrast, convergence was quickly reached using the strict clock model, with a median substitution rate estimate of 9.4 × 10⁻⁵ s/n/y. The post-1977 and group B data sets ultimately reached convergence using the relaxed clock model; however, the strict clock model resulted in more efficient convergence and substitution rates similar to those with the relaxed model. The evolutionary rate estimate of 2.2 × 10⁻⁴ s/n/y in the post-1977 isolates was more than twice that of the pre-1977 group, supporting previous observations of an increase in evolutionary rate following the divergence of NA EEEV into two distinct, cocirculating clades in the 1970s (64). Differentially higher passage histories in these two groups may have slightly impacted these estimated evolutionary rates. However, many of the oldest EEEV isolates had very low passage histories (e.g., 1 to 4 for LA47 and LA50), and extensive passage of EEEV is accompanied by relatively few mutations (61), suggesting that any effect on evolutionary rate estimates was minimal. Interestingly, the rate for group B isolates (1.8 × 10⁻⁴ s/n/y) was lower than that of the post-1977 group, i.e., when MD90 and FL93-939 (group A) were removed, which implies that MD90 and FL93-939 evolved at a higher rate than those isolates in group B.
The times since most recent common ancestor (TMRCAs) were estimated using the model that best fit the corresponding data (Fig. 2A). Using the relaxed model and the entire EEE complex, NA and SA EEEV last shared a common ancestor 1,598 (922 to 2,370) years ago, or around the year 410 AD of the Gregorian calendar. The same analysis estimated a divergence of lineage IV (BR85) from the other SA EEEV lineages 1,307 (868 to 1,794) years ago (701 AD), followed by the divergence of lineages II and III 878 (577 to 1,239) years ago (1129 AD). However, the relaxed model analysis that included only SA EEEV lineages produced much earlier TMRCA estimates of 2,166 (1,057 to 4,020) years since the divergence of lineage IV (158 BC) and 1,617 (836 to 2,926) years since the divergence of lineages II and III (391 AD). An additional analysis including all SA EEEV lineages and a single representative of the predominant NA EEEV clade (TX03) was performed in order to generate a TMRCA for the basal divergence of NA and SA EEEV that corresponded to those of the SA EEEV analysis. This analysis resulted in TMRCAs for all internal nodes that were similar to those generated by the SA EEEV strains only. In addition, the estimate for NA and SA EEEV divergence was much earlier, 2,866 years (1,689 to 4,856 years; ca. 878 BC), than that generated from the entire EEE complex data set. Although the confidence intervals broadly overlapped, these wide differences in TMRCAs and the corresponding dates of divergence highlight the variation obtained with the different models and data sets used for coalescent analyses and the imprecision of the estimates based on rate variation among virus lineages.
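The conversion from a TMRCA in years before present to a calendar year can be sketched as a simple subtraction. The reference year of 2008 is an assumption of this sketch: it matches the most recent isolate named in the text (TN08) and reproduces several of the calendar years quoted above, but the reference year actually used is not stated. The function names are hypothetical, and the sketch ignores the absence of a year zero in the Gregorian calendar.

```python
def calendar_year(years_before_present, reference_year=2008):
    """Signed year: positive values are AD, zero or negative values fall in BC.

    reference_year=2008 is an assumed 'present' (most recent isolate, TN08).
    """
    return reference_year - years_before_present


def format_year(signed_year):
    """Label a signed year as AD or BC (simple subtraction, no year-zero fix)."""
    return f"{signed_year} AD" if signed_year > 0 else f"{-signed_year} BC"


for tmrca in (1598, 1307, 1617, 2166):
    print(tmrca, "years ago ->", format_year(calendar_year(tmrca)))
```

Because the published values are medians from separate analyses, small differences between this arithmetic and a quoted calendar year should be expected.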
Geographic, pathogenic, and epidemiologic differences between NA and SA EEEV have prompted exploration of their genetic diversity and evolutionary history. However, a lack of corresponding sequence data had previously limited a robust comparison. By expanding the length and number of available EEEV sequences, we produced an equal platform upon which to compare and contrast the evolutionary patterns of NA and SA EEEV and to compare SA EEEV to the closely related VEEV. Our results emphasized the differences between NA and SA EEEV and provided insights into the extent that this divergence likely reflects extant transmission dynamics.
To explore the evolutionary history of the EEE complex, a Bayesian coalescent analysis was performed. Depending upon the data set used, median estimates of when NA and SA EEEV last shared a common ancestor were approximately 1,600 and 2,300 years ago, with ranges stretching much earlier than previously estimated. Data dominated by SA EEEV produced an earlier range of TMRCAs (1,689 to 4,856 years ago) due to the slower evolutionary rate estimated for these lineages (1.2 × 10⁻⁴ s/n/y), while those dominated by the entire EEE complex or just the NA EEEV yielded more recent TMRCAs (922 to 2,370 years ago) based on their higher evolutionary rate estimates (2.1 × 10⁻⁴ and 2.7 × 10⁻⁴ s/n/y, respectively). While it is unclear why analysis of the entire EEE complex was influenced more by NA EEEV than by SA EEEV, the variation in evolutionary rates among EEEV lineages limits the precision of estimates for divergence events. The stability and uniformity of the slower evolutionary rates of the SA EEEV lineages, as well as their concordance with estimates of other alphaviruses (66), support the earlier estimates of key divergence events.
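The dependence of divergence time on the assumed rate can be made concrete with a back-of-the-envelope calculation. For a fixed genetic distance d between two lineages, a strict-clock approximation gives T = d / (2r), so halving the rate doubles the time. The Python sketch below applies this scaling to values quoted in the text; it is a heuristic check under that simplifying assumption, not a substitute for the Bayesian coalescent analyses, and the function name is hypothetical.

```python
def rescale_tmrca(tmrca_years, rate_used, rate_new):
    """Rescale a divergence time to a different substitution rate.

    Assumes a fixed genetic distance and a single clock rate: T = d / (2 r),
    so T_new = T_old * (rate_used / rate_new).
    """
    return tmrca_years * rate_used / rate_new


# Median NA/SA divergence from the whole-complex analysis (1,598 years at
# 2.1e-4 s/n/y), rescaled to the slower SA EEEV rate (1.2e-4 s/n/y).
rescaled = rescale_tmrca(1598, 2.1e-4, 1.2e-4)
print(round(rescaled))  # about 2,800 years
```

The rescaled value of roughly 2,800 years is close to the 2,866-year median obtained from the SA-dominated data set, which is consistent with the explanation that the slower SA EEEV rate accounts for most of the difference between the two estimates.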
The consistency observed in SA EEEV evolutionary rates suggests long-term adaptation to its ecology and stability in its environment. Nonsynonymous (dN)-to-synonymous (dS) mutation ratios (data not shown) in SA EEEV lineages II and III suggested similar degrees of purifying selection. This may indicate that EEEV has reached a high level of fitness for circulation in South and Central America, thus stabilizing its evolutionary rates. Although still dominated by purifying selection, higher dN/dS ratios were observed for NA EEEV, with that of the pre-1977 group exceeding that of the post-1977 group. This pattern is consistent with progressive adaptation of EEEV to its transmission cycle in North America, possibly reflecting its relatively recent introduction or anthropogenic changes in its habitat. However, a decline in the dN/dS ratios was also associated with increasing evolutionary rates, suggesting that positive selection is an unlikely driving force behind this rate change.
An alternative explanation for the apparent increase in the EEEV evolutionary rate in North America is genetic drift. Recent studies have focused on NA EEEV transmission in the northeastern United States and provide evidence for episodic overwintering, regionally independent evolution, and epizootic clustering (5, 69). While the precise mechanisms are unclear, viral overwintering in temperate regions could impose focal bottlenecks, and surviving populations may be more subject to rapid genetic drift and seasonal competition with southern strains reintroduced from areas of continuous transmission. In addition, recent work suggests that, in some areas, NA EEEV transmission may deviate from the typical avian-mosquito enzootic cycle to involve ectothermic hosts, such as reptiles and amphibians, and herpetophilic mosquito vectors (14, 16). Changes in vector and host ecology in these southeastern foci could impact the spatial and temporal transmission patterns by affecting virus dispersal and reducing virus populations, thereby providing additional opportunities for founder effects and genetic drift. Because these dynamics could contribute to variability in EEEV evolutionary rates, it may be important to monitor the evolutionary progression of NA EEEV when considering predictive factors of epizootic/epidemic emergence and adaptation to new environments.
The dichotomy between NA and SA EEEV was further underscored by their distinct genetic and phylogenetic patterns. The highly conserved, monophyletic, and temporally dominated relationships among strains of NA EEEV starkly contrast with the highly divergent, polyphyletic, cocirculating, and geographically associated relationships among SA EEEV strains. The maintenance of NA EEEV by highly mobile avian hosts, with their ability to widely disperse the virus, is hypothesized to determine its molecular epidemiologic patterns. Similar patterns are observed with other New World alphaviruses, e.g., western equine encephalitis virus (WEEV), which also uses avian vertebrate hosts throughout its North and South American transmission range (27, 38, 45), and Highlands J virus, which circulates in eastern North America in a manner indistinguishable from EEEV (13). In contrast, arboviruses that utilize less mobile mammalian hosts tend to share a molecular epidemiologic pattern more similar to that observed for SA EEEV. Ground-dwelling mammals, such as rodents and marsupials, lack the ability to physically disperse acutely infecting viruses. Theoretically, this limited host and virus mobility leads to geographically defined transmission foci with independent evolution.
As the closest relative to EEEV, VEEV circulates sympatrically with SA EEEV and provides a prototypical example of the evolutionary pattern generated by an arbovirus that relies primarily on terrestrial mammalian vertebrate hosts for its enzootic maintenance. A comparison between SA EEEV and VEEV subtypes ID and IE revealed similar patterns of genetic divergence characterized by the evolution of multiple subtypes and lineages and highly conserved geographic groupings that lack temporal clustering. Comparable to those observed with VEEV subtypes ID/IE, the geographic scales defining SA EEEV clusters are highly focal, on the order of a few hundred miles or less. This pattern suggests a mode of transmission that limits dispersal of EEEV in SA and is consistent with the use of mammalian vertebrate hosts as reservoirs and amplifiers. In contrast, NA EEEV demonstrates a similar degree of genetic conservation over its entire geographic range, up to thousands of miles, which is consistent with wide dispersal of the virus by avian hosts.
Although VEEV and SA EEEV overlap in their range of transmission and share similar evolutionary profiles, their degree of ecological similarity is unknown. Members of the Culex (Melanoconion) subgenus have been implicated as the primary vectors of both enzootic VEEV (15, 42, 51, 52, 59, 67) and SA EEEV (28, 33, 53, 58) in Central and South America. While these mosquitoes are known to feed on a variety of vertebrates, a primary vertebrate host(s) for SA EEEV has not yet been identified. Field isolations, seroprevalence among wild birds, rodents, marsupials, and reptiles, and experimental data (N. C. Arrigo, unpublished data) indicate that both mammalian and avian species are susceptible to infection (12, 17, 31, 45, 56, 57, 58); however, their involvement in maintaining enzootic transmission of SA EEEV is unclear. Additional ecological and experimental data are needed to implicate a particular type of vertebrate host responsible for the maintenance of SA EEEV.
In the early 1980s, the classifications of numerous arboviruses, including EEEV, were proposed based solely on their antigenic properties (10). Different viruses were delineated by a fourfold or greater difference in antibody cross-reactivity in both directions, i.e., the heterologous versus homologous antibody titers of sera from 2 viruses. A fourfold or greater difference in only one direction designated a subtype, while antigenic varieties were distinguishable only with special serological tests (e.g., kinetic hemagglutination inhibition). According to this definition, all EEEV strains were originally classified as a single virus consisting of two antigenic varieties, NA and SA (11). Later, cross-neutralization testing with representatives from each phylogenetically identified EEEV lineage divided EEEV into 4 antigenic subtypes, despite some relationships with greater than fourfold differences in cross-reactivity in both directions (7).
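The fourfold rule described above can be illustrated with a minimal sketch. It assumes that each direction of cross-reactivity is summarized as the ratio of the homologous to the heterologous antibody titer. The function names, the titer values, and the output labels are hypothetical and chosen only for illustration; actual serological classification involves additional tests and judgment that are not captured here.

```python
def fold_difference(homologous_titer: float, heterologous_titer: float) -> float:
    """Ratio of homologous to heterologous titer for one antiserum (assumed summary)."""
    if homologous_titer <= 0 or heterologous_titer <= 0:
        raise ValueError("titers must be positive")
    return homologous_titer / heterologous_titer


def classify_pair(hom_a: float, het_a: float,
                  hom_b: float, het_b: float,
                  threshold: float = 4.0) -> str:
    """Apply the fourfold rule to two viruses, A and B.

    hom_a: titer of anti-A serum against virus A (homologous)
    het_a: titer of anti-A serum against virus B (heterologous)
    hom_b, het_b: the same for anti-B serum
    """
    # Count the directions in which the difference reaches the threshold.
    directions = sum(
        fold_difference(hom, het) >= threshold
        for hom, het in ((hom_a, het_a), (hom_b, het_b))
    )
    if directions == 2:
        return "distinct viruses"
    if directions == 1:
        return "subtypes"
    # Below threshold in both directions: only special tests could separate them.
    return "varieties or indistinguishable"


# Hypothetical reciprocal titers, for illustration only.
print(classify_pair(640, 40, 320, 80))    # 16-fold and 4-fold differences
print(classify_pair(640, 160, 320, 160))  # 4-fold and 2-fold differences
print(classify_pair(640, 320, 320, 160))  # 2-fold in both directions
```

Under this reading, a pair is separated into different viruses only when the difference reaches the threshold in both directions, which is why some EEEV relationships with greater than fourfold differences in both directions sit uneasily within a single-virus classification.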
The International Committee on Taxonomy of Viruses (ICTV) has more recently revised the definition of a virus species to be a “polythetic class of viruses that constitute a replicating lineage and occupy a particular ecological niche” (24, 54). This definition incorporates the notion of multiple characteristics defining a virus species, including but not limited to genetic and phylogenetic relationships, geographic distribution, differences in ecology and transmission cycles, pathogenicity, morphology, replication patterns, and antigenicity. Genetic diversity resulting in distinct phylogenetic lineages can often reflect differences in ecological niche and evolutionary history; therefore, they often dominate the current classification of novel virus species. For example, the newly discovered Lujo virus (family Arenaviridae) (8) and Bundibugyo ebolavirus (family Filoviridae) (49) were designated novel species primarily based on their nucleotide sequence divergence of at least 21.5% and 32%, respectively, which also corresponded to unique geographic isolation and pathogenic properties.
The ability to analyze genetic relationships has also led to the reconsideration of established Alphavirus taxonomy, resulting in recommendations that have subsequently been accepted by the ICTV. Tonate virus was designated a species unique from Mucambo virus within subtype III of the VEE complex based on 16% nucleotide and 7% amino acid sequence divergence as well as antigenic differences and the use of different reservoir hosts (37). The distinction of the Mayaro virus and Una virus species was also supported by recent molecular epidemiological studies, despite their previous conspecific designations based on antigenic relationships (36). These viruses exhibit 55% nucleotide sequence divergence, and their phylogenetic patterns also suggest differences in the use of reservoir hosts and the occupation of distinct ecological niches. With up to 24% nucleotide and 11% amino acid sequence divergence between lineages of NA and SA EEEV, the genetic and phylogenetic diversities observed in our study were consistent with the examples described above and with the 21% nucleotide and 8% amino acid sequence divergence generally observed among different Alphavirus species of the same antigenic complex (37).
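As a rough illustration of how a percent sequence divergence of this kind can be obtained, the sketch below computes an uncorrected pairwise distance (the proportion of differing sites) between two aligned sequences. This is an assumption made for illustration: this section does not state how the divergence values were calculated, and the function name and toy sequences are hypothetical.

```python
def percent_divergence(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Uncorrected percent divergence between two aligned sequences.

    Positions where either sequence has a gap are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # ignore alignment gaps
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * differences / compared


# Toy aligned fragments, not real EEEV sequences.
print(percent_divergence("ATGCATGC", "ATGCATGA"))  # one difference in eight sites
print(percent_divergence("ATG-AT", "ATGCAA"))      # gap column is skipped
```

The same function applies to aligned amino acid sequences, which is one way nucleotide and amino acid divergence can be compared for the same pair of strains.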
The current ICTV species definition encompasses several characteristics that are applicable to public health programs aimed at prevention and intervention. Perceptions of EEEV often focus on NA strain characteristics, namely, the avian-mosquito transmission cycle, geographic range, highly pathogenic nature resulting in severe human and equine encephalitis, and highly conserved genetic nature. However, the distinct characteristics of SA EEEV are not reflected by this depiction. Importantly, unlike NA EEEV, SA EEEV has little to no association with human disease, despite evidence of human exposure in areas of endemic and epizootic activity (4, 17, 18, 40). Differential replication in lymphoid tissues of mice and differences in interferon induction and sensitivity (3, 26) may contribute to the observed attenuation of SA EEEV, further distinguishing it pathogenically from NA EEEV.
Considering the goal of classification as a means to facilitate the understanding of a virus taxon from multiple perspectives, we recommend designating NA and SA EEEV as separate virus species, given their distinct geographic, epidemiologic, ecologic, pathogenic, genetic, phylogenetic, and evolutionary characteristics. This revision, based on polythetic criteria, would provide a more medically and scientifically accurate representation of the viruses comprising the EEE complex. Reclassification of individual SA EEEV subtypes is not warranted based solely on genetic differences, as the lack of information on potential ecologic differences within South America precludes the evaluation of polythetic criteria. Because NA EEEV strains are considered the prototypes, we propose a revision of all SA EEEV strains to a new species called Madariaga virus (MADV), based on the location of the earliest strain isolated in 1930 from General Madariaga Partido, Buenos Aires Province, Argentina (39, 41).
We thank Robert Tesh and Hilda Guzman of the World Reference Center for Emerging Viruses and Arboviruses (UTMB) for providing many of the EEEV isolates used in this study and Sara M. Volk for valuable theoretical and technical expertise in the evolutionary analysis.
N.C.A. was supported by the TO1/CCT622892 Fellowship Training Grant in Vector-Borne Infectious Diseases from the Centers for Disease Control and Prevention and by Biodefense Training Program NIH T32 training grant AI-060549. A.P.A. was supported by the James W. McLaughlin Fellowship Fund. This work was supported by the John S. Dunn Research Foundation and NIH grant U54 AI-057156 from the National Institute of Allergy and Infectious Diseases to S.C.W. through the Western Regional Center of Excellence for Biodefense and Emerging Infectious Diseases Research.
Published ahead of print on 4 November 2009.
†Supplemental material for this article may be found at http://jvi.asm.org/.