Partial E1 envelope glycoprotein gene sequences and complete structural polyprotein sequences were used to compare divergence and construct phylogenetic trees for the genus Alphavirus. Tree topologies indicated that the mosquito-borne alphaviruses could have arisen in either the Old or the New World, with at least two transoceanic introductions to account for their current distribution. The time frame for alphavirus diversification could not be estimated because maximum-likelihood analyses indicated that the nucleotide substitution rate varies considerably across sites within the genome. While most trees showed evolutionary relationships consistent with current antigenic complexes and species, several changes to the current classification are proposed. The recently identified fish alphaviruses salmon pancreas disease virus and sleeping disease virus appear to be variants or subtypes of a new alphavirus species. Southern elephant seal virus is also a new alphavirus distantly related to all of the others analyzed. Tonate virus and Venezuelan equine encephalitis virus strain 78V3531 also appear to be distinct alphavirus species based on genetic, antigenic, and ecological criteria. Trocara virus, isolated from mosquitoes in Brazil and Peru, also represents a new species and probably a new alphavirus complex.
The family Togaviridae comprises two genera, Alphavirus and Rubivirus (77). The genus Alphavirus contains at least 24 species (77) that can be classified antigenically into seven complexes (4) (Table 1). As a genus, the alphaviruses are widely distributed throughout the world, inhabiting all of the continents except Antarctica. The geographic distributions of individual species are restricted because of specific ecological conditions and reservoir host and vector restrictions (22, 77).
Members of the genus Alphavirus are typically maintained in natural cycles involving transmission by an arthropod vector among susceptible vertebrate hosts (60). Virus-host interactions may be highly specific, and sometimes only a single mosquito species is utilized as the principal vector, as has been reported for many Venezuelan equine encephalitis (VEE) complex viruses (74). These specific virus-vector interactions may limit the distribution of many alphaviruses. Possible exceptions to the presumption that all alphaviruses have an arthropod host are the newly identified salmonid viruses salmon pancreas disease virus (SPDV) (81) and sleeping disease virus (SDV) (69). These viruses have been isolated only from diseased Atlantic salmon and rainbow trout, respectively, and are not known to have arthropod vectors. It has been postulated that the sea louse, Lepeophtheirus salmonis, may play a role in the transmission of SPDV, but no evidence to support this hypothesis has been generated. Parasitic lice have been implicated in the transmission of the newly discovered southern elephant seal alphavirus (SESV) from the coast of Australia. SESV has been grouped genetically with the Semliki Forest virus complex (32).
The members of the genus Alphavirus cause a wide range of diseases in humans and animals. Many Old World viruses, including the Ross River, Barmah Forest, Mayaro, o'nyong-nyong, chikungunya, and Sindbis viruses, cause an arthralgia syndrome (47, 52), while encephalitis is caused by VEEV, eastern equine encephalitis virus (EEEV), and western equine encephalitis virus (WEEV) in the New World. In addition to causing febrile illness in equines, pigs, and calves, Getah virus has been reported to potentially induce abortion or stillbirth in pregnant sows (20, 44). Highlands J virus causes dramatic decreases in egg production and mortality in domestic birds (13, 70). Seroprevalence data on many of the remaining alphaviruses indicate that they infect people and/or domestic animals but have unknown clinical manifestations or cause only a mild febrile illness (1, 29–31, 41, 63, 65). Interestingly, alphaviruses causing similar disease symptoms are maintained under diverse ecological conditions and can have a widespread distribution. For example, Mayaro virus is limited geographically to Latin America (46, 64) while o'nyong-nyong virus has never been identified outside of Africa (21, 33, 48). These two viruses cause almost identical clinical signs and symptoms. This unusual epidemiological pattern seen among the various alphaviruses presents some intriguing questions regarding evolutionary relationships of the members of the Alphavirus genus, including the origins of the genus and subsequent geographic expansion of the genus and species.
The alphaviruses are small, spherical, enveloped viruses with a genome consisting of a single strand of positive-sense RNA (22, 55, 60). The nonstructural protein genes are encoded in the 5′ two-thirds of the genome, while the structural proteins are translated from a subgenomic mRNA colinear with the 3′ one-third of the genome (Fig. 1). Replication occurs within the cytoplasm, and virions mature by budding through the plasma membrane, where virus-encoded surface glycoproteins E2 and E1 are assimilated. These two glycoproteins are the targets of numerous serologic reactions and tests (e.g., neutralization and hemagglutination inhibition); the alphaviruses show various degrees of antigenic cross-reactivity in these reactions, forming the basis for the seven antigenic complexes, 24 species, and many subtypes and varieties of alphaviruses defined previously (4, 23, 62). The E2 protein is the site of most neutralizing epitopes, while the E1 protein contains more conserved, cross-reactive epitopes.
Previous studies of the evolutionary relationships among alphaviruses have relied on phylogenetic analyses of either partial or complete sequences from one or more of the seven protein genes (35, 73, 80). Overall, these studies have produced relationships in agreement with the antigenically based approaches used traditionally for alphavirus classification (4, 7, 77). For example, viruses in the VEE (49, 76), EEE (2, 75), and WEE antigenic complexes (80) have each been shown to be monophyletic (WEE complex for the envelope glycoproteins only). Additionally, phylogenetic studies have shown that most of the New World viruses in the WEE antigenic complex (WEEV, Highlands J virus, Fort Morgan virus, and Buggy Creek virus [a variant of Fort Morgan virus]) are descendants of an ancestral alphavirus that resulted from a recombination event; recombination combined the E2 and E1 envelope protein genes from a Sindbis-like virus and the remaining genes from an EEEV-like ancestor (19, 80). The Old World serogroups have been studied in less detail; the chikungunya, o'nyong-nyong, Semliki Forest, and Ross River viruses, belonging to the Semliki Forest virus complex, are monophyletic in some analyses and paraphyletic in others, with Middelburg virus falling into this group in some trees (73, 79).
To provide a more complete understanding of the evolutionary history and mechanisms of emergence of alphaviruses, we conducted a comprehensive examination of the evolution of the genus by sequencing most of the E1 envelope glycoprotein gene for representatives of all alphavirus species (77), as well as major antigenic subtypes and varieties (4). Using phylogenetic methods, these sequences were used to reexamine the evolutionary history and systematics of the genus.
The virus strains used in this study are listed in Table 1. Viruses were diluted and passaged on BHK-21 or Vero 76 cells at a low multiplicity of infection. After approximately 75% of the cells exhibited cytopathic effects, the virus present in the supernatant was concentrated by precipitation with 7% polyethylene glycol and 2.3% NaCl (24). The virus pellet was resuspended in 150 μl of TEN (Tris-EDTA-NaCl) buffer, and 2 ml of Trizol LS (Gibco BRL, Bethesda, Md.) was added in preparation for RNA extraction in accordance with the manufacturer's protocol.
RNA was extracted from one-half of each virus-Trizol suspension in accordance with the manufacturer's protocols as described previously (8). cDNAs were synthesized from the RNA by using a poly(T) oligonucleotide primer (T25V-Mlu; 5′-TTACGAATTCACGCGT25V-3′ or T19V). PCR amplification was performed on the first-strand cDNA by using the poly(T) primer and a forward primer designated α10247A (5′-TACCCNTTYATGTGGGG-3′). This forward primer anneals to a highly conserved sequence that encodes the putative fusion domain of the E1 protein, and this conservation allowed us to amplify most of the E1 glycoprotein gene from a wide variety of highly divergent alphaviruses. Amplification of the carboxy portion of the E1 gene and the 3′ noncoding region utilized the following parameters: 30 cycles of denaturation at 95°C for 30 s, primer annealing at 49°C for 30 s, and extension at 72°C for 3 min. A 10-min final extension was used to ensure complete product synthesis. For the virus designated Ag80-663, for which the above primer pairs were unsuccessful, the T25V-Mlu primer was used in conjunction with primer E/V7514(+) (5′-ACYCTCTACGGCTRACCTRA-3′) to amplify the entire 26S subgenomic message region. Sequencing was performed on this strain by gene walking using sequentially designed primers (see Table 2).
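Both primers quoted above contain degenerate positions written in standard IUPAC ambiguity codes (N, Y, R). The following is a minimal sketch, not part of the original protocol, that expands such a primer into the pool of unambiguous sequences it represents. The function names `expand_primer` and `degeneracy` are hypothetical and introduced only for illustration.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def degeneracy(primer):
    """Number of distinct sequences in the degenerate primer pool."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


# Primer sequences exactly as quoted in the text (5' to 3').
FORWARD = "TACCCNTTYATGTGGGG"        # alpha10247A
ALTERNATE = "ACYCTCTACGGCTRACCTRA"   # E/V7514(+)

if __name__ == "__main__":
    for name, seq in (("alpha10247A", FORWARD), ("E/V7514(+)", ALTERNATE)):
        print(name, degeneracy(seq))
```

Each primer has three degenerate positions at most (one N and one Y in α10247A; one Y and two R in E/V7514(+)), so each pool is small, which is consistent with the text's description of the annealing site as highly conserved.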
PCR products ranging in size from 1.1 to 1.8 kb were isolated from 1% agarose gels. The cleaned DNA fragments were either sequenced directly or cloned into pBluescript II SK (Stratagene, La Jolla, Calif.) that had been linearized with SmaI. Restriction enzyme SmaI was included in the ligation reaction to reduce the religation of the vector upon itself. White bacterial colonies were screened for plasmids containing inserts of the correct size. Two selected clones were sequenced by using plasmid-specific T7 promoter and M13 reverse primers. Additional internal sequence was obtained by using virus-specific primers as indicated in Table 2. Sequencing was performed by using an Applied Biosystems (Foster City, Calif.) Prism 377 sequencer and BigDye automated DNA sequencing kit. Deduced amino acid sequences were aligned with those of other alphaviruses sequenced previously (Table 1) by using the PILEUP program in the University of Wisconsin Genetics Computer Group package (10) with manual refinements to preserve codon homology. Pairwise comparisons were performed with PAUP (61) and the GAP program within the Genetics Computer Group package.
Phylogenetic analyses were performed on both the nucleotide and translated amino acid sequences for the E1 gene or complete 26S sequence by using the PAUP program (61). The heuristic algorithm was employed for the maximum-parsimony analysis. The neighbor-joining distance matrix algorithm was used with the Kimura 2 parameter and F84 corrections. Bootstrap resampling to determine confidence values on the groupings within trees was performed with 1,000 replicates (12). For the generation of a maximum-likelihood model for alphavirus sequence evolution, closely related sequences of many different strains of EEEV (2), VEEV (49), WEEV (80), Highlands J virus (8), and chikungunya virus (48) were analyzed to avoid the effects of superimposed nucleotide substitutions. Tree topologies determined previously were used to estimate the transition-transversion ratios and gamma values for the unevenness in the distribution of substitutions across nucleotide sites using the PAUP program (61). These estimates were then used to search for maximum-likelihood trees for the Alphavirus genus.
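The Kimura 2-parameter correction mentioned above converts the observed proportions of transitions (P) and transversions (Q) between two aligned sequences into an estimated distance, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The sketch below illustrates that standard formula only; it is not the PAUP implementation, and the function name and the handling of gaps and ambiguous bases (skipped) are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (an assumption of this sketch). Returns math.inf when the sequences are
    too divergent for the correction to be defined.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    term1 = 1.0 - 2.0 * p - q
    term2 = 1.0 - 2.0 * q
    if term1 <= 0.0 or term2 <= 0.0:
        return math.inf           # substitutions are saturated
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)
```

The corrected distance is always at least as large as the raw proportion of differing sites, because it estimates substitutions hidden by multiple changes at the same position.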
Nucleotide sequence accession numbers. The GenBank accession numbers for the sequences reported in this paper are AF398374 to AF398393.
The 3′ noncoding region (NCR) and E1 envelope glycoprotein gene were selected for genetic analyses to take advantage of conserved sequences described previously for primer annealing and PCR amplification (8). The E1 region has also been shown to be phylogenetically informative. Alphavirus cDNAs were synthesized by using an oligo(dT) primer containing a 3′ clamp (T25V-Mlu). By using this primer and a primer from a conserved region of the E1 gene (α10247A), nearly all alphavirus genomes were amplified. The VEE complex virus Ag80-663 (subtype VI) could not be amplified with the α10247A primer, but the entire 26S region of this strain was amplified by using T25V-Mlu and E/V7514(+). This was the only alphavirus that required alternative amplification conditions (see Materials and Methods). An analysis of the genome at the α10247A primer binding site revealed that the primer site was a highly conserved region across the entire Alphavirus genus and was an exact match in strain Ag80-663, making it unclear why this virus was unable to be amplified with this primer.
All of the amplicons generated ranged in size from 1.1 to 1.8 kb (Table 3), depending upon the length of the 3′ NCR. The shortest 3′ NCRs belonged to the 78V3531 and Trocara viruses, and Bebaru virus contained the longest (Table 3). Some of the amplicons (Getah, Una, Babanki, and Trocara viruses) did not contain the conserved alphavirus termini (5′-ATTTTGTTTTTAATATTTC-3′) (45), indicating that perhaps the entire 3′ NCR was not amplified. Trocara virus was unique in that the α10247A primer used in the PCR amplification was present on both ends of the amplicon, suggesting a 3′ NCR of only 34 nucleotides. The use of a longer poly(T) oligonucleotide primer increased the likelihood of obtaining the entire 3′ NCR (T25 compared to T19) but was still unsuccessful in some instances, including that of Trocara virus. However, based on the finding of George and Raju (16) that the classical 19-nt conserved terminal element is not essential for replication or virus maintenance, it is possible that some of these viral sequences that appear to be incomplete (because they lack the entire conserved 3′ terminus) are actually complete. While there was considerable variability in the 3′ NCR sequences, the E1 gene, with the exception of the five or six 3′ terminal codons, was more conserved among all of the alphaviruses. Most viruses were sequenced directly by using the α10247A and α10552(+) primers. However, several required additional primers, and the Kyzylagach strain, a subtype of Sindbis virus, could not be sequenced with the universal internal primer and required virus-specific primers (Table 2). Occasionally, there were sequence differences between isolates sequenced in our laboratory and those in the GenBank database. The Mucambo (VEE subtype IIIA), Tonate (VEE subtype IIIB), 71D1252 (VEE subtype IIIC), and Ag80-663 (VEE subtype VI) viruses had differences, typically in the third codon and/or synonymous positions, that were most likely due to differences in passage history. The sequence analyses we performed utilized the isolates with the lowest passage histories available, which were generally lower than those of the isolates used to generate sequences already in the GenBank database.
To determine the extent of relatedness of all established members of the genus Alphavirus, pairwise comparisons were performed by using the nucleotide and deduced amino acid sequences in the E1 gene coding region (Table 4). The C-terminal 5 to 10 amino acids and their codons were omitted from the analyses because they were highly divergent and could not be aligned reliably (many alignment scores for this fragment did not differ statistically significantly from jumbled alignments). In general, the percentage of sequence divergence correlated inversely with serologic cross-reactivity (4). Viruses within a given antigenic serocomplex were usually genetically more closely related than viruses in different complexes. Those within a given antigenic complex typically had a nucleotide sequence divergence of less than 43% and an amino acid sequence divergence of less than 44%, while interserocomplex comparisons usually exceeded 38 and 40%, respectively. The Middelburg virus complex was the least divergent of the antigenic complexes, with only 33% nucleotide and 31% amino acid sequence divergence compared with some Semliki Forest virus complex viruses, such as Getah virus. In contrast, Trocara virus exhibited considerable sequence divergence versus all other alphaviruses, with at least 43% nucleotide and 47% amino acid sequence divergence. These data support the previous conclusion that Trocara virus probably represents a new antigenic complex in the genus Alphavirus (65).
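The pairwise comparisons described above reduce, in their simplest form, to the percentage of aligned positions at which two sequences differ. The sketch below shows that uncorrected calculation for nucleotide or amino acid alignments. It is an illustration only, not the PAUP or GAP procedure used in the study; the function names are hypothetical, and excluding gapped positions is an assumption of this sketch.

```python
def percent_divergence(seq1, seq2, gap="-"):
    """Uncorrected percent divergence between two aligned sequences.

    Positions where either sequence carries a gap are excluded from both
    the numerator and the denominator (an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differences += 1

    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * differences / compared


def divergence_table(aligned):
    """Percent divergence for every unordered pair in a {name: sequence} dict."""
    names = sorted(aligned)
    return {
        (a, b): percent_divergence(aligned[a], aligned[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
```

Highly divergent, unalignable regions such as the C-terminal codons mentioned in the text would be trimmed from every sequence before a table like this is computed.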
Within antigenic complexes, different Alphavirus species generally showed at least 21% nucleotide and 8% amino acid sequence divergence. One exception was Everglades virus (EVEV), which differed from some strains of VEEV by only 10% at the nucleotide level and 3% at the amino acid level. Me Tri virus, which is associated with central nervous system disease in Vietnamese children and is considered a new alphavirus on the basis of antigenic tests (18), differed by only 2% in its nucleotide sequence and 1% in its amino acid sequence from Semliki Forest virus. Different subtypes of a given alphavirus had as little as 3% nucleotide and 1% amino acid sequence difference (Sindbis virus) and as much as 25 and 13%, respectively (Ross River and subtype Sagiyama viruses). The maximum divergence between subtypes was found in VEE subtype IF (78V3531), which differed from all other VEE complex viruses by at least 22% at the nucleotide sequence level and 19% at the amino acid sequence level. As in previous analyses (49), this virus grouped with VEE complex subtype VI (with the Ag80-663 strain, a species considered distinct from VEEV) but was still quite distantly related.
The sequences of the two fish viruses were the most distinct genetically, with at least 49% nucleotide and 59% amino acid sequence divergence versus all other alphaviruses. SDV and SPDV had only 5% nucleotide and 2% amino acid sequence differences, values comparable to those of subtypes of other alphaviruses.
Initially, phylogenetic analyses were performed on the E1 gene region by using the maximum-parsimony and neighbor-joining methods. These methods produced trees with similar topologies, differing primarily in the relationships among some serocomplexes and within the Semliki Forest complex. Viruses with inconsistent placement included the Barmah Forest, Middelburg, Mayaro, Una, and Trocara viruses. Neighbor joining grouped the Barmah Forest and Ndumu viruses at the base of the Semliki Forest clade and placed Middelburg virus within the Semliki Forest group. In neighbor-joining trees, Trocara virus was basal to the WEE complex, which grouped with the EEEV-VEEV clade (Fig. 2). Maximum parsimony placed Middelburg virus outside of the Semliki Forest virus clade without transversion weighting and placed Trocara virus at the base of a nonfish alphavirus clade (not shown). The placement of the Cabassou and Pixuna viruses within the VEE complex was also inconsistent when different methods were used. In general, analyses using amino acid sequences generated results similar to those described above, with diminished resolution within some terminal groupings due to loss of informative, synonymous nucleotides. When all of the methods were used, midpoint rooting placed the fish virus clade at the base of the alphavirus tree, indicating that these viruses probably diverged from the mosquito-borne alphaviruses very early in the evolution of the genus.
In an attempt to resolve the topological discrepancies described above, the maximum-likelihood method was used, based on a sequence evolution model derived from previously published detailed analyses of many strains from a given Alphavirus species or complex (2, 8, 43, 48, 49). Maximum-likelihood analysis of these data sets provided a mean estimate of 4.5 for the Ti/Tv ratio and a mean gamma value of 0.24. When these values were used, the topologies generated by the maximum-parsimony and neighbor-joining methods were evaluated by using a Kishino-Hasegawa likelihood test (61). The topology generated by the neighbor-joining method, shown in Fig. 2, was significantly more likely (maximum likelihood, P < 0.03) than the topologies generated by the maximum-parsimony method.
In an attempt to resolve further some of the discrepancies in the tree topologies generated from partial E1 protein gene sequences, we analyzed the complete nonstructural and structural protein gene sequences available for all alphavirus species by using the methods described above. Included in this analysis was the partial structural polyprotein sequence of SESV (32). Individual genes were also analyzed, and no evidence of recombination (topology differences supported by bootstrap values of 80% or greater), aside from the recombinant WEEV-Highlands J virus-Fort Morgan virus group described previously (19, 80), was detected. Structural polyprotein gene trees were consistently more robust than those constructed from nonstructural genes and also included more alphavirus representatives. Therefore, we focused on the structural polyprotein gene analyses.
Trees generated by using both the maximum-parsimony and neighbor-joining methods had identical topologies, except for the placement of Middelburg virus, which fell within the Semliki Forest complex when the neighbor-joining method was used and was basal to the Semliki Forest virus complex when the maximum-parsimony method was used. The neighbor-joining tree generated by using amino acid sequences, which had higher bootstrap values than all others, is shown in Fig. 3. Because this tree had robust groupings for the VEE complex, we applied the VEE topology to the partial E1 protein gene sequence analysis (Fig. 2) and compared the maximum-likelihood values generated for both the E1 and structural polyprotein topologies. The likelihood ratios indicated that the neighbor-joining topology generated by using structural polyprotein sequences was as likely as the original topology generated with E1 nucleotide sequences. The original E1 topology, which placed Cabassou virus at the base of the VEE clade, with Pixuna virus a sister group to VEEV and EVEV, was not significantly more likely when our sequence evolution model was used (P > 0.3). Therefore, we believe that Fig. 2 represents the most accurate topology available for the genus Alphavirus. The fish viruses were even more clearly the outliers in the complete structural polyprotein analyses than in the trees generated from partial E1 sequences, providing stronger evidence that they would represent the basal clade in a rooted tree, as indicated in the midpoint rooted trees (Fig. 2). SESV also appeared to be quite distinct genetically from all of the mosquito-borne alphaviruses, with an amino acid sequence divergence level equivalent to that of a distinct antigenic complex. However, the distance of the SESV branch could be somewhat misleading if the missing regions (part of the capsid and E1 protein sequences) are less divergent than the included sequence regions (E3, E2, and 6K).
Previous analyses of Alphavirus evolution suggested that the genus originated in the New World from an insect-borne plant virus (35, 73, 79). The present analyses are also consistent with this hypothesis. Excluding the fish and seal viruses, a New World origin would require at least three transoceanic introductions between the hemispheres: (i) transport of the ancestor of the Barmah Forest-Ndumu-Middelburg-Semliki Forest virus complexes from the New World to the Old World, (ii) transport of the ancestor of the Sindbis and Whataroa viruses to the Old World, and (iii) transport of the ancestor of the Mayaro and Una viruses from the Old World to the New World (Fig. 2). However, an Old World origin is also consistent with three transoceanic introductions between the hemispheres: (i) transport of the ancestor of the Trocara virus-WEE-EEE-VEE complexes from the Old World to the New World, (ii) transport of the ancestor of the Sindbis and Whataroa viruses to the Old World, and (iii) transport of the ancestor of the Mayaro and Una viruses from the Old World to the New World (Fig. 2). These equally parsimonious scenarios do not favor either hypothesis over the other. An ancestral alphavirus presumably adapted to fish in the distant past to form the SDV-SPDV lineage. The possible transmission of SESV by insects (lice) strengthens the hypothesis that alphaviruses arose as insect-borne or insect viruses.
Previous estimates placed the origin of the alphaviruses several thousand years ago (73, 79). However, the methods employed previously relied on the assumption of an equal rate of substitutions across nucleotide or amino acid positions in the alphavirus genome. Our data clearly indicate that this assumption is invalid; all estimates of the distribution of nucleotide changes across sites are far from uniform, with an average gamma value of only 0.24 for those viruses examined in detail (range, 0.05 to 0.31). This nonuniformity in nucleotide substitutions across sites, combined with the saturation of nucleotide changes in many positions, indicates that estimates on the order of thousands of years ago for the alphavirus ancestor are far too recent. An accurate time estimate for the alphavirus progenitor may be impossible due to these factors. Another example of the problems with estimating internal branch lengths is illustrated by our analysis of the recombination event between EEEV- and Sindbis virus-like ancestors leading to the WEEV-Fort Morgan virus-Highlands J virus group (19, 80). The interior branch lengths produced with most of the phylogenetic methods yielded different horizontal positions for the internal branches shown previously to represent the recombinant ancestors (80) (Fig. 2). The fact that these ancestors did not occur at the same horizontal position (the dashed line in Fig. 2 cannot be drawn vertically) indicates error in the internal branch lengths of either the EEE or the WEE complex clade or both clades.
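To illustrate why a gamma shape value of 0.24 implies strongly uneven rates, the sketch below draws relative per-site rates from a gamma distribution whose mean is fixed at 1 and compares several shape values. This is a simulation for intuition only, using Python's standard library rather than the PAUP estimates reported here; the function names, the number of simulated sites, and the cutoff of 0.1 for a "slow" site are all assumptions.

```python
import random
import statistics


def sample_site_rates(alpha, n_sites, seed=0):
    """Draw relative substitution rates for n_sites from a gamma distribution.

    The scale is set to 1/alpha so that the mean rate is 1 regardless of
    the shape parameter alpha.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]


def summarize_rates(alpha, n_sites=100_000, slow_cutoff=0.1):
    """Mean, median, and fraction of slow sites for a given gamma shape."""
    rates = sample_site_rates(alpha, n_sites)
    return {
        "mean": statistics.fmean(rates),
        "median": statistics.median(rates),
        "frac_slow": sum(r < slow_cutoff for r in rates) / n_sites,
    }


if __name__ == "__main__":
    # 0.24 is the mean shape value reported in the text; 1.0 and 5.0 are
    # included only for contrast.
    for alpha in (0.24, 1.0, 5.0):
        print(alpha, summarize_rates(alpha))
```

With a small shape value, the median rate falls well below the mean and a large share of sites are nearly invariant, while a minority change many times faster than average. Those fast sites saturate quickly, which is the reason given above for distrusting divergence-time estimates that assume equal rates across sites.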
Because homologous sequences for the structural proteins cannot be identified in viruses outside the genus Alphavirus, even in rubella virus, which comprises the other genus (Rubivirus) in the family Togaviridae, our trees could not be rooted by using an outgroup. If midpoint rooting is used, the fish viruses are consistently placed at the base of our alphavirus trees; this rooting relies on the assumption of a constant rate of evolution across different lineages in the tree. The WEE complex recombination example described above implies that this assumption is not completely correct and suggests that an unrooted tree is the most accurate representation of the genus at this time.
Previous studies of Alphavirus diversification have emphasized host switching events and geographic introductions in the evolution of the genus (2, 48, 49, 73, 80). Examination of the complete Alphavirus phylogeny confirms the importance of these mechanisms. The Alphavirus phylogenies also show numerous examples of host switching events, such as the presumed introduction of EEEV into North America, accompanied by switching from Culex to Culiseta mosquito vectors (73). EVEV was presumably introduced into Florida from Central or South America and adapted to Culex cedecei, which occurs only in North America. Chikungunya virus is believed to have originated in East Africa in a nonhuman primate-sylvatic Aedes mosquito transmission cycle and later was introduced into Asia along with the urban vector Aedes aegypti (48). O'nyong-nyong virus is believed to have evolved from a chikungunya-like virus that adapted to Anopheles mosquito vectors, a unique trait among alphaviruses (48).
The diversity exhibited by alphavirus groups may be influenced strongly by host mobility. Viruses that utilize reservoir hosts with limited mobility, such as small mammals, tend to be quite diverse and have nonoverlapping distributions. The best examples are the VEE complex viruses, which use primarily rodent hosts and Culex (Melanoconion) mosquito vectors with a limited flight range. VEE complex viruses occur nearly throughout the neotropics and subtropics, but the distributions of the various subtypes are discrete, for the most part. A similar epidemiological phenomenon is seen among the isolates of Ross River virus from Australia (37). Viruses that use birds as their reservoir hosts, such as Sindbis virus, EEEV, and WEEV in North America, are less diverse, and each variant or topotype tends to occupy a greater geographic range (37, 54). Host mobility presumably limits virus diversity by preventing geographic isolation and allopatric divergence and by favoring competitive exclusion of closely related viruses that are mixed over large geographic ranges.
Initially, Alphavirus classification was defined by the Subcommittee on Interrelationships Among Catalogued Arboviruses (SIRACA) of the American Committee on Arthropod-Borne Viruses, which relied completely on antigenic cross-reactivity in tests such as hemagglutination inhibition, complement fixation, and neutralization (4, 7). These criteria identified seven antigenic complexes of alphaviruses that contained members displaying greater cross-reactivity to each other than to members of other complexes. Different Alphavirus species were defined as viruses with fourfold or greater differences in cross-reactivity in both directions (one virus reacted against antibody from a second, and the second virus reacted against antibody produced against the first) compared to homologous (a given virus reacted against antibody produced against itself) antibody-antigen reactions. Subtypes were considered viruses with fourfold differences in one direction only, while antigenic varieties were distinguishable only with special tests like hemagglutination inhibition or monoclonal antibody assays (4, 7).
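The SIRACA rule described above can be written as a small decision function. The sketch below is a simplified reading of the text, not an official algorithm: it assumes titers are given as reciprocal dilutions, treats "fourfold" as "fourfold or greater" for subtypes as well as species, and ignores the special tests used to separate antigenic varieties. The function and parameter names are hypothetical.

```python
def fold_difference(homologous, heterologous):
    """Fold reduction of a heterologous titer relative to the homologous one.

    Titers are reciprocal dilutions (for example, 640 for a 1:640 endpoint).
    A heterologous titer of zero (no reaction) is treated as an unbounded
    difference.
    """
    if homologous <= 0:
        raise ValueError("homologous titer must be positive")
    if heterologous <= 0:
        return float("inf")
    return homologous / heterologous


def classify_pair(anti_a_vs_a, anti_a_vs_b, anti_b_vs_b, anti_b_vs_a,
                  threshold=4.0):
    """Classify viruses A and B from a two-way cross-reactivity test.

    anti_a_vs_a: antibody raised against A tested on A (homologous)
    anti_a_vs_b: antibody raised against A tested on B (heterologous)
    anti_b_vs_b: antibody raised against B tested on B (homologous)
    anti_b_vs_a: antibody raised against B tested on A (heterologous)
    """
    differs_with_anti_a = fold_difference(anti_a_vs_a, anti_a_vs_b) >= threshold
    differs_with_anti_b = fold_difference(anti_b_vs_b, anti_b_vs_a) >= threshold

    if differs_with_anti_a and differs_with_anti_b:
        return "distinct species"        # difference in both directions
    if differs_with_anti_a or differs_with_anti_b:
        return "subtype"                 # difference in one direction only
    return "not distinguished by this test"
```

For example, `classify_pair(640, 40, 1280, 160)` involves 16-fold and 8-fold reductions and is classified as distinct species, whereas a pair that differs fourfold with only one of the two antisera is classified as a subtype.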
The International Committee on the Taxonomy of Viruses (ICTV) has established taxonomic criteria for alphaviruses and has limited its classification to species within the genus (no complexes or subtypes are defined). Currently, the ICTV defines a virus species as a “polythetic class of viruses that constitute a replicating lineage and occupy a particular ecological niche” (67, 68). This definition includes additional criteria in comparison to the SIRACA classification, but this leads to more subjective interpretation in some cases. For example, EVEV is currently considered a species distinct from VEEV (77) (Table 1), although the SIRACA classification includes it as a subtype of VEEV (4). Phylogenetic studies examining VEEV subtype I viruses in greater detail have shown clearly that EVEV falls within the VEEV subtype IAB/C/D clade (49, 53). A completely natural classification would not include this kind of a paraphyletic taxon and would consider EVEV a variant of VEEV, along with all of the subtype I strains except 78V3531 (Fig. 2). However, EVEV clearly constitutes a replicating lineage (it occurs only in Florida and is genetically distinct based on this distribution) and occupies a particular ecological niche (for example, it uses a mosquito vector different from those of all other VEE complex viruses). Also, EVEV has not been associated with the emergence of epidemics and epizootics like the subtype ID and IE viruses (74). Synonymizing EVEV with VEEV has been previously proposed (27, 39); although justified in many theoretical respects, this would have important practical implications due to biological safety recommendations (66). An additional example of the difficulties in virus classification and taxonomy is the original classification of Barmah Forest virus in the family Bunyaviridae based on antigenic criteria (9, 38). However, subsequent genetic characterization revealed it to be a member of the Alphavirus genus based on virion structure, mode of replication, and nucleic acid and protein sequences.
Despite the fundamental differences between the antigenic and polythetic species definitions, the systematics of the alphaviruses developed on antigenic grounds alone (4) agree remarkably well with those of the ICTV (77). Because the SIRACA classification resolves antigenic subtypes in greater detail, minor genetic changes that have a dramatic effect on antigenicity can lead to the rapid appearance of new taxa. An example is an antigenic subtype of EEEV isolated from a human in Mississippi in 1983 (5). Although this strain met antigenic criteria as a subtype, genetic analyses demonstrated that minor genetic changes resulted in the addition of an N-linked glycosylation site in the E2 protein (78). Although there was no evidence that this genotype persisted beyond 1983, these kinds of antigenic changes could be epidemiologically important. Another example is VEEV, where only one or two amino acid substitutions in the E2 envelope glycoprotein can result in the generation of subtype IC equine-virulent strains from enzootic, equine-avirulent subtype ID progenitors (72). These changes may have dramatic effects on pathogenicity and host range, leading to epizootics. A completely natural classification would not distinguish these subtypes because they are paraphyletic and the epizootic viruses do not appear to constitute ongoing lineages. However, subtyping of VEEV is extremely important for public health purposes and classifications must balance theoretical and practical considerations.
The seven antigenic complexes of alphaviruses (4) appear to accurately reflect clades of viruses that share medically important characteristics. For example, members of the EEE and VEE complexes share encephalitic potential in equines and humans, while the Semliki Forest virus complex viruses generally produce an arthralgic syndrome. The grouping of Barmah Forest virus with the Semliki Forest virus complex viruses is consistent with their sharing this pathogenic trait. The WEE complex includes viruses that produce both arthralgic (Sindbis virus-like clade) and encephalitic (WEEV and Highlands J virus) syndromes. WEEV and Highlands J virus are descendants of a recombinant alphavirus, and their encephalitic potential presumably reflects the genetic contribution (nonstructural proteins, capsid, and 3′ NCR) of the EEEV-like ancestor rather than the Sindbis virus-like glycoprotein genes (19, 80). The only inconsistency of the established Alphavirus complexes with evolutionary relationships is Middelburg virus, which is classified as a separate antigenic complex based on antigenic relationships (4). While there are very few isolates available and the epidemiologic patterns of the virus are unknown, Middelburg virus may be a member of the Semliki Forest virus complex clade (Fig. 2 and 3).
Interestingly, serological characterizations may provide some insight into the relationships of the various complex clades. For example, when monoclonal antibodies generated against the Semliki Forest virus nucleocapsid are used in antibody capture assays, they cross-react with members of the Semliki Forest virus, WEEV, EEEV, Middelburg virus, and Ndumu virus complexes but do not cross-react at all with VEEV or Barmah Forest virus (17). As the nucleocapsid is one of the more conserved virion proteins, this may reflect some ancient relationships among the alphaviruses.
Although our phylogenetic data generally supported the current Alphavirus classification, several discrepancies were noted. (i) Virus strain 78V3531 (VEE subtype IF according to SIRACA) is quite distinct phylogenetically from VEEV, and its closest relative is Ag80-646 (Ag80V). Although its transmission cycle has not been characterized and its niche cannot therefore be evaluated, this virus probably warrants species designation based on the clear distinction of its genetic lineage, its isolation in a part of Brazil not known to be inhabited by other VEEV complex alphaviruses, and its antigenic distinction (3). Unlike VEEV, it is also avirulent for adult mice and is not associated with VEEV outbreaks. (ii) Tonate virus, a member of VEEV complex subtype III, is quite distinct from the other members of subtype III, with at least 16% nucleotide and 7% amino acid sequence divergence (Table 4). In addition to their antigenic differences, the Tonate and Mucambo viruses apparently use different reservoir hosts (birds and small mammals, respectively) (71). They should probably be considered distinct species. The Bijou Bridge strain from western North America, also a bird virus and apparently transmitted by nest bugs (40), is appropriately considered a strain of Tonate virus due to its genetic similarity and similar niche. (iii) Although its transmission cycle remains obscure, Trocara virus also appears to be a new Alphavirus species based on genetic distinctions from all other species (65). The antigenic comparisons suggesting that Trocara virus represents a new antigenic complex are not as comprehensive as our sequence comparisons, and cross-reactions with members of several Alphavirus serocomplexes were very weak. (iv) Me Tri virus, originally reported to be a new Alphavirus based on antigenic criteria, is genetically very close to Semliki Forest virus and does not appear to constitute a separate lineage (although lineage is a rather arbitrary term); its genetic distance from Semliki Forest virus is similar to the distances among other Alphavirus subtypes or strains, and it should probably be considered a subtype or strain of Semliki Forest virus. (v) Sagiyama virus, considered by SIRACA to be a subtype of Getah virus, along with Ross River virus and Bebaru virus (4), and considered a subtype of Ross River virus in the most recent ICTV classification (77), is much more closely related to Getah virus than to Ross River or Bebaru virus. Based on our genetic data alone, the Ross River, Bebaru, and Getah viruses should be retained as distinct Alphavirus species but, as suggested by Shirako and Yamaguchi (57), Sagiyama virus should be considered a subtype of Getah virus. (vi) Kyzylagach virus, which was originally isolated in Azerbaijan and was recently identified in China (36), appears to be one of the most distinct subtypes of Sindbis virus yet identified. The genetic data indicate that it could be classified as either a subtype of Sindbis virus or a distinct species (18% divergence at the nucleotide level and 6 to 8% divergence at the amino acid level). Additionally, of all of the viruses analyzed in this study, this is the only virus that could not be sequenced with the degenerate alphavirus sequencing primers; it required species-specific primers. Because SIN viruses are usually transmitted among avian hosts and maintain a high degree of genetic homogeneity, the fact that Kyzylagach virus exists in a lineage so independent from all other SIN viruses suggests that it could be classified as a distinct species. (vii) SDV and SPDV, although not yet compared antigenically to the alphaviruses (69, 81), also appear to represent a distinct complex based on their sequence divergence. They clearly occupy dramatically different niches and genetic lineages from all remaining alphaviruses, indicating that they are not variants of an established species. However, the very small amount of sequence divergence between the two fish viruses suggests that SDV is really a strain or subtype of the novel Alphavirus species SPDV. (viii) SESV also represents a new Alphavirus species, as reported previously (32). It appears to be quite distinct genetically from all of the mosquito-borne alphaviruses, with the amino acid sequence divergence level of a distinct antigenic complex.
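Several of the proposals above rest on percent nucleotide or amino acid sequence divergence. As a minimal illustration, uncorrected pairwise divergence (the proportion of differing sites between two aligned sequences) can be computed as sketched below. The function name `p_distance`, the gap handling, and the toy sequences are hypothetical assumptions and are not the method or data used in this study.

```python
def p_distance(seq_a: str, seq_b: str, gap_chars: str = "-?") -> float:
    """Uncorrected divergence between two pre-aligned sequences.

    Sites where either sequence has a gap character are skipped.
    Works for nucleotide or amino acid alignments alike.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # ignore alignment gaps and missing data
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites in the alignment")
    return differing / compared


# Toy aligned fragments (invented, not real alphavirus sequence).
frag_1 = "ACGTACGTAC"
frag_2 = "ACGTACGTAA"
print(f"{p_distance(frag_1, frag_2):.1%}")  # 10.0%
```

Uncorrected divergence of this kind underestimates the number of substitutions as sequences become more distant, so it is best read as a descriptive measure rather than an evolutionary distance.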
We thank Robert Tesh, Robert Shope, and Hilda Guzman for providing some of the alphaviruses used in our analyses.
A.M.P. was supported by the James W. McLaughlin Fellowship Fund and NIH T32 Training Grant on Emerging and Reemerging Infectious Diseases AI-07536. A.C.B. was supported by a James L. McLaughlin Infection and Immunity Fellowship and NIH Emerging Tropical Diseases T32 training grant AI-107526. This research was supported by National Institutes of Health grants AI-10984 to Robert Tesh, AI-39800 to S.C.W., and AI-10793 to J.H.S. and E.G.S.