Since the 1990s West Nile virus (WNV) has become an increasingly important public health problem and the cause of outbreaks of neurological disease. Genetic analyses have identified multiple lineages with many studies focusing on lineage 1 due to its emergence in New York in 1999 and its neuroinvasive phenotype. Until recently, viruses in lineage 2 were not thought to be of public health importance due to few outbreaks of disease being associated with viruses in this lineage. However, recent epidemics of lineage 2 in Europe (Greece and Italy) and Russia have shown the increasing importance of this lineage. There are very few genetic studies examining isolates belonging to lineage 2. We have sequenced the full-length genomes of four older lineage 2 WNV isolates, compared them to 12 previously published genomic sequences and examined the evolution of this lineage. Our studies show that this lineage has evolved over the past 300–400 years and appears to correlate with a change from mouse attenuated to virulent phenotype based on previous studies by our group. This evolution mirrors that which is seen in lineage 1 isolates, which have also evolved to a virulent phenotype over the same period of time.
West Nile virus (WNV; Flaviviridae: Flavivirus) is a widely distributed mosquito-borne virus that can cause encephalitic disease in humans and other vertebrates. It is maintained in nature in a mosquito–bird cycle with Culex species mosquitoes being the primary vector. First isolated from a febrile woman in Uganda in 1937 (Smithburn et al., 1940), WNV caused sporadic outbreaks in humans and horses in parts of Africa, Australia and the Middle East with larger outbreaks in Israel (1951–52, 1957 and 1962), France (1962–65) and South Africa (1974 and 1983–84). In the 1990s, there was an increase in outbreaks of WNV with epidemics occurring in Algeria (1994 and 1997), Morocco (1996), Romania (1996), Tunisia (1997), Russia (1999), Israel (1998–2000) and France (2000) (Mackenzie et al., 2004; Murgue et al., 2001). These epidemics were associated with severe human and equine encephalitic disease. In 1999, WNV was first isolated in the Western hemisphere during an outbreak of encephalitis in New York. Since 2000, WNV has spread rapidly across the United States and into Canada, the Caribbean, and Central and South America. Concurrently, there have been several severe epidemics of WNV in Israel, Russia, Tunisia, Hungary and most recently in Greece and Italy (Bagnarelli et al., 2011; Bakonyi et al., 2006; Papa, 2012; Platonov et al., 2011). Like the epidemics in North America, these involved large numbers of human, equine and avian cases with a high incidence of severe encephalitic disease.
WNV is classified into at least five distinct lineages (Mackenzie & Williams, 2009). Lineage 1, the largest and most widespread, is found in Africa, Asia, the Middle East, Europe, Australia (Kunjin subtype) and the Americas (May et al., 2011; Scherret et al., 2001). Lineage 2 isolates are found primarily in sub-Saharan Africa and Madagascar with recent introductions into Europe (Greece, Hungary and Italy) and Russia (Bagnarelli et al., 2011; Bakonyi et al., 2006; Berthet et al., 1997; Papa et al., 2011a; Scherret et al., 2002). Lineage 3 is composed of a single isolate from the Rabensburg region of the Czech Republic (Bakonyi et al., 2005), while lineage 4 isolates are found in the Caucasus region of Russia (Lvov et al., 2004). Lineage 5 isolates have been found in India only (Bondre et al., 2007). Other putative lineages have been described, including a lineage based on a Koutango virus isolate from Africa, another lineage with isolates from Spain, and an additional lineage with the Kunjin virus strain from Sarawak, Malaysia (Mackenzie & Williams, 2009; Scherret et al., 2001, 2002; Vazquez et al., 2010).
While much recent work with WNV has focused primarily on lineage 1 isolates, particularly in the Americas, recent epidemics in Europe, primarily in Greece and Russia, due to lineage 2 isolates have shown that this lineage is of increasing public health importance (Papa et al., 2011b; Platonov et al., 2011). To investigate the evolution of lineage 2 we have determined the genomic sequences of four early lineage 2 isolates; three from Africa (Congo, South Africa and Madagascar) and one from the Mediterranean basin (Cyprus) to complement the published genomic sequences of recent isolates from Europe and phenotypic studies previously undertaken by our group (Beasley et al., 2002, 2004). These new sequence data together with our phenotypic studies have provided significant new insights into the evolution of WNV lineage 2.
The full genome sequence of four lineage 2 isolates was determined in this study. Two isolates are from 1958 [Eyoku from Congo (CON58) and SA AN2842 from South Africa (SA58B)], one from 1968 [Q3574-5 from Cyprus (CYP68)] and one from 1988 [ArMg979 from Madagascar (MAD88)] (GenBank accession numbers: GQ903680, HM147822–HM147824). Details of these isolates are shown in Table 1. As of February 2012, there were a total of 18 published genomic sequences of lineage 2 WNV isolates, including the four isolates described here. The Sarafend and WNFC3 lineage 2 strains were omitted from these studies due to the uncertainty of passage history, isolation location and/or isolation date, leaving 16 lineage 2 strains analysed in these studies.
As with most lineage 2 isolates, the ORFs of the isolates in this study are 10302 nt long (3434 aa) with the exception of CYP68, which has only 10296 nt (3432 aa). SEN90 and UGA37 are the only two lineage 2 isolates that have fewer nucleotides (10290 nt; 3430 aa). CYP68 has amino acid deletions at envelope (E)-200 and NS3-265, while SEN90 and UGA37 lack the amino acids at E-153–156, which results in loss of the E protein glycosylation site. Pairwise comparison shows that nucleotide divergence ranges from 0.20 to 16.90%, while amino acids differ by 0.30–4.30% (Table 2). CYP68 and MAD78 are the most divergent with nucleotide and amino acid sequence differences of 16.90 and 4.30%, respectively, while SEN90 and UGA37 are the most similar with nucleotide and amino acid divergence of 0.20 and 0.30%, respectively.
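The pairwise comparison above can be illustrated with a minimal Python sketch. It assumes an uncorrected percentage difference computed over aligned positions, skipping any column that contains a gap; the method actually used to produce Table 2 is not stated here, so the function names and the gap convention are illustrative assumptions only. A second helper checks that the reported ORF lengths are whole numbers of codons.

```python
def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Uncorrected percentage of differing positions between two aligned sequences.

    Assumption for illustration: columns where either sequence has a gap ('-')
    are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * differing / compared


def orf_codons(orf_length_nt: int) -> int:
    """Number of codons in an ORF; the length must be a multiple of three."""
    if orf_length_nt % 3 != 0:
        raise ValueError("ORF length is not a whole number of codons")
    return orf_length_nt // 3


if __name__ == "__main__":
    # ORF lengths reported in the text and the codon counts they imply.
    for nt in (10302, 10296, 10290):
        print(nt, "nt ->", orf_codons(nt), "codons")
    # Toy aligned sequences (hypothetical, not WNV data).
    print(percent_divergence("ACGTACGTAC", "ACGTACGTAA"))
```

The three ORF lengths correspond to 3434, 3432 and 3430 codons, matching the two-residue deletion in CYP68 and the four-residue deletion in SEN90 and UGA37.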
Five of the 16 lineage 2 isolates (MAD78, CYP68, CON58, UGA37 and SEN90) have a modified E protein glycosylation site (Table 1). The conserved sequence for the glycosylation site amongst all WNV strains is NYS at residues 154–156 in the E protein. CYP68, MAD78 and CON58 have the motif NYP, which ablates glycosylation, and UGA37 and SEN90 each have a deletion of four amino acids (E-153–156). All other lineage 2 viruses examined have the NYS motif at the E protein glycosylation site. Fig. 1 shows a Bayesian phylogenetic tree composed of lineage 2 isolates with the pertinent amino acid substitutions that define each node. The lineage 2 nodes are defined by one to ten amino acid substitutions distributed across the E and non-structural proteins. Several of these residues are exposed on the surface of the E protein including E-26, 153–156, 230, 312 and 313. The node between MAD78 and the other lineage 2 isolates is defined by 90 aa substitutions, illustrating the significant divergence of this strain.
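The effect of the NYS to NYP change can be sketched with the general N-linked glycosylation sequon rule, N-X-S/T, where X is any residue except proline. The function below is a hypothetical helper written for illustration and is not part of the analyses in this study.

```python
def has_nlinked_sequon(tripeptide: str) -> bool:
    """Return True if three residues match the N-linked sequon N-X-[S/T], X != P."""
    if len(tripeptide) != 3:
        raise ValueError("expected exactly three residues")
    first, middle, last = tripeptide.upper()
    return first == "N" and middle != "P" and last in ("S", "T")


if __name__ == "__main__":
    # Motifs at E-154-156 described in the text; '---' stands in for a deletion.
    for motif in ("NYS", "NYP", "---"):
        print(motif, has_nlinked_sequon(motif))
```

Under this rule NYS is a valid sequon, while NYP is not because the third position is neither serine nor threonine; a deleted motif fails trivially.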
The 5′- and 3′-UTRs of the newly sequenced isolates were also aligned and compared. There were no differences seen within the 5′-UTR of any lineage 2 strain. Within the 3′-UTR, there was a 14 nt insertion for both CYP68 and SA58B at position 10441–10454. This is a variable region in the 3′-UTR and contains a variety of deletions in different lineage 1 strains (data not shown). SA58A, SA58B, CYP68 and MAD88 also encoded different single nucleotide insertions within the 3′-UTR. Not all published sequences contained the full-length 5′- and 3′-UTR sequences, so these regions were not included in the phylogenetic analyses.
As of February 2012, there were over 350 complete WNV genome sequences published within GenBank, the majority being highly related lineage 1 isolates from North America. Most of these sequences were not included in these analyses since they would have little effect on the dating of the phylogenetic nodes of interest for this study. Therefore, 84 sequences were chosen to represent all lineages (lineages 1–5) and geographical regions where WNV has been isolated (Table S1, available in JGV Online). All methods of phylogenetic analysis [neighbour-joining (NJ), maximum-likelihood (ML) and Bayesian] used in these studies confirmed the placement of the isolates sequenced within lineage 2 as had been previously determined by partial sequencing studies (Beasley et al., 2002, 2004). With the addition of the new sequences, lineage 2 is composed of four clades (2a–2d) as determined by the genetic diversity between the strains (Fig. 1, Table 2). Clade 2a is composed of the most divergent isolate, MAD78. Clade 2b is composed of SA58B and CYP68, while clade 2c is composed of MAD88. Clade 2d is the largest and is composed of CON58, isolates from Russia (RUSV07), Africa (SA00, SA58A, CAR82, UGA37, SEN90, SA89 and SA01) and Europe (HUN04, ITA11b and GRE10).
Recombination analyses performed using programs within the Datamonkey server (sbp and gard) and rdp3 identified no evidence of recombination within any of the lineage 2 isolates. Selection analyses using programs within the Datamonkey server (fel and rel) identified one site of potential positive selection within the lineage 2 isolates. Residue 312 in the E protein encodes a valine in MAD78, SA00, SA58B and CYP68 and an alanine in all other isolates. E-312 is a surface exposed amino acid in domain III of the E protein. E-L312A has been linked to subtle effects on neutralization by mAbs (Li et al., 2005). This is one of the significant changes identifying the node in Fig. 1, which includes clades 2c and d.
beast analyses were performed to determine the evolutionary rates and dates of the major nodes of the WNV lineages in addition to the nodes within lineage 2. The Bayesian maximum-clade credibility (MCC) tree produced confirms the placement of the WNV isolates within each lineage, clade and cluster as determined previously (Fig. 2) (Beasley et al., 2002; Botha et al., 2008; May et al., 2011). Posterior values for each node are shown in Table 3. The lower posterior values for the nodes of the most recent common ancestor (MRCA) of lineages 2, 3 and 4 are most likely due to the relatively small number of isolates available for each of these lineages.
The mean nucleotide substitution rate for all lineages of WNV is 3.74×10⁻⁴ substitutions per site per year [95% highest posterior density (HPD): 2.21×10⁻⁴ to 5.34×10⁻⁴] and the substitution rate of lineage 2 is 2.73×10⁻⁴ substitutions per site per year (95% HPD: 5.23×10⁻⁵ to 5.44×10⁻⁴). These rates are comparable to previously published nucleotide substitution rates for WNV and other flaviviruses (Baillie et al., 2008; Bryant et al., 2007; Hanada et al., 2004; Jenkins et al., 2002; May et al., 2011).
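To put these rates on a more intuitive scale, the short Python sketch below converts a per-site rate into an expected number of substitutions across the ORF per year. It is a back-of-the-envelope illustration only: it assumes the 10302 nt ORF length reported above and a uniform rate across sites, and the function name is hypothetical.

```python
ORF_LENGTH_NT = 10302  # ORF length reported for most lineage 2 isolates


def expected_substitutions(rate_per_site_per_year: float,
                           sites: int = ORF_LENGTH_NT,
                           years: float = 1.0) -> float:
    """Expected substitutions over `sites` positions in `years` years,
    assuming the same rate at every site (a simplification)."""
    return rate_per_site_per_year * sites * years


if __name__ == "__main__":
    for label, rate in (("all lineages", 3.74e-4), ("lineage 2", 2.73e-4)):
        print(f"{label}: {expected_substitutions(rate):.2f} substitutions per ORF per year")
```

With the mean rates quoted above, this works out to a little under four substitutions per ORF per year for WNV overall and a little under three for lineage 2.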
Lineage 2 emerged from lineages 1 and 5 in approximately 1285AD (95% HPD 789–1600AD), and lineage 5 diverged from lineage 1 in approximately 1553AD (95% HPD 1163–1768AD). Nodes within lineage 1 were not determined as they have been described elsewhere (May et al., 2011) and the primary focus of this study was lineage 2. The divergence dates for the MRCA at the node of each of the eight clades of lineage 2 are shown in Table 3. Overall, the progenitor for lineage 2 occurred around 1646AD (95% HPD 1375–1815AD), while the progenitor for the most recent isolates, ITA11b and GRE10, occurred very recently around 2003AD (95% HPD 1995–2009AD).
With the increasing importance of lineage 2 WNV in recent outbreaks and epidemics in Europe and Russia, our study provides a better understanding of the phylogenetic and evolutionary relationships of WNV lineage 2. We have sequenced the entire genome of four older WNV isolates (CON58, SA58B, CYP68 and MAD88) from Africa and Cyprus isolated between 1958 and 1988, and these are among the most divergent lineage 2 isolates sequenced thus far. As determined in previous studies, these isolates are naturally attenuated for mouse neuroinvasiveness with LD50 values ranging from 125 p.f.u. for SA58B to >10000 p.f.u. for CYP68 (Beasley et al., 2002, 2004) (Table 1). These previous studies carried out in our laboratory and others (Botha et al., 2008) show that beginning with CON58, there is a change from an attenuated to a virulent phenotype in terms of mouse neuroinvasiveness that is associated with genetic clade d and not with chronological year of isolation (see Fig. 1). This modification in mouse neuroinvasive phenotype appears to coincide with the increase of outbreaks and epidemics associated with more recent isolates within lineage 2. The mouse neuroinvasiveness phenotypes of the most recent isolates (HUN04, GRE10 and ITA11b) have not been reported, but can be assumed to be virulent since the latter two have been isolated from humans during large outbreaks and epidemics within those regions (Bagnarelli et al., 2011; Papa et al., 2011a).
As compared to lineage 1, lineage 2 WNV isolates have a smaller geographical range, with isolates being found in Africa with recent introductions into Europe and the Volgograd region of Russia. Thus, the MRCA of WNV probably occurred in Africa and spread to Europe, Russia and the Middle East, most probably through migratory birds. Our phylogenetic analysis suggests that lineage 2 WNV emerged out of Africa on at least three separate occasions (Fig. 2 and Table 3). The first occurred during the 19th century/early 20th century (node F: 95% HPD 1805–1932) when the virus spread to an area, most likely in the Mediterranean region, that eventually led to virus isolation in Cyprus in 1968. The second and third occurrences were during the middle of the 20th century, when the virus emerged in south-western Russia (node J: 95% HPD 1914–1951), and into Europe and was later isolated in recent outbreaks in Hungary, Italy and Greece (node O: 95% HPD 1936–1981). While our studies suggest that WNV has emerged out of Africa on at least three separate occasions, it is possible that there is continual movement of WNV between the two regions each year, but that this movement has only been identified as the virus has become more virulent and/or entered new regions. Clearly, a better understanding of how lineage 2 WNV has moved between Africa and Eurasia would be gained with more genomic sequences of isolates.
As the virus has spread and continued to evolve, it appears to have become more virulent, leading to increased numbers of cases of avian, equine and human disease. This evolutionary pattern is similar to that being seen with lineage 1 isolates, which have also evolved from a naturally attenuated to virulent phenotype over the same 300–400 years. For lineage 2 isolates, the change from the attenuated to virulent phenotype, based on the mouse neuroinvasive phenotype, is characterized by six amino acid substitutions: E-V159I, NS1-L338T, NS2A-A126S, NS3-N421S, NS4B-L20P and NS5-Y254F (Fig. 1). At most of these positions, the amino acids vary greatly between the different lineages (Table 4). At E-159, an isoleucine is found in almost all lineage 1a isolates except for those in cluster 4, which includes the virulent North American isolates and their ancestors (TUN97, ISR98 and HUN03). These isolates have a valine at position E-159. Starting in 2002, an alanine is found at this position, corresponding to the change from the NY99 to the NA/WN02 genotype in North America (Beasley et al., 2003; Davis et al., 2005; Ebel et al., 2004). Lineage 2 clades a and b also contain a valine at this residue, while clades c and d contain an isoleucine. At position 338 in NS1, the attenuated clades of lineage 2 contain a leucine, which is also found in the attenuated lineage 3 and 4 isolates. Virulent lineage 2 strains (clade d) have a threonine at this position. NS3-S421 is a conserved residue in all lineages, except for attenuated lineage 2 clades a–c, which contain an asparagine. The NS5-Y254 substitution to phenylalanine is found only in virulent lineage 2 isolates. This position is located in the methyltransferase (MTase) domain of NS5 proximal to the RNA-binding site (Zhou et al., 2007). Mutating the residue to an alanine has been shown to suppress 2′-O methylation activity (Dong et al., 2008a, b).
The substitution of a tyrosine with a phenylalanine results in the loss of a hydroxyl group. This change could potentially result in increased activity of the MTase, which could aid viral propagation, leading to the increased virulence exhibited by the clades containing the mutation. Finally, one additional amino acid substitution to note is the substitution to a proline at NS3-249, which is found concurrently in the lineage 2 strain GRE10 (H249P) and in lineage 1a, cluster 4 isolates from the Middle East and the Americas (T249P). This substitution has been shown to lead to increased virulence and pathogenesis in American crows and other birds (Brault et al., 2004, 2007; Papa et al., 2011a). Interestingly, the GRE10 strain was isolated during the largest lineage 2 epidemic recorded to date and phylogenetic analysis suggests that this isolate originated during the period 1995–2009 (Fig. 2 and Table 3, node R), which is approximately the same period in which lineage 1a, cluster 4 originated (1990–1996) (May et al., 2011) with the substitution at NS3-249.
WNV lineage 2 isolates pose a continuing threat to Europe and potentially other regions around the world. A better understanding of how these isolates have evolved can provide insights into the future evolution of this lineage, and allow us to better predict how these isolates will lead to future outbreaks and epidemics.
Four WNV strains (Eyoku: Congo 1958, SA AN2842: South Africa 1958, Q3574-5: Cyprus 1968 and ArMg979: Madagascar 1988) were obtained from the World Reference Center for Emerging Viruses and Arboviruses at the University of Texas Medical Branch. Viruses were passaged once in Vero cells and stored at −80 °C. Properties for these viruses are shown in Table 1.
Viral RNA was extracted directly from WNV-infected Vero cell culture supernatants by using the QIAamp viral RNA extraction kit (Qiagen Sciences), as per manufacturer’s instructions. PCR products encompassing the entire ORF of each isolate were obtained using primers designed on the published sequence of lineage 2 isolate B956 (GenBank accession no. AY532665). RT-PCR was performed with the Titan One-tube RT-PCR system (Roche) as per manufacturer’s instructions (primer sequences and PCR conditions available from authors on request). Briefly, RT-PCR was performed in a 50 µl volume containing 5 µl viral RNA, 1 µl of each primer, 10 µl of 5× RT buffer, 1 µl 10 mM dNTPs, 1 µl enzyme mix, 2.5 µl 100 mM DTT, 0.25 µl of inhibitor (10 U µl⁻¹), and 28.25 µl HPLC water. PCR products were electrophoresed and analysed on 1% agarose gels. DNA bands were purified by using the QIAquick gel extraction kit (Qiagen Sciences). PCR products were directly sequenced in both directions to generate consensus sequences at the University of Texas Medical Branch’s Protein Chemistry Core Laboratory. In some cases where RT-PCR amplification was insufficient, products were cloned into the pGEM-T (Easy) vector (Promega), and the cDNA clones for each were sequenced in both directions. Sequences were assembled and analysed using the Vector NTI suite of programs (Invitrogen) and SeaView v4.2 (Gouy et al., 2010).
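As a quick consistency check on the reaction mix described above, the component volumes can be summed in a few lines of Python. The sketch assumes that "each primer" means two primers (one forward, one reverse); the dictionary keys are labels chosen here for illustration.

```python
# Component volumes in microlitres, taken from the reaction described in the text.
# Assumption: two primers (forward and reverse) at 1 µl each.
REACTION_UL = {
    "viral RNA": 5.0,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "5x RT buffer": 10.0,
    "10 mM dNTPs": 1.0,
    "enzyme mix": 1.0,
    "100 mM DTT": 2.5,
    "inhibitor": 0.25,
    "HPLC water": 28.25,
}


def total_volume(components: dict) -> float:
    """Sum the component volumes of a reaction mix."""
    return sum(components.values())


if __name__ == "__main__":
    print(f"total = {total_volume(REACTION_UL)} µl")
```

Under the two-primer assumption the components add up to the stated 50 µl reaction volume.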
Genomic sequences of 12 lineage 2 WNV isolates were downloaded from GenBank. The Sarafend and WNFC3 strains were not used for this study due to unavailability of published isolation dates. Additionally, genomic sequences of 83 isolates representing the remaining lineages were also downloaded. Properties for the strains used for the phylogenetic analyses are listed in Table S1. Nucleotide sequences were aligned and edited using the muscle algorithm in SeaView v4.2 (Gouy et al., 2010), keeping gaps consistent within the reading frame. NJ and ML trees were constructed using phylip (Felsenstein, 1989) and PhyML (Guindon & Gascuel, 2003), respectively, while a Bayesian tree was constructed using the beast v1.6.1 package (Drummond & Rambaut, 2007). To determine the robustness of the trees, 100 bootstrap replicates were performed. Evolutionary rates and times to MRCA for all of WNV and for lineage 2 individually were estimated using the Bayesian Markov Chain Monte Carlo (MCMC) method in beast. The GTR+I+Γ6 model was used with a relaxed uncorrelated exponential molecular clock and the exponential population model. The Bayes factors for all population models available in the beast package were compared to choose the best-fit model. The MCMC analysis was run for 330 million generations (11 runs of 30 million each) to attain convergence, with 10% of each run discarded as ‘burn-in’. Using TreeAnnotator, within the beast package, an MCC tree was created. beast runs were performed using the cipres Science Gateway (Miller et al., 2010).
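The run lengths and burn-in described above can be tallied with a small Python sketch. It assumes that the 10% burn-in is removed from each of the 11 independent runs before the runs are combined, which is a common practice but is an interpretation of the text rather than a documented detail; the function name is hypothetical.

```python
def retained_states(n_runs: int, run_length: int, burn_in_percent: int) -> int:
    """MCMC steps kept after discarding a percentage of each run as burn-in.

    Assumption: burn-in is applied to every run separately before combining.
    """
    burn_in = run_length * burn_in_percent // 100
    return n_runs * (run_length - burn_in)


if __name__ == "__main__":
    runs, length = 11, 30_000_000
    print("total steps:", runs * length)
    print("retained after 10% burn-in:", retained_states(runs, length, 10))
```

Eleven runs of 30 million steps give the 330 million total quoted in the text; with 10% removed from each run, 27 million steps per run, or 297 million in total, would contribute to the combined posterior.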
Recombination analyses were performed to determine the presence of recombination within the nucleotide sequence alignment. Using the RDP3 vα44 program (Martin et al., 2010), analyses were performed using the rdp, geneconv, Chimaera, MaxChi and Bootscan methods. Additional recombination analyses were performed using the sbp and gard methods in the Datamonkey server (Kosakovsky Pond et al., 2006; Kosakovsky Pond & Frost, 2005). The Datamonkey server was also used for selection analyses. Positive selection was assessed using the alignment containing 16 lineage 2 isolates and the fel and rel methods (Kosakovsky Pond & Frost, 2005). Sites were considered positively selected when dN/dS >1 with a P-value <0.1.
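The site-calling criterion stated above (dN/dS >1 with a P-value <0.1) can be expressed as a simple filter. The Python sketch below is illustrative only: the `SiteResult` record, its field names and the example values are hypothetical and do not reflect the Datamonkey output format or any results from this study.

```python
from typing import Iterable, List, NamedTuple


class SiteResult(NamedTuple):
    """Hypothetical per-codon summary of a selection analysis."""
    codon: int
    dn: float       # non-synonymous rate estimate
    ds: float       # synonymous rate estimate
    p_value: float


def positively_selected(sites: Iterable[SiteResult], alpha: float = 0.1) -> List[int]:
    """Codons with dN/dS > 1 and P < alpha.

    dN > dS is used in place of dN/dS > 1 so that sites with dS == 0
    do not cause a division by zero.
    """
    return [s.codon for s in sites if s.dn > s.ds and s.p_value < alpha]


if __name__ == "__main__":
    # Made-up numbers for demonstration only.
    example = [
        SiteResult(codon=10, dn=0.2, ds=1.1, p_value=0.03),   # purifying, excluded
        SiteResult(codon=42, dn=2.5, ds=0.4, p_value=0.04),   # meets both criteria
        SiteResult(codon=77, dn=1.8, ds=0.9, p_value=0.35),   # dN > dS but not significant
    ]
    print(positively_selected(example))
```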
This work was supported in part by NIH grant AI 067847 (to A.D.T.B.) and contract HHSN272201000040I/HHSN27200004/D04. A.R.M. is supported by NIH T32 training grant AI 07526. A special thanks to Amy Schuh and Andrew Beck for their help with the beast phylogenetic analyses.
A supplementary table is available with the online version of this paper.