Global Picture of Evolutionary Relationships of Influenza A and B Neuraminidase (NA) Genes
The Maximum Likelihood (ML) and MCMC Bayesian analyses demonstrate that the influenza NA gene first diverged into influenza A and B (Group I and Group II), followed by the divergence of the influenza A subtypes (File S1). The monophyly of influenza A and of influenza B was strongly supported by bootstrap values (100%). Within influenza A, two subgroups were found: one consisting of subtypes N2, N3, N6, N7 and N9 (Subgroup I) and the other consisting of the remaining four subtypes, N1, N4, N5 and N8 (Subgroup II). Each subgroup contains viruses independently adapted to avian, human, equine and swine hosts, indicating that parallel evolution occurred in the two subgroups. In addition, each of the nine influenza A NA subtypes formed a distinct cluster with high bootstrap support (>90%), indicating a monophyletic origin for each subtype.
Phylogeny of influenza A and B neuraminidase (NA) genes.
Phylogeny of Neuraminidase (NA) Genes within Influenza A and B Viruses
A total of 23 lineages, two to three per subtype, were identified within influenza A viruses, and two lineages were identified within influenza B. Lineages 1A and 2A were further divided into five and three sublineages, respectively. Human lineages were found in the influenza A N1 and N2 subtypes and in influenza B; swine lineages in N1 and N2; equine lineages in N7 and N8; and avian lineages in all influenza A subtypes. In addition, avian lineages showed more HA/NA subtype combinations than mammalian lineages.
The annotations, isolation periods, representative sequences and subtypes for the NA lineages.
Lineage analyses of influenza A N1 genes
Three lineages, 1A, 1B and 1C, were identified with strong bootstrap support (100%) in the phylogenetic tree generated from 4,146 sequences. The genetic distances between lineages ranged from 0.191 to 0.238. Lineage 1A is a major avian lineage that is further divided into five sublineages: 1A.1 (H5N1), 1A.2 (Eurasian avian), 1A.3 (Pandemic H1N1 2009), 1A.4 (Eurasian avian-like swine) and 1A.5 (North American avian).
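The between-lineage genetic distances reported in this section are averages over sequence pairs drawn from two different lineages. The text does not state which distance measure was used, so the following is only a minimal sketch assuming the uncorrected p-distance (the proportion of differing nucleotide sites, skipping gaps and ambiguous bases); a model-corrected distance would generally give somewhat larger values. The function names and the toy sequences are hypothetical.

```python
from itertools import product
from typing import Sequence

NUCLEOTIDES = set("ACGT")


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in NUCLEOTIDES and y in NUCLEOTIDES:
            compared += 1
            differing += x != y  # bool counts as 0 or 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_between_lineage_distance(lineage1: Sequence[str], lineage2: Sequence[str]) -> float:
    """Average p-distance over every pair with one sequence from each lineage."""
    pairs = list(product(lineage1, lineage2))
    if not pairs:
        raise ValueError("both lineages need at least one sequence")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment (not real NA sequences).
lineage_x = ["ACGTACGTAC", "ACGTACGTAA"]
lineage_y = ["ACGAACGTTC", "TCGAACGTTC"]
print(round(mean_between_lineage_distance(lineage_x, lineage_y), 3))  # 0.3
```

Averaging over all cross-lineage pairs, rather than comparing two consensus sequences, keeps within-lineage diversity in the estimate.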
Maximum-likelihood (ML) tree of influenza A NA subtypes.
Sublineage 1A.1 originated from the highly pathogenic H5N1 avian influenza epizootic that began in Asia around 1996 and has since spread throughout the Eastern Hemisphere. Most viruses in 1A.1 are from birds (n = 1,031), but some are from humans (n = 164), swine (n = 8), tigers (n = 2) and mink (n = 1). Sublineage 1A.2 is composed mostly of Eurasian avian influenza viruses (n = 230), but also includes highly pathogenic H5N1 viruses sampled from humans in Hong Kong in 1997 (n = 24). Sublineage 1A.4 consists of Eurasian swine influenza viruses, which were originally derived from Eurasian avian viruses and first detected in Belgium in 1979. As expected, 1A.3 (Pandemic H1N1 2009) groups with the Eurasian swine viruses, confirming previous findings that the NA segment of the pandemic H1N1 2009 viruses originated from Eurasian swine influenza viruses. Sublineage 1A.5 is composed mainly of viruses from North American avian species (n = 162), with a few exceptions: one sequence from a human and three from environmental samples.
Lineage 1B consists mainly of North American swine influenza viruses, whereas 1C is a human lineage consisting mainly of H1N1 human influenza viruses. The viruses in 1B correspond mostly to classical H1N1 isolates from swine (n = 126), but include 9 isolates from humans and 9 from birds, indicating sporadic interspecies transmission of influenza viruses from swine to humans or birds. Lineage 1C consists predominantly of human viruses (n = 1,204), with a few exceptions from swine (4 isolates) and birds (2 isolates). Within the influenza A N1 subtype, avian influenza viruses carry multiple HA subtypes (e.g., H1N1, H3N1, H5N1, H6N1, H7N1, H9N1, and H11N1), whereas human and swine viruses carry a limited range of HA subtypes (human: H1N1; swine: H1N1, H3N1).
Lineage analyses of influenza A N2 genes
The N2 sequences (3,754 in total) were classified into two major lineages, 2A and 2B. The genetic distance between lineages 2A and 2B was estimated to be 0.204. Lineage 2A is a major avian lineage, whereas 2B consists mainly of mammalian (i.e., human and swine) influenza viruses. Lineage 2A was further divided into three sublineages: 2A.1 (H9N2), 2A.2 (Eurasian avian), and 2A.3 (North American avian).
Sublineage 2A.1 is subtype-specific, consisting mainly of H9N2 avian influenza viruses. Most are from birds (n = 412), but 24 sequences are from swine and 4 from humans, indicating interspecies transmission. Sublineages 2A.2 and 2A.3 correspond to Eurasian and North American avian viruses, respectively. The viruses of 2A.2 are mainly from birds (n = 342), but a few are from swine (n = 7) and humans (n = 2). A similar pattern was found in 2A.3, which includes 291 avian viruses, 1 H7N2 human virus, and 29 viruses isolated from environmental samples.
Within 2B, most sequences are from human H2N2 and H3N2 influenza viruses (n = 2,340) and swine H3N2 and H1N2 viruses (n = 214). However, avian H3N2 influenza viruses (n = 11) were also found in this lineage. Interestingly, five major clades of swine influenza viruses were scattered within lineage 2B, suggesting that these swine viruses originated from human viruses through either genome reassortment or direct transmission. The branch lengths of the swine clusters are also much longer than those of the closely related human viruses, indicating extensive evolution of the N2 gene in swine after transmission from humans to swine.
Lineage analyses of influenza A N3–N9 genes
Three lineages, 3A, 3B, and 3C, were found in N3, with genetic distances between lineages ranging from 0.173 to 0.349 (Figure S1). Lineage 3A consists mainly of North American avian viruses (n = 173), but includes several avian strains from South America (n = 8). By host, lineage 3A contains 166 sequences from birds, 4 from swine, 1 from a human, and 9 from environmental samples. Lineage 3B is a Eurasian/Oceanian avian lineage, whereas 3C is an avian lineage without any clear geographical pattern; both 3B and 3C consist entirely of avian influenza viruses.
The N4, N5 and N6 subtypes were each classified into two lineages, one corresponding to North American avian viruses (4A, 5A and 6A) and the other to Eurasian/Oceanian avian viruses (4B, 5B and 6B) (Figures S2 and S3). The genetic distance between lineages was estimated to be 0.198 for N4, 0.254 for N5, and 0.250 for N6. All N4 and N5 viruses are from avian species. Lineage 6A is composed mainly of North American avian viruses (n = 336), with a few exceptions from Asian avian viruses (n = 2). Lineage 6B consists mainly of Eurasian/Oceanian avian viruses (n = 121), but contains 6 avian viruses from North America.
Three lineages were identified in each of N7 and N8, corresponding to North American avian (7A, 8A), equine (7C, 8B) and Eurasian/Oceanian avian (7B, 8C) viruses, respectively (Figure S4). For N9, three lineages were identified, 9A, 9B and 9C, corresponding to North American avian, Eurasian/Oceanian avian I and Eurasian/Oceanian avian II viruses, respectively (Figure S5). The genetic distances between lineages ranged from 0.297 to 0.320 for N7, from 0.269 to 0.298 for N8, and from 0.117 to 0.224 for N9.
Lineage analyses of influenza B neuraminidase (NA) genes
The NA genes of influenza B viruses were divided into two distinct lineages, B/Victoria/2/87-like (Vic87) and B/Yamagata/16/88-like (Yam88). All influenza B viruses were isolated from humans, and neither lineage showed obvious geographical separation. The genetic distance between the Vic87 and Yam88 lineages was estimated to be 0.06.
Maximum-likelihood (ML) tree of influenza B NA genes.
Substitution Rates and Times of Most Recent Common Ancestor (tMRCAs) of Influenza A and B NA Lineages
Outliers were identified and removed before estimating the substitution rate and tMRCA for each lineage (Table S1). The mean substitution rate and 95% highest posterior density (HPD) interval for each lineage are summarized in the table below. The mean substitution rates estimated under the random local clock (RLC) model were generally lower than the corresponding rates estimated under the uncorrelated exponential relaxed clock (UCED) model. Below, we present results based on the RLC model, a recently introduced model that can reveal rate heterogeneity among branches.
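The RLC and UCED relaxed-clock analyses are Bayesian and cannot be reproduced in a few lines. As a much simpler, swapped-in illustration of what a substitution rate and a tMRCA mean, the sketch below fits a strict clock by root-to-tip regression: if root-to-tip divergence grows linearly with sampling date, the slope estimates the rate (substitutions/site/year) and the x-intercept estimates the root date. This is not the method used in the analysis; the function name and the example dates and divergences are hypothetical.

```python
from typing import Sequence, Tuple


def root_to_tip_clock(dates: Sequence[float], divergences: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of divergence = rate * (date - t_root).

    Returns (rate, t_root): the slope, and the date at which the fitted
    line reaches zero divergence.
    """
    n = len(dates)
    if n < 2 or n != len(divergences):
        raise ValueError("need at least two (date, divergence) pairs")
    mean_x = sum(dates) / n
    mean_y = sum(divergences) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    if sxx == 0:
        raise ValueError("sampling dates must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, divergences))
    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("no positive temporal signal")
    intercept = mean_y - rate * mean_x
    t_root = -intercept / rate
    return rate, t_root


# Hypothetical tips whose divergence grows by 0.003 subs/site per year from 1990.
dates = [2000.0, 2005.0, 2010.0]
divergences = [0.030, 0.045, 0.060]
rate, t_root = root_to_tip_clock(dates, divergences)
print(f"rate = {rate:.4f} subs/site/year, root ~ {t_root:.1f}")
```

Unlike the Bayesian clocks, this regression treats tips as independent and assumes one rate for the whole tree, so it is best read as a quick check for temporal signal rather than as an estimator.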
Substitution rates and tMRCAs of different lineages for influenza A and B NA genes*.
For each lineage, the Bayesian consensus tree was drawn with posterior mean branch lengths scaled in real time. To reflect rate variation, branches were colored by their posterior mean relative rate of nucleotide substitution: blue branches indicate slow substitution rates, whereas red branches indicate rapid change. For H5N1, the mean substitution rate was estimated to be 3.06×10⁻³ substitutions/site/year, with a low rate (1.5×10⁻³) in earlier branches (blue) and a high rate (4.20×10⁻³) in later branches (red) (panel A). In contrast, the N1 genes of North American swine viruses have a mean rate of 2.55×10⁻³, with rates decreasing over time: a high rate (3.2×10⁻³) in earlier branches (red) and a low rate (0.9×10⁻³) in later branches (blue) (panel B). Notably, human H1N1 viruses evolved at different rates in two circulation periods, with a low rate (1.3×10⁻³) during 1918–1957 (blue) and a high rate (2.9×10⁻³) after 1977 (red) (panel C).
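Under a local clock, the mean rate reported for a lineage blends slow and fast branches. A minimal sketch, assuming the tree-wide mean is the time-weighted average of branch rates (total expected substitutions per site divided by total branch time), and simplifying the continuous color scale to two colors split at a relative rate of 1. The branch durations are hypothetical; only the two rate values are borrowed from the H5N1 description, so the resulting mean is not meant to reproduce the reported 3.06×10⁻³.

```python
from typing import List, Sequence, Tuple

Branch = Tuple[float, float]  # (rate in subs/site/year, duration in years)


def mean_rate(branches: Sequence[Branch]) -> float:
    """Time-weighted mean rate: total substitutions per site / total branch time."""
    total_time = sum(duration for _, duration in branches)
    if total_time <= 0:
        raise ValueError("total branch time must be positive")
    total_subs = sum(rate * duration for rate, duration in branches)
    return total_subs / total_time


def color_by_relative_rate(branches: Sequence[Branch]) -> List[str]:
    """Label each branch 'red' if faster than the tree-wide mean, else 'blue'."""
    m = mean_rate(branches)
    return ["red" if rate / m > 1 else "blue" for rate, _ in branches]


# Hypothetical tree: early branches at 1.5e-3, later branches at 4.2e-3.
tree = [(1.5e-3, 10.0), (1.5e-3, 5.0), (4.2e-3, 8.0), (4.2e-3, 12.0)]
print(f"{mean_rate(tree):.2e}")       # 3.04e-03
print(color_by_relative_rate(tree))  # ['blue', 'blue', 'red', 'red']
```

Weighting by time means a lineage's mean shifts toward whichever rate class occupies more of the tree, which is why the same two local rates can yield different means in different lineages.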
Bayesian inferences of random local clocks on influenza NA genes.
The H9N2 lineage has a mean substitution rate of 4.45×10⁻³, with a constant rate of 4.9×10⁻³ in the majority of branches (red) and a lower rate (2.6×10⁻³) in a small number of branches (blue) (panel D). Substitution rates in the equine N7 lineage decreased from earlier branches (red, 3.4×10⁻³) to later branches (blue, 1.6×10⁻³), averaging 2.65×10⁻³ (panel E). The influenza B Yam88 lineage has a mean substitution rate of 2.3×10⁻³, with a consistent rate of 2.4×10⁻³ in the majority of branches (red) and 1.5×10⁻³ in a small number of branches (blue) (panel F). Other lineages showed different patterns of rate heterogeneity (data available from the authors on request).
The time to the most recent common ancestor (tMRCA) varies from lineage to lineage. The tMRCA for human H1N1 (1C), which includes the viruses that caused the 1918 Spanish Flu, was dated to 1898 (95% HPD: 1882–1909). The tMRCA of H5N1 viruses (1A.1) was estimated to be 1988 (95% HPD: 1984–1992), eight years before the H5N1 avian influenza outbreak in Asia in 1996. For 1A.2 (Eurasian avian N1), the tMRCA was estimated to be 1927 (95% HPD: 1922–1931), whereas the earliest sampled virus dates from 1934. The tMRCA of the pandemic H1N1 2009 viruses (1A.3) was dated to Nov 19, 2008 (95% HPD: June 7, 2008 to Mar 16, 2009). The most recent common ancestor of the Eurasian avian-like swine viruses (1A.4) was dated to 1978 (95% HPD: 1977–1979), one year before this lineage was first detected in 1979. For lineage 2B, the tMRCA was dated to 1956 (95% HPD: 1955–1957), one year before human H2N2 emerged in 1957. The tMRCAs for the other lineages are listed in the table of substitution rates and tMRCAs, and the maximum clade credibility (MCC) trees are available from the authors upon request. These results suggest that pandemic or epidemic viruses emerged several months to several years before their initial detection, underscoring the importance of enhanced surveillance for newly emerging viruses.
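Clock analyses usually express tMRCAs as decimal years, which is presumably how calendar dates such as Nov 19, 2008 were obtained. A minimal sketch of the conversion, assuming the fractional part of the year is scaled by that year's actual length (365 or 366 days) and rounded to the nearest day; other conventions (for example a fixed 365.25-day year) can shift the result by a day. The function names are hypothetical.

```python
import math
from datetime import date, timedelta


def decimal_year_to_date(t: float) -> date:
    """Convert a decimal year (e.g. 2008.88) to the nearest calendar date."""
    year = math.floor(t)
    start = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - start).days  # 365 or 366
    return start + timedelta(days=round((t - year) * days_in_year))


def date_to_decimal_year(d: date) -> float:
    """Inverse conversion: days elapsed since Jan 1 as a fraction of that year."""
    start = date(d.year, 1, 1)
    days_in_year = (date(d.year + 1, 1, 1) - start).days
    return d.year + (d - start).days / days_in_year


t = date_to_decimal_year(date(2008, 11, 19))
print(round(t, 4))              # 2008.8825
print(decimal_year_to_date(t))  # 2008-11-19
```

Because 2008 is a leap year, dividing by 366 rather than 365 matters here: the same decimal value read with a 365-day year would land on a different day.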
Selection of Influenza A and B Neuraminidase Lineages
Different selection pressures were revealed in different lineages, as indicated by the ratio of nonsynonymous (dN) to synonymous (dS) substitutions per site (dN/dS). Within influenza A, the highest dN/dS ratio was observed in the human N2 lineage 2B (0.313), higher than those of the equine N8 lineage 8B (0.281), the H5N1 sublineage 1A.1 (0.274), the human N1 lineage 1C (0.261) and the H9N2 sublineage 2A.1 (0.252), most likely reflecting host immune selection pressure resulting from continuous circulation within the respective hosts and/or vaccination. The lineages under the strongest purifying selection were 4B (0.062), 9C (0.068) and 5B (0.078). The two influenza B lineages had similar dN/dS ratios: 0.259 for Yam88 and 0.257 for Vic87.
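A dN/dS value below 1 indicates purifying selection and a value above 1 indicates positive selection. The ratios above were presumably estimated with codon-based methods; as a simpler illustration, the sketch below computes a pairwise, Nei-Gojobori-style ratio from already-counted synonymous and nonsynonymous sites and differences, applying a Jukes-Cantor correction for multiple hits. Counting sites and differences from real codon alignments is not shown, and the counts and function names in the example are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction of an observed proportion of differences."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion must lie in [0, 0.75) for the correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(syn_sites: float, nonsyn_sites: float,
          syn_diffs: float, nonsyn_diffs: float) -> float:
    """Ratio of corrected nonsynonymous to synonymous distances (dN/dS)."""
    d_s = jukes_cantor(syn_diffs / syn_sites)
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    if d_s == 0.0:
        raise ValueError("dS is zero; dN/dS is undefined")
    return d_n / d_s


# Hypothetical counts for one pair of aligned NA coding sequences.
omega = dn_ds(syn_sites=100, nonsyn_sites=300, syn_diffs=20, nonsyn_diffs=15)
print(f"dN/dS = {omega:.3f}")  # prints "dN/dS = 0.222": below 1, purifying selection
```

Normalizing by the number of sites of each kind matters because most random mutations in a coding sequence are nonsynonymous; raw difference counts alone would overstate nonsynonymous change.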
Evidence of positive selection using the SLAC, FEL and IFEL methods with a significance level of 0.05.
Human lineages had among the largest numbers of positively selected sites: 16 for the human N2 lineage (2B), 9 for the human H1N1 lineage (1C), and 8 for the Yam88 lineage. H5N1 (1A.1) and H9N2 (2A.1) also had 10 and 7 positively selected sites, respectively. No positively selected sites were detected in lineages 3C, 6B, 7A, 7C, 8B, and 9A–9C. The remaining lineages had one to six sites under positive selection.
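SLAC, the simplest of the three site-level methods named in the caption above, is counting-based: at each codon site it compares the number of inferred nonsynonymous changes with the number expected from the site's proportion of nonsynonymous mutational opportunities. A minimal sketch of that idea as a one-sided binomial test, assuming integer change counts; HyPhy's SLAC implementation differs in details (it must cope with non-integer change counts, for instance), and FEL and IFEL are likelihood-based and are not shown. The function name and all inputs are hypothetical.

```python
from math import comb


def positive_selection_p_value(nonsyn_changes: int, syn_changes: int,
                               frac_nonsyn_sites: float) -> float:
    """One-sided binomial P(X >= nonsyn_changes) for X ~ Binomial(n, p).

    n is the total number of changes at the site and p is the proportion
    of the site's mutational opportunities that are nonsynonymous.
    """
    if not 0.0 < frac_nonsyn_sites < 1.0:
        raise ValueError("frac_nonsyn_sites must lie strictly between 0 and 1")
    n = nonsyn_changes + syn_changes
    p = frac_nonsyn_sites
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(nonsyn_changes, n + 1))


ALPHA = 0.05  # significance level used for the site tests in the text
# Hypothetical site: 12 nonsynonymous and 0 synonymous changes, 75% nonsynonymous sites.
p_value = positive_selection_p_value(12, 0, 0.75)
print(f"p = {p_value:.4f}, positively selected: {p_value < ALPHA}")
```

The example also shows why counting methods are conservative: even 8 nonsynonymous changes with no synonymous ones would not reach p < 0.05 when 75% of opportunities are nonsynonymous.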
Protein structure analyses revealed that all positively selected sites are located on the surface of the NA protein and are involved in antibody binding and/or interactions with sugar molecules of host cells (Figure S6). In addition, a number of positively selected sites lie in regions of the NA protein where neuraminidase inhibitors are known to bind, suggesting strong selection at sites that may serve as molecular markers predictive of antiviral resistance.
The structures and positive selection sites of human influenza neuraminidase.
In the human H1N1 lineage (1C), amino acid positions 151, 222 and 344 were found to be under strong positive selection, and according to the NA structure, the residues at these positions appear to interact with the NA inhibitor zanamivir (panel A). In addition, the positively selected sites 344 and 365 are located in B-cell antigenic regions. Position 319, also under positive selection in this lineage, forms a hydrogen bond with position 379, whose backbone carbonyl interacts with a calcium ion (panel A). This Ca²⁺ ion interacts with positions 379, 381, 382, 387 and 389, and forms hydrogen bonds with positions 383 and 385. These interactions are crucial for folding the protein into the tertiary structure required for sialic acid binding (which allows NA to cleave sialic acid) as well as for NA inhibitor binding.
In the other human lineage (2B), positions 126 and 127 were found to lie within the NA binding pocket (panel B). These two residues, along with residues 120 and 151, were under positive selection. All of these sites fold in close proximity to one another, forming a hydrogen-bond network that is essential for NA inhibitor binding. Specifically, position 151 forms a hydrogen bond with position 75, which is itself predicted to bind zanamivir.
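Statements that residues "fold in close proximity" or "form a hydrogen bond" rest on interatomic distances in the NA structure. A minimal sketch of such a geometric check, assuming the common rule of thumb that a donor-acceptor distance of about 3.5 Å or less is compatible with a hydrogen bond; a full assignment would also check angles and which atoms can donate or accept. The function name, atom labels and coordinates are invented for illustration; real values would come from a structure file.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def close_pairs(atoms: Dict[str, Coord], cutoff: float = 3.5) -> List[Tuple[str, str, float]]:
    """Return (atom1, atom2, distance) for every pair no farther apart than cutoff (Å)."""
    hits = []
    for (name1, xyz1), (name2, xyz2) in combinations(atoms.items(), 2):
        d = math.dist(xyz1, xyz2)  # Euclidean distance
        if d <= cutoff:
            hits.append((name1, name2, round(d, 2)))
    return hits


# Hypothetical coordinates in angstroms (not taken from any real NA structure).
atoms = {
    "res151_O": (0.0, 0.0, 0.0),
    "res75_N": (2.9, 0.0, 0.0),
    "res126_O": (10.0, 0.0, 0.0),
}
print(close_pairs(atoms))  # [('res151_O', 'res75_N', 2.9)]
```

Raising the cutoff (for example to 8 Å) turns the same function into a coarse "close proximity" screen rather than a hydrogen-bond check.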
For human influenza B, positions 42, 65, 248, 345, 373, 389, 395, and 436 were found to be under positive selection. Crystal structures of the B/Perth/211/2011 NA in complex with zanamivir, oseltamivir, or peramivir showed that residues 373 and 374 participate in drug binding, whereas residue 345 is involved in calcium binding and in dimerization of two NA monomers (panels C and D).