|Home | About | Journals | Submit | Contact Us | Français|
Analysis of patterns of nucleotide sequence diversity in wild-type rabies virus (RABV) genomes and in the SAD live attenuated oral vaccine lineage was used to test for the relaxation of purifying selection in the latter and provide evidence regarding the genomic regions where such relaxation of selection occurs. The wild-type sequences showed evidence of strong past and ongoing purifying selection both on non-synonymous sites in coding regions and on non-coding regions, particularly the start and end and 5’ UTR regions. SAD vaccine sequences showed a relaxation of purifying selection at nonsynonymous sites in coding regions, resulting a substantial number of amino acid sequence polymorphisms at sites that were invariant in the wild-type sequences. Moreover, SAD vaccine sequences showed high levels of mutation accumulation in the non-coding regions that were most conserved in the wild-type sequences. Understanding the biological effects of the unique mutations accumulated in the vaccine lineage is important because of their potential effects on antigenicity and effectiveness of the vaccine.
Rabies virus (RABV), which causes an acute and generally fatal illness of humans and domestic animals, is the oldest recorded human viral pathogen and remains an important source of mortality in many parts of the world (Fu 1997; Warrell and Warrell 2004). In Europe, while control measures have greatly reduced human cases of rabies, wild mammals, particularly the red fox (Vulpes vulpes), continue to provide a reservoir of rabies infection (Lontai 1997). As a consequence, the main control strategy has involved oral vaccination of foxes through baits treated with an attenuated live vaccine (Geue et al. 2008). All of the commercially available oral rabies vaccines for wildlife are derived from a single common ancestral isolate known as Street Alabama Dufferin (SAD), derived from an infected dog in North America in 1935 (Fenje 1960; Sacramento et al. 1992).
A number of different vaccine strains have been developed from the original SAD strain by passaging under a variety of protocols (Geue et al. 2008). As in other live attenuated virus vaccibes, mutations during passaging have resulted in a certain degree of nucleotide sequence polymorphism within the SAD lineage (Geue et al. 2008). Understanding the evolution of live vaccines is important because evolutionary changes may affect the biological properties of the vaccine, including its antigenicity and effectiveness, as well as its potential to reacquire virulence (Hughes 2009). For example, attenuated live vaccines may show a pattern of relaxation of purifying selection, as was observed in three live attenuated vaccine strains of viruses belonging to the family Paramyxoviridae (Hughes 2009). In the case of the SAD vaccine strains, there have been reports of SAD vaccine-associated rabies in wildlife, in spite of evidence of the stability of the vaccine in experimental passaging studies (Beckert et al. 2009).
Purifying selection – that is, selection against deleterious variants – is the most prevalent form of natural selection at the molecular level (Nei 1987; Hughes 1999, 2008). Evidence for the prevalence of purifying selection on coding regions is provided by the observation that the number of synonymous substitutions per synonymous site (dS) generally exceeds the number of nonsynonymous (amino acid-altering) substitutions per nonsynonymous site (dN). This pattern evidently occurs because most nonsynonymous mutations are deleterious to protein function and thus are eliminated by purifying selection. In fact, there are two distinct aspects of purifying selection, revealed by different methods of analysis: (1) selection against strongly deleterious mutations, which are eliminated quickly; and (2) selection against slightly deleterious mutations, which may be relatively inefficient if the effective population size is small or recombination is limited (Ohta 1973: Hughes 2008).
RABV is a single stranded RNA negative-strand virus belonging to the genus Lyssavirus (family Rhabdoviridae). As in other members of the family, there are five genes, designated N, P, M, G, and L, each including a 5’ untranslated region (UTR) and 3’ UTR (Conzelmann 1998; Tordo et al. 1986). These are illustrated in Figure 1 according to their order in the RNA antigenome (Conzelmann 1998). There are also start, end, and intergenic regions; the longest intergenic region (between G and L) may represent the location of a remnant gene that has lost functionality (Tordo et al. 1986).
The purpose of the present paper is to use the tools of molecular evolutionary genetics to understand the evolutionary factors at work in the evolution of the SAD rabies vaccine lineage, in particular the role of purifying selection. By examining the patterns of nucleotide sequence polymorphism in wild-type genomic sequences, I provide evidence for the strength of both past and ongoing purifying selection on different genomic regions. By comparing these patterns with those seen in the SAD vaccine and in SAD vaccine-derived sequences isolated from red foxes, I both test for the relaxation of purifying selection in the latter and provide evidence regarding the genomic regions where such relaxation of selection occurs.
Sequence analyses were based on 35 rabies virus (RABV) sequences, 13 wild-type; 14 from SAD vaccine strains; and 6 vaccine-derived strains isolated from red foxes in Europe (for accession numbers, see figure 2). Genomic sequences were aligned using CLUSTALW (Thompson et al. 1997). In computing pairwise distances among a set of sequences, any site at which the alignment postulated a gap in any of the sequences was excluded from all comparisons (“complete deletion,” Nei and Kumar 2000). Complete deletion ensures that the same set of sites is compared in each pairwise comparison (Nei and Kumar 2000).
Phylogenetic analyses were conducted including the above-mentioned 35 sequences plus two other sequences belonging to the attenuated strains Ni-CE, RC-HL, and RV-97 (for accession numbers, see Figure 2). A number of different methods of phylogenetic analysis were applied to both complete genome sequences and to coding regions only, all of which showed similar results (not shown). The phylogenetic tree shown below was constructed by the neighbor-joining (NJ) method (Saitou and Nei 1987) on the basis of the maximum composite likelihood (MCL) distance (Tamura et al. 2007). The reliability of clustering patterns in the tree was assessed by bootstrapping (Felsenstein 1985); 1000 bootstrap samples were used. The tree was rooted following the phylogenetic analyses of Wu et al. (2007). Ancestral genomic sequences were reconstructed based on the NJ tree by the maximum parsimony method (Swofford 2003).
The nucleotide usage in the RABV sequences analyzed here was remarkably even. At third positions in coding regions, the G+C content was 49.5%, while in non-coding regions it was 41.1%. By the MCL method, the transition:transversion ratio at non-coding sites and at third positions of codons was estimated at 5.4:1. Because of this strong transitional bias, the number of synonymous substitutions per synonymous site (dS) and the number of nonsynonymous (amino acid-altering) substitutions per nonsynonymous site (dN) were estimated by Li’s (1993) method, which takes into account transitional bias in affecting the relative probabilities of synonymous and nonsynonymous mutations at two-fold and three-fold degenerate sites. The number of nucleotide substitutions per site (d) in non-coding regions was estimated by the MCL method. For coding regions, we computed the mean of dS for all pairwise comparisons (termed synonymous nucleotide diversity and symbolized πS) and the mean of dN for all pairwise comparisons (termed nonsynonymous nucleotide diversity and symbolized πN). Likewise, in non-coding regions, we computed mean d for all pairwise comparisons (nucleotide diversity or π). Standard errors of πS, πN, and π were computed by the bootstrap method (Nei and Kumar 2000); 1000 bootstrap samples were used. Separate estimates of πS and πN were obtained for each of the five protein-coding genes (Figure 1), as well as for the combined coding regions. In the case of non-coding regions, π was estimated separately for four separate categories of sites: (1) the start and end sites (upstream to the 5’ UTR of N and downstream to the 3’UTR of L); (2) the 5’ UTR regions of the protein-coding genes;(3) the 3’ UTR regions of the protein-coding genes; and (4) the intergenic regions (Figure 1).
Gene diversity (“heterozygosity”) at individual polymorphic nucleotide sites (SNP sites) was estimated by the formula:
where n is the number of alleles and xi is the population frequency of the ith allele (Nei 1987, p. 177). In coding regions, single nucleotide polymorphisms were classified either as synonymous or nonsynonymous depending on their effect of the encoded nucleotide sequence. Ambiguous sites were excluded from these analyses; i.e., sites at which both synonymous and nonsynonymous variants occurred or at which the polymorphism could be considered synonymous or nonsynoymous depending on the pathway taken by evolution.
I computed πS, πN, and π as well as gene diversity at polymorphic sites in both coding and non-coding regions separately for the following sets of sequences: (1) the wild-type isolates (N = 13), not including SAD or any other vaccine strain or vaccine-derived sequences; (2) the SAD vaccine sequences (N = 14); and (3) the SAD vaccine-derived sequences (N= 6). In testing for differences in gene diversity among categories (synonymous, nonsynonymous, and non-coding) of polymorphic sites, randomization tests were used. In each test, 1000 pseudo-data sets were created by sampling (with replacement) from the data; a difference between two categories was considered significant at the α level if it was greater than the absolute value of 100(1-α) % of the differences observed between the same categories in the pseudo-data sets.
In the NJ tree of complete rabies virus genomes, the SAD lineage formed a separate clade, with 100% bootstrap support, which was distinct from the other attenuated RABV strains (Figure 2). The SAD vaccine-derived wild isolates formed part of the clade with the SAD vaccine strains (Figure 2). (Topology only of the same tree is illustrated in Supplementary Figure S1). There were much shorter branch length within the SAD clade than in the rest of the tree (Figure 2), reflecting a lower level of sequence divergence within the SAD clade. The phylogenetic tree provided strong support for the hypothesis that SAD vaccine-derived wild strains originated independently at least twice. The vaccine-derived wild sequences EU886631 and EU886636 clustered with the SAD P5/88(pro) vaccine sequence EF206715 (the SAD P5/88 vaccine strain; Geue et al. 2008) with 100% bootstrap support (Supplementary Figure S1). Other vaccine and vaccine-derived sequences fell outside the latter cluster (Supplementary Figure S1).
In wild-type RABV sequences, πS was significantly greater than πN in each of the five protein-coding genes (Table 1). By contrast, in the case of SAD vaccine genomes, πS was not significantly greater πN in any of the five genes or in the combined coding regions of all five genes (Table 1). In the case of the SAD vaccine-derived genomes, πS was significantly greater than πN in the L gene but not in any of the other genes or in the combined coding regions of all five genes (Table 1). For each gene, the ratio of πN :πS was much higher in SAD vaccine genomes than in the wild type (Table 1), and this ratio was over 1.0 in the G gene and close to 1.0 in the M gene (Table 1). For the five genes, mean πN :πS for SAD vaccine genomes(0.77 ± 0.11) was significantly greater than that for the wild-type (0.06 ± 0.02) by a paired t-test (P = 0.002). Although the mean πN :πS for SAD vaccine-derived sequences (0.27 ± 0.12) was greater than that for wild-type sequences, the difference was not significant. These results thus support the hypothesis that the five protein-coding genes of RABV have been subject to strong purifying selection on the encoded amino acid sequences but that there has been relaxation of this selection in the SAD vaccine genomes.
In wild-type RABV, π in non-coding regions was significantly lower than πS at synonymous sites in all coding regions (Table 2). The overall π in non-coding regions (0.3160) was just 43.1% of overall πS (0.7330; Tables 1 and and2).2). This pattern indicates purifying selection on non-coding regions. This purifying selection appeared to be particularly strong in the case of 5’ UTR, which showed a significantly lower π than did either the 3’ UTR or intergenic regions (Table 2). In the case of SAD vaccine genomes, on the other hand, the start and end regions, the 5’ UTR, the 3’ UTR, and the intergenic regions all showed values of π that were not significantly different from πS in all coding regions (Table 2). In fact, in the case of the SAD vaccine genomes, π in all non-coding regions was significantly greater than πS in all coding regions, a reversal of the pattern seen in wild-type RABV (Table 2). In the SAD vaccine-derived genomes, there were no polymorphisms in the 5’UTR and 3’UTR, resulting in significantly lower π in these regions than πS in all coding regions (Table 2). However, overall π in all non-coding regions of the SAD vaccine-derived genomes was not significantly different from overall πS (Table 2).
Patterns of gene diversity were examined in order to test for ongoing purifying selection at polymorphic sites in coding and non-coding regions. In the wild-type sample, mean gene diversity at polymorphic nonsynonymous sites (0.336) was significantly less than that at either polymorphic synonymous sites (0.369) or polymorphic non-coding sites (0.356; Figure 3A). In the SAD vaccine sequences, there was no significant difference in mean gene diversity among nonsynonymous (0.282), synonymous (0.285), and noncoding (0.299) polymorphic sites (Figure 3B). Likewise, in the SAD vaccine-derived sequences, there was no significant difference in mean gene diversity among nonsynonymous (0.333), synonymous (0.328), and noncoding (0.389) polymorphic sites, although in this case the numbers of polymorphic sites in the three categories (6, 10, and 3, respectively) were very small.
There were 4218 sites in coding regions that were polymorphic in the wild-type sequences but not in the SAD vaccine sequences. Of these, 1862 (44.1 %) were nonsynonymous. By contrast, of 65 sites in coding regions that were polymorphic in SAD but not in the wild-type sequences, 48 (73.8 %) were nonsynonymous. The difference between proportions was highly significant (χ2 = 22.9; 1 d.f.; P < 0.001). Likewise, of 50 sites in coding regions that were polymorphic in both wild-type and SAD, only 12 (24.0 %) were nonsynonymous. This pattern also differed significantly from that seen at sites polymorphic only in the wild-type sequences (χ2 = 8.1; 1 d.f.; P < 0.001) and from that seen at sites polymorphic only in SAD (χ2 = 28.1; 1 d.f.; P < 0.001).
Maximum parsimony was used to reconstruct the nucleotide sequence changes along the branch ancestral to the SAD clade (Figure 2). Along this branch, there were 263 (64.3%) synonymous changes; 76 (18.6%) nonsynonymous changes; and 70 (17.1%) changes in noncoding sites. Along this branch, dS was 0.0847 ± 0.0058, dN was 0.0107 ± 0.0010; and d in non-coding regions was 0.0653 ± 0.0081. There was a significant difference between dS and dN (P < 0.001), but not between dS and d in non-coding regions. Thus, there was evidence along the branch ancestral to the SAD clade of strong purifying selection on the RABV coding regions, but a relaxation of purifying selection on non-coding regions.
Within the population of SAD vaccine sequences, there were 36 (29.8%) synonymous polymorphic sites; 58 (47.9%) nonsynonymous polymorphic sites; and 27 (22.3%) noncoding polymorphic sites. These proportions differed strikingly from those of changes along the branch ancestral to the SAD clade, and the difference in proportions was highly significant (χ2 = 53.0; 2 d.f.; P < 0.001). This difference was due mainly to the great excess of nonsynonymous polymorphisms (47.9%) relative to nonsynonymous divergent changes (18.6%).
Examination of the patterns of nucleotide sequence polymorphism was used to test for evidence for the relaxation of purifying selection on the SAD strain of live attenuated RABV vaccine, and to identify the regions of the genome where such relaxation is most apparent. Wild-type RABV sequences showed strong evidence of both past and ongoing purifying selection. This selection was focused mainly on nonsynonymous sites in the five protein-coding genes and on sites in the non-coding portions of the genome, particularly the 5’ UTR of genes.
In wild-type genomes, the nonsynonymous nucleotide diversity (πN) was significantly lower than synonymous nucleotide diversity (πS) in each of the genes, and πN was overall only 5% as great as πS (Table 1). These results support the hypothesis that purifying selection has acted to eliminate a substantial fraction of nonsynonymous mutations occurring in RABV genomes. Nucleotide diversity (π) in non-coding regions was also significantly reduced compared to πS, although the reduction was not so great as in the case of nonsynonymous sites, indicating that noncoding sites have also been subject to purifying selection. The nucleotide diversity was lowest in the 5’ UTR of genes and significantly less than in either the 3’ UTR or the intergenic regions (Table 2), indicating that the 5’ UTR have been the focus of particularly strong past purifying selection. In addition, mean gene diversity was significantly reduced at nonsynonymous sites in comparison to both synonymous and non-coding sites, implying that ongoing purifying selection is acting reduce the frequency of certain nonsynonymous variants. The evidence for both past and ongoing purifying selection on RABV is consistent with evidence from other RNA viruses, which in general are subject to highly effective purifying selection (Hughes and Hughes 2007).
SAD vaccine sequences showed a strikingly different pattern of nucleotide sequence polymorphism, indicating an overall relaxation of purifying selection. None of the genes showed πN significantly less than πS (Table 1), and none of the non-coding regions showed π significantly less than πS (Table 2). In each of the five genes, the ratio πN :πS was substantially greater in the SAD vaccine sequences than in the wild-type in the SAD vaccine sequences; πS and πN were nearly identical in two of the genes (M and G), with πN actually slightly higher than πS in G (Table 1).
The level of polymorphism within the SAD vaccine sequences was much lower than in the wild-type sequences, and it might be argued that the lack of a significant difference between πS and either πN or π merely reflects the difficulty to detecting such differences statistically when the number of substitutions is small. However, there are several lines of evidence against this interpretation. First, when the same analysis was applied to 6 SAD vaccine-derived sequences isolated from red foxes, there was an even lower level of sequence diversity than the SAD vaccine sequences; yet in the case of the SAD vaccine-derived sequences, πS was significantly greater than πN in the L gene (Table 1). Likewise in the SAD vaccine-derived sequences, π in the 5’ UTR and 3’UTR was significantly less than πS in coding regions (Table 2). Furthermore, in the case of the SAD vaccine sequences, overall π in the non-coding regions was actually significantly greater than πS was significantly, a reversal of the pattern seen in the wild-type sequences.
Relaxation of purifying selection at nonsynonymous sites was seen likewise in analyses of gene diversity at polymorphic sites in the SAD vaccine sequences. In contrast to the wild-type, mean gene diversities at synonymous and nonsynonymous sites were not significantly different in the case of SAD vaccine sequences (Figure 3B). Moreover, sites polymorphic in SAD vaccine sequences but not in the wild-type were predominately nonsynonymous. Likewise, the SAD vaccine sequences showed an excess of nonsynonymous polymorphisms in comparison to divergent changes that occurred along the branch ancestral to the SAD clade.
Since there were more than 400 reconstructed nucleotide changes along the branch ancestral to the SAD clade, it is not possible to determine which changes are responsible for loss of virulence. It is worth noting that there was strong purifying selection at nonsynonymous sites in coding regions, but not at non-coding sites, along this branch. This suggests the possibility that the attenuated phenotype may arise due to changes in non-coding regions rather than amino acid sequence changes. However, in order to reconstruct the changes that occurred during attenuation of the SAD vaccine, it will be necessary to obtain an outgroup sequence closer to thec SAD ancestor than are any currently available sequences.
The present results support the hypothesis that the SAD vaccine lineage is characterized by a substantial relaxation of purifying selection and therefore a relaxation of functional constraint. In order to understand the factors involved in maintaining vaccine effectiveness, it is important to understand the biological effects of such changes in functional constraint (Hughes 2009). Because the SAD vaccine is administered to wildlife in baits, its effectiveness in the field can generally be determined only indirectly, through its impact on rabies prevalence (Lontai 1997). In this case, it is particularly important to monitor mutational changes occurring in the vaccine strain in order to ensure continued vaccine effectiveness.
This research was supported by grant GM43940 from the National Institutes of Health.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.