RNA viruses within a host exist as dynamic distributions of closely related mutants and recombinant genomes. These closely related mutants and recombinant genomes, which are subjected to a continuous process of genetic variation, competition, and selection, act as a unit of selection, termed viral quasispecies. Characterization of mutant spectra within hosts is essential for understanding viral evolution and pathogenesis resulting from the cooperative behavior of viral mutants within viral quasispecies. Furthermore, a detailed analysis of viral variability within hosts is needed to design control strategies, because viral quasispecies are reservoirs of viral variants that potentially can emerge with increased virulence or altered tropism. In this work, we report a detailed analysis of within-host viral populations in 13 field isolates of the bipartite Tomato chlorosis virus (ToCV) (genus Crinivirus, family Closteroviridae). The intraisolate genetic structure was analyzed based on sequencing data for 755 molecular clones distributed in four genomic regions within the RNA-dependent RNA polymerase (RNA1) and Hsp70h, CP, and CPm (RNA2) open reading frames. Our results showed that populations of ToCV within a host plant have a heterogeneous and complex genetic structure similar to that described for animal and plant RNA viral quasispecies. Moreover, the structures of these populations clearly differ depending on the RNA segment considered, being more complex for RNA1 (encoding replication-associated proteins) than for RNA2 (encoding encapsidation-, systemic-movement-, and insect transmission-relevant proteins). These results support the idea that, in multicomponent RNA viruses, function can generate profound differences in the genetic structures of the different genomic segments.
RNA viruses within a host exist as dynamic distributions of closely related mutants and recombinant genomes subjected to a continuous process of genetic variation, competition, and selection. These closely related mutants and recombinant genomes act as a unit of selection and are termed viral quasispecies (1, 12). Quasispecies complexity arises as a result of the genetic variation introduced by the low fidelity of RNA polymerases, and reverse transcriptases in the case of retroviruses, in combination with large progenies, small genome sizes, and short replicative cycles. Positive and negative selection and genetic drift and migration are the main evolutionary forces acting on the mutant spectrum. As a consequence, quasispecies evolve with a great capacity to adapt to changing environments and to survive even after strong population bottlenecks (15). Characterization of mutant spectra within hosts is essential for understanding viral evolution and pathogenesis resulting from the cooperative behavior of viral mutants within viral quasispecies (11). Furthermore, a detailed analysis of viral variability within hosts is needed to design appropriate control strategies because viral quasispecies are reservoirs of viral variants that potentially can emerge with increased virulence or altered tropism.
Most plant viruses have a genome constituted by RNA, and many of them cause very important diseases in agricultural crops (22). Analyses of quasispecies diversity in plant viruses are scarce. Studies have focused on experimental evolution, and little work has been done to analyze quasispecies in field isolates (reviewed in reference 44). Researchers have only recently begun to study the nature of plant RNA virus populations within individual host plants and the factors that affect the effective population sizes (the fraction of the population that passes its genes to the new generation), such as genetic bottlenecks and positive and negative selection (44).
Two main hypotheses have been proposed to explain the evolution of multicomponent RNA viruses (i.e., those with segmented genomes encapsidated in separate particles). Nee (36) suggested that evolution toward smaller RNA molecules could be favored by selection on RNAs within a host cell because smaller RNA molecules could be replicated and encapsidated more rapidly. In contrast, Chao (5) proposed that multicomponent reproduction evolved in RNA viruses as a form of sex. Sex can be advantageous to organisms that experience high mutation rates (like RNA viruses) because it can bring together genomes that have not been destroyed by deleterious mutations and thereby maintain or increase genetic population robustness (14).
The current paper concerns the within-host genetic structure of Tomato chlorosis virus (ToCV) (58), an economically important crinivirus within the family Closteroviridae, the most complex family of positive-strand RNA plant viruses. Criniviruses have two single-stranded RNA molecules (RNA1 and RNA2) of positive polarity, with the exception of the tripartite Potato yellow vein virus (28). Both RNA1 and RNA2 are needed for infectivity and are separately encapsidated in long and flexuous virions that are transmitted in nature by whiteflies (Hemiptera: Aleyrodidae) in a semipersistent manner.
ToCV RNA1 encompasses four open reading frames (ORFs) (31, 57) (Fig. 1). The first two ORFs encode proteins involved in the replication of viral RNA. ORF 1a encodes a protein containing protease, methyltransferase, and helicase domains. ORF 1b encodes a protein containing the conserved motifs identified in the RNA-dependent RNA polymerases (RdRp) of positive-strand RNA viruses. ORF 2 encodes a putative protein of unknown function, and the predicted small ORF 3 encodes a putative protein of 6 kDa with a transmembrane domain similar to those of other 3′-end proteins of criniviruses. RNA2 includes nine ORFs that encompass the hallmark gene array of the family Closteroviridae; these ORFs are involved in encapsidation, movement, and host transmission (9) (Fig. 1). They encode a heat shock protein 70 family homologue (Hsp70h); a 59- to 60-kDa protein, the coat protein (CP); and a diverged CP (CPm) (30, 57).
Intraisolate genetic diversity has been studied for a few members of the Closteroviridae family, namely the closterovirus Citrus tristeza virus (CTV) (2, 3, 25, 47, 56), the ampeloviruses Grapevine leafroll-associated virus 1 (GLRaV-1) and GLRaV-3 (24, 51), and the criniviruses Cucurbit yellow stunt disease virus (CYSDV) and Lettuce infectious yellows virus (LIYV) (45, 46).
This paper reports a detailed analysis of within-host viral populations in field isolates of ToCV from southeastern Spain. The intraisolate genetic structures of 13 infected tomato plants were analyzed based on sequencing data for 755 molecular clones distributed in four genomic regions within the RdRp, Hsp70h, CP, and CPm ORFs. Our results demonstrate that within-host populations of ToCV have a heterogeneous and complex genetic structure similar to that described for animal and plant RNA viral quasispecies. Moreover, the structures of these populations clearly differ depending on the RNA segment considered. Our results suggest that these differences are related to the specific function of each genomic segment.
ToCV isolates were obtained from tomato plants sampled from commercial field-grown or greenhouse-grown crops in the provinces of Málaga, Almería, and Murcia in southeastern Spain from 1997 to 2004 (see Table 2 for a summary). The plants were tested for ToCV infection by petiole tissue printing hybridization with a digoxigenin-labeled probe recognizing the CP gene. Leaf samples from the ToCV-infected plants were stored at −80°C until RNA extraction. RNA was extracted with TRIzol reagent (Sigma) following the manufacturer's instructions plus an extra step of centrifugation at 12,000 × g at 4°C for 10 min after the addition of the reagent to remove insoluble glycopolysaccharides. RNA obtained from 0.1 g leaf tissue was resuspended in 50 μl of diethyl pyrocarbonate-treated water.
Conserved sequences in the ToCV genome were identified after multiple-sequence alignment, with the ClustalX program (50), of the criniviruses (GenBank accession numbers are in parentheses) ToCV (Florida isolate, AY903447 and AY903448; AT80/99 Spain isolate, DQ983480 and DQ136146), LIYV (U15440, U15441), Strawberry pallidosis-associated virus (AY488137, AY488138), Beet pseudo yellows virus (AY330918, AY330919), CYSDV (AY242077, AY242078), Sweet potato chlorotic stunt virus (AJ428554, AJ428555), and Potato yellow vein virus (AJ557128, AJ557129, AJ508757) and the closterovirus Beet yellows virus (X73476). Conserved amino acid domains of the ToCV proteins RdRp, Hsp70h, CP, and CPm were identified using the ANAGram protein function assignment program (http://jaguar.genetica.uma.es/anagram.htm) (41). The sequences of oligonucleotide primers designed within conserved genomic regions of RNA1 were as follows (nucleotide numbers are according to the AT80/99 Spain ToCV isolate [30, 31]): RdRp (from 6664 to 7426; 763 nt), primers MA396 (forward; 5′-TGGTCGAACAGTTTGAGAGC-3′) and MA397 (reverse; 5′-TGAACTCGAATTGGGACAGA-3′). Primer sequences targeting conserved genomic regions of RNA2 were as follows: Hsp70h (from 1171 to 1827; 657 nt), primers MA394 (forward; 5′-CCGGCTGATTACAAGTCTGG-3′) and MA395 (reverse; 5′-CTCTTGTGCATGGAGCATTG-3′); CP (from 4456 to 5072; 617 nt), primers MA461 (forward; 5′-ACATCTCTCATTCCGGCTAATC-3′) and MA462 (reverse; 5′-TACAGTTCCTTGCCCTCGTTAC-3′); and CPm (from 6503 to 7119; 617 nt), primers MA392 (forward; 5′-TAAGGTCCAAACCGAAGTGG-3′) and MA393 (reverse; 5′-AAAGCTGACTCGTGCTCACA-3′).
RNAs were amplified by two-step reverse transcription (RT)-PCR as follows. One microliter of RNA was thawed at 70°C for 3 min, and after it was cooled on ice, 8 pmol of reverse primer was added, and the mixture was incubated at 65°C for 5 min and cooled on ice. A mixture containing 2.5 U of avian myeloblastosis virus reverse transcriptase (Promega) and 10 U of recombinant RNasin RNase inhibitor (Promega) was added to the primer mixture in a final reaction volume of 20 μl. The RT reaction was done at 37°C for 45 min, followed by incubation at 94°C for 5 min. Five microliters of the first-strand reaction mixture was used in a touchdown PCR in a volume of 50 μl with 1.25 U of Pfu DNA polymerase and 32 pmol of reverse and forward primers. Touchdown PCR proceeded with an initial denaturation at 95°C for 2 min, followed by 20 cycles at 95°C for 1 min, 60°C for 30 s (with a decrease of 0.5°C per cycle), and 72°C for 2 min and finally 10 additional cycles at an annealing temperature of 45°C. To ensure that the small quantity of template did not produce a bottleneck, we used cDNA only from RNA samples that had yielded positive products in a 1/10 dilution. Ten microliters of PCR product was treated with 10 U of exonuclease I and 2 U of shrimp alkaline phosphatase (both from Fermentas) and was directly sequenced to obtain the consensus sequence. The PCR products were cloned into the pCR-BluntII-TOPO vector (Invitrogen) and transferred into Escherichia coli Top10 electrocompetent cells (Invitrogen). Plasmids from 9 to 20 positive colonies for each ORF and isolate were amplified with ϕ29 DNA polymerase (TempliPhi amplification kit; Amersham) by following the manufacturer's protocol and were then sequenced. The basal mutation frequency (or experimental error), which was determined after sequencing 22 molecular clones of a T7 runoff transcript of the CP region, was 1.47 × 10−4 (two mutations found in 13,574 nt sequenced). The observed error was comparable to that reported for T7 or SP6 RNA polymerase (0.5 × 10−4 or 1.34 × 10−4 misincorporations per copied nucleotide, respectively [21, 43]). These numbers of misincorporations would generate 0.67 or 1.82 mutations, respectively.
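The comparison between the basal mutation frequency and the published polymerase error rates is simple arithmetic. The following minimal sketch reproduces it from the figures quoted above; the function and variable names are illustrative and are not part of the original analysis.

```python
def mutation_frequency(mutations: int, nucleotides: int) -> float:
    """Observed mutations divided by the number of nucleotides sequenced."""
    return mutations / nucleotides


def expected_mutations(error_rate: float, nucleotides: int) -> float:
    """Mutations expected from a per-nucleotide misincorporation rate."""
    return error_rate * nucleotides


# Figures taken from the text: 2 mutations in 13,574 nt of control clones.
NT_SEQUENCED = 13_574
basal = mutation_frequency(2, NT_SEQUENCED)

# Published misincorporation rates for T7 and SP6 RNA polymerases.
expected_t7 = expected_mutations(0.5e-4, NT_SEQUENCED)
expected_sp6 = expected_mutations(1.34e-4, NT_SEQUENCED)

print(f"basal frequency: {basal:.2e}")
print(f"expected mutations (T7): {expected_t7:.2f}")
print(f"expected mutations (SP6): {expected_sp6:.2f}")
```

With these inputs the expected counts come out at roughly 0.68 and 1.82 mutations, in line with the values given above and close to the two mutations actually observed.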
Multiple-sequence alignments for each field isolate and genomic region were performed using ClustalX with the default parameters. A standard nonparametric one-way run test, available in SPSS 14.0 statistical software (SPSS Inc.), was performed in order to analyze the distribution of mutations as a function of genome position. The acceptability of amino acid changes identified in each genomic region was evaluated following the structure-genetic (SG) matrix of Feng et al. (17). In this matrix, structural similarities, as well as probabilities of amino acid changes, are assigned values between 0 and 6; drastic amino acid changes are given a value of 0, whereas synonymous replacements are given a value of 6.
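As an illustration of the runs test mentioned above, the sketch below implements the Wald-Wolfowitz runs test with a normal approximation. It assumes that each aligned position has been coded as mutated (True) or unmutated (False); this dichotomization, the function name, and the example data are assumptions made for illustration and may differ from the exact procedure applied in SPSS.

```python
import math


def runs_test(flags):
    """Wald-Wolfowitz runs test on a two-state sequence.

    Returns (number of runs, z statistic, two-sided P value) using the
    normal approximation.
    """
    flags = [bool(f) for f in flags]
    n1 = sum(flags)              # positions in the first state
    n2 = len(flags) - n1         # positions in the second state
    if n1 == 0 or n2 == 0:
        raise ValueError("both states must be present")
    n = n1 + n2
    # A new run starts wherever two neighbouring positions differ.
    runs = 1 + sum(1 for a, b in zip(flags, flags[1:]) if a != b)
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    if var == 0:
        raise ValueError("sequence too short for the normal approximation")
    z = (runs - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return runs, z, p_value


# Hypothetical coding of eight positions (1 = mutated, 0 = unmutated).
example = [1, 1, 0, 0, 1, 1, 0, 0]
n_runs, z_stat, p = runs_test(example)
print(n_runs, round(z_stat, 3), round(p, 3))
```

A small P value would indicate that mutated positions are clustered (or alternate) more than expected by chance; the normal approximation is only rough for sequences as short as this example.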
Mutation frequencies were calculated by scoring different mutations (repeated mutations were counted only once) relative to the consensus sequence divided by the total number of nucleotides sequenced (12). Shannon entropy values were calculated using the following formula: $-\sum_i (p_i \ln p_i)/\ln N$, in which $p_i$ is the frequency of each sequence in the mutant spectrum and $N$ is the total number of sequences compared (54). The calculated Shannon entropy values range from 0 (all sequences are identical) to 1 (all sequences are different).
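The two measures defined above can be sketched as follows. This is a minimal illustration, assuming aligned clones of equal length and base substitutions only (indels are ignored); the function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def mutation_frequency(clones, consensus):
    """Distinct mutations relative to the consensus / total nt sequenced.

    A mutation seen in several clones is counted only once.
    """
    distinct = set()
    total_nt = 0
    for seq in clones:
        total_nt += len(seq)
        for pos, (base, ref) in enumerate(zip(seq, consensus)):
            if base != ref:
                distinct.add((pos, base))
    return len(distinct) / total_nt


def normalized_shannon_entropy(clones):
    """-sum(p_i * ln p_i) / ln N over the distinct sequences in a spectrum."""
    n = len(clones)
    if n < 2:
        return 0.0
    counts = Counter(clones)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(n)


# Toy mutant spectrum: three master copies and one single-site variant.
consensus = "ACGUACGU"
clones = ["ACGUACGU", "ACGUACGU", "ACGUACGU", "ACGUACGA"]
freq = mutation_frequency(clones, consensus)
entropy = normalized_shannon_entropy(clones)
print(freq, round(entropy, 3))
```

Dividing by ln N is what bounds the entropy between 0 and 1, so that spectra sampled with different numbers of clones can be compared.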
Alignments were used to estimate pairwise genetic distances by Kimura's two-parameter method (23) implemented with MEGA version 3.1 (26) (available at http://www.megasoftware.net). Standard deviations were calculated by the bootstrap method with 1,000 repeats (37). Pairwise synonymous substitutions per synonymous site (dS) and nonsynonymous substitutions per nonsynonymous site (dN) were calculated according to the Pamilo-Bianchi-Li method based on Kimura's two-parameter model (27, 37, 39). Phylogenetic relationships were inferred by the neighbor-joining method available in MEGA3. The robustness of evolutionary relationships was assessed by 1,000 bootstrap replicates (37). The pairwise nucleotide identity profile of the putative recombinant molecule found in the RdRp region of sample AT80/99 was determined with SimPlot (29).
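For reference, Kimura's two-parameter distance between two aligned sequences is d = −½ ln[(1 − 2P − Q)√(1 − 2Q)], where P and Q are the proportions of sites that differ by a transition and by a transversion, respectively. The sketch below is a stand-alone illustration of that formula, not the MEGA implementation; skipping sites with gaps or ambiguous bases is an assumption made here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Treat RNA and DNA alphabets alike.
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")
    transitions = transversions = compared = 0
    for a, b in zip(s1, s2):
        if a not in BASES or b not in BASES:
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1    # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    arg = (1 - 2 * p - q) * math.sqrt(1 - 2 * q)
    if arg <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(arg)


# Two 10-nt sequences differing by a single U->C transition.
d_example = k2p_distance("ACGUACGUAC", "ACGCACGUAC")
print(round(d_example, 4))
```

The average distance within an isolate would then be the mean of this value over all pairs of clones from that isolate.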
A hierarchical analysis of molecular variance (AMOVA) was used to evaluate the distribution of the genetic diversity, based on the type and frequency of every sequence in each isolate and on the likelihood that the distribution was random; the AMOVA was performed using Arlequin 3.01 (16). Total variance was partitioned into variance components due to within-isolate and between-isolate variance. Genetic differentiation between ToCV subpopulations was estimated with the F statistic (55), and its significance was tested by nonparametric permutation analysis based on 10,000 repetitions (16).
To determine the genetic composition of ToCV isolates, RNA was extracted from 13 tomato plants infected with ToCV, and four genomic regions in the RdRp (RNA1), Hsp70h, CP, and CPm (RNA2) ORFs were amplified. Between 9 and 20 clones for each genomic region and isolate were sequenced. Mutations (base substitutions, indels, and nonsense mutations) were computed only once per isolate after comparison of each sequence with its corresponding consensus sequence. The genetic variabilities of the four genomic regions together accounted for a total of 165 mutations (identified in 755 clones corresponding to 463,272 nt sequenced), represented in Fig. 1 (see the supplemental material for a list of mutations and amino acid replacements). Base substitutions in all 13 isolates were distributed throughout the genome of ToCV, and insertions were found only in the RdRp and Hsp70h regions, whereas deletions were found in the four genomic regions. The RdRp region accumulated the highest number of mutations per nucleotide position (representing 42% of the total), whereas Hsp70h, CP, and CPm harbored 25%, 16%, and 18% of the total computed mutations, respectively. An overall χ² test indicated significant differences in the frequencies of mutation for the four regions (χ² = 14.64; df = 3; P = 0.0022). Pairwise χ² tests corrected using the sequential Bonferroni method also revealed that the number of mutations per nucleotide position was significantly higher in the RdRp region than in the other regions.
Mutations shared by several isolates were found in the RdRp region and, to a lesser extent, in the Hsp70h and the CP regions, but they were not detected in the CPm region. The distribution of mutations after a standard nonparametric one-way run test was shown to be random in the genomic regions RdRp, Hsp70h, and CPm (P = 0.81, 0.13, and 0.90, respectively). In contrast, mutations in the CP region were not randomly distributed (P < 0.01), suggesting the existence of mutational hot spots. Altogether, these results indicate that the accumulation of mutations was not evenly distributed among the four genomic regions of ToCV that were analyzed.
The types of mutations predominant in field isolates of ToCV were studied by pooling the changes belonging to each of the four genomic regions. Percentages with respect to the number of mutations within each genomic region and with respect to the total number of mutations found in all genomic regions jointly were calculated (Table 1). Considering each genomic region independently, the most abundant mutations were base substitutions (82.9 to 94.2%); insertions were found only in the RdRp (1.4%) and Hsp70h (2.4%) regions, whereas deletions were present in the four regions and especially in the Hsp70h (14.6%) and CP (11.5%) regions. In general, transitions were more common than transversions, except in the CP region, where transversions amounted to 52%. U→C transitions were the most frequent base substitutions in the RdRp, Hsp70h, and CPm regions, followed by G→A. G→U transversions (27%) were the most common changes within the CP region. Moreover, C→G transversions were the rarest nucleotide substitutions and were found only in the CP region. The abundance of transversions over transitions in the CP region suggests that negative selection suppressed the number of transitional changes, because polymerases in general have misincorporation tendencies that favor transitions. When mutations from all genomic regions were pooled, transitions accounted for 66.4% whereas transversions accounted for 33.6%. Insertions and deletions represented 1.2% and 8.5% of the total number of mutations, respectively, and most were situated in U- or A-rich regions (data not shown). Nonsense mutations were found only in the RdRp (4.3%) and CP (11.5%) regions.
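The distinction between transitions and transversions used above can be made explicit with a short sketch. A transition exchanges two purines (A, G) or two pyrimidines (C, U), whereas a transversion exchanges a purine for a pyrimidine or vice versa. The function names and the example list of changes are hypothetical and do not reproduce the data in Table 1.

```python
from collections import Counter

PURINES = frozenset("AG")
BASES = frozenset("ACGU")


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a base substitution."""
    ref = ref.upper().replace("T", "U")
    alt = alt.upper().replace("T", "U")
    if ref not in BASES or alt not in BASES:
        raise ValueError("bases must be A, C, G, or U/T")
    if ref == alt:
        raise ValueError("not a substitution")
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def substitution_spectrum(changes):
    """Percentage of transitions and transversions in (ref, alt) pairs."""
    counts = Counter(classify_substitution(r, a) for r, a in changes)
    total = sum(counts.values())
    return {kind: 100 * n / total for kind, n in counts.items()}


# Hypothetical set of observed substitutions.
observed = [("U", "C"), ("U", "C"), ("G", "A"), ("G", "U")]
spectrum = substitution_spectrum(observed)
print(spectrum)
```

Because there are twice as many possible transversions as transitions, an excess of transitions in a pooled spectrum reflects the misincorporation bias of polymerases noted above rather than chance.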
To assess whether selective constraints could be acting on the viral genomic RNA, synonymous and nonsynonymous mutations were computed for the four genomic regions. As indicated in Table 1, nonsynonymous mutations were more abundant in the regions belonging to RNA2 (from 65.2 to 74.3%), suggesting that selective constraints act at the RNA level. In the RdRp region located in RNA1, however, the number of synonymous changes was higher, indicating that in this case selection could be operating at the protein level instead.
The relative frequencies and acceptabilities of amino acid changes according to the SG matrix of Feng et al. (17) for all four genomic regions are illustrated in Fig. 2. While the Hsp70h and CPm regions exhibited similar distributions with relative frequencies inversely proportional to acceptability values, the CP region allowed more severe amino acid changes (values 1 and 2), indicating a higher tolerance in this region. Conversely, most of the amino acid changes in the RdRp region (60%) were of value 6 (i.e., synonymous), meaning that, despite being the most variable coding region, RdRp does not tolerate drastic amino acid changes.
Considering the intraisolate genetic variability observed in the ToCV field samples under study, estimations of mutation frequency, Shannon entropy, and average genetic distance were used to characterize each mutant spectrum. The mutation frequency values within each genomic region for the 13 isolates of ToCV were calculated. As shown in Table 2, the variation of mutation frequency values depended on the particular genomic region, being highest for the RdRp region and ranging from 0.9 × 10−4 substitutions per nucleotide for isolate AT45/02 to 22.7 × 10−4 substitutions per nucleotide for isolate AT80/99. For the three regions of RNA2, mutation frequency values (the number of substitutions per nucleotide) ranged from <1.08 × 10−4 to 6.07 × 10−4 for the Hsp70h region, from <1.16 × 10−4 to 5.82 × 10−4 for the CP region, and from <1.08 × 10−4 to 6.12 × 10−4 for the CPm region. These results are in agreement with mutation frequency values obtained for several animal and plant RNA viruses (12, 42, 44).
The proportion of different genomes in a mutant distribution was measured by calculating the Shannon entropy. Very different degrees of heterogeneity and complexity of the ensemble of ToCV RNA molecules in each genomic region for each isolate were found, as shown in Table 2. Shannon entropy values ranged from 0.089 to 0.706 in the RdRp region, from 0 to 0.477 in the Hsp70h region, from 0 to 0.429 in the CP region, and from 0 to 0.347 in the CPm region. These results indicate that the heterogeneity and complexity of ToCV varied greatly, not only between isolates, but also among the four genomic regions analyzed within the same isolate. They also show that, in general, the RdRp region presents the most variable and heterogeneous mutant spectra.
The genetic complexity of ToCV field isolates was estimated by calculating pairwise genetic distances within isolates using Kimura's two-parameter method. Average genetic distances (d) for each genomic region in each field isolate are shown in Table 3. The maximum value of d estimated for the RdRp region was 2.89-fold higher than that for the Hsp70h and CP regions and 3.89-fold higher than the maximum value of d for the CPm region. Furthermore, the measure of dispersion of d between mutant spectra around the mean (i.e., the coefficient of variation [CV]) among the four regions was highest for RdRp (CV = 1.106, 0.697, 0.986, and 0.775 for RdRp, Hsp70h, CP, and CPm, respectively). The CV value of >1 indicates that d for RdRp has a “high-variance” distribution of data points. However, d values obtained for each genomic region after pooling all sequences from all mutant spectra (referred to as total in Table 3) were very similar, with the largest difference being 4.88-fold higher for RdRp in relation to the Hsp70h region. These results indicate, first, that RdRp is by far the most diverse genomic region of ToCV intrahost populations and, second, that the great differences in d values between isolates within each genomic region might be underestimated if total d is employed.
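The coefficient of variation used above is the standard deviation of the per-isolate d values divided by their mean. The sketch below shows the calculation under the assumption that the sample (n − 1) standard deviation is used, which the text does not specify; the input values are hypothetical and are not taken from Table 3.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean


# Hypothetical per-isolate average distances for one genomic region.
d_values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
cv = coefficient_of_variation(d_values)
print(round(cv, 3))
```

A CV above 1 means that the spread of d between isolates exceeds its mean, which is the sense in which the RdRp distribution is described as high variance.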
Thus, ToCV intraisolate variability in tomato field samples is characterized by the high heterogeneity and genetic complexity of RNA virus populations. High mutation frequencies, Shannon entropy values, and average genetic distances estimated from ToCV mutant spectra in four genomic regions indicate that ToCV exists as viral quasispecies in field tomato isolates.
To address whether positive or negative selection is shaping ToCV populations, pairwise synonymous substitutions per synonymous site (dS) and nonsynonymous substitutions per nonsynonymous site (dNS) were estimated using the Pamilo-Bianchi-Li method, and the dNS/dS ratio was computed when its calculation was possible. As shown in Table 4, the dNS/dS ratio for each quasispecies and genomic region, estimated individually, ranged from 0.037 to 2.241, suggesting different constraints on sequence change depending on both the field isolate and the genomic region analyzed. Thus, the RdRp and CPm regions contained mutant spectra subjected to negative selection (dNS/dS < 1) and to positive selection (dNS/dS > 1). In both the Hsp70h and CP regions, however, purifying or negative selection seemed to be acting on every quasispecies. Average dNS/dS ratios for the genomic regions were between 0.265 and 0.768, suggesting that ToCV quasispecies in all four genomic regions would be under negative selection. These data contrasted greatly with the dNS/dS ratios obtained after pairwise analysis of pooled sequences from each genomic region (indicated as the total in Table 4). In this case, the RdRp (dNS/dS = 0.110) and CP (dNS/dS = 0.115) regions would have similar stringent constraints on amino acid change, indicating the effect of purifying selection. On the other hand, Hsp70h (dNS/dS = 1.364) would be subjected to strong positive selection and CPm (dNS/dS = 1.071) to nearly neutral selection.
The phylogenetic relationships of sequences belonging to quasispecies of genomic regions RdRp, Hsp70h, CP, and CPm were deduced from multiple-sequence alignments by the neighbor-joining method. Figure 3 illustrates unrooted trees representing phylogeny within the four genomic regions. The composition, frequency, and phylogenetic relationship of the sequences found in each quasispecies determined the different shapes of the trees. The inferred tree for the RdRp region showed the maximum bifurcation, whereas the Hsp70h tree exhibited the lowest branching of all four regions. Phylogenetic trees of the CP and CPm regions displayed similar divergence levels, as determined by the analogous distributions of their mutant spectra. Interestingly, in the RdRp region, phylogenetic distances were higher because of sequences that differed by 10 base substitutions, indicated as types I and II. Even though most quasispecies in this region showed consensus sequences of type I, with closely related but nonidentical master sequences, a type II consensus sequence was found in sample AT230/00. However, a type II variant, immersed in an ensemble of type I RNA molecules, was detected in sample AT198/00. This sequence lacked two mutations present in the AT230/00 consensus and was positioned at an intermediate distance between type I and type II RNAs. Furthermore, a putative recombinant RNA molecule was found in sample AT80/99 (Fig. 3). The 5′ moiety of the amplified RdRp fragment was identical to consensus sequence AT230/00 (type II), and the 3′ moiety was identical to consensus sequence AT80/99 (type I). Although SimPlot analysis strongly suggests a recombinant origin for this molecule (not shown), it remains possible that convergence of sequences in this genomic region could be responsible for the observed difference between the moieties.
An AMOVA test was used to determine whether genetic variation represented in the phylogenetic trees of each genomic region was a reflection of variation within isolates or between isolates. The results indicated that the observed variability in the genomic regions RdRp (81.8%), CP (71.8%), and CPm (82.5%) was due mainly to differences between populations (Table 5). Quasispecies from the RdRp region displayed different master sequences surrounded by their own mutant spectrum. In the CP and CPm regions, most of the mutant spectra shared the same master sequence, but the extent of variation within each spectrum was very narrow, accounting for the higher diversity between isolates than within isolates. In the Hsp70h region, most variation (61.6%) came from within populations because quasispecies in this region shared the master sequence but displayed distinctive mutant spectra.
Knowledge about the structure and genetic content of natural plant virus populations is limited. Despite the growing number of studies of virus diversity, few of them address the genetic characterization of populations that plant viruses establish within their hosts. However, there is increasing evidence that intrahost mutant spectra in both plant DNA and RNA viruses exist as virus quasispecies and that such quasispecies play a key role in virus adaptability and pathogenesis (11, 20). Virus quasispecies are subjected to drastic and frequent changes of environment and repeated bottlenecks that result in fitness loss due to the accumulation of deleterious mutations (6, 13), although a virus population may lose or gain fitness depending on the initial fitness of the population and the size of the bottleneck (38). Understanding virus quasispecies structure and behavior is fundamental for the design of effective control strategies because viral quasispecies are reservoirs of emerging viral variants (10, 11, 20, 34).
In this work, mutant spectra of 13 field isolates of ToCV sampled from 1997 to 2004 have been characterized by nucleotide sequencing of 755 molecular clones. Four genomic regions, located in two separately encapsidated genomic segments, have been sequenced, amounting to a total of 463,272 nt. Sequence determination of molecular clones has been documented to be a valid experimental approach for characterizing quasispecies from natural infections of bacterial, plant, and animal RNA viruses (1, 44). Our results have shown uneven distributions of mutations in the four genomic regions analyzed, strongly suggesting that representative regions of the viral genome should be considered to obtain rigorous estimates of mutation frequencies and genetic distances between sequences within mutant clouds. The need for analysis of several genomic regions has also been emphasized in previous studies of the variability of animal and plant RNA viruses (1, 18, 33). Some genomic regions of plant viruses can exhibit extremely low variation (48) and give an erroneous idea of virus genetic variability.
Genetic variation in a quasispecies can be overestimated because of experimental error introduced during the process of copying and amplification of viral RNA. This error may arise from the limited amounts of initial template and the use of low-fidelity polymerases (12). In the present study, such bias in the determination of intrahost virus population nucleotide heterogeneity was prevented by observing the following precautions for RT-PCR amplification. First, RNAs were transcribed and cloned only from RNA that yielded amplification products after a 1:10 dilution (i.e., the quantity of template was not limiting). Second, a Pfu high-fidelity DNA polymerase was used during PCR amplification. Third, the reaction conditions were those reported to favor detection of most of the viral variants (19).
Our results showed that ToCV populations in a natural host consisted of heterogeneous and complex mutant distributions. These distributions contain a most abundant master sequence surrounded by a mutant spectrum of closely related variants. Within the family Closteroviridae, of which ToCV is a member, there have been few studies of intraisolate genetic diversity in field isolates. In perennial plants, like citrus and grapevine, the variability of the closterovirus CTV (25, 47) and the ampeloviruses GLRaV-1 and GLRaV-3 (24, 51) has been analyzed by single-strand conformation polymorphism. Estimates of mutation frequencies, however, have not been previously presented for criniviruses or for members of the other two genera in the family Closteroviridae, Closterovirus and Ampelovirus. Our results give a clear picture of the extent of genomic heterogeneity and complexity that the crinivirus ToCV exhibits in its primary natural host, tomato. For the most variable region, RdRp, if only mutations from the predominant RNA1 sequences (i.e., type I variants) in isolates AT80/99 and AT198/00 are considered, mutation frequencies are reduced to 15.8 × 10−4 and 4.8 × 10−4, respectively. Even so, mutation frequencies for the RdRp region are higher than those for the Hsp70h, CP, and CPm regions located in RNA2. For the CP region of the crinivirus CYSDV, genetic diversity (d) within three field isolates has been documented to be from 1.9 × 10−4 to 7.3 × 10−4 (46). In the case of Closteroviridae-infected perennial plants, these figures may be up to 1,000-fold higher. For instance, intraisolate diversity of CTV in citrus reached 0.142, with diversity in this case estimated as the genetic distance between two haplotypes (defined by the single-strand conformation polymorphism pattern) (25). For the Hsp70h region of GLRaV-1 in grapevine, diversity values obtained from sequencing data were from 2.0 × 10−3 to 3.6 × 10−2, although these figures were obtained from comparison of six or fewer clones (24). The lower level of genetic variation found in ToCV than in CTV and GLRaV-1 might be caused in part by differences in host plants and/or modes of transmission, as suggested for CYSDV (46). CTV and GLRaV-1 have perennial hosts that can be infected for decades, whereas tomato, the primary host for ToCV, is an annual, and infections generally do not last for more than 180 days before the crop ends. ToCV is transmitted only by its whitefly vectors, whereas CTV and GLRaV-1 are also transmitted by vegetative propagation, which could relax selective constraints related to insect transmission.
In the RdRp region, coexistence of phylogenetically distant sequences was detected. In this region, two types of RNA1, namely, types I and II, have been found (G. Lozano, J. Navas-Castillo, unpublished results). While mutant variants in most ToCV isolates analyzed in this study were closely related to type I, all sequences from isolate AT230/00 were type II related. Isolate AT198/00, however, was found to have one type II variant coexisting in a virus spectrum dominated by type I variants. A similar situation has been described for CTV isolates, in which mutant spectra harbor two divergent master sequences (47). A mutant virus with the potential to become dominant may remain as a minority in the population because of the suppressive effect of the mutant spectrum and can be retained by complementation (8, 12). In an evolving quasispecies, bottlenecks can isolate genomes that are able to reinitiate an infection (4). In nature, intrahost ToCV populations are subjected to repeated whitefly inoculations, and it is likely that the low-frequency type II variant was introduced into a tomato plant previously infected by the type I sequences. However, maintenance of this type II variant in that mutant spectrum (and the same applies to deleterious nonsense mutations found in RdRp and CP mutant spectra) can be explained by lack of negative or purifying selection and the modulating effect of the mutant spectrum surrounding it, supporting the view that the quasispecies as a whole, and not individual genomes, is the target of selection (40, 52). Similarly, the cucumovirus Tomato aspermy virus maintains in its mutant spectrum a cell-to-cell movement-defective mutant by complementation, providing another example of the ability of plant virus quasispecies to keep variants with low fitness (32).
Recombination in RNA viruses is regarded as a mechanism that compensates for the accumulation of deleterious mutations and that helps to maintain genomic diversity (35). Thus, the large genomes of closteroviruses in particular are at high risk, and recombination might allow regeneration of functional genomes from deleterious ones. Nonhomologous recombination, indicated by the generation of defective RNAs, as well as homologous recombination, has been reported in natural isolates of CTV (3, 53, 56). In the last study (56), which was carried out in a single sweet orange plant through a combination of genome-wide resequencing analysis and deep sequencing of selected genomic regions, an extraordinarily large proportion (17.6%) of the sequenced molecules were recombinants derived from the three predominant genotypes. Although the search for recombinant molecules was not the aim of our work, the current study suggests that recombination is not frequent in ToCV, in contrast to closteroviruses, since we detected only one putative recombinant molecule. The low abundance of recombinant molecules, however, might indicate strong negative selection against those variants. In the particular case of the AT80/99 RdRp recombinant molecule, most mutations located in its type II sequence were synonymous, suggesting that this particular variant might have escaped from purifying selection.
Our results for all 13 isolates showed that quasispecies of regions belonging to RNA2 were the more restricted in genetic variation. Hsp70h and CPm regions displayed identical or very closely related master sequences and an abundance of nonsynonymous over synonymous changes, which might indicate functional constraints. In addition, structural constraints acting on the viral RNA might greatly limit sequence changes in these regions. Other restraints could include RNA folding into secondary structures, control of replication and translation, and protection against degradation (49).
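The distinction between synonymous, nonsynonymous, and nonsense changes used throughout this discussion can be illustrated with a short sketch. It assumes the standard genetic code and a comparison of single codons between a master sequence and a variant; the function name and the example codons are hypothetical and are not taken from the ToCV data.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(master_codon: str, variant_codon: str) -> str:
    """Classify a codon change relative to the master sequence.

    Returns "identical", "synonymous", "nonsense", or "nonsynonymous".
    RNA codons (with U) are accepted and converted to DNA notation.
    """
    master = master_codon.upper().replace("U", "T")
    variant = variant_codon.upper().replace("U", "T")
    if master == variant:
        return "identical"
    aa_master = CODON_TABLE[master]
    aa_variant = CODON_TABLE[variant]
    if aa_master == aa_variant:
        return "synonymous"
    if aa_variant == "*":
        return "nonsense"  # change introduces a premature stop codon
    return "nonsynonymous"


# Hypothetical examples, not observed ToCV mutations.
for master, variant in [("GGT", "GGC"), ("GAT", "GAA"), ("TGG", "TGA")]:
    print(master, "->", variant, classify_substitution(master, variant))
```

Counting the labels over all mutated codons in a mutant spectrum gives the synonymous and nonsynonymous tallies that such comparisons rest on; a formal analysis would also normalize by the number of synonymous and nonsynonymous sites.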
From our results it is clear that, of all regions analyzed, the CP region was the most restricted with respect to sequence variation. Several observations support this. First, all isolates presented closely related, if not identical, master sequences, and their surrounding mutant spectra were extremely invariant. Second, this region was the only one where mutations were not distributed randomly according to the results of the run test. Third, transversions outnumbered transitional nucleotide substitutions in the CP region, suggesting purifying selection. Fourth, synonymous mutations were more abundant than nonsynonymous nucleotide substitutions in the CP region. Nonetheless, amino acid changes in this protein may be subject to greater purifying selection, both to maintain the structure and stability of viral capsids and because of the interaction with the vector (7). On the other hand, quasispecies of the RdRp region were the most variable, although most of the substitutions were synonymous. Although this suggests that selection might not be acting at the nucleotide level, we cannot rule out the possibility that specific amino acid changes may be driving evolution. In this region (as well as in the CP region), nonsense mutants coding for truncated proteins were detected, suggesting that they may be maintained by complementation with viable mutants. For both RNA1 and RNA2, no geographical differences can be inferred from phylogenetic analyses.
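The run test mentioned above asks whether mutated and unmutated positions alternate along a sequence as often as expected by chance. The following sketch shows one common form, the Wald-Wolfowitz runs test with a normal approximation. How positions were coded in this study is not described in this section, so the binary coding (1 for a mutated position, 0 otherwise), the function name, and the example sequence are assumptions made for illustration only.

```python
import math


def runs_test(binary_sequence):
    """Wald-Wolfowitz runs test (normal approximation).

    Returns (number_of_runs, z_score, two_sided_p_value).
    A strongly negative z means fewer runs than expected (clustering).
    """
    values = [1 if x else 0 for x in binary_sequence]
    n1 = sum(values)            # positions coded 1 (mutated)
    n2 = len(values) - n1       # positions coded 0 (unmutated)
    if n1 == 0 or n2 == 0:
        raise ValueError("both states must occur at least once")

    # A new run starts wherever two neighboring values differ.
    runs = 1 + sum(1 for a, b in zip(values, values[1:]) if a != b)

    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    if variance <= 0:
        raise ValueError("sequence too short for the normal approximation")

    z = (runs - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return runs, z, p_value


# Hypothetical example: ten mutated positions clustered at one end.
clustered = [1] * 10 + [0] * 10
runs, z, p_value = runs_test(clustered)
print(f"runs={runs}, z={z:.2f}, p={p_value:.2g}")
```

The normal approximation is reasonable only when both states occur reasonably often; for short regions or very few mutations an exact test would be the safer choice.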
One possible explanation for the relatively high mutation frequencies, high Shannon entropy, and high diversity values in the RdRp region is that this region is located in RNA1, which is specialized for replication, while the other regions studied are in RNA2. ToCV is a two-component virus, which means that during infection genomic segments 1 and 2 are separately encapsidated. Virions containing RNA1 inside the cell are able to decapsidate and replicate autonomously even in the absence of virions containing RNA2. However, for RNA1 to be encapsidated and to spread within the plant, coinfection with RNA2 must occur. On the other hand, RNA2 must coincide with RNA1 to be replicated and encapsidated. More rounds of RNA1 replication before meeting RNA2 would mean more chances to accumulate mutations, resulting in imbalances of mutations between the segments. Asynchronous accumulation of RNA1 and RNA2 of the crinivirus LIYV upon coinfection of protoplasts has been reported (59). LIYV RNA1 progeny, both genomic and subgenomic RNAs, were detected in protoplasts as early as 12 h postinoculation (p.i.) and accumulated to high levels by 24 h p.i. In contrast, RNA2 progeny were not readily detected until ca. 36 h p.i.
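Shannon entropy, one of the complexity measures compared above, can be sketched in a few lines. The version below assumes the normalized form often used for quasispecies, Sn = -Σ(p_i ln p_i) / ln N, where p_i is the frequency of each distinct sequence and N is the number of clones analyzed. Whether this exact normalization was used here is not stated in this section, and the function name and toy clone sets are hypothetical.

```python
import math
from collections import Counter


def normalized_shannon_entropy(clones):
    """Normalized Shannon entropy of a set of clone sequences.

    0.0 means all clones are identical; 1.0 means every clone is distinct.
    """
    n = len(clones)
    if n == 0:
        raise ValueError("at least one clone is required")
    if n == 1:
        return 0.0  # a single clone carries no information on diversity
    counts = Counter(clones)
    # p * ln(1/p) summed over distinct sequences, with p = count / n
    entropy = sum((c / n) * math.log(n / c) for c in counts.values())
    return entropy / math.log(n)


# Hypothetical mutant spectra: a dominant master sequence plus two variants,
# versus a spectrum in which every clone differs.
restricted = ["AAAA"] * 8 + ["AAAT", "AACA"]
complex_spectrum = ["AAAA", "AAAT", "AACA", "AGAA"]

print(f"restricted: {normalized_shannon_entropy(restricted):.3f}")
print(f"complex:    {normalized_shannon_entropy(complex_spectrum):.3f}")
```

With this measure, a spectrum dominated by its master sequence scores close to 0, whereas a spectrum with many distinct variants approaches 1, which is the sense in which entropy is described as high for the RdRp region.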
The results obtained from this in-depth study of the intraisolate genetic variability of ToCV should contribute to our understanding of the biology, epidemiology, and evolution of the rapidly growing list of viruses in the complex family Closteroviridae. This information is especially important because the viruses in this family infect a broad range of economically important crops.
This research was funded by grants AGL2001-0542 and AGL2004-06959-C04-01/AGR to J.N.-C. and BFU2007-65080BMC to A.G.-P. (Plan Nacional de I+D+I, Ministerio de Educación y Ciencia [MEC], Spain). G.L. was the recipient of an FPI fellowship from MEC. A.G.-P. was supported by a Ramón y Cajal contract from the Ministerio de Ciencia e Innovación (Spain) and the European Social Fund.
We thank Pedro Moreno and Bruno Gronenborn for their critical reading of the manuscript. Also, many thanks to Marta Montserrat for her statistical advice and Pablo Navas for his editing help.
Published ahead of print on 30 September 2009.
†Supplemental material for this article may be found at http://jvi.asm.org/.