One major breakthrough in the work presented here lies in the space and time scales at which the experiments were performed. Indeed, the processes occurring within the course of a single infection of one multi-cellular host are of obvious biological relevance for any disease. Previous studies on viral recombination suffered from major drawbacks in this respect, basing their conclusions on experiments relying on complementation among non-infectious viruses or between viruses with undetermined relative fitness, on phylogenetically based analyses, or on experiments in cell cultures. For reasons detailed in the Introduction, the first two methods either provide information only on the occurrence of recombination, not on its frequency, or address the question at a different temporal, and often spatial, scale. Experiments in cell cultures, on the other hand, impose cell co-infection by different viral variants, potentially overestimating the frequency of recombination events. Our study circumvents these limitations by analyzing viral genotypes sampled from infected plants after the course of a single infection, so that the invasion and co-infection of cells in various organs and tissues closely approximate natural conditions.
More than half of the genomes (53.8% ± 2.0%; see ) present in a CaMV population after a single passage in its host plant were identified as recombinants, and these data allowed us to infer a per nucleotide per generation recombination rate on the order of 2 × 10⁻⁵ to 4 × 10⁻⁵. The length of one generation, i.e., the time required for a given genome to go from one replication to the next, is unknown in plant viruses. The only experimental data available on CaMV are based on the kinetics of gene expression in infected protoplasts, where the capsid protein is produced between 48 and 72 h [40]. Because reverse transcription and encapsidation of genomic DNA are coupled phenomena [30], we judged it reasonable to assume a generation time of 2 d and, thus, an average of ten generations during our experiments. In case this estimate is mistaken, we have verified a linear relationship between r and the number of generations, thereby allowing an immediate adjustment of r if the CaMV generation time is more precisely established. At this point, we must consider that not all cloned genomes may have been through all the successive replication events potentially allowed by the timing of our experiments. It was previously shown that about 95% of CaMV mature virus particles accumulate in compact inclusion bodies [41], where they may be sequestered for a long time, as such inclusions are very frequent in all infected cells, including those in leaves that have been invaded by the virus population for several weeks. The viral population may thus present an age structure that could bias the estimation of the recombination rate. To minimize this bias, the clones we analyzed were collected from one young, newly formed leaf, where the chances of finding genomes from “unsequestered lines” were assumed to be higher. In any case, our data analysis is conservative, since this age structure can only lead to an underestimation of the recombination rate.
Our results show that interferences between pairs of loci are negative: a recombination event between two loci apparently increases the probability of recombination between another pair of loci. We believe that the most parsimonious explanation of these negative interferences is based on the way the infection builds up within plant hosts. Indeed, one can divide infected host cells into those infected by a single viral genotype and those infected by more than one viral genotype. In the former, analogous to clonal propagation, recombination is undetectable. In the latter, recombination is not only detectable but, as our results indicate, very frequent. Samples consisting of viruses resulting from a mixture of these two types of host cell infections will thus contain viruses with no recombination and viruses with several recombination events, yielding an impression of negative interference. These conceptual arguments are supported by mathematical models. It is indeed easy to show (detailed results not shown) that if a proportion F of the population reproduces clonally, analogous to single infections, while the remainder reproduces panmictically, negative interferences could be inferred even if they do not exist. For example, assuming a three-locus model with real recombination rates r1 and r2 and interference i12, the “apparent” recombination and interference parameters (marked here with a prime) would be r1′ = (1 − F)r1, r2′ = (1 − F)r2, and i12′ = −(F − i12)/(1 − F). With no real interference (i12 = 0), the apparent interference is thus −F/(1 − F), which is negative for any F > 0. Interestingly, this example also shows that our estimates of the recombination rate are conservative: the fact that a fraction F of host cells are singly infected while others are multiply infected leads to an underestimation of the recombination rate.
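The effect of a clonal fraction on the apparent parameters can be checked numerically. The following is a minimal sketch, not the authors' model (whose details are not shown in the text): it assumes that interference is defined through the coefficient of coincidence, i.e., that double recombinants occur at frequency (1 − i12)·r1·r2 in the panmictic fraction and at frequency zero in the clonal fraction. The function names and the example values are hypothetical.

```python
import math


def apparent_parameters(r1, r2, i12, F):
    """Closed-form apparent parameters quoted in the text.

    r1, r2 : real recombination rates in the two intervals
    i12    : real interference between the two intervals
    F      : proportion of the population reproducing clonally (0 <= F < 1)
    """
    if not 0 <= F < 1:
        raise ValueError("F must satisfy 0 <= F < 1")
    r1_app = (1 - F) * r1
    r2_app = (1 - F) * r2
    i12_app = -(F - i12) / (1 - F)
    return r1_app, r2_app, i12_app


def apparent_from_frequencies(r1, r2, i12, F):
    """Same quantities, rebuilt from recombinant frequencies in the mixture.

    ASSUMPTION (not stated in the source): interference is
    1 - (double recombinant frequency) / (product of single-interval rates).
    """
    if not 0 <= F < 1:
        raise ValueError("F must satisfy 0 <= F < 1")
    # Only the panmictic fraction (1 - F) contributes recombinants.
    rec1 = (1 - F) * r1
    rec2 = (1 - F) * r2
    double = (1 - F) * (1 - i12) * r1 * r2
    coincidence = double / (rec1 * rec2)
    return rec1, rec2, 1 - coincidence


# Hypothetical example: no real interference, half the population clonal.
closed_form = apparent_parameters(0.1, 0.1, 0.0, 0.5)
from_freqs = apparent_from_frequencies(0.1, 0.1, 0.0, 0.5)
print(closed_form)
print(from_freqs)
print(all(math.isclose(a, b) for a, b in zip(closed_form, from_freqs)))
```

Under this assumed definition, both routes give the same values, the apparent recombination rates are reduced by the factor (1 − F), and the apparent interference is negative whenever F exceeds the real interference.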
As judged by r1, r2, and r3, calculated between markers a–b, b–c, and c–d, respectively, we found evidence for recombination throughout the entire CaMV genome. The values for r1, r2, and r3 are remarkably similar; hence, the recombination sites seem to be evenly distributed along the genome. We considered the template-switching model as the major way recombinants are created in CaMV. As already mentioned in the Introduction, hot spots of template switching have been predicted at the position of the 5′ extremities of the 35S and 19S RNAs [21,36,42]. If other recombination mechanisms, such as that associated with second-strand DNA synthesis or with the host cell DNA repair machinery, act significantly, hot spots would be expected at the positions of the sequence interruptions Δ1, Δ2, and Δ3 [43]. Due to the design of our experiment and the position of the four markers, we have no information on putative hot spots at positions corresponding to the 5′ end of the 35S RNA and to Δ1 (at nucleotide position 0). Nevertheless, the putative hot spots at the 5′ end of the 19S RNA and at Δ2 and Δ3 (nucleotide positions 4,220 and 1,635, respectively) fall between marker pairs c–d, b–c, and a–b, respectively. Our results indicate that either these hot spots are quantitatively equivalent, although predicted by different recombination mechanisms, or, more likely, that they simply do not exist. Whatever the explanation, what we observe is that CaMV can exchange any portion of its genome, and thus any gene thereof, with an astonishingly high frequency during the course of a single host infection.
To our knowledge, the viral recombination rate has never previously been quantified experimentally for a plant virus [3]. In contrast, retroviruses and particularly HIV-1 have been extensively investigated in this respect. As we have already discussed for these latter cases, the quantification of the intrinsic recombination rate was carried out in artificially co-infected cell cultures. The estimated intrinsic per nucleotide per generation recombination rate in HIV-1 is on the order of 10⁻⁴ [14,15,19], less than one order of magnitude higher than our estimation for CaMV. Because for various reasons detailed above we probably underestimate the within-host CaMV recombination rate, we believe that the intrinsic recombination rate in CaMV is higher and perhaps on the order of that of HIV.
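The comparison with HIV-1 can be made concrete with a few lines of arithmetic. This sketch simply divides the order-of-magnitude HIV-1 figure quoted above by the two bounds of the CaMV range; because the HIV-1 value is only an order of magnitude, the resulting fold differences are rough indications. The function and constant names are hypothetical.

```python
import math


def fold_difference(rate_a, rate_b):
    """Return how many times larger rate_a is than rate_b."""
    return rate_a / rate_b


# Values quoted in the text (per nucleotide per generation).
HIV1_RATE = 1e-4            # order-of-magnitude figure only
CAMV_RATES = (2e-5, 4e-5)   # lower and upper bounds of the CaMV estimate

for camv in CAMV_RATES:
    fold = fold_difference(HIV1_RATE, camv)
    print(f"CaMV {camv:.0e}: HIV-1 about {fold:.1f}-fold higher "
          f"({math.log10(fold):.2f} orders of magnitude)")
```

Both fold differences are below ten, which is what "less than one order of magnitude" refers to.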
Other pararetroviruses, such as plant badnaviruses or vertebrate hepadnaviruses, have a similar cycle within their host cells, including a nuclear minichromosome stage, synthesis of genome-size RNA, and reverse transcription and encapsidation. Nevertheless, vertebrate hepadnaviruses (e.g., hepatitis B virus) infect hosts that are very different from plants in their biology and physiology, and this could lead to a totally different frequency of cell co-infection during the development of the virus populations. Thus, even though our results can be informative for other pararetroviruses because of the viruses' shared biological characteristics, they should not be extrapolated to vertebrate pararetroviruses without caution.