Three coexisting EACMV-like virus species
A total of 114 full DNA-A and 41 full DNA-B components were cloned and sequenced from dried cassava leaf samples collected in Grande Comore (GC; 42 DNA-A and 6 DNA-B), Anjouan (AJ; 12 DNA-A and 9 DNA-B), Mohéli (MO; 17 DNA-A and 4 DNA-B), Mayotte (YT; 32 DNA-A and 20 DNA-B) and the Seychelles archipelago (SC; 11 DNA-A and 2 DNA-B). In addition to these full-genome sequences, 43 partial DNA-A sequences of the core capsid protein (CP) gene were obtained (see Additional file 1: Table S1 for details).
Of the eight CMG species characterised in Africa, only three were identified amongst the complete sequences from the Comoros and Seychelles archipelago samples: EACMV (n = 69), EACMKV (n = 43) and EACMCV (n = 2). The partial core CP gene sequences of 43 isolates could also all be assigned to one of these three species (EACMV/EACMKV, n = 42; EACMCV, n = 1), although EACMV and EACMKV isolates could not be differentiated on the basis of the CP alone. Whereas EACMV was identified on every island sampled, EACMKV was found everywhere except Anjouan and the Seychelles archipelago, and EACMCV was detected only in Grande Comore (Figure 1). CMG species composition also varied considerably among islands (Figure 1). In Mayotte and Anjouan, EACMV accounted for the vast majority of isolates (84% and 100%, respectively), whereas on Grande Comore and Mohéli it accounted for only about a quarter of CMG isolates, the remainder being mostly EACMKV (62% and 71%, respectively).
Figure 1 Distribution map of the seven African CMGs in East Africa, the Comoros archipelago and the Seychelles archipelago. The map on the left shows the general distribution of the seven DNA-A species in the area. The map at the bottom right zooms …
Importantly, the phylogenetic tree of the DNA-A sequences (Additional file 2: Figure S1) reveals several distinct variants of EACMV and EACMKV that are specific to the SWIO. Each of these variants (1) shares higher sequence identity with CMG isolates from mainland Africa than with the other SWIO variants and (2) displays a complex geographical distribution. Together with the variation in CMG species composition from one island to the next, this suggests a complex history of virus migrations throughout the SWIO region.
Regarding the DNA-B components, 17 sequences were obtained from samples infected with EACMV DNA-A, seven from samples infected with EACMKV DNA-A, and the remaining 17 from samples from which no DNA-A was obtained. Based on BLAST searches, most of these sequences (n = 34) appeared most closely related to EACMKV DNA-B components, and seven to EACMV DNA-B sequences. However, as previously noted [28], there is no clear demarcation between EACMV, EACMKV, EACMZV and SACMV DNA-B sequences, and species assignment of CMGs based on DNA-B sequences alone remains difficult. Interestingly, the two DNA-B sequences isolated from the Seychelles samples shared lower sequence identity (less than 88%) with EACMV-like DNA-B components than did the other CMG DNA-Bs analysed.
The difficulty of classifying DNA-B sequences was confirmed by their phylogenetic analysis (Additional file 3: Figure S2). The DNA-B sequences of EACMV, EACMKV, EACMZV and SACMV do not cluster according to the species classification of their associated DNA-A sequences, and the two DNA-B sequences from the Seychelles fall within a separate clade, distinct from the other SWIO island CMG DNA-B sequences, that is most closely related to EACMCV. No clear clustering of sequences by sampling location is apparent in this tree, again suggesting both multiple introductions of these DNA-B lineages to individual islands and complex historical migration patterns between the SWIO islands and mainland Africa.
Because recombination is a major process in the evolution of single-stranded DNA viruses in general and begomoviruses in particular, we searched for evidence of (1) CMG sequence fragments transferred into the genomic backgrounds of other species (i.e. events with CMG donors) and (2) genomic fragments of other species transferred into mostly CMG-like genomic backgrounds (i.e. events with CMG recipients).
To this end, we assembled a dataset combining the 114 full-length DNA-A sequences from this study with 264 DNA-A and DNA-A-like sequences available in GenBank, representing all available CMG sequences from Africa and Asia, monopartite and bipartite African begomoviruses, and additional legumoviruses, curtoviruses and topocuviruses.
Of the 15 detected recombination events involving CMG DNA-A sequences (Table ), eight clearly involve inter-species recombination; for the others, at least one parental sequence could not be clearly identified, although they were also likely inter-species recombinants. Intra-species recombination was not detected in our analyses, probably because recombination between very closely related sequences is difficult to detect within such large datasets [29]. Only two of the inter-species events detected involve exchanges between two EACMV-like species; the remaining six represent exchanges between an EACMV-like species and either ACMV (three events) or tomato-infecting begomoviruses (three events).
List of recombination events inferred in EACMV-like sequences
Previous studies have demonstrated that recombination is a particularly important process in the diversification of EACMV-like CMG species, having led to the emergence of new epidemiologically important CMG variants (e.g. the EACMV-Uganda severe variant [23]) and species (e.g. EACMMV [30]; reviewed in [12]). Although our analyses confirm both the presence of multiple CMG species on various SWIO islands and the pervasiveness of inter-species recombination amongst CMGs, we were unable to identify any DNA-A recombination event unique to SWIO CMG lineages. This may suggest that, despite the currently overlapping geographical ranges of different CMG species on the SWIO islands, mixed infections and recombination between these viruses on the islands have not yet yielded any major epidemiologically relevant lineage. It is plausible that these viruses were introduced to the SWIO islands too recently either for such recombinants to have emerged or for them to have proliferated to the point where they would be detectable in a survey such as this one.
The general recombination profile revealed by our analysis (Additional file 4: Figure S3) is characterised by the absence of significant breakpoint hot-spots in the intergenic region (IR), in contrast to the profiles of monopartite begomoviruses obtained in previous studies [20]. This most likely reflects the small number of events identified in our dataset, combined with the inability to locate many breakpoints precisely. Recombination hot-spots were nevertheless identified around nucleotide positions 1000 (4 of 15 events with breakpoints between positions 950 and 1100) and 1800 (7 of 15 events with breakpoints between positions 1750 and 1900), corresponding respectively to previously identified hot-spots near the end of ORF AV1 (encoding the capsid protein, CP) and in the central region of ORF AC1 (encoding the replication-associated protein, Rep) (Table ).
A DNA-B dataset of 168 begomovirus sequences, comprising our 41 new sequences together with representatives of other diverse cassava-infecting begomovirus DNA-B sequences, was analysed in the same way as the DNA-A dataset. Nine recombination events were identified in this dataset, two of which were unique to SWIO sequences (Table ). One of these (event 2 of DNA-B, Table ) was detected in the two Seychelles DNA-B sequences and spans the IR between ORFs BV1 and BC1 (positions 1124 to 1454 relative to EACMV-[AJ:Oua:AJ03AN3:2004], accession JF909200). Whereas the genome region between the detected breakpoints was distantly related to an EACMCV DNA-B sequence (83% identity with EACMCV-[TZ1], accession AY795989), the remainder of these Seychelles DNA-B genomes resembles EACMV-like group DNA-B sequences. This recombination event most likely explains why the two Seychelles DNA-B sequences appear as outliers in the CMG DNA-B phylogenetic tree.
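Identity values such as the 83% quoted above depend on how alignment gaps are handled. A minimal sketch of pairwise percent identity, assuming two sequences taken from the same alignment and skipping any column where either sequence has a gap; that convention is an assumption here, and other tools count gaps as mismatches. The helper `region_identity` is hypothetical and works on alignment columns, so coordinates given relative to a reference genome (such as 1124 to 1454) would first need mapping onto columns.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two sequences from the same alignment.

    Columns where either sequence has a gap are skipped (one common
    convention; other tools treat gaps as mismatches)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def region_identity(seq_a: str, seq_b: str, start: int, end: int, gap: str = "-") -> float:
    """Identity over alignment columns start..end (1-based, inclusive)."""
    return percent_identity(seq_a[start - 1:end], seq_b[start - 1:end], gap)


# Toy aligned sequences: 8 matches over 9 ungapped columns.
print(round(percent_identity("ACGTACGTAC", "ACGTTCGTA-"), 1))  # 88.9
```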
Recombination event 8 (Table ) was detected only in the DNA-B sequence of an EACMV isolate from Anjouan (EACMV-[AJ:Bam:AJ29AQ1:2009]; JF909211). The DNA-B of this virus was closely related to other EACMV DNA-B sequences (97.2% identical to EACMV-[AJ:Dzi:AJ10AK1:2005]; accession JF909202) except in the 3' half of the BV1 ORF (encoding the nuclear shuttle protein, NSP; positions 723 to 1143), which was apparently derived from a currently unsampled virus species or isolate. As a result of this recombination event, the BV1 ORF of EACMV-[AJ:Bam:AJ29AQ1:2009] (JF909211) carries a premature stop codon and appears non-functional, so this recombinant sequence potentially represents a non-viable variant.
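Checking an ORF for a premature stop codon, as for the BV1 ORF above, amounts to scanning codons in frame from the start codon. A minimal sketch under the standard genetic code, assuming the input begins at the ATG and is written in coding-sense orientation; complementary-sense ORFs such as AC1 or BC1 would need to be reverse-complemented first. The ORFs in the example are toy strings, not real BV1 sequences.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def first_in_frame_stop(orf: str) -> Optional[int]:
    """Return the 0-based offset of the first in-frame stop codon, or None.

    Assumes `orf` starts at the start codon and is in coding-sense orientation."""
    seq = orf.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def has_premature_stop(orf: str) -> bool:
    """True if an in-frame stop codon occurs before the ORF's final codon."""
    pos = first_in_frame_stop(orf)
    last_codon_start = (len(orf) // 3 - 1) * 3
    return pos is not None and pos < last_codon_start


print(has_premature_stop("ATGAAATAGCCCTAA"))  # True: TAG is the third codon
print(has_premature_stop("ATGAAACCCTAA"))     # False: stop only at the end
```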
The remaining events involve DNA-B sequences belonging to the EACMV/EACMKV/EACMZV/SACMV clade, within which, as mentioned previously, species demarcation is difficult. Importantly, however, none of these events was unique to the SWIO viruses.
Two recombination hot-spots were detected, mapping to the IR (positions 2500–50) and to the core of the BC1 ORF encoding the movement protein (MP; around position 1700), partially confirming results obtained in previous studies (Additional file 4: Figure S3).
CMG diversity on the SWIO islands is fuelled by multiple introductions from East Africa
As is clearly apparent from the phylogenetic reconstruction of CMG DNA-A sequences (Additional file 2: Figure S1), the history of CMG migrations onto the SWIO islands is complex and most likely involves multiple introduction events, including migrations from East Africa and between islands. To reconstruct more precisely the pathways and evolutionary time-frame of CMG diversification and movements, we used the probabilistic framework implemented in the program BEAST [31]. Given sampling locations and dates for a set of sequences, BEAST permits the spatio-temporal reconstruction of plausible movement pathways underlying the observed geographical distribution of the sampled sequences. On the basis of GPS sampling coordinates, we defined seven geographical groups corresponding to West and East/Centre Africa (Centre and East Africa for the full-genome DNA-B, FG-B, dataset), the Seychelles, and each of the four Comoros islands separately (Additional file 5: Figure S4). We then used the discrete phylogeographic model [32] to reconstruct CMG migration routes between these localities.
Because recombination is known to confound molecular clock analyses, in addition to the full-genome DNA-A (FG-A) dataset we assembled a largely recombination-free (details in [33]) core CP ORF (CP) dataset. Consistent with previous work on large begomovirus datasets indicating that the core CP region is a recombination cold-spot [16], no recombination breakpoints were detectable within this region of the CMGs and their closest relatives. The final dataset assembled for these analyses was a DNA-B dataset (FG-B).
For each of these datasets, all the new sequences described in this study were aligned with all currently available EACMV-like sequences from GenBank with an associated sampling date and location. This yielded 226, 213 and 92 sequences for the FG-A, CP and FG-B datasets, respectively (see Additional file 1: Table S1 for details).
We analysed each of these datasets using BEAST to infer when and where the most recent common ancestor of the EACMV-like viruses originated. The FG-A analysis indicated that the mean substitution rate during EACMV-like virus evolution has been approximately 1.27 × 10⁻³ substitutions/site/year (95% highest posterior density, HPD, interval: 9.08 × 10⁻⁴ to 1.64 × 10⁻³), whereas the CP analysis indicated a rate of 1.93 × 10⁻³ substitutions/site/year (95% HPD: 1.26 × 10⁻³ to 2.64 × 10⁻³). These estimates are consistent with previously published substitution rate estimates for these viruses [14]. For the DNA-B, however, our estimate of 2.35 × 10⁻³ substitutions/site/year (95% HPD: 1.36 × 10⁻³ to 3.35 × 10⁻³) was faster than both DNA-A estimates and faster than the rate estimated for EACMV DNA-B in a previous study (1.33 × 10⁻⁴; HPD: 1.06 × 10⁻⁵ to 3.39 × 10⁻⁴).
It is important to point out that, because of the relatively short period over which the analysed samples were collected (1996 to 2009, with 95% of the samples obtained between 2000 and 2009), the inferred substitution rates probably reflect short-term mutation rates rather than the longer-term substitution rates of EACMV-like viruses [15]. Over a 13-year sampling period, there would likely have been insufficient time for purifying selection to remove many slightly deleterious mutations that would ultimately be purged from EACMV populations over longer time-frames; purging them would result in lower substitution rate estimates. For this reason, our datasets are probably ineffective for accurately dating the deeper nodes of the EACMV-like virus phylogenetic tree.
Whereas the most recent common ancestor (MRCA) of the EACMV-like viruses was dated to 1880 (95% HPD: 1786–1945) using the FG-A dataset (Additional file 6: Figure S5), it was dated to only 1938 (95% HPD: 1867–1982) using the CP dataset (Figure 2). Using the FG-B dataset, the MRCA was dated to 1921 (HPD: 1820–1980), a date that falls within both of the other HPD intervals. These discordant date estimates highlight another potential bias introduced by recombination into the full-genome datasets. With the FG-A dataset, the much older last common ancestors of the divergent, recombinationally acquired “non-EACMV-like” genome fragments would be expected to push the estimated date of the EACMV-like virus MRCA much deeper into the past [13]; i.e. the estimated date is expected to fall somewhere between the date of the MRCA of the recombinationally acquired genomic tracts and that of the MRCA of the rest of the genome.
Figure 2 Maximum clade credibility trees constructed from the EACMV-like capsid protein (CP) dataset. Branches are coloured according to the most probable location state of the node on their right (i.e. the likely geographical location of the ancestral sequence …
Therefore, unless stated otherwise, the dates of ancestral sequences presented below are based on the core CP dataset. Because these dates will still be strongly biased towards recent dates by the short period over which the analysed sequences were sampled, we focus primarily on phylogeographic inferences, which should not be influenced by this unavoidable bias. Moreover, while we use both FG-A and CP dataset analyses to describe phylogeographic patterns for CMGs on these islands, more credence should be given to inferences supported by the recombination-free CP dataset, as recombination can also confound phylogeographic reconstruction with the FG-A dataset.
Not surprisingly, both the FG-A and CP analyses clearly indicated that the MRCA of the EACMV-like viruses probably resided on the African mainland (posterior state probability, PSP = 0.96 and 0.71 for the FG-A and CP datasets, respectively; see Additional file 6: Figure S5 and Figure 2). Although East/Centre Africa is more strongly supported as the root location than West Africa, our grouping design does not allow definitive conclusions at a finer spatial scale. The most probable geographical origin of the DNA-B MRCA is also likely mainland Africa (PSP = 0.69; Figure 3).
Figure 3 Maximum clade credibility trees constructed from the EACMV-like DNA-B full-genome (FG-B) dataset. Branches are coloured according to the most probable location state of their descendant nodes, with black circled nodes indicating nodes with state probabilities < 0.5. …
Although our inference of the location of the EACMV-like virus MRCA would not have been influenced by the short time-frame over which samples were collected, it could have been biased by differences in sample size among locations. We therefore repeated our analyses using a configuration that randomizes the locations assigned to the sequences during the analyses. If mainland Africa again emerged as the likely origin of the EACMV-like viruses under these settings, this would imply that our result may simply reflect the larger number of mainland African EACMV sequences included in the analysis. The grey “shadows” within the bar graphs in Figure 2, Figure 3 and Additional file 6: Figure S5 indicate the estimates obtained from these location-randomized analyses. Importantly, none of these analyses yielded probability estimates for the location of the EACMV-like virus MRCA anywhere near as high as those obtained with the real datasets, confirming that our estimate of the MRCA location is robust to possible sampling biases.
The phylogeographic analyses of the FG-A and CP datasets implied that the current geographical distribution of EACMV-like virus genetic variants is best explained by at least four (FG-A) or five (CP) independent introductions of these viruses to the SWIO islands from mainland Africa (see tree branches indicated by red arrows in Figure 2 and Additional file 6: Figure S5). Whereas all four introductions inferred from the FG-A dataset involved movements from Africa to Grande Comore, the CP dataset indicated three movements from Africa to Grande Comore, one to Mohéli and one to Mayotte.
Importantly, the locations of the ancestral sequences at some nodes are not well resolved (highest location state probability lower than 0.5); these nodes are circled in black in Figure 2 and Additional file 6: Figure S5. Slight changes in these probabilities, such as might result from a larger sample of sequences from a wider variety of locations, could therefore yield different estimates of the number of independent introduction events. Crucially, though, the CP phylogeographic reconstruction is far less ambiguous than that for the FG-A dataset: the only relatively high degree of uncertainty concerns the movement of viruses between Africa and the Seychelles. The two almost equally credible scenarios implied by our analysis are that an EACMV-like virus either moved to the Seychelles from Mayotte between 1993 and 1999 (CP dataset HPD: 1985–2002) or was introduced directly to the Seychelles from Africa (depicted in Figure 2 and Additional file 6: Figure S5).
As emphasized above, recombination likely introduces a major bias into the FG-A molecular clock and phylogeographic analyses because of its detrimental effects on the accurate inference of phylogenetic trees [35]. Here, some isolates clearly share the same CP gene but not the remainder of their genomes. The positions of some EACMV and EACMKV clades are incongruent between the FG-A and CP trees, which slightly modifies the migration histories inferred from the two datasets.
In the FG-A dataset, the MRCA of all the EACMKV sequences sampled from the Comoros islands indicates that this species arrived on Grande Comore in a single founding event sometime between 1981 and 1988 (FG-A HPD: 1966–1997). This founding lineage then likely moved from Grande Comore to Mohéli and Mayotte between 1997 and 2003 (Additional file 6: Figure S5). In the CP dataset, by contrast, a major event involves the migration of EACMKV from Africa to Grande Comore between 1995 and 2001 (CP HPD: 1990–2003), but the isolates from Mayotte and two isolates from Mohéli group within the EACMV clade.
This discrepancy is possibly due to an undetected recombination event transferring the entire CP region between isolates of these species. Such difficult-to-detect recombination events have previously been invoked to explain why the CP genes of isolates belonging to the Tomato yellow leaf curl virus (TYLCV) strains IL and Mld are polyphyletic [16].
A single introduction of EACMCV to Grande Comore probably occurred from Africa between 1993 and 2006 (CP dataset HPD: 1987–2008). As only two EACMCV sequences have been isolated on Grande Comore (one in 2008 and one in 2009), no other movement events could be inferred for this species. The low prevalence of EACMCV on the SWIO islands may be due either to these viruses having been present on the islands for only a very short time or to their being displaced by another virus.
All the other introductions correspond to EACMV isolates and likely occurred between 1988 and 2008 (CP dataset HPD: 1978–2009).
Despite the bias described above, the CP and FG-A datasets yielded congruent estimates of the migration routes between islands. Two major migration directions are inferred and strongly supported by Bayes factor (BF) tests. Four (FG-A) and six (CP) migrations were inferred from Grande Comore to Mohéli between 1999 and 2009 (CP HPD: 1997–2009; BF = 30000). Similarly, between six and fourteen migrations were inferred from Mayotte to Anjouan between 1999 and 2009 (CP HPD: 1997–2009; BF = 30000; Figure 4). Although these results very strongly indicate that EACMV has moved frequently and relatively unimpeded between Mayotte and Anjouan (primarily from Mayotte to Anjouan), it must be stressed that there is a high degree of phylogenetic uncertainty in the inferred locations of the ancestral sequences used to detect some of these individual movements. For example, whereas the CP dataset indicates that all movements were from Mayotte to Anjouan, the FG-A dataset indicates up to three possible movements from Anjouan to Mayotte. One movement from Mayotte to Grande Comore between 2002 and 2005 (CP dataset 95% HPD: 2001–2005) and another from Mohéli to Grande Comore in 2008–2009 (CP dataset 95% HPD: 2007–2009) are supported by both datasets, whereas the other migrations are supported by only one of the two datasets.
Figure 4 CMG migrations from East Africa and between SWIO islands. CMG migration events inferred using the capsid protein (CP, in green); full genome DNA-A (FG-A, in red) and full genome DNA-B (FG-B, in blue) datasets. Arrow colours represent the dataset used …
Three migration events from Africa to the SWIO islands could be inferred from the FG-B dataset (Figure ). The first, at the very base of the MCC tree, implies a possible movement of viruses from Africa to the Seychelles between 1921 and 2000 (HPD: 1819–2004). However, as was indicated earlier, the recombinant nature of these two outlier sequences from the Seychelles may have resulted in their artifactual placement at the root of the MCC tree.
Two other concomitant but phylogenetically distinct introductions of EACMV-like virus DNA-B sequences from East Africa to the SWIO are inferred later. One occurred from East Africa to Grande Comore between 1975 and 2003 (95% HPD: 1955–2007), followed by movement from Grande Comore to Mohéli between 2002 and 2009 (95% HPD: 1997–2009). Unfortunately, only three DNA-B sequences belonging to this clade are available, so further movements amongst these islands, such as those observed with the FG-A and CP datasets, could not be detected. The second EACMV-like virus DNA-B migration detectable between East Africa and the SWIO islands was to Anjouan between 1976 and 1990 (95% HPD: 1955–1999). From Anjouan there were likely four migration events to Grande Comore and two to Mayotte, all between 1989 and 2009 (95% HPD: 1976–2009). These analyses additionally inferred two migrations from Grande Comore to Mohéli between 2000 and 2009 (95% HPD: 1995–2009) and two from Mayotte to Anjouan between 1997 and 2009 (95% HPD: 1993–2009; Figures 3 and 4).
As with the EACMV DNA-A migrations, it was difficult to determine conclusively the numbers and directions of DNA-B movements between Mayotte and Anjouan. Importantly, the same two major migration paths between the SWIO islands indicated by the FG-A and CP analyses were also identified in the FG-B analysis (Grande Comore/Mohéli, BF = 138; Mayotte/Anjouan, BF = 57). The DNA-B dataset did, however, also strongly support an additional route (Grande Comore/Anjouan, BF …).
In all these analyses, it is important to note that migrations of CMGs from the SWIO islands to Africa are almost never detected. A single such event may be apparent in the FG-A phylogeographic reconstruction (see the bicoloured branch in Additional file 6: Figure S5), but this signal could be artefactual, since the location of the supposedly migratory ancestral sequence is highly uncertain and could in fact be Africa in general: the location probabilities for West and East/Centre Africa are 0.11 and 0.27, respectively, and together amount to about 0.39, which is higher than the location probability of 0.35 for Grande Comore.
These results therefore suggest that viruses from the SWIO islands have likely not contributed appreciably to the diversification of CMGs on the African continent. Moreover, they indicate that Grande Comore and Mayotte are the main gateways through which CMGs spread from Africa to the SWIO islands of Mohéli and Anjouan. That Grande Comore and Mayotte are the likely launch pads of CMG movements throughout the Comoros is probably best explained by these islands being the main traffic hubs of the archipelago and the closest to Africa and Madagascar, respectively.
Regarding Madagascar, previous studies reported a severe CMD epidemic in the 1930s–1940s [37] and, based on serological tests and partial sequencing, the co-existence of ACMV, EACMV and SACMV [38]. Since no CMG sequences from Madagascar were available at the time of our analysis, we were unable to determine how this island contributes to the dissemination of CMGs across the SWIO. Obtaining full-genome sequences from Madagascar is therefore a top priority in our future efforts to trace CMG movements across Africa and the SWIO islands.
Complex patterns of component re-assortment
As is abundantly apparent from our phylogenetic and phylogeographic analyses of CMG DNA-A and DNA-B components, the cognate component pairs of particular viruses share neither completely congruent phylogenies nor identical migration histories (Figure ). We therefore sought to determine whether frequent genome component re-assortment might account for these observations and, if so, to what extent this pseudo-recombination process has affected CMG evolution. To this end, we combined in a single dataset pairs of DNA-A and DNA-B sequences sampled from the same plant. Because co-infection of individual plants with multiple DNA-A and DNA-B components could bias this analysis, we discarded plants from which more than one genetically distinct DNA-A or DNA-B sequence was isolated, resulting in a dataset of 81 sequence pairs (see Additional file 1: Table S1 for details). It must be stressed, however, that this screen is unlikely to have excluded all co-infections, so our pairing of cognate DNA-A and DNA-B components remains tentative.
Two different analyses were performed on this dataset. The first used the recombination detection algorithms available in RDP3 [39], set up so that only re-assortment events, with or without exchanges of the common region (CR), were detected as recombination events. The second used the ancestral state reconstruction capabilities of BEAST [32] to infer past association histories between species-classified DNA-A molecules and DNA-B component sequences. Based on the DNA-A phylogeny, six DNA-A types were defined (ACMV, EACMV, EACMV-UG, EACMKV, EACMCV and EACMZV), and each DNA-B sequence was assigned one of these types as a discrete character state. Thus, just as in a phylogeographic analysis in which each sampled virus is associated with a “location state” such as a country, island or province, in our analysis each DNA-B sequence was associated with one of the six defined “DNA-A states”. We then interpreted individual “migration events” between these six DNA-A states in subsequent phylogeographic analyses of this dataset as individual inter-species re-assortment events.
The RDP analysis detected a total of 15 recombination events suggestive of component re-assortment, at least four of which also appear to have involved overwriting of the DNA-B CR by that of the DNA-A sequence that newly captured it (Table ). Most of the full DNA-B re-assortment events (7 of 11) involve re-association of DNA-B components with DNA-A sequences of the same species, and three of the remaining events involve exchanges between EACMV and EACMKV. These results tend to confirm that re-assortment occurs preferentially between closely related viruses, and that it probably depends on the ability of the DNA-A-encoded Rep to trans-replicate the DNA-B sequences it encounters.
List of pseudorecombination events inferred in concatenated DNA-A and DNA-B sequences
Interestingly, the four detected events bearing signals of CR overwriting involve EACMZV DNA-B components, which seem to form a sub-clade within the other EACMV-like DNA-B sequences. This suggests that there may be some incompatibility between EACMZV DNA-B CR sequences and the Rep proteins expressed by other (i.e. non-EACMZV) EACMV-like viruses, which, without CR overwriting, could inhibit full component re-assortment involving EACMZV variants.
Also consistent with the hypothesis that component re-assortment is most permissible between closely related viruses, no clear signal of re-assortment was detected involving EACMCV or ACMV DNA-B components, which are clearly distinct from the other DNA-B components.
Importantly, our second, BEAST-based approach yielded results fully consistent with the RDP3-based analysis, inferring multiple recent DNA-B exchanges (between 1979 and 2009; 95% HPD: 1965–2009) between EACMV-UG, EACMV and EACMKV (BFs ranging from 4 to 253; Figure 5). Colour changes along the tree indicate possible inter-species re-assortment events. For several nodes (circled in black in Figure 5), however, the probabilities of association of DNA-Bs with particular DNA-As were low, so it is impossible to retrace all association histories confidently. For example, because of its out-group position in the tree, ACMV-like DNA-A is identified as the most probable last common ancestor of all cognate DNA-A sequences. This arises because, despite our static assignment of the analysed DNA-A sequences to six contemporary lineages, these static designations become less and less meaningful as one moves from the branch tips into the deeper recesses of the tree; ideally, the designations should change dynamically to reflect the fact that, like the DNA-B lineages depicted in the tree, the DNA-A lineages on which they are based also coalesce backwards in time. Despite this unavoidable departure from reality, ten of the thirteen well-supported re-assortment events detected by this analysis were also supported by the RDP-based re-assortment analysis.
Figure 5 Analysis of EACMV-like component re-assortments. Analysis of component re-assortment through counts of DNA-B component “migrations” between the DNA-A sequences of the major CMG lineages. In the analysis DNA-A sequences were classified …
Probably because of the high similarity between EACMV and EACMKV DNA-B sequences, the analysis revealed reciprocal DNA-B movements between these species on the SWIO islands, comparable to those observed on the continent. Although EACMV-UG also has a DNA-B similar to those of EACMV and EACMKV, it is absent from these islands, and no DNA-B exchanges involving it were detected there. Nevertheless, these results collectively reiterate that re-assortment of DNA-B components amongst the EACMV-like CMGs is likely facilitated by these viruses having broadly compatible DNA-A and DNA-B components.