Integrated maps that link crop and model plants allow knowledge gained from independent research on different plants to be accumulated [17]. This work links the genetic map of peanut, one of the world's most important grain legumes, to the model legumes Lotus and Medicago, a first step for its inclusion in an integrated genetic system for legumes.
Genome restructuring progressively breaks down syntenic relationships between species over evolutionary time. In addition, whole genome duplications occur periodically during plant evolution, followed by progressive diploidization [18]. These events split and obfuscate syntenic relationships. For the legumes, useful levels of synteny have been shown between the Galegoid models and the Phaseoloids, which diverged some 50 Myr ago [19]. However, between the models and the more basally diverged Genistoid lupin, syntenic relationships are more complex [22]. The divergence of the Dalbergioid clade to which Arachis belongs is placed at a similar, or slightly later, date in evolutionary time than the divergence of the Genistoids. Therefore the degree to which we could detect macro-synteny between the models and Arachis
Initial inspection of the Arachis vs. model legume plots shows surprising degrees of synteny considering the time of species divergence. Although there are some regions of double affinities between Arachis and the model legumes, most synteny blocks have a single main affinity and not two affinities interleaved (Figs. , and ). These patterns indicate that the last universal legume whole genome duplication predated the divergence of Arachis from the Galegoids and Phaseoloids by long enough that the common ancestral genome was substantially diploidized. Synteny at the macro-level between Arachis and the model legumes will be useful for many genomic regions.
The different cross-species plots showed fascinating similarities, and the power of the Arachis out-group (Fig. ) allows new inferences to be made. In the Lotus vs. Medicago genomic plot [14], distinct conserved synteny blocks and non-conserved regions are observed. To explain this, we could hypothesize rearrangements or deletions within the non-conserved regions, either in Medicago, in Lotus, or in both. Although any of these explanations is possible, the principle of Ockham's razor guides us to prefer the simplest: that regions lack synteny due to disruption in either Medicago or Lotus, but not both. However, with the addition of the Arachis out-group, the power of inference is increased: we see similar patterns of synteny (and disruption) in all possible species-by-species comparisons (with the notable exception of SB1). Therefore, the evidence from the Arachis out-group argues strongly against the simplest explanation for the patterns of Medicago synteny and disruption. The inference, instead, is that certain legume genomic regions are consistently more stable during evolution than others. Additional evidence for this was found in our recently completed study, where 104 anchor markers mapped in bean were used to detect genomic regions that are syntenic with Lotus, Medicago and Arachis. In that study, large syntenic segments were also found to be conserved between all species. These syntenic segments correspond to synteny blocks 2, 3, 4, 6, 7, 8, 9, 10 and 11 (the latter consisting of a block of bean LG5 – Lj3 – Mt2 – Ar8 associations). SB1 and SB5 are also evident, but the former is small, covering only 2.3 cM in LG1 of bean, and the latter is fragmented into three sections covering bean LG1 and LG6 ([21]; Arachis vs. bean marker correspondences from this reference are also summarized in Additional file 1).
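The out-group argument above can be restated as a small decision rule. The sketch below is purely illustrative and is not the analysis performed in this work: the function name `classify_region`, the boolean encoding of "synteny detected", and the output labels are all assumptions introduced here. For one genomic region it takes the presence or absence of synteny in every pairwise species comparison and reports whether the pattern can be attributed to disruption in a single lineage or whether the region is disrupted in all comparisons.

```python
from itertools import combinations


def classify_region(synteny):
    """Classify one genomic region from pairwise synteny calls.

    `synteny` maps frozenset({species_a, species_b}) -> bool, where True
    means synteny was detected in that pairwise comparison.
    (Hypothetical encoding, chosen only for this illustration.)
    """
    species = sorted({s for pair in synteny for s in pair})
    pairs = [frozenset(p) for p in combinations(species, 2)]
    broken = [p for p in pairs if not synteny[p]]

    if not broken:
        return "conserved in all comparisons"
    if len(broken) == len(pairs):
        # Synteny is lost whichever species is compared, so disruption in
        # one lineage alone does not explain the pattern.
        return "disrupted in all comparisons"
    for sp in species:
        involves_sp = [p for p in pairs if sp in p]
        # A single lineage explains the pattern only if exactly the
        # comparisons involving that species are the broken ones.
        if set(broken) == set(involves_sp):
            return f"disruption attributable to {sp}"
    return "mixed pattern"


# Example: synteny is lost only in comparisons that involve Medicago.
calls = {
    frozenset({"Arachis", "Lotus"}): True,
    frozenset({"Arachis", "Medicago"}): False,
    frozenset({"Lotus", "Medicago"}): False,
}
print(classify_region(calls))
```

With only two species, a non-syntenic region cannot be assigned to either lineage; adding a third species is what makes the "attributable to one lineage" and "disrupted in all comparisons" outcomes distinguishable.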
We sought an explanation for these observations and began by analyzing transposon distributions in Lotus and Medicago. We found that retrotransposons are very unevenly distributed in both model legume genomes, and that the retrotransposon-rich regions tend to correspond to variable regions, intercalating with the synteny blocks (Fig. ). This tendency is particularly evident for Medicago; its higher retrotransposon content [14] and higher proportion of anchored BACs compared to Lotus may account for this. Overall, considering the time of evolutionary divergence, the patterns of synteny blocks, variable regions and retrotransposon distributions are substantially similar. In addition, it is notable that SB1, which is conserved between Medicago and Lotus but was not evident between Arachis and the model legumes, is positioned on local peaks in the densities of retrotransposons in the model legumes. Considering this, we suggest that the euchromatic gene space of the model legumes, and by inference possibly most Papilionoid legumes, can be divided into two broadly defined components: regions that remain relatively stable, and regions that experience high rates of genome restructuring. The former tend to be syntenic across taxa and to have low retrotransposon densities; the latter tend to show little synteny and to have high retrotransposon densities. The proposed genome model is similar to the pan-genome concept, originally from bacteria but recently suggested for plants [24]. It should be noted that this model appears consistent with the data used here, which come from diploid genomes. However, plants that have undergone rapid genome restructuring after polyploidy, such as soya, may differ.
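The comparison between retrotransposon density and synteny blocks described above can be sketched as a fixed-window count along a chromosome, followed by a comparison of mean counts inside and outside the blocks. This is a minimal illustration under stated assumptions, not the procedure used in this work: the function names, the window size, the coordinates and the block boundaries are all hypothetical.

```python
def window_density(positions, chrom_length, window):
    """Count elements (e.g. retrotransposon start positions, in bp)
    in consecutive fixed-size windows along one chromosome."""
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        if 0 <= pos < chrom_length:
            counts[pos // window] += 1
    return counts


def mean_density_by_region(counts, window, blocks):
    """Return (mean count inside synteny blocks, mean count outside).

    `blocks` is a list of (start, end) intervals, end exclusive.
    A window is assigned to a block if its midpoint falls inside it.
    """
    inside, outside = [], []
    for i, count in enumerate(counts):
        midpoint = i * window + window // 2
        in_block = any(start <= midpoint < end for start, end in blocks)
        (inside if in_block else outside).append(count)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(inside), mean(outside)


# Toy data (hypothetical): a 300 bp "chromosome" with 100 bp windows.
retro_positions = [5, 15, 105, 110, 115, 120, 250]
synteny_blocks = [(0, 100), (200, 300)]

counts = window_density(retro_positions, chrom_length=300, window=100)
print(counts)
print(mean_density_by_region(counts, 100, synteny_blocks))
```

Under the proposed model, the mean count outside the synteny blocks would be expected to exceed the mean count inside them. Assigning windows by midpoint is a simplification, and a real analysis would also need to account for uneven sequence coverage along the chromosome.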
In comparing syntenic and more variable regions, another trend seems clear. Variable regions have lower densities of anchor-marker correspondences, and therefore of single copy genes (Figs. , ). However, regions without synteny are not simply "holes" in the dot plot: correspondences do occur in these regions, but they are scattered. In contrast to the low concentrations of single copy genes, some of the retrotransposon rich regions in the model legumes host high densities of genes from certain multigene families, as we illustrated with the plant disease resistance gene homologs.
The fast evolving nature of the repetitive fraction of the genome has been documented in many plant species. This evolution involves rapid expansions and reductions in transposon numbers [25], such that, for instance, even closely related species of Arachis can be distinguished using whole genome in situ hybridization. Therefore, it seems likely that the restructuring within the variable regions, which tend to be transposon rich, frequently includes amplification of some sequences and elimination of others ("birth and death"). Considering this, it would be expected that natural selection would tend to act against single copy or essential genes sharing genomic regions with high densities of transposons. On the other hand, the co-localization of certain fast-evolving multigene families with high densities of transposons could be advantageous. The presence of high densities of resistance gene homologs in some of the variable retrotransposon rich regions supports this view, since rapid birth and death and frequent diversifying selection have been well established for this gene family. A detailed phylogenetic analysis of NBS-LRR genes in Medicago also supports the hypothesis that restructuring in the variable regions has driven the evolution of some resistance gene clusters: NBS-LRR encoding genes in retrotransposon rich regions are, on average, more recent in origin, and have more unusual domain rearrangements, than those in synteny blocks [15].
Retrotransposon rich genomic regions may play a role in legumes similar to that in trypanosomes, where they interrupt synteny and are associated with gene family expansions and the evolution of new gene diversity [26]. The legume retrotransposon rich regions may also be similar to pericentromeres, exceptional genomic regions that are also retrotransposon rich: in animals they contain segmental duplications implicated in gene creation, and in plants they harbor rearrangements and insertions uncommon in euchromatin [28]. However, the size of the retrotransposon rich regions described here, extending in some cases to entire or nearly entire euchromatic chromosome arms, and the association of some of them with disease resistance genes, seem notable. For applied science, the presence of clusters of resistance gene homologs in regions of low synteny also has important implications. Synteny between genomes may often not enable predictions of the locations of orthologous resistance genes, although the genome model presented here may aid in identifying the resistance genes for which synteny is more likely to be preserved.