Retrotransposons make a significant contribution to the size, organization and genetic diversity of their host genomes. To characterize retrotransposon families in the grapevine genome (the fourth crop plant genome sequenced), we combined two approaches: a PCR-based method for the isolation of RNaseH-LTR sequences and a computer-based sequence similarity search of the whole-genome sequence of PN40024.
Supported by a phylogenetic analysis, ten novel Ty1/copia families were distinguished in this study. To select a canonical reference element for each retroelement family from among its insertions in the genome, we screened for the element sequence with (1) a perfect 5 bp target site duplication, (2) the highest level of identity between the 5' and 3' LTRs within a single insertion sequence, and (3) the longest uninterrupted coding capacity within the gag-pol ORF. One to eight copies encoding a single putatively functional gag-pol polyprotein were found for three families, indicating that these families could still be autonomous and active. For the other families, no autonomous copies were identified. However, a subset of copies within the presumably non-autonomous families had perfect identity between their 5' and 3' LTRs, indicating a recent insertion event. A phylogenetic study based on the sequence alignment of the region located between reverse transcriptase domains I and VII distinguished these ten families from other plant retrotransposons. Including the previously characterized Ty1/copia-like grapevine retrotransposons Tvv1 and Vine 1 and the Ty3/gypsy-like Gret1 in this assessment, a total of 1709 copies were identified for the 13 retrotransposon families, representing 1.24% of the sequenced genome. The copy number per family ranged from 91 to 212. We performed insertion site profiling for 8 of the 13 retrotransposon families and confirmed multiple insertions of these elements across the Vitis genus. Insertional polymorphism analysis and dating of full-length copies based on their LTR divergence demonstrated that each family has a particular amplification history, with 71% of the identified copies having been inserted within the last 2 million years.
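The three screening criteria for a reference element can be sketched in code. This is a minimal illustration, not the procedure actually used in the study: the `Insertion` record, its field names, and the example values are hypothetical, and the text does not say how the criteria were weighted against each other. The sketch assumes criterion (1) acts as a filter and that criteria (2) and (3) are applied in that order of priority.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Insertion:
    """One genomic copy of a retrotransposon family (hypothetical record)."""
    name: str                # hypothetical identifier for the copy
    tsd_5prime: str          # target site duplication flanking the 5' LTR
    tsd_3prime: str          # target site duplication flanking the 3' LTR
    ltr_identity: float      # fraction of identical sites between 5' and 3' LTR
    longest_orf_codons: int  # longest uninterrupted gag-pol reading frame


def has_perfect_tsd(ins: Insertion, length: int = 5) -> bool:
    """Criterion (1): both flanking duplications are identical and 5 bp long."""
    return (len(ins.tsd_5prime) == length
            and ins.tsd_5prime.upper() == ins.tsd_3prime.upper())


def pick_reference(insertions: List[Insertion]) -> Optional[Insertion]:
    """Keep copies passing criterion (1), then rank by (2) and then (3).

    The filter-then-rank ordering is an assumption made for this sketch.
    """
    candidates = [ins for ins in insertions if has_perfect_tsd(ins)]
    if not candidates:
        return None
    return max(candidates,
               key=lambda ins: (ins.ltr_identity, ins.longest_orf_codons))


# Invented example data, for illustration only.
copies = [
    Insertion("copy_a", "ATGCA", "ATGCA", 0.985, 1200),
    Insertion("copy_b", "ATGCA", "ATGCT", 1.000, 1400),  # imperfect duplication
    Insertion("copy_c", "GGTAC", "GGTAC", 0.998, 900),
]
best = pick_reference(copies)
print(best.name if best else "no candidate")
```

Ranking on a tuple keeps the priority order explicit; a different weighting of LTR identity against coding capacity would only require changing the `key` function.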
The strategy we used efficiently delivered new Ty1/copia-like retrotransposon sequences, increasing the total number of characterized grapevine retrotransposons from 3 to 13. We provide insights into the representation and dynamics of the 13 families in the genome. Our data demonstrated that each family has a particular amplification pattern, with 7 families having copies inserted within the last 0.2 million years. Among those 7 families with recent insertions, three retain the capacity for activity in the grape genome today.
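Dating an insertion from LTR divergence rests on the fact that the two LTRs of a copy are identical at the time of insertion and then accumulate substitutions independently, so the age is commonly estimated as T = K / (2r), where K is the divergence between the 5' and 3' LTRs and r is the substitution rate per site per year. The sketch below illustrates that calculation under stated assumptions: the text above does not specify the distance model or the rate used, so the Jukes-Cantor correction, the function names, and the rate in the example are illustrative choices only.

```python
import math


def jukes_cantor_distance(ltr5: str, ltr3: str) -> float:
    """Jukes-Cantor distance between two aligned LTRs (assumed model).

    Alignment columns containing a gap ('-') in either sequence are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(ltr5.upper(), ltr3.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)  # observed proportion of differences
    if p == 0:
        return 0.0  # identical LTRs, as expected right after insertion
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_age_years(distance: float, rate_per_site_per_year: float) -> float:
    """T = K / (2r): both LTRs diverge independently, hence the factor 2."""
    return distance / (2 * rate_per_site_per_year)


# Illustrative values only; neither the sequences nor the rate come from the text.
k = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAT")
age = insertion_age_years(k, rate_per_site_per_year=1e-8)
print(f"K = {k:.4f}, estimated age = {age:.0f} years")
```

Because the estimate scales inversely with r, any age thresholds computed this way depend directly on the substitution rate chosen.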