Retroposons are ubiquitous transposable elements found in the genomes of most eukaryotes, including trypanosomatids. The African and American trypanosomes (Trypanosoma brucei and Trypanosoma cruzi) contain long autonomous retroposons of the ingi clade (Tbingi and L1Tc, respectively) and short nonautonomous truncated versions (TbRIME and NARTc, respectively), as well as degenerate ingi-related retroposons devoid of coding capacity (DIREs). In contrast, Leishmania major contains only remnants of extinct retroposons (LmDIREs) and of short nonautonomous heterogeneous elements (LmSIDERs). We extend this comparative and evolutionary analysis of retroposons to the genomes of two other African trypanosomes (Trypanosoma congolense and Trypanosoma vivax) and another Leishmania sp. (Leishmania braziliensis). Three new potentially functional retroposons of the ingi clade have been identified: Tvingi in T. vivax and Tcoingi and L1Tco in T. congolense. T. congolense is the first trypanosomatid containing two classes of potentially active retroposons of the ingi clade. We analyzed sequences located upstream of these new long autonomous ingi-related elements, which code for the recognition site of the retroposon-encoded endonuclease. The closely related Tcoingi and Tvingi elements show the same conserved pattern, indicating that the Tcoingi- and Tvingi-encoded endonucleases share site specificity. Similarly, the conserved pattern previously identified upstream of L1Tc has also been detected at the same relative position upstream of L1Tco elements. A phylogenetic analysis of all ingi-related retroposons identified so far, including DIREs, clearly shows that several distinct subfamilies have emerged and coexisted, though in the course of trypanosomatid evolution, only a few have been maintained as active elements in modern trypanosomatid (sub)species.
Retroposons, also called non-long-terminal-repeat retrotransposons, are ubiquitous elements that transpose through an RNA intermediate and are found in the genomes of most eukaryotes (10, 17). The current model for transposition of retroposons predicts that an element-encoded endonuclease performs a single-strand nick of the target DNA, generating an exposed 3′ hydroxyl that serves as a primer for reverse transcription of the element's RNA. Synthesis of the complementary strand of the new DNA copy of the retroposon is performed by the element-encoded reverse transcriptase (RT). The second single-strand nick is carried out on the other strand, a few base pairs downstream of the first nick, by the same element-encoded endonuclease, generating a primer for the second-strand synthesis of the retroelement. Retroposons are therefore flanked by a direct repeat called a target site duplication (TSD), which corresponds to the sequence between the two single-strand nicks. They also have a variable-length poly(A)- or A-rich 3′ tail, due to the involvement of an RNA intermediate (25).
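The origin of the target site duplication in this model can be made concrete with a small sketch. The code below is a minimal illustration, not a simulation of the enzymology: the function names, the toy sequences, and the nick position are hypothetical, and the default length of 12 bp is borrowed from the TSD size reported for Tbingi and L1Tc. The bases lying between the two staggered nicks are copied to both sides of the inserted element.

```python
def insert_with_tsd(target: str, element: str, nick: int, tsd_len: int = 12) -> str:
    """Insert `element` into `target` so that the `tsd_len` bases lying between
    the two staggered nicks end up duplicated on both sides of the element."""
    if not 0 <= nick <= len(target) - tsd_len:
        raise ValueError("nick position leaves no room for the TSD")
    tsd = target[nick:nick + tsd_len]
    # upstream flank + TSD + element + TSD + downstream flank
    return target[:nick] + tsd + element + tsd + target[nick + tsd_len:]


def flanking_tsd(genome: str, start: int, end: int, tsd_len: int = 12):
    """Return the direct repeat flanking genome[start:end], or None if the
    two flanks of length `tsd_len` are not identical."""
    if start < tsd_len or end + tsd_len > len(genome):
        return None
    left = genome[start - tsd_len:start]
    right = genome[end:end + tsd_len]
    return left if left == right else None


# Toy example (hypothetical sequences): the 12 central bases become the TSD.
target = "GGGG" + "AAACCAATGTTT" + "CCCC"
element = "TTACGT" + "A" * 8          # toy element with an A-rich 3' tail
genome = insert_with_tsd(target, element, nick=4)
start = 4 + 12                        # element begins right after the upstream TSD copy
print(flanking_tsd(genome, start, start + len(element)))  # AAACCAATGTTT
```

Searching for such identical flanks is also the simplest way to recognize a complete insertion in a genome sequence, provided that both ends of the element are intact.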
Trypanosomatids are protozoan parasites of major medical and veterinary significance. They not only cause serious diseases, such as sleeping sickness (Trypanosoma brucei gambiense and T. brucei rhodesiense), Chagas disease (Trypanosoma cruzi), and leishmaniasis (Leishmania spp.) in humans, but also are a serious impediment to socioeconomic development by causing disease in domestic animals (T. brucei brucei, Trypanosoma congolense, and Trypanosoma vivax). T. brucei subspecies and T. cruzi belong to the genus Trypanosoma and constitute a monophyletic group distantly related to the Leishmania spp. (18, 26). The mammalian parasites of the genus Trypanosoma are further divided into the Salivaria (African trypanosomes, including T. brucei subspecies, T. congolense, and T. vivax) and the Stercoraria (South American trypanosomes, including T. cruzi). This division is based on their modes of transmission: predominantly inoculative by tsetse flies, as is the case in the African trypanosomes, or contaminative by a variety of bloodsucking insects, as is the case in the South American trypanosomes. T. vivax is placed at the most basal position of the Salivarian group, while separation between T. brucei and T. congolense is more recent (16, 35). The year 2005 saw the publication of the nuclear genomes of T. brucei, T. cruzi, and Leishmania major (1, 14, 15, 21), which also provided a remarkable opportunity to comprehensively analyze retroposons from lower eukaryotes in their genomic context (8, 14). In this study, we extend the analysis of retroposons into a wider comparative and evolutionary context using three other trypanosomatid genomes: those of Leishmania braziliensis (31), T. congolense, and T. vivax (our unpublished data).
Retroposons of the ingi clade constitute the most abundant transposable elements described in the genomes of T. cruzi and T. brucei (~3% of the nuclear genome) (Table 1). The T. brucei ingi (5.25 kb) (here renamed Tbingi) and T. cruzi L1Tc (4.74 kb) elements are potentially functional and autonomous retroposons that encode a single large multifunctional protein containing the N-terminal apurinic/apyrimidinic-like endonuclease, the RT, the RNase H, and the C-terminal DNA-binding domains (24, 28, 30). Tbingi and L1Tc are dispersed in the genomes, although they show relative site specificity for insertion (4, 5). The trypanosome genomes also contain small nonautonomous retroposons, namely NARTc (0.26 kb) and RIME (here renamed TbRIME; 0.5 kb), which are related to the autonomous L1Tc and Tbingi, respectively, but do not encode their own mobilization machinery. Instead, their transposition requires the enzymatic activities of L1Tc or Tbingi, with which the elements form Tbingi/TbRIME and L1Tc/NARTc pairs of retroposons, similar to the previously described human LINE1/Alu, eel UnaL2/UnaSINE1, and plant LINE/S1 retroposon pairs (12, 22, 23, 37). The trypanosome and Leishmania genomes also contain highly degenerate elements related to retroposons of the ingi clade named DIREs (for degenerate ingi-related elements) (7). Tbingi/TbRIME, L1Tc/NARTc, and DIREs share the first 79 residues, which constitute the hallmark of trypanosomatid retroposons (“79-bp signature”). Recently, small degenerate retroposons (~0.55 kb) containing the “79-bp signature,” named LmSIDERs (for short interspersed degenerate retroposons), have also been identified in the genomes of L. major (9), Leishmania infantum, and L. braziliensis (34).
LmSIDER constitutes the largest retroposon family described so far in trypanosomatids, and members are located in the 3′ untranslated regions of genes, where they play a role in the regulation of gene expression (3, 9, 29). In this paper, we report the identification and characterization of the full complement of ingi-related elements (potentially active retroposons, DIREs, and truncated elements) in the genomes of T. congolense, T. vivax, and L. braziliensis. We also analyzed the genomic environments of these retroelements to compare their mechanisms of retrotransposition. Our analysis shows that at least six retroposon families (the ingi1 to -6 subclades) belonging to the ingi clade are represented across trypanosomatid genomes. However, most of these families have been lost in individual genomes. T. congolense is the most retroposon-rich trypanosomatid, with two potentially active families belonging to the ingi1 and ingi6 subclades. None of the Leishmania spp. analyzed contain active ingi-related retroposons.
A comprehensive analysis of retroposons in the genomes of T. brucei, T. cruzi, and L. major has been reported previously (1, 4-7, 9, 14, 15, 21). We used TBLASTN with Tbingi and T. cruzi L1Tc amino acid sequences as queries to detect ingi-related sequences in the T. congolense (strain IL-3000, version 1), T. vivax (strain Y486, version 2), and L. braziliensis (clone MHOM/BR/75M2904, version 2) (31) genome data sets. The T. congolense genome assembly consisted of 3,181 contigs, totaling 41.8 Mb, while the T. vivax assembly contained 10,250 contigs, totaling 47.4 Mb, including 8,279 T. vivax contigs not assigned to chromosomes. To identify further ingi-related sequences and to precisely determine the element boundaries, several rounds of BLASTN and TBLASTN searches were performed, including at each step new retroposon sequences identified in the subject data set by these reiterative BLAST searches.
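The reiterative search strategy can be pictured as a loop that feeds the new hits of each round back in as queries until nothing new turns up. The sketch below is only an illustration under stated assumptions: `iterative_search`, `shares_kmer`, and the toy sequences are hypothetical, and the exact k-mer matcher merely stands in for the BLASTN and TBLASTN searches that were actually used.

```python
def shares_kmer(query: str, database: dict, k: int = 8) -> list:
    """Toy stand-in for a similarity search (NOT BLAST): return the names of
    database sequences that contain at least one exact k-mer of the query."""
    kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    return [name for name, seq in database.items()
            if any(kmer in seq for kmer in kmers)]


def iterative_search(seeds, database, search=shares_kmer, max_rounds=10):
    """Run `search` repeatedly, using the sequences found in one round as the
    queries of the next, until a round yields no new sequence."""
    found = {}
    queries = list(seeds)
    for _ in range(max_rounds):
        new = {}
        for query in queries:
            for name in search(query, database):
                if name not in found:
                    new[name] = database[name]
        if not new:
            break                      # converged: no new element this round
        found.update(new)
        queries = list(new.values())   # new elements become the next queries
    return found


# Hypothetical data: "a" matches the seed, "b" is only reachable through "a".
database = {
    "a": "AAAACCCCGGGGTTTT",
    "b": "GGGGTTTTACACACAC",
    "c": "TGTGTGTGTGTGTGTG",
}
print(sorted(iterative_search(["AAAACCCC"], database)))  # ['a', 'b']
```

In the toy data, sequence "b" shares nothing with the seed and is recovered only because "a" was added to the query set after the first round, which is the point of iterating.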
To determine the approximate coordinates of degenerate ingi-related sequences (DIREs), an initial TBLASTN search was performed against the T. congolense, T. vivax, and L. braziliensis contigs using the Tbingi, Tcoingi (T. congolense ingi), Tvingi (T. vivax ingi), L1Tco, and L1Tc peptide sequences as queries. The models were refined and extended by approximately 300 nucleotides by searching with the corresponding peptide sequences against a protein database composed of previously identified Tbingi, Tcoingi, Tvingi, L1Tco and/or L1Tc peptides using the BLAST-extend-repraze algorithm developed at the J. Craig Venter Institute (JCVI; formerly The Institute for Genomic Research). A subsequent Smith-Waterman alignment between the proteins, including the translation of the extensions, allowed the examination of all translation frames. To tentatively reconstitute proteins from the analyzed DIREs, frameshifts were removed manually from the DNA sequences using the BLAST-extend-repraze output. This approach generated a pseudogene for each DIRE element, encoding a single ingi-like sequence, which in most cases contained numerous stop codons.
The RT, endonuclease, and RNase H amino acid domains were aligned using the multiple-alignment software CLUSTAL X (38), followed by minor manual adjustments using MacClade version 4.06 (Sinauer Associates, Inc.). The alignments of the RT domains are shown in Fig. S1 in the supplemental material. Phylogenetic trees were generated by the neighbor-joining method as implemented in PAUP version 4.0b10 (Sinauer Associates, Inc.), using the default parameters. Bootstrapping was also carried out using PAUP.
Retroposons in the T. brucei (strain TREU927/4), T. cruzi (strain CL), and L. major (strain Friedlin) genomes have previously been annotated and analyzed (4, 5, 9). Here, we identified and analyzed ingi-related retroposons in the draft genomes of T. congolense, T. vivax, and L. braziliensis (Table 1) using an iterative BLAST-based approach. The L. braziliensis genome contains 65 heterogeneous sequences (named LbrDIRE) with numerous frameshifts and/or in-frame stop codons, which inactivate their ingi-related coding sequences. In the T. vivax genome, 864 sequences were identified, including 108 TvDIREs. Based on nucleotide sequence analysis, the other 756 ingi-related sequences form a group of closely related elements, with the percentage of divergence between aligned nucleotide sequences larger than 3.5 kb (46 elements) ranging between 4.2% and 18.6% (median, 10%) (Fig. 1). The consensus sequence of this retroposon family, called Tvingi, is 5,419 bp long and encodes a 1,752-amino-acid protein sharing 31.6% identity with the Tbingi protein (Fig. 2 and Table 2). Only one Tvingi retroposon among the 11 full-length elements identified encodes a potentially functional protein.
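A divergence range and median of this kind can be derived from a multiple alignment as sketched below. The text does not state how divergence was computed, so the sketch assumes the simplest definition, namely the percentage of mismatched columns between two aligned sequences with gapped columns ignored, evaluated over all pairs. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations
from statistics import median


def percent_divergence(a: str, b: str) -> float:
    """Percentage of mismatched columns between two aligned sequences,
    ignoring columns where either sequence carries a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    mismatches = sum(x != y for x, y in columns)
    return 100.0 * mismatches / len(columns)


def divergence_summary(aligned):
    """Return (minimum, median, maximum) divergence over all sequence pairs."""
    values = [percent_divergence(a, b) for a, b in combinations(aligned, 2)]
    return min(values), median(values), max(values)


# Toy alignment (hypothetical): three 10-column sequences.
aligned = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT"]
print(divergence_summary(aligned))  # (10.0, 10.0, 20.0)
```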
We also identified 241 ingi-related sequences in the T. congolense genome, which can be divided further into three groups. The first group is composed of 173 sequences containing highly degenerate coding sequences, called TcoDIREs. The other 68 sequences form two groups of closely related sequences, called Tcoingi (56 sequences) and L1Tco (12 sequences), each showing a low percentage of divergence from its consensus sequence, with a median value of ~5% (Fig. 1). The Tcoingi consensus sequence is 5,404 bp long and encodes a 1,751-amino-acid protein sharing 32.4% and 88.1% identity with the Tbingi and Tvingi proteins, respectively (Fig. 2 and Table 2). The L1Tco consensus sequence is 4,733 bp long and, after four frameshifts were removed, codes for a 1,505-amino-acid protein sharing 50.1% identity with the L1Tc product (Fig. 2 and Table 2). Among the 11 full-length Tcoingi elements, 2 encode a potentially functional protein. The only full-length L1Tco element identified does not appear to code for a functional protein and therefore probably no longer encodes the retroposon machinery required for activation. However, one cannot exclude the possibility that the L1Tco subfamily is still active in the T. congolense genome. Indeed, the other 11 L1Tco sequences identified in the genome are truncated due to their positions at contig boundaries and could yet turn out to be potentially functional.
In contrast to the African trypanosome genomes and consistent with the genomes of L. major (7, 21) and L. infantum (data not shown), no potentially active ingi-related retroposons were detected in the L. braziliensis genome.
Phylogenetic analyses of retrotransposons are commonly performed on the RT domain, which is the trademark of these mobile elements. As it is the most conserved retrotransposon domain (27), RT phylogeny is statistically more robust than phylogenetic trees generated with the endonuclease and RNase H domains. In order to perform a comprehensive phylogenetic analysis of all retroposons, we reconstituted DIRE proteins using matches to Tbingi, Tcoingi, Tvingi, L1Tc, and/or L1Tco proteins. Among the DIREs identified in the genomes of T. congolense (TcoDIRE; 173 copies), T. vivax (TvDIRE; 108 copies), and L. braziliensis (LbrDIRE; 65 copies), approximately half were successfully reconstituted, with gene products ranging between 199 and 1,702 amino acids. As observed before, all the ingi-related elements (113 DIREs from across the five species and the consensus sequences of Tbingi, Tcoingi, Tvingi, L1Tc, and L1Tco) form a monophyletic clade distinct from all the other retroposons, supported by a high bootstrap value (99%) (Fig. 3) (7). A pattern equivalent to the RT analysis was observed with the RNase H and endonuclease domains, using cellular domains as an outgroup (data not shown).
A previous phylogenetic analysis of the T. brucei, T. cruzi, and L. major sequences prompted us to consider three ingi subclades (Fig. 3) named L1Tc, LmDIRE, and ingi; the last was divided into three further groups, two composed of TbDIREs and the third containing TbDIREs, Tbingi, and TcDIREs (7). Although the inclusion of the T. congolense, T. vivax, and L. braziliensis ingi-related retroposons in this phylogenetic analysis did not change the overall structure of the tree, it considerably changed the complexity of each group/subclade, called here the ingi1 to ingi6 subclades (Fig. 3). The former T. cruzi L1Tc subclade (now called ingi1) is enriched with T. congolense (L1Tco and TcoDIREs) and L. braziliensis (LbrDIREs) sequences. The other LbrDIRE sequences belong to the ingi2 subclade (the former LmDIRE subclade), together with the LmDIRE elements. Among the three TbDIRE groups previously identified (7), (i) TbDIRE1 forms a monophyletic group with Tbingi and some of the TcDIRE sequences (ingi4 subclade); (ii) TbDIRE2 sequences are closely related to Tvingi, TvDIREs, Tcoingi, and the majority of the TcoDIREs (ingi6 subclade); and (iii) TbDIRE3 sequences are grouped with TcoDIRE sequences (ingi5 subclade). Finally, a subset of the TcoDIRE sequences forms a distinct group, called the ingi3 subclade. These data suggest that the evolution of the ingi-related retroposons in the trypanosomatid genomes is quite complex, with the contraction and expansion of many subfamilies during the evolution of these parasites. Among the six identified subfamilies, only three have retained potentially retrotransposition-competent retroposons: ingi1 in T. cruzi (L1Tc) and possibly T. congolense (L1Tco), ingi4 in T. brucei (Tbingi), and ingi6 in T. congolense (Tcoingi) and T. vivax (Tvingi). There is, however, no evidence of active elements in the Leishmania genomes.
The T. brucei and T. cruzi genomes are rich in short nonautonomous retroposons, called TbRIME and NARTc, respectively, which are truncated versions of the long autonomous Tbingi and L1Tc sequences (6, 19) (Table 1 and Fig. 2). Several lines of evidence indicate that the T. vivax genome contains a short nonautonomous retroposon family (TvRIME) that corresponds to a truncated version of Tvingi: first, the TvRIME consensus sequence (1,030 bp long) shares its first 204 and last 826 residues with Tvingi (Fig. 2); second, the 1,030 matching residues of the TvRIME and Tvingi consensus sequences are 92% identical; third, all 58 of the identified TvRIME sequences are highly conserved, with the percentages of divergence from their deduced consensus sequence ranging between 0.1% and 2.3% (median, 1%) (Fig. 4), suggesting that they have been mobilized relatively recently and consequently may still be active. It is noteworthy that all 58 of the TvRIME sequences, identified in 15 contigs, show a tandem arrangement, with the longest TvRIME cluster composed of 10 elements (Fig. 5). In contrast, the T. congolense genome does not contain retroposon families corresponding to shorter versions of the Tcoingi or L1Tco elements.
Retroposons of the ingi clade show relative site specificity for insertion and are preceded by a conserved motif recognized by the element-encoded endonuclease domain, which performs two strand nicks at the target site of insertion (4, 5). In order to study the insertion sites of the Tcoingi, L1Tco, and Tvingi elements, we first considered all of the full-length elements identified in the T. congolense and T. vivax genomes. Only two of each showed a duplicated sequence (TSD) flanking the element, which constituted too small a data set to determine the target site consensus sequence (Fig. 6). As most of the T. brucei (Tbingi/TbRIME) and T. cruzi (L1Tc/NARTc) retroposons are flanked by a 12-bp TSD (4, 5), we extended this analysis to all of the Tcoingi, L1Tco, and Tvingi elements with an intact extreme 5′ end, considering that T. congolense and T. vivax TSDs may have the same conserved length (12 bp), as observed for the six full-length retroelements mentioned above (Fig. 6).
To determine the sequence conservation upstream of the Tvingi and Tcoingi elements, we considered only a single representative sequence of each group of nearly identical 5′ flanking sequences (110 and 30 sequences for Tvingi and Tcoingi out of 193 and 44 retroposons, respectively, with a 5′ extremity). Both the Tvingi and Tcoingi retroelements are preceded by well-conserved patterns (Fig. 7 and 8), which are similar (5′-TTTTAXXXAA↑AAAAAAAXXTTT-3′ and 5′-AXXXAXTTTTXTXXXA↑AAAAAXAATTAT-3′, respectively; the arrows indicate the putative first-strand cleavage sites; boldface residues show 60 to 75% conservation; boldface and underlined residues show 76 to 100% conservation). The most conserved residues are 2 adenine residues at positions −12 and −13 upstream of the Tvingi (82% and 85% conservation) and Tcoingi (93% and 87% conservation) elements. Interestingly, the most conserved residues upstream of L1Tc sequences are also located at positions −12 and −13, which flank the characterized first-strand cleavage site (Fig. 8). This strongly supports the view that the TSD size for Tvingi and Tcoingi elements is also 12 bp, as previously observed for Tbingi and L1Tc (4, 5). The conservation of the upstream pattern between the closely related Tcoingi and Tvingi retroposons suggests that the two retroelements recognize similar target sites for insertion.
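Per-position conservation values of the kind quoted above can be obtained by tallying bases column by column across the aligned 5′ flanking sequences. The sketch below is a minimal illustration, not the procedure used for Fig. 7 and 8: the function names and the toy flanks are hypothetical, positions are numbered so that −1 is the residue immediately upstream of the element, and the 60% cutoff mirrors the lower conservation bound used in the patterns, with X marking less conserved positions.

```python
from collections import Counter


def upstream_conservation(flanks):
    """For equal-length 5' flanking sequences whose last base is position -1,
    return {position: (most common base, fraction of sequences carrying it)}."""
    length = len(flanks[0])
    if any(len(f) != length for f in flanks):
        raise ValueError("flanking sequences must have the same length")
    profile = {}
    for i in range(length):
        column = Counter(f[i].upper() for f in flanks)
        base, count = column.most_common(1)[0]
        profile[i - length] = (base, count / len(flanks))
    return profile


def consensus_pattern(profile, threshold=0.6):
    """Write the most common base where conservation reaches `threshold`,
    and 'X' elsewhere, from the most upstream position to position -1."""
    return "".join(base if fraction >= threshold else "X"
                   for _, (base, fraction) in sorted(profile.items()))


# Toy flanks (hypothetical), five residues upstream of four elements.
flanks = ["AATTT", "AACTT", "AAGTT", "ATTTA"]
profile = upstream_conservation(flanks)
print(profile[-4])                 # ('A', 0.75)
print(consensus_pattern(profile))  # AAXTT
```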
The T. brucei and T. cruzi long autonomous and short nonautonomous retroposons of the same pair (Tbingi/TbRIME and L1Tc/NARTc, respectively) show exactly the same site specificity for insertion, as both members of the pair use the same retrotransposition machinery, which is encoded by the autonomous element (4, 5). Thus, one may expect that the same holds true for the Tvingi/TvRIME pair. The TvRIME retroposons form large clusters of tandemly repeated elements separated by the 12-bp TSD. Unfortunately, owing to the state of the draft genome assembly, all arrays of TvRIME sequences were truncated by contig boundaries. We were therefore unable to determine the conserved pattern upstream of TvRIME sequences. However, all 41 of the TSD sequences identified between TvRIMEs show at most two differences from the consensus sequences (−12 AAACCAATGTTT −1; boldface and underlined residues are shared with the consensus TSD flanking Tvingi sequences), suggesting that all of the TvRIME clusters are flanked by similar sequences and consequently are in the same genomic environment. The TSD sequences flanking TvRIMEs are similar to the consensus TSD flanking Tvingi (−12 AAAAAAAxxTTT −1; boldface and underlined residues are shared with the consensus TSD flanking TvRIME sequences), which supports the assumption that TvRIME also uses the Tvingi retrotransposition machinery.
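The comparison of individual TSDs with a consensus reduces to counting mismatched positions. The sketch below assumes that the lowercase x in the Tvingi consensus marks a nonconserved position that should be skipped; the function names are hypothetical, and the two 12-residue strings are the consensus TSDs quoted above.

```python
TVRIME_TSD = "AAACCAATGTTT"   # consensus TSD between TvRIMEs (positions -12 to -1)
TVINGI_TSD = "AAAAAAAxxTTT"   # consensus TSD flanking Tvingi; x = nonconserved


def tsd_differences(tsd: str, consensus: str, wildcard: str = "x") -> int:
    """Count positions where `tsd` differs from `consensus`,
    skipping consensus positions marked with `wildcard`."""
    if len(tsd) != len(consensus):
        raise ValueError("TSD and consensus must have the same length")
    return sum(1 for t, c in zip(tsd, consensus)
               if c != wildcard and t.upper() != c.upper())


def shared_positions(tsd: str, consensus: str, wildcard: str = "x") -> int:
    """Count conserved consensus positions at which `tsd` carries the same base."""
    return sum(1 for t, c in zip(tsd, consensus)
               if c != wildcard and t.upper() == c.upper())


print(tsd_differences(TVRIME_TSD, TVINGI_TSD))   # 2
print(shared_positions(TVRIME_TSD, TVINGI_TSD))  # 8
```

With the sequences as printed, the two consensus TSDs agree at 8 of the 10 conserved Tvingi positions. The same `tsd_differences` function could be used to check a collection of individual TSDs against the "at most two differences" criterion.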
Among the 12 L1Tco retroposons identified in the T. congolense genome data set, only 7 start with their 5′ extremities. Interestingly, two of them (1-03436 and 1-03805 in Fig. 6B) are preceded by all of the residues constituting the conserved motif located upstream of the L1Tc retroposons (5′-GAXXAXGAXXXTXTATG↑AXXXXXXXXXXX-3′) (4). At least half of these residues are also conserved upstream of the other five L1Tco sequences (Fig. 6B). These data suggest that the L1Tc and L1Tco retroposons, which are closely related in the phylogenetic tree (Fig. 3), also show the same site specificity for insertion.
A previous analysis of trypanosomatid ingi-related retroposons showed that all the retroelements identified in the T. brucei, T. cruzi, and L. major genomes (Tbingi, L1Tc, and DIREs) are grouped according to their species origin, with the exception of a few T. cruzi DIRE elements (the TcDIRE1 family) (Fig. 3), which are closely related to the T. brucei Tbingi and TbDIRE elements. We have previously interpreted these data as an indication of a lower rate of evolution for the TcDIRE1 sequences than for the other retroelements (7), as horizontal transfer of retroposons offers an unlikely explanation (27). The extension of this retroposon analysis to three other trypanosomatid genomes now provides a novel insight into the evolution of trypanosomatid retroposons and enables us to revise the previous interpretations. Indeed, retroposons of the ingi clade can be divided into six subclades, each subclade in turn containing members belonging to up to three different trypanosomatid (sub)species (ingi1 and ingi6 subclades in Fig. 3). This phylogenetic analysis clearly demonstrates that ingi subfamilies arose and disappeared in individual (sub)species during the evolution of trypanosomatid species.
Two lines of evidence indicate that the ingi1 subclade was present in the trypanosomatid genome before Trypanosoma and Leishmania speciation. First, members of this subclade are present in the genomes of both Trypanosoma (T. congolense and T. cruzi) and Leishmania (L. braziliensis) (Fig. 9). Second, ingi1 sequences branch at the very base of the trypanosomatid RT tree and thus represent one of the most ancient ingi subclades identified (Fig. 3). The ingi2 subclade, which is composed of Leishmania DIREs (L. major and L. braziliensis), also has a basal position in the retroposon tree. The positions of the ingi1 and ingi2 sequences in the RT tree were also confirmed by the phylogenetic analyses of the endonuclease and RNase H domains (reference 7 and data not shown). Interestingly, ingi2 sequences are more closely related to ingi3 to -6 sequences than to ingi1 sequences, suggesting that the ingi1 subclade on one hand and all of the other sequences on the other hand form two different groups of retroposons. This separation of the ingi-related retroposons into two groups also clearly appears on the phylogenetic tree (Fig. 3). We therefore propose that the genome of the ancestral trypanosomatid (before Trypanosoma and Leishmania speciation) contained at least two ingi-related families, the ancestors of ingi1 and ingi2 to -6 (Fig. 9). The ingi5 and -6 subclades are present only in the genomes of African trypanosomes, while ingi4 sequences are also present in T. cruzi. This suggests that ingi4 sequences appeared in the trypanosome ancestor, followed by ingi5 and ingi6 in the African trypanosome ancestor. According to its position in the retroposon trees, the ingi3 subclade probably appeared before the Stercoraria/Salivaria speciation.
We cannot exclude the possibility that ingi3 to -6 sequences were also present in the trypanosomatid genome before Trypanosoma/Leishmania speciation, although this hypothesis is unlikely, as (i) the phylogenetic analysis is in agreement with a relatively recent appearance of the ingi3 and -4 subclades, followed by ingi5 and -6 subclades, and (ii) the Leishmania and T. cruzi genomes are devoid of ingi3 to -6 and ingi5 and -6 sequences, respectively. According to the model presented in Fig. 9, only very few retroposon subfamilies are maintained in individual trypanosomatid genomes. Among the six ingi subclades identified, only one was identified in the genome of T. vivax. T. congolense is the most retroposon-rich trypanosomatid, with four identified subclades, two of which show potentially active elements (L1Tco and Tcoingi). In conclusion, Leishmania species lost active retroposons of the ingi clade rapidly after their speciation, while trypanosomes maintained active retroposon families in their genomes. However, in the course of the evolution of trypanosome species, at least six subclades appeared, and elements of each subclade remained in individual genomes as active families, as well as vestigial sequences. We can anticipate that the complexity of ingi evolution will increase with the number of trypanosomatid genomes analyzed.
So far, five potentially active retroposons of the ingi clade have been identified in T. brucei (Tbingi), T. congolense (Tcoingi and L1Tco), T. vivax (Tvingi), and T. cruzi (L1Tc). The closely related Tcoingi and Tvingi (86% and 88.1% identity at the nucleotide and amino acid levels, respectively) show the same conserved patterns upstream of the retroposons at positions −1 to −22 (Fig. 7 and 8). According to the current model of retrotransposition, this conserved motif represents the binding site of the retroposon-encoded endonuclease, which performs the first-strand cleavage required to initiate target-primed reverse transcription of the retroelement (25). This indicates that the Tcoingi and Tvingi endonuclease domains, which are 93% identical, show the same site specificity. The L1Tc- and L1Tco-encoded endonucleases also share the same site specificity (Fig. 6B); however, they are not as closely related as the Tcoingi and Tvingi elements. Although L1Tc and L1Tco are closely related in the phylogenetic tree, they are poorly conserved at the nucleotide level and are only 50.1% identical at the amino acid level (the respective endonuclease domains are 43.5% identical). In contrast, the Tbingi endonuclease domain shows 44.7% and 43.4% identity with the Tcoingi and Tvingi endonuclease domains, respectively. However, the 5′-flanking regions do not share conservation (Fig. 8). Altogether, this comparative analysis of endonuclease domains and putative recognition sites of retroposons suggests that the trypanosomatid ingi elements may provide a good model to study the structure-function relationship of the retroposon endonuclease domains.
The rise and fall of retroelement families are well documented in eukaryotes, and it has been recently proposed that the extinction of transposable element families might be linked to molecular domestication events (2, 20, 33, 39). The expansion and domestication of two large families of short ingi-related retroposons have recently been described in the genomes of all Leishmania spp., which contain only the extinct retroposon families LmSIDER1 and LmSIDER2 (9, 34). Interestingly, members of both LmSIDER families have been domesticated by Leishmania to play a role in the regulation of gene expression at the posttranscriptional and/or posttranslational level (3, 9, 29). This massive expansion followed by domestication of transposable elements is seemingly confined to Leishmania spp. Indeed, large families of short ingi-related retroposons have not been identified in the genomes of T. brucei and T. cruzi (9) or in those of T. vivax and T. congolense.
T. brucei TbRIME (500 bp) and T. cruzi NARTc (260 bp) are short-retroposon families that have successfully expanded in their respective genomes. Nucleotide comparisons of these retroelements and their flanking regions clearly demonstrated that TbRIME and NARTc are truncated versions of Tbingi and L1Tc, respectively, and that they use the retrotransposition machinery of the long autonomous elements for their own retrotransposition, thus forming the Tbingi/TbRIME and L1Tc/NARTc pairs (4, 5). Similarly, we identified in the T. vivax genome a truncated version of Tvingi (TvRIME), which is probably active and forms the Tvingi/TvRIME pair with its autonomous partner. Clearly, the production of active truncated ingi-related elements occurred independently in the trypanosome genomes. First, the high level of sequence conservation between autonomous and nonautonomous members of each pair suggests that the deletion occurred quite recently in the evolution of trypanosomes, after T. brucei, T. vivax, and T. cruzi speciation. Second, the Tbingi/TbRIME, Tvingi/TvRIME, and L1Tc/NARTc pairs are distantly related and belong to different ingi subclades (Fig. 9). Third, the position of the deleted DNA fragment implies that different events led to the production of TbRIME, TvRIME, and NARTc elements (Fig. 2). In contrast, we did not detect any equivalent short versions of Tcoingi and L1Tco retroposons, suggesting that no active truncated elements appeared in the T. congolense genome. However, we cannot exclude the possibility that such a short active retroposon element evolved in the T. congolense genome and was subsequently lost in the course of the parasite's evolution.
We thank the core sequencing and informatics teams at the Wellcome Trust Sanger Institute for their assistance and the Wellcome Trust for its support of the Sanger Institute Pathogen Genomics and Pathogen Informatics groups. We are grateful to H. Valeins and P. Thebault for informatics support and bioinformatics advice.
F.B. was supported by the CNRS, the University Victor Segalen Bordeaux 2, and the Fondation de la Recherche Médicale. M.B. and C.H.-F. were funded by the Wellcome Trust (grant number WT085775/Z/08/Z).
Published ahead of print on 7 August 2009.
†Supplemental material for this article may be found at http://ec.asm.org/.