We recently characterized HAmo SINE and its partner LINE in silver carp and bighead carp by hybridization capture of repetitive elements from digested genomic DNA in solution using a bead-probe. To reveal the distribution and evolutionary history of SINEs and LINEs in cyprinid genomes, we performed a multi-species search for HAmo SINE and its partner LINE using the bead-probe capture and internal-primer-SINE polymerase chain reaction (PCR) techniques.
Sixty-seven full-size and 125 internal SINE sequences from 17 species of the family Cyprinidae (in addition to 34 full-size and 9 internal sequences previously reported in bighead carp and silver carp) were aligned, together with 14 newly isolated HAmoL2 sequences. Four subfamilies (types I, II, III and IV), distinguished by diagnostic nucleotides in the tRNA-unrelated region, expanded preferentially either within a particular lineage or across the whole family Cyprinidae, acting as multiple active source genes. Quantitative RT-PCR estimated the copy number of HAmo SINEs at 10^4 to 10^6 per cyprinid genome. More than one hundred type IV members were identified and characterized in the genome of the primitive cyprinid Danio rerio, but only tens of sequences resembled types I, II and III, because type IV is the oldest subfamily and its members are dispersed in almost all cyprinid fishes investigated. To determine the taxonomic distribution of HAmo SINE, inter-primer SINE PCR was conducted on non-cyprinid fishes. The results show that HAmo SINE-related sequences may be present in other families of the order Cypriniformes but are absent from the other orders of bony fishes examined: Siluriformes, Polypteriformes, Lepidosteiformes, Acipenseriformes and Osteoglossiformes.
Relying on HAmo LINE2, multiple source genes (subfamilies) of HAmo SINE actively expanded through retroposition within particular lineages or across the whole family Cyprinidae. HAmo SINE should therefore provide useful phylogenetic markers for future analyses of the evolutionary relationships among species in the family Cyprinidae.
Short interspersed repetitive elements (SINEs) are a type of retroposon, being members of a class of informational molecules that are amplified via cDNA intermediates and flow back into the host genome. In contrast to retroviruses and retrotransposons, SINEs do not encode the enzymes required for their amplification, such as reverse transcriptases, so they are presumed to borrow these enzymes from other sources. In the present study, we isolated a family of long interspersed repetitive elements (LINEs) from the turtle genome. The sequence of this family was found to be very similar to those of the avian CR1 family. To our surprise, the sequence at the 3' end of the LINE in the turtle genome was nearly identical to that of a family of tortoise SINEs. Since CR1-like LINEs are widespread in birds and in many other reptiles, including the turtle, and since the tortoise SINEs are only found in vertical-necked turtles, it seems possible that the sequence at the 3' end of the tortoise SINEs might have been generated by recombination with the CR1-like LINE in a common ancestor of vertical-necked turtles, after the divergence of side-necked turtles. We extended our observations to show that the 3'-end sequences of families of several tRNA-derived SINEs, such as the salmonid HpaI family, the tobacco TS family, and the salmon SmaI family, might have originated from the respective LINEs. Since it appears reasonable that the recognition sites of LINEs for reverse transcriptase are located within their 3'-end sequences, these results provide the basis for a general scheme for the mechanism by which SINEs might acquire retropositional activity. We propose here that tRNA-derived SINEs might have been generated by a recombination event in which a strong-stop DNA with a primer tRNA, which is an intermediate in the replication of certain retroviruses and long terminal repeat retrotransposons, was directly integrated at the 3' end of a LINE.
Short interspersed nucleotide elements (SINEs), a type of retrotransposon, are widely distributed in various genomes as multiple copies in different orientations, and they have caused changes to genes and genomes during evolution. These properties make SINEs useful for studying genome diversity, genetic variation, molecular phylogeny and related questions. SINE DNA is transcribed into RNA by RNA polymerase III from an internal promoter composed of two conserved boxes, box A and box B. Here we present an approach to isolate novel SINEs based on these promoter elements. Box A of a SINE is obtained by PCR with a single primer identical to box B (B-PCR). Box B and its downstream sequence are obtained by PCR with a single primer corresponding to box A (A-PCR). A SINE clone produced by A-PCR is then used as the template for a biotin-labeled probe. Full-length SINEs are isolated from the genomic pool by complex capture using the biotinylated probe bound to magnetic particles. Using this approach, we isolated a novel SINE family, Cn-SINE, from the genome of Coilia nasus. Its members are 180–360 bp long. Sequence homology suggests that Cn-SINEs evolved from a leucine tRNA gene. This is the first report of a tRNALeu-related SINE obtained without the use of a genomic library or inverse PCR. These results provide new insights into the origin of SINEs.
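The promoter-based strategy above relies on box A and box B. As an in-silico analogue of that idea, the sketch below scans a DNA sequence for candidate box A/box B pairs. The consensus strings used here (box A `TRGCNNAGYNGG`, box B `GGTTCGANTCC`) are the commonly cited tRNA-gene consensus sequences and are an assumption, not taken from this abstract; the function names and the `max_spacing` default are hypothetical. Only the given strand is scanned.

```python
import re

# IUPAC nucleotide codes needed for the consensus motifs below
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

# Assumed tRNA-gene consensus sequences for the Pol III promoter boxes
BOX_A = "TRGCNNAGYNGG"
BOX_B = "GGTTCGANTCC"


def motif_starts(seq, motif):
    """Return 0-based start positions of (possibly overlapping) motif matches."""
    pattern = "".join(IUPAC[base] for base in motif)
    # Zero-width lookahead so overlapping matches are all reported
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


def pair_boxes(seq, max_spacing=100):
    """Pair each box A with every box B starting downstream within max_spacing nt."""
    pairs = []
    b_starts = motif_starts(seq, BOX_B)
    for a in motif_starts(seq, BOX_A):
        a_end = a + len(BOX_A)
        for b in b_starts:
            if a_end <= b <= a_end + max_spacing:
                pairs.append((a, b))
    return pairs
```

A real screen would also scan the reverse complement and tolerate mismatches; exact IUPAC matching keeps the sketch short.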
transposable element; SINE; tRNA; Coilia nasus
Although more than 120 families of short interspersed nuclear elements (SINEs) have been isolated from eukaryotic genomes, little is known about SINEs in insects. Here, we characterize three novel SINEs from the cotton bollworm, Helicoverpa armigera. Two of them, HaSE1 and HaSE2, share a similar 5′ structure consisting of a tRNA-related region immediately followed by a conserved central domain. The 3′ tail of HaSE1 is significantly similar to that of a LINE retrotransposon, HaRTE1.1, in the H. armigera genome. The 3′ region of HaSE2 shows high identity with a mariner-like element in H. armigera. The third family, termed HaSE3, is a 5S rRNA-derived SINE that shares both its body and its 3′ tail with HaSE1, and may therefore represent the first example in insects of a chimera generated by recombination between a 5S rRNA sequence and a tRNA-derived SINE. Further database searches revealed these SINEs in several other related insect species, but not in the silkworm, Bombyx mori, indicating a relatively narrow distribution of these SINEs in Lepidoptera. In addition, we found a copy of HaSE2 in a GenBank EST entry for the cotton aphid, Aphis gossypii, suggesting that horizontal transfer has occurred.
Repetitive short interspersed elements (SINEs) are retrotransposons ubiquitous in mammalian genomes and are highly informative markers to identify species and phylogenetic associations. Of these, SINEs unique to the order Carnivora (CanSINEs) yield novel insights on genome evolution in domestic dogs and cats, but less is known about their role in related carnivores. In particular, genome-wide assessment of CanSINE evolution has yet to be completed across the Feliformia (cat-like) suborder of Carnivora. Within Feliformia, the cat family Felidae is composed of 37 species and numerous subspecies organized into eight monophyletic lineages that likely arose 10 million years ago. Using the Felidae family as a reference phylogeny, along with representative taxa from other families of Feliformia, the origin, proliferation and evolution of CanSINEs within the suborder were assessed.
We identified 93 novel intergenic CanSINE loci in Feliformia. Sequence analyses separated Feliform CanSINEs into two subfamilies, each characterized by distinct RNA polymerase binding motifs and phylogenetic associations. Subfamily I CanSINEs arose early within Feliformia but are no longer actively proliferating. Subfamily II loci are more recent, exclusive to Felidae, and show evidence of adaptation to extant RNA polymerase activity. Further, presence/absence distributions of CanSINE loci are largely congruent with taxonomic expectations within Feliformia, and the less well-resolved nodes in the Felidae reference phylogeny are matched by equally ambiguous CanSINE data. SINEs are thought to be nearly impervious to excision from the genome. However, we observed a nearly complete excision of a CanSINE locus in puma (Puma concolor). In addition, we found that CanSINE proliferation in Felidae frequently targeted existing CanSINE loci as insertion sites, resulting in tandem arrays.
We demonstrate the existence of at least two SINE families within the Feliformia suborder, one of which is actively involved in insertional mutagenesis. We find SINEs are powerful markers of speciation and conclude that the few inconsistencies with expected patterns of speciation likely represent incomplete lineage sorting, species hybridization and SINE-mediated genome rearrangement.
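Presence/absence congruence can be made concrete: if a SINE inserted once in an ancestor and was never lost, the taxa carrying it should form a clade of the reference tree. The sketch below flags loci whose carrier set is not a clade; such loci are the candidates for incomplete lineage sorting, hybridization or excision discussed above. The tree, taxon names (`T1` to `T5`) and locus names are hypothetical, and missing data are not handled.

```python
def leaf_set(tree):
    """Leaves under a node; a tree is a taxon name (str) or a tuple of subtrees."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree):
    """All leaf sets defined by nodes of the tree (tips, internal nodes, root)."""
    found = {leaf_set(tree)}
    if not isinstance(tree, str):
        for child in tree:
            found |= clades(child)
    return found


def incongruent_loci(presence, tree):
    """Return loci whose 'insertion present' taxa do not form a clade of the tree."""
    tree_clades = clades(tree)
    return sorted(locus for locus, taxa in presence.items()
                  if frozenset(taxa) not in tree_clades)
```

For example, with the hypothetical tree `((("T1", "T2"), "T3"), ("T4", "T5"))`, a locus shared only by `T2` and `T3` would be reported as incongruent.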
Incomplete lineage sorting; SINEs; Carnivora; Speciation; transposable elements; Adaptation; Feliformia; Felidae
The proopiomelanocortin gene (POMC) is expressed in the pituitary gland and the ventral hypothalamus of all jawed vertebrates, producing several bioactive peptides that function as peripheral hormones or central neuropeptides, respectively. We have recently determined that mouse and human POMC expression in the hypothalamus is conferred by the action of two 5′ distal and unrelated enhancers, nPE1 and nPE2. To investigate the evolutionary origin of the neuronal enhancer nPE2, we searched available vertebrate genome databases and determined that nPE2 is a highly conserved element in placentals, marsupials, and monotremes, whereas it is absent in nonmammalian vertebrates. Following an in silico paleogenomic strategy based on genome-wide searches for paralog sequences, we discovered that opossum and wallaby nPE2 sequences are highly similar to members of the superfamily of CORE-short interspersed nucleotide element (SINE) retroposons, in particular to MAR1 retroposons that are widely present in marsupial genomes. Thus, the neuronal enhancer nPE2 originated from the exaptation of a CORE-SINE retroposon in the lineage leading to mammals and remained under purifying selection in all mammalian orders for the last 170 million years. Expression studies performed in transgenic mice showed that two nonadjacent nPE2 subregions are essential to drive reporter gene expression into POMC hypothalamic neurons, providing the first functional example of an exapted enhancer derived from an ancient CORE-SINE retroposon. In addition, we found that this CORE-SINE family of retroposons is likely to still be active in American and Australian marsupial genomes and that several highly conserved exonic, intronic and intergenic sequences in the human genome originated from the exaptation of CORE-SINE retroposons. Together, our results provide clear evidence of the functional novelties that transposed elements contributed to their host genomes throughout evolution.
One of the most striking observations derived from the genomic era is the overwhelming contribution of transposed elements to mammalian genomes. For example, 45% of the human genome is derived from mobile element fragments. Although historically viewed as “junk DNA,” transposed elements could also contribute to novel advantageous functional elements in their host genomes, a process called exaptation. Functionally proven examples of exaptation derived from ancient retroposition events are rare. Using an in silico paleogenomic strategy, we unraveled the evolutionary origin of nPE2, a neuronal enhancer of the proopiomelanocortin gene that participates in the production of hypothalamic peptides involved in feeding behavior and stress-induced analgesia. We demonstrate that nPE2 originated from the exaptation of a SINE retroposon in the lineage leading to mammals and remained under purifying selection for the last 170 million years. The difficulty in detecting nPE2 origin as an exapted retroposon illustrates the underestimation of this phenomenon and encourages the search for the many thousands of retroposon-derived functional elements still hidden within the genomes. Their discovery will contribute to a better understanding of the dynamics of gene evolution and, at a larger scale, the origin of macroevolutionary novelties that lead to the appearance of new species, orders, or classes.
A substantial number of “retrogenes” that are derived from the mRNA of various intron-containing genes have been reported. A class of mammalian retroposons, long interspersed element-1 (LINE1, L1), has been shown to be involved in the reverse transcription of retrogenes (or processed pseudogenes) and non-autonomous short interspersed elements (SINEs). The 3′-end sequences of various SINEs originated from a corresponding LINE. As the 3′-untranslated regions of several LINEs are essential for retroposition, these LINEs presumably require “stringent” recognition of the 3′-end sequence of the RNA template. However, the 3′-ends of mammalian L1s do not exhibit any similarity to SINEs, except for the presence of 3′-poly(A) repeats. Since the 3′-poly(A) repeats of L1 and Alu SINE are critical for their retroposition, L1 probably recognizes the poly(A) repeats, thereby mobilizing not only Alu SINE but also cytosolic mRNA. Many flowering plants harbor only L1-clade LINEs, together with a significant number of SINEs that carry poly(A) repeats but share no homology with those LINEs. Moreover, processed pseudogenes have also been found in flowering plants. I propose that the ancestral L1-clade LINE in the common ancestor of green plants may have recognized a specific RNA template, with stringent recognition then becoming relaxed during the course of plant evolution.
The major clinical manifestations of Entamoeba histolytica infection include amebic colitis and liver abscess. However, the majority of infections remain asymptomatic. Earlier reports have shown that some E. histolytica isolates are more virulent than others, suggesting that virulence may be linked to genotype. Here we have looked at the genomic distribution of the retrotransposable short interspersed nuclear elements EhSINE1 and EhSINE2. Because of their mobile nature, some EhSINE copies may occupy different genomic locations among isolates of E. histolytica, possibly affecting adjacent gene expression; this variability in location can be exploited to differentiate strains.
We have looked for EhSINE1- and EhSINE2-occupied loci in the genome sequence of Entamoeba histolytica HM-1:IMSS and searched for homologous loci in other strains to determine the insertion status of these elements. A total of 393 EhSINE1 and 119 EhSINE2 loci were analyzed in the available sequenced strains (Rahman, DS4-868, HM1:CA, KU48, KU50, KU27 and MS96-3382). Seventeen loci (13 EhSINE1 and 4 EhSINE2) were identified where an EhSINE1/EhSINE2 sequence was missing from the corresponding locus of other strains. Most of these loci were unoccupied in more than one strain. Some of the loci were analyzed experimentally for SINE occupancy using DNA from strain Rahman. These data helped to correctly assemble the nucleotide sequence at three loci in Rahman. SINE occupancy was also checked at these three loci in 7 other axenically cultivated E. histolytica strains and 16 clinical isolates. Each locus gave a single, specific amplicon with the primer sets used, making this a suitable method for strain typing. Based on SINE presence/absence and amplification with locus-specific primers, the 23 strains could be divided into eleven genotypes. The results obtained by our method correlated with the data from other typing methods. We also report a bioinformatic analysis of EhSINE2 copies.
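Dividing strains into genotypes amounts to grouping strains that share the same per-locus occupancy pattern. A minimal sketch, assuming a hypothetical coding of each locus as `+` (SINE present), `-` (empty site) or `0` (no amplification); the strain names and data are invented for illustration.

```python
from collections import defaultdict


def group_by_genotype(occupancy):
    """Group strains sharing the same per-locus occupancy pattern.

    occupancy maps strain name -> sequence of locus states. The codes
    "+", "-" and "0" used in the example are illustrative only.
    """
    groups = defaultdict(list)
    for strain, states in occupancy.items():
        groups[tuple(states)].append(strain)
    # Each distinct pattern is one genotype; list its strains in sorted order
    return {pattern: sorted(strains) for pattern, strains in groups.items()}
```

With three loci and four strains coded `"++-"`, `"++-"`, `"+--"` and `"-0+"`, the first two strains fall into one genotype and the dictionary has three entries.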
Our results reveal several loci with extensive polymorphism of SINE occupancy among different strains of E. histolytica and prove the principle that the genomic distribution of SINEs is a valid method for typing of E. histolytica strains.
Entamoeba histolytica; Genotype; EhSINE1; SINE occupancy; Polymorphism; Strain typing
Transposable elements, including short interspersed repetitive elements (SINEs), comprise nearly half the mammalian genome. Moreover, they are a major source of conserved non-coding elements (CNEs), which play important functional roles in regulating development-related genes, for example as enhancers and silencers, and contribute to the diversification of morphological and physiological features among species. We previously reported a novel SINE family, AmnSINE1, as part of mammalian-specific CNEs. One AmnSINE1 locus, named AS071, showed an enhancer property in the developing mouse diencephalon. Indeed, AS071 appears to recapitulate the expression of diencephalic fibroblast growth factor 8 (Fgf8). Here we established three independent lines of AS071-transgenic mice and performed detailed expression profiling of AS071-enhanced lacZ in comparison with that of Fgf8 across embryonic stages. We demonstrate that AS071 is a distal enhancer that directs Fgf8 expression in the developing diencephalon. Furthermore, enhancer assays with constructs encoding partially deleted AS071 sequence revealed a unique modular organization in which AS071 contains at least three functionally distinct sub-elements that cooperatively direct the enhancer activity in three diencephalic domains, namely the dorsal midline and the lateral wall of the diencephalon, and the ventral midline of the hypothalamus. Interestingly, the AmnSINE1-derived sub-element was found to specify the enhancer activity to the ventral midline of the hypothalamus. To our knowledge, this is the first discovery of an enhancer element that could be separated into respective sub-elements that determine regional specificity and/or the core enhancing activity. These results advance our understanding of the evolution of retroposon-derived cis-regulatory elements and provide a basis for future studies of the molecular mechanism underlying the determination of domain-specificity of an enhancer.
The proper temporal and spatial expression of genes during plant development is governed, in part, by the regulatory activities of various types of small RNAs produced by the different RNAi pathways. Here we report that transgenic Arabidopsis plants constitutively expressing the rapeseed SB1 SINE retroposon exhibit developmental defects resembling those observed in some RNAi mutants. We show that SB1 RNA interacts with HYL1 (DRB1), a double-stranded RNA-binding protein (dsRBP) that associates with the Dicer homologue DCL1 to produce microRNAs. RNase V1 protection assays mapped the binding site of HYL1 to a SB1 region that mimics the hairpin structure of microRNA precursors. We also show that HYL1, upon binding to RNA substrates, induces conformational changes that force single-stranded RNA regions to adopt a structured helix-like conformation. Xenopus laevis ADAR1, but not Arabidopsis DRB4, binds SB1 RNA in the same region as HYL1, suggesting that SINE RNAs bind only a subset of dsRBPs. Consistently, DCL4-DRB4-dependent miRNA accumulation was unchanged in SB1 transgenic Arabidopsis, whereas DCL1-HYL1-dependent miRNA and DCL1-HYL1-DCL4-DRB4-dependent tasiRNA accumulation was decreased. We propose that SINE RNA can modulate the activity of the RNAi pathways in plants and possibly in other eukaryotes.
Short interspersed elements (SINEs) are transposable elements in eukaryotic genomes that mobilize through an RNA intermediate. Recently, mammalian SINE RNAs were shown to have roles as noncoding riboregulators in stress situations or in specific tissues. Mammalian SINE RNAs modulate the level of mRNAs and proteins by interacting with key proteins involved in gene transcription and translation. Here we show that constitutive production of a plant SINE RNA induces developmental defects in Arabidopsis thaliana and that this SINE RNA interacts with HYL1, a double-stranded RNA-binding protein required for the production of microRNA and trans-acting small interfering (tasi)RNA. We mapped the binding site of HYL1 to a SINE RNA region that mimics the hairpin structure of microRNA precursors. We also found that HYL1 induces conformational changes upon binding to RNA substrates. These data suggest that SINE RNAs modulate the activity of RNAi pathways in Arabidopsis.
Invasive amoebiasis, caused by infection with the human parasite Entamoeba histolytica, remains a major cause of morbidity and mortality in some less-developed countries. Genetically, E. histolytica exhibits a number of unusual features, including having approximately 20% of its genome comprised of repetitive elements. These include a number of families of SINEs, non-autonomous elements that can nevertheless move with the help of partner LINEs. In many eukaryotes, SINE mobility has had a profound effect on gene expression; in this study we concentrated on one such element, EhSINE1, looking in particular for evidence of recent transposition.
EhSINE1s were detected in the newly reassembled E. histolytica genome by searching with a Hidden Markov Model built to capture the key features of this element; 393 were detected. Examination of their sequences revealed that some had an internal structure containing one to four 26-27 nt repeats. The classes defined by the number of internal repeats differ in several ways. In particular, elements with two internal repeats show the properties expected of fairly recently transposed SINEs: they are the most homogeneous in length and sequence, they have the longest (i.e. the least decayed) target site duplications, and they are the most likely to show evidence (in a cDNA library) of active transcription. Furthermore, we identified 15 EhSINE1s (6 pairs and one triplet) that appear identical or nearly identical but are inserted at different sites in the genome; these provide good evidence that if mobility has ceased, it has done so only very recently.
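Target site duplication (TSD) length is used above as a proxy for how decayed an insertion is. A minimal sketch of TSD detection, assuming the element boundaries are already known: it returns the longest exact direct repeat that ends the 5′ flank and begins the 3′ flank. The function name and the `min_len`/`max_len` defaults are hypothetical. Because it requires exact matches, older TSDs that have accumulated point mutations will be reported shorter or missed; a real analysis would tolerate mismatches.

```python
def find_tsd(left_flank, right_flank, min_len=4, max_len=30):
    """Longest exact direct repeat ending left_flank and starting right_flank.

    left_flank: genomic sequence immediately 5' of the element.
    right_flank: genomic sequence immediately 3' of the element (after its tail).
    Returns "" if no repeat of at least min_len bases is found.
    """
    best = ""
    limit = min(max_len, len(left_flank), len(right_flank))
    # Check every length: a match at length k does not imply one at k - 1
    for k in range(min_len, limit + 1):
        if left_flank[-k:] == right_flank[:k]:
            best = left_flank[-k:]
    return best
```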
Of the many families of repetitive elements present in the genome of E. histolytica we have examined in detail just one - EhSINE1. We have shown that there is evidence for waves of transposition at different points in the past and no evidence that mobility has entirely ceased. There are many aspects of the biology of this parasite which are not understood, in particular why it is pathogenic while the closely related species E. dispar is not, the great genetic diversity found amongst patient isolates and the fact, which may be related, that only a small proportion of those infected develop clinical invasive amoebiasis. Mobile genetic elements, with their ability to alter gene expression may well be important in unravelling these puzzles.
A repetitive element of approximately 200 bp was cloned from harbour seal (Phoca vitulina concolor) genomic DNA. The sequence of the element revealed putative RNA polymerase III control boxes, a poly(A) tail and direct terminal repeats characteristic of SINEs. Sequence and secondary-structure similarities suggest that the SINE is derived from a tRNA, possibly tRNA-alanine. Southern blot analysis indicated that the element is predominantly dispersed in unique regions of the seal genome but may also be present within other repetitive sequences, such as tandemly arrayed satellite DNA. Based on slot-blot hybridization analysis, we estimate that 1.3 × 10^6 copies of the SINE are present in the harbour seal genome; the copy number based on the number of clones isolated from a size-selected library, however, is an order of magnitude lower (1-3 × 10^5 copies), an estimate consistent with the abundance of SINEs in other mammalian genomes. Database searches found that similar sequences have been isolated from dog (Canis familiaris) and mink (Mustela vison). These and the seal SINE sequences are characterized by an internal CT dinucleotide microsatellite in the tRNA-unrelated region. Hybridization of genomic DNA from representative species of a wide range of mammalian orders to an oligonucleotide (30-mer) probe complementary to a conserved region of the SINE confirmed that the element is unique to carnivores of the superfamily Canoidea.
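A library-based copy-number estimate rests on simple arithmetic: the fraction of screened clones that carry the element, multiplied by the number of insert-sized pieces in the genome. The sketch below is a simplified version of that reasoning, not the authors' documented procedure; the function name and all numbers in the example are hypothetical. Its assumptions (random sampling, at most one element per clone, perfect detection) matter: if clones often carry more than one element, the estimate is biased downward.

```python
def copies_from_library(positive_clones, clones_screened, genome_size_bp, mean_insert_bp):
    """Rough copy-number estimate from library screening.

    Assumes clones are a random sample of the genome, each clone carries at
    most one element, and every element-bearing clone is detected.
    """
    fraction_positive = positive_clones / clones_screened
    # Number of insert-sized pieces needed to cover one haploid genome
    pieces_per_genome = genome_size_bp / mean_insert_bp
    return fraction_positive * pieces_per_genome
```

For instance, 50 positives among 1,000 clones with 500 bp inserts from a hypothetical 3 × 10^9 bp genome gives about 3 × 10^5 copies.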
The C family of short interspersed repeats (SINEs) is highly repeated in the rabbit genome, and most members have a structure consistent with a model of dispersal via reinsertion of a double-stranded copy of an RNA polymerase III-transcribed RNA. We have determined the nucleotide sequences of additional members of the repeat family and compiled them to obtain an improved consensus sequence. This compilation shows that although most regions of the repeat are well conserved, two regions are highly variable. Some individual repeats are truncated, and one truncated repeat retains the characteristic structures of a retroposon. The consensus sequence for C repeats does not match the sequence of any other sequenced mammalian SINE over large regions, but short imperfect matches to several primate and rodent SINEs are observed. A sequence similar to the 27-nucleotide consensus sequence TCCCAGCAACCACATGGGAGGCAGAGA was found in all mammalian SINEs examined. The 3' portion of this sequence matches a DNA segment found at the replication origins of papovaviruses.
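Finding "a sequence similar to" a fixed consensus can be approximated with a sliding-window Hamming-distance scan. The sketch below reports every window within a mismatch threshold of the 27-nucleotide consensus given above. The function names and the `max_mismatches` default are hypothetical; only the forward strand is scanned, and indels are not allowed, so an alignment-based search would be more sensitive.

```python
CONSENSUS = "TCCCAGCAACCACATGGGAGGCAGAGA"  # 27-nt consensus from the text


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan_consensus(seq, consensus=CONSENSUS, max_mismatches=5):
    """Return (start, mismatches) for every window within max_mismatches."""
    seq = seq.upper()
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        d = hamming(seq[i:i + k], consensus)
        if d <= max_mismatches:
            hits.append((i, d))
    return hits
```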
Short interspersed elements (SINEs) make up a significant fraction of total DNA in mammalian genomes, providing a rich substrate for chromosomal rearrangements by SINE-SINE recombinations. Proliferation of mammalian SINEs is mediated primarily by LINE1 (L1) non-LTR retrotransposons that preferentially integrate at DNA sequence targets with average length ~15 bp and containing conserved endonucleolytic nicking signals at both ends. We report that sequence variations in the first of the two nicking signals, represented by a 5′TT-AAAA consensus sequence, affect the position of the second signal thus leading to target site duplications (TSDs) of different lengths. The length distribution of TSDs appears to be affected also by L1-encoded enzyme variants, since targets with the same 5′ nicking site can be of different average length in different mammalian species. Taking this into account, we re-analyzed the second nicking site and found that it is larger and includes more conserved sites than previously appreciated, with a consensus of 5′ANTNTN-AA. We also studied potential involvement of the nicking sites in stimulating recombinations between SINE elements. We determined that SINE elements retaining TSDs with perfect 5′TT-AAAA nicking sites appear to be lost relatively rapidly from the human and rat genomes, and less rapidly from dog. We speculate that the introduction of single-strand DNA breaks induced by recurring endonucleolytic attacks at these sites, combined with the ubiquitousness of SINEs, may significantly promote recombination between repetitive elements, leading to the observed losses. At the same time new L1 subfamilies may be selected for “incompatibility” with pre-existing targets. This provides a possible driving force for the continual emergence of new L1 subfamilies which, in turn, may affect selection of L1-dependent SINE subfamilies.
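The nicking-site consensus sequences above use a hyphen to mark where the strand is cut. A minimal sketch that turns such a site (for example `TT-AAAA` or `ANTNTN-AA`, with N meaning any base) into a search and reports the implied nick positions. The function name is hypothetical, only the given strand is scanned, and exact matching ignores the sequence variation in nicking sites that the text describes.

```python
import re

# N stands for any nucleotide; other letters match themselves
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def nick_positions(seq, site):
    """Return 0-based nick positions for a site written as 'LEFT-RIGHT'.

    The hyphen marks the cleavage point, so the nick falls between the last
    base of LEFT and the first base of RIGHT. Only the given strand is scanned.
    """
    left, right = site.split("-")
    pattern = "".join(IUPAC[base] for base in left + right)
    # Zero-width lookahead so overlapping candidate sites are all reported
    return [m.start() + len(left) for m in re.finditer(f"(?={pattern})", seq.upper())]
```

The distance between a first-site nick and a downstream second-site nick then gives the length of the staggered cut, and hence of the resulting TSD.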
non-LTR retrotransposons; recombination; SINE integration targets
In the human genome, we found dispersed repetitive sequences homologous to part of a human endogenous retrovirus termed HERV-K, which resembles mouse mammary tumor virus. To elucidate their structure and organization, we cloned some of these sequences from a human gene library. The sequence common to the cloned DNA was ca. 630 base pairs (bp) long with an A-rich tail at the 3' end and was found to be a SINE (short interspersed repeated sequence)-type nonviral retroposon. In this retroposon, the 5' end had multiple copies of a 40 bp direct repeat very rich in GC, and the next ~510 nucleotides were homologous to the 3' long terminal repeat and its upstream flanking region in the HERV-K genome. This retroposon was therefore named the SINE-R element, since most of it derived from a retrovirus. SINE-R elements were present at 4,000 to 5,000 copies per haploid human genome. The cloned elements shared ca. 90% nucleotide sequence identity.
The non-long-terminal-repeat (non-LTR) retrotransposons (also called long interspersed repetitive elements [LINEs]) are among the oldest retroelements. Here we describe the properties of such an element from a primitive protozoan parasite, Entamoeba histolytica, that infects the human gut. This 4.8-kb element, called EhLINE1, is present in about 140 copies dispersed throughout the genome. The element belongs to the R4 clade of non-LTR elements. It has a centrally located reverse transcriptase domain and a restriction enzyme-like endonuclease (EN) domain at the carboxy terminus. We have cloned and expressed a 794-bp fragment containing the EN domain in Escherichia coli. The purified protein could nick supercoiled pBluescript DNA to yield open circular and linear DNAs. The conserved PD-X(12-14)-D motif was required for activity. Genomic sequences flanking the sites of insertion of EhLINE1 and the putative partner short interspersed repetitive element (SINE), EhSINE1, were analyzed. Both elements resulted in short target site duplications (TSD) upon insertion. A common feature was the presence of a short T-rich stretch just upstream of the TSD in most insertion sites. By sequence analysis an empty target site in the E. histolytica genome, known to be occupied by EhSINE1, was identified. When a 176-bp fragment containing the empty site was used as a substrate for EN, it was prominently nicked on the bottom strand at the precise point of insertion of EhSINE1, showing that this SINE could use the LINE-encoded endonuclease for its insertion. The nick on the bottom strand was toward the right of the TSD, which is uncommon. The lack of strict target site-specificity of the restriction enzyme-like EN encoded by EhLINE1 is also exceptional. A model for retrotransposition of EhLINE1/SINE1 is presented.
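The PD-X(12-14)-D motif reads as a proline-aspartate pair followed, 12 to 14 residues later, by a second aspartate. A minimal sketch that lists every such match in a protein string; the function name and the test sequence are hypothetical, and a pattern scan like this only finds candidates, whereas identifying a genuine EN active site requires alignment against known restriction enzyme-like EN domains.

```python
def find_pd_x_d(protein, min_gap=12, max_gap=14):
    """Return (start, end) index pairs for PD-X(min_gap..max_gap)-D matches.

    start is the index of P; end is the index of the closing D.
    """
    protein = protein.upper()
    hits = []
    for i in range(len(protein) - 1):
        if protein[i:i + 2] != "PD":
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + 2 + gap  # index of the closing D
            if j < len(protein) and protein[j] == "D":
                hits.append((i, j))
    return hits
```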
The popularity of microsatellites has greatly increased in the last decade on account of their many applications. However, little is currently understood about the factors that influence their genesis and distribution among and within species genomes. In this work, we analyzed carnivore microsatellite clones from GenBank to study their association with interspersed repeats and elucidate the role of the latter in microsatellite genesis and distribution.
We constructed a comprehensive carnivore microsatellite database comprising 1236 clones from GenBank. Thirty-three species from 11 of the 12 carnivore families were represented, although two distantly related species, the domestic dog and cat, were clearly overrepresented. Of these clones, 330 contained tRNALys-derived SINEs and 357 contained other interspersed repeats. Our rough estimates of tRNA SINE copies per haploid genome were much higher than published ones. Our results also revealed a distinct juxtaposition of AG and A-rich repeats with tRNALys-derived SINEs, suggesting their coevolution. Both microsatellite types arose repeatedly in two regions of the interspersed repeat. Moreover, microsatellites associated with tRNALys-derived SINEs showed the highest complexity and lower potential instability.
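Locating AG and A-rich microsatellites inside a cloned sequence is the first step of this kind of association analysis. A minimal sketch that finds perfect tandem runs of a given repeat unit; the function name and the minimum-repeat thresholds in the example are hypothetical choices. It detects only uninterrupted repeats, so compound or imperfect microsatellites (the "higher complexity" ones mentioned above) would need a scoring tool such as Tandem Repeats Finder.

```python
import re


def find_repeats(seq, unit, min_repeats):
    """Return (start, length) of perfect tandem runs of `unit` repeated >= min_repeats times."""
    # Builds a pattern such as (?:AG){5,}
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_repeats},}}")
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq.upper())]
```

For example, `find_repeats(seq, "AG", 5)` finds (AG)n runs of at least five units, and `find_repeats(seq, "A", 8)` finds poly(A) stretches of at least eight bases.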
Our results suggest that tRNALys-derived SINEs are a significant source for microsatellite generation in carnivores, especially for AG and A-rich repeat motifs. These observations indicate two modes of microsatellite generation: the expansion and variation of pre-existing tandem repeats and the conversion of sequences with high cryptic simplicity into a repeat array; mechanisms which are not specific to tRNALys-derived SINEs. Microsatellite and interspersed repeat coevolution could also explain different distribution of repeat types among and within species genomes.
Finally, because microsatellites associated with tRNALys-derived SINEs have higher complexity and potentially lower informative content, we recommend avoiding their use as genetic markers.
Entamoeba histolytica and Entamoeba dispar are closely related protistan parasites, but while E. histolytica can be invasive, E. dispar is completely non-pathogenic. Transposable elements constitute a significant portion of the genome in these species, which contain three families each of LINEs and SINEs. These elements can profoundly influence the expression of neighboring genes, so their genomic location can have important phenotypic consequences. A genome-wide comparison of the location of these elements in the E. histolytica and E. dispar genomes has not been carried out. It is also not known whether the retrotransposition machinery works similarly in both species. The present study was undertaken to address these issues.
Here we extracted all genomic occurrences of full-length copies of EhSINE1 in the E. histolytica genome and matched them with the homologous regions in E. dispar, and vice versa, wherever synteny could be established. We found that only about 20% of syntenic sites were occupied by SINE1 in both species. We then checked whether the different genomic locations in the two species were due to differences in the activity of the LINE-encoded endonuclease, which is required for nicking the target site. The endonucleases of both species were very similar, both in their kinetic properties and in their substrate sequence specificity; hence the differential distribution of SINEs in these species is unlikely to be caused by the endonuclease. Furthermore, the physical properties of the DNA sequences adjoining the insertion sites were similar in both species.
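The shared-occupancy figure of about 20% is, at heart, a set comparison between the loci occupied in each species. A minimal Python sketch, assuming the denominator is the set of syntenic loci occupied by SINE1 in at least one of the two species (the abstract does not state the exact denominator); the function name `shared_occupancy_fraction` and the locus IDs are hypothetical:

```python
from __future__ import annotations


def shared_occupancy_fraction(occupied_a: set[str], occupied_b: set[str],
                              syntenic: set[str]) -> float:
    """Fraction of SINE-occupied syntenic loci that carry the SINE in both species.

    Assumption: the denominator is the set of syntenic loci occupied in at
    least one species; loci without established synteny are ignored.
    """
    a = occupied_a & syntenic
    b = occupied_b & syntenic
    occupied_in_either = a | b
    if not occupied_in_either:
        return 0.0
    return len(a & b) / len(occupied_in_either)


# Hypothetical toy data, not values from the study.
syntenic_loci = {f"locus{i}" for i in range(1, 11)}
eh_sine1 = {f"locus{i}" for i in range(1, 7)}  # occupied in E. histolytica
ed_sine1 = {f"locus{i}" for i in range(5, 9)}  # occupied in E. dispar
print(shared_occupancy_fraction(eh_sine1, ed_sine1, syntenic_loci))
```

With these toy sets, 2 of the 8 occupied syntenic loci are shared, giving 0.25.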
Our data show that the basic retrotransposition machinery is conserved in these sibling species. SINEs may have occupied all of these insertion sites in the genome of the common ancestor of E. histolytica and E. dispar and subsequently been lost from some locations. Alternatively, SINE expansion may have taken place after the divergence of the two species. The absence of SINE1 from 80% of syntenic loci could affect the phenotypes of the two species, including their pathogenic properties, a possibility that remains to be explored.
Short interspersed nuclear elements (SINEs) are non-autonomous non-LTR retroelements that are present in most eukaryotic species. While SINEs have been intensively investigated in humans and other animal systems, they are poorly studied in plants, especially in wheat (Triticum aestivum). We used quantitative PCR on various wheat species to determine the copy number of a wheat SINE family, termed Au SINE, combined with computer-assisted analyses of the publicly available 454 pyrosequencing database of T. aestivum. In addition, we used site-specific PCR on 57 Au SINE insertions, transposon methylation display, and transposon display on newly formed wheat polyploids to assess, respectively, the retrotranspositional activity, epigenetic status, and genetic rearrangements of Au SINE. We retrieved 3706 different Au SINE insertions from the 454 pyrosequencing database of T. aestivum and found that most of the elements are inserted in A/T-rich regions, while approximately 38% of the insertions are associated with transcribed regions, including known wheat genes. We observed typical retrotransposition of Au SINE in the second generation of a newly formed wheat allohexaploid, and massive hypermethylation at CCGG sites surrounding Au SINE in the third generation. Finally, we observed large differences in copy number among diploid Triticum and Aegilops species and a significant increase in copy number in natural wheat polyploids, but no significant increase in Au SINE copy number in the first four generations for two of the three newly formed allopolyploid species used in this study. Our data indicate that SINEs may play a prominent role in the genomic evolution of wheat through stress-induced activation.
SINE; transposable elements; allopolyploidy; genome evolution; wheat; methylation
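Copy-number estimation by quantitative PCR is often done with an absolute standard curve. The sketch below assumes that approach: a dilution series with known copy numbers, a linear fit of Ct against log10(copies), and conversion to copies per haploid genome from the template mass and the haploid genome size. The abstract does not say which quantification scheme was used for Au SINE, and all function names and numbers here are hypothetical.

```python
from __future__ import annotations


def fit_standard_curve(log10_copies: list[float],
                       ct_values: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope: float) -> float:
    """E = 10^(-1/slope) - 1; a slope near -3.32 corresponds to ~100% efficiency."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to get target copies in the reaction."""
    return 10 ** ((ct - intercept) / slope)


def copies_per_haploid_genome(ct: float, slope: float, intercept: float,
                              template_ng: float, haploid_genome_pg: float) -> float:
    """Divide target copies by the number of haploid genome equivalents loaded."""
    genome_equivalents = template_ng * 1000.0 / haploid_genome_pg  # 1 ng = 1000 pg
    return copies_from_ct(ct, slope, intercept) / genome_equivalents


# Hypothetical dilution series following Ct = 38 - 3.3 * log10(copies).
standards = [3.0, 4.0, 5.0, 6.0, 7.0]
cts = [38.0 - 3.3 * x for x in standards]
slope, intercept = fit_standard_curve(standards, cts)
print(round(slope, 3), round(intercept, 3))
print(round(copies_per_haploid_genome(21.5, slope, intercept,
                                      template_ng=10.0, haploid_genome_pg=5.0)))
```

Here a sample Ct of 21.5 maps back to 10^5 copies; 10 ng of a hypothetical 5 pg genome is 2000 genome equivalents, or about 50 copies per haploid genome. Normalizing against a single-copy reference gene instead would be an equally common design choice.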
Previous studies have indicated a paucity of SINEs within the genomes of the guinea pig and nutria, representatives of the Hystricognathi suborder of rodents. More recent work has shown that the guinea pig genome contains a large number of B1 elements, which have expanded to various levels among different rodents. In this work we used A–B PCR and screened GenBank with sequences from isolated clones to identify potentially uncharacterized SINEs within the guinea pig genome, and identified numerous guinea pig-specific sequences with a high degree of similarity (>92%). The presence of A-tails and flanking direct repeats associated with these sequences supported the identification of a full-length SINE, with a consensus sequence notably distinct from other rodent SINEs. Although most similar to the ID SINE, it clearly was not derived from the known ID master gene (BC1); hence we refer to this element as guinea pig ID-like (GPIDL). Using the consensus to screen the guinea pig genomic database (Assembly CavPor2) with Ensembl BlastView, we estimated at least 100,000 copies, in marked contrast to just over 100 copies of ID elements. Additionally, we provide evidence of recent GPIDL integrations: two of seven analyzed conserved GPIDL-containing loci showed presence/absence variants in Cavia porcellus and C. aperea. Using intra-IDL PCR and sequence analyses, we also provide evidence that GPIDL is derived from a hystricognath-specific SINE family. These results demonstrate that this SINE family continues to contribute to the dynamics of hystricognath rodent genomes.
Genome Evolution; ID elements; Retrotransposons; Rodent Genomes; SINEs
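Flanking direct repeats (target-site duplications) and a 3′ A-tail are the usual hallmarks used to delimit a full-length SINE copy. A minimal sketch, assuming exact TSD matches of 5 to 20 bp and a pure A run at the 3′ end; real TSDs are often imperfect, and the helper names and the toy locus are hypothetical rather than the method used in the study:

```python
from __future__ import annotations


def trailing_a_run(element: str) -> int:
    """Length of the uninterrupted run of A at the 3' end of a candidate element."""
    seq = element.upper()
    return len(seq) - len(seq.rstrip("A"))


def find_tsd(left_flank: str, right_flank: str,
             min_len: int = 5, max_len: int = 20) -> str | None:
    """Longest exact direct repeat that ends the 5' flank and starts the 3' flank."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    left, right = left_flank.upper(), right_flank.upper()
    # Try the longest possible duplication first, down to min_len.
    for k in range(min(max_len, len(left), len(right)), min_len - 1, -1):
        if left[-k:] == right[:k]:
            return left[-k:]
    return None


# Hypothetical locus: 5' flank + element (ending in an A-tail) + 3' flank.
left = "CCGTTGAAGCTA"
element = "GGGCTGGAGAGATGGCTC" + "A" * 12
right = "TGAAGCTAGGCA"
print(find_tsd(left, right), trailing_a_run(element))
```

For this toy locus the sketch reports the 8 bp duplication TGAAGCTA and a 12 nt A-tail.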
Short Interspersed Nucleotide Elements (SINEs) are highly abundant in mammalian genomes. The term SINE has come to be restricted to short retroposons with internal RNA polymerase III promoter sites in a region derived from a structural RNA (usually a tRNA). Here we describe a novel, 260 bp tRNA-derived SINE, some fragments of which have been noted before to be repetitive in mammalian DNA. Unlike previously reported SINEs, which are restricted to closely related species, copies of this element can be found in all mammalian genomes, including marsupials. It is therefore called MIR for mammalian-wide interspersed repeat. Their high divergence and their presence at orthologous sites in different mammals indicate that MIRs, at least in part, amplified before the mammalian radiation. Next to Alu, MIRs are the most common interspersed repeat in primates with an estimated 300,000 copies still discernible, which account for 1 to 2% of our DNA. Interestingly, a small, central region of MIR appears to be much better conserved in the genomic copies than the rest of the sequence.
ID elements make up a rodent SINE (short interspersed DNA repetitive element) family that has amplified by retroposition of a few master genes. To understand the factors important for SINE amplification, we investigated the transcription of rat ID elements. Three size classes of ID transcripts, BC1, BC2 and T3, have been detected in various rat tissues, including brain and testes. We have analysed the nucleotide sequences of testes- and brain-derived ID transcripts isolated by size fractionation, C-tailing and RACE. Nucleotide sequence variation among testes ID transcripts demonstrated derivation from different loci. However, the transcripts represent a preferred set of ID elements that closely match the subfamily consensus sequences. The small ID transcripts, T3, are not primary transcripts but processed poly(A) transcripts generated from many different loci. These truncated transcripts would be expected to be retroposition-incompetent. Therefore, the amplification of ID elements is likely to be regulated at multiple steps of retroposition, including transcription and processing. Brain ID transcripts showed a similar pattern, with the addition of very high levels of transcription from the BC1 locus, and we also found evidence that a single locus dominated the production of brain BC2 RNA species. Based on the low level of detection of its processing product, T3, BC1 RNA is highly stable in both germ-line and brain cells. This stability might have contributed to the role of BC1 as a master gene for ID amplification.
The genome of the carnivorous marsupial, the Tasmanian devil (Sarcophilus harrisii, Order: Dasyuromorphia), was sequenced in the hope of finding a cure for, or gaining a better understanding of, the contagious devil facial tumor disease that is threatening the species’ survival. To better understand the Tasmanian devil genome, we screened it for transposable elements and investigated the dynamics of short interspersed element (SINE) retroposons.
The temporal history of Tasmanian devil SINEs, elucidated using a transposition-in-transposition analysis, indicates that WSINE1, a CORE-SINE present in around 200,000 copies, is the most recently active element. Moreover, we discovered a new subtype of WSINE1 (WSINE1b) that comprises at least 90% of all Tasmanian devil WSINE1s. The frequencies of WSINE1 subtypes differ in the genomes of two of the other Australian marsupial orders. A co-segregation analysis indicated that at least 66 subfamilies of WSINE1 arose during the evolution of Dasyuromorphia. Using a substitution rate derived from WSINE1 insertions, we estimated the ages of the subfamilies and correlated them with a newly established phylogeny of Dasyuromorphia. Phylogenetic analyses and divergence-time estimates from mitochondrial genome data indicate a rapid radiation of the Tasmanian devil and its closest relatives, the quolls (Dasyurus), around 14 million years ago.
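Subfamily ages of this kind are commonly obtained by dividing the divergence of copies from their subfamily consensus by a neutral substitution rate, since each copy accumulates mutations independently after insertion. The sketch below assumes that scheme with a Jukes–Cantor (JC69) correction; the abstract does not specify the correction used, and the divergence and rate values are placeholders, not the study's estimates.

```python
import math


def jukes_cantor(p_distance: float) -> float:
    """JC69 distance: d = -(3/4) * ln(1 - (4/3) * p), valid for 0 <= p < 0.75."""
    if not 0.0 <= p_distance < 0.75:
        raise ValueError("p-distance must be in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p_distance / 3.0)


def subfamily_age_myr(mean_p_distance: float, rate_per_site_per_myr: float) -> float:
    """Age = corrected mean divergence from consensus / substitution rate."""
    return jukes_cantor(mean_p_distance) / rate_per_site_per_myr


# Placeholder values: 10% mean divergence from consensus,
# 0.005 substitutions per site per million years.
print(round(subfamily_age_myr(0.10, 0.005), 1))
```

With these placeholder inputs the corrected divergence is about 0.107, giving an age of roughly 21.5 million years.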
The radiation and abundance of CORE-SINEs in marsupial genomes indicate that they may be a major player in marsupial evolution. The early phases of evolution of the carnivorous marsupial order Dasyuromorphia were evidently characterized by a burst of SINE activity. This is the first time a correlation between a speciation event and a major burst of retroposon activity has been shown in a marsupial genome.
SINE; WSINE1; Retroposon; Tasmanian devil; Sarcophilus; Genome; Marsupials
Identification and mapping of repetitive elements is a key step for accurate gene prediction and overall structural annotation of genomes. During the assembly and annotation of three highly repetitive amoeba genomes, Entamoeba histolytica, Entamoeba dispar, and Entamoeba invadens, we performed comparative sequence analysis to identify and map all class I and class II transposable elements in their sequences.
Here, we report the identification of two novel Entamoeba-specific repeats, ERE1 and ERE2. ERE1 is spread across the three genomes and associated with different repeats in a species-specific manner, while ERE2 is unique to E. histolytica. We also report the identification of two novel subfamilies of LINE and SINE retrotransposons in E. dispar and provide evidence for how the different LINE and SINE subfamilies evolved in these species. Additionally, we found a putative transposase-coding gene in E. histolytica and E. dispar related to the mariner transposon Hydargos from E. invadens. The distribution of transposable elements in these genomes is markedly skewed, with a tendency to form clusters. In each of the three genomes, more than 70% of the sequence has a repeat density below the genome's average, indicating that transposable elements are not evenly distributed. We show that repeats and repeat clusters are found at syntenic break points between E. histolytica and E. dispar and hence could act as recombination hot spots promoting genome rearrangements.
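The observation that most of each genome lies below its average repeat density can be read as a windowed coverage calculation. A minimal sketch, assuming repeat annotations given as half-open (start, end) intervals on a single sequence and fixed-size, non-overlapping windows; the window size, the toy intervals and the function names are hypothetical, and the study's own binning may differ:

```python
from __future__ import annotations


def merge_intervals(intervals: list[tuple[int, int]]) -> list[list[int]]:
    """Merge overlapping or touching half-open intervals so bases are not double-counted."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def window_densities(seq_len: int, intervals: list[tuple[int, int]],
                     window: int) -> list[float]:
    """Fraction of each window's bases covered by repeats (the last window may be shorter)."""
    merged = merge_intervals(intervals)
    densities = []
    for w_start in range(0, seq_len, window):
        w_end = min(w_start + window, seq_len)
        covered = 0
        for start, end in merged:
            overlap = min(end, w_end) - max(start, w_start)
            if overlap > 0:
                covered += overlap
        densities.append(covered / (w_end - w_start))
    return densities


def fraction_below_mean(densities: list[float]) -> float:
    """Share of windows whose repeat density is below the mean window density."""
    mean = sum(densities) / len(densities)
    return sum(d < mean for d in densities) / len(densities)


# Hypothetical 100 bp sequence with a clustered block of repeats near its start.
dens = window_densities(100, [(0, 10), (10, 20), (15, 25), (50, 52)], window=10)
print(dens)
print(fraction_below_mean(dens))
```

Because the repeats are clustered in the first windows, 7 of the 10 windows fall below the mean density, which is the kind of skew the abstract describes.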
The mapping of all transposable elements found in these parasites shows that repeat coverage is up to three times higher than previously reported. LINE, ERE1 and mariner elements were present in the common ancestor of the three Entamoeba species, while ERE2 was likely acquired by E. histolytica after its separation from E. dispar. We demonstrate that E. histolytica and E. dispar share their entire repertoire of LINE and SINE retrotransposons and that Eh_SINE3/Ed_SINE1 originated as a chimeric SINE from Eh/Ed_SINE2 and Eh_SINE1/Ed_SINE3. Our work shows that transposable elements are organized in clusters that are frequently found at syntenic break points, providing insights into their contribution to chromosome instability and, therefore, to genomic variation and speciation in these parasites.
SINEBase (http://sines.eimb.ru) integrates the revisited body of knowledge about short interspersed elements (SINEs). A set of formal definitions concerning SINEs was introduced. All available sequence data were screened through these definitions and the genetic elements misidentified as SINEs were discarded. As a result, 175 SINE families have been recognized in animals, flowering plants and green algae. These families were classified by the modular structure of their nucleotide sequences and the frequencies of different patterns were evaluated. These data formed the basis for the database of SINEs. The SINEBase website can be used in two ways: first, to explore the database of SINE families, and second, to analyse candidate SINE sequences using specifically developed tools. This article presents an overview of the database and the process of SINE identification and analysis.