Transposable elements (TEs) are mobile, repetitive sequences that make up significant fractions of metazoan genomes. Despite their near ubiquity and importance in genome and chromosome biology, most efforts to annotate TEs in genome sequences rely on the results of a single computational program, RepeatMasker. In contrast, recent advances in gene annotation indicate that high-quality gene models can be produced by combining multiple independent sources of computational evidence. To raise the quality of TE annotations to a level comparable to that of gene models, we have developed a combined-evidence TE annotation pipeline, analogous to systems used for gene annotation, that integrates results from multiple homology-based and de novo TE identification methods. As proof of principle, we have annotated "TE models" in Drosophila melanogaster Release 4 genomic sequences using the combined computational evidence derived from RepeatMasker, BLASTER, TBLASTX, all-by-all BLASTN, RECON, TE-HMM and the previous Release 3.1 annotation. Our system is designed for use with the Apollo genome annotation tool, allowing automatic results to be curated manually to produce reliable annotations. The euchromatic TE fraction of D. melanogaster is now estimated at 5.3% (cf. 3.86% in Release 3.1), and we found substantially more TEs (n = 6,013) than previously identified (n = 1,572). Most of the new TEs derive from small fragments a few hundred nucleotides long and from highly abundant families not previously annotated (e.g., INE-1). We also estimated that 518 TE copies (8.6%) are inserted into at least one other TE, forming a nest of elements. The pipeline allows rapid and thorough annotation of even the most complex TE models, including highly deleted and/or nested elements such as those often found in heterochromatic sequences. Our pipeline can be easily adapted to other genome sequences, such as those of the D. melanogaster heterochromatin or other species in the genus Drosophila.
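The core idea of a combined-evidence pipeline can be illustrated with a minimal Python sketch: overlapping hits from several programs are pooled into candidate regions, and each region records which programs support it. This sketch assumes each program's output has already been reduced to half-open intervals on a single sequence. The `Hit`/`Model` classes, the `merge_evidence` function and the `min_sources` threshold are hypothetical names for illustration. The sketch also ignores strand, TE family consistency and nesting, all of which a real pipeline would need to handle.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    start: int   # 0-based, inclusive
    end: int     # exclusive (half-open interval)
    source: str  # program that produced the hit (hypothetical input format)


@dataclass
class Model:
    start: int
    end: int
    sources: set = field(default_factory=set)


def merge_evidence(hits, min_sources=1):
    """Merge overlapping hits from different programs into candidate TE regions.

    Only regions supported by at least `min_sources` distinct programs are kept.
    Strand and family labels are ignored in this simplified sketch.
    """
    models = []
    for hit in sorted(hits, key=lambda h: (h.start, h.end)):
        if models and hit.start < models[-1].end:
            # Overlaps the current region: extend it and record the supporting program.
            last = models[-1]
            last.end = max(last.end, hit.end)
            last.sources.add(hit.source)
        else:
            models.append(Model(hit.start, hit.end, {hit.source}))
    return [m for m in models if len(m.sources) >= min_sources]


# Toy coordinates, made up for illustration only.
hits = [Hit(100, 500, "RepeatMasker"), Hit(450, 900, "BLASTER"), Hit(2000, 2300, "RECON")]
supported = merge_evidence(hits, min_sources=2)
```

Raising `min_sources` keeps only regions that several independent methods agree on. This is one simple way to trade sensitivity for reliability before the manual curation step described above.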
A first step in adding value to the large-scale DNA sequences generated by genome projects is the process of annotation: marking biological features on the raw string of adenines, cytosines, guanines, and thymines. The predominant goal in genome annotation thus far has been to identify gene sequences that encode proteins; however, many functional sequences exist in non-protein-coding regions, and their annotation remains incomplete. Mobile, repetitive DNA segments known as transposable elements (TEs) are one class of functional sequence in non-protein-coding regions. They can make up large fractions of genome sequences (e.g., about 45% in the human) and can play important roles in gene and chromosome structure and regulation. As a consequence, there has been increasing interest in the computational identification of TEs in genome sequences. Borrowing current ideas from the field of gene annotation, the authors have developed a pipeline to predict TEs in genome sequences that combines multiple sources of evidence from different computational methods. The authors' combined-evidence pipeline represents an important step towards raising the standards of TE annotation to the same quality as that of genes, and should help catalyze our understanding of the biological role of these fascinating sequences.
Although transposable elements (TEs) are known to be potent sources of mutation, their contribution to the generation of recent adaptive changes has never been systematically assessed. In this work, we conduct a genome-wide screen for adaptive TE insertions in Drosophila melanogaster that have taken place during or after the spread of this species out of Africa. We determine population frequencies of 902 of the 1,572 TEs in Release 3 of the D. melanogaster genome and identify a set of 13 putatively adaptive TEs. These 13 TEs increased in population frequency sharply after the spread out of Africa. We argue that many of these TEs are in fact adaptive by demonstrating that the regions flanking five of these TEs display signatures of partial selective sweeps. Furthermore, we show that eight out of the 13 putatively adaptive elements show population frequency heterogeneity consistent with these elements playing a role in adaptation to temperate climates. We conclude that TEs have contributed considerably to recent adaptive evolution (one TE-induced adaptation every 200–1,250 y). The majority of these adaptive insertions are likely to be involved in regulatory changes. Our results also suggest that TE-induced adaptations arise more often from standing variants than from new mutations. Such a high rate of TE-induced adaptation is inconsistent with the number of fixed TEs in the D. melanogaster genome, and we discuss possible explanations for this discrepancy.
Transposable elements (TEs) are present in virtually all species and often contribute a substantial fraction of the genome size. Understanding the functional roles, evolution, and population dynamics of TEs is essential to understanding genome evolution and function. Much of our knowledge about TE population dynamics and evolution comes from studies of TEs in Drosophila. However, the adaptive importance of TEs in the Drosophila genome has never been assessed. In this work, we describe the first comprehensive genome-wide screen for recent adaptive TE insertions in D. melanogaster. Using several independent criteria, we identify a set of 13 adaptive TEs and estimate that 25–50 TEs have played adaptive roles since the migration of D. melanogaster out of Africa. We show that most of these adaptive TEs are likely to be involved in regulatory changes and appear to be involved in adaptation to temperate climates. We argue that most identified adaptive TEs are destined to be lost from the D. melanogaster population, but that they do contribute significantly to local adaptation in this species.
Transposable elements contributed substantially to the adaptation of D. melanogaster to the out-of-Africa environments. The majority of these adaptive insertions are likely to be involved in regulatory changes.
Annotation of an improved whole-genome shotgun assembly of the Drosophila melanogaster genome predicted 297 protein-coding genes and six non-protein-coding genes, including known heterochromatic genes, and regions of similarity to known transposable elements. Fluorescence in situ hybridization was used to correlate the genomic sequence with the cytogenetic map; the annotated euchromatic sequence extends into the centric heterochromatin on each chromosome arm.
Most eukaryotic genomes include a substantial repeat-rich fraction termed heterochromatin, which is concentrated in centric and telomeric regions. The repetitive nature of heterochromatic sequence makes it difficult to assemble and analyze. To better understand the heterochromatic component of the Drosophila melanogaster genome, we characterized and annotated portions of a whole-genome shotgun sequence assembly.
WGS3, an improved whole-genome shotgun assembly, includes 20.7 Mb of draft-quality sequence not represented in the Release 3 sequence spanning the euchromatin. We annotated this sequence using the methods employed in the re-annotation of the Release 3 euchromatic sequence. This analysis predicted 297 protein-coding genes and six non-protein-coding genes, including known heterochromatic genes, and regions of similarity to known transposable elements. Bacterial artificial chromosome (BAC)-based fluorescence in situ hybridization analysis was used to correlate the genomic sequence with the cytogenetic map in order to refine the genomic definition of the centric heterochromatin; on the basis of our cytological definition, the annotated Release 3 euchromatic sequence extends into the centric heterochromatin on each chromosome arm.
Whole-genome shotgun assembly produced a reliable draft-quality sequence of a significant part of the Drosophila heterochromatin. Annotation of this sequence defined the intron-exon structures of 30 known protein-coding genes and 267 protein-coding gene models. The cytogenetic mapping suggests that an additional 150 predicted genes are located in heterochromatin at the base of the Release 3 euchromatic sequence. Our analysis suggests strategies for improving the sequence and annotation of the heterochromatic portions of the Drosophila and other complex genomes.
Transposable elements are the most abundant components of all characterized genomes of higher eukaryotes. It has been documented that these elements not only contribute to the shaping and reshaping of their host genomes, but also play significant roles in regulating gene expression, altering gene function, and creating new genes. Thus, complete identification of transposable elements in sequenced genomes and construction of comprehensive transposable element databases are essential for accurate annotation of genes and other genomic components, for investigation of potential functional interaction between transposable elements and genes, and for study of genome evolution. The recent availability of the soybean genome sequence has provided an unprecedented opportunity for discovery, and structural and functional characterization of transposable elements in this economically important legume crop.
Using a combination of structure-based and homology-based approaches, a total of 32,552 retrotransposons (Class I) and 6,029 DNA transposons (Class II) with clear boundaries and insertion sites were structurally annotated and clearly categorized, and a soybean transposable element database, SoyTEdb, was established. These transposable elements have been anchored in and integrated with the soybean physical map and genetic map, and are browsable and visualizable at any scale along the 20 soybean chromosomes, along with predicted genes and other sequence annotations. BLAST search and other infrastructure tools were implemented to facilitate annotation of transposable elements or fragments from soybean and other related legume species. The majority (> 95%) of these elements (particularly a few hundred low-copy-number families) are first described in this study.
SoyTEdb provides resources and information related to transposable elements in the soybean genome, representing the most comprehensive and the largest manually curated transposable element database for any individual plant genome completely sequenced to date. Transposable elements previously identified in legumes, the third largest family of flowering plants, are relatively scarce. Thus this database will facilitate structural, evolutionary, functional, and epigenetic analyses of transposable elements in soybean and other legume species.
Transposable elements represent a large proportion of the eukaryotic genomes. Long Terminal Repeat (LTR) retrotransposons are very abundant and constitute the predominant family of transposable elements in plants. Recent studies have identified chromoviruses to be a widely distributed lineage of Gypsy elements. These elements contain chromodomains in their integrases, which suggests a preference for insertion into heterochromatin. In turn, this preference might have contributed to the patterning of heterochromatin observed in host genomes. Despite their potential importance for our understanding of plant genome dynamics and evolution, the regulatory mechanisms governing the behavior of chromoviruses and their activities remain largely uncharacterized. Here, we report a detailed analysis of the spatio-temporal activity of a plant chromovirus in the endogenous host. We examined LORE1a, a member of the endogenous chromovirus LORE1 family from the model legume Lotus japonicus. We found that this chromovirus is stochastically de-repressed in plant populations regenerated from de-differentiated cells and that LORE1a transposes in the male germline. Bisulfite sequencing of the 5′ LTR and its surrounding region suggests that tissue culture induces a loss of epigenetic silencing of LORE1a. Since LTR promoter activity is pollen specific, as shown by the analysis of transgenic plants containing an LTR::GUS fusion, we conclude that male germline-specific LORE1a transposition in pollen grains is controlled transcriptionally by its own cis-elements. New insertion sites of LORE1a copies were frequently found in genic regions and show no strong insertional preferences. These distinctive novel features of LORE1 indicate that this chromovirus has considerable potential for generating genetic and epigenetic diversity in the host plant population. Our results also define conditions for the use of LORE1a as a genetic tool.
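Bisulfite sequencing reads methylation state from C-to-T conversion. Sodium bisulfite converts unmethylated cytosines so that they read as T, while methylated cytosines are protected and still read as C. The minimal Python sketch below counts these two outcomes at the reference C positions of one read. It assumes the read is already aligned to the same strand of the reference without gaps, and `methylation_calls` is a hypothetical helper. Plant analyses usually also split calls by sequence context (CG, CHG, CHH), which this sketch ignores.

```python
def methylation_calls(reference, read):
    """Count methylated vs unmethylated cytosines in one bisulfite read.

    Assumes `read` is gap-free and aligned to the same strand as `reference`.
    At each reference C: read C -> methylated (protected from conversion),
    read T -> unmethylated (converted). Other read bases are ignored.
    """
    methylated = unmethylated = 0
    for ref_base, read_base in zip(reference.upper(), read.upper()):
        if ref_base != "C":
            continue
        if read_base == "C":
            methylated += 1
        elif read_base == "T":
            unmethylated += 1
    return methylated, unmethylated


# Toy sequences, made up for illustration only.
meth, unmeth = methylation_calls("ACGTCAGCTC", "ACGTTAGCTT")
level = meth / (meth + unmeth)  # fraction of reference Cs that stayed C
```

Summing these counts over many reads of the 5′ LTR region gives per-position or per-region methylation levels. Comparing those levels before and after tissue culture is the kind of evidence the text refers to.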
In contrast to animals, where germline differentiation initiates early in embryogenesis, germline differentiation in plants starts in the adult phase during reproductive development. Transpositions of transposable elements in both somatic and gametic cells can be transmitted to the next generation. As a result, plant genomes may contain transposable elements exhibiting a variety of tissue-specific activities. Thus far, the spatio-temporal activity of LTR retrotransposons, the most abundant class of transposable elements in plants, has not been well characterized. Here, we report a detailed analysis of the spatio-temporal transposition pattern of a plant LTR retrotransposon in the endogenous system. Using the model legume Lotus japonicus, we found that LORE1a, a member of the chromovirus LORE1 family that belongs to the Gypsy superfamily, was epigenetically de-repressed via tissue culture. Activation was stochastic and derepression was maintained in regenerated plants. This feature made it possible to trace the original spatio-temporal activity of the retrotransposon in the intact plants. We determined that the plant chromovirus retrotransposes mainly in the male germline, without obvious insertional preferences for chromosomal regions. This finding suggests that the tissue specificity of transposable elements should be taken into account when considering their impact on the host genome dynamics and evolution.
Here we present computational machinery to efficiently and accurately identify transposable element (TE) insertions in 146 next-generation sequenced inbred strains of Drosophila melanogaster. The panel of lines we use in our study is composed of strains from a pair of genetic mapping resources: the Drosophila Genetic Reference Panel (DGRP) and the Drosophila Synthetic Population Resource (DSPR). We identified 23,087 TE insertions in these lines, of which 83.3% are found in only one line. There are marked differences in the distribution of elements over the genome, with TEs found at higher densities on the X chromosome and in regions of low recombination. We also identified many more TEs per base pair of intronic sequence and fewer TEs per base pair of exonic sequence than expected if TEs were inserted at random locations in the euchromatic genome. There was substantial variation in TE load across genes. For example, the paralogs derailed and derailed-2 show a significant difference in the number of TE insertions, potentially reflecting differences in the selection acting on these loci. When considering gene families, we find a very weak effect of gene family size on TE insertions per gene, indicating that as gene family size increases, the number of TE insertions in a given gene within that family also increases. TEs are known to be associated with certain phenotypes, and our data will allow investigators using the DGRP and DSPR to assess the functional role of TE insertions in complex trait variation more generally. Notably, because most TEs are very rare and often private to a single line, causative TEs resulting in phenotypic differences among individuals may typically fail to replicate across mapping panels, since individual elements are unlikely to segregate in both panels. Our data suggest that "burden tests" that test for the effect of TEs as a class may be more fruitful.
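"More or fewer TEs than expected if TEs were inserted at random" can be made concrete as an observed/expected ratio. Under uniform random placement, the expected count in a category is the total number of insertions scaled by that category's share of the sequence considered. The minimal Python sketch below computes these ratios. `enrichment` is a hypothetical helper, and the counts and lengths are made-up toy values, not data from the study. A real analysis would add a significance test (for example, a binomial test per category) and would restrict the lengths to sequence where insertions could actually be detected.

```python
def enrichment(observed, lengths):
    """Observed/expected TE insertion counts per annotation category.

    observed: dict category -> number of TE insertions in that category
    lengths:  dict category -> total base pairs of that category
    Expected counts assume insertions fall uniformly at random over the
    summed length of all categories supplied (each length must be > 0).
    """
    total_insertions = sum(observed.values())
    total_bp = sum(lengths.values())
    ratios = {}
    for category, bp in lengths.items():
        expected = total_insertions * bp / total_bp
        ratios[category] = observed.get(category, 0) / expected
    return ratios


# Toy numbers for illustration only (lengths in arbitrary units).
ratios = enrichment(
    observed={"intron": 60, "exon": 5, "intergenic": 35},
    lengths={"intron": 40, "exon": 20, "intergenic": 40},
)
```

A ratio above 1 (introns here) means more insertions than random placement predicts, and a ratio below 1 (exons here) means fewer. The latter is the pattern expected if insertions into coding sequence are removed by purifying selection.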
transposable element; DGRP; DSPR; genomics; population genetics
Previous studies have implicated repetitive elements (REs) mechanistically in the generation of new chimeric genes. Such examples are consistent with the classic model for exon shuffling, which relies on non-homologous recombination. However, recent data on chromosomal aberrations in model organisms suggest that ectopic homology-dependent recombination may also be important. The lack of a dataset of experimentally verified young duplicates has hampered an effective examination of these models, as well as an investigation of the sequence features that mediate the rearrangements. Here we use ∼7,000 cDNA probes (∼112,000 primary images) to screen eight species within the Drosophila melanogaster subgroup and identify 17 duplicates that were generated through ectopic recombination within the last 12 million years. Most of these are functional and have evolved divergent expression patterns and novel chimeric structures. Examination of their flanking sequences revealed an excess of repetitive sequences associated with the new genes, the majority belonging to the transposable element DNAREP1 family. Our dataset strongly suggests an important role for REs in the generation of chimeric genes within these species.
In numerous organisms, many new genes have been found to originate through dispersed gene duplication and exon/domain shuffling. What recombination mechanisms were involved in the duplication and the shuffling processes? Lack of the intermediate products of recombination that share adequate sequence identity between homologous sequences, or the parental sequences from which the new genes were derived, often makes answering these questions difficult. We identified a number of young genes that originated in recently diverged branches in the evolutionary tree of the eight Drosophila melanogaster subgroup species, by using fluorescence in situ hybridization with polytene chromosomes. We analyzed the genomic regions surrounding 17 new dispersed duplicate genes and observed that most of these genes are flanked by repetitive elements (REs), including a large and diverged transposable element family, DNAREP1. Several copies of these REs are kept in both new and parental gene regions, and their degeneration is correlated with the increasing ages of the identified new genes. These data suggest that REs mediate the recombination responsible for the new gene origination.
The benefits of ever-growing numbers of sequenced eukaryotic genomes will not be fully realized until we learn to decipher vast stretches of noncoding DNA, largely composed of transposable elements. Transposable elements persist through self-replication, but some genes once encoded by transposable elements have, through a process called molecular domestication, evolved new functions that increase fitness. Although they have conferred numerous adaptations, the number of such domesticated transposable element genes remains unknown, so their evolutionary and functional impact cannot be fully assessed. Systematic searches that exploit genomic signatures of natural selection have been employed to identify potential domesticated genes, but their predictions have yet to be experimentally verified. To this end, we investigated a family of domesticated genes called MUSTANG (MUG), identified in a previous bioinformatic search of plant genomes. We show that MUG genes are functional. Mutants of Arabidopsis thaliana MUG genes yield phenotypes with severely reduced plant fitness through decreased plant size, delayed flowering, abnormal development of floral organs, and markedly reduced fertility. MUG genes are present in all flowering plants, but not in any non-flowering plant lineages, such as gymnosperms, suggesting that the molecular domestication of MUG may have been an integral part of early angiosperm evolution. This study shows that systematic searches can be successful at identifying functional genetic elements in noncoding regions and demonstrates how to combine systematic searches with reverse genetics in a fruitful way to decipher eukaryotic genomes.
The genomes of complex organisms are mostly made up not of ordinary genes but of transposable elements. Transposable elements have been called “selfish DNA” because they normally persist by copying themselves, not by helping the organism to survive or reproduce. Yet transposable elements can help organisms to evolve; for instance, transposable element genes sometimes acquire new functions that do benefit the organism. Because they are difficult to distinguish from transposable elements, little is known about these “domesticated genes.” Although studies have attempted to identify them computationally, the predictions have not been verified experimentally. Here, we examine some of the first domesticated genes to be predicted computationally, the MUSTANG family of plant genes. We show that the predictions were correct: MUSTANGs are, like ordinary genes, functional. MUSTANG mutations result in serious defects in how plants grow, flower, and reproduce. Since they are present only in flowering plants, MUSTANG probably originated when flowers first evolved, perhaps taking on a key role. This study is important both because it shows that MUSTANG is critical to plant fitness and because, in the future, a similar approach can be used to find additional domesticated genes and to better understand how transposable elements contribute to evolution.
An analysis of high-resolution transposable element annotations in Drosophila melanogaster suggests the existence of a global surveillance system against the majority of transposable element families in the fly.
The recent availability of genome sequences has provided unparalleled insights into the broad-scale patterns of transposable element (TE) sequences in eukaryotic genomes. Nevertheless, the difficulties that TEs pose for genome assembly and annotation have prevented detailed, quantitative inferences about the contribution of TEs to genome sequences.
Using a high-resolution annotation of TEs in the Release 4 genome sequence, we revise estimates of TE abundance in Drosophila melanogaster. We show that TEs are non-randomly distributed within regions of high and low TE abundance, and that pericentromeric regions with high TE abundance are mosaics of distinct regions of extreme and normal TE density. Comparative analysis revealed that this punctate pattern evolves jointly by transposition and duplication, but not by inversion of TE-rich regions from unsequenced heterochromatin. Analysis of genome-wide patterns of TE nesting revealed a 'nesting network' that includes virtually all of the known TE families in the genome. Numerous directed cycles exist among TE families in the nesting network, implying concurrent or overlapping periods of transpositional activity.
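A nesting network can be modeled as a directed graph with an edge from family A to family B whenever a copy of A is inserted into a copy of B. A directed cycle such as A → B → A means each family has inserted into the other. This is why cycles suggest overlapping periods of activity. The minimal Python sketch below finds one such cycle with a depth-first search. `find_cycle` is a hypothetical helper, and the family labels are placeholders rather than real D. melanogaster families.

```python
def find_cycle(edges):
    """Return one directed cycle (as a list of nodes, first == last) or None.

    edges: iterable of (nested_family, host_family) pairs, meaning a copy of
    nested_family is inserted into a copy of host_family.
    """
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color = {node: WHITE for node in graph}
    path = []

    def dfs(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:
                # Back edge to a node on the current path closes a cycle.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = dfs(node)
            if cycle:
                return cycle
    return None


# Placeholder family names for illustration only.
cycle = find_cycle([("famA", "famB"), ("famB", "famC"), ("famC", "famA")])
```

Enumerating all cycles, rather than just detecting one, would need an algorithm such as Johnson's, for example via a graph library. The recursion here is fine for networks with a few hundred families.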
Rapid restructuring of the genomic landscape by transposition and duplication has recently added hundreds of kilobases of TE sequence to pericentromeric regions in D. melanogaster. These events create ragged transitions between unique and repetitive sequences in the zone between euchromatic and beta-heterochromatic regions. Complex relationships of TE nesting in beta-heterochromatic regions raise the possibility of a co-suppression network that may act as a global surveillance system against the majority of TE families in D. melanogaster.
Transposable elements are mobile DNA sequences that integrate into host genomes using diverse mechanisms with varying degrees of target site specificity. While the target site preferences of some engineered transposable elements are well studied, the natural target preferences of most transposable elements are poorly characterized. Using population genomic resequencing data from 166 strains of Drosophila melanogaster, we identified over 8,000 new insertion sites not present in the reference genome sequence that we used to decode the natural target preferences of 22 families of transposable element in this species. We found that terminal inverted repeat transposon and long terminal repeat retrotransposon families present clade-specific target site duplications and target site sequence motifs. Additionally, we found that the sequence motifs at transposable element target sites are always palindromes that extend beyond the target site duplication. Our results demonstrate the utility of population genomics data for high-throughput inference of transposable element targeting preferences in the wild and establish general rules for terminal inverted repeat transposon and long terminal repeat retrotransposon target site selection in eukaryotic genomes.
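Two of the features discussed here are easy to state precisely in code. A target site duplication (TSD) is the short host sequence that appears on both sides of an insertion. A palindromic target motif is one that equals its own reverse complement. The minimal Python sketch below checks both properties. `find_tsd` is a hypothetical helper that assumes the left flank ends immediately before the insertion and the right flank starts immediately after it. The `min_len`/`max_len` bounds are illustrative, and short matches can arise by chance.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(seq):
    """True if seq reads the same on both strands (equals its reverse complement)."""
    seq = seq.upper()
    return seq == revcomp(seq)


def find_tsd(left_flank, right_flank, min_len=2, max_len=12):
    """Longest sequence that ends the left flank and also begins the right flank.

    Returns None if no duplication of at least min_len (>= 1) bases is found.
    """
    longest = min(max_len, len(left_flank), len(right_flank))
    for k in range(longest, min_len - 1, -1):
        if left_flank[-k:] == right_flank[:k]:
            return left_flank[-k:]
    return None


# Toy sequences for illustration only.
tsd = find_tsd("GGCATTAG", "TTAGCCGT")
```

Collecting TSDs and their surrounding bases across many new insertions, then building a position-specific base composition, is one way to look for the kind of palindromic target motifs described above.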
Most known eukaryotic genomes contain mobile, multicopy elements called transposable elements. In some species, these elements account for the majority of the genome sequence. During transposition, they have been subject to many mutations and other genomic events (copies, deletions, captures), and identifying these transformations remains difficult. The study of transposable element families is generally founded on a multiple alignment of their sequences, a critical step that works well for transposons whose differences consist mostly of localized nucleotide mutations. Many transposons that have lost their protein-coding capacity have undergone more complex rearrangements, which require more sophisticated methods to characterize the architecture of their sequence variation.
In this study, we introduce the concept of a transposable element module, a flexible motif present in at least two sequences of a family of transposable elements and built on a succession of maximal repeats. We propose an assembly method that works on a set of exact maximal repeats of a set of sequences to create such modules. It produces a graphical view of sequences segmented into modules, a representation that allows a flexible analysis of the transformations that have occurred between them. As a demonstration data set, we analyzed the transposable element Foldback in Drosophila melanogaster in depth. Comparison with multiple alignment methods shows that our method is more sensitive for highly variable sequences. The study of this family and of two other families, AtREP21 and SIDER2, reveals new copies of very different sizes and various combinations of modules, which show the potential of our method.
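The building block described here is the exact maximal repeat: an exact match that cannot be extended to the left or right without a mismatch. The minimal Python sketch below finds such maximal exact matches between two sequences with a naive quadratic scan. `maximal_exact_matches` is a hypothetical helper. It covers only the pairwise primitive, not the module assembly step or the set-wide repeats the method actually works on, and practical tools would use suffix trees or suffix arrays for efficiency.

```python
def maximal_exact_matches(a, b, min_len=3):
    """Exact matches between a and b that cannot be extended left or right.

    Returns (i, j, length) triples: a[i:i+length] == b[j:j+length].
    Naive O(len(a) * len(b)) scan, intended only as an illustration.
    """
    matches = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip starts that could be extended one base to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right until a mismatch or the end of either sequence.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k >= min_len:
                matches.append((i, j, k))
    return matches


# Toy sequences for illustration only.
mems = maximal_exact_matches("ACGTAC", "GGACGTTT")
```

Chaining such repeats shared by two or more family members is, loosely, how the text describes building a module.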
ModuleOrganizer is available on the Genouest bioinformatics center at http://moduleorganizer.genouest.org
Transposable elements are a common source of genetic variation that may play a substantial role in contributing to gene expression variation. However, the contribution of transposable elements to expression variation thus far consists of a handful of examples. We used previously published gene expression data from 37 inbred Drosophila melanogaster lines from the Drosophila Genetic Reference Panel to perform a genome-wide assessment of the effects of transposable elements on gene expression. We found thousands of transcripts with transposable element insertions in or near the transcript and that the presence of a transposable element in or near a transcript is significantly associated with reductions in expression. We estimate that within this example population, ∼2.2% of transcripts have a transposable element insertion, which significantly reduces expression in the line containing the transposable element. We also find that transcripts with insertions within 500 bp of the transcript show on average a 0.67 standard deviation decrease in expression level. These large decreases in expression level are most pronounced for transposable element insertions close to transcripts and the effect diminishes for more distant insertions. This work represents the first genome-wide analysis of gene expression variation due to transposable elements and suggests that transposable elements are an important class of mutation underlying expression variation in Drosophila and likely in other systems, given the ubiquity of these mobile elements in eukaryotic genomes.
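An effect reported in standard deviation units, such as the 0.67 SD decrease mentioned here, can be understood as a z-score. The transcript's expression is standardized across all lines, and the z-scores of the lines carrying an insertion are then averaged. The minimal Python sketch below does this for a single transcript. `insertion_effect_sd` is a hypothetical helper, the expression values are made up, and whether the study standardized exactly this way (for example, including or excluding carrier lines from the mean) is an assumption.

```python
from statistics import mean, pstdev


def insertion_effect_sd(expression, carriers):
    """Average z-score of lines carrying a TE insertion for one transcript.

    expression: dict line -> expression level of the transcript
    carriers:   set of lines with an insertion in or near the transcript
    z-scores are computed across all lines; a result of -0.67 would mean the
    carriers sit 0.67 standard deviations below the panel mean.
    Requires non-constant expression (standard deviation > 0).
    """
    values = list(expression.values())
    mu, sd = mean(values), pstdev(values)
    return mean((expression[line] - mu) / sd for line in carriers)


# Made-up expression values for illustration only.
expr = {"line_1": 10.0, "line_2": 12.0, "line_3": 14.0, "line_4": 4.0}
effect = insertion_effect_sd(expr, {"line_4"})
```

Averaging such per-transcript effects over all transcripts with nearby insertions, binned by distance, would show how the effect decays with distance from the transcript.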
transposable elements; gene expression; rare alleles of large effect; DGRP
Transposable elements (TEs) are mobile genetic elements that parasitize genomes by semi-autonomously increasing their own copy number within the host genome. While TEs are important for genome evolution, appropriate methods for performing unbiased genome-wide surveys of TE variation in natural populations have been lacking. Here, we describe a novel and cost-effective approach for estimating population frequencies of TE insertions using paired-end Illumina reads from a pooled population sample. Importantly, the method treats insertions present in and absent from the reference genome identically, allowing unbiased TE population frequency estimates. We apply this method to data from a natural Drosophila melanogaster population from Portugal. Consistent with previous reports, we show that low recombining genomic regions harbor more TE insertions and maintain insertions at higher frequencies than do high recombining regions. We conservatively estimate that there are almost twice as many “novel” TE insertion sites as sites known from the reference sequence in our population sample (6,824 novel versus 3,639 reference sites, with on average a 31-fold coverage per insertion site). Different families of transposable elements show large differences in their insertion densities and population frequencies. Our analyses suggest that the history of TE activity significantly contributes to this pattern, with recently active families segregating at lower frequencies than those active in the more distant past. Finally, using our high-resolution TE abundance measurements, we identified 13 candidate positively selected TE insertions based on their high population frequencies and on low Tajima's D values in their neighborhoods.
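In a pooled sample, the population frequency of an insertion can be estimated from two kinds of read evidence at the site: reads supporting presence of the TE and reads supporting its absence. The absence reads are those spanning the empty insertion site. Applying the same ratio whether or not the insertion is in the reference is what makes the estimate treat reference and non-reference insertions identically. The minimal Python sketch below shows this ratio. `insertion_frequency` is a hypothetical helper, and the exact read classification and filters used in the study are not reproduced here.

```python
def insertion_frequency(support_presence, support_absence):
    """Estimated population frequency of one TE insertion in a pooled sample.

    support_presence: reads (or read pairs) indicating the TE is present
    support_absence:  reads (or read pairs) spanning the empty site
    The same formula is used for reference and non-reference insertions.
    """
    total = support_presence + support_absence
    if total == 0:
        raise ValueError("no informative reads at this site")
    return support_presence / total


# Toy read counts for illustration only.
freq = insertion_frequency(8, 24)
```

Because every individual in the pool contributes reads in proportion to its share of the DNA, this ratio tracks allele frequency only when coverage is high. This is why such studies filter sites by minimum coverage.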
Transposable elements (TEs) are parasitic genetic elements that spread by replicating themselves within a host genome. Most organisms are burdened with transposable elements; in fact, up to 80% of some genomes can consist of TE-derived DNA. Here, we use new sequencing technology to examine variation in genomic TE composition within a population at a finer scale and in a more unbiased fashion than has been possible before. We study a Portuguese population of D. melanogaster and find a large number of TE insertions, most of which occur in few individuals. Our analysis confirms that TE insertions are subject to purifying selection that counteracts their spread, and it suggests that the genome records waves of past TE invasions, with recently active elements occurring at low population frequency. We also find indications that TE insertions may sometimes have beneficial effects.
Eight terminally deleted Drosophila melanogaster chromosomes have now been found to be "healed." In each case, the healed chromosome end had acquired sequence from the HeT DNA family, a complex family of repeated sequences found only in telomeric and pericentric heterochromatin. The sequences were apparently added by transposition events involving no sequence homology. We now report that the sequences transposed in healing these chromosomes identify a novel transposable element, HeT-A, which makes up a subset of the HeT DNA family. Addition of HeT-A elements to broken chromosome ends appears to be polar. The proximal junction between each element and the broken chromosome end is an oligo(A) tract beginning 54 nucleotides downstream from a conserved AATAAA sequence on the strand running 5' to 3' from the chromosome end. The distal (telomeric) ends of HeT-A elements are variably truncated; however, we have not yet been able to determine the extreme distal sequence of a complete element. Our analysis covers approximately 2,600 nucleotides of the HeT-A element, beginning with the oligo(A) tract at one end. Sequence homology is strong (greater than 75% between all elements studied). Sequence may be conserved for DNA structure rather than for protein coding; even the most recently transposed HeT-A elements lack significant open reading frames in the region studied. Instead, the elements exhibit conserved short-range sequence repeats and periodic long-range variation in base composition. These conserved features suggest that HeT-A elements, although transposable elements, may have a structural role in telomere organization or maintenance.
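"Periodic long-range variation in base composition" is usually examined with a sliding-window profile: the fraction of A+T (or G+C) in successive windows along the sequence. The minimal Python sketch below computes such a profile. `window_at_content` is a hypothetical helper, and the window and step sizes are arbitrary defaults, not values from the study. Periodicity would then be assessed on the resulting profile, for example by autocorrelation.

```python
def window_at_content(seq, window=100, step=50):
    """Return (start, A+T fraction) for each full window along seq."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        at = sum(1 for base in chunk if base in "AT")
        profile.append((start, at / window))
    return profile


# Toy sequence for illustration only.
profile = window_at_content("AAAAGGGGAAAA", window=4, step=4)
```

Overlapping windows (step smaller than window) give a smoother profile. Non-overlapping windows, as in the toy call, make each value independent.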
Hybrid incompatibilities (HIs) cause reproductive isolation between species and thus contribute to speciation. Several HI genes encode adaptively evolving proteins that localize to or interact with heterochromatin, suggesting that HIs may result from co-evolution with rapidly evolving heterochromatic DNA. Little is known, however, about the intraspecific function of these HI genes, the specific sequences they interact with, or the evolutionary forces that drive their divergence. The genes Hmr and Lhr genetically interact to cause hybrid lethality between Drosophila melanogaster and D. simulans, yet Hmr and Lhr mutants are viable. Here, we report that Hmr and Lhr encode proteins that form a heterochromatic complex with Heterochromatin Protein 1 (HP1a). Using RNA-Seq analyses, we discovered that Hmr and Lhr are required to repress transcripts from satellite DNAs and many families of transposable elements (TEs). By comparing Hmr and Lhr function between D. melanogaster and D. simulans, we identify several satellite DNAs and TEs that are differentially regulated between the species. Hmr and Lhr mutations also cause massive overexpression of telomeric TEs and significant telomere lengthening. Hmr and Lhr therefore regulate three types of heterochromatic sequences that are responsible for the significant differences in genome size and structure between D. melanogaster and D. simulans and have high potential to cause genetic conflicts with host fitness. We further find that many TEs are overexpressed in hybrids but that those specifically mis-expressed in lethal hybrids do not closely correlate with Hmr function. Our results therefore argue that adaptive divergence of heterochromatin proteins in response to repetitive DNAs is an important underlying force driving the evolution of hybrid incompatibility genes, but that hybrid lethality likely results from novel epistatic genetic interactions that are specific to the hybrid background.
Sister species capable of mating often produce hybrids that are sterile or die during development. This reproductive isolation is caused by incompatibilities between the two sister species' genomes. Some hybrid incompatibility (HI) genes encode rapidly evolving proteins that localize to heterochromatin. Heterochromatin is largely made up of highly repetitive transposable elements and satellite DNAs. It has been hypothesized that rapid changes in heterochromatic DNA drive the changes in these HI genes and thus the evolution of reproductive isolation. In support of this model, we show that two rapidly evolving HI proteins, Lhr and Hmr, which reproductively isolate the fruit fly sister species D. melanogaster and D. simulans, repress transposable elements and satellite DNAs. These proteins also help regulate the length of the atypical Drosophila telomeres, which are themselves made of domesticated transposable elements. Our data suggest that these proteins are part of the adaptive machinery that allows the host to respond to changes and increases in heterochromatin and to maintain the activity of genes located within or adjacent to heterochromatin.
Mobile genetic elements make up a large proportion of eukaryotic genomes. In maize, 85% of the genome is composed of transposable elements of several families. The first step in the transposable element life cycle is the synthesis of an RNA, but little is known about the regulation of transcription for most maize transposable element families. Maize is the plant species with the largest number of sequenced ESTs (more than two million) and ranks third among all species, after human and mouse. This allowed us to analyze the transcriptional activity of maize transposable elements using EST databases.
We have investigated the transcriptional activity of 56 families of transposable elements in different maize organs through a systematic search of more than two million expressed sequence tags. At least 1.5% of maize ESTs show sequence similarity to transposable elements. According to these data, expression patterns vary among transposable element families, even within the same class of elements. In general, the transcriptional activity of gypsy-like retrotransposons is higher than that of other classes. Transcriptional activity of several transposable elements is especially high in the shoot apical meristem and in sperm cells. Sequence comparisons between genomic and transcribed sequences suggest that only a few copies are transcriptionally active.
The use of powerful high-throughput sequencing methodologies allowed us to elucidate the extent and character of repetitive element transcription in maize cells. The finding that some families of transposable elements have a considerable transcriptional activity in some tissues suggests that, either transposition is more frequent than previously expected, or cells can control transposition at a post-transcriptional level.
The mariner family of transposable elements is one of the most widespread in the Metazoa. It is subdivided into several subfamilies whose distribution does not mirror the phylogeny of their host species, suggesting an ancient diversification. Previous hybridization and PCR studies provided a partial survey of mariner diversity in the Metazoa. In this work, we used a comparative genomics approach to assess the genus-wide diversity and evolution of mariner transposable elements in twenty sequenced Drosophila genomes.
We identified 36 different mariner lineages belonging to six distinct subfamilies, including a subfamily not described previously. Wide variation in lineage abundance and copy number was observed among species and among mariner lineages, suggesting continuous turnover. Most mariner lineages are inactive and contain a high proportion of damaged copies. We showed that, in addition to substitutions that rapidly inactivate copies, internal deletion is a major mechanism contributing to element decay and the generation of non-autonomous sublineages. As a result, 23% of copies belong to several Miniature Inverted-repeat Transposable Element (MITE) sublineages, the first mariner-derived MITEs described in Drosophila. In the most successful MITEs, internal deletion is often associated with internal rearrangement, which sheds light on the process of MITE origin. Estimates of transposition rates over time revealed that all lineages followed a similar progression: a rapid amplification burst followed by a rapid decrease in transposition. We detected some instances of multiple or ongoing transposition bursts. Different amplification times were observed for mariner lineages shared by different species, a finding best explained by either horizontal transmission or a reactivation process. Different lineages within one species have also amplified at different times, corresponding to successive invasions. Finally, we detected a preference for insertion into short TA-rich regions, which appears to be specific to some subfamilies.
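Like other Tc1-mariner transposons, mariner elements insert at TA dinucleotide target sites. One simple way to ask whether insertions favor short TA-rich regions is to enumerate TA sites and score the AT content of a small window around each, then compare the scores of occupied sites with those of all available sites. The sketch below computes only the per-site scores; the flank width and function name are hypothetical, and it is not the statistical procedure used in the study.

```python
def ta_sites_with_at_content(seq: str, flank: int = 10):
    """Hypothetical helper: list every TA dinucleotide with the AT fraction of
    a window extending `flank` bases on each side (clipped at sequence ends).

    Returns a list of (position_of_T, at_fraction) tuples.
    """
    seq = seq.upper()
    results = []
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "TA":
            lo = max(0, i - flank)
            hi = min(len(seq), i + 2 + flank)
            window = seq[lo:hi]
            at = sum(base in "AT" for base in window) / len(window)
            results.append((i, round(at, 2)))
    return results


print(ta_sites_with_at_content("GGGGTAGGGG", flank=2))  # [(4, 0.33)]
print(ta_sites_with_at_content("ATATAT", flank=1))      # [(1, 1.0), (3, 1.0)]
```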
This analysis is the first comprehensive survey of this family of transposable elements at a genus scale. It provides precise measures of the different evolutionary processes that were hypothesized previously for this family based on PCR data analysis. mariner lineages were observed at almost all “life cycle” stages: recent amplification, subsequent decay and potential (re)-invasion or invasion of genomes.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-727) contains supplementary material, which is available to authorized users.
Drosophila; Comparative genomics; Tc1-mariner; Transposable elements; MITEs; Deletion rate
A genome-wide comparison of transposable elements reveals evidence for unexpectedly high rates of horizontal transfer between three species of Drosophila
Horizontal transfer (HT) could play an important role in the long-term persistence of transposable elements (TEs) because it allows them to escape the checking effects of host silencing mechanisms and natural selection, which would otherwise eventually drive their elimination from the genome. However, despite the increasing evidence for HT of TEs, its rate of occurrence among the TE pools of model eukaryotic organisms is still unknown.
We extracted and compared the nucleotide sequences of all potentially functional autonomous TEs present in the genomes of Drosophila melanogaster, D. simulans and D. yakuba (1,436 insertions classified into 141 distinct families). A large fraction of the families found in two or more species display levels of genetic divergence and within-species diversity that are significantly lower than expected under copy-number equilibrium and vertical transmission, and that are consistent with a recent origin by HT. Long terminal repeat (LTR) retrotransposons account for nearly 90% of the HT cases detected. HT footprints are also frequent among DNA transposons (40% of families compared) but rare among non-LTR retroelements (6%). Our results suggest a genomic rate of 0.04 HT events per family per million years between the three species studied, as well as significant variation between major classes of elements.
The genome-wide patterns of sequence diversity of the active autonomous TEs in the genomes of D. melanogaster, D. simulans and D. yakuba suggest that one-third of the TE families originated by recent HT between these species. This result emphasizes the important role of horizontal transmission in the natural history of Drosophila TEs.
High-throughput DNA sequencing technologies have revolutionized genomic analysis, including the de novo assembly of whole genomes. Nevertheless, assembly of complex genomes remains challenging, in part due to the presence of dispersed repeats, which introduce ambiguity during genome reconstruction. Transposable elements (TEs) can be particularly problematic, especially for TE families exhibiting high sequence identity, high copy number, or complex genomic arrangements. While TEs strongly affect genome function and evolution, most current de novo assembly approaches cannot resolve long, identical, and abundant families of TEs. Here, we applied a novel Illumina technology called TruSeq synthetic long-reads, which are generated through highly parallel library preparation and local assembly of short-read data and which achieve lengths of 1.5–18.5 kbp with an extremely low error rate (0.03% per base). To test the utility of this technology, we sequenced and assembled the genome of the model organism Drosophila melanogaster (reference genome strain y; cn bw sp), achieving an N50 contig size of 69.7 kbp and covering 96.9% of the euchromatic chromosome arms of the current reference genome. TruSeq synthetic long-read technology enables placement of individual TE copies in their proper genomic locations as well as accurate reconstruction of TE sequences. We entirely recovered and accurately placed 4,229 (77.8%) of the 5,434 annotated transposable elements with perfect identity to the current reference genome. As TEs are ubiquitous features of the genomes of many species, TruSeq synthetic long-reads, and likely other methods that generate long reads, offer a powerful approach to improve de novo assemblies of whole genomes.
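N50, the contiguity statistic quoted above, is the length L such that contigs of length L or longer together contain at least half of the assembled bases. A minimal sketch of the standard computation follows; the contig lengths in the usage line are made up for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


print(n50([2, 3, 4, 5, 6]))  # 5
```

For lengths 2, 3, 4, 5 and 6 (total 20), the two longest contigs already sum to 11, which is at least half of 20, so the N50 is 5.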
Heterochromatin plays an important role in chromosome function and gene regulation. Despite the availability of polytene chromosomes and genome sequence, the heterochromatin of the major malaria vector Anopheles gambiae has not been mapped and characterized.
To determine the extent of heterochromatin within the An. gambiae genome, genes were physically mapped to the euchromatin-heterochromatin transition zone of polytene chromosomes. The study found that a minimum of 232 genes reside in 16.6 Mb of mapped heterochromatin. Gene ontology analysis revealed that heterochromatin is enriched in genes with DNA-binding and regulatory activities. Immunostaining of the An. gambiae chromosomes with antibodies against Drosophila melanogaster heterochromatin protein 1 (HP1) and the nuclear envelope protein lamin Dm0 identified the major invariable sites of the proteins' localization in all regions of pericentric heterochromatin, diffuse intercalary heterochromatin, and euchromatic region 9C of the 2R arm, but not in the compact intercalary heterochromatin. To better understand the molecular differences among chromatin types, novel Bayesian statistical models were developed to analyze genome features. The study found that heterochromatin and euchromatin differ in gene density and in the coverage of retroelements and segmental duplications. The pericentric heterochromatin had the highest coverage of retroelements and tandem repeats, while intercalary heterochromatin was enriched in segmental duplications. We also provide evidence that the diffuse intercalary heterochromatin has a higher coverage of DNA transposable elements, minisatellites, and satellites than does the compact intercalary heterochromatin. Investigation of the 42-Mb assembly of unmapped genomic scaffolds showed that it has molecular characteristics similar to those of cytologically mapped heterochromatin.
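Coverage comparisons such as retroelement or segmental-duplication coverage per chromatin type reduce to computing the fraction of a region's bases that fall inside at least one annotated interval. The sketch below merges overlapping half-open intervals before summing, so overlapping annotations are not double-counted. It is a purely descriptive measure and does not reproduce the Bayesian models mentioned above; the interval coordinates in the usage line are hypothetical.

```python
def covered_fraction(intervals, region_start, region_end):
    """Fraction of the half-open region [region_start, region_end) covered by
    the union of half-open (start, end) intervals."""
    # Clip intervals to the region and drop those that do not overlap it.
    clipped = sorted(
        (max(s, region_start), min(e, region_end))
        for s, e in intervals
        if e > region_start and s < region_end
    )
    covered = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Disjoint from the current merged block: close it, start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (region_end - region_start)


# Hypothetical retroelement annotations on a 100-bp toy region.
print(covered_fraction([(0, 10), (5, 20), (30, 40)], 0, 100))  # 0.3
```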
Our results demonstrate that Anopheles polytene chromosomes and whole-genome shotgun assembly make it possible to map and characterize a significant part of the heterochromatic scaffolds. These results reveal a strong association between genome features and morphological types of chromatin. Initial analysis of the An. gambiae heterochromatin provides a framework for its functional characterization and for comparative genomic analyses with other organisms.
The control of transposable element (TE) activity in germ cells maintains genome integrity over generations. A distinct small RNA–mediated pathway utilizing Piwi-interacting RNAs (piRNAs) suppresses TE expression in gonads of metazoans. In the fly, primary piRNAs derive from so-called piRNA clusters, which are enriched in damaged repeated sequences. These piRNAs launch a cycle of TE and piRNA cluster transcript cleavages, resulting in the amplification of piRNAs and TE silencing. Using genome-wide comparison of TE insertions and ovarian small RNA libraries from two Drosophila strains, we found that individual TEs inserted into euchromatic loci form novel dual-stranded piRNA clusters. Formation of piRNA-generating loci by active individual TEs provides a more potent silencing response to TE expansion. Like all piRNA clusters, individual TEs are also capable of triggering the production of endogenous small interfering (endo-si) RNAs. Small RNA production by individual TEs spreads into the flanking genomic regions, including coding cellular genes. We show that formation of TE-associated small RNA clusters can down-regulate expression of nearby genes in ovaries. Integration of TEs into the 3′ untranslated region of actively transcribed genes induces piRNA production towards the 3′-end of transcripts, causing the appearance of genic piRNA clusters, a phenomenon that has been reported in different organisms. These data suggest a significant role of TE-associated small RNAs in the evolution of regulatory networks in the germline.
Silencing of transposable elements (TEs) in germ cells depends on a distinct class of small RNAs, Piwi-interacting RNAs (piRNAs). TE repression is provided by piRNAs derived from large heterochromatic loci enriched in fragmented TE copies, so-called piRNA clusters. According to the current model, individual TEs and their transcripts are regarded merely as targets of cluster-derived primary piRNAs, which exert post-transcriptional and transcriptional silencing in Drosophila. In our work, we show that natural individual transposons become piRNA-generating loci themselves. We came to this conclusion by comparing the ovarian small RNAs and TE insertion sites of two Drosophila strains, which showed that euchromatic target sites of strain-specific TEs generate a number of novel strain-specific piRNAs. This mechanism allows production of additional small RNAs that target active TEs and provide more potent transposon suppression in the germline. Moreover, small RNA production by individual TEs spreads into the flanking genomic regions, which affects the expression of adjacent coding genes and microRNA genes. These data underline the role of individual TEs in a silencing response and reveal a new level of TE impact on the gene regulatory networks in the germline.
The constant bombardment of mammalian genomes by transposable elements (TEs) has resulted in TEs comprising at least 45% of the human genome. Because of their great age and abundance, TEs are important in comparative phylogenomics. However, estimates of TE age were previously based on divergence from derived consensus sequences or on phylogenetic analysis, which can be unreliable, especially for older, more diverged elements. Therefore, a novel genome-wide analysis of TE organization and fragmentation was performed to estimate TE age independently of sequence composition and divergence or the assumption of a constant molecular clock. Analysis of TEs in the human genome revealed ∼600,000 examples where TEs have transposed into and fragmented other TEs, covering >40% of all TEs or ∼542 Mbp of genomic sequence. The relative age of these TEs over evolutionary time is implicit in their organization, because newer TEs have necessarily transposed into older TEs that were already present. A matrix of the number of times that each TE has transposed into every other TE was constructed, and a novel objective function was developed that derived the chronological order and relative ages of human TEs spanning >100 million years. This method has been used to infer the relative ages across all four major TE classes, including the oldest, most diverged elements. Analysis of DNA transposons over the history of the human genome has revealed the early activity of some MER2 transposons and the relatively recent activity of MER1 transposons in primate lineages. The TEs from six additional mammalian genomes were defragmented and analyzed. Pairwise comparison of the independently derived chronological orders of TEs in these mammalian genomes recovered the species phylogeny, showed that transposons shared between genomes are older than species-specific transposons, and identified a subset of TEs that were potentially active during periods of speciation.
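The idea that insertion direction encodes relative age can be illustrated with a deliberately simple ranking, which is not the objective function developed in the study. Assume a hypothetical matrix `inserts[i][j]` giving how many times family i is found inserted into family j; a family that often hosts insertions and rarely inserts into others is likely older. The sketch scores each family as (times hosted) minus (times inserted) and sorts oldest first. Family names and counts are invented.

```python
def chronological_order(names, inserts):
    """Rank TE families from (putatively) oldest to youngest.

    inserts[i][j] = hypothetical count of copies of names[i] found inserted
    into copies of names[j]. Older families tend to host more insertions.
    """
    n = len(names)
    scores = {}
    for k in range(n):
        hosted = sum(inserts[i][k] for i in range(n) if i != k)
        inserted = sum(inserts[k][j] for j in range(n) if j != k)
        scores[names[k]] = hosted - inserted
    # Highest net "host" score first, i.e. oldest first.
    return sorted(names, key=lambda name: scores[name], reverse=True)


names = ["TE_A", "TE_B", "TE_C"]
inserts = [
    [0, 1, 0],   # TE_A rarely inserts into others
    [10, 0, 0],  # TE_B often inserts into TE_A
    [8, 6, 0],   # TE_C inserts into both TE_A and TE_B
]
print(chronological_order(names, inserts))  # ['TE_A', 'TE_B', 'TE_C']
```

A net-count score like this ignores conflicting (cyclic) evidence and differences in family size, which are the kinds of problems a dedicated objective function is meant to handle.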
Transposable elements (TEs) are interspersed repetitive DNA families that are capable of copying themselves from place to place; they have literally infested our genome over evolutionary time and now comprise as much as 45% of our total DNA. Because of their great age and abundance, TEs are important in evolutionary genomics. However, estimates of their age based on DNA sequence composition have been unreliable, especially for older, more diverged elements. Therefore, a novel method to estimate the age of TEs was developed based on the fact that as TEs spread throughout the genome, they inserted into and fragmented older TEs that were already present. The age of TEs can thus be revealed by how often they have been fragmented over evolutionary time. We performed a genome-wide defragmentation of TEs and developed a novel objective function to derive the chronological order of TEs spanning >100 million years. This method has been used to infer the relative ages of TEs from seven sequenced mammalian genomes across all four major TE classes, including the oldest, most diverged elements. This age estimate is independent of TE sequence composition or divergence and does not rely on the assumption of a constant molecular clock. This study provides a novel analysis of the evolutionary history of some of the most abundant and ancient repetitive DNA elements in mammalian genomes, which is important for understanding the dynamic forces that shape our genomes during evolution.
Two classes of DNA elements interrupt a fraction of the rRNA repeats of Bombyx mori. We have analyzed by genomic blotting and sequence analysis one class of these elements, which we have named R2. These elements occupy approximately 9% of the rDNA units of B. mori and appear to be homologous to the type II rDNA insertions detected in Drosophila melanogaster. Approximately 25 copies of R2 exist within the B. mori genome, of which at least 20 are located at a precise site within otherwise typical rDNA units. Nucleotide sequence analysis has revealed that the 4.2-kilobase-pair R2 element has a single large open reading frame, occupying over 82% of the total length of the element. The central region of this 1,151-amino-acid open reading frame shows homology to the reverse transcriptase enzymes found in retroviruses and certain transposable elements. Amino acid similarity in this region is highest to the mobile LINE-1 (L1) elements of mammals, followed by the mitochondrial type II introns of fungi and the pol gene of retroviruses. Less homology exists with transposable elements of D. melanogaster and Saccharomyces cerevisiae. Two additional regions of sequence homology between L1 and R2 elements were also found outside the reverse transcriptase region. We suggest that the R2 elements are retrotransposons that are site specific in their insertion into the genome. Such mobility would enable these elements to occupy a small fraction of the rDNA units of B. mori despite their continual elimination from the rDNA locus by sequence turnover.
Transposable elements with long terminal inverted repeats are rare, and only one family of elements of this sort has been identified in the genome of Drosophila melanogaster. An insertion associated with the HSBS mutation of the achaete-scute complex has been reported to be a second element of this type. We have determined the complete sequence of this insertion and have shown that it is in fact two copies of a new LINE-like transposable element, which we have called BS, inserted in opposite orientation 337 bp apart. Like other elements of this type, BS has two open reading frames that appear to encode a gag-like polypeptide and a reverse transcriptase. There are few complete BS elements in the five strains of D. melanogaster that we have tested, and they appear to transpose infrequently. The events that may have led to the double BS insertion are discussed in terms of the supposed mechanism of transposition of LINE-like elements.
The potential adaptive significance of transposable elements (TEs) to the host genomes in which they reside is a topic that has been hotly debated by molecular evolutionists for more than two decades. Recent genomic analyses have demonstrated that TE fragments are associated with functional genes in plants and animals. These findings suggest that TEs may contribute significantly to gene evolution.
We have analyzed two transposable elements associated with genes in the sequenced Drosophila melanogaster y; cn bw sp strain. A fragment of the Antonia long terminal repeat (LTR) retrotransposon is present in the intron of Chitinase 3 (Cht3), a gene located within the constitutive heterochromatin of chromosome 2L. Within the euchromatin of chromosome 2R, a full-length Burdock LTR retrotransposon is located immediately 3' to cathD, a gene encoding cathepsin D. We tested for the presence of these two TE/gene associations in strains representing 12 geographically diverse populations of D. melanogaster. While the cathD insertion variant was detected only in the sequenced y; cn bw sp strain, the insertion variant present in the heterochromatic Cht3 gene was found to be fixed in all twelve D. melanogaster populations and in a D. mauritiana strain, suggesting that it may be of adaptive significance. To further test this hypothesis, we sequenced a 685-bp region spanning the LTR fragment in the intron of Cht3 in strains representative of the two sibling species D. melanogaster and D. mauritiana (~2.7 million years of divergence). The level of sequence divergence between the two species within this region was significantly lower than expected from the neutral substitution rate, and lower than the divergence observed in a randomly selected intron of the Drosophila Alcohol dehydrogenase (Adh) gene.
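The comparison above rests on measuring pairwise sequence divergence. As a minimal sketch: the p-distance is the proportion of differing sites among aligned, ungapped positions, and the Jukes-Cantor (JC69) correction converts it to an estimate of substitutions per site, K = -(3/4) ln(1 - 4p/3). Because divergence accumulates along both lineages, a neutral rate r over a split time T predicts K ≈ 2rT. The sequences below are toy examples, and JC69 is not necessarily the substitution model used in the study.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites over aligned positions without gaps."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped aligned positions")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """JC69 estimate of substitutions per site from a p-distance."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTACGTAA")  # toy aligned sequences
print(p, round(jukes_cantor(p), 4))  # 0.1 0.1073
```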
Our results suggest that a 359-bp fragment of an Antonia retrotransposon LTR (the complete LTR is 659 bp) located within the intron of the Drosophila melanogaster Cht3 gene is of adaptive evolutionary significance. Our results are consistent with previous suggestions that the presence of TEs in constitutive heterochromatin may be of significance to the expression of heterochromatic genes.