Transposable elements (TEs) have a unique ability to mobilize to new genomic locations, and the major advance of next-generation DNA sequencing has provided insights into the dynamic relationship between TEs and their hosts. It now is clear that TEs have adopted diverse strategies (such as specific integration sites or patterns of activity) to thrive in host environments that are replete with mechanisms (such as small RNAs or epigenetic marks) to combat their amplification. Emerging evidence suggests that TE mobilization might sometimes benefit host genomes by enhancing genetic diversity, but TEs are also implicated in diseases such as cancer. Here, we discuss recent findings about how, where, and when TEs insert in diverse organisms.
Through her pioneering work in maize, Barbara McClintock was the first to realize that eukaryotic genomes are not static entities and contain transposable elements (TEs) that have the ability to move from one chromosomal location to another1. It now is clear that virtually all organisms harbor TEs that have amplified in copy number over evolutionary time via DNA or RNA intermediates. On occasion, TEs have been co-opted by the host to perform critical cellular functions (e.g., 2–5). However, most TEs likely are finely tuned genomic parasites that mobilize to ensure their own survival6–9. The genomic revolution, coupled with new DNA sequencing technologies, now provides an unprecedented wealth of data documenting TE content and mobility in a broad array of organisms.
In multi-cellular eukaryotes, TEs must mobilize within gametes or during early development to be transmitted to future generations. In humans, there are at least 65 documented cases of diseases resulting from de novo TE insertions; these events account for approximately 1/1000 spontaneous cases of disease in humans5, 10. Indeed, new genomic technologies combined with cell culture-based experiments have demonstrated that active TEs are more prevalent in the human population than previously appreciated11–18. A growing body of evidence further suggests that mammalian TE integration occurs during early development19–21. In addition, studies of neurogenesis and some forms of cancer have raised the intriguing possibility that TE activity may impact the biology of certain somatic cells12, 22–24. It is likely that we have observed only the tip of the iceberg and still are underestimating the contribution of TE-mediated events to inter- and intra-individual structural variation in mammalian genomes.
TE mobility poses a serious challenge to host fitness. Paradoxically, TE insertions that are harmful to the host jeopardize TE survival. Thus, many TEs have evolved highly specific targeting mechanisms that direct their integration to genomic “safe havens,” thereby minimizing their damage to the host (e.g., 25–29, and references mentioned below). Nevertheless, host genomes have evolved potent restriction mechanisms, such as the methylation of TE DNA sequences and the expression of small RNAs or cytidine deaminases, to restrict TE activity in the germline and perhaps somatic cells (e.g., 30–33, and references mentioned below).
Interestingly, a growing number of examples suggest that TEs may become activated under certain environmental conditions, such as stress. Stress has been shown to induce TE transcription or integration, or redirect TE integration to alternative target sites34–38. These findings are consistent with Barbara McClintock’s hypothesis that environmental challenges may induce transposition, and that transposition, in turn, may create genetic diversity to overcome threats to host survival39.
We begin this review with a brief description of the types of TEs and their modes of mobility. We then describe the latest understanding of TE integration mechanisms and how the host defends against these attacks. Finally, we discuss exciting new research that suggests TE mobility may impact the biology of somatic cells. From the growing understanding of target site selection to the discovery of new active TE copies in human populations, it is clear that the field of transposon biology continues to yield new insights about genome biology.
TEs mobilize by remarkably diverse replication strategies (Table 1 and Figure 1)40. Many DNA transposons mobilize by a non-replicative “cut and paste” mechanism, whereby an element-encoded enzyme, the transposase, recognizes sequences at or near TE inverted terminal repeats to “cut” the TE from its existing genomic location and then acts to “paste” the excised DNA into a new genomic location (Fig. 1a)41.
Retrotransposons mobilize via the reverse transcription of an RNA intermediate; however, different types of retrotransposons carry out this process by distinct mechanisms (see Fig. 1b and c). Long terminal repeat (LTR) retrotransposons42–44 (Fig. 1b) and non-LTR retrotransposons45 (Fig. 1c) use element-encoded enzymes to mediate their mobility. In addition, the endonuclease and reverse transcriptase activities of non-LTR retrotransposons also play a central role in mobilizing non-autonomous Short INterspersed Elements (SINEs)46–48, certain classes of non-coding RNAs49–52, and messenger RNAs, which can result in the formation of processed pseudogenes53, 54.
Examples of DNA TEs include Tn5 and Tn7 of E. coli55, 56, P elements of Drosophila57, and Tc1 elements of C. elegans58. Though they thrive in prokaryotes and simpler eukaryotes, DNA TE activity appears to be extinct in most mammals, which fuelled speculation that DNA TEs play a limited role in the ongoing evolution of mammalian genomes59. However, recent studies suggest that DNA TEs, namely non-autonomous hobo/Activator/TAM (nhAT) transposons and helitrons, are active in certain bat species60–62. Thus, these studies highlight how new DNA sequencing technologies can facilitate fundamental discoveries about the impact of different TE families on genome evolution and serve as a cautionary note against deriving general conclusions regarding TE activity from relatively few “reference” sequences.
LTR-retrotransposons are particularly abundant in eukaryotes. For example, Drosophila contains approximately 20 distinct families of LTR-retrotransposons that comprise ~1% of the genome63, while maize contains ~400 families of LTR-retrotransposons that comprise ~75% of the genome64, 65. In addition, the mouse genome contains multiple active LTR-retrotransposon families. Indeed, the ongoing retrotransposition of both autonomous LTR-retrotransposons and their non-autonomous derivatives is estimated to account for approximately 10–12 percent of sporadic mutations in mouse66. By comparison, there appears to be little LTR-retrotransposon activity in human genomes59; however, a small number of human endogenous retroviruses are polymorphic with respect to presence/absence at a given genomic location, suggesting that they have retrotransposed relatively recently in human evolution67.
Non-LTR retrotransposons are widespread among eukaryotes, but have been especially prolific in mammalian genomes. For example, L1 elements and the non-autonomous SINEs that they mobilize (e.g., Alu and SINE-R/VNTR/Alu (SVA) sequences)47, 48 comprise approximately 30% of human genomic DNA sequence59. Furthermore, recent research using a combination of transposon display68, 69, second-generation DNA sequencing12, 15–17, and analyses of genomic DNA sequences from the Human Structural Variation project13, 70–72, the 1000 genomes project18, 73, 74, and clinical cohorts14 has revealed that L1 presence/absence dimorphisms, as well as non-allelic recombination between L1 and Alu elements, account for an appreciable proportion of the inter-individual structural variation observed among humans and continue to have a profound effect on the human genome (see ref. 5 for a detailed review).
Transposons exhibit a remarkable diversity of integration behaviors. Some TEs preferentially integrate into gene-dense regions of the genome, others target regions such as heterochromatin, telomeres, or ribosomal DNA arrays, and some appear to insert throughout the genome. Below, we describe several examples of TE integration and what is known about how TEs target specific sites in genomic DNA.
Many TEs integrate into gene-rich regions, although they use mechanisms that prevent the disruption of open reading frames (ORFs). An extreme example is the E. coli Tn7 DNA TE. Tn7 encodes a sequence-specific DNA binding protein, TnsD, which mediates integration into a specific position in the host chromosome, termed attTn7, and thereby avoids damaging the host genome75, 76. A second targeting protein, TnsE, can alter Tn7 target preference by directing integration to plasmid DNAs that are transferred between E. coli by conjugation77 or to double strand breaks and DNA structures formed during DNA replication78. By comparison, the Drosophila P element avoids disrupting ORFs by integrating within the 500 bp upstream of transcription start sites of genes79. However, the mechanism by which P elements target these sites requires elucidation.
Certain non-LTR retrotransposons encode endonucleases that target specific sites in genomic DNA. For example, the R1 and R2 elements of insects encode sequence-specific endonucleases that cleave at specific positions within the 28S rDNA locus to initiate target-primed reverse transcription (TPRT)28, 45. However, these endonucleases operate by distinct mechanisms. R1 encodes an endonuclease that shares sequence similarity to an apurinic/apyrimidinic (APE) DNA repair endonuclease80, 81, whereas R2 encodes a type II-S restriction endonuclease82. Thus, these elements apparently have evolved convergent mechanisms to integrate into ribosomal DNA arrays.
LTR-retrotransposons also have evolved strategies to integrate into gene rich regions, while ensuring minimal damage to their hosts. For example, the Ty1 and Ty3 retrotransposons of S. cerevisiae specifically target gene-free windows located immediately upstream of RNA polymerase III transcribed genes, such as tRNAs25, 27, 83, 84; Ty3 is directed to integration sites 2 or 3 bp upstream of such genes by transcription factors TFIIIB and TFIIIC (Fig. 2a)85, 86. However, in the case of the SNR6 gene, which does not depend on TFIIIC for its expression, the TFIIIB factors Brf1 and TBP are sufficient to direct Ty3 integration87, 88.
Ty1 integrates into a ~700 bp window upstream of tRNA genes with a periodicity of 80 bp (Fig. 2b)27, 89. Although the factors that direct Ty1 to tRNA genes remain unknown, the unusual periodicity of integration depends on the amino-terminal domain of Bdp1, another TFIIIB factor90. The ability to integrate upstream of RNA polymerase III transcribed genes also can regulate host and TE gene expression. For example, Ty1 and Ty3 insertions can stimulate the transcription of downstream RNA polymerase III transcribed genes, and transcription of the RNA polymerase III target genes can reciprocate by repressing Ty1 transcription91, 92. Clearly, determining how Ty1 targets integration sites and exploring how integration alters gene regulation remain areas for future study.
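One simple way to look for such a periodicity in a set of mapped insertions is to fold each insertion's distance from the tRNA gene onto a single 80 bp repeat and count how the insertions distribute across that repeat. The following is a minimal sketch under stated assumptions: the function name, the 10 bp bin size, and the example distances are hypothetical and are not taken from the studies cited above.

```python
from collections import Counter


def phase_histogram(distances, period=80, bin_size=10):
    """Fold upstream distances (in bp) onto one repeat of `period` and bin them.

    Returns a list of counts, one per bin. Insertions that follow the
    periodicity pile up in a few bins; unstructured insertions spread evenly.
    """
    if period % bin_size != 0:
        raise ValueError("bin_size must divide period evenly")
    counts = Counter((d % period) // bin_size for d in distances)
    return [counts.get(i, 0) for i in range(period // bin_size)]


# Hypothetical distances (bp upstream of a tRNA gene), for illustration only.
example_distances = [85, 165, 245, 92, 170, 400]
print(phase_histogram(example_distances))
```

A histogram concentrated in one or two bins would be consistent with periodic integration, whereas a flat histogram would not; a real analysis would also need a statistical test and many more events.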
The ability to target RNA polymerase III transcribed genes is not peculiar to LTR-retrotransposons. For example, the Dictyostelium discoideum non-LTR retrotransposon DRE (also known as TRE-5A) preferentially inserts ~48 bp upstream of tRNA genes, whereas the retrotransposon Tdd3 (also known as TRE-3A) inserts downstream of tRNA genes29, 93. Indeed, experimental evidence suggests that the TRE-5A ORF1-encoded protein directly interacts with subunits of TFIIIB to direct its integration to tRNA genes94.
The Schizosaccharomyces pombe retrotransposon, Tf1, preferentially integrates into the promoters of RNA polymerase II transcribed genes and provides another example of how TEs target gene-rich regions95–97 (Fig. 2c). Tf1 integration has been studied by examining integration into promoters contained within extrachromosomal replicating plasmids26. For example, the fbp1 promoter is induced when the activating transcription factor Atf1p binds to an eight base pair upstream activating sequence (UAS1)98. Tf1 integration generally occurs 30 bp and 40 bp downstream of UAS1; however, mutating six nucleotides of UAS1 or deleting the Atf1p gene26 disrupts Tf1 integration specificity, causing integration to occur throughout the plasmid. Although cells lacking Atf1p show little reduction in the overall Tf1 retrotransposition frequency99, the above data, as well as the finding that Atf1p forms a complex with Tf1 IN, indicate that specific transcription factors such as Atf1p can play a critical role in directing Tf1 integration to a specific target site. Notably, experiments conducted with a synthetic promoter revealed that RNA polymerase II transcription is not sufficient to target Tf1 integration99.
The development of second-generation sequencing technology recently has allowed the in vivo examination of Tf1 integration sites en masse. Characterization of 73,125 Tf1 integration events from four independent experiments revealed a highly reproducible pattern — approximately 95% of integration events are clustered upstream of ORFs96. Interestingly, the most frequently targeted promoters are associated with genes that are induced by environmental stressors. The targeting of genes that respond to stress, coupled with the ability of Tf1 to induce the expression of adjacent genes26, suggests that Tf1 integration has the potential to improve survival of specific cells that are exposed to environmental stress. Likewise, the transcription of Tf2, another LTR-retrotransposon in S. pombe, is induced by oxidative and osmotic stress or by growth in low oxygen34, 100. Clearly, understanding the consequences of stress-induced retrotransposition will yield insights about how TE mobility can lead to genetic diversity, which may affect the ability of an organism to cope with stress.
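A summary statistic of this kind (the share of integration events that fall upstream of ORFs) can be computed from a list of insertion coordinates and an ORF annotation. The sketch below is illustrative only: the 500 bp window, the single-chromosome coordinate model, and all example coordinates are assumptions made for the example and do not reproduce the analysis in ref. 96.

```python
def fraction_upstream(insertions, orfs, window=500):
    """Fraction of insertion positions lying within `window` bp upstream of an ORF.

    `orfs` is a list of (start, end, strand) tuples on one chromosome, with
    start < end. "Upstream" is strand-aware: before `start` for "+" ORFs and
    after `end` for "-" ORFs. Positions inside an ORF do not count.
    """
    def is_upstream(pos):
        for start, end, strand in orfs:
            if strand == "+" and start - window <= pos < start:
                return True
            if strand == "-" and end < pos <= end + window:
                return True
        return False

    if not insertions:
        return 0.0
    return sum(is_upstream(p) for p in insertions) / len(insertions)


# Hypothetical annotation and insertion sites, for illustration only.
example_orfs = [(1000, 2000, "+"), (5000, 6000, "-")]
example_insertions = [900, 950, 1500, 6100, 3000]
print(fraction_upstream(example_insertions, example_orfs))
```

The choice of window matters: a wider window raises the fraction for any insertion profile, so the value should be compared against the fraction expected for randomly placed insertions in the same genome.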
Finally, certain retroviruses, which are descended from LTR-retrotransposons101, also exhibit preferential integration in gene rich regions. For example, human immunodeficiency virus-1 (HIV-1) preferentially integrates into RNA polymerase II transcribed genes, whereas murine leukemia virus shows a strong integration preference near transcriptional start sites102–104. Structural and biochemical data demonstrate that HIV-1 IN interacts with the cellular lens epithelium-derived growth factor (LEDGF/p75) host factor, and there is evidence that this interaction plays an important role in proviral DNA integration105, 106.
Some TEs target heterochromatic sequences that contain relatively few genes. For example, chromoviruses, which are related to Ty3/Gypsy LTR-retrotransposons, reside in heterochromatin of eukaryotes from fungi to vertebrates107. They contain a chromodomain near the carboxyl-terminus of IN that is related to HP1, a heterochromatin protein that binds to histone H3 methylated at lysine 9 (ref. 107). Furthermore, chromovirus chromodomains fused to green fluorescent protein (GFP) co-localize with heterochromatin108, suggesting that the chromodomain plays a principal role in directing integration. Indeed, fusion of one such chromodomain to the carboxyl-terminus of Tf1 IN directs integration of this TE to heterochromatin108.
The Ty5 LTR-retrotransposon also targets gene-poor regions in S. cerevisiae. Approximately 90% of Ty5 integration events occur within the silent mating type loci or near silent heterochromatin at telomeres (Fig. 2d)109–111. Genetic and biochemical experiments indicate that a nine amino acid targeting domain (TD) in the Ty5 integrase (IN) carboxyl-terminus directly binds to a structural component of heterochromatin, Sir4p, to target integration35, 112, 113. Moreover, fusing Sir4p to the DNA binding domain of the LexA repressor protein causes Ty5 integration to be redirected to LexA binding sites114. Thus, the Ty5 TD, by interacting with Sir4p, directs integration to heterochromatin. Interestingly, genetic and biochemical evidence indicates that the Ty5 TD evolved its interaction with Sir4p by mimicking residues in a host factor, Esc1p, that binds to the same amino acids of Sir4p115.
Although the ability of Ty5 IN to target heterochromatin suggests that the TE dictates target site integration, there also are indications that TE-host interactions can alter Ty5 target-site preference. Mass spectrometry revealed that phosphorylation of the integrase TD at S1095 is critical for binding Sir4p35, and mutating S1095 redirects integration to expressed regions of the genome. Although the host-encoded kinase has not been identified, studies using a phospho-specific antibody indicate that stressors, such as nitrogen deprivation, can down-regulate S1095 phosphorylation35. Thus, stress conditions may alter the phosphorylation state of Ty5 IN, thereby redirecting Ty5 integration specificity. This elegant example provides a plausible mechanism for how stress can alter transposon mobilization in a manner that might provide an advantage for the host. It remains to be determined whether retargeting Ty5 to gene-rich regions benefits Ty5 by allowing newly retrotransposed copies to reside in permissive expression contexts or benefits the host by generating genetic diversity, offering the potential to adapt to stress.
Some TEs exclusively integrate at or near telomeric ends. For example, the Het-A, TART, and TAHRE non-LTR retrotransposons comprise the ends of Drosophila chromosomes and likely substitute for the function of telomerase in maintaining chromosome end integrity2, 116, 117. The SART1 and TRAS1 non-LTR retrotransposons may have a similar role in Bombyx mori118. The proteins encoded by TART, TAHRE, SART1, and TRAS1 have an apurinic/apyrimidinic (AP)-like endonuclease domain118, 119 and it is likely that the SART1 and TRAS1 endonuclease proteins direct their integration into telomeric repeats118; however, the functional role of the putative endonuclease domain in TART and TAHRE remains unknown.
Excitingly, recent studies have revealed that certain retrotransposons can target telomeric sequences for integration. For example, by an alternative endonuclease-independent retrotransposition mechanism, human L1 retrotransposons containing missense mutations in the L1 EN active site can integrate at endogenous DNA lesions and dysfunctional telomeres in Chinese Hamster Ovary cell lines that are deficient for factors important in the non-homologous end-joining pathway of DNA repair as well as p53 function120, 121. Similarly, members of the Penelope clade of retrotransposons, which encode an RT that lacks an obvious endonuclease domain, reside at telomeres in organisms from four eukaryotic kingdoms122. The RNAs encoded by these terminal Penelope elements also contain sequences that are complementary to telomeric DNA sequences, suggesting that base pairing between the TE RNA and single stranded telomeric DNA is critical for integration. Interestingly, both of the above cases can be considered as a type of RNA-mediated DNA repair that appears curiously similar to the mechanism used by telomerase120, 122. Future studies should elucidate whether host factors are critical for the localization of these retrotransposons to DNA lesions and/or chromosomal termini.
In contrast to elegant mechanisms that target integration of some TEs into specific regions of the genome, other TEs appear to lack target site specificity. For example, L1s and the non-autonomous elements they mobilize are interspersed throughout the genome59. Indeed, ~30% of engineered human L1 retrotransposition events in cultured cells, and a similar proportion of recently discovered full-length, dimorphic human-specific L1s, are near or within the introns of genes13, 50, 123, 124. Since protein-coding genes constitute ~40% of the human genome125, 126, these findings suggest a lack of robust mechanisms employed by L1s or the host to prevent L1 retrotransposition into genes.
The interspersed nature of L1 and Alu sequences probably reflects the fact that the L1 endonuclease has relatively weak target-site specificity, preferentially cleaving the sequence 5’-TTTT/A-3’ (and variants of that sequence), to initiate TPRT80, 121, 127, 128. Interestingly, while “young” Alu and L1 insertions exhibit similar interspersed integration patterns, cytogenetic studies and examination of the human genome reference sequence revealed that evolutionarily “older” L1s and Alus show distinct genomic distributions59, 129. Older L1s preferentially reside in gene-poor AT-rich sequences, whereas older Alus preferentially reside in gene-rich GC-rich regions of the genome59.
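To illustrate how permissive such a short consensus is, the sketch below scans a DNA sequence for exact matches to 5’-TTTT/A-3’ on either strand and reports where the nick would fall. It is a toy model under stated assumptions: only the exact consensus is matched (the variants mentioned above are ignored), the function name is hypothetical, and the example sequence is invented.

```python
def find_l1_en_sites(seq):
    """Return sorted (nick_position, strand) pairs for exact 5'-TTTT/A-3' matches.

    `nick_position` is the number of top-strand bases to the left of the nick.
    A bottom-strand site reads TTTTA on the bottom strand, which appears as
    TAAAA on the top strand.
    """
    seq = seq.upper()
    sites = []
    # (motif as written on the top strand, strand label, nick offset from match start)
    for motif, strand, offset in (("TTTTA", "+", 4), ("TAAAA", "-", 1)):
        start = seq.find(motif)
        while start != -1:
            sites.append((start + offset, strand))
            start = seq.find(motif, start + 1)  # allow overlapping matches
    return sorted(sites)


# Invented 16 bp sequence with one site on each strand, for illustration only.
print(find_l1_en_sites("GGTTTTACCTAAAAGG"))
```

Because a five-base motif is expected by chance roughly once per kilobase on each strand of random sequence, a preference this weak is compatible with insertions scattered throughout the genome.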
The distinct distributions of older L1s and Alus likely result from post-integration selective processes that have operated on the genome for millions of years59. However, how these skewed distributions arose remains a mystery. Some researchers have suggested that Alus may play an advantageous, albeit undefined, role in gene-rich regions of the genome59. Others have suggested that L1 retrotransposition events into genic regions may exert a greater fitness cost to the host than Alu insertions130. If so, negative selection would lead to the removal of detrimental L1 alleles from the population. Consistent with this hypothesis, data suggest that evolutionarily recent human full-length L1 insertions are detrimental to the host131, 132, whereas in vitro studies have revealed that L1s contain cis-acting sequences that can reduce gene expression133, 134. Clearly, further studies are needed to explain how the distributions of L1 and Alu have diverged over evolutionary time.
Despite their interspersed distribution, a small body of evidence suggests that there may be preferred, albeit rare, L1 integration sites. For example, independent L1-mediated retrotransposon insertions at the same nucleotide position in the BTK gene (i.e., an SVA and an Alu element) have resulted in two sporadic cases of X-linked agammaglobulinemia135. Similarly, independent L1 and Alu insertions associated with colorectal and desmoid tumors, respectively, have occurred at the same nucleotide position in the APC gene22, 135, 136, whereas two independent Alu insertions at the same nucleotide position in the Factor IX gene have caused hemophilia B135, 137. Thus, it would not be surprising to find that chromatin structure and accessibility impact L1-mediated retrotransposon target preference138.
Although many TEs have evolved mechanisms to limit genome damage, TE integration still poses a potential threat to the host. Thus, it is not surprising that host organisms have evolved a diverse array of mechanisms to combat TE activity. However, the host must be able to discriminate TE sequences from host genes to accomplish this feat. Below, we discuss mechanistic strategies employed by the host to restrict TE mobilization.
Cytosine methylation (5-methylcytosine) is an important DNA modification in eukaryotes with genomes larger than 5×10^8 bp, which includes vertebrates, flowering plants, and some fungi. The majority of cytosine methylation in plants and mammals, and almost all cytosine methylation in Neurospora crassa, occurs within repetitive elements and is correlated with the transcriptional repression of retrotransposons in somatic and germline cells139, 140.
Experiments in mammals and plants demonstrate that global demethylation of genomic DNA strongly reactivates TE transcription141–144. For example, deletion of the DNA cytosine-5-methyltransferase 3-like gene (Dnmt3L) in mice leads to loss of de novo cytosine methylation of both LTR and non-LTR retrotransposons, reactivation of transposable element expression in spermatocytes and spermatogonia, and meiotic catastrophe in male germ cells145. Determining whether TE mobilization is directly responsible for the meiotic defects requires further study. Moreover, recent data demonstrate that inactivation of cytosine methylation in Arabidopsis thaliana causes a burst of retrotransposon and DNA TE activity and results in substantial increases in TE copy number144. Thus, epigenetic mechanisms act to control the expression, and perhaps mobility, of various TEs.
Multiple lines of evidence indicate that DNA methylation inhibits TE transcription. Patterns of DNA methylation are established during gametogenesis and are mediated by Dnmt3a and the non-catalytic paralog, Dnmt3L, in mammals, but how TEs are recognized as methylation substrates requires further study146. By comparison, during plant development, small 24 nt RNAs target paralogous DNA sequences that share high levels of homology (such as TEs) for cytosine methylation. The mechanism of RNA-directed DNA methylation (RdDM) is not fully understood, but it appears to require the canonical RNA interference (RNAi) machinery (see below and Fig. 3), the DNA methyltransferase DRM2, and two plant specific RNA polymerases, Pol IV and Pol V146.
Small RNA-based mechanisms (including endogenous small interfering RNAs (endo-siRNAs) and Piwi-interacting RNAs (piRNAs)) also act to defend eukaryotic cells against TEs. The mechanisms by which these small RNAs are generated and how they inhibit TEs remain an active area of investigation in various model organisms. Mechanistic details regarding these processes can be found in many outstanding reviews on this topic (e.g., 30, 147–152); here we briefly summarize common themes that have emerged from the above studies.
Endo-siRNAs have the potential to inhibit TE mobility through the post-transcriptional disruption of transposon mRNA. For example, double-strand “trigger” RNAs (ds-RNAs) can be derived from the complementary inverted terminal repeats in DNA transposons, from structured mRNA transcripts, or from overlapping regions contained within convergent transcription units30, 147, 148, 153. The resultant ds-RNAs then can be processed into ~21–24 nt endo-siRNAs by members of the Dicer family of proteins (Fig. 3a)30, 154. These endo-siRNAs are loaded onto an Argonaute protein, and the “passenger” RNA strand (typically the sense strand of a TE) is degraded. The remaining complex of a single stranded RNA and Argonaute is called the RNA-Induced Silencing Complex (RISC); the RNA directs RISC to complementary sequences in target mRNAs, leading to their post-transcriptional degradation. Importantly, the RNAi machinery has the capability to inhibit any TE that generates a ds-RNA “trigger” that can serve as a substrate for the RNAi machinery.
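The targeting step, in which the guide strand directs RISC to complementary sequences in a transcript, can be pictured as a search for the reverse complement of the guide within the mRNA. The sketch below is a deliberately simplified model: it requires perfect complementarity, uses a toy 8 nt guide rather than a ~21–24 nt endo-siRNA, and all names and sequences are hypothetical.

```python
# Watson-Crick pairing for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def find_target_sites(guide, mrna):
    """Return 0-based start positions in `mrna` that pair perfectly with `guide`."""
    site = reverse_complement(guide)
    mrna = mrna.upper()
    positions = []
    start = mrna.find(site)
    while start != -1:
        positions.append(start)
        start = mrna.find(site, start + 1)  # allow overlapping sites
    return positions


# Invented sequences: the guide is antisense to positions 6-13 of the toy mRNA.
toy_mrna = "AAGCUUACGGAUCCAA"
toy_guide = "GGAUCCGU"
print(find_target_sites(toy_guide, toy_mrna))
```

In this model a TE transcript is silenced whenever a guide derived from its own double-stranded trigger finds a match, which is why any TE that produces such a trigger is, in principle, exposed to the pathway.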
By a different mechanism, piRNAs can be generated from genomic loci encoding long precursor RNAs that contain the remnants of different families of TEs151. In general, processing of these precursor RNAs leads to the production of mature ~24–35 nt piRNAs (Fig. 3b). A subfamily of Argonaute proteins, known as the Piwi clade of proteins, predominantly binds mature antisense piRNAs and directs them to complementary sequences in TE mRNA. An endonuclease activity associated with the Piwi protein cleaves the TE mRNA to release a sense strand piRNA, which can interact with other Piwi clade proteins. Binding of this complex to the original piRNA precursor RNA then reiterates this amplification cycle by a “ping-pong” mechanism151, 155. In addition to this type of mechanism that restricts TE mobility in the germline, recent studies suggest that specialized piRNA pathways, which do not operate via a ping-pong mechanism, might restrict somatic TE activity32, 155–157.
Examples of TEs that are controlled by small RNA-based mechanisms include Tc1 transposons in C. elegans and P elements in Drosophila153, 158. Also, 21 nt small interfering RNAs (siRNAs) derived from the Athila family LTR-retrotransposons in the vegetative nucleus of the pollen grains in Arabidopsis thaliana are delivered to the sperm cells to inhibit expression of transposons that, in principle, could mobilize in the germline159.
Small RNA-based mechanisms also may be critical for silencing mammalian L1s. For example, an antisense promoter located within the human L1 5’UTR allows the production of an antisense RNA that, in principle, could base pair with sense strand L1 mRNA to establish a ds-RNA substrate for Dicer160. Furthermore, mouse mutants lacking the murine Piwi family proteins MILI or MIWI2 exhibit a loss of L1 and intracisternal A particle (IAP) LTR-retrotransposon DNA methylation; this loss correlates with their transcriptional activation in male germ cells161. Similarly, mice lacking a MILI interacting protein, Tudor containing protein-1, exhibit a loss of L1 DNA methylation and a reactivation of L1 expression162. Finally, mouse mutants lacking the non-canonical Maelstrom protein, a component of the nuage complex that may be important for small RNA biogenesis, exhibit de-repression of L1 transcription, an increase of L1 ribonucleoprotein particle intermediates in spermatids, and a chromosomal synapsis defect during male meiosis163. Together, the above examples provide compelling data that small RNA-based pathways likely act to control the expression of certain TEs in the mammalian germline.
Finally, it is noteworthy that other antisense RNA-based mechanisms may be involved in TE silencing. For example, antisense transcripts from S. cerevisiae Ty1 elements reduce Ty1 IN and RT protein levels by a post-translational mechanism; this leads to inhibition of Ty1 mobility and thus controls Ty1 copy number164. Since S. cerevisiae lacks RNAi machinery, these results suggest that genomes have evolved other RNA-dependent strategies to tame TEs.
Proteins involved in nucleic acid metabolism and/or DNA repair can also restrict TE mobility. For example, members of the APOBEC3 family of cytidine deaminases can restrict the retrotransposition of various retroviruses and LTR and non-LTR retrotransposons33. For retroviruses and LTR-retrotransposons, APOBEC3 proteins generally deaminate cytidines during first strand cDNA synthesis, which leads to either cDNA degradation or the integration of a mutated provirus. The mechanisms by which certain APOBEC3 proteins restrict non-LTR retrotransposons require elucidation. Similarly, over-expression of the 3’-repair exonuclease 1 (Trex1) gene, mutations in which cause Aicardi-Goutieres syndrome, can inhibit L1 and IAP retrotransposition in cultured cell assays165, 166, but the mechanism of Trex1-mediated TE repression requires elucidation.
Other mechanisms are also likely to restrict the mobility of non-LTR retrotransposons. For example, the overwhelming majority of L1 elements in mammalian genomes are 5’ truncated and are essentially “dead on arrival” because they cannot synthesize proteins critical for retrotransposition167. It has been proposed that 5’ truncation may be due to the low processivity of the L1-encoded reverse transcriptase. However, recent work on the reverse transcriptase encoded by the R2 non-LTR retrotransposon of Bombyx mori demonstrated that this enzyme is more processive than the reverse transcriptases encoded by retroviruses168. Alternatively, L1 5’ truncation might result if host factors cause the dissociation of the L1 reverse transcriptase from the nascent cDNA and/or degrade the L1 mRNA during integration. In this scenario, to generate a full-length insertion the L1 RT would need to complete integration before the TPRT intermediate is recognized as DNA damage by the host50, 169. Indeed, proteins involved in the non-homologous end-joining pathway of DNA repair seem to act to restrict the retrotransposition of a zebrafish LINE-2 element in DT40 chicken cells170, whereas members of the DNA excision repair pathway (that is, ERCC1/XPF1) might restrict L1 retrotransposition in cultured human cells171.
Finally, in addition to recognizing the L1 integration intermediate as a form of DNA damage, recent data suggest that retrotransposition indicator cassettes delivered by engineered L1s in human embryonic carcinoma cell lines can be epigenetically silenced during or immediately after their integration into genomic DNA172. Given that L1 is an ancient “stowaway” in mammalian genomes, it is likely that the host has evolved multiple mechanisms to combat L1 mobility at discrete steps in the retrotransposition pathway, and that some of these mechanisms operate in a context-dependent manner. Clearly, continued studies will reveal new and more diverse host mechanisms to restrict TE mobility.
Despite mechanisms to combat TE mobility, TEs continue to thrive in many host genomes. Thus, TEs must have evolved ways to either overwhelm or counteract these host defenses. TEs must mobilize in germ cells or during early development to ensure their survival (Figure 4). However, some TEs can mobilize in somatic cells, providing a potential mechanism to generate intra-individual genetic variation.
Drosophila P elements provide one of the best-studied cases of cell-type-specific transposition173. P element transposition occurs when females lacking P elements (the M cytotype) mate with males carrying P elements (the P cytotype); the resulting P element mobilization can cause hybrid dysgenesis in the offspring. In the reciprocal cross, eggs from P cytotype females produce a repressor protein and piRNAs that inhibit P element transposition57, 158. The repressor is an alternatively spliced, truncated form of the transposase. Importantly, the repressor not only controls which crosses produce germline integration, but also inhibits transposition in the soma.
The Drosophila gypsy element is another example of a TE that exhibits tissue-specific control174, 175. Gypsy transcription is induced in the somatic follicle cells that surround the oocyte. The TE mRNA assembles into virus-like particles that are thought to traffic to the oocyte to carry out transposition. It remains unclear whether the transfer of gypsy virus-like particles to the oocyte occurs via an enveloped particle (similar to retroviruses) or by a form of endocytosis. However, the Flamenco locus encodes piRNAs that silence gypsy elements in follicle cells, thereby preventing the spread of these TEs to the neighboring germ cells155.
Relatively little is known about the developmental timing of L1 retrotransposition in mammals. The sheer numbers of L1 and Alu retrotransposons that populate mammalian genomes provide prima facie evidence that they mobilize in the germline. Various studies, using endogenous and engineered L1s, provide strong experimental evidence to back this assertion. For example, full-length mouse L1 RNA and the mouse L1 ORF1-encoded protein are co-expressed in leptotene and zygotene spermatocytes during meiotic prophase176. In addition, the mouse ORF1 protein is expressed in the cytoplasm during specific stages of development in oocytes177. Similarly, human oocytes express L1 RNA and support the retrotransposition of an engineered human L1 element178. Finally, transgenic mouse experiments demonstrated that an engineered human L1 retrotransposon, whose expression is driven from a heterologous pPol II promoter, can retrotranspose in male germ cells179.
Unexpectedly, a growing body of experimental evidence suggests that L1 retrotransposition also might occur frequently during early development (Figure 4) (also reviewed in ref. 5). For example, human embryonic stem cells can express L1 RNA and ORF1 protein, and accommodate the retrotransposition of engineered L1s, albeit at lower levels than in other types of transformed human cells19, 180. In addition, studies of a male patient with X-linked choroideremia revealed that his mother had mosaicism for the mutagenic L1 insertion in both germline and somatic tissues20. Thus, the initial retrotransposition event must have occurred during early embryogenesis in the mother. Finally, recent transgenic experiments conducted in rats and mice led to the conclusion that most L1 retrotransposition occurs during early embryogenesis and that most of the resultant events are not heritable21. Intriguingly, these data suggest that L1 ribonucleoprotein particles can be deposited into zygotes by either the sperm or egg to undergo retrotransposition during early development, thereby providing a possible mechanism to generate somatic mosaicism and intra-individual genetic variation (see below).
Classical experiments in maize revealed that DNA TE activity in somatic tissues could lead to variegated corn color phenotypes1, 181. Since that time, somatic TE events also have been reported in other organisms. For example, it is well established that Tc1 transposition in the Bergerac strain of C. elegans preferentially occurs in somatic cells182. Similarly, a recent study has revealed that somatic transposition of a DNA TE (Hatvine1-rrm) into the promoter region of the VvTFL1A gene of the grapevine cultivar Carignan affects the grapevine branching pattern and the size of fruit clusters183. Also, a mutagenic L1 insertion was identified in the adenomatous polyposis coli (APC) gene in tumor tissue, but not in the surrounding tissue, of a patient with colon cancer, suggesting a role for the insertion in cancer development22. Together with the transgenic L1 experiments (discussed above), these findings establish that somatic TE mobility can lead to phenotypic changes in the host.
Intriguingly, several lines of evidence suggest that somatic L1 retrotransposition may also occur in the mammalian nervous system (also reviewed in ref. 5). First, an engineered human L1 can retrotranspose in neurogenic zones of the brain in transgenic mice24 when its expression is driven by a promoter contained within its native 5’ UTR184. Second, engineered human L1s can retrotranspose in cultured rat neuronal progenitor cells (NPCs), in human embryonic stem cell-derived NPCs, and at low levels in human fetal-derived NPCs23, 24. Third, sensitive multiplex quantitative PCR experiments suggest a modest increase in L1 copy number in postmortem brain tissue when compared to heart and liver tissue derived from the same individual23. Finally, retrotransposition of an engineered human L1 is elevated in a mouse model of Rett syndrome (a neurodevelopmental disorder), and induced pluripotent stem cells derived from Rett syndrome patients exhibit an increase in L1 DNA copy number when compared to normal controls, suggesting a potential increase in endogenous L1 retrotransposition185.
The above studies strongly suggest that certain neuronal cells may be permissive for L1 retrotransposition. However, additional research is needed to truly understand the impact of L1 retrotransposition in the brain. For example, recent advances in DNA sequencing technology should provide a means to directly test whether L1 DNA copy number changes detected in quantitative PCR experiments represent actual de novo endogenous retrotransposition events or result from other forms of genomic instability reported in neurons186, 187. Similarly, it remains unclear whether endogenous L1 retrotransposition events represent a type of “genomic noise” or whether they have any functional impact on neuronal development. Finally, it remains a mystery why neuronal cells may accommodate L1 retrotransposition at apparently higher levels than other somatic cells. Nonetheless, these studies have unveiled a new area of investigation that surely will be the subject of future work.
A growing body of evidence suggests that L1 retrotransposition may become deregulated in certain cancers. For example, early studies revealed that hypomethylation of the L1 promoter is correlated with increased L1 expression and/or the production of the L1 ORF1-encoded protein in certain tumors188–190. Moreover, engineered human L1s readily retrotranspose in a variety of transformed human and mouse cell lines, but generally show lower levels of retrotransposition activity in “normal” human cells such as fibroblasts (e.g., 11, 191, 192). Consistent with this, recent findings using second-generation DNA sequencing revealed a total of 9 de novo L1 retrotransposition events in 6 of 20 examined non-small cell lung tumors12. Intriguingly, the tumors containing the new L1 insertions also exhibited a specific genome-wide hypomethylation signature, which is consistent with the notion that altering the epigenome can create a permissive environment for L1 expression and/or retrotransposition, and perhaps the retrotransposition of other classes of non-LTR retrotransposons. Clearly, further innovations in DNA sequencing of heterogeneous cell populations will be critical to reveal patterns of TE activity in diverse tumors. The challenge then will be to determine whether all these TE insertions are “passenger” mutations that are a consequence of the altered cellular milieu of cancer cells or whether some act as “drivers” to promote tumorigenesis.
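To put the lung tumor counts quoted above in proportion, the short sketch below derives three summary values from the reported numbers (9 events, 6 positive tumors, 20 tumors examined). The function name and the choice of summaries are illustrative assumptions; the per-tumor distribution of the 9 events is not given in the text, so only averages are computed.

```python
def summarize_insertions(events, positive_tumors, tumors_examined):
    """Return simple summary ratios for de novo insertion counts.

    All three arguments are plain counts; the returned values are averages,
    not per-tumor measurements.
    """
    return {
        "fraction_of_tumors_with_events": positive_tumors / tumors_examined,
        "mean_events_per_positive_tumor": events / positive_tumors,
        "mean_events_per_examined_tumor": events / tumors_examined,
    }


if __name__ == "__main__":
    # Counts as quoted in the text for non-small cell lung tumors.
    summary = summarize_insertions(events=9, positive_tumors=6, tumors_examined=20)
    for name, value in summary.items():
        print(f"{name}: {value:.2f}")
```

These averages describe only the events that were detected; they say nothing about detection sensitivity in heterogeneous tumor samples, which the text identifies as an open technical challenge.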
It is undeniable that TEs have played important roles in structuring genomes and generating genetic diversity. By understanding how, when, and where TEs integrate, and how the host responds to this ever-present threat, we will unveil the dynamic forces that shape our genomes. Indeed, we are now able to critically evaluate the McClintock doctrine, and future experiments should provide valuable insight into whether the increases in TE transcription caused by environmental stress lead to higher levels of TE integration, and whether these insertions impact host phenotypes and/or survival.
It remains a curiosity why sequences without any apparent purpose continue to thrive in genomes. What is clear is that an understanding of TE biology is necessary to understand genome biology. It is intriguing to speculate that some phenotypic differences among organisms and/or between individuals are due to the effects of TEs. These speculations require rigorous experimental tests. However, the coming years should be an exciting time for TE biology.
We thank Dr. John Kim, Dr. José Garcia-Perez, and members of the Moran lab for critical reading of the manuscript. H.L. was supported in part by the Intramural Research Program of the NIH from the Eunice Kennedy Shriver National Institute of Child Health and Human Development. H.L. received additional support from the Intramural AIDS Targeted Antiviral Program. J.V.M. was supported in part by grants from the National Institutes of Health (GM060518 and GM082970). J.V.M. also is an Investigator of the Howard Hughes Medical Institute.
Henry L. Levin heads the Section on Eukaryotic Transposable Elements in the Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH. He received his Ph.D. in molecular biology at the University of California, Berkeley with Dr. Howard Schachman and completed postdoctoral research at Johns Hopkins University School of Medicine with Dr. Jef Boeke. Over the past 18 years Levin’s studies of LTR retrotransposons in fission yeast have identified mechanistic details of particle formation, reverse transcription, and integration. Recently, work in the Levin lab has focused on the mechanisms that control the position of integration.
John V. Moran currently is the Gilbert S. Omenn Collegiate Professor of Human Genetics at the University of Michigan Medical School in Ann Arbor, Michigan, USA. He also is an Investigator of the Howard Hughes Medical Institute. He received his Ph.D. in biochemistry from the University of Texas Southwestern Medical Center in Dallas, Texas, USA, and conducted his postdoctoral studies in the laboratory of Dr. Haig H. Kazazian Jr. at Johns Hopkins University School of Medicine in Baltimore, MD, USA and at the University of Pennsylvania School of Medicine in Philadelphia, PA, USA. His laboratory uses genetic, molecular biological, biochemical, and genomic approaches to understand the biology of human LINE-1 retrotransposons.