Non-LTR retrotransposons – including LINE-1 (or L1), Alu and SVA elements – have proliferated during the past 80 million years of primate evolution and now account for approximately one third of the human genome. These transposable elements are now known to affect the human genome in many different ways: generating insertion mutations, genomic instability, alterations in gene expression and also contributing to genetic innovation. As the sequences of human and other primate genomes are analyzed in increasing detail, we are beginning to understand the scale and complexity of the past and current contribution of non-LTR retrotransposons to genomic change in the human lineage.
Also known as “jumping genes”, transposable elements (TEs) are discrete pieces of DNA that can move from site to site within (and sometimes between) genomes. Although their discovery dates back to the 1940s1, it took about half a century before we began to understand how TEs interact with their genomic environment. A crucial stage was reached with the completion of the first human genome sequence, which revealed that nearly half of our genome is derived from TEs2,3 (FIG. 1a). Actually, this is likely to be an underestimate, as many ancient TEs inserted in the human genome have probably diverged beyond recognition3. The scale of the contribution of TEs to the human genome is all the more remarkable when considering that protein-coding regions account for just 1.5% of the human genome3.
TEs can be separated into two major classes: DNA transposons and retrotransposons. DNA transposons, which make up ~3% of the human genome (FIG. 1a), are able to excise themselves from the genome, move as DNA and paste themselves into new genomic sites4. Although they are not currently mobile in the human genome, they were active during early primate evolution, until ~37 million years (My) ago5. Retrotransposons duplicate via RNA intermediates that are reverse-transcribed and inserted at new genomic locations4. Retrotransposons can be subdivided into two groups, distinguished by the presence or absence of long terminal repeats (LTRs). Human LTR elements are endogenous retroviruses, which account for ~8% of the genome (FIG. 1a). Most endogenous retroviruses inserted into the human genome >25 My ago, and their activity in humans is presently very limited, if it occurs at all3,6. By contrast, the vast majority of human TEs result from the present and past activity of non-LTR retrotransposons, typified by LINE-1 (or L1), Alu and SVA elements, which collectively account for about one third of the human genome3 (FIG. 1a). L1, Alu and SVA non-LTR retrotransposons are the only TEs unequivocally shown to be currently active in humans, as demonstrated by more than 60 reported cases of de novo insertions responsible for genetic disorders7-11.
The extremely high density of TEs in our genome raises the question of their evolutionary significance: what impact have they had during human evolution? The development of innovative molecular methods such as retrotransposition assays in cultured cells12,13 and of computational techniques for comparative genomics, together with the availability of multiple primate genome sequences (such as the human3, chimpanzee14 and macaque15 genomes), has progressively shifted the focus of TE research towards the diversity and depth of the impact of TE activity on genome evolution. Recent years have witnessed a number of important discoveries regarding the ways in which TEs affect human genome evolution, and it is now possible to quantify the overall impact that TE activity has had in shaping our genome. For example, it has long been recognized that recombination between TEs can trigger genomic deletions in humans, as these deletions have caused several genetic disorders8. However, only recently have genome-wide comparisons of human and other primate genomes permitted us to determine the magnitude and significance of TE recombination-mediated deletions at an evolutionary scale16-18.
In this review, we focus on the evolutionary impact of non-LTR retrotransposons, which are by far the most abundant TEs in the human genome and the most active TEs during recent human evolution. First, we briefly describe the structure of non-LTR retrotransposons and the mechanisms by which they move. Then, we explore the evolutionary dynamics of non-LTR retrotransposons, that is, what has made them so evolutionarily successful in the human genome. Addressing this question helps us to understand how, and to what extent, TEs in general (and non-LTR retrotransposons in particular) have affected human genome evolution. This impact turns out to be both considerable and remarkably diverse, ranging from local instability (for example, through insertion mutagenesis and seeding of microsatellites) to large-scale structural variation (for example, through ectopic recombination and transduction of flanking sequences) to contributions to genetic innovation (for example, through new gene formation and exonization) and alterations in gene expression (for example, through alternative splicing and epigenetic regulation). Finally, we conclude with potential future research directions.
There are >500,000 L1 copies in the human genome, resulting from their continued mobilization activity for the past 150 My3. L1 elements constitute ~17% of the human genome, which makes them the most successful TEs in the human genome by mass (FIG. 1a). The canonical, full-length L1 element is ~6 kilobases (kb) in length and it consists of a 5′ untranslated region (UTR) containing an internal RNA polymerase II promoter19, two open reading frames (ORF1 and ORF2) and a 3′ UTR containing a polyadenylation signal ending with an oligo dA-rich tail of variable length20 (FIG. 1b). ORF1 encodes an RNA-binding protein and ORF2 encodes a protein with endonuclease and reverse-transcriptase activities20. This molecular machinery allows the retrotransposition process known as target-primed reverse transcription (TPRT) to occur (BOX 1), thus making L1 elements the only autonomous TEs in the human genome. However, not all L1 copies are competent for retrotransposition. Indeed, as a result of the TPRT process and decay over time, most L1 copies are inactivated by truncations, internal rearrangements and mutations3,21. Of the >500,000 L1 elements inserted in the human genome, fewer than 100 copies are intact22.
The increase in copy numbers of non-LTR retrotransposons occurs via an RNA-based duplication process termed retrotransposition. The first step in L1 retrotransposition involves RNA polymerase II-mediated transcription of a genomic L1 locus from an internal promoter that directs transcription initiation at the 5′ boundary of the L1 element19,129. An internal promoter is advantageous for a retrotransposon that generates autonomous duplicate copies at multiple locations in the genome, because each new full-length copy carries its own promoter and can be transcribed wherever it inserts. The L1 RNA is exported to the cytoplasm, where ORF1 (encoding an RNA-binding protein) and ORF2 (encoding a protein with endonuclease and reverse-transcriptase activities) are translated. Both proteins exhibit strong cis-preference27; consequently, they preferentially associate with the L1 RNA transcript that encoded them, forming a ribonucleoprotein (RNP) particle. The RNP is then transported back into the nucleus by a mechanism that is poorly understood.
The integration of the L1 element into the genome likely occurs via a process termed target-primed reverse transcription13,130,131 (TPRT), which was originally described for the R2 non-LTR retrotransposon of the silkworm Bombyx mori132. During TPRT, it is thought that the L1 endonuclease cleaves the first strand of target DNA, generally at 5′-TTTT/AA-3′ consensus sites133 (a). The free 3′ hydroxyl generated by the nick is then used to prime reverse transcription of L1 RNA (red) by the L1 reverse transcriptase (b). The second strand of the target DNA is cleaved (c) and used to prime second-strand synthesis (d), through poorly understood mechanisms. Hallmarks of the integration process include frequent 5′ truncations, the presence of an oligo dA-rich tail at the 3′ end, and 2-20 bp-long duplications of the target site3,21 (TSD) (e).
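The cleavage and gap-filling steps of TPRT described above can be illustrated with a toy model. The minimal Python sketch below is our own illustration, not a published tool: it scans one strand for exact 5′-TTTT/AA-3′ matches and shows how staggered cuts a few base pairs apart leave a target site duplication on both sides of the new copy. The function names are hypothetical, real L1 endonuclease sites are degenerate rather than exact, and strand orientation and 5′ truncation are ignored.

```python
from __future__ import annotations

import re


def find_en_sites(seq: str) -> list[int]:
    """Return nick positions at exact 5'-TTTT/AA-3' matches on the given strand.

    The nick falls between TTTT and AA, so each returned index is the position
    of the first A. Real L1 endonuclease sites are degenerate; exact matching
    is a simplification.
    """
    return [m.start() + 4 for m in re.finditer("TTTTAA", seq.upper())]


def insert_with_tsd(target: str, element: str, nick: int, tsd_len: int) -> str:
    """Insert `element` at `nick`, duplicating target[nick:nick + tsd_len].

    Toy model of first- and second-strand cleavage `tsd_len` bp apart: after
    gap filling, the target segment between the two cuts ends up on both
    sides of the new copy (the target site duplication, TSD).
    """
    if tsd_len < 0 or not 0 <= nick <= len(target) - tsd_len:
        raise ValueError("nick and TSD must lie within the target sequence")
    tsd = target[nick:nick + tsd_len]
    return target[:nick] + tsd + element + tsd + target[nick + tsd_len:]


target = "GCGCTTTTAAGCTAGC"
site = find_en_sites(target)[0]
print(site, insert_with_tsd(target, "[L1]AAAAA", site, 6))
```

The placeholder "[L1]AAAAA" stands in for a newly inserted copy ending in its oligo dA-rich tail; the printed sequence shows the same 6 bp target segment immediately before and after it.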
Alu and SVA retrotransposition also likely occurs via TPRT, through the hijacking of the L1 retrotransposition machinery12,29,30. The mechanism of Alu and SVA trans-mobilization by L1 proteins remains elusive. RNA polymerase III-mediated Alu transcripts are exported to the cytoplasm and bound to SRP9/14 proteins to form stable RNPs134,135. It has been hypothesized that Alu RNPs interact with ribosomes, thereby positioning Alu transcripts in close vicinity of nascent L1 ORF2 proteins12,42 (ORF1 protein enhances, but is not strictly required for, Alu retrotransposition12,136). However, it remains unclear whether Alu RNPs gain access to the L1 retrotransposition machinery in the cytoplasm or in the nucleus, as Alu RNPs might recruit L1 ORF2 proteins in the nucleus and immediately proceed with TPRT137.
There are >1 million Alu copies in the human genome3, resulting from their continued mobilization activity throughout the past ~65 My23. This makes Alu elements the most successful TEs in the human genome in terms of copy number. The typical full-length Alu element is ~300 base pairs (bp) in length and it exhibits a dimeric structure formed by fusion of two monomers derived from the 7SL RNA gene24 (a component of the signal recognition particle), which are separated by an A-rich linker region (FIG. 1c). The 5′ region contains an internal RNA polymerase III promoter (A and B boxes) and the element ends with an oligo dA-rich tail of variable length23. As Alu elements do not possess RNA polymerase III termination signals, Alu transcripts extend into the downstream flanking sequence until a terminator (typically a run of four or more consecutive Ts) is found25,26. Alu elements have no coding capacity and are, therefore, non-autonomous TEs. Instead, they borrow the retrotransposition machinery encoded by L1 elements12, despite the strong cis-preference of L1 ORF1 and ORF2 proteins for L1 RNA27 (BOX 1). This dependence on L1 is why Alu elements are sometimes referred to as “a parasite’s parasite”28.
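The readthrough rule described above, in which an Alu transcript continues past the element until it meets a run of Ts, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the element's coordinates within a longer locus are known, transcription starts exactly at its 5′ end, and the first run of four or more Ts downstream of the element acts as the terminator. The function name is hypothetical, and real RNA polymerase III termination is more nuanced than a fixed pattern.

```python
from __future__ import annotations

import re

# A run of four or more Ts on the coding strand, treated as the Pol III terminator.
TERMINATOR = re.compile(r"T{4,}")


def alu_transcript(locus: str, alu_start: int, alu_end: int) -> str | None:
    """Predict the Pol III transcript of an Alu copy at locus[alu_start:alu_end].

    Transcription is assumed to start at the 5' end of the element and to read
    through the oligo(dA) tail into the downstream flank until the first run
    of >= 4 Ts, which is included in the returned sequence for simplicity.
    Returns None if the supplied flank contains no terminator.
    """
    # Search only from the element's 3' end, since the element supplies no terminator.
    m = TERMINATOR.search(locus.upper(), alu_end)
    if m is None:
        return None
    return locus[alu_start:m.end()]


# Invented locus: 2 bp upstream, a 12 bp stand-in for an Alu copy, then its flank.
locus = "CC" + "GGCCGGAAAAAA" + "GCGTTTTTGA"
t = alu_transcript(locus, 2, 14)
print(t, "readthrough:", len(t) - (14 - 2), "bp")
```

The readthrough length computed in the last line is the tail-to-terminator distance that, as discussed in Box 2, influences the overall length of Alu transcripts.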
There are ~3,000 SVA copies in the human genome, resulting from continued activity throughout the ~25 My of hominoid evolution29,30. The typical full-length SVA element is ~2 kb in length and it is composed of a hexamer repeat region, an Alu-like region, a variable number of tandem repeats (VNTR) region, a HERV-K10-like region and a polyadenylation signal ending with an oligo dA-rich tail of variable length29,30 (FIG. 1d). Several lines of evidence suggest that SVA elements are transcribed by RNA polymerase II29,30. However, SVA elements apparently contain no internal promoter and they might rely, at least partly, on promoter activity in flanking regions29,30. Similar to Alu elements, SVA elements are non-autonomous TEs presumably trans-mobilized by the L1 retrotransposition machinery29,30 (BOX 1).
In addition to the L1, Alu and SVA elements, which are currently active, there are additional families of old, inactive non-LTR retrotransposons that comprise a total of ~6% of the human genome (FIG. 1a). Although far less numerous than L1 and Alu elements, these old elements represent a rich molecular fossil record testifying to the long-term relationship between TEs and the human genome3. For example, this record indicates that before the expansion of the autonomous L1 element and its Alu parasite, the genome experienced retrotransposition of the autonomous L2 element and its MIR parasite3. These old elements may have substantially affected human genome evolution31-34.
The impact of non-LTR retrotransposons on human genome evolution largely results from their extremely high copy numbers (for example, there is one Alu insertion every ~3 kb on average3) and continued activity over tens of My. These two features are particularly striking when considering the various cellular processes that control retrotransposon activity (BOX 2). At an evolutionary scale, the vertical persistence of non-LTR retrotransposons, not only in primates but in mammals in general, sets them apart from most other TEs in mammals and other eukaryotes3,5,35. In this section, we discuss the evolutionary dynamics that have made non-LTR retrotransposons so prolific during primate genome evolution.
TEs can be seen as selfish genetic entities whose spread can be deleterious to the host cell due to the genomic instability that is induced by a massive increase in copy number. As a result of the conflicting interests of TEs and the host genome, the cell has developed various processes to control retrotransposon activity, as predicted by the Red Queen hypothesis138. Below we provide examples of how L1 and Alu retrotransposition activity is regulated in host cells (for more detailed discussion, see refs. 11,139).
Regulation of L1 retrotransposition can occur at the transcription level. For example, novel regulatory regions have been frequently recruited during L1 evolution38; the current L1 5′ UTR contains several transcription factor-binding sites important for transcription activation or initiation140-142. In addition, DNA methylation at the promoter is known to repress L1 expression124,143. L1 elements are also subject to post-transcriptional regulation. For example, RNA-induced silencing through RNA interference has been suggested to reduce L1 retrotransposition in cultured cells144,145. The A-rich coding strand of the full-length human L1 contains 19 potential canonical and noncanonical polyadenylation signals that lead to truncation of full-length L1 transcripts by premature polyadenylation, thus ultimately contributing to the attenuation of L1 activity110. Furthermore, cells produce proteins such as those of the APOBEC3 family that can inhibit L1 and Alu retrotransposition146.
Alu activity is influenced by its primary sequence, in that the accumulation of mutations over time may alter important motifs such as the internal RNA polymerase III promoter or SRP9/14 binding motifs26,42. The accumulation of mutations is facilitated by the high density of CpG dinucleotides, which are prone to mutation as a result of the deamination of 5-methylcytosine residues125. Overall, it has been estimated that when an Alu copy reaches ~10% divergence from its subfamily consensus sequence, the likelihood that it continues to be active is remote42. The length and homogeneity of the oligo dA-rich tail also appear to be important for activity147,148. The genomic environment into which Alu copies insert is crucial for retrotranspositional activity149-151, as is the distance between the oligo dA-rich tail at the 3′ end of the Alu sequence and the RNA polymerase III terminator located in the downstream sequence, which determines the overall length of Alu transcripts26.
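The ~10% divergence rule of thumb above can be made concrete with a simple calculation. The sketch below is a minimal illustration that assumes the copy has already been aligned to its subfamily consensus (equal length, '-' for gaps) and uses an uncorrected proportion of differing sites; published estimates rely on proper alignments and may treat CpG sites separately. Both function names are hypothetical.

```python
from __future__ import annotations


def percent_divergence(copy: str, consensus: str) -> float:
    """Percent of aligned, ungapped columns at which `copy` differs from `consensus`.

    Both sequences must already be aligned to the same length, with '-' for gaps;
    columns with a gap in either sequence are skipped (uncorrected p-distance).
    """
    if len(copy) != len(consensus):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = differences = 0
    for a, b in zip(copy.upper(), consensus.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * differences / compared


def probably_retains_activity(copy: str, consensus: str, threshold: float = 10.0) -> bool:
    """Heuristic from the text: copies that reach ~10% divergence are unlikely to stay active."""
    return percent_divergence(copy, consensus) < threshold


consensus = "ACGTACGTAC"  # invented 10 bp "consensus"
print(percent_divergence("ACGTACGTAA", consensus))         # 10.0
print(probably_retains_activity("ACGTACGTAC", consensus))  # True
```

Because the CpG dinucleotides in a consensus mutate faster than other sites, a copy's divergence in practice climbs towards this threshold more quickly than a uniform mutation rate would suggest.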
A key concept relevant to the evolutionary dynamics of L1, Alu and SVA sequences is that they can all be divided into subfamilies or “clades” of related elements based on diagnostic nucleotide substitutions and insertions/deletions exclusively shared by all subfamily members. For example, more than 200 Alu subfamilies are currently recognized in the human genome36, but only six subfamilies of the younger SVA family exist30. Not only are subfamilies different in age, but the diagnostic sequence mutations or changes that define subfamilies tend to accumulate hierarchically23,37. In other words, instead of two subfamilies being independently derived from an ancestral subfamily, most subfamilies represent an ongoing linear sequential evolution pattern where a series of subfamilies have each been successively derived one from the other. For example, it has been shown that during the past ~40 My, all L1 subfamilies in the human genome are derived from a single lineage from which they arose sequentially38. Similar patterns of subfamily evolution have been reported for Alu23 and SVA30 elements. These observations can be explained if one assumes that only a few elements (so-called ‘source’ or ‘master’ elements) are involved in the retrotransposition process and are responsible for the formation of all other subfamily members37.
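The hierarchical subfamily structure described above suggests a simple way to classify a copy: each derived subfamily carries its ancestor's diagnostic changes plus its own, so a copy belongs to the most derived subfamily whose diagnostic sites it fully matches. The Python sketch below illustrates that logic only. The subfamily names, positions and bases are invented for illustration (they are not real Alu, L1 or SVA diagnostics), and the query sequence is assumed to be already aligned to consensus coordinates.

```python
from __future__ import annotations

# Hypothetical subfamilies forming a linear, hierarchical series: each derived
# subfamily carries all diagnostic changes of its ancestor plus its own.
# Positions (0-based consensus coordinates) and bases are invented.
DIAGNOSTICS: dict[str, frozenset[tuple[int, str]]] = {
    "SubA": frozenset(),
    "SubB": frozenset({(10, "G")}),
    "SubC": frozenset({(10, "G"), (25, "T")}),
    "SubD": frozenset({(10, "G"), (25, "T"), (40, "C")}),
}


def assign_subfamily(seq: str, diagnostics=DIAGNOSTICS) -> str:
    """Return the most derived subfamily whose diagnostic sites all match `seq`.

    `seq` is assumed to be aligned to consensus coordinates already.
    """
    seq = seq.upper()
    matching = [
        name
        for name, sites in diagnostics.items()
        if all(pos < len(seq) and seq[pos] == base for pos, base in sites)
    ]
    if not matching:
        raise ValueError("no subfamily matches")
    # In a linear series, the most derived matching subfamily has the most diagnostic sites.
    return max(matching, key=lambda name: len(diagnostics[name]))


copy = ["A"] * 50
copy[10], copy[25] = "G", "T"  # shares the diagnostic changes of SubB and SubC
print(assign_subfamily("".join(copy)))  # SubC
```

A copy carrying only the change at position 25 falls back to SubA, since under a strictly sequential model no subfamily acquired that change without first acquiring the one at position 10.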
The “master gene” model of retrotransposon amplification37 has been refined, in particular by quantifying the number of retrotransposition-competent elements within the human genome. Analysis of the >200 Alu subfamilies in the human genome suggested the existence of at least 143 Alu source elements36, and it has been estimated that an average human genome carries 80-100 retrotransposition-competent L1 copies, six of which (hot L1s) are probably responsible for the bulk of L1 retrotransposition22,39,40. These results further indicate that several source elements may exist within a subfamily, because all six hot L1 elements belong to the L1-Ta subfamily22. A network-based analysis also revealed that, in addition to a main master element, human-specific Alu subfamilies typically contain secondary source elements (~15% of subfamily members) that contributed ~30% of subfamily members41. Thus, there may be hundreds of active Alu ‘core’ sequences in the human genome42. Although they represent only a tiny fraction of all human non-LTR retrotransposons, source elements can be considered the ultimate drivers of evolutionary change in the human genome because they are responsible for most L1, Alu and SVA elements inserted in our genome.
Another distinguishing feature of human retrotransposons is their persistent activity over tens of My of evolution. How have active retrotransposons been maintained over this time? Reconstruction of the evolutionary history of the Alu Yb lineage showed that it originated during early hominoid evolution, 18-25 My ago43. Strikingly, the Alu Yb lineage has dramatically expanded to ~2,000 copies within the past few My specifically in the human genome, as non-human hominoid primates carry only a handful of Alu Yb elements43-45. Therefore, the Alu Yb lineage remained in the genome with little or no retrotransposition for 15-20 My, while preserving the ability to generate a high number of new copies in a species-specific manner. These results suggest that long-lived, low-activity source elements may act as “stealth drivers” that occasionally produce elements, some of which may become highly active. Whereas highly active “master” elements may be deleterious and subject to negative selection, low-activity stealth drivers may allow the Alu lineage to persist in the long term43. Attenuation of mobilization activity may be a common evolutionary strategy of various retrotransposons46,47. Therefore, the ability to maintain low to moderate levels of retrotransposition activity may be an important feature that allowed human retrotransposons to maintain long-term activity.
Because of their continued activity and accumulation in the genome over tens of My, L1, Alu and SVA elements have had a tremendous impact on the evolution of primate genomes, both in terms of structure and function. To assess the impact of these elements on genome evolution we can first consider how frequently retrotransposition occurs in the germline. The current rate of Alu retrotransposition has been estimated as one insertion every ~20 births in humans, based on both the frequency of disease-causing de novo insertions compared to nucleotide substitutions48 and evolutionary comparisons of the human and chimpanzee genomes48 and of multiple human genome sequences49. The current rate of L1 retrotransposition has also been estimated as one insertion every ~20 births in humans based on disease-causing de novo insertions50 but as one insertion every ~200 births based on genome comparisons49. The difference between the two estimates might lie in the underlying assumptions of the methods, but no such bias is observed for Alu elements using the same approaches. Alternatively, the difference may reflect recent variation in the L1 retrotransposition rate or intense negative selection against L1 insertions. The current SVA retrotransposition rate has tentatively been estimated as one insertion every ~900 births based on genome comparisons49. However, there is more uncertainty around this rate due to the smaller datasets available for analysis. Although new heritable retrotransposition events take place in the germline, retrotransposition also occurs in somatic tissues with an impact ranging from cancer to a possible role in brain development8,51,52. Retrotransposon-induced somatic variation is a fascinating area of investigation that is likely to provide new insight into TE biology and their impact on human beings.
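The rate estimates above are easier to compare once converted into expected insertions per birth. The minimal Python sketch below does this for the point estimates quoted in the text and, under our own simplifying assumption (not the authors') that new insertions arise independently so that their number per birth is Poisson-distributed, computes the probability that a newborn carries at least one new insertion of each type. The dictionary labels and function names are illustrative.

```python
import math

# Point estimates quoted in the text, expressed as "one new insertion every N births".
BIRTHS_PER_INSERTION = {
    "Alu": 20,
    "L1 (disease-based estimate)": 20,
    "L1 (genome comparisons)": 200,
    "SVA (tentative)": 900,
}


def insertions_per_birth(births_per_insertion: float) -> float:
    """Expected number of new insertions per birth (the Poisson mean)."""
    return 1.0 / births_per_insertion


def prob_at_least_one(births_per_insertion: float) -> float:
    """P(a newborn carries >= 1 new insertion), assuming a Poisson model."""
    return 1.0 - math.exp(-insertions_per_birth(births_per_insertion))


for name, n in BIRTHS_PER_INSERTION.items():
    print(f"{name:30s} {insertions_per_birth(n):.4f} per birth, "
          f"P(>=1 new insertion) = {prob_at_least_one(n):.4f}")
```

For the Alu estimate of one insertion every ~20 births, the expected number is 0.05 per birth and the Poisson probability of at least one new insertion is about 4.9%. At such low rates the two quantities are nearly identical, so the simpler "one in N births" reading is adequate; the tenfold gap between the two L1 estimates matters far more than the choice of model.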
Amplification rates have not been uniform over time. For example, the vast majority of Alu elements were inserted by ~40 My ago, following a peak of amplification during which there was approximately one new Alu insertion in every birth53. Similarly, during the past ~70 My of evolution, variation in the L1 amplification rate has been observed, with the most prolific L1 subfamilies having amplified 12-40 My ago38. Genome-wide comparisons of the human and chimpanzee genomes provide additional evidence for recent variation in L1, Alu and SVA retrotransposition rates, as judged by the different numbers of species-specific elements that have inserted since the divergence of the two species ~6 My ago14,54,55. Such fluctuation in amplification rates on a short time-scale suggests influences at the host population level40,54.
Perhaps one of the most intuitive consequences of TE accumulation is their contribution to genome size increase56: L1 and Alu elements alone have contributed ~750 million bases (Mb) to the human genome sequence3 (FIG. 1). This increase in genome size is an ongoing process, as the human genome has accumulated ~2,000 L1, ~7,000 Alu and ~1,000 SVA copies within the past ~6 My of human evolution, a combined addition of >8 Mb14. Equally importantly, the ongoing expansion of retrotransposons has also created significant inter-individual variation in retrotransposon content; several hundred new mobile element insertions have been detected by comparing multiple human genome sequences49,57-59. These human-specific retrotransposon insertions are often polymorphic (present or absent) at orthologous loci among human individuals, and they constitute highly informative genetic markers that are being used to investigate human evolutionary history, population structure and demography (BOX 3).
As revealed by pioneering studies on humans152-154, primates155 and non-primate groups156,157, retrotransposons afford several advantages that make them very powerful genetic markers for studying human and non-human primate evolutionary history23,157,158. They are essentially homoplasy-free characters, in that individuals that share retrotransposon copies at orthologous sites have inherited them from a common ancestor; that is, precise excision of retrotransposons is extremely rare158,159. Their ancestral state is known to be the absence of the element at any locus of interest, which makes it possible to include hypothetical ancestors for rooting phylogenetic trees153. As there are only two possible character states for each locus (presence or absence of the element), genotyping individuals for retrotransposon insertions is technically straightforward. Furthermore, the vast majority of retrotransposon insertions are neutral residents of the genome160, and the gradual accumulation of elements over time makes it possible to select loci suited to a range of time points in primate history. As a result, retrotransposon insertion polymorphisms (most notably Alu elements) have been used to decipher the phylogenetic relationships of various primate groups161,162, including the resolution of the human-chimpanzee-gorilla trichotomy, demonstrating the close relationship between humans and chimpanzees163. Some retrotransposons have inserted so recently during human evolution that they are polymorphic for presence or absence among human populations and individuals23,49,164. In particular, Alu elements have proved highly informative for the study of human origins, providing strong evidence for an African origin of anatomically modern humans153,154. More recently, Alu element insertion polymorphisms have been used to investigate human population structure and demography154,165,166. 
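Because each locus has only two states and absence is ancestral, population summaries of insertion polymorphisms reduce to simple counting. The minimal Python sketch below shows one common way to do this: it computes the frequency of the derived "insertion present" allele and compares observed heterozygosity with the Hardy-Weinberg expectation 2p(1 - p). The 0/1/2 genotype coding, the function names and the genotype data are our own illustrative assumptions, not data from the studies cited above.

```python
from __future__ import annotations


def insertion_frequency(genotypes: list[int]) -> float:
    """Frequency of the 'insertion present' allele at one presence/absence locus.

    Each genotype is the number of chromosomes carrying the insertion:
    0 = absent/absent, 1 = present/absent, 2 = present/present.
    """
    if not genotypes or any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be a non-empty list of 0, 1 or 2")
    return sum(genotypes) / (2 * len(genotypes))


def heterozygosity(genotypes: list[int]) -> tuple[float, float]:
    """Observed heterozygosity and Hardy-Weinberg expected heterozygosity 2p(1 - p)."""
    p = insertion_frequency(genotypes)
    observed = genotypes.count(1) / len(genotypes)
    return observed, 2 * p * (1 - p)


# Hypothetical genotypes for one polymorphic Alu insertion in two samples.
samples = {"population_A": [0, 1, 1, 2, 0], "population_B": [2, 2, 1, 2, 1]}
for name, gts in samples.items():
    observed, expected = heterozygosity(gts)
    print(f"{name}: p = {insertion_frequency(gts):.2f}, "
          f"Ho = {observed:.2f}, He = {expected:.2f}")
```

Since absence is the known ancestral state, p here is directly the derived-allele frequency, which is what makes such loci convenient for comparing populations without a separate outgroup.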
Retrotransposon insertion polymorphisms are also being used as forensic tools, for example for species-specific DNA detection and quantitation, analysis of complex biomaterials, human gender determination and the inference of geographic origin of human samples167.
There are many ways through which retrotransposons can generate genomic instability. In this section, we consider the effects of retrotransposons at a local genomic scale, linked to insertions and their protein products, as well as consequences affecting retrotransposon sequences at a deeper time scale.
The most straightforward way in which a retrotransposon can affect genome function, and thereby potentially influence genome evolution, is by inserting into protein-coding or regulatory regions (FIG. 2a). Due to the immediate phenotypic impact of such insertions, they were the first to be detected7. Examples of human genetic disorders caused by de novo L1, Alu and SVA insertions continue to accumulate: the 65 cases reported to date cause various heritable diseases such as haemophilia, cystic fibrosis, Apert syndrome, neurofibromatosis, Duchenne muscular dystrophy, β-thalassemia, hypercholesterolemia and breast and colon cancers8,9,11. Overall, it has been estimated that ~0.3% of all human mutations are attributable to de novo L1, Alu and SVA insertions10. Interestingly, L1 (and to a lesser extent Alu and SVA) disease-causing insertions appear to be enriched on the X chromosome8,9,11. This observation may partly be attributable to ascertainment bias, as X-linked mutations are expressed in hemizygous males and are thus more easily detected. Alternatively, L1 elements may preferentially insert into the X chromosome, perhaps in relation to their possible involvement in X inactivation60,61.
It has recently been shown that the ORF2 protein encoded by L1 elements, which has endonuclease activity, generates many more DNA double-strand breaks (DSBs) than the number associated with actual L1 insertions in mammalian cell lines62 (FIG. 2b). The extent to which this mechanism contributes to human genomic instability remains unknown, because the levels of L1 expression under these experimental conditions were much higher than those expected under normal cellular conditions. However, because the repair of L1-induced DSBs would leave no particular signature of L1 involvement, it is possible that a substantial fraction of the genomic instability associated with DSBs, which are highly mutagenic and recombinogenic, is ultimately attributable to L1 activity.
L1 and Alu elements have also been linked to DSB repair. Evidence from L1 retrotransposition assays in cultured cells demonstrated that L1 insertions can occur independently of endonuclease in cell lines that lack the ability to perform non-homologous end joining, a major mechanism of DSB repair63 (FIG. 2b). Endonuclease-independent (ENi) L1 insertions lack the hallmarks of TPRT (BOX 1), thereby suggesting that L1 elements can integrate into and repair DSB DNA lesions63. In addition, dysfunctional telomeres can serve as substrates for ENi L1 retrotransposition and endonuclease-deficient LINE-like (Penelope) elements are present at the telomeres of several eukaryotes, suggesting that ENi retrotransposition may be an ancestral mechanism of RNA-mediated DNA repair associated with non-LTR retrotransposons used before the acquisition of an endonuclease domain64,65. Recent analyses of the human genome have shown that 0.5-0.7% of all L1 and Alu insertions have non-canonical structures and may have resulted from ENi retrotransposition66,67, suggesting that non-LTR retrotransposons in general, not just L1 elements, may serve as a ‘fail-safe’ mechanism in maintaining human genome integrity.
Because of their abundance in the genome and because they contain homopolymeric tracts, non-LTR retrotransposons have the ability to generate microsatellites at many loci in the genome (FIG. 2c). This has been studied in particular for Alu elements68,69, each new copy of which provides two potential sources of microsatellites: the linker region in the middle of the element and the 3′ oligo dA-rich tail (FIG. 1c). These homopolymeric repeats can experience various mutational forces such as nucleotide substitutions and replication slippage, which may ultimately result in microsatellites of varying length and complexity. Consequently, it is not surprising that ~20% of all microsatellites (including ~50% of mononucleotide microsatellites) shared by the human and chimpanzee genomes lie within Alu elements70. In addition, at least two genetic disorders are caused by expansions of microsatellites that arose from A-rich regions of Alu elements71,72.
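The homopolymeric tracts described above (the A-rich linker and the 3′ oligo dA-rich tail) can be located with a simple pattern scan. The Python sketch below reports exact mononucleotide runs at or above a minimum length; the function name, the 6 bp default threshold and the example sequence are our own illustrative choices, and dedicated microsatellite finders additionally tolerate the interrupted, imperfect repeats that substitutions produce over time.

```python
from __future__ import annotations

import re


def homopolymer_runs(seq: str, min_len: int = 6) -> list[tuple[int, str, int]]:
    """Return (start, base, length) for every mononucleotide run of >= min_len bases."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # ([ACGT]) captures one base; \1{n,} requires it to repeat at least n more times.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0))) for m in pattern.finditer(seq.upper())]


# Invented sequence: an A-rich "linker" and a 3' oligo(dA) "tail" separated by other bases.
example = "GGC" + "A" * 7 + "GTCC" + "A" * 12 + "GCG"
print(homopolymer_runs(example))  # [(3, 'A', 7), (14, 'A', 12)]
```

Both runs in the invented sequence are reported, mirroring the two potential microsatellite sources that each new Alu copy contributes.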
Several studies indicate that Alu elements undergo gene conversion73,74 (FIG. 2d), a type of recombination defined as the non-reciprocal transfer of information between homologous sequences. Gene conversion may play a role in the evolution of Alu elements by inactivating active copies or resurrecting inactivated copies23. For example, it has recently been shown that the master element of the Alu Yh3a3 subfamily has been inactivated by gene conversion in humans, thus preventing further amplification of this subfamily75. In addition, because Alu elements make up >10% of the human genome, Alu-mediated gene conversion might have a substantial impact on the overall nucleotide diversity of our genome. It might also impair the use of SNPs located within Alu sequences as genetic markers, because gene conversion would make these SNPs identical by state rather than identical by descent23. However, the significance of this phenomenon has not been formally tested, and the development of second-generation sequencing approaches and personal genomics opens new avenues for resolving this question.
In addition to local genomic instability, retrotransposons can also generate genomic rearrangements such as deletions, duplications and inversions. In this section, we discuss three ways in which retrotransposons can create structural variation in the genome.
The insertion of L1 and Alu elements at new genomic sites sometimes results in the concomitant deletion of adjacent genomic sequence (FIG. 2e). This phenomenon was first observed during the analysis of L1 integrations within cultured human cells, where ~20% of L1 insertions were associated with structural rearrangements, including concomitant deletions at the insertion site, ranging in size from 1 bp to possibly >130 kb76-78. These deletions apparently can arise by endonuclease-dependent and ENi mechanisms78. L1 and Alu insertion-mediated deletions have subsequently been shown to occur naturally in the human and chimpanzee genomes, although they are usually shorter (<800 bp on average) and they occur at a much lower frequency than in cultured cells (~2% and ~0.3% of L1 and Alu insertion events, respectively)79,80. This may reflect, at least partly, negative selection against large, disruptive insertion-mediated deletions. Consistent with these observations, a 46 kb-long L1 insertion-mediated deletion event in the PDHX gene has recently been implicated in pyruvate dehydrogenase complex deficiency81 and human-chimpanzee genome comparisons identified a single insertion-mediated deletion event that caused functional gene loss within the past ~6 My79.
It has also been noted that ~90% of non-classical, ENi L1 and Alu insertions are associated with deletions of flanking sequence ranging in size from 1 bp to 14 kb, including one deletion that removed an olfactory receptor gene from the human and chimpanzee genomes66,67. Altogether, it has been estimated that during primate evolution, as many as ~45,000 insertion-mediated deletions may have removed >30 Mb of genomic sequences18.
Owing to their extremely high copy numbers, L1 and Alu elements can also create structural genomic variation after insertion, through recombination between non-allelic homologous elements (FIG. 2f), including elements that inserted in the genome long ago. Ectopic recombination can result in various types of genomic rearrangements such as deletions, duplications and inversions.
It has long been recognized that Alu recombination-mediated deletions (RMDs) occur in the human genome, as shown by the >70 reported cases of Alu RMDs responsible for various forms of cancer and genetic disorders8,10. By contrast, only three disease-causing L1 RMD events have been reported17. Genome-wide comparisons identified 492 Alu and 73 L1 RMD events in the human genome since the human-chimpanzee divergence16,17. L1 RMDs are larger on average than Alu RMDs, and they occur more frequently in gene-poor regions of the genome than Alu RMDs do. These results suggest negative selection against long, deleterious L1 RMDs in gene-rich regions of the genome18,82,83. Thus, the Alu and L1 RMD events detectable by comparative genomics largely represent the fraction of all RMDs that have escaped negative selection. Yet, based on human and chimpanzee genome comparisons, these events have collectively removed nearly 1 Mb of genomic sequence from the human genome within the past few My16-18, underscoring their important evolutionary impact on the human genome.
The human genome contains many large (>10 kb in length) and highly similar (>90% sequence identity) duplicated genomic regions, termed segmental duplications. Interestingly, the boundaries of human segmental duplications are significantly enriched in Alu elements: Alu sequences make up ~24% of boundary sequences but only ~11% of the rest of the human genome84. Considering that ~5% of the human genome has been duplicated within the past ~40 My, recombination between Alu elements may represent an important mechanism for the origin and expansion of segmental duplications in our genome84.
The contribution of L1 and Alu elements to chromosomal inversions has also been recently investigated by comparative genomics. Nearly half of the inversions that took place in the human and chimpanzee genomes since their divergence involve L1 and Alu elements, and ~20% of all inversions can clearly be identified as products of L1-L1 or Alu-Alu recombination events85. Although this type of rearrangement does not result in gain or loss of genomic sequence, it contributes to genomic variation, sometimes with functional consequences, as several of these inversions encompass exons85.
In addition to duplicating themselves, L1 and SVA elements can sometimes carry upstream or downstream flanking genomic sequences with them (termed 5′ and 3′ transduction, respectively) (FIG. 2g). In 3′ transduction, the RNA transcription machinery skips the weak retrotransposon polyadenylation signal and terminates transcription by using an alternative polyadenylation signal located downstream in the 3′ flanking sequence. Similarly, 5′ transduction occurs when a promoter located upstream of the retrotransposon is used to transcribe the sequence down to the retrotransposon86,87. The transcript containing the retrotransposon along with the extra genomic sequence is subsequently integrated back into the genome via retrotransposition. Initially characterized using cell culture-based methods88, 3′ transduction has subsequently been shown to occur frequently in the human genome: ~10% of both L1 and SVA insertions are associated with 3′ transduction events30,89-91.
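The read-through logic of 3′ transduction can be sketched as a toy string model. This is a minimal illustration under simplifying assumptions of our own: sequences are plain strings, the element's own polyadenylation signal is either used or skipped (a yes/no switch rather than a probability), the first AATAAA hexamer in the 3′ flank is taken as the alternative signal, and cleavage is placed a fixed, hypothetical number of bases after it. The function names are illustrative and do not come from the cited studies.

```python
from typing import Optional

PAS = "AATAAA"  # canonical polyadenylation signal hexamer


def transduced_segment(flank_3p: str, cleavage_offset: int = 15) -> Optional[str]:
    """Stretch of 3' flank carried along if transcription reads through to the
    first downstream PAS. cleavage_offset is a hypothetical fixed distance from
    the hexamer to the cleavage site (real cleavage sites vary)."""
    hit = flank_3p.find(PAS)
    if hit == -1:
        return None  # no alternative signal in the flank: no transduction here
    cut = min(len(flank_3p), hit + len(PAS) + cleavage_offset)
    return flank_3p[:cut]


def template_rna(element: str, flank_3p: str, own_signal_used: bool,
                 poly_a_len: int = 10) -> Optional[str]:
    """Polyadenylated RNA that would be reverse-transcribed and inserted
    (written with DNA letters for simplicity)."""
    if own_signal_used:
        return element + "A" * poly_a_len  # ordinary retrotransposition
    segment = transduced_segment(flank_3p)
    if segment is None:
        return None  # read-through without a usable downstream signal
    return element + segment + "A" * poly_a_len  # 3' transduction product


element = "GGCCGGCC"               # hypothetical stand-in for an L1 or SVA
flank = "TTGCA" + PAS + "C" * 30   # toy 3' flank containing one PAS
print(template_rna(element, flank, own_signal_used=True))
print(template_rna(element, flank, own_signal_used=False))
```

In the second call the inserted copy would carry the element plus the first 26 bases of the toy flank, which is the "extra genomic sequence" described above; everything beyond the cleavage point is left behind.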
Variation in the number of genes among species indicates that new genes are continuously generated over evolutionary time. Comparative genomic studies have confirmed the notion of “evolutionary tinkering”92 according to which new genes most commonly arise by rearrangements between pre-existing genetic structures. In this section, we explore mechanisms by which retrotransposons have fostered genetic innovations in the human lineage.
Retrotransposon-mediated transduction (discussed above) can lead to the duplication of coding sequences fortuitously located in the transduced flanking genomic sequence. The potential of L1 retrotransposons to mediate exon shuffling via 3′ transduction has been experimentally confirmed using cell culture assays88. This mechanism has subsequently been shown to have mediated the formation of a new gene family during recent human evolution, via multiple SVA-mediated transduction events of the AMAC1 gene89 (BOX 4).
It has been experimentally demonstrated using cell culture assays that L1 retrotransposons can mediate exon shuffling via 3′ transduction88. Subsequent analyses of the human genome have confirmed that L1-mediated transduction indeed took place during human genome evolution and that it may account for 0.6-1% of human DNA3,90,91. However, whether it contributes to evolving new gene function remained an open question. A recent analysis of SVA retrotransposons has demonstrated the evolutionary significance of retrotransposon-mediated 3′ transduction, by showing that SVA-mediated transduction is responsible for the creation of the AMAC1 gene family that comprises four copies in the human genome89.
As part of a genome-wide analysis of SVA-mediated transduction, Xing et al. 89 identified 143 events that transduced sequences ranging in size from a few dozen bp to almost 2 kb. Interestingly, three transduced sequences located on chromosomes 8, 17 and 18 were found to originate from the same source locus located elsewhere on chromosome 17 (a). In the figure, flanking sequences of the original locus are shown as blue boxes and the flanking sequences of the transduced loci are shown as light blue boxes. Target site duplications are shown as green arrows. SVA elements are depicted as red bars and the transduced sequences are shown as blue bars with coding regions shown as purple bars. SVA element oligo dA-rich tails are shown as “(AAA)n”. Analysis of the four paralogous sequences identified four copies of the AMAC1 gene. The ancestral AMAC1L3 gene copy at the source locus consisted of two exons separated by an intron. By contrast, the three transduced copies were intronless versions of AMAC1L3, as a result of the splicing of the intron during the retrotransposition process (b). Evolutionary analyses demonstrated that the three transduction events all took place ~7-14 My ago, as human and African great apes share all four AMAC1 copies, whereas orangutan and other primate and non-primate species that have been analyzed only possess the ancestral AMAC1L3 gene. Experimental studies indicated that, in addition to AMAC1L3, at least two of the three transduced AMAC1 genes are expressed in human tissues. RNA transcript sequence analyses of the expressed AMAC1 duplicates further revealed that the promoter sequence had been duplicated along with the AMAC1 coding sequence as part of the 3′ transduction process. This indicates that retrotransposon-mediated gene transduction can duplicate not only coding regions of genes but also their regulatory regions, thus retaining functional potential after duplication. 
Hence, this retrotransposon-mediated duplication mechanism can lead to rapid generation of functional gene families.
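The intronless structure of the transduced AMAC1 copies follows from the fact that retrotransposition copies a spliced RNA rather than genomic DNA. The toy sketch below illustrates this with made-up sequences and exon coordinates (hypothetical, not the real AMAC1L3 locus); the function names are our own.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # 0-based, half-open (start, end) coordinates


def spliced_rna(gene: str, exons: List[Exon]) -> str:
    """Join exon sequences in genomic order, dropping the introns between them."""
    return "".join(gene[start:end] for start, end in sorted(exons))


def processed_copy(gene: str, exons: List[Exon], poly_a_len: int = 12) -> str:
    """Toy retrocopy: the spliced RNA plus a poly(A) tail, as it would be
    reverse-transcribed and inserted elsewhere in the genome."""
    return spliced_rna(gene, exons) + "A" * poly_a_len


# Made-up two-exon gene; the toy intron starts with GT and ends with AG.
exon1, intron, exon2 = "ATGCC", "GTAAGTTTTTTCAG", "GGTTAA"
gene = exon1 + intron + exon2
exons = [(0, len(exon1)), (len(exon1) + len(intron), len(gene))]

copy = processed_copy(gene, exons)
print(copy)
```

The source gene keeps its intron, whereas the copy contains only the joined exons followed by a poly(A) stretch, mirroring the difference between AMAC1L3 and its three transduced paralogues.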
In contrast to transduction, gene retrotransposition only duplicates gene sequences and no retrotransposon sequence is co-duplicated in the process. This is because gene retrotransposition is based on the hijacking of the L1 retrotransposition machinery by host mRNA transcripts93, similar to Alu and SVA retrotransposition. As a result, gene retrotransposition generally does not duplicate upstream regulatory regions, thus requiring duplicated genes to fortuitously acquire new regulatory regions to be functional. Therefore, gene retrotransposition was long thought to generate non-functional duplicate gene copies termed retropseudogenes. However, genome-wide searches have confirmed the importance of gene retrotransposition in the emergence of new primate genes94-96 and it has been estimated that at least one new retrogene per My emerged in the human lineage during the past ~65 My97 (for more detailed discussion, see ref. 96).
Alternative splicing is a widespread mechanism that occurs in 40-60% of human genes3,98. By producing more than one type of mRNA from a single gene, alternative splicing contributes significantly to human proteome variation98. Interestingly, retrotransposon sequences are sometimes recruited as exons that become integrated into genes, a process termed exonization (FIG. 3a). It was initially estimated from transcript sequence data that ~4% of human protein-coding sequences contained TEs (mostly Alu and L1)99. However, a recent analysis at the protein level more conservatively suggested that this proportion is closer to ~0.1%100. Exonization is thought to be facilitated by the fact that many TEs carry cryptic donor and acceptor splice sites. For example, a typical Alu sequence contains nine GT dinucleotides and 14 AG dinucleotides that represent as many potential cryptic donor and acceptor splice sites, respectively101,102. Accordingly, Alu exonization has occurred repeatedly during primate evolution103. It has been estimated that ~5% of alternatively spliced exons are derived from Alu elements in humans and that most, if not all, Alu exons are alternatively spliced, presumably because constitutively expressed Alu exons are deleterious and subject to negative selection101. Consistent with this, the three reported cases of exonized Alu elements becoming constitutively expressed are all associated with genetic disorders98.
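Scanning a sequence for the GT and AG dinucleotides mentioned above is straightforward; the sketch below lists their positions on a given strand. It is deliberately minimal: a GT or AG is only the core of a donor or acceptor site, and whether it is actually used depends on the surrounding consensus sequence and other splicing signals, which this toy function ignores. We do not reproduce a real Alu sequence here; the input is a made-up string and the function names are our own.

```python
from typing import Dict, List


def splice_dinucleotides(seq: str) -> Dict[str, List[int]]:
    """0-based positions of GT (candidate donor core) and AG (candidate
    acceptor core) dinucleotides on the given strand."""
    seq = seq.upper()
    hits: Dict[str, List[int]] = {"GT": [], "AG": []}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in hits:
            hits[pair].append(i)
    return hits


def reverse_complement(seq: str) -> str:
    """Other strand, read 5' to 3' (DNA alphabet only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


toy = "CAGGTAAGTAG"
print(splice_dinucleotides(toy))
# An element inserted in antisense orientation presents the other strand:
print(splice_dinucleotides(reverse_complement(toy)))
```

Scanning both strands matters because an exonized element can lie in either orientation relative to the host gene, and the two strands generally offer different sets of candidate sites.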
Non-LTR retrotransposons have also been involved in facilitating the molecular domestication of other TEs. This is exemplified by SETMAR, a chimeric primate gene resulting from fusion of a SET histone methyltransferase gene to the transposase gene of an Hsmar1 DNA transposon104. The birth of SETMAR might never have occurred without the contribution of an Alu element that inserted into and partially deleted the 5′-terminal inverted repeat of the Hsmar1 element104. Because both terminal inverted repeats of DNA transposons are necessary for transposition, the Alu insertion may have contributed to the recruitment of the Hsmar1 transposon as part of SETMAR by immobilizing it at a time when the Hsmar1 family was experiencing high levels of transposition in primate genomes5. Overall, it is striking that non-LTR retrotransposons seem to have directly contributed a disproportionately small number of domesticated genes compared with other TEs (such as DNA transposons), despite being the most numerous TEs in the human genome105,106.
As described above, retrotransposons have dramatically impacted human evolution at the DNA level. Evidence is also accumulating that retrotransposons significantly shape human evolution at the RNA level through various mechanisms, which we discuss in this section.
Retrotransposons impact the expression of nearby genes through a variety of mechanisms. Similar to Alu elements, L1 sequences can provide new splice sites that may promote exonization and alternative splicing107,108 (FIG. 3a). In addition, intronic L1 elements can interfere with transcriptional elongation of the host gene due to reduced ability of RNA polymerase II to read through L1 sequences109 (FIG. 3b). Furthermore, retrotransposon sequences can provide polyadenylation signals inducing termination of gene transcripts110-112 (FIG. 3b). It has also been shown that Alu elements carry transcription factor-binding sites that may serve to modulate gene expression113,114 (FIG. 3c). The functional promoter sequences of L1 and Alu elements can also initiate sense or anti-sense transcription through other genes115-117 (FIG. 3d).
The potential of L1 endogenous promoter and polyadenylation signals to create transcriptome diversity in humans is illustrated by 15 human genes that were apparently split by L1 elements inserted in antisense orientation in intronic sequences118. In each of these genes, a transcript containing the exons upstream of the insertion site terminates at the L1 3′ antisense polyadenylation signal, while the L1 5′ antisense promoter drives expression of a second transcript that includes the downstream exons of the gene. These observations provide a mechanistic basis for the emergence of new gene structures by gene fission.
RNA editing is a process by which RNA nucleotide sequences are co- or post-transcriptionally modified, as exemplified by the conversion of adenosine to inosine (A-to-I) in double-stranded RNA (FIG. 3e). A-to-I editing is widespread in humans, and >90% of all A-to-I substitutions occur within Alu sequences embedded in mRNA transcripts119-122. Editing within Alu elements might be favored by the dimeric structure of these elements and by the occasional occurrence of pairs of Alu elements in inverted orientation, which can base-pair to form double-stranded RNA. A-to-I editing can eliminate splice sites and therefore might affect alternative splicing of exonized Alu sequences. Furthermore, it has recently been shown that A-to-I editing of pairs of inverted Alu elements in 3′ UTRs can suppress expression through nuclear retention of mRNA transcripts123.
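Why inverted, rather than directly repeated, Alu pairs matter can be shown with a toy check: two segments of the same RNA fold back into a double-stranded stem only if one is approximately the reverse complement of the other, and double-stranded RNA is the substrate of the ADAR adenosine deaminases that carry out A-to-I editing. The sketch below scores ungapped, equal-length segments with Watson-Crick pairs only; real Alu pairs are divergent, gapped and tolerate G-U wobble pairs, none of which is modelled. Sequences and function names are hypothetical.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA string (A, C, G, U only)."""
    return seq.translate(RNA_COMPLEMENT)[::-1]


def pairing_fraction(left: str, right: str) -> float:
    """Fraction of positions at which `left` can Watson-Crick pair with `right`
    when the two segments fold back on each other (antiparallel, ungapped).
    left[i] pairs with right[-1 - i], i.e. they pair when left[i] equals
    the i-th base of the reverse complement of `right`."""
    partner = reverse_complement_rna(right)
    if len(left) != len(partner):
        raise ValueError("toy model assumes equal-length segments")
    return sum(a == b for a, b in zip(left, partner)) / len(left)


repeat_unit = "AUGCAAGUCCGAUUACGA"  # made-up stand-in, not an Alu sequence
inverted_copy = reverse_complement_rna(repeat_unit)
print(pairing_fraction(repeat_unit, inverted_copy))  # inverted pair: full stem
print(pairing_fraction(repeat_unit, repeat_unit))    # direct pair: usually low
```

An inverted copy pairs along its whole length by construction, whereas a directly repeated copy only pairs where the sequence happens to be self-complementary.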
The epigenetic regulation of retrotransposon activity through DNA methylation represents an important defence mechanism for the cell (BOX 2): the L1 promoter CpG island is typically highly methylated124, and Alu and SVA elements are enriched in CpG sites30,125 to the extent that one third of all human CpG sites are contained within Alu sequences126. Because L1, Alu and SVA elements are frequently found in or near genes, retrotransposon-mediated heterochromatin formation and spread could repress transcription of nearby genes (FIG. 3f). Consistently, Alu elements may be excluded from imprinted regions of the human genome due to their potential negative impact on methylation associated with imprinting127. Similarly, it has been proposed that the high density of L1 elements on the X chromosome may be explained by their involvement in X inactivation60,61. Under this hypothesis, L1 elements would serve as booster stations to propagate the signal that silences one of the two female X chromosomes. However, formal demonstration of retrotransposon-mediated epigenetic control of neighboring genes in humans and evaluation of the extent of this phenomenon at a genome-wide scale represent active topics of investigation in the field.
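CpG content is often summarized with the observed/expected CpG ratio used in classic CpG-island definitions: the number of CpG dinucleotides times sequence length, divided by the product of the C and G counts. We use it here only as an illustrative, commonly used measure; the cited studies (refs 30,125,126) may quantify CpG content differently, and the inputs below are made-up strings.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count).
    Values well below 1 indicate CpG depletion, as in most of the human genome."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0  # no C or no G: no CpG possible, avoid division by zero
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return cpg * len(seq) / (c * g)


print(cpg_obs_exp("ACGTTCGAACGG"))  # made-up CpG-rich toy sequence
print(cpg_obs_exp("ACCTTGGAAGTC"))  # made-up CpG-free toy sequence
```

Because methylated CpGs are the targets of the DNA methylation machinery discussed above, a sequence with a high ratio offers many potential methylation sites.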
Recent genome comparisons have revealed the occurrence of numerous conserved non-coding elements (CNEs) in the human genome. Strikingly, many CNEs appear to be derived from ancient TE sequences, in particular a class of non-LTR retrotransposons known as short interspersed elements (SINEs, to which Alu elements belong)31-33. These ancient SINE-derived sequences are currently evolving under strong negative selection and have apparently taken on regulatory functions31-33. It remains unclear whether the frequent recruitment of SINEs as CNEs reflects an endogenous functional property of these elements, is a by-product of their high copy numbers in mammalian genomes or results from their distinctive sequence architecture which makes them more readily identifiable as old retrotransposons106. In any event, the genome-wide contribution of this phenomenon to human evolution remains to be determined, but is likely to be important.
For tens or even hundreds of My, TEs (including non-LTR retrotransposons) have shaped the evolution of the genomes in which they reside128. Maintaining activity over extended periods of time is a distinguishing feature of non-LTR retrotransposons that was instrumental to their evolutionary success in the human lineage. Our understanding of the factors underlying this evolutionary success is still incomplete, and the next few years will probably shed new light on this intriguing question. However, the long-standing association between non-LTR retrotransposons and the human genome does not mean that these elements have been maintained on such a timescale because of the evolutionary advantages they have conferred on their host genome. On the contrary, we believe that the profound impact of retrotransposons on genome evolution is a by-product of, not the reason for, the evolutionary success of these selfish genetic elements.
This view is supported by the notion that retrotransposons often represent a threat to human health. While their involvement in genetic disease through insertion mutagenesis, a consequence of their sustained mobilization activity, has long been established, other mechanisms are less well understood. For example, investigating the contribution of L1 endonuclease to the generation of DSBs in germline and somatic tissues might have important implications for understanding the L1 integration process and its interactions with DNA repair mechanisms, as well as for chromosomal damage and human health more generally. Although the contribution of retrotransposons to genomic deletions, such as insertion-mediated deletions and RMDs, is well established, other types of genomic rearrangements, such as retrotransposon recombination-mediated tandem duplications, are less well understood, partly because they are more difficult to characterize through computational comparisons of genome sequences. Given that duplications are a key contributor to genetic innovation, the extent to which retrotransposons have contributed to the formation of new genes in the human genome might still be vastly underestimated. This is also true for many aspects of the impact of retrotransposons on gene expression. For example, there is growing evidence that TEs in general, not just non-LTR retrotransposons, have been a rich source of material for the assembly and evolution of regulatory networks106. By providing a wealth of genomic and transcriptomic sequence data, next-generation sequencing and personal genomics will shed new light on the surprisingly dynamic nature of TEs and the roles that they play in shaping within- and inter-individual variation. This will allow researchers to dissect retrotransposon-induced variation at ever-increasing resolution. Such information is crucial if we are to better understand the overall impact of TEs on human health, genome evolution and the unique traits that make us human.
Ref. #3: A landmark paper analyzing the entire human genome sequence and revealing that transposable elements make up nearly half of our genome.
Ref. #12: The authors establish an experimental test of Alu retrotransposition in cultured cells and demonstrate that L1 ORF2 protein is required for Alu retrotransposition.
Ref. #13: A landmark paper presenting the development and characterization of an in vitro assay to measure L1 retrotransposition in cultured cells.
Refs. #16 & 17: References 16 and 17 report genome-wide analyses demonstrating that L1 and Alu recombination-mediated deletions have greatly impacted human genome evolution.
Ref. #22: This study indicates that most L1 retrotransposition in humans may result from the activity of just six highly active L1 elements.
Ref. #41: The authors refine the master gene model of Alu amplification by showing that human-specific Alu subfamilies typically contain 10-20% retrotransposition-competent copies.
Ref. #49: The first genome-wide comparison of inter-individual structural variation attributable to transposable elements in humans.
Refs. #76 & 77: References 76 and 77 demonstrate that L1 retrotransposition can be associated with various forms of genomic instability in cultured cells.
Refs. #88 & 89: Reference 88 shows that retrotransposon-mediated transduction can create new genes in cultured cells and reference 89 demonstrates the evolutionary significance of this phenomenon during human evolution.
Ref. #110: This study demonstrates that the L1 element contains many polyadenylation signals, resulting in truncated transcripts and attenuated L1 activity.
Ref. #118: This paper shows how anti-sense promoter and polyadenylation signals of L1 elements can lead to the formation of new genes by fission of pre-existing genes.
Refs. #153 & 154: References 153 and 154 elegantly show how Alu insertion polymorphisms can be used to study human evolutionary history and demography.
We apologize to colleagues whose work could not be discussed or cited due to space constraints. Our research on various aspects of mobile elements is supported by a Young Investigator ATIP Award from the Centre National de la Recherche Scientifique (CNRS) to R.C., as well as Louisiana Board of Regents Governor’s Biotechnology Initiative GBI (2002-005), National Science Foundation grant BCS-0218338, National Institutes of Health PO1 AG022064, and National Institutes of Health RO1 GM59290 all to M.A.B.
Richard Cordaux received his Ph.D. from the laboratory of Mark Stoneking at the Max Planck Institute for Evolutionary Anthropology in Leipzig (Germany). He carried out postdoctoral studies with Mark Batzer at Louisiana State University. He became a tenured investigator at the Centre National de la Recherche Scientifique (CNRS) and the University of Poitiers (France) in 2006. He was awarded the CNRS bronze medal in 2009. His group focuses on mobile elements, comparative genomics, population genetics in humans and bacterial endosymbionts.
Mark Batzer received his Ph.D. from the laboratory of William R. Lee at Louisiana State University. He carried out postdoctoral studies with Prescott Deininger at the LSU Health Sciences Center, and then with Pieter de Jong in the Human Genome Center at Lawrence Livermore National Laboratory. He became a faculty member at the LSU Health Sciences Center in 1995 and moved to a position as a professor of Biological Sciences at LSU in 2001. He is presently an LSU System Boyd Professor and Dr. Mary Lou Applewhite Distinguished Professor at LSU. His laboratory focuses on mobile elements, comparative genomics, population genetics, and human molecular genetics.
Cordaux Lab homepage: http://site.voila.fr/rcordaux
Batzer Lab homepage: http://batzerlab.lsu.edu/
dbRIP, a database of retrotransposon insertion polymorphisms: http://dbrip.brocku.ca/
CSHL Dolan DNA learning center human diversity module: http://www.geneticorigins.org/pv92/aluframeset.htm
Repbase, a database of eukaryotic transposable elements: http://www.girinst.org/repbase/index.html
Richard Cordaux, Université de Poitiers, CNRS UMR 6556 Ecologie, Evolution, Symbiose, 40 Avenue du Recteur Pineau, 86022 Poitiers, France.
Mark A. Batzer, Department of Biological Sciences, Louisiana State University, 202 Life Sciences Building, Baton Rouge, LA 70803, USA.