The ubiquity of mobile elements in mammalian genomes poses considerable challenges for the maintenance of genome integrity. The predisposition of mobile elements towards participation in genomic rearrangements is largely a consequence of their interspersed homologous nature. As tracts of nonallelic sequence homology, they have the potential to interact in a disruptive manner during both meiotic recombination and DNA repair processes, resulting in genomic alterations ranging from deletions and duplications to large-scale chromosomal rearrangements. Although the deleterious effects of transposable element (TE) insertion events have been extensively documented, it is arguably through post-insertion genomic instability that they pose the greatest hazard to their host genomes. Despite the periodic generation of important evolutionary innovations, genomic alterations involving TE sequences are far more frequently neutral or deleterious in nature. The potentially negative consequences of this instability are perhaps best illustrated by the 25 human genetic diseases that are attributable to TE-mediated rearrangements. Some of these rearrangements, such as those involving the MLL locus in leukemia and the LDL receptor in familial hypercholesterolemia, represent recurrent mutations that have independently arisen multiple times in human populations. While TE-instability has been a potent force in shaping eukaryotic genomes and a significant source of genetic disease, much concerning the mechanisms governing the frequency and variety of these events remains to be clarified. Here we survey the current state of knowledge regarding the mechanisms underlying mobile element-based genetic instability in mammals. Compared to simpler eukaryotic systems, mammalian cells appear to have several modifications to their DNA-repair ensemble that allow them to better cope with the large amount of interspersed homology that has been generated by TEs. 
In addition to the disruptive potential of nonallelic sequence homology, we consider recent evidence suggesting that the endonuclease products of TEs may also play a key role in instigating mammalian genomic instability.
Because of the high density of repetitive elements in mammalian genomes, it is not surprising that we should find them frequently serving as substrates for genomic rearrangements. With nearly 50% of the human genome occupied by recognizable repetitive sequences, chance alone would place transposable elements (TEs) in the vicinity of a great number of genetic alterations. Yet the biological basis for involvement of repetitive sequences in genomic instability extends beyond their abundance in genomes. As tracts of nonallelic homologous sequences, they possess the potential to disrupt essential DNA repair processes, often resulting in genomic alterations. The variety of post-insertion mutations involving TE substrates ranges from genomic insertion/deletion (indel) mutations to larger-scale intra- and inter-chromosomal rearrangements. While a number of TE-related rearrangement events have resulted in key innovations during evolution (e.g. [1,2]), it is nevertheless the case that these events are more frequently neutral or deleterious in terms of organismal fitness. The negative consequences of instability are perhaps best illustrated by the 25 distinct human genetic diseases that are attributable to TE-related rearrangements. Some of these rearrangements have independently arisen multiple times in the human population, suggesting the TE sequences involved are particularly disposed to instability. Well-known examples of this phenomenon include Alu-mediated deletions in the LDL receptor locus, resulting in familial hypercholesterolemia, and rearrangements involving the MLL locus, a frequent factor in acute myeloid leukemias.
Prior to the advent of large-scale sequencing projects and associated comparative genomic studies, our ability to observe the results of instability events involving TEs was effectively limited to those events that were associated with observable disease phenotypes (reviewed in [3,4]) or those that were induced under controlled experimental systems. The abundance of sequence data now available from several taxa, as well as the growing list of TE-related rearrangement events, makes the profound impact TE expansion and post-insertion instability have had on the evolution of eukaryotic genomes increasingly evident. Nevertheless, the full extent to which ongoing negative selection is occurring against both individual TE insertions and groups of TE insertions due to their potential for ectopic recombination remains to be clarified. In support of the hypothesis that a significant historical selection pressure has existed against TE copy accumulation based on instability, several studies have demonstrated a tendency for TEs to accumulate in low-recombining areas of the genome (discussed in ). Although these observations are also consistent with several alternative hypotheses (e.g. accumulation of deleterious insertions due to decreased selection efficiency within these regions) that do not necessitate ectopic recombination, there is increasing support from the Drosophila model for significant selection against TE loci due to their potential for instability. Recent data from humans, showing stronger selection against full-length L1 elements than their truncated counterparts, are also highly suggestive of selection against ectopic recombination instability in our own lineage, particularly when taken together with an earlier genomic analysis of TE accumulation on human sex chromosomes, which indicated that negative selection on L1s was related to L1 insertion size as opposed to the full-length vs. non-full-length status of insertions.
In the latter work, a statistically significant accumulation of >500 bp but less-than-full-length L1 inserts was observed on the human sex chromosomes vs. the autosomes.
Although modern sequencing efforts and advances in experimental techniques have considerably improved our understanding of the range and impact of TE instability, significant gaps in our knowledge remain due to our inability to observe those naturally occurring mutational events that are likely to be rapidly removed from the population via natural selection. In addition, another group of mutation events remains largely inaccessible on account of their occurring in somatic cell lineages or subcomponents of tumors that cannot readily be assayed. It is therefore important for researchers to remain cognizant of how current observational and experimental constraints impact our overall picture of both the variety and frequency of TE mutations in organisms.
As a group, transposable elements can contribute to the disruption of genetic information in a variety of ways, many of which do not involve physical rearrangements of the genome. Such events include, but are not limited to, direct disruption of genetic structures through insertion, generation of aberrant mRNA splices [8–10], introduction of premature polyadenylation signals and/or transcription disruption, and introduction of disruptive epigenetic signals which can spread to nearby genes. This larger class of genetic perturbations, along with their evolutionary implications, has been reviewed elsewhere [8,9,13–16]. In this review, we focus on the molecular biology underlying that subset of mutagenic events in which TEs serve as substrates for physical genomic rearrangements.
The majority of instability events involving TEs are thought to be the byproduct of errant double-strand break (DSB) repair processes. Several major DSB-repair mechanisms rely on sequence homology to conserve sequence information while repairing damaged chromosomes. DSBs are a particularly disruptive form of DNA damage that typically must be corrected by the cell prior to its proceeding through the cell cycle. DSBs are known to instigate a variety of genetic alterations, many of which directly increase the risk of tumorigenesis and/or metastasis [18,19]. The sources of these breaks are manifold; they can arise from free-radical byproducts of cellular reactions, topoisomerases, and ionizing radiation, as well as during the process of meiosis (reviewed in [19,20]). DSBs can also be generated from single-stranded nicks in DNA; this most frequently occurs when replication forks traverse the broken strand, leading to a collapse in the fork which creates a DSB. As a consequence, the S-phase of the cell cycle appears to endure the bulk of DSBs. In addition to these canonical sources of DNA breaks, our laboratory has recently reported that the endonuclease produced by L1, an autonomous retrotransposon that is widespread throughout the mammalian lineage [22–24], also has the capability of generating DSBs. As we discuss below, the generation of DSBs by L1 and related TE endonucleases may play a significant role in some genetic instability events.
Current estimates place the number of DSBs naturally occurring during each cell cycle at ~50. This figure will no doubt vary among cell types that differ in their respective exposures to endogenous free radicals and environmental stresses as well as the genomic replication demands placed upon them. Eukaryotic cells possess a robust set of mechanisms for the detection and repair of DSBs [26,27]. In the most extreme cases, the detection of rampant DSBs throughout the genome results in signal cascades leading to apoptosis. More typically, however, DNA-repair machinery is recruited to the site of the DSB, and repair processes are initiated.
Eukaryotic repair pathways can be categorized based on the extent to which they rely on tracts of sequence homology as well as the accuracy with which they repair the defect. There are those mechanisms that are dependent upon tracts of sequence homology, referred to as homologous recombination repair (HRR), and those pathways, such as non-homologous end joining (NHEJ), which require little or no sequence homology to operate. HRR pathways can be further subdivided into two basic types. The first type consists of the conservative HRR pathways, which are those that demonstrate a high degree of preservation of sequence information, such as classic double-strand break repair (DSBR), synthesis-dependent strand annealing (SDSA), and break-induced replication (BIR). The second type, nonconservative HRR, presently consists of a single pathway, single-strand annealing (SSA). NHEJ can also be further subdivided into “accurate” and “inaccurate” pathways, depending on the particular protein pathway invoked. A complex network of cellular proteins, many of which remain to be characterized, is necessary for the DNA repair process; these proteins and their interactions are detailed in several available reviews [19,29–31]. In the sections below, we discuss current models for DNA repair along with their potential interactions with interspersed homologous sequence.
It is through their interaction with the homology-based DSB repair systems that TEs appear to be most disruptive to genome integrity. Yet these very same repair pathways offer the greatest potential for the cell to repair DSBs with no associated loss of genetic information. In most DSB repair processes involving homology, an undamaged copy of the severed chromosome is used to provide the information necessary for repair. Sister chromatids, which are available from S-phase until cell division, are the preferred template, being far more frequently used than the homologous chromosome as source templates for the repair process.
Numerous models currently exist for the interactions among chromosomes and templates during the homologous repair process. In the classic DSB repair scenario of Szostak et al. (DSBR), which extended an earlier proposal by Resnick and by Resnick and Martin, the 3′ end of each of the exposed chromosome ends invades the undamaged chromosome at the homologous (allelic) counterpart of the damaged locus [33,35] (Figure 1). The invading strand displaces one of the template strands, and DNA synthesis is subsequently primed from the invading 3′ end. Meanwhile, the second exposed 3′ end anneals to the displaced template strand, also initiating synthesis and resulting in the formation of two separate Holliday junctions which are subsequently resolved (Figure 1). The locations of the two Holliday junctions are not fixed and may migrate to expand the heteroduplex region. The process replicates the chromosomal region in the vicinity of the breakpoint. In this scenario, the repair process culminates with, or without, a crossover event, depending on how the Holliday junctions are resolved (Figure 1). Ideally, if an allelic template on the sister chromatid is used, no genetic information is lost. When the homologous chromosome is used, gene conversion can occur where hybrid strands anneal, potentially resulting in loss of heterozygosity (LOH) at the locus. Experimental evidence suggests, however, that extensive LOH is suppressed during DSB repair in mammalian cells.
Interspersed TE sequences can severely disrupt this classic homologous repair pathway by offering alternative nonallelic tracts of homology with which the invading strand can anneal (Figure 2). Several possible alleles can be generated from the resolution of these events, depending upon the relative locations of the nonallelic sequences involved. A subset of these possibilities is depicted in Figure 2. If the targeted sequence is a nonallelic homologous element located on the sister chromatid or homologous chromosome, two types of alleles are generated: one containing a deletion and one containing a duplication of the region (Figure 2). Such misrouted recombination events, collectively referred to as nonallelic homologous recombination (NAHR), are thought to be a source of many TE-related instability events, particularly those resulting in duplications. If the targeted sequence is on the same chromosomal strand as the repair site, the intervening sequence can be inverted (Figure 2). As discussed below, most of the Alu-mediated deletion events in humans are consistent with either NAHR or SSA mechanisms.
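The deletion, duplication, and inversion outcomes described above can be made concrete with a toy model. The Python sketch below is illustrative only: it treats a chromatid as an ordered list of labelled segments, the function names `unequal_crossover` and `invert_between` and the segment labels are hypothetical, and it assumes the two TE copies are identical while ignoring strand orientation within each segment.

```python
from typing import List, Tuple


def unequal_crossover(chromatid: List[str], i: int, j: int) -> Tuple[List[str], List[str]]:
    """Toy NAHR between sister chromatids.

    `i` and `j` (i < j) are the positions of two TE copies in the same
    orientation. The copy at `i` on one chromatid pairs with the copy at
    `j` on the other, and a crossover within the paired TEs yields one
    deletion product and one reciprocal duplication product.
    """
    if not 0 <= i < j < len(chromatid):
        raise ValueError("expected 0 <= i < j < len(chromatid)")
    # Left arm up to the first TE joined to the right arm after the second TE.
    deletion = chromatid[: i + 1] + chromatid[j + 1:]
    # Left arm up to the second TE joined to the right arm after the first TE.
    duplication = chromatid[: j + 1] + chromatid[i + 1:]
    return deletion, duplication


def invert_between(chromatid: List[str], i: int, j: int) -> List[str]:
    """Toy intrachromosomal exchange between two TE copies at `i` and `j`:
    the order of the intervening segments is reversed."""
    if not 0 <= i < j < len(chromatid):
        raise ValueError("expected 0 <= i < j < len(chromatid)")
    return chromatid[: i + 1] + chromatid[i + 1: j][::-1] + chromatid[j:]


sister = ["A", "TE", "B", "TE", "C"]
deletion_allele, duplication_allele = unequal_crossover(sister, 1, 3)
inverted_allele = invert_between(["A", "TE", "B1", "B2", "TE", "C"], 1, 4)
```

In this sketch the deletion allele loses segment `B` together with one TE copy, while the reciprocal allele carries `B` twice, mirroring the paired products described for Figure 2.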
NAHR involving TEs represents a viable mechanism for large-scale chromosomal translocation mutations. Yet although confirmed instances exist (e.g. ), there are surprisingly few authentic examples of TE-mediated translocations resulting in human diseases. Under experimental conditions, it has been demonstrated that homologous Alu elements are capable of mediating such rearrangements at appreciable levels. In addition, it has also been demonstrated that genomic L1 copies throughout the genome can serve as templates to repair DSBs that are experimentally induced within an L1 copy. These data demonstrate that repair-related physical interactions do occur among homologous TE copies interspersed throughout the genome. While the rarity of TE-mediated translocation mutations no doubt partially stems from the lethality of these mutations relative to smaller-scale rearrangements, as we discuss further below, this observation is likely also related to a general reliance in mammals on non-crossover mechanisms of DNA repair such as SDSA and SSA.
While there is extensive support for the DSBR model described above for meiotic cells, a number of inconsistencies with experimental results were observed for mitotically dividing systems [32,37,41]. As a result, several alternative conservative HRR models have been put forward to accommodate the experimental data, including SDSA and BIR (reviewed in [26,41]).
In eukaryotic mitotic systems, gene conversion is frequently observed to occur without the associated crossover events predicted by the classic DSBR model described above. In addition, the transfer of sequence information between chromosomes via gene conversion in several experimental systems appeared primarily unidirectional, an observation that is also inconsistent with predictions of the DSBR model. Experimental data in mammals further suggest that homologous repair using the sister chromatid without associated crossover is the predominant pathway in mammalian cells. To address these and other inconsistencies with DSBR, several related models, collectively referred to as synthesis-dependent strand annealing (SDSA), have been proposed (reviewed in detail in ). The common theme of this family of models is the return of the newly synthesized strands to the original damaged chromosome. Three variations of SDSA are depicted in Figures 3a–c. In the first scenario, as in the classic DSBR model described above, both free 3′ ends invade the homologous locus and initiate synthesis. Instead of forming Holliday junctions, however, the nascent strands anneal with one another, resulting in no associated crossover and restriction of conversion events to the invading (non-template) strand (Figure 3a). The second scenario is similar to the first, with the exception that only one strand invades and primes synthesis; the newly synthesized sequence subsequently anneals to homologous sequence on the damaged chromosome (Figure 3b). A third scenario, which has received experimental support in mammals, occurs when newly synthesized strands are deprived of homology and NHEJ is employed to rejoin the strands to the original chromosome. A variation of this process is depicted in Figure 3c, wherein invasion at a nonallelic homologous TE locus results in synthesis of nonhomologous sequence that relies on NHEJ to be rejoined to the source locus.
Such hybrid HRR-NHEJ processes provide a mechanism for interspersed repetitive elements to serve as vehicles for replicating unique regions of sequence throughout the genome. While the homology-associated pathways are believed to be the primary repair mechanisms involved in TE-related rearrangements, NHEJ likely plays an important role in finalizing some TE-related instability events. Additional SDSA pathways, not discussed here, have been proposed that also include the possibility of crossover events.
During SDSA repair, there are several avenues by which interspersed repetitive elements could derail proper resolution of the DSB. Reannealing and subsequent attachment of the newly synthesized strands to nonallelic homologous TEs (e.g. Figure 2c) could lead to a variety of rearrangements, several of which are difficult to distinguish from the products of the DSBR and SSA pathways.
Another HRR process, SSA, is fundamentally different from the above-mentioned repair methods in that it is invariably mutagenic and hence nonconservative. SSA can occur when a DSB arises either within one of a pair of directly repeated sequences or somewhere between the repeated sequences. During SSA, sequences on both sides of the DSB are resected or “chewed back,” allowing homologous sequences adjacent to the exposed ends to associate and ligate (Figure 4). Ideally, the homologous sequence would be located within the immediate vicinity of the break, thereby minimizing the inevitable genetic loss. Such nearby homology is typically only available for TEs, giving SSA a particularly important role in TE-related DNA repair events. Recent data from Drosophila indicate that SSA is the preferred mechanism of repair for repeat-related DSBs in that organism. In addition, a large number of the observed smaller-scale deletion events in humans are consistent with the SSA pathway, and the use of SSA in preference to recombination-based mechanisms would be consistent with the general tendency of mammalian lineages, as opposed to yeast, to suppress crossover during repair. Although the relatively short average size of Alu-Alu mediated deletions in humans (~806 bp) is suggestive of an SSA model where limited resection is involved, the somewhat larger size of disease rearrangements involving Alu indicates that larger events do occur but may be rapidly purged from the genome. As a consequence, we should be cautious about extrapolating much from the characteristics of instability events that have already been heavily filtered by natural selection.
While SSA also has the potential to produce larger intra- and inter-chromosomal rearrangements, its overall contribution to these large-scale events is generally considered minimal, as such larger events would require either a) simultaneous DSBs at distant regions of the genome or b) that the resection (“chewing back”) process extends an exceptionally long distance. As is the case with NHEJ, SSA does not require a sister chromatid and remains available throughout the cell cycle. It remains unclear what determines when and where SSA will be invoked in preference to other available repair processes. With the availability of NHEJ (which theoretically necessitates less sequence loss on average than SSA) throughout the cell cycle, invocation of SSA would seldom seem desirable for the cell. However, experimental evidence increasingly points to a competition between multiple repair pathways whose outcome may be determined by a number of cell-state-dependent host factors [43,44].
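The nonconservative character of SSA can be illustrated with a small string-based sketch. The Python function below is a simplified illustration, not a model drawn from the cited studies: the name `ssa_repair`, the example sequence, and the use of exact string matching to stand in for homology are all assumptions, and only the case of a break lying between two direct repeats is handled.

```python
from typing import Optional, Tuple


def ssa_repair(seq: str, repeat: str, break_pos: int) -> Optional[Tuple[str, int]]:
    """Toy single-strand annealing.

    A DSB occurs at `break_pos`. Resection proceeds outward in both
    directions until a copy of `repeat` is exposed on each side; the two
    copies anneal, and everything between them (plus one repeat copy) is
    lost. Returns (repaired sequence, bases lost), or None if no flanking
    pair of direct repeats exists.
    """
    # Nearest repeat copy lying entirely to the left of the break.
    left = seq.rfind(repeat, 0, break_pos)
    # Nearest repeat copy starting at or to the right of the break.
    right = seq.find(repeat, break_pos)
    if left == -1 or right == -1:
        return None
    repaired = seq[: left + len(repeat)] + seq[right + len(repeat):]
    return repaired, len(seq) - len(repaired)


# Hypothetical locus: flank, repeat, unique spacer, repeat, flank.
locus = "AAAC" + "GATTACA" + "TTTT" + "GATTACA" + "CCCG"
result = ssa_repair(locus, "GATTACA", break_pos=13)
no_repeat = ssa_repair("ACGTACGT", "GATTACA", break_pos=4)
```

Under these assumptions the loss always equals the spacer plus one repeat copy, which is why the distance between the participating repeats bounds the size of the resulting deletion.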
Given the ubiquity of TEs in mammalian genomes, and the incredible potential for disruption which exists as a result, it may seem a wonder that the overall integrity of the genome can be maintained at all. As a consequence of the hazards such interspersed sequences pose, eukaryotic organisms have evolved a number of mechanisms to mitigate post-insertional TE instability. Although homologous interspersed repeats exist throughout mammalian genomes, many of these copies exhibit varying degrees of divergence from one another. Such between-copy diversity is primarily the result of the accumulation of neutral mutations in the sequences. One mechanism employed to avoid NAHR appears to be discrimination based on the level of sequence identity of the repair template. Experimental evidence from several systems suggests that interspersed repeats having greater levels of divergence are less likely to be used as templates [45,46]. This aspect of recombination surveillance is thought to be carried out, in part, by proteins involved in the mismatch repair system. In yeast, a rapid drop-off in recombination frequency vs. percent divergence is observed for repetitive elements, wherein the first few mismatches have a more suppressive effect on recombination than the addition of further mismatches. As a consequence of the relationship between recombination and divergence, the particular TE ecology of a host genome, including the number and age of copies, will have a significant impact on the potential for ectopic instability. At the same time, the host organism's ability to suppress such rearrangements will in turn have a considerable impact on the manner in which TEs may expand, or not expand, in a given taxon.
In addition to sequence identity, the length of homology of sequences also appears to be a factor in template selection. Data from multiple systems indicate that the longer the available tract of sequence homology, the more likely recombination-based instability events will occur [46,49]. Given that longer tracts of homology increase the likelihood of nonallelic recombination occurring, it may initially seem counterintuitive that most known recombination instability events leading to genetic disease in humans appear to be Alu-mediated [3,4]. This is the case despite the presence of 500,000 copies of considerably longer (up to 6 kb) L1 sequence that together comprise a larger fraction of the total genomic sequence than any other repeat family (roughly 17% vs. Alu’s 10%). There may be multiple factors contributing to this observation. Foremost among these is that the lower L1 copy numbers result in greater inter-copy distances between L1 sequences [49,51,52], rendering recombinations between L1 elements far more likely to be large, mutagenic, and lethal. The distance between Alu elements, by comparison, is smaller, both due to their higher copy number (~1.5 million) and their tendency to cluster in GC-rich regions, often in the proximity of genes. The tendency of Alu sequences to cluster in and around genes also serves to increase the probability of Alu-mediated deletions that knock out components of individual genes without resulting in larger multi-genic lethal mutations. The frequency of meiotic recombination is also observed to vary considerably across the genome, and is generally found to be higher in GC-rich genic regions [51,52] where Alu elements reside. All of these factors likely contribute to some degree to Alu’s predominance in human disease-causing rearrangements.
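The copy-number argument above can be checked with back-of-the-envelope arithmetic. The Python sketch below uses the copy numbers and genome fractions quoted in the text; the haploid genome size of 3.1 Gb is an assumption not stated in the source, the helper names are hypothetical, and elements are treated as evenly spaced, which ignores the clustering of Alu elements in GC-rich regions.

```python
GENOME_BP = 3_100_000_000   # ASSUMPTION: approximate haploid human genome size
L1_COPIES = 500_000         # from the text
ALU_COPIES = 1_500_000      # from the text (~1.5 million)
L1_FRACTION = 0.17          # from the text (roughly 17% of the genome)
ALU_FRACTION = 0.10         # from the text (roughly 10% of the genome)


def mean_spacing(genome_bp: int, copies: int) -> float:
    """Mean start-to-start distance if copies were evenly spaced."""
    return genome_bp / copies


def mean_copy_length(genome_bp: int, fraction: float, copies: int) -> float:
    """Mean length per copy implied by the genome fraction occupied."""
    return genome_bp * fraction / copies


l1_spacing = mean_spacing(GENOME_BP, L1_COPIES)
alu_spacing = mean_spacing(GENOME_BP, ALU_COPIES)
# The ratio depends only on the copy numbers, not on the assumed genome size.
spacing_ratio = l1_spacing / alu_spacing

l1_mean_len = mean_copy_length(GENOME_BP, L1_FRACTION, L1_COPIES)
alu_mean_len = mean_copy_length(GENOME_BP, ALU_FRACTION, ALU_COPIES)

print(f"L1 mean spacing  ~{l1_spacing:,.0f} bp, mean copy length ~{l1_mean_len:,.0f} bp")
print(f"Alu mean spacing ~{alu_spacing:,.0f} bp, mean copy length ~{alu_mean_len:,.0f} bp")
print(f"L1 copies are on average ~{spacing_ratio:.1f}x farther apart than Alu copies")
```

Even under the even-spacing simplification, L1 copies sit about three times farther apart than Alu copies, and the implied mean L1 length is far below the 6 kb full-length size, consistent with most copies being truncated.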
There is a class of genetic rearrangements which appear to occur in concert with the TE insertion process itself. Insertion-related deletions and other rearrangements were first observed at newly inserted human L1 retrotransposition loci recovered from cell culture experiments [53,54]. Rearrangements were heterogeneous in size, with deletions ranging from a few base pairs to several thousand base pairs. Numerous products recovered from these insertions appeared to be fusions between the novel L1 copy and existing endogenous elements, suggesting the involvement of SSA or other homology-dependent pathways in resolution of the insertion. Bioinformatic analyses, complemented with molecular validations, have subsequently confirmed that, for both L1 and Alu inserts, these events occur in the germline, and their products are readily observable in the public human genome sequence [55,56]. The precise mechanism(s) resulting in these rearrangements remain uncertain, but may indicate the rather fragile nature of the exposed chromosome sites during the insertion process. The hallmark of DNA repair activity associated with these insertion events may indicate a competitive struggle occurring at the time of the TE insertion process, wherein the host repair machinery attempts to repair the nascent insertion location and derail the retrotransposon insertion. The generation of DSBs by L1 products, which are used by L1, Alu, and SVA insertions, probably alerts the cell's DSB signaling system and recruits DNA repair machinery to the insertion loci. This raises the possibility that heterogeneity in pathway preference, both at the level of cell cycle phase and across tissue types and developmental stages, could greatly impact the relative efficiency of retrotransposon insertions.
Increased transcriptional activity in certain contexts is believed to be highly recombinogenic (reviewed in ). A common explanation for this phenomenon is that interactions between replication forks and transcriptional machinery can increase the likelihood of DSBs. By introducing potentially active promoters throughout the genome, TEs may increase the overall level of transcription-related stress, inflating the average number of DSBs per cell cycle. This phenomenon is no doubt largely mitigated in normal cells due to the repression of many TE promoters by methylation and associated chromatin structures [59–61]. However, some TE promoters evidently do manage, if only temporarily, to escape these repression mechanisms, as evidenced by their continued survival in the vast majority of taxa surveyed. It is known, for example, that early developmental stages in mammalian species include a period of general hypomethylation across the genome. Additionally, several forms of cancer are associated with widespread hypomethylation. Such hypomethylation may increase the transcriptional activity of these elements, thereby increasing the impact of transcriptional-stress-related DSBs. The increased “transcriptional stress,” by introducing additional DSBs and associated rearrangements, may be an important contributor to the genetic instability that is associated with many cancers as well as the rampant instability associated with loss of methyltransferases in humans and mice [64–66].
Segmental duplications of chromosomal regions are a common occurrence in mammalian genomes and appear to be particularly enriched in human genomes. Such segmental duplications have played substantial roles both in the evolution of genomes and in human disease. Segmental duplications in chromosomes are conducive to additional rearrangements on account of their very extensive nonallelic homology. A systematic analysis of evolutionarily recent segmental duplications in humans revealed a significantly increased density of Alu sequences at, or near, the duplication junctions, specifically those duplications that were separated by a span of unique sequence. For the human lineage, it was hypothesized that the explosion of Alu retrotransposition ~30–40 mya effectively primed the human genome to be particularly subject to segmental duplications. The precise mechanism by which Alu and other repetitive sequences are involved in generating some of these duplications remains unclear, although several models have been proposed. While the aforementioned homologous repair pathways could readily generate duplication events in which the replicons are tandemly arranged, it is not obvious what mechanism generates the large number of segmental duplications that are physically separated by >1 Mb of unique sequence. However, the increased density of Alu elements at the junctions of these events, along with the tendency for these segmental duplications to involve younger, presumably less-diverged, elements, is highly suggestive of a prominent role for homology in the underlying mechanism. In addition to the human data, similar observations have been made regarding the involvement of the B1 and B2 SINE lineages in segmental duplications within the mouse genome. Together these data suggest that TE-driven segmental duplication may be a common theme in mammalian genome evolution.
Multiple studies have demonstrated the particularly unstable nature of nearby TE copies that are in inverted orientation relative to one another [45,70,71]. Closely located, inverted Alu elements are extremely rare in the human genome. In the case of Alu elements in the human genome, these inverted repeats appear to yield DSBs due to secondary structures formed during DNA replication. Although some degree of insertion preference exists for the direct orientation of immediately adjacent Alu insertions, it is likely that the inherent instability of the inverted insertions is the primary reason for their almost complete exclusion from the human genome. Despite the infrequent nature of these inverted loci in humans, the fact that Alu insertions are still occurring in the human population at an appreciable rate indicates that the potential for the de novo generation of inverted loci remains present in each new generation and possibly even in somatic cell lineages; the latter is increasingly likely given recent in vivo evidence demonstrating that L1 can mobilize in the soma as well as ex vivo (cell culture) evidence for the possibility of somatic Alu mobilization.
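The notion of closely located, inverted copies can be expressed as a simple scan over element annotations. The Python sketch below is a hypothetical illustration rather than the method used in the cited studies: the annotation tuples, the function name `close_inverted_pairs`, and the `max_gap` threshold are assumptions chosen only to show the idea.

```python
from typing import List, Tuple

# (start, end, strand) for each annotated element on one chromosome;
# coordinates and strands here are invented for illustration.
Annotation = Tuple[int, int, str]


def close_inverted_pairs(elements: List[Annotation], max_gap: int) -> List[Tuple[Annotation, Annotation]]:
    """Return neighbouring element pairs that lie on opposite strands and
    are separated by at most `max_gap` bases."""
    ordered = sorted(elements)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        gap = right[0] - left[1]
        if left[2] != right[2] and 0 <= gap <= max_gap:
            pairs.append((left, right))
    return pairs


annotations = [
    (100, 400, "+"),
    (450, 750, "-"),    # inverted relative to its left neighbour, 50 bp away
    (5000, 5300, "+"),  # inverted relative to its left neighbour, but distant
    (5320, 5620, "+"),  # close to its left neighbour, but in direct orientation
]
flagged = close_inverted_pairs(annotations, max_gap=100)
```

Only adjacent elements are compared in this sketch, so an inverted pair separated by a third element would not be flagged; a fuller analysis would need to decide whether such pairs should count.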
Several studies have demonstrated the ability of some TE sequences to seed the formation of unstable microsatellite repeats [76–78]. Because the structure of many TE insertions includes a poly-A-rich tail, mutations in the A-tracts can readily create the initial repeats of a microsatellite sequence. These large tracts of repetitive sequence can subsequently become unstable [79–81]. Depending upon their location, such instabilities can range from innocuous to serious. Severe expansion of microsatellite sequence within introns can influence gene transcripts, as in the case of an Alu-generated microsatellite in Friedreich's ataxia.
As the critical role of epigenetic regulation of the mammalian genome becomes increasingly evident, it is important to recognize the potential epigenetic ramifications of TE insertions, along with their mutagenic effects on the primary DNA sequence. These epigenetic alterations can be considered physical alterations in the genome’s information repository. Because of the potential of TEs to disrupt normal biological function during proliferation, a number of mammalian taxa have employed epigenetic modifications, many of which serve to constrain TE activity [60,83–86]. As a consequence, the insertion of TEs at particular regions of the genome can result in the recruitment of chromatin-modulating factors, typically via CpG methylation and subsequent histone deacetylation, to the region containing the inserted element. In some cases, the introduction of these heterochromatin-inducing factors cannot be effectively confined to the immediate vicinity of the TE. The resulting spread of heterochromatin has the potential to negatively impact surrounding gene expression.
The ability for the genome to restrict chromatin modulation to the repeat location appears to vary among repeats. In humans, LTR and L1 repeats appear to generate larger chromatin modulating effects than Alu elements, which can be methylated in a more focused manner . The relationship between chromatin state and the propensity for regions to accumulate genetic alterations remain unclear. It is conceivable that, by modulating the epigenetic structures associated with a particular region in the genome, TE insertions may exert an influence on the overall mutation rate.
While our discussion of TE-related instability largely focuses on the disruptive aspects of their homology, the activity of mobile elements has also been implicated in instability. In order to facilitate transposition, TEs produce a variety of protein products, including reverse transcriptases and endonucleases. While the precise biochemical steps in the insertion process for many TEs remain unclear, it was recently reported from our laboratory that the endonuclease product of L1, an autonomous retrotransposon found throughout mammals, is capable of generating DSBs far in excess of insertions in mammalian cell lines. This capacity of the L1 endonuclease was experimentally demonstrated by showing an increase in γ-H2AX foci following L1 transfection, a phenomenon strongly associated with DSB formation [89,90]. The extent to which this represents an important source of natural endogenous DSBs remains unknown, as the levels of L1 expression used in these experiments were significantly greater than those expected in the normal cellular environment. Nevertheless, the formation of DSBs, which are highly recombinogenic and associated with numerous genetic alterations, has several possible implications for both genome stability and TE evolution in general. For those TEs that do produce DSBs during their insertion, the existence of a number of cellular mechanisms for detecting and repairing DSBs suggests a possible avenue for the cell to resist new insertion events. On the other hand, data showing that ATM, an important protein in many DNA repair signaling processes, is required for L1 retrotransposition may indicate that the L1 retrotransposon capitalizes on host DNA repair proteins for its own purposes. The reliance of the cell on DNA repair processes to thwart new insertion attempts and the manipulation of those processes by the TE for its own ends need not be mutually exclusive possibilities.
Many varieties of TEs, including mammalian LINEs and SINEs, employ a mechanism termed target-primed reverse transcription (TPRT) as their means of integration into the genome. In TPRT, the initial target site for endonuclease cleavage and integration is replicated, resulting in a pair of target site duplications (TSDs) that flank the new insert (Figure 5). This allows for possible further cleavage of these target sites by subsequent endonuclease visits (Figure 4). This “revisitation” of target sites by L1 endonucleases, as well as its potential consequences for instability, was initially proposed in conjunction with Alu-mediated low-copy repeat shuffling on chromosome 22q11. Further support was provided through subsequent bioinformatic analyses, which indicated that in both the human and rodent lineages, SINE elements flanked by perfect (i.e. nonmutated) L1 endonuclease consensus sites were more rapidly lost from the genome. It should be noted that a similar analysis for dog SINEs failed to show this effect; there are, however, indications that SINE/LINE dynamics in the dog differ significantly in key ways from those of humans and rodents. Taken together with recent experimental evidence for the ability of endonucleases to generate DSBs at endonuclease target sites, a connection between TSDs and post-insertion genomic instability involving TEs appears highly likely. Endonuclease-generated DSBs at TSD locations would have an enormous potential for genomic instability because they represent the “worst-case scenario” conjunction of DSBs and adjacent interspersed homology. As described above, the interaction of interspersed homologous sequence and DSB repair pathways is a recipe for genetic rearrangements.
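The duplication of the target site described above can be sketched as a simple string operation. This is a deliberately simplified toy model, not a simulation of the TPRT biochemistry: the function name `insert_with_tsd`, the target sequence, the nick position, the TSD length, and the placeholder element are all hypothetical values chosen for illustration.

```python
def insert_with_tsd(target, nick_pos, tsd_len, element):
    """Toy model of target site duplication.

    The two strands are assumed to be cut tsd_len bases apart, starting at
    nick_pos. After the element is inserted and the single-stranded gaps
    are filled in, the bases between the two cuts appear on both sides of
    the element.
    """
    if nick_pos < 0 or tsd_len < 0 or nick_pos + tsd_len > len(target):
        raise ValueError("staggered cuts must lie within the target sequence")
    left = target[:nick_pos]
    tsd = target[nick_pos:nick_pos + tsd_len]
    right = target[nick_pos + tsd_len:]
    return left + tsd + element + tsd + right, tsd

# Hypothetical pre-insertion locus and a placeholder for the inserted element.
target = "CCGATTAAAAGTCCGG"
element = "N" * 10

locus, tsd = insert_with_tsd(target, nick_pos=5, tsd_len=8, element=element)
print(tsd)                                    # sequence between the staggered cuts
print(target.count(tsd), locus.count(tsd))    # copies of the site before and after insertion
print(locus)
```

The point of the sketch is only that a site present once before insertion is present twice afterwards, immediately adjacent to the new element, which is what makes later cleavage at either copy possible in the revisitation model.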
Several studies have reported the involvement of a short 26 bp “core” sequence, 5'-CCTGTAATCCCAGCACTTTGGGAGGC-3', within the Alu left monomer in a number of rearrangement events [38,94]. Most recently, an examination of over 400 Alu-mediated deletion events showed that the most frequent locations of deletion junctions did indeed occur within this Alu segment, with the previously described 26 bp region showing a modest, though clear, two-fold increase in junction participation.
The presence within this 26 bp region of 5 bp of the prokaryotic chi sequence has served to fuel further speculation about a possible recombinogenic role for this region of the Alu left monomer. In E. coli, the short 8 bp chi site is recognized by the RecBCD enzyme, which initiates recombination. However, the fact that the prokaryotic chi target site does not appear to be conserved in any other eukaryotic system examined to date makes it highly unlikely that a RecBCD “chi-like” system would have been uniquely preserved in the primate lineage, particularly one maintaining target site identity at the nucleotide level. In addition, as others have noted previously, the 26 bp core Alu fragment represents the most highly conserved stretch of Alu sequence among genomic Alu copies. This elevated sequence conservation is likely attributable both to its location in the polIII promoter region and to its complete lack of highly mutagenic CpG dinucleotides, which are present throughout the remainder of the Alu consensus. It is therefore plausible that this segment of Alu is frequently represented at Alu-mediated rearrangement junctions simply because of its sequence conservation; as indicated above, two sequences with little divergence are more likely to engage in recombination events. While sequence homology no doubt plays a key role in the phenomenon, the possible involvement of topoisomerases has also been suggested. Due to its location within the polIII promoter, there is also a distinct possibility that DSBs related to transcriptional stress contribute to the number of rearrangements involving this region. In addition to the factors listed above, we further speculate, based on the endonuclease revisitation model, that the proximity of the 26 bp sequence to the endonuclease target site 5' of the Alu may play an important role in its involvement in rearrangement junctions.
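Two of the sequence-level statements above, the partial chi match and the absence of CpG dinucleotides, can be checked directly against the 26 bp core. The sketch below is a minimal illustration: the core sequence is the one quoted in the text, whereas the chi sequence (5'-GCTGGTGG-3') is supplied here as an assumption because the text does not spell it out, and the helper names are hypothetical.

```python
ALU_CORE = "CCTGTAATCCCAGCACTTTGGGAGGC"  # 26 bp core quoted in the text
CHI = "GCTGGTGG"  # E. coli chi site; assumed here, not quoted in the text

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def longest_common_substring(a, b):
    """Return the longest contiguous stretch of a that also occurs in b."""
    best = ""
    for i in range(len(a)):
        # Only try substrings longer than the current best; stop extending
        # from position i as soon as a candidate is absent from b.
        for j in range(i + len(best) + 1, len(a) + 1):
            if a[i:j] in b:
                best = a[i:j]
            else:
                break
    return best

forward_match = longest_common_substring(CHI, ALU_CORE)
reverse_match = longest_common_substring(CHI, reverse_complement(ALU_CORE))

print(len(ALU_CORE))                         # length of the core
print(forward_match, len(forward_match))     # best chi match on the strand as written
print(reverse_match, len(reverse_match))     # best chi match on the complementary strand
print(ALU_CORE.count("CG"))                  # number of CpG dinucleotides in the core
```

Under these assumptions the longest shared stretch is 5 bp (GCTGG), found on the strand complementary to the sequence as written, and the core contains no CpG dinucleotides, consistent with the description above.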
DSBs generated at the 5' end of the Alu sequence could frequently be involved in DSB repair, resulting in an increased appearance of the Alu sequences at both homologous and nonhomologous rearrangement junctions.
The evolution of DNA repair systems in mammalian lineages to avoid NAHR-related instability may have been, in part, a response to increased interspersed TE homology across their genomes. As a result of the host organism's repair responses, the evolutionary landscape for TEs was likely significantly altered by the reduced risk of ectopic recombination. If, as has been suggested previously, ectopic recombination does play a major role in limiting total TE copy numbers, it follows that increased suppression of ectopic recombination in mammals and other taxa may have ultimately provided more “safe” niche space within mammalian genomes for the expansion of larger, more similar TE families. Thus the repeat-driven expansion of many genomes, including our own, may in part represent the consequence of increased vigilance against TE-related instability.
The capacity of L1 to generate DSBs, along with interspersed homology, serves to further increase the culpability of TEs in mammalian genetic instability. From an epidemiological standpoint, L1-generated DSBs also suggest the possibility that variability in the number of actively expressing TE sequences in the human population could lead to differences between individuals in the risk for genetic alterations, both somatic and germinal. As a consequence, TE activity levels may contribute to heterogeneity among individuals in the onset of cancer and reproductive deficiencies, as well as in the generation of de novo disease alleles in offspring.
We are very grateful to Louisiana State University in Baton Rouge, and specifically Dr. Mark Batzer, for hosting us in his laboratory during our evacuation for Hurricane Katrina. We thank all members of Dr. Batzer’s laboratory for their kindness and help. This research was supported by grants from the National Institutes of Health, R01GM45668 (PLD), the National Science Foundation, EPS-0346411 (PLD), and the State of Louisiana Board of Regents Support Fund (PLD).