Since their initial discovery in maize, there have been various attempts to categorize the relationship between transposable elements (TEs) and their host organisms. These have ranged from viewing TEs as selfish parasites to treating them as essential, functional components of organismal biology. Research over the past several decades has, in many respects, only complicated the issue further. On the one hand, investigators have amassed substantial evidence of the negative effects that TE mutagenic activity can have on host genomes and organismal fitness. On the other hand, an increasing number of examples across several taxa show TEs being incorporated into functional biological roles for their host organisms. Some 45% of our own genome is composed of TE copies. While many of these copies are dormant, having lost their ability to mobilize, several lineages continue to actively proliferate in modern human populations. With its complement of ancestral and active TEs, the human genome exhibits key aspects of the host–TE dynamic that has played out since early in organismal evolution. In this review, we examine what insights the particularly well-characterized human system can provide regarding the nature of the host–TE interaction.
There are over half a million copies of L1 retroelements in the human genome, which are responsible for as much as 0.5% of new human genetic diseases. Most new L1 inserts arise from young source elements that are polymorphic in the human genome. Highly active polymorphic “hot” L1 source elements have been shown to be capable of extremely high levels of mobilization and to cause numerous instances of disease. Additionally, hot polymorphic L1s have been described as highly active within numerous cancer genomes. These hot L1s cause mutagenesis by inserting new L1 copies elsewhere in the genome, but they have also been shown to generate additional full-length L1 insertions that are themselves hot and able to retrotranspose further. Through this mechanism, hot L1s may amplify within a tumor and drive a continued cycle of mutagenesis.
Results and conclusions
We have developed a method to detect full-length, polymorphic L1 elements using a targeted next-generation sequencing approach, Sequencing Identification and Mapping of Primed L1 Elements (SIMPLE). SIMPLE has 94% sensitivity and detects nearly all full-length L1 elements in a genome. SIMPLE will allow researchers to identify hot mutagenic full-length L1s as potential drivers of genome instability. Using SIMPLE, we find that the typical individual has approximately 100 non-reference, polymorphic L1 elements in their genome. These elements occur at relatively low population frequencies compared with previously identified polymorphic L1 elements and demonstrate the tremendous diversity of potentially active L1 elements in the human population.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1374-y) contains supplementary material, which is available to authorized users.
Retrotransposon; High-throughput sequencing; LINE1; Polymorphism
Alu elements make up the largest family of human mobile elements, numbering 1.1 million copies and comprising 11% of the human genome. As a consequence of evolution and genetic drift, Alu elements of varying sequence divergence exist throughout the human genome. Alu/Alu recombination has been shown to cause approximately 0.5% of new human genetic diseases and to contribute to extensive genomic structural variation. To begin understanding the molecular mechanisms leading to these rearrangements in mammalian cells, we constructed Alu/Alu recombination reporter cell lines containing Alu elements ranging in sequence divergence from 0% to 30% that allow detection of both Alu/Alu recombination and large non-homologous end joining (NHEJ) deletions ranging from 1.0 to 1.9 kb in size. Introducing as little as 0.7% sequence divergence between Alu elements significantly reduced recombination, indicating that even small degrees of sequence divergence reduce the efficiency of homology-directed DNA double-strand break (DSB) repair. Recombination decreased further, in a divergence-dependent manner, for diverged Alu/Alu recombination constructs with up to 10% sequence divergence. With greater levels of sequence divergence (15% to 30%), we observed a significant increase in DSB repair due to a shift from Alu/Alu recombination to variable-length NHEJ, which removes sequence between the two Alu elements. This increase in NHEJ deletions depends on the presence of Alu sequence homeology (similar but not identical sequences). Analysis of recombination products revealed that Alu/Alu recombination junctions occur more frequently in the first 100 bp of the Alu element within our reporter assay, just as they do in genomic Alu/Alu recombination events.
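Sequence divergence between two Alu copies is, at its simplest, the percentage of aligned positions at which they differ. The sketch below assumes the two sequences are already aligned to equal length, skips gap columns, and uses a plain p-distance; the reporter constructs in the study may define divergence differently. The 300 bp sequence and the helper `introduce_substitutions` are hypothetical illustrations, not real Alu data.

```python
import random

BASES = "ACGT"


def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Percent of aligned, ungapped columns at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * mismatches / compared


def introduce_substitutions(seq: str, n_sub: int, seed: int = 0) -> str:
    """Return a copy of seq with n_sub substitutions at distinct random positions."""
    rng = random.Random(seed)
    chars = list(seq.upper())
    for pos in rng.sample(range(len(chars)), n_sub):
        # pick any base other than the current one
        chars[pos] = rng.choice([b for b in BASES if b != chars[pos]])
    return "".join(chars)


# Hypothetical 300 bp stand-in for an Alu element (not a real Alu sequence).
alu_like = "ACGT" * 75
variant = introduce_substitutions(alu_like, 2)
print(round(percent_divergence(alu_like, variant), 2))  # 0.67
```

Two substitutions in a ~300 bp element already give well under 1% divergence, which is the scale of the smallest divergence level tested above.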
This is the first extensive study characterizing the influence of Alu element sequence divergence on DNA repair, which will inform predictions regarding the effect of Alu element sequence divergence on both the rate and nature of DNA repair events.
DNA double-strand breaks (DSBs) are a highly mutagenic form of DNA damage that can be repaired through one of several pathways with varied degrees of sequence preservation. Faithful repair of DSBs often occurs through gene conversion in which a sister chromatid is used as a repair template. Unfaithful repair of DSBs can occur through non-allelic homologous or homeologous recombination, which leads to chromosomal abnormalities such as deletions, duplications, and translocations and has been shown to cause several human genetic diseases. Substrates for these homologous and homeologous events include Alu elements, which are approximately 300 bp elements that comprise ~11% of the human genome. We use a new reporter assay to show that repair of DSBs results in Alu-mediated deletions that resolve through several distinct repair pathways. Either single-strand annealing (SSA) repair or microhomology-mediated end joining occurs ‘in register’ between two Alu elements when Alu sequence divergence is low. However, with more diverged Alu elements, like those typically found in the human genome, repair of DSBs appears to use the Alu/Alu homeology to direct non-homologous end joining in the general vicinity of the Alu elements. Mutagenic NHEJ repair involving divergent Alu elements may represent a common repair event in primate genomes.
Familial dilated cardiomyopathy is a genetically heterogeneous disease with more than 30 known causal genes. A recent candidate gene study implicated TTN truncating variants as the cause of 25% of familial and 18% of sporadic dilated cardiomyopathy (DCM) cases.
Methods and Results
We used an unbiased genome-wide approach employing both linkage analysis and variant filtering across the exome sequences of 48 individuals affected with DCM from 17 families to identify the genetic cause. Linkage analysis placed the TTN region under the second-highest genome-wide multipoint linkage peak (MLOD 1.59). We identified six TTN truncating variants carried by affected individuals in 7 of 17 DCM families (LOD 2.99); 2 of these 7 families also had novel missense variants that segregated with disease. Two additional novel truncating TTN variants did not segregate with DCM. Nucleotide diversity at the TTN locus, including missense variants, was comparable to that of five other known DCM genes. The average number of missense variants in the exome sequences from the DCM cases or from the ~5,400 cases in the Exome Sequencing Project was ~23 per individual. The average number of TTN truncating variants in the Exome Sequencing Project was 0.014 per individual. We also identified a region (chr9q21.11-q22.31) with no known DCM genes with a maximum heterogeneity LOD score of 1.74.
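The LOD scores above are on a log10 odds scale for linkage. As a reminder of how that scale works, the textbook two-point LOD for phase-known meioses with R recombinants and NR non-recombinants at recombination fraction θ, and its additivity across independent families, are shown below. This is only an illustration under those simplifying assumptions: the MLOD and heterogeneity LOD reported here come from multipoint analyses whose exact models are not described in this summary, and the worked numbers are hypothetical.

```latex
\mathrm{LOD}(\theta) = \log_{10}\frac{\theta^{R}\,(1-\theta)^{NR}}{(1/2)^{R+NR}},
\qquad
\mathrm{LOD}_{\mathrm{total}}(\theta) = \sum_{f=1}^{F} \mathrm{LOD}_f(\theta)

% Hypothetical worked example: 7 informative meioses, no recombinants, theta = 0
\mathrm{LOD}(0) = \log_{10} 2^{7} = 7\log_{10} 2 \approx 2.11
```

A LOD of 3 (odds of 1000:1 in favor of linkage) is the conventional threshold for significant linkage, which is why a combined value of 2.99 sits right at that boundary.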
These data suggest that TTN truncating variants contribute to the cause of DCM. However, because not all identified TTN truncating variants segregated with disease, determining variant pathogenicity remains challenging even with full exome sequencing.
genetics; human; genome-wide analysis; dilated cardiomyopathy; exome
Despite the increasing speculation that oxidative stress and abnormal energy metabolism may play a role in Autism Spectrum Disorders (ASD), and the observation that patients with mitochondrial defects have symptoms consistent with ASD, there are no comprehensive published studies examining the role of mitochondrial variation in autism. Therefore, we have sought to comprehensively examine the role of mitochondrial DNA (mtDNA) variation with regard to ASD risk, employing a multi-phase approach.
In phase 1 of our experiment, we examined 132 mtDNA single-nucleotide polymorphisms (SNPs) genotyped as part of our genome-wide association studies of ASD. In phase 2, we genotyped the major European mitochondrial haplogroup-defining variants within an expanded set of autism probands and controls. Finally, in phase 3, we resequenced the entire mtDNA in a subset of our Caucasian samples (~400 proband-father pairs). In each phase, we tested whether mitochondrial variation showed evidence of association with ASD. Despite a thorough interrogation of mtDNA variation, we found no evidence to suggest a major role for mtDNA variation in ASD susceptibility. Accordingly, while there may be attractive biological hints suggesting a role for mitochondria in ASD, our data indicate that mtDNA variation is not a major contributing factor to the development of ASD.
mitochondrial DNA; autism; autism spectrum disorders; association studies; genetic
The Caribbean basin is home to some of the most complex interactions in recent history among previously diverged human populations. Here, we investigate the population genetic history of this region by characterizing patterns of genome-wide variation among 330 individuals from three of the Greater Antilles (Cuba, Puerto Rico, Hispaniola), two mainland (Honduras, Colombia), and three Native South American (Yukpa, Bari, and Warao) populations. We combine these data with a unique database of genomic variation in over 3,000 individuals from diverse European, African, and Native American populations. We use local ancestry inference and tract length distributions to test different demographic scenarios for the pre- and post-colonial history of the region. We develop a novel ancestry-specific PCA (ASPCA) method to reconstruct the sub-continental origin of Native American, European, and African haplotypes from admixed genomes. We find that the most likely source of the indigenous ancestry in Caribbean islanders is a Native South American component shared among inland Amazonian tribes, Central America, and the Yucatan peninsula, suggesting extensive gene flow across the Caribbean in pre-Columbian times. We find evidence of two pulses of African migration. The first pulse, which today is reflected by shorter, older ancestry tracts, consists of a genetic component more similar to coastal West African regions involved in the early stages of the transatlantic slave trade. The second pulse, reflected by longer, younger tracts, is more similar to present-day West-Central African populations, supporting historical records of later transatlantic deportation. Surprisingly, we also identify a Latino-specific European component that has significantly diverged from its parental Iberian source populations, presumably as a result of small European founder population size.
We demonstrate that the ancestral components in admixed genomes can be traced back to distinct sub-continental source populations with far greater resolution than previously thought, even when few pre-Columbian Caribbean haplotypes have survived.
Latinos are often regarded as a single heterogeneous group, whose complex variation is not fully appreciated in several social, demographic, and biomedical contexts. By making use of genomic data, we characterize ancestral components of Caribbean populations on a sub-continental level and unveil fine-scale patterns of population structure distinguishing insular from mainland Caribbean populations as well as from other Hispanic/Latino groups. We provide genetic evidence for an inland South American origin of the Native American component in island populations and for extensive pre-Columbian gene flow across the Caribbean basin. The Caribbean-derived European component shows significant differentiation from parental Iberian populations, presumably as a result of founder effects during the colonization of the New World. Based on demographic models, we reconstruct the complex population history of the Caribbean since the onset of continental admixture. We find that insular populations are best modeled as mixtures absorbing two pulses of African migrants, coinciding with the early and maximum activity stages of the transatlantic slave trade. These two pulses appear to have originated in different regions within West Africa, imprinting two distinguishable signatures on present-day Afro-Caribbean genomes and shedding light on the genetic impact of the slave trade in the Caribbean.
Alu elements are trans-mobilized by the autonomous non-LTR retroelement LINE-1 (L1). Alu-induced insertion mutagenesis contributes to about 0.1% of human genetic disease and is responsible for the majority of documented instances of human retroelement insertion-induced disease. Here we introduce a SINE recovery method that provides a complementary approach for comprehensive analysis of the impact and biological mechanisms of Alu retrotransposition. Using this approach, we recovered 226 de novo tagged Alu inserts in HeLa cells. Our analysis reveals that, in human cells, marked Alu inserts driven by either exogenously supplied full-length L1 or ORF2 protein are indistinguishable. Four percent of de novo Alu inserts were associated with genomic deletions and rearrangements and lacked the hallmarks of retrotransposition. In contrast to L1 inserts, 5′ truncations of Alu inserts are rare: most of the recovered inserts (96.5%) are full length. De novo Alus show a random pattern of insertion across chromosomes, but further characterization revealed an insertion bias favoring sites near other SINEs and highly conserved elements, with almost 60% landing within genes. De novo Alu inserts show no evidence of RNA editing. Priming for reverse transcription rarely occurred within the first (most 5′) 20 bp of the A-tail. The A-tails of recovered inserts show significant expansion, with many at least doubling in length. Manipulation of the construct sequence showed that A-tail expansion likely occurs during insertion through slippage of the L1 ORF2 protein. We postulate that A-tail expansion directly impacts Alu evolution by reintroducing new active source elements, counteracting the natural loss of active Alus and minimizing Alu extinction.
SINEs are mobile elements that are found ubiquitously throughout a large diversity of genomes, from plants to mammals. The human SINE, Alu, is among the most successful mobile elements, with more than one million copies in the genome. Due to its high activity and ability to insert throughout the genome, Alu retrotransposition is responsible for the majority of diseases reported to be caused by mobile element activity. To further evaluate the genomic impact of SINEs, we recovered and characterized over 200 de novo Alu inserts under controlled conditions. Our data reinforce observations on the mutagenic potential of Alu, with newly retrotransposed Alu elements favoring insertion into genic regions and highly conserved elements. Alu-mediated deletions and rearrangements are infrequent and lack the typical hallmarks of target-primed reverse transcription (TPRT), suggesting the use of an alternative method for resolving retrotransposition intermediates or an atypical insertion mechanism. Our data also provide novel insights into SINE retrotransposition biology. We found that slippage of L1 ORF2 protein during reverse transcription expands the A-tails of de novo insertions. We propose that the L1 ORF2 protein plays a major role in minimizing Alu extinction by reintroducing active Alu elements to counter the natural loss of Alu source elements.
Autism spectrum disorders (ASD) represent a group of neurodevelopmental disorders characterized by a core set of social-communicative and behavioral impairments. Gamma-aminobutyric acid (GABA) is the major inhibitory neurotransmitter in the brain, acting primarily via the GABA receptors (GABR). Multiple lines of evidence, including altered GABA and GABA receptor expression in autistic patients, indicate that the GABAergic system may be involved in the etiology of autism.
As copy number variations (CNVs), particularly rare and de novo CNVs, have now been implicated in ASD risk, we examined the GABA receptors and genes in related pathways for structural variation that may be associated with autism. We further extended our candidate gene set to include 19 genes and regions that had either been directly implicated in the autism literature or were directly related (via function or ancestry) to these primary candidates. For the high-resolution CNV screen, we employed custom-designed 244K comparative genomic hybridization (CGH) arrays. Collectively, our probes spanned a total of 11 Mb of GABA-related and additional candidate regions with a density of approximately one probe every 200 nucleotides, allowing a theoretical resolution for detection of CNVs of approximately 1 kb or greater on average. One hundred and sixty-eight autism cases and 149 control individuals were screened for structural variants. Prioritized CNV events were confirmed using quantitative PCR, and confirmed loci were evaluated in an additional set of 170 cases and 170 control individuals that were not included in the original discovery set. Loci that remained of interest were subsequently screened by quantitative PCR in an additional set of 755 cases and 1,809 unaffected family members.
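The stated probe density implies the approximate probe count and the number of probes a 1 kb CNV would span. The arithmetic below assumes perfectly uniform probe spacing, which real array designs only approximate, and does not reflect how many consecutive probes the study's CNV caller actually required.

```latex
N_{\text{probes}} \approx \frac{11 \times 10^{6}\ \text{bp}}{200\ \text{bp per probe}} = 5.5 \times 10^{4},
\qquad
\text{probes spanning a 1 kb CNV} \approx \frac{1000\ \text{bp}}{200\ \text{bp per probe}} = 5
```

A handful of consecutive probes per kilobase is what makes a ~1 kb event theoretically detectable; smaller events would be covered by too few probes to call reliably.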
Results include rare deletions in autistic individuals at JAKMIP1, NRXN1, Neuroligin4Y, OXTR, and ABAT. Common insertion/deletion polymorphisms were detected at several loci, including GABBR2 and NRXN3. Overall, statistically significant enrichment in affected vs. unaffected individuals was observed for NRXN1 deletions.
These results provide additional support for the role of rare structural variation in ASD.
AUTISM; CGH; CNV; GABA; NRXN1
Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder with a strong genetic component. The skewed prevalence toward males and evidence suggestive of linkage to the X chromosome in some studies suggest the presence of X-linked susceptibility genes in people with ASD.
We analyzed genome-wide association study (GWAS) data on the X chromosome in three independent autism GWAS data sets: two family data sets and one case-control data set. We performed meta- and joint analyses on the combined family and case-control data sets. In addition to the meta- and joint analyses, we performed replication analysis by using the two family data sets as a discovery data set and the case-control data set as a validation data set.
One SNP, rs17321050, in the transducin β-like 1X-linked (TBL1X) gene [OMIM:300196] showed chromosome-wide significance in the meta-analysis (P value = 4.86 × 10⁻⁶) and joint analysis (P value = 4.53 × 10⁻⁶) in males. The SNP was also close to the replication threshold of 0.0025 in the discovery data set (P = 5.89 × 10⁻³) and passed the replication threshold in the validation data set (P = 2.56 × 10⁻⁴). Two other SNPs in the same gene in linkage disequilibrium with rs17321050 also showed significance close to the chromosome-wide threshold in the meta-analysis.
TBL1X is in the Wnt signaling pathway, which has previously been implicated as having a role in autism. Deletions in the Xp22.2 to Xp22.3 region containing TBL1X and surrounding genes are associated with several genetic syndromes that include intellectual disability and autistic features. Our results, based on meta-analysis, joint analysis and replication analysis, suggest that TBL1X may play a role in ASD risk.
Despite the ever-increasing throughput and steadily decreasing cost of next
generation sequencing (NGS), whole genome sequencing of humans is still not a
viable option for the majority of genetics laboratories. This is particularly
true in the case of complex disease studies, where large sample sets are often
required to achieve adequate statistical power. To fully leverage the potential
of NGS technology on large sample sets, several methods have been developed to
selectively enrich for regions of interest. Enrichment reduces both monetary and
computational costs compared to whole genome sequencing, while allowing
researchers to take advantage of NGS throughput. Several targeted enrichment
approaches are currently available, including molecular inversion probe ligation
sequencing (MIPS), oligonucleotide hybridization based approaches, and PCR-based
strategies. To assess how these methods performed when used in conjunction with
the ABI SOLiD3+, we investigated three enrichment techniques: Nimblegen
oligonucleotide hybridization array-based capture; Agilent SureSelect
oligonucleotide hybridization solution-based capture; and Raindance
Technologies' multiplexed PCR-based approach. Target regions were selected
from exons and evolutionarily conserved areas throughout the human genome. Probe
and primer pair design was carried out for all three methods using their
respective informatics pipelines. In all, approximately 0.8 Mb of target space
was shared by all three methods. SOLiD sequencing results were analyzed for
several metrics, including consistency of coverage depth across samples,
on-target versus off-target efficiency, allelic bias, and genotype concordance
with array-based genotyping data. Agilent SureSelect exhibited superior
on-target efficiency and correlation of read depths across samples. Nimblegen
performance was similar at read depths of 20× and below. Both Raindance
and Nimblegen SeqCap exhibited tighter distributions of read depth around the
mean, but both suffered from lower on-target efficiency in our experiments.
Raindance demonstrated the highest versatility in assay design.
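Three of the comparison metrics named above (on-target efficiency, allelic bias, and genotype concordance with array data) can be sketched in a few lines. This is a minimal illustration, not the study's pipeline: it assumes each read is reduced to a single position, targets are sorted non-overlapping half-open intervals on one chromosome, and genotypes are two-letter strings; all data and SNP identifiers are hypothetical.

```python
from bisect import bisect_right


def on_target_fraction(read_positions, targets):
    """Fraction of read positions inside any half-open [start, end) target.

    `targets` must be sorted and non-overlapping, all on one chromosome.
    """
    starts = [start for start, _ in targets]
    hits = 0
    for pos in read_positions:
        i = bisect_right(starts, pos) - 1  # last target starting at or before pos
        if i >= 0 and pos < targets[i][1]:
            hits += 1
    return hits / len(read_positions)


def mean_reference_allele_fraction(het_site_counts):
    """Mean ref/(ref+alt) read fraction at heterozygous sites; 0.5 means no allelic bias."""
    fractions = [ref / (ref + alt) for ref, alt in het_site_counts if ref + alt > 0]
    return sum(fractions) / len(fractions)


def genotype_concordance(ngs_calls, array_calls):
    """Fraction of SNPs called by both platforms whose unordered genotypes agree."""
    shared = ngs_calls.keys() & array_calls.keys()
    agree = sum(
        1 for snp in shared
        if sorted(ngs_calls[snp]) == sorted(array_calls[snp])  # "AG" matches "GA"
    )
    return agree / len(shared)


targets = [(100, 200), (500, 600)]
reads = [50, 150, 199, 200, 550, 700]
print(on_target_fraction(reads, targets))  # 0.5
print(mean_reference_allele_fraction([(10, 10), (30, 10)]))  # 0.625
print(genotype_concordance({"rs1": "AG", "rs2": "CC"}, {"rs1": "GA", "rs2": "CT"}))  # 0.5
```

Binary search over target starts keeps the on-target check at O(log n) per read; a real analysis would instead test whether the full aligned read overlaps a target.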
Although autism is one of the most heritable neuropsychiatric disorders, its underlying genetic architecture has largely eluded description. To comprehensively examine the hypothesis that common variation is important in autism, we performed a genome-wide association study (GWAS) using a discovery dataset of 438 autistic Caucasian families and the Illumina Human 1M BeadChip. Ninety-six single nucleotide polymorphisms (SNPs) demonstrated strong association with autism risk (p-value < 0.0001). These top 96 SNPs were validated using an independent dataset of 487 Caucasian autism families genotyped on the Illumina 550K BeadChip. A novel region on chromosome 5p14.1 showed significance in both the discovery and validation datasets. Joint analysis of all SNPs in this region identified 8 SNPs with lower p-values (3.24 × 10⁻⁴ to 3.40 × 10⁻⁶) than in either dataset alone. Our findings demonstrate that, in addition to multiple rare variations, part of the complex genetic architecture of autism involves common variation.
Recombination rates vary widely across the human genome, but little of that variation is correlated with known DNA sequence features. The genome contains more than one million Alu mobile element insertions, and these insertions have been implicated in non-homologous recombination, modulation of DNA methylation, and transcriptional regulation. If individual Alu insertions have even modest effects on local recombination rates, they could collectively have a significant impact on the pattern of linkage disequilibrium in the human genome and on the evolution of the Alu family itself.
We carried out sequencing, SNP identification, and SNP genotyping around 19 AluY insertion loci in 347 individuals sampled from diverse populations, then used the SNP genotypes to estimate local recombination rates around the AluY loci. The loci and SNPs were chosen so as to minimize other factors (such as SNP ascertainment bias and SNP density) that could influence recombination rate estimates. We detected a significant increase in recombination rate within ~2 kb of the AluY insertions in our African population sample. To test this observation against a larger set of AluY insertions, we applied our locus- and SNP-selection design and analyses to the HapMap Phase II data. In that data set, we observed a significantly increased recombination rate near AluY insertions in both the CEU and YRI populations.
We show that the presence of a fixed AluY insertion is significantly predictive of an elevated local recombination rate within 2 kb of the insertion, independent of other known predictors. The magnitude of this effect, approximately a 6% increase, is comparable to the effects of some recombinogenic DNA sequence motifs identified via their association with recombination hot spots.
The identification of repeat structure in eukaryotic genomes can be time-consuming and difficult because of the large amount of information (~3 × 10⁹ bp) that needs to be processed and compared. We introduce a new approach based on exact word counts to evaluate, de novo, the repeat structure present within large eukaryotic genomes. This approach avoids sequence alignment and similarity search, two of the most time-consuming components of traditional methods for repeat identification. Algorithms were implemented to efficiently calculate exact counts for oligonucleotides of any length in large genomes. Based on these oligonucleotide counts, oligonucleotide excess probability clouds, or “P-clouds”, were constructed. P-clouds are composed of clusters of related oligonucleotides that occur, as a group, more often than expected by chance. After construction, P-clouds were mapped back onto the genome, and regions of high P-cloud density were identified as repetitive regions using a sliding window approach. This efficient method is capable of analyzing the repeat content of the entire human genome on a single desktop computer in less than half a day, at least 10-fold faster than current approaches. The predicted repetitive regions strongly overlap with known repeat elements, as well as with other repetitive regions such as gene families, pseudogenes and segmental duplicons. This method should be extremely useful for de novo identification of repeat structure in large, newly sequenced genomes.
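The count-then-scan idea above can be illustrated with a simplified sketch: count every k-letter word exactly, flag positions whose word is over-represented, then slide a window and keep regions dense in flagged words. This is not the P-cloud algorithm itself. It assumes a fixed count threshold (`min_count`) in place of an "excess over chance" probability model and does not cluster related oligonucleotides into clouds; the function names, parameters and toy genome are hypothetical.

```python
import random
from collections import Counter


def count_words(genome, k):
    """Exact counts of every k-letter word (overlapping) in the sequence."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def merge_intervals(intervals):
    """Merge overlapping half-open intervals given in order of increasing start."""
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def repetitive_regions(genome, k, min_count, window, min_density):
    """Flag windows dense in over-represented words and return merged regions.

    Simplification: a word counts as over-represented if its exact count is at
    least min_count; real P-clouds also group related words into clusters.
    """
    counts = count_words(genome, k)
    flagged = [counts[genome[i:i + k]] >= min_count for i in range(len(genome) - k + 1)]
    regions = []
    in_window = sum(flagged[:window])
    for start in range(len(flagged) - window + 1):
        if start > 0:
            # slide the window one position: add the new word, drop the old one
            in_window += flagged[start + window - 1] - flagged[start - 1]
        if in_window / window >= min_density:
            # the window covers words starting at start..start+window-1, each k long
            regions.append((start, start + window + k - 1))
    return merge_intervals(regions)


# Toy genome: five copies of a hypothetical 16 bp repeat separated by random spacers.
rng = random.Random(0)
element = "GATTACAGGCCTTAGC"
pieces, copy_starts, pos = [], [], 0
for _ in range(5):
    spacer = "".join(rng.choice("ACGT") for _ in range(100))
    pieces.append(spacer)
    pos += len(spacer)
    copy_starts.append(pos)
    pieces.append(element)
    pos += len(element)
genome = "".join(pieces)
regions = repetitive_regions(genome, k=8, min_count=3, window=10, min_density=0.8)
print(regions)
```

Because only counting and a running window sum are involved, no pairwise alignment is needed, which is the property the approach above relies on for speed.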
alignment; complete genome annotation; oligonucleotide counts; P-clouds; repeat structure
The human Long Interspersed Element-1 (LINE-1) and the Short Interspersed Element (SINE) Alu comprise 28% of the human genome. They share the same L1-encoded endonuclease for insertion, which recognizes an A+T-rich sequence. Under a simple model of insertion distribution, this nucleotide preference would predict that the populations of both elements are biased towards A+T-rich regions. Genomic L1 elements do show an A+T-rich bias. In contrast, Alu is biased towards G+C-rich regions when compared to the genome average. Several analyses have demonstrated that relatively recent insertions of both elements show less G+C content bias than older elements. We have analyzed the repetitive element and G+C composition of more than 100 pre-insertion loci derived from de novo L1 insertions in cultured human cancer cells, which should represent an evolutionarily unbiased set of insertions. An A+T-rich bias is observed in the 50 bp flanking the endonuclease target site, consistent with the known target site for the L1 endonuclease. The L1, Alu, and G+C content of 20 kb of the de novo pre-insertion loci show a different set of biases than those observed for fixed L1s in the human genome. In contrast to the insertion sites of genomic L1s, the de novo L1 pre-insertion loci are relatively L1-poor, Alu-rich and G+C-neutral. Finally, a statistically significant cluster of de novo L1 insertions was localized in the vicinity of the c-myc gene. These results suggest that the initial insertion preference of L1, while A+T-rich in the immediate vicinity of the break site, can be influenced by the broader content of the flanking genomic region, and they have implications for understanding the dynamics of L1 and Alu distributions in the human genome.
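The analysis above contrasts base composition at two scales around each insertion site: a narrow 50 bp flank and a much wider region. A minimal sketch of that measurement follows, assuming the site is given as a 0-based coordinate and that ambiguous bases (such as N) are excluded from the denominator. The toy locus and site are hypothetical, and the 450 bp wide window merely stands in for the 20 kb scale so the example stays small; comparison against a genome average is not shown.

```python
def gc_fraction(seq):
    """G+C fraction among unambiguous bases (N and other codes are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no unambiguous bases in window")
    return (seq.count("G") + seq.count("C")) / acgt


def flank_gc(chrom_seq, site, flank_bp):
    """G+C fraction of flank_bp on each side of a 0-based insertion site."""
    left = chrom_seq[max(0, site - flank_bp):site]
    right = chrom_seq[site:site + flank_bp]
    return gc_fraction(left + right)


# Toy locus: an A+T-rich patch embedded in G+C-rich sequence.
locus = "GC" * 200 + "AT" * 50 + "GC" * 200
site = 450  # hypothetical endonuclease cleavage position inside the A+T patch
print(flank_gc(locus, site, 50))  # 0.0
print(round(flank_gc(locus, site, 450), 3))  # 0.889
```

The toy locus reproduces the qualitative pattern described above: the immediate flank is A+T-rich even though the wider region is G+C-rich.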
LINE; Retrotransposition; Alu; SINE
The long interspersed element-1 (LINE-1 or L1) is a highly successful retrotransposon in mammals. L1 elements have continued to actively propagate subsequent to the human–chimpanzee divergence, ~6 million years ago, resulting in species-specific inserts. Here, we report a detailed characterization of chimpanzee-specific L1 subfamily diversity and a comparison with their human-specific counterparts. Our results indicate that L1 elements have experienced different evolutionary fates in humans and chimpanzees within the past ~6 million years. Although the species-specific L1 copy numbers are on the same order in both species (1200–2000 copies), the number of retrotransposition-competent elements appears to be much higher in the human genome than in the chimpanzee genome. Also, while human L1 subfamilies belong to the same lineage, we identified two lineages of recently integrated L1 subfamilies in the chimpanzee genome. The two lineages seem to have coexisted for several million years, but only one shows evidence of expansion within the past three million years. These differential evolutionary paths may be the result of random variation, or the product of competition between L1 subfamily lineages. Our results suggest that the coexistence of several L1 subfamily lineages within a species may be resolved in a very short evolutionary period of time, perhaps in just a few million years. Therefore, the chimpanzee genome constitutes an excellent model in which to analyze the evolutionary dynamics of L1 retrotransposons.
L1 elements; Retrotransposons; Human; Chimpanzee; Species-specific; Polymorphism
Long interspersed element-1 (L1) elements compose on average one-fifth of mammalian genomes. The expression and retrotransposition of L1 are restricted by a number of cellular mechanisms in order to limit their damage in both germ-line and somatic cells. L1 transcription is largely suppressed in most tissues, but L1 mRNA and/or proteins are still detectable in testes, a number of specific somatic cell types, and malignancies. Down-regulation of L1 expression via premature polyadenylation has been found to be a secondary mechanism limiting L1 expression. We demonstrate that mammalian L1 elements contain numerous functional splice donor and acceptor sites. Efficient usage of some of these sites results in extensive and complex splicing of L1. Several splice variants of both the human and mouse L1 elements undergo retrotransposition. Some of the spliced L1 mRNAs can potentially contribute to expression of open reading frame 2 (ORF2)-related products and therefore have implications for the mobility of SINEs even if they are incompetent for L1 retrotransposition. Analysis of the human EST database revealed that L1 elements also participate in splicing events with other genes. Such contribution of functional splice sites by L1 may disrupt normal gene expression or produce alternative mRNA transcripts.
Retrotransposons have had a considerable impact on the overall architecture of the human genome. Currently, there are three lineages of retrotransposons (Alu, L1, and SVA) that are believed to be actively replicating in humans. While estimates of their copy number, sequence diversity, and levels of insertion polymorphism can readily be obtained from existing genomic sequence data and population sampling, a detailed understanding of the temporal pattern of retrotransposon amplification remains elusive. Here we pose the question of whether, using genomic sequence and population frequency data from extant taxa, one can adequately reconstruct historical amplification patterns. To this end, we developed a computer simulation that incorporates several known aspects of primate Alu retrotransposon biology and accommodates sampling effects resulting from the methods by which mobile elements are typically discovered and characterized. By modeling a number of amplification scenarios and comparing simulation-generated expectations to empirical data gathered from existing Alu subfamilies, we were able to statistically reject a number of amplification scenarios for individual subfamilies, including that of a rapid expansion or explosion of Alu amplification at the time of human–chimpanzee divergence.
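To make the idea of comparing amplification scenarios concrete, here is a toy forward simulation contrasting a constant amplification rate with a burst at the human–chimpanzee divergence, tracking only how insertion timing shapes sequence divergence from the source element. Every parameter (`MU`, `SPAN_MYR`, the burst width, element length) is a hypothetical assumption, substitutions are modeled per site without back-mutation, and the study's own simulation additionally modeled population frequencies and discovery sampling, which this sketch omits.

```python
import math
import random

ALU_LENGTH = 300  # approximate Alu length in bp
MU = 1.5e-3       # hypothetical substitutions per site per million years
SPAN_MYR = 6.0    # roughly the time since the human-chimpanzee divergence


def sample_ages(scenario, n, rng):
    """Insertion ages (Myr before present) under a toy amplification scenario."""
    if scenario == "constant":
        # Steady amplification: insertions spread evenly across the span.
        return [rng.uniform(0.0, SPAN_MYR) for _ in range(n)]
    if scenario == "burst":
        # Explosion at the divergence: ages clustered just below SPAN_MYR.
        return [max(0.0, SPAN_MYR - abs(rng.gauss(0.0, 0.3))) for _ in range(n)]
    raise ValueError(f"unknown scenario: {scenario}")


def divergence_from_source(age, rng):
    """Fraction of sites mutated since insertion (no back-mutation modeled)."""
    p_hit = 1.0 - math.exp(-MU * age)  # Poisson chance of >= 1 substitution per site
    hits = sum(1 for _ in range(ALU_LENGTH) if rng.random() < p_hit)
    return hits / ALU_LENGTH


def simulate(scenario, n=2000, seed=1):
    """Return (mean age, mean divergence) for n simulated insertions."""
    rng = random.Random(seed)
    ages = sample_ages(scenario, n, rng)
    divergences = [divergence_from_source(age, rng) for age in ages]
    return sum(ages) / n, sum(divergences) / n


for scenario in ("constant", "burst"):
    mean_age, mean_div = simulate(scenario)
    print(f"{scenario}: mean age {mean_age:.2f} Myr, mean divergence {mean_div:.4f}")
```

A burst shifts insertions toward older ages and therefore toward higher divergence; comparing such simulated expectations against empirical subfamily data is the kind of test that allows a scenario to be rejected.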
Nearly 50% of the human genome is composed of mobile elements. While much of this sequence consists of inactive “fossil” elements that are no longer actively moving or generating new copies, three families are currently proliferating in human genomes. Among these, the Alu lineage has reached a copy number of over 1 million and alone accounts for approximately 10% of the genome. While considerable evidence has been gathered concerning the underlying biological mechanisms of Alu mobilization and proliferation, a detailed understanding of Alu amplification history is currently lacking. Researchers are aware, for example, that several thousand Alu elements have inserted within the human genome since the divergence of humans and chimpanzees, but how those insertions were distributed over this ~6-million-year time period is currently unknown. In this work, the authors introduce a simulation framework that seeks to incorporate both sequence diversity and empirically gathered population data from human Alu elements, in order to provide a better understanding of the last several million years of human Alu evolution. The results suggest that a rapid explosion of Alu amplification at the time of the human–chimpanzee divergence is unlikely. Therefore, it is improbable that an increase in Alu retrotransposition activity was involved in the speciation of humans and chimpanzees.
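The logic of rejecting an "explosion" scenario can be illustrated with a toy model that uses only sequence divergence, not the population-frequency data the authors also incorporate. Assume each element has about 280 sites, each mutating independently at a neutral rate of 0.15% per million years (both numbers are illustrative assumptions). If every insertion occurred in one burst 6 million years ago, per-element substitution counts are binomial and their variance-to-mean ratio is close to 1; if insertions were spread evenly over the same span, the mixture of ages inflates that ratio. All names and parameters are hypothetical.

```python
import random
from statistics import mean, pvariance

def scenario_ages(kind, n, span_myr=6.0, seed=2):
    """Insertion ages (million years) under a hypothetical amplification scenario."""
    rng = random.Random(seed)
    if kind == "constant":  # insertions spread evenly over the last span_myr
        return [rng.uniform(0.0, span_myr) for _ in range(n)]
    if kind == "burst":     # every insertion at the assumed human-chimpanzee split
        return [span_myr] * n
    raise ValueError(f"unknown scenario {kind!r}")

def simulate_substitutions(ages, length=280, rate_per_site_per_myr=0.0015, seed=1):
    """Per-element substitution counts: each site mutates with probability rate * age."""
    rng = random.Random(seed)
    counts = []
    for age in ages:
        p = rate_per_site_per_myr * age
        counts.append(sum(rng.random() < p for _ in range(length)))
    return counts

def dispersion_index(counts):
    """Variance-to-mean ratio; close to 1 when all elements share one age."""
    return pvariance(counts) / mean(counts)

if __name__ == "__main__":
    for kind in ("constant", "burst"):
        counts = simulate_substitutions(scenario_ages(kind, 2000))
        print(kind, round(mean(counts), 2), round(dispersion_index(counts), 2))
```

Comparing simulated summaries like these with the values observed for a real subfamily is the basic idea behind statistically rejecting a scenario; a full analysis would compare whole distributions across many replicate simulations.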
Alu elements are short (~300 bp) interspersed elements that amplify in primate genomes through a process termed retroposition. The expansion of these elements has had a significant impact on the structure and function of primate genomes. Approximately 10% of the mass of the human genome consists of Alu elements, making them the most abundant short interspersed element (SINE) in our genome. The majority of Alu amplification occurred early in primate evolution, and the current rate of Alu retroposition is at least 100-fold slower than the peak of amplification that occurred 30–50 million years ago. Alu elements are therefore a rich source of inter- and intra-species primate genomic variation.
A total of 153 Alu elements from the Ye subfamily were extracted from the draft sequence of the human genome. Analysis of these elements resulted in the discovery of two new Alu subfamilies, Ye4 and Ye6, complementing the previously described Ye5 subfamily. DNA sequence analysis of each of the Alu Ye subfamilies yielded average age estimates of ~14, ~13, and ~9.5 million years for the Alu Ye4, Ye5, and Ye6 subfamilies, respectively. In addition, 120 Alu Ye4, Ye5, and Ye6 loci were screened using polymerase chain reaction (PCR) assays to determine their phylogenetic origins and their levels of human genomic diversity.
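Subfamily age estimates of this kind are commonly derived from how far individual copies have diverged from their subfamily consensus, divided by a neutral substitution rate. The sketch below assumes pre-aligned, equal-length sequences, skips gap columns and consensus CpG positions (CpG sites mutate much faster), and uses an illustrative rate of 0.15% per site per million years; it is not necessarily the exact procedure or rate used in the study, and the function names are hypothetical.

```python
def cpg_positions(consensus):
    """Indices of both bases of every CG dinucleotide (assumes a gapless consensus)."""
    s = consensus.upper()
    positions = set()
    for i in range(len(s) - 1):
        if s[i] == "C" and s[i + 1] == "G":
            positions.update((i, i + 1))
    return positions

def non_cpg_differences(element, consensus):
    """Return (substitutions, compared_sites), skipping gap and CpG columns."""
    if len(element) != len(consensus):
        raise ValueError("element and consensus must be aligned to equal length")
    skip = cpg_positions(consensus)
    diffs = sites = 0
    for i, (a, b) in enumerate(zip(element.upper(), consensus.upper())):
        if i in skip or a == "-" or b == "-":
            continue
        sites += 1
        diffs += a != b
    return diffs, sites

def estimate_age_myr(elements, consensus, rate_per_site_per_myr=0.0015):
    """Pooled non-CpG divergence from the consensus divided by the assumed neutral rate."""
    total_diffs = total_sites = 0
    for element in elements:
        d, s = non_cpg_differences(element, consensus)
        total_diffs += d
        total_sites += s
    if total_sites == 0:
        raise ValueError("no comparable sites")
    return (total_diffs / total_sites) / rate_per_site_per_myr
```

For example, a pooled non-CpG divergence of 0.36% gives 0.0036 / 0.0015 = 2.4 million years under the assumed rate.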
The Alu Ye lineage appears to have started amplifying relatively early in primate evolution and to have continued propagating at a low level, as many of its members are found in a variety of hominoid (human, great ape, and lesser ape) genomes. Detailed sequence analysis of several Alu pre-integration sites indicated that multiple types of events had occurred, including gene conversions, near-parallel independent insertions of different Alu elements, and Alu-mediated genomic deletions. A potential hotspot for Alu insertion in the Fer1L3 gene on chromosome 10 was also identified.
The Alu Yb lineage is a 'young', primarily human-specific group of short interspersed element (SINE) subfamilies that have integrated throughout the human genome. In this study, we have computationally screened the draft sequence of the human genome for Alu Yb-lineage subfamily members present on autosomal chromosomes. A total of 1,733 Yb Alu subfamily members have integrated into human autosomes. The average ages of the Yb-lineage subfamilies Yb7, Yb8 and Yb9 are estimated as 4.81, 2.39 and 2.32 million years, respectively. To determine the contribution of the Alu Yb lineage to human genomic diversity, 1,202 loci were analysed using polymerase chain reaction (PCR)-based assays, which amplify the genomic regions containing individual Yb-lineage subfamily members. Approximately 20 per cent of the Yb-lineage Alu elements are polymorphic for insertion presence/absence in the human genome. Fewer than 0.5 per cent of the Yb loci also demonstrate insertions at orthologous positions in non-human primate genomes. Genomic sequencing of these unusual loci demonstrates that each of the orthologous loci from non-human primate genomes contains older Y, Sg and Sx Alu family members that have been altered, through various mechanisms, into Yb8 sequences. These data suggest that Alu Yb-lineage subfamily members are largely restricted to the human genome. Their high copy number, level of insertion polymorphism and estimated age indicate that Alu Yb-lineage elements will be useful in a wide range of genetic analyses.
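Calling a locus polymorphic for insertion presence/absence from PCR genotypes amounts to checking whether both the insertion allele and the empty pre-integration allele appear among the sampled chromosomes. A minimal sketch, assuming per-individual genotypes coded as '++', '+-', or '--' (a hypothetical encoding; the function names are also illustrative):

```python
def allele_counts(genotypes):
    """Count insertion (+) and empty-site (-) alleles across diploid PCR genotypes."""
    valid = {"++", "+-", "--"}
    present = absent = 0
    for g in genotypes:
        if g not in valid:
            raise ValueError(f"unexpected genotype {g!r}")
        present += g.count("+")
        absent += g.count("-")
    return present, absent

def classify_locus(genotypes):
    """'fixed_present', 'fixed_absent', or 'polymorphic' within the sampled panel."""
    present, absent = allele_counts(genotypes)
    if absent == 0:
        return "fixed_present"
    if present == 0:
        return "fixed_absent"
    return "polymorphic"

def fraction_polymorphic(loci):
    """loci: mapping of locus name -> list of per-individual genotypes."""
    calls = [classify_locus(g) for g in loci.values()]
    return calls.count("polymorphic") / len(calls)
```

A locus that looks fixed in a finite panel may still be polymorphic in the wider population, so a fraction computed this way is a lower bound that depends on panel size.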
mobile elements; SINEs