Mobile elements represent a unique and powerful set of tools for understanding the variation in a genome. Methods exist not only to utilize the polymorphisms among and within taxa to various ends but also to investigate the mechanism through which mobilization occurs. The number of methods to accomplish these ends is ever growing. Here we present several protocols designed to assay mobile element-based variation within and among individual genomes.
Mobile elements are interspersed repetitive DNA sequences with the unique ability to spread copies of themselves throughout the genome they occupy. As a result, these sequences can comprise a large proportion of the genomes in which they are found (1,2). Mobile elements may be divided into two classes depending on how they mobilize and the type of intermediate they use. Class I elements include the retrotransposons, which utilize an RNA intermediate during retrotransposition, while DNA transposons, Class II, utilize a DNA intermediate during mobilization (3). While DNA transposons have had periods of activity early in primate evolution, all major recent activity in the human lineage has been retrotransposon-based (1,4). Thus, we will focus on these for this chapter.
Retrotransposons from the human lineage include L1 (a Long INterspersed Element), Alu (a primate-specific Short INterspersed Element) and SVA (a composite retrotransposon). Together these elements have had significant impacts on the architecture of primate genomes (5). They comprise more than 40% of the human genome by mass and are the most abundant interspersed elements therein (6). Because of their high copy number, these interspersed repeats have been a significant source of variation as a result of insertion, transduction, and post-integration recombination among elements (6,7). During retrotransposition the RNA copy is reverse transcribed by target primed reverse transcription (TPRT) and subsequently integrated into the host genome (8,9,10). Unable to retrotranspose autonomously, Alu and SVA elements are thought to borrow the enzymatic factors required for their mobilization from L1 elements (8,11,12,13,14,15), which encode a protein complex with endonuclease and reverse transcriptase activity (16,17).
Over the millions of years of primate evolution, retrotransposons have tended to accumulate in a hierarchical manner. This pattern is a direct result of the mechanisms through which they mobilize and insert, a modified version of the master gene model (18,19,20,21). Evidence suggests that a subfamily will accumulate copies for a certain time period and then become quiescent. Other newer subfamilies subsequently become active and the pattern repeats itself. This pattern is well illustrated by the Alu family of SINEs. Over time, the Alu element has diversified into a variety of subfamilies, each with its own set of diagnostic sequence characteristics and period of activity. For example, during the early stages of primate evolution, AluJ subfamilies were active. The activity of these subfamilies was later reduced (if not extinguished) and the AluS subfamilies, derivatives of AluJ, became active. Thus, while AluJ elements are found in all primates, AluS elements are found only in anthropoid primates (tarsiers, Platyrrhini, Catarrhini). The AluY subfamilies (22) are even more taxonomically specific in that they began their expansion in catarrhine primates after the platyrrhine-catarrhine split (23). Thus, each taxon has a unique pattern of insertions some of which are shared with other closely related taxa and others that are unique to that lineage. For example, the most recent Alu elements to mobilize in our own genome belong to a series of AluY subfamilies (AluYb8, AluYa5a2, etc.) that are exclusively or primarily specific to the human branch of the primate tree (24,25,26,27).
As genetic markers, retrotransposons of all sorts offer certain advantages over more commonly used genetic characters such as microsatellites and sequence data. First and foremost is the observation that these markers are an essentially homoplasy-free set of characters (28,29,30,31,32). Unlike many other genetic markers, they tend to exist as character states for no other reason than inheritance from a common ancestor. Thus, they are almost invariably identical by descent, not just identical by state. As a result, they can be used to provide an extremely accurate picture of evolutionary and population relationships (33,34,35,36,37,38,39). We also know that the ancestral state at any locus is the absence of the element and once the element is inserted it typically remains there indefinitely. These characteristics result in a relatively simple evolutionary model to be applied when interpreting the data.
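The simple evolutionary model described above (absence is ancestral, presence is derived and essentially irreversible) can be illustrated with a small presence/absence matrix. The following is a minimal sketch only: the taxon names, locus names, and genotype calls are hypothetical and are not data from this chapter.

```python
# Hypothetical presence/absence calls: 1 = insertion present, 0 = absent.
# Taxon and locus names are illustrative only.
matrix = {
    "taxonA": {"locus1": 1, "locus2": 1, "locus3": 0},
    "taxonB": {"locus1": 1, "locus2": 1, "locus3": 0},
    "taxonC": {"locus1": 1, "locus2": 0, "locus3": 1},
    "taxonD": {"locus1": 0, "locus2": 0, "locus3": 0},
}


def carriers(matrix, locus):
    """Return the set of taxa that carry the insertion at a locus."""
    return frozenset(t for t, calls in matrix.items() if calls[locus] == 1)


def informative_groups(matrix):
    """Map each group of taxa to the loci whose insertion they share.

    Because absence is assumed to be ancestral, a shared insertion is
    treated as a shared derived character. Loci present in fewer than
    two taxa, or in every taxon, group nothing and are skipped.
    """
    loci = next(iter(matrix.values())).keys()
    groups = {}
    for locus in loci:
        group = carriers(matrix, locus)
        if 2 <= len(group) < len(matrix):
            groups.setdefault(group, []).append(locus)
    return groups


for group, loci in informative_groups(matrix).items():
    print(sorted(group), "share insertions at", loci)
```

In this toy matrix, locus1 groups taxa A, B and C, locus2 groups A and B within that clade, and locus3 is uninformative because it occurs in a single taxon.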
SINEs and other retrotransposons share other desirable characteristics as well. The vast majority of mobile element insertions in the host genome are neutral residents (40). The process of genotyping individuals to determine insertion presence or absence at any given set of loci is a relatively simple task involving easily distinguishable fragments on a simple agarose gel stained with ethidium bromide. Multiplexing of loci is possible (41,42) and fluorescently labeled primers may also be used if one is interested in automated analysis (43). These features make the analysis of Alu elements a robust tool for tracking human geographic origins.
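Genotype calling from gel fragments, as described above, amounts to asking whether the filled-site band, the empty-site band, or both are present. The sketch below assumes a hypothetical locus with a 150 bp empty-site product and a 450 bp filled-site product; the sizes and the 20 bp tolerance are illustrative choices, not values from this chapter.

```python
def call_genotype(bands_bp, empty_bp, filled_bp, tolerance_bp=20):
    """Call an insertion genotype from observed band sizes (in bp).

    empty_bp is the expected product without the insertion and
    filled_bp the expected product with it. Both expected sizes and
    the tolerance are supplied by the user for each locus.
    """
    has_empty = any(abs(b - empty_bp) <= tolerance_bp for b in bands_bp)
    has_filled = any(abs(b - filled_bp) <= tolerance_bp for b in bands_bp)
    if has_filled and has_empty:
        return "+/-"   # heterozygous for the insertion
    if has_filled:
        return "+/+"   # homozygous present
    if has_empty:
        return "-/-"   # homozygous absent
    return "no call"   # no expected band seen (e.g., failed PCR)


# Hypothetical locus: empty site 150 bp, filled site 450 bp.
for bands in ([455], [148, 452], [151], []):
    print(bands, call_genotype(bands, empty_bp=150, filled_bp=450))
```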
In the following pages we will describe several techniques that have been used to investigate aspects of primate biology and mobile element biology in the primate lineage. We will not only focus on the human lineage but will also mention some techniques that are widely applicable to other taxa, especially non-human primates. We will describe the laboratory techniques required to investigate questions from the fields of forensics, population biology, phylogenetics, genome evolution, and the biology of the elements themselves. One advantage primate researchers have over those working on many other taxa is the availability of a variety of primate genome sequences to serve as a resource and reference in their work. Many of the laboratory techniques to be described benefit from the availability of these sequences and we suggest the reader consult the companion chapter in this volume dedicated to computational analysis of primate/human mobile elements.
Many of the unique properties of mobile elements make them ideally suited for a variety of forensic applications. This section will focus on the Alu element, the most abundant class of SINE in the human genome. Most Alu elements have become permanent residents of the human genome and are “fixed present”, meaning that all individuals are homozygous for the insertion at a particular locus. The continued expansion of Alu elements throughout primate evolution has created several recently integrated “young” subfamilies that are present in the human genome, but largely absent from nonhuman primates (24,25,27,44,45). Some members of these young Alu subfamilies have inserted in the human genome recently enough that individuals remain polymorphic for the insertion presence/absence. Both fixed and polymorphic Alu elements have been utilized successfully as robust forensic tools.
Forensic DNA analysis typically begins with the quantitation of human-specific DNA obtained from the biological sample. This is essential to determine the most appropriate autosomal and Y chromosome analysis strategies to perform (46). Highly sensitive methods for quantitation of human DNA based on Alu elements have been reported (46,47,48,49,50,51,52,53). These methods take advantage of the high copy number of fixed Alu elements in the human genome to maximize sensitivity. Human DNA quantitation based on Alu elements is evolving as the preferred method in the forensic community (50). The method described in this chapter utilizes a subfamily of Alu elements, enriched in the human genome as compared to other primate species, to maximize human specificity (52,53).
Another important forensic use for Alu elements is human gender identification (54). Fixed Alu insertions on either the X or the Y chromosome provide a simple and reliable system to identify the sex chromosomes. The AluSTXa and AluSTYa loci demonstrate 100% accuracy in X and Y chromosome identification. The combination of these two markers provides added assurance that gender identification results are accurate since two completely independent mutations would have to occur to affect the outcome.
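The value of requiring two independent markers to agree can be sketched as follows. This is a hedged illustration, not the published assay: it assumes that the X-linked insertion yields a filled-site band from the X chromosome and an empty-site band from the Y homolog, and that the Y-linked insertion yields a filled-site band from the Y and an empty-site band from the X homolog. The function name and band labels are hypothetical.

```python
# Assumed band patterns (an assumption of this sketch, see lead-in):
#   X-linked locus: female = filled only; male = filled + empty
#   Y-linked locus: female = empty only;  male = filled + empty
X_LOCUS_PATTERNS = {
    frozenset({"filled"}): "female",
    frozenset({"filled", "empty"}): "male",
}
Y_LOCUS_PATTERNS = {
    frozenset({"empty"}): "female",
    frozenset({"filled", "empty"}): "male",
}


def call_sex(x_locus_bands, y_locus_bands):
    """Combine two independent sex-chromosome loci into one call.

    Each argument is an iterable of band labels ("filled", "empty").
    A call is made only when both loci give an interpretable pattern
    and the two patterns agree.
    """
    x_call = X_LOCUS_PATTERNS.get(frozenset(x_locus_bands))
    y_call = Y_LOCUS_PATTERNS.get(frozenset(y_locus_bands))
    if x_call is None or y_call is None:
        return "no call"
    if x_call != y_call:
        return "discordant"
    return x_call


print(call_sex({"filled", "empty"}, {"filled", "empty"}))
print(call_sex({"filled"}, {"empty"}))
print(call_sex({"filled"}, {"filled", "empty"}))
```

Reporting "discordant" rather than forcing a call reflects the point made above: a single mutation at one locus should be flagged, not silently accepted.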
When one thinks about forensic DNA analysis, what typically comes to mind is obtaining a “match” between a crime scene DNA sample and an alleged criminal suspect, thus “solving the case.” Frequently, however, tools that narrow the potential pool of suspects are essential precursors to a positive identification. The inferred ancestral origin of a DNA specimen is one type of predictor evidence that can advance a criminal investigation (55). Polymorphic Alu insertions have been widely used to study human genetic variation in world populations (6,56,57,58,59,60).
One of the most productive areas of mobile element application has been in the arena of phylogenetic inference. Numerous difficult questions regarding the evolutionary history of the primate lineage have been successfully addressed using Alu elements as tools. For example, Salem et al. (37) confidently resolved the human-chimpanzee-gorilla trichotomy and Ray et al. (38) successfully determined the controversial branching order of three families of platyrrhine (New World) primates. Utilizing retrotransposons as phylogenetic markers has been described a number of times. However, phylogenetic analysis of the primate lineage is unique due to the existence of several ‘reference’ genomes. The human (1), chimpanzee (61), and macaque (62) genomes have been released and the marmoset and orang-utan genomes will likely be released in the near future. These genomes provide a valuable resource in determining potentially informative insertions and primer design.
One important consequence of the hierarchical accumulation of retrotransposons in the genome is the ability to target subfamilies of the retrotransposon family that were active during the evolutionary period of interest. For example, if a researcher’s interest is in the recent evolutionary history of tamarins, he or she would want to focus on elements belonging to the AluTa subfamilies instead of AluY, AluS or AluJ, because all of the latter families were either inactive during that period or never proliferated in that lineage. AluTa, on the other hand, has been active in the tamarin lineage over the last 15-20 million years and many informative insertions will likely be present. Methods described in the companion chapter on computational analysis can aid researchers in determining the sequences that should be targeted for any particular question.
In laboratories dealing with primate genetics, it is critical that researchers be sure that they are handling DNA from the appropriate taxon. For instance, very often researchers collect or receive DNA that was collected in a ‘non-invasive’ manner (i.e. ‘divorced’ tissues such as hair or feces) (63,64,65). This is especially true during investigations of the illegal wildlife trade and identification of seized products (64,66,67). Even when laboratories produce their own ‘in-house’ genomic DNA via cell culture, cross-contamination can occur among cell cultures and within concurrent large-scale DNA extractions from multiple species. Furthermore, simple mishandling of well documented samples may result in the loss of their labels. Future analyses based on these mistaken identities can be compromised. We will review an Alu-based dichotomous key for the resolution of primate sample identity for researchers in this area.
Among mobile elements, retrotransposons (e.g., L1, Alu, and SVA elements) are major endogenous contributors to the creation of structural variation in primate genomes. The tempo and mode of their amplification during the primate radiation have been shown to be lineage-specific events and thus retrotransposons have had an extensive impact on the evolutionary history of different primate lineages through shaping of their genomic landscape (1,61,68,69,70). Computational analyses of genomic sequence, along with the use of newly developed cell culture assays, suggest that the overall contribution of retrotransposon-mediated genomic variation involves not only the initial integration event but also a variety of recombination events occurring after that integration (e.g., Alu retrotransposition-mediated deletions, L1 insertion-mediated deletion, and Alu recombination-mediated deletions) (68,71,72,73).
Completion of the human and chimpanzee reference genomes allowed whole-genome comparison studies of L1 and Alu insertion-mediated variation in these primate lineages. The results showed that 24 (~1.3%) of the total ~1800 human-specific L1 insertions are involved in genomic deletions and are directly responsible for the loss of ~18 kb from the human genome (72), whereas, only ~0.2% of human-specific Alu insertions are involved in genomic deletions and are responsible for the loss of ~9 kb from the human genome (71). Post-insertion recombination events, however, were shown to have greater genomic impact. Sen et al. (73) identified 492 Alu recombination-mediated genomic deletions which resulted in the loss of ~400 kb of human genomic sequence, and ~60% of these deletions are involved in known or predicted genes. Three events actually deleted functional exons from human genes as compared to orthologous chimpanzee genes (73).
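The proportions quoted above can be checked with simple arithmetic. The sketch below uses only the approximate figures given in the text (24 of ~1800 L1 insertions, ~18 kb lost; 492 Alu recombination events, ~400 kb lost). The mean deletion sizes it derives are back-of-the-envelope values implied by those rounded totals, not figures reported in the cited studies.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def mean_bp(total_bp, n_events):
    """Return the mean number of base pairs lost per event."""
    return total_bp / n_events


# L1 insertion-mediated deletions (figures from the text, approximate).
l1_share = percent(24, 1800)           # about 1.3% of human-specific L1s
l1_mean = mean_bp(18_000, 24)          # implied mean loss per event

# Alu recombination-mediated deletions (figures from the text, approximate).
alu_rec_mean = mean_bp(400_000, 492)   # implied mean loss per event

print(f"L1 insertions with deletions: {l1_share:.1f}%")
print(f"Implied mean L1-mediated deletion: {l1_mean:.0f} bp")
print(f"Implied mean Alu recombination deletion: {alu_rec_mean:.0f} bp")
```

The comparison makes the point in the text concrete: the per-event losses are of similar magnitude, and the larger total for recombination-mediated deletions comes mainly from the much larger number of events.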
Genome alignment studies such as these have helped us to understand the distribution of retrotransposons and provide insight into their impact on host genomes, but tell us little about their mobilization. It is the development of in vitro cell culture-based assays that has allowed us to study the mobilization dynamics of retrotransposons. A companion chapter in this volume is dedicated to computational methods for the analysis of primate/human mobile elements. Therefore, in this section we will focus on methods that utilize recently developed cell culture assays to study retrotransposition events and consider their genomic impact in cultured human cells.
The transient cultured cell retrotransposition assay was developed by Moran and his colleagues (74,75). L1.2A was isolated as a potential progenitor of a disease-producing L1 insertion into the factor VIII gene of patient JH-27 (hemophilia A) (76). To investigate whether L1.2A can retrotranspose autonomously, the sequence was cloned and subcloned into a pCEP4 expression vector carrying an mneoI reporter cassette, which comprises an antisense copy of a neo selectable marker, the heterologous SV40 promoter, and a polyadenylation sequence. The neo gene is disrupted by an intron in the opposite transcriptional orientation (74). This genetic system can reveal L1 retrotransposition in cultured cell lines and helps to estimate the frequency of autonomous L1 retrotransposition. Building on this work, 82 of the 89 L1s with intact ORFs in the human genome were cloned, and the retrotransposition capability of each was assessed in cultured human 143B TK- osteosarcoma cells (77). Moreover, the characterization of new daughter L1 inserts generated by synthetic retrotransposition-competent L1s in cultured human cells demonstrated that L1 retrotransposition events cause genomic instability such as deletions, duplications, translocations, and intra-L1 rearrangements (78,79,80) and have the potential to provide the host genome with new gene families through L1-mediated transduction (74,81).
Through the L1-mediated Alu retrotransposition assay, the retrotransposed Alu elements and their flanking sequences were investigated to confirm that Alu elements are indeed mobilized in trans by the L1 enzymatic machinery. The new daughter Alu inserts derived from a neoTet-marked Alu construct were intact, without deletions. Their pre-insertion sites were predominantly close to the L1 endonuclease cleavage site consensus (TT^AAAA), and each Alu insert was flanked by target site duplications (TSDs), one hallmark of authentic Alu retrotransposition, generated by the target primed reverse transcription process (10,17,82). Notably, of the L1-encoded proteins, only ORF2p (with its endonuclease and reverse transcriptase domains) is essential for Alu retrotransposition (14).
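How close a pre-insertion site is to the TT^AAAA consensus can be expressed as a mismatch count. The sketch below is one simple way to do this, assuming the flanking sequence is already oriented to match the consensus as written; it does not check the reverse complement, and the example sequences are invented for illustration.

```python
CONSENSUS = "TTAAAA"  # L1 endonuclease consensus, written without the cut mark


def mismatches(site, consensus=CONSENSUS):
    """Count positions at which a site differs from the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(a != b for a, b in zip(site.upper(), consensus))


def best_match(window, consensus=CONSENSUS):
    """Find the closest match to the consensus within a sequence window.

    Returns (offset, hexamer, mismatch_count) for the leftmost window
    position with the fewest mismatches, or None if the window is
    shorter than the consensus.
    """
    k = len(consensus)
    window = window.upper()
    candidates = [
        (mismatches(window[i:i + k], consensus), i, window[i:i + k])
        for i in range(len(window) - k + 1)
    ]
    if not candidates:
        return None
    n, i, site = min(candidates)
    return i, site, n


# Invented example sequence spanning a hypothetical pre-insertion site.
print(best_match("GGCTTAAAAGC"))
print(mismatches("TCAAGA"))
```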
That L1 retrotransposition can create genomic deletions in the human genome was revealed by combining L1 retrotransposition assays in cultured human cells with the plasmid-based rescue technique (see Subheading 3.3.3.). These experiments showed that ~20% of de novo L1 insertions recovered through cultured cell retrotransposition assays caused genomic deletions at the integration site, and that the DNA sequences deleted through these events ranged up to 71 kb in size (78,79,80).
The enormous difference in genomic variation observed between in vitro and in vivo forms of investigation could be caused by evolutionary forces (e.g., selection pressure, the number of retrotransposition-competent L1s, and effective population size) and host defense mechanisms (e.g., RNAi, APOBEC, and methylation).
Using knowledge of subfamily diagnostic sites, it is a relatively simple task to design primers to experimentally mine a genome with reference to a full genome sequence. The key to this process is to ensure that the diagnostic sites that define the subfamily of interest are well-represented in the primer to be used. Furthermore, it would be advantageous to have the most 3′ base be unique to the subfamily targeted. Several software packages exist to aid in this process. Identification of potentially informative loci involves the generation of ‘half-sites’ from the genomes of interest. Specifically, a linker ligation protocol first suggested by Munroe et al. (93) and refined by Roy et al. (94) and Ray et al. (38) (Fig. 4) is used to clone the sequences neighboring one side of an insertion. This process involves the digestion of genomic DNA in such a way that an overhang is produced. That overhang is matched to a set of annealed linkers, which are ligated to the digested genome fragments.
If you have some information on the consensus sequence of the Alu subfamily you are targeting, use an alignment of that subfamily consensus and other Alu subfamilies to design primers to be used. Primers should be as specific as possible to the subfamily of interest and preferably end with a subfamily specific base. Standard primer design criteria regarding length, GC content and annealing temperature should be considered (see Notes 4.3 and 4.4).
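The primer design advice above can be sketched in code: find alignment columns where the target subfamily differs from every other subfamily, then check whether a candidate primer ends on one of them. The sequences below are short invented strings, not real Alu consensus sequences. The melting temperature uses the Wallace rule of thumb (2 °C per A/T, 4 °C per G/C), which is only a rough guide for short oligonucleotides; dedicated primer design software should be preferred in practice.

```python
def diagnostic_positions(target, others):
    """Return 0-based alignment columns unique to the target consensus.

    target and others must be pre-aligned and of equal length. A column
    is diagnostic when the target base differs from the base in every
    other sequence. Columns where the target has a gap are skipped.
    """
    if any(len(o) != len(target) for o in others):
        raise ValueError("all aligned sequences must have the same length")
    return [
        i for i, base in enumerate(target)
        if base != "-" and all(o[i] != base for o in others)
    ]


def gc_content(primer):
    """Return the fraction of G and C bases in a primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Estimate Tm in degrees C with the Wallace rule (rough guide only)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def three_prime_is_diagnostic(start, primer_len, diag_positions):
    """Check whether a forward primer's last base sits on a diagnostic column.

    Assumes the target has no gaps within the primer, so alignment
    columns and primer positions coincide.
    """
    return (start + primer_len - 1) in diag_positions


# Invented toy alignment (not real Alu consensus sequences).
target = "ACGTTGCA"
others = ["ACGTAGCA", "ACCTAGCA"]
diag = diagnostic_positions(target, others)
print(diag, three_prime_is_diagnostic(0, 5, diag))
print(gc_content(target), wallace_tm(target))
```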
The most productive hits are single, high-scoring hits from the genome of interest in which the flanking sequence is unique. When these are encountered, BLAT can be used to expand the coverage of the genome region to determine two important pieces of information. First, you can immediately discover whether the insertion you recovered from the genome of interest is present in the reference genome. This in itself may be useful information in resolving your phylogeny. Second, you can identify the opposing flanking sequence of the SINE insertion in the reference genome. Using the opposing flank and the flanking sequence from the genome of interest, oligonucleotide primers can be designed using standard methodologies. Primer design should take into account the potential presence of other mobile elements in the flanks. These should be avoided as priming sites for reasons stated above.
The pre-integration site of a de novo L1 insert can be identified by using BLAT (e.g., against hg18; Mar. 2006 freeze) to search with each of the upstream and downstream flanking sequences obtained from the rescue procedure above. Obtaining the pre-integration sequence makes additional analyses possible, such as examination of endonuclease cleavage sites, TSD structures, and target sequence alterations derived from L1 retrotransposition.
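Once both flanks have been placed on the reference, the junction can be interpreted from their coordinates. The sketch below assumes that both flanks align to the same strand of the pre-integration sequence and uses 1-based inclusive coordinates: overlapping flanks indicate a TSD, abutting flanks a blunt insertion, and a gap between them a deletion of target sequence. The function names and example coordinates are hypothetical.

```python
def classify_junction(up_end, down_start):
    """Classify an insertion junction from reference coordinates.

    up_end is the last reference base matched by the upstream flank and
    down_start the first reference base matched by the downstream flank
    (both 1-based). Returns a (label, length_bp) tuple.
    """
    overlap = up_end - down_start + 1
    if overlap > 0:
        return ("TSD", overlap)        # flanks share duplicated target sequence
    if overlap == 0:
        return ("blunt", 0)            # flanks abut exactly
    return ("deletion", -overlap)      # reference bases missing between flanks


def tsd_sequence(reference, up_end, down_start):
    """Return the duplicated target sequence, or '' if there is none."""
    label, _ = classify_junction(up_end, down_start)
    if label != "TSD":
        return ""
    return reference[down_start - 1:up_end]


# Hypothetical coordinates for three different inserts.
print(classify_junction(1000, 986))
print(classify_junction(1000, 1001))
print(classify_junction(1000, 1501))
print(tsd_sequence("AAACCCGGGTTT", up_end=9, down_start=7))
```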
Our research is supported by National Science Foundation BCS-0218338 (MAB) and EPS-0346411 (MAB), National Institutes of Health RO1 GM59290 (MAB) and PO1 AG022064 (MAB), and the State of Louisiana Board of Regents Support Fund (MAB). DAR is supported by the Eberly College of Arts and Sciences at West Virginia State University.