Transposable elements are an important source of genome diversity and have a crucial role in genome evolution. A recent study by Zhao et al. describes novel patterns of transposable element (TE) diversification in the genome of the extinct mammoth, Mammuthus primigenius. Analysis of Mammuthus has revealed a unique genome landscape in a species pivotal for understanding TEs and genome evolution, and it hints at the diversity we are on the verge of discovering by expanding our taxonomic sampling among genomes. Strategies based on the work might also revolutionize investigations of the interface between TE dynamics and genome diversity.
Transposable elements (TEs, Box 1) have had substantial impacts on eukaryotic genomes throughout life’s history, being responsible either directly or indirectly for much of the genomic diversity we see today. Not surprisingly, studies of TE impacts on human and non-human primate genomes are numerous and well developed. We know for example how the movement of TEs has impacted human diseases , genome size [2–8], and the transcriptome [9–11]. But how well does our little corner of the genomic world reflect TE diversity and impact in a more general sense? The broader mammalian perspective is only now being investigated, and although we are starting to answer this question (see Ref  for an example), many gaps in our knowledge remain.
Transposable elements are repetitive DNA sequences that accumulate in genomes via multiple mechanisms. Class I elements (a), the retrotransposons, utilize a 'copy and paste' method referred to as retrotransposition. With these elements, the original DNA copy in the genome is first transcribed to mRNA. This transcript is often used as a template by reverse transcriptase to form a DNA molecule that is then inserted into a new location in the genome via a process known as target-primed reverse transcription. There are several subgroups of Class I elements including the LTR (long terminal repeat) and non-LTR retrotransposons (including the Long INterspersed Elements, LINEs). Autonomous elements within these categories encode much of the enzymatic machinery required for mobilization. In the case of LINE/RTE, the principal Class I element described by Zhao et al., one or two open reading frames (ORFs) provide endonuclease (EN) and reverse transcriptase (RT) activity. LTR element structure differs from LINEs in the identity and organization of ORFs. Often, they include a virus-like integrase (IN) coding sequence, and do not contain a tract of direct repeats (DR).
Class II elements (b), the DNA transposons and their derivatives, are common in many organisms from bacteria to humans. First discovered by Barbara McClintock in maize, DNA transposons differ from Class I elements in that they often utilize a 'cut and paste' mechanism. In other words, the entire DNA segment of autonomous elements is excised from where it resides and is reinserted into the genome at a different location. This is accomplished via an encoded transposase. Surrounding the transposase open reading frame are 5′ and 3′ untranslated sequences and the terminal inverted repeats (TIRs), which are the recognition target of the transposase during mobilization. The Tigger1 transposon was mentioned in Zhao et al. as potentially recently active in the mammoth.
Non-autonomous elements exist within both classes. These typically short elements rely on an autonomous partner to provide the enzymatic machinery for their mobilization. Two SINEs (Short INterspersed Elements), common non-autonomous partners of LINEs, were discussed in detail by Zhao et al.: AfroSINE and AfroLA. These tRNA-derived SINEs are similar in structure but differ in their temporal distribution, with AfroSINEs spreading throughout the genome much earlier than AfroLA. Non-autonomous DNA transposons (various MERs) were also recovered but appear to be ancient and inactive. In all cases, the arrows represent target site duplications (TSDs) generated on insertion.
Recently, Zhao et al. applied next-generation sequencing (454) to address the question in a unique way: by investigating TE amplification dynamics in the woolly mammoth (Mammuthus primigenius), a species that has been extinct for ~10 000 years. Using the massive data available from the mammoth genome project, they determined likely TE content using an iterative process to identify relatives of known TE families. It was immediately clear that the sheer volume of TEs within the mammoth genome sets it apart from other mammals. The uniqueness of the mammoth genome can be seen in several TE-related areas, including genomic expansion, TE diversity, and a likely case of horizontal transfer of a Class I TE. Such observations hint at tremendous potential for finding a vast array of TE-associated diversity in mammals as their genomes are explored. Zhao et al. also demonstrate the impressive potential of next-generation sequencing with regard to subgenomic analysis. The increased throughput provided by platforms such as the 454, Illumina, and SOLiD systems has revolutionized numerous aspects of biological analysis, from gene discovery to expression profiling and whole-genome sequencing. In particular, the combination of high-throughput DNA sequencing and the repetitive nature of TEs makes certain next-generation sequencing platforms ideal for rapid and inexpensive genome-wide TE analyses. Here we suggest how researchers can use next-generation sequencing approaches to implement broad genome-wide surveys of TE content and discuss the likely impact this will have on our understanding of the wider role of TEs. Such strategies could radically alter our ability to investigate and understand the complex interface between TE amplification dynamics and genome diversification.
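The idea of iteratively identifying relatives of known TE families can be illustrated with a toy sketch. This is not the pipeline used by Zhao et al., whose details are not given here; it only shows the general logic of growing a reference library from seed sequences. The function names, the k-mer matching criterion, the parameters `k` and `min_shared`, and the example sequences are all hypothetical, and a real analysis would rely on alignment-based tools and would consider both DNA strands.

```python
from typing import List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def iterative_recruit(seeds: List[str], reads: List[str],
                      k: int = 8, min_shared: int = 2) -> List[int]:
    """Recruit reads that share k-mers with a growing reference library.

    Toy assumption: a read is a 'relative' of a known family if it shares
    at least min_shared k-mers with the library. Each recruited read
    contributes its own k-mers, so more divergent relatives can be found
    in later passes. Passes repeat until no new read is recruited.
    """
    library: Set[str] = set()
    for seed in seeds:
        library |= kmers(seed, k)

    recruited: Set[int] = set()
    changed = True
    while changed:
        changed = False
        for i, read in enumerate(reads):
            if i in recruited:
                continue
            read_kmers = kmers(read, k)
            if len(read_kmers & library) >= min_shared:
                recruited.add(i)
                library |= read_kmers
                changed = True
    return sorted(recruited)


# Hypothetical sequences: reads[1] overlaps the seed directly, reads[0]
# overlaps only reads[1], and reads[2] is unrelated.
seed = "AAACCCGGGTTT"
reads = ["ACGATCGATTGCAT", "CCGGGTTTACGATCGA", "TGTGTGTGTGTG"]
hits = iterative_recruit([seed], reads, k=4, min_shared=2)
print(hits)
```

In this example the first read is missed on the first pass and recruited only on the second, after the library has been extended, which is the point of iterating.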
The mammoth genome contains a greater proportion of TEs than any mammal analyzed to date. This led Zhao et al. to highlight the potential connection between the increased genome size (~50% larger than our own [14, 15]) and rapid expansions of particular TEs. Increased genome size has long been considered a potential consequence of TE expansion, and many mammals appear to have accommodated massive TE-mediated genome expansions, whereas certain animals (e.g. birds, reptiles and some fish) have had a tendency to eliminate them [17–21]. For example, analysis of the recently sequenced genome of Anolis carolinensis revealed that although these lizards have several recently active lineages of LINEs, they have essentially reached equilibrium between TE insertion and removal over the past several million years. Observations of the mammoth genome, along with many other comparisons between mammalian and non-mammalian taxa, suggest that we and our hair-bearing relatives share a unique ability to accommodate some TE expansions while repelling others. Several hypotheses advanced to explain this observation include the utilization of DNA methylation as a control mechanism [22–24], decreased ectopic inter-element recombination [21, 25], and increased permissiveness for some families to allow for inter-element competition and selection.
It is important to note, however, that the mammoth genome expansion was probably not the result of the common mammalian LINE, L1, alone, but instead appears to be the result of both L1 and a nearly parallel expansion of an RTE element (RTE being one of eleven well-defined lineages of LINEs). Zhao et al. found that as much as 12% of the mammoth genome consists of RTEs, whereas the mammal with the next highest RTE proportion is the opossum at a mere 2.3%. Thus, the mammoth is the first eutherian genome characterized to have accommodated multiple simultaneous LINE expansions. Most other TE types (SINEs, LTRs, LINEs (L1 and L2), and DNA transposons) in mammoth DNA were on par with or in fewer numbers than in other mammalian genomes. Significantly, RTE elements are absent in armadillo, cetartiodactyls, primates, carnivores and rodents but are found in ruminants and at least two afrotherian clades, tenrec and the modern elephant (Figure 1). This distribution of RTEs lends support to the idea that each repeat is a unique genomic invasion by a lineage of LINEs with an unprecedented ability to spread via horizontal transfer [28, 30, 31].
Although not the main focus of the study, Zhao et al. also noted relatively recent activity by Tigger1 (a Class II element or DNA transposon) in the mammoth genome. With the exception of a single bat genome that has experienced multiple massive waves of recent DNA transposon activity [32–34], many mammals including M. primigenius have repelled DNA transposons rather successfully. However, this Tigger1 expansion adds to a growing number of studies suggesting that there are rare occasions in which mammalian genomes are impacted by single Class II lineage expansions [35, 36] (Ray et al., unpublished). The increasing number of these isolated instances demonstrates that a general shutdown of mammalian Class II TEs, as suggested by numerous studies [37, 38], has, at the very least, been subverted by selected elements via multiple instances of horizontal transfer into some mammalian genomes.
These observations suggest that our focus as researchers should now be to answer several likely interrelated questions. What is the mechanism of these horizontal TE transfers, both Class I and Class II? What makes some genomes more susceptible to and/or tolerant of TEs and novel genomic invasions by TEs? Why are some TE families, like RTE, better able to 'jump' between genomes than others? It is clear that the answers to these questions will not be found in the relatively limited sampling of genomes currently available to us. Instead, extensive study of a wide variety of mammalian and non-mammalian genomes will be necessary to answer any one of these questions. Fortunately, the data presented by Zhao et al. provide a start to the process by expanding our knowledge of TE dynamics to a taxonomically important mammalian lineage, Afrotheria.
For obvious reasons, most genomic sequencing and analyses tend to be focused on biomedical model species, such as Rattus, Mus, and various non-human primates [38–41]. However, a thorough knowledge of mammalian TE dynamics is only possible when appropriate outgroups are also examined. One of the more important aspects of the M. primigenius analysis is that the authors chose to study a unique and important mammalian lineage that serves as a basal group within Class Mammalia. After the metatherian–eutherian divergence ~105 MYA, it is estimated that the earliest diverging clade of extant mammals, Atlantogenata, arose quickly (~103 MYA), almost twenty million years before the next major divergence, Laurasiatheria. Afrotheria, a member of Atlantogenata, represents one of the earliest diversifications within that clade (Figure 1). As a result, Zhao et al.'s analysis of M. primigenius is an important contribution to the study of genome dynamics associated with the prototherian, metatherian, and eutherian diversifications.
Of broader significance, Zhao et al. have successfully harnessed the power of next-generation sequencing to target a particular genome component. Indeed, the power of next-generation sequencing for understanding the evolutionary patterns of repetitive sequences and their impact on genome evolution has recently been shown in work on the pea (Pisum sativum) genome and the soybean genome. The use of 454 sequencing technology seems particularly well suited to genome-wide TE analysis. As seen from Zhao et al.'s analysis of the mammoth genome, the relatively long average read lengths of 454 technology (~250 bp for FLX chemistry and 400 bp or more for Titanium chemistry) enabled the identification of full insertions of smaller, non-autonomous elements, such as SINEs and MITEs (Miniature Inverted-repeat Transposable Elements): a task that would be difficult for other technologies (Illumina and SOLiD) that have higher throughput but considerably shorter reads, on the order of 75+ bp. It is now clear that in complex genomes (e.g. the plants examined by Macas et al. and Swaminathan et al., and now, the extinct M. primigenius), TE content can be accurately surveyed using the random genomic fragments targeted by 454 pyrosequencing.
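The read-length argument can be made concrete with a rough calculation. The sketch below asks: of all randomly placed reads that overlap a given TE insertion by at least one base, what fraction span the entire element? It assumes read start positions are uniformly distributed and ignores any need for flanking sequence. The 150 bp element length is a hypothetical value chosen for illustration, not a figure from Zhao et al., and the function name is ours. The read lengths (75, 250 and 400 bp) are those quoted above.

```python
def full_containment_fraction(read_bp: int, element_bp: int) -> float:
    """Fraction of element-overlapping reads that contain the whole element.

    For a read of length L and an element of length E, there are L + E - 1
    read start positions that overlap the element by at least one base, and
    L - E + 1 start positions that cover it completely (none if L < E).
    """
    if read_bp < element_bp:
        return 0.0
    return (read_bp - element_bp + 1) / (read_bp + element_bp - 1)


ELEMENT_BP = 150  # hypothetical short non-autonomous element

for label, read_bp in [("short-read, 75 bp", 75),
                       ("454 FLX, 250 bp", 250),
                       ("454 Titanium, 400 bp", 400)]:
    frac = full_containment_fraction(read_bp, ELEMENT_BP)
    print(f"{label}: {frac:.1%} of overlapping reads span the element")
```

Under these assumptions a 75 bp read can never contain the full element, whereas a substantial share of 250 bp and 400 bp reads can, which is consistent with the qualitative point made in the text.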
What is the best way to use the sequencing tools available? For example, can we examine multiple taxa in a single 454 run? We suspect that this might be an efficient strategy. Using the older FLX chemistry, Macas et al. obtained ~33 Mb of data from the 4.3 Gb pea genome (~0.77% of the genome), with the data consisting of 319 402 reads with an average length of 104 bp. Using these data, they identified what are likely to be all of the major TE families in the genome. The new Titanium 454 chemistry from Roche promises one million reads averaging 400 bp, or 0.4 Gb of data. Assuming an average mammalian genome size of 3.3 Gb (http://www.genomesize.com), a single Titanium run would provide a researcher with a random sample of 12% of a single genome, much more than is necessary for surveying mammalian TEs. It would therefore be more cost effective and efficient for a researcher to subdivide the run among 10 taxa (1.2% of the genome for each), thus generating the data for a taxonomic survey of TE dynamics in a cluster of taxa rather than just one. As a result, large-scale genome-wide comparisons and investigations of TE dynamics in a broad range of taxa would become the norm rather than the exception.
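The sampling arithmetic above can be checked with a few lines of code. All input figures are those quoted in the text; the function names are ours, and the per-taxon calculation assumes that the ten taxa have equal genome sizes and are pooled in equal proportions, which a real run would only approximate.

```python
def sampled_fraction(n_reads: int, mean_read_bp: float, genome_bp: float) -> float:
    """Fraction of a genome represented by a random shotgun sample."""
    return n_reads * mean_read_bp / genome_bp


def per_taxon_fraction(n_reads: int, mean_read_bp: float,
                       genome_bp: float, n_taxa: int) -> float:
    """Fraction sampled per genome when one run is split evenly among taxa.

    Assumes equal genome sizes and equal read allocation for every taxon.
    """
    return sampled_fraction(n_reads, mean_read_bp, genome_bp) / n_taxa


# Pea survey (FLX chemistry): 319 402 reads, 104 bp average, 4.3 Gb genome
pea = sampled_fraction(319_402, 104, 4.3e9)

# Projected Titanium run: one million 400 bp reads, 3.3 Gb mammalian genome
titanium_single = sampled_fraction(1_000_000, 400, 3.3e9)
titanium_split = per_taxon_fraction(1_000_000, 400, 3.3e9, n_taxa=10)

print(f"pea survey:             {pea:.2%}")              # 0.77%
print(f"Titanium, one taxon:    {titanium_single:.2%}")  # 12.12%
print(f"Titanium, ten taxa:     {titanium_split:.2%}")   # 1.21%
```

Even after a tenfold split, each taxon would be sampled more deeply (about 1.2%) than the pea genome was in the survey that recovered its major TE families (about 0.77%).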
By providing a thorough analysis of TEs within the mammoth genome, Zhao et al. have advanced the quest to understand TE dynamics in mammalian genomes. Although the mammoth has some typical mammalian genome qualities, such as a relative lack of recent DNA transposon activity, other characteristics, such as the increased activity of RTEs and larger overall size, make this genome unique, further highlighting its significance. The data also point to the potential variety of mammalian TE dynamics that might be just around the corner given our rather limited sampling to date.
Still, several questions remain. For example, how does TE diversity impact species diversity? In some cases, we see a correlation between the two: consider the bat genus Myotis, with its massive Class II TE activity within the same historical period as its world-wide diversification into 100+ species. In other cases, there is no obvious connection: the bat genus Pteropus is nearly as species-rich but appears to have experienced a shutdown of all TE activity. Obviously, mere TE activity is not enough to ensure diversification. Under what ecological and genomic conditions would such activity contribute to adaptive radiations?
What is the mechanism through which the observed instances of horizontal transfer might occur? There must be a vector by which transferred elements are moved from genome to genome. The most obvious place to look would be among the blood-borne parasites and the pathogens (viruses in particular) that they harbor. However, probing random parasite and pathogen genomes would be a rather inefficient methodology.
Depending on taxon selection, both of these questions could be readily addressed. For example, examinations of taxa sharing similar ecological niches but distinct taxonomic distributions and levels of species diversity might be a way to investigate whether TE expansions have impacted species diversity. Alternatively, targeted 454 sequencing of parasites that are shared among organisms known to have participated in horizontal transfer events might yield results of interest to researchers attempting to identify the mechanism of horizontal transfer of elements.
Regardless, this study and others make it clear that we are entering a new phase in genomic research. By utilizing high-throughput genome sequencing and available computational tools efficiently, there is little reason for us not to gather genome-scale data in an effort to investigate the interface among TE dynamics, genome change and species diversification.
D. G. Peterson contributed valuable comments to earlier versions of this manuscript. Transposable element research in the Batzer Laboratory is supported by the Louisiana Board of Regents Governor’s Biotechnology Initiative GBI (2002–005), and National Institutes of Health RO1 GM59290.