Reviewer 1: Eugene V. Koonin, NCBI, NLM, NIH, Bethesda, MD 20894, USA
This is a very straightforward paper that links the fixation of different families of TEs in mammalian genomes with major evolutionary transitions (here described as periods of regulatory innovations). I think the following conclusion that closes the article “it is proposed that regulatory innovations increased the number of small viable subpopulations of individuals with new biological adaptations. New families of TEs became fixed in these subpopulations by genetic drift and preserved in lineages leading to modern mammals” is quite correct. The proposed chain of causation here is: regulatory innovations -> subdivision of the ancestral populations into small subpopulations -> fixation of propagating TEs via drift. I wonder if an alternative interpretation could be at least as plausible: population bottleneck -> regulatory innovations + fixation of TEs via drift? Regardless, the title of the article and some language, especially in the abstract, seem to imply a direct contribution of TEs to the co-temporal regulatory innovations. However, one should avoid falling prey to ‘post hoc ergo propter hoc’: both regulatory innovations and bursts of TEs could be triggered by the same population-genetic factors, and these need not be adaptive.
Author's response: Indeed, regulatory innovations either due to fixation of TEs or other mutations can occur in small (lucky) populations. We elaborated on this point in the revised version (see “Biological innovations and the diversity of repetitive families”). There is also a broader point inspired by this and the last review that deserves closer attention. Specifically, small subpopulations are more likely to go extinct due to harmful mutations than to benefit from advantageous mutations. Nevertheless, the subpopulations that survive can become the treasure troves of pre-screened mutations (beneficial, neutral and slightly harmful), which can be transferred back to the meta-population by crossbreeding. Therefore, the existence of multiple co-temporal sub-populations may be critical for successfully weeding out harmful mutations and passing the remaining ones on for further testing by natural selection (see “Repetitive families, population structure and speciation” in the main text).
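The drift argument in this exchange can be made concrete with the standard diffusion approximation for the fixation probability of a single new mutation in a diploid population. The sketch below is an illustration only and is not part of the reviewed analysis. It assumes that the effective population size equals the census size; the function name and the example values of the population size and the selection coefficient are hypothetical.

```python
from math import expm1


def fixation_probability(n: int, s: float) -> float:
    """Diffusion approximation for a new mutation (initial frequency 1/(2n))
    with selection coefficient s in a diploid population of size n.
    Assumption of this sketch: effective size equals census size."""
    if s == 0.0:
        return 1.0 / (2 * n)  # neutral case: fixation by drift alone
    # (1 - exp(-2s)) / (1 - exp(-4ns)), written with expm1 for accuracy
    return expm1(-2.0 * s) / expm1(-4.0 * n * s)


# Hypothetical slightly harmful insertion in a small and a large population
s = -0.001
for n in (100, 10_000):
    print(n, fixation_probability(n, 0.0), fixation_probability(n, s))
```

With these illustrative values, the slightly harmful insertion fixes in the small population with a probability close to the neutral value, whereas in the large population its fixation probability is many orders of magnitude below the neutral value.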
Reviewer 2: King Jordan, Georgia Institute of Technology, USA
Jurka and colleagues present an analysis of repeat sequences, many of which have been deeply conserved across mammalian evolution. Detailed analysis of such sequences allowed the authors to identify distinct clusters of such repeats, three groups of which show marked expansions/contractions corresponding to previously characterized periods of regulatory innovation. Similar observations have been reported previously and have been taken to suggest that repeat families, transposable element (TE)-derived sequences in particular, may have provided raw material (e.g. novel regulatory sequences) that facilitated regulatory innovations during such periods of evolutionary transition (Oliver and Greene 2009 Bioessays 31:703; Oliver and Greene 2011 Mobile DNA 2: 8). Interestingly, the authors of this work favor a very different explanation for the expansion of repeats at such times. They hold that these periods of regulatory innovation were likely to be characterized by elevated levels of population subdivision, which in turn allowed for the fixation of numerous TE families by genetic drift. In this sense, the abundance of TE families can be considered to be a by-product of the population dynamics of the species whose genomes bear the elements rather than a driving force of evolutionary innovation (Jurka et al. Biol Direct 2011 6:44). This manuscript is therefore in some sense a continuation of an ongoing debate in the literature regarding the global role of TE family expansions and contractions in regulatory innovation, speciation and evolution. The new data the authors bring to bear on the question is a welcome addition to the debate. One thing that is missing from the discussion of the results, however, is a consideration of the evidence in favor of their very different world view compared to the groups who favor a more active role for TEs in driving regulatory innovation and evolution. 
Below, I pose a series of questions regarding the findings and how they relate to this debate.
As an aside, it is worth noting that this paper reports a major addition of consensus sequences for newly identified human and mammalian conserved repeats, all of which have been deposited in Repbase. These consensus sequences represent an important resource for the research community and a new source of information and annotations for the human genome.
Major Points
1. The work of Oliver and Greene on the genomic drive or TE-thrust hypothesis for the role of TEs and regulatory innovation, which has direct relevance to this manuscript as discussed above, is not cited or discussed. Given that this work and the TE-thrust hypothesis articulate two clearly distinct perspectives on the role of TEs, in particular whether they are active agents or passive bystanders, in the processes of regulatory innovation, speciation and evolution, these papers should be cited and discussed in the context of the current work (Oliver and Greene 2009 Bioessays 31:703; Oliver and Greene 2011 Mobile DNA 2: 8). For instance, if the authors disagree with the previous perspectives on an active role for TEs in driving regulatory innovation and speciation, they should state why. Or perhaps, if they feel these two world views are not actually mutually exclusive, this could be discussed. But to ignore the works reporting a conflicting perspective is a mistake in my view.
Author’s response: In the original version of the manuscript we mentioned speciation only in passing as it was discussed in our hypothesis published a year earlier. Clearly, the issue continues to be of substantial interest because it was brought up in this and the last review. We appreciate the comments, and we elaborated on this point in the revised version (see “Repetitive families, population structure and speciation”). TEs have an evolutionary impact due to their mutagenic activities, but they are not the “drivers” of evolution. The fixation of any mutations, including the TE-generated ones, depends to a large extent on the population structure, which in turn is driven by biological innovations and ecological factors. The original “genomic drive hypothesis” and the follow-up paper lack the population genetics perspective.
2. The distributions of conserved repeat densities reported here, e.g. cTE/gTE seen in Figure, are uneven and lead the authors to conclude that there have been expansions and contractions of TE families that correspond to “three previously described periods regulatory innovations in vertebrate genomes.” I wonder if these apparent uneven distributions could be an artifact of the methods used, whereby two extreme sets of repetitive sequences were used in the analysis: ancient repeats found in all mammals and human-specific repeats. What about families of repeats that are found in some, but not all, mammalian species including humans? Would inclusion of such families in the analysis change the distributions seen?
Author’s response: None of the 266 conserved repeats listed in Additional file 2: Table S2 are human-specific. Eutrep families are present only in eutherian mammals and the cTE/gTE ratio for this group is ~0.17. The cTE/gTE ratio for platypus-specific families is ~0.3. Typically, the cTE/gTE ratio increases with the age of the family. The peaks in Figure correspond to multiple families of similar age. Figure and Figure describe essentially the same uneven distributions obtained by two different approaches. We don’t see any room for artifacts.
3. The majority of “conserved repeats” identified here are small fragments that cannot be related to any particular TE family. Do the authors feel that these are likely to be TE-derived as well? If they are not TE-derived how would that impact the conclusions relating the observations reported here to their previous work on the population dynamics of TEs? In other words, is it possible that many of regulatory innovations provided by the repetitive families observed here were not seeded via the fixation of TEs by drift in small populations, but rather by some other process?
Author’s response: The vast majority, if not all, of these fragments are likely to be TE-derived as well. After the paper was submitted, we classified over a dozen conserved repeats as fragments of known TEs preserved in other vertebrates. The successful classification continues as new genomes become available (see also the response to the last reviewer). Even if some of the conserved repeats are not derived from TEs, they are unlikely to end up in our dataset, which was repeatedly filtered over the last five years or so.
4. Related to the point above. If the distributions showing clusters of conserved families in Figure and Figure are broken down into those families of conserved repeats that can be demonstrated to be TE-related versus the others, are similar uneven distributions seen? Do the TE-derived versus the non-TE-derived distributions differ substantially, and if so, what are the implications?
Author’s response: We added Figure A and Figure B showing the distribution of the classified and unclassified repeats. The two peaks continue to be distinguishable. In Figure A the first peak is larger than the second one. This simply illustrates that younger families of TEs are easier to classify than the older ones.
Minor Points
5. The authors state that “Using binomial and chi square statistics we identified families composed of repetitive elements significantly overrepresented in the conserved regions relative to the rest of the genome.” and the analyses reported in the paper are based on those 266 conserved families. However, no data or information on this statistical analysis is provided in the Methods or Results sections. For instance, how significant are the overrepresentations seen? Do they vary widely? Are they different for TE-derived versus non-TE-derived families (see points #3 and #4 above)?
Author’s response: Typically, older repeats are more significantly overrepresented in the conserved regions than the younger ones, due to continuing attrition of the corresponding non-conserved copies over time. We don’t see any other obvious patterns.
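The manuscript's actual test procedure is not described in this exchange. As a rough illustration of the kind of binomial test mentioned in point 5, the sketch below computes a one-sided p-value for the overrepresentation of one repeat family in conserved regions. The function name, the copy counts and the conserved fraction of the genome are hypothetical values chosen for the example, not data from the paper.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing at least k
    copies in conserved regions if copies landed there with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical family: 120 copies in the genome, 30 of them in conserved
# regions, with conserved regions assumed to cover 5% of the genome.
n_copies = 120
n_conserved = 30
conserved_fraction = 0.05

p_value = binomial_upper_tail(n_conserved, n_copies, conserved_fraction)
print(f"one-sided binomial p-value: {p_value:.3g}")
```

Reporting such p-values per family, ideally with a correction for testing many families, would address the reviewer's question of how strongly the overrepresentation varies.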
6. Is it known whether populations of mammals were indeed more subdivided between periods of regulatory innovation as suggested by the authors on pages 9-10?
Author’s response: The data suggest a more subdivided population during the evolutionary innovation periods and a less subdivided one when the periods end (i.e. between the periods). There is no direct way to determine the number of subpopulations that emerged and vanished during the evolutionary history of vertebrates. Based on our original hypothesis, a surge in the number of subpopulations translates into an increase in the number of repetitive families in the genomic fossil record of a particular lineage. The surge is also likely to trigger a parallel surge in the number of new species and lineages, consistent with the punctuated equilibria theory. The analysis of the corresponding speciation patterns is possible based on the geological fossil record, but this goes beyond the scope of the paper.
7. I did not see the Supplemental Materials and Methods cited by the authors in the descriptions of Figure (only Supplemental Tables and Legends).
Author’s response: Corrected. The supplemental Materials and Methods are in the main text.
Reviewer 3: Jürgen Brosius, University of Muenster, Germany
This manuscript describes the detection of novel transposed elements in vertebrate, mammalian and primate lineages and correlates their estimated times of expansion with regulatory innovations in vertebrate genomes. The authors detected ~150 additional "families of TEs" overrepresented in conserved regions of the genome. Some of these elements have similarities to known repeats, but most of them remain unidentified thus far. The majority of the newly described elements are present in genomes in fewer than 150 copies. The "abundance" of some is less than 10 copies. It might be hard to discriminate between bona fide TEs and sequences that amplified via segmental duplications.
Here, the zinc-finger proteins or domains come to mind. This should be discussed and perhaps more information given as to why those elements are not derived from frequent segmental duplications or are merely retropseudogenes of, e.g., tRNAs or other small RNAs.
Author’s response: Based on our experience, the elements that are homologous to known pseudogenes, functional motifs and known TEs are relatively easy to identify. More difficult is to classify some of the LTRs and non-autonomous elements, which are often very diverse even in a single species. However, as indicated in Figures A and B, the observed patterns are quite similar for classified and unclassified families. As stated in the response to the second reviewer, the unclassified pile of conserved repeats continues to shrink as sequences from more vertebrate genomes become available.
Perhaps the authors should spend more effort to bridge the gap between events that lead to the formation of sub-populations, which could happen over as little as a few hundred years or less, and the time frames of the proposed regulatory innovations during separation of the mammalian and bird lineages, the diversification of the mammalian lineages prior to the origin of eutherian lineages or even the diversification of mammals. Here, time spans of a few million up to tens of millions of years are probably involved. It is likely that very few, if any, of the TE inserts are chock-full of pre-assembled, ready-to-use functional gene modules, such as enhancer sequences -- enhancers usually consisting of arrays of transcription factor binding sites that, with ~10 bp, are relatively short. In analogy to grape juice requiring further steps to generate wine, initially TEs are the raw material for innovation. As we have shown for the exaptation of (parts of) SINE elements as novel protein domains, it can take tens or even 100 million years for such raw material to fortuitously acquire mutations that pave the way to functionality and exaptation [Krull M., Petrusma M., Makalowski W., Brosius J. & Schmitz J. (2007) Functional persistence of exonized mammalian-wide interspersed repeat elements (MIRs). Genome Res. 17:1139–45]. This is well beyond the time frames for formation of subpopulations.
Author’s response: We expanded on this point mostly in the section “Biological innovations and the diversity of repetitive families.” The pattern of regulatory changes during a particular period of vertebrate evolution probably drove new biological adaptations. In fact, similar changes took place independently in different lineages. Based on our original hypothesis, we propose that the origin of new subpopulations was fueled by the new adaptations. Most subpopulations probably originated and became extinct in a relatively short time. The surviving subpopulations either diverged into separate species and lineages or channeled their TE-produced mutations back to the meta-population by crossbreeding. Therefore, the genomic DNA of the mammalian lineage combines mutations from countless subpopulations that existed at some point of time during the history of the lineage. These mutations likely contributed to the “raw material” fueling the genomic changes (see also the response to the first reviewer).
In nature almost anything that is possible does happen, and one cannot rule out that such TE-induced "regulatory hopeful monsters", resurrecting ideas of Richard Goldschmidt (see below), occasionally arise. However, one can imagine that a novel TE family with high copy number that readily alters expression of most genes in whose vicinity it integrates would cause havoc and would not bode well for the fitness of small populations.
Authors' response: This is quite correct. The extinction rate of small subpopulations is likely to be high and it may be an inherent part of the evolutionary process (see response to the first reviewer and “Repetitive families, population structure and speciation” in the main text).
The idea that TEs contribute to speciation is not new and it is obvious that most of the time, the path leads via sub-populations. In the following, I am citing the work and ideas of other investigators from one of my review articles in [Brosius J: Disparity, adaptation, exaptation, bookkeeping, and contingency at the genome level. Paleobiology 2005, 31:S1-16]:
"Nevertheless, expression of the same gene at different times in development or in different cell types has long been suggested to be a key event in speciation (Wilson 1975; Zuckerkandl 1975; Gould 1977b), and newly inserted retronuons are well capable of inducing such alterations […]. Without ignoring the potential of chromosomal rearrangements […] or even point mutations in a single gene, single retroposition events and, more likely, the combined impact of a newly arisen retronuon family (see also below) are reasonable scenarios that set the course for speciation (Bingham et al. 1982; Rose and Doolittle 1983; Ginzburg et al. 1984; McDonald 1990 […]). Apart from the significance of Hox genes and other developmental switches, I see the likely role of retroposition in speciation as a partial vindication of Richard B. Goldschmidt’s proposals concerning species-level saltations (Goldschmidt 1940; Gould 2002; Ronshaugen et al. 2002; Dietrich 2003; Wagner et al. 2003)."
Relevant references from the above quote:
Bingham, P. M., M. G. Kidwell, and G. M. Rubin. 1982. The molecular basis of P-M hybrid dysgenesis: the role of the P element, a P-strain-specific transposon family. Cell 29:995–1004.
Britten, R. J. 1996. DNA sequence insertion and evolutionary variation in gene regulation. Proceedings of the National Academy of Sciences USA 93:9374–9377.
Dietrich, M. R. 2003. Richard Goldschmidt: hopeful monsters and other ‘heresies.’ Nature Reviews Genetics 4:68–74.
Ginzburg, L. R., P. M. Bingham, and S. Yoo. 1984. On the theory of speciation induced by transposable elements. Genetics 107: 331–341.
Goldschmidt, R. B. 1940. Material basis of evolution. Yale University Press, New Haven, Conn.
Gould, S. J. 1977b. Ontogeny and phylogeny. Belknap Press of Harvard University Press, Cambridge.
Gould, S. J. 2002. The structure of evolutionary theory. Belknap Press of Harvard University Press, Cambridge.
McDonald, J. F. 1990. Macroevolution and retroviral elements. Bioscience 40:183–191.
Ronshaugen, M., N. McGinnis, and W. McGinnis. 2002. Hox protein mutation and macroevolution of the insect body plan. Nature 415:914–917.
Rose, M. R., and W. F. Doolittle. 1983. Molecular biological mechanisms of speciation. Science 220:157–162.
Wagner, G. P., C. Amemiya, and F. Ruddle. 2003. Hox cluster duplications and the opportunity for evolutionary novelties. Proceedings of the National Academy of Sciences USA 100: 14603–14606.
Wilson, D. S. 1975. A theory of group selection. Proceedings of the National Academy of Sciences USA 72:143–146.
Zuckerkandl, E. 1975. The appearance of new structures and functions in proteins during evolution. Journal of Molecular Evolution 7:1–57.
Authors' response: Indeed, TEs likely contributed to the process due to their mutagenic activities, and the role of small populations in speciation has been explored for quite some time. Our goal was to strictly focus on the origin of multiple repetitive families based on our original hypothesis, and on the results presented in this paper. Nevertheless, both the origin of repetitive families and the origin of species are conceptually linked to small sub-populations. Therefore, in response to the thoughtful remarks presented in this and the previous review, we elaborated the text accordingly (see “Repetitive families, population structure and speciation”).
While clear to members of the scientific community working on transposable elements, the way transposable elements are described in the background sections of the abstract and main body of the paper might give rise to some confusion and misunderstandings when addressing a broader audience. For example, it is not typical that transposable elements generate multiple interactive copies of themselves. In contrast, only a very small number (as few as one) of master, source, or founder genes from which the proposed RNA template is transcribed are indeed multiplying THEMSELVES. Most integrated copies are not transcribed, available RNA copies being a prerequisite of retroposition. Hence, most TEs (with the exception of DNA transposons) are transposed elements with a small minority being transposABLE. This should be addressed throughout the text.
Authors' response: We revised the text as suggested.
In the background section and throughout the manuscript, it should be clarified that TE families and the members decline via loss (elimination is for my taste too active a process) of DNA through recombination AND mutation of changes that, over time periods long enough, render such TEs virtually undetectable. TEs are mostly lost due to attrition. For example, the sentence on page 7 would be more precise as follows: "The ratio reflects a faster attrition of non-essential DNA in mouse than in human, driven by differences in the mutation rates between the two [16
Authors' response: Attrition is an excellent term. We appreciate the suggestion and made the changes.
Not only after fixation can TE derived repeats be recruited as functional components of the non-coding genomic DNA, but 1) can be exapted prior to fixation and 2) can also contribute to protein coding DNA as novel exons.
Authors' response: Any exaptation prior to fixation and other specific events are difficult to evaluate statistically. Therefore, our focus is mostly on the major evolutionary mechanisms leading to the observed rise and fall in the number of diverse families of TEs.
I presume the characteristic hallmarks of DNA transposons, such as terminal inverted repeats, are, after long time periods, only useful if they are long enough. The shorter terminal inverted repeats, like the short direct repeats of retroposons, will not be recognizable. See for example page 6 bottom, where the authors should explain which TE-specific features were applied.
Authors' response: Indeed, classification of ancient DNA transposons with short terminal inverted repeats (TIRs) is notoriously difficult. However, in most cases the classification is based on homology to known TEs, not on the presence or absence of putative TIRs. This is reflected in the annotations of individual conserved repeats.