I. King Jordan, Georgia Institute of Technology, USA
In this compelling hypothesis paper, Jerzy Jurka and colleagues lay out their vision for the relationship between the genome dynamics of transposable elements (TEs) and the process of speciation. They present the 'carrier subpopulation (CASP)' hypothesis, which emphasizes that species-specific differences in TE family composition are best explained by differences in species' population structure. The basic idea of the CASP hypothesis is that subdivided populations will inherit distinct sets of active TEs, or TE subfamilies, and that different subsets of these TEs will then be randomly fixed by genetic drift among the divided subpopulations. Meanwhile, given sufficient time and the availability of distinct niches, the divided subpopulations will diverge into new species. Together, these processes will lead to a greater relative diversity of young TE families in lineages that include numerous species. The CASP hypothesis is notable because it posits a passive, rather than a causal, role for TE activity and accumulation in the process of speciation. As such, it serves as a counterpoint to the 'genomic drive' hypothesis for the significance of TEs with respect to speciation, which holds that amplification of TEs leads to speciation by increasing genomic variability.
The presentation of the CASP hypothesis in this paper is interesting, thoughtful and timely, and I expect that this work will be quite thought-provoking for investigators working on TEs, evolution and genomics. Thus, I certainly support publication of the work in Biology Direct. Below, I provide a number of comments, questions and suggestions that the authors may wish to consider before finalizing their manuscript.
1. There are assumptions of the CASP model that need to be critically interrogated. One critical assumption of the model is that fixation of repetitive families takes place primarily in small populations by genetic drift. Of course, it is entirely reasonable to posit that random fixation of any genetic element would occur preferentially in small populations. However, this assumption seems to imply that TE fixation dynamics are dominated, or even exclusively shaped, by population level forces. In fact, the population dynamics of repetitive elements can be considered to take place at two levels: there are indeed population level dynamics predicated upon differential reproductive success of individual organisms, but there are also genome level dynamics based upon differential reproductive success of individual TE copies or subfamilies. This understanding was articulated in the early formulation of the selfish DNA theory, when it was theoretically demonstrated that TEs could increase in copy number even if their replication was deleterious to the host (Hickey 1982 Genetics 101: 519). In other words, the genome level replication dynamics of TEs could overcome the population level effects of selection. This may be an extreme view, but there is certainly an interplay between population dynamics and genome level dynamics when it comes to TE replication and fixation. Any model that treats only one of these two important levels may be missing a critical component.
Response: The CASP hypothesis is based on neutral theories (refs. 27-29) that include slightly deleterious mutations under relaxed selection in small populations. Under relaxed selection the mutation rate by TEs (the rate of transposition) and the fixation rate are the same, which may leave the impression that one is missing.
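The equality of the two rates mentioned in this response appears to correspond to the standard neutral-theory result, which can be sketched as follows. The symbols N (diploid population size), u (rate at which new TE insertions arise per genome per generation) and k (fixation rate) are introduced here for illustration and are not taken from the manuscript; the sketch also assumes that the insertions are effectively neutral.

```latex
% Assumptions (not from the manuscript): diploid population of size N,
% u = rate of new TE insertions per genome per generation,
% insertions effectively neutral (|N_e s| << 1).
\begin{align}
  \text{new insertions per generation} &= 2N u, \\
  P(\text{fixation of one new insertion}) &= \frac{1}{2N}, \\
  k = 2N u \cdot \frac{1}{2N} &= u .
\end{align}
```

Under these assumptions the population size cancels, so the long-term fixation rate k equals the insertion rate u. Population size still determines how quickly an individual insertion drifts to fixation, and whether a slightly deleterious insertion behaves as effectively neutral in the first place.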
2. Another important aspect of the model is the idea that randomly divided subpopulations will inherit different sets of active TE copies (Figure ). On its face this seems quite reasonable. However, the master copy model for TE replication holds that one or a few copies of a TE (sub)family are primarily responsible for the replication and ongoing expansion of the family. If there are only a few master copies, or if they are highly identical in terms of sequence/structure, then the likelihood of different subpopulations inheriting distinct sets of active TEs would seem to be reduced. How well does the master copy model hold for the species examined here, in particular for the primate lineage with respect to Alus and L1s, which are discussed at length? And how would this impact the CASP hypothesis?
Response: The CASP hypothesis permits subpopulations without active TEs. However, such subpopulations may be less likely to diversify fast enough to become foundations for new species (see section "Carrier subpopulations and the origin of species"). The hypothesis implies that very active TEs are most likely to contribute to productive speciation. Nevertheless, they can be very destructive and only a few of them left behind viable subpopulations that were foundations for new species. They also left behind large repetitive families in primates that were the basic evidence for the master gene hypothesis (refs. 10-15). Slowly replicating TEs are much less likely to affect speciation. They are also less destructive and probably less frequently suppressed by the silencing mechanisms. Therefore, they are more likely to be represented by multiple active copies as proposed by the "transposon model" (refs. 47-48).
3. There is one slightly troubling (or perhaps simply confusing) aspect of the CASP hypothesis articulated in the conclusion (implications) section of the article. Here, the authors mention that the 'genomic record of young TEs can be a powerful indicator of... subdivisions in the population that underlie speciation events' but then go on to state that the opposite pattern of a lack of young TE families in a genome may not be informative because it could be due to other factors. This would seem to suggest that the hypothesis lacks discriminating power with respect to the relationship between the extent of young TE families in a genome and speciation rates along a lineage. Is this really the case? Does this mean that one would not be able to systematically relate the extent, or lack, of young TE families to high or low levels of speciation?
Response: The CASP hypothesis links speciation to population subdivision, which is driven by the availability of biological niches. There are three basic categories of subpopulations: (1) those that do not carry any active TEs but still evolve into new species, due to geographical factors allowing both survival and reproductive isolation; (2) subpopulations that carry moderately active TEs that have little or no impact on potential speciation events; and (3) subpopulations that are rapidly mutated by very active TEs and that, if they survive, are likely to become founding populations for new species. Given the ubiquitous nature of TEs, the first category of subpopulations is probably rare. We think that category 2 is the most common and that the majority of the diverse families of TEs originate in such subpopulations (see "Origin of diverse families of TEs"). The last category is the "extreme version" of category 2.
4. Michael Lynch has written extensively on the importance of non-adaptive aspects of evolution, i.e. the role of genetic drift, in shaping genome architecture (see his book The Origins of Genome Architecture). Similar to what is proposed here, Lynch holds that TEs are able to accumulate to high copy numbers owing to the reduced efficacy of natural selection in small populations. The CASP hypothesis should be considered in light of the previous work of Lynch along with the ensuing discussion (controversy) that his work engendered.
Response: In the last section we discuss the relationship between the CASP hypothesis and the Lynch & Conery hypothesis in the context of the ongoing debate on the role of drift in evolution of genomic complexity.
5. The authors make a very clear and strong statement in the abstract regarding the relationship between the numbers of young TEs in genomes and the number of species in a lineage. In the body of the manuscript, they go on to provide data on the diversity of TE families among several vertebrate, plant and insect genomes (Table ) and additional more detailed data on the diversity of TEs in mammalian species (Table ). However, it was not immediately apparent how, or even whether, these data directly support the CASP hypothesis. For example, they show that closely related Anopheles species have large differences in TE diversity and they speculate as to the pattern of population subdivision this would predict, but they do not confirm whether this conjecture is borne out by the data. Similarly for Table , the authors discuss the data in depth as they relate to various aspects of TE and species population dynamics, but they don't show a clear pattern of high numbers of young TEs and high numbers of species in a lineage. It would really help the reader to clearly and succinctly point out how the TE diversity data do or do not support the hypothesis. More to the point, a clear and quantitative (perhaps a regression analysis?) demonstration of the relationship between the numbers of young TEs in genomes and the numbers of species in a lineage is needed to provide support for the unequivocal statement made in the abstract.
Response: The main points can be summarized as follows (with some oversimplification): (1) Multiple families of TEs in a genome are associated with multiple subpopulations in the historical population from which the genome has emerged (younger multiple families are associated with more recent subdivisions). (2) Speciation events are likely to correlate with the cumulative number of subpopulations that originated (and vanished) during the history of a lineage and indirectly with the number of families of TEs generated in those subpopulations (recent speciation events are likely to correlate indirectly with recent fixations of repetitive families). (3) On average, there should be higher proportions of surviving species from recent speciation events than from the old ones.
Currently, there is not enough data to correlate the number of species or speciation events with the population structure over geologic time. However, we found a way to indirectly support the first point above by showing a significant positive correlation between two unrelated families (see Figure ) based on the prediction that they both correlate with the number of subpopulations in a population.
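The kind of quantitative check discussed in comment 5 and in this response could be sketched as below. This is a minimal illustration only, not the analysis performed by the authors: the function names `pearson_r` and `ols_fit` are hypothetical, and the numbers in the demo are synthetic placeholders chosen to make the arithmetic easy to follow, not measurements from Repbase or any genome.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def ols_fit(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # SYNTHETIC placeholder values, not real data: one entry per lineage.
    young_te_families = [1, 2, 3, 4, 5]
    species_in_lineage = [3, 5, 7, 9, 11]
    print(pearson_r(young_te_families, species_in_lineage))
    print(ols_fit(young_te_families, species_in_lineage))
```

With real counts, a rank-based statistic might be preferable to Pearson's r, and shared ancestry among lineages would need to be taken into account, because related lineages are not independent data points.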
6. A corollary to comment #5 is that the best hypotheses make very specific predictions that can be empirically tested - or in the case of evolutionary hypotheses at least interrogated via direct observations on standing variation. The CASP hypothesis directly contradicts the genomic drive hypothesis (Oliver and Greene 2009 Bioessays 31:703) with respect to the agency of TEs in the process of speciation. Here, it would help if the authors could set up some mutually exclusive predictions that would clearly distinguish these two hypotheses. Oliver and Greene have recently published additional evidence for their genomic drive hypothesis (Oliver and Greene 2011 Mobile DNA 2: 8). Consideration of this work could be relevant to the authors' efforts to distinguish the two hypotheses.
Response: The basic premise of the genomic drive hypothesis is that TEs constitute the main engine of the process that can result in the generation of "widely divergent new taxa, fecund lineages, lineage selection, and punctuated equilibrium." We find this premise untenable without rooting it in the broader context of population structure, which is the frame of reference for the definition of species and other taxonomic units in a lineage (see ref. 81). Furthermore, in their papers Oliver and Greene ignored the fundamental problem of reproductive isolation. In the most recent version of the hypothesis, published in Mobile DNA, the authors moved closer to the population genetics perspective by invoking "environmental and ecological factors." However, they ended up with an enumeration of changes introduced by TEs in primates to support their premise. Therefore, we find little theoretical overlap between the CASP hypothesis, which is based on the fundamental concepts of population genetics, and the genomic drive hypothesis, except that both hypotheses attempt to make sense of the undeniable evidence linking TEs and speciation. In the revised manuscript we comment only on the concept of "fecundity" in the context of large mammalian orders and briefly address the main premise of the hypothesis in the Implications section.
7. The authors work at the Genetic Information Research Institute - the home of Repbase - and thus would seem to have a uniquely close perspective and deep insight into the distribution of TE diversity among evolutionary lineages. However, the rationale behind the choice of species/lineages represented among the primary data presented in Tables & may not be immediately apparent to readers less familiar with Repbase. It would help to have a clear explanation for how and why the genomes and lineages represented among these data were chosen.
Response: First, we focused on young families of TEs to minimize uncertainties associated with annotations of older repeats. In Table we tried to focus on well-annotated species to illustrate the proposed correlation between population structure and the diversity of TEs. In Figure , inspired by this review, we use a more extensive dataset from Repbase to verify the predicted correlation. In Table we focus on mammalian species that are not only well annotated in Repbase but have also been extensively studied from the evolutionary point of view. This helped us to combine multiple lines of published evolutionary evidence in support of the hypothesis.