Reviewer #1: Dr. I. King Jordan, Georgia Institute of Technology
Igor Rogozin and colleagues have written a comprehensive, synthetic and compelling review mainly covering the ‘evolution’ of spliceosomal introns. As in Darwin’s famous tome, the notion of ‘origin’ is actually given short shrift, but I will come to that point later. In any case, this ambitious work benefits from the broad perspective that the authors have gained over years of investigating the subject. Along with this perspective comes some inevitable bias, or perhaps it is more fair to say a favored world-view, as described below with respect to the authors’ position on ‘introns-kind-of-early’. But this does not represent a liability of the work in my opinion; the authors are clearly entitled to their views, and the conclusions they draw appear to be both nuanced and well-supported by the data. They cover a lot of ground herein and strike a nice balance between thoroughly reviewing the relevant literature and elucidating the most salient points from the large body of work on the subject. One of the major conclusions of this review relates to a resolution, or compromise really, of the ‘introns-early’ versus ‘introns-late’ debate that consumed the field for many years. The authors champion a merger of these two hypotheses into the ‘many introns early in eukaryotic evolution’ view, whereby the earliest eukaryotic lineages contained genomes that were already loaded with many introns and subsequent evolution was dominated by intron loss.
The parts of the review that cover the origin of spliceosomal introns are the most speculative and least supported. This is not a critique per se; it may simply be the case that the study of origins must always be more speculative than the study of evolution. According to the ‘many introns early in eukaryotic evolution’ hypothesis, the earliest eukaryotic genomes were formed via massive intron invasion that resulted in genomes consisting of up to 80% intronic DNA. Crucially, the authors hold that this invasion was probably facilitated by low effective population sizes and the corollary weak purifying selection, following the influential Michael Lynch model for the non-adaptive evolution of eukaryotic genome complexity. This model accounts for population level dynamics but neglects the internal dynamics of the genome. If spliceosomal introns indeed evolved from Group II introns, as the authors maintain, then the initial intron invasion of eukaryotic genomes would have been driven, to some extent, by a kind of selfish genetic element with its own internal drive mechanism to replicate within the genome. In theory, such selfish replicators can efficiently increase in copy number even in the face of a selective cost to the host. Therefore, the early origin of introns may be attributed to an active internally driven process, rather than a solely passive drift related process, i.e. a mechanism akin to the molecular-drive concept of Gabriel Dover or the mutation bias emphasized by Arlin Stoltzfus. Such an active replicative process inside the genome could have actually outpaced selection’s ability to contain it. The authors actually touch on this notion, when they speculate as to whether introns are genomic parasites and how the host may have evolved the spliceosome as an adaptive response to intron invasion, but an explicit connection between their selfish drive to replicate and the origin of introns is not made.
Authors’ response: We agree on all accounts. Yes, it comes with the territory: discussion of origins is inevitably more speculative than the analysis of subsequent evolution. More importantly, the role of the active mobility of Group II introns certainly must not be under-appreciated, and we explicitly point this out in the revised manuscript: ‘Indeed, it has to be emphasized that Group II introns are typical mobile elements that actively spread around the host genome when given a chance by weakness of purifying selection pressure.’
One specific suggestion as to how the work can be improved relates to the abstract. Currently, the abstract is very short and concise, whereas the manuscript is rather long and presents a lot of material. I think it would be helpful to provide a more detailed abstract that specifically enumerates the authors’ most important points, something more like a summary of the last two sections of the manuscript.
Authors’ response: We fully agree; the original short abstract resulted from a misunderstanding regarding the limits on abstract length in review articles. In the revised article, the abstract was substantially expanded.
Reviewer #1: Dr. I. King Jordan, Georgia Institute of Technology (additional comment on the revised version of the manuscript)
I have re-reviewed the manuscript of Rogozin et al. I am satisfied with the changes made, for the most part, and I recommend that the paper be accepted for publication in Biology Direct after the following point is addressed.
I would like the authors to elaborate just a bit on their response to the first comment that I made, in particular with respect to the connection between Group II intron dynamics and the evolution (emergence) of introns. I think I may have rambled a bit in my original comment and was not explicit enough. I would urge the authors to have a look at the manuscript of Donal Hickey from Genetics (Hickey 1982 101; 519), which makes the point much better than I did in my comment. The population genetics models in the manuscript may be a bit simplistic by this time, but I think the ideas contained therein are highly relevant to their own work. In particular, Hickey makes an explicit connection between the genome dynamics of mobile elements, host selection pressure and the evolution of introns. The basic idea is that mobile genetic elements can spread in a population even in the face of a fitness cost to the host, and this kind of process could have resulted in the emergence and spread of introns. Below, I provide a comment in response to the authors' response to my first comment in an attempt to facilitate further discussion and consideration of this issue.
Authors' response: We agree on all accounts. Yes, it comes with the territory: discussion of origins is inevitably more speculative than analysis of subsequent evolution. More importantly, the role of the active mobility of Group II introns certainly must not be under-appreciated, and we explicitly note in the revised manuscript: 'Indeed, it has to be emphasized that Group II introns are typical mobile elements that actively spread around the host genome when given a chance by weakness of purifying selection pressure.'
Response: I would like the authors to further consider the possibility that mobile elements (such as Group II introns) can increase in frequency in a population, even when they impose a fitness cost on their host organisms, owing to the fact that a replicative transposition process results in a biased transmission rate relative to host genes. This idea was introduced by Donal Hickey 30 years ago, and he also connected this point to the evolution of introns (Hickey 1982 Genetics 101: 519). In other words, it is not simply a matter of weak purifying selection allowing active spread of the elements, but an effect of the element mutational dynamics introducing directional bias in the evolutionary process. This idea is very much analogous to the notion that mutation bias in the broader sense can be a cause of direction in evolution (e.g. see Yampolsky and Stoltzfus 2001 Evol Dev 3: 73).
Authors' response: We agree that the mutational dynamics of selfish elements could be an important driver of their spread. We think that once this additional exchange with the reviewer is published, the emphasis on this issue will be adequate.
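The argument in this exchange, that an element with biased transmission can spread despite a cost to its host, can be illustrated with a minimal deterministic sketch. This is a toy model written for illustration only, not the model of Hickey (1982): it assumes a randomly mating diploid population, a dominant fitness cost `s` for every carrier, and a hypothetical parameter `c` for the fraction of heterozygote gametes converted to carry the element (so heterozygotes transmit it to `(1 + c) / 2` of their gametes). All function and parameter names are assumptions.

```python
def next_frequency(p, s, c):
    """Return the element frequency among gametes after one generation.

    p: current frequency of the element among gametes (0..1)
    s: fitness cost paid by any carrier (assumed dominant), 0..1
    c: transmission bias; c = 0 is Mendelian, c = 1 means every
       heterozygote gamete carries the element (toy assumption)
    """
    q = 1.0 - p
    # Genotype frequencies after random mating, weighted by fitness.
    homozygous_carriers = p * p * (1.0 - s)
    heterozygous_carriers = 2.0 * p * q * (1.0 - s)
    non_carriers = q * q
    mean_fitness = homozygous_carriers + heterozygous_carriers + non_carriers
    # Heterozygotes pass the element to (1 + c) / 2 of their gametes.
    element_gametes = homozygous_carriers + heterozygous_carriers * (1.0 + c) / 2.0
    return element_gametes / mean_fitness


def simulate(p0, s, c, generations):
    """Iterate next_frequency and return the final element frequency."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, s, c)
    return p


if __name__ == "__main__":
    # Same 20% host fitness cost, with and without transmission bias.
    print("biased transmission:", simulate(0.01, s=0.2, c=1.0, generations=200))
    print("Mendelian transmission:", simulate(0.01, s=0.2, c=0.0, generations=200))
```

In this sketch a rare element grows by a factor of roughly `(1 - s) * (1 + c)` per generation, so it invades whenever that product exceeds 1. With complete bias (`c = 1`) it therefore spreads for any cost below one half, whereas without bias any cost at all eliminates it.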
Reviewer #2. Dr. Tobias Mourier, University of Copenhagen (nominated by Dr Anthony Poole)
This review provides a comprehensive overview of the current knowledge of intron evolution in eukaryotic genomes.
The advent of numerous eukaryotic genomic sequences has consistently supported the 'many introns early in eukaryotic evolution' concept, as evident from the manuscript. But surely this hypothesis is not a merger from the introns early/late/first ideas (as the authors write in the "Intron-early, introns-late, introns-first …" section). All recording of spliceosomal intron features comes from eukaryotic genomes, and regardless of how many eukaryotic genomes are sequenced, extant spliceosomal intron features will never allow one to synthesize past LECA.
At the end of the manuscript, the authors present a scenario proclaiming that an intron-rich LECA is not inconsistent with the introns-late hypothesis. This is not a problem, but the structure of the manuscript may give the impression that this is a conclusion (or synthesis) directly from the current knowledge of eukaryotic gene architecture (that is nicely reviewed in the preceding text).
Authors’ response: Actually, we do believe that the synthesis we present in the section of the review preceding the Conclusions follows from the comparative genomic results reviewed in the preceding sections. Certainly, not all parts of the article directly contribute to this synthesis: for instance, the discussion of the functional roles of introns is only tangentially relevant here albeit important in other respects. Nevertheless, we do maintain that in this section we present major implications of the comparative genomic study of eukaryotic gene structure.
The review presents an overview of the comparative approaches taken to delineate intron-exon structures during evolution. The basis for such comparative analyses is well-aligned sequences around splice sites. If intron-exon structures to some extent evolve via mechanisms such as alternative splicing and intronization of exonic sequence, should this not result in sequences that are unlikely to meet the criteria for being included in the above analyses? I think it would be relevant to discuss the implications of this.
Authors’ response: This issue is discussed in the section ‘Evolutionary conservation of intron positions and routes of gene architecture evolution of eukaryotes’.
Section "Functional elements and genes within introns"
When discussing intronic RNA genes, I'm surprised there is no mention of the classical connection between vertebrate snoRNAs and introns (and perhaps even the existence of genes with non-coding exons and introns encoding snoRNAs, e.g. Tycowski et al., Nature 1996).
Authors’ response: Yes, this certainly is an important theme, and we added it to the section ‘Functional elements and genes within introns’.
Very minor points:
Page 11: "whereas the remaining 6 Xist" rather than "whereas remaining 6 Xist"
Page 14: "and so does the strength" rather than "and so does and the strength"
Page 18: should "introns are inserts or fixed" read "introns are inserted or fixed"?
Authors’ response: All corrected, we appreciate the reviewer’s attention to these points.
Reviewer #3. Dr. Manuel Irimia, University of Toronto (nominated by Dr Anthony Poole)
Rogozin et al. have put together an impressively comprehensive review on the origin and evolution of spliceosomal introns that will certainly become a major reference in the field. Overall, I found it easy and entertaining to read, as well as informative. I have only a few comments and suggestions, often regarding further literature, that I hope can help to improve the piece (listed according to their appearance in the main text):
Authors’ response: We appreciate Dr. Irimia’s close attention to the details of this article. As detailed below, we found most of the suggestions fully pertinent and modified the manuscript accordingly.
1) P3: The paragraph on splice site consensus sequences could provide a more detailed portrait of canonical intron signals across eukaryotes. For example, not all eukaryotes have polyT tracts between the branch point (BP) and the 3’ AG, and some fungal species even have polyT tracts upstream of the BP (see Bon et al., Nucleic Acids Res 2003; Irimia and Roy, PLoS Genetics 2008). Also, some extremely intron-poor species intriguingly have strict GTATGT as consensus 5’ sequence (including yeast), which may be worth pointing out. Finally, the 3’ consensus is closer to YAG than to CAG, at least in most species.
Authors’ response: We added discussion of this issue to the revised text.
2) P5: I found the (exciting) discussion on the ancestrality of U2 vs. U12 too short and a bit imbalanced. Personally, I think it is a good idea that the authors give their authoritative opinion/preference on this kind of discussion, but the opposite arguments should also be presented extensively. In this case, I think the arguments supporting an ancestral origin of U12 (i.e. lack of evidence for conversion from U2 to U12, argued higher similarity of U12 to type II introns, etc.) should be fully developed.
Authors’ response: In our view, the questionable greater similarity of U12 introns to Group II introns does not immediately imply ancestral status of U12 introns. We added to the text ‘it might be tempting to speculate that the ancestral introns were of the U12 type (for example, see discussion by the reviewer #3 below) but have been subsequently converted to U2 introns.’
3) P8: Pleiss et al. (PLoS Biol 2007) may be added supporting a global regulatory function of introns in yeast.
Authors’ response: We added discussion of this important work to the section ‘Functions of introns associated with splicing’.
4) P9: I missed a more comprehensive and complete review of the literature on the genome-wide dynamics of intron gain and loss in this section. For example, on the general slow paucity of intron gain, I missed references on vertebrates (Loh et al., MBE 2008; and actually ref 72 is incorrect: Coulombe-Huntington and Majewski, Genome Res 2007), flies (actual ref 72), plants (Roy and Penny, MBE 2007), apicomplexa (Roy and Hartl, Genome Res 2006; Roy and Penny, Genome Res 2006), Entamoeba (Roy et al., MBE 2006), Fungi (Nielsen et al., Plos Biol 2004; Stajich et al., Genome Biol 2007; in Aspergillus (Zhang et al., JME 2010)). On the opposite side: tunicates (Seo et al., Science 2001; Edvardsen et al., JME 2004), diatoms (Roy and Penny, MBE 2007), mitochondrial transfers (Ahmadinejad et al., BMC Evol Biol 2010). Given the overall level of comprehensiveness and detail of this review and that, as I said above, it is very likely to become a major reference in the field, I think it would be important to cite all relevant references in the main text, in particular from such an important and prolific subtopic.
Authors’ response: There is indeed a lot of evidence on specific events in individual lineages. We appreciate their importance but it is hardly possible to discuss ‘everything’ in detail. That said, the revised version of the review cites all the references pointed out by the reviewer.
5) P9: when commenting on ref 61, the use of the word “dispute” may give the impression that there is an ongoing controversy or a difference in opinions between the authors, which I guess is really not the case. Ref. 74 showed that most reported gains in ref. 61 were indeed losses by adding more species to the analysis that were not available by the time of the original study. This may not be clear to general readers that have not followed the specialized literature.
Authors’ response: We added this explanation to the text.
6) P11: the authors may wish to mention here the recent work by Cabili et al. (Genes Dev 2011), which describes >8,000 lincRNA genes that have an average of ~1.9 introns per Kbp and are extensively alternatively spliced, with 2.3 isoforms per gene.
Authors’ response: We added a brief description to the text.
7) P14: I was quite surprised to read that the sequences at the 3’ of the intron behave completely differently from those at the 5’. Many of the extremely intron-poor species (although not all, in this case) that show strict 5’ splice site consensus also have very strict BPs, and sometimes even very constrained branch-point-to-AG distances (Irimia and Roy, Plos Genetics 2008). I guess this apparent contradiction is due to the fact that these species are all missing from the analysis by Iwata and Gotoh (represented in the Figure), which is strongly biased towards multicellular organisms, and I suspect that the inclusion of the intron-poor species would fully disrupt the observed negative correlation. In my opinion, this section should be modified to give a more complete view of the evolution of the 3’ intronic signals (more like 3-4 qualitatively different behaviors related to, but not fully determined by, intron densities). Also, I recommend removing the Figure or making a new one using a more complete eukaryotic taxon sampling.
Authors’ response: We added a list of species to the legend. Robust estimation of the information content requires hundreds of splice signals, so it is impossible for the extremely intron-poor species. This is why these species are missing from the analysis of Iwata and Gotoh, and accordingly, from our Figure. We believe that it is fully legitimate to present only the data for those organisms that possess enough introns for meaningful statistical analysis. Furthermore, there is no contradiction at all between the observation that some extremely intron-poor species possess a strict 5’ splice site consensus and also have very strict BPs and the positive correlation between the strength of the donor splice signal and the combined strength of the branch point signal + the acceptor splice signal emphasized in the present article.
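The point about sample size can be made concrete with a short sketch of how the information content of a splice signal is commonly estimated from aligned sites. This is a generic Shannon-information calculation under stated assumptions (DNA alphabet of A, C, G and T, a uniform background, no small-sample correction); it is not claimed to be the procedure used by Iwata and Gotoh, and the function name and example sequences are made up for illustration.

```python
import math
from collections import Counter


def information_content(sites):
    """Return the total information content (bits) of aligned DNA sites.

    Each alignment column contributes 2 - H bits, where H is the Shannon
    entropy of the observed base frequencies in that column. Assumes a
    uniform background and applies no small-sample correction.
    """
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")

    total = 0.0
    for column in zip(*sites):
        n = len(column)
        counts = Counter(column)
        entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
        total += 2.0 - entropy
    return total


if __name__ == "__main__":
    # Hypothetical donor-site fragments, for illustration only.
    strict = ["GTATGT"] * 10
    relaxed = ["GTAAGT", "GTGAGT", "GTAAGA", "GTCTGC", "GTAGGG", "GTTCGT"]
    print("strict signal:", information_content(strict))
    print("relaxed signal:", information_content(relaxed))
```

Without a correction, a small sample underestimates the entropy of each column and so overestimates the information content. In the extreme case a single observed site scores the maximum of 2 bits per position whatever the true signal is, which is why a genome with only a handful of introns cannot give a robust estimate.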
8) P14: also related to splicing signals, it would be interesting to include a comment on the effect of intron size on splicing signals (long introns have stronger boundaries, species with extremely short introns often have very weak signals (e.g. paramecium, B. natans nucleomorph), etc.).
Authors’ response: The effect of intron size is complicated. We added discussion of this issue to the section on ‘Evolution of splicing signals, protosplice sites, and intron phase distribution.’
9) [...] may be a bit unclear and “too raw” for non-specialists.
Authors’ response: We included an additional explanation in the legend: “An excess of protosplice sites in phase 0 is noticeable, however the ‘protosplice site’ hypothesis, which posits that introns are randomly inserted into protosplice sites, is unable to fully explain the observed over-representation of phase 0 introns.”
10) P26: the authors may want to point out from the beginning that the “two competing hypothesis” they present are not necessarily mutually exclusive.
Authors’ response: Added to the text as suggested.
11) P28: ref. 158 also concludes that alternative splicing has emerged early in eukaryotic evolution, so it should be cited along with ref. 166 and not with 28.
Authors’ response: Modified as suggested.
12) P31: more references may be added supporting the low conservation of alternative splicing in mammals (currently only one, from 2003, is given, but several studies have reached similar conclusions). Similarly, many other studies have dealt with the evolution of alternative splicing from the perspective of the splicing signals, not only regarding GC splicing donor sequences (e.g. evolution of ESEs and ESSs (Parmley et al., MBE 2006; Ke et al., Genome Res 2008; Irimia et al., PLoS One 2009) and their polymorphism in human populations (Stallings-Mann et al., PNAS 1996; Stanton et al., PNAS 2003; Fairbrother et al., PLoS Biol 2006; Carlini and Genut, JME 2006; Coulombe-Huntington et al., Plos Genetics 2009)).
Authors’ response: A brief discussion and references added as suggested.
13) P32: perhaps the section “Functions of introns” would fit better before the section on alternative splicing (since the latter is one of those functions).
Authors’ response: Alternative splicing is not exactly a function of introns, rather a mechanism of modulation of protein and RNA function. In the functional section we addressed specific functions of intron sequences. This might be debatable but we consider the original order of the sections acceptable.
14) P33: the authors may want to add that some spectacular, functional exceptions are known to the general case that splicing occurs before mRNA is exported to the cytoplasm. For example, Buckley et al. (Neuron 2011) describe the case of some transcripts with retained introns, which drive subcellular location of the transcripts to the dendrites due to the presence of a particular transposable element within their sequence.
Authors’ response: We appreciate the reviewer drawing our attention to this exciting work. Cited and briefly discussed.
15) P34: the catalog of U12 introns by Alioto (Nucleic Acids Res 2007) could be referenced here.
Authors’ response: Cited as suggested.
16) P35: I think it could be useful to make a clearer distinction between Splice Leader (SL) trans-splicing and trans-splicing between two different genes from the beginning of the paragraph (I found it a bit confusing now). Also, the authors may wish to cite a very elegant analysis searching for trans-splicing in Drosophila using RNAseq on hybrids (McManus et al., PNAS 2010).
Authors’ response: We agree and have included a brief discussion and references as suggested.
17) P36: in this subsection I missed a paragraph on the (predictable and predictive) association between long introns and the presence of functional elements. For example, Denoeud et al. (Science 2011) found that the few genes with long introns in Oikopleura are enriched for key developmental regulators, and that those introns likely contain regulatory information. This has also been observed for many other developmental genes across metazoans [e.g. Shh (Muller et al., Development 1999), FoxP1 and Dach (Sandelin et al., BMC Genomics 2004); Gli3 (Abbasi et al., PLoS One 2007), Meis genes (Irimia et al., GBE 2011), etc.] and for associated non-developmental genes (“bystander” genes) (e.g. Woolfe et al., PLoS Biol 2005; McEwen et al., Genome Res 2006; Kikuta et al., Genome Res 2007; Engstrom et al., Genome Res 2007), with exciting implications for the evolution of genome architecture. Also, supporting the presence of regulatory elements, higher sequence conservation is often found in longer introns (Bergman and Kreitman, Genome Res 2001; Parsch, Genetics 2003; Haddrill et al., Genome Biol 2005; Marais et al., Genetics 2005; Halligan and Keightley, Genome Res 2006; Parsch et al., MBE 2010).
Authors’ response: Brief discussion and references included as suggested.
18) P38: the authors may add the report by Curtis and Archibald (Curr Biol 2010) to the list of different sources of spliceosomal introns.
Authors’ response: Cited as suggested.
Reviewer #4. Dr. Fyodor Kondrashov, Center for Genome Regulation, Barcelona
This is a straightforward and extensive review of everything that is known about the evolution of introns and then some more. I do not have much to add in addition to what the authors have already said. The only thing that I am left wondering about after reading this review is whether or not the authors think that Group II introns in LECA were involved in the transport of mitochondrial precursor genes into what is now the cytoplasm across the novel intracellular membrane. In light of the previous reviews I would leave it up to the authors to space and moderate the level and format of speculation, even though I believe that the nice synthesis the authors have produced makes the review more interesting and useful.
Authors’ response: We appreciate this comment. We are not entirely clear about the exact meaning of the reviewer’s idea regarding mitochondrial genes. Is this about transfer of genes from the mitochondrial to the nuclear genome? If so, the possibility of involvement of the reverse transcriptase activity of Group II introns is intriguing but in the absence of specific evidence, one would think the main route was DNA recombination.