The mechanisms and evolutionary dynamics of intron insertion and loss in eukaryotic genes remain poorly understood. Reconstruction of parsimonious scenarios of gene structure evolution in paralogous gene families in animals and plants revealed numerous gains and losses of introns. In all analyzed lineages, the number of acquired new introns was substantially greater than the number of lost ancestral introns. This trend held even for lineages in which vertical evolution of genes involved more intron losses than gains, suggesting that gene duplication boosts intron insertion. However, dating gene duplications and the associated intron gains and losses based on the molecular clock assumption showed that very few, if any, introns were gained during the last ∼100 million years of animal and plant evolution, in agreement with previous conclusions reached through analysis of orthologous gene sets. These results are generally compatible with the emerging notion of intensive insertion and loss of introns during transitional epochs in contrast to the relative quiet of the intervening evolutionary spans.
The presence of spliceosomal introns in eukaryotic genes poses a major puzzle for the study of genome evolution. Intron densities vary enormously among distant lineages. However, the mechanisms driving intron gains are poorly understood and very few intron gains and losses have been documented over short evolutionary time spans. Fungi emerged recently as excellent models to study intron evolution and “reverse splicing” was found to be a major driver of recent intron gains in a clade of ascomycete fungi. We screened a total of 38 genomes from two fungal clades important in medicine and agriculture to identify intron gains and losses both within and between species. We detected 86 and 198 variable intron positions in the Cryptococcus and Fusarium clades, respectively. Some genes underwent extensive changes in their exon–intron structure, with up to six variable intron positions per gene. We identified a very recently gained intron in a group of tomato-infecting strains belonging to the F. oxysporum species complex. In the human pathogen C. gattii, we found recent intron losses in subtypes of the species. The two studied fungal clades provided evidence for extensive changes in their exon–intron structure within and among closely related species. We show that both intronization of previously coding DNA and insertion of exogenous DNA are the major drivers of intron gains.
spliceosomal introns; intron gains; Fusarium; Cryptococcus; population genomics
It is widely accepted that orthologous genes have lost or gained introns throughout evolution. However, the specific mechanisms that generate these changes have proved elusive. Introns are known to affect nearly every level of gene expression. Therefore, understanding their mechanism of evolution after their initial fixation in eukaryotes is pertinent to understanding the means by which organisms develop greater regulation and complexity.
To investigate possible mechanisms of intron gain and loss, we identified 189 intron gain and 297 intron loss events among 11 Drosophila species. We then investigated these events for signatures of previously proposed mechanisms of intron gain and loss. This work constitutes the first comprehensive study into the specific mechanisms that may generate intron gains and losses in Drosophila. We report evidence of intron gain via transposon insertion; the first intron loss that may have occurred via non-homologous end joining; intron gains via the repair of a double strand break; evidence of intron sliding; and evidence that internal or 5' introns may not frequently be deleted via the self-priming of reverse transcription during mRNA-mediated intron loss. Our data also suggest that the transcription process may promote or result in intron gain.
Our findings support the occurrence of intron gain via transposon insertion, repair of double strand breaks, as well as intron loss via non-homologous end joining. Furthermore, our data suggest that intron gain may be enabled by or due to transcription, and we shed further light on the exact mechanism of mRNA-mediated intron loss.
Genome-wide studies of intron dynamics in mammalian orthologous genes have found convincing evidence for loss of introns but very little for intron turnover. Similarly, large-scale analysis of intron dynamics in a few vertebrate genomes has identified only intron losses and no gains, indicating that intron gain is an extremely rare event in vertebrate evolution. These studies suggest that the intron-rich genomes of vertebrates do not allow intron gain. The aim of this study was to search for evidence of de novo intron gain in domesticated genes from an analysis of their exon/intron structures.
A phylogenomic approach has been used to analyse all domesticated genes in mammals and chordates that originated from the coding parts of transposable elements. Gain of introns in domesticated genes has been reconstructed on well established mammalian, vertebrate and chordate phylogenies, and examined as to where and when the gain events occurred. The locations, sizes and numbers of de novo introns gained in the domesticated genes during the evolution of mammals and chordates have been analyzed. A significant amount of intron gain was found only in domesticated genes of placental mammals, where more than 70 cases were identified. De novo gained introns show clear positional bias, since they are distributed mainly in 5' UTR and coding regions, while 3' UTR introns are very rare. In the coding regions of some domesticated genes up to 8 de novo gained introns have been found. Intron densities in Eutheria-specific domesticated genes and in older domesticated genes that originated early in vertebrates are lower than those for normal mammalian and vertebrate genes. Surprisingly, the majority of intron gains have occurred in the ancestor of placentals.
This study provides the first evidence for numerous intron gains in the ancestor of placental mammals and demonstrates that adequate taxon sampling is crucial for reconstructing intron evolution. The findings of this comprehensive study slightly challenge the current view on the evolutionary stasis in intron dynamics during the last 100 - 200 My. Domesticated genes could constitute an excellent system on which to analyse the mechanisms of intron gain in placental mammals.
Reviewers: this article was reviewed by Dan Graur, Eugene V. Koonin and Jürgen Brosius.
Positions of spliceosomal introns are often conserved between remotely related genes. Introns that reside in non-conserved positions are either novel or remnants of frequent losses of introns in some evolutionary lineages. A recent gain of such introns is difficult to prove. However, introns verified as novel are needed to evaluate contemporary processes of intron gain.
We identified 25 unambiguous cases of novel intron positions in 31 Drosophila genes that exhibit near intron pairs (NIPs). Here, a NIP consists of an ancient and a novel intron position that are separated by less than 32 nt. Within a single gene, such closely-spaced introns are very unlikely to have coexisted. In most cases, therefore, the ancient intron position must have disappeared in favour of the novel one. A survey for NIPs among 12 Drosophila genomes identifies intron sliding (migration) as one of the more frequent causes of novel intron positions. Other novel introns seem to have been gained by regional tandem duplications of coding sequences containing a proto-splice site.
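The NIP criterion described above (an ancient and a novel intron position separated by less than 32 nt) can be illustrated with a minimal sketch. The function name, the coordinate convention (nucleotide offsets within one aligned coding sequence) and the example positions are assumptions made for illustration only; they are not taken from the study.

```python
from typing import List, Tuple


def find_near_intron_pairs(
    ancient: List[int], novel: List[int], max_gap: int = 32
) -> List[Tuple[int, int]]:
    """Return (ancient, novel) position pairs separated by fewer than max_gap nt.

    Positions are assumed to be offsets in the same coding-sequence alignment.
    Identical positions (distance 0) are not reported, since they would be
    the same intron position rather than a pair.
    """
    pairs = []
    for a in sorted(ancient):
        for n in sorted(novel):
            if 0 < abs(a - n) < max_gap:
                pairs.append((a, n))
    return pairs


# Hypothetical intron positions for a single gene.
ancient_positions = [120, 480, 910]
novel_positions = [131, 700, 941]
print(find_near_intron_pairs(ancient_positions, novel_positions))
```

The quadratic scan is adequate for the handful of introns in a single gene; classifying positions as ancient or novel is assumed to have been done beforehand on a phylogeny.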
Recent intron gains sometimes appear to have arisen by duplication of exonic sequences and subsequent intronization of one of the copies. Intron migration and exon duplication together may account for a significant amount of novel intron positions in conserved coding sequences.
The presence of introns in protein-coding genes is a universal feature of eukaryotic genome organization, and the genes of multicellular eukaryotes, typically, contain multiple introns, a substantial fraction of which share position in distant taxa, such as plants and animals. Depending on the methods and data sets used, researchers have reached opposite conclusions on the causes of the high fraction of shared introns in orthologous genes from distant eukaryotes. Some studies conclude that shared intron positions reflect, almost entirely, a remarkable evolutionary conservation, whereas others attribute it to parallel gain of introns. To resolve these contradictions, it is crucial to analyze the evolution of introns by using a model that minimally relies on arbitrary assumptions.
We developed a probabilistic model of evolution that allows for variability of intron gain and loss rates over branches of the phylogenetic tree, individual genes, and individual sites. Applying this model to an extended set of conserved eukaryotic genes, we find that parallel gain, on average, accounts for only ~8% of the shared intron positions. However, the distribution of parallel gains over the phylogenetic tree of eukaryotes is highly non-uniform. There are, practically, no parallel gains in closely related lineages, whereas for distant lineages, such as animals and plants, parallel gains appear to contribute up to 20% of the shared intron positions. In accord with these findings, we estimated that ancestral introns have a high probability to be retained in extant genomes, and conversely, that a substantial fraction of extant introns have retained their positions since the early stages of eukaryotic evolution. In addition, the density of sites that are available for intron insertion is estimated to be, approximately, one in seven basepairs.
We obtained robust estimates of the contribution of parallel gain to the observed sharing of intron positions between eukaryotic species separated by different evolutionary distances. The results indicate that, although the contribution of parallel gains varies across the phylogenetic tree, the high level of intron position sharing is due, primarily, to evolutionary conservation. Accordingly, numerous introns appear to persist in the same position over hundreds of millions of years of evolution. This is compatible with recent observations of a negative correlation between the rate of intron gain and coding sequence evolution rate of a gene, suggesting that at least some of the introns are functionally relevant.
Comparison of five relatively closely related yeast Cryptococcus genomes suggests that recombination causes internal intron loss and that DNA repeat expansion can create new introns in a population.
Genome comparisons across deep phylogenetic divergences have revealed that spliceosomal intron gain and loss are common evolutionary events. However, because of the deep divergences involved in these comparisons, little is understood about how these changes occur, particularly in the case of intron gain. To ascertain mechanisms of intron gain and loss, we compared five relatively closely related genomes from the yeast Cryptococcus.
We observe a predominance of intron loss over gain and identify a relatively slow intron loss rate in Cryptococcus. Some genes preferentially lose introns and a large proportion of intron losses occur in the middle of genes (so-called internal intron loss). Finally, we identify a gene that displays a differential number of introns in a repetitive DNA region.
Based on the observed patterns of intron loss and gain, population resequencing and population genetic analysis, it appears that recombination causes the widely observed but poorly understood phenomenon of internal intron loss and that DNA repeat expansion can create new introns in a population.
Retroposition, a leading mechanism for gene duplication, is an important process shaping the evolution of genomes. Retrogenes are also involved in the gene structure evolution as a major player in the process of intron deletion. Here, we demonstrate the role of retrogenes in intron gain in mammals. We identified one case of “intronization,” the transformation of exonic sequences into an intron, in the primate specific retrogene RNF113B and two independent “intronization” events in the retrogene DCAF12L2, one in the common ancestor of primates and rodents and another one in the rodent lineage. Intron gain resulted from the origin of new splice variants, and both genes have two transcript forms, one with retained intron and one with the intron spliced out. Evolution of these genes, especially RNF113B, has been very dynamic and has been accompanied by several additional events including parental gene loss, secondary retroposition, and exaptation of transposable elements.
intron gain; gene structure evolution; splice variant; RNF113; DCAF12
Over the past 5 years, the availability of dozens of whole genomic sequences from a wide variety of eukaryotic lineages has revealed a very large amount of information about the dynamics of intron loss and gain through eukaryotic history, as well as the evolution of intron sequences. Implicit in these advances is a great deal of information about the structure and evolution of surrounding sequences. Here, we review the wealth of ways in which structures of spliceosomal introns as well as their conservation and change through evolution may be harnessed for evolutionary and genomic analysis. First, we discuss uses of intron length distributions and positions in sequence assembly and annotation, and for improving alignment of homologous regions. Second, we review uses of introns in evolutionary studies, including the utility of introns as indicators of rates of sequence evolution, for inferences about molecular evolution, as signatures of orthology and paralogy, and for estimating rates of nucleotide substitution. We conclude with a discussion of phylogenetic methods utilizing intron sequences and positions.
Analysis of intron gain and loss in fungal genomes provides support for an intron-rich fungus-animal ancestor.
Eukaryotic protein-coding genes are interrupted by spliceosomal introns, which are removed from transcripts before protein translation. Many facets of spliceosomal intron evolution, including age, mechanisms of origins, the role of natural selection, and the causes of the vast differences in intron number between eukaryotic species, remain debated. Genome sequencing and comparative analysis have made possible whole genome analysis of intron evolution to address these questions.
We analyzed intron positions in 1,161 sets of orthologous genes across 25 eukaryotic species. We find strong support for an intron-rich fungus-animal ancestor, with more than four introns per kilobase, comparable to the highest known modern intron densities. Indeed, the fungus-animal ancestor is estimated to have had more introns than any of the extant fungi in this study. Thus, subsequent fungal evolution has been characterized by widespread and recurrent intron loss occurring in all fungal clades. These results reconcile three previously proposed methods for estimation of ancestral intron number, which previously gave very different estimates of ancestral intron number for eight eukaryotic species, as well as a fourth more recent method. We do not find a clear inverse correspondence between rates of intron loss and gain, contrary to the predictions of selection-based proposals for interspecific differences in intron number.
Our results underscore the high intron density of eukaryotic ancestors and the widespread importance of intron loss through eukaryotic evolution.
In eukaryotes, introns are located in nuclear and organelle genes from several kingdoms. Large introns (up to 5 kbp) are frequent in mitochondrial genomes of plants and fungi but scarce in Metazoa, even if these organisms are grouped with fungi among the Opisthokonts. Mitochondrial introns are classified in two groups (I and II) according to their RNA secondary structure involved in the intron self-splicing mechanism. Most of these mitochondrial group I introns carry a “Homing Endonuclease Gene” (heg) encoding a DNA endonuclease acting in transfer and site-specific integration (“homing”) and allowing intron spreading and gain after lateral transfer even between species from different kingdoms. Opposed to this gain mechanism is another, which implies that introns were abundant in the ancestral genes and have evolved mainly by loss. The importance of both mechanisms (loss and gain) is a matter of debate. Here we report the sequence of the cox1 gene of the button mushroom Agaricus bisporus, the most widely cultivated mushroom in the world. This gene is both the longest mitochondrial gene (29,902 nt) and the largest group I intron reservoir reported to date, with 18 group I introns and 1 group II intron. An exhaustive analysis of the group I introns available in cox1 genes shows that they are mobile genetic elements whose numerous events of loss and gain by lateral transfer combine to explain their wide and patchy distribution extending over several kingdoms. An overview of intron distribution, together with the high frequency of eroded heg, suggests that they are evolving towards loss. In this landscape of eroded and lost intron sequences, the A. bisporus cox1 gene exhibits a peculiar dynamics of intron keeping and catching, leading to the largest collection of mitochondrial group I introns reported to date in a Eukaryote.
Intron gains reportedly are very rare during evolution of vertebrates, and the mechanisms underlying their creation are largely unknown. Previous investigations have shown that, during metazoan radiation, the exon-intron patterns of serpin superfamily genes were subject to massive changes, in contrast to many other genes.
Here we investigated intron dynamics in the serpin superfamily in lineages pre- and postdating the split of vertebrates. Multiple intron gains were detected in a group of ray-finned fishes, once the canonical groups of vertebrate serpins had been established. In two genes, co-occurrence of non-standard introns was observed, implying that intron gains in vertebrates may even happen concomitantly or in a rapidly consecutive manner. DNA breakage/repair processes associated with genome compaction are introduced as a novel factor potentially favoring intron gain, since all non-canonical introns were found in a lineage of ray-finned fishes that experienced genomic downsizing.
Multiple intron acquisitions were identified in serpin genes of a lineage of ray-finned fishes, but not in any other vertebrates, suggesting that insertion rates for introns may be episodically increased. The co-occurrence of non-standard introns within the same gene discloses the possibility that introns may be gained simultaneously. The sequences flanking the intron insertion points correspond to the proto-splice site consensus sequence MAG↑N, previously proposed to serve as intron insertion site. The association of intron gains in the serpin superfamily with a group of fishes that underwent genome compaction may indicate that DNA breakage/repair processes might foster intron birth.
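The proto-splice site consensus MAG↑N mentioned above can be scanned for in a coding sequence with a short sketch. It assumes the standard IUPAC reading of M as A or C and N as any nucleotide; the function name and the choice of reporting the 0-based index of the nucleotide immediately downstream of the candidate insertion point are illustrative assumptions, not part of the study.

```python
import re
from typing import List

# MAG|N: M = A or C (IUPAC), N = any nucleotide. A zero-width lookahead
# is used so that overlapping candidate sites are all reported.
PROTO_SPLICE = re.compile(r"(?=[AC]AG[ACGT])")


def proto_splice_sites(cds: str) -> List[int]:
    """Return 0-based indices of the nucleotide N that follows each MAG.

    A candidate intron insertion point lies between index i - 1 (the G)
    and index i (the N) for every returned i.
    """
    seq = cds.upper()
    return [m.start() + 3 for m in PROTO_SPLICE.finditer(seq)]


# Hypothetical sequence fragment, for illustration only.
print(proto_splice_sites("TTCAGGTAAGC"))
```

A match to the consensus only marks a candidate site; whether an intron was actually inserted there has to be established from the comparative data.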
Group II introns are novel genetic elements that have properties of both catalytic RNAs and retroelements. Initially identified in organellar genomes of plants and lower eukaryotes, group II introns are now being discovered in increasing numbers in bacterial genomes. Few of the newly sequenced bacterial introns are correctly identified or annotated by those who sequenced them. Here we have compiled and thoroughly analyzed group II introns and their fragments in bacterial DNA sequences reported to GenBank. Intron distribution in bacterial genomes differs markedly from the distribution in organellar genomes. Bacterial introns are not inserted into conserved genes, are often inserted outside of genes altogether and are frequently fragmented, suggesting a high rate of intron gain and loss. Some introns have multiple natural homing sites while others insert after transcriptional terminators. All bacterial group II introns identified to date encode reverse transcriptase open reading frames and are either active retroelements or derivatives of retroelements. Together, these observations suggest that group II introns in bacteria behave primarily as retroelements rather than as introns, and that the strategy for group II intron survival in bacteria is fundamentally different from intron survival in organelles.
The fission yeast, Schizosaccharomyces pombe, is an important model species with a low intron density. Previous studies showed extensive intron losses during its evolution. To test the models of intron loss and gain in fission yeasts, we conducted a comparative genomic analysis in four Schizosaccharomyces species. Both intronization and de-intronization were observed, although both were at a low frequency. A de-intronization event was caused by a degenerative mutation in the branch site. Four cases of imprecise intron losses were identified, indicating that genomic deletion is not a negligible mechanism of intron loss. Most intron losses were precise deletions of introns, and were significantly biased to the 3′ sides of genes. Adjacent introns tended to be lost simultaneously. These observations indicated that the main force shaping the exon-intron structures of fission yeasts was precise intron losses mediated by reverse transcriptase. We found two cases of intron gains caused by tandem genomic duplication, but failed to identify the mechanisms for the majority of the intron gain events observed. In addition, we found that intron-lost and intron-gained genes had certain similar features, such as similar Gene Ontology categories and expression levels.
Little is known about the patterns of intron gain and loss or the relative contributions of these two processes to gene evolution. To investigate the dynamics of intron evolution, we analyzed orthologous genes from four filamentous fungal genomes and determined the pattern of intron conservation. We developed a probabilistic model to estimate the most likely rates of intron gain and loss giving rise to these observed conservation patterns. Our data reveal the surprising importance of intron gain. Between about 150 and 250 gains and between 150 and 350 losses were inferred in each lineage. We discuss one gene in particular (encoding 1-phosphoribosyl-5-pyrophosphate synthetase) that displays an unusually high rate of intron gain in multiple lineages. It has been recognized that introns are biased towards the 5′ ends of genes in intron-poor genomes but are evenly distributed in intron-rich genomes. Current models attribute this bias to 3′ intron loss through a poly-adenosine-primed reverse transcription mechanism. Contrary to standard models, we find no increased frequency of intron loss toward the 3′ ends of genes. Thus, recent intron dynamics do not support a model whereby 5′ intron positional bias is generated solely by 3′-biased intron loss.
A comparative study of four fungal genomes reveals the patterns of intron gain and loss over several hundred million years of evolution
Protein-coding genes in eukaryotes are interrupted by introns, but intron densities widely differ between eukaryotic lineages. Vertebrates, some invertebrates and green plants have intron-rich genes, with 6–7 introns per kilobase of coding sequence, whereas most of the other eukaryotes have intron-poor genes. We reconstructed the history of intron gain and loss using a probabilistic Markov model (Markov Chain Monte Carlo, MCMC) on 245 orthologous genes from 99 genomes representing three of the five supergroups of eukaryotes for which multiple genome sequences are available. Intron-rich ancestors are confidently reconstructed for each major group, with 53 to 74% of the human intron density inferred with 95% confidence for the Last Eukaryotic Common Ancestor (LECA). The results of the MCMC reconstruction are compared with the reconstructions obtained using Maximum Likelihood (ML) and Dollo parsimony methods. An excellent agreement between the MCMC and ML inferences is demonstrated whereas Dollo parsimony introduces a noticeable bias in the estimations, typically yielding lower ancestral intron densities than MCMC and ML. Evolution of eukaryotic genes was dominated by intron loss, with substantial gain only at the bases of several major branches including plants and animals. The highest intron density, 120 to 130% of the human value, is inferred for the last common ancestor of animals. The reconstruction shows that the entire line of descent from LECA to mammals was intron-rich, a state conducive to the evolution of alternative splicing.
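As a point of reference for the Dollo parsimony method named above, the following is a minimal sketch of Dollo reconstruction for a single intron position. It relies only on the general Dollo assumption that a character is gained once and may be lost many times, so the intron is inferred present on the smallest subtree connecting all leaves that carry it. The tree encoding, node names and example data are hypothetical; this is not the implementation used in the study, and the MCMC and ML reconstructions are not reproduced here.

```python
from typing import Dict, List, Set


def dollo_reconstruct(
    children: Dict[str, List[str]], root: str, present: Set[str]
) -> Dict[str, bool]:
    """Infer presence/absence of one intron at every node under Dollo parsimony.

    children maps each internal node to its child nodes; leaves have no entry.
    present is the set of leaves (extant genomes) that carry the intron.
    """
    counts: Dict[str, int] = {}

    def count(node: str) -> int:
        # Number of intron-carrying leaves in the subtree rooted at node.
        kids = children.get(node, [])
        if not kids:
            counts[node] = 1 if node in present else 0
        else:
            counts[node] = sum(count(k) for k in kids)
        return counts[node]

    total = count(root)
    states: Dict[str, bool] = {}
    for node, c in counts.items():
        kids = children.get(node, [])
        if c == 0:
            states[node] = False  # no carrier below this node
        elif c < total:
            states[node] = True   # carriers both inside and outside the subtree
        else:
            # All carriers lie below this node: it holds the intron only if it
            # is their last common ancestor, i.e. no single child has them all.
            states[node] = not any(counts[k] == total for k in kids)
    return states


# Hypothetical tree: root -> (X, D); X -> (Y, C); Y -> (A, B).
tree = {"root": ["X", "D"], "X": ["Y", "C"], "Y": ["A", "B"]}
print(dollo_reconstruct(tree, "root", {"A", "C"}))
```

In this example the intron is inferred to be gained on the branch leading to X and lost on the branch leading to B, which is the single-gain scenario that Dollo parsimony enforces.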
In eukaryotes, protein-coding genes are interrupted by non-coding introns. The intron densities widely differ, from 6–7 introns per kilobase of coding sequence in vertebrates, some invertebrates and plants, to only a few introns across the entire genome in many unicellular forms. We applied a robust statistical methodology, Markov Chain Monte Carlo, to reconstruct the history of intron gain and loss throughout the evolution of eukaryotes using a set of 245 homologous genes from 99 genomes that represent the diversity of eukaryotes. Intron-rich ancestors were confidently inferred for each major eukaryotic group including 53% to 74% of the human intron density for the last eukaryotic common ancestor, and 120% to 130% of the human value for the last common ancestor of animals. Evolution of eukaryotic genes involved primarily intron loss, with substantial gain only at the bases of several major branches including plants and animals. Thus, the common ancestor of all extant eukaryotes was a complex organism with a gene architecture resembling those in multicellular organisms. The line of descent from the last common ancestor to mammals was an uninterrupted intron-rich state that, given the error-prone splicing in intron-rich organisms, was conducive to the elaboration of functional alternative splicing.
The evolutionary forces responsible for intron loss are unresolved. Whereas research has focused on protein-coding genes, here we analyze noncoding small nucleolar RNA (snoRNA) genes in which introns, rather than exons, are typically the functional elements. Within the yeast lineage exemplified by the human pathogen Candida albicans, we find—through deep RNA sequencing and genome-wide annotation of splice junctions—extreme compaction and loss of associated exons, but retention of snoRNAs within introns. In the Saccharomyces yeast lineage, however, we find it is the introns that have been lost through widespread degeneration of splicing signals. This intron loss, perhaps facilitated by innovations in snoRNA processing, is distinct from that observed in protein-coding genes with respect to both mechanism and evolutionary timing.
A recent study makes considerable progress towards answering the question of how genes acquire introns by identifying numerous recently gained introns in nematodes.
The long-standing question of how genes acquire introns has provoked much debate. A recent study makes considerable progress by identifying numerous recently gained introns in nematodes - although it remains difficult to distinguish definitively between models of intron gain.
Orthologous genes from distant eukaryotic species, e.g. animals and plants, share up to 25–30% intron positions. However, the relative contributions of evolutionary conservation and parallel gain of new introns to this pattern remain unknown. Here, the extent of independent insertion of introns in the same sites (parallel gain) in orthologous genes from phylogenetically distant eukaryotes is assessed within the framework of the protosplice site model. It is shown that protosplice sites are no more conserved during evolution of eukaryotic gene sequences than random sites. Simulation of intron insertion into protosplice sites with the observed protosplice site frequencies and intron densities shows that parallel gain can account for only a small fraction (5–10%) of shared intron positions in distantly related species. Thus, the presence of numerous introns in the same positions in orthologous genes from distant eukaryotes, such as animals, fungi and plants, appears to reflect mostly bona fide evolutionary conservation.
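The idea of simulating independent intron insertion into a shared pool of protosplice sites can be sketched as a simple Monte Carlo experiment. This is a simplified illustration, not the simulation performed in the study: it assumes that every protosplice site is equally likely to receive an intron, that both lineages share exactly the same set of sites, and it uses made-up site and intron counts. All names and parameter values are hypothetical.

```python
import random


def simulate_parallel_gain(
    n_sites: int, introns_a: int, introns_b: int, trials: int = 2000, seed: int = 0
) -> float:
    """Mean number of intron positions shared by chance between two lineages.

    Each lineage independently places its introns, without replacement and
    uniformly at random, into the same n_sites protosplice sites.
    """
    rng = random.Random(seed)
    sites = range(n_sites)
    total_shared = 0
    for _ in range(trials):
        a = set(rng.sample(sites, introns_a))
        b = set(rng.sample(sites, introns_b))
        total_shared += len(a & b)
    return total_shared / trials


# Hypothetical numbers: 200 protosplice sites, 20 and 30 introns.
mean_shared = simulate_parallel_gain(200, 20, 30)
expected = 20 * 30 / 200  # analytical expectation under the same assumptions
print(mean_shared, expected)
```

Under these assumptions the expected number of coincident positions is introns_a × introns_b / n_sites, so the simulated mean should approach that value; comparing such a chance expectation with the observed number of shared positions is the logic by which the contribution of parallel gain is bounded.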
Intron number varies considerably among genomes, but despite their fundamental importance, the mutational mechanisms and evolutionary processes underlying the expansion of intron number remain unknown. Here we show that Drosophila, in contrast to most eukaryotic lineages, is still undergoing a dramatic rate of intron gain. These novel introns carry significantly weaker splice sites that may impede their identification by the spliceosome. Novel introns are more likely to encode a premature termination codon (PTC), indicating that nonsense-mediated decay (NMD) functions as a backup for weak splicing of new introns. Our data suggest that new introns originate when genomic insertions with weak splice sites are hidden from selection by NMD. This mechanism reduces the sequence requirement imposed on novel introns and implies that the capacity of the spliceosome to recognize weak splice sites was a prerequisite for intron gain during eukaryotic evolution.
The surprising observation 30 years ago that genes are interrupted by non-coding introns changed our view of gene architecture. Intron number varies dramatically among species, ranging from nine introns/gene in humans to less than one in some simple eukaryotes. Here we ask where new introns come from and how they are maintained in a population. We find that novel introns do not arise from pre-existing introns, although the mechanisms that generate novel introns remain unclear. We also show that novel introns carry only weak signals for their identification and removal, and therefore depend on nonsense-mediated decay (NMD). NMD maintains RNA quality control by degrading transcripts that have not been spliced properly. We propose that NMD shelters novel introns from natural selection. This increases the likelihood that a novel intron will rise in frequency and be maintained within a population, thus increasing the rate of intron gain.
Retrogenes generally do not contain introns. However, in some instances, retrogenes may recruit internal exonic sequences as introns, which is known as intronization. A retrogene that undergoes intronization is a good model with which to investigate the origin of introns. Nevertheless, previously, only two cases in vertebrates have been reported.
In this study, we systematically screened the human (Homo sapiens) genome for retrogenes that evolved introns and analyzed their patterns in structure, expression and origin. In total, we identified nine intron-containing retrogenes. Alignment of pairs of retrogenes and their parents indicated that, in addition to intronization (five cases), retrogenes also may have gained introns by insertion of external sequences into the genes (one case) or reversal of the orientation of transcription (three cases). Interestingly, many intronizations were promoted not by base substitutions but by cryptic splice sites, which were silent in the parental genes but active in the retrogenes. We also observed that the majority of introns generated by intronization did not involve frameshifts.
Intron gains in retrogenes are not as rare as previously thought. Furthermore, diverse mechanisms may lead to intron creation in retrogenes. The activation of cryptic splice sites in the intronization of retrogenes may be triggered by the change of gene structure after retroposition. A high percentage of non-frameshift introns in retrogenes may be because non-frameshift introns do not dramatically affect host proteins. Introns generated by intronization in human retrogenes are generally young, which is consistent with previous findings for Caenorhabditis elegans. Our results provide novel insights into the evolutionary role of introns.
Evolution of the exon-intron structure of eukaryotic genes has been a matter of long-standing, intensive debate. The introns-early concept, later rebranded ‘introns first’, held that protein-coding genes were interrupted by numerous introns even at the earliest stages of life's evolution and that introns played a major role in the origin of proteins by facilitating recombination of sequences coding for small protein/peptide modules. The introns-late concept held that introns emerged only in eukaryotes and that new introns have been accumulating continuously throughout eukaryotic evolution. Analysis of orthologous genes from completely sequenced eukaryotic genomes revealed numerous shared intron positions in orthologous genes from animals and plants and even between animals, plants and protists, suggesting that many ancestral introns have persisted since the last eukaryotic common ancestor (LECA). Reconstructions of intron gain and loss using the growing collection of genomes of diverse eukaryotes and increasingly advanced probabilistic models convincingly show that the LECA and the ancestors of each eukaryotic supergroup had intron-rich genes, with intron densities comparable to those in the most intron-rich modern genomes such as those of vertebrates. The subsequent evolution in most lineages of eukaryotes involved primarily loss of introns, with only a few episodes of substantial intron gain that might have accompanied major evolutionary innovations such as the origin of metazoa. The original invasion of self-splicing Group II introns, presumably originating from the mitochondrial endosymbiont, into the genome of the emerging eukaryote might have been a key factor of eukaryogenesis that, in particular, triggered the origin of endomembranes and the nucleus. Conversely, splicing errors gave rise to alternative splicing, a major contribution to the biological complexity of multicellular eukaryotes.
There is no indication that any prokaryote has ever possessed a spliceosome or introns in protein-coding genes, other than relatively rare mobile self-splicing introns. Thus, the introns-first scenario is not supported by any evidence; rather, the exon-intron structure of protein-coding genes appears to have evolved concomitantly with the eukaryotic cell, and introns were a major factor of evolution throughout the history of eukaryotes. This article was reviewed by I. King Jordan, Manuel Irimia (nominated by Anthony Poole), Tobias Mourier (nominated by Anthony Poole), and Fyodor Kondrashov. For the complete reports, see the Reviewers’ Reports section.
Intron sliding; Intron gain; Intron loss; Spliceosome; Splicing signals; Evolution of exon/intron structure; Alternative splicing; Phylogenetic trees; Mobile domains; Eukaryotic ancestor
Hemiascomycetous yeasts have intron-poor genomes with very few cases of alternative splicing. Most of the reported examples result from intron retention in Saccharomyces cerevisiae and some have been shown to be functionally significant. Here we used transcriptome-wide approaches to evaluate the mechanisms underlying the generation of alternative transcripts in Yarrowia lipolytica, a yeast highly divergent from S. cerevisiae.
Experimental investigation of Y. lipolytica gene models identified several cases of alternative splicing, mostly generated by intron retention, principally affecting the first intron of the gene. The retention of introns almost invariably creates a premature termination codon, as a direct consequence of the structure of intron boundaries. An analysis of Y. lipolytica introns revealed that introns of multiples of three nucleotides in length, particularly those without stop codons, were underrepresented. In other organisms, premature termination codon-containing transcripts are targeted for degradation by the nonsense-mediated mRNA decay (NMD) machinery. In Y. lipolytica, homologs of S. cerevisiae UPF1 and UPF2 genes were identified, but not UPF3. The inactivation of Y. lipolytica UPF1 and UPF2 resulted in the accumulation of unspliced transcripts of a test set of genes.
Y. lipolytica is the hemiascomycete with the most intron-rich genome sequenced to date, and it has several unusual genes with large introns or alternative transcription start sites, or introns in the 5' UTR. Our results suggest Y. lipolytica intron structure is subject to significant constraints, leading to the under-representation of stop-free introns. Consequently, intron-containing transcripts are degraded by a functional NMD pathway.
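The link between intron length, in-frame stop codons and the fate of an intron-retaining transcript can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the function name, the phase convention and the toy sequences are assumptions made for illustration, and codons that straddle an exon-intron boundary are ignored for simplicity.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def retention_outcome(intron, phase=0):
    """Classify the effect of retaining `intron` on the reading frame.

    `phase` (hypothetical convention) is the number of nucleotides of the
    current codon already supplied by the upstream exon (0, 1 or 2).
    Simplification: only codons lying fully inside the intron are checked.
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    seq = intron.upper()
    # Index of the first codon that lies entirely within the intron.
    offset = (3 - phase) % 3
    for i in range(offset, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return "premature_stop"       # expected NMD target
    if len(seq) % 3 == 0:
        return "in_frame_stop_free"       # retention keeps the frame open
    return "frameshift"                   # downstream frame is shifted

# Toy sequences, not real Y. lipolytica introns.
for toy in ("GTATGTTAG", "GTATGTCAG", "GTATGTACAG"):
    print(toy, retention_outcome(toy))
```

Only the "in_frame_stop_free" class would yield a retained-intron transcript that escapes premature termination, which is the class reported above as under-represented.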
The evolution of spliceosomal introns remains poorly understood. Although many approaches have been used to infer intron evolution from the patterns of intron position conservation, the results to date have been contradictory. In this paper, we address the problem using a novel maximum likelihood method, which allows estimation of the frequency of intron insertion target sites, together with the rates of intron gain and loss. We analyzed the pattern of 10,044 introns (7,221 intron positions) in the conserved regions of 684 sets of orthologs from seven eukaryotes. We determined that there is an average of one target site per 11.86 base pairs (bp) (95% confidence interval, 9.27 to 14.39 bp). In addition, our results showed that: (i) overall intron gains are ~25% greater than intron losses, although specific patterns vary with time and lineage; (ii) parallel gains account for ~18.5% of shared intron positions; and (iii) reacquisition following loss accounts for ~0.5% of all intron positions. Our results should assist in resolving the long-standing problem of inferring the evolution of spliceosomal introns.
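To make the quantities in this abstract more concrete, the following sketch shows a generic two-state (intron absent/present) continuous-time Markov model of gain and loss at a single target site, together with the expected number of target sites implied by the reported density of one site per 11.86 bp. It is a textbook simplification, not the maximum likelihood method of the paper; the function names, rate values and region length are hypothetical.

```python
import math

def transition_probs(gain, loss, t):
    """Transition probabilities for one target site over branch length t.

    State 0 = intron absent, state 1 = intron present.
    `gain` and `loss` are rates per unit time (illustrative values only).
    """
    total = gain + loss
    if total <= 0:
        raise ValueError("gain + loss must be positive")
    decay = 1.0 - math.exp(-total * t)
    p01 = gain / total * decay   # absent -> present (gain)
    p10 = loss / total * decay   # present -> absent (loss)
    return {"00": 1.0 - p01, "01": p01, "10": p10, "11": 1.0 - p10}

def expected_target_sites(length_bp, bp_per_site=11.86):
    """Expected number of insertion target sites in a conserved region,
    using the point estimate of one site per 11.86 bp."""
    return length_bp / bp_per_site

probs = transition_probs(gain=0.002, loss=0.01, t=50.0)
print(probs)
print(expected_target_sites(1000))  # sites expected in a 1,000 bp region
```

In a full likelihood analysis, probabilities of this kind would be combined across all branches of the species tree and all candidate sites, with the target-site frequency optimized alongside the rates.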
When did spliceosomal introns originate, and what is their role? These questions are the central subject of the introns-early versus introns-late debate. Inference of intron evolution from the pattern of intron position conservation is vital for resolving this debate. So far, various methods based on two approaches, maximum parsimony (MP) and maximum likelihood (ML), have been developed, but their results are contradictory. The differences between previous ML results are due predominantly to differing assumptions concerning the frequency of target sites for intron insertion. This paper describes a new ML method that treats this frequency as a parameter requiring optimization. Using the pattern of intron positions in conserved regions of 684 clusters of gene orthologs from seven eukaryotes, the authors found that, on average, there is one target site per ~12 base pairs. The results of intron evolution inferred using this optimal frequency are more definitive than previous ML results. Since the ML method is preferred to the MP one for large datasets, the current results should be the most reliable ones to date. The results show that during the course of evolution there have been slightly more intron gains than losses, and thus they favor introns-late. These results should shed new light on our understanding of intron evolution.
U12-type introns are spliced by the U12-dependent spliceosome and are present in the genomes of many higher eukaryotic lineages including plants, chordates and some invertebrates. However, due to their relatively recent discovery and a systematic bias against recognition of non-canonical splice sites in general, the introns defined by U12-type splice sites are under-represented in genome annotations. Such under-representation compounds the already difficult problem of determining gene structures. It also impedes attempts to study these introns genome-wide or phylum-wide. The resource described here, the U12 Intron Database (U12DB), aims to catalog the U12-type introns of completely sequenced eukaryotic genomes in a framework that groups orthologous introns with each other. This will aid further investigations into the evolution and mechanism of U12-dependent splicing as well as assist ongoing genome annotation efforts. Public access to the U12DB is available at .