The evolution of the exon-intron structure of eukaryotic genes has been a matter of long-standing, intensive debate. The introns-early concept, later rebranded ‘introns first’, held that protein-coding genes were interrupted by numerous introns even at the earliest stages of life's evolution and that introns played a major role in the origin of proteins by facilitating recombination of sequences coding for small protein/peptide modules. The introns-late concept held that introns emerged only in eukaryotes and that new introns have been accumulating continuously throughout eukaryotic evolution. Analysis of orthologous genes from completely sequenced eukaryotic genomes revealed numerous shared intron positions in orthologous genes from animals and plants, and even between animals, plants and protists, suggesting that many ancestral introns have persisted since the last eukaryotic common ancestor (LECA). Reconstructions of intron gain and loss using the growing collection of genomes of diverse eukaryotes and increasingly advanced probabilistic models convincingly show that the LECA and the ancestors of each eukaryotic supergroup had intron-rich genes, with intron densities comparable to those in the most intron-rich modern genomes, such as those of vertebrates. The subsequent evolution in most eukaryotic lineages involved primarily loss of introns, with only a few episodes of substantial intron gain that might have accompanied major evolutionary innovations such as the origin of metazoa. The original invasion of self-splicing Group II introns, presumably originating from the mitochondrial endosymbiont, into the genome of the emerging eukaryote might have been a key factor in eukaryogenesis that, in particular, triggered the origin of endomembranes and the nucleus. Conversely, splicing errors gave rise to alternative splicing, a major contributor to the biological complexity of multicellular eukaryotes.
There is no indication that any prokaryote has ever possessed a spliceosome or introns in protein-coding genes, other than relatively rare mobile self-splicing introns. Thus, the introns-first scenario is not supported by any evidence; rather, the exon-intron structure of protein-coding genes appears to have evolved concomitantly with the eukaryotic cell, and introns were a major factor of evolution throughout the history of eukaryotes. This article was reviewed by I. King Jordan, Manuel Irimia (nominated by Anthony Poole), Tobias Mourier (nominated by Anthony Poole), and Fyodor Kondrashov. For the complete reports, see the Reviewers’ Reports section.
Intron sliding; Intron gain; Intron loss; Spliceosome; Splicing signals; Evolution of exon/intron structure; Alternative splicing; Phylogenetic trees; Mobile domains; Eukaryotic ancestor
Many spliceosomal introns exist in the eukaryotic nuclear genome. Despite much research, the evolution of spliceosomal introns remains poorly understood. In this paper, we tried to gain insights into intron evolution from a novel perspective by comparing the gene structures of cytoplasmic ribosomal proteins (CRPs) and mitochondrial ribosomal proteins (MRPs), which are held to be of archaeal and bacterial origin, respectively. We analyzed 25 homologous pairs of CRP and MRP genes that together had a total of 527 intron positions. We found that all 12 of the intron positions shared by CRP and MRP genes resulted from parallel intron gains and none could be considered to be “conserved,” i.e., descendants of the same ancestor. This was supported further by the high frequency of proto-splice sites at these shared positions; proto-splice sites are proposed to be sites for intron insertion. Although we could not definitively disprove that spliceosomal introns were already present in the last universal common ancestor, our results lend more support to the idea that introns were gained late. At least, our results show that MRP genes were intronless at the time of endosymbiosis. The parallel intron gains between CRP and MRP genes accounted for 2.3% of total intron positions, which should provide a reliable estimate for future inferences of intron evolution.
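To call an intron position "shared" between a CRP gene and its MRP homolog, each gene's introns first have to be projected onto a protein alignment. The sketch below is not the authors' pipeline. It assumes a simple convention: each intron is recorded as a 0-based codon index plus its phase, and gaps are written as "-". The sequences are hypothetical toy data. A real analysis must also handle alignment uncertainty and possible intron sliding.

```python
def alignment_columns(aligned_protein):
    """Map each ungapped residue index (0-based) to its alignment column."""
    return [col for col, aa in enumerate(aligned_protein) if aa != "-"]


def shared_intron_positions(aln_a, introns_a, aln_b, introns_b):
    """Return the (column, phase) pairs at which both genes carry an intron.

    Each intron is given as (codon_index, phase): codon_index is the 0-based
    residue the intron interrupts (phase 1 or 2) or precedes (phase 0) in the
    ungapped protein. Two introns count as shared only if they fall in the
    same alignment column with the same phase.
    """
    cols_a = alignment_columns(aln_a)
    cols_b = alignment_columns(aln_b)
    pos_a = {(cols_a[i], phase) for i, phase in introns_a}
    pos_b = {(cols_b[i], phase) for i, phase in introns_b}
    return sorted(pos_a & pos_b)


# Hypothetical toy alignment of a CRP-like and an MRP-like protein.
shared = shared_intron_positions("MKV-LT", [(2, 0), (3, 1)],
                                 "MKVALT", [(2, 0), (4, 1)])
print(shared)  # [(2, 0), (4, 1)]

# The 2.3% figure in the abstract: 12 shared positions out of 527.
print(round(100 * 12 / 527, 1))  # 2.3
```

The column-plus-phase test is deliberately strict. Introns that sit in the same codon but in different phases are not counted as shared.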
Genes in eukaryotes are usually interrupted by extra stretches of DNA sequence, called introns, that have to be removed after the genes are transcribed into RNA. Why do introns exist in eukaryotic genes? Why is intron density higher in higher eukaryotes? Much about introns remains unknown. This research attempts to clarify the evolutionary process by which introns arose by comparing the gene structures of two types of ribosomal proteins: one in the cytoplasm and the other in the mitochondria of the cell. Since the cytoplasm and mitochondria are of archaeal and bacterial origin, respectively, cytoplasmic ribosomal proteins (CRPs) and mitochondrial ribosomal proteins (MRPs) are believed to have diverged at the same time as archaea and bacteria. Thus, a comparative analysis of CRP and MRP genes may reveal whether introns already existed in the last common ancestor of archaea and bacteria (introns-early) or whether they emerged late (introns-late). The results make it clear, at least, that all of the introns in MRP genes were gained during the course of eukaryotic evolution and therefore lend more support to the introns-late theory.
Studies of mobile group II introns from a thermophilic cyanobacterium reveal how these introns proliferate within genomes and might explain the origin of introns and retroelements in higher organisms.
Mobile group II introns, which are found in bacterial and organellar genomes, are site-specific retroelements hypothesized to be evolutionary ancestors of spliceosomal introns and retrotransposons in higher organisms. Most bacteria, however, contain no more than one or a few group II introns, making it unclear how introns could have proliferated to higher copy numbers in eukaryotic genomes. An exception is the thermophilic cyanobacterium Thermosynechococcus elongatus, which contains 28 closely related copies of a group II intron, constituting ∼1.3% of the genome. Here, by using a combination of bioinformatics and mobility assays at different temperatures, we identified mechanisms that contribute to the proliferation of T. elongatus group II introns. These mechanisms include divergence of DNA target specificity to avoid target site saturation; adaptation of some intron-encoded reverse transcriptases to splice and mobilize multiple degenerate introns that do not encode reverse transcriptases, leading to a common splicing apparatus; and preferential insertion within other mobile introns or insertion elements, which provide new unoccupied sites in expanding non-essential DNA regions. Additionally, unlike mesophilic group II introns, the thermophilic T. elongatus introns rely on elevated temperatures to help promote DNA strand separation, enabling access to a larger number of DNA target sites by base pairing of the intron RNA, with minimal constraint from the reverse transcriptase. Our results provide insight into group II intron proliferation mechanisms and show that higher temperatures, which are thought to have prevailed on Earth during the emergence of eukaryotes, favor intron proliferation by increasing the accessibility of DNA target sites. We also identify actively mobile thermophilic introns, which may be useful for structural studies, gene targeting in thermophiles, and as a source of thermostable reverse transcriptases.
Group II introns are bacterial mobile elements thought to be ancestors of introns and retroelements in higher organisms. They comprise a catalytically active intron RNA and an intron-encoded reverse transcriptase, which promotes splicing of the intron from precursor RNA and integration of the excised intron into new genomic sites. While most bacteria have small numbers of group II introns, in the thermophilic cyanobacterium Thermosynechococcus elongatus, a single intron has proliferated and constitutes 1.3% of the genome. Here, we investigated how the T. elongatus introns proliferated to such high copy numbers. We found divergence of DNA target specificity, evolution of reverse transcriptases that splice and mobilize multiple degenerate introns, and preferential insertion into other mobile introns or insertion elements, which provide new integration sites in non-essential regions of the genome. Further, unlike mesophilic group II introns, the thermophilic T. elongatus introns rely on higher temperatures to help promote DNA strand separation, facilitating access to DNA target sites. We speculate on how these mechanisms, including elevated temperature, might have contributed to intron proliferation in early eukaryotes. We also identify actively mobile thermophilic introns, which may be useful for structural studies and biotechnological applications.
The spliceosome, a sophisticated molecular machine involved in the removal of intervening sequences from the coding sections of eukaryotic genes, appeared and subsequently evolved rapidly during the early stages of eukaryotic evolution. The last eukaryotic common ancestor (LECA) had both complex spliceosomal machinery and some spliceosomal introns, yet little is known about the early stages of evolution of the spliceosomal apparatus. The Sm/Lsm family of proteins has been suggested as one of the earliest components of the emerging spliceosome and hence provides a first in-depth glimpse into the evolving spliceosomal apparatus. An analysis of 335 Sm and Sm-like genes from 80 species across all three domains of life yields two significant observations. First, the eukaryotic Sm/Lsm family underwent two rapid waves of duplication with subsequent divergence resulting in 14 distinct genes. Each wave resulted in a more sophisticated spliceosome, reflecting a possible jump in the complexity of the evolving eukaryotic cell. Second, an unusually high degree of conservation in intron positions is observed within individual orthologous Sm/Lsm genes and between some of the Sm/Lsm paralogs. This suggests that functional spliceosomal introns existed before the emergence of the complete Sm/Lsm family of proteins; hence, spliceosomal machinery with considerably fewer components than today's spliceosome was already functional.
The spliceosome is a complex molecular machine that removes intervening sequences (introns) from mRNAs. It is unique to eukaryotes. Although prokaryotes have self-splicing introns, they completely lack spliceosomal introns and the spliceosome itself. Yet even the simplest eukaryotic organisms have introns and a rather complex spliceosomal apparatus. Little is known about how this amazing machine rapidly evolved in early eukaryotes. Here, we attempt to reconstruct a part of this evolutionary process using one of the most fundamental components of the spliceosome—the Sm and Lsm family of proteins. Using sequence and structure analysis as well as the analysis of the intron positions in Sm and Lsm genes in conjunction with a wealth of published data, we propose a plausible scenario for some aspects of spliceosomal evolution. In particular, we suggest that the Lsm family of genes could have been the first and the most essential component that allowed rudimentary splicing of early spliceosomal introns. Extensive duplications of Lsm genes and the later rise of the Sm gene family likely reflect a gradual increase in complexity of the spliceosome.
Numerous instances of presence/absence variations for introns have been documented in eukaryotes, and some cases of recurrent loss of the same intron have been suggested. However, there has been no comprehensive or phylogenetically deep analysis of recurrent intron loss. Of 883 cases of intron presence/absence variation that we detected in five sequenced grass genomes, 93 were confirmed as recurrent losses and the rest could be explained by single losses (652) or single gains (118). No case of recurrent intron gain was observed. Deep phylogenetic analysis often indicated that apparent intron gains were actually numerous independent losses of the same intron. Recurrent loss exhibited extreme non-randomness, in that some introns were removed independently in many lineages. The two larger genomes, maize and sorghum, were found to have a higher rate of both recurrent loss and overall loss and/or gain than foxtail millet, rice or Brachypodium. Adjacent introns and small introns were found to be preferentially lost. Genes that lost introns exhibited a high frequency of germ line or early embryogenesis expression. In addition, flanking exon A+T-richness and intron TG/CG ratios were higher in retained introns. This last result suggests that epigenetic status, as evidenced by a loss of methylated CG dinucleotides, may play a role in the process of intron loss. This study provides the first comprehensive analysis of recurrent intron loss, makes a series of novel findings on the patterns of recurrent intron loss during the evolution of the grass family, and provides insight into the molecular mechanism(s) underlying intron loss.
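The first step in separating single losses or gains from recurrent loss is to ask how many presence/absence changes a pattern requires on the species tree. The sketch below uses standard Fitch parsimony for one binary character. It is not the study's procedure, which relied on deeper phylogenetic sampling to confirm recurrence. The grass topology groups maize, sorghum and foxtail millet (PACMAD) apart from rice and Brachypodium (BEP). The intron pattern is hypothetical.

```python
def fitch(tree, states):
    """Fitch parsimony for one binary character (intron present=1/absent=0).

    tree: a leaf name (str) or a (left, right) tuple of subtrees.
    states: dict mapping each leaf name to 0 or 1.
    Returns (candidate root states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint child state sets: take their union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


grasses = ((("maize", "sorghum"), "foxtail_millet"), ("rice", "brachypodium"))

# Hypothetical intron missing only from maize and rice.
pattern = {"maize": 0, "sorghum": 1, "foxtail_millet": 1,
           "rice": 0, "brachypodium": 1}
print(fitch(grasses, pattern))  # ({1}, 2): present at the root, two losses
```

A pattern that needs two or more changes with the intron present at the root cannot be explained by a single loss. That makes it a candidate for recurrent loss, which then has to be confirmed by denser sampling.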
Spliceosomal introns are nucleotide sequences that interrupt coding regions of eukaryotic genes and are removed by RNA splicing after transcription. Recent studies have reported several examples of possible recurrent intron loss or gain, i.e., introns that are independently removed from or inserted into the same site more than once in an investigated phylogeny. However, the frequency, evolutionary patterns and other characteristics of recurrent intron turnover remain unknown. We provide results from the first comprehensive analysis of recurrent intron turnover within a plant family and show that recurrent intron loss represents a considerable portion of all intron losses identified and that intron loss events far outnumber intron gain events. We also demonstrate that recurrent intron loss is non-random, affecting only a small number of introns that are repeatedly lost, and that different lineages show significantly different rates of intron loss. Our results suggest a possible role of DNA methylation in the process of intron loss. Moreover, this study provides strong support for the model of intron loss by reverse transcriptase-mediated conversion of genes by their processed mRNA transcripts.
The evolution of spliceosomal introns remains poorly understood. Although many approaches have been used to infer intron evolution from the patterns of intron position conservation, the results to date have been contradictory. In this paper, we address the problem using a novel maximum likelihood method, which allows estimation of the frequency of intron insertion target sites, together with the rates of intron gain and loss. We analyzed the pattern of 10,044 introns (7,221 intron positions) in the conserved regions of 684 sets of orthologs from seven eukaryotes. We determined that there is an average of one target site per 11.86 base pairs (bp) (95% confidence interval, 9.27 to 14.39 bp). In addition, our results showed that: (i) overall intron gains are ~25% greater than intron losses, although specific patterns vary with time and lineage; (ii) parallel gains account for ~18.5% of shared intron positions; and (iii) reacquisition following loss accounts for ~0.5% of all intron positions. Our results should assist in resolving the long-standing problem of inferring the evolution of spliceosomal introns.
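Likelihood models of intron gain and loss are usually built on a two-state continuous-time Markov chain at each potential intron site. The sketch below shows that chain. It assumes a single gain rate and a single loss rate shared by all branches, and an ancestral state drawn from the stationary distribution. The method described above also estimates target-site frequency and allows lineage-specific patterns; the sketch omits both. The rates and branch lengths are arbitrary.

```python
import math


def transition_matrix(gain, loss, t):
    """P[i][j] = probability of state j after time t, starting from state i.

    States: 0 = intron absent at a target site, 1 = intron present.
    Closed form for a two-state chain, e.g.
    P(1 -> 1) = gain/(gain+loss) + loss/(gain+loss) * exp(-(gain+loss) t).
    """
    total = gain + loss
    decay = math.exp(-total * t)
    p01 = gain / total * (1.0 - decay)
    p10 = loss / total * (1.0 - decay)
    return [[1.0 - p01, p01],
            [p10, 1.0 - p10]]


def pair_likelihood(gain, loss, t1, t2, state_a, state_b):
    """Likelihood of the states observed in two species that descend from one
    ancestor along branches of length t1 and t2. The ancestral state is drawn
    from the stationary distribution (loss, gain) / (gain + loss)."""
    total = gain + loss
    root_prior = [loss / total, gain / total]
    pa = transition_matrix(gain, loss, t1)
    pb = transition_matrix(gain, loss, t2)
    return sum(root_prior[r] * pa[r][state_a] * pb[r][state_b] for r in (0, 1))


# Arbitrary illustrative parameters: intron present in both species.
print(pair_likelihood(0.2, 0.3, 1.0, 0.5, 1, 1))
```

Summing over the unobserved ancestral state in this way is the two-leaf case of Felsenstein's pruning algorithm. Larger trees apply the same step recursively from the tips to the root.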
When did spliceosomal introns originate, and what is their role? These questions are the central subject of the introns-early versus introns-late debate. Inferring intron evolution from the pattern of intron position conservation is vital for resolving this debate. So far, methods based on two approaches, maximum parsimony (MP) and maximum likelihood (ML), have been developed, but their results are contradictory. The differences between previous ML results are due predominantly to differing assumptions concerning the frequency of target sites for intron insertion. This paper describes a new ML method that treats this frequency as a parameter to be optimized. Using the pattern of intron positions in conserved regions of 684 clusters of gene orthologs from seven eukaryotes, the authors found that, on average, there is one target site per ~12 base pairs. The results of intron evolution inferred using this optimal frequency are more definitive than previous ML results. Since the ML method is preferred to the MP method for large datasets, the current results should be the most reliable ones to date. The results show that during the course of evolution there have been slightly more intron gains than losses, and thus they favor introns-late. These results should shed new light on our understanding of intron evolution.
The origin of spliceosomal introns is the central subject of the introns-early versus introns-late debate. The distribution of intron phases is non-uniform, with an excess of phase-0 introns. Introns-early explains this by speculating that a fraction of present-day introns were present between minigenes in the progenote and therefore must lie in phase-0. In contrast, introns-late predicts that the nonuniformity of intron phase distribution reflects the nonrandomness of intron insertions.
In this paper, we tested the two theories using analyses of intron phase distribution. We inferred the evolution of intron phase distribution from a dataset of 684 gene orthologs from seven eukaryotes using a maximum likelihood method. We also tested whether the observed intron phase distributions from 10 eukaryotes can be explained by intron insertions on a genome-wide scale. In contrast to the prediction of introns-early, the inferred evolution of intron phase distribution showed that the proportion of phase-0 introns increased over evolution. Consistent with introns-late, the observed intron phase distributions matched those predicted by an intron insertion model quite well.
Our results strongly support the introns-late hypothesis of the origin of spliceosomal introns.
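Intron phase records where an intron sits relative to the reading frame. It equals the number of coding nucleotides upstream of the intron, modulo 3: phase 0 falls between codons, phase 1 after the first codon position, and phase 2 after the second. The sketch below computes phases and a phase distribution from exon structures. It assumes the exon lengths count coding nucleotides only and that the first coding exon begins at the start codon. The gene structures are hypothetical. Testing an insertion model would mean comparing these observed fractions with fractions predicted from candidate insertion sites, which is not shown here.

```python
from collections import Counter


def intron_phases(coding_exon_lengths):
    """Phase of each intron, given the coding lengths of consecutive exons."""
    phases = []
    upstream = 0
    for length in coding_exon_lengths[:-1]:  # no intron after the last exon
        upstream += length
        phases.append(upstream % 3)
    return phases


def phase_distribution(genes):
    """Fraction of introns in phases 0, 1 and 2 across a list of genes."""
    counts = Counter(p for exons in genes for p in intron_phases(exons))
    total = sum(counts.values())
    return {phase: counts[phase] / total for phase in (0, 1, 2)}


genes = [[90, 31, 100], [45, 60, 47, 20]]  # hypothetical coding exon lengths
print(phase_distribution(genes))  # {0: 0.6, 1: 0.2, 2: 0.2}
```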
Analysis of intron gain and loss in fungal genomes provides support for an intron-rich fungus-animal ancestor.
Eukaryotic protein-coding genes are interrupted by spliceosomal introns, which are removed from transcripts before protein translation. Many facets of spliceosomal intron evolution, including age, mechanisms of origins, the role of natural selection, and the causes of the vast differences in intron number between eukaryotic species, remain debated. Genome sequencing and comparative analysis have made whole-genome analysis of intron evolution possible, allowing these questions to be addressed.
We analyzed intron positions in 1,161 sets of orthologous genes across 25 eukaryotic species. We find strong support for an intron-rich fungus-animal ancestor, with more than four introns per kilobase, comparable to the highest known modern intron densities. Indeed, the fungus-animal ancestor is estimated to have had more introns than any of the extant fungi in this study. Thus, subsequent fungal evolution has been characterized by widespread and recurrent intron loss occurring in all fungal clades. These results reconcile three previously proposed methods for estimating ancestral intron number, which had given very different estimates for eight eukaryotic species, as well as a fourth, more recent method. We do not find a clear inverse correspondence between rates of intron loss and gain, contrary to the predictions of selection-based proposals for interspecific differences in intron number.
Our results underscore the high intron density of eukaryotic ancestors and the widespread importance of intron loss through eukaryotic evolution.
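Densities such as "more than four introns per kilobase" are intron counts normalized by coding length. The sketch below makes the unit explicit. It also shows that pooling counts over a gene set is not the same as averaging per-gene densities, because pooling weights long genes more. The gene values are hypothetical, and the sketch does not say which convention any particular study used.

```python
def pooled_intron_density(genes):
    """Introns per kilobase of coding sequence, pooled over a gene set.

    genes: iterable of (n_introns, coding_length_bp) pairs.
    """
    genes = list(genes)
    total_introns = sum(n for n, _ in genes)
    total_kb = sum(bp for _, bp in genes) / 1000.0
    return total_introns / total_kb


def mean_per_gene_density(genes):
    """Unweighted mean of per-gene densities (introns per kb), for comparison."""
    genes = list(genes)
    return sum(n / (bp / 1000.0) for n, bp in genes) / len(genes)


sample = [(5, 1200), (1, 800)]  # hypothetical (introns, coding bp) pairs
print(pooled_intron_density(sample))  # 3.0
print(mean_per_gene_density(sample))
```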
The timing of the origin of introns is of crucial importance for an understanding of early genome architecture. The Exon theory of genes proposes a role for introns in the formation of multi-exon proteins by exon shuffling and predicts the presence of conserved splice sites in ancient genes. In this study, large-scale analysis of potential conserved splice sites was performed using an intron-exon database (ExInt) derived from GenBank.
A set of conserved intron positions was found by matching identical splice site sequences from distantly related eukaryotic kingdoms. Most amino acid sequences with conserved introns were homologous to consensus sequences of functional domains from conserved proteins including kinases, phosphatases, small GTPases, transporters and matrix proteins. These included ancient proteins that originated before the eukaryote-prokaryote split, for instance the catalytic domain of protein phosphatase 2A, where a total of eleven conserved introns were found. Using an experimental setup in which the relation between a splice site and the ancientness of its surrounding sequence could be studied, it was found that the presence of an intron was positively correlated with the ancientness of its surrounding sequence. Intron phase conservation was linked to the conservation of the gene sequence and not to the splice site sequence itself. However, no apparent differences in phase distribution were found between introns in conserved versus non-conserved sequences.
The data confirm an origin of introns deep in the eukaryotic branch and are in concordance with the presence of introns in the first functional protein modules in an 'Exon theory of genes' scenario. A model is proposed in which shuffling of primordial short exonic sequences led to the formation of the first functional protein modules, in line with hypotheses that see the formation of introns as integral to the origins of genome evolution.
This article was reviewed by Scott Roy (nominated by Anthony Poole), Sandro de Souza (nominated by Manyuan Long), and Gáspár Jékely.
The presence of spliceosomal introns in eukaryotes raises a range of questions about genomic evolution. Along with the fundamental mysteries of introns' initial proliferation and persistence, the evolutionary forces acting on intron sequences remain largely mysterious. Intron number varies across species from a few introns per genome to several introns per gene, and the elements of intron sequences directly implicated in splicing vary from degenerate to strict consensus motifs. We report a 50-species comparative genomic study of intron sequences across most eukaryotic groups. We find two broad and striking patterns. First, we find that some highly intron-poor lineages have undergone evolutionary convergence to strong 3′ consensus intron structures. This finding holds for both branch point sequence and distance between the branch point and the 3′ splice site. Interestingly, this difference appears to exist within the genomes of green algae of the genus Ostreococcus, which exhibit highly constrained intron sequences throughout most of the intron-poor genome but not in one much more intron-dense genomic region. Second, we find evidence that ancestral genomes contained highly variable branch point sequences, similar to more complex modern intron-rich eukaryotic lineages. In addition, ancestral structures are likely to have included polyT tails similar to those in metazoans and plants, which we found in a variety of protist lineages. Intriguingly, intron structure evolution appears to be quite different across lineages experiencing different types of genome reduction: whereas lineages with very few introns tend towards highly regular intronic sequences, lineages with very short introns tend towards highly degenerate sequences.
Together, these results attest to the complex nature of ancestral eukaryotic splicing, the qualitatively different evolutionary forces acting on intron structures across modern lineages, and the impressive evolutionary malleability of eukaryotic gene structures.
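The 3′ features compared across species above (branch point sequence, branch point to 3′ splice site distance, and polyT tail) can be measured for a single intron roughly as follows. This sketch rests on strong simplifying assumptions. It uses an exact-match search for one motif, with the Saccharomyces cerevisiae consensus TACTAAC as a placeholder, whereas the study dealt with degenerate and lineage-specific motifs. It also uses DNA letters and assumes the intron ends in AG. The intron sequence is hypothetical.

```python
def branch_point_distance(intron, motif="TACTAAC", window=100):
    """Number of nucleotides from the branch-point A to the 3' end of the intron.

    Looks for the 3'-most exact match to `motif` within the last `window`
    nucleotides and takes the sixth motif position as the branch-point A
    (as in the yeast consensus UACUAAC). Returns None if there is no match.
    """
    start = max(0, len(intron) - window)
    idx = intron.rfind(motif, start)
    if idx == -1:
        return None
    branch_a = idx + 5  # 0-based index of the branch-point A
    return len(intron) - 1 - branch_a


def t_fraction(intron, distance):
    """Fraction of T between the branch point and the terminal AG (polyT tract)."""
    region = intron[len(intron) - distance:-2]
    return region.count("T") / len(region) if region else 0.0


# Hypothetical intron: 5' GT ... branch point ... polyT tract ... 3' AG
intron = "GTATGT" + "AAAA" + "TACTAAC" + "TTTTTTAG"
d = branch_point_distance(intron)
print(d)                      # 9
print(t_fraction(intron, d))  # 0.8571428571428571
```

For degenerate branch points, the exact match would need to be replaced by a position weight matrix or a similar scoring scheme.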
The spliceosomal introns that interrupt eukaryotic genes show great number and sequence variation across species, from the rare, highly uniform yeast introns to the ubiquitous and highly variable vertebrate intron sequences. The causes of these differences remain mysterious. We studied sequences of intron branch points and 3′ termini in 50 eukaryotic species. All intron-rich species exhibit variable 3′ sequences. However, intron-poor species range from variable sequences, to uniform branch point motifs, to uniform branch point motifs in uniform positions along the intronic sequence. This is a more complex pattern than the clear relationship between intron number and 5′ intron sequence uniformity found previously. The correspondence of sequence uniformity and intron number extends to species of the green algal genus Ostreococcus, in which the single intron-rich genomic region shows far more variable intron sequences than in the otherwise intron-poor genome. We suggest that different concentrations of spliceosomal complexes may explain these differences. In addition, we report the existence of 3′ polyT tails in diverse eukaryotic protists, suggesting that this structure is ancestral. Together, these results underscore the complexity of ancestral eukaryotic splicing, the qualitatively different evolutionary forces acting on intron sequences in modern eukaryotes, and the impressive evolutionary malleability of eukaryotic genes.
Intron number varies considerably among genomes, but despite their fundamental importance, the mutational mechanisms and evolutionary processes underlying the expansion of intron number remain unknown. Here we show that Drosophila, in contrast to most eukaryotic lineages, is still gaining introns at a dramatic rate. These novel introns carry significantly weaker splice sites that may impede their identification by the spliceosome. Novel introns are more likely to encode a premature termination codon (PTC), indicating that nonsense-mediated decay (NMD) functions as a backup for weak splicing of new introns. Our data suggest that new introns originate when genomic insertions with weak splice sites are hidden from selection by NMD. This mechanism reduces the sequence requirement imposed on novel introns and implies that the capacity of the spliceosome to recognize weak splice sites was a prerequisite for intron gain during eukaryotic evolution.
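Whether a retained intron places a premature termination codon (PTC) in frame depends on both the intron sequence and its phase. This is the property that lets NMD clear transcripts that failed to splice. The sketch below is a minimal illustration. It assumes the upstream coding sequence starts at the start codon and checks only codons that overlap the intron. Stops created further downstream by a frameshift are ignored. The sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def retained_intron_has_ptc(upstream_cds, intron):
    """True if retaining `intron` would create an in-frame stop inside it.

    upstream_cds: coding sequence from the start codon up to the intron.
    Only complete codons formed by the interrupted upstream codon plus intron
    nucleotides are checked; stops in the downstream exon are ignored.
    """
    phase = len(upstream_cds) % 3
    # Nucleotides of the codon interrupted by the intron ("" for phase 0).
    carry = upstream_cds[len(upstream_cds) - phase:]
    seq = carry + intron
    return any(seq[i:i + 3] in STOP_CODONS for i in range(0, len(seq) - 2, 3))


intron = "GTAAGTCTAACAG"  # hypothetical novel intron
for upstream in ("ATGAAA", "ATGAAAC", "ATGAAACC"):  # phases 0, 1, 2
    print(len(upstream) % 3, retained_intron_has_ptc(upstream, intron))
# 0 False
# 1 False
# 2 True
```

The same intron sequence escapes a PTC in two frames but not in the third. Whether a new intron is backed up by NMD therefore depends on where it inserts as well as on its sequence.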
The surprising observation 30 years ago that genes are interrupted by non-coding introns changed our view of gene architecture. Intron number varies dramatically among species, ranging from nine introns per gene in humans to fewer than one in some simple eukaryotes. Here we ask where new introns come from and how they are maintained in a population. We find that novel introns do not arise from pre-existing introns, although the mechanisms that generate novel introns remain unclear. We also show that novel introns carry only weak signals for their identification and removal, and therefore depend on nonsense-mediated decay (NMD). NMD maintains RNA quality control by degrading transcripts that have not been spliced properly. We propose that NMD shelters novel introns from natural selection. This increases the likelihood that a novel intron will rise in frequency and be maintained within a population, thus increasing the rate of intron gain.
Protein-coding genes in eukaryotes are interrupted by introns, but intron densities differ widely between eukaryotic lineages. Vertebrates, some invertebrates and green plants have intron-rich genes, with 6–7 introns per kilobase of coding sequence, whereas most of the other eukaryotes have intron-poor genes. We reconstructed the history of intron gain and loss using a probabilistic Markov model (Markov Chain Monte Carlo, MCMC) on 245 orthologous genes from 99 genomes representing three of the five supergroups of eukaryotes for which multiple genome sequences are available. Intron-rich ancestors are confidently reconstructed for each major group, with 53 to 74% of the human intron density inferred with 95% confidence for the Last Eukaryotic Common Ancestor (LECA). The results of the MCMC reconstruction are compared with the reconstructions obtained using Maximum Likelihood (ML) and Dollo parsimony methods. An excellent agreement between the MCMC and ML inferences is demonstrated, whereas Dollo parsimony introduces a noticeable bias in the estimations, typically yielding lower ancestral intron densities than MCMC and ML. Evolution of eukaryotic genes was dominated by intron loss, with substantial gain only at the bases of several major branches including plants and animals. The highest intron density, 120 to 130% of the human value, is inferred for the last common ancestor of animals. The reconstruction shows that the entire line of descent from LECA to mammals was intron-rich, a state conducive to the evolution of alternative splicing.
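Dollo parsimony assumes that each intron position is gained only once. The intron is then placed in the last common ancestor (LCA) of all species that carry it, and in every node on the paths from that LCA down to those species. The sketch below implements that rule on a hypothetical toy tree with invented intron patterns. It also suggests one intuition for the downward bias noted above. An intron that survives in a single lineage is always scored as a terminal gain, so under widespread loss many genuinely ancestral introns are never placed in ancestors.

```python
def leaves(node):
    """Set of leaf names below a node; leaves are str, internal nodes (name, left, right)."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def dollo_present_nodes(tree, carriers):
    """Names of internal nodes at which Dollo parsimony places the intron."""
    carriers = set(carriers)
    # Descend to the last common ancestor (LCA) of the carrier species.
    node = tree
    while not isinstance(node, str):
        _, left, right = node
        if carriers <= leaves(left):
            node = left
        elif carriers <= leaves(right):
            node = right
        else:
            break
    present = set()

    def mark(n):
        # Inside the LCA subtree, the intron is present at every internal
        # node that has at least one carrier among its descendants.
        if isinstance(n, str) or not (leaves(n) & carriers):
            return
        name, left, right = n
        present.add(name)
        mark(left)
        mark(right)

    mark(node)
    return present


def dollo_ancestral_count(tree, positions, node_name):
    """Number of intron positions that Dollo parsimony places at node_name."""
    return sum(node_name in dollo_present_nodes(tree, c) for c in positions)


# Hypothetical toy tree and intron positions (sets of carrier species).
tree = ("LECA", ("opisthokonts", ("animals", "human", "fly"), "yeast"), "arabidopsis")
positions = [{"human", "arabidopsis"}, {"human", "fly"}, {"yeast"}, {"fly", "yeast"}]
print(dollo_ancestral_count(tree, positions, "LECA"))     # 1
print(dollo_ancestral_count(tree, positions, "animals"))  # 3
```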
In eukaryotes, protein-coding genes are interrupted by non-coding introns. The intron densities differ widely, from 6–7 introns per kilobase of coding sequence in vertebrates, some invertebrates and plants, to only a few introns across the entire genome in many unicellular forms. We applied a robust statistical methodology, Markov Chain Monte Carlo, to reconstruct the history of intron gain and loss throughout the evolution of eukaryotes using a set of 245 homologous genes from 99 genomes that represent the diversity of eukaryotes. Intron-rich ancestors were confidently inferred for each major eukaryotic group, including 53% to 74% of the human intron density for the last eukaryotic common ancestor, and 120% to 130% of the human value for the last common ancestor of animals. Evolution of eukaryotic genes involved primarily intron loss, with substantial gain only at the bases of several major branches including plants and animals. Thus, the common ancestor of all extant eukaryotes was a complex organism with a gene architecture resembling that of multicellular organisms. The line of descent from the last common ancestor to mammals was an uninterrupted intron-rich state that, given the error-prone splicing in intron-rich organisms, was conducive to the elaboration of functional alternative splicing.
Small nucleolar (sno)RNAs are required for posttranscriptional processing and modification of ribosomal, spliceosomal and messenger RNAs. Their presence in both eukaryotes and archaea indicates that snoRNAs are evolutionarily ancient. The location of some snoRNAs within the introns of ribosomal protein genes has been suggested to reflect an RNA world origin, with the exons of the earliest protein-coding genes having evolved around snoRNAs after the advent of templated protein synthesis. Alternatively, this intronic location may reflect more recent selection for coexpression of snoRNAs and ribosomal components, ensuring rRNA modification by snoRNAs during ribosome synthesis. To gain insight into the evolutionary origins of this genetic organization, we examined the antiquity of snoRNA families and the stability of their genomic location across 44 eukaryote genomes.
We report that dozens of snoRNA families are traceable to the Last Eukaryotic Common Ancestor (LECA), but find only weak similarities between the oldest eukaryotic snoRNAs and archaeal snoRNA-like genes. Moreover, many of these LECA snoRNAs are located within the introns of host genes independently traceable to the LECA. Comparative genomic analyses reveal, however, that the intronic location of LECA snoRNAs is not ancestral, suggesting that the pattern we observe is the result of ongoing intragenomic mobility. Analysis of human transcriptome data indicates that the primary requirement for hosting intronic snoRNAs is a broad expression profile. Consistent with ongoing mobility across broadly expressed genes, we report a case of recent migration of a non-LECA snoRNA from the intron of a ubiquitously expressed non-LECA host gene into the introns of two LECA genes during the evolution of primates.
Our analyses show that snoRNAs were a well-established family of RNAs at the time when eukaryotes began to diversify. While many are intronic, this association is not evolutionarily stable across the eukaryote tree; ongoing intragenomic mobility has erased the signal of their ancestral gene organization, and neither introns-first nor evolved co-expression adequately explains our results. We therefore present a third model, constrained drift, whereby individual snoRNAs are intragenomically mobile and may occupy any genomic location from which expression satisfies phenotype.
snoRNA; Last Eukaryotic Common Ancestor; Intron; Retrotransposition; Introns-first; Constrained drift
As part of the exploratory sequencing program Génolevures, visual inspection and bioinformatic tools were used to detect spliceosomal introns in seven hemiascomycetous yeast species. A total of 153 putative novel introns were identified. Introns are rare in yeast nuclear genes (<5% of genes have an intron), are mainly located at the 5′ end of ORFs, and are not highly conserved in sequence. They all share a clear non-random vocabulary: conserved splice sites and conserved nucleotide contexts around splice sites. Homologues of metazoan snRNAs and putative homologues of SR splicing factors were identified, confirming that the spliceosomal machinery is highly conserved in eukaryotes. Several intron features were tested as possible markers for phylogenetic analysis. We found that intron sizes vary widely within each genome and according to the phylogenetic position of the yeast species. The evolutionary origin of spliceosomal introns was examined by analysing the degree of conservation of intron positions in homologous yeast genes. Most introns appeared to exist in the last common ancestor of present-day yeast species and then to have been differentially lost during speciation. However, in some cases, it is difficult to exclude a possible sliding event affecting a pre-existing intron or the gain of a novel intron. Taken together, our results indicate that the origin of spliceosomal introns within a given genome is complex, and that present-day introns may have resulted from a dynamic flux between intron conservation, intron loss and intron gain during the evolution of hemiascomycetous yeasts.
Eukaryotic pre-mRNA gene transcripts are processed by the spliceosome to remove portions of the transcript, called spliceosomal introns. The spliceosome recognizes intron boundaries by the presence of sequence signals (motifs) contained in the actual transcript, thus sequence changes in the genome that affect existing splicing signals or create new signals may lead to changes in transcript splicing patterns. Such changes may lead to previously excluded (intronic) transcript regions being included (exonic) or vice versa. Such changes can affect the encoded protein sequence and/or post-transcriptional regulation, and are thus a potentially important source of genomic and phenotypic novelty. Two recent papers suggest that such changes may be a major force in the remodeling of eukaryotic gene structures; however, the rate at which such changes occur has not been assessed at the genomic level.
I studied four closely related species of Cryptococcus fungi. Among 28,256 studied introns, canonical GT/GC...AG boundaries are nearly universally conserved across all four species. Of only 40 observed cases of cDNA-confirmed non-conserved intron boundaries, most are likely to involve alternative splicing. I find only five cases of "intronization," intron creation from an internal exonic region by de novo emergence of new splicing boundaries, and no cases of the reverse process, "de-intronization." I find no more than ten clear cases of true movement of an intron boundary of a possibly constitutively spliced intron, and no clear cases of true "intron sliding," in which changes in the positions of both intron boundaries could lead to a movement of the intron position along the coding sequence.
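To make the boundary classes concrete, here is a minimal Python sketch. It sorts introns by their terminal dinucleotides: GT...AG, GC...AG, AT...AC, or anything else. It assumes each intron is given as a sense-strand DNA string. The function names `boundary_class` and `tally` and the toy sequences are hypothetical illustrations, not the pipeline used in the study.

```python
from collections import Counter
from typing import Iterable


def boundary_class(intron: str) -> str:
    """Classify an intron (sense-strand DNA) by its terminal dinucleotides."""
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to have distinct terminal dinucleotides")
    donor, acceptor = seq[:2], seq[-2:]  # 5' and 3' ends of the intron
    if donor == "GT" and acceptor == "AG":
        return "GT-AG"
    if donor == "GC" and acceptor == "AG":
        return "GC-AG"
    if donor == "AT" and acceptor == "AC":
        return "AT-AC"
    return f"non-canonical ({donor}-{acceptor})"


def tally(introns: Iterable[str]) -> Counter:
    """Count how many introns fall into each boundary class."""
    return Counter(boundary_class(s) for s in introns)


if __name__ == "__main__":
    # Hypothetical toy introns, one per class.
    toy = ["GTAAGTCTTTTCAG", "GCAAGTTTAG", "ATATCCTTTTTAC", "CTATCCTTAC"]
    print(tally(toy))
```

Comparing intron boundaries across species, as the study does, would apply such a classifier to orthologous introns after they have been aligned. This sketch covers only the per-intron step.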
These results suggest that intronization, de-intronization, and intron boundary movement are rare events in evolution.
Many multicellular eukaryotes have two types of spliceosomes for the removal of introns from messenger RNA precursors. The major (U2) spliceosome processes the vast majority of introns, referred to as U2-type introns, while the minor (U12) spliceosome removes a small fraction (less than 0.5%) of introns, referred to as U12-type introns. U12-type introns have distinct sequence elements and usually occur together in genes with U2-type introns. The phylogenetic distribution of U12-type introns shows that the minor splicing pathway appeared very early in eukaryotic evolution and has been lost repeatedly.
We have investigated the evolution of U12-type introns among eighteen metazoan genomes by analyzing orthologous U12-type intron clusters. Examination of gain, loss, and type switching shows that intron type is remarkably conserved among vertebrates. Among 180 intron clusters, only eight show intron loss in any vertebrate species and only five show conversion between the U12 and U2 types. Although there are only nineteen U12-type introns in Drosophila melanogaster, we found one case of U2-to-U12-type conversion, apparently mediated by the activation of cryptic U12 splice sites early in the dipteran lineage. Overall, loss of U12-type introns is more common than conversion to the U2 type, and U12-to-U2 conversion occurs more frequently among introns of the GT-AG subtype than among those of the AT-AC subtype. We also found support for natural U12-type introns with non-canonical terminal dinucleotides (CT-AC, GG-AG, and GA-AG) that have not been previously reported.
Although complete loss of the U12-type spliceosome has occurred repeatedly, U12 introns are extremely stable in some taxa, including Eutheria. Loss of U12 introns or of the genes containing them is more common than conversion to the U2 type. The degeneracy of terminal dinucleotides among natural U12-type introns is higher than previously thought.
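U12-type introns are usually recognized by their highly conserved 5′ splice site, roughly /[AG]TATCCTT, rather than by their terminal dinucleotides alone. The Python sketch below flags introns whose 5′ end matches a strict [AG]TATCC prefix. The regular expression, the six-position cutoff, and the names `looks_u12` and `terminal_pair` are simplifying assumptions for illustration. They are not the classification method of the study.

```python
import re

# Hypothetical strict heuristic: require the first six intron positions to
# match [AG]TATCC, a shortened form of the U12-type 5' splice-site consensus.
U12_DONOR = re.compile(r"^[AG]TATCC")


def looks_u12(intron: str) -> bool:
    """Return True if the intron's 5' end matches the strict U12-like prefix."""
    return U12_DONOR.match(intron.upper()) is not None


def terminal_pair(intron: str) -> str:
    """Return the intron's terminal dinucleotides, e.g. 'AT-AC'."""
    s = intron.upper()
    return f"{s[:2]}-{s[-2:]}"


if __name__ == "__main__":
    # Hypothetical toy introns: two U12-like, one CT-AC, one ordinary U2-like.
    for seq in ["ATATCCTTTTTAC", "GTATCCTTTCCTTAAG", "CTATCCTTTCAC", "GTAAGTCTTTTCAG"]:
        print(terminal_pair(seq), looks_u12(seq))
```

An exact-prefix rule like this misses U12-type introns with non-canonical ends such as CT-AC. The toy CT-AC intron above is rejected even though the rest of its 5′ motif is U12-like. Detecting the degenerate cases reported here would need a more tolerant scoring scheme, for example a position weight matrix that also takes the branch-point signal into account.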
All sequenced eukaryotic genomes have been shown to possess at least a few introns, including those of unicellular organisms that were previously suspected to be intron-less. Therefore, gene splicing must have been present at least in the last common ancestor of the eukaryotes. To explain the evolution of introns, two mutually exclusive concepts have been developed. The introns-early hypothesis holds that the very first protein-coding genes already contained introns, while the introns-late concept asserts that eukaryotic genes gained introns only after the emergence of the eukaryotic lineage. A key aspect of this debate is the conservation of intron positions within homologous genes of different taxa.
GenePainter is a standalone application for mapping gene structure information onto protein multiple sequence alignments. Based on the multiple sequence alignments, the gene structures are aligned down to single nucleotides. GenePainter accounts for variable lengths of exons and introns, respects codons split at intron junctions, and is able to handle sequencing and assembly errors, which are possible reasons for frame-shifts in exons and gaps in genome assemblies. Thus, even gene structures of considerably divergent proteins can be properly compared, as is needed in phylogenetic analyses. Conserved intron positions can also be mapped onto user-provided protein structures. For visualization, GenePainter provides scripts for the molecular graphics system PyMOL.
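The core idea of placing introns on a protein alignment can be sketched in a few lines of Python. Each intron is converted to an (alignment column, phase) pair, and genes that share a pair have an intron at a conserved position. This is not GenePainter's implementation. The sketch assumes that each intron is given as the number of coding nucleotides upstream of it, that the coding sequence starts at the first aligned residue, and that a phase-0 intron is placed before the downstream residue. The function names and the toy alignment are hypothetical.

```python
from typing import Dict, List, Tuple


def residue_to_column(aligned: str) -> List[int]:
    """Map each ungapped residue index to its column in an aligned protein."""
    return [col for col, ch in enumerate(aligned) if ch != "-"]


def map_introns(aligned: str, intron_offsets: List[int]) -> List[Tuple[int, int]]:
    """Return (alignment column, phase) for each intron.

    intron_offsets: number of CDS nucleotides upstream of each intron.
    Phase 0 lies between codons and is assigned to the downstream residue;
    phase 1 or 2 splits the codon of the residue at that column.
    """
    cols = residue_to_column(aligned)
    mapped = []
    for offset in intron_offsets:
        residue, phase = divmod(offset, 3)
        if residue >= len(cols):
            raise ValueError(f"intron offset {offset} lies beyond the protein")
        mapped.append((cols[residue], phase))
    return mapped


def shared_positions(
    genes: Dict[str, Tuple[str, List[int]]]
) -> Dict[Tuple[int, int], List[str]]:
    """Group gene names by the (column, phase) of their introns."""
    table: Dict[Tuple[int, int], List[str]] = {}
    for name, (aligned, offsets) in genes.items():
        for pos in map_introns(aligned, offsets):
            table.setdefault(pos, []).append(name)
    return table


if __name__ == "__main__":
    # Hypothetical two-gene alignment with intron offsets in CDS coordinates.
    genes = {
        "geneA": ("MKV-LLAGT", [6, 11]),
        "geneB": ("MKVQLLAGT", [9, 14]),
    }
    for pos, names in sorted(shared_positions(genes).items()):
        print(pos, names)
```

In the toy data, both genes have a phase-2 intron in alignment column 4, which counts as a shared position. Their phase-0 introns fall in different columns. Frame-shifts, assembly gaps and nucleotide-level alignment, which the text says GenePainter handles, are deliberately left out.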
GenePainter is a tool to analyse gene structure conservation providing various visualization options. A stable version of GenePainter for all operating systems as well as documentation and example data are available at http://www.motorprotein.de/genepainter.html.
Exon; Intron; Gene structure; Evolution
Recent advances in genomics of viruses and cellular life forms have greatly stimulated interest in the origins and evolution of viruses and, for the first time, offer an opportunity for a data-driven exploration of the deepest roots of viruses. Here we briefly review the current views of virus evolution and propose a new, coherent scenario that appears to be best compatible with comparative-genomic data and is naturally linked to models of cellular evolution that, from independent considerations, seem to be the most parsimonious among the existing ones.
Several genes coding for key proteins involved in viral replication and morphogenesis, as well as the major capsid protein of icosahedral virions, are shared by many groups of RNA and DNA viruses but are missing from cellular life forms. On the basis of this key observation and the data on extensive genetic exchange between diverse viruses, we propose the concept of the ancient virus world. The virus world is construed as a distinct contingent of viral genes that continuously retained its identity throughout the entire history of life. Under this concept, the principal lineages of viruses and related selfish agents emerged from the primordial pool of primitive genetic elements, the ancestors of both cellular and viral genes. Thus, notwithstanding the numerous gene exchanges and acquisitions attributed to later stages of evolution, most, if not all, modern viruses and other selfish agents are inferred to descend from elements that belonged to the primordial genetic pool. In this pool, RNA viruses would evolve first, followed by retroid elements and then DNA viruses. The Virus World concept is predicated on a model of early evolution whereby emergence of substantial genetic diversity antedates the advent of full-fledged cells, allowing for extensive gene mixing at this early stage of evolution. We outline a scenario of the origin of the main classes of viruses in conjunction with a specific model of precellular evolution under which the primordial gene pool dwelled in a network of inorganic compartments. Somewhat paradoxically, under this scenario, we surmise that selfish genetic elements ancestral to viruses evolved prior to typical cells, to become intracellular parasites once bacteria and archaea arrived on the scene. Selection against excessively aggressive parasites that would kill off the host ensembles of genetic elements would lead to early evolution of temperate virus-like agents and primitive defense mechanisms, possibly based on the RNA interference principle.
The emergence of the eukaryotic cell is construed as the second melting pot of virus evolution from which the major groups of eukaryotic viruses originated as a result of extensive recombination of genes from various bacteriophages, archaeal viruses, plasmids, and the evolving eukaryotic genomes. Again, this vision is predicated on a specific model of the emergence of eukaryotic cell under which archaeo-bacterial symbiosis was the starting point of eukaryogenesis, a scenario that appears to be best compatible with the data.
The existence of several genes that are central to virus replication and structure, shared by a broad variety of viruses, and missing from cellular genomes (virus hallmark genes) suggests the model of an ancient virus world, a flow of virus-specific genes that went uninterrupted from the precellular stage of life's evolution to this day. This concept is tightly linked to two key conjectures on the evolution of cells: the existence of a complex, precellular, compartmentalized but extensively mixing and recombining pool of genes, and the origin of the eukaryotic cell by archaeo-bacterial fusion. The virus world concept and these models of major transitions in the evolution of cells provide complementary pieces of an emerging coherent picture of life's history.
Reviewers: this article was reviewed by W. Ford Doolittle, J. Peter Gogarten, and Arcady Mushegian.
Genome-wide studies of intron dynamics in mammalian orthologous genes have found convincing evidence for loss of introns but very little for intron turnover. Similarly, large-scale analysis of intron dynamics in a few vertebrate genomes has identified only intron losses and no gains, indicating that intron gain is an extremely rare event in vertebrate evolution. These studies suggest that the intron-rich genomes of vertebrates do not allow intron gain. The aim of this study was to search for evidence of de novo intron gain in domesticated genes from an analysis of their exon/intron structures.
A phylogenomic approach has been used to analyse all domesticated genes in mammals and chordates that originated from the coding parts of transposable elements. Gain of introns in domesticated genes has been reconstructed on well-established mammalian, vertebrate and chordate phylogenies, and examined as to where and when the gain events occurred. The locations, sizes and numbers of de novo introns gained in the domesticated genes during the evolution of mammals and chordates have been analyzed. A significant amount of intron gain was found only in domesticated genes of placental mammals, where more than 70 cases were identified. De novo gained introns show a clear positional bias: they are distributed mainly in 5' UTRs and coding regions, while 3' UTR introns are very rare. In the coding regions of some domesticated genes, up to 8 de novo gained introns have been found. Intron densities in Eutheria-specific domesticated genes and in older domesticated genes that originated early in vertebrates are lower than those of typical mammalian and vertebrate genes. Surprisingly, the majority of intron gains occurred in the ancestor of placentals.
This study provides the first evidence for numerous intron gains in the ancestor of placental mammals and demonstrates that adequate taxon sampling is crucial for reconstructing intron evolution. The findings of this comprehensive study partly challenge the current view of evolutionary stasis in intron dynamics during the last 100-200 My. Domesticated genes could constitute an excellent system in which to analyse the mechanisms of intron gain in placental mammals.
Reviewers: this article was reviewed by Dan Graur, Eugene V. Koonin and Jürgen Brosius.
The origin of present-day introns is a subject of spirited debate. Any intron evolution theory must account not only for nuclear spliceosomal introns but also for their antecedents. The evolution of group II introns is fundamental to this debate, since group II introns are the proposed progenitors of nuclear spliceosomal introns and are found in ancient genes from modern organisms. We have studied the evolution of chloroplast introns and twintrons (introns within introns) in the genus Euglena. Our hypothesis is that Euglena chloroplast introns arose late in the evolution of this lineage and that twintrons were formed by the insertion of one or more introns into existing introns. In the present study we find that 22 out of 26 introns surveyed in six different photosynthesis-related genes from the plastid DNA of Euglena gracilis are not present in one or more basally branching Euglena spp. These results support a late origin for Euglena chloroplast group II introns. The psbT gene in Euglena viridis, a basally branching Euglena species, contains a single intron in a position identical to that of a psbT twintron from E. gracilis, a derived species. The E. viridis intron, when compared with 99 other Euglena group II introns, is most similar to the external intron of the E. gracilis psbT twintron. These data suggest that the psbT twintron in E. gracilis arose from the addition of introns to the ancestral psbT intron in the common ancestor of E. viridis and E. gracilis.
The presence of spliceosomal introns in eukaryotic genes poses a major puzzle for the study of genome evolution. Intron densities vary enormously among distant lineages. However, the mechanisms driving intron gains are poorly understood, and very few intron gains and losses have been documented over short evolutionary time spans. Fungi have recently emerged as excellent models to study intron evolution, and “reverse splicing” was found to be a major driver of recent intron gains in a clade of ascomycete fungi. We screened a total of 38 genomes from two fungal clades important in medicine and agriculture to identify intron gains and losses both within and between species. We detected 86 and 198 variable intron positions in the Cryptococcus and Fusarium clades, respectively. Some genes underwent extensive changes in their exon–intron structure, with up to six variable intron positions per gene. We identified a very recently gained intron in a group of tomato-infecting strains belonging to the F. oxysporum species complex. In the human pathogen C. gattii, we found recent intron losses in subtypes of the species. The two studied fungal clades provided evidence for extensive changes in exon–intron structure within and among closely related species. We show that both intronization of previously coding DNA and insertion of exogenous DNA are major drivers of intron gains.
spliceosomal introns; intron gains; Fusarium; Cryptococcus; population genomics
Genes in pieces and spliceosomal introns are a landmark of eukaryotes, with intron invasion usually assumed to have happened early in evolution. Here, we analyze the intron landscape of Micromonas, a unicellular green alga in the Mamiellophyceae lineage, demonstrating the coexistence of several classes of introns and a recent, massive intron invasion. This study focuses on two strains, CCMP1545 and RCC299, and related individuals from ocean samplings, showing that they not only harbor different classes of introns depending on their location in the genome, as do other Mamiellophyceae, but also uniquely carry several classes of repeat introns. These introns, dubbed introner elements (IEs), are found at novel positions in genes and, unlike canonical introns, have conserved sequences. This IE invasion has a huge impact on the genome, doubling the number of introns in the CCMP1545 strain. We hypothesize that each IE class originated from a single ancestral IE that has been colonizing the genome after strain divergence by inserting copies of itself into genes through intron transposition, likely involving reverse splicing. Along with similar cases recently observed in other organisms, our observations in Micromonas strains shed new light on the evolution of introns, suggesting that intron gain is more widespread than previously thought.
intron evolution; intron gain; Mamiellophyceae; Micromonas; introner elements
Most eukaryotes have at least some genes interrupted by introns. While it is well accepted that introns were already present at moderate density in the last eukaryote common ancestor, the conspicuous diversity of intron density among genomes suggests a complex evolutionary history, with marked differences between phyla. The rates of intron gain and loss over the course of evolution, and the factors influencing them, remain controversial. We have investigated a single gene family, alpha-amylase, in 55 species covering a variety of animal phyla. Comparison of intron positions across phyla suggests a complex history, with a likely ancestral intronless gene undergoing frequent intron gain and loss, leading to extant intron/exon structures that are highly variable, even among species from the same phylum. Because introns are known to play no regulatory role in this gene and there is no alternative splicing, the structural differences may be interpreted more easily: intron positions, sizes, losses or gains may be more likely related to factors linked to splicing mechanisms and requirements, and to recognition of introns and exons, or to more extrinsic factors, such as life cycle and population size. We have shown that intron losses outnumbered gains in recent periods, but that “resets” of intron positions occurred at the origin of several phyla, including vertebrates. Rates of gain and loss appear to be positively correlated. No phase preference was found. We also found evidence for parallel gains and for intron sliding. The presence of introns at given positions was correlated with a strong protosplice consensus sequence AG/G, which was much weaker at positions lacking an intron. In contrast, recent intron insertions were not associated with a specific sequence. In animal Amy genes, population size and generation time seem to have played only minor roles in shaping gene structure.
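A protosplice site can be scanned for directly, along with the phase at which an intron inserted there would sit. The Python sketch below assumes that a protosplice site means an AG|G motif in the coding sequence, with the candidate insertion point between the AG and the G. It reports each site as an (offset, phase) pair, where the offset is the number of coding nucleotides upstream of the insertion point. The function names and the toy sequence are hypothetical, and the sketch does not reproduce the study's analysis.

```python
from collections import Counter
from typing import List, Tuple


def protosplice_sites(cds: str) -> List[Tuple[int, int]]:
    """Return (offset, phase) for every AG|G site in a coding sequence.

    offset: number of CDS nucleotides upstream of the candidate insertion point.
    phase: offset % 3 (0 = between codons, 1 or 2 = within a codon).
    """
    s = cds.upper()
    return [(i, i % 3) for i in range(2, len(s)) if s[i - 2:i] == "AG" and s[i] == "G"]


def phase_counts(cds: str) -> Counter:
    """Tally protosplice sites by the phase an inserted intron would have."""
    return Counter(phase for _, phase in protosplice_sites(cds))


if __name__ == "__main__":
    toy_cds = "ATGAAGGCAGGTTAG"  # hypothetical coding sequence
    print(protosplice_sites(toy_cds))
    print(phase_counts(toy_cds))
```

Comparing these counts with the positions and phases of observed introns is one way to ask whether introns favour AG|G sites or particular phases, which are the two questions the abstract addresses.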
The presence of introns in protein-coding genes is a universal feature of eukaryotic genome organization, and the genes of multicellular eukaryotes typically contain multiple introns, a substantial fraction of which share positions in distant taxa, such as plants and animals. Depending on the methods and data sets used, researchers have reached opposite conclusions on the causes of the high fraction of shared introns in orthologous genes from distant eukaryotes. Some studies conclude that shared intron positions reflect, almost entirely, a remarkable evolutionary conservation, whereas others attribute them to parallel gain of introns. To resolve these contradictions, it is crucial to analyze the evolution of introns using a model that relies minimally on arbitrary assumptions.
We developed a probabilistic model of evolution that allows for variability of intron gain and loss rates over branches of the phylogenetic tree, individual genes, and individual sites. Applying this model to an extended set of conserved eukaryotic genes, we find that parallel gain, on average, accounts for only ~8% of the shared intron positions. However, the distribution of parallel gains over the phylogenetic tree of eukaryotes is highly non-uniform. There are practically no parallel gains in closely related lineages, whereas for distant lineages, such as animals and plants, parallel gains appear to contribute up to 20% of the shared intron positions. In accord with these findings, we estimated that ancestral introns have a high probability of being retained in extant genomes and, conversely, that a substantial fraction of extant introns have retained their positions since the early stages of eukaryotic evolution. In addition, the density of sites available for intron insertion is estimated to be approximately one in seven base pairs.
We obtained robust estimates of the contribution of parallel gain to the observed sharing of intron positions between eukaryotic species separated by different evolutionary distances. The results indicate that, although the contribution of parallel gains varies across the phylogenetic tree, the high level of intron position sharing is primarily due to evolutionary conservation. Accordingly, numerous introns appear to persist in the same position over hundreds of millions of years of evolution. This is compatible with recent observations of a negative correlation between the rate of intron gain and the coding sequence evolution rate of a gene, suggesting that at least some introns are functionally relevant.
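Probabilistic models of intron gain and loss are built from per-site transition probabilities along tree branches. The Python sketch below shows only the simplest such building block: a two-state (absent/present) continuous-time Markov chain for one site on one branch, with a constant gain rate and a constant loss rate. This is a deliberate simplification. The model described above additionally lets rates vary over branches, genes and sites, and nothing here is the authors' code. The function name and the parameter values are hypothetical.

```python
import math


def intron_transition_probs(gain: float, loss: float, t: float) -> dict:
    """Transition probabilities of a two-state (absent/present) Markov chain.

    gain: rate of absent -> present; loss: rate of present -> absent
    (both per unit branch length); t: branch length.
    """
    total = gain + loss
    if total <= 0 or t < 0:
        raise ValueError("need gain + loss > 0 and t >= 0")
    pi_present = gain / total          # equilibrium probability of an intron
    decay = math.exp(-total * t)       # memory of the starting state
    p_pp = pi_present + (1 - pi_present) * decay  # present stays present
    p_ap = pi_present * (1 - decay)               # absent becomes present
    return {
        "present->present": p_pp,
        "present->absent": 1 - p_pp,
        "absent->present": p_ap,
        "absent->absent": 1 - p_ap,
    }


if __name__ == "__main__":
    # Hypothetical rates: gains much rarer than losses, one unit of branch length.
    for key, value in intron_transition_probs(gain=0.01, loss=0.1, t=1.0).items():
        print(f"{key}: {value:.4f}")
```

With a low loss rate relative to branch length, "present->present" stays close to 1. Quantities like this underlie statements that ancestral introns have a high probability of being retained. Separating parallel gain from conservation requires combining such probabilities over a whole tree and allowing rate variation, which this sketch does not attempt.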
Certain eukaryotic genomes, such as those of the amitochondriate parasites Giardia and Trichomonas, have very low intron densities, so low that canonical spliceosomal introns have only recently been discovered through genome sequencing. These organisms were formerly thought to be ancient eukaryotes that diverged before introns originated, or at least before they became common. Now, however, they are thought to be members of a supergroup known as excavates, whose members generally appear to have low densities of canonical introns. Here we have used environmental expressed sequence tag (EST) sequencing to identify 17 genes from the uncultivable oxymonad Streblomastix strix, to survey intron densities in this most poorly studied excavate group.
We find that Streblomastix genes contain an unexpectedly high intron density of about 1.1 introns per gene. Moreover, over 50% of these are at positions shared between a broad spectrum of eukaryotes, suggesting that they are very ancient introns, potentially present in the last common ancestor of eukaryotes.
The Streblomastix data show that the genome of the ancestor of excavates likely contained many introns and the subsequent evolution of introns has proceeded very differently in different excavate lineages: in Streblomastix there has been much stasis while in Trichomonas and Giardia most introns have been lost.