The two types of eukaryotic spliceosomal introns, U2 and U12, possess different splice signals and are excised by distinct spliceosomes. The nature of the primordial introns remains uncertain. A comparison of the amino acid distributions at insertion sites of introns that retained their positions throughout eukaryotic evolution with the distributions for human and Arabidopsis thaliana U2 and U12 introns reveals close similarity with U2 but not U12. Thus, the primordial spliceosomal introns were, most likely, U2-type.
Introns are excised from pre-mRNAs at acceptor and donor splice sites. This process is mediated by the spliceosome, a complex assembly of small nuclear ribonucleoprotein particles (snRNPs) and heterogeneous nuclear ribonucleoprotein particles (hnRNPs) that is conserved throughout the eukaryotic world [1–3]. There are two classes of introns, U2-type and U12-type, which are excised by two distinct spliceosomes in eukaryotic nuclei [4,5]. During the splicing process, the components of a spliceosome establish specific interactions with parts of the intron and its flanking exons to ensure accurate and efficient splicing. These essential interactions are supported by conserved nucleotide sequence motifs (splice signals) that flank the splice junctions from both the intronic and the exonic sides and are specific to the intron class (Box 1).
The splicing process includes specific interactions between components of a spliceosome and parts of the intron and its flanking exons that ensure accurate and efficient splicing. In the major spliceosome, the U1 snRNP recognizes the donor splice site and the U5 snRNP recognizes the acceptor site. The (A,C)AG|GU(A,G)AGU consensus (the vertical bar marks the splice junction; the first two nucleotides of the intron are GU) is present in the donor splice sites and is partially complementary to the 5′ end of U1 small nuclear RNA (snRNA); this interaction is a major requirement for splicing. The motif CAG|G (the last two nucleotides of the intron are AG), which is preceded by a polypyrimidine tract, is typical of the acceptor splice site [6,14,15].
The minor spliceosome catalyses the removal of an atypical class of spliceosomal introns (U12-type) from eukaryotic pre-mRNAs. U12 introns were originally recognized on the basis of their unusual terminal dinucleotides: |AT at the donor splice site and AC| at the acceptor splice site [16,17]. A closer examination of the sequences of these introns revealed several features that distinguish them from U2 introns, including conservation of unusual signals at the donor splice site (|ATATCCTT) and immediately upstream of the acceptor splice site (TCCTTAAC 10–15 bases from the splice junction) [16,17]. Subsequently, it was shown that some |GT-AG| introns (the consensus of U2 introns) are also excised by the U12 spliceosome. The U12 spliceosome was first identified and characterized in animals, in which it was found to contain several unique RNA constituents that share structural similarity with and seem to be functionally analogous to the snRNAs contained in the major spliceosome [18,19]. The U12 spliceosome contains several specific, low-abundance snRNPs, namely U11, U12, U4atac and U6atac, together with the U5 snRNP, which is also present in the major spliceosome.
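As a rough illustration of the donor-site signals described in this box, the sketch below counts how many positions of a candidate sequence agree with the U2 donor consensus (A,C)AG|GU(A,G)AGU or with the U12 donor signal |ATATCCTT. It is only a position-by-position match count written in the DNA alphabet (T in place of U); the names `U2_DONOR`, `U12_DONOR` and `consensus_matches` are hypothetical, and this is not the classification procedure used in the study.

```python
# Each consensus position lists the nucleotides allowed by the text (DNA alphabet).
U2_DONOR = ["AC", "A", "G", "G", "T", "AG", "A", "G", "T"]  # (A,C)AG|GT(A,G)AGT
U12_DONOR = list("ATATCCTT")                                # |ATATCCTT (intron start)


def consensus_matches(seq, consensus):
    """Return the number of positions in seq that agree with the consensus."""
    if len(seq) != len(consensus):
        raise ValueError("sequence and consensus must have equal length")
    return sum(base in allowed for base, allowed in zip(seq.upper(), consensus))


# Three exonic plus six intronic nucleotides around a U2-like donor site.
print(consensus_matches("CAGGTAAGT", U2_DONOR))
# First eight intronic nucleotides of a U12-like donor site.
print(consensus_matches("ATATCCTT", U12_DONOR))
```

A simple match count ignores how strongly each position is conserved, so it should be read as a toy model of the signals rather than a usable classifier.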
Major and minor spliceosomal components and both types of introns are present in animals, plants, fungi and at least several unicellular eukaryotes. Given that several of their characteristic constituents are present in representative organisms from all eukaryotic supergroups, both U2 and U12 spliceosomes evolved before the radiation of the supergroups (i.e. at the earliest stages of eukaryotic evolution) [4,12,21]. This conclusion is supported by the recent demonstration that the positions of U12 introns are conserved in orthologous genes from human and Arabidopsis to an even greater extent than the positions of U2 introns.
The U12-type introns are the minor class of spliceosomal introns in eukaryotic genomes (<1%). However, the paucity of U12 introns in extant genomes does not rule out the possibility that introns of this type were substantially more abundant at the early stages of eukaryotic evolution or even that U12 introns are the ancestral form of spliceosomal introns, especially given that unidirectional conversion of U12 to U2 introns is apparent in genomic comparisons and that many lineages of eukaryotes have lost U12 introns altogether. Thus, the scenario in which the ancestral introns were U12-type, but subsequent amelioration led to the current excess of U2 introns, is not unrealistic.
To gain insight into the nature of primordial introns, we analyzed putative protosplice sites of ancient introns that retained their positions throughout the course of the evolution of eukaryotes. The idea is to determine whether the primordial protosplice sites correspond to those of U12 or U2 introns. Protosplice sites [8,9] are thought to comprise specific targets for intron insertion into coding sequences of eukaryotic genes. The existence of protosplice sites is indicated by the conservation of nucleotides flanking the splice junctions (Figure 1a,b). In principle, these consensus nucleotides could be remnants of the original protosplice sites or could have evolved convergently after intron insertion. The existence of protosplice sites has been addressed directly by examining the context of introns inserted within codons encoding amino acids that are conserved in all eukaryotes and that, accordingly, are not subject to selection for splicing efficiency. Evidence has been presented that introns are either predominantly inserted into specific protosplice sites, which have the consensus sequence (A/C)AG||Gt, or are inserted randomly but preferentially fixed at such sites. The U12 protosplice sites are distinct from the U2 protosplice sites and have the CT||ATA consensus sequence (Figure 1c,d). This sequence is conserved in human and Arabidopsis thaliana, indicating that it has not changed since the divergence of plants and animals from their last common ancestor.
We analyzed the distributions of amino acids in intron-containing sites in which the amino acid is conserved in the sequences of orthologous proteins from eight eukaryotes and five prokaryotes (Supplementary Table S1 and Supplementary Materials and Methods in the supplementary material online), that is, sites that are subject to extreme evolutionary constraints (hereafter called invariant sites). Such constraints operating at the level of amino acids imply that selection for splicing efficiency had no substantial impact on the intron insertion signal. Thus, this signal, at least to the extent that it covers conserved nucleotides within the respective codon, must have remained intact since the time of intron insertion at an early stage of eukaryotic evolution. Ancient introns, in this case, were defined as those in which positions are conserved in at least two of three major eukaryotic lineages (plants, animals plus fungi, and apicomplexa). All 197 ancient introns (53% of the intron positions that are conserved between animals and plants) found at the invariant sites were of the U2-type.
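The operational definition of an ancient intron used here (position conserved in at least two of the three major lineages) can be written as a small predicate. This is a minimal sketch; the lineage labels and the function name `is_ancient` are hypothetical and chosen only for illustration.

```python
# The three major eukaryotic lineages considered in the text.
LINEAGES = {"plants", "animals_fungi", "apicomplexa"}


def is_ancient(lineages_with_intron):
    """True if the intron position is conserved in at least two of the three lineages."""
    return len(LINEAGES & set(lineages_with_intron)) >= 2


print(is_ancient(["plants", "animals_fungi"]))  # position shared by two lineages
print(is_ancient(["apicomplexa"]))              # position seen in one lineage only
```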
Putative protosplice sites can be inferred by analyzing amino acid frequency distributions at intron-containing sites. We compared the amino acid distributions at the putative ancient protosplice sites that are derived from the invariant site analysis with the distributions at the sites containing U2 and U12 introns in human and Arabidopsis genes. The distributions of amino acids at intron-containing invariant sites were highly non-uniform (Figure 2). Introns occur in three phases, that is, an intron can lie within or between codons; introns of phase 0, 1 and 2 are located between two codons, after the first position in a codon and after the second position, respectively. Each phase has a distinct set of over-represented conserved amino acids (Figure 2a,d and Supplementary Table S2). This effect is especially pronounced for phase 1 in which 71% of primordial introns are located within glycine codons (G|GN) (Figure 2c). This pattern is similar to that seen for U2 introns in phase 1, in which 37% of human introns and 40% of Arabidopsis introns are located within glycine codons (Figure 2c), in agreement with the inference that at least a substantial fraction of ancient introns was U2-type. The excess of glycine in the case of ancient introns is a straightforward consequence of the over-representation of glycine in invariant positions (Supplementary Figure S1).
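The definition of intron phase can be made concrete with a short sketch. It assumes that an intron position is given as the number of coding nucleotides that precede it in the spliced coding sequence; the function names and the example sequence are hypothetical.

```python
def intron_phase(n_coding_bases_before):
    """Phase 0: between codons; phase 1: after codon position 1; phase 2: after position 2."""
    return n_coding_bases_before % 3


def split_codon(cds, n_coding_bases_before):
    """Return the interrupted codon with '|' at the intron position, or None for phase 0."""
    phase = intron_phase(n_coding_bases_before)
    if phase == 0:
        return None  # the intron lies between two codons
    start = n_coding_bases_before - phase  # first base of the interrupted codon
    codon = cds[start:start + 3]
    return codon[:phase] + "|" + codon[phase:]


cds = "ATGGGTAAA"           # Met-Gly-Lys, a made-up example
print(intron_phase(4))      # intron after the first base of the glycine codon
print(split_codon(cds, 4))  # glycine codon interrupted as G|GT, i.e. the G|GN pattern
```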
Comparison of the distributions of amino acids that harbor human and Arabidopsis U2 and U12 introns revealed an insignificant negative correlation (Supplementary Table S3). This is not unexpected when taking into account the difference between the U2 and U12 inferred protosplice sites (Figure 1). To compare the protosplice sites of primordial introns with U2 and U12 protosplice sites, we employed multiple regression analyses using frequencies of invariant amino acids containing ancient introns as a dependent variable and frequencies of amino acids containing human and Arabidopsis U2 or U12 introns as independent variables (Table 1).
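The structure of such a regression can be sketched as follows: one dependent vector (frequencies at ancient intron sites) is fitted by least squares to two independent vectors (for example, human and Arabidopsis U2 frequencies), and the fraction of variance explained (R²) is reported. This is a generic ordinary least squares illustration in plain Python, not the software or settings used in the study; the function name and the toy numbers are made up, and real input would have one entry per amino acid. The predictors must not be collinear, otherwise the normal equations are singular.

```python
def ols_r_squared(y, predictors):
    """Fit y = b0 + b1*x1 + b2*x2 + ... by least squares; return (coefficients, R^2)."""
    n = len(y)
    # Design matrix with a leading column of ones for the intercept.
    rows = [[1.0] + [float(p[i]) for p in predictors] for i in range(n)]
    k = len(rows[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix.
    aug = [[sum(r[a] * r[b] for r in rows) for b in range(k)]
           + [sum(r[a] * yi for r, yi in zip(rows, y))] for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(k):
            if i != col:
                factor = aug[i][col] / aug[col][col]
                aug[i] = [v - factor * w for v, w in zip(aug[i], aug[col])]
    coef = [aug[i][k] / aug[i][i] for i in range(k)]
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coef, 1.0 - ss_res / ss_tot


# Toy data only: y was constructed as 1 + 2*x1 + 3*x2, so the fit is exact.
x1 = [0, 1, 2, 3, 4]
x2 = [1, 0, 2, 1, 3]
y = [4, 3, 11, 10, 18]
coef, r2 = ols_r_squared(y, [x1, x2])
print(coef, r2)
```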
A strong and statistically significant positive correlation between the putative ancient protosplice sites and U2 protosplice sites from human and Arabidopsis was found both for the raw numbers of amino acids and for normalized values (Table 1 and Supplementary Table S4), thereby explaining a substantial part (>0.64) of the sequence variance of the ancient protosplice sites. This result indicates that most, if not all, of the analyzed primordial introns were U2-type at the time of their insertion at an early stage of eukaryotic evolution rather than being the result of U12 to U2 conversion. It should be noted that this finding in itself is not dependent on the excess of U2 introns in extant genes or even in conserved intron positions but, rather, comes from an unbiased analysis of invariant intron-containing sites.
With respect to the possibility of massive losses of U12 in early eukaryotes, we have shown in a separate recent study that positions of U12 introns are even more strongly conserved between humans and Arabidopsis than positions of U2 introns. Therefore, it seems unlikely that all primordial U12 introns have been lost, so at least a substantial majority of the primordial introns probably were of the U2-type.
We cannot rule out the (formal) possibility that a minor fraction of U12 introns was present during the early stages of eukaryotic evolution, although there was no correlation between ancient protosplice sites and U12 protosplice sites (Table 1). We attempted to estimate the sensitivity of the multiple regression analysis using a sampling procedure. Mixtures of U2 and U12 protosplice sites with different proportions of each type (e.g. 10% U12 protosplice sites and 90% U2 protosplice sites) were generated and used as pseudo-ancestral protosplice sites. The results of this simulation show (Supplementary Figure S2) that even a 10% admixture of U12 introns yielded a correlation coefficient value that was significantly lower than the value observed with the real data (Supplementary Table S5). The number of known U12 introns is too small to enable a more precise estimate but the results strongly indicate that, if U12 introns were present among the primordial introns, their fraction was, at most, similar to that in modern genomes. The conclusions of this analysis should be interpreted with caution considering that the invariant sites that are informative for inferring the features of primordial introns comprise but a small fraction of the conserved intron positions and, also, that the statistical discrimination between U2 and U12 protosplice sites is weak. Nevertheless, as shown earlier, we currently have no indication of the existence of primordial U12 introns, whereas the evidence in support of primordial U2 introns is clear.
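One way to picture the sampling procedure is sketched below: a pseudo-ancestral set of intron-containing sites is assembled by drawing a chosen fraction from U12 sites and the remainder from U2 sites, and the resulting amino acid counts can then be fed into the same regression as the real data. The details are assumptions made for illustration (sampling with replacement, a fixed seed, the function name `sample_pseudo_ancestral`, and the toy amino acid lists); the actual procedure is described in the supplementary material.

```python
import random
from collections import Counter


def sample_pseudo_ancestral(u2_sites, u12_sites, u12_fraction, n_sites, seed=0):
    """Draw n_sites amino acids, a fraction u12_fraction of them from U12 sites."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    n_u12 = round(n_sites * u12_fraction)
    n_u2 = n_sites - n_u12
    drawn = rng.choices(u2_sites, k=n_u2) + rng.choices(u12_sites, k=n_u12)
    return Counter(drawn)              # amino acid -> count in the mixture


# Toy input: one-letter amino acids at intron-containing sites (made up).
u2_sites = list("GGGGGRKKQE")
u12_sites = list("LLLYYSNDIF")
mixture = sample_pseudo_ancestral(u2_sites, u12_sites, u12_fraction=0.10, n_sites=197)
print(sum(mixture.values()))  # 197 sites in total, about 10% drawn from U12 sites
```

Repeating the draw with different seeds and fractions would give a distribution of correlation values against which the value for the real data could be compared.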
The origin of the two types of spliceosomal introns remains a matter of conjecture.
The first scenario to be proposed involved a fission–fusion model in which the two types of introns and the two distinct spliceosomes were combined in the ancestral eukaryote as a result of a fusion of two ancient lineages. However, there seems to be little, if any, independent evidence in support of such a fusion. Perhaps a more realistic hypothesis holds that U2 and U12 introns descend from two separate invasions of group II self-splicing introns (retroelements) into eukaryotic genes. The present results seem to be compatible with this scenario but indicate a specific succession of the two putative waves of invasion. The U2 introns would be the first to populate the genes to substantial intron densities, followed by the later (but still antedating the radiation of eukaryotic supergroups) invasion of U12 introns that was much more limited in scale owing to the paucity of niches available for insertion of new introns.
We thank Ravi Sachidanandam for providing the dump of the SpliceRack (http://katahdin.cshl.edu:9331/SpliceRack) database. The authors’ research is supported by the Intramural Research Program of the National Library of Medicine at the National Institutes of Health.