Many biological processes involve gene-expression regulation by alternative splicing. Here, we identify the splicing factor SRSF6 as a regulator of wound healing and tissue homeostasis in skin. We show that SRSF6 is a proto-oncogene that is frequently overexpressed in human skin cancer. Overexpressing it in transgenic mice induces hyperplasia of sensitized skin and promotes aberrant alternative splicing. We identify 139 target genes of SRSF6 in skin, and show that this SR protein binds to alternative exons of the extracellular-matrix protein tenascin C pre-mRNA, promoting the expression of isoforms characteristic of invasive and metastatic cancer in a cell-type-independent manner. SRSF6 overexpression additionally results in depletion of Lgr6+ stem cells, and excessive keratinocyte proliferation and response to injury. Furthermore, the effects of SRSF6 in wound healing assayed in vitro depend on the TNC isoforms. Thus, abnormal SR-protein expression can perturb tissue homeostasis.
SRSF1; p53; ribosomal stress; RPL5; MDM2; oncogene-induced senescence; OIS; autoregulation; oncogenesis
Alternative splicing of the pyruvate kinase M gene (PK-M) can generate the M2 isoform and promote aerobic glycolysis and tumor growth. However, the cancer-specific alternative splicing regulation of PK-M is not completely understood. Here, we demonstrate that PK-M is regulated by reciprocal effects on the mutually exclusive exons 9 and 10, such that exon 9 is repressed and exon 10 is activated in cancer cells. Strikingly, exonic, rather than intronic, cis-elements are key determinants of PK-M splicing isoform ratios. Using a systematic sub-exonic duplication approach, we identify a potent exonic splicing enhancer in exon 10, which differs from its homologous counterpart in exon 9 by only two nucleotides. We identify SRSF3 as one of the cognate factors, and show that this serine/arginine-rich protein activates exon 10 and mediates changes in glucose metabolism. These findings provide mechanistic insights into the complex regulation of alternative splicing of a key regulator of the Warburg effect, and also have implications for other genes with a similar pattern of alternative splicing.
alternative splicing; cancer metabolism; pyruvate kinase; SRSF3
SF2/ASF is a prototypical SR protein, with important roles in splicing and other aspects of mRNA metabolism. SFRS1 (SF2/ASF) is a potent proto-oncogene with abnormal expression in many tumors. We found that SF2/ASF negatively autoregulates its expression to maintain homeostatic levels. We characterized six SF2/ASF alternatively spliced mRNA isoforms: the major isoform encodes full-length protein, whereas the others are either retained in the nucleus or degraded by NMD. Unproductive splicing accounts for only part of the autoregulation, which occurs primarily at the translational level. The effect is specific to SF2/ASF and requires RRM2. The ultraconserved 3′UTR is necessary and sufficient for downregulation. SF2/ASF overexpression shifts the distribution of target mRNA towards mono-ribosomes, and translational repression is partly independent of Dicer and a 5′ cap. Thus, multiple post-transcriptional and translational mechanisms are involved in fine-tuning the expression of SF2/ASF.
Efficient transcription of the HIV-1 genome is regulated by Tat, which recruits P-TEFb from the 7SK small nuclear ribonucleoprotein (snRNP) and other nucleoplasmic complexes to phosphorylate RNA polymerase II and other factors associated with the transcription complex. Although Tat activity is dependent on its binding to the viral TAR sequence, little is known about the cellular factors that might also assemble onto this region of the viral transcript. Here, we report that the splicing factor SRSF1 (SF2/ASF) and Tat recognize overlapping sequences within TAR and the 7SK RNA. SRSF1 expression can inhibit Tat transactivation by directly competing for its binding to TAR. Additionally, we provide evidence that SRSF1 can increase the basal level of viral transcription in the absence of Tat. We propose that SRSF1 activates transcription in the early stages of viral infection by recruiting P-TEFb to TAR from the 7SK snRNP, whereas in the later stages Tat substitutes for SRSF1 by promoting release of the stalled polymerase and more efficient transcriptional elongation.
Many mutations in the skeletal-muscle sodium-channel gene SCN4A have been associated with myotonia and/or periodic paralysis, but so far all of these mutations are located in exons. We found a patient with myotonia caused by a deletion/insertion located in intron 21 of SCN4A, which is an AT-AC type II intron. This is a rare class of introns that, despite having AT-AC boundaries, are spliced by the major or U2-type spliceosome. The patient's skeletal muscle expressed aberrantly spliced SCN4A mRNA isoforms generated by activation of cryptic splice sites. In addition, genetic suppression experiments using an SCN4A minigene showed that the mutant 5′ splice site has impaired binding to the U1 and U6 snRNPs, which are the cognate factors for recognition of U2-type 5′ splice sites. One of the aberrantly spliced isoforms encodes a channel with a 35-amino-acid insertion in the cytoplasmic loop between domains III and IV of Nav1.4. The mutant channel exhibited a marked disruption of fast inactivation, and a simulation in silico showed that the channel defect is consistent with the patient's myotonic symptoms. This is the first report of a disease-associated mutation in an AT-AC type II intron, and also the first intronic mutation in a voltage-gated ion channel gene showing a gain-of-function defect.
skeletal muscle; myotonia; splicing; gain-of-function; simulation; channelopathy
Splicing and translation are highly regulated steps of gene expression. Altered expression of proteins involved in these processes can be deleterious. Therefore, the cell has many safeguards against such misregulation. We report that the oncogenic splicing factor SRSF1, which is overexpressed in many cancers, stabilizes the tumor-suppressor protein p53 by abrogating its MDM2-dependent proteasomal degradation. We show that SRSF1 is a necessary component of an MDM2/ribosomal-protein complex—separate from the ribosome—that functions in a p53-dependent ribosomal-stress checkpoint pathway. Consistent with the stabilization of p53, increased SRSF1 expression in primary human fibroblasts decreases cellular proliferation and ultimately triggers oncogene-induced senescence (OIS). These findings underscore the deleterious outcome of SRSF1 overexpression and identify a cellular defense mechanism against its aberrant function. Furthermore, they implicate the RPL5-MDM2 complex in OIS, and demonstrate a link between spliceosomal and ribosomal components—functioning independently of their canonical roles—to monitor cellular physiology and cell-cycle progression.
A crucial step in analyzing mRNA-Seq data is to accurately and efficiently map hundreds of millions of reads to the reference genome and exon junctions. Here we present OLego, an algorithm specifically designed for de novo mapping of spliced mRNA-Seq reads. OLego adopts a multiple-seed-and-extend scheme, and does not rely on a separate external aligner. It achieves high sensitivity of junction detection by strategic searches with small seeds (∼14 nt for mammalian genomes). To improve accuracy and resolve ambiguous mapping at junctions, OLego uses a built-in statistical model to score exon junctions by splice-site strength and intron size. Burrows–Wheeler transform is used in multiple steps of the algorithm to efficiently map seeds, locate junctions and identify small exons. OLego is implemented in C++ with fully multithreaded execution, and allows fast processing of large-scale data. We systematically evaluated the performance of OLego in comparison with published tools using both simulated and real data. OLego demonstrated better sensitivity, higher or comparable accuracy and substantially improved speed. OLego also identified hundreds of novel micro-exons (<30 nt) in the mouse transcriptome, many of which are phylogenetically conserved and can be validated experimentally in vivo. OLego is freely available at http://zhanglab.c2b2.columbia.edu/index.php/OLego.
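The seed-and-extend idea described above can be illustrated with a toy sketch. This is not OLego's implementation: a plain hash index stands in for the Burrows–Wheeler transform, only the first and last k-mer of the read are used as seeds, and a GT-AG check stands in for the statistical model of splice-site strength and intron size. The function names, the `max_intron` parameter and the example sequences are all assumptions made for illustration.

```python
from collections import defaultdict

def build_seed_index(genome, k):
    """Map every k-mer to its start positions (a hash stand-in for a BWT index)."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index

def find_junctions(read, genome, index, k, max_intron=1000):
    """Pair hits of the read's first and last k-mer, then extend to locate one intron.

    Returns (intron_start, intron_end_exclusive, is_GT_AG) tuples, 0-based.
    """
    hits = []
    n = len(read)
    for lpos in index.get(read[:k], []):
        for rpos in index.get(read[-k:], []):
            # Genomic span between the seeds minus the read length is the gap.
            intron_len = (rpos + k) - lpos - n
            if not 0 < intron_len <= max_intron:
                continue
            # Try every split point that leaves both seeds intact.
            for split in range(k, n - k + 1):
                acceptor_end = lpos + split + intron_len
                left_ok = genome[lpos:lpos + split] == read[:split]
                right_ok = genome[acceptor_end:acceptor_end + n - split] == read[split:]
                if left_ok and right_ok:
                    intron = genome[lpos + split:acceptor_end]
                    canonical = intron.startswith("GT") and intron.endswith("AG")
                    hits.append((lpos + split, acceptor_end, canonical))
    return hits

# Toy sequences invented for this example (not real genomic data).
exon1, intron, exon2 = "ACGTTGCAAC", "GTAAGTTTTTCCCTTTCAG", "CATGGATCCA"
genome = exon1 + intron + exon2
read = exon1[-8:] + exon2[:8]          # a read spanning the exon-exon junction
index = build_seed_index(genome, k=5)
junctions = find_junctions(read, genome, index, k=5)
```

A real aligner would also tolerate mismatches, handle reads that cross several junctions, and rank competing junction placements with a scoring model rather than a yes/no dinucleotide check.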
One of the greatest thrills a biomedical researcher may experience is seeing the product of many years of dedicated effort finally make its way to the patient. As a team, we have worked for the past eight years to discover a drug that could treat a devastating childhood neuromuscular disease, spinal muscular atrophy (SMA). Here, we describe the journey that has led to a promising drug based on the biology underlying the disease.
Motivation: Alternative splicing (AS) is a pre-mRNA maturation process leading to the expression of multiple mRNA variants from the same primary transcript. More than 90% of human genes are expressed via AS. Therefore, quantifying the inclusion level of every exon is crucial for generating accurate transcriptomic maps and studying the regulation of AS.
Results: Here we introduce SpliceTrap, a method to quantify exon inclusion levels using paired-end RNA-seq data. Unlike other tools, which focus on full-length transcript isoforms, SpliceTrap approaches the expression-level estimation of each exon as an independent Bayesian inference problem. In addition, SpliceTrap can identify major classes of alternative splicing events under a single cellular condition, without requiring a background set of reads to estimate relative splicing changes. We tested SpliceTrap both by simulation and real data analysis, and compared it to state-of-the-art tools for transcript quantification. SpliceTrap demonstrated improved accuracy, robustness and reliability in quantifying exon-inclusion ratios.
Conclusions: SpliceTrap is a useful tool to study alternative splicing regulation, especially for accurate quantification of local exon-inclusion ratios from RNA-seq data.
Availability and Implementation: SpliceTrap can be run online through the CSH Galaxy server http://cancan.cshl.edu/splicetrap and is also available for download and installation at http://rulai.cshl.edu/splicetrap/.
Supplementary Information: Supplementary data are available at Bioinformatics online.
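As a rough illustration of treating each exon's inclusion level as its own Bayesian inference problem, the sketch below uses a simple Beta-binomial model on read counts. This is an assumed simplification, not SpliceTrap's actual model, which works on paired-end data; the function names, the uniform prior and the read counts are all hypothetical.

```python
import math

def inclusion_posterior(inc_reads, skip_reads, prior_a=1.0, prior_b=1.0):
    """Beta posterior for the inclusion ratio given inclusion/skipping read counts.

    Returns (a, b, posterior_mean) for a Beta(a, b) posterior.
    """
    a = prior_a + inc_reads
    b = prior_b + skip_reads
    return a, b, a / (a + b)

def credible_interval(a, b, level=0.95, n=2000):
    """Approximate an equal-tailed credible interval of Beta(a, b) on a midpoint grid."""
    xs = [(i + 0.5) / n for i in range(n)]
    logw = [(a - 1) * math.log(x) + (b - 1) * math.log(1 - x) for x in xs]
    peak = max(logw)
    w = [math.exp(v - peak) for v in logw]   # rescale to avoid underflow
    total = sum(w)
    lo_target = (1 - level) / 2
    hi_target = 1 - lo_target
    cum, lo, hi = 0.0, None, None
    for x, wi in zip(xs, w):
        cum += wi / total
        if lo is None and cum >= lo_target:
            lo = x
        if cum >= hi_target:
            hi = x
            break
    return lo, hi

# Hypothetical counts: 30 reads support inclusion, 10 support skipping.
a, b, psi_mean = inclusion_posterior(30, 10)
psi_lo, psi_hi = credible_interval(a, b)
```

With few reads the prior dominates and the interval is wide, which is one reason a per-exon Bayesian treatment can be more robust than a raw ratio of counts.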
Alternative splicing of the pyruvate kinase M gene involves a choice between mutually exclusive exons 9 and 10. Use of exon 10 to generate the M2 isoform is crucial for aerobic glycolysis (the Warburg effect) and tumour growth. We previously demonstrated that splicing enhancer elements that activate exon 10 are mainly found in exon 10 itself, and deleting or mutating these elements increases the inclusion of exon 9 in cancer cells. To systematically search for new enhancer elements in exon 10 and develop an effective pharmacological method to force a switch from PK-M2 to PK-M1, we carried out an antisense oligonucleotide (ASO) screen. We found potent ASOs that target a novel enhancer in exon 10 and strongly switch the splicing of endogenous PK-M transcripts to include exon 9. We further show that the ASO-mediated switch in alternative splicing leads to apoptosis in glioblastoma cell lines, and this is caused by the downregulation of PK-M2, and not by the upregulation of PK-M1. These data highlight the potential of ASO-mediated inhibition of PK-M2 splicing as therapy for cancer.
alternative splicing; antisense oligonucleotides; cancer
Both splicing factors and microRNAs are important regulatory molecules that play key roles in post-transcriptional gene regulation. By miRNA deep sequencing, we identified 40 miRNAs that are differentially expressed upon ectopic overexpression of the splicing factor SF2/ASF. Here we show that SF2/ASF and one of its upregulated microRNAs (miR-7) can form a negative feedback loop: SF2/ASF promotes miR-7 maturation, and mature miR-7 in turn targets the 3′UTR of SF2/ASF to repress its translation. Enhanced microRNA expression is mediated by direct interaction between SF2/ASF and the primary miR-7 transcript to facilitate Drosha cleavage and is independent of SF2/ASF’s function in splicing. Other miRNAs, including miR-221 and miR-222, may also be regulated by SF2/ASF through a similar mechanism. These results underscore a function of SF2/ASF in pri-miRNA processing and highlight the potential coordination between splicing control and miRNA-mediated gene repression in gene regulatory networks.
The SR protein splicing factor SRSF1 is a potent proto-oncogene that is frequently upregulated in cancer. Here we show that SRSF1 is a direct target of the transcription-factor oncoprotein MYC. These two oncogenes are significantly co-expressed in lung carcinomas, and MYC knockdown downregulates SRSF1 expression in lung-cancer cell lines. MYC directly activates transcription of SRSF1 through two non-canonical E-boxes in its promoter. The resulting increase in SRSF1 protein is sufficient to modulate alternative splicing of a subset of transcripts. In particular, MYC induction leads to SRSF1-mediated alternative splicing of the signaling kinase MKNK2 and the transcription factor TEAD1. SRSF1 knockdown reduces MYC’s oncogenic activity, decreasing proliferation and anchorage-independent growth. These results suggest a mechanism for SRSF1 upregulation in tumors with elevated MYC, and identify SRSF1 as a critical MYC target that contributes to its oncogenic potential by enabling MYC to regulate the expression of specific protein isoforms through alternative splicing.
Spinal muscular atrophy (SMA) is an autosomal recessive neuromuscular disorder caused by mutations in the SMN1 gene that result in a deficiency of SMN protein. One approach to treat SMA is to use antisense oligonucleotides (ASOs) to redirect the splicing of a paralogous gene, SMN2, to boost production of functional SMN. Injection of a 2′-O-2-methoxyethyl–modified ASO (ASO-10-27) into the cerebral lateral ventricles of mice with a severe form of SMA resulted in splice-mediated increases in SMN protein and in the number of motor neurons in the spinal cord, which led to improvements in muscle physiology, motor function and survival. Intrathecal infusion of ASO-10-27 into cynomolgus monkeys delivered putative therapeutic levels of the oligonucleotide to all regions of the spinal cord. These data demonstrate that central nervous system–directed ASO therapy is efficacious and that intrathecal infusion may represent a practical route for delivering this therapeutic in the clinic.
Alternative splicing and posttranslational modifications (PTMs) are major sources of protein diversity in eukaryotic proteomes. The SR protein SF2/ASF is an oncoprotein that functions in pre-mRNA splicing, with additional roles in other posttranscriptional and translational events. Functional studies of SR protein PTMs have focused exclusively on the reversible phosphorylation of Ser residues in the C-terminal RS domain. We confirmed that human SF2/ASF is methylated at residues R93, R97, and R109, which were identified in a global proteomic analysis of Arg methylation, and further investigated whether these methylated residues regulate the properties of SF2/ASF. We show that the three arginines additively control the subcellular localization of SF2/ASF and that both the positive charge and the methylation state are important. Mutations that block methylation and remove the positive charge result in the cytoplasmic accumulation of SF2/ASF. The consequent decrease in nuclear SF2/ASF levels prevents it from modulating the alternative splicing of target genes, results in higher translation stimulation, and abrogates the enhancement of nonsense-mediated mRNA decay. This study addresses the mechanisms by which Arg methylation and the associated positive charge regulate the activities of SF2/ASF and emphasizes the significance of localization control for an oncoprotein with multiple functions in different cellular compartments.
There is at present no cure or effective therapy for spinal muscular atrophy (SMA), a neurodegenerative disease that is the leading genetic cause of infant mortality. SMA usually results from loss of the SMN1 (survival of motor neuron 1) gene, which leads to selective motor neuron degeneration. SMN2 is nearly identical to SMN1 but has a nucleotide replacement that causes exon 7 skipping, resulting in a truncated, unstable version of the SMN protein. SMN2 is present in all SMA patients, and correcting SMN2 splicing is a promising approach for SMA therapy. We identified a tetracycline-like compound, PTK-SMA1, which stimulates exon 7 splicing and increases SMN protein levels in vitro and in vivo in mice. Unlike previously identified molecules that stimulate SMN production via SMN2 promoter activation or undefined mechanisms, PTK-SMA1 is a unique therapeutic candidate in that it acts by directly stimulating splicing of exon 7. Synthetic small-molecule compounds such as PTK-SMA1 offer an alternative to antisense oligonucleotide therapies that are being developed as therapeutics for a number of disease-associated splicing defects.
hnRNP A1 binds to RNA in a cooperative manner. Initial hnRNP A1 binding to an exonic splicing silencer at the 3′ end of human immunodeficiency virus type 1 (HIV-1) tat exon 3, which is a high-affinity site, is followed by cooperative spreading in a 3′-to-5′ direction. As hnRNP A1 propagates toward the 5′ end of the exon, it antagonizes binding of a serine/arginine-rich (SR) protein to an exonic splicing enhancer, thereby inhibiting splicing at that exon's alternative 3′ splice site. tat exon 3 and the preceding intron of HIV-1 pre-mRNA can fold into an elaborate RNA secondary structure in solution, which could potentially influence hnRNP A1 binding. We report here that hnRNP A1 binding and splicing repression can occur on an unstructured RNA. Moreover, hnRNP A1 can effectively unwind an RNA hairpin upon binding, displacing a bound protein. We further show that hnRNP A1 can also spread in a 5′-to-3′ direction, although when initial binding takes place in the middle of an RNA, spreading preferentially proceeds in a 3′-to-5′ direction. Finally, when two distant high-affinity sites are present on the same RNA, they facilitate cooperative spreading of hnRNP A1 between the two sites.
Kaposi's sarcoma-associated herpesvirus (KSHV) ORF57 facilitates the expression of both intronless viral ORF59 genes and intron-containing viral K8 and K8.1 genes (V. Majerciak, N. Pripuzova, J. P. McCoy, S. J. Gao, and Z. M. Zheng, J. Virol. 81:1062-1071, 2007). In this study, we showed that disruption of ORF57 in a KSHV genome led to increased accumulation of ORF50 and K8 pre-mRNAs and reduced expression of ORF50 and K-bZIP proteins but had no effect on latency-associated nuclear antigen (LANA). Cotransfection of ORF57 and K8β cDNA, which retains a suboptimal intron of K8 pre-mRNA due to alternative splicing, promoted RNA splicing of K8β and production of K8α (K-bZIP). Although Epstein-Barr virus EB2, a closely related homolog of ORF57, had a similar activity in the cotransfection assays, herpes simplex virus type 1 ICP27 was inactive. This enhancement of RNA splicing by ORF57 correlates with the intact N-terminal nuclear localization signal motifs of ORF57 and takes place in the absence of other viral proteins. In activated KSHV-infected B cells, KSHV ORF57 partially colocalizes with splicing factors in nuclear speckles and assembles into spliceosomal complexes in association with low-abundance viral ORF50 and K8 pre-mRNAs and essential splicing components. The association of ORF57 with snRNAs occurs by ORF57-Sm protein interaction. We also found that ORF57 binds K8β pre-mRNAs in vitro in the presence of nuclear extracts. Collectively our data indicate that KSHV ORF57 functions as a novel splicing factor in the spliceosome-mediated splicing of viral RNA transcripts.
Drosophila Pumilio (Pum) protein is a translational regulator involved in embryonic patterning and germline development. Recent findings demonstrate that Pum also plays an important role in the nervous system, both at the neuromuscular junction (NMJ) and in long-term memory formation. In neurons, Pum appears to play a role in homeostatic control of excitability via downregulation of para, a voltage-gated sodium channel, and may more generally modulate local protein synthesis in neurons via translational repression of eIF-4E. Aside from these, the biologically relevant targets of Pum in the nervous system remain largely unknown. We hypothesized that Pum might play a role in regulating the local translation underlying synapse-specific modifications during memory formation. To identify relevant translational targets, we used an informatics approach to predict Pum targets among mRNAs whose products have synaptic localization. We then used both in vitro binding and two in vivo assays to functionally confirm the fidelity of this informatics screening method. We find that Pum strongly and specifically binds to RNA sequences in the 3′UTR of four of the predicted target genes, demonstrating the validity of our method. We then demonstrate that one of these predicted target sequences, in the 3′UTR of discs large (dlg1), the Drosophila PSD95 ortholog, can functionally substitute for a canonical NRE (Nanos response element) in vivo in a heterologous functional assay. Finally, we show that the endogenous dlg1 mRNA can be regulated by Pumilio in a neuronal context, the adult mushroom bodies (MB), which is an anatomical site of memory storage.
The Drosophila Pumilio (Pum) protein was originally identified as a translational control factor for embryo patterning. Subsequent studies have identified Pum's role in multiple biological processes, including the maintenance of germline stem cells, the proliferation and migration of primordial germ cells, olfactory learning and memory, and synaptic plasticity. Pum is highly conserved across phyla, i.e., from worm to human; however, the mRNA targets of Pum within each tissue and organism are largely unknown. Moreover, the prediction of RNA binding sites remains a hard problem in computational biology. We were interested in finding Pum targets in the nervous system using fruit flies as a model organism. To accomplish this, we used the few Pum binding sequences that had previously been shown in vivo as “training sequences” to construct bioinformatic models of the Pum binding site. We then predicted a few Pum mRNA targets among the genes known to function in neuronal synapses. We then used a combination of “gold standards” to verify these predictions: a biochemical gel-shift assay, and in vivo functional assays in both embryos and neurons. With these approaches, we successfully confirmed one of the targets as Dlg, which is the Drosophila ortholog of human PSD95. Therefore, we present a complete story from computational study to real biological functions.
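A first-pass screen for candidate binding sites in a 3′UTR can be as simple as a consensus scan. The sketch below is not the bioinformatic model built in this study: it assumes the eight-nucleotide consensus UGUANAUA that is commonly reported for PUF-family proteins, and the function name and example sequence are invented for illustration.

```python
import re

# Assumed PUF-family consensus (N = any nucleotide); not the model from the study.
PUF_CONSENSUS = re.compile(r"(?=(UGUA[ACGU]AUA))")

def scan_utr(utr):
    """Return (0-based position, matched site) for each consensus match.

    Accepts DNA or RNA letters; a lookahead is used so overlapping sites are found.
    """
    rna = utr.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in PUF_CONSENSUS.finditer(rna)]

# Toy 3'UTR fragment, made up for this example.
toy_utr = "AATGTACATAGGGTGTAAATACC"
sites = scan_utr(toy_utr)
```

A strict consensus scan misses weaker sites and ignores flanking context, which is why a model trained on validated binding sequences, followed by experimental confirmation, is preferable for real predictions.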
Serine/arginine-rich (SR) proteins are essential splicing factors with one or two RNA-recognition motifs (RRMs) and a C-terminal arginine- and serine-rich (RS) domain. SR proteins bind to exonic splicing enhancers via their RRM(s), and from this position are thought to promote splicing by antagonizing splicing silencers, recruiting other components of the splicing machinery through RS-RS domain interactions, and/or promoting RNA base-pairing through their RS domains. An RS domain tethered at an exonic splicing enhancer can function as a splicing activator, and RS domains play prominent roles in current models of SR protein functions. However, we previously reported that the RS domain of the SR protein SF2/ASF is dispensable for in vitro splicing of some pre-mRNAs. We have now extended these findings via the identification of a short inhibitory domain at the SF2/ASF N-terminus; deletion of this segment permits splicing in the absence of this SR protein's RS domain of an IgM pre-mRNA substrate previously classified as RS-domain-dependent. Deletion of the N-terminal inhibitory domain increases the splicing activity of SF2/ASF lacking its RS domain, and enhances its ability to bind pre-mRNA. Splicing of the IgM pre-mRNA in S100 complementation with SF2/ASF lacking its RS domain still requires an exonic splicing enhancer, suggesting that an SR protein RS domain is not always required for ESE-dependent splicing activation. Our data provide additional evidence that the SF2/ASF RS domain is not strictly required for constitutive splicing in vitro, contrary to prevailing models for how the domains of SR proteins function to promote splicing.
Pre-mRNA splicing is a crucial step in gene expression, and accurate recognition of splice sites is an essential part of this process. Splice sites with weak matches to the consensus sequences are common, though it is not clear how such sites are efficiently utilized. Using an in vitro splicing-complementation approach, we identified PUF60 as a factor that promotes splicing of an intron with a weak 3′ splice-site. PUF60 has homology to U2AF65, a general splicing factor that facilitates 3′ splice-site recognition at the early stages of spliceosome assembly. We demonstrate that PUF60 can functionally substitute for U2AF65 in vitro, but splicing is strongly stimulated by the presence of both proteins. Reduction of either PUF60 or U2AF65 in cells alters the splicing pattern of endogenous transcripts, consistent with the idea that regulation of PUF60 and U2AF65 levels can dictate alternative splicing patterns. Our results indicate that recognition of 3′ splice sites involves different U2AF-like molecules, and that modulation of these general splicing factors can have profound effects on splicing.
Despite a growing number of splicing mutations found in hereditary diseases, utilization of aberrant splice sites and their effects on gene expression remain challenging to predict. We compiled sequences of 346 aberrant 5′ splice sites (5′ss) that were activated by mutations in 166 human disease genes. Mutations within the 5′ss consensus accounted for 254 cryptic 5′ss and mutations elsewhere activated 92 de novo 5′ss. Point mutations leading to cryptic 5′ss activation were most common in the first intron nucleotide, followed by the fifth nucleotide. Substitutions at position +5 were exclusively G>A transitions, which was largely attributable to high mutability rates of C/G>T/A. However, the frequency of point mutations at position +5 was significantly higher than that observed in the Human Gene Mutation Database, suggesting that alterations of this position are particularly prone to aberrant splicing, possibly due to a requirement for sequential interactions with U1 and U6 snRNAs. Cryptic 5′ss were best predicted by computational algorithms that accommodate nucleotide dependencies and not by weight-matrix models. Discrimination of intronic 5′ss from their authentic counterparts was less effective than for exonic sites, as the former were intrinsically stronger than the latter. Computational prediction of exonic de novo 5′ss was poor, suggesting that their activation critically depends on exonic splicing enhancers or silencers. The authentic counterparts of aberrant 5′ss were significantly weaker than the average human 5′ss. The development of an online database of aberrant 5′ss will be useful for studying basic mechanisms of splice-site selection, identifying splicing mutations and optimizing splice-site prediction algorithms.
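For readers unfamiliar with weight-matrix models, the sketch below shows the basic idea: each position of the 5′ss (here a 9-mer covering positions -3 to +6) is scored independently as a log-odds value. The four training sequences are toy examples invented for illustration, not data from this study, and the uniform background and pseudocount are assumptions.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from equal-length site sequences."""
    n = len(sites)
    pwm = []
    for column in zip(*sites):
        denom = n + pseudocount * len(BASES)
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / denom) / background)
            for base in BASES
        })
    return pwm

def score_site(pwm, site):
    """Sum per-position log-odds; positions are treated as independent."""
    return sum(col[base] for col, base in zip(pwm, site))

# Toy training set (positions -3..+6 of hypothetical 5' splice sites).
training = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTATGT"]
pwm = build_pwm(training)

consensus_score = score_site(pwm, "CAGGTAAGT")
plus5_mutant_score = score_site(pwm, "CAGGTAAAT")   # G>A at position +5
```

Because each position contributes separately, a model of this kind cannot capture the case where one mismatch is tolerated only if a neighboring position matches, which is the type of dependency that the better-performing algorithms mentioned above accommodate.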
In eukaryotic nuclei, DNA is wrapped around a protein octamer composed of the core histones H2A, H2B, H3, and H4, forming nucleosomes as the fundamental units of chromatin. The modification and deposition of specific histone variants play key roles in chromatin function. In this study, we established an in vitro system based on permeabilized cells that allows the assembly and exchange of histones in situ. H2A and H2B, each tagged with green fluorescent protein (GFP), are incorporated into euchromatin by exchange independently of DNA replication, and H3.1-GFP is assembled into replicated chromatin, as found in living cells. By purifying the cellular factors that assist in the incorporation of H2A–H2B, we identified protein phosphatase (PP) 2C γ subtype (PP2Cγ/PPM1G) as a histone chaperone that binds to and dephosphorylates H2A–H2B. The disruption of PP2Cγ in chicken DT40 cells increased the sensitivity to caffeine, a reagent that disturbs DNA replication and damage checkpoints, suggesting the involvement of PP2Cγ-mediated histone dephosphorylation and exchange in damage response or checkpoint recovery in higher eukaryotes.
We have collected over half a million splice sites from five species (Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana) and classified them into four subtypes: U2-type GT–AG and GC–AG and U12-type GT–AG and AT–AC. We have also found new examples of rare splice-site categories, such as U12-type introns without canonical borders, and U2-dependent AT–AC introns. The splice-site sequences and several tools to explore them are available on a public website (SpliceRack). For the U12-type introns, we find several features conserved across species, as well as a clustering of these introns on genes. Using the information content of the splice-site motifs, and the phylogenetic distance between them, we identify: (i) a higher degree of conservation in the exonic portion of the U2-type splice sites in more complex organisms; (ii) conservation of exonic nucleotides for U12-type splice sites; (iii) divergent evolution of C. elegans 3′ splice sites (3′ss) and (iv) distinct evolutionary histories of 5′ and 3′ss. Our study shows that the identification of broad patterns in naturally occurring splice sites, through the analysis of genomic datasets, provides mechanistic and evolutionary insights into pre-mRNA splicing.
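The two quantities this analysis rests on, intron border dinucleotides and per-position information content, can be sketched as follows. This is an illustrative calculation, not the SpliceRack pipeline: the information content is the uncorrected 2 minus Shannon entropy in bits, and the function names and example sequences are assumptions.

```python
import math
from collections import Counter

def border_class(intron):
    """Classify an intron by its terminal dinucleotides only.

    Borders alone do not settle U2 versus U12 type (for example, U2-dependent
    AT-AC introns exist), so this is just the first step of a classification.
    """
    borders = (intron[:2].upper(), intron[-2:].upper())
    names = {("GT", "AG"): "GT-AG", ("GC", "AG"): "GC-AG", ("AT", "AC"): "AT-AC"}
    return names.get(borders, "other")

def information_content(sites):
    """Per-position information content in bits (2 - entropy) for aligned DNA sites.

    No small-sample correction is applied.
    """
    n = len(sites)
    result = []
    for column in zip(*sites):
        counts = Counter(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        result.append(2.0 - entropy)
    return result

# Toy aligned dinucleotides, invented for this example.
ic = information_content(["GT", "GT", "GT", "GC"])
```

An invariant position carries the full 2 bits, while a position where all four bases are equally frequent carries 0 bits, so plotting these values along a motif highlights which positions are constrained.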