Alternative splicing contributes to both gene regulation and protein diversity. To discover broad relationships between regulation of alternative splicing and sequence conservation, we applied a systems approach, using oligonucleotide microarrays designed to capture splicing information across the mouse genome. In a set of 22 adult tissues, we observe differential expression of RNA containing at least two alternative splice junctions for about 40% of the 6,216 alternative events we could detect. Statistical comparisons identify 171 cassette exons whose inclusion or skipping is different in brain relative to other tissues and another 28 exons whose splicing is different in muscle. A subset of these exons is associated with unusual blocks of intron sequence whose conservation in vertebrates rivals that of protein-coding exons. By focusing on sets of exons with similar regulatory patterns, we have identified new sequence motifs implicated in brain and muscle splicing regulation. Of note is a motif that is strikingly similar to the branchpoint consensus but is located downstream of the 5′ splice site of exons included in muscle. Analysis of three paralogous membrane-associated guanylate kinase genes reveals that each contains a paralogous tissue-regulated exon with a similar tissue inclusion pattern. While the intron sequences flanking these exons remain highly conserved among mammalian orthologs, the paralogous flanking intron sequences have diverged considerably, suggesting unusually complex evolution of the regulation of alternative splicing in multigene families.
Alternative splicing expands the protein-coding potential of genes and genomes. RNAs copied from a gene can be spliced differently to produce distinct proteins under regulatory influences that arise during development or upon environmental change. These authors present a global analysis of alternative splicing in the mouse, using microarray measurements of splicing from 22 adult tissues. The ability to measure thousands of splicing events across the genome in many tissues has allowed the capture of co-regulated sets of exons whose inclusion in mRNA occurs preferentially in a given set of tissues. An examination of the sequences associated with exons whose expression is regulated in brain or muscle as compared to other tissues reveals extreme conservation of intron sequences near the regulated exon. These conserved regions contain sequence motifs likely to contribute to the regulation of alternative splicing in brain and muscle cells. The availability of global gene expression data with splicing-level resolution should spur the development of computational methods for detecting and predicting alternative splicing and its regulation. In addition, the authors make strong predictions for biological experiments leading to the identification of components and their mechanisms of action in the regulation of splicing during mammalian development.
Splicing is an important process for regulation of gene expression in eukaryotes, and it has important functional links to other steps of gene expression. Two examples of these linkages include Ceg1, a component of the mRNA capping enzyme, and the chromatin elongation factors Spt4–5, both of which have recently been shown to play a role in the normal splicing of several genes in the yeast Saccharomyces cerevisiae. Using a genomic approach to characterize the roles of Spt4–5 in splicing, we used splicing-sensitive DNA microarrays to identify specific sets of genes that are mis-spliced in ceg1, spt4, and spt5 mutants. In the context of a complex, nested, experimental design featuring 22 dye-swap array hybridizations, comprising both biological and technical replicates, we applied five appropriate statistical models for assessing differential expression between wild-type and the mutants. To refine selection of differentially expressed genes, we then used a robust model-synthesizing approach, Differential Expression via Distance Synthesis, to integrate all five models. The resultant list of differentially expressed genes was then further analyzed with regard to select attributes: we found that highly transcribed genes with long introns were most sensitive to spt mutations. QPCR confirmation of differential expression was established for the limited number of genes evaluated. In this paper, we showcase splicing array technology, as well as powerful, yet general, statistical methodology for assessing differential expression, in the context of a real, complex experimental design. Our results suggest that the Spt4–Spt5 complex may help coordinate splicing with transcription under conditions that present kinetic challenges to spliceosome assembly or function.
Splicing is a key process for the regulation of gene expression in eukaryotes and is credited as being the main reason for the extraordinary complexity of the human proteome relative to the human genome. Accurate splicing is crucial for normal protein function; aberrant transcripts due to splicing mutations are known causes of 15% of genetic diseases. Therefore, elucidation of splicing mechanisms will not only help in understanding the complexity and diversity of higher organisms, but also potentially aid in new therapeutic strategies for treatments of splicing-related genetic disorders. It has been previously shown that splicing has important links to other steps involved with gene expression. In this study, the authors pursue a genome-wide approach, using yeast-based, splicing-sensitive DNA microarrays in order to further characterize the roles of select splicing factors. They devise novel statistical and computational methods that enable identification of specific sets of genes that are mis-spliced in mutants of the chosen splicing factors. Follow-up investigation of known attributes of the genes so identified indicates that these factors may help coordinate splicing and transcription in situations where additional energy is required to effect splicing.
The human testis has almost as high a frequency of alternative splicing events as brain. While not as extensively studied as brain, a few candidate testis-specific splicing regulator proteins have been identified, including the nuclear RNA binding proteins RBMY and hnRNP G-T, which are germ cell-specific versions of the somatically expressed hnRNP G protein and are highly conserved in mammals. The splicing activator protein Tra2β is also highly expressed in the testis and physically interacts with these hnRNP G family proteins. In this study, we identified a novel testis-specific cassette exon TLE4-T within intron 6 of the human transducin-like enhancer of split 4 (TLE4) gene which makes a more transcriptionally repressive TLE4 protein isoform. TLE4-T splicing is normally repressed in somatic cells because of a weak 5′ splice site and surrounding splicing-repressive intronic regions. TLE4-T RNA pulls down Tra2β and hnRNP G proteins which activate its inclusion. The germ cell-specific RBMY and hnRNP G-T proteins were more efficient in stimulating TLE4-T incorporation than somatically expressed hnRNP G protein. Tra2β bound moderately to TLE4-T RNA, but more strongly to upstream sites to potently activate an alternative 3′ splice site normally weakly selected in the testis. Co-expression of Tra2β with either hnRNP G-T or RBMY re-established the normal testis physiological splicing pattern of this exon. Although they can directly bind pre-mRNA sequences around the TLE4-T exon, RBMY and hnRNP G-T function as efficient germ cell-specific splicing co-activators of TLE4-T. Our study indicates that a delicate balance between the activity of positive and negative splicing regulators combinatorially controls physiological splicing inclusion of exon TLE4-T and leads to modulation of signalling pathways in the testis. In addition, we identified a high-affinity binding site for hnRNP G-T protein, showing it is also a sequence-specific RNA binding protein.
This study investigates tissue-specific alternative splicing, which plays a key role in generating diversity in animal cells. We found a new testis-specific exon in a human homologue of the important Drosophila developmental regulator Groucho, which is activated by germ cell RNA binding proteins. By analyzing splicing control of this exon, we elucidated how variations in the activity and expression of splicing regulators together counterbalance splicing activation, and achieve more tightly regulated physiological splicing patterns. We find that although this new human testis-specific exon is not conserved in mice, it is functionally important in that it encodes a peptide which increases the activity of this developmental regulator as a transcriptional repressor. This study provides new insights into how signalling pathways are evolving in human germ cells and the possible molecular defects that might be occurring in infertile men who have genetic deletions of germ cell-specific RNA binding proteins.
Inclusion or exclusion of single codons at the splice acceptor site of mammalian genes is regulated in a tissue-specific manner, is strongly conserved, and is associated with local accelerated protein evolution.
Thousands of human genes contain introns ending in NAGNAG (N, any nucleotide), where both NAGs can function as 3′ splice sites, yielding isoforms that differ by inclusion/exclusion of three bases. However, few models exist for how such splicing might be regulated, and some studies have concluded that NAGNAG splicing is purely stochastic and nonfunctional. Here, we used deep RNA-Seq data from 16 human and eight mouse tissues to analyze the regulation and evolution of NAGNAG splicing. Using both biological and technical replicates to estimate false discovery rates, we estimate that at least 25% of alternatively spliced NAGNAGs undergo tissue-specific regulation in mammals, and alternative splicing of strongly tissue-specific NAGNAGs was 10 times as likely to be conserved between species as was splicing of non-tissue-specific events, implying selective maintenance. Preferential use of the distal NAG was associated with distinct sequence features, including a more distal location of the branch point and presence of a pyrimidine immediately before the first NAG, and alteration of these features in a splicing reporter shifted splicing away from the distal site. Strikingly, alignments of orthologous exons revealed a ∼15-fold increase in the frequency of three base pair gaps at 3′ splice sites relative to nearby exon positions both in mammals and in Drosophila. Alternative splicing of NAGNAGs in human was associated with dramatically increased frequency of exon length changes at orthologous exon boundaries in rodents, and a model involving point mutations that create, destroy, or alter NAGNAGs can explain both the increased frequency and biased codon composition of gained/lost sequence observed at the beginnings of exons.
This study shows that NAGNAG alternative splicing generates widespread differences between the proteomes of mammalian tissues, and suggests that the evolutionary trajectories of mammalian proteins are strongly biased by the locations and phases of the introns that interrupt coding sequences.
In order to translate a gene into protein, all of the non-coding regions (introns) need to be removed from the transcript and the coding regions (exons) stitched back together to make an mRNA. Most human genes are alternatively spliced, allowing the selection of different combinations of exons to produce multiple distinct mRNAs and proteins. Many types of alternative splicing are known to play crucial roles in biological processes including cell fate determination, tumor metabolism, and apoptosis. In this study, we investigated a form of alternative splicing in which competing adjacent 3′ splice sites (or splice acceptor sites) generate mRNAs differing by just an RNA triplet, the size of a single codon. This mode of alternative splicing, known as NAGNAG splicing, affects thousands of human genes and has been known for a decade, but its potential regulation, physiological importance, and conservation across species have been disputed. Using high-throughput sequencing of cDNA (“RNA-Seq”) from human and mouse tissues, we found that single-codon splicing often shows strong tissue specificity. Regulated NAGNAG alternative splice sites are selectively conserved between human and mouse genes, suggesting that they are important for organismal fitness. We identified features of the competing splice sites that influence NAGNAG splicing, and validated their effects in cultured cells. Furthermore, we found that this mode of splicing is associated with accelerated and highly biased protein evolution at exon boundaries. Taken together, our analyses demonstrate that the inclusion or exclusion of RNA triplets at exon boundaries can be effectively regulated by the splicing machinery, and highlight an unexpected connection between RNA processing and protein evolution.
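As a concrete illustration of the NAGNAG definition used above, the following minimal Python sketch checks whether an intron ends in NAGNAG and, if so, builds the two exon starts that differ by three bases. The function name, the example sequences, and the "first NAG"/"second NAG" labels are illustrative assumptions, not taken from the study.

```python
import re

# Intron 3' end: any base, AG, any base, AG (the NAGNAG tandem acceptor).
NAGNAG_AT_INTRON_END = re.compile(r"[ACGT]AG[ACGT]AG$")


def nagnag_isoforms(intron, downstream_exon):
    """Return (exon_if_first_NAG_used, exon_if_second_NAG_used), or None.

    If the first (upstream) NAG is used as the 3' splice site, the second NAG
    is retained as the first three bases of the exon. If the second NAG is
    used, the exon starts exactly where the annotated exon starts.
    Hypothetical helper for illustration only.
    """
    intron = intron.upper()
    downstream_exon = downstream_exon.upper()
    if not NAGNAG_AT_INTRON_END.search(intron):
        return None
    extra_triplet = intron[-3:]  # the second NAG
    return extra_triplet + downstream_exon, downstream_exon


# Made-up sequences: one intron ending in CAGCAG, one ending in a single CAG.
tandem = nagnag_isoforms("GTAAGTTTTCAGCAG", "GTTGCA")
single = nagnag_isoforms("GTAAGTTTTTTCAG", "GTTGCA")
```

Here `tandem` holds two exon starts whose lengths differ by exactly three bases (one codon), while `single` is `None` because the intron does not end in NAGNAG.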
Appropriate expression of most eukaryotic genes requires the removal of introns from their pre–messenger RNAs (pre-mRNAs), a process catalyzed by the spliceosome. In higher eukaryotes a large family of auxiliary factors known as SR proteins can improve the splicing efficiency of transcripts containing suboptimal splice sites by interacting with distinct sequences present in those pre-mRNAs. The yeast Saccharomyces cerevisiae lacks functional equivalents of most of these factors; thus, it has been unclear whether the spliceosome could effectively distinguish among transcripts. To address this question, we have used a microarray-based approach to examine the effects of mutations in 18 highly conserved core components of the spliceosomal machinery. The kinetic profiles reveal clear differences in the splicing defects of particular pre-mRNA substrates. Most notably, the behaviors of ribosomal protein gene transcripts are generally distinct from other intron-containing transcripts in response to several spliceosomal mutations. However, dramatically different behaviors can be seen for some pairs of transcripts encoding ribosomal protein gene paralogs, suggesting that the spliceosome can readily distinguish between otherwise highly similar pre-mRNAs. The ability of the spliceosome to distinguish among its different substrates may therefore offer an important opportunity for yeast to regulate gene expression in a transcript-dependent fashion. Given the high level of conservation of core spliceosomal components across eukaryotes, we expect that these results will significantly impact our understanding of how regulated splicing is controlled in higher eukaryotes as well.
The spliceosome is a large RNA-protein machine responsible for removing the noncoding (intron) sequences that interrupt eukaryotic genes. Nearly everything known about the behavior of this machine has been based on the analysis of only a handful of genes, despite the fact that individual introns vary greatly in both size and sequence. Here we have utilized a microarray-based platform that allows us to simultaneously examine the behavior of all intron-containing genes in the budding yeast S. cerevisiae. By systematically examining the effects of individual mutants in the spliceosome on the splicing of all substrates, we have uncovered a surprisingly complex relationship between the spliceosome and its full complement of substrates. Contrary to the idea that the spliceosome engages in “generic” interactions with all intron-containing substrates in the cell, our results show that the identity of the transcript can differentially affect splicing efficiency when the machine is subtly perturbed. We propose that the wild-type spliceosome can also distinguish among its many substrates as external conditions warrant to function as a specific regulator of gene expression.
Many eukaryotic gene transcripts are spliced; here the authors show that components of the splicing complex can distinguish between different introns in highly homologous transcripts.
Among the most common splice variations are small exon length variations caused by the use of alternative donor or acceptor splice sites that are in very close proximity on the pre-mRNA. Among these, three-nucleotide variations at so-called NAGNAG tandem acceptor sites have recently attracted considerable attention, and it has been suggested that these variations are regulated and serve to fine-tune protein forms by the addition or removal of a single amino acid. In this paper we first show that in-frame exon length variations are generally overrepresented and that this overrepresentation can be quantitatively explained by the effect of nonsense-mediated decay. Our analysis allows us to estimate that about 50% of frame-shifted coding transcripts are targeted by nonsense-mediated decay. Second, we show that a simple physical model that assumes that the splicing machinery stochastically binds to nearby splice sites in proportion to the affinities of the sites correctly predicts the relative abundances of different small length variations at both boundaries. Finally, using the same simple physical model, we show that for NAGNAG sites, the difference in affinities of the neighboring sites for the splicing machinery accurately predicts whether splicing will occur only at the first site, splicing will occur only at the second site, or three-nucleotide splice variants are likely to occur. Our analysis thus suggests that small exon length variations are the result of stochastic binding of the spliceosome at neighboring splice sites. Small exon length variations occur when there are nearby alternative splice sites that have similar affinity for the splicing machinery.
It has recently become clear that splice variation affects most mammalian genes. It is, however, less clear to what extent these splice variations are functional and regulated by the cell as opposed to simply a result of noise in the splicing process.
Among the most frequently observed forms of splice variation are small variations in exon length in which the boundary of an exon is shifted by small amounts between different transcripts. In this work the authors study the statistics of these splice variations in detail, and the results suggest that these variations are mostly the result of noise in the splicing process. In particular, they propose a simple physical model in which the last step of splicing involves the sequence-specific binding of the splicing machinery to the splice site. In this model, small length variations can occur when there are nearby splice sites with comparable affinity for the splicing machinery. The authors show that this model not only accurately predicts the relative abundances of different splice variations but also predicts which splice sites are likely to undergo small exon length variations.
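The idea that the splicing machinery binds neighboring sites in proportion to their affinities can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: affinities are represented as exponentials of hypothetical log-affinity scores, and the 5% cutoff used to call a site "silent" is an arbitrary value chosen for the example.

```python
import math


def site_usage(log_affinities):
    """Fraction of transcripts spliced at each candidate site, assuming usage
    is proportional to affinity, with affinity = exp(log-affinity score)."""
    top = max(log_affinities)
    weights = [math.exp(s - top) for s in log_affinities]  # shifted for stability
    total = sum(weights)
    return [w / total for w in weights]


def classify_tandem(log_aff_first, log_aff_second, minor_fraction=0.05):
    """Label a two-site tandem acceptor by its predicted minor-isoform share.

    minor_fraction is a hypothetical cutoff, not a value from the paper.
    """
    first, second = site_usage([log_aff_first, log_aff_second])
    if second < minor_fraction:
        return "first site only"
    if first < minor_fraction:
        return "second site only"
    return "both sites (variant expected)"


equal_sites = site_usage([0.0, 0.0])
labels = [classify_tandem(5.0, 0.0), classify_tandem(0.0, 5.0), classify_tandem(0.0, 0.5)]
```

With equal scores the two sites are each used half of the time; as the score difference grows, the predicted share of the weaker site falls below the cutoff and only one isoform is expected, which mirrors the three outcomes described for NAGNAG sites.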
The splicing of pre-mRNAs is an essential step of gene expression in eukaryotes. Introns are removed from split genes through the activities of the spliceosome, a large ribonuclear machine that is conserved throughout the eukaryotic lineage. While unicellular eukaryotes are characterized by less complex splicing, pre-mRNA splicing of multicellular organisms is often associated with extensive alternative splicing that significantly enriches their proteome. The alternative selection of splice sites and exons permits multicellular organisms to modulate gene expression patterns in a cell type-specific fashion, thus contributing to their functional diversification. Alternative splicing is a regulated process that is mainly influenced by the activities of splicing regulators, such as SR proteins or hnRNPs. These modular factors have evolved from a common ancestor through gene duplication events to a diverse group of splicing regulators that mediate exon recognition through their sequence-specific binding to pre-mRNAs. Given the strong correlations between intron expansion, the complexity of pre-mRNA splicing, and the emergence of splicing regulators, it is argued that the increased presence of SR and hnRNP proteins promoted the evolution of alternative splicing through relaxation of the sequence requirements of splice junctions.
Pre-mRNA splicing; Spliceosome; alternative splicing; splicing regulation; SR protein; hnRNP protein; evolution; intron expansion; gene duplication; multicellular eukaryote; unicellular eukaryote; exon recognition; splice site
Accurate mRNA splicing depends on multiple regulatory signals encoded in the transcribed RNA sequence. Many examples of mutations within human splice regulatory regions that alter splicing qualitatively or quantitatively have been reported and allelic differences in mRNA splicing are likely to be a common and important source of phenotypic diversity at the molecular level, in addition to their contribution to genetic disease susceptibility. However, because the effect of a mutation on the efficiency of mRNA splicing is often difficult to predict, many mutations that cause disease through an effect on splicing are likely to remain undiscovered.
We have combined a genome-wide scan for sequence polymorphisms likely to affect mRNA splicing with analysis of publicly available Expressed Sequence Tag (EST) and exon array data. The genome-wide scan uses published tools and identified 30,977 SNPs located within donor and acceptor splice sites, branch points and exonic splicing enhancer elements. For 1,185 candidate splicing polymorphisms the difference in splicing between alternative alleles was corroborated by publicly available exon array data from 166 lymphoblastoid cell lines. We developed a novel probabilistic method to infer allele-specific splicing from EST data. The method uses SNPs and alternative mRNA isoforms mapped to EST sequences and models both regulated alternative splicing as well as allele-specific splicing. We have also estimated heritability of splicing and report that a greater proportion of genes show evidence of splicing heritability than show heritability of overall gene expression level. Our results provide an extensive resource that can be used to assess the possible effect on splicing of human polymorphisms in putative splice-regulatory sites.
We report a set of genes showing evidence of allele-specific splicing from an integrated analysis of genomic polymorphisms, EST data and exon array data, including several examples for which there is experimental evidence of polymorphisms affecting splicing in the literature. We also present a set of novel allele-specific splicing candidates and discuss the strengths and weaknesses of alternative technologies for inferring the effect of sequence variants on mRNA splicing.
The enrichment of specific intronic splicing enhancers upstream of weak PY tracts suggests a novel mechanism for intron recognition that compensates for a weakened canonical pre-mRNA splicing motif.
While the current model of pre-mRNA splicing is based on the recognition of four canonical intronic motifs (5' splice site, branchpoint sequence, polypyrimidine (PY) tract and 3' splice site), it is becoming increasingly clear that splicing is regulated by both canonical and non-canonical splicing signals located in the RNA sequence of introns and exons that act to recruit the spliceosome and associated splicing factors. The diversity of human intronic sequences suggests the existence of novel recognition pathways for non-canonical introns. This study addresses the recognition and splicing of human introns that lack a canonical PY tract. The PY tract is a uridine-rich region at the 3' end of introns that acts as a binding site for U2AF65, a key factor in splicing machinery recruitment.
Human introns were classified computationally into low- and high-scoring PY tracts by scoring the likely U2AF65 binding site strength. Biochemical studies confirmed that low-scoring PY tracts are weak U2AF65 binding sites while high-scoring PY tracts are strong U2AF65 binding sites. A large population of human introns contains weak PY tracts. Computational analysis revealed many families of motifs, including C-rich and G-rich motifs, that are enriched upstream of weak PY tracts. In vivo splicing studies show that C-rich and G-rich motifs function as intronic splicing enhancers in a combinatorial manner to compensate for weak PY tracts.
The enrichment of specific intronic splicing enhancers upstream of weak PY tracts suggests that a novel mechanism for intron recognition exists, which compensates for a weakened canonical pre-mRNA splicing motif.
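The abstract does not state how U2AF65 binding strength was scored, so the following Python sketch uses a deliberately crude stand-in: the weighted uridine and cytidine content of a window just upstream of the terminal three bases of the intron. The window size, the base weights, and the cutoff separating "weak" from "strong" tracts are all hypothetical choices made for illustration.

```python
def py_tract_score(intron, window=20, skip=3, u_weight=1.0, c_weight=0.5):
    """Crude PY-tract strength in [0, 1]: weighted U/C content of the window
    that ends `skip` bases before the intron's 3' end.

    This is an illustrative proxy, not the published scoring scheme.
    """
    seq = intron.upper().replace("T", "U")
    end = max(0, len(seq) - skip)      # exclude the terminal YAG
    start = max(0, end - window)
    region = seq[start:end]
    if not region:
        return 0.0
    total = sum(u_weight if b == "U" else c_weight if b == "C" else 0.0
                for b in region)
    return total / len(region)


def classify_py_tract(intron, cutoff=0.6):
    """Split introns into 'weak' and 'strong' using a hypothetical cutoff."""
    return "strong" if py_tract_score(intron) >= cutoff else "weak"


u_rich = "GGGGGGGGGG" + "T" * 20 + "CAG"   # made-up intron with a U-rich tract
purine_rich = "A" * 30 + "CAG"             # made-up intron with no PY tract
```

Introns labeled "weak" by a scheme like this would be the ones in which to look for compensating upstream motifs such as the C-rich and G-rich elements described above.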
Messenger RNA splicing is an essential and complex process for the removal of intron sequences. Whereas the composition of the splicing machinery is mostly known, the kinetics of splicing, the catalytic activity of splicing factors and the interdependency of transcription, splicing and mRNA 3′ end formation are less well understood. We propose a stochastic model of splicing kinetics that explains data obtained from high-resolution kinetic analyses of transcription, splicing and 3′ end formation during induction of an intron-containing reporter gene in budding yeast. Modelling reveals co-transcriptional splicing to be the most probable and most efficient splicing pathway for the reporter transcripts, due in part to a positive feedback mechanism for co-transcriptional second step splicing. Model comparison is used to assess the alternative representations of reactions. Modelling also indicates the functional coupling of transcription and splicing, because both the rate of initiation of transcription and the probability that step one of splicing occurs co-transcriptionally are reduced, when the second step of splicing is abolished in a mutant reporter.
The coding information for the synthesis of proteins in mammalian cells is first transcribed from DNA to messenger RNA (mRNA), before being translated from mRNA to protein. Each step is complex, and subject to regulation. Certain sequences of DNA must be skipped in order to generate a functional protein, and these sequences, known as introns, are removed from the mRNA by the process of splicing. Splicing is well understood in terms of the proteins and complexes that are involved, but the rates of reactions, and models for the splicing pathways, have not yet been established. We present a model of splicing in yeast that accounts for the possibility that splicing may take place while the mRNA is in the process of being created, as well as the possibility that splicing takes place once mRNA transcription is complete. We assign rates to the reactions in the pathway, and show that co-transcriptional splicing is the preferred pathway. In order to reach these conclusions, we compare a number of alternative models by a quantitative computational method. Our analysis relies on the quantitative measurement of messenger RNA in live cells, which is a major challenge in itself that has only recently been addressed.
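To make the notion of a "preferred pathway" concrete, the toy Python sketch below treats splicing and transcript release as two competing first-order reactions with hypothetical rate constants. It is far simpler than the stochastic model described above (no feedback, a single splicing step) and is offered only to show how reaction rates translate into pathway probabilities.

```python
import random


def p_cotranscriptional(k_splice, k_release):
    """Probability that splicing fires before release when both are
    independent first-order (exponential waiting time) processes."""
    return k_splice / (k_splice + k_release)


def simulate(k_splice, k_release, n=20000, seed=0):
    """Monte Carlo check: draw both waiting times for n transcripts and count
    how often splicing comes first."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        t_splice = rng.expovariate(k_splice)
        t_release = rng.expovariate(k_release)
        if t_splice < t_release:
            hits += 1
    return hits / n


# Hypothetical rates (arbitrary units): splicing three times faster than release.
analytic = p_cotranscriptional(3.0, 1.0)
estimate = simulate(3.0, 1.0)
```

The simulated fraction should approach the analytic value as `n` grows; comparing such predictions with measured kinetics is the kind of reasoning that lets alternative pathway models be ranked.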
Splice site selection is a key element of pre-mRNA splicing. Although it is known to involve specific recognition of short consensus sequences by the splicing machinery, the mechanisms by which 5′ splice sites are accurately identified remain controversial and incompletely resolved. The human F7 gene contains in its seventh intron (IVS7) a 37-bp VNTR minisatellite whose first element spans the exon7–IVS7 boundary. As a consequence, the IVS7 authentic donor splice site is followed by several cryptic splice sites identical in sequence, referred to as 5′ pseudo-sites, which normally remain silent. This region, therefore, provides a remarkable model to decipher the mechanism underlying 5′ splice site selection in mammals. We previously suggested a model for splice site selection that, in the presence of consecutive splice consensus sequences, would stimulate exclusively the selection of the most upstream 5′ splice site, rather than repressing the downstream (3′) pseudo-sites. In the present study, we provide experimental support for this hypothesis by using a mutational approach involving a panel of 50 mutant and wild-type F7 constructs expressed in various cell types. We demonstrate that the F7 IVS7 5′ pseudo-sites are functional, but do not compete with the authentic donor splice site. Moreover, we show that the selection of the 5′ splice site follows a scanning-type mechanism, precluding competition with other functional 5′ pseudo-sites available in the immediate sequence context downstream of the activated one. In addition, 5′ pseudo-sites whose complementarity to U1 snRNA is increased to as much as 91% do not compete with the identified scanning mechanism. Altogether, these findings, which unveil a cell type–independent 5′−3′-oriented scanning process for accurate recognition of the authentic 5′ splice site, reconcile apparently contradictory observations by establishing a hierarchy of competitiveness among the determinants involved in 5′ splice site selection.
Typically, mammalian genes contain coding sequences (exons) separated by non-coding sequences (introns). Introns are removed during pre-mRNA splicing. The accurate recognition of introns during splicing is essential, as any abnormality in that process will generate abnormal mRNAs that can cause diseases. Understanding the mechanisms of accurate splice site selection is of prime interest to life scientists. Exon–intron borders (splice sites) are defined by short sequences that are poorly conserved. The strength of any splice sequence can be assessed by its degree of homology with a splice site consensus sequence. Within exons and introns, several sequences can match this consensus as well as or better than the splice sites. Using a system in which a splice site sequence is repeated several times in the intron, the authors showed that a linear 5′−3′ search is a leading mechanism underlying splice site selection. This scanning mechanism is cell type–independent, and only the most upstream splice site of the series is selected, even if splice sites with a better match to the consensus are in the vicinity. These findings reconcile contradictory observations and establish a hierarchy among the determinants involved in splice site selection.
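The contrast between a 5′−3′ scanning search and a pure competition between sites can be sketched as follows. This Python example is a simplified illustration, not the experimental system: the nine-base consensus CAGGTAAGT (three exonic and six intronic bases), the match-fraction score, the 0.6 cutoff, and the test sequence are all assumptions made for the example.

```python
CONSENSUS = "CAGGTAAGT"  # 3 exonic + 6 intronic bases; boundary follows "CAG"


def match_fraction(ninemer):
    """Share of positions identical to the assumed consensus."""
    return sum(a == b for a, b in zip(ninemer, CONSENSUS)) / len(CONSENSUS)


def candidate_sites(seq, min_match=0.6):
    """Yield (boundary_index, match) in 5'->3' order for every 9-mer that has
    GT at the intron start and meets the (hypothetical) cutoff."""
    seq = seq.upper()
    for i in range(len(seq) - 8):
        ninemer = seq[i:i + 9]
        if ninemer[3:5] == "GT":
            m = match_fraction(ninemer)
            if m >= min_match:
                yield i + 3, m


def select_by_scanning(seq, min_match=0.6):
    """Scanning: take the first acceptable site, ignoring anything downstream."""
    return next(candidate_sites(seq, min_match), None)


def select_by_best_match(seq, min_match=0.6):
    """Competition: take the best-scoring site wherever it lies."""
    return max(candidate_sites(seq, min_match), key=lambda s: s[1], default=None)


# Made-up sequence: an imperfect site upstream of a perfect consensus match.
seq = "AAACAGGTGAGAAAAAACAGGTAAGTAAA"
scanned = select_by_scanning(seq)
best = select_by_best_match(seq)
```

For this sequence the scanning rule picks the upstream, imperfect site (boundary index 6), whereas the competition rule picks the downstream perfect match (boundary index 20); the findings summarized above correspond to the first behavior.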
The unconventional splicing of Hac1 by the ribonuclease Ire1 is a key event in the activation of the unfolded protein response (UPR) in Saccharomyces cerevisiae. This splicing is independent of the spliceosome and is mediated by a secondary structure at the intron-exon boundaries of the mRNA. Similar unconventional splicing was also described for the gene Xbp1 in human, mouse, Caenorhabditis elegans and Drosophila melanogaster, and for Hac1 in five other fungi. We used reported RNA structures to build a multiple sequence alignment and the Infernal package to search for homologous structures. We identified homologous non-canonical intron structures in 128 out of 156 searched eukaryotic genomes. Our results show that the sequence of the Hac1/Xbp1 intron is highly conserved only around the splice sites recognized by Ire1. The consensus structure of the Hac1/Xbp1 mRNA is well conserved in Fungi and Metazoa and resembles structures previously described. We show that a typical Hac1/Xbp1 intron is very short, only 20–26 bases, whereas yeast species have a long intron (>100 bases). We identified six species with unambiguous Hac1/Xbp1 homologs that have lost the non-canonical intron structure. We propose that these species use a different mechanism to regulate the UPR.
unfolded protein response; splicing; RNA structure; intron; HAC1; XBP1
Many alternative splicing events are regulated by pentameric and hexameric intronic sequences that serve as binding sites for splicing regulatory factors. We hypothesized that intronic elements that regulate alternative splicing are under selective pressure for evolutionary conservation. Using a Wobble Aware Bulk Aligner genomic alignment of Caenorhabditis elegans and Caenorhabditis briggsae, we identified 147 alternatively spliced cassette exons that exhibit short regions of high nucleotide conservation in the introns flanking the alternative exon. In vivo experiments on the alternatively spliced let-2 gene confirm that these conserved regions can be important for alternative splicing regulation. Conserved intronic element sequences were collected into a dataset and the occurrence of each pentamer and hexamer motif was counted. We compared the frequency of pentamers and hexamers in the conserved intronic elements to a dataset of all C. elegans intron sequences in order to identify short intronic motifs that are more likely to be associated with alternative splicing. High-scoring motifs were examined for upstream or downstream preferences in introns surrounding alternative exons. Many of the high-scoring nematode pentamer and hexamer motifs correspond to known mammalian splicing regulatory sequences, such as (T)GCATG, indicating that the mechanism of alternative splicing regulation is well conserved in metazoans. A comparison of the analysis of the conserved intronic elements, and analysis of the entire introns flanking these same exons, reveals that focusing on intronic conservation can increase the sensitivity of detecting putative splicing regulatory motifs. This approach also identified novel sequences whose role in splicing is under investigation and has allowed us to take a step forward in defining a catalog of splicing regulatory elements for an organism.
In vivo experiments confirm that one novel high-scoring sequence from our analysis, (T)CTATC, is important for alternative splicing regulation of the unc-52 gene.
Alternative splicing of precursor messenger RNA is a process by which multiple protein isoforms are generated from a single gene. As many as 60% of human genes are processed in this manner, creating tissue-specific isoforms of proteins that may be a key factor in regulating the complexity of our physiology. One of the major challenges to understanding this process is to identify the sequences on the precursor messenger RNA responsible for splicing regulation. Some of these regulatory sequences occur in regions that are spliced out (called introns). This study tested the hypothesis that there should be evolutionary pressure to maintain these intronic regulatory sequences, even though intron sequence is non-coding and rapidly diverges between species. The authors employed a genomic alignment of two roundworms, Caenorhabditis elegans and Caenorhabditis briggsae, to investigate the regulation of alternative splicing. By examining evolutionarily conserved stretches of introns flanking alternatively spliced exons, the authors identified and functionally confirmed splicing regulatory sequences. Many of the top scoring sequences match known mammalian regulators, suggesting the alternative splicing regulatory mechanism is conserved across all metazoans. Other sequences were not previously identified in mammals and may represent new alternative splicing regulatory elements in higher organisms or ones that may be specific to worms.
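The pentamer and hexamer comparison described above can be illustrated with a minimal sketch. It counts overlapping k-mers in a set of conserved elements and in a background set of introns, then reports a frequency ratio for each k-mer. The function names, the pseudocount, and the plain frequency ratio are assumptions made for illustration; the scoring statistic actually used in the study is not specified here and may differ.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping k-mer across a list of DNA sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers with ambiguous bases
                counts[kmer] += 1
    return counts


def kmer_enrichment(conserved, background, k, pseudocount=1.0):
    """Ratio of each k-mer's frequency in conserved elements to its
    frequency in background introns (hypothetical scoring, with a
    pseudocount so that unseen background k-mers do not divide by zero)."""
    fg = count_kmers(conserved, k)
    bg = count_kmers(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    n_kmers = 4 ** k
    scores = {}
    for kmer in fg:
        fg_freq = (fg[kmer] + pseudocount) / (fg_total + pseudocount * n_kmers)
        bg_freq = (bg[kmer] + pseudocount) / (bg_total + pseudocount * n_kmers)
        scores[kmer] = fg_freq / bg_freq
    return scores


# Toy sequences, invented for illustration only.
conserved_elements = ["TTGCATGTT", "AAGCATGAA"]
background_introns = ["ATATATATATAT", "CCCCCCCCCCCC", "TTTTTTTTTTTT"]
pentamer_scores = kmer_enrichment(conserved_elements, background_introns, k=5)
top_pentamer = max(pentamer_scores, key=pentamer_scores.get)
```

In this toy input the pentamer GCATG occurs in both conserved elements and never in the background, so it receives the highest ratio. Real data would call for a significance test rather than a raw ratio.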
Computational and experimental evidence is given for alternative splicing at the unusual GYNGYN motif in several species, enabling in most cases subtle protein variations.
Splice donor sites have a highly conserved GT or GC dinucleotide and an extended intronic consensus sequence GTRAGT that reflects the sequence complementarity to the U1 snRNA. Here, we focus on unusual donor sites with the motif GYNGYN (Y stands for C or T; N stands for A, C, G, or T).
While only one GY functions as a splice donor for the majority of these splice sites in human, we provide computational and experimental evidence that 110 (1.3%) allow alternative splicing at both GY donors. The resulting splice forms differ in only three nucleotides, which results mostly in the insertion/deletion of one amino acid. However, we also report the insertion of a stop codon in four cases. Investigating what distinguishes alternatively from not alternatively spliced GYNGYN donors, we found differences in the binding to U1 snRNA, a strong correlation between U1 snRNA binding strength and the preferred donor, over-represented sequence motifs in the adjacent introns, and a higher conservation of the exonic and intronic flanks between human and mouse. Extending our genome-wide analysis to seven other eukaryotic species, we found alternatively spliced GYNGYN donors in all species from mouse to Caenorhabditis elegans and even in Arabidopsis thaliana. Experimental verification of a conserved GTAGTT donor of the STAT3 gene in human and mouse reveals a remarkably similar ratio of alternatively spliced transcripts in both species.
In contrast to alternative splicing in general, GYNGYN donors in addition to NAGNAG acceptors enable subtle protein variations.
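The GYNGYN pattern defined above (Y is C or T; N is any base) maps directly onto a regular expression. The following is a minimal sketch for locating candidate motifs in a sequence; the function name is hypothetical, and a match says nothing about whether either GY is actually used as a donor.

```python
import re

# G, then C or T (Y), then any base (N), twice in a row.
# The lookahead lets overlapping occurrences all be reported.
GYNGYN = re.compile(r"(?=(G[CT][ACGT]G[CT][ACGT]))")


def find_gyngyn(seq):
    """Return (0-based start, matched hexamer) for each GYNGYN occurrence."""
    return [(m.start(), m.group(1)) for m in GYNGYN.finditer(seq.upper())]


# The GTAGTT hexamer is the STAT3 donor mentioned above; the flanking
# bases in this toy sequence are invented.
hits = find_gyngyn("AAGTAGTTCC")
```

The lookahead matters for tandem repeats: in GTGGTGGTG the pattern starts at both position 0 and position 3, and a plain non-overlapping search would report only the first.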
RNA splicing is a major regulatory mechanism for controlling eukaryotic gene expression. By generating various splice isoforms from a single pre–mRNA, alternative splicing plays a key role in promoting the evolving complexity of metazoans. Numerous splicing factors have been identified. However, the in vivo functions of many splicing factors remain to be understood. In vivo studies are essential for understanding the molecular mechanisms of RNA splicing and the biology of numerous RNA splicing-related diseases. We previously isolated a Caenorhabditis elegans mutant defective in an essential gene from a genetic screen for suppressors of the rubberband Unc phenotype of unc-93(e1500) animals. This mutant contains missense mutations in two adjacent codons of the C. elegans microfibrillar-associated protein 1 gene mfap-1. mfap-1(n4564 n5214) suppresses the Unc phenotypes of different rubberband Unc mutants in a pattern similar to that of mutations in the splicing factor genes uaf-1 (the C. elegans U2AF large subunit gene) and sfa-1 (the C. elegans SF1/BBP gene). We used the endogenous gene tos-1 as a reporter for splicing and detected increased intron 1 retention and exon 3 skipping of tos-1 transcripts in mfap-1(n4564 n5214) animals. Using a yeast two-hybrid screen, we isolated splicing factors as potential MFAP-1 interactors. Our studies indicate that C. elegans mfap-1 encodes a splicing factor that can affect alternative splicing.
RNA splicing removes intervening intronic sequences from pre–mRNA transcripts and joins adjacent exonic sequences to generate functional messenger RNAs. The in vivo functions of numerous factors that regulate splicing remain to be understood. From a genetic screen for suppressors of the rubberband Unc phenotype caused by the Caenorhabditis elegans unc-93(e1500) mutation, we isolated a mutation that affects a highly conserved essential gene, mfap-1. MFAP-1 is a nuclear protein that is broadly expressed. MFAP-1 can affect the alternative splicing of tos-1, an endogenous reporter gene for splicing, and is required for the altered splicing at a cryptic 3′ splice site of tos-1. mfap-1 enhances the effects of the gene uaf-1 (splicing factor U2AF large subunit) in suppressing the rubberband Unc phenotype of unc-93(e1500) animals. Our studies provide in vivo evidence that MFAP-1 functions as a splicing factor.
Little is known about pre-mRNA splicing in Dictyostelium discoideum although its genome has been completely sequenced. Our analysis suggests that pre-mRNA splicing plays an important role in D. discoideum gene expression as two thirds of its genes contain at least one intron. Ongoing curation of the genome to date has revealed 40 genes in D. discoideum with clear evidence of alternative splicing, supporting the existence of alternative splicing in this unicellular organism. We identified 160 candidate U2-type spliceosomal proteins and related factors in D. discoideum based on 264 known human genes involved in splicing. Spliceosomal small ribonucleoproteins (snRNPs), PRP19 complex proteins and late-acting proteins are highly conserved in D. discoideum and throughout the metazoa. In non-snRNP and hnRNP families, D. discoideum orthologs are closer to those in A. thaliana, D. melanogaster and H. sapiens than to their counterparts in S. cerevisiae. Several splicing regulators, including SR proteins and CUG-binding proteins, were found in D. discoideum, but not in yeast. Our comprehensive catalog of spliceosomal proteins provides useful information for future studies of splicing in D. discoideum where the efficient genetic and biochemical manipulation will also further our general understanding of pre-mRNA splicing.
pre-mRNA splicing; spliceosomal genes; Dictyostelium discoideum; comparative genomics; splicing regulators
Gene expression is regulated mainly by transcription factors (TFs) that interact with regulatory cis-elements on DNA sequences. To identify functional regulatory elements, computer searching can predict TF binding sites (TFBS) using position weight matrices (PWMs) that represent positional base frequencies of collected experimentally determined TFBS. A disadvantage of this approach is the large output of results for genomic DNA. One strategy to identify genuine TFBS is to utilize local concentrations of predicted TFBS. It is unclear whether there is a general tendency for TFBS to cluster at promoter regions, although this is the case for certain TFBS. It is also unclear which TFs have TFBS concentrated in promoters and to what extent this occurs. This study addresses some of these questions.
We developed the cluster score measure to evaluate the correlation between predicted TFBS clusters and promoter sequences for each PWM. Non-promoter sequences were used as a control. Using the cluster score, we identified a PWM group called PWM-PCP, in which TFBS clusters positively correlate with promoters, and another PWM group called PWM-NCP, in which TFBS clusters negatively correlate with promoters. The PWM-PCP group comprises 47% of the 199 vertebrate PWMs, while the PWM-NCP group comprises 11%. After reducing the effect of CpG islands (CGI) against the clusters using partial correlation coefficients among three properties (promoter, CGI and predicted TFBS cluster), we identified two PWM groups: those strongly correlated with CGI and those not correlated with CGI.
Not all PWMs predict TFBS correlated with human promoter sequences. Two main PWM groups were identified: (1) those that show TFBS clustered in promoters associated with CGI, and (2) those that show TFBS clustered in promoters independent of CGI. Assessment of PWM matches will allow more positive interpretation of TFBS in regulatory regions.
promoter; tissue-specific gene expression; position weight matrix; regulatory motif
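As a rough illustration of how a PWM predicts TFBS, the sketch below converts per-position base frequencies into log2-odds weights and scans a sequence for windows scoring above a threshold. The toy matrix, the uniform background, the pseudocount, and the threshold are all assumptions; the cluster score itself is not reproduced here because its definition is not given in this text.

```python
import math

BASES = "ACGT"


def log_odds_matrix(freqs, background=0.25, pseudocount=0.01):
    """Convert per-position base frequencies into log2-odds weights."""
    matrix = []
    for column in freqs:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def scan(seq, matrix, threshold):
    """Return (0-based start, score) for every window scoring >= threshold."""
    width = len(matrix)
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing ambiguous bases
        score = sum(matrix[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits


# Hypothetical four-column matrix that strongly prefers T-A-T-A.
toy_freqs = [
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0},
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0},
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
]
toy_matrix = log_odds_matrix(toy_freqs)
toy_hits = scan("GGTATAGG", toy_matrix, threshold=4.0)
```

Counting how many such hits fall within a fixed stretch of sequence would give one simple notion of a local TFBS concentration, though that is only one of several possible definitions.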
Kinetic analysis shows that RNA polymerase elongation kinetics are not modulated by co-transcriptional splicing and that post-transcriptional splicing can proceed at the site of transcription without the presence of the polymerase.
RNA processing events that take place on the transcribed pre-mRNA include capping, splicing, editing, 3′ processing, and polyadenylation. Most of these processes occur co-transcriptionally while the RNA polymerase II (Pol II) enzyme is engaged in transcriptional elongation. How Pol II elongation rates are influenced by splicing is not well understood. We generated a family of inducible gene constructs containing increasing numbers of introns and exons, which were stably integrated in human cells to serve as actively transcribing gene loci. By monitoring the association of the transcription and splicing machineries on these genes in vivo, we showed that only U1 snRNP localized to the intronless gene, consistent with a splicing-independent role for U1 snRNP in transcription. In contrast, all snRNPs accumulated on intron-containing genes, and increasing the number of introns increased the amount of spliceosome components recruited. This indicates that nascent RNA can assemble multiple spliceosomes simultaneously. Kinetic measurements of Pol II elongation in vivo, Pol II ChIP, as well as use of Spliceostatin and Meayamycin splicing inhibitors showed that polymerase elongation rates were uncoupled from ongoing splicing. This study shows that transcription elongation kinetics proceed independently of splicing at the model genes studied here. Surprisingly, retention of polyadenylated mRNA was detected at the transcription site after transcription termination. This suggests that the polymerase is released from chromatin prior to the completion of splicing, and the pre-mRNA is post-transcriptionally processed while still tethered to chromatin near the gene end.
The pre-mRNA emerging from RNA polymerase II during eukaryotic transcription undergoes a series of processing events. These include 5′-capping, intron excision and exon ligation during splicing, 3′-end processing, and polyadenylation. Processing events occur co-transcriptionally, meaning that a variety of enzymes assemble on the pre-mRNA while the polymerase is still engaged in transcription. The concept of co-transcriptional mRNA processing raises questions about the possible coupling between the transcribing polymerase and the processing machineries. Here we examine how the co-transcriptional assembly of the splicing machinery (the spliceosome) might affect the elongation kinetics of the RNA polymerase. Using live-cell microscopy, we followed the kinetics of transcription of genes containing increasing numbers of introns and measured the recruitment of transcription and splicing factors. Surprisingly, a sub-set of splicing factors was recruited to an intronless gene, implying that there is a polymerase-coupled scanning mechanism for intronic sequences. There was no difference in polymerase elongation rates on genes with or without introns, suggesting that the spliceosome does not modulate elongation kinetics. Experiments including inhibition of splicing or transcription, together with stochastic computational simulation, demonstrated that pre-mRNAs can be retained on the gene when polymerase termination precedes completion of splicing. Altogether we show that polymerase elongation kinetics are not affected by splicing events on the emerging pre-mRNA, that increased splicing leads to more splicing factors being recruited to the mRNA, and that post-transcriptional splicing can proceed at the site of transcription in the absence of the polymerase.
To date the Simian Virus 40 (SV40) is the only proven example of a virus that recruits the mechanism of RNA trans-splicing to diversify its sequences and gene products. Thereby, two identical viral transcripts are efficiently joined by homologous trans-splicing triggering the formation of a highly transforming 100 kDa super T antigen. Sequences of other viruses including HIV-1 and the human adenovirus type 5 were reported to be involved in heterologous trans-splicing towards cellular or viral sequences but the meaning of these events remains unclear. We computationally and experimentally investigated molecular features associated with viral RNA trans-splicing and identified a common pattern: Viral RNA trans-splicing occurs between strong cryptic or regular viral splice sites and strong regular or cryptic splice sites of the trans-splice partner sequences. The majority of these splice sites are supported by exonic splice enhancers. Splice sites that could compete with the trans-splicing sites for cis-splice reactions are weaker or absent. Finally, all but one of the trans-splice reactions seem to be facilitated by one or more complementary binding domains of 11 to 16 nucleotides in length which, however, occur with a statistical probability close to one for the given length of the involved sequences. The chimeric RNAs generated via heterologous viral RNA trans-splicing either did not lead to fusion proteins or led to proteins of unknown function. Our data suggest that distinct viral RNAs are highly susceptible to trans-splicing and that heterologous viral trans-splicing, unlike homologous SV40 trans-splicing, represents a chance event.
Alternative splicing; RNA trans-splicing; Viral RNA trans-splicing; SV40; HIV-1; Adenovirus
Knowledge of the functional cis-regulatory elements that regulate constitutive and alternative pre-mRNA splicing is fundamental for biology and medicine. Here we undertook a genome-wide comparative genomics approach using available mammalian genomes to identify conserved intronic splicing regulatory elements (ISREs). Our approach yielded 314 ISREs, and insertions of ~70 ISREs between competing splice sites demonstrated that 84% of ISREs altered 5′ and 94% altered 3′ splice site choice in human cells. Consistent with our experiments, comparisons of ISREs to known splicing regulatory elements revealed that 40%–45% of ISREs might have dual roles as exonic splicing silencers. Supporting a role for ISREs in alternative splicing, we found that 30%–50% of ISREs were enriched near alternatively spliced (AS) exons, and included almost all known binding sites of tissue-specific alternative splicing factors. Further, we observed that genes harboring ISRE-proximal exons have biases for tissue expression and molecular functions that are ISRE-specific. Finally, we discovered that for Nova1, neuronal PTB, hnRNP C, and FOX1, the most frequently occurring ISRE proximal to an alternative conserved exon in the splicing factor strongly resembled its own known RNA binding site, suggesting a novel application of ISRE density and the propensity for splicing factors to auto-regulate to associate RNA binding sites to splicing factors. Our results demonstrate that ISREs are crucial building blocks in understanding general and tissue-specific AS regulation and the biological pathways and functions regulated by these AS events.
During RNA splicing, sequences (introns) in a pre-mRNA are excised and discarded, and the remaining sequences (exons) are joined to form the mature RNA. Splicing is regulated not only by the binding of the basic splicing machinery to splice sites located at the exon–intron boundaries, but also by the combined effects of various other splicing factors that bind to a multitude of sequence elements located both in the exons as well as the flanking introns. Instances of alternative splicing, where usage of splice site(s) is incomplete or different between tissues, cell types, or lineages, can be created by the interaction of sequence elements and tissue, cell type, and stage-specific splicing factors. To better understand constitutive and alternative pre-mRNA splicing, the authors describe a comparative genomics approach, using available mammalian genomes, to systematically identify splicing regulatory elements located in the introns proximal to exons. A quarter of the elements were tested experimentally, and most of them altered splicing in human cells. The authors also showed that the intronic elements are close to tissue-specific alternative exons and are more likely to be located in specific positions in the introns, suggestive of potential regulatory function. These elements are also frequently found in tissue-specific genes, suggesting a coupling between expression and alternative splicing of these genes. Finally, the authors propose a strategy using the elements to identify the binding sites of several splicing factors.
Incorporation of exon 11 of the insulin receptor gene is both developmentally and hormonally regulated. Previously, we have shown the presence of enhancer and silencer elements that modulate the incorporation of the small 36-nucleotide exon. In this study, we investigated the role of inherent splice site strength in the alternative splicing decision and whether recognition of the splice sites is the major determinant of exon incorporation.
We found that mutation of the flanking sub-optimal splice sites to consensus sequences caused the exon to be constitutively spliced in-vivo. These findings are consistent with the exon-definition model for splicing. In-vitro splicing of RNA templates containing exon 11 and portions of the upstream intron recapitulated the regulation seen in-vivo. Unexpectedly, we found that the splice sites are occupied and spliceosomal complex A was assembled on all templates in-vitro irrespective of splicing efficiency.
These findings demonstrate that the exon-definition model explains alternative splicing of exon 11 in the IR gene in-vivo but not in-vitro. The in-vitro results suggest that the regulation occurs at a later step in spliceosome assembly on this exon.
Recently, thanks to the increasing throughput of new technologies, we have begun to explore the full extent of alternative pre–mRNA splicing (AS) in the human transcriptome. This is unveiling a vast layer of complexity in isoform-level expression differences between individuals. We used previously published splicing sensitive microarray data from lymphoblastoid cell lines to conduct an in-depth analysis on splicing efficiency of known and predicted exons. By combining publicly available AS annotation with a novel algorithm designed to search for AS, we show that many real AS events can be detected within the usually unexploited, speculative majority of the array and at significance levels much below standard multiple-testing thresholds, demonstrating that the extent of cis-regulated differential splicing between individuals is potentially far greater than previously reported. Specifically, many genes show subtle but significant genetically controlled differences in splice-site usage. PCR validation shows that 42 out of 58 (72%) candidate gene regions undergo detectable AS, amounting to the largest scale validation of isoform eQTLs to date. Targeted sequencing revealed a likely causative SNP in most validated cases. In all 17 instances where a SNP affected a splice-site region, in silico splice-site strength modeling correctly predicted the direction of the microarray and PCR results. In 13 other cases, we identified likely causative SNPs disrupting predicted splicing enhancers. Using Fst and REHH analysis, we uncovered significant evidence that 2 putative causative SNPs have undergone recent positive selection. We verified the effect of five SNPs using in vivo minigene assays. This study shows that splicing differences between individuals, including quantitative differences in isoform ratios, are frequent in human populations and that causative SNPs can be identified using in silico predictions.
Several cases affected disease-relevant genes and it is likely some of these differences are involved in phenotypic diversity and susceptibility to complex diseases.
Alternative splicing (AS), through the alternative use of exons, can produce many different mRNA transcripts from the same genomic locus, thus possibly resulting in the production of many different proteins. We know that splicing differences between individuals exist and that these changes are often associated with genetic variants. Thus far, very few of these associations have led to the precise localization of the causative polymorphisms. In this work, using in-depth analysis of previously published splicing sensitive microarray data from human cell lines, we identified and validated a large number of splicing changes which are highly correlated with nearby genetic variations. We then sequenced the genomic DNA around candidate exons and used in silico modeling tools to identify causative SNPs for most of our candidates. Using a plasmid reporter construct, we further demonstrated that five selected SNPs reproduce the expected effect in vivo. Our results indicate that genetically controlled splicing differences between individuals may be more common than previously suggested and can be very subtle, and that most are caused by SNPs affecting either the splice-site region or exonic splicing enhancer (ESE) sequences.
Splice site consensus sequences alone are insufficient to dictate the recognition of real constitutive splice sites within the typically large transcripts of higher eukaryotes, and large numbers of pseudoexons flanked by pseudosplice sites with good matches to the consensus sequences can be easily designated. In an attempt to identify elements that prevent pseudoexon splicing, we have systematically altered known splicing signals, as well as immediately adjacent flanking sequences, of an arbitrarily chosen pseudoexon from intron 1 of the human hprt gene. The substitution of a 5′ splice site that perfectly matches the 5′ consensus combined with mutation to match the CAG/G sequence of the 3′ consensus failed to get this model pseudoexon included as the central exon in a dhfr minigene context. Provision of a real 3′ splice site and a consensus 5′ splice site and removal of an upstream inhibitory sequence were necessary and sufficient to confer splicing on the pseudoexon. This activated context also supported the splicing of a second pseudoexon sequence containing no apparent enhancer. Thus, both the 5′ splice site sequence and the polypyrimidine tract of the pseudoexon are defective despite their good agreement with the consensus. On the other hand, the pseudoexon body did not exert a negative influence on splicing. The introduction into the pseudoexon of a sequence selected for binding to ASF/SF2 or its replacement with β-globin exon 2 only partially reversed the effect of the upstream negative element and the defective polypyrimidine tract. These results support the idea that exon-bridging enhancers are not a prerequisite for constitutive exon definition and suggest that intrinsically defective splice sites and negative elements play important roles in distinguishing the real splicing signal from the vast number of false splicing signals.
The 3′ splice site (SS) at the end of pre-mRNA introns has a consensus sequence (Y)nNYAG for constitutive splicing of mammalian genes. Deviation from this consensus could change or interrupt the usage of the splice site leading to alternative or aberrant splicing, which could affect normal cell function or even the development of diseases. We have shown that the position “N” can be replaced by a CA-rich RNA element called CaRRE1 to regulate the alternative splicing of a group of genes.
Taking it a step further, we searched the human genome for purine-rich elements between the -3 and -10 positions of the 3′ splice sites of annotated introns. This identified several thousand such 3′SS; more than a thousand of them contain at least one copy of a G tract. These sites deviate significantly from the consensus of constitutive splice sites and are highly associated with alternative splicing events, particularly alternative 3′ splice site usage and intron retention. We show by mutagenesis analysis and RNA interference that the G tracts are splicing silencers and a group of the associated exons are controlled by the G tract binding proteins hnRNP H/F. Species comparison of a group of the 3′SS among vertebrates suggests that most (~87%) of the G tracts emerged in ancestors of mammals during evolution. Moreover, the host genes are most significantly associated with cancer.
We call these elements together with CaRRE1 regulatory RNA elements between the Py and 3′AG (REPA). The emergence of REPA in this highly constrained region indicates that this location has been remarkably permissive for the emergence of de novo regulatory RNA elements, even purine-rich motifs, in a large group of mammalian genes during evolution. This evolutionary change controls alternative splicing, likely to diversify proteomes for particular cellular functions.
3′ splice site; Alternative splicing; G-tract; Evolution; Cancer
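The search for G tracts between positions -10 and -3 of a 3′ splice site can be sketched as a window scan over the end of an intron. Two assumptions are made here and are not taken from the study: a G tract is treated as a run of three or more G residues, and positions are numbered so that -1 is the final intron base (the G of the terminal AG). The function name and example sequences are hypothetical.

```python
import re

# Assumption: a G tract is a run of three or more consecutive G residues.
G_TRACT = re.compile(r"G{3,}")


def g_tracts_near_3ss(intron, first=-10, last=-3):
    """Find G runs within positions first..last counted from the intron's 3' end.

    Position -1 is the last intron base (assumed numbering). Runs that
    extend beyond the window are clipped to the part inside it.
    Returns (negative start position, run) pairs.
    """
    intron = intron.upper()
    n = len(intron)
    start = max(n + first, 0)   # index of position `first`
    end = max(n + last + 1, 0)  # exclusive index just past position `last`
    window = intron[start:end]
    return [(m.start() + start - n, m.group()) for m in G_TRACT.finditer(window)]


# Toy introns, invented for illustration: one with a G tract upstream of
# the terminal AG, one with a pyrimidine-rich consensus-like 3' end.
with_tract = g_tracts_near_3ss("GTAAGTCTTTCCGGGGTCAG")
without_tract = g_tracts_near_3ss("GTAAGTCTTTCCTTTTCCAG")
```

In the first toy intron the four-G run begins at position -8 and sits entirely inside the window; the second has no qualifying run, so the result is empty.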
Intron removal from a pre-mRNA by RNA splicing was once thought to be controlled mainly by intron splicing signals. However, viral and other eukaryotic RNA exon sequences have recently been found to regulate RNA splicing, polyadenylation, export, and nonsense-mediated RNA decay in addition to their coding function. Regulation of alternative RNA splicing by exon sequences is largely attributable to the presence of two major cis-acting elements in the regulated exons, the exonic splicing enhancer (ESE) and the suppressor or silencer (ESS). Two types of ESEs have been verified from more than 50 genes or exons: purine-rich ESEs, which are the more common, and non-purine-rich ESEs. In contrast, the sequences of ESSs identified in approximately 21 genes or exons are highly diverse and show little similarity to each other. Through interactions with cellular splicing factors, an ESE or ESS determines whether or not a regulated splice site, usually an upstream 3′ splice site, will be used for RNA splicing. However, how these elements function precisely in selecting a regulated splice site is only partially understood. The balance between positive and negative regulation of splice site selection likely depends on the cis-element’s identity and changes in cellular splicing factors under physiological or pathological conditions.
RNA; exons; introns; alternative RNA splicing; gene expression; RNA processing; splicing enhancers; splicing suppressors