Point mutations frequently cause genetic diseases by disrupting the correct pattern of pre-mRNA splicing. The effect of a point mutation within a coding sequence is traditionally attributed to the deduced change in the corresponding amino acid. However, some point mutations can have much more severe effects on the structure of the encoded protein, for example when they inactivate an exonic splicing enhancer (ESE), thereby resulting in exon skipping. ESEs also appear to be especially important in exons that normally undergo alternative splicing. Different classes of ESE consensus motifs have been described, but they are not always easily identified. ESEfinder (http://exon.cshl.edu/ESE/) is a web-based resource that facilitates rapid analysis of exon sequences to identify putative ESEs responsive to the human SR proteins SF2/ASF, SC35, SRp40 and SRp55, and to predict whether exonic mutations disrupt such elements.
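The kind of matrix-based scan that a tool such as ESEfinder performs can be sketched as follows. This is only an illustration: the weight matrix, the threshold and the example sequences are invented here and are not the SR protein matrices or thresholds used by ESEfinder. The sketch assumes sequences that contain only A, C, G and T.

```python
# Toy weight-matrix scan. TOY_MATRIX and TOY_THRESHOLD are hypothetical values
# chosen for illustration; they are not taken from ESEfinder.
TOY_MATRIX = [
    {"A": -1.0, "C": 1.0, "G": 0.5, "T": -1.0},
    {"A": 1.0, "C": -0.5, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": 1.0, "T": -1.0},
    {"A": 1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]
TOY_THRESHOLD = 2.5


def scan(seq, matrix=TOY_MATRIX, threshold=TOY_THRESHOLD):
    """Return (position, window, score) for every window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        # Sum the per-position weights of the bases in this window.
        score = sum(column[base] for column, base in zip(matrix, window))
        if score >= threshold:
            hits.append((i, window, score))
    return hits


def disrupted_hits(wild_type, mutant):
    """Positions that score as a hit in the wild type but not in the mutant."""
    wt_positions = {pos for pos, _, _ in scan(wild_type)}
    mt_positions = {pos for pos, _, _ in scan(mutant)}
    return sorted(wt_positions - mt_positions)


wild_type = "TTCACATT"
mutant = "TTCATATT"  # single C>T substitution at index 4
print(scan(wild_type))
print(disrupted_hits(wild_type, mutant))
```

Comparing hit positions between wild-type and mutant sequences is one simple way to flag a substitution that may abolish a predicted element. A real analysis would use experimentally derived matrices and thresholds.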
Auxiliary splicing signals play a major role in the regulation of constitutive and alternative pre-mRNA splicing, but their relative importance in selection of mutation-induced cryptic or de novo splice sites is poorly understood. Here, we show that exonic sequences between authentic and aberrant splice sites that were activated by splice-site mutations in human disease genes have lower frequencies of splicing enhancers and higher frequencies of splicing silencers than average exons. Conversely, sequences between authentic and intronic aberrant splice sites have more enhancers and fewer silencers than average introns. Exons that were skipped as a result of splice-site mutations were smaller, had lower SF2/ASF motif scores, a decreased availability of decoy splice sites and a higher density of silencers than exons in which splice-site mutation activated cryptic splice sites. These four variables were the strongest predictors of the two aberrant splicing events in a logistic regression model. Elimination or weakening of predicted silencers in two reporters consistently promoted use of intron-proximal splice sites if these elements were maintained at their original positions, with their modular combinations producing expected modification of splicing. Together, these results show the existence of a gradient in exon and intron definition at the level of pre-mRNA splicing and provide a basis for the development of computational tools that predict aberrant splicing outcomes.
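The general form of a logistic regression model with the four predictors named above can be sketched as follows. The coefficient values, the intercept and the feature names are hypothetical; only the signs of the coefficients follow the trends reported in the abstract (skipped exons were smaller, had lower SF2/ASF scores, fewer decoy splice sites and more silencers). The fitted model itself is not given in the text.

```python
import math


def probability_of_exon_skipping(features, coefficients, intercept):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(b_i * x_i))))."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for illustration only; not fitted to any data.
HYPOTHETICAL_COEFFICIENTS = {
    "exon_length_nt": -0.01,
    "sf2_asf_score": -0.5,
    "decoy_splice_sites": -0.3,
    "silencer_density": 0.8,
}
HYPOTHETICAL_INTERCEPT = 1.0

short_exon = {
    "exon_length_nt": 60,
    "sf2_asf_score": 1.0,
    "decoy_splice_sites": 0,
    "silencer_density": 2.0,
}
long_exon = dict(short_exon, exon_length_nt=300)

p_short = probability_of_exon_skipping(short_exon, HYPOTHETICAL_COEFFICIENTS, HYPOTHETICAL_INTERCEPT)
p_long = probability_of_exon_skipping(long_exon, HYPOTHETICAL_COEFFICIENTS, HYPOTHETICAL_INTERCEPT)
print(p_short > p_long)
```

Under these assumed signs, the model assigns a higher probability of skipping (as opposed to cryptic splice-site activation) to the shorter exon, all other features being equal.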
Aberrant pre-mRNA splicing can be more detrimental to the function of a gene than changes in the length or nature of the encoded amino acid sequence. Although predicting the effects of changes in consensus 5' and 3' splice sites near intron:exon boundaries is relatively straightforward, predicting the possible effects of changes in exonic splicing enhancers (ESEs) remains a challenge.
As an initial step toward determining which ESEs predicted by the web-based tool ESEfinder in the breast cancer susceptibility gene BRCA1 are likely to be functional, we have determined their evolutionary conservation and compared their location with known BRCA1 sequence variants.
Using the default settings of ESEfinder, we initially detected 669 potential ESEs in the coding region of the BRCA1 gene. Increasing the threshold score reduced the total number to 464, while taking into consideration the proximity to splice donor and acceptor sites reduced the number to 211. Approximately 11% of these ESEs (23/211) either are identical at the nucleotide level in human, primates, mouse, cow, dog and opossum Brca1 (conserved) or are detectable by ESEfinder in the same position in the Brca1 sequence (shared). The frequency of conserved and shared predicted ESEs between human and mouse is higher in BRCA1 exons (2.8 per 100 nucleotides) than in introns (0.6 per 100 nucleotides). Of conserved or shared putative ESEs, 61% (14/23) were predicted to be affected by sequence variants reported in the Breast Cancer Information Core database. Applying the filters described above increased the colocalization of predicted ESEs with missense changes, in-frame deletions and unclassified variants predicted to be deleterious to protein function, whereas they decreased the colocalization with known polymorphisms or unclassified variants predicted to be neutral.
In this report we show that evolutionary conservation analysis may be used to improve the specificity of an ESE prediction tool. This is the first report on the prediction of the frequency and distribution of ESEs in the BRCA1 gene, and it is the first reported attempt to predict which ESEs are most likely to be functional and therefore which sequence variants in ESEs are most likely to be pathogenic.
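The successive filters described above (score threshold, proximity to a splice site, conservation) can be sketched as a simple pipeline. The record layout, field names and cutoff values below are assumptions made for illustration; the abstract does not state the actual thresholds or distances used.

```python
from dataclasses import dataclass


@dataclass
class PredictedESE:
    position: int    # 0-based offset of the motif start within the exon (assumed layout)
    score: float     # prediction score reported by the ESE tool
    conserved: bool  # True if conserved or shared across the compared species


def filter_eses(hits, exon_length, min_score, max_distance):
    """Keep hits that pass the score, splice-site proximity and conservation filters."""
    kept = []
    for hit in hits:
        # Distance to the nearer exon end, used as a proxy for splice-site proximity.
        distance = min(hit.position, exon_length - 1 - hit.position)
        if hit.score >= min_score and distance <= max_distance and hit.conserved:
            kept.append(hit)
    return kept


def density_per_100_nt(count, length_nt):
    """Number of predicted elements per 100 nucleotides."""
    return 100.0 * count / length_nt


# Hypothetical hits in a hypothetical 200-nt exon.
hits = [
    PredictedESE(5, 3.0, True),     # passes all three filters
    PredictedESE(100, 3.0, True),   # too far from either exon end
    PredictedESE(8, 1.0, True),     # score below threshold
    PredictedESE(190, 4.0, False),  # not conserved
]
print(filter_eses(hits, exon_length=200, min_score=2.5, max_distance=20))
```

Each filter trades sensitivity for specificity, which is why the number of candidate elements falls at every step.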
A very early step in splice site recognition is exon definition, a process that is as yet poorly understood. Communication between the two ends of an exon is thought to be required for this step. We report genome-wide evidence for exons being defined through the combinatorial activity of motifs located in flanking intronic regions.
Strongly co-occurring motifs were found to specifically reside in four intronic regions surrounding a large number of human exons. These paired motifs occur around constitutive and alternative exons but not pseudo exons. Most co-occurring motifs are limited to intronic regions within 100 nucleotides of the exon. They are preferentially associated with weaker exons. Their pairing is conserved in evolution and they exhibit a lower frequency of single nucleotide polymorphism when paired. Paired motifs display specificity with respect to distance from the exon borders and in constitutive versus alternative splicing. Many resemble binding sites for heterogeneous nuclear ribonucleoproteins. Specific pairs are associated with tissue-specific genes, the higher expression of which coincides with that of the pertinent RNA binding proteins. Tested pairs acted synergistically to enhance exon inclusion, and this enhancement was found to be exon-specific.
The exon-flanking sequence pairs identified here by genomic analysis promote exon inclusion and may play a role in the exon definition step in pre-mRNA splicing. We propose a model in which multiple concerted interactions are required between exonic sequences and flanking intronic sequences to effect exon definition.
Algorithmic approaches to splice site prediction have relied mainly on the consensus patterns found at the boundaries between protein coding and non-coding regions. However, exonic splicing enhancers have been shown to enhance the utilization of nearby splice sites.
We have developed a new computational technique to identify significantly conserved motifs involved in splice site regulation. First, 84 putative exonic splicing enhancer hexamers were identified in Arabidopsis thaliana. Then a Gibbs sampling program called ELPH was used to locate conserved motifs represented by these hexamers in exonic regions near splice sites in confirmed genes. Oligomers containing 35 of these motifs have been shown experimentally to induce significant inclusion of A. thaliana exons. Second, integration of our regulatory motifs into two different splice site recognition programs significantly improved the ability of the software to correctly predict splice sites in a large database of confirmed genes. We have released GeneSplicerESE, the improved splice site recognition code, as open source software.
Our results show that the use of the ESE motifs consistently improves splice site prediction accuracy.
Alternatively spliced exons play an important role in the diversification of gene function in most metazoans and are highly regulated by conserved motifs in exons and introns. Two contradicting properties have been associated with evolutionarily conserved alternative exons: higher sequence conservation and a higher rate of non-synonymous substitutions, relative to constitutive exons. In order to clarify this issue, we have performed an analysis of the evolution of alternative and constitutive exons, using a large set of protein coding exons conserved between human and mouse and taking into account the conservation of the transcript exonic structure. Further, we have also defined a measure of the variation of the arrangement of exonic splicing enhancers (ESE-conservation score) to study the evolution of splicing regulatory sequences. We have used this measure to correlate the changes in the arrangement of ESEs with the divergence of exon and intron sequences.
We find evidence for a relation between the lack of conservation of the exonic structure and the weakening of the sequence evolutionary constraints in alternative and constitutive exons. Exons in transcripts with non-conserved exonic structures have higher synonymous (dS) and non-synonymous (dN) substitution rates than exons in conserved structures. Moreover, alternative exons in transcripts with non-conserved exonic structure are the least constrained in sequence evolution, and at high EST-inclusion levels they are found to be very similar to constitutive exons, whereas alternative exons in transcripts with conserved exonic structure have a dS significantly lower than average at all EST-inclusion levels. We also find higher conservation in the arrangement of ESEs in constitutive exons compared to alternative ones. Additionally, the sequence conservation at flanking introns remains constant for constitutive exons at all ESE-conservation values, but increases for alternative exons at high ESE-conservation values.
We conclude that most of the differences in dN observed between alternative and constitutive exons can be explained by the conservation of the transcript exonic structure. Low dS values are more characteristic of alternative exons with conserved exonic structure, but not of those with non-conserved exonic structure. Additionally, constitutive exons are characterized by a higher conservation in the arrangement of ESEs, and alternative exons with an ESE-conservation similar to that of constitutive exons are characterized by a conservation of the flanking intron sequences higher than average, indicating the presence of more intronic regulatory signals.
Although considerable information is currently available about the factors involved in constitutive vertebrate polyadenylation, the factors and mechanisms involved in facilitating communication between polyadenylation and splicing are largely unknown. Even less is known about the regulation of polyadenylation in genes in which 3′-terminal exons are alternatively recognized. Here we demonstrate that an SR protein, SRp20, affects recognition of an alternative 3′-terminal exon via an effect on the efficiency of binding of a polyadenylation factor to an alternative polyadenylation site. The gene under study codes for the peptides calcitonin and calcitonin gene-related peptide. Its pre-mRNA is alternatively processed by the tissue-specific inclusion or exclusion of an embedded 3′-terminal exon, exon 4, via factors binding to an intronic enhancer element that contains both 3′ and 5′ splice site consensus sequence elements. In cell types that preferentially exclude exon 4, addition of wild-type SRp20 enhances exon 4 inclusion via recognition of the intronic enhancer. In contrast, in cell types that preferentially include exon 4, addition of a mutant form of SRp20 containing the RNA-binding domain but missing the SR domain inhibits exon 4 inclusion. Inhibition is likely at the level of polyadenylation, because the mutant SRp20 inhibits binding of CstF to the exon 4 poly(A) site. This is the first demonstration that an SR protein can influence alternative polyadenylation and suggests that this family of proteins may play a role in recognition of 3′-terminal exons and perhaps in the communication between polyadenylation and splicing.
The guanosine-adenosine-rich exonic splicing enhancer (GAR ESE) identified in exon 5 of the human immunodeficiency virus type-1 (HIV-1) pre-mRNA activates either an enhancer-dependent 5′ splice site (ss) or 3′ ss in 1-intron reporter constructs in the presence of the SR proteins SF2/ASF and SRp40. Characterizing the mode of action of the GAR ESE inside the internal HIV-1 exon 5, we found that this enhancer fulfils a dual splicing regulatory function (i) by synergistically mediating exon recognition through its individual SR protein-binding sites and (ii) by conferring 3′ ss selectivity within the 3′ ss cluster preceding exon 5. Both functions depend upon the GAR ESE, U1 snRNP binding at the downstream 5′ ss D4 and the E42 sequence located between these elements. Therefore, a network of cross-exon interactions appears to regulate splicing of the alternative exons 4a and 5. As the GAR ESE-mediated activation of the upstream 3′ ss cluster is also essential for the processing of intron-containing vpu/env-mRNAs during intermediate viral gene expression, the GAR enhancer substantially contributes to the regulation of viral replication.
Incorporation of exon 11 of the insulin receptor gene is both developmentally and hormonally regulated. Previously, we have shown the presence of enhancer and silencer elements that modulate the incorporation of the small 36-nucleotide exon. In this study, we investigated the role of inherent splice site strength in the alternative splicing decision and whether recognition of the splice sites is the major determinant of exon incorporation.
We found that mutation of the flanking sub-optimal splice sites to consensus sequences caused the exon to be constitutively spliced in vivo. These findings are consistent with the exon-definition model for splicing. In vitro splicing of RNA templates containing exon 11 and portions of the upstream intron recapitulated the regulation seen in vivo. Unexpectedly, we found that the splice sites were occupied and spliceosomal complex A was assembled on all templates in vitro irrespective of splicing efficiency.
These findings demonstrate that the exon-definition model explains alternative splicing of exon 11 in the IR gene in vivo but not in vitro. The in vitro results suggest that the regulation occurs at a later step in spliceosome assembly on this exon.
Human internal exons have an average size of 147 nt, and most are <300 nt. This small size is thought to facilitate exon definition. A small number of large internal exons have been identified and shown to be alternatively spliced. We identified 1115 internal exons >1000 nt in the human genome; these were found in 5% of all protein-coding genes, and most were expressed and translated. Surprisingly, 40% of these were expressed at levels similar to the flanking exons, suggesting they were constitutively spliced. While all of the large exons had strong splice sites, the constitutively spliced large exons had a higher ratio of splicing enhancers/silencers and were more conserved across mammals than the alternatively spliced large exons. We asked if large exons contain specific sequences that promote splicing and identified 38 sequences enriched in the large exons relative to small exons. The consensus sequence is C-rich with a central invariant CA dinucleotide. Mutation of these sequences in a candidate large exon indicated that these are important for recognition of large exons by the splicing machinery. We propose that these sequences are large exon splicing enhancers (LESEs).
To better understand splicing regulation, we used a cell-based screen to identify ten diverse motifs that inhibit splicing from introns. Each motif was validated in another human cell type and gene context, and their presence correlated with in vivo splicing changes. All motifs exhibited exonic splicing enhancer or silencer activity, and grouping these motifs based on their distributions yielded clusters with distinct patterns of context-dependent activity. Candidate regulatory factors associated with each motif were identified, recovering 24 known and novel splicing regulators. Specific domains in selected factors were sufficient to confer intronic splicing silencer (ISS) activity. Many factors bound multiple distinct motifs with similar affinity, and all motifs were recognized by multiple factors, revealing a complex, overlapping network of protein:RNA interactions. This arrangement enables individual cis-elements to function differently in distinct cellular contexts depending on the spectrum of regulatory factors present.
splicing regulation; splicing factors; intronic splicing silencers; RNA binding protein; context dependent activity
A new method which predicts internal exon sequences in human DNA has been developed. The method is based on a splice site prediction algorithm that uses the linear discriminant function to combine information about significant triplet frequencies of various functional parts of splice site regions and preferences of oligonucleotides in protein coding and intron regions. The accuracy of our splice site recognition function is 97% for donor splice sites and 96% for acceptor splice sites. For exon prediction, we combine in a discriminant function the characteristics describing the 5'-intron region, donor splice site, coding region, acceptor splice site and 3'-intron region for each open reading frame flanked by GT and AG base pairs. The accuracy of precise internal exon recognition on a test set of 451 exon and 246693 pseudoexon sequences is 77% with a specificity of 79%. The recognition quality computed at the level of individual nucleotides is 89% for exon sequences and 98% for intron sequences. This corresponds to a correlation coefficient for exon prediction of 0.87. The precision of this approach is better than other methods and has been tested on a larger data set. We have also developed a means for predicting exon-exon junctions in cDNA sequences, which can be useful for selecting optimal PCR primers.
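The performance figures quoted above can be related to a confusion matrix as sketched below. Two points are assumptions rather than statements from the abstract: that "specificity" is used in the sense common in gene finding, TP / (TP + FP), and that the correlation coefficient is the Matthews correlation coefficient. The counts in the example are invented.

```python
import math


def sensitivity(tp, fn):
    """Fraction of true exons (or exon nucleotides) that were predicted."""
    return tp / (tp + fn)


def gene_finding_specificity(tp, fp):
    """Fraction of predictions that are correct (assumed meaning of 'specificity')."""
    return tp / (tp + fp)


def correlation_coefficient(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, for illustration only.
tp, fp, tn, fn = 80, 20, 880, 20
print(sensitivity(tp, fn))
print(gene_finding_specificity(tp, fp))
print(correlation_coefficient(tp, fp, tn, fn))
```

With a test set as unbalanced as 451 exons against 246693 pseudoexons, a true-negative rate would sit close to 100% almost regardless of method, which is one reason the TP / (TP + FP) reading is assumed here.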
Pre-mRNA splicing is carried out by the spliceosome, which identifies exons and removes intervening introns. In vertebrates, most splice sites are initially recognized by the spliceosome across the exon, because most exons are small and surrounded by large introns. This gene architecture predicts that efficient exon recognition depends largely on the strength of the flanking 3′ and 5′ splice sites. However, it is unknown if the 3′ or the 5′ splice site dominates the exon recognition process. Here, we test the 3′ and 5′ splice site contributions towards efficient exon recognition by systematically replacing the splice sites of an internal exon with sequences of different splice site strengths. We show that the presence of an optimal splice site does not guarantee exon inclusion and that the best predictor for exon recognition is the sum of both splice site scores. Using a genome-wide approach, we demonstrate that the combined 3′ and 5′ splice site strengths of internal exons provide a much more significant separator between constitutive and alternative exons than either the 3′ or the 5′ splice site strength alone.
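The idea that the sum of the two splice site scores predicts exon recognition better than either score alone can be illustrated with a minimal sketch. The exon names and score values are hypothetical, and the scoring scheme that would produce such values is not specified here.

```python
def combined_strength(score_3ss, score_5ss):
    """Combined splice site strength, taken as the plain sum of both scores."""
    return score_3ss + score_5ss


def rank_exons(exons):
    """Return exon names sorted by combined strength, strongest first."""
    return sorted(exons, key=lambda name: combined_strength(*exons[name]), reverse=True)


# Hypothetical (3' splice site score, 5' splice site score) pairs.
HYPOTHETICAL_EXONS = {
    "exon_A": (10.0, 2.0),  # one very strong site paired with a weak one
    "exon_B": (7.0, 7.0),   # two moderately strong sites
    "exon_C": (4.0, 5.0),
}
print(rank_exons(HYPOTHETICAL_EXONS))
```

In this toy example exon_B outranks exon_A even though exon_A carries the single strongest site, mirroring the observation that an optimal splice site on one side does not guarantee inclusion.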
Exon 3 of the human apolipoprotein A-II (apoA-II) gene is efficiently included in the mRNA although its acceptor site is significantly weak because of a peculiar (GU)16 tract instead of a canonical polypyrimidine tract within the intron 2/exon 3 junction. Our previous studies demonstrated that the SR proteins ASF/SF2 and SC35 bind specifically to an exonic splicing enhancer (ESE) within exon 3 and promote exon 3 splicing. In the present study, we show that the ESE is necessary only in the proper context. In addition, we have characterized two novel sequences in the flanking introns that modulate apoA-II exon 3 splicing. The first is a G-rich element in intron 2 that interacts with hnRNPH1 and inhibits exon 3 splicing. The second is a purine-rich region in intron 3 that binds SRp40 and SRp55 and promotes exon 3 inclusion in mRNA. We have also found that the (GU) repeats in the apoA-II context bind the splicing factor TDP-43 and interfere with exon 3 definition. Significantly, blocking of TDP-43 expression by small interfering RNA overrides the need for all the other cis-acting elements, making exon 3 inclusion constitutive even in the presence of disrupted exonic and intronic enhancers. Altogether, our results suggest that exonic and intronic enhancers have evolved to balance the negative effects of the two silencers located in intron 2 and hence rescue the constitutive exon 3 inclusion in apoA-II mRNA.
The cardiac troponin T pre-mRNA contains an exonic splicing enhancer that is required for inclusion of the alternative exon 5. Here we show that enhancer activity is exquisitely sensitive to changes in the sequence of a 9-nucleotide motif (GAGGAAGAA) even when its purine content is preserved. A series of mutations that increased or decreased the level of exon inclusion in vivo were used to correlate enhancer strength with RNA-protein interactions in vitro. Analyses involving UV cross-linking and immunoprecipitation indicated that only four (SRp30a, SRp40, SRp55, and SRp75) of six essential splicing factors known as SR proteins bind to the active enhancer RNA. Moreover, purified SRp40 and SRp55 activate splicing of exon 5 when added to a splicing-deficient S100 extract. Purified SRp30b did not stimulate splicing in S100 extracts, which is consistent with its failure to bind the enhancer RNA. In vitro competition of SR protein splicing activity and UV cross-linking demonstrated that the sequence determinants for SR protein binding were precisely coincident with the sequence determinants of enhancer strength. Thus, a subset of SR proteins interacts directly with the exonic enhancer to promote inclusion of a poorly defined alternative exon. Independent regulation of the levels of SR proteins may, therefore, contribute to the developmental regulation of exon inclusion.
Alternative splicing is an important regulatory mechanism to create protein diversity. In order to elucidate possible regulatory elements common to neuron specific exons, we created and statistically analysed a database of exons that are alternatively spliced in neurons. The splice site comparison of alternatively and constitutively spliced exons reveals that some, but not all alternatively spliced exons have splice sites deviating from the consensus sequence, implying diverse patterns of regulation. The deviation from the consensus is most evident at the -3 position of the 3' splice site and the +4 and -3 position of the 5' splice site. The nucleotide composition of alternatively and constitutively spliced exons is different, with alternatively spliced exons being more AU rich. We performed overlapping k-tuple analysis to identify common motifs. We found that alternatively and constitutively spliced exons differ in the frequency of several trinucleotides that cannot be explained by the amino acid composition and may be important for splicing regulation.
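An overlapping k-tuple analysis of the kind mentioned above can be sketched as follows. The example sequences are invented, and the sketch only compares raw frequencies between two sets; it does not correct for amino acid composition or test statistical significance, both of which a real analysis would need.

```python
from collections import Counter


def ktuple_frequencies(sequences, k=3):
    """Relative frequency of every overlapping k-tuple across a set of sequences."""
    counts = Counter()
    for seq in sequences:
        # Slide a window of width k one base at a time, so tuples overlap.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def frequency_differences(set_a, set_b, k=3):
    """Frequency in set_a minus frequency in set_b for each observed k-tuple."""
    freq_a = ktuple_frequencies(set_a, k)
    freq_b = ktuple_frequencies(set_b, k)
    return {
        kmer: freq_a.get(kmer, 0.0) - freq_b.get(kmer, 0.0)
        for kmer in set(freq_a) | set(freq_b)
    }


# Hypothetical toy sequences standing in for two exon sets.
alternative = ["AUAUA"]
constitutive = ["GCGCG"]
print(frequency_differences(alternative, constitutive))
```

Positive differences mark k-tuples that are more frequent in the first set, negative ones those more frequent in the second.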
The insulin receptor (IR) exists as two isoforms, IR-A and IR-B, which result from alternative splicing of exon 11 in the primary transcript. This alternative splicing is cell specific, and the relative proportions of exon 11 isoforms also vary during development, aging, and different disease states. We have previously demonstrated that both intron 10 and exon 11 contain regulatory sequences that affect IR splicing both positively and negatively. In this study, we sought to define the precise sequence elements within exon 11 that control exon recognition and cellular factors that recognize these elements. Using minigenes carrying linker-scanning mutations within exon 11, we detected both exonic splicing enhancer and exonic splicing silencer elements. We identified binding of SRp20 and SF2/ASF to the exonic enhancers and CUG-BP1 to the exonic silencer by RNA affinity chromatography. Overexpression and knockdown studies with hepatoma and embryonic kidney cells demonstrated that SRp20 and SF2/ASF increase exon inclusion but that CUG-BP1 causes exon skipping. We found that CUG-BP1 also binds to an additional intronic splicing silencer, located at the 3′ end of intron 10, to promote exon 11 skipping. Thus, we propose that SRp20, SF2/ASF, and CUG-BP1 act antagonistically to regulate IR alternative splicing in vivo and that the relative ratios of SRp20 and SF2/ASF to CUG-BP1 in different cells determine the degree of exon inclusion.
Pseudo-exons are intronic sequences that are flanked by apparent consensus splice sites but that are not observed in spliced mRNAs. Pseudo-exons are often difficult to activate by mutation and have typically been viewed as a conceptual challenge to our understanding of how the spliceosome discriminates between authentic and cryptic splice sites. We have analyzed an apparent pseudo-exon located downstream of mutually exclusive exons 2 and 3 of the rat α-tropomyosin (TM) gene. The TM pseudo-exon is conserved among mammals and has a conserved profile of predicted splicing enhancers and silencers that is more typical of a genuine exon than a pseudo-exon. Splicing of the pseudo-exon is fully activated for splicing to exon 3 by a number of simple mutations. Splicing of the pseudo-exon to exon 3 is predicted to lead to nonsense-mediated decay (NMD). In contrast, when “prespliced” to exon 2 it follows a “zero length exon” splicing pathway in which a newly generated 5′ splice site at the junction with exon 2 is spliced to exon 4. We propose that a subset of apparent pseudo-exons, as exemplified here, are actually authentic alternative exons whose inclusion leads to NMD.
Nuclear RNA processing events, such as 5′ cap formation, 3′ polyadenylation, and pre-mRNA splicing, mark mRNA for efficient translation. Splicing enhances translation via the deposition of the exon-junction complex and other multifunctional splicing factors, including SR proteins. All retroviruses synthesize their structural and enzymatic proteins from unspliced genomic RNAs (gRNAs) and must therefore exploit unconventional strategies to ensure their effective expression. Here, we report that specific SR proteins, particularly SRp40 and SRp55, promote human immunodeficiency virus type 1 (HIV-1) Gag translation from unspliced (intron-containing) viral RNA. This activity does not correlate with nucleocytoplasmic shuttling capacity and, in the case of SRp40, is dependent on the second RNA recognition motif and the arginine-serine (RS) domain. While SR proteins enhance Gag expression independent of RNA nuclear export pathway choice, altering the nucleotide sequence of the gag-pol coding region by codon optimization abolishes this effect. We therefore propose that SR proteins couple HIV-1 gRNA biogenesis to translational utilization.
Prediction of splice sites in non-coding regions of genes is one of the most challenging aspects of gene structure recognition. We perform a rigorous analysis of such splice sites embedded in human 5′ untranslated regions (UTRs), and investigate correlations between this class of splice sites and other features found in the adjacent exons and introns. By restricting the training of neural network algorithms to ‘pure’ UTRs (not extending partially into protein coding regions), we for the first time investigate the predictive power of the splicing signal proper, in contrast to conventional splice site prediction, which typically relies on the change in sequence at the transition from protein coding to non-coding. By doing so, the algorithms were able to pick up subtler splicing signals that were otherwise masked by ‘coding’ noise, thus enhancing significantly the prediction of 5′ UTR splice sites. For example, the non-coding splice site predicting networks pick up compositional and positional bias in the 3′ ends of non-coding exons and 5′ non-coding intron ends, where cytosine and guanine are over-represented. This compositional bias at the true UTR donor sites is also visible in the synaptic weights of the neural networks trained to identify UTR donor sites. Conventional splice site prediction methods perform poorly in UTRs because the reading frame pattern is absent. The NetUTR method presented here performs 2–3-fold better compared with NetGene2 and GenScan in 5′ UTRs. We also tested the 5′ UTR trained method on protein coding regions, and discovered, surprisingly, that it works quite well (although it cannot compete with NetGene2). This indicates that the local splicing pattern in UTRs and coding regions is largely the same. The NetUTR method is made publicly available at www.cbs.dtu.dk/services/NetUTR.
Eukaryotic splicing factors belonging to the SR family are essential splicing factors consisting of an N-terminal RNA-binding region and a C-terminal RS domain. They are believed to be involved in alternative splicing of numerous transcripts because their expression levels can influence splice site selection. We have characterized the structure and transcriptional regulation of the gene for the smallest member of the SR family, SRp20 (previously called X16). The mouse gene encoding SRp20, termed Srp20, consists of one alternative exon and six constitutive exons and was mapped to a 2-centimorgan interval on chromosome 17. When cells are transfected with SRp20 genomic DNA, both standard and alternatively spliced transcripts and corresponding proteins are produced. Interestingly, in starved (G0) cells, the amount of SRp20 mRNA containing the alternative exon is large, whereas the amount of the standard SRp20 mRNA without the alternative exon is small. When starved cells are stimulated with serum, the alternative form is lost and the standard form is induced. These results suggest that splicing could be regulated during the cell cycle and that this could be, at least in part, due to regulated expression of SR proteins. Consistent with this, experiments with synchronized cells showed an induction of SRp20 transcripts in late G1 or early S. We have also characterized the promoter of SRp20. It lies within a GC-rich CpG island and contains two consensus binding sites for E2F, a transcription factor thought to be involved in regulating the cell cycle. These motifs may be functional since reporter constructs with the SRp20 promoter can be stimulated by cotransfection with E2F expression plasmids.
Human pre-mRNAs contain a definite number of exons and several pseudoexons which are located within intronic regions. We applied a computational approach to address the question of how pseudoexons are neglected in favor of exons and to possibly identify sequence elements preventing pseudoexon splicing. A search for possible splicing silencers was carried out on a pseudoexon selection that resembled exons in terms of splice site strength and exon splicing enhancer (ESE) representation; three motifs were retrieved through hexamer composition comparisons. One of these functions as a powerful silencer in transfection-based splicing assays and matches a previously identified silencer sequence with hnRNP H binding ability. The other two motifs are novel and failed to induce skipping of a constitutive exon, indicating that they might act as weak repressors or in synergy with other unidentified elements. All three motifs are enriched in pseudoexons compared with intronic regions and display higher frequencies in intronless gene-coding sequences compared with exons. We consider that a subpopulation of pseudoexons might rely on negative regulators for splicing repression; this hypothesis, if experimentally verified, might improve our understanding of exonic splicing regulatory sequences and provide the identification of a novel mutation target for human genetic diseases.
Auxiliary splicing sequences play an important role in ensuring accurate and efficient splicing by promoting or repressing recognition of authentic splice sites. These cis-acting motifs have been termed splicing enhancers and silencers and are located both in introns and exons. They co-evolved into an intricate splicing code together with additional functional constraints, such as tissue-specific and alternative splicing patterns. We used orthologous exons extracted from the University of California Santa Cruz multiple genome alignments of human and 22 Tetrapoda organisms to predict candidate enhancers and silencers that have reproducible and statistically significant bias towards annotated exonic boundaries.
A total of 2,546 Tetrapoda enhancers and silencers were clustered into 15 putative core motifs based on their Markov properties. Most of these elements have been identified previously, but 118 putative silencers and 260 enhancers (~15%) were novel. Examination of previously published experimental data for the presence of predicted elements showed that mutating them altered the splicing pattern as expected in 21 of 23 cases (91.3%). Predicted intronic motifs flanking 3' and 5' splice sites had higher evolutionary conservation than other sequences within intronic flanks, and the intronic enhancers differed markedly between 3' and 5' intronic flanks.
Differences in the intronic enhancers supporting 5' and 3' splice sites suggest an independent splicing commitment for neighboring exons. The increased evolutionary conservation of intronic splicing enhancers and silencers (ISEs/ISSs) within intronic flanks, and the effect on splicing of modulating predicted elements, suggest that the identified elements are functionally significant in splicing regulation. Most of the elements identified were shown to have direct implications in human splicing and therefore could be useful for building computational splicing models in biomedical research.
The Alternative Splicing Mutation Database (ASMD) presents a collection of all known mutations inside human exons which affect splicing enhancers and silencers and cause changes in the alternative splicing pattern of the corresponding genes.
An algorithm was developed to derive a Splicing Potential (SP) table from the ASMD information. This table characterizes the influence of each oligonucleotide on the splicing effectiveness of the exon containing it. If the SP value for an oligonucleotide is positive, it promotes exon retention, while negative SP values mean the sequence favors exon skipping. The merit of the SP approach is the ability to separate splicing signals from a wide range of sequence motifs enriched in exonic sequences that are attributed to protein-coding properties and/or translation efficiency. Due to its direct derivation from observed splice site selection, SP has an advantage over other computational approaches for predicting alternative splicing.
We show that a vast majority of known exonic splicing enhancers have highly positive cumulative SP values, while known splicing silencers have core motifs with strongly negative cumulative SP values. Our approach allows for computation of the cumulative SP value of any sequence segment and, thus, gives researchers the ability to measure the possible contribution of any sequence to the pattern of splicing.
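As a rough illustration of how a cumulative SP value for a sequence segment might be computed, the sketch below sums the table entries of all overlapping oligonucleotides in the segment. The summation rule, the oligonucleotide length, the function name `cumulative_sp` and every number in the toy table are assumptions made for this example; they are not taken from the ASMD.

```python
def cumulative_sp(segment, sp_table, k):
    """Sum SP values over all overlapping k-mers of a segment.

    Oligonucleotides missing from the table contribute 0 (an assumption).
    """
    segment = segment.upper()
    total = 0.0
    for i in range(len(segment) - k + 1):
        total += sp_table.get(segment[i:i + k], 0.0)
    return total


# Hypothetical SP table: positive values favor exon retention,
# negative values favor exon skipping. The numbers are invented.
toy_sp_table = {"GAAGA": 1.5, "AAGAA": 1.0, "TAGGG": -2.0}

enhancer_like = cumulative_sp("GAAGAA", toy_sp_table, k=5)   # 1.5 + 1.0
silencer_like = cumulative_sp("TTAGGGT", toy_sp_table, k=5)  # only TAGGG matches
```

Under this reading, a positive total marks a segment expected to promote exon retention and a negative total marks one expected to favor skipping, which mirrors the sign convention described for individual oligonucleotides.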
Exonic splicing enhancers (ESEs) are important cis elements required for exon inclusion. Using an in vitro functional selection and amplification procedure, we have identified a novel ESE motif recognized by the human SR protein SC35 under splicing conditions. The selected sequences are functional and specific: they promote splicing in nuclear extract or in S100 extract complemented by SC35 but not by SF2/ASF. They can also function in a different exonic context from the one used for the selection procedure. The selected sequences share one or two close matches to a short and highly degenerate octamer consensus, GRYYcSYR. A score matrix was generated from the selected sequences according to the nucleotide frequency at each position of their best match to the consensus motif. The SC35 score matrix, along with our previously reported SF2/ASF score matrix, was used to search the sequences of two well-characterized splicing substrates derived from the mouse immunoglobulin M (IgM) and human immunodeficiency virus tat genes. Multiple SC35 high-score motifs, but only two widely separated SF2/ASF motifs, were found in the IgM C4 exon, which can be spliced in S100 extract complemented by SC35. In contrast, multiple high-score motifs for both SF2/ASF and SC35 were found in a variant of the Tat T3 exon (lacking an SC35-specific silencer) whose splicing can be complemented by either SF2/ASF or SC35. The motif score matrix can help locate SC35-specific enhancers in natural exon sequences.
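The idea of a position-specific score matrix built from nucleotide frequencies and then used to search an exon can be sketched as follows. The abstract does not give the scoring formula, so the log-odds score against a uniform background with a pseudocount is an assumption, as are the function names, the threshold and the four toy octamers (chosen only so that they fit the GRYYcSYR consensus; they are not the selected sequences from the study).

```python
from math import log2

BASES = "ACGT"


def build_score_matrix(sites, pseudocount=0.5, background=0.25):
    """Build per-position log-odds scores from aligned, equal-length sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(width):
        column = [s[pos].upper() for s in sites]
        matrix.append({
            b: log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def scan(seq, matrix, threshold):
    """Return (position, window, score) for every window scoring >= threshold."""
    seq = seq.upper()
    width = len(matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if set(window) <= set(BASES):
            score = sum(matrix[j][b] for j, b in enumerate(window))
            if score >= threshold:
                hits.append((i, window, score))
    return hits


# Hypothetical aligned octamers, each consistent with GRYYcSYR
# (R = A/G, Y = C/T, S = C/G).
toy_sites = ["GACTCCCG", "GGCCCGTA", "GATTCCTG", "GGCCCCCA"]
matrix = build_score_matrix(toy_sites)
hits = scan("TTTGACTCCCGTTT", matrix, threshold=4.0)
```

The threshold controls how permissive the search is; in practice it would need to be calibrated against sequences with known activity rather than set arbitrarily as here.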