The frequency distribution of mutation-induced aberrant 3′ splice sites (3′ss) in exons and introns is more complex than for 5′ splice sites, largely owing to sequence constraints upstream of intron/exon boundaries. As a result, prediction of their localization remains a challenging task. Here, nucleotide sequences of 218 previously reported aberrant 3′ss activated by disease-causing mutations in 131 human genes were compared with their authentic counterparts using currently available splice site prediction tools. Each tested algorithm distinguished authentic 3′ss from cryptic sites more effectively than from de novo sites. The best discrimination between aberrant and authentic 3′ss was achieved by the maximum entropy model. Almost one half of aberrant 3′ss were activated by AG-creating mutations and ∼95% of the newly created AGs were selected in vivo. The overall nucleotide structure upstream of aberrant 3′ss was characterized by higher purine content than for authentic sites, particularly in position −3, which may be compensated by more stringent requirements for positive and negative nucleotide signatures centred around position −11. A newly developed online database of aberrant 3′ss will facilitate identification of splicing mutations in a gene or phenotype of interest and future optimization of splice site prediction tools.
Auxiliary splicing signals play a major role in the regulation of constitutive and alternative pre-mRNA splicing, but their relative importance in selection of mutation-induced cryptic or de novo splice sites is poorly understood. Here, we show that exonic sequences between authentic and aberrant splice sites that were activated by splice-site mutations in human disease genes have lower frequencies of splicing enhancers and higher frequencies of splicing silencers than average exons. Conversely, sequences between authentic and intronic aberrant splice sites have more enhancers and fewer silencers than average introns. Exons that were skipped as a result of splice-site mutations were smaller, had lower SF2/ASF motif scores, a decreased availability of decoy splice sites and a higher density of silencers than exons in which splice-site mutation activated cryptic splice sites. These four variables were the strongest predictors of the two aberrant splicing events in a logistic regression model. Elimination or weakening of predicted silencers in two reporters consistently promoted use of intron-proximal splice sites if these elements were maintained at their original positions, with their modular combinations producing expected modification of splicing. Together, these results show the existence of a gradient in exon and intron definition at the level of pre-mRNA splicing and provide a basis for the development of computational tools that predict aberrant splicing outcomes.
DBASS3 and DBASS5 provide comprehensive repositories of new exon boundaries that were induced by pathogenic mutations in human disease genes. Aberrant 5′- and 3′-splice sites were activated either by mutations in the consensus sequences of natural exon–intron junctions (cryptic sites) or elsewhere (‘de novo’ sites). DBASS3 and DBASS5 currently contain approximately 900 records of cryptic and de novo 3′- and 5′-splice sites that were produced by over a thousand different mutations in approximately 360 genes. DBASS3 and DBASS5 data can be searched by disease phenotype, gene, mutation, location of aberrant splice sites in introns and exons and their distance from authentic counterparts, by bibliographic references and by the splice-site strength estimated with several prediction algorithms. The user can also retrieve reference sequences of both aberrant and authentic splice sites with the underlying mutation. These data will facilitate identification of introns or exons frequently involved in aberrant splicing, mutation analysis of human disease genes and study of germline or somatic mutations that impair RNA processing. Finally, this resource will be useful for fine-tuning splice-site prediction algorithms, better definition of auxiliary splicing signals and design of new reporter assays. DBASS3 and DBASS5 are freely available at http://www.dbass.org.uk/.
We compiled sequences of previously published aberrant 3′ splice sites (3′ss) that were generated by mutations in human disease genes. Cryptic 3′ss, defined here as those resulting from a mutation of the 3′YAG consensus, were more frequent in exons than in introns. They clustered in ∼20 nt region adjacent to authentic 3′ss, suggesting that their under-representation in introns is due to a depletion of AG dinucleotides in the polypyrimidine tract (PPT). In contrast, most aberrant 3′ss that were induced by mutations outside the 3′YAG consensus (designated ‘de novo’) were in introns. The activation of intronic de novo 3′ss was largely due to AG-creating mutations in the PPT. In contrast, exonic de novo 3′ss were more often induced by mutations improving the PPT, branchpoint sequence (BPS) or distant auxiliary signals, rather than by direct AG creation. The Shapiro–Senapathy matrix scores had a good prognostic value for cryptic, but not de novo 3′ss. Finally, AG-creating mutations in the PPT that produced aberrant 3′ss upstream of the predicted BPS in vivo shared a similar ‘BPS-new AG’ distance. Reduction of this distance and/or the strength of the new AG PPT in splicing reporter pre-mRNAs improved utilization of authentic 3′ss, suggesting that AG-creating mutations that are located closer to the BPS and are preceded by weaker PPT may result in less severe splicing defects.
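The notion of an AG-creating mutation used above can be illustrated with a minimal Python sketch. It only checks whether a single-nucleotide substitution introduces an AG dinucleotide that was absent from the reference sequence; it does not model the PPT, the BPS or any splice-site score. The function name `new_ag_positions` and the example sequence are hypothetical and are not taken from the study.

```python
def new_ag_positions(ref: str, pos: int, alt: str) -> list[int]:
    """Return 0-based start positions of AG dinucleotides created by a substitution.

    ref: reference sequence (DNA, 5'->3'), e.g. the 3' end of an intron.
    pos: 0-based position of the substituted nucleotide.
    alt: the single nucleotide introduced by the mutation.
    """
    ref = ref.upper()
    alt = alt.upper()
    if len(alt) != 1 or not 0 <= pos < len(ref):
        raise ValueError("expected a single-nucleotide substitution inside ref")
    mutant = ref[:pos] + alt + ref[pos + 1:]

    def ag_starts(seq: str) -> set[int]:
        # Collect the start index of every AG dinucleotide in seq.
        return {i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"}

    # A substitution keeps the coordinates unchanged, so the two sets are comparable.
    return sorted(ag_starts(mutant) - ag_starts(ref))


# Hypothetical pyrimidine-rich intron end; the authentic 3'ss AG is the final dinucleotide.
intron_end = "CTTTCTGTTTCAG"
print(new_ag_positions(intron_end, 5, "A"))  # T>A directly upstream of a G
print(new_ag_positions(intron_end, 1, "C"))  # T>C that creates no AG
```

The distance between a reported position and the end of the sequence gives the location of the new AG relative to the authentic 3′ss, which is the kind of quantity the abstract relates to the predicted BPS.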
Splice site selection is a key element of pre-mRNA splicing. Although it is known to involve specific recognition of short consensus sequences by the splicing machinery, the mechanisms by which 5′ splice sites are accurately identified remain controversial and incompletely resolved. The human F7 gene contains in its seventh intron (IVS7) a 37-bp VNTR minisatellite whose first element spans the exon7–IVS7 boundary. As a consequence, the IVS7 authentic donor splice site is followed by several cryptic splice sites identical in sequence, referred to as 5′ pseudo-sites, which normally remain silent. This region, therefore, provides a remarkable model to decipher the mechanism underlying 5′ splice site selection in mammals. We previously suggested a model for splice site selection that, in the presence of consecutive splice consensus sequences, would stimulate exclusively the selection of the most upstream 5′ splice site, rather than repressing the 3′ following pseudo-sites. In the present study, we provide experimental support for this hypothesis by using a mutational approach involving a panel of 50 mutant and wild-type F7 constructs expressed in various cell types. We demonstrate that the F7 IVS7 5′ pseudo-sites are functional, but do not compete with the authentic donor splice site. Moreover, we show that the selection of the 5′ splice site follows a scanning-type mechanism, precluding competition with other functional 5′ pseudo-sites available in the immediate sequence context downstream of the activated one. In addition, 5′ pseudo-sites with an increased complementarity to U1 snRNA up to 91% do not compete with the identified scanning mechanism. Altogether, these findings, which unveil a cell type–independent 5′−3′-oriented scanning process for accurate recognition of the authentic 5′ splice site, reconcile apparently contradictory observations by establishing a hierarchy of competitiveness among the determinants involved in 5′ splice site selection.
Typically, mammalian genes contain coding sequences (exons) separated by non-coding sequences (introns). Introns are removed during pre-mRNA splicing. The accurate recognition of introns during splicing is essential, as any abnormality in that process will generate abnormal mRNAs that can cause diseases. Understanding the mechanisms of accurate splice site selection is of prime interest to life scientists. Exon–intron borders (splice sites) are defined by short sequences that are poorly conserved. The strength of any splice sequence can be assessed by its degree of homology with a splice site consensus sequence. Within exons and introns, several sequences can match with this consensus as well as or better than the splice sites. Using a system in which a splice site sequence is repeated several times in the intron, the authors showed that linear 5′−3′ search is a leading mechanism underlying splice site selection. This scanning mechanism is cell type–independent, and only the most upstream splice site of all the series is selected, even if splice sites with a better match to the consensus are in the vicinity. These findings reconcile contradictory observations and establish a hierarchy among the determinants involved in splice site selection.
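The contrast between selection by consensus match and selection by 5′−3′ scanning can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the authors' experimental system: the 9-nt consensus MAG|GTRAGT (positions −3 to +6) is the commonly cited mammalian 5′ splice site consensus rather than one given in the text, the `min_match` threshold is arbitrary, and all function names and the test sequence are hypothetical.

```python
# IUPAC codes needed for the assumed consensus (M = A or C, R = A or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}
CONSENSUS = "MAGGTRAGT"  # assumed 5'ss consensus, exon positions -3..-1, intron +1..+6


def match_fraction(site: str) -> float:
    """Fraction of the 9 positions that agree with the consensus."""
    return sum(base in IUPAC[code] for base, code in zip(site, CONSENSUS)) / len(CONSENSUS)


def candidate_sites(seq: str, min_match: float = 0.6) -> list[tuple[int, str, float]]:
    """List (start, site, match) for 9-mers with GT at +1/+2 and match >= min_match."""
    seq = seq.upper().replace("U", "T")
    sites = []
    for start in range(len(seq) - len(CONSENSUS) + 1):
        site = seq[start:start + len(CONSENSUS)]
        if site[3:5] != "GT":  # require the invariant GT at intron positions +1/+2
            continue
        fraction = match_fraction(site)
        if fraction >= min_match:
            sites.append((start, site, fraction))
    return sites


def select_by_scanning(seq: str, min_match: float = 0.6):
    """Scanning-type choice: the most upstream candidate wins, whatever its match."""
    sites = candidate_sites(seq, min_match)
    return sites[0] if sites else None


def select_by_strength(seq: str, min_match: float = 0.6):
    """Strength-only choice: the best consensus match wins, whatever its position."""
    sites = candidate_sites(seq, min_match)
    return max(sites, key=lambda s: s[2]) if sites else None


# Hypothetical sequence: a weaker upstream site followed by a perfect consensus site.
seq = "AAA" + "CTGGTGAGC" + "TTTT" + "CAGGTAAGT" + "TTT"
print(select_by_scanning(seq))   # picks the upstream site
print(select_by_strength(seq))   # picks the downstream consensus site
```

In this toy case the two rules disagree, which is the situation the summary describes: under scanning, the most upstream site is used even when a better match to the consensus lies downstream.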
Most retained introns found in human cDNAs generated by high-throughput sequencing projects seem to result from underspliced transcripts, and thus they capture intermediate steps of pre-mRNA splicing. On the other hand, mutations in splice sites cause exon skipping of the respective exon or activation of pre-existing cryptic sites. Both types of events reflect properties of the splicing mechanism.
The retained introns were significantly shorter than constitutive ones, and skipped exons were shorter than exons with cryptic sites. Both donor and acceptor splice sites of retained introns were weaker than splice sites of constitutive introns. The authentic acceptor sites affected by mutations were significantly weaker in exons with activated cryptic sites than in skipped exons. The distance from a mutated splice site to the nearest equivalent site was significantly shorter in cases of activated cryptic sites compared to exon skipping events. The prevalence of retained introns within genes monotonically increased in the 5'-to-3' direction (more retained introns close to the 3'-end), consistent with the model of co-transcriptional splicing. The density of exonic splicing enhancers was higher, and the density of exonic splicing silencers lower in retained introns compared to constitutive ones and in exons with cryptic sites compared to skipped exons.
Thus the analysis of retained introns in human cDNA, exons skipped due to mutations in splice sites and exons with cryptic sites produced results consistent with the intron definition mechanism of splicing of short introns, co-transcriptional splicing, dependence of splicing efficiency on the splice site strength and the density of candidate exonic splicing enhancers and silencers. These results are consistent with other, recently published analyses.
Mutations that affect splicing of precursor messenger RNAs play a major role in the development of hereditary diseases. Most splicing mutations have been found to eliminate GT or AG dinucleotides that define the 5′ and 3′ ends of introns, leading to exon skipping or cryptic splice-site activation. Although accurate description of the mis-spliced transcripts is critical for predicting phenotypic consequences of these alterations, their exact nature in affected individuals often cannot be determined experimentally. Using a comprehensive collection of exons that sustained cryptic splice-site activation or were skipped as a result of splice-site mutations, we have developed a multivariate logistic discrimination procedure that distinguishes the two aberrant splicing outcomes from DNA sequences. The new algorithm was validated using an independent sample of exons and implemented as a free online utility termed CRYP-SKIP (http://www.dbass.org.uk/cryp-skip/). The web application takes one or more mutated alleles, each consisting of one exon and flanking intronic sequences, and provides a list of important predictor variables and their values, the overall probability of cryptic splice-site activation vs exon skipping, and the location and intrinsic strength of predicted cryptic splice sites in the input sequence. These results will facilitate phenotypic prediction of splicing mutations and provide further insights into splicing enhancer and silencer elements and their relative importance for splice-site selection in vivo.
mutation; gene; splicing; cryptic splice site; exon skipping; RNA
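The general form of a multivariate logistic discrimination between two outcomes can be sketched as follows. This is a generic logistic model, not the CRYP-SKIP implementation: the predictor names and every coefficient below are placeholders invented for illustration, and the fitted values used by the published tool are not given in the abstract.

```python
import math


def outcome_probability(features: dict[str, float],
                        coefficients: dict[str, float],
                        intercept: float) -> float:
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    linear = intercept + sum(coefficients[name] * value
                             for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-linear))


# Placeholder predictors and coefficients (hypothetical, for illustration only).
coefficients = {"exon_length": 0.01, "silencer_density": -2.0}
exon = {"exon_length": 150.0, "silencer_density": 0.5}

p_cryptic = outcome_probability(exon, coefficients, intercept=-0.5)
print(f"P(cryptic splice-site activation) = {p_cryptic:.3f}")
print(f"P(exon skipping) = {1.0 - p_cryptic:.3f}")
```

With two mutually exclusive outcomes, the probability of the second outcome is simply one minus the probability of the first, so a single fitted model yields both values.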
The underlying pathogenic mechanism of a large fraction of DNA variants of disease-causing genes is the disruption of the splicing process. We aimed to investigate the effect on splicing of the BRCA2 variants c.8488-1G > A (exon 20) and c.9026_9030del (exon 23), as well as 41 BRCA2 variants reported in the Breast Cancer Information Core (BIC) mutation database.
DNA variants were analyzed with the splicing prediction programs NNSPLICE and Human Splicing Finder. Functional analyses of candidate variants were performed by lymphocyte RT-PCR and/or hybrid minigene assays. Forty-one BIC variants of exons 19, 20, 23 and 24 were bioinformatically selected and generated by PCR-mutagenesis of the wild type minigenes.
Lymphocyte RT-PCR of c.8488-1G > A showed intron 19 retention and a 12-nucleotide deletion in exon 20, whereas c.9026_9030del did not show any splicing anomaly. Minigene analysis of c.8488-1G > A displayed the aforementioned aberrant isoforms but also exon 20 skipping. We further evaluated the splicing outcomes of 41 variants of four BRCA2 exons by minigene analysis. Eighteen variants presented splicing aberrations. Most variants (78.9%) disrupted the natural splice sites, whereas four altered putative enhancers/silencers and had a weak effect. Fluorescent RT-PCR of minigenes accurately detected 14 RNA isoforms generated by cryptic site usage, exon skipping and intron retention events. Fourteen variants showed total splicing disruptions and were predicted to truncate or eliminate essential domains of BRCA2.
A relevant proportion of BRCA2 variants are correlated with splicing disruptions, indicating that RNA analysis is a valuable tool to assess the pathogenicity of a particular DNA change. The minigene system is a straightforward and robust approach to detect variants with an impact on splicing and contributes to a better knowledge of this gene expression step.
The process of excising introns from pre-mRNA complexes is directed by specific genomic DNA sequences at intron-exon borders known as splice sites. These regions contain well-conserved motifs which allow the splicing process to proceed in a regulated and structured manner. However, as well as conventional splicing, several genes have the inherent capacity to undergo alternative splicing, thus allowing synthesis of multiple gene transcripts, perhaps with different functional properties. Within the human genome, therefore, through alternative splicing, it is possible to generate over 100,000 physiological gene products from the 35,000 or so known genes. Abnormalities in normal or alternative splicing, however, account for about 15% of all inherited single gene disorders, including many with a skin phenotype. These splicing abnormalities may arise through inherited mutations in constitutive splice sites or other critical intronic or exonic regions. This review article examines the process of normal intron-exon splicing, as well as what is known about alternative splicing of human genes. The review then addresses pathological disruption of normal intron-exon splicing that leads to inherited skin diseases, either resulting from mutations in sequences that have a direct influence on splicing or that generate cryptic splice sites. Examples of aberrant splicing, especially for the COL7A1 gene in patients with dystrophic epidermolysis bullosa, are discussed and illustrated. The review also examines a number of recently introduced computational tools that can be used to predict whether genomic DNA sequence changes may affect splice site selection and how robust the influence of such mutations might be on splicing.
Inherited skin disease; RNA; Intron; Exon; Gene mutation; Splice site; Cryptic splicing
The branch point (BP) is one of the three obligatory signals required for pre-mRNA splicing. In mammals, the degeneracy of the motif combined with the lack of a large set of experimentally verified BPs complicates the task of modeling it in silico, and therefore of predicting the location of natural BPs. Consequently, BPs have been disregarded in a considerable fraction of the genome-wide studies on the regulation of splicing in mammals. We present a new computational approach for mammalian BP prediction. Using sequence conservation and positional bias we obtained a set of motifs with good agreement with U2 snRNA binding stability. Using a Support Vector Machine algorithm, we created a model complemented with polypyrimidine tract features, which considerably improves the prediction accuracy over previously published methods. Applying our algorithm to human introns, we show that BP position is highly dependent on the presence of AG dinucleotides in the 3′ end of introns, with distance to the 3′ splice site and BP strength strongly correlating with alternative splicing. Furthermore, experimental BP mapping for five exons preceded by long AG-dinucleotide exclusion zones revealed that, for a given intron, more than one BP can be chosen throughout the course of splicing. Finally, the comparison between exons of different evolutionary ages and pseudo exons suggests a key role of the BP in the pathway of exon creation in human. Our computational and experimental analyses suggest that BP recognition is more flexible than previously assumed, and it appears highly dependent on the presence of downstream polypyrimidine tracts. The reported association between BP features and the splicing outcome suggests that this, so far disregarded but yet crucial, element buries information that can complement current acceptor site models.
From transcription to translation, the events underlying protein production from DNA sequence are paramount to all aspects of cellular function. Pre-mRNAs in eukaryotes undergo several processing steps prior to their export to the cytoplasm. Among these, splicing – the process of intron removal and exon ligation – has been shown to play a central role in the regulation of gene expression. It has been estimated that more than half of the disease-causing mutations in humans do so by interfering with splicing. The difficulty in describing these disease mechanisms often lies in the low accuracy of the methods for prediction of functional splicing signals in the pre-mRNA. This is especially the case of the branch point, mainly due to its high sequence variability. We have developed a methodology for mammalian branch point prediction based on a machine-learning algorithm, which shows improved accuracy over previous published methods. Moreover, using a combination of experimental and bioinformatics approaches, we uncovered important positional properties of the branch point and shed new light on how some of its features may contribute to the final splicing outcome. These findings might prove useful for a better understanding of how splicing-associated mutations can lead to disease.
cis-spliced nuclear pre-mRNA introns found in a variety of organisms, including Tetrahymena thermophila, Drosophila melanogaster, Caenorhabditis elegans, and plants, are significantly richer in adenosine and uridine residues than their flanking exons are. The functional significance of this intronic AU richness, however, has been demonstrated only in plant nuclei. In these nuclei, 5' and 3' splice sites are selected in part by their positions relative to AU-rich elements spread throughout the length of an intron. Because of this position-dependent selection scheme, a 5' splice site at the normal (+1) exon-intron boundary having only three contiguous consensus nucleotides can compete effectively with an enhanced exonic site (-57E) having nine consensus nucleotides and outcompete an enhanced site (+106E) embedded within the AU-rich intron. To determine whether transitions from AU-poor exonic sequences to AU-rich intronic sequences influence 5' splice site selection in other organisms, alleles of the pea rbcS3A1 intron were expressed in Drosophila Schneider 2 cells, and their splicing patterns were compared with those in tobacco nuclei. We demonstrate that this heterologous transcript can be accurately spliced in transfected Drosophila nuclei and that a +1 G-to-A knockout mutation at the normal splice site activates the same three cryptic 5' splice sites as in tobacco. Enhancement of the exonic (-57) and intronic (+106) sites to consensus splice sites indicates that potential splice sites located in the upstream exon or at the 5' exon-intron boundary are preferred in Drosophila cells over those embedded within AU-rich intronic sequences. In contrast to tobacco, in which the activities of two competing 5' splice sites upstream of the AU-rich intron are modulated by their proximity to the AU transition point, D. melanogaster utilizes the upstream site which has a higher proportion of consensus nucleotides. 
The enhanced version of the cryptic intronic site is efficiently selected in D. melanogaster when the normal +1 site is weakened or discrete AU-rich elements upstream of the +106E site are disrupted. Selection of this internal site in tobacco requires more drastic disruption of these motifs. We conclude that 5' splice site selection in Drosophila nuclei is influenced by the intrinsic strengths of competing sites and by the presence of AU-rich intronic elements but to a different extent than in tobacco.
Alternative 3′ and 5′ splice site (ss) events constitute a significant part of all alternative splicing events. These events were also found to be related to several aberrant splicing diseases. However, only few of the characteristics that distinguish these events from alternative cassette exons are known currently. In this study, we compared the characteristics of constitutive exons, alternative cassette exons, and alternative 3′ss and 5′ss exons. The results revealed that alternative 3′ss and 5′ss exons are an intermediate state between constitutive and alternative cassette exons, where the constitutive side resembles constitutive exons, and the alternative side resembles alternative cassette exons. The results also show that alternative 3′ss and 5′ss exons exhibit low levels of symmetry (frame-preserving), similar to constitutive exons, whereas the sequence between the two alternative splice sites shows high symmetry levels, similar to alternative cassette exons. In addition, flanking intronic conservation analysis revealed that exons whose alternative splice sites are at least nine nucleotides apart show a high conservation level, indicating intronic participation in the regulation of their splicing, whereas exons whose alternative splice sites are fewer than nine nucleotides apart show a low conservation level. Further examination of these exons, spanning seven vertebrate species, suggests an evolutionary model in which the alternative state is a derivative of an ancestral constitutive exon, where a mutation inside the exon or along the flanking intron resulted in the creation of a new splice site that competes with the original one, leading to alternative splice site selection. This model was validated experimentally on four exons, showing that they indeed originated from constitutive exons that acquired a new competing splice site during evolution.
Alternative splicing is the mechanism that is responsible for the creation of multiple mRNA products from a single gene. It is considered a key player in genomic complexity achievement. Alternative 3′ and 5′ splicing events in which part of the exon is alternatively included or excluded in the mRNA constitute a significant part of all alternative splicing events, and yet little is known regarding their regulation mechanism and the evolutionary background that led to their creation. We show that alternative 3′ and 5′ splice site exons resemble constitutive exons. However, their alternative sequence resembles alternative cassette exons. Comparative genomics spanning seven vertebrate species suggests an evolutionary model in which the alternative state is a derivative of an ancestral constitutive exon, where a mutation inside the exon or along the flanking intron resulted in the creation of a new splice site that competes with the original one, leading to alternative splice site selection. This model was validated experimentally, showing that during evolution mutations shifted constitutive exons to undergo alternative 3′ and 5′ splicing.
The removal of introns from eukaryotic RNA transcripts requires the activities of five multi-component ribonucleoprotein complexes and numerous associated proteins. The lack of mutations affecting splicing factors essential for animal survival has limited the study of the in vivo regulation of splicing. From a screen for suppressors of the Caenorhabditis elegans unc-93(e1500) rubberband Unc phenotype, we identified mutations in genes that encode the C. elegans orthologs of two splicing factors, the U2AF large subunit (UAF-1) and SF1/BBP (SFA-1). The uaf-1(n4588) mutation resulted in temperature-sensitive lethality and caused the unc-93 RNA transcript to be spliced using a cryptic 3′ splice site generated by the unc-93(e1500) missense mutation. The sfa-1(n4562) mutation did not cause the utilization of this cryptic 3′ splice site. We isolated four uaf-1(n4588) intragenic suppressors that restored the viability of uaf-1 mutants at 25°C. These suppressors differentially affected the recognition of the cryptic 3′ splice site and implicated a small region of UAF-1 between the U2AF small subunit-interaction domain and the first RNA recognition motif in affecting the choice of 3′ splice site. We constructed a reporter for unc-93 splicing and using site-directed mutagenesis found that the position of the cryptic splice site affects its recognition. We also identified nucleotides of the endogenous 3′ splice site important for recognition by wild-type UAF-1. Our genetic and molecular analyses suggested that the phenotypic suppression of the unc-93(e1500) Unc phenotype by uaf-1(n4588) and sfa-1(n4562) was likely caused by altered splicing of an unknown gene. Our observations provide in vivo evidence that UAF-1 can act in regulating 3′ splice-site choice and establish a system that can be used to investigate the in vivo regulation of RNA splicing in C. elegans.
Eukaryotic genes contain intervening intronic sequences that must be removed from pre-mRNA transcripts by RNA splicing to generate functional messenger RNAs. While studying genes that encode and control a presumptive muscle potassium channel complex in the nematode Caenorhabditis elegans, we found that mutations in two splicing factors, the U2AF large subunit and SF1/BBP suppress the rubberband Unc phenotype caused by a rare missense mutation in the gene unc-93. Mutations affecting the U2AF large subunit caused the recognition of a cryptic 3′ splice site generated by the unc-93 mutation, providing in vivo evidence that the U2AF large subunit can affect splice-site selection. By contrast, an SF1/BBP mutation that suppressed the rubberband Unc phenotype did not cause splicing using this cryptic 3′ splice site. Our genetic studies identified a region of the U2AF large subunit important for its effect on 3′ splice-site choice. Our mutagenesis analysis of in vivo transgene splicing identified a positional effect on weak 3′ splice site selection and nucleotides of the endogenous 3′ splice site important for recognition. The system we have defined should facilitate future in vivo analyses of pre–mRNA splicing.
Exons with predicted branch points were identified from a large dataset of human exons and the importance of these branch points for splicing was verified.
The three consensus elements at the 3' end of human introns - the branch point sequence, the polypyrimidine tract, and the 3' splice site AG dinucleotide - are usually closely spaced within the final 40 nucleotides of the intron. However, the branch point sequence and polypyrimidine tract of a few known alternatively spliced exons lie up to 400 nucleotides upstream of the 3' splice site. The extended regions between the distant branch points (dBPs) and their 3' splice site are marked by the absence of other AG dinucleotides. In many cases alternative splicing regulatory elements are located within this region.
We have applied a simple algorithm, based on AG dinucleotide exclusion zones (AGEZ), to a large data set of verified human exons. We found a substantial number of exons with large AGEZs, which represent candidate dBP exons. We verified the importance of the predicted dBPs for splicing of some of these exons. This group of exons exhibits a higher than average prevalence of observed alternative splicing, and many of the exons are in genes with some human disease association.
The group of identified probable dBP exons is interesting first because they are likely to be alternatively spliced. Second, they are expected to be vulnerable to mutations within the entire extended AGEZ. Disruption of splicing of such exons, for example by mutations that lead to insertion of a new AG dinucleotide between the dBP and 3' splice site, could be readily understood even though the causative mutation might be remote from the conventional locations of splice site sequences.
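The idea of an AG dinucleotide exclusion zone can be illustrated with a minimal Python sketch that measures the AG-free stretch immediately upstream of the 3' splice site AG. This is a simplification under stated assumptions: the published algorithm may define the zone boundaries differently (for example relative to the branch point), and the function name, the example introns and the 100-nt cutoff are hypothetical.

```python
def ag_exclusion_zone_length(intron: str) -> int:
    """Length of the AG-free region directly upstream of the 3' splice site AG.

    intron: intron sequence 5'->3' whose last two nucleotides are the 3'ss AG.
    """
    seq = intron.upper().replace("U", "T")
    if not seq.endswith("AG"):
        raise ValueError("intron must end with the 3' splice site AG")
    upstream = seq[:-2]              # everything before the splice-site AG
    last_ag = upstream.rfind("AG")   # nearest AG upstream of the 3'ss, or -1
    if last_ag == -1:
        return len(upstream)         # no other AG anywhere in the supplied sequence
    return len(upstream) - (last_ag + 2)


def is_candidate_dbp_intron(intron: str, min_zone: int = 100) -> bool:
    """Flag introns with a large exclusion zone; min_zone is an arbitrary cutoff."""
    return ag_exclusion_zone_length(intron) >= min_zone


# Hypothetical examples.
short_intron = "GTAAGTCTAACTTTTCCCTTTCAG"
long_zone_intron = "GTAAGTAG" + "CT" * 60 + "CAG"
print(ag_exclusion_zone_length(short_intron))
print(is_candidate_dbp_intron(long_zone_intron))
```

Any mutation that creates an AG inside the measured zone would shorten it, which is why such exons are expected to be vulnerable to mutations far from the conventional splice site sequences.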
In disease-associated genes, the understanding of the functional significance of deep intronic nucleotide variants may represent a difficult challenge. We have previously reported a new disease-causing mechanism that involves an intronic splicing processing element (ISPE) in ATM, composed of adjacent consensus 5′ and 3′ splice sites. A GTAA deletion within ISPE maintains potential adjacent splice sites, disrupts a non-canonical U1 snRNP interaction and activates an aberrant exon. In this paper, we demonstrate that binding of U1 snRNA through complementarity within a ∼40 nt window downstream of the ISPE prevents aberrant splicing. By selective mutagenesis at the adjacent consensus ISPE splice sites, we show that this effect is not due to a resplicing process occurring at the ISPE. Functional comparison of the ATM mouse counterpart and evaluation of the pre-mRNA splicing intermediates derived from affected cell lines and hybrid minigene assays indicate that U1 snRNP binding at the ISPE interferes with the cryptic acceptor site. Activation of this site results in a stringent 5′–3′ order of intron sequence removal around the cryptic exon. Artificial U1 snRNA loading by complementarity to heterologous exonic sequences represents a potential therapeutic method to prevent the usage of an aberrant CFTR cryptic exon. Our results suggest that ISPE-like intronic elements binding U1 snRNPs may regulate correct intron processing.
Alternative splicing (AS) of maturing mRNA can generate structurally and functionally distinct transcripts from the same gene. Recent bioinformatic analyses of available genome databases inferred a positive correlation between intron length and AS. To study the interplay between intron length and AS empirically and in more detail, we analyzed the diversity of alternatively spliced transcripts (ASTs) in the Drosophila RNA-binding Bruno-3 (Bru-3) gene. This gene was known to encode thirteen exons separated by introns of diverse sizes, ranging from 71 to 41,973 nucleotides in D. melanogaster. Although Bru-3's structure is expected to be conducive to AS, only two ASTs of this gene were previously described.
Cloning of RT-PCR products of the entire ORF from four species representing three diverged Drosophila lineages provided an evolutionary perspective, high sensitivity, and long-range contiguity of splice choices currently unattainable by high-throughput methods. Consequently, we identified three new exons, a new exon fragment and thirty-three previously unknown ASTs of Bru-3. All exon-skipping events in the gene were mapped to the exons surrounded by introns of at least 800 nucleotides, whereas exons split by introns of less than 250 nucleotides were always spliced contiguously in mRNA. Cases of exon loss and creation during Bru-3 evolution in Drosophila were also localized within large introns. Notably, we identified a true de novo exon gain: exon 8 was created along the lineage of the obscura group from intronic sequence between cryptic splice sites conserved among all Drosophila species surveyed. Exon 8 was included in mature mRNA by the species representing all the major branches of the obscura group. To our knowledge, the origin of exon 8 is the first documented case of exonization of intronic sequence outside vertebrates.
We found that large introns can promote AS via exon-skipping and exon turnover during evolution likely due to frequent errors in their removal from maturing mRNA. Large introns could be a reservoir of genetic diversity, because they have a greater number of mutable sites than short introns. Taken together, gene structure can constrain and/or promote gene evolution.
Several unclassified variants (UVs) have been identified in splicing regions of disease-associated genes and their characterization as pathogenic mutations or benign polymorphisms is crucial for the understanding of their role in disease development. In this study, 24 UVs located at BRCA1 and BRCA2 splice sites were characterized by transcript analysis. These results were used to evaluate the ability of nine bioinformatics programs to predict genetic variants causing aberrant splicing (spliceogenic variants) and the nature of aberrant transcripts. Eleven variants in BRCA1 and 8 in BRCA2, including 8 not previously characterized at transcript level, were ascertained to affect mRNA splicing. Of these, 16 led to the synthesis of aberrant transcripts containing premature termination codons (PTCs), 2 to the up-regulation of naturally occurring alternative transcripts containing PTCs, and one to an in-frame deletion within the region coding for the DNA binding domain of BRCA2, causing the loss of the ability to bind the partner protein DSS1 and ssDNA. For each computational program, we evaluated the rate of non-informative analyses, i.e., those that did not recognize the natural splice sites in the wild-type sequence, and the rate of false positive predictions, i.e., variants incorrectly classified as spliceogenic, as a measure of their specificity, under conditions setting sensitivity of predictions to 100%. The best-performing programs were Human Splicing Finder and Automated Splice Site Analyses, both exhibiting 100% informativeness and specificity. For 10 mutations, the activation of cryptic splice sites was observed, but we were unable to derive simple criteria to select, among the different cryptic sites predicted by the bioinformatics analyses, those actually used. Consistent with previous reports, our study provides evidence that in silico tools can be used for selecting splice site variants for in vitro analyses.
However, the latter remain mandatory for the characterization of the nature of aberrant transcripts.
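The evaluation measures named in this abstract (rate of non-informative analyses, false positive rate, specificity) can be illustrated with a minimal Python sketch. The function name `evaluate_tool`, the data layout and the toy values are assumptions made for illustration only; they are not data or code from the study, and the sketch does not reproduce how each program's threshold was tuned to reach 100% sensitivity.

```python
from typing import Dict, Optional, Sequence


def evaluate_tool(predictions: Sequence[Optional[bool]],
                  observed: Sequence[bool]) -> Dict[str, float]:
    """Summarize one (hypothetical) prediction tool.

    predictions[i] is True/False (variant predicted spliceogenic or not),
    or None when the tool did not recognize the natural splice site in the
    wild-type sequence (a non-informative analysis).
    observed[i] is True when the variant altered splicing experimentally.
    """
    if len(predictions) != len(observed):
        raise ValueError("predictions and observed must have equal length")

    n_total = len(predictions)
    n_non_informative = sum(1 for p in predictions if p is None)

    # Only informative analyses enter the confusion matrix.
    pairs = [(p, o) for p, o in zip(predictions, observed) if p is not None]
    tp = sum(1 for p, o in pairs if p and o)
    fn = sum(1 for p, o in pairs if not p and o)
    fp = sum(1 for p, o in pairs if p and not o)
    tn = sum(1 for p, o in pairs if not p and not o)

    nan = float("nan")
    return {
        "non_informative_rate": n_non_informative / n_total if n_total else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


if __name__ == "__main__":
    # Toy values, invented for illustration.
    preds = [True, True, False, None, True, False]
    truth = [True, True, False, True, False, False]
    print(evaluate_tool(preds, truth))
```

In this toy input one of six analyses is non-informative, and one benign variant is wrongly called spliceogenic, so specificity falls below 100% even though every informative spliceogenic variant is detected.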
Despite decades of research, the question of how the mRNA splicing machinery precisely identifies short exonic islands within the vast intronic oceans remains to a large extent obscure. In this study, we analyzed Alu exonization events, aiming to understand the requirements for correct selection of exons. Comparison of exonizing Alus to their non-exonizing counterparts is informative because Alus in these two groups have retained high sequence similarity but are perceived differently by the splicing machinery. We identified and characterized numerous features used by the splicing machinery to discriminate between Alu exons and their non-exonizing counterparts. Of these, the most novel is secondary structure: Alu exons in general and their 5′ splice sites (5′ss) in particular are characterized by decreased stability of local secondary structures with respect to their non-exonizing counterparts. We detected numerous further differences between Alu exons and their non-exonizing counterparts, among others in terms of exon–intron architecture and strength of splicing signals, enhancers, and silencers. Support vector machine analysis revealed that these features allow a high level of discrimination (AUC = 0.91) between exonizing and non-exonizing Alus. Moreover, the computationally derived probabilities of exonization significantly correlated with the biological inclusion level of the Alu exons, and the model could also be extended to general datasets of constitutive and alternative exons. This indicates that the features detected and explored in this study provide the basis not only for precise exon selection but also for the fine-tuned regulation thereof, manifested in cases of alternative splicing.
A typical human gene consists of 9 exons around 150 nucleotides in length, separated by introns that are ∼3,000 nucleotides long. The challenge of the splicing machinery is to precisely identify and ligate the exons, while removing the introns. We aimed to understand how the splicing machinery meets this momentous challenge, based on Alu exonization events. Alus are transposable elements, of which approximately one million copies exist in the human genome, a large portion of which lie within introns. Throughout evolution, some intronic Alus accumulated mutations and became recognized by the splicing machinery as exons, a process termed exonization. Such Alus remain highly similar to their non-exonizing counterparts but are perceived as different by the splicing machinery. By comparing exonizing Alus to their non-exonizing counterparts, we were able to identify numerous features in which they differ and which presumably lead to the recognition only of the former by the splicing machinery. Our findings reveal insights regarding the role of local RNA secondary structures, exon–intron architecture constraints, and splicing regulatory signals. We integrated these features in a computational model, which was able to successfully mimic the function of the splicing machinery and discriminate between true Alu exons and their intronic counterparts, highlighting the functional importance of these features.
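The level of discrimination between exonizing and non-exonizing Alus is reported as an area under the ROC curve (AUC). As a minimal sketch of what that number measures, the Python function below computes AUC as the probability that a randomly chosen positive example receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. The function name `auc_from_scores` and the scores are invented for illustration and are not output of the model described here.

```python
from typing import Sequence


def auc_from_scores(pos_scores: Sequence[float],
                    neg_scores: Sequence[float]) -> float:
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic
    divided by the number of positive/negative pairs)."""
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # a tie counts as half a win
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical classifier scores: higher means "more exon-like".
    exonizing = [0.9, 0.8, 0.4]
    non_exonizing = [0.7, 0.3, 0.2]
    print(auc_from_scores(exonizing, non_exonizing))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation; the quadratic pairwise loop is chosen for clarity and would be replaced by a rank-based computation for large datasets.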
Abnormalities of pre-mRNA splicing are increasingly recognized as an important mechanism through which gene mutations cause disease. However, apart from the mutations in the donor and acceptor sites, the effects on splicing of other sequence variations are difficult to predict. Loosely defined exonic and intronic sequences have been shown to affect splicing efficiency by means of silencing and enhancement mechanisms. Thus, nucleotide substitutions in these sequences can induce aberrant splicing. Web-based resources have recently been developed to facilitate the identification of nucleotide changes that could alter splicing. However, computer predictions do not always correlate with in vivo splicing defects. The issue of unclassified variants in cancer predisposing genes is very important both for the correct ascertainment of cancer risk and for the understanding of the basic mechanisms of cancer gene function and regulation. Therefore we aimed to verify how predictions that can be drawn from in silico analysis correlate with results obtained in an in vivo splicing assay.
We analysed 99 hMLH1 and hMSH2 missense mutations with six different algorithms. Transfection of three different cell lines with 20 missense mutations showed that a minority of them led to defective splicing. Moreover, we observed that some exons and some mutations show cell-specific differences in the frequency of exon inclusion.
Our results suggest that the available algorithms, while potentially helpful in identifying splicing modulators, especially when they are located in weakly defined exons, do not always correspond to an obvious modification of the splicing pattern. Thus, caution must be used in assessing the pathogenicity of a missense or silent mutation with prediction programs. The variations observed in the splicing proficiency in three different cell lines suggest that nucleotide changes may dictate alternative splice site selection in a tissue-specific manner, contributing to the widely observed phenotypic variability in inherited cancers.
Transcripts of the human tumor susceptibility gene 101 (TSG101) are aberrantly spliced in many cancers. A major aberrant splicing event on the TSG101 pre-mRNA involves joining of distant alternative 5′ and 3′ splice sites within exon 2 and exon 9, respectively, resulting in the extensive elimination of the mRNA. The estimated strengths of the alternative splice sites are much lower than those of authentic splice sites. We observed that the equivalent aberrant mRNA could be generated from an intron-less TSG101 gene expressed ectopically in breast cancer cells. Remarkably, we identified a pathway-specific endogenous lariat RNA consisting solely of exonic sequences, predicted to be generated by a re-splicing between exon 2 and exon 9 on the spliced mRNA. Our results provide evidence for a two-step splicing pathway in which the initial constitutive splicing removes all 14 authentic splice sites, thereby bringing the weak alternative splice sites into close proximity. We also demonstrate that aberrant multiple-exon skipping of the fragile histidine triad (FHIT) pre-mRNA in cancer cells occurs via re-splicing of spliced FHIT mRNA. The re-splicing of mature mRNA can potentially generate mutation-independent diversity in cancer transcriptomes. Conversely, a mechanism may exist in normal cells to prevent potentially deleterious mRNA re-splicing events.
BRCA2 germ-line mutations predispose to breast and ovarian cancer. Mutations are widespread and unclassified splice variants are frequently encountered. We describe the parental origin and functional characterization of a novel de novo BRCA2 splice site mutation found in a patient exhibiting a ductal carcinoma at the age of 40.
Variations were identified by denaturing high performance liquid chromatography (dHPLC) and sequencing of the BRCA1 and BRCA2 genes. The effect of the mutation on splicing was examined by exon trapping in COS-7 cells and by RT-PCR on RNA isolated from whole blood. The paternity was determined by single nucleotide polymorphism (SNP) microarray analysis. Parental origin of the de novo mutation was determined by establishing mutation-SNP haplotypes by variant specific PCR, while de novo and mosaic status was investigated by sequencing of DNA from leucocytes and carcinoma tissue.
A novel BRCA2 variant in the splice donor site of exon 21 (nucleotide 8982+1 G→A/c.8754+1 G→A) was identified. Exon trapping showed that the mutation activates a cryptic splice site 46 base pairs 3' of exon 21, resulting in the inclusion of a premature stop codon and synthesis of a truncated BRCA2 protein. The aberrant splicing was verified by RT-PCR analysis on RNA isolated from whole blood of the affected patient. The mutation was not found in either of the patient's parents or in the mother's carcinoma, showing that it is a de novo mutation. Variant-specific PCR indicates that the mutation arose in the male germ-line.
We conclude that the novel BRCA2 splice variant is a de novo mutation that arose in the male germ-line and can be classified as a disease-causing mutation.
Alternative splicing of genes is an efficient means of generating variation in protein function. Several disease states have been associated with rare genetic variants that affect splicing patterns. Conversely, splicing efficiency of some genes is known to vary between individuals without apparent ill effects. What is not clear is whether commonly observed phenotypic variation in splicing patterns, and hence potential variation in protein function, is to a significant extent determined by naturally occurring DNA sequence variation and in particular by single nucleotide polymorphisms (SNPs). In this study, we surveyed the splicing patterns of 250 exons in 22 individuals who had been previously genotyped by the International HapMap Project. We identified 70 simple cassette exon alternative splicing events in our experimental system; for six of these, we detected consistent differences in splicing pattern between individuals, with a highly significant association between splice phenotype and neighbouring SNPs. Remarkably, for five out of six of these events, the strongest correlation was found with the SNP closest to the intron–exon boundary, although the distance between these SNPs and the intron–exon boundary ranged from 2 bp to greater than 1,000 bp. Two of these SNPs were further investigated using a minigene splicing system, and in each case the SNPs were found to exert cis-acting effects on exon splicing efficiency in vitro. The functional consequences of these SNPs could not be predicted using bioinformatic algorithms. Our findings suggest that phenotypic variation in splicing patterns is determined by the presence of SNPs within flanking introns or exons. Effects on splicing may represent an important mechanism by which SNPs influence gene function.
Genetic variation, through its effects on gene expression, influences many aspects of the human phenotype. Understanding the impact of genetic variation on human disease risk has become a major goal for biomedical research and has the potential of revealing both novel disease mechanisms and novel functional elements controlling gene expression. Recent large-scale studies have suggested that a relatively high proportion of human genes show allele-specific variation in expression. Effects of common DNA polymorphisms on mRNA splicing are less well studied. Variation in splicing patterns is known to be tissue specific, and for a small number of genes has been shown to vary among individuals. What is not known is whether allele-specific splicing events are an important mechanism by which common genetic variation affects gene expression. In this study we show that allele-specific alternative splicing was observed in six out of 70 exon-skipping events. Sequence analysis of the relevant splice sites and of the regions surrounding single nucleotide polymorphisms correlated with the splicing events failed to identify any predictive bioinformatic signals. A genome-wide study of allele-specific splicing, using an experimental rather than a bioinformatic approach, is now required.
A comparative analysis of SNPs and their exonic and intronic environments identifies the features predictive of splice-affecting variants.
Single point mutations at both synonymous and non-synonymous positions within exons can have severe effects on gene function through disruption of splicing. Predicting these mutations in silico purely from the genomic sequence is difficult due to an incomplete understanding of the multiple factors that may be responsible. In addition, little is known about which computational prediction approaches, such as those involving exonic splicing enhancers and exonic splicing silencers, are most informative.
We assessed the features of single-nucleotide genomic variants verified to cause exon skipping and compared them to a large set of coding SNPs common in the human population, which are likely to have no effect on splicing. Our findings implicate a number of features important for their ability to discriminate splice-affecting variants, including the naturally occurring density of exonic splicing enhancers and exonic splicing silencers of the exon and intronic environment, extensive changes in the number of predicted exonic splicing enhancers and exonic splicing silencers, proximity to the splice junctions and evolutionary constraint of the region surrounding the variant. By extending this approach to additional datasets, we also identified relevant features of variants that cause increased exon inclusion and ectopic splice site activation.
We identified a number of features that have statistically significant representation among exonic variants that modulate splicing. These analyses highlight putative mechanisms responsible for splicing outcome and emphasize the role of features important for exon definition. We developed a web-tool, Skippy, to score coding variants for these relevant splice-modulating features.
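Two of the features discussed above, the density of predicted exonic splicing enhancers or silencers in an exon and the change in their number caused by a variant, can be sketched in a few lines of Python. This is an illustrative sketch only and not the scoring used by Skippy: the function names are hypothetical, and the hexamer set `PLACEHOLDER_MOTIFS` is an invented placeholder rather than a published enhancer or silencer list.

```python
from typing import Set

# Placeholder hexamers, assumed for illustration; a real analysis would load
# a curated enhancer or silencer hexamer set instead.
PLACEHOLDER_MOTIFS: Set[str] = {"GAAGAA", "AAGAAG"}


def count_motifs(seq: str, motifs: Set[str], k: int = 6) -> int:
    """Count overlapping k-mer windows of seq that belong to motifs."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in motifs)


def motif_density(seq: str, motifs: Set[str], k: int = 6) -> float:
    """Fraction of k-mer windows in seq that match a motif."""
    windows = len(seq) - k + 1
    return count_motifs(seq, motifs, k) / windows if windows > 0 else 0.0


def motif_change(seq: str, pos: int, alt: str, motifs: Set[str], k: int = 6) -> int:
    """Net gain (+) or loss (-) of motif matches for a single-nucleotide
    substitution of base alt at 0-based position pos."""
    if not 0 <= pos < len(seq) or len(alt) != 1:
        raise ValueError("pos must index seq and alt must be one base")
    mutated = seq[:pos] + alt + seq[pos + 1:]
    return count_motifs(mutated, motifs, k) - count_motifs(seq, motifs, k)


if __name__ == "__main__":
    exon = "CCGAAGAAGCC"  # toy sequence, not a real exon
    print(motif_density(exon, PLACEHOLDER_MOTIFS))
    print(motif_change(exon, 5, "T", PLACEHOLDER_MOTIFS))
```

Counting overlapping windows means a single substitution can remove or create several motif matches at once, which is one way to express the "extensive changes" in predicted enhancer and silencer numbers that the abstract describes.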
We previously found that the splicing of exon 5 to exon 6 in the rat beta-TM gene required that exon 6 first be joined to the downstream common exon 8 (Helfman et al., Genes and Dev. 2, 1627-1638, 1988). Pre-mRNAs containing exon 5, intron 5 and exon 6 are not normally spliced in vitro. We have carried out a mutational analysis to determine which sequences in the pre-mRNA contribute to the inability of this precursor to be spliced in vitro. We found that mutations in two regions of the pre-mRNA led to activation of the 3'-splice site of exon 6, without first joining exon 6 to exon 8. First, introduction of a nine nucleotide poly U tract upstream of the 3'-splice site of exon 6 results in the splicing of exon 5 to exon 6 with as little as 35 nucleotides of exon 6. Second, introduction of a consensus 5'-splice site in exon 6 led to splicing of exon 5 to exon 6. Thus, three distinct elements can act independently to activate the use of the 3'-splice site of exon 6: (1) the sequences contained within exon 8 when joined to exon 6, (2) a poly U tract in intron 5, and (3) a consensus 5'-splice site in exon 6. Using biochemical assays, we have determined that these sequence elements interact with distinct cellular factors for 3'-splice site utilization. Although HeLa cell nuclear extracts were able to splice all three types of pre-mRNAs mentioned above, a cytoplasmic S100 fraction supplemented with SR proteins was unable to efficiently splice exon 5 to exon 6 using precursors in which exon 6 was joined to exon 8. We also studied how these elements contribute to alternative splice site selection using precursors containing the mutually exclusive, alternatively spliced cassette comprised of exons 5 through 8. Introduction of the poly U tract upstream of exon 6, and changing the 5'-splice site of exon 6 to a consensus sequence, either alone or in combination, facilitated the use of exon 6 in vitro, such that exon 6 was spliced more efficiently to exon 8. 
These data show that intron sequences upstream of an exon can contribute to the use of the downstream 5'-splice site, and that sequences surrounding exon 6 can contribute to tissue-specific alternative splice site selection.
Splicing is a cellular mechanism, which dictates eukaryotic gene expression by removing the noncoding introns and ligating the coding exons in the form of a messenger RNA molecule. Alternative splicing (AS) adds a major level of complexity to this mechanism and thus to the regulation of gene expression. This widespread cellular phenomenon generates multiple messenger RNA isoforms from a single gene, by utilizing alternative splice sites and promoting different exon–intron inclusions and exclusions. AS greatly increases the coding potential of eukaryotic genomes and hence contributes to the diversity of eukaryotic proteomes. Mutations that lead to disruptions of either constitutive splicing or AS cause several diseases, among which are myotonic dystrophy and cystic fibrosis. Aberrant splicing is also well established in cancer states. Identification of rare novel mutations associated with splice-site recognition, and splicing regulation in general, could provide further insight into genetic mechanisms of rare diseases. Here, disease relevance of aberrant splicing is reviewed, and the new methodological approach of starting from disease phenotype, employing exome sequencing and identifying rare mutations affecting splicing regulation is described. Exome sequencing has emerged as a reliable method for finding sequence variations associated with various disease states. To date, genetic studies using exome sequencing to find disease-causing mutations have focused on the discovery of nonsynonymous single nucleotide polymorphisms that alter amino acids or introduce early stop codons, or on the use of exome sequencing as a means to genotype known single nucleotide polymorphisms. The involvement of splicing mutations in inherited diseases has received little attention and thus likely occurs more frequently than currently estimated. 
Studies of exome sequencing followed by molecular and bioinformatic analyses have great potential to reveal the high impact of splicing mutations underlying human disease.