1.  Aberrant 3′ splice sites in human disease genes: mutation pattern, nucleotide structure and comparison of computational tools that predict their utilization 
Nucleic Acids Research  2006;34(16):4630-4641.
The frequency distribution of mutation-induced aberrant 3′ splice sites (3′ss) in exons and introns is more complex than for 5′ splice sites, largely owing to sequence constraints upstream of intron/exon boundaries. As a result, prediction of their localization remains a challenging task. Here, nucleotide sequences of 218 previously reported aberrant 3′ss activated by disease-causing mutations in 131 human genes were compared with their authentic counterparts using currently available splice site prediction tools. Each tested algorithm distinguished authentic 3′ss from cryptic sites more effectively than from de novo sites. The best discrimination between aberrant and authentic 3′ss was achieved by the maximum entropy model. Almost one half of the aberrant 3′ss were activated by AG-creating mutations and ∼95% of the newly created AGs were selected in vivo. The overall nucleotide structure upstream of aberrant 3′ss was characterized by higher purine content than for authentic sites, particularly at position −3, which may be compensated by more stringent requirements for positive and negative nucleotide signatures centred around position −11. A newly developed online database of aberrant 3′ss will facilitate identification of splicing mutations in a gene or phenotype of interest and future optimization of splice site prediction tools.
PMCID: PMC1636351  PMID: 16963498
2.  Global control of aberrant splice-site activation by auxiliary splicing sequences: evidence for a gradient in exon and intron definition 
Nucleic Acids Research  2007;35(19):6399-6413.
Auxiliary splicing signals play a major role in the regulation of constitutive and alternative pre-mRNA splicing, but their relative importance in selection of mutation-induced cryptic or de novo splice sites is poorly understood. Here, we show that exonic sequences between authentic and aberrant splice sites that were activated by splice-site mutations in human disease genes have lower frequencies of splicing enhancers and higher frequencies of splicing silencers than average exons. Conversely, sequences between authentic and intronic aberrant splice sites have more enhancers and fewer silencers than average introns. Exons that were skipped as a result of splice-site mutations were smaller, had lower SF2/ASF motif scores, a decreased availability of decoy splice sites and a higher density of silencers than exons in which splice-site mutation activated cryptic splice sites. These four variables were the strongest predictors of the two aberrant splicing events in a logistic regression model. Elimination or weakening of predicted silencers in two reporters consistently promoted use of intron-proximal splice sites if these elements were maintained at their original positions, with their modular combinations producing expected modification of splicing. Together, these results show the existence of a gradient in exon and intron definition at the level of pre-mRNA splicing and provide a basis for the development of computational tools that predict aberrant splicing outcomes.
PMCID: PMC2095810  PMID: 17881373
3.  DBASS3 and DBASS5: databases of aberrant 3′- and 5′-splice sites 
Nucleic Acids Research  2010;39(Database issue):D86-D91.
DBASS3 and DBASS5 provide comprehensive repositories of new exon boundaries that were induced by pathogenic mutations in human disease genes. Aberrant 5′- and 3′-splice sites were activated either by mutations in the consensus sequences of natural exon–intron junctions (cryptic sites) or elsewhere (‘de novo’ sites). DBASS3 and DBASS5 currently contain approximately 900 records of cryptic and de novo 3′- and 5′-splice sites that were produced by over a thousand different mutations in approximately 360 genes. DBASS3 and DBASS5 data can be searched by disease phenotype, gene, mutation, location of aberrant splice sites in introns and exons and their distance from authentic counterparts, by bibliographic references and by the splice-site strength estimated with several prediction algorithms. The user can also retrieve reference sequences of both aberrant and authentic splice sites with the underlying mutation. These data will facilitate identification of introns or exons frequently involved in aberrant splicing, mutation analysis of human disease genes and study of germline or somatic mutations that impair RNA processing. Finally, this resource will be useful for fine-tuning splice-site prediction algorithms, better definition of auxiliary splicing signals and design of new reporter assays. DBASS3 and DBASS5 are freely available at
PMCID: PMC3013770  PMID: 20929868
4.  Biased exon/intron distribution of cryptic and de novo 3′ splice sites 
Nucleic Acids Research  2005;33(15):4882-4898.
We compiled sequences of previously published aberrant 3′ splice sites (3′ss) that were generated by mutations in human disease genes. Cryptic 3′ss, defined here as those resulting from a mutation of the 3′YAG consensus, were more frequent in exons than in introns. They clustered in a ∼20 nt region adjacent to authentic 3′ss, suggesting that their under-representation in introns is due to a depletion of AG dinucleotides in the polypyrimidine tract (PPT). In contrast, most aberrant 3′ss that were induced by mutations outside the 3′YAG consensus (designated ‘de novo’) were in introns. The activation of intronic de novo 3′ss was largely due to AG-creating mutations in the PPT. In contrast, exonic de novo 3′ss were more often induced by mutations improving the PPT, branchpoint sequence (BPS) or distant auxiliary signals, rather than by direct AG creation. The Shapiro–Senapathy matrix scores had a good prognostic value for cryptic, but not de novo 3′ss. Finally, AG-creating mutations in the PPT that produced aberrant 3′ss upstream of the predicted BPS in vivo shared a similar ‘BPS-new AG’ distance. Reduction of this distance and/or the strength of the new AG PPT in splicing reporter pre-mRNAs improved utilization of authentic 3′ss, suggesting that AG-creating mutations that are located closer to the BPS and are preceded by a weaker PPT may result in less severe splicing defects.
PMCID: PMC1197134  PMID: 16141195
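The notion of an "AG-creating mutation" used in the abstract above can be illustrated with a minimal Python sketch. It assumes a single-nucleotide substitution given as a 0-based position and an alternative base; the function name and the example sequences are hypothetical and are not taken from the paper.

```python
def creates_new_ag(seq: str, pos: int, alt: str) -> bool:
    """Return True if replacing seq[pos] with alt creates an AG dinucleotide
    that is absent at the same coordinates in the reference sequence.

    pos is a 0-based index (an assumption of this sketch).
    """
    seq = seq.upper()
    alt = alt.upper()
    mutated = seq[:pos] + alt + seq[pos + 1:]
    # A single substitution can only change the two dinucleotides overlapping pos.
    for start in (pos - 1, pos):
        if start < 0 or start + 2 > len(seq):
            continue  # dinucleotide would run off the sequence
        if mutated[start:start + 2] == "AG" and seq[start:start + 2] != "AG":
            return True
    return False


# Hypothetical polypyrimidine-tract fragment ending in the authentic AG.
ppt = "TTCTTACTTTCAG"
print(creates_new_ag(ppt, 6, "G"))  # C>G next to an A creates a new AG
print(creates_new_ag(ppt, 3, "C"))  # T>C creates no AG
```

The check says nothing about whether the new AG would be used in vivo; that depends on the surrounding signals discussed in the abstract.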
5.  Oriented Scanning Is the Leading Mechanism Underlying 5′ Splice Site Selection in Mammals 
PLoS Genetics  2006;2(9):e138.
Splice site selection is a key element of pre-mRNA splicing. Although it is known to involve specific recognition of short consensus sequences by the splicing machinery, the mechanisms by which 5′ splice sites are accurately identified remain controversial and incompletely resolved. The human F7 gene contains in its seventh intron (IVS7) a 37-bp VNTR minisatellite whose first element spans the exon7–IVS7 boundary. As a consequence, the IVS7 authentic donor splice site is followed by several cryptic splice sites identical in sequence, referred to as 5′ pseudo-sites, which normally remain silent. This region, therefore, provides a remarkable model to decipher the mechanism underlying 5′ splice site selection in mammals. We previously suggested a model for splice site selection that, in the presence of consecutive splice consensus sequences, would stimulate exclusively the selection of the most upstream 5′ splice site, rather than repressing the 3′ following pseudo-sites. In the present study, we provide experimental support for this hypothesis by using a mutational approach involving a panel of 50 mutant and wild-type F7 constructs expressed in various cell types. We demonstrate that the F7 IVS7 5′ pseudo-sites are functional, but do not compete with the authentic donor splice site. Moreover, we show that the selection of the 5′ splice site follows a scanning-type mechanism, precluding competition with other functional 5′ pseudo-sites available in the immediate sequence context downstream of the activated one. In addition, 5′ pseudo-sites with an increased complementarity to U1 snRNA up to 91% do not compete with the identified scanning mechanism. Altogether, these findings, which unveil a cell type–independent 5′−3′-oriented scanning process for accurate recognition of the authentic 5′ splice site, reconcile apparently contradictory observations by establishing a hierarchy of competitiveness among the determinants involved in 5′ splice site selection.
Typically, mammalian genes contain coding sequences (exons) separated by non-coding sequences (introns). Introns are removed during pre-mRNA splicing. The accurate recognition of introns during splicing is essential, as any abnormality in that process will generate abnormal mRNAs that can cause diseases. Understanding the mechanisms of accurate splice site selection is of prime interest to life scientists. Exon–intron borders (splice sites) are defined by short sequences that are poorly conserved. The strength of any splice sequence can be assessed by its degree of homology with a splice site consensus sequence. Within exons and introns, several sequences can match with this consensus as well as or better than the splice sites. Using a system in which a splice site sequence is repeated several times in the intron, the authors showed that linear 5′−3′ search is a leading mechanism underlying splice site selection. This scanning mechanism is cell type–independent, and only the most upstream splice site of all the series is selected, even if splice sites with a better match to the consensus are in the vicinity. These findings reconcile contradictory observations and establish a hierarchy among the determinants involved in splice site selection.
PMCID: PMC1557585  PMID: 16948532
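The contrast drawn in the summary above, between picking the most upstream candidate and picking the candidate that best matches the consensus, can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' method: strength is approximated as the fraction of positions matching the commonly cited 9-nt 5′ splice site consensus MAG|GTRAGT, and the 0.6 threshold, the function names and the example sequence are hypothetical.

```python
# IUPAC codes needed for the consensus: M = A or C, R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}
CONSENSUS_5SS = "MAGGTRAGT"  # positions -3..+6; the intron starts at the GT


def consensus_match(site: str, consensus: str = CONSENSUS_5SS) -> float:
    """Fraction of positions in `site` that agree with the consensus."""
    hits = sum(base in IUPAC[code] for base, code in zip(site.upper(), consensus))
    return hits / len(consensus)


def candidate_sites(seq: str, min_match: float):
    """Yield (offset, score) for each 9-nt window with GT at +1/+2
    whose consensus match is at least min_match, in 5'->3' order."""
    seq = seq.upper()
    n = len(CONSENSUS_5SS)
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        if window[3:5] != "GT":
            continue
        score = consensus_match(window)
        if score >= min_match:
            yield i, score


def select_by_scanning(seq: str, min_match: float = 0.6):
    """Scanning model: the most upstream acceptable candidate wins."""
    return next(candidate_sites(seq, min_match), None)


def select_by_best_match(seq: str, min_match: float = 0.6):
    """Competition model: the best consensus match wins (ties go upstream)."""
    return max(candidate_sites(seq, min_match), key=lambda s: s[1], default=None)


# Hypothetical sequence: a weaker upstream site followed by a perfect match.
seq = "TTGGTGCGT" + "CCCCC" + "CAGGTAAGT"
print(select_by_scanning(seq))    # offset of the upstream site
print(select_by_best_match(seq))  # offset of the downstream perfect match
```

The two selectors disagree on this input, which is the kind of situation the abstract describes; real site strength depends on more than a per-position match count.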
6.  Computational analysis of splicing errors and mutations in human transcripts 
BMC Genomics  2008;9:13.
Most retained introns found in human cDNAs generated by high-throughput sequencing projects seem to result from underspliced transcripts, and thus they capture intermediate steps of pre-mRNA splicing. On the other hand, mutations in splice sites cause exon skipping of the respective exon or activation of pre-existing cryptic sites. Both types of events reflect properties of the splicing mechanism.
The retained introns were significantly shorter than constitutive ones, and skipped exons were shorter than exons with cryptic sites. Both donor and acceptor splice sites of retained introns were weaker than splice sites of constitutive introns. The authentic acceptor sites affected by mutations were significantly weaker in exons with activated cryptic sites than in skipped exons. The distance from a mutated splice site to the nearest equivalent site was significantly shorter in cases of activated cryptic sites compared to exon skipping events. The prevalence of retained introns within genes monotonically increased in the 5'-to-3' direction (more retained introns close to the 3'-end), consistent with the model of co-transcriptional splicing. The density of exonic splicing enhancers was higher, and the density of exonic splicing silencers lower in retained introns compared to constitutive ones and in exons with cryptic sites compared to skipped exons.
Thus the analysis of retained introns in human cDNA, exons skipped due to mutations in splice sites and exons with cryptic sites produced results consistent with the intron definition mechanism of splicing of short introns, co-transcriptional splicing, dependence of splicing efficiency on the splice site strength and the density of candidate exonic splicing enhancers and silencers. These results are consistent with other, recently published analyses.
PMCID: PMC2234086  PMID: 18194514
7.  AU-rich intronic elements affect pre-mRNA 5' splice site selection in Drosophila melanogaster. 
Molecular and Cellular Biology  1993;13(12):7689-7697.
cis-spliced nuclear pre-mRNA introns found in a variety of organisms, including Tetrahymena thermophila, Drosophila melanogaster, Caenorhabditis elegans, and plants, are significantly richer in adenosine and uridine residues than their flanking exons are. The functional significance of this intronic AU richness, however, has been demonstrated only in plant nuclei. In these nuclei, 5' and 3' splice sites are selected in part by their positions relative to AU-rich elements spread throughout the length of an intron. Because of this position-dependent selection scheme, a 5' splice site at the normal (+1) exon-intron boundary having only three contiguous consensus nucleotides can compete effectively with an enhanced exonic site (-57E) having nine consensus nucleotides and outcompete an enhanced site (+106E) embedded within the AU-rich intron. To determine whether transitions from AU-poor exonic sequences to AU-rich intronic sequences influence 5' splice site selection in other organisms, alleles of the pea rbcS3A1 intron were expressed in Drosophila Schneider 2 cells, and their splicing patterns were compared with those in tobacco nuclei. We demonstrate that this heterologous transcript can be accurately spliced in transfected Drosophila nuclei and that a +1 G-to-A knockout mutation at the normal splice site activates the same three cryptic 5' splice sites as in tobacco. Enhancement of the exonic (-57) and intronic (+106) sites to consensus splice sites indicates that potential splice sites located in the upstream exon or at the 5' exon-intron boundary are preferred in Drosophila cells over those embedded within AU-rich intronic sequences. In contrast to tobacco, in which the activities of two competing 5' splice sites upstream of the AU-rich intron are modulated by their proximity to the AU transition point, D. melanogaster utilizes the upstream site which has a higher proportion of consensus nucleotides. 
The enhanced version of the cryptic intronic site is efficiently selected in D. melanogaster when the normal +1 site is weakened or discrete AU-rich elements upstream of the +106E site are disrupted. Selection of this internal site in tobacco requires more drastic disruption of these motifs. We conclude that 5' splice site selection in Drosophila nuclei is influenced by the intrinsic strengths of competing sites and by the presence of AU-rich intronic elements but to a different extent than in tobacco.
PMCID: PMC364840  PMID: 8246985
8.  Ab initio prediction of mutation-induced cryptic splice-site activation and exon skipping 
Mutations that affect splicing of precursor messenger RNAs play a major role in the development of hereditary diseases. Most splicing mutations have been found to eliminate GT or AG dinucleotides that define the 5′ and 3′ ends of introns, leading to exon skipping or cryptic splice-site activation. Although accurate description of the mis-spliced transcripts is critical for predicting phenotypic consequences of these alterations, their exact nature in affected individuals often cannot be determined experimentally. Using a comprehensive collection of exons that sustained cryptic splice-site activation or were skipped as a result of splice-site mutations, we have developed a multivariate logistic discrimination procedure that distinguishes the two aberrant splicing outcomes from DNA sequences. The new algorithm was validated using an independent sample of exons and implemented as a free online utility termed CRYP-SKIP ( The web application takes one or more mutated alleles, each consisting of one exon and flanking intronic sequences, and provides a list of important predictor variables and their values, the overall probability of activating cryptic splice sites vs exon skipping, and the location and intrinsic strength of predicted cryptic splice sites in the input sequence. These results will facilitate phenotypic prediction of splicing mutations and provide further insights into splicing enhancer and silencer elements and their relative importance for splice-site selection in vivo.
PMCID: PMC2947103  PMID: 19142208
mutation; gene; splicing; cryptic splice site; exon skipping; RNA
9.  Genome-Wide Association between Branch Point Properties and Alternative Splicing 
PLoS Computational Biology  2010;6(11):e1001016.
The branch point (BP) is one of the three obligatory signals required for pre-mRNA splicing. In mammals, the degeneracy of the motif combined with the lack of a large set of experimentally verified BPs complicates the task of modeling it in silico, and therefore of predicting the location of natural BPs. Consequently, BPs have been disregarded in a considerable fraction of the genome-wide studies on the regulation of splicing in mammals. We present a new computational approach for mammalian BP prediction. Using sequence conservation and positional bias we obtained a set of motifs with good agreement with U2 snRNA binding stability. Using a Support Vector Machine algorithm, we created a model complemented with polypyrimidine tract features, which considerably improves the prediction accuracy over previously published methods. Applying our algorithm to human introns, we show that BP position is highly dependent on the presence of AG dinucleotides in the 3′ end of introns, with distance to the 3′ splice site and BP strength strongly correlating with alternative splicing. Furthermore, experimental BP mapping for five exons preceded by long AG-dinucleotide exclusion zones revealed that, for a given intron, more than one BP can be chosen throughout the course of splicing. Finally, the comparison between exons of different evolutionary ages and pseudo exons suggests a key role of the BP in the pathway of exon creation in human. Our computational and experimental analyses suggest that BP recognition is more flexible than previously assumed, and it appears highly dependent on the presence of downstream polypyrimidine tracts. The reported association between BP features and the splicing outcome suggests that this, so far disregarded but yet crucial, element buries information that can complement current acceptor site models.
Author Summary
From transcription to translation, the events underlying protein production from DNA sequence are paramount to all aspects of cellular function. Pre-mRNAs in eukaryotes undergo several processing steps prior to their export to the cytoplasm. Among these, splicing – the process of intron removal and exon ligation – has been shown to play a central role in the regulation of gene expression. It has been estimated that more than half of the disease-causing mutations in humans do so by interfering with splicing. The difficulty in describing these disease mechanisms often lies in the low accuracy of the methods for prediction of functional splicing signals in the pre-mRNA. This is especially the case of the branch point, mainly due to its high sequence variability. We have developed a methodology for mammalian branch point prediction based on a machine-learning algorithm, which shows improved accuracy over previous published methods. Moreover, using a combination of experimental and bioinformatics approaches, we uncovered important positional properties of the branch point and shed new light on how some of its features may contribute to the final splicing outcome. These findings might prove useful for a better understanding of how splicing-associated mutations can lead to disease.
PMCID: PMC2991248  PMID: 21124863
10.  Comprehensive splicing functional analysis of DNA variants of the BRCA2 gene by hybrid minigenes 
The underlying pathogenic mechanism of a large fraction of DNA variants of disease-causing genes is the disruption of the splicing process. We aimed to investigate the effect on splicing of the BRCA2 variants c.8488-1G > A (exon 20) and c.9026_9030del (exon 23), as well as 41 BRCA2 variants reported in the Breast Cancer Information Core (BIC) mutation database.
DNA variants were analyzed with the splicing prediction programs NNSPLICE and Human Splicing Finder. Functional analyses of candidate variants were performed by lymphocyte RT-PCR and/or hybrid minigene assays. Forty-one BIC variants of exons 19, 20, 23 and 24 were bioinformatically selected and generated by PCR-mutagenesis of the wild type minigenes.
Lymphocyte RT-PCR of c.8488-1G > A showed intron 19 retention and a 12-nucleotide deletion in exon 20, whereas c.9026_9030del did not show any splicing anomaly. Minigene analysis of c.8488-1G > A displayed the aforementioned aberrant isoforms but also exon 20 skipping. We further evaluated the splicing outcomes of 41 variants of four BRCA2 exons by minigene analysis. Eighteen variants presented splicing aberrations. Most variants (78.9%) disrupted the natural splice sites, whereas four altered putative enhancers/silencers and had a weak effect. Fluorescent RT-PCR of minigenes accurately detected 14 RNA isoforms generated by cryptic site usage, exon skipping and intron retention events. Fourteen variants showed total splicing disruptions and were predicted to truncate or eliminate essential domains of BRCA2.
A relevant proportion of BRCA2 variants are correlated with splicing disruptions, indicating that RNA analysis is a valuable tool to assess the pathogenicity of a particular DNA change. The minigene system is a straightforward and robust approach to detect variants with an impact on splicing and contributes to a better knowledge of this gene expression step.
PMCID: PMC3446350  PMID: 22632462
11.  Functional Characterization of the spf/ash Splicing Variation in OTC Deficiency of Mice and Man 
PLoS ONE  2015;10(4):e0122966.
The spf/ash mouse model of ornithine transcarbamylase (OTC) deficiency, a severe urea cycle disorder, is caused by a mutation (c.386G>A; p.R129H) in the last nucleotide of exon 4 of the Otc gene, affecting the 5’ splice site and resulting in partial use of a cryptic splice site 48 bp into the adjacent intron. The equivalent nucleotide change and predicted amino acid change is found in OTC deficient patients. Here we have used liver tissue and minigene assays to dissect the transcriptional profile resulting from the “spf/ash” mutation in mice and man. For the mutant mouse, we confirmed liver transcripts corresponding to partial intron 4 retention by the use of the c.386+48 cryptic site and to normally spliced transcripts, with exon 4 always containing the c.386G>A (p.R129H) variant. In contrast, the OTC patient exhibited exon 4 skipping or c.386G>A (p.R129H)-variant exon 4 retention by using the natural or a cryptic splice site at nucleotide position c.386+4. The corresponding OTC tissue enzyme activities were between 3-6% of normal control in mouse and human liver. The use of the cryptic splice sites was reproduced in minigenes carrying murine or human mutant sequences. Some normally spliced transcripts could be detected in minigenes in both cases. Antisense oligonucleotides designed to block the murine cryptic +48 site were used in minigenes in an attempt to redirect splicing to the natural site. The results highlight the relevance of in depth investigations of the molecular mechanisms of splicing mutations and potential therapeutic approaches. Notably, they emphasize the fact that findings in animal models may not be applicable for human patients due to the different genomic context of the mutations.
PMCID: PMC4390381  PMID: 25853564
12.  Mutations in the Caenorhabditis elegans U2AF Large Subunit UAF-1 Alter the Choice of a 3′ Splice Site In Vivo 
PLoS Genetics  2009;5(11):e1000708.
The removal of introns from eukaryotic RNA transcripts requires the activities of five multi-component ribonucleoprotein complexes and numerous associated proteins. The lack of mutations affecting splicing factors essential for animal survival has limited the study of the in vivo regulation of splicing. From a screen for suppressors of the Caenorhabditis elegans unc-93(e1500) rubberband Unc phenotype, we identified mutations in genes that encode the C. elegans orthologs of two splicing factors, the U2AF large subunit (UAF-1) and SF1/BBP (SFA-1). The uaf-1(n4588) mutation resulted in temperature-sensitive lethality and caused the unc-93 RNA transcript to be spliced using a cryptic 3′ splice site generated by the unc-93(e1500) missense mutation. The sfa-1(n4562) mutation did not cause the utilization of this cryptic 3′ splice site. We isolated four uaf-1(n4588) intragenic suppressors that restored the viability of uaf-1 mutants at 25°C. These suppressors differentially affected the recognition of the cryptic 3′ splice site and implicated a small region of UAF-1 between the U2AF small subunit-interaction domain and the first RNA recognition motif in affecting the choice of 3′ splice site. We constructed a reporter for unc-93 splicing and using site-directed mutagenesis found that the position of the cryptic splice site affects its recognition. We also identified nucleotides of the endogenous 3′ splice site important for recognition by wild-type UAF-1. Our genetic and molecular analyses suggested that the phenotypic suppression of the unc-93(e1500) Unc phenotype by uaf-1(n4588) and sfa-1(n4562) was likely caused by altered splicing of an unknown gene. Our observations provide in vivo evidence that UAF-1 can act in regulating 3′ splice-site choice and establish a system that can be used to investigate the in vivo regulation of RNA splicing in C. elegans.
Author Summary
Eukaryotic genes contain intervening intronic sequences that must be removed from pre-mRNA transcripts by RNA splicing to generate functional messenger RNAs. While studying genes that encode and control a presumptive muscle potassium channel complex in the nematode Caenorhabditis elegans, we found that mutations in two splicing factors, the U2AF large subunit and SF1/BBP suppress the rubberband Unc phenotype caused by a rare missense mutation in the gene unc-93. Mutations affecting the U2AF large subunit caused the recognition of a cryptic 3′ splice site generated by the unc-93 mutation, providing in vivo evidence that the U2AF large subunit can affect splice-site selection. By contrast, an SF1/BBP mutation that suppressed the rubberband Unc phenotype did not cause splicing using this cryptic 3′ splice site. Our genetic studies identified a region of the U2AF large subunit important for its effect on 3′ splice-site choice. Our mutagenesis analysis of in vivo transgene splicing identified a positional effect on weak 3′ splice site selection and nucleotides of the endogenous 3′ splice site important for recognition. The system we have defined should facilitate future in vivo analyses of pre–mRNA splicing.
PMCID: PMC2762039  PMID: 19893607
13.  Genomic features defining exonic variants that modulate splicing 
Genome Biology  2010;11(2):R20.
A comparative analysis of SNPs and their exonic and intronic environments identifies the features predictive of splice affecting variants.
Single point mutations at both synonymous and non-synonymous positions within exons can have severe effects on gene function through disruption of splicing. Predicting these mutations in silico purely from the genomic sequence is difficult due to an incomplete understanding of the multiple factors that may be responsible. In addition, little is known about which computational prediction approaches, such as those involving exonic splicing enhancers and exonic splicing silencers, are most informative.
We assessed the features of single-nucleotide genomic variants verified to cause exon skipping and compared them to a large set of coding SNPs common in the human population, which are likely to have no effect on splicing. Our findings implicate a number of features important for their ability to discriminate splice-affecting variants, including the naturally occurring density of exonic splicing enhancers and exonic splicing silencers of the exon and intronic environment, extensive changes in the number of predicted exonic splicing enhancers and exonic splicing silencers, proximity to the splice junctions and evolutionary constraint of the region surrounding the variant. By extending this approach to additional datasets, we also identified relevant features of variants that cause increased exon inclusion and ectopic splice site activation.
We identified a number of features that have statistically significant representation among exonic variants that modulate splicing. These analyses highlight putative mechanisms responsible for splicing outcome and emphasize the role of features important for exon definition. We developed a web-tool, Skippy, to score coding variants for these relevant splice-modulating features.
PMCID: PMC2872880  PMID: 20158892
14.  Whole Exome Sequencing Reveals Novel PHEX Splice Site Mutations in Patients with Hypophosphatemic Rickets 
PLoS ONE  2015;10(6):e0130729.
Hypophosphatemic rickets (HR) is a heterogeneous genetic phosphate wasting disorder. The disease is most commonly caused by mutations in the PHEX gene located on the X-chromosome or by mutations in CLCN5, DMP1, ENPP1, FGF23, and SLC34A3. The aims of this study were to perform molecular diagnostics for four patients with HR of Indian origin (two independent families) and to describe their clinical features.
We performed whole exome sequencing (WES) for the affected mother of two boys who also displayed the typical features of HR, including bone malformations and phosphate wasting. B-lymphoblast cell lines were established by EBV transformation and subsequently analysed by RT-PCR to investigate an uncommon splice site variant found by WES. An in silico analysis was done to obtain accurate nucleotide frequency occurrences of consensus splice positions other than the canonical sites of all human exons. Additionally, we applied direct Sanger sequencing for all exons and exon/intron boundaries of the PHEX gene for an affected girl from an independent second Indian family.
WES revealed a novel PHEX splice acceptor mutation in intron 9 (c.1080-3C>A) in a family with 3 affected individuals with HR. The effect on splicing of this mutation was further investigated by RT-PCR using RNA obtained from a patient’s EBV-transformed lymphoblast cell line. RT-PCR revealed an aberrant splice transcript skipping exons 10-14 which was not observed in control samples, confirming the diagnosis of X-linked dominant hypophosphatemia (XLH). The in silico analysis of all human splice sites adjacent to all 327,293 exons across 81,814 transcripts among 20,345 human genes revealed that cytosine is the most frequent nucleobase at the minus 3 splice acceptor position (64.3%), followed by thymine (28.7%), adenine (6.3%), and guanine (0.8%). We generated frequency tables and pictograms for the extended donor and acceptor splice consensus regions by analyzing all human exons. Direct Sanger sequencing of all PHEX exons in a sporadic case with HR from the Indian subcontinent revealed an additional novel PHEX mutation (c.1211_1215delACAAAinsTTTACAT, p.Asp404Valfs*5, de novo) located in exon 11.
Mutation analyses revealed two novel mutations and helped to confirm the clinical diagnoses of XLH in two families from India. WES helped to analyze all genes implicated in the underlying disease complex. Mutations at splice positions other than the canonical key sites need further functional investigation to support the assertion of pathogenicity.
PMCID: PMC4479593  PMID: 26107949
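The per-position nucleotide frequency tabulation described in the abstract above can be sketched in Python. This is a minimal sketch, not the authors' pipeline: it assumes each input string is the 3′ end of an intron ending with the last intronic base (so position −1 is the final character and −3 is the base just upstream of the AG), and the function name and toy sequences are hypothetical.

```python
from collections import Counter


def acceptor_position_frequencies(acceptor_seqs, position=-3):
    """Percentage of A, C, G and T at `position` relative to the exon start.

    Each sequence is assumed to end with the last intronic base, so
    position -1 is seq[-1] and position -3 is seq[-3]. Sequences that are
    too short or carry a non-ACGT base at that position are skipped.
    """
    counts = Counter(
        seq[position].upper()
        for seq in acceptor_seqs
        if len(seq) >= -position and seq[position].upper() in "ACGT"
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


# Toy intron ends (hypothetical, not real data).
toy = ["TTTCAG", "CTTCAG", "TTTTAG", "TCTAAG"]
print(acceptor_position_frequencies(toy))
```

Applied to a genome-wide set of annotated intron ends, the same counting would give a table comparable in form to the one reported in the abstract.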
15.  A class of human exons with predicted distant branch points revealed by analysis of AG dinucleotide exclusion zones 
Genome Biology  2006;7(1):R1.
Exons with predicted branch points were identified from a large dataset of human exons and the importance of these branch points for splicing was verified.
The three consensus elements at the 3' end of human introns - the branch point sequence, the polypyrimidine tract, and the 3' splice site AG dinucleotide - are usually closely spaced within the final 40 nucleotides of the intron. However, the branch point sequence and polypyrimidine tract of a few known alternatively spliced exons lie up to 400 nucleotides upstream of the 3' splice site. The extended regions between the distant branch points (dBPs) and their 3' splice site are marked by the absence of other AG dinucleotides. In many cases alternative splicing regulatory elements are located within this region.
We have applied a simple algorithm, based on AG dinucleotide exclusion zones (AGEZ), to a large data set of verified human exons. We found a substantial number of exons with large AGEZs, which represent candidate dBP exons. We verified the importance of the predicted dBPs for splicing of some of these exons. This group of exons exhibits a higher than average prevalence of observed alternative splicing, and many of the exons are in genes with some human disease association.
The group of identified probable dBP exons are interesting first because they are likely to be alternatively spliced. Second, they are expected to be vulnerable to mutations within the entire extended AGEZ. Disruption of splicing of such exons, for example by mutations that lead to insertion of a new AG dinucleotide between the dBP and 3' splice site, could be readily understood even though the causative mutation might be remote from the conventional locations of splice site sequences.
PMCID: PMC1431707  PMID: 16507133
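The AG exclusion zone idea in the abstract above can be sketched as follows. This is a simplified reading, not the authors' algorithm: it measures only the AG-free stretch immediately upstream of the 3' splice site AG and ignores any further rules the published method may apply. The function names and the `min_length` threshold are hypothetical.

```python
def agez_length(intron):
    """Length of the AG-free stretch directly upstream of the 3' splice site.

    Assumption: `intron` is the intron sequence (DNA or RNA letters) and
    its last two bases are the 3' splice site AG.
    """
    seq = intron.upper().replace("U", "T")
    if not seq.endswith("AG"):
        raise ValueError("intron must end with the 3' splice site AG")
    upstream = seq[:-2]
    # Index of the closest upstream AG, or -1 if there is none.
    last_ag = upstream.rfind("AG")
    if last_ag == -1:
        return len(upstream)
    return len(upstream) - (last_ag + 2)


def is_candidate_dbp_exon(intron, min_length=100):
    """Flag introns whose AG-free zone is at least `min_length` nt long.

    `min_length` is an illustrative cutoff, not a value from the paper.
    """
    return agez_length(intron) >= min_length


# Toy intron: the nearest upstream AG is followed by an 11 nt AG-free stretch.
toy_length = agez_length("CCAGTTTTCCCTTTCAG")
```

A large value suggests that the branch point may lie far upstream, which is why any new AG created inside this zone could disrupt splicing.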
16.  Large introns in relation to alternative splicing and gene evolution: a case study of Drosophila bruno-3 
BMC Genetics  2009;10:67.
Alternative splicing (AS) of maturing mRNA can generate structurally and functionally distinct transcripts from the same gene. Recent bioinformatic analyses of available genome databases inferred a positive correlation between intron length and AS. To study the interplay between intron length and AS empirically and in more detail, we analyzed the diversity of alternatively spliced transcripts (ASTs) in the Drosophila RNA-binding Bruno-3 (Bru-3) gene. This gene was known to encode thirteen exons separated by introns of diverse sizes, ranging from 71 to 41,973 nucleotides in D. melanogaster. Although Bru-3's structure is expected to be conducive to AS, only two ASTs of this gene were previously described.
Cloning of RT-PCR products of the entire ORF from four species representing three diverged Drosophila lineages provided an evolutionary perspective, high sensitivity, and long-range contiguity of splice choices currently unattainable by high-throughput methods. Consequently, we identified three new exons, a new exon fragment and thirty-three previously unknown ASTs of Bru-3. All exon-skipping events in the gene were mapped to the exons surrounded by introns of at least 800 nucleotides, whereas exons split by introns of less than 250 nucleotides were always spliced contiguously in mRNA. Cases of exon loss and creation during Bru-3 evolution in Drosophila were also localized within large introns. Notably, we identified a true de novo exon gain: exon 8 was created along the lineage of the obscura group from intronic sequence between cryptic splice sites conserved among all Drosophila species surveyed. Exon 8 was included in mature mRNA by the species representing all the major branches of the obscura group. To our knowledge, the origin of exon 8 is the first documented case of exonization of intronic sequence outside vertebrates.
We found that large introns can promote AS via exon-skipping and exon turnover during evolution likely due to frequent errors in their removal from maturing mRNA. Large introns could be a reservoir of genetic diversity, because they have a greater number of mutable sites than short introns. Taken together, gene structure can constrain and/or promote gene evolution.
PMCID: PMC2767349  PMID: 19840385
17.  Factors affecting authentic 5' splice site selection in plant nuclei. 
Molecular and Cellular Biology  1993;13(3):1323-1331.
To define elements critical for 5' splice selection in dicot plant nuclei, wild-type and mutant transcripts containing the first intron of the pea rbcS3A gene were expressed in vivo by using an autonomously replicating plant expression vector. Mutations within the normal 5' splice site (+1) of this intron demonstrate that 5' splice sites at the normal exon-intron boundary having only limited agreement with a 5' splice site consensus sequence can be spliced quite effectively in dicot nuclei. Inactivation of the normal 5' splice site occurs only by point mutations of the G at position +1 of the intron (+1G) or +2U or by multiple mutations at other positions and results in the activation of three cryptic 5' splice sites in the adjacent exon and intron. cis competition of cryptic sites having consensus 5' splice site sequences with the normal 5' splice site demonstrates that cryptic splice sites in the exon, but not the intron, can compete to some extent with the normal site. Replacement of the sequences between the cryptic and normal 5' splice sites with heterologous exon or intron sequences demonstrates that the 5' boundary of this plant intron is defined by its position relative to the AU transition point between exon and intron. These results suggest that potential 5' splice sites upstream of the AU transition point are accessible for recognition by the plant pre-mRNA splicing machinery and that those downstream in the AU-rich intron are masked from recognition.
PMCID: PMC359441  PMID: 8441378
18.  Normal and abnormal mechanisms of gene splicing and relevance to inherited skin diseases 
The process of excising introns from pre-mRNA complexes is directed by specific genomic DNA sequences at intron-exon borders known as splice sites. These regions contain well-conserved motifs which allow the splicing process to proceed in a regulated and structured manner. However, as well as conventional splicing, several genes have the inherent capacity to undergo alternative splicing, thus allowing synthesis of multiple gene transcripts, perhaps with different functional properties. Within the human genome, therefore, through alternative splicing, it is possible to generate over 100,000 physiological gene products from the 35,000 or so known genes. Abnormalities in normal or alternative splicing, however, account for about 15% of all inherited single gene disorders, including many with a skin phenotype. These splicing abnormalities may arise through inherited mutations in constitutive splice sites or other critical intronic or exonic regions. This review article examines the process of normal intron-exon splicing, as well as what is known about alternative splicing of human genes. The review then addresses pathological disruption of normal intron-exon splicing that leads to inherited skin diseases, either resulting from mutations in sequences that have a direct influence on splicing or that generate cryptic splice sites. Examples of aberrant splicing, especially for the COL7A1 gene in patients with dystrophic epidermolysis bullosa, are discussed and illustrated. The review also examines a number of recently introduced computational tools that can be used to predict whether genomic DNA sequence changes may affect splice site selection and how robust the influence of such mutations might be on splicing.
PMCID: PMC1351063  PMID: 16054339
Inherited skin disease; RNA; Intron; Exon; Gene mutation; Splice site; Cryptic splicing
19.  Alu Exonization Events Reveal Features Required for Precise Recognition of Exons by the Splicing Machinery 
PLoS Computational Biology  2009;5(3):e1000300.
Despite decades of research, the question of how the mRNA splicing machinery precisely identifies short exonic islands within the vast intronic oceans remains to a large extent obscure. In this study, we analyzed Alu exonization events, aiming to understand the requirements for correct selection of exons. Comparison of exonizing Alus to their non-exonizing counterparts is informative because Alus in these two groups have retained high sequence similarity but are perceived differently by the splicing machinery. We identified and characterized numerous features used by the splicing machinery to discriminate between Alu exons and their non-exonizing counterparts. Of these, the most novel is secondary structure: Alu exons in general and their 5′ splice sites (5′ss) in particular are characterized by decreased stability of local secondary structures with respect to their non-exonizing counterparts. We detected numerous further differences between Alu exons and their non-exonizing counterparts, among others in terms of exon–intron architecture and strength of splicing signals, enhancers, and silencers. Support vector machine analysis revealed that these features allow a high level of discrimination (AUC = 0.91) between exonizing and non-exonizing Alus. Moreover, the computationally derived probabilities of exonization significantly correlated with the biological inclusion level of the Alu exons, and the model could also be extended to general datasets of constitutive and alternative exons. This indicates that the features detected and explored in this study provide the basis not only for precise exon selection but also for the fine-tuned regulation thereof, manifested in cases of alternative splicing.
Author Summary
A typical human gene consists of 9 exons around 150 nucleotides in length, separated by introns that are ∼3,000 nucleotides long. The challenge of the splicing machinery is to precisely identify and ligate the exons, while removing the introns. We aimed to understand how the splicing machinery meets this momentous challenge, based on Alu exonization events. Alus are transposable elements, of which approximately one million copies exist in the human genome, a large portion of which lie within introns. Throughout evolution, some intronic Alus accumulated mutations and became recognized by the splicing machinery as exons, a process termed exonization. Such Alus remain highly similar to their non-exonizing counterparts but are perceived as different by the splicing machinery. By comparing exonizing Alus to their non-exonizing counterparts, we were able to identify numerous features in which they differ and which presumably lead to the recognition only of the former by the splicing machinery. Our findings reveal insights regarding the role of local RNA secondary structures, exon–intron architecture constraints, and splicing regulatory signals. We integrated these features in a computational model, which was able to successfully mimic the function of the splicing machinery and discriminate between true Alu exons and their intronic counterparts, highlighting the functional importance of these features.
PMCID: PMC2639721  PMID: 19266014
20.  The Emergence of Alternative 3′ and 5′ Splice Site Exons from Constitutive Exons 
PLoS Computational Biology  2007;3(5):e95.
Alternative 3′ and 5′ splice site (ss) events constitute a significant part of all alternative splicing events. These events were also found to be related to several aberrant splicing diseases. However, only a few of the characteristics that distinguish these events from alternative cassette exons are currently known. In this study, we compared the characteristics of constitutive exons, alternative cassette exons, and alternative 3′ss and 5′ss exons. The results revealed that alternative 3′ss and 5′ss exons are an intermediate state between constitutive and alternative cassette exons, where the constitutive side resembles constitutive exons, and the alternative side resembles alternative cassette exons. The results also show that alternative 3′ss and 5′ss exons exhibit low levels of symmetry (frame-preserving), similar to constitutive exons, whereas the sequence between the two alternative splice sites shows high symmetry levels, similar to alternative cassette exons. In addition, flanking intronic conservation analysis revealed that exons whose alternative splice sites are at least nine nucleotides apart show a high conservation level, indicating intronic participation in the regulation of their splicing, whereas exons whose alternative splice sites are fewer than nine nucleotides apart show a low conservation level. Further examination of these exons, spanning seven vertebrate species, suggests an evolutionary model in which the alternative state is a derivative of an ancestral constitutive exon, where a mutation inside the exon or along the flanking intron resulted in the creation of a new splice site that competes with the original one, leading to alternative splice site selection. This model was validated experimentally on four exons, showing that they indeed originated from constitutive exons that acquired a new competing splice site during evolution.
Author Summary
Alternative splicing is the mechanism that is responsible for the creation of multiple mRNA products from a single gene. It is considered a key player in genomic complexity achievement. Alternative 3′ and 5′ splicing events in which part of the exon is alternatively included or excluded in the mRNA constitute a significant part of all alternative splicing events, and yet little is known regarding their regulation mechanism and the evolutionary background that led to their creation. We show that alternative 3′ and 5′ splice site exons resemble constitutive exons. However, their alternative sequence resembles alternative cassette exons. Comparative genomics spanning seven vertebrate species suggests an evolutionary model in which the alternative state is a derivative of an ancestral constitutive exon, where a mutation inside the exon or along the flanking intron resulted in the creation of a new splice site that competes with the original one, leading to alternative splice site selection. This model was validated experimentally, showing that during evolution mutations shifted constitutive exons to undergo alternative 3′ and 5′ splicing.
PMCID: PMC1876488  PMID: 17530917
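The symmetry (frame-preserving) measure mentioned in the abstract above can be illustrated with a short sketch. It assumes that a region is symmetric when its length is a multiple of three, so that including or excluding it leaves the reading frame unchanged. The function names and the example lengths are hypothetical.

```python
def is_symmetric(length_nt):
    """A region is frame-preserving if its length is divisible by 3."""
    return length_nt % 3 == 0


def symmetric_fraction(lengths_nt):
    """Fraction of regions in `lengths_nt` that are frame-preserving."""
    if not lengths_nt:
        raise ValueError("at least one length is required")
    return sum(is_symmetric(n) for n in lengths_nt) / len(lengths_nt)


# Hypothetical distances (nt) between pairs of alternative splice sites.
alternative_region_lengths = [3, 4, 6, 9]
fraction = symmetric_fraction(alternative_region_lengths)
```

Comparing this fraction between whole exons and the sequence between two alternative splice sites is one simple way to express the contrast the abstract describes.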
21.  BAP1 Missense Mutation c.2054 A>T (p.E685V) Completely Disrupts Normal Splicing through Creation of a Novel 5’ Splice Site in a Human Mesothelioma Cell Line 
PLoS ONE  2015;10(4):e0119224.
BAP1 is a tumor suppressor gene that is lost or deleted in diverse cancers, including uveal melanoma, malignant pleural mesothelioma (MPM), clear cell renal carcinoma, and cholangiocarcinoma. Recently, BAP1 germline mutations have been reported in families with combinations of these same cancers. A particular challenge for mutation screening is the classification of non-truncating BAP1 sequence variants because it is not known whether these subtle changes can affect the protein function sufficiently to predispose to cancer development. Here we report mRNA splicing analysis on a homozygous substitution mutation, BAP1 c.2054 A>T (p.Glu685Val), identified in an MPM cell line derived from a mesothelioma patient. The mutation occurred at the 3rd nucleotide from the 3’ end of exon 16. RT-PCR, cloning and subsequent sequencing revealed several aberrant splicing products not observed in the controls: 1) a 4 bp deletion at the end of exon 16 in all clones derived from the major splicing product. The BAP1 c.2054 A>T mutation introduced a new 5’ splice site (GU), which resulted in the deletion of 4 base pairs and presumably protein truncation; 2) a variety of alternative splicing products that led to retention of different introns: introns 14–16; introns 15–16; intron 14 and intron 16; 3) partial intron 14 and 15 retentions caused by activation of alternative 3’ splice acceptor sites (AG) in the introns. Taken together, we were unable to detect any correctly spliced mRNA transcripts in this cell line. These results suggest that aberrant splicing caused by this mutation is quite efficient as it completely abolishes normal splicing through creation of a novel 5’ splice site and activation of cryptic splice sites. These data support the conclusion that BAP1 c.2054 A>T (p.E685V) variant is a pathogenic mutation and contributes to MPM through disruption of normal splicing.
PMCID: PMC4382119  PMID: 25830670
22.  Re-splicing of mature mRNA in cancer cells promotes activation of distant weak alternative splice sites 
Nucleic Acids Research  2012;40(16):7896-7906.
Transcripts of the human tumor susceptibility gene 101 (TSG101) are aberrantly spliced in many cancers. A major aberrant splicing event on the TSG101 pre-mRNA involves joining of distant alternative 5′ and 3′ splice sites within exon 2 and exon 9, respectively, resulting in the extensive elimination of the mRNA. The estimated strengths of the alternative splice sites are much lower than those of authentic splice sites. We observed that the equivalent aberrant mRNA could be generated from an intron-less TSG101 gene expressed ectopically in breast cancer cells. Remarkably, we identified a pathway-specific endogenous lariat RNA consisting solely of exonic sequences, predicted to be generated by a re-splicing between exon 2 and exon 9 on the spliced mRNA. Our results provide evidence for a two-step splicing pathway in which the initial constitutive splicing removes all 14 authentic splice sites, thereby bringing the weak alternative splice sites into close proximity. We also demonstrate that aberrant multiple-exon skipping of the fragile histidine triad (FHIT) pre-mRNA in cancer cells occurs via re-splicing of spliced FHIT mRNA. The re-splicing of mature mRNA can potentially generate mutation-independent diversity in cancer transcriptomes. Conversely, a mechanism may exist in normal cells to prevent potentially deleterious mRNA re-splicing events.
PMCID: PMC3439910  PMID: 22675076
23.  A naturally arising mutation of a potential silencer of exon splicing in human immunodeficiency virus type 1 induces dominant aberrant splicing and arrests virus production. 
Journal of Virology  1997;71(11):8542-8551.
We have isolated a naturally arising human immunodeficiency virus type 1 (HIV-1) mutant containing a point mutation within the env gene. The point mutation resulted in complete loss of balanced splicing, with dominant production of aberrant mRNAs. The aberrant RNAs arose via activation of normally cryptic splice sites flanking the mutation within the env terminal exon to create exon 6D, which was subsequently incorporated in aberrant env, tat, rev, and nef mRNAs. Aberrant multiply spliced messages contributed to reduced virus replication as a result of a reduction in wild-type Rev protein. The point mutation within exon 6D activated exon 6D inclusion when the exon and its flanking splice sites were transferred to a heterologous minigene. Introduction of the point mutation into an otherwise wild-type HIV-1 proviral clone resulted in virus that was severely inhibited for replication in T cells and displayed elevated usage of exon 6D. Exon 6D contains a bipartite element similar to that seen in tat exon 3 of HIV-1, consisting of a potential exon splicing silencer (ESS) juxtaposed to a purine-rich sequence similar to known exon splicing enhancers. In the absence of a flanking 5' splice site, the point mutation within the exon 6D ESS-like element strongly activated env splicing, suggesting that the putative ESS plays a natural role in limiting the level of env splicing. We propose, therefore, that exon silencers may be a common element in the HIV-1 genome used to create balanced splicing of multiple products from a single precursor RNA.
PMCID: PMC192318  PMID: 9343212
24.  Alternative splicing of beta-tropomyosin pre-mRNA: multiple cis-elements can contribute to the use of the 5'- and 3'-splice sites of the nonmuscle/smooth muscle exon 6. 
Nucleic Acids Research  1994;22(12):2318-2325.
We previously found that the splicing of exon 5 to exon 6 in the rat beta-TM gene required that exon 6 first be joined to the downstream common exon 8 (Helfman et al., Genes and Dev. 2, 1627-1638, 1988). Pre-mRNAs containing exon 5, intron 5 and exon 6 are not normally spliced in vitro. We have carried out a mutational analysis to determine which sequences in the pre-mRNA contribute to the inability of this precursor to be spliced in vitro. We found that mutations in two regions of the pre-mRNA led to activation of the 3'-splice site of exon 6, without first joining exon 6 to exon 8. First, introduction of a nine nucleotide poly U tract upstream of the 3'-splice site of exon 6 results in the splicing of exon 5 to exon 6 with as little as 35 nucleotides of exon 6. Second, introduction of a consensus 5'-splice site in exon 6 led to splicing of exon 5 to exon 6. Thus, three distinct elements can act independently to activate the use of the 3'-splice site of exon 6: (1) the sequences contained within exon 8 when joined to exon 6, (2) a poly U tract in intron 5, and (3) a consensus 5'-splice site in exon 6. Using biochemical assays, we have determined that these sequence elements interact with distinct cellular factors for 3'-splice site utilization. Although HeLa cell nuclear extracts were able to splice all three types of pre-mRNAs mentioned above, a cytoplasmic S100 fraction supplemented with SR proteins was unable to efficiently splice exon 5 to exon 6 using precursors in which exon 6 was joined to exon 8. We also studied how these elements contribute to alternative splice site selection using precursors containing the mutually exclusive, alternatively spliced cassette comprised of exons 5 through 8. Introduction of the poly U tract upstream of exon 6, and changing the 5'-splice site of exon 6 to a consensus sequence, either alone or in combination, facilitated the use of exon 6 in vitro, such that exon 6 was spliced more efficiently to exon 8. 
These data show that intron sequences upstream of an exon can contribute to the use of the downstream 5'-splice site, and that sequences surrounding exon 6 can contribute to tissue-specific alternative splice site selection.
PMCID: PMC523690  PMID: 8036160
25.  Functional studies on the ATM intronic splicing processing element 
Nucleic Acids Research  2005;33(13):4007-4015.
In disease-associated genes, the understanding of the functional significance of deep intronic nucleotide variants may represent a difficult challenge. We have previously reported a new disease-causing mechanism that involves an intronic splicing processing element (ISPE) in ATM, composed of adjacent consensus 5′ and 3′ splice sites. A GTAA deletion within ISPE maintains potential adjacent splice sites, disrupts a non-canonical U1 snRNP interaction and activates an aberrant exon. In this paper, we demonstrate that binding of U1 snRNA through complementarity within a ∼40 nt window downstream of the ISPE prevents aberrant splicing. By selective mutagenesis at the adjacent consensus ISPE splice sites, we show that this effect is not due to a resplicing process occurring at the ISPE. Functional comparison of the ATM mouse counterpart and evaluation of the pre-mRNA splicing intermediates derived from affected cell lines and hybrid minigene assays indicate that U1 snRNP binding at the ISPE interferes with the cryptic acceptor site. Activation of this site results in a stringent 5′–3′ order of intron sequence removal around the cryptic exon. Artificial U1 snRNA loading by complementarity to heterologous exonic sequences represents a potential therapeutic method to prevent the usage of an aberrant CFTR cryptic exon. Our results suggest that ISPE-like intronic elements binding U1 snRNPs may regulate correct intron processing.
PMCID: PMC1178006  PMID: 16030351

