DBASS3 and DBASS5 provide comprehensive repositories of new exon boundaries that were induced by pathogenic mutations in human disease genes. Aberrant 5′- and 3′-splice sites were activated either by mutations in the consensus sequences of natural exon–intron junctions (cryptic sites) or elsewhere (‘de novo’ sites). DBASS3 and DBASS5 currently contain approximately 900 records of cryptic and de novo 3′- and 5′-splice sites that were produced by over a thousand different mutations in approximately 360 genes. DBASS3 and DBASS5 data can be searched by disease phenotype, gene, mutation, location of aberrant splice sites in introns and exons and their distance from authentic counterparts, by bibliographic references and by the splice-site strength estimated with several prediction algorithms. The user can also retrieve reference sequences of both aberrant and authentic splice sites with the underlying mutation. These data will facilitate identification of introns or exons frequently involved in aberrant splicing, mutation analysis of human disease genes and study of germline or somatic mutations that impair RNA processing. Finally, this resource will be useful for fine-tuning splice-site prediction algorithms, better definition of auxiliary splicing signals and design of new reporter assays. DBASS3 and DBASS5 are freely available at http://www.dbass.org.uk/.
The frequency distribution of mutation-induced aberrant 3′ splice sites (3′ss) in exons and introns is more complex than for 5′ splice sites, largely owing to sequence constraints upstream of intron/exon boundaries. As a result, prediction of their localization remains a challenging task. Here, nucleotide sequences of 218 previously reported aberrant 3′ss activated by disease-causing mutations in 131 human genes were compared with their authentic counterparts using currently available splice site prediction tools. Each tested algorithm distinguished authentic 3′ss from cryptic sites more effectively than from de novo sites. The best discrimination between aberrant and authentic 3′ss was achieved by the maximum entropy model. Almost one half of aberrant 3′ss were activated by AG-creating mutations and ∼95% of the newly created AGs were selected in vivo. The overall nucleotide structure upstream of aberrant 3′ss was characterized by higher purine content than for authentic sites, particularly in position −3, which may be compensated by more stringent requirements for positive and negative nucleotide signatures centred around position −11. A newly developed online database of aberrant 3′ss will facilitate identification of splicing mutations in a gene or phenotype of interest and future optimization of splice site prediction tools.
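The per-position purine comparison described above can be sketched as follows. This is a minimal illustration, not the analysis used in the study: the function name `purine_fraction_by_position` and the toy sequences are assumptions made here, and each sequence is assumed to be an aligned intronic stretch ending at the last intron nucleotide (position −1).

```python
PURINES = {"A", "G"}

def purine_fraction_by_position(intron_ends):
    """Fraction of purines at each intron position across aligned sequences.

    Every sequence must have the same length and end at intron position -1,
    so the last character is position -1, the one before it -2, and so on.
    """
    length = len(intron_ends[0])
    if any(len(seq) != length for seq in intron_ends):
        raise ValueError("sequences must be aligned to the same length")
    fractions = {}
    for i in range(length):
        position = i - length  # runs from -length up to -1
        column = [seq[i].upper() for seq in intron_ends]
        fractions[position] = sum(base in PURINES for base in column) / len(column)
    return fractions

# Toy sequences invented for illustration only (not taken from any database).
authentic = ["TTTCTCTTTCAG", "CCTTTTCCTTAG", "TCTCTTTTCCAG"]
aberrant = ["TTTCTCTTTAAG", "CCTATTCCTGAG", "TCTCTTTTCCAG"]

print(purine_fraction_by_position(authentic)[-3])
print(purine_fraction_by_position(aberrant)[-3])
```

With real data, the same per-position table could be computed for authentic and aberrant sites and compared position by position.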
We compiled sequences of previously published aberrant 3′ splice sites (3′ss) that were generated by mutations in human disease genes. Cryptic 3′ss, defined here as those resulting from a mutation of the 3′YAG consensus, were more frequent in exons than in introns. They clustered in a ∼20 nt region adjacent to authentic 3′ss, suggesting that their under-representation in introns is due to a depletion of AG dinucleotides in the polypyrimidine tract (PPT). In contrast, most aberrant 3′ss that were induced by mutations outside the 3′YAG consensus (designated ‘de novo’) were in introns. The activation of intronic de novo 3′ss was largely due to AG-creating mutations in the PPT. In contrast, exonic de novo 3′ss were more often induced by mutations improving the PPT, branchpoint sequence (BPS) or distant auxiliary signals, rather than by direct AG creation. The Shapiro–Senapathy matrix scores had a good prognostic value for cryptic, but not de novo 3′ss. Finally, AG-creating mutations in the PPT that produced aberrant 3′ss upstream of the predicted BPS in vivo shared a similar ‘BPS-new AG’ distance. Reduction of this distance and/or the strength of the new AG PPT in splicing reporter pre-mRNAs improved utilization of authentic 3′ss, suggesting that AG-creating mutations that are located closer to the BPS and are preceded by a weaker PPT may result in less severe splicing defects.
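Whether a point substitution is "AG-creating" in the sense used above can be checked mechanically. The sketch below is a hypothetical helper written for illustration (the name `creates_new_ag`, 0-based indexing, and the toy polypyrimidine tract are all assumptions); it only tests whether a new AG dinucleotide appears and says nothing about whether that AG would be used as a splice site.

```python
def creates_new_ag(reference, index, alt):
    """Return True if replacing reference[index] with alt creates a new AG.

    `index` is 0-based. Only the two dinucleotides that overlap the
    substituted base can change, so only those are inspected.
    """
    reference = reference.upper()
    alt = alt.upper()
    if reference[index] == alt:
        raise ValueError("alt base equals reference base")
    mutated = reference[:index] + alt + reference[index + 1:]
    for start in (index - 1, index):
        if start < 0 or start + 2 > len(mutated):
            continue
        if mutated[start:start + 2] == "AG" and reference[start:start + 2] != "AG":
            return True
    return False

# Toy intron end: a pyrimidine-rich tract followed by the authentic AG.
toy_ppt = "TTTCTCTATTTCTCAG"
print(creates_new_ag(toy_ppt, 8, "G"))  # T>G directly after an A
print(creates_new_ag(toy_ppt, 2, "G"))  # T>G with no preceding A
```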
Auxiliary splicing signals play a major role in the regulation of constitutive and alternative pre-mRNA splicing, but their relative importance in selection of mutation-induced cryptic or de novo splice sites is poorly understood. Here, we show that exonic sequences between authentic and aberrant splice sites that were activated by splice-site mutations in human disease genes have lower frequencies of splicing enhancers and higher frequencies of splicing silencers than average exons. Conversely, sequences between authentic and intronic aberrant splice sites have more enhancers and fewer silencers than average introns. Exons that were skipped as a result of splice-site mutations were smaller, had lower SF2/ASF motif scores, a decreased availability of decoy splice sites and a higher density of silencers than exons in which splice-site mutation activated cryptic splice sites. These four variables were the strongest predictors of the two aberrant splicing events in a logistic regression model. Elimination or weakening of predicted silencers in two reporters consistently promoted use of intron-proximal splice sites if these elements were maintained at their original positions, with their modular combinations producing expected modification of splicing. Together, these results show the existence of a gradient in exon and intron definition at the level of pre-mRNA splicing and provide a basis for the development of computational tools that predict aberrant splicing outcomes.
Mutations that affect splicing of precursor messenger RNAs play a major role in the development of hereditary diseases. Most splicing mutations have been found to eliminate GT or AG dinucleotides that define the 5′ and 3′ ends of introns, leading to exon skipping or cryptic splice-site activation. Although accurate description of the mis-spliced transcripts is critical for predicting phenotypic consequences of these alterations, their exact nature in affected individuals often cannot be determined experimentally. Using a comprehensive collection of exons that sustained cryptic splice-site activation or were skipped as a result of splice-site mutations, we have developed a multivariate logistic discrimination procedure that distinguishes the two aberrant splicing outcomes from DNA sequences. The new algorithm was validated using an independent sample of exons and implemented as a free online utility termed CRYP-SKIP (http://www.dbass.org.uk/cryp-skip/). The web application takes one or more mutated alleles, each consisting of one exon and flanking intronic sequences, and provides a list of important predictor variables and their values, the overall probability of activating cryptic splice sites vs exon skipping, and the location and intrinsic strength of predicted cryptic splice sites in the input sequence. These results will facilitate phenotypic prediction of splicing mutations and provide further insights into splicing enhancer and silencer elements and their relative importance for splice-site selection in vivo.
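The general shape of a two-outcome logistic discrimination can be sketched as below. This is not the CRYP-SKIP model: the predictor names, coefficients, and intercept are hypothetical placeholders chosen for illustration, and only the logistic form itself is standard.

```python
import math

def cryptic_activation_probability(features, coefficients, intercept):
    """P(cryptic splice-site activation) under a logistic model.

    The complementary probability, 1 - p, is read as P(exon skipping).
    """
    linear = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients; real values would come from fitting to data.
coefficients = {
    "exon_length_nt": 0.01,
    "enhancer_motif_score": 0.5,
    "decoy_site_count": 0.3,
    "silencer_density": -2.0,
}
intercept = -1.0

exon = {
    "exon_length_nt": 120,
    "enhancer_motif_score": 1.2,
    "decoy_site_count": 2,
    "silencer_density": 0.4,
}
p_cryptic = cryptic_activation_probability(exon, coefficients, intercept)
print(f"P(cryptic activation) = {p_cryptic:.3f}, P(skipping) = {1 - p_cryptic:.3f}")
```

Keeping the coefficients in a plain dictionary makes it easy to list each predictor next to its value, which mirrors the kind of per-variable report the abstract describes.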
mutation; gene; splicing; cryptic splice site; exon skipping; RNA
To define elements critical for 5' splice selection in dicot plant nuclei, wild-type and mutant transcripts containing the first intron of the pea rbcS3A gene were expressed in vivo by using an autonomously replicating plant expression vector. Mutations within the normal 5' splice site (+1) of this intron demonstrate that 5' splice sites at the normal exon-intron boundary having only limited agreement with a 5' splice site consensus sequence can be spliced quite effectively in dicot nuclei. Inactivation of the normal 5' splice site occurs only by point mutations of the G at position +1 of the intron (+1G) or +2U or by multiple mutations at other positions and results in the activation of three cryptic 5' splice sites in the adjacent exon and intron. cis competition of cryptic sites having consensus 5' splice site sequences with the normal 5' splice site demonstrates that cryptic splice sites in the exon, but not the intron, can compete to some extent with the normal site. Replacement of the sequences between the cryptic and normal 5' splice sites with heterologous exon or intron sequences demonstrates that the 5' boundary of this plant intron is defined by its position relative to the AU transition point between exon and intron. These results suggest that potential 5' splice sites upstream of the AU transition point are accessible for recognition by the plant pre-mRNA splicing machinery and that those downstream in the AU-rich intron are masked from recognition.
Certain thalassemic human beta-globin pre-mRNAs carry mutations that generate aberrant splice sites and/or activate cryptic splice sites, providing a convenient and clinically relevant system to study splice site selection. Antisense 2'-O-methyl oligoribonucleotides were used to block a number of sequences in these pre-mRNAs and were tested for their ability to inhibit splicing in vitro or to affect the ratio between aberrantly and correctly spliced products. By this approach, it was found that (i) up to 19 nucleotides upstream from the branch point adenosine are involved in proper recognition and functioning of the branch point sequence; (ii) whereas at least 25 nucleotides of exon sequences at both 3' and 5' ends are required for splicing, this requirement does not extend past the 5' splice site sequence of the intron; and (iii) improving the 5' splice site of the internal exon to match the consensus sequence strongly decreases the accessibility of the upstream 3' splice site to antisense 2'-O-methyl oligoribonucleotides. This result most likely reflects changes in the strength of interactions near the 3' splice site in response to improvement of the 5' splice site and further supports the existence of communication between these sites across the exon.
The fourth exon of the mouse polymeric immunoglobulin receptor (pIgR) is 654 nt long and, despite being surrounded by large introns, is constitutively spliced into the mRNA. Deletion of an 84 nt sequence from this exon strongly activated both cryptic 5' and 3' splice sites surrounding a 78 nt cryptic intron. The 84 nt deletion is just upstream of the cryptic 3' splice site; the cryptic 3' splice site was likely activated because the deletion created a better 3' splice site. However, the cryptic 5' splice site was also required to activate the cryptic splice reaction; point mutations in either of the cryptic splice sites that decreased their match to the consensus splice site sequence inactivated the cryptic splice reaction. The activation and inactivation of these cryptic splice sites as a pair suggests that they are being co-recognized by the splicing machinery. Interestingly, the large fourth exon of the pIgR gene encodes two immunoglobulin-like extracellular protein domains; the cryptic 3' splice site coincides with the junction between these protein domains. The cryptic 5' splice site is located between protein subdomains where an intron is found in another gene of the immunoglobulin superfamily.
Variants of unknown significance in the CAPN3 gene constitute a significant challenge for genetic counselling. Despite the frequency of intronic nucleotide changes in this gene (15–25% of all mutations), so far their pathogenicity has only been inferred by in-silico analysis, and occasionally, proven by RNA analysis. In this study, 5 different intronic variants (one novel) that bioinformatic tools predicted would affect RNA splicing, underwent comprehensive studies which were designed to prove they are disease-causing. Muscle mRNA from 15 calpainopathy patients was analyzed by RT-PCR and splicing-specific-PCR tests. We established the previously unrecognized pathogenicity of these mutations, which caused aberrant splicing, most frequently by the activation of cryptic splicing sites or, occasionally, by exon skipping. The absence or severe reduction of protein demonstrated their deleterious effect at translational level. We concluded that bioinformatic tools are valuable to suggest the potential effects of intronic variants; however, the experimental demonstration of the pathogenicity is not always easy to do even when using RNA analysis (low abundance, degradation mechanisms), and it might not be successful unless splicing-specific-PCR tests are used. A comprehensive approach is therefore recommended to identify and describe unclassified variants in order to offer essential data for basic and clinical geneticists. ©2010 Wiley-Liss, Inc.
CAPN3; LGMD2A; calpainopathy; intronic variants; pathogenetic mutations; splicing
A T→G mutation at nucleotide 705 of human β-globin intron 2 creates an aberrant 5′ splice site and activates a cryptic 3′ splice site upstream. In consequence, the pre-mRNA is spliced via aberrant splice sites, despite the presence of the still functional correct sites. Surprisingly, when IVS2-705 HeLa or K562 cells were cultured at temperatures below 30°C, aberrant splicing was inhibited and correct splicing was restored. Similar temperature effects were seen for another β-globin pre-mRNA, IVS2-745, and in a construct in which a β-globin intron was inserted into a coding sequence of EGFP. Temperature-induced alternative splicing was affected by the nature of the internal aberrant splice sites flanking the correct sites and by exonic sequences. The results indicate that in the context of thalassemic splicing mutations and possibly in other alternatively spliced pre-mRNAs, temperature is one of the parameters that affect splice site selection.
We have isolated a naturally arising human immunodeficiency virus type 1 (HIV-1) mutant containing a point mutation within the env gene. The point mutation resulted in complete loss of balanced splicing, with dominant production of aberrant mRNAs. The aberrant RNAs arose via activation of normally cryptic splice sites flanking the mutation within the env terminal exon to create exon 6D, which was subsequently incorporated in aberrant env, tat, rev, and nef mRNAs. Aberrant multiply spliced messages contributed to reduced virus replication as a result of a reduction in wild-type Rev protein. The point mutation within exon 6D activated exon 6D inclusion when the exon and its flanking splice sites were transferred to a heterologous minigene. Introduction of the point mutation into an otherwise wild-type HIV-1 proviral clone resulted in virus that was severely inhibited for replication in T cells and displayed elevated usage of exon 6D. Exon 6D contains a bipartite element similar to that seen in tat exon 3 of HIV-1, consisting of a potential exon splicing silencer (ESS) juxtaposed to a purine-rich sequence similar to known exon splicing enhancers. In the absence of a flanking 5' splice site, the point mutation within the exon 6D ESS-like element strongly activated env splicing, suggesting that the putative ESS plays a natural role in limiting the level of env splicing. We propose, therefore, that exon silencers may be a common element in the HIV-1 genome used to create balanced splicing of multiple products from a single precursor RNA.
DMD nonsense and frameshift mutations lead to severe Duchenne muscular dystrophy while in-frame mutations lead to milder Becker muscular dystrophy. Exceptions are found in 10% of cases and the production of alternatively spliced transcripts is considered a key modifier of disease severity. Several exonic mutations have been shown to induce exon-skipping, while splice site mutations result in exon-skipping or activation of cryptic splice sites. However, factors determining the splicing pathway are still unclear. Point mutations provide valuable information regarding the regulation of pre-mRNA splicing and elements defining exon identity in the DMD gene. Here we provide a comprehensive analysis of 98 point mutations related to clinical phenotype and their effect on muscle mRNA and dystrophin expression. Aberrant splicing was found in 27 mutations due to alteration of splice sites or splicing regulatory elements. Bioinformatics analysis was performed to test the ability of the available algorithms to predict consequences on mRNA and to investigate the major factors that determine the splicing pathway in mutations affecting splicing signals. Our findings suggest that the splicing pathway is highly dependent on the interplay between splice site strength and density of regulatory elements.
Beta-globin gene mutations which alter normal globin RNA splicing have confirmed the necessity of invariant nucleotides GT at donor splice sites. Functional consequences of point mutations in the invariant AG acceptor splice site have not been determined. We have isolated and characterized a beta-globin gene from a Black patient with beta-thalassemia intermedia which has an A-G transition at the usual intervening sequence 2 (IVS2) acceptor splice site. Functional analysis of transcripts produced by this mutant gene in a transient expression vector indicates that the mutation inactivates the normal acceptor splice site and results in some utilization of a cryptic splice site near position 580 of IVS2. This mutation would be expected to produce a beta-globin gene which results in no normal beta-globin mRNA.
Polymorphic variants and mutations disrupting canonical splicing isoforms are among the leading causes of human hereditary disorders. While there is substantial evidence of aberrant splicing causing Mendelian diseases, the implication of such events in multi-genic disorders is yet to be well understood. We have developed a new tool (SpliceScan II) for predicting the effects of genetic variants on splicing and cis-regulatory elements. The novel Bayesian non-canonical 5'GC splice site (SS) sensor used in our tool allows inference on non-canonical exons.
Our tool performed favorably when compared with the existing methods in the context of genes linked to the Autism Spectrum Disorder (ASD). SpliceScan II was able to predict more aberrant splicing isoforms triggered by the mutations, as documented in DBASS5 and DBASS3 aberrant splicing databases, than other existing methods. Detrimental effects behind some of the polymorphic variations previously associated with Alzheimer's and breast cancer could be explained by changes in predicted splicing patterns.
We have developed SpliceScan II, an effective and sensitive tool for predicting the detrimental effects of genomic variants on splicing leading to Mendelian and complex hereditary disorders. The method could potentially be used to screen resequenced patient DNA to identify de novo mutations and polymorphic variants that could contribute to a genetic disorder.
The splicing of group II and nuclear pre-mRNA introns occurs via a similar splicing pathway and some of the RNA-RNA interactions involved in these splicing reactions show structural similarities. Recently, genetic analyses performed in a group II intron and the yeast nuclear actin gene suggested that non Watson-Crick interactions between intron boundaries are important for the second splicing step efficiency in both classes of introns. We here show that, in the yeast nuclear rp51A intron, a G to A mutation at the first position activates cryptic 3' splice sites with the sequences UAC/ or UAA/. Moreover, the natural 3' splice site could be reactivated by a G to C substitution of the last intron nucleotide. These results demonstrate that the interaction between the first and last intron nucleotides is a conserved feature of nuclear pre-mRNA splicing in yeast and is involved in the mechanism of 3' splice site selection.
We describe a new program called cryptic splice finder (CSF) that can reliably identify cryptic splice sites (css), so providing a useful tool to help investigate splicing mutations in genetic disease. We report that many css are not entirely dormant and are often already active at low levels in normal genes prior to their enhancement in genetic disease. We also report a fascinating correlation between the positions of css and introns, whereby css within the exons of one species frequently match the exact position of introns in equivalent genes from another species. These results strongly indicate that many introns were inserted into css during evolution and they also imply that the splicing information that lies outside some introns can be independently recognized by the splicing machinery and was in place prior to intron insertion. This indicates that non-intronic splicing information had a key role in shaping the split structure of eukaryote genes.
X-linked spondyloepiphyseal dysplasia tarda can be caused by mutations in the SEDL gene. This study describes an interesting novel mutation (IVS4+1A>G) located exactly at the rare noncanonical AT–AC consensus splicing donor point of SEDL, which regained the canonical GT–AG consensus splicing junction in addition to several other rarer noncanonical splice patterns. The mutation activated several cryptic splice sites and generated the production of seven erroneous splicing isoforms, which we confirmed by sequencing of RT-PCR products and resequencing of cDNA clones. All the practical splice donors/acceptors were further assessed using FSPLICE 1.0 and SPL(M) Platforms to predict potential splice sites in genomic DNA. Subsequently, the expression levels of SEDL among the affected patients, carriers and controls were estimated using real-time quantitative PCR. Expression analyses showed that the expression levels of SEDL in both patients and carriers were decreased. Taken together, these results illustrated how disruption of the AT donor site in a rare AT–AC intron, leading to a canonical GT donor site, resulted in a multitude of aberrant transcripts, thus impairing exon definition. The unexpected splicing patterns resulting from the special mutation provide additional challenges and opportunities for understanding splicing mechanisms and specificity.
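The effect of a +1 donor substitution on intron boundary dinucleotides can be illustrated with a short sketch. The intron sequence below is a made-up toy, not the SEDL intron, and both helper names are assumptions introduced here; the point is only that an A>G change at position +1 turns an AT donor into a GT donor while the original AC acceptor is unchanged.

```python
def intron_termini(intron):
    """Return the first and last dinucleotides of an intron, e.g. 'GT-AG'."""
    intron = intron.upper().replace("U", "T")
    return f"{intron[:2]}-{intron[-2:]}"

def apply_donor_plus1_substitution(intron, alt):
    """Replace the first intron nucleotide (position +1) with `alt`."""
    return alt.upper() + intron[1:]

toy_intron = "ATATCCTTTTCCAC"  # invented AT-AC intron for illustration
mutant = apply_donor_plus1_substitution(toy_intron, "G")

print(intron_termini(toy_intron))
print(intron_termini(mutant))
```

A mismatched donor and acceptor pair of this kind is one way to picture why several alternative acceptor choices might be observed after such a mutation.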
alternative splicing; canonical splice site; mutation analysis; noncanonical splice site; splicing mechanism; spondyloepiphyseal dysplasia tarda
In a patient with a beta-thalassemia intermedia, a mutation was identified in the second intron of the human beta-globin gene. The U→G mutation is located within the polypyrimidine tract at position -8 upstream of the 3' splice site. In vivo, this mutation leads to decreased levels of the hemoglobin protein. Because of the location of the mutation and the role of the polypyrimidine tract in the splicing process, we performed in vitro splicing assays on the pre-messenger RNA (pre-mRNA). We found that the splicing efficiency of the mutant pre-mRNA is reduced compared to the wild type and that no cryptic splice sites are activated. Analysis of splicing complex formation shows that the U→G mutation affects predominantly the progression of the H complex towards the pre-spliceosome complex. By cross-linking and immunoprecipitation assays, we show that the hnRNP C protein interacts more efficiently with the mutant precursor than with the wild-type. This stronger interaction could play a role, directly or indirectly, in the decreased splicing efficiency.
Pseudo-exons are intronic sequences that are flanked by apparent consensus splice sites but that are not observed in spliced mRNAs. Pseudo-exons are often difficult to activate by mutation and have typically been viewed as a conceptual challenge to our understanding of how the spliceosome discriminates between authentic and cryptic splice sites. We have analyzed an apparent pseudo-exon located downstream of mutually exclusive exons 2 and 3 of the rat α-tropomyosin (TM) gene. The TM pseudo-exon is conserved among mammals and has a conserved profile of predicted splicing enhancers and silencers that is more typical of a genuine exon than a pseudo-exon. Splicing of the pseudo-exon is fully activated for splicing to exon 3 by a number of simple mutations. Splicing of the pseudo-exon to exon 3 is predicted to lead to nonsense-mediated decay (NMD). In contrast, when “prespliced” to exon 2 it follows a “zero length exon” splicing pathway in which a newly generated 5′ splice site at the junction with exon 2 is spliced to exon 4. We propose that a subset of apparent pseudo-exons, as exemplified here, are actually authentic alternative exons whose inclusion leads to NMD.
There has been growing evidence for extensive diversity of alternative splicing in human populations. Genetic variants within the 5′ splice site can cause splicing differences among human individuals and constitute an important class of human disease mutations. In this study, we explored whether natural variations of splicing could reveal important signals of 5′ splice site recognition. In seven lymphoblastoid cell lines of Asian, European and African ancestry, we identified 1174 single nucleotide polymorphisms (SNPs) within the consensus 5′ splice site. We selected 129 SNPs predicted to significantly alter the splice site activity, and quantitatively examined their splicing impact in the seven individuals. Surprisingly, outside of the essential GT dinucleotide position, only ∼14% of the tested SNPs altered splicing. Bioinformatic and minigene analyses identified signals that could modify the impact of 5′ splice site polymorphisms, most notably a strong 3′ splice site and the presence of intronic motifs downstream of the 5′ splice site. Strikingly, we found that the poly-G run, a known intronic splicing enhancer, was the most significantly enriched motif downstream of exons unaffected by 5′ splice site SNPs. In TRIM62, the upstream 3′ splice site and downstream intronic poly-G runs functioned redundantly to protect an exon from its 5′ splice site polymorphism. Collectively, our study reveals widespread context-dependent robustness to 5′ splice site polymorphisms in human transcriptomes. Consequently, certain exons are more susceptible to 5′ splice site mutations. Additionally, our work demonstrates that genetic diversity of alternative splicing can provide significant insights into the splicing code of mammalian cells.
In pre-mRNA splicing, specific spliceosomal components recognize key intron sequences, but the mechanisms by which splice sites are selected are not completely understood. In the Saccharomyces cerevisiae actin intron a silent branch point-like sequence (UACUAAG) is located 7 nt upstream of the canonical sequence. Mutation of the canonical UACUAAC sequence to UAAUAAC reduces utilization of this signal and activates the cryptic UACUAAG. Splicing-dependent beta-galactosidase assays have shown that these two splice signals cooperate to enhance splicing. Analyses of several variants of this double branch point intron demonstrate that the upstream UACUAAG sequence significantly increases usage of the UAAUAAC as a site of lariat formation. This activation is sequence-specific and unidirectional. However the ability of the UACUAAG signal to activate the downstream branch point is dependent on the presence of a short non-conserved sequence located a few nucleotides upstream of the UACUAAG. Mutation of this sequence leads to the disappearance of the cooperative interactions between the two branch signals. Our results show that this non-conserved sequence and the UACUAAG signal must both be present to achieve activation of the downstream branch point and suggest that a specific structure may be necessary to allow efficient recognition of the UAAUAAC.
A large proportion of mutations at the human hprt locus result in aberrant splicing of the hprt mRNA. We have been able to relate the mutation to the splicing abnormality in 30 of these mutants. Mutations at the splice acceptor sites of introns 4, 6 and 7 result in splicing out of the whole of the downstream exons, whereas in introns 1, 7 or 8 a cryptic site in the downstream exon can be used. Mutations in the donor site of introns 1 and 5 result in the utilisation of cryptic sites further downstream, whereas in the other introns, the upstream exons are spliced out. Our most unexpected findings were mutations in the middle of exons 3 and 8 which resulted in splicing out of these exons in part of the mRNA populations. Our results have enabled us to assess current models of mRNA splicing. They emphasize the importance of the polypyrimidine tract in splice acceptor sites, they support the role of the exon as the unit of assembly for splicing, and they are consistent with a model proposing a stem-loop structure for exon 8 in the hprt mRNA.
Abnormalities of pre-mRNA splicing are increasingly recognized as an important mechanism through which gene mutations cause disease. However, apart from the mutations in the donor and acceptor sites, the effects on splicing of other sequence variations are difficult to predict. Loosely defined exonic and intronic sequences have been shown to affect splicing efficiency by means of silencing and enhancement mechanisms. Thus, nucleotide substitutions in these sequences can induce aberrant splicing. Web-based resources have recently been developed to facilitate the identification of nucleotide changes that could alter splicing. However, computer predictions do not always correlate with in vivo splicing defects. The issue of unclassified variants in cancer predisposing genes is very important both for the correct ascertainment of cancer risk and for the understanding of the basic mechanisms of cancer gene function and regulation. Therefore we aimed to verify how predictions that can be drawn from in silico analysis correlate with results obtained in an in vivo splicing assay.
We analysed 99 hMLH1 and hMSH2 missense mutations with six different algorithms. Transfection of three different cell lines with 20 missense mutations showed that a minority of them lead to defective splicing. Moreover, we observed that some exons and some mutations show cell-specific differences in the frequency of exon inclusion.
Our results suggest that the available algorithms, while potentially helpful in identifying splicing modulators especially when they are located in weakly defined exons, do not always correspond to an obvious modification of the splicing pattern. Thus caution must be used in assessing the pathogenicity of a missense or silent mutation with prediction programs. The variations observed in the splicing proficiency in three different cell lines suggest that nucleotide changes may dictate alternative splice site selection in a tissue-specific manner contributing to the widely observed phenotypic variability in inherited cancers.
Ancient Alu elements have been shown to be included in mature transcripts by point mutations that improve their 5′ or 3′ splice sites. We have examined requirements for exonization of a younger, disease-associated AluYa5 in intron 16 of the human ACE gene. A single G>C transversion in position −3 of the new Alu exon was insufficient for Alu exonization and a significant inclusion in mRNA was only observed when improving several potential splice donor sites in the presence of 3′ CAG. Since complete Alu exonization was not achieved by optimizing traditional splicing signals, including the branch site, we tested whether auxiliary elements in AluYa5 were required for constitutive inclusion. Exonization was promoted by a SELEX-predicted heptamer in Alu consensus sequence 222–228 and point mutations in highly conserved nucleotides of this heptamer decreased Alu inclusion. In addition, we show that Alu exonization was facilitated by a subset of serine/arginine-rich (SR) proteins through activation of the optimized 3′ splice site. Finally, the haplotype- and allele-specific ACE minigenes generated similar splicing patterns in both ACE-expressing and non-expressing cells, suggesting that previously reported allelic association with plasma ACE activity and cardiovascular disease is not attributable to differential splicing of introns 16 and 17.
Group I self-splicing introns have a 5' splice site duplex (P1) that contains a single conserved base pair (U.G). The U is the last nucleotide of the 5' exon, and the G is part of the internal guide sequence within the intron. Using site-specific mutagenesis and analysis of the rate and accuracy of splicing of the Tetrahymena thermophila group I intron, we found that both the U and the G of the U.G pair are important for the first step of self-splicing (attack of GTP at the 5' splice site). Mutation of the U to a purine activated cryptic 5' splice sites in which a U.G pair was restored; this result emphasizes the preference for a U.G at the splice site. Nevertheless, some splicing persisted at the normal site after introduction of a purine, suggesting that position within the P1 helix is another determinant of 5' splice site choice. When the U was changed to a C, the accuracy of splicing was not affected, but the Km for GTP was increased by a factor of 15 and the catalytic rate constant was decreased by a factor of 7. Substitution of U.A, U.U, G.G, or A.G for the conserved U.G decreased the rate of splicing by an even greater amount. In contrast, mutation of the conserved G enhanced the second step of splicing, as evidenced by a trans-splicing assay. Furthermore, a free 5' exon ending in A or C instead of the conserved U underwent efficient ligation. Thus, unlike the remainder of the P1 helix, which functions in both the first and second steps of self-splicing, the conserved U.G appears to be important only for the first step.
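As a worked illustration of the kinetic changes reported above for the U to C substitution, the two factors can be combined into a single ratio. This assumes Michaelis-Menten behaviour with respect to GTP and uses k_cat/K_m as the measure of catalytic efficiency; the combined figure is arithmetic derived here, not a value stated in the abstract. Primed symbols denote the mutant.

```latex
\[
K_m' = 15\,K_m, \qquad k_{\mathrm{cat}}' = \frac{k_{\mathrm{cat}}}{7}
\]
\[
\frac{k_{\mathrm{cat}}'/K_m'}{k_{\mathrm{cat}}/K_m}
= \frac{k_{\mathrm{cat}}/7}{15\,K_m}\cdot\frac{K_m}{k_{\mathrm{cat}}}
= \frac{1}{7 \times 15}
= \frac{1}{105}
\]
```

Under these assumptions, the mutant would be roughly 105-fold less efficient at subsaturating GTP, even though the accuracy of splice-site choice was reported as unaffected.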