The unconventional splicing of Hac1 by the ribonuclease Ire1 is a key event in the activation of the unfolded protein response (UPR) in Saccharomyces cerevisiae. This splicing is independent of the spliceosome and is mediated by a secondary structure at the intron-exon boundaries of the mRNA. Similar unconventional splicing was also described for the gene Xbp1 in human, mouse, Caenorhabditis elegans and Drosophila melanogaster, and for Hac1 in five other fungi. We used reported RNA structures to build a multiple sequence alignment and the Infernal package to search for homologous structures. We identified homologous non-canonical intron structures in 128 out of 156 searched eukaryotic genomes. Our results show that the sequence of the Hac1/Xbp1 intron is highly conserved only around the splice sites recognized by Ire1. The consensus structure of the Hac1/Xbp1 mRNA is well conserved in Fungi and Metazoa and resembles structures previously described. We show that a typical Hac1/Xbp1 intron is very short, only 20–26 bases, whereas yeast species have a long intron (>100 bases). We identified six species with unambiguous Hac1/Xbp1 homologs that have lost the non-canonical intron structure. We propose that these species use a different mechanism to regulate the UPR.
unfolded protein response; splicing; RNA structure; intron; HAC1; XBP1
Pre-mRNA splicing is carried out by the spliceosome, which identifies exons and removes intervening introns. In vertebrates, most splice sites are initially recognized by the spliceosome across the exon, because most exons are small and surrounded by large introns. This gene architecture predicts that efficient exon recognition depends largely on the strength of the flanking 3′ and 5′ splice sites. However, it is unknown if the 3′ or the 5′ splice site dominates the exon recognition process. Here, we test the 3′ and 5′ splice site contributions towards efficient exon recognition by systematically replacing the splice sites of an internal exon with sequences of different splice site strengths. We show that the presence of an optimal splice site does not guarantee exon inclusion and that the best predictor for exon recognition is the sum of both splice site scores. Using a genome-wide approach, we demonstrate that the combined 3′ and 5′ splice site strengths of internal exons provide a much more significant separator between constitutive and alternative exons than either the 3′ or the 5′ splice site strength alone.
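The idea that the sum of both splice site scores predicts exon recognition better than either score alone can be illustrated with a toy calculation. The sketch below is only an illustration under stated assumptions: the consensus strings are commonly cited simplified motifs, the match-fraction score and the function names (`match_score`, `combined_score`) are hypothetical, and none of this is the scoring scheme used in the study.

```python
# Toy illustration: score each splice site by the fraction of positions that
# agree with a simplified consensus, then add the two scores for an exon.
# Consensus strings and scoring scheme are assumptions, not the study's method.

# IUPAC codes needed for the simplified consensus motifs below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "M": "AC", "R": "AG", "Y": "CT", "N": "ACGT"}

DONOR_CONSENSUS = "MAGGTRAGT"           # 5' splice site: MAG|GTRAGT
ACCEPTOR_CONSENSUS = "YYYYYYYYYYNCAGG"  # 3' splice site: (Y)10 N CAG|G


def match_score(site: str, consensus: str) -> float:
    """Return the fraction of positions in `site` compatible with `consensus`."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    hits = sum(base in IUPAC[code] for base, code in zip(site.upper(), consensus))
    return hits / len(consensus)


def combined_score(acceptor: str, donor: str) -> float:
    """Sum of the 3' and 5' splice site scores flanking one internal exon."""
    return (match_score(acceptor, ACCEPTOR_CONSENSUS)
            + match_score(donor, DONOR_CONSENSUS))


if __name__ == "__main__":
    # Hypothetical exon with an optimal 3' site but a weak 5' site.
    print(combined_score("TTTTTTTTTTACAGG", "CAGGTCCCC"))
    # Hypothetical exon with two optimal sites.
    print(combined_score("TTTTTTTTTTACAGG", "CAGGTAAGT"))
```

In this toy setting an exon with one perfect site and one weak site still gets a lower combined score than an exon with two good sites, which mirrors the observation that an optimal splice site alone does not guarantee inclusion.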
Incorporation of exon 11 of the insulin receptor gene is both developmentally and hormonally-regulated. Previously, we have shown the presence of enhancer and silencer elements that modulate the incorporation of the small 36-nucleotide exon. In this study, we investigated the role of inherent splice site strength in the alternative splicing decision and whether recognition of the splice sites is the major determinant of exon incorporation.
We found that mutation of the flanking sub-optimal splice sites to consensus sequences caused the exon to be constitutively spliced in-vivo. These findings are consistent with the exon-definition model for splicing. In-vitro splicing of RNA templates containing exon 11 and portions of the upstream intron recapitulated the regulation seen in-vivo. Unexpectedly, we found that the splice sites are occupied and spliceosomal complex A was assembled on all templates in-vitro irrespective of splicing efficiency.
These findings demonstrate that the exon-definition model explains alternative splicing of exon 11 in the IR gene in-vivo but not in-vitro. The in-vitro results suggest that the regulation occurs at a later step in spliceosome assembly on this exon.
Appropriate expression of most eukaryotic genes requires the removal of introns from their pre–messenger RNAs (pre-mRNAs), a process catalyzed by the spliceosome. In higher eukaryotes a large family of auxiliary factors known as SR proteins can improve the splicing efficiency of transcripts containing suboptimal splice sites by interacting with distinct sequences present in those pre-mRNAs. The yeast Saccharomyces cerevisiae lacks functional equivalents of most of these factors; thus, it has been unclear whether the spliceosome could effectively distinguish among transcripts. To address this question, we have used a microarray-based approach to examine the effects of mutations in 18 highly conserved core components of the spliceosomal machinery. The kinetic profiles reveal clear differences in the splicing defects of particular pre-mRNA substrates. Most notably, the behaviors of ribosomal protein gene transcripts are generally distinct from other intron-containing transcripts in response to several spliceosomal mutations. However, dramatically different behaviors can be seen for some pairs of transcripts encoding ribosomal protein gene paralogs, suggesting that the spliceosome can readily distinguish between otherwise highly similar pre-mRNAs. The ability of the spliceosome to distinguish among its different substrates may therefore offer an important opportunity for yeast to regulate gene expression in a transcript-dependent fashion. Given the high level of conservation of core spliceosomal components across eukaryotes, we expect that these results will significantly impact our understanding of how regulated splicing is controlled in higher eukaryotes as well.
The spliceosome is a large RNA-protein machine responsible for removing the noncoding (intron) sequences that interrupt eukaryotic genes. Nearly everything known about the behavior of this machine has been based on the analysis of only a handful of genes, despite the fact that individual introns vary greatly in both size and sequence. Here we have utilized a microarray-based platform that allows us to simultaneously examine the behavior of all intron-containing genes in the budding yeast S. cerevisiae. By systematically examining the effects of individual mutants in the spliceosome on the splicing of all substrates, we have uncovered a surprisingly complex relationship between the spliceosome and its full complement of substrates. Contrary to the idea that the spliceosome engages in “generic” interactions with all intron-containing substrates in the cell, our results show that the identity of the transcript can differentially affect splicing efficiency when the machine is subtly perturbed. We propose that the wild-type spliceosome can also distinguish among its many substrates as external conditions warrant to function as a specific regulator of gene expression.
Many eukaryotic gene transcripts are spliced; here the authors show that components of the splicing complex can distinguish between different introns in highly homologous transcripts.
Pre-mRNA splicing is a crucial step in gene expression, and accurate recognition of splice sites is an essential part of this process. Splice sites with weak matches to the consensus sequences are common, though it is not clear how such sites are efficiently utilized. Using an in vitro splicing-complementation approach, we identified PUF60 as a factor that promotes splicing of an intron with a weak 3′ splice-site. PUF60 has homology to U2AF65, a general splicing factor that facilitates 3′ splice-site recognition at the early stages of spliceosome assembly. We demonstrate that PUF60 can functionally substitute for U2AF65 in vitro, but splicing is strongly stimulated by the presence of both proteins. Reduction of either PUF60 or U2AF65 in cells alters the splicing pattern of endogenous transcripts, consistent with the idea that regulation of PUF60 and U2AF65 levels can dictate alternative splicing patterns. Our results indicate that recognition of 3′ splice sites involves different U2AF-like molecules, and that modulation of these general splicing factors can have profound effects on splicing.
Messenger RNA splicing is an essential and complex process for the removal of intron sequences. Whereas the composition of the splicing machinery is mostly known, the kinetics of splicing, the catalytic activity of splicing factors and the interdependency of transcription, splicing and mRNA 3′ end formation are less well understood. We propose a stochastic model of splicing kinetics that explains data obtained from high-resolution kinetic analyses of transcription, splicing and 3′ end formation during induction of an intron-containing reporter gene in budding yeast. Modelling reveals co-transcriptional splicing to be the most probable and most efficient splicing pathway for the reporter transcripts, due in part to a positive feedback mechanism for co-transcriptional second step splicing. Model comparison is used to assess the alternative representations of reactions. Modelling also indicates the functional coupling of transcription and splicing, because both the rate of initiation of transcription and the probability that step one of splicing occurs co-transcriptionally are reduced, when the second step of splicing is abolished in a mutant reporter.
The coding information for the synthesis of proteins in mammalian cells is first transcribed from DNA to messenger RNA (mRNA), before being translated from mRNA to protein. Each step is complex, and subject to regulation. Certain sequences of DNA must be skipped in order to generate a functional protein, and these sequences, known as introns, are removed from the mRNA by the process of splicing. Splicing is well understood in terms of the proteins and complexes that are involved, but the rates of reactions, and models for the splicing pathways, have not yet been established. We present a model of splicing in yeast that accounts for the possibilities that splicing may take place while the mRNA is in the process of being created, as well as the possibility that splicing takes place once mRNA transcription is complete. We assign rates to the reactions in the pathway, and show that co-transcriptional splicing is the preferred pathway. In order to reach these conclusions, we compare a number of alternative models by a quantitative computational method. Our analysis relies on the quantitative measurement of messenger RNA in live cells; this is a major challenge in itself that has only recently been addressed.
Sequences that conform to the 5′ splice site (5′SS) consensus are highly abundant in mammalian introns. Most of these sequences are preceded by at least one in-frame stop codon; thus, their use for splicing would result in prematurely terminated aberrant mRNAs. In normally grown cells, such intronic 5′SSs appear not to be selected for splicing. However, under heat shock conditions aberrant splicing involving such latent 5′SSs occurred in a number of specific gene transcripts. Using a splicing-sensitive microarray, we show here that stress-induced (e.g. heat shock) activation of latent splicing is widespread across the human transcriptome, thus highlighting the possibility that latent splicing may underlie certain diseases. Consistent with this notion, our analyses of data from the Gene Expression Omnibus (GEO) revealed widespread activation of latent splicing in cells grown under hypoxia and in certain cancers such as breast cancer and gliomas. These changes were found in thousands of transcripts representing a wide variety of functional groups; among them are genes involved in cell proliferation and differentiation. The GEO analysis also revealed a set of gene transcripts in oligodendroglioma, in which the level of activation of latent splicing increased with the severity of the disease.
While the majority of multiexonic human genes show some evidence of alternative splicing, it is unclear what fraction of observed splice forms is functionally relevant. In this study, we examine the extent of alternative splicing in human cells using deep RNA sequencing and de novo identification of splice junctions. We demonstrate the existence of a large class of low abundance isoforms, encompassing approximately 150,000 previously unannotated splice junctions in our data. Newly-identified splice sites show little evidence of evolutionary conservation, suggesting that the majority are due to erroneous splice site choice. We show that sequence motifs involved in the recognition of exons are enriched in the vicinity of unconserved splice sites. We estimate that the average intron has a splicing error rate of approximately 0.7% and show that introns in highly expressed genes are spliced more accurately, likely due to their shorter length. These results implicate noisy splicing as an important property of genome evolution.
Most human genes are split into pieces, such that the protein-coding parts (exons) are separated in the genome by large tracts of non-coding DNA (introns) that must be transcribed and spliced out to create a functional transcript. Variation in splicing reactions can create multiple transcripts from the same gene, yet the function for many of these alternative transcripts is unknown. In this study, we show that many of these transcripts are due to splicing errors which are not preserved over evolutionary time. We estimate that the error rate in the splicing of an intron is about 0.7% and demonstrate that there are two major types of splicing error: errors in the recognition of exons and errors in the precise choice of splice site. These results raise the possibility that variation in levels of alternative splicing across species may in part be due to variation in splicing error rate.
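To get a feel for what a per-intron error rate of about 0.7% implies, the sketch below computes the fraction of transcripts expected to be spliced without any error as a function of intron count. It assumes that errors at different introns are independent and share the same rate; that independence assumption and the function name `error_free_fraction` are illustrative choices, not claims made in the study.

```python
# Back-of-the-envelope sketch: if each intron is mis-spliced with probability p
# and introns are treated as independent (an assumption made here for
# illustration), a transcript with n introns is error-free with probability
# (1 - p) ** n.

def error_free_fraction(n_introns: int, per_intron_error: float = 0.007) -> float:
    """Expected fraction of transcripts with no splicing error at any intron."""
    if n_introns < 0:
        raise ValueError("n_introns must be non-negative")
    if not 0.0 <= per_intron_error <= 1.0:
        raise ValueError("per_intron_error must be a probability")
    return (1.0 - per_intron_error) ** n_introns


if __name__ == "__main__":
    for n in (1, 5, 10, 20, 50):
        print(f"{n:>2} introns: {error_free_fraction(n):.3f} error-free")
```

Under these assumptions a gene with 10 introns would yield error-free transcripts roughly 93% of the time, so a small per-intron error rate can still produce a noticeable pool of low-abundance aberrant isoforms in genes with many introns.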
Pairing between U2 snRNA and the branch site of spliceosomal introns is essential for spliceosome assembly and is thought to be required for the first catalytic step of splicing. We have identified an RNA comprising the 5′ end of U2 snRNA and the 3′ exon of the ACT1-CUP1 reporter gene, resulting from a trans-splicing reaction in which a 5′ splice site-like sequence in the universally conserved branch site-binding region of U2 is used in trans as a 5′ splice site for both steps of splicing in vivo. Formation of this product occurs in functional spliceosomes assembled on reporter genes whose 5′ splice sites are predicted to bind poorly at the spliceosome catalytic centre. Multiple spatially disparate splice sites in U2 can be used, calling into question both the fate of its pairing to the branch site and the details of its role in splicing catalysis.
U2 snRNA; branch site; trans-splicing; bulged duplex model; splicing catalysis
The presence of spliceosomal introns in eukaryotes raises a range of questions about genomic evolution. Along with the fundamental mysteries of introns' initial proliferation and persistence, the evolutionary forces acting on intron sequences remain largely mysterious. Intron number varies across species from a few introns per genome to several introns per gene, and the elements of intron sequences directly implicated in splicing vary from degenerate to strict consensus motifs. We report a 50-species comparative genomic study of intron sequences across most eukaryotic groups. We find two broad and striking patterns. First, we find that some highly intron-poor lineages have undergone evolutionary convergence to strong 3′ consensus intron structures. This finding holds for both branch point sequence and distance between the branch point and the 3′ splice site. Interestingly, this difference appears to exist within the genomes of green algae of the genus Ostreococcus, which exhibit highly constrained intron sequences through most of the intron-poor genome, but not in one much more intron-dense genomic region. Second, we find evidence that ancestral genomes contained highly variable branch point sequences, similar to more complex modern intron-rich eukaryotic lineages. In addition, ancestral structures are likely to have included polyT tails similar to those in metazoans and plants, which we found in a variety of protist lineages. Intriguingly, intron structure evolution appears to be quite different across lineages experiencing different types of genome reduction: whereas lineages with very few introns tend towards highly regular intronic sequences, lineages with very short introns tend towards highly degenerate sequences.
Together, these results attest to the complex nature of ancestral eukaryotic splicing, the qualitatively different evolutionary forces acting on intron structures across modern lineages, and the impressive evolutionary malleability of eukaryotic gene structures.
The spliceosomal introns that interrupt eukaryotic genes show great number and sequence variation across species, from the rare, highly uniform yeast introns to the ubiquitous and highly variable vertebrate intron sequences. The causes of these differences remain mysterious. We studied sequences of intron branch points and 3′ termini in 50 eukaryotic species. All intron-rich species exhibit variable 3′ sequences. However, intron-poor species range from variable sequences, to uniform branch point motifs, to uniform branch point motifs in uniform positions along the intronic sequence. This is a more complex pattern than the clear relationship between intron number and 5′ intron sequence uniformity found previously. The correspondence of sequence uniformity and intron number extends to species of the green algal genus Ostreococcus, in which the single intron-rich genomic region shows far more variable intron sequences than in the otherwise intron-poor genome. We suggest that different concentrations of spliceosomal complexes may explain these differences. In addition, we report the existence of 3′ polyT tails in diverse eukaryotic protists, suggesting that this structure is ancestral. Together, these results underscore the complexity of ancestral eukaryotic splicing, the qualitatively different evolutionary forces acting on intron sequences in modern eukaryotes, and the impressive evolutionary malleability of eukaryotic genes.
Small deletions of 6, 7, and 12 nucleotides introduced between the 5' splice site and the internal branch acceptor site of the first intron of the yeast MATa1 gene completely abolish accurate splicing of these constructs in vitro. Splicing only occurs at an alternative 5' splice site which was found in the first exon of the MATa1 gene and which is used both in vivo and in vitro. The splicing defect cannot be cured by expanding the distance from the branch point to the 3' splice site. If the alternative 5' splice site is deleted as well in these constructs, neither spliced products nor spliceosomes are formed. Our findings lead to the conclusion that a minimum distance between the 5' splice site and the internal branch acceptor site of the intron is required for the formation of splicing complexes and for accurate splicing.
Splice site consensus sequences alone are insufficient to dictate the recognition of real constitutive splice sites within the typically large transcripts of higher eukaryotes, and large numbers of pseudoexons flanked by pseudosplice sites with good matches to the consensus sequences can be easily designated. In an attempt to identify elements that prevent pseudoexon splicing, we have systematically altered known splicing signals, as well as immediately adjacent flanking sequences, of an arbitrarily chosen pseudoexon from intron 1 of the human hprt gene. The substitution of a 5′ splice site that perfectly matches the 5′ consensus combined with mutation to match the CAG/G sequence of the 3′ consensus failed to get this model pseudoexon included as the central exon in a dhfr minigene context. Provision of a real 3′ splice site and a consensus 5′ splice site and removal of an upstream inhibitory sequence were necessary and sufficient to confer splicing on the pseudoexon. This activated context also supported the splicing of a second pseudoexon sequence containing no apparent enhancer. Thus, both the 5′ splice site sequence and the polypyrimidine tract of the pseudoexon are defective despite their good agreement with the consensus. On the other hand, the pseudoexon body did not exert a negative influence on splicing. The introduction into the pseudoexon of a sequence selected for binding to ASF/SF2 or its replacement with β-globin exon 2 only partially reversed the effect of the upstream negative element and the defective polypyrimidine tract. These results support the idea that exon-bridging enhancers are not a prerequisite for constitutive exon definition and suggest that intrinsically defective splice sites and negative elements play important roles in distinguishing the real splicing signal from the vast number of false splicing signals.
As part of the exploratory sequencing program Génolevures, visual inspection and bioinformatic tools were used to detect spliceosomal introns in seven hemiascomycetous yeast species. A total of 153 putative novel introns were identified. Introns are rare in yeast nuclear genes (<5% have an intron), mainly located at the 5′ end of ORFs, and not highly conserved in sequence. They all share a clear non-random vocabulary: conserved splice sites and conserved nucleotide contexts around splice sites. Homologues of metazoan snRNAs and putative homologues of SR splicing factors were identified, confirming that the spliceosomal machinery is highly conserved in eukaryotes. Several intron features were tested as possible markers for phylogenetic analysis. We found that intron sizes vary widely within each genome, and according to the phylogenetic position of the yeast species. The evolutionary origin of spliceosomal introns was examined by analysing the degree of conservation of intron positions in homologous yeast genes. Most introns appeared to exist in the last common ancestor of present day yeast species, and then to have been differentially lost during speciation. However, in some cases, it is difficult to exclude a possible sliding event affecting a pre-existing intron or a gain of a novel intron. Taken together, our results indicate that the origin of spliceosomal introns is complex within a given genome, and that present day introns may have resulted from a dynamic flux between intron conservation, intron loss and intron gain during the evolution of hemiascomycetous yeasts.
The rol-6 gene is trans-spliced to the 22 nt leader, SL1, 173 nt downstream of the transcription start. We have analyzed splicing in transformants carrying extrachromosomal arrays of rol-6 with mutations in the trans-splice acceptor site. This site is a close match to the consensus, UUUCAG, that is highly conserved in both trans-splice and intron acceptor sites in C. elegans. When the trans-splice site was inactivated by mutating the perfectly-conserved AG, trans-splicing still occurred, but at a cryptic site 20 nt upstream. We tested the frequency with which splicing switched from the normal site to the cryptic site when the pyrimidines at this site were changed to A's. Since most C. elegans 3' splice sites lack an obvious polypyrimidine tract, we hypothesized that these four pyrimidines might play this role, and indeed mutation of these bases caused splicing to switch to the cryptic site. We also demonstrated that a major reason the downstream site is normally favored is because it occurs at a boundary between A+U rich and non-A+U rich RNA. When the RNA between the two splice sites was made less A+U rich, splicing occurred preferentially at the upstream site.
Pseudo-exons are intronic sequences that are flanked by apparent consensus splice sites but that are not observed in spliced mRNAs. Pseudo-exons are often difficult to activate by mutation and have typically been viewed as a conceptual challenge to our understanding of how the spliceosome discriminates between authentic and cryptic splice sites. We have analyzed an apparent pseudo-exon located downstream of mutually exclusive exons 2 and 3 of the rat α-tropomyosin (TM) gene. The TM pseudo-exon is conserved among mammals and has a conserved profile of predicted splicing enhancers and silencers that is more typical of a genuine exon than a pseudo-exon. Splicing of the pseudo-exon is fully activated for splicing to exon 3 by a number of simple mutations. Splicing of the pseudo-exon to exon 3 is predicted to lead to nonsense-mediated decay (NMD). In contrast, when “prespliced” to exon 2 it follows a “zero length exon” splicing pathway in which a newly generated 5′ splice site at the junction with exon 2 is spliced to exon 4. We propose that a subset of apparent pseudo-exons, as exemplified here, are actually authentic alternative exons whose inclusion leads to NMD.
Correct identification of all introns is necessary to discern the protein-coding potential of a eukaryotic genome. The existence of most of the spliceosomal introns predicted in the genome of Saccharomyces cerevisiae remains unsupported by molecular evidence. We tested the intron predictions for 87 introns predicted to be present in non-ribosomal protein genes, more than a third of all known or suspected introns in the yeast genome. Evidence supporting 61 of these predictions was obtained, 20 predicted intron sequences were not spliced and six predictions identified an intron-containing region but failed to specify the correct splice sites, yielding a successful prediction rate of <80%. Alternative splicing has not been previously described for this organism, and we identified two genes (YKL186C/MTR2 and YML034W) which encode alternatively spliced mRNAs; YKL186C/MTR2 produces at least five different spliced mRNAs. One gene (YGR225W/SPO70) has an intron whose removal is activated during meiosis under control of the MER1 gene. We found eight new introns, suggesting that numerous introns still remain to be discovered. The results show that correct prediction of introns remains a significant barrier to understanding the structure, function and coding capacity of eukaryotic genomes, even in a supposedly simple system like yeast.
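The prediction tally reported above can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the abstract (61 supported, 20 not spliced, 6 with incorrect splice sites, 87 tested); the variable names and the distinction between a "strict" and a "lenient" rate are illustrative assumptions, since the abstract does not say how the <80% figure was computed.

```python
# Arithmetic check of the intron prediction tally, using counts from the
# abstract. The "strict" and "lenient" labels are hypothetical names introduced
# here for illustration only.

counts = {
    "supported": 61,           # prediction confirmed by molecular evidence
    "not_spliced": 20,         # predicted intron was not spliced
    "wrong_splice_sites": 6,   # intron found, but predicted splice sites wrong
}

total = sum(counts.values())

# Strict: only fully supported predictions count as successes.
strict_rate = counts["supported"] / total

# Lenient: also count predictions that located an intron-containing region.
lenient_rate = (counts["supported"] + counts["wrong_splice_sites"]) / total

if __name__ == "__main__":
    print(f"total tested: {total}")
    print(f"strict success rate:  {strict_rate:.1%}")
    print(f"lenient success rate: {lenient_rate:.1%}")
```

Both ways of counting give a rate below 80%, consistent with the figure stated in the abstract.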
Alternative splicing (AS) is a key molecular process that endows biological functions with diversity and complexity. Generally, functional redundancy leads to the generation of new functions through relaxation of selective pressure in evolution, as exemplified by duplicated genes. It is also known that alternatively spliced exons (ASEs) are subject to relaxed selective pressure. Within consensus sequences at the splice junctions, the most conserved sites are dinucleotides at both ends of introns (splice dinucleotides). However, a small number of single nucleotide polymorphisms (SNPs) occur at splice dinucleotides. An intriguing question relating to the evolution of AS diversity is whether mutations at splice dinucleotides are maintained as polymorphisms and produce diversity in splice patterns within the human population. We therefore surveyed validated SNPs in the database dbSNP located at splice dinucleotides of all human genes that are defined by the H-Invitational Database.
We found 212 validated SNPs at splice dinucleotides (sdSNPs); these were confirmed to be consistent with the GT-AG rule at either allele. Moreover, 53 of them were observed to neighbor ASEs (AE dinucleotides). No significant differences were observed between sdSNPs at AE dinucleotides and those at constitutive exons (CE dinucleotides) in SNP properties including average heterozygosity, SNP density, ratio of predicted alleles consistent with the GT-AG rule, and scores of splice sites formed with the predicted allele. We also found that the proportion of non-conserved exons was higher for exons with sdSNPs than for other exons.
sdSNPs are found at CE dinucleotides in addition to those at AE dinucleotides, suggesting two possibilities. First, splicing at CE dinucleotides may be robust against sdSNPs because of unknown mechanisms. Second, similar to sdSNPs at AE dinucleotides, those at CE dinucleotides cause differences in AS patterns because of the arbitrariness in the classification of exons into alternative and constitutive types, which varies according to the dataset. Taking into account the absence of differences in sdSNP properties between those at AE and CE dinucleotides, the increased proportion of non-conserved exons found in exons flanked by sdSNPs suggests the hypothesis that sdSNPs are maintained at the splice dinucleotides of newly generated exons at which negative selection pressure is relaxed.
Retroviral replication requires both spliced and unspliced mRNAs. Splicing suppression of avian retroviral RNA depends in part upon a cis-acting element within the gag gene called the negative regulator of splicing (NRS). The NRS, linked to a downstream intron and exon (NRS-Ad3′), was not capable of splicing in vitro. However, a double-point mutation in the NRS pseudo-5′ splice site sequence converted it into a functional 5′ splice site. The wild-type (WT) NRS-Ad3′ transcript assembled an ∼50S spliceosome-like complex in vitro; its sedimentation rate was similar to that of a functional spliceosome formed on the mutant NRS-Ad3′ RNA. The five major spliceosomal snRNPs were observed in both complexes by affinity selection. In addition, U11 snRNP was present only in the WT NRS-Ad3′ complex. Addition of heparin to these complexes destabilized the WT NRS-Ad3′ complex; it was incapable of forming a B complex on a native gel. Furthermore, the U5 snRNP protein, hPrp8, did not cross-link to the NRS pseudo-5′ splice site, suggesting that the tri-snRNP complex was not properly associated with it. We propose that this aberrant, stalled spliceosome, containing U1, U2, and U11 snRNPs and a loosely associated tri-snRNP, sequesters the 3′ splice site and prevents its interaction with the authentic 5′ splice site upstream of the NRS.
Alternative splicing makes a major contribution to proteomic diversity in higher eukaryotes with ~70% of genes encoding two or more isoforms. In most cases, the molecular mechanisms responsible for splice site choice remain poorly understood. Here, we used a randomization-selection approach in vitro to identify sequence elements that could silence a proximal strong 5′ splice site located downstream of a weakened 5′ splice site. We recovered two exonic and four intronic motifs that effectively silenced the proximal 5′ splice site both in vitro and in vivo. Surprisingly, silencing was only observed in the presence of the competing upstream 5′ splice site. Biochemical evidence strongly suggests that the silencing motifs function by altering the U1 snRNP/5′ splice site complex in a manner that impairs commitment to specific splice site pairing. The data indicate that perturbations of non-rate limiting step(s) in splicing can lead to dramatic shifts in splice site choice.
Auxiliary splicing signals play a major role in the regulation of constitutive and alternative pre-mRNA splicing, but their relative importance in selection of mutation-induced cryptic or de novo splice sites is poorly understood. Here, we show that exonic sequences between authentic and aberrant splice sites that were activated by splice-site mutations in human disease genes have lower frequencies of splicing enhancers and higher frequencies of splicing silencers than average exons. Conversely, sequences between authentic and intronic aberrant splice sites have more enhancers and fewer silencers than average introns. Exons that were skipped as a result of splice-site mutations were smaller, had lower SF2/ASF motif scores, a decreased availability of decoy splice sites and a higher density of silencers than exons in which splice-site mutation activated cryptic splice sites. These four variables were the strongest predictors of the two aberrant splicing events in a logistic regression model. Elimination or weakening of predicted silencers in two reporters consistently promoted use of intron-proximal splice sites if these elements were maintained at their original positions, with their modular combinations producing expected modification of splicing. Together, these results show the existence of a gradient in exon and intron definition at the level of pre-mRNA splicing and provide a basis for the development of computational tools that predict aberrant splicing outcomes.
Regulation of splicing in eukaryotes occurs through the coordinated action of multiple splicing factors. Exons and introns contain numerous putative binding sites for splicing regulatory proteins. Regulation of splicing is presumably achieved by the combinatorial output of the binding of splicing factors to the corresponding binding sites. Although putative regulatory sites often overlap, no extensive study has examined whether overlapping regulatory sequences provide yet another dimension to splicing regulation. Here we analyzed experimentally identified splicing regulatory sequences using a computational method based on the natural distribution of nucleotides and splicing regulatory sequences. We uncovered positive and negative interplay between overlapping regulatory sequences. Examination of these overlapping motifs revealed a unique spatial distribution, especially near splice donor sites of exons with weak splice donor sites. The positively selected overlapping splicing regulatory motifs were highly conserved among different species, implying functionality. Overall, these results suggest that overlap of two splicing regulatory binding sites is an evolutionarily conserved, widespread mechanism of splicing regulation. Finally, over-abundant motif overlaps were experimentally tested in a reporter minigene, revealing that overlaps may facilitate a mode of splicing that did not occur in the presence of only one of the two regulatory sequences that comprise it.
We have investigated the alternative splicing of the EIIIB exon of the rat fibronectin gene. Mini-gene constructs containing this exon and portions of adjacent introns and exons, when transfected into HeLa cells, are transcribed and spliced, but omit the EIIIB exon. In vitro, HeLa nuclear extracts similarly splice out (skip) the EIIIB exon from similarly structured transcripts. Therefore, the HeLa splicing apparatus recognizes as atypical the EIIIB exon and its flanking intron sequences, both in vivo and in vitro. We also report that alterations in the ionic conditions of the in vitro splicing reaction can promote the initiation of EIIIB exon inclusion, as reflected by the formation of intermediate and product RNAs related to the removal of the intron upstream of EIIIB. Processing of this intron correlates with the formation of complexes resembling intermediates in spliceosome assembly. The branch sites involved in this alternative processing pathway are rather distant from the EIIIB 3' splice site, and lie within a region which is well conserved in the fibronectin genes of other species. Thus, the intron upstream of EIIIB shows singular structure and behavior which probably have a bearing on the regulated alternative splicing of this exon.
The unfolded protein response (UPR) in eukaryotes upregulates factors that restore ER homeostasis upon protein folding stress and in yeast is activated by a non-conventional splicing of the HAC1 mRNA. The spliced HAC1 mRNA encodes an active transcription factor that binds to UPR-responsive elements in the promoter of UPR target genes. Overexpression of the HAC1 gene of S. cerevisiae can reportedly lead to increased production of heterologous proteins. To further such studies in the biotechnologically favored yeast Pichia pastoris, we cloned and characterized the P. pastoris HAC1 gene and the splice event.
We identified the HAC1 homologue of P. pastoris and its splice sites. Surprisingly, we could not find evidence for the non-spliced HAC1 mRNA when P. pastoris was cultivated in a standard growth medium without any endoplasmic reticulum stress inducers, indicating that the UPR is constitutively active to some extent in this organism. After identification of the sequence encoding active Hac1p we evaluated the effect of its overexpression in Pichia. The KAR2 UPR-responsive gene was strongly upregulated. Electron microscopy revealed an expansion of the intracellular membranes in Hac1p-overexpressing strains. We then evaluated the effect of inducible and constitutive UPR induction on the production of secreted, surface-displayed and membrane proteins. Wherever Hac1p overexpression affected heterologous protein expression levels, this effect was always stronger when Hac1p expression was inducible rather than constitutive. Depending on the heterologous protein, co-expression of Hac1p increased, decreased or had no effect on expression level. Moreover, α-mating factor prepro signal processing of a G-protein coupled receptor was more efficient with Hac1p overexpression, resulting in significantly improved homogeneity.
Overexpression of P. pastoris Hac1p can be used to increase the production of heterologous proteins but needs to be evaluated on a case-by-case basis. Inducible Hac1p expression is more effective than constitutive expression. Correct processing and thus homogeneity of proteins that are difficult to express, such as GPCRs, can be increased by co-expression with Hac1p.
Splicing of pre-mRNA is a critical step in mRNA maturation and disturbances cause several genetic disorders. We apply the synthetic tetracycline (tc)-binding riboswitch to establish a gene expression system for conditional tc-dependent control of pre-mRNA splicing in yeast. Efficient regulation is obtained when the aptamer is inserted close to the 5′ splice site (SS) with the consensus sequence of the SS located within the aptamer stem. Structural probing indicates limited spontaneous cleavage within this stem in the absence of the ligand. Addition of tc leads to tightening of the stem and the whole aptamer structure which probably prevents recognition of the 5′ SS. Combination of more than one aptamer-regulated intron increases the extent of regulation leading to highly efficient conditional gene expression systems. Our findings highlight the potential of direct RNA–ligand interaction for regulation of gene expression.
Splice site selection is a key element of pre-mRNA splicing. Although it is known to involve specific recognition of short consensus sequences by the splicing machinery, the mechanisms by which 5′ splice sites are accurately identified remain controversial and incompletely resolved. The human F7 gene contains in its seventh intron (IVS7) a 37-bp VNTR minisatellite whose first element spans the exon7–IVS7 boundary. As a consequence, the IVS7 authentic donor splice site is followed by several cryptic splice sites identical in sequence, referred to as 5′ pseudo-sites, which normally remain silent. This region, therefore, provides a remarkable model to decipher the mechanism underlying 5′ splice site selection in mammals. We previously suggested a model for splice site selection that, in the presence of consecutive splice consensus sequences, would stimulate exclusively the selection of the most upstream 5′ splice site, rather than repressing the 3′ following pseudo-sites. In the present study, we provide experimental support for this hypothesis by using a mutational approach involving a panel of 50 mutant and wild-type F7 constructs expressed in various cell types. We demonstrate that the F7 IVS7 5′ pseudo-sites are functional, but do not compete with the authentic donor splice site. Moreover, we show that the selection of the 5′ splice site follows a scanning-type mechanism, precluding competition with other functional 5′ pseudo-sites available in the immediate sequence context downstream of the activated one. In addition, 5′ pseudo-sites with an increased complementarity to U1 snRNA up to 91% do not compete with the identified scanning mechanism. Altogether, these findings, which unveil a cell type–independent 5′−3′-oriented scanning process for accurate recognition of the authentic 5′ splice site, reconcile apparently contradictory observations by establishing a hierarchy of competitiveness among the determinants involved in 5′ splice site selection.
Typically, mammalian genes contain coding sequences (exons) separated by non-coding sequences (introns). Introns are removed during pre-mRNA splicing. The accurate recognition of introns during splicing is essential, as any abnormality in that process will generate abnormal mRNAs that can cause diseases. Understanding the mechanisms of accurate splice site selection is of prime interest to life scientists. Exon–intron borders (splice sites) are defined by short sequences that are poorly conserved. The strength of any splice sequence can be assessed by its degree of homology with a splice site consensus sequence. Within exons and introns, several sequences can match with this consensus as well as or better than the splice sites. Using a system in which a splice site sequence is repeated several times in the intron, the authors showed that linear 5′−3′ search is a leading mechanism underlying splice site selection. This scanning mechanism is cell type–independent, and only the most upstream splice site of all the series is selected, even if splice sites with a better match to the consensus are in the vicinity. These findings reconcile contradictory observations and establish a hierarchy among the determinants involved in splice site selection.
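The contrast between a 5′−3′ scanning rule and a "best match to the consensus wins" rule can be illustrated with a minimal Python sketch. It is not the authors' method. It assumes the commonly cited 9-nt 5′ splice site consensus MAG|GTRAGT (M = A or C, R = A or G), scores a site as the fraction of matching positions, and uses an arbitrary threshold of 0.7 to call a site usable. The function names, the threshold, and the toy sequence are all hypothetical.

```python
# Assumed 9-nt 5' splice site consensus: 3 exonic + 6 intronic positions.
CONSENSUS = "MAGGTRAGT"
# IUPAC codes used in the consensus, mapped to the bases they allow.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}


def match_score(site):
    """Fraction of positions in a 9-mer that agree with the consensus."""
    hits = sum(base in IUPAC[code] for base, code in zip(site, CONSENSUS))
    return hits / len(CONSENSUS)


def candidate_sites(seq, min_score):
    """List (start, score) for 9-mers with GT at intron positions +1/+2
    whose score reaches min_score, ordered 5' to 3'."""
    width = len(CONSENSUS)
    found = []
    for start in range(len(seq) - width + 1):
        site = seq[start:start + width]
        if site[3:5] != "GT":
            continue
        score = match_score(site)
        if score >= min_score:
            found.append((start, score))
    return found


def select_by_scanning(seq, min_score=0.7):
    """Scanning rule: take the most upstream usable site."""
    found = candidate_sites(seq, min_score)
    return found[0] if found else None


def select_by_best_match(seq, min_score=0.7):
    """Competition rule: take the usable site closest to the consensus."""
    found = candidate_sites(seq, min_score)
    return max(found, key=lambda hit: hit[1]) if found else None


# Toy sequence (hypothetical): a usable but imperfect site upstream,
# followed by a perfect consensus match downstream.
toy = "TTT" + "CAGGTGGGT" + "TTTT" + "CAGGTAAGT" + "TTT"
scan_pos = select_by_scanning(toy)[0]
best_pos = select_by_best_match(toy)[0]
print(scan_pos, best_pos)
```

On this toy input the scanning rule picks the upstream site at position 3, whereas the best-match rule picks the perfect downstream site at position 16. The first outcome mirrors the behavior described above, in which the most upstream site is selected even when a better consensus match lies nearby.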