Pre-mRNA splicing in eukaryotes requires joining together the nucleotides of the various mRNA-coding regions (exons) after distinguishing them from the normally far more numerous non-mRNA-coding sequences (introns). For three excellent reviews on general splicing and its regulation, refer to references 14, 62, and 70. In eukaryotes, the vast majority of splicing processes are catalyzed by the spliceosome, a very complex RNA-protein aggregate which has been estimated to contain several hundred different proteins in addition to five spliceosomal snRNAs (1, 54, 62, 63, 81, 109). These factors are responsible for the accurate positioning of the spliceosome on the 5′ and 3′ splice site sequences. So many factors are needed because exon recognition can be affected by many pre-mRNA features such as exon length (5, 97), the presence of enhancer and silencer elements (8, 62), the strength of splicing signals (45), the promoter architecture (29, 55), and the rate of RNA processivity (86). In addition, the general cellular environment also exerts an effect, as recent observations suggest the existence of extensive coupling between splicing and many other gene expression steps (69) and even the modification of splicing by external stimuli (96).
In the midst of all this complexity, it has also been proposed that pre-mRNA secondary structures can influence splicing activity. However, despite a steady increase in reports invoking their effects on splicing regulation, the last specific review on this subject is now more than 10 years old (3). Here, we revisit this issue from the current perspective of the general field. Before we do this, however, we have to answer a basic question.
Two properties of RNA molecules cannot be denied: their natural tendency to form highly stable secondary and tertiary structures in vitro and in vivo (9, 27, 39) and the observation that alterations in these structures represent a well-known regulatory mechanism for many RNA cellular processes (60).
In this particular respect, however, a question that still remains to be addressed conclusively regards the presence of secondary structures in pre-mRNAs in vivo. Early experimental evidence indicates that their existence cannot simply be taken for granted. In fact, it was suggested that in vitro evidence regarding the possible influence of RNA structure on splicing (94) could not be accurately reproduced in vivo (95). The reason why this should be so goes back to the classical concept that RNA is coated in vivo by proteins. In fact, heterogeneous ribonucleoprotein particles have been known since early studies and the major protein family involved, the hnRNP proteins, are very abundant in mammalian cells. These RNA-protein interactions may well prevent mRNAs from folding into stable secondary structures (34) (Fig. 1a). For this reason, it was hypothesized that, following transcription, pre-mRNA may be allowed only a very limited timespan to fold (36). Consistent with this view, studies with artificial constructs used for the quantification of enhancer activities yielded results which supported the hypothesis that these pre-mRNA molecules behaved largely as a linear structure (44).
Notwithstanding these results, there are also some problems with the view that this situation may be applied to the vast majority of pre-mRNA molecules. Clearly, considering the enormously diverse sequences of all processed pre-mRNAs, it would be going too far to propose the presence of highly stable secondary structures (Fig. 1b) that resemble those of the highly conserved tRNAs, rRNAs, IRES, or other stability-, replication-, and localization-controlling elements present in several 3′UTRs of prokaryotic and eukaryotic mRNAs, in which proteins may also play a key role in stabilizing the structure (60). However, in between these two extremes there may exist a third possibility, represented by a loose collection of RNA-specific secondary structures which might, under normal conditions, influence the splicing machinery (Fig. 1c). Significantly, several studies along this line have been reported. For example, in organisms such as Saccharomyces cerevisiae, probing of pre-mRNA structures by dimethyl sulfate in vivo has demonstrated the existence of secondary structure formation between the 5′ splice site and the branch point capable of promoting U1snRNP assembly in the early splicing stages (21). Although there is no comparable evidence for human systems, it has been reported recently that single-nucleotide polymorphisms are capable of inducing in vivo different structural folds in mRNA structures (88) (however, the effect of these single-nucleotide polymorphisms on splicing or function has not yet been tested). In addition, statistical analysis of mRNA coding sequences has revealed that the calculated mRNA folding is more stable than expected by chance, suggesting that codon bias may favor the existence of mRNA structures (87).
Even though these results have been challenged using a different set of statistical tools and genes (107), considerations analogous to those of Seffens and Digby (87) have been recently reported concerning bacterial RNA (57).
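The kind of comparison behind a statement such as "more stable than expected by chance" can be illustrated with a small randomization test. The sketch below is only a toy under stated assumptions: it scores a sequence by the maximum number of nested base pairs (a Nussinov-style count, not a free-energy calculation), and it builds the null distribution by mononucleotide shuffling. Neither choice is taken from the studies cited above (57, 87, 107); the function names `max_pairs` and `shuffle_z_score` are hypothetical.

```python
import random
import statistics

# Watson-Crick and G-U wobble pairs (an assumption of this toy model).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs, with at least min_loop unpaired
    bases enclosed by every pair (Nussinov-style dynamic programming)."""
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case 1: base i stays unpaired
            for k in range(i + min_loop + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in PAIRS:
                    right = dp[k + 1][j] if k < j else 0
                    best = max(best, 1 + dp[i + 1][k - 1] + right)
            dp[i][j] = best
    return dp[0][n - 1]


def shuffle_z_score(seq, n_shuffles=200, seed=0):
    """Z-score of the native pair count against shuffled sequences with the
    same base composition. Positive values mean the native sequence pairs
    more than its shuffled versions do on average."""
    rng = random.Random(seed)
    native = max_pairs(seq)
    letters = list(seq)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        scores.append(max_pairs("".join(letters)))
    spread = statistics.pstdev(scores)
    if spread == 0:
        return 0.0  # no variation among shuffles, so no meaningful z-score
    return (native - statistics.mean(scores)) / spread


if __name__ == "__main__":
    hairpin = "GGGGGGAAAACCCCCC"  # made-up sequence, not a real transcript
    print(max_pairs(hairpin), shuffle_z_score(hairpin))
```

The choice of null model matters: a shuffle that preserves only base composition asks a different question from one that also preserves codon or dinucleotide content, so the same sequence can appear more or less "structured than chance" depending on how the randomization is done.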
An additional possibility to indirectly assess this issue is to investigate whether, and to what extent, the binding of splicing factors can be affected by or affect the RNA secondary structures. Clearly, any indications along these lines would represent a sound experimental basis for speculations regarding the role played by RNA secondary structure in splicing.
There are several excellent reviews regarding the topic of how proteins bind sequence specifically to single-stranded RNA (2, 30, 85) and a more recent report regarding binding to double-stranded sequences (17). Indeed, several reports in the recent literature suggest that RNA secondary structure plays an important role in binding. For example, binding of proteins to RNA (CNG)n trinucleotide repeats in vivo closely matches the in vitro results that predict that these repeats are folded in a characteristic hairpin shape (93). The most recent observation is that the predictive ability in the search for novel RNA binding targets for well-known proteins can be greatly enhanced if secondary structure is taken into consideration. A recent example of this is represented by HuR (68) (Fig. 2a), a protein that binds specific mRNA subsets and is involved in the posttranscriptional regulation of gene expression (68). Considering that an increasing number of RNA binding proteins behave like HuR, that is, seem to recognize loosely defined sequence motifs, it would not be surprising if in several cases RNA secondary structures represented constraining elements capable of shaping well-defined target regions in the presence of loose sequence conservation.
With regard to specific factors capable of affecting the splicing process, it has to be noted that the binding of several positive (B52, SRp55, and NOVA-1) and negative (hnRNP A1) regulators of splicing has been shown to depend on RNA secondary structures as well as on the target nucleotide sequences (10, 31, 78, 89). More recently, most major members of the SR protein family have been observed to be potentially affected by the conformation of a target RNA, which suggests that structural influences may be a widespread occurrence, at least for the components of this important family of splicing modifiers (13).
Interestingly, this relationship between splicing factors and the RNA spatial distribution may well go both ways, providing a potentially even greater level of flexibility in the control of splicing. For example, it has recently been proposed that binding of U2AF65 alone to the 3′ splice site has the result of “compacting” the RNA in such a way as to bring the 3′ splice site and the branch site region into close proximity to each other (59) (Fig. 2b). It should be noted that these studies were performed using an artificial short RNA (62 nucleotides [nt]) containing the branch region, a polypyrimidine tract, and a 3′ splice site, and thus further experiments will have to be performed in order to verify whether these effects play a wider role in vivo. Nonetheless, this finding shows that protein factors are not just passive players in the “binding and folding game,” and hopefully future studies will develop this emerging concept and its implications.
For obvious reasons, the earliest and most numerous reports regarding the ability of RNA secondary structures to affect the splicing process concern conserved key regions that define an exon (i.e., 5′ splice site, 3′ splice site, and branch site). These reports include many diverse organisms and genes. For example, they include viruses such as hepatitis B virus (67), adenovirus (22, 41, 76), human immunodeficiency virus type 1 (31, 52), Rous sarcoma virus (15), yeasts (21, 33, 42, 43, 47, 80, 102), plants such as Nicotiana plumbaginifolia (66), Drosophila (23), and rats and mice (26, 28, 105). In humans, secondary structures which affect the recognition of conserved splice site consensus sequences have been proposed in the generation of human growth hormone isoforms (37), the tau gene (46, 53, 100, 101, 108), the Hprt gene (49, 98), and the hnRNPA1 gene (6).
Although many of these cases contain individual peculiarities, there seem to be two rather intuitive and unifying mechanisms involved. The most common one is represented by the presence of structural elements which may hinder the accessibility of selected sequences to basic splicing factors. In this way they have been proposed to hinder intron processivity and promote skipping of the exon both in an artificial context (43) (Fig. 3a) and in the context of a pathological defect involving the human tau gene (100) (see Fig. 6 and below). Depending on the system analyzed, this inhibition has been observed to target only the acceptor site, the donor site, or both. With special regard to the 3′ splice site, however, it should be noted that recent attempts to correlate the presence of loosely defined secondary structures with 3′ splice site definition have resulted in a small (5 to 10%) but significant improvement in predictive ability (84), indicating that this region may be particularly sensitive to the presence of structured RNA.
The second mechanism involves a more indirect effect, whereby RNA secondary structures that do not involve the conserved splicing sequences can nonetheless vary the relative distance between these elements. These changes can then determine considerable variation in splice site usage or efficiency. An example of such an event has been seen to occur in the yeast Kluyveromyces lactis actin pre-mRNA, where varying the distance between the branch point element and two potential 3′ splice sites determines efficient use of the distal acceptor site (33) (Fig. 3b). Alternatively, structural constraints may also have the effect of indirectly promoting branch site use by keeping it in a single-stranded accessible configuration, such as was described for the Drosophila Adh gene (23) (Fig. 3c).
In addition to splicing consensus sequences, there is also a smaller (but ever increasing) number of cases where structural constraints have been described to affect less-defined cis-acting sequences such as exonic/intronic splicing enhancers (ESE/ISE) or silencer elements (ESS/ISS) (8, 92).
For example, a human-specific ESS sequence in the fibronectin EDA exon has been shown to affect the binding of SR proteins to an ESE sequence which lies 13 nt upstream in the primary RNA sequence. Under normal conditions, the function of this ESS sequence has been proposed to be the stabilization of the secondary structure of the ESE sequence in such a way as to allow binding of SR proteins (77). Additional characterization of the ESS/ESE system in the mouse and human EDA exons showed that while human and mouse ESE sequences behaved in an identical fashion, mutations introduced in the mouse ESS sequence (putatively identified by sequence homology) had no effect on exon splicing. Structural analysis of the mouse EDA exon showed that, despite the few nucleotide changes in its sequence (8 of 270) relative to the human exon, the two RNA secondary structures differed considerably (13). By comparing how the mouse structure responded to homologous deletions in its putative ESS sequences, it was finally demonstrated that changes in splicing behavior with respect to the human ESS sequence could be accounted for by a conformational shift from a loop to a stem in the ESE structure (Fig. 4a). This shift prevented binding of SF2/ASF and resulted in exon skipping without directly modifying the SF2/ASF binding motif (13). Different structural constraints in mice and humans could therefore account for what appeared to be contradictory splicing behavior.
A somewhat analogous situation has also been recently described for the SMN1/SMN2 genes; Miyaso et al. (74) have identified an ISE element consisting of a conserved 24-nt stem-loop structure in intron 7. Disruption of this secondary structure leads to loss of binding of an as-yet-unidentified trans-acting factor, and this can influence the splicing process (but only in the presence of the C-to-T transition which occurs in position 6 of exon 7). Considering that this transition has been shown to involve directly several protein-binding signatures such as SF2/ASF (19) and hnRNP A1 (56) and is close to a Tra2-β1 (51) binding site, it will be interesting to analyze the potential interplay between all these factors and the identified ISE element. Significantly, an in silico search made by Miyaso et al. has shown that this element seems to be present in a variety of intron sequences from several genes, raising the possibility that this structurally defined ISE may play a wider role in the general splicing field (74). Finally, from a structural point of view it has to be noted that this exon may also harbor a stem-loop element near its 3′ splice site region (90), although the effect of this structure on exon 7 splicing still remains to be determined.
A different mechanism from the ones presented above has been recently proposed for the human FGFR2 gene. In this case, the formation of a double-stranded RNA created by the joining of two single-stranded elements (creating a loop of 735 nt) was initially observed to regulate splicing of the mutually exclusive IIIb and IIIc exons (32, 75). Mutational studies have demonstrated that the fundamental feature is in the double-stranded structure and not in the FGFR2-specific sequences. Further work on the subject has recently suggested that the function of this structure would be to approximate an intronic control element that inactivates a previously mapped ISS sequence localized near the IIIb exonic sequence (4). In fact, as shown in Fig. 4b, in linear conditions this novel intronic control element would be too far away to have any effect on the functioning of the ISS element. Interestingly, a phylogenetic analysis of this structure from sea urchin to humans has demonstrated that functional conservation of this structure has been maintained for over 600 million years, highlighting the resilience of mRNA secondary structures during evolution independently of the specific nucleotide sequences (73).
A still very obscure mechanism through which RNA secondary structure has been proposed to influence the splicing process is by affecting higher-order structures of the pre-mRNA molecule. The first evidence that alterations of extensive secondary structural elements involving both exonic and intronic sequences were responsible for splicing alterations came from the analysis of the chicken β-tropomyosin gene (25, 64) and the dystrophin gene (71). A somewhat related concept has been recently taken up again concerning the presence of conserved polypurinic and polypyrimidinic sequences in the intronic regions of a variety of genes (72) which might be able to pair off with each other and thus exclude particular exons from the splicing “queue.” It remains unclear, however, how these structures might be responsible for exon skipping, although for the chicken β-tropomyosin gene recent work indicates that under conditions that favor RNA structure formation there is a generalized interference with U1-U6 snRNP interactions (91) (Fig. 5a). Further experimentation will hopefully clarify this point. For example, in the rp51b intron of S. cerevisiae the efficient splicing of a 325-nt intron requires the pairing of two short interacting sequences which are normally 200 nt apart, an event which probably facilitates cooperative interactions between intron-spanning factors (65).
An analogous mechanism which has received extended experimental testing in recent years has also been described to occur in the hnRNPA1 gene (7, 79). In this case, the hnRNP A1 factor itself has been shown to bind on either side of an exon and directly promote exclusion through a “looping out” mechanism (Fig. 5b). At present, the proposed mechanisms of action involve an active hindrance of the looped-out 5′ splice site (probably in postcommitment processing steps because U1snRNP binding did not seem to be altered in the looped-out exon) (20) and/or approximation of the distal 5′ splice site, potentially providing a competitive advantage. Interestingly, a similar situation may also occur in the splicing regulation of the neuron-specific c-src exon N1 by the polypyrimidine tract binding protein. Also in this case, polypyrimidine tract binding protein binding on either side of this exon and looping out the N1-containing RNA may contribute to its suppression and prevent its inclusion in nonneuronal cells (24, 103).
Considering the evident complexity involved in correct pre-mRNA processing it is not surprising that splicing alterations have been increasingly reported as being involved in many genetic diseases. This review is deliberately not intended to be an exhaustive analysis of the ever-growing connections between mutations, splicing, and disease, as this has been the subject of several excellent recent reviews (16, 18, 38, 40, 82, 83). However, among this ever-increasing body of evidence that links splicing with disease it is worthwhile to point out that changes in RNA structure have also been invoked to play a role in pathogenic processes involving the dystrophin gene (71), the NF-1 gene (50, 58), and, more recently, the CFTR gene (48). In these cases, however, there is no evidence by experimental probing that the proposed structures follow the in silico predictions. Furthermore, in IVS8 of the CFTR gene it is still unclear how the predicted structures relate to splicing factors binding in the same position (11, 12, 104). At this stage, a word of caution is warranted, because these described examples are principally based on association studies between splicing activity and in silico predictions of pre-mRNA structures such as those obtainable by Mfold (110) or Pfold (61). The drawback of these approaches is that computer algorithms provide a folding prediction (and often more than one) for virtually any RNA sequence and are strongly biased by the length of the RNA sequence examined. For this reason, although in silico predictions represent an invaluable tool for the researcher in this field, special care should be exercised when predicted pre-mRNA structures are correlated with splicing behavior. As an example, in silico studies of NF-1 gene transcripts (50, 58), which are implicated in the generation of human tumors, have been challenged by successive reports (99, 106).
In fact, these studies have shown that the analyses reporting correlations between in silico predicted changes in secondary structure and splicing in these systems are heavily dependent on the RNA window taken into consideration, making it very difficult to assign significance to the suggested correlations. An analogous situation has occurred concerning the splicing control of exon 2 in the human hprt gene, where the proposed role of RNA secondary structure based on in silico evidence (49) has not received any support in a more recent analysis performed using updated parameters and including part of the flanking intron sequences (98).
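The window dependence described above can be made concrete with a deliberately simple folding model. The sketch below is an illustration only: it uses a Nussinov-style base-pair maximization with traceback, which is far cruder than the energy-based or probabilistic methods of Mfold or Pfold, and it runs on a made-up 16-nt sequence rather than on any transcript discussed here. The function names `fold` and `partner_in_window` are hypothetical.

```python
# Watson-Crick and G-U wobble pairs (an assumption of this toy model).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def fold(seq, min_loop=3):
    """Return a list in which entry i is the index paired with base i, or None.

    One structure with the maximum number of nested pairs is reported; ties
    between equally good structures are broken by the traceback order.
    """
    n = len(seq)
    partner = [None] * n
    if n == 0:
        return partner
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # base i unpaired
            for k in range(i + min_loop + 1, j + 1):  # base i paired with k
                if (seq[i], seq[k]) in PAIRS:
                    right = dp[k + 1][j] if k < j else 0
                    best = max(best, 1 + dp[i + 1][k - 1] + right)
            dp[i][j] = best
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # interval too short to contain a pair
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                right = dp[k + 1][j] if k < j else 0
                if dp[i][j] == 1 + dp[i + 1][k - 1] + right:
                    partner[i], partner[k] = k, i
                    stack.append((i + 1, k - 1))
                    if k < j:
                        stack.append((k + 1, j))
                    break
    return partner


def partner_in_window(seq, pos, start, end):
    """Fold only seq[start:end] and report, in full-sequence coordinates,
    the predicted partner of position pos (None if predicted unpaired)."""
    local = fold(seq[start:end])[pos - start]
    return None if local is None else local + start


if __name__ == "__main__":
    rna = "CCCCAAAAGGGGAAAA"  # made-up sequence for illustration
    for start, end in [(8, 16), (0, 16)]:
        print((start, end), partner_in_window(rna, 8, start, end))
```

In this toy case the G at index 8 is predicted to be unpaired when only the window 8 to 16 is folded, but paired with the C at index 3 once the upstream flank is included. The same position therefore appears "structured" or "unstructured" depending solely on the window supplied to the algorithm, which is why a correlation with splicing behavior based on a single window is difficult to interpret.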
At present, direct experimental evidence for a role played by secondary structure in the generation of human disease is best represented by the work performed on the mutations that affect inclusion of exon 10 in the tau gene (46, 53, 100, 101, 108), although it should be noted that one mutational study does not support these conclusions (35). Mutations in the tau gene have been associated with frontotemporal dementia and parkinsonism. In particular, mutations in the intronic region near the 5′ splice site of exon 10 correlate closely with alterations in a characteristic stem-loop structure which has been determined by nuclear magnetic resonance spectroscopy (100) (Fig. 6a). Extensive mutational analyses (46, 53) of this region and functional binding studies to monitor U1snRNP binding to the splice site (53) have shown that mutations which destabilize the helix result in an increased splice site usage owing to an increase in U1snRNP binding (53) (see the schematic diagram in Fig. 6b). Notably, the fact that a small antibiotic, neomycin, can bind to this region and stabilize the stem-loop configuration represents a promising start in the search for therapeutic agents that exploit structural motifs (101).
In conclusion, the picture that is beginning to emerge clearly favors the possibility that many (if not most) pre-mRNA sequences are quite capable of harboring selected regions which can fold in well-defined secondary structures in vivo. Evolutionarily this is probably not a chance occurrence, as lack of structure would certainly deprive the splicing process of an additional regulating mechanism while too much structure would end up interfering with later complex assembly steps and other layers of regulation. The functional mechanisms investigated so far mostly involve two kinds of mechanistic explanations: the occlusion/exposure of key cis-acting regulatory elements or the spatial modification of the distance between these elements. At present, the principal limitation in identifying these events is that our predictive abilities are still rather limited and safe judgement can be made only through implementation of robust functional studies and experimental probing of proposed RNA structures.
This work was supported by grants from the Telethon Onlus Foundation (Italy) (grant no. GGP02453) and FIRB (grant no. RBNE01W9PM).