Exonic splicing enhancers (ESEs) are important cis elements required for exon inclusion. Using an in vitro functional selection and amplification procedure, we have identified a novel ESE motif recognized by the human SR protein SC35 under splicing conditions. The selected sequences are functional and specific: they promote splicing in nuclear extract or in S100 extract complemented by SC35 but not by SF2/ASF. They can also function in a different exonic context from the one used for the selection procedure. The selected sequences share one or two close matches to a short and highly degenerate octamer consensus, GRYYcSYR. A score matrix was generated from the selected sequences according to the nucleotide frequency at each position of their best match to the consensus motif. The SC35 score matrix, along with our previously reported SF2/ASF score matrix, was used to search the sequences of two well-characterized splicing substrates derived from the mouse immunoglobulin M (IgM) and human immunodeficiency virus tat genes. Multiple SC35 high-score motifs, but only two widely separated SF2/ASF motifs, were found in the IgM C4 exon, which can be spliced in S100 extract complemented by SC35. In contrast, multiple high-score motifs for both SF2/ASF and SC35 were found in a variant of the Tat T3 exon (lacking an SC35-specific silencer) whose splicing can be complemented by either SF2/ASF or SC35. The motif score matrix can help locate SC35-specific enhancers in natural exon sequences.
Accurate removal of introns from pre-mRNA requires multiple cis elements, including the splice sites, polypyrimidine tract, branch site, and other intronic and exonic sequences that have positive or negative effects on splicing (10, 31, 44); reviewed in references (1 and 2). Positive-acting sequences, termed exonic splicing enhancers (ESEs) (37, 39, 42), have been identified primarily in exons associated with regulated splicing. These exons are typically adjacent to introns with weak intronic splicing signals and require ESEs for their inclusion. Deletion of an ESE often causes exon skipping or, in the case of terminal exons, suppresses removal of the last intron. One of the first characterized ESEs is located in the M2 3′-terminal exon of the mouse immunoglobulin M (IgM) gene (39). This 73-nucleotide (nt) ESE, which is highly purine rich, is required for inclusion of the alternatively spliced M2 exon. However, deletion of just the purine-rich sequences within this ESE does not abolish splicing completely. The M2 ESE also functions in a heterologous context to enhance splicing of a Drosophila melanogaster doublesex intron (39).
A SELEX procedure has been used to identify sequences that can function as ESEs (36). A 20-nt sequence of the internal duplicated exon of a model pre-mRNA was replaced by 20 nt of random sequence. The randomized pre-mRNAs were incubated under splicing conditions in nuclear extract, and functional enhancer elements that promoted splicing were selected. A large number of sequences, both purine rich and non-purine rich, were obtained, and the two types of sequences stimulated exon inclusion to similar extents. A similar approach was used in an in vivo system involving transfection of a troponin minigene with random sequences in place of a natural ESE (9). Purine-rich sequences and a novel class of AC-rich ESE sequences were identified. The AC-rich sequences are efficient splicing enhancers and can also function in a heterologous gene context.
Considerable evidence suggests that ESEs interact specifically with a family of RNA-binding proteins called SR proteins, which are characterized by one or two RNA recognition motifs (RRMs) and a C-terminal Arg-Ser-rich domain (12, 18, 27, 34, 37, 38). SR proteins are essential splicing factors required for both constitutive and alternative splicing (11, 17, 43). SR proteins can determine alternative splice site selection by antagonizing the activity of hnRNP A/B proteins. High concentrations of SR proteins usually favor the use of proximal splice sites and exon inclusion, whereas high concentrations of hnRNP A/B proteins tend to favor distal splice sites and exon skipping (5, 22, 25). SR proteins also specifically recognize ESEs, and the resulting complex may then stimulate U2AF binding to the weak polypyrimidine tract of the upstream 3′ splice site. The ESE–SR protein–U2AF interaction is thought to be important during the early stages of spliceosome assembly (8, 16, 41, 45), although recent evidence suggests that, in at least some cases, including the IgM M2 exon, ESEs act in part by neutralizing exonic silencer elements (7, 15). SF2/ASF and SC35 are two of the best characterized among the nine human SR proteins identified to date. Both proteins have been implicated in many aspects of constitutive and regulated splicing. Both are found in the prespliceosomal E complex and can interact with U1-70K and U2AF by RS domain-mediated protein-protein interactions. The RRMs of these two proteins are responsible for their unique substrate specificities (6, 26).
A better understanding of the functional interactions between ESEs and SR proteins depends on knowledge of the sequence specificity of all SR proteins. To this end, we recently performed an iterative selection under splicing conditions to identify exon sequences that can enhance splicing in the presence of each of three SR proteins. We identified three novel classes of functional ESE motifs recognized specifically by SF2/ASF, SRp40, and SRp55. The consensus motifs indicated that individual SR proteins recognize distinct and highly degenerate sequences (20). The three SR proteins we studied previously are closely related, i.e., they all have two tandem RRMs. To extend this analysis, we have now determined the sequence specificity of an additional, extensively studied SR protein, SC35, which has a single N-terminal RRM.
HeLa nuclear and S100 extracts were prepared as described (23). Recombinant SC35 expressed in baculovirus was generously provided by K. Lynch and T. Maniatis and by R.-M. Xu.
The amplification and selection procedure was carried out as described (20). Briefly, the natural ESE of the IgM pre-mRNA was replaced by 20 nt of random sequence by overlap-extension PCR with plasmid μMΔ DNA (39) as a template. The resulting PCR product was used for in vitro transcription to generate a 32P-labeled random pre-mRNA pool. Twenty femtomoles of the pre-mRNA pool was incubated under in vitro splicing conditions in S100 extract plus recombinant SC35 in a 25-μl reaction mixture. The RNA was separated by denaturing polyacrylamide gel electrophoresis, and the spliced mRNAs were excised and eluted from the gel in 0.5 M ammonium acetate plus 0.1% sodium dodecyl sulfate and reamplified by reverse transcription-PCR (RT-PCR). Reverse transcription was carried out by using Superscript II as described by the manufacturer (Life Technologies). PCR was performed by using high-fidelity Pfu polymerase as specified by the manufacturer (Stratagene). The PCR product was subcloned into the vector PCR-Blunt (Stratagene) and sequenced by using a Dye Terminator Cycle Sequencing kit (Perkin-Elmer) and an automated ABI 377 sequencer. Selected winner sequences were rebuilt into DNA templates for transcription of pre-mRNAs by overlap-extension PCR, as done initially for the random sequences (20).
The selected sequences of each SR protein winner pool plus a portion of the flanking nucleotides were aligned by using Gibbs sampler (19). The identified consensus motif was then used to generate a score matrix. The compositional bias of the initial RNA pool was taken into account. For details of the sequence analysis, see reference (20).
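The matrix construction described above can be illustrated with a minimal sketch. It assumes a log-odds scheme (log2 of the observed nucleotide frequency over the background frequency, with a small pseudocount); the scoring formula actually used is given in reference (20) and may differ. The aligned octamers are hypothetical placeholders, and of the background frequencies only the 19% C content is taken from the text; the other three values are assumptions chosen to sum to 1.

```python
import math
from collections import Counter

BASES = "ACGU"


def build_score_matrix(aligned_motifs, background, pseudocount=0.5):
    """Return one {base: score} dict per motif position.

    Assumed scoring: log2(observed frequency / background frequency).
    """
    length = len(aligned_motifs[0])
    if any(len(m) != length for m in aligned_motifs):
        raise ValueError("all aligned motifs must have the same length")
    n = len(aligned_motifs)
    matrix = []
    for pos in range(length):
        counts = Counter(m[pos] for m in aligned_motifs)
        column = {}
        for base in BASES:
            # Pseudocount keeps unseen bases from producing log(0).
            freq = (counts[base] + pseudocount) / (n + pseudocount * len(BASES))
            column[base] = math.log2(freq / background[base])
        matrix.append(column)
    return matrix


def score_motif(matrix, motif):
    """Sum the per-position scores of a motif of the same width as the matrix."""
    return sum(column[base] for column, base in zip(matrix, motif))


# Hypothetical aligned octamers (not the 30 winner sequences from the study).
toy_motifs = ["GGCCCCUG", "GACUCGCA", "GGUCCCUG", "GGCCGCAG"]
# Only C = 0.19 comes from the text; A, G, and U are illustrative assumptions.
toy_background = {"A": 0.25, "C": 0.19, "G": 0.28, "U": 0.28}

toy_matrix = build_score_matrix(toy_motifs, toy_background)
print(round(score_motif(toy_matrix, "GGCCCCUG"), 2))
```

Dividing by the background frequency is what "taking the compositional bias into account" amounts to in this sketch: a nucleotide that is already common in the starting pool earns less credit for appearing at a motif position.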
PCR products carrying an SP6 or T7 promoter were used for in vitro transcription. 5′-capped transcripts were incubated in 25-μl splicing reaction mixtures as previously described (24). Each reaction mixture had 4 μl of nuclear extract or 7 μl of S100 extract. For S100 complementation assays, 20 pmol of specific SR protein was used. Splicing reactions were carried out at 30°C for 4 h. The RNA was then extracted, loaded on 6 or 12% polyacrylamide gels, and visualized by autoradiography (20). DNA templates for IgM M1-M2 pre-mRNAs with a D2 variant containing an SC35 consensus match or with the 6-24 winner sequence (29) were made by overlap-extension PCR with primers M2-D2HXL (GTGAAATGACTCTCAGCATggggacatactcggcccctgCTAGTAAACTTATTCTTACGT) and M2-SCH24 (GTGAAATGACTCTCAGCATtttgcggtctccggcctccCTAGTAAACTTATTCTTACGT), respectively (shared flanking sequences are in uppercase letters). DNA templates for pre-mRNAs in an IgM C3-C4 context were made by PCR on pμC3-C4 plasmid DNA (40) with an SP6 promoter primer and the following antisense primers: Ca (TGGCAGCAGGTACACAGC), CaCb (gtggctgactccctcagg), D2 (ctgcggccgagtatgtccccTGGCAGCAGGTACACAGC), D2C (caggggccgagtatgtccccTGGCAGCAGGTACACAGC), and 6-24 (ggaggccggagaccgcaaaTGGCAGCAGGTACACAGC). RNAs were made as described above.
To study the sequence specificity of ESE recognition by SC35 under splicing conditions, a functional SELEX procedure (20) was used (Fig. 1). Functional ESEs were selected in the context of a well-characterized mouse immunoglobulin μ heavy chain minigene transcript, comprising the last intron flanked by the M1 and M2 exons (39). The natural ESE in the M2 exon was replaced by 20 nt of random sequence by overlap-extension PCR. The random RNA pool, a library of pre-mRNAs representing 1.2 × 10^10 different molecules, was spliced in nuclear extract or in S100 extract complemented by SC35. As previously reported, the wild-type IgM pre-mRNA spliced very efficiently in nuclear extract, with the mature mRNA representing greater than 90% of the RNA after a 4-h incubation (Fig. 2, lane 1). In contrast, the mutant with a deletion of the ESE (ED) did not splice at all under the same conditions (Fig. 2, lane 2), confirming that the natural ESE of the IgM pre-mRNA is essential for splicing (39). The initial RNA pool was spliced in nuclear extract with an apparent efficiency of about 20% (Fig. 2, lane 3), whereas no splicing was detected in the S100 extract alone (lane 4). When the S100 extract was complemented by SC35, splicing of the initial RNA pool remained undetectable by autoradiography (lane 5). However, we assumed that a very small fraction of the RNA pool was correctly spliced, and we excised a gel slice corresponding to the position of spliced mRNA, using the product in lane 3 as a marker. RNA was eluted from the gel slice and amplified by RT-PCR. The amplified products were cloned, and 30 clones were sequenced. The resulting sequences were analyzed by using the Gibbs sampler program to determine a consensus sequence (19, 20). A score matrix was generated according to the frequency of each nucleotide at each position of the consensus motif, adjusted for the compositional bias of the initial random pool. This score matrix was used to identify high-score motifs within each winner sequence, taking into account the randomized sequence region and a small portion of the flanking sequences.
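As a quick check on the scale of the selection, the following sketch converts the 20 fmol of pre-mRNA used per reaction into a number of molecules and compares it with the number of possible 20-nt sequences. The result appears consistent with the stated pool size of about 1.2 × 10^10 molecules. The "fraction sampled" is an upper bound that assumes every molecule in the pool carries a distinct sequence, which is an assumption of this sketch rather than a claim from the study.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_from_femtomoles(fmol):
    """Convert an amount in femtomoles to a number of molecules."""
    return fmol * 1e-15 * AVOGADRO


pool_molecules = molecules_from_femtomoles(20)  # 20 fmol per selection reaction
sequence_space = 4 ** 20                        # all possible 20-nt sequences
# Upper bound: assumes no two molecules in the pool share a sequence.
sampled_fraction = pool_molecules / sequence_space

print(f"{pool_molecules:.2e} molecules in the pool")
print(f"{sequence_space:.2e} possible 20-mers")
print(f"at most {sampled_fraction:.1%} of sequence space sampled")
```

Because the pool covers only on the order of 1% of all possible 20-mers, an optimal sequence need not be present at all, which fits the later observation that the maximum-score octamer was not recovered among the sequenced winners.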
The SC35 winner sequences after a single round of selection yielded the short degenerate octamer consensus motif GRYYcSYR (Fig. 3). The C-residue content within the randomized 20-nt segment increased from 19% in the initial pool to 23% after a single round of selection. This change in C composition occurred at the expense of slight reductions in the content of G, A, and U residues. As reported previously for our similar analysis of other SR proteins, the SC35 consensus sequence is highly degenerate. Several of the winner sequences have more than one high-score motif (Fig. 3A). The scores of the 30 SC35 winner sequences range from 1.19 to 3.55, with a mean score of 2.56 ± 0.56. Thirty individual sequences cloned and randomly selected from the initial pool (20) gave a range of scores from 0.64 to 3.23, with a mean score of 1.62 ± 0.72, when searched by the same score matrix. Only 3 sequences in the control pool had scores higher than the mean of the winner pool, whereas 16 sequences in the winner pool had scores higher than this mean, and 28 had scores higher than the mean of the control pool. The difference in the means of the scores between the two sequence pools is highly significant (P < 10^-7, t test with df = 58).
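The comparison of the two pools can be retraced from the summary statistics alone. The sketch below computes a two-sample t statistic under two assumptions that the text does not state explicitly: that the ± values are standard deviations, and that an equal-variance (pooled) Student's t test was used, which is at least consistent with the reported df = 58 for two groups of 30.

```python
import math


def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t statistic from summary statistics.

    Returns (t, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Winner pool: 2.56 +/- 0.56 (n = 30); control pool: 1.62 +/- 0.72 (n = 30).
t_stat, df = pooled_t_statistic(2.56, 0.56, 30, 1.62, 0.72, 30)
print(f"t = {t_stat:.2f}, df = {df}")
```

A t statistic of this size with 58 degrees of freedom corresponds to a very small P value; converting it to an exact P value would require a t-distribution routine, for example from a statistics library, and the result depends on whether a one-tailed or two-tailed test is assumed.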
The highest possible score for a single octamer is 3.95, corresponding to the sequence GGCCCCUG (Fig. 3B). This precise sequence does not occur in any of the 30 winner sequences analyzed. The absence of a perfect motif in the selected sequences may reflect the small sample size or the fact that a linear consensus sequence or nucleotide frequency matrix assumes an independent contribution at each position, an assumption that may or may not fit the actual recognition mechanism (33).
We next tested whether the individual sequences selected in the presence of SC35 could function as true splicing enhancers. Five sequences with a range of scores were arbitrarily chosen from the 30 analyzed sequences and individually rebuilt into IgM M1-M2 pre-mRNAs with the same structure as those in Fig. 1 and 2, using overlap-extension PCR and in vitro transcription (20). Each pre-mRNA was then incubated under splicing conditions in nuclear extract or in S100 extract complemented by SC35 (Fig. 4). Four of the five SC35 winner sequences activated IgM pre-mRNA splicing very efficiently in nuclear extract (Fig. 4, lanes 1, 4, 7, and 10). They also promoted IgM pre-mRNA splicing in S100 extract complemented by SC35, albeit less efficiently (Fig. 4, lanes 3, 6, 9, and 12), but not in S100 extract alone (Fig. 4, lanes 2, 5, 8, and 11). One winner sequence from the SC35 winner pool, D5, enhanced splicing less efficiently in nuclear extract (Fig. 4, lane 13) and gave only trace activity in the complementation assay (Fig. 4, lane 15). In general, the splicing efficiency correlated with the motif scores shown in Fig. 3. D1 and D2 have the highest scores; D3 and D4 have intermediate scores; and D5 has the lowest score among the 30 sequences analyzed (Fig. 3). However, the correlation between splicing efficiency and motif scores is not linear, presumably reflecting sequence context effects. Also, D3 has a higher score than D4, and although they spliced with similar efficiency in nuclear extract, D4 spliced more efficiently in the complementation assay. Sixteen sequences from the random RNA pool were also analyzed for enhancer activity (20). All of them spliced in nuclear extract poorly or not at all. In most cases the pre-mRNAs showed partial degradation, suggesting that spliceosomal complexes did not assemble on these RNAs (H.-X. Liu and A. R. Krainer, unpublished data).
Next, we determined the SR protein specificity of the SC35-selected ESEs. Pre-mRNAs with the different winner sequences were separately incubated under splicing conditions in S100 extract complemented by SC35, SF2/ASF, SRp40, or SRp55. All of the tested SC35 winners promoted splicing with higher efficiency in S100 extract when the extract was complemented by SC35 (Fig. 5A, lanes 3, 7, 11, 15, and 19), SRp40, or SRp55 (Liu and Krainer, unpublished). When the extract was complemented by SF2/ASF, the splicing efficiencies were much lower (Fig. 5A, lanes 4, 8, 12, 16, and 20). In contrast, five SF2/ASF-selected winners promoted splicing in S100 extract complemented by either SF2/ASF (Fig. 5B, lanes 4, 8, 12, 16, and 20) (20) or SC35 (Fig. 5B, lanes 3, 7, 11, 15, and 19) with comparable efficiencies. These SF2/ASF winners promoted splicing very poorly or not at all in the presence of SRp40 or SRp55 (20).
To test whether an octamer with the highest possible SC35 ESE score has enhancer activity and to compare this consensus with a previously identified one, we analyzed the D2 winner containing the motif GGCCGCAG, a variant of D2 with two transversions that create the maximum score consensus GGCCCCUG, and one of the 19mer winners (6-24) selected by Schaal and Maniatis (29). These three sequences were first tested in the context of the IgM M2 exon (Fig. 6A). All three sequences strongly promoted splicing of exons M1 and M2 in nuclear extract (Fig. 6A, lanes 3 to 5), in contrast to the lack of detectable splicing with the parent pre-mRNA in which the natural ESE was deleted (Fig. 6A, lane 1).
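The relationship between the two octamers above and the degenerate consensus GRYYcSYR can be checked position by position. The sketch below uses the standard IUPAC ambiguity codes (R = A or G, Y = C or U, S = C or G) and, as a simplifying assumption, treats the lowercase c as a strict C even though the lowercase presumably marks a weaker preference. The function name is illustrative.

```python
# Standard IUPAC ambiguity codes, restricted to those used in the consensus.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG",  # purine
    "Y": "CU",  # pyrimidine
    "S": "CG",  # strong (three hydrogen bonds)
}

CONSENSUS = "GRYYcSYR"


def consensus_matches(octamer, consensus=CONSENSUS):
    """Count positions at which the octamer agrees with the consensus.

    Assumption: the lowercase 'c' is treated as a strict C.
    """
    if len(octamer) != len(consensus):
        raise ValueError("octamer and consensus must have the same length")
    return sum(
        base in IUPAC_RNA[code]
        for base, code in zip(octamer.upper(), consensus.upper())
    )


print(consensus_matches("GGCCCCUG"))  # maximum-score octamer
print(consensus_matches("GGCCGCAG"))  # motif in the D2 winner
```

Under these assumptions the maximum-score octamer agrees with the consensus at all eight positions, whereas the D2 motif differs at positions 5 and 7, the two sites changed by the transversions (G to C and A to U).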
Next we tested the same three ESEs in a different exonic context, namely the C4 exon derived from a different region of the IgM pre-mRNA. When this exon is divided into three segments, Ca, Cb, and Cc, the Cc segment is dispensable, whereas the Cb segment behaves as an SC35-specific ESE (26). Indeed, a shortened 3′ exon consisting of the Ca and Cb segments of C4 spliced to exon C3 much more efficiently in nuclear extract than one consisting of Ca alone (Fig. 6B, lanes 1 and 2). When the Cb segment was replaced by each of the above three ESEs, all of them promoted splicing above the background of Ca alone (Fig. 6B, lanes 4 to 6). However, the D2 winner ESE was as strong as the natural Cb ESE, the 6-24 ESE was slightly less efficient, and the perfect consensus was the least active. These results show that both our SC35 motif and a winner sequence identified in a previous study (29) can function in different exonic contexts, although the precise context can influence the extent of enhancement.
To determine whether the selected ESE motifs are relevant to splicing of natural pre-mRNA substrates, we conducted a search of SC35 high-score motifs in natural genes. Only scores higher than the lowest score of the SC35 winner pool are shown (Fig. 7, green vertical bars). For comparison, we also show the high-score SF2/ASF motifs in the same genes (Fig. 7, blue vertical bars) (20). The first natural sequence we examined was the M2 exon of the IgM gene. The search result indicated that there are many SC35 ESE motifs within the segment comprising the previously characterized natural ESE (Fig. 7A, magenta horizontal bar). The distribution of high-score SC35 motifs differs from that of SF2/ASF motifs. SF2/ASF-specific motifs are present at a higher density within the natural ESE than in the flanking regions. In contrast, high-score SC35 motifs have a relatively even distribution across the M2 exon. Both SR proteins can promote splicing of this pre-mRNA in S100 extract (Liu and Krainer, unpublished). The presence of ESE motifs in regions lacking enhancer activity shows that although the motifs may be necessary, they are not sufficient for ESE function (see Discussion).
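A motif search of this kind amounts to sliding the score matrix along the exon one nucleotide at a time and keeping the windows whose summed score exceeds a threshold. The sketch below shows the idea with a hypothetical 4-position matrix and a made-up sequence; neither the matrix values nor the threshold are those of the SC35 matrix, which has eight positions and used the lowest winner-pool score as its cutoff.

```python
def scan_sequence(sequence, matrix, threshold):
    """Return (1-based start, window, score) for every window scoring above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(matrix, window))
        if score > threshold:
            hits.append((start + 1, window, score))
    return hits


# Hypothetical matrix, one {base: score} dict per position (illustration only).
toy_matrix = [
    {"A": -1.0, "C": -1.0, "G": 1.0, "U": -1.0},
    {"A": 0.5, "C": -1.0, "G": 0.5, "U": -1.0},
    {"A": -1.0, "C": 0.5, "G": -1.0, "U": 0.5},
    {"A": -1.0, "C": 1.0, "G": -1.0, "U": -1.0},
]
toy_exon = "AUGGCCAGUCGACC"  # made-up sequence

for start, window, score in scan_sequence(toy_exon, toy_matrix, threshold=2.0):
    print(start, window, score)
```

Overlapping windows are scored independently, so a single stretch of sequence can contribute several hits; this is one reason a motif count per exon is not the same as a count of independent binding sites.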
To address the issue of whether the identified ESE motifs are specific to SC35, we searched two additional pre-mRNA substrates that are known to have different SR protein specificities. Splicing of the IgM C3-C4 pre-mRNA is activated in S100 extract when complemented by SC35 but not by SF2/ASF (26). In contrast, splicing of the human immunodeficiency virus Tat T2-T3 pre-mRNA is activated by SF2/ASF but not by SC35 in S100 extract (6, 26). When an SC35-specific splicing silencer in the 3′ region of the T3 exon is deleted, both SF2/ASF and SC35 can activate T2-T3 splicing in S100 extract. Detailed analysis of the splicing of these two pre-mRNAs indicated that the C4 and T3 exons determine the SR protein specificity (26). Our search result matches the experimental data (Fig. 7B and C). Many high-score motifs matching the consensus of SC35 were found in the C4 exon, but only two well-separated SF2/ASF motifs were found in this exon (Fig. 7B). Interestingly, in a deletion mutant missing the first 38 nt of the C4 exon, splicing of C3-C4 was activated by both SF2/ASF and SC35 (26). Consistent with this result, the SF2/ASF motif near position 61 is closer to the 3′ splice site in the deletion mutant. High-score motifs for both SF2/ASF and SC35 were found in the T3 exon of the tat gene (Fig. 7C). Curiously, a single SC35 high-score motif is present within the SC35-specific silencer region.
Finally, we studied the distribution of SC35 high-score motifs in human exons versus introns. A total of 570 genes, representing 2,626 exons (426 kb) and 2,079 introns (1,295 kb), were extracted from the ALLSEQ database (4) and analyzed. Scores equal to or higher than the mean score of the winner pool were taken into account. High-score motifs appeared more frequently in exons than in introns. An average of nine SC35 high-score motifs were found per kilobase of exon compared to only 5.9 per kilobase of intron. This comparison was statistically significant because of the large database size (P < 10^-10).
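The densities quoted above can be turned back into approximate motif counts and a fold enrichment. This is only an arithmetic sketch: the per-kilobase densities and total lengths come from the text, whereas the implied counts are back-calculated from rounded values and are therefore approximate, not figures reported by the study.

```python
def implied_count(density_per_kb, total_length_kb):
    """Approximate number of motifs implied by a density and a total length."""
    return density_per_kb * total_length_kb


exon_density, exon_kb = 9.0, 426        # motifs per kb, total exon length in kb
intron_density, intron_kb = 5.9, 1295   # motifs per kb, total intron length in kb

exon_motifs = implied_count(exon_density, exon_kb)
intron_motifs = implied_count(intron_density, intron_kb)
enrichment = exon_density / intron_density

print(f"about {exon_motifs:.0f} exonic and {intron_motifs:.0f} intronic motifs")
print(f"exon/intron density ratio: {enrichment:.2f}")
```

The enrichment is modest (roughly 1.5-fold), so the very small P value reflects the thousands of motifs counted rather than a large effect size.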
We have identified a novel ESE motif recognized by the human SR protein SC35. Several lines of evidence point to the biological relevance of the selected ESE motifs. First, they are functional ESEs. All of the SELEX winners we have tested promote splicing in nuclear extract and in S100 extract plus the cognate SR protein. In nuclear extract, the SELEX winners function as potent ESEs. Second, the SC35 motifs are present within exon segments containing natural ESEs and are more frequently found in exons than in introns, suggesting that they may contribute to exon definition by the spliceosome. Third, the SC35 motifs are specific, i.e., they are not recognized by all SR proteins. The SC35-selected ESEs were recognized by SC35, SRp40, or SRp55, but not by SF2/ASF under splicing conditions. In addition, the distribution of high-score motifs of SF2/ASF and SC35 in the IgM C4 and Tat T3 exons correlated with the observed SR protein specificity of the corresponding substrates (26). This result also suggests that the score matrices we have generated have some predictive value. We have previously analyzed the predictive value of the SR-specific score matrices derived for other SR proteins (20). Statistically, high-score motifs of SR proteins are present at a higher density in natural ESEs than in the flanking regions. Experimentally, SR proteins specifically recognize their cognate ESE motifs when these are placed in the context of the IgM M2 exon, replacing the natural ESE. The present study confirms and extends our previous work to two natural ESEs in IgM and human immunodeficiency virus Tat exons. In addition, we have now shown that SC35 winner sequences and a maximum-score SC35 motif can promote splicing in different exonic contexts.
The specific interaction between SR proteins and ESEs has also been described in other systems. During assembly of enhancer complexes in vitro (Enh complex, which resembles the E complex), the enhancer sequences determine the specific pattern of SR proteins that can be UV cross-linked to the RNA (32). Female-specific alternative splicing of the Drosophila doublesex pre-mRNA requires six 13-nt repeat elements and a purine-rich element. UV cross-linking analysis showed that SR proteins, along with Tra and Tra-2, assemble on the ESEs in a stepwise and sequence-specific manner (21). The fact that SR proteins are expressed in a tissue-specific manner (14, 43), together with the specific recognition of ESEs by individual SR proteins, may contribute quantitatively to the regulation of gene expression.
The SC35 SELEX winners have the consensus GRYYcSYR, which is a highly degenerate sequence. Even though SC35 has a single RRM, a SELEX protocol based on RNA binding yielded two different nonamer consensus sequences, AGSAGAGTA and GTTCGAGTA, which share the last five nucleotides (35). These two motifs differ significantly from the more degenerate consensus identified by functional SELEX. Although the second motif has a partial fit to the above consensus, neither motif has a good score, consistent with the observation that the high-affinity binding sequences fail to enhance splicing of RNA substrates in nuclear extract or in S100 extract plus SC35, even when present in several copies (35). Therefore, it appears that high-affinity SC35-binding sites are not optimal for function. Perhaps RNA-binding selection does not achieve an interaction geometry compatible with SC35 enhancement function, or it is essential to coselect sequences that in addition to binding SC35 can also accommodate putative coactivators or fail to bind silencing factors.
Nevertheless, our data argue that SC35 has limited but defined sequence specificity in recognizing functional sequences. Despite the fact that this protein has a single RRM, the functional recognition motif is degenerate, as was the case for the two-RRM SR proteins SF2/ASF, SRp40, and SRp55 (20). Therefore, the degeneracy of the ESE motifs recognized by those proteins is probably not attributable to the recognition of distinct motifs by each of their RRMs. The sequence degeneracy of the ESEs is consistent with the fact that they must coexist with a very wide variety of unrelated open reading frames and must be recognized by a discrete set of SR proteins (20, 28, 29).
Schaal and Maniatis recently used a similar functional SELEX approach to select ESEs that could function in the context of the Drosophila doublesex pre-mRNA in HeLa nuclear extract (29). The selected 18-nt winner sequences were then individually analyzed by S100 complementation assays to define their SR protein specificity. Two round 6 winner sequences were the most active in the presence of SC35. By comparing these two sequences to each other and to an SC35-dependent ESE present in human β-globin exon 2, the authors proposed the SC35 heptamer consensus UGCNGYY, which is also a highly degenerate sequence. Although this heptamer motif is substantially different from our consensus octamer motif, some versions of the degenerate heptamer consensus have high scores, as defined in the present study. We therefore searched the two published winner sequences (29) by using our SC35 score matrix. Both sequences had multiple high-score motifs, some of which were nonoverlapping, consistent with the fact that they had undergone six rounds of selection for splicing. In the case of the 6-24 sequence, the highest score (3.13) corresponded to the octamer GGUCUCCG, which has a 4-nt overlap with the UGCGGUC sequence that fits the heptamer consensus. In the case of the 6-38 sequence, the second highest score (1.56) corresponds to the octamer UGCCGCC, of which the first 7 nt fit the heptamer consensus; the highest score (2.44) was for the nonoverlapping octamer GGACCGGA. Similarly, within the 18-nt β-globin fragment in which Schaal and Maniatis characterized an SC35-dependent ESE that comprises the heptamer UGCUGUU (28), the highest score (1.36) corresponds to the octamer UGAUGCUG, which includes the first 5 nt of the heptamer.
We conclude that despite the very different pre-mRNA contexts, types of extract used for the selection, and number of selection rounds, the SC35 ESEs identified by the two approaches are remarkably consistent. We believe, however, that our octamer motif has greater predictive value because it was derived from a much larger number of winner sequences. Also, the use of a nucleotide frequency matrix derived from 30 sequences allows identification of putative SC35 ESEs that do not precisely match the consensus at every position. Thus, our SC35 score matrix finds high-score motifs in both of the winner sequences and the β-globin segment characterized by Schaal and Maniatis (28, 29), whereas of our 30 SC35 winner sequences (Fig. 3), only no. 14 has a precise match to the heptamer consensus they defined.
The IgM M2 exon has a higher density of SF2/ASF and SRp40 high-score motifs within the natural ESE segment than in the flanking sequences. In contrast, the SRp55 high-score motifs do not correlate with the location of the ESE (20). In the case of SC35, the high-score motifs also have a relatively even distribution across the exon. The different motif distributions may reflect different mechanisms of SR protein-ESE recognition. Although for some pre-mRNAs any SR protein can complement splicing in the S100 extract (30, 43), each SR protein may function by slightly different mechanisms. Some SR proteins may require multiple binding sites to function, and the optimal distance from the 3′ splice site to the SR protein-binding site may also be protein specific. The fact that ESE motifs are not found exclusively in natural exonic segments required for ESE activity indicates that the motifs are not sufficient for ESE function. It appears that sequence context, structure, or position effects are also very important.
Examples of sequence context effects that can influence ESE activity are provided by exonic splicing silencers. These inhibitory elements probably coexist with splicing enhancers in many exons, and they may also be SR protein dependent and function in a cell-type specific manner. For example, an SC35-dependent silencer sequence has been mapped in the tat gene T3 exon (26). This silencer element includes within it an SC35-specific ESE motif (Fig. 7C). We speculate that binding of SC35 to this region prevents the function of other splicing factors, although it is presently unclear how this element acts at a distance and suppresses the effect of SC35-dependent ESEs but not of SF2/ASF-dependent ESEs. Recently, the 3′ portion of the IgM M2 exon was also shown to comprise a silencer element that binds U2 snRNP and antagonizes the upstream ESE (15). The silencer element, so far mapped to a fragment between nt 94 and 167 (Fig. 7A) in the M2 exon, overlaps with several SC35 high-score motifs and with one SF2/ASF high-score motif.
The similar arrangement of adjacent ESE and exonic splicing silencer elements seen in the IgM M2 and Tat T3 exons may turn out to be a common feature of many vertebrate cellular and viral exons. To improve the predictive value of the SR protein-specific ESE motifs, it will be necessary to gain a better understanding of the influence of sequence context and position, as well as of the mechanistic basis for the function of splicing enhancers and silencers.
We thank Y. Shimura for the gift of IgM plasmids, K. Lynch, T. Maniatis, R.-M. Xu, and A. Mayeda for recombinant SR proteins, J. Yin for DNA sequencing, A. Mayeda and members of our laboratory for valuable ideas, and M. Hastings for helpful comments on the manuscript.
This work was supported by NIH grants to A.R.K. (GM42699) and M.Q.Z. (HG01696), by an Advanced Fellowship from The Wellcome Trust to S.L.C. (045401), by a fellowship from the Human Frontiers Science Program to L.C. (LT0066/1997-M), and by a fellowship from the U.S. Army Medical Research and Matériel Command under DAMD 17-96-1-6172 to H.-X.L.