We have identified a novel ESE motif recognized by the human SR protein SC35. Several lines of evidence point to the biological relevance of the selected ESE motifs. First, they are functional ESEs. All of the SELEX winners we have tested promote splicing in nuclear extract and in S100 extract plus the cognate SR protein. In nuclear extract, the SELEX winners function as potent ESEs. Second, the SC35 motifs are present within exon segments containing natural ESEs and are more frequently found in exons than in introns, suggesting that they may contribute to exon definition by the spliceosome. Third, the SC35 motifs are specific, i.e., they are not recognized by all SR proteins. The SC35-selected ESEs were recognized by SC35, SRp40, or SRp55, but not by SF2/ASF under splicing conditions. In addition, the distribution of high-score motifs of SF2/ASF and SC35 in the IgM C4 and Tat T3 exons correlated with the observed SR protein specificity of the corresponding substrates (26). This result also suggests that the score matrices we have generated have some predictive value. We have previously analyzed the predictive value of the SR-specific score matrices derived for other SR proteins (20). Statistically, high-score motifs of SR proteins are present at a higher density in natural ESEs than in the flanking regions. Experimentally, SR proteins specifically recognize their cognate ESE motifs when these are placed in the context of the IgM M2 exon, replacing the natural ESE. The present study confirms and extends our previous work to two natural ESEs in IgM and human immunodeficiency virus Tat exons. In addition, we have now shown that SC35 winner sequences and a maximum-score SC35 motif can promote splicing in different exonic contexts.
The specific interaction between SR proteins and ESEs has also been described in other systems. During assembly of enhancer complexes in vitro (Enh complex, which resembles the E complex), the enhancer sequences determine the specific pattern of SR proteins that can be UV cross-linked to the RNA (32). Female-specific alternative splicing of the Drosophila doublesex pre-mRNA requires six 13-nt repeat elements and a purine-rich element. UV cross-linking analysis showed that SR proteins, along with Tra and Tra-2, assemble on the ESEs in a stepwise and sequence-specific manner (21). The fact that SR proteins are expressed in a tissue-specific manner (14), together with the specific recognition of ESEs by individual SR proteins, may contribute quantitatively to the regulation of gene expression.
The SC35 SELEX winners have the consensus GRYYcSYR, which is a highly degenerate sequence. Even though SC35 has a single RRM, a SELEX protocol based on RNA binding yielded two different nonamer consensus sequences, AGSAGAGTA and GTTCGAGTA, which share the last five nucleotides (35). These two motifs differ significantly from the more degenerate consensus identified by functional SELEX. Although the second motif has a partial fit to the above consensus, neither motif has a good score, consistent with the observation that the high-affinity binding sequences fail to enhance splicing of RNA substrates in nuclear extract or in S100 extract plus SC35, even when present in several copies (35). Therefore, it appears that high-affinity SC35-binding sites are not optimal for function. Perhaps RNA-binding selection does not achieve an interaction geometry compatible with SC35 enhancement function, or it is essential to coselect sequences that, in addition to binding SC35, can also accommodate putative coactivators or fail to bind silencing factors.
Nevertheless, our data argue that SC35 has limited but defined sequence specificity in recognizing functional sequences. Despite the fact that this protein has a single RRM, the functional recognition motif is degenerate, as was the case for the two-RRM SR proteins SF2/ASF, SRp40, and SRp55 (20). Therefore, the degeneracy of the ESE motifs recognized by those proteins is probably not attributable to the recognition of distinct motifs by each of their RRMs. The sequence degeneracy of the ESEs is consistent with the fact that they must coexist with a very wide variety of unrelated open reading frames and must be recognized by a discrete set of SR proteins (20).
Schaal and Maniatis recently used a similar functional SELEX approach to select ESEs that could function in the context of the Drosophila doublesex pre-mRNA in HeLa nuclear extract (29). The selected 18-nt winner sequences were then individually analyzed by S100 complementation assays to define their SR protein specificity. Two round 6 winner sequences were the most active in the presence of SC35. By comparing these two sequences to each other and to an SC35-dependent ESE present in human β-globin exon 2, the authors proposed the SC35 heptamer consensus UGCNGYY, which is also a highly degenerate sequence. Although this heptamer motif is substantially different from our consensus octamer motif, some versions of the degenerate heptamer consensus have high scores, as defined in the present study. We therefore searched the two published winner sequences (29) by using our SC35 score matrix. Both sequences had multiple high-score motifs, some of which were nonoverlapping, consistent with the fact that they had undergone six rounds of selection for splicing. In the case of the 6-24 sequence, the highest score (3.13) corresponded to the octamer GGUCUCCG, which has a 4-nt overlap with the UGCGGUC sequence that fits the heptamer consensus. In the case of the 6-38 sequence, the second highest score (1.56) corresponds to the octamer UGCCGCC, of which the first 7 nt fit the heptamer consensus; the highest score (2.44) was for the nonoverlapping octamer GGACCGGA. Similarly, within the 18-nt β-globin fragment in which Schaal and Maniatis characterized an SC35-dependent ESE that comprises the heptamer UGCUGUU (28), the highest score (1.36) corresponds to the octamer UGAUGCUG, which includes the first 5 nt of the heptamer.
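The comparison of specific sequences with a degenerate consensus can be checked mechanically. The following is a minimal sketch, assuming the standard IUPAC ambiguity codes (R = A/G, Y = C/U, S = C/G, N = any base); the function names `fits_consensus` and `find_matches` are illustrative and are not taken from either study.

```python
# IUPAC nucleotide codes needed for the consensus motifs quoted in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG",    # purine
    "Y": "CU",    # pyrimidine
    "S": "CG",    # strong (C or G)
    "N": "ACGU",  # any nucleotide
}


def _normalize(seq):
    """Upper-case a sequence and write it in the RNA alphabet (T -> U)."""
    return seq.upper().replace("T", "U")


def fits_consensus(seq, consensus):
    """Return True if seq matches the degenerate consensus at every position."""
    seq, consensus = _normalize(seq), _normalize(consensus)
    if len(seq) != len(consensus):
        return False
    return all(base in IUPAC[code] for base, code in zip(seq, consensus))


def find_matches(seq, consensus):
    """Return the 0-based start of every window of seq that fits the consensus."""
    seq = _normalize(seq)
    k = len(consensus)
    return [i for i in range(len(seq) - k + 1)
            if fits_consensus(seq[i:i + k], consensus)]


HEPTAMER = "UGCNGYY"  # heptamer consensus proposed by Schaal and Maniatis

# The three heptamers quoted in the text each fit the consensus.
for heptamer in ("UGCGGUC", "UGCCGCC", "UGCUGUU"):
    print(heptamer, fits_consensus(heptamer, HEPTAMER))

# The top-scoring beta-globin octamer carries only the first 5 nt of the
# heptamer, so no full consensus match is found inside it.
print(find_matches("UGAUGCUG", HEPTAMER))
```

Exact consensus matching of this kind is all-or-nothing at each position, which is why it cannot rank partial matches the way a score matrix can.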
We conclude that despite the very different pre-mRNA contexts, types of extract used for the selection, and number of selection rounds, the SC35 ESEs identified by the two approaches are remarkably consistent. We believe, however, that our octamer motif has greater predictive value because it was derived from a much larger number of winner sequences. Also, the use of a nucleotide frequency matrix derived from 30 sequences allows identification of putative SC35 ESEs that do not precisely match the consensus at every position. Thus, our SC35 score matrix finds high-score motifs in both of the winner sequences and the β-globin segment characterized by Schaal and Maniatis (28), whereas of our 30 SC35 winner sequences (Fig. ), only no. 14 has a precise match to the heptamer consensus they defined.
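To illustrate why a frequency matrix tolerates imperfect matches, the sketch below builds per-position nucleotide frequencies from aligned octamers, converts them to scores, and slides the matrix along a sequence. Several details are assumptions made for illustration and are not the procedure of this study: the log2-odds scoring against a uniform background, the pseudocount of 0.5, and the five aligned octamers in `toy_winners`, which are invented and are not the winner sequences reported here.

```python
import math

BASES = "ACGU"


def frequency_matrix(aligned, pseudocount=0.5):
    """Per-position nucleotide frequencies from equal-length aligned motifs."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned motifs must have the same length")
    matrix = []
    for pos in range(length):
        # The pseudocount (an assumption) keeps unseen bases from scoring -inf.
        counts = {b: pseudocount for b in BASES}
        for s in aligned:
            counts[s[pos]] += 1
        total = sum(counts.values())
        matrix.append({b: counts[b] / total for b in BASES})
    return matrix


def score_matrix(freqs, background=None):
    """Log2-odds of each frequency against a background (assumed uniform)."""
    background = background or {b: 0.25 for b in BASES}
    return [{b: math.log2(col[b] / background[b]) for b in BASES}
            for col in freqs]


def scan(seq, scores):
    """Score every window of seq; return (start, window, score) tuples."""
    k = len(scores)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        hits.append((i, window, sum(scores[j][b] for j, b in enumerate(window))))
    return hits


# Hypothetical aligned octamers, for illustration only;
# these are NOT the SC35 winner sequences from this study.
toy_winners = ["GGCCCCUG", "GACUCGCA", "GGUCCCCG", "GAUUCGUA", "GGCCUCCG"]

freqs = frequency_matrix(toy_winners)
scores = score_matrix(freqs)
best = max(scan("AAGGCCCCUGAA", scores), key=lambda hit: hit[2])
print(best[0], best[1], round(best[2], 2))
```

Because every window receives a graded score, a motif that deviates from the consensus at one position can still rank highly, whereas an exact consensus search would miss it.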
The IgM M2 exon has a higher density of SF2/ASF and SRp40 high-score motifs within the natural ESE segment than in the flanking sequences. In contrast, the SRp55 high-score motifs do not correlate with the location of the ESE (20). In the case of SC35, the high-score motifs also have a relatively even distribution across the exon. The different motif distributions may reflect different mechanisms of SR protein-ESE recognition. Although for some pre-mRNAs any SR protein can complement splicing in the S100 extract (30), each SR protein may function by slightly different mechanisms. Some SR proteins may require multiple binding sites to function, and the optimal distance from the 3′ splice site to the SR protein-binding site may also be protein specific. The fact that ESE motifs are not found exclusively in natural exonic segments required for ESE activity indicates that the motifs are not sufficient for ESE function. It appears that sequence context, structure, or position effects are also very important.
Examples of sequence context effects that can influence ESE activity are provided by exonic splicing silencers. These inhibitory elements probably coexist with splicing enhancers in many exons, and they may also be SR protein dependent and function in a cell-type-specific manner. For example, an SC35-dependent silencer sequence has been mapped in the tat gene T3 exon (26). This silencer element includes within it an SC35-specific ESE motif (Fig. C). We speculate that binding of SC35 to this region prevents the function of other splicing factors, although it is presently unclear how this element acts at a distance and suppresses the effect of SC35-dependent ESEs but not of SF2/ASF-dependent ESEs. Recently, the 3′ portion of the IgM M2 exon was also shown to comprise a silencer element that binds U2 snRNP and antagonizes the upstream ESE (15). The silencer element, so far mapped to a fragment between nt 94 and 167 (Fig. A) in the M2 exon, overlaps with several SC35 high-score motifs and with one SF2/ASF high-score motif.
The similar arrangement of adjacent ESE and exonic splicing silencer elements seen in the IgM M2 and Tat T3 exons may turn out to be a common feature of many vertebrate cellular and viral exons. To improve the predictive value of the SR protein-specific ESE motifs, it will be necessary to gain a better understanding of the influence of sequence context and position, as well as of the mechanistic basis for the function of splicing enhancers and silencers.