The importance of cis-regulatory sequences for accurate splice-site recognition and exon definition is well documented. However, most experimental studies to date have focused on the regulation of single splicing events. A more global understanding of pre-mRNA splicing requires some knowledge of the distribution of both splicing enhancers and silencers. Using ESEfinder (33), we have undertaken a large-scale genomic analysis in an attempt to uncover relationships between ESE motif frequencies and splicing regulation. Many of the experimental studies of ESE function have involved examination of their role in the regulation of alternative splicing, and as such little is known about their functional relevance to the process of constitutive splicing. Our studies implicate ESE participation in the regulation of both constitutive and alternative splicing.
Previously, the SR protein-specific matrices utilized by ESEfinder were used to search a limited set of genomic sequences for ESE motifs, which were found to occur more frequently in exons versus introns (31). We have greatly expanded these initial observations, and demonstrated a significant enrichment for ESE motifs in >60,000 internal constitutive protein-coding human exons. The motifs identified by the RESCUE-ESE technique (35) and the PESEs of Zhang and Chasin (37) also occur more frequently in exons versus introns. ESEfinder motif frequencies within exons were approximately constant, supporting the hypothesis that ESEs function to activate splicing from varying distances from the splice sites, an observation also made for the exonic distribution of PESEs (37). In addition, constant ESE motif frequencies along exons may be a consequence of the ability of single enhancer motifs to influence recognition of both 3′ and 5′ splice sites (43). The functional SELEX experiments used to derive the ESEfinder matrices were dependent upon the ability of sequences to enhance splicing of a 3′ terminal exon (31). However, numerous studies have implicated ESE motifs identified by ESEfinder in the splicing of internal exons (13) and our new data support the conclusion that these ESE motifs play a role in the splicing of internal exons, in addition to terminal exons.
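As a rough illustration of what an exon-versus-intron motif frequency comparison involves, the following minimal sketch pools motif hits over a set of sequences and normalizes by total length. It is a simplification and not the procedure used in this study: ESEfinder scores sequences against SR protein-specific matrices rather than matching literal strings, and the motif and sequences shown are hypothetical placeholders.

```python
def count_motif_hits(seq, motifs):
    """Count occurrences (overlapping allowed) of any motif in seq."""
    seq = seq.upper()
    hits = 0
    for motif in motifs:
        k = len(motif)
        # Slide a window of the motif's length along the sequence.
        hits += sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)
    return hits


def motif_density(seqs, motifs):
    """Return motif hits per 100 nt, pooled over all sequences in seqs."""
    total_len = sum(len(s) for s in seqs)
    if total_len == 0:
        return 0.0
    total_hits = sum(count_motif_hits(s, motifs) for s in seqs)
    return 100.0 * total_hits / total_len


# Hypothetical toy data, for illustration only.
toy_motifs = ["GAAGAA"]
toy_exons = ["GAAGAAGAAC"]
toy_introns = ["TTTTCCTTTT"]
exon_density = motif_density(toy_exons, toy_motifs)
intron_density = motif_density(toy_introns, toy_motifs)
```

Normalizing by pooled length rather than averaging per-sequence densities keeps short sequences from dominating the estimate; either choice is an assumption of this sketch.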
ESE motif frequencies for three of the four SR proteins were significantly higher in exons versus pseudo exons, supporting a role for ESEs in exon definition, and consistent with previous studies of genomic ESE motif distributions (37). Zhang and Chasin (37) found fewer PESEs in the same set of pseudo exons that we analyzed with ESEfinder, but identification of the PESE motifs was conditional on their overrepresentation in exons versus pseudo exons. Therefore, the observation that the PESE motifs were more frequent in a second test set of exons versus pseudo exons was a logical expectation (37). The functional SELEX experiments used to derive the ESEfinder motifs imposed no such a priori criteria; therefore, the fact that these motifs are present at significantly higher frequencies in exons versus pseudo exons supports the conclusion that they are involved in exon definition. In addition, there is evidence supporting a role for silencers in the suppression of pseudo exon splicing: a subset of pseudo exons with a relatively high frequency of ESEfinder motifs was found to have increased frequencies of elements capable of silencing splicing (47); and Zhang and Chasin (37) also observed overrepresentation of putative exonic splicing silencers in pseudo exons.
Experimental evidence demonstrated a role for ESEs in constitutive splicing (13), a function supported by our bioinformatic analysis. One ascribed function of ESEs is facilitating the recognition of suboptimal splice sites. Indeed, improving weak 3′ splice-site polypyrimidine tracts negates the enhancer requirement for a number of substrates (50). However, there is no evidence that all exons with weak splice sites have an increased dependence upon ESEs. Our comparison of ESE motif frequencies in constitutive exons with weak and strong splice sites implicates ESE involvement in splice-site recognition of all exons. We observed significant differences in some ESE motif frequencies when constitutive exons with strong and weak 3′ or 5′ splice sites were compared independently, or when exons with both strong 3′ and 5′ splice sites were compared with their counterparts with weak sites. However, there was not a simple relationship between splice-site score and ESE motif frequency, as in some instances exons with strong splice sites were found to contain more ESE motifs. In addition, when we repeated this analysis using Zhang and Chasin's PESEs, we observed no difference in the frequency of PESEs in exons with weak splice sites compared with those with strong splice sites (data not shown). It remains possible that weak splice sites tend to be associated with stronger ESEs, rather than with an increased number of ESEs, although it is known that multiple ESEs in the same exon act additively (52). This hypothesis remains to be tested, and will require a more quantitative version of ESEfinder.
A recent survey revealed an increase in the number of ESE motifs identified by RESCUE-ESE in the vicinity of the splice sites of constitutive exons (53). We only observed this trend with SF2/ASF and SRp55 motifs in exons with weak 3′ and 5′ splice sites, respectively. As described above, ESE motif frequencies for some of the SR proteins are actually higher in exons with strong splice sites. These differences in ESE motif distributions may be a consequence of the very different methods used in their identification. The motifs identified as putative enhancers by RESCUE-ESE were constrained by the requirement to be enriched in constitutive exons with weak splice sites, whereas the sequences identified by functional SELEX were selected by their ability to activate exon inclusion in the presence of a particular SR protein. It is possible that RESCUE-ESE identified a set of enhancer sequences involved in the recognition of a restricted set of exons, and that ESEfinder recognizes enhancers involved in a more general aspect of exon definition.
Alternative splicing serves to greatly expand the proteome, with one recent report estimating that up to 74% of multiexon human genes are alternatively spliced (54). ESEs, and the SR proteins that bind them, have well defined roles in regulating the process of alternative splicing [reviewed in (1)]. A commonly held assumption states that exons that undergo alternative splicing have weaker splice sites, by comparison with those that are constitutively spliced. Our previous analysis of a limited set of alternatively spliced exons supported this assumption (56). In addition, a recent report found significantly higher splice-site scores for constitutive versus alternative exons in five species, including humans (45). We derived large datasets of constitutive and alternatively spliced (included or skipped) protein-coding human exons, and again demonstrated that alternatively spliced exons as a set have significantly weaker splice-site scores. However, the splice-site score distributions are surprisingly similar and largely overlapping, such that the splice-site scores alone are not sufficient to define a given exon as constitutive or alternative.
Intriguingly, we found that skipped exons have significantly fewer ESE motifs than constitutively spliced exons. In addition, skipped exons, unlike those that are constitutively spliced, do not have increased ESE motif frequencies in comparison with their flanking intronic regions, except for one of the four SR proteins tested, SF2/ASF. Zhang and Chasin (37) likewise reported finding fewer PESEs in alternative exons compared with constitutive exons, and a comparable number or slightly fewer RESCUE-ESE motifs were observed in skipped exons (35). One can speculate that fewer ESEs per exon may result in less efficient exon definition, and subsequently lead to exon skipping. However, this remains a hypothesis that will require appropriate experimental validation. Two recent publications (57) reported significant conservation of the flanking intronic regions of alternatively spliced exons, perhaps implying a function for intronic motifs in the control of alternative exon definition.
ESE motif identification by functional SELEX and the computational methods of RESCUE-ESE or Zhang and Chasin's octamer analysis rely upon different methodologies. However, the motifs identified share some commonalities, namely overrepresentation in exons versus introns, and in constitutive versus alternatively skipped exons. Interestingly, our analysis revealed that the ESE motifs recognized by ESEfinder and RESCUE-ESE do not significantly overlap. Nevertheless, experimental data proved the ability of both methods to define functional enhancers (31), and as described above, these differences may arise at least in part from the constraint of association with weak splice sites inherent in RESCUE-ESE. Over 80% of the RESCUE-ESE hexamers are found in the collection of PESEs (37). However, in contrast to the analysis of RESCUE-ESE motif distribution (53), there was no increase in PESE frequency near the splice sites (37). This difference may be due to differences in the exonic databases analyzed, or it may be a consequence of a small subset of the RESCUE-ESE motifs accounting for the observed increase near splice sites. Our scoring of Zhang and Chasin's PESEs with ESEfinder revealed no enrichment for high-score SC35, SRp40 or SRp55 motifs. However, we did find an increase over the expected number of SF2/ASF motifs within the PESE group, indicating some overlap between the two methods. It should be noted that our analysis is limited to four SR proteins, and it is highly probable that both the set of RESCUE-ESE hexamers and the PESE octamers contain enhancer sequences recognized by other SR and non-SR proteins, though these methods do not identify the factors responsible for motif recognition.
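One way to quantify overlap between a hexamer set and an octamer set is to ask what fraction of the hexamers occur as a substring of at least one octamer. The sketch below assumes that containment criterion, which is not specified in the text, and uses hypothetical sequences rather than the actual RESCUE-ESE or PESE collections.

```python
def fraction_contained(hexamers, octamers):
    """Fraction of distinct hexamers found inside at least one octamer."""
    octs = [o.upper() for o in octamers]
    hexs = {h.upper() for h in hexamers}  # de-duplicate the query set
    if not hexs:
        return 0.0
    found = sum(1 for h in hexs if any(h in o for o in octs))
    return found / len(hexs)


# Hypothetical toy sets, for illustration only.
toy_hexamers = ["GAAGAA", "TTTTTT"]
toy_octamers = ["AGAAGAAC"]
overlap = fraction_contained(toy_hexamers, toy_octamers)
```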
ESEfinder scores sequences for the presence of putative enhancers, and we emphasize that experimental validation is required for definitive proof that any given motif is a bona fide ESE in its natural context. Other factors may influence the ESE potential of any given motif. These include sequence context, e.g. the presence of nearby silencers, secondary structure effects and tissue-specific splicing factor concentrations. Experimental efforts are underway to refine the original matrices. Future improvements will include experimental refinement of threshold values, and additional SR protein-specific matrices.
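Matrix-based scoring of the kind described here can be sketched as a sliding window that sums per-position nucleotide scores and reports windows reaching a threshold. The matrix values, the threshold and the use of "greater than or equal to" below are all hypothetical assumptions for illustration; they are not the ESEfinder matrices or threshold values.

```python
# Hypothetical 4-position score matrix (one dict per motif position).
# These numbers are invented for illustration, not real ESEfinder values.
TOY_MATRIX = [
    {"A": -1.0, "C": 1.0, "G": 0.5, "T": -1.5},
    {"A": 1.0, "C": -1.0, "G": 0.5, "T": -0.5},
    {"A": -0.5, "C": 1.0, "G": -1.0, "T": 0.5},
    {"A": 1.0, "C": -0.5, "G": 1.0, "T": -1.0},
]
TOY_THRESHOLD = 2.0  # hypothetical cutoff


def score_window(window, matrix):
    """Sum the matrix score of each base at its position in the window."""
    return sum(column[base] for column, base in zip(matrix, window))


def find_hits(seq, matrix, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    width = len(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases
        score = score_window(window, matrix)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


hits = find_hits("CACGTT", TOY_MATRIX, TOY_THRESHOLD)
```

A hit under such a scheme is only a putative enhancer: the score says nothing about nearby silencers, secondary structure or splicing factor concentrations, which is why the threshold is treated here as an adjustable parameter.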