ESEfinder allows for the identification of putative ESEs, and one of its most useful applications is the correct interpretation of the effects of disease-associated point mutations or polymorphisms. We have previously shown that ESEs predicted by this matrix-based approach tend to cluster in regions where natural enhancers have been experimentally mapped and are more frequent in exons than in introns (9). In a database of 50 human point mutations known to cause in vivo exon skipping, the majority reduced or eliminated at least one predicted ESE (12). Considering that we can currently search for putative ESEs using matrices for just four SR proteins, it is likely that a large fraction of skipping-associated mutations do indeed cause ESE disruption, and that a higher predictive value will be obtained when matrices for other relevant splicing factors become available. A computational approach (RESCUE-ESE) was recently described (7), in which putative ESE motifs are identified by comparing the frequency of hexamers in exons surrounded by ‘weak’ versus ‘strong’ splice sites. Several hexamer families enriched in the weak exons, which likely depend on enhancers for correct expression, were identified, and some of these overlap with the motifs defined by ESEfinder.
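The hexamer comparison described above can be illustrated with a minimal Python sketch. It is not the published RESCUE-ESE procedure or its statistic: it simply ranks hexamers by the difference in relative frequency between two sets of exon sequences. The function names, the `min_difference` parameter and the toy sequences are all hypothetical and serve only as an illustration.

```python
from collections import Counter


def hexamer_frequencies(sequences, k=6):
    """Relative frequency of each overlapping k-mer across a set of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def enriched_hexamers(weak_exons, strong_exons, min_difference=0.0):
    """Rank hexamers by (frequency in weak exons) - (frequency in strong exons).

    This plain frequency difference is an assumption made for illustration;
    it stands in for whatever enrichment statistic a real analysis would use.
    """
    weak = hexamer_frequencies(weak_exons)
    strong = hexamer_frequencies(strong_exons)
    diffs = {kmer: f - strong.get(kmer, 0.0) for kmer, f in weak.items()}
    ranked = sorted(diffs.items(), key=lambda item: (-item[1], item[0]))
    return [(kmer, d) for kmer, d in ranked if d > min_difference]


# Toy, made-up sequences standing in for exons flanked by weak or strong splice sites.
weak_exons = ["GAAGAAGAAC", "TGAAGAAGAT"]
strong_exons = ["CCTTTCCTTA", "TTCCTTTCCA"]

top = enriched_hexamers(weak_exons, strong_exons)
print(top[0][0])  # GAAGAA
```

With real data, the two input sets would be much larger, and candidate hexamers would need a proper significance test before being grouped into families.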
The ESEfinder matrices have been used to show that disruption of ESEs recognized by various SR proteins causes exon skipping in several genes (11). In some contexts, ESEfinder appears to be remarkably accurate. For example, using a BRCA1-derived three-exon minigene system, which is very responsive to point mutations within a critical ESE, we showed that when multiple SF2/ASF-dependent ESEs were substituted for each other or mutated, there was a strong correlation between exon-inclusion efficiency and the matrix scores (12). Furthermore, ESEfinder was used in combination with mutational analysis, in vitro and in vivo splicing, and site-specific UV-crosslinking experiments to demonstrate that the translationally silent, single-nucleotide difference between SMN1 and SMN2 disrupts an ESE, which in SMN1 is directly recognized by splicing factor SF2/ASF (17). The disruption of the SF2/ASF-dependent ESE causes inefficient SMN2 exon 7 inclusion. In the absence of SMN1, SMN2 is unable to produce enough full-length SMN protein, thus resulting in a spinal muscular atrophy phenotype. Finally, we exploited the degeneracy of the consensus motif and used ESEfinder to design a second-site suppressor mutation that reconstituted the high-score motif and fully restored exon 7 inclusion in the SMN2 context in vivo and in vitro, as predicted (17). More than a dozen wild-type and mutant SF2/ASF heptamer motifs were tested in the SMN
). All of the motifs that maintained a high score promoted exon inclusion in a manner roughly proportional to the motif score, even though, because of the degeneracy of the consensus motif, some of them did not share a single nucleotide. All of the motifs with below-threshold scores resulted in reduced levels of exon inclusion.
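As a rough illustration of how a matrix score and a threshold separate high-score from below-threshold motifs, the following Python sketch scans a sequence with a 7-position score matrix and compares a wild-type sequence with two single-nucleotide variants. The matrix values, the threshold, the sequences and the function names are all hypothetical; they are not the ESEfinder SF2/ASF matrix or its threshold, and the sketch is not the ESEfinder implementation.

```python
# Hypothetical 7-position score matrix (one dict per motif position).
# These numbers are invented for illustration, NOT the published SF2/ASF matrix.
TOY_MATRIX = [
    {"A": -1.0, "C": 1.0, "G": 0.5, "T": -1.0},
    {"A": 1.0, "C": -1.0, "G": 0.5, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": 0.5, "T": -1.0},
    {"A": 1.0, "C": -1.0, "G": 0.5, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": 0.5, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
    {"A": 1.0, "C": -1.0, "G": 0.5, "T": -1.0},
]
TOY_THRESHOLD = 5.5  # hypothetical cutoff for calling a high-score motif


def score_window(window, matrix):
    """Sum the per-position scores for one window of the same width as the matrix."""
    if len(window) != len(matrix):
        raise ValueError("window and matrix must have the same length")
    return sum(column[base] for column, base in zip(matrix, window))


def scan(sequence, matrix=TOY_MATRIX, threshold=TOY_THRESHOLD):
    """Return (start, window, score) for every window scoring at or above threshold."""
    seq = sequence.upper().replace("U", "T")
    width = len(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases
        score = score_window(window, matrix)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


def substitute(sequence, index, new_base):
    """Return a copy of the sequence with one base replaced (0-based index)."""
    return sequence[:index] + new_base + sequence[index + 1:]


wild_type = "TTCACACGATT"                       # made-up sequence
disrupted = substitute(wild_type, 4, "T")       # C->T inside the motif
degenerate = substitute(wild_type, 4, "G")      # C->G, tolerated by the toy matrix

print(scan(wild_type))   # [(2, 'CACACGA', 7.0)]
print(scan(disrupted))   # []
print(scan(degenerate))  # [(2, 'CAGACGA', 6.5)]
```

In this toy setting, one substitution drops the motif below the threshold while another keeps it above, which mirrors the idea that different heptamers can retain a high score because the consensus is degenerate. Whether a high-score motif is functional still has to be tested experimentally.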
It should be emphasized, however, that the presence of a high-score motif in a sequence does not necessarily identify that sequence as a functional ESE, and that, in general, there is not a very strict quantitative correlation between numerical scores and ESE activity. Until stronger predictive algorithms are available, direct experimental evidence will remain necessary before safely concluding that a particular sequence can act as an ESE in its natural context. Conversely, the lack of a high-score motif does not imply that no ESEs are present. Several important variables, such as the local sequence context, the splice-site strengths, the position of the ESE along the exon and the presence of silencer elements, are likely to play a significant role in ESE activity. Furthermore, even mutations that abrogate genuine ESEs might not always exert a noticeable effect, because of the presence of redundant ESEs nearby. Finally, it should be noted that our matrices were defined in a mammalian system and reflect the sequence specificity of the human SR proteins. Their relevance to other species depends on the extent of conservation of each SR protein.
The development and refinement of reliable prediction tools for auxiliary splicing elements will have important implications for our ability to accurately identify the exon/intron structures of genes and predict their expression profile, to correctly interpret the effects of point mutations and/or polymorphisms, and to assess phenotypic risk.