GGGG and Two Exonic UAGG Motifs Are Required in Combination for Silencing of a Brain-Region-Specific Exon
The 5′ splice site of the CI cassette exon is atypical because of an adjacent GGGG motif, which is conserved in human, rat, and mouse GRIN1 genes. GGGG motifs in the first ten nucleotides of human introns are generally infrequent (see below). In the case of the CI cassette exon, the GGGG motif is immediately adjacent to the U1 small nuclear RNA complementary region of the 5′ splice site, and the overall complementarity of the 5′ splice site (6 bp) is typical for mammals (6 to 7 bp), including all of the most highly conserved positions (−1 to +5).
The role of the GGGG motif in splicing silencing of the CI cassette exon was examined by generating site-directed mutations in nucleotides +6, +7, and +8 of the intron. These mutations were designed so as not to disrupt the U1 small nuclear RNA complementary nucleotides, which include the last nucleotide of the CI exon and the first five nucleotides of the adjacent intron. Splicing assays involved transfecting splicing reporters into non-neuronal mouse myoblasts (C2C12 cells) and measuring the levels of the exon-included and exon-skipped products by RT-PCR, relative to those obtained with the wild-type sequence.
Each mutation in the GGGG motif led to a dramatic increase in exon inclusion (A). The strongest effects were observed when the GGG at +6 to +8 was converted to CCC (mutation 5m2) or AUA (5m4), which resulted in an approximately 4-fold increase in exon inclusion compared to the wild-type sequence. Even a point mutation (5m9) resulted in a 3-fold increase in exon inclusion. Thus, the GGGG motif plays an important role in the silencing mechanism. Additional sequence changes upstream and downstream of the GGGG motif had only modest effects on splicing. For example, mutations 5m1, 5m13, and 5m14 were designed to test potential RNA secondary structures involving the GGGG motif and complementary intron sequences. The modest changes in the splicing pattern resulting from these mutations do not support a significant role in splicing for these hypothetical structures.
[Figure: Exonic UAGG and 5′ Splice Site GGGG Motifs Are Required in Combination for Silencing of the CI Cassette Exon]
Other than the GGGG motif at the 5′ splice site, the sequence of this intronic region is devoid of guanosine-rich sequences. Strikingly, introduction of a GGG at intron positions +40 to +42 (5m8) resulted in a 5-fold decrease in exon inclusion. In contrast, two overlapping mutations that did not generate guanosine-rich motifs had little or no effect on the splicing pattern (5m11 and 5m12). Thus, in this context the introduction of a second intronic GGG cluster can shift the splicing pattern toward nearly complete exon skipping.
The possibility that sequences within the CI cassette exon itself might contribute to the silencing mechanism was also explored. Either a scarcity of ESE sequences within the CI cassette exon might weaken exon definition, or the presence of exonic ESS sequences might enforce silencing. A model for the arrangement of ESE motifs in the CI cassette exon was based on the high-affinity recognition sequences of known SR family splicing factors (B, top). Mutations were then made in the ASF/SF2 (…, CACCCUG, and CGUAGGU) and SC35 (…, GGCCUCCA, and GUCCUCCA) motifs to test the predictions of this model, on the expectation that disrupting functional ESE motifs would reduce exon inclusion.
The results of these experiments show that most of the mutations decreased exon inclusion, consistent with ESE function (mutations E1, E2, E3, E4, E5, and E6; B). In contrast, a pair of double point mutations in a UAGG sequence beginning at position 93 of the exon generated a substantial increase in exon inclusion, indicative of a silencing role for this sequence (E8 and E9; B). Note that the overlapping ASF/SF2 motif is disrupted by the E9 mutation, but the E8 mutation generates a different ASF/SF2 motif. An additional six-nucleotide mutation (CAUCGU) that eliminates the ASF/SF2 motif at this position also resulted in a strong increase in exon inclusion (K. H. and P. J. G., unpublished data). These results show that the position 93 UAGG motif functions in C2C12 cells primarily as a silencer rather than as part of an ASF/SF2 motif. They also suggested the possible involvement of the splicing repressor hnRNP A1, given the similarity of the UAGG motif to the hnRNP A1 high-affinity binding sequence UAGGG[A/U], determined previously by SELEX experiments [30].
A Motif Pattern for Strong Splicing Silencing: Analysis of Copy Number and Position Effects in Neuronal and Non-Neuronal Cells
The presence of two natural UAGG motifs in the CI cassette exon raised the question of how silencing might be affected by changes in the number of exonic UAGGs. The number and position of UAGG motifs in the CI cassette exon were altered in the context of the wild-type splicing reporter (wt0) and the effects tested in neuronal (PC12) and non-neuronal (C2C12) cell lines (). One set of mutations varied the position of the 5′-splice-site-proximal UAGG by disrupting the original motif at position 93 of the exon, and by introducing a new UAGG motif at positions 11, 76, and 100 (splicing reporters E10, E11, and E20). These position variations had small effects on the pattern of splicing, with exon skipping predominating in both cell lines (, lanes 1–4 and 15–18). The effect of a single UAGG was then examined at different positions of the exon (splicing reporters E8, E13, E14, E15, and E21). The resulting splicing patterns uniformly showed an increase in exon inclusion, and these effects were essentially independent of position (, lanes 5–9 and 19–23). It was also evident that the level of exon inclusion was higher in C2C12 than in PC12 cells, suggesting that there may be differences in splicing factors that mediate or antagonize silencing in the two cell lines. Nonetheless, each cell line exhibited a similar trend—stronger exon silencing associated with increased copy number of exonic UAGGs. Thus, splicing silencing of the CI cassette exon depends critically on the number of UAGG motifs in the exon, but less so on their relative positions. To further test the prediction that the strength of splicing silencing is linked to the number of UAGGs in the exon, a third UAGG was introduced at position 11 of the exon (splicing reporter E18). As a result, the level of exon inclusion decreased to approximately 0% in both cell lines in agreement with this prediction (lanes 11 and 25).
[Figure: Effect of Number and Position of CI Cassette Exon Splicing Silencer Motifs]
The role of the 5′-splice-site-proximal GGGG motif was examined independently by generating exons lacking the two natural UAGG motifs in the presence and absence of the GGGG motif (splicing reporters E17 and T8, respectively; , lanes 10, 12, 24, 26). The GGGG motif had a small silencing effect in both cell lines in the absence of the exonic UAGGs (compare E17 and T8; lanes 10 versus 12, and 24 versus 26). By contrast, silencing was reduced substantially when the GGGG motif was disrupted by mutation in the presence of intact UAGGs: exon inclusion increased significantly in PC12 cells (from 25% to 67%; compare wt0 and D0: , lanes 13 and 14), and a similar trend was observed in C2C12 cells (from 32% to 87%; , lanes 27 and 28). Note that mutant D0 contains two intact UAGGs, but lacks the GGGG motif. Thus, the GGGG motif acts cooperatively with the exonic UAGGs in both of these cell lines. Together these results show that, for the CI cassette, multiple exonic UAGGs combined with a 5′-splice-site-proximal GGGG function cooperatively to specify silencing of an otherwise strong exon.
GGGG Motif Is Involved in Silencing by hnRNP A1 and Anti-Silencing by hnRNP H
Next we sought to identify protein factors that interact directly with the UAGG and GGGG motifs in order to guide empirical tests of their roles in splicing silencing. GTP-labeled RNA substrates were subjected to UV crosslinking in HeLa nuclear extracts under in vitro splicing conditions. These experiments showed pronounced crosslinking to a protein doublet in the vicinity of 50 kDa for RNA substrates containing the intact GGGG motif (cs1 and 3h1; A, lanes 1 and 3). By contrast, a point mutation in the GGGG motif largely disrupted protein binding (cs3 and 3h3; A, lanes 2 and 4). Because the apparent molecular weights of these proteins and their guanosine-rich binding specificity [31] suggested the involvement of hnRNP H/H′ and F proteins, relevant antibodies were obtained for immunoprecipitation experiments. These experiments identified the bottom band of the doublet as hnRNP F (A, lanes 5–7), whereas the upper band corresponded to hnRNP H/H′ (A, lanes 8 and 9). Although the hnRNP F antibody is highly specific, the H/H′ antibody crossreacts with hnRNP F, which is 95% identical to H/H′ at the protein sequence level. Control reactions (A, lanes 10 and 11) show the background level precipitated with preimmune serum (lane 10).
[Figure: Identification and Functional Roles of Protein Factors That Bind to GGGG and UAGG Motifs]
Proteins that interact directly with the exonic UAGG motif were identified similarly, except that the RNA substrates contained a single radioactive label in the middle of the UAGG. Even with a single radioactive label, multiple proteins were observed to crosslink to the wild-type substrate, wt3, under splicing conditions (B, lane 4). To examine hnRNP A1 binding, the SELEX-derived consensus sequence, A1winner, was also tested in parallel. A low efficiency of UV crosslinking of hnRNP A1 has been observed previously [30]. The A1winner contains two UAGGGA sequences, and was found to crosslink to hnRNP H/H′ and F, in addition to A1 (B, lane 1; data not shown). Immunoprecipitation showed that A1 is recovered as an approximately 35-kDa protein from the wt3 sample, as was the case for the A1winner (B, lanes 1–8). A control substrate, mt3, with a dinucleotide mutation in the UAGG showed little or no immunoprecipitation of crosslinked A1 (B, lanes 9–11). Thus, these results confirm that hnRNP A1 binds directly to the UAGG motif in the context of the CI cassette exon sequence.
In order to investigate the functional roles of hnRNPs F, H, and A1 in the silencing mechanism, each protein was co-expressed with splicing reporters containing the CI cassette exon, and effects on the splicing pattern were monitored. For the wild-type splicing reporter containing an intact GGGG motif, overexpression of hnRNP F or H was found to enhance CI exon inclusion relative to the pcDNA control (C, lanes 1–5). These effects were reduced but not eliminated in the presence of the 5m2 splicing reporter, which lacks the GGGG motif (C, lanes 6–10). These results rule out a role in silencing of the CI exon for hnRNP F and H, and instead support an anti-silencing role for these factors.
Next we asked whether the silencing role of the GGGG motif is mediated through hnRNP A1, since the 5′ splice site of the CI cassette exon is related to the A1 consensus binding motif (ACG:GUAAGGGGAA [colon marks the 5′ splice site] versus UAGGG[A/U]). These experiments also examined the effects of portions of the flanking introns, since our previous study demonstrated a role for the downstream intron in this silencing mechanism. Chimeric splicing reporters contained the CI cassette exon and various portions of the flanking introns inserted between exons 1 and 3 of the GABAA receptor γ2 subunit (D). When the complete downstream intron was present, co-expression of hnRNP A1 reduced exon inclusion from 78.8% to 29.1%, nearly a 3-fold effect (D, lanes 5 and 6). In this context, the silencing effect of hnRNP A1 depends upon the intact downstream intron, since the silencing effect was substantially reduced when most of the downstream intron was removed (rGγCI-wt0 and rGγCI-up; D, lanes 1–4). The role of the 5′ splice site GGGG motif was then examined in the context of the rGγCI-dn reporter by introducing mutations 5m2 and 5m4, which destroy the guanosine cluster. The ability of hnRNP A1 to induce splicing silencing was reduced significantly by these mutations, suggesting that A1 is involved in mediating the cooperative effects of the GGGG motif (rGγCI-dn5m2 and rGγCI-dn5m4; D, lanes 7–10).
Combinations of UAGG and GGGG Motifs Are Associated with cDNA- and EST-Confirmed Skipped Exons in the Human and Mouse Genomes
We next sought to determine the extent to which the CI cassette silencing motif pattern is associated with exon skipping (partial or complete) in the human and mouse genomes. For this analysis, over 90,000 human and mouse orthologous exon pairs were divided into two datasets based on the presence or absence of one or more UAGG motifs at any position in the exon (but not overlapping the splice sites) and a GGGG motif within bases 3–10 of the adjacent downstream intron (). The percentage of alternatively spliced (skipped) exons in each of these datasets was then determined by use of large-scale, high-stringency alignments of available cDNAs and ESTs to the corresponding genomic loci (see Materials and Methods). If the motif pattern functions generally in splicing silencing, the frequency of exon skipping should be higher in the group of exons containing the UAGG and GGGG motif pattern than in those without it.
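The dataset split described above reduces to a per-exon test. The Python sketch below is illustrative only: it assumes each exon and its downstream intron are available as plain sequence strings, and the three-base `splice_margin` used to keep exonic UAGGs away from the splice sites is our assumption, since the text does not state exactly which exonic bases count as part of a splice site. The function names are hypothetical.

```python
from __future__ import annotations

import re


def motif_starts(seq: str, motif: str) -> list[int]:
    """0-based start positions of every occurrence of motif, overlaps included."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]


def has_ci_motif_pattern(exon: str, downstream_intron: str,
                         splice_margin: int = 3,
                         g_window: tuple[int, int] = (3, 10)) -> bool:
    """Hypothetical predicate for the CI cassette motif pattern.

    True when the exon has at least one UAGG that stays `splice_margin`
    bases (an assumed value) away from both exon ends, AND the downstream
    intron has a GGGG lying entirely within bases g_window (1-based,
    inclusive). DNA input is accepted: T is converted to U.
    """
    exon = exon.upper().replace("T", "U")
    intron = downstream_intron.upper().replace("T", "U")

    exonic_uagg = [p for p in motif_starts(exon, "UAGG")
                   if p >= splice_margin and p + 4 <= len(exon) - splice_margin]

    lo, hi = g_window
    # a GGGG starting at 0-based p covers 1-based intron bases p+1 .. p+4
    intronic_gggg = [p for p in motif_starts(intron, "GGGG")
                     if p + 1 >= lo and p + 4 <= hi]

    return bool(exonic_uagg) and bool(intronic_gggg)


if __name__ == "__main__":
    exon = "AAAUAGGAAAAAA"
    print(has_ci_motif_pattern(exon, "GUAAGGGGAAUU"))    # True: GGGG at bases 5-8
    print(has_ci_motif_pattern(exon, "GUAAGUAUUAGGGG"))  # False: GGGG at bases 11-14
```

Passing `g_window=(11, 20)` gives the shifted-window variant of the pattern that is compared with the 3–10 window in the analysis below.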
[Figure: Computational Analysis of UAGG and GGGG Motif Patterns Reveals Association with Exon Skipping Genome-Wide]
In these searches we considered exons of typical size (≤250 bases), and we required each component of the motif pattern to be conserved in sequence and position in the human and mouse orthologous exons. Using these stringent criteria, 16 exons (0.018%) contained the motif pattern, and of these, three were confirmed skipped exons (18.75%). The remaining 90,175 exons (99.98%) lacked the conserved motif pattern, and of these, 4,173 (4.63%) were confirmed skipped exons. The difference in the percentage of skipped exons in these two datasets was significant (p < 0.05). When exon length was not constrained, the fraction of skipped exons with the motifs was slightly lower (15.8%), but still significant (p < 0.05). When this analysis was repeated without requiring conservation of the motif pattern, 227 exons (0.24%) contained the motif pattern, and of these, 18 (7.9%) were confirmed skipped exons (p < 0.05). The remaining 96,292 exons (99.76%) lacked the motif pattern, and of these 4,441 (4.61%) were confirmed skipped exons.
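One way to see how an enrichment of this size can reach significance with only 16 motif-bearing exons is a one-sided hypergeometric (Fisher exact) test on the reported counts. The sketch below assumes that test; the statistic actually used is specified in Materials and Methods and may differ, and `enrichment_p_value` is a hypothetical helper.

```python
from math import comb


def enrichment_p_value(skipped_with: int, n_with: int,
                       skipped_without: int, n_without: int) -> float:
    """One-sided hypergeometric (Fisher exact) test: P(X >= skipped_with).

    X is the number of skipped exons expected among n_with exons drawn at
    random from all exons, given the overall number of skipped exons.
    """
    n_total = n_with + n_without
    n_skipped = skipped_with + skipped_without
    denom = comb(n_total, n_with)
    tail = sum(comb(n_skipped, i) * comb(n_total - n_skipped, n_with - i)
               for i in range(skipped_with, min(n_with, n_skipped) + 1))
    return tail / denom  # exact integer arithmetic, one final division


if __name__ == "__main__":
    # Counts reported in the text
    conserved = enrichment_p_value(3, 16, 4173, 90175)
    unconstrained = enrichment_p_value(18, 227, 4441, 96292)
    print(f"conserved pattern:     3/16  skipped, p = {conserved:.3f}")
    print(f"not conserved:        18/227 skipped, p = {unconstrained:.3f}")
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so no log-space tricks are needed at these dataset sizes.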
Variations of the CI cassette motif pattern were also analyzed. The reciprocal pattern, one or more GGGG motifs in the exon and a UAGG motif in bases 3–10 of the intron, also showed enrichment for confirmed skipped exons (8.4%) compared to those without this pattern (4.6%) (p < 0.001). Moreover, the occurrence of a 5′ splice site GGGG by itself was found to be associated with exon skipping: exons containing the GGGG motif in bases 3–10 of the intron but lacking UAGG and GGGG within the exon showed a significantly higher rate of exon skipping (7.8%) compared to those without the GGGG intronic motif (4.6%) (p < 0.001). Moving the position of the GGGG motif slightly downstream to bases 11–20 of the intron reduced the fraction of skipped exons observed to background levels (4.6%). Taken together, these data suggest that the close proximity (or overlap) of the GGGG motif to the 5′ splice site may be generally important in silencing, perhaps by limiting binding of U1 or U6 small nuclear ribonucleoprotein particles.
Underrepresentation of UAGG in Constitutive Exons, and Overrepresentation in Skipped Exons
Underrepresentation of UAGG in constitutively spliced exons and overrepresentation in skipped exons would be expected if this motif frequently plays a role in splicing silencing. To test this idea, approximately 5,000 known human cDNAs were downloaded from Ensembl (www.ensembl.org), and those containing a full-length ORF were shuffled 50 times using the program CodonShuffle. CodonShuffle randomizes the nucleotide sequence by swapping synonymous codons, preserving the encoded amino acid sequence, codon usage, and base composition of the native mRNA [32]. Consequently, the program controls for constraints on the protein-coding function of the mRNA and on codon usage. Because the ORF is preserved by this type of shuffling, the UAG portion of a UAGG motif can never occur in frame, since UAG is a stop codon. The occurrence of UAGG was reduced by 1.5-fold in authentic coding sequences as compared to CodonShuffled control sequences (p < 0.001). Thus, the depletion of UAGG is statistically significant, indicating modest selection against UAGG sequences in constitutive exons. Next we asked whether UAGG is overrepresented in skipped human exons. As expected, both UAGG and GGGG were found to be significantly overrepresented in skipped exons as compared to constitutive exons in human (χ² = 436 and 87, respectively; p …).
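The shuffled-control idea can be approximated in a few lines. The sketch below is not the CodonShuffle program [32]; it is an assumed, simplified stand-in that permutes codons among the positions encoding the same amino acid, which by construction preserves the protein, the codon usage, and the base composition, and it counts TAGG (the DNA form of UAGG) in native and shuffled sequences. Because stop codons are only exchanged with one another, an in-frame TAG can appear only where the native ORF already has a stop codon. All function names are hypothetical.

```python
from __future__ import annotations

import random

# Standard genetic code; codons enumerated with bases ordered T, C, A, G ("*" = stop)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def split_codons(orf: str) -> list[str]:
    if len(orf) % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def translate(orf: str) -> str:
    return "".join(CODON_TO_AA[c] for c in split_codons(orf.upper()))


def synonymous_shuffle(orf: str, rng: random.Random) -> str:
    """Permute codons among positions that encode the same amino acid."""
    codons = split_codons(orf.upper())
    positions_by_aa: dict[str, list[int]] = {}
    for idx, codon in enumerate(codons):
        positions_by_aa.setdefault(CODON_TO_AA[codon], []).append(idx)
    shuffled = codons[:]
    for positions in positions_by_aa.values():
        pool = [codons[p] for p in positions]
        rng.shuffle(pool)
        for p, codon in zip(positions, pool):
            shuffled[p] = codon
    return "".join(shuffled)


def count_motif(seq: str, motif: str = "TAGG") -> int:
    """Overlapping occurrences of motif in seq."""
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


def uagg_counts(orf: str, n_shuffles: int = 50, seed: int = 0) -> tuple[int, float]:
    """Native TAGG count and mean TAGG count over n_shuffles synonymous shuffles."""
    rng = random.Random(seed)
    native = count_motif(orf.upper())
    control = sum(count_motif(synonymous_shuffle(orf, rng))
                  for _ in range(n_shuffles)) / n_shuffles
    return native, control


if __name__ == "__main__":
    orf = "ATGGCTAGGTTAGGCGCAAGGCTATAG"  # toy ORF, not from the study
    print(uagg_counts(orf))
```

Summing the native and control counts over a collection of ORFs and taking control/native would give a fold depletion comparable in spirit to the value reported above.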
More rigorously, when all possible 5-mers were examined for overrepresentation in orthologous exons that are skipped in both human and mouse, a significant enrichment for UAGGC and UAGGG motifs was found (χ² = 15 and 13, respectively; p …) compared to orthologous pairs of constitutive exons. UAGGA and UAGGU were not significantly overrepresented, but this may be explained by the small dataset used for the analysis (approximately 240 exons) or by functional overlap with ESE sequences. Nonetheless, the appearance of the UAGG motif in two 5-mers indicates the importance of the motif in conserved skipped exons. Overrepresentation of UAGG in skipped exons has also been found for mRNAs expressed in brain and testes, which are enriched for regulated splicing events [33].
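The 5-mer screen can be sketched as a 2 × 2 Pearson χ² test per k-mer (skipped versus constitutive exons, containing versus lacking the k-mer). Counting presence or absence per exon and omitting a continuity correction are our assumptions; the function names are hypothetical, and the toy sequences below are not data from this study.

```python
from __future__ import annotations


def kmer_presence(seqs: list[str], k: int) -> dict[str, int]:
    """Number of sequences that contain each k-mer at least once (RNA alphabet)."""
    counts: dict[str, int] = {}
    for s in seqs:
        s = s.upper().replace("T", "U")
        for kmer in {s[i:i + k] for i in range(len(s) - k + 1)}:
            counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def kmer_enrichment(skipped: list[str], constitutive: list[str], k: int = 5):
    """(k-mer, chi2) for k-mers more frequent in skipped exons, highest chi2 first."""
    in_skipped = kmer_presence(skipped, k)
    in_const = kmer_presence(constitutive, k)
    ns, nc = len(skipped), len(constitutive)
    results = []
    for kmer, a in in_skipped.items():
        c = in_const.get(kmer, 0)
        if a / ns > c / nc:  # keep over-represented k-mers only
            results.append((kmer, chi2_2x2(a, ns - a, c, nc - c)))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy_skipped = ["TAGGCAAA", "CCTAGGCT", "AAAA"]
    toy_constitutive = ["AAAA", "CCCC", "GGGG", "TTTT"]
    for kmer, chi2 in kmer_enrichment(toy_skipped, toy_constitutive)[:3]:
        print(kmer, round(chi2, 2))
```

With only a few hundred exons, many k-mers occur in just a handful of sequences, which is one reason a small dataset can miss real motifs such as UAGGA or UAGGU.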
Identification of Skipped Exons with Conserved UAGG and GGGG Motif Patterns across the Human and Mouse Genomes
To identify exons unrelated to the CI cassette that might be silenced by a similar motif configuration, we focused in more detail on the UAGG and GGGG motif pattern by searching for these motifs singly and in combination in the database of approximately 96,000 human and mouse orthologous exons. Exons containing a GGGG in bases 3–10 of the intron and one or more exonic UAGGs were identified in the human and mouse subsets of the database and at the intersection of these datasets. These data are presented as Venn diagrams, and specific examples selected from the intersection dataset are shown to illustrate the motif patterns that are conserved in human and mouse orthologous exons (). We included in the intersection dataset only exons in which the motif pattern is conserved in sequence and position in the human and mouse orthologous exons.
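The requirement that the motif pattern be conserved in sequence and position can be expressed as an intersection of motif start positions. The sketch below assumes, for simplicity, that each human and mouse pair of exons and downstream introns is already aligned without gaps, so that equal indices correspond to orthologous positions; real orthologous exons would need an alignment step first. The function names are hypothetical.

```python
from __future__ import annotations


def motif_starts(seq: str, motif: str) -> set[int]:
    """0-based start positions of motif in seq (overlaps allowed)."""
    return {i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)}


def conserved_motif_pattern(human_exon: str, mouse_exon: str,
                            human_intron: str, mouse_intron: str,
                            g_window: tuple[int, int] = (3, 10)) -> bool:
    """True if a UAGG sits at the same exon position in both orthologs and a
    GGGG sits at the same intron position inside bases g_window (1-based).
    Assumes gap-free, pre-aligned orthologous sequences."""
    h_exon, m_exon, h_intron, m_intron = (
        s.upper().replace("T", "U")
        for s in (human_exon, mouse_exon, human_intron, mouse_intron))
    shared_uagg = motif_starts(h_exon, "UAGG") & motif_starts(m_exon, "UAGG")
    lo, hi = g_window
    # 0-based starts whose GGGG lies entirely within 1-based bases lo..hi
    allowed = set(range(lo - 1, hi - 3))
    shared_gggg = (motif_starts(h_intron, "GGGG")
                   & motif_starts(m_intron, "GGGG") & allowed)
    return bool(shared_uagg) and bool(shared_gggg)


if __name__ == "__main__":
    print(conserved_motif_pattern("AAAUAGGAAA", "AACUAGGAAC",
                                  "GUAAGGGGAA", "GUGAGGGGAU"))
```

Running the per-species pattern test separately on the human and mouse sequences would give the two outer sets of a Venn diagram; this stricter check defines the intersection.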
[Figure: Genome-Wide Identification of Exons with UAGG and GGGG Silencing Motifs]
As expected, the CI cassette exon of the GRIN1 gene was found in all three of the overlap datasets. Of the 19 exons containing the motif pattern in the intersection dataset, 16 exons of 250 or fewer bases in length were considered for further study based on the observation that skipping of longer exons is quite rare [34]. This dataset contained the genes for two well-known splicing factors, hnRNP H1 and H3 (HNRPH1 and HNRPH3).
Although human hnRNP H1 contains 14 exons and H3 contains ten exons, the UAGG and GGGG motif pattern was found associated with a single exon in each of these genes. As hnRNP H proteins are known to bind to guanosine-rich sequences, the presence of a conserved GGGG motif in the 5′ splice sites of these hnRNP H exons suggests the possibility of autoregulation at the level of splicing.
The hnRNP H exons and additional candidates in the intersection dataset (total of 12) were selected for RT-PCR analysis of their splicing patterns and of the tissue specificity of these patterns in human tissues (; ). The CI cassette exon was included in the analysis as a positive control (GRIN1). Skipping of the candidate exons for both human HNRPH1 and HNRPH3 was confirmed in several tissues. Exon skipping was also confirmed for candidate exons of UTRN and an uncharacterized hypothalamus-expressed gene, and tissue-specific exon skipping was evident for HNRPH1 exon 5, HNRPH3 exon 3, and UTRN exon 5. To our knowledge, these tissue-specific patterns have not been characterized previously. The RT-PCR results were confirmed by DNA sequence analysis of the gel-purified products of the RT-PCR reactions. Although the candidate exon in the ANXA8 gene was not experimentally validated in our analysis, EST and mRNA evidence confirms that the exon is skipped in cDNA libraries derived from choriocarcinomas (). An important caveat is that the true number of skipped exons could be significantly higher than that confirmed by RT-PCR because our sampling of human tissues in these experiments was not exhaustive.
[Figure: RT-PCR Confirmation of Exon Skipping Patterns in Human Tissues]
[Table: Human and Mouse Orthologous Exons Containing GGGG Motif Patterns]
The mouse orthologs of HNRPH1 exon 5 and HNRPH3 exon 3 were chosen for further analysis of their splicing patterns (, “1 GGGG exons”). These splicing patterns were determined using RNA derived from mouse heart and brain tissue, as well as from the mouse C2C12 cell line. For each RNA sample, radioactive RT-PCR reactions were performed for a set of three serial dilutions of the input RNA. Good consistency in the percent exon inclusion values for each set of serial dilutions was evident. Sequence alignments showed that exon 3 of both the human and mouse HNRPH3 genes contained an additional exonic GGGG motif not found in the orthologous HNRPH1 exon 5 sequences (, bottom), which might explain the higher rate of skipping observed for HNRPH3 exon 3. HNRPH1 exon 8 and β-actin exon 2 served as control exons, since these exons do not contain UAGG or GGGG motifs (, “0 GGGG exons”). As expected, the “0 GGGG” control exons showed 100% exon inclusion in each case.
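Percent exon inclusion across a dilution series can be summarized as follows. This is a sketch under stated assumptions: the two band signals are taken as already background-corrected, and the optional division by product length is an assumed correction for body-labeled RT-PCR products, not a procedure stated in the text. Consistent values across the three dilutions, as reported above, would appear as a small standard deviation. The function names are hypothetical.

```python
from __future__ import annotations

from statistics import mean, stdev
from typing import Optional


def percent_inclusion(included: float, skipped: float,
                      included_len: Optional[int] = None,
                      skipped_len: Optional[int] = None) -> float:
    """100 * I / (I + S) from the band signals of the two RT-PCR products.

    If both product lengths are given, each signal is first divided by its
    length (assumed correction: longer products carry more label per molecule).
    """
    if included_len and skipped_len:
        included, skipped = included / included_len, skipped / skipped_len
    return 100.0 * included / (included + skipped)


def dilution_summary(bands: list[tuple[float, float]]) -> tuple[float, float]:
    """Mean and standard deviation of percent inclusion over a dilution series."""
    values = [percent_inclusion(inc, skip) for inc, skip in bands]
    return mean(values), stdev(values)


if __name__ == "__main__":
    # hypothetical band signals for three serial dilutions of one RNA sample
    series = [(30, 70), (15, 35), (7.5, 17.5)]
    print(dilution_summary(series))
```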
[Figure: Analysis of Splicing Patterns in Mouse Tissues for Variations in the Number of Exonic UAGGs]
The observation that multiple UAGGs are associated with an increased strength of splicing silencing of the CI cassette exon (see ) prompted us to examine several exons with these characteristics that were identified in our searches. From the dataset of 213 human exons containing UAGG and GGGG, 13 exons with two or more UAGGs were identified, and from the dataset of 200 mouse exons containing UAGG and GGGG, 12 exons with two or more UAGGs were identified (). Exons within these datasets that had lengths typical for internal coding exons (≤250 bases) were chosen for RT-PCR analysis of their splicing patterns. RNA derived from mouse heart and brain and C2C12 cells confirmed the skipping of Hp1bp3 exon 2 and NCOA2 exon 13 and trace levels of skipping for MEN1 exon 8 (see ). Additional cDNA evidence was found in the databases in support of these splicing patterns (). In the case of Hp1bp3, sequence alignments showed that two UAGGs and the 5′ splice site GGGG motif were conserved in the human and mouse orthologs, but these exons were not found in the intersection dataset because the human exon corresponds to the first exon in the transcript and consequently was not annotated as an internal exon in the Ensembl dataset. Sequence alignments for the more weakly skipped exons, NCOA2 exon 13 and MEN1 exon 8, showed that one or more segments of the motif pattern were imperfect in each set of orthologs (see , bottom).
Generality of the UAGG and GGGG Motif Pattern for Exon Silencing and Differential Regulation by hnRNP Proteins
To test whether the silencing motif pattern identified above for the CI cassette exon is sufficient for exon silencing in vivo, this pattern was introduced into the middle exon of a heterologous splicing reporter, SIRT1 (). This middle exon corresponds to the constitutively spliced exon 6 of the human SIRT1 gene, and lacks any features of the silencing motif pattern to be examined. In these experiments we tested the generality of the motif pattern as well as the regulatory roles of hnRNP A1 and H. When the GGGG motif was introduced by itself at intron positions 6–9 or 8–11 of the SIRT1 splicing reporter (substrates SIRT1-G6–9 and SIRT1-G8–11, respectively), no change in the splicing pattern was observed relative to the parent substrate SIRT1 (, lanes 1, 4, and 7). These results indicate that in the SIRT1 context, the GGGG motif alone is not sufficient to induce exon skipping. However, when two UAGGs were introduced into the middle exon (ESS19), the splicing pattern was shifted substantially, from 100% to 29% exon inclusion (, lane 10). When the GGGG motif was subsequently introduced into the ESS19 substrate at intron positions 6–9 (ESS19-G6–9), exon inclusion was further reduced to 18% (, lane 13), showing the combined effects of the motif pattern. In this context, the effect of the intronic GGGG motif was position dependent, since no additional silencing was observed when the GGGG was moved to positions 8–11 of the intron (ESS19-G8–11).
[Figure: Analysis of UAGG and GGGG Motif Pattern in a Heterologous Context and Effects of hnRNP A1 and H Co-Expression]
Based on the effects of hnRNP A1 and hnRNP H on the level of CI cassette exon inclusion described above (see ), we also tested the effects of these factors with the new splicing reporter substrates in co-expression assays. Relative to the vector backbone controls, the co-expression of hnRNP A1 down-regulated exon inclusion, consistent with the presence of the complete silencing motif pattern or exonic UAGGs, and co-expression of hnRNP H had the opposite effect (see , lanes 10–18). The differential effects of hnRNP A1 and H were both dependent upon the presence of exonic UAGGs, since no change in the splicing pattern was observed for substrates SIRT1, SIRT1-G6–9, or SIRT1-G8–11 (see , lanes 1–9). Interestingly, these results suggest that hnRNP H can exert its anti-silencing effect through the exonic UAGGs.
To further investigate the generality of exon silencing by UAGG and GGGG motifs, we examined a subset of the exons identified by bioinformatics to assess their splicing patterns and sensitivity to regulation by hnRNP A1 and hnRNP H in the SIRT1 heterologous context. Exons containing the silencing motif pattern would be expected to be skipped and, generally, to be regulated by these splicing factors. For the convenience of testing new exons in this context, the SIRT1 splicing reporter was modified to introduce restriction sites 12 nucleotides upstream and 12 nucleotides downstream of the middle exon. Test exons with 12 nucleotides of flanking intron on each side were then cloned from mouse genomic DNA and inserted in place of the SIRT1 exon 6 between the restriction sites (). As controls, the middle exon of the SIRT1 splicing reporter and the middle exon of ESS19 were reinserted in this context to generate new splicing reporters identical to those tested above except for the added restriction sites. The splicing patterns of these modified substrates, SIRT1a and ESS19a, were found to be essentially identical to those of SIRT1 and ESS19 shown above, indicating that the restriction sites do not affect the splicing pattern in these assays.
[Figure: Exons Identified by Bioinformatics to Contain UAGG and GGGG Motif Patterns: Analysis of Splicing Patterns in a Heterologous Context and Effects of hnRNP Co-Expression]
Next we replaced the test exon of SIRT1a with the CI cassette exon of the rat GRIN1 gene (GRIN1_CI), exon 8 of MEN1 (MEN1_8), and exon 2 of Hp1bp3 (Hp1bp3_2). In the absence of protein co-expression, exon skipping was observed in every case, although the extent of skipping varied over a wide range (, “Control [vbb]”). For the CI cassette exon, hnRNP A1 induced 2.7-fold more skipping, whereas hnRNP H induced 3.2-fold more exon inclusion compared to the control sample (, compare lanes 3, 8, and 13). Co-expression of hnRNP H increased the inclusion of exon 2 of Hp1bp3 by a factor of 7.4, but no effects of hnRNP A1 were observed (, lanes 5, 10, and 15). An effect of hnRNP A1 may have been precluded by the extreme skipping pattern of this exon (0.8% inclusion), which contains three exonic UAGG motifs and a GGGG motif in the 5′ splice site. Thus, for these three exons, the regulation mediated by these hnRNP proteins is specified locally, that is, by sequences limited to the exon and adjacent splice sites. We cannot rule out the possible contributing roles of unknown sequence control elements in splicing silencing. However, sequence alignments show that these exons are highly diverse, and lack shared sequences longer than a few bases.
Co-expression of hnRNP A1 and hnRNP H was also observed to regulate exon 8 of MEN1, but with different results. Whereas exon skipping increased as expected in the presence of hnRNP A1 (74% to 57% exon inclusion), it increased to an even greater extent in the presence of hnRNP H (43% exon inclusion), indicating that both of these factors can silence the exon (, lanes 4, 9, and 14). Because the MEN1 exon contains two guanosine-rich ASF/SF2 motifs, 5′-GGGAGGA-3′ and 5′-AGGAGGG-3′, capable of binding hnRNP H, the observed silencing effect of hnRNP H in this case is not surprising, and is likely explained by the disruption of exon enhancement.
Finally, the results observed for the ESS19 splicing reporter prompted another computational search to determine whether exon skipping is associated with two or more exonic UAGGs genome-wide. Similar to the analysis of , exons containing two or more conserved UAGGs were identified from a large database (>94,000) of human and mouse exons and the cDNA/EST-confirmed skipped exons in that group were determined. From this analysis 163 human exons were found to contain two or more exonic UAGGs that are conserved in sequence and position in the orthologous mouse exons, and 16 of these (9.8%) were confirmed skipped exons (). This was a significant enrichment of exon skipping (p < 0.002) compared to the remaining exons (90,028) lacking UAGGs, of which 4,160 (4.6%) were confirmed skipped exons. When the analysis was repeated for a single UAGG in the exon, a larger number of exons was identified (3,602), but a smaller percentage of confirmed skipped exons, 229 (6.4%), was associated with this group (p < 0.002). The list of 16 human exons with two or more conserved UAGGs and transcript evidence for skipping is shown in , since these are novel candidates for alternative splicing regulation. Of particular interest are Elongator protein 2, NCOA2, Pumilio homolog 2, and RNA binding protein S1, which are implicated in RNA metabolism.
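The trend across UAGG classes can be made explicit with simple arithmetic on the counts reported above. The sketch only reproduces the percentages and adds a fold enrichment relative to exons lacking UAGGs; it performs no significance test, and the group labels and `skip_rates` helper are our own.

```python
from __future__ import annotations

# (confirmed skipped, total) exon counts as reported in the text
REPORTED = {
    ">=2 conserved UAGGs": (16, 163),
    "1 UAGG": (229, 3602),
    "no UAGG": (4160, 90028),
}


def skip_rates(groups: dict[str, tuple[int, int]],
               baseline: str = "no UAGG") -> dict[str, tuple[float, float]]:
    """Percent skipped exons per group and fold enrichment over the baseline group."""
    base_k, base_n = groups[baseline]
    base_rate = base_k / base_n
    return {name: (100 * k / n, (k / n) / base_rate)
            for name, (k, n) in groups.items()}


if __name__ == "__main__":
    for name, (pct, fold) in skip_rates(REPORTED).items():
        print(f"{name:>20}: {pct:4.1f}% skipped, {fold:.2f}x baseline")
```

Laid out this way, the skipping frequency rises stepwise with exonic UAGG copy number, paralleling the copy-number effect seen for the CI cassette exon.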
[Figure: Computational Analysis of Exonic UAGG Motifs and Exon Skipping Patterns Genome-Wide]