To better understand splicing regulation, we used a cell-based screen to identify ten diverse motifs that inhibit splicing from introns. Each motif was validated in another human cell type and gene context, and their presence correlated with in vivo splicing changes. All motifs exhibited exonic splicing enhancer or silencer activity, and grouping these motifs by their genomic distributions yielded clusters with distinct patterns of context-dependent activity. Candidate regulatory factors associated with each motif were identified, recovering 24 known and novel splicing regulators. Specific domains in selected factors were sufficient to confer intronic splicing silencer (ISS) activity. Many factors bound multiple distinct motifs with similar affinity, and all motifs were recognized by multiple factors, revealing a complex, overlapping network of protein:RNA interactions. This arrangement enables individual cis-elements to function differently in distinct cellular contexts depending on the spectrum of regulatory factors present.
The specificity of splicing is mainly defined by splice site and branch point sequences located near the 5' and 3' ends of introns. Beyond these core signals, multiple cis-acting splicing regulatory elements (SREs) play essential roles in controlling splicing specificity. These SREs are conventionally classified as exonic splicing enhancers (ESEs) or silencers (ESSs), and intronic splicing enhancers (ISEs) or silencers (ISSs). SREs generally function by recruiting trans-acting splicing factors that interact favorably or unfavorably with the core splicing machinery such as the small nuclear ribonucleoprotein (snRNP) complexes U1 or U2 1,2. In mammals, splicing of each gene is controlled by multiple SREs and corresponding splicing factors, whose combinatorial actions determine the final splicing outcome 3.
To determine comprehensive rules of splicing regulation, systematic identification of SREs and the associated trans-acting factors is needed. Significant progress has been made recently toward comprehensive identification of exonic SREs 4–9. However, fewer large-scale analyses of intronic SREs have been conducted 9,10 despite the expectation that intronic elements are as important in splicing regulation as their exonic counterparts. Intronic SREs are likely recognized by a greater diversity of proteins, including proteins of the hnRNP class as well as others 2, but knowledge of these factors remains incomplete.
The best-known ISSs include binding sites for the splicing repressors hnRNP I (or PTB) and hnRNP A1 11,12, as well as CA-rich sequences bound by hnRNP L 13. PTB can bind to ISSs and inhibit the transition from an exon-definition complex to an intron-spanning spliceosome 11, whereas hnRNP A1 may block recognition of splice sites by cooperative propagation along the pre-mRNA from primary binding sites in introns 12 or by other mechanisms 14. Overall, the sequence composition and mechanisms of action of known ISSs are quite diverse, suggesting that many more ISSs remain to be discovered.
To systematically identify ISSs, we screened a random library for short sequences that inhibit splicing in cultured human cells. All identified ISS motifs enhanced or inhibited exon inclusion when inserted into exons, and most altered the choice of alternative 5′ splice sites (5′SS). Analysis of the genomic distributions of these motifs identified four classes of motifs, each with similar patterns of context-dependent splicing regulatory activity. Identification of trans-acting splicing factors for each motif revealed a densely connected network, suggesting that individual exons are often controlled by multiple factors with overlapping RNA-binding specificities that act through the same element in different temporal or cellular contexts.
To identify ISS sequences in an unbiased manner, we developed a cell-based screen called FAS-ISS (fluorescence-activated screen of ISS). We constructed a reporter gene (pZW11) in which exons 1 and 3 form a complete mRNA encoding GFP, and exon 2 is a small constitutive exon (exon 6 of human SIRT1) that is normally included in ~100% of mRNAs (Supplementary Table 1). Because only transcripts that skip exon 2 encode intact GFP, ISS-induced skipping of this exon produces green cells. Insertion of known ISSs containing binding sites for hnRNP I (Test 1) or hnRNP A1 (Test 2) into the intron downstream of exon 2 caused partial skipping of this exon. Following transient transfection with the ISS-containing reporters, we observed about 15–35% GFP-positive cells (Fig. 1a), whereas cells transfected with the control construct had negligible fluorescence (< 5% green cells). RT-PCR confirmed that the splicing patterns were consistent with GFP expression (Fig. 1b).
We inserted a random pool of decanucleotides (10mers) 17 nt downstream of the test exon, far enough into the intron to avoid interfering with the 5′SS. We transformed the resulting library into enough E. coli cells to achieve ~2-fold coverage of all possible DNA 10mers. To evaluate library quality, we randomly picked 24 colonies for plasmid extraction and sequencing; all had 10-nt inserts with no base bias at any position. We further transfected the library into 293-FlpIn cells and sequenced the inserts from a pool of stably transfected cells to confirm the absence of sequence bias (Fig. 1c).
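As a rough check on the coverage figure, the sketch below computes how many transformants ~2-fold coverage of all 4¹⁰ possible 10mers implies and the expected fraction of 10mers represented at least once. It assumes (our assumption, not a statement from the screen) that inserts are uniformly random, so the number of clones carrying any given 10mer is Poisson-distributed with mean equal to the fold coverage. The function name `library_coverage` is hypothetical.

```python
import math


def library_coverage(k: int, fold: float) -> tuple[int, int, float]:
    """Return (possible k-mers, transformants needed, expected fraction represented).

    Assumes uniformly random inserts, so the clone count per k-mer is
    Poisson with mean `fold`; P(at least one clone) = 1 - exp(-fold).
    """
    n_possible = 4 ** k
    n_transformants = round(n_possible * fold)
    frac_seen = 1.0 - math.exp(-fold)
    return n_possible, n_transformants, frac_seen


n, t, f = library_coverage(10, 2.0)
print(n, t, round(f, 3))  # 1048576 2097152 0.865
```

Under this idealized model about 86% of possible 10mers would be present in the library; any ligation or transformation bias would lower that figure.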
The reporter was inserted by site-specific recombination into a single target site in the 293-FlpIn cell line. Typically about one in 1,000–5,000 cells was GFP-positive (Supplementary Fig. 1a). These GFP-positive cells were sorted by FACS and clonally expanded, and putative ISS sequences were identified by PCR and sequencing (Fig. 1d). We conducted 315 transfections in 21 batches, from which 114 ISS 10mers were identified, 102 of them unique (in a few cases, 9mers or 11mers were identified; see Supplementary Table 2). Nine 10mers were identified at least twice in independent transfections, and 19 pairs differed by a single nucleotide, suggesting that the screen approached saturation. These ISS 10mers could be readily clustered by sequence similarity (Fig. 1d). We inserted 16 of the recovered ISS 10mers (asterisks in Fig. 1d), spanning all clusters, back into the reporter pZW11 to assess their activity with a transient transfection assay 7. All 16 candidates produced 20–45% GFP-positive cells, compared to < 3% for random controls (Supplementary Fig. 1b). The percentage of green cells was presumably limited by transfection efficiency, as stable lines containing the same 10mers generally yielded a higher fraction of green cells.
To extract core motifs with intrinsic ISS activity, we used statistical over-representation in the recovered 10mer set as a criterion 7. Forty-one 5mers occurred at least five times in the recovered sequences, about 3-fold more often than expected (p < 10⁻⁴, based on 10,000 samplings of random 10mers). Using similar criteria, we identified over-represented 6mers and 7mers; these k-mers (k = 5, 6, 7) were then clustered by sequence similarity and multiply aligned using CLUSTALW to identify candidate ISS motifs. At a dissimilarity cutoff of 3.6, most ISS k-mers fell into one of nine clusters of at least six sequences each, which were designated FAS-ISS groups A to I (Fig. 2a). We also analyzed an isolated 7mer designated group U (unclustered). Such clustering is inherently inexact, as the number of clusters varies with the dissimilarity cutoff; nevertheless, it was informative because it provided distinct consensus motifs for further analyses. The group A consensus (CTCCT) resembled the binding motif of PTB and of its neuronal homolog, nPTB, and the group I consensus (TAGG) matched the hnRNP A1 motif. Both factors are known to inhibit splicing from intronic binding sites 11,12,14, confirming the ability of our screen to identify authentic ISS motifs.
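The over-representation criterion can be sketched as follows: count k-mers in the recovered set, then build a null distribution by repeatedly sampling the same number of random sequences of the same lengths. This is a minimal sketch under that assumed null model, not the study's pipeline; the function names and parameters are hypothetical, and the CLUSTALW clustering step is not reproduced.

```python
import random
from collections import Counter


def kmer_counts(seqs, k):
    """Count overlapping k-mer occurrences across all sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def overrepresented_kmers(seqs, k=5, min_count=5, n_samples=1000, seed=0):
    """Return {kmer: (observed, mean_null_count, empirical_p)} for frequent k-mers.

    Null model (assumption): the same number of uniformly random sequences
    with the same lengths as `seqs`. The empirical p-value is the fraction
    of null samples whose count reaches the observed count (+1 pseudocount).
    """
    rng = random.Random(seed)
    observed = kmer_counts(seqs, k)
    candidates = {km: c for km, c in observed.items() if c >= min_count}
    exceed = Counter()
    null_sum = Counter()
    lengths = [len(s) for s in seqs]
    for _ in range(n_samples):
        null_seqs = ["".join(rng.choice("ACGT") for _ in range(n)) for n in lengths]
        null = kmer_counts(null_seqs, k)
        for km, c in candidates.items():
            null_sum[km] += null[km]
            if null[km] >= c:
                exceed[km] += 1
    return {
        km: (c, null_sum[km] / n_samples, (exceed[km] + 1) / (n_samples + 1))
        for km, c in candidates.items()
    }
```

The fold enrichment is `observed / mean_null_count`. With the +1 pseudocount the smallest attainable p-value is 1/(n_samples + 1), so 10,000 samplings, as in the text, are needed to resolve p < 10⁻⁴; the smaller default here only keeps the runtime short.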
The identified ISSs had a higher content of A (37%) and T (28%) and a lower content of C (19%) and G (16%) (Supplementary Fig. 1c). This composition was distinct from that of previously identified exonic SREs, in which ESEs were largely A- or G-rich 5,6 and ESSs were G- or T-rich 6,7. Considering individual motifs, FAS-ISSs more often resembled the A- or G-rich ESE motifs than ESSs. Consistent with this, there are examples of SR proteins that promote splicing from exonic sites but inhibit splicing from intronic sites 15–18. At the dinucleotide level, AC, CA and GA were over-represented in FAS-ISSs, and CG was under-represented. By comparison, AC and GA were over-represented in ESEs and under-represented in ESSs 5,7, again suggesting similarity between ISSs and ESEs (Supplementary Fig. 1d). An exception was ISS group I (consensus TAGG), which matched motifs bound by hnRNP A1, a protein that typically represses splicing from both intronic and exonic locations 19–21.
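The mono- and dinucleotide statistics above can be computed as in this sketch. It assumes dinucleotide over- or under-representation is measured as an observed/expected ratio, with the expected frequency taken as the product of the two mononucleotide fractions; the text does not state the exact measure, and the function names are hypothetical.

```python
from collections import Counter


def composition(seqs):
    """Fraction of each base over all positions in `seqs`."""
    counts = Counter("".join(seqs))
    total = sum(counts[b] for b in "ACGT")
    return {b: counts[b] / total for b in "ACGT"}


def dinucleotide_oe(seqs):
    """Observed/expected ratio for each dinucleotide.

    Dinucleotides are counted within each sequence (not across sequence
    boundaries). Expected frequency assumes independent bases; a ratio
    above 1 indicates over-representation. Dinucleotides involving a base
    that never occurs are omitted.
    """
    mono = composition(seqs)
    di = Counter()
    for s in seqs:
        for i in range(len(s) - 1):
            di[s[i:i + 2]] += 1
    n_di = sum(di.values())
    return {
        a + b: (di[a + b] / n_di) / (mono[a] * mono[b])
        for a in "ACGT" for b in "ACGT"
        if mono[a] and mono[b]
    }
```

Applied to the recovered 10mers, a ratio well below 1 for `"CG"` would correspond to the CG under-representation described above.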
Some ISSs are probably undetectable by our screen, such as motifs whose cognate trans-factors are poorly expressed in 293T cells or motifs too weak to repress the test exon in our reporter. Our screen might also miss very long ISSs or those that function only when present in multiple copies. A previous screen for intronic SREs, SPLICE 10, which used a reporter based on nonsense-mediated mRNA decay, predominantly identified G-rich and G/T-rich elements. The SPLICE elements showed limited overlap with the ISS motifs identified here but overlapped somewhat with motifs identified in a parallel screen for ISEs 22.
To test whether the over-represented motifs possess intrinsic ISS activity, we analyzed a representative 6mer or 7mer (an “exemplar”) from each ISS group (Fig. 2a), using as a control a mutant of each exemplar differing by 1–3 nt (Supplementary Tables 3 and 4). In transient transfections, each exemplar showed ISS activity as judged by the generation of GFP-positive cells (Fig. 2b). Control sequences all yielded significantly lower levels of green cells (p < 0.05, paired t-test), and the mutations in groups A, C, F and U essentially abolished ISS activity. Semi-quantitative RT-PCR confirmed these results (Fig. 2c). These observations suggested that our approach successfully extracted core ISS motifs and motivated our use of these exemplars as representatives of the ISS groups in the remainder of this study.
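The paired t-test compares each exemplar with its own mutant control within matched experiments, so it works on per-replicate differences. A minimal standard-library sketch of the statistic; the replicate values in the usage note are hypothetical, not data from this study.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples x and y.

    x, y: e.g. % GFP-positive cells for exemplar and mutant in the same
    transfection batch. Requires at least two pairs.
    """
    if len(x) != len(y):
        raise ValueError("samples must be paired")
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1
```

For hypothetical values `paired_t([30, 25, 40], [5, 4, 8])`, t ≈ 8.09 with 2 degrees of freedom, above the two-sided 5% critical value of about 4.30. SciPy's `scipy.stats.ttest_rel` computes the same statistic and also returns the p-value.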
To examine the function of FAS-ISSs on a genomic scale, we determined whether the presence of these ISSs affected alternative splicing patterns in the Human BodyMap 2.0 dataset. We searched the introns downstream of alternative exons for the ISS sequences and for a set of decoy sequences generated by permuting the ISSs. For each ISS group, we compared the PSI (percent spliced in) values of exons associated with an ISS to those of exons associated with a decoy. Eight of the ten ISS groups were associated with lower inclusion than their decoy sets, with four groups (B, E, D and G) showing significant changes, whereas only one ISS group (I) showed substantially increased PSI (Supplementary Table 5). These results suggest that most FAS-ISSs help explain the splicing of alternative exons observed in RNA-seq data.
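The ISS-versus-decoy comparison can be sketched as below. Assumptions, all ours: PSI is computed from inclusion and skipping junction read counts without length normalization; decoys are random shuffles of the motif (preserving base composition); and "downstream" means the first 100 nt of the intron, a hypothetical window the text does not specify. Function names are hypothetical.

```python
import math
import random
import statistics


def psi(inclusion_reads, skipping_reads):
    """Percent spliced in from junction read counts (simplified, no length normalization)."""
    total = inclusion_reads + skipping_reads
    return 100.0 * inclusion_reads / total if total else math.nan


def make_decoys(motif, n, seed=0, max_tries=10000):
    """Return up to n distinct shuffles of `motif` (same base composition), excluding the motif."""
    rng = random.Random(seed)
    letters = list(motif)
    decoys = set()
    for _ in range(max_tries):
        if len(decoys) >= n:
            break
        rng.shuffle(letters)
        candidate = "".join(letters)
        if candidate != motif:
            decoys.add(candidate)
    return sorted(decoys)


def mean_psi(exons, motifs, window=100):
    """Mean PSI of exons whose downstream intron (first `window` nt) contains any motif.

    exons: iterable of (downstream_intron_seq, inclusion_reads, skipping_reads).
    Returns NaN if no exon matches.
    """
    values = [psi(inc, skip) for seq, inc, skip in exons
              if any(m in seq[:window] for m in motifs)]
    return statistics.mean(values) if values else math.nan
```

Comparing `mean_psi(exons, [iss])` with `mean_psi(exons, make_decoys(iss, 20))` gives the ISS-versus-decoy difference; assessing significance would additionally require a test on the per-exon PSI values, such as a rank-sum or permutation test.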
SREs generally function by recruiting specific trans-factors whose activities may vary across cell types and pre-mRNA contexts. We therefore tested whether the FAS-ISS motifs were specific to the cell type used in the original screen by transiently transfecting HeLa cells with reporters containing the same exemplars. Whereas reporters containing randomly chosen sequences showed no skipping of the test exon, exemplars of all 10 ISS groups caused significant exon skipping, and most mutants showed reduced exon skipping in HeLa cells (Fig. 2d).
To test whether these motifs had silencer activity in a second pre-mRNA context, we constructed a new splicing reporter by inserting exon 2 of the Chinese hamster dihydrofolate reductase (DHFR) gene and portions of its flanking introns between two GFP exons. We inserted exemplars of the ten ISS groups downstream of this exon (Supplementary Tables 3, 4) and assayed exon inclusion. Eight of the 10 ISS motifs exhibited silencer activity in this heterologous intron compared to the controls (Fig. 2e). The two exceptions were groups F and I, whose exon inclusion levels were not markedly different from controls, even though sequences similar to the group I motif have been shown to function as ISSs 20. Taken together, these data indicate that although the screen was conducted with a single reporter in a specific cell line, all the ISS motifs obtained were active in different cell types and most functioned in a heterologous intron context.
The activities of SREs often depend on their locations relative to nearby splice sites, collectively described as the “context dependence” of SREs 1. To determine whether FAS-ISS motifs function in other pre-mRNA locations, we inserted the ISS exemplars into the exon of a modular splicing reporter (Fig. 3a). Strikingly, all ISSs functioned either as ESEs to promote exon inclusion (exemplars C, D, E, F, G, and U) or as ESSs to inhibit exon inclusion (exemplars A, H, I). We observed similar functions for the ISSs in both HeLa and 293T cells (Supplementary Fig. 2a). The only exception was exemplar B, which functioned as an ESE in 293T cells but lacked detectable ESE activity in HeLa cells.
Using a reporter with two competing 5′SS, we previously found that all FAS-ESSs inhibited use of the proximal 5′SS when inserted between the two sites, whereas most ESEs promoted use of this site 23. When inserting each ISS exemplar between competing 5′SS of the same reporter, we observed more diverse activities (Fig. 3b and Supplementary Fig. 2b): three ISS motifs (C, D and E) showed “ESE-like” activity by promoting use of the proximal site, whereas most ISSs (A, B, F, G, H and I) inhibited the proximal site usage (i.e., ESS-like).
We scanned the Illumina Human BodyMap 2.0 dataset for alternative exons containing either ISS sequences or their decoys and examined the correlation between the presence of an ISS and changes in exon inclusion. The ISSs with ESS activity (groups A, H and I, shown in red) all showed markedly decreased PSI values compared to the decoy set (Supplementary Table 5), whereas most ISSs with ESE activity increased PSI, although only groups D and G produced substantial increases. Similar analyses of exons with alternative 5′SS did not yield statistically significant results, probably because of the small sample size.
Specific classes of SREs often have characteristic distribution patterns in pre-mRNAs. For example, many ESEs are enriched in constitutive exons (CE) relative to introns and in exons with weak splice sites, consistent with their function in enhancing exon inclusion 5,6. Conversely, ESSs are enriched in introns relative to exons and in skipped exons (SE) compared to CE, consistent with their function in repressing exon inclusion and use of decoy splice sites 7,23. Thus the distributional biases of SREs reflect their functions in different pre-mRNA locations and may be used in SRE classification 24. Previously we observed that all FAS-ESSs exhibited similar distribution biases 7,23. By contrast, the relative frequencies of different ISS groups varied substantially in CE, SE and flanking introns – e.g., some groups were enriched in CEs, others in SEs or introns – revealing unexpected complexity (Supplementary Fig. 3).
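A regional enrichment of the kind compared here (e.g., CE versus SE, or exons versus flanking introns) can be expressed as a log2 ratio of motif densities. The sketch assumes density is the number of overlapping motif occurrences per nucleotide and adds a small pseudocount to avoid division by zero; the text does not give the exact definition, and the function names are hypothetical.

```python
import math


def motif_density(motif, seqs):
    """Overlapping occurrences of `motif` per nucleotide across `seqs`."""
    hits = 0
    total = 0
    for s in seqs:
        total += len(s)
        i = s.find(motif)
        while i != -1:
            hits += 1
            i = s.find(motif, i + 1)  # step by 1 to count overlapping hits
    return hits / total if total else 0.0


def log2_enrichment(motif, region_a, region_b, pseudo=1e-6):
    """log2 of motif density in region A relative to region B (with pseudocount)."""
    return math.log2((motif_density(motif, region_a) + pseudo)
                     / (motif_density(motif, region_b) + pseudo))
```

Computing this value for each motif over a fixed list of region pairs yields one enrichment vector per motif.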
To explore this complexity and uncover any new patterns, we measured the relative enrichment of FAS-ISSs in thirteen pairs of pre-mRNA regions (Supplementary Table 6) and analyzed the resulting vectors of enrichment values using principal component analysis (PCA). The distribution of the ISS groups in the space of the first two principal components, which together accounted for 86% of the variance, suggested four clusters of motifs (numbered by cluster size): two tight clusters (Cluster 1: motifs C, D, E and G; Cluster 3: B and U), the looser Cluster 2 (A, H and I), and the isolated motif F as Cluster 4 (Fig. 4a). This classification corresponded closely with the context-dependent activities of the motifs observed above (Fig. 4b). Cluster 1 motifs were characterized by enrichment in CEs relative to SEs and introns, and by a convex distribution along exons (i.e., enriched near both ends relative to exon centers). All ISSs in this cluster promoted inclusion from an exonic location and favored proximal 5′SS usage when located between competing 5′SS (Fig. 3), a pattern typical of ESEs 23. Cluster 2 motifs were enriched in introns and SEs relative to CEs and had variable exonic distributions. Conversely, they inhibited exon inclusion and proximal 5′SS usage from an exonic location (Figs. 3 and 4b), similar to ESSs 23. Cluster 3 motifs were exon-enriched, but their frequency increased monotonically along exons. Unlike canonical ESEs, these ISSs promoted exon inclusion but inhibited (or did not affect) proximal 5′SS usage. Motif F was distributed differently from all other motifs (Fig. 4a): it was intron-enriched like Cluster 2 but promoted exon inclusion like Clusters 1 and 3. Taken together, all six exon-enriched ISS motifs promoted exon inclusion and all but one of the intron-enriched motifs inhibited exon inclusion, consistent with the RNA-seq results for endogenous exons (Fig. 4b).
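The PCA step can be sketched in plain Python, assuming each row is one ISS group's vector of thirteen enrichment values. This version uses power iteration with deflation on the sample covariance matrix instead of a library SVD, and returns each motif's coordinates on the leading components together with the fraction of total variance each component explains (the quantity summed to give the 86% above). It is illustrative only; the function name is hypothetical.

```python
import math


def pca(rows, n_components=2, iters=1000):
    """Return (scores, explained_variance_ratios) for the leading components.

    rows: one numeric vector per motif (e.g. 13 log2 enrichments).
    Power iteration with deflation on the sample covariance matrix.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    x = [[r[j] - means[j] for j in range(m)] for r in rows]  # center columns
    cov = [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(m)]
           for i in range(m)]
    total_var = sum(cov[i][i] for i in range(m))
    components, ratios = [], []
    for _ in range(n_components):
        # Start vector; assumed not orthogonal to the target eigenvector.
        v = [1.0 / math.sqrt(m)] * m
        lam = 0.0
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
            lam = math.sqrt(sum(a * a for a in w))  # ||C v|| -> eigenvalue
            if lam == 0.0:
                break
            v = [a / lam for a in w]
        components.append(v)
        ratios.append(lam / total_var if total_var else 0.0)
        # Deflate: remove the variance captured by this component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(m)]
               for i in range(m)]
    scores = [[sum(r[j] * c[j] for j in range(m)) for c in components]
              for r in x]
    return scores, ratios
```

The signs of principal components are arbitrary, so clusters should be read from relative positions in the score plot rather than from the sign of a coordinate.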
The existence of 4 clusters of ISS motifs with distinct patterns of activity contrasts with the relatively homogeneous distributions and activities of FAS-ESSs, suggesting that the functions of intronic SREs are more diverse.
SREs generally function by recruiting trans-factors to affect splicing 1. Alternatively, they may affect splicing by forming inhibitory structures that block recognition of splice signals 25, by changing long-range RNA structure 26, or by altering spliceosome assembly dynamics 9. Because Mfold predicted no particular changes in pre-mRNA structure in the original intron containing ISSs versus control 10mers (not shown), and most FAS-ISS motifs had silencer activity in a heterologous intron (Fig. 3b), they most likely function by recruiting inhibitory trans-factors. To identify associated factors in an unbiased manner, we incubated short RNA fragments containing ISS exemplars with whole-cell extract, isolated RNA-protein complexes with streptavidin beads, and identified proteins specifically bound by the ISSs by mass spectrometry 27 (Fig. 5a).
Our approach is illustrated using the exemplar from group U (CACACCA). Four major bands were present with the exemplar RNA but were reduced or absent with the controls (Fig. 5b). The strongest band (no. 2) corresponded to hnRNP L, which is known to bind CA-rich elements 13,28. Band 1 contained two similar proteins (IGF-II mRNA-binding proteins 1 and 3) that are involved in RNA trafficking and translation 29. Band 3 was identified as Y box-binding protein 1 (YB-1), which can bind both DNA and RNA. YB-1 has been reported to influence alternative splicing of CD44 30, but its RNA-binding specificity remained unclear. Band 4 contained an alternative isoform of hnRNP L and the protein SF2P32. SF2P32 was originally identified through co-purification with the splicing factor SRSF1 31 but was later localized to the mitochondrial matrix 32, suggesting that it may be a co-purification artifact resulting from its ability to bind other RNA-binding proteins (RBPs). Thus, hnRNP L and YB-1 were identified as putative splicing repressors associated with motif U.
We performed RNAi and over-expression of the identified proteins in cells transfected with splicing reporters containing the cognate ISS. Knockdown of either hnRNP L or YB-1 derepressed inclusion of an exon containing the group U ISS, whereas RNAi of an unrelated RBP (GRSF-1) had no effect (Fig. 5c, Supplementary Fig. 4a). These splicing inhibitory activities were specific to the CA-rich motif, as knockdown had no effect on a reporter containing another ISS (group D). Conversely, over-expression of YB-1, and to a lesser extent hnRNP L, enhanced exon skipping in the reporter containing the CA-rich ISS but not in a similar reporter containing other ISSs (Fig. 5d, Supplementary Fig. 4b). These data implicate hnRNP L and YB-1 as factors responsible for the ISS activity of motif U. Interestingly, simultaneous knockdown of hnRNP L and YB-1 had effects similar to RNAi of either protein alone, and over-expression of both factors repressed splicing less than over-expression of YB-1 alone, suggesting that ISS-dependent splicing repression by YB-1 and hnRNP L is not simply additive. The two proteins did not interact directly as judged by co-immunoprecipitation, although an indirect association bridged by RNA was observed (Fig. 5e). Over-expression of YB-1 rescued the effect of hnRNP L knockdown in splicing reporters containing the cognate ISS but not a control ISS (Fig. 5f). Therefore, the two proteins might compete for binding to the same ISS, with YB-1 repressing splicing more strongly than hnRNP L.
We expanded the affinity purifications to identify trans-factors for the other ISS motifs (groups B to I). Group A was excluded because it matched the known binding motifs of PTB and nPTB, which inhibit splicing from intronic positions 33–36. We purified the RNA-protein complexes associated with these ISSs and two additional negative controls (mutant exemplars from groups C and F) and resolved total proteins by SDS-PAGE (Supplementary Fig. 5).
Each ISS motif exhibited a distinct protein-binding profile (Supplementary Fig. 5), implying sequence-specific RNA binding by the associated factors. Protein bands present only in ISS samples were identified by mass spectrometry. Most of the identified factors were known RBPs or contained putative RNA-binding domains, while a few were annotated as DNA-binding or cytoskeletal proteins (Supplementary Table 7). Some proteins may have been identified through indirect interactions with other specific RBPs. For example, multiple U1 snRNP proteins (e.g., U1-70k, U1A, SmD1 and SmD2) co-purified with ISS motif E, a purine-rich element recognized by SRSF1 (Table 1). Since SRSF1 has been detected as a component of the U1 snRNP complex 37, these U1 snRNP core components probably bound the group E ISS indirectly through interaction with SRSF1.
We also identified some abundant proteins that probably bound RNA in a non-sequence-specific manner (e.g., La protein was co-purified with 3 motifs and nucleolin co-purified with motif C). Of the 24 identified putative sequence-specific splicing factors (Table 1), 12 belonged to the hnRNP class, many of which were previously shown to regulate splicing, including hnRNPs A1, A2, H1, I and L 35,36,38–40. Several non-hnRNP proteins have not previously been reported to regulate splicing.
Most of the identified RBPs contained multiple RNA binding domains, and many appeared to have flexible specificity capable of binding multiple distinct motifs. Conversely, individual ISS motifs were recognized by multiple proteins, suggesting that the relationship between splicing factors and ISS motifs may be most compactly represented by an RNA-protein connectivity map (Fig. 6a) rather than a table of one-to-one interactions. The ISS motifs from the same cluster (Fig. 4a) tended to bind similar proteins, while a few specific proteins (e.g., members of hnRNP A, D and H families) recognized motifs belonging to two different clusters.
The complexity of the RNA-protein connectivity map raised questions about the extent of direct binding. An extreme case was presented by ISS motifs F, H and I, which bound a highly overlapping set of factors despite significant sequence diversity. These factors included hnRNP A0, A1, A2, A3, D0, DL and DAZAP1, most of which contain two N-terminal RRM domains and a C-terminal Gly-rich motif. We used surface plasmon resonance (SPR) to measure direct interactions between each ISS exemplar and single purified proteins (hnRNP A0, A1, A2, D0 and DL were tested; hnRNP A3 failed to express). Notably, ISS motifs F, H and I all bound directly to multiple proteins of the hnRNP A family (A0, A1 and A2) and to hnRNP D0 and hnRNP DL (Fig. 6b, Supplementary Fig. 6), although each protein showed somewhat different affinities for individual exemplars. For example, hnRNP A0 and D0 preferentially bound the A/U-rich motif H, while hnRNP DL preferentially bound motif I. Nevertheless, the binding affinities for distinct motifs were fairly similar, with apparent Kd values in the range of 10–100 nM for each pair (unpublished), suggesting that the RNA-protein connections detected in our affinity purification typically reflected direct interactions.
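A common way to obtain an apparent Kd from steady-state SPR responses is to fit a 1:1 (Langmuir) binding model, R = Rmax·C/(Kd + C). The text does not state the fitting procedure, so this sketch assumes that model and fits Kd by a log-spaced grid search, solving Rmax in closed form (linear least squares) for each candidate Kd. Function names, the grid range and the concentrations in the test are hypothetical.

```python
import math


def steady_state_response(conc, kd, rmax):
    """1:1 Langmuir model: equilibrium SPR response at analyte concentration conc (M)."""
    return rmax * conc / (kd + conc)


def fit_kd(concs, responses, log_lo=-10.0, log_hi=-6.0, n=401):
    """Fit (Kd, Rmax) by grid search over log10(Kd) in [log_lo, log_hi].

    For a fixed Kd the model is linear in Rmax, so the best Rmax is
    sum(R*f) / sum(f*f) with f = C / (Kd + C).
    """
    best = None
    for i in range(n):
        kd = 10 ** (log_lo + (log_hi - log_lo) * i / (n - 1))
        f = [c / (kd + c) for c in concs]
        rmax = sum(r * x for r, x in zip(responses, f)) / sum(x * x for x in f)
        sse = sum((r - rmax * x) ** 2 for r, x in zip(responses, f))
        if best is None or sse < best[0]:
            best = (sse, kd, rmax)
    return best[1], best[2]
```

The grid step of 0.01 log10 units limits Kd resolution to about 2%; a nonlinear least-squares routine would refine the estimate, but a grid is robust for a one-parameter search across the 10–100 nM range discussed above.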
The hnRNP proteins comprised roughly half of the putative trans-factors identified above, many containing regions of repetitive sequence such as Gly-rich domains (e.g., the hnRNP A1 family). In addition, we found one SR protein (SRSF1) binding to the ISSs of groups D and E. It is well established that SRSF1 activates splicing by binding to ESEs and hnRNP A1 inhibits splicing by recognizing ESSs (reviewed in 1,41). Both of these prototypical splicing factors have modular structures, recognizing RNA targets through their N-terminal RRM domains and affecting splicing through C-terminal functional domains (RS or Gly-rich) 19,42. However, other mechanisms may exist, as RS domains have been shown to recognize RNA near the branch point to promote splicing 43. When tethered to an exon, most RS domains can activate exon splicing 42,44,45, whereas the Gly-rich domain of hnRNP A1 can suppress splicing 19,45. Since both hnRNP A1 and SRSF1 were identified as putative ISS-associated splicing repressors, we hypothesized that their RS and Gly-rich domains could inhibit splicing from an intronic location.
We tested this hypothesis by fusing an RS domain (from SRSF1) or a Gly-rich domain (from hnRNP A1) with modified pumilio (PUF) domains that recognize specific 8-nt RNA sequences 46. Two PUF domains were used: PUF(3-2), which specifically recognizes UGUAUGUA, and PUF(6-2), which recognizes UUGAUAUA 45,46. Combining each with the RS or Gly-rich domain yielded four fusion proteins, which were co-transfected with splicing reporters containing the cognate target sites (Fig. 6c). As predicted, both the RS and Gly-rich domains inhibited exon inclusion when bound downstream of the 5′SS, whereas binding of the PUF domain alone had no detectable effect (Fig. 6c, left). Splicing inhibition was detected only with cognate PUF-RNA pairs, not with non-cognate pairs.
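Whether a reporter carries a cognate site for a given PUF fusion reduces to exact matching of the 8-nt targets named above. A small sketch, assuming exact matches only (engineered PUF domains may tolerate some mismatches, which this ignores); the dictionary and function names are hypothetical.

```python
# 8-nt RNA targets of the two engineered PUF domains named in the text.
PUF_TARGETS = {"PUF(3-2)": "UGUAUGUA", "PUF(6-2)": "UUGAUAUA"}


def find_puf_sites(seq):
    """Return sorted (puf_name, 0-based position) for exact target matches.

    Accepts DNA or RNA; T is converted to U before matching.
    """
    rna = seq.upper().replace("T", "U")
    hits = []
    for name, site in PUF_TARGETS.items():
        start = rna.find(site)
        while start != -1:
            hits.append((name, start))
            start = rna.find(site, start + 1)
    return sorted(hits, key=lambda h: h[1])


def is_cognate(puf_name, seq):
    """True if `seq` contains at least one target site for `puf_name`."""
    return any(name == puf_name for name, _ in find_puf_sites(seq))
```

In this framing, a cognate pair is a fusion whose PUF domain has a site in the reporter intron, and a non-cognate pair is one without.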
To examine whether splicing inhibition from intronic locations is a general activity of RS and Gly-rich domains, we generated additional proteins by fusing PUF(6-2) with the RS domains of SRSF2, SRSF5 and SRSF7, the Gly-rich domains of hnRNP A1, A2 and A3, a short (RS)6 peptide, or a 19-aa Gly-rich sequence (Fig. 6c, right panel). Remarkably, compared to the controls (lanes 13 and 14), all of these fusions strongly inhibited inclusion of the test exon when their cognate target was located in the downstream intron, suggesting that a polypeptide containing RS repeats or a short Gly-rich fragment is sufficient to inhibit splicing when recruited downstream of an exon.
We adapted a cell-based screening approach developed previously for comprehensive identification of ESSs 7 to identify ten distinct ISS motifs. The similarities in the reporters and other aspects of the two screens make comparison of their results worthwhile to understand intrinsic differences between exonic and intronic silencers. All the newly identified ISS motifs had activity in a second cell type, and most functioned in a heterologous intron, supporting their functional modularity and portability as observed for ESSs. However, in some respects ISSs behaved very differently from ESSs, exhibiting far greater diversity and complexity of function.
The first clue to these differences was the greater sequence complexity of the ISSs. While the ESSs were G- or T-rich (74% G+T), the sequence bias in the ISSs was less pronounced (65% A+T), and a larger number of ISS motifs were identified (10 ISSs versus 7 ESSs). All ESS motifs had similar patterns of distributional bias 7,23 and were able to repress intron-proximal 5′SS and 3′SS, demonstrating remarkable consistency of activity 23. By contrast, the ISS motifs had diverse patterns of distributional bias, with different motifs most enriched in CEs, SEs or introns, and with either convex or increasing frequency along exons. Placing the ISSs in exonic locations revealed a diversity of functions, including activation or repression of exon inclusion and inhibition or enhancement of proximal 5′SS usage. These context-dependent activities matched the genomic distribution patterns, emphasizing the need to subclassify ISSs. We used PCA to define four clusters of functionally different ISS motifs based on their genomic distribution patterns; motifs within the same cluster may therefore still have quite different consensus sequences. The associated trans-factors can act through multiple motifs and sometimes even across multiple PCA clusters, suggesting that this classification could be improved by taking trans-factors into account.
Because the sequence specificity of most RBPs is relatively low, affinity purifications of proteins associated with ISSs tend to be noisy. Our method incorporated several improvements to minimize such noise: using a short RNA (21–23 nt) with three copies of the ISS exemplar to increase affinity, using a long spacer between biotin and the RNA to increase RNA accessibility, and incubating in a large volume (i.e., at low protein concentration) for an extended time (> 4 h) to help reach binding equilibrium. Many of the identified proteins probably regulate splicing, as suggested by this study and by other labs, but some may also function in other RNA processing pathways. For example, G3BP-1 has been reported to mediate RNA translation and degradation 47, and GRSF-1 has been shown to affect translation 48. Even canonical splicing factors such as SRSF1 and hnRNP A1 have been reported to control mRNA translation or degradation 49,50.
Most factors bound multiple ISS motifs and each motif was recognized by multiple factors, supporting a complex network of protein-RNA interactions responsible for ISS activity. Such overlapping specificities may enable a variety of regulatory relationships. For example, multiple factors with similar regulatory activities may bind the same motif to confer functional redundancy, whereas antagonism would be expected in cases where one factor can displace another and confer opposite splicing regulatory function 22. Other relationships could provide subtler tuning of splicing levels, e.g., displacement of a repressive factor by another repressive factor with stronger or weaker activity, or recruitment of distinct splicing regulatory complexes to a given site in the pre-mRNA. The relationship between PTB and nPTB may illustrate some of these regulatory relationships, e.g., in HeLa cells nPTB can compensate for PTB depletion 34, while in neural development replacement of PTB by nPTB may alter the regulation of a substantial alternative splicing program 51.
A variety of regulatory relationships are possible among the examples identified here. For example, both hnRNP L and YB-1 bound ISS motif U. Although over-expression or depletion of either factor was sufficient to repress or derepress splicing, respectively, their relationship is not simple redundancy. Instead, they may compete for the same site, with hnRNP L repressing splicing less strongly than YB-1 (Fig. 5c). As another example, ISS motifs D and E were both recognized by SRSF1 and hnRNP H1, which typically have opposite patterns of activity 38,52–54. Our data suggest that these factors might antagonistically regulate introns containing motifs D or E; in 293T cells, binding of motifs D and E by SRSF1 or other intron-repressive factors likely predominates, since these motifs have ISS activity. The less-characterized factors RBM45 and GRSF-1 also bind motif D, potentially adding further regulatory complexity via this motif.
Our analyses using fusions of candidate splicing regulatory domains indicated that the requirements for splicing repression were surprisingly minimal. Tethering an RS or Gly-rich domain from any of various SR or hnRNP proteins (or even a short RS repeat or Gly-rich peptide) downstream of an exon was sufficient to repress splicing (Fig. 6c). This observation raises the possibility that many other proteins with similar domains also control splicing when bound to pre-mRNA. An analysis of the human proteome identified 24 human proteins that contain both a short Gly-rich segment and at least one RRM domain, an 8-fold enrichment over the frequency expected from the individual frequencies of these domains (unpublished).
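The enrichment figure compares the observed number of proteins carrying both features with the number expected if the two domains were distributed independently across the proteome. The sketch below shows that arithmetic and, as an addition not described in the text, a Poisson upper-tail probability as one simple approximation for gauging significance. The helper names are hypothetical, and the proteome and per-domain counts in the usage lines are placeholders, not the study's numbers.

```python
import math


def expected_cooccurrence(n_with_a: int, n_with_b: int, n_total: int) -> float:
    """Expected number of proteins carrying both features if they occur independently."""
    return n_with_a * n_with_b / n_total


def fold_enrichment(observed: int, expected: float) -> float:
    """Observed co-occurrence divided by the independence expectation."""
    return observed / expected


def poisson_upper_tail(k: int, lam: float, extra_terms: int = 500) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed directly in log space.

    Summing the upper tail (rather than computing 1 - lower tail) keeps
    precision when the probability is very small.
    """
    total = 0.0
    for i in range(k, k + extra_terms):
        total += math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
    return total


# The text reports 24 proteins at ~8-fold enrichment, i.e. roughly 24 / 8 = 3 expected.
implied_expected = 24 / 8
print(implied_expected, poisson_upper_tail(24, implied_expected))

# Hypothetical counts (NOT from the study), only to exercise the helpers.
toy_expected = expected_cooccurrence(n_with_a=1000, n_with_b=60, n_total=20000)
print(toy_expected, fold_enrichment(24, toy_expected))
```

A Poisson null treats co-occurrences as rare independent events; a hypergeometric or permutation test over the actual domain annotations would be a more exact alternative.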
All splicing reporters were constructed from a backbone vector, pZW1, which contains a multi-cloning site between two GFP exons 7. The random sequence library was generated from a foldback primer extended with Klenow fragment, then digested and ligated into pZW11, which contains a multi-cloning site downstream of a constitutive exon. To test ISSs in a heterologous context, a new reporter (pZW2C) was constructed by inserting exon 2 of the DHFR gene and part of its flanking introns between the two GFP exons. The reporter with competing 5′ splice sites and the modular splicing reporter have been described previously 23,55,56. Additional details are given in the Supplementary Note.
293 and HeLa cells were cultured in DMEM supplemented with 10% fetal bovine serum (FBS). Transfections were carried out with Lipofectamine 2000 (Invitrogen) according to the manufacturer’s instructions.
For stable transfection, 293 FlpIn cells were co-transfected with a 10-fold excess of pOG44 plasmid. To select stable transfectants, cells were split 1:4 one day after transfection and grown for one more day, after which hygromycin was added to a final concentration of 100 µg/ml. For each transfection in the FAS screen, 1.6 µg of pZW11 vector containing the random library was transfected into 293 FlpIn cells in a 15 cm tissue culture dish. After 10 days of selection, the resistant clones were trypsinized, pooled and sorted into 96-well plates at one cell per well using a Cytomation MoFlo high-speed sorter. Surviving colonies became visible after 10 days and were checked for green fluorescence. Total DNA from GFP-positive cells was used for PCR and sequencing. The ISS sequences were further analyzed with a computational pipeline as previously described.
Total RNA was isolated from transfected cells with TRIzol reagent (Invitrogen) followed by DNase I (Invitrogen) treatment. Total RNA (2 µg) was then reverse-transcribed with SuperScript III (Invitrogen) using an oligo(dT) primer or a gene-specific primer (for splicing reporters), and one-tenth of the RT product was used as template for PCR amplification (25 cycles, with a trace amount of Cy5-dCTP in addition to non-fluorescent dNTPs). The products were resolved by gel electrophoresis, and the gels were scanned with a Typhoon 8600 Imager (GE Healthcare) and analyzed with ImageQuant 5.2 software (Molecular Dynamics/GE Healthcare). All experiments were repeated at least three times. The primers used to amplify the GFP-based minigene reporters were AGTGCTTCAGCCGCTACCC for GFP exon 1 and GTTGTACTCCAGCTTGTGCC for exon 3.
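Each gel lane yields inclusion and skipping products whose band intensities must be converted into a splicing ratio. A minimal sketch of that conversion is shown below. It assumes background-subtracted band volumes exported from ImageQuant and corrects for the fact that a product's Cy5 signal scales with the number of labeled C residues it contains; whether the authors applied such a correction is not stated, and the helper names and amplicon sequences are hypothetical placeholders.

```python
from statistics import mean, stdev


def labeled_c_count(amplicon: str) -> int:
    """Cy5-dCTP can be incorporated at every C of both strands of a double-stranded
    product, which equals the number of C plus G residues counted on one strand."""
    seq = amplicon.upper()
    return seq.count("C") + seq.count("G")


def percent_inclusion(incl_volume: float, skip_volume: float,
                      incl_amplicon: str, skip_amplicon: str) -> float:
    """Percent inclusion from background-subtracted band volumes, corrected for label content."""
    incl = incl_volume / labeled_c_count(incl_amplicon)
    skip = skip_volume / labeled_c_count(skip_amplicon)
    return 100.0 * incl / (incl + skip)


def summarize(replicates):
    """Mean and sample standard deviation across at least two replicate measurements."""
    return mean(replicates), stdev(replicates)


# Toy amplicons, NOT the real reporter products: the inclusion product has twice as many
# C/G positions, so its raw signal is halved before the two bands are compared.
incl_amp = "GC" * 50  # 100 C+G positions
skip_amp = "GC" * 25  # 50 C+G positions
reps = [percent_inclusion(v, 100.0, incl_amp, skip_amp) for v in (200.0, 180.0, 220.0)]
print(summarize(reps))
```

Without the label correction, the longer (inclusion) product would be systematically over-weighted, because it incorporates more Cy5-dCTP per molecule.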
The RNA affinity purification method was modified from a protocol from the Marzluff lab 27. A short (~21 nt) RNA containing three copies of an ISS exemplar was synthesized with a 5′ biotin followed by two 18-carbon spacers (Ambion/Invitrogen). For each RNA sample, approximately 2.5 × 10⁸ HeLa cells (NCCC, Minneapolis) were harvested at ~95% confluence and resuspended in 2.5 ml ice-cold resuspension buffer (50 mM Tris-HCl pH 8.0, 150 mM NaCl). The cells were mixed with 2.5 ml 2× lysis buffer (50 mM Tris-HCl pH 8.0, 150 mM NaCl, 15 mM NaN3, 1% (v/v) NP-40, 2 mM DTT, 2 mM PMSF, 2× protease inhibitor mix), lysed for 5 min, and then centrifuged at 12,000 × g for 20 min at 4 °C. Subsequently, 0.75 nmol of biotinylated RNA with two 18-atom spacers (purchased from Dharmacon) was added to the supernatant and incubated for 2 h at 4 °C. We then added 50 µl streptavidin-agarose beads (Sigma) to the mixture and incubated it for 2 h at 4 °C with slow rotation. The beads were rinsed three times with 4 ml lysis buffer (50 mM Tris-HCl pH 8.0, 150 mM NaCl, 15 mM NaN3, 0.5% NP-40, 1 mM DTT, 1 mM PMSF, 1× protease inhibitor mix), resuspended in a final volume of 40 µl, and mixed with 10 µl 5× SDS loading buffer. The proteins were then separated on a 10% SDS-PAGE gel and stained with Coomassie blue. The gel was kept in 3% acetic acid for subsequent mass spectrometry analysis. Bands specific to the ISS RNA but absent from controls were excised, and the gel slices containing the candidate proteins were digested with trypsin following standard protocols. Peptides eluted from the gel were analyzed by ESI-MS/MS on a Q-Tof (Micromass) mass spectrometer. In-gel digestion and MS were carried out by the UNC Proteomics Center.
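Because the 2× lysis buffer is combined 1:1 with the resuspended cells, its NP-40, DTT and PMSF end up at the 1× levels listed for the wash buffer. The sketch below does that bookkeeping and estimates the bait RNA concentration. The `mix` helper is hypothetical, and it assumes additive volumes while ignoring the volume contributed by the cell pellet.

```python
def mix(*parts):
    """Final concentrations after combining solutions.

    parts: (volume_ml, {component: concentration}) tuples. Units are kept per
    component (mM or % v/v). Volumes are assumed additive; cell volume is ignored.
    """
    total_ml = sum(volume for volume, _ in parts)
    final = {}
    for volume, components in parts:
        for name, conc in components.items():
            final[name] = final.get(name, 0.0) + conc * volume / total_ml
    return final


resuspension = {"Tris-HCl (mM)": 50, "NaCl (mM)": 150}
lysis_2x = {"Tris-HCl (mM)": 50, "NaCl (mM)": 150, "NaN3 (mM)": 15,
            "NP-40 (% v/v)": 1.0, "DTT (mM)": 2, "PMSF (mM)": 2}

final = mix((2.5, resuspension), (2.5, lysis_2x))
for name, conc in final.items():
    print(f"{name}: {conc:g}")

# Biotinylated RNA: 0.75 nmol in ~5 ml of lysate; nmol per litre equals nM.
rna_nm = 0.75 / (5.0 / 1000)
print(f"RNA bait: ~{rna_nm:.0f} nM")
```

Under these assumptions the bait is present at roughly 150 nM during the 2 h binding step.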
The human tissue sequence data (Body Map 2.0) were obtained from the European Nucleotide Archive (ENA) under accession number ERP000546 (available at http://www.ebi.ac.uk/ena/data/view/ERP000546). The 50-bp paired-end reads were mapped using TopHat, and PSI values of alternative exons were calculated using MISO 57. Decoy ISS motif sets were generated as random permutations of each ISS cluster member, avoiding known SREs. Occurrences of ISS sequences and decoys were identified within the skipped exon body and the 200 bases downstream of the skipped exon. The significance of the overall ISS effect was calculated as the median of comparisons between PSI values for events containing an ISS and those containing a decoy. For each skipped-exon (SE) event, the PSI value was taken from the tissue in which its gene is most highly expressed. The significance of tissue specificity was assessed by a bootstrapping comparison.
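The decoy and scanning steps can be sketched as follows. This is an illustrative reading, not the authors' pipeline: it assumes that "permutation" means shuffling the letters of each cluster member, that any shuffle equal to a cluster member or known SRE is rejected, and that matches spanning the exon/intron junction are not counted. All function names and the motif strings in the checks are hypothetical.

```python
import random


def make_decoys(motifs, known_sres, n_per_motif=1, seed=0, max_tries=1000):
    """Shuffle each motif's letters to make decoys, rejecting any shuffle that equals an
    ISS cluster member, a known SRE, or a decoy already produced for that motif."""
    rng = random.Random(seed)
    forbidden = set(motifs) | set(known_sres)
    decoys = {}
    for motif in motifs:
        found = []
        for _ in range(max_tries):
            letters = list(motif)
            rng.shuffle(letters)
            candidate = "".join(letters)
            if candidate not in forbidden and candidate not in found:
                found.append(candidate)
                if len(found) == n_per_motif:
                    break
        decoys[motif] = found  # may be short (or empty) for low-complexity motifs
    return decoys


def contains_any(exon, downstream_intron, kmers, intron_nt=200):
    """True if any k-mer occurs in the skipped-exon body or in the first intron_nt bases
    of the downstream intron (junction-spanning matches are not counted)."""
    regions = (exon, downstream_intron[:intron_nt])
    return any(k in region for k in kmers for region in regions)


def psi_at_peak_expression(psi_by_tissue, expression_by_tissue):
    """PSI from the tissue in which the event's gene is most highly expressed."""
    tissue = max(expression_by_tissue, key=expression_by_tissue.get)
    return psi_by_tissue[tissue]


# Placeholder motifs, NOT the ISS motifs from this study.
print(make_decoys(["TTGTTA", "GGCAGG"], known_sres=["TTTGTA"], n_per_motif=2))
```

Events would then be split by whether `contains_any` finds an ISS or only a decoy, and their `psi_at_peak_expression` values compared as the text describes.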
The siRNAs were purchased from Dharmacon (On-target SMARTpool, with scrambled dsRNA controls), and transfections were conducted with Lipofectamine 2000 (Invitrogen) in cells grown in 24-well plates. Forty-eight hours after siRNA transfection, we transfected the cells with 0.2 µg of splicing reporter and harvested them 24 hours after this second transfection for further analyses. In over-expression experiments, we co-transfected 0.8 µg of expression vector and 0.2 µg of splicing reporter, and harvested the cells after 72 h for further analyses. For the rescue experiment, 293T cells were transfected with siRNA or control siRNA. After 48 hours, we co-transfected FLAG-tagged YB-1 and ISS splicing reporters, and collected the cells 24 hours after the second transfection for further analysis.
FLAG-tagged splicing factors were transfected into 293T cells as described above. After 48 hours, the transfected cells were lysed for 10 minutes at room temperature in lysis buffer containing 50 mM HEPES, 150 mM NaCl (4.38 g), 1 mM EDTA, 1% (w/v) CHAPS and Sigma protease inhibitor cocktail (with or without 50 µg/ml RNase A). M2 FLAG agarose resin (40 µl) was prepared according to the manufacturer’s instructions and incubated with 1 ml of cleared cell lysate overnight at 4 °C with gentle agitation. The IP samples were spun down at 8,000 rpm for 30 seconds and washed three times with wash buffer (50 mM HEPES pH 7.5, 150 mM NaCl, 0.1% Triton X-100, 10% glycerol). The proteins were then eluted with FLAG peptide (200 ng/µl) and transferred to a new tube for further analysis.
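The 4.38 g given for NaCl in the lysis-buffer recipe implies a batch volume that the text does not state. Working backwards from the molar mass of NaCl (58.44 g/mol) shows that it corresponds to 150 mM in roughly 500 ml; that volume is inferred from the arithmetic below, not taken from the source, and the helper names are hypothetical.

```python
NACL_G_PER_MOL = 58.44  # molar mass of NaCl (g/mol)


def grams_for(molarity_m: float, volume_l: float, molar_mass: float) -> float:
    """Mass of solute needed: moles (molarity x litres) times molar mass."""
    return molarity_m * volume_l * molar_mass


def litres_for(grams: float, molarity_m: float, molar_mass: float) -> float:
    """Final volume in which a given mass of solute reaches the target molarity."""
    return grams / (molarity_m * molar_mass)


print(round(grams_for(0.150, 0.500, NACL_G_PER_MOL), 2))
print(round(litres_for(4.38, 0.150, NACL_G_PER_MOL), 3))
```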
The exon and intron datasets were generated using filters similar to those described previously 23. A3Es and A5Es were further filtered by requiring that the longer isoform differ from the shorter isoform by at least 6 bases, and SEs were required to be at least 6 bases long. Using these criteria, we obtained the following numbers of orthologous human/mouse exon pairs: 1232 A5Es, 1408 A3Es, 2964 SEs and 44368 CEs. Details of the computational analyses are given in the Supplementary Methods.
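The two additional length filters are simple enough to state as code. The sketch below is hypothetical: the `ExonEvent` record layout is invented for illustration, and it covers only the 6-base thresholds given here, not the earlier filters of ref. 23 or the human/mouse orthology mapping.

```python
from collections import Counter
from dataclasses import dataclass

MIN_BASES = 6  # threshold stated in the text for both filters


@dataclass
class ExonEvent:
    """Hypothetical record for one alternative or constitutive exon event."""
    kind: str                # "A5E", "A3E", "SE" or "CE"
    exon_len: int            # SE/CE: exon length; A5E/A3E: exon length in the longer isoform
    short_exon_len: int = 0  # A5E/A3E only: exon length in the shorter isoform


def passes_extra_filters(event: ExonEvent) -> bool:
    """Apply only the two additional length filters described in the text."""
    if event.kind in ("A5E", "A3E"):
        return event.exon_len - event.short_exon_len >= MIN_BASES
    if event.kind == "SE":
        return event.exon_len >= MIN_BASES
    return True  # CEs are not subject to these extra filters


def count_by_kind(events):
    """Number of events of each kind that survive the filters."""
    return Counter(e.kind for e in events if passes_extra_filters(e))


toy = [ExonEvent("A5E", 120, 117), ExonEvent("A3E", 90, 84), ExonEvent("SE", 4)]
print(count_by_kind(toy))
```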
For each exon/intron dataset, the ISS k-mers in each group were analyzed separately for their positional frequency along the pre-mRNA. The first and last 60 nt of each exon, together with 200 nt of each flanking intron, were used in the analyses. The positional frequency was defined as the number of transcripts containing a k-mer from a given ISS set at that position, divided by the total number of transcripts covering that position. To smooth the data, the average frequency in a 10-nt window was plotted.
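A minimal sketch of the positional-frequency calculation follows. Several details are assumptions not fixed by the text: a hit is counted at the position where a k-mer starts, short introns or exons are padded with a placeholder character so that splice sites stay aligned (padded positions count as uncovered), and smoothing uses only full 10-nt windows. The function names are hypothetical.

```python
GAP = "."  # placeholder for positions with no sequence (short exon or intron)


def exon_flank_regions(up_intron, exon, down_intron, intron_nt=200, exon_nt=60):
    """3' splice-site side: last intron_nt of the upstream intron + first exon_nt of the exon.
    5' splice-site side: last exon_nt of the exon + first intron_nt of the downstream intron.
    Left padding keeps the splice site at a fixed index in every transcript."""
    left = up_intron[-intron_nt:].rjust(intron_nt, GAP) + exon[:exon_nt]
    right = exon[-exon_nt:].rjust(exon_nt, GAP) + down_intron[:intron_nt]
    return left, right


def positional_frequency(seqs, kmers):
    """Fraction of covering transcripts with an ISS k-mer starting at each position."""
    kmer_set = set(kmers)
    lengths = sorted({len(k) for k in kmer_set})
    n_pos = max(len(s) for s in seqs)
    freqs = []
    for pos in range(n_pos):
        hits = covered = 0
        for s in seqs:
            if pos >= len(s) or s[pos] == GAP:
                continue  # this transcript has no sequence at this position
            covered += 1
            if any(s[pos:pos + k] in kmer_set for k in lengths):
                hits += 1
        freqs.append(hits / covered if covered else 0.0)
    return freqs


def moving_average(values, window=10):
    """Mean over each full window of `window` consecutive positions."""
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]
```

In use, each dataset's transcripts would be passed through `exon_flank_regions`, and the left and right region lists scored separately for each ISS k-mer group before smoothing.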
hnRNP A0, A1, A2, D0 and DL were cloned into the pT7HtB expression vector with a 6×His tag and expressed in E. coli strain BL21(DE3). The proteins were purified to >90% purity with the His GraviTrap Kit (GE Healthcare) following the manufacturer’s instructions. The purified proteins were buffer-exchanged to remove excess imidazole and stored at 4 °C in storage buffer (20 mM HEPES pH 7.4, 0.5 M NaCl, 10% glycerol and 2 mM DTT).
The 5′-biotinylated RNAs were synthesized by Dharmacon and immobilized at 150 fmol per flow cell on a Biacore streptavidin-coated chip (Sensor Chip SA, GE Healthcare). A control flow cell was loaded with a 21-nt CA-rich control sequence, and all SPR responses were normalized to the control. All experiments were performed on a Biacore 2000 instrument at least twice at each concentration, and the data were fitted with BIAevaluation software. Proteins were diluted to their final concentrations in HBS-EP buffer (GE Healthcare, Cat. No. BR-1001-88) prior to the experiments to avoid bulk shift. Data were acquired in Kinject mode at a flow rate of 50 µl/min in the same running buffer. The surface was regenerated with a 100 µl injection of 2 M NaCl followed by 200 µl of running buffer.
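BIAevaluation performs the actual fitting. The stand-alone sketch below only illustrates the idea for a steady-state analysis under a 1:1 (Langmuir) binding model, Req = Rmax × C / (KD + C), after subtracting the control flow cell. Whether the authors fitted kinetic or steady-state responses, and whether "normalized" means subtraction, is not stated, so both are assumptions here; the function names are hypothetical, and a log-spaced grid search over KD (with Rmax solved by least squares at each step) stands in for a proper nonlinear optimizer.

```python
import math


def reference_subtract(sample_ru, control_ru):
    """Subtract the control flow-cell response point by point (assumed normalization)."""
    return [s - c for s, c in zip(sample_ru, control_ru)]


def fit_steady_state(conc_nm, req_ru, kd_min_nm=0.01, kd_max_nm=1e5, n_grid=3501):
    """Fit Req = Rmax * C / (KD + C); return (KD in nM, Rmax in RU, residual sum of squares).

    For a fixed KD, the best Rmax has a closed form (linear least squares through
    the origin), so only KD needs to be searched.
    """
    lo, hi = math.log10(kd_min_nm), math.log10(kd_max_nm)
    best = None
    for i in range(n_grid):
        kd = 10 ** (lo + (hi - lo) * i / (n_grid - 1))
        x = [c / (kd + c) for c in conc_nm]
        rmax = sum(a * b for a, b in zip(x, req_ru)) / sum(a * a for a in x)
        sse = sum((b - rmax * a) ** 2 for a, b in zip(x, req_ru))
        if best is None or sse < best[2]:
            best = (kd, rmax, sse)
    return best


# Synthetic example (NOT data from the study): KD = 50 nM, Rmax = 100 RU,
# plus a constant 10 RU background on both flow cells.
concs = [6.25, 12.5, 25, 50, 100, 200, 400]
control = [10.0] * len(concs)
sample = [10.0 + 100.0 * c / (50.0 + c) for c in concs]
kd, rmax, sse = fit_steady_state(concs, reference_subtract(sample, control))
print(f"KD ~ {kd:.1f} nM, Rmax ~ {rmax:.1f} RU")
```

Steady-state fitting needs the injections to reach equilibrium; if they do not, a kinetic fit of association and dissociation phases, as offered in BIAevaluation, is the appropriate choice.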
We thank J. Hui from the Shanghai Institute of Biological Science, Shanghai, China and A. Willis from the University of Nottingham, UK for providing expression constructs of trans-factors, and Brenton Graveley from the University of Connecticut Health Center, Farmington, CT for constructs containing RS domains. We thank T. Nilsen and A. Berglund for critical reading of the manuscript. We thank Z. Dominski and B. Marzluff for help with RNA affinity purifications. This work was supported by an AHA grant (0865329E) and an NIH grant (R01CA158283) to ZW, and an NIH grant to CB.
Author contributions: Y.W., C.B.B. and Z.W. designed the research. Y.W., Z.W., J.Z., K.L. and R.C. performed the experiments. M.M., X.X. and A.R. developed computational methods to analyze the data. Y.W., C.B.B. and Z.W. wrote the paper.