Spliceosomes assemble on pre-mRNA splice sites through a series of dynamic ribonucleoprotein complexes, yet the nature of the conformational changes remains unclear. Splicing Factor 1 (SF1) and U2 Auxiliary Factor (U2AF65) cooperatively recognize the 3’ splice site during the initial stages of pre-mRNA splicing. Here, we used small-angle X-ray scattering to compare the molecular dimensions and ab initio shape restorations of SF1 and U2AF65 splicing factors, as well as the SF1/U2AF65 complex in the absence and presence of AdML splice site RNAs. The molecular dimensions of the SF1/U2AF65/RNA complex substantially contracted by 15 Å in the maximum dimension, relative to the SF1/U2AF65 complex in the absence of RNA ligand. In contrast, no detectable changes were observed for the isolated SF1 and U2AF65 splicing factors or their individual complexes with RNA, although slight differences in the shapes of their molecular envelopes were apparent. We propose that the conformational changes that are induced by assembly of the SF1/U2AF65/RNA complex serve to position the pre-mRNA splice site optimally for subsequent stages of splicing.
Pre-mRNA splicing regulates expression of almost all human genes, by either including or excluding different protein coding exons and removing intervening noncoding introns.1 The fidelity of pre-mRNA splicing is tightly controlled, because splicing errors are frequently lethal or lead to human genetic diseases (reviewed in 2). For example, the splicing factor U2AF65 is essential for vertebrate development,3 and specific U2AF65 deficiencies have been associated with cystic fibrosis,4 myotonic dystrophy,5 and cancers.6; 7 Like U2AF65, the splicing factor SF1 is required for mammalian cell viability,8 and deficiencies of SF1 isoforms have also been shown to promote cancer proliferation.9; 10; 11 In the early stages of splicing, a protein-protein complex between U2AF65 and SF1 splicing factors recognizes consensus pre-mRNA sequences located near the 3’ splice site (Fig. 1a).12; 13; 14 These respective pre-mRNA sequences include a polypyrimidine (Py) tract directly adjacent to the 3’ splice site junction, and an upstream branch point sequence (BPS) that ultimately provides the nucleophile in the splicing reaction. Following association of the SF1/U2AF65 complex, the core small nuclear ribonucleoprotein (snRNP) particles of the spliceosome are recruited to the pre-mRNA, and undergo a series of ATP-dependent rearrangements to achieve an active conformation for pre-mRNA splicing (reviewed in 15).
The U2AF65 and SF1 splicing factors are composed of modular domains that primarily function to mediate protein/protein or protein/RNA interactions (Fig. 1b). A C-terminal, RRM-like domain of U2AF65 with specialized features for protein rather than RNA binding (called a U2AF-Homology Motif, UHM16) is responsible for recognizing an N-terminal domain of SF1 (called a U2AF-Ligand Motif, ULM).17; 18 Two central RNA recognition motifs (RRM) of U2AF65 identify the Py tract sequence of the pre-mRNA.14 The N-terminal region of U2AF65 contains a ULM that mediates heterodimerization with the U2AF small subunit (U2AF35),19 which is required for splicing a subset of introns with short, divergent Py tracts.20; 21; 22 An arginine-serine (RS) rich domain at the U2AF65 N-terminus is essential for snRNP recruitment, and is thought to act by enhancing duplex formation between the U2 snRNA and the BPS of the pre-mRNA.23 For SF1, a central K homology (KH) motif and adjoining Quaking homology 2 (QUA2) region recognize the BPS site.17; 24 The SF1 domain between the KH-QUA2 and ULM is highly conserved from yeast to humans, and contains regulatory sites for phosphorylation.25 The piecewise structures of many of these interacting domains have been determined, including SF1-ULM/U2AF65-UHM, U2AF65-RRM1-RRM2/Py tract, U2AF65-ULM/U2AF35-UHM, and SF1-KH-QUA2/BPS complexes (Fig. 1c).18; 24; 26; 27; 28 Nevertheless, the overall structures of the SF1/U2AF65 complex and of its assembly with the 3’ splice site remain unknown.
Several lines of evidence indicate that the 5’ and 3’ splice sites communicate during the earliest stages of splicing, despite intervening pre-mRNA sequences that are up to thousands of nucleotides in length. For U2AF65 and SF1 to efficiently associate with the 3’ splice site during the initial stages of splicing, the U1 snRNP of the spliceosome must be present at the 5’ splice site.29; 30 RNA substrates labeled with directed hydroxyl-radical probes confirm that the 5’ and 3’ splice sites are within close proximity (10–20 Å) even in this early splicing complex.31 The inter-splice site interaction depends on the presence of a Py tract in the pre-mRNA substrate for recognition by U2AF65. Further, use of a U2AF65-tethered hydroxyl-radical probe demonstrates that the U2AF65 N-terminus interacts with the BPS32, despite the simultaneous engagement of the C-terminal U2AF65-UHM by the SF1/U2AF65/BPS complex (Fig. 1a, b).18 A bent pre-mRNA configuration that places the 5’ splice site near the 3’ splice site is consistent with the functions of the N-terminal U2AF65 RS domain to contact the BPS and promote annealing of the BPS/U2 snRNA duplex (Fig. 1a).23; 33; 34
Despite the plethora of biochemical evidence in support of these conformational changes and the availability of structures for the individual splicing factor domains, the field currently lacks structural information for the early splice site assemblies. To investigate the overall conformations of SF1, U2AF65, and the SF1/U2AF65 complex, and the influence of RNA binding on each, we used small-angle X-ray scattering (SAXS) to characterize the low resolution shapes of the splicing factors in the presence and absence of prototypical AdML splice site sequences. This method is suitable for complexes of intermediate size and has the advantage of determining macromolecular shapes in solution.35; 36 To the best of our knowledge, these results represent the first overall views of these splicing factor and RNA complexes.
The boundaries of the protein constructs and sequences of the RNA sites used for SAXS are shown in Fig. 1d,e. Both the U2AF65 and SF1 constructs (U2AF65R123 and SF11–255) contain the essential domains of the SF1/U2AF65 complex and bind the 3’ splice site consensus sequences. The U2AF65R123 construct is composed of the two RRMs for Py tract recognition and a UHM for SF1 interaction. The RS domain of U2AF65 was omitted, since it nonspecifically dominates the RNA affinity and causes protein aggregation that interferes with SAXS.37; 38 The SF11–255 construct is composed of a KH-QUA2 domain for BPS recognition and a ULM for interaction with the U2AF65 UHM. The C-terminal proline-rich domain and zinc knuckle of SF1 were omitted, since these regions lack known function in formation of the splice site complex and, like the RS domain, are prone to aggregation that interferes with SAXS. Sequences from the prototypical 3’ splice site of the adenovirus major late first intron (AdML) were chosen as the RNA binding sites, since this substrate is independent of the U2AF35 small subunit21 and the activities of U2AF65 and SF1 in AdML splicing have been studied extensively (for example, 14; 39). Serendipitously, the RNA sites contribute less than 10% of the total scattering mass of the splicing factor complexes. Consequently, the SAXS data primarily reflect the protein conformations.
To verify that the samples used for SAXS were sufficiently concentrated to ensure homogeneous protein/protein and protein/RNA complexes in solution, we measured the apparent equilibrium dissociation constants (KD) of the complexes. Previously, we had determined a high affinity for the association of SF11–255 with the U2AF65-UHM interaction domain (KD 12 nM, near the detection limits of the calorimetry method)40, indicating that the SF11–255/U2AF65R123 complex would be stable under the conditions of the SAXS experiments. To address the affinities of the protein/RNA complexes, we used fluorescence anisotropy to measure the KD values of the splicing factors for fluorescein-labeled RNAs (Fig. 1f–h). Consistent with previous reports,24; 26 the KDs for SF11–255 binding the 9-nucleotide BPS and U2AF65R123 binding the 14-nucleotide Py tract of the AdML splice site were in the micromolar range (0.7 ± 0.1 µM and 0.3 ± 0.1 µM, respectively). The SF11–255/U2AF65R123 complex displayed a nanomolar affinity for the 25-nucleotide AdML 3’ splice site (3’SS; 2.6 ± 0.8 nM), which was ~100–200-fold higher than the affinities of the individual proteins for the isolated BPS and Py tract RNAs. Although the large positive cooperativity observed here is enhanced by differences in the lengths of the RNA sites, previous reports likewise document that the presence of SF1 or U2AF65 pre-bound to the AdML splice site increases the apparent RNA affinity of its protein partner.13; 41 Altogether, these affinity measurements ensure that the sample concentrations used for SAXS data collection were significantly greater than the KD values of the protein-protein and protein/RNA complexes, by at least 3800-fold for the lowest concentration of SF11–255/U2AF65R123, 150-fold for the lowest concentrations of U2AF65R123/Py tract or SF11–255/BPS, and 14600-fold for the lowest concentration of SF11–255/U2AF65R123/3’SS.
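As a rough illustration of why a large excess of sample concentration over KD implies a nearly homogeneous complex, the sketch below evaluates the standard quadratic solution for 1:1 binding. The function name, the equimolar mixing, and the treatment of each assembly as a simple two-state 1:1 equilibrium are assumptions made for illustration and are not part of the analysis described above; only the fold-excess values are taken from the text.

```python
import math

def fraction_bound(total_conc, kd):
    """Fraction of an equimolar 1:1 mixture present as complex.

    Assumes two-state binding A + B <-> AB with both components at the
    same total concentration (an illustrative simplification).
    """
    b = 2.0 * total_conc + kd
    # Physically meaningful root of [AB]^2 - b*[AB] + C^2 = 0.
    complex_conc = (b - math.sqrt(b * b - 4.0 * total_conc ** 2)) / 2.0
    return complex_conc / total_conc

# Fold excess of the lowest sample concentration over KD, as quoted in the text.
fold_excess = {
    "SF1/U2AF65": 3800,
    "binary protein/RNA complexes": 150,
    "SF1/U2AF65/3'SS": 14600,
}

for name, fold in fold_excess.items():
    # Work in units of KD, so total_conc = fold and kd = 1.
    print(f"{name}: fraction bound = {fraction_bound(fold, 1.0):.3f}")
```

Under these simplifying assumptions, a 150-fold excess corresponds to roughly 92% complex and the larger excesses to more than 98%; RNA added in slight excess, as in the 1:1:1.2 mixing ratio used for the ternary complex, would shift these values further toward saturation.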
SF11–255 and U2AF65R123 were expressed separately as GST fusion proteins in E. coli as described,40; 42 and the GST-tags were removed following purification by glutathione-affinity chromatography. All of the proteins and protein-protein complexes used for SAXS experiments were further purified by ion-exchange chromatography and size exclusion chromatography. For preparation of the SF11–255/U2AF65R123/3’SS complex, the purified proteins and synthetic RNA (Dharmacon Technologies) were mixed in a 1:1:1.2 ratio, and the complex was isolated following size exclusion chromatography. The relatively large size of the AdML 3’SS RNA (7.7 kDa) allowed this high affinity complex to be concentrated using a 2 kDa MWCO filter without detectable loss of the RNA component (Supp. Table S1, Supp. Fig. S1). The shorter BPS RNA (2.7 kDa) approached the MWCO of the filter and bound with a lower affinity, raising selective RNA loss during the early stages of SF11–255/BPS sample concentration as a concern. To ensure a 1:1 stoichiometry, the SF11–255/BPS sample was reconstituted following size exclusion chromatography of the SF11–255 protein. The U2AF65R123 complex with Py tract (4.2 kDa) used for detailed analysis here was prepared in a similar manner to the SF11–255/BPS complex. Control SAXS experiments comparing U2AF65R123/Py tract complexes with RNA added either before or after size exclusion chromatography lacked detectable differences in the molecular dimensions (Supp. Fig. S2). All samples (SF11–255, U2AF65R123, SF11–255/U2AF65R123, SF11–255/BPS, U2AF65R123/Py tract, SF11–255/U2AF65R123/3’SS) were monodisperse by dynamic light scattering (≥98% of the scattering contributed by appropriately sized macromolecules, data not shown).
Short and long exposures for three different concentrations of each sample were collected at SIBYLS beamline 12.3.1 of the Advanced Light Source, Lawrence Berkeley National Laboratory. Following subtraction of matching buffer profiles, the final dataset was obtained as previously described26 by merging the datasets collected at different exposures and concentrations. Superposition of scaled scattering curves, linearity of Guinier plots43, and consistent radii of gyration (RG) over the range of concentrations confirmed the absence of sample aggregation (Fig. 2). In general, the Kratky plots for each of the splicing factors and RNA complexes are broad parabolas followed by slightly increasing I(q)·q² with increasing q (Fig. 3), consistent with well-folded domains connected by flexible linkers44.
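The Guinier check mentioned above rests on the low-angle approximation ln I(q) = ln I(0) − (RG²/3)·q², so that a straight line in a plot of ln I(q) against q² yields RG from its slope. The following minimal sketch fits that line by ordinary least squares. The function name and the synthetic, noise-free data for a hypothetical particle with RG = 30 Å are illustrative assumptions; this is not the software used for the analysis reported here.

```python
import math

def guinier_fit(q, intensity):
    """Least-squares line through ln I(q) versus q^2.

    Returns (rg, i0) from the Guinier approximation
    ln I(q) = ln I(0) - (rg^2 / 3) * q^2.
    """
    x = [qi ** 2 for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("non-negative Guinier slope; RG is undefined")
    return math.sqrt(-3.0 * slope), math.exp(intercept)

# Synthetic, noise-free data for a hypothetical particle (RG = 30 A).
true_rg, true_i0 = 30.0, 100.0
q_values = [0.005 + 0.002 * k for k in range(15)]  # in A^-1
i_values = [true_i0 * math.exp(-(qv * true_rg) ** 2 / 3.0) for qv in q_values]

rg, i0 = guinier_fit(q_values, i_values)
print(f"RG = {rg:.2f} A, I(0) = {i0:.2f}, q_max * RG = {q_values[-1] * rg:.2f}")
```

With real data, the fit is usually restricted to the lowest angles (a common rule of thumb for globular particles is q·RG below about 1.3), and upward curvature in this region is a typical sign of aggregation.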
Overall, the Kratky plots and pairwise distance distribution (P(r)) functions of SF11–255 compared with SF11–255/BPS indicate that RNA induces relatively minor changes in the SF1 shape (Fig. 3a, d). The SF11–255/BPS Kratky plot is similar to that of SF11–255 with the exception of a slightly more prominent parabolic region characteristic of globular proteins44. This may result from RNA-induced structuring of KH-QUA2 loop regions and/or nonspecific RNA interactions by the basic ULM, which is unstructured18; 40 and exhibits weak RNA affinity in the absence of U2AF65 (data not shown). The P(r) functions of both apo-SF11–255 and the SF11–255/BPS complex share similar values of RG (respectively ~30 and 29 Å) and the maximum dimension (Dmax) (~105 Å) (Table 1). Both P(r) curves exhibit a maximum at ~25 Å that is likely to correspond to intraparticle distances of the KH-QUA2 domain, and extended tails characteristic of elongated shapes45 (Fig. 3a). A minor shoulder at ~45 Å in the P(r) function for the SF11–255/BPS complex may arise from interdomain distances, again suggesting a somewhat more defined interdomain structure for the complex of SF1 with RNA than for the apo-protein.
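For readers less familiar with real-space SAXS parameters, the sketch below shows how RG follows from a P(r) function through RG² = ∫r²P(r)dr / (2∫P(r)dr), with Dmax being the distance at which P(r) returns to zero. The helper names are hypothetical, and the test curve is the analytical P(r) of a uniform sphere, chosen only because its RG is known exactly; it is not a model of the splicing factors.

```python
import math

def trapezoid(y, x):
    """Trapezoidal integral of y over x (uniform spacing not required)."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))

def rg_from_pr(r, p):
    """Real-space radius of gyration from a pair distance distribution."""
    second_moment = trapezoid([rv ** 2 * pv for rv, pv in zip(r, p)], r)
    return math.sqrt(second_moment / (2.0 * trapezoid(p, r)))

# Test shape: uniform sphere of diameter 100 A, whose P(r) is known
# analytically (up to a constant factor) and falls to zero at r = Dmax.
d_max = 100.0
r_grid = [d_max * k / 1000 for k in range(1001)]
p_sphere = [rv ** 2 * (1 - 1.5 * (rv / d_max) + 0.5 * (rv / d_max) ** 3) for rv in r_grid]

rg = rg_from_pr(r_grid, p_sphere)
expected = math.sqrt(0.6) * d_max / 2  # sqrt(3/5) * radius for a solid sphere
print(f"RG = {rg:.2f} A (analytical sphere value: {expected:.2f} A)")
```

An elongated particle with the same Dmax would show a P(r) maximum at short distances and an extended tail, as described for the SF1 samples above.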
To obtain three-dimensional information from the scattering profiles, ab initio shapes were restored using the program DAMMIN.46 This program was chosen since its dummy atom model makes no assumptions concerning the nature of the macromolecule, whereas the alternative program GASBOR47 constrains its dummy residues to the Cα positions of a polypeptide chain. Although the structure of the KH-QUA2 domain is available,24 docking of the high resolution structure with the SF11–255 molecular envelope is expected to be unreliable since this domain comprises only ~50% of the scattering mass of the construct employed here. Ten iterations of DAMMIN46 demonstrate acceptable uniformity (normalized spatial discrepancies, NSD of 0.80 for apo-SF11–255 and 0.68 for SF11–255/BPS, Table 1) and excellent matches with the experimental scattering data (χ ~1) (Table 1, Fig. 2a,b). The averaged, filtered molecular envelopes of apo-SF11–255 and SF11–255/BPS are shown in Fig. 4a. The similar ab initio shape reconstructions of apo-SF11–255 and the SF11–255/BPS complex are consistent with their shared P(r) characteristics (Fig. 3a). Both samples adopt three-lobed ellipsoids, where the distinct lobes may correspond to the C-terminal KH-QUA2 domain, the central phosphorylated domain of unknown structure, and the N-terminal ULM of SF1. The averaged shape of the SF11–255/BPS sample appears smaller and thinner in one lobe than the corresponding lobe of the apo-SF11–255 protein. Since the final models represent a frequency map of the dummy atom positions,48 constriction of this lobe on RNA binding may reflect a slightly reduced flexibility of the globular SF1 KH-QUA2 domain, an interpretation supported by the enhanced parabolic region of the SF11–255/BPS Kratky plot.
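The goodness-of-fit value χ quoted for the ab initio models can be illustrated with a commonly used definition, χ² = 1/(N−1) · Σ[(I_exp(q) − c·I_calc(q))/σ(q)]², in which the scale factor c is chosen by weighted least squares. The sketch below assumes that definition; the function name and all numerical values are hypothetical, and the "data" are constructed to deviate from a scaled model curve by exactly one standard deviation at each point.

```python
import math

def scaled_chi(i_exp, sigma, i_calc):
    """Return (chi, scale) for a model curve compared with experimental data.

    The scale factor minimizes the error-weighted residual, and
    chi^2 = 1/(N - 1) * sum(((I_exp - scale * I_calc) / sigma)^2).
    """
    num = sum(e * m / s ** 2 for e, m, s in zip(i_exp, i_calc, sigma))
    den = sum(m * m / s ** 2 for m, s in zip(i_calc, sigma))
    scale = num / den
    chi_sq = sum(((e - scale * m) / s) ** 2 for e, m, s in zip(i_exp, i_calc, sigma))
    return math.sqrt(chi_sq / (len(i_exp) - 1)), scale

# Hypothetical model curve; the "data" equal twice the model, offset by
# exactly one standard deviation with alternating sign.
model = [100.0, 80.0, 55.0, 30.0, 12.0, 5.0]
errors = [2.0, 2.0, 1.5, 1.0, 0.5, 0.3]
signs = [1, -1, 1, -1, 1, -1]
data = [2.0 * m + sg * s for m, sg, s in zip(model, signs, errors)]

chi, scale = scaled_chi(data, errors, model)
print(f"chi = {chi:.2f}, scale = {scale:.3f}")
```

Values of χ near 1 indicate that the model reproduces the data to within the estimated experimental errors; values well above 1 indicate a poor fit, and values well below 1 often point to overestimated errors.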
The P(r) functions of the apo-U2AF65R123 protein and its complex with the AdML Py tract RNA are superimposed in Fig. 3b. The unliganded protein and its RNA complex share RG values of ~33 Å and extend ~115 Å in the Dmax (Table 1). In both cases, the skewed, asymmetric P(r) profiles are characteristic of elongated particles with a close to linear distribution of domains45, and both display similar first maxima at ~22 Å corresponding to the interatomic distances within an RRM or UHM. A second maximum at ~45 Å in the P(r) function of apo-U2AF65R123 corresponds to a shoulder previously observed for a U2AF65 construct composed solely of the two RRM domains (U2AF65R12) (also shown in Fig. 3b).26 These data suggest a beads-on-string shape for U2AF65R12 or U2AF65R123, in which the position of this second maximum corresponds to the distance between RRM2 and the overall centers of the other domains. The addition of RNA to form the U2AF65R123/Py tract complex slightly compresses the parabolic region of the Kratky plot relative to U2AF65R123, but has little effect on the increase in I(q)·q² at higher q (Fig. 3e). This observation suggests that the poorly-conserved linker regions connecting the well-folded RRM and UHM domains remain flexible following RNA binding. Accordingly, deletion or substitution of the inter-RRM linker sequences of U2AF65 has no detectable effect on in vitro RNA splicing or RNA binding, little effect on the structural characteristics of the U2AF65 RNA binding domain determined by SAXS, and the crystal structure of a U2AF65 variant exhibits few inter-RRM contacts.27; 42
Ten independent iterations of DAMMIN46 gave consistent ab initio models from the scattering profiles (NSD of 0.59 for apo-U2AF65R123 and 0.55 for U2AF65R123/Py tract, Table 1) and acceptable fits with the scattering data (χ ~1) (Table 1, Fig. 2a,b). The averaged, filtered molecular envelopes of apo-U2AF65R123 and its Py tract complex are shown in Fig. 4b. The availability of high resolution structures for the U2AF65 RRM and UHM domains, which contribute ~80% of the scattering mass of the construct, enabled rigid body models of these structures connected by ab initio linker regions to be fit against the apo-U2AF65R123 SAXS data using the program BUNCH50. Ten independent iterations of BUNCH starting from random configurations of the domains resulted in reasonable spatial agreement among the final models (NSD 1.22). In all cases, the two RRM domains were located slightly closer to one another than to the UHM, consistent with a 30-residue linker between RRM1 and RRM2 compared with a 40-residue linker between RRM2 and the UHM. The ‘most typical’ BUNCH model (NSD 1.18) is shown superimposed on the DAMMIN dummy atom model in the left panel of Fig. 4b, and has reasonable agreement with the SAXS data (χ 1.26). Consistent with the two maxima observed in the P(r) function, superposition with the BUNCH models of apo-U2AF65R123 indicates that the three distinct lobes of the DAMMIN shape are likely to correspond to the RRM1, RRM2, and UHM domains.
In the presence of the Py tract RNA, the U2AF65R123/Py tract complex changes to an oblong ellipsoid (Fig. 4b). Based on biochemical expectations14; 17; 24; 49 and superposition with the apo-U2AF65R123 models, it appears likely that the enlarged lobe at the base of the orientation shown in Fig. 4b corresponds to U2AF65 RRM1 and RRM2 bound to the Py tract RNA, whereas the head may represent the RNA-free UHM domain.
The P(r) function of the SF11–255/U2AF65R123 complex substantially contracts in the presence of the RNA site, with Dmax decreasing from ~150 Å to ~135 Å for the SF11–255/U2AF65R123/3’SS RNA complex (Fig. 3c, Table 1). Likewise, addition of RNA slightly compresses the lower resolution region of the SF11–255/U2AF65R123/3’SS Kratky plot and alters it from a single to a bimodal parabola (Fig. 3f). The higher resolution region of the SF11–255/U2AF65R123/3’SS Kratky plot remains similar to that of SF11–255/U2AF65R123. Since the parabolic shapes of the Kratky plot correspond to globular folded domains whereas the increasing values at higher resolution reflect flexibility44, the SF11–255/U2AF65R123/3’SS complex appears to become more compact and globular without undergoing substantial changes in flexibility compared with the protein complex in the absence of RNA.
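The qualitative reading of Kratky plots used above can be illustrated with two idealized limits: a compact particle, approximated here by extending the Guinier law to all q, and a fully flexible Gaussian coil described by the Debye function. Both model curves, the function names, and the value RG = 30 Å are assumptions chosen for illustration; real multidomain proteins with flexible linkers, such as those studied here, fall between these extremes.

```python
import math

def kratky(q, intensity):
    """Kratky transform: q^2 * I(q) at each q."""
    return [qi ** 2 * ii for qi, ii in zip(q, intensity)]

def guinier_intensity(q, rg):
    """Compact-particle stand-in: Guinier law extended to all q."""
    return math.exp(-(q * rg) ** 2 / 3.0)

def debye_intensity(q, rg):
    """Flexible-chain stand-in: Debye function for a Gaussian coil."""
    x = (q * rg) ** 2
    return 2.0 * (math.exp(-x) + x - 1.0) / x ** 2

rg = 30.0  # hypothetical radius of gyration, in angstroms
q_grid = [0.001 * k for k in range(1, 301)]  # 0.001 to 0.300 A^-1

compact = kratky(q_grid, [guinier_intensity(qv, rg) for qv in q_grid])
coil = kratky(q_grid, [debye_intensity(qv, rg) for qv in q_grid])

# The compact curve is a bell-shaped peak; the coil curve rises to a plateau.
q_peak = q_grid[compact.index(max(compact))]
print(f"compact: peak at q = {q_peak:.3f} A^-1 (sqrt(3)/RG = {math.sqrt(3) / rg:.3f})")
print(f"coil: plateau near 2/RG^2 = {2 / rg ** 2:.5f}; value at q_max = {coil[-1]:.5f}")
```

In this idealized picture, a more pronounced parabolic peak indicates a more globular particle, whereas a q²·I(q) signal that fails to return to baseline at high q indicates residual flexibility.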
The ab initio shapes of the SF11–255/U2AF65R123 and SF11–255/U2AF65R123/3’SS complexes show reasonable consistency among ten iterations (NSD 0.76 and 0.69, respectively) and fit the experimental data (χ ~1) (Table 1, Fig. 2a,b). The molecular envelope of SF11–255/U2AF65R123 displays two protrusions at the top and right of the viewpoint shown in Fig. 4c, and a larger, extended lobe at the base. In the RNA complex, the lower lobe contracts towards the center of the scattering mass, whereas the top and right protrusions remain distinct. Although the locations of the component proteins within the molecular envelope cannot be verified without access to a neutron source for scattering experiments with deuterium-labeled samples, one possible explanation is that the upper protrusions correspond to the RRM1 and RRM2 domains of U2AF65 and the larger base to the SF1/U2AF65UHM complex. This interpretation would account for the increased mass in the central region of the SF11–255/U2AF65R123/3’SS complex. Since each RNA binding domain (KH-QUA2 of SF1; RRM1 and RRM2 of U2AF65) is respectively tethered by binding the BPS and the adjacent Py tract of the AdML splice site RNA, the KH-QUA2 domain of SF1 may be constrained closer to the U2AF65 RRMs in the SF11–255/U2AF65R123/3’SS compared with the SF11–255/U2AF65R123 complexes (Fig. 4d).
The SAXS solution analysis presented here represents the major functional domains of SF1 and U2AF65 in the absence and presence of splice site RNAs. Since the bound RNAs contribute only ~10% of the total scattering mass, these data primarily reflect and provide evidence for changes in the protein conformations following RNA binding. The slight changes observed in the overall shapes of the U2AF65R123/Py tract and SF11–255/BPS complexes, compared with the unliganded proteins, can be attributed to localized changes in the RNA binding domains of these splicing factors. For example, BPS binding is expected to be limited to the KH-QUA2 domain of SF1, and may account for the similar but more defined ab initio shape of the SF11–255/BPS complex compared with apo-SF11–255 (Fig. 4a). Conversely, the RRM1 and RRM2 domains of U2AF65 are expected to bind in a linear arrangement to the continuous Py tract sequences of the pre-mRNA, consistent with an ellipsoidal rather than lobed shape of the U2AF65R123/Py tract complex (Fig. 4b). Notably, the molecular dimensions of the SF11–255/U2AF65R123/3’SS complex substantially contract when bound to an RNA oligonucleotide containing the 3’ splice site consensus sequences (Fig. 3c, Fig. 4c). These comparisons demonstrate that formation of the overall SF11–255/U2AF65R123/3’SS complex, rather than RNA binding by the individual splicing factors, leads to a major structural change. It is possible that this transition serves as a structural checkpoint to protect the pre-mRNA from entering the splicing pathway until an appropriate SF1/U2AF65 complex has assembled at the 3’ splice site.
Substantial evidence has accumulated in support of a bent conformation for the 3’ splice site when bound in the early splicing complex with U2AF65 and SF1. Functionally, the 5’ and 3’ splice sites communicate either directly or indirectly in this earliest splicing factor complex. The U1 snRNP must be present at the 5’ splice site for U2AF to efficiently assemble with the 3’ splice site29, as well as for the U2 snRNP to stably associate with the BPS in the subsequent step of splicing.30 Moreover, the S. cerevisiae homolog of SF1 (Msl5p) interacts with the Prp40p subunit of the U1 snRNP at the 5’ splice site, yet binds the BPS near the 3’ splice site.51 Unambiguous chemical evidence is offered by directed hydroxyl radical probing experiments, which indicate that the 5’ and 3’ splice sites are within close proximity even in the early splicing factor complex.31 This proximity is dependent on the presence of an intact Py tract consensus sequence in the pre-mRNA31, and either SF1 or U2AF65.36 Accordingly, a bent conformation of the bound Py tract is required to account for the specific interactions mediated by the U2AF65 domains, including the U2AF65UHM/SF1 complex at the BPS,17; 24 crosslinking of the RRM1 and RRM2 domains respectively with the 3’ and 5’ ends as well as the central nucleotides of the Py tract,49 crosslinking of the N-terminal RS domain of U2AF65 with the BPS,23 and directed cleavage of RNA sequences both upstream of the BPS and downstream of the 3’ splice site by a hydroxyl radical probe attached to the U2AF65 N-terminus.32
We propose a model based on these known interactions among the U2AF65, SF1 and splice site sequences in which the component RNA binding domains are constrained near the center of mass of the complex in the RNA-bound conformation (Fig. 4d). The RNA-induced compression of the SF11–255/U2AF65R123/3’SS complex presented here is consistent with and supports the model of a bent conformation for the bound splice site. Similarities between the high q regions of the Kratky plots for SF11–255/U2AF65R123 and SF11–255/U2AF65R123/3’SS (Fig. 3f) rule out extensive folding of unstructured domains as the chief basis for the RNA-induced changes in the molecular dimensions of the complexes.52 The exact locations of the subunits and domains remain to be defined in the longer term by neutron scattering or high resolution structures of the early splice site complexes. This low resolution structural analysis provides a glimpse of conformational changes among essential splicing factors during the early stages of pre-mRNA splicing, and represents an important step towards the future high resolution studies needed to establish a complete structural model of the mechanism for pre-mRNA splice site choice.
This work was supported by a grant from the National Institutes of Health (R01 GM070503) to C.L.K. We are grateful to Dr. G.L. Hura and Dr. R. Gillilan for indispensable guidance with SAXS data collection and analysis, and to Dr. J.E. Wedekind and Dr. M.R. Green for insightful discussions. SAXS data were collected at the SIBYLS beamline at the Advanced Light Source, Lawrence Berkeley National Laboratory, which is supported in part by the DOE program Integrated Diffraction Analysis Technologies (IDAT) and the DOE program Molecular Assemblies Genes and Genomics Integrated Efficiently (MAGGIE) under Contract Number DE-AC02-05CH11231 with the U.S. Department of Energy.
SAXS data and models were deposited in the BIOISIS SAXS database (www.bioisis.net) with ID codes XX, YY, ZZ, XR, YR, and ZR for SF11–255, U2AF65R123, SF11–255/U2AF65R123, SF11–255/BPS, U2AF65R123/Py tract, and SF11–255/U2AF65R123/3’SS, respectively.
Supplementary data associated with this article are available online.