The essential pre-mRNA splicing factor, U2AF65, guides the early stages of splice site choice by recognizing a polypyrimidine (Py)-tract consensus sequence near the 3′-splice site. Since Py-tracts are relatively poorly conserved in higher eukaryotes, U2AF65 is faced with the problem of specifying uridine-rich sequences, yet tolerating a variety of nucleotide substitutions found in natural Py-tracts. To better understand these apparently contradictory RNA binding characteristics, the X-ray structure of the U2AF65 RNA binding domain bound to a Py-tract composed of seven uridines has been determined at 2.5Å resolution. Specific hydrogen bonds between U2AF65 and the uracil bases provide an explanation for polyuridine recognition. Flexible sidechains and bound water molecules form the majority of the base contacts, and potentially could rearrange when the U2AF65 structure adapts to different Py-tract sequences. The energetic importance of conserved residues for Py-tract binding is established by analysis of site-directed mutant U2AF65 proteins using surface plasmon resonance.
Most transcripts of higher eukaryotes contain intervening sequences (introns) between the protein coding regions (exons) that must be excised by pre-mRNA splicing before nuclear export and translation of the mRNA product. Pre-mRNA splicing is accomplished by the spliceosome, an assembly of constitutive splicing factors and small nuclear RNAs (snRNAs) that undergoes a series of ATP-dependent conformational rearrangements (Jurica and Moore, 2003). In addition, alternative splicing factors generate transcript diversity for cell growth and differentiation by incorporating different exons into the final mRNA (Maniatis and Tasic, 2002). An essential splicing factor, U2 Auxiliary Factor (U2AF), recognizes consensus 3′-splice site sequences in the pre-mRNA and coordinates the initial stages of spliceosome assembly. Since formation of the U2AF complex commits the pre-mRNA to be spliced (Michaud and Reed, 1991), U2AF/pre-mRNA interactions present a key target for regulation during alternative splicing, for example by Sex-lethal (SXL) (Valcarcel et al., 1993) or polypyrimidine tract binding protein (PTB) (Sharma et al., 2005). Accurate recognition of the 3′-splice site by U2AF is critical for pre-mRNA splicing, as demonstrated by the association of an estimated half of human genetic diseases with errors in splice site recognition (Garcia-Blanco et al., 2004).
U2AF is a heterodimer of two subunits. The large subunit (U2AF65) recognizes an essential, polypyrimidine (Py)-tract pre-mRNA consensus that is composed predominantly of uridines (Zamore et al., 1992). The small subunit (U2AF35) associates tightly with a region near the U2AF65 N-terminus, and contacts an adjacent ‘AG’ consensus dinucleotide at the nearby intron-exon boundary (Merendino et al., 1999; Wu et al., 1999; Zorio and Blumenthal, 1999). Initially, the U2AF heterodimer binds the pre-mRNA as a ternary complex with a third protein, Splicing Factor 1 (SF1) (Abovich and Rosbash, 1997). SF1 recognizes the branchpoint consensus sequence (BPS) of the pre-mRNA, where the first step of the splicing reaction ultimately takes place. In parallel with assembly of the U2AF/SF1/3′-splice site complex, the U1 small nuclear ribonucleoprotein (snRNP) associates with the 5′-splice site. Formation of this early splicing complex brings the BPS, 5′- and 3′-splice sites together in a structured conformation (Kent and MacMillan, 2002; Kent et al., 2003). Next, the U2 snRNP component of the spliceosome forms an ATP-dependent complex with the BPS and U2AF as SF1 dissociates. Stable association of the U2 snRNP requires an N-terminal, arginine-serine-rich (RS) domain of U2AF65 (Shen and Green, 2004; Valcarcel et al., 1996), and RNP-unwindases such as the U2AF-Associated-Protein-56KD (UAP56) (Fleckner et al., 1997). Ultimately, U2AF is released from the pre-mRNA before the splicing reaction is catalyzed by the active spliceosome.
U2AF65 preferentially binds uridine-rich RNA sequences, as shown by in vitro genetic selection experiments with U2AF65 that enrich polyuridine sequences (Singh et al., 1995), and chemical modification of the uridine-N3 or O4 atoms inhibits U2AF65 binding by ~100-fold (Singh et al., 2000). Accordingly, Py-tracts composed of long uridine stretches promote use of adjacent 3′-splice sites (Coolidge et al., 1997; Reed, 1989). However, natural mammalian Py-tracts vary in length and sequence composition (Senapathy et al., 1990). U2AF65 universally recognizes these diverse natural Py-tracts, which are frequently interrupted with cytosines or purines, albeit with a broad (200-fold) range of affinities (Zamore et al., 1992). In contrast, the alternative splicing factors SXL and PTB bind specific Py-tract sequences of regulated splice sites (guanosine-containing uridine-tracts or alternating (CU)-tracts, respectively) (Perez et al., 1997; Singh et al., 1995; Sosnowski et al., 1989; Valcarcel et al., 1993).
Despite their distinct Py-tract specificities, the RNA binding domains of U2AF65 (Ito et al., 1999), SXL (Handa et al., 1999), and PTB (Conte et al., 2000; Oberstrass et al., 2005; Simpson et al., 2004) are composed of a similar structural scaffold of consecutive RNA recognition motifs (RRM). The RRM, one of the most common types of eukaryotic RNA binding domains, is characterized by two ribonucleoprotein consensus motifs (RNP1 and RNP2) with aromatic and basic residues that interact with single-stranded RNAs (Maris et al., 2005). Specifically, the two central RRMs (RRM1 and RRM2) of U2AF65 comprise the minimal Py-tract binding domain (U2AF651,2) (Banerjee et al., 2003; Banerjee et al., 2004; Zamore et al., 1992). To investigate how these two U2AF65 RRMs accomplish versatile Py-tract recognition, we present the X-ray structure and a complementary mutational analysis of the U2AF65 RNA binding domain in complex with polyuridine RNA. The structure reveals that U2AF65 recognizes uridines through a network of hydrogen bonds with the base edges, rather than by shape selection favoring the smaller pyrimidines over purines. A significant number of side-chain and water-mediated hydrogen bonds may explain the ability of U2AF65 to bind a variety of natural Py-tract sequences.
Initial attempts to cocrystallize U2AF65 or U2AF651,2 with polyuridine RNAs of 12–20 nucleotides were unsuccessful. One strategy to engineer protein crystallization without interfering with functional activity is to shorten poorly-conserved loop regions (Mazza et al., 2002; Nolen et al., 2001). The length and sequence composition of the 30-residue linker between RRM1 and RRM2 of human U2AF65 are phylogenetically variable (Figure 1), and this linker region is predicted to lack a well-defined structure (http://bioinf.cs.ucl.ac.uk/disopred/ and (Shamoo et al., 1995)). Accordingly, several U2AF65 variants with shortened linkers were screened for cocrystallization with single-stranded polyuridine oligonucleotides of various lengths (Sickmier et al., 2006). The best crystals were obtained from a human U2AF65 fragment containing RRM1 and RRM2 (dU2AF651,2; residues 148–336 lacking residues 238–257), in complex with a polyuridine RNA dodecamer (rU12).
The structure of the dU2AF651,2 complex with polyuridine RNA was determined at 2.5Å resolution by molecular replacement using the X-ray structure of the isolated RRM1 domain (PDB code 2FZR) as the search model. Electron density calculated from the initial molecular replacement solution unambiguously revealed three nucleotides bound to RRM1. Following iterations of model building, refinement and density modification, a complete structure was built for RRM1 and RRM2 of one polypeptide and seven of the twelve uridines in the asymmetric unit (rU7). No electron density was observed for the remaining five uridines of the oligonucleotide, indicating that these were either disordered or degraded during crystallization. Crystallographic data and refinement statistics are given in Table 1.
To confirm that residues 238–257 of the U2AF65 RRM1-RRM2 linker were dispensable for pre-mRNA splicing, the activity of a full length U2AF65 variant containing the linker deletion (dU2AF65) was tested in vitro (Figure 2). Titration of wild-type U2AF65 or the dU2AF65 linker variant into U2AF65-depleted nuclear extract showed that splicing of the adenovirus major late pre-mRNA (Ad ML) was restored by comparable concentrations of either protein (between 0.25 and 0.50μM), regardless of the linker composition. Thus, the deleted linker residues were dispensable for U2AF65 to function during pre-mRNA splicing in vitro.
The RNA binding properties of the wild-type U2AF651,2 and the variant dU2AF651,2 fragments were measured using nitrocellulose filter binding assays with single-stranded RNAs. Both U2AF651,2 and dU2AF651,2 proteins bound an RNA oligonucleotide composed of 20 uridines (rU20) with comparable apparent equilibrium dissociation constants (Kd 0.37±0.06 or 0.41±0.04μM respectively, data not shown). These binding constants were on the order of those previously measured for a U2AF65 fragment binding to natural Py-tracts using electrophoretic mobility shift assays (EMSAs) (Zamore et al., 1992). Neither U2AF651,2 nor dU2AF651,2 detectably bound a polyguanosine RNA (rG20), indicating that the sequence preference for polyuridine over polyguanosine is >200-fold assuming a ~100μM Kd limit for nitrocellulose filter binding (Hall and Kranz, 1999). In summary, our modification of the RRM1-RRM2 linker region had no apparent effect on the in vitro splicing activity or RNA binding characteristics of U2AF65, establishing that the dU2AF651,2/rU7 complex is a reliable model system in which to study Py-tract recognition by U2AF65.
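The >200-fold figure is a lower bound: a competitor RNA that shows no detectable binding must have a Kd at least as large as the assay's detection limit, so dividing that limit by the Kd for the preferred RNA can only underestimate the true preference. A minimal sketch of this arithmetic, using the ~100μM filter-binding limit and the rU20 Kd values quoted above; the function and constant names are illustrative, not taken from the original analysis.

```python
def specificity_lower_bound(kd_limit_uM: float, kd_specific_uM: float) -> float:
    """Lower bound on the fold preference when the competitor shows no detectable binding.

    A competitor that does not bind detectably has Kd >= the assay limit, so
    limit / Kd(specific) can only underestimate the true preference.
    """
    return kd_limit_uM / kd_specific_uM


KD_LIMIT_UM = 100.0  # approximate nitrocellulose filter-binding limit quoted in the text
for construct, kd_rU20 in [("U2AF65 RRM1-RRM2", 0.37), ("dU2AF65 RRM1-RRM2", 0.41)]:
    fold = specificity_lower_bound(KD_LIMIT_UM, kd_rU20)
    print(f"{construct}: rU20 preferred over rG20 by > {fold:.0f}-fold")
```

Both constructs clear the 200-fold threshold (roughly 270- and 244-fold); a more sensitive assay could only raise these bounds.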
The RNA affinity of U2AF651,2 is relatively low compared with the subnanomolar affinity of the corresponding SXL RRMs for the regulated tra Py-tract (Kd ~5×10⁻¹¹ M for an SXL fragment containing RRM1 and RRM2) (Kanaar et al., 1995), although in practice the affinities of the full length U2AF65 and SXL proteins for this sequence are closer (Kd 10⁻⁸ M and 10⁻⁹ M, respectively) (Valcarcel et al., 1993). Association of the U2AF65/pre-mRNA complex with additional splicing factors such as SF1 (Berglund et al., 1998) and U2AF35 (Merendino et al., 1999; Wu et al., 1999; Zorio and Blumenthal, 1999) is functionally important to ensure the affinity and specificity of U2AF65 for the entire 3′ splice site consensus (BPS-Py-tract-AG). The binding constants of the isolated U2AF65 RRMs for rU20 (Kd 4.0±2.0μM for RRM2 and >100μM for RRM1, data not shown) were similar to the RNA affinities of other isolated RRMs (Shamoo et al., 1995). Using these values, a binding constant can be predicted for U2AF651,2/rU20 assuming a 30-residue unstructured linker separating two RRMs. The predicted Kd (0.8μM, calculated using equations derived by Shamoo and coworkers (Shamoo et al., 1995)) is consistent with the observed Kd of the U2AF651,2 fragment, which provides further evidence that the U2AF651,2 interdomain linker does not significantly contribute to RNA binding.
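One way to read these numbers is through the textbook linked-site (chelate) relation for two tethered binding sites, Kd12 = Kd1·Kd2/Ceff, where Ceff is the effective local concentration of one domain's binding site once the other domain is bound. This is an assumed simplification for illustration, not a reproduction of the equations of Shamoo and coworkers, which also model the linker explicitly; the function names are hypothetical. The sketch back-calculates the Ceff implied by the predicted and observed Kd values quoted above.

```python
def implied_effective_concentration(kd1_M: float, kd2_M: float, kd12_M: float) -> float:
    """Ceff (M) implied by the linked-site relation Kd12 = Kd1 * Kd2 / Ceff."""
    return kd1_M * kd2_M / kd12_M


def linked_kd(kd1_M: float, kd2_M: float, c_eff_M: float) -> float:
    """Apparent Kd (M) of two tethered sites under the same simple model."""
    return kd1_M * kd2_M / c_eff_M


KD_RRM1 = 100e-6  # lower bound only: isolated RRM1 binding was > 100 uM
KD_RRM2 = 4.0e-6
for label, kd12 in [("predicted", 0.8e-6), ("observed, U2AF65 RRM1-RRM2", 0.37e-6)]:
    c_eff = implied_effective_concentration(KD_RRM1, KD_RRM2, kd12)
    print(f"{label}: Kd12 = {kd12 * 1e6:.2f} uM requires Ceff >= {c_eff * 1e6:.0f} uM")
```

Because the RRM1 Kd is only a lower bound and Ceff grows with Kd1, each computed Ceff (about 0.5 mM for the predicted Kd) is itself a lower bound.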
The repeating unit of the crystal is a 1:1 stoichiometric dU2AF651,2/rU7 complex, although the dU2AF65 RRM1 and RRM2 that interact with the same rU7 strand are contributed by distinct polypeptide chains (Figure 3A & Supplementary Figure 1). We have demonstrated that dU2AF65 recognizes the Py-tract and functions comparably to unmodified U2AF65 during pre-mRNA splicing, and used RNA binding experiments with five different site-directed U2AF651,2 mutants (presented below) to establish the energetic importance of the U2AF65/uridine interactions observed in the structure. Furthermore, the dU2AF651,2/rU7 structure is consistent with and explains extensive biochemical literature (Kent et al., 2003; Shen and Green, 2004; Singh et al., 2000; Singh et al., 1995; Valcarcel et al., 1996). Thus, both published work and assays presented here support the relevance of the dU2AF651,2/rU7 structure for interpreting uridine recognition by RRM1 and RRM2 of U2AF65.
The overall dimensions of the integrated structural unit containing U2AF65 RRM1 and RRM2 bound to the same RNA strand (henceforth called dU2AF651,2/rU7) are 55×37×20Å (Figure 3B). Each RRM has the expected topology of four β-strands and two α-helices, with an additional β-extension of the second α-helix (β4) that forms a short β-hairpin with the C-terminal β5-strand. The topologies of the dU2AF651,2 RRMs are essentially the same as the isolated, RNA-free U2AF65 RRMs (r.m.s.d. 2.0Å for 78 matching RRM1 Cα atoms, and 2.2Å for 84 matching RRM2 Cα atoms), which were previously determined using nuclear magnetic resonance (NMR) (Ito et al., 1999). The largest differences between these structures are due to crystal contacts that rearrange an unusually long α1-β2 loop of RRM1. This potentially flexible region is required for the association of U2AF65 with the RNP-dependent ATPase, UAP56 (Fleckner et al., 1997). Likewise, comparison with the X-ray structure of unliganded RRM1 shows that the structure remains largely unchanged by association with RNA (0.6Å r.m.s.d. for 82 matching Cα atoms). The RNA-induced twisting of the RRM2 β2′-β3′-loop (where primed italics distinguish RRM2 secondary structure) is the most notable difference between the unliganded and RNA-bound X-ray structures.
The dU2AF651,2/rU7 complex buries a significant amount of surface area (2200Å²), which is comparable to the surface area buried in other structures of tandem RRMs bound to RNA (2400Å²–2600Å²) (Deo et al., 1999; Handa et al., 1999). The β-sheet surfaces of the two RRMs face one another across a deep cleft surrounding the oligonucleotide bases, and loop regions encircle the RNA strand. The RNP1 and RNP2 signatures of the RRM fold are located on the central β-strands (β3/β3′ and β1/β1′, respectively) at the RNA interface. Although the theoretical isoelectric point of the U2AF65 Py-tract binding domain is slightly acidic, the RNA binding groove is highly electropositive and complements the negative charge of the phosphodiester backbone (Figure 3C).
The relative orientations of RRM1 and RRM2 from distinct dU2AF651,2 molecules are indirectly stabilized by the shared interface with the bound RNA, in particular by interactions with opposite faces of Uri4 and flanking nucleotides as the RNA threads through the RRM1/RRM2 cleft (Figure 3C and Figure 5E). A few direct RRM1/RRM2 interactions bury 504Å² of the dU2AF651,2 protein’s surface area, consistent with other complexes of tandem RRMs with RNA; for example, 550Å² of surface area is buried between RRM1 and RRM2 of polyadenosine binding protein (PAB) (Deo et al., 1999). Encircling the central Uri4 nucleotide, RRM1/RRM2 contacts include hydrogen bonds (~2.8Å) between the carbonyl of Lys195 (RRM1 β2-β3 loop) and Lys260 sidechain (RRM2 β1′), and weak contacts (~4Å) between the sidechains of Asp194 (RRM1 β2-β3 loop) and Asn289 (RRM2 β2′). Around Uri3, the RRM2 C-terminus is located near the sidechain of Gln222 in the loop before RRM1 β5; around Uri5, an alternate conformation of His230 near the RRM1 C-terminus contacts Ser294 in the RRM2 β2′/β3′-loop. In contrast, no contacts are observed between dU2AF651,2 RRM domains connected by the same polypeptide chain but bound to distinct RNA oligonucleotides.
At the Uri4 nucleotide, the RNA strand makes an abrupt turn so that the 5′- and 3′-halves respectively interact with RRM2 and RRM1 of symmetry-related polypeptides (Figure 3A, B). Each RRM recognizes four nucleotides; RRM2 recognizes Uri1–Uri4, RRM1 recognizes Uri4–Uri7, and Uri4 is bound at the interface between RRM1 and RRM2. The location of Uri4 at the RRM1/RRM2 interface is consistent with simultaneous crosslinking of both U2AF65 RRMs with central uridines of the Ad ML Py-tract (Banerjee et al., 2003). The 5′-to-3′-orientation of the RNA strand is universally shared among previously characterized RRMs, and the binding-site size of four nucleotides per U2AF65 RRM is the most common site size among canonical RRMs (Maris et al., 2005).
The RNA strand displays a 120° kink between Uri4 and Uri5, with an overall bend of 100° in the RNA axis due to additional curvature contributed by flanking nucleotides. The distance between the 5′- and 3′-ends of the rU7 strand is shortened by 66% relative to the fully-extended conformation. Several other structures of RNA binding proteins composed of tandem RRMs bound to single-stranded RNA ligands are available for comparison with the dU2AF651,2/rU7 structure (Figure 4). These structures include SXL bound to the tra Py-tract, which contains a 5′-terminal guanosine (Handa et al., 1999), PAB bound to a polyadenosine tract (Deo et al., 1999), and HuD bound to a polyuridine tract containing an internal adenosine (Wang and Tanaka-Hall, 2001). The RNA ligands of these different proteins bend to various extents, as measured by the shortening of the 5′-to-3′-distance in the bound relative to an ideal, fully-extended conformation. The bound polyadenosine ligand of PAB is only slightly bent (43% shortening), whereas the Py-tract ligands of SXL and HuD are more bent (60% shortening). The 66% shortening between the polyuridine termini observed in the dU2AF651,2/rU7 complex is slightly greater than in the previous examples, although the conformation of the Py-tract is likely to differ and/or be flexible when bound by unmodified U2AF65.
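The percent shortening compares the bound 5′-to-3′ distance with the distance the same strand would span if fully extended. A minimal sketch, assuming one representative atom per nucleotide (for example each phosphorus) and using the contour length of that atom trace as a stand-in for the fully extended distance; the reference conformation used for the published values may differ, and the function names and coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def contour_length(trace: Sequence[Point]) -> float:
    """Sum of successive inter-atom distances along the strand."""
    return sum(math.dist(a, b) for a, b in zip(trace, trace[1:]))


def percent_shortening(trace: Sequence[Point]) -> float:
    """Percent reduction of the end-to-end distance relative to the straightened trace."""
    end_to_end = math.dist(trace[0], trace[-1])
    return 100.0 * (1.0 - end_to_end / contour_length(trace))


# Hypothetical 7-point traces with unit spacing between neighbouring atoms.
straight = [(float(i), 0.0, 0.0) for i in range(7)]
kinked = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0),
          (3.0, 1.0, 0.0), (3.0, 2.0, 0.0), (3.0, 3.0, 0.0)]
print(f"straight trace: {percent_shortening(straight):.0f}% shortening")
print(f"right-angle turn: {percent_shortening(kinked):.0f}% shortening")
```

A single right-angle turn in the middle of the trace shortens the ends by only about 29%, which illustrates how much additional curvature beyond a single kink is needed to reach values in the 60–66% range.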
In contrast with the intramolecular hydrogen bonds and base-base stacking interactions that stabilize the bent SXL- and HuD-bound RNA conformations (Handa et al., 1999; Wang and Tanaka-Hall, 2001), no intramolecular RNA/RNA interactions are observed within the U2AF65-bound rU7 strand. Instead, conserved hydrophobic residues of the U2AF65-RNP1 or RNP2 motifs stabilize many of the rU7 backbone positions by packing against the ribose sugars (RRM2-RNP1-Gly301/Uri2, RRM2-RNP1-Tyr302/Uri3, and RRM1-RNP1-Phe197/Uri5) (Figure 5C, D, F). The sugar-phosphate backbones of the relatively straight, 3′-terminal nucleotides (Uri6 and Uri7) lack extensive U2AF65 contacts and are exposed to solvent. Only two of the phosphates engage in direct, electrostatic interactions with the protein (RRM2-Lys328/Uri1 and RRM1-Lys225/Uri4), consistent with the barely detectable interference with U2AF651,2/Py-tract interactions by chemical modification of the oligonucleotide phosphates (Singh et al., 2000).
One direct and one water-mediated hydrogen bond are observed between U2AF65 and the ribose hydroxyls (RRM2-RNP1-Lys300/Uri1-O2′ and RRM1-RNP1-Asn196/H2O/Uri3-O2′) (Figure 5B, D). The presence of a single direct hydrogen bond between U2AF65 and the ribose hydroxyls distinctly differs from the SXL/tra Py-tract structure, in which six of the nine O2′ hydroxyls are involved in inter- or intra-molecular hydrogen bonds (Handa et al., 1999). The dU2AF651,2/rU7 structure explains the 4–9-fold difference in the affinities of wild-type U2AF651,2 for DNA and RNA, observed here using surface plasmon resonance (SPR) (Table 2), and previously by EMSAs with U2AF65 and tra sequences substituted with ribouridine compared with deoxythymidine (Singh et al., 2000). Assuming a penalty of 3 kcal/mol for the unsatisfied hydrogen bond potential of Lys300 in the absence of a ribose hydroxyl group (Pace et al., 1996), a 4.5-fold predicted preference for RNA over DNA agrees with the observed specificity. In contrast, SXL displays a >10,000-fold preference for RNA over DNA (Kanaar et al., 1995; Singh et al., 2000), consistent with the abundant O2′ contacts in the SXL/tra structure (Handa et al., 1999).
A schematic representation and detailed views of the dU2AF651,2 interactions with each of the uridines are shown in Figure 5A and Figures 5B–G, respectively. The aromatic sidechains of the U2AF65-RNP1 and RNP2 consensus motifs stack parallel to the uracils (RRM2-RNP2-Phe262/Uri3, RRM2-RNP1-Phe304/Uri4, RRM1-RNP2-Tyr152/Uri5, RRM1-RNP1-Phe199/Uri6). Additional base-stacking interactions are observed between Uri2 and the conserved RRM2-RNP2 glycines (Gly264 and Gly265). In contrast, no restrictive contacts are observed with the hydrocarbon edges (C5 and C6 atoms) of the uracil rings (Figure 5B–F). Hence, U2AF65 leaves room to accommodate the larger purine nucleotides that often interrupt natural mammalian Py-tract sequences (Senapathy et al., 1990), without requiring global conformational rearrangements. Further, the open environment of the C5 carbons of the uracils allows U2AF65 to tolerate the extra methyl group of thymine, as demonstrated by the similar U2AF65 affinities for thymidine compared with uridine oligonucleotides (Singh et al., 2000).
dU2AF651,2 forms sequence-specific hydrogen bonds with two of the three polar uracil atoms (O2, N3 and O4) of most of the bound nucleotides (Figure 5B–G), and directly engages all three potential hydrogen bond donor and acceptor atoms at Uri1. These include the uracil-O2 positions (RRM2-RNP1-Lys300/Uri1-O2, RRM1-RNP1-Lys195/Uri4-O2, RRM2-Asn289/Uri4-O2, RRM1-His230(mainchain-N)/Uri5-O2, RRM1-Arg150/Uri6-O2), and the uracil-N3 or O4 atoms of several nucleotides (RRM2-Asp293/Uri1-N3, RRM2-Ala335(mainchain-O)/Uri3-N3, RRM1-Arg228(mainchain-O)/Uri5-N3 and RRM2-Thr296/Uri1-O4, RRM2-Lys329/Uri2-O4, RRM2-Gln333/Uri3-O4, RRM2-Lys260/Uri4-O4, RRM1-Asp231(mainchain-N)/Uri6-O4). Additionally, two water molecules indirectly mediate hydrogen bonds between the sidechains of RRM1-RNP2-Asn155 or RRM2-Asn289 and the Uri3-O2 or Uri4-N3 atoms, respectively. These extensive interactions of U2AF65 with the N3 and O4 positions explain the ~100-fold reduced affinity following methylation of these atoms, which introduces a bulky hydrophobic group and removes the imino hydrogen bond donor (Singh et al., 2000).
With the exception of Lys300 and Lys195, the sequence-specific hydrogen bonds of the dU2AF651,2/rU7 complex are primarily mediated by residues outside the consensus RNP motifs, particularly in the last β-strands of RRM1 or RRM2. Strikingly, flexible sidechains rather than mainchain atoms are responsible for 66% of the sequence-specific hydrogen bonds between U2AF65 and the uracils. Like the U2AF65 RNP motifs, residues that mediate specific hydrogen bonds with the uridines are phylogenetically conserved among U2AF65 homologs (Figure 1). As observed here for U2AF65, specific hydrogen bonds with residues outside the shared RNP consensus motifs often contribute to the sequence-selectivity of RRM-containing proteins (Maris et al., 2005; Wang and Tanaka-Hall, 2001), providing one means for the ~500 different human RRM-containing proteins to distinguish among the many RNAs in the cellular milieu (Maris et al., 2005).
To test the energetic contribution of the interactions observed in the dU2AF651,2/rU7 structure to Py-tract binding, we analyzed a series of site-directed mutant U2AF651,2 proteins using SPR (Table 2). The affinities of the wild-type U2AF651,2 and variant dU2AF651,2 proteins for an immobilized RNA strand composed of twenty uridines (rU20) were comparable when determined using SPR or nitrocellulose filter binding assays. Since the dU2AF651,2 and U2AF651,2 proteins displayed only a 4–9-fold preference for the ribose-containing rU20 over its deoxyribose counterpart (dU20), the more stable dU20 oligonucleotide was chosen for subsequent assays. The wild-type U2AF651,2 construct was modified by site-directed mutagenesis to evaluate the energetic contribution of the structurally observed interactions to Py-tract binding by the natural U2AF65 protein. The substituted residues were chosen to span the length of the RNA strand and to represent different types of interactions with it (Supplementary Figure 2A), ranging from aromatic stacking to hydrogen bond formation. Most residues were replaced with alanine; Gly301 was instead replaced with the bulkier isoleucine.
All of the U2AF651,2 site-directed mutations reduced affinity for dU20 by several-fold (Table 2). The conserved phenylalanines in the RNP motifs were most important for RNA affinity, and displayed an energetic contribution (~5 kcal/mol each) that was slightly greater than expected based solely on loss of the buried hydrophobic surface area (3.2 kcal/mol). The penalties for substituting either RRM2-RNP2-Phe262 or RRM2-RNP1-Phe304 with alanine were greater by 0.4 kcal/mol than the penalty for RRM1-RNP1-Phe199 (Ito et al., 1999). Consistent with this difference, Phe262 and Phe304 interact with the centrally located Uri3 and Uri4, respectively, and may stabilize the global RNA conformation beyond local base-stacking interactions, whereas Phe199 stacks solely with Uri6 near the relatively straight 3′-end of the oligonucleotide. A double Lys260Ala/Asn289Ala mutation that removed U2AF65 hydrogen bond donors and acceptors with Uri4 also substantially reduced RNA binding (9-fold). The smallest effect (5-fold) was observed for the Gly301Ile substitution; Gly301 packs against the sugar-phosphate backbone of Uri2 rather than extensively interacting with the base. Consistent with their structural and energetic importance for Py-tract interactions, the U2AF65 RNP1 and RNP2 residues are identical among homologs from diverse organisms ranging from plants to mammals, with the exception of conservative substitutions in distantly related fission yeast (Figure 1). The excellent correlation between the dU2AF651,2/rU7 structure and the RNA binding properties of the mutant U2AF651,2 proteins, coupled with previous biochemical experiments (Kent et al., 2003; Shen and Green, 2004; Singh et al., 2000; Singh et al., 1995; Valcarcel et al., 1996) and the results of our functional assays, confirms that the dU2AF651,2/rU7 structure provides a reliable tool for interpreting U2AF65/Py-tract recognition.
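Fold changes in Kd and free-energy penalties in kcal/mol are related by ΔΔG = RT·ln(Kd,mutant/Kd,wild-type). A minimal sketch of that conversion, assuming 25 °C (the SPR runs were at room temperature) and applying it to two fold changes quoted above; the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def ddg_from_fold(fold_change: float, temp_K: float = 298.15) -> float:
    """Free-energy penalty (kcal/mol) for a fold increase in Kd."""
    return R_KCAL * temp_K * math.log(fold_change)


def fold_from_ddg(ddg_kcal: float, temp_K: float = 298.15) -> float:
    """Fold increase in Kd corresponding to a free-energy penalty (kcal/mol)."""
    return math.exp(ddg_kcal / (R_KCAL * temp_K))


for mutant, fold in [("Gly301Ile", 5.0), ("Lys260Ala/Asn289Ala", 9.0)]:
    print(f"{mutant}: {fold:.0f}-fold weaker -> {ddg_from_fold(fold):.2f} kcal/mol")
```

Because the relation is logarithmic, each additional kcal/mol of penalty multiplies the Kd by roughly 5.4 at 25 °C.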
In theory, U2AF65 could distinguish purines (adenine and guanine) from pyrimidines (uracil and cytosine) on the basis of size, or all four bases could be distinguished by their unique patterns of hydrogen bond donors and acceptors. The structure reveals that dU2AF651,2 adopts the latter mechanism, recognizing the rU7 strand through hydrogen bonds specific to the uracil edges rather than by shape selection. Frequently, the hydrogen bonds with the uracil bases are formed with sidechains, which have the flexibility to retreat from unfavorable contacts if faced with a different base. For example, lysine residues, such as U2AF65 Lys260, Lys300, or Lys329, generally undergo sidechain rearrangements upon ligand binding (Najmanovich et al., 2000). The hydrogen bond donor and acceptor atoms of several other interacting sidechains (including Asn289, Gln333, and Thr296) could easily interconvert by torsional rotations, as observed for residues in the RNP motifs of hnRNP A1 (Vitali et al., 2002). This provides one structural basis for the apparently contradictory abilities of U2AF65 to specify uridine-rich splice sites, yet tolerate a variety of relatively divergent metazoan Py-tract sequences.
Two bound water molecules mediate interactions between U2AF65 sidechains and sequence-specific atoms of the uracil bases. Although prominent roles have yet to be established for water molecules during protein/RNA recognition, the importance of bound water molecules has been established for many protein/DNA complexes, including the trp repressor, paired homeodomain, and Smad3 MH1 (Jayaram and Jain, 2004). Most of the bound water molecules in protein-DNA complexes separate electrostatically-repulsive polar atoms or extend short sidechains to achieve hydrogen bonds with the nucleic acid, as observed for two water-mediated contacts in the dU2AF651,2/rU7 complex. In addition to flexible sidechain rearrangements, relocation of bound water molecules is a second possible mechanism for the U2AF65 structure to accommodate base-substitutions within natural Py-tract sequences.
Comparison of Py-tract interactions by U2AF65, which is an essential splicing factor, with those of the specific splicing regulators SXL and PTB, reveals guidelines for general versus sequence-specific RNA recognition by RRMs. First, only three polar atoms of the seven uracils form hydrogen bonds with U2AF65 mainchain peptide bonds. In contrast, SXL uses eight mainchain hydrogen bonds to recognize nine bases of the tra Py-tract (5′-UGUUUUUUU-3′), including one specifying the distinctive guanine exocyclic amine (Handa et al., 1999). The PTB RRMs also display several mainchain interactions with an alternating (CU)-tract found in regulated splice sites, such as α- or β-tropomyosin (Oberstrass et al., 2005). Second, U2AF65 recognizes the uracils with three water-mediated hydrogen bonds, which could rearrange easily to accommodate other sequences. Only one water molecule is observed mediating protein/RNA contacts in the SXL structure (Handa et al., 1999). Third, the hydrocarbon edges of the U2AF65-bound uracils are free of inter- or intramolecular contacts that would confer shape-selective recognition, whereas the pyrimidines of the SXL and PTB structures are engaged in tight inter- and intramolecular contacts (Handa et al., 1999; Oberstrass et al., 2005). Thus, despite similar use of tandem RRM folds, U2AF65 tolerates base substitutions by adopting clearly different structural strategies from the sequence-specific recognition modes of SXL and PTB.
Polyuridine sequences are inherently more flexible than other RNA polymers, due to the lower stability of intramolecular uracil-uracil stacking (Inners and Felsenfeld, 1970; Norberg and Nilsson, 1995). The rU7 strand of the dU2AF651,2 structure is the most curved among known structures of single-stranded RNAs bound by tandem RRMs, and the conformations of Py-tracts bound by SXL and HuD are also unstacked and bent (Handa et al., 1999; Wang and Tanaka-Hall, 2001). Directed hydroxyl radical footprinting experiments show that the BPS and 3′-splice sites are brought close together by association of U2AF65 with a functional Py-tract (Kent and MacMillan, 2002; Kent et al., 2003), and these and UV-crosslinking experiments show that the N-terminal U2AF65 RS domain concurrently interacts with the BPS and the downstream 3′-splice site (Kent et al., 2003; Shen et al., 2004; Valcarcel et al., 1996). Moreover, events at the 5′-splice site influence association of U2AF65 with the Py-tract (Cote et al., 1995; Sharma et al., 2005). A through-space explanation for U2AF65 interactions with linearly-distant RNA sequences was proposed previously (Kent et al., 2003), based on the established bends of Py-tracts in the presence of other RRM-containing proteins (Handa et al., 1999; Wang and Tanaka-Hall, 2001). With the reservation that the bend of the Py-tract and arrangements of the wild-type U2AF65 RRMs may be variable in solution and are likely to differ from those of the dU2AF651,2 variant, this hypothesis is further supported by the bent conformation of the rU7 site observed here.
Human U2AF65 and shorter variants were expressed and purified as glutathione-S-transferase (GST) fusion proteins in the vector pGEX-6P-1 using standard protocols (GE Healthcare). The GST-tag was cleaved using PreScission Protease™, and removed by subtractive glutathione-Sepharose affinity and anion-exchange chromatography followed by a final size-exclusion chromatography step. Details of the crystallization are described elsewhere (Sickmier et al., 2006). In brief, RNA oligonucleotides (Dharmacon) were incubated in a 1:1 ratio with purified proteins, then equilibrated by the hanging drop method with a reservoir solution containing 1.6 M ammonium sulfate, 10% dioxane, and 0.1M MES pH 6.0. Crystals were flash-cooled to −180 °C for data collection.
A native dataset was collected at the National Synchrotron Light Source (NSLS) Beamline X8C, and processed using DENZO/SCALEPACK (Otwinowski, 1997). The structure was determined by molecular replacement, using the X-ray coordinates of the unbound RRM1 (PDB code 2FZR, Thickman & Kielkopf, in preparation) as the search model. The NMR structure of the unbound RRM2 facilitated building the structure using O (Jones et al., 1991), and the model was refined using CNS (Brunger et al., 1998) (Table 1). The final model includes U2AF65 residues 148–336 (excluding deleted residues 238–257), with three additional N-terminal residues from the protease site, seven uridines plus a 5′-phosphate, one dioxane molecule from the crystallization conditions, and 31 water molecules. Analysis of the structure using PROCHECK (Laskowski et al., 1993) established that none of the (φ,ψ) combinations lie in the disallowed regions of the Ramachandran plot, and that the overall G factor (0.23) is significantly better than average for structures of comparable resolution. RNA helical parameters were calculated using the program Curves (Lavery and Sklenar, 1988), and figures were prepared using Bobscript (Esnouf, 1999), Molscript (Kraulis, 1991), Pymol (www.pymol.org), and Raster3D (Merritt and Bacon, 1997).
HeLa nuclear extract was depleted of U2AF as described (Valcarcel et al., 1997) by chromatography on oligo(dT)-cellulose in the presence of 1M KCl. Splicing reactions were performed essentially as described previously (Kan and Green, 1999), except that 40% HeLa nuclear extract was used. Spliced products were resolved on 10% denaturing polyacrylamide gels (19:1 acrylamide:bis-acrylamide) containing 8M urea in Tris-Borate-EDTA buffer.
Preliminary nitrocellulose filter binding experiments with U2AF651,2 or dU2AF651,2 and radiolabeled RNA (5′-UUUUUUUUUUUUUUUUUUUU-3′ or 5′-GGGGGGGGGGGGGGGGGGGG-3′) were performed as described (Hall and Kranz, 1999). Affinities of site-directed U2AF651,2 mutants for 20-nucleotide deoxyuridine strands (dU20) were measured using a BIAcore3000 instrument. dU20 oligonucleotides with 5′-biotin labels were immobilized on pre-conditioned streptavidin sensor chips to a density of 20 response units (RU). Since the off-rate of the interaction (1 s⁻¹) exceeded the range the instrument can resolve (<0.1 s⁻¹) (Karlsson, 1999), equilibrium binding experiments were used to obtain the Kd. Two-fold serial dilutions of the various U2AF651,2 proteins, covering two orders of magnitude around the Kds, were injected over this surface at a flow rate of 20μL/min at room temperature in buffer containing 100mM NaCl, 10mM 4-(2-hydroxyethyl)-1-piperazineethanesulfonate pH 6.8, and 0.05% surfactant P20. No treatment other than running buffer was required to regenerate the baseline. The protein injections were performed in duplicate in random order, and were interspersed with periodic injections of buffer to monitor baseline stability. The RU during the equilibrium-binding phase was averaged over 100s, plotted against protein concentration, and curves were fit using BIAevaluation-v4.1 (Supplementary Figure 2B).
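The equilibrium analysis above fits averaged steady-state responses against protein concentration. The fits reported here were made with BIAevaluation; the sketch below is only a stand-in that assumes a 1:1 steady-state (Langmuir) model, Req = Rmax·C/(Kd + C). For a fixed Kd the model is linear in Rmax, so Rmax is solved in closed form and only log10(Kd) is searched: a coarse grid brackets the minimum and golden-section search refines it. All names and the synthetic data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def profile_rmax(conc: Sequence[float], resp: Sequence[float], kd: float) -> float:
    """Least-squares Rmax for a fixed Kd (the model is linear in Rmax)."""
    f = [c / (kd + c) for c in conc]
    return sum(r * fi for r, fi in zip(resp, f)) / sum(fi * fi for fi in f)


def profile_sse(conc: Sequence[float], resp: Sequence[float], kd: float) -> float:
    """Residual sum of squares of the 1:1 model after optimizing Rmax."""
    rmax = profile_rmax(conc, resp, kd)
    return sum((r - rmax * c / (kd + c)) ** 2 for c, r in zip(conc, resp))


def fit_steady_state(conc: Sequence[float], resp: Sequence[float],
                     log_lo: float = -9.0, log_hi: float = -3.0,
                     grid: int = 61, iters: int = 80) -> Tuple[float, float]:
    """Fit Req = Rmax*C/(Kd + C); returns (Kd in M, Rmax in RU)."""
    # Coarse log-spaced grid to bracket the minimum.
    step = (log_hi - log_lo) / (grid - 1)
    logs = [log_lo + i * step for i in range(grid)]
    best = min(range(grid), key=lambda i: profile_sse(conc, resp, 10 ** logs[i]))
    a, b = logs[max(best - 1, 0)], logs[min(best + 1, grid - 1)]
    # Golden-section refinement of log10(Kd) inside the bracket.
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = b - inv_phi * (b - a), a + inv_phi * (b - a)
    f1, f2 = profile_sse(conc, resp, 10 ** x1), profile_sse(conc, resp, 10 ** x2)
    for _ in range(iters):
        if f1 < f2:
            b, x2, f2 = x2, x1, f1
            x1 = b - inv_phi * (b - a)
            f1 = profile_sse(conc, resp, 10 ** x1)
        else:
            a, x1, f1 = x1, x2, f2
            x2 = a + inv_phi * (b - a)
            f2 = profile_sse(conc, resp, 10 ** x2)
    kd = 10 ** ((a + b) / 2.0)
    return kd, profile_rmax(conc, resp, kd)


# Synthetic example: eight two-fold dilutions from 6.4 uM, true Kd = 0.4 uM, Rmax = 30 RU.
conc = [6.4e-6 / 2 ** i for i in range(8)]
resp = [30.0 * c / (0.4e-6 + c) for c in conc]
kd, rmax = fit_steady_state(conc, resp)
print(f"Kd = {kd * 1e6:.3f} uM, Rmax = {rmax:.1f} RU")
```

The dilution series in the example spans two orders of magnitude around the true Kd, mirroring the design described above; with real duplicate injections the responses would be noisy and the fit would return a best estimate rather than the exact input.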
The atomic coordinates will be deposited with the Protein Data Bank (PDB code 2G4B).
We deeply appreciate S. K. Burley’s assistance developing this project. We are grateful to R. McMacken for advice and J. Bender, S. Prigge, and J. Wedekind for critically reading the manuscript. M. Morelli and S. Lindley assisted with protein purification. We thank the NSLS staff for use of Beamline X8C at the Brookhaven National Laboratory, which is supported by the U.S. Department of Energy under contract no. DE-AC02-98CH10886. H.S. was supported in part by a Charles A. King Trust Fellowship, and work in the M.R.G. laboratory was supported by the National Institutes of Health (NIH, grant GM035490). K.E.F. was supported in part by a training grant (T32 GM08403) from the NIH, and E.A.S. was a Lang Fellow. Work in the C.L.K. laboratory was supported by the NIH (grant GM070503).