Recent structures of the heterodimeric splicing factor U2 snRNP auxiliary factor (U2AF) have revealed two unexpected examples of RNA recognition motif (RRM)-like domains with specialized features for protein recognition. These unusual RRMs, called U2AF homology motifs (UHMs), represent a novel class of protein recognition motifs. Defining a set of rules to distinguish traditional RRMs from UHMs is key to identifying novel UHM family members. Here we review the critical sequence features necessary to mediate protein–UHM interactions, and perform comprehensive database searches to identify new members of the UHM family. The resulting implications for the functional and evolutionary relationships among candidate UHM family members are discussed.
The processes of RNA splicing, transport, capping, editing, and polyadenylation are heavily dependent on protein factors that recognize the pre-mRNA and assemble the appropriate pre-mRNA processing complexes. Surprisingly, the many different protein factors that guide pre-mRNA modification pathways are composed of a limited number of conserved, modular RNA-binding domains (Burd and Dreyfuss 1994). Of these, the RNA recognition motif (RRM) domain is by far the most abundant type of eukaryotic RNA-binding motif. In addition to associations between protein and RNA, protein–protein interactions are essential to recruit catalytic components to sites of RNA modification and to coordinate pre-mRNA processing with other cellular pathways. Interestingly, traditional protein interaction domains, such as SH2, SH3, and WW motifs, are rarely observed in pre-mRNA processing factors (e.g., see Shatkin and Manley 2000; Zhou et al. 2002), implying that the ability to interact with other proteins may reside in the sequences previously thought to be involved in RNA binding. Consistent with this idea, recent structures of the heterodimeric splicing factor U2 snRNP auxiliary factor (U2AF) have revealed two unexpected examples of RRM-like domains with specialized features for protein recognition (Kielkopf et al. 2001; Selenko et al. 2003). In light of this structural information, we call these unusual RRMs U2AF homology motifs (UHMs) to reflect their distinct role in protein recognition. Here, the critical sequence features necessary to mediate protein–UHM interactions are reviewed and formulated in a manner that has permitted a comprehensive database search designed to identify members of the UHM family. The resulting implications for the functional and evolutionary relationships among candidate members of the UHM family are discussed. 
This review represents a first step toward distinguishing canonical RRMs from UHMs, and thereby contributes toward a major goal of the postgenomic era (Thornton et al. 2000): to convert genomic sequences into testable functional hypotheses.
The RNA-binding function of the canonical RRM domain has been extensively investigated over the last two decades. The most conserved RRM signature sequence is an eight-residue motif called ribonucleoprotein 1 (RNP1; Adam et al. 1986; Sachs et al. 1986), which has the consensus [RK]-G-[FY]-[GA]-[FY]-[ILV]-X-[FY] (where X is any amino acid). A second six-residue region of homology, called RNP2, is typically located ~30 residues N-terminal to RNP1 (Lahiri and Thomas 1986; Dreyfuss et al. 1988), and has the consensus [ILV]-[FY]-[ILV]-X-N-L. Additional conserved amino acids define an ~80-residue domain that encompasses the RNA-binding function (Query et al. 1989; Scherly et al. 1989; Birney et al. 1993).
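The RNP1 and RNP2 consensus sequences translate directly into regular expressions. The Python sketch below scans a sequence for an RNP2 match followed roughly 30 residues downstream by an RNP1 match. The function name `find_rnp_pairs`, the start-to-start gap window (20 to 45 residues), and the synthetic test sequence are illustrative assumptions, not parameters from any published search.

```python
import re

# Consensus motifs as given in the text, written as Python regular expressions.
# RNP1: [RK]-G-[FY]-[GA]-[FY]-[ILV]-X-[FY]
# RNP2: [ILV]-[FY]-[ILV]-X-N-L
RNP1 = re.compile(r"[RK]G[FY][GA][FY][ILV].[FY]")
RNP2 = re.compile(r"[ILV][FY][ILV].NL")


def find_rnp_pairs(seq, min_gap=20, max_gap=45):
    """Return (rnp2_start, rnp1_start) pairs with RNP1 downstream of RNP2.

    The gap is measured start-to-start in residues; the 20-45 window is an
    illustrative assumption bracketing the ~30-residue spacing in the text.
    Only non-overlapping matches are considered (re.finditer semantics).
    """
    rnp1_starts = [m.start() for m in RNP1.finditer(seq)]
    pairs = []
    for m2 in RNP2.finditer(seq):
        for s1 in rnp1_starts:
            if min_gap <= s1 - m2.start() <= max_gap:
                pairs.append((m2.start(), s1))
    return pairs


# Synthetic test sequence (not a real protein): RNP2 at index 1, RNP1 at index 31.
demo = "M" + "IFVGNL" + "A" * 24 + "KGFGFVEF" + "A" * 5
print(find_rnp_pairs(demo))  # [(1, 31)]
```

A consensus match alone does not establish an RRM fold; the remaining ~80-residue domain must also be consistent with the conserved core.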
The three-dimensional structure of the canonical RRM domain was first determined for the RRM of U1A (Nagai et al. 1990; Hoffman et al. 1991). The RRM fold is composed of two α-helices packed against four antiparallel β-strands with topology βαββαβ, which form an α/β sandwich (Fig. 1). The RNP consensus motifs form the two central β-strands, with RNP1 in β3 and RNP2 in β1. Because side chains alternate in direction along each strand of the pleated β-sheet, some of the consensus residues maintain the core fold, whereas others are displayed on the surface for nucleic acid recognition. Structures of single RRMs complexed with RNA have been determined for the U1A-RRM bound to a hairpin loop of U1 snRNA (Oubridge et al. 1994; Price et al. 1998; Deo et al. 1999; Handa et al. 1999; Allain et al. 2000; Wang and Tanaka Hall 2001), and for a ternary complex of the U2 snRNP proteins U2B″-RRM/U2A′ with a U2 snRNA hairpin loop (Price et al. 1998). In contrast to the isolated RRM of U1A, multiple RRMs are usually found within a single polypeptide, with an average of two RRMs per protein (Letunic et al. 2004). The structures of several proteins composed of two tandem RRMs complexed with single-stranded RNA oligonucleotides have been determined, including the alternative splicing factor Sxl (Handa et al. 1999), PAB (Deo et al. 1999), the pre-rRNA packaging protein nucleolin (Allain et al. 2000), and the translation regulatory protein HuD (Wang and Tanaka Hall 2001). A comparison of these six different structures has revealed common themes, as well as differences, in the mode of canonical RRM/RNA recognition. When the structures are superimposed, structurally equivalent hydrogen bonds or stacking interactions are observed between the single-stranded RNA and residues in the RNP1 and RNP2 motifs. Diverse RNA sequences and conformations are recognized through complementary hydrogen bonds with specific bases and through differing arrangements of single or multiple RRMs.
During pre-mRNA splicing, U2AF and other essential factors facilitate sequential association of small nuclear RNP particles (snRNPs), including U1, U2, U4, U5, and U6 snRNPs, with the borders of intervening pre-mRNA sequences (for review, see Brow 2002). Following assembly of the functional spliceosome, the intron is excised as a branched lariat by two catalytic steps, and adjacent exons are joined together to form the spliced mRNA. U2AF was identified as a factor that binds to pre-mRNA consensus sequences at the 3′ splice site (3′SS), and is required for stable association of the U2 snRNP core spliceosome particle with the pre-mRNA branch point sequence (BPS) during the first ATP-dependent step of the splicing process (Complex A; Ruskin et al. 1988; Zamore and Green 1989). The importance of U2AF in vitro was soon corroborated by the discovery that both subunits are essential in Drosophila melanogaster (Kanaar et al. 1993; Rudner et al. 1996, 1998b) and Caenorhabditis elegans (Zorio and Blumenthal 1999b). Moreover, U2AF65 is an essential protein in Schizosaccharomyces pombe (Potashkin et al. 1993), and U2AF35 is necessary for vertebrate development (Golling et al. 2002). Because U2AF commits the pre-mRNA to the first critical ATP-dependent step of splicing, its binding is often regulated during alternative splicing (Smith and Valcarcel 2000). In humans, the products of five U2AF35-like open reading frames and the single U2AF65 subunit may form distinct heterodimers with different functional activities (Tupler et al. 2001; Shepard et al. 2002). In addition to U2AF, other non-snRNP protein factors are required for formation of Complex A, including Splicing Factor 1 (SF1) and Splicing Factor 3b (SF3b), a multisubunit component of the U2 snRNP (Kramer and Utans 1991).
To perform its role in RNA splicing, two central canonical RRM domains of U2AF65 recognize the polypyrimidine tract (Py-tract) in the pre-mRNA (Fig. 2). Binding of U2AF65 to the Py-tract is strengthened by cooperative protein–protein interactions with SF1 at the upstream BPS (Berglund et al. 1998; Rain et al. 1998) and with U2AF35, which contacts the downstream 3′SS consensus (Merendino et al. 1999; Wu et al. 1999; Zorio and Blumenthal 1999a). The C-terminal UHM domain of U2AF65 interacts with the N-terminal domain of SF1 (U2AF65-UHM/SF1-ligand; Rain et al. 1998). At the opposite end of the large subunit, the N-terminal domain of U2AF65 provides a ligand that interacts with the central UHM domain of U2AF35 (U2AF35-UHM/U2AF65-ligand; Zhang et al. 1992; Rudner et al. 1998b). Subsequently, entry of the U2 snRNP displaces SF1 by interacting with the BPS via the U2 snRNA (Nelson and Green 1989; Wu and Manley 1989; Zhuang and Weiner 1989; Query et al. 1994), and with the U2AF65 C-terminal domain via the SF3b subunit, SAP155 (Gozani et al. 1998; Habara et al. 1998). Once the U2 snRNP has contacted the pre-mRNA, U2AF is dissociated by conformational rearrangements of the spliceosome components (Bennett et al. 1992; Chiara et al. 1997). In summary, key protein–protein interactions are mediated by the U2AF65-UHM, which interacts with SF1 and subsequently SAP155, and by the U2AF35-UHM, which interacts with the U2AF65 N terminus.
Based on primary sequence analysis, both the U2AF65 C-terminal domain and the central domain of U2AF35 were suspected to contain unusual variations of the RRM fold (Birney et al. 1993). However, the borders of the U2AF-UHM domains could not be assigned accurately because of sequence insertions in the first helix of the fold (Helix A) and the absence of aromatic amino acids in the RNP-like motifs that are normally critical for RNA recognition. The independent determination of the X-ray structure of the U2AF35-UHM/U2AF65-ligand complex (Kielkopf et al. 2001) and NMR structure of the U2AF65-UHM/SF1-ligand complex (Selenko et al. 2003) confirmed that both the C-terminal U2AF65 and central U2AF35 protein interaction domains adopt the βαββαβ RRM-fold topology. Within the RRM-like fold, the sequence insertions separating the RNP-like motifs increase the length of Helix A from three turns observed among canonical RRMs to five or eight turns for U2AF65 and U2AF35, respectively; the functional role of these sequence insertions, if any, is unclear. The parallel use of an RRM-like fold to recognize similar peptide ligands implies that the U2AF35-UHM and U2AF65-UHM domains represent a new type of protein–protein interaction motif, hitherto undetected amid the many canonical RRMs of pre-mRNA processing factors.
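To put the Helix A extensions in perspective, turn counts can be converted into approximate residue numbers and axial lengths using ideal α-helix geometry (about 3.6 residues per turn and 1.5 Å rise per residue). This back-of-the-envelope sketch assumes ideal, uninterrupted helices; real helices deviate somewhat from these values.

```python
# Ideal alpha-helix geometry (textbook values): ~3.6 residues per turn,
# ~1.5 Å rise per residue along the helix axis.
RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE_ANGSTROM = 1.5


def helix_dimensions(turns):
    """Approximate residue count and axial length (Å) of an ideal alpha-helix."""
    residues = turns * RESIDUES_PER_TURN
    return residues, residues * RISE_PER_RESIDUE_ANGSTROM


# Helix A turn counts quoted in the text.
for label, turns in [("canonical RRM", 3), ("U2AF65-UHM", 5), ("U2AF35-UHM", 8)]:
    residues, length = helix_dimensions(turns)
    print(f"{label}: {turns} turns ~ {residues:.0f} residues, ~{length:.0f} Å")
```

Under these assumptions, Helix A grows from roughly 11 residues (~16 Å) in canonical RRMs to roughly 18 residues (~27 Å) in the U2AF65-UHM and 29 residues (~43 Å) in the U2AF35-UHM, that is, about 7 and 18 additional helical residues.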
Three-dimensional structural information revealed unanticipated sequence features of U2AF35-UHM and U2AF65-UHM domains that enable interaction with short protein ligands. Despite low primary sequence identity (23%), ligand recognition by the different UHM domains is very similar (Fig. 3). In both the U2AF35-UHM/U2AF65-ligand and U2AF65-UHM/SF1-ligand structures, a critical Trp residue in the ligand sequence inserts into a tight hydrophobic pocket between the α-helices and the RNP1- and RNP2-like motifs (Kielkopf et al. 2001; Selenko et al. 2003). In addition to aliphatic residues, a conserved Arg–X–Phe motif (where X is any amino acid; see below) on the loop connecting the last α-helix (Helix B) and β-strand of the UHM fold contributes to the Trp-binding pocket. The Arg residue in the loop (U2AF35-Arg 133 or U2AF65-Arg 452) forms an intramolecular salt bridge with the last Glu residue of Helix A (U2AF35-Glu 88 or U2AF65-Glu 405) that shields one face of the ligand-Trp, whereas the Phe residue (U2AF35-Phe 135 or U2AF65-Phe 454) encloses the opposite Trp face. In addition to the extensive interface with the ligand-Trp, a series of acidic residues in Helix A of the UHM interacts with basic residues at the N terminus of the protein ligand. Specifically, electrostatic interactions between U2AF35-Glu 84 and U2AF65-Lys 90 as well as U2AF65-Asp 401 and SF1-Arg 21 are observed at similar positions for both structures. The essential nature of acidic residues within Helix A, Phe 454, and the Trp-binding pocket was confirmed for the U2AF65-UHM/SF1-ligand complex by site-directed mutagenesis of the U2AF65-UHM or SF1-ligand followed by pull-down assays (Selenko et al. 2003). Likewise, the U2AF65-ligand-Trp 92 was found to contribute two orders of magnitude to the affinity of the U2AF35-UHM/U2AF65-ligand complex by isothermal titration calorimetry.
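An affinity change can be expressed as a binding free-energy difference, ΔΔG = RT ln(Kd,mutant / Kd,wild type). The sketch below assumes 25 °C (298.15 K) because the measurement temperature is not stated here; under that assumption, a 100-fold (two orders of magnitude) loss of affinity corresponds to roughly 2.7 kcal/mol.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant, kcal/(mol*K)


def ddg_from_fold_change(fold_change, temp_k=298.15):
    """Free-energy penalty (kcal/mol) for a fold increase in Kd.

    ddG = RT * ln(Kd_mutant / Kd_wild_type); positive values mean weaker binding.
    The default temperature (25 °C) is an assumption.
    """
    return R_KCAL_PER_MOL_K * temp_k * math.log(fold_change)


# "Two orders of magnitude" in affinity is a 100-fold change in Kd.
print(f"{ddg_from_fold_change(100):.2f} kcal/mol")  # 2.73 kcal/mol
```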
In the U2AF35-UHM, a distinctive Trp residue (U2AF35-Trp 134) is observed at the X position of the Arg–X–Phe motif on the last loop of the UHM domain. Bulky aromatic residues such as Trp at this solvent-exposed position are especially rare among canonical RRM domains (1% of 676 annotated RRM domains in the SWISS-PROT database). The most frequently observed residues at the corresponding position of canonical RRMs are highly charged, including Glu (16%) and Lys (15%), as is also observed for the U2AF65-UHM (Arg–Lys–Phe). The unusual U2AF35-Trp 134 inserts between a series of unique Pro residues at the C terminus of the U2AF65-ligand, which are completely absent from the SF1-ligand of the U2AF65-UHM. The additional Trp/Pro interaction significantly contributes to the high affinity of the U2AF heterodimer (1.7 nM Kd; Kielkopf et al. 2001). Because the U2AF65-UHM/SF1-ligand complex lacks the corresponding Trp/Pro interaction, the affinity is relatively weak (~100 nM Kd; Selenko et al. 2003). The sequence differences in the ligands recognized by the U2AF35-UHM and U2AF65-UHM domains reflect the different functional roles of the complexes, which, respectively, maintain the constitutive U2AF35-UHM/U2AF65 heterodimer (Zhang et al. 1992; Rudner et al. 1998b) or form a transient U2AF65/SF1 intermediate during spliceosome assembly (Rutz and Seraphin 1999).
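Residue statistics such as those quoted above are column frequencies over an alignment of RRM domains. The sketch below shows the calculation on a hypothetical ten-sequence column; the residues are invented for illustration and are not drawn from SWISS-PROT. For scale, 1% of 676 domains corresponds to only about 7 domains.

```python
from collections import Counter


def position_frequencies(column):
    """Percentage of each residue at one alignment position, most common first."""
    counts = Counter(column)
    total = len(column)
    return {aa: 100.0 * n / total for aa, n in counts.most_common()}


# Hypothetical column: the X residue of the Arg-X-Phe loop in ten invented
# RRM sequences (illustration only, not database statistics).
toy_column = list("EKEKQTSRWE")
print(position_frequencies(toy_column))
```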
The structures of the U2AF35-UHM/U2AF65-ligand and U2AF65-UHM/SF1-ligand complexes revealed several sequence features that distinguish UHMs from canonical RRM domains. One striking feature of UHM domains is their atypical RNP-like motifs. The first residue of the RNP1-like motif and the second residue of the RNP2-like motif are unusual in that they are exposed on the β-sheet surface rather than directly involved in RNA binding. Residues in these positions consist of aliphatic amino acids (U2AF35-Ala 47, Val 110, or U2AF65-Cys 379, Cys 429) as opposed to the basic and aromatic residues used for RNA recognition by canonical RRM domains. Other prominent distinguishing sequences include the Arg–X–Phe motif and acidic residues in Helix A (especially U2AF35-Glu 84/Glu 88 and U2AF65-Asp 401/Glu 405). As a consequence of the acidic nature of Helix A and lack of a basic RNP1 residue that usually contacts the RNA, the isoelectric points of UHM domains are remarkably low (pI 4.1 for the U2AF35-UHM and pI 4.3 for the U2AF65-UHM) compared with the typically basic character of canonical RRMs (pI >9) that function to bind anionic RNA ligands. The majority of the aliphatic residues lining the Trp-binding pocket (including U2AF35-UHM Leu 48, Val 85, Leu 130, and Ile 140 and their U2AF65-UHM counterparts Leu 380, Val 402, Leu 449, and Val 459), however, cannot be used to distinguish UHM from canonical RRM domains, because they also serve to preserve the RRM fold (Birney et al. 1993). One exception is the last aliphatic residue of the RNP2 motif (U2AF35-Ile 51 or U2AF65-Met 383), which contributes to the Trp-binding pocket and consequently differs from the conserved RNP2-Leu residue within the hydrophobic core of canonical RRM domains. Thus, at least three major sequence differences required for UHM–protein interactions distinguish UHMs from canonical RRM domains: (1) atypical RNP-like motifs, (2) an Arg–X–Phe motif in the last loop, and (3) an acidic character of Helix A.
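Isoelectric points like those quoted above can be estimated from sequence alone. This Python sketch sums Henderson–Hasselbalch charges for the ionizable groups and finds the zero-charge pH by bisection. The pKa table is an assumption (published tools use slightly different sets), and the short peptides are toy inputs rather than UHM sequences, so the computed values should be read as approximate.

```python
# Assumed pKa values for ionizable groups; published tools use slightly
# different sets, so computed pI values can differ by a few tenths.
PKA_BASIC = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq, ph):
    """Henderson-Hasselbalch estimate of a peptide's net charge at a given pH."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_BASIC.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_ACIDIC.items())
    return positive - negative


def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Bisect for the pH of zero net charge (net charge decreases as pH rises)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Toy peptides (not UHM sequences): an acidic and a basic example.
for peptide in ["DEEDEA", "KRKRGA"]:
    print(peptide, round(isoelectric_point(peptide), 2))
```

Bisection works here because the net charge falls monotonically with pH, so a single sign change brackets the pI.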
The discovery of two examples of RRM-like domains with specialized sequence characteristics for protein–protein interactions raised the question of whether the U2AF35-UHM and U2AF65-UHM domains represented a larger family of modular protein interaction domains. Examples of proteins with domains similar to the U2AF65 or U2AF35-UHM had been previously noted (Kielkopf et al. 2001; Selenko et al. 2003), including the C-terminal homodimerization domain of PUF60, which has previously been referred to as the PUF60, U2AF65, MUD2 protein–protein interaction (PUMP) domain (Page-McCaw et al. 1999). Several search strategies were used to further extend the UHM family. An initial consensus pattern for the U2AF65, U2AF35, PUF60, and Tat-SF1 UHM domains, defined automatically using the program PRATT (Brazma et al. 1996), proved too stringent, as it matched only homologs of these proteins in a Scan-ProSite search of the SWISS-PROT/TrEMBL databases (Gattiker et al. 2002). Therefore, a target pattern ([ILMVFC]-X-[LIFV]-X-[NSHT]-[ILMVC]-X(6,40)-[VLIT]-X(2)-[ED]-X(4,5)-G-X-[IVA]-X(4)-[VIL]-X(4,25)-[GV]-X-[VIAL]-[FY]-[VIL]-X-[FYC]-X(6,12)-[AC]-[LVMIC]-X-X-[LMIF]-X-[NG]-R-[WYKM]-[FY]-X-G-X(4,8)-[IVL]) was defined manually based upon conserved residues that either maintain the RRM-like fold (Birney et al. 1993) or mediate protein–protein interactions in the structures of U2AF35-UHM/U2AF65-ligand and U2AF65-UHM/SF1-ligand (Kielkopf et al. 2001; Selenko et al. 2003). A search of the SWISS-PROT/TrEMBL databases with this target pattern identified several novel UHM candidates. The UHM family was further extended by manually inspecting RRM family alignments (Prosite PS50102) and the results of iterative PHI-PSI BLAST searches (Altschul et al. 1997) for similarities to the signature Arg–X–Phe motif observed in the last loop of the prototype U2AF-UHM domains.
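PROSITE-style patterns, such as the RNP consensus and the UHM target pattern, can be converted mechanically into regular expressions for local scanning. The converter below is a minimal sketch, not the Scan-ProSite implementation: it handles single residues, `X`, `[...]` and `{...}` classes, `(n)`/`(n,m)` repeats, and `<`/`>` terminal anchors, and it raises `ValueError` on syntax it does not cover (for example, `>` inside brackets). The function name `prosite_to_regex` and the synthetic test sequence are hypothetical.

```python
import re

# One PROSITE element: optional "<", a residue / "x" / [class] / {excluded class},
# an optional "(n)" or "(n,m)" repeat, and an optional ">".
_ELEMENT = re.compile(r"(<?)(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression string."""
    tokens = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        n_anchor, core, low, high, c_anchor = m.groups()
        if core in ("X", "x"):
            token = "."
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"  # {P} means "any residue except P"
        else:
            token = core  # single residues and [...] classes are already valid regex
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        tokens.append(("^" if n_anchor else "") + token + ("$" if c_anchor else ""))
    return "".join(tokens)


rnp1 = prosite_to_regex("[RK]-G-[FY]-[GA]-[FY]-[ILV]-X-[FY]")
print(rnp1)  # [RK]G[FY][GA][FY][ILV].[FY]

# C-terminal portion of the UHM target pattern (Arg-X-Phe region), applied to a
# synthetic sequence constructed to match it.
tail = prosite_to_regex("[AC]-[LVMIC]-X-X-[LMIF]-X-[NG]-R-[WYKM]-[FY]-X-G-X(4,8)-[IVL]")
print(bool(re.search(tail, "ALSSLSNRWFEGKKKKV")))  # True
```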
Sequence comparisons revealed that the principal features that distinguish UHM candidates from canonical RRMs are conserved among 12 novel UHM candidates (Table 1; Fig. 4A), including (1) poor conservation of amino acids in the RNP1- and RNP2-like consensus motifs that would normally bind RNA (first/third and second positions, respectively); (2) an Arg–X–Phe motif in the last loop of the RRM-like fold; and (3) conserved acidic residues in the predicted Helix A and a low isoelectric point (average pI ~4.5). Seven additional UHM candidates displayed a subset of the UHM characteristics. To further investigate the evolutionary relationship among members of the UHM and RRM families, a phylogenetic tree of the candidates was constructed using neighbor joining with correction for multiple substitutions (Fig. 4B; Thompson et al. 1997). A comparison with canonical RRMs whose role in RNA recognition has been established by structure determination (including U1A, SXL, PAB, HuD, and nucleolin) revealed that the 12 convincing UHM candidates occupy a phylogenetic branch distinct from canonical RRMs that diverged from a common ancestral domain. The dendrogram also confirms that several of the putative UHM candidates (i.e., those that displayed only a subset of the UHM characteristics) are more closely related to canonical RRM domains than to U2AF or other UHM candidates, indicating that these proteins may have independently evolved UHM-like sequence motifs. Additional proteins from diverse eukaryotes displayed UHM signature sequences, but were considered homologs of other UHM candidates based on high sequence identity (Table 2). Given the difficulty of distinguishing UHMs from canonical RRM domains based on primary sequence comparisons alone, additional UHM protein interaction domains may be hidden within the RRM superfamily.
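Neighbor joining builds a tree by repeatedly merging the pair of nodes that minimizes the Q criterion, after pairwise distances have been corrected for multiple substitutions. The sketch below uses Kimura's empirical protein correction, d = -ln(1 - p - 0.2p²), which is one common choice; it is not the exact pipeline used to build Fig. 4B. Inputs are a dict-of-dicts of pairwise distances, and the five-taxon matrix in the example is a textbook toy, not UHM data.

```python
import math
from itertools import combinations


def kimura_distance(p):
    """Kimura (1983) correction of an observed protein p-distance for multiple substitutions."""
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0:
        raise ValueError("p-distance too large for the Kimura correction")
    return -math.log(arg)


def neighbor_joining(names, dist):
    """Unrooted neighbor-joining tree from a dict-of-dicts distance matrix.

    Returns (newick, joins); joins records (node_i, node_j, branch_i, branch_j)
    in the order the pairs were merged.
    """
    d = {a: {b: dist[a][b] for b in names if b != a} for a in names}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}

        def q(pair):
            i, j = pair
            return (n - 2) * d[i][j] - r[i] - r[j]

        i, j = min(combinations(nodes, 2), key=q)
        branch_i = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        branch_j = d[i][j] - branch_i
        joins.append((i, j, branch_i, branch_j))
        # The new internal node is named by its Newick subtree string.
        u = f"({i}:{branch_i:.3f},{j}:{branch_j:.3f})"
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return f"({a}:{d[a][b]:.3f},{b}:0.000);", joins


# Textbook five-taxon example (toy distances, not UHM data).
names = ["a", "b", "c", "d", "e"]
rows = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7], [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
dist = {x: {y: rows[i][j] for j, y in enumerate(names)} for i, x in enumerate(names)}
newick, joins = neighbor_joining(names, dist)
print(newick)
```

In practice, observed p-distances from an alignment would be passed through `kimura_distance` before tree building; the toy matrix above is already in distance units.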
In a few cases, UHM candidates that share a similar domain organization may represent homologs despite low sequence identities and/or a lack of consistent functional data, including SPF45 and DRT111; HCC1 and PAD1; TAT-SF1, UAP2, and CUS2; and MUD2 and U2AF65. In a well-studied example, MUD2 is considered the S. cerevisiae homolog of U2AF65 despite low sequence identity (16%), based on similar functional interactions with the Py-tract, U2 snRNP, and SF1 (Abovich et al. 1994; Rain et al. 1998). However, heterologous complexes between MUD2 and human SF1, or between S. cerevisiae SF1 and human U2AF65, have not been observed (Rain et al. 1998), indicating that the ligand specificity of the human U2AF65-UHM has diverged from that of the MUD2-UHM. These differences in protein–protein interaction specificity are consistent with functional divergence of MUD2 from other U2AF large subunits. For example, MUD2 is dispensable for viability in S. cerevisiae (Abovich et al. 1994), whereas the UHM domain of the S. pombe U2AF65 homolog (which shares 31% sequence identity with human U2AF65) is required in vivo (Banerjee et al. 2004). The U2AF65-UHM interacts with an N-terminal domain of the SAP155 subunit of the U2 snRNP that is absent from the S. cerevisiae homolog of SAP155 (Gozani et al. 1998). Moreover, S. cerevisiae lacks an ortholog of the U2AF small subunit, indicating that MUD2 functions in the absence of the heterodimeric partner. These differences between S. cerevisiae and other U2AF homologs, coupled with the identification of eight human UHM candidates and 35 more homologs in a variety of higher eukaryotes compared with only three convincing yeast UHM candidates, suggest that the UHM diverged from the canonical RRM late in evolution to serve the complicated pre-mRNA processing requirements of multicellular organisms.
The 12 candidate UHM domains are found in the context of a variety of domain arrangements within their protein sequences; a subset is detailed in Table 3. With the exception of the central URP-UHM, the UHM domains often occur near the C terminus of the candidate proteins, providing an exposed position to facilitate molecular recognition. Many of the UHM candidates also contain motifs frequently observed in splicing factors, such as canonical RRMs, arginine–serine (RS) domains, zinc fingers, and Gly-rich regions. Additional unexpected domains are also observed, including the LAP2-Emerin-Man1 (LEM) protein–protein interaction domain of MAN1 and the kinase domain of KIS.
The diverse functional domains of the UHM candidates are accompanied by an array of different biological functions (Table 3). Like U2AF65 and U2AF35, many of the UHM candidates play important roles during RNA splicing. Interestingly, a few of the candidates (PUF60, TAT-SF1, and HCC1) play a dual role in regulating transcription. Because RNA splicing factors can influence the efficiency of transcription, and conversely, transcription often influences the efficiency and products of alternative RNA splicing (Auboeuf et al. 2002; Rosonina and Blencowe 2002), these UHM candidates may couple RNA splicing with transcription to coordinate gene expression. Other candidate UHM proteins, namely, the membrane protein MAN1 and the regulatory kinase KIS, are important for signal transduction but have not yet been shown to affect RNA processing. It remains to be determined whether KIS and MAN1 couple pre-mRNA splicing with other cellular pathways via their established roles in signal transduction. Underscoring the significance of their functions, misregulated UHM-containing proteins are associated with several human diseases, including human immunodeficiency virus type 1 (HIV-1) infection (Zhou and Sharp 1996) and certain cancers (Liu et al. 2001; Bieche et al. 2003; Sampath et al. 2003).
As stated above, the structures of the U2AF35-UHM and U2AF65-UHM complexed with their ligands revealed remarkably similar modes of protein interaction. Notably, both the U2AF65 and SF1 ligands bind their respective UHM domains via a similar arrangement of basic and Trp residues. A search for the ligand consensus pattern [RK]-X-[RK]-W, shared by both the SF1 and U2AF65 ligands, found >1000 matches within the SWISS-PROT database, indicating that predicting protein ligands of candidate UHMs is impractical without further experimental information to identify their functional binding partners. Given that all 12 compelling UHM candidates possess the signature sequences predicted to recognize ligands containing the [RK]-X-[RK]-W motif, it remains an open question whether each UHM domain specifically recognizes a single target, or could promiscuously interact with the intended ligand of a different UHM in the absence of temporal or spatial regulation. Toward answering this question, the U2AF65-UHM has been found not to interact with the N terminus of the U2AF65-ligand in two-hybrid assays (Tronchere et al. 1997) and pull-down experiments (C. Kielkopf, unpubl.), suggesting that individual UHMs may, indeed, specifically recognize a unique target. By analogy with other modular peptide binding domains (Pawson and Nash 2003), UHM sequences flanking the Trp-binding pocket may ensure specific and directional interactions (N-to-C orientation) with the ligand.
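A degenerate four-residue motif such as [RK]-X-[RK]-W is expected to occur often by chance. The sketch below scans for overlapping matches with a regular-expression lookahead and estimates the expected number of chance hits in a sequence of a given length, assuming (as a deliberate simplification) that all 20 amino acids are equally frequent. Under that assumption, a 500-residue protein carries about 0.25 expected hits, so matches in the thousands are unsurprising for a database of many thousands of sequences. The function names are hypothetical.

```python
import re

# [RK]-X-[RK]-W as a regex; the zero-width lookahead reports overlapping matches.
LIGAND_MOTIF = re.compile(r"(?=([RK].[RK]W))")


def ligand_motif_hits(seq):
    """Return (start, match) for every [RK]-X-[RK]-W occurrence, overlaps included."""
    return [(m.start(), m.group(1)) for m in LIGAND_MOTIF.finditer(seq)]


def expected_chance_hits(length, alphabet=20):
    """Expected chance matches in a sequence, assuming equal residue frequencies.

    Per-position probability = P([RK]) * P(any) * P([RK]) * P(W).
    """
    p = (2 / alphabet) * 1.0 * (2 / alphabet) * (1 / alphabet)
    return max(length - 3, 0) * p


print(ligand_motif_hits("RKRWRW"))  # [(0, 'RKRW'), (2, 'RWRW')]
print(expected_chance_hits(500))
```

The lookahead matters: a plain `re.finditer` on the same pattern would consume `RKRW` and miss the overlapping `RWRW` hit.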
In support of this analogy, structure-based modeling suggests that variation of the central X residue in the UHM Arg–X–Phe loop may provide one mechanism for UHM recognition of diverse C-terminal ligand sequences. Several of the UHM candidates (hURP, PUF60, Tat-SF1, and HCC1) share a Trp residue within the Arg–X–Phe loop that is essential in the U2AF35-UHM for specific recognition of C-terminal U2AF65-ligand Pro residues (Kielkopf et al. 2001). Other UHM sequences vary from similar Arg–Tyr–Phe motifs (SPF45, DRT111, UAP2, CUS2, and PAD1) to divergent Lys (U2AF65) and Met (KIS) residues. Besides recognizing the ligand C terminus via the Arg–X–Phe loop, distinct U2AF65-UHM or U2AF35-UHM residues make specific contacts with N-terminal ligand residues. In particular, the bulky U2AF65-ligand Tyr 91 stacks against unique U2AF35 aromatic residues (Tyr 52 and Phe 81), and forms a specific hydrogen bond with His 77 of the U2AF35-UHM (Kielkopf et al. 2001). Similar or identical residues in the URP-UHM (Phe 206, Phe 239, and Gln 235) suggest that a bulky, hydrophobic residue preceding the consensus ligand-Trp would be recognized in an analogous manner, consistent with an interaction between hURP and U2AF65 in pull-down and yeast two-hybrid assays (Tronchere et al. 1997). The smaller size of the corresponding U2AF65-UHM residues (Ile 398 and Val 384) would leave the hydrophobic side chain of a ligand-Tyr in an unfavorable, solvent-exposed environment (Selenko et al. 2003). Considering the variety of cellular roles played by UHM candidates and the consequent requirement to recognize diverse protein ligands, it will be important to determine whether variation in the positions corresponding to U2AF35 Tyr 52, Phe 81, and His 77, and the central position of the Arg–X–Phe loop enables recognition of distinct ligand sequences by UHM domains.
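The Arg–X–Phe assignments listed above can be tabulated to see which candidates would be predicted, by analogy with U2AF35-Trp 134, to contact ligand residues C-terminal to the ligand-Trp. The dictionary below records only the residues named in this section; grouping by the X residue is an illustrative way to frame the hypothesis, not a validated specificity classification.

```python
from collections import defaultdict

# Central residue X of the Arg-X-Phe loop, as listed in the text for each UHM.
ARG_X_PHE_CENTER = {
    "U2AF35": "W", "hURP": "W", "PUF60": "W", "Tat-SF1": "W", "HCC1": "W",
    "SPF45": "Y", "DRT111": "Y", "UAP2": "Y", "CUS2": "Y", "PAD1": "Y",
    "U2AF65": "K", "KIS": "M",
}


def group_by_center_residue(table):
    """Group UHM names by the residue at the X position of Arg-X-Phe."""
    groups = defaultdict(list)
    for name, residue in table.items():
        groups[residue].append(name)
    return dict(groups)


for residue, members in group_by_center_residue(ARG_X_PHE_CENTER).items():
    print(residue, ", ".join(members))
```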
In addition to recognizing short peptide ligands, UHM domains can self-associate to form protein homodimers. For example, the PUF60-UHM domain interacts with itself in two-hybrid assays (Poleev et al. 2000) and forms SDS-resistant homodimers during electrophoresis (Page-McCaw et al. 1999). The U2AF35-UHM has been shown to form weak homodimers by gel filtration, analytical ultracentrifugation, dynamic light scattering (Kielkopf et al. 2001), and two-hybrid assays (Wentz-Hunter and Potashkin 1996), whereas homodimers of the U2AF65-UHM have not been observed (Tronchere et al. 1997). Homo- or heterotypic oligomerizations also have been observed for classical protein–protein interaction domains, with several different effects on ligand recognition. For example, the nNOS-PDZ/syntrophin heterodimer prohibits peptide recognition (Hillier et al. 1999), whereas GRIP or Shank PDZ homodimers leave the peptide-binding pockets free (Im et al. 2003a,b) and the Eps8-SH3 homodimer alters the ligand specificity (Kishan et al. 1997). Although a U2AF35-UHM homodimer can be modeled with the solvent-exposed Arg–Trp–Phe loop binding to the Trp-binding site on a second UHM domain, alternative interfaces are possible that would allow the oligomer to simultaneously recognize peptide ligands, as observed for established protein–protein interaction domains.
Modeling of the U2AF-UHM/ligand structures with RNA has revealed that peptide binding to the helical surface of the RRM-like fold is not predicted to physically interfere with putative RNA interactions on the opposite β-sheet face (Kielkopf et al. 2001; Selenko et al. 2003). Nevertheless, the U2AF35-UHM/U2AF65-ligand complex binds RNA only weakly (Kd >6 μM), and accessory protein factors and adjacent domains in the full-length U2AF35 sequence (e.g., flanking zinc fingers and an RS domain) are required to strengthen this weak interaction (Rudner et al. 1998a; J. Valcarcel, pers. comm.). Likewise, the U2AF65-UHM domain is not required for Py-tract recognition (Banerjee et al. 2003) and does not appear to interact with RNA (Selenko et al. 2003). These results indicate that UHM domains are not likely to be involved in RNA interactions.
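For a simple 1:1 interaction with ligand in excess, the fraction of protein bound is θ = [L]/([L] + Kd). Plugging in the lower bound Kd = 6 μM shows why this RNA interaction is described as weak. The RNA concentrations below are illustrative assumptions, and because the true Kd exceeds 6 μM, the computed occupancies are upper bounds.

```python
def fraction_bound(ligand_um, kd_um):
    """Fractional occupancy for 1:1 binding with ligand in excess: L / (L + Kd)."""
    return ligand_um / (ligand_um + kd_um)


KD_LOWER_BOUND_UM = 6.0  # the text reports Kd > 6 uM, so these occupancies are upper bounds

# Illustrative RNA concentrations (assumptions, not measured values).
for rna_um in (0.1, 1.0, 10.0):
    theta = fraction_bound(rna_um, KD_LOWER_BOUND_UM)
    print(f"[RNA] = {rna_um:5.1f} uM -> at most {theta:.1%} bound")
```

Even at 1 μM RNA, fewer than one in seven complexes would carry RNA under these assumptions.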
Instead, the UHM family has evolved sequence characteristics that confer no benefit for RNA binding while optimizing the interaction with peptide ligands. In most canonical RRM–RNA structures, conserved aromatic Phe/Tyr residues at the third RNP1 position or second RNP2 position stack with RNA bases or sugars, and a basic Arg/Lys residue at the first position of the RNP1 motif frequently forms a salt bridge with the phosphate backbone (Fig. 5A; Oubridge et al. 1994; Price et al. 1998; Deo et al. 1999; Handa et al. 1999; Allain et al. 2000; Wang and Tanaka Hall 2001). In contrast, the corresponding U2AF35-UHM (Fig. 5B) and U2AF65-UHM (Fig. 5C) residues are replaced with aliphatic substitutions that are not predicted to interact favorably with RNA. Moreover, UHMs display unusually low isoelectric points, which would disfavor binding of polyanionic RNA but favor binding of basic peptide ligands.
In addition to poor conservation of RNP-like motifs and overall negative charge, RNA binding by the U2AF65-UHM is further hindered by a C-terminal α-helix that forms a tight hydrophobic interface with the putative RNA-binding surface of the RRM-like fold (Selenko et al. 2003). In contrast, the C-terminal extensions of canonical RRMs more often strengthen rather than inhibit RNA binding. For example, the C-terminal helical extension of the N-terminal U1A-RRM not only contributes to RNA binding (Oubridge et al. 1994; Zeng and Hall 1997) but also mediates dimer formation for recognition of tandem RNA elements (Klein Gunnewiek et al. 2000; Varani et al. 2000). In the U2AF65-UHM structure, Phe 433 in the RNP1-like motif and Tyr 463 within the preceding turn interact with Tyr 469, Phe 474, and Trp 475 in the C-terminal α-helix (Selenko et al. 2003). Although counterparts of Phe 474 and Trp 475 are absent among the UHM candidates, aromatic residues at positions corresponding to Phe 433, Tyr 463, and Tyr 469 are observed for the UHM domains of KIS, PUF60, SPF45, and HCC1. This raises the possibility that some of the UHM candidates have a hydrophobic C-terminal extension that may either interfere with RNA binding, as for the U2AF65-UHM, or contribute to homodimer formation in a manner similar to U1A.
The canonical RRM domain was a relatively late evolutionary addition to the array of RNA-binding folds that emerged in response to the needs of complex pre-mRNA processing pathways (Anantharaman et al. 2002). As processes that were originally based on the RNA world became progressively more regulated and reliant on protein interactions, the RRM fold further developed specialized sequence characteristics for protein recognition to form the UHM subfamily. These UHM signature sequences included divergent residues in the RNP-like motifs, an Arg–X–Phe loop sequence, and key acidic residues that collectively recognized the Trp residue and positive charge of the protein ligand. Convincing UHM candidates have been discovered in association with a variety of fundamental cellular processes, ranging from pre-mRNA splicing to transcription, DNA repair, and signal transduction. The large number of proteins that share the signature protein–protein interaction residues of UHM domains supports the proposal that the U2AF-UHMs represent a novel family of modular protein interaction domains.
Because protein interaction domains are attractive modules for communication among a network of pathways, the UHM domain may be an evolutionary extension of RRMs that couples pre-mRNA processing with other nuclear processes. Protein recognition by so-called RNA-binding domains is an emerging theme in molecular recognition. An early example of RRM–protein interactions was observed in the structure of U2B″/U2A′, in which the α-helical surface of the U2B″-RRM interacts with the U2A′ leucine-rich repeat motif (Price et al. 1998). Several recent structures of the β-sheet surfaces of heterodimeric RRM domains interacting with α-helical protein ligands have revealed a second mode of RRM–protein recognition distinct from that of UHM/protein complexes (Fribourg et al. 2003; Lau et al. 2003; Shi and Xu 2003; Kadlec et al. 2004). In addition to distinguishing protein recognition domains within the RRM family, a growing list of fold families such as the Sterile α-Motif (SAM; Kim and Bowie 2003), LEM (Cai et al. 2001; Laguri et al. 2001), Pumilio/HEAT-repeat domains (Wang et al. 2002), and zinc fingers (Morgan et al. 1997) have been found to bind either nucleic acids or protein ligands through slight variations of a common scaffold. Furthermore, RS domains have been shown to contact the pre-mRNA during splicing (Valcarcel et al. 1996; Shen et al. 2004), and have also been reported to mediate protein–protein interactions (Wu and Maniatis 1993). Because a major goal of the “postgenomic” era is the ability to predict protein functions even in the absence of corroborating experimental results (Thornton et al. 2000), it will be essential to compile a lexicon of signature sequences, such as those that distinguish UHMs from canonical RRMs, for other fold families whose members play diverse functional roles.
We thank S. Evans for editorial assistance, and J. Bender, M. Matunis, M. Swenson, and J. Wedekind for careful reading of the manuscript. Funding for C.L.K. is provided by the Johns Hopkins University Center for AIDS Research grant #P30 AI42855.