3′-end cleavage and subsequent polyadenylation are critical steps in mRNA maturation. The precise location where cleavage occurs [referred to as the poly(A) site] is determined by a tripartite mechanism in which an A(A/U)UAAA hexamer, a GU-rich downstream element and a UGUA upstream element are recognized by the cleavage and polyadenylation specificity factor (CPSF), cleavage stimulation factor (CstF) and cleavage factor Im (CFIm), respectively. CFIm is composed of a smaller 25 kDa subunit (CFIm25) and a larger 59, 68 or 72 kDa subunit. CFIm68 interacts with CFIm25 through its N-terminal RNA recognition motif (RRM). We recently solved the crystal structures of CFIm25 bound to RNA and of a complex of CFIm25, the RRM domain of CFIm68 and RNA. Our studies illustrated the molecular basis for UGUA recognition by the CFIm complex, suggested a possible mechanism for CFIm-mediated alternative polyadenylation, and revealed potential links between CFIm and other mRNA processing factors, such as the 20 kDa subunit of the cap binding protein (CBP20), and the splicing regulator U2AF65.
3′ processing of messenger RNA (mRNA) is an essential maturation step that increases the stability of mRNA, facilitates its export from the nucleus to the cytoplasm, and enhances translation efficiency.1 3′ end formation is a two-step process involving, first, endonucleolytic cleavage at a polyadenylation site [poly(A) site], followed by the addition of a polyadenine tail.2–5 Poly(A) site definition is accomplished through the recognition of specific cis-elements [referred to as poly(A) signals] located on the mRNA by their corresponding protein factors.4,5 Two of the well-studied poly(A) signals are the A(A/U)UAAA hexamer and the downstream GU-rich element, which are bound by the cleavage and polyadenylation specificity factor (CPSF) and cleavage stimulation factor (CstF) complexes, respectively.4,5 A third poly(A) signal consists of UGUA elements and was identified as the preferred binding site of Cleavage Factor Im (CFIm) by SELEX and biochemical analyses.6,7 The tripartite core protein-RNA complexes, together with Cleavage Factor IIm, serve as a platform to recruit other 3′ processing factors to modulate the efficiency of the cleavage and polyadenylation reaction.1–3,8
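As a rough illustration of the sequence signals named above, the following minimal Python sketch locates A(A/U)UAAA hexamers and UGUA elements in an RNA string. The function name `find_motifs` and the regular expressions are illustrative assumptions, not part of any published pipeline, and the GU-rich downstream element is deliberately left out because the text does not define it as a fixed sequence.

```python
import re

# Motif patterns taken from the text; the zero-width lookahead lets
# overlapping occurrences be reported as separate hits.
HEXAMER = re.compile(r"(?=(A[AU]UAAA))")  # A(A/U)UAAA hexamer (CPSF signal)
UGUA = re.compile(r"(?=(UGUA))")          # UGUA element (CFIm signal)


def find_motifs(rna):
    """Return 0-based start positions of hexamer and UGUA motifs.

    Hypothetical helper: accepts RNA or DNA letters and converts T to U.
    """
    rna = rna.upper().replace("T", "U")
    return {
        "hexamer": [m.start() for m in HEXAMER.finditer(rna)],
        "ugua": [m.start() for m in UGUA.finditer(rna)],
    }
```

A pure sequence scan of this kind only flags candidate signals; whether a given site is actually used depends on the protein factors discussed in the rest of this review.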
CFIm is a two-subunit complex, composed of a small 25 kDa (CFIm25) subunit and a larger 59/68/72 kDa subunit.9 CFIm25 is encoded by one gene, CPSF5, whereas two separate genes, CPSF6 and CPSF7, code for two isoforms of the large subunit, CFIm68 and CFIm59. The third isoform, CFIm72, is an alternatively spliced form of CFIm68.9–11 CFIm68 and CFIm59 both contain an N-terminal RRM domain, a central proline-rich region, and a C-terminal RS-like domain.10,11 The N-terminal RRM of CFIm59/68 mediates the interaction with CFIm25.11 Besides its fundamental role in UGUA-mediated poly(A) site recognition,6,7 CFIm has been shown to influence alternative poly(A) site selection,12–14 mRNA export,15,16 and mRNA splicing.17 The recently solved crystal structures of CFIm25-RNA and CFIm68 RRM-CFIm25-RNA complexes18 taken together with biochemical analyses shed light on the molecular mechanisms underpinning CFIm's specificity for UGUA elements and its role in alternative poly(A) site selection. The major findings resulting from these studies and their implications are discussed below.
A quick glance at the domain organization of the two subunits of CFIm might give the erroneous impression that CFIm68 is likely to be the subunit that recognizes UGUA sequence elements, because the RRM it contains is the most abundant single-stranded RNA binding domain in vertebrates.19,20 Furthermore, this motif interacts with RNA in a sequence-specific manner in a large number of instances.21,22 In contrast, CFIm25 possesses a Nudix hydrolase domain,23,24 a motif found in housekeeping enzymes which primarily hydrolyze (di)nucleotides.25,26 However, UV crosslinking11 and gel shift assays27 indicated that CFIm25 is capable of binding RNA. The CFIm68 RRM, on the other hand, enhances RNA binding mediated by CFIm25, but is not able to bind RNA by itself.11
Crystal structures of CFIm25 in complex with an RNA oligonucleotide containing a UGUA element unveiled the molecular basis for sequence specific recognition.27 Comparison with other Nudix proteins revealed that CFIm25 possesses a unique α-helix loop motif preceding its Nudix fold. This α-helix loop motif not only blocks the canonical hydrolase active site, but also provides a scaffold for CFIm25 to bind RNA.27 The UGUA element is recognized via a variety of hydrogen bonding interactions. U1 is mainly recognized by main chain atoms from Phe104, whereas U3 is recognized by the side chain of Arg63. In addition to the interaction with the side chain of Glu55, G2 forms an intramolecular Watson-Crick/sugar-edge base pair with A4.27 Moreover, Phe103 stacks with U1 and G2 to further stabilize the CFIm25-UGUA complex.27
To further investigate the role of the larger CFIm subunit in poly(A) site recognition, we solved a crystal structure of a CFIm68 RRM-CFIm25-RNA ternary complex.18 Consistent with previous observations, CFIm68 and CFIm25 form a 2:2 heterotetramer.12 However, instead of forming a dimer, two CFIm68 RRM molecules bind to opposite sides of the CFIm25 homodimer.18 The CFIm68 RRM adopts the typical β1α1β2β3α2β4 architecture.22 The RRM contacts CFIm25 through the loops connecting β1/α1 and β2/β3, referred to as loop1 and loop3, respectively. Interactions mediated by loop1 are mainly hydrophobic, whereas loop3 residues participate in hydrogen bonding interactions involving both side chain and main chain atoms. Mutagenesis analyses illustrated that only loop3 is critical for CFIm complex formation.
Biochemical data demonstrated a critical role of CFIm68 in looping the intervening sequence between the two UGUA elements bound by the CFIm25 dimer. The two CFIm25 monomers are arranged in an anti-parallel orientation, so that the 5′ ends of the two UGUA elements face each other. Hence, the intervening sequence needs to loop around to position both UGUA elements in the RNA binding pockets of CFIm25. Mutagenesis analyses attempting to sketch a low resolution RNA path have revealed that the CFIm68 RRM residues located in the clefts formed at the CFIm68-CFIm25 interface are essential for RNA looping. A CFIm68 RRM quadruple variant bearing mutated residues in both clefts (W90A/W91A/N117A/R118A) nearly abrogated the RNA binding affinity of CFIm. Unlike other RRMs, which bind RNA across the β-sheet surface formed by the four anti-parallel β-strands,22 the CFIm68 RRM is likely to direct the looping RNA beneath the β-sheet surface: mutation of Asp94, which is located at the bottom of the RRM, to alanine significantly reduced the RNA binding affinity of CFIm. Furthermore, an additional C-terminal α-helix (α3) immediately following the RRM blocks the surface of the β-sheet, which is the platform that usually binds RNA.22 Also, in the CFIm68 RRM one of the three highly conserved aromatic residues that are responsible for RNA binding is replaced by a leucine (L128 on RNP1).22
CFIm68, CFIm59 and their homologues were aligned using ClustalW28 and sequence conservation was calculated and mapped onto the CFIm68 RRM model using ConSurf (Fig. 2B).29 In agreement with the proposed RNA path based on the biochemical data, residues in the vicinity of clefts 1 and 2 and at the bottom of the RRM show higher conservation than the rest of the RRM. Taken together, these observations suggest that the RNA is unlikely to loop across the β-sheet surface.
Pre-mRNA may be polyadenylated in several different ways due to the presence of multiple polyadenylation sites in the 3′ UTR.30 Deep sequencing and bioinformatics analyses have demonstrated the prevalence of alternative polyadenylation,8,31,32 which gives rise to mRNAs with 3′ UTRs of various lengths30,33,34 and subsequently affects a variety of cellular events, such as gene silencing, tissue differentiation, and development.30,35 A recent report illustrated the involvement of CFIm in alternative polyadenylation in male germ cells.12–14 Moreover, knockdown of either CFIm25 or the CFIm complex in HeLa cell extracts led to a shift to the use of an upstream poly(A) site.12,13 The ability of CFIm68 to loop the intervening sequence between two UGUA elements provided a potential mechanism for CFIm to regulate alternative poly(A) site selection.18 Gel shift assays using UGUA-containing RNAs with intervening sequences of various lengths showed that a minimum of 7 nucleotides (7 nt) is required for effective binding by the CFIm68 RRM-CFIm25 complex, whereas longer spacers (9 to 15 nt) enhanced the binding affinity. These data led to the hypothesis that the CFIm68 RRM might not restrain the maximal length of the intervening RNA, and might therefore loop out an entire poly(A) site, including the AAUAAA hexamer and the downstream GU-rich element (Fig. 1).
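The spacer requirement described above can be expressed as a simple sequence check. The sketch below is only an illustration under stated assumptions: the function name `ugua_pairs` is hypothetical, the default cutoff of 7 nt is taken from the gel shift result quoted in the text, and no upper limit is imposed because the data suggest none. It lists every pair of UGUA elements whose intervening sequence is long enough to be compatible with binding by one CFIm complex.

```python
def ugua_pairs(rna, min_spacer=7):
    """List (first_start, second_start, spacer_length) for UGUA pairs.

    Hypothetical helper. The spacer is the number of nucleotides between
    the end of the first UGUA and the start of the second. Positions are
    0-based. Pairs with a spacer shorter than min_spacer are skipped.
    """
    rna = rna.upper().replace("T", "U")
    # Start positions of every UGUA element in the sequence.
    starts = [i for i in range(len(rna) - 3) if rna[i:i + 4] == "UGUA"]
    pairs = []
    for a in range(len(starts)):
        for b in range(a + 1, len(starts)):
            spacer = starts[b] - (starts[a] + 4)
            if spacer >= min_spacer:
                pairs.append((starts[a], starts[b], spacer))
    return pairs
```

Because the function considers all pairs rather than only adjacent elements, a distant pair that would enclose an entire poly(A) site is reported alongside closely spaced pairs, which mirrors the looping hypothesis discussed here.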
A similar RNA looping-mediated regulation mechanism has been proposed for the splicing regulator polypyrimidine tract binding protein (PTB).36 The antiparallel organization of the RRMs in PTB may allow the protein to loop out and exclude an entire exon from the mature mRNA.36 Interestingly, the three-subunit cleavage stimulation factor complex (CstF) has been proposed to form a heterohexamer consisting of two copies of each subunit: CstF77, CstF64 and CstF50.3,37–39 Although CstF64 contains only one RRM domain, which has been shown to recognize the GU-rich element,40,41 CstF64 might achieve a similar RNA looping mechanism facilitated by the dimeric state of the CstF complex and thereby influence the usage of an alternative poly(A) site. This hypothesis is consistent with the previous observation that a lower level of CstF64 in plasma B cells correlates with the use of alternative poly(A) sites as compared to pre-B cells.42 We speculate that RNA looping might be a general mechanism utilized by some 3′ processing factors to regulate polyadenylation.
CFIm68 and CFIm59 share a similar domain composition that is the hallmark of splicing regulator SR proteins,43 with a central proline-rich region flanked by an N-terminal RRM domain and a C-terminal RS-like domain. On the other hand, CFIm68 possesses an additional glycine-arginine rich (GAR) motif, which is missing in CFIm59. Interestingly, CFIm68 has previously been shown to participate in the export of mRNA out of the nucleus, and the GAR motif is responsible for the interaction with the mRNA export receptor NXF1/TAP.15,16 These data point to potentially different functions for the two larger subunits. A recent study focused on post-translational modifications also shed light on the role of the multiple forms of the larger subunit.10 In the report, Martin and colleagues identified distinct methylation patterns of arginine residues in CFIm68 and CFIm59, and the different enzymes that methylate them.10 The RS-like domains of both CFIm68 and CFIm59 are weakly methylated by PRMT1, a member of a family of protein arginine methyltransferases (PRMTs).10,44 The SH3 domain of PRMT2 was found to interact with CFIm59, but not CFIm68.10,45 On the other hand, the PRMT5 complex only methylates CFIm68 within the GAR motif that is absent in CFIm59.10 The distinct methylation patterns of the two larger CFIm subunits suggest they might play different roles in mRNA 3′ processing, but the function of these modifications has yet to be determined.
Besides the crystal structure of the CFIm68 RRM-CFIm25 complex,18 a structure of a CFIm59 RRM-CFIm25 complex has also been solved recently (Structural Genomics Consortium, Karolinska Institute; PDB ID code 3N9U). Although different RRM constructs were used for crystallization, namely residues 13–235 for CFIm68 and 50–182 for CFIm59, both groups could observe interpretable electron density only for the RRM domain, i.e., residues 81–173 of CFIm68 and 82–177 of CFIm59. A structural comparison between the two complexes provides insight into the function of the various CFIm isoforms (Fig. 2D). The overall protein architecture of the CFIm59 RRM is very similar to that of CFIm68, with a typical RRM fold appended with a C-terminal helix positioned on top of the β-sheet (RMSD 1.23 Å calculated on 93 Cα atoms; Fig. 2D). In addition to the hydrophobic stacking interaction between Phe168 and Tyr127 and van der Waals contacts between Leu165 and Tyr85, which stabilize α3 as observed in CFIm68, Arg159 makes stacking and hydrogen bonding interactions with Phe168 and Gln167, respectively. These additional forces are not observed in CFIm68, since a threonine is located at the position corresponding to CFIm59 Arg159. Another interesting feature within α3 is the hydrogen bond between Ser166 and the main chain carbonyl of residue 162 (numbered as in CFIm59), which exists in both CFIm59 and CFIm68. Ser166 is subject to phosphorylation,46 which would disrupt the hydrogen bonding interaction with the main chain carbonyl of residue 161 (numbered as in CFIm68) and potentially destabilize helix α3. Interestingly, when this serine was mutated to aspartate, a phosphate mimic, we observed a two-fold increase in the RNA binding affinity of the CFIm68/CFIm25 complex (data not shown). The potential role of Ser166 phosphorylation in the regulation of mRNA processing will need to be explored further.
Despite the fact that they crystallized in different space groups, the heterotetramer of CFIm59 RRM-CFIm25 is organized in the same manner as the CFIm68 RRM-CFIm25 complex, with two CFIm59 RRMs flanking the CFIm25 homodimer. In order to investigate the relationship between CFIm59 and CFIm68 in RNA binding, we inspected the sequence conservation (Fig. 2A and B) and electrostatic potential of the two RRM domains (Fig. 2C). Most of the surface charges are similar between CFIm59 and CFIm68. One notable difference is that the cleft 2 side of CFIm59 is more negatively charged than in CFIm68. The potential impact of the charge difference on RNA binding will require further experimental investigation.
Although the overall domain architecture of the individual subunits is nearly identical between the CFIm68 RRM-CFIm25 and CFIm59 RRM-CFIm25 complexes, a superposition of the entire heterotetramer revealed interesting differences (Fig. 2D and E). The large subunit of CFIm contacts the CFIm25 dimer through loop1 and loop3 of the RRM domain, and two clefts are formed at the RRM-CFIm25 interface. These clefts have been proposed to serve as the entry and exit paths for the mRNA bound by CFIm. In the CFIm68 RRM-CFIm25 complex, the exit clefts (designated as cleft2) are quite wide (~20 Å) whereas the entrance clefts (designated as cleft1) are narrower (~8 Å). Similarly, in the CFIm59 RRM-CFIm25 complex, cleft2 is open. While one cleft1 exhibits the same width as in the CFIm68 RRM-CFIm25 complex, the other cleft1 is much narrower (~4 Å), due to a ~4° shift of the RRM domain that leaves loop1 and loop3 in the same position (Fig. 2D and E). As a consequence, a salt bridge is formed between Glu112 of CFIm59 and Arg68 of CFIm25 (Fig. 2E). The movement of the RRM domain suggests that the CFIm68/CFIm25 and CFIm59/CFIm25 complexes might bind RNA in a different manner, since an RNA molecule is not expected to thread through a 4 Å cleft. We cannot rule out, however, that the RRM movement might be the result of crystal packing interactions. Structures of CFIm in complex with a long RNA containing two UGUA elements will be required to define the path of the intervening looping sequences and possibly shed light on the different roles that CFIm68 and CFIm59 play in RNA binding in particular, and in mRNA processing in general.
Whereas many studies have focused on elucidating the mechanism of individual steps in mRNA processing, emerging evidence suggests that all steps are intimately connected (reviewed in ref. 47). As an integral part of mRNA processing, 3′ processing is coupled to 5′ capping and splicing.48 An earlier study identified a physical connection between the cap binding protein (CBP) complex and 3′ processing by showing that the 3′ processing complex was less stable when CBP was depleted from HeLa cell nuclear extract, while addition of recombinant CBP complex restored the 3′ cleavage efficiency.49 The crystal structure of the CFIm68 RRM-CFIm25 complex and structural alignment analyses suggest that CFIm25 might be the 3′ processing factor that mediates the connection. CBP is a two-subunit complex composed of a larger 80 kDa (CBP80) and a smaller 20 kDa (CBP20) subunit, the latter of which binds to the m7GpppG cap through its RRM domain.50,51 A superposition of CBP20 and CFIm68 illustrated that not only are the β-sheets and α-helices superimposable but, more importantly, loop1 and loop3 align nearly perfectly (Fig. 3). As shown by mutagenesis, only loop3 is critical for the formation of the CFIm complex. Considering that 4 out of 7 hydrogen bonding interactions mediated by loop3 are through main chain atoms, it is plausible that a protein such as CBP20 could establish stable contacts with CFIm25, regardless of the amino acid sequence in loop3. Moreover, CBP20 interacts with CBP80 through loop2 and loop4, which are located on the opposite side of loop1 and loop3. This would allow CFIm25 to bind CBP20 without interfering with its binding partner, CBP80.
In light of the CBP20-CFIm68 structural alignment, we performed a systematic search for homologous structures of the CFIm68 RRM domain using the DALI server.52 We aligned all the homologs with a Z score greater than 5 and inspected them manually using PyMOL.53 Among 598 homologs (this data set contains duplicates due to the presence of multiple chains in PDB files), only 4 proteins (duplicates were excluded) were found to have the same shape as loop1 and loop3 in CFIm68 (Fig. 3): the first RRM of cytoplasmic poly(A) binding protein (PABPC) (Z score = 12.5, PDB ID: 1CVJ),54 CBP20 (Z score = 11.8, PDB ID: 1H2V),51 Rna15, the yeast homolog of CstF64 (Z score = 10.9, PDB ID: 2X1A)40 and the second RRM of the splicing factor U2AF65 (Z score = 10.4, PDB ID: 2G4B).55 Identification of U2AF65 is intriguing, since the subunits of CFIm have been identified in purified human spliceosomes.56,57 In addition, specific interactions between CFIm and the splicing factor U2AF65 have been established experimentally.17 The sequence alignment comparison revealed a potential molecular mechanism for how CFIm25 and U2AF65 may bridge 3′ processing and splicing. Future efforts will be devoted to validating the direct interactions between CFIm25 and the identified protein factors and investigating the impact these interactions may have on mRNA processing.
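The first two steps of the search described above, applying the Z score cutoff and collapsing duplicate chains, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input format of `(pdb_id, chain, z_score)` tuples and the function name `filter_hits` are hypothetical and do not correspond to the actual DALI output format, and treating hits that share a PDB ID as duplicates is one possible reading of how the duplicates were removed. The manual inspection of loop1 and loop3 in PyMOL is not reproduced here.

```python
def filter_hits(hits, z_cutoff=5.0):
    """Keep hits with Z score above the cutoff, one entry per PDB ID.

    Hypothetical helper. `hits` is an iterable of (pdb_id, chain, z_score)
    tuples. When several chains of the same PDB entry pass the cutoff,
    only the chain with the highest Z score is kept. The result is sorted
    by decreasing Z score.
    """
    best = {}
    for pdb_id, chain, z in hits:
        if z <= z_cutoff:
            continue  # below the Z > 5 threshold quoted in the text
        if pdb_id not in best or z > best[pdb_id][1]:
            best[pdb_id] = (chain, z)
    unique = [(pdb_id, chain, z) for pdb_id, (chain, z) in best.items()]
    return sorted(unique, key=lambda hit: -hit[2])
```

A filter of this kind only narrows the candidate list; the four proteins named above were selected by visual comparison of loop geometry, which a score threshold cannot replace.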
After this review was accepted for publication a paper describing a crystal structure of a CFIm complex was published:
Li H, Tong S, Li X, Shi H, Ying Z, Gao Y, Ge H, Niu L, Teng M. Structural basis of pre-mRNA recognition by the human cleavage factor I(m) complex. Cell Res 2011; 21:1039–51.