Most eukaryotic mRNA precursors (pre-mRNAs) must undergo extensive processing, including cleavage and polyadenylation at the 3′-end. Processing at the 3′-end is controlled by sequence elements in the pre-mRNA (cis elements) as well as protein factors. Despite the seeming biochemical simplicity of the processing reactions, more than 14 proteins have been identified for the mammalian complex, and more than 20 proteins have been identified for the yeast complex. The 3′-end processing machinery also has important roles in transcription and splicing. The mammalian machinery contains several sub-complexes, including cleavage and polyadenylation specificity factor (CPSF), cleavage stimulation factor (CstF), cleavage factor I (CF Im), and cleavage factor II (CF IIm). Additional protein factors include poly(A) polymerase (PAP), poly(A) binding protein (PABP), symplekin, and the C-terminal domain (CTD) of RNA polymerase II largest subunit. The yeast machinery includes cleavage factor IA (CF IA), cleavage factor IB (CF IB), and cleavage and polyadenylation factor (CPF).
In eukaryotes, messenger RNA precursors (pre-mRNAs) are transcribed in the nucleus from the genomic DNA by RNA polymerase II (polII). These pre-mRNAs must undergo extensive co-transcriptional processing before they can be transported to the cytoplasm for translation into proteins. The processing events include capping, where a guanosine added to the 5′-end of the pre-mRNA is methylated at the N7 position; splicing to remove the intronic sequences of the pre-mRNA; and cleavage and polyadenylation at the 3′-end [1–4]. All these modifications are essential for the maturation of the pre-mRNAs. This review will focus on the processing events at the 3′-end.
A poly(A) polymerase activity was first isolated from calf thymus in 1960, and its role in pre-mRNA 3′-end processing was recognized ten years later [6–8]. Subsequent studies showed that the primary RNA transcripts extended beyond the site of polyadenylation, indicating that eukaryotic pre-mRNA 3′-end processing requires both cleavage and polyadenylation [9–11]. These observations are in sharp contrast to those in bacteria, where the 3′-ends of mRNAs are formed by transcriptional termination. Further studies showed that a large collection of protein factors (more than 14 proteins in mammals and more than 20 proteins in yeast, totaling over 1 megadalton) is required for 3′-end processing, despite the apparent simplicity of the cleavage and polyadenylation reactions. This machinery is directed to the correct cleavage site by sequence elements within the pre-mRNA 3′-end.
This review will attempt to summarize the current knowledge on pre-mRNA 3′-end processing in mammals and yeast, two model systems in which these modifications have been studied in the greatest detail. Special emphasis will be placed on recent developments in this area, including structural information on the protein factors that have become available in the past few years. There are also many excellent earlier reviews on pre-mRNA 3′-end processing [1–4,12–23].
The processing of pre-mRNA 3′-ends has crucial functional importance in eukaryotes, and disruption of this processing has catastrophic effects on cell growth and viability. First, 3′-end processing promotes the transport of mRNAs from the nucleus to the cytoplasm. Substitution of the pre-mRNA 3′-end polyadenylation site with a ribosomal RNA cleavage site produces an mRNA that is cleaved but not polyadenylated. This substitution decreased the ratio of cytoplasmic to nuclear mRNA concentration ten-fold, indicating a reduction in mRNA transport and consequently a reduction in protein expression.
Secondly, 3′-end processing promotes the stability of mRNAs [20,22]. In the cytoplasm, mRNAs are degraded from the 3′-end first, indicating the importance of protecting the 3′-end. The addition of the poly(A) tail and the subsequent binding of poly(A) binding protein (PABP) have been shown to prevent degradation in mammalian cells and Xenopus oocytes. In fact, just the presence of PABP without a poly(A) tail may be sufficient to prevent mRNA degradation.
Thirdly, 3′-end processing enhances the translation of mRNAs into proteins. The poly(A) tail and PABP interact with the methyl cap at the 5′-end to promote translation [20,22,28,29]. Studies in yeast have shown that the presence of the poly(A) tail alone is sufficient to initiate efficient translation, but the presence of both the poly(A) tail and the 5′-cap is optimal for translation .
Finally, 3′-end processing is intricately coupled to the transcription and splicing machineries [13,16,17,23,31]. The 3′-end processing complex interacts with transcription factors and the C-terminal domain (CTD) of polII to help control transcriptional initiation, and a proper poly(A) signal is essential for transcriptional termination. Alterations in these interactions lead to improper polyadenylation and mRNA degradation .
Recent studies show that polyadenylation is not restricted to the nucleus. PAP enzymes have been identified in the cytoplasm and in the mitochondria [32–34]. These enzymes are found from yeast to humans, and have important functions in many cellular processes.
The 3′-end cleavage and polyadenylation reaction is directed by sequence elements within the untranslated region of the pre-mRNA (the so-called cis elements). These sequence elements are found in almost every eukaryotic pre-mRNA that is polyadenylated, but they are not found in histone pre-mRNAs, which are cleaved but not polyadenylated. Disruption of the position or sequence of these cis elements reduces the efficiency of 3′-end processing, consistent with their conservation in pre-mRNAs. Recent studies analyzing mRNAs and cDNAs indicate that polyadenylation can occur at multiple sites for over half the genes in humans (~54%) and for about one-third of the genes (~32%) in mice [35,36]. These alternative polyadenylation sites may occur in the same exon or in an alternatively spliced 3′ exon. Although polyadenylation can occur at multiple sites, those containing the optimal sequence elements are cleaved more efficiently.
The sequence elements in yeast and mammals share some recognizable similarity, but also have significant differences (Figs. 1A, 1B). The yeast sequence elements differ in their sequence and location from mammalian sequence elements (Fig. 1B). Yeast sequence elements are also less conserved than their mammalian counterparts. Some of the sequence elements in yeast appear in duplicate, while mammalian sequence elements generally occur only once (Fig. 1A). Another difference is the presence of U-rich elements that flank the cleavage site in yeast. We therefore discuss the mammalian and yeast sequence elements separately below.
Mammalian pre-mRNAs contain three primary sequence elements that define the polyadenylation site and two auxiliary sequence elements that enhance/regulate the 3′-end processing reaction (Fig. 1A). The three primary sequence elements consist of the hexamer AAUAAA polyadenylation signal (PAS), the cleavage site, and the G/U-rich downstream element (DSE) . The two auxiliary sequence elements consist of an upstream element and a downstream element .
The PAS was the first sequence element proposed to play a role in 3′-end processing. This AAUAAA hexamer was found in 80–90% of sequenced mRNAs. A variant with U at the second position, the AUUAAA hexamer, is present in about 10% of the mRNAs. A study of over 4,000 human expressed sequence tags (ESTs) found that 77% contained a PAS, and of these 75% contained the AAUAAA sequence and 20% contained the AUUAAA sequence. A more recent study of over 13,000 human and mouse ESTs showed that only about 4% of the ESTs did not contain a PAS. AAUAAA (70%) and AUUAAA (15%) are the most common hexamers in the PAS.
Mutational studies have confirmed the necessity of the PAS for proper 3′-end processing. Patients suffering from α- and β-thalassaemia carry a point mutation in the final base of the motif (A to G) and have low levels of mRNA. Additionally, four separate point mutations (AACAAA, AAUUAA, AAUACA and AAUGAA) in the PAS produced significantly reduced levels of polyadenylated RNA and increased amounts of unprocessed pre-mRNA in Xenopus laevis oocytes. Interestingly, two of the tested mutations are found in naturally occurring pre-mRNAs, yet they failed to produce polyadenylated mRNA.
Not only is the PAS sequence conserved, but its position relative to the poly(A) site is also conserved: the poly(A) site lies 10 to 30 bases downstream of the PAS (Fig. 1A). Deleting parts of the region between the PAS and the cleavage site produced a new cleavage site further downstream, displaced by a distance that corresponded to the length of the deleted segment.
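The spacing rule described above can be illustrated with a minimal Python sketch that scans an RNA sequence for the two most common PAS hexamers and reports the window in which cleavage would be expected. The function name, the 0-based coordinates, and the choice to count the 10 to 30 nucleotide distance from the index immediately after the hexamer are assumptions made for illustration; they are not taken from the studies cited here.

```python
import re

# The two most common PAS hexamers named in the text.
PAS_HEXAMERS = ("AAUAAA", "AUUAAA")
# A lookahead lets overlapping hexamers be reported as separate hits.
PAS_PATTERN = re.compile("(?=(" + "|".join(PAS_HEXAMERS) + "))")


def find_pas_windows(rna, min_dist=10, max_dist=30):
    """Return (hexamer, start, window_start, window_end) for each PAS hit.

    Coordinates are 0-based and the window is half-open. Distances are
    counted from the index right after the hexamer (an assumption).
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    hits = []
    for match in PAS_PATTERN.finditer(rna):
        start = match.start()
        end = start + 6  # index immediately after the hexamer
        window_start = min(end + min_dist, len(rna))
        window_end = min(end + max_dist + 1, len(rna))
        hits.append((match.group(1), start, window_start, window_end))
    return hits
```

Calling find_pas_windows on a 3′ untranslated region returns one tuple per hexamer; an empty list would correspond to the small fraction of transcripts that lack a recognizable PAS.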
The DSE is less conserved and more diffuse than the PAS [2,19]. The presence of this element was suggested by the observation that a deletion more than 10 nucleotides downstream of the cleavage site reduced polyadenylation three-fold . The DSE has been observed in two forms, a GU-rich element that has a sequence of YGUGUUYY (Y=pyrimidine) [43,44] and a U-rich sequence (UUUUU) [43,45], although pre-mRNAs can have neither, one, or both of these sequences . Point mutations within the DSE have a smaller effect on cleavage activity while deletions of short segments of the DSE have more significant impacts [47,48]. While the DSE is fairly tolerant of sequence abnormalities, it is less tolerant of positioning abnormalities. The DSE is generally located within 30 nucleotides of the cleavage site (Fig. 1A), although in a few instances DSEs have been observed further downstream .
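As a rough illustration of the two DSE forms, the following Python sketch translates the YGUGUUYY consensus (Y = C or U) and the UUUUU run into regular expressions and reports matches that begin within 30 nucleotides downstream of a given cleavage position. The function name and the coordinate convention are hypothetical, and an exact-match search is a simplification: as noted above, natural DSEs are diffuse and tolerate sequence variation.

```python
import re

# YGUGUUYY with Y = pyrimidine (C or U), and the U-rich form UUUUU.
GU_RICH = re.compile(r"(?=([CU]GUGUU[CU][CU]))")
U_RICH = re.compile(r"(?=(UUUUU))")


def find_dse(rna, cleavage_index, max_dist=30):
    """List DSE-like motifs starting within max_dist nt after the cut.

    cleavage_index is the 0-based index of the first nucleotide
    downstream of the cleavage site (an illustrative convention).
    """
    rna = rna.upper().replace("T", "U")
    region_end = cleavage_index + max_dist
    found = []
    for label, pattern in (("GU-rich", GU_RICH), ("U-rich", U_RICH)):
        # Start the search at the cleavage position, not at index 0.
        for match in pattern.finditer(rna, cleavage_index):
            if match.start() < region_end:
                found.append((label, match.start(), match.group(1)))
    return sorted(found, key=lambda hit: hit[1])
```

A pre-mRNA can yield no hits, one form, or both, which mirrors the observation that pre-mRNAs may carry neither, one, or both of these sequences.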
The cleavage site of pre-mRNAs is positioned between the PAS and the DSE, generally within a region of 13 nucleotides. The nucleotide sequence surrounding the cleavage site is not strictly conserved [51,52]. In vertebrate pre-mRNAs, 70% of the cleavage sites are located at the 3′ side of an adenosine residue, with a nucleotide preference of A > U > C ≫ G. The preceding nucleotide is cytosine in 59% of 269 pre-mRNA sequences examined, making the optimal cleavage site CA (Fig. 1A).
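The nucleotide preferences at the cleavage site can be turned into a simple ranking of candidate positions, sketched below in Python. The numeric weights are arbitrary ordinal ranks chosen only to reproduce the order A > U > C ≫ G, with a small bonus for a preceding cytosine; they are not measured frequencies, and the function name and window arguments are hypothetical.

```python
# Ordinal preference for the nucleotide 5' of the cut: A > U > C >> G.
# These weights are illustrative ranks, not experimentally derived values.
RANK = {"A": 3.0, "U": 2.0, "C": 1.0, "G": 0.0}
C_BONUS = 0.5  # a preceding C is favoured, so CA scores highest


def rank_cleavage_sites(rna, window_start, window_end):
    """Rank positions i such that cleavage would occur 3' of rna[i].

    Returns (index, dinucleotide, score) tuples, best first. The window
    is half-open and 0-based; index 0 is skipped because it has no
    preceding nucleotide.
    """
    rna = rna.upper().replace("T", "U")
    candidates = []
    for i in range(max(window_start, 1), min(window_end, len(rna))):
        score = RANK.get(rna[i], 0.0)
        if rna[i - 1] == "C":
            score += C_BONUS
        candidates.append((i, rna[i - 1:i + 1], score))
    # Highest score first; ties are resolved in favour of upstream positions.
    return sorted(candidates, key=lambda c: (-c[2], c[0]))
```

The bonus is kept smaller than the gap between ranks so that the identity of the nucleotide at the cut always dominates and the preceding cytosine only acts as a tie-breaker.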
The auxiliary upstream element, located upstream of the PAS (Fig. 1A), does not have a consensus sequence, but often consists of a U-rich element (UUUU) or similar sequences (UGUA, UAUA). The efficiency of cleavage and polyadenylation is enhanced by the presence of this auxiliary element, as it promotes the binding of other polyadenylation factors to the cleavage site [55–58]. Most auxiliary upstream elements have been identified with the enhanced expression of intronless genes, which are normally expressed at lower levels than transcripts that contain introns.
While many examples of auxiliary upstream elements are known, fewer auxiliary elements have been identified downstream of the cleavage site. These auxiliary elements are generally G-rich, but they lack a conserved sequence and distance from the cleavage site. In addition, more than one auxiliary sequence can be present in a gene [2,60–64].
The yeast poly(A) site is defined by four sequence elements: the AU-rich efficiency element (EE), the A-rich positioning element (PE), the cleavage site, and the U-rich elements flanking the cleavage site (Fig. 1B).
The AU-rich EE is located furthest upstream but at a variable distance from the cleavage site . The sequence UAUAUA provided the greatest effect on 3′-end processing with the U at the first and fifth positions being the most critical for function . A large-scale analysis of the 3′ untranslated region of 1017 yeast nuclear transcripts showed that more than half of them (52%) contained the optimal EE sequence (UAUAUA) . While the EE improves the efficiency, it is not required for cleavage.
The A-rich PE is located downstream of the EE and approximately 10 to 30 nucleotides upstream of the cleavage site. The position and sequence of the PE are critical for efficient 3′-end processing. Although many A-rich sequences for the PE have been identified, the most effective sequences are AAUAAA or AAAAAA. The PE received its name because most mutations in the PE disrupt the position but not the efficiency of 3′-end processing, although a single point mutation in the PE of the TRP4 gene decreased the efficiency of 3′-end processing. Deletion of the entire PE can also decrease the efficiency of 3′-end processing. The PE is similar to its mammalian counterpart, the PAS, but serves a less critical role in 3′-end processing.
The cleavage site is generally defined by a sequence element containing a pyrimidine followed by multiple adenosines, Y(A)n, with the cleavage occurring at the 3′ side of an adenosine, similar to the reaction in mammals [70,71]. Mutation of the PE forces cleavage to occur at nonspecific locations in the 3′ untranslated region.
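The yeast elements described so far can be located with a short Python sketch. It searches a 3′ untranslated region for the optimal EE (UAUAUA), the two most effective PE sequences (AAUAAA and AAAAAA), and Y(A)n runs, taking n to be at least 2 as one reading of "multiple adenosines". The dictionary, the function name, and that threshold are illustrative assumptions, and the sketch ignores the positional constraints between elements.

```python
import re

YEAST_ELEMENTS = {
    "EE": re.compile(r"(?=(UAUAUA))"),          # optimal efficiency element
    "PE": re.compile(r"(?=(AAUAAA|AAAAAA))"),   # most effective PE sequences
    # Y(A)n: a pyrimidine followed by at least two adenosines (assumed n >= 2).
    "cleavage": re.compile(r"([CU]A{2,})"),
}


def scan_yeast_elements(rna):
    """Return {element_name: [(start, matched_text), ...]}, 0-based."""
    rna = rna.upper().replace("T", "U")
    hits = {}
    for name, pattern in YEAST_ELEMENTS.items():
        hits[name] = [(m.start(), m.group(1)) for m in pattern.finditer(rna)]
    return hits
```

Because an A-rich PE such as AAUAAA itself contains a Y(A)n run, the cleavage pattern also matches inside the PE; a fuller treatment would filter candidates by their distance from the PE and by the flanking sequence context.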
Surprisingly, deletion of the PE and EE in cytochrome c pre-mRNA (CYC-1) does not abolish cleavage activity, signifying the importance of other sequences near the cleavage site. Sequence analysis revealed conserved U-rich sequences just upstream and downstream of the cleavage site, the upstream U-rich element (UUE) and the downstream U-rich element (DUE). Mutations in either UUE or DUE reduce cleavage activity but do not abolish it. If these mutations are made in combination with mutations to the cleavage site, cleavage is drastically reduced, indicating a synergistic role of the U-rich sequences with the cleavage site. These U-rich sequences are also found in plants, but are absent in mammals.
Only two enzymatic activities are required for pre-mRNA 3′-end processing, cleavage and polyadenylation. However, studies with mammalian nuclear extracts  and yeast extracts , as well as studies by other biochemical and genetic methods, suggest that a large number of protein factors are required for 3′-end processing. Recent studies from (tandem) affinity purification of protein complexes in yeast, followed by identification of the components by mass spectrometry, confirmed the known protein factors in 3′-end processing as well as identifying many new ones [77–82]. Currently, more than 14 proteins have been identified for the mammalian 3′-end processing machinery (Fig. 1A), and more than 20 proteins appear associated with the yeast 3′-end processing machinery (Fig. 1B). Therefore, a mega-dalton complex is required for eukaryotic pre-mRNA 3′-end processing, despite the seeming biochemical simplicity of the reaction. While many of the protein factors in this machinery regulate pre-mRNA 3′-end processing, other factors in this machinery mediate transcriptional initiation, transcriptional termination, splicing and other events.
The mammalian 3′-end processing complex contains several sub-complexes (Fig. 1A), including cleavage and polyadenylation specificity factor (CPSF), cleavage stimulation factor (CstF), cleavage factor I (CF Im), and cleavage factor II (CF IIm). In addition, poly(A) polymerase (PAP), poly(A) binding protein (PABP), symplekin, and PolII CTD also belong to this machinery. All the protein factors except PABP are required for in vitro cleavage reaction, while only CPSF, PAP and PABP are required for in vitro polyadenylation .
The yeast 3′-end processing complex is different from the mammalian complex, but also shares significant similarities. The sub-complexes of the yeast machinery include the cleavage and polyadenylation factor (CPF), cleavage factor IA (CF IA), and cleavage factor IB (CF IB) (Fig. 1B). CPF can be further separated into cleavage factor II (CF II) and polyadenylation factor I (PF I). CF II contains subunits that are homologous to those in mammalian CPSF, except that the homologs of mammalian CPSF-30 and hFip1, Yth1p and Fip1p, actually belong to PF I rather than CF II (Fig. 1B). CF IA contains subunits that are homologous to those in mammalian CstF and CF IIm, except that mammalian CstF-50 does not appear to have a homolog in yeast. The functions of some of these sequence homologs can be different in yeast and in mammals. For example, mammalian CPSF-160 recognizes the AAUAAA PAS, whereas its homolog in yeast (Cft1p/Yhh1p) is mostly associated with the cleavage site. Similarly, while mammalian CstF-64 recognizes the G/U-rich DSE, yeast Rna15p recognizes the A-rich PE.
The yeast machinery also contains many additional protein factors in the CPF (Fig. 1B). Even though some of them have mammalian homologs, the functional roles of these homologs in pre-mRNA 3′-end processing have yet to be established. Even in yeast, reconstitution of the cleavage reaction in vitro requires only CF IA, CF IB, and CF II, while in vitro polyadenylation requires CPF, CF IA, CF IB, and Pab1p. The additional factors generally play a modulatory role.
Despite the identification of this large collection of protein factors, the identity of the nuclease that actually catalyzes the cleavage reaction was not known. Recent studies have provided strong structural and biochemical evidence that CPSF-73 is the endoribonuclease for pre-mRNA 3′-end processing.
The protein factors involved in pre-mRNA 3′-end processing will be described in more detail in the following sections, organized according to the mammalian machinery. For those mammalian proteins that have yeast homologs, the names of the yeast proteins are given in parentheses. The function and sequence characteristics of protein factors that are characterized in both yeast and mammals are summarized in Table 1.
Mammalian CPSF contains five subunits, CPSF-30, CPSF-73, CPSF-100, CPSF-160 and hFip1 (Table 1). They are all required for efficient cleavage and polyadenylation of pre-mRNAs, and they all have clear sequence homologs in the yeast 3′-end processing machinery.
CPSF-160 is the largest subunit of CPSF and is one of the more studied proteins of the pre-mRNA 3′-end processing complex. CPSF-160 is conserved throughout eukaryotes (24% identity and 51% similarity between CPSF-160 and Cft1p/Yhh1p). It is an essential factor in yeast and mammals that interacts directly with pre-mRNA to direct the cleavage reaction, and it is also required for polyadenylation. Besides its role in 3′-end processing, CPSF-160 associates with factors involved in transcriptional initiation (TFIID)  and elongation (PolII CTD) , and plays a role in transcriptional termination . CPSF-160 is also involved in cytoplasmic polyadenylation in Xenopus oocytes .
Mammalian CPSF-160 binds directly to the PAS (Fig. 1A), with high affinity for the perfect PAS sequence (AAUAAA) and lower affinity for other PAS sequences . UV cross-linking, nuclease protection and other experiments demonstrated this direct binding, using both nuclear extracts and purified recombinant protein [86–88]. Deletion of nucleotides beyond the cleavage site reduces the ability of CPSF-160 to bind to the PAS. Mutations of the PAS abolished the cross-linking of nuclear extract , while they only have a small effect on the cross-linking of the recombinant CPSF-160 , suggesting that other proteins are required for the highly specific binding to the PAS by CPSF-160. CPSF-160 can also bind at a site other than the PAS in HIV pre-mRNA .
Efficient binding of CPSF-160 to the PAS also depends on the CstF. CPSF-160 was cross-linked to the PAS along with CstF-64 and the C protein of heterogeneous ribonucleoprotein particles (hnRNPs). The interaction with CstF is likely mediated by CstF-77 (see section on CstF-77 below). CPSF-160 interacts with other subunits of CPSF, including CPSF-100 and hFip1, as well as PAP [88,91].
Cleavage and polyadenylation are abolished if yeast Cft1p/Yhh1p is depleted from cell extracts by its antibody . Cft1p/Yhh1p binds directly to RNA , although not at the PE, the equivalent of the mammalian PAS. Instead, binding was observed primarily near the A-rich cleavage site (Fig. 1B), although the exact site is still not known .
Human CPSF-160 contains 1443 amino acid residues (Fig. 2A). There are two RNP-type binding motifs near the N-terminus (RNP1, residues 379–386 and RNP2, residues 344–349), but it is not known whether these residues directly interact with RNA . In yeast, deletion experiments showed that residues 500–750 of Cft1p/Yhh1p are required for RNA binding in vitro . Sequence analysis of this region predicts that it contains five β-propeller repeats . This region of the sequence is highly similar to the spliceosomal U2 snRNP (human SAP130, yeast Rse1p)  and the UV damage recognition protein Xeroderma pigmentosum group E (XPE) . A total of sixteen β-propeller repeats were identified in human CPSF-160 based on sequence analysis .
CPSF-160 also bears weak sequence homology with the ubiquitin ligase DDB1, for which structural information has been obtained [97,98]. DDB1 contains three β-propellers (each with 7 repeats) and a C-terminal domain. CPSF-160 probably forms a similar structure, although it contains several large insertions in the β-propellers that do not have counterparts in DDB1. The RNA-binding region of Cft1p/Yhh1p would be located in the second β-propeller.
Gene disruption of Brr5p/Ysh1p, the yeast homolog of CPSF-73, reveals that this protein is required for cell viability. It was recognized recently that CPSF-73 contains a metallo-β-lactamase domain at its N-terminus, and it is a founding member of the β-CASP superfamily of proteins (Fig. 2A) [100–102]. The name β-CASP was derived from the metallo-β-lactamase domain they all contain and from the members CPSF-73 and CPSF-100, Artemis, SNM1 and PSO2. The highest amino acid sequence conservation between CPSF-73 and Brr5p/Ysh1p is for their metallo-β-lactamase domains.
Most β-CASP proteins are involved in nucleic acid binding and/or processing. Several members of the β-CASP family are known to be nucleases . Artemis, when complexed with DNA-dependent protein kinase, functions as an endonuclease on ssDNA overhangs [104,105]. Bacterial RNase J1 has 5′-to-3′ exoribonuclease activity on ribosomal RNA precursors . RNase Z, which contains only the metallo-β-lactamase domain (and therefore is not a β-CASP family member), is the ribonuclease essential for the 3′-end processing of tRNA precursors [107,108]. These observations suggest that CPSF-73 could have nuclease activity, and could actually be the endoribonuclease for the cleavage step of pre-mRNA 3′-end processing.
UV cross-linking of nuclear extracts revealed that CPSF-73 binds directly to the cleavage site in an AAUAAA dependent manner , suggesting that it is at the correct position to carry out the cleavage reaction. In addition, CPSF-73 is cross-linked in a U7 dependent manner to the histone pre-mRNA cleavage site, and it may be the endoribonuclease involved in histone pre-mRNA 3′-end processing .
Most β-CASP proteins contain conserved residues that can bind two metal ions (zinc, iron or others). Disruption of these putative metal binding residues results in 3′-end processing defects and cell death [109,111]. Dialysis of the nuclear extract leads to partial loss of cleavage activity, which can be rescued by the addition of ZnCl2. Moreover, the cleavage reaction using nuclear extract is inhibited by the zinc-specific chelators TPEN (N,N,N′,N′-tetrakis(2-pyridylmethyl)ethylenediamine) and OP (ortho-phenanthroline), as well as by high concentrations of the metal chelator EDTA. These results suggest that the cleavage reaction is zinc-dependent. However, it is currently not known exactly which protein(s) in this machinery are sensitive to the loss of zinc.
The crystal structure of the N-terminal region (residues 1–460) of human CPSF-73 has been reported (Fig. 2B) . The structure contains a metallo-β-lactamase domain and a β-CASP domain. Remarkably, the structure reveals that a stretch of about 60 residues just after the β-CASP domain actually belongs to the metallo-β-lactamase domain (Figs. 2A, 2B). Two metal ions, modeled as zinc based on the current data, are bound by the metallo-β-lactamase domain (Fig. 2C), and the binding site is located at the interface with the β-CASP domain. The binding modes of the two zinc ions in CPSF-73 are similar to those of the two zinc ions in RNase Z [107,108] as well as a bacterial ribonuclease . The structural studies identify a conserved His residue as the general acid, which is activated by a conserved Asp/Glu residue (Fig. 2C). Mutation of this His residue is lethal in yeast . In addition, the brr5-1 phenotype in yeast, a cold sensitive mutant that exhibits defects in cleavage and polyadenylation , is caused by a single-site mutation, A407T , that is located next to this His residue in Brr5p/Ysh1p. The functional importance of this His-Glu pair is also confirmed in RNase Z .
Biochemical studies with the bacterially expressed and purified human CPSF-73 showed that it possessed weak ribonuclease activity towards pre-mRNA substrates , in the absence of the other proteins of the 3′-end processing machinery. This activity was enhanced greatly when the purified protein was pre-incubated with Ca2+ ions. The activity is endonucleolytic and has little sequence specificity, consistent with the fact that CPSF-160 and CstF-64 help define the exact cleavage site in the pre-mRNA substrate. Overall, the structural and biochemical studies offer convincing direct experimental evidence that CPSF-73 is the endoribonuclease for the cleavage reaction in pre-mRNA 3′-end processing (Fig. 1A).
While the C-terminal region of CPSF-73 is not as conserved as the N-terminal region, removal of the C-terminus of Brr5p/Ysh1p, by as little as the last 30 residues, results in cell death. On the other hand, removal of the last 10 or 19 residues does not affect cell viability. Sequence analysis suggests that a leucine zipper, which is usually involved in protein-protein interactions, may be present in the final 30 residues of Brr5p/Ysh1p. The C-terminus of CPSF-73 does interact with other proteins in the 3′-end processing complex, including CPSF-100 and Pta1p. Brr5p/Ysh1p has also been shown to interact strongly with Clp1p, but the binding region is unknown. This association may bridge CPSF with CF IIm in mammals.
The C-terminal residues of Brr5p/Ysh1p have strong sequence homology to another member of the CPF in the yeast 3′-end processing complex, Syc1p (38% identity) . This will be discussed in more detail in the section on Syc1p later.
CPSF-73 has a second isoform in humans, known as RC-68 or Int11 [117,120]. It may be involved in 3′-end processing of small nuclear RNAs, and it interacts with the second isoform of CPSF-100, RC-74 or Int9 [117,120].
Cft2p/Ydh1p, the yeast homolog of CPSF-100, is critical for yeast cell viability, and conditional mutants of this protein disrupt both cleavage and polyadenylation in vitro and in vivo. CPSF-100 has recognizable sequence homology to CPSF-73 (23% identity and 49% similarity for their metallo-β-lactamase domains). This sequence conservation is comparable to that between CPSF-100 and Cft2p/Ydh1p (24% identity and 43% similarity for the entire proteins). Therefore, CPSF-100 is also a member of the β-CASP superfamily of proteins. However, the zinc-binding residues are not conserved in CPSF-100 (especially in Cft2p/Ydh1p), and therefore CPSF-100 cannot bind zinc and is not expected to be catalytically active [100,101]. The exact function of this protein in 3′-end processing is currently not known. Cft2p/Ydh1p contains an insert of about 200 residues (residues 401–601) that is highly charged and hydrophilic, which explains its larger size compared to CPSF-73, but the functional role of this segment is not known. CPSF-100 contains a similar (though shorter) segment (Fig. 2A).
The crystal structure of the metallo-β-lactamase and β-CASP domains of Cft2p/Ydh1p was reported recently . The overall structure is similar to that of CPSF-73 (Fig. 2D), and the structure confirms that the zinc binding site is severely disrupted in this protein.
Interactions between CPSF-100 and other proteins in the 3′-end processing complex have been identified. Cft2p/Ydh1p self-associates and also interacts with many proteins of the CPF (Cft1p/Yhh1p, Brr5p/Ysh1p, Pta1p, Pfs2p, Ssu72p, YDL094c), Pcf11p of CF IA, and the CTD of PolII . Deletion mutagenesis studies showed that the C-terminal region of Cft2p/Ydh1p is required for its self-association, while the N-terminal region is required for interaction with Pfs2p. In contrast, the full-length protein is required for interactions with the other proteins . Yeast two-hybrid studies showed that the last 245 residues of CPSF-73 can interact with CPSF-100 . Besides protein-protein interactions, Cft2p/Ydh1p can bind the pre-mRNA near the cleavage site [72,122].
CPSF-30 is required for both cleavage and polyadenylation , despite initial experiments showing that it might only be essential for polyadenylation . Interestingly, Yth1p does not appear to co-purify with the other CPSF subunits (Cft1p/Yhh1p, Cft2p/Ydh1p, Brr5p/Ysh1p, Fip1p), even though they are in the larger CPF complex (Fig. 1B) [115,122]. CPSF-30 has a second isoform in humans (locus XP_945726), sharing 54% sequence identity. However, the function of this close homolog has not been characterized.
CPSF-30 contains five CCCH zinc finger motifs (ZF1-ZF5) in all eukaryotes, and metazoan CPSF-30 has an additional CCHC zinc knuckle at the C-terminus (Fig. 2A). CCCH zinc fingers have the consensus sequence CX8CX5CX3H, while CCHC zinc knuckles have the consensus sequence CX2CX4HX4C . Both motifs have been shown to bind RNA by UV cross-linking , and in vitro RNA binding assays demonstrate that CPSF-30 preferentially binds a poly(U) sequence. Deletion of the zinc knuckle significantly decreases binding but retains specificity. RNase H protection assays in yeast identified that binding of Yth1p occurs at the U-rich sequences (UUE and DUE) that surround the cleavage site . The mutant yth1-1, which lacks the C-terminal 55 amino acids and thus the last zinc finger, still efficiently cleaves pre-mRNA but polyadenylation fails .
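The two consensus patterns quoted above translate directly into regular expressions, as in the following Python sketch for locating candidate motifs in a protein sequence given in one-letter code. The function name is hypothetical, and an exact-spacing search is only a first approximation, since the spacing in real zinc fingers can deviate from the consensus.

```python
import re

# C-X8-C-X5-C-X3-H and C-X2-C-X4-H-X4-C, where X is any residue.
CCCH_FINGER = re.compile(r"(?=(C.{8}C.{5}C.{3}H))")
CCHC_KNUCKLE = re.compile(r"(?=(C.{2}C.{4}H.{4}C))")


def find_zinc_motifs(protein):
    """Return (motif_name, start, matched_segment) tuples, 0-based, sorted."""
    protein = protein.upper()
    hits = []
    for name, pattern in (("CCCH", CCCH_FINGER), ("CCHC", CCHC_KNUCKLE)):
        # Lookahead matching allows overlapping candidates to be reported.
        for match in pattern.finditer(protein):
            hits.append((name, match.start(), match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])
```

Applied to a CPSF-30 sequence, a search of this kind would be expected to recover up to five CCCH candidates and, for metazoan proteins, a C-terminal CCHC candidate, provided each motif follows the consensus spacing exactly.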
Sequence alignment indicates that ZF2 is the most conserved zinc finger in CPSF-30 (76% identity and 96% similarity between yeast and mammals). Point mutations made to the conserved Cys residues in ZF2 are lethal . Mutations to other residues within ZF2 reduce the cleavage activity in vitro but could be rescued by the addition of wild-type Yth1p. Mutations in ZF2 also disrupt binding to pre-mRNAs . Deletions of ZF1, ZF4 and ZF5 result in lethality or slowed growth, indicating their requirement for fully functioning Yth1p . In contrast, deletion of ZF3 did not alter cell growth, indicating that it is the only zinc finger that is not required for function .
CCCH zinc finger motifs are also involved in protein-protein interactions, and CPSF-30 coordinates other proteins in cleavage and polyadenylation. ZF4 of Yth1p is required for interactions with Fip1p, and ZF5 contributes to this binding [123,124]. hFip1 also interacts with CPSF-30, although the location of interaction has not been mapped . ZF1 and ZF4 of Yth1p interact with the N-terminal region of Brr5p/Ysh1p . Influenza virus attenuates host antiviral response by blocking the function of CPSF-30 with its NS1 protein [127–129].
CPSF-30 has high sequence homology with the Drosophila protein clipper (70% identity over the 174 N-terminal residues in bovine CPSF-30). Clipper contains five CCCH zinc finger motifs and two zinc knuckle motifs, and the high sequence conservation covers the five zinc finger motifs. This region of clipper has endoribonuclease activity and cleaves RNA hairpins . Consequently, it has been suggested that CPSF-30 might be the endoribonuclease responsible for the cleavage reaction . However, endoribonucleolytic activity has not been observed for CPSF-30 .
Fip1p (Factor interacting with Pap1p) was first identified in yeast, through its interaction with Pap1p . hFip1 was subsequently identified by sequence analysis of the HeLa cell cDNA library and confirmed to be a member of CPSF . hFip1 contains an acidic segment near the N-terminus, followed by a highly conserved segment of about 70 residues (48% identity and 72% similarity between hFip1 and Fip1p) (Fig. 2A). The C-terminal region contains a Pro-rich segment, a segment with alternating Arg and Asp residues (RD segment), and an Arg-rich segment. Fip1p, at 35 kD, is much smaller than hFip1 (66 kD), because it does not contain the RD and Arg-rich segments. hFip1 runs as a diffuse band in SDS-PAGE, with apparent molecular weights between 65 and 80 kD.
The acidic segment (residues 1–111) of hFip1 mediates interactions with PAP, and the highly conserved segment (residues 137–243) interacts with CPSF-30. Both of these segments (residues 1–355) are required for binding CPSF-160 and CstF-77, and CPSF-160 may also interact with the C-terminal region of hFip1 (residues 443–594). A stable ternary complex of hFip1, CPSF-160 and PAP can be identified through purification of HeLa cell extracts. The interaction with PAP may also be aided by the presence of CF Im. Recombinantly produced RD segment preferentially binds to U-rich RNA sequences, which are present in the auxiliary upstream element of pre-mRNAs.
Similar interactions have been observed for Fip1p. Residues 80–105 of Fip1p interact directly with Pap1p, and residues 206–220 interact with Yth1p . There are also weak interactions between Fip1p and Rna14p [124,132,134].
The primary function of hFip1 (Fip1p) may be to bring PAP close to the polyadenylation site. Removal of the C-terminal region beyond the Pro-rich segment of Fip1p produces temperature-sensitive growth in yeast. These cells are deficient in polyadenylation, but can be rescued by the addition of wild-type Fip1p. Interestingly, yeast Pap1p is inhibited in vitro by the addition of Fip1p, likely mediated by residues 105–206 of Fip1p [134,135]. In contrast, the addition of recombinant hFip1 stimulates the activity of PAP in vitro, which is dependent on the U-rich auxiliary upstream element. The stimulation is abolished by the deletion of this element in the pre-mRNA, or by the deletion of the RD segment or the CPSF-160 binding segment in hFip1 .
CstF-64 was one of the first proteins identified within the 3′-end processing complex because it can be UV cross-linked to RNA. CstF-64 contains a conserved RNA binding domain of the RNA recognition motif (RRM) type (43% identity between CstF-64 and Rna15p) at its N-terminus (Fig. 3A). The RRM alone is sufficient for RNA binding and shows preference for the G/U-rich DSE (Fig. 1A). The structure of the RRM of CstF-64 has been determined by NMR (Fig. 3B), which showed the presence of a UU dinucleotide-specific binding site and a highly mobile protein:RNA interface. This flexibility may allow CstF-64 to form stable complexes with a wide range of GU-rich sequences.
The RRM in Rna15p has unique properties compared to that in CstF-64 (Table 1). First, the RRM achieves sequence specificity only in the presence of other proteins, especially Rna14p and Hrp1p, but Hrp1p and the EE it recognizes are not present in mammals. Second, the RRM in Rna15p binds specifically to the A-rich PE (Fig. 1B), in contrast to the G/U-rich DSE in mammals. Mutation of conserved amino acid residues in Rna15p eliminates RNA binding, and mutation of the PE prevents Rna15p cross-linking. Therefore, CF IA is located upstream of CPF (Fig. 1B), whereas its mammalian counterpart CstF is located downstream of CPSF (Fig. 1A).
Immediately after the RRM there is a hinge region of about 100 residues that is highly conserved in CstF-64 but somewhat less conserved in Rna15p (Fig. 3A). The hinge region mediates protein-protein interactions and binds to CstF-77 and symplekin [140,141].
CstF-64 contains a C-terminal domain (CTD), covering the last 50 amino acids, that is more conserved than the RRM. This domain forms a three-helical bundle (Fig. 3C). Deletions in the C-terminus of Rna15p lead to slow growth or cell death in vivo and to defective cleavage in vitro. The C-terminus of Rna15p binds to another member of CF IA, Pcf11p, as truncation of the last 16 residues disrupts the interaction between these two proteins. In addition, this region can interact with transcription factors and may play a role in regulating transcription.
Metazoan CstF-64 homologs contain additional sequences between the hinge region and the CTD, including a proline/glycine-rich segment followed by 12 repeats of the MEAR(A/G) pentapeptide motif. The length and composition of this region are variable among metazoans and its function is unknown. The MEAR(A/G) repeats may form a helical structure in solution.
CstF-77 is required for proper 3′-end cleavage. Mutation of the Drosophila homolog of CstF-77, suppressor of forked (su(f)), results in the utilization of alternative poly(A) sites. This defect can be rescued by the addition of human CstF-77.
CstF-77 contains 12 repeated sequences at the N-terminus (Fig. 3A), which are called HAT (half a TPR) motifs for their similarity to tetratricopeptide repeat (TPR) motifs . TPR motifs often mediate protein-protein interactions. Structural and biochemical data suggest that the HAT domain can be further divided into two sub-domains, HAT-N domain (residues 1–240, with HAT motifs 1 through 5) and HAT-C domain (residues 241–549, HAT motifs 6 through 12) [152–154]. Most importantly, the structures reveal that the HAT domain is an intimately associated dimer, mediated by the HAT-C domain (Fig. 3D). The overall shape of the dimer is highly elongated, about 45 Å wide but 165 Å long (Fig. 3D). From the side, the dimer looks like a bow (Fig. 3E). This dimerization is supported by studies in solution, yeast two-hybrid assays , far Western experiments , as well as by solution and electron microscopy studies on Rna14p . In addition, self-association of CstF-77 has been suggested based on genetic observations on the Drosophila homolog Su(f), as distinct lethal alleles of Su(f) can partially complement each other [150,156]. Taken together, these data suggest that CstF-77 may function as a dimer at a crucial stage in pre-mRNA 3′-end processing.
The HAT domain is followed by a proline-rich segment in CstF-77 (Fig. 3A). Far Western experiments showed that this segment binds the hinge region of CstF-64 and the WD-40 repeats of CstF-50 , and the interaction between CstF-77 and CstF-64 was further confirmed by nickel pull-down assays and analytical ultracentrifugation (AUC) . Electron microscopy and AUC experiments showed that Rna14p and Rna15p can form a heterotetramer . Yeast two-hybrid assays showed that the HAT-C domain mediates interactions with the second β-propeller of CPSF-160, the same region that recognizes the AAUAAA PAS . CstF-77 binds specifically to the CTD of PolII but with less efficiency than CstF-50 . Rna14p binds to unphosphorylated CTD, but the binding increases upon phosphorylation of the CTD .
CstF-50 is required for cleavage in vitro . It contains seven WD-40 repeats that begin about 90 residues from the N-terminus (Fig. 3A). The WD-40 repeats are required for interaction with CstF-77, and deletion of the last repeat reduces binding . CstF-50 also can self-associate and only the N-terminal region is required for this interaction. CstF-50 does not appear to have a sequence homolog in yeast.
Both CstF-50 and CstF-77 bind specifically to the CTD of PolII but CstF-50 binds with a higher efficiency . This binding is significantly reduced upon deletion of the first 91 amino acids of CstF-50, indicating that the WD-40 repeats are not sufficient for interaction. RNAi experiments showed that CstF-50 also interacts with the splicing co-activator SRm160, establishing another link between 3′-end processing and transcription . A yeast two-hybrid screen identified an interaction between the WD-40 repeats of CstF-50 and the protein BARD1, which associates with the tumor suppressor BRCA1 [160,161]. This interaction inhibits 3′-end cleavage of pre-mRNAs in vitro.
CF Im is required for cleavage in vitro , and appears to be unique to higher eukaryotes. Three polypeptides (25 kD, 59 kD and 68 kD) as well as a less abundant one (72 kD) generally copurify with CF Im activity from HeLa cell nuclear extract. CF Im functions as a heterodimer that is made up of the 25 kD subunit and one of the 59 kD, 68 kD or 72 kD subunits. The heterodimer consisting of the 25 kD and 68 kD polypeptides can reconstitute cleavage activity to partially purified 3′-end processing factors .
The three large subunits have similar amino acid sequences, but are encoded by separate genes . The N-terminal region of the 68 kD subunit contains an RNP-type RNA binding domain (RBD), which is necessary for binding to the 25 kD subunit . RBDs have been observed in protein-protein interactions in other cases, for example the splicing factor U2AF [165,166] and the Drosophila proteins Y14 and Mago . The C-terminal region of the 68 kD subunit is rich in RS, RD and RE repeats that are similar to pre-mRNA splicing SR proteins. In fact, this region interacts with the spliceosomal SR proteins , and CF Im has been identified as a component of purified spliceosomes [168,169]. For the 25 kD subunit, residues 81–160 mediate interactions with the C-terminus of PAP (residues 472–739) and PABP [164,170].
The primary function of CF Im may be to provide additional recognition of the pre-mRNA substrate and aid the definition of the proper polyadenylation site. The 25 kD, 59 kD and 68 kD subunits of CF Im can be UV cross-linked to RNA in a sequence-dependent manner, and SELEX analysis revealed that CF Im prefers to bind RNA sequences containing UGUAA, which is generally found just upstream of the PAS. Further studies demonstrated that CF Im binding enhances the recognition of sequences containing the perfect PAS as well as those containing a noncanonical PAS. The enhancement for the perfect PAS is aided by binding of hFip1 and the consequent recruitment of PAP.
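The positional relationship described above, a UGUAA element lying just upstream of the PAS, can be illustrated with a short Python sketch. It is only a toy motif scan, not a predictor of CF Im binding: the function names and the 60-nucleotide default window are assumptions made for illustration and do not come from the studies cited here.

```python
import re

PAS = "AAUAAA"        # canonical poly(A) signal named in the text
CFIM_MOTIF = "UGUAA"  # element preferred by CF Im in the SELEX analysis


def find_all(seq, motif):
    """Return 0-based start positions of every occurrence, overlaps included."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]


def upstream_cfim_sites(rna, window=60):
    """Map each PAS start to the UGUAA starts lying fully upstream of it.

    `window` (how far upstream to look, in nucleotides) is an
    illustrative assumption, not a value taken from the literature.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    motif_starts = find_all(rna, CFIM_MOTIF)
    sites = {}
    for pas_start in find_all(rna, PAS):
        sites[pas_start] = [
            m for m in motif_starts
            if pas_start - window <= m and m + len(CFIM_MOTIF) <= pas_start
        ]
    return sites
```

For example, `upstream_cfim_sites("GGUGUAACCCCCAAUAAAGG")` reports one PAS at position 12 with a single UGUAA element starting at position 2. A noncanonical PAS would need its own pattern, which is deliberately left out of this sketch.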
In mammals, dephosphorylation of a protein in CF Im or CF IIm by a Ser/Thr phosphatase abolished the cleavage reaction .
CF IIm contains two subunits, hPcf11 and hClp1, both of which were originally discovered in yeast (Pcf11p and Clp1p) and belong to CF IA . These proteins are highly conserved among the eukaryotes.
hPcf11 can be identified in nuclear extracts with apparent molecular weights between 140 and 200 kD on SDS gels . Pcf11p is only about half the size of hPcf11, and is equivalent to the N-terminal half of hPcf11. The function of the C-terminal half of hPcf11 is not known.
Pcf11p contains a conserved PolII CTD interacting domain (CID) that covers ~130 residues in the N-terminus (Fig. 4A), which prefers the phosphorylated form of the CTD [157,172,173]. The crystal structure of Pcf11p in complex with a phosphorylated CTD peptide has been reported (Fig. 4B) . Mutations in the CID reduce or abolish binding to the CTD and result in cell death. Interestingly, these mutations do not affect 3′-end processing, but instead cause incorrect transcriptional termination [157,174–176]. The CID can also bind RNA to affect transcriptional termination . Studies of the CID indicate that the RNA binding and the protein binding regions overlap, and this competition for binding may play a role in the release of the 3′-end processing factors from PolII .
The CID is followed by a 20-residue stretch of glutamines (234–253 in Pcf11p), which is followed by the Rna14p/Rna15p binding domain (Fig. 4A). The binding of Rna14p/Rna15p to Pcf11p is dependent on the binding of Clp1p first . This CstF binding region is followed by the Clp1 interacting segment (477–499, discussed below), which is flanked by a zinc finger on each side (N-terminal C2H2 and C-terminal C2HC type). Pcf11p also interacts with Cft1p/Yhh1p, Cft2p/Ydh1p, Brr5p/Ysh1p, and Pta1p , but the location of the binding sites is unknown.
hClp1 is conserved throughout eukaryotes and has 23% identity with Clp1p. Immunodepletion of hClp1 abolishes cleavage activity but does not affect polyadenylation. Clp1p binds strongly to Brr5p/Ysh1p and Pcf11p, and weakly to Cft2p/Ydh1p. Similarly, hClp1 interacts with CF Im and CPSF [119,178].
hClp1 and its homologs contain a Walker A motif (residues 130–137), which is generally associated with binding ATP/GTP (Fig. 4A). The structure of Clp1p confirms the similarity to other ATPases (Fig. 4C). The bound conformation of ATP suggests that Clp1p may not have ATPase activity, which is confirmed by biochemical assays in vitro. Recent studies show, however, that hClp1 is an RNA 5′-kinase that is important for tRNA splicing and activation of synthetic short interfering RNAs.
The structure also reveals the molecular basis for the interactions between Clp1p and Pcf11p (Fig. 4C) . A peptide segment of Pcf11p (residues 475–499) is bound by Clp1p. Two of the residues in this segment, Arg480 and Trp489, are strictly conserved among Pcf11p homologs and make large contributions to the binding interface (Fig. 4C).
In mammals, PAP is required for cleavage and polyadenylation, but in yeast Pap1p is only required for polyadenylation. While PAP interacts with other members of the 3′-end processing complex, it does not require any of these proteins for in vitro polyadenylation, although these interactions are important for defining the proper length of the poly(A) tail .
The structures of human, bovine and yeast PAP have been reported [182–185]. Overall, the structure of PAP contains three domains, N-terminal, middle and C-terminal domains (Fig. 5A). The active site is located at the bottom of a large cleft between the N- and C-terminal domains. The N-terminal domain coordinates the two metal ions (Mg2+ or Mn2+) that are required for catalysis and is strongly conserved throughout eukaryotes. The C-terminal domain binds hFip1 (Fip1p) and CPSF-160 [126,132,183,186]. The RNA binding site in this domain is not directly involved in binding the RNA substrate. The structure of Pap1p in a ternary complex with MgATP and an oligo(A) shows that only the last 3 nucleotides of the pre-mRNA substrate are bound tightly by the enzyme (Fig. 5A).
PAP is an induced-fit enzyme, and this induced-fit behavior has an important role in defining the substrate specificity of the enzyme [187–190]. The N-terminal domain undergoes a large movement upon the formation of the ternary complex with the MgATP and pre-mRNA substrates, whereas MgGTP cannot induce this active site closure. Other polymerases may also use an induced-fit mechanism to help regulate substrate preference .
PABP (Pab1p) is required for correct and efficient polyadenylation. The polyadenylation reaction can occur without PABP, but controlling the length of the poly(A) tail requires PABP [192–195]. PABP binds directly to nascent stretches of 11–14 adenylate nucleotides as they become available , and this binding continues until the proper poly(A) tail length is reached (~200 to 300 bases in mammals) . In addition, direct binding of PABP to pre-mRNA, adjacent to PAP, increases the efficiency of polyadenylation 80-fold . Pab1p binds directly to Rna15p, which possibly recruits Pab1p to the polyadenylation site . Interactions between PABP and other members of the mammalian 3′-end processing complex have not been reported.
PABP contains four tandem repeats of the RNA-recognition motifs (RRMs), and the crystal structure of the first two motifs in complex with an 11-mer adenylate nucleotide (A11) has been reported (Fig. 5B) . The two RRMs form a contiguous RNA binding site, primarily using the face of the β-sheet in the two RRMs, and the nucleotide adopts an extended conformation in the complex (Fig. 5B). The adenine bases are recognized by conserved residues in the RRMs.
Pta1p was originally identified as an essential component of pre-tRNA processing, while symplekin was discovered in association with tight junctions. Both were later identified in 3′-end processing complexes. Two separate cDNAs of symplekin were isolated, encoding proteins of 1273 and 1058 residues. The first 964 residues of the two forms of symplekin are identical, suggesting that they are derived from alternative splicing. Symplekin and Pta1p share very weak sequence homology, but a putative conserved region between the two proteins has been identified by sequence analysis, with 17% identity and 31% similarity.
Pta1p is required for both cleavage and polyadenylation . Symplekin/Pta1p probably functions as a scaffolding protein, interacting with and possibly bringing together a large number of proteins in the 3′-end processing complex. Pta1p purifies with CPF and can be further purified into CF II. It interacts directly with the C-terminus of Brr5p/Ysh1p , Cft2p/Ydh1p , Syc1p, Glc7p, Pti1p, Ssu72p and Swd2p [81,202,203]. Symplekin was identified through its interaction with the hinge region of CstF-64 , and forms a stable complex with CPSF and CstF.
Depletion of the phosphatase Glc7p leads to accumulation of phosphorylated Pta1p and shortened poly(A) tails . Restoration of normal polyadenylation requires either Glc7p or unphosphorylated Pta1p, indicating that Pta1p is likely a target for the regulation of polyadenylation through phosphorylation. The mechanism of this regulation is currently not known.
Symplekin may also function as a scaffold protein in the 3′-end processing of histone pre-mRNAs as well as polyadenylation in the cytoplasm. Symplekin co-localizes with CPSF-100 in Cajal bodies during oocyte maturation, and is required for cytoplasmic polyadenylation in these oocytes. Symplekin mediates protein-protein interactions in cytoplasmic polyadenylation by directly contacting CPSF and the RNA binding protein CPEB.
The CTD is made up of 52 heptapeptide repeats in humans and 26 repeats in yeast with a consensus sequence of YSPTSPS for each repeat. The repeated serine residues within the CTD are susceptible to phosphorylation . The CTD is not observed in the structure of yeast PolII, suggesting that it may be flexible . Sequence analysis and early NMR studies proposed that the CTD forms a β-turn. The structure of the CID in Pcf11p in complex with a CTD peptide containing a phosphorylated serine showed that the CTD peptide assumed the conformation of a β-turn in this complex (Fig. 4B) . CTD peptides have also been observed in more extended conformations [209,210]. The variety of CTD conformations suggests that the CTD may not form a single basal structure .
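The repeat organization of the CTD can be made concrete with a small Python sketch. It builds an idealized CTD from the YSPTSPS consensus, lists the serine positions within one repeat, and measures how many repeats of a given sequence match the consensus exactly. The function names are hypothetical, and the idealized sequence is an assumption for illustration: real CTDs contain repeats that deviate from the consensus.

```python
CONSENSUS = "YSPTSPS"  # consensus heptapeptide repeat given in the text


def serine_positions(heptad=CONSENSUS):
    """Return the 1-based positions of serine residues within one repeat."""
    return [i + 1 for i, aa in enumerate(heptad) if aa == "S"]


def split_heptads(ctd):
    """Cut a CTD sequence into consecutive 7-residue repeats.

    Trailing residues that do not fill a complete repeat are ignored.
    """
    usable = len(ctd) - len(ctd) % 7
    return [ctd[i:i + 7] for i in range(0, usable, 7)]


def consensus_fraction(ctd):
    """Fraction of complete repeats that match the consensus exactly."""
    heptads = split_heptads(ctd)
    if not heptads:
        return 0.0
    return sum(h == CONSENSUS for h in heptads) / len(heptads)


# Idealized sequences only; the repeat counts are those quoted in the text.
IDEAL_HUMAN_CTD = CONSENSUS * 52
IDEAL_YEAST_CTD = CONSENSUS * 26
```

Here `serine_positions()` returns `[2, 5, 7]`, the three serines in each repeat that are candidate phosphorylation sites, and `len(split_heptads(IDEAL_HUMAN_CTD))` gives 52.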
PolII was identified in 3′-end processing because transiently transfected cells with CTD truncations exhibit inefficient polyadenylation . Purified phosphorylated and unphosphorylated CTD activated a reconstituted cleavage reaction in vitro . While the CTD is necessary for cleavage in mammals, it does not appear to be necessary for cleavage in yeast . Deletion of the CTD in yeast produces unstable mRNAs that are degraded, but the degradation can be rescued by blocking the 5′-to-3′ exonuclease XRN1, indicating a problem with the 5′-end cap as opposed to 3′-end polyadenylation .
Both mammalian and yeast CTD bind CPSF-160 (Cft1p/Yhh1p) [85,93]. Additionally, yeast CTD interacts with Cft2p/Ydh1p and Pcf11p [118,172,173,213] and the mammalian CTD binds CstF-50 . Protein binding generally increases upon phosphorylation of the CTD, and 3′-end processing is stimulated in vitro by phosphorylated CTD . Disruption of these interactions often leads to poor transcriptional termination.
Several other 3′-end processing proteins have been identified in yeast. While some of these proteins have mammalian homologs, none of these homologs have yet been identified as necessary for 3′-end processing in mammals. Additional yeast proteins were identified through their co-purification with Pta1p  or Ref2p . Most of the proteins identified are associated with snRNA and snoRNA 3′-end formation, strengthening the connection between pre-mRNA processing and other cellular processes . The proteins that have been recognized as playing a role in 3′-end processing are described below.
Hrp1p is the only member of the processing factor CF IB (Fig. 1B). It is required for proper cleavage in yeast, although it does not have a homolog in mammals. Hrp1p contains two centrally located consecutive RRM-type RNA binding domains (residues 160–290). Hrp1p cross-links to RNA with specificity for the AU-rich EE, and the molecular basis of this recognition has been determined by NMR. Both RRMs specifically recognize the hexamer AU-rich site (Fig. 5C). Besides binding the EE, genetic analysis indicates that Hrp1p interacts directly with the CF IA components Rna14p and Rna15p.
Pfs2p is a subunit of CPF and can be further purified into PFI. It contains seven WD-40 repeats from residue 90 to 380. Disruption of Pfs2p, like most proteins in the 3′-end processing complex, results in cell death . Mutation or depletion of Pfs2p impairs both cleavage and polyadenylation, indicating that Pfs2p plays an essential role in both processes. The N-terminal domain prior to the WD-40 repeats is not necessary for 3′-end processing activity. Pfs2p directly interacts with proteins from CPF (Fip1p, Brr5p/Ysh1p and Swd2p) and CF IA (Rna14p), suggesting that Pfs2p plays a role in tethering the two factors for proper 3′-end processing [202,219].
Pfs2p has recognizable sequence homology with the N-terminal domain of the human protein WDR33, which also contains a large, collagen-like domain in the C-terminus . It is not known whether WDR33 has any role in pre-mRNA 3′-end processing.
Ssu72p was originally identified by its genetic interaction with the transcription factor TFIIB, as a mutation in Ssu72p disrupted this interaction and affected the accuracy of start site selection . Further studies identified Ssu72p within the CPF [77–79]. It is essential for the cleavage reaction but is not required for polyadenylation . Ssu72p interacts with Pta1p, Cft2p/Ydh1p  and the Rpb2p subunit of PolII . Upon binding to Pta1p in CPF, Ssu72p functions as a phosphatase with specificity for the fifth serine of the heptapeptide repeat in the CTD [223,224]. However, this activity is not required for 3′-end processing, but it may affect its role in 5′-end capping .
The human homolog of Ssu72p (hSsu72) has been identified as an interacting protein of the retinoblastoma tumor suppressor, Rb . Similar to its yeast homolog, hSsu72 binds directly to TFIIB and Pta1p and exhibits phosphatase activity, but its role in 3′-end processing is still unknown. hSsu72 cannot rescue a lethal mutation in Ssu72 in yeast . A close homolog of hSsu72 is found in humans , sharing 72% sequence identity. It has been suggested that this second isoform is encoded by a pseudogene .
Pti1p is part of the snoRNA-associated complex from the CPF. It is a sequence homolog of Rna15p and CstF-64, and contains an RRM at the N-terminus (residues 20–100). The hinge region (residues 185–260) is more conserved with CstF-64 than with Rna15p, and interacts with Rna14p, Pcf11p and Pta1p. In addition, the N-terminus and the hinge region of Pti1p interact with Glc7p. Defects in Pti1p affect the cleavage site selection and the polyadenylation length of some pre-mRNAs, but the presence of Pti1p is not essential for 3′-end processing. Over-expression of Pti1p prevents polyadenylation, consistent with its association with snoRNAs, which are not polyadenylated. The more likely role for Pti1p may be to coordinate the CPF with snoRNAs, and this has minimal effect on 3′-end processing.
Swd2p was identified by its interaction with Pta1p and belongs to the CPF . Additionally, Swd2p associates with the histone methylation complex Set1 [226,227], indicating a role in transcription as well as 3′-end processing. Sequence analysis of Swd2p and its homologs predicts that they contain seven WD-40 repeats . Depletion of Swd2p does not have an effect on 3′-end cleavage in vitro . However, mutations or depletion of Swd2p disrupt proper transcriptional termination [202,226]. Recombinantly purified GST-Swd2p interacts with many proteins of the CPF, including Cft1p/Yhh1p, Brr5p/Ysh1p, Pta1p, Pfs2p, Ref2p, Ssu72p, Glc7p and Pti1p, as well as Pcf11p in CF IA . Swd2p may not be essential for pre-mRNA 3′-end processing, but it does tether the complex to other proteins and establishes another connection to transcription.
Swd2p has a close homolog in humans, known as WDR82 (SwissProt entry Q6UXN9), with 35% sequence identity. The function of this protein is currently not known.
Mpe1p was identified because it suppressed a temperature-sensitive mutation in Pcf11p . It is required for cleavage and polyadenylation. Affinity-tagged Mpe1p can pull down the entire CPF, indicating that it is a functional member of the CPF. Mpe1p is conserved throughout eukaryotes from yeast to humans, but it is unknown whether the homologs are also involved in 3′-end processing. Mpe1p contains a zinc knuckle and a possible ring finger motif. The zinc knuckle might mimic that in CPSF-30, which is absent in Yth1p .
Syc1p is a component of CPF that is highly homologous (38% identity) to the C-terminal residues of Brr5p/Ysh1p. Syc1p is not required for either of the 3′-end processing functions and deletion of Syc1p does not disrupt cell viability . In fact, deletion of Syc1p partially rescues in vivo phenotypes of mutated Brr5p/Ysh1p and Pta1p and alleviates in vitro processing defects of mutated Brr5p/Ysh1p. GST pull-down assays identify that Syc1p interacts with the C-terminal domain of Brr5p/Ysh1p and with Pta1p . Therefore, Syc1p may negatively regulate 3′-end processing, possibly by competing with the C-terminal domain of Brr5p/Ysh1p.
Glc7p is a type 1 protein phosphatase that is required for cell viability and participates in a variety of cellular processes, including 3′-end processing [77,229]. Depletion, inhibition or removal of Glc7p results in mRNAs with shortened poly(A) tails, indicating that Glc7p plays a role in polyadenylation but not cleavage [204,230]. The level of phosphorylated Pta1p increases upon depletion of Glc7p in vivo, identifying Pta1p as a specific target for Glc7p phosphatase activity. The effects of depleted Glc7p can be rescued upon the addition of unphosphorylated Pta1p and to a lesser degree Fip1p. GST pull-down assays show that Glc7p interacts with residues 101–200 of Pta1p. Additionally, Glc7p can pull down Pti1p and to a weaker extent Cft1p/Yhh1p, Cft2p/Ydh1p and Brr5p/Ysh1p.
Pre-mRNA 3′-end processing is a fundamental event in most eukaryotes. Studies over the past 20 years have identified a large number of protein factors that are involved in this processing (Table 1), and a schematic model for the mammalian machinery based on the current structural and biochemical information is shown in Fig. 6. Many of the protein factors are directly involved in pre-mRNA recognition, cleavage, or polyadenylation. It is likely that the recognition of the upstream PAS by CPSF-160 and the G/U-rich DSE by CstF-64 help position the 3′-end processing machinery on the pre-mRNA, bringing the endoribonuclease (CPSF-73) close to the cleavage site (Fig. 6). This correct placement may also help to activate CPSF-73, as CPSF-73 on its own appears to have very weak activity. This may ensure that the nuclease is only functional when it is in the correct location on the pre-mRNA, preventing non-specific activity on other cellular RNAs. The current evidence suggests that CstF may be dimeric in this machinery (Fig. 6). The presence of two CstF-64 subunits (and their RRMs) may be important for recognizing the DSE, whereas the dimeric HAT domains of CstF-77 may provide a platform for interacting with other protein factors, including CPSF-160 (Fig. 6). In addition to binding and processing the pre-mRNA, some of the protein factors in this machinery are also needed to coordinate with transcriptional initiation and termination, 5′-end capping, as well as splicing. This may explain why such a large machinery is involved in pre-mRNA 3′-end processing.
Future studies will focus on defining the functional roles of this large number of protein factors in more detail, as well as how this 3′-end processing machinery is constructed at the molecular level. Structural information on the protein factors themselves is already revealing significant, and sometimes unexpected, molecular insights, for example the dimeric association of CstF in the pre-mRNA 3′-end processing machinery (Fig. 6). The greatest molecular and functional insights will be derived from structural information on the sub-complexes of these protein factors (CPSF, CstF and others) and even the entire pre-mRNA 3′-end processing machinery.
We thank James Manley and Kevin Ryan for helpful discussions. This research is supported in part by a grant from the NIH to LT (GM077175).