Transcription elongation and termination are finely balanced and controlled in a variety of ways to ensure that termination does not occur prematurely at cryptic poly(A) sites. One control mechanism is governed by the kinetics of transcriptional elongation and poly(A) complex assembly [31]. A second mechanism involves competition between factors that promote transcription termination and RNA processing (e.g. Rna15 and its orthologue CstF-64) and factors that antagonize termination (e.g. yeast Sub1 and human PC4, as well as Npl3 or Drosophila …). Accordingly, we proposed that Npl3 might prevent premature termination by binding to weak or cryptic 3′ processing sequences. Only the high-affinity binding of polyadenylation factors to "real" poly(A) sites overcomes this competition, leading to termination of transcription at proper polyadenylation sites. This model is supported by our finding that Npl3 binds to pre-mRNA in competition with Rna15 [15]. To understand how Npl3 balances transcription termination and anti-termination, we examined the structure of the protein and studied its interaction with RNA sequences representing yeast 3′-end processing sites.
The NMR structure of Npl3 reveals two fairly canonical RRMs; with about 70 amino acids per domain, these are probably the smallest RRMs studied to date. The RNA-binding experiments also demonstrate that Npl3 binds single-stranded RNAs using the classical β-sheet recognition surface. Although Npl3 is an acidic protein, most of its negatively charged residues lie on the surface opposite the RNA-binding site.
Several observations demonstrate that the two RRMs do not interact with each other in the RNA-free form, yet form a compact globular structure in the presence of RNA, in which the linker separating the two RRMs becomes ordered, as does the rest of the protein. This feature has also been seen with other multidomain RRM proteins, such as Pab, SxL, HuD and Hrp1 (PDB IDs: 1CVJ, 1FXL, 1B7F and 2CJK, respectively) [24]. In each of these structures, the two domains combine to form a single extended RNA-binding platform that is required for the specific recognition of about 10 contiguous nucleotides. Yet the mode of RNA recognition in Npl3 is quite different from that of these proteins. The off rates of RRM-1 and RRM-2 from the G/U-rich RNA clearly differ. This behavior suggests that Npl3 binds the RNA through two independent recognition sequences connected by a flexible linker of a few nucleotides, and that the protein then forms a single globular structure once bound [24]. This conclusion is strongly supported by the observation that shorter G/U-rich RNA oligonucleotides bind to Npl3 with the same kinetic characteristics as the 18-mer, but with 2:1 stoichiometry. Thus, Npl3 is capable of forming multipartite, independent interactions with RNA sequences separated by a few or many nucleotides. Npl3 also differs from these other multidomain proteins in that, whereas in them both RRMs act together to confer specificity, in Npl3 the specificity appears to be determined largely by RRM-2.
Full-length Npl3 binds to various oligonucleotides that represent yeast 3′-end processing signals with nM affinity, i.e. it has strong but non-specific RNA-binding activity. In fact, full-length Npl3 binds G/U-rich oligonucleotides only about 5-fold more strongly than AU-rich ones. The construct containing only the two RRMs has much weaker RNA-binding activity, but it discriminates much more strongly between AU- and G/U-rich sequences. The NMR results clearly show that both RRMs bind weakly to AU-rich sequences, with a dissociation constant of approximately 100 μM, but behave very differently with G/U-rich RNAs. Whereas RRM-1 binds the G/U-rich 18-mer and the AU-rich oligonucleotides similarly, RRM-2 interacts much more strongly with the G/U-rich RNA. Because the off rates are slow, the dissociation constant cannot be measured directly by NMR. However, if the on rate is diffusion limited (and there is no evidence to the contrary), the NMR data indicate that the dissociation constant for the interaction between RRM-2 and the G/U-rich RNA lies in the nM to μM range.
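The indirect estimate above rests on the relation Kd = koff/kon. The minimal sketch below shows how slow off rates and a diffusion-limited on rate translate into a dissociation constant. It assumes illustrative rates only: kon near the diffusion limit, taken here as 10^8 to 10^9 M^-1 s^-1, and koff between 1 and 100 s^-1 as a stand-in for "slow" exchange on the NMR timescale. None of these numbers are measurements from this study, and the helper names are hypothetical.

```python
"""Estimate a dissociation constant from kinetic rates: Kd = k_off / k_on.

Assumed, illustrative inputs (not measured values from this study):
- k_on near the diffusion limit, 1e8 to 1e9 M^-1 s^-1;
- k_off slow on the NMR chemical-shift timescale, 1 to 100 s^-1.
"""


def kd_from_rates(k_off: float, k_on: float) -> float:
    """Return Kd in molar units, given k_off in s^-1 and k_on in M^-1 s^-1."""
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on


def format_molar(kd: float) -> str:
    """Express a molar concentration in the largest fitting unit (mM, uM, nM, pM)."""
    for unit, scale in (("mM", 1e-3), ("uM", 1e-6), ("nM", 1e-9)):
        if kd >= scale:
            return f"{kd / scale:.3g} {unit}"
    return f"{kd / 1e-12:.3g} pM"


# Hypothetical rate ranges used only for illustration.
ASSUMED_K_ON = (1e8, 1e9)      # M^-1 s^-1
ASSUMED_K_OFF = (1.0, 100.0)   # s^-1

if __name__ == "__main__":
    for k_on in ASSUMED_K_ON:
        for k_off in ASSUMED_K_OFF:
            kd = kd_from_rates(k_off, k_on)
            print(f"k_off={k_off:g} s^-1, k_on={k_on:g} M^-1 s^-1 -> Kd ~ {format_molar(kd)}")
```

With these assumed inputs the estimates span roughly 1 nM to 1 μM, which illustrates why slow dissociation combined with a diffusion-limited association places the RRM-2 interaction in the nM to μM range, far tighter than the approximately 100 μM binding to AU-rich sequences.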
The structures of RRM-1 and RRM-2 of Npl3 suggest two possible reasons for this difference in affinity. First, in RRM-1 the 'recognition' loop connecting β2 and β3 consists of only two very highly ordered residues, a feature unique among RRMs, which generally have longer, flexible loops. In contrast, the loop between β2 and β3 in RRM-2 has a more typical length of 7 amino acids that undergo conformational exchange, a structure more conducive to specific RNA recognition. Second, the RNA-binding β-sheet surface of RRM-2 is uniformly neutral to basic, whereas that of RRM-1 carries an acidic residue (Glu189) at the center of the RNA-binding surface.
The comparison between the full-length protein and the construct containing only the two RRMs suggests that other parts of the protein, presumably the sequence rich in Arg, Ser and Gly near its C-terminus, contribute to the strong but non-specific RNA-binding activity of Npl3 [43]. Direct RNA binding through the RS domain has been reported for other SR proteins, such as the splicing factor U2AF [45] and the PTB-associated splicing factor (PSF) [47]. RNA recognition by Npl3 may be similar, with the RS domain contributing to the strong but non-specific RNA-binding activity of the whole protein. This non-specific binding would mask the ability of RRM-2 to discriminate between G/U-rich and other RNA oligonucleotides, which explains why the full-length protein binds G/U- and AU-rich RNAs with only a 5-fold difference in affinity. This result suggests that Npl3 may become a sequence-specific RNA-binding protein if the strong but non-specific RNA-binding activity of the rest of the protein is negatively regulated. Consistent with this suggestion, phosphorylation of Npl3 reduces its binding to RNA 5-fold [43].
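To put the 5-fold figures in energetic terms, a fold change in Kd can be converted into a binding free-energy difference with the standard relation ΔΔG = RT ln(Kd,weak/Kd,strong). The minimal sketch below assumes 25 °C, because the assay temperature is not stated here; the function name is hypothetical and the 100-fold case is included only for comparison.

```python
import math

# Gas constant in kcal mol^-1 K^-1.
R_KCAL_PER_MOL_K = 1.987204e-3


def ddg_from_fold_change(fold: float, temp_k: float = 298.15) -> float:
    """Return the free-energy difference (kcal/mol) implied by a fold change in Kd.

    Uses ddG = R * T * ln(fold), where fold = Kd_weak / Kd_strong.
    The default temperature (25 C) is an assumption, not the assay temperature.
    """
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL_PER_MOL_K * temp_k * math.log(fold)


if __name__ == "__main__":
    # 5-fold: G/U- versus AU-rich RNA for full-length Npl3, and the effect of phosphorylation.
    print(f"5-fold   -> {ddg_from_fold_change(5):.2f} kcal/mol")
    # Hypothetical 100-fold preference, for comparison only.
    print(f"100-fold -> {ddg_from_fold_change(100):.2f} kcal/mol")
```

Under this assumption a 5-fold difference corresponds to just under 1 kcal/mol, a small energetic margin that is easily dominated by strong non-specific contacts elsewhere in the protein.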
Superposition of the two RRMs of Npl3 on those of the SxL protein (PDB: 1B7F) shows that the two proteins are structurally very similar. We also note that the length of the shorter G/U-rich oligonucleotide (about 10 nucleotides) corresponds closely to the number of nucleotides specifically recognized by each of the two-domain proteins mentioned earlier: Pab, SxL, Hrp1 and HuD. However, the binding sites for RRM-1 and RRM-2 must be separated by a flexible polynucleotide region to account for all the results described above. We used this structural similarity, together with these observations, to generate a model of Npl3 in complex with the 18-mer RNA in which each domain interacts independently with one of two distinct G/U-rich sequences separated by a flexible RNA linker (Figure 8).
FIGURE 8 Model of Npl3 binding to N2 RNA. The PDB file was generated with the energy-minimization program MOE (version 2006.08; Chemical Computing Group, Montreal, Quebec, Canada), starting from the coordinates of the Sxl-RNA complex. Residues in Npl3 protein …
The short G/U-rich sequence to which Npl3 binds (GCC UGG UUG C) is nearly identical to the consensus sequence of another RRM protein, human PTB (-(A/Y) GCC UGG UGC Y-) [48]. This sequence similarity suggests that these proteins and SxL may also share a common mechanism. Indeed, SxL and CstF-64 compete for binding to G/U elements in the regulation of alternative poly(A) site selection [36], and PTB competes with U2AF65 for the pyrimidine tract upstream of 3′ splice sites [49]. Thus, antagonistic interactions between these proteins regulate termination/polyadenylation and alternative splicing.
The NMR data strongly suggest that RRM-2 of Npl3 prefers sequences rich in U and G, a preference that is partially masked by the non-sequence-specific activity of the rest of the protein. The Rna15 consensus sequences originally identified in vitro using SELEX are also G/U-rich [50], and its vertebrate orthologue CstF-64 binds GU- or U-rich sequences as well [51]. Thus, both Npl3 and Rna15 appear to bind preferentially to G/U-rich sequences, which provides mechanistic insight into how Npl3 prevents premature processing and transcription termination. When in complex with other CF I subunits, Rna15 contacts an A-rich sequence and Hrp1 binds specifically to an alternating AU-rich RNA upstream of the processing site [40]. Consistent with the NMR results showing a much stronger interaction between RRM-2 and the G/U-rich RNAs, mutations in RRM-2 of Npl3 have significant consequences for the selection of 3′-end processing sites [17]. Presumably, reduced binding of RRM-2 of Npl3 to G/U-rich sequences would allow their recognition by Rna15, which could then direct binding of the rest of the 3′-end processing complex to weak or cryptic poly(A) sites. Thus, the preference of Npl3 for G/U-rich sequences is consistent with its role in regulating the recognition of 3′-end processing sites by competing with Rna15 for RNA binding, and suggests that this function is encoded within RRM-2.