Control of translation in eukaryotes is complex, depending on the binding of various factors to mRNAs. Available data for subsets of mRNAs that are translationally up- and down-regulated in yeast eIF4E-binding protein (4E-BP) deletion mutants are coupled with reported mRNA secondary structure measurements to investigate whether 5′-UTR secondary structure varies between the subsets. Genes with up-regulated translational efficiencies in the caf20Δ mutant have relatively high averaged 5′-UTR secondary structure. There is no apparent wide-scale correlation of RNA-binding protein preferences with the increased 5′-UTR secondary structure, leading us to speculate that the secondary structure itself may play a role in differential partitioning of mRNAs between eIF4E/4E-BP repression and eIF4E/eIF4G translation initiation. Both Caf20p and Eap1p contain stretches of positive charge in regions of predicted disorder. Such regions are also present in eIF4G and have been reported to associate with mRNA binding. The pattern of these segments, around the canonical eIF4E-binding motif, varies between each 4E-BP and eIF4G. Analysis of gene ontology shows that yeast proteins containing predicted disordered segments, with positive charge runs, are enriched for nucleic acid binding. We propose that the 4E-BPs act, in part, as differential, flexible, polyelectrostatic scaffolds for mRNAs.
The initiation of eukaryotic translation is subject to multiple regulatory mechanisms (1,2). Binding of a 43S pre-initiation complex to the capped 5′-end of mRNA is facilitated by the cap-binding factor eIF4E and its partners eIF4G and eIF4A in the eIF4F complex. The helicase subunit of eIF4F, eIF4A, promotes scanning of mRNAs containing secondary structure in their 5′-UTRs (3). Poly(A) binding protein (PABP) also interacts with eIF4G and can mediate the formation of a circular messenger ribonucleoprotein (mRNP) complex by linking the cap and the poly(A) tail. Because the rate of translation initiation depends on recognition of the mRNA 5′-cap by eIF4F, this process is central to translational control (1). 4E-binding proteins (4E-BPs) compete with eIF4G for a common binding site on the eIF4F subunit eIF4E (4), and are therefore inhibitory for translation. Regulatory mechanisms generally fall into two categories: those acting on the eIFs (e.g. by protein phosphorylation) or on ribosomes, which affect initiation overall, and those acting on mRNAs, which have the potential to be specific to subsets of mRNAs (2). In the latter category, specific RNA-binding proteins more commonly act through the 3′-UTR than the 5′-UTR, following a general mechanism in which the 3′-UTR-bound protein links, through an intermediate protein, to a cap-binding protein, forming an inhibitory loop that precludes access for eIF4F (2).
Genomic studies are facilitating analysis of translational control mechanisms, with yeast a key model organism. For example, eIF4G depletion in yeast narrows the range of translational efficiencies, rather than preventing translation, for most genes (5). Translation efficiencies with the greatest dependence on eIF4G were found for mRNAs that are relatively well translated, without long or highly structured 5′-UTRs. This result is consistent with eIF4F being more important in enhancing 43S attachment to the mRNA 5′-end than in scanning through long, structured 5′-UTRs (5). Another study has looked at the effects on translation of deletion of the two eIF4G isoforms in yeast (6). While there is no clear functional differentiation between the eIF4G isoforms, mRNAs with longer poly(A) tails are more sensitive to eIF4G loss, consistent with a coupling between translation efficiency and eIF4G–PABP interaction (6).
In addition to modulation of translation by binding proteins (2,7), intrinsic properties of the flanking UTRs may also influence translation (8). For example, analysis of UTRs in sets of yeast genes that are translationally up- and down-regulated in response to stress shows that the 5′-UTRs of up-regulated mRNAs are relatively long, with an over-representation of upstream open reading frames (ORFs) (9). mRNA secondary structure has also been predicted around the translational start site. An early study of eukaryotic and prokaryotic mRNAs found relatively low predicted secondary structure at translation initiation sites (10), an observation repeated in a more extensive analysis of yeast mRNAs (11). A broad study of predicted mRNA secondary structure in 340 species reported a similar reduction in secondary structure near the start codon (12). There are also reports that 5′-UTR structure influences translation: genes with relatively long and structured 5′-UTRs are translated more slowly (13) and, conversely, 5′-UTRs predicted to be weakly folded lead to higher translation rates (14).
Understanding of the structural biology and biophysics of eukaryotic translation has advanced through the determination of atomic-resolution structures for several domains of initiation factors and regulators (15,16) and through improved understanding of structure–function relationships in the 43S pre-initiation complex (17). Intrinsically disordered protein regions play a role in cap-dependent translation: parts of eIF4E undergo folding transitions upon cap binding and eIF4G binding (18). In eIF4G, beyond the folding transitions centred on a specific interface with eIF4E, there are RS (arginine–serine)-rich domains that interact with mRNA (19) and have been implicated in promoting assembly of eIF4F–mRNA complexes (20). RS domains are widely involved in protein–RNA interactions (21) and are believed to be intrinsically disordered (22). In previous work, we simulated non-specific interactions between the charged surfaces of the eIF4E/eIF4G structured domains and a polymer bead model for mRNA, suggesting that these interactions could supplement the cap–eIF4E interaction (23) and that mRNA secondary structure could, in principle, influence the interaction with protein charges.
The aim of this study is to use genomic data to investigate the potential for weak protein–mRNA interactions. Translational profiling has revealed that two yeast 4E-BPs, Eap1p and Caf20p, modulate the translation of more than 1000 genes, with evidence that mRNA-binding proteins are partly responsible, through complex formation with one or the other 4E-BP (24). Genome-wide measurement of RNA secondary structure in yeast has also been reported, using the parallel analysis of RNA secondary structure (PARS) technique, which is based on structure-specific nuclease treatment followed by sequencing (25). Combining these studies makes it possible to determine whether secondary structure differs between the mRNA subsets whose translation is modulated by the 4E-BPs. The structural framework for interpreting these results is limited, because mRNA is a flexible polymer and large segments of the 4E-BPs have the properties of intrinsically disordered domains. In place of structural models, we complement the bioinformatics analysis of mRNA secondary structure with a parallel study of charged amino acid runs in the 4E-BPs and eIF4G, and across the yeast proteome. Relatively high 5′-UTR secondary structure is found for the set of mRNAs translationally up-regulated in one of the 4E-BP deletion mutants. On the basis of a known role for eIF4G positive charge runs in mRNA binding, and a gene ontology analysis of the yeast proteome, we suggest that positively charged regions in the 4E-BPs could be associated with mRNA binding.
Sets of genes translationally up- and down-regulated (measured by polyribosome association) upon deletion of either of the two yeast (Saccharomyces cerevisiae) 4E-BPs, Caf20p and Eap1p, were taken from previous work (24). Table 1 gives the numbers of genes in these four subsets before (Nred) and after (Nnon-red) removal of redundant genes (those that appear in more than one subset) and of entries from duplicate probesets (24). Secondary structures for S. cerevisiae mRNAs, measured with the PARS method (25), were obtained from http://genie.weizmann.ac.il/pubs/PARS10/pars10_catalogs.html. Untranslated regions were mapped onto the mRNAs using coordinates (26) available from the same location. The PARS technique profiles mRNA secondary structure by deep sequencing of fragments produced by structure-specific enzymes: RNase V1 preferentially cleaves double-stranded RNA, and S1 nuclease preferentially cleaves single-stranded RNA. For each nucleotide, the logarithm of the ratio of the number of reads obtained in the V1-treated sample to the number obtained in the S1-treated sample is computed, counting reads whose first base is the nucleotide immediately downstream of the nucleotide in question; this logarithm is the PARS score of that nucleotide (25). The ability of PARS to detect base pairing and secondary structure formation has been tested against RNA footprinting data and known structures, with significant correspondence between PARS and computational predictions (25).
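The per-nucleotide score described above can be sketched in Python. This is a minimal illustration, not the published PARS pipeline: it assumes that read counts from the two digests are already normalised to comparable depth, adds a pseudocount (an assumption) so that nucleotides without reads do not give log(0), and uses base-2 logarithms (the base is left as a parameter). The function names are hypothetical.

```python
import math


def cut_site_counts(read_starts, length):
    """Count reads per nucleotide, assigning each read to the nucleotide
    immediately 5' of its first base (the inferred cleavage site).

    read_starts: 0-based positions of the first base of each read.
    """
    counts = [0] * length
    for start in read_starts:
        site = start - 1
        if 0 <= site < length:
            counts[site] += 1
    return counts


def pars_scores(v1_counts, s1_counts, pseudocount=1.0, base=2.0):
    """Log ratio of V1 (double-stranded) to S1 (single-stranded) reads.

    Positive values suggest paired bases, negative values unpaired ones.
    Assumption: the pseudocount and log base are illustrative choices.
    """
    if len(v1_counts) != len(s1_counts):
        raise ValueError("V1 and S1 count vectors must have the same length")
    return [
        math.log((v + pseudocount) / (s + pseudocount), base)
        for v, s in zip(v1_counts, s1_counts)
    ]


# Toy example: six nucleotides and a few invented reads from each digest.
v1 = cut_site_counts([3, 3, 5], 6)  # [0, 0, 2, 0, 1, 0]
s1 = cut_site_counts([1, 5], 6)     # [1, 0, 0, 0, 1, 0]
print([round(x, 2) for x in pars_scores(v1, s1)])  # -> [-1.0, 0.0, 1.58, 0.0, 0.0, 0.0]
```

The read-to-nucleotide shift is kept in its own function so that the convention (reads start one base downstream of the scored nucleotide) can be checked independently of the scoring.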
Code was written in Perl for the various stages of data processing. Subsets of genes up- or down-regulated in association with 4E-BP deletions (24) are provided as a text file to make_nonredundant.pl, which produces non-redundant versions (i.e. each gene is associated with just one subset). Either the redundant or the non-redundant data are then passed to PARS-profiles.pl, which associates each gene with UTR coordinate data and PARS scores (25), where available. Profiles are produced by traversing the 5′- and 3′-UTRs either from the mRNA termini or from the start/stop codons. Several features of this profiling can be adjusted, including the number of nucleotides sampled (from the mRNA termini and from the start/stop codons) and whether the profiles are smoothed. Smoothing is achieved, at each nucleotide position, by averaging the PARS scores over an 11-nt window centred on that nucleotide; the window is reduced at the UTR termini. Smoothing is applied unless stated otherwise. We generally study profiles using the mRNA termini as origin. A profile for all genes, aligned to an origin at the start codon, matches the original report [Figure 3c (25), data not shown]. There is an option to exclude a given number (typically 20) of nucleotides adjacent to the start codon in the 5′-UTR, and adjacent to the stop codon in the 3′-UTR. This is designed to prevent systematic variation in subset-averaged profiles arising from the lower PARS scores that UTRs show near the start/stop codons (25): because UTR lengths vary, these low-PARS contributions fall at different positions (out of phase) when the profile is calculated from either mRNA terminus. PARS-profiles.pl produces profiles averaged over all genes in a subset (those that can be mapped with UTR coordinates and have PARS scores) and over all genes (with UTR coordinates and PARS scores). Separate code (write_PARS_individual.pl) outputs individual UTR PARS profiles (i.e. not averaged over genes).
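The profile construction described above can be sketched as follows. This is a Python illustration, not the authors' Perl code (make_nonredundant.pl, PARS-profiles.pl), and it rests on stated assumptions: genes are removed entirely when they occur in more than one subset, the 11-nt smoothing window is clipped to the available nucleotides at the UTR ends, and smoothing is applied to the full UTR before the start-codon-proximal nucleotides are excluded (the original order of these two steps is not stated). All function names are hypothetical.

```python
from collections import Counter


def make_nonredundant(subsets):
    """Keep only genes that occur in exactly one subset.

    subsets: dict of subset name -> list of gene names. Duplicate entries
    within a subset (e.g. from repeated probesets) are collapsed first.
    """
    unique = {name: list(dict.fromkeys(genes)) for name, genes in subsets.items()}
    occurrences = Counter(g for genes in unique.values() for g in genes)
    return {name: [g for g in genes if occurrences[g] == 1]
            for name, genes in unique.items()}


def smooth(scores, window=11):
    """Centred moving average; the window is clipped at the UTR ends."""
    half = window // 2
    n = len(scores)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


def average_profile(utrs, n_positions, exclude_near_start=20, do_smooth=True):
    """Average 5'-UTR PARS profiles with the origin at the mRNA 5' terminus.

    utrs: list of per-nucleotide PARS lists ordered 5'->3', each ending at
    the nucleotide before the start codon. The last `exclude_near_start`
    nucleotides of each UTR are dropped (assumption: after smoothing).
    Returns (means, counts); a mean is None where no UTR reaches.
    """
    sums = [0.0] * n_positions
    counts = [0] * n_positions
    for scores in utrs:
        if do_smooth:
            scores = smooth(scores)
        usable = scores[:max(0, len(scores) - exclude_near_start)]
        for i, value in enumerate(usable[:n_positions]):
            sums[i] += value
            counts[i] += 1
    means = [s / c if c else None for s, c in zip(sums, counts)]
    return means, counts
```

Returning `counts` alongside the means makes explicit how the number of contributing UTRs falls with distance from the 5′ end, which is why subset comparisons become less reliable further into the UTR.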
For statistical comparison of a subset PARS profile with the overall profile, loop_resample.pl selects random subsets from the overall gene/UTR data, with sample size matching the subset size. The subset profile is then compared with the distribution of profiles from these random samples.
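The resampling comparison can be sketched as below. This is a Python illustration, not loop_resample.pl itself: random subsets are drawn without replacement from all 5′-aligned UTR profiles, each containing the same number of UTRs as the subset of interest, and the per-position mean and standard deviation of the resampled average profiles are reported. The fixed seed, the function names and the z-score helper are assumptions added for illustration.

```python
import random
import statistics


def mean_profile(utrs, n_positions):
    """Position-wise mean of 5'-aligned profiles (None where no UTR reaches)."""
    means = []
    for i in range(n_positions):
        values = [u[i] for u in utrs if i < len(u)]
        means.append(sum(values) / len(values) if values else None)
    return means


def resample_bands(all_utrs, subset_size, n_positions, n_resamples=1000, seed=1):
    """Per-position (mean, SD) of average profiles from random subsets.

    Assumption: subsets are drawn without replacement, and the fixed seed
    makes runs reproducible.
    """
    rng = random.Random(seed)
    per_position = [[] for _ in range(n_positions)]
    for _ in range(n_resamples):
        sample = rng.sample(all_utrs, subset_size)
        for i, value in enumerate(mean_profile(sample, n_positions)):
            if value is not None:
                per_position[i].append(value)
    return [(statistics.mean(v), statistics.stdev(v)) if len(v) >= 2 else None
            for v in per_position]


def z_scores(subset_profile, bands):
    """Distance of an observed subset profile from the resampled mean, in SDs."""
    return [(x - band[0]) / band[1]
            if x is not None and band is not None and band[1] > 0 else None
            for x, band in zip(subset_profile, bands)]
```

A subset profile lying several standard deviations from the resampled mean at many positions would then suggest a difference not explained by sampling alone.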
Past work has identified nucleotide sequence motifs that are enriched in mRNAs bound by specific RNA-binding proteins (RBPs) in yeast (27,28). RNA motifs with the most significant enrichments have been derived for 14 RBPs (27). The 5′-UTRs of mRNAs translationally up-regulated in the caf20Δ mutant were searched for occurrences of these 14 motifs. For each motif match, the PARS score was averaged across the nucleotides of the motif. Each of these averages was then assigned to a single 10-nt bin, counting from the 5′-end of the 5′-UTR, according to the location of the central nucleotide of the motif. If the motif contained an even number of nucleotides, the nearest nucleotide below centre was used for binning. The sum of the PARS averages assigned to a bin was divided by the number of motif matches in that bin, and the resulting histogram was plotted. The number of motif matches in each bin was also plotted. In total, there are 80 matches to the 14 RBP motifs in the 5′-UTRs of genes translationally up-regulated in the caf20Δ mutant.
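The binning procedure can be sketched as follows. Several details are assumptions of this sketch rather than statements from the text: motifs are supplied as regular expressions, overlapping matches are all counted, positions are 0-based from the 5′ terminus, and "below centre" is read as the 5′ (lower-numbered) of the two central nucleotides. The UGUA motif and the PARS values in the example are purely illustrative, and the function name is hypothetical.

```python
import re
from collections import defaultdict


def bin_motif_pars(utrs, motif_regex, bin_size=10):
    """Histogram of motif-averaged PARS scores by position in the 5'-UTR.

    utrs: dict of gene -> (sequence, pars_scores), both ordered 5'->3'
    from the 5' terminus. Each match is binned by its central nucleotide;
    for even-length matches the lower (5') central position is used.
    Returns {bin_index: (mean PARS of matches, number of matches)}.
    """
    # Assumption: a lookahead is used so that overlapping matches are reported.
    pattern = re.compile(f"(?=({motif_regex}))")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for sequence, pars in utrs.values():
        for match in pattern.finditer(sequence):
            start, end = match.start(1), match.end(1)
            if end == start:
                continue  # ignore empty matches
            centre = start + (end - start - 1) // 2
            motif_average = sum(pars[start:end]) / (end - start)
            b = centre // bin_size
            sums[b] += motif_average
            counts[b] += 1
    return {b: (sums[b] / counts[b], counts[b]) for b in sorted(counts)}


# Toy example with an illustrative UGUA motif and invented PARS scores.
utrs = {"gene1": ("AAUGUAAAAAAAUGUA", [float(i) for i in range(16)])}
print(bin_motif_pars(utrs, "UGUA"))  # -> {0: (3.5, 1), 1: (13.5, 1)}
```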
Structural disorder was predicted from amino acid sequences with the GlobPlot set of amino acid propensities (29), implemented locally, using a 21-amino-acid window whose average value is assigned to the central residue. Regions of disorder computed in this way were comparable to those computed with the Fold Index scheme (30). The Perl code (seq_props_all.pl) that calculates disorder over 21-amino-acid windows also gives the net charge in each window. The relevant cytoplasmic environment is assumed to be close to neutral pH, so that the side chains of aspartic and glutamic acid carry a −1e charge, and those of lysine and arginine +1e. Histidine (intrinsic pKa of 6.3) is likely to be neutral in the absence of favourable interactions that elevate its pKa; a typical pKa shift upon salt-bridge formation is 1 pH unit (31). Histidine is therefore treated in the charge run calculations either as neutral or as charged (+1e), the latter representing an elevated pKa (e.g. through interaction with the polyphosphate backbone of a nucleic acid). Charge runs were calculated for a set of 5885 proteins, the translations of all systematically named ORFs, obtained from the Saccharomyces Genome Database.
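A sketch of the windowed net-charge calculation, assuming the charge assignments stated above (D and E −1; K and R +1; H either 0 or +1). Only full 21-residue windows are reported, keyed by their central residue (0-based); how the original code handled sequence ends is not stated, so that choice is an assumption. The same sliding-window scheme could average a per-residue disorder propensity table, but GlobPlot's propensity values are not reproduced here. Names are hypothetical.

```python
SIDECHAIN_CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def window_net_charge(sequence, window=21, histidine_charged=False):
    """Net charge of every full window, keyed by 0-based central residue."""
    charges = dict(SIDECHAIN_CHARGE)
    if histidine_charged:
        charges["H"] = 1  # models an elevated histidine pKa
    values = [charges.get(aa, 0) for aa in sequence.upper()]
    if len(values) < window:
        return {}  # assumption: no partial windows at the sequence ends
    half = window // 2
    running = sum(values[:window])
    result = {half: running}
    for start in range(1, len(values) - window + 1):
        # Slide by one residue: add the new right end, drop the old left end.
        running += values[start + window - 1] - values[start - 1]
        result[start + half] = running
    return result


seq = "KKKKKRRRRRDDEEHHHHGGG"  # a single invented 21-residue window
print(window_net_charge(seq))                          # -> {10: 6}
print(window_net_charge(seq, histidine_charged=True))  # -> {10: 10}
```

The running sum avoids recomputing each window from scratch, which matters little for one protein but helps when scanning all 5885 ORFs.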
The Caf20p, Eap1p, Tif4631p and Tif4632p amino acid sequences were searched against the Protein Data Bank (PDB) (32) with BLAST (33) to identify regions with known 3D structural domains. Two such regions were identified for eIF4G, and PDB entries 2vso (eIF4G–eIF4A complex) and 1rf8 (eIF4G–eIF4E complex) were selected to display these regions and their interactions. Amino acid conservation was studied with the ConSurf server (34) (http://consurf.tau.ac.il/).
Sets of yeast genes containing a 21-amino-acid window of a given net charge were analyzed for enrichment of molecular function gene ontology (GO) terms with the GO-TermFinder software (35) implemented at Princeton (http://go.princeton.edu/cgi-bin/GOTermFinder). The set of 5885 ORFs used in the charge run analysis was provided as the background for the enrichment calculation. Other parameters were left at their default values, including the Bonferroni correction for multiple comparisons. Enrichment for a particular GO molecular function (e.g. nucleic acid binding) was calculated as the percentage of test-set genes associated with that function divided by the percentage of background-set genes associated with the same function.
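The enrichment ratio defined above is simple arithmetic; the sketch below computes it together with a hypergeometric upper-tail p-value and a Bonferroni adjustment. A hypergeometric test is the standard model for this kind of term-enrichment analysis and is assumed here, but the sketch does not reproduce GO-TermFinder's implementation. Function names and the counts in the example are invented for illustration.

```python
from math import comb


def enrichment_ratio(test_hits, test_size, background_hits, background_size):
    """% of test genes with the annotation / % of background genes with it."""
    return (test_hits / test_size) / (background_hits / background_size)


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K are annotated."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Invented counts: 20 of 100 test genes vs 500 of 5000 background genes.
print(enrichment_ratio(20, 100, 500, 5000))  # -> 2.0
```

Exact integer binomial coefficients (`math.comb`) keep the tail sum accurate even for a background of several thousand genes.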
In addition to the numbers of genes translationally up- and down-regulated upon 4E-BP gene deletion (before and after removal of redundancy), Table 1 gives the average lengths and PARS scores of the 5′-UTRs and 3′-UTRs in these subsets. Statistics for the differences between each subset and the overall yeast gene set (genes that match to PARS scores and UTR coordinates) are shown. The 5′-UTR length and PARS score of genes up-regulated in the caf20Δ mutant differ most from the overall set: the average 5′-UTR is longer and the average 5′-UTR PARS score higher. Table 2 shows the same UTR length and PARS score data as Table 1, but for the redundant subsets. A similar pattern is obtained, with the 5′-UTR length and PARS score of the caf20Δ up-regulated subset again giving the most significant deviations from the overall set.
Following the results in Table 1, which highlight the 5′-UTRs, averaged 5′-UTR PARS profiles are shown in Figure 1. The profiles are displayed with the origin fixed at the 5′ terminus of each mRNA, and with the 20 UTR nucleotides adjacent to the start codon excluded from the analysis. Each plot gives the PARS profile for one deletion mutant, compared with the analogous profile for all genes and with the standard deviation spread of 1000 random resamples, each containing the same number of UTRs as the subset. In agreement with Table 1, the 5′-UTRs of genes translationally up-regulated in the caf20Δ mutant show higher PARS scores (more secondary structure) on average.
There are other differences between subset and overall PARS profiles in Figure 1, but the accuracy of these comparisons decreases with distance from the 5′ end, as the number of UTRs included in the calculations falls. To examine the influence of UTR length on profile differences, Figure 2 compares 5′-UTR results for the caf20Δ mutant [panel (a), as in Figure 1] with those obtained when the exclusion from the start codon is increased from 20 to 40 nt [panel (b)]. In addition, the entire gene set was pre-screened for 5′-UTRs of length ≥100 nt, together with the 40-nt exclusion from the start codon (Figure 2c). The average PARS score of the subset profile is not reduced by these measures, suggesting that the increased secondary structure in the caf20Δ mutant set does not result from differential sampling of 5′-UTR secondary structure immediately adjacent to the start codon. The significant increases in 5′-UTR length and average PARS score for genes up-regulated in the caf20Δ mutant may be related; for example, longer 5′-UTRs may simply give more scope for the formation of mRNA secondary structure. GC content does not appear to underlie the PARS profile differences, because subset averages over 5′-UTRs are relatively uniform, at 33–35%, comparable with previous bioinformatics analysis (36).
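The two robustness measures (a 5′-UTR length pre-screen and a wider start-codon exclusion) and the GC-content check can be sketched as follows. The function names are hypothetical; profiles are assumed to be per-nucleotide lists ordered 5′→3′ that end at the start codon, and sequences plain RNA or DNA strings.

```python
def prescreen(utr_profiles, min_length=100, exclude_near_start=40):
    """Keep 5'-UTRs of at least `min_length` nt, then trim the nucleotides
    adjacent to the start codon (the 3' end of each 5'-UTR profile)."""
    return {gene: scores[:max(0, len(scores) - exclude_near_start)]
            for gene, scores in utr_profiles.items()
            if len(scores) >= min_length}


def gc_content(sequence):
    """Fraction of G and C in an RNA or DNA sequence (0.0 if empty)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)
```

Applying the pre-screen before averaging means every retained UTR still contributes at least 60 nt after trimming, so the early part of the profile is built from a fixed set of genes.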
To investigate whether protein sequence features could aid interpretation of the PARS profile data, the sequences of Caf20p, Eap1p and Tif4631p were studied. The second eIF4G, Tif4632p, has properties similar to Tif4631p and is not shown. Amino acid sequences are aligned at the eIF4E-binding motif, YxxxxL, and disorder and net charge (with neutral histidine side chains) are shown for windowed calculations over 21 amino acids (Figure 3). Two regions of eIF4G are structurally annotated, and representative structures are displayed as solid surfaces in complexes with other eIF4F components. As expected, these regions map to low predicted structural disorder. Each region is also more negatively charged (red), consistent with the surface plots. Three eIF4G regions with non-specific mRNA-binding affinity (19) are marked in Figure 3 as RNA1, RNA2 and RNA3 (20), along with the PABP-binding region. These non-specific mRNA-binding sites correspond to positive charge runs: RNA2 and RNA3 were identified as RS (arginine–serine-rich) sites (19), and RNA1 is also clearly positively charged. All three RNA-binding regions are predicted to be structurally disordered. Furthermore, positive charge is involved in mRNA binding, since replacing arginine with alanine in RNA2 abolishes its RNA-binding activity (19). Deleting these regions has a cumulative effect on eIF4G function: eIF4G lacking any one of RNA1, RNA2 or RNA3 is still active, but removal of two or all three sites causes strong impairment or inactivity (19). This background encourages us to look for potential links between positive charge runs and mRNA binding in the current work.
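Locating the YxxxxL motif and shifting per-residue profiles so that the motif tyrosine sits at position 0 can be sketched as below. The regular expression matches only the six-residue pattern named in the text, and a protein may contain more than one match; choosing the functional one (here passed in explicitly) is left to annotation, an assumption of this sketch. The toy sequence and function names are invented.

```python
import re

# Lookahead so that overlapping YxxxxL occurrences are all reported.
YXXXXL = re.compile(r"(?=Y....L)")


def find_yxxxxl(sequence):
    """0-based positions of the Y in every YxxxxL match."""
    return [m.start() for m in YXXXXL.finditer(sequence.upper())]


def align_at_motif(profiles, motif_starts):
    """Re-index per-residue profiles so each chosen motif Y is at position 0.

    profiles: dict protein -> {residue index: value}
    motif_starts: dict protein -> chosen motif position (assumed annotated)
    """
    return {name: {pos - motif_starts[name]: value
                   for pos, value in profile.items()}
            for name, profile in profiles.items()}


print(find_yxxxxl("MAYKRSTLPQYAAAALG"))  # -> [2, 10]
```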
The RNA2 region of eIF4G lies in a similar location (about 100 amino acids C-terminal to the eIF4E-binding motif, YxxxxL) to positively charged runs in regions of predicted disorder in Caf20p and Eap1p. There are additional positive charge runs in regions of predicted disorder in these two 4E-BPs, notably in the much longer Eap1p sequence. The boxed region of about 100 amino acids either side of the eIF4E-binding motif (Figure 3) shows differences between the 4E-BPs and eIF4G. Caf20p has little sequence N-terminal to the binding motif, and its dominant C-terminal feature is the positively charged region. Eap1p is predicted to be largely positively charged and disordered on both sides, whereas eIF4G (Tif4631p) is ordered and negatively charged N-terminal to the motif, and positively charged and predicted to be disordered C-terminal to it. We currently lack structural or simulation models for the interactions of disordered proteins with mRNA, including any potential effect of mRNA secondary structure. It is therefore difficult to draw precise links between the protein sequence properties in Figure 3 and the PARS score profiles (Figures 1 and 2). It is, however, interesting that the most positively charged window in these sequences, with a net charge of +8, is in Caf20p. This region is relatively well conserved in the context of these protein sequences, particularly its positively charged amino acids (not shown). In addition, this window in Caf20p contains four histidines that are assigned zero charge in the analysis of Figure 3 but could ionize, increasing the positive charge. Charge profiles with ionized histidine are studied in a subsequent section.
To investigate further the relationship between charge runs in yeast proteins and biological function, GO term (molecular function) enrichment was analyzed for protein-coding ORFs with varying charge properties. Figure 4a shows the fraction of protein-coding genes that contain 21-amino-acid windows (predicted to be ordered or disordered) of a given net charge, positive or negative. Proteins with highly negatively charged windows are more common than those with highly positively charged windows. Taking a net charge of 8 (with neutral histidines), the highest among the proteins in Figure 3 (in Caf20p), the most significant molecular function enrichment, for both positive and negative charge runs, is for nucleic acid binding. Enrichments are higher for positive charge runs than for their negative counterparts. For example, at a net charge of +8, the enrichment ratios for nucleic acid binding, DNA binding and RNA binding are 2.62, 2.32 and 1.78, respectively, in windows of predicted disorder; the equivalent ratios for a net charge of −8 are 1.77, 1.46 and 1.10. Figure 4 shows the enrichment of annotation for nucleic acid binding, and within that for RNA and DNA binding, for windows with net charge +1 to +10, for predicted disordered (Figure 4b) and ordered (Figure 4c) segments. Both plots show significant enrichment for nucleic acid binding as net positive charge increases, and this becomes evident at lower charge for disordered than for ordered segments. These data indicate that windows with net positive charge comparable to those in the 4E-BPs are consistent with nucleic acid binding. Comparing Figure 4b and c, enrichment for RNA-binding proteins falls off more quickly than that for DNA-binding proteins at higher net charge, a difference that is more evident for proteins containing windows predicted to be disordered. Analysis of the genes lost when the net charge threshold is raised from +6 to +7 reveals a large number of retrotransposon Ty elements. These contain nucleocapsid proteins, structural components of the virus-like particle shell that encapsulates the retrotransposon RNA (37), which are involved in non-specific RNA interactions and packaging (38).
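Selecting the gene sets used as GO test sets, and the fractions of the kind plotted in Figure 4a, can be sketched as follows. We assume that a gene qualifies when at least one 21-residue window reaches the threshold (≥ for positive thresholds, ≤ for negative ones). The optional `keep_window` callback is a hypothetical hook for restricting the search to windows predicted disordered or ordered, since the disorder predictor is not reproduced here; the toy proteome is invented.

```python
def window_charges(sequence, window=21, histidine_charged=False):
    """Net charge of each full window, in order of window start."""
    charges = {"D": -1, "E": -1, "K": 1, "R": 1}
    if histidine_charged:
        charges["H"] = 1
    values = [charges.get(aa, 0) for aa in sequence.upper()]
    return [sum(values[i:i + window]) for i in range(len(values) - window + 1)]


def genes_with_charge_run(proteome, threshold, window=21,
                          histidine_charged=False, keep_window=None):
    """Genes with at least one window at or beyond a non-zero `threshold`.

    Assumption: keep_window(gene, start) -> bool stands in for a disorder
    prediction; by default every window is considered.
    """
    hits = set()
    for gene, sequence in proteome.items():
        for start, charge in enumerate(
                window_charges(sequence, window, histidine_charged)):
            if keep_window is not None and not keep_window(gene, start):
                continue
            reached = ((threshold > 0 and charge >= threshold)
                       or (threshold < 0 and charge <= threshold))
            if reached:
                hits.add(gene)
                break
    return hits


proteome = {"a": "K" * 10 + "G" * 11, "b": "D" * 9 + "G" * 12, "c": "G" * 30}
positive = genes_with_charge_run(proteome, 8)
print(sorted(positive), len(positive) / len(proteome))
```

The returned set can be passed directly as the test set for an enrichment calculation, with the full proteome as background.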
Having noted that the window of predicted disorder with the highest positive charge in the two 4E-BPs and eIF4G is also rich in histidines, we repeated the charge run analysis with ionized histidine. The effect (Figure 5) is to substantially enhance the positive charge of the region previously identified in Caf20p, so that it now stands out in magnitude from all other regions in these three proteins. A BLAST search (33) reveals 12 homologues of Caf20p with sequence identity ≥55%, followed by a gap to further homologues with sequence identity ≤33%. The left-hand inset of Figure 5 shows charge profiles for the Caf20p homologues with ≥55% sequence identity, aligned at their amino termini. The positively charged region is a relatively well-conserved feature within this set of proteins.
The maximum positive charge of a 21-amino-acid window in Figure 5 (with ionized histidines) is +12, in Caf20p. We found 16 other yeast proteins with a maximum window charge of +12 or greater, including ionized histidines, in a region of predicted disorder. Searching these against the PDB revealed one structure, for ribosomal protein L28. Its positively charged region, which is predicted to be disordered, is an N-terminal extension from the folded L28 domain, and is ordered by virtue of extensive interactions with ribosomal RNA (39). The right-hand inset of Figure 5 shows this L28 extension, with basic amino acids highlighted on a backbone tube. The peak window charge in L28 is +12, the same as in Caf20p.
When charged histidine is included, Caf20p is distinct from Eap1p and eIF4G in terms of charge profile. A region with the same charge characteristics in yeast ribosomal protein L28 interacts with secondary-structure-rich ribosomal RNA. The charge states and roles of histidine side chains in the L28–RNA interactions are unknown, although in principle electrostatic interactions could lead to their protonation.
Positively charged, predicted disordered regions contribute to mRNA binding by eIF4G (19). GO analysis shows that such runs are enriched in nucleic acid-binding proteins. They are also present in the yeast 4E-BPs, where we suggest they could contribute to mRNA selectivity in combination with mRNA secondary structure, since Caf20p has the largest positively charged region and average secondary structure is higher in the 5′-UTRs of genes up-regulated in the caf20Δ mutant. Studies of protein–mRNA interactions in translational control generally centre on sequence-specific binding proteins, often with additional protein–protein interactions (27). The 4E-BP deletion mutant sets of up- and down-regulated genes have previously been examined for RBP enrichment (24). In the current analysis, we were interested in the locations and PARS scores of predicted RBP sites in the 5′-UTRs. RNA motifs associated with known RBPs (27,28) were identified within the 5′-UTRs of mRNAs up-regulated in the caf20Δ mutant (Figure 6). RBP motifs are distributed throughout the 100 nt shown, counted from the 5′ end. Because fewer 5′-UTRs are available in the data set as length increases, there is a bias towards locations closer to the 5′ end. Even so, there is no clear tendency for the binding sites of this set of RBPs to lie towards the 5′ end. By contrast, the increase in the PARS profile for this set of 5′-UTRs, relative to the transcriptome average (Figure 1a), is particularly apparent towards the 5′ end. In addition, the PARS profile for motif hits is generally negative. Although it partly follows the PARS profile for caf20Δ up-regulated genes (e.g. with a dip around nucleotides 40–50, Figure 1a), this similarity is superposed on a different absolute scale (positive PARS scores in Figure 1a, largely negative in Figure 6a).
High-throughput approaches are illuminating the field of post-transcriptional control, revealing a rich structure of RBP sites in mRNAs (28), analogous to the complexity of transcription factor binding sites in the genome. We find that regions of predicted disorder that are relatively high in positive charge are enriched in nucleic acid-binding proteins. Such positive charge runs are evident, with variations, in the two yeast 4E-BPs whose mRNA selectivity has been studied (24). Subsets of mRNAs associated with these 4E-BPs have different 5′-UTR secondary structure properties. We suggest that the degree of mRNA secondary structure could modulate interaction with positive charge runs in regions of protein disorder, and that the resulting variation could complement other mechanisms in establishing the mRNA selectivity of the 4E-BPs. The suggestion that positively charged runs in the 4E-BPs contribute to mRNA binding is consistent with earlier reports for positively charged, predicted disordered regions in eIF4G (19), and with the role of the RNA1 region in stabilising eIF4G–mRNA association (20).
To investigate whether the high average 5′-UTR secondary structure of caf20Δ up-regulated genes is associated with RBPs, we scanned the 5′-UTRs for known RBP-binding motifs. No clear association between the higher 5′-UTR secondary structure and RBPs was found. However, many RBPs remain uncharacterized in detail, so this analysis does not exclude a link between motif-specific RBPs, the mRNA secondary structure difference noted here (Figure 1a), and translational regulation. Indeed, translational control uncovered by the 4E-BP deletion mutants has been linked with the 3′-UTR-binding PUF family of RBPs (24), and, on a wider scale, RBP target motifs are associated with translational regulation (28). Nevertheless, this work leads to the hypothesis that interactions beyond those encoded in specific RNA motif–RBP pairings could play a role in translational control.
As with many areas involving intrinsically unstructured proteins, the lack of structural visualisation makes it difficult to characterize interactions in detail. Modelling of polyelectrostatic phenomena in biology (40,41), in combination with physico-chemical models for charged polymer interactions, will be required for a theoretical understanding of the binding phenomena proposed in this work. In simulation studies of entangled linear polyelectrolytes, oppositely charged molecules appear to associate through wrapped intermediates (42). If this were also the case for natively unstructured protein regions and mRNA, the degree of mRNA secondary structure would influence binding. Wrapped chain intermediates are also evident in Brownian dynamics simulations of complexation between oppositely charged linear polymers with bond length asymmetry (i.e. differing charge densities) (43).
We propose that a relatively high positive charge run in Caf20p could be suited to interacting with mRNA that is relatively rich in secondary structure. The structure of a similar positively charged region in ribosomal protein L28 shows interactions with ribosomal RNA. Interestingly, longer 5′-UTRs (as well as higher PARS scores) are associated with genes up-regulated in the caf20Δ mutant. Longer 5′-UTRs could lead to greater separation between paired bases, more constrained mRNA structure in 3D and, possibly, higher mRNA negative charge density.
It has been suggested that auxiliary domains in eukaryotic RBPs provide relatively non-specific, preliminary protein–RNA interactions, followed by migration to form a high-affinity, sequence-specific interaction with a separate RNA-binding domain (44). This resembles the way transcription factors are thought to use non-specific binding to facilitate intramolecular sliding and location of specific binding sites (45). Typical of proteins containing RNA-binding auxiliary domains is the SR protein family, with high-affinity RNA recognition motifs and lower-affinity arginine/serine-rich (RS) domains (21). The 4E-BPs and eIF4G also link to mRNA via high-affinity (eIF4E-mediated) interactions. Weaker RNA-binding sites are present in eIF4G (19) and are proposed here for the 4E-BPs. Analysis of positive charge runs appears to be a more general method for identifying possible RNA-binding regions than assignment of a particular domain type (e.g. RS), and may therefore be useful as a tool for studying auxiliary nucleic acid-binding domains.
Funding for open access charge: UK Biotechnology and Biological Sciences Research Council (PhD studentship award to A.C.).
Conflict of interest statement. None declared.
The authors thank Graham Pavitt and Simon Hubbard for discussions and the anonymous referees for their comments.