|Home | About | Journals | Submit | Contact Us | Français|
Conceived and designed the experiments: LC IS MCR. Performed the experiments: LC IS MCR. Analyzed the data: LC IS MCR. Contributed reagents/materials/analysis tools: LC IS MCR. Wrote the paper: LC IS MCR.
To understand the complex relationship governing transcript abundance and the level of the encoded protein, we integrate genome-wide experimental data of ribosomal density on mRNAs with a novel stochastic model describing ribosome traffic dynamics during translation elongation. This analysis reveals that codon arrangement, rather than simply codon bias, has a key role in determining translational efficiency. It also reveals that translation output is governed both by initiation efficiency and elongation dynamics. By integrating genome-wide experimental data sets with simulation of ribosome traffic on all Saccharomyces cerevisiae ORFs, mRNA-specific translation initiation rates are for the first time estimated across the entire transcriptome. Our analysis identifies different classes of mRNAs characterised by their initiation rates, their ribosome traffic dynamics, and by their response to ribosome availability. Strikingly, this classification based on translational dynamics maps onto key gene ontological classifications, revealing evolutionary optimisation of translation responses to be strongly influenced by gene function.
Gene expression regulation is central to all living systems. Here we introduce a new framework and methodology to study the last stage of protein production in cells, where the genetic information encoded in the mRNAs is translated from the language of nucleotides into functional proteins. The process, on each mRNA, is carried out concurrently by several ribosomes; like cars on a small countryside road, they cannot overtake each other, and can form queues. By integrating experimental data with genome-wide simulations of our model, we analyse ribosome traffic across the entire Saccharomyces cerevisiae genome, and for the first time estimate mRNA-specific translation initiation rates for each transcript. Crucially, we identify different classes of mRNAs characterised by different ribosome traffic dynamics. Remarkably, this classification based on translational dynamics, and the evaluation of mRNA-specific initiation rates, map onto key gene ontological classifications, revealing evolutionary optimisation of translation responses to be strongly influenced by gene function.
The expression of genes can be considered as a two-stage process, beginning with transcription and the production of an mRNA, followed by translation of that mRNA into protein by the cell's ribosome population. Gene expression must be tightly regulated to control protein composition, enabling the cell to rapidly respond to a wide range of environmental conditions. For this reason, cells exert fine control over gene expression, both at the transcriptional ,  and post-transcriptional level –.
One key mechanism of post-transcriptional control of gene expression is translational regulation. The process of translation can be divided in three main phases, namely initiation, elongation and termination. Whereas termination is generally believed to be a fast process and therefore not limiting for translation , the respective contributions of initiation and elongation to translational regulation are still under debate .
On one hand, the translation initiation rate, or the rate at which ribosomes access the 5′ untranslated region (5′ UTR) and start translating the ORF, is regulated in part by formation of secondary structures in the 5′ leader , . The presence of secondary structures inhibits the ability of an mRNA to sequester ribosomes, thereby lowering the effective translation initiation rate. The 5′ leader composition is characteristic of each mRNA, resulting in a heterogeneity of the ribosome recruitment process among the transcripts , . Despite the importance of this process in gene expression regulation, there are currently no estimates of in vivo, mRNA-specific translation initiation rates based on refined traffic models, and how they regulate genome-wide patterns of protein expression. On the other hand, there is increasing evidence that translation elongation itself controls gene expression, being regulated by the rate of supply of tRNAs, particularly in microorganisms with codon biased genomes. Within families of isoacceptor tRNAs, members are not all present at the same concentration in the cell, leading to variation in delivery times, and the introduction of stochastic pauses . Such pauses control ribosome transit, regulating ribosome queue formation. There is evidence that a ramp of slow codons near the 5′ end of some open reading frames regulates the flow of ribosomes onto an mRNA , , and pausing during elongation on any mRNA will affect queue dynamics, and thus the flux (or current) of ribosomes along the mRNA. However, there is no knowledge of how, on a genome-wide scale, the dynamic flux of ribosomes along an mRNA might be crucial in regulating protein expression.
Here, we address these two problems: first, we estimate mRNA-specific in vivo translation initiation rates on a genome-wide basis by integrating a computational model of mRNA translation with experimental datasets of ribosome occupancy. Crucially, we show that translation initiation rates are correlated with gene function. Second, we show that the translation dynamics response of each mRNA is characteristic of its gene ontology, by elucidating how ribosome traffic, moving with variable speed across the codon field, responds to a range of initiation rates. We also show that codon arrangement rather than codon usage, clearly separates mRNAs into distinct classes typified by their responses to variations of the translation initiation rate. This suggests that not only codon usage but also codon arrangement is a selectable determinant of gene expression.
Our model describes how ribosomes bind to the mRNA, move along it performing the translation, and dissociate from the mRNA at the stop codon, releasing the finished protein into the cytoplasm . The mRNA is represented by a unidimensional lattice, with each site denoting a codon. Ribosomes are represented by particles occupying 9 codons  that attempt to bind the mRNA with a rate , provided that the binding region is not obstructed by another ribosome. The particle on-rate mimics the initiation of translation, in which several processes have been condensed into just one step. The factors influencing the initiation of translation, such as secondary structures in the 5′UTR, concentration of initiation factors and ribosome availability, are all included in this parameter and will be discussed below. Subsequently, ribosomes advance on the polynucleotide chain (elongation) following a two-state dynamics: (1) recognition of the cognate tRNA with rate depending on the codon , and (2) translocation towards the next codon with rate (see Figure 1). At the last codon, the ribosomes detach and release the protein with a rate (termination).
The cognate tRNA-capture rates can be estimated from data on tRNA abundances, which are assumed to be proportional to their gene copy numbers , and by considering further corrections such as the wobble base pairing (see Supplementary Information, Text S1). Effects of competition for near-cognate and non-cognate tRNAS were found not to materially affect any of the conclusions of this study (see Supplementary Information, Text S1), and are therefore neglected. The translocation rate has been measured to be 35 , and is codon independent. The termination rate is determined by the concentration of the release factors; the termination process is assumed to be fast, comparable to the translocation .
Moreover, the model takes into consideration steric interactions between ribosomes, so that even if a ribosome sitting on codon has already captured the cognate tRNA, it cannot translocate if the next codon is occupied by another ribosome. Hence, it is an exclusion process  exhibiting different regimes characterised by the flow of particles and by their density along the lattice. In particular, if the sequence contains slow sites, then queues of particles behind the slow sites or high density phases appear when the on-rate of particles is of the same magnitude as the bottleneck rate.
In contrast to commonly used exclusion models , , our model accounts for the processes involved in the mechano-chemical ribosome cycle, condensing them in two main steps: capture of the tRNA and translocation. It includes the crucial fact that ribosomes can capture a cognate tRNA while they wait for the next lattice position to become vacant. In contrast, ribosomes from simpler exclusion models unrealistically “lose” immediately the captured tRNA if they cannot move to the next codon. This is a key difference, which leads to different dynamics of ribosome traffic and transitions between traffic regimes . This effect is further enhanced by the fact that the time scales related to the capture of the tRNA and translocation are strongly separated, with the translocation being much faster. Furthermore, the two-state ribosome reproduces the dwell-times observed in single-molecule experiments .
In summary, our model predicts the current of ribosomes or translation rate, and the density of ribosomes on a particular mRNA (number of ribosomes divided by the ORF length), taking as input the specific sequence of codons of the mRNA. Both the translation rate and the ribosome density are predicted as a function of the translation initiation rate , i.e. the rate at which ribosomes arrive at the start AUG codon; their functional dependence on the initiation rate thus varies from sequence to sequence as a consequence of different codon compositions and codon arrangements.
The translation initiation rate , i.e. the rate at which ribosomes start translating the ORF, depends on many factors, such as the rate at which ribosomes attempt to bind the mRNA, the concentration of initiation factors and the presence of secondary structures in the 5′UTR region , , . Despite the key role of this parameter, direct experimental evaluations are intractable, both in vivo and in vitro, with no direct measurements having been carried out to date.
Previous works, such as , could only estimate the translation initiation rate as the value that maximised the predicted correlation of the ribosome current with experimental data. Furthermore, has usually been considered as a unique, fixed value (the same for each of the mRNAs), but it is well known that the translation initiation rate depends on several mRNA-specific factors, such as the structural properties of the mRNA leader region. Knowledge of mRNA-specific values of , therefore, would provide important insight into control of gene expression at the level of translation. Siwiak and Zielenkiewicz  present specific initiation rates, however with a simple model that neglects ribosome kinetics and traffic (the comparison is discussed in the Supplementary Text S2). Here we present a novel approach to identify the initiation rate of each individual mRNA for the whole genome.
We first apply our translation model to all mRNA sequences of S. cerevisiae. The model predicts the translation rate and the ribosome density on each mRNA as a function of the translation initiation rate . Then, by utilising genome-wide experimental data of ribosome density from  for yeast grown under non-stressed conditions, we identify the physiological translation initiation rate as the one which, when used in our simulations, replicates the experimentally observed density:
This yields a value of the translation initiation rate for each mRNA as shown by the genome-wide distribution in Figure 2. Using the genome-wide experimental data of ribosome density from Arava et al.  yields a very similar distribution of initiation rates (see Section 4 of Supplementary Text S1). The knowledge of this distribution reveals how translational regulation of gene expression works at the level of initiation by correlating the values of with the biological functions of the corresponding genes, encoded in their Gene Ontology (GO) annotations. In Figure 2 we split up this distribution in four parts, from small to high (i)–(iv). Strikingly, significantly enriched GO annotations are identifiable in each of the regions. Messenger RNAs with an initiation rate below (region (i) of Figure 2) contain a highly disproportionate number of regulatory proteins and proteins linked to transcription from Pol II promoters, mainly located in the nucleus, chromosome, membrane or protein complexes. In the range of from to (region (ii) of Figure 2) we find other significantly over-represented terms such as cytoplasmic translation, ribosome biogenesis or oxoacid metabolic process, while genes with from to (region (iii) of Figure 2) are primarily constituents of ribosomes. Very large initiation rates (region (iv) of Figure 2) are characteristic of genes associated with the respiratory chain. However, most of genes falling in this region are not annotated (a complete list of can be found in the Supplementary Table S1 and the details of the GO analysis, with the annotations found in each region and their enrichments, can be found in the Supplementary Table S2).
The assignment of initiation rates correlates with protein abundances typical of given GO categories: regulatory proteins are usually present at low levels. In contrast, proteins involved in translation, ribosome biogenesis and metabolic processes are abundant. This result is a signature of the divergent translational control that distinct genes exhibit at the level of initiation, suggesting that factors influencing , such as secondary structure in the 5′ leader region, have been shaped by evolution to contribute to the delicate balance of cellular protein composition.
To show that the procedure introduced above can be applied under different conditions, we carry out a similar analysis under pheromone treatment by using the corresponding measurements of ribosome densities from , and estimate the initiation rates under these conditions. The initiation rates do not substantially change, consistent with the finding by Mackay et al.  that only a small number of mRNAs exhibit altered densities after pheromone treatment. However, with our analysis we identify two mRNAs, SAG1 and HO, which exhibit a radical change in their initiation rate value under pheromone treatment. Importantly, these two mRNAs have been shown to present altered 5′UTR sequences that explain their significant ribosome density change .
Now we analyse the influence of the physical properties of the mRNA on the translation initiation rate by analysing the correlations of the identified rates with the presence of secondary structures in the 5′UTR and the length of the transcript. The physiological estimates of the initiation rate show a small but significant correlation with the free energy of the secondary structures in the 5′UTR (Spearman's rank, p-value) confirming that secondary structures may have an important regulatory role, as already suggested , , , see Supplementary Information Text S1.
Remarkably, we find a strong negative correlation between the initiation rate and the length of the ORF (Spearman's rank, p-value), see Figure 3. Of relevance to this observation, Arava and coworkers  found that ribosome density counter-intuitively and systematically decreases with increasing the ORF length. In a subsequent work , they reported that the explanation most consistent with their experimental investigation was that lower initiation rates predominate on longer mRNAs, exactly as we estimate in this work.
Our genome-wide simulation generates, for each transcript, curves describing how the ribosome density (polysome size) and the ribosomal current depend on the initiation rate . According to these characteristic curves, mRNA sequences fall predominantly into either of two categories (Figure 4): the ribosome density of some mRNAs presents a steep increment with increasing (we refer to these sequences as abrupt), while others present a gradual increase of the density with the initiation rate (smooth sequences). This classification coincides with the characteristic curve obtained for the ribosomal current against . In abrupt sequences the current presents a kink before reaching its saturation value, whereas in smooth sequences, the saturation value of is reached gradually . Therefore, mRNA sequences can be classified into two different types depending on how their overall translation rate and polysome size vary upon changes in the initiation rate.
The origin of these two types of ribosome traffic lies in the codon arrangement: smooth behaviour mRNAs contain either rare codons at the 5′ end, or no rare codons at all, whereas abrupt behaviour mRNAs contain rare codons or clusters of rare codons within the main body of the mRNA. These rare codons act as bottlenecks, causing queues to build up and consequently -dependent abrupt phase transitions to occur. In contrast, if the bottleneck is right at the beginning of the mRNA, no queue can be formed and therefore the polysome size increases smoothly with the initiation rate .
The curves for the ribosome density versus for the more than 6,000 S. cerevisiae mRNAs were analysed with an automated clustering algorithm  to classify them into abrupt or smooth sequences. The algorithm clearly classified 35% sequences to belong to the abrupt category, and 38% to the smooth category. The remaining 27% sequences were marked as hybrid since they did not show pronounced features to justify a discrimination between the two categories (Supplementary Information Text S1).
Strikingly, these two categories, each with distinct initiation rate response criteria (smooth or abrupt) correlate with the biological function of the encoded proteins: GO annotations (process) related to translation are significantly over-represented in smooth sequences (cytoplasmic translation, P-value , translation, P-value ). Conversely, abrupt sequences are connected to several processes, mainly involving regulation, e.g. biological regulation, P-value , metabolic process, P-value or cellular response to stimulus, P-value . More details about the enrichment in each category can be found in the Supplementary Table S3.
If one considers the abundance of the transcripts in the cell (data from ), then 68% of the total mRNA population belongs to the smooth type. This indicates that highly transcribed genes have preferentially slow codons at the 5′ end rather than in the main body of the mRNA. In this way, highly transcribed genes avoid having queues of ribosomes which might deplete a large amount of essential cell resources. Abrupt sequences, on the other hand, constitute only 14% of the transcribed mRNAs; this is consistent with the fact that abrupt sequences typically encode regulatory proteins, which are in general of low abundance.
A further characteristic we can conjecture from our genome-wide simulations, is how the translation rate changes upon variations in the initiation rate around the physiological value quantified above. As previously mentioned, the value of the translation initiation rate depends on several factors, such as the amount of free ribosomes, initiation factors, and folding features of the 5′ UTRs, all of which are strongly influenced by stress and nutrient conditions . The knowledge of the responsiveness of translation rate to variations of the initiation rate, therefore, theoretically provides key insight into the mechanisms of translational regulation of gene expression.
In order to study translation rate change responsiveness, or ‘gearing’, we study the combined role of and the presumed gradient of the translation rate, which quantifies the responsiveness of around the physiological value of the initiation rate (see ‘Materials and Methods’). We find that these two quantities are highly correlated (Spearman's rank, p-value), Figure 5. This indicates that, according to our model, genes characterised by a small initiation rate, such as regulatory genes, have in general a high translation rate gradient, suggesting that the corresponding proteins are produced at low levels under normal conditions but their synthesis can be rapidly increased upon changes in the initiation rate. Conversely, genes characterised by a high initiation rate, such as genes encoding proteins involved in translation and ribosome biogenesis, exhibit in general a medium-to-low value of the translation rate responsiveness, implying that their synthesis is tuned to be efficient, but stable against variations in initiating ribosomal subunit availability. Moreover, by dividing the distributions into quartiles, we identify sixteen different regions; a number of them exhibit significant enrichment in specific GO annotations determined principally on the basis of (Figure 5). However, by constraining the genes analysed to those with an value lying within a specific range, the specific contribution of ) could be identified. This revealed that there is a further, separable enrichment of GO categories on the basis of . This in turn indicates that the gearing function, or responsiveness to ribosome availability, is also coupled to gene function. Thus regions 4, 7, 8, 12 and 16 from Figure 5 (regions are numbered starting from the top left one and proceeding left to right) show a significant enrichment in specific GO annotations with a P-value smaller than 0.01 (see Supplementary Table S4). The genes exhibiting a significant enrichment in region 16 were un-annotated. The results in Figure 5 (the estimated plotted against the values) are annotated with GO category enrichments influenced by the combination of the physiological initiation rate and the gearing factor.
We would like to emphasise that, unlike all other results shown in this work, the gearing factor , i.e. the responsiveness capacity of the mRNAs, remains a speculative and theoretical outcome of the model. Since genome-wide experimental setups changing the initiation rates of single transcripts and observing the variation of translation remain are nowadays a challenge, its biological relevance remains to be proven.
We also perform a genome-wide analysis of the maximal translation rate that a sequence can achieve, when an increase in the initiation rate does not yield any further change in (see ‘Materials and Methods’). We extract for each mRNA sequence from our genome-wide simulations and analyse the mRNAs with largest and smallest (first quartiles). We find that sequences with the largest maximal production rate are mRNAs involved in cytoplasmic translation (P-value), such as ribosomal and translational proteins. Conversely, proteins with regulatory functions (such as nucleic acid binding transcription factor activity, P-value ) are encoded by sequences with the smallest . Supplementary Table S5 summarises this GO analysis.
In summary, our genome-wide analysis shows that the type of ribosome traffic on mRNA is significantly correlated with the biological function of the encoded protein: essential proteins that need to be constitutively produced, such as ribosomal proteins and proteins involved in translation, typically exhibit a smooth increase of polysome size upon increments of the initiation rate, a large physiological initiation rate and a high maximal overall translation rate. In contrast, proteins such as the ones involved in responses to stimuli typically exhibit an abrupt increase of the polysome size with the initiation rate, present a small physiological initiation rate and a low maximal overall translation efficiency. In the next section we discuss the fundamental role of codon arrangement in determining the translation efficiency.
In order to show that the genome-wide correlation between translational efficiency and biological function obtained above is not only the consequence of codon usage but is strongly influenced by the order in which codons are used in the mRNA, we simulate the translation of a library of randomised ORFs such that both amino acid sequence and codon composition remain identical. That means, two ORFs belonging to the library have exactly the same codon usage but the arrangement of these codons is different. Here we show that, even though all these randomised ORFs have exactly the same codon usage indices such as the CAI, codon adaptation index , and tAI, tRNA adaptation index , their predicted protein production rate can be very different. Figure 6 shows how the different values of predicted protein production rate (ribosomal current for a fixed initiation rate) are distributed for 2,000 synonymous randomised codon sequences of a typical Saccharomyces cerevisiae gene (YPL106C).
Figure 6 clearly shows that the relative positioning of codons has a crucial effect on the translation efficiency, suggesting that very different cellular production rates can be achieved through evolution of the codon arrangement. For instance, in the case shown in Figure 6 there is an increase of about from the lowest to the highest value of the translation rate. The variation of for different codon arrangements is a general result and does not depend on the gene or the chosen initiation rate (for more information see Supplementary Information, Text S1). We thus show that by randomising codon arrangement (i.e. randomly exchanging the position of synonymous codons in a sequence), different protein production rates are obtained, even though codon usage remains fixed. This indicates that the codon arrangement has a highly significant role in determining the efficiency of translation.
While several models of protein synthesis have been developed over the last decades , the role of codon sequence and stochastic ribosomal movement has been investigated only recently. But even recent models typically treat the initiation rate as a fixed parameter, identical for all mRNAs, despite its key role in determining translational efficiency. In contrast, our model predicts the protein production rate as a function of the initiation rate. By then integrating genome-wide simulations with datasets of polysome sizes, we have identified the physiological value of the initiation rate for each mRNA. This set of values then leads to the prediction of the protein production rate for each transcript. This allows us to validate our model predictions with experimental data.
Figure 7A is a scatter plot of the genome-wide simulations versus measured protein abundance from . The model predictions for , where denotes mRNA abundance, correlate very well with the experimental protein abundances (Spearman's rank=0.64, p-value), compared to other attempts such as the tAIc (Spearman's rank=0.38, p-value). Our outcome is further improved when considering just transcripts loaded onto polysomes (Spearman's rank=0.66, p-value), see ‘Materials and Methods’ and Supplementary Information, Table S1. Moreover, as it can be appreciated from Figure 7A, the predictions from our model correlate very well with measured protein abundance for all ranges of gene expression, in contrast to other translation efficiency indices (panels B,C,D), which exhibit a poor correlation for lowly expressed genes.
The phenotype exhibited by any cell is dictated by its proteomic composition. How much of each type of protein is expressed is governed by a range of factors, including the level of transcription and stability of the encoding mRNA, the half-life of the protein, and how efficiently its mRNA is translated. A number of strategies have been employed to predict translational efficiency, many of which utilise the observation that not all codons are used with equal frequency, and that codon usage frequency is proportional, at some level, to the abundance of the corresponding decoding tRNA species , . Initially, measures such as the codon adaptation (CAI) index were developed , which correlate high protein abundance with over-use of the sub-set of codons found in a group of very-highly expressed genes, normally those encoding the ribosomal proteins. However, such approaches frequently struggle to predict the expression level of less abundant proteins. More recently, dynamic TASEP (Totally Asymmetric Simple Exclusion Process) models have been employed to simulate the flow of ribosomal traffic, including queuing interactions between adjacent ribosomes on the polysome , –. Even though these models represent a big step towards a more complete description of the translation process, most of them miss one essential component, namely the mechano-chemical ribosome cycle. By including this mechanism into an exclusion process we showed that the mathematical description of translation becomes much more accurate . Here we applied this model to simulate the translation of every mRNA in the transcriptome of S. cerevisiae leading to the estimates of the individual translation initiation rates unique to each of the 6,000 genes in yeast. We furthermore showed that mRNA sequences can be classified according to their ribosome traffic characteristics, and crucially, this classification maps to gene ontology assignments.
Even though the role of the translation initiation rate has been shown to play a central role in translational control of gene expression , to our knowledge no genome-wide estimations of these rates have been reported, considering ribosome traffic effects. The translation initiation rate, i.e. the rate at which ribosomes start translating the ORF, condenses many factors, such as cytoplasmic ribosome availability, initiation factors and secondary structures on the 5′UTRs, all of them strongly dependent on nutrients and stress conditions. Some approaches consider the translation initiation rate to be fixed for every transcript, thereby neglecting the key factors that make the initiation rate unique to each transcript. In contrast, by considering traffic dynamics, we determined the first genome-wide estimate of initiation rates for each and every mRNA (Figure 2) by integrating our stochastic model of ribosome traffic with data of ribosome densities across all mRNAs . Our analysis showed a wide range of values under these non-stress conditions. Importantly, the values are strongly correlated with gene function, explaining for example why translation of ribosomal protein mRNAs, which typically have a very high value, is very efficient. These values of are expected to be influenced by the degree of secondary structure of the 5′ leader sequence, and indeed we did find a significant correlation with the free energies of the secondary structures. The strongest connection involving was however a negative correlation with mRNA length, mirroring the findings from experimental research that described lower ribosome densities on longer mRNAs , . In contrast to the explanation that the effect could be caused by bottlenecks of slow codons  (see Supplementary Information Text S1) this negative correlation supports the idea that due to the circular structure of mRNAs, the ends of shorter mRNAs can interact more easily than longer mRNAs, thereby promoting ribosome recycling , , . Indeed following detailed experimental analysis using ribosome density mapping, Arava and colleagues concluded that lower densities on longer mRNAs are best explained by lower rates of translation initiation , mirroring our findings in this work. To summarise, we interpret the correlation between the estimated initiation rates and the ORF lengths as a possible indication of a regulatory mechanism that allows circularised mRNAs to load ribosomes more efficiently onto their transcripts, leading to the observed ribosome-ORF length relationship.
Our analysis furthermore identified two main distinct classes of mRNAs regarding their responsiveness to changes in the initiation rate : some sequences exhibited an abrupt change in the polysome size upon a change in , whereas smooth sequences showed a gradual increase. Calculations with artificial sequences revealed that sequences with rare codons in the main body of the ORF belong to the abrupt class, whereas sequences with either no rare codons or rare codons at the 5′ end belong to the smooth class . Crucially, we note that the classification of mRNAs into smooth and abrupt responders maps onto particular gene ontological classifications. Smooth responder mRNAs as a class are highly over-populated with ribosomal protein mRNAs and translation factors. Conversely, the abrupt class contains disproportionate numbers of regulatory proteins, including nucleic acid-binding transcription factors, and cell cycle proteins. One reason why ribosomal protein mRNAs are predominantly of the smooth response type might relate to the massive manufacturing scale of ribosome biosynthesis; in yeast, ribosomal protein mRNAs account for nearly 30 percent of all mRNAs , . Smooth-type responses to must be of selective advantage for a cell, since if ribosome queues were established on such a large proportion of the cell mRNA population they would sequester a large numbers of ribosomes, with deleterious consequences for cell fitness. On the other hand, it has been recently found that cell-cycle regulated genes predominantly adopt non-optimal codon usage (with no ramp of slow codons at the beginning, and therefore of the abrupt class) to achieve elongation-limited mRNA translation; this can generate cell cycle-dependent oscillations in protein abundance induced by changes in the tRNA pool . Therefore, it is apparent that the cell coordinates codon usage and codon arrangement to achieve translational gene expression control.
Our results also showed important differences in the computationally deduced slope of the production rate curve in response to increasing . Some mRNAs are what we term highly geared, that is, small increases in produce relatively large increases in . This type of super-responsive mRNA was significantly enriched in regulatory proteins, which also have a relatively small initiation rate. We speculate that this might be a mechanism to facilitate rapid responses to changed environmental conditions, allowing, for example, rapid synthesis of transcriptional repressors that in physiological conditions are severely limited by the initiation (low ). Conversely, low geared mRNAs, where increases in produce proportionately lower responses in , were enriched in ribosomal proteins. Since ribosomal proteins are used to manufacture ribosomes, lower gearing of the responsiveness to may help prevent undesirable positive feedback effects. We furthermore classified mRNA sequences according to the maximal translation rate that they can achieve, i.e. their saturation value, and our analysis revealed that abrupt sequences have predominantly a small , whereas smooth sequences are characterised by a large . This correlates with the levels of the corresponding proteins: regulatory proteins are typically present in low abundance, whereas ribosomal proteins are highly abundant. Moreover, this might prevent possibly deleterious consequences of over-producing regulatory proteins, including cell cycle factors, during occasional bursts of ribosomal availability that would lead to a very large increase in the value of . In S. cerevisiae for example, this occurs upon sudden glucose depletion: translation initiation is rapidly inhibited  but some mRNAs (including those involved in carbohydrate metabolism) continue to be translated , thereby being exposed to a spike in ribosome availability. Similar complex translational re-programming, coincident with a partial cell-wide shut down of translation initiation, occurs in response to oxidative stress . Hence, by having a high responsiveness to and a low , abrupt sequences can have a very rapid gene expression upregulation, on one hand, but a controlled maximum translation rate, on the other hand. Figure 8 summarises our findings on the initiation rates and the consequences of different (mRNA-specific) dependencies of the protein production rate on the initiation step.
Incidentally, this classification of proteins according to their translation dynamics, coincides with the classification according to protein stability. In , the S. cerevisiae proteome was analysed using a clustering approach to classify proteins according to half-life, and the stable protein cluster was enriched with proteins involved in protein production, including ribosomal proteins and enzymes involved in amino acid metabolism. Moreover, the unstable protein cluster was enriched with cell cycle proteins and proteins involved in transcriptional regulation. Therefore, our analysis indicates that stable proteins tend to have a low responsiveness in their production rate to external changes which change the initiation rate, whereas unstable proteins production responses very effectively to external changes. Hence, our analysis strongly suggests that the cell coordinates dynamics of protein degradation with the dynamics of protein production.
In summary, we have shown how our stochastic model representing the ribosome traffic flow on mRNAs is able to discern and describe the biological interplay between translation initiation and elongation, at a single-codon level. We have illustrated how the application of this model across the entire genome can be used to infer mRNA-specific translation initiation rates in vivo, and that selection of codon arrangement is likely to be an important mechanism to tune the translation system to meet the competing demands of ribosome biosynthesis and translation of all other mRNAs in the cell. With our approach, mRNA sequences can be classified according to their translation dynamics, mapping to key gene ontological classifications; codon arrangement plays a fundamental role in this classification, indicating that it is optimized through evolution to match the corresponding gene function. Moreover, gene-specific physiological values of initiation rate can be used to determine the translational efficiency for each mRNA; this allows the prediction of genome-wide protein abundances with a significant increase in correlation when compared with previous approaches (Figure 7). We foresee this type of analysis will be of great value to understand how the economics of translation are regulated on a cell-wide basis, and how codon arrangement is optimised to control gene expression in response to the translational remodelling that occurs in response to many environmental stresses.
For each mRNA sequence of S. cerevisiae we performed a stochastic simulation of translation, one mRNA at a time, following the rules explained above and summarised in Figure 1. Our algorithm is a continuous time Monte-Carlo based on the Gillespie algorithm, and therefore it gives the real-time dynamics of the system.
In each simulation of individual mRNAs we let the system reach the steady-state. Then we measured, at constant interval times, two quantities: the current of ribosomes along the mRNA, i.e., how many ribosomes per unit time finish translation, and the density of ribosomes on the mRNA, i.e., the total number of ribosomes divided by the length (in codons) of the mRNA. Therefore, the current gives the translation rate, and the density determines the polysome size. We then averaged these quantities over the entire time interval of the simulation. We ran the simulations for a broad range of initiation rates between 0 and 5 , making sure that the plausible physiological regime for variation was covered, and we fixed the other parameters as explained in the previous sections. The obtained curves and were then smoothed with a ten-points running average.
The gradient of the translation rate at the physiological initiation rate is defined, for each mRNA, as the numerical derivative of the relation , computed at . It geometrically represents the slope of the curve at the physiological value . Since both the distributed physiological values of the initiation rates and different codon sequences cause a different dependence of on , the derivative differs from mRNA to mRNA.
The maximal values and were defined as the mean of the last five simulation points of the current and the density, respectively, corresponding to the five largest values of considered.
The translation efficiency of a transcript is defined as the protein production rate computed at the physiological value , . Denoting by the amount of a specific mRNA in the cell (data from ), for any protein the quantity is an estimate of the protein abundance, see . We also considered the effective amount of transcript involved in polysomes, , where can be found in . The prediction slightly improves the correlation with measured protein abundance, as discussed in the ‘Results’ section.
Dataset for the classification of the transcripts in the abrupt, smooth and hybrid classes, and values of the physiological and maximal quantities (initiation rates and ribosomal current) characteristic of each mRNA. In the second sheet one can find the database for the different estimators of protein production rates used in Fig. 7.
GO annotations of genes found in the smooth, abrupt and hybrid classes. Each sheet is named with the sequence type (smooth, abrupt, hybrid), and with the GO aspect (process, function, component).
GO annotations of genes belonging to the top and bottom quartile of the distribution. Each sheet is named according to their (top quartile=25% of genes having the largest , bottom quartile=25% of genes having the smallest ), and with the GO aspect (process, function, component).
Detailed description of the approaches used in the main text and regarding (1) the estimate of the hopping rates in the two-state model; (2) the method to classify sequences in abrupt, smooth and hybrid class; (3) other examples of randomisation (with constant amino-acid sequence) of codon arrangement and expected protein production rates (similar to Figure 6); (4) numerical details for the quantification of the initiation rates; (5) details on the computed energy of secondary structures ; (6) an accurate explanation of the alternative hypothesis for the ORF-length dependence of the initiation rate; (7) description of the computation of the p-values.
L.C. would like to thank M. Cosentino Lagomarsino for valuable discussions on this work and preliminary versions of the manuscript, as well as R. J. Allen and P. Greulich. The authors also thank M. Thiel for early and helpful discussions on the relevance of codon arrangement, and the anonymous referees for the useful comments that helped to improve the manuscript.
LC is funded by a Scottish Universities Life Sciences Alliance (SULSA) studentship; Biotechnology and Biological Sciences Research Council (BB/G010722/1) to IS and MCR, Biotechnology and Biological Sciences Research Council (BB/I020926/1) to IS, Biotechnology and Biological Sciences Research Council (BB/F00513/X1) and SULSA to MCR. Funding for open access charge: Biotechnology and Biological Sciences Research Council. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.