Proteins play major roles in most biological processes; as a consequence, protein expression levels are highly regulated. While extensive post-transcriptional, translational, and protein degradation control clearly influences protein concentration and functionality, it is often thought that protein abundances are primarily determined by the abundances of the corresponding mRNAs. Surprisingly, then, a recent study showed that abundances of orthologous nematode and fly proteins correlate better than their corresponding mRNA abundances. We tested whether this phenomenon is general by collecting and testing matching large-scale protein and mRNA expression datasets from seven different species: two bacteria, yeast, nematode, fly, human, and plant. We find that steady-state abundances of proteins show significantly higher correlation across these diverse phylogenetic taxa than the abundances of their corresponding mRNAs (p = 0.0008, paired Wilcoxon). These data support the presence of strong selective pressure to maintain protein abundances during evolution, even when mRNA abundances diverge.
Proteins play major roles in most biological processes, ranging from central metabolism to cell structure, maintenance, and replication. Consequently, protein expression levels are subject to diverse and complex control. Due to extensive post-transcriptional, translational, and stability regulation, protein abundance is only partly determined by accumulation and degradation of the corresponding mRNAs (e.g., as in references [1–3]), with perhaps 20-60% of the variation in steady-state protein abundances attributable to mRNA levels, depending upon organism and conditions. A recent study of the nematode and fly proteomes made the remarkable observation that the abundances of orthologous nematode and fly proteins correlated better than their corresponding mRNA abundances. The difficulty of making such measurements on a proteome scale has until recently held back such comparisons, and it is unknown whether this observation is generally true. We asked whether this phenomenon is indeed general by collecting and testing matching large-scale protein and mRNA expression datasets from seven different species. We find that steady-state abundances of proteins show significantly higher correlation across diverse phylogenetic taxa than the abundances of their corresponding mRNAs (p = 0.0008, paired Wilcoxon). These data support the presence of strong selective pressure to maintain protein abundances during evolution. A necessary consequence is that protein stability and post-transcriptional regulatory schemes must compensate for divergent mRNA levels to keep proteins at evolutionarily optimized levels.
Specifically, we assembled large-scale quantitative protein expression datasets and measured protein abundances from bacteria (E. coli, P. aeruginosa), fungi (baker’s yeast, S. cerevisiae), plants (the leaf proteome of rice, O. sativa), insects (fruit fly, D. melanogaster), nematodes (C. elegans), and humans, as described in the online supplement. For each species, we identified or collected mRNA expression datasets from matching strains and growth conditions. We limited datasets to those from similar measurement platforms. For mRNA, we compiled data from single-channel DNA microarrays and, where available, counting methods (Table S1). For proteins, we used mass spectrometry-based shotgun proteomics, measuring absolute abundances with a label-free weighted spectral counting approach. We then computed orthologous genes between each pair of species using InParanoid. (Alternate choices of measurement platform, quantitation, and orthology calculation, described below, all give similar results.)
We then determined the extent to which steady-state protein concentrations were conserved between each pair of organisms by calculating the rank correlation of the protein abundances originating from orthologous genes, as shown for human and yeast in Figure 1A. Similarly, we measured the rank correlation in the abundances of the corresponding mRNAs. Importantly, we limited all comparisons to only those genes for which we had both protein and mRNA measurements, thereby controlling for possible sources of bias related to selection of genes, including technology-specific abundance biases (for example, the tendency for mass spectrometry to selectively sample abundant proteins). The relative conservation of protein and mRNA abundances could then be estimated by comparing the resulting rank correlations, listed in full in Figure 1B. Of the 21 organism pairs considered, the correlation in protein abundances was greater than that of mRNAs in 17 cases, and less than that in only four. The trend can be clearly seen in the distributions of protein-protein and mRNA-mRNA correlations (Figure 2A), supporting a significantly greater conservation of protein abundances than for the abundances of the corresponding mRNAs (p = 0.0008, paired Wilcoxon).
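The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names, the dictionary-based inputs, and the use of a normal approximation for the paired Wilcoxon p-value (without tie correction) are all assumptions made here for clarity.

```python
import math
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def conservation(orthologs, prot_a, mrna_a, prot_b, mrna_b):
    """orthologs: list of (gene_a, gene_b); other arguments: dict gene -> abundance.
    Only ortholog pairs with protein AND mRNA values in both species are used."""
    shared = [(a, b) for a, b in orthologs
              if a in prot_a and a in mrna_a and b in prot_b and b in mrna_b]
    rs_protein = spearman([prot_a[a] for a, _ in shared],
                          [prot_b[b] for _, b in shared])
    rs_mrna = spearman([mrna_a[a] for a, _ in shared],
                       [mrna_b[b] for _, b in shared])
    return rs_protein, rs_mrna, len(shared)

def wilcoxon_signed_rank(x, y):
    """Two-sided paired Wilcoxon signed-rank test, normal approximation.
    Zero differences are dropped; no tie correction (an assumption of this
    sketch). Needs at least one non-zero difference. Returns (W_plus, p)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    p = math.erfc(abs(w_plus - mu) / sigma / math.sqrt(2))
    return w_plus, p
```

In use, `conservation` would be called once per organism pair, and the resulting lists of protein-protein and mRNA-mRNA correlations (21 values each in this study) would be passed to `wilcoxon_signed_rank`. With so few pairs an exact test may be preferable to the normal approximation shown here.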
We attempted to rule out technical artifacts and confounding trends as explanations for our observations, as follows. The trend was also observed when we considered mRNA measurements based only on sequencing (SAGE and RNA-seq) rather than DNA microarrays (Figure 2B; only 3 such comparisons available). It was highly statistically significant when we considered average mRNA abundance measurements obtained by multiple techniques (i.e., mixing microarrays and SAGE or RNA-seq; p < 0.0001, Table S4), and it remained significant when we omitted any one organism (all p < 0.01). To control for errors in assigning orthology, we considered an alternate method of calculating orthologs (p = 0.025, Table S5); both orthology methods behaved similarly and showed a similarly significant trend. Finally, both mRNA and protein abundances are known to be inversely correlated with gene length. To eliminate the possibility that our observation is due to correlation with this third variable, we measured the partial correlations for either protein or mRNA levels given gene lengths; again, protein levels were significantly better conserved than mRNA levels even after correcting for gene length (p = 0.018, Table S6). In addition, for all comparisons described above, protein abundance correlations were significantly higher than mRNA abundance correlations (p < 0.05, paired Wilcoxon) regardless of whether all observations were considered or only correlations with significant p-values (Tables S4 to S7).
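The gene-length correction can be illustrated with the standard first-order partial correlation formula. The text does not state how the two gene lengths of an ortholog pair (one per species) entered the calculation, so the sketch below makes a simplifying assumption and treats gene length as a single covariate z. The function name and the example coefficients are hypothetical.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z.

    r_xy: correlation of abundances between the two species
    r_xz, r_yz: correlation of each species' abundances with gene length z
    (Inputs would be rank correlations in a rank-based analysis.)
    """
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denom

# Hypothetical coefficients, for illustration only (not values from the study).
r_protein = partial_correlation(0.60, -0.30, -0.30)
r_mrna = partial_correlation(0.40, -0.30, -0.30)
print(f"protein: {r_protein:.3f}, mRNA: {r_mrna:.3f}")
```

If the abundance correlation between species were driven entirely by a shared dependence on gene length, the partial correlation would fall toward zero; a protein-protein partial correlation that stays above the mRNA-mRNA one is what the corrected comparison reports.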
To investigate whether the differences in correlations are due to differences in the underlying measurement errors, we assessed measurement reliability (for a subset of the data) through correlation analysis of technical replicates. Measurements of mRNA concentrations tend to be more reproducible than measurements of protein concentrations (Rs = 0.99 and 0.80, respectively, Figure S1), arguing against general measurement error as an explanation for the lower mRNA-mRNA correlations. We occasionally observed a contribution from expression level. For the fly-nematode comparison, for example, the difference in correlation coefficient is most pronounced for the least abundant mRNAs and proteins; conversely, highly expressed proteins and mRNAs are similarly conserved in their abundance across the two organisms. However, this trend did not hold for all organism pairs (data not shown).
Higher conservation of protein abundances suggests that abundances of proteins are to some degree optimized and that evolutionary pressure helps to maintain these levels despite changing mRNA levels, consistent with the only partial correlation of mRNA and protein levels within a species. Extensive regulation of protein abundances must therefore compensate for divergent mRNA expression levels to maintain proteins at favored levels. It remains to be seen whether evolutionary or molecular signatures of such compensatory regulation can be detected. For example, it has been speculated that transcriptional bursts, observed to increase variance in mRNA abundances, may be buffered by long protein half-lives. Furthermore, divergence of mRNA expression levels is a well-known evolutionary process, and a remarkable conservation of protein expression levels across organisms has been observed recently. Within a population of organisms of the same species, variation in mRNA abundances may be a mechanism to increase molecular diversity so as to improve chances of survival under stress conditions. Under normal conditions, less varied protein expression levels are presumably needed for proper cellular function, with variation of mRNA expression buffered by mechanisms that are yet to be defined. Finally, these data also suggest that for conserved genes, direct assessment of protein levels may often be more informative of the cellular state than analysis of mRNA levels, despite the widespread use of mRNA expression levels as proxy measurements for protein expression levels.
We thank Sabine Schrimpf and colleagues for providing data from their publication. This work was supported by grants from the N.S.F., N.I.H., and Welch (F1515) and Packard Foundations to E.M.M., NIH grant #GM55962 to PCR, and NIH grant #5R01AI075068 to MW. MW is a Burroughs Wellcome Investigator in the Pathogenesis of Infectious Disease.