Related Articles

1.  Comparing Protein and mRNA Abundances to Protein Expression Regulation 
Transcription, mRNA decay, translation and protein degradation are essential processes during eukaryotic gene expression, but their relative global contributions to steady-state protein concentrations in multi-cellular eukaryotes are largely unknown. Using measurements of absolute protein and mRNA abundances in cellular lysate from the human Daoy medulloblastoma cell line, we quantitatively evaluate the impact of mRNA concentration and sequence features implicated in translation and protein degradation on protein expression. Sequence features related to translation and protein degradation have an impact similar to that of mRNA abundance, and their combined contribution explains two-thirds of protein abundance variation. mRNA sequence lengths, amino-acid properties, upstream open reading frames and secondary structures in the 5' untranslated region (UTR) were the strongest individual correlates of protein concentrations. In a combined model, characteristics of the coding region and the 3'UTR explained a larger proportion of protein abundance variation than characteristics of the 5'UTR. Further, we used data from human and six other organisms (bacteria, yeast, worm, fly, and plant) and established that steady-state abundances of proteins show significantly higher correlation across these diverse phylogenetic taxa than the abundances of their corresponding mRNAs (p=0.0008, paired Wilcoxon). These data suggest strong selective pressure to maintain protein abundances during evolution, even when mRNA abundances diverge. The absolute protein and mRNA concentration measurements for >1000 human genes and for other organisms represent one of the largest datasets currently available, and reveal both general trends and specific examples of post-transcriptional regulation.
PMCID: PMC3186639
2.  A dynamic model of proteome changes reveals new roles for transcript alteration in yeast 
By characterizing dynamic changes in yeast protein abundance following osmotic shock, this study shows that the correlation between protein and mRNA differs for transcripts that increase versus decrease in abundance, and reveals physiological reasons for these differences.
The correlation between protein and mRNA change is very high at transcripts that increase in abundance, but negligible at reduced transcripts following NaCl shock. Modeling and experimental data suggest that reducing levels of high-abundance transcripts helps to direct translational machinery to newly made transcripts. The transient burst of transcript increase serves to accelerate changes in protein abundance. Post-transcriptional regulation of protein abundance is pervasive, although most of the variance in protein change is explained by changes in mRNA abundance.
Natural microenvironments change rapidly, and living creatures must respond quickly and efficiently to thrive within this flux. At all cellular levels—signaling, transcription, translation, metabolism, cell growth, and division—the response is dynamic and coordinated. Some aspects of this response, such as dynamic changes of the transcriptome, are well understood. But other aspects, like the response of the proteome, have remained obscured primarily because of previous limitations in technology. Without coordinated time-course data, it has remained impossible to correctly characterize the correlations and dependencies between these two essential levels of cell biology.
This work presents an extended picture of the coordinated response of the transcriptome and proteome as cells respond to an abrupt environmental change. To assay proteomic dynamics, we developed a strategy for large-scale, multiplexed quantitation using isobaric tags and high mass accuracy mass spectrometry. This sensitive yet efficient platform allows for the expedient collection of quantitative time-course proteomic data at six time points, sufficiently reproducible to permit meaningful interpretation of variation across biological replicates. Time-course transcriptome data were generated from paired biological samples, allowing us to examine the relationships between changes in mRNA and protein for each gene in terms of direction and intensity, as well as the characteristics of the temporal profiles for each gene.
It was immediately obvious that a single measure of correlation across the entire data set was a meaningless metric. We therefore analyzed relationships between mRNA and protein for different subsets of data. In response to osmotic shock, hundreds of transcripts are highly induced, and their temporal pattern reveals a transient peak of maximal induction, which resolves into a new elevated level as cells acclimate (Figure 2). For this group of genes, there is extremely high correlation between peak mRNA change and protein change (R2∼0.8). But the dynamics of the molecules differ: while mRNA levels transiently overshoot their final levels, proteins gradually rise in abundance toward their new, elevated state. We observed, however, that a measure of efficiency connects the two profiles. The time it takes for a protein to acclimate to its new state correlates with the magnitude of the excess mRNA induction. Thus, the cell imparts an urgency to protein induction by transiently producing excess transcript.
The most surprising result, however, involves transcripts that decrease in abundance. In response to osmotic shock, the cell transiently reduces over 600 transcripts, many of which are among the most highly expressed in unstressed cells. But protein levels for these genes remain, for the most part, almost completely unchanged. The stark absence of protein repression is independent of basal protein abundance, independent of reported protein half-lives, reproducible across biological replicates, and validated by quantitative western blots. Furthermore, since we do detect a handful of proteins whose abundance is significantly reduced, our technology is capable of identifying protein loss. Thus, we conclude that transcript reduction serves another purpose besides reducing protein levels.
To explore alternate interpretations of the consequence of transcriptional repression, we devised a mass-action kinetic model, which describes protein changes based on mRNA dynamics in the context of transient changes in the rates of cell division. The model successfully recapitulated the observed data, allowing us to alter modeling parameters to test various hypotheses.
In response to osmotic shock, overall rates of translation temporarily decrease and cell growth transiently arrests before resuming at a slower rate. We reasoned that mRNA reduction might lower the rate of new protein synthesis, but that retarded production is balanced by reduced cell division. We explored both aspects of this logic with our model.
As expected, removing cell division from our model led to a calculated decrease of protein levels, indicating that reduced growth is necessary for maintaining protein levels. However, when we computationally held mRNA levels stable and calculated protein levels in the absence of mRNA repression, we did not find the expected increase in protein abundance.
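The balance between slower synthesis and slower dilution reasoned about above can be illustrated with a minimal sketch. It assumes a single stable protein whose level follows dP/dt = k_tl * m - (k_deg + mu) * P, with a step change in mRNA level and growth rate at time zero. This is not the authors' model: the function name, the rate constants, and the 90-minute doubling time are illustrative assumptions.

```python
import math

def simulate_protein(mrna_fold, growth_fold, k_tl=1.0, k_deg=0.0,
                     doubling_min=90.0, t_end=120.0, dt=0.1):
    """Euler-integrate dP/dt = k_tl*m - (k_deg + mu)*P after a step change.

    mrna_fold and growth_fold scale the pre-shock mRNA level and growth
    rate. Returns the final protein level relative to the pre-shock
    steady state. All parameter values are hypothetical.
    """
    mu0 = math.log(2) / doubling_min       # pre-shock dilution rate (per min)
    p0 = k_tl * 1.0 / (k_deg + mu0)        # pre-shock steady state (mRNA = 1)
    m = 1.0 * mrna_fold                    # post-shock mRNA level
    mu = mu0 * growth_fold                 # post-shock dilution rate
    p = p0
    for _ in range(int(round(t_end / dt))):
        p += dt * (k_tl * m - (k_deg + mu) * p)
    return p / p0

# Halved mRNA with halved growth: synthesis and dilution both drop.
balanced = simulate_protein(mrna_fold=0.5, growth_fold=0.5)
# Halved mRNA with unchanged growth: dilution outpaces synthesis.
unbalanced = simulate_protein(mrna_fold=0.5, growth_fold=1.0)
print(f"mRNA halved, growth halved:    {balanced:.2f} of pre-shock protein")
print(f"mRNA halved, growth unchanged: {unbalanced:.2f} of pre-shock protein")
```

In this toy setting the protein level holds only when the drop in synthesis is matched by a drop in dilution. It does not capture the global changes in translation rate that the study's model includes.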
We then considered the possibility that one function of the regulated repression of these highly abundant transcripts was to liberate proteins essential for translation, such as ribosomes or translation initiation factors. To explore this, we examined a mutant lacking the Dot6p/Tod6p transcriptional repressors, which fails to properly repress ∼250 genes in response to osmotic shock. In the wild type, the mRNA for a Dot6p/Tod6p target (ARX1) decreased seven-fold, and the remaining transcript was generally unassociated with poly-ribosomes. In the mutant, however, the mRNA levels were reduced only two-fold, while the remaining transcript continued to bind ribosomes. Therefore, failure to reduce transcript levels led to a persistent association with poly-ribosomes, thereby consuming translational machinery.
Our hypothesis is, therefore, that widespread changes in the transcriptome promote efficient translation of new proteins. Transcript increase serves to increase abundance of the encoded proteins, while reduction of some of the most abundant and highly translated mRNAs supports this project by liberating translational capacity. While it is not clear what factors are the limiting elements, it is clear that a full picture of cellular biology requires exploring the dynamics of the cellular response.
The transcriptome and proteome change dynamically as cells respond to environmental stress; however, prior proteomic studies reported poor correlation between mRNA and protein, rendering their relationships unclear. To address this, we combined high mass accuracy mass spectrometry with isobaric tagging to quantify dynamic changes in ∼2500 Saccharomyces cerevisiae proteins, in biological triplicate and with paired mRNA samples, as cells acclimated to high osmolarity. Surprisingly, while transcript induction correlated extremely well with protein increase, transcript reduction produced little to no change in the corresponding proteins. We constructed a mathematical model of dynamic protein changes and propose that the lack of protein reduction is explained by cell-division arrest, while transcript reduction supports redistribution of translational machinery. Furthermore, the transient 'burst' of mRNA induction after stress serves to accelerate change in the corresponding protein levels. We identified several classes of post-transcriptional regulation, but show that most of the variance in protein changes is explained by mRNA. Our results present a picture of the coordinated physiological responses at the levels of mRNA, protein, protein-synthetic capacity, and cellular growth.
PMCID: PMC3159980  PMID: 21772262
dynamics; modeling; proteomics; stress; transcriptomics
3.  Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line 
We provide a large-scale dataset on absolute protein and matching mRNA concentrations from the human medulloblastoma cell line Daoy. The correlation between mRNA and protein concentrations is significant and positive (Rs=0.46, R2=0.29, P-value<2e−16), although non-linear. Out of ∼200 tested sequence features, sequence length, frequency and properties of amino acids, as well as translation initiation-related features are the strongest individual correlates of protein abundance when accounting for variation in mRNA concentration. When integrating mRNA expression data and all sequence features into a non-parametric regression model (Multivariate Adaptive Regression Splines), we were able to explain up to 67% of the variation in protein concentrations. Half of the contributions were attributed to mRNA concentrations, the other half to sequence features relating to regulation of translation and protein degradation. The sequence features are primarily linked to the coding and 3′ untranslated region. To our knowledge, this is the most comprehensive predictive model of human protein concentrations achieved so far.
mRNA decay, translation regulation and protein degradation are essential parts of eukaryotic gene expression regulation (Hieronymus and Silver, 2004; Mata et al, 2005), which enable the dynamics of cellular systems and their responses to external and internal stimuli without having to rely exclusively on transcription regulation. The importance of these processes is emphasized by the generally low correlation between mRNA and protein concentrations. For many prokaryotic and eukaryotic organisms, <50% of the variation in protein abundance is explained by variation in mRNA concentrations (de Sousa Abreu et al, 2009).
Given the plethora of regulatory mechanisms involved, most studies have focused so far on individual regulators and specific targets. Particularly in human, we currently lack system-wide, quantitative analyses that evaluate the relative contribution of regulatory elements encoded in the mRNA and protein sequence. Existing studies have been carried out only in bacteria and yeast (Nie et al, 2006; Brockmann et al, 2007; Tuller et al, 2007; Wu et al, 2008). Here, we present the first comprehensive analysis on the impact of translation and protein degradation on protein abundance variation in a human cell line. For this purpose, we experimentally measured absolute protein and mRNA concentrations in the Daoy medulloblastoma cell line, using shotgun proteomics and microarrays, respectively (Figure 1). These data comprise one of the largest such sets available today for human. We focused on sequence features that likely impact protein translation and protein degradation, including length, nucleotide composition, structure of the untranslated regions (UTRs), coding sequence, composition of the translation initiation site, presence of upstream open reading frames, putative target sites of miRNAs, codon usage, amino-acid composition and protein degradation signals.
Three types of tests have been conducted: (a) we examined partial Spearman's rank correlation of numerical features (e.g. length) with protein concentration, accounting for variation in mRNA concentrations; (b) for numerical and categorical features (e.g. function), we compared two extreme populations with Welch's t-test and (c) using a Multivariate Adaptive Regression Splines model, we analyzed the combined contributions of mRNA expression and sequence features to protein abundance variation (Figure 1). To account for the non-linearity of many relationships, we use non-parametric approaches throughout the analysis.
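Test (a) above can be sketched as follows. This is a minimal illustration of a first-order partial Spearman rank correlation, computed from the three pairwise rank correlations; it is not the authors' pipeline, and the helper names and the toy values for length, protein and mRNA are made up for the example.

```python
import math

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

def partial_spearman(x, y, z):
    """Rank correlation of x and y after accounting for z."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical toy data: sequence length, protein and mRNA concentration.
length = [300, 1200, 450, 2500, 800, 1500]
protein = [9.1, 5.2, 7.5, 3.9, 8.0, 4.4]
mrna = [5.0, 3.1, 4.2, 3.5, 4.9, 2.8]
print(f"partial Spearman (length vs protein | mRNA): "
      f"{partial_spearman(length, protein, mrna):.2f}")
```

Working on ranks rather than raw values keeps the measure insensitive to the non-linear relationships mentioned above.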
We observed a significant positive correlation between mRNA and protein concentrations, larger than many previous measurements (de Sousa Abreu et al, 2009). We also show that the contribution of translation and protein degradation is at least as important as the contribution of mRNA transcription and stability to the abundance variation of the final protein products. Although variation in mRNA expression explains ∼25–30% of the variation in protein abundance, another 30–40% can be accounted for by characteristics of the sequences, which we identified in a comparative assessment of global correlates. Among these characteristics, sequence length, amino-acid frequencies and also nucleotide frequencies in the coding region are of strong influence (Figure 3A). Characteristics of the 3′UTR and of the 5′UTR, that is length, nucleotide composition and secondary structures, describe another part of the variation, leaving 33% expression variation unexplained. The unexplained fraction may be accounted for by mechanisms not considered in this analysis (e.g. regulation by RNA-binding proteins or gene-specific structural motifs), as well as expression and measurement noise.
Our combined model including mRNA concentration and sequence features can explain 67% of the variation of protein abundance in this system—and thus has the highest predictive power for human protein abundance achieved so far (Figure 3B).
Transcription, mRNA decay, translation and protein degradation are essential processes during eukaryotic gene expression, but their relative global contributions to steady-state protein concentrations in multi-cellular eukaryotes are largely unknown. Using measurements of absolute protein and mRNA abundances in cellular lysate from the human Daoy medulloblastoma cell line, we quantitatively evaluate the impact of mRNA concentration and sequence features implicated in translation and protein degradation on protein expression. Sequence features related to translation and protein degradation have an impact similar to that of mRNA abundance, and their combined contribution explains two-thirds of protein abundance variation. mRNA sequence lengths, amino-acid properties, upstream open reading frames and secondary structures in the 5′ untranslated region (UTR) were the strongest individual correlates of protein concentrations. In a combined model, characteristics of the coding region and the 3′UTR explained a larger proportion of protein abundance variation than characteristics of the 5′UTR. The absolute protein and mRNA concentration measurements for >1000 human genes described here represent one of the largest datasets currently available, and reveal both general trends and specific examples of post-transcriptional regulation.
PMCID: PMC2947365  PMID: 20739923
gene expression regulation; protein degradation; protein stability; translation
4.  Determinants of Protein Abundance and Translation Efficiency in S. cerevisiae 
PLoS Computational Biology  2007;3(12):e248.
The translation efficiency of most Saccharomyces cerevisiae genes remains fairly constant across poor and rich growth media. This observation has led us to revisit the available data and to examine the potential utility of a protein abundance predictor in reinterpreting existing mRNA expression data. Our predictor is based on large-scale data of mRNA levels, the tRNA adaptation index, and the evolutionary rate. It attains a correlation of 0.76 with experimentally determined protein abundance levels on unseen data and successfully cross-predicts protein abundance levels in another yeast species (Schizosaccharomyces pombe). The predicted abundance levels of proteins in known S. cerevisiae complexes, and of interacting proteins, are significantly more coherent than their corresponding mRNA expression levels. Analysis of gene expression measurement experiments using the predicted protein abundance levels yields new insights that are not readily discernable when clustering the corresponding mRNA expression levels. Comparing protein abundance levels across poor and rich media, we find a general trend for homeostatic regulation where transcription and translation change in a reciprocal manner. This phenomenon is more prominent near origins of replications. Our analysis shows that in parallel to the adaptation occurring at the tRNA level via the codon bias, proteins do undergo a complementary adaptation at the amino acid level to further increase their abundance.
Author Summary
DNA microarrays measuring gene expression levels have been a mainstay of systems biology research, but since proteins are more direct mediators of cellular processes, protein abundance levels are likely to be a better indicator of the cellular state. However, as proteomic measurements are still lagging behind gene expression measurements, there has been considerable effort in recent years to study the correlations between gene expression (and a plethora of protein characteristics) and protein abundance. Addressing this challenge, the current study is one of the first to introduce a predictor for protein abundance levels that is tested and validated on unseen data using all currently available large-scale proteomic data. The utility of this predictor is shown via a comprehensive set of tests and applications, including improved functional coherency of complexes and interacting proteins, better fit with gene phenotypic data, cross-species prediction of protein abundance, and most importantly, the reinterpretation of existing gene expression microarray data. Finally, our revisit and analysis of the existing large-scale proteomic data reveals new key insights concerning the regulation of translation efficiency and its evolution. Overall, a solid protein abundance prediction tool is invaluable for advancing our understanding of cellular processes; this study presents a further step in this direction.
PMCID: PMC2230678  PMID: 18159940
5.  Quantification of mRNA and protein and integration with protein turnover in a bacterium 
Determination of the average cellular copy number of 400 proteins under different growth conditions and integration with protein turnover and absolute mRNA levels reveals the dynamics of protein expression in the genome-reduced bacterium Mycoplasma pneumoniae.
Our study provides a fine-grained, quantitative picture to unprecedented detail in an established model organism for systems-wide studies. Our integrative approach reveals a novel, dynamic view on the processes, interactions and regulations underlying the central dogma pathway and the composition of protein complexes. Simulations using our quantitative data on mRNA, protein and turnover show how an organism copes with stochastic noise in gene expression in vivo. Our data serve as an important resource for colleagues both within our field of research and in related disciplines.
A hallmark of Systems Biology is the integration of diverse, large quantitative data sets with the aim to gain novel insights into how biological processes work. We measured individual mRNA and protein abundances as well as protein turnover in the bacterium Mycoplasma pneumoniae. This human pathogen is an ideal model organism for organism-wide studies. It can be readily cultured under laboratory conditions and it has a very small genome with only 690 protein-coding genes. This comparably low complexity allows for the exhaustive analysis of major cellular biomolecules avoiding constraints introduced by limitations of available analysis techniques.
Using a recently developed mass spectrometry-based approach, we determined the average cellular copy number for over 400 individual proteins under different growth and stress conditions. The 20 most abundant proteins, including Elongation factor Tu, cellular chaperones, and proteins involved in metabolizing glucose, the major energy source of M. pneumoniae, account for nearly 44% of the total cellular protein mass. We observed abundance changes of many expected and several unexpected proteins in response to cellular stress, such as heat shock, DNA damage and osmotic stress, as well as along batch culture growth over 4 days.
Integration of the protein abundance data with quantitative mRNA measurements revealed a modest correlation between these two classes of biomolecules. However, for several classical stress-induced proteins, we observed a correlated induction of mRNA and protein in response to heat shock. A focused analysis of mRNA–protein abundance dynamics during batch culture growth suggested that the regulation of gene expression is largely decoupled from protein dynamics in M. pneumoniae, indicating extensive post-transcriptional and post-translational regulation influencing the cellular mRNA–protein ratios.
To investigate the factors influencing the cellular protein abundance, we measured individual protein turnover rates by mass spectrometry using a label-chase approach involving stable isotope-labelled amino acids. The average half-life of a protein in M. pneumoniae is 23 h. Based on the measured quantitative mRNA data, the protein abundances and their half-lives, we established an ordinary differential equations model for the estimation of individual in vivo protein degradation and translation efficiency rates. We found that translation efficiency rather than protein turnover is the dominating factor influencing protein abundance. Using our abundance and turnover data, we additionally performed stochastic simulations of gene expression. We observed that long protein half-life and low translational efficiency buffer gene expression noise propagating from low cellular mRNA levels in vivo.
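One way to see how abundance, half-life and mRNA level combine into a translation efficiency estimate is the sketch below. It assumes the simplest possible balance, dP/dt = k_tl * m - k_deg * P at steady state, and ignores dilution by growth; the study's actual equations are not given here, so the function names and the copy numbers are hypothetical. Only the 23 h average half-life comes from the text.

```python
import math

def degradation_rate(half_life_h):
    """First-order degradation rate (per hour) from a half-life in hours."""
    return math.log(2) / half_life_h

def translation_efficiency(protein_copies, mrna_copies, half_life_h):
    """Proteins made per mRNA per hour, assuming steady state.

    At steady state of dP/dt = k_tl*m - k_deg*P, synthesis equals
    degradation, so k_tl = k_deg * P / m.
    """
    return degradation_rate(half_life_h) * protein_copies / mrna_copies

# Hypothetical copy numbers per cell; 23 h is the average half-life quoted above.
k_tl = translation_efficiency(protein_copies=500, mrna_copies=0.5, half_life_h=23.0)
print(f"estimated translation efficiency: {k_tl:.1f} proteins per mRNA per hour")
```

Under this assumption, a long half-life means that few new proteins per hour are enough to sustain a given abundance.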
We compared the abundance ratios of proteins associating into complexes in vivo with their expected functional stoichiometries. We observed that for stable protein complexes, such as the GroEL/ES chaperonin or DNA gyrase, our measured abundance ratios reflected the expected subunit stoichiometries. More dynamic protein complexes, such as the DnaK/J/GrpE chaperone system or RNA polymerase, showed several unusual subunit ratios, pointing towards transient interaction of sub-stoichiometric subunits for function. A detailed, quantitative analysis of the ribosome, the largest cellular protein complex, revealed large abundance differences of the 51 subunits. This observation indicates a multi-functionality for several, abundant ribosomal proteins.
Finally, a comparison of the determined average cellular protein abundances with a different pathogenic bacterium, Leptospira interrogans, revealed that cellular protein abundances closely reflect their respective lifestyles.
Our study represents an organism-wide, quantitative analysis of cellular protein abundances. Integrating our proteomics data with determined mRNA levels and protein turnover rates reveals insights into the dynamic interplay and regulation of mRNA and proteins, the central biomolecules of a cell.
Biological function and cellular responses to environmental perturbations are regulated by a complex interplay of DNA, RNA, proteins and metabolites inside cells. To understand these central processes in living systems at the molecular level, we integrated experimentally determined abundance data for mRNA, proteins, as well as individual protein half-lives from the genome-reduced bacterium Mycoplasma pneumoniae. We provide a fine-grained, quantitative analysis of basic intracellular processes under various external conditions. Proteome composition changes in response to cellular perturbations reveal specific stress response strategies. The regulation of gene expression is largely decoupled from protein dynamics and translation efficiency has a higher regulatory impact on protein abundance than protein turnover. Stochastic simulations using in vivo data show how low translation efficiency and long protein half-lives effectively reduce biological noise in gene expression. Protein abundances are regulated in functional units, such as complexes or pathways, and reflect cellular lifestyles. Our study provides a detailed integrative analysis of average cellular protein abundances and the dynamic interplay of mRNA and proteins, the central biomolecules of a cell.
PMCID: PMC3159969  PMID: 21772259
mRNA–protein; Mycoplasma pneumoniae; protein homeostasis; protein turnover; quantitative proteomics
6.  mRNA turnover rate limits siRNA and microRNA efficacy 
Based on a simple model of the mRNA life cycle, we predict that mRNAs with high turnover rates in the cell are more difficult to perturb with RNAi. We test this hypothesis using a luciferase reporter system and obtain additional evidence from a variety of large-scale data sets, including microRNA overexpression experiments and RT–qPCR-based efficacy measurements for thousands of siRNAs. Our results suggest that mRNA half-lives will influence how mRNAs are differentially perturbed whenever small RNA levels change in the cell, not only after transfection but also during differentiation, pathogenesis and normal cell physiology.
What determines how strongly an mRNA responds to a microRNA or an siRNA? We know that properties of the sequence match between the small RNA and the mRNA are crucial. However, large-scale validations of siRNA efficacies have shown that certain transcripts remain recalcitrant to perturbation even after repeated redesign of the siRNA (Krueger et al, 2007). Weak response to RNAi may thus be an inherent property of the mRNA, but the underlying factors have proven difficult to uncover.
siRNAs induce degradation by sequence-specific cleavage of their target mRNAs (Elbashir et al, 2001). MicroRNAs, too, induce mRNA degradation, and ∼80% of their effect on protein levels can be explained by changes in transcript abundance (Hendrickson et al, 2009; Guo et al, 2010). Given that multiple factors act simultaneously to degrade individual mRNAs, we here consider whether variable responses to micro/siRNA regulation may, in part, be explained simply by the basic dynamics of mRNA turnover. If a transcript is already under strong destabilizing regulation, it is theoretically possible that the relative change in abundance after the addition of a novel degrading factor would be less pronounced compared with a stable transcript (Figure 1). mRNA turnover is achieved by a multitude of factors, and the influence of such factors on targetability can be explored. However, their combined action, including yet unknown factors, is summarized into a single property: the mRNA decay rate.
First, we explored the theoretical relationship between the pre-existing turnover rate of an mRNA, and its expected susceptibility to perturbation by a small RNA. We assumed a basic model of the mRNA life cycle, in which the rate of transcription is constant and the rate of degradation is described by first-order kinetics. Under this model, the relative change in steady-state expression level will become smaller as the pre-existing decay rate grows larger, independent of the transcription rate. This relationship persists also if we assume various degrees of synergy and antagonism between the pre-existing factors and the external factor, with increasing synergism leading to transcripts being more equally targetable, regardless of their pre-existing decay rate.
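The basic model described above can be sketched numerically. The sketch assumes that the small RNA adds an independent first-order decay term to the pre-existing one (no synergy or antagonism), so that dm/dt = k_tx - (k_decay + k_sirna) * m. The names `k_tx`, `k_decay` and `k_sirna` and all numerical values are illustrative assumptions, not values from the study.

```python
import math

def steady_state(k_tx, k_decay):
    """Steady-state mRNA level for dm/dt = k_tx - k_decay * m."""
    return k_tx / k_decay

def remaining_fraction(k_tx, k_decay, k_sirna):
    """Fraction of the original steady-state level left once the small RNA
    contributes an additional, independent decay rate k_sirna."""
    before = steady_state(k_tx, k_decay)
    after = steady_state(k_tx, k_decay + k_sirna)
    return after / before

def decay_rate(half_life_min):
    """First-order decay rate (per minute) from a half-life in minutes."""
    return math.log(2) / half_life_min

K_SIRNA = 0.005  # hypothetical added decay rate, per minute
for half_life in (100, 500, 2000):
    fraction = remaining_fraction(1.0, decay_rate(half_life), K_SIRNA)
    print(f"t1/2 = {half_life:>4} min: {fraction:.0%} of mRNA remains")
```

The transcription rate cancels out of the ratio, which reduces to k_decay / (k_decay + k_sirna). A transcript that already decays quickly therefore keeps a larger share of its original level than a stable one.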
We next generated a series of four luciferase reporter constructs with destabilizing AU-rich elements (AREs) of various strengths incorporated into their 3′ UTRs. To evaluate how the different constructs would respond to perturbation, we performed co-transfections with an siRNA targeted at the coding region of the luciferase gene. This reduced the signal of the non-destabilized construct to 26% compared with a control siRNA. In contrast, the most destabilized construct showed 42% remaining reporter activity, and we could observe a dose–response relationship across the series.
The reporter experiment encouraged an investigation of this effect on real-world mRNAs. We analyzed a set of 2622 siRNAs, for which individual efficacies were determined using RT–qPCR 48 h post-transfection in HeLa cells. Of these, 1778 could be associated with an experimentally determined decay rate (Figure 4A). Although the overall correlation between the two variables was modest (Spearman's rank correlation rs=0.22, P<1e−20), we found that siRNAs directed at high-turnover (t1/2 <200 min) and medium-turnover (200 < t1/2 < 1000 min) transcripts were significantly less effective than those directed at low-turnover (t1/2 >1000 min) transcripts (P<8e−11 and 4e−9, respectively, two-tailed KS-test, Figure 4B). While 41.6% (498/1196) of the siRNAs directed at low-turnover transcripts reached 10% remaining expression or better, only 16.7% (31/186) of the siRNAs that targeted high-turnover mRNAs reached this high degree of silencing (Figure 4B). Reduced targetability (25.2%, 100/396) was also seen for transcripts with medium-turnover rate.
Our results based on siRNA data suggested that turnover rates could also influence microRNA targeting. By assembling genome-wide mRNA expression data from 20 published microRNA transfections in HeLa cells, we found that predicted target mRNAs with short and medium half-life were significantly less repressed after transfection than their long-lived counterparts (P<8e−5 and P<0.03, respectively, two-tailed KS-test). Specifically, 10.2% (293/2874) of long-lived targets versus 4.4% (41/942) of short-lived targets were strongly (z-score <−3) repressed. siRNAs are known to cause off-target effects that are mediated, in part, by microRNA-like seed complementarity (Jackson et al, 2006). We analyzed changes in transcript levels after transfection of seven different siRNAs, each with a unique seed region (Jackson et al, 2006). Putative 'off-targets' were identified by mapping of non-conserved seed matches in 3′ UTRs. We found that low-turnover mRNAs (t1/2 >1000 min) were more affected by seed-mediated off-target silencing than high-turnover mRNAs (t1/2 <200 min), with twice as many long-lived seed-containing transcripts (3.8 versus 1.9%) being strongly (z-score <−3) repressed.
In summary, mRNA turnover rates have an important influence on the changes exerted by small RNAs on mRNA levels. It can be assumed that mRNA half-lives will influence how mRNAs are differentially perturbed whenever small RNA levels change in the cell, not only after transfection but also during differentiation, pathogenesis and normal cell physiology.
The microRNA pathway participates in basic cellular processes and its discovery has enabled the development of si/shRNAs as powerful investigational tools and potential therapeutics. Based on a simple kinetic model of the mRNA life cycle, we hypothesized that mRNAs with high turnover rates may be more resistant to RNAi-mediated silencing. The results of a simple reporter experiment strongly supported this hypothesis. We followed this with a genome-wide scale analysis of a rich corpus of experiments, including RT–qPCR validation data for thousands of siRNAs, siRNA/microRNA overexpression data and mRNA stability data. We find that short-lived transcripts are less affected by microRNA overexpression, suggesting that microRNA target prediction would be improved if mRNA turnover rates were considered. Similarly, short-lived transcripts are more difficult to silence using siRNAs, and our results may explain why certain transcripts are inherently recalcitrant to perturbation by small RNAs.
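The kinetic argument can be illustrated with a minimal sketch. It assumes a one-compartment model that is not spelled out in this abstract: constant synthesis rate s, first-order basal decay k_d = ln 2 / t1/2, and an additional first-order decay k_si induced by the small RNA. The function names and the value of K_SI are hypothetical and chosen only for illustration.

```python
import math

def decay_rate(half_life_min):
    """First-order decay constant (per minute) from a half-life in minutes."""
    return math.log(2) / half_life_min

def remaining_fraction(half_life_min, k_si):
    """Steady-state mRNA level under silencing, relative to the unperturbed level.

    Assumed model: dm/dt = s - (k_d + k_si) * m, so m_ss = s / (k_d + k_si).
    Dividing by the unperturbed steady state s / k_d gives k_d / (k_d + k_si).
    """
    k_d = decay_rate(half_life_min)
    return k_d / (k_d + k_si)

K_SI = 0.005  # hypothetical siRNA-induced decay rate, per minute

# Example half-lives picked inside the three turnover classes used in the text.
for label, t_half in [("high-turnover", 100),
                      ("medium-turnover", 500),
                      ("low-turnover", 2000)]:
    frac = remaining_fraction(t_half, K_SI)
    print(f"{label:16s} t1/2={t_half:5d} min  remaining fraction={frac:.2f}")
```

Under these assumptions the same added decay rate leaves a larger fraction of a short-lived transcript in place, because its basal decay already dominates total turnover. This is one simple reading of the hypothesis, not necessarily the model used in the paper.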
PMCID: PMC3010119  PMID: 21081925
microRNA; mRNA decay; RNAi; siRNA
7.  Comparative Functional Analysis of the Caenorhabditis elegans and Drosophila melanogaster Proteomes 
PLoS Biology  2009;7(3):e1000048.
The nematode Caenorhabditis elegans is a popular model system in genetics, not least because a majority of human disease genes are conserved in C. elegans. To generate a comprehensive inventory of its expressed proteome, we performed extensive shotgun proteomics and identified more than half of all predicted C. elegans proteins. This allowed us to confirm and extend genome annotations, characterize the role of operons in C. elegans, and semiquantitatively infer abundance levels for thousands of proteins. Furthermore, for the first time to our knowledge, we were able to compare two animal proteomes (C. elegans and Drosophila melanogaster). We found that the abundances of orthologous proteins in metazoans correlate remarkably well, better than protein abundance versus transcript abundance within each organism or transcript abundances across organisms; this suggests that changes in transcript abundance may have been partially offset during evolution by opposing changes in protein abundance.
Author Summary
Proteins are the active players that execute the genetic program of a cell, and their levels and interactions are precisely controlled. Routinely monitoring thousands of proteins is difficult, as they can be present at vastly different abundances, come with various sizes, shapes, and charge, and have a more complex alphabet of twenty “letters,” in contrast to the four letters of the genome itself. Here, we used mass spectrometry to extensively characterize the proteins of a popular model organism, the nematode Caenorhabditis elegans. Together with previous data from the fruit fly Drosophila melanogaster, this allows us to compare the protein levels of two animals on a global scale. Surprisingly, we find that individual protein abundance is highly conserved between the two species. So, although worms and flies look very different, they need similar amounts of each conserved, orthologous protein. Because many C. elegans and D. melanogaster proteins also have counterparts in humans, our results suggest that similar rules may apply to our own proteins.
A quantitative comparison of two animal proteomes shows a striking correlation of protein abundance levels, a better correlation than transcript levels. Are the latter more variable during evolution?
PMCID: PMC2650730  PMID: 19260763
8.  Spliced Leader Trapping Reveals Widespread Alternative Splicing Patterns in the Highly Dynamic Transcriptome of Trypanosoma brucei 
PLoS Pathogens  2010;6(8):e1001037.
Trans-splicing of leader sequences onto the 5′ ends of mRNAs is a widespread phenomenon in protozoa, nematodes and some chordates. Using parallel sequencing, we have developed a method to simultaneously map 5′ splice sites and analyze the corresponding gene expression profile, which we term spliced leader trapping (SLT). The method can be applied to any organism with a sequenced genome and trans-splicing of a conserved leader sequence. We analyzed the expression profiles and splicing patterns of bloodstream and insect forms of the parasite Trypanosoma brucei. We detected the 5′ splice sites of 85% of the annotated protein-coding genes and, contrary to previous reports, found up to 40% of transcripts to be differentially expressed. Furthermore, we discovered more than 2500 alternative splicing events, many of which appear to be stage-regulated. Based on our findings we hypothesize that alternatively spliced transcripts present a new means of regulating gene expression and could potentially contribute to protein diversity in the parasite. The entire dataset can be accessed online at TriTrypDB or through:
Author Summary
Some organisms, like the human and animal parasite Trypanosoma brucei, add a leader sequence to their mRNAs through a reaction called trans-splicing. Until now the splice sites for most mRNAs were unknown in T. brucei. Using high-throughput sequencing we have developed a method to identify the splice sites and at the same time measure the abundance of the corresponding mRNAs. Analyzing three different life cycle stages of the parasite we identified the vast majority of splice sites in the organism and, to our great surprise, uncovered more than 2500 alternative splicing events, many of which appeared to be specific for one of the life cycle stages. Alternative splicing is a result of the addition of the leader sequence to different positions on the mRNA, leading to mixed mRNA populations that can encode proteins with varying properties. One of the most obvious changes caused by alternative splicing is the gain or loss of targeting signals, leading to differential localization of the corresponding proteins. Based on our findings we hypothesize that alternative splicing is a major mechanism to regulate gene expression in T. brucei and could contribute to protein diversity in the parasite.
PMCID: PMC2916883  PMID: 20700444
9.  Concordant Regulation of Translation and mRNA Abundance for Hundreds of Targets of a Human microRNA 
PLoS Biology  2009;7(11):e1000238.
A specific microRNA reduces the synthesis of hundreds of proteins via concordant effects on the abundance and translation of the mRNAs that encode them.
MicroRNAs (miRNAs) regulate gene expression posttranscriptionally by interfering with a target mRNA's translation, stability, or both. We sought to dissect the respective contributions of translational inhibition and mRNA decay to microRNA regulation. We identified direct targets of a specific miRNA, miR-124, by virtue of their association with Argonaute proteins, core components of miRNA effector complexes, in response to miR-124 transfection in human tissue culture cells. In parallel, we assessed mRNA levels and obtained translation profiles using a novel global approach to analyze polysomes separated on sucrose gradients. Analysis of translation profiles for ∼8,000 genes in these proliferative human cells revealed that basic features of translation are similar to those previously observed in rapidly growing Saccharomyces cerevisiae. For ∼600 mRNAs specifically recruited to Argonaute proteins by miR-124, we found reductions in both the mRNA abundance and inferred translation rate spanning a large dynamic range. The changes in mRNA levels of these miR-124 targets were larger than the changes in translation, with average decreases of 35% and 12%, respectively. Further, there was no identifiable subgroup of mRNA targets for which the translational response was dominant. Both ribosome occupancy (the fraction of a given gene's transcripts associated with ribosomes) and ribosome density (the average number of ribosomes bound per unit length of coding sequence) were selectively reduced for hundreds of miR-124 targets by the presence of miR-124. Changes in protein abundance inferred from the observed changes in mRNA abundance and translation profiles closely matched changes directly determined by Western analysis for 11 of 12 proteins, suggesting that our assays captured most of miR-124–mediated regulation. 
These results suggest that miRNAs inhibit translation initiation or stimulate ribosome drop-off preferentially near the start site and are not consistent with inhibition of polypeptide elongation, or nascent polypeptide degradation contributing significantly to miRNA-mediated regulation in proliferating HEK293T cells. The observation of concordant changes in mRNA abundance and translational rate for hundreds of miR-124 targets is consistent with a functional link between these two regulatory outcomes of miRNA targeting, and the well-documented interrelationship between translation and mRNA decay.
Author Summary
The human genome contains directions to regulate the timing and magnitude of expression of its thousands of genes. MicroRNAs are important regulatory RNAs that tune the expression levels of tens to hundreds of specific genes by pairing to complementary stretches in the messenger RNAs from these genes, thereby reducing their stability and their translation into protein. Although the importance of microRNAs is appreciated, little is known about the relative contributions of degradation or repression of translation of the cognate mRNAs to the overall effects on protein synthesis, or the links between these two regulatory mechanisms. We devised a simple, economical method to systematically measure mRNA translation profiles, then applied this method, in combination with gene expression analysis, to measure the effects of the human microRNA miR-124 on the abundance and apparent translation rate of its mRNA targets. We found that for the ∼600 mRNA targets of miR-124 that were identified by their association with microRNA effector complexes, around three quarters of the reduction in estimated protein synthesis was explained by changes in mRNA abundance. Although the apparent changes in translation efficiencies of the targeted mRNAs were smaller in magnitude, they were highly correlated with changes in the abundance of those RNAs, suggesting a functional link between microRNA-mediated repression of translation and mRNA decay.
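As a rough arithmetic check on the "three quarters" figure, the average decreases quoted in the abstract (35% for mRNA abundance, 12% for translation) can be combined. The sketch assumes the two effects multiply and attributes the reduction on a log scale. That is an assumption made here for illustration, not the authors' stated procedure, and averages of per-gene values need not combine exactly this way.

```python
import math

mrna_decrease = 0.35         # average decrease in mRNA abundance (from the abstract)
translation_decrease = 0.12  # average decrease in inferred translation (from the abstract)

def combined_decrease(d_mrna, d_transl):
    """Decrease in protein synthesis if the two effects multiply (an assumption)."""
    return 1.0 - (1.0 - d_mrna) * (1.0 - d_transl)

def mrna_share(d_mrna, d_transl):
    """Share of the total log-scale reduction attributable to mRNA abundance."""
    log_mrna = math.log(1.0 - d_mrna)
    log_transl = math.log(1.0 - d_transl)
    return log_mrna / (log_mrna + log_transl)

print(f"combined decrease: {combined_decrease(mrna_decrease, translation_decrease):.3f}")
print(f"mRNA share (log scale): {mrna_share(mrna_decrease, translation_decrease):.2f}")
```

With these inputs the mRNA share comes out at roughly 0.77, which is in line with the "around three quarters" stated above.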
PMCID: PMC2766070  PMID: 19901979
10.  Incorporating 16S Gene Copy Number Information Improves Estimates of Microbial Diversity and Abundance 
PLoS Computational Biology  2012;8(10):e1002743.
The abundance of different SSU rRNA (“16S”) gene sequences in environmental samples is widely used in studies of microbial ecology as a measure of microbial community structure and diversity. However, the genomic copy number of the 16S gene varies greatly – from one in many species to up to 15 in some bacteria and to hundreds in some microbial eukaryotes. As a result of this variation the relative abundance of 16S genes in environmental samples can be attributed both to variation in the relative abundance of different organisms, and to variation in genomic 16S copy number among those organisms. Despite this fact, many studies assume that the abundance of 16S gene sequences is a surrogate measure of the relative abundance of the organisms containing those sequences. Here we present a method that uses data on sequences and genomic copy number of 16S genes along with phylogenetic placement and ancestral state estimation to estimate organismal abundances from environmental DNA sequence data. We use theory and simulations to demonstrate that 16S genomic copy number can be accurately estimated from the short reads typically obtained from high-throughput environmental sequencing of the 16S gene, and that organismal abundances in microbial communities are more strongly correlated with estimated abundances obtained from our method than with gene abundances. We re-analyze several published empirical data sets and demonstrate that the use of gene abundance versus estimated organismal abundance can lead to different inferences about community diversity and structure and the identity of the dominant taxa in microbial communities. Our approach will allow microbial ecologists to make more accurate inferences about microbial diversity and abundance based on 16S sequence data.
Author Summary
Microbial ecologists cannot observe their study organisms directly, so they use molecular sequencing to measure the abundance of different microbes living in the wild. The most commonly used method for measuring the abundance of different microbes is to collect a DNA sample from an environment and sequence a particular gene, the 16S SSU rRNA gene (“16S”) from those samples. The abundance of 16S sequences from different microbes is then used as a surrogate measure of the abundance of the microbial taxa in the community. One problem with the use of the 16S gene as a measure of microbial abundance is that many microbes have multiple copies of the gene in their genome. Thus, variation in 16S gene abundances can be caused by both genomic copy number variation and variation in the abundance of organisms. In this study we present a computational method that allows estimation of the abundance and genomic 16S copy number of microbes based on environmental sequencing of the 16S gene. We use simulations and analysis of microbial community data sets to demonstrate that estimating the abundance of organisms from 16S data improves our ability to accurately measure the diversity and abundance of microbial communities.
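The core correction can be sketched in a few lines: divide each taxon's 16S read count by its genomic copy number, then renormalize. The paper estimates copy numbers by phylogenetic placement and ancestral state estimation. This sketch skips that step and takes copy numbers as given, and the taxa and counts are hypothetical.

```python
def relative_abundance(counts):
    """Normalize a dict of non-negative values so they sum to 1."""
    total = sum(counts.values())
    return {taxon: value / total for taxon, value in counts.items()}

def organism_abundance(gene_counts, copy_numbers):
    """Relative organismal abundance from 16S read counts.

    gene_counts:  dict taxon -> number of 16S reads observed
    copy_numbers: dict taxon -> genomic 16S copy number (assumed known here)
    """
    corrected = {t: gene_counts[t] / copy_numbers[t] for t in gene_counts}
    return relative_abundance(corrected)

# Hypothetical community: taxon_A carries 7 copies of the 16S gene per genome.
reads = {"taxon_A": 700, "taxon_B": 200, "taxon_C": 100}
copies = {"taxon_A": 7, "taxon_B": 2, "taxon_C": 1}

print("gene-based:    ", relative_abundance(reads))
print("organism-based:", organism_abundance(reads, copies))
```

In this toy community taxon_A looks dominant by gene abundance, yet all three taxa are equally abundant once copy number is taken into account, which is the kind of shift in inference the abstract describes.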
PMCID: PMC3486904  PMID: 23133348
11.  Impact of Nonsense-Mediated mRNA Decay on the Global Expression Profile of Budding Yeast 
PLoS Genetics  2006;2(11):e203.
Nonsense-mediated mRNA decay (NMD) is a eukaryotic mechanism of RNA surveillance that selectively eliminates aberrant transcripts coding for potentially deleterious proteins. NMD also functions in the normal repertoire of gene expression. In Saccharomyces cerevisiae, hundreds of endogenous RNA Polymerase II transcripts achieve steady-state levels that depend on NMD. For some, the decay rate is directly influenced by NMD (direct targets). For others, abundance is NMD-sensitive but without any effect on the decay rate (indirect targets). To distinguish between direct and indirect targets, total RNA from wild-type (Nmd+) and mutant (Nmd−) strains was probed with high-density arrays across a 1-h time window following transcription inhibition. Statistical models were developed to describe the kinetics of RNA decay. 45% ± 5% of RNAs targeted by NMD were predicted to be direct targets with altered decay rates in Nmd− strains. Parallel experiments using conventional methods were conducted to empirically test predictions from the global experiment. The results show that the global assay reliably distinguished direct versus indirect targets. Different types of targets were investigated, including transcripts containing adjacent, disabled open reading frames, upstream open reading frames, and those prone to out-of-frame initiation of translation. Known targeting mechanisms fail to account for all of the direct targets of NMD, suggesting that additional targeting mechanisms remain to be elucidated. 30% of the protein-coding targets of NMD fell into two broadly defined functional themes: those affecting chromosome structure and behavior and those affecting cell surface dynamics. Overall, the results provide a preview for how expression profiles in multi-cellular eukaryotes might be impacted by NMD. 
Furthermore, the methods for analyzing decay rates on a global scale offer a blueprint for new ways to study mRNA decay pathways in any organism where cultured cell lines are available.
Genes determine the structure of proteins through transcription and translation in which an RNA copy of the gene is made (mRNA) and then translated to make the protein. Cellular protein levels reflect the relative rates of mRNA synthesis and degradation, which are subject to multiple layers of controls. Mechanisms also exist to ensure the quality of each mRNA. One quality control mechanism called nonsense-mediated mRNA decay (NMD) triggers the rapid degradation of mRNAs containing coding errors that would otherwise lead to the production of non-functional or potentially deleterious proteins. NMD occurs in yeasts, plants, flies, worms, mice, and humans. In humans, NMD affects the etiology of genetic disorders by affecting the expression of genes that carry disease-causing mutations. Besides quality assurance, NMD plays another role in gene expression by controlling the abundance of hundreds of normal mRNAs that are devoid of coding errors. In this paper, the authors used DNA arrays to monitor the relative decay rates of all mRNAs in budding yeast and found a subset where decay rates were dependent on NMD. Many of the corresponding proteins perform related functional roles affecting both the structure and behavior of chromosomes and the structure and integrity of the cell surface.
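The distinction between direct and indirect targets rests on comparing decay rates between strains after transcription is shut off. A minimal sketch, assuming simple first-order decay and a log-linear least-squares fit (the statistical models actually used in the paper are not described here), is shown below. The time points and levels are synthetic.

```python
import math

def fit_decay(times_min, levels):
    """Least-squares fit of ln(level) = ln(A0) - k * t.

    Returns (k, half_life), with k in 1/min and half_life in minutes.
    """
    ys = [math.log(v) for v in levels]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, ys))
    k = -sxy / sxx
    return k, math.log(2) / k

def synthetic_levels(half_life_min, times_min, start=100.0):
    """Noise-free exponential decay, used only to exercise the fit."""
    k = math.log(2) / half_life_min
    return [start * math.exp(-k * t) for t in times_min]

times = [0, 10, 20, 40, 60]             # minutes after transcription inhibition
nmd_plus = synthetic_levels(5, times)   # hypothetical transcript in the Nmd+ strain
nmd_minus = synthetic_levels(20, times) # same transcript in the Nmd- strain

_, hl_plus = fit_decay(times, nmd_plus)
_, hl_minus = fit_decay(times, nmd_minus)
print(f"half-life Nmd+: {hl_plus:.1f} min, Nmd-: {hl_minus:.1f} min")
```

A transcript whose fitted half-life lengthens in the Nmd− strain would count as a direct target in this simplified picture. One whose abundance changes while its half-life stays the same would be an indirect target.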
PMCID: PMC1657058  PMID: 17166056
12.  Absolute quantification of microbial proteomes at different states by directed mass spectrometry 
The developed, directed mass spectrometry workflow allows the generation of consistent and system-wide quantitative maps of microbial proteomes in a single analysis. Application to the human pathogen L. interrogans revealed mechanistic proteome changes over time involved in pathogenic progression and antibiotic defense, and new insights about the regulation of absolute protein abundances within operons.
The developed, directed proteomic approach allowed consistent detection and absolute quantification of 1680 proteins of the human pathogen L. interrogans in a single LC–MS/MS experiment. The comparison of 25 extensive, consistent and quantitative proteome maps revealed new insights about the proteome changes involved in pathogenic progression and antibiotic defense of L. interrogans, and about the regulation of protein abundances within operons. The generated time-resolved data sets are compatible with pattern analysis algorithms developed for transcriptomics, including hierarchical clustering and functional enrichment analysis of the detected profile clusters. This is the first study that describes the absolute quantitative behavior of any proteome over multiple states and represents the most comprehensive proteome abundance pattern comparison for any organism to date.
Over the last decade, mass spectrometry (MS)-based proteomics has evolved as the method of choice for system-wide proteome studies and now allows for the characterization of several thousands of proteins in a single sample. Despite these great advances, redundant monitoring of protein levels over large sample numbers in a high-throughput manner remains a challenging task. New directed MS strategies have been shown to overcome some of the current limitations, thereby enabling the acquisition of consistent and system-wide data sets of proteomes with low-to-moderate complexity at high throughput.
In this study, we applied this integrated, two-stage MS strategy to investigate global proteome changes in the human pathogen L. interrogans. In the initial discovery phase, 1680 proteins (out of around 3600 gene products) could be identified (Schmidt et al, 2008) and, by focusing precious MS-sequencing time on the most dominant, specific peptides per protein, all proteins could be accurately and consistently monitored over 25 different samples within a few days of instrument time in the following scoring phase (Figure 1). Additionally, the co-analysis of heavy reference peptides enabled us to obtain absolute protein concentration estimates for all identified proteins in each perturbation (Malmström et al, 2009). The detected proteins did not show any biases against functional groups or protein classes, including membrane proteins, and span an abundance range of more than three orders of magnitude, a range that is expected to cover most of the L. interrogans proteome (Malmström et al, 2009).
To elucidate mechanistic proteome changes over time involved in pathogenic progression and antibiotic defense of L. interrogans, we generated time-resolved proteome maps of cells perturbed with serum and three different antibiotics at sublethal concentrations that are currently used to treat leptospirosis. This yielded an information-rich proteomic data set that describes, for the first time, the absolute quantitative behavior of any proteome over multiple states, and represents the most comprehensive proteome abundance pattern comparison for any organism to date. Using this unique property of the data set, we could quantify protein components of entire pathways across several time points and subject the data sets to cluster analysis, a tool that was previously limited to the transcript level due to incomplete sampling at the protein level (Figure 4). Based on these analyses, we could demonstrate that Leptospira cells adjust the cellular abundance of a certain subset of proteins and pathways as a general response to stress while other parts of the proteome respond highly specifically. The cells furthermore react to individual treatments by ‘fine tuning’ the abundance of certain proteins and pathways in order to cope with the specific cause of stress. Intriguingly, the most specific and significant expression changes were observed for proteins involved in motility, tissue penetration and virulence after serum treatment, where we tried to simulate the host environment. While many of the detected protein changes demonstrate good agreement with available transcriptomics data, most proteins showed a poor correlation. This includes potential virulence factors, like Loa22 or OmpL1, with confirmed expression in vivo that were significantly up-regulated on the protein level, but not on the mRNA level, strengthening the importance of proteomic studies.
The high resolution and coverage of the proteome data set enabled us to further investigate protein abundance changes of co-regulated genes within operons. This suggests that although most proteins within an operon respond to regulation synchronously, bacterial cells seem to have subtle means to adjust the levels of individual proteins or protein groups outside of the general trend, a phenomenon that was recently also observed on the transcript level of other bacteria (Güell et al, 2009).
The method can be implemented with standard high-resolution mass spectrometers and software tools that are readily available in the majority of proteomics laboratories. It is scalable to any proteome of low-to-medium complexity and can be extended to post-translational modifications or peptide-labeling strategies for quantification. We therefore expect the approach outlined here to become a cornerstone for microbial systems biology.
Over the past decade, liquid chromatography coupled with tandem mass spectrometry (LC–MS/MS) has evolved into the main proteome discovery technology. Up to several thousand proteins can now be reliably identified from a sample and the relative abundance of the identified proteins can be determined across samples. However, the remeasurement of substantially similar proteomes, for example those generated by perturbation experiments in systems biology, at high reproducibility and throughput remains challenging. Here, we apply a directed MS strategy to detect and quantify sets of pre-determined peptides in tryptic digests of cells of the human pathogen Leptospira interrogans at 25 different states. We show that in a single LC–MS/MS experiment around 5000 peptides, covering 1680 L. interrogans proteins, can be consistently detected and their absolute expression levels estimated, revealing new insights about the proteome changes involved in pathogenic progression and antibiotic defense of L. interrogans. This is the first study that describes the absolute quantitative behavior of any proteome over multiple states, and represents the most comprehensive proteome abundance pattern comparison for any organism to date.
PMCID: PMC3159967  PMID: 21772258
absolute quantification; directed mass spectrometry; Leptospira interrogans; microbiology; proteomics
13.  Integrative analysis of transcriptomic and proteomic data of Desulfovibrio vulgaris: a non-linear model to predict abundance of undetected proteins 
Bioinformatics  2009;25(15):1905-1914.
Motivation: Gene expression profiling technologies can generally produce mRNA abundance data for all genes in a genome. A dearth of proteomic data persists because identification range and sensitivity of proteomic measurements lag behind those of transcriptomic measurements. Using partial proteomic data, it is likely that integrative transcriptomic and proteomic analysis may introduce significant bias. Developing methodologies to accurately estimate missing proteomic data will allow better integration of transcriptomic and proteomic datasets and provide deeper insight into metabolic mechanisms underlying complex biological systems.
Results: In this study, we present a non-linear data-driven model to predict abundance for undetected proteins using two independent datasets of cognate transcriptomic and proteomic data collected from Desulfovibrio vulgaris. We use stochastic gradient boosted trees (GBT) to uncover possible non-linear relationships between transcriptomic and proteomic data, and to predict protein abundance for the proteins not experimentally detected based on relevant predictors such as mRNA abundance, cellular role, molecular weight, sequence length, protein length, guanine-cytosine (GC) content and triple codon counts. Initially, we constructed a GBT model using all possible variables to assess their relative importance and characterize the behavior of the predictive model. A strong plateau effect in the regions of high mRNA values and sparse data occurred in this model. Hence, we removed genes in those areas based on thresholds estimated from the partial dependency plots where this behavior was captured. At this stage, only the strongest predictors of protein abundance were retained to reduce the complexity of the GBT model. After removing genes in the plateau region, mRNA abundance, main cellular functional categories and a few triple codon counts emerged as the top-ranked predictors of protein abundance. We then created a new tuned GBT model using the five most significant predictors. Our non-linear model consists of a set of serial regression tree models with implicit strength in variable selection. The model provides variable relative importance measures using mean square error as the criterion. The results showed that coefficients of determination for our non-linear models ranged from 0.393 to 0.582 in both datasets, providing better results than the linear regression used in the past.
We evaluated the validity of this non-linear model using biological information of operons, regulons and pathways, and the results demonstrated that the coefficients of variation of estimated protein abundance values within operons, regulons or pathways are indeed smaller than those for random groups of proteins.
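The validation idea, comparing the spread of predicted abundances within operons against randomly drawn groups, can be sketched as follows. The operon names and abundance values are hypothetical, and the random-group baseline (shuffling all values into groups of the same sizes) is an assumption made here. The abstract does not say how the random groups were formed.

```python
import random
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

def mean_group_cv(groups):
    """Average coefficient of variation over a list of groups."""
    return statistics.mean(coefficient_of_variation(g) for g in groups)

def random_group_cv(groups, n_shuffles=200, seed=0):
    """Mean CV of random groups with the same sizes as the real groups."""
    rng = random.Random(seed)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    results = []
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        results.append(mean_group_cv(shuffled))
    return statistics.mean(results)

# Hypothetical predicted protein abundances, grouped by operon.
operons = {
    "operon_1": [100.0, 110.0, 95.0],
    "operon_2": [10.0, 12.0, 9.0, 11.0],
    "operon_3": [1000.0, 950.0, 1050.0],
}
groups = list(operons.values())

print(f"mean CV within operons: {mean_group_cv(groups):.2f}")
print(f"mean CV, random groups: {random_group_cv(groups):.2f}")
```

A smaller within-operon value than the random baseline is the pattern the abstract reports for operons, regulons and pathways.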
Supplementary Information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2712339  PMID: 19447782
14.  Natural antisense transcripts regulate the neuronal stress response and excitability 
eLife  2014;3:e01849.
Neurons regulate ionic fluxes across their plasma membrane to maintain their excitable properties under varying environmental conditions. However, the mechanisms that regulate ion channel abundance remain poorly understood. Here we show that pickpocket 29 (ppk29), a gene that encodes a Drosophila degenerin/epithelial sodium channel (DEG/ENaC), regulates neuronal excitability via a protein-independent mechanism. We demonstrate that the mRNA 3′UTR of ppk29 affects neuronal firing rates and associated heat-induced seizures by acting as a natural antisense transcript (NAT) that regulates the neuronal mRNA levels of seizure (sei), the Drosophila homolog of the human Ether-à-go-go Related Gene (hERG) potassium channel. We find that the regulatory impact of ppk29 mRNA on sei is independent of the sodium channel it encodes. Thus, our studies reveal a novel mRNA-dependent mechanism for the regulation of neuronal excitability that is independent of protein-coding capacity.
eLife digest
Neurons communicate with one another via electrical signals known as action potentials. These signals are generated when a stimulus causes sodium and potassium ion channels in the cell membrane to open, leading to an influx of sodium ions, followed by an efflux of potassium ions. Changes in temperature affect the rate at which ion channels open and close, and thus affect how easy it is for a stimulus to trigger an action potential. In response to a sudden rise in temperature, neurons must adjust the number of ion channels in their membranes to ensure that they do not become hyperexcitable, which could result in epilepsy.
Now, Zheng et al. have revealed one possible mechanism for how neurons do this. In the fruit fly, Drosophila, a gene for a potassium channel is found on the same chromosomal location as a gene for a sodium channel, and some of the genetic elements that regulate the expression of these two genes even overlap. However, the genes are on opposite strands of the DNA double helix. This means that when the genes are transcribed to produce molecules of messenger RNA (mRNA), which is usually single stranded, some of the mRNA molecules will pair up to form double-stranded mRNA molecules. This is significant because such RNA ‘duplexes’ have been shown to inhibit the translation of conventional single-stranded mRNA molecules into proteins, or to lead to their complete degradation.
Zheng et al. found that flies with mutations in the potassium channel gene display seizures in response to sudden changes in temperature. However, insects with mutations in the sodium channel gene are not affected because, surprisingly, they have a higher than expected number of potassium channels. It turns out that the mutant sodium channel mRNA molecules are unable to form RNA duplexes with potassium channel mRNA molecules: these duplexes would normally limit the number of potassium channels so, in their absence, the number of potassium channels increases, and this protects the flies from seizures.
Zheng et al. also uncovered a novel mechanism by which mRNA molecules can regulate gene expression independent of their role as templates for proteins. Further work is required to determine whether this mechanism is also present in other organisms, including humans.
PMCID: PMC3953951  PMID: 24642409
Degenerin; Epithelial sodium channel; DEG/ENaC; Drosophila; fruit fly; D. melanogaster
15.  The Caenorhabditis elegans HEN1 Ortholog, HENN-1, Methylates and Stabilizes Select Subclasses of Germline Small RNAs 
PLoS Genetics  2012;8(4):e1002617.
Small RNAs regulate diverse biological processes by directing effector proteins called Argonautes to silence complementary mRNAs. Maturation of some classes of small RNAs involves terminal 2′-O-methylation to prevent degradation. This modification is catalyzed by members of the conserved HEN1 RNA methyltransferase family. In animals, Piwi-interacting RNAs (piRNAs) and some endogenous and exogenous small interfering RNAs (siRNAs) are methylated, whereas microRNAs are not. However, the mechanisms that determine animal HEN1 substrate specificity have yet to be fully resolved. In Caenorhabditis elegans, a HEN1 ortholog has not been studied, but there is evidence for methylation of piRNAs and some endogenous siRNAs. Here, we report that the worm HEN1 ortholog, HENN-1 (HEN of Nematode), is required for methylation of C. elegans small RNAs. Our results indicate that piRNAs are universally methylated by HENN-1. In contrast, 26G RNAs, a class of primary endogenous siRNAs, are methylated in female germline and embryo, but not in male germline. Intriguingly, the methylation pattern of 26G RNAs correlates with the expression of distinct male and female germline Argonautes. Moreover, loss of the female germline Argonaute results in loss of 26G RNA methylation altogether. These findings support a model wherein methylation status of a metazoan small RNA is dictated by the Argonaute to which it binds. Loss of henn-1 results in phenotypes that reflect destabilization of substrate small RNAs: dysregulation of target mRNAs, impaired fertility, and enhanced somatic RNAi. Additionally, the henn-1 mutant shows a weakened response to RNAi knockdown of germline genes, suggesting that HENN-1 may also function in canonical RNAi. Together, our results indicate a broad role for HENN-1 in both endogenous and exogenous gene silencing pathways and provide further insight into the mechanisms of HEN1 substrate discrimination and the diversity within the Argonaute family.
Author Summary
Small RNAs serve as sentinels of the genome, policing activity of selfish genetic elements, modulating chromatin dynamics, and fine-tuning gene expression. Nowhere is this more important than in the germline, where endogenous small interfering RNAs (endo-siRNAs) and Piwi-interacting RNAs (piRNAs) promote formation of functional gametes and ensure viable, fertile progeny. Small RNAs act primarily by associating with effector proteins called Argonautes to direct repression of complementary mRNAs. HEN1 methyltransferases, which methylate small RNAs, play a critical role in accumulation of these silencing signals. In this study, we report that the 26G RNAs, a class of C. elegans endo-siRNAs, are differentially methylated in male and female germlines. 26G RNAs derived from the two germlines are virtually indistinguishable, except that they associate with evolutionarily divergent Argonautes. Our data support a model wherein the methylation status and, consequently, stability of a small RNA are determined by the associated Argonaute. Therefore, selective expression of Argonautes that permit or prohibit methylation may represent a new mechanism for regulating small RNA turnover. As we observe this phenomenon in the germline, it may be particularly pertinent for directing inheritance of small RNAs, which can carry information not encoded in progeny DNA that is essential for continued transgenerational genome surveillance.
PMCID: PMC3330095  PMID: 22548001
16.  Genome-wide Analysis of Transcript Abundance and Translation in Arabidopsis Seedlings Subjected to Oxygen Deprivation 
Annals of Botany  2005;96(4):647-660.
• Background and Aims: DNA microarrays allow comprehensive estimation of total cellular mRNA levels but are also amenable to studies of other mRNA populations, such as mRNAs in translation complexes (polysomes). The aim of this study was to evaluate the role of translational regulation in response to oxygen deprivation (hypoxia).
• Methods: Alterations in total cellular and large polysome (≥ five ribosomes per mRNA) mRNA levels were monitored in response to 12 h of hypoxia stress in seedlings of Arabidopsis thaliana with a full-genome oligonucleotide microarray.
• Key Results: Comparison of two mRNA populations revealed considerable modulation of mRNA accumulation and diversity in translation in response to hypoxia. Consistent with the global decrease in protein synthesis, hypoxia reduced the average proportion of individual mRNA species in large polysome complexes from 56·1 % to 32·1 %. A significant decrease in the association with translational complexes was observed for 77 % of the mRNAs, including a subset of known hypoxia-induced gene transcripts. The examination of mRNA levels of nine genes in polysomes fractionated through sucrose density gradients corroborated the microarray data. Gene cluster analysis was used to identify mRNAs that displayed co-ordinated regulation. Fewer than half of the highly induced mRNAs circumvented the global depression of translation. Moreover, a large number of mRNAs displayed a significant decrease in polysome association without a concomitant decrease in steady-state accumulation. The abundant mRNAs that encode the ribosomal proteins behaved in this manner. By contrast, a small group of abiotic and biotic stress-induced mRNAs showed a significant increase in polysome association, without a change in abundance. Evaluation of quantitative features of mRNA sequences demonstrated that a low GC nucleotide content of the 5′-untranslated region provides a selective advantage for translation under hypoxia.
• Conclusions: Alterations in transcript abundance and translation contribute to the differential regulation of gene expression in response to oxygen deprivation.
PMCID: PMC4247032  PMID: 16081496
Hypoxia; DNA microarray; polysome; translational control; mRNA sequence features; Arabidopsis thaliana
17.  Transcriptome and Proteome Dynamics of a Light-Dark Synchronized Bacterial Cell Cycle 
PLoS ONE  2012;7(8):e43432.
Growth of the ocean's most abundant primary producer, the cyanobacterium Prochlorococcus, is tightly synchronized to the natural 24-hour light-dark cycle. We sought to quantify the relationship between transcriptome and proteome dynamics that underlie this obligate photoautotroph's highly choreographed response to the daily oscillation in energy supply.
Methodology/Principal Findings
Using RNA-sequencing transcriptomics and mass spectrometry-based quantitative proteomics, we measured timecourses of paired mRNA-protein abundances for 312 genes every 2 hours over a light-dark cycle. These temporal expression patterns reveal strong oscillations in transcript abundance that are broadly damped at the protein level, with mRNA levels varying on average 2.3 times more than the corresponding protein. The single strongest observed protein-level oscillation is in a ribonucleotide reductase, which may reflect a defense strategy against phage infection. The peak in abundance of most proteins also lags that of their transcript by 2–8 hours, and the two are completely antiphase for some genes. While abundant antisense RNA was detected, it apparently does not account for the observed divergences between expression levels. The redirection of flux through central carbon metabolism from daytime carbon fixation to nighttime respiration is associated with quite small changes in relative enzyme abundances.
Our results indicate that expression responses to periodic stimuli that are common in natural ecosystems (such as the diel cycle) can diverge significantly between the mRNA and protein levels. Protein expression patterns that are distinct from those of cognate mRNA have implications for the interpretation of transcriptome and metatranscriptome data in terms of cellular metabolism and its biogeochemical impact.
PMCID: PMC3430701  PMID: 22952681
18.  System wide analyses have underestimated protein abundances and the importance of transcription in mammals 
PeerJ  2014;2:e270.
Large scale surveys in mammalian tissue culture cells suggest that the protein expressed at the median abundance is present at 8,000–16,000 molecules per cell and that differences in mRNA expression between genes explain only 10–40% of the differences in protein levels. We find, however, that these surveys have significantly underestimated protein abundances and the relative importance of transcription. Using individual measurements for 61 housekeeping proteins to rescale whole proteome data from Schwanhausser et al. (2011), we find that the median protein detected is expressed at 170,000 molecules per cell and that our corrected protein abundance estimates show a higher correlation with mRNA abundances than do the uncorrected protein data. In addition, we estimated the impact of further errors in mRNA and protein abundances using direct experimental measurements of these errors. The resulting analysis suggests that mRNA levels explain at least 56% of the differences in protein abundance for the 4,212 genes detected by Schwanhausser et al. (2011), though because one major source of error could not be estimated the true percent contribution should be higher. We also employed a second, independent strategy to determine the contribution of mRNA levels to protein expression. We show that the variance in translation rates directly measured by ribosome profiling is only 12% of that inferred by Schwanhausser et al. (2011), and that the measured and inferred translation rates correlate poorly (R² = 0.13). Based on this, our second strategy suggests that mRNA levels explain ∼81% of the variance in protein levels. We also determined the percent contributions of transcription, RNA degradation, translation and protein degradation to the variance in protein abundances using both of our strategies. While the magnitudes of the two estimates vary, they both suggest that transcription plays a more important role than the earlier studies implied and translation a much smaller role.
Finally, the above estimates only apply to those genes whose mRNA and protein expression was detected. Based on a detailed analysis by Hebenstreit et al. (2012), we estimate that approximately 40% of genes in a given cell within a population express no mRNA. Since there can be no translation in the absence of mRNA, we argue that differences in translation rates can play no role in determining the expression levels for the ∼40% of genes that are non-expressed.
PMCID: PMC3940484  PMID: 24688849
Transcription; Translation; Mass spectrometry; Gene expression; Protein abundance
19.  Selection on synonymous codons in mammalian rhodopsins: a possible role in optimizing translational processes 
Synonymous codon usage can affect many cellular processes, particularly those associated with translation such as polypeptide elongation and folding, mRNA degradation/stability, and splicing. Highly expressed genes are thought to experience stronger selection pressures on synonymous codons. This should result in codon usage bias even in species with relatively low effective population sizes, like mammals, where synonymous site selection is thought to be weak. Here we use phylogenetic codon-based likelihood models to explore patterns of codon usage bias in a dataset of 18 mammalian rhodopsin sequences, the protein mediating the first step in vision in the eye, and one of the most highly expressed genes in vertebrates. We use these patterns to infer selection pressures on key translational mechanisms including polypeptide elongation, protein folding, mRNA stability, and splicing.
Overall, patterns of selection in mammalian rhodopsin appear to be correlated with post-transcriptional and translational processes. We found significant evidence for selection at synonymous sites using phylogenetic mutation-selection likelihood models, with C-ending codons found to have the highest relative fitness, and to be significantly more abundant at conserved sites. In general, these codons corresponded with the most abundant tRNAs in mammals. We found significant differences in codon usage bias between rhodopsin loops versus helices, though there was no significant difference in mean synonymous substitution rate between these motifs. We also found a significantly higher proportion of GC-ending codons at paired sites in rhodopsin mRNA secondary structure, and significantly lower synonymous mutation rates in putative exonic splicing enhancer (ESE) regions than in non-ESE regions.
By focusing on a single highly expressed gene we both distinguish synonymous codon selection from mutational effects and analytically explore underlying functional mechanisms. Our results suggest that codon bias in mammalian rhodopsin arises from selection to optimally balance high overall translational speed, accuracy, and proper protein folding, especially in structurally complicated regions. Selection at synonymous sites may also be contributing to mRNA stability and splicing efficiency at exonic-splicing-enhancer (ESE) regions. Our results highlight the importance of investigating highly expressed genes in a broader phylogenetic context in order to better understand the evolution of synonymous substitutions.
PMCID: PMC4021273  PMID: 24884412
Mutation-selection model; dN/dS; Codon-based likelihood models; Visual pigment evolution
20.  In-Vivo Quantitative Proteomics Reveals a Key Contribution of Post-Transcriptional Mechanisms to the Circadian Regulation of Liver Metabolism 
PLoS Genetics  2014;10(1):e1004047.
Circadian clocks are endogenous oscillators that drive the rhythmic expression of a broad array of genes, orchestrating metabolism and physiology. Recent evidence indicates that post-transcriptional and post-translational mechanisms play essential roles in modulating temporal gene expression for proper circadian function, particularly for the molecular mechanism of the clock. Due to technical limitations in large-scale, quantitative protein measurements, it remains unresolved to what extent the circadian clock regulates metabolism by driving rhythms of protein abundance. Therefore, we aimed to identify global circadian oscillations of the proteome in the mouse liver by applying in vivo SILAC mouse technology in combination with state-of-the-art mass spectrometry. Among the 3000 proteins accurately quantified across two consecutive cycles, 6% showed circadian oscillations with a defined phase of expression. Interestingly, daily rhythms of one fifth of the liver proteins were not accompanied by changes at the transcript level. The oscillations of almost half of the cycling proteome were delayed by more than six hours with respect to the corresponding, rhythmic mRNA. Strikingly, we observed that the length of the time lag between mRNA and protein cycles varies across the day. Our analysis revealed a high temporal coordination in the abundance of proteins involved in the same metabolic process, such as xenobiotic detoxification. Apart from liver-specific metabolic pathways, we identified many other essential cellular processes in which protein levels are under circadian control, for instance vesicle trafficking and protein folding. Our large-scale proteomic analysis thus reveals that circadian post-transcriptional and post-translational mechanisms play a key role in the temporal orchestration of liver metabolism and physiology.
Author Summary
The circadian clock is an evolutionary system that allows organisms to anticipate and thus adapt to daily changes in the environment. In mammals, the circadian clock is found in virtually every tissue, regulating rhythms of metabolism and physiology. While many studies have focused on how circadian clocks regulate gene expression, little is known about daily control of protein abundance. Here we applied state-of-the-art mass spectrometry in combination with quantitative proteomics to investigate global circadian oscillations of the proteome in the mouse liver. We found that approximately 6% of the liver proteins cycle daily and, interestingly, that the majority of these oscillations diverge from the behavior of their transcripts. Our data indicate that post-transcriptional mechanisms play an essential role in shaping the phase of rhythmic proteins downstream of transcription regulation to ultimately drive rhythms of metabolism. Moreover, the contribution of post-transcriptional regulation seems to differ among distinct metabolic pathways. Overall, we found circadian oscillations in the abundance of proteins involved not only in liver-specific metabolic pathways but also in essential cellular processes.
PMCID: PMC3879213  PMID: 24391516
21.  Growth-regulated expression and G0-specific turnover of the mRNA that encodes URH49, a mammalian DExH/D box protein that is highly related to the mRNA export protein UAP56 
Nucleic Acids Research  2004;32(6):1857-1865.
URH49 is a mammalian protein that is 90% identical to the DExH/D box protein UAP56, an RNA helicase that is important for splicing and nuclear export of mRNA. Although Saccharomyces cerevisiae and Drosophila express only a single protein corresponding to UAP56, mRNAs encoding URH49 and UAP56 are both expressed in human and mouse cells. Both proteins interact with the mRNA export factor Aly and both are able to rescue the loss of Sub2p (the yeast homolog of UAP56), indicating that both proteins have similar functions. UAP56 mRNA is more abundant than URH49 mRNA in many tissues, although in testes URH49 mRNA is much more abundant. UAP56 and URH49 mRNAs are present at similar levels in proliferating cultured cells. However, when the cells enter quiescence, the URH49 mRNA level decreases 3–6-fold while the UAP56 mRNA level remains relatively constant. The amount of URH49 mRNA increases to the level found in proliferating cells within 5 h when quiescent cells are growth-stimulated or when protein synthesis is inhibited. URH49 mRNA is relatively unstable (T½ = 4 h) in quiescent cells, but is stabilized immediately following growth stimulation or inhibition of protein synthesis. In contrast, there is much less change in the content or stability of UAP56 mRNA following growth stimulation. Our observations suggest that in mammalian cells, two UAP56-like RNA helicases are involved in splicing and nuclear export of mRNA. Differential expression of these helicases may lead to quantitative or qualitative changes in mRNA expression.
PMCID: PMC390356  PMID: 15047853
22.  Community transcriptomics reveals universal patterns of protein sequence conservation in natural microbial communities 
Genome Biology  2011;12(3):R26.
Combined metagenomic and metatranscriptomic datasets make it possible to study the molecular evolution of diverse microbial species recovered from their native habitats. The link between gene expression level and sequence conservation was examined using shotgun pyrosequencing of microbial community DNA and RNA from diverse marine environments, and from forest soil.
Across all samples, expressed genes with transcripts in the RNA sample were significantly more conserved than non-expressed gene sets relative to best matches in reference databases. This discrepancy, observed for many diverse individual genomes and across entire communities, coincided with a shift in amino acid usage between these gene fractions. Expressed genes trended toward GC-enriched amino acids, consistent with a hypothesis of higher levels of functional constraint in this gene pool. Highly expressed genes were significantly more likely to fall within an orthologous gene set shared between closely related taxa (core genes). However, non-core genes, when expressed above the level of detection, were, on average, significantly more highly expressed than core genes based on transcript abundance normalized to gene abundance. Finally, expressed genes showed broad similarities in function across samples, being relatively enriched in genes of energy metabolism and underrepresented by genes of cell growth.
These patterns support the hypothesis, predicated on studies of model organisms, that gene expression level is a primary correlate of evolutionary rate across diverse microbial taxa from natural environments. Despite their complexity, meta-omic datasets can reveal broad evolutionary patterns across taxonomically, functionally, and environmentally diverse communities.
PMCID: PMC3129676  PMID: 21426537
23.  Genetic Variation Shapes Protein Networks Mainly through Non-transcriptional Mechanisms 
PLoS Biology  2011;9(9):e1001144.
Variation in the levels of co-regulated proteins that function within networks in an outbred yeast population is not driven by variation in the corresponding transcripts.
Networks of co-regulated transcripts in genetically diverse populations have been studied extensively, but little is known about the degree to which these networks cause similar co-variation at the protein level. We quantified 354 proteins in a genetically diverse population of yeast segregants, which allowed for the first time construction of a coherent protein co-variation matrix. We identified tightly co-regulated groups of 36 and 93 proteins that were made up predominantly of genes involved in ribosome biogenesis and amino acid metabolism, respectively. Even though the ribosomal genes were tightly co-regulated at both the protein and transcript levels, genetic regulation of proteins was entirely distinct from that of transcripts, and almost no genes in this network showed a significant correlation between protein and transcript levels. This result calls into question the widely held belief that in yeast, as opposed to higher eukaryotes, ribosomal protein levels are regulated primarily by regulating transcript levels. Furthermore, although genetic regulation of the amino acid network was more similar for proteins and transcripts, regression analysis demonstrated that even here, proteins vary predominantly as a result of non-transcriptional variation. We also found that cis regulation, which is common in the transcriptome, is rare at the level of the proteome. We conclude that most inter-individual variation in levels of these particular high abundance proteins in this genetically diverse population is not caused by variation of their underlying transcripts.
Author Summary
The level of protein produced by each gene corresponds approximately to the level of mRNA transcript produced by that gene: so high-abundance proteins, like those involved in protein synthesis, are represented by high-abundance transcripts, whereas low-abundance proteins, like those involved in signaling pathways, are represented by low-abundance transcripts. Furthermore, genetic variation can cause variation in transcript levels for the same gene between different individuals. These two observations have led to the assumption that inter-individual variation in transcript levels for any particular gene causes corresponding variation in protein levels. However, this need not be the case, because protein levels could be controlled not only by regulating transcript levels but also by regulating protein translation and stability. Because inter-individual variation in the levels of the transcript for any particular gene is typically less than 3-fold, rather than orders of magnitude, it is possible that the predominant cause of inter-individual variation in levels of any particular protein is transcription-independent regulation of protein levels. Here, we look in a genetically diverse population of 95 yeast strains at the genetic variation that leads in turn to variation in levels of 354 proteins that function within co-regulated networks. We find that the between-strain variation predominantly reflects transcription-independent mechanisms. If this result is typical of the proteome as a whole, it suggests that protein levels in genetically diverse populations cannot be accurately inferred from levels of their underlying transcripts.
PMCID: PMC3167781  PMID: 21909241
24.  Genome-wide assessment of post-transcriptional control in the fly brain 
Post-transcriptional control of gene expression has central importance during development and adulthood and in physiology in general. However, little is known about the extent of post-transcriptional control of gene expression in the brain. Most post-transcriptional regulatory effectors (e.g., miRNAs) destabilize target mRNAs by shortening their polyA tails. Hence, the fraction of a given mRNA that is fully polyadenylated should correlate with its stability and serve as a good measure of post-transcriptional control. Here, we compared RNA-seq datasets from fly brains that were generated either from total (rRNA-depleted) or polyA-selected RNA. By doing this comparison we were able to compute a coefficient that measures the extent of post-transcriptional control for each brain-expressed mRNA. In agreement with current knowledge, we found that mRNAs encoding ribosomal proteins, metabolic enzymes, and housekeeping genes are among the transcripts with least post-transcriptional control, whereas mRNAs that are known to be highly unstable, like circadian mRNAs and mRNAs expressing synaptic proteins and proteins with neuronal functions, are under strong post-transcriptional control. Surprisingly, the latter group included many specific groups of genes relevant to brain function and behavior. In order to determine the importance of miRNAs in this regulation, we profiled miRNAs from fly brains using oligonucleotide microarrays. Surprisingly, we did not find a strong correlation between the expression levels of miRNAs in the brain and the stability of their target mRNAs; however, genes identified as highly regulated post-transcriptionally were strongly enriched for miRNA targets. This demonstrates a central role of miRNAs for modulating the levels and turnover of brain-specific mRNAs in the fly.
PMCID: PMC3856366  PMID: 24367289
post-transcriptional regulation; RNA-sequencing; polyA tail; Drosophila melanogaster; brain; miRNA
25.  Effect of Correlated tRNA Abundances on Translation Errors and Evolution of Codon Usage Bias 
PLoS Genetics  2010;6(9):e1001128.
Despite the fact that tRNA abundances are thought to play a major role in determining translation error rates, their distribution across the genetic code and the resulting implications have received little attention. In general, studies of codon usage bias (CUB) assume that codons with higher tRNA abundance have lower missense error rates. Using a model of protein translation based on tRNA competition and intra-ribosomal kinetics, we show that this assumption can be violated when tRNA abundances are positively correlated across the genetic code. Examining the distribution of tRNA abundances across 73 bacterial genomes from 20 different genera, we find a consistent positive correlation between tRNA abundances across the genetic code. This work challenges one of the fundamental assumptions made in over 30 years of research on CUB that codons with higher tRNA abundances have lower missense error rates and that missense errors are the primary selective force responsible for CUB.
Author Summary
Codon usage bias (CUB) is a ubiquitous and important phenomenon. CUB is thought to be driven primarily by selection against missense errors. For over 30 years, the standard model of translation errors has implicitly assumed that translation errors and tRNA abundances are inversely related. This rests on an unstated assumption that tRNA abundances are uncorrelated across the genetic code. Examining these abundance distributions across 73 bacterial genomes from 20 different genera, we find a consistent positive correlation between tRNA abundances across the genetic code. We further show that codons with higher tRNA abundances are not always “optimal” with respect to reducing the missense error rate and hence cannot explain the observed patterns of CUB.
PMCID: PMC2940732  PMID: 20862306
