Interstrain alignment of transcriptomes and proteomes.
Protein abundance data for the BM45 and VIN13 strains were generated at three time points during fermentation, namely, day 2 (exponential growth phase), day 5 (early stationary phase), and day 14 (late stationary phase). Three repeats each for both of the strains were combined for each time point in a single 8-plex iTRAQ analysis. In other words, the repeats for BM45 and VIN13 were grouped for comparative analyses into three sets according to time points (i.e., all day 2 samples were grouped together, all day 5 samples were grouped together, and all day 14 samples were grouped together). A total of 436 proteins were unambiguously identified. Not all of these proteins were identified for both strains across all three time points, but for each time point, at least 250 common proteins were quantified for the three BM45 samples and the three VIN13 samples.
To get an impression of the general data structure and overall alignment of transcript and protein data when comparing the two strains at each time point, we first calculated the ratios of the concentrations of identified proteins and the ratios of the corresponding gene expression values between the two strains (i.e., for BM45 versus VIN13 at each of the three time points). As a broad measure of alignment, we used the log2-transformed ratios of these protein and transcript comparisons (Fig. ). In these representations, values above 1 and below −1 represent cases for which the fold change in protein concentration diverges by a factor of more than 2 from the fold change in transcription levels between the two strains. In other words, outside these cutoffs of 1 and −1, the changes in transcript levels are not aligned with the observed changes in protein levels.
FIG. 1. Distribution of protein/transcript ratios. The distribution of the different protein-transcript pairs across the spectrum of ratios was determined for days 2 (A), 5 (B), and 14 (C) of the BM45 versus VIN13 comparative analysis. For the intrastrain analysis, ...
Figure shows the general alignment of the log2-transformed protein/mRNA ratios, represented as a distribution curve. Log2-transformed ratios close to zero indicate very strong agreement between the interstrain differences in protein levels and those in gene expression levels. Hence, the steeper the slopes of the Gaussian-shaped curves, the closer the alignment of transcript and protein data sets as a whole. For the interstrain analysis at specific time points, there is a pronounced peak for days 2 and 5 around the optimal alignment point of zero, with sharply declining slopes in the direction of the 2-fold change indicators (namely, values of 1 and −1). The narrow peaks for these 2 days indicate close alignment of the protein and transcript data sets. The opposite is true for day 14 (Fig. ), where no clear Gaussian distribution is evident; instead, a segmented pattern of increases and decreases is seen across the wide range of protein/transcript ratios.
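The alignment measure described above can be illustrated with a minimal sketch. It assumes that each protein-transcript pair is summarized by two interstrain fold changes (BM45/VIN13) and that the plotted value is the log2 of the protein fold change divided by the transcript fold change. The function names and the example ratios are hypothetical and are not taken from the data set.

```python
import math

def log2_alignment(protein_ratio, transcript_ratio):
    """log2 of the protein fold change divided by the transcript fold change.

    A value of 0 means both levels differ between strains by the same factor;
    values beyond +1 or -1 mean the two fold changes diverge more than 2-fold.
    """
    return math.log2(protein_ratio / transcript_ratio)

def fraction_within_twofold(pairs):
    """Share of (protein_ratio, transcript_ratio) pairs with |log2 value| <= 1."""
    values = [log2_alignment(p, t) for p, t in pairs]
    return sum(1 for v in values if abs(v) <= 1.0) / len(values)

# Hypothetical BM45/VIN13 ratios (protein, transcript) for four genes.
example_pairs = [(1.2, 1.1), (0.8, 1.0), (3.0, 1.0), (0.5, 2.0)]
aligned_fraction = fraction_within_twofold(example_pairs)
```

In this toy input, the first two pairs fall inside the 2-fold window and the last two fall outside it, so half of the pairs count as aligned.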
For a more-detailed analysis of individual protein-transcript pairs, standard t tests were applied to the three repeats of BM45 and VIN13 to determine significant differences in gene or protein levels. The interstrain ratios for transcripts or proteins are set to 1 in cases where no statistically significant differences exist for either the mRNA or protein levels between these two strains. Where interstrain differences are significant, the fold changes are reported for BM45 versus VIN13. This enables comparisons of transcript and corresponding protein fold changes to be made. Examples of the interstrain alignments of mRNA-protein pairs involved in general metabolism (Table ) and cell rescue and defense (Table ) for BM45 versus VIN13 are shown in the tables.
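The gating of fold changes by significance can be sketched as follows. This is only an illustration under stated assumptions: a pooled-variance Student t test on three replicates per strain (4 degrees of freedom) and a two-sided 5% significance level, neither of which is specified in the text. The function names and replicate values are hypothetical.

```python
import math
from statistics import mean, variance

# Two-sided 5% critical value of Student's t for 4 degrees of freedom (3 + 3 - 2).
# The 5% level is an assumption; the text only states that t tests were used.
T_CRIT_DF4 = 2.776

def student_t(a, b):
    """Pooled-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    diff = mean(a) - mean(b)
    if pooled == 0:
        # Identical replicates within both groups: any difference is "infinite" evidence.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(pooled * (1 / na + 1 / nb))

def gated_ratio(bm45, vin13, t_crit=T_CRIT_DF4):
    """Return mean(BM45)/mean(VIN13) if the difference is significant, else 1.0."""
    if abs(student_t(bm45, vin13)) > t_crit:
        return mean(bm45) / mean(vin13)
    return 1.0

# Hypothetical replicate values for one gene or protein.
significant = gated_ratio([10.0, 11.0, 12.0], [20.0, 21.0, 22.0])
not_significant = gated_ratio([10.0, 12.0, 14.0], [11.0, 12.0, 13.5])
```

Setting non-significant ratios to 1 means that a transcript and its protein are compared only on differences that each data set supports on its own.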
GO category of energy and metabolism for protein-mRNA pairs at days 2, 5, and 14
GO category of cell rescue and defense for protein-mRNA pairs at days 2, 5, and 14
For the day 2 analysis, only 9 of the 248 protein/mRNA ratios (for the entire set of identified proteins) differed significantly by a fold change of more than 2. This means that comparisons between strains at a given time point are surprisingly reliable, as fold changes in gene expression and in protein abundance data align within the 2-fold threshold for more than 95% of pairs. The same observation holds for the day 5 analysis, where once again only about 3% (8 out of 260) of the protein/mRNA pair ratios differed by a fold change of 2 or greater. These data suggest that comparisons of transcript levels are surprisingly reliable in predicting differences in protein levels between two strains. This appears to hold true for all GO categories and is in stark contrast with previous data (36) which suggest that similar predictions are not reliable when analyzing the evolution of transcriptomes and proteomes during fermentation across time points for a given strain.
By day 14 of fermentation, the close alignment of transcript and protein ratios between strains breaks down slightly. Here, 32 of the 277 protein-mRNA pairs show significant discrepancies in the comparative ratios between BM45 and VIN13. The poorer alignment at this stage of fermentation can probably be explained by the fact that active fermentation has stopped and that cells are exposed to severe stress in the form of high ethanol levels and nutrient depletion. At this stage, active transcription is at a minimum, except for those genes related to the mobilization of reserve nutrients or tolerance of the severe stress conditions faced as the cells slow down metabolically. The levels of accumulated proteins still present at this point may thus bear limited correlation to the levels of mRNA in the cells.
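For reference, the discordant fractions implied by the counts reported for the three time points can be worked out directly. The short sketch below uses only those counts; the variable names are illustrative.

```python
# (discordant pairs, total pairs) per time point, as reported in the text.
discordant = {"day 2": (9, 248), "day 5": (8, 260), "day 14": (32, 277)}

# Percentage of protein-mRNA pairs outside the 2-fold window, to one decimal.
percent_discordant = {
    day: round(100 * k / n, 1) for day, (k, n) in discordant.items()
}
```

This gives roughly 3.6% for day 2, 3.1% for day 5, and 11.6% for day 14, so the discordant fraction at day 14 is a little more than three times that of the earlier time points.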
Intrastrain comparison of the evolution of transcriptomes and proteomes.
In order to compare peptide signal areas between different runs (i.e., for comparisons between different time points for either VIN13 or BM45), the data were normalized as follows: all of the iTRAQ signals for peptides that are not shared among multiple detected proteins and that have a confidence score of at least 1.00 were selected. The area for each label in these peptides was then calculated as a percentage of the total iTRAQ signal for that label. This transformed value is more conducive to comparisons across multiple iTRAQ experiments. The agreement among the replicates when expressed as a percentage of the total signal, as per our calculations, was very good and enabled intrastrain comparisons across time points to be made.
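A minimal sketch of this normalization is given below. It assumes one reading of the description, namely that each peptide's area in a given label is divided by the summed area of all selected peptides in that same label. The record layout (keys `areas`, `confidence`, `shared`) and the label names are hypothetical.

```python
def normalize_itraq(peptides, min_confidence=1.00):
    """Express each selected peptide's label area as a percentage of that label's total.

    peptides: list of dicts with keys
      'areas'      -> {label: reporter ion area}
      'confidence' -> peptide confidence score
      'shared'     -> True if the peptide maps to more than one detected protein
    """
    # Keep only unshared peptides that meet the confidence cutoff.
    selected = [
        p for p in peptides
        if not p["shared"] and p["confidence"] >= min_confidence
    ]
    # Total signal per label over the selected peptides.
    totals = {}
    for p in selected:
        for label, area in p["areas"].items():
            totals[label] = totals.get(label, 0.0) + area
    # Percentage of the label total; a label with zero total signal yields 0.0.
    return [
        {
            label: (100.0 * area / totals[label]) if totals[label] else 0.0
            for label, area in p["areas"].items()
        }
        for p in selected
    ]

# Hypothetical two-label example; the last two peptides are filtered out.
example = [
    {"areas": {"113": 100.0, "114": 300.0}, "confidence": 1.5, "shared": False},
    {"areas": {"113": 300.0, "114": 100.0}, "confidence": 1.0, "shared": False},
    {"areas": {"113": 999.0, "114": 999.0}, "confidence": 0.5, "shared": False},
    {"areas": {"113": 500.0, "114": 500.0}, "confidence": 2.0, "shared": True},
]
normalized = normalize_itraq(example)
```

Because each value is a share of its label's total signal, values from separate iTRAQ runs are placed on a common scale, which is what makes comparisons across time points possible.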
When the analysis of transcript versus protein ratios was applied to the intrastrain data sets established at different time points, the results indicated a largely random distribution of protein/transcript ratios (Fig. ). The intrastrain comparisons clearly do not conform to the distribution curve seen for interstrain alignments. It must be kept in mind that in this analysis, a large positive or negative change in the expression of a particular gene, along with a moderate or large change in the corresponding protein level in the same direction, could still fall outside the threshold applied here for a good alignment. However, such a pair would in many cases be considered a good fit from a biological perspective.
To overcome the inherent stringency of this form of analysis, and considering the breakdown of correlation between transcripts and protein levels observed for the intrastrain analysis, we decided to use trends in transcript and protein levels as a second criterion. This assessment is much less stringent since it queries only whether up or down changes in transcript levels over time points would generally correlate with similar trends in protein levels. In this case, ratios in which both transcripts and proteins were less than 1 or greater than 1 were considered aligned. Inverse ratios (i.e., one ratio was less than 1 and the other was greater than 1) constituted a negative result (nonaligned).
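The trend criterion can be sketched as a simple direction check. The sketch assumes that each pair is given as a (transcript ratio, protein ratio) tuple between two time points, and it treats a ratio of exactly 1 as nonaligned, a case the text does not specify. The names and example values are hypothetical.

```python
def trends_aligned(transcript_ratio, protein_ratio):
    """True if both ratios are above 1 or both are below 1."""
    both_up = transcript_ratio > 1 and protein_ratio > 1
    both_down = transcript_ratio < 1 and protein_ratio < 1
    return both_up or both_down

def percent_aligned(pairs):
    """Percentage of (transcript_ratio, protein_ratio) pairs with matching trends."""
    return 100.0 * sum(trends_aligned(t, p) for t, p in pairs) / len(pairs)

# Hypothetical day 5 versus day 2 ratios for five genes.
example_pairs = [(2.0, 1.3), (0.4, 0.9), (1.8, 0.7), (0.5, 1.6), (3.0, 2.5)]
alignment = percent_aligned(example_pairs)
```

Three of the five example pairs change in the same direction. Since pairs with unrelated directions would agree about half of the time, values from this measure are best read against a baseline of 50%.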
Using this approach, the alignment of protein versus transcript data for the VIN13 and BM45 strains between time points (i.e., day 5 versus day 2 and day 14 versus day 5) was only around 60% for all three comparisons. Considering that a random sample would yield 50%, this value is surprisingly low but in line with previous reports. Even when protein-transcript pairs for only the top 50 genes in terms of the magnitude of the increase/decrease in mRNA levels were evaluated, the trend analysis did not improve in any noteworthy manner. For day 5 versus day 2 for both of the strains, the alignment value increased slightly from 65 to 68%, but for day 14 versus day 5, there was a decrease to close to 50%, much lower than the 60% value calculated for the entire gene set. This is surprising, since the transcript levels of these genes were changed by at least 1.8-fold (and up to 32-fold), and such relatively significant changes would generally be expected to be reflected on the proteome level.
There are several possible explanations for this discordant alignment of transcript and protein levels for the intrastrain comparisons. First, our transcriptome and proteome data were generated at the same stage of fermentation. However, the proteome at a specific time point is a reflection of previous rather than concomitant transcript levels. In other words, it would be expected that a particular transcriptomic data set should be more closely aligned with proteomic data that are generated at a later time point, i.e., after the translation and posttranslational modification workflow has responded to the earlier changes in transcription levels. Second, the time points assessed here represent very different environmental conditions within a dynamically changing system, whereas the comparison of different strains at the same time points de facto normalizes for the environmental background. Another point to consider involves the half-lives of proteins and protein turnover. Differences in the turnover rate of mRNA versus the half-lives of encoded proteins would also lead to a discrepancy in the correlation of the mRNA and protein, particularly during stationary phase when the half-lives of certain proteins are extended.
These findings help to explain our observation that the predictive capacity of the omics matrix that was derived from the alignment of transcriptome and exometabolome data sets (37) was statistically reliant mainly on the comparative analysis of several strains and much less reliant on intrastrain comparisons.
Our data set also confirms previous observations (14) that transcriptomic and proteomic data sets are frequently difficult to align across different time points and that transcriptome data need to be interpreted with caution. This is particularly the case when only a single strain is analyzed, as any changes at the transcript level might be specific to the strain in question and not represent a generally relevant response. In this sense, transcriptome comparisons of different strains under the same experimental conditions (regarding time point and medium composition, etc.) represent a more reliable system for inferring biological meaning, since only the genetic background will provide the basis for differences in physiological or phenotypic changes. Using different strains in comparative transcriptome analyses represents an inherent control system that is self-standardized to limit “noisy” outputs.
Transcript-protein pairs showing discordant regulation between time points (i.e., opposite trends in protein and mRNA levels) were investigated more closely in the two strains. Interestingly, the nonaligned gene-protein pairs followed similar trends in both of the strains, suggesting that these trends are not due to experimental error or noise but rather to a consistent feature of the system. To clarify, for the total of 95 gene-protein pairs showing opposite trends in expression levels for the day 5 analysis versus the day 2 analysis in either of the two strains, only 21 of these gene-protein pairs did not overlap between strains. For the day 14 analysis versus the day 5 analysis, only 37 of the total of 124 nonaligned mRNA-protein pairs did not overlap between the BM45 and VIN13 strains. Thus, discordant alignment between transcriptomes and proteomes between time points is relatively consistent for the different strains, which is helpful for elucidating the regulation of expression/translation of these consistently nonaligned genes. Without the use of multiple strains, this feature of the transcriptome/proteome would have been overlooked. The set of overlapping, yet discordant, transcript-protein pairs was classified according to functional activity; translation, cysteine metabolism, and biopolymer biosynthesis were strongly represented categories.
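The overlap figures quoted above can be expressed as set operations. The sketch assumes that the reported totals (95 and 124) count pairs that are discordant in either strain and that the nonoverlapping counts (21 and 37) are pairs discordant in only one strain. The gene identifiers in the toy example are placeholders.

```python
def overlap_summary(discordant_a, discordant_b):
    """Return (pairs in either strain, pairs in only one strain, pairs in both)."""
    either = discordant_a | discordant_b
    both = discordant_a & discordant_b
    return len(either), len(either - both), len(both)

def shared_from_counts(total_either, strain_specific):
    """Shared discordant pairs and their percentage of the total, from reported counts."""
    shared = total_either - strain_specific
    return shared, round(100 * shared / total_either, 1)

# Toy example with placeholder identifiers.
toy = overlap_summary({"G1", "G2", "G3"}, {"G2", "G3", "G4"})

# Counts reported in the text.
day5_vs_day2 = shared_from_counts(95, 21)
day14_vs_day5 = shared_from_counts(124, 37)
```

Under this reading, 74 of 95 discordant pairs (about 78%) are shared between strains for day 5 versus day 2, and 87 of 124 (about 70%) for day 14 versus day 5.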
For comparisons within a single experiment, the ratios of BM45 and VIN13 for both expressed genes and proteins were determined and compared. To facilitate evaluation of the data, the protein-mRNA pairs were categorized according to GO classification terms. The proteins identified in our analysis can reasonably be considered representative of the entire proteome, as all functional categories are well represented (i.e., approximately 160 proteins are involved in energy and metabolism, 25 in cell cycle regulation, 35 in cellular transport, 35 in cell rescue and defense, 80 in protein synthesis, and 25 in transcription). Furthermore, no bias toward any generic protein feature, such as concentration or hydrophobicity profiles, was obvious in the data. In this section, the following two relevant categories are further discussed as examples: energy and metabolism as well as cell rescue and defense (Tables and ).
As can be seen in Tables and , and as would be expected when considering the overall good alignment presented for the interstrain comparisons at similar time points, the relative over- or underexpression of genes generally coincides with a similar trend in the protein abundance data (particularly for the first two time points during fermentation).
The same functional categories were also analyzed for the intrastrain data. Surprisingly, when considering the rather poor general alignment of changes in transcript and protein levels in this case, gene expression and protein levels also aligned well for the specific functional categories of amino acid metabolism and fermentative metabolism, suggesting a strong transcriptional control of such metabolic enzymes (Table ). This is in contrast to the results reported by Rossignol et al. (36), in which most of the glycolytic and amino acid metabolic proteins identified showed opposite correlations between mRNA and proteins between the two fermentative stages considered (exponential phase versus stationary phase) during alcoholic fermentation in synthetic must (MS300).
Relative protein and transcript ratios for day 5 versus day 2 analyses of VIN13 and BM45 for genes involved in fermentation and amino acid metabolism
Other categories showed almost no relationship between changes in transcript and protein levels. As an example, Table shows data from the GO category of transcription and cell cycle control. The difference in the alignment of protein and transcript data between different functional categories becomes quite apparent when these data are contrasted with the results depicted in Tables and . Transcriptomic data thus appear to be reasonably representative of protein levels for metabolic enzymes but not for most other GO categories such as general cell maintenance and growth.
Relative protein and transcript ratios for day 5 versus day 2 analyses of VIN13 and BM45 for the GO categories of transcription and cell cycle control
Correlations between protein levels and phenotype.
In a related work, the strains VIN13 and BM45 were phenotypically profiled (38), and some differences in protein abundance between the two strains can tentatively be correlated to specific phenotypic differences. For instance, the significantly lower levels of several heat shock proteins, such as Hsp60, Hsp82, and Ddr48, in BM45 in comparison to those in VIN13 (Table ) could account for the generally lower tolerance of this strain under various stress conditions, including heat stress. This hypothesis is strongly supported by data showing that individual overexpression of these genes results in higher stress resistance; raising the expression level of these genes in BM45 to the level observed in VIN13 should therefore result in a recognizable phenotypic change (8). Similarly, lower levels of antioxidant proteins such as Tsa1 and Yhb1 (Table ) could also explain the increased susceptibility of BM45 to oxidative stress in comparison to the susceptibility of VIN13 (49). Lower protein abundances of Erg13, Erg20, and Erg6 (Table ) in BM45 versus those in VIN13 could also account for the lower ethanol and osmotic shock tolerance of BM45, given that these proteins are involved in the production of a variety of sterols with roles in cell membrane stabilization (33).
Regarding metabolism, the data indicate why the alignment of exometabolome and transcriptome data has previously proven successful (37). Indeed, differences in the ratios of several proteins involved in the synthesis of the aromatic amino acids (namely, Aro1, Aro3, Aro4, and Aro8) (Table ) are reflected by differences in the concentrations of the end products of these pathways. Likewise, Bat1 is involved in catalyzing the first transamination step of the catabolic formation of fusel alcohols via the Ehrlich pathway (24). Differences in Bat1 expression (Fig. ; Table ) have proven to effect large changes in higher alcohol production by wine yeast strains (37). BAT1 gene expression and Bat1 protein levels are quite notably concordant (Fig. ), and the decrease in expression in BM45 relative to that in VIN13 agrees with metabolite data showing significantly lower propanol, butanol, and methanol production by BM45 in comparison to that by VIN13 (37). In fact, this close alignment between transcript and protein levels appears to be the case for almost all of the gene-protein pairs linked to the metabolism of the amino acids shown in Fig. , at both days 2 and 5 and even at day 14. In Fig. , it is clear that there is a direct correlation between transcript and protein abundance in central metabolic pathways (such as those pathways related to amino acid metabolism in this example).
FIG. 2. Network visualization of protein and gene expression ratios in metabolic hubs linked to amino acid metabolism. The pathway networks for BM45 versus VIN13 at days 2 (A), 5 (B), and 14 (C) are presented. (D) Changes in gene and protein levels for day 5 ...
Amino acid metabolism is of particular interest from a wine-making perspective, as amino acids serve as the precursors of important volatile aroma compounds. For instance, sulfur-containing amino acids such as methionine (and cysteine to a lesser extent) are the precursors for the volatile thiols that are significant aroma compounds in wine (43). The branched-chain amino acids such as valine, leucine, and isoleucine, on the other hand, serve as the precursors for various higher alcohols. Of the enzymes involved in branched-chain amino acid metabolism, BAT1 has been discussed above. Other genes that encode enzymes in this pathway and that were identified in our previous study (37) for their strong statistical link between expression levels and the production of specific aroma compounds include LEU2, encoding a beta-isopropylmalate dehydrogenase that catalyzes the third step in the leucine biosynthesis pathway (4). Expression of this gene showed a significant statistical correlation with compounds such as isobutanol (37), and as can be seen in Fig. , the relative transcript and protein abundance ratios align well for this gene. Of the genes involved in the metabolism of isoleucine and valine (precursors for higher alcohol synthesis), the ILV gene family (ILV1, and ILV6) encode isoforms of acetohydroxyacid reductoisomerases involved in branched-chain amino acid biosynthesis (20). Expression of the ILV gene isoforms showed strong positive correlations with many higher alcohols analyzed in a previous study, and expression differences between BM45 and VIN13 once again align with differences in the exometabolite profiles of these two strains, as reported by Rossouw et al. (37). The ILV gene/protein ratio is also well-aligned, again confirming the tight, concordant regulation of transcript levels and enzyme abundance in key metabolic pathways.
In terms of intrastrain comparisons between time points, the alignment of changes in transcription and protein abundance is also good when considering metabolic pathways such as those of amino acid metabolism (Fig. ). Although the intensity of the fold change differs for mRNA and proteins, the overall trends match up well. Figure shows that there is a general downregulation of transcripts (and their corresponding proteins) involved in amino acid metabolism as fermentation proceeds from exponential growth phase (day 2) to early stationary phase (day 5). This is in line with yeast growth behavior, as day 5 represents a fermentative phase characterized by continued high rates of fermentative metabolism associated with a significant reduction in growth and biomass formation.
Although our coverage of the yeast proteome was only around 5%, the identified proteins were distributed over all functional categories. This coverage is also significantly higher than that obtained in previous studies (36) and appears sufficient to assess the biological relevance and reliability of the transcriptome data. In our study, the alignment of relative protein abundance ratios with gene expression data was accurate for data generated in the early stages of fermentation (days 2 and 5), when active cell growth and metabolism are occurring. In the case of data comparisons across time points, the quality of gene expression to protein correlations deteriorates substantially, due to the lag time between the expressed transcriptome and later changes in the protein profile. In the intrastrain analysis, only the alignment of protein and transcript levels within metabolic pathways specifically proved to be extremely reliable. This confirms the observations by Rossignol et al. (36).
Clearly, transcriptomic studies involving analyses across different time points are fraught with significant complications and therefore may be more difficult to interpret in a biologically meaningful manner. On the other hand, comparison of transcription patterns in the context of different genetic backgrounds appears to provide a reliable indication of underlying genetic differences and phenotypes. This means that many of the molecular causes of phenotypic differences between strains can most probably be directly derived from transcriptomic data sets.
Most notably, the concordance of gene and protein levels of enzymes involved in metabolism confirms transcriptional control of at least some of the important metabolic pathways in yeast. This implies that transcriptomic data can theoretically be applied to evaluate and model certain aspects of yeast metabolism with relative confidence. The agreement of protein abundance ratios between strains with the phenotypic characteristics of these strains further strengthens our belief that the “omics” data sets we have generated provide valuable and reliable insights into the fundamental molecular mechanisms at work in industrial wine yeast strains during alcoholic fermentation.