The geno- and phenotypic diversity of commercial Saccharomyces cerevisiae wine yeast strains provides an opportunity to apply the system-wide approaches that are reasonably well established for laboratory strains to generate insight into the functioning of complex cellular networks in industrial environments. We have previously analyzed the transcriptomes of five industrial wine yeast strains at three time points during alcoholic fermentation. Here, we extend the comparative approach to include an isobaric tag for relative and absolute quantitation (iTRAQ)-based proteomic analysis of two of the previously analyzed wine yeast strains at the same three time points during fermentation in synthetic wine must. The data show that differences in the transcriptomes of the two strains at a given time point rather accurately reflect differences in the corresponding proteomes independently of the gene ontology (GO) category, providing strong support for the biological relevance of comparative transcriptomic data sets in yeast. In line with previous observations, the alignment proves to be less accurate when assessing intrastrain changes at different time points. In this case, differences between the transcriptome and proteome appear to be strongly dependent on the GO category of the corresponding genes. The data in particular suggest that metabolic enzymes and the corresponding genes appear to be strongly correlated over time and between strains, suggesting a strong transcriptional control of such enzymes. The data also allow the generation of hypotheses regarding the molecular origin of significant differences in phenotypic traits between the two strains.
Saccharomyces cerevisiae has long been a model organism to investigate the biology of the eukaryotic cell. The yeast genome, which is compact and contains only around 6,000 protein-encoding genes, was completely sequenced in 1996 (18), but nearly 10% of putative proteins remain without predicted functions. The majority, if not all of these remaining gene products, are nonessential, and the deletion of these genes in most cases does not lead to a detectable phenotype.
A major limitation of most current approaches in this regard is that research is conducted using a limited number of laboratory yeast strains which, while displaying characteristics that are useful for genetic and molecular analyses, represent limited genetic and phenotypic diversity. These laboratory strains are furthermore significantly different from the strains that are used for industrial and commercial purposes. Industrial environments, however, constitute much of the evolutionary framework of the species S. cerevisiae in the past centuries, and many genes that appear not to be associated with a specific function in laboratory strains may be responsible for specific phenotypes in industrial strains. Such strains will therefore be better suited for the analysis of complex genetic and molecular networks and of their phenotypic relevance or biological meaning. The recent sequencing of wine yeast strains (9, 31) showed that a significant number of genes that are not found in the standard S288c laboratory strain were present in these strains and that a large number of other significant differences exist between these genomes. Furthermore, different wine yeast strains exhibit great variation in chromosome size and number, as well as ploidy, and cover a wide range of phenotypic traits, many of which are absent in laboratory yeast (6).
Large-scale gene expression analysis with microarrays is one of the most powerful and best-developed functional genomics methodologies that can be applied to yeast (5). Transcriptome analysis of wine yeast strains has already proven useful to analyze the broad genetic regulation of fermentative growth in wine environments and has allowed identification of stress response mechanisms that are active under these conditions (3, 16, 29). Rossouw et al. (37) showed that a comparative analysis of the transcriptome and exometabolome could be used to identify genes that are involved in aroma metabolism and to predict some of the impact of changed gene expression levels. While of great usefulness, transcription data alone are of limited value, since they cannot be directly correlated with protein levels and, a fortiori, with in vivo metabolic fluxes (13, 19, 36, 48). All omics data sets would indeed be significantly strengthened in combination with other layers of the biological information transfer system (36, 44, 47).
A current bottleneck of such approaches is that most “omics” tools are not developed to the same degree as transcriptomics. In particular, genome-scale protein quantification faces significant challenges, but methods for determining relative levels of protein between samples have been developed (42). Two-dimensional (2-D) gel electrophoresis has been and continues to be employed to separate complex protein mixtures and is frequently combined with in-gel tryptic digestion and mass spectrometry for the identification of proteins (27, 32). In general, most yeast proteomic studies to date have been conducted using this 2-D gel electrophoresis technology (10, 25, 36, 46). While over 1,400 soluble proteins of yeast have been identified using 2-D analyses, this approach has not addressed the issue of quantification in a satisfactory manner and also suffers from the relatively low number of proteins which are identified in a single analysis, combined with an underrepresentation of low-abundance and hydrophobic proteins (17, 35, 36). In wine yeast, the 2-D gel approach coupled to mass spectrometry has been used to study postinoculation changes in protein levels (39) and the proteomic response of fermenting yeast to glucose exhaustion (45). Rossignol et al. (36) used this approach to identify 59 proteins and compare the transcriptome and proteome of a single wine yeast strain during various stages of fermentation. Based on this analysis, those authors found limited alignment between these two layers of the biological information transfer system.
To overcome some of these limitations, whole-proteome analysis can also be implemented by a high-throughput chromatography approach in combination with mass spectrometry (28). The separation of peptides from complex protein digests is usually achieved by two-dimensional nano-liquid chromatography-mass spectrometry (LC/MS) (30). A total of 1,504 yeast proteins have been unambiguously identified in a single analysis using this 2-D chromatography approach coupled with tandem mass spectrometry (MS/MS) (34). Advances in LC/MS-based proteome analysis, in combination with advances in computational methods, have led to a more comprehensive identification and accurate quantification of endogenous yeast proteins (14, 26). Yet most of the above-mentioned studies were carried out with laboratory yeast strains, mostly under confined experimental conditions limited to steady, exponential growth rates. No such studies have been conducted using different wine yeast strains at different stages of the industrial growth cycle.
In our study we made use of such a chromatography-coupled mass spectrometry approach for the comparative analysis of wine yeast strains. To enable relative quantification between samples, we employed the 8-plex isobaric tag for relative and absolute quantitation (iTRAQ) labeling strategy. The strategy enables relative quantification of up to eight complex protein samples in a single analysis using isobaric tags (11). In short, unlabeled protein samples are trypsin digested, then labeled using isobaric tags (the eight reporter ions), and subsequently separated by liquid chromatography, followed by MS/MS. The covalently bound isobaric tags have the same charge and overall mass but produce different low-mass signatures upon MS/MS, thus enabling relative quantification between different samples in a single analysis (2).
In this paper, we extend the comparative omics approach by aligning the transcriptomes and proteomes of two industrial wine yeast strains. The transcriptomes of these strains, generated at the same time points under the same conditions, have been partially analyzed in a previous paper (37). Our data show that the differences in transcript levels of the two strains at a given time point are a reasonably accurate reflection of the differences in the corresponding protein levels independently of the gene ontology (GO) category. This provides strong support for the biological relevance of comparative transcriptomic data sets in yeast, showing that intrinsic differences between strains may form a more reliable platform for analyses of biologically relevant and meaningful genetic features of a system. Interstrain comparative transcriptome and proteome analyses (as opposed to single-strain analyses) appear to substantially increase our ability to provide a biologically relevant interpretation of omics data sets and to understand metabolic and physiological changes that occur during wine fermentation. Such combinatorial comparative approaches should ultimately enable accurate model building for industrial wine yeast and facilitate the generation of intelligent yeast improvement strategies.
Two yeast strains were used in this study, namely, VIN13 (Anchor Yeast, South Africa) and BM45 (Lallemand Inc., Canada). Both are diploid Saccharomyces cerevisiae strains used in industrial wine fermentations. Yeast cells were cultivated at 30°C in yeast extract-peptone-dextrose (YPD) synthetic media, with 1% yeast extract (BioLab, South Africa), 2% peptone (Fluka, Germany), and 2% glucose (Sigma, Germany). Solid medium was supplemented with 2% agar (BioLab, South Africa).
Fermentation experiments were carried out with synthetic must MS300, which approximates to a natural must as previously described (7). The medium contained 125 g/liter glucose and 125 g/liter fructose, and the pH was buffered at 3.3 with NaOH.
All fermentations were carried out under microaerobic conditions in 100-ml glass bottles (containing 80 ml of the medium) sealed with rubber stoppers with a CO2 outlet. The fermentation temperature was approximately 22°C, and no continuous stirring was performed during the course of the fermentation. Fermentation bottles were inoculated with YPD cultures in logarithmic growth phase (around an optical density at 600 nm [OD600] of 1) to an OD600 of 0.1 (i.e., a final cell density of approximately 10⁶ CFU·ml⁻¹). The cells from the YPD precultures were briefly centrifuged and resuspended in MS300 to avoid carryover of YPD to the fermentation media. The fermentations followed a time course of 14 days, and the bottles were weighed daily to assess the progress of fermentation. Samples of the fermentation media and cells were taken at days 2, 5, and 14 as representative of exponential, early stationary, and late stationary growth phases, respectively.
Transcriptome data were generated (using the Affymetrix platform) at three time points during fermentation, namely, day 2 (exponential growth phase), day 5 (early stationary phase), and day 14 (late stationary phase) at the end of fermentation. These data were evaluated in part for a previous publication (38). Sampling of cells from fermentation and total RNA extraction were performed as described by Abbott et al. (1). For a complete description of the hybridization conditions, normalization, and statistical analysis, refer to the work of Rossouw et al. (37). Transcript data can be downloaded from the Gene Expression Omnibus (GEO) repository under accession number GSE11651.
General chemicals for sample preparation were acquired from Merck. Samples of the cells were taken from the fermentations (at days 2, 5, and 14) by centrifugation and weighed after being washed with double-distilled water (ddH2O). The pellets were sonicated using a Soniprep 150 probe sonicator on ice in 30-s bursts and then spun at 16,000× g, and the supernatants were collected. Protein content was assayed by the EZQ method (Invitrogen), and aliquots containing 50 μg of total protein underwent reduction (incubation with 10 mM dithiothreitol [DTT] at 56°C for 1 h) and alkylation (incubation with 30 mM iodoacetamide at pH 8.0 in the dark for 1 h) and were then quenched with further DTT. Samples were subsequently digested by incubation with 2 μg of trypsin (Promega, Madison, WI) at 37°C overnight. The resulting peptides were desalted on 10-mg Oasis SPE cartridges (Waters Corporation, MA) and completely dried down using a speed vacuum concentrator (Thermo Savant, Holbrook, NY).
Dried protein digests were reconstituted with 30 μl of dissolution buffer from the iTRAQ reagents multiplex kit (Applied Biosystems, Foster City, CA) and labeled with 8-plex iTRAQ reagents, according to the manufacturer's instructions. Labeled material from six different samples was then combined, acidified, desalted as described above, concentrated to approximately 50 μl, and finally diluted to 250 μl in 0.1% formic acid.
Pooled samples were fractionated in an on-line fashion on a BioSCX II 0.3- by 35-mm column (Agilent Technologies, Santa Clara, CA) using the following 10 salt steps: 10, 20, 40, 60, 80, 100, 140, 200, 260, and 500 mM KCl. Peptides were captured on a 0.3- by 5-mm PepMap cartridge (LC Packings, Dionex Corporation, Sunnyvale, CA) before being separated on a 0.3- by 100-mm Zorbax 300SB-C18 column (Agilent). The high-pressure liquid chromatography (HPLC) gradient between buffer A (0.1% formic acid in water) and buffer B (0.1% formic acid in acetonitrile) was formed at 6 μl/min as follows: 10% buffer B for the first 3 min, increasing to 35% buffer B by 80 min, increasing to 95% buffer B by 84 min, held at 95% until 91 min, back to 10% buffer B at 91.5 min, and held there until 100 min.
The LC effluent was directed into the IonSpray source of the QStar XL hybrid quadrupole time-of-flight mass spectrometer (Applied Biosystems), scanning from 300 to 1,600 m/z. The top three most abundant multiply charged peptides were selected for MS/MS analysis (55 to 1,600 m/z). The mass spectrometer and HPLC system were under the control of the Analyst QS software package (Applied Biosystems).
All of the data files from each 2-D liquid chromatography-MS/MS experiment were searched as a set by ProteinPilot 2.0.1 (Applied Biosystems) against a yeast protein database from Stanford University's Saccharomyces Genome Database (5,884 sequences, downloaded November 2008). The data were also searched against the same set of sequences in reverse to estimate the false discovery rate for each run, which was below 0.3% for all three runs. The proteomic data set is available in the supplemental material.
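The reversed-database search described above can be illustrated with a minimal sketch. The text does not state how ProteinPilot computes the false discovery rate, so the simple decoy-to-target hit ratio used here is an assumption, and the function names are hypothetical.

```python
def reverse_sequence(protein_sequence: str) -> str:
    """Build a decoy entry by reversing a target protein sequence."""
    return protein_sequence[::-1]


def estimate_fdr(n_target_hits: int, n_decoy_hits: int) -> float:
    """Estimate the false discovery rate as decoy hits / target hits.

    Assumption: target and reversed databases are searched separately and
    accepted identifications are counted in each. Other conventions exist.
    """
    if n_target_hits <= 0:
        raise ValueError("at least one target hit is required")
    return n_decoy_hits / n_target_hits


if __name__ == "__main__":
    # Illustrative counts only, not values from the study.
    print(f"{estimate_fdr(1000, 2):.2%}")
```

Under this convention, a run passes the threshold reported in the text when the returned value is below 0.003.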
Microarray data were normalized with the GCRMA method (50). Ratios of the RNA levels for each gene at each time point comparing BM45 to VIN13 were subsequently calculated from the means of the technical replicates performed for each strain. If the resulting ratio was less than 1, it was transformed by taking its negative inverse in order to express relative expression levels on the same scale (e.g., a ratio of 0.5 becomes −2). Ratios for protein levels between BM45 and VIN13 were similarly created. Ratios for the RNA and protein levels were also created to show the differences between time points within each strain.
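The ratio calculation and negative-inverse transformation can be sketched as follows. This is a minimal illustration in Python rather than the Perl used in the study; the function name is hypothetical, and the inputs are assumed to be replicate intensities on a linear (not log) scale.

```python
from statistics import mean


def signed_fold_change(bm45_replicates, vin13_replicates):
    """Return the BM45/VIN13 ratio of replicate means as a signed fold change.

    Ratios of 1 or more are returned unchanged; ratios below 1 are replaced
    by their negative inverse, so 0.5 becomes -2.0.
    Assumption: all values are positive and on a linear scale.
    """
    ratio = mean(bm45_replicates) / mean(vin13_replicates)
    return ratio if ratio >= 1 else -1.0 / ratio


if __name__ == "__main__":
    print(signed_fold_change([200, 220, 180], [100, 100, 100]))  # 2.0
    print(signed_fold_change([50, 50, 50], [100, 100, 100]))     # -2.0
```

With this transformation a 2-fold increase and a 2-fold decrease have the same magnitude, which makes up- and down-regulation directly comparable in tables.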
XML files for the KEGG pathway database (21, 22, 23) were downloaded, parsed, and used to create an undirected graph consisting of nodes representing pathways and nodes representing gene products which participate in said pathways. Edges between the gene product nodes and each of the pathway nodes in which they are thought to participate were created. A neighborhood walking algorithm was implemented in order to extract subgraphs corresponding to all of the gene products and their associated pathways for which we had ratios for both protein and RNA levels. Given that the proteins identified by iTRAQ varied across time points (within and between each strain), this subgraph extraction was done separately for each time point.
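The graph construction and neighborhood walk can be sketched as below. This is a hedged illustration in Python, not the authors' Perl implementation: the KEGG XML parsing step is omitted, the pathway membership is passed in as a plain dictionary, and all function names and the toy identifiers (P1, G1, and so on) are hypothetical.

```python
from collections import defaultdict


def build_graph(pathway_members):
    """Build an undirected bipartite graph of pathway and gene product nodes.

    pathway_members maps a pathway name to the gene products in it.
    Nodes are (kind, name) tuples so the two node types cannot collide.
    """
    graph = defaultdict(set)
    for pathway, genes in pathway_members.items():
        for gene in genes:
            graph[("pathway", pathway)].add(("gene", gene))
            graph[("gene", gene)].add(("pathway", pathway))
    return graph


def extract_subgraph(graph, quantified_genes):
    """Walk from each quantified gene product to its neighboring pathways.

    Only gene products with both RNA and protein ratios (quantified_genes)
    and the pathways they touch are kept in the returned subgraph.
    """
    subgraph = defaultdict(set)
    for gene in quantified_genes:
        node = ("gene", gene)
        for neighbor in graph.get(node, ()):
            subgraph[node].add(neighbor)
            subgraph[neighbor].add(node)
    return dict(subgraph)


if __name__ == "__main__":
    toy = build_graph({"P1": ["G1", "G2"], "P2": ["G2", "G3"]})
    # One extraction per time point, since the quantified set differs.
    print(extract_subgraph(toy, {"G2"}))
```

Because the set of quantified proteins changes between time points, the extraction would be called once per time point with the corresponding gene set.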
The resulting subgraphs were visualized with Cytoscape 2.6.1 (12, 41). Pathways representing differences between strains as well as reasonable concordance in the regulation of RNA and protein levels were subsequently selected. An unweighted force-directed layout algorithm was applied to the selected subgraphs, and finally, the order of gene product nodes around pathway nodes was manually adjusted to be consistent across time points. Manual node order adjustment was necessary due to the variation in protein data identified by iTRAQ from time point to time point.
The resulting visually mapped subgraphs provide an effective visualization method with which to observe the ratios of RNA and proteins involved in specific pathways simultaneously and, as such, give further insight into the differences in metabolic regulation between strains and time points for both types of molecules.
All programming required for ratio creation, data parsing, graph creation, and neighborhood walking was implemented in Perl.
Protein abundance data for the BM45 and VIN13 strains were generated at three time points during fermentation, namely, day 2 (exponential growth phase), day 5 (early stationary phase), and day 14 (late stationary phase). Three repeats each for both of the strains were combined for each time point in a single 8-plex iTRAQ analysis. In other words, the repeats for BM45 and VIN13 were grouped for comparative analyses into three sets according to time points (i.e., all day 2 samples were grouped together, all day 5 samples were grouped together, and all day 14 samples were grouped together). A total of 436 proteins were unambiguously identified. Not all of these proteins were identified for both strains across all three time points, but for each time point, at least 250 common proteins were quantified for the three BM45 samples and the three VIN13 samples.
To get an impression of the general data structure and overall alignment of transcript and protein data when comparing the two strains at each time point, we first calculated the ratios of the concentrations of identified proteins and the ratios of the corresponding gene expression values between the two strains (i.e., for BM45 versus VIN13 at each of the three time points). As a broad measure of alignment, we used the log ratios of these protein and transcript comparisons (Fig. 1). In these representations, values above 1 and below −1 represent cases for which the fold change in protein concentration diverges by a factor of more than 2 from the fold change in transcription levels between the two strains. In other words, the changes in transcript levels are not aligned with the observed changes in protein levels outside these 1 and −1 value cutoffs.
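The alignment measure can be written as a short worked example. The sketch below assumes that the inputs are plain positive BM45/VIN13 ratios (before any negative-inverse transformation); the function names are hypothetical.

```python
import math


def log2_protein_to_mrna(protein_ratio, mrna_ratio):
    """log2 of (interstrain protein ratio / interstrain mRNA ratio).

    A value of 0 means the protein and transcript fold changes are identical.
    """
    return math.log2(protein_ratio / mrna_ratio)


def is_aligned(protein_ratio, mrna_ratio, cutoff=1.0):
    """True when protein and mRNA fold changes differ by at most 2-fold."""
    return abs(log2_protein_to_mrna(protein_ratio, mrna_ratio)) <= cutoff


if __name__ == "__main__":
    print(is_aligned(1.5, 1.2))  # within the 2-fold window
    print(is_aligned(3.0, 1.0))  # protein change exceeds mRNA change 3-fold
```

A cutoff of 1 on the log2 scale corresponds to the 2-fold divergence threshold described in the text.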
Figure 1 shows the general alignment of the log2-transformed protein/mRNA ratios, represented as a distribution curve. Log2-transformed ratios close to zero indicate very strong agreement between the protein levels and gene expression levels for comparisons between strains. Hence, the steeper the slopes of the Gaussian-shaped curves, the closer the alignment of transcript and protein data sets as a whole. For the interstrain analysis at specific time points, there is clearly a significant peak for days 2 and 5 around the optimal alignment point of zero, with sharply declining slopes in the direction of the 2-fold change indicators (namely, values of 1 and −1). The narrow peaks for these 2 days are a clear indicator of the close alignment of the protein and transcript data sets. The opposite is clearly true for day 14 (Fig. 1C), where no clear Gaussian distribution is evident, but rather, a segmented pattern of increase and decrease across the wide range of protein/transcript ratios is shown.
For a more-detailed analysis of individual protein-transcript pairs, standard t tests were applied to the three repeats of BM45 and VIN13 to determine significant differences in gene or protein levels. The interstrain ratios for transcripts or proteins are set to 1 in cases where no statistically significant differences exist for either the mRNA or protein levels between these two strains. Where interstrain differences are significant, the fold changes are reported for BM45 versus VIN13. This enables comparisons of transcript and corresponding protein fold changes to be made. Examples of the interstrain alignments of mRNA-protein pairs involved in general metabolism (Table 1) and cell rescue and defense (Table 2) for BM45 versus VIN13 are shown in the tables.
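The significance filter can be sketched as follows. The text specifies only "standard t tests" on three repeats per strain, so the pooled-variance two-sample test, the two-sided significance level of 0.05, and the function names are assumptions made for illustration. With three repeats per strain the test has 4 degrees of freedom, for which the two-sided 0.05 critical value is 2.776.

```python
import math
from statistics import mean, variance

# Two-sided critical t value for alpha = 0.05 and 4 degrees of freedom
# (3 + 3 replicates). The alpha level is an assumption, not from the text.
T_CRIT_DF4 = 2.776


def student_t(a, b):
    """Pooled-variance two-sample t statistic for replicate lists a and b."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    if pooled == 0:
        diff = mean(a) - mean(b)
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def reported_ratio(bm45, vin13, t_crit=T_CRIT_DF4):
    """Return the BM45/VIN13 ratio, or 1.0 if the difference is not significant."""
    if abs(student_t(bm45, vin13)) < t_crit:
        return 1.0
    return mean(bm45) / mean(vin13)


if __name__ == "__main__":
    print(reported_ratio([200, 210, 190], [100, 105, 95]))   # significant
    print(reported_ratio([100, 120, 80], [105, 95, 110]))    # set to 1.0
```

The same function would be applied separately to the transcript replicates and to the protein replicates of each gene-protein pair.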
For the day 2 analysis, only 9 of the 248 protein/mRNA ratios (for the entire set of identified proteins) differed significantly by a fold change of more than 2. This means that comparisons between strains at a given time point are surprisingly reliable, as fold changes in gene expression and in protein abundance data align with more than 95% overlap within the 2-fold threshold. The same observation holds for the day 5 analysis, where once again only about 3% (8 out of 260) of the protein/mRNA pair ratios differed by a fold change of 2 or greater. These data clearly suggest that comparisons of transcript levels are surprisingly reliable in predicting differences in protein levels between two strains. This appears to hold true for all GO categories and is in stark contrast with previous data (36) which suggest that similar predictions are not reliable when analyzing the evolution of transcriptomes and proteomes during fermentation across time points for a given strain.
By day 14 of fermentation, the close alignment of transcript and protein ratios between strains breaks down slightly. Here, 32 of the 277 protein-mRNA pairs show significant discrepancies in the comparative ratios between BM45 and VIN13. The poorer alignment at this stage of fermentation can probably be explained by the fact that active fermentation has stopped and that cells are exposed to severe stress in the form of high ethanol levels and nutrient depletion. At this stage, active transcription is at a minimum, except for those genes related to the mobilization of reserve nutrients or tolerance of the severe stress conditions faced as the cells slow down metabolically. The levels of accumulated proteins still present at this point may thus bear limited correlation to the levels of mRNA in the cells.
In order to compare peptide signal areas between different runs (i.e., for comparisons between different time points for either VIN13 or BM45), the data were normalized as follows: all of the iTRAQ signals for peptides that are not shared among multiple detected proteins and that have a confidence score of at least 1.00 were selected. The area for each label in these peptides was calculated as a percentage of the total iTRAQ signal for that label. This final transformed value is more conducive to comparisons across multiple iTRAQ experiments. The agreement among the replicates when expressed as a percentage of the total signal, as per our calculations, was very good and enabled intrastrain comparisons across time points to be made.
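The normalization can be sketched as below. The record layout (the keys "peptide", "shared", "confidence", and "areas") and the function name are hypothetical, chosen only to illustrate the selection and percentage steps; the actual export format from ProteinPilot is not described in the text.

```python
def normalize_channels(peptides, min_confidence=1.00):
    """Express each reporter-ion area as a percentage of its label's total.

    peptides: list of dicts with keys
        "peptide"    - peptide identifier
        "shared"     - True if the peptide maps to multiple detected proteins
        "confidence" - identification confidence score
        "areas"      - dict mapping iTRAQ label to reporter-ion peak area
    Only unshared peptides at or above min_confidence are used, both for
    the per-label totals and for the returned values.
    """
    selected = [
        p for p in peptides
        if not p["shared"] and p["confidence"] >= min_confidence
    ]
    labels = {label for p in selected for label in p["areas"]}
    totals = {
        label: sum(p["areas"].get(label, 0.0) for p in selected)
        for label in labels
    }
    return [
        {
            "peptide": p["peptide"],
            "percent": {
                label: 100.0 * area / totals[label]
                for label, area in p["areas"].items()
                if totals[label] > 0
            },
        }
        for p in selected
    ]


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    demo = [
        {"peptide": "PEP1", "shared": False, "confidence": 99.0,
         "areas": {"113": 30.0, "114": 10.0}},
        {"peptide": "PEP2", "shared": False, "confidence": 95.0,
         "areas": {"113": 10.0, "114": 30.0}},
        {"peptide": "PEP3", "shared": True, "confidence": 99.0,
         "areas": {"113": 500.0, "114": 500.0}},
    ]
    for row in normalize_channels(demo):
        print(row)
```

Because each label is scaled to its own total, differences in loading or labeling efficiency between runs are removed before time points are compared.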
When the analysis of transcript versus protein ratios was applied to the intrastrain data sets established at different time points, the results indicated a largely random distribution of protein/transcript ratios (Fig. 1D and E). The intrastrain comparisons clearly do not conform to the distribution curve seen for interstrain alignments. It must be kept in mind that in this analysis, a large positive or negative change in the expression of a particular gene or protein, along with a moderate or large change in the corresponding protein levels (in the same direction), would fall outside the threshold applied here for a good alignment. However, such an alignment would in many cases be considered a good fit from a biological perspective.
To overcome the inherent stringency of this form of analysis, and considering the breakdown of correlation between transcripts and protein levels observed for the intrastrain analysis, we decided to use trends in transcript and protein levels as a second criterion. This assessment is much less stringent since it queries only whether up or down changes in transcript levels over time points would generally correlate with similar trends in protein levels. In this case, ratios in which both transcripts and proteins were less than 1 or greater than 1 were considered aligned. Inverse ratios (i.e., one ratio was less than 1 and the other was greater than 1) constituted a negative result (nonaligned).
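The trend criterion can be sketched as follows. The function names are hypothetical, and the handling of a ratio exactly equal to 1 (counted here as nonaligned) is an assumption, since the text does not address that case.

```python
def trends_aligned(mrna_ratio, protein_ratio):
    """True when both ratios are above 1 or both are below 1.

    Assumption: a ratio exactly equal to 1 counts as nonaligned.
    """
    both_up = mrna_ratio > 1 and protein_ratio > 1
    both_down = mrna_ratio < 1 and protein_ratio < 1
    return both_up or both_down


def percent_aligned(pairs):
    """Percentage of (mRNA ratio, protein ratio) pairs with matching trends."""
    pairs = list(pairs)
    aligned = sum(trends_aligned(m, p) for m, p in pairs)
    return 100.0 * aligned / len(pairs)


if __name__ == "__main__":
    # Illustrative ratios only (later time point / earlier time point).
    example = [(2.0, 1.5), (0.5, 0.8), (2.0, 0.7), (0.4, 1.3), (3.0, 1.1)]
    print(percent_aligned(example))  # 60.0
```

If transcript and protein directions were unrelated, this percentage would be expected to lie near 50, which is the baseline against which the observed values are judged.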
Using this approach, the alignment of protein versus transcript data for the VIN13 and BM45 strains between time points (i.e., day 5 versus day 2 and day 14 versus day 5) was only around 60% for all three comparisons. Considering that a random sample would yield 50%, this value is surprisingly low but in line with previous reports. Even when protein-transcript pairs for only the top 50 genes in terms of the magnitude of the increase/decrease in mRNA levels were evaluated, the trend analysis did not improve in any noteworthy manner. For day 5 versus day 2 for both of the strains, the alignment value increased slightly from 65 to 68%, but for day 14 versus day 5, there was a decrease to close to 50%, much lower than the 60% value calculated for the entire gene set. This is surprising, since the transcript levels of these genes were changed by at least 1.8-fold (and up to 32-fold), and such relatively significant changes would generally be expected to be reflected on the proteome level.
There are several possible explanations for this discordant alignment of transcript and protein levels for the intrastrain comparisons. First, our transcriptome and proteome data were generated at the same stage of fermentation. However, the proteome at a specific time point is a reflection of previous rather than concomitant transcript levels. In other words, it would be expected that a particular transcriptomic data set should be more closely aligned with proteomic data that are generated at a later time point, i.e., after the translation and posttranslational modification workflow has responded to the earlier changes in transcription levels. Second, the time points assessed here represent very different environmental conditions within a dynamically changing system, whereas the comparison of different strains at the same time points de facto normalizes for the environmental background. Another point to consider involves the half-lives of proteins and protein turnover. Differences in the turnover rate of mRNA versus the half-lives of encoded proteins would also lead to a discrepancy in the correlation of the mRNA and protein, particularly during stationary phase when the half-lives of certain proteins are extended.
These findings help to explain our observation that the predictive capacity of the omics matrix that was derived from the alignment of transcriptome and exometabolome data sets (37) was statistically reliant mainly on the comparative analysis of several strains and much less reliant on intrastrain comparisons.
Our data set also confirms previous observations (14, 15, 36) that transcriptomic and proteomic data sets are frequently difficult to align across different time points and that transcriptome data need to be interpreted with caution. This is particularly the case when only a single strain is analyzed, as any changes at the transcript level might be specific to the strain in question and not represent a generally relevant response. In this sense, transcriptome comparisons of different strains under the same experimental conditions (regarding time point and medium composition, etc.) represent a more reliable system for inferring biological meaning, since only the genetic background will provide the basis for differences in physiological or phenotypic changes. Using different strains in comparative transcriptome analyses represents an inherent control system that is self-standardized to limit “noisy” outputs.
Transcript-protein pairs showing discordant regulation between time points (i.e., opposite trends in protein and mRNA levels) were investigated more closely in the two strains. Interestingly, the nonaligned gene-protein pairs followed similar trends in both of the strains, suggesting that these trends are not due to experimental error or noise but rather to a consistent feature of the system. To clarify, for the total of 95 gene-protein pairs showing opposite trends in expression levels for the day 5 analysis versus the day 2 analysis in either of the two strains, only 21 of these gene-protein pairs did not overlap between strains. For the day 14 analysis versus the day 5 analysis, only 37 of the total of 124 nonaligned mRNA-protein pairs did not overlap between the BM45 and VIN13 strains. Thus, discordant alignment between transcriptomes and proteomes between time points is relatively consistent for the different strains, which is helpful for elucidating the regulation of expression/translation of these consistently nonaligned genes. Without the use of multiple strains, this feature of the transcriptome/proteome would have been overlooked. The set of overlapping, yet discordant, transcript-protein pairs was classified according to functional activity; translation, cysteine metabolism, and biopolymer biosynthesis were strongly represented categories.
For comparisons within a single experiment, the ratios of BM45 and VIN13 for both expressed genes and proteins were determined and compared. To facilitate evaluation of the data, the protein-mRNA pairs were categorized according to GO classification terms. The proteins identified in our analysis can reasonably be considered representative of the entire proteome, as all functional categories are well represented (i.e., approximately 160 proteins are involved in energy and metabolism, 25 in cell cycle regulation, 35 in cellular transport, 35 in cell rescue and defense, 80 in protein synthesis, and 25 in transcription). Furthermore, no bias toward any generic protein feature, such as concentration or hydrophobicity profiles, was obvious in the data. In this section, the following two relevant categories are further discussed as examples: energy and metabolism as well as cell rescue and defense (Tables 1 and 2).
As can be seen in Tables 1 and 2, and as would be expected when considering the overall good alignment presented for the interstrain comparisons at similar time points, the relative over- or underexpression of genes generally coincides with a similar trend in the protein abundance data (particularly for the first two time points during fermentation).
The same functional categories were also analyzed for the intrastrain data. Surprisingly, when considering the rather poor general alignment of changes in transcript and protein levels in this case, gene expression and protein levels also aligned well for the specific functional categories of amino acid metabolism and fermentative metabolism, suggesting a strong transcriptional control of such metabolic enzymes (Table 3). This is in contrast to the results reported by Rossignol et al. (36), in which most of the glycolytic and amino acid metabolic proteins identified showed opposite correlations between mRNA and proteins between the two fermentative stages considered (exponential phase versus stationary phase) during alcoholic fermentation in synthetic must (MS300).
Other categories showed almost no relationship between changes in transcript and protein levels. As an example, Table 4 shows data from the GO category of transcription and cell cycle control. The difference in the alignment of protein and transcript data between different functional categories becomes quite apparent when contrasting the results depicted in Tables 3 and 4. Transcriptomic data thus appear to be reasonably representative of protein levels for metabolic enzymes but not for most other GO categories such as general cell maintenance and growth.
In a related work, the strains VIN13 and BM45 were phenotypically profiled (38), and some differences in protein abundance between the two strains can tentatively be correlated to specific phenotypic differences. For instance, the significantly lower levels of several heat shock proteins, such as Hsp60, Hsp82, and Ddr48, in BM45 in comparison to those in VIN13 (Table 2) could account for the generally lower tolerance of this strain under various stress conditions, including heat stress. This hypothesis is strongly supported by data that show that individual overexpression of these genes results in higher stress resistance, and lifting the expression level of these genes in BM45 to the level observed in VIN13 should therefore result in a recognizable phenotypic change (8, 40). Similarly, lower levels of antioxidant proteins such as Tsa1 and Yhb1 (Table 2) could also explain the increased susceptibility of BM45 to oxidative stress in comparison to the susceptibility of VIN13 (49). Lower protein abundances of Erg13, Erg20, and Erg6 (Table 1) in BM45 versus those in VIN13 could also account for the lower ethanol and osmotic shock tolerance of BM45, given that these proteins are involved in the production of a variety of sterols with roles in cell membrane stabilization (33, 51).
Regarding metabolism, the data indicate why the alignment of exometabolome and transcriptome data has previously proven successful (37). Indeed, differences in the ratios of several proteins involved in the synthesis of the aromatic amino acids (namely, Aro1, Aro3, Aro4, and Aro8) (Table 1) are reflected by differences in the concentrations of the end products of these pathways. Likewise, Bat1 is involved in catalyzing the first transamination step of the catabolic formation of fusel alcohols via the Ehrlich pathway (24). Differences in Bat1 expression (Fig. 2; Table 1) have proven to effect large changes in higher alcohol production by wine yeast strains (37). BAT1 gene expression and Bat1 protein levels are quite notably concordant (Fig. 2), and the decrease in expression of BM45 relative to that of VIN13 agrees with metabolite data showing significantly lower propanol, butanol, and methanol production by BM45 in comparison to that by VIN13 (37). In fact, this close alignment between transcript and protein levels appears to be the case for almost all of the gene-protein pairs linked to the metabolism of the amino acids shown in Fig. 2, at both days 2 and 5 and even at day 14. In Fig. 2, it is clear that there is a direct correlation between transcript and protein abundance in central metabolic pathways (such as those pathways related to amino acid metabolism in this example).
Amino acid metabolism is of particular interest from a wine-making perspective, as amino acids serve as the precursors of important volatile aroma compounds. For instance, sulfur-containing amino acids such as methionine (and cysteine to a lesser extent) are the precursors for the volatile thiols that are significant aroma compounds in wine (43). The branched-chain amino acids such as valine, leucine, and isoleucine, on the other hand, serve as the precursors for various higher alcohols. Of the enzymes involved in branched-chain amino acid metabolism, BAT1 has been discussed above. Other genes that encode enzymes in this pathway and that were identified in our previous study (37) for their strong statistical link between expression levels and the production of specific aroma compounds include LEU2, encoding a beta-isopropylmalate dehydrogenase that catalyzes the third step in the leucine biosynthesis pathway (4). Expression of this gene showed a significant statistical correlation with compounds such as isobutanol (37), and as can be seen in Fig. 2, the relative transcript and protein abundance ratios align well for this gene. Of the genes involved in the metabolism of isoleucine and valine (precursors for higher alcohol synthesis), the ILV gene family (ILV1, ILV2, ILV3, ILV5, and ILV6) encode isoforms of acetohydroxyacid reductoisomerases involved in branched-chain amino acid biosynthesis (20). Expression of the ILV gene isoforms showed strong positive correlations with many higher alcohols analyzed in a previous study, and expression differences between BM45 and VIN13 once again align with differences in the exometabolite profiles of these two strains, as reported by Rossouw et al. (37). The ILV gene/protein ratio is also well aligned, again confirming the tight, concordant regulation of transcript levels and enzyme abundance in key metabolic pathways.
In terms of intrastrain comparisons between time points, the alignment of changes in transcription and protein abundance is also good when considering metabolic pathways such as those of amino acid metabolism (Fig. 2D). Although the intensity of the fold change differs for mRNA and proteins, the overall trends match up well. Figure 2D shows that there is a general downregulation of transcripts (and their corresponding proteins) involved in amino acid metabolism as fermentation proceeds from exponential growth phase (day 2) to early stationary phase (day 5). This is in line with yeast growth behavior, as day 5 represents a fermentative phase characterized by continued high rates of fermentative metabolism associated with a significant reduction in growth and biomass formation.
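The distinction drawn above between the intensity of a fold change and its overall trend can be made concrete with a small sketch. The log2 fold changes below (day 5 relative to day 2) are hypothetical values chosen for illustration and are not taken from Fig. 2D. The snippet checks how many gene-protein pairs change in the same direction and computes a Pearson correlation coefficient, which is one possible way (assumed here, not necessarily the one used in this study) to quantify how well the trends match when protein changes are smaller in magnitude than transcript changes.

```python
import math

# Hypothetical log2 fold changes (day 5 vs. day 2); NOT data from this study.
mrna_log2fc = [-2.0, -1.0, -3.0, -0.5]
protein_log2fc = [-0.9, -0.3, -1.2, -0.2]


def same_direction_fraction(xs, ys):
    """Fraction of pairs whose fold changes share the same sign."""
    agree = sum(1 for x, y in zip(xs, ys) if x * y > 0)
    return agree / len(xs)


def pearson(xs, ys):
    """Pearson correlation coefficient computed from first principles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


direction = same_direction_fraction(mrna_log2fc, protein_log2fc)
r = pearson(mrna_log2fc, protein_log2fc)
print(f"same direction: {direction:.2f}, Pearson r: {r:.3f}")
```

In this toy example every pair is downregulated at both levels and the correlation is high, even though each protein change is less than half the size of the corresponding transcript change.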
Although our coverage of the yeast proteome was only around 5%, the identified proteins were distributed over all functional categories. This coverage is also significantly higher than that obtained in previous studies (36) and appears sufficient to assess the biological relevance and reliability of the transcriptome data. In our study, the alignment of relative protein abundance ratios with gene expression data was accurate for data generated in the early stages of fermentation (days 2 and 5), when active cell growth and metabolism are occurring. In the case of data comparisons across time points, the quality of gene expression to protein correlations deteriorates substantially, due to the lag time between the expressed transcriptome and later changes in the protein profile. In the intrastrain analysis, only the alignment of protein and transcript levels within metabolic pathways specifically proved to be extremely reliable. This confirms the observations by Rossignol et al. (36).
Clearly, transcriptomic studies involving analyses across different time points are fraught with significant complications and therefore may be more difficult to interpret in a biologically meaningful manner. On the other hand, comparison of transcription patterns in the context of different genetic backgrounds appears to provide a reliable indication of underlying genetic differences and phenotypes. This means that many of the molecular causes of phenotypic differences between strains can most probably be directly derived from transcriptomic data sets.
Most notably, the concordance of gene and protein levels of enzymes involved in metabolism confirms transcriptional control of at least some of the important metabolic pathways in yeast. This implies that transcriptomic data can theoretically be applied to evaluate and model certain aspects of yeast metabolism with relative confidence. The agreement of protein abundance ratios between strains with the phenotypic characteristics of these strains further strengthens our belief that the “omics” data sets we have generated provide valuable and reliable insights into the fundamental molecular mechanisms at work in industrial wine yeast strains during alcoholic fermentation.
Funding for the research presented in this paper was provided by the NRF and Winetech, and personal sponsorship was provided by the Wilhelm Frank Trust.
Proteomic analysis was performed by Martin Middleditch at the Centre for Genomics and Proteomics at the University of Auckland. We thank Jo McBride and the Cape Town Centre for Proteomic and Genomic Research for the microarray hybridization and subsequent signal detection and the staff and students at the IWBT for their support and assistance in numerous areas.
Published ahead of print on 23 April 2010.
†Supplemental material for this article may be found at http://aem.asm.org/.