The effect of oxygen provision on gene transcription in steady state glucose-limited chemostats
Microarray analysis of yeast from glucose-limited chemostat cultivations with 0, 0.5, 1.0, 2.8 and 20.9% oxygen in the feed gas was performed. Statistical analysis of the steady state data revealed that 3435 genes responded significantly (p < 0.01) to oxygen availability under the five conditions studied. While the highest number of responsive genes (2900) was observed between the anaerobic and fully aerobic conditions, the number of genes differentially expressed between the conditions of intermediate oxygen (0.5-2.8%) was relatively small (Figure 1). The transcriptomes of cultures with 0.5% and 1.0% oxygen were particularly similar: only 10 genes differed significantly (p < 0.01) in their expression. When the anaerobic or fully aerobic conditions were compared to conditions of intermediate oxygen, significant differences were found in 2000-2400 and 1500-1600 genes, respectively.
Figure 1 Venn diagrams of the genes which differ significantly (p < 0.01) in conditions of different oxygen provision in the feed gas. A. Anaerobic and either 0.5 or 20.9% oxygen in the feed gas and B. 0.5% and 1.0%, 0.5% and 2.8%, and 1.0% and 2.8% oxygen.
To obtain an overall picture of metabolic pathways responding to oxygen availability, gene set enrichment analysis was performed. This analysis allows the identification of defined sets of genes with differential expression between two classes of samples [31]. Parametric gene set enrichment analysis (PAGE) uses fold changes between experimental groups to calculate Z scores for predefined gene sets and uses the normal distribution to infer the statistical significance of the gene sets [33]. This approach was used in the present study to identify KEGG pathways and GO categories (containing 10 or more genes) which contained genes that were differentially expressed in conditions of different oxygen provision. Pairwise comparisons of successive oxygen levels and of the anaerobic and fully aerobic conditions are shown in the table of PAGE results. Comparison of intermediate oxygen levels showed that few pathways were differentially expressed when cells were provided with 0.5, 1.0 or 2.8% oxygen. In particular, comparison of 0.5% and 1.0% oxygen found no statistically significant differences, even at a p-value threshold of 0.05 (data not shown).
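As a minimal sketch of the PAGE calculation described above, the Z score of a gene set can be computed from the mean fold change of its members relative to the mean and standard deviation of the fold changes of all genes. The function name `page_z_score`, the use of the population standard deviation, and the toy fold-change values are assumptions made for illustration; they are not data or code from this study.

```python
import math
import statistics


def page_z_score(all_log_fc, set_genes, min_size=10):
    """Return (Z, two-sided p) for one gene set, or None if the set is too small."""
    values = list(all_log_fc.values())
    members = [all_log_fc[g] for g in set_genes if g in all_log_fc]
    m = len(members)
    if m < min_size:
        return None
    mu = statistics.mean(values)        # mean fold change of all genes
    sigma = statistics.pstdev(values)   # standard deviation of all genes
    if sigma == 0:
        raise ValueError("fold changes are constant; Z score is undefined")
    z = (statistics.mean(members) - mu) * math.sqrt(m) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the standard normal
    return z, p


# Hypothetical log fold changes: 100 genes, the first 10 shifted upwards.
log_fc = {f"g{i}": (1.0 if i < 10 else 0.0) for i in range(100)}
pathway = [f"g{i}" for i in range(10)]
z, p = page_z_score(log_fc, pathway)
print(f"Z = {z:.2f}, p = {p:.2e}")
```

The `min_size` default mirrors the restriction to gene sets of 10 or more genes mentioned in the text.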
Parametric gene set enrichment analysis of GO classes and KEGG pathways
Most of the genes (78%) which were differentially expressed between anaerobic and 0.5% provided oxygen were likewise differentially expressed between anaerobic and fully aerobic conditions (Figure 1A). PAGE analysis revealed that the pathways that were differentially expressed between anaerobic and 0.5 or 1.0% provided oxygen, but not between anaerobic and fully aerobic conditions, were those of oxidative phosphorylation, pheromone signalling, arginine, proline and glutathione metabolism and exocytosis (see the table of PAGE results). Pathways unique to the comparison of 2.8% and 20.9% provided oxygen were protein folding, iron ion homeostasis, protein targeting to membrane and metabolism of phenylalanine and amino sugars.
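The overlap quoted above is the proportion of one list of differentially expressed genes that also appears in another list. A minimal sketch of that calculation follows; the function name `overlap_fraction` and the gene identifiers are hypothetical and only illustrate the arithmetic.

```python
def overlap_fraction(reference, other):
    """Fraction of genes in `reference` that also appear in `other`."""
    reference, other = set(reference), set(other)
    if not reference:
        raise ValueError("reference gene set is empty")
    return len(reference & other) / len(reference)


# Hypothetical gene identifiers, for illustration only.
anaerobic_vs_low_o2 = {"geneA", "geneB", "geneC", "geneD"}
anaerobic_vs_aerobic = {"geneA", "geneB", "geneC", "geneE", "geneF"}
shared = overlap_fraction(anaerobic_vs_low_o2, anaerobic_vs_aerobic)
print(f"{shared:.0%} of the first list is shared with the second")
```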
Clustering of transcription data and promoter analysis of the clusters
Cluster analysis of the transcriptional data was carried out using fuzzy c-means clustering, which enabled clustering without prefiltering of the genes and thus included potentially interesting genes that did not differ strongly between the conditions and which would otherwise have been discarded from the analysis [34]. Fuzzy c-means clustering is a soft clustering method that assigns genes to clusters with gradual membership values between zero and one. Not all genes are forced into clusters, as is often the case in traditional clustering of predetermined, significantly changing genes. Moreover, the membership values for the clusters can be used to determine the level of coregulation under consideration. The fuzzy c-means clustering of gene expression data from S. cerevisiae cultures grown with different amounts of oxygen and the most significant over-represented GO categories and KEGG pathways in these clusters are presented in Figure 2 and Additional file 1, respectively. Analysis of the gene expression data revealed 22 clusters containing 37-267 genes with alpha values higher than 0.5, i.e. the genes belonged with highest probability to the respective cluster.
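To illustrate how soft membership values arise, the sketch below applies the standard fuzzy c-means membership formula to one expression profile and a set of fixed cluster centroids, and then applies a membership threshold of 0.5. It assumes that the alpha value in the text corresponds to this membership value. The function names, the fuzzifier of 2 and the toy profiles are assumptions, and the iterative centroid update of the full algorithm is omitted.

```python
import math


def fcm_memberships(profile, centroids, fuzzifier=2.0):
    """Membership of one expression profile in each cluster centroid.

    Uses u_k = 1 / sum_j (d_k / d_j) ** (2 / (fuzzifier - 1)),
    where d_k is the Euclidean distance to centroid k.
    """
    dists = [math.dist(profile, c) for c in centroids]
    # A profile lying exactly on a centroid belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (fuzzifier - 1.0)
    return [1.0 / sum((dk / dj) ** power for dj in dists) for dk in dists]


def assign_core_cluster(memberships, alpha=0.5):
    """Index of the cluster whose membership exceeds alpha, else None."""
    best = max(range(len(memberships)), key=memberships.__getitem__)
    return best if memberships[best] > alpha else None


# Hypothetical centred profiles over five oxygen levels.
centroids = [[-1.0, -0.5, 0.0, 0.5, 1.0], [1.0, 0.5, 0.0, -0.5, -1.0]]
rising_gene = [-0.8, -0.4, 0.0, 0.4, 0.8]
flat_gene = [0.0, 0.0, 0.0, 0.0, 0.0]

u_rising = fcm_memberships(rising_gene, centroids)
u_flat = fcm_memberships(flat_gene, centroids)
print(assign_core_cluster(u_rising), assign_core_cluster(u_flat))
```

A gene that is equidistant from the centroids receives no core assignment, which is how soft clustering avoids forcing every gene into a cluster.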
Figure 2 Fuzzy c-means clustering of gene expression patterns in cells grown with 0, 0.5, 1.0, 2.8 and 20.9% oxygen in the feed gas. The clustering was performed with individual samples, but average values for each condition are shown in the graphs.
The promoter and 3'UTR sequences of genes in the clusters identified using fuzzy c-means clustering were analysed using FIRE software [35] and the results of the analysis are shown in Figure 3. The analysis revealed 17 transcription factor binding site motifs and seven 3'UTR motifs, of which some had significant co-occurrence and/or co-localisation patterns. A more detailed description of the results of clustering and promoter analysis is provided below.
Figure 3 FIRE analysis for transcriptional regulatory motifs occurring in the clusters presented in Figure 2. For each cluster, the most significant GO enrichments are shown at the top. Yellow indicates over-representation of a motif in a given cluster.
Genes of the respiratory pathway and TCA-cycle have enhanced expression in intermediate compared to fully aerobic conditions
Two steady state clusters (cluster 4 and cluster 11) contained genes that had higher expression in all intermediate oxygen conditions compared to either anaerobic or fully aerobic conditions. The transcription levels of genes in cluster 4 were higher in anaerobic than aerobic conditions, while the opposite was observed in cluster 11. Cluster 4 was enriched in genes of KEGG pathways for the cell cycle and glycerophospholipid metabolism, while cluster 11 was enriched in genes related to oxidative phosphorylation, the TCA cycle, the MAPK signalling pathway and pyruvate metabolism. FIRE analysis revealed that different motifs were enriched in the promoters and 3'UTR sequences of the genes of these two clusters. In genes of cluster 4, motifs for Puf3p 3'UTR sites were found, while genes in cluster 11 were enriched in binding sites of the Hap2/3/4/5p transcription factor and two previously undescribed 3'UTR motifs (WHATATTC and HTTTAWTTH). All three motifs found in cluster 11 had significant co-occurrence amongst the genes.
Nearly all of the genes encoding nuclear-encoded subunits of respiratory chain complexes were located in cluster 11 (30 out of 37) and cluster 4 (4 out of 37), thus having their highest expression levels in the intermediate oxygen conditions. Clusters 11 and 4 also contained genes encoding several TCA cycle enzymes: Cit1p, Aco1p, Idh1p, Kgd1p, Kgd2p, Lpd1p, Mdh1p (cluster 11) and Idh2p (cluster 4). The increase in expression was mostly less than 2-fold, suggesting a subtle change in the components of these pathways. Of the genes encoding the main enzymes of the TCA cycle, only FUM1, LSC1 and LSC2 had their highest expression level in the fully aerobic rather than the intermediate oxygen conditions. Further, genes encoding isoenzymes of the enzymes of the TCA cycle had their highest expression either in fully aerobic (IDP2, IDP3, MDH2, MDH3, CIT3, YLR164W, YJL045W, YMR118C) or anaerobic (CIT2) conditions.
Many respiratory enzymes contain metals and accordingly, many genes involved in metal transport and homeostasis were found in clusters 4 and 11. Genes encoding vacuolar iron transporters Fth1p and Fet5p, plasma membrane copper transporters Ccc2p and Ctr1p, the metal ion transporter Smf1p and iron and copper reductase Fre1p were found in cluster 11. Additionally, genes encoding metallopeptidases/proteases Yta12p, Axl1p, Qri7p, and the copper deprivation induced ORF YOR296W were amongst the members of this cluster. Cluster 4 contained genes encoding plasma membrane siderophore-iron transporter Arn1p, oxidoreductase Fet3p, vacuolar zinc transporter Zrc1p and Ggc1p involved in mitochondrial iron homeostasis. Comparing gene expression in 2.8% oxygen and the fully aerobic conditions, 9 out of 16 genes known to be involved in transport of iron from the extracellular medium to the cytosol [36] had 2-16 fold higher expression and only two genes had lower expression in 2.8% oxygen than in the fully aerobic conditions.
Cluster 4 was enriched in genes related to mitochondrial organisation and biogenesis (RPM2, POR1, UTH1, PNT1, CLU1, DNM1, MGM1, MBA1). In addition, genes encoding mitochondrial translation elongation factors (TUF1, MEF1), mitochondrial translational activators (CBS2, PET309), mitochondrial ribosome recycling factor (RRF1) and subunits of mitochondrial ribosomes (10 genes) were found in this cluster. Cluster 10, in which the lowest level of expression occurred in the fully aerobic conditions and similar, higher expression levels occurred in the oxygen-limited and anaerobic conditions, also contained genes related to mitochondrial protein synthesis. This cluster contained 57 genes encoding components of mitochondrial ribosomes and 10 genes of the mitochondrial protein import machinery. The 3'UTR motif for binding of Puf3p, which promotes degradation of mRNAs of nuclear-encoded mitochondrial proteins, was over-represented in both clusters 4 and 10. The expression of PUF3 itself was low and remained constant under all the conditions of different oxygen provision studied.
Effect of oxygen on transcription of genes involved in lipid metabolism
Clusters 16 and 21 were enriched in genes related to fatty acid oxidation and peroxisomal biogenesis. Cluster 16 showed highest expression in fully aerobic conditions, lowest expression in anaerobic conditions and a similar, intermediate level of expression in all the intermediate oxygen conditions. Genes encoding activities of fatty acid β-oxidation (TES1, POX1, CTA1, PXA1, SPS19, DCI1, ANT1, FOX2, POT1, PEX11, PXA2), the oleate responding transcription factor OAF1 and 4 genes related to peroxisomal biogenesis (PEX15, PEX2, PEX8, PEX18) were located in this cluster. Gene expression in cluster 21 was at its highest in fully aerobic conditions, and at a lower, comparable level in the oxygen-limited and anaerobic conditions. This cluster contained 6 genes (PCD1, YOR084W, CAT2, IDP3, ECI1, AAT2) related to fatty acid metabolism, and 7 genes related to peroxisomal biogenesis (PEX14, PEX5, PEX19, PEX30, PEX28, PEX1, PEX3, YMR018W). The oleate responding transcription factor PIP2 was also located in this cluster.
Clusters 3 and 14 were enriched in genes related to sterol metabolism. Genes of cluster 3 were transcribed at lower levels in intermediate oxygen conditions, compared to fully aerobic or anaerobic conditions. The cluster contained genes encoding activities of ergosterol biosynthesis (ERG6, ERG11, HMG2, ERG25, DAP1), sterol transport (SUT2, OSH2), sterol homeostasis (TGL1) and synthesis of membrane sterols (ATG26). Genes in cluster 14 were transcribed at a lower level in all oxygen containing conditions, compared to anaerobic conditions. The cluster was enriched in genes encoding proteins involved in ergosterol biosynthesis (ERG26, ERG7, ERG2, ERG3, ERG1, ERG10, NCP1, ERG9, ERG27, ERG24, ERG28, HES1), sterol esterification (ARE1), sterol transport (AUS1, SWH1) and regulation of sterol transport and biosynthesis (UPC2, ECM22). Also DAN/TIR genes, encoding cell wall mannoproteins, and PAU genes of unknown function were accumulated in cluster 14 (DAN1-4, TIR1-4, PAU2,3,5,9). When a less strict α-value of 0.1 was used to define the genes belonging to this cluster, three additional PAU genes were found in it (PAU7,17,18).
Promoters of the genes in clusters 3 and 14 were enriched in two putative transcription factor binding sites that had strong, positive co-occurrence. The motif BTAWACGA was found in all the sterol metabolism-related genes of cluster 14, except in SWH1, and in all the three ERG genes of cluster 3. The motif RACAATAG was found in the promoters of 11 out of the 29 genes related to sterol metabolism of cluster 14, and in 2 out of 9 of those in cluster 3.
Oxygen dependent stress responses
Three clusters (clusters 3, 8 and 16), with distinct expression profiles, showed enrichment in genes in the GO category of stress response, and binding sites of the stress-related transcription factors Msn2/4p and Gis1p were over-represented among the promoters of the genes in two of these clusters (clusters 8 and 16). In the promoters of the genes in cluster 16, binding sites of Ume6p and two unknown transcription factors were also over-represented, while binding sites for the stress-activated transcriptional repressor Xbp1p were under-represented. Further, the gene encoding Xbp1p was a member of cluster 16. The expression of XBP1 was induced 3-fold in the intermediate oxygen (0.5-2.8%) and 8-fold in the fully aerobic conditions compared to the anaerobic conditions. Promoter analysis revealed enrichment of the binding site for Xbp1p in clusters 1 and 22. These clusters had an average correlation of -0.81 and -0.97, respectively, to the expression level of XBP1. Of the genes in clusters 1 and 22, 72% and 68%, respectively, contained the central core bases (CTCGA) of the Xbp1p binding site. Many of these genes are related to the regulation of cell division (GIC1, BUD4, TOS4, KIP2, TOS1, KIN4, TUB4, CIN8, TUB3, VIK1, SMC2, UNG1, PIN4, FKH1) and cell wall organisation (EXG2, ORF YFL052W, TOS1, BUD7, MHP1, DSE1, SUN4).
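The proportion of genes in a cluster whose promoters carry the CTCGA core can be obtained with a simple sequence scan. The sketch below is one possible way to do this and is not the FIRE procedure; searching both strands, the function names and the toy promoter fragments are assumptions made for illustration.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def fraction_with_motif(promoters, core="CTCGA"):
    """Fraction of promoters containing the core motif on either strand."""
    if not promoters:
        raise ValueError("no promoter sequences supplied")
    rc = reverse_complement(core)
    hits = sum(1 for s in promoters.values()
               if core in s.upper() or rc in s.upper())
    return hits / len(promoters)


# Hypothetical promoter fragments, not real yeast sequences.
toy_promoters = {
    "geneA": "TTGACTCGAGGA",   # core on the forward strand
    "geneB": "AATCGAGTTCAA",   # reverse complement (TCGAG) present
    "geneC": "ATATATATATAT",   # no core
    "geneD": "GGGCCCGGGCCC",   # no core
}
print(f"{fraction_with_motif(toy_promoters):.0%} of promoters contain the core")
```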
The MAPK signalling pathway for pheromone response and filamentous growth is affected by oxygen availability
Clusters 4, 7 and 11 were enriched in genes involved in mating and filamentous growth. Clusters 4 and 11, which contain genes that were more highly expressed in the conditions of intermediate oxygen availability, have been discussed above with reference to genes involved in the TCA cycle and respiration. All three clusters contained genes which showed a low level of expression in anaerobic, compared to intermediate oxygen conditions. However, they differed in the fully aerobic conditions: genes of clusters 4 and 11 had lower expression levels in the aerobic than in the intermediate oxygen conditions, whereas in cluster 7 the expression levels were comparable in all conditions provided with oxygen.
Genes in cluster 11 included some encoding proteins of the MAPK signalling pathways for pheromone response and filamentous growth (Ste3p, Gpa1p, Fus3p, Sst2p, Kss1p), genes regulated by these signalling pathways (FUS2, FUS1, FIG1, SAG1, FIG2, PRM6, AGA1, PRM1, CLN1, BUD8, MSB2, CWP1, GFA1, KTR2, SVS1) and the transcription factors (Ste12p, Tec1p) that are activated by these pathways. According to FIRE analysis, this cluster as well as cluster 4, which contained a set of genes related to mating (FAR1, STE4, CLN2, MSG5, STE23, KAR5, ASH1, HO, CCW12), were enriched in genes whose promoters contain the transcription factor binding site for Ste12p. Cluster 7 contained genes regulated by the MAPK signalling pathway for mating (PRM5, PRM10, AGA2, MDG1, AFR1, PRR2, PRM8, CHS1). While promoters of genes in cluster 7 were overall enriched in the binding site of the Ume6p transcription factor, this binding site was not enriched in the promoters of the genes related to pheromone signalling.
Comparison with previous data and oxygen dependence of genes of pentose phosphate pathway
We previously published transcription data for 72 selected genes related to central carbon metabolism, measured with the TRAC method [29]. Of those genes analysed with both Affymetrix (p < 0.01) and with TRAC (p < 0.05) methods, 61 showed statistically significant differences in their expression levels with both methods. Sixteen of the significantly changing genes showed >3-fold difference in expression and had an average correlation of 0.8 between the TRAC and the Affymetrix analysis. Thirteen of the significantly changing genes showed 2 to 3-fold difference in expression and had an average correlation of 0.6. Twenty-four of the significantly changing genes had <2-fold difference in their expression and had an average correlation of only 0.2. However, five of these genes which had <2-fold difference had correlations > 0.7. The genes that showed poor correlation between the TRAC and the Affymetrix data, and that showed ≥ 2-fold differences in the Affymetrix data, were GPD2, CIT2, ACS1, HAP1 and MAE1, the signals of the three latter genes being very close to the detection limit using the TRAC method.
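The comparison between platforms can be sketched as a per-gene Pearson correlation across the oxygen conditions, with genes grouped by the size of their expression change. In the example below, taking the fold difference as the ratio of the highest to the lowest linear-scale signal is an assumption, as are the function names and the toy signal values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def fold_change_bin(profile):
    """Bin a positive, linear-scale profile by its max/min ratio."""
    ratio = max(profile) / min(profile)
    if ratio > 3:
        return ">3-fold"
    return "2 to 3-fold" if ratio >= 2 else "<2-fold"


# Hypothetical signals for one gene at 0, 0.5, 1.0, 2.8 and 20.9% oxygen.
array_signal = [100.0, 150.0, 200.0, 260.0, 400.0]
trac_signal = [10.0, 16.0, 19.0, 27.0, 41.0]
r = pearson(array_signal, trac_signal)
print(fold_change_bin(array_signal), round(r, 2))
```

Small fold differences leave little signal relative to measurement noise, which is consistent with the lower correlations reported for the <2-fold group.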
Large changes in the expression of SOL4, GND2, TKL2 and the ORF YGR043C, from the pentose phosphate pathway, were observed in Affymetrix data. These genes had their highest levels of expression in the aerobic and lowest levels of expression in the anaerobic conditions (cluster 16). The fold differences were 2-15 between the anaerobic and intermediate oxygen and 16 to 40-fold between the anaerobic and fully aerobic conditions. In addition, SOL3 was slightly (1.5-fold) upregulated in the 2.8% oxygen and fully aerobic conditions compared to lower oxygen levels. Of these genes, the expression of GND2, TKL2 and ORF YGR043C had also been measured with the TRAC method and the correlation between the Affymetrix and TRAC measurements was > 0.7.
ZWF1 was also measured with both Affymetrix and TRAC. With both methods ZWF1 expression was shown to increase 1.3-fold compared to expression in fully aerobic cells; however, this increase was seen in cells provided with 0, 0.5 and 1.0% oxygen in the Affymetrix analysis, but only in cells provided with 2.8% oxygen in the TRAC analysis. Of the other genes from the pentose phosphate pathway, GND1, TKL1 and TAL1 did not show significant differences in their expression levels in different oxygen conditions when measured with Affymetrix.
Effect of oxygen on the proteome and enzyme activities, correlated with transcriptome changes
2D-gel analysis of 2-4 independent cultures from each level of oxygen provision resulted in a proteome of 484 protein spots in total that were included in the statistical analysis. After quantile normalisation, the same linear modelling approach used for the gene expression data was applied to identify statistically significant changes in protein quantity. This analysis revealed 145 spots that differed significantly (p < 0.01) when the cells were provided with different levels of oxygen. Of the 484 spots, 209 were identified. The data are presented in Additional file 2.
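Quantile normalisation makes the distribution of spot intensities identical across gels by replacing each value with the mean of the values that share its rank. The sketch below is a minimal illustration under simplifying assumptions: no missing spots, ties broken by position, and hypothetical intensities. The function name `quantile_normalise` is not taken from the software used in the study.

```python
def quantile_normalise(columns):
    """Quantile-normalise equal-length samples (one list per sample).

    Each value is replaced by the mean, across samples, of the values
    sharing its rank. Ties are broken by position, a simplification.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must have the same number of spots")
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[r] for col in sorted_cols) / len(columns)
                  for r in range(n)]
    normalised = []
    for col in columns:
        order = sorted(range(n), key=col.__getitem__)
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalised.append(out)
    return normalised


# Hypothetical spot intensities from two gels (three spots each).
gel_a = [5.0, 2.0, 3.0]
gel_b = [4.0, 1.0, 6.0]
normalised = quantile_normalise([gel_a, gel_b])
print(normalised)
```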
Enzymes of the TCA cycle and those involved in respiration showed either a slight increase in quantity (1.5 to 2-fold) in the intermediate oxygen conditions, compared to other conditions (Idh2p, Mdh2p, Sdh1p, Atp3p, Atp5p, Atp7p, Qcr2p, Rip1p), a strong increase (3 to 64-fold) in fully aerobic conditions (Cit1p, Fum1p, Lsc1p, Idp2p, Atp1p, Cyb2p) or did not differ in different levels of oxygen provision (Aco1p, Idh2p, Atp2p, Atp7p, Idp1p, Lsc2p). Many of the proteins involved in glucose fermentation were found as multiple pI isoforms which differed in relative quantities in different oxygen levels. These included Adh1p (3 pI isoforms), Adh2p (3), Ald4p (2), Ald6p (2), Eno1p (6), Eno2p (4), Gpm1p (3), Fba1p (2) and Hxk1p (2).
Enzyme activities were measured from crude cell extracts, providing a measure of the combined activity of all isoforms of the respective enzymes in the cell (Figure 4). The activities were expressed as units (U) per total soluble protein. It has previously been shown that there are only small differences in the protein content of the cells grown in aerobic and anaerobic glucose-limited chemostats at the growth rate of 0.1 h⁻¹. In comparing enzyme activities we assumed that the protein content of cells grown in oxygen-limited conditions would be similar to that of cells grown anaerobically and aerobically. The activities of citrate synthase (CS), aconitase (ACO), isocitrate dehydrogenase (IDH) and malate dehydrogenase (MDH), from the TCA cycle, strongly correlated (correlation > 0.89) with the transcriptome data for the corresponding genes of the TCA cycle (CIT1, ACO1, IDH1,2, respectively). Of the enzymes of the pentose phosphate pathway, the activity of glucose-6-phosphate dehydrogenase (G6PDH) had a correlation of 0.7 with the corresponding gene, ZWF1. The activities of 6-phosphogluconate dehydrogenase (6PGDH), transketolase (TKL) and transaldolase (TAL) had a correlation of 0.5 to GND1, TKL1, respectively, and no correlation to GND2, TKL2 and ORF YGR043C.
Figure 4 Enzyme activity levels in 0, 0.5, 1.0, 2.8 and 20.9% oxygen. Activity of the TCA cycle enzymes citrate synthase (CS), aconitase (ACO), isocitrate dehydrogenase (IDH) and malate dehydrogenase (MDH), and of PPP enzymes including glucose-6-phosphate dehydrogenase (G6PDH).
In all the aeration conditions studied, the Pearson's correlation between proteins identified in the 2D gels and the mRNA levels of the corresponding genes in the transcriptome was similar, with an r-value between 0.41 and 0.55. For a more detailed comparison, the 107 significantly changing protein spots (from the 2D-gels) and the corresponding transcripts were hierarchically clustered (Figure 5). In the case of multiple protein isoforms, the corresponding transcript was assigned to each isoform separately. Of the eight groups formed by the cluster analysis, the protein and transcript quantities in groups 1 and 6 showed a high correlation (average 0.80 and 0.77, respectively). Members of group 1, related to metabolism of ethanol (ADH2), the glyoxylate cycle (ICL1, MLS1), fatty acid metabolism (FAA2), acetyl CoA synthesis (ACS1, ALD6, ALD4), and glycolysis (FBA1), were at high levels in fully aerobic conditions and both the expression of the genes and the quantity of the proteins decreased with decreasing oxygen availability. Members of group 6, involved in translation (DED1, PAB1, DYS1, HTS1), amino acid metabolism (MET17, SER1, SAM2), and glycolysis and ethanol fermentation (HXK1, ADH1), were at high levels in anaerobic conditions and at low levels in fully aerobic conditions. In groups 2, 4 and 5 the transcript and protein levels differed significantly only in cells provided with 0.5% oxygen. Group 2 contained genes and proteins involved in oxidative stress (SOD2, TSA1), redox balance (GCY1, CYB2), fatty acid metabolism (ETR1) and the TCA cycle (FUM1, LSC1). The protein levels in group 2 were high with 1.0 to 20.9% provided oxygen, while the transcript levels were already high with 0.5% provided oxygen.
In group 4, related to the TCA cycle (ACO1, IDH2, SDH1), oxidative phosphorylation (ATP1, QCR2, RIP1, ATP7, ATP3) and other mitochondrial reactions (ILV2, MCR1, TUF1, POR1), the protein levels were highest with 1.0 and 2.8% provided oxygen and the transcript levels were again high already with 0.5% provided oxygen. In group 5, containing genes and proteins related to redox balancing (TRR1, RHR2, DLD3, YEL047C), the highest protein levels were observed in anaerobic conditions and when 0.5% oxygen was provided, while gene expression levels were highest under anaerobic conditions. Members of group 3, involved in various different functions, had their highest protein and gene expression levels in fully aerobic conditions, but in oxygen-restricted conditions the levels did not correlate. Group 7 contained genes and proteins, the expression and quantity of which correlated in some levels of provided oxygen. Group 8 contained genes and proteins that did not show any correlation.
Figure 5 Comparison of protein and transcript level data. Clustering of protein spots which differed significantly in the cultures receiving different oxygen levels with their corresponding gene expression profiles. The expression values are centred and scaled.