We compared the transcriptomes of Saccharomyces cerevisiae cells growing under steady-state conditions on 21 unique sources of nitrogen. We found 506 genes differentially regulated by nitrogen and estimated the activation degrees of all identified nitrogen-responding transcriptional controls according to the nitrogen source. One main group of nitrogenous compounds supports fast growth and a highly active nitrogen catabolite repression (NCR) control. Catabolism of these compounds typically yields carbon derivatives directly assimilable by a cell's metabolism. Another group of nitrogen compounds supports slower growth, is associated with excretion by cells of nonmetabolizable carbon compounds such as fusel oils, and is characterized by activation of the general control of amino acid biosynthesis (GAAC). Furthermore, NCR and GAAC appear interlinked, since expression of the GCN4 gene encoding the transcription factor that mediates GAAC is subject to NCR. We also observed that several transcriptional-regulation systems are active under a wider range of nitrogen supply conditions than anticipated. Other transcriptional-regulation systems acting on genes not involved in nitrogen metabolism, e.g., the pleiotropic-drug resistance and the unfolded-protein response systems, also respond to nitrogen. We have completed the lists of target genes of several nitrogen-sensitive regulons and have used sequence comparison tools to propose functions for about 20 orphan genes. Similar studies conducted for other nutrients should provide a more complete view of alternative metabolic pathways in yeast and contribute to the attribution of functions to many other orphan genes.
Nitrogen is an essential nutrient for all life forms. Evolutionary selective pressure has thus likely favored the early emergence of cells able to transport and catabolize a wide variety of nitrogenous compounds as well as to synthesize endogenously all essential nitrogen-containing molecules. These properties, typical of free-living unicellular organisms, have been particularly well studied in the bacterium Escherichia coli (86) and the yeast Saccharomyces cerevisiae (24, 84, 136). Yeast can use almost 30 distinct nitrogen-containing compounds, including amino acids, urea, ammonium, nitrogen bases, and purine derivatives (Fig. 1) (24). These molecules enter cells via permeases (51) and are immediately used as building blocks in biosynthesis reactions or catabolized to release nitrogen in the form of ammonium (via deamination), glutamate (via transamination), or both (24, 84, 136). Glutamine is then synthesized by the condensation of glutamate and ammonium, a reaction catalyzed by glutamine synthetase (GLN1). Glutamate and glutamine are the two major nitrogen donors in biosynthesis reactions. As illustrated in Fig. 1, ammonium, glutamate, and glutamine are interlinked by specific enzyme systems forming the hub of nitrogen metabolism (24, 84, 136).
Nitrogen transport, anabolism, and catabolism are subject to tight control according to the nitrogen content of the medium. These controls act on gene transcription and on the synthesis, activity, and/or degradation rates of enzymes and permeases (85). Transcriptional controls of genes involved in nitrogen anabolism are of two types: specific ones affecting the genes of one anabolic pathway and the general control of amino acid biosynthesis (GAAC) affecting many amino acid biosynthetic pathways (57, 69). Transcriptional repression of genes involved in a specific biosynthetic pathway has been described in the cases of arginine (39), branched-chain amino acids (leucine, isoleucine, valine) (74), methionine (124), and lysine (10, 103). In the case of leucine, repression is mediated through the inhibition of the Leu3 transcription factor (74), and the corresponding target genes have recently been identified on the whole-genome scale (13). For genes of other anabolic pathways, there appears to be no specific transcriptional control. This is notably the case for the biosynthetic pathways for histidine and aromatic amino acids (phenylalanine, tryptophan, and tyrosine) (69).
GAAC is mediated by the Gcn4 transcription factor (57). This protein is most active in cells starved of amino acids, a situation in which the Gcn2 protein (eIF2α kinase) causes the translation of Gcn4 mRNA to be derepressed (56). Other conditions also promote increased Gcn4 synthesis (139). Gcn4 activates the expression of a large number of genes involved in amino acid biosynthesis. Furthermore, in the absence of any amino acid deficiency, a basal level of Gcn4 mRNA translation is required for the transcription of several genes (e.g., HIS3, ARG4, ARG3, and ARO3) (57). Two genomic studies have aimed to identify the whole set of GAAC target genes (68, 90). In one of them (90), 539 genes whose expression is activated in a Gcn4-dependent manner in amino acid-starved cells were listed.
Many different amino acids can be used as general sources of nitrogen (Fig. 1). Most amino acids present in the external medium are detected by yeast cells via a membrane-associated sensor complex (Ssy1-Ptr3-Ssy5 [SPS]) made of three proteins, including Ssy1, an amino acid permease homolog devoid of transport activity (15, 46). This SPS complex in turn activates the transcription of several amino acid permease genes via the Stp1, Stp2, and Uga35/Dal81 transcription factors (1, 5, 63). Putative additional target genes of this transcriptional control system have emerged from several genomic studies aiming to list all genes induced by amino acids in a Ssy1-, Stp1-, and/or Stp2-dependent manner (43, 45, 73).
Like the transcriptional controls acting on genes involved in nitrogen anabolism, those regulating genes involved in transport and catabolism are of two types: specific ones affecting only a limited number of genes and nitrogen catabolite repression acting on a wide variety of genes (26, 84, 136). Thus, several nitrogenous compounds, such as arginine (39), proline (18), serine, threonine (59), urea, allantoin (25), γ-aminobutyric acid (GABA) (121), and the aromatic amino acids (62), specifically induce the transcription of the genes involved in their utilization. On the other hand, nitrogen catabolite repression (NCR) is typically exerted on the many genes involved in the utilization of nonpreferential nitrogen sources when a good nitrogen source (e.g., ammonium, glutamine, and asparagine) is available in the medium (26, 84, 136). NCR in fact acts through the inhibition of two transcription factors of the GATA family (Gln3 and Gat1/Nil1) which typically bind to upstream 5′-GATA-3′ core sequences and activate gene transcription alone or in conjunction with inducer-specific transcription factors (26, 84). The Gln3 and Gat1 factors are thus most active under limiting nitrogen supply conditions (e.g., when cells grow on poor nitrogen sources like urea and proline) and are also transiently activated upon the addition of rapamycin to nitrogen-rich media (26, 84). Rapamycin inhibits the Tor proteins, which are proposed to govern the inhibition of Gln3 and Gat1 under good nitrogen supply conditions (9). The Tor-dependent inhibition of Gln3 involves the Ure2 protein (28), whereas the repression of Gat1-dependent expression under good nitrogen supply conditions is also dependent on Gzf3/Deh1/Nil2, another GATA family transcription factor (23, 109, 119). A fourth GATA factor encoded by the DAL80/UGA43 gene (27) also acts as an inhibitor of Gat1 in specific gene contexts but is specifically active under poor nitrogen supply conditions (4, 27, 31). 
Transcription of the GAT1, GZF3, and DAL80 genes is under the control of all four GATA factors. A network of auto- and cross-regulation systems thus links these four key transcriptional regulators of NCR target genes (12, 23, 109, 119). Several studies have focused on identifying in the complete yeast genome the genes subject to NCR or regulated by the GATA factors or those whose expression is activated under nitrogen starvation or by rapamycin (7, 14, 29, 112, 116, 138).
In this study we used a systematic approach to examine the influence of nitrogen on the yeast transcriptome. For this we compared the expression levels of the 5,690 yeast genes in cells growing on 21 distinct sources of nitrogen. This analysis has enabled us to identify more than 500 nitrogen-regulated genes and to derive a general scheme representing the status of each nitrogen-sensitive transcriptional control according to the nitrogen source. It has further enabled us to associate new genes with several nitrogen-sensitive regulons and to propose a function related to nitrogen metabolism for several nitrogen-regulated orphan genes. Our results offer a novel and complementary view of how yeast cells adapt their transcriptome and metabolism to the nitrogen supply.
The Saccharomyces cerevisiae strains used in this study are all isogenic with the wild-type Σ1278b strain (8) (Table 1). Cells were grown in a minimal buffered (pH 6.1) medium with 3% glucose as the carbon source and various nitrogen sources (Table 2). To nitrogen source-free medium, described in reference 66, each of the following was added as the sole nitrogen source: 10 mM urea (reference medium), (NH4)2SO4, alanine, arginine, asparagine, aspartate, citrulline, GABA, glutamate, glutamine, isoleucine, leucine, methionine, ornithine, phenylalanine, proline, serine, threonine, tyrosine, or valine or 5 mM tryptophan. Comparative analysis of the influence of nitrogen on the expression of the nitrogen-sensitive GAP1 gene in cells growing in the nonbuffered yeast nitrogen base medium versus the citrate-buffered medium (pH 6.1) used in this study revealed similar responses to the quality of the nitrogen source (see Fig. S1 in the supplemental material). The gcn2Δ strain was constructed by the PCR-based gene deletion method (134). The DNA segment used to introduce this mutation was generated with the kanMX2 gene from plasmid pFA6a-kanMX2 (81) as a template and the D5-GCN2 and D3-GCN2 PCR oligonucleotide primers (Table 3). Yeast strain 23344c (ura3) was transformed with the PCR fragment by the lithium method described previously (49). Transformants were selected on rich medium containing 200 μg/ml G418 (Geneticin; GIBCO BRL, Gaithersburg, MD). The GAP1::lacZ fusion in plasmid pFL38 has been previously described (119).
β-Galactosidase assays were performed as described previously (3). Results are expressed in nanomoles of o-nitrophenol formed per minute per milligram of protein. Protein concentrations were measured in assays using bovine serum albumin as the standard.
Total RNA was purified as previously described (75). Quantitative reverse transcription-PCRs (qRT-PCRs) were used to measure the mRNA levels of the following genes: ACT1, YGL258W, ZAP1, ARO9, ESBP6, ARO80, SNQ2, UGA4, AMD2, and MAE1. For this we used the RT-RTCK05 and RT-SN10-05 kits (Eurogentec, Liège, Belgium) with the following primers: ACT1-left, ACT1-right, YGL258W-left, YGL258W-right, ZAP1-left, ZAP1-right, ARO9-left, ARO9-right, ESBP6-left, ESBP6-right, ARO80-left, ARO80-right, SNQ2-left, SNQ2-right, UGA3-left, UGA3-right, AMD2-left, AMD2-right, MAE1-left, and MAE1-right (Table 3).
To purify mRNAs, we used the poly(dT) Oligotex kit (QIAGEN, Westburg, The Netherlands). In each assay, 5 μg mRNA was converted to a labeled cDNA target with the Fairplay indirect labeling kit (Amersham-Pharmacia-Biotech, Gent, Belgium) as previously described (137). Microarrays corresponding to the genome of S. cerevisiae strain S288C were produced by Eurogentec (Liège, Belgium) (D290C and G250E series) (122). Cy3- and Cy5-labeled cDNA targets were combined in equal amounts (0.5 μg), vacuum dried (SpeedVac centrifugation), and resolubilized in 50 μl hybridization buffer composed of DIG Easy Hyb solution (Roche Diagnostics, Vilvoorde, Belgium) with 1 mg/ml salmon DNA. Hybridization with the solution of CyDye-labeled cDNA was performed at 42°C for approximately 24 h. Following hybridization, the slide was washed first in 2× SSC (1× SSC is 0.15 M NaCl plus 0.015 M sodium citrate) for 30 s, then in 0.1× SSC-0.1% sodium dodecyl sulfate for 5 min, and finally twice in 0.1× SSC for 5 min. It was then immediately dried by centrifugation (8 min at 800 × g).
The hybridized microarray was scanned with a GMS418 fluorescence reader (Genetic MicroSystems, Woburn, MA) with a resolution of 10 μm. The slide was scanned twice to get the Cy5 and Cy3 signals, once with a high photomultiplier tube (PMT) gain and once with a low PMT gain. Signal quantification for each probe on the microarray was performed with GenePix 4.01.17 image acquisition software (Axon Instruments, Union City, CA). Spots with a diameter greater than 210 μm or smaller than 80 μm were considered low-quality spots, as were spots having a median pixel intensity minus mean pixel intensity of more than 40% of the median pixel intensity for each channel and those having less than 95% of their spot pixels be more than 2 standard deviations above background in either the green or the red channel. Low-quality spots were excluded from further analysis. Intensity values from high-PMT-gain pictures were used, except in the case of saturated spots. In the latter case, intensity values from low-PMT-gain pictures were used after scale correction. Intensity-dependent within-tip group and scale normalization were applied as described in reference 140 using Bioconductor tools (48). Fluorescence ratios were computed on the basis of hybridization signals normalized with background corrections. Experiments were carried out independently twice, with dye swapping. For each experiment, we calculated for each gene the value M as log2(expressionM.NS/expressionM.urea) (where expressionM.NS is the level of expression of the gene on minimal medium with the considered nitrogen source at the concentration specified in Table 2 and expressionM.urea is that on minimal medium with urea). Genes that could not be measured (because of a low-quality spot) in at least one of the duplicate experiments were not considered in further analyses. Figure S2 in the supplemental material compares for each medium the results obtained in the two experiments.
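The computation of M and the exclusion rule for unmeasured genes can be illustrated with a minimal Python sketch. The function names and the input layout are hypothetical; the sketch assumes that each replicate has already been normalized and that the dye swap has been resolved upstream, so that each replicate is given as a (test medium signal, urea signal) pair, with None standing for a low-quality spot.

```python
import math

def log_ratio(test_signal, urea_signal):
    """M = log2(expression on the test medium / expression on urea)."""
    if test_signal <= 0 or urea_signal <= 0:
        return None  # a non-positive corrected signal cannot be log-transformed
    return math.log2(test_signal / urea_signal)

def gene_m_values(rep1, rep2):
    """Return the two replicate M values for one gene on one medium.

    rep1, rep2: (test_signal, urea_signal) tuples, or None when the spot
    was flagged as low quality in that experiment (hypothetical layout).
    A gene missing in at least one of the duplicates is dropped (None).
    """
    if rep1 is None or rep2 is None:
        return None
    m1 = log_ratio(*rep1)
    m2 = log_ratio(*rep2)
    if m1 is None or m2 is None:
        return None
    return (m1, m2)
```

A fourfold induction relative to urea thus gives M = 2, and a gene with a single usable replicate is not carried forward.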
The P value was calculated on the basis of a normal distribution of the SAM (significance analysis of microarray) test statistic S, with S = Mg/(c + SDg) (where Mg and SDg are, respectively, the mean and the standard deviation of the M values for gene g and c is the 90th percentile of the SDg values) (30). It indicates the confidence level at which a gene can be considered differentially expressed on a given medium (M.NS) compared to the reference medium (M.urea). For each medium, we considered genes having a P value below 1/5,690 (5,690 is the total number of genes considered) to be differentially expressed. This threshold was chosen so that no more than one false positive would be expected per medium.
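The selection step can be sketched as follows in Python, reading the statistic in the usual SAM form, S = Mg/(c + SDg). Two points are assumptions of this sketch rather than statements from the text: the normal distribution is fitted to the S values observed across genes on a given medium, and the P value is taken as two-sided. Function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev, quantiles

def sam_statistics(m_values):
    """m_values: dict gene -> list of replicate M values (at least two each)."""
    means = {g: mean(v) for g, v in m_values.items()}
    sds = {g: stdev(v) for g, v in m_values.items()}
    # c is the 90th percentile of the per-gene standard deviations
    c = quantiles(list(sds.values()), n=10, method="inclusive")[8]
    return {g: means[g] / (c + sds[g]) for g in m_values}

def p_values(s_stats):
    """Two-sided P values under a normal fit to the S values (assumption)."""
    values = list(s_stats.values())
    mu, sigma = mean(values), stdev(values)
    std_normal = NormalDist()
    return {g: 2.0 * (1.0 - std_normal.cdf(abs((s - mu) / sigma)))
            for g, s in s_stats.items()}

def differentially_expressed(pvals, n_genes=5690):
    """Keep genes whose P value is below 1/n_genes."""
    return {g for g, p in pvals.items() if p < 1.0 / n_genes}
```

With a threshold of 1/5,690 applied to 5,690 tests, the expected number of false positives per medium is below one, which is the rationale given above.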
For genes not expressed on urea but highly expressed on other nitrogen media, M values are typically greater than 1 under all conditions for which a ratio could be calculated. Fifteen of the 506 differentially expressed genes are concerned: ARO9, BAP2, BAP3, SED1, ADH4, YGL258W, YPS5, DOG1, SPL2, YHR213W, YPS6, DAN1, YLL053c, ALD3, and YOR387C (see Table S1 in the supplemental material). The ratios calculated for these genes must be considered qualitative rather than quantitative. They were nevertheless considered in the data analysis, as they reflect a high level of expression on specific nitrogen media. Analysis of the distribution of gene expression levels on the 21 tested nitrogen media indicates that the number of unexpressed or poorly expressed genes is not particularly higher on urea than on the other media (see Fig. S3 in the supplemental material). Urea is thus an appropriate choice for the reference nitrogen source.
Hierarchical clustering was performed using TIGR MultiExperiment Viewer (111) on the 390 genes whose expression varies significantly under at least one nitrogen condition and for which an expression ratio could be computed for at least 13 media (N value = 13). The data presented here are those obtained using the complete linkage method and the average dot product as a measure of the distance between gene expression profiles. The tree was finally segmented into eight clusters, with the distance threshold between genes considered to be 0.137 in TIGR's MultiExperiment Viewer. Other settings (N value, clustering method, distance, distance threshold) were also tested and evaluated by comparing the obtained gene clusters with predefined lists of genes. Those which were finally chosen globally resulted in the best overlaps with several lists of nitrogen-regulated genes and thus optimal physiological coherence without too much loss of information. A second hierarchical clustering was performed on genes belonging to cluster 3 using the complete linkage method and Pearson's correlation. The resulting subtree was segmented into four subclusters, with a distance threshold between genes considered to be 0.548 in TIGR's MultiExperiment Viewer.
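The gene filter (N value) and the similarity measure can be illustrated with a short Python sketch. It is not the MultiExperiment Viewer implementation: how that program handles missing ratios and converts the average dot product into a distance is not stated here, so the sketch simply averages the products over the media for which both genes have a ratio. Function names are hypothetical.

```python
def enough_ratios(profile, n_min=13):
    """True when a gene has an expression ratio on at least n_min media.

    profile: list of M values, one per medium, with None where no ratio
    could be computed.
    """
    return sum(v is not None for v in profile) >= n_min

def average_dot_product(p, q):
    """Mean of the pairwise products over media measured for both genes."""
    pairs = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    if not pairs:
        return None
    return sum(a * b for a, b in pairs) / len(pairs)

def complete_linkage(cluster_a, cluster_b, distance):
    """Complete linkage: distance between the two farthest members."""
    return max(distance(x, y) for x in cluster_a for y in cluster_b)
```

Because the dot product is not normalized by profile magnitude, genes with strong responses in the same direction score as more similar than genes with weak ones, unlike Pearson's correlation, which compares profile shape only.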
Comparisons of groups or subgroups of coregulated genes to predefined gene lists (about 300 in total) were performed using the compare classes utility provided by the Regulatory Sequence Analysis Tools website (http://rsat.ulb.ac.be/rsat/) (130). To check the significance of the overlap between two lists, overlapping P values were computed on the basis of the following hypergeometric formula:
P value = \sum_{i=c}^{\min(q,r)} \frac{C_r^i \, C_{n-r}^{q-i}}{C_n^q}
The P value is the probability of observing at least c common elements between a given query class (of size q) and a given reference class (of size r), with consideration of the size of the population (n). C_k^j is the number of possible combinations of j elements among k. An e value was also calculated according to the formula Ncomp × P value, where Ncomp is the number of performed comparisons. The corresponding significance score (sig) is equal to −log10(e value). The predefined gene lists are those available in the MIPS functional catalogue (110) or Gene Ontology categories (6) and in many other lists, like those obtained by chromatin immunoprecipitation (ChIP)-chip (53) or transcriptomic analyses or gene lists available in the literature. The complete data set of these comparisons and the gene lists are accessible online (http://dbm.ulb.ac.be/PhysCell/data/Godard.htm).
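As an illustration, the overlap statistics can be computed with the standard hypergeometric upper tail, which matches the definitions given above (c common elements, query size q, reference size r, population size n). This is a minimal Python sketch with hypothetical function names, not the code of the compare classes utility.

```python
from math import comb, log10

def overlap_p_value(c, q, r, n):
    """Probability of at least c common elements between a query class of
    size q and a reference class of size r drawn from a population of n."""
    total = comb(n, q)
    tail = sum(comb(r, i) * comb(n - r, q - i)
               for i in range(c, min(q, r) + 1))
    return tail / total

def e_value(p_value, n_comparisons):
    """Expected number of overlaps this significant among all comparisons."""
    return n_comparisons * p_value

def significance(p_value, n_comparisons):
    """sig = -log10(e value); positive when the e value is below 1."""
    return -log10(e_value(p_value, n_comparisons))
```

For example, `overlap_p_value(20, 140, 41, 5690)` would give the probability that a random cluster of 140 genes shares at least 20 members with a 41-gene list in a 5,690-gene population; the overlap count of 20 is a made-up input, not a result from this study.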
Upstream sequences of all the yeast genes were retrieved over 800 bp upstream from the start codon. When the upstream open reading frame (ORF) is closer than this distance, a shorter sequence is retrieved, which allows us to discard coding sequences.
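The clipping rule can be sketched in Python for the simple case of a gene on the forward strand. The function name and the coordinate convention (1-based, inclusive) are assumptions of the sketch.

```python
def upstream_region(gene_start, upstream_orf_end, max_len=800):
    """Coordinates of the retrieved upstream region, or None if empty.

    gene_start: position of the first base of the start codon.
    upstream_orf_end: last coding position of the nearest upstream ORF.
    The region stops at the upstream ORF so that no coding sequence
    is included, and never exceeds max_len bases.
    """
    available = gene_start - upstream_orf_end - 1  # intergenic bases
    length = max(0, min(max_len, available))
    if length == 0:
        return None
    return (gene_start - length, gene_start - 1)
```

A gene whose upstream neighbor ends 300 positions before its start codon therefore contributes a 299-base promoter sequence instead of the full 800.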
We used the program oligo-analysis (131) to detect overrepresented oligonucleotides in the promoter sequences of the 41 genes annotated as NCR sensitive (A-NCR genes). This analysis was performed for all oligonucleotide sizes between 5 and 8, leading to a total of 56 significantly overrepresented oligonucleotides. Quite consistently, most of these motifs were variants of the GATA box, and the most significant among them was the canonical GATA box GATAAG. To these 56 discovered motifs, we added the auxiliary GATA box (GATTA), and six pairs of GATA boxes separated by a region from 0 to 60 base pairs. The program dna-pattern was used to count the occurrences of the 63 motifs in each yeast gene promoter.
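The counting step can be illustrated with a small Python sketch. It is not the dna-pattern program: the sketch assumes that overlapping matches are counted and that both strands are scanned, neither of which is stated above. The example sequence in the usage note is made up.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def count_motif(promoter, motif):
    """Count occurrences of a motif in a promoter, on both strands.

    Overlapping matches are counted; a position matching a palindromic
    motif on both strands is counted once.
    """
    promoter = promoter.upper()
    motif = motif.upper()
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return sum(promoter[i:i + k] in targets
               for i in range(len(promoter) - k + 1))

def motif_counts(promoter, motifs):
    """Vector of counts, one per motif, usable as classifier input."""
    return [count_motif(promoter, m) for m in motifs]
```

For the made-up sequence `TTGATAAGCCCTTATCAA`, `count_motif(seq, "GATAAG")` returns 2: one match on the given strand and one on the reverse strand (CTTATC).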
We applied linear discriminant analysis to classify genes into two classes (NCR versus not NCR) on the basis of the pattern counts (the complete data set is available at http://dbm.ulb.ac.be/PhysCell/data/Godard.htm). As a positive training set (NCR), we used the 41 A-NCR genes (see Table S2 in the supplemental material). Since no reliable negative training set (i.e., genes known not to be regulated by NCR) was available, we applied the same strategy as that described previously (117), randomly selecting a set of 123 (3 × 41) genes in the yeast genome. Since the number of variables (63 in total) is greater than the number of genes in the positive training set, we applied forward stepwise selection to select the subset of variables giving the most accurate classification. The efficiency of a classification was estimated using leave-one-out cross-validation. After this phase of training and variable selection, the discriminant function was applied to each yeast gene to estimate its posterior probability of being NCR sensitive and to assign it to a class (NCR or not NCR). The whole process was repeated 10 times with different negative groups in order to reduce fluctuations due to random selection. A list of 100 genes predicted to be subject to NCR was finally obtained (see Table S3 in the supplemental material).
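The repeated drawing of negative training sets can be sketched as follows in Python. Only the sampling step is shown; the discriminant analysis, variable selection, and cross-validation are not reproduced. Excluding the positive genes from the sampling pool and fixing a random seed are assumptions of the sketch, and the function name is hypothetical.

```python
import random

def draw_negative_sets(all_genes, positives, n_repeats=10, ratio=3, seed=0):
    """Draw n_repeats random negative sets of size ratio * len(positives).

    all_genes: identifiers of all genes in the genome.
    positives: identifiers of the positive training set (A-NCR genes).
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    pool = sorted(set(all_genes) - set(positives))
    size = ratio * len(positives)
    return [rng.sample(pool, size) for _ in range(n_repeats)]
```

With 41 positive genes and a ratio of 3, each call yields 10 negative sets of 123 genes, one per repetition of the training procedure.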
The microarray data set has been deposited at the Gene Expression Omnibus resource (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE4861.
We have used whole-genome microarray hybridization to compare the transcriptomes of wild-type yeast strain Σ1278b during growth on a minimal medium containing 21 different single nitrogen sources, including urea used as a reference condition (Table 2). The 21 compounds were selected from a list of 27 nitrogen sources for their ability to support reproducible growth with a generation time of less than 5 h. The reasons for choosing urea as a reference nitrogen source to which the 20 other media were compared for genome expression were that urea catabolism and its regulation are well known (25), the generation time of strain Σ1278b on urea is near the middle of the studied generation time range (Table 2), and the major regulation systems like NCR and SPS-mediated control do not operate on this medium.
Strain Σ1278b was selected for this work because it is the reference strain of many previous studies of nitrogen regulation, including those having led to the concepts of NCR (136) and pseudohyphal growth induced by limiting ammonium supply to diploid cells (50). Furthermore, strain Σ1278b displays the negative regulatory effect exerted by ammonium on the expression of many genes involved in the use of poor nitrogen sources and on the activities of enzymes and permeases encoded by these genes (136). This negative control is less pronounced or even absent in strain S288C (whose genome has been sequenced) and its derivatives (106). This regulation by ammonium is also exerted in a diploid strain obtained by crossing Σ1278b with S288C, showing that the particular behavior of S288C with respect to ammonium is recessive (85).
The medium was buffered at pH 6.1, and the cells were harvested during exponential growth after at least 10 generations and at low cell density (~10^6 cells per ml). We could thus consider that changes in medium composition were minimal during cell culture and that cells were harvested in a balanced state of growth. These precautions ensured highly reproducible growth conditions. The mRNA samples were hybridized to microarrays containing 5,690 known or predicted genes defined after reannotation of the yeast genome on the basis of comparative genome analyses of closely related Saccharomyces species (88, 122). Statistical analysis of collected data (see Materials and Methods) enabled us to identify 506 genes displaying significantly different levels of expression on at least one of the test media and on urea (see Table S1 in the supplemental material). This list of 506 genes was compared to multiple predefined gene lists, including those of the Gene Ontology and MIPS functional categories (see Materials and Methods). As expected, it was found to be highly enriched in genes involved in nitrogen and amino acid and/or sulfur metabolism or in genes whose expression is known to be under nitrogen regulation (the complete data set is available at http://dbm.ulb.ac.be/PhysCell/data/Godard.htm). Among the gene lists compared to the 506 nitrogen-regulated genes are those including the stress-responsive genes. The expression of these genes is typically up-regulated (between 200 and 300 genes) or down-regulated (between 400 and 600 genes) in response to a wide range of environmental changes, with this general control being variously named the environmental-stress response (ESR) or the common environment response (CER) (20, 47). For instance, the majority of genes repressed in response to environmental disturbances code for proteins involved in protein biosynthesis, notably, ribosomal proteins (20, 47).
Although yeast cells grew at different rates according to the nitrogen source supplied, we observed no significant variation in levels of expression of the protein synthesis machinery genes from one medium to another (P value = 0.99 [see Fig. S4 in the supplemental material]). It thus seems that the ESR/CER is not differentially active on the 21 tested nitrogen media. This result most likely reflects the fact that, in our experiments, the cells were harvested during steady-state growth, without any significant perturbation in the medium.
We next used the transcriptome data set to classify both the nitrogen media and the genes and to identify the main transcriptional control circuits active on each tested medium.
The gene expression data were used to establish a classification of the 20 tested nitrogen sources versus urea. For this, we started from the 506 genes and discarded those whose expression versus that on urea could not be computed on a significant number of nitrogen sources, e.g., because their expression level was too low or could not be measured. We thus selected 390 of the 506 genes displaying a significant expression level on at least 13 of the 20 media tested. We then applied hierarchical clustering (see Materials and Methods) to their expression values and derived a nitrogen source classification tree based on the average dot product and the complete linkage method (Fig. 2A). We compared this result with those obtained using other metrics and/or tree construction methods. It appeared that two main groups, together containing 14 nitrogen sources, were classified similarly by the majority of techniques applied. Group A (Fig. 2, left part of the tree) includes asparagine, glutamine, and ammonium, known to be good nitrogen sources supporting rapid growth (generation time, ~2 h). Interestingly, serine also appears in this group and the corresponding generation time is one of the shortest (generation time, 2 h 13 min). Group A also includes aspartate, alanine, arginine, and glutamate, on which the generation time is below 3 h. It is noteworthy that transamination or deamination of the nitrogenous compounds of group A yields pyruvate or Krebs cycle intermediates (α-ketoglutarate, oxaloacetate), directly assimilable by cell metabolism (Fig. 1). On the media containing nitrogen sources not belonging to group A, the generation time exceeds 3 h. The only exception is the medium containing GABA, which supports fast growth despite significant differences at the level of the yeast transcriptome. Group B (Fig. 2, right part of the tree) includes leucine, isoleucine, methionine, threonine, tryptophan, and tyrosine.
In contrast to the group A nitrogen sources and except for leucine, these nitrogen sources support slow growth (generation time, >4 h). Moreover, it is known that the catabolism of these compounds leads to nonmetabolizable products that the cell must excrete and from which fusel oils derive (115, 132, 135). Finally, the classification of the remaining nitrogen sources, namely, valine, phenylalanine, ornithine, proline, GABA, and citrulline, depends strongly on the clustering technique applied. This is why we were unable to associate these compounds with any group on the basis of transcription data. Surprisingly, this classification also indicates that arginine, ornithine, and citrulline are not similar as regards yeast growth or their effects on the transcriptome. Yet all three are urea cycle intermediates degraded largely via the same pathway (136) (Fig. 1). Likewise, among the aromatic amino acids, phenylalanine affects the growth and transcriptome of yeast differently from tryptophan and tyrosine. This is also true of valine, which behaves differently from the other two branched-chain amino acids, leucine and isoleucine.
Two distinct approaches were used to identify the transcriptional regulations which are differentially active according to the nitrogen source and the associated target genes. One was to use hierarchical clustering to classify 390 genes according to their expression profiles and to compare the resulting groups of coexpressed genes with a large number of annotated gene lists available in databases or generated by previous genome-wide studies (see Materials and Methods). This first approach revealed mostly a limited number of transcriptional controls acting on large sets of genes. A second, complementary approach was to identify groups of genes which are up- or down-regulated on any medium compared to their expression on urea. This approach, applied to the 506 differentially expressed genes, revealed additional, smaller groups of coregulated genes which are the targets of transcriptional controls responding to more-specific signals, e.g., the presence of a single amino acid in the medium.
Below, we present the data obtained by hierarchical clustering based on the average dot product metric (see Materials and Methods). The resulting tree was segmented into eight clusters gathering 140, 14, 119, 37, 21, 23, 6, and 30 genes (Fig. 2A; see Table S1 in the supplemental material). These clusters of genes were systematically compared to predefined gene lists (see Materials and Methods; data available at http://dbm.ulb.ac.be/PhysCell/data/Godard.htm). The largest cluster (cluster 1, 140 genes) significantly overlapped the lists of NCR target genes. We thus describe below how the expression of this first group of genes varies according to the nitrogen conditions. The next sections are devoted to the analysis of the other large clusters. The four remaining smaller clusters significantly overlapped with gene lists of more-specific regulations (e.g., genes inducible by GABA or repressed by methionine), which are considered in the second part of Results.
We established a list of genes shown by classical molecular biology studies to be sensitive to NCR, considering a gene to be subject to this regulation if its expression level responds to at least one positive transcriptional regulator (GATA transcription factor Gln3 or Gat1) and to at least one negative one (Dal80, Gzf3, or Ure2) related to NCR. For example, the ZRT1 gene was not included in the list because, although its expression depends on the positive factor Gln3, it does not seem to be controlled by any NCR-related negative regulator (29). We thus propose a list of 41 A-NCR target genes (see Table S2 in the supplemental material). We obtained results for only 34 of the 41 A-NCR genes because ASP3-1, ASP3-2, ASP3-3, and ASP3-4 are not present in the genome of strain Σ1278b (136) and because we were unable to measure any significant expression of the genes ATG14, BAT2, and GDH3 under most conditions tested. Furthermore, two A-NCR genes (VID30 and PEP4) failed to show significant differential expression under our conditions. We thus focused our analysis on the 32 remaining A-NCR genes.
Figure 3A shows that the expression profiles of these 32 A-NCR genes are not all similar. Closer analysis led us to subdivide these genes into two categories. One comprises A-NCR genes which are subject to other transcriptional regulations in addition to NCR. For instance, the UGA1 and UGA4 genes are specifically inducible by GABA (121) and the CAR1 gene by arginine (40). Likewise, DAL4, DAL5, DAL7, DUR1,2, and DUR3 are inducible by allophanate (a product of urea degradation) (25) and PUT1 and PUT2 by proline (18). Other genes in this first category are subject to transcriptional regulations acting on wider sets of genes. This is the case for AGP1, whose expression is induced by extracellular amino acids via the SPS system (63). Likewise, GDH2 is subject to GAAC via transcription factor Gcn4 (57). Yet the expression of several of these inducible genes tends to be lower on nitrogen media that support optimal growth (Fig. 3A), which is consistent with the previous observation that the basal expression of these genes (i.e., monitored in inducer-free media) is subject to NCR.
The second category includes 18 A-NCR genes all sharing the same expression profile (Fig. (Fig.3A)3A) and for which NCR is the main (if not the sole) known transcriptional regulation. This category includes GAP1, which codes for the general amino acid permease (67) and has been the focus of many studies aimed at deciphering the molecular mechanisms of NCR (26, 85). It also includes three of the four genes (GAT1/NIL1, DAL80/UGA43, GZF3/DEH1/NIL2) coding for the GATA transcription factors involved in the transcriptional control of NCR target genes (26, 85), shown in previous studies to be under NCR control and interlinked in a network of trans- and auto-regulation systems (12, 23, 109, 119). Compared to the levels observed on urea, the expression of these 18 A-NCR genes is strongly repressed in yeast cells growing on group A nitrogen sources (Fig. (Fig.3).3). We thus used the average expression profile of these 18 genes as a measure of the degree of NCR on each medium tested. This measure turned out to be very similar to the expression profiles obtained for GAP1 both on microarrays and in experiments using a lacZ reporter gene (Fig. (Fig.3).3). We first noticed that the average expression of the 18 genes is lower on all 20 tested nitrogen media than on urea. This indicates that under the culture conditions used in this study, the reference nitrogen source (urea) is also the one on which NCR is least active. A clear NCR effect is observed with the group A nitrogen sources that support a generation time below 3 h (ammonium, asparagine, glutamine, aspartate, serine, glutamate, arginine, alanine, GABA). NCR is strongest with asparagine, glutamine, and serine. Although glutamine and asparagine are well known to exert a strong NCR effect, the finding that serine does too has been mentioned in only one previous study (17). 
Interestingly, despite the rapid growth supported by ammonium, glutamate, and aspartate (generation time, ≥2 h but ≤2 h 15 min), these nitrogen sources exert a lesser NCR effect than do asparagine, glutamine, and serine. Alanine, arginine, and GABA support slower growth (generation time, ≥2 h 20 min but ≤2 h 45 min) than glutamate or aspartate but exert equally strong NCR. When the nitrogen source supports growth at a generation time ranging from 3 to 4 h (this is the case for valine, proline, and phenylalanine), NCR is weaker than on the above-mentioned media, and when the generation time exceeds 4 h, there appears to be no NCR except on ornithine (generation time, 4 h 32 min). Surprisingly, as mentioned above, NCR is minimal on urea, despite the intermediate generation time (3 h 35 min). We can thus generally and qualitatively associate three generation time intervals, <3 h, ≥3 h but <4 h, and ≥4 h, with three levels of NCR: strong, weak, and essentially absent, but as shown by the exceptions highlighted above (ammonium, glutamate, aspartate, ornithine, urea), there is no perfect correlation between the growth rate observed with each nitrogen source and the degree of NCR. This point has been discussed in a previous review (24).
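The NCR measure used above, the average expression profile of the 18 A-NCR genes, amounts to a per-medium mean over genes. The following minimal sketch illustrates that calculation; it assumes that expression values are log2 ratios relative to urea, and the function name and the numbers in the example are hypothetical, not data from this study.

```python
from statistics import mean

def average_profile(profiles):
    """Return the per-medium mean of several genes' expression profiles.

    `profiles` maps a gene name to a list of expression values, one per
    medium, all in the same medium order (assumed here to be log2 ratios
    relative to urea).
    """
    lengths = {len(values) for values in profiles.values()}
    if len(lengths) != 1:
        raise ValueError("all profiles must cover the same media")
    # zip(*...) regroups the values medium by medium
    return [mean(column) for column in zip(*profiles.values())]

# Hypothetical values for three genes on three media (not data from this study)
example = {
    "GAP1": [-3.0, -1.0, 0.0],
    "DAL80": [-2.0, -1.5, 0.0],
    "GAT1": [-1.0, -0.5, 0.0],
}
ncr_index = average_profile(example)
```

Under this convention, a more negative value on a given medium would indicate stronger repression of the gene set, i.e., a more active NCR, than on urea.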
As mentioned above, hierarchical clustering analysis of gene expression data defined a group of 140 coexpressed genes (cluster 1) (see Table S1 in the supplemental material). These 140 genes include 26 of the 32 A-NCR genes found to be differentially expressed in our study (overlap P value = 4.5 × 10^−34; sig = 31.6). The six remaining genes were classified in other clusters of coexpressed genes as described below. The group of 140 genes likely includes other NCR target genes in addition to the 26 A-NCR genes. To identify these genes, we first compared the 140 genes with the lists of genes generated by two independent studies aimed at identifying in the whole genome genes controlled by the GATA transcription factors. One of these lists contains 83 yeast genes proposed on the basis of an analysis of ChIP-chip and genome expression data (including those reported in reference 116) to be associated with at least one of the four GATA transcription factors (7); these 83 genes include 17 of the 41 A-NCR genes (overlap P value = 2.3 × 10^−22; sig = 19.8). The other list comprises 91 genes whose transcript levels are reported to be positively controlled by the Gln3 and Gat1 GATA factors (112); it includes 28 of the 41 A-NCR genes (overlap P value = 1 × 10^−44; sig = 42.2). We thus compared our group of 140 genes with those two lists (Fig. 4A). Of the 16 genes found in all three studies, 4 do not appear on the list of A-NCR genes (GLT1, VBA1, DIP5, and YDR090C), nor do 26 of the 38 genes identified by two of the three studies (Fig. 4A). These 30 genes (4 plus 26) are thus highly probable novel NCR target genes (P-NCR) (Fig. 4B). Accordingly, the average expression profile of these P-NCR genes (Fig. 4C) is quite similar to that described for the A-NCR genes (Fig. 3B), although the amplitudes of differential expression according to the nitrogen source are smaller.
We thus propose an updated list of 71 (41 plus 30) genes whose expression appears to be controlled by NCR (see Table S2 in the supplemental material).
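Overlap P values of the kind quoted above are commonly obtained from the upper tail of a hypergeometric distribution. The sketch below shows that calculation under stated assumptions: the function name and arguments are hypothetical, the reference population size has to be supplied by the user (it is not given in this section), and the accompanying sig scores are not reproduced.

```python
from math import comb

def overlap_p_value(population, list_a, list_b, overlap):
    """P(X >= overlap) when list_b genes are drawn at random from a
    population that contains list_a marked genes (hypergeometric tail)."""
    if not (0 <= overlap <= min(list_a, list_b)
            and max(list_a, list_b) <= population):
        raise ValueError("inconsistent set sizes")
    upper = min(list_a, list_b)
    # Sum exact integer counts first, then divide once to limit rounding error
    favourable = sum(
        comb(list_a, i) * comb(population - list_a, list_b - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(population, list_b)

# Toy example: 10 genes, 4 marked, 3 drawn, at least 2 marked among them
p_toy = overlap_p_value(10, 4, 3, 2)  # (36 + 4) / 120
```

Because the counts are summed as exact integers before the single division, very small tail probabilities remain representable as floating-point numbers.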
Figure 4A shows that additional putative NCR target genes were found in only one of the three studies. This concerns 91 genes in our study, 52 genes in reference 7, and 47 genes in reference 112. As 11 of these 190 genes are A-NCR genes (Fig. 4A), we cannot simply dismiss them all as false positives. To identify among these 190 genes those which are most likely to be true NCR targets, we analyzed their upstream regions. The promoter regions of NCR target genes typically contain several 5′-GATA-3′ core sequences recognized by the GATA family transcription factors (26, 85). We thus counted, in the upstream regions of all yeast protein-encoding genes, the occurrences of oligonucleotides and spaced pairs corresponding to NCR-specific regulatory motifs and used these counts to classify genes into two classes (NCR or not NCR) using linear discriminant analysis with forward stepwise variable selection (see Materials and Methods). Leave-one-out cross-validation estimates the precision at 92%, with most errors corresponding to false negatives. The optimal discriminant function was then applied to all yeast gene upstream regions, resulting in a list of 100 putative NCR-regulated genes (see Table S3 in the supplemental material). This list included 30 of the 41 A-NCR genes and 4 of the 30 P-NCR genes. Among the remaining genes, 14 were included in the 190 genes found in one of the three above-described experimental studies. We thus considered these 14 genes as additional P-NCR targets, potentially extending the number of yeast genes subject to NCR to 85, i.e., 41 A-NCR plus 44 (30 plus 14) P-NCR genes (see Table S2 in the supplemental material).
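The first step of the promoter analysis described above, counting motif occurrences in upstream regions, can be illustrated for the 5′-GATA-3′ core alone. This is only a minimal sketch: the function names are hypothetical, the test sequence is invented, and the oligonucleotide and spaced-pair features and the discriminant analysis actually used are not reproduced.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def count_core(upstream, core="GATA"):
    """Count overlapping occurrences of `core` on both strands."""
    upstream = upstream.upper()
    core = core.upper()
    # A set, so that a palindromic core is counted only once per site
    targets = {core, reverse_complement(core)}
    width = len(core)
    return sum(
        upstream[i:i + width] in targets
        for i in range(len(upstream) - width + 1)
    )

# Invented sequence: one GATA on the given strand, one on the opposite strand
n_sites = count_core("AGATAAGTTATCT")
```

Counts of this kind, computed for each gene's upstream region, would then serve as input variables for a classifier.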
We used qRT-PCR to measure the transcript levels of 20 of the 44 P-NCR genes in the wild type and ure2 gzf3 dal80 mutant grown on glutamine. In this triple mutant strain, the nitrogen catabolite repression exerted on Gln3- and Gat1-dependent transcription is largely relieved (26, 119). We have obtained data for 18 genes, and in all cases, the gene was found reproducibly derepressed in the mutant, confirming their sensitivity to NCR (see Table S4 in the supplemental material). About two-thirds of the 44 P-NCR gene products are known to be involved in nitrogen metabolism, and some of them have previously been shown to be expressed to a lower level under good nitrogen supply conditions (see Table S2 in the supplemental material). Other P-NCR genes code for previously studied proteins which do not seem to be directly associated with nitrogen metabolism. The remaining P-NCR genes and three of the A-NCR genes have not been functionally characterized to date. Different methods of protein sequence comparison were applied to most of these proteins to try to infer a putative function. Among other data provided by this analysis (for details, see Table S2 in the supplemental material), we identified two probable amino acid racemases (YIR030C/DCG1 and YGL196W) and a putative vacuolar amino acid transporter (YDR090C). Furthermore, ORF YIL167W has been annotated as a pseudogene coding for part of a serine/threonine deaminase homolog (Sdl1) in strain S288C, with the adjacent gene YIL168W coding for the second part of this enzyme. We found that both YIL167W/SDL1 and YIL168W emerge as P-NCR genes similarly expressed in strain Σ1278b on all tested nitrogen sources. In closely related Saccharomyces species (S. paradoxus, S. mikatae, and S. bayanus), these two ORFs are fused and the corresponding gene probably encodes a functional protein (70).
Similarly, the NIT1 gene, coding for a protein similar to nitrilases (98), appears to correspond to two adjacent ORFs, YIL164C and YIL165C, which are similarly expressed in our experiments. These two ORFs are also fused in S. paradoxus, S. mikatae, and S. bayanus (70), suggesting that they code for a functional enzyme in these species. We have sequenced the YIL167W/SDL1-YIL168W and NIT1/YIL164C-YIL165C ORFs isolated from strain Σ1278b used here. We found that, contrary to the situation in strain S288C, neither ORF is interrupted by any stop codon. This strongly suggests that both YIL167-168W/SDL1 and NIT1/YIL164-165C code for functional proteins in strain Σ1278b. The strain with the latter genotype thus appears suited for functional analysis of these genes. Finally, two other P-NCR genes (LEE1/YPL054W and YOR052C) code for proteins of unknown function that contain predicted zinc finger motifs. These genes are described in more detail in the last section of Results. More details on the primary sequences of all P-NCR gene products are available in Table S2 in the supplemental material.
Among the groups of genes found to be coexpressed on the various test media, cluster 4 and cluster 5 include 37 and 21 genes, respectively (see Table S1 in the supplemental material). Both of these clusters contain a significant number of genes subject to the general control of amino acid biosynthesis (GAAC): 7 (cluster 4) and 5 (cluster 5) are indeed among the 37 GAAC target genes originally defined on the basis of classical molecular studies (57) (respective overlap P values = 1.2 × 10^−9 and 9.9 × 10^−8; sig = 7.13 and 5.20), 16 (cluster 4) and 15 (cluster 5) are on the list of 539 Gcn4 target genes defined on the basis of whole-genome transcript analyses (90) (respective overlap P values = 6.1 × 10^−9 and 1 × 10^−12; sig = 6.42 and 7.13), and 15 (cluster 4) and 14 (cluster 5) are among the 187 genes found by ChIP-chip analyses (binding P value < 10^−3) to be associated with Gcn4 (53) (respective overlap P values = 3.4 × 10^−14 and 2.3 × 10^−17; sig = 10.98 and 14.15). In addition, analysis of the sequences located upstream from the 37 and 21 genes reveals that the 5′-GAGTCA-3′ sequence is significantly overrepresented among them (sig = 2.77 and 1.98) (130). This consensus motif corresponds with UASGCRE, the Gcn4 binding site (95). Clusters 4 and 5 thus contain a subset of genes whose expression is controlled by this transcription factor. Interestingly, the genes of these groups are typically expressed to a higher level on leucine, isoleucine, methionine, threonine, tryptophan, and tyrosine than on urea, thus suggesting that GAAC is activated in cells growing on the group B nitrogen sources (Fig. 5). Accordingly, the average expression profiles of genes identified as GAAC targets in two previous studies (57, 90) reveal higher expression levels on the above-mentioned media (see Fig. S5 in the supplemental material).
To ascertain the importance of GAAC on group B media, we examined the growth of a gcn2Δ mutant strain on all 21 nitrogen sources tested in our study. GCN2 is a key gene of GAAC, coding for the eIF2 kinase required to derepress the translation of Gcn4 mRNA (58). We found that the deletion of GCN2 specifically reduces growth when the nitrogen source is a group B compound (Fig. 6). Hence, the GAAC is not only more active but also important for optimal growth on group B media. Yeast on these nitrogen sources is thus characterized by a lack of NCR, slow growth, and a more active GAAC (Fig. 2B).
Gene classification by hierarchical clustering revealed a third large group of 119 genes (cluster 3 [see Table S1 in the supplemental material]). These genes are expressed on most media better than on urea. Comparison of this cluster with predefined gene lists revealed significant overlaps with the target genes of the SPS system, those of the Zap1 transcription factor, and those of the unfolded-protein response (UPR) (data accessible at http://dbm.ulb.ac.be/PhysCell/data/Godard.htm). Moreover, different genes of this cluster display quite different expression profiles. We thus decided to subclassify them by means of a clustering strategy based on Pearson's correlation metric (see Materials and Methods). This led to subdividing the cluster 3 genes into four subclusters. The average expression profiles of these subclusters are presented in Fig. 7. As detailed below, three of these subclusters overlap previously identified regulons.
A first subcluster (subcluster 3-1) comprises 15 genes which are expressed to a higher level on group B nitrogen sources, citrulline, phenylalanine, valine, alanine, and serine than on urea. They are poorly expressed on urea, GABA, or proline (Fig. 7A). The nitrogen sources promoting higher-level expression of these 15 genes are precisely the strongest amino acid inducers of the SPS sensor system (Ssy1-Ptr3-Ssy5 [see the introduction]) (1, 33, 63). Consistently, 3 of the 15 genes (AGP1, MUP1, and GNP1) of subcluster 3-1 encode amino acid permeases known to be targets of the SPS system. Five permease genes (BAP2, BAP3, PTR2, TAT1, and TAT2) in addition to AGP1, MUP1, and GNP1 have been described as SPS target genes in classical molecular studies (11, 38, 63, 72). Although these genes also show lower expression on urea, proline, and GABA than on most of the other media, they were not classified within subcluster 3-1, as their expression profiles significantly differ from those of AGP1, GNP1, and MUP1. The BAP2 and TAT1 genes were, respectively, classified in subcluster 3-2 and subcluster 3-3 (see below), and the three others are not among the 390 genes used in this hierarchical clustering analysis. That the responses to external amino acids differ significantly among SPS target genes has also been observed in experiments based on the use of lacZ reporter fusions for the AGP1, GNP1, BAP2, BAP3, TAT1, and TAT2 genes (our unpublished data). These differences are likely due to the fact that the SPS target genes are subject to other transcriptional regulations whose natures and potencies differ from one gene to another. For instance, PTR2 is also inducible by internal peptides via the Cup9 and Ubr1 factors (126), AGP1 is also subject to NCR (1), and BAP2 is also under the control of the Leu3 transcription factor responding to leucine (37).
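A clustering based on Pearson's correlation groups genes by the shape of their expression profiles rather than by their amplitude. The sketch below shows one common way to turn the correlation coefficient r into a dissimilarity, d = 1 − r; whether this exact form was used here is an assumption (the actual procedure is given in Materials and Methods), and the function names are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a flat profile")
    return cov / sqrt(var_x * var_y)

def correlation_distance(x, y):
    """Dissimilarity in [0, 2]: near 0 for same shape, near 2 for opposite."""
    return 1.0 - pearson_r(x, y)

# Invented profiles: same shape at different amplitude, then mirrored shape
d_same = correlation_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
d_opposite = correlation_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

With such a measure, two genes induced on the same media cluster together even if one responds much more strongly than the other.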
We consider that other genes in subcluster 3-1 might be SPS targets, displaying an expression profile resembling that of AGP1, GNP1, and MUP1. Three previous studies based on whole-genome microarray hybridization have aimed to identify all yeast genes up-regulated by external amino acids in an SPS-dependent manner (43, 45, 73). In another study, the binding sites for the Stp1 and Stp2 transcription factors were mapped on the whole-genome scale in ChIP-chip experiments (53). Interestingly, although these reports propose lists comprising between 22 and 72 target genes, the overlap between them is mostly limited to the few amino acid permease genes mentioned above. The same applies to the overlap between these lists and the 15 genes of subcluster 3-1. This suggests that the number of genes up-regulated by amino acids via the SPS system might be more limited than generally thought, possibly mainly to amino acid permease genes.
A second subcluster (subcluster 3-2) comprises 45 genes which are expressed to a higher level on media containing a good nitrogen source (asparagine, glutamine, serine, ammonium, aspartate, arginine, or glutamate) than on media containing a poor one (Fig. 7B). The expression profile of these genes is thus opposite to that of the NCR target genes. Among these 45 genes, 21 belong to the list of 381 targets of the UPR as defined by a previous whole-genome transcript analysis (125) (overlap P value = 1.4 × 10^−14; sig = 12). KAR2, a well-known UPR target (93, 108) even though it does not appear in this list of 381 genes, also belongs to subcluster 3-2. The expression of UPR target genes typically increases when incorrectly folded proteins accumulate in the secretory pathway. This response can also be induced by exposing cells to dithiothreitol, a powerful reducing agent preventing the formation of disulfide bridges, or to tunicamycin, an inhibitor of N-glycosylation reactions. The transcription factor Hac1 and its regulator Ire1 are the two main protein mediators of the UPR (83, 99, 120). Using yMGV (the yeast Microarray Global Viewer) (77, 89), we compared the average expression profile of the 45 genes of subcluster 3-2 with that of the 85 above-defined NCR target genes in a wide set of whole-genome transcript analyses. We found these two gene groups to display opposite expression profiles, notably in two series of experiments aimed at inventorying stress-responsive genes (47). In this study, the NCR target genes were activated under nitrogen starvation and in the stationary phase of growth, while the 45 genes of subcluster 3-2 showed reduced expression under these conditions (see Fig. S6 in the supplemental material).
Subcluster 3-3 comprises 42 genes showing lower expression on urea than on any other nitrogen source (Fig. 7C). Surprisingly, we found among these genes 16 of the 46 established targets of the Zap1 transcription factor (overlap P value = 5.1 × 10^−25; sig = 22.5). Expression of these 46 genes is activated by the Zap1 transcription factor in zinc-deprived cells (82). Interestingly, most of the 26 other genes of subcluster 3-3 also showed Zap1-dependent derepression (82), even though they lack the binding site for this factor in their respective promoter regions. We measured by qRT-PCR the expression of two known Zap1 target genes, YGL258W and ZAP1 itself, on urea and valine media. The results confirm that both genes are expressed to a lower level on urea than on valine (see Fig. S7 in the supplemental material). We also increased the zinc ion concentration from 0.05 μM (the Zn2+ concentration in the minimal medium used in our experiments) to 2.5 μM (the Zn2+ concentration in standard yeast nitrogen base medium); under these conditions, neither of the two genes is expressed to a significant level on either urea or valine medium (data not shown). The Zap1 regulon is thus active under our growth conditions but less so when urea is the sole nitrogen source. Yet we observed no growth defect on media containing as little as 0.05 μM Zn2+ (data not shown), indicating that the zinc ion concentration in the buffered minimal medium used in our experiments is not growth limiting. We do not understand the effect exerted by urea on the expression of Zap1-controlled genes. Other studies have likewise revealed links between nitrogen metabolism and cellular zinc ion homeostasis. On the one hand, expression of the Zap1 target gene ZRT1, coding for a plasma membrane zinc transporter, depends on the GATA transcription factor Gln3 (29).
The expression of two other Zap1 target genes, ZRT2 and ZRC1, coding for two zinc transporters located, respectively, at the plasma membrane and the vacuolar membrane (44), is also subject to this control (112). Yet the relationship between zinc ion homeostasis and nitrogen metabolism remains unclear.
Subcluster 3-4 comprises 17 genes showing higher expression on all media tested than on urea and their highest expression on methionine and threonine (Fig. 7D). Interestingly, eight of these genes encode proteins displaying a mitochondrial localization, according to annotations in the SGD (42).
So far we have focused our analysis on 390 genes which are differentially expressed according to the nitrogen source, classifying them by clustering analysis. This has enabled us to identify several transcriptional regulations acting on large sets of genes and the nitrogen supply conditions under which these global controls operate (see Discussion). We next analyzed in detail the 506 genes over- or underexpressed on any medium compared to their expression on urea. For this, we again applied the hierarchical clustering technique, this time to the groups of genes which are up- or down-regulated on any of the 20 nitrogen sources versus urea. For several nitrogen sources such as asparagine, glutamine, ammonium, aspartate, alanine, and valine, the identified genes appear on lists of targets of the above-described transcriptional controls, e.g., NCR. Hence, none of these sources induced any detectable specific transcriptional regulation. For other nitrogen sources, there emerged groups of coregulated genes that were not identified in the above analysis and showed activated or repressed expression on only a limited number of nitrogen sources. These genes, described below, are targets of more-specific transcriptional regulations triggered by the nitrogen sources concerned.
Five genes show higher expression on phenylalanine, tryptophan, and tyrosine than on urea (see Fig. S8 in the supplemental material). As expected, these include ARO9 and ARO10, coding, respectively, for the transaminase and the decarboxylase involved in degrading these amino acids (62, 133) and known to be induced by aromatic amino acids via the Aro80 transcription factor (62). Among the three other genes displaying a similar expression profile, ESBP6 encodes a mitochondrial protein sharing sequence similarities with monocarboxylic acid transporters, but its function remains unknown (87). Another such gene is YDR379C-A, a gene adjacent to ARO10/YDR380W and separated from it by 701 bp. YDR379C-A and ARO10/YDR380W are divergent and thus share the same promoter region. The function of the YDR379C-A gene is unknown, and the sequence of its product is not indicative of any particular function. Lastly, ARO80 itself is expressed to a higher level on tryptophan than on urea, suggesting that Aro80 also controls the expression of its own gene. The promoter regions of these five genes have all been found to associate with the Aro80 transcription factor (53). In accordance with the view that Aro80 acts on ESBP6 and ARO80 in addition to ARO9 and ARO10, the upstream noncoding regions of these genes contain the cis-regulatory ARO upstream activation sequence through which Aro80 activates gene transcription (62). To verify that the ESBP6 and ARO80 genes are regulated by the Aro80 transcription factor, we used qRT-PCR to measure their transcript levels and that of ARO9 on urea medium with or without added tryptophan, tyrosine, or phenylalanine. The strains tested were a wild-type strain and an aro80 mutant (Fig. 8A). In these experiments, the Aro80-dependent induction of ARO9 and ESBP6 expression was observed in the presence of each aromatic amino acid. The same is true of ARO80, but the effect was much less pronounced.
We also compared the growth of a strain lacking ESBP6 with that of the corresponding wild type in all media tested in this work but found no difference (data not shown).
Nine other genes (IMD2, FLR1, SNQ2, YOR1, ICT1, GRE2, YLR046C, YLR346C, and QDR3) showed higher expression on tryptophan than on other media (see Fig. S8 in the supplemental material). A similar though less pronounced effect was detected on tyrosine medium. All of these genes code for proteins involved in resistance to multiple drugs, and seven of them are known to be targets of Pdr1, Pdr3, Pdr8, Yrr1, and/or Yrm1, i.e., transcription factors involved in pleiotropic-drug resistance (PDR) (34, 55, 78, 96). We used qRT-PCR to measure the expression of the SNQ2 gene on urea with or without added tryptophan or tyrosine. We thus confirmed that SNQ2 is specifically expressed to a higher level in the presence of tryptophan. Furthermore, this induction was not dependent on the Aro80 transcription factor (see Fig. S8 in the supplemental material). These results suggest that the PDR transcriptional network is activated when cells use tryptophan as the sole nitrogen source.
Surprisingly, the five Aro80 target genes also show higher expression on methionine, leucine, isoleucine, and threonine than on urea (see Fig. S8 in the supplemental material). As the catabolism of these amino acids remains only partially characterized, our observation raises the interesting possibility that the genes of the ARO regulon might also be involved in the catabolism of these amino acids.
Eight other genes showed lower expression on methionine than on any other tested medium. Six of them are among the eight target genes of the Cbf1-Met4-Met28 transcription complex mediating the repression of transcription in response to methionine (124), and the two others are also known to be repressed by methionine (52, 123). The two remaining Cbf1-Met4-Met28 target genes (MET10 and MET2) are also expressed to a lower level on methionine medium in our study but are not among the 506 genes displaying significant variation of expression on one or more tested nitrogen sources from that on urea.
The CAR1 and CAR2 genes, encoding arginase and ornithine transaminase, respectively, are up-regulated on arginine, whereas the ARG1, ARG3, ARG4, and ARG8 genes, encoding enzymes involved in arginine anabolism, are repressed on arginine medium. This is in accordance with the mechanisms of mutual exclusion of anabolism and catabolism of this amino acid (39). No other gene displaying arginine-dependent variation of expression was identified. The CAR1 and CAR2 genes are also highly expressed on citrulline and required for citrulline catabolism. Surprisingly, CAR1 is also more highly expressed on ornithine, isoleucine, leucine, methionine, threonine, tryptophan, and tyrosine even though arginase is not involved in the catabolism of any of these nitrogen sources and despite the fact that the arginine biosynthesis genes are GAAC activated only on group B nitrogen sources. A positive effect of these compounds on CAR1 expression has been observed previously, under conditions of nitrogen repression (41). The mechanism causing the higher expression of CAR1 under these conditions remains unknown.
The PUT1 and PUT2 genes (encoding, respectively, proline oxidase and Δ-1-pyrroline-5-carboxylate dehydrogenase) are involved in proline catabolism and are targets of the Put3 transcription factor activated by intracellular proline (18, 35, 36, 60). As expected, both genes show higher expression on proline than on the other tested media. They also show higher expression on citrulline, and increased PUT2 expression is also observed on arginine and ornithine. A recent work has shown that the MCH5 gene encoding a riboflavin transporter (105) is also induced by proline in a Put3-dependent manner (Juergen Stolz, personal communication). Accordingly, MCH5 is more highly expressed on proline medium and also on citrulline, arginine, and ornithine. That citrulline, ornithine, and arginine have a positive effect on the expression of the PUT regulon is likely due to the fact that the catabolism of these amino acids leads to the formation of Δ-1-pyrroline-5-carboxylate, which is converted into proline (19). Apart from the PUT and MCH5 genes, no other gene displaying significant proline-dependent variation of expression was found.
CIT2 and DLD3 (encoding, respectively, a citrate synthase and a d-lactate dehydrogenase) show lower expression on glutamate, proline, and citrulline than on the other nitrogen sources. These two genes are subject to the retrograde (RTG) control (22, 79). This regulation is mediated by the Rtg1 and Rtg3 transcription factors, which are inhibited by glutamate and responsible for the expression of genes encoding enzymes involved in α-ketoglutarate synthesis. Other known RTG targets, i.e., CIT1 (citrate synthase), IDH1, IDH2 (isocitrate dehydrogenase), and ACO1 (aconitase), show the same profile of inhibition by glutamate, but their P value does not exceed the threshold set for selecting differentially expressed genes.
Five genes show higher expression on GABA medium than on the other media (see Fig. S10 in the supplemental material). Among them, the genes UGA1 (GABA transaminase), UGA2 (succinate semialdehyde dehydrogenase), and UGA4 (GABA permease) are known to be induced specifically by GABA, via the Uga3 and Uga35/Dal81 transcription factors (3, 104, 121). The other two genes, AMD2 and MAE1, are thus new potential targets of this regulation. To test this hypothesis, we used qRT-PCR to measure their expression and that of UGA4 used as a control in a wild-type strain and a uga3Δ mutant growing on urea with or without added GABA (Fig. 8B). We observed Uga3-dependent induction by GABA of the expression of both genes. AMD2 codes for a putative amidase (21), and MAE1 encodes a mitochondrial malic enzyme catalyzing the oxidative decarboxylation of malate to pyruvate (16). It is known that the uga1 and uga2 mutants are unable to grow on GABA as the sole nitrogen source, and the same is true of a uga4 mutant if the strain lacks the two other GABA permeases (Gap1 and Put4) (3, 104). We deleted the AMD2 and MAE1 genes in the Σ1278b strain, but the resulting mutants displayed no growth defect on any of the tested media, including GABA medium (data not shown).
Two genes show much higher expression on serine and threonine than on the other nitrogen sources (see Fig. S11 in the supplemental material). The first, CHA1, encodes a serine/threonine deaminase and is required for growth when the nitrogen source is one of these amino acids (101). Expression of this gene is known to be induced by threonine and serine via the Cha4 transcription factor (59). The second gene, MMF1, encodes a mitochondrial protein of unknown function. This protein is required to maintain the mitochondrial genome and for isoleucine biosynthesis (71, 97). Orthologs of MMF1 are found in all domains of life. The ortholog in Escherichia coli, tdcF, belongs to the tdcABCDEFG operon. Interestingly, several genes of this operon are involved in serine and threonine catabolism in this organism (54). This suggests that MMF1 is involved in serine and/or threonine degradation in yeast. Yeast also possesses a paralog of this gene, HMF1, which is not differentially expressed under the conditions of this study.
Among the 506 genes showing differential expression according to the nitrogen source, 22 code for transcription factors and 15 code for other proteins having a regulatory function. Generally speaking, most of these 37 genes show low-level expression and the variations that we observed are significant but slight. This is notably true of 14 genes that we were unable to associate with any of the above-described groups of coexpressed genes.
Eight of the 37 genes encode transcription factors directly involved in the regulation of nitrogen metabolism genes. As mentioned above, these include the genes coding for the positive GATA factor Gat1 and for the negative GATA factors Gzf3 and Dal80, which are involved in NCR and which are themselves targets of this regulation (23, 109, 119) (Fig. 3A). Another central regulator of nitrogen metabolism is Gcn4, the principal transcription factor of GAAC. Interestingly, our analysis classified GCN4 among the 30 probable new target genes of NCR. The expression profile of GCN4 is indeed typical of genes subject mainly to NCR (Fig. 4B). Furthermore, Gln3 has been associated with the GCN4 locus in ChIP-chip experiments (7), and our qRT-PCR data showed that GCN4 expression is significantly derepressed in a mutant strain defective in NCR (see Table S4 in the supplemental material). While these observations indicate that the transcription of GCN4 is subject to NCR, two other studies have highlighted a positive contribution of Gcn4 in NCR (118, 128), thus suggesting some cross-regulation between NCR and GAAC (see Discussion). Yet another transcription factor gene classified as an NCR target is UGA3, in keeping with a recent report (65). This factor is responsible for induction by GABA of the transcription of genes involved in GABA catabolism (2, 121). The expression of these genes is also subject to NCR (4, 32, 121). Finally, as already mentioned above, the Aro80 transcription factor seems to activate the expression of its own gene.
Among the 37 regulators differentially expressed in our experiments, 8 are involved in the cell cycle, filamentous growth, and/or pseudohyphal growth (Wtm1, Fus3, Kar4, Clb1, Tpk3, Sst2, Phd1, and Hms2), 7 are known to control glucose metabolism (Mig2, Pd7, Adr1, Nrg2, Hap4, Rgs2, and Mth1), 4 play a role in the cellular stress response (Haa1, Msn4, Hog1, and Ygk3), and 6 are associated with other regulations (Spl2, Spt4, Ino4, Tiss11, Zap1, and Hac1). It is possible that the nitrogen regulation of these regulatory genes allows other metabolic and signaling pathways of yeast to be modulated according to the nitrogen source. Only 4 of the 37 regulator-encoding genes differentially expressed in our experiments encode predicted regulatory proteins of unknown function. Two of them particularly drew our attention. One, the LEE1/YPL054W gene, codes for a protein containing two tandem repeats of the CCCH-type zinc finger domain. This domain is present in other proteins found in a wide range of organisms from yeast to humans; it was shown in the case of Cth2 to promote binding to the 3′ end of mRNA and to accelerate its degradation under conditions of iron deficiency (102). The other gene (YOR052C) encodes a protein conserved in all eukaryotes and containing a zf-AN1-type zinc finger domain (80). This protein and its orthologs in many species (including other yeasts) contain an additional N-terminal ubiquitin-like domain whose presence in Yor052p, however, remains uncertain. Further experiments will be required to determine the functions of these potentially interesting putative regulatory proteins.
This study constitutes the first systematic analysis of the influence of nitrogen on the yeast transcriptome. We find that the levels of expression of almost 10% of all protein-encoding yeast genes (506 out of 5,690) vary significantly according to the nature of the unique nitrogen source available in the medium. This list of 506 genes is not significantly enriched in genes responding to stress. Even in cells grown on tryptophan, a nitrogen source supporting very slow growth, the expression levels of genes normally up- or down-regulated under stress conditions are not significantly different from those observed on nitrogen sources supporting fast growth. We believe that the lack of a stress response in the cells examined in this study is due to harvesting during steady-state growth. The variation in expression of the many genes subject to the ESR or CER (20, 47) is most visible in cells subjected to various perturbations of their environment, e.g., a shift to a different nutrient supply source, temperature, pH, or osmolarity. Also in support of this view, the transcriptome of yeast was recently examined in cells growing under steady-state conditions on proline or glutamine as the sole nitrogen source and in cells shifted for 2 hours from glutamine to proline (112). This analysis revealed a highly significant variation in the levels of expression of ESR/CER target genes in cells shifted from glutamine to proline but no significant difference in the expression levels of these genes between cells engaged in steady-state growth on one or the other medium.
We have used the expression data of nitrogen-regulated genes (i) to classify the nitrogen sources, (ii) to inventory the nitrogen-sensitive transcriptional controls and the corresponding target genes, and (iii) to evaluate the degree of activation of these transcriptional circuits on all tested nitrogen sources (Fig. 9). We have also tried to infer a function for some nitrogen-regulated orphan genes.
Most nitrogen sources fall into two main groups according to the supported growth rate and to the wide-spectrum transcriptional controls occurring in yeast cells growing on these sources (Fig. 2 and 9). Group A comprises asparagine, glutamine, serine, ammonium, aspartate, alanine, arginine, and glutamate. On these nitrogen sources, growth is rapid and NCR occurs. This result is essentially in agreement with the concept that NCR is active on good (preferential) and inactive on poor (nonpreferential) nitrogen sources (26, 85, 136). Yet we have also observed that the level of NCR does not correlate exactly with the generation time. For instance, NCR is partially relieved on glutamate and aspartate, despite rapid growth. Our study, furthermore, has revealed new genes potentially subject to NCR. Starting from a list of 41 established NCR target genes and using our expression data combined with data generated by other genome-wide studies or provided by the analysis of upstream gene sequences, we have extended this list with 44 additional genes, and experiments carried out on 18 of them confirmed that they are sensitive to NCR (see Table S2 in the supplemental material). Among the new NCR target genes are GDH1 and GLT1. The enzymes Gdh1 (anabolic glutamate dehydrogenase), Gdh2 (catabolic glutamate dehydrogenase), Glt1 (glutamate synthase), and Gln1 (glutamine synthetase) constitute the hub of yeast nitrogen metabolism (Fig. 1) (85, 136). Among them, GLN1 and GDH2 were previously established as being sensitive to NCR (84), but GDH1 and GLT1 were not, although observations consistent with this view have been reported (65, 129). It thus seems that the complete set of enzymes forming the heart of nitrogen metabolism in yeast is subject to NCR. Many of the new probable NCR target genes have known functions, and some code for proteins directly involved in nitrogen transport or metabolism. The functions of about 30 others, however, are still unknown.
For 14 of these genes, clues to their roles in cell metabolism have been derived from in silico analyses of their protein sequences (see Table S2 in the supplemental material). Furthermore, we have found that two loci defined as pseudogenes in strain S288C are true protein-encoding genes in the Σ1278b strain (used in this study) and respond to NCR. One (YIL167-168W) encodes a homolog of serine/threonine deaminase and the other (YIL164-165C) a homolog of nitrilases. We also note that group A nitrogen sources are characterized by activation of the UPR pathway, a transcriptional control circuit typically activated when incorrectly folded proteins accumulate in the secretory pathway (83, 99, 120). Conversely, the expression of UPR target genes is reported to be reduced when cells are shifted to nitrogen starvation conditions (113). This suggests that the higher protein synthesis rate of cells growing on a nitrogen-rich medium is associated with a greater number of poorly folded proteins, resulting in UPR activation (113). Furthermore, other observations suggest that the UPR and signaling pathways involved in pseudohyphal growth or meiosis of diploid cells might be more connected than anticipated (113, 114).
The group B nitrogen sources comprise leucine, isoleucine, methionine, threonine, tryptophan, and tyrosine. On these nitrogen sources, growth is particularly slow and NCR does not occur. Furthermore, GAAC is activated on these nitrogen media (Fig. 2 and 9). The importance of GAAC in cells growing on group B amino acids is also illustrated by the specific growth defect of a gcn2Δ mutant on the corresponding media. In agreement with GAAC being particularly active under these conditions, the increased activities of two amino acid-biosynthetic enzymes, namely, indole-3-glycerol-phosphate synthase (TRP3) and argininosuccinate lyase (ARG4), have been observed on media containing isoleucine, leucine, methionine, threonine, or tyrosine as the sole nitrogen source (92). Furthermore, this activation depends on Ndr1/Gcn1 (92), a positive regulator of the function of Gcn2 (58). It has been proposed that the activation of GAAC during growth on these amino acids is due to amino acid imbalances caused, for instance, by feedback inhibition of enzymes shared by branched amino acid-biosynthetic pathways (92). In support of this view, the presence of an additional amino acid relieving the amino acid imbalance also leads to a reduction of GAAC (92).
Another difference between the group A and group B nitrogen sources lies in the fate of the carbon derivatives resulting from the catabolism of these compounds. Whereas the transamination or deamination of group A nitrogen sources yields derivatives directly assimilable by the cell metabolism (Fig. 1), the transamination of group B compounds leads to keto acids undergoing decarboxylation to aldehydes which are in turn converted by dehydrogenases into long-chain or complex alcohols. The amino acid derivatives generated by this so-called Ehrlich pathway (100, 115, 135) are toxic and thus excreted by the cell, and this contributes to the formation of fusel oils. The mechanisms involved in the excretion of aldehydes and higher alcohols remain unknown. In this respect it is noteworthy that growth on tryptophan (and to a lesser extent on tyrosine) leads to the up-regulation of several genes known to be sensitive to the PDR transcriptional control. These genes encode plasma membrane transporters known to promote the excretion of drugs: Snq2 and Yor1 are ATP binding cassette transporters (107), and Flr1 and Qdr3 are probable proton antiporters (91). Our observation that these genes are up-regulated on tryptophan medium suggests that their expression products may play a direct role in the excretion of certain derivatives of tryptophan catabolism, such as tryptophol.
The enzymes involved in the catabolism of the group B nitrogen sources are generally not nearly as well known as those contributing to the degradation of the other nitrogen sources. The main transaminase and the decarboxylase involved in degrading aromatic amino acids (phenylalanine, tryptophan, and tyrosine) have been identified (64, 133); the corresponding genes (ARO9 and ARO10) are controlled by the Aro80 transcription factor (62). Our results reveal that the ARO regulon is induced not only during growth on aromatic amino acids but also in cells grown on any other group B nitrogen source. This strongly suggests a possible involvement of the Aro9 and Aro10 enzymes in the catabolism of all these compounds. In agreement with this view, it was recently shown that Aro10 is a broad-spectrum decarboxylase (100, 132), and it is also known that Aro9 exhibits specificity for a wide range of substrates (76, 127). We have also observed that the ALT1/YLR089C gene, encoding an alanine transaminase homolog, shows higher expression on methionine, isoleucine, citrulline, valine, and alanine than on the other tested nitrogen sources. Hence, the protein encoded by this gene might be involved in the catabolism of these amino acids as well. The involvement of multiple, partially redundant enzymes displaying overlapping substrate specificities very likely explains why classical genetics-based approaches have not allowed effective dissection of the catabolic pathways of group B amino acids, in marked contrast to their successful use in dissecting amino acid anabolic pathways (69). We have also extended the ARO regulon involved in amino acid catabolism to two other genes, one of which (ESBP6/MCH3) encodes a putative transporter that localizes to the inner mitochondrial membrane (87). As Aro9 and Aro10 are cytoplasmic enzymes (61), Mch3 might be responsible for the transport of aldehydes into mitochondria, where several alcohol dehydrogenase activities have been detected (42).
A major difference between the two groups of nitrogen sources defined above is that NCR is active and GAAC is inactive on group A compounds, whereas the opposite is true on group B compounds. This raises the question of whether links might exist between the GAAC and NCR regulations. Our data reveal that the expression of GCN4, encoding the principal transcription factor of GAAC, depends on the nitrogen source supplied and is subject to NCR. On the other hand, previous work has highlighted a contribution of Gcn4 to NCR (118). Together these data suggest a possible general cross-regulation between the two major transcriptional controls of nitrogenous anabolism and catabolism in yeast, in which NCR (most active on good nitrogen sources) would down-regulate GAAC, which in turn would up-regulate NCR. Yet on several other nitrogen sources (valine, phenylalanine, ornithine, proline, citrulline, and urea), neither of these two major transcriptional control circuits appears to be very active (Fig. 9). Further experiments will be required to investigate possible molecular links between NCR and GAAC.
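The grouping of nitrogen sources discussed here can be summarized as a small lookup table. The Python sketch below is illustrative only and is not part of the analysis pipeline used in this study: the names `NITROGEN_GROUPS`, `CONTROL_STATE`, and `classify` are hypothetical, the label "neither" is our shorthand for the sources on which neither control appears very active, and only the sources explicitly named in the text are listed (the table does not claim to cover all 21 tested sources).

```python
# Illustrative summary of the grouping described in the text.
# All identifiers below are hypothetical and chosen for this sketch.
NITROGEN_GROUPS = {
    "A": ["asparagine", "glutamine", "serine", "ammonium",
          "aspartate", "alanine", "arginine", "glutamate"],
    "B": ["leucine", "isoleucine", "methionine",
          "threonine", "tryptophan", "tyrosine"],
    # Sources on which neither NCR nor GAAC appears very active.
    "neither": ["valine", "phenylalanine", "ornithine",
                "proline", "citrulline", "urea"],
}

# Coarse qualitative states only; the text notes, for example, that NCR
# is partially relieved on glutamate and aspartate despite rapid growth.
CONTROL_STATE = {
    "A": {"NCR": "active", "GAAC": "inactive"},
    "B": {"NCR": "inactive", "GAAC": "active"},
    "neither": {"NCR": "not very active", "GAAC": "not very active"},
}


def classify(source):
    """Return (group, control states) for a named source, or None if unlisted."""
    name = source.strip().lower()
    for group, members in NITROGEN_GROUPS.items():
        if name in members:
            return group, CONTROL_STATE[group]
    return None


if __name__ == "__main__":
    for source in ("glutamine", "tryptophan", "proline"):
        print(source, classify(source))
```

Such a table records only the qualitative picture; the graded activation degrees estimated for each regulon (Fig. 9) are not represented.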
Some regulons are active on both group A and group B nitrogen sources (Fig. 9). This is the case, for instance, for the regulon formed by several amino acid permease genes induced by various external amino acids via the SPS sensor system and the Stp1, Stp2, and Uga35/Dal81 transcription factors (15, 46). We have observed that the target genes of the SPS regulon display quite divergent expression profiles when tested on the different nitrogen sources used here. This was not unexpected, since a similar result was obtained with a lacZ reporter fused to several of these permease genes (our unpublished data). It is likely that the different induction profiles of the genes of this regulon reflect their responsiveness to additional transcriptional controls, whose nature and importance vary from one gene to another. Many other nitrogen-responsive genes share this property of being under a combination of transcriptional controls rather than a single one. This likely explains why hierarchical clustering based on their expression profiles often shows these genes to be scattered among distinct clusters of coexpressed genes (see below). Another regulon is less active on various nitrogen sources (especially glutamate and proline); it comprises the genes inhibited by glutamate via the RTG control (22, 79) (Fig. 9). These genes can be expected to be poorly expressed on proline medium, because the internal pool of glutamate is high on this medium. The average expression of genes subject to RTG control is also lower on group A nitrogen sources. This may again reflect a high glutamate pool, to be expected in the presence of a good nitrogen source.
Finally, several regulons are most active in cells grown on a more limited number of nitrogen sources (Fig. 9). Among them is the one inducible by GABA (121), for which we identified two new target genes, MAE1 and AMD2. The MAE1 gene encodes a mitochondrial malic enzyme (16) that might contribute to optimizing the catabolism of succinate deriving from GABA catabolism (Fig. 1). We have also identified MMF1 as a new potential gene of the CHA regulon inducible by serine and threonine. A significant fraction of the 506 genes code for transcription factors (22 genes) or for other proteins having a regulatory function (15 genes); these genes show weak though significant differential expression according to the nitrogen source. Some of these regulatory genes encode proteins known to be directly involved in the regulation of nitrogen metabolism genes. For instance, we mention above a role of NCR in the transcriptional control of the GCN4 gene, while this gene appears to be required for optimal NCR (118). We have also identified two probable NCR target genes encoding potentially interesting regulatory factors, including one (LEE1/YPL054W) which may be involved in controlling mRNA stability. Experiments are in progress to test whether this protein plays such a role. Many other regulatory genes apparently controlled by nitrogen are involved in other metabolic pathways or cell processes. Further experiments will be needed to determine whether the apparent regulation by nitrogen of these regulatory genes contributes to the overall influence of nitrogen on other cellular pathways.
We have thus explored in a systematic manner the influence of different nitrogen sources on the yeast transcriptome. The same type of analysis could be used to acquire a general view of the transcriptional regulations involved in carbon, sulfur, and phosphorus metabolism. So far, genomic studies have focused on the mechanisms of adaptation to environmental change and especially to starvation (14, 47, 94, 138). Yet just as yeast is able to utilize many different nitrogen sources, it can also utilize different sources of carbon, sulfur, and/or phosphorus. A systematic study of the yeast transcriptome during growth on all these alternative nutrient sources should provide a more comprehensive view of the metabolic potentialities of yeast and of the associated transcriptional control mechanisms. It should also help to attribute potential functions to the still large number of orphan genes of yeast.
We are grateful to Sanna Venetvaara for her important contribution to the setting up of the microarray experiments in our laboratory and to Nasiha M'Rabet for her help in carrying out the first analyses. We are also very grateful to Catherine Jauniaux for her help in sequencing the NIT1 and SDL1 loci and in isolating the gcn2Δ mutant. We also thank all members of the laboratory and the other participants of the ARC project (members of the laboratories of Jacques van Helden, Gianluca Bontempi, and Marcelline Kaufman) for fruitful discussions. Finally, we thank Juergen Stolz for communication of data before publication.
This work was supported by grant FRSM 3.4.605.05.F from the National Funds for Scientific Research, Belgium, grant Bioval 981/3861 (Région Wallonne), and the Communauté Française de Belgique (ARC grant number 04/09-307).
Published ahead of print on 16 February 2007.
†Supplemental material for this article may be found at http://mcb.asm.org/.