Exposure to carcinogenic alkylating agents, oxidizing agents, and ionizing radiation modulates transcript levels for over one third of Saccharomyces cerevisiae's 6,200 genes. Computational analysis delineates groups of coregulated genes whose upstream regions bear known and novel regulatory sequence motifs. One group of coregulated genes contains a number of DNA excision repair genes (including the MAG1 3-methyladenine DNA glycosylase gene) and a large selection of protein degradation genes. Moreover, transcription of these genes is modulated by the proteasome-associated protein Rpn4, most likely via its binding to MAG1 upstream repressor sequence 2-like elements, which turn out to be almost identical to the recently identified proteasome-associated control element (G. Mannhaupt, R. Schnall, V. Karpov, I. Vetter, and H. Feldmann, FEBS Lett. 450:27–34, 1999). We have identified a large number of genes whose transcription is influenced by Rpn4p.
Biological processes depend upon the structural integrity of the molecules that comprise living organisms. The structural integrity of the genome is particularly important because molecular alterations in the genetic material, usually DNA, can lead to permanent inheritable changes, i.e., mutations. However, the structural integrity of other cellular molecules, such as proteins, RNA, carbohydrates, and lipids, is also important, because the precise three-dimensional shape and the detailed chemistry of these molecules orchestrate the biochemical processes vital for life. Most biomolecules are inherently reactive, and as such their structural integrity is constantly challenged by reactive chemical and physical agents in the environment. It should therefore come as no surprise that all cells can sense and respond to unfavorable molecular alterations. Indeed, it is well known that cells sense and respond to damaged DNA and proteins, and such responses are exemplified by the SOS and heat shock responses that have been well characterized in Escherichia coli and other organisms (11, 12, 28).
Here we explore the transcriptional response of Saccharomyces cerevisiae to a wide range of chemical and physical damaging agents. Specifically, we explore how transcript levels for every S. cerevisiae gene and open reading frame (ORF) respond when cellular molecules are damaged by a selection of environmentally and clinically relevant chemical and physical carcinogens. The global transcriptional response of this budding yeast to these damaging agents turns out to be far more extensive than anticipated. However, computational analysis of almost 200,000 data points reveals patterns in the data that allow us to define novel regulatory networks. We find that the responses of S. cerevisiae to each of six damaging agents are markedly different and that, for at least one agent, the response is dramatically affected by the cell's position in the cell cycle at the time of exposure. Computational clustering of the data and subsequent searching for common sequence motifs in promoter regions reveal nine shared sequence motifs, only five of which have known binding factors. Furthermore, we find that a large number of protein degradation genes and a selection of base excision and nucleotide excision DNA repair genes are linked in a transcriptional regulon controlled by Rpn4p, a proteasome-associated protein (22).
Log-phase S. cerevisiae strain DBY747 (MATa his3-Δ1 leu2-3,112 ura3-52 trp1-289a gal2 can1 CUP1s) or BY4740 and its Δrpn4 derivative (MATa ura3Δ0 lys2Δ0 leu2Δ0 Δrpn4) was grown to a density of 5 × 10⁶ cells per ml in 1% yeast extract–2% peptone–2% glucose. Cultures were split in two, and methyl methanesulfonate (MMS) was added to 0.1% to one half. Incubation was continued for 10, 30, or 60 min. For one experiment, MMS was added to 0.05, 0.1, or 0.2%, and cells were incubated for 1 h. For the other agents, log-phase cells were grown to a density of 5 × 10⁶ cells per ml in 1% yeast extract–2% peptone–2% glucose. Cultures were split, and N-methyl-N′-nitro-N-nitrosoguanidine (MNNG) (6.7 or 27 μg/ml), mitomycin C (MMC) (2 μg/ml), 4-nitroquinoline N-oxide (4NQO) (2 or 8 μg/ml), 1,3-bis(2-chloroethyl)-1-nitrosourea (BCNU) (200 μM), or tert-butyl hydroperoxide (t-BuOOH) (5 mM) was added, and incubation was continued for 60 min. For cell synchrony, log-phase cells were arrested in G1 by α-factor (3 μM for 120 min), in S phase by hydroxyurea (0.1 M for 210 min), and in G2 by nocodazole (15 μg/ml for 120 min). Stationary-phase cells were harvested after 3 days of growth (5 × 10⁸ cells/ml). Arrest was confirmed by microscopy and by fluorescence-activated cell sorting analysis (data not shown); arrested cultures were then split in two, and 0.1% MMS was added to one half for 1 h before RNA isolation.
RNA isolation and purification and cRNA labeling were done as described (17). Hybridizations with a set of four oligonucleotide arrays (GeneChip Ye6100 arrays; Affymetrix, Santa Clara, Calif.) containing probes for 6,218 yeast ORFs were done at 45°C for 16 h with constant mixing in 200 μl of MES buffer (100 mM MES [morpholineethanesulfonic acid], 1 M Na+, 20 mM EDTA, 0.01% Tween 20) with 10 μg of labeled cRNA. After hybridization, arrays were washed in nonstringent wash A buffer (6× SSPE [1× SSPE is 0.18 M NaCl, 10 mM NaH2PO4, and 1 mM EDTA (pH 7.7)], 0.01% Tween 20, 0.005% antifoam, 25°C) followed by stringent wash B buffer (100 mM MES, 0.1 M Na+, 0.01% Tween 20, 50°C). Arrays were then stained with streptavidin-phycoerythrin (30 min, 25°C; Molecular Probes), followed by rinsing with wash A buffer. Arrays were stained with R-phycoerythrin-streptavidin (10 μg/ml; Molecular Probes) in 100 mM MES–1.0 M Na+–0.01% Tween 20 at 25°C. All washes were automated on a fluidics station (Affymetrix). Arrays were scanned using a specialized confocal laser scanning microscope (Hewlett-Packard or Molecular Dynamics) and analyzed using the GeneChip analysis suite, version 3.1. All arrays were scaled so that the mean of the average intensity differences (perfect match minus mismatch probe signal) was 300. This scaling allowed all the arrays to be directly compared with each other. Sample integrity was assessed by measuring the intensity of probes derived from both the 3′ and 5′ ends of the actin and TATA-binding protein transcripts. In no sample was the 3′-end probe signal more than twofold the 5′-end probe signal. These measurements suggest that the mRNA is not more degraded in the treated samples than in the controls.
Three control untreated log-phase samples were hybridized to three different sets of GeneChip arrays. A baseline value was determined by averaging the hybridization intensity from the three control experiments. Each gene has approximately 20 pairs of oligonucleotide probes; within each pair, one is a perfect match to the gene and one has a mismatch. Hybridization intensity was determined by calculating the average intensity difference between the perfect match signal and the mismatch signal across the 20 pairs of probes. Fold changes were calculated by dividing the average intensity difference values from experimental samples by the baseline values. Note that in our original report we arbitrarily chose fourfold as the cutoff for induction and threefold as the cutoff for repression (17). Given an improved algorithm for calculating hybridization intensities, we now adopt threefold as the cutoff for both induction and repression. Accordingly, the number of genes categorized as responsive in this study has increased compared to our previous study.
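The baseline and fold-change calculation described above can be sketched in a few lines. This is a minimal illustration, assuming that per-gene average-difference values are already available; the function names, the gene labels, the numbers, and the choice to skip genes with a non-positive baseline are all hypothetical and are not part of the GeneChip software.

```python
from statistics import mean

def fold_changes(controls, treated):
    """Fold change of each gene relative to the mean of the untreated replicates.

    controls: dict mapping gene -> list of average-difference values (untreated arrays)
    treated:  dict mapping gene -> average-difference value from one treated array
    """
    result = {}
    for gene, values in controls.items():
        baseline = mean(values)  # baseline = average of the control replicates
        if baseline <= 0:
            # Assumption: a ratio is undefined for a non-positive baseline, so skip it.
            continue
        result[gene] = treated[gene] / baseline
    return result

def classify(fold, cutoff=3.0):
    """Apply the same threefold cutoff for induction and for repression."""
    if fold >= cutoff:
        return "induced"
    if fold <= 1.0 / cutoff:
        return "repressed"
    return "unchanged"

# Hypothetical numbers for illustration only.
controls = {"geneA": [100, 120, 110], "geneB": [400, 380, 420]}
treated = {"geneA": 660, "geneB": 100}
folds = fold_changes(controls, treated)
calls = {gene: classify(f) for gene, f in folds.items()}
```

Here geneA is 6-fold above its baseline and is called induced, while geneB falls to one quarter of its baseline and is called repressed.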
All genes showing a change of 3.0-fold or more in at least one experimental condition were included in the analysis. Three control untreated experiments were performed. The baseline intensity value was calculated as the average of the three average difference values. For each treatment, the fold change was determined by dividing the average intensity difference from the experimental sample by the baseline intensity. The average difference was determined using GeneChip analysis suite 3.1 software (Affymetrix). Although some genes have less than the ideal number of probes, they were still included in the analysis. Cluster analysis was performed using one of three methods: Euclidean distances (34), hierarchical (10), or self-organizing maps (SOMs) (33). For Euclidean distance measurements, log-transformed fold changes were arbitrarily clustered into groups of genes having similar expression profiles. Hierarchical clustering was done as described (10). The third method was based on SOMs using Genecluster 1.0 (33). A filter was used to eliminate genes that changed by less than 3.0-fold and whose expression levels were less than 60 across all treatments. Expression levels were normalized to have a mean of 0 and a variance of 1, which forced genes to be grouped based on the shape of their expression pattern rather than on their absolute values (33). The number of clusters was chosen to give the largest number of fundamentally different patterns.
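The pre-SOM filter and normalization can be sketched as follows. The filter sentence admits more than one reading; this sketch assumes that a gene is kept only if it changes at least threefold (up or down) in some condition and reaches an expression level of 60 in some condition. The function names and example profiles are hypothetical, and Genecluster's own implementation may differ.

```python
from statistics import mean, pstdev

def passes_filter(folds, levels, min_fold=3.0, min_level=60.0):
    """Keep a gene only if it changes >= min_fold (up or down) in some condition
    and reaches min_level expression in some condition (one reading of the filter)."""
    changed = any(f >= min_fold or f <= 1.0 / min_fold for f in folds)
    expressed = any(v >= min_level for v in levels)
    return changed and expressed

def standardize(profile):
    """Rescale a profile to mean 0 and variance 1 so that only its shape matters."""
    mu = mean(profile)
    sd = pstdev(profile)  # population SD, so the rescaled profile has variance 1
    if sd == 0:
        return [0.0] * len(profile)  # a flat profile carries no shape information
    return [(v - mu) / sd for v in profile]

# Two hypothetical genes with the same shape but a 10-fold difference in magnitude.
small = standardize([1.0, 2.0, 3.0])
large = standardize([10.0, 20.0, 30.0])
```

After standardization the two profiles are identical, which is why clustering on normalized values groups genes by the shape of their response rather than its size.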
On the arrays, many genes are represented by more than one set of probes. In order to accurately determine the distribution of functional categories present, only one set of probes was used for each gene. When more than one set was present, sets containing less than the ideal number of probes were eliminated. In other cases the probe set with the higher signal was included.
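The probe-set selection rule can be expressed as a small function. This is a sketch under stated assumptions: the "ideal" number of probe pairs is taken to be 20 (from the approximately 20 pairs per gene mentioned above), the `ProbeSet` record is hypothetical, and falling back to all sets when none is complete is our own choice.

```python
from typing import NamedTuple

class ProbeSet(NamedTuple):
    name: str       # hypothetical probe-set identifier
    n_probes: int   # number of probe pairs in the set
    signal: float   # average-difference signal

def choose_probe_set(sets, ideal=20):
    """Pick one probe set per gene, following the selection rules described above."""
    if len(sets) == 1:
        return sets[0]
    # Drop sets with fewer than the ideal number of probe pairs,
    # unless that would leave nothing (assumed fallback).
    complete = [s for s in sets if s.n_probes >= ideal]
    candidates = complete if complete else sets
    # Among the remaining sets, keep the one with the higher signal.
    return max(candidates, key=lambda s: s.signal)

picked = choose_probe_set([ProbeSet("set1", 16, 900.0), ProbeSet("set2", 20, 300.0)])
```

In this example the complete set is kept even though the incomplete one has a higher signal, because the probe-count rule is applied first.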
To determine whether fold changes were statistically significant between the triplicate experiments, a two-tailed Student t test was performed in Microsoft Excel. To determine whether SOMs produced distinct clusters, correlation coefficients were calculated between the mean profiles of each pair of clusters. Correlation coefficients ρ(x,y) were determined using Microsoft Excel with the formula

$$\rho(x,y) = \frac{1}{26}\sum_{i=1}^{26}\frac{(x_i-\mu_x)(y_i-\mu_y)}{\sigma_x\,\sigma_y}$$

where x and y represent any pair of clusters, x_i or y_i equals the value of treatment i in cluster x or y, μx and μy equal the means of the normalized intensities across the 26 treatments in clusters x and y, and σx and σy equal the standard deviations of the normalized intensities across the 26 treatments in clusters x and y, respectively.
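The cluster-similarity check can be sketched as follows. This is an illustration, not the authors' spreadsheet: it computes the correlation coefficient from population means and standard deviations (the 1/n form of the formula above, with n = 26 in the study) and reports every pair of clusters above a threshold of 0.75, the cutoff used below. The cluster numbers and mean profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Correlation of two profiles using population means and standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n * sx * sy)

def similar_pairs(cluster_means, threshold=0.75):
    """Return (cluster, cluster, r) for every pair whose mean profiles correlate above threshold."""
    pairs = []
    for (i, x), (j, y) in combinations(cluster_means.items(), 2):
        r = pearson(x, y)
        if r > threshold:
            pairs.append((i, j, round(r, 2)))
    return pairs

# Hypothetical mean profiles over three conditions.
means = {1: [1.0, 2.0, 3.0], 2: [2.0, 4.0, 7.0], 3: [3.0, 2.0, 1.0]}
pairs = similar_pairs(means)
```

Only clusters 1 and 2 exceed the threshold here; cluster 3 is anticorrelated with both.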
Only four pairs of clusters showed significant similarity: clusters 1 and 4 (0.86), clusters 14 and 17 (0.83), clusters 17 and 15 (0.78), and clusters 9 and 12 (0.79). The hypergeometric distribution [P(x)] was used to determine the chance probability of observing the number of genes of a particular functional category, or with a particular upstream motif, within each cluster, as described (34); calculations used the formula

$$P(x) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}$$

where x is the number of genes in a functional category or with a motif in a defined cluster, n is the total number of genes in the cluster, M is the total number of genes in a functional category or with a particular motif in the dataset, and N is the total number of genes in the dataset.
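The hypergeometric probability can be computed exactly with binomial coefficients. The sketch below uses the same symbols as the definitions above (x, n, M, N). It also includes the upper-tail sum P(X ≥ x), which is the form commonly used to score over-representation; which of the two forms underlies each reported P value is an assumption on our part.

```python
from math import comb

def hypergeom_pmf(x, n, M, N):
    """P(x): probability that exactly x of the n genes in a cluster belong to a
    category (or carry a motif) shared by M of the N genes in the dataset."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def hypergeom_upper_tail(x, n, M, N):
    """P(X >= x): probability of seeing x or more such genes in the cluster by chance."""
    return sum(hypergeom_pmf(k, n, M, N) for k in range(x, min(n, M) + 1))
```

For example, `hypergeom_pmf(1, 2, 2, 4)` gives 2/3. Probabilities as small as those reported below (P < 10⁻³⁰⁰) sit at the edge of double precision, so exact arithmetic with `fractions.Fraction` or log-space summation is safer at that scale.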
We previously used Affymetrix GeneChip oligonucleotide arrays to characterize the global transcriptional response of S. cerevisiae upon exposure to a mildly toxic dose of a monofunctional SN2 alkylating agent, MMS (17). MMS is typical of a large class of reactive chemicals present in the air we breathe and the food we eat, as well as being representative of some normal cellular metabolites (23). To our surprise, transcript levels for roughly 400 of S. cerevisiae's ~6,200 genes were responsive to MMS exposure; ~300 genes were induced by 4- to 250-fold, and ~100 genes were repressed by 3- to 18-fold (17). Before undertaking a much larger study, we assessed the reproducibility of the transcriptional responsiveness measured by GeneChip analysis. Six separate S. cerevisiae cultures were grown to mid-log phase; three were exposed to 0.1% MMS for 60 min, and three remained untreated. cRNA prepared from each culture was hybridized to the GeneChip arrays as described above (17, 38). Figure 1a displays the range of hybridization intensities obtained for a representative selection of 69 MMS-inducible and 31 MMS-repressible genes; note that in virtually no instance do the error bars (representing standard deviations) for treated and untreated cultures come close to overlapping. Figures 1b and c display individual hybridization intensities for the 693 genes whose transcripts changed by threefold or more upon MMS treatment (the identities of these genes and the raw data can be found at www.hsph.harvard.edu/geneexpression). Individual values for the three untreated cultures are plotted against their averages in Fig. 1b, and individual values for the three MMS-treated cultures are plotted against their averages in Fig. 1c; this provides a graphic representation of the variation between experiments, viewing the untreated and treated groups separately.
Figure 1d displays the average hybridization intensities for the 693 responsive genes, but here the average values for the MMS-treated cultures are plotted against those for the untreated cultures to provide a graphic representation of where the 693 responsive genes fall in the 3- to 217-fold range of MMS responsiveness. The data in Fig. 1 confirm that, in our hands, transcript levels measured by GeneChip analysis are highly reproducible and show that transcript levels observed in MMS-treated cells are likewise reproducible. Furthermore, the changes in transcript level are also reproducible, and 648 of the 693 responsive genes (94%) showed a statistically significant change at a 90% confidence level. We are therefore confident that our estimate of such a surprisingly large number of MMS-responsive genes is reliable.
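The per-gene significance call for three treated versus three untreated cultures can be sketched as a two-sample Student t test. This assumes the equal-variance (pooled) form, which is what "Student t test" usually denotes; the Excel test type actually used is not stated. Rather than computing a p-value, the sketch compares |t| with the two-tailed critical value for α = 0.10 and 4 degrees of freedom (2.132, from standard t tables). The function names and replicate values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Two-tailed critical value of Student's t for alpha = 0.10 and
# 4 degrees of freedom (3 + 3 - 2), taken from standard t tables.
T_CRIT_DF4_ALPHA10 = 2.132

def student_t(a, b):
    """Two-sample Student t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))

def significant_at_90(treated, untreated):
    """True if the difference is significant at the 90% confidence level (3 vs 3 only)."""
    if len(treated) != 3 or len(untreated) != 3:
        raise ValueError("critical value assumes three replicates per group")
    return abs(student_t(treated, untreated)) > T_CRIT_DF4_ALPHA10
```

For instance, hypothetical treated intensities of 10, 11, and 12 against untreated values of 1, 2, and 3 give t ≈ 11, well past the cutoff, whereas largely overlapping replicates do not.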
Having established that mRNA profiling using the Affymetrix oligonucleotide chips was reproducible, we adopted an experimental strategy first suggested by Eisen et al. (10): it is much more informative to establish mRNA profiles for a wide variety of conditions than to make repeat observations under identical conditions. We therefore committed our available resources to monitoring changes in transcript levels induced by numerous different MMS exposures and by roughly equitoxic exposures to numerous carcinogens. Note that using a 90% confidence level to determine significance captures more responsive genes at the cost of admitting more false-positive results. We chose to be more inclusive with our data, relying on the clustering algorithms to identify interesting patterns that would be unaffected by a few false-positive results.
The collection of MMS-responsive S. cerevisiae genes observed after 60 min of exposure to 0.1% MMS (17) (Fig. 1) represents a simple snapshot of the transcriptional response of this eukaryote to alkylation damage. This raises the possibility that if one could monitor transcriptional responses as a continuum, even more genes might be counted as MMS responsive. In an attempt to gain insight into this continuum, we monitored transcriptionally responsive genes as a function of time in 0.1% MMS and as a function of MMS dose. Figures 2a and b provide a diagrammatic representation of the results (the identities of the genes and the raw data can be found at www.hsph.harvard.edu/geneexpression). Represented in Fig. 2a and b are genes whose transcript levels either increased (green) or decreased (red) by threefold or more for at least one of the treatments. In addition, the responsive genes are clustered into groups that show similar kinetics using the hierarchical clustering program developed by Eisen et al. (10). It is immediately apparent that many more than 400 genes are MMS responsive: the total numbers of genes represented in Fig. 2a and b are 969 and 1,863, respectively. The set of transcriptionally responsive genes is quite different at early versus late times and at low versus high alkylation levels. For the temporal response, there appear to be distinct groups of early-, middle-, and late-responsive genes (clusters IV, III, and II, respectively, in Fig. 2a). In addition, the response of several sets of genes appears to be transient, in that their responsiveness is seen at only one or two time points (e.g., clusters I, V, and VI). For the dose response (Fig. 2b), it is clear that with increasing alkylation levels, the number of responsive genes as well as the degree of responsiveness increases in a cumulative way.
At the highest dose, we monitored 1,426 responsive genes, with 999 upregulated and 427 downregulated; this represents over 20% of the S. cerevisiae genome. Why such a large fraction of the yeast genome should be MMS responsive is discussed below.
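Hierarchical clustering of time-course profiles, in the spirit of the Eisen et al. (10) program mentioned above, can be sketched as agglomerative average-linkage clustering with a correlation-based distance. This is a simplified stand-in: it uses centered Pearson correlation and stops at a fixed number of clusters, whereas the published software offers other correlation variants and builds a full dendrogram. The gene names and log fold changes are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Centered Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def average_linkage(profiles, k):
    """Agglomerative clustering: repeatedly merge the two clusters with the smallest
    mean pairwise distance (1 - r) until k clusters remain."""
    clusters = [[name] for name in profiles]

    def distance(c1, c2):
        total = sum(1 - pearson(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid
    return clusters

# Hypothetical log fold changes at four time points.
profiles = {
    "early1": [2.0, 1.0, 0.0, 0.0],
    "early2": [3.0, 1.5, 0.2, 0.0],
    "late1": [0.0, 0.0, 1.0, 2.0],
    "late2": [0.0, 0.1, 1.2, 3.0],
}
groups = average_linkage(profiles, k=2)
```

The early-responding and late-responding genes end up in separate groups even though their peak magnitudes differ, because the distance depends on correlation rather than absolute fold change.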
As eukaryotic cells move through the cell cycle, specific sets of genes are transcriptionally activated and inactivated, although transcript levels for the vast majority of genes do not change (4, 32). Moreover, responses to DNA-damaging agents are known to vary throughout the cell cycle; e.g., G1 cells that experience DNA damage activate a G1/S checkpoint, those in S phase activate an S-phase delay, and those in G2 or M activate a G2/M checkpoint (16, 21, 30, 37). For these reasons we monitored how S. cerevisiae responds to MMS-induced alkylation damage as a function of cell cycle stage. (Note that in our initial dataset very few of the MMS-responsive genes turned out to be cell cycle-regulated genes.) Cells arrested in G1, S, G2, or stationary phase were exposed to 0.1% MMS for 60 min; the MMS-induced transcriptional profiles for each synchronized population are diagrammed in Fig. 3 and presented numerically at www.hsph.harvard.edu/geneexpression. Cell cycle stage had a profound effect on the MMS-induced transcriptional profiles. Numerous genes appear to be cell cycle specific in that they were scored as only weakly responsive or nonresponsive in MMS-treated log-phase cultures but as clearly responsive in a synchronized culture. Among them, 199 genes appear responsive only if cells experience damage in G1 (clusters II, V, and VI); 84 genes are only responsive in S phase (clusters IV and VII); 94 are only responsive in G2 (clusters III and IX); and 229 are only responsive in stationary phase (clusters I, VIII, and X). Fewer than 20% of these 614 genes were previously shown to have cell cycle-dependent expression (4, 32).
It turns out that a large fraction of the genes that are responsive to MMS in log-phase cycling cells are also responsive to simply being held in stationary phase, independent of MMS exposure. This is shown clearly in Fig. 3b, where the transcript level changes for MMS-treated log-phase cells and those for stationary-phase versus log-phase cells are reclustered and shown alongside each other. The MMS exposure (0.1% for 60 min) to some extent appears to mimic the arrest of cells in stationary phase, at least in terms of transcriptional profile. At first it seemed that fewer genes are MMS responsive in stationary-phase cells than in cells in other parts of the cell cycle (Fig. 3a and b). However, this may be explained by the fact that 335 transcripts that ordinarily respond to MMS are already up- or downregulated in stationary-phase cells prior to MMS exposure (Fig. 3b, clusters I and II, respectively) and respond no further upon alkylation exposure. There thus appears to be an overlap between the genes responsive to two different stressful conditions, MMS exposure and stationary growth. This may reflect a general stress response pathway, although we do not yet know whether these are a primary, secondary, or tertiary response to stress.
One of our major goals is to understand exactly how cells respond to a range of carcinogenic alkylating agents representative of those present in our environment and those used in the cancer clinic. We therefore set out to compare the transcriptional response of S. cerevisiae to various candidate alkylating agents, including the SN2 alkylating agent MMS, the SN1 alkylating agent MNNG, and the chemotherapeutic alkylating agent BCNU. In addition, we wished to determine which aspects of the responses are alkylating agent specific, and so for comparison we determined the transcriptional profile of cells exposed to three other types of damaging agent: γ-irradiation, 4NQO, and the oxidizing agent t-BuOOH (12). Cells were exposed to roughly equitoxic doses of each agent, as measured by colony formation, and the resulting profiles are shown in Fig. 4 (and numerically at www.hsph.harvard.edu/geneexpression). Doses were relatively nontoxic, resulting in 75 to 100% survival. Very few genes were responsive to all of the agents; indeed, among the hundreds of responsive genes, the transcript levels for only 12 turned out to be induced by all treatments, and transcripts for only 9 were repressed by all treatments. These 21 genes do not include any DNA repair genes, and since they were determined independently of clustering, they are detailed at http://www.hsph.harvard.edu/geneexpression for easy access. Furthermore, there were surprisingly extensive differences between the transcriptional profiles induced by each of the six damaging agents. Even the closely related methylating agents MMS and MNNG induce quite distinct transcriptional profiles at roughly equitoxic doses (Fig. 4, lane 4 versus 6). At a higher MNNG dose (24% survival), the profile begins to overlap more with the MMS-induced profile, although each profile still remains quite distinct (lane 4 versus 7).
It is also notable that the profiles produced by equitoxic exposure to γ-rays and the oxidizing agent t-BuOOH are dramatically different. Since it has been estimated that following ionizing radiation, ~65% of the damage to DNA occurs by base oxidation and only ~35% occurs directly by ionization (12, 36), one might have expected more overlap.
In addition, there appear to be groups of genes that are specifically responsive to each damaging agent (clusters I to VI), and these may turn out to represent unique signatures for each agent. It should be noted once again that these profiles represent snapshots of transcriptional responses, and upon further kinetic analysis, genes that appear to be agent specific in Fig. 4 may turn up in response to the other agents. In fact, only 30% of the genes that appear to be agent specific in Fig. 4 (excluding the MMS-specific cluster IV) can be found among the numerous MMS-induced profiles described in this study. However, a more extensive kinetic analysis will be needed to establish whether certain responsive genes are truly specific for a particular agent or class of agent. Finally, it is clear from Fig. 4 that the number of genes responding to a damaging agent is not a good predictor of toxicity. For example, the most toxic treatment (4NQO, producing 10% survival) influenced the expression of far fewer genes than did the least toxic treatment (BCNU, producing 100% survival).
Taken as a whole, this study shows that damaging cells with physical and chemical carcinogens elicits significant changes in transcript level for more than 2,500 of S. cerevisiae's ~6,200 genes. These transcriptionally responsive genes can be grouped by functional category (as defined by the S. cerevisiae genome database) and are summarized in Table A at http://www.hsph.harvard.edu/geneexpression. The number of induced genes is listed in green, the number of repressed genes is listed in red, and each number is linked to its corresponding list of genes and to a numerical representation of transcript levels and fold induction values. By far the largest category of responsive genes is genes of unknown function, and the next largest categories include those for protein and mRNA metabolism. Surprisingly, DNA repair, DNA replication, and cell cycle progression genes are only modestly represented in the dataset.
A powerful computational method for seeking meaningful patterns in large datasets can now be applied to transcriptional profiling data (33). The organization of data into SOMs places genes into clusters that behave similarly across multiple conditions. Using this algorithm, we organized 26 transcriptional profiles into 18 such SOMs (Fig. 5A); the 26 conditions are listed in the figure legend, and at the website each box links to a list of the genes in that particular SOM. Note that 3,600 genes were eliminated from the dataset because either their transcript levels did not change significantly under any of the 26 treatments or they were expressed at very low levels; the remaining 2,610 genes are apportioned to the 18 SOMs. For this analysis, transcript levels are compared across all 26 conditions, and clusters are created based on whether the transcript levels go up or down; the analysis does not weigh the actual fold differences in transcript levels, but instead notes the trend. Put another way, genes whose transcript levels change by up to 10-fold in any one of the 26 conditions may be clustered with those that change up to 100-fold, provided the up and down trend is similar across all 26 conditions. Such computational organization of transcriptionally responsive genes is designed to cluster together genes that respond to some of the same signal transduction events, and thus genes whose expression may be controlled by the same regulatory proteins. In other words, it is hoped that such clustering will identify individual regulons and their regulators.
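The core of the SOM procedure can be illustrated with a minimal map trained on normalized profiles. This is a generic textbook SOM, not Genecluster 1.0: the initialization from random profiles, the linearly decaying learning rate, the Gaussian grid neighborhood, and all parameter values are assumptions chosen for clarity. Each node holds a reference profile; for every gene the closest node wins, and the winner and its grid neighbors are pulled toward that gene.

```python
import math
import random

def best_node(weights, x):
    """Index of the node whose reference vector is closest (squared Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda k: sum((w - xi) ** 2 for w, xi in zip(weights[k], x)))

def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols self-organizing map on equal-length (normalized) profiles."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialize every node's reference vector with a randomly chosen profile.
    weights = [list(rng.choice(data)) for _ in grid]
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total
            lr = lr0 * (1 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1 - frac), 0.5)  # neighborhood radius shrinks
            br, bc = grid[best_node(weights, x)]
            for k, (r, c) in enumerate(grid):
                # Gaussian neighborhood on the grid: nodes near the winner move more.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                weights[k] = [w + lr * h * (xi - w) for w, xi in zip(weights[k], x)]
            step += 1
    return weights

# Hypothetical normalized profiles: five "up" genes and five "down" genes.
up, down = [-1.0, 0.0, 1.0], [1.0, 0.0, -1.0]
weights = train_som([up] * 5 + [down] * 5, rows=1, cols=2, epochs=50)
clusters = [best_node(weights, p) for p in (up, down)]
```

After training, each gene is assigned to its best-matching node, and genes sharing a node form one cluster. A grid such as `rows=3, cols=6` would yield 18 nodes, although the grid geometry used for the 18 SOMs is not stated here.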
Figure 5B presents a diagrammatic representation of the 18 SOMs, but this time the fold changes in transcript levels are presented as they relate to the average transcript levels observed in the three untreated log-phase cultures (see Fig. 1). The individual transcriptional profiles are also arranged so that those that are most similar to each other lie next to each other, and the extent of their relatedness is indicated by the dendrogram above the figure. Accordingly, almost all of the MMS-induced profiles lie together, and the untreated stationary-phase profile also sorts with this group, as mentioned above (Fig. 3b). Note that the profiles induced by the two agents that, like MMS, are strong inducers of protein degradation and amino acid metabolism genes (BCNU and t-BuOOH; see Table A at www.hsph.harvard.edu/geneexpression) sort next to the MMS-induced profiles. Again, it is surprising that the profile induced upon exposure to the oxidizing agent t-BuOOH sorts so far away from that induced by an equitoxic γ-radiation dose, given that γ-irradiation-induced toxicity is thought to derive in large part from a flux of oxidative damage (12, 36).
The SOMs produced a distinct organization of genes. Of the 162 possible pairwise comparisons of the patterns within each cluster (Fig. 5A), only four showed significant similarity (with a correlation coefficient greater than 0.75), while the vast majority did not. This indicates that for the most part, this analysis divided the responsive genes into distinctive groups. For some of the SOMs derived from this diverse array of damage-inducible profiles, it is quite apparent that genes encoding functionally related proteins become grouped together. As examples, SOM1 (Fig. 5A and B) contains 130 of the 212 responsive protein synthesis genes; SOM3 contains 47 of the 62 responsive genes involved in energy metabolism; SOM5 contains 13 of the 19 responsive genes involved in mating; and SOM13 contains 29 of the 96 responsive amino acid metabolism genes. Such grouping of functionally related genes agrees well with the results of Eisen et al. (10), who first proposed that clustering the combined data from transcriptional profiles generated by a large number of treatments would allow genes to be sorted into functional groups.
Several S. cerevisiae DNA repair and DNA metabolism genes have long been known to be induced upon MMS exposure, among them the MAG1 3-methyladenine DNA glycosylase gene, known for its important role in base excision repair and in alkylation resistance (2, 3, 31, 39). Indeed, it was the fact that MMS-responsive genes like MAG1 are important for protecting cells against carcinogenic alkylating agents that prompted us to seek the identity of all MMS-responsive S. cerevisiae genes, on the premise that some of these genes may also be important for alkylation resistance. We were therefore particularly interested to determine which genes cluster with MAG1 across the 26 conditions shown in Fig. 5. MAG1 turns out to cluster with 213 other genes in cluster 14, as highlighted in Fig. 5A and detailed in Table 1 (and presented numerically at the website). To our surprise, the largest category of known genes clustering with MAG1 was protein degradation, and only four other DNA repair genes were present in the cluster. In our initial report (17), we noted that a large fraction of protein degradation genes were induced by MMS, along with an equally high fraction of amino acid metabolism genes. We inferred from these data that MMS exposure might signal the induction of a program to eliminate and replace alkylated proteins. Here, SOM analysis indicates a correlation between the regulation of MAG1 and nearly 50% of the responsive protein degradation genes (most amino acid metabolism genes cluster elsewhere). This observation led us to search for common regulatory motifs upstream of MAG1, upstream of the protein degradation genes, and upstream of the other genes in cluster 14.
Several years ago we identified an upstream repressor sequence (URS), called URS2, in the MAG1 promoter region with the sequence GGTGGCGA (31, 39). Using the AlignACE and ScanACE programs developed by Roth et al. (25), we now find that sequence motifs similar to the MAG1 URS2 can be found upstream of 56 of the 214 genes in cluster 14 and that 33 of these 56 genes are related to protein degradation and ubiquitination. In total, 68 responsive genes are involved in protein degradation, and almost 50% (33) are found in this cluster. To show the significance of this finding, Fig. 5C displays the distribution of genes with MAG1 URS2-like elements among the 18 SOMs; this element is clearly overrepresented (P < 10⁻³⁰⁰) in cluster 14, which contains MAG1 and the protein degradation genes.
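Searching upstream regions for URS2-like elements can be illustrated with a simple scanner that reports every window within a given number of mismatches of GGTGGCGA on either strand. This is only a stand-in for AlignACE and ScanACE, which, to our understanding, score candidate sites against a motif model learned from aligned sites rather than counting mismatches; the function names, the one-mismatch default, and the example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_motif(seq, motif="GGTGGCGA", max_mismatches=1):
    """Return (position, strand, site) for every window within max_mismatches of motif."""
    seq = seq.upper()
    k = len(motif)
    targets = (("+", motif), ("-", revcomp(motif)))  # scan both strands
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        for strand, target in targets:
            mismatches = sum(a != b for a, b in zip(window, target))
            if mismatches <= max_mismatches:
                hits.append((i, strand, window))
    return hits
```

For example, `find_motif("AAGGTGGCGAAA", max_mismatches=0)` reports a single forward-strand site at position 2, and the reverse complement TCGCCACC is reported as a minus-strand site.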
We and others have pointed out that sequence motifs similar to the MAG1 URS2 element are found upstream of over a dozen DNA repair and metabolism genes (27, 31, 39). These elements have been referred to as damage repair consensus elements (27), and many but not all genes bearing such elements are damage responsive. More recently, a similar putative regulatory sequence was identified for numerous genes encoding proteins involved in ubiquitin-mediated protein degradation; this was named the proteasome-associated control element (PACE) (22). It is now clear that damage repair consensus elements can be separated into two different sequence motif groups, one of which is indistinguishable from the PACE sequence motif group and includes the MAG1 URS2 element (P. Estep and G. Church, unpublished data). A protein that binds specifically to the PACE sequence motif was identified by one-hybrid analysis as Rpn4 (22), a protein thought to be associated with proteasomes (13, 14). It appears that Rpn4p binds the PACE sequence to serve as a transcriptional activator (22).
Here we characterize an rpn4 deletion strain for its ability to induce MAG1 transcript levels in response to MMS. Figure 6a shows the dramatic loss of MAG1 MMS inducibility in the rpn4 deletion strain, and as a result the rpn4 deletion strain turns out to be MMS sensitive, although not as sensitive as a mag1 deletion strain (Fig. 6b). That the MAG1 URS2 element behaves as a repressor binding site (31) does not necessarily exclude Rpn4p's behaving as an activator at this site; our current model predicts that Rpn4p and a putative repressor compete for binding at the GGTGGCGA sequence. We also monitored the MMS inducibility of two other genes that contain the MAG1 URS2 sequence motif. The loss of Rpn4p caused a dramatic loss of inducibility for the RAD23 nucleotide excision repair gene and attenuated the inducibility of the PRE2 proteasome subunit gene. Rpn4p thus influences the regulation of genes in at least three different pathways, namely, base excision repair, nucleotide excision repair, and protein degradation. Note that two other MMS-inducible genes, neither of which localized to cluster 14, are totally unaffected by the absence of the Rpn4 transcriptional activator (Fig. 6a).
We find that S. cerevisiae transcriptional profiles change dramatically in the absence of Rpn4p, both with and without MMS exposure. Lane 1 in Fig. 6c displays transcript level changes in the untreated rpn4 deletion strain compared to its untreated wild-type parent. A total of 350 genes are downregulated, suggesting that Rpn4p affects transcriptional activation, and an even larger group, 389 genes, is upregulated, suggesting that Rpn4p affects transcriptional repression. Lanes 2 and 3 in Fig. 6c depict MMS-responsive genes in wild-type and rpn4-deleted cells, respectively, treated with 0.1% MMS for 60 min. The two profiles differ extensively, and both the upregulation and the downregulation of transcripts are affected by the loss of Rpn4p. The data in Fig. 6c were organized into 12 SOM clusters, and the multiple effects of losing the Rpn4 regulatory protein can be summarized as follows. (i) A total of 230 genes that are not MMS responsive in wild-type cells become susceptible to repression by MMS (cluster 2) and 85 become susceptible to induction by MMS (cluster 5); (ii) 461 genes become refractory to MMS induction (clusters 3, 8, and 10); (iii) 333 genes become more sensitive to MMS repression (clusters 4 and 7) and 455 become more sensitive to MMS induction (clusters 9 and 12); and (iv) 660 genes show little difference in their response to MMS despite the fact that their basal-level expression changed in rpn4-deleted cells (clusters 1, 6, and 11).
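The self-organizing map (SOM) step can be illustrated with a minimal pure-Python sketch. This is not the SOM implementation used to produce Fig. 6c or Fig. 5A: the function names, the random initialization from existing profiles, and the linear learning-rate and neighborhood schedules are all assumptions. Each gene is treated as a vector of expression ratios across conditions (for Fig. 6c, plausibly its values in the three lanes), and a 3 x 4 grid would yield 12 clusters.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is closest (squared Euclidean) to x."""
    return min(range(len(nodes)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(nodes[j], x)))


def train_som(profiles, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a rows x cols self-organizing map on gene expression profiles.

    profiles: list of equal-length lists of floats, one per gene.
    Returns one weight vector per grid node, in row-major order.
    """
    rng = random.Random(seed)
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialize each node from a randomly chosen profile (copied, not aliased).
    nodes = [list(rng.choice(profiles)) for _ in coords]
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(profiles)
    step = 0
    for _ in range(epochs):
        order = list(range(len(profiles)))
        rng.shuffle(order)
        for idx in order:
            x = profiles[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighborhood radius shrinks
            br, bc = coords[best_matching_unit(nodes, x)]
            for node, (r, c) in zip(nodes, coords):
                # Gaussian neighborhood on the grid: nodes near the winner move more.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                for k in range(len(node)):
                    node[k] += lr * h * (x[k] - node[k])
            step += 1
    return nodes


def assign_clusters(nodes, profiles):
    """Map each profile to the index of its best-matching node (its cluster)."""
    return [best_matching_unit(nodes, x) for x in profiles]
```

Calling `train_som(profiles, rows=3, cols=4)` and then `assign_clusters(nodes, profiles)` returns one node index (0 to 11) per gene; genes sharing an index form a cluster. Because neighboring nodes are pulled toward the same inputs early in training, clusters with similar average profiles tend to sit next to each other on the grid.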
For the group of 213 genes that clustered with MAG1 (cluster 14, Fig. 5A), 56 had an upstream MAG1 URS2-like sequence; 44 of these 56 appear in the profiles shown in Fig. 6c because their expression was affected by Rpn4p. The relative expression of all 44 genes is shown in Fig. 6d, and the genes are grouped into three categories based on their distribution in Fig. 6c. Shown in black are 21 genes from cluster 3 (which contains MAG1), in blue are 12 genes from cluster 6, and in orange are the remaining 11 genes from four separate clusters. Two conclusions can be drawn: on average, the basal expression of these genes is lower in the rpn4 deletion strain than in the wild-type strain, and, on average, the absence of Rpn4p renders them less MMS inducible. Because these genes nonetheless fall into several distinct clusters, presumably the constellation of other transcription factors at each promoter determines how Rpn4p influences transcription.
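The two "on average" conclusions amount to comparing mean expression ratios across the three lanes of Fig. 6c. The sketch below assumes each gene's values are stored as log2 ratios in lane order (lane 1: untreated rpn4 deletion vs untreated wild type; lane 2: MMS-treated vs untreated wild type; lane 3: MMS-treated vs untreated rpn4 deletion). The log2 scale, the data layout, and the function name are assumptions, and any values passed in here are toy numbers, not data from this study.

```python
from statistics import mean


def summarize_rpn4_effect(log2_ratios):
    """Average per-lane log2 ratios over a gene set (hypothetical helper).

    log2_ratios maps gene name -> (lane1, lane2, lane3), where
      lane1 = untreated rpn4 deletion / untreated wild type  (basal effect)
      lane2 = MMS-treated wild type / untreated wild type     (wild-type inducibility)
      lane3 = MMS-treated rpn4 deletion / untreated rpn4 deletion (mutant inducibility)
    """
    rows = list(log2_ratios.values())
    basal = mean(r[0] for r in rows)
    wt_induction = mean(r[1] for r in rows)
    mutant_induction = mean(r[2] for r in rows)
    return {
        # Negative: lower basal expression without Rpn4p, on average.
        "mean_basal": basal,
        "mean_wt_induction": wt_induction,
        "mean_mutant_induction": mutant_induction,
        # Positive: genes are, on average, less MMS inducible without Rpn4p.
        "mean_loss_of_induction": wt_induction - mutant_induction,
    }
```

With toy input `{"GENE_A": (-1.0, 2.0, 0.5), "GENE_B": (-0.5, 1.0, 0.5), "GENE_C": (0.5, 1.5, 1.0)}`, the mean basal log2 ratio is about -0.33 and the mean loss of induction is about 0.83, the pattern the text describes for the 44 genes.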
Finally, two nucleotide excision repair genes, RAD23 and SSL2, display another intriguing link to the ubiquitin-mediated proteasome degradation pathway. First, RAD23 and SSL2 are coregulated with MAG1 and protein degradation genes. Second, Rad23p has an N-terminal ubiquitin-like domain that interacts with the 26S proteasome and a C-terminal domain that interacts with Rad4p (29). The Rad23p-Rad4p complex in turn interacts with TFIIH (which contains Ssl2p), a transcription initiation factor that is also required for nucleotide excision repair (15). Thus, just as RAD23, SSL2, and the proteasome genes are transcriptionally coregulated, their products are physically linked via protein-protein interactions. Moreover, recent in vitro and in vivo evidence demonstrates that such protein interactions are important for optimal nucleotide excision repair activity (26). Since the transcription of three DNA glycosylase genes (MAG1, NTG1, and NTG2) is also coregulated with proteasome genes, it is tempting to speculate that optimal base excision repair is connected to proteasome function in a similar way.
The promoter regions for the genes in each of the 18 clusters identified by SOMs in Fig. 5A were analyzed by the AlignACE program (25) for common sequence motifs, and Table 2 lists the consensus sequence for each motif with a MAP score of >10. The MAP score is an internal AlignACE metric of alignment significance: AlignACE searches unaligned sequences for conserved DNA motifs and scores each motif based on the alignment and on the frequency of its occurrences in intergenic regions. Motifs were considered significant if their MAP score was greater than 10 and if their distribution was significantly enriched in a particular cluster. Nine significant sequence motifs were identified, of which five are bound by known factors.
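AlignACE is built on Gibbs sampling. The minimal one-site-per-sequence Gibbs sampler below illustrates how a shared motif can be found in unaligned sequences; it is not AlignACE itself. It assumes a uniform background of 0.25 per base, a fixed motif width, and sequences containing only A, C, G, and T, and it neither computes AlignACE's MAP score nor searches the reverse strand. All function names are hypothetical.

```python
import random

BASES = "ACGT"


def site_profile(seqs, positions, width, skip, pseudocount=0.5):
    """Per-column base probabilities from the current sites, excluding sequence `skip`."""
    counts = [{b: pseudocount for b in BASES} for _ in range(width)]
    n_sites = 0
    for i, (seq, pos) in enumerate(zip(seqs, positions)):
        if i == skip:
            continue
        n_sites += 1
        for j in range(width):
            counts[j][seq[pos + j]] += 1
    total = n_sites + 4 * pseudocount
    return [{b: col[b] / total for b in BASES} for col in counts]


def gibbs_motif_search(seqs, width, iterations=500, seed=0):
    """One-site-per-sequence Gibbs site sampler with a uniform background.

    seqs: DNA strings over A, C, G, T; each must be at least `width` long.
    Returns the sampled start position of the motif site in each sequence.
    """
    rng = random.Random(seed)
    seqs = [s.upper() for s in seqs]
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        i = rng.randrange(len(seqs))  # sequence whose site is resampled
        profile = site_profile(seqs, positions, width, skip=i)
        seq = seqs[i]
        weights = []
        for start in range(len(seq) - width + 1):
            # Likelihood ratio of this window under the motif model vs. background.
            ratio = 1.0
            for j in range(width):
                ratio *= profile[j][seq[start + j]] / 0.25
            weights.append(ratio)
        # Draw a new start position in proportion to the likelihood ratios.
        positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions


def consensus(seqs, positions, width):
    """Most frequent base at each column of the aligned sites."""
    sites = [s.upper()[p:p + width] for s, p in zip(seqs, positions)]
    return "".join(max(BASES, key=lambda b: sum(site[j] == b for site in sites))
                   for j in range(width))
```

When a motif is planted in each input sequence, repeated sampling tends to concentrate on the planted sites; because a single run can settle on a suboptimal alignment, practical tools typically run several restarts and keep the best-scoring result.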
The five motifs with known binding factors are those bound by Rpn4p (discussed above), Rap1p (19), Hap2/3/4/5p (24), Abf1p (7), and Ste12p (40). Rap1p regulates ribosomal protein gene transcription (19), and accordingly Rap1p binding sites were found upstream of 45% of the genes in cluster 1 (Fig. 5A), where most of the ribosomal protein genes sort. Indeed, a consensus Rap1p binding sequence recently determined by a systematic search of the regions upstream of all ribosomal protein gene transcription start sites (18) turns out to be identical to the motif identified here by the blind AlignACE search of our dataset. The Hap2/3/4/5p binding complex is important for the transcription of many genes encoding mitochondrial proteins (20); 10% of the genes in cluster 3 contain a Hap2/3/4/5p binding site, and more than half of these turn out to be involved in mitochondrial functions, including ATP synthesis, oxidative phosphorylation, and respiration. Abf1p binds a sequence motif present at replication origins, at mating-type silencing sequences, and in the promoters of rRNA genes and of other genes involved in translation and glycolysis (6); cluster 4 contains a concentration of RNA metabolism and translation genes. Ste12p is a transcription factor for yeast mating genes and associated cell cycle regulation genes (40), and its sites are found in 16% of the cluster 5 genes; most of the mating genes sort to cluster 5. The factors that bind the remaining four motifs (if any) remain to be identified.
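Percentages such as the 45% of cluster 1 genes carrying Rap1p sites are informative only relative to how often the motif occurs across all genes. The source does not state which statistical test was used to judge cluster enrichment; the sketch below assumes a one-sided hypergeometric test, a common choice for this question, and the function name is hypothetical.

```python
import math


def motif_enrichment_p(total_genes, genes_with_motif, cluster_size, cluster_hits):
    """One-sided hypergeometric P-value for seeing >= cluster_hits motif genes in a cluster.

    total_genes:      genes in the background set (N)
    genes_with_motif: background genes whose upstream region has the motif (K)
    cluster_size:     genes in the cluster (n)
    cluster_hits:     cluster genes whose upstream region has the motif (k >= 0)
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    N, K, n, k = total_genes, genes_with_motif, cluster_size, cluster_hits
    upper = min(K, n)
    # math.comb returns 0 when choosing more items than exist, so
    # impossible configurations contribute nothing to the tail sum.
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / math.comb(N, n)
```

For a toy background of 10 genes, 5 of which carry the motif, a 5-gene cluster containing all 5 motif genes gives P = 1/252 (about 0.004), whereas 3 of 5 gives P = 0.5, i.e., no evidence of enrichment.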
Exploring transcriptional profiles is inherently descriptive. For S. cerevisiae, most of the transcriptional profiling carried out to date describes changes that occur upon specific alterations in growth conditions or upon specific alterations in genotype (4, 5, 9, 17, 18, 32, 35). Here we describe changes in transcriptional profiles that take place when cells are exposed to a reactive chemical or physical agent, such that virtually every molecule in the cell is at risk of being altered in some way. In retrospect, we should perhaps not have been surprised by the fact that over one third of S. cerevisiae's entire gene repertoire can respond to the deluge of damage. Nevertheless, the results were surprising, and they challenge us to determine what roles, if any, such a myriad of transcriptional changes play in protecting cells against inevitable exposures to carcinogenic agents.
One way to make sense of global transcriptional responses is to break them down into smaller components, by identifying individual regulons and their regulators. Ultimately, manipulating each regulon to alter the response component by component should help to reveal the in vivo role of each. Despite the complexity and the sheer volume of information contained in global transcriptional profiles, elegant computational methods can unveil patterns in the data. In this study, these patterns led us to genetically define a novel MMS-responsive regulon that is controlled, at least in part, by the proteasome-associated protein Rpn4p. Moreover, the Rpn4p binding site was only one of nine sequence motifs identified upstream of the damage-responsive genes. Among the remaining eight motifs, four are known to be bound by previously characterized transcription factors, and four warrant further investigation. In this way, it may ultimately be possible to systematically study each component of the complex transcriptional response of eukaryotes to carcinogenic agents. Perhaps more important will be the subsequent determination of the relative importance of each component in protecting against the cytotoxic, mutagenic, and thus carcinogenic effects of the kinds of damaging agents used in this study.
It is important to note that the transcriptional profiles from all the diverse exposures were required in order to generate the SOMs that link the proteasome and the DNA repair genes. If some of the profiles are omitted, the apparent connection between protein degradation and DNA repair is lost. This underscores the power of combining the information from numerous diverse treatments in order to generate informative patterns in the data.
The presentation of enormous datasets associated with transcriptional profiling in conventional publications must of necessity be limited to describing patterns and trends in the data, rather than discussing the identity of every transcript whose expression is affected. Such patterns and trends hold the promise of identifying novel biological pathways, elucidating how pathways are regulated, assigning to genes of unknown function a known or probable function, and ultimately (in conjunction with proteomics and other emerging techniques) elucidating how all the molecular components of cells integrate to make a living organism. However, in order to understand the final integrated picture, the identity of each gene whose changing expression produces the patterns and trends must ultimately be considered. It is therefore important that the information be scrutinized by experts in many different areas of molecular biology. Here we have inspected our dataset from the perspective of DNA repair in general and DNA alkylation repair in particular. We hope that others will inspect our data (at http://www.hsph.harvard.edu/geneexpression) from the perspective of their own highly specialized areas of expertise.
This work was supported by National Institutes of Health grant RO1 CA5502 to L.D.S. and ONR grant N00014-97-1-0865 to G.M.C. S.A.J. was supported by National Institutes of Health training grant CA09078 and NRSA grant CA81744. The Affymetrix academic user program was supported in part by National Institutes of Health grant PO1-HG0132. L.S. was a Burroughs Wellcome Toxicology Scholar.