Analysis of gene expression during the cell cycle. When cells are cultured in the absence of normal concentrations of growth factors, they enter a quiescent state usually referred to as G0. Upon the addition of serum, the cells reenter a growth state and progress synchronously through G1 into S phase and then G2 and mitosis. Although a large number of studies have employed this experimental strategy to study the molecular events associated with a proliferative response, there are at least two limitations to this approach. First, gene expression changes measured after quiescent cells are stimulated to enter a proliferative cycle (serum stimulation) do not distinguish regulation that is strictly related to growth stimulation from regulation related to cell cycle control. For instance, genes induced during G1, including at G1/S, may reflect the fact that the cells are reentering a cell cycle as opposed to passing through G1 from a previous cell cycle; genes induced during this time might not be cell cycle regulated but rather growth regulated. Second, it is largely impossible to measure the events associated with continued cycling in serum-stimulated cultures, in particular the changes taking place at the second G1/S transition, because synchrony is lost as the population of cells proceeds into the cell cycle.
To address these issues, we have combined two forms of analysis to study the events associated with cell cycle reentry and cell cycle progression. In the first instance, MEFs were brought to quiescence by serum starvation and then stimulated to grow by the addition of serum. Samples were taken through 24 h after serum addition and analyzed by flow cytometry. Under the conditions of this experiment, cells began to enter S phase at 15 h following serum addition, as indicated by a determination of DNA content by flow cytometry (Fig. A). To analyze events specific to the cell cycle and apart from control related to stimulation out of a quiescent state, a second population of MEFs was synchronized at the beginning of S phase by arresting the cells in the presence of HU. Upon removal of the drug, these cells then progressed through S phase, G2, and mitosis and into the next G1 and second S phase. We have previously described the use of this experimental approach for the analysis of cell cycle regulation of E2F activity as well as certain E2F target genes (13). Flow cytometry analysis demonstrated that the cells completed the initial S phase by 6 h following release from the HU block and then entered the second S phase approximately 15 h following release (Fig. A).
Aliquots of these samples from the two experiments were also assayed for E2F DNA-binding activity as a measure of progression through the proliferative response. As shown in Fig. B, E2F activities previously shown to accumulate at G1/S, including E2F1 and E2F3a, were first observed at 12 h following serum stimulation and then peaked at 18 h, coinciding with G1/S, defined by DNA synthesis measurements. These activities were also elevated in the HU-arrested cells and declined as the cells entered S phase, and then E2F3a activity reaccumulated at the second G1/S transition. These observations parallel results described previously that demonstrate a cell cycle control of E2F3a activity (13, 14). In addition, an assay for cyclin E RNA accumulation by Northern blot revealed an accumulation at G1/S that parallels the accumulation of E2F activity at G1/S (Fig. C). As such, this experimental approach, which combines analysis of cells reentering a cell cycle from a quiescent state with analysis of proliferating cells leaving a G1/S arrest, provides a comprehensive view of cell cycle progression.
We next used the RNA from each of these samples to hybridize to high-density DNA microarrays in order to provide a broader examination of the changes in gene expression as cells enter a proliferative state and also pass through a cell cycle. We made use of Affymetrix GeneChip DNA arrays that contained approximately 6,200 murine gene sequences and ESTs. RNA from each of the samples was converted to target following established procedures and then used to hybridize to the GeneChip arrays. The hybridized chips were then processed and analyzed as described in Materials and Methods. The hybridization quantified by the Affymetrix software is shown in Fig. D and compared to a densitometric analysis of the cyclin E Northern blot shown in Fig. C. It is evident from this comparison that the microarray measurements closely match the Northern measurements.
In order to identify groups of genes with a similar pattern of expression within the cell cycle, the Affymetrix average difference values for each gene, as calculated by the GeneChip expression analysis algorithm, were plotted as a function of time following serum stimulation or time after HU release. Preliminary visual inspection of the data indicated the existence of distinct patterns of gene expression. We then clustered genes based on vectors of expression levels consisting of Affymetrix average difference values for all time points in both the growth stimulation and the cell cycle experiments, using k-means clustering as implemented in the GeneSpring software (Silicon Genetics). This approach is a self-organization of the measured gene expression data and is hence not biased by any prior expectations of how genes might be regulated. Criteria were set to eliminate genes that failed to show significant induction in the serum stimulation experiment. Expression patterns of genes that met these criteria were normalized across the experiments and then clustered by the k-means algorithm. We tested several values for the total number of clusters in the k-means clustering procedure. The final analysis was based on 16 clusters; with fewer clusters, we could not identify a unique course of up- and downregulation within each individual cluster, while a larger number of clusters produced separate clusters with a similar course of gene expression. With this setup, we can summarize each cluster of genes by a characteristic sequence of up- and downregulation at specific time points in the experiments.
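The clustering step described above can be illustrated with a minimal sketch. It is not the GeneSpring implementation: the per-gene z-score normalization, the farthest-first seeding, the function names, and the toy expression values are all assumptions made for illustration. Each vector stands for one gene's average difference values concatenated across the serum stimulation and HU release time points.

```python
import math

def normalize(profile):
    """Scale one gene's time course to zero mean and unit variance (assumed normalization)."""
    mean = sum(profile) / len(profile)
    sd = math.sqrt(sum((x - mean) ** 2 for x in profile) / len(profile))
    if sd == 0:
        return [0.0] * len(profile)
    return [(x - mean) / sd for x in profile]

def sq_dist(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(profiles, k):
    """Deterministic seeding (an assumption): start from the first profile,
    then repeatedly add the profile farthest from the seeds chosen so far."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        nxt = max(profiles, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(profiles, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = farthest_first(profiles, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in profiles]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical average difference values (not data from this study).
toy = {
    "gene_a": [10, 200, 210, 190, 205, 200],  # induced, then stays high
    "gene_b": [12, 220, 230, 210, 215, 225],  # induced, then stays high
    "gene_c": [10, 200, 20, 15, 190, 25],     # induced, falls, rises again
    "gene_d": [8, 180, 15, 12, 170, 20],      # induced, falls, rises again
}
names = list(toy)
labels, _ = kmeans([normalize(toy[n]) for n in names], k=2)
clusters = dict(zip(names, labels))
```

In this toy case the two genes that stay high after induction share one label and the two oscillating genes share the other, which is the kind of separation the growth and cycle clusters rely on. The choice of k is left to the caller, mirroring the trial of several cluster numbers described in the text.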
Delineation of multiple, distinct patterns of expression within the mammalian cell cycle. Figure displays the clusters as a function of the time of expression through the two experiments. As indicated in the figure, clusters could be identified that included genes expressed highly in quiescent cells and then turned off once the cells begin to proliferate (G0); genes whose expression increased soon after the stimulation of growth and then fell to basal levels (early G1); genes whose expression increased in G1, declined, and then increased again during the second G1 (G1 cycle); genes whose expression increased in G1 and then remained constant thereafter (G1 growth); genes whose expression increased at the G1/S transition, declined, and then increased again at the second G1/S transition (G1/S cycle); genes whose expression increased at G1/S and then remained constant (G1/S growth); and finally, genes whose expression increased at a time coincident with the end of S phase, declined, and then increased again at the second G2 (G2 cycle). Examples of patterns of expression for specific genes within each cluster are shown in Fig. A. The identities of the genes in these clusters, together with information regarding functional properties, were obtained from a search of the UniGene database and are listed in Table .
TABLE 1. Identification of genes regulated during the cell cycle
Although there were clusters identified in the k-means clustering analysis whose biological relevance was not immediately apparent, other clusters clearly related to known functional properties. For instance, the G1/S and G2 clusters included a number of genes encoding replication and mitotic activities, respectively. The relationship between the time of RNA accumulation and the time when the gene product functions, at least for replication activities controlled at G1/S, has not always been seen in past work studying yeast cell cycle control. In addition, past experiments have not clearly detailed a role for transcriptional regulation during G2 in mammalian systems. In large part this is a reflection of the experimental strategy, which generally examines gene expression following serum stimulation of quiescent cells. Simply examining the pattern of gene expression following stimulation of cell growth does not reveal a clear pattern of gene control at G2, a situation most likely due to loss of cell synchrony. That is, such genes are induced by serum addition, but whether they are activated at G1/S, in S phase, or later is difficult to discern (for instance, compare the G1/S and G2 clusters in Fig. A).
Previous work has suggested that some of the genes in the G2 cluster are induced at either G1/S of the cell cycle or late in G1 (8). In order to confirm that the microarray analysis did indeed reflect the true behavior of these genes, we assayed the samples from the HU release experiment by Northern analysis, using probes for several genes categorized as G2 regulated. As shown in Fig. B, it is apparent that both the cdc2 gene and the importin-α2 gene are indeed activated at G2, consistent with the microarray assays. These patterns are in sharp contrast to the pattern for cyclin E expression, which is regulated at G1/S. We believe that the discrepancy between these data and previous studies very likely reflects the method of cell synchronization and the ambiguity of cell cycle position when only a serum stimulation experiment is employed.
The importance of combining the HU-synchronized samples with the serum-induced samples is clearly illustrated by the last three clusters identified in Fig. . An analysis of only the serum-induced samples would not distinguish these genes. Rather, they would be grouped together as genes induced late in G1. But by combining these data with the HU-synchronized analysis, it becomes readily apparent that there are in fact three distinct clusters—genes induced late in G1 that remain constant, genes induced late in G1 that cycle, and genes induced in G2 that cycle.
Finally, a particularly revealing relationship can be seen in those genes that are activated at G1/S. One group includes genes activated during G1 whose expression levels remain high as cells continue to proliferate (G1/S growth cluster). This group includes genes encoding a variety of proteins that function in transcription, signal transduction, and RNA metabolism (Table ). In contrast, a second group is also activated at G1/S, but expression of this group oscillates as the cells continue to cycle in the presence of growth factors (G1/S cycle cluster). This group includes genes whose function is distinct from the other G1/S-induced group of genes in that these genes encode proteins that are almost exclusively involved in DNA replication. We do note that there is some discrepancy between these results and past experiments that identified several of these DNA replication genes as showing constant expression following G1/S (13). In particular, the previous work suggested that the expression of a subset of the Mcm genes was constant following the initial G1/S, whereas the analyses performed here with DNA microarrays revealed an oscillation in the expression of each of the Mcm genes, as shown for mcm7 in Fig. A. Although we cannot identify a clear distinction in the two analyses that would explain this difference other than a cell type difference, the fact that a substantial number of additional genes encoding replication proteins are coordinately regulated in this manner leads us to believe that the G1/S oscillating pattern of expression may be a common aspect of control of replication activities.
Identification of genes induced by expression of E2F activities. We have previously described the use of recombinant adenovirus vectors as a means to efficiently produce proteins in otherwise quiescent cells (4). The strategy takes advantage of the ability of adenoviruses to infect quiescent cell populations and do so with an efficiency that allows a biochemical analysis of the entire population of cells. Given the fact that the E2F1, E2F2, and E2F3a activities normally only accumulate at G1/S of the cell cycle, as demonstrated previously and as shown by the data here in Fig. , overproduction of these proteins in a quiescent cell allows an analysis of the induction of potential target genes by these E2F proteins in the absence of other growth regulation activities. Indeed, we have made use of this approach in past experiments to study the induction of various E2F target genes (3, 4, 12). We have now extended this work through the use of DNA microarrays to facilitate the assay of large numbers of genes in order to gain a more comprehensive view of the pathway of gene control involving E2F activities. Moreover, performing these analyses in conjunction with the cell cycle determinations provides an opportunity to establish a context for understanding previously characterized as well as uncharacterized E2F-regulated genes.
MEFs were brought to quiescence by serum starvation and then infected with either a control adenovirus that expresses green fluorescent protein (GFP) or with viruses that express the E2F1, E2F2, or E2F3 gene products. As shown in Fig. A, these conditions allowed an accumulation of E2F1 or E2F2 activity that, at least for E2F1, was at a level similar to that observed when cells normally pass through G1/S. Thus, the experimental approach does not represent a gross overproduction of the proteins but rather an accumulation to near physiological levels in the absence of the other events normally associated with a proliferative response. In contrast to the accumulation of E2F1 and E2F2 activity, the production of E2F3 activity was markedly reduced compared to the others despite the use of a substantial multiplicity of infection (data not shown). Indeed, an increase in E2F3 activity was only clearly evident upon treatment of extracts with deoxycholate, suggesting that the majority of the ectopically expressed protein was bound to Rb. Given this reduced level of E2F3 activity, we have chosen to focus primarily on the analysis of gene induction by E2F1 and E2F2. A virus titration was used to determine the multiplicities of infection needed to achieve an equivalent level of E2F1 and E2F2 activity.
Measurement of the expression of cyclin E, a previously demonstrated E2F target, showed that the production of the E2F1 and E2F2 activities did lead to an induction of cyclin E expression (Fig. B). We then used the RNA from these infections to generate target for GeneChip analysis. Targets prepared using the RNA from Ad-E2F-infected cells were hybridized to sets of the Affymetrix murine 11K GeneChips and compared to the hybridization pattern obtained with a control (target prepared from RNA from control-virus-infected cells).
We set the following criteria based on the Affymetrix GeneChip expression analysis software as the basis for identifying genes induced by E2F activities: an intensity of expression (average difference value) that was greater than or equal to 50 in the E2F-expressing cells; the gene was considered increased or marginally increased by comparison analysis using the Affymetrix GeneChip expression analysis algorithm; the fold change, as reported by the Affymetrix comparison analysis, was greater than or equal to 2.0. Of the approximately 11,000 sequences scored in the hybridization assays, a small fraction in any given experiment met these criteria. For instance, in one experiment in which the 11,000 sequences were scored for expression using RNA from E2F1- or E2F2-expressing cells, a total of 255 genes exhibited an induction of at least twofold.
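The three criteria above amount to a simple filter over per-gene comparison results, sketched here under stated assumptions. The record layout, the field names, and the call labels ("I" for increased, "MI" for marginally increased, "NC" for no change) are assumptions for illustration rather than a description of the Affymetrix software's output format, and the example records are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Comparison:
    gene: str
    avg_diff: float     # average difference value in the E2F-expressing cells
    call: str           # comparison call; labels "I", "MI", "NC" are assumed
    fold_change: float  # fold change versus the control-virus sample

def is_induced(c, min_intensity=50.0, min_fold=2.0):
    """Apply the three stated criteria: intensity >= 50, call of increased
    or marginally increased, and fold change >= 2.0."""
    return (c.avg_diff >= min_intensity
            and c.call in {"I", "MI"}
            and c.fold_change >= min_fold)

# Hypothetical records, not measurements from this study.
records = [
    Comparison("gene_1", 320.0, "I", 3.1),   # passes all three criteria
    Comparison("gene_2", 40.0, "I", 4.0),    # fails: intensity below 50
    Comparison("gene_3", 150.0, "NC", 2.5),  # fails: not called increased
    Comparison("gene_4", 90.0, "MI", 1.8),   # fails: fold change below 2.0
    Comparison("gene_5", 50.0, "MI", 2.0),   # passes: thresholds are inclusive
]
induced = [r.gene for r in records if is_induced(r)]
```

The thresholds are inclusive because the criteria are stated as "greater than or equal to".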
It was also clear from an inspection of the data that there was variation from experiment to experiment in the genes scored as induced in the E2F-expressing cells. Such variation could represent differences in the actual experimental manipulations; alternatively, variations in the hybridization analysis could contribute to the variation. To address the basis for the variation, RNA expression was analyzed from two independent experiments. In addition, the RNA samples from one of these experiments were assayed twice independently. Samples obtained from each of these experiments were used to prepare targets and then used for hybridization to the 11,000 murine gene DNA microarray. Reproducibility was assessed by comparing the duplicate hybridizations of a given sample. A comparison of the expression profiles of any given gene sequence in the duplicate hybridizations should, in principle, yield the same value. However, using the criteria described above, we observed 83 genes scoring as induced in the second hybridization over the first for the two E2F1-expressing samples, and 69 genes scoring as induced in the second hybridization over the first for the E2F2 sample. These false-positives constitute different genes for the E2F1 and E2F2 comparisons, and they do not cluster into any known functional group. Instead, they appear to represent a random sample from a uniform distribution of the set of genes on the chip.
Clearly, the variation described above leads to statistical significance problems for “calls” of induced genes if they are based on a single comparison. To address this issue, we examined all six analyses of gene expression comparing E2F1 or E2F2 against the control. While we would expect a substantial number of false-positive calls for each individual comparison caused by chance variation in measurement, we do not expect these false-positive calls to refer to the same genes in several comparisons. For instance, cyclin E met the criteria in all six possible comparisons, and there were many more genes that met the criteria in more than one comparison. To ensure maximal confidence in the identification of genes as truly induced by E2F activity, we have combined the data for the E2F1 and E2F2 expression analysis and used a criterion of induction being called in four of the six assays to identify genes as induced by E2F activities (see Materials and Methods for a description of the statistical analysis).
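The four-of-six rule and the reasoning behind it can be sketched as follows. This is an illustration only and not necessarily the statistical analysis described in Materials and Methods. The binomial calculation assumes that false-positive calls are independent across the six comparisons and that the per-comparison false-positive rate is roughly 83 in 11,000, taken from the duplicate-hybridization figures above; both are simplifying assumptions. The gene names other than cyclin E are hypothetical.

```python
from math import comb

def consensus_genes(calls_per_comparison, min_calls=4):
    """Return the genes called induced in at least `min_calls` comparisons."""
    counts = {}
    for called in calls_per_comparison:
        for gene in set(called):
            counts[gene] = counts.get(gene, 0) + 1
    return {gene for gene, n in counts.items() if n >= min_calls}

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical call sets for the six E2F-versus-control comparisons.
comparisons = [
    {"cyclin E", "gene_x"},
    {"cyclin E", "gene_y"},
    {"cyclin E", "gene_x", "gene_z"},
    {"cyclin E", "gene_x"},
    {"cyclin E"},
    {"cyclin E", "gene_x"},
]
consensus = consensus_genes(comparisons)  # genes called in >= 4 of 6

# Chance that a gene with no true induction is called in >= 4 of 6
# comparisons, under the independence assumption stated above.
p_single = 83 / 11000
p_chance = binomial_tail(6, 4, p_single)
```

Under these assumptions a chance call in four or more comparisons is very unlikely for any single gene, which is the intuition behind requiring repeated calls rather than relying on one comparison.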
It is evident from the list detailed in Table that many previously identified E2F target genes, including cyclin E, cdk2, and thymidylate synthase, were found in this group. But additional genes were evident as well, including other activities known to function in conjunction with DNA replication, such as DNA primase, DNA ligase, flap endonuclease, and topoisomerase. In addition to these, we also identified a number of E2F-induced genes that encode activities not involved in DNA replication, such as several transcriptional regulatory proteins (HMG proteins, enhancer of zeste), DNA repair (RAD51), and cell cycle control (p18). The largest group of E2F-induced genes apart from those encoding replication activities was, however, a collection of genes that encode proteins that function in mitosis. These include kifC1, cdc2, cyclin B, and cdc20.
TABLE 2. Identification of E2F-induced genes
Relationship of E2F-induced genes to cell cycle control: role for E2F in control of expression of G2 genes. The finding that many of the genes induced by either E2F1 or E2F2 encode proteins known to function during mitosis was surprising given the fact that E2F activity, particularly E2F1-3, normally accumulates at G1/S of the cell cycle. As such, it raised the possibility either that the effect of E2F activation on these genes was indirect or that these genes are normally regulated at G1/S even though the products function in mitosis. The latter scenario has precedent, since a number of yeast DNA replication genes are induced in mitosis, well before S phase (1, 23). To address this question, we have examined the relationship between the control of transcription by E2F proteins and the control during the cell cycle.
As shown by the data in Fig. A, the E2F-induced genes did not distribute uniformly over all clusters derived from the cell cycle analysis. Rather, the majority accumulated in only two of these cell cycle clusters. Most of the E2F-induced genes fell into either the G1/S cell cycle cluster, genes whose expression peaks at the initial G1/S transition upon stimulation of cell proliferation and whose expression then continues to oscillate during the cell cycle with a peak at G1/S, or the G2 cell cycle cluster. The clustering of E2F-induced genes within the G1/S group of cell cycle-regulated genes is consistent with previous work that demonstrates an accumulation of E2F activities at this time of the cell cycle. In contrast, the finding that a number of genes induced by E2F proteins are normally regulated at G2 is surprising in light of the fact that these E2F activities are essentially undetectable at this time of the cell cycle. Although it is possible that there is an accumulation of E2F activity in G2 that has gone undetected in previous work or that there is a role for other E2F activities, such as E2F4 or E2F5, which are present at this time, in transcription activation, it is also possible that the activation of these genes that are normally regulated at G2 during the cell cycle is a secondary effect of E2F accumulation at G1/S.
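The comparison described above, in which E2F-induced genes are placed within the cell cycle clusters, reduces to a tally of cluster membership. The sketch below uses placeholder gene identifiers and cluster assignments that are hypothetical and do not reproduce the tables; only the cluster names follow the text.

```python
from collections import Counter

def cluster_distribution(induced_genes, cluster_of):
    """Count how many E2F-induced genes fall into each cell cycle cluster.
    Genes absent from the cluster map are tallied as 'unclustered'."""
    counts = Counter()
    for gene in induced_genes:
        counts[cluster_of.get(gene, "unclustered")] += 1
    return counts

# Hypothetical cluster assignments and induced-gene list.
cluster_of = {
    "gene_1": "G1/S cycle",
    "gene_2": "G1/S cycle",
    "gene_3": "G2 cycle",
    "gene_4": "G1 growth",
    "gene_5": "G2 cycle",
}
induced = ["gene_1", "gene_2", "gene_3", "gene_5", "gene_9"]
distribution = cluster_distribution(induced, cluster_of)
```

A distribution concentrated in a few clusters, as reported here for the G1/S cycle and G2 cycle clusters, would show up as a small number of large counts rather than an even spread.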
To provide further verification of the induction of genes by E2F, particularly those in the G2 category, we analyzed RNA samples by Northern blot assays. As shown by the data in Fig. B, the E2F-mediated induction of one of the G1/S-regulated genes (RRM2) was clearly evident, similar to the induction of cyclin E, as seen in the analysis shown in Fig. B and consistent with the induction of many others in this category (Table ). In addition, we also assayed several of the genes identified in the G2 cluster, including cdc20, cyclin B1, and importin-α2. It is evident from these assays that each of these genes was indeed induced by E2F, either E2F1 or E2F2, similar to the induction of the G1/S genes, thus confirming the results of the DNA microarray analysis.
The fact that cells expressing E2F1, E2F2, or E2F3 do not complete S phase or enter mitosis (data not shown) argues that the induction of these G2-specific genes is not the simple consequence of induced cell cycle progression. But whether the G2-specific genes such as cyclin B are directly or indirectly activated by E2Fs is not clear and must await a determination of the promoter elements that are critical for the induction of these genes in G2.