The genome-wide program of gene expression during the cell division cycle in a human cancer cell line (HeLa) was characterized using cDNA microarrays. Transcripts of >850 genes showed periodic variation during the cell cycle. Hierarchical clustering of the expression patterns revealed coexpressed groups of previously well-characterized genes involved in essential cell cycle processes such as DNA replication, chromosome segregation, and cell adhesion along with genes of uncharacterized function. Most of the genes whose expression had previously been reported to correlate with the proliferative state of tumors were found herein also to be periodically expressed during the HeLa cell cycle. However, some of the genes periodically expressed in the HeLa cell cycle do not have a consistent correlation with tumor proliferation. Cell cycle-regulated transcripts of genes involved in fundamental processes such as DNA replication and chromosome segregation seem to be more highly expressed in proliferative tumors simply because they contain more cycling cells. The data in this report provide a comprehensive catalog of cell cycle regulated genes that can serve as a starting point for functional discovery. The full dataset is available at http://genome-www.stanford.edu/Human-CellCycle/HeLa/.
Proper regulation of the cell division cycle is crucial to the growth and development of all organisms; understanding this regulation is central to the study of many diseases, most notably cancer. Many of the diverse molecular processes needed to duplicate a cell, such as DNA replication, monitoring of replication fidelity, and accurate segregation of chromosomes to daughter cells, are characteristically aberrant in cancer cells. These processes and their regulation have been extensively investigated at the molecular level (reviewed in Stillman, 1996 ; Nurse, 2000 ; Shah and Cleveland, 2000 ; Hinchcliffe and Sluder, 2001 ; Wittmann et al., 2001 ). Characterization of the genome-wide transcriptional program of the cell division cycle in mammalian cells is a critical step toward understanding the basic cell cycle processes and their role in cancer.
The genome-wide transcriptional program during the cell cycle has been investigated in a wide range of organisms, including budding yeast (Cho et al., 1998 ; Spellman et al., 1998 ), bacteria (Laub et al., 2000 ), primary human fibroblasts (Iyer et al., 1999 ; Cho et al., 2001 ), mouse fibroblasts (Ishida et al., 2001 ), and human HeLa cells (Crawford and Piwnica-Worms, 2001 ). In synchronized cultures of the budding yeast Saccharomyces cerevisiae, 800 genes were found to be periodically expressed (Spellman et al., 1998 ) and in synchronized cultures of the bacterium Caulobacter crescentus, 553 periodically expressed genes were identified (Laub et al., 2000 ). The human cell cycle was previously studied first by observing the genes whose expression changes when primary fibroblast cultures were stimulated with serum to reenter the cell cycle from a serum-deprived, resting state. Interpretation of these studies is complicated by the subsequent realization that the major gene expression program observed is more than a simple cell cycle response, but rather is combined with a prominent wound-healing response (Iyer et al., 1999 ). By modifying this approach, Ishida et al. (2001) identified 578 genes that were induced upon serum stimulation and provided evidence of cell cycle regulation based on the pattern of expression after release from a cell cycle arrest with hydroxyurea, which blocks DNA replication. A study of the G2 DNA damage checkpoint in HeLa cells synchronized by a double thymidine block identified transcripts that showed differential expression from S to G2/M phases but failed to identify a specific transcriptional response associated with the G2 DNA damage checkpoint (Crawford and Piwnica-Worms, 2001 ). Finally, ~700 genes were identified as cell cycle regulated in primary human fibroblasts synchronized by a double thymidine block, which attempts to avoid the confounding serum stimulation response altogether (Cho et al., 2001 ).
Herein, we report the extension of these results by synchronizing HeLa cells not only by double thymidine block but also by two additional methods, a thymidine-nocodazole block and a physical method, mitotic shake-off. By combining data from experiments using these three different synchronization methods in five independent experiments, we identified >850 genes that are periodically expressed during the cell cycle. Using this list of genes we were able to show that most of the genes previously associated with the proliferative state of tumors are among the genes we identified as periodically synthesized during the human cell division cycle.
HeLa S3 cells were plated (2 × 10⁶ cells) in 150-mm tissue culture dishes in DMEM with 10% fetal bovine serum and 100 U of penicillin-streptomycin (Invitrogen, Carlsbad, CA). Cells were arrested in S phase by using a double thymidine block or in mitosis with a thymidine-nocodazole block essentially as described previously (Whitfield et al., 2000 ) and in the supplemental material. Poly(A) RNA was prepared from cells collected at intervals (typically 1–2 h) by lysing cells directly on the plate with the FastTrack 2.0 mRNA isolation kit (Invitrogen). Synchrony was monitored by flow cytometry analysis of propidium iodide-stained cells (Stanford Shared FACS Facility, Stanford University School of Medicine, Stanford, CA).
Mitotic cells were collected every 10 min with an automated cell shaker (Eliassen et al., 1998 ), stored on ice, and plated at 2-h intervals in fresh prewarmed media (at least 10⁶ cells for each time point). Because only a small number of cells can be obtained by mitotic shake-off, total RNA was prepared using ULTRASPEC RNA isolation system (Biotecx, Houston, TX). The number of cells in S phase at each point was determined using a 5-Bromo-2′-deoxyuridine (BrdU) Labeling and Detection Kit I (Roche Applied Science, Indianapolis, IN).
Reference RNA was prepared from asynchronously growing HeLa cells using TRIzol (Invitrogen) and poly(A) RNA isolated by affinity chromatography on oligo-dT cellulose (Amersham Biosciences, Piscataway, NJ). Poly(A) RNA was used as a reference in all experiments except the mitotic shake-off, where total RNA was labeled.
RNA from synchronous cells was reverse transcribed into Cy5-dUTP (Amersham Biosciences)-labeled cDNA, and reference RNA reverse transcribed into Cy3-dUTP (Amersham Biosciences)-labeled cDNA by standard methods (Eisen and Brown, 1999 ) (Details are available at http://genome-www.stanford.edu/Human-CellCycle/HeLa/.) Total RNA samples from cells collected in the mitotic shake-off experiment, and total reference RNA were first amplified using a modified Eberwine protocol before cDNA synthesis (Wang et al., 2000 ) then labeled cDNA was prepared from the amplified RNA.
Spotted cDNA microarrays, containing 22,692 elements representing ~16,332 different human genes or containing 43,198 elements representing ~29,621 genes (estimated by UNIGENE clusters), were manufactured in the Stanford Microarray Facility (http://www.microarray.org). Equal amounts of Cy5- and Cy3-labeled cDNA were hybridized to spotted cDNA microarrays and scanned using a GenePix 4000A Scanner (Axon Instruments, Union City, CA). Detailed protocols are available at http://brownlab.stanford.edu/protocols.html/.
Data were extracted by superimposing a grid over each array using GenePix 3.0 software (Axon Instruments). Spots of poor quality, determined by visual inspection, were removed from further analysis. Data collected for each array were stored in the Stanford Microarray Database (SMD) and are available from SMD at http://genome-www.stanford.edu/microarray/ (Sherlock et al., 2001 ).
Only features with signal intensity at least 20% above background in both Cy5 and Cy3 channels and for which adequate quality data were obtained for at least 80% of the samples in a given time course were analyzed further. Data points that did not meet these criteria are blank in the primary data tables. Log2 (Cy5/Cy3) was retrieved for each data point and used for all analysis, where (Cy5/Cy3) is the normalized ratio of the background-corrected intensities, as defined in SMD (Sherlock et al., 2001 ).
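The two filters described above can be sketched as follows. This is a minimal illustration, not the SMD implementation: the function names and the `min_fold` and `min_present` parameters are hypothetical, and the SMD normalization step is omitted, so the ratio here is simply the quotient of background-corrected intensities.

```python
import math

def log2_ratio(cy5, cy3, bg5, bg3, min_fold=1.2):
    """Return log2(Cy5/Cy3) of background-corrected intensities.

    Returns None (a blank data point) when either channel is less than
    20% above its background. Normalization is NOT applied here.
    """
    if cy5 < min_fold * bg5 or cy3 < min_fold * bg3:
        return None
    return math.log2((cy5 - bg5) / (cy3 - bg3))

def keep_clone(values, min_present=0.8):
    """Keep a clone only if at least 80% of its time points are present."""
    present = sum(v is not None for v in values)
    return present >= min_present * len(values)
```

A clone with one blank point out of five passes the 80% rule, whereas a clone with two blanks out of five does not.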
Because of systematic differences between experiments (e.g., array batch, labeling methods, and synchronization methods) each time course was centered independently by filtering out the first, most significant eigengene (Alter et al., 2000 ), which was a dominant, constant vector. Because singular value decomposition (SVD) requires a full data matrix, missing data points were estimated using a k-nearest neighbors algorithm (Troyanskaya et al., 2001 ) with k = 12. These imputed values were used throughout the analysis but were restored to “unknown” status in the figures and left blank in the primary data tables (http://genome-www.stanford.edu/Human-CellCycle/HeLa/).
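The imputation step can be illustrated with a simplified sketch. It is not the published KNNimpute code: it assumes a Euclidean distance computed over the time points two clones share and an unweighted mean of the neighbors, and the function name `knn_impute` is hypothetical. The SVD centering step is not shown.

```python
import math

def knn_impute(rows, k=12):
    """Fill None entries with the mean of the k most similar rows.

    Similarity is the root-mean-square difference over the columns that
    both rows have measured. Rows lacking the target column are skipped.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # leave the gap if no neighbor is available
                filled[i][j] = sum(nearest) / len(nearest)
    return filled
```

The input matrix is left unmodified, which makes it straightforward to restore imputed values to "unknown" when preparing figures and tables.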
A Fourier Transform (eqs. 1–3) was applied to the data for each clone in an experiment (Spellman et al., 1998 ), and the resulting vector (C, eq. 3) of the sine (A) and cosine (B) coefficients was stored, where T is the cell cycle period, t is the time after release, Φ is the phase offset, and ratio(t) is the normalized Cy5/Cy3 expression ratio at time t. The value of Φ was initially set to 0. The values obtained for C were determined over a range of 40 values of T equally spaced 1 h above and below the estimated cell cycle period and the resulting values averaged and stored.
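The calculation described above can be sketched in code. Because eqs. 1–4 are not reproduced in this text, the sketch assumes the form used by Spellman et al. (1998): A is the sum over t of sin(2πt/T + Φ) times the log2 ratio, B is the corresponding cosine sum, C = (A, B), and D is the length of C. The function names, and the reading of "40 values of T" as 40 points spanning the estimate minus 1 h to the estimate plus 1 h, are assumptions.

```python
import math

def fourier_vector(times, log_ratios, period, phase=0.0):
    """Return C = (A, B), the sine and cosine coefficients (assumed eqs. 1-3)."""
    a = sum(math.sin(2 * math.pi * t / period + phase) * x
            for t, x in zip(times, log_ratios))
    b = sum(math.cos(2 * math.pi * t / period + phase) * x
            for t, x in zip(times, log_ratios))
    return a, b

def magnitude(vec):
    """Return D, the length of the vector C (assumed eq. 4)."""
    return math.hypot(vec[0], vec[1])

def averaged_vector(times, log_ratios, est_period, phase=0.0, n=40, half_width=1.0):
    """Average C over n periods spanning est_period +/- half_width hours."""
    periods = [est_period - half_width + 2 * half_width * i / (n - 1)
               for i in range(n)]
    vecs = [fourier_vector(times, log_ratios, p, phase) for p in periods]
    return (sum(v[0] for v in vecs) / n, sum(v[1] for v in vecs) / n)
```

Averaging over a window of periods makes the score less sensitive to a small error in the estimated cell cycle length.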
The optimal cell cycle period was determined by finding the value of T at which the largest number of genes passed an arbitrary magnitude cutoff (D, eq. 4). Fourier transforms were applied to the data series for each gene (eqs. 1–3), with equally spaced values of T, from 0 to 40 h in 15-min increments. The number of genes whose magnitude (D, eq. 4) exceeded the arbitrary cutoff of 3, 5, or 7 was plotted and a period (T) was chosen that maximized the number of genes exceeding our arbitrary thresholds. In most cases, the determined value of T was consistent with the data obtained by flow cytometry.
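The period scan can be sketched as a count of genes above a cutoff for each candidate period. The sketch assumes the same sine and cosine sums as in Spellman et al. (1998) with a zero phase offset; `best_period` is a hypothetical name, and ties are resolved in favor of the first candidate in the list.

```python
import math

def magnitude(times, values, period):
    """Length of the (sine, cosine) coefficient vector at one period."""
    a = sum(math.sin(2 * math.pi * t / period) * x for t, x in zip(times, values))
    b = sum(math.cos(2 * math.pi * t / period) * x for t, x in zip(times, values))
    return math.hypot(a, b)

def best_period(times, profiles, cutoff, periods):
    """Return the period with the most profiles above cutoff, plus all counts."""
    counts = {p: sum(magnitude(times, prof, p) > cutoff for prof in profiles)
              for p in periods}
    return max(counts, key=counts.get), counts
```

For a 15-min grid, the candidates could be generated as `[0.25 * i for i in range(1, 161)]`, which starts at 0.25 h because a period of exactly 0 is undefined.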
Because each experiment does not start at exactly the same point in the cell cycle, an offset (Φ, eqs. 1 and 2) was calculated for each dataset relative to the first double thymidine arrest. The magnitudes of the Fourier transform (D, eq. 4) for the 1000 highest scoring clones were summed using different values of Φ, equally spaced between 0 and 2π. The offset that gave the highest average combined magnitude (D, eq. 4) between the two datasets for these 1000 genes was then used. The Fourier transform was then repeated on the remaining datasets with the following values of T and Φ: Thy-Thy 2 (T = 15.5 h, Φ = 0.5 rad), Thy-Thy 3 (T = 15.4 h, Φ = 0.0 rad), Thy-Noc (T = 18.5 h, Φ = 3.2 rad), and mitotic selection (T = 24.5 h, Φ = 3.5 rad). The vectors C (eq. 3) for all five datasets were then summed and the genes ranked according to the magnitude (D, eq. 4) of their combined vectors. Note that the Thy-Noc and mitotic shake-off experiments, which arrest cells in mitosis, have offsets of approximately half a cycle (π radians) from Thy-Thy 1, which arrests cells at G1/S.
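The offset search can be sketched as a grid search over the phase offset (written `phi` in the code). For each candidate value, the vector of every clone in the second dataset is added to its vector from the reference dataset, and the offset that maximizes the summed length of the combined vectors is kept. The function names and the grid size are assumptions, as is the form of the sine and cosine sums.

```python
import math

def fourier_vector(times, values, period, phase=0.0):
    """Sine and cosine coefficients at one period and phase offset."""
    a = sum(math.sin(2 * math.pi * t / period + phase) * x
            for t, x in zip(times, values))
    b = sum(math.cos(2 * math.pi * t / period + phase) * x
            for t, x in zip(times, values))
    return a, b

def best_offset(ref_vectors, times, profiles, period, n_steps=64):
    """Return the phase offset that best aligns profiles with ref_vectors.

    ref_vectors[i] and profiles[i] must refer to the same clone.
    """
    best_phi, best_total = 0.0, -1.0
    for step in range(n_steps):
        phi = 2 * math.pi * step / n_steps
        total = 0.0
        for (a1, b1), prof in zip(ref_vectors, profiles):
            a2, b2 = fourier_vector(times, prof, period, phi)
            total += math.hypot(a1 + a2, b1 + b2)
        if total > best_total:
            best_phi, best_total = phi, total
    return best_phi
```

With synthetic profiles shifted by half a cycle relative to the reference, the search returns an offset of π, in line with the offsets of roughly π radians reported for the experiments that arrest cells in mitosis.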
Because the gene expression profiles of many cell cycle genes do not precisely match sine and cosine curves, the expression profile of each gene was correlated to an idealized vector obtained from known genes expressed in each cell cycle phase (G1/S, S, G2, and G2/M) as defined in Figure 2A. Using a standard Pearson correlation, each gene received a peak correlation score defined as the highest absolute value correlation between one of the four idealized vectors and its expression profile (Spellman et al., 1998 ). The absolute value of the peak correlation was used to scale the magnitude of the vector (C, eq. 3), generating a “periodicity score” for each gene (Table 1).
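The scaling step can be sketched as follows. The idealized vectors are passed in as a dictionary keyed by phase label; that container, the function names, and the treatment of a constant profile as having zero correlation are choices made for this illustration only.

```python
import math

def pearson(x, y):
    """Standard Pearson correlation; returns 0.0 for a constant vector."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def peak_correlation(profile, ideal_vectors):
    """Return (phase, r) for the ideal vector with the largest |r|."""
    scored = [(phase, pearson(profile, vec)) for phase, vec in ideal_vectors.items()]
    return max(scored, key=lambda pr: abs(pr[1]))

def periodicity_score(fourier_magnitude, peak_r):
    """Scale the Fourier magnitude by the absolute peak correlation."""
    return fourier_magnitude * abs(peak_r)
```

A clone with a Fourier magnitude of 10 and a peak correlation of -0.5 would therefore receive a periodicity score of 5.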
To estimate the minimum periodicity score for a cell cycle regulated gene, the analysis described above was repeated on randomized data. The data were randomized either within rows only, or within both rows and columns, for each of the five datasets starting with the imputed, SVD-centered data. The Fourier transform and correlations were applied using the previously calculated values of T and Φ; the resulting vectors (C, eq. 3) were combined for each dataset. The magnitude (D, eq. 4) of the Fourier transform was scaled by each “gene's” peak correlation to one of the four ideal expression profiles. This analysis, including the data randomization, was repeated 10 times and the scores combined by averaging the score for each of the highest scoring “genes” from each randomization, followed by the second highest, third highest, etc. The estimated false positive rate at a given periodicity score is the number of genes that obtain at least that score in the randomized data. We chose a minimum periodicity score of 3.29, which gave us 1333 clones at an initial false positive estimate of 1% when the data were randomized in rows. Repeating this analysis 10 times gave an estimated 10 false positives (0.75%; periodicity scores of 5.18–3.33) when the data were randomized only within rows, and two false positives (0.15%; periodicity scores of 3.72 and 3.30) when the data were randomized in both rows and columns.
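The row-wise randomization and rank-wise averaging can be sketched as follows. The scoring function is passed in as a parameter (`score_fn`, a hypothetical stand-in for the full periodicity score), only the within-row shuffle is shown, and the seed is fixed solely to make the sketch reproducible.

```python
import random

def shuffle_rows(matrix, rng):
    """Permute the time points within each row (gene) independently."""
    return [rng.sample(row, len(row)) for row in matrix]

def ranked_null_scores(matrix, score_fn, n_repeats=10, seed=0):
    """Average the k-th highest randomized score across repeats, for every k."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_repeats):
        shuffled = shuffle_rows(matrix, rng)
        runs.append(sorted((score_fn(row) for row in shuffled), reverse=True))
    return [sum(column) / n_repeats for column in zip(*runs)]

def expected_false_positives(null_scores, threshold):
    """Count randomized 'genes' that reach at least the threshold score."""
    return sum(score >= threshold for score in null_scores)
```

Dividing the count returned by `expected_false_positives` by the number of real clones passing the same threshold gives a rate comparable to the percentages quoted above.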
The false positive estimate, calculated above, is likely to underestimate the true false positive rate because it does not take into account genes that received a high Fourier score because they exhibited a sinusoidal pattern in only part of a time course. To filter out genes that did not show periodic expression, autocorrelations for each of the 1333 clones were calculated (eq. 5). The autocorrelation A is equal to the summation over all times t of the product of the ratio at t multiplied by the ratio at a time t + T, where T is the cell cycle period determined by Fourier analysis. If the data for a gene repeats with a period T, the autocorrelation will be high.
Autocorrelation scores were calculated for Thy-Thy 3 and the Thy-Noc experiments because they represent multiple cell cycles and because points were taken at equally spaced intervals throughout each time course. In experiment 3 autocorrelations were calculated for periods (T, eq. 5) of 15, 16, and 17 h. In experiment 4 autocorrelations were calculated for periods of 16, 18, and 20 h. The score for each gene in a given time course was taken to be the maximum of the three autocorrelations. The final autocorrelation score assigned to each gene was the sum of the scores calculated for each of the two time courses.
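The autocorrelation score follows directly from the verbal definition of eq. 5 and can be sketched as below. The sketch assumes equally spaced samples, so that a period in hours converts to a whole number of samples; the function names and the `interval` parameter are hypothetical.

```python
def autocorrelation(values, period, interval=1.0):
    """Sum over t of value(t) * value(t + period), for equally spaced samples."""
    lag = round(period / interval)
    return sum(values[i] * values[i + lag] for i in range(len(values) - lag))

def autocorrelation_score(values, periods, interval=1.0):
    """Score for one time course: the maximum over the candidate periods."""
    return max(autocorrelation(values, p, interval) for p in periods)

def final_score(course_a, periods_a, course_b, periods_b):
    """Final score for a gene: the sum of the scores from two time courses."""
    return (autocorrelation_score(course_a, periods_a)
            + autocorrelation_score(course_b, periods_b))
```

A profile that rises once early in the time course and dips one period later yields a negative score, which is the signature of transient rather than periodic expression.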
Autocorrelations were used as a filter to remove genes that showed transient expression despite receiving a high periodicity score. In total, 199 genes with a negative autocorrelation (a negative autocorrelation indicates the measured ratios do not repeat every cell cycle) were eliminated from the initial set of putative cell cycle-regulated genes. Autocorrelations were also calculated for data randomized in rows, whereupon few genes received negative autocorrelations in the randomized data, indicating that the negative scores are unlikely to occur by chance. The distribution of autocorrelation scores is shown in Supplemental Figure 16.
Our final list contains 1134 clones that correspond to 874 UNIGENE clusters (UNIGENE build 143, released November 9, 2001, 21 clones not found in UNIGENE, 66 map to more than one UNIGENE cluster). The data for all 1134 clones as well as the primary data are available at http://genome-www.stanford.edu/Human-CellCycle/HeLa/.
The sensitivity with which one can detect periodic activities in synchronized cell cultures depends almost entirely on the degree to which cells can be synchronized. In particular, cells that fail to begin cycling promptly upon release will contribute noise in the analysis. We chose the well-studied epithelial cell line, HeLa S3, derived from a cervical carcinoma (Puck et al., 1956 ), specifically because a high degree of synchrony can be achieved with diverse methods (Knehr et al., 1995 ; Whitfield et al., 2000 ). HeLa cells can be synchronized so that 95% of the cells reenter the cell cycle, whereas the comparable figure for primary cells (such as foreskin fibroblasts or mammary epithelial cells) is usually reported to be between 60 and 80% (Tobey et al., 1988 ; Fonagy et al., 1993 ; Stampfer et al., 1993 ). Three different methods were used to obtain synchronous populations; in each case synchrony was monitored either by flow cytometry or BrdU incorporation (Figure 1). A double thymidine block was first used to arrest cells at the G1/S boundary (Adolph and Phelps, 1982 ), providing the best synchrony in G2 and M phases of the cell cycle (Figure 1A). This protocol provided a somewhat less robust synchrony of the G1-to-S transition, because cells tend to differ in the rate at which they pass through G1. To improve our resolution in the G1-to-S period, we used another method, in which HeLa cells were first arrested by thymidine, released, and subsequently arrested in mitosis with the antimicrotubule drug nocodazole (Figure 1B) (Zieve et al., 1980 ). In this case, better synchrony was obtained in G1 and S phases, because the synchronous release was at M. The final synchronization method involved no drugs; instead, HeLa cells at metaphase were collected by a physical method, mitotic shake-off (Schneiderman et al., 1972 ), by using an automated cell shaker (Eliassen et al., 1998 ).
More than 95% of the cells collected by mitotic shake-off reentered the cell cycle and progressed into S phase as monitored by incorporation of BrdU (Figure 1C). In all cases, with the exception of the mitotic shake-off, two to three synchronous cell cycles were obtained.
To identify cell cycle-regulated transcripts, RNA was isolated from HeLa cells at intervals (typically 1–2 h) after release from a synchronous arrest (Figure 2A). Cy5- or Cy3-labeled cDNA was synthesized using standard protocols and hybridized to cDNA microarrays containing either 22,692 features representing ~16,332 different human genes (Thy-Thy 1 and Thy-Thy 2) or 43,198 features representing ~29,621 different genes (Thy-Thy 3, Thy-Noc, and Mitotic Shake-off) as estimated by UNIGENE clusters.
The data from each of the synchronization experiments required separate analysis before combination into a single dataset as described in MATERIALS AND METHODS. To identify those transcripts that were cell cycle regulated, a periodicity score was obtained for each clone using a Fourier transform and correlation to known cell cycle genes as described for yeast (Spellman et al., 1998 ). Representative genes and their periodicity scores are shown in Table 1; the range was 0–58.8.
The minimum score for a gene we designated as “cell cycle-regulated” was determined by estimating the false positive rate by using randomized data as described in MATERIALS AND METHODS. We chose a threshold score of 3.29 that gave an estimated 10 false positives (0.75%; periodicity scores of 5.18–3.33) when the data were randomized only within rows (i.e., within each gene), and two false positives (0.15%; periodicity scores of 3.72–3.30) when rows and columns were randomized. Using this relatively conservative threshold, we identified 1333 clones as cell cycle regulated.
A fraction of the genes with relatively high Fourier scores varied in a quasi-sinusoidal manner during part of the time series but clearly were not periodically expressed. This quasi-sinusoidal variation resulted from variation in gene expression at the beginning of each time course, probably as a result of the synchronization procedure. Possible sources of this variation are the serum response, resulting from the addition of fresh growth media upon release of the cells from the arrest, or simply a stress response resulting from the cell cycle block. To filter out genes that did not oscillate across multiple cell cycles, each clone was assigned an autocorrelation score that describes whether the expression ratio at a given time is a good predictor for the value of the expression ratio one cell cycle later. Thus, genes with a consistent pattern of periodic expression every cell cycle have positive autocorrelation scores, whereas genes with low or negative scores do not have consistent periodic expression. Using this method, 199 of the initial 1333 clones received negative autocorrelation scores and were removed from the list, leaving 1134 clones representing 874 different genes.
To assess the quality of the criteria for a periodically expressed gene, the false negative rate was estimated by using a list of known cell cycle-regulated genes compiled from the literature. The known genes were limited to those regulated at the mRNA level during a continuous human cell cycle, as determined by traditional methods (e.g., Northern blot, S1 nuclease assay, or RNase protection; Table 2). Many “known” cell cycle genes are not included in this list because their regulation has been demonstrated only during the resting-to-growing transition, not in cells synchronized independent of serum, or because data for the regulation of the mRNA could not be found, despite a sometimes extensive literature on protein levels, localization, and function during the cell cycle (e.g., CENPE; Wood et al., 1997 ; Abrieu et al., 2000 ). Of the 49 known cell cycle-regulated genes in our list, three were not measured in our analysis. Of the remaining 46, 52.2% (24/46) are found in the top 120 scoring clones, and 93.5% (43/46) are included in the top 850 clones, yielding a false negative rate of 6.5%. Although most of the known genes are represented in the top 850 clones, we believe there is still significant information to be obtained between our 850 and 1134 clones, which include duplicate clones of many of the known genes (e.g., the histones and RAD51) and many genes expressed during mitosis.
Figure 2 presents data for well-studied genes that showed peak expression at specific points in the cell cycle. The times of S phase and mitosis, as estimated from the flow cytometry and BrdU data (Figure 1), are indicated in Figure 2. Maximum expression of the mitotic cyclins (CCNE1, CCNA2, and CCNB1) was observed in the known temporal order and at the expected times (Pines and Hunter, 1989 ; Pines and Hunter, 1990 ; Lew et al., 1991 ). Cyclin E1 expression at the G1/S boundary was accompanied, as expected, by E2F1 and the DNA replication factors CDC6 and PCNA (Morris and Mathews, 1989 ; Ohtani et al., 1995 ; Yan et al., 1998 ). Many DNA metabolism genes were expressed at the beginning of S phase; these are represented here by RFC4, DHFR, RRM2, and RAD51. In our data, two distinct groups of mitotic genes could be distinguished. Transcripts of some genes, such as CCNA2, TOP2A, Cyclin F, and CDC2, peaked in G2, whereas others, such as CCNB1, BUB1, STK15, and PLK1, peaked in mitosis. The differences in expression of the G2 and M phase genes are most evident in the third double thymidine arrest experiment in which samples were taken every hour for 46 h and at the earliest times after the thymidine-nocodazole arrest (Figure 2). We have followed our data, rather than precedent, in recognizing that many, if not all, the genes expressed during the physical act of mitosis continue to be expressed into the G1 period and have thus labeled these genes M/G1. These genes are represented in Figure 2 by RAD21, PTTG1, VEGFC, and CDKN3.
All 1134 clones identified as periodically expressed were sorted by the point in the cell cycle when they showed peak expression as calculated from the sine and cosine components for the Fourier transform (Figure 3A). The periodic nature of the expression patterns is evident from the alternating red (strong expression relative to the asynchronous reference) and green (weak expression relative to the asynchronous reference). It is notable that these patterns persist across multiple cell cycles in both Figures 2 and 3.
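Ordering clones by peak time from the sine and cosine components can be sketched with a two-argument arctangent. The convention used here is an assumption: with A the sine coefficient and B the cosine coefficient, a profile shaped like cos(2πt/T - θ) gives A proportional to sin θ and B proportional to cos θ, so θ = atan2(A, B). The function names are hypothetical.

```python
import math

def peak_angle(a, b):
    """Angle (0 to 2*pi) at which a profile with coefficients (A, B) peaks."""
    return math.atan2(a, b) % (2 * math.pi)

def peak_time(a, b, period):
    """Convert the peak angle to hours after the zero-phase reference point."""
    return peak_angle(a, b) / (2 * math.pi) * period

def sort_by_peak(vectors):
    """Order clone names by peak angle; vectors maps name -> (A, B)."""
    return sorted(vectors, key=lambda name: peak_angle(*vectors[name]))
```

Under this convention, a clone with a purely positive cosine component peaks at the reference point, and one with a purely positive sine component peaks a quarter of a cycle later.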
To minimize unnecessary reassignment of genes relative to the published literature, each gene was assigned a cell cycle phase (G1/S, S, G2, G2/M, or M/G1) by its peak correlation to an idealized expression profile generated from the well-studied genes selected for Figure 2. These assignments are necessarily somewhat arbitrary because a gene assigned to G1/S can be immediately adjacent to a gene assigned to S phase. By assigning phase in this way, 211 (18.6%) of the clones were maximally expressed during the G1/S transition, and 221 (19.5%) were maximally expressed in S phase. Genes in each of these phases have a known role in replication initiation and DNA metabolism. More than 50% of the clones showed peak expression in G2 and M phases of the cell cycle; 239 clones (21.1%) peaked in G2, 273 clones (24.1%) at G2/M, and 190 clones (16.8%) at M/G1. Genes with maximal expression levels in G2 and M phases have roles in chromosome metabolism, surveillance of mitotic processes, and cell adhesion. Interestingly, relatively few genes show peak expression between M/G1 and G1/S, as judged from the distribution of arctangent values calculated from the Fourier transform (Supplemental Figure 15).
In most cases the phase assignments we made for the known genes are in agreement with the published literature (Table 2). Several genes that were previously characterized to peak in a specific cell cycle phase were assigned to an adjacent phase. Examples are CDC45L and CDKN3, both reported to peak at G1/S, whereas in our data they fell into the S phase and M/G1 groups, respectively (Table 2). The difference in assignments may represent differences in experimental systems, methods of measurement, or resolution in the different experiments.
The replication-dependent human histone mRNAs are not polyadenylated (Marzluff, 1992 ) and thus behave sporadically in all experiments except the mitotic shake-off experiment where total RNA was labeled. Because of this inconsistent behavior, they were incorrectly assigned a phase of G1/S by our method. Because the data obtained from mitotic selection and extensive literature on the cell cycle regulation of the histone genes indicated they are expressed in S phase, the phase has been assigned manually, rather than by correlation to the ideal vector (indicated by an asterisk [*] in Table 2 and the supplemental data).
Hierarchical clustering of transcripts with similar expression patterns often groups genes according to the processes in which they participate; these processes can often be inferred from the annotations of the known genes in each group (Eisen et al., 1998 ). To begin to identify the processes that are represented in our clusters, we annotated the known genes in our clusters with gene ontology (GO) terms (Ashburner et al., 2000 ) for biological process from LocusLink (Pruitt and Maglott, 2001 ). GO, initially developed using the model organisms S. cerevisiae, Drosophila melanogaster, and the mouse Mus musculus, provides a controlled vocabulary to describe the functions of known genes. GO terms are divided into three different categories: 1) molecular function, 2) biological process, and 3) cellular component. Examination of the lists of clustered genes and their annotations shows that within our set of 1134 clones there are clusters of genes that have similar or identical annotations (Figure 3B). In what follows, GO terms are italicized and terms we have applied provisionally herein because of the incomplete application of GO to human genes are marked with an asterisk (*). The DNA replication clusters include genes associated with the following process annotations: DNA replication, expressed in G1/S (early DNA replication; e.g., the components of the prereplication complex and ORC1); DNA metabolism (including its daughter processes DNA repair and DNA recombination) and nucleotide metabolism, expressed in S (late DNA replication; e.g., RAD51, DNA polymerases, and nucleotide metabolism enzymes); and chromatin assembly/disassembly (e.g., the histones), also expressed in S.
The mitotic clusters include mitotic chromosome segregation (the α- and β-tubulins) expressed in G2; spindle assembly, expressed during G2 (e.g., kinesins and centrosome duplication genes); mitotic checkpoint and nuclear division, expressed at G2/M (e.g., the checkpoint genes BUB1 and CENPE); sister chromatid cohesion (e.g., PTTG1 and genes of the cohesin complex STAG1, RAD21); and cell adhesion (e.g., CTNND1 and vinculin) and components of the RAS signal transduction pathway expressed during the physical act of mitosis, M/G1. The first point to be made about these clusters (shown in more detail in Figures 6–8 and the supplemental data, which contain the complete cluster diagram with gene names) is that each represents a reasonably complete and specific list of the genes necessary for the major, essential processes that must occur every cell division cycle. The second point to be made about these clusters is that, in addition to genes of known function, there are numerous genes of heretofore uncharacterized function that are now implicated in these processes.
Recently, several studies of global gene expression patterns in human tumors and tumor cell lines have been published (Perou et al., 1999 , 2000 ; Alizadeh et al., 2000 ; Ross et al., 2000 ). Each of these studies identified a prominent cluster of genes whose expression was correlated with the rate of proliferation of the tumors or cell lines under study. As we show now, half or more of the genes that comprise this “proliferation cluster” are the same genes that we have found to be periodically expressed in HeLa cells.
To compare explicitly the genes of the proliferation cluster with the 874 periodically expressed genes described here, we took the list of genes in each of the proliferation clusters identified in the studies of breast tumors (Perou et al., 2000 ) and lymphoma (Alizadeh et al., 2000 ), and extracted the patterns of gene expression for each from the dataset described here. These patterns, lined up by peak expression during the cell cycle as in Figure 3, are presented for 112 genes measured here and in breast cancer (Figure 4A) and for 96 genes measured here and in lymphoma (Figure 4B). As the figures show, 62% (69/112) of the genes in the breast tumor proliferation cluster and 45% (43/96) of the genes in the lymphoma proliferation cluster are among the 874 periodically expressed genes we detected in synchronized HeLa cell cultures.
The simplest interpretation of this result is that the genes whose transcript levels correlate with the proliferation rate of the tumors are expressed only in cycling cells, presumably as a consequence of regulation that allows transcription only during the appropriate stage of the cell cycle and not at any other time, including quiescence. Even if this idea is correct, it need not apply to all genes that are periodically expressed during the cell cycle. It seems quite possible that some genes that are periodically expressed in cycling cells might not appear in the proliferation clusters of tumors because they might be expressed under some circumstances in noncycling cells or strongly regulated not only by the cell cycle but also by related factors, and thus show little association with tumor proliferation. One might even anticipate that some periodically expressed genes might be regulated in ways (e.g., strong expression in differentiated nonreplicating cells) that result in reduced transcription during active cell division cycles.
To explore further the relationship between periodic expression during the cell cycle and expression in proliferative tumors, we have extracted the data for every clone that was both identified as periodically expressed in this study and also measured in the breast tumor study, regardless of its pattern there (Perou et al., 2000 ). This yielded 386 clones, of which <25% appeared in the proliferation cluster. Hierarchical clustering of this set of 386 using the combined data from both studies revealed, as anticipated, not only periodically expressed genes that are highly correlated with tumor proliferation (a more comprehensive proliferation cluster, as described above; Figure 5B, ii and iii) but also periodically expressed genes that show heterogeneous expression in tumors (i.e., no obvious association with tumor proliferation; Figure 5B, v), and periodically expressed genes whose expression is apparently uncorrelated with the proliferative state of the tumors (Figure 5B, i and iv).
Correlation between periodic expression in the cell cycle and strong expression in highly proliferative tumors can be accounted for simply because there is an increase in the number of cycling cells (as described above), but other more specific explanations can also be proposed. Many genes that we have found to be periodically expressed have already been implicated directly in tumorigenesis and their expression has been reported to coincide with cellular transformation and oncogenic potential (reviewed in Bishop, 1991; Hunter, 1997). Genes of this kind are common in clusters whose expression is highly correlated with tumor proliferation in our study, including some expressed primarily in the G2 and M phases (Figure 5B, ii) and others whose expression peaks in G1 and S phases (Figure 5B, iii).
Among the periodically expressed genes whose expression is heterogeneous or uncorrelated with tumor proliferation are genes for a variety of cell cycle processes, including genes for cell-cell adhesion (calponin-2, smoothelin, and vinculin) (Figure 5B, i). Expression of some of these genes, such as vinculin, in transformed cells has been reported to decrease tumorigenicity (Rodriguez Fernandez et al., 1992). Another group of cell cycle-regulated genes that are not correlated with tumor proliferation includes genes necessary for withdrawal from the cell cycle and apoptosis (the cyclin-dependent kinase [cdk] inhibitor CDKN2C (p18) and Caspase 3; Figure 5B, iv). Finally, a group of G1 and S phase genes shows heterogeneous expression in tumors (Figure 5B, v), including DHFR, E2F1, CDC6, CCNE1, CHAF1A, TOP3A, ORC1, and BRCA1; these data suggest that the regulation of each of these genes is more complex than simple restriction of transcription to a particular phase of the cell division cycle.
The close similarity between the set of genes whose expression is associated with tumor proliferation and those that are periodically expressed invites a closer examination of the biological roles of the gene clusters detected in Figure 3B, in light of the specific features of both the cell cycle and tumor biology. In what follows, we survey roles of the genes of known function in each of the prominent clusters. It should be recalled, however, that each of these clusters contains, in addition to the genes of known function, genes whose biological role remains to be determined (expressed sequence tags; ESTs). The numbers of such uncharacterized genes are noted in the legends to Figures 6–8 and, of course, the detailed position of each of them can be found in the supplementary information and on the Web site.
The “early DNA replication” cluster contains genes expressed in late G1 phase with their expression continuing into S phase. The GO process annotations that apply to most of these genes are DNA replication (including well-characterized genes encoding RFC4, DNA polymerase delta 3, DDX11, and geminin), DNA packaging (including CHAF1A, CHAF1B, and PCAF), and DNA repair (including MSH2, FEN1, and PCNA). Many of the genes in the DNA replication category are components of the prereplicative complex and include CDC6, ORC1L, MCM2, MCM4, MCM5, and MCM6 (Lei and Tye, 2001); this group of genes is consistently expressed at relatively high levels in diverse types of tumors, and its members have been proposed as diagnostic markers by Laskey and coworkers (Williams et al., 1998). The cluster also includes genes that fall into the GO category cell cycle control and are necessary for entry into S phase (in this group are Cyclin E1, E2F1, CDC25A, and Cyclin E2).
The “late DNA replication” cluster (Figure 6B) contains genes necessary for the continued synthesis of DNA, including nucleotide metabolism (including TYMS, DHFR, RRM1, and RRM2) and additional DNA replication genes (DNA polymerase alpha and theta; Primase 1 and 2A). Also present are genes for DNA repair and DNA recombination (RAD54 and RAD51). Genes involved in DNA repair were also expressed in S phase in a study of the cell cycle in primary human fibroblasts, and it was reported that these genes were also induced when cells were treated with UV radiation (Cho et al., 2001).
Histone synthesis is required during DNA replication to package newly synthesized chromatin and is tightly regulated during the cell cycle (Marzluff and Pandey, 1988; Schumperli, 1988). The replication-dependent histone genes are coordinately regulated and the expression of their mRNAs is tightly restricted to S phase of the cell cycle by both transcriptional and posttranscriptional mechanisms (Harris et al., 1991; Eliassen et al., 1998). The “histone” cluster (Figure 6C), expressed during S phase, contains the core histones H2A, H2B, and H4 and the linker histone H1, which we have assigned to the GO category of chromatin assembly/disassembly. These mRNAs are not polyadenylated and hence are measured poorly and inconsistently in all experiments except mitotic selection where total RNA was labeled. It should be noted that the several H2A genes will cross-hybridize, as will the H2B, H4, and H1 genes, but significant cross-hybridization between the different genes (e.g., H2A and H2B) is unlikely. Histone H3 mRNA, although cell cycle-regulated, was not found in our study, possibly because of preferential labeling of the polyadenylated, non-cell cycle-regulated variant H3.3 (Wells and Kedes, 1985).
Two regulators of mammalian histone mRNA synthesis are present in the G1/S cluster (Figure 6A): the stem-loop binding protein (SLBP), necessary for histone pre-mRNA processing (Wang et al., 1996a; Dominski et al., 1999), and nuclear protein mapped to the AT locus (NPAT), which was recently identified as an activator of histone gene transcription and is hypothesized to be a chromatin remodeling factor (Ma et al., 2000; Zhao et al., 2000). The SLBP was previously shown to be cell cycle regulated by both translational and posttranslational mechanisms; the magnitude of the changes observed herein in the mRNA level (approximately twofold) is consistent with Whitfield et al. (2000).
Many of the genes expressed at G1/S and S phase are known E2F targets. A study of the cell cycle and the E2F transcription factors in mouse embryo fibroblasts, using microarrays, identified both G1/S and S phase genes, as well as genes expressed at G2 and M phases, as targets of the E2F transcription factor (Ishida et al., 2001). Ishida et al. (2001) identified the histone SLBP as induced by E2F, suggesting that histone synthesis may be linked to the Rb/E2F pathway. Furthermore, NPAT is a cyclin E/cdk2 substrate and its overexpression promotes S phase entry (Zhao et al., 1998). Although NPAT itself has not been implicated in tumorigenesis, the role of chromatin remodeling in cancer is becoming increasingly clear (Archer and Hodin, 1999).
The mitotic genes form three distinct clusters, one of which includes genes that encode α- and β-tubulins; the other two contain well-characterized genes that encode proteins involved in assembly and surveillance of the mitotic spindle (Figure 3B). The remaining G2 and M phase genes that fall outside of these core clusters also have functions concerned with the GO process mitosis and its daughter processes (e.g., mitotic chromosome segregation, spindle assembly, and mitotic checkpoint).
The tubulin cluster includes α- and β-tubulins (TUBA1, TUBA2, TUBA3, TUBB, and TUBB2) as well as BUB3, which is a mitotic checkpoint regulator (Figure 7A). (Note that there may be cross-hybridization among the several α-tubulin genes or among the several β-tubulin genes, but cross-hybridization between α- and β-tubulin genes is unlikely.) Coordinate regulation of tubulin synthesis at the level of mRNA abundance is evident herein, even though a large component of the regulation of these genes is posttranscriptional (Cleveland, 1989). Yeast BUB3 was isolated as a multicopy suppressor of a conditional mutation in α-tubulin (Guenette et al., 1995); the coexpression of hBUB3 with the coordinately regulated tubulins is consistent with its role of monitoring microtubule assembly as cells enter mitosis.
The second G2 phase cluster contains genes predominantly for organization of the mitotic spindle, including kinesins and the TTK kinase (human homolog of the S. cerevisiae MPS1 spindle checkpoint gene; Figure 7B; Abrieu et al., 2001). Five different kinesins are present: KNSL1, KNSL2, KNSL4, KNSL5, and KNSL6. KNSL6 is expressed at the boundary between G2 and M phase genes and in our study has been assigned a phase of G2/M. In addition, Cyclin A2, CDC2, CKS1, Cyclin F, and numerous other regulatory factors are found in this cluster. Also specifically expressed in G2 is ESP1 or human Separin, which is released from PTTG1 (human homolog of the yeast securin PDS1) during anaphase and promotes chromatid separation (Zou et al., 1999). Finally, it may be worth noting that Importin alpha, also in this cluster, has recently been shown to inhibit spindle assembly by binding a protein that promotes spindle formation, Repp86 (Xenopus TPX2) (Gruss et al., 2001; Nachury et al., 2001; Wiese et al., 2001). This inhibition is relieved by the action of the small GTPase RAN when bound to GTP (present in the M/G1 cluster; Figure 8), which is proposed to surround M phase chromosomes (Carazo-Salas et al., 1999; Hetzer et al., 2000), thus allowing spindle formation only in the vicinity of chromosomes. The temporal succession of the transcripts for these two proteins suggests a possible role in regulating the mitotic spindle.
The third mitotic cluster contains genes that peak at G2/M and centers on Cyclin B2 (Figure 7C); Cyclin B1 falls just outside this cluster. Genes that have known functions in the mitotic spindle checkpoint show peak expression at G2/M, including BUB1, BUB1B, CDC20, and CENPE. Three genes in this cluster (STK15, PLK1, and NEK2) have been shown to have roles in centrosome duplication, whose improper expression is a potential cause of tumor aneuploidy (Lengauer et al., 1998). It is worth noting that PLK1 and STK15 had the very highest periodicity scores (Table 1), and thus are the most periodically synthesized mRNAs among the cell cycle-regulated transcripts.
Three groups of genes whose expression peaks in mitosis merit a more detailed examination. We classified these genes as M/G1 because they are expressed during the physical act of mitosis and their transcripts persist into G1 phase. Some of these genes are known to have roles in cell adhesion, chromosome remodeling, and membrane trafficking as opposed to roles specifically associated with the progress of the cell division cycle.
The M/G1 cluster contains genes known to function in actin cytoskeleton remodeling and cell adhesion (Figure 8). As a synchronous population of cells proceeds through the cell cycle, visible changes occur to the cellular morphology. The most notable of these changes is the dramatic remodeling of cell shape that can be observed in cell culture during mitosis, when cells detach from the plate and surrounding cells, undergo cytokinesis, and reattach to the plate as they enter the next cell cycle (Schneiderman et al., 1972). A cluster of M/G1 genes is expressed during this transition and contains genes whose products have been localized to the structures that connect the actin cytoskeleton to the membrane or have been implicated in the control of cell-cell contacts.
The Ras GTPase can induce perturbations in cell-cell contacts when overexpressed in its constitutively activated form (Kinch et al., 1995; Zhong et al., 1997; Yamamoto et al., 1999). Included in the M/G1 cluster is Kirsten Ras-2 (KRAS2), regularly mutated in many malignancies, including lung adenocarcinoma, mucinous adenoma, ductal carcinoma of the pancreas, and colon tumors (Barbacid, 1990) (Figure 8A). Ras is believed to act through downstream targets, including CDC42 (which appears in this cluster; Figure 8C) as well as the Rho GTPases (which we do not detect as cell cycle-regulated). These GTPases regulate both the assembly of actin filaments and the assembly of the adhesion complexes. Among the M/G1 genes that have been localized to cell-cell adherens junctions are catenin delta 1 (CTNND1, also known as p120), which regulates cadherin clustering at adherens junctions through Rho GTPases (Anastasiadis and Reynolds, 2001); presenilin 1, which binds E-cadherin at cell-cell junctions (Georgakopoulos et al., 1999) and has also been shown to localize to the nuclear membrane, interphase kinetochores, and centrosomes, suggesting a function in chromosome segregation (Li et al., 1997); vinculin, involved in attaching actin to the membrane (Jockusch and Isenberg, 1981; Wilkins and Lin, 1982); calponin 2 (CNN2), which contains an actin binding site and is localized to adherens junctions with vinculin (Masuda et al., 1996); and MLLT4 (known also as AF-6 or afadin), which is a component of tight junctions (Yamamoto et al., 1997). Taken together, these results suggest that the regulation of transcripts during the cell cycle plays a role in orchestrating the morphological changes that occur during mitosis. Expression of these genes during the physical act of mitosis may be necessary to prepare the cell to reestablish contact with the surrounding milieu.
Genes involved in chromosome architecture and remodeling are also expressed at M/G1 (Figure 8A). These include presenilin 1, discussed above, and parathymosin, which is localized to sites of early DNA replication (Vareli et al., 2000) and binds to the linker histone H1 (Kondili et al., 1996). The dyskeratosis gene (DKC1) is implicated in telomere maintenance, binds to the H/ACA class of snoRNPs, and has been shown to associate with the telomerase RNA that acts as a template to add sequence repeats to the ends of chromosomes (Mitchell et al., 1999). Two members of the SWI/SNF complex (SMARCB1 and SMARCD1), present in the M/G1 cluster, function in chromatin remodeling and transcriptional regulation (Wang et al., 1996b; Muchardt and Yaniv, 2001). Other M/G1 genes with roles in chromosome architecture and remodeling include human securin (PTTG1) and RAD21, both genes involved in sister chromatid cohesion (Zou et al., 1999; Hoque and Ishikawa, 2001).
Finally, expression of CDK7 (MO15), the catalytic subunit of the CDK-activating kinase (CAK), also peaks at M/G1 (Figure 8B). CDK7 is necessary for the phosphorylation and full activity of CDK2 and CDC2 (Fisher and Morgan, 1994). CAK activity and CDK7 protein have been reported to be constant during the cell cycle (Matsuoka et al., 1994; Tassan et al., 1994), although data on the mRNA regulation have not previously been reported. The CAK cyclin subunit Cyclin H (Fisher and Morgan, 1994) does not oscillate in our dataset even though we do see oscillation of the mRNA for the mitotic cyclins.
In summary, we identified 874 genes that show periodic expression in the cell division cycle of a human cell line. We found a relationship between genes associated with proliferation of tumors and those that are periodically expressed during the cell division cycle. Many of the genes that are periodically expressed have well-characterized functions associated with the cell division cycle, and a large number, >450 clones, are either uncharacterized ESTs or hypothetical proteins whose pattern of regulation during the cell division cycle now points to specific directions for further investigation.
We identified 874 genes that show periodic expression across the human cell cycle in a well-studied cancer cell line (HeLa). This system was chosen because a high degree of synchrony and low background of noncycling cells can be obtained by multiple methods. Herein, 1134 clones (representing 874 UNIGENE clusters) passed a minimum set of objective quantitative criteria for a periodically expressed gene. Although both the Rb and P53 tumor suppressors are inactivated in HeLa cells as a result of binding of the E6/E7 proteins of the human papillomavirus (Scheffner et al., 1991 ), genes involved in basic cell cycle processes such as DNA replication, chromosome segregation, and cell adhesion still showed periodic expression.
Each gene was assigned a cell cycle phase as described above (Figures 2 and 3). Approximately 38% of the genes identified as cell cycle regulated were assigned a phase of G1/S or S phase, whereas 45% were assigned a phase of G2 or G2/M. In mammalian cells, G2 and M phases are short (4 h of each 14–16-h cell cycle) relative to G1 and S phase, yet almost half of the periodically expressed genes peak during this interval. One potential explanation is that we were more able to detect genes expressed at G2 and M phases because three of our five synchronization methods arrest cells in S phase, which provides more robust synchrony in G2 and M phases. This possibility now seems particularly unlikely because a similar distribution is observed in the proliferation clusters, which have now been observed in a variety of tumors independently.
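The disproportion noted above can be made explicit with a little arithmetic. The sketch below compares the share of periodically expressed genes assigned to G2 or G2/M (45%) with the share of the cell cycle occupied by G2 and M (4 h of 14 to 16 h). It assumes, purely for illustration, a null expectation in which peak times are spread uniformly over the cycle; the function names `time_fraction` and `enrichment` are hypothetical.

```python
def time_fraction(phase_hours, cycle_hours):
    """Share of the cell cycle occupied by a phase."""
    return phase_hours / cycle_hours


def enrichment(gene_fraction, phase_hours, cycle_hours):
    """Observed gene share divided by the share expected if peaks were uniform in time."""
    return gene_fraction / time_fraction(phase_hours, cycle_hours)


G2_M_HOURS = 4             # duration of G2 plus M quoted in the text
GENE_FRACTION_G2_M = 0.45  # share of periodic genes assigned to G2 or G2/M

# Evaluate both ends of the quoted 14-16 h cycle length.
for cycle_hours in (14, 16):
    share = time_fraction(G2_M_HOURS, cycle_hours)
    ratio = enrichment(GENE_FRACTION_G2_M, G2_M_HOURS, cycle_hours)
    print(f"{cycle_hours} h cycle: time share {share:.2f}, enrichment {ratio:.2f}x")
```

Under this simple assumption, G2 and M occupy roughly a quarter of the cycle but account for nearly half of the peaks, an excess of somewhat less than twofold.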
We recognize that our microarrays do not completely represent all expressed genes in the human genome, limiting the completeness of our survey. For example, three of the 49 genes known from the literature to be cell cycle-regulated were not represented on our arrays (see above and Table 2). Subject only to these omissions, we conclude, based on the similarities found between cells synchronized in S phase, cells synchronized in M phase, cells selected by mitotic shake-off, and the tumor proliferation clusters, that we have identified a comprehensive list of cell cycle-regulated human genes.
Genome-wide transcriptional profiles of the cell cycle in animal cells have been reported previously from experiments carried out on fibroblasts and epithelial cells. A study of the response of primary human fibroblasts to serum revealed not only a cell cycle response but also an equally prominent wound-healing response (Iyer et al., 1999). This made clear the shortcomings of serum as a synchronizing agent if the goal is to find periodically expressed genes. Herein, we have avoided the use of serum “starvation” entirely.
A study of the cell cycle and E2F transcription factors in mouse embryo fibroblasts identified 578 cell cycle-regulated genes and E2F targets (Ishida et al., 2001 ). Many of the genes identified in the mouse study were also identified as cell cycle-regulated in HeLa cells herein. Among these are TYMS, RRM1, RRM2, Cyclin E1, MCM3, MCM7, PCNA, TOPIIA, FEN1, RAD51, SLBP, Cyclin A2, Cyclin B1, Cyclin B2, KI-67, and importin alpha 2. Those identified as E2F targets but not identified as cell cycle regulated in our study include thymidine kinase 1 (TK), CDK2, DNA ligase I, and RB. It should be noted that not all E2F targets are expected to be cell cycle regulated in our study of HeLa cells because some genes show regulation primarily during the resting to growing transition rather than in a continuous cell cycle.
A study of the G2 DNA damage checkpoint in HeLa cells revealed a delay in the expression of mitotic genes when the cells are exposed to ionizing radiation (e.g., Cyclin B1, CKS2, TTK, STK15, CDC20, and CENPA) (Crawford and Piwnica-Worms, 2001). Many of these genes were classified as G2/M in our study. It is worth pointing out that similar results were obtained herein for G2 genes when cells are arrested in mitosis by nocodazole; G1/S, S, and G2 genes show relatively low expression, whereas G2/M and M/G1 genes show relatively high expression (e.g., 0 h Thy-Noc samples in Figures 2 and 6–8). The opposite is observed in cells arrested in S phase with thymidine: relatively high expression of G1/S and S phase genes, and relatively lower expression of G2, G2/M, and M/G1 genes (Figures 2 and 6–8). The exception is the replication-dependent histone mRNAs, which, although expressed during S phase, are rapidly and coordinately degraded when DNA synthesis is inhibited (Harris et al., 1991). Together, these observations suggest that the gene expression program observed during a cell cycle arrest is strongly influenced by the point in the cell cycle where the arrest occurs; the genes normally expressed in that phase are usually, but not always, relatively overexpressed.
Another study of the cell cycle in primary human fibroblasts synchronized by a double thymidine block identified ~700 cell cycle-regulated genes (Cho et al., 2001) that we have mapped to 595 unique UNIGENE clusters. Surprisingly, only 96 genes were identified as cell cycle-regulated in both studies (these comparisons are available in the supplemental data). Of the remaining 499 genes that were identified as cell cycle-regulated by Cho et al. (2001), but not in this study, only 50 were not measured on our microarrays. Of the 778 genes identified as cell cycle regulated in this study, but not by Cho et al. (2001), only 109 were not measured on their microarrays. We have no ready explanation of this difference in results, except to note that there are differences in the cell lineage, the microarray technology, and the analysis methods. As we have shown, 69 of the genes that were cell cycle-regulated in our study are nevertheless associated with the proliferative state of tumors and cell lines, whereas only 29 of the genes reported to be cell cycle-regulated by Cho et al. (2001) appear in the tumor proliferation clusters. We suspect that the most significant differences may well be in the degree of cell synchrony achieved and the percentage of cells that reentered the cell cycle after removal of the synchronizing block. Alternatively, HeLa cells, derived from a cervical carcinoma, may more closely resemble cancers in their cell cycle program of gene expression and differ significantly from the corresponding program in a normal fibroblast.
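The bookkeeping behind this comparison can be checked directly from the counts quoted above. The sketch below is illustrative only; the function `overlap_summary` and its parameter names are hypothetical, and it uses nothing beyond the gene counts stated in the text (874 and 595 genes, 96 shared, 109 and 50 not measured on the other platform).

```python
def overlap_summary(total_a, total_b, shared,
                    a_only_unmeasured_by_b, b_only_unmeasured_by_a):
    """Split two gene lists into study-specific sets and count how many
    study-specific genes the other platform measured but did not call periodic."""
    a_only = total_a - shared
    b_only = total_b - shared
    return {
        "a_only": a_only,
        "b_only": b_only,
        "a_only_measured_by_b": a_only - a_only_unmeasured_by_b,
        "b_only_measured_by_a": b_only - b_only_unmeasured_by_a,
    }


# a = this study (874 genes); b = Cho et al. (2001) (595 UNIGENE clusters).
summary = overlap_summary(
    total_a=874,
    total_b=595,
    shared=96,
    a_only_unmeasured_by_b=109,
    b_only_unmeasured_by_a=50,
)
print(summary)
```

The subtraction reproduces the 778 and 499 study-specific genes and shows that most of them (669 and 449, respectively) were measured on the other platform, so array content alone does not account for the small overlap.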
Summing up the comparisons with previous work, we believe that we have detected most of the genes expressed periodically through the HeLa cell division cycle. It will be important to compare this list of 874 genes to similarly comprehensive studies of more normal human cell types in the future.
Comparison of the results of this study with our previous study of the yeast cell division cycle (Spellman et al., 1998) produced interesting results. When the “well-characterized” (i.e., those that encode curated protein sequences in RefSeq) and periodically expressed human gene sequences were compared with the yeast genome sequence, ~18% (155 genes) had putative orthologs (Ball, Whitfield, and Botstein, unpublished data). Of these, only ~26% (41) were clearly cell cycle regulated in yeast (Spellman et al., 1998), suggesting that regulation at the level of transcription is not necessarily a conserved property. An example of a gene periodically expressed in human cells but not in yeast is the human CDC2 kinase (homolog of S. cerevisiae CDC28 kinase). Yeast CDC28 is not regulated at the transcriptional level; rather, the kinase activity of CDC28 is regulated (Lee and Nurse, 1987). It is likely that many of the periodically expressed genes in human cells that do not have periodically expressed counterparts in yeast are subject to multiple layers of regulation (e.g., phosphorylation and proteolysis), as is already known for many well-studied cell cycle genes. Among the genes periodically expressed in both species are many involved in basic processes such as DNA replication, repair, metabolism, and mitosis. However, as noted by Spellman et al. (1998), many of the genes that were periodically synthesized in yeast are not obviously involved in these basic processes but instead are genes involved in bud emergence and bud growth, which take place at particular points in the cell cycle because of the particular biology of budding yeasts. Likewise, it may be that many genes are periodically expressed in an animal cell for reasons that do not apply to yeast.
Spellman et al. (1998) reasoned, partly from this observation, that parsimony is likely to explain much of the observed cell cycle regulation; genes are periodically expressed primarily because there is special need for the gene products at particular points in the cell cycle. For human cells, the periodic synthesis of the cytoskeleton and cell adhesion-associated genes is a particularly interesting case in point. The overall shape of an animal cell generally changes, and the cell tends to lose contact with its substrate, when it forms the mitotic spindle and undergoes cell division. Reasoning in parallel to the yeast example, we suppose that animal cells might periodically synthesize cytoskeletal genes and adhesion factors because of the particular need to change shape in mitosis, and subsequently reattach to substrate, a need not shared by the yeasts.
Comparison of the 874 genes that we identify as cell cycle regulated in HeLa cells to studies of human breast tumors shows that almost 70% of the transcripts in the proliferation cluster are cell cycle regulated. We showed above that many genes involved in basic cell cycle processes are also more highly expressed in more proliferative tumors. However, the correlation we found may reflect only the reality that proliferative tumors contain a large proportion of actively cycling cells.
There are nevertheless periodically expressed genes (e.g., KRAS2, PTTG1, STK15, and PLK) (Der et al., 1982 ; Pei and Melmed, 1997 ; Smith et al., 1997 ; Zhou et al., 1998 ) that are likely to contribute more directly to tumor phenotypes. For example, the two highest scoring genes in our study, STK15 and Polo-like kinase (PLK), show peak expression at G2/M, are highly expressed in proliferative tumors, and have transforming activity in NIH3T3 cells (Holtrich et al., 1994 ; Smith et al., 1997 ; Bischoff et al., 1998 ; Zhou et al., 1998 ; Takai et al., 2001 ). STK15 has been shown to be amplified in human colon cancers and in cell lines derived from many other kinds of human tumors (Bischoff et al., 1998 ; Zhou et al., 1998 ). High expression has also been observed in the absence of amplification and contributes to abnormal centrosome numbers in cell lines (Zhou et al., 1998 ). PLK has also been implicated in centrosome maturation; injection of PLK into either HeLa cells or human foreskin fibroblasts resulted in reduced centrosome size and abnormal chromatin distribution (Lane and Nigg, 1996 ). Amplification or overexpression of STK15 and/or PLK has been postulated as a potential cause of aneuploidy in human tumors (Lengauer et al., 1998 ). The high expression of genes involved in centrosome duplication may contribute to the chromosomal translocations and aneuploidy found in HeLa cells.
Although there is significant overlap between the genes found in the proliferation cluster and those that we have identified as cell cycle-regulated, not all cell cycle-regulated genes are highly expressed in the more proliferative tumors. Some cell cycle-regulated genes are expressed in a heterogeneous pattern with no clear relationship to proliferation in the breast tumors studied (e.g., CDC6, CCNE1, ORC1, and BRCA1). Some cell cycle-regulated genes are expressed at lower levels in the more proliferative tumors (calponin 2, smoothelin, vinculin, and actin filament-associated protein). Many of the genes expressed at lower levels in tumors are genes with peak expression at M/G1 and have roles in cell-cell adhesion and regulation of the actin cytoskeleton. The strongly decreased expression of these genes in tumors is often associated with increased invasion, metastasis, and ultimately, poor prognosis (Rodriguez Fernandez et al., 1992; Rudiger, 1998; Engers and Gabbert, 2000). It is highly likely that the regulation of these genes is responsive to many other factors in addition to the cell cycle. Alternatively, the relatively low level of expression of some of these genes may actually be advantageous in the proliferative tumors.
The genes identified in this study include the molecular targets of several different classes of chemotherapeutic agents (antimetabolite targets TYMS, RRM1 and 2, and DHFR; tubulin targeted by antimicrotubule drugs, TOP1 and TOP2 inhibitors, etc.) (Ratain, 1997 ). The list of genes that are strongly expressed in proliferative tumors, especially those previously uncharacterized, may prove to be a useful source of additional drug targets of this kind.
In conclusion, it is important to note that we have extracted only a fraction of the information inherent in this dataset. The entire dataset is now freely available for any purpose (http://genome-www.stanford.edu/Human-CellCycle/HeLa/). We are confident that others will find it a useful source for further discoveries and insights into the cell cycle and cancer.
We acknowledge members of the Botstein and Brown laboratories for helpful discussions; Max Diehn, Ash Alizadeh, and Jennifer Boldrick for assistance in the collection of time points; Orly Alter and Olga Troyanskaya for discussions of analysis methods; the Stanford Functional Genomics Core Facility for the production of microarrays; and Tim Stearns and May C. Chen for critical reading of the manuscript. M.L.W. is supported by a National Research Service Award Postdoctoral Fellowship from the National Human Genome Research Institute (HG00220-02) and by funds from the Scleroderma Research Foundation. J.I.M. is a Howard Hughes Medical Institute Predoctoral Fellow. M.M.H. is supported by a grant from the American Cancer Society, Florida Division (F99FSU-3) with prior support from the National Institutes of Health (GM-46768). This work was supported by grants from the National Cancer Institute to D.B. and P.O.B. (CA-77097 and CA-85129). P.O.B. is an Associate Investigator of the Howard Hughes Medical Institute.
A complete data set for this article is available at www.molbiolcell.org. Article published online ahead of print. Mol. Biol. Cell 10.1091/mbc.02-02-0030. Article and publication date are at www.molbiolcell.org/cgi/doi/10.1091/mbc.02-02-0030.