|Home | About | Journals | Submit | Contact Us | Français|
Mammalian c-Myc is a member of a small family of three related proto-oncogenic transcription factors. c-Myc has an unusually broad array of regulatory functions, which include roles in cell cycle and apoptosis, a variety of metabolic functions, cell differentiation, senescence and stem cell maintenance. c-Myc modulates the expression of a very large number of genes, but the magnitude of the majority of the regulatory effects is only two-fold or less. c-Myc can both activate and repress the promoters of its target genes. Identification of genes directly regulated by c-Myc has been an enduring question in the field. We report here microarray expression profiling of a high resolution time course of c-Myc induction, using fibroblast cells in which c-Myc activity can be modulated from null to physiological. The c-Myc transcriptome data set presented is the largest reported to date with 4,186 differentially regulated genes (1,826 upregulated, 2,360 downregulated, 1% FDR). The gene expression patterns fit well with the known biological functions of c-Myc. We describe several novel findings and present tools for further data mining. Although the mechanisms of transcriptional activation by c-Myc are well understood, how c-Myc represses an even greater number of genes remains incompletely described. One mechanism involves the binding of c-Myc to other, positively acting transcription factors and interfering with their activities. We identified rapid-response genes likely to be direct c-Myc targets and analyzed the promoters of the repressed genes to identify transcription factors that could be targets of c-Myc repression.
c-Myc is a member of a small family of three closely related transcription factors, c-Myc, N-Myc and L-Myc. c-Myc is expressed in most cell types, while the other members show more tissue-restricted expression. Increased and/or deregulated expression of one of the myc genes has been found in a wide variety of humans cancers.1 In addition to the widely documented translocations and amplifications of the myc loci, deregulation can occur by a range of mechanisms, including indirect ones such as activation of upstream signaling. All combined, deregulation of the Myc signaling pathway has been estimated to affect as much as 70–80% of all human cancers.2,3
Genes of the myc family are conserved in metazoans, and the Drosophila and mammalian genes can functionally substitute for each other.4 The expression and activity of Myc proteins are intricately regulated, including transcriptional regulation, mRNA stability, translational regulation, protein modification and regulated protein degradation.3 Knockout of c-myc or N-myc in the mouse results in embryonic lethality.5 In normal cells, expression of c-Myc mRNA is correlated with proliferation, being generally low in quiescent cells and rapidly induced by growth factors. Overexpression of c-Myc leads to reduced growth factor requirements and strongly promotes proliferation as well as growth (accumulation of mass), while reduced expression extends the cell cycle (mostly in G1 phase) and slows down growth.6 Ectopic c-Myc expression can strongly antagonize differentiation, while in many other situations enforced expression leads to apoptosis. The latter stems from the ability of c-Myc to strongly potentiate a variety of pro-apoptotic signals and is believed to constitute a defense mechanism against its deregulated expression.7,8
The goal of comprehensively delineating c-Myc-regulated gene networks has faced considerable challenges. First, there is the sheer size of the c-Myc transcriptome: some 15–20% of all genes.2 Second, c-Myc is a weak effector, with typical regulation of its targets in the 2–3-fold range (which presents a sensitivity issue, with a significant danger of false negatives). Third, c-Myc can exert both activation and repression, using different biochemical mechanisms and targeting distinct subsets of genes. c-Myc upregulated genes promote cell cycle entry, cell growth, ribosome biogenesis and protein synthesis, biogenesis of mitochondria, energy production (such as glycolysis) and anabolic metabolism (such as synthesis of amino acids and nucleotides), while downregulated genes fall mainly in the categories of differentiation, apoptosis, cell adhesion and cytoskeleton biogenesis.3
All Myc proteins contain a C-terminal DNA-binding and dimerization domain. Sequence-specific DNA binding requires the partner Max.9 Myc-Max heterodimers bind the recognition sequence CA(C/T)GTG (E-box) and several related motifs.10 Activation of transcription, mediated by the N-terminal domain of c-Myc, involves a combination of three mechanisms: recruitment of histone acetyl transferases, recruitment of ATP-dependent chromatin remodeling complexes; and facilitation of promoter clearance and elongation.11 In contrast, repression of transcription by c-Myc is much less understood, appears to be independent of E-boxes and is believed to be mediated by the binding of c-Myc (or Myc-Max dimers) to other transcription factors and interfering with their activities.3,12 The repressive activity of c-Myc not only affects many genes (roughly equivalent to the number of upregulated genes), but is also vital for many of its biological functions, such as transformation.12 Consistent with this, chromatin immunoprecipitation (ChIP) experiments show that a significant fraction (some 40%) of c-Myc binding promoters lack E-box sequences.13
We have previously reported a cell culture model system in which c-Myc activity can be modulated from essentially null to only slightly above physiological with excellent temporal resolution.14,15 This contrasts with the great majority of other studies that examine the consequences of c-Myc overexpression in tumor cells. We have previously shown that many genes affected by c-Myc overexpression are not regulated during its normal physiological range of expression.14,15 In this communication, we report expression profiling of c-Myc target genes using a high resolution time course of c-Myc induction. To our knowledge this is the most extensive in vivo kinetic analysis of c-Myc activity and allowed us not only to determine c-Myc regulated patterns of gene expression with high confidence, but also to identify rapid-response genes likely to be direct targets. Finally, promoters of the repressed genes were analyzed to identify transcription factors that could be targets of c-Myc repression.
We used the HOMycER12 cell line.15 This cell line is a derivative of the c-myc−/− rat fibroblast cell line HO15.19,16,17 in which both copies of the c-myc gene were knocked out by gene targeting, and was further modified by introducing a transgene expressing the conditionally-active c-Myc-Estrogen Receptor fusion protein (MycER).18 The transgene is expressed constitutively, but activity of the MycER protein can be regulated using the ER ligand 4-hydroxytamoxifen (OHT). In the absence of OHT, the MycER protein is inactive and sequestered in the cytoplasm, and the addition of OHT elicits its activation and rapid translocation into the nucleus.
HOMycER12 cells were expanded in the absence of OHT, multiple dishes were treated with OHT, and samples were collected over a time course of 24 h (for details see Fig. 1A and Materials and Methods). Three independent biological replicates were performed and total RNA was analyzed using Affymetrix GeneChip Rat Exon 1.0 ST arrays. Probes were filtered and mapped to genes annotated in the Entrez Gene database (National Center for Biotechnology Information, NCBI). A gene was deemed differentially expressed if (1) it was expressed at any time point during the time course, (2) its expression at any time point was significantly different from its expression before OHT treatment and (3) its expression in OHT-treated cells was significantly different from its expression in control (vehicle-treated) cells. A PLIER score (see Materials and Methods) for each gene was determined by averaging the values of the three biological replicates. To detect differentially expressed genes a three-way ANOVA was applied with the following factors: time, treatment and experiment (each biological replicate and its associated controls were considered as one independent experiment). For each gene, the time factor p value was calculated based on differential expression between any time point (1 to 24 h) and the t = 0 time point; the treatment factor p value was calculated based on differential expression between treated and control samples (8, 16 and 24 h). The thresholds for these two sets of p values were set at a false discovery rate (FDR) of 1%.19 The experiment factor p value was not included in this filtering process. After this series of analyses, even with the very stringent cutoff of 1% FDR, 4,186 genes (19.6% of the 21,335 genes on the array) were deemed significantly differentially expressed after c-Myc induction.
The great majority of the changes to the transcriptome (99.7%) occurred within 5 h of c-Myc induction (Fig. 1B). The effects of c-Myc activity were remarkably rapid, with 90.6% of upregulated genes and 80.3% of downregulated genes showing a statistically significant initial response within the first 3 h. Clustering of c-Myc regulated genes according to their expression profiles (Fig. 1C) revealed two major clusters of upregulated genes (1,821 genes, clusters 2, 4) and three clusters of downregulated genes (2,358 genes, clusters 1, 3, 5). The optimum number of clusters to reveal distinct expression behavior was found to be nine. Smaller cluster numbers merged the major clusters, while larger cluster numbers did not separate the major clusters into additional clusters with clearly distinct expression behaviors. The majority of upregulated genes (1,643 genes, cluster 2) showed a sustained response and remained upregulated for the duration of the experiment, while a smaller cluster (178 genes, cluster 4) showed a transient response and returned to pre-c-Myc induction levels within 8 h. The down-regulated clusters were differentiated mainly by the kinetics of their response: slow (1,712 genes, cluster 1), intermediate (30 genes, cluster 5) and fast (616 genes, cluster 3). The latter are excellent candidates for genes directly repressed by c-Myc.
The long time frame, frequent time points, and three biological replicates resulted in a rich data set with high resolution and statistical significance (4,186 differentially regulated genes at 1% FDR). Subsets of this data set have already been mined in several recent publications.20,21 The entire data set has now been deposited in the NCBI Gene Expression Omnibus (GEO) database (accession GSE20550). In addition, the time course data for each of the 21,335 genes on the array can be viewed individually, in the format used in Figure 1C, at www.brown.edu/Project/Myc.
We subjected our data sets to Gene Ontology (GO) and Gene Set Enrichment (GSEA) analysis (Materials and Methods).22,23 The program GOstat was used to analyze lists of significantly upregulated and downregulated genes at each time point for overrepresentation of GO categories (Sup. Table 1), while GSEA was used to analyze preranked lists of all genes at each time point for enrichment of gene sets in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Fig. 2).24 The results of these two analyses were in broad agreement, and many of the identified categories correspond to known c-Myc biological functions such as protein translation, mitochondrial biogenesis, apoptosis and cell adhesion. Several signaling pathways were identified that have not been previously associated with c-Myc induction, such as the Janus kinase and signal transducers and activators of transcription (JAK-STAT) pathway, mitogen-activated protein kinase (MAPK) pathway, insulin signaling pathway and gonadotropin-releasing hormone (GNRH) signaling pathway. Surprisingly, the cell cycle and mitosis categories were overrepresented among c-Myc downregulated genes. Glycolysis, which has been reported to be upregulated by c-Myc,25 was not identified by either GO or GSEA analysis. Some of these findings are explored in more detail below.
HOMycER12 cells in the absence of OHT display the very long cell cycle characteristic of c-myc null cells,16 and induction of c-Myc activity with OHT increases both proliferation and S phase content to essentially normal levels.15 Surprisingly, the cell growth, proliferation and cell cycle GO categories were overrepresented in downregulated genes (Sup. Table 1). Similarly, GSEA showed a negative score for the KEGG cell cycle pathway, to which 19 genes contributed (Fig. 2). Our ANOVA analysis with 1% FDR deemed 37 out of the 87 genes in the KEGG cell cycle pathway to be significantly regulated, with 15 being upregulated and 22 downregulated (Fig. 3). The explanation for these discrepancies is that none of these analyses consider whether a gene that is being regulated is a positive or negative effector of the process. Taking this into account it becomes clear that the majority of the genes (26/37, Fig. 3) were regulated so as to promote cell cycle progression: 12 negative effectors were downregulated and 14 positive effectors were upregulated. Prominent examples were the upregulation of Cdk4 (cyclin dependent kinase 4), Cdk6 (cyclin dependent kinase 6), Ccne1 (cyclin E) and Cdc25a (cell division cycle 25 homolog A), and downregulation of Cdkn1b (cyclin dependent inhibitor p27Kip1), Cdkn2c (cyclin dependent inhibitor p18Ink4C) and Gadd45a (growth arrest and DNA-damageinducible α). Similarly, of the 19 genes that contributed to the negative GSEA score, eight were cell cycle inhibitors.
GO analysis indicated that many mitosis-related categories, including cellular component categories such as microtubule organizing center, and biological processes such as chromosome segregation and cytokinesis, were overrepresented in the down-regulated genes. In the GSEA analysis, four mitosis-specific genes contributed to the negative score and all were promoters of cell cycle progression. Of the 37 KEGG cell cycle pathway regulated genes (Fig. 3) 10 act in mitosis. Four of these were regulated to promote progression: Pttg, which encodes securin, was down-regulated, and three members of the anaphase promoting complex (Anapc5, Cdc23, Cdc16) were upregulated. Five genes were regulated to inhibit mitosis: Plk1 (polo-like kinase 1), Cdc20 (cell division cycle 20), Fzr1 (Cdc20-related 1, also known as Cdh1), Espl1 (separin) and Anapc2 (anaphase promoting complex 2).26–28 Given that Anapc2, a critical component of the minimal ubiquitin ligase module of the APC29 was downregulated, the overall effect of the observed changes is likely to be an increase in the stringency and fidelity of the mitotic checkpoints. Such a mechanism may be a useful counterbalance to the promotion of proliferation induced by c-Myc under physiological (non-cancer promoting) situations. The effect of c-Myc on mitosis has not been previously reported and warrants further investigation.
Glycolysis and the tricarboxylic acid (TCA) cycle were not implicated by either GO or GSEA analyses. Ldha (lactate dehydrogenase A) is one of the earliest reported c-Myc-induced genes30 and promotes glycolysis and lactate production even under aerobic conditions (the Warburg effect). In our data set, Ldha was induced by c-Myc but did not reach statistical significance. Other glycolytic genes were also reported to be activated by c-Myc overexpression:31 Slc2a1 (glucose transporter type 1, previously known as Glut1), Gpi (phosphoglucose isomerase), Gapdh (glyceraldehyde3-phosphate dehydrogenase), Pgam1 (phosphoglycerate mutase) and Eno1 (enolase α). In our data set, Slc2a1 was upregulated at early time points, Gpi was slightly downregulated, and the other genes were not significantly affected. In addition, we observed a slight downregulation of Aldoa (aldolase A) which was also noted in a previous study.31 With the exception of the glucose transporter, all of the encoded enzymes catalyze reversible reactions at (or very close to) equilibrium, and small changes in their expression would be unlikely to significantly affect the flux through the glycolytic pathway. We therefore examined the major regulatory steps of glycolysis, catalyzed by hexokinases and phosphofructokinases, which control the entry of metabolites into the pathway. Both Hk2 (hexokinase 2) and Pfkp (phosphofructokinase, platelet) were strongly upregulated by c-Myc (Fig. 4). Of the three hexokinases Hk2 is the major form in fibroblasts (Hk1 and Hk3 are brain and white-cell specific, respectively). All three phosphofructokinases are expressed in fibroblasts.
This example illustrates another limitation of analyses based solely on a statistical treatment of gene expression changes: while both GO and GSEA failed to identify glycolysis, a more detailed analysis based on enzymology indicated that the observed gene expression changes would in fact be very likely to increase the metabolic activity of the pathway. Since metabolism is a major target of c-Myc regulation, we devised a tool that combines the KEGG database, our expression data and the Cytoscape software platform for visualizing molecular interaction networks.32 For any chosen KEGG pathway, the output will display the enzymes (represented by gene symbols), their regulation by c-Myc in our data set, the metabolites and the direction of the enzymatic reactions. An example of the output (for the early steps of glycolysis) is shown in Figure 4; this the tool can be freely accessed and downloaded at www.brown.edu/Project/Myc.
Since the TCA cycle was not identified by GO and GSEA analyses, we examined the rate limiting enzymatic steps for possible effects using the Cytoscape tool. In this case, we did not find any significant gene expression changes (not shown). For the KEGG oxidative phosphorylation pathway only 9 out of 82 genes were regulated by c-Myc. GO analysis showed the inner mitochondrial membrane, in which oxidative phosphorylation complexes are embedded, to be enriched and upregulated. Closer examination, however, revealed that genes encoding components of the oxidative phosphorylation complexes (mitochondrial respiratory chain, proton-transporting ATP synthase) were not differentially regulated. Strongly enriched among c-Myc upregulated genes were the categories of mitochondrial biogenesis and organization, mitochondrial transcription, translation and protein import (Sup. Table 1). This correlates well with a previous study showing that c-Myc increases both mitochondrial mass and function.21 In summary, our analysis indicates that while c-Myc does not significantly regulate genes encoding components of the TCA cycle and oxidative phosphorylation complexes, it likely stimulates both carbon assimilation and energy metabolism by upregulating rate limiting steps of glycolysis and promoting mitochondrial biogenesis.
c-Myc regulated 18 of the 46 genes in the KEGG JAK-STAT signaling pathway, 17 of which were downregulated (Fig. 5). This included all the Jak kinases (Jak1, Jak2, Jak3) and most of the Stat proteins. Only one gene was upregulated, which encodes a negative regulator of the pathway, Pias2 (protein inhibitor of activated Stat2). However, some other negative effectors were down-regulated, including two additional Pias-encoding genes (Pias3 and Pias4) and Socs3 (suppressor of cytokine signaling 3). Thus, c-Myc appears to downregulate much of the core JAK-STAT signaling pathway, including both its positively and negatively acting components. To our knowledge this remarkable effect of c-Myc activity has not been noted previously. The effects of the JAK-STAT pathway are complex, and cell type and context dependent. Under physiological conditions of c-Myc activation, such as to promote proliferation of stem cells and transit amplifying cells, dampening of the JAK-STAT pathway may make these cells more resistant to differentiation signals as well as proinflammatory or proapoptotic stimuli.
Promoters were defined and selected as sequences spanning from 1.0 kbp upstream to 0.5 kbp downstream of transcription start sites (TSS). Position weight matrices (PWMs) of interest were obtained from the JASPAR Core database33 and the TRANSFAC database.34 Promoter sequences were scanned for enriched motifs using the Cis-eLement OVERrepresentation (Clover) program.35 The background set was comprised of the promoter sequences of all the genes on the microarray (Materials and Methods). All c-Myc-associated E-box PWMs were highly enriched in the promoters of genes upregulated in the first 3 h of c-Myc activation (Table 1). The enrichment of c-Myc E-boxes was lost in the genes that were upregulated after 3 h. Since c-Myc is believed to activate genes by binding to E-boxes in their promoters, most of the genes responding in this early window are likely to be direct targets. Interestingly, the very high p-values for all c-Myc associated PWMs in the promoters of downregulated genes, and especially those responding in the early 1–3 h window, indicates that E-boxes are significantly underrepresented in these promoters. This is consistent with c-Myc exerting its repressive activity independently of E-boxes, especially for the early responders, which are likely to be direct targets.
One way in which c-Myc has been shown to repress genes is by interfering with the activity of their positive effectors.36 We therefore analyzed the promoters of genes downregulated by c-Myc for enriched PWMs, among which one would expect to find the binding sites of transcription factors potentially targeted by c-Myc for interference. Such mediators of c-Myc repression should fulfill the following criteria: (1) the binding motifs should be overrepresented in the promoters of c-Myc repressed genes; (2) the genes encoding the mediators should be expressed in the cells under investigation; (3) the mediators should engage in protein-protein interactions with c-Myc. We focused our PWM analysis on promoters of genes that were downregulated in the first 3 h of the time course, since these are likely to be direct c-Myc targets. As before, the background set was comprised of the promoter sequences of all the genes on the microarray. For comparison, we also included in the analysis the promoters of genes upregulated in the same time period. The PWMs of nine transcription factors were found to be significantly overrepresented in the promoters of downregulated genes (Table 2).
c-Myc has been reported to interfere with the activities of five transcription factors: TFII-I (general transcription factor IIi, gene symbol Gtf2i),37 Yy1 (Yin Yang 1, gene symbol Yy1),38,39 Miz1 (Myc-interacting zinc finger protein 1, gene symbol Zbtb17),40 Sp1 (specificity protein 1, gene symbol Sp1),41 and NF-Y (nuclear transcription factor Y, also known as CCAAT box-binding transcription complex, encoded by Nfya, Nfyb and Nfyc genes).42 The PWMs for three of these factors, Yy1, Nfya and Sp1, were present in the JASPAR Core database. Both TFII-I and Miz1 bind to transcriptional initiator (Inr) elements, which are comprised of several degenerate pyrimidine rich sequences. PWMs (Sup. Fig 1B) were generated from the reported consensus elements for TFII-I (YAYTCYYY),37 and Miz1 (YYCAYYYYY),40 and included in the analysis of c-Myc repressed promoters. The PWMs for Miz1 and NF-Y were found to be significantly enriched (Table 2). The PWMs of TFII-I and Yy1 were underrepresented in the downregulated gene promoters; interestingly, they were also underrepresented in upregulated gene promoters. The PWM of Sp1 was not significantly different from the background in any promoter set; this is most likely because the element is very short and degenerate. The ability to identify two of the most well-characterized c-Myc repression mediators validates the methodology of our analysis.
Two PWMs for the Ets (v-ets erythroblastosis virus E26 oncogene homolog) family of transcription factors were overrepresented in the promoters of downregulated genes. There are 26 Ets family members present in the rat genome. Ets factors bind motifs that contain a GGAA/T core sequence, but differ significantly in their preference for flanking sequences.43 The enriched JASPAR PWMs were MA0098 (Ets1) and MA0080 (Sfpi1); both are six nucleotides long, which is not enough to differentiate among the multiple family members. We subsequently obtained 15 additional 8–16 nucleotide PWMs for a total of 10 Ets family members, either from the TRANSFAC database or from the literature (Sup. Fig 1C).44–47 The remaining Ets family members do not have defined PWMs. Only the two 6-nucleotide PWMs (MA0080, MA0098) were significantly enriched in the down-regulated genes. Some of the longer PWMs were overrepresented in both the downregulated and upregulated promoters relative to background. The longer Ets1 PWM (MY0005) and the PWM for the Ets factor Elf5 (MA0136) were only overrepresented in the upregulated promoters. The overrepresentation of the Ets core motif specifically in the downregulated promoters suggests that one (or more) of the many Ets family members whose binding motifs have not been determined may participate in mediating repression by c-Myc. Given that 10 Ets family members (Elf2, Elf4, Elk3, Ets2, Etv3, Etv4, Etv5, Etv6, Spdef, Fev) were expressed in HOMycER12 cells but lack defined PWMs, much exploration remains to be done in this area.
Additional candidates arising from our analysis are Mzf1 (myeloid zinc finger 1), Rreb1 (Ras responsive element binding protein 1), Klf4 (Kruppel-like factor 4) and Irf1 and Irf2 (interferon regulatory factors 1 and 2) (Table 2). Mzf1 and Rreb1 are strong candidates since their expression is either not affected by c-Myc, or is affected only outside the 3 h window that was analyzed. These factors have not been extensively studied, but in a general sense, the biological activities of Mzf1 and Rreb1 are anti-proliferative and promote differentiation.48–51 Interference with their activity would be expected to facilitate cell proliferation and inhibit differentiation, both of which are known to be mediated by c-Myc.
Klf4 is well known as a regulator that can act both as a repressor and activator,52,53 and both as a tumor suppressor and oncogene.54 In untransformed cells, Klf4 acts as an inhibitor of proliferation. One of its targets is p53, whose promoter is bound and repressed by Klf4. c-Myc interference with Klf4 would thus facilitate proliferation and, ultimately, sensitize the cells to p53-dependent apoptosis, one of the hallmarks of c-Myc overexpression. Importantly, c-Myc and Klf4, together with Oct4 and Sox2, can reprogram a variety of somatic cells into induced pluripotent stem cells (iPS).55 The simple model of c-Myc interference with Klf4 does not fit well in this context, and multiple lines of evidence indicate that the balance between c-Myc and Klf4 activities plays an important role in reprogramming.56 It is also interesting to note that Klf4 was transcriptionally repressed in our data set starting at the 3 h time point, adding further complexity.
Identification of genes regulated by c-Myc has been an enduring question in the field. c-Myc also regulates the expression of other transcriptional regulators, both activators and repressors, which in turn modulate the expression of their targets.57 To disentangle this complex web, it is critical to understand which genes are the primary or direct targets of c-Myc. Toward this end many have examined the regulatory effects of the conditional MycER fusion protein18 activated by OHT in the presence of cyclohexamide. The rationale was that cyclohexamide, by inhibiting translation, would block the expression of any secondary effectors downstream of c-Myc. In practice, this approach has been disappointing.15 First, cyclohexamide exerts very strong effects on the steady-state levels of numerous mRNAs, and these alone are often of greater magnitude than those elicited by c-Myc. Second, the MycER protein (like c-Myc itself) is unstable, and preventing its resynthesis dampens the regulatory effects, especially with respect to downregulated genes.
Another approach, which we have taken here, is to collect samples for expression profiling in a time course after activation of MycER with OHT. While this approach does not provide direct evidence of direct action, it also does not suffer from the strong perturbations of cellular physiology caused by cyclohexamide. To maximize the resolution of this analysis we collected numerous samples at early time points. The expression profiling was performed in an immortalized but non-tumorigenic rat fibroblast cell line that had been genetically engineered such that c-Myc activity can be regulated with OHT from essentially null to only slightly above physiological.15 The long time course, frequent time points and three biological replicates resulted in a data set with very high resolution and statistical significance, allowing us to define 4,186 differentially regulated genes at 1% FDR. Of these, 1,826 were upregulated and 2,360 were downregulated (Fig. 1). Remarkably, 90.6% of the upregulated genes and 80.3% of the downregulated genes showed an initial response within the first 3 h of c-Myc activation (OHT addition).
The identification of a positive c-Myc target in our experimental regimen would require activation of the c-Myc protein, activation of the target gene promoter and accumulation of the encoded mRNA to a level that could be detected by the microarray. Our data indicate that this process is clearly detectable in 1 h and peaks at 2 h after c-Myc induction, with only relatively few genes responding at the 3 h time point (Fig. 1B). In contrast, for a negative target, in addition to repression of transcription, the steady-state levels of the mRNA would have to decay sufficiently to be detectable by the microarray. Accordingly, the kinetics of repression were observed to lag by approximately 1 h, with very few genes being first detected at 1 h, and the peak of detection being maximal and roughly equivalent at 2 and 3 h (Fig. 1B). Since the proliferation of the cultures is very slow at these early time points (doubling time >40 h), dilution due to cell division would be minimal, implying that c-Myc repressed genes tend to have unstable mRNA.
For indirect regulation, such as that imposed by a transcription factor regulated by c-Myc, the process would be significantly longer and entail the additional steps of translation and transport of the intermediate factor into the nucleus. It is very unlikely that all this, as well as the ultimate regulation of the target gene, could be achieved in a 3 h window, especially for downregulated genes. In agreement, we found that promoter sequences of genes upregulated in the first 3 h were highly enriched for E-boxes, whereas this enrichment was lost for promoters upregulated at later times (Table 1). Based on all these considerations we conclude that genes responding within the first 3 h of c-Myc activation should be highly enriched for direct targets.
The upregulation of target gene expression is mediated by the binding of Myc-Max complexes to E-boxes. These mechanisms have been extensively studied and are relatively well understood. In contrast, although c-Myc represses an equivalent (in our study an even greater) number of genes, our knowledge of these processes is still quite incomplete. One mechanism involves the binding of c-Myc to other, positively acting, transcription factors and interfering with their activities, typically not at the DNA-binding level but by antagonizing the interaction of the transcription factors with their co-activators.58 Whether E-boxes are required for repression has been controversial.36 Our data show that all E-box PWMs found in the TRANSFAC database are significantly negatively enriched (under-represented) in the promoters of c-Myc repressed genes at all time points (Table 1). This is especially the case for the early responders, which are most likely to be the direct targets, arguing against a link between E-boxes and repression. This does not rule out, however, that interaction with non-canonical E-boxes may stimulate the repressive activity of the c-Myc complexes.36
The most well studied transcription factor that is the target of c-Myc interference is Miz1 (encoded by the Zbtb17 gene).36,40,58 Other well documented factors include Sp1 and NF-Y. Given the wide range of protein-protein interactions that c-Myc engages in,59 it is widely believed that additional transcription factors targeted by c-Myc remain to be identified.60 We therefore took advantage of our database of genes likely to be directly repressed by c-Myc and performed a bioinformatic analysis of their promoters for enriched transcription factor binding sites (Table 2). Our approach was validated by clearly identifying Miz1 and NF-Y (the PWM of Sp1 was not significantly enriched above background in any promoter set because of its very short and degenerate sequence). The novel factors identified by our analysis are one (or more) Ets family members, Mzf1 (myeloid zinc finger 1), Rreb1 (Ras responsive element binding protein 1), Klf4 (Kruppel-like factor 4) and Irf1 and Irf2 (interferon regulatory factors 1 and 2). It is interesting to note that Mzf1, Rreb1 and Klf4, like Miz1 and Sp1, are zinc finger transcription factors.
Another mechanism of c-Myc repression is mediated by small noncoding RNAs known as microRNAs (miRNAs). A single miRNA can destabilize and/or inhibit the translation of multiple mRNAs, and recent work has shown that c-Myc dramatically affects miRNA expression.61 miRNAs are typically synthesized by RNA Pol II as larger polycistronic precursors from which the mature miRNAs are processed. Several miRNA precursor genes, such as the miR-17-92 cluster, have been shown to contain E-boxes in their promoters and to be directly activated by c-Myc61,62 The miRNAs processed from these precursors have been shown to promote a variety of c-Myc functions, such as stimulation of proliferation, survival and metabolism, by targeting the mRNAs that restrain these pathways.61 For example, miR-17, miR-20a, miR-93 and miR-106b have been shown to target the mRNA of the cyclin dependent kinase inhibitor p21Cip1 (encoded by the Cdkn1a gene), a known target of c-Myc repression.61,63 Hence, Cdkn1a can be downregulated by c-Myc at two levels: transcriptionally by interfering with Sp1 activation of Cdkn1a the promoter,41 and posttranscriptionally by inducing the expression of miRNAs that destabilize the Cdkn1a mRNA.63
The direct upregulation by c-Myc of miRNAs that subsequently target mRNAs for degradation is consistent with the rapid response of c-Myc repressed genes that we have observed. In the majority of reported studies of miRNA regulation, c-Myc was overexpressed in the context of cancer promotion.61,64 We have previously shown that a significant number of genes affected by c-Myc overexpression are not regulated during its normal physiological range of expression.14,15 The relative proportions of repressed genes in our data set that are regulated transcriptionally and posttranscriptionally thus remains to be determined.
As is the case with protein-coding genes, c-Myc also represses a large fraction of its miRNA targets.65 Initial reports indicate that these effects are likely to be mediated by both transcriptional and posttranscriptional mechanisms.66 The identification of transcription factors that are targets of c-Myc interference will thus remain a topic of considerable importance for understanding the repression of both coding and non-coding genes. In this context, it will be of interest to identify miRNA-targeted mRNAs in our data set so that the search for PWMs can be limited to transcriptionally repressed promoters, as well as to search for the PWMs we already identified in the promoters of miRNA transcripts.
The c-Myc transcriptome data set presented here is the largest reported to date (4,186 differentially regulated genes at 1% FDR), and the analysis of the gene expression patterns fits well with the known biological functions of c-Myc. To allow data-mining by the community we have not only deposited the primary data in the GEO database (accession GSE20550), but provide additional tools on our website (www.brown.edu/Project/Myc/). Finally, we have uncovered several novel aspects of c-Myc activity, such as downregulation of mitosis and the JAK-STAT pathway, which are discussed more fully in the Results.
HOMycER12 cells were cultured under exponential growth conditions in Dulbecco's modified Eagle's medium (Invitrogen) supplemented with 10% calf serum (Hyclone) at 37°C and 5% CO2. HOMycER12 cells were previously shown by immunoblotting to express the MycER protein at levels only some 2–3-fold above those present in parental c-myc+/+ cells.15 Cells were plated at 20% confluence 24 h prior to the addition of OHT (248 nM in ethanol) or an equal volume of ethanol for mock-treated control cells. The time points at which OHT-treated and mock-treated cells were collected are shown in Figure 1A. Total RNA was harvested using the TRIZOL reagent (Invitrogen) according to the manufacturer's instructions. The entire experiment was performed on three separate occasions to generate three independent biological replicates. RNA samples were processed and hybridized to Affymetrix GeneChip Rat Exon 1.0 ST arrays using protocols supplied by the manufacturer.
The Probe Logarithmic Intensity Error (PLIER) algorithm, from the Affymetrix Power Tools (APT) software package, was used to generate gene and exon expression scores. Probes were filtered and remapped to rat genes with assigned Entrez ID numbers according to the tables provided by the Microarray Laboratory of the Molecular and Behavioral Neuroscience Institute, University of Michigan.67 Probes were remapped by blasting their sequences to the Rattus norvegicus genome and probes mapping to multiple locations in the genome that do not belong to homologous genes were removed. Exon probe sets were redefined as sets of at least three probes mapping to the same exon. Genes whose PLIER scores did not exceed a value of 48 for any time point (the first quartile of the mean scores over all time points) were considered below the expression threshold. To detect differentially expressed genes a 3-way analysis of variance (ANOVA) was applied with the following factors: time, treatment and experiment (each biological replicate with controls was considered an independent experiment). The experiment factor was included since we have evidence that scores from the same experiment are correlated. For many genes, the expression plot of each replicate exhibited a similar trend but with differential expression scores. Furthermore, after the modeling, 78.5% (19,651 genes) of the genes that are expressed (25,025 genes) had a replicate p value of less than 0.05, supporting the previous observation. The PLIER scores (Yijk) were assumed to follow a log-normal distribution, hence we modeled the data as log(Yijk) = µ + αi + βj + γk + εijk, where αi represents the effect of i-th time, βj represents treatment, γk represents the effect of replicates and εijk represents the noise term. For each gene, the time factor p value was calculated based on differential expression between any time point (t ≥ 1 h) and the t = 0 time point; the treatment factor p value was calculated based on differential expression between treated and control samples at time points when both treated and control samples were collected (8, 16 and 24 h). The thresholds for these p values were set to a false discovery rate (FDR) of less than 1% according to Storey and Tibshirani.19 The experiment factor p value was not included in this filtering process.
The differentially expressed genes were clustered according to their expression patterns over time. The metrics used in the hierarchical clustering algorithm were the correlations between the vectors obtained by collecting all the measurements of each gene, i.e., if V1 is the vector of all measurements of gene 1 and V2 for gene 2, d(1,2) = corr(V1,V2). Linkage was obtained by considering the group average. Each gene trend was obtained by subtracting the mean of all points at all times from each expression value in the original time series and dividing by the standard deviation. The gene trends in each cluster were standardized by computing the average of all genes and replicates in the cluster.
The overrepresented GO categories in each gene group were identified using the online program GOstat.22 We used the RGD database and Benjamini and Hochberg FDR correction options in the program and listed all genes found on the Affymetrix GeneChip Rat Exon 1.0 ST arrays as the reference or background group. The upregulated and downregulated genes at each time point were analyzed as distinct groups. The GOstat algorithm counts and compares the number of genes of each category in the test group and the number of the same category in the reference group. For each GO category, the analysis returns a p-value (approximated using Fisher's exact test), which indicates the probability that the observed counts of each GO category could have resulted from random distribution of the term between the test and reference groups.
We also used GSEA to identify biological pathways or processes that were regulated by c-Myc. GSEA is a program that computes the statistical significance of the differences between two biological states on defined gene sets.23,68 The preranked list for each time point was generated by ranking genes in order of their fold change with respect to t = 0. Genes closer to the top of the preranked list are upregulated genes, those in the middle are unregulated genes, while those closer to the bottom of the list are downregulated genes. The GSEA algorithm then identifies gene sets whose genes are preferentially at the top or at the bottom of the ranked gene list. The GSEA system also makes use of a Monte Carlo strategy in order to test whether the observed up or downregulation is statistically significant, and not just a consequence of random noise in the data. We generated gene sets from rat-specific pathways in the KEGG database.24 Gene sets with a q-value score of less than 0.05 and normalized enrichment score (NES) of less than −1.5 or more than 1.5 were deemed significantly regulated by c-Myc at that time point.19
The Entrez gene ID numbers of selected genes were mapped onto the RefSeq database. The promoter sequences (1,000 bp upstream to 500 bp downstream of the transcription start site) of each gene set were scanned for enriched transcription factor binding motifs using the Cis-eLement OVERrepresentation (Clover) program.35 Position weight matrices (PWMs) of motifs that are found in vertebrates from the JASPAR Core Database33 and three c-Myc binding motif PWMs from the TRANSFAC database were included in the analysis34 A raw score and a p value are returned for each PWM. The raw score quantifies the presence of each motif in the test sequences and is computed by repeated averaging of likelihood ratios, while the p value indicates the probability of obtaining a raw score or greater by chance and is computed by repeated random sampling of the background sequence set. The background set was comprised of the promoter sequences of all the genes on the microarray.
We thank the Brodsky lab for technical support. This work was supported in part by NIH/NIGMS grant R01 GM040960 and NIH/NIA grant R37 AG016694 to J.M.S. N.N. was supported in part by NIH/NIA grant K25 AG028753. G.C. was supported in part by grants SHEILA and EXCALIBUR from the Instituto Nazionale di Fisica Nucleare (INFN) of Italy. The Genomics Core Facility in the Laboratories for Molecular Medicine at Brown University was supported in part by the COBRE award from the NIH/NCRR P20 RR015578. J.M.S. is a Senior Scholar of the Ellison Medical Foundation.