We describe the results of a genome-wide analysis of human cells that suggests that most protein-coding genes, including most genes thought to be transcriptionally inactive, experience transcription initiation. We found that nucleosomes with H3K4me3 and H3K9,14Ac modifications, together with RNA polymerase II, occupy the promoters of most protein-coding genes in human embryonic stem cells. Only a subset of these genes produce detectable full-length transcripts and are occupied by nucleosomes with H3K36me3 modifications, a hallmark of elongation. The other genes experience transcription initiation but show no evidence of elongation, suggesting that they are predominantly regulated at post-initiation steps. Genes encoding most developmental regulators fall into this group. Our results also identify a class of genes that are excluded from experiencing transcription initiation, at which mechanisms that prevent initiation must predominate. These observations extend to differentiated cells, suggesting that transcription initiation at most genes is a general phenomenon in human cells.
Studies of individual genes have revealed that transcriptional regulation can occur at many levels, but these regulatory mechanisms fall into two general models (Ptashne, 1986; Krumm et al., 1993; Kuras and Struhl, 1999; Saunders et al., 2006). In one model, transcription is regulated primarily at the initiation step, when DNA-binding transcription factors recruit the transcriptional machinery. In the other, the key regulatory mechanisms occur subsequent to initiation and involve transcript elongation or stability. These models are not mutually exclusive but make different predictions about the chromatin state of the genome, which can be influenced by transcription.
Transcription initiation and elongation are both associated with specific chemical modifications of the histone components of nucleosomes that reside in the promoter regions and transcribed portions of active protein-coding genes (Jenuwein and Allis, 2001; Turner, 2002; Sims et al., 2004). Among the modifications associated with transcription initiation, the histone H3 lysine 4 trimethyl (H3K4me3) mark occurs in nucleosomes found in the promoter regions of actively transcribed genes (Bernstein et al., 2002; Santos-Rosa et al., 2002; Ng et al., 2003; Schneider et al., 2004; Schubeler et al., 2004; Pokholok et al., 2005). The histone methyltransferase (HMT) responsible for H3K4me3 modification in yeast, the Set1 complex, is recruited to the 5′ end of actively transcribed genes by interacting with the RNA polymerase II (Pol II)-associated PAF complex (Miller et al., 2001; Krogan et al., 2003a; Laribee et al., 2005). Recruitment of mammalian homologs of the Set1 complex to promoters may occur by a similar mechanism (Tenney and Shilatifard, 2005). Acetylation of histone H3 lysine 9 and 14 (H3K9,14Ac) also occurs in promoter-associated nucleosomes at actively transcribed genes (Liang et al., 2004; Schubeler et al., 2004; Bernstein et al., 2005; Pokholok et al., 2005). This is catalyzed by several promoter-associated histone acetyltransferases (HATs), including Gcn5, TAF1, and p300/CBP (Sterner and Berger, 2000; Strahl and Allis, 2000).
Among the modifications associated with transcription elongation, the histone H3 lysine 36 trimethyl (H3K36me3) modification occurs at nucleosomes in the 3′ portion of the transcribed region of actively transcribed genes (Strahl et al., 2002; Bannister et al., 2005; Pokholok et al., 2005). This may be a consequence of recruitment of the HMT responsible for this histone modification by the elongating RNA polymerase (Li et al., 2002, 2003; Xiao et al., 2003).
The relationship between transcriptional activity and chromatin state in embryonic stem (ES) cells is of particular interest. ES cells are uniquely capable of differentiating into almost any cell type (Mayhall et al., 2004; Pera and Trounson, 2004). ES cell chromatin proteins have been reported to be in a hyperdynamic state (Meshorer et al., 2006). Recent studies have shown that a small set of ES cell genes that are transcriptionally inactive but poised for expression contain histones whose modifications are characteristic of both active and inactive genes (Bernstein et al., 2006). However, histone modifications have yet to be studied throughout the genome of ES cells.
We report here that the promoter-proximal nucleosomes of at least 75% of all protein-coding genes in human ES cells are trimethylated at histone H3K4. This is surprising because this particular histone modification was thought to be associated with only actively transcribed genes. Further study revealed that transcription initiation and associated histone modifications occur at most promoters in both ES and differentiated cell types. Thus, most genes that have been considered inactive because they produce no detectable transcript nonetheless experience transcription initiation. This initiation is likely responsible for the modified nucleosome landmark we observe at most promoters.
We used chromatin immunoprecipitation coupled to DNA microarray analysis (ChIP-chip) (Boyer et al., 2005; Lee et al., 2006) to determine how nucleosomes with H3K4me3 are distributed across the entire genome in human embryonic stem (hES) cells (Figure 1; see also the Supplemental Experimental Procedures in the Supplemental Data available with this article online). Previous studies had suggested that H3K4me3-modified nucleosomes occur near the transcription start sites of actively transcribed genes (Bernstein et al., 2002; Santos-Rosa et al., 2002; Ng et al., 2003; Schneider et al., 2004; Schubeler et al., 2004; Pokholok et al., 2005). As expected, we found H3K4me3 enriched at sites of transcription initiation (Figure 1A; Table S1). Nearly all sites of H3K4me3 enrichment were within 1 kb of known or predicted transcript start sites, with maximal enrichment immediately downstream of the start site (Figure 1B; Supplemental Experimental Procedures).
Previous studies have estimated that only 30%–45% of genes have detectable mature transcripts (Sato et al., 2003; Brandenberger et al., 2004a; Su et al., 2004), so we were surprised to find that 74% of all annotated promoters were enriched for H3K4 methylation in hES cells at a high confidence level (Figure 1C; Table S2). Using slightly less stringent criteria (2-fold enrichment; see Supplemental Experimental Procedures), we found that 79% of all protein-coding genes have promoter-proximal nucleosomes enriched for H3K4me3 modification (Figures 1D and 1E; Table S3). This was confirmed using a smaller array that tiles all known human promoter regions (Supplemental Experimental Procedures; Table S4). These results led us to believe that a large fraction of genes for which no transcript has been detected nonetheless have promoter-proximal nucleosomes enriched for H3K4me3 modification.
Since H3K4me3 enrichment occurs at a surprisingly large fraction of known and predicted transcription start sites, we reexamined how H3K4me3 methylation is related to transcription. As expected, nucleosomes with H3K4me3 occurred immediately downstream of the transcription start site of actively transcribed genes (Figures 2A–2C) (Bernstein et al., 2002; Santos-Rosa et al., 2002; Ng et al., 2003; Schneider et al., 2004; Schubeler et al., 2004; Pokholok et al., 2005). Nucleosomes with H3K4me3 also occurred immediately downstream of the transcription start site of many genes that are transcriptionally inactive in ES cells. For example, cytokine receptors (IFNAR1), potassium channels expressed in neural tissue (KCNH1), and genes essential for adipose tissue function (UCP1) were all found to be occupied by nucleosomes modified at H3K4me3 (Figures 2D–2F). Transcripts for these genes have not been detected in ES cells based on DNA microarray and massively parallel signature sequencing (MPSS) data (Sato et al., 2003; Abeyta et al., 2004; Brandenberger et al., 2004a; Wei et al., 2005). The signals observed for histone H3K4me3 were typically lower (about 3-fold) at the inactive genes than at active genes but were substantially above background and located at the same position relative to the transcription start site.
To investigate the relationship between gene transcription and H3K4me3 modification genome-wide, we used three approaches. First, we identified genes whose transcripts in ES cells were consistently called “present” or “absent” in Affymetrix expression data (Sato et al., 2003; Abeyta et al., 2004) and averaged enrichment signals for these genes to create composite histone H3K4me3 profiles for the present and absent genes (Figure 2G). The composite profiles show that, for both gene classes, histone H3K4me3 signals span the 3 kb region surrounding the transcript start site and peak just downstream of the transcript start site. The level of the peak signal for H3K4me3 in the absent class of genes was about 1/3 of that for the present class. Second, we identified genes whose transcripts in ES cells were detected or not detected based on MPSS data (Brandenberger et al., 2004a; Wei et al., 2005) and created a composite histone H3K4me3 profile for the two classes (Figure 2H); the results were very similar to those obtained with microarray expression data. Finally, we used a compendium of microarray expression data for 79 different human cells and tissues to identify genes whose ES cell transcript expression levels were in the top 10%, middle 10%, and bottom 10% relative to all other cells (Su et al., 2004) and created a composite histone H3K4me3 profile for the three classes (Figure 2I). The results confirm that there is a positive correlation between gene transcript levels and histone H3K4me3 signal yet show that histone H3K4me3 is observed even at the promoters of genes for which there is no evidence of transcription in ES cells.
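The composite profiles described above amount to a position-wise average of enrichment ratios across the genes in each expression class. The following is a minimal sketch of that averaging, assuming each gene's signal has already been sampled at the same offsets relative to its transcript start site. The function name and the toy values are hypothetical and are not data from this study; the actual procedure is described in the Supplemental Experimental Procedures.

```python
def composite_profile(profiles):
    """Average per-gene enrichment ratios position by position.

    profiles: list of equal-length lists, one per gene, each holding the
    enrichment ratio at fixed offsets relative to the transcript start site.
    """
    if not profiles:
        raise ValueError("need at least one gene profile")
    n = len(profiles[0])
    if any(len(p) != n for p in profiles):
        raise ValueError("all profiles must share the same offsets")
    # zip(*profiles) walks the genes column by column (one column per offset).
    return [sum(col) / len(profiles) for col in zip(*profiles)]


# Toy values only (hypothetical): three offsets around the start site.
present = [[1.0, 9.0, 5.0], [2.0, 12.0, 4.0]]
absent = [[1.0, 3.0, 2.0], [1.0, 4.0, 1.0]]

present_profile = composite_profile(present)
absent_profile = composite_profile(absent)
```

Comparing the peak of each composite (for example, `max(absent_profile) / max(present_profile)`) is one simple way to express the kind of relative peak height reported in the text.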
To ensure that the observation that histone H3K4me3 occupies most promoters in human ES cells was not an experimental or analytical artifact, we conducted a series of control experiments. A ChIP-chip experiment with nonspecific IgG produced only background signals that did not show a general enrichment of promoter regions (Figure S4; Supplemental Experimental Procedures). A ChIP-chip experiment with antibody against the transcription factor E2F4, with strength of signal comparable to H3K4me3, showed enrichment of DNA sequences at <10% of genes (Supplemental Experimental Procedures). Finally, an experiment with antibody specific for the histone H3K36me3 modification, which is associated with transcription elongation, showed enrichment of DNA sequences well downstream of the transcription start site at genes that produce RNA transcripts in ES cells (see below).
We considered the possibilities that the presence of histone H3K4me3 at most promoters is either due to transcription initiation events that occur at many more genes than suggested by measurements of mature transcripts or is instead independent of transcription. The former model predicts that additional histone modifications associated with transcription initiation, as well as the initiating form of Pol II itself, can be captured at the promoters of more protein-coding genes than are actively transcribed based on RNA transcript data (Sato et al., 2003; Brandenberger et al., 2004b; Su et al., 2004), while the latter would expect no such correlation.
We carried out ChIP-chip experiments in human ES cells using arrays tiling 21,223 known transcription start sites with an antibody specific for H3K9,14Ac, a pair of histone acetylation modifications associated with transcription initiation (Figures 3A–3E). Acetylation of histone H3K9,14 is critical for the recruitment of transcription factor IID (TFIID), an initiating step in transcription (Agalioti et al., 2002), and should therefore be associated with genes that have recently initiated transcription. Nucleosomes with histone H3K9,14Ac were enriched at the promoters of nearly 70% of genes, including both transcriptionally active and inactive genes (Table S2). Nearly all of the promoters (>95%) acetylated on H3K9 and H3K14 were also enriched for H3K4me3.
We then carried out ChIP-chip experiments in human ES cells using an antibody directed against the initiating form of Pol II (8WG16). The initiating form of Pol II occupied promoters both at genes that produce detectable transcripts and at genes that do not, albeit at lower levels (Figures 3F–3J; Table S2). We compared the set of genes whose promoters were occupied by Pol II with those occupied by histone H3K4me3. As expected, most promoters (98%) occupied by Pol II were also occupied by histone H3K4me3, whereas Pol II occupied few genes (2.3%) that lack histone H3K4me3. The fraction of all promoters enriched for the initiating form of Pol II (52% based on a 2-fold enrichment ratio) was somewhat smaller than that occupied by nucleosomes with H3K4me3 modification (79%) or by nucleosomes with H3K9,14Ac modifications (70%). This may be due to greater longevity of the nucleosome modifications than Pol II at these promoters or due to the higher signal produced by ChIP with histone antibodies than ChIP with RNA polymerase antibodies (Supplemental Results and Discussion). Nevertheless, our evidence suggests that the majority of genes experience transcription initiation in human ES cells.
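The comparison of promoter sets above reduces to set arithmetic on lists of enriched genes. The sketch below illustrates the two kinds of fraction quoted in the text: the share of Pol II-occupied promoters that also carry H3K4me3, and the share of all promoters enriched for a given mark. The function names and gene identifiers are hypothetical placeholders, not the analysis code used in this study.

```python
def fraction_also_marked(pol2_genes, h3k4me3_genes):
    """Fraction of Pol II-occupied promoters that are also H3K4me3-enriched."""
    pol2 = set(pol2_genes)
    if not pol2:
        raise ValueError("no Pol II-occupied promoters given")
    return len(pol2 & set(h3k4me3_genes)) / len(pol2)


def fraction_enriched(enriched_genes, all_genes):
    """Fraction of all assayed promoters that appear in the enriched set."""
    universe = set(all_genes)
    if not universe:
        raise ValueError("no promoters given")
    return len(set(enriched_genes) & universe) / len(universe)


# Toy gene lists (hypothetical identifiers, not data from the paper).
all_genes = [f"g{i}" for i in range(1, 11)]
h3k4me3 = all_genes[:8]
pol2 = all_genes[:5]

overlap = fraction_also_marked(pol2, h3k4me3)
pol2_share = fraction_enriched(pol2, all_genes)
h3k4me3_share = fraction_enriched(h3k4me3, all_genes)
```

Note that the two fractions use different denominators, which is why a high overlap among Pol II-occupied promoters is compatible with Pol II covering a smaller share of all promoters than H3K4me3 does.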
If the promoters of most genes in human ES cells experience transcription initiation but only a portion of these genes produce complete transcripts, then we would expect to find nucleosome modifications associated with elongation only at those genes that produce detectable transcripts. H3K36me3 modification has been associated with elongating Pol II and is enriched within the body of transcriptionally active genes (Krogan et al., 2003b; Xiao et al., 2003; Pokholok et al., 2005). Indeed, we found that H3K36me3 occurs almost exclusively downstream of promoters that produce detectable transcripts in ES cells (Figures 4A–4C). Additionally, the total number of genes enriched for H3K36me3 is significantly lower than the number enriched for modifications associated with transcription initiation (Figures 4D and 4E). Histone H3 lysine 79 dimethyl (H3K79me2) modification has also been shown to be associated with active transcription (Schubeler et al., 2004; Morillon et al., 2005; Pokholok et al., 2005). The ChIP-chip results for H3K79me2 were similar to those for H3K36me3 (Figure 4F). Furthermore, we performed ChIP-chip experiments with antibodies that recognize both initiating and elongating Pol II and found that elongating Pol II signals occur almost exclusively downstream of promoters that produce detectable transcripts (Figure 4G). These data suggest that while transcription initiation occurs at both active and inactive genes, elongation is strongly correlated only with the set of genes that are active based on transcript detection.
We tested the possibility that many inactive genes occupied by histone H3K4me3 actually produce full-length transcripts that were not detected previously by either MPSS or microarray methods (Figures 4H–4I). Using quantitative RT-PCR, we assayed transcripts for 38 genes enriched in H3K4me3 but for which transcripts had not been detected previously (Figure 4H, green bars) and, as a positive control, 4 genes selected at random from the list of genes enriched in H3K4me3 for which transcripts had been previously detected (Figure 4H, blue bars). Standard curves derived from known quantities of three independent cDNAs were used to estimate the number of transcripts per cell (Figure S5). While the randomly selected positive control genes had transcript levels averaging 69 molecules per cell, the genes in the inactive set had transcript levels averaging 0.7 molecules per cell. The levels of transcripts in the positive control group ranged from 2 to 224 molecules per cell, whereas the inactive set had 0.0002 to 6.5 molecules per cell. These data show that some of the genes occupied by histone H3K4me3 and whose transcripts had not been detected previously by either MPSS or microarray methods are indeed transcribed, but that little or no transcript accumulates for most of these genes. Interestingly, the levels of histone H3K4me3, the modification associated with transcription initiation, were similar for the genes in both the active and inactive classes (Figure 4I). Thus, there is a class of genes for which there is ample evidence of transcription initiation (promoter occupancy by histone H3K4me3, histone H3K9,14Ac, and initiating Pol II) but for which there is little evidence of transcript elongation (histone H3K36me3, histone H3K79me2, and elongating Pol II) or transcript accumulation (RT-PCR).
For genes whose promoters were enriched in H3K4me3 but that did not accumulate detectable amounts of mRNA (Figure 4H), we attempted to detect RNA molecules originating from the extreme 5′ end of the gene using real-time qPCR. We designed probes to sequences within the 5′-most 70 nucleotides of genes expressed at <1 transcript per cell. The results demonstrate that RNA species containing sequences at the extreme 5′ end can be detected for essentially all genes tested (Figure S6). We conclude that many genes that do not produce full-length mRNAs nonetheless experience transcript initiation.
We searched for features common to the genes that lacked nucleosomes enriched with histone H3K4me3. Nearly half of the loci lacking histone H3K4me3 contained clusters of ≥3 homologous genes that encode conserved proteins including olfactory receptors, taste receptors, keratins, apolipoproteins, interleukins, and leukocyte antigens (Figure 5). In some instances, gene clusters lacking H3K4me3 enrichment extended across dozens of genes covering Mb regions of the genome (Figures 5A and 5B). For example, only 4 of the 309 olfactory genes represented on the DNA microarray showed >2-fold H3K4me3 enrichment (Table S3). For these clusters of olfactory receptor genes, there is evidence that a single enhancer controls selective transcription initiation at a single gene within the cluster (Lomvardas et al., 2006). It is possible, then, that the loci containing clustered homologous genes lacking histone H3K4me3 are generally regulated by enhancer regions that impose stricter control of initiation than typically observed at other genes.
ES cells possess unique properties, including pluripotency, self-renewal, and hyperdynamic chromatin (Meshorer and Misteli, 2006). Numerous reports have suggested that silent genes in ES cells are especially poised to become active upon differentiation (Azuara et al., 2006; Bernstein et al., 2006; Lee et al., 2006). We therefore considered the possibility that widespread transcription initiation at inactive genes is limited to ES cells. To test this, we profiled histone H3K4me3 modification across the genome in primary hepatocytes and B cells (Figure 6; Tables S7 and S8). The results show that the histone H3K4me3 modification is evident both at genes that produce detectable transcripts and at those that do not (Figures 6A and 6C). The promoters of 78% of genes in hepatocytes and 75% of genes in B cells contain nucleosomes with at least a 2-fold enrichment of histone H3K4me3 (Table S3). We conclude that nucleosomes with histone H3K4me3 occupy the majority of all active and repressed genes in both ES cells and differentiated cells of multiple lineages.
If the presence of nucleosomes with H3K4me3 at the promoters of inactive genes is associated with transcription initiation in these differentiated cells, we would expect to detect the initiating form of Pol II at many of those promoters. Indeed, we found that the initiating form of Pol II is enriched at 42% and 36% of inactive promoters in B cells and hepatocytes, respectively (Figures 6E–6H).
The majority of all genes contain H3K4me3-modified nucleosomes in both ES and differentiated cells, but many promoters do show cell-type-specific differences in H3K4 trimethylation. The promoters of about 25% of genes show differential H3K4 methylation in the three cell types we examined (Figure 7; Supplemental Results and Discussion). These genes frequently have cell-type-specific expression patterns and cell-type-specific function. For example, alcohol dehydrogenase genes do not contain H3K4me3-modified nucleosomes in either ES cells or B cells but do contain methylated nucleosomes in hepatocytes. Only in B cells are the leukocyte receptor cluster genes occupied by H3K4me3 nucleosomes and transcribed (Figure 7C). This cell type specificity of H3K4 methylation suggests that transcription initiation is tightly regulated at this subset of genes and supports our hypothesis that initiation is tightly regulated at many loci containing clustered homologous genes (Figure 5).
Our results suggest that most protein-coding genes in human cells, including most genes thought to be transcriptionally inactive, experience the hallmarks of transcription initiation. We found that nucleosomes with H3K4me3 and H3K9,14Ac modifications, together with the initiating form of Pol II, occupy the promoters of approximately 75% of protein-coding genes in human embryonic stem cells, but only about half of these produce detectable transcripts. Evidence for transcription initiation without transcript accumulation was also found for a similar fraction of genes in hepatocytes and B cells. These results suggest that protein-coding genes fall into three groups of regulatory behavior (Figure S7). The actively transcribed genes are occupied by nucleosomes with histone modifications that are hallmarks of both initiation and elongation, and these generally produce detectable transcripts. A second group experiences transcription initiation without evidence of transcript elongation or accumulation. Within this population of genes, which includes most genes encoding developmental regulators, regulation of events subsequent to transcription initiation must play important roles in preventing production or accumulation of transcripts. The third group consists of genes that are excluded from experiencing transcription initiation, where mechanisms that prevent transcription initiation must predominate.
Previous studies have emphasized that nucleosomes with histone H3K4me3 are associated with actively transcribed genes in various eukaryotes (Bernstein et al., 2002; Santos-Rosa et al., 2002; Ng et al., 2003; Schneider et al., 2004; Schubeler et al., 2004; Pokholok et al., 2005). Similarly, histone acetylation has been associated with actively transcribed genes (Sterner and Berger, 2000). The genome-wide results from human cells described here confirm that nucleosomes with H3K4me3 and H3K9,14Ac are present at actively transcribed genes and show that the relative levels of these modifications increase as transcript levels increase (Figure 2; Figure 3; Figure S6). Among the genes that are actively transcribed in ES cells based on MPSS and microarray data (Sato et al., 2003; Abeyta et al., 2004; Brandenberger et al., 2004a; Wei et al., 2005), 90% contain significantly enriched levels of H3K4me3 and H3K9,14Ac, with the highest levels of enrichment found at the promoters of genes that are most highly expressed (Figure 2I and data not shown). Our results also show that nucleosomes with histone H3K4me3 and H3K9,14Ac occupy promoter regions, with maximum occupancy just downstream of the transcription start site (Figure 1; Figure 2; Figure 3; Figure 6).
A striking result from this study is that among the 55% of genes that are transcriptionally inactive by either Affymetrix or MPSS data in ES cells (Table S5; Sato et al., 2003; Abeyta et al., 2004; Brandenberger et al., 2004a; Wei et al., 2005), at least half contain significantly enriched levels of H3K4me3 and H3K9,14Ac at their promoters but show no evidence of the elongation-associated H3K36me3 or H3K79me2 modifications (Table S2). Several previous studies have noted the presence of H3K4me3 at certain inactive genes. Nucleosomes with H3K4me3 have been reported at inactive genes within the β-globin locus (Schneider et al., 2004), X-linked genes (Brinkman et al., 2006), certain transcriptional regulatory genes (Azuara et al., 2006; Bernstein et al., 2006; Brinkman et al., 2006), and other genes (Kim et al., 2005). Because H3K4me3 was assayed at no more than 2% of all genes in these studies, it was not evident that occupancy of promoters of transcriptionally inactive genes by nucleosomes with H3K4me3 is a frequent and global phenomenon. A recent genome-wide study of H3K4me3 in T cells did note that 60% of genes were enriched for H3K4me3 but concluded that the modification was associated with actively transcribed genes and certain classes of genes such as those that are rapidly inducible (Roh et al., 2006). Our results show that nucleosomes with H3K4me3 are associated with the promoters of more than half of the transcriptionally inactive genes in three cell types, that these inactive genes are not limited to those poised for activation (Table S2; Supplemental Results and Discussion), and that this association is likely due to transcription initiation without transcript accumulation.
The co-occupancy of Pol II with H3K4me3- and H3K9,14Ac-modified nucleosomes at genes without detectable levels of transcription strongly suggests that a large fraction of human genes experience transcription initiation without transcript completion. Indeed, previous studies have also found Pol II localized to inactive genes (Soutoglou and Talianidis, 2002; Radonjic et al., 2005). We do not detect Pol II at all promoters occupied by H3K4me3 nucleosomes, and although this may simply be due to less robust signals for Pol II and the noise associated with microarray-based methods, it is also possible that some promoters acquire H3K4me3 nucleosomes in a manner that does not depend on transcription initiation. Evidence in budding yeast indicates that H3K4me3 modification occurs subsequent to Pol II recruitment and Ser5 phosphorylation of the Pol II C-terminal domain (Santos-Rosa et al., 2002; Krogan et al., 2003a; Ng et al., 2003; Pokholok et al., 2005). Similar studies in mammalian cells show that components of H3K4 methyltransferase complexes interact with the Ser5-phosphorylated form of Pol II, indicating that transcription initiation coincides with H3K4me3 deposition (Hughes et al., 2004). Because multiple H3K4 methylases exist in mammalian cells, it is not clear whether there are multiple mechanisms involved in their recruitment to promoters, and thus it is possible that some promoters acquire H3K4me3 nucleosomes in a manner that does not depend on transcription initiation.
There are at least two general mechanisms that may be responsible for transcription initiation without transcript accumulation. In one, the transcription apparatus is recruited to promoters and initiates transcription but is prevented from efficient transcript completion. This may be due to transcriptional pausing, poor processivity, or abortive initiation, all of which have been described as mechanisms that prevent initiating polymerase from elongating efficiently through specific genes (Conaway et al., 2000; Dvir, 2002; Sims et al., 2004; Saunders et al., 2006). A stable, paused Pol II molecule is found just downstream of the transcription initiation site at the Hsp70 gene in Drosophila (Lis, 1998) and it has been estimated that as many as 20% of Drosophila genes experience some level of transcriptional pause (Law et al., 1998). Paused RNA polymerases have also been observed at mammalian c-Myc, N-myc, and c-Fos promoters (Saunders et al., 2006). Poorly processive RNA polymerase molecules can initiate transcription at the HIV long terminal repeat but do not complete transcription in the absence of the TAT transcriptional activator (Laspia et al., 1993; Wei et al., 1998; Parada and Roeder, 1999). Abortive initiation, which involves premature termination of transcription in promoter regions, is best understood in prokaryotes but has also been described in eukaryotes (Dvir, 2002). Transcriptional pausing, poor processivity, or abortive initiation may all contribute to the observations described here. We do not detect Pol II molecules at all promoters that are occupied by nucleosomes with histone H3K4me3, so it is unlikely that a paused Pol II molecule is stably associated with the promoter regions of all of these inactive genes. Instead, our evidence is consistent with a less stable association of Pol II at promoters of genes that experience transcription initiation without transcript accumulation.
A second general mechanism that may be responsible for transcription initiation without transcript accumulation is posttranscriptional degradation. It is possible that pre-mRNA molecules are actually transcribed from the inactive genes but are rapidly degraded or accumulate at very low levels that are difficult to detect. Some transcripts may be produced and rapidly degraded as part of the process of gene silencing, as has been demonstrated for pericentromeric heterochromatin in Schizosaccharomyces pombe (Kato et al., 2005; Buhler et al., 2006). It is also possible that some low-abundance transcripts are targeted for degradation by miRNAs; indeed, mRNA species that are not detected in specific tissues are more likely to contain target sequences for tissue-specific miRNAs, which may help control any “leaky expression” (Farh et al., 2005). If very low levels of transcripts are produced from the inactive genes, it may be difficult to detect the mRNA or nucleosomes enriched for H3K36me3. Experiments with genome-wide tiling arrays show evidence for small amounts of transcript at almost all genes in numerous cell types (Cheng et al., 2005). While we were able to detect some amount of full-length transcript for almost all the inactive genes we tested using RT-PCR, the majority of these likely exist at levels much lower than one transcript per cell (Figure 4H).
While the majority of genes have promoters enriched for nucleosomes with histone H3K4me3, one class of genes provides a notable exception. Clusters of highly related genes often do not have nucleosomes with H3K4me3 and H3K9,14Ac modifications, nor are they occupied by the initiating form of Pol II (Figure 5; Figure 7). Many of the clustered genes lacking H3K4me3 are expressed in a tissue-specific fashion (Su et al., 2004; Table S3; Supplemental Results and Discussion). The most numerous members of this class of genes are the olfactory receptors, in which there is evidence that a single enhancer controls selective transcription initiation at only one of many such genes (Lomvardas et al., 2006). The fact that the clusters of olfactory genes do not experience transcription initiation and do not contain nucleosomes enriched in H3K4me3 further supports the view that the presence of H3K4me3 is a hallmark of transcription initiation at most genes.
In summary, we have found that transcription initiation and histone H3K4me3 modification occur at the promoters of most protein-coding genes in human cells. This may serve to create a chromatin structure at transcription initiation sites that is more accessible to transcription factors and signaling kinases than at much of the rest of the genome, perhaps providing landmarks that produce a much smaller search space for these regulatory molecules. Such landmarks might facilitate reprogramming of gene expression during differentiation.
Human H9 ES cells (WiCell) were cultured as described (Boyer et al., 2005). Hepatocytes were obtained directly from perfused human liver at the University of Pittsburgh through the Liver Tissue Procurement and Distribution System (S. Strom). REH cells (ATCC) were cultured in RPMI 1640 media containing 10% FBS. Cells were crosslinked as described (Boyer et al., 2005).
ChIP was combined with DNA microarray analysis as described (Boyer et al., 2005). The antibodies used here were specific for H3K4me3 (Abcam ab8580), hypophosphorylated RNA polymerase II (8WG16), H3K9,14Ac (Upstate 06-599), H3K36me3 (Abcam ab9050), H3K79me2 (Abcam ab3594), total histone H3 (Abcam ab1791), E2F4 (Santa Cruz Biotech sc-1082), and nonspecific IgG (Santa Cruz Biotech, sc-2043). The design of the oligo-based arrays, which were manufactured by Agilent Technologies, is described in detail in Supplemental Experimental Procedures. In addition to simple ratio measurements mentioned in the text, a whole-chip error model was used to calculate confidence values from the enrichment ratio and signal intensity of each probe (probe p value) and of each set of three neighboring probes (probe-set p value). Probe sets with significant probe-set p values (p < 0.001) and significant individual probe p values were judged to be bound at high confidence (see Supplemental Experimental Procedures for additional information). Bound regions were assigned to an Entrez gene ID if they were within 1 kb of the transcription start site from one of four genomic databases, RefSeq, MGC, Ensembl, or UCSC Known Gene, where the transcript had been assigned to the Entrez gene (transcript data downloaded from UCSC Genome Browser, http://genome.ucsc.edu/cgi-bin/hgGateway, using NCBI build 35). Enrichment ratio plots were expressed as a sliding average of three neighboring probes.
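Two of the steps described above, the three-probe sliding average used for enrichment ratio plots and the assignment of bound regions to genes within 1 kb of a transcription start site, can be illustrated with a short sketch. This is not the analysis code used in this study: the function names, the handling of probes at the ends of a tiled region, the inclusive 1 kb boundary, and the toy coordinates are all assumptions made for illustration, and the error model that produces the p values is not reproduced here.

```python
def sliding_average(ratios, window=3):
    """Centered moving average over neighboring probes.

    Assumption: probes at either end are averaged over the neighbors
    that exist, since the text does not specify edge handling.
    """
    half = window // 2
    smoothed = []
    for i in range(len(ratios)):
        lo = max(0, i - half)
        hi = min(len(ratios), i + half + 1)
        smoothed.append(sum(ratios[lo:hi]) / (hi - lo))
    return smoothed


def assign_to_genes(bound_positions, tss_by_gene, max_dist=1000):
    """Map each bound position to the genes whose start site lies within max_dist bp.

    Assumption: positions and start sites are on the same chromosome and
    the distance cutoff is inclusive.
    """
    assignments = {}
    for pos in bound_positions:
        assignments[pos] = sorted(
            gene for gene, tss in tss_by_gene.items() if abs(pos - tss) <= max_dist
        )
    return assignments


# Toy values (hypothetical, not data from the paper).
smoothed = sliding_average([1.0, 4.0, 7.0, 10.0])
hits = assign_to_genes(
    [5600, 7000, 9900, 10001],
    {"GENE_A": 5000, "GENE_B": 9000},
)
```

In this toy example, the positions at 5600 and 9900 fall within 1 kb of a start site and are assigned to a gene, whereas those at 7000 and 10001 are left unassigned.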
Gene expression data were collated from H1 ES cells (Sato et al., 2003); H9, HSF1, and HSF6 ES cells (Abeyta et al., 2004); and 79 differentiated human cell and tissue types (Su et al., 2004) and were analyzed as described in detail in Supplemental Experimental Procedures and in Lee et al. (2006). These data are summarized in Table S5. Absent/present (AP) calls for hepatocytes and REH cells were derived from liver and CD19+ peripheral blood samples from Su et al. (2004).
Standard curves were produced for three independent cDNAs (ANG, GPR143, and KCNJ3; OriGene Technologies) using 100 ng–0.0001 pg of vector subjected to quantitative real-time PCR in duplicate (TaqMan predeveloped gene expression assays; Applied Biosystems). The standard curves were then averaged to produce a composite standard curve (cycle threshold [Ct] versus molecules present) to which all test measurements were compared (Figure S5). A total of 6 × 10⁶ hES cells, cultured as above, were enriched from feeder murine embryonic fibroblasts by trypsinization. RNA was extracted from hES cells by the TRIzol method (Invitrogen) followed by precipitation. Total RNA was reverse transcribed with the Invitrogen SuperScript III First-Strand Synthesis System using both oligo(dT) and random hexamer primers to produce cDNA. cDNA was amplified in duplicate using TaqMan predeveloped gene expression assays as described by the manufacturer in an Applied Biosystems 7500 Real-Time PCR Thermocycler. cDNA abundance was determined by measuring the point during cycling when amplification could first be detected (auto Ct value determined by Applied Biosystems Prism 7500 software) rather than the endpoint of the 40-cycle reaction. The measured Ct value was used to calculate the estimated number of transcripts present in the test sample by relative quantitation against the composite standard curve. Genes determined to be active or inactive by Affymetrix and MPSS methods were measured using unique TaqMan predeveloped gene expression assays for each species.
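The quantitation against a composite standard curve can be illustrated with a short sketch. It assumes, as is conventional for real-time PCR but not stated explicitly above, that each standard curve is a least-squares line of Ct against log10(molecules), and that the composite curve is obtained by averaging the slopes and intercepts of the individual lines. The function names and the numbers in the usage note are hypothetical and are not data from this study.

```python
import math
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def composite_curve(curves: List[Tuple[Sequence[float], Sequence[float]]]) -> Tuple[float, float]:
    """Average per-standard fits of Ct versus log10(molecules).

    curves -- one (molecule counts, Ct values) pair per standard cDNA.
    Assumption: averaging is done on the fitted slopes and intercepts.
    """
    fits = [fit_line([math.log10(m) for m in mols], cts) for mols, cts in curves]
    slope = sum(f[0] for f in fits) / len(fits)
    intercept = sum(f[1] for f in fits) / len(fits)
    return slope, intercept


def molecules_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert Ct = slope * log10(N) + intercept to estimate N."""
    return 10 ** ((ct - intercept) / slope)
```

With two made-up standards measured at 10², 10⁴, and 10⁶ molecules (Ct values 33, 26, 19 and 32, 26, 20), `composite_curve` returns a slope of -3.25 and an intercept of 39, and a test sample with a Ct of 26 is then estimated by `molecules_from_ct` at about 10⁴ molecules.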
We thank M. Mitalipova for ES cell expertise; R. Kumar, E. Jacobsen, S. Johnstone, R. Jenner, S. McCuine, E. Herbolsheimer, and the Whitehead Institute Center for Microarray Technology for technical assistance; D. Odom and S. Strom for hepatocytes; and D. Gifford, J. Lis, J. Wysocka, T. Lee, and the Young lab for helpful discussions. This work was supported by NIH grant HG002668. R.A.Y. consults for Agilent Technologies.
Raw Data: Further information, methods, and raw data can be found at http://web.wi.mit.edu/young/hES_chromatin/.
Supplemental Data: Supplemental Data include Supplemental Experimental Procedures, Supplemental Results and Discussion, Supplemental References, ten figures, and eight tables and can be found with this article online at http://www.cell.com/cgi/content/full/130/1/77/DC1/.
Accession Numbers: All microarray data discussed herein are available at ArrayExpress under the accession designation E-TABM-277.