Set2p, which mediates histone H3 lysine 36 dimethylation (H3K36me2) in Saccharomyces cerevisiae, has been shown to associate with RNA polymerase II (RNAP II) at individual loci. Here, chromatin immunoprecipitation-microarray experiments normalized to general nucleosome occupancy reveal that nucleosomes within open reading frames (ORFs) and downstream noncoding chromatin were highly dimethylated at H3K36 and that Set2p activity begins at a stereotypic distance from the initiation of transcription genome-wide. H3K36me2 is scarce in regions upstream of divergently transcribed genes, telomeres, silenced mating loci, and regions transcribed by RNA polymerase III, providing evidence that the enzymatic activity of Set2p is restricted to its association with RNAP II. The presence of H3K36me2 within ORFs correlated with the “on” or “off” state of transcription, but the degree of H3K36 dimethylation within ORFs did not correlate with transcription frequency. This provides evidence that H3K36me2 is established during the initial instances of gene transcription, with subsequent transcription having at most a maintenance role. Accordingly, newly activated genes acquire H3K36me2 in a manner that does not correlate with gene transcript levels. Finally, nucleosomes dimethylated at H3K36 appear to be refractory to loss from highly transcribed chromatin. Thus, H3K36me2, which is highly conserved throughout eukaryotic evolution, provides a stable molecular mechanism for establishing chromatin context throughout the genome by distinguishing potential regulatory regions from transcribed chromatin.
In eukaryotic cells, the accessibility of the DNA template is influenced by chromatin structure. For example, in Saccharomyces cerevisiae, transcription factors have been shown to bind to consensus sequences upstream of genes in preference to identical consensus sequences that occur within the coding sequences of transcribed genes (24, 31). Likewise, transposons preferentially insert into promoter regions (33), and the double-strand breaks required for meiotic recombination in S. cerevisiae occur preferentially in gene promoters rather than in the coding regions (12, 55). Chromatin context therefore is a major determinant of where on the genomic DNA template many biological phenomena will occur.
Regulation of accessibility to the DNA template is likely to be mediated in large part through differential regulation of nucleosome occupancy. Promoter regions of S. cerevisiae exhibit reduced nucleosome occupancy genome-wide (2, 26), and these differences in nucleosome occupancy are important for promoter accessibility (47). Furthermore, in S. cerevisiae, promoter and nonregulatory chromatin can be biochemically fractionated, indicating that those regions have distinct physical properties (36). Nucleosomes can be moved or displaced from specific genomic regions by several general mechanisms, including nucleosome-remodeling complexes like SWI/SNF and RSC (34), binding of activators to DNA (3, 4, 35), transcriptional elongation by RNA polymerase II (RNAP II) (20, 26, 46), and inherent properties of DNA sequence (47). Template accessibility and nucleosome occupancy can also be mediated by posttranslational modification of the N-terminal histone tails, most notably acetylation (27, 48). Although chromatin context may be defined in part by regional differences in histone modifications, no chromatin mark has been shown to correspond specifically to coding or regulatory regions throughout the genome. Here, we present evidence that dimethylation of histone H3 at lysine 36 (H3K36me2), which is mediated by the methyltransferase Set2p (25, 51), may provide such a mark.
Set2p interacts with the C-terminal domain (CTD) of RNAP II (22, 29, 56), and this interaction is regulated by the phosphorylation state of the CTD. Serine 5 (Ser5) of the CTD repeat is phosphorylated by Kin28p during initiation of transcription, while serine 2 (Ser2) and Ser5 are phosphorylated by Ctk1p during elongation (8, 17, 19, 30). Set2p associates preferentially with Ser2/Ser5 phosphorylated repeats of the RNAP II CTD, and deletion of CTK1 abolishes H3K36me2 (22, 56). Set2p-RNAP II interactions are also dependent on the Paf1 complex (Paf1p, Rtf1p, Cdc73p, Ctr9p, and Leo1p) (22), which also associates with RNAP II (21, 40, 49). This and other biochemical data suggest that Set2p associates with RNAP II specifically during transcription elongation (18, 22, 29, 45, 56). Chromatin immunoprecipitation (ChIP) assays followed by quantitative PCR on a few selected loci have supported this assertion, showing that H3K36me2 is generally restricted to the transcribed regions of RNAP II-regulated genes (1, 18, 22, 45).
While there is strong evidence that Set2p is associated with elongating polymerase, the physiological functions of Set2p and H3K36me2 are still unknown. Evidence suggesting a function for Set2p in transcriptional elongation comes from results showing either sensitivity or resistance of set2Δ strains to the elongation inhibitor 6-azauracil. These phenotypes are similar to those exhibited by strains defective for genes encoding elongation factors like Chd1p, Iswi1p, and Fkh1p (22, 28, 29, 45, 57). A role in transcriptional elongation is also supported by synthetic genetic interactions between set2Δ and deletions of all members of the Paf1 complex, the chromodomain factor Chd1p, a putative elongation factor Soh1p, and the Bre1p or Lge1p components of histone H2B ubiquitination complex (22). However, whatever role Set2p plays in elongation is either not essential or redundant, since set2Δ strains are viable and, in many backgrounds, exhibit very mild phenotypes.
To further elucidate the cellular function of H3K36me2, we determined its pattern of distribution throughout the S. cerevisiae genome. We performed additional experiments to determine how the pattern of H3K36me2 changes in response to a change in global transcriptional state and the relationship between the H3K36me2 mark and nucleosome stability. H3K36me2 demarcates the structurally distinct regulatory and nonregulatory regions of yeast genomic chromatin and may serve as an indicator of chromatin context.
Throughout the paper, we use a recently proposed uniform histone modification nomenclature (53). Thus, for example, “H3K36” refers to histone H3 lysine at residue 36, and “H3K36me2” refers to dimethylation of that residue.
For H3K36me2 and histone H3 ChIPs, strain AS4 (MATα trp1-1 arg4-17 tyr7-1 ade6 ura3) was used (50). For histone H4 ChIPs, a previously described myc-tagged H4 strain constructed in strain UCC1111 [MATα ade2::his3-Δ200 leu2-Δ0 lys2-Δ0 met15-Δ0 trp1-Δ63 ura3-Δ0 adh4::URA3-TEL (VII-L) hhf2-hht2::MET15 hhf1-hht1::LEU2 pRS412 (ADE2 CEN ARS)-HHF2-HHT2] was used (37, 38). Unless otherwise described, yeast was grown to an optical density of 0.8 to 1.0 at 600 nm with shaking at 30°C in 100 ml of yeast extract-peptone-dextrose media (1% yeast extract, 2% peptone, 2% dextrose).
Antibodies against histone H3 lysine 36 dimethylation have been described previously (51) and were derived from Upstate (catalog no. 07-369). myc-tagged antibodies were also obtained from Upstate (catalog no. 05-419). The rabbit histone H3 antiserum was obtained from Abcam, Inc. (AB1791), and was raised in rabbits using a peptide corresponding to amino acids 124 to 135 (CGIQLARRIRGERA) of human histone H3.
Peptides (KSAPSTGGVKKPHRYKPGTGK-BIOTIN) in which the residue corresponding to H3K36 (underlined) was either mono-, di-, or trimethylated were resuspended in double-distilled H2O (10 μg/μl) and serially diluted in Tris-buffered saline (TBS) (150 mM NaCl, 10 mM Tris, pH 7.6). Aliquots of 100-μl peptide-TBS solution were spotted onto polyvinylidene difluoride membranes by using a Bio-Rad dot blot apparatus. Membranes were washed in TBS and then blocked in a solution of 2.5% (wt/vol) Carnation nonfat dry milk in TBS-Tween 20 (0.1% Tween 20 in TBS) for 10 min prior to incubation with a 1:10,000 dilution of the specified antibody for 2 h at room temperature. Membranes were washed with TBS-Tween 20 for 10 min three times, incubated with anti-rabbit horseradish peroxidase-conjugated immunoglobulin G for 2 h at room temperature, and then washed again for 10 min three times prior to detection using ECL-Plus from Amersham.
ChIP assays were performed as described previously (23). Briefly, whole-cell extracts were prepared from 1% formaldehyde-fixed wild-type and set2Δ cells by using lysis buffer (50 mM HEPES-KOH, pH 7.5, 300 mM NaCl, 1 mM EDTA, 1% Triton X, and 0.1% sodium deoxycholate) and sonicated to shear the chromatin (0.25- to 1-kb range). Immunoprecipitation was performed with anti-H3K36me2, anti-myc, or anti-H3. After cross-link reversal at 65°C, DNA was extracted by using a QIAGEN PCR purification kit according to the manufacturer's instructions.
ChIP-enriched DNA and reference DNA in all experiments were amplified as described previously (5). Briefly, two initial rounds of DNA synthesis with T7 DNA polymerase using primer 1 (5′-GTTTCCCAGTCACGATCNNNNNNNNN-3′) were followed by 25 cycles of PCR with primer 2 (5′-GTTTCCCAGTCACGATC-3′). Cy3-dUTP or Cy5-dUTP was then incorporated directly with an additional 25 cycles of PCR using primer 2. Microarray hybridizations were performed using standard procedures (16). The arrays were scanned with a GenePix 4000 scanner, and data were extracted with GenePix 5.0 software. Data were normalized such that the median log2 ratio value for all quality elements on each array equaled zero, and the median of pixel ratio values was retrieved for each spot. Only spots of high quality by visual inspection, with at least 50 pixels of quality data (regression R² of >0.6) and for which intensity of the reference signal was strong (>350 U), were used for analysis. Arrayed elements that did not meet these criteria on at least half of the arrays were excluded from analysis. All data were log transformed before further analysis. For normalization with the nucleosome occupancy data, the median log2 ratio values of H4-myc ChIP were subtracted from the median H3K36me2-ChIP ratio values. Unless otherwise noted, all data presented are nucleosome occupancy normalized in this way. While many methods of bulk nucleosome normalization are possible, all must contend with the inherent difficulties of combining ChIP data sets produced with two different antibodies (6). The method used here is the simplest and provides a more realistic representation of the modification pattern than do unnormalized data. We provide all raw data (see below) so that readers may apply their preferred normalization method.
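The occupancy normalization described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names, the dictionary layout (array element name mapped to log2 ratio), and the toy values are all hypothetical, and the spot-quality filters are assumed to have been applied beforehand.

```python
from statistics import median

def median_center(log2_ratios):
    """Shift one array's log2 ratios so that their median equals zero."""
    m = median(log2_ratios.values())
    return {spot: value - m for spot, value in log2_ratios.items()}

def median_across_replicates(arrays):
    """Per-element median across replicate arrays.

    Elements present on fewer than half of the arrays are dropped.
    """
    spots = set().union(*arrays)
    merged = {}
    for spot in spots:
        values = [a[spot] for a in arrays if spot in a]
        if 2 * len(values) >= len(arrays):
            merged[spot] = median(values)
    return merged

def occupancy_normalize(k36me2, h4):
    """Subtract bulk-nucleosome (H4-myc) log2 ratios from H3K36me2 log2 ratios."""
    return {spot: k36me2[spot] - h4[spot] for spot in k36me2 if spot in h4}

# Hypothetical toy data: two H3K36me2 replicates and one H4-myc array.
k36_arrays = [
    {"ORF_A": 1.5, "PROM_A": -0.5, "UTR_A": 1.0},
    {"ORF_A": 2.0, "PROM_A": 0.0, "UTR_A": 1.5},
]
h4_arrays = [{"ORF_A": 0.25, "PROM_A": -0.5, "UTR_A": 0.0}]

k36 = median_across_replicates([median_center(a) for a in k36_arrays])
h4 = median_across_replicates([median_center(a) for a in h4_arrays])
normalized = occupancy_normalize(k36, h4)
```

Because the data are in log2 space, the subtraction corresponds to dividing the H3K36me2 ratio by the bulk-nucleosome ratio for the same array element.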
Open reading frames (ORFs) and intergenic regions from yeast (S288C) were PCR amplified and printed on polylysine-coated glass slides by using a robotic arrayer as described previously (16). ORFs were generally represented by PCR products that extended from start codon to stop codon. Elements representing intergenic regions generally included all DNA between annotated ORFs, with the fragments divided such that PCR products were no longer than 1.5 kb.
All raw microarray data and images are available to the public through the UNC microarray database (https://genome.unc.edu/). Data are also available in Table S2 of the supplemental material and through GEO (accession number GSE2991).
As the initial step in our goal to determine the genome-wide location of H3K36me2 in S. cerevisiae, we characterized a polyclonal antibody directed against H3K36me2. While the general specificity of this antibody for methylation at H3K36 had been previously verified (18, 57), its precise specificities to the different possible H3K36 methylation states (mono-, di-, and trimethylation) were unknown. To determine the specificity of this antibody for H3K36 methylation, we performed dot blots against peptides that were either mono-, di-, or trimethylated at the residue corresponding to H3K36 (see Materials and Methods). As shown in Fig. 1A, the antiserum was specific to dimethylation of H3K36 and did not cross-react with any of the related modifications.
Having verified the specificity of this antiserum, we performed ChIP experiments using extract from wild-type strains. To assess the relative abundance of genomic fragments enriched by the ChIP, samples were RNase treated and DNA was amplified and labeled fluorescently. In parallel, total genomic DNA was prepared from input extract, RNase treated, amplified, and labeled with a different fluorescent marker. The two samples were then analyzed by comparative hybridization to DNA microarrays. The microarrays used in this study cover the entire yeast genome, including the coding and noncoding regions, at approximately 1-kb resolution (see Materials and Methods). H3K36me2 ChIP-microarray (ChIP-chip) experiments were performed with a total of 12 independent wild-type yeast cultures (see Table S1 in the supplemental material for details). As a control, ChIP-chip experiments were performed from each of eight independent extracts in which H3K36me2 was eliminated by deletion of the SET2 gene. We found that H3K36me2 ChIPs enriched chromatin corresponding to ORFs relative to chromatin from genomic regions upstream of genes (Fig. 1B).
In S. cerevisiae, the noncoding regions downstream of two convergently transcribed genes are almost always completely transcribed, often on both strands, by the converging polymerases (15). We found that these regions, which correspond to 3′ untranslated regions (UTRs), were enriched by H3K36me2 ChIPs at a level equal to or greater than the enrichment observed at ORFs (Fig. 1B). To confirm that our ChIPs were reflections of H3K36me2 levels, we performed ChIP experiments using extracts from set2Δ strains. Very little DNA was recovered from these ChIPs, and analysis of the DNA that was recovered revealed none of the specific patterns described above (Fig. 1B). We therefore interpret the efficiency of DNA recovery at each locus after H3K36me2 ChIP to reflect relative H3K36me2 levels. The evidence presented thus far supports the hypothesis that regions of the genome transcribed by RNAP II are enriched for H3K36me2.
In further support of the hypothesis that H3K36me2 is restricted to transcribed regions, the lowest levels of H3K36me2 were found in chromatin upstream of two divergently transcribed genes (“double promoters”), which is not expected to be transcribed by RNAP II (Fig. 1B). On the other hand, “single promoters” are expected to be partially transcribed since they contain the 3′ UTR of the upstream gene. As predicted, single promoters exhibit a level of enrichment lower than that observed for ORFs and 3′ UTRs but higher than that observed for double promoters (Fig. 1B). These experiments provide evidence that dimethylation of histone H3 at lysine 36 is absent from regions of the genome that are not transcribed by RNAP II.
Nucleosome occupancy is generally lower in noncoding regions upstream of genes than in ORFs (2-4, 26, 43, 52). Thus, we wondered if the pattern we observed with the H3K36me2-specific antiserum was a reflection, at least in part, of general nucleosome occupancy. To ensure that our results were specific to the H3K36me2 modification, we normalized our H3K36me2 distribution data to general nucleosome occupancy. We prepared extracts from yeast cells in which the only source of histone H4 was tagged with the myc epitope and performed ChIP assays using anti-myc antibodies. Histone H4 ChIPs were performed on five independent yeast cultures. Consistent with published data (26), results of the histone occupancy ChIPs revealed that nucleosomes were more enriched in the coding region of genes than in intergenic regions (Fig. 1B). Indistinguishable results were obtained with nucleosome ChIP-chips using an antibody specific to the C terminus of histone H3 (data not shown). In parallel, H3K36me2 ChIPs were performed using the same extracts. Even without normalization, the qualitative differences between the distribution of H3K36me2 and general nucleosome occupancy indicated that the H3K36me2 pattern was indeed distinct. Specifically, noncoding regions downstream of convergently transcribed genes were enriched by the H3K36me2 ChIP at a level nearly equal to the enrichment observed at ORFs, whereas in histone H3 or H4 ChIPs, ORFs were more strongly enriched than 3′ UTRs (Fig. 1B).
For further data analysis, we chose the simplest possible normalization routine by subtracting the median log2 ratio values of the H4-myc ChIP-chip data from the median ratio values of H3K36me2 ChIP-chip data (see Materials and Methods). After normalization, the clear enrichment of transcribed genomic regions and corresponding depletion of regulatory regions of the genome persisted (Fig. 1C, D, F, and G). As a test of the validity of this normalization approach, we performed direct comparative hybridizations between DNA enriched by H3K36me2 ChIP and DNA enriched with H4-myc ChIP. The data obtained from direct comparative hybridizations were essentially identical to the computationally normalized H3K36me2 data (Fig. 1E and H).
In H3K36me2 ChIPs, chromatin downstream of convergently transcribed genes was by far the most highly enriched class of noncoding chromatin (Fig. 1G). This suggested that Set2p is active throughout the entire transcript length, providing a possible mechanism for distinguishing nonregulatory intergenic regions from promoters. To validate this observation, we interrogated our ChIP results with PCR primers that represent regulatory and transcribed chromatin across a 16-kb region on chromosome XII (Fig. 2). Again we found that the chromatin heavily enriched by H3K36me2 ChIPs corresponded to coding regions and to regions lying downstream of two convergently transcribed genes. For example, SST2 and LEU3 are both transcribed under the conditions assayed. Their 3′ UTRs are each about 450 bp in length (15) and are represented by the primer sets B, C, and D in Fig. 2. Chromatin covered by these primer sets is among the most heavily enriched in the tested region.
Localization of H3K36me2 to chromatin in the body of the RNAP II-transcribed genes is consistent with the earlier studies showing that Set2p associates with the elongating form of RNAP II. We wondered if the frequency of transcription correlated with the degree of H3K36 dimethylation. The transcription frequency (also called transcription rate) for each S. cerevisiae gene has been calculated based on measurements of steady-state RNA levels and RNA half-lives in exponentially growing yeast cells at 30°C (14). We compared these published transcription frequency values to the results of 12 independent H3K36me2 ChIP-chip experiments. We found that among genes with measurable transcription frequencies (>0 mRNA/hour), the level of H3K36me2 enrichment did not correlate with transcription frequency (Fig. 3A). Genes with transcription frequencies ranging from 1 to 120 mRNAs/hour were consistently enriched in the H3K36me2 ChIPs. For example, despite low rates of transcription, genes like BUD14 and TPK2 (both 1.8 mRNAs/hour) were enriched in H3K36me2 ChIPs (97th and 95th ChIP percentiles, respectively) as highly as were heavily transcribed genes like HXK2 (71 mRNAs/hour, 96th ChIP percentile). These results suggest that H3K36 dimethylation occurs chiefly in the initial instance or early instances of gene transcription, with subsequent transcription playing at most a maintenance role.
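One way to quantify the absence of a relationship between transcription frequency and ChIP enrichment is a rank correlation. The text does not state which statistic was used, so Spearman's coefficient is an assumption here, and the function names are hypothetical. The three data points are the genes quoted above (BUD14, TPK2, and HXK2); three points only illustrate the calculation and carry no statistical weight.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Values quoted in the text: BUD14, TPK2, HXK2.
frequency_mrna_per_hour = [1.8, 1.8, 71.0]
chip_percentile = [97.0, 95.0, 96.0]
rho = spearman(frequency_mrna_per_hour, chip_percentile)
```

Applied to the full set of measurably transcribed genes, a coefficient near zero would correspond to the flat relationship described for Fig. 3A.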
Previous studies have shown that chromatin in very highly transcribed genes exhibits relatively low nucleosome occupancy, suggesting that nucleosomes are either removed or displaced temporarily on transcriptionally active chromatin (26, 46). However, we did not observe this phenomenon for nucleosomes that contain dimethylated H3K36. The levels of H3K36me2 appeared to remain constant on genes, irrespective of their rates of transcription (Fig. 3B). This suggests that the level of H3K36 dimethylation is maintained on highly transcribed chromatin either by preferential retention of nucleosomes containing H3K36me2 or by nonlinear increases in H3K36 methylation as a function of transcription rate. For the “nonlinear” hypothesis to be true, the H3K36me2 mark would have to be less than saturated at all genes, and increased enzymatic activity of Set2p would have to be linked to increased transcriptional activity for only the subset of very heavily transcribed genes at which bulk nucleosome loss has been observed. Since no correlation between transcription rate and H3K36me2 level is observed across a broad range of transcription rates (Fig. 3A), we favor the “preferential retention” hypothesis.
We then asked whether nucleosomes are less stable on highly transcribed chromatin in the absence of H3K36me2. We measured bulk nucleosome occupancy by performing histone H3 ChIPs in a set2Δ strain. However, we found that nucleosomes are lost from highly transcribed chromatin equally in set2Δ and wild-type strains, indicating that nucleosome occupancy is not directly affected by H3K36me2 (see Fig. S2 in the supplemental material). Therefore, additional mechanisms may work to stabilize H3K36me2 nucleosomes, or H3K36me2 may be an indicator, but not a cause, of transcription-stable nucleosomes.
The hypothesis that H3K36 dimethylation is stable and occurs chiefly in the initial instance of gene transcription predicts no correlation between H3K36 dimethylation level and transcriptional rate (as we observed) but does predict a positive correlation between H3K36 dimethylation level and the on or off transcription state of a gene. We defined a gene as on if it had a measurable transcription rate (>0 mRNA/hour) (14) and off if it did not (0 mRNA/hour) (14). To test this prediction, we ranked the ORFs according to their enrichment levels in H3K36me2 ChIP experiments and divided them equally into 10 bins, such that the least enriched 10% of ORFs were in bin 1, the next 10% were in bin 2, and so on, up to the most enriched 10% in bin 10. We then simply asked what proportion of genes in each bin was on (Fig. 4A). The results show that genes that were not enriched by H3K36me2 ChIPs were more likely than others to be off and that the likelihood of any given gene to be off decreased with increasing H3K36me2 ChIP enrichment. No such trend was observed with H4-myc ChIPs (Fig. 4A) or with H3K36me2 ChIPs performed with a set2Δ strain (data not shown). This result provides evidence that H3K36me2 is not linked with how often a gene is transcribed per se but rather with the occurrence of transcription.
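The binning analysis can be sketched as follows. This is a minimal illustration, assuming enrichment values and transcription rates are held in dictionaries keyed by ORF name; the function name and the toy data are hypothetical and do not reproduce the published values.

```python
def fraction_on_by_bin(enrichment, rate, n_bins=10):
    """Rank ORFs by ChIP enrichment, split them into equal-sized bins
    (least enriched first), and return the fraction of 'on' genes per bin.

    A gene counts as 'on' when its transcription rate is above zero.
    """
    ranked = sorted(enrichment, key=enrichment.get)
    n = len(ranked)
    fractions = []
    for b in range(n_bins):
        members = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        on = sum(1 for orf in members if rate[orf] > 0)
        fractions.append(on / len(members) if members else float("nan"))
    return fractions

# Hypothetical toy data: 20 ORFs whose enrichment increases with index;
# a handful of the weakly enriched ORFs are 'off' (rate of zero).
names = ["ORF%02d" % i for i in range(20)]
enrichment = {name: float(i) for i, name in enumerate(names)}
off_indices = {0, 1, 2, 4, 7}
rate = {name: 0.0 if i in off_indices else 1.5 for i, name in enumerate(names)}

fractions = fraction_on_by_bin(enrichment, rate)
```

Running the same function on bulk-nucleosome (H4-myc) enrichment values would provide the control comparison described in the text.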
The hypothesis that H3K36 dimethylation is established by initial instances of gene transcription predicts that upon a switch from an inactive to an active state, chromatin will become dimethylated at H3K36. To test this prediction, we induced transcription at hundreds of genes simultaneously by subjecting yeast cells to transfer from 25°C to 37°C (7, 10). ORFs that become transcribed during heat shock acquire the H3K36me2 mark, while a relative decrease in H3K36me2 levels is observed in ORFs that are repressed (Fig. 4B). Of particular interest are the genes that were off during log phase but strongly induced after heat shock (measured expression increase of >log2 2). Of these genes, 75% (48/65) had increased levels of H3K36me2. In contrast, among genes that were off during log phase and remained off during heat shock (expression increase of <log2 1), only 47% (144 of 306) exhibited increased levels of H3K36me2. The difference between these groups was significant (χ² test; P = 8.6 × 10⁻⁵).
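The comparison of the two groups can be checked with a 2 × 2 chi-square test using the counts quoted above. The text does not say whether a continuity correction was applied; this sketch assumes the uncorrected Pearson statistic with 1 degree of freedom, and the function name is hypothetical.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table

        [[a, b],
         [c, d]]

    Returns the statistic and its P value for 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Rows: induced genes (48 of 65 increased), genes that stayed off (144 of 306).
# Columns: H3K36me2 increased, H3K36me2 not increased.
chi2, p = chi2_2x2(48, 65 - 48, 144, 306 - 144)
```

Under these assumptions the statistic is about 15.4, and the resulting P value is close to the reported one.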
We also observed a relative loss of H3K36me2 in the ORFs of repressed genes. Of genes that were on during log phase and that remained active (log2 expression ratios of >−1), only 46% (1,099/2,388) had increased H3K36me2 levels. In contrast, of genes that were on during log phase and repressed fourfold or more after heat shock, 65% (210/332) had decreased levels of H3K36me2 (χ² test; P = 0.0014). By examining H3K36me2 ChIP data that were not normalized to nucleosome occupancy (data not shown), we found that this relative decrease is attributable in part to bulk nucleosome replenishment at repressed genes (26) rather than loss of H3K36me2 on existing nucleosomes.
To further confirm our ChIP-chip data, we performed quantitative ChIP analysis before and after heat shock along the length of PHM7 (Fig. 5A). PHM7 is repressed in logarithmically growing cell cultures (~0 mRNA per hour) but is induced by a factor of 7 during heat shock (10). The results showed that this gene acquired the H3K36me2 mark after heat shock and only in the 3′ region of the ORF (primer sets R and S) (Fig. 5B and C). This result persisted after we normalized for bulk histone occupancy changes following heat shock (Fig. 5D).
Previous studies at selected genes, including ADH1, PYK1, PMA1, and SCC2, have shown that dimethylation of H3K36 is initiated after transcriptional initiation, concomitant with association of Set2p with the elongating polymerase (1, 18, 22). If this mechanism operates genome-wide, and if the interval between transcriptional initiation and Set2p association is constant regardless of gene length, longer genes will appear to be enriched by our H3K36me2 ChIPs to a greater extent than shorter genes. The reason for this predicted relationship is illustrated in Fig. 6A and the corresponding legend and is a consequence of the fact that the DNA on our microarrays covers each ORF from the start codon to the stop codon, regardless of length.
To test this prediction, we plotted enrichment in H3K36me2 ChIPs against ORF length (Fig. 6B). This analysis showed that longer ORFs appeared to be more efficiently enriched in H3K36me2 ChIPs, consistent with the prediction made by the hypothesis described above. The “leveling off” observed at ~2,000 bp is also a predicted feature of a mark that begins at a set distance from the transcriptional start, since the proportion of the gene that is not modified becomes smaller with increasing length. No such relationship between ORF length and enrichment is observed with H3K36me2 ChIPs performed from set2Δ extracts (data not shown). We did observe a weak relationship between ORF length and apparent bulk nucleosome occupancy for the H4-myc ChIPs performed in this study (the effect was even less pronounced in reference 26). This may be due to nucleosome loss very near to the site of transcriptional initiation, which would be predicted to have this effect. In any case, the relationship between ORF length and enrichment was much stronger for the H3K36me2 ChIPs, suggesting a defined boundary for the initiation of H3K36me2 inside the ORFs genome-wide. This conclusion is further supported by the H3K36me2 ChIP profile across 16 kb of chromosome XII (Fig. 2D). For example, primer set A is situated ~154 bp from the LEU3 transcription start site and does not detect significant enrichment, whereas chromatin represented by primer set E is situated ~400 bp after the start site of SST2 and is highly enriched. Likewise, for the relatively long FMP27 gene, primer set K at the 5′ end of the coding region does not report enrichment, but the downstream primer sets L, M, and N show steady enrichment of the chromatin at the 3′ end of the coding region.
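The expected shape of the length relationship follows from simple geometry. The sketch below assumes that the mark begins at a fixed distance from the start of the ORF and then covers the remainder uniformly; the onset distance of 250 bp is a hypothetical value chosen for illustration, not a measurement from this study, and the function name is likewise an assumption.

```python
def modified_fraction(orf_length_bp, onset_bp):
    """Fraction of an ORF-length array element that carries the mark,
    assuming the mark starts onset_bp into the ORF and extends to its end."""
    if orf_length_bp <= 0:
        raise ValueError("ORF length must be positive")
    return max(0.0, orf_length_bp - onset_bp) / orf_length_bp

# Hypothetical onset distance; the fraction rises steeply for short ORFs
# and flattens for long ones.
ONSET_BP = 250
curve = {length: modified_fraction(length, ONSET_BP)
         for length in (200, 500, 1000, 2000, 4000)}
```

With this onset distance, the modified fraction rises from 0.5 at 500 bp to 0.875 at 2,000 bp but gains little beyond that, which is the kind of leveling off described above.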
In the course of this analysis, we noted that short genes, on the whole, tend to be more frequently transcribed than long ones. Therefore, we wondered whether the correlations between length and ratio reported here confounded the conclusions presented in Fig. 3. To test this possibility, only ORFs greater than 1,000 bp in length, which do not show a relationship between length and transcription frequency, were used in the same analysis shown in Fig. 3. The resulting plot was indistinguishable from the one presented (data not shown).
The results presented thus far provide evidence that dimethylation of H3K36 is restricted to transcribed genomic regions. A corollary to that hypothesis is that H3K36 dimethylation ends upon transcriptional termination. This hypothesis predicts that the smaller the interval between two convergently transcribed genes, the higher the measured ratio of enrichment in our H3K36me2 ChIPs. This is because these shorter regions are likely to be entirely transcribed, while as the distance between the two upstream genes grows, it becomes progressively less likely that the entire intergenic region will be transcribed. This would result in unmodified nucleosomes toward the center of the fragment, resulting in lower ratios (illustrated in Fig. 7A). Note that this prediction of the relationship between size and enrichment is the opposite of the previously described scenario for ORFs.
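The opposite prediction for regions between convergent genes can be sketched the same way. The model below assumes that each of the two converging transcripts extends a fixed distance past its stop codon and that only the transcribed portion is modified. The 450-bp extension borrows the approximate 3′ UTR length quoted earlier for SST2 and LEU3 purely as an illustrative value; the function name and the uniform-extension assumption are hypothetical.

```python
def transcribed_fraction(region_length_bp, utr_bp):
    """Fraction of an intergenic region between two convergent genes that is
    transcribed, assuming each gene's transcript extends utr_bp into it."""
    if region_length_bp <= 0:
        raise ValueError("region length must be positive")
    return min(1.0, 2.0 * utr_bp / region_length_bp)

# Illustrative extension length; short regions are fully covered, while
# longer ones leave an unmodified stretch in the middle.
UTR_BP = 450
coverage = {length: transcribed_fraction(length, UTR_BP)
            for length in (600, 900, 1800, 3600)}
```

Under this model, enrichment stays maximal until the region exceeds twice the extension length and then falls in inverse proportion to region size.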
As predicted, we find that these shorter nonpromoter regions appear more highly enriched than longer nonpromoters in our H3K36me2 ChIPs (Fig. 7B). In contrast, the sizes of the regulatory regions upstream of two divergently transcribed genes, which themselves are not transcribed, show no relationship to the degrees of ChIP enrichment (Fig. 7C). Single promoters are expected to be partially transcribed, since they contain the 3′ UTR of an upstream gene. As predicted, single promoters exhibit an inverse relationship (weaker than that observed for nonpromoters) between size and reported H3K36me2 ratio value (Fig. 7D). No such relationship was observed when ChIPs were performed with histone H3 antibodies or in H4-myc strains or when H3K36me2 ChIPs were performed from set2Δ extracts (data not shown).
To explore the possibility of other mechanisms of H3K36 dimethylation, we examined chromatin at telomeres and mating-type loci, two types of loci that are generally transcriptionally silent but serve specialized genomic functions. Both regions exhibit high nucleosome occupancy but are lacking in H3K36me2 (Fig. 8). We also asked whether H3K36 dimethylation was specific to chromatin transcribed by RNAP II or whether other polymerases might support cotranscriptional modification. Due to the repetitive nature of the RNAP I-transcribed rRNA genes, we were unable to make conclusions regarding H3K36me2 levels at these loci. However, we examined the RNAP III-transcribed tRNA loci and found that these regions were not enriched by our H3K36me2 ChIPs (Fig. 8). Although general nucleosome occupancy was also very low in regions transcribed by RNAP III, these results are consistent with the lack of evidence linking Set2p to RNA polymerase III. Therefore, Set2p's function in dimethylation of H3K36 appears to be mediated exclusively through its association with RNAP II.
While previous studies had examined the distribution and behavior of S. cerevisiae H3K36me2 at a few loci, it was not known whether the reported characteristics were representative of other genes and genomic regions. The genome-wide analyses of H3K36me2 distribution and dynamics reported here allow us to unambiguously identify the core properties of this epigenetic mark in S. cerevisiae. The most important of these emergent properties are as follows. (i) H3K36me2 is scarce or absent in upstream gene regulatory regions, telomeres, mating loci, and regions transcribed by RNA polymerase III. This provides evidence that the enzymatic activity of Set2p is firmly restricted to its association with RNAP II. (ii) The degree of H3K36 dimethylation within ORFs correlates with the “on” or “off” state of transcription, but among genes that are measurably transcribed, the degree of modification does not correlate with the frequency of transcription. This provides evidence that H3K36 dimethylation occurs chiefly in the initial instance of gene transcription, with subsequent rounds playing at most a maintenance role. (iii) In support of the previous point, newly transcribed genes that had been transcriptionally dormant acquire the H3K36me2 mark but at levels that do not correlate with the degree of induction. (iv) Set2p enzymatic activity begins in ORFs at a fairly constant distance from the initiation of transcription, which does not vary with gene length. (v) Nucleosomes in noncoding regions immediately downstream of transcribed genes are as highly dimethylated at H3K36 as nucleosomes in ORFs. This result provides evidence that once Set2p is associated with RNAP II, Set2p continues to be active and remains associated with RNAP II throughout the rest of the transcript length. (vi) With increasing distance downstream of the stop codon, H3K36me2 levels decrease. This result provides evidence that Set2p histone methyltransferase activity is terminated along with the termination of transcription. (vii) Unlike the nucleosome depletion that is normally observed in the ORFs of highly transcribed chromatin, H3K36 dimethylated nucleosomes appear to be refractory to depletion, suggesting preferential retention of H3K36 methylated nucleosomes on DNA. However, nucleosome dynamics in a set2 deletion strain appeared to be normal, indicating that while H3K36me2 may be an indicator of stability, it is not likely to be a required factor. Although there may be some variation from locus to locus, these core properties shed light on important questions surrounding the function of the Set2p enzyme and H3K36me2 itself.
Before discussion of the biological function of H3K36me2, it is worth mentioning some of the challenges that are inherent to any experiment that aims to determine the distribution of a histone modification genome-wide. In this study, we used ChIP to specifically enrich for genomic regions that contain H3K36me2 nucleosomes and then interpreted the efficiency of DNA recovery at each locus to reflect the relative amount of H3K36 dimethylation at each locus. Using this approach, several factors could create nonbiological variation in results, including the effects of fixation, epitope accessibility, antibody specificity, microarray content, and underlying bulk nucleosome occupancy. These challenges have been discussed at length in recent reviews (6, 13, 32, 54).
This study includes important advances in addressing some of these issues. First, we thoroughly demonstrated the specificity of our H3K36me2 antibody by dot blot against H3K36me0, H3K36me1, and H3K36me3 peptides (Fig. 1A; see also Fig. S1 in the supplemental material), Western blots derived from whole-cell and nuclear extracts (data not shown), and control ChIPs in set2Δ strains (Fig. 1B). Second, we used DNA microarrays that cover the entire genome on a single slide. This represents a significant improvement over many published studies that used arrays representing only the ORFs or only noncoding intergenic regions or that split ChIP samples and hybridized them independently to separate arrays representing only the ORFs or the intergenic regions. Use of a whole-genome array was essential to most of the conclusions presented here (13). Third, the H3K36me2 data have been normalized to bulk nucleosome occupancy, using data from H3 or H4-myc ChIPs that were performed in parallel from the same extract. This is important because recent studies have shown that nucleosome occupancy throughout the yeast genome is heterogeneous (2, 26), and if left unaccounted for, misleading patterns could emerge. To our knowledge, this is the first instance of modified-nucleosome ChIP data being normalized to apparent bulk nucleosome occupancy genome-wide. Finally, we followed up each of our ChIP-chip experiments with high-resolution PCR-based detection at individual loci, which provided additional information and confirmed the conclusions drawn from the array results.
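To make the third point concrete, the following is a minimal sketch of one common way to normalize a modification-specific ChIP signal to bulk nucleosome occupancy: subtract the log2 enrichment of the bulk histone ChIP from that of the H3K36me2 ChIP, probe by probe. The exact arithmetic used in this study is not given in this section, so the function name, the probe names, and the numbers below are all hypothetical illustrations.

```python
import math


def normalize_to_occupancy(mod_ip_over_input, bulk_ip_over_input):
    """Return per-probe log2(H3K36me2 enrichment) minus log2(bulk histone enrichment).

    Both arguments are hypothetical dicts mapping probe name -> IP/input ratio.
    Probes missing from either dataset, or with non-positive ratios, are skipped.
    """
    normalized = {}
    for probe, mod_ratio in mod_ip_over_input.items():
        bulk_ratio = bulk_ip_over_input.get(probe)
        if bulk_ratio is None or mod_ratio <= 0 or bulk_ratio <= 0:
            continue
        normalized[probe] = math.log2(mod_ratio) - math.log2(bulk_ratio)
    return normalized


# Illustrative numbers only (not data from the study).
example = normalize_to_occupancy(
    {"orf_probe": 4.0, "promoter_probe": 0.5, "trna_probe": 0.25},
    {"orf_probe": 2.0, "promoter_probe": 0.5, "trna_probe": 0.25},
)
```

In this toy case the tRNA probe has the lowest raw H3K36me2 signal, but because bulk occupancy there is equally low, its normalized value is no lower than that of the promoter probe. This is the kind of misleading raw pattern that normalization to occupancy is meant to expose.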
The general mechanism of directing Set2p to specific genomic regions by piggybacking on elongating RNAP II through association with a doubly modified CTD (Ser2/Ser5) is entirely sufficient to explain the pattern of H3K36me2 we observed throughout the genome. This is an important conclusion because it indicates that Set2p modifies chromatin only when associated with RNAP II and not, for example, on soluble histone H3 prior to chromatin assembly. More specifically, the genome-wide analysis shows that H3K36me2 occurs at a determined distance from the initiation of transcription, regardless of ultimate transcript length (Fig. 6). This result is consistent with PCR-based ChIP assays performed on single genes (1, 18, 22, 45) and with locus-specific results presented here that imply H3K36 dimethylation begins approximately two nucleosomes downstream of the start codon. In addition, the data indicate that Set2p chromatin-modifying activity stops upon transcriptional termination (Fig. 7). We observed no relationship between the presence of introns and H3K36 dimethylation levels (data not shown).
Our results indicate that levels of H3K36me2 are not correlated with the frequency of transcription but rather with the occurrence of transcription per se (Fig. 3 and 4). This result suggests that H3K36 methylation does not generally act as a “rheostat” for gene transcription. In S. cerevisiae, set2Δ strains are viable and, in many backgrounds, exhibit only mild phenotypes. So, what does H3K36 methylation do?
At individual loci, Set2p has been shown to act as a transcriptional repressor (25, 51). In one of these studies, Set2p caused repression of GAL4 but not of other examined genes, and in the other, Set2p was artificially tethered to a promoter, which resulted in transcriptional repression. So while Set2p may act as a transcriptional repressor at individual genes or have the capacity to repress transcription if inappropriately tethered at promoters, a general role for repression of transcription at gene promoters is not consistent with the genomic pattern reported here.
Given Set2p's established interaction with elongating polymerase and the genomic pattern of H3K36me2 reported here, it is easy to envision a role for Set2p and H3K36me2 in transcriptional elongation. Several lines of evidence suggest that this is the case. Perhaps the most convincing is synthetic genetic array analysis, which revealed growth defects when a set2Δ mutant was combined with deletions of any of the five components of the Paf1p complex or of the transcription elongation factors Chd1p or Soh1p (22). It has also been shown that deletion of genes encoding either of two components of the Paf1 complex, Rtf1p or Cdc73p, resulted in a decrease in the recruitment of Set2p across the PMA1 gene and abolished H3K36 dimethylation at that locus (22). In addition to these findings, studies involving 6-azauracil help to confirm a role for Set2 as an elongation factor (18, 22, 28). However, it is still not clear exactly how Set2p or H3K36me2 might participate in the process of elongation itself. It remains possible that Set2p's association with elongation is solely a mechanism to control the distribution of H3K36 dimethylation, rather than an indication of any direct participation in the transcription elongation process. In this case, the defects in elongation observed in the absence of H3K36me2 could be indirect consequences of a failure to recruit chromatin-modifying enzymes or other factors important for transcriptional elongation to ORFs.
A mark such as H3K36me2 could also function as a “molecular memory” of transcription patterns that are specified at only one point during development or the life cycle but must be maintained afterwards. This concept of transcriptional memory is similar to what has been proposed for S. cerevisiae H3K4me3, which remains stable on chromatin long after the transcription of chromatin at that locus has ceased (39). The putative “memory” role of H3K36 methylation, which appears to be similarly stable, could be accomplished by the ability of this mark to physically affect the chromatin fiber or, more likely, through the recruitment of other remodeling factors that alter chromatin structure (41, 44). For example, it has been recently observed that histone H3 and H4 acetylation is generally lower in coding regions than in promoters and that this global acetylation pattern is regulated by the protein Eaf3p (42). Eaf3 is a subunit of both the NuA4 histone acetylase complex and the Rpd3 histone deacetylase complex (9, 11). Reid et al. proposed that “Eaf3 might recognize some feature of chromatin (e.g., nucleosome conformation or nonhistone protein) that is distinct between promoters and coding regions” (42). H3K36 methylation could be just such a distinguishing feature of coding and noncoding chromatin.
One intriguing possibility along these lines is that Set2p mediates H3K36 dimethylation to create a stable epigenetic mark that generally serves to distinguish regulatory and nonregulatory chromatin genome-wide. What function might such a distinction serve? As described in the introduction, coding and noncoding chromatin exhibit several biologically important differences whose underlying physical basis remains unexplained. Higher nucleosome occupancy in the body of genes may serve to prevent nonproductive transcription factor-DNA interactions by occluding binding sites that occur in coding regions. Conversely, nucleosomes in promoter regions may be more prone to low occupancy or disassembly, thereby exposing binding sites and directing transcription factors to appropriate targets (47). These two tendencies, acting in concert, would have the effect of reducing the “sequence space” that must be searched by any given factor before an appropriate target is found.
In S. cerevisiae, about 85% of genes are transcribed at detectable levels during mitotic growth (14, 15), meaning that H3K36me2 distinguishes regulatory and nonregulatory chromatin throughout most of the genome. In actively transcribed chromatin domains, upstream regulatory sequences are clearly distinguished by their lack of the H3K36me2 mark. Therefore, the H3K36me2 mark, which is conserved throughout eukaryotic evolution, represents the first physical mark that has been shown to distinguish regulatory sequences from coding and nonregulatory intergenic sequences genome-wide. A transcription-coupled mark that does not correlate with transcription rate and is correlated with stabilized nucleosomes on transcribed chromatin, both properties of H3K36me2 described here, may be a general feature of eukaryotic chromatin that contributes to the mechanism of context-dependent targeting of DNA-associated proteins.
We thank Ronald N. Laribee for providing Fig. 1A and Fig. S1 in the supplemental material. We thank Michael Buck and Paul Giresi for help with data analysis and Arkady Khodursky for careful reading of the manuscript prior to submission.
This work was supported by NIH grants to B.D.S. (R01GM68088) and J.D.L. (K22HG002577). B.D.S. is a Pew Scholar in the Biomedical Sciences.
†Supplemental material for this article may be found at http://mcb.asm.org/.