An important mechanism of gene regulation involves chromatin changes via histone modification. One such modification is trimethylation of histone H3 lysine 4 (H3K4me3), which requires histone methyltransferase (HMT) complexes containing the trithorax-group (trxG) protein ASH2. Mutations in ash2 cause a variety of pattern formation defects in the Drosophila wing. We have mapped genome-wide binding of ASH2 in wing imaginal discs using chromatin immunoprecipitation combined with sequencing (ChIP-Seq). Our results show that genes with functions in development and transcriptional regulation are activated by ASH2 via H3K4 trimethylation in nearby nucleosomes. We have also characterized the occupancy of phosphorylated forms of RNA Polymerase II and of histone marks associated with activation and repression of transcription. ASH2 occupancy correlates with phosphorylated forms of RNA Polymerase II and activating histone marks in expressed genes. Additionally, RNA Polymerase II phosphorylation on serine 5 and H3K4me3 are reduced in ash2 mutants compared with wild-type flies. Finally, we have identified specific motifs associated with ASH2 binding in genes that are differentially expressed in ash2 mutants. Our data suggest that recruitment of ASH2-containing HMT complexes is context specific and point to a function of ASH2 and H3K4me3 in the control of transcriptional pausing.
Establishment and propagation of gene-expression patterns involve covalent modification of histones (1–4). These modifications play an important role in processes such as cell fate determination, development and cancer (5). Proteins of the trithorax group (trxG) and Polycomb group (PcG) form a cellular memory system that maintains heritable transcriptional states. These proteins were first identified for their role in homeotic gene regulation in Drosophila, but are now understood to constitute a conserved mechanism (6,7). Both trxG and PcG proteins act in multimeric complexes; some members exhibit histone methyltransferase (HMT) activity, while others interpret these marks and translate them into changes in chromatin structure, ultimately leading to changes in gene expression (8). A common hallmark of activated genes is trimethylation of histone H3 on lysine 4 (H3K4me3) at promoter regions, but it remains unclear how this modification is linked to transcriptional activation. In Saccharomyces cerevisiae, a single HMT complex is recruited to genes through ubiquitination of histone H2B, which requires prior recruitment of RNA Polymerase II and the PAF1 complex (9–11). In mammalian systems, by contrast, recent studies provide evidence that H3K4me3 is needed for recruitment of the basal transcription machinery and for transcriptional initiation (12,13). ASH2 (absent, small or homeotic discs 2), a member of the trxG, is essential for deposition of H3K4me3 but lacks the SET domain [Su(var)3-9, E(Z) and Trx] characteristic of HMTs (14,15). ASH2 is associated with several HMT complexes in various organisms (15–18) and interacts with transcription factors such as HCF-1, Menin and Myc (14,18–21).
The Drosophila wing imaginal disc has proven to be a useful model for studying the role of ASH2. Mutants in ash2 show a variety of pattern formation defects in addition to the homeotic transformations expected for a trxG protein (22–24). Expression profiling of ash2 mutant discs has revealed downregulation of wing development and patterning genes (14), supporting the view that trxG proteins maintain the activated state of these genes. It is not clear, however, whether ASH2 acts globally on active genes or binds sites in the genome without directly regulating their transcription. An important step towards understanding ASH2 function is therefore to identify its target genes and to determine how its binding relates to the transcriptional machinery. We investigated the relationship between ASH2 occupancy, histone modifications and the transcriptional machinery in the wing disc using chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq). Here, we provide a comprehensive analysis of ASH2 target genes in relation to their expression levels.
All Drosophila strains and crosses were kept on standard media at 25°C. The strains used were: Canton S, ash2I1/TM6C and w;daughterless·GAL4;UAS·Ash2HA (14).
Canton S flies were used for the histone and RNA Polymerase II ChIP-Seq experiments; for ASH2, chromatin from w;daughterless·GAL4;UAS·Ash2HA (Ash2-Hemagglutinin) flies was immunoprecipitated with an anti-HA antibody. Polytene chromosomes stained with a newly generated anti-ASH2 antibody showed no differences between overexpressed and endogenous ASH2 binding (data not shown). Wing imaginal discs from third instar larvae of the above genotypes were isolated, fixed as previously described (25) and used as a source of chromatin for ChIP-Seq experiments. The discs were pooled in 700µl of sonication buffer and sonicated in a Branson sonifier under conditions established to yield chromatin fragments of 200–1000bp. Chromatin was centrifuged for 10min at maximum speed at 4°C and the supernatant was recovered. As the input sample, 10µl of fixed and sonicated chromatin was decrosslinked and purified. For histone modifications, three immunoprecipitations of 100µl, each corresponding to 100 discs, were carried out in RIPA buffer (140mM NaCl, 10mM Tris–HCl pH 8.0, 1mM EDTA, 1% Triton X-100, 0.1% SDS, 0.1% Na deoxycholate, protease inhibitors). For non-histone proteins, six immunoprecipitations (IPs) of 100 discs each were performed, either in RIPA buffer for PolIIS2P and PolIIS5P or in IP buffer (0.5% NP40, 150mM NaCl, 200mM Tris–HCl pH 8.0, 20mM EDTA, protease inhibitors) for HA. As a pre-clearing step, 35µl of 50% (v/v) protein A-Sepharose CL4B was added to the IPs and incubated for 1.5h at 4°C on a rotating wheel. Protein A was removed by centrifugation at 3000rpm for 2min. A suitable amount of antibody (1–2µg) was added to each chromatin aliquot and incubated on a rotating wheel overnight at 4°C. As a negative control, an aliquot was immunoprecipitated without antibody. Immunocomplexes were recovered by adding 35µl of 50% (v/v) protein A-Sepharose (previously blocked in RIPA or IP buffer with 1% BSA for 2h at 4°C) to the sample and incubating with rocking for 3h at 4°C.
The protein A beads were washed five times for 10min each in 1ml of RIPA buffer or IP buffer, once in 0.25M LiCl, 0.5% NP-40, 0.5% sodium deoxycholate, 1mM Na–EDTA, 10mM Tris–HCl, pH 8.0, and twice in TE (10mM Tris–HCl, pH 8.0, 1mM Na–EDTA). The beads were then resuspended in 100µl of TE, DNase-free RNase was added to 50µg/ml and samples were incubated for 30min at 37°C. To purify the immunoprecipitated DNA, samples were adjusted to 0.5% SDS and 500µg/ml Proteinase K and incubated overnight at 65°C. IP chromatin was purified with Qiagen PCR purification columns following the manufacturer’s instructions. Two independent replicates were performed per ChIP-Seq experiment.
For ChIP-qPCR, 40 wild-type (Canton S) and ash2I1 homozygous third instar larvae were disrupted, fixed and processed as above. Only the anterior halves of the larvae were used. For total PolII ChIPs, 5–10µg of antibody was used and chromatin was immunoprecipitated in IP buffer. Immunocomplexes were recovered with a mixture of protein A/G. Real-time PCR values were normalized to the input sample and are shown as percentage of input (see Supplementary Data S1 for the primers used).
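Percentage-of-input values are usually derived from qPCR threshold cycles (Ct). The sketch below shows one common way to do this; it assumes roughly 100% amplification efficiency and an input that represents a known fraction of the chromatin used per IP. The text does not give that fraction, so `input_fraction` and the Ct values in the example are hypothetical.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Express a ChIP-qPCR signal as percentage of input.

    Assumptions (not stated in the text): amplification efficiency of ~100%,
    i.e. one doubling per cycle, and an input sample that contains
    `input_fraction` of the chromatin used for each IP (0.01 for a 1% input).
    """
    if not 0 < input_fraction <= 1:
        raise ValueError("input_fraction must be in (0, 1]")
    # Adjust the input Ct to the value expected for 100% of the chromatin.
    adjusted_input_ct = ct_input - math.log2(1 / input_fraction)
    # Each cycle of difference corresponds to a two-fold difference in template.
    return 100 * 2 ** (adjusted_input_ct - ct_ip)


# Hypothetical Ct values for one amplicon with a 1% input.
print(round(percent_input(ct_ip=28.0, ct_input=25.0, input_fraction=0.01), 4))
```

With these made-up numbers the IP is three cycles later than the undiluted input would be, so it prints 0.125 (that is, 0.125% of input).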
The antibodies used for chromatin immunoprecipitation were: H3K4me3 (Abcam/ab8580 and Millipore-Upstate/07-473); H3K27me3 (Millipore-Upstate/07-449); H3K36me3 (Abcam/ab9050); PolIIS2P (Abcam/ab5095); PolIIS5P (Abcam/ab5131); HA tag (Abcam/ab9110) and PolII clone 8WG16 (Abcam/ab817).
All steps of the Solexa/Illumina ChIP-Seq workflow (sample preparation and sequencing) were carried out according to the manufacturer’s protocol. For a detailed protocol, see Supplementary Data S1.
We ran PeakSeq (26) to identify regions significantly enriched in ChIP-Seq reads in each sample relative to the normalized input control (READLENGTH=325, MAXGAP=40, MINFDR=0.05 and PVALTHRESH=0.05). The read length selected for PeakSeq (26) was the one that maximized the overlap between reads on the forward and reverse strands in each sample. The resulting read maps and target lists were visualized as custom tracks in the University of California Santa Cruz (UCSC) Genome Browser (27). ChIP-Seq profiles and target regions were deposited in the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) repository as wiggle (WIG) and Browser Extensible Data (BED) files, respectively, under accession number GSE24115. Agreement between replicates was assessed at three levels (read coordinates, number of targets and number of genes associated with the targets). Using RefSeq (28), we determined the genes overlapping each target by at least one nucleotide in each sample. We considered the Gene Ontology (GO) enrichments identified by DAVID (29) at Level 3 of the biological process, molecular function and cellular component categories, and in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. To identify putative novel genes, we combined the H3K4me3-enriched regions with the collection of full-length mRNAs from GenBank (30) and selected those regions that do not contain any RefSeq gene (28). The same procedure was used to detect putative alternative first exons. We next determined the genomic fragments lacking any annotated gene in which ASH2, H3K4me3 and the RNA Polymerase II modifications were all present, and selected from the FlyBase (31) catalogue of non-coding RNAs those elements that overlapped H3K4me3 targets showing the characteristic occupancy pattern.
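Assigning genes to targets that overlap them by at least one nucleotide is a standard interval-overlap query. A minimal sketch, assuming BED-style 0-based half-open coordinates for both genes and targets; the `Interval` type, function name and example coordinates are hypothetical and do not reproduce the authors' Perl/R scripts.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Iterable, NamedTuple, Set


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive
    name: str


def genes_overlapping_targets(genes: Iterable[Interval],
                              targets: Iterable[Interval]) -> Set[str]:
    """Names of genes that overlap at least one target by one or more bases."""
    by_chrom = defaultdict(list)
    for gene in genes:
        by_chrom[gene.chrom].append(gene)

    # Per chromosome: genes sorted by start, their starts, and the longest gene.
    index = {}
    for chrom, chrom_genes in by_chrom.items():
        chrom_genes.sort(key=lambda g: g.start)
        starts = [g.start for g in chrom_genes]
        longest = max(g.end - g.start for g in chrom_genes)
        index[chrom] = (chrom_genes, starts, longest)

    hits = set()
    for target in targets:
        if target.chrom not in index:
            continue
        chrom_genes, starts, longest = index[target.chrom]
        # Genes starting before target.start - longest end before the target
        # begins; genes starting at or after target.end begin after it ends.
        lo = bisect_left(starts, target.start - longest)
        hi = bisect_left(starts, target.end)
        for gene in chrom_genes[lo:hi]:
            if gene.end > target.start:  # half-open intervals share >= 1 base
                hits.add(gene.name)
    return hits


genes = [Interval("2L", 100, 200, "geneA"), Interval("2L", 500, 900, "geneB"),
         Interval("3R", 0, 50, "geneC")]
targets = [Interval("2L", 199, 300, "peak1"), Interval("2L", 900, 950, "peak2"),
           Interval("3R", 10, 20, "peak3")]
print(sorted(genes_overlapping_targets(genes, targets)))
```

Here peak2 starts exactly where geneB ends, so under half-open coordinates it does not count as an overlap.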
To plot the distribution of reads around the transcriptional start site (TSS) in each sample, we calculated the weighted number of reads at each position from 2000bp upstream to 2000bp downstream of the TSS of all genes (according to RefSeq). For the idealized-gene representation, we rescaled read positions within each gene to a common length of 100 units and calculated the mean at each point. We then added the flanking genomic regions, 1000bp upstream and downstream of the idealized gene, to this representation. To measure background levels of ASH2 reads in intergenic regions, we computationally identified the set of 10kb regions that contain no gene annotation (1900 regions according to RefSeq). Similar results were obtained with intergenic regions of other sizes (25, 50 and 100kb). We found 12% of ASH2 reads in these intergenic regions, whereas 65% of ASH2 ChIP-Seq reads fall within RefSeq gene regions (a ~5-fold enrichment in gene regions).
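The two read projections described above can be sketched as follows: a strand-aware profile of reads around the TSS, and a mapping of positions inside a gene onto 100 scaled units for the idealized gene. This is a minimal illustration assuming reads are already reduced to single positions (for example fragment centres); averaging per gene stands in for the unspecified "weighted number of reads", and all names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def tss_profile(read_positions: Dict[str, List[int]],
                genes: Iterable[Tuple[str, int, str]],
                flank: int = 2000) -> Dict[int, float]:
    """Mean read count per gene at each offset from -flank to +flank of the TSS.

    read_positions: chromosome -> positions of reads (e.g. fragment centres).
    genes: (chromosome, TSS coordinate, strand) with strand '+' or '-'.
    Offsets are strand-aware, so positive values are downstream of the TSS.
    Dividing by the number of genes is an assumed stand-in for the
    'weighted number of reads'; the scan is O(reads x genes), fine for a sketch.
    """
    counts = Counter()
    n_genes = 0
    for chrom, tss, strand in genes:
        n_genes += 1
        for pos in read_positions.get(chrom, []):
            offset = pos - tss if strand == "+" else tss - pos
            if -flank <= offset <= flank:
                counts[offset] += 1
    if n_genes == 0:
        return {}
    return {off: counts[off] / n_genes for off in range(-flank, flank + 1)}


def gene_body_bin(pos: int, start: int, end: int, strand: str, n_bins: int = 100) -> int:
    """Map a position inside the gene [start, end) to one of n_bins scaled units.

    Bin 0 is the 5' end of the gene on either strand, so genes of different
    lengths can be averaged on a common 'idealized gene' axis.
    """
    frac = (pos - start) / (end - start)
    if strand == "-":
        frac = 1 - frac
    return min(int(frac * n_bins), n_bins - 1)
```

Counting reads per bin with `gene_body_bin` and dividing by the number of genes gives the mean profile over the idealized gene; the 1000bp flanks can be handled with the same offset logic as `tss_profile`.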
We reanalysed previously published wild-type and ash2 mutant transcriptomes (14), for which two Affymetrix GeneChip Drosophila Genome 2.0 arrays (Affymetrix Inc.) were hybridized per sample. For wild-type flies, we defined three gene classes according to expression level: highly expressed (5000 or more), expressed (50–5000) and silenced (50 or less). To calculate Spearman’s rank correlation coefficient between gene expression and the number of targets in each ChIP-Seq experiment, we first computed the target gene density along the microarray (using a window of 100 genes). To build the lists of upregulated and downregulated genes in the mutant, we selected genes for which the ratio between the expression value in the mutant array and that in the wild-type wing disc transcriptome was above 2.0 (upregulated in ash2 mutants) or below 0.5 (downregulated), with an absolute difference of at least 100 units. Both lists were intersected with the ASH2 and H3K4me3 target genes to build the final sets of 196 downregulated and 137 upregulated genes in ash2 mutants. To evaluate the statistical significance of differences in gene size, number of exons and number of isoforms between upregulated and downregulated genes, we performed a two-sample t-test on each gene feature, testing the null hypothesis that the two means are equal. MEME (32) was run on the preferred ASH2 binding region of these genes in our ChIP-Seq experiments, from –370 to +560bp relative to the TSS, and TomTom (33) was used to compare the discovered motifs with collections of known transcription factor binding sites (34,35). In addition, the TRANSFAC and JASPAR catalogues were used to complement the motif search in the ASH2 binding regions (similarity threshold 85%) (36,37).
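The expression classes and the fold-change filter described above reduce to a few comparisons. The sketch below uses the thresholds given in the text; how the overlapping class boundaries (50 and 5000) and genes with a zero wild-type value were handled is not stated, so those choices are assumptions, and the gene names and values are hypothetical.

```python
from typing import Dict, Set, Tuple


def expression_class(signal: float) -> str:
    """Assign a wild-type expression class using the thresholds in the text.

    The text's ranges share their boundaries (50 and 5000); putting 5000 in
    'highly expressed' and 50 in 'silenced' is an assumption.
    """
    if signal >= 5000:
        return "highly expressed"
    if signal > 50:
        return "expressed"
    return "silenced"


def differential_genes(wt: Dict[str, float], mutant: Dict[str, float],
                       up_ratio: float = 2.0, down_ratio: float = 0.5,
                       min_diff: float = 100.0) -> Tuple[Set[str], Set[str]]:
    """Return (upregulated, downregulated) gene sets in the mutant.

    A gene qualifies if mutant/wild-type is above up_ratio or below down_ratio
    and the two values differ by at least min_diff units.
    """
    up, down = set(), set()
    for gene, wt_value in wt.items():
        mut_value = mutant.get(gene)
        if mut_value is None or wt_value <= 0:
            continue  # ratio undefined; skipping such genes is an assumption
        if abs(mut_value - wt_value) < min_diff:
            continue
        ratio = mut_value / wt_value
        if ratio > up_ratio:
            up.add(gene)
        elif ratio < down_ratio:
            down.add(gene)
    return up, down


# Hypothetical values; the final sets keep only ASH2 and H3K4me3 targets.
wt = {"a": 100.0, "b": 1000.0, "c": 60.0, "d": 500.0}
mut = {"a": 250.0, "b": 400.0, "c": 130.0, "d": 600.0}
ash2_targets, h3k4me3_targets = {"a", "b", "d"}, {"b", "d"}
up, down = differential_genes(wt, mut)
print(sorted(up & ash2_targets & h3k4me3_targets),
      sorted(down & ash2_targets & h3k4me3_targets))
```

Gene "c" more than doubles but changes by fewer than 100 units, so the absolute-difference condition excludes it.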
We discarded predicted motifs that were not conserved in Drosophila pseudoobscura and at least four additional drosophilids in the genome alignments of 12 Drosophila species (38). Most of these tasks (format conversion between tools, comparison of target lists, association of genes with target lists and graphical representation of reads around the TSS and within genes) were performed with custom Perl and R scripts. This software is available from the authors upon request.
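The conservation rule can be written as a simple predicate once each motif site has been annotated with the species in which it is conserved. How conservation is called from the 12-genome alignments is not described here, so this sketch takes that annotation as given; the species codes and function name are hypothetical labels.

```python
from typing import Iterable

# Hypothetical short codes; D. melanogaster is the reference genome.
REFERENCE = "dmel"
REQUIRED = "dpse"  # D. pseudoobscura must always carry the site


def passes_conservation_filter(species_with_site: Iterable[str],
                               required: str = REQUIRED,
                               min_additional: int = 4) -> bool:
    """True if a site is conserved in `required` plus >= `min_additional` others.

    species_with_site: codes of species whose aligned sequence keeps the site.
    Neither the reference (dmel) nor the required species counts as additional.
    """
    species = set(species_with_site)
    if required not in species:
        return False
    additional = species - {required, REFERENCE}
    return len(additional) >= min_additional


print(passes_conservation_filter({"dmel", "dpse", "dsim", "dsec", "dyak", "dere"}))
```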
Details of other procedures are provided in Supplementary Data S1.
ASH2 occupancy was mapped using chromatin isolated from wing imaginal discs of third instar larvae, yielding 8009 target genes (Figure 1 and Supplementary Table S1). GO analysis revealed a significant enrichment in development and morphogenesis categories (Supplementary Figure S1). We also determined the genomic distribution of H3K4me3 and H3K36me3, as specific marks of positive transcriptional regulation, and of H3K27me3 as a negative mark (Supplementary Table S1). We identified 5730 target genes for H3K4me3, 4919 for H3K36me3 and 2999 for H3K27me3. Figure 1A shows an example of ASH2 binding between two peaks of H3K4me3 associated with the katanin-60 and Mms19 genes, which are transcribed in opposite orientations. H3K36me3 extends over the gene region of both expressed genes. By contrast, in an example of a gene silenced in the wing disc, H3K27me3 is spread throughout the Deformed (Dfd) locus. As anticipated, there was extensive overlap between ASH2 occupancy and the H3K4me3 and H3K36me3 marks, but not H3K27me3 (Figure 1B). A subset of 441 genes carries both activating and silencing marks, and 423 of these contain ASH2 binding sites. This group includes genes from several pathways known to be differentially expressed across the wing disc (Supplementary Figure S1), suggesting that the two marks are present in different cell types according to their transcriptional state. Uncharacterized genes in this group may therefore display heterogeneous expression patterns in the wing tissue.
We observed extensive overlap between the targets obtained by ChIP-Seq in wing discs and by ChIP-on-chip in embryos (39): up to 95% of H3K4me3 targets overlap a ChIP-on-chip region. The enriched regions detected by high-throughput sequencing were, however, considerably narrower and thus more precise (average target length: 327.2nt in ChIP-Seq versus 2048.3nt in ChIP-on-chip), confirming the higher resolution of this technique (Supplementary Figure S2). These results suggest that the chromatin state differs little between these two time points, although a subset of genes is cell-type and developmental-stage specific. Additionally, by combining our data with the FlyBase/RefSeq gene collections, we refined the annotation of 21 genes (Supplementary Table S2) and uncovered 55 new regions in the fly genome (Supplementary Table S3) that show significant enrichment of activating marks in wing discs. Using a similar approach, we identified 21 non-coding RNAs that display transcription-activating marks in the wing disc (Supplementary Table S4).
Projection of the mean reads over the TSS of the full set of genes, or over an idealized gene (Figure 1C), reveals a single ASH2 peak spanning from the promoter into the gene region (5-fold enrichment of ASH2 read counts in gene regions relative to typical intergenic regions; see ‘Materials and Methods’ section). The H3K27me3 distribution was scattered throughout silenced regions. In contrast, H3K4me3 exhibits a main peak within the first 500bp downstream of the TSS and a secondary peak upstream, presumably caused by genes transcribed in the opposite direction (Figure 1C). This interpretation is supported by the fact that a single peak is detected downstream of the TSS when only genes with no other annotated genes in their vicinity are plotted (data not shown). Finally, we observed that the ASH2 peak lies upstream of the main H3K4me3 peak in ~80% of genes, suggesting that ASH2 contributes to this methylation in nearby nucleosomes.
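Deciding whether the ASH2 peak lies upstream of the main H3K4me3 peak requires strand awareness, because upstream means lower coordinates on plus-strand genes and higher coordinates on minus-strand genes. A minimal sketch, assuming one ASH2 summit and one main H3K4me3 summit have already been assigned to each gene (how the paper picked them is not stated); the function name and example summits are hypothetical.

```python
from typing import Iterable, Tuple


def fraction_ash2_upstream(peaks: Iterable[Tuple[str, int, int]]) -> float:
    """Fraction of genes whose ASH2 summit lies upstream of the main H3K4me3 summit.

    peaks: (strand, ASH2 summit, H3K4me3 summit), summits in genomic coordinates.
    Upstream means a smaller coordinate on '+' genes and a larger one on '-' genes.
    """
    total = upstream = 0
    for strand, ash2_summit, h3k4me3_summit in peaks:
        total += 1
        if strand == "+" and ash2_summit < h3k4me3_summit:
            upstream += 1
        elif strand == "-" and ash2_summit > h3k4me3_summit:
            upstream += 1
    return upstream / total if total else float("nan")


example = [("+", 100, 300), ("-", 500, 200), ("+", 400, 300), ("-", 100, 200)]
print(fraction_ash2_upstream(example))
```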
To uncover the relationship between ASH2 and transcription, we took advantage of previously published data on the wing disc transcriptome (14). We classified genes into three categories according to their expression level: silenced, expressed and highly expressed (Figure 2A). We also performed ChIP-Seq analysis using specific antibodies against two modified forms of RNA Polymerase II: serine 5 phosphorylated (PolIIS5P, a mark of polymerase stalled at the TSS) and serine 2 phosphorylated (PolIIS2P, a mark of elongation) (Supplementary Table S1). We found 1080 genes carrying only the PolIIS5P mark (putatively stalled genes), 1452 genes carrying only the elongating PolIIS2P mark and 1817 genes carrying both. As expected, PolIIS5P, like ASH2, peaks around the TSS, and the elongating polymerase (PolIIS2P) is present in actively transcribed regions, coinciding with H3K36me3 (Figure 2B). As previously reported (40), we found a positive association between gene expression and the number of ChIP-Seq reads. The correlation between expression level and ChIP-Seq data (Spearman’s rank correlation coefficient; see ‘Materials and Methods’ section) confirmed that the set of expressed genes is clearly enriched in ASH2 (correlation coefficient 0.88), H3K4me3 (0.93), PolIIS5P (0.95), PolIIS2P (0.95) and H3K36me3 (0.93) targets. In contrast, H3K27me3 (–0.84) is primarily associated with silenced genes (Figure 2A).
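The correlation reported above pairs expression level with the local density of ChIP-Seq targets among genes ordered by expression. The sketch below shows one reading of that procedure: genes are sorted by expression, split into consecutive windows of 100 genes, and Spearman's rho is computed between mean window expression and the fraction of targets per window. Using non-overlapping windows is an assumption (a sliding window would also fit the description), and the Spearman implementation is written out with the standard library only.

```python
from statistics import mean
from typing import Dict, List, Sequence, Set


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def density_vs_expression(expression: Dict[str, float], targets: Set[str],
                          window: int = 100) -> float:
    """Correlate expression level with ChIP-Seq target density.

    Genes are ordered by expression and split into consecutive,
    non-overlapping windows of `window` genes (an assumption). For each
    window we take the mean expression and the fraction of genes that are targets.
    """
    genes = sorted(expression, key=expression.get)
    levels, densities = [], []
    for i in range(0, len(genes) - window + 1, window):
        block = genes[i:i + window]
        levels.append(mean(expression[g] for g in block))
        densities.append(sum(g in targets for g in block) / window)
    return spearman(levels, densities)
```

Working on windows rather than individual genes turns a binary target/non-target label into a continuous density, which is what makes a rank correlation with expression meaningful.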
Of the genes carrying the PolIIS5P modification but not PolIIS2P, 193 are ASH2 target genes that are silenced in the wing disc and belong to categories related to pupal and adult functions such as learning or memory, mating and circadian behaviour (Supplementary Figure S3). This number is likely an underestimate because of the stringent normalization protocol used. On the other hand, most genes showing PolIIS2P alone or both RNA Polymerase II modifications are actively transcribed targets of ASH2 and H3K4me3 (Supplementary Figure S4). Finally, 35% of ASH2 target genes are silenced in the wing disc; these include genes annotated with GO terms related to signal transduction and metabolism (Supplementary Figure S5). A subset of these silenced genes (1171) carries H3K27me3 but not H3K4me3 (Figure 1B) and is enriched in signal transduction and receptor activity categories (938 genes; Supplementary Figure S6). These functional categories suggest that many silenced ASH2 target genes are involved in dynamic biological processes and would therefore need to respond rapidly to signals. The phosphorylation state of PolII likely reflects a stalled polymerase at the promoters of these ASH2 target genes.
In light of these results, we reanalysed the expression data obtained previously in microarray analyses of ash2 mutant discs (14). By comparing wild-type with ash2I1 mutant wing imaginal discs, we identified 342 downregulated and 368 upregulated genes (see ‘Materials and Methods’ section). A large fraction of these differentially expressed genes are ASH2 targets: 294 downregulated genes (85%) and 253 upregulated genes (69%). We next selected those genes that also carry the H3K4me3 mark and found 196 ASH2 and H3K4me3 target genes among the downregulated genes and 137 among the upregulated ones. These two sets differ in their GO categories (Figure 3A): downregulated genes are enriched in development and transcription categories, whereas upregulated genes are enriched in ribosomal and mitochondrial metabolism categories. Downregulated and upregulated genes also differ significantly (see ‘Materials and Methods’ section) in gene size (on average 14068 and 6858bp, respectively; P-value <10⁻⁵), number of exons (5.9 and 3.6 exons; P-value <10⁻¹³) and number of alternative transcripts annotated in RefSeq (2.3 and 1.3; P-value <10⁻¹¹). Moreover, genes with higher expression in the mutant do not correspond to genes silenced in the wild-type disc; instead, these genes were already expressed and merely increased their expression in the absence of ASH2. Projection of ASH2 and H3K4me3 reads over the TSS of downregulated and upregulated genes reveals no differences in their occupancy profiles; the difference in the number of H3K4me3 reads may reflect the number of cells carrying this activating mark in the wing disc (Figure 3B). Taken together, our data suggest that ASH2 action depends on interactions with other transcriptional regulators.
To address this possibility and to understand the sequence determinants of ASH2 binding, we computationally characterized the regions around the TSS of upregulated and downregulated genes. Using motif discovery tools, we first identified multiple regulatory sites specifically present in each set of sequences. As a complementary approach, we scanned these regions with TRANSFAC and JASPAR to extend the initial collection of motifs. We then filtered out predictions not supported by phylogenetic footprinting across the genomes of 12 Drosophila species, keeping only sites conserved in D. pseudoobscura and at least four additional drosophilids (enrichment calculated relative to the total number of conserved sites of each class in the D. melanogaster genome; see ‘Materials and Methods’ section). Roughly 50% of ASH2-regulated genes presented at least one evolutionarily conserved motif (Figure 4A). As anticipated, we found the GAGA motif, which is known to recruit the GAGA factor (GAF), within the ASH2-binding regions of a significant subset of downregulated genes (58 genes, P-value <10⁻¹²). Recently published ChIP-on-chip data for GAF in Drosophila embryos (39) support our predictions, since 74% of predicted GAGA sites lie within GAF ChIP-on-chip regions. Interestingly, ASH2-binding regions are also enriched in E2F-binding sites (42 genes, P-value <10⁻⁷), which are known to recruit E2F transcription factors. A different situation was observed in the set of upregulated genes, where we identified a non-canonical E-box (48 genes, P-value <10⁻¹⁰) and a DRE motif (39 genes, P-value <10⁻⁸), known to recruit the DNA replication-related element factor (DREF) (41). We also identified a motif common to both gene lists (TGGTCACACTG) that is reportedly involved in the recruitment of Mnt/Max complexes (42).
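The text does not name the statistical test behind the motif-enrichment P-values. A hypergeometric upper-tail test is one common choice for asking whether a gene set contains more motif-bearing genes than expected from the genome-wide background; the sketch below implements that swapped-in test with exact integer arithmetic and should not be read as the authors' procedure. All parameter names are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    Example reading: `population` genes in the genome, `successes` of them carry
    a conserved motif, `draws` genes in a regulated set, `k` of which carry it.
    """
    if not (0 <= successes <= population and 0 <= draws <= population):
        raise ValueError("inconsistent hypergeometric parameters")
    upper = min(successes, draws)
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(max(k, 0), upper + 1))
    # Exact integer arithmetic until the final division keeps small tails accurate.
    return tail / comb(population, draws)
```

For instance, `hypergeom_sf(3, 10, 3, 3)` is the chance of drawing all three motif-bearing genes from ten in three draws, 1/120.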
In fact, 18 putative Mnt/Max sites overlap binding regions previously defined by DamID analysis (42), supporting our predictions. One novel motif was additionally identified in each group (Motifs 1 and 2 in Figure 4A). We propose that these novel sequences, together with the transcription factors that bind the known motifs, participate in ASH2 binding. Again, given the stringent protocol used to identify these motifs, our results are likely to underestimate the actual number of binding sites. To identify putative cis-regulatory modules underlying ASH2-binding regions, we show the genes in both sets that contain two or more motifs (Figure 4B). We then focused on cases in which the motifs lie within 100bp of each other and could therefore constitute a plausible regulatory unit. As shown in Figure 4C, several ASH2-binding regions display specific preferences in the relative positioning and order of the components of each potential module.
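Candidate modules of the kind described above can be enumerated by pairing motif hits that lie within 100bp of each other while keeping their 5′ to 3′ order. A minimal sketch, assuming hits are given as (motif name, position relative to the TSS) and that distance is measured between hit start positions; both are assumptions, and the example hits are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def candidate_modules(hits: Sequence[Tuple[str, int]],
                      max_distance: int = 100) -> List[Tuple[str, str, int]]:
    """Ordered pairs of different motifs whose sites lie within max_distance bp.

    hits: (motif name, position relative to the TSS) for one ASH2-binding region.
    Pairs are reported 5' to 3', so the order of the two motifs is preserved.
    """
    ordered = sorted(hits, key=lambda h: h[1])
    modules = []
    for (m1, p1), (m2, p2) in combinations(ordered, 2):
        if m1 != m2 and p2 - p1 <= max_distance:
            modules.append((m1, m2, p2 - p1))
    return modules


region = [("GAGA", -50), ("E2F", 20), ("Motif1", 200), ("GAGA", 30)]
print(candidate_modules(region))
```

Keeping the order of each pair makes it possible to tally how often a given motif precedes another, which is the kind of positional preference summarized in Figure 4C.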
To clarify the role of ASH2 in transcriptional regulation, we performed ChIP-qPCR analysis of individual genes in wild-type and ash2I1 mutant larvae and analysed H3K4me3 and RNA Polymerase II modifications. The test genes were ASH2 targets that are differentially expressed in ash2 mutants and carry at least one predicted motif around their TSS: two downregulated genes, engrailed (en) and Cyclin A (CycA), and two upregulated genes, mitochondrial Ribosomal protein L40 (mRpL40) and Ribosomal protein L36 (RpL36). As a control, we used a gene whose expression does not change in the mutant, Sphingosine-1-phosphate lyase (Sply) (Figure 5A and Supplementary Figure S7). Consistent with the general function of ASH2, H3K4me3 was reduced at all genes in ash2 mutant flies, regardless of their transcriptional state. Strikingly, we did not detect any change in PolIIS2P, whereas PolIIS5P decreased at the TSS of all three classes of genes. A redistribution of PolIIS2P along the gene seems unlikely, since no change in its occupancy was observed by ChIP analysis of the 3′ gene region (Figure 5A). To confirm these observations, we performed immunostaining on polytene chromosomes. As shown in Figure 5B, PolIIS5P was generally reduced in ash2 mutant larvae compared with wild-type. In agreement with our ChIP experiments, PolIIS2P did not show clear differences between mutant and control larvae.
To further analyse RNA Polymerase II (PolII) occupancy over these genes, we performed ChIP-qPCR with an antibody that recognizes total PolII and calculated the ratio between the TSS and the 3′ region (Supplementary Figure S8). Genes with clear PolII enrichment at the TSS in wild-type flies (en and RpL36; TSS/3′ ratio >1) show a decrease in PolII at the TSS relative to the 3′ region in ash2 mutants. We detected a slight decrease in the TSS/3′ ratio for CycA, mRpL40 and Sply (ratio ~1 in wild-type flies); the uniform distribution of PolII along these genes might mask a reduction at the TSS in ash2 mutants. Furthermore, total PolII occupancy is likely to be underestimated, since the antibody used primarily recognizes the unphosphorylated form of the polymerase (43). Taken together, these results support the idea that the mechanism of action of ASH2 on RNA Polymerase II modifications does not differ between developmentally regulated and housekeeping genes. Analysis of additional control mechanisms, such as RNA capping or splicing, requires further experimentation.
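The TSS/3′ comparison above is a ratio of two occupancy values per genotype, similar in spirit to the pausing index used in other genome-wide studies. A minimal sketch, assuming both values are percent-of-input signals from ChIP-qPCR; the function names and example numbers are hypothetical, not measurements from this study.

```python
from typing import Tuple


def tss_3prime_ratio(tss_signal: float, three_prime_signal: float) -> float:
    """Ratio of PolII signal (e.g. % input) at the TSS to that at the 3' region."""
    if three_prime_signal <= 0:
        raise ValueError("3' signal must be positive")
    return tss_signal / three_prime_signal


def ratio_change(wt_tss: float, wt_3p: float,
                 mut_tss: float, mut_3p: float) -> Tuple[float, float, float]:
    """Return (wild-type ratio, mutant ratio, mutant ratio / wild-type ratio)."""
    wt = tss_3prime_ratio(wt_tss, wt_3p)
    mut = tss_3prime_ratio(mut_tss, mut_3p)
    return wt, mut, mut / wt


# Hypothetical % input values for one gene in each genotype.
print(ratio_change(wt_tss=2.0, wt_3p=1.0, mut_tss=1.0, mut_3p=1.0))
```

A last value below 1 indicates that, relative to the 3′ region, less PolII sits at the TSS in the mutant than in wild type.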
Work in various model organisms and cultured cells has provided high-resolution profiles of histone modifications and transcription factor binding across different genomes (40,44,45). In this study, we used direct sequencing of ChIP DNA from wing discs to analyse ASH2 function. Because the cell composition of isolated wing disc tissue is relatively homogeneous, we were able to distinguish several features. First, ASH2 occupancy correlates with the presence of phosphorylated forms of RNA Polymerase II and activating histone marks in expressed genes. On the other hand, because ASH2 also targets silenced genes, we cannot rule out a direct role for ASH2 in gene repression. In support of this, the ASH2-interacting proteins HCF-1 and dMyc are involved in both transcriptional activation and repression (14,18,46). Alternatively, silenced ASH2 target genes could be held in an intermediate, ready-to-go state of transcription that may be activated by external signals. Second, our results agree with previous observations in Drosophila and Xenopus embryos, where dually marked domains do not seem to be a common feature (39,44). Bivalently marked chromatin, containing both H3K4 and H3K27 trimethylation, has been reported to be a hallmark of developmentally regulated silenced promoters in mammalian embryonic stem cells (47,48). In the wing disc, by contrast, the co-occurrence of these marks can be linked to the differential expression of several genes across the disc, indicating that each mark is present in different cells. A recent report using a similar genome-wide approach in Drosophila testis enriched in undifferentiated cells reveals that differentiation-associated genes are also linked to monovalent modifications (49). Third, ASH2 binding combined with activating marks of transcription is a powerful tool for identifying previously unannotated genes.
The actively transcribed genes in the wing disc are occupied by nucleosomes with histone modifications that are hallmarks of both initiation and elongation, as described in human cells (50). We have uncovered a positive correlation between activating marks of transcription (both H3K4me3 and H3K36me3) and ASH2 occupancy. Our study has also determined that ASH2 contributes to H3K4me3 in nearby nucleosomes. H3K4me3 is associated with the TSS of active genes, whereas H3K27me3 spreads over large regions of chromatin to promote silencing and H3K36me3 is found in actively transcribed regions (40,51,52). Only genes containing H3K36me3 undergo further elongation and produce mature transcripts [reviewed in (53)].
Transcriptional regulation is a multistep process controlled by a large and complex machinery at the levels of recruitment, initiation, pausing and elongation of RNA Polymerase II (53,54). A series of recent genome-wide studies indicates that many developmental and inducible genes, prior to their expression, contain RNA Polymerase II bound predominantly in their promoter-proximal regions in a stalled state (53,55,56). However, RNA Polymerase II enrichment at the TSS is not restricted to silenced genes: stalled polymerase is also present at this region in active genes (57). The presence of ASH2 and H3K4me3 together with PolIIS5P at the TSS of expressed genes is consistent with previous reports proposing that promoter-proximal stalling serves not only to fully repress but also to attenuate transcription of active genes. As recently described, transient stalling of the polymerase is a general feature of early elongation, even in highly active genes (58).
The analysis of ash2 mutant flies indicates that ASH2 performs its canonical function of promoting H3K4me3 regardless of its effect on the transcriptional state of its target genes and of the context specificity of its recruitment to promoters. In light of the results obtained for RNA Polymerase II modifications in the mutants, we conclude that ASH2 influences different aspects of transcription. The specific binding motifs identified in differentially regulated genes, together with the co-occupancy of ASH2 and PolIIS5P at the TSS, suggest a role in transcription initiation. Nevertheless, the reduction of PolIIS5P in mutant flies points to a faster escape from stalling in the absence of ASH2.
Distinct sets of accessory factors are associated with polymerase stalling and escape from this state, acting either by direct interaction with RNA Polymerase II or by modifying the chromatin environment (59). Among these factors, some are associated with polymerase stalling, such as the DRB sensitivity-inducing factor (DSIF) and the negative elongation factor (NELF), and others contribute to escape from stalling, such as the positive transcription-elongation factor-b (P-TEFb) complex and the general transcription factors TFIIS and TFIIF [(53) and references therein]. It remains to be determined whether ASH2 interacts directly with any of these factors. However, NELF and GAF have been linked to promoter-proximal pausing at many genes in Drosophila (60). A connection between ASH2 and polymerase stalling at developmental genes could therefore be envisioned through GAF, which is known to recruit PcG and trxG complexes to DNA (8). In fact, about half of the downregulated genes in ash2 mutants that contain GAGA sites are NELF targets (data not shown). Furthermore, c-Myc has recently been reported to regulate RNA Polymerase II pause release by recruiting P-TEFb to its target genes (61), and ASH2 is known to interact with Myc in flies (19). The enrichment of E-box and Mnt/Max motifs in genes upregulated in ash2 mutants points to a role for ASH2, acting through Myc, in their transcriptional regulation. A subset of these motifs was characterized in H3K4me3 regions by Schuettengruber et al. (39). We have been able to associate these motifs with downregulated and upregulated genes in ash2 mutants, suggesting differential transcriptional regulation.
Several effector proteins that bind H3K4me3 determine the functional outcome of this histone modification. Their activities include, among others, activation and repression of transcription, chromatin remodelling and modulation of splicing efficiency (62). An additional role for ASH2 during transcript elongation and maturation cannot be excluded. Indeed, it has been suggested that methylated H3K4 facilitates pre-mRNA maturation competency by bridging spliceosomal components (63). The fact that genes downregulated and upregulated in ash2 mutants differ clearly in size and genomic organization (gene size, number of alternative isoforms and number of exons) suggests that they may be regulated differently during transcription and RNA processing, as previously proposed [for review see (64)]. Finally, recent reports indicate an association of RNA Polymerases II and III at promoter regions of housekeeping genes (65–67), and recruitment of RNA Polymerase III through Myc interacting with the cofactor BRF has also been described (68). However, preliminary experiments argue against the involvement of other polymerases in the transcription of these housekeeping genes in the absence of ASH2. Taken together, our results support a model in which an ASH2-containing complex acts at different levels of transcriptional regulation.
Supplementary Data are available at NAR Online.
Ministerio de Ciencia e Innovación (GEN2006-28564-E, BMC2006-07334, ACI2009-0903 and CSD2007-00008), Juan de la Cierva fellowship (to E.B.); Universitat de Barcelona, APIF fellowship (to A.C.); NIH (to M.S.). Funding for open access charge: ACI2009-0903 and Universitat de Barcelona.
Conflict of interest statement. None declared.
We thank R. Guigó and J.F. Abril for kindly providing access to their computer facilities, A. Mazo, M. Buschbeck and C. Byars Baker for insightful suggestions and A. Mateo for technical support. We also thank the Ultrasequencing Unit of the CRG (Barcelona, Spain) and the Functional Genomics Core Facility of the IRB (Barcelona, Spain).