Poised RNA polymerase II is predominantly found at developmental control genes and is thought to allow their rapid and synchronous induction in response to extracellular signals. How the recruitment of poised RNA Pol II is regulated during development is not known. By isolating muscle tissue from Drosophila embryos at five stages of differentiation, we show that the recruitment of poised Pol II occurs at many genes de novo and this makes them permissive for future gene expression. When compared to other tissues, these changes are stage-specific and not tissue-specific. In contrast, Polycomb group repression is tissue-specific and in combination with Pol II (the balanced state) marks genes with highly dynamic expression. This suggests that poised Pol II is temporally regulated and is held in check in a tissue-specific fashion. We compare our data to mammalian embryonic stem cells and discuss a framework for predicting developmental programs based on chromatin state.
The recruitment of RNA polymerase II (Pol II) has long been thought to be the rate-limiting step for transcription at most genes. However, in recent years it has become clear that at a large fraction of genes, Pol II has initiated transcription but then pauses just downstream of the transcription start site (TSS), and that the regulation of Pol II elongation is also a critical step for transcription (Core et al., 2008; Guenther et al., 2007; Muse et al., 2007; Nechaev et al., 2010; Rahl et al., 2010; Zeitlinger et al., 2007). Strikingly, paused Pol II is preferentially found at developmental control genes, suggesting that these genes are frequently regulated at the level of elongation (Muse et al., 2007; Zeitlinger et al., 2007). However, exactly how the interplay of Pol II recruitment and elongation contributes to the regulation of developmental processes is not known.
Evidence so far suggests that paused Pol II helps the rapid and synchronous induction of genes in response to extracellular stimuli. For example, at Drosophila heat shock genes, where paused Pol II was originally discovered, gene induction in response to heat shock occurs very rapidly (Boehm et al., 2003; Gilmour and Lis, 1986; Rougvie and Lis, 1988). Furthermore, genes paused in the early Drosophila embryo tend to be activated in a more synchronous fashion (Boettiger and Levine, 2009). The exact mechanisms by which paused Pol II helps gene induction are not entirely understood. It has been proposed that paused Pol II keeps the promoter in an open state by displacing the promoter nucleosome just upstream of the TSS (Gilchrist et al., 2010; Gilchrist et al., 2008). Furthermore, genes with paused Pol II are transcribed at low levels (Fuda et al., 2009; Zeitlinger et al., 2007), raising the possibility that occasional full-length transcription may also prime genes for activation. Thus, paused Pol II could mediate rapid gene activation directly, or indirectly by establishing a permissive state.
How is Pol II pausing regulated during development? The simplest model is that Pol II pausing occurs by default and thus may represent a transcriptional checkpoint for important, highly regulated genes. Indeed, Pol II pausing could be an intrinsic property of the promoter since core promoter elements such as Inr, DPE and PB are highly enriched among genes with Pol II pausing (Gilchrist et al., 2010; Hendrix et al., 2008; Lee et al., 2008; Rach et al., 2009; Rahl et al., 2010). However, there is also evidence that genes can lose paused Pol II and show a closed or inactive promoter state with high nucleosome occupancy (Gilchrist et al., 2010). This raises the possibility that recruitment of paused Pol II is developmentally regulated and that this may occur independently of gene induction. Such a mechanism could render genes either inaccessible or more permissive to activation in certain tissues or developmental stages. Thus, it may represent an additional developmental checkpoint that ensures precise and robust gene regulation during development (Levine, 2011).
Paused Pol II has frequently been associated with Polycomb group (PcG) repression. Both paused Pol II and PcG proteins are preferentially found at developmental control genes (Boyer et al., 2006; Bracken et al., 2006; Lee et al., 2006; Negre et al., 2006; Oktaba et al., 2008; Schwartz et al., 2006; Tolhuis et al., 2006), have been observed to co-occur (Bracken et al., 2006; Brookes et al., 2012; Enderle et al., 2011; Kharchenko et al., 2011; Lee et al., 2006; Marks et al., 2012; Schwartz et al., 2010), and there is mechanistic evidence that they antagonize each other (Brookes et al., 2012; Chopra et al., 2011; Dellino et al., 2004; Marks et al., 2012; Stock et al., 2007). In Drosophila, the co-occurrence of PcG repression and Pol II has been referred to as the balanced state (Schwartz et al., 2010), but its significance for development is unclear.
PcG repression is epigenetically inherited, making it an ideal mechanism for guiding and stabilizing cell fate. A classical example is the repression of Hox genes by PcG complexes, which maintains the segmental identity across the body axis throughout the life cycle of Drosophila (Ringrose and Paro, 2007; Schwartz and Pirrotta, 2007). PcG repression also restricts the expression of other important developmental control genes (Oktaba et al., 2008; Pelegri and Lehmann, 1994) but its relationship to paused Pol II is not known.
It is possible that the balanced state is related to the bivalent domain in mouse and human embryonic stem cells, which is the co-occurrence of H3K27 trimethylation (H3K27me3) and H3K4me3 near the transcription start site (Bernstein et al., 2006; Mikkelsen et al., 2007). Bivalent domains are found at higher frequency in embryonic stem cells than in differentiated cells and are thought to poise genes for activation during differentiation (Bernstein et al., 2006). However, the universal role of bivalent domains in development has been questioned since they have been found neither in Drosophila (Gan et al., 2010; Schuettengruber et al., 2009) nor in Xenopus (Akkers et al., 2009), and even in mouse embryonic stem cells they may not be as prevalent as previously thought (Marks et al., 2012).
So far, the role of Pol II pausing and PcG repression in development has not been systematically examined. First, this requires a large number of cells from various developmental stages and tissues, and techniques for isolating such cells have only recently been developed (Bonn et al., 2012; Deal and Henikoff, 2010). Second, measurements of paused Pol II are sensitive to the level of transcription (Lee et al., 2008; Nechaev et al., 2010), making it challenging to analyze the role of Pol II pausing during a developmental process where gene expression is highly regulated.
In this study, we used FACS to isolate muscle cells from Drosophila embryos at five time points during development and analyzed the distribution of Pol II and H3K27me3 across the genome. We specifically focused our analysis on paused Pol II in the absence of significant transcription, a state we refer to as poised Pol II. We found that the set of genes occupied by poised Pol II changes dynamically during development and that de novo recruitment of poised Pol II is indicative of future gene induction. Interestingly though, this does not occur in a tissue-specific manner, suggesting that changes in poised Pol II occur globally as a function of developmental time. In contrast, the H3K27me3 mark is tissue-specific, suggesting that PcG repression keeps Pol II in check in a tissue-specific fashion. Indeed, the combination of both marks, the balanced state, is associated with highly dynamic spatial and temporal expression during embryogenesis and is similar to the bivalent domain in mammals.
To analyze chromatin state and transcription during the development of specific cell types, we developed a FACS-based method that can be coupled to immunoprecipitation experiments and mRNA isolation followed by deep sequencing (ChIP-seq and mRNA-seq; Figure 1A). Muscle cells were labeled by expressing plasma membrane-targeted GFP under the control of mef2-GAL4, which drives expression in the developing mesoderm as well as in the somatic, visceral and cardiac musculature starting from embryonic stage 9. This allowed us to sample various developmental stages, encompassing mesoderm subdivision (6–8 h after egg laying (AEL)), myoblast fusion (8–10 h AEL), terminal differentiation (10–12 h AEL), as well as the terminally differentiated musculature (14–17 h AEL). To examine mesodermal tissue at the time of mesoderm specification and gastrulation (2–4 h AEL), we used the Toll10b mutant, which produces embryos that only consist of mesodermal precursors (Furlong et al., 2001; Schneider et al., 1991).
For ChIP-seq experiments, embryos were dissociated into single cells, fixed, filtered and then sorted (Figure 1A and Figure S1). Microscopic examination of sorted cells indicates a purity of 80–90% or greater (Figure S1B). Furthermore, Pol II binds to muscle-specific genes at the expected stages during our time course (Figure 1B), and GFP-positive versus GFP-negative cells sorted from the same cell suspension show large differences in Pol II binding (Figure S1E), indicating strong enrichment of muscle cells in our sample.
mRNA-seq was performed on live-sorted cells from the same tissues. We find that our mRNA-seq data are highly reproducible (R² = 0.99 for all samples) and show the expected dynamic regulation of known muscle genes (Figure 1C). Furthermore, using an FDR of < 0.05 (corrected for multiple testing), we find that the functions of up-regulated and down-regulated genes, as determined by GO annotation, are consistent with the known stages of muscle development (Figure S2A).
To test whether Pol II pausing is regulated during development, we analyzed the occupancy of Pol II across the muscle time course using ChIP-seq. We used an antibody against the C-terminal domain (CTD) of Pol II (8WG16) in independent replicate experiments. Experiments with a different Pol II CTD antibody (4H8) gave similar results in our analyses (Figure S2E and S3A). We previously defined paused Pol II by the pausing index, which is the ratio of Pol II enrichment around the TSS (Pol II_TSS) versus Pol II enrichment in the transcription unit (Pol II_TU) (Muse et al., 2007; Zeitlinger et al., 2007) (Figure 2A). Since Pol II_TU depends on the transcription levels and is subject to noise at very low levels, we focused our analysis on poised Pol II, which we define as high levels of Pol II near the transcription start site (Pol II_TSS in the top 20th percentile of all genes) with transcript levels below an RPKM of 10 as determined by mRNA-seq (as poised genes are transcribed above background (Fuda et al., 2009; Zeitlinger et al., 2007)). This preferentially identifies developmental control genes similar to those published previously (Figure S2B and C).
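The two definitions above can be illustrated with a short Python sketch. This is not the analysis code used in this study: the function names and the input dictionaries are hypothetical, and "top 20th percentile" is interpreted here as the top 20% of genes ranked by Pol II enrichment at the TSS.

```python
from typing import Dict, Set


def pausing_index(pol2_tss: float, pol2_tu: float) -> float:
    """Ratio of Pol II enrichment at the TSS to enrichment in the transcription unit."""
    if pol2_tu == 0:
        return float("inf")  # no signal in the transcription unit
    return pol2_tss / pol2_tu


def poised_genes(
    pol2_tss: Dict[str, float],
    rpkm: Dict[str, float],
    top_fraction: float = 0.20,   # assumed reading of "top 20th percentile"
    rpkm_cutoff: float = 10.0,    # RPKM threshold stated in the text
) -> Set[str]:
    """Genes with high Pol II at the TSS and transcript levels below the cutoff."""
    ranked = sorted(pol2_tss.values(), reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    threshold = ranked[n_top - 1]  # lowest enrichment still inside the top fraction
    return {
        gene
        for gene, value in pol2_tss.items()
        if value >= threshold and rpkm[gene] < rpkm_cutoff
    }
```

Because the classification relies only on the TSS signal and the mRNA level, it does not depend on the noisy transcription-unit signal that enters the pausing index.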
To test whether the recruitment of poised Pol II changes during development, we selected all genes that have poised Pol II in at least one time point. Thus these genes can have paused Pol II with active transcription or be in an inactive state without Pol II at other time points. While 60% remain bound by Pol II with or without transcription throughout the time course (constant set), 40% are found to be in an inactive state with no Pol II at some point (Figure 2B). Strikingly, most of these genes lack Pol II at the first time point and gradually acquire Pol II promoter occupancy during our time course (opening set, n = 502; Figures 2B and S2E). Only a small fraction of genes lose Pol II occupancy over time (closing set, n = 65; Figures 2B and S2E). The de novo recruitment of poised Pol II also correlates with changes in chromatin accessibility as measured by increased DNAse I hypersensitivity (DHS) in a whole-embryo time course (Figure 2C; note that the DHS time course ends at ~11 h, and thus our last time point with maximum Pol II binding cannot be compared). Thus, for a large fraction of poised genes, the promoter becomes accessible and occupied by Pol II during the course of development.
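The grouping into constant, opening and closing sets can be sketched as follows. The exact criteria are not spelled out here, so the rules in this Python sketch are an assumption: each gene is represented by one hypothetical boolean per time point (Pol II bound or not), and only the first and last time points decide between opening and closing.

```python
from typing import Sequence


def classify_occupancy(bound: Sequence[bool]) -> str:
    """Assign a gene to a Pol II occupancy class from per-time-point binding calls.

    Assumed rules (illustrative only):
      constant: Pol II bound at every time point
      opening:  no Pol II at the first time point, Pol II at the last
      closing:  Pol II at the first time point, none at the last
      other:    any remaining pattern
    """
    if all(bound):
        return "constant"
    if not bound[0] and bound[-1]:
        return "opening"
    if bound[0] and not bound[-1]:
        return "closing"
    return "other"
```

For example, `classify_occupancy([False, False, True, True, True])` returns `"opening"`.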
When poised Pol II is established de novo, does it indicate that these genes are now more likely to be activated? Although this might be expected, it has not been formally tested. To do so, we used each time point of the RNA-seq data sequentially as a reference time point (gray squares in Figure 2D) and identified all genes that are induced at future time points or were expressed in past time points (with different thresholds giving similar results, see Figure S3). We then asked what fractions of genes are induced among different Pol II groups (Figure 2D–E). We found that among poised genes in the constant set, ~10–45% are typically induced in the future. Interestingly, a very similar fraction (yet containing mostly different genes) has been expressed in the past (Figure 2D, top), indicating that poised Pol II can also be a mark for past activation. In contrast, newly poised genes are much more likely to be expressed in the future (~45% versus ~10%, with past expression likely due to maternal transcripts; Figure 2D, bottom), supporting the idea that de novo recruitment of poised Pol II is a mechanism that prepares genes for future activation.
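The reference-time-point comparison can be sketched in Python as below. This is a simplified illustration under stated assumptions: a gene counts as expressed when its RPKM reaches a single hypothetical threshold, "induced in the future" means expressed at any later time point, and "expressed in the past" means expressed at any earlier one. The thresholds actually used are described in Figure S3.

```python
from typing import Dict, Iterable, Sequence, Tuple


def future_and_past(
    expr: Sequence[float], ref: int, threshold: float = 10.0
) -> Tuple[bool, bool]:
    """Return (induced_in_future, expressed_in_past) for one gene.

    `expr` holds one expression value per time point; `ref` is the index of
    the reference time point. The threshold is an illustrative assumption.
    """
    future = any(value >= threshold for value in expr[ref + 1:])
    past = any(value >= threshold for value in expr[:ref])
    return future, past


def fraction_induced(
    expr_by_gene: Dict[str, Sequence[float]],
    genes: Iterable[str],
    ref: int,
    threshold: float = 10.0,
) -> float:
    """Fraction of `genes` that are induced after the reference time point."""
    flags = [future_and_past(expr_by_gene[g], ref, threshold)[0] for g in genes]
    return sum(flags) / len(flags)
```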
To obtain a more general measurement of the activity of gene groups, we refined our method. So far, the fraction of induced genes in each group varies and depends on the total number of genes induced, which in turn increases over developmental time (Figures 2D and E). To normalize, we defined the large number of genes with Pol II levels at or below background as control genes, and calculated the ratio between induced genes in the test set (poised Pol II) over control genes (no Pol II) (Figure 2E and F). We call this normalized measurement the relative predictive value. At most time points, the fraction of induced genes among those with prior poised Pol II is significantly higher than among those without prior Pol II (shown in red in Figure 2E), with highest values typically found near the reference sample. Only poised genes in the opening set are less likely to have been expressed in the past as compared to control genes (shown in blue in Figure 2E), and this overall pattern is robust for a variety of thresholds for identifying a poised gene and its activation (Figure S3B). This suggests that when genes switch from no Pol II to poised Pol II, their likelihood of activation becomes significantly higher.
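As a minimal sketch of the relative predictive value described above, the following Python function computes the ratio of the fraction of induced genes in a test set to the fraction in a control set. The function name and the use of gene-name sets are assumptions for illustration; any scaling or significance testing applied in the figures is not reproduced.

```python
from typing import Set


def relative_predictive_value(
    test_set: Set[str], control_set: Set[str], induced: Set[str]
) -> float:
    """Fraction induced among test genes divided by fraction induced among controls."""
    if not test_set or not control_set:
        raise ValueError("test and control sets must be non-empty")
    frac_test = len(test_set & induced) / len(test_set)
    frac_control = len(control_set & induced) / len(control_set)
    if frac_control == 0:
        raise ValueError("no control genes are induced; ratio is undefined")
    return frac_test / frac_control
```

A value above 1 means that genes in the test set (for example, poised genes) are induced more often than control genes without Pol II; a value below 1 means they are induced less often.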
Since on average poised genes tend to be expressed at higher levels than genes with no Pol II, we also used control genes with transcript levels similar to poised genes (Figure S3C). We found that the overall pattern of the predictive values for poised Pol II is still similar. Although this does not rule out that the permissive state associated with poised Pol II is in part mediated by low levels of transcription, it argues that low levels of transcripts per se do not have the same relative predictive value for future gene expression as poised Pol II itself.
We next analyzed whether the recruitment of poised Pol II is tissue-specific but, surprisingly, found no evidence for this. First, we determined Pol II occupancy in differentiated neuronal tissue by sorting GFP-positive cells (GFP driven by elav-GAL4) (Figure S1D). This showed that all genes that are poised in muscle cells have detectable levels of Pol II in neurons, and a large fraction (49%) of these genes are active in neurons (Figure 3A). Second, based on the large-scale in situ hybridization database ImaGO (Tomancak et al., 2007), the opening set genes identified in our muscle time course are indeed expressed late in embryogenesis, but they are expressed in various tissue types, suggesting that Pol II is also recruited to these genes in many other tissues (Figure 3B).
Finally, when we analyzed the relative predictive value of poised genes using the method described in Figure 2F, we also found that poised genes in muscle, whether in the constant set or opening set, are frequently expressed in neuronal cells or whole embryos (data from Graveley et al., 2011) (Figure 3C). Furthermore, the expression of the opening set is also restricted to later time points in whole embryos, consistent with the hypothesis that Pol II is recruited de novo throughout the embryo.
This suggests a model in which poised Pol II is dynamically recruited to genes over time and these genes are then induced in a tissue-specific fashion. This explains why not all poised genes are induced in a particular tissue. For example, only ~50% of all poised genes are expressed during the entire muscle time course, while this cumulative percentage increases to ~70% when the expression data from neuronal cells and whole embryos are included (Figure S3).
To test how promoter elements determine the dynamics of Pol II occupancy during development, we analyzed the core promoter elements in all our gene groups. Studies so far have analyzed highly paused versus less paused genes (Gilchrist et al., 2010; Hendrix et al., 2008; Lee et al., 2008), but whether this difference corresponds to focused and dispersed transcription (Rach et al., 2009) is not clear.
Here we identified three promoter classes (Figure 4). First, so-called housekeeping genes, which are broadly expressed in the embryo (Tomancak et al., 2007), have dispersed promoter elements as previously shown (Rach et al., 2009). Second, we find that genes that are poised at any time point (constant set or opening set) are all highly enriched in promoter elements previously associated with Pol II stalling (GAGA, Inr, DPE, PB, MTE). This suggests that these elements predispose genes for the recruitment of poised Pol II but do not do so by default. Third, we find that genes that are induced without prior poised Pol II fall into a third class of promoters that are enriched for Inr and the TATA box. TATA-enriched promoters have previously been identified as a separate class of promoters that are associated with cell-type-specific gene expression in adult somatic tissues (Engstrom et al., 2007). Thus, our results corroborate TATA-enriched promoters as a separate class and suggest that these promoters do not require recruitment of poised Pol II prior to induction.
Since paused Pol II has been associated with a strong promoter nucleosome in the absence of transcription (Gilchrist et al., 2010), we analyzed the nucleosome organization in the three classes of promoters by performing micrococcal nuclease (MNase) treatment and paired-end sequencing at the first and last time points of the muscle time course (Figure 4B). We found that poised genes indeed show a strong promoter nucleosome when Pol II is not present at the first time point, while promoters occupied by poised Pol II are depleted for the promoter nucleosome. This difference is not intrinsic to DNA sequence because both sets of genes show similar predicted promoter nucleosome occupancy. In contrast, housekeeping genes or TATA-enriched genes do not have a strong promoter nucleosome and the profile looks similar when active or inactive. However, housekeeping genes were distinct from TATA-enriched genes in that the nucleosome occupancy at the first nucleosome was significantly higher. These results show that there are three distinct promoter classes at the level of nucleosome organization.
To analyze the role of PcG repression, we mapped the genome-wide profile of H3K27me3 in all time points of muscle development, as well as in differentiated neuronal cells. We did not map PcG proteins directly because at the well-characterized Ubx gene in Drosophila, PcG proteins bind independently of whether the gene is repressed or active (Papp and Muller, 2006), suggesting that PcG protein occupancy alone may not be a good indicator for PcG repression. On the other hand, the presence of H3K27me3 on the transcription unit of genes has been found to correlate well with PcG repression (Papp and Muller, 2006; Schwartz et al., 2006).
We found that genes that are differentially marked by H3K27me3 were preferentially expressed in either muscle or nervous system based on mRNA-seq expression levels (p < 0.02, Scheirer-Ray-Hare test; Figure 5A) or whole-embryo in situ hybridizations (p < 0.027, Fisher exact test; Figure 5B). An example is the twist gene, which shows high H3K27me3 levels across the transcription unit in neuronal cells but less in muscle cells (Figure 5C). Conversely, the shaven gene has high H3K27me3 levels in muscle cells but less in neuronal cells (Figure 5C). Note that the H3K27me3 levels are not completely absent in the other cell type and this is likely due to the segmentally modulated expression of twist, shaven and many other developmental control genes. Thus, even if a PcG-regulated gene is active in muscle cells, it is rarely expressed in all cells of this tissue.
Next we analyzed H3K27me3 across the muscle time course. We found that genes with H3K27me3 at the transcription unit are less likely to be induced at future time points (blue in Figure 5D). But unlike the predictions of poised Pol II, the predictions of H3K27me3 are tissue-specific. The set of genes highly occupied by H3K27me3 in muscle cells do not negatively predict gene expression in neuronal cells of the same stage or the entire embryo (Figure 5D). This suggests that the gene set with PcG repression is tissue-specific and tends to be maintained during Drosophila embryogenesis.
We next analyzed the co-occurrence of Pol II binding and H3K27me3, which defines the balanced state. For this, we performed sequential ChIP (reChIP) analysis with chromatin from early wild-type embryos (2–4 h AEL), using antibodies against H3K27me3 and then Pol II. The enrichment over input for single ChIPs and reChIPs was calculated following qPCR (Figures 6A and S4A) or deep sequencing (Figures 6B and S4B), after normalizing to an intergenic control region or total read counts, respectively. An increase in enrichment from the first ChIP to the reChIP indicates some degree of co-occupancy, while equal enrichment or less is expected if the two antigens are mutually exclusive (Geisberg and Struhl, 2004; see Extended Discussion). Indeed, we found that genes with Pol II and H3K27me3 enrichment in single ChIPs, but not genes with either H3K27me3 or Pol II enrichment only, showed higher enrichment after reChIP as compared to the first ChIP (Figure 6A). This effect increased with higher Pol II enrichment in single ChIPs and was statistically significant across all genes with H3K27me3 (p < 10^-32) (Figure 6B). In contrast, increased reChIP enrichment was not observed with either control antibodies (FLAG) or H3K4me3 in the second ChIP, consistent with previous evidence arguing against the bivalent domain in Drosophila (Gan et al., 2010; Schuettengruber et al., 2009).
This suggests that H3K27me3 and Pol II co-occur to some degree at many genes. While it is possible that Pol II occupancy levels are reduced upon PcG repression (see Extended Discussion, Figure 6D and below), our data argue against the possibility that the balanced state is the result of mixed populations of cells. This is also consistent with reChIP experiments in human embryonic stem cells indicating the co-occurrence of a form of Pol II and PcG components (Brookes et al., 2012).
We next examined the relationship between the two marks over time. The overlap between genes with high Pol II and high H3K27me3 is highest at the first time point of our series (29.7% of all H3K27me3-marked genes) and decreases during later developmental stages (to 12.5%). This result is similar to observations on the bivalent domain in mammalian embryonic stem cells (Bernstein et al., 2006; Lee et al., 2006; Mikkelsen et al., 2007). Furthermore, although a large number of genes maintain both Pol II and H3K27me3 throughout the time course (cluster 1 in Figure 6D), many genes that are initially balanced lose Pol II, H3K27me3 or both over time (clusters 2–4 in Figure 6D). In fact, PcG-repressed genes significantly overlapped with the closing set in Figure 2B (p < 10^-5; Fisher exact test), supporting the idea that PcG repression can reduce Pol II occupancy over time (Chopra et al., 2011; Dellino et al., 2004).
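For readers unfamiliar with how the significance of a gene-set overlap is assessed, the sketch below computes a one-sided Fisher exact p-value from the hypergeometric distribution using only the Python standard library. Whether a one-sided or two-sided test was used for the values reported here is not stated, so the one-sided form is an assumption, and the function and parameter names are hypothetical.

```python
from math import comb


def fisher_enrichment_p(overlap: int, size_a: int, size_b: int, universe: int) -> float:
    """One-sided Fisher exact p-value for the overlap of two gene sets.

    Probability of observing at least `overlap` shared genes when a set of
    `size_b` genes is drawn at random from `universe` genes, of which
    `size_a` belong to the other set (hypergeometric upper tail).
    """
    if not (0 <= size_a <= universe and 0 <= size_b <= universe):
        raise ValueError("set sizes must lie between 0 and the universe size")
    max_overlap = min(size_a, size_b)
    total = comb(universe, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total
```

In the setting above, `size_a` would be the number of PcG-repressed genes, `size_b` the size of the closing set, `overlap` the number of genes in both, and `universe` the number of genes considered.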
We also analyzed how Pol II and H3K27me3 occupancy at balanced genes correlate with gene expression (Figure 6D). The presence of Pol II correlates with higher expression levels, while the presence of H3K27me3 correlates with lower expression levels. Indeed, genes in the balanced state are expressed at low levels and are often poised. This supports the antagonistic relationship between Pol II and H3K27me3, hence the term ‘balanced state’ is appropriate.
To test whether the balanced state confers specific dynamic expression properties, we analyzed the expression of balanced genes based on in situ hybridization data. We found that 68% of balanced genes belong to a previously identified group termed ‘Blastoderm Patterning’ (p < 10^-23; Fisher exact test), characterized by highly dynamic expression patterns from the blastoderm stage onward (Tomancak et al., 2007). In comparison, genes selected by the presence of only H3K27me3 show less enrichment (35%; p < 10^-12; Fisher exact test). Interestingly, many of these expression patterns are more dynamic than those of Hox genes and are not restricted to specific lineages. Thus, the balanced state marks genes with highly dynamic regulation. This suggests that PcG regulation keeps poised Pol II in check and that this repression can be overcome in a tissue-specific fashion.
Since the behavior of the balanced state during Drosophila embryogenesis is reminiscent of the bivalent state in mammalian embryonic stem cells, we investigated why the balanced state in Drosophila is not associated with H3K4me3. We found that genes with poised Pol II do not have significant levels of H3K4me3 (Figures 7A and B). Only genes that are transcribed (but with similar Pol II_TSS occupancy) show H3K4me3 (Figures 7A and B).
This is in contrast to mouse embryonic stem cells where the H3K4me3 signal is higher relative to Pol II (using the same antibodies in the two species), and genes with poised Pol II have high levels of H3K4me3 (Figure 7C). The H3K4me3 mark at poised genes is more narrowly distributed but it is almost as high as at highly transcribed genes. The high levels of H3K4me3 at poised genes in mammals cannot be explained by the presence of CpG islands since even promoters not within CpG islands show significant H3K4me3 levels (Figure 7D). While the exact mechanisms that explain this species-specific difference remain to be shown, we found evidence that the lack of H3K4me3 at poised genes in Drosophila is due to their low nucleosome occupancy and higher nucleosome turnover (Figure S5).
Finally, to compare the dynamic behavior of balanced and bivalent genes during embryonic stem cell differentiation, we analyzed published Pol II, H3K4me3 and H3K27me3 data in mouse embryonic stem cells and performed an extended time course expression analysis in response to retinoic acid treatment (Lin et al., 2011). We found that the bivalent state is overall more frequent than the balanced state, but after adjusting the analysis thresholds (Extended Experimental Procedures), balanced genes and bivalent genes largely overlap and show a similar behavior in our analysis (Figure 7E).
For both poised Pol II and H3K4me3, the relative predictive values for future gene expression are high. Poised Pol II may be more stage-specific since the values are highest in time points just following the reference sample, while the values for H3K4me3 are high throughout the time course (Figure 7E). In combination with H3K27me3 though, poised Pol II and H3K4me3 behave very similarly, i.e. they tend to mark genes with late expression. This supports the hypothesis that the balanced state and the bivalent domain are in principle related, and that differences in the relative levels of Pol II and H3K4me3 (and perhaps other regulatory differences) explain why the bivalent domain was discovered in mammals, while the balanced state was first described in Drosophila.
We find that poised Pol II is frequently recruited to promoters de novo over developmental time, and that this recruitment helps establish a permissive state that can enable future activation. The mechanisms by which poised Pol II is recruited de novo are not known. While it could be mediated by sequence-specific transcription factors, transcription factors examined in vivo so far appear to affect both the recruitment and elongation of Pol II (Adelman et al., 2005; Boehm et al., 2003), or may even preferentially regulate Pol II elongation (Rahl et al., 2010). It is also possible that the recruitment of poised Pol II is regulated at the level of chromatin state, e.g. changes in the boundaries of heterochromatin could affect promoter accessibility.
It is clear, however, that not all genes that are poised will be expressed in these cells in the near future. Thus, the poised Pol II state is not simply an early sign of gene activation. This makes sense because the recruitment of poised Pol II is not tissue-specific and thus the cell may not receive the appropriate developmental or environmental signal to activate a poised gene. Furthermore, poised Pol II can persist for some time after a gene is down-regulated and marks past activation.
Regulation of a permissive state over developmental time has developmental implications. First, cells of a developing tissue sometimes have a time window in which they are competent to respond to certain signals (Pearson and Doe, 2004; Tran and Doe, 2008). Thus, changes in poised Pol II might alter the way a cell responds to extracellular signals over time. Second, it may be important during pattern formation that a wide range of cells are able to respond to activating signals such as morphogen gradients, although only a subset will receive sufficient signal to activate appropriate genes. Since we find that the poised state is also present in mouse ES cells and predicts stage-specific gene expression, it is possible that the role of poised Pol II in development reflects a broadly conserved feature of animal development.
While much of our work focused on poised Pol II, we identified a significant number of genes that are induced without prior poised Pol II consistent with other previous examples (Gilchrist et al., 2012; Lin et al., 2011). Remarkably, these genes tend to have a distinct combination of core promoter elements. Their promoters are enriched for the TATA box and their nucleosome configuration is distinct from paused genes or housekeeping genes. It remains to be shown how different core promoter elements are differentially used in development and how they influence the dynamics of Pol II initiation and elongation in vivo.
We found that different aspects of the chromatin state, such as poised Pol II or H3K27me3, can be used to analyze transcription during development. For example, while the recruitment of poised Pol II is mostly stage-specific, PcG repression is tissue-specific and may keep poised Pol II in check. Thus, different properties of the transcription or chromatin state correlate with either spatial or temporal changes during development, suggesting that there are as yet undiscovered relationships between chromatin regulation and development. This is exciting because an important goal in biology is to predict cellular behavior and development based on genotype and epigenetic state. Thus, mapping the relationship between chromatin and development more systematically could serve as a roadmap for predicting the behavior of diseased cells in humans, e.g. by identifying the tissue of origin and the developmental potential of cells.
Briefly, 50 mg aliquots of tightly staged embryos expressing CD8-GFP in either muscle (mef2-GAL4) or neurons (elav-GAL4) are dissociated in 7 mL Dounce tissue grinders, filtered, pre-fixed for 5 min with 1% formaldehyde while spinning down, post-fixed for 15 min and passed through a 70 μm syringe filter (BD Medimachine). GFP-positive cells are isolated on a MoFlo high speed sorter (Beckman Coulter). For a list of fly lines used refer to Table S1.
ChIPs from whole embryos (Toll10b, Oregon R) are performed as described in He et al. (2010) (see Extended Experimental Procedures). Chromatin from FACS-sorted cells is pelleted by high-speed centrifugation and sonicated to an average size of 200 bp, and 2–7 μg soluble chromatin is used for each ChIP. Sequencing libraries are prepared from 5–20 ng immunoprecipitated DNA or 100 ng input DNA according to Illumina’s instructions.
Briefly, 60 μg chromatin is immunoprecipitated with 10 μg anti-H3K27me3 antibody (abcam ab6002 and Active Motif #39155), eluted and subsequently diluted before precipitation with 10 μg anti-CTD4H8 antibody (Millipore). For an extended protocol see Extended Experimental Procedures. Sequences of qPCR primers are listed in Table S3.
Total RNA from sorted cells is isolated using TRIzol (Invitrogen). Polyadenylated B. subtilis spike-in RNAs (in vitro transcribed from ATCC clones #87482-87486) are added to a defined amount of total RNA before mRNA-seq libraries are made following Illumina’s instructions.
50 mg crosslinked Toll10b embryos are homogenized and washed, and aliquots are digested with increasing amounts of MNase and 20 μg RNaseA at 37 °C for 1 h. After purification on MinElute columns (QIAgen), samples are run on a 2% agarose gel, and DNA corresponding to mononucleosomes (in this case from the sample treated with 32 U MNase) is prepared for paired-end sequencing following Illumina’s instructions.
Sequenced libraries (Illumina GAIIx) are aligned to the UCSC dm3 reference genome. Enrichment values are calculated for each protein-coding transcript in Flybase Release 5.28, for Pol II TSS (200 bp wide region centered at +30 bp), for Pol II TU (from +400 bp to the 3′ end), for H3K27me3 (entire length of the transcript), and for H3K4me3 (TSS to +500 bp). Each enrichment value is the number of aligned reads overlapping the region in the IP sample divided by the number in the corresponding input control, after read-count normalization. To correct for artificially high ratios due to little signal in both the IP and control regions, high ratios with low IP signal are discarded. For genes with multiple annotated TSSs, the enrichment values for the transcript with the highest Pol II TSS enrichment are used. Enrichment values for all genes are listed in Table S4.
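The enrichment calculation described above can be illustrated with a minimal sketch. It is not the pipeline used for the analysis: the function names, the representation of reads as half-open `(start, end)` intervals, the use of total aligned reads as the read-count normalization, and the `min_ip_reads` and `high_ratio` cutoffs are all assumptions made for illustration, since the text does not state the thresholds applied.

```python
def tss_window(tss, strand, center_offset=30, width=200):
    """Half-open window of `width` bp centered `center_offset` bp downstream of the TSS."""
    sign = 1 if strand == "+" else -1          # downstream direction depends on strand
    center = tss + sign * center_offset
    half = width // 2
    return center - half, center + half


def count_overlapping(reads, region):
    """Count reads (half-open intervals) that overlap the region by at least 1 bp."""
    start, end = region
    return sum(1 for r_start, r_end in reads if r_start < end and r_end > start)


def enrichment(ip_reads, input_reads, region, ip_total, input_total,
               min_ip_reads=0, high_ratio=float("inf")):
    """IP/input ratio after scaling each count by its library size.

    Returns None when the ratio is undefined (no input reads) or when a
    high ratio rests on little IP signal. Both cutoffs are hypothetical;
    the defaults disable the filter.
    """
    ip_count = count_overlapping(ip_reads, region)
    input_count = count_overlapping(input_reads, region)
    if input_count == 0:
        return None
    ratio = (ip_count / ip_total) / (input_count / input_total)
    if ratio > high_ratio and ip_count < min_ip_reads:
        return None                             # discard unreliable high ratio
    return ratio


# Toy example: a plus-strand transcript with its TSS at position 1000.
region = tss_window(1000, "+")
ip = [(900, 950), (1000, 1050), (1100, 1150), (2000, 2050)]
inp = [(950, 1000), (3000, 3050)]
value = enrichment(ip, inp, region, ip_total=len(ip), input_total=len(inp))
```

For a gene with several annotated TSSs, the same function would be evaluated per transcript and the transcript with the highest Pol II TSS value retained.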
Libraries are sequenced on an Illumina GAIIx and aligned with TopHat to the reference genome (Flybase Release 5.28 with the five spike-in mRNA sequences added as pseudo-chromosomes). Cufflinks is used to estimate transcript abundance (in RPKM) and Cuffdiff is used for differential expression analysis.
We define a gene as “minimally expressed” if its RPKM is < 10, and as “poised” if it is both minimally expressed and has a Pol II TSS enrichment value in the top 20% of all genes in both Pol II 8WG16 antibody replicates. “Up-regulated” and “down-regulated” genes are called at the default false discovery rate of 0.05. A gene is “induced” if it crosses the “minimally expressed” threshold between two consecutive time points and qualifies as “up-regulated”. A gene is “PcG-repressed” if its H3K27me3 enrichment is in the top 2.5% of all genes.
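These definitions translate into a few simple rules, sketched below under stated assumptions. The function and variable names are hypothetical, the cutoff is taken as the lowest value still ranked within the top fraction of genes, and the comparison `q_value < fdr` is an assumption about how the 0.05 false discovery rate is applied. The sketch is not the code used for the analysis.

```python
RPKM_THRESHOLD = 10.0   # "minimally expressed" means RPKM below this value


def top_fraction_cutoff(values, fraction):
    """Lowest value that still ranks within the top `fraction` of `values`."""
    ranked = sorted(values, reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    return ranked[n_top - 1]


def poised_genes(rpkm, tss_rep1, tss_rep2, fraction=0.20):
    """Genes that are minimally expressed and in the top fraction in BOTH replicates."""
    cut1 = top_fraction_cutoff(tss_rep1.values(), fraction)
    cut2 = top_fraction_cutoff(tss_rep2.values(), fraction)
    return {
        gene for gene in rpkm
        if rpkm[gene] < RPKM_THRESHOLD
        and tss_rep1[gene] >= cut1
        and tss_rep2[gene] >= cut2
    }


def pcg_repressed_genes(h3k27me3, fraction=0.025):
    """Genes whose H3K27me3 enrichment lies in the top fraction of all genes."""
    cutoff = top_fraction_cutoff(h3k27me3.values(), fraction)
    return {gene for gene, value in h3k27me3.items() if value >= cutoff}


def is_induced(rpkm_before, rpkm_after, q_value, fdr=0.05):
    """True if the gene crosses the expression threshold and is called up-regulated."""
    crosses = rpkm_before < RPKM_THRESHOLD <= rpkm_after
    return crosses and q_value < fdr


# Toy data for four hypothetical genes; a 50% fraction keeps the example small.
rpkm = {"a": 2.0, "b": 1.0, "c": 3.0, "d": 50.0}
rep1 = {"a": 5.0, "b": 1.0, "c": 0.5, "d": 3.0}
rep2 = {"a": 4.0, "b": 0.9, "c": 0.7, "d": 6.0}
poised = poised_genes(rpkm, rep1, rep2, fraction=0.5)
```

Requiring both replicates to pass the cutoff makes the "poised" call conservative: a gene with strong Pol II signal in only one replicate is not counted.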
Sequences surrounding all annotated Drosophila melanogaster transcript start sites are scanned for the core promoter elements listed in Table S2. A core promoter element is scored as present if it is found with no mismatch within a specified bp window relative to the transcription start site.
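A mismatch-free scan of this kind can be sketched as follows. The consensus `TATAWAAR` and the window of -40 to -20 are illustrative placeholders, not the values from Table S2, and the function names are hypothetical. The sketch assumes the sequence is given 5′ to 3′ on the transcript strand, treats the TSS as offset 0 (a simplification of the usual -1/+1 numbering), and interprets the window as the allowed start positions of the motif.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_pattern(consensus):
    """Compile an IUPAC consensus into a regex that allows no mismatches."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


def has_element(seq, tss_index, consensus, window):
    """True if the motif starts at any offset in `window` (inclusive) relative to the TSS.

    `seq` is oriented on the transcript strand and `tss_index` is the
    0-based index of the TSS within `seq`.
    """
    pattern = consensus_to_pattern(consensus)
    first, last = window
    length = len(consensus)
    for offset in range(first, last + 1):
        start = tss_index + offset
        if start < 0 or start + length > len(seq):
            continue                      # motif would run off the sequence
        if pattern.fullmatch(seq, start, start + length):
            return True
    return False


# Toy promoter: an exact TATA-like match starting 30 bp upstream of the TSS.
promoter = "G" * 20 + "TATAAAAG" + "C" * 22 + "A" + "C" * 10
tss_index = 50
found = has_element(promoter, tss_index, "TATAWAAR", (-40, -20))
```

Because the pattern is anchored at each candidate position with `fullmatch`, a motif that occurs outside the window is not scored as present.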
We thank V. Weake for fly stocks, the Cytometry and Molecular Biology core facilities at Stowers for technical help, Y. Jiang for help in the analysis, C. Seidel, J. Conaway and J. Workman for discussions. This project was funded by the NIH New Innovator Award to J.Z. (1DP2 OD004561-01). The work on embryonic stem cells was performed to fulfill, in part, requirements for B. D.’s PhD thesis research as a student registered with the Open University. J.Z. is a Pew scholar.
All data have been submitted to GEO under the accession number GSE34304.