|Home | About | Journals | Submit | Contact Us | Français|
Understanding the principles governing mammalian gene regulation has been hampered by the difficulty in measuring in-vivo binding dynamics of large numbers of transcription factors (TF) to DNA. Here, we develop a high-throughput Chromatin ImmunoPrecipitation (HT-ChIP) method to systematically map protein-DNA interactions. HT-ChIP was applied to define the dynamics of DNA binding by 25 TFs and 4 chromatin marks at 4 time-points following pathogen stimulus of dendritic cells. Analyzing over 180,000 TF-DNA interactions we find that TFs vary substantially in their temporal binding landscapes. This data suggests a model for transcription regulation whereby TF networks are hierarchically organized into cell differentiation factors, factors that bind targets prior to stimulus to prime them for induction, and factors that regulate specific gene programs. Overlaying HT-ChIP data on gene expression dynamics shows that many TF-DNA interactions are established prior to the stimuli, predominantly at immediate-early genes, and identified specific TF ensembles that coordinately regulate gene-induction.
The complex gene expression programs that underlie development, differentiation, and environmental responses are primarily determined by binding of sequence-specific transcription factors (TFs) to DNA (Graf and Enver, 2009; Laslo et al., 2006; Struhl, 2001). While it is clear that TFs play a critical role in gene regulation, how these factors work together to control gene expression responses in complex organisms is still not fully understood (Davidson, 2010).
To date, systematic efforts to understand the mammalian regulatory code have mostly relied on generalization from studies on simple model organisms (Capaldi et al., 2008; Harbison et al., 2004), in vitro experiments, and studies of individual gene loci (Bossard and Zaret, 1998; Cirillo et al., 2002; Thanos and Maniatis, 1992). Genomic approaches, such as correlation analysis of gene expression profiles (Segal et al., 2003), and more recently RNAi perturbation followed by gene expression readouts (Amit et al., 2009), have provided an initial glimpse into the complexity of mammalian gene regulation. However, such approaches cannot distinguish direct from indirect effects and cannot address network redundancy and temporal regulation, thus they provide limited insight into the underlying regulatory mechanisms.
A complementary approach is to measure the temporal in vivo binding of TFs to cis-regulatory regions under relevant stimuli. Recent advances in genomic technologies allow for unbiased and accurate genome-wide characterization of TF binding using ChIP followed by DNA sequencing (ChIP-Seq) (Barski et al., 2007; Johnson et al., 2007; Mikkelsen et al., 2007). Despite these advances in detection, ChIP remains relatively low throughput (Barski et al., 2007; Gerstein et al., 2010; Johnson et al., 2007; Mikkelsen et al., 2007; Negre et al., 2011; Roy et al., 2010). As a result, little is known about the genome-wide dynamics of protein-DNA interaction networks.
To address these challenges we developed HT-ChIP, a reproducible, high throughput and cost-effective method for ChIP coupled to multiplexed massively parallel sequencing. We used HT-ChIP to investigate the principles of gene regulation in the model system of primary innate immune dendritic cells (DCs) stimulated with the pathogen component lipopolysaccharide (LPS). In response to stimulation, DCs activate a robust, specific, and reproducible response that unfolds over several hours, involves changes of thousands of genes (Amit et al., 2009; Rabani et al., 2011), and plays a critical role in directing the host immune response. We used HT-ChIP to build genome-wide dynamic maps of TF localization to DNA during response of DCs to LPS. We screened antibodies for the most expressed transcription factors and identified ChIP-Seq grade antibodies for 25 TFs, RNA polymerase II (Pol II), and 3 epigenetic modifications. Using these validated antibodies we performed HT-ChIP across four time points upon LPS stimulation. Surprisingly, we find that much of the binding of TFs is pre-coded during differentiation and prior to stimulation, predominantly on immediate early genes. Many of the immediate early genes are associated with High Occupancy Target (HOT) regions similarly to those recently reported in flies and worms (Gerstein et al., 2010; Negre et al., 2011; Roy et al., 2010). By focusing on dynamics of expression and binding, our work further expands the functional role of these HOT regions as potential stimulus dependent induction hubs in mammals.
Our data shows that TFs vary substantially in their binding dynamics, number of binding events, preferred genomic locations and interactions with other TFs. Analysis of these different binding properties together with temporal gene expression and epigenetic marks shows that TFs fall into at least three broad characteristics, suggesting a multilayered architecture. Pioneer TF described recently (Bossard and Zaret, 1998; Cirillo et al., 2002; Ghisletti et al., 2010; Heinz et al., 2010; Lupien et al., 2008) are coded during differentiation, are unchanged in binding location during stimulus and correlate with the cell epigenetic state (Ghisletti et al., 2010; Heinz et al., 2010). A second prominent layer of TF binds thousands of genes in the un-stimulated state and is highly correlated with future stimulus dependent gene induction. A third set of TFs bind dynamically in a stimulus dependent manner and control induction of gene sets enriched for a shared biological activity (e.g. Inflammatory, anti-viral response and cell cycle). Together, our findings demonstrate the importance of global TF dynamic maps in uncovering the principles of the regulatory code. For visual exploration of the data, we developed an extension to the Integrative Genomics Viewer (IGV (Robinson et al., 2011)), geared specifically towards viewing time course data. The entire data can be viewed from: http://www.weizmann.ac.il/immunology/AmitLab/data-and-method/HT-ChIP.
We developed HT-ChIP, an automated method for systematic mapping of in vivo protein-DNA binding that increases the throughput and sensitivity, while reducing the labor and cost required for ChIP-Seq. Unlike the standard ChIP assay performed in individual tubes, which involves over 25 steps of chromatin washing, reverse crosslinking, DNA purification, gel extraction and library construction (Barski et al., 2007; Johnson et al., 2007; Mikkelsen et al., 2007); HT-ChIP uses magnetic solid phase beads for the immunoprecipitation of protein-DNA complexes, DNA purification, size-selection and library construction eliminating laborious manual processes (Figure 1, Methods). Furthermore, the entire HT-ChIP process is performed in the same well reducing sample loss of precipitated DNA material allowing a significant reduction in the required number of cells (Figure S1A–B, Methods). HT-ChIP further leverages the yield of current next-generation sequencing by multiplexing an arbitrary number of different indexed sequencing adapters, 96 in our case, to combine samples in a single sequencing flow cell (Figure 1A). The data produced by HT-ChIP-Seq is highly correlated with traditional ChIP-Seq data generated both in our labs and by others ((Ghisletti et al., 2010; Heinz et al., 2010); Figure S1C–D).
We used HT-ChIP to reconstruct the dynamic binding network of 25 TFs in primary mouse dendritic cells (DCs) following LPS stimulation (Figure 1C and Table S1). We used RNA-Seq of DC activated with LPS at five time points (0, 1, 2, 4, 6 hours) to identify the most highly expressed TFs in DC (RPKM > 15, totaling 184; see Supplementary Text). We then collected 271 commercially available antibodies targeting these TFs (Figure 1B; Methods). We tested each antibody using a signature readout (Ram et al., 2011) (‘ChIP-String’) that measures selected genomic DNA regions with high regulatory activity (Ghisletti et al., 2010). We identified 29 antibodies (25 TFs, 3 histone modifications and Pol II) that passed our selection criteria as ‘ChIP grade’, based on their enrichment on the signature regions and performance in Western blots. These antibodies were then used for HT-ChIP at four time points (0, 0.5, 1, 2 hours) post LPS stimulation, during which most of the transcriptional changes occur (Figure S1F,G).
Recent studies have demonstrated that the ratio between H3K4me3 and H3K4me1 histone marks can be used to identify promoter and enhancer regions (Heintzman and Ren, 2009): promoters are associated with a higher proportion of H3K4me3-marked histones (H3K4me3+), while enhancers have a higher proportion of H3K4me1 marked histones (H3K4me1+). We identified promoter candidates as H3K4me3+ regions, and retained those that overlapped a known (Pruitt et al., 2007) or reconstructed transcription start sites as identified from RNA-Seq data ((Guttman et al., 2010); Figure 2A, Figure S2 Methods). Notably, ~75% of the identified promoters were bound by at least one of the TFs. To define enhancers, we identified candidates containing H3K4me1+ and retained those that were also bound by at least one TF (See for example the Il1a loci in Figure 2A; Methods). Altogether, we identified 38,439 enhancers and 11,505 promoters.
Consistent with previous observations (Ghisletti et al., 2010), we found that different chromatin marks exhibit different dynamics during stimulation (Figure S2C–I). For example H3K4me3 is remarkably stable during the first 2 hours of LPS response. The few exceptions are in ~30 loci which are lowly expressed pre-stimulation and become strongly induced after stimulation (top 95% of induction). Conversely, H3K27Ac is more variable and tends to change in correlation with PolII binding (r=0.66 for H3K27Ac vs r=0.49 for H3K4me3). These chromatin marks are significantly less dynamic than most TFs (Figure S2C–I).
Taking all temporal reads together to obtain a “compressed” dataset, we identified significant binding events (peaks) for each TF (Guttman et al., 2010) (Methods). The vast majority (82%) of high scoring TF peaks fall within the promoter regions or the enhancer regions defined above (p<10−20). The binding landscape is consistent with the known specificities of TFs (Methods, Table S5 and Figure S2J). Using de-novo motif discovery (Bailey and Elkan, 1994) across the high-scoring bound sites, we identified the known motifs for 20 (80%) of the TFs (Gupta et al., 2007), as well as novel motifs for E2f4, Ets2 and Ahr (12%). The highest scoring motif (E < e−100) found for the TF E2f4 is the cell cycle genes homology region (CHR), a previously identified regulatory element found adjacent to a handful of cell cycle genes, which appears in tandem to an E2f canonical motif (Lange-zu Dohna et al., 2000).
The TFs vary in both number and location of binding events. Some TFs (PU.1 and Cebpb) bind >30,000 sites, while others (such as Hif1a) bind <1000 (Figure 2B), consistent with ChIP data for the same factors from other studies (Barish et al., 2010; Ghisletti et al., 2010; Heinz et al., 2010). Notably, ~70% of the identified peaks fall in close proximity (500bp) to a peak of either PU.1 or Cebpb (p<10−10; Table S2). Different factors exhibit substantially different localization preferences with some favoring enhancers while others tend to bind in promoters or in less canonical regions (Figure 2). For example, the runt domain 1 factor Runx1 binds many targets at their 3′ UTR regions (Figure S3). Further analysis showed that Runx1 binding at the 3′ end tends to be stronger and more dynamic in comparison to promoter binding, and that the 3′ end target genes are more strongly expressed and have a stronger enrichment for an anti-inflammatory function. Examining genes that are down regulated upon Runx1 knock down in primary DC activated with LPS (Amit et al., 2009), we find a significant enrichment for inflammatory genes (p=0.003, hypergeometric) that are bound at their 3′ end. Taken together, these results suggest that Runx1 may have a different function when binding the 3′ end of genes as compared to promoter bound regions.
Associating each binding site with its associated regulatory region (promoter or enhancer) resulted in 184,805 high confidence interactions (Methods). Similar to recent reports (Zinzen et al., 2009), the resulting network suggests that TFs tend to bind in cis-Regulatory Modules (CRMs; Figure 3A; Table S2 and Figure S4A,B) occupied by multiple other factors (1.5 fold enrichment over a random model, p<10−10; Supplementary text). Two notable factors, PU.1 and Cebpb tend to occupy most regions bound by all other factors but also bind many regions devoid of binding of the TFs we surveyed (>10% of their bound regions have no other factor binding, a 3-fold enrichment; p<10−10; Figure S4C). PU.1 and Cebpb bound regions are also highly enriched in motifs of other TFs we did not survey (e.g. Klf, Myc, and Hlf; p < 10−10) suggesting that PU.1 and Cebpb may co-bind with additional TFs at these sites (Ghisletti et al., 2010). Moreover, we find that ~8% of the regions are occupied by a larger number of TF than expected by chance. These regions, termed HOT regions (Gerstein et al., 2010; Negre et al., 2011; Roy et al., 2010), are defined to have 8 or more bound TFs (3.5-fold enrichment; Methods and Figure S4B).
TFs interact with one another in a combinatorial fashion to control different gene programs either by forming complexes that together bind DNA (Junion et al., 2012) or by independently binding DNA regions (Arnosti and Kulkarni, 2005). We searched for co-occurring pairs of TFs, while excluding HOT regions, which can confound discovery of such interactions (Gerstein et al., 2010; Negre et al., 2011; Roy et al., 2010) (Figure 3, Methods). As expected, our results recapitulate well-known homotypic transcriptional complexes. For example, Rela co-occurs with other Nfkb family members (Relb, Nfkb1, cRel,(Smale, 2012)), Stat1 co-occurs with Stat2 and Irf1 co-occurs with Irf2. We also find novel heterotypic interactions such as, RelA-Runx1 and E2f4-Ets2 that warrant further exploration. Interestingly, the E2f4-Ets2 complex is enriched in cell-cycle genes that are dynamically repressed after stimulation (p<10−3; Methods) (Table S3).
The TFs vary substantially in the extent of dynamic changes in their binding during the response. The Ifit locus, a robust anti-viral response cluster provides an illustrative example (Figure 4A). While PU.1 is bound at the same level in both un-stimulated and stimulated cells (Figure 4A, top inset), Stat1 binds only during the late stages of LPS response (Figure 4A, bottom inset). Globally only ~10% of the PU.1 binding sites are associated with substantial (>3-fold) changes post-stimulation, as opposed to ~90% of the Stat1 binding sites (Figure 4B, methods). Overall, 58,075 (31%) TF binding events are “dynamic” (Figure 4B, methods, and Figure S4D). In the following sections we analyze how temporal changes in the TF binding profiles correlate with the expression levels of their target genes.
To study the functional impact of TF binding, we associated cis-regulatory regions with their target genes and generated a temporal TF-gene regulatory network containing 79,797 TF-gene interactions (Figure 4C,D, Methods). Overall, we find that genes bound by few TFs (<5) are enriched for basic cellular processes (p<10−5), while genes targeted by many TFs (>15) are enriched for inflammatory response pathways (p<10−7; Figure 4D, Table S3). The targets of individual TFs are also enriched for specific functional classes (Table S4). For example, E2f4 binding is enriched for cell cycle genes (p<10−10); Nfkb binding is enriched for inflammatory response genes (p<10−10); and Stat TF binding is enriched for anti-viral response genes (p<10−10; Figure S5, Table S4).
To explore the relationship between binding dynamics and expression patterns we used temporal gene expression data using RNA-Seq for 5 different time-points (0, 1, 2, 4, 6 hours) following LPS stimulation. We divided the 4,993 genes that responded to LPS stimulation (2-fold change compared to the un-stimulated state, Methods) into five clusters (Figure 5A, Table S6, Figure S5A and Methods): The ~1,300 induced genes constituted three clusters: immediate early induced genes whose expression peaks before the first hour (293 genes), intermediate induced genes with peak expression prior to the second hour (227 genes), and late induced genes whose expression peaks after two hours (808 genes). Over 3,500 repressed genes comprised two additional clusters, genes that are gradually repressed and those that are rapidly repressed.
Genes in the LPS-induced clusters are bound by more TFs prior to the stimulus than non-induced genes (p<10−10, Supplementary text). Furthermore, some of the factors (e.g., Junb, Atf3, Irf4) are specifically enriched at the promoters or enhancers of these induced genes even prior to exposure to stimulus (p<10−3, Figure 5A, S5A,B; Methods). In contrast, PU.1 and Cebpb bind a larger number of genes in the pre-stimulated state, but are not enriched for LPS-induced genes. These results suggest that transcriptional induction potential is established prior to stimulation via preferential binding of a selective set of TFs to inducible genes.
We next compared TF dynamics and gene expression following stimulation, finding multiple cases in which the timing of gain or loss of TF binding at genes in the induced clusters significantly precedes or coincides with the timing of transcriptional induction (p<10−3; Figure S5C,D; Methods). We therefore further sub-clustered the genes within each expression profile by the similarity of their dynamic binding profiles, resulting in 19 clusters, each representing a unique combination of expression and binding profiles (Figure 5A, Methods, Table S3 and Figure S5E).
The 19 binding/expression clusters uncover different regulatory programs activated by innate immune DCs challenged with LPS. For instance, the late induced gene cluster is partitioned into two sub-groups. Late induced cluster II is strongly associated with late binding of Stat1 and Stat2 (Figure 5A,B, p<10−3; Methods), and consists of highly expressed genes (average RPKM 250) that are enriched in interferon signaling and other anti-viral pathways (p<10−10). In contrast, late induced cluster I is only weakly bound by the Stat factors, genes in this cluster have lower absolute expression levels (average RPKM 100) and are enriched mainly in leukocyte proliferation pathways and also in lowly expressed anti-viral genes (p<10−5 and p<10−10 respectively; Table S3). This partition suggests two different regulatory modes of late LPS gene activation: a high expression Stat-bound anti-viral response arm (Figure 5A,B), and a Stat-independent response arm, which orchestrates a second wave of inflammatory response genes (e.g., CD86 that plays a critical role in T cell activation and survival (Sharpe and Freeman, 2002)).
Immediate early genes play a critical role in rapid response to changes in the environment, yet their mode of regulation is not fully understood (Amit et al., 2007; Hargreaves et al., 2009; Ramirez-Carrozzi et al., 2009; Weake and Workman, 2010). The immediate early genes are partitioned into three clusters, each associated with a distinct binding profile and enriched for genes from different pathways (Figure 5A). Immediate early cluster I is defined by strong binding of Rela and Egr1 during the first hour of stimulation, has a relatively low maximal expression (average RPKM 50) and is enriched for transcription factor genes, including Egr1, Egr2 and Egr3. In contrast, immediate early cluster II (Figure 5C) consists of highly expressed genes (average RPKM 300) that are targeted by a large number of TFs, many of which bind their targets prior to stimulation, and are enriched for inflammatory response genes (both TFs and cytokines, e.g., Nfkbiz, Tnfaip3, Junb, Klf6 and TNF, p<10−5; Figure S5E and Table S3). The low conservation together with the high degree of redundancy observed on immediate early genes (Figure S6, Supplementary text) suggests regulation via a ‘billboard’ or collective model rather than an enhanceosome model (Arnosti and Kulkarni, 2005). In this model, the billboard/collective is pre-assembled prior to stimuli and recruits, possibly without great specificity, many different factors on relatively non-conserved and weak binding sites to achieve high expression levels.
While induced genes are generally associated with gain of binding post-stimulation (p<10−10), repressed clusters are enriched for loss of TF binding or for no binding gain (p<10−10, Supplementary text). For instance, repressed cluster III (Figure 5A) is strongly enriched in cell cycle genes (e.g. Cdk1) and is primarily associated with static binding of the cell-cycle related factor E2f4 and Ets2 while it is depleted of binding of Junb, Irf4 and Atf3 which bind most of the induced genes. In another example, the histone gene locus is bound by Nfkb and E2f family members in the basal state, followed by loss of Nfkb factors immediately post-stimulation (Figure S5F). Together this suggests that genes that are not bound by priming factors pre-stimulation (Junb, Atf3 and Irf4) are more prone to repression following stimulation. A second alternative is that circuits involved in repression, like recruitment of the Smart/Ncor complex by Bcl6 (Barish et al., 2010), may be less profiled in our study. We conclude that dynamic binding maps specify distinct regulatory programs and provide information on both timing and amplitude of gene expression.
The temporal structure of the TF network is consistent with a model of hierarchical organization and temporal dependencies between the different TFs where some TFs are bound prior to or concomitantly with other TFs. Such “layered architecture” of regulation has been described previously where Pioneer factors bind compacted chromatin, initiate chromatin remodeling during differentiation, and enable subsequent binding of non-Pioneers factors (Bossard and Zaret, 1998; Cirillo et al., 2002; Lupien et al., 2008).
To analyze the patterns of binding dependencies between the different TFs, we constructed a hierarchy graph (Figure 6A), where an edge is directed from factor A to factor B if factor A binds at least 30% of the regions bound by factor B at the same or earlier time. The graph reveals a clear organization that supports and extends the basic distinction between pioneers and non-pioneers. Not surprisingly, the top-most tier consists of the two factors in our set (PU.1 and Cebpb) previously described as pioneers (Ghisletti et al., 2010; Heinz et al., 2010). A second tier consists of three TFs (Junb, Irf4, Atf3), which bind pre-stimulation at LPS induced genes that later become associated with more specific and dynamic factors. Interestingly, in macrophages AP-1 binding motifs are also enriched at enhancers of LPS induced genes bound by the Pioneer factor PU.1 (Ghisletti et al., 2010). Our results suggest that Junb and Atf3 may be the AP-1 components at these sites. At the bottom tier we find factors that are more dynamic and control more specific sets of genes that have common biological functions. For instance, the Stat TFs target the late induced anti viral genes, while the Nfkb factors Rel, Relb and Nfkb1 target the inflammatory program.
To better characterize the TFs in the hierarchy we consolidated the various binding properties discussed above: (1) number of bound regions, (2) ratio of enhancer to promoter binding, (3) percent of dynamic binding events, (4) fraction of regions bound in isolation, (5) fraction of all DNA motifs in the genome bound by the factor, (6) Conservation of binding sites (see Supplementary text) (7) number of outgoing edges in the hierarchy, and (8) number of incoming edges in the hierarchy. Using Principal Component Analysis (Figure 6B) we found that the Pioneer factors, PU.1 and Cebpb clearly separate from all other factors. Both Cebpb and PU.1 are abundantly bound already in un-stimulated cells and cover the majority of sites bound by other TFs, but are also found in “isolated” sites with no binding by any of the analyzed TF. Furthermore, the binding of Cebpb and PU.1 is relatively static during the response, comparable to the histone marks and Ctcf (Figure 4B, Figure S2C–I). The remaining factors form at least two additional sub-groups. Factors in one group (Figure 6B green) bind many genes, but rarely bind in isolation (average 5% alone), have a larger proportion of dynamic binding events (36% vs. 12%) compared to the pioneers, and form an intermediate layer in the network, between the pioneers and the non-pioneer factors (Figure 6A). The remaining factors (Figure 6B red) tend to bind fewer genes, are mostly dynamic, tend to preferentially bind promoters, and are located lower in the hierarchy.
The factor classes are also distinguished by their effect on gene expression in a manner consistent with a ‘layered’ hierarchical organization. Pioneer binding correlates to a lesser degree with gene induction levels than factors in other tiers. Binding of second tier factors in the un-stimulated state correlates with the potential for induction (Figure S5A,B), but has lower enrichment for specific functional categories (Table S4). The remaining factors tend to bind a smaller number of regions from specific functional categories (e.g. Stat1 with anti-viral genes, E2f4 with cell cycle genes, Runx1 with Inflammatory genes) and dynamically coincide with the induction of genes post-stimulation (Figure 5, Figure S5C,D).
Mammalian genomes can give rise to hundreds of cell types and numerous transcriptional programs; this variety is largely encoded in TF-DNA networks, which are only partially understood. Our results indicate that the response of DCs to a pathogenic stimulus is encoded by a multilayered TF network that has at least three major layers (Figure 6): Pioneer factors potentiate binding by opening previously inaccessible sites (Bossard and Zaret, 1998; Cirillo et al., 2002; Heinz et al., 2010; Lupien et al., 2008). These new elements are occupied in a relatively static manner by second tier of TFs (e.g. Junb) that prime the response and set the basal expression levels of thousands of genes, and thus term them “Primer” factors. The final tier consists of TFs that bind subsets of genes, often in a very dynamic fashion, and usually at genes of a shared biological process (Smale, 2012).
The layer architecture we propose helps explain how the cell’s expression potential is set during lineage commitment: while Pioneer factors initiate chromatin remodeling, Primer factors may serve as beacons, which upon stimulation direct other TFs or post-translation modifying enzymes to the appropriate genomic sites, a role previously suggested for pioneer factors such as Cebpb, PU.1, E2a and Ebf (Cirillo et al., 2002; Heinz et al., 2010).
Future work will be required to elucidate the exact mechanisms and nuclear complexes that these different classes of factors associate with to execute their diverse functions. For instance, in several cases we observe a Primer factor from one homotypic family joined or replaced by another factor from the same family (e.g. Egr1-Egr2, Irf members and several AP-1 factors) this may suggest that a partial role of the priming factors is to maintain the binding site or serve as a docking point for the dynamic partners from the same family. This proposed model may generalize to other transcriptional responses in different cell types (Mullen et al., 2011; Trompouki et al., 2011).
Our understanding of mammalian regulatory circuits is currently limited by technical constraints such as differences in the efficiency of antibodies or TF-DNA crosslinking. To overcome these limitations it will be important to generate reference-binding maps using tagged TFs, to directly benchmark antibody efficiency. The resulting inventory of ChIP antibodies and tagged TF libraries will enable the exploration of differences in TF physical networks under different conditions, cell types, or individuals in a population, and provide insights into the mammalian regulatory code and the role of specific cis-binding elements in disease (Kasowski et al., 2010). Such efforts will likely extend our proposed layered organization to other cellular states, and may enable efficient engineering of cellular identities by controlling the expression and timing of different regulatory layers.
20 million DC were used for each ChIP experiment. Cells were fixed for 10 min with 1% formaldehyde, quenched with glycine and washed with ice-cold PBS and pellets where flash frozen in liquid nitrogen. Cross-linked DC where thawed on ice and resuspended in RIPA lysis buffer supplemented with protease inhibitor. Cells were lysed for 10 min on ice and the chromatin was sheared. The sonicated cell lysate was cleared by centrifugation and mixed with 75 ul of protein G magnetic dynabeads (Invitrogen) coupled to target antibody in 96 well plates and incubated over night at 4 degrees. Using 96 well magnets unbound cell lysate was removed and samples was washed 5 times with cold RIPA, twice with high salt RIPA, twice with LiCl buffer, twice with TE, and then eluted in 50 ul of elution buffer. The eluate was reverse crosslinked at 65C for 4 hours and then treated sequentially with 2ul of RNaseA for 30 min and 2.5 ul of Proteinase K for two hours. Solid-phase reversible immobilization (SPRI) cleanup steps were performed in 96 well plates using the Bravo liquid handling platform (Agilent) using a modified version of (Fisher et al., 2011). 120ul SPRI beads were added to the reverse-crosslinked samples mixed and incubated for 2 minutes. Supernatant were separated from the beads using a 96-well magnet for 4 minutes. Beads were washed twice on the magnet with 70% ethanol and then air dried for 4 minutes. The DNA was eluted in 40 ul EB buffer. For the remainder of the library construction process (DNA end-repair, A-base addition, adaptor ligation and enrichment) a general SPRI cleanup involves addition of buffer containing 20% PEG and 2.5 M NaCl to the DNA reaction products (without moving the sample from the original well position). All enzymatic steps are carried out using enzymes from New England Biolabs. More-detailed description of the methods is provided in the Supplemental Experimental Procedures and http://www.weizmann.ac.il/immunology/AmitLab/data-and-method/HT-ChIP/.
We designed ~4 probes targeting regulatory regions of ~200 genes centered at the TSS and complemented this set with two probes tiling of any significant PolII peak or K4me3 peak that lied within the gene body or any significant K4me3 peak that lied within 30Kb of the TSS of the genes we targeted. The final probeset consisted of 786 probes. See Experimental Procedures for more information.
To obtain sufficient number of cells, we implemented a modified version of the DCs isolation used in (Lutz et al., 1999). See Experimental Procedures for more detailed information
Total RNA was extracted with QIAzol reagent following the miRNeasy kit’s procedure (Qiagen), and sample quality was tested on a 2100 Bioanalyzer (Agilent). We prepared the RNA-A+-Seq libraries using the ‘dUTP second strand (strand specific) protocol as described in (Levine et al 2010). Libraries where sequenced using the Illumina Genome Analyzer (GAII), two lanes for each sample, corresponding to 45 million paired-end reads/sample on average.
Nuclear extracts from mouse bone marrow dendritic cells (DC) were prepared by using NE-PER nuclear and cytoplasmic extraction reagents (Thermo scientific, USA). 20ug of the nuclear proteins were separated by SDS-PAGE and transferred to PVDF membrane. Membranes were probed with antibodies and visualized by ECL, according to the instructions of the manufacturer.
ChIP libraries were indexed, pooled and sequenced on Illumina HiSeq-2000 sequencers at the Broad Institute sequencing center. Reads were aligned to the reference mouse genome NCBI37, using BWA (Li and Durbin, 2009) version 0.5.7. RNA sequencing reads were aligned to the mouse reference genome (NCBI 37, MM9) using the TopHat aligner, version 1.1.4.(Trapnell et al., 2009) See supplementary Text more information.
We implemented our contiguous segmentation algorithm, described in (Guttman et al., 2009) as part of the Scripture package (available from http://www.broadinstitute.org/software/scripture/) and used it to call, score and filter peaks for both chromatin and TF libraries. See supplementary text for more information.
Top-Hat alignments were processed by Scripture (Guttman et al., 2010) to obtain significantly expressed transcripts for each time course. Only multi-exonic transcripts were retained. Quantification was used using the constituent model ((Garber et al., 2011) And Supplementary Text).
We performed both de-novo motif discovery and known motif matching using the MEME-ChIP pipeline (http://meme.nbcr.net/meme4_6_1/memechip-intro.html).
The binding value of a TF in a region is the sum of enrichment scores over all the peaks that pass the detection cutoff during at least one time point. Gain or loss of binding are defined when there is at least 3-fold change in the binding value compared to the basal state (at t=0). We associate a gene with a gain (loss) event if at least 50% of its enhancers or 50% its promoters are associated with gain (loss). The association of regulatory regions with genes is described in the supplementary text.
We compared the observed TF complexities (Fig 4a) to a random model under which the complexity of a region is proportional to its length (Supplementary text). We use this analysis to define the cutoff for HOT regions as the minimal complexity value (x>2) for which the observed frequency is higher by at least two fold than the random one. The selected cutoff was x=8 (Fig S8).
We employ a two-step k-means clustering process: first the genes were clustered by their temporal expression profiles, then each cluster is partitioned using the TF binding data. We used a randomization test in order to evaluate the dependency between the expression and binding data, as captured by the cluster analysis. To this end, we compare the distribution of cluster sizes obtained with the original data to that obtained from randomized instances, shuffling the binding data while retaining the expression levels and the number of binding TF per-gene (Kolmogorov-Smirnov p<10−10).
We evaluate the enrichment of TF binding in a set of genes (Figure. 5b,c) using a hypergeometric score. In order to control for the general tendency of TF to be associated with highly expressed genes (Figure S17), we shuffle the binding data while maintaining the correlation between the number of bound TF and expression magnitude. We then compute an empirical p-value by comparing the randomized scores to the original ones. To get more specific results we also exclude genes that are associated with HOT regions.
We thank Schraga Schwartz, Tommy Kaplan, Ami Citri, Kevin Struhl, Gioacchino Natoli, Richard Young and John Rinn for valuable discussions and comments; Leslie Gaffney for artwork, Jim Bochicchio for project management, and the Broad Sequencing Platform. This project was supported by the Human Frontiers Science Program; Career Development Award an ISF; Bikura Institutional Research Grant Program; ERC starting grant 309788 (IA), by the Broad Institute (MG, NY, IA, AR), by DARPA D12AP00004 (MG). HHMI, NHGRI P01 an NIH PIONEER award, a Burroughs-Wellcome Fund Career Award at the Scientific Interface, a Center for Excellence in Genome Science from the NHGRI, and the Merkin Foundation for Stem Cell Research at the Broad Institute (AR), and by the New England Regional Center for Excellence/Biodefense and Emerging Infectious Disease U54 AI057159 (NH). EU FP7 Model-In and (NF) US-Israel Binational science foundation (NF and AR)
Complete Sequencing data sets are available at the Gene Expression Omnibus (accession no. GSE36104).