ChIP-string: meso-scale location analysis for chromatin proteins
We developed a new method to determine, in multiplex, the enrichment of many CRs or histone modifications at hundreds of representative loci. We reasoned that such a signature binding profile would be highly informative. First, querying several hundred regions is less biased than sampling a handful of loci, as is typically done by ChIP-PCR. Second, it yields a ‘signature’ pattern that could help determine if a CR is consistently associated with loci sharing a chromatin state. Third, a signature can be measured faster and at a much lower cost than a genome-wide profile and is thus appropriate for screening antibodies and ChIP conditions for difficult targets or for perturbation screens using RNA interference or small molecules.
As a signature readout, we assembled a panel of 487 genomic loci representing different types of chromatin environments. To choose the regions, we used genome-wide chromatin state annotations for human ES and K562 cells, derived from multiple histone modification maps (Ernst et al., 2011). We selected representative loci for each of the major states in the two cell types; e.g., active or repressed promoters, transcripts, distal elements, etc. (Experimental Procedures; Table S1). We reasoned that individual CRs would localize to subsets of these representative loci, and thus enable us to distinguish an effective CR ChIP assay.
To measure enriched binding at the signature loci, we developed the ChIP-string method. ChIP-string leverages the nCounter Analysis System platform (NanoString Technologies), originally developed for multiplex quantification of RNA molecules. We designed a probe-set complementary to the signature loci, and adapted the nCounter operating procedures for ChIP DNA (Experimental Procedures). We validated ChIP-string by analyzing histone modification ChIPs and comparing the measurements to ChIP-seq data (Figure S1A).
[Figure: Screening CR-antibodies by ChIP-string]
We evaluated the sensitivity of ChIP-string by conducting the assay with successively smaller quantities of ChIP DNA. We found that a minimum of ~5 ng of DNA is needed to maintain quantitative accuracy. While histone modification ChIPs typically yield more than 5 ng of DNA, CR ChIPs yield much smaller quantities (<1 ng), even when millions of cells are used as starting material. We therefore added a rapid genomic amplification step for ChIP DNA prior to nCounter detection, and confirmed that it faithfully maintains enrichment for a majority of the signature loci.
ChIP-string screen identifies effective reagents for mapping CRs
We applied ChIP-string to 126 CR antibodies, 17 histone modification antibodies and 2 IgG control antibodies (Table S2). We also analyzed 16 other control samples of un-enriched chromatin input. We used chromatin from K562 cells, with the exception that ES cell-specific CRs were profiled using chromatin from ES cells. In 21 cases, more than one antibody was tested for the same target protein, allowing us to evaluate different epitopes. Overall, we screened ~150 samples. We normalized the data by sample and then by probe, using an approach analogous to methods applied to microarray data. We then standardized the measurements in each sample, creating a scale of relative probe enrichment that is comparable across samples (Experimental Procedures).
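The three pre-processing steps described above (normalize by sample, normalize by probe, standardize within sample) might be sketched as follows. This is only an illustration: the text does not specify the exact scaling used, so the mean scaling, the median-across-samples probe correction, the z-score, the pseudocount and all names below are assumptions.

```python
from statistics import mean, median, pstdev

def standardize_signatures(counts, pseudocount=1.0):
    """counts maps sample name -> list of raw probe counts (same probe order).

    Every step here is an assumed stand-in for the published procedure.
    """
    # Step 1, by sample: divide by the sample mean so overall signal depth cancels.
    by_sample = {}
    for name, values in counts.items():
        shifted = [v + pseudocount for v in values]  # pseudocount keeps values positive
        scale = mean(shifted)
        by_sample[name] = [v / scale for v in shifted]

    # Step 2, by probe: divide each probe by its median across all samples.
    n_probes = len(next(iter(by_sample.values())))
    probe_median = [median([vals[i] for vals in by_sample.values()])
                    for i in range(n_probes)]
    by_probe = {name: [v / m for v, m in zip(vals, probe_median)]
                for name, vals in by_sample.items()}

    # Step 3, standardize: z-score within each sample for a comparable scale.
    standardized = {}
    for name, values in by_probe.items():
        mu = mean(values)
        sd = pstdev(values) or 1.0  # guard against a completely flat sample
        standardized[name] = [(v - mu) / sd for v in values]
    return standardized

# Hypothetical toy counts for three samples over four probes.
toy_counts = {
    "antibody_A": [10, 20, 30, 40],
    "antibody_B": [5, 5, 50, 10],
    "input": [8, 16, 12, 100],
}
z_scores = standardize_signatures(toy_counts)
```

After this transformation every sample has mean 0 and unit standard deviation across probes, which is one way to obtain a relative enrichment scale that can be compared between samples.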
Next, we distinguished effective CR antibodies from those yielding non-specific enrichment. We calculated correlation coefficients between each pair of ChIP-string experiments, and hierarchically clustered the data. A substantial majority of the CR binding signatures (~80) either clustered with IgG control antibodies, or formed separate clusters with overall weak signals before standardization (Figure S1B). Furthermore, none of these experiments correlated well with any specific histone modification or chromatin state. Although we cannot rule out that they enrich regions not captured by our probe-set, we designated these CR antibodies as ‘failed’ in our screen.
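As a rough illustration of the pairwise-correlation and hierarchical-clustering step, the sketch below groups toy signatures with average linkage on a 1 − r distance. The linkage rule, the distance, the number of clusters and the sample names are assumptions made for the example; the text does not state which settings were used.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def average_linkage(profiles, n_clusters):
    """Agglomerative clustering of named profiles using distance 1 - r."""
    dist = {frozenset(pair): 1.0 - pearson(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(profiles, 2)}

    def cluster_distance(c1, c2):
        # Average of all pairwise distances between members of the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [{name} for name in profiles]
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] |= clusters[j]  # merge the closest pair (i < j always holds)
        del clusters[j]
    return clusters

# Hypothetical standardized signatures over five probes.
signatures = {
    "IgG": [0.1, -0.2, 0.0, 0.1, -0.1],
    "antibody_A": [0.2, -0.1, 0.1, 0.0, -0.2],
    "H3K4me3": [3.0, 2.5, -1.0, -1.2, -0.9],
    "antibody_B": [2.8, 2.9, -0.8, -1.0, -1.1],
}
groups = average_linkage(signatures, n_clusters=2)
```

In this toy case `antibody_A` falls into the same cluster as the IgG control, which corresponds to the ‘failed’ designation, while `antibody_B` groups with the histone modification signature.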
The remaining CR ChIP-string experiments were clearly distinct from the IgG and input control experiments (Figure S1B), exhibiting both a larger number of enriched probes and higher enrichment values. In many cases, these CR experiments enriched subsets of loci in patterns reminiscent of individual histone modifications or chromatin states. Regardless, we designated all of these remaining CR antibodies as ‘passed’. Notably, an alternative analysis procedure, which used different statistical methods in pre-processing and antibody assessment, led to highly similar results, supporting the robustness of the screen. This alternative procedure can be used even when only a few antibodies are tested (Experimental Procedures).
We carried out ChIP-seq for 39 CR antibodies that passed the screen, and a sample of 9 ‘failed’ CR antibodies. Of the 39 ‘passed’ antibodies, 34 (~90%) yielded high-quality genome-wide profiles as reflected by robust enrichment of specific genomic loci, whereas none of the failed antibodies yielded high-quality data. These results indicate that ChIP-string provides an independent and objective means to identify effective reagents for CR mapping.
A compendium of genome-wide CR maps
We used 29 of the CR antibodies that ‘passed’ our screen to generate 42 ChIP-seq datasets of the genome-wide distributions of 27 CRs in K562 cells and 15 CRs in ES cells (Figure S2; Experimental Procedures). We confirmed the specificity of each of these antibodies by Western blots (Figure S1C). We used two independent peak-calling procedures to collate enriched sites for each CR in each cell type (Table S3). The number of sites ranged from 1,680 for HP1γ to 30,993 for RBBP5 and 39,180 for RNA polymerase II phosphorylated at serine 5 (RNAPIIS5P), with a median of 9,194 sites per CR. The vast majority of enriched regions were between 1 and 2 kb in size.
[Figure: CR binding maps reveal modular organization and coherent associations with chromatin states]
CR binding patterns reveal a modular organization
Comparing the enrichment profiles of the CRs, we found that CRs bind in characteristic combinations. Specifically, we calculated correlations between each pair of CR binding profiles over all regions showing a significant peak in at least one dataset (Experimental Procedures). This allowed us to not only compare the different bound locations, but also to consider the shapes of the binding peaks in cases where the locations overlap. We then hierarchically clustered the CRs based on all pair-wise correlations. The resulting dendrogram and correlation matrix reveal striking associations between groups of CRs. These are reflected in six major modules, each containing between three and six CRs with similar binding profiles. The six modules encompass all of the CR profiles, except REST (RE1-binding protein), whose profile is dissimilar to all others. Although REST is extensively implicated in CR recruitment, it is the only sequence-specific DNA binding protein in our compendium, which may explain its failure to conform to the modular organization seen for the other 28 CRs.
CR-Modules associate with distinct genomic features and chromatin environments
We next studied the relationship of the CRs and CR-Modules to genomic features, including promoters, transcribed regions, and distal regulatory elements. CRs within each module exhibit remarkably similar patterns of association to genomic features and chromatin states, which are distinct between modules. Each localization pattern is consistent with known biology, while also providing insight into CR functions. We discuss each module below.
Module I (PHF8, RBBP5, PLU1, CHD1, HDAC1 and SAP30; promoters) is characterized by preferential binding at promoters, with 65% to 80% of binding sites overlapping transcriptional start sites (TSSs). The targets carry H3K4me3 and other modifications related to competent (i.e., non-repressed) promoters, but exhibit a wide range of transcriptional activity based on RNA-seq. The results are consistent with known biology: RBBP5 is a core component of MLL complexes that catalyze H3K4me3 (Smith and Shilatifard, 2010), while PHF8 and CHD1 both bind this modification (Flanagan et al., 2005; Kleine-Kohlbrecher et al., 2010; Sims et al., 2005). In addition, the module contains PLU1 (JARID1B), an H3K4me3 demethylase.
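A statistic such as the share of binding sites that overlap a TSS can be computed along the following lines. This is a minimal sketch that assumes peaks are half-open intervals and that a peak counts as overlapping when it contains at least one annotated TSS position; the function name and the toy coordinates are hypothetical.

```python
from bisect import bisect_left

def fraction_overlapping_tss(peaks, tss_positions):
    """peaks: list of (chrom, start, end) half-open intervals.
    tss_positions: dict mapping chrom -> sorted list of TSS coordinates.
    Returns the fraction of peaks containing at least one TSS.
    """
    hits = 0
    for chrom, start, end in peaks:
        positions = tss_positions.get(chrom, [])
        # First TSS at or after the peak start; it overlaps if it lies before the end.
        i = bisect_left(positions, start)
        if i < len(positions) and positions[i] < end:
            hits += 1
    return hits / len(peaks)

# Hypothetical toy data: two of the four peaks contain a TSS.
toy_peaks = [("chr1", 100, 200), ("chr1", 500, 600),
             ("chr2", 50, 60), ("chr1", 900, 1000)]
toy_tss = {"chr1": [150, 950], "chr2": [70]}
tss_fraction = fraction_overlapping_tss(toy_peaks, toy_tss)
```

Sorting the TSS coordinates once per chromosome and using binary search keeps the lookup fast even for tens of thousands of peaks.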
Module I also includes HDAC1 and SAP30, core members of the SIN3 histone deacetylase complex with exquisitely similar binding profiles (R = 0.92). Although deacetylases have generally been linked to repression, the robust occupancy of these factors at non-repressed TSSs is consistent with a prior study that localized deacetylases to many active genes (Wang et al., 2009b). Importantly, such co-association of CRs with ‘activating’ and ‘repressive’ characteristics in one module is also seen in other modules and in nearly all classes of target loci (e.g., promoters and candidate enhancers; see below). These ‘bi-functional’ co-binding patterns likely reflect widespread roles for opposing CRs in fine-tuning chromatin structure at regulatory loci.
Module II (RNAPIIS5P, SIRT6, NSD2, CHD7; transcribed regions) is characterized by binding to active promoters as well as proximal and distal transcripts. In particular, the co-binding patterns suggest interplay between initiating RNAPII (Smith and Shilatifard, 2010) and SIRT6 (R = 0.70): 78% of SIRT6-enriched windows reside over the TSS or within the first 5 kb of an active gene (compared with 75% for RNAPIIS5P). Another member of the module, NSD2, also localizes to active transcripts but with greater preference for distal, elongating regions (49% of enriched intervals are within actively transcribed regions). This may reflect interplay between NSD2, a histone methyltransferase, and the elongation mark H3K36 methylation (Nimura et al., 2009). Finally, CHD7 binds promoters, transcribed regions, and some distal elements.
Module III (JARID1C, HDAC2, HDAC6, ESET; promoters) comprises four CRs with catalytic activities typically associated with repression. These factors co-localize with active and competent promoters, similar to Module I, but also bind repressed targets. JARID1C (SMCX) is an H3K4 demethylase closely related to PLU1 (Module I). HDAC2 and HDAC6 complement HDAC1 (Module I) at active promoters, but also associate with Polycomb-repressed targets. Finally, the H3K9 methyltransferase ESET expands the spectrum of known heterochromatic CRs at promoters. These binding patterns suggest prevalent roles for ‘repressive’ CRs at sites of dynamic chromatin activity.
Module IV (P300, MI2, LSD1; candidate enhancers) includes three CRs that preferentially bind distal regulatory elements, including sites with enhancer-like chromatin. Consistent with prior reports (Heintzman et al., 2007), over 70% of P300 sites are distal from TSSs and ~50% of those distal regions are enriched for modifications that correlate with enhancer activity, such as H3K4me1 and H3K27Ac (Birney et al., 2007; Ernst et al., 2011; Heintzman et al., 2007). Moreover, ~55% of distal P300 sites coincide with highly conserved sequences (Lindblad-Toh et al., 2011). Module IV also contains two members of the NuRD repressor complex – MI-2 and LSD1 (Wang et al., 2009a). Both CRs bind distal elements, with ~30% of their sites overlapping P300 peaks. LSD1 is a demethylase specific for mono- and di-methylated H3K4 (Shi et al., 2004), two characteristic methylation states of enhancer chromatin (Birney et al., 2007; Heintzman et al., 2007). These novel associations support a model in which chromatin at distal regulatory elements is tightly regulated by opposing enzymatic activities, as observed for promoters above.
Module V (NCOR, PCAF, CBP, HP1γ; candidate enhancers, other distal features) contains CRs that bind a more diverse set of elements. NCOR, PCAF and CBP each bind thousands of distal elements, many with enhancer-like characteristics, such as P300 binding. Considering all distal P300 sites, 48% are co-bound by CBP, 45% by NCOR and 35% by PCAF (p < 10⁻¹⁵ in all cases). Nevertheless, these three CRs also bind many other loci, accounting for their overall lower correlation with P300 and separate module. CBP is closely related to P300 and has also been shown to bind enhancers (Kim et al., 2010). PCAF and NCOR are antagonistic regulators associated with nuclear hormone receptor activity and repression, respectively (Perissi et al., 2010). Although they are typically studied at promoters, their co-localization patterns suggest that they also act at enhancers. The partitioning of distal element CRs into separate modules suggests a high degree of specificity among enhancers and their regulators.
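The co-binding percentages and their significance could be computed as in the sketch below. The text does not say which test produced the reported p-values, so the one-sided hypergeometric tail used here is an assumption, as is the treatment of sites as a discrete universe of candidate regions; all names and numbers are illustrative.

```python
from math import comb

def cobound_fraction(sites_a, sites_b):
    """Fraction of sites in A (sets of region IDs) that are also bound in B."""
    return len(sites_a & sites_b) / len(sites_a)

def hypergeom_tail(n_total, n_a, n_b, n_both):
    """P(overlap >= n_both) if the n_b sites of B were drawn at random from
    n_total candidate regions, n_a of which are bound by A."""
    upper = min(n_a, n_b)
    favorable = sum(comb(n_a, k) * comb(n_total - n_a, n_b - k)
                    for k in range(n_both, upper + 1))
    return favorable / comb(n_total, n_b)

# Hypothetical toy example: 10 candidate regions, A and B each bind 5 of them.
p300_sites = {1, 2, 3, 4, 5}
cbp_sites = {1, 2, 3, 4, 5}
shared = len(p300_sites & cbp_sites)
fraction = cobound_fraction(p300_sites, cbp_sites)
p_value = hypergeom_tail(10, len(p300_sites), len(cbp_sites), shared)
```

Because `math.comb` works on exact integers, the tail sum stays exact until the final division, which matters when the probabilities are as small as those reported here.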
HP1γ, a heterochromatin protein that physically interacts with H3K9me3, occupies diverse chromatin environments. Its association to this module reflects frequent co-binding of distal elements with CBP. However, HP1γ also correlates with CRs in other modules, and binds repetitive elements, Polycomb-repressed regions and ZNF gene clusters (O'Geen et al., 2007).
Module VI (EZH2, SUZ12, CBX2, CBX8, RNF2; Polycomb-repressed) comprises core components of Polycomb repressive complexes 1 and 2 (PRC1 and PRC2). Binding occurs almost exclusively in regions enriched for H3K27me3, which typically correspond to transcriptionally-inactive, GC-rich promoters (Ku et al., 2008; Lee et al., 2006; Simon and Kingston, 2009). However, RNF2 (RING1B), an E3 ubiquitin ligase also present in other protein complexes (Vidal, 2009), shows a limited extent of binding outside of H3K27me3-marked regions.
Together, these findings portray diverse regulatory functions for CRs, and identify combinations of regulators that co-bind, and likely co-regulate, common genomic targets. In specific examples, coordination involves multiple CRs in the same protein complex. However, in most cases, CRs in a module show only partial, albeit significant, overlap, consistent with both shared and unique regulatory functions.
Fine-scale analysis of CR binding patterns across promoters
To evaluate the extent and significance of the modular CR organization and whether it is also guided by combinatorial principles, we next systematically examined CR binding patterns at individual loci. We inspected all promoters bound by more than one regulator. We focused on promoters because roughly half of CR binding events occur within 3 kb of a TSS, and nearly all CRs show some binding across such regions. We clustered the 1,081 promoters that are highly enriched for at least two CRs (Experimental Procedures) by the combinatorial binding of the 18 CRs with substantial promoter occupancy. We also grouped the CRs based on their localization patterns across these loci. This promoter-focused grouping (CR groups) is largely consistent with the CR Modules deduced from genome-wide correlations. However, this fine-scale analysis highlights differential associations of individual CRs with TSSs and flanking regions, as well as differential relations to gene activity.
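Selecting promoters bound by at least two CRs can be illustrated with a small filter over a promoter-by-CR enrichment table. The data layout, the enrichment threshold and the function name are hypothetical; the actual criterion for ‘highly enriched’ is defined in the Experimental Procedures and is not reproduced here.

```python
def multi_bound_promoters(enrichment, threshold, min_crs=2):
    """enrichment maps promoter -> {CR name: enrichment score}.
    Returns promoter -> sorted list of CRs at or above the threshold,
    keeping only promoters bound by at least min_crs CRs.
    """
    selected = {}
    for promoter, scores in enrichment.items():
        bound = sorted(cr for cr, score in scores.items() if score >= threshold)
        if len(bound) >= min_crs:
            selected[promoter] = bound
    return selected

# Hypothetical toy scores; the threshold of 2.0 is an arbitrary choice.
toy_enrichment = {
    "geneA": {"HDAC1": 3.1, "SAP30": 2.7, "EZH2": 0.2},
    "geneB": {"HDAC1": 0.4, "SAP30": 0.1, "EZH2": 4.0},
    "geneC": {"HDAC1": 2.0, "SAP30": 0.3, "EZH2": 2.5},
}
combinatorial = multi_bound_promoters(toy_enrichment, threshold=2.0)
```

The resulting promoter-to-CR lists are the kind of binary combinatorial patterns that can then be clustered to define groups of promoters and groups of CRs.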
A first group of CRs – PLU1, CHD1, SIRT6, and CHD7 – exhibits binding profiles characteristic of RNAPII initiation. Although enriched across all transcriptionally-competent promoters, these CRs are most strongly bound at highly active promoters undergoing productive initiation and elongation, as indicated by the high expression levels of the corresponding genes. Their broad binding distributions over TSSs emulate RNAPIIS5P. Fine binding patterns thus identify additional CRs with close connections – and possible direct physical interactions – with initiating RNAPII (Smith and Shilatifard, 2010).
[Figure: Fine-scale CR binding profiles distinguish coherent gene sets]
A second, larger CR group – ESET, HDAC6, JARID1C, HDAC2, HDAC1, SAP30 and RBBP5 – binds active and competent promoters in sharp peaks that precisely coincide with TSSs. In addition to facilitating RNAPII engagement, these CRs may help maintain chromatin integrity around the nucleosome-free TSSs (Jiang and Pugh, 2009), by fine-tuning modifications of the flanking −1 and +1 nucleosomes.
A third group of CRs – EZH2, SUZ12, RNF2, CBX2 and CBX8 – includes core components of PRC2, which catalyzes H3K27me3, and PRC1, which binds H3K27me3 and mediates chromatin compaction (Margueron and Reinberg, 2010; Simon and Kingston, 2009). These CRs bind inactive promoters, many of which correspond to genes involved in development or signaling. Remarkably, PRC2 and PRC1 subunits exhibit distinct fine-scale binding profiles over the promoters. PRC2 components (EZH2, SUZ12) peak over TSSs, potentially reflecting interactions with DNA sequences in these nucleosome-depleted regions. In contrast, PRC1 components (CBX2, CBX8) bind broadly across the same regions, likely promoted by physical interactions with flanking H3K27me3-marked nucleosomes. Notably, Polycomb-repressed promoters are the only set of genomic elements in our study that are not subject to opposing chromatin regulatory activities, as they are bound exclusively by repressive CRs in K562 cells.
Fine-scale promoter analysis reveals combinatorial complexity of CR associations
Although the promoter clustering largely corresponds to the modular organization discerned from genome-wide correlations, it also reveals several exceptions that may reflect combinatorial CR binding. In some cases, different CRs bind the same promoters but with distinct binding structures. For example, despite largely overlapping targets, CHD1 and PLU1 exhibit markedly different binding patterns. CHD1 peaks sharply over TSSs, while PLU1 extends well into transcribed regions.
In other cases, a CR is associated with different CR groups under different promoter contexts. Particularly striking examples of such combinatorial partitioning involve deacetylase complexes (Yang and Seto, 2008). SIN3 complex members HDAC1, HDAC2 and SAP30 bind promoters of genes that oscillate during the cell cycle with an intensity that distinguishes them from all other targets (cluster 5). In addition, HDAC1, HDAC2 and JARID1C (members of the CoREST complex; Tahiliani et al., 2007) co-bind along with HDAC6 to repressed PRC2 targets (clusters 6–8). This association may reflect physical interactions between CoREST and Polycomb complexes (Ren and Kerppola, 2011; Tsai et al., 2010) and/or direct interactions between HDAC2 and PRC2 (van der Vlag and Otte, 1999). The fine binding patterns of these CRs vary dramatically based on the context of the target gene’s activity or the co-binding CRs. For example, HDAC2 binds sharply over transcriptionally-competent TSSs, but distributes broadly over Polycomb-repressed promoters (clusters 5 and 13). This is consistent with a model in which histone deacetylases act as fine-tuners of accessible chromatin at competent TSSs, but as enforcers of hypo-acetylated chromatin domains at Polycomb-repressed loci.
Combinatorial CR binding patterns are associated with refined functional distinctions
We next explored whether individual CRs or CR combinations might be associated with specific cellular processes. The promoter-based analysis revealed 15 ‘combinatorial binding’ gene clusters, each of which shares binding by a combination of CRs, as well as a fine CR location structure around their TSSs (Figure S3, horizontal blocks). The genes in many of these clusters are characterized by shared functional attributes (labels on right; Table S4). In particular, genes with similar expression levels but distinct biological functions are often bound by distinct combinations of CRs. For example, the ‘Protein Metabolism’ cluster (cluster 2; 110 genes) comprises highly expressed genes whose promoters are co-bound by SIRT6, CHD1, PLU1 and RNAPIIS5P. A distinct cluster of 84 genes with similarly high expression, whose promoters are co-bound by these CRs along with CHD7 and HDAC6, is enriched for genes involved in chromatin architecture (cluster 1). A separate binding cluster (cluster 5; 92 genes), enriched for cell cycle gene promoters, is unremarkable in terms of its intermediate expression levels, but is prominently co-bound by HDAC1, HDAC2 and SAP30. The physical association of these core SIN3 components with these promoters offers a mechanistic explanation for documented roles for this repressor complex and histone deacetylase activity in cell cycle progression (David et al., 2008; Minucci and Pelicci, 2006). Interestingly, the promoters in the ‘stress response’ cluster are co-bound by most of the ‘activating’ and ‘repressive’ CRs in our panel, which may play important roles in the notable capacity of these genes to rapidly change their activity in response to stimuli.
CRs occupy different loci in ES cells, but maintain their modular associations
We next explored whether the co-localization patterns and associations observed in K562 cells can be generalized to other cell types. We considered several layers of CR organization. First, we asked whether CRs distribute to different genomic locations consistent with changed gene expression programs. Second, we asked whether the associations between individual CRs and chromatin modification states change between cell types. Third, we asked whether the modular relationships between CRs are maintained in different cell types.
To examine each of these possibilities, we generated ChIP-seq data for 15 CRs in human ES cells and analyzed their localization patterns. We used the same computational methods as in K562 cells to identify regions of enrichment, which yielded similar overall statistics (Table S3).
[Figure: Comparisons of CR binding and modular associations in K562 and ES cells]
The CRs differ substantially in their genomic location between the cell types, though the degree of overlap varies between CRs (left panel). The patterns of re-localization are reminiscent of histone modifications (right panel), which dynamically change between these cell types (Ernst et al., 2011), consistent with differential transcriptional programs.
Despite substantial differences in CR localization, the underlying CR organization is maintained between the cell types. First, the degree of co-binding between pairs of CRs is conserved between the two cell types (R=0.64). Similarly, the degree of correspondence between a given CR and a given histone modification is also well correlated (R=0.79). Thus, CR-CR associations as well as CR-histone modification associations are globally preserved between cell types.
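One way to test whether pairwise co-binding is conserved between cell types is to compute a co-binding score for every CR pair in each cell type and then correlate the two score vectors. The sketch below assumes a Jaccard index as the co-binding measure and Pearson correlation for the comparison; both choices, and all names and toy site sets, are assumptions for illustration only.

```python
from itertools import combinations
from math import sqrt

def jaccard(a, b):
    """Shared sites divided by all sites bound by either CR."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cobinding_vector(bound_sites, crs):
    """One co-binding score per CR pair, in a fixed pair order."""
    return [jaccard(bound_sites[x], bound_sites[y]) for x, y in combinations(crs, 2)]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical toy data: the CRs occupy different loci in the two cell types,
# but the pattern of overlap between CR pairs is the same.
crs = ["A", "B", "C"]
k562_sites = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {4, 9}}
es_sites = {"A": {10, 11, 12, 13}, "B": {10, 11, 12, 14}, "C": {13, 20}}

k562_scores = cobinding_vector(k562_sites, crs)
es_scores = cobinding_vector(es_sites, crs)
conservation_r = pearson(k562_scores, es_scores)
```

The toy example shows how the correlation can be high even when no bound locus is shared between the two cell types, which is the distinction the text draws between localization and organization.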
Furthermore, the relationships between individual CRs and genomic annotations remain largely unchanged. For most CRs, the distribution of binding between promoter, transcribed, distal and repressed regions is highly concordant between K562 and ES cells. Conservation of binding patterns is also evident when comparing the fine-scale promoter profiles of CRs in ES cells to those in K562 cells. Consistent patterns of binding are evident for CRs associated with competent TSSs (e.g., PHF8, RBBP5, SAP30), productive initiation (e.g., CHD1, SIRT6) and Polycomb repression (e.g., EZH2, SUZ12). Gene sets distinguished based on combinatorial binding profiles are also similar between the cell types (Figure S4; Table S5).
Notably, when there are changes in CR localization, they tend to be shared by members of the same module and to relate to a fundamental difference in chromatin structure between cells. For example, although Module I CRs (e.g., PHF8, CHD1, RBBP5) are restricted to active and competent promoters in K562 cells, they also associate with Polycomb-repressed promoters in ES cells. The presence of multiple ‘activating’ CRs at these inactive targets is consistent with the enrichment of the underlying chromatin for opposing (‘bivalent’) histone modifications. These CRs likely contribute to the poised character of the corresponding genes, many of which are induced during ES cell differentiation (Bernstein et al., 2006). In addition, P300 binds substantially fewer sites in ES cells than in K562 cells, possibly reflecting a lower prevalence of enhancer-like chromatin in ES cells (Ernst et al., 2011).
Overall, our analysis suggests that the modular and combinatorial structures of CRs, and their association with histone modification states, are constitutive features of the chromatin regulatory network. Thus, changes in CR binding tend to be coordinated at the level of modules, and to correspond to changes in the underlying chromatin landscape.