In this study, we present an integrated analytical method to identify and characterize different types of chromatin linkages. By integrating multiple genome-wide data sets from K562 cells, including Hi-C data, ChIP-seq data for 45 transcription factors and 9 histone modifications, DNase hypersensitivity assays and RNA-seq data, we identified 12 distinct clusters of chromatin linkages comprising a total of 96 137 pairs of spatially separated interacting loci. We showed that each cluster has distinct epigenetic markings, is composed of paired sets of genes with distinct expression patterns and is differentially correlated with transcription factor binding sites (TFBSs). To validate the biological importance of the identified interacting loci, we investigated genes regulated by GATA1 and GATA2. Our analyses are consistent with the hypothesis that the GATA factors regulate a subset of their target genes via looping.
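The clustering of interacting loci summarized above can be pictured as grouping pairs of loci by the chromatin features at their two ends. This section does not state which clustering algorithm or feature encoding was used, so the sketch below is a stand-in under explicit assumptions: each interacting pair is encoded as a vector of hypothetical per-locus signals (H3K4me3, H3K27ac, H3K4me1 and DNase signal at locus A, followed by the same four at locus B), and pairs are grouped with plain k-means using a deterministic farthest-first initialization. The feature set, the toy values and the choice of k-means are illustrative assumptions, not the procedure that produced the 12 clusters.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points: Sequence[Sequence[float]], k: int) -> List[Vector]:
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points: Sequence[Sequence[float]], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Plain (Lloyd's) k-means: returns one cluster label per point and the centroids."""
    centroids = farthest_first_init(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each pair of loci joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its member vectors.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical feature vectors, one per interacting pair:
# [H3K4me3, H3K27ac, H3K4me1, DNase] at locus A, then the same four at locus B.
pairs = [
    [1.0, 1.0, 0.2, 1.0, 0.0, 0.9, 0.8, 1.0],  # promoter-like A, enhancer-like B
    [0.9, 1.0, 0.1, 0.9, 0.1, 1.0, 0.9, 0.9],
    [0.0, 0.0, 0.9, 1.0, 0.0, 0.0, 1.0, 0.8],  # open chromatin + H3K4me1 only
    [0.1, 0.0, 1.0, 0.9, 0.0, 0.1, 0.9, 1.0],
]
labels, _ = kmeans(pairs, k=2)
print(labels)  # → [0, 0, 1, 1]
```

Cluster numbers are arbitrary labels; what matters is that pairs with promoter/enhancer-like marks separate from pairs carrying only open chromatin and H3K4me1, loosely analogous to the distinction drawn for cluster 9.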
The Hi-C data analysis demonstrated that the regions involved in creating loops between interacting loci were preferentially in or near open chromatin, suggesting that bound transcription factors may play a crucial role in creating the genomic interactome. Many of the transcription factors we analyzed bind near promoter regions (as defined by specifically modified histones), suggesting that a factor bound near a start site may interact with another factor bound to a distal region of open chromatin. In support of this concept, many of the paired interactions showed evidence for a promoter region at one locus but not at the other. However, our analyses also identified a subset of interacting loci (cluster 9) with unique properties. These loci show evidence of open chromatin and H3K4me1 but do not resemble active promoters or enhancers. Analysis of the ChIP-seq data identified three transcription factors (GATA1, GATA2 and c-Jun) and three chromatin modifiers (SIRT6, BRG1 and INI1) that were specifically enriched at sites having only these two chromatin marks. BRG1 (also called SMARCA4) and INI1 (also called SMARCB1) are both components of the SWI/SNF chromatin-remodeling complex. The presence of SIRT6, a histone deacetylase, may explain the absence of H3K27Ac at the regions of open chromatin bound by this complex. Interestingly, analysis of the 45 ChIP-seq data sets using the Apriori algorithm also showed that GATA1, GATA2, c-Jun, BRG1 and INI1 were closely linked. Two of these factors, GATA1 and GATA2, are members of the same gene family, have several similar DNA-binding motifs and bind to many of the same sites in K562 cells (17). Also, we have previously shown that GATA2 co-localizes with c-Jun and regulates gene expression in concert with it in human endothelial cells (45), supporting the idea that GATA factors cooperate with c-Jun to regulate expression of genes in cluster 9. Finally, BRG1 has been reported to be a cofactor of GATA1 (46). We, and others, have previously shown that BRG1 functions cooperatively with GATA1 at certain genes through a chromatin loop structure (44). However, the overall involvement of GATA factors in chromatin looping has not been previously investigated. Taken together, the unbiased clustering of Hi-C interacting loci, the unbiased clustering of ChIP-seq data and the correlation of transcription factor binding with histone modifications, in combination with previous reports of linkage between GATAs, c-Jun and BRG1, suggest that a subset of GATA targets may be regulated via interacting loci. Accordingly, we tested this prediction by introducing siRNAs against GATA1 or GATA2, followed by RNA-seq analysis. We found that 7497 genes were deregulated on knockdown of GATA1, of which ~36% were regulated by a nearby bound GATA1, 12% by GATA1 bound to an interacting locus and 52% indirectly. Similarly, 2512 genes were deregulated on knockdown of GATA2, of which ~30% were regulated by a nearby bound GATA2, 9% by GATA2 bound to an interacting locus and 61% indirectly. Our experimental validations support the concept that GATA factors do regulate gene expression through interaction with distal loci. We note that GATA1 and GATA2 bind to many of the same sites in K562 cells (17) and thus most likely regulate some of the same genes. These factors bind to GATA sites independently (and not at the same time), so knockdown of GATA1 would still allow binding of GATA2 (and vice versa). Therefore, more robust changes in gene expression would likely have been observed if both factors could have been knocked down at the same time.
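The Apriori analysis mentioned above looks for sets of factors that are frequently bound together across the 45 ChIP-seq data sets. A minimal sketch of Apriori frequent-itemset mining follows, assuming each binding site (a "transaction") is represented as the set of factors whose peaks overlap it, and that a set of factors counts as frequent when it is present at no less than a chosen fraction of sites (`min_support`). How sites were defined and which support threshold was used are not given in this section; the factor labels `TF_A` to `TF_E` and the five sites are hypothetical toy data.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def apriori(transactions: Iterable[Set[str]],
            min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support (fraction of transactions
    containing it) is at least min_support, mapped to that support."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)
    # Level-1 candidates: every individual factor seen at any site.
    candidates = {frozenset([item]) for t in tsets for item in t}
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while candidates:
        # Count the sites that contain each candidate set of factors.
        counts = {c: sum(1 for t in tsets if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join frequent k-sets into (k+1)-sets, keeping only candidates whose
        # k-subsets are all frequent (the Apriori pruning property).
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in level for sub in combinations(union, k)
            ):
                candidates.add(union)
        k += 1
    return frequent


# Hypothetical toy data: factors bound at five sites.
sites = [
    {"TF_A", "TF_B", "TF_C"},
    {"TF_A", "TF_B", "TF_C", "TF_D"},
    {"TF_A", "TF_C"},
    {"TF_E"},
    {"TF_A", "TF_B", "TF_C"},
]
result = apriori(sites, min_support=0.6)
for itemset, support in sorted(result.items(), key=lambda kv: (-len(kv[0]), -kv[1])):
    print(sorted(itemset), support)
```

In this toy input, TF_A, TF_B and TF_C co-occur at 60% of sites, while TF_D and TF_E fall below the threshold. A group of factors that survives to a large itemset in this way is the kind of "closely linked" set the text describes.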
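The three-way split of deregulated genes reported above (regulated by a nearby bound factor, by a factor bound at an interacting locus, or indirectly) can be expressed as a simple assignment rule. The sketch below assumes each deregulated gene is placed in exactly one category, with binding near the gene taking precedence over binding at an interacting locus. This precedence, the gene names and the input sets are hypothetical, and this section does not specify how "nearby" binding or an "interacting locus" was defined in the study.

```python
from collections import Counter
from typing import Dict, Iterable, Set

CATEGORIES = ("nearby", "interacting_locus", "indirect")


def classify_deregulated(genes: Iterable[str], nearby_bound: Set[str],
                         loop_bound: Set[str]) -> Dict[str, str]:
    """Assign each deregulated gene to exactly one category.

    nearby_bound: genes with the factor bound near the gene (hypothetical input).
    loop_bound: genes whose locus interacts (e.g. in Hi-C) with a distal,
        factor-bound locus (hypothetical input).
    Assumption: nearby binding takes precedence over binding at an interacting locus.
    """
    labels = {}
    for gene in genes:
        if gene in nearby_bound:
            labels[gene] = "nearby"
        elif gene in loop_bound:
            labels[gene] = "interacting_locus"
        else:
            labels[gene] = "indirect"
    return labels


def category_fractions(labels: Dict[str, str]) -> Dict[str, float]:
    """Fraction of deregulated genes falling into each category."""
    counts = Counter(labels.values())
    total = len(labels)
    return {cat: counts[cat] / total for cat in CATEGORIES}


deregulated = ["g1", "g2", "g3", "g4", "g5"]
labels = classify_deregulated(deregulated, nearby_bound={"g1", "g2"}, loop_bound={"g2", "g3"})
print(category_fractions(labels))  # {'nearby': 0.4, 'interacting_locus': 0.2, 'indirect': 0.4}
```

Read in the other direction, the reported GATA1 percentages (~36%, 12% and 52% of 7497 genes) correspond to roughly 2700, 900 and 3900 genes, respectively.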
GATA factors have previously been shown to act as both activators and repressors, and our data demonstrate that genes regulated by looping can be either upregulated or downregulated on loss of GATA1 or GATA2. One possible model by which loss of GATA1 could result in activation of a distal gene is shown in Figure 8. In this model, a loop between a promoter and a distal region is created by interactions of GATA1, BRG1, INI1 and SIRT6 (a histone deacetylase), all bound to a region having H3K4me1 but no marks of an active enhancer or promoter, and the nearby gene is repressed (consistent with the histone marks enriched in cluster 9). On reduction of GATA1 levels, a different set of enhancer-binding factors is recruited to the distal open chromatin region, the loop changes from a repressive to an activating structure, and transcription is initiated. However, we recognize that there are many other mechanistic possibilities for how the GATAs and BRG1 could regulate transcription, such as the complex serving as an activator. For example, a recent study showed that 58% of the GATA1 sites identified in Cluster of Differentiation (CD) 36+ erythrocyte precursor cells were also bound by BRG1 (49); the authors suggest that recruitment of BRG1 by GATA1 allows binding of T-cell acute lymphocytic leukemia protein 1 (TAL1) to the enhancer region and results in transcriptional activation of certain GATA1 target genes.
Figure 8. Schematic model of GATA-regulated chromatin linkages. Dynamic chromatin interactions form globule structures, which may function to initiate or stabilize three-dimensional gene regulatory structures. Multiple CREs, including enhancers and promoters, are …
In conclusion, we demonstrate that when combined with in-depth analysis of histone modifications and transcription factor binding, Hi-C data can serve as a powerful tool for exploring the complex mechanisms underlying chromatin organization. Previous studies have shown that environmental changes such as estrogen treatment can cause extensive looping and de-looping events (50–52), providing evidence that chromatin-bound TFs may induce dynamic changes in genome organization. Our findings that most TFs have thousands of binding sites associated with chromatin interaction sites, and that distinct clusters of interacting loci can be bound by subsets of TFs, provide genome-wide evidence in support of the concept that a set of TFs may create distinct types of chromatin linkages, in which co-regulated genes from different chromosomal locations are brought into close proximity. We also note that our identification of a GATA-enriched set of physically interacting loci was obtained using unbiased clustering of Hi-C and ChIP-seq data from K562 cells. Given the documented role of GATA factors in controlling hematopoiesis and erythroid differentiation (53–55), the identification of a GATA-enriched set of chromatin linkages provides evidence that the clustering analysis can identify master regulators of the transcriptome. With the rapid development of sequencing technologies, Hi-C data are becoming more readily available for a variety of cell types. As Hi-C data for other cell types are obtained and the set of factors analyzed by ChIP-seq grows, our analyses can be repeated using data from these additional cell types. We predict that clusters defined by open chromatin and specific histone marks (such as cluster 9 in K562 cells) will show co-association with different sets of transcription factors in different cell types. We suggest that an integrative analysis of Hi-C data with histone modification and transcription factor ChIP-seq data sets will identify different biologically relevant clusters in different cell types and help to identify cell type-specific master regulators.