To study chromatin structure in mammalian cells, we performed Hi-C experiments [2] in mouse embryonic stem cells (mESCs), human embryonic stem cells (hESCs), and human IMR90 fibroblasts. Together with Hi-C data for the mouse cortex generated in a separate study [3], we analyzed over 1.7 billion read pairs of Hi-C data corresponding to pluripotent and differentiated cells (Supplemental Table 1). We normalized the Hi-C interactions for biases in the data (Supplemental Figures 1 and 2). To validate the quality of our Hi-C data, we compared the data with previous 5C, 3C, and FISH results [5–7]. Our IMR90 Hi-C data show a high degree of similarity to a previously generated 5C dataset from lung fibroblasts (Supplemental Figure 4). In addition, our mESC Hi-C data correctly recovered a previously described cell-type-specific interaction at the Phc1 gene (Supplemental Figure 5). Furthermore, the Hi-C interaction frequencies in mESCs are well correlated with the mean spatial distance separating six loci as measured by 2D-FISH [7] (Supplemental Figure 6), demonstrating that the normalized Hi-C data accurately reflect nuclear distances measured by an independent method. These results demonstrate that our Hi-C data are of high quality and accurately capture the higher-order chromatin structures in mammalian cells.
We next visualized 2D interaction matrices using a variety of bin sizes to identify interaction patterns revealed by our high sequencing depth (Supplemental Figure 7). We noticed that at bin sizes less than 100 kb, highly self-interacting regions begin to emerge (Supplemental Figure 7, seen as “triangles” on the heatmap). These regions, which we term “topological domains,” are bounded by narrow segments where the chromatin interactions appear to end abruptly. We hypothesized that these abrupt transitions may represent boundary regions in the genome that separate topological domains.
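The effect of bin size on what a contact matrix can resolve can be illustrated with a minimal sketch. It assumes read pairs are already mapped to base-pair coordinates on a single chromosome; the function name `bin_contacts`, the coordinates, and the two bin sizes are hypothetical and are not taken from the study's pipeline.

```python
from collections import defaultdict


def bin_contacts(read_pairs, bin_size):
    """Count contacts between fixed-width genomic bins on one chromosome.

    read_pairs: iterable of (pos1, pos2) coordinates in base pairs.
    Returns a dict mapping (i, j) bin-index pairs with i <= j to counts,
    i.e. the upper triangle of a symmetric contact matrix.
    """
    counts = defaultdict(int)
    for pos1, pos2 in read_pairs:
        i, j = sorted((pos1 // bin_size, pos2 // bin_size))
        counts[(i, j)] += 1
    return dict(counts)


# Hypothetical read pairs (illustrative values only).
pairs = [(5_000, 45_000), (12_000, 38_000), (41_000, 130_000), (125_000, 126_000)]

coarse = bin_contacts(pairs, 100_000)  # larger bins merge nearby contacts
fine = bin_contacts(pairs, 40_000)     # smaller bins separate them
```

With the larger bins, the first two pairs collapse into the same diagonal cell; with the smaller bins they fall into different cells, which is the sense in which finer binning (given enough reads) exposes local structure.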
Figure: Topological Domains in the Mouse ES Cell Genome
To systematically identify all such topological domains in the genome, we devised a simple statistic termed the “directionality index” (DI) to quantify the degree of upstream or downstream interaction bias for a genomic region, which varies considerably at the periphery of the topological domains (see Supplemental Methods for details). The DI was reproducible (Supplemental Table 2) and pervasive, with 52% of the genome having a DI that was not expected by random chance (FDR = 1%). We then used a hidden Markov model (HMM) based on the DI to identify biased “states” and thereby infer the locations of topological domains in the genome (see Supplemental Methods for details). The domains defined by the HMM were reproducible between replicates (Supplemental Figure 8). Therefore, we combined the data from the HindIII replicates and identified 2,200 topological domains in mESCs with a median size of 880 kb that occupy ~91% of the genome (Supplemental Figure 9). As expected, the frequency of intra-domain interactions is higher than that of inter-domain interactions. Similarly, FISH probes [7] in the same topological domain are closer in nuclear space than probes in different topological domains, despite similar genomic distances between probe pairs. These findings are best explained by a model in which genomic DNA is organized into spatial modules linked by short chromatin segments. We define the genomic regions between topological domains as either “topological boundary regions” or “unorganized chromatin”, depending on their sizes (Supplemental Figure 9).
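The authoritative definition of the DI is in the Supplemental Methods. As a minimal sketch of the idea, the code below assumes a signed chi-square-like form, DI = sign(B − A) · [(A − E)²/E + (B − E)²/E] with E = (A + B)/2, where A and B are the contact counts between a bin and a fixed window of bins upstream and downstream of it. The window size, the toy matrix, and all function names are illustrative assumptions, and the HMM step is not reproduced here.

```python
def directionality_index(matrix, window):
    """Signed upstream/downstream contact bias for every bin.

    matrix: symmetric list of lists of contact counts between bins.
    window: number of neighbouring bins summed on each side (assumed value).
    Positive values indicate downstream bias, negative values upstream bias.
    """
    di = []
    for i in range(len(matrix)):
        a = sum(matrix[i][max(0, i - window):i])   # contacts with upstream bins
        b = sum(matrix[i][i + 1:i + 1 + window])   # contacts with downstream bins
        if a == b:
            di.append(0.0)                         # no bias (also covers a == b == 0)
            continue
        e = (a + b) / 2.0                          # expected count with no bias
        sign = 1.0 if b > a else -1.0
        di.append(sign * ((a - e) ** 2 / e + (b - e) ** 2 / e))
    return di


def toy_matrix(domain_sizes, within=10, between=1):
    """Block-structured toy matrix: strong contacts inside each domain."""
    labels = [d for d, size in enumerate(domain_sizes) for _ in range(size)]
    return [[within if li == lj else between for lj in labels] for li in labels]


di = directionality_index(toy_matrix([3, 3]), window=2)

# A boundary candidate is where the bias flips from upstream to downstream.
switches = [i for i in range(len(di) - 1) if di[i] < 0 < di[i + 1]]
```

In this toy case the last bin of the first domain is biased upstream and the first bin of the second domain is biased downstream, so the sign flip falls exactly between the two blocks. Real data are noisier, which is presumably why a model-based state assignment is used in place of a raw sign change.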
We next investigated the relationship between the topological domains and transcriptional control. The HoxA locus is separated into two compartments by an experimentally validated insulator [5,8,9], which we observed corresponds to a topological domain boundary in both mouse and human. Therefore, we hypothesized that the boundaries of the topological domains might correspond to insulator or barrier elements.
Figure: Topological Boundaries Demonstrate Classical Insulator or Barrier Element Features
Many known insulator or barrier elements are bound by the zinc-finger-containing protein CTCF [10–12]. We see a strong enrichment of CTCF at the topological boundary regions (Supplemental Figure 10), indicating that topological boundary regions share this feature of classical insulators. A classical boundary element is also known to stop the spread of heterochromatin. Therefore, we examined the distribution of the heterochromatin mark H3K9me3 in humans and mice in relation to the topological domains [13,14]. Indeed, we observe a clear segregation of H3K9me3 at the boundary regions that occurs predominantly in differentiated cells (Supplemental Figure 11). Since the boundaries we analyzed are present in both pluripotent cells and their differentiated progeny, the topological domains and boundaries appear to “pre-mark” the end points of heterochromatic spreading. Therefore, the domains do not appear to be a consequence of the formation of heterochromatin. Taken together, these observations strongly suggest that the topological domain boundaries correlate with regions of the genome displaying classical insulator and barrier element activity, thus revealing a potential link between the topological domains and transcriptional control in the mammalian genome.
We compared the topological domains with previously described domain-like organizations of the genome, specifically with the A and B compartments described by Lieberman-Aiden et al. [2], with lamina-associated domains (LADs) [11,15], replication time zones [16,17], and large organized chromatin K9-modification (LOCK) domains [18]. In all cases, topological domains are related to, but independent of, each of these previously described domain-like structures (Supplemental Figures 12–15). Notably, a subset of the domain boundaries we identify appears to mark the transition between LAD and non-LAD regions of the genome (Supplemental Figure 12), between the A and B compartments (Supplemental Figures 13 and 14), and between early- and late-replicating chromatin (Supplemental Figure 14). Lastly, we can also confirm the previously reported similarities between the A and B compartments and early and late replication time zones (Supplemental Figure 16).
We next compared the locations of topological boundaries identified in both replicates of mESCs and cortex, or in both replicates of hESCs and IMR90 cells. In both human and mouse, the majority of the boundary regions are shared between cell types (Supplemental Figure 17a), suggesting that the overall domain structure is largely unchanged between cell types. At the boundaries called in only one cell type, we noticed that the trend of upstream and downstream bias in the DI is still readily apparent and highly reproducible between replicates (Supplemental Figure 17b,c). We cannot determine whether the differences in domain calls between cell types are due to noise in the data or to biological phenomena, such as a change in the strength of the boundary region between cell types [19]. Regardless, these results suggest that the domain boundaries are largely invariant between cell types. Lastly, only a small fraction of the boundaries show clear differences between two cell types, suggesting that a relatively rare subset of boundaries may actually differ between cell types (Supplemental Figure 18).
Figure: Boundaries are shared across cell types and conserved in evolution
The stability of the domains between cell types is surprising given previous evidence showing cell-type-specific chromatin interactions and conformations [6,8]. To reconcile these results, we identified cell-type-specific chromatin interactions between mouse ES cells and mouse cortex. We identified 9,888 dynamic interacting regions in the mouse genome based on 20 kb binning, using a binomial test with an empirical false discovery rate of < 1% based on random permutation of the replicate data. These dynamic interacting regions are enriched for differentially expressed genes (Supplemental Figure 19, Supplemental Table 5). In fact, 20% of all genes that undergo a 4-fold change in gene expression are found at dynamic interacting loci. This is likely an underestimate because, with the genome binned at 20 kb, any dynamic regulatory interaction spanning less than 20 kb will be missed. Lastly, more than 96% of dynamic interacting regions occur in the same domain. Therefore, we favor a model in which the domain organization is stable between cell types, but the regions within each domain may be dynamic, potentially taking part in cell-type-specific regulatory events.
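To make the binomial comparison concrete, here is a minimal sketch of an exact two-sided binomial test applied to one pair of bins. It assumes the contact count in one cell type is tested against the pooled count from both cell types, with a null proportion `p` that would in practice reflect relative sequencing depth. The counts, the value of `p`, and the function name are hypothetical, and the permutation-based empirical FDR step is not reproduced.

```python
from math import comb


def binomial_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than the
    observed count k, given n trials and null success probability p.
    """
    pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


# Hypothetical bin pair: 20 pooled contacts, 18 of them from one cell type.
p_biased = binomial_two_sided_p(18, 20, 0.5)

# Hypothetical bin pair with contacts split evenly between the cell types.
p_balanced = binomial_two_sided_p(10, 20, 0.5)
```

A bin pair would be flagged as dynamic only if its p-value survives multiple-testing control across all tested pairs; the threshold used for that step is not something this sketch can supply.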
The stability of the domains between cell types prompted us to investigate whether the domain structure is also conserved across evolution. To address this, we compared the domain boundaries between mouse ES cells and human ES cells using the UCSC liftOver tool. The majority of boundaries appear to be shared across evolution (53.8% of human boundaries are boundaries in mouse and 75.9% of mouse boundaries are boundaries in humans, compared to 21.0% and 29.0% at random; p-value < 2.2×10^−16, Fisher’s exact test). The syntenic regions in mouse and human in particular share a high degree of similarity in their higher-order chromatin structure, indicating that there is conservation of genomic structure beyond the primary sequence of DNA.
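The comparison against random expectation can be framed as a 2×2 contingency table. The sketch below computes a one-sided Fisher's exact test from the hypergeometric distribution. It assumes a table whose rows are "region is a boundary in species 1 (after coordinate conversion) or not" and whose columns are "region is a boundary in species 2 or not". The counts are hypothetical, and the sidedness of the test used in the study is not stated in the text, so the one-sided form is an assumption.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with all margins fixed, of observing a count in
    the top-left cell that is at least as large as a.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1))
    return tail / comb(n, col1)


# Small table with a hand-checkable answer: (16 + 1) / 70.
p_small = fisher_exact_greater(3, 1, 1, 3)

# Hypothetical boundary-overlap counts (illustrative only, not study data).
p_overlap = fisher_exact_greater(30, 10, 20, 40)
```

Integer binomial coefficients keep the tail sum exact until the final division, which avoids underflow for the very small p-values that large genomic tables tend to produce.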
We explored what factors may contribute to the formation of topological boundary regions in the genome. While most topological boundaries are enriched for the binding of CTCF, only 15% of CTCF binding sites are located within boundary regions. Thus, CTCF binding alone is insufficient to demarcate domain boundaries. We reasoned that additional factors might be associated with topological boundary regions. By examining the enrichment of a variety of histone modifications, chromatin-binding proteins, and transcription factors around topological boundary regions in mESCs, we observed that factors associated with active promoters and gene bodies are enriched at boundaries in both mouse and humans (Supplemental Figures 20–23). In contrast, non-promoter-associated marks, such as H3K4me1 (associated with enhancers) and H3K9me3, were not enriched or were specifically depleted at boundary regions. Furthermore, transcription start sites (TSSs) and global run-on sequencing (GRO-Seq) [22] signal were also enriched around topological boundaries. We found that “housekeeping genes” were particularly strongly enriched near topological boundary regions (see Supplemental Table 7 for complete GO term enrichment). Additionally, tRNA genes, which have the potential to function as boundary elements [23,24], are also enriched at boundaries (p-value < 0.05, Fisher’s exact test). These results suggest that high levels of transcriptional activity may also contribute to boundary formation. In support of this, we can see examples of dynamic changes in H3K4me3 at or near some cell-type-specific boundaries (Supplemental Figure 24). Indeed, boundaries associated with both CTCF and a housekeeping gene account for nearly a third of all topological boundaries in the genome (Supplemental Figure 24).
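The statement that only a minority of CTCF binding sites fall within boundary regions is a point-in-interval count. A minimal sketch follows, assuming binding sites are reduced to single coordinates on one chromosome and boundary regions are non-overlapping half-open intervals. All coordinates and the function name are hypothetical.

```python
from bisect import bisect_right


def fraction_in_intervals(sites, intervals):
    """Fraction of point positions that fall inside any interval.

    sites: iterable of integer positions on one chromosome.
    intervals: non-overlapping half-open (start, end) pairs; sorted here.
    """
    sites = list(sites)
    if not sites:
        return 0.0
    intervals = sorted(intervals)
    starts = [start for start, _ in intervals]
    hits = 0
    for pos in sites:
        idx = bisect_right(starts, pos) - 1  # last interval starting at or before pos
        if idx >= 0 and pos < intervals[idx][1]:
            hits += 1
    return hits / len(sites)


# Hypothetical boundary regions and binding-site positions.
boundaries = [(100, 140), (500, 540)]
ctcf_sites = [90, 110, 139, 140, 300, 520, 800, 900]

frac = fraction_in_intervals(ctcf_sites, boundaries)
```

The same function with the roles reversed (boundaries reduced to midpoints, CTCF peaks as intervals) would give the complementary view, namely the share of boundaries that carry a CTCF site.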
Figure: Boundary regions are enriched for housekeeping genes
Lastly, we analyzed the enrichment of repeat classes around boundary elements. We observed that Alu/B1 and B2 SINE elements in mouse and Alu SINE elements in humans are enriched at boundary regions (Supplemental Figures 24 and 25). In light of recent reports indicating that a SINE B2 element functions as a boundary in mice [25], and that SINE element retrotransposition may alter CTCF binding sites during evolution [26], we believe this finding contributes to a growing body of evidence suggesting a role for SINE elements in the organization of the genome.
In summary, we show that mammalian chromosomes are segmented into megabase-sized topological domains, consistent with some previous models of higher-order chromatin structure [1,27,28]. Such spatial organization appears to be a general property of the genome: it is pervasive throughout the genome, stable across different cell types, and highly conserved between mice and humans.
We have identified multiple factors that are associated with the boundary regions separating topological domains, including the insulator-binding factor CTCF, housekeeping genes, and SINE elements. The association of housekeeping genes with boundary regions extends previous studies in yeast, insects, and lower vertebrates and suggests that non-CTCF factors may also be involved in insulator/barrier functions in mammalian cells [29].
The topological domains we identified are well conserved between mice and humans. This suggests that the sequence elements and mechanisms that are responsible for establishing higher-order structures in the genome may be relatively ancient in evolution. A similar partitioning of the genome into physical domains has also been observed in Drosophila and in high-resolution studies of the X-inactivation center in mice (termed topologically associated domains, or TADs) [31], suggesting that topological domains may be a fundamental organizing principle of metazoan genomes.