Currently, efforts are directed at producing high-resolution genome annotations where the positions of functional elements or specific chromatin states are mapped onto the linear genome sequence [1]. However, these linear representations do not indicate functional or structural relationships between distant elements. For instance, recent insights suggest that widely spaced functional elements cooperate to regulate gene expression by engaging in long-range chromatin looping interactions. The three-dimensional (3D) organization of chromosomes is thought to facilitate compartmentalization [2,3], chromatin organization [4], and spatial sequestration of genes and their regulatory elements [5–7], all of which may modulate the output and functional state of the genome. A general approach to determine the spatial organization of chromatin can aid in the identification of long-range relationships between genes and distant regulatory elements as well as in the identification of higher-order folding principles of chromatin in general.
Chromosome conformation capture (3C)-based assays use formaldehyde cross-linking followed by restriction digestion and intra-molecular ligation to study chromatin looping interactions [7–12]. 3C-based assays have been used to show that specific elements such as promoters, enhancers and insulators are involved in the formation of chromatin loops [5,7,13–16]. The frequencies with which loci interact reflect chromatin folding [7,17], and thus comprehensive chromatin interaction datasets can help build spatial models of chromatin. Previously, chromatin conformation has been modeled using polymer models [8,18] and molecular dynamics simulations [19], which have proven valuable for understanding general features of chromatin fibers including flexibility and compaction [20,21]. However, such methods only partially leverage the current wealth of experimental data on chromatin folding. Recently, experiment-driven approaches, in combination with computational modeling, have resulted in low-resolution models for the topological conformation of the immunoglobulin heavy-chain [22] and the HoxA [23] loci, and of the yeast genome [24]. However, those methods were limited by the resolution and completeness of the input experimental data [22], by insufficient model representation, scoring and optimization [23], or by limited analysis of the 3D models [24]. To overcome such limitations, we developed a new approach that couples high-throughput 3C-carbon copy (5C) experiments [9] with the Integrative Modeling Platform (IMP) [25]. We applied this approach to determine the higher-order spatial organization of a 500 Kb gene-dense domain located near the left telomere of human chromosome 16 (Figure 1). Embedded in this cluster of ubiquitously expressed housekeeping genes is the tissue-specific α-globin locus, which is expressed only in erythroid cells. This 500 Kb domain corresponds to the ENm008 region extensively studied by the ENCODE pilot project [1].
Figure 1. ENCODE region ENm008 on human chromosome 16. (a) Map of ENm008 including the ζ, μ, α2, α1, and θ globin genes. Genes are indicated by grey lines above the linear representation. Vertical black lines indicate Hin…
The α-globin locus has been widely used as a model to study the mechanism of long-range and tissue-specific gene regulation [15,26–30]. The α-globin genes are upregulated by a set of functional elements, characterized by the presence of DNase I hypersensitive sites (HSs), located 33 to 48 Kb upstream of the ζ gene. One of these elements, HS40, is considered to be of particular importance [31,32]. This element can act as an enhancer in reporter constructs, and its deletion severely impacts activation of the α-globin genes [33]. HS40 is bound by several erythroid transcription factors including GATA factors and NF-E2 [34]. Importantly, previous 3C studies have demonstrated direct long-range looping interactions between some of these distant functional elements (i.e., HS48, HS46 and HS40) and the α-globin genes upon gene activation in mouse and human erythroid cells [15,30]. Major unanswered questions revolve around the higher-order folding of multi-gene domains like ENm008, and how long-range interactions involved in regulation of each of the resident genes are accommodated.
We have obtained comprehensive interaction maps of the α-globin locus by performing 5C analysis of the ENm008 region in GM12878 and K562 cells. These two cell lines, which differ in the expression of the α-globin genes, are studied by the ENCODE consortium, and therefore extensive chromatin structural and functional information for ENm008 is publicly available. We developed a general approach to generate 3D chromatin models based on chromatin interaction data. Our models of the ENm008 domain in GM12878 cells show that it forms a single compact structure, which we refer to as a chromatin globule. We find that active genes and promoters tend to be located at the center of the globule, whereas inactive genes are more peripherally positioned. Interestingly, in K562 cells, which express high levels of α-globin, the chromatin is broken into two globules separated by an extended chromatin segment. We propose that sets of neighboring active genes cluster to form chromatin globules, perhaps analogous to transcription factories, and that a given globule can accommodate only a limited number of active genes.
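To make the idea of generating 3D models from chromatin interaction data more concrete, the following is a minimal sketch and not the IMP-based procedure used in this work. It assumes, purely for illustration, that interaction frequencies can be mapped to target distances by an inverse linear rule (frequent contacts are placed close together) and that coordinates can be found by plain gradient descent on harmonic distance restraints. All function names, parameter values and the example frequency matrix are hypothetical.

```python
import math
import random


def target_distances(freq, d_near=1.0, d_far=5.0):
    """Map pairwise interaction frequencies to target distances.

    Assumption for illustration: an inverse linear mapping, so the most
    frequent contact gets d_near and the rarest contact gets d_far.
    """
    n = len(freq)
    pairs = [freq[i][j] for i in range(n) for j in range(i + 1, n)]
    lo, hi = min(pairs), max(pairs)
    span = (hi - lo) or 1.0  # guard against identical frequencies
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                scaled = (freq[i][j] - lo) / span  # 0 = rarest, 1 = most frequent
                dist[i][j] = d_far - scaled * (d_far - d_near)
    return dist


def restraint_score(coords, dist):
    """Sum of squared deviations between model and target distances."""
    n = len(coords)
    return sum(
        (math.dist(coords[i], coords[j]) - dist[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def random_start(n, seed=0):
    """Random initial 3D coordinates for n loci (reproducible via seed)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n)]


def fit_coordinates(dist, n_steps=2000, step=0.01, seed=0):
    """Gradient descent on restraint_score, starting from random_start."""
    n = len(dist)
    coords = random_start(n, seed)
    for _ in range(n_steps):
        grads = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                cur = math.dist(coords[i], coords[j])
                if cur == 0.0:
                    continue  # coincident points: direction undefined, skip
                # d/dx_i of (cur - target)^2 is 2 * (cur - target) * (x_i - x_j) / cur
                w = 2.0 * (cur - dist[i][j]) / cur
                for k in range(3):
                    grads[i][k] += w * (coords[i][k] - coords[j][k])
        for i in range(n):
            for k in range(3):
                coords[i][k] -= step * grads[i][k]
    return coords


# Hypothetical symmetric interaction frequencies for four loci.
freq = [
    [0, 9, 3, 1],
    [9, 0, 8, 2],
    [3, 8, 0, 7],
    [1, 2, 7, 0],
]
dist = target_distances(freq)
model = fit_coordinates(dist)
print(restraint_score(random_start(4), dist), restraint_score(model, dist))
```

The target distances derived this way need not be mutually consistent, so the final score is generally not zero. A single descent run also finds only a local optimum. A realistic pipeline would therefore need a calibrated frequency-to-distance relationship, many independent optimization runs and analysis of the resulting ensemble of models.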