We developed Tethered Conformation Capture (TCC), a method for genome-wide mapping of chromatin interactions. By implementing solid-phase ligation, TCC substantially enhanced the signal-to-noise ratio and thus enabled a detailed analysis of inter-chromosomal interactions. We identified a group of regions in each chromosome that predominantly mediate inter-chromosomal interactions. These regions are marked by high transcriptional activity, suggesting that their interactions are mediated by transcription factories. Each of these regions interacts with numerous other such regions throughout the genome in an indiscriminate fashion, partly driven by the accessibility of the partners. Therefore, it is likely that a different combination of interactions is present in different cells. To accommodate this variability, we developed a computational method to translate the TCC data into physical chromatin contacts in a population of three-dimensional genome structures. Statistical analysis of the resulting population demonstrates that the indiscriminate properties of inter-chromosomal interactions are consistent with the well-known architectural features of the human genome.
The three-dimensional (3D) organization of the eukaryotic genome plays important roles in nuclear functions1, 2. However, few structural details of chromatin organization have been delineated at the genomic scale. For instance, individual chromosomes are localized in spatially distinct volumes known as the chromosome territories3, which tend to occupy preferential positions with respect to the nuclear periphery4, 5. Moreover, the territories of different chromosomes form extensive interactions6, and high-density gene clusters can extend outside of the bulk of their chromosome’s territory7. Nevertheless, the internal organization of chromosome territories and the mechanisms that govern the interactions between them are not well-understood.
Chromosome conformation capture (3C)-based techniques have emerged as powerful tools for mapping chromatin interactions8-16. The genome-wide application of these techniques has revealed that functional activity can determine the association preferences of loci within each chromosome10. Further understanding of the spatial organization of chromosomes, however, is limited by several factors. For one, low signal-to-noise ratios in conformation capture experiments compromise their ability to map low frequency interactions, especially those between chromosome territories. Additionally, the data represent an ensemble average of genome structures in the cell population, wherein individual structures may significantly differ from each other17-19. Coupled with the enormous size of the genome, this heterogeneity of genome architecture makes translating conformation capture data into 3D structural models challenging. As a result, even as genome-wide conformation capture data have been used to propose theoretical folding models10, they have not yet been employed for determining the corresponding 3D structures of the entire genome in mammalian cells.
For the genome-wide mapping of chromatin contacts, we have developed the Tethered Conformation Capture (TCC) technology, a modified conformation capture method in which key reactions are carried out on solid-phase instead of in solution. This tethering strategy leads to higher signal-to-noise ratios, enabling an in-depth analysis of inter-chromosomal interactions. We show that a specific group of functionally active loci are more likely to form inter-chromosomal contacts and that most of these contacts are a result of indiscriminate encounters between loci that are accessible to each other. We also introduce a structural modeling procedure that calculates a population of 3D genome structures from the TCC data. We show that the calculated population reproduces the hallmarks of chromosome territory positioning in agreement with independent fluorescence in situ hybridization (FISH) studies. This population-based approach allows for a probabilistic analysis of the spatial features of the genome, a capability that can accommodate the wide range of cell-to-cell structural variations that are observed in mammalian genomes17, 20.
To identify chromatin interactions using TCC (Fig. 1), native chromatin contacts were preserved by chemically crosslinking DNA and proteins. The DNA was then digested with a restriction enzyme, and, after cysteine biotinylation of proteins, the protein-bound fragments were immobilized at a low surface density on streptavidin-coated beads. The immobilized DNA fragments were then ligated while tethered to the surface of the beads. Finally, ligation junctions were purified, and ligation events were detected by massively parallel sequencing, a process which revealed the genomic locations of the pairs of loci that had formed the initial contacts (Fig. 1).
We applied TCC, using HindIII as the restriction enzyme, to map the chromatin contacts in GM12878 human lymphoblastoid cells (Supplementary Table 1). As an example of non-tethered conformation capture, we also applied Hi-C10 to the same cell line using identical cell counts and crosslinking conditions. The resulting contact frequency maps (Fig. 2a,b and Supplementary Fig. 1a) showed that TCC accurately reproduces the patterns observed in Hi-C results (Pearson’s r for genome-wide comparison = 0.96, p-value < $10^{-16}$). Additionally, the general features of genome-wide conformation capture data that were described previously10 were also observed in our data (Fig. 2a,b and Supplementary Fig. 1a,b,c).
One of the main sources of noise in conformation capture experiments is random inter-molecular ligations between DNA fragments that are not crosslinked to each other9, 21. Because randomly selected DNA fragments are more likely to originate from different chromosomes, these ligations tend to be predominantly inter-chromosomal. We therefore measured the fraction of inter-chromosomal ligations in our tethered (TCC) and non-tethered (Hi-C) HindIII libraries to compare their relative noise levels (Fig. 2c). In the tethered library, this fraction is almost half that of the non-tethered library. We also compared the average difference between the observed inter-chromosomal contact frequencies in each library and those expected from completely random inter-molecular ligations. This difference is twice as large in the tethered library as in the non-tethered library (Supplementary Methods). Together, these observations indicate that the noise from random inter-molecular ligations is considerably lower in the tethered library.
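The noise proxy used in this comparison, the fraction of ligation products that join two different chromosomes, can be sketched in a few lines. This is an illustration only, not the authors' pipeline: the function name and the toy read pairs are hypothetical, and real libraries would first require read mapping and filtering.

```python
def inter_chromosomal_fraction(pairs):
    """Fraction of ligation products whose two ends map to different chromosomes.

    `pairs` is an iterable of (chrom_a, chrom_b) tuples, one per ligation product.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no ligation products given")
    inter = sum(1 for chrom_a, chrom_b in pairs if chrom_a != chrom_b)
    return inter / len(pairs)


# Hypothetical toy library (not real data): one of four products is inter-chromosomal.
toy_library = [("chr1", "chr1"), ("chr1", "chr2"), ("chr3", "chr3"), ("chr2", "chr2")]
print(inter_chromosomal_fraction(toy_library))  # 0.25
```

A lower value for a library prepared from the same cells suggests fewer random inter-molecular ligations, which is the reasoning applied to Fig. 2c.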
We also generated tethered and non-tethered libraries using the 4-cutter MboI instead of HindIII. MboI results in a shorter size and a higher concentration of DNA fragments, thereby increasing the probability of random inter-molecular ligations. Consequently, the fraction of inter-chromosomal ligations increased substantially in the non-tethered MboI library (Fig. 2c). By contrast, it showed only a modest increase in the tethered MboI library. This result demonstrates that tethered libraries are minimally affected by the concentration of DNA fragments, confirming that most ligations in these libraries are between DNA fragments that are crosslinked to each other.
An improved signal-to-noise ratio allows a more accurate analysis of contacts with relatively low frequencies such as interactions between chromosomes (Supplementary Fig. 1d). For instance, several interactions between the small arm of chromosome 2 and chromosomes 20, 21, and 22 are clearly enriched in the tethered HindIII library (Fig. 2d) but not the non-tethered HindIII library (Fig. 2e).
We first analyzed the contact pattern within each chromosome. We defined the contact profile of a region as the ordered list of frequency values for its contacts with all the other regions in the genome (Methods). The Pearson’s correlation between two intra-chromosomal contact profiles is a similarity measure for the corresponding regions’ contact behaviors. Using this measure and confirming a previous study10, we observed that each chromosome can be divided into two classes of regions with anti-correlated intra-chromosomal contact profiles (Fig. 3a and Supplementary Fig. 2a). At any given genomic distance, regions in the same class contact each other more frequently than regions in different classes (Supplementary Fig. 2b). One of these classes, here referred to as the “active class”, is significantly enriched for the presence and expression of genes, DNase hypersensitivity, and activating histone modifications10 (Supplementary Fig. 2c). The other class, here referred to as the “inactive class”, displays the opposite behavior (Supplementary Fig. 2c).
We asked how the similarity between contact profiles changes with increasing genomic distance between the regions on a chromosome. Interestingly, the contact profiles of the active regions remain similar even when relatively long genomic distances separate them (Fig. 3b). For the inactive regions, in contrast, the contact profile similarity decreases more quickly and dissipates at longer distances (Fig. 3b). Therefore, inactive regions are more likely to associate with their neighboring regions while active regions can associate with a more diverse panel of long-range contact partners.
A special case of this behavior was observed in the interactions between inactive regions of large chromosomes (i.e., 1-6,8,10). The average contact profile similarity decreases abruptly for inactive regions separated by the centromere. Consequently, only inactive regions in the same chromosome arm have similar contact profiles (Supplementary Fig. 3a). The frequency of contacts between inactive regions in different chromosome arms is also significantly lower than would be expected from their sequence separation alone (Supplementary Fig. 3b). These characteristics give rise to a distinctive four-block pattern in the “inactive-only” correlation matrices of the larger chromosomes (Fig. 3c and Supplementary Fig. 3c). In contrast, the contact profile similarity of active regions is largely unaffected by the centromere (Fig. 3c and Supplementary Fig. 3a,c). These results suggest that, in larger chromosomes, inactive regions from opposing chromosome arms are largely inaccessible to each other while active regions can still interact.
We next analyzed the contacts between chromosomes. We began by defining the inter-chromosomal contact probability index (ICP) as the sum of a region’s inter-chromosomal contact frequencies divided by the sum of its inter- and intra-chromosomal contact frequencies. ICP, therefore, describes the propensity of a region to form inter-chromosomal contacts.
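The ICP definition translates directly into code. The following is a minimal sketch under stated assumptions: the contact matrix is a plain list of lists, `chrom_of` is a hypothetical lookup from region index to chromosome name, self-contacts are skipped (an assumption made here, not stated in the text), and the numbers are toy values rather than TCC data.

```python
def inter_chromosomal_contact_probability(contacts, chrom_of, i):
    """ICP of region i: inter-chromosomal frequency / (inter + intra) frequency."""
    inter = 0.0
    total = 0.0
    for j, frequency in enumerate(contacts[i]):
        if j == i:
            continue  # assumption: a region's contact with itself is ignored
        total += frequency
        if chrom_of[j] != chrom_of[i]:
            inter += frequency
    return inter / total if total > 0 else 0.0


# Hypothetical symmetric toy matrix: regions 0-1 on one chromosome, 2-3 on another.
toy_contacts = [
    [0, 8, 1, 1],
    [8, 0, 2, 0],
    [1, 2, 0, 6],
    [1, 0, 6, 0],
]
toy_chrom_of = ["chrA", "chrA", "chrB", "chrB"]

# Region 0 has 2 inter-chromosomal contacts out of 10 in total.
print(inter_chromosomal_contact_probability(toy_contacts, toy_chrom_of, 0))  # 0.2
```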
Interestingly, we observed large differences in the distribution of ICP between the active and inactive classes. In the inactive class, the vast majority of regions have relatively low ICPs with the exception of a few cases (Fig. 4a and Supplementary Fig. 4a,b). Most of these exceptions flank the unalignable regions of the centromeres, and their high ICP is due to interaction with the centromeric regions of other chromosomes (Supplementary Fig. 5a). Additionally, the centromeric regions of the acrocentric chromosomes are more likely to contact each other than the centromeric regions of the metacentric chromosomes (Supplementary Fig. 5b). Furthermore, we found the highest centromere contact frequencies between chromosomes 13 and 21 and between chromosomes 14 and 22 (Supplementary Fig. 5c). All of these observations are in excellent agreement with previous imaging studies in lymphocytic cells22-24.
In the active class, on the other hand, many regions have high ICPs. In fact, the vast majority of regions with a large ICP belong to the active class (Fig. 4a and Supplementary Fig. 4a,b). For example, in chromosome 2, 90% of the regions with a top 25% ICP are members of the active class (Fig. 4a). Nevertheless, not all the active regions have a large ICP. For instance, about 40% of the active regions in chromosome 2 form relatively few inter-chromosomal contacts, and their ICPs are similar to those of the inactive regions (Fig. 4a). This non-uniform contact behavior may reflect functional variations within this class. Indeed, we observed that those active regions with larger ICPs also show higher RNA polymerase II binding (Fig. 4b) as well as higher total gene expression (Pearson’s r = 0.54, p-value < $10^{-15}$), indicating that higher transcriptional activity is associated with an increased probability of forming inter-chromosomal contacts.
We asked whether the regions’ differences in ICP are reflected in their localization within their chromosomes’ territories. Previous fluorescence imaging studies have shown that highly transcribed regions can frequently extend outside of the bulk territory of their chromosome25, 26. One of these studies analyzed several loci on chromosome 11 in lymphoblastoid cells27. Remarkably, we found that the reported average distances of these loci from the edge of their chromosome territory are strongly correlated with their ICPs (Pearson’s r = 0.98, p-value < $10^{-3}$) (Fig. 4c and Supplementary Fig. 4c). Moreover, the loci that showed preferential localization in the bulk of the chromosome territory in the imaging study are inactive in the TCC data, while those that showed more frequent localization beyond the bulk of the territory are active and have large ICPs (Fig. 4c). While more fluorescence imaging experiments are required to extend this observation to the entire genome, these examples suggest that ICP can also reflect the preferred positions of a locus within the territory of its chromosome.
To further examine the interactions between chromosomes, we analyzed those inter-chromosomal contacts with frequencies clearly above noise level. We refer to these contacts as “significant interactions” (Fig. 4d). Most of these significant interactions are formed by active regions, in particular by those with high ICPs (Fig. 4d). Interestingly, most of these regions interact with numerous other high-ICP active regions throughout the genome (Fig. 4d and Supplementary Fig. 6a). For instance, each of the high-ICP active regions on chromosome 19 forms significant interactions with at least 40% of all the high-ICP active regions on chromosome 11 (Fig. 4d) and many more on other chromosomes (Supplementary Fig. 6a). Moreover, none of these interactions appears to be dominant, and they all have relatively low frequencies (Fig. 4d and Supplementary Fig. 1d). In the case of chromosomes 11 and 19, the significant inter-chromosomal interactions between high-ICP active regions are on average more than seventy times less frequent than intra-chromosomal contacts between neighboring ~1 Mb regions. The numerosity of these interactions and their low frequencies suggest that each can be present in only a fraction of the cells.
Strikingly, the larger the ICP of the inter-chromosomal contact partners, the higher the observed frequency of their interaction (Supplementary Fig. 6a). Indeed, the contact frequency between a pair of high-ICP active regions shows a positive correlation with the product of their ICPs (Fig. 4e and Supplementary Fig. 6b,c). Based on these observations, it appears that for many high-ICP active regions the probability of forming inter-chromosomal interactions is independent of the identity of their interaction partners. We already established that ICP can be an indicator for the relative position of a region from the edge of the chromosome territory. This correlation, therefore, suggests that the propensity for forming inter-chromosomal contacts between high-ICP active regions is largely governed by the spatial accessibility of the contact partners.
To confirm the existence of inter-chromosomal interactions between high-ICP active regions, we measured the colocalization frequency of one probe on chromosome 19 with each of four different probes on chromosome 11 using 3D DNA FISH (Fig. 4f-h and Supplementary Table 3). The chromosome 19 probe was located in a high-ICP active region while the four chromosome 11 probes were equally split between inactive and high-ICP active regions. These measurements showed that, in a small but significant fraction of the cells, the high-ICP active region on chromosome 19 colocalizes with each of its active counterparts on chromosome 11 (Fig. 4h). In contrast, the same region on chromosome 19 is unlikely to localize in proximity to either of the inactive regions on chromosome 11. These results support the conclusion that high-ICP active regions on different chromosomes can interact and that each interaction occurs in only a small fraction of the cells.
In summary, our observations indicate that most active regions do not interact exclusively with a few specific regions on other chromosomes; rather, they can form interactions indiscriminately with many high-ICP active regions at different times. These contacts may be present only in the fraction of cells where both interaction partners are mutually accessible.
We then asked whether the indiscriminate and numerous low-frequency chromosome interactions can be reconciled with the non-random positioning of chromosome territories with preferred radial positions seen in other studies3-5. Chromatin contacts are observed with a wide range of frequencies, suggesting that many potential contacts are present in only a fraction of cells. In other words, the contacts in TCC data describe not necessarily one structure but represent the average contacts of numerous genome structures in different cells. Therefore, a population of genome structures must be generated in which the resulting variety of structures is statistically consistent with the data. We express this task as an optimization problem with three main components28, 29: (1) a structural representation of chromosomes at an appropriate level of resolution; (2) a scoring function quantifying the structure population’s accordance with the data; and (3) a method for optimizing the scoring function to yield a population of genome structures.
The plaid appearance of the contact frequency maps suggests that each chromosome can be partitioned into “blocks” of consecutive regions that share similar contact profiles. To identify these blocks, we applied constrained clustering using the Pearson’s correlation between the regions’ contact profiles as a similarity measure (Fig. 5a, Methods). Optimizing the clustering cutoff divided the haploid genome into 428 “chromatin-block” regions (Supplementary Fig. 7a, Methods). The resulting block-based contact frequency map (Fig. 5b) is highly correlated with the original frequency map (Spearman’s correlation = 0.81, p-value < $10^{-16}$), confirming that the characteristic long-range contact patterns are preserved (Fig. 5a,b). Several observations indicate that large portions of chromatin regions in any given block are in spatial proximity and predominantly occupy the same specific sub-territory in the nucleus. First, the vast majority of contacts are between regions inside a block. Second, across the block borders, the contact probability between neighboring regions is abruptly reduced and an abrupt change in contact profiles is observed. As a first approximation, we defined the sub-territory that is largely occupied by each block region as a globular volume whose spherical radius is approximated by the block size (Fig. 5c and Supplementary Table 4). The structure of a genome is then given by a spatial arrangement of these spheres. Our goal is then to determine a population of genome structures, where in each structure all the 856 spheres of the diploid genome are packed into the nucleus in such a way that their contacts across the population are entirely consistent with the TCC data (Fig. 5d).
We converted the TCC contact frequencies into a set of contact restraints between spheres in all the structures of the population. A restraint can be thought of as generating a “force” between the spheres so that they form a contact. Importantly, any given contact can only be enforced in the fraction of models in the population corresponding to its TCC frequency (Supplementary Methods). If a contact is not enforced, no assumptions are made about the relative positions of the corresponding spheres. Therefore, our method does not correlate contact frequencies with averaged distances; it relies purely on the TCC data by incorporating only the presence or absence of chromatin contacts.
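The central idea, that a contact is enforced only in a matching fraction of the population, can be illustrated with a small sketch. It assumes the contact frequencies have already been rescaled to probabilities between 0 and 1; how that fraction is derived is not reproduced here. The function name and the example probabilities are hypothetical, and the choice of which structures carry each contact is left to the optimization and is not shown.

```python
def structures_to_restrain(contact_probability, population_size):
    """Number of structures in which a contact restraint would be enforced.

    Assumption: `contact_probability` is a TCC-derived contact frequency that has
    already been rescaled to the range [0, 1].
    """
    if not 0.0 <= contact_probability <= 1.0:
        raise ValueError("contact_probability must lie in [0, 1]")
    if population_size < 1:
        raise ValueError("population_size must be positive")
    return round(contact_probability * population_size)


# Hypothetical contact probabilities for three pairs of spheres (toy values).
toy_probabilities = {("A", "B"): 0.25, ("A", "C"): 0.03, ("B", "C"): 0.0}
population = 10_000  # population size used in the text

for pair, probability in toy_probabilities.items():
    print(pair, structures_to_restrain(probability, population))
```

In the remaining structures the two spheres are left unrestrained, which reflects the statement that no assumption is made about their relative positions when a contact is not enforced.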
In a diploid cell, most loci are present in two copies. Because the TCC data do not distinguish between these copies, the optimal assignment of each sphere to a specific contact is determined as a part of our optimization process28, 30.
Finally, starting from random positions, we simultaneously optimized the positions of all the spheres in a population of 10,000 genome structures to a score of 0, indicating that no restraint violations remained (Supplementary Methods).
To test how consistent this structure population is with the experiment, the block contact frequency map was calculated from the structure population and compared with the original data. The two are strongly correlated; the average Pearson’s correlation is 0.94, confirming the excellent agreement between contact frequencies in the structure population and experiment (Supplementary Fig. 7b-d). Furthermore, three independently calculated populations showed that our structure population is highly reproducible (Pearson’s r > 0.999), which also indicates that, at this resolution, the size of the model population is sufficiently large (Supplementary Methods).
Because chromatin contacts in the TCC data are observed over a wide range of frequencies, the resulting population shows a fairly large degree of structural variation (Supplementary Fig. 8a,b). For instance, on average only 21% of contacts are shared between any two structures in the population (Supplementary Fig. 8c). Despite this large heterogeneity, the structure population reveals a distinct and non-random chromosome organization. Specifically, the population clearly identifies the preferred radial positions of chromosomes (Fig. 6a,b and Supplementary Fig. 9b). These positions strongly agree with independent FISH studies in lymphoblasts4, 5: the Pearson’s correlation between the experimental and population-based average positions was 0.71 (p-value < $10^{-3}$) for the 22 chromosomes whose radial positions were previously determined4. In contrast, radial positions in a control population generated without TCC data did not agree with the experiment (Pearson’s r = -0.2, Supplementary Fig. 9a), indicating that the TCC data are responsible for generating the correct radial distributions seen in the imaging experiments4. In general, the radial positions of chromosomes tend to increase with chromosome size, with some noticeable exceptions (Fig. 6b). One of these cases is the radial positions of chromosomes 18 and 19 which, despite their similar size, we observed at significantly different positions5. Chromosome 19 is located closer to the center of the nucleus, while chromosome 18 is preferentially located closer to the nuclear envelope (Fig. 6a). Furthermore, the homologous copies of chromosome 18 are often distant from each other while those of chromosome 19 are often closely associated (Fig. 6a and Supplementary Fig. 9b), in agreement with independent experimental evidence5.
When chromosome territories are clustered based on their average distances, two main groups can be identified (Fig. 6c). The first group (chromosomes 1,11,14-17,19-22) tends to occupy the central region of the nucleus, as is evident from their population-based joint localization probabilities (Fig. 6d). These chromosomes also tend to have relatively higher gene densities31. The second group (chromosomes 2-10,12,13,18,X) preferentially occupies the periphery of the nucleus (Fig. 6d).
Finally, we observe differences in the local packing between the spheres composed of mainly active or inactive regions. The average distances between spheres of mainly active regions are statistically larger (Supplementary Fig. 9c), suggesting that inactive regions are more densely packed in the structure population in comparison to the active regions.
TCC offers improved sensitivity in identifying chromatin interactions. In particular, libraries generated with the tethering strategy have a lower level of random intermolecular ligation compared to those generated by a non-tethered approach (Fig. 2c). The reduced noise level facilitates the analysis of low-frequency contacts such as inter-chromosomal interactions, which can otherwise be lost in the relatively higher background noise (Fig. 2d,e and Supplementary Fig. 10). Because the inter-molecular ligation noise remains low even at substantially increased DNA concentrations, this method also facilitates higher resolution analyses with enzymes that cut the chromatin more frequently.
Two main factors may contribute to this reduction of random inter-molecular ligations in the tethered libraries. First, DNA fragments can only be immobilized when they are crosslinked to proteins and are otherwise washed out of the reaction (Fig. 1). Therefore, “naked” DNA fragments, which would only produce false-positive contacts, are unlikely to participate in ligation. Second, immobilized protein-DNA complexes cannot diffuse freely, markedly reducing encounters between non-crosslinked molecules during ligation. When combined with a sufficiently low surface density of complexes which reduces their chance of immobilizing in close vicinities, these conditions can effectively reduce inter-molecular ligations.
The TCC data provide new insights into the internal organization of the chromosome territories. The regions of the inactive class preferentially associate with neighboring inactive regions, while the regions of the active class have a diverse panel of long-range contact partners (Fig. 3b and Supplementary Fig. 2b). A pronounced instance of this behavior can be observed across the centromeres. In large chromosomes, inactive regions on opposing sides of the centromere have little interaction with each other (Fig. 3c and Supplementary Fig. 3). At the same time, active regions on different arms show extensive interactions (Fig. 3c and Supplementary Fig. 3). This behavior is consistent with previous reports in D. melanogaster where interactions between some inactive polycomb-associated regions were constrained within a chromosome arm32, 33. These observations are also consistent with the more dense packing of the inactive regions seen in our genome structure population (Supplementary Fig. 9c).
More clues into the spatial organization of loci are provided by their propensity to form inter-chromosomal contacts. With the inter-chromosomal contact probability index (ICP), we have introduced a quantitative measure of inter-chromosomal contact propensity for each region (Fig. 4a and Supplementary Fig. 4a,b). ICP appears to be an indicator of the relative position of a region within the chromosome territory (Fig. 4c). Based on the available localization data26, we found that active regions with higher ICPs show more frequent localization beyond the bulk or at the border of the territory (Fig. 4c). Another important property of ICP is that it correlates with the functional characteristics of loci. For instance, active regions with larger ICP values show higher binding by RNA polymerase II (Fig. 4b) and higher levels of gene expression.
Our results reveal new insights into interactions between chromosomes. Most of these interactions are mediated by active regions with relatively high ICPs. Each of these regions forms significant interactions with numerous high-ICP active regions on other chromosomes (Fig. 4d). Notably, the frequencies of these interactions increase with the ICP of the interaction partners (Fig. 4e and Supplementary Fig. 6). As these regions tend to localize at the territory borders more frequently with increasing ICPs (Fig. 4c), their interaction frequency may be largely governed by their accessibility rather than other factors. In other words, inter-chromosomal interactions can form indiscriminately between high-ICP active regions that are accessible to each other. Accessibility may be determined by factors such as radial position or regional transcriptional activity in each cell.
We also observed that the propensity to form inter-chromosomal contacts is correlated with a region’s transcriptional activity (Fig. 4b). Because transcription is often focused at discrete sites (i.e., transcription factories)34, this correlation may be a consequence of the active regions being recruited to the same factory, thereby supporting previous suggestions that transcription factories play an important role in stabilizing inter-chromosomal interactions2, 35, 36. The indiscriminate nature of these interactions suggests that, based on accessibility in each cell, different combinations of loci associate in one factory. Nevertheless, the association of a specific transcription factor with only some of the transcription factories, as reported before36, can make the recruitment of its targets to the same factories more likely. Moreover, since transcription is not the only nuclear function that is concentrated at discrete sites1, 37, it is possible that other factories, such as those of splicing and DNA repair, also mediate the indiscriminate interactions between chromosome territories.
As these inter-chromosomal interactions are both numerous and low-frequency, each can only be present in a small fraction of the cells. In fact, in our FISH experiments, two pairs of high-ICP active regions were found to colocalize in only a few percent of the cells (Fig. 4f-h). These cell-to-cell differences are reflected in a fairly large variation between the genome structures in the population generated from the TCC data (Fig. 6a,b and Supplementary Fig. 8). In spite of this variation, however, the structure population reproduces the previously described4, 5 preferred radial positions of chromosomes (Fig. 6a,b and Supplementary Fig. 9a,b). The structural analysis indicates that the genome-wide behavior of inter-chromosomal interactions, as observed in the TCC data, is in keeping with the previously described architectural features. Furthermore, this population demonstrates that the TCC data alone are sufficient to reproduce the distinct spatial distributions of chromosome territories (Fig. 6a,b and Supplementary Fig. 9a,b).
Our population-based modeling, therefore, provides a novel means of studying the three-dimensional genome architectures. By systematically translating the TCC data into a population of genome structures, this approach also allows a statistical interpretation of the genome organization (Fig. 6 and Supplementary Figs. 8 and 9b,c). While not every structure in the population may necessarily be a definitive structure of chromosomes, several lines of evidence indicate that, as a whole, this population is representative of the true configurations of the genome. The structure population is highly reproducible with independently generated populations reproducing the same statistical features with a high precision. More importantly, the population statistics agree with independent experimental data (such as FISH data) not included when generating the structures. Moreover, a structure population based only on part of the TCC data was able to correctly predict the missing data (Supplementary Methods).
Here, we have focused on chromosome territory localizations. However, the resulting genome structure population provides a starting point for a higher resolution description of the spatial properties of the genome.
25 million GM12878 cells were crosslinked with 1% formaldehyde. Cells were lysed and treated with Iodoacetyl-PEG2-Biotin to biotinylate cysteine residues. Biotinylated chromatin was digested with either HindIII or MboI and immobilized on 400 μL MyOne Streptavidin T1 beads (Invitrogen), which have a surface area of about 100 cm². The DNA ends were filled in using dGTPαS and Biotin-14-dCTP nucleotide analogues and ligated. Crosslinking was reversed, and DNA was purified and treated with E. coli exonuclease III to remove the biotinylated residues from non-ligated DNA ends. Fragments that contain ligation junctions were then purified by pull-down with streptavidin-coated magnetic beads and prepared for massively parallel sequencing.
As an example of non-tethered conformation capture, Hi-C was carried out as described previously10 on 25 million GM12878 cells. Crosslinking conditions were identical to that of the TCC experiments. Digestion was carried out with either HindIII or MboI. The ligation step was carried out in a total volume of 40 mL.
Unless otherwise stated, analyses described in this article have been carried out using the tethered HindIII library. Moreover, in all the analyses of this library, intra-chromosomal contacts between regions closer than 30,000 bp have been removed from consideration (Supplementary Methods).
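The distance filter described above can be sketched in a few lines of Python. This is only an illustration: the tuple layout of a contact and the function name `keep_contact` are hypothetical, and how the positions of the two ligated ends are defined (read position or restriction site) is given in the Supplementary Methods, not here.

```python
MIN_INTRA_DISTANCE_BP = 30_000  # cutoff stated in the text


def keep_contact(chrom_a, pos_a, chrom_b, pos_b, min_dist=MIN_INTRA_DISTANCE_BP):
    """Return False only for intra-chromosomal contacts closer than min_dist."""
    if chrom_a != chrom_b:
        return True  # inter-chromosomal contacts are never filtered by distance
    return abs(pos_a - pos_b) >= min_dist


# Toy contacts (hypothetical coordinates, for illustration only).
contacts = [
    ("chr2", 1_000_000, "chr2", 1_010_000),   # 10 kb apart: removed
    ("chr2", 1_000_000, "chr2", 1_500_000),   # 500 kb apart: kept
    ("chr2", 1_000_000, "chr11", 1_005_000),  # inter-chromosomal: kept
]
filtered = [c for c in contacts if keep_contact(*c)]
```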
To generate the contact frequency maps, the genome was divided into contiguous “segments” spanning an equal number of restriction sites. The contact matrix F was defined such that the matrix entry fi,j is based on the number of observed ligation products between segments i and j (Supplementary Methods)9, 10, 40. The number of restriction sites in each segment varied with the desired resolution. For example, in the contact frequency maps shown in Figure 2a,b, chromosome 2 was divided into 258 segments, each spanning 277 HindIII sites.
The contact profile of segment i is the ith row vector of the matrix F, that is, the ordered list of contact frequencies of segment i with all other segments in the genome.
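As a rough sketch of this binning step, the Python below groups sorted restriction-site positions into segments of equal site count and tallies ligation products per segment pair. It assumes a single chromosome and uses the raw count as the matrix entry; the exact definition of fi,j is given in the Supplementary Methods, and all function names here are hypothetical.

```python
from bisect import bisect_right


def make_segments(site_positions, sites_per_segment):
    """Return the start position (first site) of each contiguous segment."""
    return [site_positions[k]
            for k in range(0, len(site_positions), sites_per_segment)]


def segment_index(segment_starts, position):
    """Index of the last segment whose start is <= position."""
    return max(bisect_right(segment_starts, position) - 1, 0)


def contact_matrix(segment_starts, ligation_pairs):
    """Symmetric matrix of ligation-product counts between segments."""
    n = len(segment_starts)
    F = [[0] * n for _ in range(n)]
    for pos_a, pos_b in ligation_pairs:
        i = segment_index(segment_starts, pos_a)
        j = segment_index(segment_starts, pos_b)
        F[i][j] += 1
        if i != j:
            F[j][i] += 1
    return F


# Toy example: 12 sites, 4 sites per segment, giving 3 segments.
sites = list(range(100, 1300, 100))
starts = make_segments(sites, 4)
F = contact_matrix(starts, [(150, 950), (520, 530), (910, 120)])
profile_0 = F[0]  # contact profile of segment 0 (its row of F)
```

Lower values of `sites_per_segment` give a finer map at the cost of fewer observations per entry.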
The expected value for the frequency of a contact between segments i and j (ei,j) was calculated as:
where si and sj are the totals of all observed contact frequencies involving segments i and j, respectively, and γ is a normalization constant. For example, in Figure 2d,e, γ is chosen such that the average observed/expected frequency (fi,j/ei,j) of all inter-chromosomal contacts is equal to 1.
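The sketch below illustrates one way the normalization could work. It assumes that the expected frequency is proportional to the product of the two segment totals, ei,j = γ·si·sj; this functional form is an assumption made for illustration and should be checked against the original equation. Under that assumption, setting the mean inter-chromosomal observed/expected ratio to 1 gives γ as the mean of fi,j/(si·sj) over inter-chromosomal pairs. Function and variable names are hypothetical.

```python
def expected_matrix(F, chrom_of):
    """Expected contact frequencies under the ASSUMED form e_ij = gamma*s_i*s_j.

    F        -- symmetric contact matrix (list of lists)
    chrom_of -- chromosome label of each segment
    """
    n = len(F)
    s = [sum(row) for row in F]  # total observed contacts per segment
    ratios = [
        F[i][j] / (s[i] * s[j])
        for i in range(n)
        for j in range(i + 1, n)
        if chrom_of[i] != chrom_of[j] and s[i] * s[j] > 0
    ]
    # gamma makes the mean inter-chromosomal observed/expected ratio equal 1.
    gamma = sum(ratios) / len(ratios)
    E = [[gamma * s[i] * s[j] for j in range(n)] for i in range(n)]
    return E, gamma


# Toy matrix: two segments on each of two chromosomes.
F = [[0, 8, 1, 2],
     [8, 0, 3, 1],
     [1, 3, 0, 9],
     [2, 1, 9, 0]]
chrom_of = ["chr1", "chr1", "chr2", "chr2"]
E, gamma = expected_matrix(F, chrom_of)
inter = [(i, j) for i in range(4) for j in range(i + 1, 4)
         if chrom_of[i] != chrom_of[j]]
mean_obs_exp = sum(F[i][j] / E[i][j] for i, j in inter) / len(inter)
```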
For each chromosome, all contact frequencies were first normalized by the average contact frequency of all pairs of segments with the same distance in the map. Then each element in the correlation map, pi,j, was defined as Pearson’s correlation between the intra-chromosomal contact profiles of segments i and j.
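A minimal sketch of these two steps follows, assuming a symmetric intra-chromosomal matrix in which "same distance in the map" means the same matrix diagonal. Treating empty diagonals and constant profiles as zero is a choice made here for illustration, not something stated in the text, and the function names are hypothetical.

```python
from math import sqrt


def distance_normalize(F):
    """Divide each entry by the mean of its diagonal (same segment separation)."""
    n = len(F)
    N = [[0.0] * n for _ in range(n)]
    for d in range(n):
        vals = [F[i][i + d] for i in range(n - d)]
        mean = sum(vals) / len(vals)
        for i in range(n - d):
            v = F[i][i + d] / mean if mean > 0 else 0.0
            N[i][i + d] = v
            N[i + d][i] = v
    return N


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def correlation_map(F):
    """p_ij = Pearson correlation between normalized contact profiles i and j."""
    N = distance_normalize(F)
    n = len(N)
    return [[pearson(N[i], N[j]) for j in range(n)] for i in range(n)]


# Toy intra-chromosomal matrix (hypothetical counts).
F = [[10, 2, 6, 1],
     [2, 10, 1, 6],
     [6, 1, 10, 2],
     [1, 6, 2, 10]]
P = correlation_map(F)
```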
The first principal component of each intra-chromosomal correlation map (defined as the eigenvector with the largest eigenvalue) was calculated. The projection of each segment’s intra-chromosomal correlation profile on this eigenvector was taken as the value of its first principal component (EIG). Of the two possible directions for the eigenvector, the one that would result in a positive correlation between EIG and RNA polymerase II binding was chosen. Segments with a positive EIG were then assigned to the active class and all others to the inactive class. For the analyses that required a high-confidence assignment of the classes (i.e., Figs. 3c and 4d and Supplementary Fig. 3), only the segments with positive EIG values that were larger than a third of the maximum chromosome-wide EIG were assigned to the active class, and only those with negative EIG values that were smaller than a third of the minimum chromosome-wide EIG were assigned to the inactive class. The remaining segments were left unassigned. With these criteria, ~77% of all segments in autosomal chromosomes were assigned to one of the two classes.
Raw RNA polymerase II (pol II) ChIP-seq data in GM12878 cells were obtained from another study38. The ChIP-seq data were aligned to the human genome (GRCh37/hg19). The binding of pol II to each segment was calculated as the number of reads that aligned to the segment in the anti-pol II ChIP divided by the number of aligned reads in the anti-IgG negative control.
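The sign choice and the two assignment rules can be sketched as follows, assuming the per-segment EIG values for one chromosome have already been computed. The function names and the toy values are hypothetical; the sign of the covariance is used because it always matches the sign of Pearson's correlation.

```python
def orient(eig, pol2):
    """Flip EIG if needed so that it correlates positively with pol II binding."""
    n = len(eig)
    me, mp = sum(eig) / n, sum(pol2) / n
    cov = sum((e - me) * (p - mp) for e, p in zip(eig, pol2))
    return [-e for e in eig] if cov < 0 else list(eig)


def assign_classes(eig, high_confidence=False):
    """Label segments 'active', 'inactive', or None (unassigned)."""
    if not high_confidence:
        return ["active" if e > 0 else "inactive" for e in eig]
    upper = max(eig) / 3.0  # a third of the chromosome-wide maximum
    lower = min(eig) / 3.0  # a third of the chromosome-wide minimum
    labels = []
    for e in eig:
        if e > 0 and e > upper:
            labels.append("active")
        elif e < 0 and e < lower:
            labels.append("inactive")
        else:
            labels.append(None)
    return labels


# Toy values for one chromosome (hypothetical).
eig = orient([0.9, 0.2, -0.1, -0.6, -0.3], pol2=[1, 2, 3, 9, 5])
all_labels = assign_classes(eig)
confident_labels = assign_classes(eig, high_confidence=True)
```

In this toy case the input vector is anti-correlated with pol II binding, so `orient` flips it before the classes are assigned.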
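The per-segment binding value is a simple ratio of read counts, sketched below. Returning `None` when the control has no reads is a choice made for this illustration; the text does not say how such segments were handled, and the function name is hypothetical.

```python
def enrichment(chip_counts, control_counts):
    """Per-segment ratio of ChIP reads to negative-control reads."""
    return [
        chip / control if control > 0 else None  # undefined without control reads
        for chip, control in zip(chip_counts, control_counts)
    ]


# Toy read counts for three segments (hypothetical).
pol2_binding = enrichment([30, 5, 0], [10, 10, 0])
```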
Raw RNA-seq (poly-A enriched) data for GM12878 cells were obtained from another study38 and aligned to the human genome (GRCh37/hg19). The expression level of UCSC known canonical genes in hg19 was estimated using a two-parameter generalized Poisson model as described by Srivastava and Chen41. Total gene expression for each segment was measured as the sum of the expressions (Theta values) of all genes that overlap with that segment.
Raw histone modification ChIP-seq data in GM12878 cells were obtained from the ENCODE project42 (generated at the Broad Institute and in the Bradley E. Bernstein lab at the Massachusetts General Hospital/Harvard Medical School). The ChIP-seq data were aligned to the human genome (GRCh37/hg19). Each histone modification level was calculated as the number of reads that aligned to the segment in the corresponding antibody pulldown experiment divided by the number of aligned reads in the input negative control.
Raw DNaseI sensitivity sequencing data in GM12878 cells, which were generated using the Digital DNaseI methodology43, were obtained from the ENCODE project42 (these data were generated by the UW ENCODE group). The Digital DNase sequencing reads were aligned to the human genome (GRCh37/hg19). The total number of alignments to each segment was taken as the total amount of DNase hypersensitivity in that segment.
BACs were obtained from the BACPAC Resource Center (BPRC) at Children’s Hospital Oakland Research Institute. 3D-FISH experiments were carried out as described previously44. The only BAC that aligns to chromosome 19 (RP11-50I11) was labelled with digoxigenin, while the other BACs (RP11-651M4, RP11-220C23, RP11-169D4, and RP11-770J1), all of which align to chromosome 11, were labelled with biotin in nick-translation reactions. In each hybridization reaction, roughly 300 ng of each labelled probe and 5 μg of CotI DNA were used. Each label was detected with two layers: avidin-FITC and mouse anti-dig as the first layer, and goat anti-avidin-FITC and sheep anti-mouse-Cy3 as the second layer. Total DNA was counterstained with DAPI. Confocal microscopy was carried out using an Olympus FluoView FV1000 imaging system equipped with a 60X/1.42 PlanApo objective. Optical sections (z stacks) 0.20 μm apart were obtained in sequential mode in the DAPI, FITC, and Cy3 channels. Center-to-center distances between the probes were calculated using the Smart 3D-FISH plug-in for ImageJ as described45. Each pair of probes was processed in duplicate, with about 1,000 total cells per pair.
To identify the clustering cutoff, we used a penalty function designed to simultaneously minimize the number of clusters and the variation within each cluster39.
The genome of the diploid cell was represented by 856 spheres, whose relative radii depend on the genomic length of the chromatin regions in a block (see Figure 5b and Supplementary Methods for the definition of the blocks). Each sphere is represented by two concentric spheres, a hard sphere and a soft sphere (Fig. 5c). The radius of the hard sphere of a block was defined as (Supplementary Table 4):
where li is the genomic length of block region i and Rnuc is the nuclear radius. The summation runs over all blocks in the genome. The chromatin occupancy volume Onuc was set to 20%. The radius of the soft sphere is twice the radius of the hard sphere.
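One way to read these definitions is that each hard sphere's volume is proportional to the genomic length of its block, and that the hard spheres together fill the fraction Onuc of the nuclear volume. Under that assumption the radius would be Rnuc·(Onuc·li/Σ lj)^(1/3). The sketch below implements this assumed form only; it is not taken from the original equation, and the nuclear radius and block lengths used are placeholders.

```python
def hard_radii(block_lengths, r_nuc, occupancy=0.20):
    """Hard-sphere radii under the ASSUMED rule: volume proportional to length,
    with total hard-sphere volume equal to occupancy * nuclear volume."""
    total = sum(block_lengths)
    return [r_nuc * (occupancy * length / total) ** (1.0 / 3.0)
            for length in block_lengths]


def soft_radii(hard):
    """Soft-sphere radius is twice the hard-sphere radius (stated in the text)."""
    return [2.0 * r for r in hard]


# Toy blocks (hypothetical lengths in bp) in a nucleus of unit radius.
lengths = [2_000_000, 5_000_000, 3_000_000]
hard = hard_radii(lengths, r_nuc=1.0)
soft = soft_radii(hard)
occupied_fraction = sum(r ** 3 for r in hard) / 1.0 ** 3
```

The final line checks the assumption directly: the summed cubes of the hard radii, relative to the cube of the nuclear radius, recover the 20% occupancy.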
The scoring function captures all the information about the genome structure and is the sum of restraints of various types. These restraints ensure that all spheres are positioned within the nuclear volume. The overlap between hard spheres is prevented, allowing for a defined genome occupancy in the nucleus. A contact restraint enforces that the soft radii of two spheres are overlapping. Contacts are enforced based on the contact information from the HindIII-TCC library. Our procedure ensures that only a fraction of models in the population enforces a contact according to the observed contact frequency. The scoring function was implemented and optimized in the integrative modeling platform (IMP)28, 46.
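A toy version of such a scoring function is sketched below for a single structure. The three restraint types follow the description above, but the quadratic penalty form, the use of the hard radius for nuclear confinement, and all names are assumptions made for illustration. This is plain Python and does not use or reproduce the IMP API.

```python
from math import dist  # Python 3.8+


def score(coords, hard_r, contacts, r_nuc):
    """Sum of squared restraint violations; 0.0 means all restraints are met.

    coords   -- list of (x, y, z) sphere centers, nucleus centered at origin
    hard_r   -- hard-sphere radius of each sphere
    contacts -- (i, j) pairs whose contact is enforced in this structure
    r_nuc    -- nuclear radius
    """
    origin = (0.0, 0.0, 0.0)
    soft_r = [2.0 * r for r in hard_r]
    total = 0.0
    # 1. Nuclear confinement: each sphere must lie inside the nucleus.
    for p, r in zip(coords, hard_r):
        excess = dist(p, origin) + r - r_nuc
        if excess > 0:
            total += excess ** 2
    # 2. Excluded volume: hard spheres must not overlap.
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            overlap = hard_r[i] + hard_r[j] - dist(coords[i], coords[j])
            if overlap > 0:
                total += overlap ** 2
    # 3. Contact restraint: soft spheres of enforced pairs must overlap.
    for i, j in contacts:
        gap = dist(coords[i], coords[j]) - (soft_r[i] + soft_r[j])
        if gap > 0:
            total += gap ** 2
    return total


# Toy structure with three spheres (hypothetical coordinates).
coords = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (-0.6, 0.0, 0.0)]
hard_r = [0.1, 0.1, 0.1]
no_contacts = score(coords, hard_r, [], r_nuc=1.0)
one_contact = score(coords, hard_r, [(0, 1)], r_nuc=1.0)
```

Because each structure in the population receives its own list of enforced contacts, a given contact can be imposed in only a fraction of the structures, in line with its observed frequency.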
The optimization relies on conjugate gradients and molecular dynamics with simulated annealing. It starts with a random configuration of spheres and then iteratively moves these spheres so as to minimize violations of the restraints to a score of zero, resulting in a population of 10,000 genome structures that are consistent with the input data.
The authors would like to acknowledge Dr. Peter Laird, Dr. James Knowles, Joseph Aman, and the USC Epigenome Center for assistance in high-throughput sequencing; Drs. Matthew Michael and Ashley Williams for assistance in confocal microscopy; Drs. Nunzio Bottini and Qi-Long Ying and members of their laboratories for assistance in cell culture; and Dr. Norman Arnheim, Dr. Andrew Smith, Dr. Oscar Aparicio, Dr. Susan Forsburg, Dr. Wenyuan Li, Dr. M.S. Madhusudhan, Ke Gong, Sudeep Srivastava, Sarmad Al-Bassam, MaryAnn Murphy, Jared Peace, and Zac Ostrow for useful discussions and comments on the manuscript. This work is supported by Human Frontier Science Program grant RGY0079/2009-C to F.A.; an Alfred P. Sloan Foundation grant to F.A.; NIH grants GM064642 and GM077320 to L.C.; NIH grant GM096089 to F.A.; and NIH grant RR022220 to F.A. and L.C. F.A. is a Pew Scholar in Biomedical Sciences, supported by the Pew Charitable Trusts.
Author contributions
R.K. and L.C. conceived the tethered conformation capture technique. R.K. performed the experiments and analyzed the contact data. R.K. and N.J. performed the FISH experiments and analyzed the results. H.T. and F.A. conceived the modeling strategy, and R.K. and L.C. provided input and discussions. H.T. performed the modeling experiments and analysis. R.K., F.A., H.T., and L.C. wrote the manuscript. All authors commented on and revised the manuscript. F.A. and L.C. supervised the project.
All sequencing results and binary contact catalogues are publicly available in NCBI SRA under accession number SRA025848.
A more detailed description of experimental and computational procedures is provided in the Supplementary information.
Competing financial interests
A provisional patent for TCC is under review.