The folding of genomic DNA from the beads-on-a-string like structure of nucleosomes into higher order assemblies is critically linked to nuclear processes. We have calculated the first 3D structures of entire mammalian genomes using data from a new chromosome conformation capture procedure that allows us to first image and then process single cells. This has allowed us to study genome folding down to a scale of <100 kb and to validate the structures. We show that the structures of individual topological-associated domains and loops vary very substantially from cell-to-cell. By contrast, A/B compartments, lamin-associated domains and active enhancers/promoters are organized in a consistent way on a genome-wide basis in every cell, suggesting that they could drive chromosome and genome folding. Through studying pluripotency factor- and NuRD-regulated genes, we illustrate how single cell genome structure determination provides a novel approach for investigating biological processes.
Our understanding of nuclear architecture has been built on electron and light microscopy studies that suggest the existence of territories pervaded by an inter-chromosomal space through which molecules diffuse to and from their sites of action1. In parallel, biochemical studies, in particular chromosome conformation capture experiments (3C, Hi-C etc.) where DNA sequences in close spatial proximity in the nucleus are identified after restriction enzyme digestion and DNA ligation, have provided molecular information about chromosome folding2. At a mega-base scale, Hi-C experiments have partitioned the genome into two (A/B) compartments3. In addition, they have provided evidence for 0.5-1.0 Mb “topological-associated domains” (TADs)4–6, as well as smaller loops (hundreds of kilobases)7. 3C-type experiments have further shown that enhancers make direct physical interactions with promoters, and that these interactions are stabilized by a network of protein-protein interactions involving CTCF, cohesin and mediator8,9. Although probabilistic methods can be used to calculate ensembles of low-resolution models that are consistent with population Hi-C data10,11, understanding genome structure at higher resolution requires the development of single cell approaches.
In mitotic cells both A/B-compartments and TADs disappear12 and thus the structural complexity of interphase chromosomes is reestablished during G1 phase. To study interphase genome structure, we have combined imaging with an improved Hi-C protocol (Fig. 1a) to determine whole genome structures of single G1 phase haploid mouse embryonic stem cells (mESCs) at a 100 kb scale. The structures allow us to study TAD/loop structure genome-wide, to analyze the principles underlying genome folding, and to understand which factors may be important for driving chromosome/genome structure. We also illustrate how combining single-cell genome structures, with population-based ChIP- and RNA-seq data, provides new insight into the organization of pluripotency factor- and Nucleosome Remodeling Deacetylase (NuRD)-regulated genes.
We imaged haploid mESC nuclei, expressing fluorescently tagged CENP-A (the centromeric histone H3 variant) and histone H2B proteins, to select G1 phase cells (Extended Data Fig. 1a) and to later validate the structures. Hi-C processing of eight individual mESCs yielded 37,000-122,000 contacts (Extended Data Table 1), representing 1.2-4.1% recovery of the total possible ligation junctions. In single cells, unlike in population data, Hi-C contacts are observed between distinct and different sets of chromosomes (Fig. 1b and Extended Data Fig. 1b).
Using a particle-on-a-string representation and an extended simulated annealing protocol we calculated highly consistent 3D genome structures [ensemble root mean square deviations (RMSDs) < 1.75 particle radii] with discrete chromosome territories (Fig. 1c and Supplementary Videos 1, 2). The structures were calculated with an average of 1-3 Hi-C contact-derived restraints for each 100 kb particle (with a total of 26,000-75,000 restraints, Extended Data Table 2 and Extended Data Fig. 1c). Recalculation after randomly omitting 10-70% of the data reliably generated the same folded conformation (RMSD < 2.5 particle radii). Moreover, structure calculations after randomly merging half the data from two different cells resulted in a vast increase in the number of violated experimental restraints (37.4% have a distance >4 particle radii, compared to 5-6% for the separate data), and generated compacted, highly inconsistent structures (Extended Data Fig. 1d). Thus, the contacts observed in different cells cannot be independent samples of a single conformation shared by all cells; each data set instead reflects the conformation of its own cell. In addition, cells with either a broken/recombined chromosome (Extended Data Fig. 1e) or with a duplicated chromosome (Extended Data Fig. 1f) can be immediately recognized from the data.
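The restraint-violation statistic quoted above can be illustrated with a minimal sketch. It assumes that each restraint is a pair of particle indices, that coordinates are expressed in units of particle radii, and that a restraint counts as violated when its two particles lie more than a threshold distance apart (4 particle radii here, matching the figure quoted in the text). The function name and the toy coordinates are hypothetical and are not taken from the software used in this study.

```python
import math


def percent_violated(coords, restraints, threshold=4.0):
    """Percentage of restraints whose two particles are farther apart than
    `threshold` (same units as the coordinates; particle radii assumed)."""
    if not restraints:
        return 0.0
    violated = 0
    for i, j in restraints:
        # Euclidean distance between the two restrained particles
        if math.dist(coords[i], coords[j]) > threshold:
            violated += 1
    return 100.0 * violated / len(restraints)


# Toy data (hypothetical): four particles and four contact-derived restraints.
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 5.0, 0.0), (9.0, 0.0, 0.0)]
restraints = [(0, 1), (0, 2), (1, 3), (2, 3)]

pct = percent_violated(coords, restraints)
print(f"{pct:.1f}% of restraints exceed 4 particle radii")
```

Applied to structures calculated from a single cell and from merged data, a statistic of this kind would be expected to separate the two cases in the way described above.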
A consistent Rabl configuration (with centromeres and telomeres clustered on opposite sides of the nucleus) was observed in all G1 phase mESCs, strongly validating the structures (Fig. 2a, Extended Data Fig. 2a and Supplementary Video 3). Fig. 2b shows two examples of CENP-A image superposition with the corresponding genome structure from the same cell, providing independent evaluation of the reliability of the structure. Cell 7 shows typical clustering of the pericentromeric regions in a cavity on one side of the structure, which is clearly supported by the centromere positions in the CENP-A image. In Cell 8 the centromeres are more diffusely distributed in both the image and the structure. The structures were additionally validated through: 1) comparison with previous imaging studies, and both our own and previous DNA-FISH experiments; and 2) testing structural predictions using super-resolution microscopy (see below).
The single cell Hi-C data shows fairly uniform coverage of long range contacts across both the A and B compartments, suggesting similar restriction enzyme/ligase accessibility in each (Extended Data Fig. 2b). Importantly, the contact probability is preserved for all nearby particles, showing that the entire structure is consistent with the Hi-C contact data (Extended Data Fig. 2c). We noticed an increase in contact density in some regions that coincided with sites of early DNA replication13, but after studying violated experimental restraints we were unable to identify any region that cannot be described by a single structural conformation, i.e. where replication appeared to have begun (Extended Data Fig. 2b).
Comparison of haploid and diploid mESCs using RNA-seq and ChIP experiments showed, respectively, that the levels of gene expression are highly correlated (Spearman’s rho = 0.97, P < 10⁻¹⁵) (Extended Data Fig. 2d) and that protein-genome interactions are highly similar (Extended Data Fig. 2e). This allowed us to utilise published ChIP-seq data when analysing the haploid structures.
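For readers unfamiliar with the statistic, the following is a minimal sketch of a Spearman rank correlation, computed as the Pearson correlation of average ranks. The expression values are invented for illustration and the function names are hypothetical; this is not the analysis pipeline used in the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / math.sqrt(vx * vy)


# Invented expression levels for five genes in two cell types.
haploid = [0.5, 12.0, 3.1, 250.0, 48.0]
diploid = [0.7, 4.0, 10.5, 300.0, 40.0]

rho = spearman_rho(haploid, diploid)
print(f"Spearman's rho = {rho:.2f}")
```

Because only ranks enter the calculation, the measure is insensitive to the large dynamic range typical of expression data.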
Discrete chromosome territories can be seen in all the intact genome structures (Fig. 2c and Supplementary Video 1), although there is a significant degree (5-10%) of chromosome intermingling (Extended Data Fig. 3a). Whilst chromosome structure varies dramatically from cell-to-cell, we find that regions belonging to the A or B compartments always cluster together and A segregates from B (Fig. 2d and Extended Data Fig. 3b). This is supported by recent imaging experiments showing that A and B compartment TADs are organized in a spatially polarized manner in single chromosomes14, providing further validation of our structures. In all cells the chromosomes then pack together to give an outer ring of B compartment, an inner ring of A compartment, and an internal region of B compartment around the hollow nucleoli (Fig. 2e, Extended Data Fig. 3c and Supplementary Video 4). The nucleolus is often close to the nuclear membrane with the A compartment forming a bowl-like structure. To achieve this organization chromosomes can fold in from the surface towards the nucleoli, or fold in and back out again, or go all the way through the nucleus (Fig. 2f and Supplementary Video 5). Chromatin states computed from the genome-wide association of post-translationally modified histones in mammalian cells15 (a completely independent method) also show a similar organization (Extended Data Fig. 3d). Likewise, regions that constitutively associate with Lamin B1 (cLADs)16,17 are confined to either the nuclear membrane or nucleolar periphery in every cell, consistent with reshuffling between these regions each cell cycle18,19. Highly expressed genes, however, mostly lie in the inner ring of A compartment (Figs. 2e,g, Extended Data Figs. 3c,e,f and Supplementary Videos 6, 7).
By mapping ChIP-seq data onto the single cell genome structures we observed 3D clustering of histones H3K4me1, H3K27ac, and H3K4me3, consistent with the presence of enhancer/promoter clusters or transcription factories (Extended Data Figs. 4a,b). Annotating enhancers and promoters for activity (see Supplementary Methods) showed that active enhancers spatially associate most strongly with each other, followed by active enhancers with active promoters (Fig. 3a). We also found a pronounced clustering of highly expressed genes, in single cells, after mapping nuclear RNA-seq data onto the structures (Fig. 2g), and the greater the level of gene expression the larger the effect (Fig. 3b). Genome-wide analysis also showed that active/poised enhancers and active/bivalent promoters have a clear preference for being located at chromosomal interfaces (Extended Data Fig. 4c). Interestingly, there are very clear correlations between a gene’s expression level, and both localization to a chromosomal interface and depth within the A compartment (Fig. 3c and Extended Data Figs. 4d,e). We also related the preferred positions of pluripotency genes20 to gene expression and found that two highly expressed genes, Zfp42/Rex1 and Nanog, have variable positions in our structures (Fig. 3d). They are either found near the nuclear membrane or buried. DNA-FISH experiments, where Pou5f1 is a typical highly expressed (and usually buried) gene control, verified these conclusions providing further validation of the structures (Fig. 3e).
Notably, the A/B compartments, cLAD, ChIP- and RNA-seq data were all determined from populations of cells. Their consistent organization in every cell suggests that overall chromosome/genome conformation may be driven by a combination of interactions of LADs with the nuclear membrane/nucleolus and the clustering of active enhancers/promoters, which can be modulated by chromatin remodeling21. That genome structure is driven by transcription is supported by live cell imaging of histone-GFP fusion proteins during C. elegans development, which shows that knock-down of RNA Pol II leads to a collapse of the chromatin to a ring inside the nuclear membrane22.
As in previous studies5,9,23, we observed an alignment between highly expressed genes and both A/B compartment and TAD boundaries (Fig. 4a and Extended Data Fig. 5a). Analysis of four TADs, either side of highly expressed genes (Regions 1 and 2 in Fig. 4a), illustrates that in some cells a particular TAD is compacted, often such that its two boundaries are close enough to interact, whilst in others it is completely extended. This difference is not due to a lack of data because the structures obtained from repeated calculations using identical experimental restraints are very well defined (Fig. 4b and Extended Data Fig. 5b).
We systematically studied compaction in chromosome 12 TADs (Extended Data Fig. 5a), by computing the radius of gyration (ROG) after excluding possible sites of early DNA replication where TAD structure might be disrupted. As with previous studies of the Tsix TAD (Ref. 24), individual TAD compaction varies widely, from highly extended to compacted states (Fig. 4c), consistent with ligation occurring between almost every site in population Hi-C data. The structures of both compact and extended TADs are well defined and there is little correlation between the ROG and Hi-C contact density (Extended Data Fig. 5c), further showing that extended TAD structures do not result from a lack of experimental contacts. Analysis of TAD structure in all the other chromosomes gave analogous results (Extended Data Fig. 6). It is noteworthy that compaction in the structures often appears to involve the formation of loops within a TAD (see Fig. 4b, Extended Data Fig. 5b and Supplementary Videos 8-11) and it will be interesting to investigate whether these structures are related to supercoiling25,26 or loop extrusion27–29.
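The radius of gyration used here as a measure of compaction can be sketched as follows. The example assumes that every particle in a TAD carries equal weight, so that the ROG is the root mean square distance of the particles from their centroid; the function name and the two toy conformations are hypothetical.

```python
import math


def radius_of_gyration(coords):
    """Root mean square distance of equally weighted 3D particles from
    their centroid."""
    n = len(coords)
    if n == 0:
        raise ValueError("at least one particle is required")
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    mean_sq = sum(
        sum((c[k] - centroid[k]) ** 2 for k in range(3)) for c in coords
    ) / n
    return math.sqrt(mean_sq)


# Two hypothetical five-particle TADs: a straight, extended chain and a
# chain folded back on itself.
extended = [(float(i), 0.0, 0.0) for i in range(5)]
compact = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0),
           (0.0, 1.0, 0.0), (0.5, 0.5, 0.0)]

rog_extended = radius_of_gyration(extended)
rog_compact = radius_of_gyration(compact)
print(f"extended: {rog_extended:.3f}, compact: {rog_compact:.3f}")
```

The folded chain gives the smaller value, which is the sense in which a low ROG indicates a compacted TAD.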
We found that CTCF/Cohesin loops identified in high-resolution Hi-C data from mouse B-lymphoblasts7 mostly involve interactions where at least one end of the loop is in (or very near to) the A compartment (Extended Data Fig. 5d). Considering the 88 largest loops (sequence separation >600 kb) out of 2,823 in total, we found that 33% do not form in any of the cells, whereas the boundaries of the remainder contact each other in 12-62% of the cells (Fig. 4d). Extending this analysis to all 2,823 loops in 8 cells showed that the boundaries interact in 62.1% (Extended Data Fig. 7). Our genome-wide results suggest that TADs and CTCF/Cohesin loops do not form in all cells, in agreement with previous DNA-FISH experiments by Rao et al. (Ref. 7) who showed that four representative loops form in only a proportion of cells.
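The per-loop statistic above can be illustrated with a minimal sketch that counts the fraction of cells in which the two boundaries of a loop are in contact. It assumes that a contact is called when the boundary particles lie within a threshold distance of each other; the threshold of 2 particle radii, the function name and the distances are all hypothetical. Note that with eight cells the fraction can only take multiples of 1/8 (12.5%).

```python
def fraction_of_cells_in_contact(distances, threshold=2.0):
    """Fraction of cells in which the loop boundaries are within
    `threshold` of each other (one distance per cell)."""
    if not distances:
        raise ValueError("at least one cell is required")
    in_contact = sum(1 for d in distances if d <= threshold)
    return in_contact / len(distances)


# Hypothetical boundary-to-boundary distances (in particle radii) for one
# loop measured in eight single-cell structures.
distances = [1.5, 6.0, 9.2, 1.9, 7.5, 12.0, 1.0, 8.8]

frac = fraction_of_cells_in_contact(distances)
print(f"loop boundaries in contact in {100 * frac:.1f}% of cells")
```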
Our structures provide snapshots of genome folding at a particular time in different cells, and thus do not provide information about dynamics. They are, however, strikingly consistent with what one would expect from recently proposed loop-extrusion models, where TADs and CTCF/Cohesin loops might be expected to have highly dynamic and variable structures as Cohesin rings are driven to stable binding sites7,27–29. It is not known what drives the movement of Cohesin rings in mammalian cells, but previous studies in yeast suggest that it might be RNA polymerase molecules and transcription30. This would be consistent with our observation that CTCF/Cohesin loops7 are mostly found in the A compartment (where transcription levels are higher), studies in Drosophila suggesting that TADs result from the compaction of chromatin due to transcription31,32, and recent studies of the inactive mouse X chromosome that show a global loss of TAD structure except at expressed genes33,34.
In addition to CTCF, Cohesin and Mediator, previous studies have implicated key pluripotency factors as well as the Polycomb complexes (PRC1 and PRC2) in organizing 3D genome structure in mESCs. Analysis of one of the published 4C Nanog-gene interaction networks35 showed that only one (or two) of the previously identified 4C contacts can be identified in each single cell structure, showing that the propensity for particular genes to interact is low (Fig. 5a and Extended Data Figs. 8a,b). Analysis of Pou5f1-gene interacting regions36 gave very similar results (Extended Data Fig. 8c).
We mapped ChIP-seq data for different pluripotency factors onto the single cell genome structures and showed that, in single cells, Klf4 spatially clusters strongly with itself, H3K4me1, H3K27ac, and H3K4me3, i.e. with active enhancers/promoters (Extended Data Figs. 4a,b). This analysis also suggested 3D clustering of histone H3K27me3 (a marker for Polycomb complexes), but lower levels of 3D clustering of Nanog, both with itself and with H3K27me3. These results are consistent with previous mESC imaging experiments36,37, and strongly validate our single cell structures. They support the proposal that Klf4 organizes long-range chromosomal interactions36, and suggest that the observed large-scale 3D segregation of Nanog and H3K27me3 (Ref. 37) mostly results from Nanog and PRC complexes binding to separated sequences in chromosomes. However, whilst they suggest that Klf4-bound genes cluster, they also show that there is little propensity for “particular” Klf4-bound genes to interact with each other.
Next, we used the structures to study genes regulated by the NuRD complex, which plays a key role in controlling the earliest stages of differentiation of mESCs38. Whilst ChIP-seq experiments showed that CHD4 (the chromatin remodeling component) and MBD3 (part of the deacetylase core)39 are widely distributed (data not shown), we surprisingly found marked 3D clustering of NuRD-regulated genes (Figs. 5b,c). Super-resolution microscopy and single particle tracking using photo-activated light microscopy (PALM) in fixed and live cells, respectively, showed clustering of both the chromatin remodeling and deacetylase sub-modules (as illustrated by the mEos3.2 tagged CHD4 and MBD3 proteins, respectively), consistent with the 3D clustering of NuRD-regulated genes (Fig. 5d and Extended Data Fig. 8d). Interestingly, whilst our structures show that regions containing highly NuRD-regulated genes cluster, the actual regions that interact vary from cell-to-cell (Fig. 5e). In addition, we found that most genes are up/down regulated in either the CHD4 depletion experiment (CHD4-KD) or in the MBD3 knockout cells (MBD3-KO), but not both (Fig. 5c), suggesting that the chromatin remodeling and deacetylase sub-modules may function separately. However, despite regulating different sets of genes, it is notable that genes that are down-regulated in the MBD3-KO cells cluster more strongly than those that are up regulated, and that genes that are down-regulated in the MBD3-KO cluster more strongly with genes that are up-regulated in the CHD4-KD (and vice-versa) (Fig. 5b). Although further work is necessary to understand what drives the formation of NuRD clusters, the 3D clustering of CHD4 and MBD3 with active enhancers and promoters is noteworthy (Fig. 5b).
The structures allow the first genome-wide analysis of 3D interactions of individual regulatory elements/genes in single cells. In combination with 3D imaging they show that whilst Klf4- and NuRD-regulated genes interact and cluster to form foci, the genes they bring together are very variable. Our combination of imaging with genome structure determination will allow further studies of these and many other biological processes. In addition, the finding that chromosomes have a Rabl configuration in mammalian G1-phase cells may underlie slight preferences in long-range chromosomal interactions – e.g. those leading to translocation events involved in disease40.
| Cell | Input read pairs | Unique mapped pairs | Primary contacts | Final contacts* | Normal ligation % | Single read % | Trans % | Promiscuous ends | Mean redundancy |
| Cell | 400 kb Particles | 200 kb Particles | 100 kb Particles (without ambiguous restraints) | 100 kb Particles (with ambiguous restraints) |
| | RMSD* | NRest† | %V§ >3 | %V§ >4 | RMSD* | NRest† | %V§ >3 | %V§ >4 | RMSD* | NRest† | %V§ >3 | %V§ >4 | RMSD* | NRest† | %V§ >3 | %V§ >4 |
Supplementary Information is linked to the online version of the paper at www.nature.com/nature.
We thank Andy Riddell for cell sorting, Peter Humphreys for confocal microscopy, Alex Peter Gunnarson for the density mapping software, the CRUK Cambridge Institute for DNA sequencing, Takashi Nagano and Peter Fraser for processing the initial haploid mES cells, and Wendy Dean, Stefan Schoenfelder and Stephen Wingett for helpful advice. We thank the Wellcome Trust (082010/Z/07/Z), the EC FP7 4DCellFate project (277899) and the MRC (MR/M010082/1) for financial support.
Data Availability Statement
The ChIP-seq, RNA-seq and Hi-C data, structures and images reported in this study have been made available at the Gene Expression Omnibus (GEO) repository under accession code GSE80280.
Author Contributions
DL, SB and YC developed the protocol and carried out imaging/Hi-C processing. TJS developed the software with assistance from LPA and KJW. AO’S-K, JC, MR and BH carried out the CHD4/MBD3 depletion experiments, associated RNA- and ChIP-seq, and created the mEos3.2-Halo tagged ES cell lines. ML and AW provided the initial samples of haploid mESCs. SFL, MP and DK designed and built the microscope. LM, MS and LDiC carried out ChIP- and RNA-Seq experiments, whilst AF, EB and BL carried out bioinformatics analysis. TJS and EDL designed experiments, analyzed the results and wrote the manuscript with contributions from all the other authors.
Reprints and permissions information is available at www.nature.com/reprints.
The authors declare no competing interests.