Cells face the challenge of storing two meters of DNA in the three-dimensional (3D) space of the nucleus that spans only a few microns. The nuclear organization that is required to overcome this challenge must allow for the accessibility of the gene regulatory machinery to the DNA and, in the case of embryonic stem cells (ESCs), for the transcriptional and epigenetic changes that accompany differentiation. Recent technological advances have allowed for the mapping of genome organization at an unprecedented resolution and scale. These breakthroughs have led to a deluge of new data, and a sophisticated understanding of the relationship between gene regulation and 3D genome organization is beginning to form. In this review we summarize some of the recent findings illuminating the 3D structure of the eukaryotic genome, as well as the relationship between genome topology and function from the level of whole chromosomes to enhancer-promoter loops, with a focus on features affecting genome organization in ESCs and changes in nuclear organization during differentiation.
Embryonic stem cells (ESCs), isolated from the inner cell mass of pre-implantation blastocysts, self-renew indefinitely under appropriate culture conditions and have the ability to produce cell types from all three germ layers upon induction of differentiation in vivo and in vitro 1,2. Linear genomic features, such as the location of transcription factors, the basic transcriptional machinery, and chromatin modifications, as well as DNase hypersensitivity, expression state, and replication timing have been extensively mapped in ESCs. Therefore, gene regulatory processes controlling the transcriptional program of ESCs are relatively well characterized (reviewed recently elsewhere3) and center on three core transcriptional networks: the pluripotency network, made up of highly expressed, ESC-specific genes bound by the transcription factors Oct4, Sox2, and Nanog which, together, control pluripotency through co-binding of many enhancers and promoters including their own3; the cMyc network, formed by transcription factors of the Myc family which drives gene expression by promoting the release of paused polymerase at its target genes4; and, the Polycomb group (PcG) protein network, which represses developmental and lineage specific genes5 through the tri-methylation of lysine 27 of histone 3 (H3K27me3)6, H2AK119 ubiquitylation7, and chromatin compaction8. These transcriptional networks work in concert with external signaling pathways to maintain the pluripotent state, most notably the LIF-Jak-Stat pathway in mouse ESCs9 and bFGF-signaling in human ESCs10. Highlighting the importance of these transcriptional networks to pluripotent cell identity, ectopic expression of Oct4, Sox2, cMyc, and the pluripotency-associated transcription factor Klf4 is sufficient to reprogram somatic cells to induced pluripotent stem cells (iPSCs)11. 
iPSCs carry all the typical characteristics of ESCs including self-renewal, expression of the endogenous pluripotency program, and differentiation in both the teratoma and chimera formation assays11. More recently, there has been a push towards determining genome organization and correlating 3D topology with genomic functions such as transcriptional regulation. Because transcriptional networks and gene regulation are well studied in ESCs, these cells are a great model system with which to understand 3D genome organization and its changes upon cell fate change. In this review, we will first summarize general aspects of genome organization revealed from work with various cell types and then focus on new findings that begin to address genome organization in ESCs and changes upon induction of differentiation.
Years of research from many groups utilizing a variety of cell types from numerous species have defined a number of general features of eukaryotic genome organization. Interphase chromosomes reside in discrete, minimally overlapping chromosome territories (CTs, reviewed exhaustively by the Cremer brothers12, Figure 1a). CTs are organized such that small, gene rich chromosomes tend to pair and localize to the nuclear interior13,14. Cell type-specific radial positioning of CTs within the nucleus has also been reported15, although the extent to which CT pairing and positioning are conserved through mitosis varies depending on the cell type analyzed16,17. Individual genes are largely confined to their respective chromosome’s territory; however, in certain developmental contexts, such as Hox gene activation18 and X-chromosome inactivation19 (discussed in more detail below), gene loci have been shown to loop out or move to the outer edges of their CTs.
Localization of genomic regions to the nuclear periphery, specifically the nuclear lamina, is correlated with gene silencing across the eukaryotic kingdom20–22, and ectopic targeting of genetic loci to the nuclear envelope (NE) can induce transcriptional silencing in some cases20,23,24. NE-mediated gene silencing is thought to function in part through the interaction of heterochromatin protein 1 (HP1) with repressive protein complexes localized to the NE through interactions with the B-type lamins, the major constituents of the NE (reviewed extensively elsewhere25), as well as through histone-Lamin A interactions26. Sequestration of the transcriptional machinery away from the nuclear periphery has been suggested as an additional mechanism of NE-mediated transcriptional silencing, although it is unclear if this phenomenon is a general feature of eukaryotic genome organization27. Recent work has added a new player in targeting specific genomic regions to the NE: cKrox, the vertebrate homologue of the Drosophila GAGA factor. cKrox binds GA repeat-enriched lamina associating DNA sequences (LASs) in a cell type-specific manner, targeting these regions to the NE, although it is currently unclear how cKrox is targeted to specific LASs28.
Early studies of genome organization relied on cytological methods such as fluorescence in situ hybridization (FISH), and as such were limited in the number of gene loci that could be analyzed in a single experiment. The past decade has witnessed the introduction of molecular techniques and high-throughput mapping to the field of genome organization in the form of chromosome conformation capture (3C)-based techniques. 3C allows for a molecular view of genome organization via chemical fixation, restriction enzyme digestion, ligation of juxtaposed DNA fragments and detection of ligation events by PCR. The juxtaposition frequency of two DNA fragments in 3D space can be inferred based on the quantity of the PCR product produced upon amplifying a given ligation event29. In recent years a number of groups have expanded 3C-based molecular techniques30 to include 4C – which allows for the identification of all chromatin contacts made by a single locus with the rest of the genome31,32, 5C – enabling the identification of all pair-wise chromatin interactions for a given genomic region33, Hi-C34 and its technical variants35–37 – permitting the identification of all pairwise chromatin interactions genome-wide, and ChIA-PET38 – allowing the identification of all pairwise chromatin interactions genome-wide, which share binding of a protein of interest.
These techniques have revealed a previously unappreciated hierarchical organization of eukaryotic genomes. As expected from the CT-based structure of the genome, intra-chromosomal (cis) chromatin interactions mapped by 3C-based techniques are much more frequent than inter-chromosomal (trans) ones31,34. Apart from verifying the existence of chromosome territories and the preferential pairing of small, gene rich chromosomes, mapping of genome-wide chromatin interactions with Hi-C in human lymphoblasts34, mouse pro-B cells39, and Drosophila embryos37 demonstrated the existence of a further organizational sub-division of the genome into ‘A’ and ‘B’ compartments, where the A compartment is enriched for features of euchromatin and the B compartment is depleted of these features34. From an organizational standpoint, chromatin interactions within compartments are much more frequent than those between compartments (Figure 1b).
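The preference for cis over trans contacts described above can be illustrated with a minimal sketch. It assumes a hypothetical list of ligation events, each recorded as a pair of (chromosome, position) coordinates; the function names and the toy data are illustrative only and do not correspond to the output format of any particular Hi-C pipeline.

```python
from collections import Counter


def cis_trans_counts(contacts):
    """Count intra-chromosomal (cis) and inter-chromosomal (trans) contacts.

    `contacts` is an iterable of (chrom_a, pos_a, chrom_b, pos_b) tuples,
    a hypothetical representation of mapped ligation events.
    """
    counts = Counter({"cis": 0, "trans": 0})
    for chrom_a, _pos_a, chrom_b, _pos_b in contacts:
        counts["cis" if chrom_a == chrom_b else "trans"] += 1
    return counts


def cis_fraction(contacts):
    """Return the fraction of all contacts that are intra-chromosomal."""
    counts = cis_trans_counts(contacts)
    total = counts["cis"] + counts["trans"]
    if total == 0:
        raise ValueError("no contacts supplied")
    return counts["cis"] / total


# Toy data (invented for illustration): three cis contacts, one trans contact.
toy_contacts = [
    ("chr1", 1_000_000, "chr1", 1_250_000),
    ("chr1", 3_400_000, "chr1", 9_800_000),
    ("chr2", 500_000, "chr2", 520_000),
    ("chr1", 2_000_000, "chr7", 4_100_000),
]

print(cis_fraction(toy_contacts))
```

In real data the comparison would also have to account for chromosome length and sequencing depth, which this sketch ignores.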
The comparatively smaller size of the Drosophila genome allowed for higher resolution DNA topology mapping than was previously accomplished in mammalian genomes and led to the identification of a further organizational sub-division of the genome into linear domains with shared epigenetic features, ranging in size from 10 kilobases (kb) to 500 kb37. These domains appear to act modularly in governing global genome organization in Drosophila. Interactions of loci within a given domain are more frequent than interactions between loci in different domains. However, where inter-domain interactions occur, active domains preferentially interact with other active domains, inactive with inactive, and PcG-regulated with other domains of PcG enrichment37. Recent work with a number of different cell lines has identified analogous domains in mammalian genomes40–42, termed topological domains or topologically associating domains (TADs). TADs delimit the range within which enhancers can affect their target genes, as co-regulated enhancer-promoter groups tend to form extended clusters of interacting chromatin that align with TADs43 (Figure 1c). Additionally, the changes in gene expression upon differentiation are more likely to occur in the same direction for genes within a TAD than for genes in different TADs41. It has long been appreciated that enhancer-promoter interactions are responsible for regulating the cell type-specific expression of genes. The importance of looping between promoters and enhancers for gene regulation is highlighted by data from the ENCODE consortium showing that genes whose transcriptional start sites are contacted by an enhancer are more highly transcribed than those that are not44.
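The defining property of these domains, that loci within a domain contact each other more often than loci in different domains, can be sketched as a simple comparison of mean contact frequencies. The example assumes a hypothetical symmetric matrix of binned contact counts and a hypothetical per-bin domain assignment; it is not the domain-calling procedure used in the cited studies, and it does not correct for the decay of contact frequency with genomic distance.

```python
def mean_contacts_by_domain(matrix, domain_of_bin):
    """Compare mean contact counts within and between domains.

    `matrix` is a symmetric list-of-lists of contact counts between bins;
    `domain_of_bin[i]` is the (hypothetical) domain label of bin i.
    Only bin pairs with i < j are used, so the diagonal is ignored.
    Returns (mean_intra_domain, mean_inter_domain).
    """
    intra, inter = [], []
    n_bins = len(matrix)
    for i in range(n_bins):
        for j in range(i + 1, n_bins):
            same_domain = domain_of_bin[i] == domain_of_bin[j]
            (intra if same_domain else inter).append(matrix[i][j])
    if not intra or not inter:
        raise ValueError("need at least one intra- and one inter-domain pair")
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Toy data (invented for illustration): four bins forming two domains.
toy_matrix = [
    [0, 8, 1, 1],
    [8, 0, 2, 1],
    [1, 2, 0, 9],
    [1, 1, 9, 0],
]
toy_domains = [0, 0, 1, 1]

mean_intra, mean_inter = mean_contacts_by_domain(toy_matrix, toy_domains)
print(mean_intra, mean_inter)
```

A ratio of intra- to inter-domain means well above one is what the domain model predicts; in practice the comparison would be made between bin pairs separated by similar genomic distances.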
The locations of TAD boundaries are strongly conserved between the mouse and human genomes, particularly within syntenic regions, and the TADs of both species are largely conserved across different cell types40,41. CP190, a critical contributor to the function of various Drosophila insulator proteins through its mediation of DNA looping45, is enriched at TAD boundaries in Drosophila, and the vertebrate insulator protein CTCF46 is similarly enriched at the boundaries of a large subset of mammalian TADs40,42, suggesting an evolutionarily conserved mechanism of TAD formation by insulator proteins, similar to what has been proposed for mammalian insulators in general47. In ESCs, CTCF has been shown to mediate DNA looping events which partition the genome into physical domains, each characterized by distinct epigenetic states42, supporting a model of DNA organization wherein many TADs function as large, independently regulated DNA loops (Figure 1b,c). Although the data arguing for the role of insulator proteins in delimiting TAD boundaries are strong, it is worth noting that only a portion of insulator binding sites function as TAD boundaries in mammalian and Drosophila cells37,40, and that many enhancer-promoter interactions cross CTCF binding events in a variety of mammalian cell types44. More work will therefore be required to determine the necessary and sufficient constituents of TAD boundary delimiters.
Although interactions between TADs have been described in flies37 (see above), the extent to which mammalian TADs interact with each other, and the mechanistic logic behind these interactions, remain unclear. Distal chromatin interactions between loci many millions of bases (Mb) apart, or in trans, have been demonstrated in a number of mammalian cell types by various 3C-based studies31,32,34,42,44, but these interactions have not been examined in the context of TADs. It has been shown that long-range chromatin contacts can be cell type-specific and can occur between regions of the genome enriched for the DNA binding motif of a given transcription factor or for genes regulated by the same trans acting factors48,49, or can be mediated by binding of gene regulatory factors, as has been demonstrated for PcG-regulated distal chromatin interactions in Drosophila37,50 (Figure 1). One may speculate that co-regulated TADs are brought together in physical space in mammalian genomes as a general rule. Comprehensive analysis of long-range interactions in a well-annotated cell type such as ESCs should contribute to a better understanding of this question, as the gene regulatory networks of these cells are well understood3 and, in the case of mouse ESCs, the cells are amenable to genetic manipulations, which can be used to test causal links between linear genomic features and genome organization both in pluripotency and during the course of differentiation.
The genomes of ESCs have a number of unique characteristics that distinguish them from somatic cell genomes. The contribution of these features to the different layers of genome organization described above is currently unclear; however, they may affect the interpretation of organizational data in ESCs and thus are important to note. Among features unique to the genome of mouse ESCs are a hyper-dynamic association of chromatin proteins with the chromatin polymer51, enhanced global transcriptional activity52, a lack of condensed heterochromatin at the NE and peri-nucleolar regions53, and two active X-chromosomes in female cells. Upon differentiation, chromatin protein association becomes more stable51, wide-spread transcription of both protein coding and non-coding regions is restricted, repeat elements are silenced51,52, and heterochromatic regions of the genome compact and localize to the nuclear periphery53. At the same time, a subset of pluripotency gene loci is silenced and moves to the nuclear periphery even before germ layer restriction occurs53–56. These processes occur contemporaneously with large-scale changes in DNA replication timing54,55, silencing of an X-chromosome in female cells57, and the onset of Lamin A expression, which stabilizes histone H1 in heterochromatin and is required for the establishment of the large number of heterochromatin foci characteristic of differentiated cells58. Together, these data indicate that the dramatic changes in gene expression that occur upon pluripotent cell differentiation are accompanied by large-scale changes in genome topology.
Despite the correlation between NE localization and gene silencing in ESCs56, LaminB1/B2 double knockout ESCs and trophectoderm cells show few changes in gene expression compared to their respective wild-type cells, and those genes that do change expression levels are not bound by B-type Lamins in wild-type cells59. This suggests that LaminB does not directly regulate expression of its interacting genes in ESCs or trophectoderm cells. Alternatively, unidentified redundant mechanisms may work to maintain gene silencing at the NE in the absence of B-type lamins in these cells. Additionally, LaminB-null ESCs show none of the NE morphology defects typical of somatic cells with mutations in nuclear lamina proteins59,60. During the course of differentiation of ESCs to neural precursor cells (NPCs), many pluripotency-specific genes are re-localized to the nuclear lamina and many NPC-specific genes detach from the lamina56. In contrast to the phenotypically wild-type ESCs, upon embryonic development, LaminB1/B2-null mice display severe organogenesis and neural migration defects59. Because the nuclear lamina is implicated as a major player in somatic cell genome organization, it will be important to understand its role in regulating genome organization of ESCs, or alternatively, to determine if chromatin-NE co-localization is only required upon differentiation.
In contrast to the transcriptionally repressive nuclear envelope, in yeast, gene localization to the nuclear pore complex is associated with transcriptional activation in certain inducible systems61. In metazoans, however, some of the nucleoporins (Nups), the major constituents of the nuclear pore complex, have been implicated as regulators of gene expression through direct binding of chromatin in the nucleoplasm, mostly away from the nuclear pore62–64. Specifically, Nup133-null mice display defects in neural differentiation, and Nup133-null ESCs differentiate inefficiently along neural lineages and do not contribute to the neural tube of chimeric embryos65. Similarly, the integral membrane protein Nup210 is expressed in a cell type-specific manner and is not essential for nuclear pore function, but is required for ESC differentiation into neural progenitors as well as for myogenesis. Nup210 depletion abrogates the upregulation of differentiation-associated genes, and its overexpression facilitates the expression of essential differentiation genes. Notably, the authors argue against a role for Nup210 in tethering genes to the nuclear pore complex upon induction, as they do not see changes in NE localization of candidate Nup210-regulated genes66. It will be important to understand the differing roles of Nups when they are chromatin bound in the nucleoplasm, versus when they are part of the nuclear pore complex, as well as their role in genome organization or re-organization upon differentiation in metazoans.
The X chromosome inactivation process is a striking example of topology changes associated with differentiation. The equalization of X-linked gene expression between sexes in mammals occurs via the silencing of one of two X chromosomes upon induction of differentiation of ESCs. This process, induced by the upregulation and spreading of the non-coding RNA Xist on the future inactive X chromosome (Xi), leads to the transcriptional silencing of the majority of X-linked genes on the Xi, and the establishment of a number of repressive chromatin modifications along the Xi, including Polycomb group protein-mediated H3K27 methylation, DNA methylation, and deposition of the histone variant macroH2A57.
At the onset of X-inactivation, homologous X chromosomes co-localize, allowing for the pairing of the Xist-encoding X-inactivation centers (XIC), a process thought to be necessary for the initiation of X-inactivation on one of the two X chromosomes67–69. Following X-chromosome pairing, the Xi preferentially localizes to the NE and peri-nucleolar regions of the nucleus70, both of which are enriched for autosomal heterochromatin in differentiated cells71. This localization occurs predominantly during S phase of the cell cycle, and is dependent on Xist expression. Deletion of Xist in fibroblasts causes a re-localization of the Xi away from the nucleolus, with concomitant re-activation of a subset of genes in a small proportion of cells70.
In addition to these large scale movements of the X-chromosome upon induction of X-inactivation, Xist expression leads to the formation of an Xist RNA domain over one of the two X-chromosomes and the immediate exclusion of RNA polymerase II (RNAPII) and transcription machinery from the future Xi19. Interestingly, the exclusion of transcription machinery precedes the completion of transcriptional silencing. At the time of transcription machinery exclusion from the territory of the Xi, genes localize to the periphery of the X chromosome territory, where they can contact the transcriptional machinery. As these genes are silenced during the course of differentiation and X-inactivation, they localize to the interior of the Xi territory. Silencing and sequestration of X-linked genes into the Xi territory requires the A-repeat19, a portion of Xist necessary for transcriptional silencing72. Genes that escape X-inactivation remain localized to the periphery of the Xi territory19. A subsequent 4C study has shown that these escaping genes co-localize with other escaping genes as well as with gene loci on other chromosomes73. Conversely, silenced genes in the center of the Xi territory make few preferential interactions with other genomic regions, suggesting a random localization or restricted movement of these loci within the Xi73. Xi-specific 3D chromatin organization is partially dependent on Xist RNA coating, as Xist deletion results in an organizational state of the Xi resembling the Xa19,73.
The mechanisms regulating the dramatic re-organization of the X-chromosome upon silencing are unclear; however, SatB1/B2 are implicated in this process74. In thymocytes, the SatB1 protein is organized in a cage-like structure throughout the nucleus75, where it regulates gene expression through the anchoring of looped chromatin structures and the recruitment of chromatin modifying enzymes76,77. Upon induction of Xist expression in thymocytes and ESCs, Xist RNA accumulates in a region delimited by SatB1, and SatB1 depletion during ESC differentiation reduces the efficiency of X-inactivation74. However, MEFs derived from SatB1/B2-null embryos display normal X-inactivation78,79, calling into question an essential role for SatB1 in the organization of chromatin and gene silencing during X-inactivation.
Despite the large-scale re-organization of the X chromosome during the course of inactivation, the existence of the two TADs encompassing the XIC does not change. However, specific intra-TAD interactions are lost upon X-inactivation, suggesting a random organization of the intra-TAD space within the Xi41, similar to that shown for long-range interactions within the Xi by 4C analysis73. Alternatively, molecular ‘gluing’ of these TADs to the nuclear lamina could lead to a very limited interactome. Cell lines lacking G9a, an H3K9 methyltransferase, or Eed, an essential component of the Polycomb repressive complex 2, show no change in chromatin conformation or TAD structure within the XIC, suggesting that epigenetic modifications function downstream of TAD formation. In contrast, deletion of the TAD boundary region in the XIC, specifically between Xist and Tsix, resulted in the partial merger of neighboring TADs in mouse ESCs41, although cells lacking this TAD boundary are still capable of undergoing random X-inactivation upon differentiation80, leaving open the question of whether a specific organization of the XIC is required for X-inactivation.
Together, these data argue that X-inactivation is an essential developmental process that is associated with topology changes at various levels and may be a great model system to dissect the molecular mechanisms underlying genome organization and its dynamics during the course of differentiation. Notably, the 3D organization of the X-chromosome during Xi-reactivation events in vitro or in vivo, either in the context of somatic cell reprogramming81 or germ cell development82, has not been investigated.
Based on studies of promoter and enhancer interactions by DNA looping, it is clear that gene expression is facilitated and regulated through contacts between distal chromatin elements. The mode and mechanism of action of enhancer elements has been the subject of a large body of work over the years, and recent experiments have brought to light various molecular mechanisms underlying this phenomenon. In particular, the Cohesin complex (which, canonically, forms a ring around sister chromatids during mitosis83) has been shown to play a major role in organizing DNA topology and affecting gene regulatory processes at the level of enhancer-promoter interactions. It was initially characterized at the developmentally regulated IFNG locus in T-cells, where it is required for enhancer-promoter looping and expression of IFNG84, and at the H19/IGF2 loci in humanized mouse cells, where it is required for insulator activity85. Cohesin binding sites overlap significantly with CTCF binding sites genome wide85–88, many of which are conserved across cell types and species47, leading to a model wherein CTCF-associated Cohesin localization is largely cell type invariant89 (Figure 1b,c), potentially explaining the conservation of TAD boundaries across cell types and species, as hypothesized by Dixon et al.40,47.
In order to generate cell type-specific DNA topologies for the facilitation of specific transcriptional programs, cells appear to utilize non-CTCF mediated recruitment of Cohesin. For instance, Cohesin is co-bound with the transcription factor CEBPA in HepG2 cells and with the estrogen receptor (ER) in MCF7 cells, where Cohesin binding persists in the absence of CTCF90. In the case of MCF7 cells, Cohesin binding is particularly enriched at regions involved in ER-mediated chromatin interactions38. Mounting evidence suggests that, similar to its role during mitosis, Cohesin functions by holding functional DNA elements together in the nucleus (Figure 2), and additionally, may stabilize TF binding to highly occupied cis regulatory elements91.
A major advance in our understanding of the mechanistic underpinnings of promoter-enhancer interactions in ESCs was achieved recently through an shRNA screen for loss of Oct4 gene expression89. This screen identified numerous subunits of Mediator (a massive protein complex that regulates the activity of RNA Polymerase II92) and Cohesin subunits, as well as the Cohesin loading factor Nipbl, as regulators of Oct4 gene expression. The authors found that Cohesin and Mediator co-immunoprecipitate with each other and Nipbl in ESCs, potentially allowing Cohesin to enable ESC-specific enhancer-promoter interactions upon recruitment of Mediator to chromatin by various transcription factors (Figure 2). Unlike CTCF and Cohesin co-bound sites, Mediator and Cohesin co-bound sites are cell type-specific and often overlap with locations of pluripotency transcription factors Oct4, Sox2, and Nanog in ESCs. In MEFs, among loci where Mediator binding is different compared to ESCs, enhancer-promoter looping interactions are likewise different, as shown using 3C at a number of candidate loci89. These findings likely explain previous work demonstrating a chromatin topology that brings together a variety of DNase HS sites and co-regulated genes within the extended 150 kb Nanog locus, a topology that is lost upon Oct4 depletion93. Although it has not been explicitly demonstrated outside of ESCs, we speculate that recruitment of Mediator to binding sites occupied by cell type-specific transcription factors facilitates the recruitment of Cohesin to interphase chromatin, where it mediates enhancer-promoter interactions and potentially even more long-range chromatin contacts.
The synthesis of recently published data leads us to propose the following speculative model of mammalian genome organization (Figure 1): Within TADs37,40,41, enhancers and promoters dynamically co-localize with and co-regulate each other44,94 in a cell type-specific manner44, limited in range along the chromatin polymer by TAD boundaries41,43. These TADs, existing as topologically isolated loops42, can re-locate to various sub-nuclear compartments18,34,41,56 in response to specific developmental and gene regulatory cues, but apart from limited cases where specific genes (and likely entire TADs) loop out of their CTs, TAD localization is limited to its own CT. An important piece of information missing from this model is the mode and mechanism of preferential TAD-TAD interactions that we infer from 4C data. Due to their well-defined transcriptional networks, chromatin states, and gene expression data sets, ESCs - in pluripotency and during the course of differentiation - will be an ideal cell type for studying this question with 3C-based methodologies. In combination with a transcriptionally permissive nuclear environment52 and a lack of highly condensed heterochromatin53, future studies may also help us to understand whether an ESC-specific 3D genomic organization contributes to the developmental plasticity of ESCs.
KP is supported by the NIH (DP2OD001686 and P01 GM099134), CIRM (RN1-00564 and RB3-05080), and by the Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research at UCLA. MD is supported by the Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research at UCLA, CIRM (TG2-01169), and the UCLA graduate division.