Structural-functional domains have long been hypothesized to occur in eukaryotic chromosomes, but their existence still remains controversial. Here, we discuss the current state of studies of 3D genome folding and the relation of this folding to the functional organization of the genome.
The relation of the structural organization of the eukaryotic genome to its functional organization has been discussed ever since the discovery of the preferential sensitivity of active genes to nucleases. Initially, it was proposed that the eukaryotic genome was composed of similarly organized domains that could either be active or repressed and that the transcriptional status of these domains was controlled via changes in the mode of chromatin packaging within such domains.1 However, genome-wide analyses of transcription and epigenetic profiles demonstrated that the relation between transcription and modes of chromatin packaging could not be described by this simple assumption. Several types of both active and repressive chromatin have been identified that are not segregated in continuous non-overlapping domains on a linear chromatin fiber. Similarly, functional domains of the genome appear to be discontinuous as they include genes or gene clusters along with remote regulatory elements that are frequently located far away from their target genes (for a review see refs.2,3). Nevertheless, this does not mean that there are no structural-functional domains in the eukaryotic genome. Recent evidence suggests that these domains may be assembled from different genomic segments that are gathered together in the 3D nuclear space.4-6 Here, we review the current progress of studies of the structural-functional organization of the eukaryotic genome.
The current progress in studies of the eukaryotic genome 3D organization was made possible by the development of chromosome conformation capture (3C) and related techniques.7 Using the Hi-C approach, Lieberman-Aiden and colleagues have demonstrated that in mammals, transcriptionally active and inactive regions of the genome are spatially segregated within the 3D nuclear space, forming A and B compartments, respectively.8 Both compartments appear to be partitioned into megabase- and sub-megabase-scale topologically-associating (self-interacting) domains (TADs).8-11 TADs have also been described in Drosophila, where they are much smaller than in mammals.12,13 Notably, in both mammals and Drosophila, TADs are hierarchical.14,15 Using various annotation approaches, one can annotate TADs of different sizes: loose mega-TADs, “normal” TADs and compact sub-TADs14-16 (Fig. 1A). The patterns of chromosome partitioning into TADs were initially reported to be cell lineage-independent.9 However, the generality of this observation has been questioned.12,15 Interestingly, our recent observations suggest that changes in TAD profiles may correlate with changes in transcriptional status.15
Using computer simulations, we have demonstrated that simple physical laws direct the assembly of TADs.15 The ability of nucleosomal arrays to form a compact chromatin mass is an intrinsic feature of these arrays, resulting from the electrostatic interactions between nucleosomal particles (Fig. 1B). However, questions remain concerning what restricts the assembly of TADs to megabase (sub-megabase) size chromatin domains. In the simplest case, the inter-TADs are formed due to the presence of active genes. Active chromatin simply cannot be packaged into compact tertiary structures due to a high level of histone acetylation.15 Acetylation decreases the positive charge of histone tails and thus suppresses their ability to interact with the acidic patch on a neighboring nucleosome (Fig. 1B). Our computer modeling has shown that the stickiness of non-acetylated (inactive) nucleosomes and the absence of stickiness for acetylated (active) nucleosomes are sufficient for chromatin partitioning into TADs and inter-TADs15 (Fig. 1C). It appears that in Drosophila, the TAD profiles are primarily defined by the distribution of clustered housekeeping genes.15 In mammals, the situation is more complex and diverse. There is strong evidence for the participation of CTCF in the formation of TAD borders.9, 11, 17 This may be a consequence of separate TAD assembly by self-compaction of extruded DNA loops.18-20 The “loop extrusion” model postulates that TADs in mammalian genomes could be formed due to the energy-dependent pulling of the chromatin fiber through the “extrusion machine” resulting in the loop formation between convergent CTCF-binding sites (CBS) within this machine and folding of the looped fragment into a TAD (Fig. 1D). The exact nature of the extrusion machine is not known. The most popular candidates are cohesin and condensin complexes. Cohesin/condensin rings may topologically constrain chromatin fiber extrusion. 
However, cohesin and condensin complexes do not possess intrinsic motor activities. At the same time, there are several well-characterized molecular motors within the cell nucleus, the foremost being RNA Polymerase II (PolII). The original version of the loop extrusion model of mammalian TAD formation proposes that extrusion begins somewhere at a single point within the genomic region that will be folded into a TAD18 (Fig. 1D). Meanwhile, there are other possibilities. For example, two chromatin extrusion machines could initiate chromatin pulling independently at two convergent CBSs at the flanks of the TAD being assembled and elongate toward each other (Fig. 1E). In this case, two collided extrusion machines will form the loop base, and preferential looping between convergent CBSs could be explained by the supposition that the CBS orientation determines the extrusion direction. If PolII is a motor for the extrusion machine (as postulated by the transcription factories model21), the CTCF-cohesin complex attached to the CBS may serve as an anchor for PolII. The first obvious prediction from this scenario is that components of the transcription machinery should be distributed asymmetrically around the CBS, with the peaks shifted downstream of the forward CBS and upstream of the reverse CBS. Indeed, it has been recently observed that in chromatin loops formed by convergent CBS (convergent loops), forward and reverse CBSs are strongly associated with forward and reverse TSS, respectively, and TSS peaks are preferentially located downstream of the forward CBS and upstream of the reverse CBS.17 One may note that the above-proposed interpretation of the extrusion model of TAD formation makes this model similar to the facilitated tracking model of enhancer-promoter communication. 
Indeed, this model postulates that PolII bound to the enhancer is able to bring it to the promoter via looping of an intervening DNA fragment that should occur in the course of transcription elongation.22 Whatever the mechanism of DNA loop extrusion, it should be stressed that the model itself is currently based on circumstantial evidence and lacks direct proof.
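The single-point variant of the loop extrusion model described above can be illustrated with a deliberately simplified sketch. It assumes a one-dimensional fiber with integer coordinates, a two-sided extruder that always reaches its stopping points, and CBSs that block with certainty; the function name `extrude_loop` and the `(position, orientation)` encoding of a CBS are hypothetical and serve only this illustration. The left leg of the extruder halts at the nearest forward CBS, the right leg at the nearest reverse CBS, so the resulting loop is always anchored by a convergent pair.

```python
from typing import List, Tuple

# Hypothetical encoding: a CTCF-binding site (CBS) is (position, orientation),
# where '+' is a forward site and '-' is a reverse site.
CBS = Tuple[int, str]


def extrude_loop(load_pos: int, sites: List[CBS], fiber_len: int) -> Tuple[int, int]:
    """Return (left_anchor, right_anchor) for an extruder loaded at load_pos.

    Toy assumptions: the left leg moves toward lower coordinates and is
    blocked only by forward ('+') sites; the right leg moves toward higher
    coordinates and is blocked only by reverse ('-') sites. If no blocking
    site exists, the leg runs to the end of the fiber.
    """
    left_stops = [pos for pos, orient in sites if orient == '+' and pos <= load_pos]
    right_stops = [pos for pos, orient in sites if orient == '-' and pos >= load_pos]
    left_anchor = max(left_stops) if left_stops else 0
    right_anchor = min(right_stops) if right_stops else fiber_len - 1
    return left_anchor, right_anchor


# Illustrative (made-up) arrangement of sites on a fiber of 100 units.
example_sites: List[CBS] = [(10, '+'), (30, '-'), (40, '+'), (80, '-')]

for start in (20, 35, 60):
    print(start, extrude_loop(start, example_sites, fiber_len=100))
```

In this toy setting, an extruder loaded between a reverse site and the next forward site passes both of them and produces a larger loop, which is one way to picture how nested domains could arise. The sketch says nothing about the identity of the motor or about the kinetics of extrusion.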
Some authors have proposed that the tightening of remote sections of chromatin fibers by various sets of protein bridges is a driving force for the formation of separate TADs.14,23 In this model, different parts of the genome contain binding sites for different sets of communicator proteins (CPs), and CPs that belong to the same set are able to interact with one another but not with CPs from another set. The binding of CPs to a chromatin fiber establishes protein-protein interactions that lead to the formation of dense chromatin globules maintained by internal protein bridges. In practice, however, this model requires the presence of an extremely large number (comparable to the number of TADs within the entire genome) of non-overlapping sets of CPs and therefore seems not to be valid. It cannot be excluded, however, that at least some genomic segments (such as extended regions of active chromatin) are folded into “active TADs”12,13,15 or TAD-like structures due to stochastically or regularly organized networks of interactions between CPs. This scenario deserves further investigation.
Notably, the above-mentioned mechanisms are not mutually exclusive. The extruded chromatin loops held by CTCF may be self-compacted via stochastic internucleosomal interactions, and the structure of stochastically assembled TADs in the Drosophila genome could be modulated by CP-mediated loops between enhancers and promoters.24 Indeed, folded chromatin fibers are rather flexible and are likely to undergo continuous spatial reconfigurations (options sorting). The establishment of links between remote genomic elements, for example, the assembly of active chromatin hubs, can restrict the spatial mobility of the chromatin fiber giving preference to a particular configuration (for an extended discussion, see ref.3).
The term “functional genomic domain” is poorly defined. One can consider replication and transcription domains that, in some cases, may coincide.6 Here, we focus our discussion on functional domains of the genome related to transcription regulation. These domains may be defined as areas of the genome where the expression of all or most of the genes present is controlled by the same set of regulatory elements. The recently described regulatory archipelagos4 and regulatory domains25 fit well within this definition. They represent large (up to one megabase) segments of the genome containing non-related genes that demonstrate a similar tissue-specificity of expression. Reporter genes integrated into such a domain under the control of a minimal promoter demonstrate tissue-specific expression profiles typical for the domain as a whole.25
Notably, current evidence suggests that the tissue-specificity of gene expression is determined by enhancers rather than by promoters. Each tissue-specific gene is likely to be controlled by several enhancers, and, vice versa, one enhancer may influence the expression of more than one gene.26,27 Although the mechanisms of promoter activation by enhancers still remain unclear, the majority of current models suggest that enhancers should reside in spatial proximity to their target promoters.2 Consequently, the network of spatial genomic contacts should reflect the network of functional links between promoters and remote regulatory elements. Such spatially functional networks have indeed been observed.27,28 The possibility of establishing long-distance enhancer-promoter contacts is obviously restricted by the partitioning of chromosomes into TADs. For this reason, the above-described regulatory domains (regulatory archipelagos) generally colocalize with TADs.25 A restriction of the spatial contacts within a TAD appears to be biologically relevant as the fusion of TADs results in dramatic changes of promoter-enhancer interactions and the deregulation of the involved genes.29 Notably, current models of eukaryotic genome functional domains do not assume that these domains are continuous on a chromatin fiber. Rather, they are thought to be assembled only in 3D space due to the juxtaposition of remote regulatory elements. Modern genomics typically takes into consideration only distances along a DNA chain, and the expansion of various types of chromatin domains is commonly assumed to occur along a linear chromatin fiber. However, similar spreading mechanisms (modifications of histones in neighboring nucleosomes and recruiting of modifying complexes to these nucleosomes) may also operate in a 3D space when nucleosomes are located close enough to each other.30 In this case, the expansion of chromatin domains will be restricted by TADs or sub-TADs.
Recent evidence suggests that taking the 3D organization of the genome into account provides a better explanation for the observed functional links between remote genomic elements, the coordinated chromatin state changes within separated regulatory modules and the existence of various associations identified by genome-wide association studies (GWAS) in non-coding areas of the genome.31
The results of our recent study suggest that, at least in Drosophila, TADs are primarily composed of inactive chromatin.15 TADs are assembled as a result of a stochastic interaction of non-acetylated nucleosomes and may have functions in the storage of repressed genes. However, after having appeared over the course of evolution, this “storage system” was adapted to serve other functions. In Drosophila, approximately 15% of TADs are composed of active chromatin and contain transcribed genes.15 These TADs are partially decompacted, which is not surprising as nucleosomes of active chromatin rarely establish contacts with each other. It is interesting to consider why these TADs are not converted into loosely packed inter-TADs, as reported in some isolated cases.15 We believe that transcriptionally active TADs are linked together by bridges between promoters and enhancers (possibly between communicator elements, such as insulators, that are located close to promoters and enhancers), thus allowing the gene storage blocks to become filled with newer content. TADs act as regulatory domains where the spatial contacts between enhancers and promoters of tissue-specific genes are established. The advantage of partitioning the genome into TADs is that the area the enhancer should explore to find a target promoter is much smaller than the whole chromosome territory or even the whole nucleus (Fig. 2). Keeping in mind that the repositioning of different parts of a chromatin fiber occurs locally in a stochastic manner32 so that a given genomic locus can only explore 0.5–0.8 μm in 1 hour,33 this may constitute an important advantage. On the other hand, TADs are large enough to harbor various regulatory elements that allow for the assembly of alternate regulatory networks. 
Surprisingly, this leads back to the main assumption of the classical domain hypothesis of eukaryotic genome organization.1 Rather than operating within the whole genome, regulatory networks are assembled in relatively small and apparently similarly organized structural-functional genomic domains. In mammals, the regulatory role of TADs is likely to become more important. Along with this, the structural organization of TADs becomes more complex and diversified.11,16,34 The functional importance of intra-TAD interactions secured by CTCF and cohesin also increases.11,18,35 Still, it is likely that parts of chromatin fibers that are not involved in long-distance intra-TAD interactions self-assemble into compact tertiary structures due to simple electrostatic interactions between nucleosomes.
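The search-space argument made above (a locus exploring only 0.5–0.8 μm in 1 hour) can be put into rough numbers. The following back-of-the-envelope sketch assumes, purely for illustration, that the explored region is a sphere whose radius equals the quoted distance and that the nucleus is a sphere about 10 μm in diameter (the average figure used in this article); the function name `explored_fraction` is hypothetical.

```python
def explored_fraction(explore_radius_um: float, nucleus_diameter_um: float = 10.0) -> float:
    """Fraction of the nuclear volume covered by a sphere of the given radius.

    Both the explored region and the nucleus are idealized as spheres, so the
    ratio of volumes reduces to the cube of the ratio of radii.
    """
    nucleus_radius_um = nucleus_diameter_um / 2.0
    return (explore_radius_um / nucleus_radius_um) ** 3


# Distances taken from the text: 0.5 and 0.8 micrometers explored in 1 hour.
for radius_um in (0.5, 0.8):
    fraction = explored_fraction(radius_um)
    print(f"explored radius {radius_um} um: {fraction:.2%} of nuclear volume")
```

Under these assumptions, a locus samples well under 1% of the nuclear volume per hour, which illustrates why confining an enhancer and its target promoters to the same TAD could shorten the search considerably.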
It is becoming increasingly evident that the 3D organization of the eukaryotic genome is tightly linked to genomic functional activity.3,5,6,11 However, it is still not clear whether the 3D organization of the genome simply reflects the distribution of various marks along linear fibers of chromatin or the distribution of active genes on DNA, or whether this organization contributes to the regulation of gene activity. Essentially, it remains unclear whether the 3D organization of the genome is established passively or actively. Here, we examine evidence for the cooperative action of both processes. The packaging of the long eukaryotic genome (approximately 2 meters of DNA in mammals) within a relatively small nucleus (10 μm in diameter on average) constitutes a perplexing problem. This task is further complicated by the necessity to keep active genes accessible for trans-acting factors. The TAD organization of the genome solves this problem by placing permanently transcribed genes into less densely packed loose TADs or unfolded inter-TADs. The primary function of TADs appears to be in the storage of inactive parts of the genome.15 However, a partitioning of the genome into spatially isolated blocks (TADs) allows for the evolution of these domains into relatively independent functional genomic blocks with partially autonomous regulatory mechanisms.36 These structural-functional domains are evolutionarily conserved, and their disruption may cause a severe deregulation of numerous genes, resulting in the development of various diseases.29,37 It is clear that studies of eukaryotic genome structural-functional domains are important for both basic science and practical applications in areas such as gene therapy and biotechnology.
No potential conflicts of interest were disclosed.
This work was supported by the Russian Science Foundation (RSF) [grant number 14-24-00022].