Transcriptional regulation of human genes depends not only on promoters and nearby cis-regulatory elements, but also on distal regulatory elements such as enhancers, insulators, locus control regions, and silencing elements, which are often located far away from the genes they control. Our knowledge of human distal regulatory elements is very limited, but the last several years have seen rapid progress in the development of strategies to identify these long-range regulatory sequences throughout the human genome. Here, we review these advances, focusing on two important classes of distal regulatory sequences — enhancers and insulators.
Eight years have passed since the human genome sequence was first mapped [1,2], but much of the genome still remains unannotated today. Roughly 1.5% of the genome is devoted to protein coding, and 45% is repetitive DNA — mostly remnants of retrotransposons over eons of evolution. The remaining ~50% of the human genome was sometimes referred to as ‘junk’ sequences in the past, but increasing evidence now suggests that these noncoding sequences contain key regulatory elements responsible for the elaborate expression programs in the diverse cell types of the human body. The critical, unresolved questions are: Where are these regulatory sequences? Which specific genes do they control? What transcriptional programs are they part of? How many of these sequences are there? These questions seemed very daunting only a short time ago, but thanks to recent progress in genomic technologies and new understanding of chromatin landscape in the nuclei, the answers appear to be in sight.
Temporal and tissue-specific gene expression in mammals depends on cis-regulatory elements in the genome. These noncoding sequences can be divided into many classes depending on their regulatory functions (Figure 1) . Among the better-characterized elements are promoters, enhancers, silencers, and insulators. Transcription initiates from promoters, which serve as anchor points for the recruitment of the general transcriptional machinery [4,5]. Enhancers act to recruit a complex array of transcription factors and chromatin-modifying activities that facilitate gene transcription [6,7]. Silencing elements, on the other hand, bind proteins and/or modify chromatin structure to inhibit gene transcription [6,8]. Insulator elements provide additional regulation by preventing the spread of heterochromatin and restricting transcriptional enhancers from activating unrelated promoters . Besides these four classes of cis-regulatory sequences, there are also locus control regions that facilitate the activation of a cluster of genes through still poorly understood mechanisms. A recent comprehensive survey of 1% of the human genome, using a combination of multiple genomic and computational methods, has identified a large number of transcripts and potential regulatory elements [10•]. The results indicate that the regulatory elements are more abundant in the genome than the genes they control, they are mostly distal to the genes that they regulate, and they undergo more rapid turnover during evolution.
In this review, we focus on two types of distal regulatory elements: insulators and enhancers. Compared to promoters, which have been extensively characterized and amply reviewed elsewhere (see  and references therein), the landscape of insulators and enhancers in the human genome and their roles in cell type specific gene expression have only recently become evident. A number of new findings now point to the existence of a remarkably large number of enhancers and insulators in the genome, and indicate that these elements contribute to cell type specific gene expression in distinct ways. It has emerged that enhancers are likely of primary importance in determining cell type specific gene expression. On the other hand, the activities of insulators across the genome are largely cell type invariant. These results point to key regulatory pathways that determine lineage specification and cellular identity, and have broad impacts on understanding mechanisms of human development and molecular basis of many human diseases.
Insulator elements affect gene expression by preventing the spread of heterochromatin (barrier function) and/or restricting transcriptional enhancers from activation of unrelated promoters (enhancer blocking) . In vertebrates, enhancer-blocking function of insulators requires association with the CCCTC-binding factor (CTCF), a protein with 11 zinc-finger motifs in its DNA binding domain that is capable of recognizing long and diverse nucleotide sequences [13–15]. To identify CTCF binding sites in the human genome, Kim et al. performed ChIP-chip analysis using genome-tiling microarrays, and determined a total of 13 804 regions bound by CTCF [16•]. They defined a consensus motif that is shared by most of these CTCF binding sequences, which is capable of interacting with CTCF in vitro. Therefore, CTCF’s binding in vivo is to a great extent mediated via this consensus motif. Interestingly, using bioinformatics approaches, Xie et al. independently identified the same motif as one of the most abundant and evolutionarily conserved long motifs in the human genome, suggesting a strong evolutionary pressure to maintain the insulator function and a general role for the CTCF protein in this process .
Several further studies using ChIP-based approaches have identified CTCF binding sites in other cell types including CD4+ T cells, HeLa cells, and activated T cells [18,19,20••]. Remarkably, the majority of CTCF binding sites identified in the fibroblasts and CD4+ T cells were found to be in common, implying that most insulator elements are not specific to individual cell types. This notion is further supported by several additional studies. Crawford and colleagues used a high throughput method (DNase-chip) to identify sites sensitive to DNaseI treatment in isolated nuclei from several cell types. They showed that the DNaseI hypersensitive sites (DHS) shared among different cell types are highly enriched for the CTCF motif, and overlap with previously identified CTCF binding sites in IMR90 cells. Further functional studies confirmed that many of these cell type invariant DHS are indeed insulators. More recently, Heintzman et al. investigated the binding of CTCF in 1% of the human genome (ENCODE regions) in five diverse cell types [22•]. They found over 600 CTCF binding sites, and demonstrated that most of them are actually associated with CTCF in each of the cell types investigated. Therefore, most CTCF binding insulator elements appear to function in a way that is independent of cell type.
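The comparison of binding sites between cell types described above comes down to an interval-overlap count. The following is a minimal sketch of that calculation, assuming each site is given as a hypothetical (chromosome, start, end) tuple with half-open coordinates. The function names are illustrative and are not taken from any of the cited studies, which may have used different overlap criteria.

```python
def overlaps(site_a, site_b):
    """Return True if two (chrom, start, end) half-open intervals overlap."""
    return (
        site_a[0] == site_b[0]
        and site_a[1] < site_b[2]
        and site_b[1] < site_a[2]
    )


def shared_fraction(sites_a, sites_b):
    """Fraction of sites in sites_a that overlap at least one site in sites_b.

    A simple quadratic scan; adequate for illustration, not for
    genome-scale data.
    """
    if not sites_a:
        raise ValueError("sites_a must not be empty")
    shared = sum(
        1 for a in sites_a if any(overlaps(a, b) for b in sites_b)
    )
    return shared / len(sites_a)


# Toy coordinates, invented for illustration only.
fibroblast_sites = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
t_cell_sites = [("chr1", 150, 250), ("chr2", 100, 200), ("chr3", 1, 2)]
fraction_shared = shared_fraction(fibroblast_sites, t_cell_sites)
```

With real data, a high shared fraction in both directions would be consistent with the cell type invariant binding reported above.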
How does CTCF protein mediate enhancer-blocking function? Several research groups using a chromatin conformation capture (3C) approach have pointed out a role for CTCF in mediating interchromosomal and intrachromosomal interactions [23–25]. This finding is further supported by recent studies that link CTCF to the cohesin complex ( and references therein). Using the ChIP-chip method, Wendt et al. found that over 90% of CTCF binding sites are also occupied by the cohesin complex in a diverse set of human cells . Since the cohesin complex has a well documented role in keeping sister chromatids together during DNA replication and before mitosis, it is conceivable that the cohesin complex may play a similar structural role in stabilizing long-range interactions between distant chromosomal regions or between DNA on two different chromosomes. Indeed, this is shown to be the case for several CTCF binding sites located upstream and downstream of the γ-interferon locus .
In order for transcription to start, the transcription machinery must first overcome the negative effects of chromatin, the highly ordered compact form in which native DNA exists inside every cell . The fundamental structural units of chromatin are nucleosomes, which consist of 146 base pairs of DNA wrapped around a single histone octamer composed of two histone H2A–H2B dimers and one H3–H4 histone tetramer . The compaction of DNA into chromatin prevents the protein–DNA interactions required for transcription, unless these chromatin structures are decondensed and altered in ways to make the underlying DNA sequence available to transcription factors and RNA polymerase II (RNAPII).
Enhancer elements can be defined as DNA sequences that serve to recruit transcription factors which promote the decondensation of repressed chromatin and/or facilitate the assembly of the transcription machinery at gene promoters . The human genome encodes approximately 1700–1900 sequence-specific transcription factors [31••]. These proteins usually contain two distinct domains, one responsible for the recognition of specific DNA sequences (DNA-binding domain), the other carrying out a regulatory function (regulatory domain). One primary function of the regulatory domain is to recruit cofactors that carry chromatin-remodeling activities or can directly interact with the RNAPII transcriptional machinery [32,33].
Several classes of protein complexes are recruited to specific enhancer elements to remodel the local chromatin structures  (Figure 2). One class of proteins, represented by the SWI/SNF complexes, modifies the chromatin structure noncovalently in an ATP-dependent fashion . These proteins, once recruited to enhancer elements, can reposition specific nucleosomes along the DNA. Consequently, core promoters may be exposed to allow transcription to start [35,36]. Alternatively, key transcription factor target sites may also be exposed to allow the assembly of functional enhancer complexes. Another class of cofactors remodels chromatin structure by introducing covalent modifications to the N-terminal tails of histones [8,37]. One of the well-known modifications involves the acetylation of histones H3 and H4 at the N-terminal domains. Such modifications may directly induce the decondensation of packed nucleosomes, or serve as a platform for the recruitment of additional chromatin-remodeling factors. A number of histone acetyl-transferases (HATs) have been identified that catalyze the acetylation of histones at specific residues. The protein complexes that catalyze histone acetylation include PCAF, CBP, p300, GCN5, TRRAP, and others, which are also known to function as cofactors for many transcriptional activators .
The third class of cofactors that can be recruited to enhancer elements includes so-called mediator complexes . These proteins facilitate transcription by serving as interfaces between sequence-specific transcription factors and the general transcription apparatus in eukaryotes. Transcriptional coactivators in this category, including MED1, p160, Asc2, and others, have been shown to be recruited to specific enhancer sequences to promote the assembly of functional transcription initiation complexes [40,41].
One type of experimental evidence to suggest a DNA sequence as an enhancer element is its association with an activator protein that binds to specific DNA sequences. Although this strategy has been successfully carried out for a number of transcription factors in a variety of cell types (for example, see Refs. [10,42]), it is not feasible for the determination of all enhancer elements, because of the large number of transcription factors encoded by the human genome and the number of cell types needed. Further, the mere binding of a sequence-specific DNA binding protein could lead to activation, repression, or no transcriptional consequence. Therefore, an alternative approach has been used to determine the binding sites of coactivator proteins, such as p300, binding of which is more closely related to transcriptional activation. Using this strategy, Visel et al. recently determined the p300-binding sites in the mouse genome in forebrain, midbrain, and limb of E11.5 mouse embryos [43••]. Between 500 and 2500 binding sites were identified in each of the embryonic tissues. That these elements function as tissue-specific enhancers was confirmed by mouse transgenic reporter assays, which showed that over 80% of the tested elements drive reporter gene expression in the tissue where p300-binding was detected. Because virtually all known transcription factors function by recruiting transcriptional coregulators, and because the number of chromatin-remodeling complexes or mediators in the genome is much smaller than the number of sequence-specific transcription factors, the strategy of using cofactors as one ‘marker’ for enhancer elements is a more practical approach to identify all enhancer elements in the genome. The main hurdle, however, is the availability of suitable antibodies against each of the known coactivator proteins.
Another strategy to experimentally determine enhancers stems from the initial observation that distal p300-binding sites are associated with a unique combination of chromatin modifications that involves, among others, the presence of mono-methylated histone H3 lysine 4 (H3K4me1) and the absence of the tri-methylated form of this lysine (H3K4me3) [22,44••] (Figure 3). Indeed, when this pattern of chromatin modification signature was used to search for additional similar genomic regions in 1% sampling of the human genome, approximately 400 putative enhancers were identified that included 85% of the p300-binding sites and ~300 other sequences. Importantly, the majority of these putative enhancers are associated with DNaseI hypersensitivity, bound by coactivators p300 or MED1, and associated with additional ‘active’ chromatin marks such as histone acetylation, making them likely enhancers. When tested in reporter assays, the predicted enhancers can indeed support transcriptional activation, providing preliminary evidence for their function. With the enhancer-specific chromatin signatures, we have generated a list of more than 90 000 potential enhancers in four types of human cells (; Hawkins et al., unpublished data). To date, a total of 26 predicted enhancers have been tested by reporter assay in transient transfection in vitro, and over 80% (21 out of 26) of the tested fragments were shown to possess enhancer activity, supporting the validity of this enhancer-finding method (; Hawkins et al., unpublished data). It is worth noting that although many chromatin modification marks are found at enhancers and can be used to predict such elements in the genome, Heintzman et al. found that with the use of profiles of just two chromatin marks — H3K4me1 and H3K4me3 — one can achieve excellent specificity and sensitivity [22,44••]. 
Additionally, this minimal chromatin signature has been used to identify enhancers in a variety of different cell types, in both humans and mice (Ren et al., unpublished data).
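The two-mark signature can be illustrated with a simple filter: keep regions where H3K4me1 is enriched and H3K4me3 is not. The sketch below is a deliberately simplified threshold rule and is not the prediction procedure used in the cited studies. The `Region` fields, the enrichment scale, and both cutoffs are hypothetical assumptions. The last function reproduces the validation arithmetic quoted above (21 of 26 tested fragments).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int
    end: int
    h3k4me1: float  # hypothetical enrichment score, e.g. log2(ChIP/input)
    h3k4me3: float  # same hypothetical scale


def is_putative_enhancer(region, me1_min=1.0, me3_max=0.5):
    """Signature: H3K4me1 present and H3K4me3 absent (assumed cutoffs)."""
    return region.h3k4me1 >= me1_min and region.h3k4me3 <= me3_max


def predict_enhancers(regions, me1_min=1.0, me3_max=0.5):
    """Return the regions that match the two-mark signature."""
    return [r for r in regions if is_putative_enhancer(r, me1_min, me3_max)]


def validation_rate(n_active, n_tested):
    """Fraction of tested fragments that showed enhancer activity."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return n_active / n_tested


# Toy regions, invented for illustration only.
regions = [
    Region("chr1", 1000, 2000, h3k4me1=2.3, h3k4me3=0.1),  # enhancer-like
    Region("chr1", 5000, 6000, h3k4me1=1.5, h3k4me3=3.0),  # promoter-like
    Region("chr1", 9000, 9500, h3k4me1=0.2, h3k4me3=0.0),  # neither mark
]
predicted = predict_enhancers(regions)
rate = validation_rate(21, 26)  # about 0.81, i.e. "over 80%"
```

A promoter-like region carrying both marks is excluded by the H3K4me3 cutoff, which is why the pair of marks is more informative than H3K4me1 alone.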
One of the most prominent features displayed by enhancers, compared to that of promoters and insulator elements, is their cell type specific activities. While previous works on classical enhancers such as those in beta-globin genes have suggested such properties of enhancers, recent genome-wide studies have confirmed this on a global scale. Among the p300-binding sites identified in three embryonic tissues, the majority are occupied by the coactivator in only one of the tissues, and when tested in mouse transgenic assays exhibited tissue-specific enhancer activities . Similarly, p300-binding sites found in three human cell lines demonstrated highly cell type specific occupancy by the factor [22•]. Furthermore, the enhancers identified in different cell types are associated with cell type dependent chromatin modification patterns. The cell type specific presence of chromatin marks, such as H3K4me1, at enhancers is closely correlated with cell type specific expression of the putative targets of these enhancers. These findings indicate that enhancers are more dynamically regulated in different cell types, suggesting that these elements are of primary importance in driving cell type specific gene expression.
Computational analysis of the putative enhancers discovered in the human genome has revealed a number of over-represented DNA motifs, with some matching the recognition sites of known transcription factors [22•]. Interestingly, of the 41 motifs identified in these enhancer sequences, over 90% appear to be unique to enhancers, and exhibit no enrichment at promoters, suggesting that some transcription factors may function exclusively through these distal cis-regulatory elements. Indeed, recent investigations into the genomic binding sites of 14 sequence-specific transcription factors in the mouse embryonic stem cells revealed two classes of in vivo binding sites by these factors — nearly half of them, including Oct4, Sox2, Nanog, appear to bind more preferentially to distal regulatory sequences, while the rest, including cMyc, prefer to occupy promoters .
One of the challenges in characterizing enhancer function is determining which genes they control. The issue arises because these distal cis-regulatory elements are frequently located tens or hundreds of kilobases away from their target genes, and may lie within the gene body of nearby genes. Further complicating the issue, there has also been a report that enhancers could activate target genes located on different chromosomes.
To resolve the target genes of enhancers, researchers frequently assign the enhancers to the nearest genes as a first order approximation [43••]. While in most cases, such assignment would sufficiently explain cell type specific expression of genes, there has not been any report on the rate of false positives by this strategy. A variation of the above strategy is to assign enhancers to the genes located within the same genomic segments bounded by the enhancer-blocking insulator elements, which can be experimentally determined as CTCF binding sites [22•]. This strategy appears to capture nicely the correlation between chromatin modification patterns at enhancers and the differential gene expression at the presumed target genes. Consistent with this model, upon depletion of CTCF and presumably loss of enhancer-blocking function by the insulator elements, a significant number of genes located near previously shielded enhancers become activated (Figure 4). While this strategy is conceptually simple, the limitation has been a lack of understanding of the functional mechanism for enhancer blocking by insulators. As discussed above, an emerging consensus is that CTCF binding sites act to establish long-range chromosomal interactions that would lead to the formation of local topographical constraints. Depending on the way such topographical constraints are formed, the enhancer/promoter interactions may be restricted in different ways, and therefore different assignments may be made for the enhancers.
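The two assignment strategies discussed above can be compared on a toy example. This is a minimal sketch restricted to a single chromosome, assuming genes are represented by hypothetical transcription start site positions and insulators by single CTCF site positions. All names and coordinates are illustrative, and the domain rule simply treats every CTCF site as a boundary, which is a simplification of the topological constraints described in the text.

```python
from bisect import bisect_right


def nearest_gene(enhancer_pos, genes):
    """Assign the enhancer to the gene whose start site is closest.

    genes: dict mapping gene name -> start site position (one chromosome).
    """
    return min(genes, key=lambda name: abs(genes[name] - enhancer_pos))


def domain_index(pos, sorted_boundaries):
    """Index of the insulator-bounded segment that contains pos."""
    return bisect_right(sorted_boundaries, pos)


def genes_in_same_domain(enhancer_pos, genes, insulators):
    """Assign the enhancer to every gene in the same insulator-bounded segment."""
    boundaries = sorted(insulators)
    target = domain_index(enhancer_pos, boundaries)
    return sorted(
        name
        for name, tss in genes.items()
        if domain_index(tss, boundaries) == target
    )


# Toy coordinates, invented for illustration only.
genes = {"geneA": 1000, "geneB": 5000, "geneC": 9000}
insulators = [4000, 8000]
enhancer = 3900

by_distance = nearest_gene(enhancer, genes)
by_domain = genes_in_same_domain(enhancer, genes, insulators)
```

In this example the enhancer at 3900 is closest to geneB, but an insulator at 4000 lies between them, so the domain rule assigns it to geneA instead. Cases like this are where the two strategies disagree.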
In principle, a more direct approach for assigning enhancers to target genes is to experimentally determine the long-range chromosomal interactions between enhancers and target promoters. This can be accomplished by the Chromosome Conformation Capture (3C) method or its high throughput variations including 4C (circular 3C) or 5C (3C carbon copy) [45,49–51]. This strategy is based on the observation that active enhancers are brought into close proximity to target promoters through DNA looping. In many cases, this method has helped to define unexpected target genes, such as those located on different chromosomes. The future will see more enhancer/target relationships defined using this strategy.
Although great strides have been made in recent years in unraveling the distal regulatory sequences in the human genome, it should be recognized that a long journey is still ahead of us. We are only beginning to comprehend the landscape of enhancers in the human genome, and it is almost certain that more enhancers are yet to be identified. Since these elements are active in a cell type specific manner, their identification will require investigation of many more additional cell types than current efforts have engaged. Based on the limited sampling, however, the number of enhancers in the human genome is likely on the order of hundreds of thousands, or even millions, making them the most abundant class of cis-regulatory sequences.
With genome-wide identification of enhancer sequences in different cell types, tissues, and developmental stages, the next important question is which transcription factors act through these elements to mediate transcriptional activation of the target genes. Motif analysis can lend insights into the potential transcription factors involved. However, a bottleneck has been our limited knowledge of the consensus DNA binding sites recognized by the over 1700 transcription factors encoded by the human genome [31••]. So far, such information is available for just a few hundred transcription factors, described in databases such as TRANSFAC, JASPAR, and UniPROBE [52–54]. Fortunately, investigations using high throughput methods, including Protein Binding Microarrays and one-hybrid methods, are quickly enriching our knowledge base and will likely provide the full spectrum of DNA binding for most human transcription factors in the near future [55–60].
In parallel, a consortium of investigators is now generating genome-wide data for various transcription factors using ChIP-seq in many different human cell lines [10•]. This will surely one day lead to systematic determination of functional sequences in the human genome, and provide a rich resource for understanding gene regulatory sequences.
A fundamental question remains to be resolved regarding the molecular mechanisms by which long-range elements such as enhancers and insulator elements act to regulate gene expression. A commonly accepted model is that DNA sequences between enhancers and promoters form loops to allow the distal regulatory sequences to interact directly with the promoters  (Figure 5). In this model, the function of enhancers is to facilitate the assembly of the RNA polymerase complex at the promoters, or subsequent steps of RNA polymerase transcription cycle. Although this model is supported by strong evidence over the years, important details remain to be elucidated. How does the loop form? What mechanism allows the enhancers to be positioned to the appropriate target promoters? Which step(s) of the RNA polymerase transcription process do enhancers act on?
Another theory, not mutually exclusive with the looping model, is the RNA polymerase factory model. In this model, synthesis of RNA transcripts occurs in nuclear compartments that contain stores of RNA polymerase and accessory factors. Active genes are dynamically shuttled in and out of the ‘factories’ during transcription. Since there are a limited number of such factories, ranging from several hundred to several thousand in a single nucleus, each factory needs to simultaneously transcribe multiple genes. This model therefore predicts colocalization of coordinately regulated genes in the nucleus, and the role of the enhancers is thus to facilitate the movement of genes into the RNA polymerase factories. Indeed, recent studies by Hu and colleagues showed that nuclear receptors induce the rapid movement of enhancer sequences to specific nuclear speckles, and the new interchromosomal interactions formed this way are required for the induction of nuclear receptor target genes.
Future work will also help resolve other questions regarding the mechanisms of chromatin signatures at enhancers [44••]. Are the chromatin signatures dependent on transcription factor binding? Which enzymes are responsible for depositing and maintaining these chromatin modifications? Are the modifications necessary for enhancer function? If so, how do they affect transcriptional activation? Answers to these questions promise to shed light on the mysterious mechanisms of enhancer function.
We apologize to those whose work is not referenced here owing to space limitations. We thank Gary Hon for comments on the manuscript. The work is supported by funds from the Ludwig Institute for Cancer Research, the National Institutes of Health, and the California Institute for Regenerative Medicine.
This review comes from a themed issue on Genomes and evolution
Papers of particular interest, published within the period of review, have been highlighted as:
• of special interest
•• of outstanding interest