When placed between an enhancer and a promoter, certain DNA sequence elements inhibit enhancer-stimulated gene expression. The best characterized of these enhancer blocking insulators, gypsy in Drosophila and the CTCF-binding element in vertebrates and flies, stabilize contacts between distant genomic regulatory sites, leading to the formation of loop domains. New results show that CTCF mediates long-range contacts in the mouse β-globin locus and at the Igf2/H19 imprinted locus. Recently described active chromatin hubs and transcription factories also involve long-range interactions; it is likely that CTCF interferes with their formation when acting as an insulator. The properties of CTCF, and its newly described genomic distribution, suggest that it may play an important role in large-scale nuclear architecture, perhaps mediated by the co-factors with which it interacts in vivo.
Insulators are DNA elements that were first identified based on their ability to protect a gene from outside influences, which might otherwise lead either to inappropriate activation or silencing of the gene. Insulators have been divided into two classes: enhancer blocking (EB) insulators, which prevent distal enhancers from activating a promoter when placed between an enhancer and promoter, and barrier insulators, which block heterochromatinization and consequent silencing of a gene. In both cases, studies have shown that insulators make use of many of the known mechanisms of epigenetic regulation and genome organization. Recent results suggest that insulating activities, though exploited by the cell for important biological purposes, may be incidental to other properties of equal importance to cell function. Particularly in the case of enhancer blocking, the subject of this review, insulation may be a corollary of a wider role in organizing large-scale structures within the nucleus.
Enhancer blocking insulators are best exemplified by the gypsy element in Drosophila and by the CTCF-binding sites identified as insulators, initially in vertebrates and later in Drosophila. In both cases, placing the insulator element between any of a variety of enhancers and promoters interferes with transcriptional activation. The ability of CTCF to confer EB insulation was discovered during studies of a DNA insulator element, 5′HS4, located at the 5′ end of the chicken β-globin locus. Interest grew with the discovery that CTCF mediates allele-specific expression at the Igf2/H19 imprinted locus in mouse and human [4–6] (Figure 1A). This finding established the functional importance of EB insulators; much of this review is devoted to recent studies of CTCF function, the structures and co-factors with which it is associated, and its distribution in the genome.
A series of papers analyzing the gypsy element have been important in establishing what may be a general model for the way EB insulators organize chromatin structure. Gypsy sites are bound, in a sequence-specific manner, by a protein, Suppressor of Hairy-wing (Su(Hw)), which recruits other factors. Multiple gypsy sites and their associated proteins cluster together to form ‘insulator bodies’, with the ultimate effect of organizing the nearby chromatin into loop domains (Figure 1B). Although less well studied, the Drosophila scs/scs’ insulator elements also stabilize loop formation mediated by a different set of proteins [7,8].
Although the mechanisms may differ, the overall concept of loop domain organization, established in Drosophila, also appears to hold for CTCF insulator sites. Before discussing this in detail, it is useful to review what is known about long-range interactions within the nucleus. Much of our knowledge comes from Chromosome Conformation Capture (3C) and its elaborations, methods that detect contacts between distant DNA sequences, used in combination with DNA and RNA fluorescence in situ hybridization (FISH). Studies over a number of years have revealed that enhancers, locus control elements, and promoters are in contact with one another when genes are being transcribed. Other recent studies have made it clear that most or all active genes are transcribed while associated with RNA polymerase (pol) II transcription factories, each containing many pol II molecules. A given factory is visited by many genes, with the effect of bringing into proximity individual genes that are widely separated on a chromosome, or even situated on different chromosomes. The formation of loop domains thus appears to be a common and necessary process associated with transcription in eukaryotes.
This point of view has strongly influenced recent papers analyzing CTCF insulator function. CTCF can interact both with itself and with other proteins, interactions that may allow it to form clusters; as in the case of gypsy, such clustering would lead to the formation of discrete domains. Recent data confirm that CTCF molecules bound to distant sites, even on different chromosomes, can interact with one another in vivo.
Physical interactions among cis-regulatory elements have been most elegantly demonstrated at the mouse β-globin locus. In differentiated primitive and definitive erythroid cells, where the globin genes are expressed, the distant locus control region (LCR) clusters together with the developmentally appropriate promoters to form an active chromatin hub (ACH). These interactions are absent in embryonic day 12.5 erythroid progenitor cells, which do not express globin. There are three CTCF-binding sites upstream of the mouse β-globin locus and one downstream [13,14]. Recent 3C results show that, in erythroid progenitor cells, all of these CTCF sites are in contact with one another, maintaining a compact domain structure that depends on CTCF expression (Fig. 1C). However, at later developmental stages, neither CTCF nor its downstream binding site is necessary for establishment of an ACH or for proper expression of the globin genes.
A more complicated picture emerges from 3C analysis of the mouse imprinted Igf2/H19 locus. Four CTCF-binding sites are situated at the imprinting control region (ICR), which lies between the Igf2 gene and the downstream endodermal and mesodermal enhancers. These sites are occupied by CTCF on the maternally inherited allele, but the DNA of the paternal copy is methylated at the ICR, preventing CTCF binding and abolishing insulator function (Fig. 1A) [4–6]. Mice in which the two alleles can be distinguished have been used in 3C experiments to study insulator effects on higher-order structure [16,17]. Contacts are detected between the ICR and one of two upstream imprinted, differentially methylated regions (DMRs), DMR1 and DMR2 (Fig. 1D, panels a and b). These contacts differ on the two alleles, suggesting a mechanism by which allele-specific interaction between an enhancer and an Igf2 promoter could give rise to allele-specific Igf2 expression.
A different model of the Igf2/H19 system has been proposed on the basis of a similar 3C approach, but with emphasis on interactions between promoters and downstream enhancers (Fig. 1D, panels c and d). The paternal Igf2 promoters are found to contact the endodermal or the mesodermal enhancer in liver and skeletal muscle cells, respectively, consistent with the presence of allele-specific ACHs. On the maternal allele, which carries an active insulator, the hub is disrupted: there is no promoter-enhancer contact, and both the Igf2 promoters and the downstream enhancers interact instead with the ICR/insulator element. The insulator thus preempts promoter-enhancer interaction. The behavior of the ICR is independent of its location and of the associated gene: predictable and consistent results are obtained when a copy is placed between the endodermal and mesodermal enhancers, or when it is coupled to a different gene and its associated regulatory element.
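The chain of reasoning for the Igf2/H19 ICR (methylation prevents CTCF binding; a CTCF-bound ICR lying between Igf2 and its downstream enhancers insulates the Igf2 promoters) can be captured as a minimal Boolean sketch. This is a hypothetical toy model for illustration only: the `Allele` class and both functions are names introduced here, and the model ignores H19, tissue-specific enhancer choice, and the 3C contact data discussed above.

```python
from dataclasses import dataclass


@dataclass
class Allele:
    """Hypothetical toy representation of one parental Igf2/H19 allele."""
    parent: str           # "maternal" or "paternal"
    icr_methylated: bool  # DNA methylation status of the ICR


def ctcf_bound(allele: Allele) -> bool:
    # Methylation of the ICR prevents CTCF binding.
    return not allele.icr_methylated


def igf2_expressed(allele: Allele) -> bool:
    # The ICR lies between Igf2 and its downstream enhancers, so a
    # CTCF-bound ICR acts as an insulator and blocks enhancer activation.
    return not ctcf_bound(allele)


maternal = Allele("maternal", icr_methylated=False)
paternal = Allele("paternal", icr_methylated=True)

for allele in (maternal, paternal):
    print(allele.parent, ctcf_bound(allele), igf2_expressed(allele))
# maternal True False
# paternal False True
```

In this sketch Igf2 is predicted to be expressed from the paternal allele only because that copy carries the methylated ICR; toggling `icr_methylated` on either allele flips the prediction. The model reproduces the qualitative logic, not any quantitative measurement.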
These experiments suggest a model in which CTCF-binding sites mark the boundaries of separate domains. In the case of the mouse β-globin locus, the presence of such structures in erythroid precursor cells depends on direct or indirect interactions between CTCF molecules bound to insulator sites at opposite ends of the gene cluster. Long-range contacts, dependent on the presence of CTCF at both sites, also stabilize the interaction between the Igf2 ICR and one of the alleles of the Wsb1/Nf1 gene pair on another chromosome. It is less clear what stabilizes the contacts of the CTCF-bound insulator with promoters and enhancers within the Igf2/H19 locus, although imprinted CTCF sites have been documented upstream of Igf2, and CTCF-binding sites have been reported near the human Igf2 promoters. As discussed below, CTCF can recruit many co-factors to its binding sites, and the mechanisms stabilizing long-range interactions could be correspondingly complex. This seems likely, given that the gypsy insulator body in Drosophila, which requires multiple proteins for its stabilization, also depends on an RNA component (Fig. 1B); mutations in the Argonaute genes piwi or aubergine, components of the RNAi pathway, impair both insulator activity and insulator body morphology. It should be noted that a protein with strong homology to CTCF is also present in Drosophila, and that it is localized at Fab-8, a known insulator site important for regulation of the Abd-B locus.
The effect of sequestering an enhancer and a promoter in separate domains, however those domains are formed, is to prevent enhancer-promoter interaction. As has been pointed out [21,22], this could be a steric effect, or it could result from interference with a processive activating signal (histone modification, pol II transcription) originating at the enhancer. In either case, the mechanism must be directional: insulation occurs only when the insulator lies between promoter and enhancer. Evidence consistent with the processive model comes from analysis of an episome carrying the human HS2 enhancer and ε-globin gene [23,24]. Insertion of the chicken β-globin 5′HS4 insulator element between the enhancer and the promoter results in inhibition of transcription, accumulation of RNA pol II 5′ of the insulator, and interruption of the spread of histone H3 and H4 acetylation.
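The positional requirement stated above, that insulation occurs only when the insulator lies between promoter and enhancer, can be written as a one-line predicate. This is a hypothetical sketch rather than a published model: elements are reduced to integer coordinates on a single chromosome, `is_blocked` is a name introduced here, and every intervening insulator is assumed to block fully, ignoring enhancer strength.

```python
from typing import List


def is_blocked(enhancer: int, promoter: int, insulators: List[int]) -> bool:
    """Return True if any insulator lies strictly between enhancer and promoter.

    Hypothetical toy model: positions are 1-D coordinates on one chromosome,
    and every intervening insulator is assumed to block completely.
    """
    lo, hi = sorted((enhancer, promoter))
    return any(lo < pos < hi for pos in insulators)


print(is_blocked(100, 500, [300]))  # True: insulator between the two
print(is_blocked(100, 500, [700]))  # False: insulator outside the interval
print(is_blocked(500, 100, [300]))  # True: enhancer/promoter order is irrelevant
```

Because the steric and processive models both predict this same positional dependence, a predicate like this cannot distinguish between them; that requires experiments such as the episome analysis described above.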
The action of insulators is not absolute. Strong enhancers can overcome insulation [24,25], and promoter targeting sequences (PTS) found in Drosophila can nullify insulator action and allow a blocked enhancer to function. Perhaps 3C methods can be used to investigate how PTS elements enter into the complex array of interactions and structures involved in insulator activity. We point out that, although this review deals only with EB insulators, barrier insulator elements may also establish loop domains as part of their function.
Originally identified as a transcriptional repressor at the myc locus, CTCF has since been characterized as a transcriptional activator, enhancer blocker, boundary definer and, potentially, genome organizer. It therefore remains to be determined where in the genome CTCF functions and how it carries out its various activities at specific sites. CTCF's ability to serve such diverse functions has been attributed to its structure: it contains 11 zinc fingers, with which it binds DNA in a sequence-specific manner. Two recent studies have taken different approaches to identify CTCF-binding sites throughout the genome and to refine the consensus sequence for CTCF binding. Ren and colleagues used ChIP-chip, immunoprecipitating CTCF-bound DNA and hybridizing it to an oligonucleotide microarray, to identify novel CTCF-binding sites in human fibroblasts, whereas Lander and colleagues applied a computational approach to identify conserved regulatory elements in the genome [19,28]. From these studies, ~14,000 CTCF-binding sites were identified, including a number that had been described previously.
This genome-wide approach also allowed characterization of the distribution of CTCF in the genome. First, CTCF binds ubiquitously throughout the genome at conserved sites, implying that these sites are functional. The identified sites correlate strongly with gene-containing regions, suggesting that CTCF's primary role in the genome is to regulate gene expression, although the sites themselves tend to lie far from promoters. Second, domains depleted of CTCF sites tend to include clusters of related gene families and genes that are transcriptionally coregulated; many of these regions are flanked by a pair of CTCF-binding sites. Domains enriched in CTCF sites, by contrast, tend to contain genes with multiple alternative promoters. These observations are all consistent with a role for CTCF as an insulator. Finally, both studies provide an excellent resource for further studies of CTCF-binding sites towards a full catalog of CTCF's functions and mechanisms.
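The idea of gene domains flanked by pairs of CTCF sites can be made concrete by assigning each gene to the interval between its nearest CTCF sites. The following is a hypothetical sketch, assuming sorted integer CTCF-site coordinates and a single representative coordinate per gene (for example its transcription start site); the function name, gene names and coordinates are invented for illustration and are not real data.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Domain = Tuple[Optional[int], Optional[int]]


def genes_by_ctcf_domain(ctcf_sites: List[int], genes: Dict[str, int]) -> Dict[Domain, List[str]]:
    """Group genes into domains bounded by consecutive CTCF sites.

    ctcf_sites must be sorted. A bound of None means the domain has no
    flanking CTCF site on that side (hypothetical toy model).
    """
    domains: Dict[Domain, List[str]] = {}
    for name, pos in sorted(genes.items(), key=lambda item: item[1]):
        i = bisect_right(ctcf_sites, pos)  # number of sites at or left of pos
        left = ctcf_sites[i - 1] if i > 0 else None
        right = ctcf_sites[i] if i < len(ctcf_sites) else None
        domains.setdefault((left, right), []).append(name)
    return domains


# Invented coordinates for illustration only.
sites = [1000, 5000, 6000]
genes = {"geneA": 1500, "geneB": 2500, "geneC": 4000, "geneD": 5500, "geneE": 7000}

for (left, right), names in genes_by_ctcf_domain(sites, genes).items():
    print(left, right, names)
# 1000 5000 ['geneA', 'geneB', 'geneC']
# 5000 6000 ['geneD']
# 6000 None ['geneE']
```

Domains with two non-None bounds correspond to regions flanked by a pair of CTCF sites; comparing their gene content with annotations of coregulated gene families would be a separate step that this bookkeeping sketch does not attempt.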
The discovery of a multitude of CTCF-binding sites made it possible to derive a refined consensus sequence: the 20-bp consensus binding motif identified in these studies refines the one described previously. Base choices at several positions within the motif are ambiguous, and not all of the newly identified sites in the Ren study match this consensus. Variation within the consensus could generate versatility in CTCF function if different variants require different subsets of zinc fingers for binding. Consistent with this idea, studies of truncated versions of CTCF's zinc finger domain have revealed differences in binding among various CTCF sites. It has therefore been suggested that CTCF's myriad functions result from site-specific differences in which fingers contact DNA and which remain available for protein-protein interactions.
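Ambiguous base choices in a consensus are conventionally written with IUPAC nucleotide codes (for example, R stands for A or G, S for C or G, and N for any base). The sketch below illustrates matching against such a consensus. It is hypothetical: `CONSENSUS` is a short made-up motif chosen for illustration and is not the CTCF consensus, and the scan checks only one strand and simply counts mismatches, whereas genome-wide studies generally score candidate sites on both strands with position weight matrices.

```python
# IUPAC nucleotide ambiguity codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Hypothetical 8-bp consensus for illustration only (NOT the CTCF motif).
CONSENSUS = "CCRSNAGG"


def mismatches(site: str, consensus: str) -> int:
    """Count positions where the base is not allowed by the IUPAC code."""
    return sum(base not in IUPAC[code] for base, code in zip(site, consensus))


def scan(sequence: str, consensus: str, max_mismatches: int = 0):
    """Return (offset, site, mismatch_count) for each qualifying window on one strand."""
    seq = sequence.upper()
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mm = mismatches(window, consensus)
        if mm <= max_mismatches:
            hits.append((i, window, mm))
    return hits


seq = "TTCCAGTAGGAACCGCAAGCTT"
print(scan(seq, CONSENSUS))                    # [(2, 'CCAGTAGG', 0)]
print(scan(seq, CONSENSUS, max_mismatches=1))  # [(2, 'CCAGTAGG', 0), (12, 'CCGCAAGC', 1)]
```

Relaxing `max_mismatches` admits sites that deviate from the consensus, loosely analogous to the observation that not every experimentally identified site matches it. Whether such variants engage different zinc fingers is a biological question that sequence matching alone cannot answer.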
Another recent genome-wide survey of CTCF-binding sites is part of a high-resolution study, using Solexa sequencing technology, of histone modifications in human T cells. In a number of cases, CTCF sites flank both ends of an active gene domain that displays extensive H3K27 monomethylation. Immediately outside the region bounded by CTCF, however, H3K27 is trimethylated, a modification associated with condensed, inactive chromatin. The CTCF sites thus mark the boundaries of open chromatin domains, as at the chicken β-globin locus, where the insulator function of CTCF was first noted [2,31].
CTCF's functions may be regulated through its genomic location and/or by the choice of zinc fingers used in DNA binding. In addition, CTCF is post-translationally modified: it can be phosphorylated in its C-terminal region and poly(ADP-ribosyl)ated in its N-terminal region, and more modifications will likely be identified [32,33]. These post-translational modifications probably play a key role in regulating CTCF binding and/or in mediating CTCF's diverse functions. Studies using 3-aminobenzamide (ABA), a general inhibitor of poly(ADP-ribose) polymerases, have led to the suggestion that insulator activity requires poly(ADP-ribosyl)ation of CTCF. It should be noted, however, that poly(ADP-ribosyl)ation has also been implicated in maintaining DNA hypomethylation in the genome [34,35]. Because CTCF binding is sensitive to DNA methylation, ABA could impair insulation indirectly, by altering methylation at CTCF sites, rather than by preventing modification of CTCF itself; further work may be needed to reach an unambiguous conclusion.
Recently, poly(ADP-ribosyl)ation of CTCF was implicated in nucleolar targeting of CTCF upon differentiation. In response to various stimuli that induce differentiation of human cell lines, CTCF was observed to relocate to nucleoli from its normal diffuse nuclear distribution [36,37]. Interestingly, the zinc finger domain of CTCF was sufficient for strong nucleolar localization, while the C-terminus conferred only weak localization. Treatment of these cells with ABA excluded full-length CTCF, but not the zinc finger construct, from nucleoli. It seems likely, as the authors suggest, that the translocation of CTCF to the nucleolus is a consequence of protein-protein interactions, perhaps with nucleophosmin/B23 (see below).
CTCF has a number of identified binding partners, and the list continues to grow rapidly (Table 1). As these partners have been implicated in a range of activities, the variety of CTCF functions, including insulation and genome organization, is likely to reflect the diversity of these cofactors. CTCF interacts with DNA-binding proteins and transcription factors, with proteins that interact with histones as well as with histones themselves, and with other regulatory proteins (Table 1). While several of these proteins have been shown to bind the zinc finger domain of CTCF (Sin3, YB-1, CHD8), Kaiso binds the C-terminus and YY1 has its highest affinity for the N-terminus (Table 1). It should be noted that, although interaction with CTCF has been demonstrated for these proteins, only CHD8, nucleophosmin and RNA pol II have been detected by chromatin immunoprecipitation (ChIP) at CTCF-dependent insulators, as indicated in Table 1. It therefore remains to be determined whether many of these interactions have functional consequences in vivo.
It has been suggested that the interaction between CTCF and nucleophosmin is linked to CTCF's insulator function. Nucleophosmin was detected at sites of CTCF binding in the chicken β-globin locus, and a multicopy transgene containing the 5′HS4 insulator was localized by FISH to the periphery of the nucleolus in a CTCF-dependent manner. These results led to a model of insulator action in which CTCF-nucleophosmin interactions tether the insulator to a nuclear structure, thus preventing enhancer-promoter communication. It is not yet known whether CTCF-nucleophosmin interactions occur at other CTCF-binding sites, or whether tethering of the insulator to a nuclear structure is sufficient to confer insulator function.
Another partner of CTCF, CHD8, has been implicated directly in insulator function. CHD8, a member of the chromodomain helicase DNA-binding (CHD) family, interacts with CTCF in vitro and is found at several known CTCF-binding sites. Knockdown of CHD8 by RNAi disrupts enhancer blocking in a reporter assay and at the endogenous H19 insulator, without displacing CTCF from the ICR. CHD8 is known to form complexes with histone-modifying enzymes, and many members of the CHD family interact both with modifying enzymes and with ATP-dependent chromatin remodeling complexes. It thus seems plausible that CHD8 brings similar activities to CTCF-bound insulators. Enhancer blocking could be aided if CHD8 complexes interfered with processive activating signals from an upstream enhancer, or if they helped to recruit CTCF sites to nuclear substructures.
CTCF has also been shown to interact in vitro with the large subunit of RNA pol II, in both its IIa and IIo forms. These two forms of RNA pol II differ in their phosphorylation state and in the complexes with which they tend to associate: pol IIa is hypophosphorylated and is part of the initiation complex, whereas pol IIo is hyperphosphorylated and associates with elongating complexes. In vivo, CTCF interacts primarily with pol IIa, the form associated with the initiation complex. Studies at the chicken β-globin 5′HS4 insulator revealed an interaction between CTCF and RNA pol II in proliferating cells but not in differentiated cells, and pol II appears to accumulate at the 5′HS4 insulator when it lies downstream of an enhancer (see above). This suggests that, rather than actively recruiting RNA pol II, CTCF could in these cases contact RNA pol II as a consequence of blocking its spread or its transfer from enhancer to promoter.
Many, perhaps most, transcription-related events within the nucleus take place at clusters of regulatory molecules that promote long-range interactions between genes and their regulatory sites. The two best-characterized insulator proteins, Su(Hw) and CTCF, are implicated in stabilizing long-range interactions, but these clusters may be independent of the transcription-related centers. CTCF, in particular, can mediate long-range interactions at the β-globin locus at developmental stages when ACHs do not form. It has been suggested that this activity helps to organize the β-globin locus and other gene clusters into domains amenable to shared regulation or to exclusion of outside influences. This is consistent with the observation that many gene domains in the human genome are flanked by CTCF sites. Insulation is an essential function, but it may reflect an even more general role in genome organization.
Long-range interactions involving CTCF can occur between distant pairs of CTCF-binding sites [12,15]; these may be stabilized simply by CTCF dimerization. CTCF may also form loops by interacting with other target proteins at distant sites, interactions that may be mediated by some of the CTCF-associated factors discussed above. Whether enhancer, promoter and CTCF are all gathered at the same site or at distinct sites is unknown. The implications of loop formation for insulation mechanisms have been examined elsewhere. Each of the obvious mechanisms leaves problems unresolved, suggesting that, just as there appears to be more than one kind of CTCF-mediated interaction, there may be more than one way to make an insulator.
We thank our colleagues in the Felsenfeld lab for their comments on this manuscript. This work was supported by the Intramural Research Program, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health.