Biological differences among metazoans, and between cell types in a given organism, arise in large part due to differences in gene expression patterns. The sequencing of multiple metazoan genomes, coupled with recent advances in genome-wide analysis of histone modifications and transcription factor binding, has revealed that among regulatory DNA sequences, gene-distal enhancers appear to exhibit the greatest diversity and cell-type specificity. Moreover, such elements are emerging as important targets for mutations that can give rise to disease and to genetic variability that underlies evolutionary change. Studies of long-range interactions between distal genomic sequences in the nucleus indicate that enhancers are often important determinants of nuclear organization, contributing to a general model for enhancer function that involves direct enhancer-promoter contact. In a number of systems, however, mechanisms for enhancer function are emerging that do not fit solely within such a model, suggesting that enhancers as a class of DNA regulatory element may be functionally and mechanistically diverse.
Genomic DNA acts as a carrier of information in two fundamental ways: first, in transcribed genes, by specifying the sequences of mRNAs that are translated into protein and of functional RNAs, as well as information encoded in RNA that affects its processing and stability; second, in regulatory sequences, by providing sites for transcription factors to bind and establish the appropriate levels and spatial and temporal expression patterns of those genes. A large proportion of the regulatory information necessary for gene expression is confined to the promoter region immediately upstream of transcription start sites. In single-celled organisms, this information can serve to specify absolute levels of transcription, and in many cases can mediate alternate responses (up- or down-regulation) to external stimuli.
Metazoans present a challenge in this regard. A single genome specifies many morphologically distinct cell types, and also directs the ordered processes of development and differentiation that lead to the varied structures present in an adult multicellular organism. Sequencing of metazoan genomes has not revealed a simple correlation between genome size, as measured by the number of genes, and relative complexity, as measured by number of cells and cell types and by diversity of behavior. The 959 to 1031 cells of the nematode Caenorhabditis elegans and the trillions of cells in a typical human are both specified by ~20,000-25,000 protein-coding genes. Thus, morphological and developmental complexity is not a function of increased numbers of genes, but of alternative mechanisms. For example, a higher proportion of vertebrate genes are subject to alternative splicing, as compared to invertebrates (Kim et al., 2006). Notably, however, complexity can be generated by diversification of the patterns in which genes are expressed, both spatially and temporally, within an organism.
Such diversification is enabled by a correspondingly more complex set of regulatory information in the genomes of metazoans. In particular, in metazoans transcriptional regulation is often decoupled from the confines of the promoter-proximal region and distributed among distal sequence elements, termed enhancers, which can be located far from the transcription start site. This distribution of regulatory sequences evades the limitations inherent in systems in which transcription is a function solely of the few hundred base pairs immediately upstream of a gene promoter. Recent developments in genomics, coupled with studies of covalent histone modifications, structural features of genomic DNA, functional assays for regulatory elements, methods to investigate nuclear organization, and cross-species sequence comparisons have revealed enhancers, as a class of regulatory element, to be generally and fundamentally
Enhancers were first characterized using, and are still most often functionally defined by, transient reporter gene assays in cultured cell lines. The activity associated with such elements – first for viral sequences (Banerji et al., 1981; Moreau et al., 1981), then subsequently in sequences originating from metazoan gene loci (Banerji et al., 1983; Gillies et al., 1983) – is the activation of transcription regardless of their location or orientation relative to the promoter within a plasmid construct. This flexibility is the defining hallmark of enhancers. They are commonly found within the introns of the genes they regulate (or, in fact, within the introns of neighboring genes), and often at prodigious distances from the promoter. One of the most extreme examples known is a limb bud enhancer for the mouse sonic hedgehog (Shh) gene, which is located within the intron of another gene more than 1 Mb from the Shh gene promoter (Lettice et al., 2003; Sagai et al., 2005).
This flexibility, however, impedes attempts to comprehensively identify and catalog the full population of enhancers within the genome, or indeed the full complement of enhancers that act upon a single selected gene. Whereas the promoter of a gene can be identified simply by sequencing the 5′ end of its mRNA, no similarly clearcut criterion exists that can pinpoint the location of an enhancer or the target gene for its activity. Enhancer detection therefore relies on a number of imperfect measures of chromatin structure and sequence functionality.
Enhancers are typically found to colocalize with disruptions in chromatin structure revealed by hypersensitivity to digestion by DNaseI. DNaseI hypersensitivity was first discovered at the promoter region of the Drosophila hsp70 gene (Wu, 1980), and is usually thought to result from short (100-300 bp) regions of genomic DNA from which nucleosomes are excluded due to the binding of transcription factors (Elgin, 1988; Gross and Garrard, 1988), although DNA bending by transcription factors has also been implicated (Stamatoyannopoulos et al., 1995; Leach et al., 2001). Enhancers consist of clusters of cognate binding sites for transcription factors that can both exclude nucleosomes and bend DNA, and so are marked by such nuclease hypersensitivity. For several decades, scans of gene loci for putative enhancer elements involved the slow and laborious indirect end-labeling assay, but recent advances have allowed the mapping of nuclease hypersensitivity genome-wide using microarrays or high-throughput sequencing (Crawford et al., 2006).
Another criterion employed for predicting putative enhancer sequences is noncoding sequence conservation. The full utility of this approach has only been realized in the past decade, when multiple fully sequenced genomes have been available for comparison. The principle, however, is well established, resting on the assumption that conservation of DNA sequence across evolution in regions that do not encode proteins implies regulatory function. Nonfunctional sequences, in contrast, will accumulate mutations over evolutionary timescales and eventually diverge. High-level expression of the mammalian β-globin genes in erythroid cells, for example, requires a set of sequence elements termed the locus control region (LCR) that is distributed across a region of 20-30 kb located upstream of the gene cluster (Bender et al., 1990; Moon and Ley, 1990; Grosveld et al., 1987; Hardison et al., 1997). Each of these sequence elements shows significant sequence conservation among all mammalian genomes sequenced thus far. More generalized studies have similarly indicated that evolutionary DNA sequence conservation can predict enhancer activity (Visel et al., 2009; Noonan and McCallion, 2010).
On the other hand, several well-documented examples exist of genes that are expressed in identical patterns in different species, but which require the activities of enhancers that look nothing alike. Both Drosophila and sepsid flies, for example, express orthologous even-skipped genes with identical patterns in the developing embryo, and this expression pattern is governed by multiple enhancers. Sequence comparisons, however, fail to reveal any significant similarities between the DNA sequences of these enhancers (Hare et al., 2008). Similarly, expression of the Pax2 gene in the Drosophila melanogaster eye is regulated in part by a cone-specific enhancer, within which multiple transcription factors bind in a specific pattern that can be disrupted by small deviations in sequence of, or spacing between, binding sites. Surprisingly, however, the cone-specific Pax2 enhancers in other Drosophila species exhibit very little conservation of sequence or binding site spacing, even though they function identically when transferred into D. melanogaster (Swanson et al., 2010). A systematic evaluation of a 40 kb region encompassing the zebrafish phox2b gene showed that a large proportion of sequences with demonstrable regulatory activity were not identified by measures of evolutionary sequence constraint (McGaughey et al., 2008).
In fact, comparisons of genome-wide transcription factor binding patterns across species indicate that a large proportion of enhancers are species-specific. In an analysis of mouse and human hepatocytes, for example, 41-89% of binding sites for four transcription factors were found to be species-specific (Odom et al., 2007). A subsequent study examined genome-wide binding of the liver-specific transcription factors CEBPA and HNF4α in multiple species (Schmidt et al., 2010). Only 10-22% of binding sites were shared between placental mammals (human, mouse or dog), and this number was even lower in comparisons between placental mammals and a nonplacental mammal (opossum) or chicken. Moreover, variation in transcription factor binding patterns appears to have a predominantly genetic origin. In mouse hepatocytes harboring an intact human chromosome 21, binding patterns of the transcription factors HNF1α, HNF4α and HNF6 are nearly identical to those observed on chromosome 21 in human hepatocytes (Wilson et al., 2008). Combined with other indications that a large proportion of functional enhancers is not subject to evolutionary constraint (ENCODE Project Consortium, 2007; McGaughey et al., 2008), such findings indicate that enhancers can “turn over” rapidly over evolutionary timescales, even when associated gene expression patterns are conserved.
Thus, conservation of regulatory function is not always reflected in conservation of DNA sequence. An approach to enhancer prediction that relies on sequence conservation alone will fail to identify a large proportion of enhancers.
Recently, genome-wide studies have suggested that enhancers exhibit a characteristic chromatin “signature”. This signature consists of monomethylation of histone H3 lysine 4 (H3K4Me1) in the absence of significant trimethylation (H3K4Me3) (ENCODE Project Consortium, 2007; Heintzman et al., 2007; Koch et al., 2007). Notably, H3K4Me3 is associated with active gene promoters, which in turn exhibit low levels of H3K4Me1 at the transcription start site. In other studies, however, a sharp divide between H3K4Me3 present at promoters but not enhancers, or H3K4Me1 present at enhancers but not transcription start sites, has not been as obvious (Barski et al., 2007; Wang et al., 2008). The basis for the discrepancy is not clear, although each study utilizes different sources of chromatin (transformed cell lines vs. primary T lymphocytes), different protocols for isolation of chromatin (formaldehyde crosslinking and sonication vs. micrococcal nuclease digestion of native chromatin) and different antibodies. It is nevertheless now possible to catalog enhancers by identifying histone methylation signatures that are not associated with other functional elements.
The H3K4 methylation signature has been correlated with enhancer activity in gain-of-function assays (ENCODE Project Consortium, 2007; Heintzman et al., 2007), but imperfectly. This is perhaps unsurprising, since the transient reporter gene assay is often a poor proxy for the activities of regulatory elements integrated in the genome, which can be active in narrow windows of development and/or cellular differentiation. Enhancer predictions become more significant, however, when the histone methylation signature is combined with other indicators of enhancer activity in a specific cell type, such as transcription cofactor binding – most often, the acetyltransferase p300 (Visel et al., 2009; Blow et al., 2010; Ghisletti et al., 2010).
In addition, studies utilizing embryonic stem (ES) cells and multiple primary cell types suggest that acetylation of histone H3K27 in combination with H3K4Me1 is correlated with enhancers near active genes, while H3K4Me1 in the absence of H3K27 acetylation appears to mark inactive or “poised” enhancers (Creyghton et al., 2010; Rada-Iglesias et al., 2010). Notably, p300 can acetylate H3K27. Moreover, many poised enhancers in ES cells were instead associated with H3K27Me3, and at a subset of these enhancers, the H3K27Me3 mark was replaced with H3K27Ac upon differentiation of ES cells along a neuronal pathway (Rada-Iglesias et al., 2010). The available evidence suggests that H3K4Me1 represents a generalized, although perhaps not all-inclusive, mark for distal enhancers in a given cell type. Additional modifications can then distinguish between enhancers that are active and those that are potentiated for activity in response to growth conditions or cell-fate decisions.
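The combination of marks described above can be read as a simple decision rule. The Python sketch below only illustrates that rule and is not a method used in the cited studies. The function name, the boolean inputs (presence or absence calls for each mark at a promoter-distal region) and the category labels are assumptions made for the example. Real data would be quantitative and, as noted above, no single mark is fully predictive.

```python
def classify_distal_region(h3k4me1: bool, h3k4me3: bool,
                           h3k27ac: bool, h3k27me3: bool) -> str:
    """Toy classification of a promoter-distal region from histone marks.

    Hypothetical helper: inputs are simplified yes/no calls, whereas real
    ChIP data are continuous signals that require thresholds.
    """
    # Enhancer signature: H3K4Me1 without significant H3K4Me3.
    if not h3k4me1 or h3k4me3:
        return "no enhancer signature"
    # H3K4Me1 together with H3K27 acetylation: enhancer near an active gene.
    # (If both H3K27 marks are called, acetylation takes priority here;
    # that ordering is an assumption of this sketch.)
    if h3k27ac:
        return "active enhancer"
    # H3K4Me1 with H3K27Me3: poised state reported for a subset in ES cells.
    if h3k27me3:
        return "poised enhancer (H3K27Me3)"
    # H3K4Me1 alone: inactive or poised.
    return "poised or inactive enhancer"


if __name__ == "__main__":
    print(classify_distal_region(True, False, True, False))
    print(classify_distal_region(True, False, False, True))
    print(classify_distal_region(True, True, True, False))
```

Keeping H3K4Me1 as the first test mirrors its proposed role as the general enhancer mark, with the H3K27 modifications used only to separate active from potentiated states.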
The most accurate predictions for enhancer activity to date are derived from combinatorial analysis of the binding of multiple transcription factors across the genome in Drosophila (Zinzen et al., 2009). In this study, binding of five transcription factors known to be involved in the differentiation of different muscle cell types from mesoderm was mapped genome-wide at different stages of development. A machine learning method was then applied to this data set in order to derive predictions for enhancers active in different cell types. Using this approach, the authors were able to identify 77% of all previously characterized muscle-specific enhancers in five different cell types in Drosophila, and had a roughly equivalent success rate when novel predicted sequences were tested for enhancer activity.
If one accepts the description of a chromatin “signature” for enhancers, current evidence then suggests that mammalian genomes harbor an abundance of such elements, and that they are the major determinant of cell type specificity in gene expression. Examination of genome-wide H3K4 methylation patterns in two cell lines – K562, a human erythroleukemia, and HeLa, a human cervical carcinoma – resulted in an estimate of 24,000-36,000 enhancers in each line (Heintzman et al., 2009). When the locations of these enhancers were compared between the two lines, only 5,000 were found to be present in both. Analysis of histone modification patterns across a region comprising 1% of the human genome, compared among five different cell lines, indicated a frequency of H3K4Me1 distal from promoters that agrees roughly with the genome-wide K562/HeLa study, and also that a much higher number of enhancers defined by this criterion exhibited cell-type specificity, as compared to promoters (Koch et al., 2007).
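To put the K562/HeLa comparison in proportion, the figures quoted above can be turned into a shared fraction. The short Python calculation below is only a back-of-the-envelope illustration. It assumes that the 24,000-36,000 range brackets the number of predicted enhancers in either cell line, and the function and variable names are hypothetical.

```python
def shared_fraction(shared: int, total_in_line: int) -> float:
    """Fraction of one cell line's predicted enhancers also found in the other."""
    return shared / total_in_line


SHARED = 5_000                          # enhancers present in both lines
LOW_TOTAL, HIGH_TOTAL = 24_000, 36_000  # quoted range per cell line

upper = shared_fraction(SHARED, LOW_TOTAL)   # if a line has 24,000 enhancers
lower = shared_fraction(SHARED, HIGH_TOTAL)  # if a line has 36,000 enhancers

if __name__ == "__main__":
    print(f"Shared between lines: roughly {lower:.0%} to {upper:.0%}")
```

Under these assumptions only about one in five to one in seven predicted enhancers is common to both lines, so the large majority are cell-line specific.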
Genome-wide mapping of nuclease hypersensitive sites (HSs) also indicates that enhancers are the primary determinant of cell-type specificity. A survey of HSs across 1% of the human genome in six cell lines (including HeLa and K562) established a strong correlation between hypersensitive sites that were cell-type specific and enhancer elements as defined by the H3K4Me signature (Xi et al., 2007).
Based on such observations, it has been suggested that the human genome might harbor as many as 1 × 10^6 enhancers (Heintzman et al., 2009). At present, however, this is little more than the roughest of estimates, since it is not clear how extensive variability in the H3K4Me signature actually is in vivo. For example, no studies have yet addressed the degree of overlap between cells within a specific lineage at different stages of differentiation, or similar cells at different stages of embryonic development. There is also no a priori reason to expect that enhancers as a general class of functional genomic element should necessarily exhibit the same histone methylation pattern. A study of human CD4+ T-cells that investigated 39 distinct chromatin-associated marks identified several different histone modifications that correlated with putative enhancers, but no single modification was completely predictive (Wang et al., 2008). Finally, despite studies such as this, dozens of histone modifications remain untested. Thus, a true enhancer census remains a subject of speculation.
Assuming that each enhancer is marked by a nucleosome-free region that can extend for 200-300 bp, estimates of the abundance of enhancers suggest that they could comprise as much as 10% of the human genome. For comparison, the total extent of protein-coding sequences in the human genome is estimated at 2-3%. Mutations within protein-coding sequences are well established as a basis for human disease, with many thousands of known examples, and so it is unsurprising to find that mutations within enhancers can similarly lead to heritable disorders. For example, some instances of X-linked deafness type 3 are associated with loss of an enhancer region located 900 kb upstream of the POU3F4 gene (de Kok et al., 1996). An enhancer located within the first intron of the RET proto-oncogene occurs in a variant that confers a 20-fold increased risk for Hirschsprung's disease (Emison et al., 2005). Numerous other examples have been presented, indicating that non-coding distal regulatory sequences represent a target for mutations that can lead to disease (Kleinjan and van Heyningen, 2005; Visel et al., 2009; Noonan and McCallion, 2010).
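The 10% figure follows from simple arithmetic on the upper estimate of about one million enhancers cited above. The Python sketch below reproduces that calculation. The haploid genome size of roughly 3.1 × 10^9 bp is an assumption supplied for the example rather than a value given in the text, and the function name is hypothetical.

```python
GENOME_BP = 3.1e9    # assumption: approximate haploid human genome size in bp
N_ENHANCERS = 1e6    # upper estimate of enhancer number discussed in the text


def enhancer_genome_fraction(n_enhancers: float, width_bp: float,
                             genome_bp: float = GENOME_BP) -> float:
    """Fraction of the genome covered by n_enhancers regions of width_bp each,
    assuming the regions do not overlap."""
    return n_enhancers * width_bp / genome_bp


if __name__ == "__main__":
    for width in (200, 300):  # nucleosome-free region sizes from the text
        frac = enhancer_genome_fraction(N_ENHANCERS, width)
        print(f"{width} bp per enhancer: {frac:.1%} of the genome")
```

With 200-300 bp per element, the covered fraction comes to roughly 6-10%, several times the 2-3% attributed to protein-coding sequence.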
Thus far, however, the proportion of known mutations that localizes to enhancer sequences does not align with the apparent complexity of the enhancer population in the genome. A very small percentage of mutations documented in the Human Gene Mutation Database map to noncoding DNA, and the majority of these correspond to promoter-proximal regions (Noonan and McCallion, 2010). In part, this may stem from the historical difficulty in mapping and characterizing gene-distal enhancers, and thus reflect an artificial bias for protein-coding regions. Another contributing factor, however, may be that enhancers are not as sensitive to single-base pair alterations as protein-coding sequences. While such changes have the potential to fundamentally alter transcription factor association and render an enhancer nonfunctional, they are more likely to alter binding affinities or developmental patterns of association in ways that can change function more subtly without eliminating it. While a mutation in a protein-coding region will affect function in every cell in which the protein is expressed, a mutation in an enhancer may affect only part of the expression pattern. Enhancer mutations are therefore a potentially powerful engine for intraspecies variation, and insofar as such changes affect selectable traits, for evolutionary divergence (Rebeiz et al., 2009; Visel et al., 2009; Levine, 2010; Noonan and McCallion, 2010).
Illustrations of the principle include freshwater stickleback fish populations, which lack pelvic fins that are present in ancestral fish and in modern-day, saltwater stickleback populations. Freshwater sticklebacks have lost an enhancer located 5′ of the gene for the homeobox transcription factor Pitx1. In stickleback fish that have pelvic fins, this enhancer specifically activates Pitx1 gene expression in the pelvic fins during development (Chan et al., 2010; Levine, 2010). Another study defined a series of mutations in an enhancer for the ebony gene in Drosophila (Rebeiz et al., 2009). Populations of Drosophila in Uganda exhibit a correlation between their abdominal pigmentation and the elevation of their primary habitat. Abdominal pigmentation is influenced by the product of the ebony gene, in the absence of which the abdomen exhibits a dark, melanic phenotype. An analysis of dark vs. light-colored lines resulted in the identification of an enhancer located 5′ of the ebony gene, mutations in which give rise to phenotypic variation.
Several studies have systematically attempted to define human-specific mutations in noncoding genomic DNA sequences that are otherwise highly conserved among mammalian species. Such analyses are able to define isolated sequences, termed “human-accelerated conserved noncoding elements” (HACNSs) or “human-accelerated regions” (HARs), that in some cases appear to be enhancers (Pollard et al., 2006; Prabhakar et al., 2006; Noonan, 2009). Notably, one element, HACNS1/HAR2, consists of 81 bp within which 16 human-specific substitutions have accumulated during evolution. When tested in reporter constructs in transgenic mice, the human version acts as an enhancer to drive gene expression in the developing anterior limb and a few other locations (Prabhakar et al., 2008). Versions of this element that lack the human-specific substitutions fail to direct gene expression to the developing limb. Although the biological significance of this expression pattern, including the gene(s) influenced by HACNS1/HAR2, is unknown, such behavior provides a concrete demonstration that changes in enhancer function can alter expression patterns to generate variation.
Since the discovery of enhancers, the dominant model for their mechanism of action on promoters has invoked direct interactions (Figure 1A). This model is commonly termed “looping”, since it requires that the intervening DNA be looped out or otherwise organized in order to permit the enhancer-promoter interaction (Bulger and Groudine, 1999; Blackwood and Kadonaga, 1998; de Laat et al., 2008). Alternative models for enhancer function have chiefly differed from the basic “looping” premise only in how the enhancer-promoter interaction is established – whether by free or facilitated diffusion within the nucleus, or by an active “scanning” or “tracking” mechanism (Figure 1B), in which the enhancer diffuses one-dimensionally along the chromatin fiber in search of a promoter (Blackwood and Kadonaga, 1998). In addition, more indirect models have been proposed, including “oozing” or “linking”, in which a complex is nucleated at the enhancer and then polymerizes along the chromatin fiber bidirectionally until it reaches a promoter (Ptashne, 1986; Dorsett, 1999; Bulger and Groudine, 1999). In one variation of this model, RNA polymerase II or other complexes are loaded at the enhancer and then actively move along the DNA until reaching a promoter.
The “looping” model in particular has received abundant support from studies of nuclear architecture utilizing “chromosome conformation capture” (3C) and its high-throughput derivatives. In this assay, intact nuclei are crosslinked with formaldehyde, digested with restriction endonucleases, and then incubated with DNA ligase (Cullen et al., 1993; Dekker et al., 2002; Miele and Dekker, 2009). Using this procedure, specific genomic restriction fragments, which may be located far from each other on the linear genome, are found to ligate to each other with a greater frequency than other fragments if they are nevertheless colocalized within the three-dimensional space of the nucleus. Such colocalization can be revealed by the amplification of PCR products across the novel junctions produced during the ligation. Moreover, this approach can be adapted to produce genome-wide maps of the three-dimensional associations a specific locus makes among all the other sequences in the genome (“4C”) (Zhao et al., 2006; Simonis et al., 2006), or even associations among multiple sequences located throughout the genome (“5C” and “Hi-C”) (Dostie et al., 2006; Lieberman-Aiden et al., 2010).
3C has been employed to reveal interactions between distal sequence elements – primarily enhancers and promoters – within multiple loci in mammalian genomes (Miele and Dekker, 2008; de Laat et al., 2008). It is now common to find distal enhancers that colocalize with the promoters they regulate, which has uniformly been interpreted to be the result of direct enhancer-promoter interactions that are necessary for gene activation. Moreover, at multiple gene loci strong correlations have been made between active transcription and the ability to reveal such associations. Within the mammalian β-globin locus, for example, transcription factor knockouts that eliminate β-globin gene transcription uniformly result in the loss of colocalization of the gene with the β-globin LCR, as revealed by 3C (Drissen et al., 2004; Vakoc et al., 2005).
Additional support for direct communication between enhancers and promoters is provided by indications that enhancer-promoter interactions can be specific. As a general rule, enhancers are capable of activating transcription from heterologous promoters, and in fact in the majority of gain-of-function assays reporter gene expression follows the pattern governed by the enhancer, not the promoter. Some notable exceptions to this principle have been demonstrated, however. In the Drosophila Antennapedia gene complex, for example, an enhancer specifically mediates activation of the Sex combs reduced (Scr) gene despite the presence of another active gene, fushi tarazu (ftz), located between it and the Scr promoter (Calhoun et al., 2002; Calhoun et al., 2003). This interaction requires a “tethering” element near the Scr promoter, which can direct the enhancer to activate the ftz gene if it is moved to the ftz promoter instead. A similar element within the promoter of the Drosophila yellow gene renders heterologous promoters capable of being activated by the otherwise highly specific yellow gene enhancers (Melnikova et al., 2008). In addition, several studies have indicated that some enhancers can exhibit a preference for specific classes of gene promoters, for example between promoters that harbor a canonical TATA box vs. promoters that contain a DPE (Ohtsuki et al., 1998; Butler and Kadonaga, 2001). Transcription of the ftz gene is mediated by an enhancer bound by the transcription factor Caudal, which specifically activates genes with DPE-containing promoters (Juven-Gershon et al., 2008).
Evidence supporting enhancer-promoter interactions is part of a large and growing body of studies that have suggested that nuclear architecture is a major determinant of gene expression. On a global scale, for example, several generalized properties of genomic organization within the nucleus have been established, and enhancers have been shown to influence many of them.
First, each chromosome occupies a distinct “territory” within the nucleus, although up to 20% of the volume of the nucleus may be comprised of intermingling of neighboring chromosomes (Cremer and Cremer, 2010; Branco and Pombo, 2007). In addition, specific sequences on a given chromosome have been observed to extrude or “loop” out from the main body, and can even be found in other chromosomal territories (CTs). A study of the β-globin locus, for example, found that the region extruded from its CT specifically in erythroid cells (Ragoczy et al., 2003). This extrusion or looping from the CT occurred prior to high-level β-globin gene expression, and was dependent upon the presence of the β-globin LCR. Furthermore, ectopic integration of a β-globin LCR into a gene-dense region of the mouse genome resulted in more frequent extrusion/looping of the region away from its CT (Noordermeer et al., 2008). The limb bud enhancer of the Shh gene is similarly required for extrusion of the gene locus from its CT (Amano et al., 2009). These results suggest that some enhancers mediate a change in nuclear localization for genes in their vicinity, and that this represents a step distinct from transcriptional activation.
Second, in many metazoan cell types, active genes appear to be localized in the interior of the nucleus, while silent genes are found at the periphery. The correlation is far from absolute – the vicinity of the nuclear pore in yeast, for example, has actually been associated with active genes (Taddei, 2007) – and thus localization does not appear to determine expression state, but specific examples indicate that the nuclear periphery is a repressive environment. Beyond numerous correlative studies, some studies have shown that artificial tethering of reporter genes to the nuclear lamina or nuclear periphery results in downregulation of the reporter and of neighboring genes, although not all genes are affected (Andrulis et al., 1998; Finlan et al., 2008; Reddy et al., 2008). A link between this pattern and enhancer function is again provided by the β-globin locus. A study of β-globin locus positioning within the nucleus during erythroid differentiation demonstrated that at early maturational stages, the locus is found near the nuclear periphery (Ragoczy et al., 2006). As differentiation proceeds and β-globin transcription is activated, the locus moves more toward the interior of the nucleus. This process requires the presence of the β-globin LCR, although here it is not clear if relocalization is a function of the LCR directly or if it represents an indirect consequence of LCR-dependent β-globin gene activation. Separate studies of transgenes under the control of an enhancer derived from the β-globin LCR, however, indicated that at ectopic integration sites the enhancer was required for localization of the transgene far from regions of centromeric heterochromatin, and this was in turn associated with a higher, stochastically-determined probability of the gene being active at all (Francastel et al., 1999).
Third, the nucleus harbors a number of self-organized substructures, proximity to which can affect gene expression (Ferrai et al., 2010). In addition to the nuclear lamina, such substructures include nucleoli, Cajal bodies, PML bodies, splicing speckles and other features that represent concentrations of factors that can influence transcription – in the case of splicing speckles, for example, of the splicing machinery and other mRNA processing factors. Such substructures can serve as the basis for colocalization of genes that are otherwise located far apart on a linear chromosome, or even on different chromosomes. For example, erythroid-specific genes have been found to be positioned near common splicing speckles in erythroid cells (Brown et al., 2008), while several muscle-specific genes have been shown to localize to shared speckles in differentiated muscle cells (Moen et al., 2004).
Finally, gene loci can colocalize in the nucleus on the basis of shared associations with specific factors. Perhaps the most notable of these is with RNA polymerase II itself. Visualization using antibodies against RNA polymerase II or labeling of primary mRNA transcripts has suggested that transcription is localized to a limited number of RNA polymerase II “factories” (Iborra et al., 1996; Sutherland and Bickmore, 2009). Transcription factories have been proposed to underlie the observation that active gene loci distributed across single chromosomes, or even located on separate chromosomes, tend to colocalize in the nucleus (Osborne et al., 2004; Simonis et al., 2006). In fact, some studies have suggested that transcription factories occur in different varieties, corresponding to transcription mediated by different factors (Xu and Cook, 2008; Schoenfelder et al., 2010).
In addition to RNA polymerase II, other factors involved in transcriptional regulation appear to organize into discrete foci in the nucleus, and can either directly or indirectly bring distal gene loci into proximity with each other. These include, for example, special AT-rich sequence binding protein 1 (SATB1), a protein expressed in thymocytes and several other cell types that has been implicated in anchoring disparate genomic loci via long-range interactions (Cai et al., 2003; Cai et al., 2006). In thymocytes SATB1 is observed in a “cage-like” distribution around, but not coincident with, concentrations of centromeric heterochromatin. SATB1 in turn binds to promoter-distal regulatory elements in multiple gene loci, and loss of the factor results in disruptions of normal gene expression patterns and locus-wide chromatin structure.
A special case of this category of interaction is presented by CTCF, a zinc finger transcription factor that can have many roles in gene regulation. Notably, CTCF binding to insulator elements can block enhancer-mediated gene activation when such elements are located between an enhancer and promoter (Phillips and Corces, 2009). CTCF is capable of self-association (Pant et al., 2004; Yusufzai et al., 2004), and has been implicated in the formation of chromatin loops in vivo by 3C-based approaches (Kurukuti et al., 2006; Splinter et al., 2006; Hou et al., 2010). Interestingly, CTCF associates with cohesins, protein complexes that physically connect sister chromatids during mitosis and meiosis, and this association appears to be necessary for enhancer-blocking activity (Rubio et al., 2008; Wendt et al., 2008). A recent study of enhancers in embryonic stem cells demonstrated that enhancers and promoters were also associated with cohesin (Kagey et al., 2010), and the cohesin-loading factor Nipbl (Nipped-B in Drosophila) was one of two factors that emerged from a genetic screen in Drosophila for factors involved in enhancer-promoter communication (Rollins et al., 1999). Common association with cohesins provides a potential mechanistic link between long-range associations observed between CTCF binding sites and between enhancers and their cognate promoters.
3C and its variants are inherently descriptive assays, and as with most studies of nuclear organization, they do not provide obvious ways to distinguish between correlation and causation. Thus, while the spatial colocalization of active enhancer and promoter regions revealed by 3C is suggestive of direct interactions that mediate gene activation, it is formally just as likely that it represents a consequence of a distinct activating mechanism. RNA polymerase II transcription factories, for example, provide an alternate mechanism by which enhancer-promoter colocalization might take place (Figure 1A). RNA polymerase II is recruited to many enhancers (Heintzman et al., 2007; Koch et al., 2008; Kim et al., 2010). Although the function of this recruitment, if any, is unknown, the likelihood that RNA polymerase II-bound enhancers and promoters might colocalize by virtue of coincidental association in such transcription factories has rarely elicited much commentary. In some cases – notably, the β-globin locus – it has been speculated that part of the role of the enhancer is to transfer RNA polymerase II to the promoter directly (Zhu et al., 2007; Leach et al., 2001), but this has not been demonstrated, and in the case of the β-globin locus RNA polymerase II still associates with the gene promoter in the absence of the LCR (Sawado et al., 2003).
The function of RNA polymerase II recruitment to enhancers, if it does not involve transfer to the promoter, is not clear, but recent studies have shown that enhancers themselves are transcribed. In one study, a population of ~12,000 enhancers was defined by the combination of H3K4Me1 and binding of the transcription cofactor CBP (Kim et al., 2010). Roughly 25% of the enhancers defined in this fashion were found to be associated with RNA polymerase II, and this subset was in turn transcribed into RNA. The product RNAs, termed eRNAs, were short, bidirectional and not polyadenylated, and in at least one case, eRNA transcription required the presence of the target promoter. In another study, 70% of extragenic RNA polymerase II binding in primary macrophages was found to map to enhancers, as defined by the H3K4Me1 signature (De Santa et al., 2010).
Such findings complement earlier studies that have shown transcription originating from specific enhancers, for example at HS2 of the β-globin LCR (Kim et al., 2007; Zhu et al., 2007). In a few cases, non-genic transcription originating from enhancers has been shown to be necessary for normal gene activation. For example, at transgenic human growth hormone loci in mice, a large region located upstream of the growth hormone gene is transcribed in an enhancer-dependent fashion (Ho et al., 2006). Insertion of a transcriptional terminator in this region partially eliminates this non-genic transcription pattern and in turn results in a decrease in growth hormone gene expression.
Another study has suggested that some noncoding RNAs (ncRNAs) act as enhancers on neighboring genes (Orom et al., 2010). The authors identified a set of several thousand unique, long ncRNAs, and then focused on a subset that exhibited differential expression during keratinocyte differentiation. RNAi-mediated knockdown of several of these ncRNAs resulted in a decrease in expression of neighboring genes. Further analysis of one ncRNA, located near the gene for the transcription factor Snai1, revealed that it behaved as a classical enhancer in transient reporter gene assays, and furthermore that it was the RNA itself that was required for the effect. The ncRNAs characterized in this study appear to be distinct from eRNAs – the former, for example, are polyadenylated, while the latter are not.
The significance of eRNA transcription is not known, and the mechanism by which ncRNAs might mediate activation of neighboring (but distal) genes is similarly unclear. Such studies, however, reveal an unforeseen complexity to the roles of RNA polymerase II association with enhancers and of the association of enhancers with transcription factories, and suggest functions for noncoding RNA for which conventional models of enhancer-mediated gene activation currently do not account (Figure 2C).
Several studies have provided evidence for a mechanism of enhancer function distinct from, or perhaps complementary to, the nuclear colocalization implied by 3C-based studies. Within the β-globin locus, some chromatin modifying enzymes – notably, the histone H3 lysine 4 methyltransferase MLL2 and the histone H3 lysine 9 methyltransferase G9a – have been shown to be recruited by association with the enhancer-binding factor NF-E2, and then to spread locus-wide (Demers et al., 2007; Chaturvedi et al., 2009). In addition, a novel class of enhancer activity is suggested by studies of enhancers within the endogenous mouse β-globin locus, and within a transgenic human growth hormone locus in mice. In both cases, the active genes are embedded within a larger “domain”, extending for 15-30 kb, defined by high levels of histone hyperacetylation. Deletion of binding sites for the transcription factor Pit-1 within the transgenic human growth hormone locus (Ho et al., 2002), or of an evolutionarily conserved enhancer located between the embryonic β-globin genes (G. Fromm and M. Bulger, submitted), results in complete loss or significant decreases in expression, respectively, coupled with complete loss of the hyperacetylated domain. Thus, enhancers can also control chromatin modifications that are distributed continuously over large regions, in a manner analogous to silencers that control heterochromatic domains (Figure 2B; see also below).
Another function for enhancers is emerging from studies of epigenetic marks in pluripotent embryonic stem (ES) cells (Figure 2A). In several cases, “pioneer” transcription factors have been found to associate with distal enhancers in ES cells, although they do not mediate gene activation. These factors are then lost upon ES cell differentiation. If differentiation proceeds along a lineage within which the enhancer is normally active, the “pioneer” factor is replaced with another factor that mediates enhancer-dependent gene activation. If differentiation proceeds along another lineage, loss of the “pioneer” factor results in inactivation of the enhancer and thus the entire gene locus. For example, an enhancer within the liver-specific Alb1 gene locus is bound by the transcription factor FoxD3 in ES cells; binding of FoxD3 prevents DNA methylation at this enhancer. Upon differentiation into endoderm, FoxD3 is replaced by FoxA1, the activating factor, and the Alb1 gene is transcribed (Xu et al., 2009). Differentiation into another lineage is accompanied by DNA methylation at the enhancer. Similar models have been presented for the macrophage/dendritic cell gene IL12b, the thymocyte-specific Ptcra gene (Xu et al., 2009), and for an enhancer within the pre-B cell-specific λ5/VpreB1 gene locus (Liber et al., 2010).
These examples suggest a general model in which developmental and lineage maturational decisions to activate or to silence a given tissue-specific gene locus are postponed until specific timepoints via the binding of “pioneer” transcription factors at enhancer elements. Factor binding to the enhancer thus acts as a “placeholder” for a later step in gene regulation. Although this mechanism for enhancer function does not involve direct activation of gene promoters, it is no less necessary for the proper regulation of transcription, and represents a novel, indirect method by which enhancers can affect gene activation.
Thus, while most studies of enhancer function have focused on models involving direct enhancer-promoter interactions and the assays that can reveal them, other studies indicate that regulation at a distance can involve indirect mechanisms as well. Transcription of enhancers, noncoding RNAs, enhancer “placeholding” and spreading mechanisms are likely to be important for the function of at least some enhancers, and observations of these phenomena raise the larger possibility that simple models of enhancer-promoter looping are not sufficient to account for enhancer activity.
Such diversity of mechanism is anticipated by long-range regulatory mechanisms in organisms – prokaryotes and single-celled eukaryotes – in which enhancer-like elements represent the exception rather than the rule. In the budding yeast S. cerevisiae, for example, regulatory sequences are usually limited to upstream activation sequences (UASs) that are located within a few hundred base pairs of the promoter. Relocation of a UAS to greater distances results in loss of function (Dobi and Winston, 2007).
Artificial model systems in yeast, however, have shown that bridging of distal sequences can be accomplished. Integration of expression cassettes within yeast telomeres, which naturally fold and can bring distal elements within close proximity of each other, or use of the self-associating Drosophila GAGA factor to mediate looping between two sequences, have demonstrated that gene activation can be accomplished by simple looping interactions (de Bruin et al., 2001; Petrascheck et al., 2005).
In addition, regulatory elements that appear to act like enhancers – activation at a distance and/or from positions downstream of the promoter – have been found at select gene loci in S. cerevisiae. The most notable example in this regard is the gene for the HO endonuclease, an enzyme involved in mating-type switching that is expressed in a brief window during late G1 phase of the cell cycle. The process of HO gene activation requires binding of the transcription factor Swi5 to two sites located more than 1 kb upstream of the promoter (Figure 3A). Swi5 in turn is required for recruitment of chromatin remodeling factors, and results in histone acetylation across the entire region between the binding sites and the promoter. Interestingly, binding of Swi5 is a transient event, and HO gene activation occurs well after Swi5 is no longer present, suggesting that these distal regulatory elements initiate an activation mechanism that is maintained by chromatin remodeling at the promoter in an indirect fashion (Cosma et al., 1999; Krebs et al., 1999).
Studies in yeast also provide a cautionary note that may be applied to enhancer studies in metazoans. For example, early investigations of the S. cerevisiae rRNA genes identified an apparent transcriptional enhancer located downstream of the 35S rRNA gene. In transgene constructs, this enhancer region mediated 10- to 30-fold increases in 35S rRNA transcription, and was also selective, having no effect on the 5S rRNA gene located between the enhancer region and the 35S rRNA promoter (Neigeborn and Warner, 1990; Morrow et al., 1993). The results suggested a simple model in which the enhancer interacted with the 35S rRNA gene promoter by a looping mechanism, with the intervening 5S gene isolated within the loop. Subsequent studies, however, demonstrated that the enhancer was completely unnecessary for normal 35S rRNA expression (Wai et al., 2001), and instead functioned in ectopically integrated reporter constructs to recruit RNA polymerase I, which is otherwise primarily localized within the nucleolus. Such findings sound a cautionary note for any study in which the potential function of a distal enhancer is investigated outside of its normal context.
Finally, yeast heterochromatin provides a well-established model for action at a distance in single-celled eukaryotes. Notably, the simplest version of this model, exemplified by silencing at the yeast mating-type gene loci, is entirely indirect: silencer elements within the locus serve to nucleate a complex of Sir proteins (Rusche et al., 2003). The complex includes the histone deacetylase Sir2, which deacetylates the histone tails of nearby nucleosomes. Other components of the complex bind with high affinity to deacetylated histone tails, and then serve to recruit more of the Sir2 deacetylase, resulting in a progressive spreading of histone deacetylation and silencing factors along the chromatin fiber (Figure 3B). In this way DNA sequences that in themselves do not encode information sufficient for silent regulation are packaged in a repressive chromatin structure. It has been shown that once established, silencing is maintained in yeast cells even after the silencer itself is excised by an in vivo recombination strategy, although the silencer is still required for re-establishment of the silent state after cell division (Holmes and Broach, 1996).
Silent domains in yeast, however, can be established and maintained discontinuously. For example, silent domains at yeast telomeres or within the HMR locus can encompass artificially inserted active genes. Such behavior involves the activity of “proto-silencer” elements located elsewhere in the domain that have no activity by themselves, but can augment the activity of canonical silencers (Talbert and Henikoff, 2006). Recently, it has been demonstrated that silencers within the HMR locus in yeast colocalize, as determined by 3C (Valenzuela et al., 2008). Thus far, however, it is not clear if discontinuous silencing requires such direct interactions, or if it occurs indirectly by localization to nuclear subcompartments enriched in silencing factors. Such issues directly parallel corresponding aspects of enhancer function, as described above.
Studies of enhancer elements in bacteria have provided elegant demonstrations of both the looping and tracking models (reviewed by Xu and Hoover, 2001). A number of bacterial genes regulated by the σ54-holoenzyme are also controlled by factors that bind to enhancer sequences that can be located up to 3 kb upstream and 1.5 kb downstream of the promoter. These enhancer binding proteins then interact directly with the σ54-holoenzyme via DNA looping, and notably, in some cases the looping mechanism is facilitated by additional factors that bind between the enhancer and promoter and bend DNA (Figure 3C). In contrast, a distal enhancer that is required for activation of phage T4 late genes functions by the loading of a ring-shaped trimer of a factor termed gp45, followed by one-dimensional diffusion of the trimeric protein along the DNA until it reaches the promoter (Figure 3D).
In summary, studies in organisms that typically lack enhancer activity indicate that distal sequences can affect gene expression by widely different mechanisms. In considering such mechanisms, there is no reason to exclude the possibility that any given enhancer might work by a combination of them. Thus, such studies argue against simple, unified models for enhancer function in metazoans.
Enhancers were discovered nearly 30 years ago, and for most of the intervening period they existed in the experimental literature largely as an odd feature of certain tissue-specific loci, with a peculiar ability to activate transcription from promoters over large genomic distances. It is only recently that advances in technology and understanding have provided a fuller delineation of the role distal enhancers play in gene regulation in metazoans. Their function is not only likely to constitute a primary basis for differential gene expression that underlies cell-type specificity, but also to have crucial roles in stem cell pluripotency, human disease and metazoan evolution. Mechanistically, enhancers lie at the nexus of transcription, nuclear organization, chromatin structure, epigenetics and noncoding RNA. In accordance with such a complex spectrum of biological functions, it seems unlikely that enhancers constitute a monolithic class of regulatory element that works via a single, unified mechanism.
We are grateful to M. Bender, M. Conerly, R. Kamakaka, J. Palis, T. Ragoczy, H. Rincon-Arano, J. Ritlund, and D. Strongin for critical reading of the manuscript and helpful suggestions.