The development and generalized use of high-throughput and/or genome-wide methodologies for examining transcription factor binding, core histone modifications and RNA polymerase II association has drastically altered the perception of how regulatory sequences are distributed in mammalian genomes. In contrast to the budding yeast S. cerevisiae, for example, the human genome is only sparsely populated with protein-coding genes, and even when growing awareness of noncoding genes, such as small RNAs, is considered, it is readily apparent that the largest proportion of the genome consists of intergenic or intragenic (intronic) sequences for which a specific function is not obvious. Prior studies of selected gene loci have identified distal regulatory sequences such as enhancers and LCRs within these regions, but the gain-of-function assays used to characterize these elements have only served to delineate one or a few such elements for each locus, leaving the majority of noncoding DNA with no known function.
More recently, however, distal regulatory elements have been distinguished from gene promoters by a signature of histone modifications and trans-acting factor binding identified via genome-wide microarray and high-throughput sequencing (ChIP-seq) analyses (The ENCODE Project Consortium, 2007; Koch et al., 2007; Heintzman et al., 2007; Heintzman et al., 2009; Visel et al., 2009). Features of this signature include monomethylation of histone H3 lysine 4 (H3K4) and association of specific factors, such as the histone acetyltransferase and transcriptional coactivator p300. Levels of H3K4 monomethylation in particular peak at enhancers and not at transcription start sites. Conversely, H3K4 trimethylation appears to occur at promoters but not at enhancers. In addition, there is a strong correlation between these regulatory elements and the locations of DNaseI hypersensitive sites (DNaseI HSs), which are generally thought to mark regions where local chromatin structure is disrupted by transcription factor binding (Xi et al., 2007).
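As a rough illustration of how the signature described above separates the two classes of element, the following Python sketch labels a region from its H3K4 methylation state and p300 occupancy. The `Region` fields, the enrichment units and both cutoffs are hypothetical choices made for this example; they are not taken from the cited studies, which relied on their own statistical procedures.

```python
from dataclasses import dataclass

# Hypothetical cutoffs on normalized ChIP enrichment (arbitrary units).
# A real analysis would calibrate these against known elements.
ME1_MIN = 2.0  # assumed minimum H3K4me1 enrichment for an enhancer call
ME3_MIN = 2.0  # assumed minimum H3K4me3 enrichment for a promoter call


@dataclass
class Region:
    name: str
    h3k4me1: float    # monomethyl H3K4 enrichment
    h3k4me3: float    # trimethyl H3K4 enrichment
    p300_bound: bool  # True if a p300 peak overlaps the region


def classify(region: Region) -> str:
    """Label a region from its H3K4 methylation state and p300 occupancy."""
    if region.h3k4me3 >= ME3_MIN:
        # H3K4 trimethylation is characteristic of promoters.
        return "putative promoter"
    if region.h3k4me1 >= ME1_MIN:
        # H3K4me1 without H3K4me3; p300 binding is supporting evidence.
        if region.p300_bound:
            return "putative enhancer (p300-bound)"
        return "putative enhancer"
    return "unclassified"


if __name__ == "__main__":
    # Made-up regions, for illustration only.
    for r in (
        Region("distal_1", h3k4me1=5.0, h3k4me3=0.3, p300_bound=True),
        Region("tss_1", h3k4me1=0.5, h3k4me3=6.0, p300_bound=False),
        Region("background", h3k4me1=0.4, h3k4me3=0.2, p300_bound=False),
    ):
        print(r.name, "->", classify(r))
```

A label produced in this way is only a prediction from chromatin state; it says nothing about whether the sequence acts as an enhancer at its native location.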
Both H3K4 monomethylation and p300 binding have proven to be predictive for enhancer activity of genomic elements in functional assays (The ENCODE Project Consortium, 2007; Heintzman et al., 2007; Visel et al., 2009). This is perhaps not surprising: for example, any sequence that is bound by p300 might be expected to exhibit enhancer activity in a transient transfection assay when linked to a reporter gene, but this does not necessarily indicate that such a sequence actually functions as an enhancer at its native location.
Still, current high-throughput studies are intriguing in several ways. First, they have revealed an unexpected abundance of putative enhancer sequences. A genome-wide study utilizing only two cell lines identified 55,000 sequences exhibiting the “chromatin signature” indicative of enhancers (Heintzman et al., 2009), which is significantly larger than the number of genes expressed in these lines. The signature at most of these sequences was specific to one or the other cell type as well, and given the variety of cell types present in mammals, the authors extrapolated this figure to estimate that the human genome harbors 10⁵ to 10⁶ such elements in total. This would represent an average across the genome of one such element every 3,000–30,000 bp, with significantly higher densities in “gene-rich” regions. A pilot survey of 1% of the human genome by the ENCODE project revealed a similar frequency of occurrence of monomethyl H3K4 not associated with gene promoters (The ENCODE Project Consortium, 2007; Koch et al., 2007).
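The quoted spacing can be checked with simple arithmetic: one element every 3,000–30,000 bp corresponds to roughly 10⁵ to 10⁶ elements if they were spread evenly over the genome. The short Python sketch below makes the calculation explicit. The genome size of 3 × 10⁹ bp is an assumed round figure for the haploid human genome, and the function name is hypothetical.

```python
# Assumption: a round figure of ~3 x 10^9 bp for the haploid human genome.
GENOME_BP = 3_000_000_000


def mean_spacing_bp(n_elements: int, genome_bp: int = GENOME_BP) -> int:
    """Average distance between elements if spread evenly over the genome."""
    return genome_bp // n_elements


if __name__ == "__main__":
    for n in (100_000, 1_000_000):
        print(f"{n:,} elements: one every {mean_spacing_bp(n):,} bp on average")
```

Because putative enhancers are denser in gene-rich regions, the local spacing there would be smaller than this genome-wide average.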
Second, comparisons of patterns of histone modification and transcription factor association between putative enhancers and known transcription start sites have suggested that the greatest differences between cell types lie in the distal enhancers, not the promoters (Heintzman et al., 2009). Similarly, mapping of DNaseI HSs across six different cell lines showed that the majority, which were common among all of the lines, were associated with promoters or putative insulator elements, while the remaining cell type-specific HSs were highly enriched for enhancer elements (Xi et al., 2007). The implication is that development and differentiation of disparate cell types are accomplished for the most part via the differential activities of distal regulatory elements like enhancers.
Since the initial discovery of enhancers, it has been known that they are most often the dominant element in conferring tissue specificity to a linked gene. A hallmark of most enhancers is their ability to activate transcription from any linked promoter in reporter gene constructs, even if promoter and enhancer originate from gene loci with completely different expression patterns in vivo. Although there are exceptions to the general principle, expression of the reporter gene follows the pattern governed by the enhancer, not the promoter. This ability, in fact, has been used to identify enhancers (or, more correctly, regions of the genome) that drive specific expression patterns, via the “enhancer trap” – a transgene under the control of a weak promoter will only be expressed if it integrates into a genomic location that is under the influence of an enhancer that can activate the promoter. The importance of enhancers in determining patterns of eukaryotic gene expression is also illustrated by known examples of genes expressed in different tissues or locations in an organism, which in turn are regulated by multiple enhancers, each of which specifies part of the expression pattern.
On the other hand, the finding that differences in histone modification patterns and transcription factor binding between cell types localize most often to enhancers and not promoters would seem to conflict with the known prevalence of genes with multiple promoters. Genome-wide analyses have shown that more than 50% of human genes (Kimura et al., 2006; Carninci et al., 2007), and ~14% of genes in Drosophila (Zhu and Halfon, 2009), are associated with multiple transcription start sites, and the literature abounds with examples of genes that are expressed in different cell types via different promoters. It would appear, however, that expression from these alternate promoters is under the control of multiple, alternate enhancers, and that in the majority of cases tissue-, developmental- and/or differentiation stage-specific transcription is under the control of distal regulatory elements that are dominant over the promoter(s).
Third, genome-wide and otherwise high-throughput studies of putative enhancers have unexpectedly revealed that a substantial proportion of such elements are not evolutionarily constrained (The ENCODE Project Consortium, 2007; Margulies et al., 2007). In the ENCODE pilot survey, roughly half of the sequences determined to have activity in functional assays did not appear to be subject to evolutionary constraint based on cross-species sequence comparisons. Previously, sequence conservation in regions of the genome not associated with gene-coding exons has been used to support other lines of evidence for function of distal regulatory elements, and in fact such conservation has been used as a predictive tool to identify potential regulatory regions, a technique termed “phylogenetic footprinting” (Hardison, 2000). The results of the ENCODE analysis indicate either that many distal regulatory elements cannot be identified on the basis of DNA sequence conservation, or that the conserved sequences within these elements are so small as to escape detection by commonly used computer-based algorithms.
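The second possibility, that conserved sequences may be too short to detect, can be illustrated with a toy window-based identity scan in Python. This is a minimal sketch, not the algorithm used in the ENCODE analysis or in any particular footprinting tool: the function name, the window sizes, the identity cutoffs and both sequences are made up for the example, and the sequences are assumed to be already aligned, equal in length and free of gaps.

```python
def conserved_windows(seq_a: str, seq_b: str, window: int, min_identity: float):
    """Return start positions of windows whose identity meets the cutoff.

    Assumes seq_a and seq_b are pre-aligned, equal-length, gap-free strings.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    matches = [a == b for a, b in zip(seq_a, seq_b)]
    hits = []
    for start in range(len(matches) - window + 1):
        identity = sum(matches[start:start + window]) / window
        if identity >= min_identity:
            hits.append(start)
    return hits


# Made-up sequences: a shared 6-bp motif embedded in fully divergent flanks.
MOTIF = "TGACGT"
SPECIES_A = "ACGTACGTAC" + MOTIF + "ACGTACGTAC"
SPECIES_B = "CATGCATGCA" + MOTIF + "CATGCATGCA"

if __name__ == "__main__":
    # A broad window averages the motif away, so nothing is reported.
    print(conserved_windows(SPECIES_A, SPECIES_B, window=20, min_identity=0.7))
    # A motif-sized window recovers the shared site at its start position.
    print(conserved_windows(SPECIES_A, SPECIES_B, window=6, min_identity=1.0))
```

With a broad window, the six matching positions are diluted by the divergent flanks and no window passes the cutoff, whereas a motif-sized window finds the shared site. Very short windows, however, would also match by chance in real genomic sequence, which is one reason small conserved motifs are hard to identify reliably.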
A study of the embryonic enhancers of the even-skipped gene in Drosophila as compared to scavenger flies (Sepsidae) illustrates how this might occur (Hare et al., 2008). Although the DNA sequences of the enhancer regions in the two species are highly divergent, they function to accomplish embryonic patterns of even-skipped expression that are nearly identical. Conservation of small sequence motifs was also excluded. Thus, highly specific function of a set of distal regulatory elements can be conserved even when DNA sequence is not. Speculation as to how this can occur has focused on the possibility of compensatory mutations, that is, pairs of mutations that together are not as deleterious as would be expected for single mutations (Veitia, 2008). In addition, an enhancer could conceivably recruit the same activating complex, but interact with different components of it. Over time, the entire sequence of an enhancer can be transformed while maintaining function.
Some caution in generalizing from these studies is warranted, in that thus far they have exclusively utilized transformed cell lines, and so it is not yet clear that primary tissues follow the same pattern. Still, these genome-wide studies of histone modification patterns and transcription factor binding strongly suggest that, in at least some metazoans, the genome is rife with promoter-distal sequences that represent the dominant regulatory elements in gene expression.