As a complementary strategy to comparative genomic methods, it has recently become possible to generate genome-wide maps of chromatin marks that can be used to identify the location of enhancers and other regulatory regions. These genomic approaches have been enabled by (a) an improved understanding of the proteins and epigenetic marks found at particular categories of regulatory elements and (b) concurrently developed technologies that allow traditional chromatin-immunoprecipitation techniques to be applied on the scale of whole vertebrate genomes. In particular, the initial in-depth studies of 1% of the genome in the ENCODE pilot project, largely based on datasets generated by the ChIP-chip technique (
Box 1), revealed molecular properties of a variety of regulatory elements. With respect to enhancer identification, a particularly relevant insight was the identification of specific methylation signatures found at enhancers. In contrast to promoters, which are marked by trimethylation of histone H3 at lysine residue 4 (H3K4me3), active enhancers are marked by monomethylation (H3K4me1) at this position
38. Mapping these marks in the ENCODE regions and, more recently, throughout the entire genome
39 revealed tens of thousands of elements that were predicted to be active enhancers in the examined cell types. Importantly, these predicted enhancers were also frequently associated with the transcriptional coactivators p300 and/or TRAP220, raising the possibility that such coactivators might represent useful general markers for mapping enhancers. It was initially unclear to what extent the presence of transcriptional coactivators such as p300 distinguishes active from inactive enhancers. However, a comparison of DNaseI hypersensitivity (DNaseI HS, a marker of open chromatin structure) in several cell lines throughout the ENCODE regions revealed that the locations of cell line-specific distal DNaseI HS sites correlate with cell line-specific p300 binding at these sites. This provides further support for the possibility that transcriptional coactivators, along with histone modification signatures, may be useful for mapping DNA elements with cell- and tissue-specific enhancer activities
40.
Box 1. Mapping of Regulatory Elements by ChIP-chip and ChIP-seq
Formaldehyde cross-linking of DNA to proteins that bind to it directly or as part of larger complexes
70 combined with subsequent immunoprecipitation targeting specific DNA-associated proteins (ChIP,
71) has been widely used in the pre-genomic era to study protein-DNA interactions directly in living cells. The technique involves the molecular fixation of non-covalent protein-DNA interactions, shearing of the cross-linked chromatin, immunoprecipitation with an antibody binding the protein (or protein modification) of interest, and subsequent quantitation of enrichment of the associated DNA fragments compared to non-immunoprecipitated (“input”) DNA. While useful to examine protein-DNA interactions at individual hypothesized binding locations, the need for quantitation at every single site of interest initially thwarted the application of this technique on a genomic scale. The introduction of DNA microarrays enabled hybridization-based quantitation of large numbers of candidate sites in parallel (“ChIP-on-chip” or “ChIP-chip”), thus making it possible to screen in a single experiment entire compact model organism genomes
72,73 or large vertebrate genome intervals
74. This technique was used on a massive scale in the Encyclopedia of DNA Elements (ENCODE) pilot project, where dozens of proteins and protein modifications were initially mapped in a representative 1% portion of the human genome
57.
Recently, chromatin immunoprecipitation coupled to massively parallel sequencing (ChIP-seq) has become increasingly utilized as an alternative to ChIP-chip
42–45. The experimental setup of ChIP-seq is very similar to that of ChIP-chip, except that in the final step, massively parallel sequencing is used to determine the sequence of the immunoprecipitated DNA fragments, which are then computationally mapped to the reference genome. Improved sequencing technologies make it possible to obtain millions of mappable reads in a single ChIP-seq experiment at moderate cost. The results from ChIP-seq are based on statistical analysis of read counts, which overcomes many of the challenges associated with the quantitation and normalization of hybridization signals, and an increasing number of advanced computational ChIP-seq analysis tools are becoming available
75. By default, ChIP-seq analysis covers the entire mappable portion of the reference genome, without the need to restrict the analysis to subregions.
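The read-count statistics mentioned above can be illustrated with a minimal sketch. It assumes a simple Poisson background model in which the expected ChIP read count in a window is estimated from the input (non-immunoprecipitated) sample after scaling for sequencing depth. The function names, the pseudocount, and the choice of a Poisson model are illustrative assumptions and do not correspond to any particular published analysis tool.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), by direct summation."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k <= 0:
        return 1.0
    # Sum P(X = i) for i = 0..k-1; each term is evaluated in log space
    # so that large factorials do not overflow.
    cdf = sum(
        math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        for i in range(k)
    )
    return max(0.0, 1.0 - cdf)


def window_enrichment(chip_count, input_count, chip_total, input_total,
                      pseudocount=1.0):
    """Fold enrichment and Poisson p-value for one genomic window.

    chip_count / input_count: reads mapped to the window in each sample.
    chip_total / input_total: total mapped reads in each sample.
    pseudocount: hypothetical smoothing value that keeps the expected
    count positive when the input sample has no reads in the window.
    """
    scale = chip_total / input_total           # depth normalization
    expected = (input_count + pseudocount) * scale
    fold = chip_count / expected
    p_value = poisson_sf(chip_count, expected)
    return fold, p_value


if __name__ == "__main__":
    # Hypothetical window: 50 ChIP reads vs. 4 input reads, equal depth.
    fold, p_value = window_enrichment(50, 4, 1_000_000, 1_000_000)
    print(f"fold enrichment = {fold:.1f}, p = {p_value:.3g}")
```

In a genome-wide analysis, such a test would be applied to every window, so the resulting p-values would additionally require correction for multiple testing.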
Thanks to the development of the ChIP-seq technique (
Box 1), which has now superseded ChIP-chip as the method of choice for many applications, genome-wide maps for a considerable number of chromatin marks and transcription factors both in human and mouse have become available
41–53. In addition to the H3K4me1/3 signature discussed above, these datasets enabled the identification of additional chromatin marks present at predicted or validated enhancers and provided a refined view of their correlation to enhancer activities
42,49,53. However, with very few exceptions (e.g., references
48,52), genome-wide mapping of these and other regulation-associated chromatin marks was done in immortalized cell lines, cultured stem cells or primary cell cultures. Thus, the maps of potentially enhancer-associated marks produced by these studies provided limited insight into their
in vivo distribution during embryonic development and in adult organs, likely concealing the genomic location of enhancers that are inactive in these cells.
Table 1. Selected major categories of noncoding functional elements.
In a recent ChIP-seq study targeted at the prediction of enhancers that are active in a particular tissue during embryonic development, the transcriptional coactivator p300 was mapped in chromatin directly derived from embryonic mouse tissues including the forebrain, the midbrain, and the limb buds
54. Overall, several thousand p300 peaks were identified in these three tissues. The vast majority of the corresponding genomic regions were significantly enriched in only one of the three tissues and were located in noncoding regions distal to known promoters. Transgenic mouse experiments with close to a hundred of these sequences revealed that they are in almost all cases developmental enhancers. More importantly, the tissue-specific occupancy by p300 as identified by ChIP-seq could in most cases also accurately predict the
in vivo patterns of expression driven by these enhancers, providing an important advantage over comparative genomic methods for enhancer identification. The study also showed that tissue-specific p300 peaks are globally enriched near genes that are expressed in the same tissue, again consistent with their hypothesized function as active transcriptional enhancers.
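As an illustration of how tissue specificity of peaks might be assessed computationally, the following sketch merges overlapping peaks from several tissues into union regions and reports the fraction of regions supported by a single tissue. The peak coordinates, tissue labels, and function names are hypothetical, and the overlap rule (any shared base, half-open coordinates) is an assumption rather than the procedure used in the study discussed above.

```python
def merge_peaks_by_tissue(peaks):
    """Merge overlapping peaks and record which tissues contribute.

    peaks: iterable of (chrom, start, end, tissue) with half-open
    coordinates. Returns a list of (chrom, start, end, tissues), where
    tissues is a frozenset of tissue labels.
    """
    merged = []
    for chrom, start, end, tissue in sorted(peaks):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start < last[2]:
            # Peak overlaps the current union region: extend it.
            merged[-1] = (chrom, last[1], max(last[2], end),
                          last[3] | {tissue})
        else:
            merged.append((chrom, start, end, frozenset([tissue])))
    return merged


def tissue_specific_fraction(merged):
    """Fraction of union regions with a peak in exactly one tissue."""
    if not merged:
        return 0.0
    specific = sum(1 for region in merged if len(region[3]) == 1)
    return specific / len(merged)


if __name__ == "__main__":
    # Hypothetical peaks, for illustration only.
    example = [
        ("chr1", 100, 200, "forebrain"),
        ("chr1", 150, 250, "midbrain"),
        ("chr1", 500, 600, "limb"),
        ("chr2", 100, 200, "forebrain"),
    ]
    regions = merge_peaks_by_tissue(example)
    print(regions)
    print(tissue_specific_fraction(regions))
```

A real analysis would typically also require that a region lacks significant enrichment in the other tissues, rather than merely lacking a called peak.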
These experimentally predicted genome-wide sets of
in vivo enhancers also made it possible to address the controversial question of the extent to which evolutionary conservation is a hallmark of
in vivo enhancers
55. Several studies have shown that highly conserved noncoding elements are enriched in developmental
in vivo enhancers
31–33. However, some observations have challenged such a generalized correlation between sequence conservation and enhancer activity: (1) experimental analysis of individual loci suggested that a large proportion of enhancers cannot be detected by comparative genomics
56, (2) a surprisingly large fraction of sequences in the ENCODE regions whose molecular marks suggest regulatory functions were not or only weakly conserved
57, and (3) histone methylations present at orthologous loci in human and mouse did not correlate with overall increased levels of sequence conservation
58. In contrast to these findings, approximately 90% of the tissue-specific p300 peaks identified by ChIP-seq in developing mouse tissues overlapped regions that are under detectable evolutionary constraint
54. While there may be variation in the degree of evolutionary constraint of enhancers that are active in different types of cells or developing tissues, these data suggest that developmental enhancers that can be identified through p300-binding are commonly evolutionarily constrained.
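The kind of overlap statistic cited above (the fraction of peaks that overlap constrained regions) can be computed with a simple interval query. The sketch below assumes half-open genomic coordinates and counts any shared base as an overlap; the function names and the example intervals are hypothetical and do not reproduce the data or thresholds of the original analysis.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(elements):
    """Index (chrom, start, end) intervals for fast overlap queries.

    Overlapping or adjacent intervals are merged so that, per
    chromosome, the start and end lists are both sorted.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in elements:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts, ends = [], []
        for start, end in intervals:
            if ends and start <= ends[-1]:
                ends[-1] = max(ends[-1], end)
            else:
                starts.append(start)
                ends.append(end)
        index[chrom] = (starts, ends)
    return index


def overlaps(index, chrom, start, end):
    """True if [start, end) shares at least one base with the index."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # First merged interval that ends after the query start.
    i = bisect_right(ends, start)
    return i < len(starts) and starts[i] < end


def fraction_overlapping(peaks, elements):
    """Fraction of peaks overlapping at least one element."""
    peaks = list(peaks)
    if not peaks:
        return 0.0
    index = build_index(elements)
    hits = sum(overlaps(index, c, s, e) for c, s, e in peaks)
    return hits / len(peaks)


if __name__ == "__main__":
    # Hypothetical constrained elements and peaks, for illustration only.
    constrained = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 60)]
    peaks = [("chr1", 250, 260), ("chr1", 300, 310),
             ("chr2", 10, 55), ("chr3", 1, 2)]
    print(fraction_overlapping(peaks, constrained))
```

Interpreting such a fraction requires a matched background, for example the overlap obtained for randomly placed regions of the same size, because constrained elements cover a non-trivial portion of the genome.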
Although this field is in its infancy, the selected studies reviewed here highlight the clear potential of mapping various chromatin marks for identifying transcriptional enhancers and predicting their activity on a genome-wide scale. Continued progress in the throughput and cost of next-generation sequencing technologies offers an increasingly powerful genome-wide means of identifying specific DNA-protein interactions. We anticipate that high-resolution genome-wide in vivo maps of chromatin marks will become available for comprehensive series of developing and adult tissues in normal as well as disease states, providing multi-layered in vivo annotations of the noncoding portion of our genome. It is important to realize that, despite this expected progress, we will continue to need parallel in vitro and in vivo biological studies to understand the functions associated with chromatin marks and to conclusively study the mechanisms by which sequence variation in distant-acting enhancers contributes to disease.