Recent years have seen unprecedented characterization of mammalian chromatin thanks to advances in chromatin assays, antibody development and genomics. Genome-wide maps of chromatin state can now be readily acquired using microarrays or next-generation sequencing technologies. These datasets reveal local and long-range chromatin patterns that offer insight into the locations and functions of underlying regulatory elements and genes. These patterns are dynamic across developmental stages and lineages. Global studies of chromatin in embryonic stem cells have led to intriguing hypotheses regarding Polycomb/trithorax and RNA polymerase roles in ‘poising’ transcription. Chromatin state maps thus provide a rich resource for understanding chromatin at a ‘systems level’, and a starting point for mechanistic studies aimed at defining epigenetic controls that underlie development.
Once conceived as an inert scaffold to package feet of DNA into each cell, chromatin is now recognized to have a multiplicity of functions in genome regulation. The basic building block of chromatin, the nucleosome, comprises an octamer of histone proteins wrapped by ~146 basepairs of DNA. Histone proteins are subject to over one hundred known post-translational modifications, including acetylation, methylation, ADP-ribosylation, ubiquitination, and phosphorylation. These modifications occur on the side chains of specific residues in the histone tails and cores and functionally impact transcription, replication, recombination, and repair. Accordingly, chromatin modifying proteins can show cell type-specific phenotypes in gain- or loss-of-function studies [1–3]. Lysine acetylation and methylation are both reversible processes thanks to the activities of histone deacetylases and recently discovered demethylases [1,4]. Both of these enzyme families are implicated in a range of developmental and physiologic pathways as well as in malignancy [1,5,6].
Our ability to interrogate chromatin structure has been transformed in the past few years by a convergence of advances in chromatin antibody development and genomic technologies. Since its introduction 20 years ago, the chromatin immunoprecipitation (ChIP) assay has gradually become a mainstay for chromatin biology experimentation [7–9]. This procedure uses specific antibodies to enrich genomic DNA associated with a particular histone modification or chromosomal protein in vivo. The enriched DNA can then be interrogated by PCR (ChIP-PCR), microarray hybridization (ChIP-chip), or high-throughput sequencing (ChIP-Seq). PCR can be used to query a limited number of genomic positions – e.g., a small panel of known promoters. In contrast, high-density microarrays with probes placed at regular intervals (‘tiling arrays’) can be used to query large domains such as chromosomes or the whole genome, although the latter can be costly (reviewed in ). More recently, next-generation sequencing platforms have been applied to chromatin state mapping [11,12]. These new technologies can sequence ChIP DNA at sufficient depth to enable accurate and high-resolution whole genome analysis. Several technical advantages, including rapid throughput, high resolution, better genome coverage and the ability to interrogate comparatively small amounts of ChIP DNA, make the sequencing approach particularly powerful.
The power of genomic technologies to inform accurately on biology is contingent on high quality reagents. A vast number of chromatin modifications and structural proteins have been identified to date, and an immense number of antibody reagents have been raised against these epitopes. However, the specificity of these reagents and their efficacy in ChIP may vary widely depending on the different epitopes, antibody sources, and bleeds. As with any chromatin assay, precise standardization of these starting reagents is essential for effective genome-scale characterization.
Genome-scale chromatin assays also require large numbers of cells, and homogeneity of the input population is critical for clarity of data interpretation. Because of this requirement, most experiments to date have focused on cell lines that can be readily expanded in culture. However, recent studies have leveraged techniques in stem cell biology, in vitro differentiation and cell sorting to obtain populations of high biological interest. Genome-wide maps for a number of chromatin marks have now been reported for human and mouse embryonic stem (ES) cells, neural progenitor cells, fibroblasts, hepatocytes and T-cells [11–16].
Chromatin packaging in ES cells is of particular interest as increasing evidence suggests that it plays a critical role in maintaining pluripotency. Since ES cells are derived from the inner cell mass of the blastocyst at a developmental time point that follows large-scale erasure and re-establishment of epigenetic modifications, chromatin state maps for ES cells may provide unique insights into the reprogramming process. The maps also offer a framework for understanding differentiation and lineage-specification, and how these processes alter and are altered by the chromatin state.
The chromatin landscapes of human and mouse ES cells have been systematically characterized by a series of large-scale studies over the past few years [11,13–16,18,19]. Overall, the studies display a remarkable degree of concordance.
Studies in the model system S. cerevisiae correlated histone H3 lysine 4 (H3K4) tri-methylation with RNA polymerase II initiation, while H3 lysine 36 (H3K36) tri-methylation has been correlated with elongation. Similar results have been obtained for mammalian cells where genome maps of H3K4 and H3K36 tri-methylation reveal the promoters and transcribed regions of protein coding genes as well as non-coding transcripts (Figure 1) [11–13,21,22].
In ES cells, H3K36 tri-methylation levels as determined by ChIP-Seq or ChIP-chip correlate strongly with RNA expression levels. H3K4 tri-methylation also correlates with expression, but to a lesser extent. This reflects the fact that in ES cells virtually all promoters with annotated CpG islands or otherwise high CG-density carry H3K4 tri-methylation [11,13–15]. Many CG-rich promoters regulate constitutive house-keeping genes, but others correspond to developmental regulators and signaling proteins that are not expressed in ES cells (see below). The roughly 25% of promoters that lack H3K4 tri-methylation have low CG content, frequently exist in clusters, and tend to correspond to clustered genes that encode tissue-specific proteins such as olfactory receptors or keratin proteins [11,13].
The disconnect between H3K4 tri-methylation and expression may reflect an unexpectedly widespread role for ‘poised polymerase’ at inactive promoters. RNA polymerase II can be detected at the promoters of >50% of genes, including many that do not show evidence of elongation as judged by mRNA expression and H3K36 tri-methylation levels. Many of these ‘inactive’ promoters generate short abortive transcripts, suggesting they are initiating but are unable to transition to elongation. Thus, regulation of the transition to elongation may play an important role in titrating gene expression and in keeping certain inactive genes poised for induction (Figure 2). Although the underlying mechanisms remain elusive, the phenomenon has also been observed in other systems, including B-cells, T-cells and Drosophila [12,13,23].
So what is the functional relevance of H3K4 tri-methylation at active and inactive promoters? This mark may serve as a general means for recruiting the transcriptional machinery or ensuring its access to the promoter. Indeed, several components of the transcriptional machinery contain domains that recognize the H3K4 tri-methylated histone tail [24–26]. Tissue-specific expression of the marked genes would then be controlled by additional factors that titrate initiation rates or regulate the transition to elongation [13,15,20]. In addition, H3K4 tri-methylation may protect inactive CG-rich promoters from DNA methylation by preventing engagement by Dnmt3L, a regulatory cofactor of the de novo DNA methyltransferases [27,28].
An extensive body of literature, largely focused on the Drosophila model system, has defined fundamental roles for trithorax and Polycomb protein complexes in the establishment and maintenance of lineage-specific gene expression [29,30]. Trithorax proteins are a diverse set of transcriptional activators, several of which catalyze or physically interact with H3K4 methylation. Polycomb proteins are transcriptional repressors that catalyze and bind histone H3 lysine 27 (H3K27) methylation. The patterns of H3K27 tri-methylation and Polycomb protein binding in ES cells have been systematically characterized by a series of studies [11,14–16,18,19]. The repressive modification is evident at roughly 20% of promoters in ES cells. Its localization correlates with that of the Polycomb repressive complex 2, which catalyzes H3K27 tri-methylation [16,19]. Genes targeted by Polycomb and marked by H3K27 tri-methylation are transcriptionally silent in ES cells and include a large number that encode developmental regulators as well as key signaling proteins.
The unexpected observation that virtually all Polycomb targets in ES cells display the activating H3K4 tri-methylation mark in addition to the repressive H3K27 tri-methylation mark led to the hypothesis that a ‘bivalent’ chromatin state may poise these genes for subsequent activation (Figure 2) [18,31]. Consistent with this model, a high proportion of Polycomb targets, including key developmental regulators, are rapidly induced upon ES cell differentiation. The activated genes become selectively marked by H3K4 tri-methylation in differentiated populations. In contrast, non-induced genes tend to lose the activating mark and selectively retain repressive H3K27 tri-methylation (Figure 3).
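The resolution of bivalent promoters described above can be illustrated with a minimal sketch. It assumes that each promoter has already been reduced to two binary calls (mark enriched or not) in ES cells and in a differentiated population. The names `PromoterMarks`, `classify_promoter` and `tally_bivalent_fates`, and the gene identifiers, are hypothetical and are not taken from any of the cited studies.

```python
from collections import Counter
from typing import Dict, NamedTuple


class PromoterMarks(NamedTuple):
    """Hypothetical binary calls for one promoter: True if the mark is enriched."""
    h3k4me3: bool
    h3k27me3: bool


def classify_promoter(marks: PromoterMarks) -> str:
    """Assign one of four chromatin states from the two binary mark calls."""
    if marks.h3k4me3 and marks.h3k27me3:
        return "bivalent"
    if marks.h3k4me3:
        return "H3K4me3 only"
    if marks.h3k27me3:
        return "H3K27me3 only"
    return "neither"


def tally_bivalent_fates(
    es: Dict[str, PromoterMarks],
    differentiated: Dict[str, PromoterMarks],
) -> Counter:
    """Count the differentiated-cell state of every promoter that is bivalent in ES cells."""
    fates: Counter = Counter()
    for gene, marks in es.items():
        if classify_promoter(marks) != "bivalent":
            continue  # only follow promoters that start out bivalent
        if gene in differentiated:
            fates[classify_promoter(differentiated[gene])] += 1
    return fates


# Toy input with made-up gene identifiers (illustration only, not real data).
es_calls = {
    "geneA": PromoterMarks(True, True),
    "geneB": PromoterMarks(True, True),
    "geneC": PromoterMarks(True, False),
    "geneD": PromoterMarks(True, True),
}
diff_calls = {
    "geneA": PromoterMarks(True, False),   # induced: keeps H3K4me3
    "geneB": PromoterMarks(False, True),   # not induced: keeps H3K27me3
    "geneC": PromoterMarks(True, False),   # never bivalent, ignored in the tally
    "geneD": PromoterMarks(True, True),    # remains bivalent
}
fates = tally_bivalent_fates(es_calls, diff_calls)
```

Real analyses work from quantitative enrichment rather than binary calls, so the choice of enrichment threshold would strongly affect such a tally.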
Studies on multipotent and committed cells offer insight and generate new questions regarding the bivalent domain hypothesis [11,18]. In ES cell–derived neural progenitors, most bivalent domains resolve in accord with expression changes (Figure 3). A small subset of bivalent promoters is retained in neural progenitors, and many correspond to genes with glial or neuronal functions whose fate is yet to be determined. Whether the retention of H3K4 tri-methylation indeed renders these promoters more readily activated than those that retain only H3K27 tri-methylation is an open question. Such a finding could suggest a broad role for bivalent domains in hierarchical differentiation. However, other studies have revealed potential bivalent promoters in differentiated T-cells as well as evidence for de novo formation of bivalent chromatin during ES cell differentiation (e.g., at the OCT4 promoter) [12,14]. These findings are not readily explained by such a hierarchical model.
An open question remains as to why bivalent chromatin is evident at so many promoters in ES cells. Estimates of the number of bivalent promoters from various studies are fairly consistent at around two to three thousand genes [11,14,15]. In contrast, differentiated cells contain far fewer examples. The complete set of bivalent genes could represent key regulatory targets of Polycomb and trithorax that are inactive in ES cells but important for downstream differentiation pathways. However, an alternate explanation is that only a subset of the bivalent genes are subject to such epigenetic controls, while others reflect distinct phenomena. Notably, those targets that encode key developmental regulators are marked by particularly expansive regions containing robust signals for the opposing histone modifications [11,16]. Furthermore, only a few ‘bivalent’ targets have been rigorously confirmed by sequential ChIP to have concomitant H3K4 and H3K27 tri-methylation on the same chromatin segment. Identification of the most relevant set of gene targets and the functional import of the bivalent chromatin state in ES cells awaits further investigation.
Although clearly distinct, the stalled polymerase and bivalent chromatin domain models are not mutually exclusive. A recent study found evidence for poised polymerase at a panel of bivalent gene promoters in ES cells.
The uninterrupted, high resolution maps of chromatin modification patterns across genomic loci achieved with tiling arrays and ChIP-Seq methodology have revealed remarkably expansive regions with continuous chromatin markings. Domains of H3K27 methylation, H3K4 methylation and/or histone acetylation extending 10 or even 100 kilobases are evident at the mammalian Hox clusters and other evolutionarily conserved developmental loci [16,21,33]. The domain patterns vary among different cell types and likely function in the maintenance of gene expression or repression. Chromatin domains are attractive models of ‘epigenetic’ memory as their large size could ensure robust transmission through cell division [34,35].
Studies in model organisms have identified specific sequence elements capable of halting the propagation of a given chromatin state along the chromosomal DNA. These boundary elements play an important role in defining chromatin domain structure in such organisms. The mechanisms of boundary formation in mammalian cells remain poorly understood, but some insights have emerged in recent publications.
Non-coding RNAs, well known for their roles in silencing the inactive X chromosome and other loci, are potential candidates for regulating the establishment of chromatin domains and their boundaries [37–40]. A recent microarray study identified many non-coding RNAs in the mammalian Hox clusters. Remarkably, antisense knock-down of a non-coding RNA expressed at the edge of a chromatin domain in HoxC caused expansion of a chromatin domain in HoxD. The authors proposed that an interaction between the non-coding RNA and the Polycomb protein Suz12 mediates silencing of the HoxD locus in trans. Global studies are documenting an increasing repertoire of RNAs without apparent coding potential. Although speculative, the potential for non-coding RNAs to impact the chromatin landscape is considerable.
Chromatin insulators may also demarcate chromatin boundaries. In Drosophila, the gypsy insulator element can block the silencing influence of a Polycomb response element and halt the spread of associated H3K27 tri-methylation. In vertebrates, the major insulator protein identified is CTCF, which binds to a highly conserved DNA sequence motif [43,44]. Studies using ChIP-chip and ChIP-Seq found that CTCF binding is frequently invariant across tissues [12,43]. CTCF often binds between genes that are in close genomic proximity but have distinct expression patterns and opposing chromatin modifications [12,43]. Additional CTCF binding sites are evident between alternative promoters of a single gene, and may allow tissue-specific promoter usage. A functional role for CTCF in influencing the locations and boundaries of mammalian chromatin domains, though intriguing, has yet to be documented.
State maps have also revealed chromatin signatures associated with a variety of other functional genomic elements. By mapping an array of histone modifications and chromatin proteins across a common 30 Mb portion of the genome, a common signature of enhancer elements has been identified [45,46]. It was found that the presence of the histone acetyltransferase p300 along with mono-methylated and di-methylated H3K4 could effectively predict novel enhancers and distinguish them from promoters, which are marked instead by tri-methylated H3K4. This chromatin signature was used to identify a novel enhancer element for the SLC22A5 gene. Notably, the enhancer elements also tend to show hypersensitivity to DNase in genomic assays, but lack the nucleosome-free region shown to be a general characteristic of transcription start sites [47,48].
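The logic of this signature can be sketched as a simple rule. The example below is a deliberately simplified threshold rule for illustration, not the prediction procedure used in the cited studies. The function name `label_element`, the use of fold-enrichment values, and the default threshold of 2.0 are all assumptions.

```python
def label_element(
    p300: float,
    h3k4me1: float,
    h3k4me3: float,
    threshold: float = 2.0,
) -> str:
    """Label a candidate region from hypothetical fold-enrichment values.

    Tri-methylated H3K4 is checked first, so a region carrying both
    H3K4me3 and the enhancer marks is labelled promoter-like.
    """
    if h3k4me3 >= threshold:
        return "promoter-like"
    if p300 >= threshold and h3k4me1 >= threshold:
        return "enhancer-like"
    return "unclassified"


# Made-up enrichment values, for illustration only.
distal_site = label_element(p300=3.0, h3k4me1=4.0, h3k4me3=0.5)
start_site = label_element(p300=1.0, h3k4me1=0.5, h3k4me3=5.0)
background = label_element(p300=0.1, h3k4me1=0.2, h3k4me3=0.3)
```

A fixed cutoff ignores the shape of the signal around each site, so this rule only conveys the qualitative distinction between the two classes of element.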
More broadly, chromatin state maps may be generally applicable for identifying the locations and cell type-specificities of functional DNA elements. An example includes the combined use of H3K4 and H3K36 tri-methylation to identify promoters and primary transcripts of microRNAs and other small RNAs. These entities are otherwise difficult to characterize due to rapid processing in the nucleus. In addition, the repressive histone modification H3K9 tri-methylation marks certain repeat classes as well as imprinting control regions, and may thus also have potential for identifying novel sites of epigenetic regulation [11,49].
Technological developments are transforming our ability to interrogate chromatin state and introducing exciting opportunities as well as formidable challenges. It is becoming both routine and cost-effective to acquire genome-wide maps for a rapidly expanding panel of chromatin epitopes in any cell type that can be obtained in sufficient numbers. State maps will provide a powerful and increasingly routine means for characterizing novel chromatin marks; i.e., by comparing their localization to known modifications and genome annotations, and/or following their dynamics upon perturbation of cell state. Alternatively, the phenotype and differentiation potential of a stem cell population might be ‘read out’ by mapping informative chromatin modifications and comparing the chromatin patterns against in vivo tissues. Once a critical mass of these ‘normal’ cell populations has been characterized, the chromatin states could be compared against cell models of human disease to potentially identify underlying defects.
The next few years will see the generation of a vast number of chromatin state maps. Effective interpretation of this data windfall will require major investment in computational infrastructure. A single state map consists of millions of contiguous datapoints (depending on resolution) that describe enrichment as a function of genomic position. Enriched regions, domains and chromatin boundaries must be defined systematically – suitable algorithms are under development, although they will need to be streamlined in order to handle the sheer mass of data and made generally accessible. Maps for multiple modifications and different cell types must then be integrated and evaluated in the context of existing data and genome annotations.
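As a rough illustration of what defining enriched regions systematically can mean, the sketch below treats a state map as a list of enrichment values in fixed-width genomic bins and reports runs of consecutive bins at or above a cutoff. It is not one of the algorithms under development. The function name `call_enriched_regions`, the fixed threshold and the `min_bins` parameter are assumptions made for the example.

```python
from typing import List, Tuple


def call_enriched_regions(
    signal: List[float],
    bin_size: int,
    threshold: float,
    min_bins: int = 1,
) -> List[Tuple[int, int]]:
    """Return (start, end) coordinates of runs of bins with signal >= threshold.

    Coordinates are 0-based and half-open, relative to the start of the
    first bin. Runs shorter than min_bins are discarded.
    """
    regions: List[Tuple[int, int]] = []
    run_start = None  # index of the first bin in the current run, if any
    for i, value in enumerate(signal):
        if value >= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_bins:
                regions.append((run_start * bin_size, i * bin_size))
            run_start = None
    # Close a run that extends to the last bin.
    if run_start is not None and len(signal) - run_start >= min_bins:
        regions.append((run_start * bin_size, len(signal) * bin_size))
    return regions


# Made-up enrichment values in 200 bp bins, for illustration only.
toy_signal = [0.5, 3.0, 4.0, 0.2, 2.5, 0.1, 3.0, 3.0, 3.0]
toy_regions = call_enriched_regions(toy_signal, bin_size=200, threshold=2.0, min_bins=2)
```

With these toy values the isolated bin at index 4 is dropped by `min_bins=2`, which shows how a minimum-length filter suppresses single-bin noise. Broad domains would additionally require tolerance for short gaps, which this sketch does not attempt.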
Significant challenges remain at the levels of cell biology, reagent standardization and computational analysis before the full potential of emerging chromatin state mapping technologies can be realized. However, as each of these issues is addressed, we will be rewarded with systems-level views that greatly enhance our understanding of chromatin structure, its role in normal development, and how its dysfunction contributes to disease pathology.
We thank Richard Koche, Andrew Chi and Mezher Adli for critical reading of the manuscript. E.M. is supported by an institutional training grant from the National Cancer Institute. Research by the authors is supported by funds from the National Human Genome Research Institute, the Burroughs Wellcome Fund, the Culpeper Foundation, the Harvard Stem Cell Institute, Massachusetts General Hospital, and the Broad Institute of MIT and Harvard.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.