Covalent modifications of histone proteins play key roles in transcription, DNA repair, recombination, and other such processes. Over a hundred histone modifications have been described, and a popular idea in the field is that the function of a single histone mark cannot be understood without understanding its combinatorial co-occurrence with other marks, an idea generally called the “histone code hypothesis.” This idea is hotly debated, with increasing biochemical evidence for chromatin regulatory factors that bind to specific histone modification combinations, but functional and localization studies finding minimal combinatorial complexity in histone modification patterns. This review will focus on these contrasting results, and will briefly touch on possible ways to reconcile these conflicting views.
The great variety and number of covalent modifications found on proteins provides one of the great challenges of the post-genome era – understanding the functions of the tens of thousands of proteins encoded in a given genome is difficult enough without having to contend with billions of possible combinatorial modification states on individual proteins. Yet this is exactly the problem we are faced with in chromatin – the highly-conserved histone proteins are extensively modified post-translationally, with well over one hundred distinct modification sites described in the literature. Histone modifications include lysine acetylation, lysine methylation, arginine methylation, serine phosphorylation, lysine ubiquitination, and many others. This incredible diversity of histone modifications leads naturally to the question of what it all means – why do so many histone modifications occur in the cell? This question only becomes more vexing when considering that even in the past year mass spectrometry studies have identified scores of previously unknown histone modifications , vastly increasing the possible state space of combinatorial modifications. Perhaps the most-cited and most-debated concept in histone modifications is something called “the histone code”, and it is the purpose of this review to critically evaluate current evidence for and against the histone code hypothesis.
One pervasive semantic problem in the histone modification literature is that different investigators mean different things when they refer to the histone code, so that in many cases two papers can present nearly identical results yet claim that their respective results support or refute the histone code. There are multiple reasons for this confusion, ranging from the multiple dictionary definitions of the word “code” to the number of histone modification reviews that use the code metaphor. Hypotheses based on the code metaphor include the idea that histone modifications have binding partners, the idea that histone modification patterns are heritable, and the suggestion that histone modifications function in combinatorial patterns, among others.
I will focus here on the review from Strahl and Allis entitled “The Language of Covalent Histone Modifications”. This review focused on combinations of histone modifications, and has led to a hotly-contested notion that specific combinations of histone marks would specify unique biological outcomes. Even focusing on this one histone code hypothesis, there is a great deal of confusion and debate in the literature, with the same result often being used as evidence either for or against the “histone code”. However, I believe the most clearly implied hypothesis in the review centers on the concept that a given combination of modifications, such as H3K4me3/H4K16ac, would be “read” by a combination-specific protein or protein complex and thereby lead to a downstream event distinct from the readouts resulting from H3K4me3 alone or H4K16ac alone. It is important to note that even this prediction is subject to some level of ambiguity: if H4K5ac leads to a 2-fold increase in transcription, H4K8ac leads to a 2-fold increase, and H4K5acK8ac leads to a 4-fold increase, is this a “code”? Or does the effect of combinations in a code need to be qualitatively distinct from a linear combination of the two single effects? Nonetheless, this aspect of the histone code provides fodder for investigation. How many distinct modification combinations occur, and can specific combinations of modifications be matched with specific outcomes such that the code can be deciphered?
At present, an intellectual schism exists between biochemists on one hand, and geneticists and epigenomics researchers on the other. Genome-wide mapping of histone modifications invariably shows that histone modifications occur in groups of multiple highly-correlated modifications, demonstrating that the huge potential space of modification combinations is not utilized in vivo. Conversely, many chromatin regulatory proteins and complexes contain multiple binding domains specific for histone modifications, and in some cases have been shown to bind in vitro to a combination of modifications more strongly than to either single modification, suggesting that these “readers” must respond to some specific combination in vivo. These results are of course not incompatible – in most described cases, histone binding proteins tend to bind to combinations that occur widely in vivo, so binding sites certainly exist for such factors in vivo. The problem lies in the fact that such combinatorial binding does not therefore explain specificity in the function of chromatin regulators – if a factor binds to two marks that globally co-occur, then combinatorial histone modification patterns do not provide any additional insight into why a given histone modification only affects a small subset of genes at which it occurs in vivo. We believe that bridging this divide between localization, biochemistry, and function is the single greatest challenge facing the histone modification field currently.
The pro-complexity viewpoint comes from, and finds support in, the field of chromatin modifying enzyme biochemistry. Chromatin-regulating proteins often carry domains that have been shown to bind specifically to a type of covalently-modified amino acid – bromodomains bind to acetylated lysines, PHD fingers bind to methylated lysines, and so forth. Most of these domains bind to singly-modified histone peptides, although occasionally it appears that a single modification-binding domain can bind to a multiply-modified histone peptide – for example, bromodomain 1 from the mouse TAF1 homolog Brdt binds to H4K5acK8ac peptides with a Kd of 28 µM, but does not bind detectably to either singly modified peptide.
Many individual proteins include more than one of these domains, or in some cases protein dimerization leads to complexes with multiple histone-binding domains. For example, the Snf2 homolog and ATP-dependent remodeling protein Chd1 has two chromodomains, while the yeast RSC subunit Rsc1 carries two bromodomains. When considering entire protein complexes this potential for multivalent binding is obviously much greater, with large complexes such as TIP60 carrying as many as 10 distinct histone modification-binding domains.
This potential for multivalent binding is a significant motivator for the pro-complexity argument (Figure 1). Indeed, in some cases, proteins or complexes with multiple modification-specific domains have been shown to bind to combinatorial histone marks . For example, the Arabidopsis CMT3 chromodomain preferentially binds, as a dimer, to the doubly-modified H3K9me3K27me3 but not to either individual methylation state . As an example of an individual protein with multiple histone binding domains, Chd1 carries tandem chromodomains which bind with higher affinity to H3R2me2K4me3 peptides than to H3K4me3 alone . Another protein involved in ATP-dependent chromatin remodeling, BPTF, carries adjacent PHD and bromodomains. The bromodomain binds to peptides carrying one of three different acetylated H4 residues with affinities from 70 to 130 µM , whereas the PHD finger binds to H3K4me3 peptides with 2.7 µM affinity . Together, the PHD-Bromo module from this protein was shown to bind with a ~2-fold increased affinity to H3K4me3/H4K16ac mononucleosomes relative to nucleosomes lacking the H4K16ac  (as illustrated in Figure 1A). Peptide microarray studies also showed an increase in binding between this PHD-Bromo domain construct when H3K4me3 was combined with many different H3 acetyl states  (which were not tested in the mononucleosome-binding assay of Ruthenburg et al), consistent with the relative promiscuity of bromodomains for acetyl-lysines.
In other (seemingly more numerous) cases, modification-specific binding by a given domain is inhibited by nearby modifications (Figure 1B). Most famously, binding of HP1 to H3K9me3 is inhibited by phosphorylation of the adjacent H3S10. A similar case was described for a PHD-Bromo cassette in TRIM24, which binds preferentially to H3K23ac, but whose binding is inhibited by methylation at H3K4. Inhibition of modification-specific binding by H3K4 methylation also occurs for the ADD (Atrx, Dnmt3, Dnmt3L) domain of Atrx, which binds to H3 tail peptides carrying H3K9me3 or me2, but whose binding is inhibited by H3K4 methylation [14, 15]. It is worth pointing out in such cases that combinatorial complexity remains low – for HP1 binding, three of four combinations of marks (unmodified H3, H3S10P, and H3K9me3S10P) lead to the same output (no HP1 binding), so the four possible combinations still only yield two biological outputs.
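The HP1 example can be written out as a small truth table. The sketch below is illustrative only: the function name `hp1_binds` and the strict yes/no treatment of binding are simplifying assumptions, not a model taken from the cited studies.

```python
from itertools import product


def hp1_binds(k9me3: bool, s10ph: bool) -> bool:
    """Toy rule taken from the text: HP1 binds H3K9me3 unless H3S10 is phosphorylated."""
    return k9me3 and not s10ph


# Enumerate all four presence/absence combinations of the two marks.
outputs = {
    (k9me3, s10ph): hp1_binds(k9me3, s10ph)
    for k9me3, s10ph in product([False, True], repeat=2)
}

n_states = len(outputs)                 # combinations of the two marks
n_outputs = len(set(outputs.values()))  # distinct readouts (bound / not bound)
n_bound = sum(outputs.values())         # combinations that permit binding

for (k9me3, s10ph), bound in outputs.items():
    print(f"K9me3={k9me3!s:5} S10ph={s10ph!s:5} -> HP1 bound: {bound}")
```

Under this toy rule, only the H3K9me3-without-S10P state permits binding, so the four mark combinations collapse to two readouts.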
Thus, in an increasing number of cases evidence supports the idea that “multivalent” modification binding can occur in vitro. However, the connections between biochemical binding events (which are generally of disconcertingly low affinities of micromolar and above) and in vivo function are often unclear. Specifically, functional studies in which transcriptional output in specific chromatin mutants is assayed often find little link between the presence of a chromatin regulator and its function (at least in transcription). Moreover, genome-wide mapping studies find little evidence for combinatorial complexity in histone modification patterns, and raise the question of whether combinatorial binding by proteins provides any meaningful biological discrimination (eg if states A and B always co-occur, then an AB-binding protein has no need to discriminate between A and AB). These issues are detailed below.
One systematic approach to understand the functions of combinatorial histone modifications has been the whole genome analysis of gene expression changes observed in histone mutants. Such studies typically find that histone mutants exhibit relatively little phenotypic complexity resulting from different combinations of histone mutants. This is seen both in studies focusing on specific histone point mutants and their combinations, as well as in studies focusing on deletion mutants of histone modifying enzymes.
An early study from the Wyrick lab examined all 5 single K→A mutations in the H3 N-terminal tail, as well as the quintuple mutant. Gene expression changes in the single mutants were modest, and were highly correlated with one another. In other words, the loss of H3K9 acetylation appears little different than the loss of H3K18 acetylation to the cell. A similar study on the H4 N-terminal tail systematically examined all 16 possible combinatorial mutations among the 4 lysines in this tail. Here, mutation of three of the residues (lysines 5, 8, and 12) had indistinguishable effects on gene expression, whereas lysine 16 was confirmed to have unique effects on gene expression. Furthermore, gene expression defects in combinatorial mutants were little different from linear combinations of the component mutations – in other words, the effect of the H4K5,16R double mutant on gene expression could be predicted by adding the K5R and K16R datasets together. Similar results are observed when combining deletions of chromatin regulatory factors.
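The additivity test described here can be sketched in a few lines. This is a minimal illustration, assuming the datasets are log2 expression ratios (mutant versus wild type); the gene values below are invented for the example and are not data from the cited studies, and the helper names are hypothetical.

```python
import math

# Hypothetical log2 expression changes for four genes (invented values).
k5r = [0.1, -0.2, 0.0, 0.3]
k16r = [1.0, -0.5, 0.8, 0.0]
double_observed = [1.1, -0.7, 0.8, 0.3]  # hypothetical K5,16R double mutant


def additive_prediction(a, b):
    """Predict the double mutant as the gene-by-gene sum of the single mutants."""
    return [x + y for x, y in zip(a, b)]


def max_abs_residual(observed, predicted):
    """Largest per-gene deviation from the additive expectation."""
    return max(abs(o - p) for o, p in zip(observed, predicted))


predicted = additive_prediction(k5r, k16r)
residual = max_abs_residual(double_observed, predicted)

# Two 2-fold effects giving a 4-fold effect are additive on the log2 scale.
two_fold_twice = math.log2(2) + math.log2(2)

print("max deviation from additivity:", residual)
print("log2(2) + log2(2) equals log2(4):", two_fold_twice == math.log2(4))
```

A residual near zero means the double mutant carries no information beyond its component single mutants, which is the pattern the text reports. A large residual at some genes would be the signature expected of a combination-specific readout.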
Perhaps more troubling, a dramatic mismatch is observed between mapping studies and gene expression analyses. For instance, H3K4me3 occurs universally at the 5’ ends of transcribed genes, yet mutation of H3K4 or of the K4 methylase has almost no effects on gene expression [19, 20]. This observation holds true for virtually all chromatin regulators studied, although it is not routinely pointed out. As one of a multitude of prominent examples, Polycomb group proteins in embryonic stem cells map to the promoters of key developmental regulators, yet various Polycomb knockdowns do not result in upregulation of most of these genes. This conundrum will be discussed below.
How many histone modification combinations occur in vivo? The advent of genome-wide localization analysis has enabled an unprecedented view of histone modifications. The first of these studies to specifically address the question of combinatorial histone modifications came from Schubeler and Groudine, who showed using ~1 kb resolution microarrays that many histone modifications were highly correlated in vivo. Similar results were obtained by the Grunstein lab for budding yeast, with strong correlations among a variety of histone acetylation states in vivo. Such results were soon obtained at mononucleosomal resolution in yeast, where the majority (>81%) of the variance in a set of 12 histone modifications could be captured using two groups of modifications (Figure 2).
These results have been seen over and over in ensuing years. In mammals the Zhao laboratory pioneered the genome-wide mapping of many histone methylation and acetylation states in CD4+ T cells [24, 25], and later analysis of these 41 chromatin marks showed that >56% of the variance in this dataset was captured using 3 principal components. Subsequently the ENCODE and reference epigenome projects have extended these results to a wide range of cell lines, with similar conclusions. Similar studies have been published for many species, most extensively D. melanogaster, A. thaliana, and C. elegans. In all cases, it is found that the potentially enormous space of possible combinatorial modification patterns can be productively compressed with little loss of information. For instance, low resolution analysis of 53 proteins (largely chromatin-related) by DamID in Drosophila revealed that 5 “colors” of chromatin could be used to classify the majority of the genome. At nucleosomal resolution, using 18 chromatin marks in Drosophila, the authors identified 9 distinct “combinatorial” histone modification patterns, dramatically compressing the potential state space down from ~2^18 (>250,000).
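The size of this compression is easy to check. The sketch below assumes, as the state-space estimate implicitly does, that each of the 18 marks is simply present or absent at a nucleosome; the variable names are illustrative.

```python
import math

n_marks = 18           # chromatin marks profiled at nucleosomal resolution
n_observed_states = 9  # distinct combinatorial patterns reported

# Assumption: each mark is binary (present/absent), so states multiply as 2^n.
n_possible = 2 ** n_marks

# Bits required to label a nucleosome's state in each description.
bits_full = n_marks
bits_needed = math.ceil(math.log2(n_observed_states))

fraction_used = n_observed_states / n_possible

print(f"possible states: {n_possible:,}")
print(f"bits to label a state: {bits_needed} instead of {bits_full}")
print(f"fraction of state space observed: {fraction_used:.2e}")
```

In other words, about 4 bits suffice to label a nucleosome's state where 18 would be needed if all combinations occurred.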
Such results are possible because of the strong co-occurrence patterns of histone modifications, a result also observed (but seldom remarked upon) in mass spectrometry studies. When individual chromatin “states” are defined from mapping studies, these typically describe a handful of genomic locations – promoters, introns, exons, enhancers, etc. At each location, many histone modifications co-occur. Most obviously, actively transcribed promoters, in all organisms studied, are associated with H3K4me3 as well as a range of histone tail acetylation marks such as H3K9ac and many others. Many reasons exist for such co-occurrence patterns, including low substrate specificity for some histone-modifying enzymes (particularly acetyltransferases), recruitment of multiple modifying factors to a genomic locus (as in, eg, multiple histone modifying enzymes traveling with elongating RNA Pol II), and the recruitment/regulation of histone modifying enzymes by other histone modifications (histone crosstalk).
The simplicity of histone modification patterns in vivo therefore raises questions about the meaning of combinatorial histone modification binding observed so often in vitro. Most relevant in my view is the question of whether combinatorial binding by chromatin regulators leads to any biological discrimination between genomic regions. The ability of a protein to bind to combination AB does not mean that the protein’s in vivo function is to distinguish between A and AB (or between B and AB). In other words, if two modifications universally co-occur in vivo and are located close to one another on a peptide, then a protein selected to bind to A will most likely bind to AB. But this does not mean that the protein plays a role in distinguishing A from AB in vivo.
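The argument can be made concrete with a toy genome. In the sketch below the loci, the mark names A, B, and C, and the helper `bound_loci` are all hypothetical; binding is reduced to "every required mark is present", which ignores affinity entirely.

```python
def bound_loci(required, loci):
    """Return the loci that carry every mark in `required`."""
    return {name for name, marks in loci.items() if required <= marks}


# Hypothetical genome in which marks A and B always co-occur.
cooccurring = {
    "locus1": {"A", "B"},
    "locus2": {"A", "B"},
    "locus3": set(),
    "locus4": {"C"},
}

a_targets = bound_loci({"A"}, cooccurring)
ab_targets = bound_loci({"A", "B"}, cooccurring)

# Add one locus carrying A without B: only now do the two readers differ.
separable = dict(cooccurring, locus5={"A"})
a_targets_sep = bound_loci({"A"}, separable)
ab_targets_sep = bound_loci({"A", "B"}, separable)

print("co-occurring marks, same targets:", a_targets == ab_targets)
print("loci bound by the A reader only:", a_targets_sep - ab_targets_sep)
```

When A and B always co-occur, a reader of A and a reader of AB select exactly the same loci, so the second binding domain adds no discrimination. The combination only becomes informative if loci carrying A without B actually exist.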
For instance, BPTF binding to H3K4me3 is enhanced by various H3 or H4 acetylation states [10, 11]. Since H3K4me3 typically co-occurs with a large number of H3/H4 acetylation marks in vivo [23, 27, 28], the role of BPTF’s bromodomain in genome-wide localization is likely to be minimal. Furthermore, other factors must play key roles in localization of chromatin factors such as BPTF – Ruthenburg et al compared BPTF localization genome-wide to H3K4me3, H4K12ac, and H4K16ac, and although the predicted overlaps are observed, even in the examples presented in Figure 6 of Ruthenburg et al several BPTF binding peaks cannot be explained by these histone marks. Many other examples of combinatorial histone modification binding proteins or complexes raise similar questions – the Atrx ADD domain binds to H3K9me3, a heterochromatin mark, and this binding is inhibited by methylation at H3K4, which typically occurs in euchromatin. Since these marks seldom if ever co-occur, it remains to be seen whether H3K4 methylation is ever used in vivo to evict Atrx from a locus previously carrying H3K9me3 without K4 methylation.
Taken together, we find little evidence at present to support the hypothesis that recruitment or activation of chromatin regulatory proteins by a specific combination of modifications leads to unique outcomes. Yet there is little question that some proteins do bind in vitro to specific modification combinations. How can we reconcile these views: why do most confirmed combination-specific binding proteins have in vivo functions that appear not to rely on combinatorial modifications? Below, we note several potential reasons for the disconnect between chromatin biochemistry and in vivo function.
As noted above, one prediction that might be made for the histone code is that proteins that bind strongly to a specific combination of modifications should be specifically found at genomic loci where this combination occurs. This prediction often fails to hold, as noted for BPTF above. Similarly, Chd1 binds preferentially to H3R2me2K4me3 in vitro, yet localizes very close to the TSS in embryonic stem cells , where H3K4me3 likely occurs without concomitant R2 methylation (which is limited to gene bodies). What role, then, does R2 methylation play in Chd1 biology?
An emerging theme in the biology of histone modification binding domains is that these domains may not be involved in recruiting chromatin regulators to specific genomic loci, but may instead play roles in allosterically regulating complex behavior. One of the clearest examples is the Rpd3S complex, a deacetylase complex conserved from yeast to humans. This complex carries a chromodomain protein Eaf3 (yeast), which has been shown to specifically bind to H3K36me3 over the middle and 3’ ends of coding regions [33, 34]. Deletion of the H3K36 methylase Set2 in yeast, or deletion of the Eaf3 chromodomain, both lead to increased histone acetylation over the bodies of transcribed genes . This suggested that H3K36me3 served to recruit this complex to coding regions, where the complex functioned to deacetylate histones. Surprisingly, it has recently been shown that deletion of Set2, or of the Eaf3 chromodomain, has no effect on localization of the Rpd3S complex . Instead, this complex is localized by binding to the S2P form of the Pol2 CTD (the same mark that recruits Set2). Thus, localization of the complex is unaffected by H3K36 methylation, while function of the complex is completely dependent on this mark, suggesting that H3K36me3 serves to activate the Rpd3S complex, rather than recruit it.
A clear example of allosteric regulation of chromatin complexes by modifications comes from elegant work from the Cairns lab on the RSC ATP-dependent chromatin remodeling complex. This complex contains several bromodomains, including a tandem bromodomain in Rsc4, giving it the combinatorial potential to bind to a very specific pattern of lysine acetylation in vivo. However, a crystal structure of Rsc4 found that while the first bromodomain could bind to H3K14ac, the second bromodomain bound to Rsc4 K25ac . Acetylation of this lysine by Gcn5 was shown to inhibit Rsc4 binding to histone peptides. This provides a mechanistically detailed example of allosteric regulation of chromatin regulatory complex function, rather than localization, by lysine modification, although in this case the modification occurs on the complex itself rather than on the histone proteins.
Regulation of other ATP-dependent remodeling enzymes by histone marks appears to be at the level of function rather than localization – the tandem chromodomain of Chd1 regulates its nucleosome binding and function, but not its genomic localization [38, 39]. Similarly, regulation of Drosophila ISWI by acetylation of H4K16 appears to occur largely at the level of ATPase activity rather than localization [40, 41].
Together, these examples provide one potential explanation for the disconnect between modification binding in vitro and in vivo localization of specific proteins. If a combinatorial binding protein is activated, rather than recruited, by one or more of its preferred combinations of histone modifications, this could explain why binding proteins with characterized histone binding specificities do not exhibit predicted localization patterns.
We next come to the question of why genetic analyses of histone mutants do not provide any evidence supporting a requirement for combinatorial modification patterns for the in vivo functions of chromatin regulators. In general, there is a vexing mismatch between histone modification occurrence and importance, even for single modifications . In other words, H3K36me3 is a universal mark found over coding regions and has been termed an “elongation” mark, yet H3K36A mutants, or set2Δ mutants, affect a very small subset of genes in yeast (and do not exhibit general defects in transcriptional elongation).
In perhaps the most egregious example, H3K4me3 is universally found at the 5’ ends of genes [23, 24, 43–45], at levels that scale with transcription of the associated gene (or, at genes with paused RNA polymerase, at levels that scale with polymerase abundance). H3K4me3 is linked to gene activation, as Trithorax group proteins in Drosophila required for maintenance of a gene’s active state during development play roles in H3K4 methylation, or are recruited by H3K4 methylation . Yet in budding yeast, deletion of the K4 methylase Set1 is well-tolerated, resulting primarily in derepression of ~50–100 genes such as midsporulation genes . In mammalian ES cells, knockdown of DPY30 results in a global 80% decrease in H3K4me3, yet remarkably few genes change expression . The central question raised is: Why do so few genes “care” about H3K4me3, and what is special about those that do?
Several explanations for this behavior have been proposed. These include, first, the possibility that transcription is not the relevant signal generated by a given mark, which is no doubt often true. Second, in the case of H3K4me3, it was suggested that this mark “poises” genes for future activation, which is plausible in some cases, although alternative hypotheses can explain much of the current evidence for this idea. Third, the gene-specific effects of mutants in the relevant modifying enzymes provide much of the motivation for belief in the histone code. For example, if a given protein such as BPTF only functions at H3K4me3/H4K16ac promoters, then this might explain why not all H3K4me3 promoters are affected by BPTF knockdown. However, based on the widespread co-occurrences noted above, this explanation is unlikely in most cases (although most extant modification mapping studies are carried out in near-optimal growth conditions, leaving the possibility that certain combinatorial modification patterns occur in alternative growth conditions).
One of the most instructive examples for thinking about the context-dependence of histone modifications comes from the yeast Rpd3S complex, which as noted above is activated by H3K36me3. The role of this deacetylase complex is to re-compact chromatin loosened by Pol2 passage, and to thereby repress sequences in coding regions (and elsewhere) that would act as promoters if inappropriately opened. These so-called “cryptic” promoters are not universal, occurring within a few hundred open reading frames. Thus, while Rpd3S acts universally over coding regions, only genes with cryptic internal promoters are significantly affected by its loss, neatly explaining the specificity of action of this near-universal mark. Recently, a conceptually similar explanation was offered for the role of the H2A.Z ortholog in S. pombe – mutants lacking this near-universal 5’ chromatin mark carry increased levels of antisense transcripts at genes oriented convergently with respect to a downstream gene. Thus, the role of H2A.Z here is to provide a universal 5’-restricted signal, so that the passage of elongating Pol2 must result from an antisense transcript, which can then be degraded. It seems likely that additional examples like this will explain context dependence of histone modifications in the future – in budding yeast, loss of the K4 methylase Set1 is increasingly seen to specifically affect the expression of genes that are regulated by antisense transcripts produced from a different locus (i.e., in trans) [49–51].
Bridging the intellectual divide between chromatin biochemistry, functional genetics, and epigenomics is a key goal for histone modification biology. Many chromatin regulatory proteins bind (slightly) more tightly to doubly-modified histones or nucleosomes than to nucleosomes with either single modification, suggesting that combinatorial complexity is meaningful for chromatin biology. Yet histone modification patterns are simple in vivo, occurring in few combinations. Moreover, chromatin regulatory proteins often associate with genomic loci in vivo that lack the modification patterns to which they bind in vitro. Finally, chromatin regulators typically only have measurable effects at a subset of the locations to which they are recruited. These considerations will need to be addressed for a full understanding of the regulation of chromatin function by histone modification-binding factors.
I believe that several avenues of investigation will be fruitful for reconciling these observations. Allosteric activation of proteins by histone marks will help explain the disconnect between in vivo protein localization and the occurrence of relevant histone modifications, and careful functional analysis of gene-specific effects of widespread chromatin regulators has begun to explain some cases of “context dependence” for histone modifications. To fully understand the role of combinatorial binding, such in vivo studies must be extended to carefully chosen mutants in chromatin regulators with altered modification binding behavior in vitro – are any genes transcriptionally affected in BPTF mutants that bind solely to H3K4me3, rather than H3K4me3/H4K16ac?
Importantly, what we learn from the histone modification field will provide unique insights for the many other biological systems that involve large numbers of covalent modifications, as the physical separation of covalently modified histones along the genome provides insights that are difficult to glean from multiply-modified proteins that freely diffuse in solution. To (over)extend the language metaphor, the physical linkage between the language of histone modifications and the “book of life” may give us the Rosetta stone for understanding covalent modifications throughout biology.
I thank S. Henikoff, C. Peterson, P. Kaufman, T. Fazzio, and especially B. Cairns for comments on the manuscript.