Fuelled by new sequencing technologies, epigenome mapping projects are revealing epigenomic variation at all levels of biological complexity, from species to cells. Comparisons of methylation profiles among species reveal evolutionary conservation of gene body methylation patterns, pointing to the fundamental role of epigenomes in gene regulation. At the human population level, epigenomic changes provide footprints of the effects of genomic variants within the vast non-protein coding fraction of the genome while comparisons of the epigenomes of parents and their offspring point to quantitative epigenomic parent-of-origin effects confounding classical Mendelian genetics. At the organismal level, comparisons of epigenomes from diverse cell types provide insights into cellular differentiation. Finally, comparisons of epigenomes from monozygotic twins help dissect genetic and environmental influences on human phenotypes and longitudinal comparisons reveal aging-associated epigenomic drift. The development of new bioinformatic frameworks for comparative epigenome analysis is putting epigenome maps within reach of researchers across a wide spectrum of biological disciplines.
Epigenomics is a new arrival into an old lineage that includes the classical concept of epigenesis and the evolving concept of epigenetics. Within mainstream molecular biology, epigenetics currently has several specific definitions, all focusing on mechanisms other than changes in DNA sequence that perpetuate altered cellular activity states. These mechanisms are essential for differentiated cells, which acquire distinct “programs”, not encoded in DNA, to guide their cell-type specific homeostatic behavior within a multicellular organism. In humans and other species epigenetic memory integrates genetic, physiological and environmental influences through the lifetime of an organism and is particularly sensitive to those influences during early development.
Epigenetic memory may reside in various cellular compartments and, most conspicuously, within the nucleus. The genomic DNA sequence in a living cell is tagged by methylation of specific cytosines, and by modifications of DNA-attached protein and RNA fractions of the chromatin. Some layers of this information, such as cytosine methylation tags within more than twenty eight million cytosine-phosphate-guanine (CpG) dinucleotides in the human genome, are replicated concurrently with the DNA. Other layers may reflect altered epigenetic states that may be propagated over long periods of time and during cell division by other mechanisms. An epigenome of a specific cell population is a comprehensive genome-wide map of these chromatin tags  .
Methylation of CpG dinucleotides in genomic DNA is the most studied epigenomic mark. In humans, methylation, hydroxymethylation, and possibly other modifications of the fifth carbon of cytosine occur within CpG dinucleotides. CpG methylation patterns may be reprogrammed during cellular differentiation but generally tend to be stably replicated during mitosis. A genome-wide map of cytosine methylation within a cell type comprises a methylome, a key component of the epigenome.
Additional layers of epigenomic information are contained within the protein fraction of chromatin. Stretches of about 147bp of DNA sequence wrap around nucleosomes to form nucleosome core particles, which in turn may form higher levels of periodic and aperiodic chromatin structure filling the nucleus . Each nucleosome contains eight histone proteins (typically two H2A, H2B, H3 and H4) with globular cores and unstructured amino-terminal tails. Numerous chemical modifications of specific amino acids in histone tails such as trimethylation of the fourth lysine on histone H3 (denoted H3K4me3) and trimethylation of the twenty seventh lysine (H3K27me3) occur in specific combinations along the genome and participate in gene expression regulation and all major processes within the nucleus .
In addition to the DNA and protein fractions, chromatin also includes an RNA fraction which plays an important role in epigenome regulation  . These and other layers of epigenomic information (Box 1) are reprogrammed by rearrangement and chemical modifications during cellular differentiation  . There is an extensive cross-talk between DNA methylation changes, histone modifications, non-coding RNAs, chromatin accessibility and the three-dimensional conformation of the chromatin  .
Each of the following five key epigenomic components consists of a layer of information about the state of chromatin other than the information about the genomic nucleotide sequence.
Methylation of cytosines in genomic DNA . In differentiated human cell types cytosines that occur in a majority of close to thirty million 5′-CpG-3′ sites may be methylated. Approximately half of human genes contain CpG islands, regions highly enriched for CpG dinucleotides which tend not to be methylated across tissues. Mammalian tissues can be distinguished by their methylation patterns [25, 26].
Includes both binding of master regulators and nucleosomes to DNA as well as their modifications. The modifications include methylation, acetylation, and other modifications of specific amino acids in nucleosomal histones .
Both small  and large non-coding RNA [12, 100] species are involved in a diversity of mechanisms of epigenome regulation including chromatin silencing, imprinting control and X chromosome inactivation in mammals, and regulation of developmental regulators by the Polycomb-Trithorax system .
Degree of accessibility of the genomic DNA fraction of chromatin. The genomic DNA that is tightly bound by the protein component of the chromatin is less accessible to transcription machinery and other regulatory proteins . Chromatin accessibility is regulated during cellular differentiation. It associates with genomic variants and is heritable .
Proximity of chromosomal loci is experimentally determined by methods such as 3C-seq (Table 1). Spatial organization itself is dynamic, may affect gene expression, correlates with genomic methylation  and is determined at least in part by the action of master regulators such as estrogen receptor .
The rapidly decreasing cost of DNA sequencing is enabling wider adoption of technologies for sequencing-based mapping of epigenomes. The most widely used technologies using sequencing in the readout step (such technologies are usually suffixed by “-seq”) include whole-genome sequencing of bisulfite-treated genomic DNA (MethylC-seq) for mapping DNA methylomes, chromatin immuno-precipitation and sequencing (ChIP-seq) for mapping histone marks and DNA binding sites of regulatory protein complexes, RNA-sequencing (RNA-seq) for mapping the RNA fraction of the epigenome and others (listed in Table 1). The recently initiated NIH Roadmap Epigenomics Project  is applying such assays on a large scale to map the landscape of epigenomic variation (Figure 1) and to construct the Human Epigenome Atlas (http://www.epigenomeatlas.org). European and other worldwide epigenome mapping projects are also being initiated .
Here, I review some recent epigenomic advances with a primary focus on human and mouse epigenomic variation not directly related to disease, emphasizing studies of methylomes and, to a lesser degree, of histone marks. I discuss fresh insights gained by epigenome mapping and comparison, particularly those overturning long-held assumptions. I start with the comparison among species, which reveals evolutionarily conserved gene body methylation patterns, pointing to the fundamental role of epigenomes in gene regulation. I then move to the human population level, examining epigenomic footprints of genomic variation outside of the traditionally better understood protein coding regions. Studies of parent-offspring trios reveal intricate tissue-specific parent-of-origin epigenomic effects confounding classical Mendelian genetics. At the level of individual organisms, analysis of epigenomes from diverse cell types helps map the pathways of cellular differentiation, providing insights into mammalian development. Influences of the environment and aging on human phenotypes are being dissected by longitudinal epigenome comparison and by comparing epigenomes of monozygotic twins.
Epigenome mapping is fuelled by rapid assay technology advances but those require matching advances in bioinformatic technologies for downstream analysis of high volumes of data. I review bioinformatic resources and methodological frameworks for epigenomics, including methods for “deciphering” epigenomic “codes” and open issues in comparative epigenome analysis. I conclude by proposing a computationally-inspired perspective on the epigenome as the working memory of the cell.
It is widely recognized that DNA methylation, histone modifications and other layers of epigenomic information are generally present across eukaryotes, but until recently much of our understanding of epigenomic variation was extrapolated to many organisms from studies of just a handful of model organisms. Historically, DNA methylation has been associated with transcriptional silencing of transposons, but recent systematic mapping of DNA methylation across a number of animals, plants and fungi using whole-genome bisulfite sequencing provided strong evidence that invertebrates generally do not follow this rule. The studies instead confirm the ancestral nature and ubiquity of another pattern: methylation of the gene bodies of active, constitutively transcribed housekeeping genes. Exons of such genes tend to be more methylated than introns, and methylation tends to be excluded from transcription start and termination sites, pointing to the role of methylation in regulating gene expression both at the level of the whole gene and at the level of individual exons. The functional significance of gene body methylation has been most dramatically demonstrated in honeybees, where knockdown of the DNA methyltransferase Dnmt3 diminishes gene body methylation and mimics the effect of royal jelly by causing larvae to develop into queens.
The pattern of preferential exon methylation in gene bodies is paralleled by preferential nucleosome positioning over exons and by enrichment of the histone mark H3K36me3 over transcribed exons. These patterns indicate an intimate connection between epigenomic changes and regulation of transcription at the level of whole genes and individual exons . A key mechanistic link between histone marks and exon splicing was provided in a breakthrough study  which identified adaptor molecules that recognize exonic histone mark signature involving H3K36me3 and affect tissue-specific pre-mRNA splicing. These results reinforce the hypothesis that for some genes the epigenome harbors a “splicing code”  that determines tissue-specific splicing outcomes.
Tissue-specific DNA methylation patterns are sufficiently conserved between human and mouse to completely discriminate tissue types regardless of species origin . Remarkably, comparative analyses revealed conservation of DNA methylation patterns between the two species even at loci where genomic DNA sequences diverged . These findings are consistent with observed conservation between human and mouse of tissue-specific spatial chromatin organization into coordinately replicated megabase-sized “replication domains” .
Epigenomes of non-human primates are yet to be mapped at high resolution. A comparison of DNA methylation patterns of 36 genes in brain, liver and lymphocytes between chimpanzee and human  revealed that divergence in methylation patterns is most pronounced in the brain, hinting that the rapid evolution of the human brain may have left significant epigenomic footprints.
In summary, evolutionary patterns of epigenome variation among species are revealing evolutionarily conserved roles of epigenomes in regulation of gene transcription, splicing, tissue specificity, development, and pointing to rapid epigenomic changes that may have accompanied human evolution.
Our ability to interpret functional effects of genomic sequence variants in the human genome is still largely limited to those occurring in exons and splice sites and thus affecting peptide sequences. But protein coding regions comprise less than two percent of the human genome. The effects of a fraction of sequence variants throughout the remaining fraction of the genome may now also be in principle discerned by observing associated epigenomic changes. Known examples of such effects include variants that cause disease via changes of chromatin structure in cis, such as CGG triplet repeat expansions in the Fragile X syndrome and repeat contractions in Facioscapulohumeral dystrophy . These disease variants create allelic imbalances between epigenomes on homologous chromosomes. Such allelic imbalances are sequence-associated and therefore distinct from other allelic imbalances such as in imprinting, where epigenomic state and gene expression depend on parent of origin, from randomized patterns of X inactivation in females, and from randomized patterns of monoallelic expression of genes involved in olfaction, immunity, and other processes .
The results of a first high-resolution genome-wide survey of allele-specific DNA methylation in humans using the methylation-sensitive single nucleotide polymorphism (SNP) analysis based method  were recently published . A large degree of quantitative allelic methylation imbalances was observed. The study showed that a large fraction of the imbalances was associated with sequence variation in cis, indicating that many sequence variants that reside on either maternal or paternal chromosomes may affect methylation levels on the same chromosome. Similar patterns were found in a study of peripheral blood mononuclear cells using whole-genome bisulfite sequencing . Both studies demonstrate not only allelic DNA methylation but also allele-specific expression of affected genes. A detailed plan has been proposed  to utilize the information about methylation imbalances in order to identify the SNPs of functional significance within critical regions detected by genome-wide association studies (GWAS). This is important because a large proportion of critical regions identified by GWAS fall outside of protein-coding regions. In such regions, epigenomic changes may provide the only lead toward variants that have functional consequences and may therefore be causing disease.
Genomic variants also have widespread effects on chromatin accessibility and transcription factor binding in lymphoblastoid cell lines. It is not clear at this point how changes of various layers of the epigenome, including histone marks, relate to each other genome-wide. However, locus-specific studies indicate coordinate allele-specific changes in DNA methylation and histone marks.
A recent study  provides indications that structural genomic variants also affect the epigenome. Indeed, the effects of copy number variants on gene expression are not limited to the genes within copy-altered loci, as commonly assumed. In fact, many affected genes reside far away from the structural alteration, leading the authors to hypothesize that structural variant effects on gene expression may be mediated by changes in chromatin structure spanning hundreds of kilobasepairs. Future epigenome mapping and comparison studies should shed more light on this exciting hypothesis.
Genomic imprinting is a mechanism of transcriptional regulation of mammalian genes through which expression of a set of genes is restricted to one parental allele . Imprinted loci typically contain multiple genes regulated by parent-of-origin specific DNA methylation of an imprinting center , typically a CpG island. While the allelic methylation patterns of the imprinting center tend to be shared by all somatic cells, mono-allelic (maternal or paternal) expression of some imprinted genes may be observed only in specific tissues such as brain.
Classical on/off imprinting may only be a tip of an iceberg of complex and quantitative parent-of-origin effects, as revealed recently in a study of allelic expression in mouse using the mRNA-seq method  . The study reports a list of 1300 genes with parent-of-origin expression patterns, a thirteen-fold increase over the current number of about one hundred known imprinted genes in mouse. The deep mRNA sequencing revealed many subtle quantitative allelic imbalances, indicating that typical parent-of-origin effects are not binary. Alleles of maternal origin were preferentially expressed in embryonic brains, whereas paternal alleles were preferentially expressed in adults. Three times more genes with a sex-specific imprinting status were imprinted in female than in male offspring. The sex-specific influence was observed in the hypothalamus of female offspring, indicating a parent-specific influence on hypothalamic function in daughters but not in sons. The parent-of-origin effects revealed by these studies point to complex parental influences on behavior and physiology of their offspring, with obvious implications for studies of human phenotypes and diseases, particularly those affecting brain function and development.
Unexpectedly, parent-of-origin preference was detected even for different transcript isoforms of the same gene. This implicates the epigenome in splicing regulation, consistent with independent mechanistic studies pointing to an epigenomic “splicing code”  discussed above. Mapping of CpG methylation and additional epigenomic marks will be required in order to fully understand the extent and nature of these complex parent-of-origin effects in mammals.
Cellular differentiation leaves highly informative footprints on the epigenome at all stages of development. The mammalian epigenome undergoes two major waves of genomic DNA demethylation and chromatin remodeling during embryogenesis: one upon fertilization, setting the stage for cellular differentiation during development, and the other in primordial germ cells (PGCs), setting the stage for imprinting and return to totipotency. A pioneering methylome comparison study used whole-genome bisulfite sequencing to profile DNA methylation throughout the genomes of embryos deficient in Activation-Induced Cytidine Deaminase (AID) and of wild-type mice. The comparison of epigenomes revealed significantly reduced genome-wide erasure of DNA methylation in PGCs affected by AID deficiency, confirming the role of AID in demethylation. A related study has argued that deamination may be a triggering event and provided evidence that the base-excision repair pathway plays a key role in genomic demethylation of PGCs as well as in genomic demethylation of paternal pronuclei upon oocyte fertilization.
Comprehensive mapping of methylomes of embryonic stem cells and differentiated fibroblasts was a milestone in epigenomic studies of development. A high degree of genome-wide methylation at non-CpG cytosines in embryonic stem cells, but not in fibroblasts, was detected at basepair resolution and characterized genome-wide for the first time. Reduced methylation levels in fibroblasts were associated with lower transcriptional activity of the affected genes. Regions proximal to genes involved in pluripotency were differentially methylated, confirming the role of DNA methylation in cell fate determination.
Histone modifications also undergo changes during differentiation. The “bivalent” or “poised” state consisting of the H3K27me3/H3K4me3 combination is particularly abundant in pluripotent cells. This state is inherited by daughter cells during mitosis and makes genes responsive to certain intrinsic or environmental stimuli during cellular differentiation. There is also an extensive cross-talk between histone modifications and DNA methylation changes during cellular differentiation. About half of all human genes, among them many key regulators of development regulated by the Polycomb/Trithorax system, contain a CpG island in their promoter regions. These CpG islands are mostly unmethylated across different mammalian tissue types but may become methylated during cellular differentiation, in aging tissues, and in cancer.
A recent comparison of methylomes and histone marks in the H1 human embryonic stem cell (hESC) line and the differentiated fibroblast cell line IMR90 indicates that the most dynamic DNA methylation changes occur at enhancers and other distant regulatory regions, involving complex cell-type specific interactions with H3K4me1 and H3K4me2 marks. The differentiation of hESC into fibroblasts also involves extensive spreading of repressive chromatin marks. The repressive H3K27me3 mark covers only 119 Mbp of the genome of the H1 line but spreads over 394 Mbp in IMR90, roughly a 3.3-fold expansion. The repressive H3K9me3 mark similarly spreads from 148 Mbp to 510 Mbp, roughly a 3.4-fold expansion. The gain of the repressive marks is associated with a corresponding decrease in DNA methylation, challenging the common assumption that gene repression is associated with DNA methylation.
The expansion of repressive H3K9me3 and H3K27me3 marks in IMR90 and CD4+ T cells selectively affects developmental genes. Distinct genes were repressed by expanded H3K27me3 domains in the two cell types, hinting at the role of chromatin-mediated repression in cell fate determination. This is consistent with a comparison of H3K27me3 marks in pancreatic beta cells and acinar cells  revealing an extensive H3K27me3 program shared by the two endodermal cell types. In sharp contrast to the shared endodermal epigenomic program, the gene expression signature of beta cells resembles that of ectoderm-derived neural tissues. The study further showed that the neural expression program was co-opted during late pancreatic cell differentiation through activation of a small number of transcriptional regulators by selective removal of inactivating H3K27me3 marks.
A pioneering study of twelve human tissues identified tissue-specific differentially methylated genomic regions (T-DMRs) in 17% of 873 genes examined on chromosomes 6, 20 and 22. DNA methylation in about one third of 5′ untranslated regions was inversely proportional to gene expression. A bimodal distribution of DNA methylation levels was observed around transcription start sites (TSS), with a hypomethylated core region of about 1,000 bp symmetrically surrounding the sites. Several epigenome mapping studies reported T-DMRs at low-CpG promoters, CpG island shores and promoter-distal sites that exhibit enhancer activity.
Cellular differentiation involves interaction between epigenomic changes and the action of master regulators of development including key transcription factors regulating pluripotency and micro RNAs  . The list of prominent master regulators also includes the enhancer blocker and domain barrier associated protein CTCF , enhancer associated pioneer factor FOXA1 , master regulator of erythroid development GATA1 , estrogen receptor  and others. Binding of these regulators to typically many thousands of genomic loci is revealed by ChIP-seq assays. Regulator binding is typically accompanied by alterations in multiple histone marks, changes in chromatin accessibility, and spatial reorganization of chromosomes .
In summary, accumulating epigenomes from diverse cell types are providing raw material for detailed maps of cellular differentiation. More insights into cellular differentiation are expected to emerge from comparative analyses of epigenomes of differentiated cell types (Box 2 lists some issues in comparative analysis) that are being produced by current epigenome mapping projects (Table 2).
Any specific comparative analysis of epigenomes may require that some key experimental and computational issues be addressed. In the following we briefly review those issues.
Tissue samples may contain multiple distinct cell types. Consequently, differences detected by comparison of two samples may be due to different proportions of epigenomically distinct cell types within each sample. If the epigenomic signatures of constituent cell types are known in advance, multiple linear regression methods may help estimate relative abundance of different cell types within the tissue.
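The regression idea can be illustrated with a minimal sketch, assuming that the reference signature of each constituent cell type is available as a list of methylation levels (0 to 1) at the same loci as the mixed sample. The function names and the toy numbers are hypothetical, and ordinary least squares followed by clipping and renormalization is only one simple choice; constrained solvers may be preferable on real data.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting.
    Raises ZeroDivisionError if the system is singular (e.g. identical signatures)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def estimate_proportions(signatures, mixture):
    """Least-squares estimate of cell-type proportions in a mixed sample.

    signatures: one list of methylation levels per reference cell type.
    mixture:    methylation levels of the tissue sample at the same loci.
    Negative estimates are clipped to zero and the result is rescaled to sum to 1.
    """
    k = len(signatures)
    # Normal equations (S^T S) p = S^T y of the regression y ~ sum_i p_i * S_i.
    gram = [[sum(u * v for u, v in zip(signatures[i], signatures[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(u * v for u, v in zip(signatures[i], mixture)) for i in range(k)]
    clipped = [max(0.0, p) for p in solve_linear(gram, rhs)]
    total = sum(clipped)
    if total == 0:
        raise ValueError("no cell type explains the mixture")
    return [p / total for p in clipped]


# Toy example: a sample made of 70% cell type A and 30% cell type B.
type_a = [0.9, 0.1, 0.8, 0.2]
type_b = [0.1, 0.9, 0.3, 0.7]
sample = [0.66, 0.34, 0.65, 0.35]
print(estimate_proportions([type_a, type_b], sample))
```

Once proportions are estimated for each sample, differences between two samples can be checked against the differences expected from cell-type composition alone.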
Various methylation assays differ in resolution and cover different but overlapping genomic regions [8, 37]  (Table 1). When comparing methylomes obtained by two different assays, comparison may be performed at resolution (window size) that equals the lower of the two. To avoid detecting spurious differences, comparison may be limited to loci equally ascertained by both assays.
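As a sketch of this procedure, assume each methylome is given as a dictionary from genomic position to methylation level and each assay has a nominal window size in basepairs. The function names, the `min_sites` threshold and the toy data are illustrative assumptions rather than part of any published pipeline.

```python
from collections import defaultdict


def window_means(levels, window, min_sites=1):
    """Average methylation per fixed-size window; keep windows with enough data points."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos, level in levels.items():
        index = pos // window
        sums[index] += level
        counts[index] += 1
    return {i: sums[i] / counts[i] for i in sums if counts[i] >= min_sites}


def compare_assays(levels_a, res_a, levels_b, res_b, min_sites=1):
    """Compare two methylomes at the coarser of the two resolutions.

    Only windows ascertained by both assays are reported, which avoids
    calling a difference where one assay simply has no coverage.
    Returns {window start: mean level in A minus mean level in B}.
    """
    window = max(res_a, res_b)  # lower resolution means larger window
    a = window_means(levels_a, window, min_sites)
    b = window_means(levels_b, window, min_sites)
    shared = sorted(set(a) & set(b))
    return {i * window: a[i] - b[i] for i in shared}


# Toy example: a basepair-resolution assay versus a 1,000 bp resolution assay.
fine = {100: 0.8, 150: 0.6, 1200: 0.2, 2500: 0.9}
coarse = {500: 0.5, 1500: 0.4}
print(compare_assays(fine, 1, coarse, 1000))
```

In this toy example the window starting at position 2000 is dropped because only the first assay covers it.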
For the purpose of comparison, two epigenomes may be converted into lists of numbers, each number corresponding to an average value of an epigenomic mark over a window of fixed size or over a feature such as an enhancer, a promoter or a CpG island. The choice of features may depend on the goal of comparison.
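A feature-based conversion might look like the following sketch, which assumes features are given as (name, start, end) half-open intervals on a single chromosome and the mark is a dictionary from position to value. The function names, the feature coordinates and the mean absolute difference used as a summary are hypothetical choices made for illustration.

```python
def feature_means(levels, features):
    """Convert an epigenomic mark into a list with one average value per feature.

    levels:   {position: value of the mark}
    features: list of (name, start, end) with half-open intervals [start, end)
    A feature without any data yields None rather than a made-up value.
    """
    means = []
    for _name, start, end in features:
        values = [v for pos, v in levels.items() if start <= pos < end]
        means.append(sum(values) / len(values) if values else None)
    return means


def mean_abs_difference(vec_a, vec_b):
    """Average absolute difference over features measured in both epigenomes."""
    pairs = [(a, b) for a, b in zip(vec_a, vec_b) if a is not None and b is not None]
    if not pairs:
        raise ValueError("the two epigenomes share no measured feature")
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


# Toy example with three hypothetical features.
features = [("promoter", 0, 50), ("enhancer", 100, 200), ("CpG island", 300, 400)]
sample_1 = feature_means({10: 1.0, 20: 0.5, 105: 0.0}, features)
sample_2 = feature_means({15: 0.25, 150: 0.5, 350: 0.9}, features)
print(sample_1, sample_2, mean_abs_difference(sample_1, sample_2))
```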
Each cell contains two epigenomes, one on the paternal and the other on the maternal chromosome. Assays with sequencing readouts provide a means of distinguishing the two signals at heterozygous loci. The Pash 3.0 program maps both regular and bisulfite-treated reads while detecting basepair and indel variants and can be used for calling allele-specific methylation.
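The principle of calling allele-specific methylation from reads can be sketched as follows. This is not the Pash algorithm: it is a generic illustration that assumes reads have already been mapped and reduced to (allele at the heterozygous SNP, methylation call at a nearby CpG) pairs, and it uses a two-sided Fisher's exact test as one plausible way to score the imbalance. All names and data are hypothetical.

```python
from math import comb


def allele_table(reads, allele_1, allele_2):
    """Count methylated/unmethylated reads per allele.

    reads: iterable of (allele, is_methylated) pairs; reads carrying any
    other base at the SNP are ignored.
    Returns (meth_1, unmeth_1, meth_2, unmeth_2).
    """
    counts = {allele_1: [0, 0], allele_2: [0, 0]}
    for allele, is_methylated in reads:
        if allele in counts:
            counts[allele][0 if is_methylated else 1] += 1
    return tuple(counts[allele_1] + counts[allele_2])


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row_1, row_2, col_1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x methylated reads on allele 1.
        return comb(row_1, x) * comb(row_2, col_1 - x) / comb(n, col_1)

    p_obs = prob(a)
    support = range(max(0, col_1 - row_2), min(row_1, col_1) + 1)
    probs = [prob(x) for x in support]
    # Sum all tables that are no more likely than the observed one.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-9))


# Toy example: allele A is always methylated, allele G never.
reads = [("A", True)] * 10 + [("G", False)] * 10
table = allele_table(reads, "A", "G")
print(table, fisher_exact_two_sided(*table))
```

A small p-value at a locus suggests an allelic imbalance, which then needs to be examined for association with sequence in cis versus parent of origin.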
Epigenome comparison may be a search for a small or even a single important difference between two epigenomes. For example, a single difference between methylomes of monozygotic twins may be responsible for a phenotype distinguishing them. Sensitive well-characterized pipelines for such comparisons are yet to be developed. The best current solution is to perform multiple comparisons using different tools to improve the chance of detecting causative differences .
Appropriate algorithms must be chosen for comparative analyses involving more than 2 epigenomes. Random forest algorithms have been used to cluster collections of epigenomes by tissue type and age . The use of cladistic tree reconstruction methods has been suggested for the purpose of reconstructing the bifurcating tree of cellular differentiation .
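To make the multi-sample setting concrete, the sketch below groups epigenomes into a bifurcating tree using average-linkage hierarchical clustering on Euclidean distances. This is a deliberately simple stand-in, not the random forest or cladistic methods cited above, and the sample names and feature vectors are invented for illustration.

```python
def euclidean(u, v):
    """Euclidean distance between two equally long feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5


def average_linkage_tree(profiles):
    """Build a bifurcating tree (nested tuples) from {sample name: feature vector}.

    At each step the two clusters with the smallest average pairwise
    distance between their members are merged.
    """
    nodes = [(name, [name]) for name in sorted(profiles)]  # (subtree, member names)
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                members_i, members_j = nodes[i][1], nodes[j][1]
                dist = sum(euclidean(profiles[a], profiles[b])
                           for a in members_i for b in members_j)
                dist /= len(members_i) * len(members_j)
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        merged = ((nodes[i][0], nodes[j][0]), nodes[i][1] + nodes[j][1])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][0]


# Toy example: two stem-cell-like and two fibroblast-like methylation profiles.
profiles = {
    "ESC_1": [0.90, 0.9, 0.10],
    "ESC_2": [0.85, 0.9, 0.15],
    "fibro_1": [0.20, 0.3, 0.80],
    "fibro_2": [0.30, 0.3, 0.70],
}
print(average_linkage_tree(profiles))
# -> (('ESC_1', 'ESC_2'), ('fibro_1', 'fibro_2'))
```

A distance-based tree of this kind only summarizes similarity; whether it reflects the actual lineage of cellular differentiation is a separate question that the cladistic approaches mentioned above aim to address.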
A combination of genetic, physiological and environmental influences through the lifetime of an organism may have an effect on the epigenomic “program” acquired by a cell during differentiation. The epigenomic program may be particularly sensitive to environmental influences during early stages of development, but significant environmentally-induced and stochastic changes may accumulate over a lifetime of a mature organism, increasing risk of aging-associated diseases.
The epigenome has now been established as a key mediator of environmental influences on phenotypes in plants and animals, including mammals. Increased licking, grooming and nursing by rat mothers improved the stress response of their pups via DNA methylation and histone modifications in the promoter of a glucocorticoid receptor gene expressed in the hippocampus . Both mouse  and human studies  show that maternal nutrition during gestation may leave epigenomic footprints causing obesity in offspring. Transmission of environmental influences by the epigenome sometimes carries across multiple generations .
A powerful approach to identify environmental influences on the epigenome and aging-associated epigenomic drift in humans is to compare epigenomes of monozygotic (MZ) twins. A pioneering study  applied restriction landmark genome scanning (RLGS), an early method for CpG methylation mapping using methylation-sensitive restriction enzymes, to obtain epigenome maps in a large cohort of MZ twins. Although the twins were epigenomically indistinguishable during their early years of life in terms of CpG methylation and histone acetylation, epigenomes significantly diverged with age across various tissue types. The role of the environment is indicated by the fact that the epigenomes of twins that were older, had different lifestyles, and spent less time together diverged more.
Large discordance between MZ twins in heritable psychiatric disorders, including schizophrenia and bipolar disorder provided an early rationale for epigenomic comparisons  . The first comprehensive characterization of epigenomic differences between MZ twins discordant for a disease came from MZ twins discordant for multiple sclerosis (MS) , where the relevant cell types are more readily accessible. Comparison of CD4+ T-cells (cells involved in the MS-causing auto-immune reaction) involved complete genomic DNA sequences, DNA methylomes determined by the RRBS method, and transcriptomes sequenced by RNA-seq. Observed differences failed to explain the discordance between the twins. The search for causative changes revealed open methodological problems that need to be addressed at both the experimental and algorithmic levels (Box 2).
Evidence for aging-associated epigenomic changes has been accumulating [70, 75]. Several recent studies on humans and mouse identified CpG islands associated with “bivalent” histone marks and key developmental regulators (genes typically silenced in stem cells by the Polycomb/Trithorax system and not by CpG island methylation) as being preferentially methylated with aging. The methylation of CpG islands silences developmental regulators, thus conferring stem-cell-like properties on both aging and cancer cells. The patterns of CpG island methylation are therefore consistent with aging-associated cancer risk. Mapping of methylation patterns in the CpG-poor fraction of the genome revealed even more pervasive aging-associated, tissue-dependent patterns of hypomethylation affecting expression of metabolism-related genes, pointing to yet another connection between epigenomic changes and aging-associated pathophysiology.
In summary the pervasive aging-associated methylation changes discovered by the first comprehensive epigenome comparisons between MZ twins discordant for a human disease may help explain aging-associated disease risk. With the lowering of experimental and bioinformatic barriers, many more such studies are soon likely to provide insights into epigenomic connections between aging, environment, and human health.
As is evident from the results reviewed so far, epigenome mapping and analysis is picking up pace. A number of large collaborative projects, including the NIH Roadmap Epigenomics Initiative, are contributing toward a map of epigenomic variation (Table 2). The International Human Epigenome Consortium (http://ihec-epigenomes.org) has been organized with the goal of coordinating and providing standards for international epigenome mapping efforts. Tools for epigenome analysis and visualization are becoming available and have recently been reviewed. In the following we briefly review three conceptual frameworks for epigenome analysis.
The significance of epigenomic variation is commonly evaluated based on its immediate effects on gene expression. But the causative link may also go in the other direction: the epigenomic state may be a consequence of active transcription. Even in cases where epigenomic state is causative, the change of the epigenome may be separated in time from its effect on gene expression. Also, the final effects may depend not on a single change but on a combination of epigenomic changes. Moreover, the effects of an epigenomic change, say during development, may be contingent on future intrinsic or environmental inputs. Based on these and additional theoretical considerations, criteria for biologically meaningful functional “codes” have been proposed. One recently discovered combinatorial chromatin state that meets the proposed criteria is the “bivalent” or “poised” state consisting of the H3K27me3/H3K4me3 combination. This state is inherited by daughter cells during mitosis and makes genes responsive to certain intrinsic or environmental stimuli during cellular differentiation. Newly discovered biologically meaningful recurrent combinations of histone marks that may belong to an epigenomic “code” include signatures of active and inactive core and proximal promoters; active and inactive enhancers; gene-body signatures of active transcription; gene regulation by alternative promoters and alternative splicing; as well as gene silencing within larger heterochromatic domains conserved across cell types. A “code-book” of fifty-one epigenomic states, each state defined by a combination of epigenomic marks, has been inferred using an unsupervised learning approach based on a Hidden Markov Model. Similar codes can also be inferred using a Bayes Network approach. Using such “codes”, epigenomes can be “parsed” into segments corresponding to individually defined and biologically meaningful states.
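The “parsing” step can be illustrated with a minimal Viterbi decoder for a Hidden Markov Model over genomic bins. This sketch covers decoding only: the three states, their emission probabilities (probability that each of two marks, H3K4me3 and H3K27me3, is present in a bin) and the transition probabilities are hypothetical values fixed by hand, whereas the published approaches learn such parameters from data in an unsupervised manner.

```python
from math import log


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely state path for a sequence of per-bin mark observations.

    observations: list of tuples, one 0/1 flag per mark in each genomic bin
    emit_p[s]:    probability that each mark is present in state s
                  (marks are treated as independent given the state)
    """
    def log_emit(state, marks):
        return sum(log(p if m else 1 - p) for p, m in zip(emit_p[state], marks))

    score = {s: log(start_p[s]) + log_emit(s, observations[0]) for s in states}
    backpointers = []
    for marks in observations[1:]:
        previous, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: previous[r] + log(trans_p[r][s]))
            score[s] = previous[best] + log(trans_p[best][s]) + log_emit(s, marks)
            pointers[s] = best
        backpointers.append(pointers)
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical three-state model; marks are (H3K4me3, H3K27me3).
states = ["active", "repressed", "bivalent"]
start_p = {s: 1 / 3 for s in states}
trans_p = {r: {s: 0.8 if r == s else 0.1 for s in states} for r in states}
emit_p = {"active": [0.9, 0.1], "repressed": [0.1, 0.9], "bivalent": [0.9, 0.9]}

bins = [(1, 1), (1, 1), (1, 0), (1, 0), (0, 1), (0, 1)]
print(viterbi(bins, states, start_p, trans_p, emit_p))
# -> ['bivalent', 'bivalent', 'active', 'active', 'repressed', 'repressed']
```

Runs of identical states in the decoded path correspond to the segments into which the epigenome is parsed.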
The second framework is comparative analysis of epigenomes. This framework will gain importance with the increasing density of the sampling of the epigenomic variation space by the growing number of epigenomic projects. A key requirement for “meta-comparison” of epigenomes produced by different projects will be adherence to common data and metadata standards and the development of an informatic infrastructure for comparative epigenome analysis . A number of issues of relevance for the comparative analysis of epigenomes are listed in Box 2.
The third framework, which is yet to be tested on epigenomic data, is systems modeling. One class of promising models, already used in tissue engineering, is cellular automata [94, 95]. Equipped with programmable epigenomes that are responsive to extra-cellular stimuli (which are currently lacking in these models), cellular automata may help bridge the chasm between the microcosm of the cell and the macrocosm of tissues and higher levels of organization of multi-cellular organisms.
In summary, the emerging patterns of epigenomic variation are revealing epigenomic “codes” and providing insights through comparative analysis across a broad range of fields. With predictable technological advances over the coming years, patterns of epigenomic variation will be mapped at an ever accelerating rate. But will epigenomics live up to current high expectations?
Reasons for optimism are suggested by the central role of the epigenome in a cell-as-computer analogy. By combining the attributes of stability and programmability, the epigenome corresponds to a computer’s working memory, which typically resides in RAM and includes the complete record of all active processes. In this key role, the epigenome bridges the gap between the “read-only memory” or “CPU firmware” of genomic DNA and the transient or “memoryless” intra- and inter-cellular signaling processes. The cell’s “working memory” includes both the cell-type specific “program” that guides the cell’s behavior and a record of key biological processes and environmental influences. It is therefore reasonable to expect that reading and interpreting epigenomes will help close knowledge gaps across a broad spectrum of biology.
The author thanks Dr. R. Alan Harris for thoughtful comments and corrections and acknowledges support from the NIH Roadmap Epigenomics Roadmap Initiative grant U01 DA025956.