In eukaryotic cells, genomic DNA and the histones that package it carry a number of chemical modifications, called epigenomic modifications (epi- modifications). These epi- modifications add an extra layer of information to the genomic sequence, enabling it to encode a more complex program of gene regulation (Karlić et al., 2010; Maunakea et al., 2010a). Different epi- modifications affect how DNA interacts with transcription factors, although many of the underlying mechanisms remain unknown (Campos and Reinberg, 2009). Adding to the complexity, genomes are still far from completely annotated at the functional level, so regulatory sequences must first be found before their complex regulatory roles can be understood.
Evolutionary comparisons provide a powerful tool for studying genome function. This became evident once it was recognized that most DNA can mutate freely without deleterious effects, whereas certain sequence elements are more constrained (Kimura, 1968). Building on this principle, researchers have inferred functional genomic segments by examining genomic sequence conservation (Hardison, 2003) and have identified human-specific regulatory DNA by looking for sequences with accelerated rates of evolutionary change (Pollard et al., 2006).
The successes of genomic comparisons raise the question: can we also use evolution to study the functions of the epigenome? To do so, the basic evolutionary properties of the epigenome must first be established, preferably in the context of both genomic and transcriptomic evolution. To explore the relationships among evolutionary changes to the genome, the epigenome, and the transcriptome, four specific questions are of particular interest. First, evolutionary selection has left clear traces on the human genome (Ren, 2010); what traces has evolutionary selection left on the human epigenome? Second, are evolutionary changes to the epigenome merely a consequence of genomic sequence changes, or has the epigenome made the genome more or less susceptible to evolutionary selection? Third, the degree of gene expression conservation correlates poorly with the extent to which nonexonic sequences are conserved among vertebrates (Chan et al., 2009; Wilson et al., 2008); might the epigenome explain this discrepancy? Fourth, orthologous mammalian transcription factors (TFs) often do not bind to orthologous DNA sequences (Jegga et al., 2008); for example, only ~5% of Oct4 and Nanog binding sites occupy homologous sequences in human and mouse embryonic stem (ES) cells (Kunarso et al., 2010). Do epigenetic modification enzymes apply the same types of modifications to orthologous sequences in mammals?
Among the many types of epi- modifications (Tan et al., 2011), a subset is known to correlate with gene transcription. For example, DNA cytosine methylation (Cm) (Maunakea et al., 2010a), histone 3 lysine 27 tri-methylation (H3K27me3), and histone 3 lysine 9 tri-methylation (H3K9me3) may repress gene transcription, whereas histone 3 lysine 4 mono-, di-, and tri-methylation (H3K4me1/2/3), lysine 27 acetylation (H3K27ac), and lysine 36 tri-methylation (H3K36me3) are positively associated with transcription (Karlić et al., 2010). The roles of some epigenomic marks (epi- marks) remain controversial. For example, the histone variant H2A.Z is generally assumed to be associated with active promoters because it anti-correlates with Cm in plants, insects, and fish (Zemach et al., 2010); consistent with this, H2A.Z is associated with active promoters in flies (Weber et al., 2010). However, H2A.Z is associated with inactive promoters in yeast (Guillemette et al., 2005; Raisner et al., 2005). The role of H2A.Z has yet to be tested in mammals. Even epi- modifications whose roles are better established may have undiscovered functions.
So far, the functions of many epi- modifications have been evaluated only individually, primarily because it is difficult to assess the functional significance of co-localized epi- marks. Any two epi- marks can co-localize in some genomic regions, but most such co-localizations serve no regulatory function. The best-documented co-localization of epi- marks is probably the bivalent domain (H3K27me3 + H3K4me3), which is hypothesized to keep genes poised for activation during the differentiation of embryonic stem cells (ESCs) (Mikkelsen et al., 2007). This hypothesized function was derived from comparing ESCs with other cell types; a mechanistic understanding of how bivalent domains regulate lineage-specific ESC differentiation is still lacking. We aim to provide a new approach to systematically examine the functions of epi- modifications and, more importantly, the functions of combinations of epi- marks. We propose to achieve this goal by leveraging the connection between evolutionary conservation and functional importance.
Here we introduce ‘comparative epigenomics’, the interspecies comparison of epigenomes, as a novel approach for annotating the regulatory sequences of the genome. We created a multi-species epigenomic dataset from pluripotent stem cells of humans, mice, and pigs, which comprises the genomic distributions of DNA methylation and eight histone modifications, the binding intensities of four transcription regulators (Nanog, Oct4, P300, Taf1), and transcribed RNA sequences. We first examined the co-evolutionary properties of the epigenome, the genome, and the transcriptome. Comparing epigenomic changes with genomic changes, we observed strong epigenomic conservation for both fast-evolving and slowly evolving DNA sequences, but not for neutrally evolving DNA sequences. These data suggest that epigenomic conservation is not completely dictated by genomic sequence. On the other hand, interspecies epigenomic changes are linearly correlated with evolutionary changes in transcription factor binding and gene expression, suggesting that comparative epigenomics can directly reveal critical information about gene regulation. Based on these initial analyses, we set out to discover regulatory sequences through the conserved co-localization of different epi- marks. To test the functions of these putative regulatory sequences, we developed a differentiation assay in which mouse embryonic stem cells were differentiated into mesendoderm cells. Time-course ChIP-seq and RNA-seq data from this differentiation process confirmed the regulatory functions of all seven pairs of epi- marks identified by conserved co-localization. Thus, conserved co-localization is an efficient way to identify functional epi- mark combinations among the large, combinatorial number of possible combinations of epi- marks. More importantly, comparative epigenomics reveals regulatory features of the genome that cannot be discerned from sequence comparison alone.