Histone modifications and DNA methylation play a critical role in regulating gene expression by modifying the chromatin structure of genes and recruiting additional regulatory factors. Our work is based on the hypothesis that a truly integrated epigenomic analysis could yield greater insight into the transcriptional programming of cancer cells than can be obtained by simply measuring the abundance of mRNAs with expression microarrays, since epigenetic modifications capture information not only about genes that are actively transcribed but also about the availability of genes for transcription, by reporting on the chromatin structure at specific loci. As proof of principle, we selected two functionally validated epigenetic marks (cytosine methylation and histone H3 lysine 9 acetylation) in addition to standard gene expression arrays and tested their ability to identify gene regulatory differences between AML and ALL cells. First, we demonstrate that such multiplatform epigenomic studies can be readily performed in enriched leukemia cells from standard clinical trial patient specimens. Second, using a novel approach for integrated analysis, we demonstrate that not only is there a functional relationship between gene expression and epigenetic marks but, more importantly, that these platforms synergize to provide a more complete and comprehensive analysis of transcriptional programming in human cells. The strengths of this study include the rigorous quality control steps, the use of a powerful DNA methylation platform on specially designed high-density oligonucleotide microarrays, the use of primary patient materials, the performance of all three assays on the same type of 50-mer high-density oligonucleotide arrays in multiple replicates, and the extensive single-locus validation.
Recently, DNA methylation microarrays have been used to study acute leukemias as well as other malignancies. Such studies have identified groups of hypermethylated genes in AML cell lines [25] as well as in ALL patient samples [26]. The integration of genetic and epigenetic platforms therefore seems only natural, since each type of platform has individually been shown to capture biologically relevant information. Along these lines, some groups have begun to investigate the potential of combining information from different microarray platforms. Shi et al. used a CpG island microarray containing 1507 expressed CpG island sequence tags to carry out a triple analysis of histone acetylation, DNA methylation and gene expression in an ovarian cancer cell line treated with trichostatin A and 5-aza-2′-deoxycytidine. While they were able to detect a functional interaction between histone acetylation and DNA methylation, they could not demonstrate an overall correlation between changes in epigenetic modifications and changes in expression levels [27]. Wu et al. used a combination of ChIP-chip for H3K9 modifications and Differential Methylation Hybridization (DMH) on a 9.2 K mouse promoter array and showed an inverse correlation between H3K9 acetylation and DNA methylation, while no significant correlation could be found between DNA methylation and H3 dimethyl-K9 at the promoter level [28]. In our study, by contrast, we propose the use of a combination of three different high-density genomic and epigenomic platforms for the in-depth analysis of their relationship in the context of human cancer specimens.
Posing a simple biological question, namely the distinction between cell types in a sample set, we first determined by systematic unsupervised clustering analysis that epigenomic platforms can be readily used for profiling and classification of leukemia clinical samples. Moreover, adding DNA methylation and H3K9 acetylation data to gene expression data resulted in a significantly larger number of genes being identified that distinguished ALL from AML samples. Since each of these platforms is affected by its own technical limitations, it is not surprising that they detect only partially overlapping cohorts of genes. The existence of such limitations was confirmed by the fact that restricting the analysis to the subset of genes that displayed high signal-to-noise ratios on any two platforms (i.e. those genes that we can be certain were accurately measured by both platforms) resulted in a high degree of correlation between the different measures.
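The signal-to-noise restriction described above can be illustrated with a minimal sketch. It is not the analysis code used in this study: the field names (`snr_a`, `snr_b`, `diff_a`, `diff_b`), the cutoff `min_snr` and the toy values are all hypothetical, and Pearson correlation is assumed as the measure of agreement between two platforms.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def correlation_for(genes):
    """Correlate the ALL-vs-AML difference on platform A with that on platform B."""
    return pearson([g["diff_a"] for g in genes], [g["diff_b"] for g in genes])

def well_measured(genes, min_snr):
    """Keep genes whose signal-to-noise ratio passes the cutoff on BOTH platforms."""
    return [g for g in genes if g["snr_a"] >= min_snr and g["snr_b"] >= min_snr]

# Toy data (hypothetical): four well-measured genes and three noisy ones.
genes = [
    {"snr_a": 5.0, "snr_b": 5.0, "diff_a": 2.0, "diff_b": 1.8},
    {"snr_a": 6.0, "snr_b": 4.0, "diff_a": -1.5, "diff_b": -1.2},
    {"snr_a": 4.0, "snr_b": 5.0, "diff_a": 1.0, "diff_b": 1.1},
    {"snr_a": 5.0, "snr_b": 6.0, "diff_a": -2.0, "diff_b": -2.1},
    {"snr_a": 1.0, "snr_b": 1.0, "diff_a": 0.5, "diff_b": -0.9},
    {"snr_a": 1.0, "snr_b": 2.0, "diff_a": -0.4, "diff_b": 0.8},
    {"snr_a": 0.5, "snr_b": 1.0, "diff_a": 0.3, "diff_b": -0.7},
]

kept = well_measured(genes, min_snr=3.0)  # cutoff value is an assumption
r_all = correlation_for(genes)
r_kept = correlation_for(kept)
print(f"all genes: r = {r_all:.2f}; high-SNR genes only: r = {r_kept:.2f}")
```

In this toy example the correlation is higher within the high signal-to-noise subset than across all genes, which mirrors the qualitative behaviour reported above. For a mark that is expected to correlate negatively with expression, such as DNA methylation, the sign of the coefficient would be reversed.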
Furthermore, we hypothesized that this technical limitation, due to the presence of noise in gene expression arrays, was significantly affecting our ability to detect genuine differences in mRNA levels. We examined a group of genes that displayed a significant difference between ALL and AML in either H3K9 acetylation or DNA methylation levels but did not display significant differences on gene expression arrays. When the mRNA levels of these genes were measured by qRT-PCR, an underlying difference in gene expression could be readily detected. Thus, we were able to confirm that a substantial amount of information is lost when genome-wide studies are carried out by gene expression microarrays alone, and that this information can be recovered by the integration of epigenetic data, reflecting the additive benefit obtained from such an integrated approach.
However, our main goal was to investigate whether gene expression and epigenomic microarrays were capable of reinforcing each other. Our data show that it is possible to further harness the power of integrated epigenomics to identify differentially regulated genes, since genes missed by both gene expression and epigenetic profiling could be recovered for analysis by taking advantage of the tendency of gene expression profiling to correlate positively or negatively with epigenetic marks. To this end, we looked for genes that fell marginally short of the significance threshold on both gene expression and H3K9 acetylation and for which both measures behaved concordantly. Using these new criteria, an additional 382 genes were identified that had been missed by both platforms. Careful single-locus validation of randomly selected genes from this cohort confirmed such genes to be genuinely differentially acetylated and expressed, thus demonstrating the synergistic power of this integrative analysis.
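The recovery criteria described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure used to obtain the 382 genes: the `GeneStats` record, the function name, the two p-value cutoffs (`strict_p`, `relaxed_p`) and the example genes are hypothetical, and concordance is taken to mean that the expression and H3K9 acetylation differences share the same sign.

```python
from dataclasses import dataclass

@dataclass
class GeneStats:
    gene: str
    expr_diff: float  # ALL minus AML difference in expression signal
    expr_p: float     # p-value for the expression difference
    ac_diff: float    # ALL minus AML difference in H3K9 acetylation signal
    ac_p: float       # p-value for the acetylation difference

def rescue_concordant(stats, strict_p=0.001, relaxed_p=0.01):
    """Return genes missed by both platforms that are marginal and concordant.

    strict_p and relaxed_p are assumed cutoffs chosen for illustration only.
    """
    rescued = []
    for s in stats:
        # Not significant on either platform at the original threshold.
        missed_by_both = s.expr_p >= strict_p and s.ac_p >= strict_p
        # Close to significance on both platforms.
        near_threshold = s.expr_p < relaxed_p and s.ac_p < relaxed_p
        # Acetylation and expression change in the same direction.
        concordant = s.expr_diff * s.ac_diff > 0
        if missed_by_both and near_threshold and concordant:
            rescued.append(s.gene)
    return rescued

# Hypothetical example genes.
example = [
    GeneStats("GENE_A", 1.2, 0.004, 0.9, 0.006),    # marginal on both, same direction
    GeneStats("GENE_B", 1.1, 0.003, -0.8, 0.005),   # marginal on both, opposite direction
    GeneStats("GENE_C", 2.0, 0.0001, 1.5, 0.0002),  # already significant on both
    GeneStats("GENE_D", 0.1, 0.6, 0.2, 0.4),        # far from significance
]

rescued = rescue_concordant(example)
print(rescued)
```

Only `GENE_A` satisfies all three conditions in this example. Requiring agreement between two independent measurements is what allows the per-platform threshold to be relaxed; for DNA methylation, where the expected relationship with expression is inverse, the concordance test would instead require opposite signs.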
We propose that the additive and/or synergistic ability of integrative genomics and epigenomics to capture differentially regulated genes in human clinical samples will enhance understanding of disease pathogenesis when carried out in an adequately designed study. The current study used the extreme comparison of ALL with AML clinical samples to demonstrate proof of principle of the approach. However, even with limited numbers of samples, the integrated analysis captures gene networks missed by single platforms, improves the level of confidence in gene networks that were only partially recognized by single platforms, and may center networks more completely around critical mediators of tumorigenesis, so that subsequent functional studies can focus on the gene products most likely to occupy central roles in the biology of the specific tumors.
In summary, our simple approach for integrated analysis shows a functional relationship between gene expression and epigenetic marks and, more importantly, demonstrates that these platforms synergize to provide a more complete and comprehensive analysis of transcriptional programming. We predict that, when applied to large cohorts of patients enrolled in clinical trials, this integrated epigenomics approach will provide more accurate disease classification and more powerful prognostic information, which could then be used to design improved risk-adapted and targeted therapy clinical trials.