Unravelling the complexities of the epigenome is a major objective, and recent years have seen the development of several strategies for genome-wide analysis of epigenetic marks, including DNA methylation. One of the principal challenges now is to develop more powerful analytical tools to interpret the vast amounts of data that are being, and will continue to be, generated.
Here, we have reported the development and validation of Batman, a novel cross-platform algorithm for the quantitative analysis of MeDIP data generated using either arrays (MeDIP-chip) or next-generation sequencing technologies (MeDIP-seq). The MeDIP-seq data represent the first high-resolution DNA methylome of any mammalian genome. Batman, combined with MeDIP-chip or MeDIP-seq, provides estimates of absolute methylation levels over a wide range of CpG densities. This is a very useful property for DNA methylome analyses, as it will allow more effective genome-wide profiling, including of CpG-poor regions that have been overlooked in most DNA methylome studies to date. Furthermore, estimation of absolute DNA methylation levels will facilitate cross-platform comparisons.
Although several strategies now exist for DNA methylation profiling, there are, to the best of our knowledge, two others that compare with MeDIP-chip/MeDIP-seq + Batman in terms of genomic coverage and quantitative performance. The first is Comprehensive High-throughput Arrays for Relative Methylation (CHARM), which was recently reported by Feinberg and colleagues [42]. CHARM combines a tiling-array design strategy with statistical procedures that average information from neighboring genomic locations. The authors applied CHARM to the McrBC assay, in which the DNA is digested with McrBC, which restricts methylated DNA (recognition sequence $R^{m}C(N)_{55\text{–}103}R^{m}C$). The enzyme is used on size-selected (1.5–4.0 kb) DNA to fractionate unmethylated DNA after digestion, which is co-hybridized on arrays with DNA similarly processed but not cut with the enzyme. The authors demonstrated that CHARM correlates well with bisulfite-conversion based data ($R^2 = 0.76$). However, CHARM is not a ‘stand-alone’ algorithm but rather a strategy that requires the use of a particular array design, and it is unclear whether it can be adapted to next-generation sequencing technologies. It does not estimate absolute DNA methylation levels and, as the authors note, CHARM suffers to some degree in the ability to discriminate highly methylated from highly unmethylated CpG islands. Interestingly, the authors also tested MeDIP-chip and concluded that it cannot be used to analyze CpG-poor regions [42]. Our results show that MeDIP + Batman can be used to provide absolute DNA methylation levels across a range of CpG densities (including CpG-poor regions) from arrays or next-generation sequencing.
Another recently reported approach is BS-seq, which Jacobsen and colleagues used to delineate a DNA methylome for the ~120 Mb Arabidopsis genome [19]. BS-seq can provide DNA methylation profiles at single-base pair resolution, which is indeed a very useful property. However, at current sequencing costs, such an approach is still prohibitively expensive for larger genomes such as the human genome, which is ~25X larger than the Arabidopsis genome. Based on our results, we estimate that ~40 million paired-end reads (less than a single run of an Illumina Genome Analyzer) are sufficient to generate a high-quality mammalian methylome, whereas ~3.8 Gb of sequence (which would equate to > 40 million paired-end reads) was required to generate a single-base pair resolution (~20X coverage) methylome for the ~120 Mb Arabidopsis genome using BS-seq [19]. Also, even though single-CpG resolution is desirable, the fact that the methylation status of CpG sites within <1000 bp is significantly correlated [18] (e.g. ~75% for ≤100 bp) means that the ~100 bp resolution is suitable for many applications.
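The cost argument above can be made concrete with a back-of-the-envelope calculation. The Python sketch below is illustrative only. It assumes that the BS-seq sequencing requirement scales linearly with genome size, which is a simplification and not a result reported here. The input figures (~120 Mb, ~3.8 Gb, ~25X) are taken from the text; the function and variable names are hypothetical.

```python
# Rough scaling of the BS-seq sequencing requirement from Arabidopsis to human.
# ASSUMPTION: required sequence grows linearly with genome size; real
# requirements also depend on mappability, read length and target coverage.

ARABIDOPSIS_GENOME_MB = 120        # ~120 Mb genome (from the text)
BSSEQ_ARABIDOPSIS_GB = 3.8         # ~3.8 Gb of BS-seq sequence (from the text)
HUMAN_TO_ARABIDOPSIS_RATIO = 25    # human genome is ~25X larger (from the text)


def scale_sequence_requirement(sequence_gb: float, genome_ratio: float) -> float:
    """Scale a sequencing requirement (in Gb) by a genome-size ratio."""
    return sequence_gb * genome_ratio


# Genome size implied by the ~25X ratio, in Mb.
human_genome_mb = ARABIDOPSIS_GENOME_MB * HUMAN_TO_ARABIDOPSIS_RATIO

# BS-seq sequence that a human methylome would need under the linear assumption.
human_bsseq_gb = scale_sequence_requirement(
    BSSEQ_ARABIDOPSIS_GB, HUMAN_TO_ARABIDOPSIS_RATIO
)

print(f"Implied human genome size: ~{human_genome_mb} Mb")
print(f"Scaled BS-seq requirement for human: ~{human_bsseq_gb:.0f} Gb")
```

Under this linear assumption, a human BS-seq methylome would need on the order of 25 times the sequence used for Arabidopsis, which illustrates why a ~100 bp resolution approach is attractive for large genomes.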
Although Batman in its present form performs well, we see opportunities for future development of MeDIP post-processing platforms, especially with regard to the use of sequencing technologies. In particular, when analyzing paired-end MeDIP-seq data, it should be possible to take advantage of the exact mapping positions of each read, rather than summarizing the data as a set of read-depth samples, thereby improving the resolution. Also, it would be interesting to apply Batman to the analysis of Arabidopsis MeDIP data. Although both CpG and non-CpG methylation are found in Arabidopsis [22,26], gene bodies contain predominantly the former, and therefore it should be possible to use Batman for the analysis of genic regions.
In the near future, the integration of (epi)genomic and functional approaches will be crucial for elucidating the biological role of DNA methylation. The need for such an integrated approach is also evident from the recently announced NIH Epigenome Roadmap Initiative, which calls for mapping of reference DNA methylation profiles on an unprecedented scale (http://nihroadmap.nih.gov/epigenomics/). We believe that the Batman algorithm combined with MeDIP-chip or MeDIP-seq will provide powerful and cost-effective strategies for quantitative, high-resolution DNA methylome analysis, and will contribute towards elucidating the role of the epigenome in health and disease.