|Home | About | Journals | Submit | Contact Us | Français|
Users may view, print, copy, download and text and data- mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
DNA methylation is a key component of mammalian gene regulation and the most classical example of an epigenetic mark. DNA methylation patterns are mitotically heritable and stable over time, but they undergo considerable changes in response to cell differentiation, diseases and environmental influences. Several methods have been developed for DNA methylation profiling on a genomic scale. Here, we benchmark four of these methods on two sample pairs, comparing their accuracy and power to detect DNA methylation differences. The results show that all evaluated methods (MeDIP-seq: methylated DNA immunoprecipitation, MethylCap-seq: methylated DNA capture by affinity purification, RRBS: reduced representation bisulfite sequencing, and the Infinium HumanMethylation27 assay) produce accurate DNA methylation data. However, these methods differ in their ability to detect differentially methylated regions between pairs of samples. We highlight strengths and weaknesses of the four methods and give practical recommendations for the design of epigenomic case-control studies.
Twenty-five years of research on cancer epigenetics have firmly established the prevalence of aberrant DNA methylation in cancer cells1–5. Moreover, recent studies have investigated the role of DNA methylation for neural and autoimmune diseases, its correlation with physiological conditions and its response to environmental influences6–8. Comprehensive mapping of DNA methylation in relevant clinical cohorts is likely to identify new disease genes and potential drug targets, help establish the relevance of epigenetic alterations in diseases other than cancer, and provide a rich source for biomarker development9. In a biotechnology context, DNA methylation profiling could also facilitate quality control of cultured cells, exploiting the fact that cell states and differentiation potential of stem cells are reflected in their DNA methylation patterns10.
Several methods have been developed to enable DNA methylation profiling on a genomic scale. Most of these methods combine DNA analysis by microarrays or high-throughput sequencing with one of four ways of translating DNA methylation patterns into DNA sequence information or library enrichment: (i) Methylated DNA immunoprecipitation (MeDIP) uses an antibody that is specific for 5-methyl-cytosine to retrieve methylated fragments from sonicated DNA11,12. (ii) Methylated DNA capture by affinity purification (MethylCap) employs a methyl-binding domain protein to obtain DNA fractions with similar methylation levels13–15. (iii) Bisulfite-based methods utilize a chemical reaction that selectively converts unmethylated (but not methylated) cytosines into uracils, thus introducing methylation-specific single-nucleotide polymorphisms into the DNA sequence10,16,17. (iv) Methylation-specific digestion uses prokaryotic restriction enzymes to fractionate DNA in a methylation-specific way18–20.
The diversity of methods and absence of an uncontested commercial market leader raise questions about each method’s strengths and weaknesses – questions that researchers have to answer for themselves when selecting the most appropriate technology for any given project. The goal of this study was to perform a comprehensive benchmarking of four popular methods, with a special emphasis on their practical utility for biomedical research and biomarker development. We selected MeDIP-seq11, MethylCap-seq13, RRBS21 and the Infinium HumanMethylation27 assay16 for inclusion in this comparison, based on the following considerations: (i) All four methods are relatively easy to set up because detailed protocols have been published and / or commercial kits are available. (ii) We chose RRBS rather than genome-wide bisulfite sequencing because its per-sample cost are comparable to the other methods and realistic for large sample sizes. (iii) The Infinium HumanMethylation27 assay was included because of its wide use and easy integration with existing genotyping pipelines; it is the only microarray-based method in our comparison. (iv) Methods that utilize tiling microarrays were excluded because they have been benchmarked previously19 and because next-generation sequencing enables higher resolution and/or higher genomic coverage at competitive cost. (v) Methylation-specific digestion was excluded because no algorithm exists that could accurately infer quantitative DNA methylation data from digested read frequencies18. An outline of the experimental and analytical procedure of this technology comparison is shown in Figure 1.
Genome-wide DNA methylation mapping is most commonly used as a discovery tool, in order to identify differentially methylated regions (DMRs) as candidates for further research. Typical examples are cancer-specific DMRs, which play an increasing role as biomarkers for cancer diagnosis and therapy optimization9. To emulate the case-control approach that is widely used for epigenetic biomarker development, our study focuses on sample pairs that we statistically compare with each other. Specifically, we selected two human embryonic stem (ES) cell lines that were derived from genetically unrelated embryos22, and a matched pair of colon tumor and adjacent normal colon tissue obtained from the same donor. We applied each of the four methods (MeDIP, MethylCap, RRBS, Infinium) to all four samples (HUES6 ES cells, HUES8 ES cells, colon tumor, matched normal colon tissue), generating a total of 16 genome-scale DNA methylation maps. All data were processed with a standardized bioinformatic pipeline, and the technical data quality turned out to be similarly high across all samples and methods (Table 1).
When plotting the DNA methylation data as genome browser tracks we found excellent visual agreement between all four methods (Figure 2; tracks are available online for interactive browsing: http://meth-benchmark.computational-epigenetics.org/). MeDIP and MethylCap gave rise to peaks of methylated DNA that were similar in shape, size and location, indicating that MeDIP’s monoclonal antibody and MethylCap’s methyl-binding domain enrich for similar DNA fragments. However, MeDIP exhibited higher baseline levels and lower peak heights than MethylCap. This reduced dynamic range is already apparent from Figure 2 (note the different scale of the y-axis) and becomes more obvious when plotting MeDIP and MethylCap tracks along an entire chromosome (Supplementary Figure 1). This observation was quantitatively confirmed by plotting the mean read frequency for enriched and depleted fractions of the genome (Supplementary Figure 2). We also observed high visual agreement between RRBS and Infinium, with the limitation that Infinium covers two orders of magnitude fewer CpGs than RRBS (Table 1). Finally, the bisulfite-based methods (RRBS, Infinium) generally confirm the results of the enrichment-based methods (MeDIP, MethylCap), although there are deviations in repeat-rich as well as in CpG-poor genomic regions (Supplementary Figure 3).
For a more quantitative assessment of measurement accuracy, we compared the results of the three sequencing-based methods (MeDIP, MethylCap, RRBS) with the Infinium HumanMethylation27 assay as a common reference. The Infinium assay was used as reference because its quantitative accuracy has been established in previous studies16,23, which reported correlation coefficients around 0.9 relative to the GoldenGate and MethyLight assays. Note however that the probes of the Infinium assay cover only a small percentage of all CpGs in the genome and are preferentially located in unmethylated promoter regions. To compensate for this potential source of bias, we calculate two correlation coefficients, one across the entire spectrum of methylation levels and the other focusing only on those CpGs that exhibit at least 20% methylation according to the Infinium assay.
RRBS and Infinium data can be compared directly and without normalization, because both methods measure absolute DNA methylation levels. For a total of 5,088 single CpGs that were covered by both an Infinium probe and at least five RRBS reads, we observed a Pearson correlation of 0.92 across all DNA methylation levels, and a Pearson correlation of 0.83 when we excluded unmethylated CpGs. Because neighboring CpGs tend to exhibit highly correlated DNA methylation levels17,24 we also evaluated the correlation for RRBS measurement averages over a 200 basepair sequence window around each Infinium probe. Again, we observed excellent agreement between the two methods (Figure 3C), with an overall Pearson correlation of 0.92 across all DNA methylation levels and a Pearson correlation of 0.84 when we excluded unmethylated CpGs. This second comparison supports the hypothesis that a single-CpG measurement can often act as an indicator of the DNA methylation levels at neighboring, unmeasured CpGs.
Comparison with MeDIP and MethylCap is less straightforward because both methods measure the relative enrichment of methylated DNA rather than absolute DNA methylation levels. When we correlated the number of sequencing reads per 1-kilobase region with the DNA methylation measurements of the Infinium assay, the Pearson correlation did not exceed 0.6 across all DNA methylation levels and 0.4 when we excluded unmethylated CpGs (Supplementary Figure 3A and B). High density of repetitive DNA was identified as the major source of spurious read enrichment in regions with low absolute DNA methylation levels, and low CpG density gave rise to low read numbers in regions with high levels of DNA methylation (Supplementary Figure 3C and D). The confounding effect of DNA sequence is also visible in Figure 2: Low read counts can indicate either the relative absence of CpGs (example: region 1 in Figure 2) or the absence of DNA methylation in the presence of CpGs (region 2); and strong peaks can occur in genomic regions that are incompletely methylated if the CpG density is sufficiently high to give rise to substantial read enrichment (region 3).
It has previously been reported that statistical correction for CpG density can improve the quantification of DNA methylation levels based on MeDIP data11,25. We therefore constructed a linear regression model that corrects for the confounding effect of DNA sequence (see Methods section for details), and we observe substantially improved results (Figure 3A and 3B). Across all DNA methylation levels the correlation between the statistically corrected read counts and the DNA methylation measurements of the Infinium assay amounted to 0.84 for MeDIP and to 0.88 for MethylCap. However, the correlations dropped to 0.57 (MeDIP) and 0.66 (MethylCap) when we excluded unmethylated CpGs. These results indicate that MeDIP and MethylCap can almost as precisely distinguish between methylated and unmethylated region as RRBS, but are less accurate for quantifying the DNA methylation levels in partially methylated genomic regions.
The single-basepair resolution of the two bisulfite-based methods comes at the cost of reduced genomic coverage compared to the two enrichment-based methods: RRBS reads are clustered in approximately 1% of the genome10,26 and Infinium focuses on the methylation status of slightly less than 15,000 gene promoters, while MeDIP and MethylCap reads are theoretically able to identify methylated genomic regions located anywhere in the genome. To assess the empirical genomic coverage of each method, we calculated the number of reads (MeDIP, MethylCap) or CpG methylation measurements (RRBS, Infinium) for each of the following genomic regions: (i) CpG islands, (ii) gene promoters, and (iii) a 1-kilobase tiling of the genome. The results are shown in Figure 4, and coverage details for a total of 13 types of genomic regions are available online (http://meth-benchmark.computational-epigenetics.org/).
As expected, MeDIP and MethylCap provide broad coverage of the genome, whereas RRBS and Infinium are more restricted to CpG islands and promoter regions. However, the practically relevant differences in genomic coverage are lower than Figure 4 may suggest. This is because a minimum number of reads are required in at least one sample to reliably detect differential methylation among a given pair of samples. We illustrate this point by two statistical power calculations, which were performed with G*Power 327. Assume that a genomic region is covered by five MeDIP or MethylCap reads in one sample. Then it has to contain at least 20 reads in the second sample to be detected as hypermethylated (assuming a statistical power of 80% and a p-value of 5% without multiple-testing correction). Similarly, RRBS would detect a DNA methylation increase from 30% to 70% only when at least 25 measurements are available in each sample (again assuming a statistical power of 80% and a p-value of 5% without multiple-testing correction).
Genome-wide DNA methylation mapping is most commonly used for detecting DNA methylation differences, for example between diseased and healthy tissue or between genetically modified and unmodified control cells. To assess how well MeDIP, MethylCap and RRBS perform on this task, we developed a bioinformatic method that identifies statistically significant DMRs from multiple types of sequencing data (the Infinium assay requires a different approach and is discussed in a separate section below). For a pre-defined set of genomic regions we count the numbers of sequenced reads (for MeDIP and MethylCap), or alternatively the numbers of methylated vs. unmethylated CpGs (for RRBS), and we test for statistically significant differences between two samples using Fisher’s exact test. When applied to a complete tiling of the human genome, this method performs genome-wide DMR detection. Alternatively, it can be targeted to specific region types such as CpG islands, gene promoters or putative enhancers, which can lead to more sensitive detection of small difference because the multiple-testing burden is reduced compared to genome-wide DMR detection. We pursued both the unbiased and the annotation-guided approach in parallel, focusing our comparison on three types of genomic regions: (i) CpG islands, (ii) gene promoters, and (iii) a 1-kilobase tiling of the genome (Figure 5, Supplementary Figures 4 to 8).
Overall, we observed high correlation for each of the two sample pairs, but also outliers suggesting the presence of DMRs. Based on the RRBS data we obtained Pearson correlations around 0.9 for all three region types, both between the two ES cell lines (HUES6 and HUES8) and between the colon tumor and matched normal colon tissue. For MethylCap and MeDIP, the correlations were somewhat lower and ranged from 0.75 to 0.92 (Figure 5, Supplementary Figures 4 to 8). Using the DMR detection algorithm (see Methods for details), we identified several hundred to several thousand DMRs in each of the two sample pairs. There was substantial, but by no means perfect, overlap between the DMRs identified by all three methods. For the two human ES cell lines, 277 out of 44,440 CpG islands were detected as differentially methylated by each of the three methods (Figure 5D), and pairwise comparisons for each sample and region type (Supplementary Figures 4 to 8) confirmed that the agreement between the three methods was statistically significant in all cases (p<0.01, Fisher’s exact test). In total, we observed that up to 1,000 CpG islands, 405 promoter regions or 1,924 of the 1-kilobase tiling regions (i.e. less than 0.1% of the genome) were detected as differentially methylated by at least two methods. Note however that it is not possible to combine these values into a single sum of DMRs because many CpG islands overlap with promoter regions, and every CpG island and promoter region overlaps with at least one tiling region. Nor does the number of differentially methylated tiling regions provide an accurate estimate of the “true” number of DMRs because a sizable number of DMRs are not statistically significant anymore when split into 1-kilobase regions. Despite these conceptual difficulties, our data clearly indicate that – on average – MethylCap identifies more DMRs than RRBS, while MeDIP identifies the lowest number of DMRs. This order was observed not only based on the total number of DMRs per method, but also when focusing only on those DMRs that were detected by at least two methods, indicating that the comparison is not distorted by high numbers of method-specific artifacts.
In order to pinpoint potential problems of MeDIP, MethylCap or RRBS, we manually inspected a large number of regions that were identified as significant DMRs by only one method. The most frequent reasons why method-specific DMRs were missed by the other methods were insufficient genomic coverage (RRBS, Infinium) and low read numbers conferring insufficient statistical power to detect differential DNA methylation (MeDIP, MethylCap). No cases were identified in which the RRBS and Infinium data were in direct contradiction with each other. However, we could identify a few cases in which MeDIP or MethylCap were inconsistent with RRBS and/or Infinium data. These were almost exclusively located in repetitive regions, indicating that high copy-number repeats can amplify minor differences in the efficiency of methylated DNA enrichment and give rise to a small number of spurious DMRs. In contrast, RRBS seems more robust toward such fluctuations because it measures DNA methylation based on the DNA sequence of the reads in a given region, rather than based on their read frequency. We also assessed whether copy-number variation was a major confounding factor for DMR discovery. This does not seem to be case for our data: The vast majority of DMRs were shorter than 10kb (Supplementary Figure 9), while it is not uncommon for cancer-specific as well as germline-transmitted copy-number variations to extend for much longer distances28,29.
As an additional validation, we selected eight method-specific DMRs based on the ES cell comparison, and we investigated DNA methylation patterns in the two ES cell lines by clonal bisulfite sequencing (Table 2). These genomic regions were hand-picked such that one method clearly identified them as DMRs while the two other methods did not show a trend in either direction. Note that this pre-selection makes the validation substantially harder than confirming randomly selected DMRs, because method-specific DMRs tend to be weaker than DMRs that are detected by multiple methods. As an additional complication, some of the selected DMRs are highly repetitive or overlap with known copy-number variations. Sequencing an average of 11 clones per sample and region we were able to confirm three out of three MethylCap-specific DMRs and two out of two RRBS-specific DMRs. In contrast, two MeDIP-specific DMRs could not be confirmed, and for the third region the agreement was marginal (Table 2, Supplementary Data 1).
To assess the practical relevance of the method-specific differences, we asked whether biologically interesting hits were missed by any of the three methods. For this analysis we focused on the colon samples because of the large number of genes with a known or suspected role in colon cancer. Our results show that several interesting DMRs are detected by all methods, including tumor-specific hypermethylation in the promoters of GATA230 and GATA531. However, a significant number of interesting DMRs were missed by MeDIP, while MethylCap and RRBS both detect them. To give a few examples, this is the case for tumor-specific hypermethylation in the promoter regions of SOX1732, POU2AF133 and SEPT934. Somewhat more rarely, we also observed interesting DMRs being missed by MethylCap or RRBS. For example, MethylCap overlooked tumor-specific hypermethylation at the promoter of SFRP135, and RRBS missed tumor-specific hypermethylation at the promoter of DKK236.
The three sequencing-based methods use DNA sequencing as a way of counting DNA fragments, in order to determine the percentage of methylation-enriched reads that align to specific regions (MeDIP, MethylCap) or to calculate the ratio of methylated and unmethylated cytosines at single CpGs (RRBS). Conceptually, sequencing can be thought of as random sampling from a large pool of DNA fragments. It is therefore expected that the performance of these methods increases when sequencing more DNA fragments, until it levels off as the sequencing depth approaches saturation. To quantify this effect, we repeated the accuracy analysis (Figure 3) and the DMR detection (Figure 5) on randomly sampled subsets of sequencing reads. First, we benchmarked each method against the Infinium data, assessing their ability to quantify DNA methylation levels based on reduced read numbers (Supplementary Figure 10). The results show that all three methods give rise to accurate DNA methylation measurements based on as few as 20% of the total read coverage, and almost no improvement was observed between 50% and 100% sequencing depth. While these data suggest that relatively low sequencing depths are often sufficient for obtaining accurate DNA methylation levels, this cannot be generalized to the entire genome: Infinium probes tend to be located in CpG-rich genomic regions, which are also preferentially covered by MeDIP, MethylCap and RRBS measurements (Figure 4), such that saturation is reached earlier in the vicinity of Infinium probes than in CpG-poor genomic regions.
Second, we tested how many DMRs were still detected among the two sample pairs when the number of sequencing reads in each of the samples was reduced (Supplementary Figure 11). For MeDIP, the number of detected DMRs dropped to less than half when the sequencing depth was reduced to 50%, and there was little indication that the number of MeDIP DMRs approaches saturation even at the highest sequencing depth. For MethylCap the reduction is less dramatic and there is a trend toward saturation. RRBS quickly approaches saturation especially for the ES-cell comparison (Supplementary Figure 11). Overall, the saturation analysis reinforced a conceptual difference between RRBS on the one hand and MeDIP and MethylCap on the other hand: In RRBS, all sequencing is focused on a well-defined, CpG-rich “reduced representation” of the genome, which leads to relatively early saturation but limited coverage of DMRs in CpG-poor genomic regions. In contrast, MeDIP and MethylCap reads are widely distributed over the genome (albeit with a significant tendency toward high coverage in CpG-rich regions), and deep sequencing increasingly uncovers weak DMRs located in CpG-poor genomic regions.
DNA methylation differences in repetitive regions have frequently been ignored by genome-wide studies, due to technical difficulties such as ambiguous read alignment (for sequencing) and cross-hybridization (for microarrays). This is unfortunate given that loss of DNA methylation in repetitive DNA was the first epigenetic alteration shown to play a role in cancer3 and has been an area of active research ever since37. In the current study, we explored two complementary approaches to test for repeat-associated DNA methylation differences. First, we included repetitive regions alongside non-repetitive regions in the DMR detection described above (Figure 5, Supplementary Figures 4 to 8), rather than discarding all sequencing reads that map to repetitive portion of the genome. It was thus possible to identify repeat-associated DMRs in a similar way as non-repetitive DMRs, and we could validate several such cases by clonal bisulfite sequencing (Table 2). However, the focus on specific genomic regions makes it difficult to detect global trends that affect certain repeat classes independent of their exact location in the genome. We therefore developed a second approach, which was motivated by the common origin of many repetitive regions from a small number of retrotransposons. The basic concept was to align sequencing reads to prototypic sequences (e.g., of Alu and L1 elements), in order to obtain DNA methylation measures per repeat class rather than per repeat instance.
To that end, we obtained a manually curated list of 1,267 prototypic repeat sequences that spans the spectrum of repetitive DNA present in the human genome38, and we aligned the sequencing reads of all three methods to this collection of repeat sequences. Approximately 20% of all MeDIP, MethylCap and RRBS reads could be aligned with high confidence, enabling us to estimate the global DNA methylation levels for 553 prototypic repeat sequences. The results of the three methods were in excellent agreement with each other (Supplementary Data 2) and detected substantial differences in the DNA methylation levels of different repeat classes: Among Alu, SVA and satellite repeat sequences we observed consistently high levels of DNA methylation, while most LINE, LTR and DNA repeat sequences exhibited low levels of DNA methylation in the four samples that we investigated. However, we found that the repeat sequences with the highest copy-number throughout the genome were highly methylated for all repeat classes.
When we compared the DNA methylation levels in the two sample pairs (Supplementary Data 3), we observed widespread but relatively moderate hypomethylation in the colon tumor relative to matched normal colon tissue. The most common targets were Alu, SVA and satellite repeat sequences, consistent with previous reports about cancer-specific hypomethylation37. An interesting difference was identified between the two ES cell lines on the one hand and the two colon samples on the other hand: the only human-specific LINE repeat sequence in our collection (L1HS_5end) exhibited high levels of DNA methylation in the two colon samples, but was largely unmethylated and even marked by histone H3K4 trimethylation in the two ES cell lines (Supplementary Data 2). These data suggest that young retrotransposons find ways to evade silencing by DNA methylation in pluripotent cells, which may contribute to their ability to maintain activity in spite of an elaborate epigenetic genome defense39.
Our study used the Infinium HumanMethylation27 assay as a common reference for evaluating the accuracy of the sequencing-based methods, which was justified by prior studies showing high quantitative accuracy of the Infinium assay16,23. However, no prior study investigated the Infinium HumanMethylation27 assay’s power to detect DMRs on a genome-wide scale, hence we could not use the Infinium assay as reference when evaluating DMR discovery by the sequencing-based methods. In fact, one might expect that the utility of the Infinium assay for DMR discovery is quite limited (despite its well-established accuracy) because the assay’s genomic coverage is low (Figure 4). To systematically assess the utility of the Infinium HumanMethylation27 assay for DMR discovery, we initially performed statistical testing in much the same way as for Figure 5. However, most CpG islands were covered by only two Infinium probes, which resulted in low statistical power to detect significant differences. Specifically, paired-samples t-tests identified just three significant DMRs among the ES cell lines and two DMRs between the colon tumor and matched normal colon tissue (data not shown).
Thus, we reformulated our question and asked how many true DMRs exhibited suggestive (albeit insignificant) DNA methylation differences in the Infinium data. As an approximation of true DMRs, we focused on those CpG islands that were detected by at least two sequencing-based methods (which are unlikely to contain a high number of technical artifacts according to the comparative validations described above). Between the two ES cell lines a total of 1,000 consensus DMRs were identified (corresponding to the sum of all center fields in Figure 5), of which 251 were covered by at least one Infinium probe. Similarly, we identified 463 consensus DMRs between the colon tumor and matched normal colon tissue, of which 177 were covered by at least one Infinium probe. In most cases, the directionality of the difference was consistent between the consensus DMRs and the Infinium data (Supplementary Figure 10). But when we imposed a minimum threshold of 20 percentage points DNA methylation difference in the same way as for RRBS, the number of Infinium-detected DMRs dropped to 162 (ES-cell comparison) and 95 (colon cancer comparison). In other words, the Infinium assay detected approximately a fifth of the consensus DMRs that we identified by the sequencing-based methods.
Over the last decade, DNA methylation mapping played an important role in establishing the prevalence of altered DNA methylation in cancer40,41. Furthermore, researchers have started to systematically study the role of DNA methylation in a wide range of non-neoplastic diseases42. This is indeed a good time to probe for epigenetic alterations that contribute to human diseases: Genome-wide association studies have been completed for all common diseases, and their results point to a major role of non-genetic factors in the etiology of most diseases43. Furthermore, it has been suggested that epigenetic events could provide a tractable link between the genome and the environment, with the epigenome emerging as a biochemical record of relevant life events44,45. Systematic investigation of these topics requires powerful, accurate and cost-efficient methods for identifying DNA methylation differences between samples.
The goal of this study was to evaluate current methods for global DNA methylation mapping and to compare their performance in a practical application scenario. To mimic a typical disease-centered case-control study, we worked with primary patient material (colon samples) and used lower amounts of input DNA than in most previous studies (MeDIP: 300ng, MethylCap: 1µg, RRBS: 50ng, Infinium: 1µg). Furthermore, we focused on cell types that are known to exhibit relatively moderate DNA methylation differences30,46, in contrast to the massive DNA methylation alterations that are frequently observed in cultured somatic cells10 and cancer cell lines47. Finally, all four methods included in the current study are widely available and not excessively costly, such that there are few obstacles to using this technology comparison as a blue print for individual lab efforts as well as large-scale epigenomic case-control studies investigating the epigenetics of human diseases.
Overall, the data confirmed that all four methods provide accurate DNA methylation measurements and can be used to detect DMRs in clinical samples. In terms of accuracy, the bisulfite-based methods (RRBS, Infinium) performed slightly better than the enrichment-based methods and did not require any statistical correction of CpG bias. The genomic coverage was moderately higher for MethylCap than for MeDIP, RRBS coverage was by design focused on CpG-rich regions, and the Infinium assay covered a relatively small number of preselected genomic regions. Despite the striking differences in genomic coverage, a substantial fraction of DMRs detected by MeDIP or MethylCap were also identified by RRBS (and vice versa). This somewhat counter-intuitive observation can be explained by the role of region-specific read coverage for the ability to identify statistically significant DMRs: If a genomic region is CpG-poor and thus rarely sequenced by MeDIP or MethylCap, both methods have low statistical power to detect differential DNA methylation. In contrast, CpG-rich genomic regions tend to be more amenable to DMR detection by MeDIP and MethylCap and are also frequently covered by RRBS measurements. Finally, we observed that MethylCap was able to detect roughly twice as many DMRs as MeDIP at comparable sequencing depths, RRBS detected more DMRs than MeDIP but fewer DMRs than MethylCap, and the Infinium assay detected only 20% of the consensus DMRs identified by the sequencing-based methods. These differences could be reproduced in two independent pairwise comparisons, providing strong indication that they are robust across biological replicates and cannot be explained by random experimental variation. On the other hand, we used one specific protocol for each method, and it is quite possible that protocol variations (e.g., different antibody for MeDIP, different elution procedure for MethylCap, or different size selection for RRBS) would produce different results.
Our study also reinforces the importance of sequencing depth as a key parameter determining to power to detect differential methylation with any of the sequencing-based methods. To allow for a fair and practically relevant comparison, we sequenced approximately 30 to 40 million reads for each sample and method. However, it became evident that deeper sequencing would identify further DMRs, especially for MeDIP and MethylCap (Supplementary Figure 11). For disease-centered studies it is therefore necessary to make an informed decision about how to distribute the available resources between sequencing few samples more deeply and sequencing more samples less deeply. Such a decision can be guided by statistical power calculations when some prior knowledge exists about the characteristics of expected DMRs (e.g., magnitude of difference, location in CpG-rich vs. CpG-poor genomic regions), or they can be dictated by practical considerations such as the number of available samples. MeDIP, MethylCap and RRBS as performed in this study seem to provide a practically useful compromise between breadth and depth of sequencing. In contrast, whole-genome bisulfite sequencing48 provides comprehensive genomic coverage at the cost of having to sequence over a billion reads per sample. On the other end of the spectrum, low sequencing depths are often sufficient to detect strong differences such as global loss of DNA methylation but fail to provide reliable locus-specific information49.
Finally, genome-wide studies tend to ignore repetitive regions due to technical difficulties, and the few studies that focused specifically on mapping DNA methylation in repetitive regions did so at relatively low coverage50–52. The current dataset was well-suited to analyze DNA methylation in repetitive regions because the joint results obtained by three different experimental methods helped us to control for technical artifacts that can burden the analysis of repetitive DNA. We observed that repeat sequences are most highly methylated when they are CpG-rich and highly prevalent in the human genome (Supplementary Data 2). In contrast, the DNA methylation levels varied widely among repeat sequences that are either CpG-poor or infrequent in the genome. These results lend support to the hypothesis that DNA methylation provides a mechanism for keeping active retrotransposons in check53. They argue for a highly specific mechanism of repeat repression, which targets DNA methylation mostly to those repeat sequences that threaten genome integrity, while many “benign” repeat sequences may remain unmethylated.
In summary, we benchmarked four methods for genome-scale DNA methylation profiling in terms of their accuracy and power to detect DNA methylation differences. These results will facilitate the selection of suitable methods for studying the role of DNA methylation in human diseases.
Human ES cells were cultured in knockout serum replacement (KOSR) medium according to established protocols22 and genomic DNA was extracted as described previously54. DNA for the colon tumor and matched normal colon tissue was purchased from BioChain (lot number A704198). Both samples originate from the same donor, an 81-year-old male patient diagnosed with moderately differentiated adenocarcinoma.
MeDIP11 was performed using the EZ DNA methylation kit (Zymo Research). A total of 300ng DNA per sample was sonicated using Bioruptor (Diagenode) with 8 intervals of 10min (30s on, 30s off), resulting in an average fragment size of 150 basepairs. Sonicated DNA was end-repaired and ligated with sequencing adapters as described previously11. Gel-based selection for fragment sizes between 100 and 200 basepairs was followed by methylated DNA immunoprecipitation according to the manufacturer’s protocol. A total of 1µg of monoclonal antibody against 5-methyl-cytosine (included in the EZ DNA methylation kit) was used for immunoprecipitation. The immunoprecipitated DNA was PCR-amplified and the specificity of the enrichment was confirmed by qPCR for selected loci as described previously55. Two lanes of 36-basepair single-ended sequencing were performed on the Illumina Genome Analyzer II according to the manufacturer’s standard protocol. Maq with default parameters was used to align the sequencing reads to the NCBI36 (hg18) assembly of the human genome56.
MethylCap13 was performed in a robotized procedure using a SX-8G / IP-Star (Diagenode). 2µg of His6-GST-MBD (Diagenode) was combined with 1µg of sonicated DNA in 200µl of binding buffer (BB, 20mM Tris-HCl pH 8.5, 0.1% Triton X-100) containing 200mM NaCl. This solution was incubated at 4°C for 2 hours. Magnetic GST-beads were prepared by washing 35µl of a well-mixed MagneGST glutathione particle suspension (Promega) with 200µl of binding buffer plus 200mM NaCl at 4°C. Washing was repeated once and the supernatant was removed. The GST-MBD-DNA solution was added to the washed and collected beads, and this suspension was rotated for another hour at 4°C. After removal of the supernatant (this is the flow-through) the beads-GST-MBD-DNA complexes were eluted by washing. 200µl of binding buffer with different concentrations of NaCl was added and the suspension was rotated for 10min at 4°C. Beads were captured using a magnet, and the supernatant was collected. The elution procedure consisted of 1× 300mM (wash), 2× 400mM (wash), 1× 500mM (“low” eluate), 1× 600mM (“medium” eluate), 1× 800mM NaCl (“high” eluate). The collected eluates were purified using QIAquick PCR purification spin columns (Qiagen), eluted with 100µl elution buffer and prepared for sequencing as described previously13. A single lane of 36-basepair single-ended sequencing on performed on the Illumina Genome Analyzer II was performed for the low, medium and high eluates, respectively. The sequencing reads were aligned to the NCBI36 (hg18) assembly of the human genome using Illumina’s analysis pipeline (ELAND) with default parameters. The lanes for each of the three eluates are shown separately in Figure 2, and we tested whether the accuracy relative to the Infinium assay could be improved by taking this additional information into account. However, a linear model that was based on the separate read counts of the three lanes did not outperform a model that was based on the sum of the three lanes, which is why we used only pooled read data for the analyses described in this paper.
RRBS21 was performed according to a previously published protocol54 with some optimizations for clinical samples and low amounts of input DNA21. The main steps were: (i) A total of 50ng (ES cells) or 1µg (colon samples) genomic DNA was digested by 5U to 20U of MspI (New England Biolabs, NEB) for up to 16h. (ii) End-repair and adenylation of digested DNA were performed in a 20µl reaction consisting of 10U of Klenow fragments (3’→ 5’ exo-, NEB), 2µl premixed nucleotide triphosphates (1mM dGTP, 10mM dATP, 1mM 5’ methylated dCTP). The reaction was incubated at 30°C for 30min followed by 37°C for additional 30min. (iii) Preannealed 5-methylcytosine-containing Illumina adapters were ligated with adenylated DNA fragments in a 20µl reaction containing of 1µl concentrated T4 ligase (NEB), 1–2µl of 15µM adapters at 16°C for 16 to 20 hours. (iv) Gel-based selection for fragments with insertion sizes of 40 to 120 basepairs and 120 to 220 basepairs was performed as described previously21. (v) Bisulfite treatment with the EpiTect Bisulfite Kit (Qiagen) was conducted following the protocol designated for DNA isolated from formalin-fixed and paraffin-embedded tissues. Two rounds of conversion were performed in order to maximize bisulfite conversion rates. The final bisulfite-converted DNA was eluted with 2× 20µl pre-heated (65°C) EB buffer. (vi) To determine the minimum number of PCR cycles for final library enrichment, analytical (10µl) PCR reactions containing 0.5µl of bisulfite-treated DNA, 0.2µM each of Illumina PCR primers LPX1.1 and 2.1 and 0.5U PfuTurbo Cx Hotstart DNA polymerase (Stratagene) were set up. The thermocycler conditions were: 5min at 95°C, varied cycle numbers (10–20) of 20s at 95°C, 30s at 65°C, 30s at 72°C, followed by 7min at 72°C. PCR products were visualized by running on a 4–20% polyacrylamide Criterion TBE Gel (Bio-Rad) and stained by SYBR Green. The final libraries were generated by 8 of 25µl PCR reaction with each one containing 2–3µl of bisulfite-converted template, 1.25U PfuTurbo Cx Hotstart polymerase and 0.2µM each of Illumina LPX1.1 as well as 2.1 PCR primers. The libraries were PCR amplified and sequenced on the Illumina Genome Analyzer II as described previously21. The sequencing reads were aligned to the NCBI36 (hg18) assembly of the human genome using a custom alignment software that was developed for RRBS data10.
Infinium16 analysis was performed by the Genetic Analysis Platform at the Broad Institute. A total of 1µg of genomic DNA per sample was bisulfite-treated according to the manufacturer’s protocol and hybridized onto Infinium HumanMethylation27 bead arrays (Illumina). We previously observed almost perfect agreement between technical replicates (Pearson’s r>0.98), which is why only a single hybridization was performed for each sample.
For MeDIP and MethylCap, the aligned reads were extended to the mean fragment length obtained during sonication, and from each group of duplicate reads (i.e. reads aligned to the exact same start position on the same chromosome) all but one read were discarded, in order to minimize the impact of PCR bias on downstream analysis. For RRBS, the aligned reads were compared to the reference genome, and the DNA methylation status was determined using a custom software as described previously21. Infinium HumanMethylation27 data were processed with Illumina’s BeadStudio 3.2 software, using the default background subtraction method for normalization. UCSC Genome Browser tracks were constructed by custom scripts implemented in the Python programming language (http://www.python.org/).
We used linear regression models to estimate the absolute DNA methylation levels from the MeDIP and MethylCap read counts. Based on a number of different feature selection experiments, we found that the following combination of variables was robustly predictive of DNA methylation levels: (i) the square root of the total number of MeDIP or MethylCap reads within the given region, (ii) the square root of the total number of whole-cell extract (WCE) reads within the region (based on a cross-tissue WCE track that we routinely use for ChIP-seq data normalization), (iii) the logit of the CpG frequency within the region, (iv) the relative GC content of the region, (v) the ratio of Cs relative to CpGs, and (vi) the relative repeat content of the region as determined by RepeatMasker (http://www.repeatmasker.org). For both MeDIP and MethylCap, we observed that the read frequencies were strongly positively associated with the absolute methylation level according to Infinium data, while the repeat content was moderately positively associated. In contrast, the logit of the CpG frequency was highly negatively associated with DNA methylation, and all other variables as well as the model’s intercept exhibited a moderately negative association. For model fitting and performance evaluation, the current dataset was split into equally sized training and test sets. All model fitting was performed using the R statistics package (http://www.r-project.org/).
In our experience, classical peak detection57,58 is not well-suited for DMR identification because of the high number of spurious hits encountered when borderline peaks are detected in one sample but not in the other (C. Bock, unpublished observation). Instead, we used a statistical test to compare two samples directly with each other. For a given region with RRBS data, we count the number of methylated vs. unmethylated CpGs in both samples and perform Fisher’s exact test to obtain a p-value that is indicative of the likelihood of the region being a DMR. Similarly, for MeDIP and MethylCap we count the numbers of reads that align inside the region for both samples and use Fisher’s exact test to contrast these values with the total numbers of reads that align elsewhere in the genome. And for the Infinium assay we use a paired-samples t-test to compare the two samples’ β-values of all Infinium probes inside the region. These tests are performed on a large number of genomic regions in parallel (e.g., on all CpG islands), and the p-values are corrected for multiple testing using the q-value method59. Genomic regions with a q-value of less than 0.1 are flagged as hypermethylated or hypomethylated (depending on the directionality of the difference), but only if the absolute DNA methylation difference exceeds 20% (for RRBS and Infinium) or if there is at least a twofold difference in the read number (for MeDIP and MethylCap). These thresholds were chosen by their practical utility in a number of comparisons between different cell types and have no further justification. We also mark genomic regions with insufficient sequencing coverage, but do not exclude them from DMR analysis. For MeDIP and MethylCap we require at least ten reads per 10 million total reads for the sample with higher read coverage, and for RRBS we require a minimum of five CpGs with at least five reads each in both samples.
This statistical approach to DMR identification requires us to define sets of genomic regions on which the analysis is being performed. We pursued a two-way strategy to maximize the chances of finding interesting DMRs. One the one hand, we focused specifically on CpG islands and gene promoters, which are prime candidates for epigenetic regulation. This approach provides increased statistical power for regions with well-known functional roles because the relatively low number of CpG islands and gene promoters reduces the burden of multiple-testing correction compared to the genome-wide case. On the other hand, we used a 1-kilobase tiling of the genome to detect DMRs that are located outside of any candidate regions. And to cast an even wider net, we collected a comprehensive set of 13 types of genomic regions, which includes not only CpG islands and gene promoters, but also CpG island shores30, enhancers60, evolutionary conserved regions and other types of genomic regions. DMR data for all of these region sets were calculated using a set of Python and R scripts and are available online (http://meth-benchmark.computational-epigenetics.org/).
Based on the CpG islands that were detected as differentially methylated between the two ES cell lines (Figure 5), we manually selected eight method-specific DMRs for experimental validation. To that end, those CpG islands that were identified as statistically significant DMRs by one method (but not by the other two methods) were visually inspected in the UCSC Genome Browser, and regions were selected for validation only if the data fully supported their classification as method-specific DMRs. In particular, regions were not selected if a second method already picked up a suggestive but insignificant trend in the same direction as the first method, or when the data of the first method already suggested that the DMR was a false-positive hit (e.g., because of contradictory trends in the vicinity of the DMR). Experimental validation was performed by clonal bisulfite sequencing following established protocols61. Primers were designed using MethPrimer62 such that the amplicon overlapped with those CpGs that exhibited the highest levels of differential methylation according to our original data. To prepare for bisulfite sequencing, 1µg of DNA was bisulfite-converted using the EpiTect kit (Qiagen); 50ng of bisulfite-converted DNA was PCR-amplified (see Supplementary File 1 for primer sequences); and purified amplicons were cloned using the TOPO TA cloning kit (Invitrogen). For each region an average of 11 clones were randomly chosen for sequencing. All sequencing data were processed using the BiQ Analyzer software63, and the results are summarized in Supplementary File 1.
Repeat sequences were obtained from database version 14.07 of RepBase Update38, which is publicly available online (http://www.girinst.org/server/RepBase/index.php). From a total of 11,670 prototypic repeat sequences we selected those 1,267 that were annotated either to human or to its ancestors in the taxonomic tree, and we combined these prototypic repeat sequences into a pseudo-genome file. Maq with default parameters was used to align MeDIP, MethylCap, RRBS, ChIP-seq (H3K4me3) and whole-cell extract (WCE) sequencing reads against this pseudo-genome56. For RRBS, both the reads and the reference genome were bisulfite-converted in silico prior to the alignment. The epigenetic status of each prototypic repeat sequence was quantified as follows: (i) For MeDIP, MethylCap and ChIP-seq we calculated the odds ratios relative to the WCE data. (ii) For RRBS we computed the number of methylated CpGs, total number of CpG measurements and percentage of DNA methylation based on the comparison of the aligned reads with the prototypic repeat sequence.
We discarded rare repeats with WCE coverage below 100 aligned reads or RRBS coverage below 25 CpG measurements, resulting in 553 prototypic repeat sequences that were used for further analysis. Among these were 97 LINE class sequences (92 of them from the L1 family), 51 SINEs (48 of them from the Alu family), 6 SVAs, 62 DNA repeats, 15 satellite repeats, 315 LTRs, 1 low-complexity repeat and 6 RNA repeats (Supplementary File 2). To quantify differential methylation between a pair of MeDIP and MethylCap samples, we calculated the pairwise odds ratio of the read coverage for each prototypic repeat sequence, while the absolute DNA methylation difference was used in the case of RRBS (Supplementary File 3). The significance of the difference was assessed using Fisher’s exact test in the same way as for the non-repetitive genome (described above).
We thank A. Crenshaw and M. Parkin (Broad Institute, Cambridge, MA) for assistance with the Infinium assay and K. Halachev (Max Planck Institute for Informatics, Saarbrücken, Germany) for the provision of genome annotation files. C. Bock is supported by a Feodor Lynen Fellowship from the Alexander von Humboldt Foundation. A. B. Brinkman is supported by the Dutch Cancer Foundation (KWF, grant KUN 2008-4130). A. Meissner is supported by the Massachusetts Life Science Center and the Pew Charitable Trusts. The described work was in part funded by the Pew Charitable Trusts, the NIH Roadmap Initiative on Epigenomics (U01ES017155), and the European Union’s CANCERDIP project (HEALTH-F2-2007-200620)
Author contributionsC.B., E.T. and A.M. conceived and designed the study; E.T., A.B., F.S. and H.G. performed the experiments; C.B., F.M. and N.J. analyzed the data; C.B., A.G., H.G.S. and A.M interpreted the results; and C.B. and A.M. wrote the paper.
The authors declare that no competing financial interests exist.