In this study, we have developed, refined and applied an MBD-chip approach along with accompanying computational analyses for comparison of chromosome-wide DNA methylation patterns in prostate cancer cells with those in normal prostate epithelial cells. We present several technological advances over previous affinity-enrichment based DNA methylation profiling approaches. First, the enrichment process has been streamlined and optimized for small amounts of input DNA (only 300 ng of DNA were used for these studies). Second, compared to antibody-based approaches, which require the generation of single-stranded DNA for affinity enrichment of methylated DNA, the MBD-based enrichment approach offers the ability to enrich for methylated double-stranded DNA. Third, among the MBD-based approaches for affinity enrichment of methylated DNA, the fragment of the MBD2 protein used in this study is highly streamlined for binding methylated DNA with high affinity and selectivity [9]. The high selectivity of the MBD2-MBD polypeptide for methylated DNA and the high density of the oligonucleotide tiling microarrays covering all non-repetitive regions of chromosomes 21 and 22 with an average inter-probe spacing of ~10 bp allowed unbiased, high-resolution, chromosome-wide mapping of DNA methylation in the LNCaP prostate cancer cell line and the PrEC normal prostate epithelial cells in primary culture. Finally, we have developed novel computational approaches for analysis of affinity enrichment-based genome-wide DNA methylation data that correct for sequence bias in the methylation signal. The resulting methods greatly enhance the specificity and accuracy of the DNA methylation calls. These analytical methods were specifically optimized for interpretation of DNA methylation tiling microarray data. Knowing that DNA methylation occurs almost exclusively at CpG dinucleotides in adult somatic human cells, and that the MBD2-MBD polypeptide very selectively binds CpG methylated DNA, we were able to define a set of null probes that interrogate regions of the genome that contain an extremely low CpG density and should therefore never be enriched. The signals arising from these probes allowed us to identify and correct for sequence biases that led to increased spurious signals in these regions. Additionally, one theoretical advantage of high-density tiling microarrays is that, if we assume independence between signals from adjacent probes, multiple consecutive probes exhibiting enrichment would multiplicatively increase our confidence that the overlying region was truly enriched. However, in many cases of tiling array data, the assumption of independence of adjacent probes is clearly not met and we therefore cannot easily calculate the confidence of signals arising in multiple consecutive or adjacent probes. In our own data as well, we saw that the raw smoothed log-ratios from null probes were highly autocorrelated with the smoothed log-ratios from adjacent probes. However, correcting for the GC content sequence biases using the null probes eliminated this autocorrelation, allowing us to assume independence in signals arising from consecutive null probes. The resulting analyses were highly accurate for absolute methylation calls, with false discovery rates of < 5% and concordance with bisulfite sequencing data of ~90%.
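The null-probe idea described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names, the `max_null_cpg` threshold, the use of a per-GC-stratum median as the bias estimate, and the lag-1 autocorrelation statistic are all assumptions chosen for illustration. Each probe is represented as a hypothetical tuple of probe sequence, surrounding genomic window, and log-ratio.

```python
from collections import defaultdict
from statistics import median


def cpg_count(seq):
    """Count CpG dinucleotides in a sequence."""
    return seq.upper().count("CG")


def gc_count(seq):
    """Count G and C bases in a sequence."""
    s = seq.upper()
    return s.count("G") + s.count("C")


def correct_gc_bias(probes, max_null_cpg=0):
    """Subtract a GC-dependent baseline estimated from null probes.

    probes: list of (probe_seq, window_seq, log_ratio) tuples.
    A probe is treated as "null" when its surrounding window holds at most
    max_null_cpg CpGs (an assumed threshold, not the study's definition).
    """
    null_by_gc = defaultdict(list)
    for probe_seq, window_seq, log_ratio in probes:
        if cpg_count(window_seq) <= max_null_cpg:
            null_by_gc[gc_count(probe_seq)].append(log_ratio)
    if not null_by_gc:
        raise ValueError("no null probes available to estimate the bias")

    # Baseline signal expected from sequence composition alone, per GC stratum.
    baseline = {gc: median(vals) for gc, vals in null_by_gc.items()}

    corrected = []
    for probe_seq, _window_seq, log_ratio in probes:
        gc = gc_count(probe_seq)
        # Use the nearest GC stratum that was observed among null probes.
        nearest = min(baseline, key=lambda g: abs(g - gc))
        corrected.append(log_ratio - baseline[nearest])
    return corrected


def lag1_autocorrelation(values):
    """Lag-1 autocorrelation of a series ordered by genomic position."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0
    return sum(dev[i] * dev[i + 1] for i in range(n - 1)) / denom
```

In such a sketch, `lag1_autocorrelation` would be applied to the null-probe log-ratios before and after `correct_gc_bias`; a value near zero after correction is the kind of evidence that would support treating consecutive null probes as independent.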
In this study, we restricted analysis to absolute (qualitative) DNA methylation calls because significant new computational methods development is necessary for quantitative analysis of DNA methylation from affinity-enrichment based genome-wide DNA methylation data. This is because deriving quantitative information regarding the fraction of input DNA that is methylated at a given locus from affinity-enrichment based approaches is confounded by multiple issues that are independent of the fraction of methylated input DNA fragments. First, the degree of enrichment is clearly influenced by the density of methylated CpGs around a given locus, and this appears to show a non-linear dependence. Second, the degree of enrichment is likely influenced by various sequence effects and biases. We have in large part been able to isolate and adjust for these biases in qualitative analyses (as described in the manuscript), but significant further research is required to understand how such parameters influence the ability to quantitate methylation levels at a given locus in a specific sample. Third, the degree of enrichment at a given locus is influenced by the total amount of captured species in a given sample. That is, because the same amount of total DNA is hybridized (or sequenced) for each sample, the degree of signal at a given region is influenced both by the amount of methylation at that region and by the total number of methylated molecules making up the enriched sample. Unfortunately, it seems likely that each of these parameters can influence the other parameters in a non-linear and currently unpredictable fashion. In ongoing studies, we are developing methodologies to overcome these issues in order to facilitate accurate quantitative estimates of DNA methylation from enrichment-based genome-wide DNA methylation data. In the meantime, our accurate approaches for qualitative assessment of DNA methylation have allowed significant new biological insights into the differences in chromosome-wide DNA methylation patterns in a cancer/normal model system.
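The third confounder can be made concrete with a toy calculation. The quantities below are invented purely for illustration and are not measurements from this study; `hybridized_share` is a hypothetical helper. Two samples capture the same absolute amount of methylated DNA at a locus, but because a fixed amount of enriched material is hybridized, the locus contributes a different share of the signal in each.

```python
def hybridized_share(captured_ng, locus):
    """Fraction of the enriched material that derives from one locus."""
    total = sum(captured_ng.values())
    return captured_ng[locus] / total


# Hypothetical captured amounts (arbitrary units); locus_X is identical in both.
sample_a = {"locus_X": 10.0, "all_other_loci": 90.0}
sample_b = {"locus_X": 10.0, "all_other_loci": 390.0}

share_a = hybridized_share(sample_a, "locus_X")  # 10 of 100 units
share_b = hybridized_share(sample_b, "locus_X")  # 10 of 400 units
```

Here the locus accounts for a four-fold smaller share of the hybridized material in the second sample even though its own methylation is unchanged, which is why raw signal cannot be read directly as a methylation fraction.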
In the classically held view, DNA methylation patterns in cancer cells differ from normal cells in at least two major ways [31, 32]. First, they often harbor hypomethylation of repetitive elements and of regions of the genome with low CpG density. Our methods did not directly interrogate this aspect of DNA methylation biology since repetitive elements were excluded from the arrays to avoid cross-hybridization signals and because our method, like other restriction enzyme and enrichment based genome-wide DNA methylation assays, cannot robustly detect differential methylation in regions with very low CpG density [5, 6]. Second, cancer cells are thought to become hypermethylated mostly in CpG islands at the promoters of genes, resulting in epigenetic silencing of those genes. Accordingly, the majority of genome-wide DNA methylation assays have focused on CpG islands and promoters using various types of microarray formats with probes that selectively interrogate such regions. Here, we assessed whether DNA hypermethylation changes in cancer cells occur mostly in gene promoter CpG islands by carrying out an unbiased assessment of DNA methylation across all non-repetitive regions of chromosomes 21 and 22 (without bias to promoters, genes, or other annotations) in prostate cancer and normal prostate cells.
Annotation of the identified methylated regions revealed a significant clustering of DNA methylation in gene-associated compartments of the genome in both the cancer and normal cells, and in regions found to be hypermethylated in the cancer cells. We identified numerous 5' gene upstream regions that were methylated in the cancer and normal cells, some of which were differentially methylated in the cancer cells. For some of these regions, we confirmed that demethylation using a methyltransferase inhibitor led to re-expression of the associated gene, suggesting that methylation of these regions was indeed involved in epigenetic silencing of the associated gene. Two of these regions were confirmed to be novel biomarkers for prostate cancer in an independent set of prostate cancer cell lines and prostate cancer tissues.
Interestingly, we also found significant enrichment for methylation greater than would be expected by random chance for several other gene-associated genome compartments. For instance, we found that methylation of 3' gene downstream regions was enriched to nearly the same extent as 5' gene upstream regions in the LNCaP prostate cancer but not PrEC normal prostate cells, and was also enriched in the cancer hypermethylated regions. Recent reports have suggested that many genes may have antisense transcripts that may be involved in the regulation of the sense transcripts [24]. We speculate that methylation of the 3' downstream regions may be involved in the regulation of such antisense transcripts. Another possibility is that methylation of such regions is involved in regulating transcriptional elongation/termination or transcript processing such as polyadenylation. Further studies will be required to understand the role of the 3' gene downstream methylation events.
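One generic way to ask whether methylated regions fall in a genome compartment more often than expected by chance is a permutation test, sketched below. This is an assumed illustration and not necessarily the statistical procedure used in this study; the function names, the half-open interval convention, and the uniform random relocation of regions along a single chromosome are all simplifying assumptions.

```python
import random


def overlaps_any(start, end, intervals):
    """True if the half-open region [start, end) overlaps any interval."""
    return any(start < b and a < end for a, b in intervals)


def enrichment_by_permutation(regions, compartment, chrom_length,
                              n_perm=1000, seed=0):
    """Compare observed overlaps with overlaps of randomly relocated regions.

    regions, compartment: lists of (start, end) half-open intervals.
    Returns (observed_count, expected_count, empirical_p_value).
    """
    rng = random.Random(seed)
    observed = sum(overlaps_any(s, e, compartment) for s, e in regions)

    null_total = 0
    at_least_observed = 0
    for _ in range(n_perm):
        count = 0
        for s, e in regions:
            length = e - s
            # Relocate the region uniformly, keeping its length fixed.
            new_start = rng.randint(0, chrom_length - length)
            if overlaps_any(new_start, new_start + length, compartment):
                count += 1
        null_total += count
        if count >= observed:
            at_least_observed += 1

    expected = null_total / n_perm
    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (at_least_observed + 1) / (n_perm + 1)
    return observed, expected, p_value
```

A realistic analysis would restrict the random placements to the non-repetitive, array-covered portion of the chromosome, since only those positions could have produced a methylation call.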
Introns and exons also showed significant enrichment of methylation in the cancer and normal cells. Interestingly, exon sequences and intron-exon junctions showed an extremely high degree of enrichment within methylated regions in cancer and normal cells, as well as in hypermethylated regions in the cancer cells. Luco et al. recently showed that histone methylation patterns occurring at intron-exon boundaries can play a role in regulating alternative splicing of mRNA [26]. We speculate that DNA methylation patterns may help to reinforce these histone methylation patterns or may also be directly involved in regulation of alternative splicing. Another recent report has suggested that DNA methylation patterns occurring within gene bodies may be involved in regulation of alternative transcriptional start sites [25]. To our knowledge, neither these nor other previous reports compared gene body methylation in cancer and normal cells. Our data suggest that such gene body DNA methylation can become abnormally increased in prostate cancer cells. We speculate that cancer cells can take advantage of this regulatory machinery to activate oncogenes or silence tumor suppressors by dysregulating production of alternative transcripts and spliceoforms.
Although the majority of methylated regions overlapped with gene-associated genome compartments, a significant fraction of regions (~30-40%) were distal intergenic, occurring at least 3 kbp away from any known gene. Several such distal intergenic regions showed hypermethylation in the cancer cells compared to the normal cells. Interestingly, these intergenic methylated and cancer hypermethylated regions were significantly enriched for a high degree of conservation across several mammalian and vertebrate species, suggesting that there are significant evolutionary pressures against changes at these regions. We speculate that these regions are involved in long range regulation of genes. Another possibility is that some subset of these intergenic methylated regions is involved in regulation of nearby transcripts that are not yet annotated or known. Consistent with both of these hypotheses, the genomic methylated regions are highly enriched for conserved transcription factor binding sites.
Regardless of the function of the cancer hypermethylated regions, it is apparent that many of these have significant potential to serve as DNA methylation biomarkers of prostate cancer. Cancer hypermethylated regions from different annotation categories (5' gene upstream, 3' gene downstream, intergenic) were frequently methylated in prostate cancer cell lines but not the normal prostate epithelial cells. A few of these (regions associated with ADAMTS1, SCARF2, and DSCR9) were tested further and, in combination, showed ~100% sensitivity and ~85% specificity for prostate cancer compared to matched adjacent benign tissues.
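For readers less familiar with how the performance of a marker panel is summarized, the sketch below computes sensitivity and specificity for a combined panel. The combination rule (a sample is called positive if any marker in the panel is methylated) is an assumption made for illustration, since the rule is not specified here, and `panel_performance` is a hypothetical helper rather than the analysis code of this study.

```python
def panel_performance(tumor_samples, benign_samples):
    """Sensitivity and specificity of an "any marker positive" panel.

    Each sample is a dict mapping a marker name to True when methylation
    was detected at that marker in the sample.
    """
    def panel_positive(sample):
        # Assumed rule: one methylated marker is enough to call the sample.
        return any(sample.values())

    true_positives = sum(panel_positive(s) for s in tumor_samples)
    true_negatives = sum(not panel_positive(s) for s in benign_samples)
    sensitivity = true_positives / len(tumor_samples)
    specificity = true_negatives / len(benign_samples)
    return sensitivity, specificity
```

An "any marker positive" rule tends to raise sensitivity at the cost of specificity as markers are added, so the choice of rule and of detection thresholds would need to be fixed before evaluation on an independent sample set.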
We envision several possibilities for application of the methodologies presented here for cancer biomarker development. For example, the MBD-enrichment based genome-wide DNA methylation approaches can be applied to tumor-normal pairs from several subjects of a given cancer type to assess whether there are any high-frequency DNA methylation changes that can distinguish tumor vs. normal tissue. Then, sensitive DNA methylation analytical techniques, such as COMPARE-MS [9], real-time MSP [19], MethyLight [33], or Methyl-BEAMing [34], can be used to measure a panel of these DNA methylation alterations in blood, urine, stool, biopsies or other patient biomaterials. A different strategy, analogous to one that was recently described [35], would involve development of personalized DNA methylation biomarkers. In this strategy, for a given individual, technologies similar to those presented here would be applied to profile the genome-wide DNA methylation patterns distinguishing the individual's tumor from their own normal tissues. These personalized methylation alterations could then be followed in blood, urine or other biospecimens using the various sensitive DNA methylation techniques listed above to track response to therapy, follow disease burden, etc. Of course, such strategies will require significant testing prior to clinical implementation.
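The first strategy, screening tumor-normal pairs for high-frequency changes, can be sketched as a simple tally. The representation of each subject as a pair of sets of region identifiers, the function name, and the `min_fraction` cutoff are all assumptions for illustration; no particular threshold is prescribed here.

```python
from collections import Counter


def recurrent_tumor_specific(pairs, min_fraction=0.5):
    """Find regions methylated in tumor but not matched normal, recurrently.

    pairs: list of (tumor_regions, normal_regions), each a set of region IDs
    called methylated in that tissue for one subject.
    Returns {region_id: fraction_of_subjects} for regions at or above
    min_fraction (an assumed, adjustable cutoff).
    """
    counts = Counter()
    for tumor_regions, normal_regions in pairs:
        # Tumor-specific calls for this subject.
        counts.update(set(tumor_regions) - set(normal_regions))
    n_subjects = len(pairs)
    return {
        region: count / n_subjects
        for region, count in counts.items()
        if count / n_subjects >= min_fraction
    }
```

The same tally applied to a single subject's pair reduces to the set difference between tumor and normal calls, which is the starting point for the personalized strategy.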
The overall MBD-chip approach described here should be broadly applicable to characterizing genome-wide DNA methylation patterns and to identifying novel DNA methylation biomarkers for various diseases. The MBD2-MBD polypeptide is now commercially available as part of kits for enriching methylated DNA marketed by different companies (e.g. ClonTech, Invitrogen), and is therefore easily accessible to the research community. Additionally, tiling microarrays interrogating all non-repetitive regions of the entire genome of multiple species, including humans, are now available through various companies including Affymetrix, Nimblegen, and Agilent. Therefore, the methodologies presented here can be readily applied to analysis of the entire human genome. Furthermore, these methods should be easily adaptable to analysis with next generation sequencing [17]. For instance, recent studies have demonstrated that next generation sequencing platforms also produce significant sequence biases in data produced by their applications [36], including DNA methylation data [18]. In particular, both sequence biases and amplification bias have been shown to affect affinity-enrichment based DNA methylation data produced by next generation sequencing platforms [18]. The general principle of using regions of the genome with ultra-low CpG content to correct such artifactual effects in DNA methylation data introduced by technology platforms should be generally applicable. Methods such as those presented here are poised to facilitate the thorough examination of DNA methylation patterns genome-wide in health and disease.