|Home | About | Journals | Submit | Contact Us | Français|
Aberrant DNA methylation often occurs in colorectal cancer (CRC). In our study we applied a genome-wide DNA methylation analysis approach, MethylCap-seq, to map the differentially methylated regions (DMRs) in 24 tumors and matched normal colon samples. In total, 2687 frequently hypermethylated and 468 frequently hypomethylated regions were identified, which include potential biomarkers for CRC diagnosis. Hypermethylation in the tumor samples was enriched at CpG islands and gene promoters, while hypomethylation was distributed throughout the genome. Using epigenetic data from human embryonic stem cells, we show that frequently hypermethylated regions coincide with bivalent loci in human embryonic stem cells. DNA methylation is commonly thought to lead to gene silencing; however, integration of publically available gene expression data indicates that 75% of the frequently hypermethylated genes were most likely already lowly or not expressed in normal tissue. Collectively, our study provides genome-wide DNA methylation maps of CRC, comprehensive lists of DMRs, and gives insights into the role of aberrant DNA methylation in CRC formation.
Colorectal cancer (CRC) is the third most common cancer, and the third leading cause of cancer death in the world.1 CRC arises as a consequence of accumulation of genetic mutations and epigenetic alterations in colonic cells. Common genetic alterations in CRC include mutations in the APC gene and in DNA mismatch repair genes. At the epigenetic level, promoter DNA methylation causing the silencing of tumor suppressor genes, such as CDKN2A (the p16/INK4a gene), is a common alteration.2-4
Identification of differential DNA methylation patterns between normal and cancer cells may shed light on the processes underlying tumorigenesis. Another objective of studying DNA methylation is to identify biomarkers for diagnostic and prognostic use. Specifically, aberrant methylation of a particular gene (or group of genes) can provide a useful clinical tool for early detection of CRC or for stratifying tumors into subtypes.
Extensive efforts have been invested in identifying regions with aberrant DNA methylation in CRC and in finding correlations with tumor development or phenotype. Until recently, most studies were performed on small sets of loci, but now large-scale methods are emerging.5-8
Here, we investigated DNA methylation in CRC genome-wide using MethylCap-seq.9 Our study provides comprehensive DNA methylation maps of CRC and matched normal colon tissue, and identifies differentially methylated regions (DMRs). We determined characteristics of these DMRs using publically available expression and epigenetic data sets, and select a list of candidate biomarkers for CRC detection.
To investigate DNA methylation in CRC in a genome-wide unbiased fashion, we applied MethylCap-seq. This method involves capture of methylated DNA using the MBD domain of MeCP2, and subsequent next-generation Illumina sequencing of eluted DNA as schematically outlined in Figure S1.9 MethylCap-seq was performed on 24 tumors and an equal number of matched normal samples (48 DNA methylation profiles in total).
A representative screen shot of the normalized DNA methylation data visualized in the UCSC genome browser is shown in Figure 1A. This example shows DNA methylation shared by normal and tumor samples, as well as sample-specific methylation and an occasional region that is differentially methylated between tumors and normal samples.
For each of the 48 samples, the genomic regions with elevated numbers of aligned sequence reads were identified and merged, yielding 329,613 regions of potential interest, with a median length of 2,192 bp. These regions are distributed across all chromosomes (Fig. S2). Per sample, the number of normalized sequence reads overlapping each region of interest was calculated, which is referred to as read density and used as a measure of DNA methylation.
No strong associations were found between the first two principal components of the samples in the methylation space and any of the sample preparation and processing factors (Table S2 and Fig. 1D), indicating that there are no strong batch effects present in the data.
Previously, we extensively reported that methylated regions identified by MethylCap-seq could be validated with classical bisulfite sequencing.9-12 To validate the current MethylCap-seq, four fragments were selected and assessed in two sample pairs. These fragments overlap with the promoters of SDC2, SIM2, ROBO1 and CYP1B1. In all cases, bisulfite sequencing corroborated the MethylCap-seq data: hypermethylation at SDC2, and SIM2 in the two tumors, ROBO1 promoter hypermethylation in one of the two tumors, and no differential methylation of CYP1B1 (Fig. S3). Thus, MethylCap-seq provides a good representation of DNA methylation.
For each tumor and matched normal sample the read densities in the peaks of interest were compared. The Pearson correlation coefficients for the pairs range from 0.337 to 0.897 (median = 0.723), whereby only four sample pairs have a correlation below 0.5 (Fig. S4A). Furthermore, for most regions the median methylation level of tumor and normal and the standard deviation for tumor and normal are very similar (Fig. 1B and C). Nevertheless, principal component analysis (PCA) separated normal tissue from tumor (Fig. 1D). This suggests that, although the genome methylation generally remains preserved, there are distinct loci that differentiate tumor from normal.
We proceeded to identify these regions that are differentially methylated. The DMR calling algorithm used is essentially identical to the one described previously (see Material and Methods).10 In total, 28% of the merged peaks were differentially methylated in one or more sample pairs (Fig. 2A).
The number of DMRs identified per sample pair varied greatly (between 515 and 33,576) (Fig. S4B) without displaying a clear correlation between the number of DMRs and the available clinical data (Table S3), such as tumor grade or the presence of a mutation in the tumor suppressor gene KRAS (lowest p value = 0.33). Tumor purity and copy number aberrations could in theory affect the MethylCap-seq profiles and hence the number of DMRs; however, such data are not available. In 14 out of the 24 sample pairs more hypermethylated than hypomethylated regions were observed in the tumor; in the remaining nine pairs this was reversed. Especially, sample pair 41 shows many regions with hypomethylation in the tumor compared with the matched normal. A large fraction of these hypo DMRs are sample pair specific. Our quality control analyses revealed only a slight increase in median normalized read density for 41N as compared with others samples. This could be due to both biological and experimental factors, which cannot be address easily.
When combining all DMRs identified in the individual sample pairs it becomes apparent that many DMRs are patient-specific or occur in a small subset of the patients. Nevertheless, a handful of DMRs occur frequently in sample pairs: 184 DMRs are common to more than 19 samples. Furthermore, frequently occurring DMRs are predominantly hypermethylated in the tumors compared with normal tissue, whereas sporadic DMRs (e.g., as present in less than six sample pairs) can be either hyper- or hypo-methylated in the tumors (Fig. 2B).
To assess whether differential methylation at a given site is recurrent in CRC, we determined the “support” for all DMRs. We defined the “support” as the number of tumors in which a region was differentially methylated as compared with the matched normal tissues. As 24 samples were analyzed, the support values of hypermethylation for a given region are integer numbers between 0 and 24. We modeled the differential methylation events by Bernoulli trials and thus calculated the probability of a given support value based on a binomial distribution. We set a threshold for “high-confidence CRC DMRs” at probability ≤ 1 × 10−3, which corresponds to a minimum support of six. Consequently, a hypermethylated region was regarded with high-confidence if: (1) the support for hypermethylation of this region is at least six, and (2) the support for hypomethylation is at most one. The high-confidence hypomethylated regions in CRC were defined analogously. In total, 2,687 hypermethylated and 468 hypomethylated regions were selected (Fig. 2C). Table 1 shows 10 regions with the highest support for hypermethylation and Table S4 shows the full list.
To confirm that the selected DMRs are also recurrent outside the analyzed 24 sample pairs, we performed MethylCap-seq on 5 independently obtained sample pairs. We find that 95% of the high-confidence hyper DMRs were a DMR in the additional samples. Specifically, all 184 DMRs that are common to at least 80% of the samples of the main set (support > 19) are also a DMR in at least 3 of the additional samples (Fig. S5).
Hierarchical clustering was performed to categorize the tumors into different methylation subgroups. Using the read densities for the most variable regions or the high-confidence hypermethylated regions we could not identify stable clusters that were enriched for a particular tumor grade, DUKES staging, microsatellite instability, or KRAS mutation. In addition, the elastic net method, which is a regularized regression method designed for genomic applications where the number of features greatly exceeds the number of analyzed samples,13 was unable to identify a stable predictive pattern for the clinical properties. However, in the hierarchical clustering with the high-confidence hypermethylated regions, three tumors (2T, 31T and 33T) appeared to be different (Fig. S6A). These tumors are grade 3, wild type for KRAS, and two are microsatellite instable (MSI) (Table S3). No regions were found to be significantly characteristic for this group. This is partially due to the low ratio of samples to regions, which restricts the power of the applied Mann-Whitney U test followed by false discovery rate correction.
Several labs showed that a subset of the CRCs has relatively high methylation in specific regions in the genome. These CRCs are said to have the CpG island methylator phenotype (CIMP). Weisenberger et al.14 developed a marker panel of five regions (CACNA1G, IGF2, NEUROG1, RUNX3 and SOCS1) to characterize CIMP. Strikingly, 31T and 33T had particularly high read densities at these regions. In addition, 44 of the 50 regions with the lowest p-values for differential methylation are CIMP-specific regions as determined by Xu et al.6 (Fig. S6B). This indicates that these three samples could be CIMP.
Analysis of the distribution of all DMRs over six genomic elements showed that DMRs occurring only in one or two tumors appear randomly distributed compared with those that occur more frequent (Fig. 2D). Strikingly, the percentage of DMRs with hypermethylation in the tumor at CpG islands (CGIs) overlapping transcription start sites (TSSs) increases with the support for these DMRs. The regions that are hypermethylated in 1 tumor (n = 30,280) coincide for only 4% with TSSs with CGI, whereas for regions that are hypermethylated in 22 tumors (n = 31) this value rises to 71%. Furthermore, the two regions hypermethylated in 23 tumors also both coincide with a TSS with CGI.
The high-confidence hypermethylated regions coincide for 52% with CGIs with promoter and for 40% with CGIs without promoter (i.e., orphan CGIs), whereas this is only 5% and 24% for high-confidence hypomethylated regions (Fig. S7A and B). The overlap of the DMRs with the CGIs was analyzed in more detail: 90% of the high-confidence hypermethylated DMRs covered more than 50% of the CGI (Fig. S7B), and only 1% of the high-confidence hypermethylated or hypomethylated DMRs overlapped specifically with CGI shores (defined as the 2,000 kb region up- or downstream of a CGI, Fig. S7A and B).
To gain insight into the biological functions that could be altered by hypermethylation, gene ontology analysis was performed with the web based tool DAVID.15,16 Analysis of the 1,667 genes with high-confidence hypermethylated DMRs over their promoter showed that the encoded proteins are enriched for localization at cellular membranes, and DNA binding and/or transcription factor activity (Table S5). Furthermore, there is enrichment of proteins that play a role in neuroactive ligand-receptor interactions, calcium signaling, and pathways in cancer (Table S5). Genes in cancer pathways include components of the Wnt-, MAPK-signaling pathway, or proteins with a role in interactions between cells and the extracellular matrix (ECM).
Examples of previously described, frequently hypermethylated genes are CDKN2A, SEPT9, VIM and CSPG2/VCAN. They were detected as hypermethylated in 9, 9, 17 and 20 tumors, respectively. Notably, CSPG2/VCAN was identified by Fernandez et al.17 to be hypermethylated in CRC, and not in 29 other cancers analyzed in their study.
The most frequently hypermethylated regions (in 23 of the 24 tumors) overlap with SDC2, TLX1 and TLX1NB (Fig. 3). SDC2 is a transmembrane heparan sulfate proteoglycan that functions as a cell surface receptor interacting with ECM components and cell-to-cell signaling molecules such as growth factors.18 TLX1 and TLX1NB are divergent genes, TLX1 (also known as HOX11) is a nuclear transcription factor important for development of the spleen during embryogenesis, specification of neuronal cell fates, and is associated with certain T-cell acute lymphoblastic leukemias.19-21 There is little knowledge about TLX1NB.
Next, we compared MethylCap-seq with RNA-seq and ChIP-seq profiles of H3K4me3 and H3K27me3 of the colon cancer tumor cell line HCT116 (HCT116 WT) and HCT116 with the DNA methyltransferases DNMT1 and DNMT3b knocked-out (HCT116 DKO).22 These two cell lines are frequently used to analyze DNA methylation and its role in regulating gene expression in colon cancer. The regions frequently hypermethylated in tumors generally also displayed high methylation in HCT116 WT and reduced methylation in HCT116 DKO (Fig. S8A and B). Correlations between the support for hypermethylation and occupancy of the epigenetic marks were calculated (Fig. S8C) and intensity plots were made to visualize the occupancy of DNA methylation and the other marks in the HCT116 cells at the high-confidence DMRs (Fig. 4A). The intensity plot of DNA methylation in HCT116 WT shows a clear positive correlation: the regions with the highest support have higher levels of DNA methylation compared with the lower support regions. Furthermore, the intensity plots show that H3K4me3 and H3K27me3 are somewhat depleted in the high-confidence hypermethylated regions in line with the antagonism between DNA methylation and H3K27me3 described for mouse embryonic stem cells and HCT116.12
For differential expression analysis between HCT116 WT and DKO, sequence reads per kilo base of transcript per million mapped reads (RPKM values) were calculated. For 17% of all annotated genes the expression was more than 2-fold higher in the DKO compared with WT HCT116. We focused on transcripts with a DMR at their promoter. Transcripts that overlap with DMRs with high support tend to show a larger increase in expression in the DKO compared with WT HCT116 than in those with lower support (Fig. S8D). For 44% of the promoters frequently hypermethylated in CRC, expression of the associated gene was more than 2-fold higher in the DKO than in WT HCT116 (Fig. 4B). ITGA4, for example, is frequently hypermethylated at its promoter in tumors (support 21) and is highly differentially expressed in DKO vs. WT (9-fold increase in DKO). The transcripts associated with the two DMRs with support 23 showed increased expression in the DKO of at least 4-fold (Fig. 3C).
To further characterize the set of high-confidence hypermethylated DMRs, publically available data sets on epigenetic marks in the human embryonic stem cells H1 from the ENCODE project were mined.23 The correlations between support for hypermethylation and occupancy of H3K4me3, H3K27me3, H3K36me3, Pol2, CTCF and DNaseI hypersensitive sites were assessed (Fig. S9A). We find a strong positive correlation between the support for hypermethylation and the repressive mark H3K27me3 in H1 ESC (Fig. S9A and B). H3K4me3 and DNaseI hypersensitivity, which are indicators for transcription/promoters and chromatin accessibility, respectively, also show positive correlations (Fig. S9A, C and D). In other words, regions frequently hypermethylated in CRC often show higher levels of these marks than regions sporadically hypermethylated.
To visualize the epigenetic patterns at the high-confidence DMRs intensity plots showing the occupancy of these six marks in H1 were generated (Fig. 5A). We separated the regions into CGI with TSSs, orphan CGIs, and regions not overlapping with annotated CGI. A notable observation is that 70% of the high-confidence hypermethylated “orphan” CGIs display elevated H3K4me3, are DNaseI hypersensitive and lack H3K27me3 in H1 cells (Fig. 5A; Fig. S10), suggesting that these regions are likely alternative or un-annotated TSSs, reinforcing the notion that hypermethylation in CRC is strongly biased toward promoters.
Furthermore, these plots show that the hypermethylated CRC DMRs overlap with regions with elevated H3K27me3 and H3K4me3 and are DNaseI hypersensitive in H1 cells, so-called bivalent regions (Fig. 5A).24,25 This is confirmed with a comparison of peak tracks of H3K4me3 and H3K27me3 in H1 cells (called by MACS26) and the high-confidence hypermethylated DMRs (Fig. 5B).
In the absence of matching expression analysis, two publically available expression data sets (GSE25070 and GSE21815) were mined.5,27 The first data set consists of transcript levels of 26 CRC and 26 matched normal colon tissues obtained with Illumina beadchips.5 From the second data set, obtained with Agilent arrays, transcript levels of 32 CRC and five normal colon tissues were analyzed.27 The probes were linked to gene transcripts and subsequently to the MethylCap-seq peaks that overlap with promoters. Analysis was done on the mean expression level in tumor vs. normal.
For the data set by Hinoue et al.,5 17,683 probes were linked to 16,215 regions of interest, corresponding to 13,545 genes. Interestingly, 73% of the probes that overlap with the high-confidence hypermethylated regions in tumors (corresponding to 75% of the genes) show an expression level below the median in normal colon (Fig. 6A). This percentage is a relatively large fraction when compared with the probes linked to low-confidence hypermethylated DMRs or the probes without a DMR (Fig. 6A). Thus, many genes that become hypermethylated in tumors were already silent or lowly expressed in normal colon.
For 96% of all analyzed probes the mean expression level between tumor and normal differed less than 2-fold (Fig. 6B) and only probes that were expressed above median level in normal tissue showed a more than 2-fold reduction in expression in tumor (Fig. 6B). Of the probes linked to high-confidence hypermethylated DMRs and with expression above median in normal, only 13.3% showed a more than 2-fold reduction in the mean expression level of tumor as compared with normal (Fig. 6C). Analysis of the data set by Kogo et al.27 produced similar results (Fig. S11).
Taken together, 35 genes that were linked to high-confidence hypermethylated DMRs had a 2-fold or higher loss of the mean expression in tumors compared with normal samples in both data sets (Table S6). DAVID analysis of these 35 genes showed that at least seven genes are involved in tissue morphogenesis—BMP2, EYA2, KLF4, LAMA1, RGMA, SLIT2 and TCF21 (Bonferroni adjusted p value = 8.8 × 10−4). The carbonic anhydrase IV gene (CA4) that is hypermethylated in 8/24 tumors, showed the highest reduction of expression in both data sets (about 90 and 20-fold, respectively). SFRP1, that encodes the secreted frizzled-related protein 1 and functions as modulator of Wnt signaling, was hypermethylated in 17/24 tumors and showed an approximate four and 7-fold reduction in expression. This gene was previously also reported as differentially methylated and expressed in CRC.5,28
DNA methylation is strongly associated with cancer, and studies on DNA methylation are likely to shed light on the process of tumorigenesis as well as identify biomarkers. In this study, we used MethylCap-seq, a genome-wide DNA methylation capture method, to analyze differential DNA methylation in 24 CRCs and matched normal colon tissues. We show that the tumor and normal profiles can be clearly distinguished from one another (Fig. 1), and that DMRs are detected for each tumor compared with its matched normal sample. Applying a stringent cut-off, we identified 2,678 and 468 high-confidence hyper- and hypomethylated DMRs, respectively (Fig. 2). However, this approach and the limited cohort size did not allow us to identify statistically significant methylation signatures for subgroups of the analyzed tumor set.
The high-confidence hypermethylated DMRs overlap almost exclusively with CGIs and promoters (Fig. 5A; Fig. S6 and S9), whereas the high-confidence hypomethylated DMRs are mainly focused in intragenic regions and not CGI shores (Fig. S7).
Classical DNA methylation analyses have shown overall hypomethylation in (colorectal) cancers, which occur at unique sequences, but especially at repetitive elements.29-31 To detect differentially methylated regions we focused in this study at regions for which we detect DNA methylation above the peak-calling threshold in at least one sample. We did not analyze global hypomethylation because MethylCap-seq is not the best method to study this. Mainly because MethylCap-seq, as well as other capture assays by means of MBD proteins, is biased toward regions with high methylated CpGs content, and hence regions with low methylated CpG density are covered less well.9
The analysis of epigenetic marks in the human embryonic stem cells showed that hypermethylated CRC DMRs overlap for 53% with regions with elevated H3K27me3 (Fig. 5). Notably, hESC bivalent regions are enriched for developmental regulators,24,32 and silencing of these regulators by aberrant DNA methylation could contribute to cancer formation. Possibly, methylation factors are preferentially targeted to bivalent regions in cancer, and in this way reversible gene silencing in stem cells is replaced by permanent gene silencing in cancer.33,34
We performed an integrated analysis of our data with publically available expression data. Generally, we did not observe strong changes in expression between tumor and normal for the genes associated with the high-confidence hypermethylated DMRs (Fig. 6A). Indeed, hypermethylation frequently occurred at genes not or lowly expressed in normal colon (Fig. 6B). Other studies also reported marginal effects of aberrant DNA methylation on overall gene expression in cancer.5,33,35 The question arises whether these hypermethylated regions are bystander/passengers or causative. It seems safe to assume that lowly expressed genes are “easier” to become methylated than actively transcribed genes, and this can result in sites of non-functional DNA methylation. However, it is also possible that methylation is an important mechanism that ensures a permanent inactive state (a hard switch). In line with this hypothesis, a small fraction of the genes that become hypermethylated in tumors and do not show differences in expression between tumor and normal, do gain expression in the HCT116 DKO compared with the corresponding WT HCT116 cells (Fig. 4C). In the DKO cells maintenance of methylation is lost, and this suggests that DNA methylation is important to suppress the expression at these sites.
The majority of the identified high-confidence hypermethylated DMRs (~95%) co-localize with genes that are not known to be cancer-related. However, at least 9% of these regions are at the promoters of transcription factors. For example, seven different members of the Sox protein family (SOX1, SOX2, SOX5, SOX7, SOX11, SOX14 and SOX21) are targeted. It is well known that these and many other transcription factors are involved in development; they turn on/off genes that induce changes in cell morphology, proliferation, interaction, and movement. It is conceivable that deregulation of some of these transcription factors by hypermethylation could contribute to tumorigenesis. In addition, we also find frequent DMRs positioned at genes associated with the extracellular matrix, such as IGFBP3, MMP9, FBN2, NRCAM, NTNG1, GPNMB, GPC6, RELN, COL4 and ITGA4. Since the ECM affects adhesion, motility, viability, and proliferation, deregulation of this pathway could contribute to malignant transformation of colon cells. Functional assays for these genes, such as siRNA-mediated knockdown or overexpression in model cell lines, will be required to clarify their significance and could provide new insights in tumorigenesis.
The 184 DMRs occurring in over 80% of the tumors (support > 19) constitute potential biomarkers for CRC detection. The fact that hypermethylation of these regions was found in different subtypes of CRC suggests that hypermethylation at these sites may occur relatively early in tumorigenesis. Examples of genes that coincide with these DMRs are SDC2 and KCNK12. SDC2 is a cell surface receptor and appears to play diverse and possibly conflicting roles in cancer: SDC2 downregulation was associated with poor prognosis in esophageal squamous cell carcinoma,36 whereas SDC2 overexpression was associated with poor prognosis in prostate cancer tissue.37 KCNK12 encodes a member of the potassium channel superfamily, and was shown to be hypermethylated in 32 out of 77 colorectal tumors.7 Analysis of the DMRs in a large independent cohort of CRC and preferably using an orthogonal analysis method should be performed to evaluate their frequency of hypermethylation in CRC, as well as their clinical value.
In summary, we have performed a comparative and comprehensive genome-wide DNA methylation analysis of 24 CRC and matched normal colon samples. We provide a resource containing genome-wide DNA methylation maps and comprehensive lists of DMRs. From our analyses, it is evident that hypermethylation in CRC does not only occur at tumor suppressor-like genes, but affects a wide spectrum of coding loci. This suggests that only a small fraction of the DMRs plays a direct role in tumorigenesis. Furthermore, this study corroborates and extends the intriguing link between promoter region hypermethylated in tumors and H3K27me3 in human embryonic stem cells.
Twenty-four patients with primary sporadic CRC from the Fatebenefratelli Hospital in Benevento and five from the Catalan Institute of Oncology in Barcelona were included in this study. Each tumor sample was matched with adjacent apparently normal mucosa removed during the same surgery. The tumor samples were excised from the central part of the tumor mass, reducing the percentage of contaminating normal cells. The liquid nitrogen frozen specimens were micro-dissected and genomic DNA was isolated. Only sample pairs from which sufficient and good high molecular weight DNA was available were included in this study. Patients had no familial history of intestinal dysfunction or CRC. The study was performed according to the principles of the Declaration of Helsinki and approved by the Institutional Review Board of the Fatebenefratelli Hospital and the Catalan Institute of Oncology.
Genomic DNA was sonicated to generate fragments of on average 400 bp. MethylCap was performed using the IP-STAR robot (Diagenode) as described before.9 In short, 1 µg DNA was incubated with paramagnetic beads coated with the MBD domain of MeCP2 fused to GST. After washing with 200 mM, 400 mM and 500 mM NaCl, the bound methylated DNA was eluted in two fractions using 600 mM and 800 mM NaCl, respectively. The eluates were each prepared for Illumina sequencing according to Illumina's protocols. Sequence reads were generated on the Illumina Genome Analyzer IIx using a standard 36-base protocol. The total number of sequenced reads per sample is shown in Table S1.
Initial data processing, base calling, and alignment to the human genome (hg18) was done with Illumina software. Further data analysis was done using in-house generated scripts written in LINUX shell, Python, Perl and R. The uniquely mapped sequence reads were directionally extended to 234 bp, which was the estimated median length of MethylCap-seq DNA fragments. Uniquely mapped reads within satellite repeats were discarded. Mapped reads from the medium and high library were combined and the data was converted to Browser Extensible Data (BED) files for downstream analysis.
Enriched regions (peaks) were called on the basis of a Poisson distribution of overlapping sequence reads within a dynamic window. A false discovery rate (FDR) was calculated relative to the total covered sequence, and peaks with an FDR of ≤ 1 × 10−6 were selected. All peaks from the 48 samples were merged. Per sample the number of reads overlapping each region of interest was calculated and normalized by dividing by the total number of aligned reads of the sample, and the length of the peak.
Wiggle (WIG) files for viewing the data in the UCSC Genome Browser were generated. First, to compensate for differences in sequencing depth among samples, the total number of unique mapped reads of each sample was uniformly equalized relative to the sample with the lowest number of sequence reads. Next, for each base pair in the genome the number of overlapping sequence reads was determined and averaged over a 10 bp window.
The DMR calling algorithm we applied is essentially identical to the one described previously.10 We analyzed per sample pair each region and applied two criteria. First, we tested for statistically significant differences between the two samples using Fisher’s exact test. After executing Fisher’s exact test we performed multiple testing correction using the Benjamini-Hochberg algorithm and selected those regions for which the adjusted p value was lower than 0.01. Second, the difference in the normalized peak densities between tumor and normal should at least be 2-fold.
Genomic DNA was treated with sodium bisulfite using the EpiTect Kit (QIAGEN). The bisulfite-treated DNA was used for PCR (primers are listed in Table S7). The PCR products were isolated from gel, cloned into the pGEM-T vector (Promega), sequenced with the BigDye Terminator kit v1.1 (Life Technologies), and analyzed on a 3730(XL) DNA sequencer. Alignment and calculations were performed using the BiQ Analyzer tool.38
ChIP-seq data (H3K4me3, H3K27me3, H3K36me3, Pol2, CTCF) and DNaseI hypersensitivity data for hESC H1 was downloaded from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/encodeDCC. See Table S8 for the complete links. The reads were extended to 200 bp, and BED files and WIG files were generated. The BED files were used to calculate the reads in and around regions of interest. In addition, peaks were called for H3K4me3 and H3K27me3 using MACS with mfold = 4 and p value = 1 × 10−14 and 1 × 10−6, respectively.
The annotation of the genomic features such as intergenic regions, exons, introns, CGIs were obtained from the UCSC Genome Browser.
Two data sets GSE21815 and GSE25070 were downloaded from the Gene Expression Omnibus (www.ncbi.nlm.nih.gov/geo). GSE21815 contains data from Agilent arrays and GSE25070 data from Illumina Beadchips. These were normalized using the R package “limma” and “lumi,” respectively.
The HCT116 DK0 cells were a gift from B. Vogelstein, John Hopkins Kimmel Cancer Center, Baltimore, MD. BJ.HCT116 WT and DKO cells were cultured in McCoy’s 5A medium supplemented with 10% fetal bovine serum and 1% penicillin/streptomycin (all Gibco/Invitrogen) at 37°C in 5% CO2 atmosphere. Genomic DNA was isolated with a standard protocol and MethylCap-seq was preformed as described above.
ds-cDNA and ChIP DNA were prepared for Illumina sequencing according to Illumina's protocols. Sequence reads were generated on the Genome Analyzer IIx. Transcript expression levels were expressed as RPKM values (reads per kilo base of transcript per million mapped reads). The total number of sequenced reads, mapped reads or reads in transcripts are listed in Table S1.
The data generated for this work have been deposited in the NCBI Gene Expression Omnibus (GEO) (www.ncbi.nlm.nih.gov/ geo) and are accessible through GEO Series accession number GSE39068.
We thank E.M. Janssen-Megens, Y. Tan, and K.-J. Francoijs for technical support. This work was supported by the CancerDIP EU Collaborative project HEALTH-F2-2007-200620.
No potential conflicts of interest were disclosed.
CancerDIP EU Collaborative project HEALTH-F2-2007-200620, Breast Cancer Somatic Genetics Study [BASIS] HEALTH-2009-2.1.1-2-242006, Dutch Cancer Foundation (KWF) grant KUN 2008-4130.
H.G.S., L.A., C.B., F.S. and A.B.B designed the study; L.S., V.C., A.V., D.H. and M.E. provided patient material; F.S., A.B.B. and F.M. acquired the data; A.K. provided technical support; F.S., A.B.B, Y.A. and H.G.S. analyzed and interpreted the data; F.S. and H.G.S. wrote the manuscript; A.B.B., Y.A., C.B. and T.L. provided advice and proofread the manuscript; H.G.S. supervised the study.
Supplemental materials may be found here: www.landesbioscience.com/journals/epigenetics/article/22552
Previously published online: www.landesbioscience.com/journals/epigenetics/article/22562