|Home | About | Journals | Submit | Contact Us | Français|
We applied a solution hybrid selection approach to the enrichment of CpG islands (CGIs) and promoter sequences from the human genome for targeted high-throughput bisulfite sequencing. A single lane of Illumina sequences allowed accurate and quantitative analysis of ~1 million CpGs in more than 21408 CGIs and more than 15946 transcriptional regulatory regions. Of the CpGs analyzed, 77–84% fell on or near capture probe sequences; 69–75% fell within CGIs. More than 85% of capture probes successfully yielded quantitative DNA methylation information of targeted regions. Differentially methylated regions (DMRs) were identified in the 5′-end regulatory regions, as well as the intra- and intergenic regions, particularly in the X-chromosome among the three breast cancer cell lines analyzed. We chose 46 candidate loci (762 CpGs) for confirmation with PCR-based bisulfite sequencing and demonstrated excellent correlation between two data sets. Targeted bisulfite sequencing of three DNA methyltransferase (DNMT) knockout cell lines and the wild-type HCT116 colon cancer cell line revealed a significant decrease in CpG methylation for the DNMT1 knockout and DNMT1, 3B double knockout cell lines, but not in DNMT3B knockout cell line. We demonstrated the targeted bisulfite sequencing approach to be a powerful method to uncover novel aberrant methylation in the cancer epigenome. Since all targets were captured and sequenced as a pool through a series of single-tube reactions, this method can be easily scaled up to deal with a large number of samples.
DNA methylation plays an essential role in normal development and is involved in the pathogenesis of a variety of human diseases (1). The human genome is comprised of 3 billion bp of DNA, of which ~30 million are CpG dinucleotides, the main sites of DNA methylation. Given the GC frequency, however, the number of CpGs is much lower than expected. One unique feature of the mammalian genome is the presence of genomic regions containing dense clusters of CpGs, called ‘CpG islands’ (CGIs). CGIs are associated with over half of the known promoters. CpG methylation, occurring in CGIs, has functional consequences, such as transcriptional repression or aberrant activation (2). Many tissue-specific genes are known to be silenced by promoter methylation. Accumulating evidence has also indicated that hypermethylation of CGIs can be one of the most prevalent molecular events in human cancers (3,4).
The methylation status of DNA can be detected by several methods including: (i) methylation-sensitive restriction enzymes that only cut DNA at unmethylated recognition sites; (ii) sodium bisulfite which converts unmethylated cytosines to uracils but has no effect on methylated cytosines; (iii) methylated or unmethylated DNA can be enriched by certain protein or antibodies (5). These procedures are usually combined with genome-screening techniques to examine DNA methylation on a whole- or subgenome scale (6–9). With the rapid advances in next-generation sequencing (NGS) technologies, whole-genome shotgun bisulfite sequencing was successfully used to map the complete methylomes of several human embryonic stem (ES) cell lines at the single-methyl-cytosine resolution (10,11). The single base-level analysis was instrumental for the discovery of non-CpG methylation in ES cells (11). However, in order to analyze a large number of clinical samples, technology that is quantitative, high throughput, cost-effective and both scalable and flexible with respect to coverage is extremely desirable (12).
Several genomic enrichment (or targeted capture) methods developed for whole-exome sequencing (13) were recently applied to targeted bisulfite sequencing (TBS) applications (12,14–16). Hodges and colleagues demonstrated that bisulfite-converted DNA can be hybridized to an oligonucleotide capture array to enrich regions of interest for TBS (16). Within 324 selected CGIs, 25000 CpGs were successfully sequenced. Deng and colleagues designed approximately 30000 padlock probes to assess methylation of approximately 66000 CpG sites within 2020 CGIs on human chromosomes 12 and 20 (15); Ball and colleagues designed approximately 10000 padlock probes to profile approximately 7000 CpGs within the ENCODE pilot project regions (14). Both studies demonstrated that the TBS approach based on padlock probes was highly specific and reproducible. However, due to the reduced sequence complexity after bisulfite treatment, the methods that directly enrich targets from bisulfite-converted DNA by hybridization may have difficulty selecting the optimal capture probes in the highly GC-rich regions. Nautiyal and colleagues developed a unique method for capturing target DNA via oligo-guided ligation before bisulfite conversion (12). The methylation status of more than 145000 CpGs from 5472 promoters was determined using a microarray. One of the challenges of this methodology is the construction of the capture probe panels, which were prepared by single-plex PCR reactions (12).
In this study, we present an approach that combines solution-phase hybrid selection and massively parallel bisulfite sequencing to profile DNA methylation in targeted CGIs and promoter regions. The solution hybrid selection method was initially developed for capture-sequencing of exons (17). However, successful hybrid selection of genomic DNA coupled with bisulfite conversion has not been demonstrated. In addition, the high GC content present in the CGIs makes capturing these regions a challenge. We designed 51551 highly specific capture probes and optimized the solution hybrid selection method for TBS in the CGIs and promoter regions across the human genome. Using one lane of Illumina sequences, we can quantitatively determine the methylation status of ~0.9–1 million CpGs in more than 21408 CGIs and more than 15946 transcriptional regulatory regions. Single-base resolution CpG methylation maps in the promoters of a large number of genes in three breast cancer cell lines were generated. Also, differential methylation in the 5′-end of regulatory regions, as well as the intra- and intergenic regions, particularly in the X-chromosome was identified. The method developed was further tested using wild-type and DNA methyltransferase (DNMT) deficient colon cancer cell lines and the results revealed novel CpG methylation patterns in these cell lines.
The method for preparation of the biotinylated capture probe is the very similar to the method described by Gnirke et al. (17) with minor modifications. The oligonucleotide library was purchased from Agilent Technologies. It consisted of 51551 oligonucleotides of the sequence 5′-AGGACCGGATCAACTN160CATTGCGTGAACCGA-3′ with N160 indicating the capture probe sequences (17). The information on the probes (start and end positions of the probe sequences in the human genome), which can reproducibly capture target DNA is provided in Supplementary Table S1. The library was resuspended in 100µl 0.1× TE buffer (10mM Tris–HCl, 0.1mM EDTA, pH 8.0). A 4µl aliquot was used for the first PCR amplification in 100µl containing 800nmol dNTPs, 40pmol each of PCR primer eMIP_CA_F (5′-TGCCTAGGACCGGATCAACT-3′) and eMIP_CA_R (5′-GAGCTTCGGTTCACGCAATG-3′) (18), and 5U Pfu TurboCx DNA polymerase (Stratagene). The PCR conditions used were 5min denature at 94°C, followed by 20 cycles of 20s at 94°C, 30s at 55°C, 30s at 72°C and 5min final extension at 72°C. The PCR products were purified using a PCR cleanup kit (Qiagen) and loaded on a 2% NuSieve 3:1 agarose gel (Lonza). A 200-bp band was cut from the gel and purified using a gel purification kit (Qiagen). The gel-purified DNA was diluted 1:1000 and kept at −80°C. To add the T7 promoter, 8µl diluted PCR product was re-amplified in 200µl aliquots containing 800nmol dNTPs, 40pmol of each of PCR primer combination eMIP_T7_F (5′-TAATACGACTCACTATAGGGAGGACCGGATCAACT-3′)/eMIP_CA_R (5′-GAGCTTCGGTTCACGCAATG-3′) and eMIP_T7_R (5′-TAATACGACTCACTATAGGGAGAGCTTCGGTTCACGCAATG-3′)/eMIP_CA_F (5′-TGCCTAGGACCGGATCAACT-3′) and 5U Pfu TurboCx DNA polymerase (Stratagene) in two separate PCR reactions with the same conditions mentioned above. The 215-bp PCR products were purified using a PCR purification kit (Qiagen). To generate the biotinylated RNA probes, 1µg of the purified PCR products was used as template in a 100µl in vitro transcription reaction using a MAXIscript T7 kit, (Ambion). The template DNA was removed by incubating for 15min with 1µl Turbo DNase (Ambion). The reaction was stopped by adding 0.5M EDTA and the biotinylated RNA probes were purified using a MEGAclear kit (Ambion). Biotinylated RNA probes were kept in the presence of 1U/µl SUPERase-In RNase inhibitor (Ambion) at −80°C.
Genomic DNA was isolated from cancer cell lines using DNeasy Blood and Tissue Kit (Qiagen). The genomic DNA was randomly fragmented into 150–500-bp segments using a Bioruptor (Diagenode). The sheared DNA was repaired, and an adenine was added to the 3′-end of the DNA fragments. Pre-annealed Illumina sequencing adaptors containing 5-methyl-cytosine were ligated to both ends of the DNA fragments according Illumina standard protocols (Illumina). The sequencing library was then purified with an Ampure XP Kit (Agencourt). About 20–30µg of DNA each from breast cancer cell lines MCF10A, MCF7 and MDA-MB-231 were used to establish the protocols, which yielded roughly 12µg of ligated fragment library that is enough for six capture reactions for each cell line. To make the assay consistent, we attempted to maintain a constant ratio of library DNA to the amount of capture probes used during the hybridization reaction. To investigate if the amount of input DNA affects sequencing results, we performed bisulfite sequencing experiments using DNA obtained from 6, 3, 2 or 1 capture reactions. The details about the amount of captured DNA used in each experiment are listed in Table 1.
The hybridization procedure was performed as previously described (17). Briefly, 5µg human Cot-1 DNA (Invitrogen), 5µg Salmon sperm DNA (Invitrogen) and 1µg pair-end DNA library were mixed and adjusted to 7µl by vacuum drying. The mixture was denatured at 95°C for 5min, and annealed at 65°C for 5min. Then, 13µl of pre-warmed 2× hybridization solution (10× SSPE, 10× Denhardt's, 10mM EDTA and 0.2% SDS) and 6µl of a freshly prepared mixture containing 1µg biotinylated RNA probe and 20U SUPERase-In were added at 65°C. The hybridization mixture was incubated for 48h at 65°C. After the hybridization was finished, the DNA:RNA hybrid was captured with 100µl of M-280 Streptavidin Dynabeads (Invitrogen) that had been washed three times and resuspended in binding buffer (200µl 1M NaCl, 10mM Tris–HCl, pH 7.5 and 1mM EDTA). The bead/capture solution was rotated for 30min at the room temperature and then was pulled down and washed once at room temperature for 15min using 500µl of 1× SSC/0.1% SDS. The bead/capture solution was again pulled down and washed three times at 65°C for 10min using 500µl of 0.1× SSC/0.1%SDS. For elution of the capture DNA, a portion of 0.2M NaOH solution was added and incubated at room temperature for 10min. The beads were pulled down and the supernatant was transferred to a new tube containing 70µl of neutralization buffer (1M Tris–HCl pH 7.5). The captured DNA was desalted and purified using a Zymo DNA Clean & Concentrator Kit (Zymo Research) and eluted in 20µl. The neutralization step can be omitted, and the DNA in 0.2M NaOH solution can be directly used for bisulfite treatment. For each sample, six capture reactions were performed with probes targeting the Watson strand, and six reactions were performed for the Crick strand. The captured DNA was pooled in a strand specific manner and used in two separate bisulfite treatment reactions.
Bisulfite treatment was performed on the captured DNA using EZ DNA Methylation Kit (Zymo Research, Orange, CA, USA) according to the manufacturer's protocol. The bisulfite-modified DNA was used immediately for PCR or stored at −80°C. For the first round of PCR amplification, the bisulfite-converted DNA pooled from six capture reactions was amplified in 4×50µl aliquots by 10 cycles of PCR using the following reaction mixture, 2.5U of uracil-insensitive Pfu TurboCx Hotstart DNA polymerase (Stratagene), 5µl 10× Pfu TurboCx buffer, 2.5mM dNTPs, 25µM of each Pair End (PE) Primers 1 and 2. Cycling conditions for the reaction were: 95°C for 2min, 98°C for 30s, then 10 cycles of 98°C for 15s, 60°C for 30s and 72°C for 4min, final extension at 72°C for 10min. In order to completely remove the primer dimers, the PCR products were first purified using a Zymo DNA Clean & Concentrator Kit (Zymo Research) followed by a second purification with an Ampure XP Kit (Agencourt). For the second round of PCR amplification, the first PCR product was amplified in 50µl volume using 14 cycles of PCR with the following reaction mixture, 2.5U Phusion DNA polymerase (Finnzyme), 10µl 5× Phusion buffer HF, 1.5mM dNTPs, 25mM of each Pair End (PE) Primers 1 and 2. Cycling conditions for the reaction were: 98°C for 15s, 65°C for 30s and 72°C for 4min, final extension at 72°C for 5min. The PCR products were purified using a Zymo DNA Clean & Concentrator Kit (Zymo Research) as well as an Ampure XP Kit (Agencourt). The PCR products were sequenced using an Illumina GAIIx according to manufacturer's protocol.
Bisulfite-PCR primers for 46 genes were designed using the PRIMEGENSv2 software (19) and synthesized by IDT in a 96-well format. The amplicon information is listed in Supplementary Table S2. Bisulfite-modified genomic DNA from MDA-MB-231 and MCF7 was used as the template for PCR amplification of the 46 genes. The individual PCR products were pooled together, purified, end-repaired and ligated to the bar-coded sequencing adaptors using 454 library construction kits and sequenced according to the manufacturer's protocols (Roche 454 Life Sciences).
ChIP experiments were carried out using a Magna ChIP kit (Millipore) following the manufacturer's suggested protocols. For each ChIP experiment, 5×106 MDA-MB-231 cells were cross-linked with 1% formaldehyde at 37°C for 10min. The anti-trimethyl-H3K4 (Upstate, no. 04-745) antibody was used. The ChIP-seq libraries were prepared from 10ng of both Input and ChIP DNA samples using a ChIP-seq sample preparation kit (Illumina) according to the manufacturer's protocol. The libraries were sequenced on a GAII sequencer (Illumina) at 36-bp read length. The raw sequencing reads were processed and aligned against the human genome using ELAND program (Illumina) and the peaks were identified using MACS (20).
The sequence analysis was done using an in-house bioinformatics analysis pipeline in which the raw bisulfite sequencing reads generated using Illumina sequencing were first converted to ‘unmethylated reads’ via the conversion of all Cs to Ts in silico. The unmethylated, converted sequencing reads were then aligned to the two copies of in silico converted human reference genome (hg18) using SOAP2 (15). The results of both mappings were combined, and any reads mapped against both copies of converted reference were resolved by identifying their strand based on the TCAG ratio and the mapping positions used accordingly. The clonal reads were not removed when the sequencing reads were mapped to the genome. The methylation level [(number of methylated)/(number of methylated+number of unmethylated)] was calculated after measuring the amount of methylation (CG→CG) and unmethylation (CG→TG) in the CpG position. The results were stored and can be visualized using the UCSC Genome Browser. Cluster analyses and statistical analyses were performed using Partek Genomic Suites (Partek) and R. The genomic annotation was performed using in-house Perl scripts. The 454-sequencing reads were analyzed using BSmapper (http://sourceforge.net/projects/bsmapper). The raw and processed sequencing data were submitted to NCBI gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo). The accession number is GSE26826 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE26826).
The overall experimental approach is illustrated in Figure 1a. A pool of single-stranded DNA oligonucleotides was synthesized by Agilent Technologies’ oligo library synthesis (OLS) services. Each oligonucleotide consisted of a target-specific 160-mer sequence flanked by 15 bases of a universal primer sequence on each side to allow PCR amplification. After initial PCR, a T7 promoter was added on each side of the PCR product in two separate PCR reactions. The double-stranded PCR products were converted into two sets of complementary biotin-labeled RNA probes by in vitro transcription (17), which were used to capture Watson and Crick strands of the genomic DNA in two separate capture reactions, respectively. The captured DNA was pooled from several hybridization reactions, treated with sodium bisulfite, amplified by PCR and subsequently sequenced using an Illumina GA IIx sequencer (Figure 1a).
We designed 51551 highly specific capture probes using the following criteria: (i) one capture probe (160bp) for every 500bp window within a given CGI across the genome. CGIs <500bp only have one probe per island; (ii) one capture probe per transcription start sites (TSS) for all RefSeq genes. We implemented these criteria in the PRIMEGENS-v2 software to search for unique 160-bp segments within the targeted regions (19,21). A highly stringent criterion for cross hybridization was used and only highly specific probes were selected. A total of 35746 probes were successfully designed to target 23441 CGIs (~83% of the 28226 CGIs in the human genome). Next, we designed 15805 probes to capture the regions crossing the TSS (±300bp) of 19369 RefSeq genes after excluding genes that already contained the CGI that spanned around the TSS. Overall, the capture probes targeted over 8.2 million bases in the genome and overlap with >80% of CGIs and the promoters of known genes in the human genome.
Targeted bisulfite sequencing was performed on three breast cancer cell lines, MCF7, MDA-MB-231 and MCF10A. We obtained 23–30 million single-end 76-bp sequencing reads for each cell line. The bisulfite sequences were analyzed using an in-house bioinformatics analysis pipeline. We mapped the raw reads to the in silico bisulfite-converted human genome, and an average of ~79% unique mapping rate was achieved. As an example, the typical bisulfite sequencing results and sequencing coverage profiles in the promoter CGI of HOXA9 are shown in Figure 1b. The average sequencing depth per CpG is between 86 and 146 across cell lines (Table 1). The fraction of probes with at least half of the average abundance is 52–54% (Figure 1c), which is comparable to previous studies (15,17). Methylation levels for CpG sites covered with at least 10 sequencing reads were extracted providing accurate quantification for ~1 million CpGs. Overall, 77–84% of the CpGs measured in the samples fell on or near capture probe sequences; 69–75% fell within CGIs. More than 85% of capture probes successfully yielded quantitative DNA methylation information of the targeted regions (Table 1). As expected, ~40% of CpGs were located in 5′-end transcriptional regulatory regions including promoters and 5′-UTRs, while <18% of CpGs were associated with repetitive sequences (Supplementary Figure S1). In all three cell lines (Supplementary Table S1), 21408 CGIs and 15946 transcriptional regulatory regions were captured and sequenced. Bisulfite treatment efficiency can be determined by calculating the C to T conversion (rate for all cytosine bases other than those in CpG dinucleotides (this includes CpA, CpC or CpT dinucleotides). The bisulfite conversion rate was estimated to be 98.3, 98.3 and 98.7% for MCF7, MCF10A and MDA-MB-231, respectively.
Next, the amount of input DNA was decreased in order to determine the effects on the performance of TBS method based on solution capture. When half of the amount of captured DNA (from three capture reactions) was used for bisulfite treatment and subsequent Illumina sequencing, the number of CpGs sequenced was reduced by ~45% (Table 1). The bisulfite sequence mapping rate and the on-target rate did not change significantly, indicating that the specificity of the solution hybrid capture was not affected by the amount of input DNA. However, the sensitivity of the TBS approach was adversely affected. When DNA from only a single capture reaction was used, a significant decrease in the number of probes that were able to capture the target DNA was observed. Using genomic DNA from a normal breast tissue sample (~10µg of DNA), we were able to measure the methylation status of ~0.3 million CpGs and 11549 CGIs.
To validate the measurement accuracy of the targeted bisulfite sequencing, we first compared the methylation level of the same CpG sites on two opposite strands. Since we captured the forward and reverse strands separately (Figure 1a), and CpG methylation is symmetric on the two DNA strands, the accuracy of the assay can be determined by comparing the methylation level of these CpG sites on the two strands (15). For 41257 such CpG sites that were covered by more than 100 sequencing reads, the Pearson correlation coefficient (R) was 0.955 (Figure 2a), despite the fact that there was little correlation in the sequencing coverage at these CpG sites between the two strands (Figure 2b). To confirm the targeted bisulfite sequencing results, we quantified the methylation levels of 762 CpG sites from 46 randomly selected PCR products using 454 sequencing (Figure 3a–d and Supplementary Table S2). Correlation between the two assays was very good, but not perfect, at the single-CpG level (R=0.818 and 0.826 for MCF7 and MDA-MB-231, respectively). From Figure 4 and Supplementary Table S2, it is apparent that 454 amplicon sequencing often gave a higher methylation measurement. In the MCF7 cell line, >80% of 219 CpG sites containing different methylation measurements were found to have greater methylation levels when measured by 454 sequencing. Similar results were obtained for the MDA-MB-231 cell line. The higher methylation measurements indicated by 454 sequencing could be due to selective amplification of methylated DNA (which is more GC rich than unmethylated DNA) in the initial PCR reaction. However, when compared at the amplicon level, the correlation between the two assays was excellent (R=0.924 and 0.944 for MCF7 and MDA-MB-231, respectively; Figure 4). To test the reproducibility of the capture system, we compared the methylation measurements from two independent technical replicates. Excellent correlation was observed (Figure 2c). Therefore, the validation experiments demonstrate that the capture sequencing approach is accurate and reproducible.
Using this method, we generated genome wide, single-base resolution DNA-methylation maps in three of the most commonly used breast cancer cell lines which have been cited extensively in the literature. The overall methylation levels of CpGs displayed a similar bimodal distribution among the three cell lines, which is consistent with previous reports (Supplementary Figure S2). To evaluate the DNA methylation status of each targeted loci, we first grouped the CpGs based on the 500-bp windows centered by the capture probes, and then took an average of the methylation values of all CpGs within the 500-bp window. The average methylation value was assigned to each target loci. The methylation levels of 42460 loci and their annotations are included in Supplementary Table S1. We split these targeted loci into three main categories: 5′-end of known genes (27996 loci), intragenic regions including the body and 3′-end of genes (8921 loci), and intergenic regions (5472 loci). The distribution of the average methylation levels of target loci displayed a similar trend to the distribution at the single-CpG level (Figure 5a). However, the patterns of distribution seemed to depend on their functional annotations, and are very similar between the three cell lines, except the X-chromosome. A significantly higher number of unmethylated loci were observed in the X-chromosome of MDA-MB-231 cells when compared to other cell lines (Figure 4a). Cluster analysis grouped the MDA-MB-231 and MCF10A cell lines together when the methylation data from 5′-end, intra- and intergenic loci was used. However, cluster analysis using methylation data from the X-chromosome-grouped MCF7 and MCF10A together (Figure 5b), separating them from the triple negative and aggressive MDA-MB-231 cell line. Overall, differentially methylated regions (DMRs) can be identified in the 5′-end regulatory regions, as well as the intra- and intergenic regions, particularly in the X-chromosome among the three cell lines (Figure 5b and Supplementary Figure S3).
We identified 1155, 2133 and 985 methylated genes with average methylation levels equal or above 0.75 in MCF10A, MCF7 and MDA-MB-231 cell lines, respectively (Figure 5c). Surprisingly, MDA-MB-231 cells seem to have the fewest number of methylated genes among the three cell lines. These methylated genes were classified according to their gene ontology (GO) category (Supplementary Tables S3–S5). The single-CpG resolution methylation maps of many known tumor suppressor genes were established in the three cell lines. Supplementary Figure S4 reveals the methylation patterns in the CGIs of tumor suppressor genes such as SFRP1, SFRP2, SOX17 and WIF-1 which are all inhibitors of WNT-signaling pathway (Supplementary Figure S4A) as well as several microRNAs that are embedded in the CGIs (Supplementary Figure S4B). Supplementary Figure S5 illustrates differential methylation patterns in all four HOX gene clusters. In addition, the DNA methylation patterns correlated particularly well with histone modification data (Figure 5d). For instance, CpG methylation was observed in the promoter of RASSF1A tumor suppressor gene (the long transcript of RASSF1), but not in the promoter of its alternative splice variant, RASSF1C (the short form of RASSF1 transcript) (Figure 5d). This is consistent with our previous finding that DNA methylation functioned in concert with histone modifications to regulate gene expression of these specific transcripts (22). GATA2 is another example, which has not been characterized previously. For both genes, the histone H3 lysine 4 tri-methylation (H3K4Me3) was associated with the unmethylated promoters. Particularly for GATA2, the CGI seems to be split in half and only the unmethylated region is clearly overlapping with H3K4Me3.
To further test the robustness of the targeted bisulfite sequencing method, we analyzed a group of DNMT knockout cell lines derived from colon cancer cell line HCT116 (23). These are isogenic cell lines with a single gene defect (23–25). Using targeted bisulfite sequencing, we analyzed 0.54, 0.42, 0.48 and 0.65 million CpGs in DNMT1 knockout (1KO), DNMT3B knockout (3BKO) and DNMT1, 3B double knockout (DKO) cell lines. Similar to previous reports (23,24,26,27), the targeted bisulfite sequencing data revealed a significant decrease in the number of methylated CpG sites in the DNMT1, 3B double knockout (DKO) cell line, but only modest decreases in the single knockout, 1KO and 3BKO, cell lines (Figure 6a). Cluster analysis of the 129022 common CpGs shared by all four cell lines revealed an interesting pattern in which demethylation increased gradually from the 3BKO to DKO cell lines (Figure 6b). Approximately 30.5% of the 129022 CpGs were fully methylated (methylation levels>0.95) in the parental HCT116 cells. Further analysis of the 38650 fully methylated CpGs (mCpGs) in HCT116 showed that 88.3% of mCpGs displayed some degree of decreased methylation in at least one DNMT knockout cell line. Almost half of the mCpGs (17021 CpGs) had a methylation value of less than 0.5 (approximately ≥2-fold reduction in methylation level) in at least one DNMT knockout cell line. We divided the 17021 mCpGs into four different groups based on the patterns of methylation in four colon cancer cell lines (Figure 6c). Fifty-four percent of mCpGs were hypomethylated in both the 1KO and DKO cell lines (Class 2). Of mCpGs, 34% were hypomethylated only in the DKO cell line (Class 4). Of mCpGs, 9% were hypomethylated in all three knockout cell lines (Class 1) and only 3% of mCpGs had demethylation in both the 3BKO and DKO cell lines (Class 3). Figure 6d illustrates some of the demethylation patterns observed in the four cell lines. The demethylated CpG sites were widely distributed across the genome, affecting promoter and 5′-UTR (24%, 4019 CpGs), gene body including exon, intron and 3′-UTR (33%, 5649 CpGs) and intergenic regions (43%, 7353) (Figure 6c). The distribution of demethylated CpGs in the aforementioned four classes did not change significantly among promoter, gene body and intergenic regions. Only 386 demethylated CpGs were associated with repetitive sequences that were widely distributed across the subclass of repetitive sequences.
For the first time, we have demonstrated that the solution hybrid-capture approach can be successfully used for targeted bisulfite sequencing. The solution hybrid selection approach has recently become a popular method for target enrichment. One unique feature of this protocol is to convert the long oligonucleotides into biotin-labeled RNA bait for capture of target DNA. This approach increases the specificity of the hybridization capture, and makes the capture process highly flexible. However, an earlier study indicated that targets having higher GC content resulted in a much lower coverage (17). This systematic bias poses a challenge for capturing CGI sequences. This challenge was overcome by carefully selecting unique capture probes sequences. We demonstrated that over 85% of the 51551 probes successfully captured their target DNA. Using a single lane of Illumina sequences (20–30 million reads), we can reproducibly quantify 0.9–1 million CpGs in the human genome. This number is 6–36-fold higher than the previously published results that use other sequence capture methods. Since the number of capture probes can be easily expanded, the coverage of the TBS approach described here can be significantly increased. We also showed that the TBS approach based on solution hybrid selection was highly reproducible. The accuracy and precision of this approach was further confirmed by bisulfite sequencing analysis of 46 candidate loci using an independent method. The current capture probe set focused mainly on the promoter sequences and CGIs; however, it can be easily expanded to include CGI shore, enhancer and other regulatory regions.
Although whole-genome bisulfite sequencing is ideal for methylome analysis, it remains cost-prohibitive for analyzing a large number of clinical samples. Meissner et al. (28) developed a reduced representation bisulfite sequencing (RRBS) method for sub-genome bisulfite sequencing using NGS and demonstrated its utility for analyzing clinical samples. However, the analysis is only limited to the MspI fragments in the genome. The solution hybrid selection combined with bisulfite sequencing on NGS platforms has clear advantages due to its flexible and expandable probe design as well as the simple sample preparation process which could be fully automated. Multiple captured DNA samples can be bar-coded and sequenced as a pool to meet the increased throughput of NGS.
Several other groups have also developed methods for targeted genome-level bisulfite sequencing (12,14–16). Our assays have several unique features when compared with these methodologies. First, because the solution hybrid selection procedure captures the DNA before bisulfite conversion, the methylation status of the target sequence does not factor into the capture probe design. Second, the solution hybrid-capture approach is highly scalable and flexible with respect to coverage while maintaining the high level of specificity. Finally, the capture probes can be obtained commercially through Agilent's eArray service. One limitation of our current protocol is the requirement of a relatively high amount of DNA for the initial capture reaction. We reason that the following factors might have been involved: (i) the targeted regions are small (0.25% of the human genome), and the probe density is rather low, one probe per 500-bp window. If one probe failed to capture the DNA, there is no nearby probe to compensate. (ii) A significant amount of DNA was lost during the library preparation and bisulfite modification steps. We sequenced the capture DNA from a single-capture reaction before bisulfite treatment and found that the coverage was similar and distributed more evenly than the bisulfite sequencing of pooled DNA from multiple capture reactions (data not shown). We predict that the amount of DNA required can be reduced by designing a larger and more densely distributed probe set, and by further optimization of the library construction (29), hybridization (30) and bisulfite treatment procedures. Carrier DNA can also be added to the bisulfite treatment reaction to reduce the potential loss of DNA during bisulfite treatment and the subsequent purification process (31). To analyze primary tumor samples with a limited amount of DNA, a bar-code could be assigned to multiple tumor samples, the samples could then be pooled together, and finally the capture reaction could be performed.
Using the targeted bisulfite-sequencing approach, we generated single-base resolution methylation maps in thousands of CGI and non-CGI promoters in breast cancer cell lines, MCF10A, MCF7 and MDA-MB-231. Interesting methylation patterns, such as localized shifts between hyper- and hypomethylated states, were observed in some promoters. For instance, although the promoter region of HOXA9 is methylated in all three cell lines, the first exon is clearly differentially methylated (Figure 1b). The biological significance of this observation is unclear and may deserve further investigation. However, we have shown that DNA methylation may regulate the alternative promoter usage and function in concert with histone methylation (Figure 5d). We mapped a large number of differentially methylated genes including microRNAs (Supplementary Figure S4B). Differential methylation was identified not only in the 5′-end regulatory regions, but also in the intra- and intergenic regions (Figure 5b). Intriguingly, the cluster analyses of DNA methylation in 5′-end regulatory regions, and inter- and intragenic regions all grouped MCF10A and MDA-MB-231 together. This is consistent with the classification based on gene expression, by which MDA-MB-231 and MCF10A were classified as ‘Basal B’ subtype, while MCF7 was classified as ‘Luminal’ subtype (32). Although MCF10A is a non-tumorgenic breast epithelial cell line, we have found a large number of methylated genes in this cell line. Interestingly, it seems that DNA methylation patterns in the X-chromosome can distinguish MCF10A from MDA-MB-231 cell line. A significantly higher number of under-methylated loci were observed in MDA-MB-231 cells. Although we could not determine if this observation is due to loss of X-inactivation based on our data, it certainly deserves further investigation in the future.
We further validated the ability of targeted bisulfite sequencing to detect changes in DNA methylation by using DNA from wild-type and DNMT deficient HCT116 colon cancer cell lines. These DNMT deficient cell lines have been used as DNA-demethylation controls in numerous studies (26,27,33,34). The targeted bisulfite sequencing correctly identified a gradual increase in the number of DNA hypomethylation events from the 3BKO, to 1KO, to DKO cell lines. This result is in close agreement with previous findings that utilized HPLC-based global methylation analysis (23,24). The new sequencing results indicated the critical role of DNMT1 in maintaining global DNA methylation, as hypomethylated CpG sites in the DKO cell line were most likely to be hypomethylated in the 1KO cell line as well. We have also identified DNMT3B-specific demethylation patterns in a small group of genomic loci (Figure 6d). These candidate loci may be further investigated in the future in order to dissect the role of each individual DNMT enzyme in establishing CpG methylation patterns. In summary, the single-base resolution DNA-methylation maps generated in the study may become a valuable resource for gene-specific epigenetic studies in these cell lines.
Supplementary Data are available at NAR Online.
Funding for open access charge: National Institutes of Health (grants CA134304 to H.S. and DA025779 to H.S. and K.Z., respectively, in part); Department of Defense (grant W81XWH-07-1-0560 to D.X. and H.S., in part). H.S. is a Georgia Cancer Coalition Distinguished Cancer Scientist.
Conflicts of interest statement. E.J.L., L.P., G.S., T.J., G.K., J.H.C., X.W., K.R., J.C., D.X., K.Z., and H.S. declare no conflict of interest. L.Z. and G.P.S. were employed by Illumina Inc. at the time these experiments were carried out.
We thank Dr Bill Dynan for critical review of the article and Mr James Wilson for his assistance in editing the article. The DNMT knockout cell lines are generous gifts of Dr Bert Vogelstein.