We conducted a genome-wide DNA methylation analysis in CD19+ B-cells from chronic lymphocytic leukemia (CLL) patients and normal control samples using reduced representation bisulfite sequencing (RRBS). The methylation status of 1.8–2.3 million CpGs in the CLL genome was determined; about 45% of these CpGs were located in more than 23,000 CpG islands (CGIs). While global CpG methylation was similar between CLL and normal B-cells, 1764 gene promoters were identified as being differentially methylated in at least one CLL sample when compared with normal B-cell samples. Nineteen percent of the differentially methylated genes were involved in transcriptional regulation. Aberrant hypermethylation was found in all HOX gene clusters and a significant number of WNT signaling pathway genes. Hypomethylation occurred more frequently in the gene body, including introns, exons, and 3′-UTRs, in CLL. The NFATc1 P2 promoter and first intron were found to be hypomethylated, and this hypomethylation correlated with upregulation of both NFATc1 RNA and protein expression levels in CLL, suggesting that an epigenetic mechanism is involved in the constitutive activation of NFAT activity in CLL cells. This comprehensive DNA methylation analysis will further our understanding of the epigenetic contribution to cellular dysfunction in CLL.
Chronic lymphocytic leukemia (CLL) is the most common adult leukemia in the United States and accounts for approximately 11% of all hematologic neoplasms. Despite recent advances in the understanding of the underlying pathophysiology and in the treatment of this disease, CLL remains incurable. The clinical course of patients with CLL is heterogeneous; some patients experience rapid disease progression while others live for decades without requiring treatment.1,2 Since treatment of unselected early stage patients with alkylating agents at diagnosis offers no survival advantage over treatment at the time of disease progression, the current paradigm for newly diagnosed, early-stage CLL patients is to pursue a strategy of “watch and wait,” which defers treatment until the disease progresses.2 However, this approach often leads to profound anxiety and emotional distress for patients with CLL.1 Although the identification and validation of prognostic molecular markers (including surface markers ZAP70 and CD38, cytogenetic abnormalities, and IGHV mutational status) has resulted in refinements in the management of these patients,3-5 many of these biological assays are expensive, difficult to standardize and not widely available.1 Thus, the discovery of biologically relevant factors that influence the heterogeneity and progression of CLL will not only promote our understanding of the disease process, but also will allow us to identify rational therapeutic approaches.
Epigenetic processes such as DNA methylation and histone modifications control packaging of DNA and have a direct impact on the function of the human genome. Numerous studies have demonstrated that aberrant epigenetic changes play an important role in tumor initiation and progression.6 Aberrant DNA methylation changes in tumor suppressor genes such as DAPK1,7 SFRP1,8 ID4,9 genes involved in apoptosis, cell cycle regulators p16 and p1510 and prognostic markers ZAP7011,12 and TWIST213 have been identified in CLL patients. DNA methylation changes were also found to be associated with disease progression in the Eµ-TCL1 transgenic mouse model of CLL.14 In an initial global screen for CpG island (CGI) methylation using a technique called restriction landmark genomic scanning (RLGS), DNA hypermethylation was found in CLL patients with a mean of 4.8% of CGIs affected.15 Recently, several genome-wide DNA methylation studies using DNA methylation microarray analyses identified additional aberrantly methylated genes in CLL samples.16-18 Our studies, as well as many others, have also shown that some of these methylated genes had good correlation with the existing prognostic markers.16-19 In this study, we utilized the reduced representation bisulfite sequencing (RRBS) to identify aberrant DNA methylation changes at single-base resolution in CD19+ B-cells purified from the peripheral blood of CLL samples and normal control subjects. Differential methylation between CLL and normal B-cells was identified within sequences with different functional annotations including, but not limited to CGI, CGI shore and repetitive sequences. Examples of both hyper and hypomethylation were identified in the 5′ end regulatory region and gene body of many genes that could potentially function in the CLL pathogenesis and lead to aberrant gene expression.
To perform a genome-wide analysis of DNA methylation in CLL, we applied the RRBS approach to 11 CLL B-cell samples, 3 normal control samples (one each of normal CD19+ B-cells, CD19+/IgD+ naïve B-cells (NBC) and CD19+/CD27+ memory B-cells (MBC)) and three CLL cell lines (Mec-1, Mec-2 and Wac-3). We generated 20–30 million Illumina sequencing reads for each sample. Of these, 63% to 75% were successfully mapped to either strand of the human genome (hg18). The average sequencing depth per CpG was between 32X and 43X. We were able to consistently determine the methylation status of approximately 1.8–2.3 million CpGs (Table 1). Over 23,000 CGIs, which account for more than 80% of annotated CGIs in the genome, were examined. About 40% of CpGs analyzed were located in the 5′ end regulatory or coding sequences (Fig. S1A). The overall methylation levels of CpGs showed a similar bimodal distribution in all samples (Fig. S1B), which is consistent with previous reports.20 Most of the CpGs located in the 5′ end regulatory regions were found to be unmethylated (methylation level < 0.25), while CpGs located in the gene body and intergenic regions were typically methylated (methylation level > 0.75). The overall distribution of CpGs, as well as methylated CpGs, in the repetitive sequences was also consistent between samples (Fig. S2A and B). Figure 1A shows a pair-wise comparison of the single-base resolution methylation data. The overall CpG methylation profiles among the three normal B-cell samples were highly correlated, with pair-wise Pearson correlation coefficients (R) ranging from 0.96 to 0.97. The correlations among CLL samples and between CLL and normal B-cells were also high (R ranged from 0.89 to 0.93). However, the correlation between the cell lines and primary normal and CLL B-cells was significantly lower, as indicated by decreased R-values ranging from 0.72 to 0.82 (Fig. 1A).
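The pair-wise comparison can be illustrated with a short sketch. It assumes that each sample is stored as a mapping from CpG position to methylation level and that only CpGs covered in both samples enter the calculation; the function name and the data layout are hypothetical, and the study itself used an R script rather than this code.

```python
import math


def pearson_shared(levels_a, levels_b):
    """Pearson's r over the CpG positions present in both samples.

    levels_a, levels_b: dicts mapping a CpG position to a methylation
    level in [0, 1] (hypothetical layout, not the authors' format).
    """
    shared = sorted(set(levels_a) & set(levels_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared CpGs")
    xs = [levels_a[p] for p in shared]
    ys = [levels_b[p] for p in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Restricting the calculation to shared positions matters for RRBS data, because the set of CpGs that pass the coverage filter differs slightly from sample to sample.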
To identify the differentially methylated regions (DMRs) between normal B-cells and CLL B-cells, we performed a genome-wide, unbiased DMR detection using a complete tiling of the human genome in 200 bp windows. Because adjacent CpGs are usually coordinately methylated,21 we only performed analysis on windows with at least 5 sequenced CpGs. Using the criteria requiring an FDR q value < 0.01 and difference of average methylation level > 0.25, we identified several thousand differentially methylated windows in each primary CLL sample when compared with all three normal B-cell samples (Fig. 1B). Figure S3A shows the volcano plots identifying DMRs between each CLL sample and the three normal control samples combined as a group. In the CLL samples, the number of hypomethylated DMRs was similar; however, the number of hypermethylated DMRs was quite variable (Fig. 1B). In total, we identified 8703 DMRs that were hyper- or hypomethylated in at least one primary CLL sample as compared with all three normal control samples. We performed a permutation test to assess whether the identified DMRs were due to random variation between samples. We randomly re-assigned case-control status to the 11 cases and 3 controls, and then re-performed the genome-wide analysis and counted the number of DMRs observed. As shown in Figure S3C, after 100 permutations, none generated more DMRs than the original results with correct case-control status, suggesting that it was unlikely to obtain the original results by chance (p = 6.36 × 10⁻⁸⁷). The majority of the 8703 DMRs (75%) were located within CGIs, and about 9% of the DMRs were located in the CGI shore regions. Twenty-eight percent of the DMRs were located in the promoter or 5′-UTR regions, while over half of the DMRs (55%) were located in the intergenic or intragenic regions (Fig. 1C). Over half of the 8703 DMRs were differentially methylated in only one CLL sample when compared with normal B-cell samples (Fig. 1D). A total of 353 DMRs were differentially methylated in more than 5 CLL samples, and 52 DMRs were present in all 11 CLL samples.
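The label-shuffling procedure described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: `count_dmrs` is a hypothetical callable standing in for the full genome-wide DMR analysis, and the function returns the ordinary empirical p value, which is an assumption, since the text does not state how the reported p value was derived from the 100 permutations.

```python
import random


def permutation_test(samples, n_cases, count_dmrs, n_perm=100, seed=0):
    """Shuffle case-control labels and recount DMRs.

    samples:    list whose first n_cases entries are the true cases
    count_dmrs: hypothetical callable (cases, controls) -> int
    Returns (observed count, permuted counts, empirical p value).
    """
    rng = random.Random(seed)
    observed = count_dmrs(samples[:n_cases], samples[n_cases:])
    permuted = []
    for _ in range(n_perm):
        shuffled = samples[:]
        rng.shuffle(shuffled)  # random re-assignment of case-control status
        permuted.append(count_dmrs(shuffled[:n_cases], shuffled[n_cases:]))
    # Add-one correction so the estimate can never be exactly zero.
    n_extreme = sum(1 for c in permuted if c >= observed)
    p_value = (1 + n_extreme) / (1 + n_perm)
    return observed, permuted, p_value
```

With 11 cases and 3 controls there are only 364 distinct ways to choose the control group, so the smallest empirical p value that this scheme can report with 100 permutations is 1/101.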
From the 8703 DMRs, we identified 1764 known genes that have DMRs located at the 5′ end regulatory regions (TSS ± 1000 bp). The functional annotation analysis generated using DAVID showed that about 18% (324 out of 1764) of these genes encode proteins that regulate transcription, and 105 genes belong to the Homeobox protein family. There was significant enrichment in transcriptional regulators (p = 4.7 × 10⁻⁹, FDR-adjusted p value = 8.6 × 10⁻⁶) and Homeobox genes (p = 7.9 × 10⁻⁴⁷, FDR-adjusted p value = 2.3 × 10⁻⁴⁴). Aberrant methylation changes were observed in all four HOX gene clusters. As shown in Figure 2A, the CGIs associated with HOXA6 through HOXA13 were frequently hypermethylated in the CLL samples. Nearly all HOXD genes were found to be aberrantly hypermethylated in the CLL samples as well (Fig. 2B). HOXA13, HOXD8 and HOXD11 were hypermethylated in all 11 CLL samples analyzed (Fig. 2C). In the CLL cell lines, most of the HOX genes exhibited hypermethylation that was much denser and more uniform than was observed in primary CLL samples.
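The assignment of DMRs to 5′ end regulatory regions amounts to an interval overlap test against TSS ± 1000 bp. The sketch below assumes half-open DMR coordinates and a single TSS per gene; the function name, the input layout and the gene labels in any example are hypothetical and are not taken from the annotation used in the study.

```python
def genes_near_tss(dmrs, tss_by_gene, flank=1000):
    """Map each gene to the DMRs overlapping its TSS +/- flank.

    dmrs:        list of (chrom, start, end), half-open [start, end)
    tss_by_gene: dict gene -> (chrom, tss_position)
    """
    hits = {}
    for gene, (chrom, tss) in tss_by_gene.items():
        lo, hi = tss - flank, tss + flank  # closed interval [lo, hi]
        for d_chrom, start, end in dmrs:
            # Overlap between [start, end) and [lo, hi] on the same chromosome.
            if d_chrom == chrom and start <= hi and end > lo:
                hits.setdefault(gene, []).append((d_chrom, start, end))
    return hits
```

The nested loop is adequate for a few thousand windows; a sorted index or an interval tree would be the usual choice for larger inputs.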
Based on the results from Ingenuity Pathway Analysis (IPA), the 1764 genes are enriched in several functional categories and canonical pathways (i.e., G-protein coupled receptor signaling, FDR-adjusted p-value = 4.57 × 10⁻⁵; cAMP-mediated signaling, FDR-adjusted p-value = 0.011; axon guidance signaling, FDR-adjusted p-value = 0.011; and WNT signaling, FDR-adjusted p-value = 0.014) (Tables S1 and S2). For instance, 29 of 174 WNT pathway genes were associated with DMRs located near the TSS in CLL samples. Notably, most of the hypermethylated genes in CLL were antagonists of WNT signaling such as members of the Dickkopf (DKK) and secreted frizzled-related protein (SFRP) families, as well as the SRY-like box (SOX) proteins (Fig. 3). The hypomethylated genes, on the other hand, included WNT ligands and the TCF7 transcription factor (Fig. 3B).
Using an approach similar to that described above, we identified 1870 DMRs between NBC and MBC across the entire genome. Of those DMRs, 1706 were hypomethylated, while only 164 were hypermethylated in MBC as compared with NBC (Fig. S3B). Among the 152 genes associated with DMRs near the transcription start sites (TSS ± 1000 bp) (Table S3), 123 were hypomethylated and only 29 were hypermethylated in MBC when compared with NBC. IPA showed that out of the 152 genes, 17 genes were involved in hematological system development and function, 10 genes were involved in hematopoiesis and 8 genes were involved in cell-mediated immune responses. Among them, Epstein-Barr virus induced 3 (EBI3), interleukin-2 receptor α (IL2RA), programmed cell death 1 (PDCD1), tumor necrosis factor (ligand) superfamily, member 14 (TNFSF14), tumor necrosis factor receptor superfamily, member 13B (TNFRSF13B) and myelin basic protein (MBP) play important roles in regulating the survival and proliferation of B and T-cells.
To identify the most frequently hyper- or hypomethylated genes in CLL, we performed Student's t-test analysis between CLL and normal B-cells using the average methylation value of each DMR. After the multiple testing adjustments, 533 out of 8703 DMRs survived the stringent statistical test (FDR q value < 0.05, methylation difference > 0.25). The detailed annotations of the 533 DMRs are listed in Table S4. Of these 533 DMRs, 400 DMRs were hypermethylated and 133 were hypomethylated. Furthermore, 158 DMRs were located at the 5′ end regulatory regions of 140 known genes. Cluster analysis using the average methylation values of the 158 DMRs clearly separated the CLL samples from three normal B-cell samples and cell lines (Fig. 4). However, the cluster analysis failed to further separate the CLL samples based on their unique clinical and molecular characteristics. Many frequently hypermethylated genes previously reported in CLL, as well as other hematological malignancies, such as: FOXD3,14 FOXE1, FOXG1, IRX1,15 ID4,9,14 SFRP1,8 SLIT2,22 BNC1, ADCY5, EBF3, NR2F218 and DIO323 were among the 140 genes (Fig. 4). Other frequently hypermethylated genes that have been previously reported in CLL, such as DAPK1, were among the initial list of 1764 genes, but failed to survive the statistical test. Several other SFRP proteins have been reported to be methylated in CLL;8,24 however, our study identified only SFRP1 as one of the most frequently hypermethylated genes in CLL. A subset of these genes was discovered to be differentially expressed between normal B-cells and CLL samples using an independent microarray study (unpublished results, McCarthy and Chiorazzi). The expression of SOX9 and SOX11 was also downregulated in CLL samples as compared with normal B-cells (Fig. S4).
To confirm the DNA methylation results generated using RRBS, we performed bisulfite pyrosequencing to validate the methylation status of FOXA2 and SOX11 in an independent set of 43 CLL and 5 normal CD19+ B-cell samples (Figs. 5 and S5). The results confirmed that FOXA2 and SOX11 were hypermethylated in a significant number of CLL patient samples (Student's t-test, p = 0.0049 and 0.035, respectively). Treatment of two lymphoma cell lines with 5-aza-2′-deoxycytidine (5′-Aza) and/or Trichostatin A (TSA) successfully re-activated the expression of several genes that are hypermethylated in CLL, including FOXA1, FOXA2, SOX9, SOX11 and IRX1 (Fig. 5C).
So far, most CLL methylation studies have focused on promoter hypermethylation. RRBS analysis in CLL also identified genes frequently hypomethylated in CLL cells when compared with normal B-cells. Interestingly, 95 out of 133 (> 70%) hypomethylated DMRs were located in the gene body including exons, introns and 3′-UTRs (Table S4; Fig. S6). Only 5 of the 133 hypomethylated DMRs were located in the promoter region. This was in contrast to the hypermethylated DMRs, as over 40% of them were located in the 5′ end regulatory regions. Many genes that exhibited hypomethylation have a known or potential role as oncogenes; for instance, the oncogene TCL1A contained a hypomethylated DMR in the 3′-UTR (Fig. S6A). Hypomethylation was also identified within the gene body of BCR, LFNG, NOTCH1 (Fig. S6B), TCF7 (Fig. S6C), RASGRF1 and VAV2, as well as numerous other genes.
Among the hypomethylated genes, the promoter P2 and first intron regions of the transcription factor NFATc1 were found to be hypomethylated in all CLL patient samples examined by RRBS (Fig. 6A). Conventional bisulfite genomic sequencing confirmed the differential methylation patterns in the 4 CLL samples and the Mec-1 CLL cell line that were analyzed using RRBS (Fig. 6B). The bisulfite clone sequencing results were in complete agreement with the RRBS results. Pyrosequencing analysis was then conducted on the NFATc1 P2 promoter in 44 CLL samples, 4 normal CD19+ B-cell samples, 1 memory B-cell sample and 2 naïve B-cell samples used in the RRBS analysis. Figure S5C illustrates several representative pyrosequencing results. Figure 6C shows the summary of the pyrosequencing analysis for 6 CpGs in the NFATc1 P2 promoter in the CLL samples. Nearly all CLL samples contained significantly lower methylation levels in the promoter P2 (less than 10%), while the average methylation levels of NFATc1 in normal B-cells were more than 50%. The memory B-cell sample had the lowest methylation values when compared with naïve and CD19+ B-cells suggesting tissue specific differential methylation. We then analyzed the mRNA expression level of NFATc1 and found significant upregulation (Student t-test, p = 0.03) in CLL samples (Fig. 6D). Western blot analysis of 4 CLL samples also indicated a 3 to 10-fold increase in NFATc1 protein expression (Fig. 6E).
In this study, we determined the methylation status of approximately 6–8% of the CpGs in the CLL genome using RRBS. These CpGs were highly enriched in CGI regions; over 23,000 CGIs were examined in normal and CLL B-cells. To our knowledge, this is the first sequencing-based methylation study in CLL. Previous studies using microarray or RLGS only analyzed up to approximately 28,000 CpG sites in the CLL genome. On average, only a few CpGs were examined in each CpG island or promoter region. At the single-CpG level, the global methylation pattern was similar between normal and CLL B-cells, as indicated by similar Pearson correlation coefficients among all primary B-cell samples. However, after scanning the genome using 200 bp tiling windows, 8703 small DMRs were identified between CLL and normal B-cells, and about one third of the DMRs were located near the TSS. We have also identified tissue-specific DMRs (tDMRs) between normal memory and naïve B-cells. The majority of the 1870 tDMRs (91%) were hypomethylated in memory B-cells as compared with naïve B-cells. This result appears to be in agreement with a recent genome-wide methylation study comparing germinal center (GC) B-cells with naïve B-cells, in which GC B-cells were found to be predominantly hypomethylated when compared with naïve B-cells.25 To determine whether the observed methylation differences could be due to variation in B-cell composition, or to normal variation between samples, we performed a permutation test and demonstrated that this was unlikely to be the case (p = 6.36 × 10⁻⁸⁷).
We identified 1764 genes that were hyper- or hypomethylated in at least one CLL sample when compared with all three normal B-cell samples. The number of differentially methylated genes was quite variable between CLL samples suggesting heterogeneity among aberrant methylation changes in CLL. We further identified a group of 140 genes that were most commonly hypermethylated including several genes previously reported in CLL, as well as other hematological malignancies, such as, FOXD3,14 FOXE1, FOXG1, IRX1,15 ID4,9,14 SFRP1,8 SLIT2,22 BNC1, ADCY5, EBF3, NR2F218 and DIO3.23 Interestingly, several large protein families such as HOX genes, FOX genes, and SOX genes were among the most frequently hypermethylated genes in CLL. The HOX gene family consists of four gene clusters located on chromosomes 2, 7, 12 and 17. We found that the HOXA, HOXC, and HOXD genes were the most affected by aberrant DNA methylation changes. Our study represents the first comprehensive interrogation of aberrant HOX gene methylation changes in CLL to date. Chen et al. showed that methylation-mediated silencing of FOXD3 was a key event early in leukemogenesis in both mouse and human CLL.14 It was suggested that silencing of FOXD3 might lead to silencing of many of its target genes. In this study, we found that the methylation status of 20 FOX genes was altered in CLL including FOXD3. Several genes such as FOXA1 and FOXA2 are also potential downstream target genes of FOXD3.
Previous studies have determined that the WNT signaling pathway is activated in CLL.26 The key transcription factor in the WNT pathway, LEF-1, has been shown to be overexpressed more than 28-fold in CLL B-cells.27 We have found that most of the hypermethylated WNT pathway genes in CLL were antagonist proteins of WNT signaling such as DKK1, DKK3, and SFRP1. In addition, we found that 11 out of 20 SOX proteins were hypermethylated in CLL, including SOX9 which was identified as one of the most frequently hypermethylated genes in CLL. A recent study concluded that SOX9 inhibited β-catenin/TCF-dependent transcription and promoted β-catenin degradation by two separate mechanisms involving different domains of SOX9.28 Epigenetic silencing of WNT signaling antagonists may disrupt the balanced inputs of these proteins and lead to the activation of this signaling network.
A significant number of genes were affected by hypomethylation in CLL, and most of the frequently hypomethylated DMRs were located in the gene body including coding sequences, introns, and 3′ UTRs. Intriguingly, the oncogene TCL1A was hypomethylated at the 3′ UTR. The impact of this aberrant epigenetic change on TCL1A expression is currently unclear and warrants further investigation. NOTCH1 is another important gene affected by hypomethylation in one of the exons. Hypomethylation of NOTCH1 in the 5′ regulatory region has previously been reported in mantle cell lymphoma;29 however, we did not find similar changes in CLL. The significance and functional consequences of the hypomethylation in the gene body has not been well documented. However, these methylation changes may play an important role in regulating gene expression, particularly in a lineage or tissue specific manner. For example, the methylation status of conserved non-coding DNA elements in the Foxp3 gene can facilitate the heritable maintenance of the active state of the Foxp3 locus and determine the regulatory T-cell fate.30
NFATc1 is a member of the nuclear factor of activated T-cells (NFAT) family of transcription factors and plays an epigenetic chromatin remodeling role in the transcriptional regulation of growth and survival genes including CD40L (CD40 ligand, also known as CD154) and BLyS (also known as BAFF).31-33 Recently it was reported that NFATc1 supports the proliferation and suppresses the activation-induced cell death of splenic B-cells upon B-cell receptor (BCR) stimulation.34 Previous studies have shown that CLL contains constitutively activated NFAT transcription factor activity,35 and immunohistochemistry staining has shown that NFATc1 is expressed in CLL.36 Our data suggest that NFATc1 is overexpressed in CLL B-cells due to hypomethylation in both the promoter P2 and first intron regions. This hypomethylation occurred in 55 CLL samples studied (including both sequencing and validation studies), suggesting that epigenetic dysregulation of NFATc1 is a frequent event in CLL and may play an important role in the constitutive activation of NFAT activity.
In summary, we used a single-base resolution bisulfite sequencing approach to characterize the DNA methylation map of purified B-cells from both normal control and CLL samples. We observed distinct patterns of DNA methylation in different functional elements across the genome. This study has uncovered not only several thousand novel cancer-specific DMRs, but also DMRs between subtypes of normal B-cells. Finally, hypomethylation of the transcription factor NFATc1 occurred in the majority of the CLL samples and was associated with significant upregulation of this gene. The digital methylation map generated in this study provides the precise genomic locations that undergo methylation changes. This map will be a highly valuable public resource for investigations aimed at understanding epigenetic regulation of the CLL genome.
Blood samples were obtained from CLL patients at the Ellis Fischel Cancer Center (EFCC) in Columbia, MO (11 CLL samples), the GHSU Cancer Center in Augusta, GA (14 samples) and the North Shore-LIJ Health System in Manhasset, NY (29 samples) in compliance with the local Institutional Review Boards. Clinical and molecular characteristics of the 11 CLL samples used for RRBS analysis are summarized in Table S5, and the CLL samples used for replication studies are summarized in Table S6. The CD19+ normal B-cell, CD19+/CD27+ memory B-cell, and CD19+/CD27- naïve B-cell samples used in the RRBS analysis were purchased from ALLCELLs Inc. Other normal CD19+ B-cell samples were purified from leuko-enriched blood samples purchased from a local blood bank. Mononuclear cell fractions (PBMCs) were isolated over a Ficoll-Hypaque density gradient. Freshly isolated B-cells were prepared by negative selection using the RosetteSep B-cell isolation kit (StemCell Technologies). Naïve B-cell samples were enriched using a positive selection kit (StemCell Technologies). DNA was isolated using the QIAamp DNA Blood Mini kit (Qiagen). RNA was isolated using an RNeasy mini kit (Qiagen).
Three CLL cell lines were included in this study and were found to differ in their levels of CD38 expression: Wac-3 (4.7% CD38), Mec-1 (69.5% CD38), and Mec-2 (96.6% CD38).17 CLL cell lines were maintained in RPMI 1640 media with 10% fetal bovine serum (FBS). A Burkitt's lymphoma cell line, Raji, was also maintained in the media described above. For gene reactivation experiments, cells were cultured in the presence of vehicle (PBS) or 1.0 μM 5′-Aza with medium changed every 24 h. After 4 d, cells were either harvested, treated with TSA (1.0 μM) for an additional 12 h and then harvested, or treated solely with TSA for 12 h before being harvested. The total RNA was isolated as described above.
RRBS was performed according to a previously published protocol20,37 with minor modifications. For each sample, 1 μg genomic DNA was digested overnight using 40 units of MspI (New England Biolabs). The digested DNA was end-repaired and adenylated in a 50 μl reaction consisting of 10U of exo-Klenow fragments (Enzymatics), and 2 μl each of dGTP (1 mM), dATP (10 mM), and methylated dCTP (1 mM). The reaction was incubated at 30°C for 30 min followed by 37°C for an additional 30 min. The methylated Illumina adapters were ligated to the adenylated DNA fragments in a 20 μl reaction containing 2 μl concentrated T4 ligase (Enzymatics) at room temperature for 15 min. The ligation products were gel-selected for fragments with insertion sizes of 40 to 120 bp and 120 to 220 bp as previously suggested.20,37 Bisulfite treatment was conducted using the EZ DNA methylation kit (Zymo Research) according to the manufacturer’s protocol. The final libraries were generated using 5 μl bisulfite-converted template in a 14-cycle PCR amplification using the PfuTurbo Cx Hotstart polymerase (Agilent Technologies). The libraries were sequenced using an Illumina Genome Analyzer IIx (Illumina) with a read length of 52 or 76 bp.
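As a rough computational counterpart to the digestion and size-selection steps, the sketch below performs an in-silico MspI digest of a sequence and keeps fragments in the 40 to 220 bp range. MspI recognizes CCGG and cuts after the first C. The function is purely illustrative and is not part of the published protocol: it ignores end repair, adapters and the fact that gel selection is only approximate, and its name and defaults are assumptions.

```python
def mspi_fragments(seq, min_len=40, max_len=220):
    """Return (start, end) of MspI fragments whose length is in range.

    Coordinates are 0-based and half-open; only fragments flanked by
    MspI sites on both sides are reported.
    """
    seq = seq.upper()
    cuts = []
    i = seq.find("CCGG")
    while i != -1:
        cuts.append(i + 1)  # MspI cuts C^CGG, i.e. after the first C
        i = seq.find("CCGG", i + 1)
    fragments = []
    for start, end in zip(cuts, cuts[1:]):
        if min_len <= end - start <= max_len:
            fragments.append((start, end))
    return fragments
```

Because every retained fragment begins and ends at a CCGG site, each read starts at a CpG, which is why this size range enriches for CpG-dense regions such as CGIs.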
Total RNA was reverse transcribed in the presence of SuperScript II reverse transcriptase (Invitrogen). The cDNA was then analyzed using real time PCR performed using RT2 SYBR green PCR master mix containing the appropriate reagents (SA Biosciences) in a LightCycler480 instrument (Roche). The primer sequences are listed in Table S7.
Normal and CLL B-cells were centrifuged and the cell pellets were re-suspended in lysis buffer. Forty μg of the protein lysate were separated on a 10% SDS-PAGE gel. The separated proteins were transferred to nitrocellulose membranes before being sequentially blocked for 1 h in LI-COR blocking buffer and incubated with primary antibody and secondary antibody with four 5 min washes in between. Membranes were scanned, and bands were quantified using the Odyssey infrared imaging system (LI-COR). The antibodies used were: anti-NFATc1 (7A6) (SC-7294), anti-β-Actin (C4) (sc-47778) from Santa Cruz Biotechnology and Goat anti-Mouse IgG IRDye800CW (827–0836) from LI-COR.
Two regions in the promoter P2 and first intron of NFATc1 were analyzed using bisulfite clone sequencing. Primer sequences are listed in Table S7. The bisulfite conversion of genomic DNA was conducted as described above using 500ng of DNA for each sample. Amplified PCR products were sub-cloned using the TOPO TA Cloning Kit for sequencing (Invitrogen). Plasmid DNA of 16 insert-positive clones was isolated using the Qiagen Plasmid Miniprep kit and sequenced by Sanger sequencing.
The DMRs associated with FOXA2, SOX11, and NFATc1 were subjected to bisulfite pyrosequencing analysis. The bisulfite PCR and sequencing primers are listed in Table S7. The bisulfite treatment of genomic DNA was performed as described above. The pyrosequencing analysis was performed using PyroMark CpG assay reagents on a PyroMark Q24 instrument, according to manufacturer’s instructions (Qiagen). Program outputs were analyzed by the PyroMark Q24 software, and the percentage of methylated vs. unmethylated alleles was determined by calculating the ratio of relative peak heights.
The raw sequencing reads were cleaned using in-house scripts to trim sequencing adapters and low-quality bases (Q < 67 in Illumina 1.5) at the 3′ end and ambiguous bases at both ends. To map the sequencing reads from RRBS, we extracted sequenceable regions from the human genome (hg18) that were within 100 bp of the MspI sites at both ends. Two reference databases were built from these regions, one by converting all C’s to T’s and the other by converting all G’s to A’s. Bowtie was used to map the cleaned reads to each of the two reference databases after converting all C’s in the reads to T’s. For each read, an in-house script selected the best alignment among all candidate loci in the two reference databases, based on the number of mismatches after realigning the original read to the original reference sequence. The script also determined the methylation status of each cytosine residue by comparing the bisulfite-converted sequence to the reference sequence. Another in-house script piled up the reads for each cytosine in the reference genome and counted the numbers of reads that contained methylated and unmethylated cytosines, respectively. Finally, the methylation level of each CpG site was defined as the number of methylated reads divided by the total number of methylated and unmethylated reads covering that site. CpGs with < 5 reads were filtered out of further analyses. The correlation between the genome-wide CpG methylation of two samples was calculated as Pearson’s product-moment coefficient using an R script. The raw and analyzed sequencing data from this study have been submitted to NCBI Gene Expression Omnibus (http://ncbi.nlm.nih.gov/geo/) under accession number GSE32698.
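The calling and pile-up steps can be summarized in a few lines. This is a minimal sketch under stated assumptions, not the in-house scripts themselves: it handles only reads from the C-to-T converted (forward) strand, assumes the read is already aligned without gaps to a reference segment beginning at `ref_start`, and all function names are hypothetical.

```python
from collections import defaultdict


def c_to_t(seq):
    """In-silico bisulfite conversion used for one reference database."""
    return seq.upper().replace("C", "T")


def g_to_a(seq):
    """Conversion used for the second (opposite-strand) database."""
    return seq.upper().replace("G", "A")


def call_cpgs(read, ref, ref_start):
    """Yield (position, is_methylated) for CpG cytosines covered by a read.

    A C in the read at a reference CpG means the cytosine was protected
    from conversion (methylated); a T means it was converted (unmethylated).
    """
    for i, base in enumerate(read[:len(ref)]):
        if ref[i] == "C" and i + 1 < len(ref) and ref[i + 1] == "G":
            if base == "C":
                yield ref_start + i, True
            elif base == "T":
                yield ref_start + i, False


def methylation_levels(calls, min_reads=5):
    """Methylated / (methylated + unmethylated), dropping CpGs with < min_reads."""
    counts = defaultdict(lambda: [0, 0])  # position -> [methylated, unmethylated]
    for pos, is_methylated in calls:
        counts[pos][0 if is_methylated else 1] += 1
    return {pos: m / (m + u) for pos, (m, u) in counts.items() if m + u >= min_reads}
```

For example, three reads carrying a C and two reads carrying a T at the same CpG give a methylation level of 0.6, while a CpG covered by only four reads is dropped by the coverage filter.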
We used 200 bp non-overlapping windows to identify differentially methylated regions (DMRs). Windows containing fewer than 5 CpGs were filtered out of further analysis. For each of the remaining windows (≥ 5 CpGs), the number of methylated and unmethylated CpG observations was determined by summing the numbers of methylated and unmethylated CpGs in all reads that were mapped within the window, and a p value was assigned using Fisher’s exact test. Once all p values were calculated, multiple-testing correction was performed separately for each window using the FDR q-value developed by Benjamini and Hochberg.38 The DMRs between two samples (e.g., MBC vs. NBC) were identified with an FDR q value < 0.01 and a methylation difference > 0.25. To obtain the DMRs between two groups, i.e., CLL samples vs. normal B-cells, we first identified the DMRs between one CLL sample and each of the three normal B-cell samples by applying Fisher’s exact tests using the same stringent cutoff values in a pairwise fashion. Next, we selected only the common DMRs that were consistently identified in each CLL sample compared with all three normal B-cell samples. Finally, the DMRs discovered in each CLL sample when compared with the normal B-cell samples were merged to make up the DMRs between two groups.
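The window-level test and the group-level merging can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the window dictionaries, the function names and the choice to apply the Benjamini-Hochberg adjustment across all windows of one pairwise comparison are assumptions, and the direction of change (hyper- vs. hypomethylation) is not tracked when DMRs are merged.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted q values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def call_dmrs(windows, q_cutoff=0.01, min_diff=0.25, min_cpgs=5):
    """windows: dicts with 'id', 'n_cpgs', and pooled (meth, unmeth) counts 'a', 'b'."""
    kept = [w for w in windows if w["n_cpgs"] >= min_cpgs]
    qvalues = benjamini_hochberg([fisher_exact_two_sided(*w["a"], *w["b"]) for w in kept])
    dmrs = []
    for w, q in zip(kept, qvalues):
        (ma, ua), (mb, ub) = w["a"], w["b"]
        diff = ma / (ma + ua) - mb / (mb + ub)
        if q < q_cutoff and abs(diff) > min_diff:
            dmrs.append((w["id"], diff, q))
    return dmrs


def group_dmrs(pairwise):
    """pairwise: {cll_sample: [set of DMR window ids vs. each normal sample]}.

    A window counts for a CLL sample only if it is a DMR against every
    normal sample; the per-sample sets are then merged.
    """
    merged = set()
    for per_normal in pairwise.values():
        merged |= set.intersection(*per_normal)
    return merged
```

Requiring a window to differ from all three normal samples before merging is what makes the group-level list conservative: a difference that reflects variation among the normal B-cell subsets rather than CLL is unlikely to pass all three comparisons.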
Student's t-test analysis was used to identify differentially methylated genes with statistical significance between groups (i.e., CLL versus normal B-cells; IGHV-mutated versus unmutated CLL samples). The average methylation value of each window was used to perform the statistical analysis in R. The p-value was adjusted using FDR based on the method described above.38 An FDR q value < 0.05 and a methylation difference > 0.25 were the cutoff values used to identify the statistically significant differentially methylated genes.
We thank Drs. Nicholas Chiorazzi, Kanti R. Rai and Steven L. Allen for providing the CLL samples. We also thank Dr. Judith Giri and Ms. Sameera Qureshi of the GHSU tumor bank for helping collect the CLL samples. We are grateful to Dr. Chiorazzi for sharing the unpublished microarray data with us. This work was supported in part by the National Institutes of Health (Grants CA134304 and DA025779 to H.S.). H.S. is a Georgia Cancer Coalition Distinguished Cancer Scientist. J.C. and H.T. were supported by a National Research Foundation of Korea Grant funded by the Korean Government (NRF-2009–352-D00275).
The authors declare no competing financial interests.
Previously published online: www.landesbioscience.com/journals/epigenetics/article/20237