Methods for cell isolation, culture, drug treatment, MeDIP, and methylation and expression profiling were previously reported [14]. Briefly, methylated DNA was enriched using methylated DNA immunoprecipitation (MeDIP) and hybridized to genomic promoter tiling arrays (NimbleGen C426-00-01) containing 390,000 probes. Standard normalization methods for two-channel arrays were applied, and relative methylation levels were determined using the MEDME Bioconductor package [24].
Approximately 20-30 million cells were used for each mRNA extraction. Cells were treated with low-dose (200 nM) decitabine for two days, followed by one day of recovery before total RNA extraction. Gene expression data were derived from NimbleGen human whole-genome expression microarrays (array 2005_04-20_Human_60 mer_1in2) containing 380,000 probes, with an average of 11 probes per RefSeq located throughout the gene. Probe measurements were then averaged for each RefSeq. The same chip was hybridized with differentially labeled, polyA-selected cDNA from decitabine-treated and untreated cells, and experiments were repeated with dye swapping. Data were captured and processed by NimbleGen Systems Iceland LLC. Within-array normalization was performed with loess-based methods to correct for biases due to labeling with different dyes on the two microarray channels and for spatial artifacts. M and A values were then determined, where M describes the amount of differential expression (M = log2(Cy5/Cy3)) and A relates M to the magnitude of overall expression (A = (log2 Cy5 + log2 Cy3)/2). Between-array normalization was performed via quantile normalization. mRNA RefSeqs were mapped to the genome, and those with < 96% sequence identity, as well as those that mapped to more than two genomic loci, were discarded. Analysis of the data revealed upregulation of 292 genes common across the cell lines after decitabine treatment, and the demethylating effect of treatment was validated in selected strongly upregulated genes, such as CDKN1A and TGFBI.
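The two-channel quantities described above can be made concrete with a short sketch. It computes M and A for one probe, averages probe-level values per RefSeq, and applies a basic between-array quantile normalization. The loess within-array step is not shown, and the function names and the (refseq, value) input layout are hypothetical; this is an illustration, not the NimbleGen processing pipeline.

```python
import math
from statistics import mean


def ma_values(cy5, cy3):
    """M = log2(Cy5/Cy3) (differential expression); A = (log2 Cy5 + log2 Cy3)/2 (overall intensity)."""
    m = math.log2(cy5 / cy3)
    a = (math.log2(cy5) + math.log2(cy3)) / 2
    return m, a


def average_by_refseq(probe_values):
    """Average probe-level values per RefSeq; input is (refseq_id, value) pairs (hypothetical layout)."""
    grouped = {}
    for refseq, value in probe_values:
        grouped.setdefault(refseq, []).append(value)
    return {refseq: mean(values) for refseq, values in grouped.items()}


def quantile_normalize(arrays):
    """Give equal-length arrays a common distribution: each value becomes the
    cross-array mean of the values holding the same rank (ties broken by position)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(s[r] for s in sorted_arrays) for r in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized
```

For example, `ma_values(400.0, 100.0)` returns M = 2.0 (a four-fold Cy5/Cy3 ratio) and A = log2(200).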
For our experiments on decitabine-induced gene upregulation, we pooled information on gene promoter methylation level, CpG density, as well as differential gene expression from 7 cell strains. Data pooling yielded measurements for 22,824 promoters per cell strain, for a total of 159,768 data triplets (methylation, CpG density and differential gene expression).
For each promoter, the sequence from 2,200 base pairs upstream to 500 base pairs downstream of the transcription start site (TSS) was analyzed for CpG content, and promoters were divided into five equal categories. Methylation levels for these promoters were previously reported for each of six bins spanning the same 2,700 base pairs around the TSS [14]. To obtain a single methylation measurement for each promoter, we summed these six values and divided the results into four categories, each containing roughly the same number of promoters (~40,000). Basal and decitabine-induced expression values represent the mean of two replicate experiments for each promoter. Promoters showing at least a two-fold increase in expression (post-treatment/pre-treatment expression ≥ 2) together with an absolute increase of at least 5,000 units were labeled as up-regulated.
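A minimal sketch of the promoter-level bookkeeping described above, assuming that CpG content is measured as a simple CpG dinucleotide count and that "equal categories" means bins holding equal numbers of promoters. The helper names are hypothetical; the six-bin sum and the thresholds (two-fold, 5,000 units) come from the text.

```python
def cpg_count(sequence):
    """Count CpG dinucleotides (case-insensitive); assumed measure of CpG content."""
    s = sequence.upper()
    return sum(1 for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G")


def equal_count_bins(values, n_bins):
    """Assign bin indices 0..n_bins-1 so that bins hold ~equal numbers of items.

    Ranks are cut into contiguous blocks; tied values may land in different bins.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // n
    return bins


def promoter_methylation(six_bin_values):
    """Single methylation score per promoter: the sum of its six TSS-flanking bin values."""
    return sum(six_bin_values)


def is_upregulated(pre, post, min_fold=2.0, min_increase=5000.0):
    """Up-regulated if post/pre >= 2 and post - pre >= 5,000 units (pre assumed positive)."""
    return pre > 0 and post / pre >= min_fold and post - pre >= min_increase
```

With these helpers, `equal_count_bins([promoter_methylation(b) for b in bins_per_promoter], 4)` would give the four methylation categories, where `bins_per_promoter` is a hypothetical list of six-value lists.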
For each combination of CpG content and methylation level, the non-parametric Wilcoxon signed-rank test was used to compare post-decitabine with pre-decitabine expression levels, using the wilcox.test function from the R stats package.
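For readers working outside R, the paired test can be sketched in Python. This mirrors only the large-sample (normal approximation) branch of R's `wilcox.test(post, pre, paired = TRUE)`, with tie and continuity corrections; R switches to an exact test when there are fewer than 50 non-zero differences and no ties, which this sketch does not implement.

```python
import math


def wilcoxon_signed_rank(post, pre):
    """Paired, two-sided Wilcoxon signed-rank test via the normal approximation.

    Zero differences are dropped, tied |d| receive average ranks, the variance is
    tie-corrected, and a 0.5 continuity correction is applied.
    Returns (V, p_value), where V is the sum of ranks of positive differences.
    """
    diffs = [a - b for a, b in zip(post, pre) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")
    # Rank |d| from smallest to largest, averaging ranks within tie groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_sizes = []
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    v = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_v = n * (n + 1) / 4
    var_v = n * (n + 1) * (2 * n + 1) / 24 - sum(t**3 - t for t in tie_sizes) / 48
    dev = v - mean_v
    correction = 0.5 if dev > 0 else -0.5 if dev < 0 else 0.0
    z = (dev - correction) / math.sqrt(var_v)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return v, min(1.0, p)
```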
For each combination of CpG content and methylation level, the significance of the difference between the observed number of up-regulated genes and the number expected by chance alone (the total number in the bin multiplied by the fraction of all genes that are up-regulated) was calculated from the hypergeometric distribution using the dhyper function from the R stats package, with the following arguments: the number of up-regulated promoters in the CpG/methylation bin, the total number of up-regulated promoters, the total number of promoters not up-regulated, and the number of promoters in the CpG/methylation bin.
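The hypergeometric calculation can be sketched as follows, using R's `dhyper(x, m, n, k)` argument order, which matches the order of arguments listed above. Log-gamma terms keep the computation numerically stable for bins drawn from ~160,000 promoters. The `upper_tail` helper is an illustrative addition (the text names only `dhyper`, which returns a point probability).

```python
import math


def log_comb(a, b):
    """log of the binomial coefficient C(a, b) via log-gamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def dhyper(x, m, n, k):
    """P(X = x): x up-regulated in a bin of k promoters, given m up-regulated
    and n not up-regulated promoters overall (same order as R's dhyper)."""
    if x < 0 or x > k or x > m or k - x > n:
        return 0.0
    return math.exp(log_comb(m, x) + log_comb(n, k - x) - log_comb(m + n, k))


def expected_upregulated(m, n, k):
    """Count expected by chance: bin size times the overall fraction up-regulated."""
    return k * m / (m + n)


def upper_tail(x, m, n, k):
    """P(X >= x): dhyper summed from x to min(k, m), i.e. the enrichment tail."""
    return sum(dhyper(i, m, n, k) for i in range(x, min(k, m) + 1))
```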
We used the GSEA program (http://www.broadinstitute.org/gsea/) to analyze our expression data for enriched Gene Ontology and motif gene sets. We uploaded the GCT and CLS files corresponding to our data from the YUMAC cell strain, set the Metric for ranking genes to Ratio_of_Classes, and set the Permutation type. All other parameters were left at their default values.
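As a hedged illustration of the GSEA inputs, the sketch below writes minimal GCT (v1.2) and categorical CLS files and ranks genes by the ratio of class means, which is how GSEA describes its Ratio_of_Classes metric. The class names, gene names, and the "na" description column are hypothetical placeholders; the GSEA program itself performs the ranking and the permutation testing.

```python
from statistics import mean


def to_gct(expression, sample_names):
    """Render a GCT v1.2 file; expression maps gene name -> one value per sample."""
    lines = ["#1.2", f"{len(expression)}\t{len(sample_names)}",
             "\t".join(["NAME", "Description"] + sample_names)]
    for gene, values in expression.items():
        lines.append("\t".join([gene, "na"] + [str(v) for v in values]))
    return "\n".join(lines) + "\n"


def to_cls(labels, class_names):
    """Render a categorical CLS file; labels holds one class name per sample."""
    lines = [f"{len(labels)} {len(class_names)} 1",
             "# " + " ".join(class_names),
             " ".join(labels)]
    return "\n".join(lines) + "\n"


def ratio_of_classes(values, labels, class_a, class_b):
    """Ratio of class means: mean(class_a samples) / mean(class_b samples)."""
    a = mean(v for v, lab in zip(values, labels) if lab == class_a)
    b = mean(v for v, lab in zip(values, labels) if lab == class_b)
    return a / b


def rank_genes(expression, labels, class_a, class_b):
    """Gene names ordered from highest to lowest class_a/class_b ratio."""
    scores = {g: ratio_of_classes(v, labels, class_a, class_b)
              for g, v in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)
```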
Expression response data following decitabine treatment of the MCF-7 breast cancer cell line were downloaded from the Broad Institute Connectivity Map [22]. Methylation data for the MCF-7 cell line were downloaded from the supplementary material of a study by Li et al., which used a modified methylation-specific digital karyotyping approach for genome-wide methylation profiling of two breast cancer cell lines [23]. Methylation levels were given as the number of sequencing reads per fragment. Using the 90th percentiles of the MCF-7 and melanoma datasets, the MCF-7 methylation levels were scaled so that the values of the two datasets occupied an equivalent range. MCF-7 promoters were then categorized as having low (0-1), intermediate (1-6), or high (>6) levels of methylation.
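The rescaling and categorization can be sketched as below. The sketch assumes R's default (type 7) quantile definition and assigns the boundary values 1 and 6 to the lower category, because the ranges given (0-1, 1-6, >6) leave the boundaries ambiguous; both choices are assumptions rather than the published procedure.

```python
import math


def quantile_type7(values, p):
    """Sample quantile with linear interpolation between order statistics (R type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def scale_to_reference(values, reference, p=0.9):
    """Rescale values so that their p-quantile matches the reference's p-quantile."""
    factor = quantile_type7(reference, p) / quantile_type7(values, p)
    return [v * factor for v in values]


def methylation_category(level):
    """low (<= 1), intermediate (> 1 and <= 6), high (> 6); boundary handling assumed."""
    if level <= 1:
        return "low"
    if level <= 6:
        return "intermediate"
    return "high"
```

Here `values` would be the MCF-7 read counts and `reference` the melanoma methylation values.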
A computational model of decitabine response was built using a generalized linear model for logistic regression. This was implemented using the glm function from the R stats package with the following arguments: formula = upregulated ~ promoter methylation + promoter CpG content; family = gaussian; method = glm.fit (iteratively reweighted least squares). Briefly, data from the YUMAC cell line were filtered for genes with pre-treatment expression levels below 700 units (approximately 40% of the data). For each promoter bin, and for the region as a whole, the model was then trained and tested using ten-fold cross-validation; receiver operating characteristic (ROC) curves were generated and areas under the curve (AUCs) calculated using the ROCR package [25]. The class labels were then permuted 500 times, the model was trained and tested for each permutation, and the mean AUC was calculated.
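A self-contained sketch of this evaluation loop follows. It fits a binomial (logit-link) model, which is what logistic regression implies, using plain gradient descent in place of R's IWLS; it pools out-of-fold predictions into one AUC per cross-validation run (ROCR can also average per-fold curves); and it estimates the permutation baseline as the mean AUC over shuffled labels. Function names, the learning rate, and the epoch count are hypothetical.

```python
import math
import random


def low_basal(pre_expression, threshold=700.0):
    """Indices of genes whose pre-treatment expression is below the threshold."""
    return [i for i, e in enumerate(pre_expression) if e < threshold]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(w, x):
    """Intercept w[0] plus weighted features (e.g. methylation, CpG content)."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Logit-link model fit by batch gradient descent on the mean log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, label in zip(X, y):
            err = sigmoid(linear(w, x)) - label
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def auc(scores, labels):
    """ROC AUC: probability a positive outscores a negative (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cv_auc(X, y, k=10, seed=0, epochs=500):
    """k-fold CV: every item is scored by a model not trained on it; one pooled AUC."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = [0.0] * len(X)
    for fold in range(k):
        test = idx[fold::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train], epochs=epochs)
        for i in test:
            scores[i] = sigmoid(linear(w, X[i]))
    return auc(scores, y)


def permuted_mean_auc(X, y, n_perm=500, k=10, seed=0, epochs=500):
    """Null baseline: shuffle the class labels, rerun the CV, average the AUCs."""
    rng = random.Random(seed)
    total = 0.0
    for p in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        total += cv_auc(X, y_perm, k, seed=p, epochs=epochs)
    return total / n_perm
```

Replacing `sigmoid(...)` with the raw linear predictor in `fit_logistic` and `cv_auc` turns the fit into the least-squares (Gaussian, identity-link) model that the listed `family = gaussian` argument would produce.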
Gene expression and promoter methylation data have been uploaded to GEO (Accession: GSE13706) and ArrayExpress (Accession: E-MTAB-185).