The profile of DNA methylation was obtained for 460 peripheral-blood samples using the Human Methylation27 Beadarray. We used a semi-supervised strategy to identify profiles of DNA methylation associated with bladder cancer and to examine whether the identified profiles can predict case status in a series of blinded test samples (). Following quality assurance procedures, the data set was split into training and testing series. Characteristics of the cases and controls are shown in , and do not differ significantly between training and testing sets (Data Supplement).
Characteristics of the Participants Used in the Analysis
The first step of our semi-supervised strategy was to identify those CpG loci whose methylation state was most significantly associated with being a bladder cancer case rather than a control. To do this, we fit a linear mixed-effects model, using the training data only, for each of the 26,486 CpGs in the data set. Each model treated the methylation value as the dependent variable, with a fixed effect for case-control status and a random effect for plate (to allow for inter-plate normalization, based on a single normalization sample run on all plates). CpG loci were ranked by the absolute value of the t statistic derived from the model, and the top nine loci were chosen on the basis of a nested cross-validation procedure () for inclusion in the recursively partitioned mixture model (RPMM), which clustered the samples on the basis of the methylation profile of these nine loci in the training data. To predict class membership in the testing data using only the methylation status of these nine loci, the latent class structure from the RPMM solution fit to the training data was used in conjunction with an empirical Bayes procedure. The methylation profile of these nine loci in the testing data is depicted in Fig 2A, which also shows the mean methylation across loci within a given class and the relationships among the classes through the dendrogram. The right branch classes (those beginning with the letter R) had overall mean methylation that was significantly greater than that of the left branch classes (P < .0001). The distribution of the methylation values for each of the nine loci, across classes, is depicted in the Data Supplement.
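The locus-ranking step can be illustrated with a short Python sketch. It is a simplification and not the implementation used in the analysis: it substitutes Welch's t statistic for the t statistic from the linear mixed-effects model, so the random plate effect is ignored. The function names (`welch_t`, `top_loci`), the locus labels, and the toy beta values are all hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(cases, controls):
    """Welch's t statistic for the difference in mean methylation (cases minus controls).

    Assumption: this stands in for the mixed-model t statistic and ignores plate effects.
    Requires at least two values per group and nonzero variance in at least one group.
    """
    se = math.sqrt(variance(cases) / len(cases) + variance(controls) / len(controls))
    return (mean(cases) - mean(controls)) / se


def top_loci(beta, is_case, k):
    """Rank loci by |t| and return the names of the k highest-ranked loci.

    beta:    dict mapping locus name -> list of beta values, one per sample
    is_case: list of booleans in the same sample order as each list in beta
    """
    scores = {}
    for locus, values in beta.items():
        cases = [v for v, c in zip(values, is_case) if c]
        controls = [v for v, c in zip(values, is_case) if not c]
        scores[locus] = abs(welch_t(cases, controls))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy training data (hypothetical values, not taken from the study)
is_case = [True, True, True, False, False, False]
beta = {
    "cg_a": [0.80, 0.82, 0.78, 0.40, 0.42, 0.38],  # large case-control difference
    "cg_b": [0.50, 0.55, 0.45, 0.48, 0.52, 0.50],  # essentially no difference
    "cg_c": [0.30, 0.35, 0.32, 0.25, 0.27, 0.24],  # moderate difference
}
print(top_loci(beta, is_case, k=2))
```

In the analysis itself, the number of loci retained (nine) was chosen by nested cross-validation rather than fixed in advance as `k` is here.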
Fig 2. DNA methylation profiles defined by a panel of nine loci are significantly associated with bladder cancer. (A) The recursively partitioned mixture model–based classification of methylation of nine loci (columns) in the peripheral blood–derived ...
In the test set, class membership was significantly associated with case-control status (P < .0001, permutation-based χ² test; Fig 2B), with the right branch classes (those beginning with R) containing a higher proportion of bladder cancer cases, relative to controls, than the left branch classes. The methylation beta values for cases compared with controls for each of the loci in the testing set are shown in the Data Supplement. Each of the nine CpG loci used in the classifier had greater methylation among cases than controls. We assessed performance of the classifier by using receiver operating characteristic curves and calculating the area under the curve (AUC). Using methylation class alone, the AUC was 0.70 (95% bootstrap CI, 0.63 to 0.77). After adjustment for participant age, sex, smoking status (never, former, current), and family history of bladder cancer, the AUC increased to 0.76 (95% bootstrap CI, 0.70 to 0.82; Fig 3A and B). To determine whether the association between methylation profiles and bladder cancer is sensitive to the statistical methodology used, we repeated the analysis using a LASSO approach with the same training and testing data sets. The methods and results of these analyses are described in the Data Supplement and suggest that our identification of bladder cancer–associated methylation classes is robust to the statistical method used.
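As an illustration of how an AUC and a bootstrap CI of this kind can be computed, the following is a minimal Python sketch. It assumes a percentile bootstrap over resampled participants, which may differ from the bootstrap variant actually used, and it covers only the unadjusted case (a single risk score per participant). The function names, the ordinal scores, and the labels are hypothetical.

```python
import random


def auc(scores, labels):
    """Probability that a randomly chosen case scores higher than a randomly
    chosen control, counting ties as one half (the Mann-Whitney form of the AUC)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def bootstrap_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed variant, see lead-in)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack cases or controls
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical data: methylation class coded as an ordinal risk score
# (higher = further toward the right branch); 1 = case, 0 = control.
scores = [1, 2, 2, 3, 3, 4, 4, 5]
labels = [0, 0, 1, 0, 1, 1, 0, 1]
print(auc(scores, labels))
print(bootstrap_ci(scores, labels))
```

An adjusted AUC such as the 0.76 reported above would instead use predicted probabilities from a model that includes the covariates as the score.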
Fig 3. Receiver operating characteristic (ROC) curve analysis of methylation profiles. (A) ROC curve based on methylation class only results in a significant area under the curve (AUC) of 0.70 (95% CI, 0.63 to 0.77). (B) ROC curve including methylation classes, ...
Unconditional logistic regression was used to calculate the magnitude of the association between methylation class and bladder cancer, controlling for potential confounders. The odds ratios (ORs) and 95% CIs resulting from each of the pairwise comparisons between the seven predicted classes are shown in the Data Supplement. There was a trend of increasing risk of disease moving from the left to the right branch of the classification, with the highest risk for members of class RR compared with LLL (OR = 8.7; 95% CI, 1.5 to 55.2). Comparing all the right branch classes with all the left branch classes, the OR for bladder cancer was 5.2 (95% CI, 2.8 to 9.7), after controlling for participant age, sex, smoking status, and family history of bladder cancer. There was no difference in the prevalence of invasive disease across the predicted classes (data not shown).
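For readers less familiar with odds ratios, the sketch below shows how a crude (unadjusted) OR and a Wald 95% CI are obtained from a 2×2 table. It is not the adjusted logistic regression used here, so it would not reproduce the ORs reported above. The function name and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Wald confidence interval from a 2x2 table.

    a: cases in the right branch      b: controls in the right branch
    c: cases in the left branch       d: controls in the left branch
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) (Woolf's method)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(odds_ratio) - z * se)
    hi = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lo, hi


# Hypothetical counts, not taken from the study
print(odds_ratio_ci(20, 10, 10, 20))
```

A wide interval, such as the 1.5 to 55.2 reported for RR versus LLL, typically reflects small counts in at least one cell, because each cell contributes 1/count to the variance of log(OR).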
Because previous work has suggested that aging is associated with epigenetic states in peripheral blood and can be related to the alterations associated with cancer, we examined whether there was any overlap in the biologic pathways affected by differential DNA methylation associated with age or case status. We performed a gene set enrichment analysis (GSEA) based on KEGG-defined pathways using the combined training and testing data and compared pathways over-represented among loci associated with participant age (in controls) with those associated with disease. Pathways with a nominal P < .05 based on the GSEA enrichment statistic are provided in , grouped by function. No overlapping pathways based on age- and disease-associated loci were identified. However, similar functional groupings of pathways were identified in both age-associated and bladder cancer–associated loci and are detailed in . Genetic information processing pathways were identified exclusively among loci associated with bladder cancer.
Fig 4. Diagram of the gene-set enrichment analysis on DNA methylation data. The upper panel depicts the transcription factor binding sites (TFBS) within 1 kb of differentially methylated loci associated with aging, bladder cancer, and their overlap grouped by ...
In addition to examining the functional consequences of differential methylation in peripheral blood between cases and controls, we hypothesized that differential methylation profiles may represent a response of the hematopoietic system to a developing tumor (ie, the methylation profiles capture the downstream effects of this response, which may be through differential binding of transcription factors near sites of altered methylation). The top half of Fig 4 shows the results of this GSEA-based analysis: binding sites of transcription factors over-represented within 1 kb of loci whose DNA methylation was related to age, bladder cancer status, or both, grouped by similar structure or functional response. Binding sites for a forkhead-containing transcription factor and a transcription factor involved in immune modulation (GATA1) overlapped between loci associated with age and disease status. Loci with differential methylation strongly associated with age were near binding sites of a large number of transcription factors related to developmental processes, including homeobox-containing transcription factors, as well as factors involved in immune modulation and stress response. Oncogenic transcription factor binding sites, as well as immune modulation and development-related transcription factor binding sites, were exclusively over-represented near loci whose methylation was associated with bladder cancer.