Ovarian cancer is a prevalent malignancy in women [1] and is associated with a high mortality rate because it is usually diagnosed at an advanced stage [2]. A standard treatment for advanced ovarian cancer is surgical resection followed by cycles of adjuvant chemotherapy, typically a combination of a taxane-based regimen and a platinum-based cytotoxic agent [3]. The combination of paclitaxel and carboplatin is one of the most common first-line treatments for ovarian cancer [4]. The mechanism of action (MOA) of paclitaxel is to stabilize microtubules, thereby inducing mitotic arrest and apoptosis [6]; the MOA of carboplatin is to bind to DNA and form intra-strand crosslinks that inhibit DNA replication and transcription and eventually activate p53-dependent apoptosis [7]. In most patients, the initial response to the combination of paclitaxel and carboplatin is good; however, relapses frequently follow [8]. Unraveling the mechanisms underlying chemoresistance is therefore crucial for personalized therapy and for improving patients' long-term survival.
Microarrays have been used to study genes and molecular functions associated with chemoresistance. For example, Jazaeri et al. (2005) used cDNA microarrays to detect differentially expressed genes among primary chemosensitive, primary chemoresistant, and postchemotherapy tumors [9]. Hartmann et al. (2005) applied a supervised learning algorithm and selected 14 genes to predict relapse in ovarian cancer patients after platinum-paclitaxel chemotherapy [10]. Etemadmoghadam et al. (2009) further considered chromosomal aberrations and proposed that DNA copy number alterations (CNAs) at genes such as CCNE1 and NCOA3 are associated with chemoresistance [11]. While many studies have proposed genes or pathways associated with chemotherapeutic response, most of them were limited by small patient numbers and limited patient diversity, and to some extent by other confounding factors, particularly when samples were derived from patients with different treatment plans. Because factors such as tumor stage, subtype, and chemotherapy regimen can change clinical outcome significantly, reliable results are difficult to achieve unless these confounding effects are adequately addressed during statistical analysis.
In this regard, data from The Cancer Genome Atlas (TCGA) must be carefully assessed for eligibility before being used in a chemotherapy study. Recently, the TCGA Research Network completed an ovarian cancer study comprising thousands of microarray profiles, including mRNA expression, DNA copy number, miRNA, SNP, and CpG methylation data, from more than 500 ovarian tumor samples [12]. Although such a large number of samples provides ample opportunity for sophisticated survival analysis, caution is needed: patient age, tumor stage, and number of treatment cycles may confound the survival outcome, while the variety of therapeutic compounds, their combinations, and sample processing batches may mask true associations if not properly handled. For example, treatments among the more than 500 patients include avastin, bevacizumab, carboplatin, cisplatin, cytoxan, docetaxel, doxorubicin, etoposide, gemcitabine, navelbine, paclitaxel, and others. In addition, these samples were processed in 13 batches.
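To illustrate why processing batches need handling, the sketch below removes a per-batch location shift for a single gene by centering each batch on its own mean and adding back the global mean. This is only a minimal stand-in under stated assumptions: it is not the batch correction used in this study (more complete analyses often use empirical Bayes methods such as ComBat), and the function name `center_by_batch` and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def center_by_batch(values, batches):
    """Remove per-batch mean shifts for one gene (hypothetical helper).

    values  : expression values of one gene across samples
    batches : batch label for each sample, same length as values
    Returns values with every batch shifted onto the global mean.
    """
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")
    grand = mean(values)  # global mean of this gene over all samples
    groups = defaultdict(list)
    for v, b in zip(values, batches):
        groups[b].append(v)
    batch_means = {b: mean(vs) for b, vs in groups.items()}
    # Subtract each sample's batch mean, then restore the global level.
    return [v - batch_means[b] + grand for v, b in zip(values, batches)]


# Toy example: batch "B" is shifted up by 2 units relative to batch "A".
expr = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]
batch = ["A", "A", "A", "B", "B", "B"]
corrected = center_by_batch(expr, batch)
print([round(x, 2) for x in corrected])  # → [2.0, 2.2, 1.8, 2.0, 2.2, 1.8]
```

Note that simple centering also removes any true biological difference that happens to be confounded with batch, which is one reason batch correction must be considered together with sample selection.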
Herein, we propose a procedure for reducing these confounding and suppression effects, in which factors such as experimental batch, clinical treatment, patient age, tumor stage, and molecular classification are carefully considered and addressed. Beginning with batch effect correction, we chose eligible samples through a rigorous sample selection process. In this paper, we focus only on patients treated with paclitaxel and carboplatin, in order to remove confounding due to differences in drug or treatment combinations when examining survival outcome and, at the same time, to maximize the ability to discriminate tumor subtypes. After selection, 85 ovarian cancer samples treated only with the combination of paclitaxel and carboplatin were used for training, and an independent set of 83 samples treated mainly with paclitaxel and carboplatin, but also with some other drugs, was used for testing. Gene expression, copy number, and methylation data were then analyzed with a novel semi-supervised clustering method. Through a series of statistical hypothesis tests and clustering tasks, two molecular classes with poor progression-free survival (PFS) were identified. By comparing these classes with the other samples with good PFS, genes significantly associated with chemotherapeutic response were detected, and enriched biological processes were further examined using a gene ontology (GO) enrichment analysis method.
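The GO enrichment method used here is not specified in this section. A common choice is a one-sided hypergeometric test, which asks whether a selected gene list (for example, genes associated with chemotherapeutic response) contains more genes annotated to a GO term than expected by chance. The sketch below assumes that choice; the function name `hypergeom_enrichment_p` and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """Upper-tail hypergeometric p-value P(X >= k) (hypothetical helper).

    N : number of background genes (e.g. all measured genes)
    K : background genes annotated to the GO term
    n : genes in the selected list
    k : selected genes annotated to the GO term
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("inconsistent counts")
    total = comb(N, n)  # number of ways to draw n genes from N
    # Sum the probability mass of observing k or more annotated genes.
    # math.comb returns 0 when asked to choose more items than exist.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


# Toy example: 20 background genes, 5 annotated to a term,
# 4 selected genes, all 4 of them annotated.
print(f"{hypergeom_enrichment_p(20, 5, 4, 4):.6f}")  # → 0.001032
```

Because many GO terms are tested at once, the resulting p-values would normally be adjusted for multiple testing (for example, by controlling the false discovery rate) before terms are reported as enriched.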
In this paper, the proposed procedure and the semi-supervised clustering method are detailed, with flowcharts and mathematical explanations, in Methods. In Results, the clustering results and the resulting differences in chemotherapeutic response are compared using Kaplan-Meier curves. The analysis results are discussed, and conclusions drawn, in Discussion and conclusions.
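Kaplan-Meier curves estimate the probability of remaining progression-free over time while accounting for censored patients (those without observed relapse at last follow-up). The sketch below is a minimal, generic implementation of the Kaplan-Meier product-limit estimator, not the software used in this study; the function name `kaplan_meier` and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate (hypothetical helper).

    times  : follow-up time for each patient (e.g. months of PFS)
    events : 1 if progression was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)  # patients still under observation
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time; censored patients at
        # time t are still counted as at risk for events at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


# Toy cohort of five patients; 0 marks a censored observation.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print([(t, round(s, 3)) for t, s in curve])  # → [(2, 0.8), (3, 0.6), (5, 0.3)]
```

Comparing the curves of two groups, such as a poor-PFS class against the remaining samples, is commonly done with a log-rank test, which this sketch does not implement.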