Modeling of cancer hazards at age t deals with a dichotomous population, a small part of which (the fraction at risk) will get cancer, while the other part will not. Therefore, we conditioned the hazard function, h(t), the probability density function (pdf), f(t), and the survival function, S(t), on frailty α in individuals. Assuming α has the Bernoulli distribution, we obtained equations relating the unconditional (population level) hazard function, hU(t), cumulative hazard function, HU(t), and overall cumulative hazard, H0, with the h(t), f(t), and S(t) for individuals from the fraction at risk. Computing procedures for estimating h(t), f(t), and S(t) were developed and used to fit the pancreatic cancer data collected by SEER9 registries from 1975 through 2004 with the Weibull pdf suggested by the Armitage-Doll model. The parameters of the resulting excellent fit suggest that the age of pancreatic cancer presentation is shifted in time by about 17 years and that five mutations are needed for pancreatic cells to become malignant.
cancer incidence; cancer hazard; frailty; Weibull distribution; pancreatic cancer
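The relations between individual-level and population-level quantities described in the abstract above can be sketched with the standard mixture identities for a Bernoulli frailty. This is a minimal sketch, not the authors' computing procedure: it assumes a fraction at risk `p`, a Weibull distribution shifted by `shift` years, and the identities S_U(t) = 1 − p + p·S(t), h_U(t) = p·f(t)/S_U(t), H_U(t) = −ln S_U(t), and H0 = −ln(1 − p). All function names and parameter values are hypothetical.

```python
import math

def weibull_survival(t, shape, scale, shift):
    """S(t) for a Weibull distribution shifted right by `shift` years."""
    if t <= shift:
        return 1.0
    return math.exp(-(((t - shift) / scale) ** shape))

def weibull_pdf(t, shape, scale, shift):
    """f(t) for the same shifted Weibull distribution."""
    if t <= shift:
        return 0.0
    z = (t - shift) / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def population_hazard(t, p, shape, scale, shift):
    """h_U(t) = p f(t) / (1 - p + p S(t)) under a Bernoulli frailty."""
    s = weibull_survival(t, shape, scale, shift)
    f = weibull_pdf(t, shape, scale, shift)
    return p * f / (1.0 - p + p * s)

def population_cumulative_hazard(t, p, shape, scale, shift):
    """H_U(t) = -ln(1 - p + p S(t))."""
    s = weibull_survival(t, shape, scale, shift)
    return -math.log(1.0 - p + p * s)

def overall_cumulative_hazard(p):
    """H0, the limit of H_U(t) as t grows: -ln(1 - p)."""
    return -math.log(1.0 - p)

# Hypothetical illustration values (not estimates from the SEER9 fit).
P_AT_RISK, SHAPE, SCALE, SHIFT = 0.015, 5.0, 60.0, 17.0
for age in (40, 60, 80):
    print(age, population_hazard(age, P_AT_RISK, SHAPE, SCALE, SHIFT))
```

Because only the fraction at risk can ever develop the disease, the population cumulative hazard levels off at H0 instead of growing without bound, which is the qualitative behavior the frailty model is meant to capture.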
We present a novel machine learning approach for the classification of cancer samples using expression data. We refer to the method as “decision trunks,” since it is loosely based on decision trees, but contains several modifications designed to achieve an algorithm that: (1) produces smaller and more easily interpretable classifiers than decision trees; (2) is more robust in varying application scenarios; and (3) achieves higher classification accuracy. The decision trunk algorithm has been implemented and tested on 26 classification tasks, covering a wide range of cancer forms, experimental methods, and classification scenarios. This comprehensive evaluation indicates that the proposed algorithm performs at least as well as the current state of the art algorithms in terms of accuracy, while producing classifiers that include on average only 2–3 markers. We suggest that the resulting decision trunks have clear advantages over other classifiers due to their transparency, interpretability, and their correspondence with human decision-making and clinical testing practices.
classification; machine learning; gene expression; biomarkers
The aim of this study was to perform comparative analysis of multiple public datasets of gene expression in order to identify common genes as potential prognostic biomarkers. Additionally, the study sought to identify biological processes and pathways that are most significantly associated with early distant metastases (<5 years) in women with estrogen receptor-positive (ER+) breast tumors. Datasets from three published studies were selected for in silico analysis of gene expression profiles of ER+ breast cancer, using time to distant metastasis as the clinical endpoint. A subset of 44 differently expressed genes (DEGs) was found common to all three studies and characterized by mitotic checkpoint genes and pathways that regulate mitotic spindle and chromosome dynamics. DEG promoter regions were enriched with NFY binding sites. Analysis of miRNA target sites identified significant enrichment of miR-192, miR-193B, and miR-16-1 targets. Aberrant mitotic regulation could drive increased genomic instability leading to a progression towards an early onset metastatic phenotype. The relative importance of mitotic instability may reflect the clinical utility of mitotic poisons in metastatic breast cancer, including poisons such as the taxanes, epothilones, and vinca alkaloids.
estrogen receptor alpha-positive; mitotic checkpoint signaling; mitotic regulation network; microRNA targets; early distant metastasis
Cancer risk management relies on the natural immune system to eliminate excess concentrations of cancer-causing trace elements, and hence intake of a nutritious diet is of paramount importance. The human diet should consist of essential macronutrients, which have to be consumed in large quantities, and trace elements, which are to be consumed in very small amounts. As some of these trace elements are causative factors for various types of cancer and build up at the expense of macronutrients, cancer risk management of these trace elements should be based on their initial concentration in the blood of each individual and not on their tolerable upper intake level. We propose an information theory based Expert System (ES) for estimating the lowest limit of toxicity association between the trace elements and the macronutrients. Such an estimate would enable the physician to prescribe the required medication containing the macronutrients to annul the toxicity of cancer risk trace elements. The lowest limit of toxicity association is achieved by minimizing the correlated information of the concentration correlation matrix using the concept of Mutual Information (MI) and an algorithm based on a Technique of Determinant Inequalities (TDI) developed by the authors. The novelty of our ES is that it provides the lowest limit of the toxicity profile for all trace elements in the blood and is not restricted to a group of compounds having similar structure. We demonstrate the superiority of our algorithm over Principal Component Analysis in mitigating trace element toxicity in blood samples.
carcinogenic trace elements; high correlation coefficient; cancer screening; expert system; mutual information
Genome-wide association studies (GWAS) have identified genetic variants associated with an increased risk of developing breast cancer. However, the association of genetic variants and their associated genes with the most aggressive subset of breast cancer, triple-negative breast cancer (TNBC), remains a central puzzle in molecular epidemiology. The objective of this study was to determine whether genes containing single nucleotide polymorphisms (SNPs) associated with an increased risk of developing breast cancer are connected to and could stratify different subtypes of TNBC. Additionally, we sought to identify molecular pathways and networks involved in TNBC. We performed integrative genomics analysis, combining information from GWAS involving over 400,000 cases and over 400,000 controls with gene expression data derived from 124 breast cancer patients classified as TNBC (at the time of diagnosis) and 142 cancer-free controls. Analysis of GWAS reports produced 500 SNPs mapped to 188 genes. We identified a signature of 159 functionally related SNP-containing genes which were significantly (P < 10−5) associated with and stratified TNBC. Additionally, we identified 97 genes which were functionally related to, and had expression profiles similar to, the SNP-containing genes. Network modeling and pathway prediction revealed multi-gene pathways including p53, NFkB, BRCA, apoptosis, DNA repair, DNA mismatch, and excision repair pathways enriched for SNPs mapped to genes significantly associated with TNBC. The results provide convincing evidence that integrating GWAS information with gene expression data provides a unified and powerful approach for biomarker discovery in TNBC.
triple negative breast cancer; GWAS; gene expression
The popularity of a large number of microarray applications in cancer research has led to the development of predictive or prognostic gene expression profiles. However, the diversity of microarray platforms has made the full validation of such profiles and their related gene lists across studies difficult, and, at the level of classification accuracies, such profiles have rarely been validated in multiple independent datasets. Frequently, while the individual genes in such lists may not match, genes with the same function are included across the lists. Development of such lists does not take into account the fact that genes can be grouped together as metagenes (MGs) based on common characteristics such as pathways, regulation, or genomic location. Such MGs might be used as features in building a predictive model applicable for classifying independent data. There is, therefore, a need to systematically compare independent validation of gene lists or classifiers based on metagene or individual gene (SG) features.
In this study, we compared the performance of metagene- and single gene-based feature sets and classifiers, using random forest and two support vector machines for classifier building. The performance within the same dataset, feature set validation performance, and validation performance of entire classifiers in strictly independent datasets were assessed by 10 times repeated 10-fold cross validation, leave-one-out cross validation, and one-fold validation, respectively. To test the significance of the performance difference between MG- and SG-features/classifiers, we used a repeated down-sampled binomial test approach.
MG- and SG-feature sets are transferable and perform well for training and testing prediction of metastasis outcome in strictly independent datasets, both between different and within similar microarray platforms, while classifiers had a poorer performance when validated in strictly independent datasets. The study showed that MG- and SG-feature sets perform equally well in classifying independent data. Furthermore, SG-classifiers significantly outperformed MG-classifiers when validation was conducted between datasets using similar platforms, while no significant performance difference was found when validation was performed between different platforms.
Prediction of metastasis outcome in lymph node–negative patients by MG- and SG-classifiers showed that SG-classifiers performed significantly better than MG-classifiers when validated in independent data based on the same microarray platform as used for developing the classifier. However, the MG- and SG-classifiers had similar performance when conducting classifier validation in independent data based on a different microarray platform. The latter was also true when only validating sets of MG- and SG-features in independent datasets, both between and within similar and different platforms.
microarray; classification; metagenes; breast cancer
For science, theoretical or applied, to significantly advance, researchers must use the most appropriate mathematical methods. A century and a half elapsed between Newton’s development of the calculus and Laplace’s development of celestial mechanics. One cannot imagine the latter without the former. Today, more than three-quarters of a century has elapsed since the birth of stochastic systems theory. This article provides a perspective on the utilization of systems theory as the proper vehicle for the development of systems biology and its application to complex regulatory diseases such as cancer.
cancer; control; epistemology; systems biology
Philadelphia positive malignant disorders are a clinically divergent group of leukemias. These include chronic myeloid leukemia (CML) and de novo acute Philadelphia positive (Ph(+)) leukemia of both myeloid and lymphoid origin. Recent whole genome screening of Ph(+)ALL in both children and adults identified an almost obligatory cryptic loss of Ikaros, which is required for normal B cell maturation. Although similar losses were found in lymphoid blast crisis, the genetic background of the transformation in CML is still poorly defined. We used Significance Analysis of Microarrays (SAM) to analyze array comparative genomic hybridization (aCGH) data from 30 CML patients (10 each of chronic phase, myeloid and lymphoid blast stage), 10 adult Ph(+)ALL patients and 10 disease-free controls, and were able to: (a) discriminate between the genomes of lymphoid and myeloid blast cells and (b) identify differences in the genome profile of de novo Ph(+)ALL and lymphoid blast transformation of CML (BC/L). Furthermore, we were able to distinguish a subgroup of Ph(+)ALL characterized by gains in chromosome 9 and recurrent losses at several other genome sites, offering genetic evidence for the clinical heterogeneity. These results are significant in that they not only offer clues regarding the pathogenesis of Ph(+) disorders and highlight the potential clinical implications of a set of probes, but also demonstrate what SAM can offer for the analysis of genome data.
sam; significance analysis; arraycgh; ph+all; cml; lymphoid blast crisis; igh rearrangements; tarp; chr 9p; chr7p
Haploinsufficiency of tumor suppressor genes, wherein the reduced production and activity of proteins results in the inability of the cell to maintain normal cellular function, is one of the various causes of cancer. However, the precise molecular mechanisms underlying this condition remain unclear. Here we hypothesize that single nucleotide polymorphisms (SNPs) in the 3′ untranslated region (UTR) of mRNAs and in microRNA seed sequences (miR-SNPs) may cause haploinsufficiency at the level of proteins through altered binding specificity of microRNAs (miRNAs). Bioinformatics analysis of haploinsufficient genes for variations in their 3′UTR showed that the occurrence of SNPs results in the creation of new binding sites for miRNAs, thereby bringing the respective mRNA variant under the control of more miRNAs. In addition, 19 miR-SNPs were found to result in non-specific binding of microRNAs to tumor suppressors. Networking analysis suggests that the haploinsufficient tumor suppressor genes strongly interact with one another, and that any subtle alterations in this network will contribute to tumorigenesis.
haploinsufficiency; microRNA; single nucleotide polymorphism; miR-SNPs; tumor suppressor genes; cancer
Triple-negative breast cancer (TNBC) is a heterogeneous breast cancer group, and identification of molecular subtypes is essential for understanding the biological characteristics and clinical behaviors of TNBC as well as for developing personalized treatments. Based on 3,247 gene expression profiles from 21 breast cancer data sets, we discovered six TNBC subtypes from 587 TNBC samples with unique gene expression patterns and ontologies. Cell line models representing each of the TNBC subtypes also displayed different sensitivities to targeted therapeutic agents. Classification of TNBC into subtypes will advance further genomic research and clinical applications.
We developed a web-based subtyping tool TNBCtype for candidate TNBC samples using our gene expression meta data and classification methods. Given a gene expression data matrix, this tool will display for each candidate sample the predicted subtype, the corresponding correlation coefficient, and the permutation P-value. We offer a user-friendly web interface to predict the subtypes for new TNBC samples that may facilitate diagnostics, biomarker selection, drug discovery, and the more tailored treatment of breast cancer.
triple-negative breast cancer; gene expression microarray; meta-analysis; classification; subtypes
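The three outputs listed in the abstract above (predicted subtype, correlation coefficient, permutation P-value) can be illustrated with a nearest-centroid sketch. This is an assumption about how such a prediction could be computed, not a description of the TNBCtype implementation: the centroid profiles, the subtype labels, and the permutation scheme (shuffling the sample's gene values) are all hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def predict_subtype(sample, centroids, n_perm=1000, seed=0):
    """Return (subtype, correlation, permutation P-value) for one sample.

    `centroids` maps a subtype label to a reference expression profile
    over the same genes, in the same order, as `sample`.
    """
    best_name, best_r = max(
        ((name, pearson(sample, profile)) for name, profile in centroids.items()),
        key=lambda pair: pair[1],
    )
    rng = random.Random(seed)
    shuffled = list(sample)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the gene-to-value pairing
        if pearson(shuffled, centroids[best_name]) >= best_r:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    p_value = (hits + 1) / (n_perm + 1)
    return best_name, best_r, p_value

# Hypothetical toy centroids over six genes.
centroids = {
    "subtype_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "subtype_B": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
}
print(predict_subtype([1.1, 1.9, 3.2, 3.8, 5.1, 6.3], centroids))
```

A small P-value here indicates that the observed correlation with the best-matching centroid is unlikely to arise from a random assignment of expression values to genes.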
Mutations in cancer-causing genes induce changes in gene expression programs critical for malignant cell transformation. Publicly available gene expression profiles produced by modulating the expression of distinct cancer genes may therefore represent a rich resource for the identification of gene signatures common to seemingly unrelated cancer genes. We combined automatic retrieval with manual validation to obtain a data set of high-quality gene microarray profiles. This data set was used to create logical models of the signaling events underlying the observed expression changes produced by various cancer genes and allowed us to uncover unknown and verifiable interactions. Data clustering revealed novel sets of gene expression profiles commonly regulated by distinct cancer genes. Our method allows retrieval of significant new information and testable hypotheses from a pool of deposited cancer gene expression experiments that are otherwise not apparent or appear insignificant from single measurements. The complete results are available through a web application at http://biodata.ethz.ch/cgi-bin/geologic.
cancer genes; gene microarray database analysis; gene expression signatures; meta-analysis; network interactions; clustering
We aimed to find clinically relevant gene activities governed by the signal transducer and activator of transcription 3 (STAT3) proteins in an ER(−) breast cancer population via a network approach. STAT3 is negatively associated with both lymph nodal category and stage. MYC is a component of the STAT3 network. MYC and STAT3 may co-regulate gene expression for the Warburg effect, stem cell-like phenotype, cell proliferation and angiogenesis. We identified a STAT3 network in silico and showed its ability to predict target gene expression, primarily for specific tumor subtype, tumor progression, treatment options and prognostic features. The aberrant expression of MYC and STAT3 is enriched in triple negatives (TN). They promote histological grade, vascularity, metastasis and tumor anti-apoptotic activities. VEGFA, STAT3, FOXM1 and METAP2 are druggable targets. High levels of METAP2, MMP7, IGF2 and IGF2R are unfavorable prognostic factors. STAT3 is an inferred central regulator in early cancer development, predominantly in TN.
STAT3; transcriptional regulatory network; microarray; grade; vascularity
Aberrant transcriptional activities have been documented in breast cancers. Studies often find some transcription factors to be inappropriately regulated and enriched in certain pathological states. The promoter regions of most target genes have binding sites for their transcription factors. Ample evidence supports their combinatorial effect on the expression of their shared target genes. Here, we used a new statistical method, bivariate CID, to predict combinatorial interaction activity between ERα and a transcription factor (E2F1, GATA3, or ERRα) in regulating target gene expression via four regulatory mechanisms. We identified gene sets in three signal transduction pathways perturbed in breast tumors: cell cycle, VEGF, and PDGFRB. Bivariate network analysis revealed that several target genes previously implicated in tumor angiogenesis, including VEGFA and PDGFRB, are among the predicted shared targets. In summary, our analysis suggests the importance of the multivariate space of an inferred ERα transcriptional regulatory network for breast cancer diagnostic and therapeutic development.
bivariate CID; network; transcription factor; shared target gene expression; angiogenesis
Reverse phase protein arrays (RPPA) measure the relative expression levels of a protein in many samples simultaneously. Observed signal from these arrays is a combination of true signal, additive background, and multiplicative spatial effects. Background subtraction alone is not sufficient to remove all nonbiological trends from the data. We developed a surface adjustment that uses information from positive control spots to correct for spatial trends on the array beyond additive background. This method uses a generalized additive model to estimate a smoothed surface from positive controls. When positive controls are printed in a dilution series, a nested surface adjustment performs an intensity-based correction. When applicable, surface adjustment is able to remove spatial trends and increase within slide replicate agreement better than background subtraction alone as demonstrated on two sets of arrays. This work demonstrates the importance of including positive control spots on the array.
protein array; normalization; control spots; generalized additive models
Gene expression profiling has provided insights into different cancer types and revealed tissue-specific expression signatures. Alterations in microRNA expression contribute to the pathogenesis of many types of human diseases. Few studies have integrated all levels of gene expression, miRNA and methylation to uncover correlations between these data types. We performed an integrated profiling to discover instances of miRNAs associated with a gene expression and DNA methylation signature across multiple cancer types. Using data from The Cancer Genome Atlas (TCGA), we revealed a concordant gene expression and methylation signature associated with the microRNA hsa-miR-142 across the same samples. In all cancer types examined, we found a signature of co-expression of a gene set R and methylated sites M, which correlate positively (M+) or negatively (M−) with the expression of hsa-miR-142. The set R consistently contains many genes, such as TRAF3IP3, NCKAP1L, CD53, LAPTM5, PTPRC, EVI2B, DOCK2, LCP2, CYBB and FYB. The signature is preserved across glioblastoma, ovarian, breast, colon, kidney, lung, uterine and rectum cancer. There is 28% overlap of methylation sites in M between glioblastoma (GBM) and ovarian cancer. There is 60% overlap of genes in R between GBM and ovarian (P = 1.3e−11). Most of the genes in R are known to be expressed in lymphocytes and haematopoietic stem cells, while M reflects membrane proteins involved in cell-cell adhesion functions. We speculate that the hsa-miR-142 associated signature may signal haematopoietic-specific processes and an accumulation of methylation events triggering a progressive loss of cell-cell adhesion. We also observed that GBM samples belonging to the proneural subtype tend to have underexpressed hsa-miR-142 and R genes, hypomethylated M+ and hypermethylated M−, while the mesenchymal samples have the opposite profile.
cancer; microRNA; gene expression; methylation; correlation; integrated analysis
Motivated by the frustratingly slow translation of research advances in the molecular and cellular biology of cancer into treatment, this study calls for cross-disciplinary efforts and proposes a methodology for incorporating drug pharmacology information into drug therapeutic response modeling using a computational systems biology approach. The objectives are twofold. The first is to involve effective mathematical modeling in the drug development stage to incorporate preclinical and clinical data in order to decrease the costs of drug development and increase pipeline productivity, since it is extremely expensive and difficult to find the optimal compromise of dosage and schedule through empirical testing. The second is to provide valuable suggestions for adjusting individual drug dosing regimens to improve therapeutic effects, considering that most anticancer agents have wide inter-individual pharmacokinetic variability and a narrow therapeutic index. A dynamic hybrid systems model is proposed to study the antitumor effect of a drug from the perspective of tumor growth dynamics, specifically the dosing and schedule of periodic drug intake, and a drug's pharmacokinetic and pharmacodynamic information are linked together in the proposed model using a state-space approach. The existence of an optimal drug dosage and administration interval is proved analytically and demonstrated through a simulation study.
drug effect; drug efficacy region; dosing regimens; hybrid systems; systems biology; tumor growth
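The dependence of drug exposure on dose and dosing interval discussed in the abstract above can be illustrated with a textbook one-compartment model. This is a minimal sketch under stated assumptions, not the authors' hybrid systems model: it assumes instantaneous bolus doses, first-order elimination with rate constant `k_elim`, and a dose expressed directly as a concentration increment. All names and values are hypothetical.

```python
import math

def simulate_peaks(dose, k_elim, interval, n_doses):
    """Concentration immediately after each of `n_doses` periodic doses."""
    peaks = []
    conc = 0.0
    for _ in range(n_doses):
        conc += dose                              # instantaneous jump at dosing time
        peaks.append(conc)
        conc *= math.exp(-k_elim * interval)      # continuous decay until next dose
    return peaks

def steady_state_peak(dose, k_elim, interval):
    """Limit of the post-dose concentration: dose / (1 - exp(-k T))."""
    return dose / (1.0 - math.exp(-k_elim * interval))

def steady_state_trough(dose, k_elim, interval):
    """Limit of the pre-dose concentration, one interval after the peak."""
    return steady_state_peak(dose, k_elim, interval) * math.exp(-k_elim * interval)

# Hypothetical regimen: 100 concentration units every 12 h, k = 0.1 per hour.
peaks = simulate_peaks(100.0, 0.1, 12.0, 10)
print(peaks[-1], steady_state_peak(100.0, 0.1, 12.0))
print(steady_state_trough(100.0, 0.1, 12.0))
```

The jump at each dosing time followed by continuous decay is the simplest example of the discrete-plus-continuous behavior that a hybrid systems formulation captures. Shortening the interval or raising the dose lifts both the peak and the trough, which is why the two must be chosen jointly against a narrow therapeutic index.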
Mining of gene expression data to identify genes associated with patient survival is an ongoing problem in microarray-based cancer prognostic studies, where such genes are sought in order to achieve more accurate prognoses. The least absolute shrinkage and selection operator (lasso) is often used for gene selection and parameter estimation in high-dimensional microarray data. The lasso shrinks some of the coefficients to zero, and the amount of shrinkage is determined by the tuning parameter, which is often chosen by cross validation. The model selected by this cross validation contains many false positives, that is, selected genes whose true coefficients are actually zero. We propose a method for estimating the false positive rate (FPR) for lasso estimates in a high-dimensional Cox model. We performed a simulation study to examine the precision of the FPR estimate by the proposed method. We applied the proposed method to real data and illustrated the identification of false positive genes.
cancer prognostic; false positive rate; gene selection; high-dimensional regression; microarray data; survival analysis
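How the lasso tuning parameter sets some coefficients exactly to zero can be seen in the simplest special case. The sketch below assumes a linear model with an orthonormal design, where the lasso solution is the soft-thresholded least-squares estimate. It illustrates the shrinkage mechanism only and is not the Cox-model estimator or the FPR method of the abstract above; the coefficient values are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_orthonormal(ols_coefs, lam):
    """Lasso estimates for an orthonormal design (illustrative special case)."""
    return [soft_threshold(b, lam) for b in ols_coefs]

def selected_indices(coefs):
    """Indices of the coefficients that remain nonzero, i.e. selected genes."""
    return [i for i, b in enumerate(coefs) if b != 0.0]

# Hypothetical least-squares coefficients for four genes.
ols = [2.0, -0.3, 0.5, -1.5]
for lam in (0.25, 0.5, 1.75):
    shrunk = lasso_orthonormal(ols, lam)
    print(lam, shrunk, selected_indices(shrunk))
```

A smaller tuning parameter keeps more genes in the model, which is how a cross-validated choice that favors prediction accuracy can admit genes whose true coefficients are zero.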
Somatic cell genetic alterations are a hallmark of tumor development and progression. Although various technologies have been developed and utilized to identify genetic aberrations, identifying genetic translocations at the chromosomal level is still a challenging task. High density SNP microarrays are useful to measure DNA copy number variation (CNV) across the genome. Utilizing SNP array data of cancer cell lines and patient samples, we evaluated the CNV and copy number breakpoints for several known fusion genes implicated in tumorigenesis. This analysis demonstrated the potential utility of SNP array data for the prediction of genetic aberrations via translocations based on identifying copy number breakpoints within the target genes. Genome-wide analysis was also performed to identify genes harboring copy number breakpoints across 820 cancer cell lines. Candidate oncogenes were identified that are linked to potential translocations in specific cancer cell lines.
copy number variation; copy number breakpoint; SNP array; translocation
Early detection (localized stage) of colon cancer is associated with a five-year survival rate of 91%. Only 39% of colon cancers, however, are diagnosed at that early stage. Early and accurate diagnosis, therefore, constitutes a critical need and a decisive factor in the clinical treatment of colon cancer and its success. In this study, using supervised linear discriminant analysis, we have developed three diagnostic biomarker models that—based on global micro-RNA expression analysis of colonic tissue collected during surgery—can discriminate with a perfect accuracy between subjects with colon cancer (stages II–IV) and normal healthy subjects. We developed our three diagnostic biomarker models with 57 subjects [40 with colon cancer (stages II–IV) and 17 normal], and we validated them with 39 unknown (new and different) subjects [28 with colon cancer (stages II–IV) and 11 normal]. For all three diagnostic models, both the overall sensitivity and specificity were 100%. The nine most significant micro-RNAs identified, which comprise the input variables to the three linear discriminant functions, are associated with genes that regulate oncogenesis, and they play a paramount role in the development of colon cancer, as evidenced in the tumor tissue itself. This could have a significant impact in the fight against this disease, in that it may lead to the development of an early serum or blood diagnostic test based on the detection of those nine key micro-RNAs.
colon cancer; ROC-supervised linear discriminant analysis; biomarkers; diagnostic biomarker models; global micro-RNA expression analysis; systems biology
Mouse (m) 11β-hydroxysteroid dehydrogenase type 2 (11βHSD2) was homology-modeled, and its structure and ligand-receptor interaction were analyzed. The modeled m11βHSD2 showed significant 3D similarities to the human (h) 11βHSD1 and 2 structures. The contact energy profiles of the m11βHSD2 model were in good agreement with those of the h11βHSD1 and 2 structures. The secondary structure of the m11βHSD2 model exhibited a central 6-stranded all-parallel β-sheet sandwich-like structure, flanked on both sides by 3-helices. Ramachandran plots revealed that only 1.1% of the amino acid residues were in the disfavored region for m11βHSD2. Further, the molecular surfaces and electrostatic analyses of the m11βHSD2 model at the ligand-binding site exhibited that the model was almost identical to the h11βHSD2 model. Furthermore, docking simulation and ligand-receptor interaction analyses revealed the similarity of the ligand-receptor bound conformation between the m11βHSD2 and h11βHSD2 models. These results indicate that the m11βHSD2 model was successfully evaluated and analyzed. To the best of our knowledge, this is the first report of a m11βHSD2 model with detailed analyses, and our data verify that the mouse model can be utilized for application to the human model to target 11βHSD2 for the development of anticancer drugs.
11βHSD2; anticancer drug; homology modeling; Molecular Operating Environment (MOE); tumor
Lung cancer is the second most commonly occurring non-cutaneous cancer in the United States, with the highest mortality rate among both men and women. In this study, we utilized three lung cancer microarray datasets generated by previous researchers to identify differentially expressed genes and altered signaling pathways, and to assess the involvement of the Hedgehog (Hh) pathway. The three datasets contain the expression levels of tens of thousands of genes in normal lung tissues and squamous cell lung carcinoma. The datasets were combined and analyzed. The dysregulated genes and altered signaling pathways were identified using statistical methods. We then performed Fisher's exact test on the significance of the association between Hh pathway downstream genes and squamous cell lung carcinoma.
A total of 395 genes were found to be commonly differentially expressed in squamous cell lung carcinoma. The genes encoding the fibrous structural protein keratins and the cell cycle dependent genes encoding cyclin-dependent kinases were significantly up-regulated, while those encoding LIM domains were down-regulated. Over 100 signaling pathways were implicated in squamous cell lung carcinoma, including the cell cycle regulation pathway, p53 tumor-suppressor pathway, IL-8 signaling, Wnt-β-catenin pathway, mTOR signaling and EGF signaling. In addition, 37 out of 223 downstream molecules of the Hh pathway were altered. The P-value from Fisher's exact test indicates that Hh signaling is implicated in squamous cell lung carcinoma.
Numerous genes were altered and multiple pathways were dysfunctional in squamous cell lung carcinoma. Many of the altered genes have been implicated in different types of carcinoma while some are organ-specific. Hh signaling is implicated in squamous cell lung cancer, opening the door for exploring new cancer therapeutic treatment using GLI antagonist GANT 61.
biomarkers; drug targets; signaling pathways; microarray technology; non-small cell lung cancer; cancer treatment
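The Fisher's exact test used above for the Hh pathway can be sketched as a one-sided test on a 2×2 table. This is a minimal illustration of the test itself, under the assumption that an over-representation (upper-tail) alternative is wanted. The abstract reports only the pathway row of the table (37 altered out of 223 downstream molecules), so the example below uses a small hypothetical table and does not reproduce the reported result.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows: in pathway / not in pathway. Columns: altered / not altered.
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b          # genes in the pathway
    col1 = a + c          # altered genes overall
    n = a + b + c + d     # all genes considered
    upper = min(row1, col1)
    total = 0
    for x in range(a, upper + 1):
        total += comb(row1, x) * comb(n - row1, col1 - x)
    return total / comb(n, col1)

# Hypothetical toy table: 3 of 4 pathway genes altered, 1 of 4 other genes altered.
print(fisher_exact_greater(3, 1, 1, 3))
```

To apply this to the Hh analysis, `a` and `b` would be 37 and 186, while `c` and `d` would require the counts of altered and unaltered genes outside the pathway, which the abstract does not state.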
We have previously shown the hepatic gene expression profiles of carcinogens in 28-day toxicity tests were clustered into three major groups (Group-1 to 3). Here, we developed a new prediction method for Group-1 carcinogens which consist mainly of genotoxic rat hepatocarcinogens. The prediction formula was generated by a support vector machine using 5 selected genes as the predictive genes and predictive score was introduced to judge carcinogenicity. It correctly predicted the carcinogenicity of all 17 Group-1 chemicals and 22 of 24 non-carcinogens regardless of genotoxicity. In the dose-response study, the prediction score was altered from negative to positive as the dose increased, indicating that the characteristic gene expression profile emerged over a range of carcinogen-specific doses. We conclude that the prediction formula can quantitatively predict the carcinogenicity of Group-1 carcinogens. The same method may be applied to other groups of carcinogens to build a total system for prediction of carcinogenicity.
toxicogenomics; carcinogenicity; hepatocarcinogen; microarray; prediction method
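The prediction results reported above (17 of 17 Group-1 chemicals and 22 of 24 non-carcinogens classified correctly) translate into standard performance measures as follows. This is a short worked calculation that treats Group-1 carcinogens as the positive class; the function names are illustrative.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual positives that were predicted positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of actual negatives that were predicted negative."""
    return true_neg / (true_neg + false_pos)

def accuracy(true_pos, true_neg, false_pos, false_neg):
    """Fraction of all chemicals that were classified correctly."""
    total = true_pos + true_neg + false_pos + false_neg
    return (true_pos + true_neg) / total

# Counts taken from the abstract: 17/17 carcinogens, 22/24 non-carcinogens.
tp, fn = 17, 0
tn, fp = 22, 2
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"specificity = {specificity(tn, fp):.3f}")
print(f"accuracy    = {accuracy(tp, tn, fp, fn):.3f}")
```

The two misclassified non-carcinogens lower the specificity to 22/24 and the overall accuracy to 39/41, while the sensitivity for Group-1 carcinogens stays at 17/17.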
Array-based comparative genomic hybridization (aCGH) allows DNA copy number to be measured at the whole genome scale. In cancer studies, one may be interested in identifying DNA copy number aberrations (CNAs) associated with certain clinicopathological characteristics such as cancer metastasis. We propose defining test regions based on copy number pattern profiles across multiple samples, using either smoothed log2-ratio or discrete data of copy number gain/loss calls. Association tests performed on the refined test regions instead of the probes have improved power due to the reduced number of tests. We also compared three types of measurement of copy number levels (normalized log2-ratio, smoothed log2-ratio, and copy number gain or loss calls) in statistical hypothesis testing. The relative strengths and weaknesses of the proposed method were demonstrated using both simulation studies and real data analysis of a liver cancer study.
aCGH; DNA copy number aberration (CNA); downstream analysis; gain/loss calls; segmentation
Following initial standard chemotherapy (platinum/taxol), more than 75% of those patients with advanced stage epithelial ovarian cancer (EOC) experience a recurrence. There are currently no accurate prognostic tests that, at the time of the diagnosis/surgery, can identify those patients with advanced stage EOC who will respond to chemotherapy. Using a novel mathematical theory, we have developed three prognostic biomarker models (complex mathematical functions) that—based on a global gene expression analysis of tumor tissue collected during surgery and prior to the commencement of chemotherapy—can identify with a high accuracy those patients with advanced stage EOC who will respond to the standard chemotherapy [long-term survivors (>7 yrs)] and those who will not do so [short-term survivors (<3 yrs)]. Our three prognostic biomarker models were developed with 34 subjects and validated with 20 unknown (new and different) subjects. Both the overall biomarker model sensitivity and specificity ranged from 95.83% to 100.00%. The 12 most significant genes identified, which are also the input variables to the three mathematical functions, constitute three distinct gene networks with the following functions: 1) production of cytoskeletal components, 2) cell proliferation, and 3) cell energy production. The first gene network is directly associated with the mechanism of action of anti-tubulin chemotherapeutic agents, such as taxanes and epothilones. This could have a significant impact in the discovery of new, more effective pharmacological treatments that may significantly extend the survival of patients with advanced stage EOC.
ovarian cancer; biomarkers; mathematical models; prognostic biomarker models; treatment response; survival; global gene expression analysis