One of the major goals in gene and protein expression profiling of cancer is to identify biomarkers and build classification models for prediction of disease prognosis or treatment response. Many traditional statistical methods, based on microarray gene expression data alone and on the discriminatory power of individual genes, often fail to identify biologically meaningful biomarkers, resulting in poor prediction performance across data sets. Nonetheless, the variables in multivariable classifiers should synergistically interact to produce more effective classifiers than individual biomarkers.
We developed an integrated approach, namely network-constrained support vector machine (netSVM), for cancer biomarker identification with an improved prediction performance. The netSVM approach is specifically designed for network biomarker identification by integrating gene expression data and protein-protein interaction data. We first evaluated the effectiveness of netSVM using simulation studies, demonstrating its improved performance over state-of-the-art network-based methods and gene-based methods for network biomarker identification. We then applied the netSVM approach to two breast cancer data sets to identify prognostic signatures for prediction of breast cancer metastasis. The experimental results show that: (1) network biomarkers identified by netSVM are highly enriched in biological pathways associated with cancer progression; (2) prediction performance is much improved when tested across different data sets. Specifically, many genes related to apoptosis, cell cycle, and cell proliferation, which are hallmark signatures of breast cancer metastasis, were identified by the netSVM approach. More importantly, several novel hub genes, biologically important with many interactions in the PPI network but often showing little change in expression as compared with their downstream genes, were also identified as network biomarkers; the genes were enriched in signaling pathways such as the TGF-beta, MAPK, and JAK-STAT signaling pathways. These signaling pathways may provide new insight into the underlying mechanism of breast cancer metastasis.
We have developed a network-based approach for cancer biomarker identification, netSVM, resulting in an improved prediction performance with network biomarkers. We have applied the netSVM approach to breast cancer gene expression data to predict metastasis in patients. Network biomarkers identified by netSVM reveal potential signaling pathways associated with breast cancer metastasis, and help improve the prediction performance across independent data sets.
Microarray data provide information on the expression levels of thousands of genes in a cell in a single experiment.
Numerous efforts have been made to use gene expression profiles to improve the precision of tumor classification. In the present study we used the benchmark colon cancer data set for analysis. Feature selection was done using the t-statistic. A comparative study of the class prediction accuracy of three classifiers, namely support vector machine (SVM), neural network, and logistic regression, was performed using the top 10 genes ranked by the t-statistic. SVM turned out to be the best classifier for this data set based on area under the receiver operating characteristic curve (AUC) and total accuracy. Logistic regression ranked as the next best classifier, followed by the multilayer perceptron (MLP). The top 10 genes selected for classification are all well documented for their variable expression in colon cancer. We conclude that SVM together with t-statistic-based feature selection is an efficient and viable alternative to popular techniques.
gene expression; tumor classification; t-statistic; feature selection; SVM; neural network; logistic regression
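The t-statistic feature selection described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the abstract does not say which form of the t-statistic was used, so the Welch (unequal variance) form is an assumption, and the gene names, expression values, and function names are invented for the example.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t-statistic for two lists of expression values."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def top_genes(expr, labels, k):
    """Rank genes by |t| between the two classes and return the top k names.

    expr   -- dict: gene name -> list of expression values, one per sample
    labels -- list of 0/1 class labels aligned with the sample order
    """
    scores = {}
    for gene, values in expr.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = abs(welch_t(pos, neg))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (invented): 3 genes, 6 samples; the first 3 samples are tumour.
expr = {
    "geneA": [5.1, 5.3, 4.9, 1.0, 1.2, 0.9],
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "geneC": [3.0, 3.5, 2.5, 2.0, 2.6, 1.4],
}
labels = [1, 1, 1, 0, 0, 0]
ranking = top_genes(expr, labels, 2)
```

The selected genes would then be passed as features to a classifier such as an SVM. To avoid optimistic bias, the ranking should be recomputed inside each training fold rather than once on the full data set.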
A consensus prognostic classifier for estrogen receptor positive breast tumors has been developed and shown to be valid in nearly 900 samples across different microarray platforms.
A consensus prognostic gene expression classifier is still elusive in heterogeneous diseases such as breast cancer.
Here we perform a combined analysis of three major breast cancer microarray data sets to home in on a universally valid prognostic molecular classifier in estrogen receptor (ER) positive tumors. Using a recently developed robust measure of prognostic separation, we further validate the prognostic classifier in three external independent cohorts, confirming the validity of our molecular classifier in a total of 877 ER positive samples. Furthermore, we find that molecular classifiers may not outperform classical prognostic indices but that they can be used in hybrid molecular-pathological classification schemes to improve prognostic separation.
The prognostic molecular classifier presented here is the first to be valid in 877 ER positive breast cancer samples and across three different microarray platforms. Larger multi-institutional studies will be needed to fully determine the added prognostic value of molecular classifiers when combined with standard prognostic factors.
The popularity of a large number of microarray applications in cancer research has led to the development of predictive or prognostic gene expression profiles. However, the diversity of microarray platforms has made the full validation of such profiles and their related gene lists across studies difficult, and classification accuracies have rarely been validated in multiple independent datasets. Frequently, while the individual genes in such lists may not match, genes with the same function are included across lists. Development of such lists does not take into account the fact that genes can be grouped together as metagenes (MGs) based on common characteristics such as pathways, regulation, or genomic location. Such MGs might be used as features in building a predictive model applicable for classifying independent data. There is, therefore, a need to systematically compare the independent validation of gene lists or classifiers based on metagene or individual gene (SG) features.
In this study we compared the performance of metagene- or single gene-based feature sets and classifiers using random forest and two support vector machines for classifier building. The performance within the same dataset, feature set validation performance, and validation performance of entire classifiers in strictly independent datasets were assessed by 10 times repeated 10-fold cross validation, leave-one-out cross validation, and one-fold validation, respectively. To test the significance of the performance difference between MG- and SG-features/classifiers, we used a repeated down-sampled binomial test approach.
MG- and SG-feature sets are transferable and perform well for training and testing prediction of metastasis outcome in strictly independent data sets, both between different and within similar microarray platforms, while classifiers had a poorer performance when validated in strictly independent datasets. The study showed that MG- and SG-feature sets perform equally well in classifying independent data. Furthermore, SG-classifiers significantly outperformed MG-classifiers when validation was conducted between datasets using similar platforms, while no significant performance difference was found when validation was performed between different platforms.
Prediction of metastasis outcome in lymph node–negative patients by MG- and SG-classifiers showed that SG-classifiers performed significantly better than MG-classifiers when validated in independent data based on the same microarray platform as used for developing the classifier. However, the MG- and SG-classifiers had similar performance when conducting classifier validation in independent data based on a different microarray platform. The latter was also true when only validating sets of MG- and SG-features in independent datasets, both between and within similar and different platforms.
microarray; classification; metagenes; breast cancer
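The 10 times repeated 10-fold cross validation used for within-dataset performance can be sketched as an index generator. This is an illustrative sketch only: the function name, the seed, and the sample count of 95 are assumptions, and the splits here are not stratified by class, which the study may or may not have done.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # fresh random partition for every repeat
        folds = [order[i::k] for i in range(k)]  # k nearly equal folds
        for f, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield r, f, train_idx, sorted(test_idx)


# Hypothetical data set of 95 samples: 10 repeats x 10 folds = 100 splits.
splits = list(repeated_kfold(n_samples=95))
```

Within one repeat every sample is held out exactly once, so averaging over repeats reduces the variance that comes from any single random partition.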
Gene expression profiles provide important information about the biology of breast tumors and can be used to develop prognostic tests. However, the implementation of quantitative RNA-based testing in routine molecular pathology has not been accomplished so far. The EndoPredict assay has recently been described as a quantitative RT-PCR-based multigene expression test to identify a subgroup of hormone receptor-positive tumors that have an excellent prognosis with endocrine therapy only. To transfer this test from bench to bedside, it is essential to evaluate the test performance in a multicenter setting in different molecular pathology laboratories. In this study, we have evaluated the EndoPredict (EP) assay in seven different molecular pathology laboratories in Germany, Austria, and Switzerland. A set of ten formalin-fixed paraffin-embedded tumors was tested in the different labs, and the variance and accuracy of the EndoPredict assays were determined using predefined reference values. Extraction of a sufficient amount of RNA and generation of a valid EP score was possible for all 70 study samples (100%). The EP scores measured by the individual participants showed an excellent correlation with the reference values, as reflected by Pearson correlation coefficients ranging from 0.987 to 0.999. The Pearson correlation coefficient of all values compared to the reference values was 0.994. All laboratories determined EP scores for all samples differing by no more than 1.0 score units from the predefined references. All samples were assigned to the correct EP risk group, resulting in a sensitivity and specificity of 100%, a concordance of 100%, and a kappa of 1.0. Taken together, the EndoPredict test could be successfully implemented in all seven participating laboratories and is feasible for reliable decentralized assessment of gene expression in luminal breast cancer.
Breast cancer; Prognosis; mRNA; Quality control
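The agreement measures reported above (Pearson correlation against reference values and Cohen's kappa for risk group assignment) can be computed as in the following sketch. The scores are invented, and the risk threshold of 5.0 is a placeholder chosen for the example; the abstract does not state the cut-off.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    cats = set(rater_a) | set(rater_b)
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_exp = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in cats)
    return (p_obs - p_exp) / (1 - p_exp)


# Invented scores for six samples: reference values and one laboratory.
reference = [2.1, 3.4, 4.8, 5.6, 7.2, 9.0]
lab = [2.3, 3.1, 4.6, 5.9, 7.0, 9.4]

THRESHOLD = 5.0  # placeholder cut-off, assumed for illustration only
ref_group = ["high" if s >= THRESHOLD else "low" for s in reference]
lab_group = ["high" if s >= THRESHOLD else "low" for s in lab]

r = pearson(reference, lab)
kappa = cohen_kappa(ref_group, lab_group)
max_dev = max(abs(a - b) for a, b in zip(reference, lab))
```

A kappa of 1.0 requires that every sample lands in the same risk group as the reference, which is a stricter criterion than a high correlation alone.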
The clinical grades and staging methods currently employed for bladder cancer (BC) are inadequate for assessing treatment outcomes for non-muscle invasive bladder cancer (NMIBC). We have developed a clinically applicable quantitative real-time PCR (qPCR) gene signature to predict the progression of NMIBC. Three genes not previously described for BC were selected from our published progression-related gene classifier data set. Data were drawn from a previous study population and from new cases. Primary NMIBC tissue specimens (n=193) were analyzed by qPCR. Risk scores were then used to rank specimens into high- and low-risk signature groups based on their gene expression. The Kaplan-Meier method and a multivariate Cox regression model were used to identify the prognostic value of the three-gene signature for both recurrence and progression. The Kaplan-Meier estimates revealed significant differences in time-to-recurrence and progression between low- and high-risk signatures (log-rank test, p=0.011 and p<0.001, respectively). The multivariate Cox regression analysis showed that the three-gene risk signature is an independent predictor of bladder tumor progression (hazard ratio, 4.268; 95% CI, 1.542–11.814; p=0.005). In conclusion, our three-gene signature was found to be closely associated with progression among patients with NMIBC.
bladder cancer; cadherin EGF LAG seven-pass G-type receptor 3; coagulation factor C homolog; gene signature; kinesin family member 1A; prognosis; quantitative real-time PCR
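The Kaplan-Meier method used above to compare the low- and high-risk signature groups estimates the probability of remaining event-free over time. The following is a minimal sketch of the estimator for a single group, with invented follow-up times; it is not the study's code, and the log-rank comparison between groups is not shown.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up time per patient
    events -- 1 if the event (e.g. progression) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        same_time = [e for tt, e in data[i:] if tt == t]
        d = sum(same_time)  # number of events at time t
        if d > 0:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= len(same_time)  # events and censored cases leave the risk set
        i += len(same_time)
    return curve


# Invented follow-up (months) for six patients in one risk group.
times = [5, 8, 12, 12, 20, 30]
events = [1, 0, 1, 1, 0, 1]
curve = kaplan_meier(times, events)
```

Applying the function separately to the low-risk and high-risk groups gives the two curves whose difference a log-rank test would assess.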
While several molecular markers of bladder cancer prognosis have been identified, the limited value of current prognostic markers has created the need for new molecular indicators of bladder cancer outcomes. The aim of this study was to identify genetic signatures associated with disease prognosis in bladder cancer.
We used 272 primary bladder cancer specimens for microarray analysis and real-time reverse transcriptase polymerase chain reaction (RT-PCR) analysis. Microarray gene expression analysis was carried out on 165 randomly selected primary bladder cancer specimens as the original cohort. Risk scores were applied to stratify prognosis-related gene classifiers. Prognosis-related gene classifiers were individually analyzed with tumor invasiveness (non-muscle invasive bladder cancer [NMIBC] and muscle invasive bladder cancer [MIBC]) and prognosis. We validated selected gene classifiers using RT-PCR in the original (165) and independent (107) cohorts. Ninety-seven genes related to disease progression among NMIBC patients were identified by microarray data analysis. Eight genes, a progression-related gene classifier in NMIBC, were selected for RT-PCR. The progression-related gene classifier in patients with NMIBC was closely correlated with progression in both original and independent cohorts. Furthermore, no patient with NMIBC in the good-prognosis signature group experienced cancer progression.
We identified a progression-related gene classifier that has strong predictive value for determining disease outcome in NMIBC. This gene classifier could assist in selecting NMIBC patients who might benefit from more aggressive therapeutic intervention or surveillance.
S100 calcium binding protein A8 (S100A8) has been implicated as a prognostic indicator in several types of cancer. However, previous studies are limited in their ability to predict the clinical behavior of the cancer. Here, we sought to identify a molecular signature based on S100A8 expression and to assess its usefulness as a prognostic indicator of disease progression in non-muscle invasive bladder cancer (NMIBC).
We used 103 primary NMIBC specimens for microarray gene expression profiling. The median follow-up period for all patients was 57.6 months (range: 3.2 to 137.0 months). Various statistical methods, including the leave-one-out cross validation method, were applied to identify a gene expression signature able to predict the likelihood of progression. The prognostic value of the gene expression signature was validated in an independent cohort (n = 302).
Kaplan-Meier estimates revealed significant differences in disease progression associated with the expression signature of S100A8-correlated genes (log-rank test, P < 0.001). Multivariate Cox regression analysis revealed that the expression signature of S100A8-correlated genes was a strong predictor of disease progression (hazard ratio = 15.225, 95% confidence interval = 1.746 to 133.52, P = 0.014). We validated our results in an independent cohort and confirmed that this signature produced consistent prediction patterns. Finally, gene network analyses of the signature revealed that S100A8, IL1B, and S100A9 could be important mediators of the progression of NMIBC.
The prognostic molecular signature defined by S100A8-correlated genes represents a promising diagnostic tool for the identification of NMIBC patients that have a high risk of progression to muscle invasive bladder cancer.
Current histo-pathological prognostic factors are not very helpful in predicting the clinical outcome of breast cancer due to the disease's heterogeneity. Molecular profiling using a large panel of genes could help to classify breast tumours and to define signatures which are predictive of their clinical behaviour.
To this aim, quantitative RT-PCR amplification was used to study the RNA expression levels of 47 genes in 199 primary breast tumours and 6 normal breast tissues. Genes were selected on the basis of their potential implication in hormonal sensitivity of breast tumours. Normalized RT-PCR data were analysed in an unsupervised manner by pairwise hierarchical clustering, and the statistical relevance of the defined subclasses was assessed by Chi2 analysis. The robustness of the selected subgroups was evaluated by classifying an external and independent set of tumours using these Chi2-defined molecular signatures.
Hierarchical clustering of gene expression data allowed us to define a series of tumour subgroups that were either reminiscent of previously reported classifications, or represented putative new subtypes. The Chi2 analysis of these subgroups allowed us to define specific molecular signatures for some of them whose reliability was further demonstrated by using the validation data set. A new breast cancer subclass, called subgroup 7, that we defined in that way, was particularly interesting as it gathered tumours with specific bioclinical features including a low rate of recurrence during a 5 year follow-up.
The analysis of the expression of 47 genes in 199 primary breast tumours allowed classifying them into a series of molecular subgroups. The subgroup 7, which has been highlighted by our study, was remarkable as it gathered tumours with specific bioclinical features including a low rate of recurrence. Although this finding should be confirmed by using a larger tumour cohort, it suggests that gene expression profiling using a minimal set of genes may allow the discovery of new subclasses of breast cancer that are characterized by specific molecular signatures and exhibit specific bioclinical features.
Prognosis of ovarian carcinoma is poor, heterogeneous, and not accurately predicted by histoclinical features. We analysed gene expression profiles of ovarian carcinomas to identify a multigene expression model associated with survival after platinum-based therapy.
Data from 401 ovarian carcinoma samples were analysed. The learning set included 35 cases profiled using whole-genome DNA chips. The validation set included 366 cases from five independent public data sets.
Whole-genome unsupervised analysis could not distinguish poor from good prognosis samples. By supervised analysis, we built a seven-gene optimal prognostic model (OPM) out of 94 genes identified as associated with progression-free survival. Using the OPM, we could classify patients in two groups with different overall survival (OS) not only in the learning set, but also in the validation set. Five-year OS was 57 and 27% for the predicted ‘Favourable’ and ‘Unfavourable’ classes, respectively. In multivariate analysis, the OPM outperformed the individual current prognostic factors, both in the learning and the validation sets, and added independent prognostic information.
We defined a seven-gene model associated with outcome in 401 ovarian carcinomas. Prospective studies are warranted to confirm its prognostic value, and explore its potential ability for better tailoring systemic therapies in advanced-stage tumours.
ovarian cancer; gene expression profiling; prognosis
More accurate prognostic assessment of patients with neuroblastoma is required to improve the choice of risk-related therapy. The aim of this study is to develop and validate a gene expression signature for improved outcome prediction.
Fifty-nine genes were carefully selected based on an innovative data-mining strategy and profiled in the largest neuroblastoma patient series (n=579) to date using RT-qPCR starting from only 20 ng of RNA. A multigene expression signature was built using 30 training samples, tested on 313 test samples and subsequently validated in a blind study on an independent set of 236 additional tumours.
The signature accurately classifies patients with respect to overall and progression-free survival (p<0·0001). The signature has a performance, sensitivity, and specificity of 85·4% (95%CI: 77·7–93·2), 84·4% (95%CI: 66·5–94·1), and 86·5% (95%CI: 81·1–90·6), respectively, for predicting patient outcome. Multivariate analysis indicates that the signature is a significant independent predictor after controlling for currently used risk factors. Patients with high molecular risk have a higher risk of death from disease and of relapse/progression than patients with low molecular risk (odds ratio of 19·32 (95%CI: 6·50–57·43) and 3·96 (95%CI: 1·97–7·97) for OS and PFS, respectively). Patients with increased risk for adverse outcome can also be identified within the current treatment groups, demonstrating the potential of this signature for improved clinical management. These results were confirmed in the validation study in which the signature was also independently statistically significant in a model adjusted for MYCN status, age, INSS stage, ploidy, INPC grade of differentiation, and MKI. The high patient/gene ratio (579/59) underlies the observed statistical power and robustness.
A 59-gene expression signature predicts outcome of neuroblastoma patients with high accuracy. The signature is an independent risk predictor, identifying patients with increased risk in the current clinical risk groups. The applied method and signature are suitable for routine lab testing and ready for evaluation in prospective studies.
The Belgian Foundation Against Cancer, found of public interest (project SCIE2006-25), the Children Cancer Fund Ghent, the Belgian Society of Paediatric Haematology and Oncology, the Belgian Kid’s Fund and the Fondation Nuovo-Soldati (JV), the Fund for Scientific Research Flanders (KDP, JH), the Fund for Scientific Research Flanders (grant number: G•0198•08), the Institute for the Promotion of Innovation by Science and Technology in Flanders, Strategisch basisonderzoek (IWT-SBO 60848), the Fondation Fournier Majoie pour l’Innovation, the Instituto Carlos III,RD 06/0020/0102 Spain, the Italian Neuroblastoma Foundation, the European Community under the FP6 (project: STREP: EET-pipeline, number: 037260), and the Belgian program of Interuniversity Poles of Attraction, initiated by the Belgian State, Prime Minister's Office, Science Policy Programming.
Multiple breast cancer gene expression profiles have been developed that appear to provide similar abilities to predict outcome and may outperform clinical-pathologic criteria; however, the extent to which seemingly disparate profiles provide additive prognostic information is not known, nor do we know whether prognostic profiles perform equally across clinically defined breast cancer subtypes. We evaluated whether combining the prognostic powers of standard breast cancer clinical variables with a large set of gene expression signatures could improve on our ability to predict patient outcomes.
Using clinical-pathological variables and a collection of 323 gene expression "modules", including 115 previously published signatures, we built multivariate Cox proportional hazards models using a dataset of 550 node-negative systemically untreated breast cancer patients. Models predictive of pathological complete response (pCR) to neoadjuvant chemotherapy were also built using this approach.
We identified statistically significant prognostic models for relapse-free survival (RFS) at 7 years for the entire population, and for the subgroups of patients with ER-positive, or Luminal tumors. Furthermore, we found that combined models that included both clinical and genomic parameters improved prognostication compared with models with either clinical or genomic variables alone. Finally, we were able to build statistically significant combined models for pathological complete response (pCR) predictions for the entire population.
Integration of gene expression signatures and clinical-pathological factors is an improved method over either variable type alone. Highly prognostic models could be created when using all patients, and for the subset of patients with lymph node-negative and ER-positive breast cancers. Other variables beyond gene expression and clinical-pathological variables, like gene mutation status or DNA copy number changes, will be needed to build robust prognostic models for ER-negative breast cancer patients. This combined clinical and genomics model approach can also be used to build predictors of therapy responsiveness, and could ultimately be applied to other tumor types.
Gene expression profiling yields quantitative data on gene expression used to create prognostic models that accurately predict patient outcome in diffuse large B cell lymphoma (DLBCL). Often, data are analyzed with genes classified by whether they fall above or below the median expression level. We sought to determine whether examining multiple cut-points might be a more powerful technique to investigate the association of gene expression with outcome.
We explored gene expression profiling data using variable cut-point analysis for 36 genes with reported prognostic value in DLBCL. We plotted two-group survival logrank test statistics against corresponding cut-points of the gene expression levels and smooth estimates of the hazard ratio of death versus gene expression levels. To facilitate comparisons we also standardized the expression of each of the genes by the fraction of patients that would be identified by any cut-point. A multiple comparison adjusted permutation p-value identified 3 different patterns of significance: 1) genes with significant cut-points below the median, whose loss is associated with poor outcome (e.g. HLA-DR); 2) genes with significant cut-points above the median, whose over-expression is associated with poor outcome (e.g. CCND2); and 3) genes with significant cut-points on either side of the median (e.g. extracellular molecules such as FN1).
Variable cut-point analysis with permutation p-value calculation can be used to identify significant genes that would not otherwise be identified with median cut-points and may suggest biological patterns of gene effects.
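The variable cut-point idea described above can be sketched as a scan that dichotomizes patients at each candidate expression value and computes a two-group log-rank statistic. This is a simplified illustration on invented data with assumed function names; it omits the permutation-based multiple comparison adjustment and the hazard ratio smoothing that the study used.

```python
def logrank_chi2(times, events, in_group1):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    obs_minus_exp = 0.0
    var = 0.0
    for t in sorted({tt for tt, e in zip(times, events) if e}):
        n = sum(1 for tt in times if tt >= t)  # at risk, both groups
        n1 = sum(1 for tt, g in zip(times, in_group1) if tt >= t and g)
        d = sum(1 for tt, e in zip(times, events) if tt == t and e)
        d1 = sum(1 for tt, e, g in zip(times, events, in_group1)
                 if tt == t and e and g)
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / var if var > 0 else 0.0


def scan_cutpoints(expression, times, events, min_group=2):
    """Return [(cut-point, log-rank statistic)] for every usable cut-point."""
    results = []
    for cut in sorted(set(expression)):
        high = [x > cut for x in expression]
        n_high = sum(high)
        if min(n_high, len(high) - n_high) < min_group:
            continue  # skip cut-points that leave a group too small
        results.append((cut, logrank_chi2(times, events, high)))
    return results


# Invented data: expression of one gene, survival time (months), death flag.
expression = [1, 2, 3, 4, 5, 6, 7, 8]
times = [40, 38, 35, 30, 12, 10, 8, 5]
events = [0, 0, 1, 0, 1, 1, 1, 1]
results = scan_cutpoints(expression, times, events)
```

Because the largest statistic is chosen after looking at many cut-points, its nominal p-value is too optimistic, which is why an adjusted permutation p-value is needed in practice.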
Bladder cancer is one of the most frequent malignancies in developed countries and it is also characterized by a high number of recurrences. Despite this, several authors in the past reported that only two altered molecular pathways may genetically explain all cases of bladder cancer: one involving the FGFR3 gene, and the other involving the TP53 gene. Mutations in either of these two genes are usually predictive of the final outcome of the malignancy. This cancer may also be further classified into low-grade tumors, which are always papillary and in most cases superficial, and high-grade tumors, which are not necessarily papillary and are often invasive. This simple way of considering this pathology has strongly changed in the last few years, with the development of genome-wide studies on expression profiling and the discovery of small non-coding RNA affecting gene expression. A simple search in the OMIM (Online Mendelian Inheritance in Man) database using “bladder cancer” as a query reveals that approximately 150 genes are in some way connected to this pathology, and some authors report that altered gene expression (up- or down-regulation) in this disease may involve up to 500 coding sequences for low-grade tumors and up to 2300 for high-grade tumors. In many clinical cases, mutations inside the coding sequences of the two genes mentioned above were not found, but their expression changed; this indicates that epigenetic modifications may also play an important role in the development of the disease. Indeed, several reports were published about genome-wide methylation in these neoplastic tissues, and an increasing number of small non-coding RNAs are either up- or down-regulated in bladder cancer, indicating that impaired gene expression may also pass through these metabolic pathways. Taken together, these data reveal that bladder cancer is far from being a simple model of malignancy.
In the present review, we summarize recent progress in the genome-wide analysis of bladder cancer, and analyse non-genetic, genetic and epigenetic factors causing extensive gene mis-regulation in malignant cells.
Bladder carcinoma; urinary tract; NMIBC; MIBC; carcinoma in situ; CIS; FGFR3; TP53; epigenetics; small non-coding RNA; environmental causes of bladder carcinoma
Motivation: Microarrays are being increasingly used in cancer research to better characterize and classify tumors by selecting marker genes. However, as very few of these genes have been validated as predictive biomarkers so far, it is mostly conventional clinical and pathological factors that are being used as prognostic indicators of clinical course. Combining clinical data with gene expression data may add valuable information, but it is a challenging task due to their categorical versus continuous characteristics. We have further developed the mixture of experts (ME) methodology, a promising approach to tackle complex non-linear problems. Several variants are proposed in integrative ME as well as the inclusion of various gene selection methods to select a hybrid signature.
Results: We show on three cancer studies that prediction accuracy can be improved when combining both types of variables. Furthermore, the selected genes were found to be of high relevance and can be considered as potential biomarkers for the prognostic selection of cancer therapy.
Availability: Integrative ME is implemented in the R package integrativeME (http://cran.r-project.org/).
Supplementary information: Supplementary data are available at Bioinformatics online.
We investigate whether annotation of gene function can be improved using a classification scheme that is aware that functional classes are organized in a hierarchy. The classifiers look at phylogenetic descriptors, sequence-based attributes, and predicted secondary structure. We discuss three Bayesian models and compare their performance in terms of predictive accuracy. These models are the ordinary multinomial logit (MNL) model, a hierarchical model based on a set of nested MNL models, and an MNL model with a prior that introduces correlations between the parameters for classes that are nearby in the hierarchy. We also provide a new scheme for combining different sources of information. We use these models to predict the functional class of Open Reading Frames (ORFs) from the E. coli genome.
The results from all three models show substantial improvement over previous methods, which were based on the C5 decision tree algorithm. The MNL model using a prior based on the hierarchy outperforms both the non-hierarchical MNL model and the nested MNL model. In contrast to previous attempts at combining the three sources of information in this dataset, our new approach to combining data sources produces a higher accuracy rate than applying our models to each data source alone.
Together, these results show that gene function can be predicted with higher accuracy than previously achieved, using Bayesian models that incorporate suitable prior information.
Statistical modelling, in combination with genome-wide expression profiling techniques, has demonstrated that the molecular state of the tumour is sufficient to infer its pathological state. These studies have been extremely important in diagnostics and have contributed to improving our understanding of tumour biology. However, their importance in in-depth understanding of cancer patho-physiology may be limited since they do not explicitly take into consideration the fundamental role of the tissue microenvironment in specifying tumour physiology. Because of the importance of normal cells in shaping the tissue microenvironment, we formulate the hypothesis that molecular components of the profile of normal epithelial cells adjacent to the tumour are predictive of tumour physiology. We addressed this hypothesis by developing statistical models that link gene expression profiles representing the molecular state of adjacent normal epithelial cells to tumour features in prostate cancer. Furthermore, network analysis showed that predictive genes are linked to the activity of important secreted factors, which have the potential to influence tumour biology, such as IL1, IGF1, PDGF BB, AGT, and TGFβ.
Carcinoma in situ (CIS) is believed to be a precursor of invasive bladder cancer. Identification of CIS is a valuable prognostic factor since radical treatment strategies can be offered to these patients before the disease becomes invasive.
We developed a pathway-based classifier approach to predict presence or absence of CIS in patients suffering from non-muscle invasive bladder cancer. From Ingenuity Pathway Analysis we considered four canonical signalling pathways (p38 MAPK, FGF, calcium, and cAMP pathways) with the most coherent expression of transcription factors (TFs) across samples in a set of twenty-eight non-muscle invasive bladder carcinomas. These pathways contained twelve TFs in total. We used the expression of the TFs to predict presence or absence of CIS in a leave-one-out cross validation classification.
We showed that TF expression levels in three pathways (FGF, p38 MAPK, and calcium signalling) or the expression of the twelve TFs together could be used to predict presence or absence of concomitant CIS. A cluster analysis based on expression of the twelve TFs separated the samples into two main clusters: one branch contained 11 of the 15 patients without concomitant CIS, with the majority of the genes being down-regulated; the other branch contained 10 of 13 patients with concomitant CIS, and here genes were mostly up-regulated. The expression in the CIS group was comparable to the expression of twenty-three patients suffering from muscle-invasive bladder carcinoma. Finally, we validated our results in an independent test set and found that prediction of CIS status was possible using TF expression of the p38 MAPK pathway.
We conclude that it is possible to use pathway analysis for molecular classification of bladder tumors.
Molecular characterisation using gene-expression profiling will undoubtedly improve the prediction of treatment responses, and ultimately, the clinical outcome of cancer patients.
To establish a procedure for identifying responders to FOLFOX therapy, 83 colorectal cancer (CRC) patients, including 42 responders and 41 non-responders, were divided into training (54 patients) and test (29 patients) sets. Using the Random Forests (RF) algorithm on the training set, predictor genes for FOLFOX therapy were identified and then applied to the test samples; sensitivity, specificity, and out-of-bag classification accuracy were calculated.
In the training set, 22 of 27 responders (81.4% sensitivity) and 23 of 27 non-responders (85.1% specificity) were correctly classified. To improve the prediction model, we removed the outliers determined by RF; the model then correctly classified 21 of 23 responders (91.3%) and 22 of 23 non-responders (95.6%) in the training set, and achieved 80.0% sensitivity and 92.8% specificity, with an accuracy of 69.2%, in 29 independent test samples.
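As a worked check of how the training-set figures above follow from the reported counts, the sketch below recomputes sensitivity and specificity. The function and variable names are illustrative only, and treating responders as the positive class is an assumption consistent with how the percentages are labelled.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual positives (here: responders) correctly identified."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of actual negatives (here: non-responders) correctly identified."""
    return true_neg / (true_neg + false_pos)


# Counts quoted in the text for the full training set (27 + 27 patients).
train_sens = sensitivity(true_pos=22, false_neg=27 - 22)
train_spec = specificity(true_neg=23, false_pos=27 - 23)

# Counts quoted after removal of RF-determined outliers (23 + 23 patients).
trimmed_sens = sensitivity(true_pos=21, false_neg=23 - 21)
trimmed_spec = specificity(true_neg=22, false_pos=23 - 22)

for name, value in [("training sensitivity", train_sens),
                    ("training specificity", train_spec),
                    ("trimmed sensitivity", trimmed_sens),
                    ("trimmed specificity", trimmed_spec)]:
    print(f"{name}: {100 * value:.2f}%")
```

The recomputed values (about 81.48%, 85.19%, 91.30%, and 95.65%) agree with the quoted percentages if the latter were truncated rather than rounded to one decimal place.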
Random Forests applied to gene-expression data from CRC patients effectively stratified responders to FOLFOX therapy with high accuracy; the use of pharmacogenomics in anticancer therapy is the first step in planning personalised therapy.
colorectal cancer; FOLFOX therapy; machine learning algorithm; class predictor; personalised therapy
Combining clinical and molecular data types may improve the prediction accuracy of a classifier. However, currently there is a shortage of effective and efficient statistical and bioinformatic tools for true integrative data analysis. Existing integrative classifiers have two main disadvantages: First, coarse combination may lead to subtle contributions of one data type being overshadowed by more obvious contributions of the other. Second, the need to measure both data types for all patients may be both impractical and cost-inefficient.
We introduce a novel classification method, a stepwise classifier, which takes advantage of the distinct classification power of clinical data and high-dimensional molecular data. We apply classification algorithms to two data types independently, starting with the traditional clinical risk factors. We only turn to relatively expensive molecular data when the uncertainty of prediction result from clinical data exceeds a predefined limit. Experimental results show that our approach is adaptive: the proportion of samples that needs to be re-classified using molecular data depends on how much we expect the predictive accuracy to increase when re-classifying those samples.
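The two-stage decision rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the stepwiseCM implementation: the uncertainty measure (distance of the clinical probability from 0.5), the default limit of 0.2, and the names `stepwise_classify`, `molecular_prob_fn`, and `max_uncertainty` are all hypothetical choices made for the example.

```python
def stepwise_classify(clinical_prob, molecular_prob_fn, max_uncertainty=0.2):
    """Classify with clinical data first; use molecular data only when unsure.

    clinical_prob: predicted probability of the positive class from the
        clinical classifier.
    molecular_prob_fn: zero-argument callable that runs the (expensive)
        molecular classifier and returns a probability; called only if needed.
    max_uncertainty: assumed predefined limit on uncertainty, measured here
        as 1 - 2 * |p - 0.5| (0 = fully certain, 1 = coin flip).

    Returns (predicted_positive, data_type_used).
    """
    uncertainty = 1.0 - 2.0 * abs(clinical_prob - 0.5)
    if uncertainty <= max_uncertainty:
        # Clinical prediction is confident enough; skip the molecular test.
        return clinical_prob >= 0.5, "clinical"
    # Clinical prediction is too uncertain; re-classify with molecular data.
    return molecular_prob_fn() >= 0.5, "molecular"


# Made-up (clinical probability, molecular probability) pairs for three patients.
toy_patients = [(0.95, 0.90), (0.55, 0.20), (0.05, 0.40)]
results = [stepwise_classify(c, lambda m=m: m) for c, m in toy_patients]
fraction_needing_molecular = sum(used == "molecular" for _, used in results) / len(results)
```

Passing the molecular classifier as a callable reflects the cost argument in the text: the expensive measurement is triggered only for patients whose clinical prediction falls inside the uncertain band, and lowering `max_uncertainty` sends a larger share of patients to the molecular stage.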
Our method renders a more cost-efficient classifier that is at least as good as, and sometimes better than, one based on clinical or molecular data alone. Hence our approach is not just a classifier that minimizes a particular loss function. Instead, it aims to be cost-efficient by avoiding molecular tests for a potentially large subgroup of individuals; moreover, for these individuals a test result would be quickly available, which may lead to reduced waiting times (for diagnosis) and hence lower the patients' distress. Stepwise classification is implemented in the R package stepwiseCM, available at the Bioconductor website.
Identification of a molecular signature predicting the relapse of tamoxifen-treated primary breast cancers should help the therapeutic management of ER-positive cancers.
A series of 132 primary tumors from patients who received adjuvant tamoxifen were analysed for expression profiles at the whole genome level by 70-mer oligonucleotide microarrays. A supervised analysis was performed to identify an expression signature.
We defined a 36-gene signature that classified correctly 78% of patients with relapse and 80% of relapse-free patients (79% accuracy). Using 23 independent tumors, we confirmed the accuracy of the signature (78%), whose relevance was further demonstrated by using published microarray data from 60 tamoxifen-treated patients (63% accuracy).
Univariate analysis using the validation set of 83 tumors demonstrated that the 36-gene classifier was more efficient at predicting disease-free survival than the traditional histo-pathological prognostic factors and as effective as the Nottingham Prognostic Index or the “Adjuvant!” software. Multivariate analysis demonstrated that the molecular signature was the only independent prognostic factor. Comparison with several already published signatures demonstrated that the 36-gene signature was among the best to classify tumors from both training and validation sets. Kaplan-Meier analyses emphasized its prognostic power both on the whole cohort of patients and on a subgroup with an intermediate risk of recurrence as defined by the St Gallen criteria.
This study identifies a molecular signature specifying a subgroup of patients who do not gain benefits from tamoxifen treatment. These patients may therefore be eligible for alternative endocrine therapies and/or chemotherapy.
Adult; Aged; Aged, 80 and over; Antineoplastic Agents, Hormonal; therapeutic use; Breast Neoplasms; diagnosis; drug therapy; genetics; Carcinoma; diagnosis; drug therapy; genetics; Chemotherapy, Adjuvant; Cluster Analysis; Disease-Free Survival; Drug Resistance, Neoplasm; genetics; Female; Follow-Up Studies; Gene Expression Profiling; Humans; Middle Aged; Neoplasm Recurrence, Local; diagnosis; genetics; Oligonucleotide Array Sequence Analysis; Prognosis; Receptors, Estrogen; genetics; Receptors, Progesterone; genetics; Sensitivity and Specificity; Tamoxifen; therapeutic use; Treatment Outcome; gene expression profiling; classifier; tamoxifen; breast cancer
We previously developed and validated the Cancer of the Prostate Risk Assessment (CAPRA) score to predict prostate cancer recurrence based on pre-treatment clinical data. We aimed to develop a similar post-surgical score with improved accuracy via incorporation of pathologic data.
We analyzed 3837 prostatectomy patients in the CaPSURE national disease registry. Cox regression was used to determine the predictive power of preoperative prostate specific antigen (PSA), pathologic Gleason score (pGS), surgical margins (SM), extracapsular extension (ECE), seminal vesicle invasion (SVI), and lymph node invasion (LNI). Points were assigned based on the relative weights of these variables in predicting recurrence. The new post-surgical score (CAPRA-S) was tested and compared to a commonly-cited nomogram with proportional hazards analysis, the concordance (c) index, calibration plots, and decision-curve analysis.
16.8% of the men recurred; actuarial progression-free probability at 5 years was 78.0%. The CAPRA-S was determined by adding up to three points for PSA, up to three for pGS, one point each for ECE and LNI, and two points each for SM and SVI. The hazard ratio for each point increase in CAPRA-S score was 1.54 (95% CI 1.49–1.59), indicating a 2.4-fold increase in risk for each two point increase in score. The CAPRA-S c-index was 0.77, substantially higher than 0.66 for the pre-treatment CAPRA score and comparable to 0.76 for the nomogram. The CAPRA-S score performed better in both calibration and decision curve analyses.
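The point structure and the hazard arithmetic quoted above can be illustrated with a short sketch. The PSA and Gleason thresholds are not given in the text, so the function below takes already-assigned point values for those two variables; the function names are hypothetical, and the multiplicative compounding of the per-point hazard ratio is the usual proportional-hazards reading, assumed here.

```python
def capra_s_score(psa_points, pgs_points, ece, lni, sm, svi):
    """Sum CAPRA-S points from the weights stated in the text.

    psa_points, pgs_points: integers from 0 to 3, already assigned from PSA
        and pathologic Gleason score (the cut-offs are not given in the text).
    ece, lni: booleans worth one point each when present.
    sm, svi: booleans worth two points each when present.
    """
    if not 0 <= psa_points <= 3 or not 0 <= pgs_points <= 3:
        raise ValueError("PSA and pGS contribute between 0 and 3 points each")
    return psa_points + pgs_points + int(ece) + int(lni) + 2 * int(sm) + 2 * int(svi)


def relative_hazard(point_difference, hr_per_point=1.54):
    """Relative hazard implied by a score difference, assuming the per-point
    hazard ratio multiplies across points (proportional hazards)."""
    return hr_per_point ** point_difference


max_score = capra_s_score(3, 3, ece=True, lni=True, sm=True, svi=True)
two_point_increase = relative_hazard(2)  # 1.54 ** 2, i.e. roughly 2.4-fold
```

Squaring the per-point hazard ratio of 1.54 gives about 2.37, which matches the 2.4-fold increase per two points stated in the text.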
The CAPRA-S offers good discriminatory accuracy, calibration, and ease of calculation for clinical and research settings.
The pathological complete response (pCR) after neoadjuvant chemotherapy is a surrogate marker for a favorable prognosis in breast cancer patients. Factors capable of predicting a pCR, such as the proliferation marker Ki67, may therefore help improve our understanding of the drug response and its effect on the prognosis. This study investigated the predictive and prognostic value of Ki67 in patients with invasive breast cancer receiving neoadjuvant treatment.
Ki67 was stained routinely from core biopsies in 552 patients directly after the fixation and embedding process. HER2/neu, estrogen and progesterone receptors, and grading were also assessed before treatment. These data were used to construct univariate and multivariate models for predicting pCR and prognosis. The tumors were also classified by molecular phenotype to identify subgroups in which predicting pCR and prognosis with Ki67 might be feasible.
Using a cut-off value of > 13% positively stained cancer cells, Ki67 was found to be an independent predictor for pCR (OR 3.5; 95% CI, 1.4 to 10.1) and for overall survival (HR 8.1; 95% CI, 3.3 to 20.4) and distant disease-free survival (HR 3.2; 95% CI, 1.8 to 5.9). The mean Ki67 value was 50.6 ± 23.4% in patients with pCR. Patients without a pCR had an average of 26.7 ± 22.9% positively stained cancer cells.
Ki67 has predictive and prognostic value and is a feasible marker for clinical practice. It independently improved the prediction of treatment response and prognosis in a group of breast cancer patients receiving neoadjuvant treatment. As mean Ki67 values in patients with a pCR were very high, it may be hypothesized that there are cut-off values in a high range above which the prognosis is better than in patients with lower Ki67 values. Larger studies will be needed in order to investigate these findings further.
Staging inadequately predicts metastatic risk in patients with colon cancer. We used a gene expression profile derived from invasive, murine colon cancer cells that were highly metastatic in an immunocompetent mouse model to identify patients with colon cancer at risk of recurrence.
This phase 1, exploratory biomarker study used 55 patients with colorectal cancer from Vanderbilt Medical Center (VMC) as the training dataset and 177 patients from the Moffitt Cancer Center as the independent dataset. The metastasis-associated gene expression profile developed from the mouse model was refined with comparative functional genomics in the VMC gene expression profiles to identify a 34-gene classifier associated with high risk of metastasis and death from colon cancer. A metastasis score derived from the biologically based classifier was tested in the Moffitt dataset.
A high score was significantly associated with increased risk of metastasis and death from colon cancer across all pathologic stages and specifically in stage II and stage III patients. The metastasis score was shown to independently predict risk of cancer recurrence and death in univariate and multivariate models. For example, among stage III patients, a high score translated to increased relative risk of cancer recurrence (hazard ratio, 4.7; 95% confidence interval, 1.566–14.05). Furthermore, the metastasis score identified patients with stage III disease whose 5-year recurrence-free survival was >88% and for whom adjuvant chemotherapy did not increase survival time.
A gene expression profile identified from an experimental model of colon cancer metastasis predicted cancer recurrence and death, independently of conventional measures, in patients with colon cancer.
Gene Expression Profiling; Colon Cancer Prognosis; Predictive Gene Signature; Mouse Model
Drug-induced liver injury (DILI) is a significant concern in drug development due to the poor concordance between preclinical and clinical findings of liver toxicity. We hypothesized that the DILI types (hepatotoxic side effects) seen in the clinic can be translated into the development of predictive in silico models for use in the drug discovery phase. We identified 13 hepatotoxic side effects with high accuracy for classifying marketed drugs for their DILI potential. We then developed in silico predictive models for each of these 13 side effects, which were further combined to construct a DILI prediction system (DILIps). The DILIps yielded 60–70% prediction accuracy for three independent validation sets. To enhance the confidence for identification of drugs that cause severe DILI in humans, the “Rule of Three” was developed in DILIps by using a consensus strategy based on 13 models. This gave high positive predictive value (91%) when applied to an external dataset containing 206 drugs from three independent literature datasets. Using the DILIps, we screened all the drugs in DrugBank and investigated their DILI potential in terms of protein targets and therapeutic categories through network modeling. We demonstrated that two therapeutic categories, anti-infectives for systemic use and musculoskeletal system drugs, were enriched for DILI, which is consistent with current knowledge. We also identified protein targets and pathways that are related to drugs that cause DILI by using pathway analysis and co-occurrence text mining. While marketed drugs were the focus of this study, the DILIps has potential as an evaluation tool to screen and prioritize new drug candidates or chemicals, such as environmental chemicals, to avoid those that might cause liver toxicity. We expect that the methodology can also be applied to other drug safety endpoints, such as renal or cardiovascular toxicity.
Translational research involves utilization of clinical data to address challenges in drug discovery and development. The rationale behind this study is that the side effects observed in clinical trials and post-marketing surveillance can be translated into a screening system for use in drug discovery. As a proof-of-concept study, we developed an in silico system based on 13 hepatotoxic side effects to predict drug-induced liver injury (DILI), which is one of the most frequent causes of drug failure in clinical trials and of post-marketing withdrawal, and also one of the most difficult clinical endpoints to predict from preclinical studies. We first identified 13 types of liver injury which yielded high prediction accuracy in distinguishing drugs known to cause DILI from those that do not. To effectively apply these 13 hepatotoxic side effects to the drug discovery process for DILI, we developed in silico models for each of these side effects based solely on chemical structure data. Finally, we constructed a DILI prediction system (DILIps) by combining these 13 in silico models in a consensus fashion, which yielded >91% positive predictive value for DILI in humans. The DILIps methodology can be extended in applications for addressing other drug safety issues, such as renal and cardiovascular toxicity.
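A consensus call over 13 side-effect models, together with the positive predictive value used to evaluate it, can be sketched as follows. The text does not spell out the consensus rule, so the threshold of three positive models is an assumption suggested only by the name “Rule of Three”; the names `consensus_call`, `min_positive`, and the toy votes and counts are hypothetical and are not study data.

```python
def consensus_call(model_votes, min_positive=3):
    """Flag a drug as DILI-positive when enough individual models agree.

    model_votes: iterable of booleans, one per side-effect model.
    min_positive: assumed consensus threshold (not stated in the text).
    """
    return sum(bool(v) for v in model_votes) >= min_positive


def positive_predictive_value(true_pos, false_pos):
    """Fraction of drugs flagged positive that are truly positive."""
    return true_pos / (true_pos + false_pos)


# Made-up votes from 13 hypothetical models for two hypothetical drugs.
drug_a_votes = [True, True, True, False] + [False] * 9   # 3 of 13 positive
drug_b_votes = [True, False, True] + [False] * 10        # 2 of 13 positive

drug_a_flagged = consensus_call(drug_a_votes)
drug_b_flagged = consensus_call(drug_b_votes)

# Made-up counts chosen only to show the arithmetic.
toy_ppv = positive_predictive_value(true_pos=9, false_pos=1)
```

Requiring agreement among several models trades coverage for confidence: fewer drugs are flagged, but those that are flagged are more likely to be true positives, which is the property a high positive predictive value reflects.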