In cancer research, high-throughput genomic studies have been extensively conducted, searching for markers associated with cancer diagnosis, prognosis and variation in response to treatment. In this article, we analyze cancer prognosis studies and investigate ranking markers based on their marginal prognosis power. To avoid ambiguity, we focus on microarray gene expression studies where genes are the markers, but note that the methodology and results are applicable to other high-throughput studies. The objectives of this study are 2-fold. First, we investigate ranking markers under three commonly adopted semiparametric models, namely the Cox, accelerated failure time and additive risk models. Data analysis shows that the ranking may vary significantly under different models. Second, we describe a nonparametric concordance measure, which has roots in the time-dependent ROC (receiver operating characteristic) framework and relies on much weaker assumptions than the semiparametric models. In simulation, it is shown that ranking using the concordance measure is not sensitive to model specification whereas ranking under the semiparametric models is. In data analysis, the concordance measure generates rankings significantly different from those under the semiparametric models.
cancer prognosis markers; semiparametric survival analysis; concordance measure
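As a concrete illustration of ranking genes by a concordance-type quantity, the sketch below computes Harrell's C-index for a single marker under right censoring and ranks genes by how far the index lies from 0.5. This is a simpler relative of the time-dependent ROC-based measure described above, not that measure itself. The function names and the toy data are hypothetical.

```python
def harrell_c(times, events, marker):
    """Harrell's C-index for one marker with right-censored survival data.

    A pair (i, j) is usable when subject i has the shorter follow-up time
    and an observed event (events[i] == 1). The pair is concordant when
    subject i also has the larger marker value, i.e. a higher marker value
    is assumed to indicate higher risk. Ties in the marker count as 0.5.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                usable += 1
                if marker[i] > marker[j]:
                    concordant += 1.0
                elif marker[i] == marker[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


def rank_markers(times, events, expression):
    """Rank genes by |C - 0.5|, strongest marginal association first."""
    scores = {g: harrell_c(times, events, x) for g, x in expression.items()}
    return sorted(scores, key=lambda g: abs(scores[g] - 0.5), reverse=True)


# Hypothetical toy data: five subjects, two genes.
times = [2.0, 4.0, 5.0, 7.0, 9.0]
events = [1, 1, 0, 1, 0]  # 1 = event observed, 0 = censored
expression = {
    "gene_a": [5.0, 4.0, 3.0, 2.0, 1.0],
    "gene_b": [1.0, 3.0, 2.0, 5.0, 4.0],
}
ranking = rank_markers(times, events, expression)
```

Because the index uses only the ordering of marker values and survival times, it does not require a Cox, accelerated failure time or additive risk form to be specified, which is the property the abstract emphasizes.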
In breast cancer research, it is of great interest to identify genomic markers associated with prognosis. Multiple gene profiling studies have been conducted for such a purpose. Genomic markers identified from the analysis of single datasets often do not have satisfactory reproducibility. Among the multiple possible reasons, the most important one is the small sample sizes of individual studies. A cost-effective solution is to pool data from multiple comparable studies and conduct integrative analysis. In this study, we collect four breast cancer prognosis studies with gene expression measurements. We describe the relationship between prognosis and gene expressions using the accelerated failure time (AFT) models. We adopt a 2-norm group bridge penalization approach for marker identification. This integrative analysis approach can effectively identify markers with consistent effects across multiple datasets and naturally accommodate the heterogeneity among studies. Statistical and simulation studies demonstrate satisfactory performance of this approach. Breast cancer prognosis markers identified using this approach have sound biological implications and satisfactory prediction performance.
Breast cancer prognosis; Gene expression; Marker identification; Integrative analysis; 2-norm group bridge
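To make the penalty concrete, the sketch below evaluates a 2-norm group bridge penalty of the form lam * sum_j ||beta_j||_2 ** gamma, where beta_j collects the coefficients of gene j across the studies. This functional form, the default gamma = 0.5 and all names are assumptions made for illustration; the abstract does not spell out the formula.

```python
import math


def group_bridge_penalty(beta, lam=1.0, gamma=0.5):
    """Assumed 2-norm group bridge penalty: lam * sum_j ||beta_j||_2 ** gamma.

    beta[j] holds the coefficients of gene j, one entry per study.
    gamma in (0, 1) gives the bridge-type (non-convex) group penalty.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    total = 0.0
    for coefs in beta:
        norm = math.sqrt(sum(b * b for b in coefs))  # 2-norm across studies
        total += norm ** gamma
    return lam * total


# Hypothetical coefficients for three genes in four studies.
beta = [
    [3.0, 4.0, 0.0, 0.0],  # 2-norm is 5
    [0.0, 0.0, 0.0, 0.0],  # contributes nothing to the penalty
    [1.0, 1.0, 1.0, 1.0],  # 2-norm is 2
]
penalty = group_bridge_penalty(beta)
```

Under this form, selection acts on a gene's whole coefficient vector, so a gene is kept or dropped in all studies together, while the retained coefficients remain free to differ between studies.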
In breast cancer research, it is important to identify genomic markers associated with prognosis. Multiple microarray gene expression profiling studies have been conducted, searching for prognosis markers. Genomic markers identified from the analysis of single datasets often suffer a lack of reproducibility because of small sample sizes. Integrative analysis of data from multiple independent studies has a larger sample size and may provide a cost-effective solution.
We collect four breast cancer prognosis studies with gene expression measurements. An accelerated failure time (AFT) model with an unknown error distribution is adopted to describe survival. An integrative sparse boosting approach is employed for marker selection. The proposed model and boosting approach can effectively accommodate heterogeneity across multiple studies and identify genes with consistent effects.
Simulation study shows that the proposed approach outperforms alternatives including meta-analysis and intensity approaches by identifying the majority or all of the true positives, while having a low false positive rate. In the analysis of breast cancer data, 44 genes are identified as associated with prognosis. Many of the identified genes have been previously suggested as associated with tumorigenesis and cancer prognosis. The identified genes and corresponding predicted risk scores differ from those using alternative approaches. Monte Carlo-based prediction evaluation suggests that the proposed approach has the best prediction performance.
Integrative analysis may provide an effective way of identifying breast cancer prognosis markers. Markers identified using the integrative sparse boosting analysis have sound biological implications and satisfactory prediction performance.
Breast cancer prognosis; Gene Expression; Integrative analysis; Sparse boosting
Prognosis plays a pivotal role in patient management and trial design. A useful prognostic model should correctly identify important risk factors and estimate their effects. In this article, we discuss several challenges in selecting prognostic factors and estimating their effects using the Cox proportional hazards model. Although it has a flexible semiparametric form, the Cox model is not entirely exempt from model misspecification. To minimize possible misspecification, flexible modeling techniques that accommodate nonlinear effects have been proposed in place of the traditional linearity assumption. We first review several existing nonparametric estimation and selection procedures and then present a numerical study to compare the performance of parametric and nonparametric procedures. We demonstrate the impact of model misspecification on variable selection and model prediction using a simulation study and an example from a phase III trial in prostate cancer.
Cox’s Model; Model Selection; LASSO; Smoothing Splines; COSSO
Colorectal cancer prognosis is currently predicted from pathological staging, providing limited discrimination for Dukes’ stage B and C disease. Additional markers for outcome are required to help guide therapy selection for individual patients.
A multi-site single-platform microarray study was performed on 553 colorectal cancers. Gene expression changes were identified between stage A and D tumors (three training sets) and assessed as a prognosis signature in stage B and C tumors (independent test and external validation sets).
128 genes showed reproducible expression changes between three sets of stage A and D cancers. Using consistent genes, stage B and C cancers clustered into two groups resembling early-stage and metastatic tumors. A Prediction Analysis of Microarray (PAM) algorithm was developed to classify individual intermediate-stage cancers into stage A-like/good prognosis or stage D-like/poor prognosis types. For stage B patients, the treatment adjusted hazard ratio for six-year recurrence in individuals with stage D-like cancers was 10.3 (95% CI 1.3 to 80.0, P=0.011). For stage C patients, the adjusted hazard ratio was 2.9 (95% CI 1.1 to 7.6, P=0.016). Similar results were obtained for an external set of stage B and C patients. The prognosis signature was enriched for down-regulated immune response genes and up-regulated cell signaling and extracellular matrix genes. Accordingly, sparse tumor infiltration with mononuclear chronic inflammatory cells was associated with poor outcome in independent patients.
Metastasis-associated gene expression changes can be used to refine traditional outcome prediction, providing a rational approach for tailoring treatments to subsets of patients.
colorectal cancer; gene expression; outcome prediction
Prognosis is of critical interest in breast cancer research. Biomedical studies suggest that genomic measurements may have independent predictive power for prognosis. Gene profiling studies have been conducted to search for predictive genomic measurements. Genes have the inherent pathway structure, where pathways are composed of multiple genes with coordinated functions. The goal of this study is to identify gene pathways with predictive power for breast cancer prognosis. Since our goal is fundamentally different from that of existing studies, a new pathway analysis method is proposed.
The new method advances beyond existing alternatives along the following aspects. First, it can assess the predictive power of gene pathways, whereas existing methods tend to focus on model fitting accuracy only. Second, it can account for the joint effects of multiple genes in a pathway, whereas existing methods tend to focus on the marginal effects of genes. Third, it can accommodate multiple heterogeneous datasets, whereas existing methods analyze a single dataset only. We analyze four breast cancer prognosis studies and identify 97 pathways with significant predictive power for prognosis. Important pathways missed by alternative methods are identified.
The proposed method provides a useful alternative to existing pathway analysis methods. Identified pathways can provide further insights into breast cancer prognosis.
Ductal carcinoma in situ (DCIS) now represents up to 20% of breast cancer cases, yet its behaviour is still poorly understood. Morphological classifications go some way to predicting prognosis, but more sophisticated approaches are required to better tailor therapy to the individual. A number of biological molecules have been identified that appear to relate to prognosis and, in model systems, promote progression to invasive disease. Some of these, such as COX-2, provide real therapeutic opportunities, whilst other marker combinations are showing promise in categorising women according to risk. Gene expression studies have led to an emerging molecular classification of invasive breast cancer, and it is now evident that at least some of these molecular subtypes can be identified at the pre-invasive stage. The difference in frequency of these subtypes between DCIS and invasive cancer may hold clues as to the biological mechanisms underpinning disease transition. It is increasingly clear that the host microenvironment can have a major impact on disease behaviour, and as well as acting as potential predictive factors, the altered microenvironment phenotype also offers novel therapeutic opportunities.
DCIS; Linear progression; Parallel progression; Molecular classification; Microenvironment; Myoepithelial cells
One of the major tenets in breast cancer research is that early detection is vital for patient survival by increasing treatment options. To that end, we have previously used a novel unsupervised approach to identify a set of genes whose expression predicts prognosis of breast cancer patients. The predictive genes were selected in a well-defined three dimensional (3D) cell culture model of non-malignant human mammary epithelial cell morphogenesis as down-regulated during breast epithelial cell acinar formation and cell cycle arrest. Here we examine the ability of this gene signature (3D-signature) to predict prognosis in three independent breast cancer microarray datasets having 295, 286, and 118 samples, respectively.
Our results show that the 3D-signature accurately predicts prognosis in three unrelated patient datasets. At 10 years, the probability of positive outcome was 52, 51, and 47 percent in the group with a poor-prognosis signature and 91, 75, and 71 percent in the group with a good-prognosis signature for the three datasets, respectively (Kaplan-Meier survival analysis, p<0.05). Hazard ratios for poor outcome were 5.5 (95% CI 3.0 to 12.2, p<0.0001), 2.4 (95% CI 1.6 to 3.6, p<0.0001) and 1.9 (95% CI 1.1 to 3.2, p = 0.016) and remained significant for the two larger datasets when corrected for estrogen receptor (ER) status. Hence the 3D-signature accurately predicts breast cancer outcome in both ER-positive and ER-negative tumors, though individual genes differed in their prognostic ability in the two subtypes. Genes that were prognostic in ER+ patients are AURKA, CEP55, RRM2, EPHA2, FGFBP1, and VRK1, while genes prognostic in ER− patients include ACTB, FOXM1 and SERPINE2 (Kaplan-Meier p<0.05). Multivariable Cox regression analysis in the largest dataset showed that the 3D-signature was a strong independent factor in predicting breast cancer outcome.
The 3D-signature accurately predicts breast cancer outcome across multiple datasets and holds prognostic value for both ER-positive and ER-negative breast cancer. The signature was selected using a novel biological approach and hence holds promise to represent the key biological processes of breast cancer.
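Outcome probabilities such as the 10-year figures quoted above are read off Kaplan-Meier curves. The following is a minimal sketch of the product-limit estimator on hypothetical toy data; it is not the authors' analysis code, and the function names are illustrative.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    events[i] is 1 for an observed event and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def survival_at(curve, t):
    """Estimated survival probability at time t (step function lookup)."""
    s = 1.0
    for time, surv in curve:
        if time <= t:
            s = surv
        else:
            break
    return s


# Hypothetical follow-up times (years) and event indicators.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

Applying the estimator separately to the good-prognosis and poor-prognosis groups and evaluating `survival_at(curve, 10)` would give figures of the kind reported for each dataset.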
Extensive biomedical studies have shown that clinical and environmental risk factors may not have sufficient predictive power for cancer prognosis. The development of high-throughput profiling technologies makes it possible to survey the whole genome and search for genomic markers with predictive power. Many existing studies assume the interchangeability of gene effects and ignore the coordination among them.
We adopt the weighted co-expression network to describe the interplay among genes. Although there are several different ways of defining gene networks, the weighted co-expression network may be preferred because of its computational simplicity, satisfactory empirical performance, and because it does not demand additional biological experiments. For cancer prognosis studies with gene expression measurements, we propose a new marker selection method that can properly incorporate the network connectivity of genes. We analyze six prognosis studies on breast cancer and lymphoma. We find that the proposed approach can identify genes that are significantly different from those using alternatives. We search published literature and find that genes identified using the proposed approach are biologically meaningful. In addition, they have better prediction performance and reproducibility than genes identified using alternatives.
The network contains important information on the functionality of genes. Incorporating the network structure can improve cancer marker identification.
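A weighted co-expression network is commonly built by raising absolute pairwise correlations to a power, and a gene's connectivity is then the sum of its adjacencies. The sketch below assumes that construction, with a hypothetical soft-threshold power of 6 and toy expression values; it illustrates the network definition only, not the marker selection method proposed above.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency(expr, power=6):
    """Weighted adjacency a_gh = |cor(x_g, x_h)| ** power for g != h."""
    genes = list(expr)
    adj = {g: {} for g in genes}
    for g in genes:
        for h in genes:
            if g != h:
                adj[g][h] = abs(pearson(expr[g], expr[h])) ** power
    return adj


def connectivity(adj):
    """Connectivity of each gene: the sum of its adjacencies."""
    return {g: sum(nbrs.values()) for g, nbrs in adj.items()}


# Hypothetical expression values for four genes in four samples.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with g1
    "g3": [4.0, 3.0, 2.0, 1.0],    # perfectly anti-correlated with g1
    "g4": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with the other three
}
k = connectivity(adjacency(expr))
```

Only expression data are needed to compute these quantities, which reflects the point made above that the weighted co-expression network does not demand additional biological experiments.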
Identification of novel cancer genes for molecular therapy and diagnosis is a current focus of breast cancer research. Although a few small gene sets have been identified as prognosis classifiers, more powerful models are still needed to define effective gene sets for diagnosis and treatment guidance in breast cancer. In the present study, we have developed a novel statistical approach for systematic analysis of intrinsic correlations of gene expression between development and tumorigenesis in the mammary gland. Based on this analysis, we constructed a predictive model for prognosis in breast cancer that may be useful for therapy decisions. We first defined developmentally associated genes from a mouse mammary gland epithelial gene expression database. Then, we found that the cancer-modulated genes were enriched in this list of developmentally associated genes. Furthermore, the developmentally associated genes had a specific expression profile, which was associated with the molecular characteristics and histological grade of the tumor. These results suggested that the processes of mammary gland development and tumorigenesis share gene regulatory mechanisms. The list of regulatory genes involved in both the developmental and tumorigenesis processes was then defined as an 835-member prognosis classifier, which showed an exciting ability to predict the clinical outcome of three groups of breast cancer patients (predictive accuracy 64∼72%) with robust prognosis prediction (hazard ratio 3.3∼3.8, higher than that of other clinical risk factors, which was around 2.0–2.8). In conclusion, our results identified the conserved molecular mechanisms between mammary gland development and neoplasia, and provided a unique potential model for mining unknown cancer genes and predicting the clinical status of breast tumors. These findings also suggested that the developmental roles of genes may be important criteria for selecting genes for prognosis prediction in breast cancer.
One of the major goals in gene and protein expression profiling of cancer is to identify biomarkers and build classification models for prediction of disease prognosis or treatment response. Many traditional statistical methods, based on microarray gene expression data alone and individual genes' discriminatory power, often fail to identify biologically meaningful biomarkers thus resulting in poor prediction performance across data sets. Nonetheless, the variables in multivariable classifiers should synergistically interact to produce more effective classifiers than individual biomarkers.
We developed an integrated approach, namely network-constrained support vector machine (netSVM), for cancer biomarker identification with an improved prediction performance. The netSVM approach is specifically designed for network biomarker identification by integrating gene expression data and protein-protein interaction data. We first evaluated the effectiveness of netSVM using simulation studies, demonstrating its improved performance over state-of-the-art network-based methods and gene-based methods for network biomarker identification. We then applied the netSVM approach to two breast cancer data sets to identify prognostic signatures for prediction of breast cancer metastasis. The experimental results show that: (1) network biomarkers identified by netSVM are highly enriched in biological pathways associated with cancer progression; (2) prediction performance is much improved when tested across different data sets. Specifically, many genes related to apoptosis, cell cycle, and cell proliferation, which are hallmark signatures of breast cancer metastasis, were identified by the netSVM approach. More importantly, several novel hub genes, biologically important with many interactions in PPI network but often showing little change in expression as compared with their downstream genes, were also identified as network biomarkers; the genes were enriched in signaling pathways such as TGF-beta signaling pathway, MAPK signaling pathway, and JAK-STAT signaling pathway. These signaling pathways may provide new insight to the underlying mechanism of breast cancer metastasis.
We have developed a network-based approach for cancer biomarker identification, netSVM, resulting in an improved prediction performance with network biomarkers. We have applied the netSVM approach to breast cancer gene expression data to predict metastasis in patients. Network biomarkers identified by netSVM reveal potential signaling pathways associated with breast cancer metastasis, and help improve the prediction performance across independent data sets.
The use of molecular markers and gene expression profiling provides a promising approach for improving the predictive accuracy of current prognostic indices for predicting which patients with non-muscle-invasive bladder cancer will progress to muscle-invasive disease. There are many statistical pitfalls in establishing the benefit of a multigene expression classifier during its development. First, there are issues related to the identification of the individual genes and the false discovery rate, the instability of the genes identified and their combination into a classifier. Secondly, the classifier should be validated, preferably on an independent data set, to show its reproducibility. Next, it is necessary to show that adding the classifier to an existing model based on the most important clinical and pathological factors improves the predictive accuracy of the model. This cannot be determined based on the classifier's hazard ratio or p-value in a multivariate model, but should be assessed based on an improvement in statistics such as the area under the curve and the concordance index. Finally, nomograms are superior to stage and risk group classifications for predicting outcome, but the model predicting the outcome must be well calibrated. It is important for investigators to be aware of these pitfalls in order to develop statistically valid classifiers that will truly improve our ability to predict a patient's risk of progression.
Area under the curve; biostatistics; molecular profile; nomograms; non-muscle-invasive bladder cancer; predictive accuracy; prognosis; progression; validation
A large number of gene expression profiling (GEP) studies on the prognosis of colorectal cancer (CRC) have been performed, but no reliable gene signature for prediction of CRC prognosis has been found. Bioinformatic enrichment tools are a powerful approach to identify biological processes in high-throughput data analysis.
We have for the first time collected the results from the 23 independent GEP studies on CRC prognosis published so far. In these 23 studies, 1475 unique, mapped genes were identified, of which 124 (8.4%) were reported in at least two studies, with 54 of them showing a consistent direction of expression change across the individual studies. Using these data, we attempted to overcome the lack of reproducibility observed in the genes reported in individual GEP studies by carrying out a pathway-based enrichment analysis. We used up to ten tools for overrepresentation analysis of Gene Ontology (GO) categories or Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways in each of the three gene lists (1475, 124 and 54 genes). This strategy, based on testing multiple tools, allowed us to identify the oxidative phosphorylation chain and the extracellular matrix receptor interaction categories, as well as a general category related to cell proliferation and apoptosis, as the only pathways that were significantly and consistently overrepresented in the three gene lists and reported by several enrichment tools.
Our pathway-based enrichment analysis of 23 independent gene expression profiling studies on prognosis of CRC identified significantly and consistently overrepresented prognostic categories for CRC. These overrepresented categories are clearly functionally related to cancer progression and deserve further investigation.
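Overrepresentation of a category in a gene list is typically assessed with a one-sided hypergeometric (Fisher-type) test. The sketch below shows that calculation under this assumption; the abstract does not state which test each of the enrichment tools applies, and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric.

    N: genes in the background, K: background genes annotated to the
    category, n: genes in the list, k: list genes annotated to the category.
    """
    total = comb(N, n)
    top = min(K, n)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, top + 1))
    return hits / total


# Hypothetical counts: a background of 20 genes, 5 in the category,
# and a list of 4 genes of which 2 fall in the category.
p_value = hypergeom_upper_tail(N=20, K=5, n=4, k=2)

# Share of the 1475 reported genes that appear in at least two studies,
# given in the text as 8.4%.
share_replicated = 124 / 1475
```

Running such a test for every GO category or KEGG pathway on each of the three gene lists, followed by a multiple-testing correction, gives the kind of output that the enrichment tools report.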
Hepatocellular carcinoma (HCC) is a leading cause of cancer-related death worldwide. The recurrence of HCC after curative treatments is currently a major hurdle. Identification of subsets of patients with distinct prognosis provides an opportunity to tailor therapeutic approaches as well as to select the patients with specific sub-phenotypes for targeted therapy. Thus, the development of gene expression profiles to improve the prediction of HCC prognosis is important for HCC management. Although several gene signatures have been evaluated for the prediction of HCC prognosis, there is no consensus on the predictive power of these signatures. Using systematic approaches to evaluate these signatures and combine them with clinicopathologic information may provide more accurate prediction of HCC prognosis. Recently, Villanueva et al developed a composite prognostic model incorporating gene expression patterns in both tumor and adjacent tissues to predict HCC recurrence. In this commentary, we summarize the current progress in using gene signatures to predict HCC prognosis, and discuss the importance, existing issues and future research directions in this field.
Gene expression signatures; Hepatocellular carcinoma; Prognosis
High-throughput gene profiling studies have been extensively conducted, searching for markers associated with cancer development and progression. In this study, we analyse cancer prognosis studies with right censored survival responses. With gene expression data, we adopt the weighted gene co-expression network analysis (WGCNA) to describe the interplay among genes. In network analysis, nodes represent genes. There are subsets of nodes, called modules, which are tightly connected to each other. Genes within the same modules tend to have co-regulated biological functions. For cancer prognosis data with gene expression measurements, our goal is to identify cancer markers, while properly accounting for the network module structure. A two-step sparse boosting approach, called Network Sparse Boosting (NSBoost), is proposed for marker selection. In the first step, for each module separately, we use a sparse boosting approach for within-module marker selection and construct module-level ‘super markers’. In the second step, we use the super markers to represent the effects of all genes within the same modules and conduct module-level selection using a sparse boosting approach. A simulation study shows that NSBoost can more accurately identify cancer-associated genes and modules than alternatives. In the analysis of breast cancer and lymphoma prognosis studies, NSBoost identifies genes with important biological implications. It outperforms alternatives including the boosting and penalization approaches by identifying a smaller number of genes/modules and/or having better prediction performance.
This review deals with the application of a new prefractionation tool, free-flow electrophoresis (FFE), for proteomic analysis of colorectal cancer (CRC). CRC is a leading cause of cancer death in the Western world. Early detection is the single most important factor influencing outcome of CRC patients. If identified while the disease is still localized, CRC is treatable. To improve outcomes for CRC patients there is a pressing need to identify biomarkers for early detection (diagnostic markers), prognosis (prognostic indicators), tumour responses (predictive markers) and disease recurrence (monitoring markers). Despite recent advances in the use of genomic analysis for risk assessment, in the area of biomarker identification genomic methods alone have yet to produce reliable candidate markers for CRC. For this reason, attention is being directed towards proteomics as a complementary analytical tool for biomarker identification. Here we describe a proteomics separation tool, which uses a combination of continuous FFE, a liquid-based isoelectric focusing technique, in the first dimension, followed by rapid reversed-phase HPLC (1–6 min/analysis) in the second dimension. We have optimized imaging software to present the FFE/RP-HPLC data in a virtual 2D gel-like format. The advantage of this liquid based fractionation system over traditional gel-based fractionation systems is the ability to fractionate large quantity protein samples. Unlike 2D gels, the method is applicable to both high-Mr proteins and small peptides, which are difficult to separate, and in the case of peptides, are not retained in standard 2D gels.
High-throughput genomic technologies have identified biomarkers and potential therapeutic targets for ovarian cancer. Comprehensive functional validation studies of the biological and clinical implications of these biomarkers are needed to advance them toward clinical use. Amplification of chromosomal region 5q31–5q35.3 has been used to predict poor prognosis in patients with advanced stage, high-grade serous ovarian cancer. In this study, we further dissected this large amplicon and identified the overexpression of FGF18 as an independent predictive marker for poor clinical outcome in this patient population. Using cell culture and xenograft models, we show that FGF18 signaling promoted tumor progression by modulating the ovarian tumor aggressiveness and microenvironment. FGF18 controlled migration, invasion, and tumorigenicity of ovarian cancer cells through NF-κB activation, which increased the production of oncogenic cytokines and chemokines. This resulted in a tumor microenvironment characterized by enhanced angiogenesis and augmented tumor-associated macrophage infiltration and M2 polarization. Tumors from ovarian cancer patients had increased FGF18 expression levels with microvessel density and M2 macrophage infiltration, confirming our in vitro results. These findings demonstrate that FGF18 is important for a subset of ovarian cancers and may serve as a therapeutic target.
Semiparametric transformation models provide a very general framework for studying the effects of (possibly time-dependent) covariates on survival time and recurrent event times. Assessing the adequacy of these models is an important task because model misspecification affects the validity of inference and the accuracy of prediction. In this paper, we introduce appropriate time-dependent residuals for these models and consider the cumulative sums of the residuals. Under the assumed model, the cumulative sum processes converge weakly to zero-mean Gaussian processes whose distributions can be approximated through Monte Carlo simulation. These results enable one to assess, both graphically and numerically, how unusual the observed residual patterns are in reference to their null distributions. The residual patterns can also be used to determine the nature of model misspecification. Extensive simulation studies demonstrate that the proposed methods perform well in practical situations. Three medical studies are provided for illustrations.
Goodness of fit; Martingale residuals; Model checking; Model misspecification; Model selection; Recurrent events; Survival data; Time-dependent covariate
Cox regression is commonly used to predict outcome as the time to an event of interest and, in addition, to identify relevant features for survival analysis in cancer genomics. Due to the high dimensionality of high-throughput genomic data, existing Cox models trained on any particular dataset usually generalize poorly to other independent datasets. In this paper, we propose a network-based Cox regression model called Net-Cox and apply it to a large-scale survival analysis across multiple ovarian cancer datasets. Net-Cox integrates gene network information into the Cox proportional hazards model to explore the co-expression or functional relation among high-dimensional gene expression features in the gene network. Net-Cox was applied to analyze three independent gene expression datasets including the TCGA ovarian cancer dataset and two other public ovarian cancer datasets. Net-Cox with the network information from gene co-expression or functional relations identified highly consistent signature genes across the three datasets, and because of the better generalization across the datasets, Net-Cox also consistently improved the accuracy of survival prediction over the Cox models regularized by L2-norm or L1-norm. This study focused on analyzing the death and recurrence outcomes in the treatment of ovarian carcinoma to identify signature genes that can more reliably predict the events. The signature genes comprise dense protein-protein interaction subnetworks, enriched by extracellular matrix receptors and modulators or by nuclear signaling components downstream of extracellular signal-regulated kinases.
In the laboratory validation of the signature genes, a tumor array experiment by protein staining on an independent patient cohort from Mayo Clinic showed that the protein expression of the signature gene FBN1 is a biomarker significantly associated with the early recurrence after 12 months of the treatment in the ovarian cancer patients who are initially sensitive to chemotherapy. Net-Cox toolbox is available at http://compbio.cs.umn.edu/Net-Cox/.
Network-based computational models are attracting increasing attention in studying cancer genomics because molecular networks provide valuable information on the functional organizations of molecules in cells. Survival analysis mostly with the Cox proportional hazard model is widely used to predict or correlate gene expressions with time to an event of interest (outcome) in cancer genomics. Surprisingly, network-based survival analysis has not received enough attention. In this paper, we studied resistance to chemotherapy in ovarian cancer with a network-based Cox model, called Net-Cox. The experiments confirm that networks representing gene co-expression or functional relations can be used to improve the accuracy and the robustness of survival prediction of outcome in ovarian cancer treatment. The study also revealed subnetwork signatures that are enriched by extracellular matrix receptors and modulators and the downstream nuclear signaling components of extracellular signal-regulators, respectively. In particular, FBN1, which was detected as a signature gene of high confidence by Net-Cox with network information, was validated as a biomarker for predicting early recurrence in platinum-sensitive ovarian cancer patients in laboratory.
Despite continual efforts to develop prognostic and predictive models of colorectal cancer by using clinicopathological and genetic parameters, a clinical test that can discriminate between patients with good or poor outcome after treatment has not been established. Thus, the authors aim to uncover subtypes of colorectal cancer that have distinct biological characteristics associated with prognosis and identify potential biomarkers that best reflect the biological and clinical characteristics of subtypes.
Unsupervised hierarchical clustering analysis was applied to gene expression data from 177 patients with colorectal cancer to determine a prognostic gene expression signature. Validation of the signature was sought in two independent patient groups. The association between the signature and prognosis of patients was assessed by Kaplan–Meier plots, log-rank tests and the Cox model.
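For readers unfamiliar with the Kaplan–Meier plots mentioned above, the following is a minimal sketch of the product-limit estimator. It is a generic textbook form, not the authors' code, and the variable names (`time`, `event` with 1 = event observed and 0 = censored) are assumptions.

```python
import numpy as np

def kaplan_meier(time, event):
    """Return distinct event times and the estimated survival probability after each."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])   # sorted distinct event times
    surv = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)                   # still under observation at t
        deaths = np.sum((time == t) & (event == 1))   # events occurring exactly at t
        s *= 1.0 - deaths / at_risk                   # product-limit update
        surv.append(s)
    return event_times, np.array(surv)
```

Computing one curve per signature subtype and plotting the two step functions gives the kind of comparison that a log-rank test then formalizes.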
The authors identified a gene signature that was associated with overall survival and disease-free survival in 177 patients and validated in two independent cohorts of 213 patients. In multivariate analysis, the signature was an independent risk factor (HR 3.08; 95% CI 1.33 to 7.14; p=0.008 for overall survival). Subset analysis of patients with AJCC (American Joint Committee on Cancer) stage III cancer revealed that the signature can also identify the patients who have better outcome with adjuvant chemotherapy (CTX). Adjuvant chemotherapy significantly affected disease-free survival in patients in subtype B (3-year rate, 71.2% (CTX) vs 41.9% (no CTX); p=0.004). However, such benefit of adjuvant chemotherapy was not significant for patients in subtype A.
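As a consistency check on the reported hazard ratio, the interval and p-value can be related through the log scale. The sketch below assumes the 95% CI is a Wald-type interval that is symmetric on the log hazard ratio scale, which the abstract does not state. The function name is illustrative.

```python
import math

def wald_from_hr_ci(hr, lower, upper, z_crit=1.959964):
    """Recover the standard error, z statistic and two-sided p-value from an HR and its CI."""
    log_hr = math.log(hr)
    # a symmetric log-scale interval has half-width z_crit * se
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = log_hr / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail probability
    return se, z, p

se, z, p = wald_from_hr_ci(3.08, 1.33, 7.14)
print(f"se={se:.3f}, z={z:.2f}, p={p:.4f}")
```

Under that assumption the recovered p-value is a little under 0.009, in line with the reported p=0.008 once rounding of the published figures is taken into account.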
The gene signature is an independent predictor of response to chemotherapy and clinical outcome in patients with colorectal cancer.
Individualized cancer treatment (e.g. targeted therapy) based on molecular alterations has emerged as an important strategy to improve the current standard-of-care chemotherapy. A large number of studies have demonstrated the importance of biomarkers not only in predicting prognosis but, more importantly, in predicting the response to therapies. For example, the amplification or mutation status of the two biomarkers HER2 (human epidermal growth factor receptor 2) and BRCA (breast cancer) can be used to decide on a specific targeted therapy in breast cancer. However, no biomarkers with a similar clinical impact have been identified in pancreatic ductal adenocarcinoma. Although many genome-wide and proteome-based high-throughput studies have identified candidate genes or proteins as promising biomarkers, none of them has eventually been transferred into the clinical setting. Notably, the most reliable markers for predicting prognosis are still tumor stage and grade, and biomarkers for therapy response remain undefined. One reason lies in the lack of systemic approaches to analyze the complexity of dominating cancer pathways and the impact of such signal complexity on prognosis and therapy response.
Pancreatic cancer; Diagnostic markers; Biomarkers; Targeted therapy; Prognosis; Pathways
Introduction. Predicting the aggressiveness of prostate cancer at biopsy is invaluable in making treatment decisions. In this paper we review the differential expression of genes and microRNAs identified through microarray analysis as potentially useful markers for prostate cancer prognosis and discuss some of the challenges associated with their development. Methods. A review of the literature was conducted through Medline. Articles were identified through searches of the following terms: “prostate cancer AND differential expression”, “prostate cancer prognosis”, and “prostate cancer AND microRNAs”. Results. Though numerous differentially expressed genes and microRNAs were identified as possible prognostic markers, the significance of several of these genes is either debated due to conflicting results or is not validated in other study populations. A few of the articles constructed predictive nomograms using a panel of biomarkers which require further validation. Challenges to the development of useful markers include different methodology, cancer heterogeneity, and sampling error. These can be overcome by categorizing prognostic factors into particular gene pathways or by supplementing biopsy information with blood or urine-based biomarkers. Conclusion. Though biomarkers based on differential expression offer the potential to improve decision making concerning prostate cancer, further validation of their utility and accuracy at the biopsy level is needed.
To identify transcriptional profiles predictive of the clinical benefit of cisplatin and fluorouracil (CF) chemotherapy to gastric cancer patients, endoscopic biopsy samples from 96 CF-treated metastatic gastric cancer patients were prospectively collected before therapy and analyzed using high-throughput transcriptional profiling and array comparative genomic hybridization. Transcriptional profiling identified 917 genes that are correlated with poor patient survival after CF at P<0.05 (poor prognosis signature), in which protein synthesis and DNA replication/recombination/repair functional categories are enriched. A survival risk predictor was then constructed using genes, which are included in the poor prognosis signature and are contained within identified genomic amplicons. The combined expression of three genes—MYC, EGFR and FGFR2—was an independent predictor for overall survival of 27 CF-treated patients in the validation set (adjusted P=0.017), and also for survival of 40 chemotherapy-treated gastric cancer patients in a published data set (adjusted P=0.026). Thus, combined expression of MYC, EGFR and FGFR2 is predictive of poor survival in CF-treated metastatic gastric cancer patients.
gastric; cancer; chemotherapy; gene; expression
In high-throughput cancer genomic studies, markers identified from the analysis of single data sets often suffer a lack of reproducibility because of the small sample sizes. An ideal solution is to conduct large-scale prospective studies, which are extremely expensive and time consuming. A cost-effective remedy is to pool data from multiple comparable studies and conduct integrative analysis. Integrative analysis of multiple data sets is challenging because of the high dimensionality of genomic measurements and heterogeneity among studies. In this article, we propose a sparse boosting approach for marker identification in integrative analysis of multiple heterogeneous cancer diagnosis studies with gene expression measurements. The proposed approach can effectively accommodate the heterogeneity among multiple studies and identify markers with consistent effects across studies. Simulation shows that the proposed approach has satisfactory identification results and outperforms alternatives including an intensity approach and meta-analysis. The proposed approach is used to identify markers of pancreatic cancer and liver cancer.
Cancer genomics; Marker identification; Sparse boosting
Glioblastoma multiforme (GBM) is the most common and aggressive brain tumor with poor clinical outcome. Identification and development of new markers could be beneficial for the diagnosis and prognosis of GBM patients. Deregulation of microRNAs (miRNAs or miRs) is involved in GBM. Therefore, we attempted to identify and develop specific miRNAs as prognostic and predictive markers for GBM patient survival.
Expression profiles of miRNAs and genes and the corresponding clinical information of 480 GBM samples from The Cancer Genome Atlas (TCGA) dataset were downloaded, and miRNAs of interest were identified. The associations of patients’ overall survival (OS) and progression-free survival (PFS) with the miRNAs of interest and with miRNA-interactions were assessed by Kaplan-Meier survival analysis. The impacts of miRNA expression and miRNA-interactions on survival were evaluated with the Cox proportional hazards regression model. Biological processes and networks of putative and validated miRNA targets were analyzed by bioinformatics.
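Comparing survival between patients with high and low levels of a miRNA is commonly done with a two-group log-rank test. The sketch below is a generic version of that test, not the authors' pipeline. How the study defined "high" and "low" is not given here, so the `group` labels (1 = high, 0 = low) are assumed to be supplied by the caller. The code also assumes both groups are represented among the patients at risk at one or more event times.

```python
import math
import numpy as np

def logrank_test(time, event, group):
    """Two-group log-rank test; returns the chi-square statistic (1 df) and its p-value."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    group = np.asarray(group, dtype=int)
    obs_minus_exp = 0.0
    var = 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()                         # patients at risk at t
        n1 = (at_risk & (group == 1)).sum()       # of those, in group 1
        at_t = (time == t) & (event == 1)
        d = at_t.sum()                            # events at t
        d1 = (at_t & (group == 1)).sum()          # events at t in group 1
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:                                 # hypergeometric variance term
            var += d * (n1 / n) * (1.0 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2.0))          # upper tail of chi-square with 1 df
    return chi2, p
```

A call such as `logrank_test(os_months, died, high_mir)` with hypothetical arrays would then give the p-value that accompanies a pair of Kaplan-Meier curves.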
In this study, 6 miRNAs of interest were identified. Survival analysis showed that high levels of miR-326/miR-130a and low levels of miR-323/miR-329/miR-155/miR-210 were significantly associated with long OS of GBM patients, and that high miR-326/miR-130a and low miR-155/miR-210 were related to extended PFS. Moreover, miR-323 and miR-329 were found to be increased in patients with no recurrence or a long time to progression (TTP). More notably, our analysis revealed that miRNA-interactions were more specific and accurate in discriminating and predicting OS and PFS. These interactions stratified OS and PFS by miRNA level in more detail and yielded a wider span of mean survival than any single miRNA. Moreover, miR-326, miR-130a, miR-155, miR-210 and 4 miRNA-interactions were confirmed for the first time as independent predictors of survival by a Cox regression model that also included the clinicopathological factors age, gender and recurrence. In addition, the feasibility and rationale of the miRNA-interactions as predictors of survival were further supported by network, biological process and KEGG pathway analyses and by correlation analysis with gene markers.
Our results demonstrate that miR-326, miR-130a, miR-155, miR-210 and the 4 miRNA-interactions could serve as prognostic and predictive markers for survival of GBM patients, suggesting a potential application in the improvement of prognostic tools and treatments.
Glioblastoma multiforme; microRNA; Prognostic marker; Overall survival; Progression-free survival; Interaction