Related Articles

1.  Ranking prognosis markers in cancer genomic studies 
Briefings in Bioinformatics  2010;12(1):33-40.
In cancer research, high-throughput genomic studies have been extensively conducted, searching for markers associated with cancer diagnosis, prognosis and variation in response to treatment. In this article, we analyze cancer prognosis studies and investigate ranking markers based on their marginal prognosis power. To avoid ambiguity, we focus on microarray gene expression studies where genes are the markers, but note that the methodology and results are applicable to other high-throughput studies. The objectives of this study are 2-fold. First, we investigate ranking markers under three commonly adopted semiparametric models, namely the Cox, accelerated failure time and additive risk models. Data analysis shows that the ranking may vary significantly under different models. Second, we describe a nonparametric concordance measure, which has roots in the time-dependent ROC (receiver operating characteristic) framework and relies on much weaker assumptions than the semiparametric models. In simulation, it is shown that ranking using the concordance measure is not sensitive to model specification whereas ranking under the semiparametric models is. In data analysis, the concordance measure generates rankings significantly different from those under the semiparametric models.
PMCID: PMC3030811  PMID: 21087949
cancer prognosis markers; semiparametric survival analysis; concordance measure
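To make the idea of ranking markers by marginal concordance concrete, the following is a minimal sketch. It does not implement the time-dependent ROC-based measure described in the abstract; it substitutes the simpler Harrell-type concordance index as a stand-in. The function names `concordance_index` and `rank_markers` and the toy data layout (a dict mapping gene names to expression values) are assumptions made for illustration only.

```python
from itertools import combinations


def concordance_index(times, events, scores):
    """Harrell-type C: share of usable pairs in which the subject with the
    shorter observed time also has the higher marker value (ties count 0.5)."""
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times and pairs whose shorter time is censored.
        if times[a] == times[b] or not events[a]:
            continue
        usable += 1
        if scores[a] > scores[b]:
            concordant += 1.0
        elif scores[a] == scores[b]:
            concordant += 0.5
    return concordant / usable if usable else float("nan")


def rank_markers(times, events, expression):
    """Rank genes by |C - 0.5|, so strongly protective and strongly harmful
    markers both rank high. `expression` maps gene name -> list of values."""
    strength = {
        gene: abs(concordance_index(times, events, values) - 0.5)
        for gene, values in expression.items()
    }
    return sorted(strength, key=strength.get, reverse=True)
```

Because the index depends only on the ordering of survival times and marker values, a ranking of this kind does not require choosing among the Cox, accelerated failure time, or additive risk models.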
2.  Identification of Breast Cancer Prognosis Markers via Integrative Analysis 
In breast cancer research, it is of great interest to identify genomic markers associated with prognosis. Multiple gene profiling studies have been conducted for such a purpose. Genomic markers identified from the analysis of single datasets often do not have satisfactory reproducibility. Among the multiple possible reasons, the most important one is the small sample sizes of individual studies. A cost-effective solution is to pool data from multiple comparable studies and conduct integrative analysis. In this study, we collect four breast cancer prognosis studies with gene expression measurements. We describe the relationship between prognosis and gene expressions using the accelerated failure time (AFT) models. We adopt a 2-norm group bridge penalization approach for marker identification. This integrative analysis approach can effectively identify markers with consistent effects across multiple datasets and naturally accommodate the heterogeneity among studies. Statistical and simulation studies demonstrate satisfactory performance of this approach. Breast cancer prognosis markers identified using this approach have sound biological implications and satisfactory prediction performance.
PMCID: PMC3389801  PMID: 22773869
Breast cancer prognosis; Gene expression; Marker identification; Integrative analysis; 2-norm group bridge
3.  Identification of Breast Cancer Prognosis Markers using Integrative Sparse Boosting 
In breast cancer research, it is important to identify genomic markers associated with prognosis. Multiple microarray gene expression profiling studies have been conducted, searching for prognosis markers. Genomic markers identified from the analysis of single datasets often suffer a lack of reproducibility because of small sample sizes. Integrative analysis of data from multiple independent studies has a larger sample size and may provide a cost-effective solution.
We collect four breast cancer prognosis studies with gene expression measurements. An accelerated failure time (AFT) model with an unknown error distribution is adopted to describe survival. An integrative sparse boosting approach is employed for marker selection. The proposed model and boosting approach can effectively accommodate heterogeneity across multiple studies and identify genes with consistent effects.
Simulation study shows that the proposed approach outperforms alternatives including meta-analysis and intensity approaches by identifying the majority or all of the true positives, while having a low false positive rate. In the analysis of breast cancer data, 44 genes are identified as associated with prognosis. Many of the identified genes have been previously suggested as associated with tumorigenesis and cancer prognosis. The identified genes and corresponding predicted risk scores differ from those using alternative approaches. Monte Carlo-based prediction evaluation suggests that the proposed approach has the best prediction performance.
Integrative analysis may provide an effective way of identifying breast cancer prognosis markers. Markers identified using the integrative sparse boosting analysis have sound biological implications and satisfactory prediction performance.
PMCID: PMC3598607  PMID: 22344268
Breast cancer prognosis; Gene Expression; Integrative analysis; Sparse boosting
4.  Incorporating Network Structure in Integrative Analysis of Cancer Prognosis Data 
Genetic epidemiology  2012;37(2):173-183.
In high-throughput cancer genomic studies, markers identified from the analysis of single datasets may have unsatisfactory properties because of low sample sizes. Integrative analysis pools and analyzes raw data from multiple studies, and can effectively increase sample size and lead to improved marker identification results. In this study, we consider the integrative analysis of multiple high-throughput cancer prognosis studies. In the existing integrative analysis studies, the interplay among genes, which can be described using the network structure, has not been effectively accounted for. In network analysis, tightly-connected nodes (genes) are more likely to have related biological functions and similar regression coefficients. The goal of this study is to develop an analysis approach that can incorporate the gene network structure in integrative analysis. To this end, we adopt an AFT (accelerated failure time) model to describe survival. A weighted least squares approach, which has low computational cost, is adopted for estimation. For marker selection, we propose a new penalization approach. The proposed penalty is composed of two parts. The first part is a group MCP penalty, and conducts gene selection. The second part is a Laplacian penalty, and smoothes the differences of coefficients for tightly-connected genes. A group coordinate descent approach is developed to compute the proposed estimate. Simulation study shows satisfactory performance of the proposed approach when there exist moderate to strong correlations among genes. We analyze three lung cancer prognosis datasets, and demonstrate that incorporating the network structure can lead to the identification of important genes and improved prediction performance.
PMCID: PMC3909475  PMID: 23161517
Integrative analysis; Cancer prognosis; Gene network; Penalized selection; Laplacian shrinkage
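The two-part penalty described in this abstract can be sketched as follows. The abstract does not give the exact formulas, so this assumes the standard minimax concave penalty (MCP) applied to each gene's coefficient vector across datasets, plus an unnormalized, unsigned graph-Laplacian term that sums weighted squared differences between the coefficients of connected genes. The names `mcp`, `group_mcp`, `laplacian_penalty`, and the tuning parameters `lam`, `gamma`, `lam2` are hypothetical, and the scaling of the Laplacian term is an assumption.

```python
import math


def mcp(t, lam, gamma):
    """Standard minimax concave penalty evaluated at |t|."""
    t = abs(t)
    if t <= gamma * lam:
        return lam * t - t * t / (2.0 * gamma)
    return 0.5 * gamma * lam * lam  # penalty is flat beyond gamma * lam


def group_mcp(beta, lam, gamma):
    """Gene-level selection: MCP applied to the L2 norm of each gene's
    coefficients, where beta[j] lists gene j's coefficients across datasets."""
    return sum(
        mcp(math.sqrt(sum(b * b for b in row)), lam, gamma) for row in beta
    )


def laplacian_penalty(beta, adjacency, lam2):
    """Smoothing term: lam2 * sum over gene pairs j < k and datasets of
    a_jk * (beta_j - beta_k)^2 (assumed unnormalized, unsigned form)."""
    total = 0.0
    p = len(beta)
    for j in range(p):
        for k in range(j + 1, p):
            weight = adjacency[j][k]
            if weight:
                total += weight * sum(
                    (bj - bk) ** 2 for bj, bk in zip(beta[j], beta[k])
                )
    return lam2 * total
```

In this form the first term can set an entire gene's coefficients to zero in all datasets at once, while the second term shrinks the differences between tightly connected genes without forcing them to be equal.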
5.  On Model Specification and Selection of the Cox Proportional Hazards Model* 
Statistics in medicine  2013;32(26):4609-4623.
Prognosis plays a pivotal role in patient management and trial design. A useful prognostic model should correctly identify important risk factors and estimate their effects. In this article, we discuss several challenges in selecting prognostic factors and estimating their effects using the Cox proportional hazards model. Although it has a flexible semiparametric form, Cox’s model is not entirely exempt from model misspecification. To minimize possible misspecification, instead of imposing the traditional linearity assumption, flexible modeling techniques have been proposed to accommodate nonlinear effects. We first review several existing nonparametric estimation and selection procedures and then present a numerical study to compare the performance of parametric and nonparametric procedures. We demonstrate the impact of model misspecification on variable selection and model prediction using a simulation study and an example from a phase III trial in prostate cancer.
PMCID: PMC3795916  PMID: 23784939
Cox’s Model; Model Selection; LASSO; Smoothing Splines; COSSO
6.  Integrative Analysis of Cancer Prognosis Data with Multiple Subtypes Using Regularized Gradient Descent 
Genetic epidemiology  2012;10.1002/gepi.21669.
In cancer research, high-throughput profiling studies have been extensively conducted, searching for genes/SNPs associated with prognosis. Despite seemingly significant differences, different subtypes of the same cancer (or different types of cancers) may share common susceptibility genes. In this study, we analyze prognosis data on multiple subtypes of the same cancer, but note that the proposed approach is directly applicable to the analysis of data on multiple types of cancers. We describe the genetic basis of multiple subtypes using the heterogeneity model, which allows overlapping but different sets of susceptibility genes/SNPs for different subtypes. An accelerated failure time (AFT) model is adopted to describe prognosis. We develop a regularized gradient descent approach, which conducts gene-level analysis and identifies genes that contain important SNPs associated with prognosis. The proposed approach belongs to the family of gradient descent approaches, is intuitively reasonable, and has affordable computational cost. Simulation study shows that when prognosis-associated SNPs are clustered in a small number of genes, the proposed approach outperforms alternatives with significantly more true positives and fewer false positives. We analyze an NHL (non-Hodgkin lymphoma) prognosis study with SNP measurements, and identify genes associated with the three major subtypes of NHL, namely DLBCL, FL and CLL/SLL. The proposed approach identifies genes different from using alternative approaches and has the best prediction performance.
PMCID: PMC3729731  PMID: 22851516
Integrative analysis; Cancer Prognosis; Gradient descent; NHL; SNP
7.  Metastasis-associated gene expression changes predict poor outcomes in patients with Dukes’ stage B and C colorectal cancer 
Colorectal cancer prognosis is currently predicted from pathological staging, providing limited discrimination for Dukes’ stage B and C disease. Additional markers for outcome are required to help guide therapy selection for individual patients.
Experimental Design
A multi-site single-platform microarray study was performed on 553 colorectal cancers. Gene expression changes were identified between stage A and D tumors (three training sets) and assessed as a prognosis signature in stage B and C tumors (independent test and external validation sets).
128 genes showed reproducible expression changes between three sets of stage A and D cancers. Using consistent genes, stage B and C cancers clustered into two groups resembling early-stage and metastatic tumors. A Prediction Analysis of Microarray (PAM) algorithm was developed to classify individual intermediate-stage cancers into stage A-like/good prognosis or stage D-like/poor prognosis types. For stage B patients, the treatment adjusted hazard ratio for six-year recurrence in individuals with stage D-like cancers was 10.3 (95% CI 1.3 to 80.0, P=0.011). For stage C patients, the adjusted hazard ratio was 2.9 (95% CI 1.1 to 7.6, P=0.016). Similar results were obtained for an external set of stage B and C patients. The prognosis signature was enriched for down-regulated immune response genes and up-regulated cell signaling and extracellular matrix genes. Accordingly, sparse tumor infiltration with mononuclear chronic inflammatory cells was associated with poor outcome in independent patients.
Metastasis-associated gene expression changes can be used to refine traditional outcome prediction, providing a rational approach for tailoring treatments to subsets of patients.
PMCID: PMC2920750  PMID: 19996206
colorectal cancer; gene expression; outcome prediction
8.  Detection of gene pathways with predictive power for breast cancer prognosis 
BMC Bioinformatics  2010;11:1.
Prognosis is of critical interest in breast cancer research. Biomedical studies suggest that genomic measurements may have independent predictive power for prognosis. Gene profiling studies have been conducted to search for predictive genomic measurements. Genes have the inherent pathway structure, where pathways are composed of multiple genes with coordinated functions. The goal of this study is to identify gene pathways with predictive power for breast cancer prognosis. Since our goal is fundamentally different from that of existing studies, a new pathway analysis method is proposed.
The new method advances beyond existing alternatives along the following aspects. First, it can assess the predictive power of gene pathways, whereas existing methods tend to focus on model fitting accuracy only. Second, it can account for the joint effects of multiple genes in a pathway, whereas existing methods tend to focus on the marginal effects of genes. Third, it can accommodate multiple heterogeneous datasets, whereas existing methods analyze a single dataset only. We analyze four breast cancer prognosis studies and identify 97 pathways with significant predictive power for prognosis. Important pathways missed by alternative methods are identified.
The proposed method provides a useful alternative to existing pathway analysis methods. Identified pathways can provide further insights into breast cancer prognosis.
PMCID: PMC2837025  PMID: 20043860
9.  Progression of Ductal Carcinoma in Situ from the Pathological Perspective 
Breast Care  2010;5(4):233-239.
Ductal carcinoma in situ (DCIS) now represents up to 20% of breast cancer cases, yet its behaviour is still poorly understood. Morphological classifications go some way to predicting prognosis, but more sophisticated approaches are required to better tailor therapy to the individual. A number of biological molecules have been identified that appear to relate to prognosis and, in model systems, promote progression to invasive disease. Some of these, such as COX-2, provide real therapeutic opportunities, whilst other marker combinations are showing promise in categorising women according to risk. Gene expression studies have led to an emerging molecular classification of invasive breast cancer, and it is now evident that at least some of these molecular subtypes can be identified at the pre-invasive stage. The difference in frequency of these subtypes between DCIS and invasive cancer may hold clues as to the biological mechanisms underpinning disease transition. It is increasingly clear that the host microenvironment can have a major impact on disease behaviour, and as well as acting as potential predictive factors, the altered microenvironment phenotype also offers novel therapeutic opportunities.
PMCID: PMC3346168  PMID: 22590443
DCIS; Linear progression; Parallel progression; Molecular classification; Microenvironment; Myoepithelial cells
10.  Incorporating gene co-expression network in identification of cancer prognosis markers 
BMC Bioinformatics  2010;11:271.
Extensive biomedical studies have shown that clinical and environmental risk factors may not have sufficient predictive power for cancer prognosis. The development of high-throughput profiling technologies makes it possible to survey the whole genome and search for genomic markers with predictive power. Many existing studies assume the interchangeability of gene effects and ignore the coordination among them.
We adopt the weighted co-expression network to describe the interplay among genes. Although there are several different ways of defining gene networks, the weighted co-expression network may be preferred because of its computational simplicity, satisfactory empirical performance, and because it does not demand additional biological experiments. For cancer prognosis studies with gene expression measurements, we propose a new marker selection method that can properly incorporate the network connectivity of genes. We analyze six prognosis studies on breast cancer and lymphoma. We find that the proposed approach can identify genes that are significantly different from those using alternatives. We search published literature and find that genes identified using the proposed approach are biologically meaningful. In addition, they have better prediction performance and reproducibility than genes identified using alternatives.
The network contains important information on the functionality of genes. Incorporating the network structure can improve cancer marker identification.
PMCID: PMC2881088  PMID: 20487548
11.  A Prognosis Classifier for Breast Cancer Based on Conserved Gene Regulation between Mammary Gland Development and Tumorigenesis: A Multiscale Statistical Model 
PLoS ONE  2013;8(4):e60131.
Identification of novel cancer genes for molecular therapy and diagnosis is a current focus of breast cancer research. Although a few small gene sets have been identified as prognosis classifiers, more powerful models are still needed to define effective gene sets for diagnosis and treatment guidance in breast cancer. In the present study, we have developed a novel statistical approach for systematic analysis of intrinsic correlations of gene expression between development and tumorigenesis in the mammary gland. Based on this analysis, we constructed a predictive model for prognosis in breast cancer that may be useful for therapy decisions. We first defined developmentally associated genes from a mouse mammary gland epithelial gene expression database. Then, we found that the cancer-modulated genes were enriched in this list of developmentally associated genes. Furthermore, the developmentally associated genes had a specific expression profile, which was associated with the molecular characteristics and histological grade of the tumor. These results suggested that the processes of mammary gland development and tumorigenesis share gene regulatory mechanisms. Then, the list of genes regulated in both the developmental and tumorigenesis processes was defined as an 835-member prognosis classifier, which showed an exciting ability to predict clinical outcome in three groups of breast cancer patients (predictive accuracy 64∼72%) with a robust prognosis prediction (hazard ratio 3.3∼3.8, higher than that of other clinical risk factors, around 2.0–2.8). In conclusion, our results identified the conserved molecular mechanisms between mammary gland development and neoplasia, and provided a unique potential model for mining unknown cancer genes and predicting the clinical status of breast tumors. These findings also suggested that the developmental roles of genes may be important criteria for selecting genes for prognosis prediction in breast cancer.
PMCID: PMC3614930  PMID: 23565194
12.  Consensus Pathways Implicated in Prognosis of Colorectal Cancer Identified Through Systematic Enrichment Analysis of Gene Expression Profiling Studies 
PLoS ONE  2011;6(4):e18867.
A large number of gene expression profiling (GEP) studies on prognosis of colorectal cancer (CRC) have been performed, but no reliable gene signature for prediction of CRC prognosis has been found. Bioinformatic enrichment tools are a powerful approach to identify biological processes in high-throughput data analysis.
Principal Findings
We have for the first time collected the results from the 23 independent GEP studies on CRC prognosis published so far. In these 23 studies, 1475 unique, mapped genes were identified, of which 124 (8.4%) were reported in at least two studies, with 54 of them showing a consistent direction of expression change across the individual studies. Using these data, we attempted to overcome the lack of reproducibility observed in the genes reported in individual GEP studies by carrying out a pathway-based enrichment analysis. We used up to ten tools for overrepresentation analysis of Gene Ontology (GO) categories or Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways in each of the three gene lists (1475, 124 and 54 genes). This strategy, based on testing multiple tools, allowed us to identify the oxidative phosphorylation chain and the extracellular matrix receptor interaction categories, as well as a general category related to cell proliferation and apoptosis, as the only significantly and consistently overrepresented pathways in the three gene lists, which were reported by several enrichment tools.
Our pathway-based enrichment analysis of 23 independent gene expression profiling studies on prognosis of CRC identified significantly and consistently overrepresented prognostic categories for CRC. These overrepresented categories have been functionally clearly related with cancer progression, and deserve further investigation.
PMCID: PMC3081819  PMID: 21541025
13.  Gene network-based cancer prognosis analysis with sparse boosting 
Genetics research  2012;94(4):205-221.
High-throughput gene profiling studies have been extensively conducted, searching for markers associated with cancer development and progression. In this study, we analyse cancer prognosis studies with right censored survival responses. With gene expression data, we adopt the weighted gene co-expression network analysis (WGCNA) to describe the interplay among genes. In network analysis, nodes represent genes. There are subsets of nodes, called modules, which are tightly connected to each other. Genes within the same modules tend to have co-regulated biological functions. For cancer prognosis data with gene expression measurements, our goal is to identify cancer markers, while properly accounting for the network module structure. A two-step sparse boosting approach, called Network Sparse Boosting (NSBoost), is proposed for marker selection. In the first step, for each module separately, we use a sparse boosting approach for within-module marker selection and construct module-level ‘super markers’. In the second step, we use the super markers to represent the effects of all genes within the same modules and conduct module-level selection using a sparse boosting approach. Simulation study shows that NSBoost can more accurately identify cancer-associated genes and modules than alternatives. In the analysis of breast cancer and lymphoma prognosis studies, NSBoost identifies genes with important biological implications. It outperforms alternatives including the boosting and penalization approaches by identifying a smaller number of genes/modules and/or having better prediction performance.
PMCID: PMC3573352  PMID: 22950901
14.  Checking semiparametric transformation models with censored data 
Semiparametric transformation models provide a very general framework for studying the effects of (possibly time-dependent) covariates on survival time and recurrent event times. Assessing the adequacy of these models is an important task because model misspecification affects the validity of inference and the accuracy of prediction. In this paper, we introduce appropriate time-dependent residuals for these models and consider the cumulative sums of the residuals. Under the assumed model, the cumulative sum processes converge weakly to zero-mean Gaussian processes whose distributions can be approximated through Monte Carlo simulation. These results enable one to assess, both graphically and numerically, how unusual the observed residual patterns are in reference to their null distributions. The residual patterns can also be used to determine the nature of model misspecification. Extensive simulation studies demonstrate that the proposed methods perform well in practical situations. Three medical studies are provided for illustrations.
PMCID: PMC3276276  PMID: 21785165
Goodness of fit; Martingale residuals; Model checking; Model misspecification; Model selection; Recurrent events; Survival data; Time-dependent covariate
15.  FGF18 as a prognostic and therapeutic biomarker in ovarian cancer 
The Journal of Clinical Investigation  2013;123(10):4435-4448.
High-throughput genomic technologies have identified biomarkers and potential therapeutic targets for ovarian cancer. Comprehensive functional validation studies of the biological and clinical implications of these biomarkers are needed to advance them toward clinical use. Amplification of chromosomal region 5q31–5q35.3 has been used to predict poor prognosis in patients with advanced stage, high-grade serous ovarian cancer. In this study, we further dissected this large amplicon and identified the overexpression of FGF18 as an independent predictive marker for poor clinical outcome in this patient population. Using cell culture and xenograft models, we show that FGF18 signaling promoted tumor progression by modulating the ovarian tumor aggressiveness and microenvironment. FGF18 controlled migration, invasion, and tumorigenicity of ovarian cancer cells through NF-κB activation, which increased the production of oncogenic cytokines and chemokines. This resulted in a tumor microenvironment characterized by enhanced angiogenesis and augmented tumor-associated macrophage infiltration and M2 polarization. Tumors from ovarian cancer patients had increased FGF18 expression levels with microvessel density and M2 macrophage infiltration, confirming our in vitro results. These findings demonstrate that FGF18 is important for a subset of ovarian cancers and may serve as a therapeutic target.
PMCID: PMC3784549  PMID: 24018557
16.  Complexity of molecular alterations impacts pancreatic cancer prognosis 
Individualized cancer treatment (e.g. targeted therapy) based on molecular alterations has emerged as an important strategy to improve the current standard-of-care chemotherapy. A large number of studies have demonstrated the importance of biomarkers not only in predicting prognosis but more importantly in predicting the response towards therapies. For example, amplification or mutation status of the two biomarkers HER2 (human epidermal growth factor 2) and BRCA (breast cancer) can be used to decide on a specific targeted therapy in breast cancer. However, no biomarkers with a similar clinical impact have been identified in pancreatic ductal adenocarcinoma. Although many genome-wide and proteome-based high-throughput studies have identified candidate genes or proteins as promising biomarkers, none of them were eventually transferred into the clinical setting. Notably, the most reliable markers for predicting prognosis are still the tumor stage and grade and biomarkers for therapy response remain undefined. One reason lies in the lack of systemic approaches to analyze the complexity of dominating cancer pathways and the impact of such signal complexity on prognosis and therapy response.
PMCID: PMC3555239  PMID: 23355925
Pancreatic cancer; Diagnostic markers; Biomarkers; Targeted therapy; Prognosis; Pathways
17.  Current Challenges in Development of Differentially Expressed and Prognostic Prostate Cancer Biomarkers 
Prostate Cancer  2012;2012:640968.
Introduction. Predicting the aggressiveness of prostate cancer at biopsy is invaluable in making treatment decisions. In this paper we review the differential expression of genes and microRNAs identified through microarray analysis as potentially useful markers for prostate cancer prognosis and discuss some of the challenges associated with their development. Methods. A review of the literature was conducted through Medline. Articles were identified through searches of the following terms: “prostate cancer AND differential expression”, “prostate cancer prognosis”, and “prostate cancer AND microRNAs”. Results. Though numerous differentially expressed genes and microRNAs were identified as possible prognostic markers, the significance of several of these genes is either debated due to conflicting results or is not validated in other study populations. A few of the articles constructed predictive nomograms using a panel of biomarkers which require further validation. Challenges to the development of useful markers include different methodology, cancer heterogeneity, and sampling error. These can be overcome by categorizing prognostic factors into particular gene pathways or by supplementing biopsy information with blood or urine-based biomarkers. Conclusion. Though biomarkers based on differential expression offer the potential to improve decision making concerning prostate cancer, further validation of their utility and accuracy at the biopsy level is needed.
PMCID: PMC3434411  PMID: 22970379
18.  Identification of cancer genomic markers via integrative sparse boosting 
Biostatistics (Oxford, England)  2012;13(3):509-522.
In high-throughput cancer genomic studies, markers identified from the analysis of single data sets often suffer a lack of reproducibility because of the small sample sizes. An ideal solution is to conduct large-scale prospective studies, which are extremely expensive and time consuming. A cost-effective remedy is to pool data from multiple comparable studies and conduct integrative analysis. Integrative analysis of multiple data sets is challenging because of the high dimensionality of genomic measurements and heterogeneity among studies. In this article, we propose a sparse boosting approach for marker identification in integrative analysis of multiple heterogeneous cancer diagnosis studies with gene expression measurements. The proposed approach can effectively accommodate the heterogeneity among multiple studies and identify markers with consistent effects across studies. Simulation shows that the proposed approach has satisfactory identification results and outperforms alternatives including an intensity approach and meta-analysis. The proposed approach is used to identify markers of pancreatic cancer and liver cancer.
PMCID: PMC3577103  PMID: 22045909
Cancer genomics; Marker identification; Sparse boosting
19.  Prognostic Breast Cancer Signature Identified from 3D Culture Model Accurately Predicts Clinical Outcome across Independent Datasets 
PLoS ONE  2008;3(8):e2994.
One of the major tenets in breast cancer research is that early detection is vital for patient survival by increasing treatment options. To that end, we have previously used a novel unsupervised approach to identify a set of genes whose expression predicts prognosis of breast cancer patients. The predictive genes were selected in a well-defined three dimensional (3D) cell culture model of non-malignant human mammary epithelial cell morphogenesis as down-regulated during breast epithelial cell acinar formation and cell cycle arrest. Here we examine the ability of this gene signature (3D-signature) to predict prognosis in three independent breast cancer microarray datasets having 295, 286, and 118 samples, respectively.
Methods and Findings
Our results show that the 3D-signature accurately predicts prognosis in three unrelated patient datasets. At 10 years, the probability of positive outcome was 52, 51, and 47 percent in the group with a poor-prognosis signature and 91, 75, and 71 percent in the group with a good-prognosis signature for the three datasets, respectively (Kaplan-Meier survival analysis, p<0.05). Hazard ratios for poor outcome were 5.5 (95% CI 3.0 to 12.2, p<0.0001), 2.4 (95% CI 1.6 to 3.6, p<0.0001) and 1.9 (95% CI 1.1 to 3.2, p = 0.016) and remained significant for the two larger datasets when corrected for estrogen receptor (ER) status. Hence the 3D-signature accurately predicts breast cancer outcome in both ER-positive and ER-negative tumors, though individual genes differed in their prognostic ability in the two subtypes. Genes that were prognostic in ER+ patients are AURKA, CEP55, RRM2, EPHA2, FGFBP1, and VRK1, while genes prognostic in ER− patients include ACTB, FOXM1 and SERPINE2 (Kaplan-Meier p<0.05). Multivariable Cox regression analysis in the largest dataset showed that the 3D-signature was a strong independent factor in predicting breast cancer outcome.
The 3D-signature accurately predicts breast cancer outcome across multiple datasets and holds prognostic value for both ER-positive and ER-negative breast cancer. The signature was selected using a novel biological approach and hence holds promise to represent the key biological processes of breast cancer.
PMCID: PMC2500166  PMID: 18714348
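The 10-year outcome probabilities quoted in the abstract above come from Kaplan-Meier survival analysis. As a minimal sketch of how such an estimate is formed (the function name, argument names, and toy data below are illustrative assumptions, not material from the study), the product-limit estimator can be written as:

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    times  -- follow-up time for each patient
    events -- 1 if the outcome was observed at that time, 0 if censored
    (Hypothetical helper for illustration only.)
    """
    observed = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Toy data (invented for illustration): five patients, two censored.
toy_curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

Reading the curve at a fixed horizon (for example 10 years) for each signature group gives the kind of per-group outcome probability reported in the abstract.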
20.  DACH1: Its Role as a Classifier of Long Term Good Prognosis in Luminal Breast Cancer 
PLoS ONE  2014;9(1):e84428.
Oestrogen receptor (ER) positive (luminal) tumours account for the largest proportion of females with breast cancer. Theirs is a heterogeneous disease presenting clinical challenges in managing their treatment. Three main biological luminal groups have been identified but clinically these can be distilled into two prognostic groups in which Luminal A are accorded good prognosis and Luminal B correlate with poor prognosis. Further biomarkers are needed to attain classification consensus. Machine learning approaches like Artificial Neural Networks (ANNs) have been used for classification and identification of biomarkers in breast cancer using high throughput data. In this study, we have used an artificial neural network (ANN) approach to identify DACH1 as a candidate luminal marker and its role in predicting clinical outcome in breast cancer is assessed.
Materials and methods
A reiterative ANN approach incorporating a network inferencing algorithm was used to identify ER-associated biomarkers in a publicly available cDNA microarray dataset. DACH1 was identified as having a strong influence on ER-associated markers and a positive association with ER. Its clinical relevance in predicting breast cancer specific survival was investigated by statistically assessing protein expression levels after immunohistochemistry in a series of unselected breast cancers, formatted as a tissue microarray.
Strong nuclear DACH1 staining is more prevalent in tubular and lobular breast cancer. Its expression correlated with ER-alpha positive tumours expressing PgR, epithelial cytokeratins (CK)18/19 and ‘luminal-like’ markers of good prognosis including FOXA1 and RERG (p<0.05). DACH1 is increased in patients showing longer cancer specific survival and disease free interval and reduced metastasis formation (p<0.001). Nuclear DACH1 showed a negative association with markers of aggressive growth and poor prognosis.
Nuclear DACH1 expression appears to be a Luminal A biomarker predictive of good prognosis, but is not independent of clinical stage, tumour size, NPI status or systemic therapy.
PMCID: PMC3879319  PMID: 24392136
21.  Identifying cancer biomarkers by network-constrained support vector machines 
BMC Systems Biology  2011;5:161.
One of the major goals in gene and protein expression profiling of cancer is to identify biomarkers and build classification models for prediction of disease prognosis or treatment response. Many traditional statistical methods, based on microarray gene expression data alone and individual genes' discriminatory power, often fail to identify biologically meaningful biomarkers thus resulting in poor prediction performance across data sets. Nonetheless, the variables in multivariable classifiers should synergistically interact to produce more effective classifiers than individual biomarkers.
We developed an integrated approach, namely network-constrained support vector machine (netSVM), for cancer biomarker identification with an improved prediction performance. The netSVM approach is specifically designed for network biomarker identification by integrating gene expression data and protein-protein interaction data. We first evaluated the effectiveness of netSVM using simulation studies, demonstrating its improved performance over state-of-the-art network-based methods and gene-based methods for network biomarker identification. We then applied the netSVM approach to two breast cancer data sets to identify prognostic signatures for prediction of breast cancer metastasis. The experimental results show that: (1) network biomarkers identified by netSVM are highly enriched in biological pathways associated with cancer progression; (2) prediction performance is much improved when tested across different data sets. Specifically, many genes related to apoptosis, cell cycle, and cell proliferation, which are hallmark signatures of breast cancer metastasis, were identified by the netSVM approach. More importantly, several novel hub genes, biologically important with many interactions in PPI network but often showing little change in expression as compared with their downstream genes, were also identified as network biomarkers; the genes were enriched in signaling pathways such as TGF-beta signaling pathway, MAPK signaling pathway, and JAK-STAT signaling pathway. These signaling pathways may provide new insight to the underlying mechanism of breast cancer metastasis.
We have developed a network-based approach for cancer biomarker identification, netSVM, resulting in an improved prediction performance with network biomarkers. We have applied the netSVM approach to breast cancer gene expression data to predict metastasis in patients. Network biomarkers identified by netSVM reveal potential signaling pathways associated with breast cancer metastasis, and help improve the prediction performance across independent data sets.
PMCID: PMC3214162  PMID: 21992556
22.  Interactions of miR-323/miR-326/miR-329 and miR-130a/miR-155/miR-210 as prognostic indicators for clinical outcome of glioblastoma patients 
Glioblastoma multiforme (GBM) is the most common and aggressive brain tumor with poor clinical outcome. Identification and development of new markers could be beneficial for the diagnosis and prognosis of GBM patients. Deregulation of microRNAs (miRNAs or miRs) is involved in GBM. Therefore, we attempted to identify and develop specific miRNAs as prognostic and predictive markers for GBM patient survival.
Expression profiles of miRNAs and genes and the corresponding clinical information of 480 GBM samples from The Cancer Genome Atlas (TCGA) dataset were downloaded, and miRNAs of interest were identified. Associations of patients' overall survival (OS) and progression-free survival (PFS) with the miRNAs of interest and with miRNA-interactions were assessed by Kaplan-Meier survival analysis. The impacts of miRNA expression and miRNA-interactions on survival were evaluated by Cox proportional hazards regression. Biological processes and networks of putative and validated miRNA targets were analyzed by bioinformatics.
In this study, 6 miRNAs of interest were identified. Survival analysis showed that high levels of miR-326/miR-130a and low levels of miR-323/miR-329/miR-155/miR-210 were significantly associated with long OS of GBM patients, and that high miR-326/miR-130a and low miR-155/miR-210 were associated with extended PFS. Moreover, miRNA-323 and miRNA-329 were found to be increased in patients with no recurrence or long time to progression (TTP). More notably, our analysis revealed that miRNA-interactions were more specific and accurate in discriminating and predicting OS and PFS. These interactions stratified OS and PFS by miRNA level in more detail and yielded a longer span of mean survival than any single miRNA. Moreover, miR-326, miR-130a, miR-155, miR-210 and 4 miRNA-interactions were confirmed for the first time as independent predictors of survival by a Cox regression model together with the clinicopathological factors age, gender and recurrence. In addition, the validity and rationale of the miRNA-interactions as predictors of survival were further supported by analysis of networks, biological processes, KEGG pathways and correlation analysis with gene markers.
Our results demonstrate that miR-326, miR-130a, miR-155, miR-210 and the 4 miRNA-interactions could serve as prognostic and predictive markers for survival of GBM patients, suggesting a potential application in improving prognostic tools and treatments.
PMCID: PMC3551827  PMID: 23302469
Glioblastoma multiforme; microRNA; Prognostic marker; Overall survival; Progression-free survival; Interaction
23.  Combining a molecular profile with a clinical and pathological profile: Biostatistical considerations 
The use of molecular markers and gene expression profiling provides a promising approach for improving the predictive accuracy of current prognostic indices for predicting which patients with non-muscle-invasive bladder cancer will progress to muscle-invasive disease. There are many statistical pitfalls in establishing the benefit of a multigene expression classifier during its development. First, there are issues related to the identification of the individual genes and the false discovery rate, the instability of the genes identified and their combination into a classifier. Secondly, the classifier should be validated, preferably on an independent data set, to show its reproducibility. Next, it is necessary to show that adding the classifier to an existing model based on the most important clinical and pathological factors improves the predictive accuracy of the model. This cannot be determined based on the classifier's hazard ratio or p-value in a multivariate model, but should be assessed based on an improvement in statistics such as the area under the curve and the concordance index. Finally, nomograms are superior to stage and risk group classifications for predicting outcome, but the model predicting the outcome must be well calibrated. It is important for investigators to be aware of these pitfalls in order to develop statistically valid classifiers that will truly improve our ability to predict a patient's risk of progression.
PMCID: PMC2748188  PMID: 18815933
Area under the curve; biostatistics; molecular profile; nomograms; non-muscle-invasive bladder cancer; predictive accuracy; prognosis; progression; validation
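The abstract above recommends judging a classifier by the change in statistics such as the concordance index rather than by its hazard ratio or p-value. The sketch below is one simple way to compute a Harrell-type concordance index for right-censored data; the function and variable names are assumptions made for illustration, and tied event times are ignored for brevity.

```python
def concordance_index(times, events, risk):
    """Fraction of comparable patient pairs ranked correctly by `risk`.

    A pair (i, j) is comparable when patient i had an observed event and
    patient j was followed for longer. It is concordant when the patient
    with the earlier event has the higher risk score; ties in risk count
    as half. (Hypothetical helper for illustration only.)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

To assess added value in the sense the abstract describes, one would compute this index for the risk scores of the clinical model alone and again for the clinical model plus the molecular classifier, preferably on an independent data set, and compare the two values.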
24.  Identification of Prognostic Genes for Recurrent Risk Prediction in Triple Negative Breast Cancer Patients in Taiwan 
PLoS ONE  2011;6(11):e28222.
Discrepancies in the prognosis of triple negative breast cancer exist between Caucasian and Asian populations. Yet, the gene signature of triple negative breast cancer specifically for Asians has not become available. Therefore, the purpose of this study is to construct a prediction model for recurrence of triple negative breast cancer in Taiwanese patients. Whole genome expression profiling of breast cancers from 185 patients in Taiwan from 1995 to 2008 was performed, and the results were compared to the previously published literature to detect differences between Asian and Western patients. Pathway analysis and Cox proportional hazard models were applied to construct a prediction model for the recurrence of triple negative breast cancer. Hierarchical cluster analysis showed that triple negative breast cancers from different races were in separate sub-clusters but grouped in a bigger cluster. Two pathways, cAMP-mediated signaling and ephrin receptor signaling, were significantly associated with the recurrence of triple negative breast cancer. After using stepwise model selection from the combination of the initial filtered genes, we developed a prediction model based on the genes SLC22A23, PRKAG3, DPEP3, MORC2, GRB7, and FAM43A. The model had 91.7% accuracy, 81.8% sensitivity, and 94.6% specificity under leave-one-out support vector regression. In this study, we identified pathways related to triple negative breast cancer and developed a model to predict its recurrence. These results could be used for assisting with clinical prognosis and warrant further investigation into the possibility of targeted therapy of triple negative breast cancer in Taiwanese patients.
PMCID: PMC3226667  PMID: 22140552
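The accuracy, sensitivity, and specificity reported in the abstract above are all derived from a confusion matrix. The sketch below shows the arithmetic. The counts are hypothetical and are not taken from the study, which does not give its confusion matrix in the abstract; they were chosen only because they round to the same percentages.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn -- recurrences predicted correctly / missed
    tn/fp -- non-recurrences predicted correctly / falsely flagged
    (Hypothetical helper for illustration only.)
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts, NOT from the study: 11 recurrences, 37 non-recurrences.
metrics = classification_metrics(tp=9, fn=2, tn=35, fp=2)
percentages = {name: round(100 * value, 1) for name, value in metrics.items()}
# percentages == {"accuracy": 91.7, "sensitivity": 81.8, "specificity": 94.6}
```

Under leave-one-out evaluation, each patient is predicted by a model fitted on all the other patients, and the counts are tallied over those held-out predictions.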
25.  Challenges of incorporating gene expression data to predict HCC prognosis in the age of systems biology 
Hepatocellular carcinoma (HCC) is a leading cause of cancer-related death worldwide. The recurrence of HCC after curative treatments is currently a major hurdle. Identification of subsets of patients with distinct prognosis provides an opportunity to tailor therapeutic approaches as well as to select the patients with specific sub-phenotypes for targeted therapy. Thus, the development of gene expression profiles to improve the prediction of HCC prognosis is important for HCC management. Although several gene signatures have been evaluated for the prediction of HCC prognosis, there is no consensus on the predictive power of these signatures. Using systematic approaches to evaluate these signatures and combine them with clinicopathologic information may provide more accurate prediction of HCC prognosis. Recently, Villanueva et al[13] developed a composite prognostic model incorporating gene expression patterns in both tumor and adjacent tissues to predict HCC recurrence. In this commentary, we summarize the current progress in using gene signatures to predict HCC prognosis, and discuss the importance, existing issues and future research directions in this field.
PMCID: PMC3419990  PMID: 22912544
Gene expression signatures; Hepatocellular carcinoma; Prognosis
