Related Articles

1.  Integration of Gene Expression Profiling and Clinical Variables to Predict Prostate Carcinoma Recurrence after Radical Prostatectomy 
Cancer  2005;104(2):290-298.
BACKGROUND
Gene expression profiling of prostate carcinoma offers an alternative means to distinguish aggressive tumor biology and may improve the accuracy of outcome prediction for patients with prostate carcinoma treated by radical prostatectomy.
METHODS
Gene expression differences between 37 recurrent and 42 nonrecurrent primary prostate tumor specimens were analyzed by oligonucleotide microarrays. Two logistic regression modeling approaches were used to predict prostate carcinoma recurrence after radical prostatectomy. One approach was based exclusively on gene expression differences between the two classes. The second approach integrated prognostic gene variables with a validated postoperative predictive model based on standard variables (nomogram). The predictive accuracy of these modeling approaches was evaluated by leave-one-out cross-validation (LOOCV) and compared with the nomogram.
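Leave-one-out cross-validation, as used above to assess predictive accuracy, can be sketched in a few lines of Python. This is a minimal illustration rather than the study's pipeline: a nearest-centroid rule stands in for the logistic regression models, and the six two-feature samples and all function names are made up for the example.

```python
from statistics import mean

def nearest_centroid_fit(X, y):
    """Return one centroid (per-feature mean) for each class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def loocv_accuracy(X, y):
    """Hold out each sample once, train on the rest, and score the held-out sample."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = nearest_centroid_fit(X_train, y_train)
        if nearest_centroid_predict(model, X[i]) == y[i]:
            correct += 1
    return correct / len(X)

# Hypothetical toy data: two well-separated classes (0 = nonrecurrent, 1 = recurrent).
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 0.9], [0.8, 1.1], [1.2, 1.0]]
y = [0, 0, 0, 1, 1, 1]
accuracy = loocv_accuracy(X, y)
print(f"LOOCV accuracy: {accuracy:.2f}")
```

The key point is that the held-out sample never contributes to the model that classifies it; any feature selection would need to be repeated inside the loop for the same reason.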
RESULTS
The modeling approach using gene variables alone accurately classified 59 (75%) tissue samples in LOOCV, a classification rate substantially higher than expected by chance. However, this predictive accuracy was inferior to the nomogram (concordance index, 0.75 vs. 0.84, P = 0.01). Models combining clinical and gene variables accurately classified 70 (89%) tissue samples and the predictive accuracy using this approach (concordance index, 0.89) was superior to the nomogram (P = 0.009) and models based on gene variables alone (P < 0.001). Importantly, the combined approach provided a marked improvement for patients whose nomogram-predicted likelihood of disease recurrence was in the indeterminate range (7-year disease progression-free probability, 30–70%; concordance index, 0.83 vs. 0.59, P = 0.01).
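The classification rates above follow from the specimen counts given in the Methods (37 + 42 = 79). The sketch below reproduces that arithmetic and, as an illustration only, estimates how unlikely 59 correct calls would be under one possible definition of "chance". The abstract does not say how chance was defined, so the majority-class rate used here is an assumption.

```python
from math import comb

n_recurrent, n_nonrecurrent = 37, 42       # specimen counts from the Methods
n_total = n_recurrent + n_nonrecurrent     # 79 specimens

gene_only_rate = 59 / n_total              # gene variables alone
combined_rate = 70 / n_total               # clinical + gene variables

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Assumption (not from the abstract): each call is correct with the
# majority-class probability 42/79.
chance_rate = n_nonrecurrent / n_total
p_chance = binomial_upper_tail(59, n_total, chance_rate)

print(f"gene-only: {gene_only_rate:.1%}, combined: {combined_rate:.1%}")
print(f"P(>= 59 correct by chance) = {p_chance:.2e}")
```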
CONCLUSIONS
Integration of gene expression signatures and clinical variables produced predictive models for prostate carcinoma recurrence that perform significantly better than those based on either clinical variables or gene expression information alone.
doi:10.1002/cncr.21157
PMCID: PMC1852494  PMID: 15948174
prostatic neoplasms/pathology/surgery; prostatectomy; gene expression profiling; treatment outcome; logistic models
2.  Discovery and Validation of Novel Expression Signature for Postcystectomy Recurrence in High-Risk Bladder Cancer 
Background
Nearly half of muscle-invasive bladder cancer patients succumb to their disease following cystectomy. Selecting candidates for adjuvant therapy is currently based on clinical parameters with limited predictive power. This study aimed to develop and validate genomic-based signatures that can better identify patients at risk for recurrence than clinical models alone.
Methods
Transcriptome-wide expression profiles were generated using 1.4 million feature-arrays on archival tumors from 225 patients who underwent radical cystectomy and had muscle-invasive and/or node-positive bladder cancer. Genomic (GC) and clinical (CC) classifiers for predicting recurrence were developed on a discovery set (n = 133). Performances of GC, CC, an independent clinical nomogram (IBCNC), and genomic-clinicopathologic classifiers (G-CC, G-IBCNC) were assessed in the discovery and independent validation (n = 66) sets. GC was further validated on four external datasets (n = 341). Discrimination and prognostic abilities of classifiers were compared using area under receiver-operating characteristic curves (AUCs). All statistical tests were two-sided.
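The AUC used above to compare classifiers equals the probability that a randomly chosen patient with recurrence receives a higher score than a randomly chosen patient without. A minimal sketch of that pairwise definition follows; the scores and labels are hypothetical, and a production analysis would normally use an established statistics package.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for s, lab in zip(scores, labels) if lab == 1]
    negatives = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical classifier scores; label 1 = recurrence, 0 = no recurrence.
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc_value = auc(scores, labels)
print(f"AUC = {auc_value:.3f}")
```

Here 7 of the 9 positive/negative pairs are ordered correctly, so the AUC is 7/9.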
Results
A 15-feature GC was developed on the discovery set with area under curve (AUC) of 0.77 in the validation set. This was higher than individual clinical variables, IBCNC (AUC = 0.73), and comparable to CC (AUC = 0.78). Performance was improved upon combining GC with clinical nomograms (G-IBCNC, AUC = 0.82; G-CC, AUC = 0.86). G-CC high-risk patients had elevated recurrence probabilities (P < .001), with GC being the best predictor by multivariable analysis (P = .005). Genomic-clinicopathologic classifiers outperformed clinical nomograms by decision curve and reclassification analyses. GC performed the best in validation compared with seven prior signatures. GC markers remained prognostic across four independent datasets.
Conclusions
The validated genomic-based classifiers outperform clinical models for predicting postcystectomy bladder cancer recurrence. This may be used to better identify patients who need more aggressive management.
doi:10.1093/jnci/dju290
PMCID: PMC4241889  PMID: 25344601
3.  Outcome Prediction of Children with Neuroblastoma using a Multigene Expression Signature, a Retrospective SIOPEN/COG/GPOH Study 
The lancet oncology  2009;10(7):663-671.
BACKGROUND
More accurate prognostic assessment of patients with neuroblastoma is required to improve the choice of risk-related therapy. The aim of this study is to develop and validate a gene expression signature for improved outcome prediction.
METHODS
Fifty-nine genes were carefully selected based on an innovative data-mining strategy and profiled in the largest neuroblastoma patient series (n=579) to date using RT-qPCR starting from only 20 ng of RNA. A multigene expression signature was built using 30 training samples, tested on 313 test samples and subsequently validated in a blind study on an independent set of 236 additional tumours.
FINDINGS
The signature accurately classifies patients with respect to overall and progression-free survival (p<0·0001). The signature has a performance, sensitivity, and specificity of 85·4% (95%CI: 77·7–93·2), 84·4% (95%CI: 66·5–94·1), and 86·5% (95%CI: 81·1–90·6), respectively, for predicting patient outcome. Multivariate analysis indicates that the signature is a significant independent predictor after controlling for currently used risk factors. Patients with high molecular risk have a higher risk of death from disease and of relapse/progression than patients with low molecular risk (odds ratio of 19·32 (95%CI: 6·50–57·43) and 3·96 (95%CI: 1·97–7·97) for OS and PFS, respectively). Patients with increased risk for adverse outcome can also be identified within the current treatment groups, demonstrating the potential of this signature for improved clinical management. These results were confirmed in the validation study, in which the signature was also independently statistically significant in a model adjusted for MYCN status, age, INSS stage, ploidy, INPC grade of differentiation, and MKI. The high patient/gene ratio (579/59) underlies the observed statistical power and robustness.
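Sensitivity, specificity, overall performance, and an odds ratio of the kind reported above can all be read off a 2×2 table of predicted risk group against observed outcome. The sketch below uses hypothetical counts, not the study's data, and a Woolf (log-scale) confidence interval, which is an assumption because the abstract does not state how its intervals were computed.

```python
import math

def diagnostic_summary(tp, fn, fp, tn, z=1.96):
    """Summarise a 2x2 table: tp/fn are events, fp/tn are non-events."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    odds_ratio = (tp * tn) / (fp * fn)
    # Woolf standard error of log(odds ratio); assumes all four cells are non-zero.
    se_log_or = math.sqrt(1 / tp + 1 / fn + 1 / fp + 1 / tn)
    ci_low = math.exp(math.log(odds_ratio) - z * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return sensitivity, specificity, accuracy, odds_ratio, (ci_low, ci_high)

# Hypothetical counts: 50 patients with an event, 150 without.
sens, spec, acc, odds, (low, high) = diagnostic_summary(tp=40, fn=10, fp=20, tn=130)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
print(f"odds ratio={odds:.1f} (95% CI {low:.1f} to {high:.1f})")
```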
INTERPRETATION
A 59-gene expression signature predicts outcome of neuroblastoma patients with high accuracy. The signature is an independent risk predictor, identifying patients with increased risk in the current clinical risk groups. The applied method and signature are suitable for routine lab testing and ready for evaluation in prospective studies.
FUNDING
The Belgian Foundation Against Cancer, found of public interest (project SCIE2006-25), the Children Cancer Fund Ghent, the Belgian Society of Paediatric Haematology and Oncology, the Belgian Kid’s Fund and the Fondation Nuovo-Soldati (JV), the Fund for Scientific Research Flanders (KDP, JH), the Fund for Scientific Research Flanders (grant number: G.0198.08), the Institute for the Promotion of Innovation by Science and Technology in Flanders, Strategisch basisonderzoek (IWT-SBO 60848), the Fondation Fournier Majoie pour l’Innovation, the Instituto Carlos III, RD 06/0020/0102 Spain, the Italian Neuroblastoma Foundation, the European Community under the FP6 (project: STREP: EET-pipeline, number: 037260), and the Belgian program of Interuniversity Poles of Attraction, initiated by the Belgian State, Prime Minister's Office, Science Policy Programming.
doi:10.1016/S1470-2045(09)70154-8
PMCID: PMC3045079  PMID: 19515614
4.  Protein Function Prediction using Text-based Features extracted from the Biomedical Literature: The CAFA Challenge 
BMC Bioinformatics  2013;14(Suppl 3):S14.
Background
Advances in sequencing technology over the past decade have resulted in an abundance of sequenced proteins whose function is yet unknown. As such, computational systems that can automatically predict and annotate protein function are in demand. Most computational systems use features derived from protein sequence or protein structure to predict function. In an earlier work, we demonstrated the utility of biomedical literature as a source of text features for predicting protein subcellular location. We have also shown that the combination of text-based and sequence-based prediction improves the performance of location predictors. Following up on this work, for the Critical Assessment of Function Annotations (CAFA) Challenge, we developed a text-based system that aims to predict molecular function and biological process (using Gene Ontology terms) for unannotated proteins. In this paper, we present the preliminary work and evaluation that we performed for our system, as part of the CAFA challenge.
Results
We have developed a preliminary system that represents proteins using text-based features and predicts protein function using a k-nearest neighbour classifier (Text-KNN). We selected text features for our classifier by extracting key terms from biomedical abstracts based on their statistical properties. The system was trained and tested using 5-fold cross-validation over a dataset of 36,536 proteins. System performance was measured using the standard measures of precision, recall, F-measure and overall accuracy. The performance of our system was compared to two baseline classifiers: one that assigns function based solely on the prior distribution of protein function (Base-Prior) and one that assigns function based on sequence similarity (Base-Seq). The overall prediction accuracy of Text-KNN, Base-Prior, and Base-Seq for molecular function classes are 62%, 43%, and 58% while the overall accuracy for biological process classes are 17%, 11%, and 28% respectively. Results obtained as part of the CAFA evaluation itself on the CAFA dataset are reported as well.
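The general idea of a text-based k-nearest neighbour classifier can be illustrated with a small sketch. It is not the Text-KNN system itself: the bag-of-words representation, cosine similarity, the toy abstracts, and the function names are all assumptions chosen for brevity, whereas the paper selects key terms by their statistical properties.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Represent a text as lower-cased term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(training, query_text, k=3):
    """Majority vote over the k training texts most similar to the query."""
    query = bag_of_words(query_text)
    ranked = sorted(training,
                    key=lambda item: cosine(bag_of_words(item[0]), query),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical abstracts associated with proteins of known function.
training = [
    ("kinase phosphorylates substrate protein atp binding", "kinase activity"),
    ("atp dependent kinase phosphorylation of serine residues", "kinase activity"),
    ("serine threonine kinase catalytic domain", "kinase activity"),
    ("transcription factor binds dna promoter", "DNA binding"),
    ("zinc finger dna binding domain recognises promoter", "DNA binding"),
    ("dna binding protein regulates transcription", "DNA binding"),
]
prediction = knn_predict(training, "novel kinase phosphorylates serine residues")
print(prediction)
```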
Conclusions
Our evaluation shows that the text-based classifier consistently outperforms the baseline classifier that is based on prior distribution, and typically has comparable performance to the baseline classifier that uses sequence similarity. Moreover, the results suggest that combining text features with other types of features can potentially lead to improved prediction performance. The preliminary results also suggest that while our text-based classifier can be used to predict both molecular function and biological process in which a protein is involved, the classifier performs significantly better for predicting molecular function than for predicting biological process. A similar trend was observed for other classifiers participating in the CAFA challenge.
doi:10.1186/1471-2105-14-S3-S14
PMCID: PMC3584852  PMID: 23514326
5.  Prediction of Breast Cancer Metastasis by Gene Expression Profiles: A Comparison of Metagenes and Single Genes 
Cancer Informatics  2012;11:193-217.
Background
The widespread use of microarray applications in cancer research has led to the development of predictive or prognostic gene expression profiles. However, the diversity of microarray platforms has made the full validation of such profiles and their related gene lists across studies difficult, and classification accuracies have rarely been validated in multiple independent datasets. Frequently, while the individual genes in such lists may not match, genes with the same function are included across the lists. Development of such lists does not take into account the fact that genes can be grouped together as metagenes (MGs) based on common characteristics such as pathways, regulation, or genomic location. Such MGs might be used as features in building a predictive model applicable for classifying independent data. There is, therefore, a need to systematically compare independent validation of gene lists or classifiers based on metagene or individual gene (SG) features.
Methods
In this study we compared the performance of either metagene- or single gene-based feature sets and classifiers using random forest and two support vector machines for classifier building. The performance within the same dataset, feature set validation performance, and validation performance of entire classifiers in strictly independent datasets were assessed by 10 times repeated 10-fold cross validation, leave-one-out cross validation, and one-fold validation, respectively. To test the significance of the performance difference between MG- and SG-features/classifiers, we used a repeated down-sampled binomial test approach.
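The "10 times repeated 10-fold cross validation" scheme mentioned above amounts to reshuffling the samples before each of ten rounds and rotating ten held-out folds within every round. The generator below is a minimal sketch of that index bookkeeping; the function name, the seed, and the sample count of 50 are illustrative, and no stratification by class is attempted.

```python
import random

def repeated_kfold(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)                    # new random partition per repeat
        for fold in range(n_folds):
            test = indices[fold::n_folds]       # every n_folds-th shuffled index
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield repeat, fold, train, test

splits = list(repeated_kfold(n_samples=50))
print(len(splits), "train/test splits")
```

Within one repeat every sample is tested exactly once, so averaging over the ten repeats reduces the variance that comes from any single random partition.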
Results
MG- and SG-feature sets are transferable and perform well for training and testing prediction of metastasis outcome in strictly independent data sets, both between different and within similar microarray platforms, while classifiers had a poorer performance when validated in strictly independent datasets. The study showed that MG- and SG-feature sets perform equally well in classifying independent data. Furthermore, SG-classifiers significantly outperformed MG-classifiers when validation was conducted between datasets using similar platforms, while no significant performance difference was found when validation was performed between different platforms.
Conclusion
Prediction of metastasis outcome in lymph node–negative patients by MG- and SG-classifiers showed that SG-classifiers performed significantly better than MG-classifiers when validated in independent data based on the same microarray platform as used for developing the classifier. However, the MG- and SG-classifiers had similar performance when conducting classifier validation in independent data based on a different microarray platform. The latter was also true when only validating sets of MG- and SG-features in independent datasets, both between and within similar and different platforms.
doi:10.4137/CIN.S10375
PMCID: PMC3529607  PMID: 23304070
microarray; classification; metagenes; breast cancer
6.  Design of a multi-signature ensemble classifier predicting neuroblastoma patients' outcome 
BMC Bioinformatics  2012;13(Suppl 4):S13.
Background
Neuroblastoma is the most common pediatric solid tumor of the sympathetic nervous system. Development of improved predictive tools for patient stratification is a crucial requirement for neuroblastoma therapy. Several studies utilized gene expression-based signatures to stratify neuroblastoma patients and demonstrated a clear advantage of adding genomic analysis to risk assessment. There is little overlap among signatures, and merging their prognostic potential would be advantageous. Here, we describe a new strategy to merge published neuroblastoma-related gene signatures into a single, highly accurate, Multi-Signature Ensemble (MuSE)-classifier of neuroblastoma (NB) patient outcome.
Methods
Gene expression profiles of 182 neuroblastoma tumors, subdivided into three independent datasets, were used in the various phases of development and validation of the neuroblastoma NB-MuSE-classifier. Thirty-three signatures were evaluated for patients' outcome prediction using 22 classification algorithms each, generating 726 classifiers and prediction results. The best-performing algorithm for each signature was selected and validated on an independent dataset, and the 20 signatures performing with an accuracy ≥ 80% were retained.
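The selection step described above (33 signatures × 22 algorithms = 726 classifiers, one best algorithm per signature, then an 80% accuracy cut-off on independent data) can be sketched as follows. The signature and algorithm names and all accuracy values are hypothetical placeholders; only the counts and the threshold come from the abstract.

```python
N_SIGNATURES, N_ALGORITHMS = 33, 22
n_classifiers = N_SIGNATURES * N_ALGORITHMS   # 726, as stated in the Methods

def select_signatures(tuning_acc, validation_acc, threshold=0.80):
    """Pick each signature's best algorithm on the tuning data, then keep the
    signature only if that algorithm reaches the threshold on independent data."""
    retained = {}
    for signature, by_algorithm in tuning_acc.items():
        best = max(by_algorithm, key=by_algorithm.get)
        accuracy = validation_acc[signature][best]
        if accuracy >= threshold:
            retained[signature] = (best, accuracy)
    return retained

# Hypothetical accuracies for three signatures and two algorithms.
tuning_acc = {
    "sig_A": {"algo_1": 0.82, "algo_2": 0.88},
    "sig_B": {"algo_1": 0.75, "algo_2": 0.70},
    "sig_C": {"algo_1": 0.90, "algo_2": 0.85},
}
validation_acc = {
    "sig_A": {"algo_1": 0.80, "algo_2": 0.84},
    "sig_B": {"algo_1": 0.72, "algo_2": 0.69},
    "sig_C": {"algo_1": 0.78, "algo_2": 0.90},
}
retained = select_signatures(tuning_acc, validation_acc)
print(n_classifiers, retained)
```

In this toy table sig_C is dropped even though its second algorithm would pass on the validation data, because the algorithm is fixed before the independent dataset is consulted.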
Results
We combined the 20 predictions associated with the corresponding signatures through the selection of the best performing algorithm into a single outcome predictor. The best performance was obtained by the Decision Table algorithm that produced the NB-MuSE-classifier characterized by an external validation accuracy of 94%. Kaplan-Meier curves and log-rank test demonstrated that patients with good and poor outcome prediction by the NB-MuSE-classifier have a significantly different survival (p < 0.0001). Survival curves constructed on subgroups of patients divided on the basis of known prognostic markers suggested an excellent stratification of localized and stage 4s tumors, but more data are needed to prove this point.
Conclusions
The NB-MuSE-classifier is based on an ensemble approach that merges twenty heterogeneous, neuroblastoma-related gene signatures to blend their discriminating power, rather than numeric values, into a single, highly accurate patients' outcome predictor. The novelty of our approach derives from the way to integrate the gene expression signatures, by optimally associating them with a single paradigm ultimately integrated into a single classifier. This model can be exported to other types of cancer and to diseases for which dedicated databases exist.
doi:10.1186/1471-2105-13-S4-S13
PMCID: PMC3314564  PMID: 22536959
7.  Use of nomograms for predictions of outcome in patients with advanced bladder cancer 
Introduction
Accurate estimates of risk are essential for physicians if they are to recommend a specific management to patients with bladder cancer. In this review, we discuss the criteria for the evaluation of nomograms and review current available nomograms for advanced bladder cancer.
Methods
A retrospective review of the Pubmed database between 2002 and 2008 was performed using the keywords ‘nomogram’ and ‘bladder’. We limited the articles to advanced bladder cancer. We recorded input variables, prediction form, number of patients used to develop the prediction tools, the outcome being predicted, prediction tool-specific features, predictive accuracy, and whether validation was performed.
Results
We discuss the characteristics needed to evaluate nomograms such as predictive accuracy, calibration, generalizability, level of complexity, effect of competing risks, conditional probabilities, and head-to-head comparison with other prediction methods. The predictive accuracies of the pre-cystectomy tools (n = 2) range from ∼65–75% and that of the post-cystectomy tools (n = 5) range from ∼75–80%. While some of these nomograms are well-calibrated and outperform AJCC staging, none has been externally validated. To date, four studies demonstrated a statistically significant improvement in predictive accuracy of nomograms by including biomarkers.
Conclusions
Nomograms provide accurate individualized estimates of outcomes. They currently represent the most accurate and discriminatory decision-making aids for predicting outcomes in patients with bladder cancer. Use of current nomograms could improve current selection of patients for standard therapy and investigational trial design by ensuring homogeneous groups. The addition of biological markers to the currently available nomograms using clinical and pathologic data holds the promise of improving prediction and refining management of patients with bladder cancer.
doi:10.1177/1756287209103923
PMCID: PMC3126044  PMID: 21789050
bladder cancer; nomogram; prediction; prognosis; risk
8.  Identification and Validation of a Gene Expression Signature That Predicts Outcome in Adult Men With Germ Cell Tumors 
Journal of Clinical Oncology  2009;27(31):5240-5247.
Purpose
Germ cell tumor (GCT) is the most common malignancy in young adult men. Currently, patients are risk-stratified on the basis of clinical presentation and serum tumor markers. The introduction of molecular markers could improve outcome prediction.
Patients and Methods
Expression profiling was performed on 74 nonseminomatous GCTs (NSGCTs) from cisplatin-treated patients (ie, training set) and on 34 similarly treated patients with NSGCTs (ie, validation set). A gene classifier was developed by using prediction analysis for microarrays (PAM) for the binary end point of 5-year overall survival (OS). A predictive score was developed for OS by using the univariate Cox model.
Results
In the training set, PAM identified 140 genes that predicted 5-year OS (cross-validated classification rate, 60%). The PAM model correctly classified 90% of patients in the validation set. Patients predicted to have good outcome had significantly longer survival than those with poor predicted outcome (P < .001). For the OS end point, a 10-gene model had a predictive accuracy (ie, concordance index) of 0.66 in the training set and a concordance index of 0.83 in the validation set. Dichotomization of the samples on the basis of the median score resulted in significant differences in survival (P = .002). For both end points, the gene-based predictor was an independent prognostic factor in a multivariate model that included clinical risk stratification (P < .01 for both).
Conclusion
We have identified gene expression signatures that accurately predict outcome in patients with GCTs. These predictive genes should be useful for the prediction of patient outcome and could provide novel targets for therapeutic intervention.
doi:10.1200/JCO.2008.20.0386
PMCID: PMC3651602  PMID: 19770384
9.  A Systems Biology-Based Classifier for Hepatocellular Carcinoma Diagnosis 
PLoS ONE  2011;6(7):e22426.
Aim
The diagnosis of hepatocellular carcinoma (HCC) in the early stage is crucial to the application of curative treatments which are the only hope for increasing the life expectancy of patients. Recently, several large-scale studies have shed light on this problem through analysis of gene expression profiles to identify markers correlated with HCC progression. However, those marker sets shared few genes in common and were poorly validated using independent data. Therefore, we developed a systems biology based classifier by combining the differential gene expression with topological features of human protein interaction networks to enhance the ability of HCC diagnosis.
Methods and Results
In the Oncomine platform, genes differentially expressed in HCC tissues relative to their corresponding normal tissues were filtered by a corrected Q value cut-off and Concept filters. The identified genes that are common to different microarray datasets were chosen as the candidate markers. Then, their networks were analyzed by GeneGO Meta-Core software and the hub genes were chosen. After that, an HCC diagnostic classifier was constructed by Partial Least Squares modeling based on the microarray gene expression data of the hub genes. Validations of diagnostic performance showed that this classifier had high predictive accuracy (85.88–92.71%) and area under ROC curve (approximating 1.0), and that the network topological features integrated into this classifier contribute greatly to improving the predictive performance. Furthermore, it has been demonstrated that this modeling strategy is not only applicable to HCC, but also to other cancers.
Conclusion
Our analysis suggests that the systems biology-based classifier that combines the differential gene expression and topological features of human protein interaction network may enhance the diagnostic performance of HCC classifier.
doi:10.1371/journal.pone.0022426
PMCID: PMC3145651  PMID: 21829460
10.  Gene Expression Classification of Colon Cancer into Molecular Subtypes: Characterization, Validation, and Prognostic Value 
PLoS Medicine  2013;10(5):e1001453.
Background
Colon cancer (CC) pathological staging fails to accurately predict recurrence, and to date, no gene expression signature has proven reliable for prognosis stratification in clinical practice, perhaps because CC is a heterogeneous disease. The aim of this study was to establish a comprehensive molecular classification of CC based on mRNA expression profile analyses.
Methods and Findings
Fresh-frozen primary tumor samples from a large multicenter cohort of 750 patients with stage I to IV CC who underwent surgery between 1987 and 2007 in seven centers were characterized for common DNA alterations, including BRAF, KRAS, and TP53 mutations, CpG island methylator phenotype, mismatch repair status, and chromosomal instability status, and were screened with whole genome and transcriptome arrays. 566 samples fulfilled RNA quality requirements. Unsupervised consensus hierarchical clustering applied to gene expression data from a discovery subset of 443 CC samples identified six molecular subtypes. These subtypes were associated with distinct clinicopathological characteristics, molecular alterations, specific enrichments of supervised gene expression signatures (stem cell phenotype–like, normal-like, serrated CC phenotype–like), and deregulated signaling pathways. Based on their main biological characteristics, we distinguished a deficient mismatch repair subtype, a KRAS mutant subtype, a cancer stem cell subtype, and three chromosomal instability subtypes, including one associated with down-regulated immune pathways, one with up-regulation of the Wnt pathway, and one displaying a normal-like gene expression profile. The classification was validated in the remaining 123 samples plus an independent set of 1,058 CC samples, including eight public datasets. Furthermore, prognosis was analyzed in the subset of stage II–III CC samples. The subtypes C4 and C6, but not the subtypes C1, C2, C3, and C5, were independently associated with shorter relapse-free survival, even after adjusting for age, sex, stage, and the emerging prognostic classifier Oncotype DX Colon Cancer Assay recurrence score (hazard ratio 1.5, 95% CI 1.1–2.1, p = 0.0097). However, a limitation of this study is that information on tumor grade and number of nodes examined was not available.
Conclusions
We describe the first, to our knowledge, robust transcriptome-based classification of CC that improves the current disease stratification based on clinicopathological variables and common DNA markers. The biological relevance of these subtypes is illustrated by significant differences in prognosis. This analysis provides possibilities for improving prognostic models and therapeutic strategies. In conclusion, we report a new classification of CC into six molecular subtypes that arise through distinct biological pathways.
Editors' Summary
Background
Cancer of the large bowel (colorectal cancer) is the third most common cancer in men and the second most common cancer in women worldwide. Despite recent advances in the screening, diagnosis, and treatment of colorectal cancer, an estimated 608,000 people die every year from this form of cancer—8% of all cancer deaths. The prognosis and treatment options for colorectal cancer depend on five pathological stages (0–IV), each of which has a different treatment option and five year survival rate, so it is important that the stage is correctly identified. Unfortunately, pathological staging fails to accurately predict recurrence (relapse) in patients undergoing surgery for localized colorectal cancer, which is a concern, as 10%–20% of patients with stage II and 30%–40% of those with stage III colorectal cancer develop recurrence.
Why Was This Study Done?
Previous studies have investigated whether there are any possible gene expression profiles (identified through microarray techniques) that can help predict prognosis of colorectal cancer, but so far, there have been no firm conclusions that can aid clinical practice. In this study, the researchers used genetic information from a French multicenter study to identify a standard, reproducible molecular classification based on gene expression analysis of colorectal cancer. The authors also assessed whether there were any associations between the identified molecular subtypes and clinical and pathological factors, common DNA alterations, and prognosis.
What Did the Researchers Do and Find?
The researchers used genetic information from a cohort of 750 patients with stage I to IV colorectal cancer who underwent surgery between 1987 and 2007 in seven centers in France. The researchers identified relevant clinical and pathological staging information for each patient from the medical records and calculated recurrence-free survival (the time from surgery to the first recurrence) for patients with stage II or III disease. In the genetic analysis, 566 tumor samples were suitable—443 were used in a discovery set, to create the classification, and the remainder were used in a validation set, to test the classification. The researchers also used information from eight public datasets to validate their findings.
Using these methods, the researchers classified the colon cancer samples into six molecular subtypes (based on gene expression data) and, on further analysis and validation, were able to distinguish the main biological characteristics and deregulated pathways associated with each subtype. Importantly, the researchers found that these six subtypes were associated with distinct clinical and pathological characteristics, molecular alterations, specific gene expression signatures, and deregulated signaling pathways. In the prognostic analysis based on recurrence-free survival, the researchers found that patients whose tumors were classified in one of two clusters (C4 and C6) had poorer recurrence-free survival than the other patients.
What Do These Findings Mean?
These findings suggest that it is possible to classify colorectal cancer into six robust molecular subtypes that might help identify new prognostic subgroups and could provide a basis for developing robust prognostic genetic signatures for stage II and III colorectal cancer and for identifying specific markers for the different subtypes that might be targets for future drug development. However, as this study was retrospective and did not include some known predictors of colorectal cancer prognosis, such as tumor grade and number of nodes examined, the significance and robustness of the prognostic classification requires further confirmation with large prospective patient cohorts.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001453.
The American Cancer Society provides information about colorectal cancer and also about how colorectal cancer is staged
The US National Cancer Institute also provides information on colon and rectal cancer and colon cancer stages
doi:10.1371/journal.pmed.1001453
PMCID: PMC3660251  PMID: 23700391
11.  Establishment and validation of circulating tumor cell-based prognostic nomograms in first-line metastatic breast cancer patients 
Purpose
Circulating tumor cells (CTC) represent a new outcome-associated biomarker that is independent of known prognostic factors in metastatic breast cancer (MBC). The objective here was to develop and validate nomograms that combined baseline CTC counts and the other prognostic factors to assess the outcome of individual patients starting first-line treatment for MBC.
Experimental Design
We used a training set of 236 MBC patients starting a first-line treatment from the MD Anderson Cancer Center to establish nomograms that calculated the predicted probability of survival at different time points: 1, 2, and 5 years for overall survival (OS) and 6 months and 1 and 2 years for progression-free survival (PFS). The covariates computed in the model were: age, disease subtype, visceral metastases, performance status, and CTC counts by CellSearch. Nomograms were independently validated with 210 MBC patients from the Institut Curie who underwent first-line chemotherapy. The discriminatory ability and accuracy of the models were assessed using Harrell’s c-statistic and calibration plots at different time points in both training and validation datasets.
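Harrell's c-statistic, used above to assess discrimination, is the fraction of comparable patient pairs in which the patient with the higher predicted risk has the shorter observed survival. The sketch below is a simplified version under stated assumptions: right-censored data, pairs with tied times skipped, and made-up follow-up times, event flags, and risk scores.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) strictly before patient j's follow-up time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue                      # censored patients cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5     # tied predictions count as half
    return concordant / comparable

# Hypothetical cohort: follow-up in months, 1 = death observed, 0 = censored.
times = [5, 8, 12, 20, 25]
events = [1, 1, 0, 1, 0]
risk_scores = [0.9, 0.4, 0.6, 0.3, 0.1]
c_index = harrell_c(times, events, risk_scores)
print(f"c-statistic = {c_index:.3f}")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering; in this toy cohort 7 of 8 comparable pairs are concordant.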
Results
Median follow-up was 23 and 29 months in the MD Anderson and Institut Curie cohorts, respectively. Nomograms demonstrated good C-statistics: 0.74 for OS and 0.65 for PFS and discriminated OS prediction at 1, 2, and 5 years, and PFS prediction at 6 months and 1 and 2 years.
Conclusions
Nomograms, which relied on CTC counts as a continuous covariate, easily facilitated the use of a web-based tool for estimating survival, supporting treatment-decisions and clinical trial stratification in first-line MBC.
doi:10.1158/1078-0432.CCR-12-3137
PMCID: PMC3662240  PMID: 23340302
circulating tumor cells; first-line; metastatic breast cancer; nomogram; survival
12.  On the statistical assessment of classifiers using DNA microarray data 
BMC Bioinformatics  2006;7:387.
Background
In this paper we present a method for the statistical assessment of cancer predictors which make use of gene expression profiles. The methodology is applied to a new data set of microarray gene expression data collected in Casa Sollievo della Sofferenza Hospital, Foggia – Italy. The data set is made up of normal (22) and tumor (25) specimens extracted from 25 patients affected by colon cancer. We propose to give answers to some questions which are relevant for the automatic diagnosis of cancer such as: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways can accuracy be considered dependent on the adopted classification scheme? How many genes are correlated with the pathology and how many are sufficient for an accurate colon cancer classification? The method we propose answers these questions whilst avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data.
Results
We estimate the generalization error, evaluated through the Leave-K-Out Cross Validation error, for three different classification schemes by varying the number of training examples and the number of the genes used. The statistical significance of the error rate is measured by using a permutation test. We provide a statistical analysis in terms of the frequencies of the genes involved in the classification. Using the whole set of genes, we found that the Weighted Voting Algorithm (WVA) classifier learns the distinction between normal and tumor specimens with 25 training examples, providing e = 21% (p = 0.045) as an error rate. This remains constant even when the number of examples increases. Moreover, Regularized Least Squares (RLS) and Support Vector Machines (SVM) classifiers can learn with only 15 training examples, with an error rate of e = 19% (p = 0.035) and e = 18% (p = 0.037) respectively. Moreover, the error rate decreases as the training set size increases, reaching its best performance with 35 training examples. In this case, RLS and SVM have error rates of e = 14% (p = 0.027) and e = 11% (p = 0.019). Concerning the number of genes, we found about 6000 genes (p < 0.05) correlated with the pathology, resulting from the signal-to-noise statistic. Moreover, the performance of the RLS and SVM classifiers does not change when 74% of the genes are used. Performance then progressively degrades, reaching e = 16% (p < 0.05) when only 2 genes are employed. The biological relevance of a set of genes determined by our statistical analysis and the major roles they play in colorectal tumorigenesis are discussed.
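The idea of attaching a permutation-test p-value to a cross-validated error rate can be sketched as follows. This is only an illustration under stated assumptions: it swaps in a nearest-centroid classifier and leave-one-out cross-validation in place of the WVA, RLS and SVM classifiers and the Leave-K-Out scheme used in the paper, and all function names and the toy data in the usage note are hypothetical.

```python
import random

def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_error(xs, ys):
    """Leave-one-out cross-validation error rate."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)

def permutation_p_value(xs, ys, n_perm=200, seed=0):
    """Return (observed error, p-value) where the p-value is the share of
    label permutations whose error is at least as low as the observed one."""
    rng = random.Random(seed)
    observed = loo_error(xs, ys)
    hits = 0
    for _ in range(n_perm):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        if loo_error(xs, shuffled) <= observed:
            hits += 1
    # The +1 terms count the observed labelling itself as one permutation.
    return observed, (hits + 1) / (n_perm + 1)
```

Calling `permutation_p_value(xs, ys)` on two well-separated toy clusters (both arguments as Python lists) returns a zero error rate with a small p-value, whereas labels unrelated to the features would give a p-value near 1.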
Conclusions
The method proposed provides statistically significant answers to precise questions relevant for the diagnosis and prognosis of cancer. We found that, with as few as 15 examples, it is possible to train statistically significant classifiers for colon cancer diagnosis. As for the definition of the number of genes sufficient for a reliable classification of colon cancer, our results suggest that it depends on the accuracy required.
doi:10.1186/1471-2105-7-387
PMCID: PMC1564153  PMID: 16919171
13.  TSG: a new algorithm for binary and multi-class cancer classification and informative genes selection 
BMC Medical Genomics  2013;6(Suppl 1):S3.
Background
One of the challenges in classification of cancer tissue samples based on gene expression data is to establish an effective method that can select a parsimonious set of informative genes. The Top Scoring Pair (TSP), k-Top Scoring Pairs (k-TSP), Support Vector Machines (SVM), and prediction analysis of microarrays (PAM) are four popular classifiers that have comparable performance on multiple cancer datasets. SVM and PAM tend to use a large number of genes, and TSP and k-TSP always use an even number of genes. In addition, the selection of distinct gene pairs in k-TSP simply combines the pairs of top-ranking genes without considering the fact that the gene set with the best discrimination power may not be the combined pairs. The k-TSP algorithm also needs the user to specify an upper bound for the number of gene pairs. Here we introduce a computational algorithm to address these problems. The algorithm is named the Chisquare-statistic-based Top Scoring Genes (Chi-TSG) classifier, abbreviated as TSG.
Results
The TSG classifier starts with the top two genes and sequentially adds additional genes into the candidate gene set to perform informative gene selection. The algorithm automatically reports the total number of informative genes selected with cross validation. We provide the algorithm for both binary and multi-class cancer classification. The algorithm was applied to 9 binary and 10 multi-class gene expression datasets involving human cancers. The TSG classifier outperforms TSP family classifiers by a big margin in most of the 19 datasets. In addition to improved accuracy, our classifier shares all the advantages of the TSP family classifiers, including easy interpretation, invariance to monotone transformation, a tendency to select a small number of informative genes (which allows follow-up studies), and resistance to sampling variations due to within-sample operations.
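The abstract does not spell out the TSG scoring rule, so the sketch below illustrates only the baseline Top Scoring Pair idea that TSG builds on, using the commonly cited TSP score |P(x_i < x_j | class 0) − P(x_i < x_j | class 1)|. The function names and toy data are assumptions for illustration; this is not the authors' chi-square-based method.

```python
from itertools import combinations

def tsp_scores(samples, labels):
    """Score every gene pair (i, j) for a two-class problem (labels 0 and 1).

    samples: list of expression vectors, one per sample
    Returns a dict mapping (i, j) to the absolute difference between the two
    classes in the frequency of the within-sample ordering x_i < x_j.
    """
    n_genes = len(samples[0])
    scores = {}
    for i, j in combinations(range(n_genes), 2):
        freq = []
        for cls in (0, 1):
            rows = [s for s, y in zip(samples, labels) if y == cls]
            freq.append(sum(s[i] < s[j] for s in rows) / len(rows))
        scores[(i, j)] = abs(freq[0] - freq[1])
    return scores

def top_scoring_pair(samples, labels):
    """Return the gene pair with the highest TSP score."""
    scores = tsp_scores(samples, labels)
    return max(scores, key=scores.get)
```

Because the score depends only on the ordering of two genes within each sample, applying any strictly increasing transformation to a sample's values leaves every score unchanged, which is the invariance property mentioned above.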
Conclusions
Redefining the scores for gene set and the classification rules in TSP family classifiers by incorporating the sample size information can lead to better selection of informative genes and classification accuracy. The resulting TSG classifier offers a useful tool for cancer classification based on numerical molecular data.
doi:10.1186/1755-8794-6-S1-S3
PMCID: PMC3552704  PMID: 23445528
14.  Discovery and validation of gene classifiers for endocrine-disrupting chemicals in zebrafish (danio rerio) 
BMC Genomics  2012;13:358.
Background
Development and application of transcriptomics-based gene classifiers for ecotoxicological applications lag far behind those of biomedical sciences. Many such classifiers discovered thus far lack vigorous statistical and experimental validations. A combination of genetic algorithm/support vector machines and genetic algorithm/K nearest neighbors was used in this study to search for classifiers of endocrine-disrupting chemicals (EDCs) in zebrafish. Searches were conducted on both tissue-specific and tissue-combined datasets, either across the entire transcriptome or within individual transcription factor (TF) networks previously linked to EDC effects. Candidate classifiers were evaluated by gene set enrichment analysis (GSEA) on both the original training data and a dedicated validation dataset.
Results
The multi-tissue dataset yielded no classifiers. Among the 19 chemical-tissue conditions evaluated, the transcriptome-wide searches yielded classifiers for six of them, each having approximately 20 to 30 gene features unique to a condition. Searches within individual TF networks produced classifiers for 15 chemical-tissue conditions, each containing 100 or fewer top-ranked gene features pooled from those of multiple TF networks and also unique to each condition. For the training dataset, 10 out of 11 classifiers successfully identified the gene expression profiles (GEPs) of their targeted chemical-tissue conditions by GSEA. For the validation dataset, classifiers for prochloraz-ovary and flutamide-ovary also correctly identified the GEPs of corresponding conditions while no classifier could predict the GEP from prochloraz-brain.
Conclusions
The discrepancies in the performance of these classifiers were attributed in part to varying data complexity among the conditions, as measured to some degree by Fisher’s discriminant ratio statistic. This variation in data complexity could likely be compensated by adjusting sample size for individual chemical-tissue conditions, thus suggesting a need for a preliminary survey of transcriptomic responses before launching a full scale classifier discovery effort. Classifier discovery based on individual TF networks could yield more mechanistically-oriented biomarkers. GSEA proved to be a flexible and effective tool for application of gene classifiers but a similar and more refined algorithm, connectivity mapping, should also be explored. The distribution characteristics of classifiers across tissues, chemicals, and TF networks suggested a differential biological impact among the EDCs on zebrafish transcriptome involving some basic cellular functions.
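As a rough illustration of the data complexity measure mentioned above, Fisher's discriminant ratio for a single feature is often defined as the squared difference of the class means divided by the sum of the class variances, with the maximum taken over features. The sketch below assumes that definition and uses sample variances; the abstract does not state which variant the authors used, and the function name is hypothetical.

```python
from statistics import mean, variance

def fisher_discriminant_ratio(class_a, class_b):
    """Maximum over features of (mean_a - mean_b)^2 / (var_a + var_b).

    class_a, class_b: lists of samples, each sample a list of feature values.
    Each class needs at least two samples so a sample variance exists.
    """
    ratios = []
    for col_a, col_b in zip(zip(*class_a), zip(*class_b)):
        gap = (mean(col_a) - mean(col_b)) ** 2
        spread = variance(col_a) + variance(col_b)
        if spread == 0:
            # Degenerate feature: perfectly separable if the means differ.
            ratios.append(float("inf") if gap > 0 else 0.0)
        else:
            ratios.append(gap / spread)
    return max(ratios)
```

A larger ratio indicates that at least one feature separates the two classes cleanly, so a condition with a low ratio would be expected to need more samples to yield a usable classifier.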
doi:10.1186/1471-2164-13-358
PMCID: PMC3469349  PMID: 22849515
Gene classifiers; Endocrine-disrupting chemicals; Transcriptomics; Mechanism of action; Zebrafish
15.  Prioritizing CD4 Count Monitoring in Response to ART in Resource-Constrained Settings: A Retrospective Application of Prediction-Based Classification 
PLoS Medicine  2012;9(4):e1001207.
Luis Montaner and colleagues retrospectively apply a potential capacity-saving CD4 count prediction tool to a cohort of HIV patients on antiretroviral therapy.
Background
Global programs of anti-HIV treatment depend on sustained laboratory capacity to assess treatment initiation thresholds and treatment response over time. Currently, there is no valid alternative to CD4 count testing for monitoring immunologic responses to treatment, but laboratory cost and capacity limit access to CD4 testing in resource-constrained settings. Thus, methods to prioritize patients for CD4 count testing could improve treatment monitoring by optimizing resource allocation.
Methods and Findings
Using a prospective cohort of HIV-infected patients (n = 1,956) monitored upon antiretroviral therapy initiation in seven clinical sites with distinct geographical and socio-economic settings, we retrospectively apply a novel prediction-based classification (PBC) modeling method. The model uses repeatedly measured biomarkers (white blood cell count and lymphocyte percent) to predict CD4+ T cell outcome through first-stage modeling and subsequent classification based on clinically relevant thresholds (CD4+ T cell count of 200 or 350 cells/µl). The algorithm correctly classified 90% (cross-validation estimate = 91.5%, standard deviation [SD] = 4.5%) of CD4 count measurements <200 cells/µl in the first year of follow-up; if laboratory testing is applied only to patients predicted to be below the 200-cells/µl threshold, we estimate a potential savings of 54.3% (SD = 4.2%) in CD4 testing capacity. A capacity savings of 34% (SD = 3.9%) is predicted using a CD4 threshold of 350 cells/µl. Similar results were obtained over the 3 y of follow-up available (n = 619). Limitations include a need for future economic healthcare outcome analysis, a need for assessment of extensibility beyond the 3-y observation time, and the need to assign a false positive threshold.
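The two headline quantities here, the share of truly low CD4 counts that the tool flags and the share of laboratory tests that could be skipped, can be sketched as below. The first-stage prediction model is not reproduced: predicted CD4 values are taken as given, a simple point-prediction rule is assumed for classification, and the function name and numbers in the usage note are hypothetical.

```python
def triage_summary(predicted, observed, threshold=200.0):
    """Summarise a prediction-based triage rule.

    predicted: model-predicted CD4 counts (cells/µl)
    observed: laboratory-measured CD4 counts for the same visits
    Returns (sensitivity, savings):
      sensitivity = share of truly low counts that were flagged for testing
      savings     = share of visits not flagged, i.e. tests avoided
    """
    flagged = [p < threshold for p in predicted]
    truly_low = [o < threshold for o in observed]
    n_low = sum(truly_low)
    caught = sum(f and t for f, t in zip(flagged, truly_low))
    sensitivity = caught / n_low if n_low else float("nan")
    savings = 1 - sum(flagged) / len(predicted)
    return sensitivity, savings
```

For example, if 3 of 8 hypothetical visits are flagged and 2 of the 3 truly low counts are among them, the function returns a sensitivity of about 0.67 and a capacity savings of 0.625. Raising the threshold to 350 flags more visits, which is consistent with the lower savings reported at that threshold.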
Conclusions
Our results support the use of PBC modeling as a triage point at the laboratory, lessening the need for laboratory-based CD4+ T cell count testing; implementation of this tool could help optimize the use of laboratory resources, directing CD4 testing towards higher-risk patients. However, further prospective studies and economic analyses are needed to demonstrate that the PBC model can be effectively applied in clinical settings.
Please see later in the article for the Editors' Summary
Editors' Summary
Background
AIDS has killed nearly 30 million people since 1981, and about 34 million people (most of them living in low- and middle-income countries) are now infected with HIV, the virus that causes AIDS. HIV destroys immune system cells (including CD4 cells, a type of lymphocyte and one of the body's white blood cell types), leaving infected individuals susceptible to other infections. Early in the AIDS epidemic, most HIV-infected people died within ten years of infection. Then, in 1996, antiretroviral therapy (ART) became available, and for people living in affluent countries, HIV/AIDS became a chronic condition. However, ART was expensive, and for people living in developing countries, HIV/AIDS remained a fatal illness. In 2003, HIV was declared a global health emergency, and in 2006, the international community set itself the target of achieving universal access to ART by 2010. By the end of 2010, only 6.6 million of the estimated 15 million people in need of ART in developing countries were receiving ART.
Why Was This Study Done?
One factor that has impeded progress towards universal ART coverage has been the limited availability of trained personnel and laboratory facilities in many developing countries. These resources are needed to determine when individuals should start ART—the World Health Organization currently recommends that people start ART when their CD4 count drops below 350 cells/µl—and to monitor treatment responses over time so that viral resistance to ART is quickly detected. Although a total lymphocyte count can be used as a surrogate measure to decide when to start treatment, repeated CD4 cell counts are the only way to monitor immunologic responses to treatment, a level of monitoring that is rarely sustainable in resource-constrained settings. A method that optimizes resource allocation by prioritizing who gets tested might be one way to improve treatment monitoring. In this study, the researchers applied a new tool for prioritizing laboratory-based CD4 cell count testing in resource-constrained settings to patient data that had been previously collected.
What Did the Researchers Do and Find?
The researchers fitted a mixed-effects statistical model to repeated CD4 count measurements from HIV-infected individuals from seven sites around the world (including some resource-limited sites). They then used model-derived estimates to apply a mathematical tool for predicting—from a CD4 count taken at the start of treatment, and white blood cell counts and lymphocyte percentage measurements taken later—whether CD4 counts would be above 200 cells/µl (the original threshold recommended for ART initiation) and 350 cells/µl (the current recommended threshold) for up to three years after ART initiation. The tool correctly classified 91.5% of the CD4 cell counts that were below 200 cells/µl in the first year of ART. With this threshold, the potential savings in CD4 testing capacity was 54.3%. With a CD4 count threshold of 350 cells/µl, the potential savings in testing capacity was 34%. The results over a three-year follow-up were similar. When applied to six representative HIV-positive individuals, the tool correctly predicted all the CD4 counts above 200 cells/µl, although some individuals who had a predicted CD4 count of less than 200 cells/µl actually had a CD4 count above this threshold. Thus, none of these individuals would have been exposed to an undetected dangerous CD4 count, but the application of the tool would have saved 57% of the CD4 laboratory tests done during the first year of ART.
What Do These Findings Mean?
These findings support the use of this new tool—the prediction-based classification (PBC) algorithm—for predicting a drop in CD4 count below a clinically meaningful threshold in HIV-infected individuals receiving ART. Further studies are now needed to demonstrate the feasibility, clinical effectiveness, and cost-effectiveness of this approach, to find out whether the tool can be used over extended periods of time, and to investigate whether the accuracy of its predictions can be improved by, for example, adding in periodic CD4 testing. Provided these studies confirm its early promise, the researchers suggest that the PBC algorithm could be used as a “triage” tool to direct available laboratory testing capacity to high-priority individuals (those likely to have a dangerously low CD4 count). By optimizing the use of limited laboratory resources in this and other ways, the PBC algorithm could therefore help to maintain and expand ART programs in low- and middle-income countries.
Additional Information
Please access these web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001207.
Information is available from the US National Institute of Allergy and Infectious Diseases on HIV infection and AIDS
NAM/aidsmap provides basic information about HIV/AIDS and summaries of recent research findings on HIV care and treatment
Information is available from Avert, an international AIDS charity, on many aspects of HIV/AIDS, including information on HIV/AIDS treatment and care and on universal access to AIDS treatment (in English and Spanish)
The World Health Organization provides information about universal access to AIDS treatment (in several languages)
More information about universal access to HIV treatment, prevention, care, and support is available from UNAIDS
Patient stories about living with HIV/AIDS are available through Avert and through the charity website Healthtalkonline
doi:10.1371/journal.pmed.1001207
PMCID: PMC3328436  PMID: 22529752
16.  Predicting Rotator Cuff Tears Using Data Mining and Bayesian Likelihood Ratios 
PLoS ONE  2014;9(4):e94917.
Objectives
Rotator cuff tear is a common cause of shoulder diseases. Correct diagnosis of rotator cuff tears can save patients from further invasive, costly and painful tests. This study used predictive data mining and Bayesian theory to improve the accuracy of diagnosing rotator cuff tears by clinical examination alone.
Methods
In this retrospective study, 169 patients who had a preliminary diagnosis of rotator cuff tear on the basis of clinical evaluation followed by confirmatory MRI between 2007 and 2011 were identified. MRI was used as a reference standard to classify rotator cuff tears. The predictor variable was the clinical assessment results, which consisted of 16 attributes. This study employed 2 data mining methods (ANN and the decision tree) and a statistical method (logistic regression) to classify the rotator cuff diagnosis into “tear” and “no tear” groups. Likelihood ratio and Bayesian theory were applied to estimate the probability of rotator cuff tears based on the results of the prediction models.
Results
Our proposed data mining procedures outperformed the classic statistical method. The correction rate, sensitivity, specificity and area under the ROC curve of predicting a rotator cuff tear were statistically better in the ANN and decision tree models compared to logistic regression. Based on likelihood ratios derived from our prediction models, Fagan's nomogram could be constructed to assess the probability of a patient who has a rotator cuff tear using a pretest probability and a prediction result (tear or no tear).
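The arithmetic that Fagan's nomogram performs graphically is the odds form of Bayes' theorem: post-test odds equal pretest odds multiplied by the likelihood ratio. A minimal sketch follows; the function names and the sensitivity, specificity and pretest values in the usage note are illustrative assumptions, not figures from the study.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test or prediction model."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)
```

With an assumed sensitivity of 0.9 and specificity of 0.8, LR+ is 4.5 and LR- is 0.125. Starting from a pretest probability of 0.5, a "tear" prediction raises the probability to about 0.82 and a "no tear" prediction lowers it to about 0.11.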
Conclusions
Our predictive data mining models, combined with likelihood ratios and Bayesian theory, appear to be good tools to classify rotator cuff tears as well as determine the probability of the presence of the disease to enhance diagnostic decision making for rotator cuff tears.
doi:10.1371/journal.pone.0094917
PMCID: PMC3986413  PMID: 24733553
17.  International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society International Multidisciplinary Classification of Lung Adenocarcinoma 
Introduction
Adenocarcinoma is the most common histologic type of lung cancer. To address advances in oncology, molecular biology, pathology, radiology, and surgery of lung adenocarcinoma, an international multidisciplinary classification was sponsored by the International Association for the Study of Lung Cancer, American Thoracic Society, and European Respiratory Society. This new adenocarcinoma classification is needed to provide uniform terminology and diagnostic criteria, especially for bronchioloalveolar carcinoma (BAC), the overall approach to small nonresection cancer specimens, and for multidisciplinary strategic management of tissue for molecular and immunohistochemical studies.
Methods
An international core panel of experts representing all three societies was formed with oncologists/pulmonologists, pathologists, radiologists, molecular biologists, and thoracic surgeons. A systematic review was performed under the guidance of the American Thoracic Society Documents Development and Implementation Committee. The search strategy identified 11,368 citations of which 312 articles met specified eligibility criteria and were retrieved for full text review. A series of meetings were held to discuss the development of the new classification, to develop the recommendations, and to write the current document. Recommendations for key questions were graded by strength and quality of the evidence according to the Grades of Recommendation, Assessment, Development, and Evaluation approach.
Results
The classification addresses both resection specimens, and small biopsies and cytology. The terms BAC and mixed subtype adenocarcinoma are no longer used. For resection specimens, new concepts are introduced such as adenocarcinoma in situ (AIS) and minimally invasive adenocarcinoma (MIA) for small solitary adenocarcinomas with either pure lepidic growth (AIS) or predominant lepidic growth with ≤5 mm invasion (MIA) to define patients who, if they undergo complete resection, will have 100% or near 100% disease-specific survival, respectively. AIS and MIA are usually nonmucinous but rarely may be mucinous. Invasive adenocarcinomas are classified by predominant pattern after using comprehensive histologic subtyping with lepidic (formerly most mixed subtype tumors with nonmucinous BAC), acinar, papillary, and solid patterns; micropapillary is added as a new histologic subtype. Variants include invasive mucinous adenocarcinoma (formerly mucinous BAC), colloid, fetal, and enteric adenocarcinoma. This classification provides guidance for small biopsies and cytology specimens, as approximately 70% of lung cancers are diagnosed in such samples. Non-small cell lung carcinomas (NSCLCs), in patients with advanced-stage disease, are to be classified into more specific types such as adenocarcinoma or squamous cell carcinoma, whenever possible for several reasons: (1) adenocarcinoma or NSCLC not otherwise specified should be tested for epidermal growth factor receptor (EGFR) mutations as the presence of these mutations is predictive of responsiveness to EGFR tyrosine kinase inhibitors, (2) adenocarcinoma histology is a strong predictor for improved outcome with pemetrexed therapy compared with squamous cell carcinoma, and (3) potential life-threatening hemorrhage may occur in patients with squamous cell carcinoma who receive bevacizumab. 
If the tumor cannot be classified based on light microscopy alone, special studies such as immunohistochemistry and/or mucin stains should be applied to classify the tumor further. Use of the term NSCLC not otherwise specified should be minimized.
Conclusions
This new classification strategy is based on a multidisciplinary approach to diagnosis of lung adenocarcinoma that incorporates clinical, molecular, radiologic, and surgical issues, but it is primarily based on histology. This classification is intended to support clinical practice, research investigation, and clinical trials. As EGFR mutation is a validated predictive marker for response and progression-free survival with EGFR tyrosine kinase inhibitors in advanced lung adenocarcinoma, we recommend that patients with advanced adenocarcinomas be tested for EGFR mutation. This has implications for strategic management of tissue, particularly for small biopsies and cytology samples, to maximize high-quality tissue available for molecular studies. Potential impacts for tumor, node, and metastasis staging include adjustment of the size T factor according to only the invasive component (1) pathologically in invasive tumors with lepidic areas or (2) radiologically by measuring the solid component of part-solid nodules.
doi:10.1097/JTO.0b013e318206a221
PMCID: PMC4513953  PMID: 21252716
Lung; Adenocarcinoma; Classification; Histologic; Pathology; Oncology; Pulmonary; Radiology; Computed tomography; Molecular; EGFR; KRAS; EML4-ALK; Gene profiling; Gene amplification; Surgery; Limited resection; Bronchioloalveolar carcinoma; Lepidic; Acinar; Papillary; Micropapillary; Solid; Adenocarcinoma in situ; Minimally invasive adenocarcinoma; Colloid; Mucinous cystadenocarcinoma; Enteric; Fetal; Signet ring; Clear cell; Frozen section; TTF-1; p63
18.  A Risk Prediction Model for the Assessment and Triage of Women with Hypertensive Disorders of Pregnancy in Low-Resourced Settings: The miniPIERS (Pre-eclampsia Integrated Estimate of RiSk) Multi-country Prospective Cohort Study 
PLoS Medicine  2014;11(1):e1001589.
Beth Payne and colleagues use a risk prediction model, the Pre-eclampsia Integrated Estimate of RiSk (miniPIERS) to help inform the clinical assessment and triage of women with hypertensive disorders of pregnancy in low-resourced settings.
Please see later in the article for the Editors' Summary
Background
Pre-eclampsia/eclampsia are leading causes of maternal mortality and morbidity, particularly in low- and middle- income countries (LMICs). We developed the miniPIERS risk prediction model to provide a simple, evidence-based tool to identify pregnant women in LMICs at increased risk of death or major hypertensive-related complications.
Methods and Findings
From 1 July 2008 to 31 March 2012, in five LMICs, data were collected prospectively on 2,081 women with any hypertensive disorder of pregnancy admitted to a participating centre. Candidate predictors collected within 24 hours of admission were entered into a step-wise backward elimination logistic regression model to predict a composite adverse maternal outcome within 48 hours of admission. Model internal validation was accomplished by bootstrapping and external validation was completed using data from 1,300 women in the Pre-eclampsia Integrated Estimate of RiSk (fullPIERS) dataset. Predictive performance was assessed for calibration, discrimination, and stratification capacity. The final miniPIERS model included: parity (nulliparous versus multiparous); gestational age on admission; headache/visual disturbances; chest pain/dyspnoea; vaginal bleeding with abdominal pain; systolic blood pressure; and dipstick proteinuria. The miniPIERS model was well-calibrated and had an area under the receiver operating characteristic curve (AUC ROC) of 0.768 (95% CI 0.735–0.801) with an average optimism of 0.037. External validation AUC ROC was 0.713 (95% CI 0.658–0.768). A predicted probability ≥25% to define a positive test classified women with 85.5% accuracy. Limitations of this study include the composite outcome and the broad inclusion criteria of any hypertensive disorder of pregnancy. This broad approach was used to optimize model generalizability.
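Two of the performance measures reported here, the AUC ROC and the accuracy obtained when a predicted probability of at least 25% defines a positive test, can be sketched as follows. The function names and the toy probabilities in the usage note are hypothetical, and the bootstrap optimism correction used by the authors is not reproduced.

```python
def auc_roc(scores, labels):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy_at_threshold(scores, labels, threshold=0.25):
    """Share of cases classified correctly when score >= threshold is positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)
```

For the toy inputs `scores=[0.1, 0.4, 0.35, 0.8]` and `labels=[0, 0, 1, 1]`, three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75. Accuracy at the 0.25 cut-off is also 0.75. Note that accuracy depends on the chosen threshold and on outcome prevalence, while the AUC does not.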
Conclusions
The miniPIERS model shows reasonable ability to identify women at increased risk of adverse maternal outcomes associated with the hypertensive disorders of pregnancy. It could be used in LMICs to identify women who would benefit most from interventions such as magnesium sulphate, antihypertensives, or transportation to a higher level of care.
Editors' Summary
Background
Each year, ten million women develop pre-eclampsia or a related hypertensive (high blood pressure) disorder of pregnancy and 76,000 women die as a result. Globally, hypertensive disorders of pregnancy cause around 12% of maternal deaths—deaths of women during or shortly after pregnancy. The mildest of these disorders is gestational hypertension, high blood pressure that develops after 20 weeks of pregnancy. Gestational hypertension does not usually harm the mother or her unborn child and resolves after delivery but up to a quarter of women with this condition develop pre-eclampsia, a combination of hypertension and protein in the urine (proteinuria). Women with mild pre-eclampsia may not have any symptoms—the condition is detected during antenatal checks—but more severe pre-eclampsia can cause headaches, blurred vision, and other symptoms, and can lead to eclampsia (fits), multiple organ failure, and death of the mother and/or her baby. The only “cure” for pre-eclampsia is to deliver the baby as soon as possible but women are sometimes given antihypertensive drugs to lower their blood pressure or magnesium sulfate to prevent seizures.
Why Was This Study Done?
Women in low- and middle-income countries (LMICs) are more likely to develop complications of pre-eclampsia than women in high-income countries and most of the deaths associated with hypertensive disorders of pregnancy occur in LMICs. The high burden of illness and death in LMICs is thought to be primarily due to delays in triage (the identification of women who are or may become severely ill and who need specialist care) and delays in transporting these women to facilities where they can receive appropriate care. Because there is a shortage of health care workers who are adequately trained in the triage of suspected cases of hypertensive disorders of pregnancy in many LMICs, one way to improve the situation might be to design a simple tool to identify women at increased risk of complications or death from hypertensive disorders of pregnancy. Here, the researchers develop miniPIERS (Pre-eclampsia Integrated Estimate of RiSk), a clinical risk prediction model for adverse outcomes among women with hypertensive disorders of pregnancy suitable for use in community and primary health care facilities in LMICs.
What Did the Researchers Do and Find?
The researchers used data on candidate predictors of outcome that are easy to collect and/or measure in all health care settings and that are associated with pre-eclampsia from women admitted with any hypertensive disorder of pregnancy to participating centers in five LMICs to build a model to predict death or a serious complication such as organ damage within 48 hours of admission. The miniPIERS model included parity (whether the woman had been pregnant before), gestational age (length of pregnancy), headache/visual disturbances, chest pain/shortness of breath, vaginal bleeding with abdominal pain, systolic blood pressure, and proteinuria detected using a dipstick. The model was well-calibrated (the predicted risk of adverse outcomes agreed with the observed risk of adverse outcomes among the study participants), it had a good discriminatory ability (it could separate women who had an adverse outcome from those who did not), and it designated women as being at high risk (25% or greater probability of an adverse outcome) with an accuracy of 85.5%. Importantly, external validation using data collected in fullPIERS, a study that developed a more complex clinical prediction model based on data from women attending tertiary hospitals in high-income countries, confirmed the predictive performance of miniPIERS.
What Do These Findings Mean?
These findings indicate that the miniPIERS model performs reasonably well as a tool to identify women at increased risk of adverse maternal outcomes associated with hypertensive disorders of pregnancy. Because miniPIERS only includes simple-to-measure personal characteristics, symptoms, and signs, it could potentially be used in resource-constrained settings to identify the women who would benefit most from interventions such as transportation to a higher level of care. However, further external validation of miniPIERS is needed using data collected from women living in LMICs before the model can be used during routine antenatal care. Moreover, the value of miniPIERS needs to be confirmed in implementation projects that examine whether its potential translates into clinical improvements. For now, though, the model could provide the basis for an education program to increase the knowledge of women, families, and community health care workers in LMICs about the signs and symptoms of hypertensive disorders of pregnancy.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001589.
The World Health Organization provides guidelines for the management of hypertensive disorders of pregnancy in low-resourced settings
The Maternal and Child Health Integrated Program provides information on pre-eclampsia and eclampsia targeted to low-resourced settings along with a tool-kit for LMIC providers
The US National Heart, Lung, and Blood Institute provides information about high blood pressure in pregnancy and a guide to lowering blood pressure in pregnancy
The UK National Health Service Choices website provides information about pre-eclampsia
The US not-for-profit organization Preeclampsia Foundation provides information about all aspects of pre-eclampsia; its website includes some personal stories
The UK charity Healthtalkonline also provides personal stories about hypertensive disorders of pregnancy
MedlinePlus provides links to further information about high blood pressure and pregnancy (in English and Spanish); the MedlinePlus Encyclopedia has a video about pre-eclampsia (also in English and Spanish)
More information about miniPIERS and about fullPIERS is available
doi:10.1371/journal.pmed.1001589
PMCID: PMC3897359  PMID: 24465185
19.  Non-muscle invasive bladder cancer risk stratification 
Introduction:
Non-muscle invasive bladder cancer (NMIBC) comprises about 70% of all newly diagnosed bladder cancer and includes tumors with stage Ta, T1 and carcinoma in situ (CIS). Since NMIBC patients with progression to muscle-invasive disease tend to have a worse prognosis than patients with primary muscle-invasive disease, there is a need to significantly improve risk stratification and earlier definitive treatment for high-risk NMIBC.
Materials and Methods:
A detailed Medline search was performed to identify all publications on the topic of prognostic factors and risk predictions for superficial bladder cancer/NMIBC. The manuscripts were reviewed to identify variables that could predict recurrence and progression.
Results:
The most important prognostic factor for progression is grade of tumor. T category, tumor size, number of tumors, concurrent CIS, intravesical therapy, response to bacillus Calmette–Guerin at 3- or 6-month follow-up, prior recurrence rate, age, gender, lymphovascular invasion and depth of lamina propria invasion are other important clinical and pathological parameters to predict recurrence and progression in patients with NMIBC. The European Organization for Research and Treatment of Cancer (EORTC) and the Spanish Club Urológico Español de Tratamiento Oncológico (CUETO) risk tables are the two best-established predictive models for recurrence and progression risk calculation, although they tend to overestimate risk and have poor discrimination for prognostic outcomes in external validation. Molecular biomarkers such as Ki-67, FGFR3 and p53 appear to be promising in predicting recurrence and progression but need further validation prior to using them in clinical practice.
Conclusion:
EORTC and CUETO risk tables are the two best-established models to predict recurrence and progression in patients with NMIBC though they tend to overestimate risk and have poor discrimination for prognostic outcomes in external validation. Future research should focus on enhancing the predictive accuracy of risk assessment tools by incorporating additional prognostic factors such as depth of lamina propria invasion and molecular biomarkers after rigorous validation in multi-institutional cohorts.
doi:10.4103/0970-1591.166445
PMCID: PMC4626912  PMID: 26604439
Non-muscle invasive bladder cancer; superficial bladder cancer; outcome; prediction models; progression; recurrence; risk stratification
20.  Predicting Progression from Mild Cognitive Impairment to Alzheimer's Dementia Using Clinical, MRI, and Plasma Biomarkers via Probabilistic Pattern Classification 
PLoS ONE  2016;11(2):e0138866.
Background
Individuals with mild cognitive impairment (MCI) have a substantially increased risk of developing dementia due to Alzheimer's disease (AD). In this study, we developed a multivariate prognostic model for predicting MCI-to-dementia progression at the individual patient level.
Methods
Using baseline data from 259 MCI patients and a probabilistic, kernel-based pattern classification approach, we trained a classifier to distinguish between patients who progressed to AD-type dementia (n = 139) and those who did not (n = 120) during a three-year follow-up period. More than 750 variables across four data sources were considered as potential predictors of progression. These data sources included risk factors, cognitive and functional assessments, structural magnetic resonance imaging (MRI) data, and plasma proteomic data. Predictive utility was assessed using a rigorous cross-validation framework.
Results
Cognitive and functional markers were most predictive of progression, while plasma proteomic markers had limited predictive utility. The best performing model incorporated a combination of cognitive/functional markers and morphometric MRI measures and predicted progression with 80% accuracy (83% sensitivity, 76% specificity, AUC = 0.87). Predictors of progression included scores on the Alzheimer's Disease Assessment Scale, Rey Auditory Verbal Learning Test, and Functional Activities Questionnaire, as well as volume/cortical thickness of three brain regions (left hippocampus, middle temporal gyrus, and inferior parietal cortex). Calibration analysis revealed that the model is capable of generating probabilistic predictions that reliably reflect the actual risk of progression. Finally, we found that the predictive accuracy of the model varied with patient demographic, genetic, and clinical characteristics and could be further improved by taking into account the confidence of the predictions.
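As a quick consistency check, the reported overall accuracy can be recomputed from the reported sensitivity, specificity, and group sizes (139 progressors, 120 non-progressors). This is a minimal sketch; the function name is hypothetical, and the rounded percentages from the abstract are used as inputs, so the result is approximate.

```python
def implied_accuracy(sensitivity, specificity, n_positive, n_negative):
    """Overall accuracy implied by sensitivity, specificity, and class sizes."""
    correct = sensitivity * n_positive + specificity * n_negative
    return correct / (n_positive + n_negative)


# Figures reported in the abstract: 83% sensitivity, 76% specificity,
# 139 patients who progressed and 120 who did not.
accuracy = implied_accuracy(0.83, 0.76, n_positive=139, n_negative=120)
print(round(accuracy, 2))  # prints 0.8
```

The implied value of roughly 0.80 agrees with the 80% accuracy stated in the abstract.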
Conclusions
We developed an accurate prognostic model for predicting MCI-to-dementia progression over a three-year period. The model utilizes widely available, cost-effective, non-invasive markers and can be used to improve patient selection in clinical trials and identify high-risk MCI patients for early treatment.
doi:10.1371/journal.pone.0138866
PMCID: PMC4762666  PMID: 26901338
21.  Nomogram for predicting lymph node metastasis rate of submucosal gastric cancer by analyzing clinicopathological characteristics associated with lymph node metastasis 
Background
To combine clinicopathological characteristics associated with lymph node metastasis for submucosal gastric cancer into a nomogram.
Methods
We retrospectively analyzed 262 patients with submucosal gastric cancer who underwent D2 gastrectomy between 1996 and 2012. The relationship between lymph node metastasis and clinicopathological features was statistically analyzed. With multivariate logistic regression analysis, we made a nomogram to predict the possibility of lymph node metastasis. Receiver operating characteristic (ROC) analysis was also performed to assess the predictive value of the model. Discrimination and calibration were performed using internal validation.
Results
A total of 48 (18.3%) patients with submucosal gastric cancer had pathologically confirmed lymph node metastasis. For submucosal gastric carcinoma, lymph node metastasis was associated with age, tumor location, macroscopic type, size, differentiation, histology, the existence of ulcer and lymphovascular invasion in univariate analysis (all P<0.05). The multivariate logistic regression analysis identified age ≤50 years, macroscopic type III or mixed, undifferentiated type, and presence of lymphovascular invasion as independent risk factors for lymph node metastasis in submucosal gastric cancer (all P<0.05). We constructed a predictive nomogram with all these factors for lymph node metastasis in submucosal gastric cancer with good discrimination [area under the curve (AUC) =0.844]. Internal validation demonstrated good discrimination power, and the actual probability corresponded closely with the predicted probability.
Conclusions
We developed a nomogram to predict the rate of lymph node metastasis for submucosal gastric cancer. With good discrimination and internal validation, the nomogram improved individualized predictions to help clinicians make appropriate treatment decisions for submucosal gastric cancer patients.
doi:10.3978/j.issn.1000-9604.2015.12.06
PMCID: PMC4697108  PMID: 26752931
Endoscopic resection; lymph node metastasis; nomogram; receiver operating characteristic (ROC); submucosal gastric cancer
22.  Nomogram predicted survival of patients with adenocarcinoma of esophagogastric junction 
Background
The aim of this study is to develop a prognostic nomogram for patients with adenocarcinoma of esophagogastric junction and compare its predictive accuracy with the traditional tumor-node-metastasis (TNM) malignant staging system.
Methods
Data on patients from the Surveillance, Epidemiology, and End Results Program (from 1988 to 2011) and the First Affiliated Hospital of Xi’an Jiaotong University (from 2005 to 2010) were collected retrospectively. Preselected multiple potential interactions were tested irrespective of significance as nomogram parameters. Harrell’s C-index was used to estimate the accuracy of the nomogram system. Model validation was performed using bootstrap to quantify our modeling strategy.
Results
In our study, six clinical associated factors (age, sex, depth of invasion, metastasized lymph nodes, examined lymph nodes, histological grade) were evaluated in the nomogram. In the training set, the nomogram exhibited superior discrimination power compared with the American Joint Committee on Cancer (AJCC) TNM classification (Harrell’s C-index, 0.69 and 0.63, respectively). Calibration of the nomogram predicted survival was similar to the actual overall survival. In the validation set, the discrimination of nomogram was also better than the AJCC TNM staging system (C-index, 0.75 and 0.65, respectively), and the calibration of nomogram predicted survival was within a 10 % margin of actual overall survival.
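Harrell's C-index, used above to compare the nomogram with TNM staging, can be sketched as follows. This is a minimal illustration with invented data, not the authors' implementation: a pair of patients is usable when the one with the shorter observed time actually had the event, and it is concordant when that patient also has the higher predicted risk. Pairs with tied times are ignored, which is a simplifying assumption.

```python
def harrell_c(times, events, risk_scores):
    """Fraction of usable pairs in which the patient with the shorter observed
    survival has the higher predicted risk (risk ties count 0.5)."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Usable pair: patient i had the event before patient j's
            # observed time (whether j was censored or not).
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / usable


# Invented example (assumption): survival in months, event flag
# (1 = died, 0 = censored), and a model's predicted risk score.
times = [5, 8, 12, 20]
events = [1, 0, 1, 1]
risk_scores = [0.9, 0.4, 0.1, 0.2]

print(harrell_c(times, events, risk_scores))  # prints 0.75
```

A value of 0.5 corresponds to chance-level ordering and 1.0 to perfect ordering, which is the scale on which the reported values of 0.63 to 0.75 should be read.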
Conclusions
Based on patients with adenocarcinoma of esophagogastric junction from a Western and an Eastern database, the nomogram provided significantly better discrimination than the traditional AJCC TNM classification and also provided an accurate individualized prediction of survival.
Electronic supplementary material
The online version of this article (doi:10.1186/s12957-015-0613-7) contains supplementary material, which is available to authorized users.
doi:10.1186/s12957-015-0613-7
PMCID: PMC4465317  PMID: 26055624
Adenocarcinoma of esophagogastric junction; Nomogram; Predictor; Survival
23.  Mass spectrometry protein expression profiles in colorectal cancer tissue associated with clinico-pathological features of disease 
BMC Cancer  2010;10:410.
Background
Studies of several tumour types have shown that expression profiling of cellular protein extracted from surgical tissue specimens by direct mass spectrometry analysis can accurately discriminate tumour from normal tissue and in some cases can sub-classify disease. We have evaluated the potential value of this approach to classify various clinico-pathological features in colorectal cancer by employing matrix-assisted laser desorption ionisation time-of-flight mass spectrometry (MALDI-TOF MS).
Methods
Protein extracts from 31 tumour and 33 normal mucosa specimens were purified, subjected to MALDI-TOF MS and then analysed using the 'GenePattern' suite of computational tools (Broad Institute, MIT, USA). Comparative Gene Marker Selection with either a t-test or a signal-to-noise ratio (SNR) test statistic was used to identify and rank differentially expressed marker peaks. The k-nearest neighbours algorithm was used to build classification models either using separate training and test datasets or else by using an iterative, 'leave-one-out' cross-validation method.
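The combination of k-nearest neighbours classification with leave-one-out cross-validation described above can be sketched in a few lines. This is an illustration only, not the GenePattern implementation: the two-dimensional "peak intensity" vectors are invented, Euclidean distance and k = 3 are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(xs, ys, k=3):
    """Leave-one-out: hold out each specimen in turn, train on the rest,
    and report the fraction of held-out specimens classified correctly."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if knn_predict(train_x, train_y, xs[i], k) == ys[i]:
            correct += 1
    return correct / len(xs)


# Invented, well-separated toy data (assumption): two peak intensities
# per specimen.
xs = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (1.1, 1.2),
      (3.0, 3.1), (2.9, 3.2), (3.1, 2.8), (3.2, 3.0)]
ys = ["normal"] * 4 + ["tumour"] * 4

print(loocv_accuracy(xs, ys, k=3))
print(knn_predict(xs, ys, (3.0, 3.0)))
```

Leave-one-out validation reuses every specimen for testing, which suits small cohorts such as this one, but it does not replace validation on an independent dataset.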
Results
Seventy-three protein peaks in the mass range 1800-16000 Da were differentially expressed in tumour versus adjacent normal mucosa tissue (P ≤ 0.01, false discovery rate ≤ 0.05). Unsupervised hierarchical cluster analysis classified most tumour and normal mucosa into distinct cluster groups. Supervised prediction correctly classified the tumour/normal mucosa status of specimens in an independent test spectra dataset with 100% sensitivity and specificity (95% confidence interval: 67.9-99.2%). Supervised prediction using 'leave-one-out' cross-validation algorithms for tumour spectra correctly classified 10/13 poorly differentiated and 16/18 well/moderately differentiated tumours (P < 0.001; receiver-operator characteristics - ROC - error, 0.171); disease recurrence was correctly predicted in 5/6 cases and disease-free survival (median follow-up time, 25 months) was correctly predicted in 22/23 cases (P < 0.001; ROC error, 0.105). A similar analysis of normal mucosa spectra correctly predicted 11/14 patients with, and 15/19 patients without, lymph node involvement (P = 0.001; ROC error, 0.212).
Conclusions
Protein expression profiling of surgically resected CRC tissue extracts by MALDI-TOF MS has potential value in studies aimed at improved molecular classification of this disease. Further studies, with longer follow-up times and larger patient cohorts, that would permit independent validation of supervised classification models, would be required to confirm the predictive value of tumour spectra for disease recurrence/patient survival.
doi:10.1186/1471-2407-10-410
PMCID: PMC2927547  PMID: 20691062
24.  Postoperative systems models more accurately predict risk of significant disease progression than standard risk groups and a 10-year postoperative nomogram: potential impact on the receipt of adjuvant therapy after surgery 
BJU international  2011;109(1):40-45.
Objectives
To compare the performance of a systems-based risk assessment tool with standard defined risk groups and the 10-year postoperative nomogram for predicting disease progression, including biochemical relapse and clinical (systemic) failure.
Patients and methods
Clinical variables, biometric profiles and outcome results from a training cohort comprising 373 patients in a published postoperative systems-based prognostic model were obtained.
Patients were stratified according to D’Amico standard risk groups, Kattan 10-year postoperative nomogram and prognostic scores from the postoperative tissue model.
The association of pathological variables and calculated risk groups with biochemical recurrence and clinical (systemic) failure was assessed using the concordance index (C-index) and hazard ratio (HR).
Results
Systems-based post-prostatectomy models to predict significant disease progression (post-treatment clinical failure) were more accurate than the D’Amico defined risk groups and the Kattan 10-year postoperative nomogram (systems model: C-index, 0.84; HR, 17.46; P < 0.001 vs D’Amico: C-index, 0.73; HR, 11; P = 0.001; 10-year nomogram: C-index, 0.79; HR, 5.06; P < 0.001).
The systems models were also more accurate than standard risk groups for predicting prostate-specific antigen recurrence (systems model: C-index, 0.76; HR, 8.94; P < 0.001 vs D’Amico: C-index, 0.70; HR, 4.67; P < 0.001) and showed incremental improvement over the 10-year postoperative nomogram (C-index, 0.75; HR, 5.83; P < 0.001).
The postoperative tissue model provided additional risk discrimination over surgical margin status and extracapsular extension for predicting disease outcome, and was most significant for the clinical (systemic) failure endpoint (surgical margin: C-index, 0.58; HR, 1.57; P = 0.2; extracapsular extension: C-index, 0.62; HR, 2.06; P = 0.04).
Conclusions
Risk assessment models that incorporate characteristics from the patient’s own tumour specimen are more accurate than clinical-only nomograms for predicting significant disease outcome.
Systems-based tools should provide useful information concerning the appropriate receipt of adjuvant therapy in the post-surgical setting.
doi:10.1111/j.1464-410X.2011.10398.x
PMCID: PMC4035101  PMID: 21771247
prostate cancer; systems-based models; nomogram
25.  Pre- and Post-Operative Nomograms to Predict Recurrence-Free Probability in Korean Men with Clinically Localized Prostate Cancer 
PLoS ONE  2014;9(6):e100053.
Objectives
Although the incidence of prostate cancer (PCa) is rapidly increasing in Korea, there are few suitable prediction models for disease recurrence after radical prostatectomy (RP). We established pre- and post-operative nomograms estimating biochemical recurrence (BCR)-free probability after RP in Korean men with clinically localized PCa.
Patients and Methods
Our sampling frame included 3,034 consecutive men with clinically localized PCa who underwent RP at our tertiary centers from June 2004 through July 2011. After inappropriate data exclusion, we evaluated 2,867 patients for the development of nomograms. The Cox proportional hazards regression model was used to develop pre- and post-operative nomograms that predict BCR-free probability. Finally, we resampled from our study cohort 200 times to determine the accuracy of our nomograms on internal validation, which were designated with concordance index (c-index) and further represented by calibration plots.
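The resampling step described above can be sketched as a bootstrap of the concordance index. This is a simplified illustration under stated assumptions: the data are invented, the risk scores are held fixed rather than refitting a Cox model in each resample (which a full internal validation would normally do), and the function names are hypothetical. Only the figure of 200 resamples comes from the abstract.

```python
import random


def c_index(rows):
    """Concordance index for rows of (time, event, risk).
    Returns None when a resample contains no usable pairs."""
    concordant, usable = 0.0, 0
    for t_i, e_i, r_i in rows:
        for t_j, _, r_j in rows:
            if e_i == 1 and t_i < t_j:
                usable += 1
                if r_i > r_j:
                    concordant += 1.0
                elif r_i == r_j:
                    concordant += 0.5
    return concordant / usable if usable else None


def bootstrap_c_index(rows, n_resamples=200, seed=0):
    """Resample patients with replacement and recompute the c-index each time."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        sample = [rng.choice(rows) for _ in rows]
        value = c_index(sample)
        if value is not None:
            estimates.append(value)
    return estimates


# Invented example (assumption): months to recurrence or censoring,
# event flag (1 = recurrence), and predicted risk score.
rows = [(6, 1, 0.9), (10, 1, 0.7), (14, 0, 0.5), (20, 1, 0.6),
        (26, 0, 0.3), (33, 1, 0.4), (40, 0, 0.2), (47, 0, 0.1)]

estimates = bootstrap_c_index(rows, n_resamples=200)
print(sum(estimates) / len(estimates))
```

The spread of the resampled estimates gives a rough sense of how stable the c-index is for a cohort of this size.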
Results
Over a median of 47 months of follow-up, the estimated BCR-free rate was 87.8% (1 year), 83.8% (2 year), and 72.5% (5 year). In the pre-operative model, Prostate-Specific Antigen (PSA), the proportion of positive biopsy cores, clinical T3a and biopsy Gleason score (GS) were independent predictive factors for BCR, while all relevant predictive factors (PSA, extra-prostatic extension, seminal vesicle invasion, lymph node metastasis, surgical margin, and pathologic GS) were associated with BCR in the post-operative model. The c-index representing predictive accuracy was 0.792 (pre-) and 0.821 (post-operative), showing good fit in the calibration plots.
Conclusions
In summary, we developed pre- and post-operative nomograms predicting BCR-free probability after RP in a large Korean cohort with clinically localized PCa. These nomograms will be provided as the mobile application-based SNUH Prostate Cancer Calculator. Our nomograms can determine patients at high risk of disease recurrence after RP who will benefit from adjuvant therapy.
doi:10.1371/journal.pone.0100053
PMCID: PMC4061043  PMID: 24936784
