Related Articles

1.  Integration of Gene Expression Profiling and Clinical Variables to Predict Prostate Carcinoma Recurrence after Radical Prostatectomy 
Cancer  2005;104(2):290-298.
BACKGROUND
Gene expression profiling of prostate carcinoma offers an alternative means to distinguish aggressive tumor biology and may improve the accuracy of outcome prediction for patients with prostate carcinoma treated by radical prostatectomy.
METHODS
Gene expression differences between 37 recurrent and 42 nonrecurrent primary prostate tumor specimens were analyzed by oligonucleotide microarrays. Two logistic regression modeling approaches were used to predict prostate carcinoma recurrence after radical prostatectomy. One approach was based exclusively on gene expression differences between the two classes. The second approach integrated prognostic gene variables with a validated postoperative predictive model based on standard variables (nomogram). The predictive accuracy of these modeling approaches was evaluated by leave-one-out cross-validation (LOOCV) and compared with the nomogram.
RESULTS
The modeling approach using gene variables alone accurately classified 59 (75%) tissue samples in LOOCV, a classification rate substantially higher than expected by chance. However, this predictive accuracy was inferior to the nomogram (concordance index, 0.75 vs. 0.84, P = 0.01). Models combining clinical and gene variables accurately classified 70 (89%) tissue samples and the predictive accuracy using this approach (concordance index, 0.89) was superior to the nomogram (P = 0.009) and models based on gene variables alone (P < 0.001). Importantly, the combined approach provided a marked improvement for patients whose nomogram-predicted likelihood of disease recurrence was in the indeterminate range (7-year disease progression-free probability, 30–70%; concordance index, 0.83 vs. 0.59, P = 0.01).
CONCLUSIONS
Integration of gene expression signatures and clinical variables produced predictive models for prostate carcinoma recurrence that perform significantly better than those based on either clinical variables or gene expression information alone.
doi:10.1002/cncr.21157
PMCID: PMC1852494  PMID: 15948174
prostatic neoplasms/pathology/surgery; prostatectomy; gene expression profiling; treatment outcome; logistic models
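The concordance index quoted in the abstract above can be illustrated for a simple binary outcome. This is a minimal sketch, not the authors' method: it assumes each patient has one predicted score and a recurred/did-not-recur label, and it ignores censoring and follow-up time, which a survival-based concordance index would account for. The function name and the example values are hypothetical.

```python
from itertools import product


def concordance_index(scores, recurred):
    """Fraction of (recurrent, nonrecurrent) pairs that the score ranks correctly.

    Tied scores count as half-concordant. For a binary outcome this equals
    the area under the ROC curve.
    """
    positives = [s for s, y in zip(scores, recurred) if y]
    negatives = [s for s, y in zip(scores, recurred) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one case in each class")
    concordant = 0.0
    for p, n in product(positives, negatives):
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical predicted recurrence probabilities and observed outcomes.
example_scores = [0.9, 0.8, 0.4, 0.35, 0.2]
example_recurred = [True, False, True, False, False]
example_c = concordance_index(example_scores, example_recurred)
print(round(example_c, 3))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the 0.75, 0.84, and 0.89 figures are reported.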
2.  Discovery and Validation of Novel Expression Signature for Postcystectomy Recurrence in High-Risk Bladder Cancer 
Background
Nearly half of muscle-invasive bladder cancer patients succumb to their disease following cystectomy. Selecting candidates for adjuvant therapy is currently based on clinical parameters with limited predictive power. This study aimed to develop and validate genomic-based signatures that can better identify patients at risk for recurrence than clinical models alone.
Methods
Transcriptome-wide expression profiles were generated using 1.4 million-feature arrays on archival tumors from 225 patients who underwent radical cystectomy and had muscle-invasive and/or node-positive bladder cancer. Genomic (GC) and clinical (CC) classifiers for predicting recurrence were developed on a discovery set (n = 133). Performances of GC, CC, an independent clinical nomogram (IBCNC), and genomic-clinicopathologic classifiers (G-CC, G-IBCNC) were assessed in the discovery and independent validation (n = 66) sets. GC was further validated on four external datasets (n = 341). Discrimination and prognostic abilities of classifiers were compared using area under receiver-operating characteristic curves (AUCs). All statistical tests were two-sided.
Results
A 15-feature GC was developed on the discovery set with area under curve (AUC) of 0.77 in the validation set. This was higher than individual clinical variables, IBCNC (AUC = 0.73), and comparable to CC (AUC = 0.78). Performance was improved upon combining GC with clinical nomograms (G-IBCNC, AUC = 0.82; G-CC, AUC = 0.86). G-CC high-risk patients had elevated recurrence probabilities (P < .001), with GC being the best predictor by multivariable analysis (P = .005). Genomic-clinicopathologic classifiers outperformed clinical nomograms by decision curve and reclassification analyses. GC performed the best in validation compared with seven prior signatures. GC markers remained prognostic across four independent datasets.
Conclusions
The validated genomic-based classifiers outperform clinical models for predicting postcystectomy bladder cancer recurrence. This may be used to better identify patients who need more aggressive management.
doi:10.1093/jnci/dju290
PMCID: PMC4241889  PMID: 25344601
3.  Outcome Prediction of Children with Neuroblastoma using a Multigene Expression Signature, a Retrospective SIOPEN/COG/GPOH Study 
The lancet oncology  2009;10(7):663-671.
BACKGROUND
More accurate prognostic assessment of patients with neuroblastoma is required to improve the choice of risk-related therapy. The aim of this study is to develop and validate a gene expression signature for improved outcome prediction.
METHODS
Fifty-nine genes were carefully selected based on an innovative data-mining strategy and profiled in the largest neuroblastoma patient series (n=579) to date using RT-qPCR starting from only 20 ng of RNA. A multigene expression signature was built using 30 training samples, tested on 313 test samples and subsequently validated in a blind study on an independent set of 236 additional tumours.
FINDINGS
The signature accurately classifies patients with respect to overall and progression-free survival (p<0·0001). In predicting patient outcome, the signature has a performance, sensitivity, and specificity of 85·4% (95% CI: 77·7–93·2), 84·4% (95% CI: 66·5–94·1), and 86·5% (95% CI: 81·1–90·6), respectively. Multivariate analysis indicates that the signature is a significant independent predictor after controlling for currently used risk factors. Patients with high molecular risk have a higher risk of dying from disease and of relapse/progression than patients with low molecular risk (odds ratios of 19·32 (95% CI: 6·50–57·43) and 3·96 (95% CI: 1·97–7·97) for OS and PFS, respectively). Patients with increased risk for adverse outcome can also be identified within the current treatment groups, demonstrating the potential of this signature for improved clinical management. These results were confirmed in the validation study, in which the signature was also independently statistically significant in a model adjusted for MYCN status, age, INSS stage, ploidy, INPC grade of differentiation, and MKI. The high patient/gene ratio (579/59) underlies the observed statistical power and robustness.
INTERPRETATION
A 59-gene expression signature predicts outcome of neuroblastoma patients with high accuracy. The signature is an independent risk predictor, identifying patients with increased risk in the current clinical risk groups. The applied method and signature are suitable for routine lab testing and ready for evaluation in prospective studies.
FUNDING
The Belgian Foundation Against Cancer, foundation of public interest (project SCIE2006-25), the Children Cancer Fund Ghent, the Belgian Society of Paediatric Haematology and Oncology, the Belgian Kid’s Fund and the Fondation Nuovo-Soldati (JV), the Fund for Scientific Research Flanders (KDP, JH), the Fund for Scientific Research Flanders (grant number: G.0198.08), the Institute for the Promotion of Innovation by Science and Technology in Flanders, Strategisch basisonderzoek (IWT-SBO 60848), the Fondation Fournier Majoie pour l’Innovation, the Instituto Carlos III (RD 06/0020/0102), Spain, the Italian Neuroblastoma Foundation, the European Community under the FP6 (project: STREP: EET-pipeline, number: 037260), and the Belgian program of Interuniversity Poles of Attraction, initiated by the Belgian State, Prime Minister's Office, Science Policy Programming.
doi:10.1016/S1470-2045(09)70154-8
PMCID: PMC3045079  PMID: 19515614
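The performance measures reported in the abstract above (sensitivity, specificity, and odds ratio) all derive from a 2x2 table of predicted versus observed outcome. The sketch below shows the arithmetic, on the assumption that "performance" means overall accuracy. The counts are hypothetical round numbers chosen for illustration and are not taken from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Summarise a 2x2 table of predicted vs. observed outcome.

    tp/fn: patients with the adverse outcome predicted high/low risk.
    fp/tn: patients without the adverse outcome predicted high/low risk.
    The odds ratio is undefined when fp or fn is zero; None is returned then.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "odds_ratio": (tp * tn) / (fp * fn) if fp and fn else None,
    }


# Hypothetical counts, for illustration only.
example_metrics = binary_metrics(tp=40, fp=20, fn=10, tn=130)
for name, value in example_metrics.items():
    print(f"{name}: {value:.3f}")
```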
4.  Protein Function Prediction using Text-based Features extracted from the Biomedical Literature: The CAFA Challenge 
BMC Bioinformatics  2013;14(Suppl 3):S14.
Background
Advances in sequencing technology over the past decade have resulted in an abundance of sequenced proteins whose function is yet unknown. As such, computational systems that can automatically predict and annotate protein function are in demand. Most computational systems use features derived from protein sequence or protein structure to predict function. In an earlier work, we demonstrated the utility of biomedical literature as a source of text features for predicting protein subcellular location. We have also shown that the combination of text-based and sequence-based prediction improves the performance of location predictors. Following up on this work, for the Critical Assessment of Function Annotations (CAFA) Challenge, we developed a text-based system that aims to predict molecular function and biological process (using Gene Ontology terms) for unannotated proteins. In this paper, we present the preliminary work and evaluation that we performed for our system, as part of the CAFA challenge.
Results
We have developed a preliminary system that represents proteins using text-based features and predicts protein function using a k-nearest neighbour classifier (Text-KNN). We selected text features for our classifier by extracting key terms from biomedical abstracts based on their statistical properties. The system was trained and tested using 5-fold cross-validation over a dataset of 36,536 proteins. System performance was measured using the standard measures of precision, recall, F-measure and overall accuracy. The performance of our system was compared to two baseline classifiers: one that assigns function based solely on the prior distribution of protein function (Base-Prior) and one that assigns function based on sequence similarity (Base-Seq). The overall prediction accuracies of Text-KNN, Base-Prior, and Base-Seq are 62%, 43%, and 58% for molecular function classes and 17%, 11%, and 28% for biological process classes, respectively. Results obtained as part of the CAFA evaluation itself on the CAFA dataset are reported as well.
Conclusions
Our evaluation shows that the text-based classifier consistently outperforms the baseline classifier that is based on prior distribution, and typically has comparable performance to the baseline classifier that uses sequence similarity. Moreover, the results suggest that combining text features with other types of features can potentially lead to improved prediction performance. The preliminary results also suggest that while our text-based classifier can be used to predict both molecular function and biological process in which a protein is involved, the classifier performs significantly better for predicting molecular function than for predicting biological process. A similar trend was observed for other classifiers participating in the CAFA challenge.
doi:10.1186/1471-2105-14-S3-S14
PMCID: PMC3584852  PMID: 23514326
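A k-nearest neighbour classifier over text features, as described in the abstract above, can be sketched in a few lines. The abstract does not state the similarity measure, the tokenisation, or the value of k, so cosine similarity over raw term counts, whitespace tokenisation, and k = 3 are assumptions made here. The training texts and labels are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts (assumed whitespace tokenisation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query_text, training, k=3):
    """Return the majority label among the k most similar training texts.

    training is a list of (text, label) pairs. Ties in the vote go to the
    label of the more similar neighbour.
    """
    query = term_vector(query_text)
    ranked = sorted(
        training,
        key=lambda item: cosine(query, term_vector(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical literature-derived text for four annotated proteins.
example_training = [
    ("kinase phosphorylates substrate atp binding", "kinase activity"),
    ("atp binding kinase domain phosphorylation", "kinase activity"),
    ("membrane transporter ion channel", "transporter activity"),
    ("ion channel transport across membrane", "transporter activity"),
]
example_label = knn_predict("novel kinase with atp binding domain", example_training)
print(example_label)
```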
5.  Prediction of Breast Cancer Metastasis by Gene Expression Profiles: A Comparison of Metagenes and Single Genes 
Cancer Informatics  2012;11:193-217.
Background
The widespread use of microarrays in cancer research has led to the development of many predictive or prognostic gene expression profiles. However, the diversity of microarray platforms has made it difficult to fully validate such profiles and their related gene lists across studies, and their classification accuracies have rarely been validated in multiple independent datasets. Frequently, while the individual genes in such lists may not match, genes with the same function are included across lists. Development of such lists does not take into account the fact that genes can be grouped together as metagenes (MGs) based on common characteristics such as pathways, regulation, or genomic location. Such MGs might be used as features in building a predictive model applicable for classifying independent data. A systematic comparison of the independent validation of gene lists or classifiers based on metagene or individual gene (SG) features is therefore needed.
Methods
In this study we compared the performance of metagene- and single gene-based feature sets and classifiers, using random forest and two support vector machines for classifier building. The performance within the same dataset, feature set validation performance, and validation performance of entire classifiers in strictly independent datasets were assessed by 10 times repeated 10-fold cross validation, leave-one-out cross validation, and one-fold validation, respectively. To test the significance of the performance difference between MG- and SG-features/classifiers, we used a repeated down-sampled binomial test approach.
Results
MG- and SG-feature sets are transferable and perform well for training and testing prediction of metastasis outcome in strictly independent data sets, both between different and within similar microarray platforms, while classifiers had a poorer performance when validated in strictly independent datasets. The study showed that MG- and SG-feature sets perform equally well in classifying independent data. Furthermore, SG-classifiers significantly outperformed MG-classifiers when validation was conducted between datasets using similar platforms, while no significant performance difference was found when validation was performed between different platforms.
Conclusion
Prediction of metastasis outcome in lymph node–negative patients by MG- and SG-classifiers showed that SG-classifiers performed significantly better than MG-classifiers when validated in independent data based on the same microarray platform as used for developing the classifier. However, the MG- and SG-classifiers had similar performance when conducting classifier validation in independent data based on a different microarray platform. The latter was also true when only validating sets of MG- and SG-features in independent datasets, both between and within similar and different platforms.
doi:10.4137/CIN.S10375
PMCID: PMC3529607  PMID: 23304070
microarray; classification; metagenes; breast cancer
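The "10 times repeated 10-fold cross validation" named in the methods above amounts to reshuffling the samples before each round of k-fold splitting. The generator below is a minimal sketch of that index bookkeeping only. It does not stratify by outcome class, and whether the study stratified is not stated in the abstract. The function name and defaults are hypothetical.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Each repeat reshuffles the sample order, then every sample is used
    exactly once as a test case within that repeat.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(n_folds):
            test_idx = order[fold::n_folds]  # every n_folds-th shuffled sample
            test_set = set(test_idx)
            train_idx = [i for i in order if i not in test_set]
            yield repeat, fold, train_idx, test_idx


# Small illustrative run: 25 samples, 5 folds, 2 repeats.
example_splits = list(repeated_kfold(25, n_folds=5, n_repeats=2))
print(len(example_splits), "train/test splits")
```

Averaging a performance measure over all splits gives the repeated cross-validation estimate. Leave-one-out cross validation is the special case where the number of folds equals the number of samples and there is a single repeat.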
6.  Design of a multi-signature ensemble classifier predicting neuroblastoma patients' outcome 
BMC Bioinformatics  2012;13(Suppl 4):S13.
Background
Neuroblastoma is the most common pediatric solid tumor of the sympathetic nervous system. Development of improved predictive tools for patient stratification is a crucial requirement for neuroblastoma therapy. Several studies have used gene expression-based signatures to stratify neuroblastoma patients and demonstrated a clear advantage of adding genomic analysis to risk assessment. There is little overlap among signatures, and merging their prognostic potential would be advantageous. Here, we describe a new strategy to merge published neuroblastoma-related gene signatures into a single, highly accurate Multi-Signature Ensemble (MuSE) classifier of neuroblastoma (NB) patient outcome.
Methods
Gene expression profiles of 182 neuroblastoma tumors, subdivided into three independent datasets, were used in the various phases of development and validation of the NB-MuSE-classifier. Thirty-three signatures were evaluated for patient outcome prediction using 22 classification algorithms each, generating 726 classifiers and prediction results. The best-performing algorithm for each signature was selected and validated on an independent dataset, and the 20 signatures performing with an accuracy ≥ 80% were retained.
Results
We combined the 20 predictions associated with the corresponding signatures into a single outcome predictor by selecting the best-performing combining algorithm. The best performance was obtained by the Decision Table algorithm, which produced the NB-MuSE-classifier, characterized by an external validation accuracy of 94%. Kaplan-Meier curves and the log-rank test demonstrated that patients with good and poor outcome prediction by the NB-MuSE-classifier have significantly different survival (p < 0.0001). Survival curves constructed on subgroups of patients divided on the basis of known prognostic markers suggested an excellent stratification of localized and stage 4s tumors, but more data are needed to prove this point.
Conclusions
The NB-MuSE-classifier is based on an ensemble approach that merges twenty heterogeneous, neuroblastoma-related gene signatures to blend their discriminating power, rather than numeric values, into a single, highly accurate patient outcome predictor. The novelty of our approach derives from the way the gene expression signatures are integrated: each is optimally associated with a single paradigm, and these are ultimately integrated into a single classifier. This model can be exported to other types of cancer and to diseases for which dedicated databases exist.
doi:10.1186/1471-2105-13-S4-S13
PMCID: PMC3314564  PMID: 22536959
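The selection steps in the abstract above (33 signatures times 22 algorithms, best algorithm per signature, retention at accuracy ≥ 80%, then combination of the retained predictions) can be sketched as follows. Note the substitution: the study combined predictions with a Decision Table algorithm, whereas this sketch uses a plain majority vote as a stand-in. All names, accuracies, and predictions below are hypothetical.

```python
from collections import Counter

N_SIGNATURES = 33
N_ALGORITHMS = 22
N_CLASSIFIERS = N_SIGNATURES * N_ALGORITHMS  # 726, the count given in the abstract


def best_algorithm(accuracy_by_algorithm):
    """Return the algorithm name with the highest accuracy for one signature."""
    return max(accuracy_by_algorithm, key=accuracy_by_algorithm.get)


def retain_signatures(validation_accuracy, threshold=0.80):
    """Keep signatures whose selected algorithm reaches the accuracy threshold."""
    return [name for name, acc in validation_accuracy.items() if acc >= threshold]


def majority_vote(predictions):
    """Combine per-signature outcome calls; ties go to the label seen first.

    Stand-in for the Decision Table combiner used in the study.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical accuracies and predictions, for illustration only.
training_accuracy = {
    "sig_A": {"tree": 0.78, "svm": 0.85},
    "sig_B": {"tree": 0.70, "svm": 0.66},
}
chosen = {sig: best_algorithm(acc) for sig, acc in training_accuracy.items()}
validation_accuracy = {"sig_A": 0.83, "sig_B": 0.71, "sig_C": 0.80}
kept = retain_signatures(validation_accuracy)
call = majority_vote(["good", "poor", "good"])
print(N_CLASSIFIERS, chosen, kept, call)
```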
7.  Identification and Validation of a Gene Expression Signature That Predicts Outcome in Adult Men With Germ Cell Tumors 
Journal of Clinical Oncology  2009;27(31):5240-5247.
Purpose
Germ cell tumor (GCT) is the most common malignancy in young adult men. Currently, patients are risk-stratified on the basis of clinical presentation and serum tumor markers. The introduction of molecular markers could improve outcome prediction.
Patients and Methods
Expression profiling was performed on 74 nonseminomatous GCTs (NSGCTs) from cisplatin-treated patients (ie, training set) and on 34 similarly treated patients with NSGCTs (ie, validation set). A gene classifier was developed by using prediction analysis for microarrays (PAM) for the binary end point of 5-year overall survival (OS). A predictive score was developed for OS by using the univariate Cox model.
Results
In the training set, PAM identified 140 genes that predicted 5-year OS (cross-validated classification rate, 60%). The PAM model correctly classified 90% of patients in the validation set. Patients predicted to have good outcome had significantly longer survival than those with poor predicted outcome (P < .001). For the OS end point, a 10-gene model had a predictive accuracy (ie, concordance index) of 0.66 in the training set and a concordance index of 0.83 in the validation set. Dichotomization of the samples on the basis of the median score resulted in significant differences in survival (P = .002). For both end points, the gene-based predictor was an independent prognostic factor in a multivariate model that included clinical risk stratification (P < .01 for both).
Conclusion
We have identified gene expression signatures that accurately predict outcome in patients with GCTs. These predictive genes should be useful for the prediction of patient outcome and could provide novel targets for therapeutic intervention.
doi:10.1200/JCO.2008.20.0386
PMCID: PMC3651602  PMID: 19770384
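The median-based dichotomization mentioned in the results above splits patients into two groups by comparing each predictive score with the cohort median. A minimal sketch follows. Which group receives a score exactly equal to the median is not stated in the abstract, so assigning it to the "low" group is an assumption, and the scores are hypothetical.

```python
from statistics import median


def dichotomize_by_median(scores):
    """Return the median cutoff and a 'high'/'low' label for each score.

    Scores strictly above the median are labelled 'high'; scores at or
    below it are labelled 'low' (an assumed convention).
    """
    cutoff = median(scores)
    return cutoff, ["high" if s > cutoff else "low" for s in scores]


# Hypothetical predictive scores for five patients.
example_cutoff, example_groups = dichotomize_by_median([0.2, 1.5, 0.7, 2.1, 1.0])
print(example_cutoff, example_groups)
```

The two resulting groups are what a log-rank test would then compare for a difference in survival.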
8.  Use of nomograms for predictions of outcome in patients with advanced bladder cancer 
Introduction
Accurate estimates of risk are essential for physicians if they are to recommend a specific management to patients with bladder cancer. In this review, we discuss the criteria for the evaluation of nomograms and review current available nomograms for advanced bladder cancer.
Methods
A retrospective review of the Pubmed database between 2002 and 2008 was performed using the keywords ‘nomogram’ and ‘bladder’. We limited the articles to advanced bladder cancer. We recorded input variables, prediction form, number of patients used to develop the prediction tools, the outcome being predicted, prediction tool-specific features, predictive accuracy, and whether validation was performed.
Results
We discuss the characteristics needed to evaluate nomograms such as predictive accuracy, calibration, generalizability, level of complexity, effect of competing risks, conditional probabilities, and head-to-head comparison with other prediction methods. The predictive accuracies of the pre-cystectomy tools (n = 2) range from ∼65% to 75%, and those of the post-cystectomy tools (n = 5) from ∼75% to 80%. While some of these nomograms are well-calibrated and outperform AJCC staging, none has been externally validated. To date, four studies have demonstrated a statistically significant improvement in predictive accuracy of nomograms by including biomarkers.
Conclusions
Nomograms provide accurate individualized estimates of outcomes. They currently represent the most accurate and discriminatory decision-making aids for predicting outcomes in patients with bladder cancer. Use of current nomograms could improve the selection of patients for standard therapy and investigational trial design by ensuring homogeneous groups. The addition of biological markers to the currently available nomograms using clinical and pathologic data holds the promise of improving prediction and refining management of patients with bladder cancer.
doi:10.1177/1756287209103923
PMCID: PMC3126044  PMID: 21789050
bladder cancer; nomogram; prediction; prognosis; risk
9.  A Systems Biology-Based Classifier for Hepatocellular Carcinoma Diagnosis 
PLoS ONE  2011;6(7):e22426.
Aim
The diagnosis of hepatocellular carcinoma (HCC) in the early stage is crucial to the application of curative treatments which are the only hope for increasing the life expectancy of patients. Recently, several large-scale studies have shed light on this problem through analysis of gene expression profiles to identify markers correlated with HCC progression. However, those marker sets shared few genes in common and were poorly validated using independent data. Therefore, we developed a systems biology based classifier by combining the differential gene expression with topological features of human protein interaction networks to enhance the ability of HCC diagnosis.
Methods and Results
In the Oncomine platform, genes differentially expressed in HCC tissues relative to their corresponding normal tissues were filtered by a corrected Q value cut-off and Concept filters. The identified genes that are common to different microarray datasets were chosen as the candidate markers. Then, their networks were analyzed by GeneGO Meta-Core software and the hub genes were chosen. After that, an HCC diagnostic classifier was constructed by Partial Least Squares modeling based on the microarray gene expression data of the hub genes. Validations of diagnostic performance showed that this classifier had high predictive accuracy (85.88–92.71%) and an area under the ROC curve approaching 1.0, and that the network topological features integrated into this classifier contribute greatly to improving the predictive performance. Furthermore, it has been demonstrated that this modeling strategy is not only applicable to HCC, but also to other cancers.
Conclusion
Our analysis suggests that a systems biology-based classifier that combines differential gene expression with topological features of human protein interaction networks may enhance the diagnostic performance of an HCC classifier.
doi:10.1371/journal.pone.0022426
PMCID: PMC3145651  PMID: 21829460
10.  Prioritizing CD4 Count Monitoring in Response to ART in Resource-Constrained Settings: A Retrospective Application of Prediction-Based Classification 
PLoS Medicine  2012;9(4):e1001207.
Luis Montaner and colleagues retrospectively apply a potential capacity-saving CD4 count prediction tool to a cohort of HIV patients on antiretroviral therapy.
Background
Global programs of anti-HIV treatment depend on sustained laboratory capacity to assess treatment initiation thresholds and treatment response over time. Currently, there is no valid alternative to CD4 count testing for monitoring immunologic responses to treatment, but laboratory cost and capacity limit access to CD4 testing in resource-constrained settings. Thus, methods to prioritize patients for CD4 count testing could improve treatment monitoring by optimizing resource allocation.
Methods and Findings
Using a prospective cohort of HIV-infected patients (n = 1,956) monitored upon antiretroviral therapy initiation in seven clinical sites with distinct geographical and socio-economic settings, we retrospectively apply a novel prediction-based classification (PBC) modeling method. The model uses repeatedly measured biomarkers (white blood cell count and lymphocyte percent) to predict CD4+ T cell outcome through first-stage modeling and subsequent classification based on clinically relevant thresholds (CD4+ T cell count of 200 or 350 cells/µl). The algorithm correctly classified 90% (cross-validation estimate = 91.5%, standard deviation [SD] = 4.5%) of CD4 count measurements <200 cells/µl in the first year of follow-up; if laboratory testing is applied only to patients predicted to be below the 200-cells/µl threshold, we estimate a potential savings of 54.3% (SD = 4.2%) in CD4 testing capacity. A capacity savings of 34% (SD = 3.9%) is predicted using a CD4 threshold of 350 cells/µl. Similar results were obtained over the 3 y of follow-up available (n = 619). Limitations include a need for future economic healthcare outcome analysis, a need for assessment of extensibility beyond the 3-y observation time, and the need to assign a false positive threshold.
Conclusions
Our results support the use of PBC modeling as a triage point at the laboratory, lessening the need for laboratory-based CD4+ T cell count testing; implementation of this tool could help optimize the use of laboratory resources, directing CD4 testing towards higher-risk patients. However, further prospective studies and economic analyses are needed to demonstrate that the PBC model can be effectively applied in clinical settings.
Please see later in the article for the Editors' Summary
Editors' Summary
Background
AIDS has killed nearly 30 million people since 1981, and about 34 million people (most of them living in low- and middle-income countries) are now infected with HIV, the virus that causes AIDS. HIV destroys immune system cells (including CD4 cells, a type of lymphocyte and one of the body's white blood cell types), leaving infected individuals susceptible to other infections. Early in the AIDS epidemic, most HIV-infected people died within ten years of infection. Then, in 1996, antiretroviral therapy (ART) became available, and for people living in affluent countries, HIV/AIDS became a chronic condition. However, ART was expensive, and for people living in developing countries, HIV/AIDS remained a fatal illness. In 2003, HIV was declared a global health emergency, and in 2006, the international community set itself the target of achieving universal access to ART by 2010. By the end of 2010, only 6.6 million of the estimated 15 million people in need of ART in developing countries were receiving ART.
Why Was This Study Done?
One factor that has impeded progress towards universal ART coverage has been the limited availability of trained personnel and laboratory facilities in many developing countries. These resources are needed to determine when individuals should start ART—the World Health Organization currently recommends that people start ART when their CD4 count drops below 350 cells/µl—and to monitor treatment responses over time so that viral resistance to ART is quickly detected. Although a total lymphocyte count can be used as a surrogate measure to decide when to start treatment, repeated CD4 cell counts are the only way to monitor immunologic responses to treatment, a level of monitoring that is rarely sustainable in resource-constrained settings. A method that optimizes resource allocation by prioritizing who gets tested might be one way to improve treatment monitoring. In this study, the researchers applied a new tool for prioritizing laboratory-based CD4 cell count testing in resource-constrained settings to patient data that had been previously collected.
What Did the Researchers Do and Find?
The researchers fitted a mixed-effects statistical model to repeated CD4 count measurements from HIV-infected individuals from seven sites around the world (including some resource-limited sites). They then used model-derived estimates to apply a mathematical tool for predicting—from a CD4 count taken at the start of treatment, and white blood cell counts and lymphocyte percentage measurements taken later—whether CD4 counts would be above 200 cells/µl (the original threshold recommended for ART initiation) and 350 cells/µl (the current recommended threshold) for up to three years after ART initiation. The tool correctly classified 91.5% of the CD4 cell counts that were below 200 cells/µl in the first year of ART. With this threshold, the potential savings in CD4 testing capacity was 54.3%. With a CD4 count threshold of 350 cells/µl, the potential savings in testing capacity was 34%. The results over a three-year follow-up were similar. When applied to six representative HIV-positive individuals, the tool correctly predicted all the CD4 counts above 200 cells/µl, although some individuals who had a predicted CD4 count of less than 200 cells/µl actually had a CD4 count above this threshold. Thus, none of these individuals would have been exposed to an undetected dangerous CD4 count, but the application of the tool would have saved 57% of the CD4 laboratory tests done during the first year of ART.
What Do These Findings Mean?
These findings support the use of this new tool—the prediction-based classification (PBC) algorithm—for predicting a drop in CD4 count below a clinically meaningful threshold in HIV-infected individuals receiving ART. Further studies are now needed to demonstrate the feasibility, clinical effectiveness, and cost-effectiveness of this approach, to find out whether the tool can be used over extended periods of time, and to investigate whether the accuracy of its predictions can be improved by, for example, adding in periodic CD4 testing. Provided these studies confirm its early promise, the researchers suggest that the PBC algorithm could be used as a “triage” tool to direct available laboratory testing capacity to high-priority individuals (those likely to have a dangerously low CD4 count). By optimizing the use of limited laboratory resources in this and other ways, the PBC algorithm could therefore help to maintain and expand ART programs in low- and middle-income countries.
Additional Information
Please access these web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001207.
Information is available from the US National Institute of Allergy and Infectious Diseases on HIV infection and AIDS
NAM/aidsmap provides basic information about HIV/AIDS and summaries of recent research findings on HIV care and treatment
Information is available from Avert, an international AIDS charity, on many aspects of HIV/AIDS, including information on HIV/AIDS treatment and care and on universal access to AIDS treatment (in English and Spanish)
The World Health Organization provides information about universal access to AIDS treatment (in several languages)
More information about universal access to HIV treatment, prevention, care, and support is available from UNAIDS
Patient stories about living with HIV/AIDS are available through Avert and through the charity website Healthtalkonline
doi:10.1371/journal.pmed.1001207
PMCID: PMC3328436  PMID: 22529752
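The triage logic described in the abstract and summary above can be illustrated by comparing predicted CD4 counts with a clinical threshold and sending only the patients predicted to be below it for laboratory testing. This sketch is a simplification under stated assumptions: it applies the threshold to a single point prediction per measurement, whereas the study derived its predictions from a first-stage statistical model of repeatedly measured biomarkers. The function name and all values are hypothetical.

```python
def triage(predicted_cd4, observed_cd4, threshold=200):
    """Summarise a threshold-based triage rule for CD4 testing.

    A measurement is flagged for laboratory testing when its predicted
    count is below the threshold (cells/µl). Returns the share of tests
    avoided and the share of truly low counts that were flagged.
    """
    flagged = [p < threshold for p in predicted_cd4]
    truly_low = [o < threshold for o in observed_cd4]
    n_low = sum(truly_low)
    caught = sum(f and t for f, t in zip(flagged, truly_low))
    return {
        "capacity_savings": 1 - sum(flagged) / len(flagged),
        "sensitivity": caught / n_low if n_low else None,
    }


# Hypothetical predicted and laboratory-measured CD4 counts (cells/µl).
example_predicted = [150, 180, 250, 400, 190, 500]
example_observed = [140, 210, 300, 380, 170, 520]
example_triage = triage(example_predicted, example_observed)
print(example_triage)
```

In this toy data the second patient is flagged although the measured count is above 200, which mirrors the over-flagging the Editors' Summary describes: such cases cost a laboratory test but do not leave a dangerously low count undetected.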
11.  Gene Expression Classification of Colon Cancer into Molecular Subtypes: Characterization, Validation, and Prognostic Value 
PLoS Medicine  2013;10(5):e1001453.
Background
Colon cancer (CC) pathological staging fails to accurately predict recurrence, and to date, no gene expression signature has proven reliable for prognosis stratification in clinical practice, perhaps because CC is a heterogeneous disease. The aim of this study was to establish a comprehensive molecular classification of CC based on mRNA expression profile analyses.
Methods and Findings
Fresh-frozen primary tumor samples from a large multicenter cohort of 750 patients with stage I to IV CC who underwent surgery between 1987 and 2007 in seven centers were characterized for common DNA alterations, including BRAF, KRAS, and TP53 mutations, CpG island methylator phenotype, mismatch repair status, and chromosomal instability status, and were screened with whole genome and transcriptome arrays. 566 samples fulfilled RNA quality requirements. Unsupervised consensus hierarchical clustering applied to gene expression data from a discovery subset of 443 CC samples identified six molecular subtypes. These subtypes were associated with distinct clinicopathological characteristics, molecular alterations, specific enrichments of supervised gene expression signatures (stem cell phenotype–like, normal-like, serrated CC phenotype–like), and deregulated signaling pathways. Based on their main biological characteristics, we distinguished a deficient mismatch repair subtype, a KRAS mutant subtype, a cancer stem cell subtype, and three chromosomal instability subtypes, including one associated with down-regulated immune pathways, one with up-regulation of the Wnt pathway, and one displaying a normal-like gene expression profile. The classification was validated in the remaining 123 samples plus an independent set of 1,058 CC samples, including eight public datasets. Furthermore, prognosis was analyzed in the subset of stage II–III CC samples. The subtypes C4 and C6, but not the subtypes C1, C2, C3, and C5, were independently associated with shorter relapse-free survival, even after adjusting for age, sex, stage, and the emerging prognostic classifier Oncotype DX Colon Cancer Assay recurrence score (hazard ratio 1.5, 95% CI 1.1–2.1, p = 0.0097). However, a limitation of this study is that information on tumor grade and number of nodes examined was not available.
Conclusions
We describe the first, to our knowledge, robust transcriptome-based classification of CC that improves the current disease stratification based on clinicopathological variables and common DNA markers. The biological relevance of these subtypes is illustrated by significant differences in prognosis. This analysis provides possibilities for improving prognostic models and therapeutic strategies. In conclusion, we report a new classification of CC into six molecular subtypes that arise through distinct biological pathways.
Please see later in the article for the Editors' Summary
Editors' Summary
Background
Cancer of the large bowel (colorectal cancer) is the third most common cancer in men and the second most common cancer in women worldwide. Despite recent advances in the screening, diagnosis, and treatment of colorectal cancer, an estimated 608,000 people die every year from this form of cancer—8% of all cancer deaths. The prognosis and treatment options for colorectal cancer depend on five pathological stages (0–IV), each of which has a different treatment option and five year survival rate, so it is important that the stage is correctly identified. Unfortunately, pathological staging fails to accurately predict recurrence (relapse) in patients undergoing surgery for localized colorectal cancer, which is a concern, as 10%–20% of patients with stage II and 30%–40% of those with stage III colorectal cancer develop recurrence.
Why Was This Study Done?
Previous studies have investigated whether there are any possible gene expression profiles (identified through microarray techniques) that can help predict prognosis of colorectal cancer, but so far, there have been no firm conclusions that can aid clinical practice. In this study, the researchers used genetic information from a French multicenter study to identify a standard, reproducible molecular classification based on gene expression analysis of colorectal cancer. The authors also assessed whether there were any associations between the identified molecular subtypes and clinical and pathological factors, common DNA alterations, and prognosis.
What Did the Researchers Do and Find?
The researchers used genetic information from a cohort of 750 patients with stage I to IV colorectal cancer who underwent surgery between 1987 and 2007 in seven centers in France. The researchers identified relevant clinical and pathological staging information for each patient from the medical records and calculated recurrence-free survival (the time from surgery to the first recurrence) for patients with stage II or III disease. In the genetic analysis, 566 tumor samples were suitable—443 were used in a discovery set, to create the classification, and the remainder were used in a validation set, to test the classification. The researchers also used information from eight public datasets to validate their findings.
Using these methods, the researchers classified the colon cancer samples into six molecular subtypes (based on gene expression data) and, on further analysis and validation, were able to distinguish the main biological characteristics and deregulated pathways associated with each subtype. Importantly, the researchers found that these six subtypes were associated with distinct clinical and pathological characteristics, molecular alterations, specific gene expression signatures, and deregulated signaling pathways. In the prognostic analysis based on recurrence-free survival, the researchers found that patients whose tumors were classified in one of two clusters (C4 and C6) had poorer recurrence-free survival than the other patients.
What Do These Findings Mean?
These findings suggest that it is possible to classify colorectal cancer into six robust molecular subtypes that might help identify new prognostic subgroups and could provide a basis for developing robust prognostic genetic signatures for stage II and III colorectal cancer and for identifying specific markers for the different subtypes that might be targets for future drug development. However, as this study was retrospective and did not include some known predictors of colorectal cancer prognosis, such as tumor grade and number of nodes examined, the significance and robustness of the prognostic classification requires further confirmation with large prospective patient cohorts.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001453.
The American Cancer Society provides information about colorectal cancer and also about how colorectal cancer is staged
The US National Cancer Institute also provides information on colon and rectal cancer and colon cancer stages
doi:10.1371/journal.pmed.1001453
PMCID: PMC3660251  PMID: 23700391
12.  Establishment and validation of circulating tumor cell-based prognostic nomograms in first-line metastatic breast cancer patients 
Purpose
Circulating tumor cells (CTC) represent a new outcome-associated biomarker independently from known prognostic factors in metastatic breast cancer (MBC). The objective here was to develop and validate nomograms that combined baseline CTC counts and the other prognostic factors to assess the outcome of individual patients starting first-line treatment for MBC.
Experimental Design
We used a training set of 236 MBC patients starting a first-line treatment from the MD Anderson Cancer Center to establish nomograms that calculated the predicted probability of survival at different time points: 1, 2, and 5 years for overall survival (OS) and 6 months and 1 and 2 years for progression-free survival (PFS). The covariates computed in the model were: age, disease subtype, visceral metastases, performance status, and CTC counts by CellSearch. Nomograms were independently validated with 210 MBC patients from the Institut Curie who underwent first-line chemotherapy. The discriminatory ability and accuracy of the models were assessed using Harrell’s c-statistic and calibration plots at different time points in both training and validation datasets.
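As an illustration of the discrimination measure named above, the following is a minimal sketch of Harrell's c-statistic for right-censored survival data. It is not the authors' implementation: the function name `harrell_c` and its argument layout are assumptions made for this example, and tied risk scores are counted as half-concordant, which is one common convention.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's c-statistic for right-censored data (illustrative sketch).

    times       -- follow-up time for each patient
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event had the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking of patients by risk.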
Results
Median follow-up was 23 and 29 months in the MD Anderson and Institut Curie cohorts, respectively. Nomograms demonstrated good C-statistics (0.74 for OS and 0.65 for PFS) and discriminated OS prediction at 1, 2, and 5 years, and PFS prediction at 6 months and 1 and 2 years.
Conclusions
Nomograms, which relied on CTC counts as a continuous covariate, easily facilitated the use of a web-based tool for estimating survival, supporting treatment-decisions and clinical trial stratification in first-line MBC.
doi:10.1158/1078-0432.CCR-12-3137
PMCID: PMC3662240  PMID: 23340302
circulating tumor cells; first-line; metastatic breast cancer; nomogram; survival
13.  Predicting Rotator Cuff Tears Using Data Mining and Bayesian Likelihood Ratios 
PLoS ONE  2014;9(4):e94917.
Objectives
Rotator cuff tear is a common cause of shoulder diseases. Correct diagnosis of rotator cuff tears can save patients from further invasive, costly and painful tests. This study used predictive data mining and Bayesian theory to improve the accuracy of diagnosing rotator cuff tears by clinical examination alone.
Methods
In this retrospective study, 169 patients who had a preliminary diagnosis of rotator cuff tear on the basis of clinical evaluation followed by confirmatory MRI between 2007 and 2011 were identified. MRI was used as a reference standard to classify rotator cuff tears. The predictor variable was the clinical assessment results, which consisted of 16 attributes. This study employed 2 data mining methods (ANN and the decision tree) and a statistical method (logistic regression) to classify the rotator cuff diagnosis into “tear” and “no tear” groups. Likelihood ratio and Bayesian theory were applied to estimate the probability of rotator cuff tears based on the results of the prediction models.
Results
Our proposed data mining procedures outperformed the classic statistical method. The correction rate, sensitivity, specificity and area under the ROC curve of predicting a rotator cuff tear were statistically better in the ANN and decision tree models than in logistic regression. Based on likelihood ratios derived from our prediction models, Fagan's nomogram could be constructed to assess the probability that a patient has a rotator cuff tear using a pretest probability and a prediction result (tear or no tear).
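The calculation that Fagan's nomogram performs graphically can be sketched in a few lines: a pretest probability is converted to odds, multiplied by a likelihood ratio, and converted back. The function names and the example numbers below are assumptions for illustration only and are not values reported by the study.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test result."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pretest_prob, likelihood_ratio):
    """Bayes' theorem in odds form: post-test odds = pretest odds * LR."""
    if not 0.0 < pretest_prob < 1.0:
        raise ValueError("pretest_prob must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical inputs: a model with 90% sensitivity and 80% specificity
# predicts "tear" for a patient with a 50% pretest probability.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
probability_given_tear = post_test_probability(0.50, lr_pos)
```

With a "no tear" prediction, the same function would be called with `lr_neg` instead of `lr_pos`.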
Conclusions
Our predictive data mining models, combined with likelihood ratios and Bayesian theory, appear to be good tools to classify rotator cuff tears as well as determine the probability of the presence of the disease to enhance diagnostic decision making for rotator cuff tears.
doi:10.1371/journal.pone.0094917
PMCID: PMC3986413  PMID: 24733553
14.  TSG: a new algorithm for binary and multi-class cancer classification and informative genes selection 
BMC Medical Genomics  2013;6(Suppl 1):S3.
Background
One of the challenges in classification of cancer tissue samples based on gene expression data is to establish an effective method that can select a parsimonious set of informative genes. The Top Scoring Pair (TSP), k-Top Scoring Pairs (k-TSP), Support Vector Machines (SVM), and prediction analysis of microarrays (PAM) are four popular classifiers that have comparable performance on multiple cancer datasets. SVM and PAM tend to use a large number of genes, and TSP and k-TSP always use an even number of genes. In addition, the selection of distinct gene pairs in k-TSP simply combines the pairs of top-ranking genes without considering the fact that the gene set with the best discrimination power may not be the combined pairs. The k-TSP algorithm also needs the user to specify an upper bound for the number of gene pairs. Here we introduce a computational algorithm to address these problems. The algorithm is named the Chisquare-statistic-based Top Scoring Genes (Chi-TSG) classifier, simplified as TSG.
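For context, the baseline Top Scoring Pair idea mentioned above can be sketched as follows. This illustrates the standard TSP score (the absolute difference between the two classes in how often gene i is expressed below gene j), not the Chi-TSG method itself, whose scoring details are not given in the abstract. The function names and the data layout (one list of expression values per sample) are assumptions.

```python
from itertools import combinations


def tsp_score(class1, class2, i, j):
    """|P(X_i < X_j | class 1) - P(X_i < X_j | class 2)| for genes i and j."""
    p1 = sum(sample[i] < sample[j] for sample in class1) / len(class1)
    p2 = sum(sample[i] < sample[j] for sample in class2) / len(class2)
    return abs(p1 - p2)


def top_scoring_pair(class1, class2):
    """Exhaustively search all gene pairs and return the best-scoring one."""
    n_genes = len(class1[0])
    return max(
        combinations(range(n_genes), 2),
        key=lambda pair: tsp_score(class1, class2, pair[0], pair[1]),
    )
```

Because the score depends only on the within-sample ordering of two genes, it is unchanged by any monotone transformation of the expression values, which is the invariance property the TSP family is known for.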
Results
The TSG classifier starts with the top two genes and sequentially adds additional genes into the candidate gene set to perform informative gene selection. The algorithm automatically reports the total number of informative genes selected with cross validation. We provide the algorithm for both binary and multi-class cancer classification. The algorithm was applied to 9 binary and 10 multi-class gene expression datasets involving human cancers. The TSG classifier outperforms TSP family classifiers by a big margin in most of the 19 datasets. In addition to improved accuracy, our classifier shares all the advantages of the TSP family classifiers: it is easy to interpret, it is invariant to monotone transformation, it often selects a small number of informative genes, allowing follow-up studies, and it is resistant to sampling variations because it relies on within-sample operations.
Conclusions
Redefining the scores for gene set and the classification rules in TSP family classifiers by incorporating the sample size information can lead to better selection of informative genes and classification accuracy. The resulting TSG classifier offers a useful tool for cancer classification based on numerical molecular data.
doi:10.1186/1755-8794-6-S1-S3
PMCID: PMC3552704  PMID: 23445528
15.  Discovery and validation of gene classifiers for endocrine-disrupting chemicals in zebrafish (Danio rerio) 
BMC Genomics  2012;13:358.
Background
Development and application of transcriptomics-based gene classifiers for ecotoxicological applications lag far behind those of biomedical sciences. Many such classifiers discovered thus far lack rigorous statistical and experimental validations. A combination of genetic algorithm/support vector machines and genetic algorithm/K nearest neighbors was used in this study to search for classifiers of endocrine-disrupting chemicals (EDCs) in zebrafish. Searches were conducted on both tissue-specific and tissue-combined datasets, either across the entire transcriptome or within individual transcription factor (TF) networks previously linked to EDC effects. Candidate classifiers were evaluated by gene set enrichment analysis (GSEA) on both the original training data and a dedicated validation dataset.
Results
The multi-tissue dataset yielded no classifiers. Among the 19 chemical-tissue conditions evaluated, the transcriptome-wide searches yielded classifiers for six of them, each having approximately 20 to 30 gene features unique to a condition. Searches within individual TF networks produced classifiers for 15 chemical-tissue conditions, each containing 100 or fewer top-ranked gene features pooled from those of multiple TF networks and also unique to each condition. For the training dataset, 10 out of 11 classifiers successfully identified the gene expression profiles (GEPs) of their targeted chemical-tissue conditions by GSEA. For the validation dataset, classifiers for prochloraz-ovary and flutamide-ovary also correctly identified the GEPs of corresponding conditions while no classifier could predict the GEP from prochloraz-brain.
Conclusions
The discrepancies in the performance of these classifiers were attributed in part to varying data complexity among the conditions, as measured to some degree by Fisher’s discriminant ratio statistic. This variation in data complexity could likely be compensated by adjusting sample size for individual chemical-tissue conditions, thus suggesting a need for a preliminary survey of transcriptomic responses before launching a full scale classifier discovery effort. Classifier discovery based on individual TF networks could yield more mechanistically-oriented biomarkers. GSEA proved to be a flexible and effective tool for application of gene classifiers but a similar and more refined algorithm, connectivity mapping, should also be explored. The distribution characteristics of classifiers across tissues, chemicals, and TF networks suggested a differential biological impact among the EDCs on zebrafish transcriptome involving some basic cellular functions.
doi:10.1186/1471-2164-13-358
PMCID: PMC3469349  PMID: 22849515
Gene classifiers; Endocrine-disrupting chemicals; Transcriptomics; Mechanism of action; Zebrafish
16.  On the statistical assessment of classifiers using DNA microarray data 
BMC Bioinformatics  2006;7:387.
Background
In this paper we present a method for the statistical assessment of cancer predictors which make use of gene expression profiles. The methodology is applied to a new data set of microarray gene expression data collected in Casa Sollievo della Sofferenza Hospital, Foggia – Italy. The data set is made up of normal (22) and tumor (25) specimens extracted from 25 patients affected by colon cancer. We propose to give answers to some questions which are relevant for the automatic diagnosis of cancer such as: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways can accuracy be considered dependent on the adopted classification scheme? How many genes are correlated with the pathology and how many are sufficient for an accurate colon cancer classification? The method we propose answers these questions whilst avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data.
Results
We estimate the generalization error, evaluated through the Leave-K-Out Cross Validation error, for three different classification schemes by varying the number of training examples and the number of the genes used. The statistical significance of the error rate is measured by using a permutation test. We provide a statistical analysis in terms of the frequencies of the genes involved in the classification. Using the whole set of genes, we found that the Weighted Voting Algorithm (WVA) classifier learns the distinction between normal and tumor specimens with 25 training examples, providing e = 21% (p = 0.045) as an error rate. This remains constant even when the number of examples increases. Moreover, Regularized Least Squares (RLS) and Support Vector Machines (SVM) classifiers can learn with only 15 training examples, with an error rate of e = 19% (p = 0.035) and e = 18% (p = 0.037) respectively. Moreover, the error rate decreases as the training set size increases, reaching its best performance with 35 training examples. In this case, RLS and SVM have error rates of e = 14% (p = 0.027) and e = 11% (p = 0.019). Concerning the number of genes, we found about 6000 genes (p < 0.05) correlated with the pathology, resulting from the signal-to-noise statistic. Moreover, the performances of RLS and SVM classifiers do not change when 74% of genes are used. Performance progressively degrades to e = 16% (p < 0.05) when only 2 genes are employed. The biological relevance of a set of genes determined by our statistical analysis and the major roles they play in colorectal tumorigenesis is discussed.
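The permutation test used above to attach a p-value to a cross-validated error rate can be sketched as follows. This is a simplified stand-in, not the paper's procedure: it swaps in a 1-nearest-neighbour classifier on a single feature with leave-one-out cross-validation in place of the WVA, RLS, and SVM classifiers and Leave-K-Out scheme, and all function names are assumptions. The idea is the same: shuffle the class labels many times and count how often the shuffled data achieve an error at least as low as the observed one.

```python
import random


def loocv_error_1nn(features, labels):
    """Leave-one-out error of a 1-nearest-neighbour rule on a single feature."""
    errors = 0
    n = len(features)
    for i in range(n):
        # Nearest other sample by absolute distance (first index wins ties).
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: abs(features[j] - features[i]),
        )
        if labels[nearest] != labels[i]:
            errors += 1
    return errors / n


def permutation_p_value(features, labels, n_permutations=1000, seed=0):
    """Estimate P(error under shuffled labels <= observed error)."""
    rng = random.Random(seed)
    observed = loocv_error_1nn(features, labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # break any real feature-label association
        if loocv_error_1nn(features, shuffled) <= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimate strictly positive.
    return (at_least_as_good + 1) / (n_permutations + 1)
```

A small p-value indicates that the observed error rate is unlikely to arise when labels carry no information about the features.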
Conclusions
The method proposed provides statistically significant answers to precise questions relevant for the diagnosis and prognosis of cancer. We found that, with as few as 15 examples, it is possible to train statistically significant classifiers for colon cancer diagnosis. As for the definition of the number of genes sufficient for a reliable classification of colon cancer, our results suggest that it depends on the accuracy required.
doi:10.1186/1471-2105-7-387
PMCID: PMC1564153  PMID: 16919171
17.  A Risk Prediction Model for the Assessment and Triage of Women with Hypertensive Disorders of Pregnancy in Low-Resourced Settings: The miniPIERS (Pre-eclampsia Integrated Estimate of RiSk) Multi-country Prospective Cohort Study 
PLoS Medicine  2014;11(1):e1001589.
Beth Payne and colleagues use a risk prediction model, the Pre-eclampsia Integrated Estimate of RiSk (miniPIERS) to help inform the clinical assessment and triage of women with hypertensive disorders of pregnancy in low-resourced settings.
Please see later in the article for the Editors' Summary
Background
Pre-eclampsia/eclampsia are leading causes of maternal mortality and morbidity, particularly in low- and middle- income countries (LMICs). We developed the miniPIERS risk prediction model to provide a simple, evidence-based tool to identify pregnant women in LMICs at increased risk of death or major hypertensive-related complications.
Methods and Findings
From 1 July 2008 to 31 March 2012, in five LMICs, data were collected prospectively on 2,081 women with any hypertensive disorder of pregnancy admitted to a participating centre. Candidate predictors collected within 24 hours of admission were entered into a step-wise backward elimination logistic regression model to predict a composite adverse maternal outcome within 48 hours of admission. Model internal validation was accomplished by bootstrapping and external validation was completed using data from 1,300 women in the Pre-eclampsia Integrated Estimate of RiSk (fullPIERS) dataset. Predictive performance was assessed for calibration, discrimination, and stratification capacity. The final miniPIERS model included: parity (nulliparous versus multiparous); gestational age on admission; headache/visual disturbances; chest pain/dyspnoea; vaginal bleeding with abdominal pain; systolic blood pressure; and dipstick proteinuria. The miniPIERS model was well-calibrated and had an area under the receiver operating characteristic curve (AUC ROC) of 0.768 (95% CI 0.735–0.801) with an average optimism of 0.037. External validation AUC ROC was 0.713 (95% CI 0.658–0.768). A predicted probability ≥25% to define a positive test classified women with 85.5% accuracy. Limitations of this study include the composite outcome and the broad inclusion criteria of any hypertensive disorder of pregnancy. This broad approach was used to optimize model generalizability.
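To make the two performance measures above concrete, the sketch below computes the area under the ROC curve as the fraction of (case, non-case) pairs that a model ranks correctly, and the accuracy obtained when a predicted probability at or above a threshold counts as a positive test. The function names and the toy data used to exercise them are assumptions; only the 25% threshold comes from the abstract.

```python
def auc_roc(outcomes, predicted_probs):
    """AUC as the probability that a random case outranks a random non-case."""
    cases = [p for y, p in zip(outcomes, predicted_probs) if y == 1]
    non_cases = [p for y, p in zip(outcomes, predicted_probs) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for nc in non_cases:
            if c > nc:
                wins += 1.0
            elif c == nc:
                wins += 0.5  # ties count as half
    return wins / (len(cases) * len(non_cases))


def accuracy_at_threshold(outcomes, predicted_probs, threshold=0.25):
    """Fraction classified correctly when prob >= threshold means 'positive'."""
    correct = sum(
        (p >= threshold) == (y == 1)
        for y, p in zip(outcomes, predicted_probs)
    )
    return correct / len(outcomes)
```

The AUC summarises ranking ability across all thresholds, whereas the accuracy figure depends on the single cut-off chosen.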
Conclusions
The miniPIERS model shows reasonable ability to identify women at increased risk of adverse maternal outcomes associated with the hypertensive disorders of pregnancy. It could be used in LMICs to identify women who would benefit most from interventions such as magnesium sulphate, antihypertensives, or transportation to a higher level of care.
Editors' Summary
Background
Each year, ten million women develop pre-eclampsia or a related hypertensive (high blood pressure) disorder of pregnancy and 76,000 women die as a result. Globally, hypertensive disorders of pregnancy cause around 12% of maternal deaths—deaths of women during or shortly after pregnancy. The mildest of these disorders is gestational hypertension, high blood pressure that develops after 20 weeks of pregnancy. Gestational hypertension does not usually harm the mother or her unborn child and resolves after delivery but up to a quarter of women with this condition develop pre-eclampsia, a combination of hypertension and protein in the urine (proteinuria). Women with mild pre-eclampsia may not have any symptoms—the condition is detected during antenatal checks—but more severe pre-eclampsia can cause headaches, blurred vision, and other symptoms, and can lead to eclampsia (fits), multiple organ failure, and death of the mother and/or her baby. The only “cure” for pre-eclampsia is to deliver the baby as soon as possible but women are sometimes given antihypertensive drugs to lower their blood pressure or magnesium sulfate to prevent seizures.
Why Was This Study Done?
Women in low- and middle-income countries (LMICs) are more likely to develop complications of pre-eclampsia than women in high-income countries and most of the deaths associated with hypertensive disorders of pregnancy occur in LMICs. The high burden of illness and death in LMICs is thought to be primarily due to delays in triage (the identification of women who are or may become severely ill and who need specialist care) and delays in transporting these women to facilities where they can receive appropriate care. Because there is a shortage of health care workers who are adequately trained in the triage of suspected cases of hypertensive disorders of pregnancy in many LMICs, one way to improve the situation might be to design a simple tool to identify women at increased risk of complications or death from hypertensive disorders of pregnancy. Here, the researchers develop miniPIERS (Pre-eclampsia Integrated Estimate of RiSk), a clinical risk prediction model for adverse outcomes among women with hypertensive disorders of pregnancy suitable for use in community and primary health care facilities in LMICs.
What Did the Researchers Do and Find?
The researchers used data on candidate predictors of outcome that are easy to collect and/or measure in all health care settings and that are associated with pre-eclampsia from women admitted with any hypertensive disorder of pregnancy to participating centers in five LMICs to build a model to predict death or a serious complication such as organ damage within 48 hours of admission. The miniPIERS model included parity (whether the woman had been pregnant before), gestational age (length of pregnancy), headache/visual disturbances, chest pain/shortness of breath, vaginal bleeding with abdominal pain, systolic blood pressure, and proteinuria detected using a dipstick. The model was well-calibrated (the predicted risk of adverse outcomes agreed with the observed risk of adverse outcomes among the study participants), it had a good discriminatory ability (it could separate women who had an adverse outcome from those who did not), and it designated women as being at high risk (25% or greater probability of an adverse outcome) with an accuracy of 85.5%. Importantly, external validation using data collected in fullPIERS, a study that developed a more complex clinical prediction model based on data from women attending tertiary hospitals in high-income countries, confirmed the predictive performance of miniPIERS.
What Do These Findings Mean?
These findings indicate that the miniPIERS model performs reasonably well as a tool to identify women at increased risk of adverse maternal outcomes associated with hypertensive disorders of pregnancy. Because miniPIERS only includes simple-to-measure personal characteristics, symptoms, and signs, it could potentially be used in resource-constrained settings to identify the women who would benefit most from interventions such as transportation to a higher level of care. However, further external validation of miniPIERS is needed using data collected from women living in LMICs before the model can be used during routine antenatal care. Moreover, the value of miniPIERS needs to be confirmed in implementation projects that examine whether its potential translates into clinical improvements. For now, though, the model could provide the basis for an education program to increase the knowledge of women, families, and community health care workers in LMICs about the signs and symptoms of hypertensive disorders of pregnancy.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001589.
The World Health Organization provides guidelines for the management of hypertensive disorders of pregnancy in low-resourced settings
The Maternal and Child Health Integrated Program provides information on pre-eclampsia and eclampsia targeted to low-resourced settings along with a tool-kit for LMIC providers
The US National Heart, Lung, and Blood Institute provides information about high blood pressure in pregnancy and a guide to lowering blood pressure in pregnancy
The UK National Health Service Choices website provides information about pre-eclampsia
The US not-for profit organization Preeclampsia Foundation provides information about all aspects of pre-eclampsia; its website includes some personal stories
The UK charity Healthtalkonline also provides personal stories about hypertensive disorders of pregnancy
MedlinePlus provides links to further information about high blood pressure and pregnancy (in English and Spanish); the MedlinePlus Encyclopedia has a video about pre-eclampsia (also in English and Spanish)
More information about miniPIERS and about fullPIERS is available
doi:10.1371/journal.pmed.1001589
PMCID: PMC3897359  PMID: 24465185
18.  Mass spectrometry protein expression profiles in colorectal cancer tissue associated with clinico-pathological features of disease 
BMC Cancer  2010;10:410.
Background
Studies of several tumour types have shown that expression profiling of cellular protein extracted from surgical tissue specimens by direct mass spectrometry analysis can accurately discriminate tumour from normal tissue and in some cases can sub-classify disease. We have evaluated the potential value of this approach to classify various clinico-pathological features in colorectal cancer by employing matrix-assisted laser desorption ionisation time-of-flight mass spectrometry (MALDI-TOF MS).
Methods
Protein extracts from 31 tumour and 33 normal mucosa specimens were purified, subjected to MALDI-TOF MS and then analysed using the 'GenePattern' suite of computational tools (Broad Institute, MIT, USA). Comparative Gene Marker Selection with either a t-test or a signal-to-noise ratio (SNR) test statistic was used to identify and rank differentially expressed marker peaks. The k-nearest neighbours algorithm was used to build classification models either using separate training and test datasets or else by using an iterative, 'leave-one-out' cross-validation method.
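The signal-to-noise ratio statistic used for ranking marker peaks is commonly defined as the difference in class means divided by the sum of class standard deviations. The sketch below assumes that definition and omits any variance adjustments a particular software package may apply; the function names and the dictionary layout (peak label mapped to a list of intensities) are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(group_a, group_b):
    """SNR = (mean_a - mean_b) / (sd_a + sd_b), using sample standard deviations."""
    denominator = stdev(group_a) + stdev(group_b)
    if denominator == 0:
        raise ValueError("both groups have zero variance")
    return (mean(group_a) - mean(group_b)) / denominator


def rank_peaks(tumour, normal):
    """Order peak labels by absolute SNR, most discriminating first.

    tumour, normal -- dicts mapping a peak label to its list of intensities
    """
    scores = {
        peak: signal_to_noise(tumour[peak], normal[peak]) for peak in tumour
    }
    return sorted(scores, key=lambda peak: abs(scores[peak]), reverse=True)
```

The sign of the SNR indicates the direction of the difference (positive when the first group has the higher mean), while its magnitude drives the ranking.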
Results
73 protein peaks in the mass range 1800-16000Da were differentially expressed in tumour versus adjacent normal mucosa tissue (P ≤ 0.01, false discovery rate ≤ 0.05). Unsupervised hierarchical cluster analysis classified most tumour and normal mucosa into distinct cluster groups. Supervised prediction correctly classified the tumour/normal mucosa status of specimens in an independent test spectra dataset with 100% sensitivity and specificity (95% confidence interval: 67.9-99.2%). Supervised prediction using 'leave-one-out' cross validation algorithms for tumour spectra correctly classified 10/13 poorly differentiated and 16/18 well/moderately differentiated tumours (P < 0.001; receiver-operator characteristics - ROC - error, 0.171); disease recurrence was correctly predicted in 5/6 cases and disease-free survival (median follow-up time, 25 months) was correctly predicted in 22/23 cases (P < 0.001; ROC error, 0.105). A similar analysis of normal mucosa spectra correctly predicted 11/14 patients with, and 15/19 patients without lymph node involvement (P = 0.001; ROC error, 0.212).
Conclusions
Protein expression profiling of surgically resected CRC tissue extracts by MALDI-TOF MS has potential value in studies aimed at improved molecular classification of this disease. Further studies, with longer follow-up times and larger patient cohorts, that would permit independent validation of supervised classification models, would be required to confirm the predictive value of tumour spectra for disease recurrence/patient survival.
doi:10.1186/1471-2407-10-410
PMCID: PMC2927547  PMID: 20691062
19.  Postoperative systems models more accurately predict risk of significant disease progression than standard risk groups and a 10-year postoperative nomogram: potential impact on the receipt of adjuvant therapy after surgery 
BJU international  2011;109(1):40-45.
Objectives
To compare the performance of a systems-based risk assessment tool with standard defined risk groups and the 10-year postoperative nomogram for predicting disease progression, including biochemical relapse and clinical (systemic) failure.
Patients and methods
Clinical variables, biometric profiles and outcome results from a training cohort comprising 373 patients in a published postoperative systems-based prognostic model were obtained.
Patients were stratified according to D’Amico standard risk groups, Kattan 10-year postoperative nomogram and prognostic scores from the postoperative tissue model.
The association of pathological variables and calculated risk groups with biochemical recurrence and clinical (systemic) failure was assessed using the concordance index (C-index) and hazard ratio (HR).
Results
Systems-based post-prostatectomy models to predict significant disease progression (post-treatment clinical failure) were more accurate than the D’Amico defined risk groups and the Kattan 10-year postoperative nomogram (systems model: C-index, 0.84; HR, 17.46; P < 0.001 vs D’Amico: C-index, 0.73; HR, 11; P = 0.001; 10-year nomogram: C-index, 0.79; HR, 5.06; P < 0.001).
The systems models were also more accurate than standard risk groups for predicting prostate-specific antigen recurrence (systems model: C-index, 0.76; HR, 8.94; P < 0.001 vs D’Amico: C-index, 0.70; HR, 4.67; P < 0.001) and showed incremental improvement over the 10-year postoperative nomogram (C-index, 0.75; HR, 5.83; P < 0.001).
The postoperative tissue model provided additional risk discrimination over surgical margin status and extracapsular extension for predicting disease outcome, and was most significant for the clinical (systemic) failure endpoint (surgical margin: C-index, 0.58; HR, 1.57; P = 0.2; extracapsular extension: C-index, 0.62; HR, 2.06; P = 0.04).
Conclusions
Risk assessment models that incorporate characteristics from the patient’s own tumour specimen are more accurate than clinical-only nomograms for predicting significant disease outcome.
Systems-based tools should provide useful information concerning the appropriate receipt of adjuvant therapy in the post-surgical setting.
doi:10.1111/j.1464-410X.2011.10398.x
PMCID: PMC4035101  PMID: 21771247
prostate cancer; systems-based models; nomogram
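The abstract above compares models by concordance index (C-index). As a rough illustration of what that statistic measures, the following is a minimal sketch of Harrell's C-index for right-censored data. The function name, the toy data, and the decision to skip tied event times are assumptions made for illustration; none of it is taken from the study.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times       -- follow-up time for each subject
    events      -- 1 if the event (e.g. recurrence) was observed, 0 if censored
    risk_scores -- model output; a higher score should mean an earlier event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # 'a' is the subject with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # the earlier subject was censored, so order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four subjects, the third is censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_c_index = concordance_index(toy_times, toy_events, [0.9, 0.2, 0.4, 0.7])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported values (for example 0.84 vs 0.73) are read.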
20.  Combining Gene Signatures Improves Prediction of Breast Cancer Survival 
PLoS ONE  2011;6(3):e17845.
Background
Several gene sets for prediction of breast cancer survival have been derived from whole-genome mRNA expression profiles. Here, we develop a statistical framework to explore whether combination of the information from such sets may improve prediction of recurrence and breast cancer specific death in early-stage breast cancers. Microarray data from two clinically similar cohorts of breast cancer patients are used as training (n = 123) and test set (n = 81), respectively. Gene sets from eleven previously published gene signatures are included in the study.
Principal Findings
To investigate the relationship between breast cancer survival and gene expression on a particular gene set, a Cox proportional hazards model is applied using partial likelihood regression with an L2 penalty to avoid overfitting and using cross-validation to determine the penalty weight. The fitted models are applied to an independent test set to obtain a predicted risk for each individual and each gene set. Hierarchical clustering of the test individuals on the basis of the vector of predicted risks results in two clusters with distinct clinical characteristics in terms of the distribution of molecular subtypes, ER, PR status, TP53 mutation status and histological grade category, and associated with significantly different survival probabilities (recurrence: p = 0.005; breast cancer death: p = 0.014). Finally, principal components analysis of the gene signatures is used to derive combined predictors used to fit a new Cox model. This model classifies test individuals into two risk groups with distinct survival characteristics (recurrence: p = 0.003; breast cancer death: p = 0.001). The latter classifier outperforms all the individual gene signatures, as well as Cox models based on traditional clinical parameters and the Adjuvant! Online for survival prediction.
Conclusion
Combining the predictive strength of multiple gene signatures improves prediction of breast cancer survival. The presented methodology is broadly applicable to breast cancer risk assessment using any newly identified gene set.
doi:10.1371/journal.pone.0017845
PMCID: PMC3053398  PMID: 21423775
21.  Pre- and Post-Operative Nomograms to Predict Recurrence-Free Probability in Korean Men with Clinically Localized Prostate Cancer 
PLoS ONE  2014;9(6):e100053.
Objectives
Although the incidence of prostate cancer (PCa) is rapidly increasing in Korea, there are few suitable prediction models for disease recurrence after radical prostatectomy (RP). We established pre- and post-operative nomograms estimating biochemical recurrence (BCR)-free probability after RP in Korean men with clinically localized PCa.
Patients and Methods
Our sampling frame included 3,034 consecutive men with clinically localized PCa who underwent RP at our tertiary centers from June 2004 through July 2011. After inappropriate data exclusion, we evaluated 2,867 patients for the development of nomograms. The Cox proportional hazards regression model was used to develop pre- and post-operative nomograms that predict BCR-free probability. Finally, we resampled from our study cohort 200 times to determine the accuracy of our nomograms on internal validation, which were designated with concordance index (c-index) and further represented by calibration plots.
Results
Over a median of 47 months of follow-up, the estimated BCR-free rate was 87.8% (1 year), 83.8% (2 year), and 72.5% (5 year). In the pre-operative model, Prostate-Specific Antigen (PSA), the proportion of positive biopsy cores, clinical T3a and biopsy Gleason score (GS) were independent predictive factors for BCR, while all relevant predictive factors (PSA, extra-prostatic extension, seminal vesicle invasion, lymph node metastasis, surgical margin, and pathologic GS) were associated with BCR in the post-operative model. The c-index representing predictive accuracy was 0.792 (pre-) and 0.821 (post-operative), showing good fit in the calibration plots.
Conclusions
In summary, we developed pre- and post-operative nomograms predicting BCR-free probability after RP in a large Korean cohort with clinically localized PCa. These nomograms will be provided as the mobile application-based SNUH Prostate Cancer Calculator. Our nomograms can determine patients at high risk of disease recurrence after RP who will benefit from adjuvant therapy.
doi:10.1371/journal.pone.0100053
PMCID: PMC4061043  PMID: 24936784
22.  Preoperative nomogram for the identification of lymph node metastasis in early cervical cancer 
British Journal of Cancer  2013;110(1):34-41.
Background:
The objective of this study is to construct a preoperative nomogram predicting lymph node metastasis (LNM) in early-cervical cancer patients.
Methods:
Between 2009 and 2012, 493 early-cervical cancer patients received hysterectomy and pelvic/para-aortic lymphadenectomy. Patients who were diagnosed during 2009–2010 were assigned to a model-development cohort (n=304) and the others were assigned to a validation cohort (n=189). A multivariate logistic model was created from preoperative clinicopathologic data, from which a nomogram was developed and validated. A predicted probability of LNM<5% was defined as low risk.
Results:
Age, tumour size assessed by magnetic resonance imaging, and LNM assessed by positron emission tomography/computed tomography were independent predictors of nodal metastasis. The nomogram incorporating these three predictors demonstrated good discrimination and calibration (concordance index=0.878; 95% confidence interval (CI), 0.833−0.917). In the validation cohort, the discrimination accuracy was 0.825 (95% CI, 0.736−0.895). In the model-development cohort, 34% of them were classified as low risk and negative predictive value (NPV) was 99.0%. In the validation cohort, 38% were identified as low risk and NPV was 95.8%. Integrating the model-development and validation cohorts, negative likelihood ratio was 0.094 (95% CI, 0.036−0.248).
Conclusion:
A robust nomogram predicting LNM in early cervical cancer was developed. This model may improve clinical trial design and help physicians to decide whether lymphadenectomy should be performed.
doi:10.1038/bjc.2013.718
PMCID: PMC3887306  PMID: 24231954
cervical cancer; lymphatic metastasis; lymph node excision; nomogram; likelihood functions
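The abstract above reports a negative predictive value (NPV) and a negative likelihood ratio. The sketch below shows how both follow from a 2x2 table of predicted versus actual lymph node status. The function name and the counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's data.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Derive common diagnostic statistics from a 2x2 confusion table.

    tp -- predicted high risk, metastasis present
    fp -- predicted high risk, metastasis absent
    fn -- predicted low risk, metastasis present
    tn -- predicted low risk, metastasis absent
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # NPV: among patients called low risk, the fraction truly node-negative
    npv = tn / (tn + fn)
    # Negative likelihood ratio: how much a low-risk call lowers the odds
    lr_negative = (1 - sensitivity) / specificity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "npv": npv,
        "lr_negative": lr_negative,
    }


# Hypothetical counts, for illustration only.
summary = diagnostic_summary(tp=45, fp=50, fn=5, tn=100)
```

Note that NPV depends on the prevalence of metastasis in the cohort, whereas the negative likelihood ratio does not, which is one reason an abstract may report both.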
23.  Google Goes Cancer: Improving Outcome Prediction for Cancer Patients by Network-Based Ranking of Marker Genes 
PLoS Computational Biology  2012;8(5):e1002511.
Predicting the clinical outcome of cancer patients based on the expression of marker genes in their tumors has received increasing interest in the past decade. Accurate predictors of outcome and response to therapy could be used to personalize and thereby improve therapy. However, state of the art methods used so far often found marker genes with limited prediction accuracy, limited reproducibility, and unclear biological relevance. To address this problem, we developed a novel computational approach to identify genes prognostic for outcome that couples gene expression measurements from primary tumor samples with a network of known relationships between the genes. Our approach ranks genes according to their prognostic relevance using both expression and network information in a manner similar to Google's PageRank. We applied this method to gene expression profiles which we obtained from 30 patients with pancreatic cancer, and identified seven candidate marker genes prognostic for outcome. Compared to genes found with state of the art methods, such as Pearson correlation of gene expression with survival time, we improve the prediction accuracy by up to 7%. Accuracies were assessed using support vector machine classifiers and Monte Carlo cross-validation. We then validated the prognostic value of our seven candidate markers using immunohistochemistry on an independent set of 412 pancreatic cancer samples. Notably, signatures derived from our candidate markers were independently predictive of outcome and superior to established clinical prognostic factors such as grade, tumor size, and nodal status. As the amount of genomic data of individual tumors grows rapidly, our algorithm meets the need for powerful computational approaches that are key to exploit these data for personalized cancer therapies in clinical practice.
Author Summary
Why do some people with the same type of cancer die early and some live long? Apart from influences from the environment and personal lifestyle, we believe that differences in the individual tumor genome account for different survival times. Recently, powerful methods have become available to systematically read genomic information of patient samples. The major remaining challenge is how to spot, among the thousands of changes, those few that are relevant for tumor aggressiveness and thereby affecting patient survival. Here, we make use of the fact that genes and proteins in a cell never act alone, but form a network of interactions. Finding the relevant information in big networks of web documents and hyperlinks has been mastered by Google with their PageRank algorithm. Similar to PageRank, we have developed an algorithm that can identify genes that are better indicators for survival than genes found by traditional algorithms. Our method can aid the clinician in deciding if a patient should receive chemotherapy or not. Reliable prediction of survival and response to therapy based on molecular markers bears a great potential to improve and personalize patient therapies in the future.
doi:10.1371/journal.pcbi.1002511
PMCID: PMC3355064  PMID: 22615549
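The abstract above describes ranking genes with both expression and network information "in a manner similar to Google's PageRank". The sketch below is a generic personalized PageRank by power iteration, not the authors' algorithm: the prior stands in for a per-gene relevance score (for instance an absolute correlation with survival), and the damping value, gene names, and network are all hypothetical.

```python
def personalized_pagerank(adjacency, prior, damping=0.5, iterations=100):
    """Rank nodes by combining a per-node prior with network structure.

    adjacency -- dict mapping each gene to a list of neighbouring genes
    prior     -- dict mapping each gene to a non-negative relevance score
    damping   -- weight given to the network relative to the prior
    """
    genes = list(adjacency)
    total = sum(prior[g] for g in genes)
    base = {g: prior[g] / total for g in genes}  # prior normalised to sum to 1
    rank = dict(base)
    for _ in range(iterations):
        # every gene keeps a share of its own prior...
        new_rank = {g: (1 - damping) * base[g] for g in genes}
        for g in genes:
            neighbours = adjacency[g]
            if not neighbours:
                # isolated gene: hand its rank back according to the prior
                for h in genes:
                    new_rank[h] += damping * rank[g] * base[h]
                continue
            # ...and passes the rest of its rank evenly to its neighbours
            share = damping * rank[g] / len(neighbours)
            for h in neighbours:
                new_rank[h] += share
        rank = new_rank
    return rank


# Hypothetical star-shaped network: gene A interacts with B, C and D.
network = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}
flat_prior = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}
flat_rank = personalized_pagerank(network, flat_prior)
skewed_rank = personalized_pagerank(network, {"A": 1.0, "B": 5.0, "C": 1.0, "D": 1.0})
```

With a flat prior the hub gene A ranks highest purely through its connectivity; raising the prior for B lifts it above the otherwise identical genes C and D, which is the intended interplay between expression evidence and network position.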
24.  An Accurate Prostate Cancer Prognosticator Using a Seven-Gene Signature Plus Gleason Score and Taking Cell Type Heterogeneity into Account 
PLoS ONE  2012;7(9):e45178.
One of the major challenges in the development of prostate cancer prognostic biomarkers is the cellular heterogeneity in tissue samples. We developed an objective Cluster-Correlation (CC) analysis to identify gene expression changes in various cell types that are associated with progression. In the Cluster step, samples were clustered (unsupervised) based on the expression values of each gene through a mixture model combined with a multiple linear regression model in which cell-type percent data were used for decomposition. In the Correlation step, a Chi-square test was used to select potential prognostic genes. With CC analysis, we identified 324 significantly expressed genes (68 tumor and 256 stroma cell expressed genes) which were strongly associated with the observed biochemical relapse status. Significance Analysis of Microarray (SAM) was then utilized to develop a seven-gene classifier. The Classifier has been validated using two independent Data Sets. The overall prediction accuracy and sensitivity is 71% and 76%, respectively. The inclusion of the Gleason sum to the seven-gene classifier raised the prediction accuracy and sensitivity to 83% and 76% respectively based on independent testing. These results indicated that our prognostic model that includes cell type adjustments and using Gleason score and the seven-gene signature has some utility for predicting outcomes for prostate cancer for individual patients at the time of prognosis. The strategy could have applications for improving marker performance in other cancers and other diseases.
doi:10.1371/journal.pone.0045178
PMCID: PMC3460942  PMID: 23028830
25.  Inclusion of Genotype with Fundus Phenotype Improves Accuracy of Predicting Choroidal Neovascularization and Geographic Atrophy 
Ophthalmology  2013;120(9):1880-1892.
Purpose
The accuracy of predicting conversion from early stage age-related macular degeneration (AMD) to the advanced stages of choroidal neovascularization (CNV) and/or geographic atrophy (GA) was evaluated to determine if inclusion of clinically relevant genetic markers improved accuracy beyond prediction using phenotypic risk factors alone.
Design
Cohort study.
Participants
White, non-Hispanic subjects participating in the Age Related Eye Disease Study (AREDS) sponsored by the National Eye Institute, consented to provide a genetic specimen. Of 2,415 DNA specimens available, 940 were from disease-free subjects and 1,475 were from subjects with early or intermediate AMD.
Methods
DNA specimens from study subjects were genotyped for 14 single nucleotide polymorphisms (SNPs) in genes shown previously to associate with CNV: ARMS2, CFH, C3, C2, FB, CFHR4, CFHR5 and F13B. Clinical demographics and established disease associations, including age, sex, smoking status, body mass index, AREDS treatment category, and educational level were evaluated. Four multivariate logistic models [Phenotype; Genotype; Phenotype + Genotype; Phenotype + Genotype + Demographic + Environmental factors] were tested using two endpoints (CNV, GA). Models were fitted using Cox proportional hazards regression to utilize time-to-disease onset data.
Main Outcome Measures
Brier score (measure of accuracy) was employed to identify the model with the lowest prediction error in the training set. The most accurate model was subjected to independent statistical validation and final model performance described using area under the receiver operator curve (AUC) or C-statistic.
Results
The CNV prediction models that combined genotype with phenotype with or without age and smoking revealed superior performance (C-statistic=0.96) compared to the phenotype model based on simplified severity scale and the presence of CNV in the non-study eye (C-statistic=0.89), p value < 0.01. For GA, the model that combined genotype with phenotype demonstrated the highest performance (AUC=0.94). Smoking status and ARMS2 genotype had less of an impact on the prediction of GA compared to CNV.
Conclusions
Inclusion of genotype assessment improves CNV prediction beyond that achievable with phenotype alone and may improve patient management. Separate assessments should be used to predict progression to CNV and GA since genetic markers and smoking status do not equally predict both endpoints.
doi:10.1016/j.ophtha.2013.02.007
PMCID: PMC3695024  PMID: 23523162
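The abstract above uses the Brier score to choose the model with the lowest prediction error. The sketch below shows the basic binary-outcome form of that score. The study worked with time-to-onset data, which would normally call for a censoring-adjusted version, so this simpler form is an assumption made for illustration, as are the function name and the toy values.

```python
def brier_score(predicted_probabilities, outcomes):
    """Mean squared difference between predicted probability and outcome.

    predicted_probabilities -- model probabilities in [0, 1]
    outcomes                -- observed outcomes coded 1 (event) or 0 (no event)
    Lower is better: 0 is a perfect forecast, and always predicting 0.5
    gives 0.25.
    """
    if not predicted_probabilities:
        raise ValueError("at least one prediction is required")
    if len(predicted_probabilities) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    squared_errors = [
        (p - y) ** 2 for p, y in zip(predicted_probabilities, outcomes)
    ]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical predictions for four subjects, two of whom progressed.
toy_brier = brier_score([0.9, 0.2, 0.6, 0.1], [1, 0, 1, 0])
```

Unlike the C-statistic, which only checks whether subjects are ranked in the right order, the Brier score also penalises probabilities that are poorly calibrated.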
