Search tips
Search criteria

Results 1-25 (1217021)

Clipboard (0)

Related Articles

1.  Prediction of Breast Cancer Metastasis by Gene Expression Profiles: A Comparison of Metagenes and Single Genes 
Cancer Informatics  2012;11:193-217.
The popularity of a large number of microarray applications has in cancer research led to the development of predictive or prognostic gene expression profiles. However, the diversity of microarray platforms has made the full validation of such profiles and their related gene lists across studies difficult and, at the level of classification accuracies, rarely validated in multiple independent datasets. Frequently, while the individual genes between such lists may not match, genes with same function are included across such gene lists. Development of such lists does not take into account the fact that genes can be grouped together as metagenes (MGs) based on common characteristics such as pathways, regulation, or genomic location. Such MGs might be used as features in building a predictive model applicable for classifying independent data. It is, therefore, demanding to systematically compare independent validation of gene lists or classifiers based on metagene or individual gene (SG) features.
In this study we compared the performance of either metagene-or single gene-based feature sets and classifiers using random forest and two support vector machines for classifier building. The performance within the same dataset, feature set validation performance, and validation performance of entire classifiers in strictly independent datasets were assessed by 10 times repeated 10-fold cross validation, leave-one-out cross validation, and one-fold validation, respectively. To test the significance of the performance difference between MG- and SG-features/classifiers, we used a repeated down-sampled binomial test approach.
MG- and SG-feature sets are transferable and perform well for training and testing prediction of metastasis outcome in strictly independent data sets, both between different and within similar microarray platforms, while classifiers had a poorer performance when validated in strictly independent datasets. The study showed that MG- and SG-feature sets perform equally well in classifying independent data. Furthermore, SG-classifiers significantly outperformed MG-classifier when validation is conducted between datasets using similar platforms, while no significant performance difference was found when validation was performed between different platforms.
Prediction of metastasis outcome in lymph node–negative patients by MG- and SG-classifiers showed that SG-classifiers performed significantly better than MG-classifiers when validated in independent data based on the same microarray platform as used for developing the classifier. However, the MG- and SG-classifiers had similar performance when conducting classifier validation in independent data based on a different microarray platform. The latter was also true when only validating sets of MG- and SG-features in independent datasets, both between and within similar and different platforms.
PMCID: PMC3529607  PMID: 23304070
microarray; classification; metagenes; breast cancer
2.  The kSORT Assay to Detect Renal Transplant Patients at High Risk for Acute Rejection: Results of the Multicenter AART Study 
PLoS Medicine  2014;11(11):e1001759.
Minnie Sarwal and colleagues developed a gene expression assay using peripheral blood samples to detect patients with renal transplant at high risk for acute rejection.
Please see later in the article for the Editors' Summary
Development of noninvasive molecular assays to improve disease diagnosis and patient monitoring is a critical need. In renal transplantation, acute rejection (AR) increases the risk for chronic graft injury and failure. Noninvasive diagnostic assays to improve current late and nonspecific diagnosis of rejection are needed. We sought to develop a test using a simple blood gene expression assay to detect patients at high risk for AR.
Methods and Findings
We developed a novel correlation-based algorithm by step-wise analysis of gene expression data in 558 blood samples from 436 renal transplant patients collected across eight transplant centers in the US, Mexico, and Spain between 5 February 2005 and 15 December 2012 in the Assessment of Acute Rejection in Renal Transplantation (AART) study. Gene expression was assessed by quantitative real-time PCR (QPCR) in one center. A 17-gene set—the Kidney Solid Organ Response Test (kSORT)—was selected in 143 samples for AR classification using discriminant analysis (area under the receiver operating characteristic curve [AUC] = 0.94; 95% CI 0.91–0.98), validated in 124 independent samples (AUC = 0.95; 95% CI 0.88–1.0) and evaluated for AR prediction in 191 serial samples, where it predicted AR up to 3 mo prior to detection by the current gold standard (biopsy). A novel reference-based algorithm (using 13 12-gene models) was developed in 100 independent samples to provide a numerical AR risk score, to classify patients as high risk versus low risk for AR. kSORT was able to detect AR in blood independent of age, time post-transplantation, and sample source without additional data normalization; AUC = 0.93 (95% CI 0.86–0.99). Further validation of kSORT is planned in prospective clinical observational and interventional trials.
The kSORT blood QPCR assay is a noninvasive tool to detect high risk of AR of renal transplants.
Please see later in the article for the Editors' Summary
Editors' Summary
Throughout life, the kidneys filter waste products (from the normal breakdown of tissues and food) and excess water from the blood to make urine. If the kidneys stop working for any reason, the rate at which the blood is filtered decreases, and dangerous amounts of creatinine and other waste products build up in the blood. The kidneys can fail suddenly (acute kidney failure) because of injury or poisoning, but usually failing kidneys stop working gradually over many years (chronic kidney disease). Chronic kidney disease is very common, especially in people who have high blood pressure or diabetes and in elderly people. In the UK, for example, about 20% of people aged 65–74 years have some degree of chronic kidney disease. People whose kidneys fail completely (end-stage kidney disease) need regular dialysis (hemodialysis, in which blood is filtered by an external machine, or peritoneal dialysis, which uses blood vessels in the abdominal lining to do the work of the kidneys) or a renal transplant (the surgical transfer of a healthy kidney from another person into the patient's body) to keep them alive.
Why Was This Study Done?
Our immune system protects us from pathogens (disease-causing organisms) by recognizing specific molecules (antigens) on the invader's surface as foreign and initiating a sequence of events that kills the invader. Unfortunately, the immune system sometimes recognizes kidney transplants as foreign and triggers transplant rejection. The chances of rejection can be minimized by “matching” the antigens on the donated kidney to those on the tissues of the kidney recipient and by giving the recipient immunosuppressive drugs. However, acute rejection (rejection during the first year after transplantation) affects about 20% of kidney transplants. Acute rejection needs to be detected quickly and treated with a short course of more powerful immunosuppressants because it increases the risk of transplant failure. The current “gold standard” method for detecting acute rejection if the level of creatinine in the patient's blood begins to rise is to surgically remove a small piece (biopsy) of the transplanted kidney for analysis. However, other conditions can change creatinine levels, acute rejection can occur without creatinine levels changing (subclinical acute rejection), and biopsies are invasive. Here, the researchers develop a noninvasive test for acute kidney rejection called the Kidney Solid Organ Response Test (kSORT) based on gene expression levels in the blood.
What Did the Researchers Do and Find?
For the Assessment of Acute Rejection in Renal Transplantation (AART) study, the researchers used an assay called quantitative polymerase chain reaction (QPCR) to measure the expression of 43 genes whose expression levels change during acute kidney rejection in blood samples collected from patients who had had a kidney transplant. Using a training set of 143 samples and statistical analyses, the researchers identified a 17-gene set (kSORT) that discriminated between patients with and without acute rejection detected by kidney biopsy. The 17-gene set correctly identified 39 of the samples taken from 47 patients with acute rejection as being from patients with acute rejection, and 87 of 96 samples from patients without acute rejection as being from patients without acute rejection. The researchers validated the gene set using 124 independent samples. Then, using 191 serial samples, they showed that the gene set was able to predict acute rejection up to three months before detection by biopsy. Finally, the researchers used 100 blood samples to develop an algorithm (a step-wise calculation) to classify patients as being at high or low risk of acute rejection.
What Do These Findings Mean?
These findings describe the early development of a noninvasive tool (kSORT) that might, eventually, help clinicians identify patients at risk of acute rejection after kidney transplantation. kSORT needs to be tested in more patients before being used clinically, however, to validate its predictive ability, particularly given that the current gold standard test against which it was compared (biopsy) is far from perfect. An additional limitation of kSORT is that it did not discriminate between cell-mediated and antibody-mediated immune rejection. These two types of immune rejection are treated in different ways, so clinicians ideally need a test for acute rejection that indicates which form of immune rejection is involved. The authors are conducting a follow-up study to help determine whether kSORT can be used in clinical practice to identify acute rejection and to identify which patients are at greatest risk of transplant rejection and may require biopsy.
Additional Information
Please access these websites via the online version of this summary at
The US National Kidney and Urologic Diseases Information Clearinghouse provides links to information about all aspects of kidney disease; the US National Kidney Disease Education Program provides resources to help improve the understanding, detection, and management of kidney disease (in English and Spanish)
The UK National Health Service Choices website provides information for patients on chronic kidney disease and about kidney transplants, including some personal stories
The US National Kidney Foundation, a not-for-profit organization, provides information about chronic kidney disease and about kidney transplantation (in English and Spanish)
The not-for-profit UK National Kidney Federation provides support and information for patients with kidney disease and for their carers, including information and personal stories about kidney donation and transplantation
World Kidney Day, a joint initiative between the International Society of Nephrology and the International Federation of Kidney Foundations, aims to raise awareness about kidneys and kidney disease
MedlinePlus provides links to additional resources about kidney diseases, kidney failure, and kidney transplantation; the MedlinePlus encyclopedia has a page about transplant rejection
PMCID: PMC4227654  PMID: 25386950
3.  Validation of the prognostic gene portfolio, ClinicoMolecular Triad Classification, using an independent prospective breast cancer cohort and external patient populations 
Using genome-wide expression profiles of a prospective training cohort of breast cancer patients, ClinicoMolecular Triad Classification (CMTC) was recently developed to classify breast cancers into three clinically relevant groups to aid treatment decisions. CMTC was found to be both prognostic and predictive in a large external breast cancer cohort in that study. This study serves to validate the reproducibility of CMTC and its prognostic value using independent patient cohorts.
An independent internal cohort (n = 284) and a new external cohort (n = 2,181) were used to validate the association of CMTC between clinicopathological factors, 12 known gene signatures, two molecular subtype classifiers, and 19 oncogenic signalling pathway activities, and to reproduce the abilities of CMTC to predict clinical outcomes of breast cancer. In addition, we also updated the outcome data of the original training cohort (n = 147).
The original training cohort reached a statistically significant difference (p < 0.05) in disease-free survivals between the three CMTC groups after an additional two years of follow-up (median = 55 months). The prognostic value of the triad classification was reproduced in the second independent internal cohort and the new external validation cohort. CMTC achieved even higher prognostic significance when all available patients were analyzed (n = 4,851). Oncogenic pathways Myc, E2F1, Ras and β-catenin were again implicated in the high-risk groups.
Both prospective internal cohorts and the independent external cohorts reproduced the triad classification of CMTC and its prognostic significance. CMTC is an independent prognostic predictor, and it outperformed 12 other known prognostic gene signatures, molecular subtype classifications, and all other standard prognostic clinicopathological factors. Our results support further development of CMTC portfolio into a guide for personalized breast cancer treatments.
PMCID: PMC4226941  PMID: 24996446
4.  Quantification of Heterogeneity as a Biomarker in Tumor Imaging: A Systematic Review 
PLoS ONE  2014;9(10):e110300.
Many techniques are proposed for the quantification of tumor heterogeneity as an imaging biomarker for differentiation between tumor types, tumor grading, response monitoring and outcome prediction. However, in clinical practice these methods are barely used. This study evaluates the reported performance of the described methods and identifies barriers to their implementation in clinical practice.
The Ovid, Embase, and Cochrane Central databases were searched up to 20 September 2013. Heterogeneity analysis methods were classified into four categories, i.e., non-spatial methods (NSM), spatial grey level methods (SGLM), fractal analysis (FA) methods, and filters and transforms (F&T). The performance of the different methods was compared.
Principal Findings
Of the 7351 potentially relevant publications, 209 were included. Of these studies, 58% reported the use of NSM, 49% SGLM, 10% FA, and 28% F&T. Differentiation between tumor types, tumor grading and/or outcome prediction was the goal in 87% of the studies. Overall, the reported area under the curve (AUC) ranged from 0.5 to 1 (median 0.87). No relation was found between the performance and the quantification methods used, or between the performance and the imaging modality. A negative correlation was found between the tumor-feature ratio and the AUC, which is presumably caused by overfitting in small datasets. Cross-validation was reported in 63% of the classification studies. Retrospective analyses were conducted in 57% of the studies without a clear description.
In a research setting, heterogeneity quantification methods can differentiate between tumor types, grade tumors, and predict outcome and monitor treatment effects. To translate these methods to clinical practice, more prospective studies are required that use external datasets for validation: these datasets should be made available to the community to facilitate the development of new and improved methods.
PMCID: PMC4203782  PMID: 25330171
5.  Clinical Outcome Prediction by MicroRNAs in Human Cancer: A Systematic Review 
MicroRNA (miR) expression may have prognostic value for many types of cancers. However, the miR literature comprises many small studies. We systematically reviewed and synthesized the evidence.
Using MEDLINE (last update December 2010), we identified English language studies that examined associations between miRs and cancer prognosis using tumor specimens for more than 10 patients during classifier development. We included studies that assessed a major clinical outcome (nodal disease, disease progression, response to therapy, metastasis, recurrence, or overall survival) in an agnostic fashion using either polymerase chain reaction or hybridized oligonucleotide microarrays.
Forty-six articles presenting results on 43 studies pertaining to 20 different types of malignancy were eligible for inclusion in this review. The median study size was 65 patients (interquartile range [IQR] = 34–129), the median number of miRs assayed was 328 (IQR = 250–470), and overall survival or recurrence were the most commonly measured outcomes (30 and 19 studies, respectively). External validation was performed in 21 studies, 20 of which reported at least one nominally statistically significant result for a miR classifier. The median hazard ratio for poor outcome in externally validated studies was 2.52 (IQR = 2.26–5.40). For all classifier miRs in studies that evaluated overall survival across diverse malignancies, the miRs most frequently associated with poor outcome after accounting for differences in miR assessment due to platform type were let-7 (decreased expression in patients with cancer) and miR 21 (increased expression).
MiR classifiers show promising prognostic associations with major cancer outcomes and specific miRs are consistently identified across diverse studies and platforms. These types of classifiers require careful external validation in large groups of cancer patients that have adequate protection from bias. –
PMCID: PMC3317879  PMID: 22395642
6.  Stratification bias in low signal microarray studies 
BMC Bioinformatics  2007;8:326.
When analysing microarray and other small sample size biological datasets, care is needed to avoid various biases. We analyse a form of bias, stratification bias, that can substantially affect analyses using sample-reuse validation techniques and lead to inaccurate results. This bias is due to imperfect stratification of samples in the training and test sets and the dependency between these stratification errors, i.e. the variations in class proportions in the training and test sets are negatively correlated.
We show that when estimating the performance of classifiers on low signal datasets (i.e. those which are difficult to classify), which are typical of many prognostic microarray studies, commonly used performance measures can suffer from a substantial negative bias. For error rate this bias is only severe in quite restricted situations, but can be much larger and more frequent when using ranking measures such as the receiver operating characteristic (ROC) curve and area under the ROC (AUC). Substantial biases are shown in simulations and on the van 't Veer breast cancer dataset. The classification error rate can have large negative biases for balanced datasets, whereas the AUC shows substantial pessimistic biases even for imbalanced datasets. In simulation studies using 10-fold cross-validation, AUC values of less than 0.3 can be observed on random datasets rather than the expected 0.5. Further experiments on the van 't Veer breast cancer dataset show these biases exist in practice.
Stratification bias can substantially affect several performance measures. In computing the AUC, the strategy of pooling the test samples from the various folds of cross-validation can lead to large biases; computing it as the average of per-fold estimates avoids this bias and is thus the recommended approach. As a more general solution applicable to other performance measures, we show that stratified repeated holdout and a modified version of k-fold cross-validation, balanced, stratified cross-validation and balanced leave-one-out cross-validation, avoids the bias. Therefore for model selection and evaluation of microarray and other small biological datasets, these methods should be used and unstratified versions avoided. In particular, the commonly used (unbalanced) leave-one-out cross-validation should not be used to estimate AUC for small datasets.
PMCID: PMC2211509  PMID: 17764577
7.  Batch Effect Confounding Leads to Strong Bias in Performance Estimates Obtained by Cross-Validation 
PLoS ONE  2014;9(6):e100335.
With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences (“batch effects”) as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies.
The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects.
We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., ‘control’) or group 2 (e.g., ‘treated’). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects.
We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, is performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data.
PMCID: PMC4072626  PMID: 24967636
8.  Threats to Validity in the Design and Conduct of Preclinical Efficacy Studies: A Systematic Review of Guidelines for In Vivo Animal Experiments 
PLoS Medicine  2013;10(7):e1001489.
The vast majority of medical interventions introduced into clinical development prove unsafe or ineffective. One prominent explanation for the dismal success rate is flawed preclinical research. We conducted a systematic review of preclinical research guidelines and organized recommendations according to the type of validity threat (internal, construct, or external) or programmatic research activity they primarily address.
Methods and Findings
We searched MEDLINE, Google Scholar, Google, and the EQUATOR Network website for all preclinical guideline documents published up to April 9, 2013 that addressed the design and conduct of in vivo animal experiments aimed at supporting clinical translation. To be eligible, documents had to provide guidance on the design or execution of preclinical animal experiments and represent the aggregated consensus of four or more investigators. Data from included guidelines were independently extracted by two individuals for discrete recommendations on the design and implementation of preclinical efficacy studies. These recommendations were then organized according to the type of validity threat they addressed. A total of 2,029 citations were identified through our search strategy. From these, we identified 26 guidelines that met our eligibility criteria—most of which were directed at neurological or cerebrovascular drug development. Together, these guidelines offered 55 different recommendations. Some of the most common recommendations included performance of a power calculation to determine sample size, randomized treatment allocation, and characterization of disease phenotype in the animal model prior to experimentation.
By identifying the most recurrent recommendations among preclinical guidelines, we provide a starting point for developing preclinical guidelines in other disease domains. We also provide a basis for the study and evaluation of preclinical research practice.
Please see later in the article for the Editors' Summary
Editors' Summary
The development process for new drugs is lengthy and complex. It begins in the laboratory, where scientists investigate the causes of diseases and identify potential new treatments. Next, promising interventions undergo preclinical research in cells and in animals (in vivo animal experiments) to test whether the intervention has the expected effect and to support the generalization (extension) of this treatment–effect relationship to patients. Drugs that pass these tests then enter clinical trials, where their safety and efficacy is tested in selected groups of patients under strictly controlled conditions. Finally, the government bodies responsible for drug approval review the results of the clinical trials, and successful drugs receive a marketing license, usually a decade or more after the initial laboratory work. Notably, only 11% of agents that enter clinical testing (investigational drugs) are ultimately licensed.
Why Was This Study Done?
The frequent failure of investigational drugs during clinical translation is potentially harmful to trial participants. Moreover, the costs of these failures are passed onto healthcare systems in the form of higher drug prices. It would be good, therefore, to reduce the attrition rate of investigational drugs. One possible explanation for the dismal success rate of clinical translation is that preclinical research, the key resource for justifying clinical development, is flawed. To address this possibility, several groups of preclinical researchers have issued guidelines intended to improve the design and execution of in vivo animal studies. In this systematic review (a study that uses predefined criteria to identify all the research on a given topic), the authors identify the experimental practices that are commonly recommended in these guidelines and organize these recommendations according to the type of threat to validity (internal, construct, or external) that they address. Internal threats to validity are factors that confound reliable inferences about treatment–effect relationships in preclinical research. For example, experimenter expectation may bias outcome assessment. Construct threats to validity arise when researchers mischaracterize the relationship between an experimental system and the clinical disease it is intended to represent. For example, researchers may use an animal model for a complex multifaceted clinical disease that only includes one characteristic of the disease. External threats to validity are unseen factors that frustrate the transfer of treatment–effect relationships from animal models to patients.
What Did the Researchers Do and Find?
The researchers identified 26 preclinical guidelines that met their predefined eligibility criteria. Twelve guidelines addressed preclinical research for neurological and cerebrovascular drug development; other disorders covered by guidelines included cardiac and circulatory disorders, sepsis, pain, and arthritis. Together, the guidelines offered 55 different recommendations for the design and execution of preclinical in vivo animal studies. Nineteen recommendations addressed threats to internal validity. The most commonly included recommendations of this type called for the use of power calculations to ensure that sample sizes are large enough to yield statistically meaningful results, random allocation of animals to treatment groups, and “blinding” of researchers who assess outcomes to treatment allocation. Among the 25 recommendations that addressed threats to construct validity, the most commonly included recommendations called for characterization of the properties of the animal model before experimentation and matching of the animal model to the human manifestation of the disease. Finally, six recommendations addressed threats to external validity. The most commonly included of these recommendations suggested that preclinical research should be replicated in different models of the same disease and in different species, and should also be replicated independently.
What Do These Findings Mean?
This systematic review identifies a range of investigational recommendations that preclinical researchers believe address threats to the validity of preclinical efficacy studies. Many of these recommendations are not widely implemented in preclinical research at present. Whether the failure to implement them explains the frequent discordance between the results on drug safety and efficacy obtained in preclinical research and in clinical trials is currently unclear. These findings provide a starting point, however, for the improvement of existing preclinical research guidelines for specific diseases, and for the development of similar guidelines for other diseases. They also provide an evidence-based platform for the analysis of preclinical evidence and for the study and evaluation of preclinical research practice. These findings should, therefore, be considered by investigators, institutional review bodies, journals, and funding agents when designing, evaluating, and sponsoring translational research.
Additional Information
Please access these websites via the online version of this summary at
The US Food and Drug Administration provides information about drug approval in the US for consumers and for health professionals; its Patient Network provides a step-by-step description of the drug development process that includes information on preclinical research
The UK Medicines and Healthcare Products Regulatory Agency (MHRA) provides information about all aspects of the scientific evaluation and approval of new medicines in the UK; its My Medicine: From Laboratory to Pharmacy Shelf web pages describe the drug development process from scientific discovery, through preclinical and clinical research, to licensing and ongoing monitoring
The STREAM website provides ongoing information about policy, ethics, and practices used in clinical translation of new drugs
The CAMARADES collaboration offers a “supporting framework for groups involved in the systematic review of animal studies” in stroke and other neurological diseases
PMCID: PMC3720257  PMID: 23935460
9.  A consensus prognostic gene expression classifier for ER positive breast cancer 
Genome Biology  2006;7(10):R101.
A consensus prognostic classifier for estrogen receptor positive breast tumors has been developed and shown to be valid in nearly 900 samples across different microarray platforms.
A consensus prognostic gene expression classifier is still elusive in heterogeneous diseases such as breast cancer.
Here we perform a combined analysis of three major breast cancer microarray data sets to hone in on a universally valid prognostic molecular classifier in estrogen receptor (ER) positive tumors. Using a recently developed robust measure of prognostic separation, we further validate the prognostic classifier in three external independent cohorts, confirming the validity of our molecular classifier in a total of 877 ER positive samples. Furthermore, we find that molecular classifiers may not outperform classical prognostic indices but that they can be used in hybrid molecular-pathological classification schemes to improve prognostic separation.
The prognostic molecular classifier presented here is the first to be valid in over 877 ER positive breast cancer samples and across three different microarray platforms. Larger multi-institutional studies will be needed to fully determine the added prognostic value of molecular classifiers when combined with standard prognostic factors.
PMCID: PMC1794561  PMID: 17076897
10.  Improved Glomerular Filtration Rate Estimation by an Artificial Neural Network 
PLoS ONE  2013;8(3):e58242.
Accurate evaluation of glomerular filtration rates (GFRs) is of critical importance in clinical practice. A previous study showed that models based on artificial neural networks (ANNs) could achieve a better performance than traditional equations. However, large-sample cross-sectional surveys have not resolved questions about ANN performance.
A total of 1,180 patients that had chronic kidney disease (CKD) were enrolled in the development data set, the internal validation data set and the external validation data set. Additional 222 patients that were admitted to two independent institutions were externally validated. Several ANNs were constructed and finally a Back Propagation network optimized by a genetic algorithm (GABP network) was chosen as a superior model, which included six input variables; i.e., serum creatinine, serum urea nitrogen, age, height, weight and gender, and estimated GFR as the one output variable. Performance was then compared with the Cockcroft-Gault equation, the MDRD equations and the CKD-EPI equation.
In the external validation data set, Bland-Altman analysis demonstrated that the precision of the six-variable GABP network was the highest among all of the estimation models; i.e., 46.7 ml/min/1.73 m2 vs. a range from 71.3 to 101.7 ml/min/1.73 m2, allowing improvement in accuracy (15% accuracy, 49.0%; 30% accuracy, 75.1%; 50% accuracy, 90.5% [P<0.001 for all]) and CKD stage classification (misclassification rate of CKD stage, 32.4% vs. a range from 47.3% to 53.3% [P<0.001 for all]). Furthermore, in the additional external validation data set, precision and accuracy were improved by the six-variable GABP network.
A new ANN model (the six-variable GABP network) for CKD patients was developed that could provide a simple, more accurate and reliable means for the estimation of GFR and stage of CKD than traditional equations. Further validations are needed to assess the ability of the ANN model in diverse populations.
PMCID: PMC3596400  PMID: 23516450
11.  Risk Models to Predict Chronic Kidney Disease and Its Progression: A Systematic Review 
PLoS Medicine  2012;9(11):e1001344.
A systematic review of risk prediction models conducted by Justin Echouffo-Tcheugui and Andre Kengne examines the evidence base for prediction of chronic kidney disease risk and its progression, and suitability of such models for clinical use.
Chronic kidney disease (CKD) is common, and associated with increased risk of cardiovascular disease and end-stage renal disease, which are potentially preventable through early identification and treatment of individuals at risk. Although risk factors for occurrence and progression of CKD have been identified, their utility for CKD risk stratification through prediction models remains unclear. We critically assessed risk models to predict CKD and its progression, and evaluated their suitability for clinical use.
Methods and Findings
We systematically searched MEDLINE and Embase (1 January 1980 to 20 June 2012). Dual review was conducted to identify studies that reported on the development, validation, or impact assessment of a model constructed to predict the occurrence/presence of CKD or progression to advanced stages. Data were extracted on study characteristics, risk predictors, discrimination, calibration, and reclassification performance of models, as well as validation and impact analyses. We included 26 publications reporting on 30 CKD occurrence prediction risk scores and 17 CKD progression prediction risk scores. The vast majority of CKD risk models had acceptable-to-good discriminatory performance (area under the receiver operating characteristic curve>0.70) in the derivation sample. Calibration was less commonly assessed, but overall was found to be acceptable. Only eight CKD occurrence and five CKD progression risk models have been externally validated, displaying modest-to-acceptable discrimination. Whether novel biomarkers of CKD (circulatory or genetic) can improve prediction largely remains unclear, and impact studies of CKD prediction models have not yet been conducted. Limitations of risk models include the lack of ethnic diversity in derivation samples, and the scarcity of validation studies. The review is limited by the lack of an agreed-on system for rating prediction models, and the difficulty of assessing publication bias.
The development and clinical application of renal risk scores is in its infancy; however, the discriminatory performance of existing tools is acceptable. The effect of using these models in practice is still to be explored.
Please see later in the article for the Editors' Summary
Editors' Summary
Chronic kidney disease (CKD)—the gradual loss of kidney function—is increasingly common worldwide. In the US, for example, about 26 million adults have CKD, and millions more are at risk of developing the condition. Throughout life, small structures called nephrons inside the kidneys filter waste products and excess water from the blood to make urine. If the nephrons stop working because of injury or disease, the rate of blood filtration decreases, and dangerous amounts of waste products such as creatinine build up in the blood. Symptoms of CKD, which rarely occur until the disease is very advanced, include tiredness, swollen feet and ankles, puffiness around the eyes, and frequent urination, especially at night. There is no cure for CKD, but progression of the disease can be slowed by controlling high blood pressure and diabetes, both of which cause CKD, and by adopting a healthy lifestyle. The same interventions also reduce the chances of CKD developing in the first place.
Why Was This Study Done?
CKD is associated with an increased risk of end-stage renal disease, which is treated with dialysis or by kidney transplantation (renal replacement therapies), and of cardiovascular disease. These life-threatening complications are potentially preventable through early identification and treatment of CKD, but most people present with advanced disease. Early identification would be particularly useful in developing countries, where renal replacement therapies are not readily available and resources for treating cardiovascular problems are limited. One way to identify people at risk of a disease is to use a “risk model.” Risk models are constructed by testing the ability of different combinations of risk factors that are associated with a specific disease to identify those individuals in a “derivation sample” who have the disease. The model is then validated on an independent group of people. In this systematic review (a study that uses predefined criteria to identify all the research on a given topic), the researchers critically assess the ability of existing CKD risk models to predict the occurrence of CKD and its progression, and evaluate their suitability for clinical use.
What Did the Researchers Do and Find?
The researchers identified 26 publications reporting on 30 risk models for CKD occurrence and 17 risk models for CKD progression that met their predefined criteria. The risk factors most commonly included in these models were age, sex, body mass index, diabetes status, systolic blood pressure, serum creatinine, protein in the urine, and serum albumin or total protein. Nearly all the models had acceptable-to-good discriminatory performance (a measure of how well a model separates people who have a disease from people who do not have the disease) in the derivation sample. Not all the models had been calibrated (assessed for whether the average predicted risk within a group matched the proportion that actually developed the disease), but in those that had been assessed calibration was good. Only eight CKD occurrence and five CKD progression risk models had been externally validated; discrimination in the validation samples was modest-to-acceptable. Finally, very few studies had assessed whether adding extra variables to CKD risk models (for example, genetic markers) improved prediction, and none had assessed the impact of adopting CKD risk models on the clinical care and outcomes of patients.
What Do These Findings Mean?
These findings suggest that the development and clinical application of CKD risk models is still in its infancy. Specifically, these findings indicate that the existing models need to be better calibrated and need to be externally validated in different populations (most of the models were tested only in predominantly white populations) before they are incorporated into guidelines. The impact of their use on clinical outcomes also needs to be assessed before their widespread use is recommended. Such research is worthwhile, however, because of the potential public health and clinical applications of well-designed risk models for CKD. Such models could be used to identify segments of the population that would benefit most from screening for CKD, for example. Moreover, risk communication to patients could motivate them to adopt a healthy lifestyle and to adhere to prescribed medications, and the use of models for predicting CKD progression could help clinicians tailor disease-modifying therapies to individual patient needs.
Additional Information
Please access these websites via the online version of this summary at
This study is further discussed in a PLOS Medicine Perspective by Maarten Taal
The US National Kidney and Urologic Diseases Information Clearinghouse provides information about all aspects of kidney disease; the US National Kidney Disease Education Program provides resources to help improve the understanding, detection, and management of kidney disease (in English and Spanish)
The UK National Health Service Choices website provides information for patients on chronic kidney disease, including some personal stories
The US National Kidney Foundation, a not-for-profit organization, provides information about chronic kidney disease (in English and Spanish)
The not-for-profit UK National Kidney Federation support and information for patients with kidney disease and for their carers, including a selection of patient experiences of kidney disease
World Kidney Day, a joint initiative between the International Society of Nephrology and the International Federation of Kidney Foundations, aims to raise awareness about kidneys and kidney disease
PMCID: PMC3502517  PMID: 23185136
12.  A Risk Prediction Model for the Assessment and Triage of Women with Hypertensive Disorders of Pregnancy in Low-Resourced Settings: The miniPIERS (Pre-eclampsia Integrated Estimate of RiSk) Multi-country Prospective Cohort Study 
PLoS Medicine  2014;11(1):e1001589.
Beth Payne and colleagues use a risk prediction model, the Pre-eclampsia Integrated Estimate of RiSk (miniPIERS) to help inform the clinical assessment and triage of women with hypertensive disorders of pregnancy in low-resourced settings.
Please see later in the article for the Editors' Summary
Pre-eclampsia/eclampsia are leading causes of maternal mortality and morbidity, particularly in low- and middle- income countries (LMICs). We developed the miniPIERS risk prediction model to provide a simple, evidence-based tool to identify pregnant women in LMICs at increased risk of death or major hypertensive-related complications.
Methods and Findings
From 1 July 2008 to 31 March 2012, in five LMICs, data were collected prospectively on 2,081 women with any hypertensive disorder of pregnancy admitted to a participating centre. Candidate predictors collected within 24 hours of admission were entered into a step-wise backward elimination logistic regression model to predict a composite adverse maternal outcome within 48 hours of admission. Model internal validation was accomplished by bootstrapping and external validation was completed using data from 1,300 women in the Pre-eclampsia Integrated Estimate of RiSk (fullPIERS) dataset. Predictive performance was assessed for calibration, discrimination, and stratification capacity. The final miniPIERS model included: parity (nulliparous versus multiparous); gestational age on admission; headache/visual disturbances; chest pain/dyspnoea; vaginal bleeding with abdominal pain; systolic blood pressure; and dipstick proteinuria. The miniPIERS model was well-calibrated and had an area under the receiver operating characteristic curve (AUC ROC) of 0.768 (95% CI 0.735–0.801) with an average optimism of 0.037. External validation AUC ROC was 0.713 (95% CI 0.658–0.768). A predicted probability ≥25% to define a positive test classified women with 85.5% accuracy. Limitations of this study include the composite outcome and the broad inclusion criteria of any hypertensive disorder of pregnancy. This broad approach was used to optimize model generalizability.
The miniPIERS model shows reasonable ability to identify women at increased risk of adverse maternal outcomes associated with the hypertensive disorders of pregnancy. It could be used in LMICs to identify women who would benefit most from interventions such as magnesium sulphate, antihypertensives, or transportation to a higher level of care.
Please see later in the article for the Editors' Summary
Editors' Summary
Each year, ten million women develop pre-eclampsia or a related hypertensive (high blood pressure) disorder of pregnancy and 76,000 women die as a result. Globally, hypertensive disorders of pregnancy cause around 12% of maternal deaths—deaths of women during or shortly after pregnancy. The mildest of these disorders is gestational hypertension, high blood pressure that develops after 20 weeks of pregnancy. Gestational hypertension does not usually harm the mother or her unborn child and resolves after delivery but up to a quarter of women with this condition develop pre-eclampsia, a combination of hypertension and protein in the urine (proteinuria). Women with mild pre-eclampsia may not have any symptoms—the condition is detected during antenatal checks—but more severe pre-eclampsia can cause headaches, blurred vision, and other symptoms, and can lead to eclampsia (fits), multiple organ failure, and death of the mother and/or her baby. The only “cure” for pre-eclampsia is to deliver the baby as soon as possible but women are sometimes given antihypertensive drugs to lower their blood pressure or magnesium sulfate to prevent seizures.
Why Was This Study Done?
Women in low- and middle-income countries (LMICs) are more likely to develop complications of pre-eclampsia than women in high-income countries and most of the deaths associated with hypertensive disorders of pregnancy occur in LMICs. The high burden of illness and death in LMICs is thought to be primarily due to delays in triage (the identification of women who are or may become severely ill and who need specialist care) and delays in transporting these women to facilities where they can receive appropriate care. Because there is a shortage of health care workers who are adequately trained in the triage of suspected cases of hypertensive disorders of pregnancy in many LMICs, one way to improve the situation might be to design a simple tool to identify women at increased risk of complications or death from hypertensive disorders of pregnancy. Here, the researchers develop miniPIERS (Pre-eclampsia Integrated Estimate of RiSk), a clinical risk prediction model for adverse outcomes among women with hypertensive disorders of pregnancy suitable for use in community and primary health care facilities in LMICs.
What Did the Researchers Do and Find?
The researchers used data on candidate predictors of outcome that are easy to collect and/or measure in all health care settings and that are associated with pre-eclampsia from women admitted with any hypertensive disorder of pregnancy to participating centers in five LMICs to build a model to predict death or a serious complication such as organ damage within 48 hours of admission. The miniPIERS model included parity (whether the woman had been pregnant before), gestational age (length of pregnancy), headache/visual disturbances, chest pain/shortness of breath, vaginal bleeding with abdominal pain, systolic blood pressure, and proteinuria detected using a dipstick. The model was well-calibrated (the predicted risk of adverse outcomes agreed with the observed risk of adverse outcomes among the study participants), it had a good discriminatory ability (it could separate women who had a an adverse outcome from those who did not), and it designated women as being at high risk (25% or greater probability of an adverse outcome) with an accuracy of 85.5%. Importantly, external validation using data collected in fullPIERS, a study that developed a more complex clinical prediction model based on data from women attending tertiary hospitals in high-income countries, confirmed the predictive performance of miniPIERS.
What Do These Findings Mean?
These findings indicate that the miniPIERS model performs reasonably well as a tool to identify women at increased risk of adverse maternal outcomes associated with hypertensive disorders of pregnancy. Because miniPIERS only includes simple-to-measure personal characteristics, symptoms, and signs, it could potentially be used in resource-constrained settings to identify the women who would benefit most from interventions such as transportation to a higher level of care. However, further external validation of miniPIERS is needed using data collected from women living in LMICs before the model can be used during routine antenatal care. Moreover, the value of miniPIERS needs to be confirmed in implementation projects that examine whether its potential translates into clinical improvements. For now, though, the model could provide the basis for an education program to increase the knowledge of women, families, and community health care workers in LMICs about the signs and symptoms of hypertensive disorders of pregnancy.
Additional Information
Please access these websites via the online version of this summary at
The World Health Organization provides guidelines for the management of hypertensive disorders of pregnancy in low-resourced settings
The Maternal and Child Health Integrated Program provides information on pre-eclampsia and eclampsia targeted to low-resourced settings along with a tool-kit for LMIC providers
The US National Heart, Lung, and Blood Institute provides information about high blood pressure in pregnancy and a guide to lowering blood pressure in pregnancy
The UK National Health Service Choices website provides information about pre-eclampsia
The US not-for profit organization Preeclampsia Foundation provides information about all aspects of pre-eclampsia; its website includes some personal stories
The UK charity Healthtalkonline also provides personal stories about hypertensive disorders of pregnancy
MedlinePlus provides links to further information about high blood pressure and pregnancy (in English and Spanish); the MedlinePlus Encyclopedia has a video about pre-eclampsia (also in English and Spanish)
More information about miniPIERS and about fullPIERS is available
PMCID: PMC3897359  PMID: 24465185
13.  Detection of inter-patient left and right bundle branch block heartbeats in ECG using ensemble classifiers 
Left bundle branch block (LBBB) and right bundle branch block (RBBB) not only mask electrocardiogram (ECG) changes that reflect diseases but also indicate important underlying pathology. The timely detection of LBBB and RBBB is critical in the treatment of cardiac diseases. Inter-patient heartbeat classification is based on independent training and testing sets to construct and evaluate a heartbeat classification system. Therefore, a heartbeat classification system with a high performance evaluation possesses a strong predictive capability for unknown data. The aim of this study was to propose a method for inter-patient classification of heartbeats to accurately detect LBBB and RBBB from the normal beat (NORM).
This study proposed a heartbeat classification method through a combination of three different types of classifiers: a minimum distance classifier constructed between NORM and LBBB; a weighted linear discriminant classifier between NORM and RBBB based on Bayesian decision making using posterior probabilities; and a linear support vector machine (SVM) between LBBB and RBBB. Each classifier was used with matching features to obtain better classification performance. The final types of the test heartbeats were determined using a majority voting strategy through the combination of class labels from the three classifiers. The optimal parameters for the classifiers were selected using cross-validation on the training set. The effects of different lead configurations on the classification results were assessed, and the performance of these three classifiers was compared for the detection of each pair of heartbeat types.
The study results showed that a two-lead configuration exhibited better classification results compared with a single-lead configuration. The construction of a classifier with good performance between each pair of heartbeat types significantly improved the heartbeat classification performance. The results showed a sensitivity of 91.4% and a positive predictive value of 37.3% for LBBB and a sensitivity of 92.8% and a positive predictive value of 88.8% for RBBB.
A multi-classifier ensemble method was proposed based on inter-patient data and demonstrated a satisfactory classification performance. This approach has the potential for application in clinical practice to distinguish LBBB and RBBB from NORM of unknown patients.
PMCID: PMC4086987  PMID: 24903422
Heartbeat classification; Left bundle branch block (LBBB); Right bundle branch block (RBBB); Independent component analysis (ICA); Linear discriminant classifier; Support vector machine (SVM); Ensemble
14.  Mass spectrometry protein expression profiles in colorectal cancer tissue associated with clinico-pathological features of disease 
BMC Cancer  2010;10:410.
Studies of several tumour types have shown that expression profiling of cellular protein extracted from surgical tissue specimens by direct mass spectrometry analysis can accurately discriminate tumour from normal tissue and in some cases can sub-classify disease. We have evaluated the potential value of this approach to classify various clinico-pathological features in colorectal cancer by employing matrix-assisted laser desorption ionisation time of-flight-mass spectrometry (MALDI-TOF MS).
Protein extracts from 31 tumour and 33 normal mucosa specimens were purified, subjected to MALDI-Tof MS and then analysed using the 'GenePattern' suite of computational tools (Broad Institute, MIT, USA). Comparative Gene Marker Selection with either a t-test or a signal-to-noise ratio (SNR) test statistic was used to identify and rank differentially expressed marker peaks. The k-nearest neighbours algorithm was used to build classification models either using separate training and test datasets or else by using an iterative, 'leave-one-out' cross-validation method.
73 protein peaks in the mass range 1800-16000Da were differentially expressed in tumour verses adjacent normal mucosa tissue (P ≤ 0.01, false discovery rate ≤ 0.05). Unsupervised hierarchical cluster analysis classified most tumour and normal mucosa into distinct cluster groups. Supervised prediction correctly classified the tumour/normal mucosa status of specimens in an independent test spectra dataset with 100% sensitivity and specificity (95% confidence interval: 67.9-99.2%). Supervised prediction using 'leave-one-out' cross validation algorithms for tumour spectra correctly classified 10/13 poorly differentiated and 16/18 well/moderately differentiated tumours (P = < 0.001; receiver-operator characteristics - ROC - error, 0.171); disease recurrence was correctly predicted in 5/6 cases and disease-free survival (median follow-up time, 25 months) was correctly predicted in 22/23 cases (P = < 0.001; ROC error, 0.105). A similar analysis of normal mucosa spectra correctly predicted 11/14 patients with, and 15/19 patients without lymph node involvement (P = 0.001; ROC error, 0.212).
Protein expression profiling of surgically resected CRC tissue extracts by MALDI-TOF MS has potential value in studies aimed at improved molecular classification of this disease. Further studies, with longer follow-up times and larger patient cohorts, that would permit independent validation of supervised classification models, would be required to confirm the predictive value of tumour spectra for disease recurrence/patient survival.
PMCID: PMC2927547  PMID: 20691062
15.  Microarray Based Diagnosis Profits from Better Documentation of Gene Expression Signatures 
PLoS Computational Biology  2008;4(2):e22.
Microarray gene expression signatures hold great promise to improve diagnosis and prognosis of disease. However, current documentation standards of such signatures do not allow for an unambiguous application to study-external patients. This hinders independent evaluation, effectively delaying the use of signatures in clinical practice. Data from eight publicly available clinical microarray studies were analyzed and the consistency of study-internal with study-external diagnoses was evaluated. Study-external classifications were based on documented information only. Documenting a signature is conceptually different from reporting a list of genes. We show that even the exact quantitative specification of a classification rule alone does not define a signature unambiguously. We found that discrepancy between study-internal and study-external diagnoses can be as frequent as 30% (worst case) and 18% (median). By using the proposed documentation by value strategy, which documents quantitative preprocessing information, the median discrepancy was reduced to 1%. The process of evaluating microarray gene expression diagnostic signatures and bringing them to clinical practice can be substantially improved and made more reliable by better documentation of the signatures.
Author Summary
It has been shown that microarray based gene expression signatures have the potential to be powerful tools for patient stratification, diagnosis of disease, prognosis of survival, assessment of risk group, and selection of treatment. However, documentation standards in current publications do not allow for a signature's unambiguous application to study-external patients. This hinders independent evaluation, effectively delaying the use of signatures in clinical practice. Based on eight clinical microarray studies, we show that common documentation standards have the following shortcoming: when using the documented information only, the same patient might receive a diagnosis different from the one he would have received in the original study. To address the problem, we derive a documentation protocol that reduces the ambiguity of diagnoses to a minimum. The resulting gain in consistency of study-internal versus study-external diagnosis is validated by statistical resampling analysis: using the proposed documentation by value strategy, the median inconsistency dropped from 18% to 1%. Software implementing the proposed method, as well as practical guidelines for using it, are provided. We conclude that the process of evaluating microarray gene expression diagnostic signatures and bringing them to clinical practice can be substantially improved and made more reliable by better documentation.
PMCID: PMC2242819  PMID: 18282081
16.  A subtype of childhood acute lymphoblastic leukaemia with poor treatment outcome: a genome-wide classification study 
The lancet oncology  2009;10(2):125-134.
In childhood acute lymphoblastic leukemia (ALL) genetic subtypes are recognized that determine the risk-group for further treatment. However, 25% of precursor BALL are currently genetically unclassified and have an intermediate prognosis. The present study used genome-wide strategies to reveal new biological insights and advance the prognostic classification of childhood ALL.
A classifier based on gene expression in ALL cells from 190 newly diagnosed pediatric cases was constructed using a double-loop cross-validation method and next, was validated on an independent cohort of 107 newly diagnosed pediatric ALL cases. Hierarchical cluster analysis using classifying gene probe sets then revealed a novel ALL subtype for which underlying genetic abnormalities were characterized by comparative genomic hybridization-arrays and molecular cytogenetics.
The prediction accuracy of the classifier was median 90% in the discovery cohort and 87.9% in the independent validation cohort. A significant part of the currently genetically unclassified cases clustered with BCR-ABL-positive cases in both the discovery and validation cohort. These BCR-ABL-like cases represent 15–20% of ALL cases and have a highly unfavorable outcome (5-year disease-free survival 59.5%, 95%CI: 37.1%–81.9%) compared to other precursor B-ALL cases (84.4%, 95%CI: 76.8%–92.1%; P=0.012), similar to the poor prognosis of BCR-ABL-positive ALL (51.9%, 95%CI: 23.1%–80.6%), as was confirmed in the validation cohort. Further genetic studies revealed that the BCR-ABL-like subtype is characterized by a high frequency of deletions in genes involved in B-cell development (82%), including IKAROS, E2A, EBF1, PAX5 and VPREB1, compared to other ALL cases (36%, p=0.0002). BCR-ABL-like leukemic cells were median >70-times resistant to L-asparaginase (p=0.001) and 1.6-times more resistant to daunorubicin (p=0.017) compared to other precursor B-ALL cases whereas the toxicity of prednisolone and vincristine did not significantly differ.
Classification by gene expression profiling identified a novel subtype of ALL not detected by current diagnostic procedures but which comprises the largest group of patients with a high-risk of treatment failure. New treatment strategies are needed to improve outcome for this novel high-risk subtype of ALL.
Dutch Cancer Society, Sophia Foundation for Medical Research, Pediatric Oncology Foundation Rotterdam, Center of Medical Systems Biology of the Netherlands Genomics Initiative/Netherlands Organisation for Scientific Research, American National Institute of Health, American National Cancer Institute and American Lebanese Syrian Associated Charities.
PMCID: PMC2707020  PMID: 19138562
microarray; gene expression profiling; classification; genotype; novel subtype; class discovery; ALL
17.  Gene Expression Profiles for Predicting Metastasis in Breast Cancer: A Cross-Study Comparison of Classification Methods 
The Scientific World Journal  2012;2012:380495.
Machine learning has increasingly been used with microarray gene expression data and for the development of classifiers using a variety of methods. However, method comparisons in cross-study datasets are very scarce. This study compares the performance of seven classification methods and the effect of voting for predicting metastasis outcome in breast cancer patients, in three situations: within the same dataset or across datasets on similar or dissimilar microarray platforms. Combining classification results from seven classifiers into one voting decision performed significantly better during internal validation as well as external validation in similar microarray platforms than the underlying classification methods. When validating between different microarray platforms, random forest, another voting-based method, proved to be the best performing method. We conclude that voting based classifiers provided an advantage with respect to classifying metastasis outcome in breast cancer patients.
PMCID: PMC3515909  PMID: 23251101
18.  Unsupervised Analysis of Transcriptomic Profiles Reveals Six Glioma Subtypes 
Cancer research  2009;69(5):2091-2099.
Gliomas are the most common type of primary brain tumors in adults and a significant cause of cancer-related mortality. Defining glioma subtypes based on objective genetic and molecular signatures may allow for a more rational, patient-specific approach to therapy in the future. Classifications based on gene expression data have been attempted in the past with varying success and with only some concordance between studies, possibly due to inherent bias that can be introduced through the use of analytic methodologies that make a priori selection of genes before classification. To overcome this potential source of bias, we have applied two unsupervised machine learning methods to genome-wide gene expression profiles of 159 gliomas, thereby establishing a robust glioma classification model relying only on the molecular data. The model predicts for two major groups of gliomas (oligodendroglioma-rich and glioblastoma-rich groups) separable into six hierarchically nested subtypes. We then identified six sets of classifiers that can be used to assign any given glioma to the corresponding subtype and validated these classifiers using both internal (189 additional independent samples) and two external data sets (341 patients). Application of the classification system to the external glioma data sets allowed us to identify previously unrecognized prognostic groups within previously published data and within The Cancer Genome Atlas glioblastoma samples and the different biological pathways associated with the different glioma subtypes offering a potential clue to the pathogenesis and possibly therapeutic targets for tumors within each subtype.
PMCID: PMC2845963  PMID: 19244127
19.  Comparison of multivariate classifiers and response normalizations for pattern-information fMRI 
NeuroImage  2010;53(1):103-118.
A popular method for investigating whether stimulus information is present in fMRI response patterns is to attempt to “decode” the stimuli from the response patterns with a multivariate classifier. The sensitivity for detecting the information depends on the particular classifier used. However, little is known about the relative performance of different classifiers on fMRI data. Here we compared six multivariate classifiers and investigated how the response-amplitude estimate used (beta or t-value) and different pattern normalizations affect classification performance. The compared classifiers were a pattern-correlation classifier, a k-nearest-neighbors classifier, Fisher’s linear discriminant, Gaussian naïve Bayes, and linear and nonlinear (radial-basis-function-kernel) support vector machines. We compared these classifiers’ accuracy at decoding the category of visual objects from response patterns in human early visual and inferior temporal cortex acquired in an event-related design with BOLD fMRI at 3T using SENSE and isotropic voxels of about 2-mm width. Overall, Fisher’s linear discriminant (with an optimal-shrinkage covariance estimator) and the linear support vector machine performed best. The pattern-correlation classifier often performed similarly as those two classifiers. The nonlinear classifiers never performed better and sometimes significantly worse than the linear classifiers, suggesting overfitting. Defining response patterns by t-values (or in error-standard-deviation units) rather than by beta estimates (in % signal change) to define the patterns appeared advantageous. Cross-validation by a leave-one-stimulus-pair-out method gave higher accuracies than a leave-one-run-out method, suggesting that generalization to independent runs (which more safely ensures independence of the test set) is more challenging than generalization to novel stimuli within the same category. Independent selection of fewer more visually responsive voxels tended to yield better decoding performance for all classifiers. Normalizing mean and standard deviation of the response patterns either across stimuli or across voxels had no significant effect on decoding performance. Overall our results suggest that linear decoders based on t-value patterns may perform best in the present scenario of visual object representations measured for about 60-minutes per subject with 3T fMRI.
PMCID: PMC2914143  PMID: 20580933
Multi-voxel pattern analysis; decoding; classification analysis; fMRI; normalization
20.  Risk score to predict gastrointestinal bleeding after acute ischemic stroke 
BMC Gastroenterology  2014;14:130.
Gastrointestinal bleeding (GIB) is a common and often serious complication after stroke. Although several risk factors for post-stroke GIB have been identified, no reliable or validated scoring system is currently available to predict GIB after acute stroke in routine clinical practice or clinical trials. In the present study, we aimed to develop and validate a risk model (acute ischemic stroke associated gastrointestinal bleeding score, the AIS-GIB score) to predict in-hospital GIB after acute ischemic stroke.
The AIS-GIB score was developed from data in the China National Stroke Registry (CNSR). Eligible patients in the CNSR were randomly divided into derivation (60%) and internal validation (40%) cohorts. External validation was performed using data from the prospective Chinese Intracranial Atherosclerosis Study (CICAS). Independent predictors of in-hospital GIB were obtained using multivariable logistic regression in the derivation cohort, and β-coefficients were used to generate point scoring system for the AIS-GIB. The area under the receiver operating characteristic curve (AUROC) and the Hosmer-Lemeshow goodness-of-fit test were used to assess model discrimination and calibration, respectively.
A total of 8,820, 5,882, and 2,938 patients were enrolled in the derivation, internal validation and external validation cohorts. The overall in-hospital GIB after AIS was 2.6%, 2.3%, and 1.5% in the derivation, internal, and external validation cohort, respectively. An 18-point AIS-GIB score was developed from the set of independent predictors of GIB including age, gender, history of hypertension, hepatic cirrhosis, peptic ulcer or previous GIB, pre-stroke dependence, admission National Institutes of Health stroke scale score, Glasgow Coma Scale score and stroke subtype (Oxfordshire). The AIS-GIB score showed good discrimination in the derivation (0.79; 95% CI, 0.764-0.825), internal (0.78; 95% CI, 0.74-0.82) and external (0.76; 95% CI, 0.71-0.82) validation cohorts. The AIS-GIB score was well calibrated in the derivation (P = 0.42), internal (P = 0.45) and external (P = 0.86) validation cohorts.
The AIS-GIB score is a valid clinical grading scale to predict in-hospital GIB after AIS. Further studies on the effect of the AIS-GIB score on reducing GIB and improving outcome after AIS are warranted.
PMCID: PMC4120715  PMID: 25059927
21.  Bias in error estimation when using cross-validation for model selection 
BMC Bioinformatics  2006;7:91.
Cross-validation (CV) is an effective method for estimating the prediction error of a classifier. Some recent articles have proposed methods for optimizing classifiers by choosing classifier parameter values that minimize the CV error estimate. We have evaluated the validity of using the CV error estimate of the optimized classifier as an estimate of the true error expected on independent data.
We used CV to optimize the classification parameters for two kinds of classifiers; Shrunken Centroids and Support Vector Machines (SVM). Random training datasets were created, with no difference in the distribution of the features between the two classes. Using these "null" datasets, we selected classifier parameter values that minimized the CV error estimate. 10-fold CV was used for Shrunken Centroids while Leave-One-Out-CV (LOOCV) was used for the SVM. Independent test data was created to estimate the true error. With "null" and "non null" (with differential expression between the classes) data, we also tested a nested CV procedure, where an inner CV loop is used to perform the tuning of the parameters while an outer CV is used to compute an estimate of the error.
The CV error estimate for the classifier with the optimal parameters was found to be a substantially biased estimate of the true error that the classifier would incur on independent data. Even though there is no real difference between the two classes for the "null" datasets, the CV error estimate for the Shrunken Centroid with the optimal parameters was less than 30% on 18.5% of simulated training data-sets. For SVM with optimal parameters the estimated error rate was less than 30% on 38% of "null" data-sets. Performance of the optimized classifiers on the independent test set was no better than chance.
The nested CV procedure reduces the bias considerably and gives an estimate of the error that is very close to that obtained on the independent testing set for both Shrunken Centroids and SVM classifiers for "null" and "non-null" data distributions.
We show that using CV to compute an error estimate for a classifier that has itself been tuned using CV gives a significantly biased estimate of the true error. Proper use of CV for estimating true error of a classifier developed using a well defined algorithm requires that all steps of the algorithm, including classifier parameter tuning, be repeated in each CV loop. A nested CV procedure provides an almost unbiased estimate of the true error.
PMCID: PMC1397873  PMID: 16504092
22.  Pragmatic controlled clinical trials in primary care: the struggle between external and internal validity 
Controlled clinical trials of health care interventions are either explanatory or pragmatic. Explanatory trials test whether an intervention is efficacious; that is, whether it can have a beneficial effect in an ideal situation. Pragmatic trials measure effectiveness; they measure the degree of beneficial effect in real clinical practice. In pragmatic trials, a balance between external validity (generalizability of the results) and internal validity (reliability or accuracy of the results) needs to be achieved. The explanatory trial seeks to maximize the internal validity by assuring rigorous control of all variables other than the intervention. The pragmatic trial seeks to maximize external validity to ensure that the results can be generalized. However the danger of pragmatic trials is that internal validity may be overly compromised in the effort to ensure generalizability. We are conducting two pragmatic randomized controlled trials on interventions in the management of hypertension in primary care. We describe the design of the trials and the steps taken to deal with the competing demands of external and internal validity.
External validity is maximized by having few exclusion criteria and by allowing flexibility in the interpretation of the intervention and in management decisions. Internal validity is maximized by decreasing contamination bias through cluster randomization, and decreasing observer and assessment bias, in these non-blinded trials, through baseline data collection prior to randomization, automating the outcomes assessment with 24 hour ambulatory blood pressure monitors, and blinding the data analysis.
Clinical trials conducted in community practices present investigators with difficult methodological choices related to maintaining a balance between internal validity (reliability of the results) and external validity (generalizability). The attempt to achieve methodological purity can result in clinically meaningless results, while attempting to achieve full generalizability can result in invalid and unreliable results. Achieving a creative tension between the two is crucial.
PMCID: PMC317298  PMID: 14690550
23.  Gene Expression Classification of Colon Cancer into Molecular Subtypes: Characterization, Validation, and Prognostic Value 
PLoS Medicine  2013;10(5):e1001453.
Colon cancer (CC) pathological staging fails to accurately predict recurrence, and to date, no gene expression signature has proven reliable for prognosis stratification in clinical practice, perhaps because CC is a heterogeneous disease. The aim of this study was to establish a comprehensive molecular classification of CC based on mRNA expression profile analyses.
Methods and Findings
Fresh-frozen primary tumor samples from a large multicenter cohort of 750 patients with stage I to IV CC who underwent surgery between 1987 and 2007 in seven centers were characterized for common DNA alterations, including BRAF, KRAS, and TP53 mutations, CpG island methylator phenotype, mismatch repair status, and chromosomal instability status, and were screened with whole genome and transcriptome arrays. 566 samples fulfilled RNA quality requirements. Unsupervised consensus hierarchical clustering applied to gene expression data from a discovery subset of 443 CC samples identified six molecular subtypes. These subtypes were associated with distinct clinicopathological characteristics, molecular alterations, specific enrichments of supervised gene expression signatures (stem cell phenotype–like, normal-like, serrated CC phenotype–like), and deregulated signaling pathways. Based on their main biological characteristics, we distinguished a deficient mismatch repair subtype, a KRAS mutant subtype, a cancer stem cell subtype, and three chromosomal instability subtypes, including one associated with down-regulated immune pathways, one with up-regulation of the Wnt pathway, and one displaying a normal-like gene expression profile. The classification was validated in the remaining 123 samples plus an independent set of 1,058 CC samples, including eight public datasets. Furthermore, prognosis was analyzed in the subset of stage II–III CC samples. The subtypes C4 and C6, but not the subtypes C1, C2, C3, and C5, were independently associated with shorter relapse-free survival, even after adjusting for age, sex, stage, and the emerging prognostic classifier Oncotype DX Colon Cancer Assay recurrence score (hazard ratio 1.5, 95% CI 1.1–2.1, p = 0.0097). However, a limitation of this study is that information on tumor grade and number of nodes examined was not available.
We describe the first, to our knowledge, robust transcriptome-based classification of CC that improves the current disease stratification based on clinicopathological variables and common DNA markers. The biological relevance of these subtypes is illustrated by significant differences in prognosis. This analysis provides possibilities for improving prognostic models and therapeutic strategies. In conclusion, we report a new classification of CC into six molecular subtypes that arise through distinct biological pathways.
Please see later in the article for the Editors' Summary
Editors' Summary
Cancer of the large bowel (colorectal cancer) is the third most common cancer in men and the second most common cancer in women worldwide. Despite recent advances in the screening, diagnosis, and treatment of colorectal cancer, an estimated 608,000 people die every year from this form of cancer—8% of all cancer deaths. The prognosis and treatment options for colorectal cancer depend on five pathological stages (0–IV), each of which has a different treatment option and five year survival rate, so it is important that the stage is correctly identified. Unfortunately, pathological staging fails to accurately predict recurrence (relapse) in patients undergoing surgery for localized colorectal cancer, which is a concern, as 10%–20% of patients with stage II and 30%–40% of those with stage III colorectal cancer develop recurrence.
Why Was This Study Done?
Previous studies have investigated whether there are any possible gene expression profiles (identified through microarray techniques) that can help predict prognosis of colorectal cancer, but so far, there have been no firm conclusions that can aid clinical practice. In this study, the researchers used genetic information from a French multicenter study to identify a standard, reproducible molecular classification based on gene expression analysis of colorectal cancer. The authors also assessed whether there were any associations between the identified molecular subtypes and clinical and pathological factors, common DNA alterations, and prognosis.
What Did the Researchers Do and Find?
The researchers used genetic information from a cohort of 750 patients with stage I to IV colorectal cancer who underwent surgery between 1987 and 2007 in seven centers in France. The researchers identified relevant clinical and pathological staging information for each patient from the medical records and calculated recurrence-free survival (the time from surgery to the first recurrence) for patients with stage II or III disease. In the genetic analysis, 566 tumor samples were suitable—443 were used in a discovery set, to create the classification, and the remainder were used in a validation set, to test the classification. The researchers also used information from eight public datasets to validate their findings.
Using these methods, the researchers classified the colon cancer samples into six molecular subtypes (based on gene expression data) and, on further analysis and validation, were able to distinguish the main biological characteristics and deregulated pathways associated with each subtype. Importantly, the researchers found that that these six subtypes were associated with distinct clinical and pathological characteristics, molecular alterations, specific gene expression signatures, and deregulated signaling pathways. In the prognostic analysis based on recurrence-free survival, the researchers found that patients whose tumors were classified in one of two clusters (C4 and C6) had poorer recurrence-free survival than the other patients.
What Do These Findings Mean?
These findings suggest that it is possible to classify colorectal cancer into six robust molecular subtypes that might help identify new prognostic subgroups and could provide a basis for developing robust prognostic genetic signatures for stage II and III colorectal cancer and for identifying specific markers for the different subtypes that might be targets for future drug development. However, as this study was retrospective and did not include some known predictors of colorectal cancer prognosis, such as tumor grade and number of nodes examined, the significance and robustness of the prognostic classification requires further confirmation with large prospective patient cohorts.
Additional Information
Please access these websites via the online version of this summary at
The American Cancer Society provides information about colorectal cancer and also about how colorectal cancer is staged
The US National Cancer Institute also provides information on colon and rectal cancer and colon cancer stages
PMCID: PMC3660251  PMID: 23700391
24.  Efficiency Analysis of Competing Tests for Finding Differentially Expressed Genes in Lung Adenocarcinoma 
Cancer Informatics  2008;6:389-421.
In this study, we introduce and use Efficiency Analysis to compare differences in the apparent internal and external consistency of competing normalization methods and tests for identifying differentially expressed genes. Using publicly available data, two lung adenocarcinoma datasets were analyzed using caGEDA ( to measure the degree of differential expression of genes existing between two populations. The datasets were randomly split into at least two subsets, each analyzed for differentially expressed genes between the two sample groups, and the gene lists compared for overlapping genes. Efficiency Analysis is an intuitive method that compares the differences in the percentage of overlap of genes from two or more data subsets, found by the same test over a range of testing methods. Tests that yield consistent gene lists across independently analyzed splits are preferred to those that yield less consistent inferences. For example, a method that exhibits 50% overlap in the 100 top genes from two studies should be preferred to a method that exhibits 5% overlap in the top 100 genes. The same procedure was performed using all available normalization and transformation methods that are available through caGEDA. The ‘best’ test was then further evaluated using internal cross-validation to estimate generalizable sample classification errors using a Naïve Bayes classification algorithm. A novel test, termed D1 (a derivative of the J5 test) was found to be the most consistent, and to exhibit the lowest overall classification error, and highest sensitivity and specificity. The D1 test relaxes the assumption that few genes are differentially expressed. Efficiency Analysis can be misleading if the tests exhibit a bias in any particular dimension (e.g. expression intensity); we therefore explored intensity-scaled and segmented J5 tests using data in which all genes are scaled to share the same intensity distribution range. Efficiency Analysis correctly predicted the ‘best’ test and normalization method using the Beer dataset and also performed well with the Bhattacharjee dataset based on both efficiency and classification accuracy criteria.
PMCID: PMC2623303  PMID: 19259419
25.  Development and validation of a prognostic model in patients with metastatic renal cell carcinoma treated with sunitinib: a European collaboration 
British Journal of Cancer  2013;109(2):332-341.
Accurate prediction of outcome for metastatic renal cell carcinoma (mRCC) patients receiving targeted therapy is essential. Most of the available models have been developed in patients treated with cytokines, while most of them are fairly complex, including at least five factors. We developed and externally validated a simple model for overall survival (OS) in mRCC. We also studied the recently validated International Database Consortium (IDC) model in our data sets.
The development cohort included 170 mRCC patients treated with sunitinib. The final prognostic model was selected by uni- and multivariate Cox regression analyses. Risk groups were defined by the number of risk factors and by the 25th and 75th percentiles of the model's prognostic index distribution. The model was validated using an independent data set of 266 mRCC patients (validation cohort) treated with the same agent.
Eastern Co-operative Oncology Group (ECOG) performance status (PS), time from diagnosis of RCC and number of metastatic sites were included in the final model. Median OS of patients with 1, 2 and 3 risk factors were: 24.7, 12.8 and 5.9 months, respectively, whereas median OS was not reached for patients with 0 risk factors. Concordance (C) index for internal validation was 0.712, whereas C-index for external validation was 0.634, due to differences in survival especially in poor-risk populations between the two cohorts. Predictive performance of the model was improved after recalibration. Application of the mRCC International Database Consortium (IDC) model resulted in a C-index of 0.574 in the development and 0.576 in the validation cohorts (lower than those recently reported for this model). Predictive ability was also improved after recalibration in this analysis. Risk stratification according to IDC model showed more similar outcomes across the development and validation cohorts compared with our model.
Our model provides a simple prognostic tool in mRCC patients treated with a targeted agent. It had similar performance with the IDC model, which, however, produced more consistent survival results across the development and validation cohorts. The predictive ability of both models was lower than that suggested by internal validation (our model) or recent published data (IDC model), due to differences between observed and predicted survival among intermediate and poor-risk patients. Our results highlight the importance of external validation and the need for further refinement of existing prognostic models.
PMCID: PMC3721408  PMID: 23807171
renal cancer; prognostic model; targeted therapy; sunitinib

Results 1-25 (1217021)