The performance of prediction models can be assessed using a variety of different methods and metrics. Traditional measures for binary and survival outcomes include the Brier score to indicate overall model performance, the concordance (or c) statistic for discriminative ability (or area under the receiver operating characteristic (ROC) curve), and goodness-of-fit statistics for calibration.
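As a minimal sketch of two of the traditional measures named above, the following computes the Brier score and the c statistic for a binary outcome. The function names and the toy data are illustrative assumptions and do not come from the study.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def c_statistic(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Share of (event, non-event) pairs in which the event has the higher
    predicted probability; tied predictions count as half a concordant pair."""
    events = [p for y, p in zip(y_true, p_pred) if y == 1]
    non_events = [p for y, p in zip(y_true, p_pred) if y == 0]
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Toy data (hypothetical): observed outcomes and predicted probabilities.
y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.7]
brier = brier_score(y, p)
c_stat = c_statistic(y, p)
```

For a binary outcome, the c statistic computed this way equals the area under the ROC curve.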
Several new measures have recently been proposed that can be seen as refinements of discrimination measures, including variants of the c statistic for survival, reclassification tables, net reclassification improvement (NRI), and integrated discrimination improvement (IDI). Moreover, decision–analytic measures have been proposed, including decision curves to plot the net benefit achieved by making decisions based on model predictions.
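The net benefit plotted in a decision curve can be sketched as follows, using the standard definition in which false positives are weighted by the odds of the risk threshold. The function name, the "treat if predicted risk is at least the threshold" rule, and the toy data are assumptions made for illustration.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], p_pred: Sequence[float], threshold: float) -> float:
    """Net benefit at one risk threshold: TP/n - FP/n * threshold / (1 - threshold).

    Patients with predicted risk >= threshold are classified as 'treat'.
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, p_pred) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, p_pred) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Toy data (hypothetical).
y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.7]

nb_model = net_benefit(y, p, 0.5)                 # decisions guided by the model
nb_treat_all = net_benefit(y, [1.0] * 4, 0.5)     # reference strategy: treat everyone
```

Repeating the calculation over a range of thresholds, and comparing against the treat-all and treat-none (net benefit 0) strategies, yields the decision curve.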
We aimed to define the role of these relatively novel approaches in the evaluation of the performance of prediction models. For illustration we present a case study of predicting the presence of residual tumor versus benign tissue in patients with testicular cancer (n=544 for model development, n=273 for external validation).
We suggest that reporting discrimination and calibration will always be important for a prediction model. Decision-analytic measures should be reported if the predictive model is to be used for making clinical decisions. Other measures of performance may be warranted in specific applications, such as reclassification metrics to gain insight into the value of adding a novel predictor to an established model.
The study compared actual with predicted survival estimates in advanced stage non-small cell lung cancer patients. Regardless of years of experience, physicians overestimated the survival duration of these patients.
Because most cases of non-small cell lung cancer (NSCLC) are diagnosed at an advanced stage with a poor prognosis, patient inclusion in clinical trials is critical. Most trials require an estimated life expectancy >3 months, based on clinician estimates of patient survival probability, without providing formal guidelines. The aim of this study was to assess the accuracy of clinicians' predictions of survival in NSCLC patients (stages IIIB and IV) and the possible impact of patient quality of life on survival estimation.
At diagnosis, clinical, biological, and quality of life data (QLQ-C30 questionnaire) were recorded, and doctors “forecast” each patient's estimated survival. Concordance between predicted and actual survival was assessed with the intraclass correlation coefficient.
Eighty-five patients with a mean age of 62.2 years, 81.1% male, were included (squamous cell carcinoma, 33; adenocarcinoma, 42; large cell carcinoma, 8; neuroendocrine carcinoma, 2). The mean follow-up was 40 months and median survival time was 11.7 (range, 0.4–143.7) weeks. All clinicians (residents, registrars, and consultants) overestimated patient survival time, with a moderate concordance between predicted and actual survival time. A worse global health status was associated with a lower discrepancy between estimated and actual patient survival, and a worse role functioning was associated with a larger difference between estimated and actual patient survival.
The absence of specific recommendations to estimate patient survival may introduce major selection in clinical studies. Further research should investigate whether the accuracy of patient survival estimates by clinicians would be improved by taking into account patient quality of life.
Advanced stage non-small cell lung cancer; Survival; Quality of life; Prognostic factors; Predictive estimation
The era of personalized medicine for cancer therapeutics has taken an important step forward in making accurate prognoses for individual patients with the adoption of high-throughput microarray technology. However, microarray technology in cancer diagnosis or prognosis has been primarily used for the statistical evaluation of patient populations, and thus excludes inter-individual variability and patient-specific predictions. Here we propose a metric called clinical confidence that serves as a measure of prognostic reliability to facilitate the shift from population-wide to personalized cancer prognosis using microarray-based predictive models. The performance of sample-based models predicted with different clinical confidences was evaluated and compared systematically using three large clinical datasets studying the following cancers: breast cancer, multiple myeloma, and neuroblastoma. Survival curves for patients, with different confidences, were also delineated. The results show that the clinical confidence metric separates patients with different prediction accuracies and survival times. Samples with high clinical confidence were likely to have accurate prognoses from predictive models. Moreover, patients with high clinical confidence would be expected to live for a notably longer or shorter time if their prognosis was good or grim based on the models, respectively. We conclude that clinical confidence could serve as a beneficial metric for personalized cancer prognosis prediction utilizing microarrays. Ascribing a confidence level to prognosis with the clinical confidence metric provides the clinician an objective, personalized basis for decisions, such as choosing the severity of the treatment.
Uncovering the dominant molecular deregulation among the multitude of pathways implicated in aggressive prostate cancer is essential to intelligently developing targeted therapies. Paradoxically, published prostate cancer gene expression signatures of poor prognosis share little overlap and thus do not reveal shared mechanisms. The authors hypothesize that, by analyzing gene signatures with quantitative models of protein–protein interactions, key pathways will be elucidated and shown to be shared.
The authors statistically prioritized common interactors between established cancer genes and genes from each prostate cancer signature of poor prognosis independently via a previously validated single protein analysis of network (SPAN) methodology. Additionally, they computationally identified pathways among the aggregated interactors across signatures and validated them using a similarity metric and patient survival.
Using an information-theoretic metric, the authors assessed the mechanistic similarity of the interactor signature. Its prognostic ability was assessed in an independent cohort of 198 patients with high-Gleason prostate cancer using Kaplan–Meier analysis.
Of the 13 prostate cancer signatures that were evaluated, eight interacted significantly with established cancer genes (false discovery rate <5%) and generated a 42-gene interactor signature that showed the highest mechanistic similarity (p<0.0001). Via parameter-free unsupervised classification, the interactor signature dichotomized the independent prostate cancer cohort with a significant survival difference (p=0.009). Interpretation of the network not only recapitulated phosphatidylinositol-3 kinase/NF-κB signaling, but also highlighted less well established relevant pathways such as the Janus kinase 2 cascade.
SPAN methodology provides a robust means of abstracting disparate prostate cancer gene expression signatures into clinically useful, prioritized pathways as well as useful mechanistic pathways.
Prostate cancer; protein networks; systems biology; information theory; network modeling; Simulation of complex systems (at all levels: molecules to work groups to organizations); knowledge representations; Uncertain reasoning and decision theory; languages and computational methods; statistical analysis of large datasets; advanced algorithms; discovery and text and data mining methods; Natural-language processing; Automated learning; Ontologies
The categorical definition of response assessed via the Response Evaluation Criteria in Solid Tumors has documented limitations. We sought to identify alternative metrics for tumor response that improve prediction of overall survival.
Individual patient data from three North Central Cancer Treatment Group trials (N0026, n=117; N9741, n=1109; N9841, n=332) were used. Continuous metrics of tumor size based on longitudinal tumor measurements were considered in addition to a trichotomized response (TriTR: Response vs. Stable vs. Progression). Cox proportional hazards models, adjusted for treatment arm and baseline tumor burden, were used to assess the impact of the metrics on subsequent overall survival, using a landmark analysis approach at 12-, 16- and 24-weeks post baseline. Model discrimination was evaluated using the concordance (c) index.
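The concordance index used here for model discrimination can be illustrated with a small Harrell-style calculation for right-censored data. This is a simplified sketch, not the trial analysis: it assumes a higher risk score means shorter expected survival, skips pairs with tied times, and uses hypothetical data.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i has an observed event and
    time_i < time_j. It is concordant when risk_i > risk_j; tied risk
    scores count as 0.5. Pairs with tied times are ignored (simplification).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: follow-up time, event indicator (1 = death), risk score.
times = [5, 10, 12, 20]
events = [1, 0, 1, 1]
risk = [0.9, 0.4, 0.2, 0.5]
c_index = harrell_c_index(times, events, risk)
```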
The overall best response rates for the three trials were 26%, 45%, and 25% respectively. While nearly all metrics were statistically significantly associated with overall survival at the different landmark time points, the c-indices for the traditional response metrics ranged from 0.59-0.65; for the continuous metrics from 0.60-0.66 and for the TriTR metrics from 0.64-0.69. The c-indices for TriTR at 12-weeks were comparable to those at 16- and 24-weeks.
Continuous tumor-measurement-based metrics provided no predictive improvement over traditional response based metrics or TriTR; TriTR had better predictive ability than best TriTR or confirmed response. If confirmed, TriTR represents a promising endpoint for future Phase II trials.
continuous; tumor measurement; RECIST; prediction; survival
The majority of current models utilized for predicting toxicity in prostate cancer radiotherapy are based on dose-volume histograms. One of their main drawbacks is the lack of spatial accuracy, since they consider the organs as a whole volume and thus ignore the heterogeneous intra-organ radio-sensitivity. In this paper, we propose a dose-image-based framework to reveal the relationships between local dose and toxicity. In this approach, the three-dimensional (3D) planned dose distributions across a population are non-rigidly registered into a common coordinate system and compared at a voxel level, therefore enabling the identification of 3D anatomical patterns, which may be responsible for toxicity, at least to some extent. Additionally, different metrics were employed in order to assess the quality of the dose mapping. The value of this approach was demonstrated by prospectively analyzing rectal bleeding (≥Grade 1 at 2 years) according to the CTCAE v3.0 classification in a series of 105 patients receiving 80Gy to the prostate by IMRT. Within the patients presenting bleeding, a significant dose excess (6Gy on average, p<0.01) was found in a region of the anterior rectal wall. This region, close to the prostate (1cm), represented less than 10% of the rectum. This promising voxel-wise approach allowed subregions to be defined within the organ that may be involved in toxicity and, as such, must be considered during the inverse IMRT planning step.
Humans; Imaging, Three-Dimensional; methods; Male; Organs at Risk; radiation effects; Prostatic Neoplasms; radiotherapy; Radiation Dosage; Radiotherapy Dosage; Radiotherapy Planning, Computer-Assisted; methods; Rectum; radiation effects; Prostate cancer radiotherapy; rectal bleeding; toxicity prediction; spatial characterization; population analysis; non-rigid registration; predictive models
Survival analysis focuses on modeling and predicting the time to an event of interest. Many statistical models have been proposed for survival analysis. They often impose strong assumptions on hazard functions, which describe how the risk of an event changes over time depending on covariates associated with each individual. In particular, the prevalent proportional hazards model assumes that covariates are multiplicatively related to the hazard. Here we propose a nonparametric model for survival analysis that does not explicitly assume particular forms of hazard functions. Our nonparametric model utilizes an ensemble of regression trees to determine how the hazard function varies according to the associated covariates. The ensemble model is trained using a gradient boosting method to optimize a smoothed approximation of the concordance index, which is one of the most widely used metrics in survival model performance evaluation. We implemented our model in a software package called GBMCI (gradient boosting machine for concordance index) and benchmarked the performance of our model against other popular survival models with a large-scale breast cancer prognosis dataset. Our experiment shows that GBMCI consistently outperforms other methods based on a number of covariate settings. GBMCI is implemented in R and is freely available online.
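To illustrate why a smoothed approximation of the concordance index is useful for gradient-based training, the sketch below replaces the step-like pair comparison with a sigmoid. The abstract does not state which smoothing GBMCI uses, so the sigmoid form, the `sigma` parameter, the risk-score convention (higher score, shorter survival), and the toy data are all assumptions; this is not the GBMCI implementation, which is written in R.

```python
import math
from typing import Sequence


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def smoothed_concordance(times: Sequence[float], events: Sequence[int],
                         risk: Sequence[float], sigma: float = 1.0) -> float:
    """Differentiable surrogate of the concordance index.

    For every comparable pair (subject i has an observed event and
    time_i < time_j), the indicator [risk_i > risk_j] is replaced by
    sigmoid((risk_i - risk_j) / sigma). As sigma shrinks, the surrogate
    approaches the ordinary (non-smooth) concordance index.
    """
    total = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                total += _sigmoid((risk[i] - risk[j]) / sigma)
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return total / comparable


# Hypothetical data: the unsmoothed concordance index for these values is 0.75.
times = [5, 10, 12, 20]
events = [1, 0, 1, 1]
risk = [0.9, 0.4, 0.2, 0.5]
sharp = smoothed_concordance(times, events, risk, sigma=0.01)
soft = smoothed_concordance(times, events, risk, sigma=1.0)
```

Because the surrogate is smooth in the risk scores, its gradient can be used as the target in each boosting iteration, which the step-shaped original index does not allow.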
Clinicians need to predict prognosis of Alzheimer's disease (AD), and researchers need models of progression to develop biomarkers and clinical trials designs. We tested a calculated initial progression rate to see whether it predicted performance on cognition, function and behavior over time, and to see whether it predicted survival.
We used standardized approaches to assess baseline characteristics and to estimate disease duration, and calculated the initial (pre-progression) rate in 597 AD patients followed for up to 15 years. We designated slow, intermediate and rapidly progressing groups. Using mixed effects regression analysis, we examined the predictive value of a pre-progression group for longitudinal performance on standardized measures. We used Cox survival analysis to compare survival time by progression group.
Patients in the slow and intermediate groups maintained better performance on the cognitive (ADAScog and VSAT), global (CDR-SB) and complex activities of daily living measures (IADL) (P values < 0.001 slow versus fast; P values < 0.003 to 0.03 intermediate versus fast). Interaction terms indicated that slopes of ADAScog and PSMS change for the slow group were smaller than for the fast group, and that rates of change on the ADAScog were also slower for the intermediate group, but that CDR-SB rates increased in this group relative to the fast group. Slow progressors survived longer than fast progressors (P = 0.024).
A simple, calculated progression rate at the initial visit gives reliable information regarding performance over time on cognition, global performance and activities of daily living. The slowest progression group also survives longer. This baseline measure should be considered in the design of long duration Alzheimer's disease clinical trials.
Spinal metastasis occurs in up to 40% of cancer patients. We compared the Tokuhashi and Tomita scoring systems, two commonly used scoring systems for prognosis in spinal metastases. We also assessed the different variables separately with respect to their value in predicting postsurgical life expectancy. Finally, we suggest criteria for selecting patients for surgery based on the postoperative survival pattern.
Materials and Methods:
We retrospectively analyzed 102 patients who had undergone surgery for metastatic disease of the spine. Predictive scoring was done according to the scoring systems proposed by Tokuhashi and Tomita. Overall survival was assessed using Kaplan–Meier survival analysis. Using the log rank test and Cox regression model, we assessed the value of the individual components of each scoring system for predicting survival in these patients.
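The Kaplan–Meier analysis mentioned above rests on the product-limit estimator, which can be sketched in a few lines. The function name and the follow-up data below are hypothetical and serve only to show the calculation.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of the survival function.

    Returns (time, survival probability) at each distinct time with at
    least one observed event. events[i] is 1 for death, 0 for censoring.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up in months; 0 marks a censored observation.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

With these values, the estimated survival drops to 0.8 at 2 months, 0.6 at 3 months, and 0.3 at 5 months; the censored patients reduce the number at risk without causing a drop.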
The factors that were most significantly associated with survival were the general condition score (Karnofsky Performance Scale) (P=.000, log rank test), metastasis to internal organs (P=.0002, log rank test), and number of extraspinal bone metastases (P=.0058). Type of primary tumor was not found to be significantly associated with survival according to the revised Tokuhashi scoring system (P=.9131, log rank test). Stepwise logistic regression revealed that the Tomita score correlated more closely with survival than the Tokuhashi score.
The patient's performance status, extent of visceral metastasis, and extent of bone metastases are significant predictors of survival in patients with metastatic disease. Both revised Tokuhashi and Tomita scores were significantly correlated with survival. A revised Tokuhashi score of 7 or more and a Tomita score of 6 or less indicated >50% chance of surviving 6 months postoperatively. We recommend that the Tomita score be used for prognostication in patients who are contemplating surgery, as it is simpler to score and has a higher strength of correlation with survival than the Tokuhashi score.
Spine metastasis; prognostic predictors; survival
Several methodological approaches have been used to estimate distance in health service research. In this study, focusing on cardiac catheterization services, Euclidean, Manhattan, and the less widely known Minkowski distance metrics are used to estimate distances from patient residence to hospital. Distance metrics typically produce less accurate estimates than actual measurements, but each metric provides a single model of travel over a given network. Therefore, distance metrics, unlike actual measurements, can be directly used in spatial analytical modeling. Euclidean distance is most often used, but it is unlikely to be the most appropriate metric. Minkowski distance is a more promising method. Distances estimated with each metric are contrasted with road distance and travel time measurements, and an optimized Minkowski distance is implemented in spatial analytical modeling.
Road distance and travel time are calculated from the postal code of residence of each patient undergoing cardiac catheterization to the pertinent hospital. The Minkowski metric is optimized, to approximate travel time and road distance, respectively. Distance estimates and distance measurements are then compared using descriptive statistics and visual mapping methods. The optimized Minkowski metric is implemented, via the spatial weight matrix, in a spatial regression model identifying socio-economic factors significantly associated with cardiac catheterization.
The Minkowski coefficient that best approximates road distance is 1.54; 1.31 best approximates travel time. The latter is also a good predictor of road distance, thus providing the best single model of travel from patient's residence to hospital. The Euclidean metric and the optimal Minkowski metric are alternatively implemented in the regression model, and the results compared. The Minkowski method produces more reliable results than the traditional Euclidean metric.
Road distance and travel time measurements are the most accurate estimates, but cannot be directly implemented in spatial analytical modeling. Euclidean distance tends to underestimate road distance and travel time; Manhattan distance tends to overestimate both. The optimized Minkowski distance partially overcomes their shortcomings; it provides a single model of travel over the network. The method is flexible, suitable for analytical modeling, and more accurate than the traditional metrics; its use ultimately increases the reliability of spatial analytical models.
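The relationship between the three metrics can be made concrete with a short sketch. The Minkowski distance with coefficient p reduces to Manhattan distance at p = 1 and Euclidean distance at p = 2, so the coefficients reported above (1.54 and 1.31) fall between the two. The function name and coordinates are hypothetical, and the points are assumed to be in a projected planar coordinate system (for example, kilometres east and north).

```python
from typing import Tuple

Point = Tuple[float, float]


def minkowski_distance(a: Point, b: Point, p: float) -> float:
    """Minkowski distance (|dx|^p + |dy|^p)^(1/p) between two planar points."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return (dx ** p + dy ** p) ** (1.0 / p)


# Hypothetical residence and hospital locations.
home, hospital = (0.0, 0.0), (3.0, 4.0)

manhattan = minkowski_distance(home, hospital, 1.0)    # p = 1
euclidean = minkowski_distance(home, hospital, 2.0)    # p = 2
road_like = minkowski_distance(home, hospital, 1.54)   # coefficient reported for road distance
time_like = minkowski_distance(home, hospital, 1.31)   # coefficient reported for travel time
```

Because the distance decreases as p increases, any coefficient between 1 and 2 yields an estimate longer than the Euclidean distance and shorter than the Manhattan distance, which matches the under- and overestimation pattern described above.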
Detection and analysis of epileptic seizures is of clinical and research interest. We propose a novel seizure detection and analysis scheme based on the phase-slope index (PSI) of directed influence applied to multichannel electrocorticogram data. The PSI metric identifies increases in the spatio–temporal interactions between channels that clearly distinguish seizure from interictal activity. We form a global metric of interaction between channels and compare this metric to a threshold to detect the presence of seizures. The threshold is chosen based on a moving average of recent activity to accommodate differences between patients and slow changes within each patient over time. We evaluate detection performance over a challenging population of five patients with different types of epilepsy using a total of 47 seizures in nearly 258 h of recorded data. Using a common threshold procedure, we show that our approach detects all of the seizures in four of the five patients with a false detection rate less than two per hour. A variation on the global metric is proposed to identify which channels are strong drivers of activity in each patient. These metrics are computationally efficient and suitable for real-time application.
Epilepsy; multichannel electrocorticogram (ECoG); phase-slope index (PSI); seizure detection; seizure evolution
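The adaptive thresholding idea in the abstract above, comparing a global metric with a threshold derived from a moving average of recent activity, might look like the following. The abstract does not give the exact rule, so the window length, the multiplicative `factor`, and the choice to exclude flagged samples from the baseline are all assumptions. The input is a generic one-dimensional series and no PSI computation is attempted.

```python
from collections import deque
from typing import List, Sequence


def detect_events(metric: Sequence[float], window: int = 5,
                  factor: float = 3.0) -> List[bool]:
    """Flag samples that exceed `factor` times the moving average of the
    most recent `window` non-flagged samples."""
    recent = deque(maxlen=window)
    flags = []
    for value in metric:
        if len(recent) == window:
            threshold = factor * (sum(recent) / window)
            flagged = value > threshold
        else:
            flagged = False  # not enough history to form a baseline yet
        flags.append(flagged)
        if not flagged:
            # Assumption: flagged samples do not update the baseline, so a
            # long event cannot raise its own threshold.
            recent.append(value)
    return flags


# Hypothetical global interaction metric with a burst at samples 6 and 7.
series = [1.0, 1.1, 0.9, 1.0, 1.0, 1.05, 5.0, 6.0, 1.0]
flags = detect_events(series)
```

A baseline that tracks recent activity lets the same procedure adapt to differences between patients and to slow drifts within a recording.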
Prediction of prognosis is important for patients so that they can make the most of the rest of their lives. Oncologists can predict survival, but the accuracy of such predictions is unclear.
In this observational prospective cohort study, 14 oncologists treating 9 major adult solid malignancies were asked to complete questionnaires predicting survival based on performance status, oral intake, and other clinical factors when patients experienced progressive disease after standard chemotherapies. Clinically predicted survival (cps) was calculated by the oncologists from the date of progressive disease to the predicted date of death. Actual survival (as) was compared with cps using Kaplan–Meier survival curves, and factors affecting inaccurate prediction were determined by logistic regression analysis. The prediction of survival time was considered accurate when the cps/as ratio was between 0.67 and 1.33.
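The accuracy criterion described above can be expressed as a small classifier on the cps/as ratio. The function name and example values are hypothetical, and treating the bounds 0.67 and 1.33 as inclusive is an assumption.

```python
def classify_prediction(cps_days: float, as_days: float,
                        low: float = 0.67, high: float = 1.33) -> str:
    """Classify a survival prediction by the ratio of clinically predicted
    survival (cps) to actual survival (as), both in days."""
    if as_days <= 0:
        raise ValueError("actual survival must be positive")
    ratio = cps_days / as_days
    if ratio < low:
        return "underestimated"
    if ratio > high:
        return "overestimated"
    return "accurate"


# Hypothetical patients: (predicted days, actual days).
labels = [classify_prediction(c, a) for c, a in [(120, 121), (180, 60), (30, 90)]]
```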
The study cohort consisted of 75 patients. Median cps was 120 days (interquartile range: 60–180 days), and median as was 121 days (interquartile range: 40–234 days). The participating oncologists accurately predicted as within a 33% range 36% of the time; the survival time was overestimated 36% of the time and underestimated 28% of the time. The factors affecting the accuracy of the survival estimate were the experience of the oncologist, patient age, and information given about the palliative care unit.
Prediction of cps was accurate for just slightly more than one third of all patients in this study. Additional investigation of putative prognostic factors with a larger sample size is warranted.
Survival prediction; cancer patient survival; chemotherapy
In Asia, up to 25% of breast cancer patients present with distant metastases at diagnosis. Given the heterogeneous survival probabilities of de novo metastatic breast cancer, individual outcome prediction is challenging. The aim of the study is to identify existing prognostic models for patients with de novo metastatic breast cancer and validate them in Asia.
Materials and Methods
We performed a systematic review to identify prediction models for metastatic breast cancer. Models were validated in 642 women with de novo metastatic breast cancer registered between 2000 and 2010 in the Singapore Malaysia Hospital Based Breast Cancer Registry. Survival curves for low, intermediate and high-risk groups according to each prognostic score were compared by log-rank test and discrimination of the models was assessed by concordance statistic (C-statistic).
We identified 16 prediction models, seven of which were for patients with brain metastases only. Performance status, estrogen receptor status, metastatic site(s) and disease-free interval were the most common predictors. We were able to validate nine prediction models. The capacity of the models to discriminate between poor and good survivors varied from poor to fair with C-statistics ranging from 0.50 (95% CI, 0.48–0.53) to 0.63 (95% CI, 0.60–0.66).
The discriminatory performance of existing prediction models for de novo metastatic breast cancer in Asia is modest. Development of an Asian-specific prediction model is needed to improve prognostication and guide decision making.
The meta-analytic approach to evaluating surrogate end points assesses the predictiveness of treatment effect on the surrogate toward treatment effect on the clinical end point based on multiple clinical trials. Definition and estimation of the correlation of treatment effects were developed in linear mixed models and later extended to binary or failure time outcomes on a case-by-case basis. In a general regression setting that covers nonnormal outcomes, we discuss in this paper several metrics that are useful in the meta-analytic evaluation of surrogacy. We propose a unified 3-step procedure to assess these metrics in settings with binary end points, time-to-event outcomes, or repeated measures. First, the joint distribution of estimated treatment effects is ascertained by an estimating equation approach; second, the restricted maximum likelihood method is used to estimate the means and the variance components of the random treatment effects; finally, confidence intervals are constructed by a parametric bootstrap procedure. The proposed method is evaluated by simulations and applications to 2 clinical trials.
Causal inference; Meta-analysis; Surrogacy
Many approaches have been taken to adjust for smoking in modeling cancer risk. In case-control studies, these metrics are often used arbitrarily rather than being based on the properties of the metric in the context of the study. Depending on the underlying study design, hypotheses and base population, different metrics may be deemed most appropriate. We present our approach to evaluating different smoking metrics. We examine the properties of a new metric, “logcig-years”, that we initially derived from utilizing a biological model of DNA adduct formation. We compare this metric to three other smoking metrics, namely pack-years, square-root pack-years, and a model in which smoking duration and intensity are separate variables. Our comparisons use generalized additive models and logistic regression to examine the relationship between the logit probability of cancer and each of the metrics, while adjusting for other covariates. All models were fit using data from a lung cancer study of 1275 cases and 1269 controls that has focused on gene-smoking relationships. There was a very significant, linear relationship between logcig-years and the logit probability of lung cancer in this sample, without any need to adjust for smoking status. These properties together were not shared by the other metrics. In this sample, logcig-years captured more information about smoking that is important in lung cancer risk than the other metrics. In conclusion, we provide a general framework for evaluating different smoking metrics in studies where smoking is a critical variable.
This study was conducted to assess the treatment efficacy of radiotherapy (RT) and other therapeutic modalities compared with palliative care only in patients with advanced hepatocellular carcinoma (HCC).
From 2002 to 2010, we reviewed 47 patients with advanced HCC and recorded each patient's Child-Pugh class, ECOG performance status, serum alpha-fetoprotein level, and other baseline characteristics considered to be predictive of HCC prognosis. The 29 patients who had received RT formed one group, and the 18 patients who had received only palliative care formed the other. Survival was compared between the two groups to investigate the efficacy of RT.
In the survival analysis, the mean survival time was reported as between 30.1 months and 45.9 months in the RT group, whereas it was 4.8 months in the palliative care group. In the univariate analysis of all patients, the following factors significantly affected survival: ECOG performance status, Child-Pugh class, tumor size, tumor type, alpha-fetoprotein, transarterial chemoembolization, and RT. In the multivariate Cox regression analysis of all patients, the absence of radiotherapy and a higher Child-Pugh class were independent predictors of worse overall survival. In contrast, in the subset analysis of the 29 patients treated with radiotherapy, a higher serum alpha-fetoprotein level was an independent predictor of worse overall survival.
We found that the survival of patients with advanced HCC was better with radiotherapy than with palliative care. Therefore, radiotherapy could be a good option for patients with advanced HCC.
Hepatocellular carcinoma; Radiotherapy; Survival rate; Alpha-fetoprotein; Child-Pugh class
Breast cancer is the most common malignancy in women and is responsible for hundreds of thousands of deaths annually. As with most cancers, it is a heterogeneous disease and different breast cancer subtypes are treated differently. Understanding the difference in prognosis for breast cancer based on its molecular and phenotypic features is one avenue for improving treatment by matching the proper treatment with molecular subtypes of the disease. In this work, we employed a competition-based approach to modeling breast cancer prognosis using large datasets containing genomic and clinical information and an online real-time leaderboard program used to speed feedback to the modeling team and to encourage each modeler to work towards achieving a higher ranked submission. We find that machine learning methods combined with molecular features selected based on expert prior knowledge can improve survival predictions compared to current best-in-class methodologies and that ensemble models trained across multiple user submissions systematically outperform individual models within the ensemble. We also find that model scores are highly consistent across multiple independent evaluations. This study serves as the pilot phase of a much larger competition open to the whole research community, with the goal of understanding general strategies for model optimization using clinical and molecular profiling data and providing an objective, transparent system for assessing prognostic models.
We developed an extensible software framework for sharing molecular prognostic models of breast cancer survival in a transparent collaborative environment and subjecting each model to automated evaluation using objective metrics. The computational framework presented in this study, our detailed post-hoc analysis of hundreds of modeling approaches, and the use of a novel cutting-edge data resource together represent one of the largest-scale systematic studies to date assessing the factors influencing accuracy of molecular-based prognostic models in breast cancer. Our results demonstrate the ability to infer prognostic models with accuracy on par with or greater than previously reported studies, with significant performance improvements by using state-of-the-art machine learning approaches trained on clinical covariates. Our results also demonstrate the difficulty in incorporating molecular data to achieve substantial performance improvements over clinical covariates alone. However, improvement was achieved by combining clinical feature data with intelligent selection of important molecular features based on domain-specific prior knowledge. We observe that ensemble models aggregating the information across many diverse models achieve among the highest scores of all models and systematically out-perform individual models within the ensemble, suggesting a general strategy for leveraging the wisdom of crowds to develop robust predictive models.
There are dilemmas associated with the diagnosis and prognosis of prostate cancer, which have led to overdiagnosis and overtreatment. Prediction tools have been developed to assist the treatment of the disease.
A retrospective review was performed of the Irish Prostate Cancer Research Consortium database and 603 patients were used in the study. Statistical models based on routinely used clinical variables were built using logistic regression, random forests and k nearest neighbours to predict prostate cancer stage. The predictive ability of the models was examined using discrimination metrics, calibration curves and clinical relevance, explored using decision curve analysis. The 2007 Partin table was then applied to the 603 patients to compare the predictions from the current gold standard in staging prediction with those of the models developed in this study.
Thirty percent of the study cohort had non-organ-confined disease. The model built using logistic regression showed the highest discrimination metrics (AUC = 0.622, Sens = 0.647, Spec = 0.601), the best calibration and the most clinical relevance based on decision curve analysis. This model also achieved higher discrimination than the 2007 Partin table (ECE AUC = 0.572 & 0.509 for T1c and T2a respectively). However, even the best statistical model does not accurately predict prostate cancer stage.
This study has illustrated the inability of the current clinical variables and the 2007 Partin table to accurately predict prostate cancer stage. New biomarker features are urgently required to address the problem clinicians face in identifying the most appropriate treatment for their patients. This paper also demonstrated a concise methodological approach to evaluate novel features or prediction models.
Prediction models; Model evaluation; Discrimination; Calibration; Prostate cancer
Currently used treatment response criteria in multiple myeloma (MM) are based in part on serum monoclonal protein (M-protein) measurements. A drawback of these criteria is that response is determined solely by the best level of M-protein reduction, without considering the serial trend. The authors hypothesized that metrics incorporating the serial trend of M-protein would be better predictors of progression-free survival (PFS).
Fifty-five patients with measurable disease at baseline (M-protein ≥1 g/dL) who received ≥4 cycles of treatment from 2 clinical trials in previously untreated MM were included. Three metrics based on the percentage of M-protein remaining relative to baseline (residual M-protein) were considered: metrics based on the number of times residual M-protein fell within prespecified thresholds, metrics based on area under the residual M-protein curve, and metrics based on the average residual M-protein reduction between Cycles 1 and 4. The predictive value of these metrics was assessed in Cox models using landmark analysis.
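The three families of trend metrics can be sketched in a few lines. The abstract does not give exact formulas, so the definitions below are assumptions: the threshold metric counts cycles at or below a cutoff, the area uses the trapezoid rule with one unit between cycles, and the average reduction is the mean per-cycle drop in residual M-protein between the first and last cycle supplied. All names are hypothetical.

```python
def residual_percent(m_protein, baseline):
    """Percentage of M-protein remaining relative to baseline, per cycle."""
    return [100.0 * value / baseline for value in m_protein]


def count_at_or_below(residual, cutoff):
    """Assumed threshold metric: number of cycles with residual <= cutoff."""
    return sum(1 for r in residual if r <= cutoff)


def area_under_residual_curve(residual):
    """Trapezoid rule with unit spacing between consecutive cycles."""
    return sum((a + b) / 2 for a, b in zip(residual, residual[1:]))


def average_reduction(residual):
    """Assumed definition: mean per-cycle drop from first to last cycle."""
    return (residual[0] - residual[-1]) / (len(residual) - 1)


# Toy patient (values invented): baseline 4.0 g/dL, then Cycles 1 to 4.
toy_residual = residual_percent([3.0, 2.0, 1.0, 1.0], baseline=4.0)
```

Unlike best response, each of these summaries changes when the same final level is reached faster or slower, which is the property the authors set out to test.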
The average residual M-protein reduction was found to be significantly predictive of PFS (P = .02; hazard ratio, 0.37), in which a patient with a 10% lower average residual M-protein reduction from Cycle 1 to 4 was estimated to be at least 2.7× more likely to develop disease progression or die early. None of the other metrics was predictive of PFS. The concordance index for the average residual M-protein reduction was 0.63, compared with 0.56 for best response.
The average residual M-protein reduction metric is promising and needs further validation. This exploratory analysis is the first step in the search for treatment-based trend metrics predictive of outcomes in MM.
multiple myeloma; prediction; progression-free survival; response; serum monoclonal protein
To investigate the relative predictive value of CD4+ metrics for serious clinical endpoints.
Patients (3012; 20317 person-years) from control arms of ESPRIT and SILCAAT trials were followed prospectively. We used Cox regression to identify CD4+ metrics (latest, baseline and nadir CD4+ count, latest CD4+%, time spent with CD4+ count below certain thresholds and CD4+ slopes) independently predictive of (i) all-cause mortality; (ii) non-AIDS deaths; (iii) non-AIDS events (cardiovascular, hepatic, renal and non-AIDS malignancy); and (iv) AIDS events. The Akaike Information Criterion (AIC) was calculated for each model. Significant metrics (p<0.05) were then additionally adjusted for latest CD4+ count.
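A CD4+ slope over a window of visits can be computed as an ordinary least-squares slope of count against time. The sketch below assumes that approach (the abstract does not say how slopes were estimated) and groups the result using a band of 10 cells/mm3/month around zero; the function names and category labels are hypothetical.

```python
def cd4_slope(months, counts):
    """Least-squares slope of CD4+ count (cells/mm3) per month."""
    n = len(months)
    mean_t = sum(months) / n
    mean_c = sum(counts) / n
    sxx = sum((t - mean_t) ** 2 for t in months)
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in zip(months, counts))
    return sxy / sxx


def slope_category(slope, band=10.0):
    """Assumed grouping: below -band, within +/- band, or above +band."""
    if slope < -band:
        return "declining"
    if slope > band:
        return "rising"
    return "stable"


# Toy series (values invented): seven visits, four months apart.
toy_months = [0, 4, 8, 12, 16, 20, 24]
toy_counts = [600, 550, 500, 450, 400, 350, 300]
toy_slope = cd4_slope(toy_months, toy_counts)
```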
Non-AIDS deaths occurred at a higher rate than AIDS deaths (rate-ratio: 6.48, 95%CI: 5.1–8.1), as did non-AIDS events compared with AIDS events (rate-ratio: 1.72, 95%CI: 1.65–1.79). Latest CD4+ count was strongly predictive of lower risk of death (HR per log2 rise: 0.48, 95%CI: 0.43–0.54), with the lowest AIC of all metrics. CD4+ slope over 7 visits, after additional adjustment for latest CD4+ count, was the only metric to be an independent predictor of all-cause (HR for slope <-10/mm3/month vs. 0±10: 3.04, 95%CI: 1.98–4.67) and non-AIDS deaths (HR for slope <-10/mm3/month vs. 0±10: 2.62, 95%CI: 1.62–4.22). Latest CD4+ count (per log2 rise) was the best predictor across all endpoints (i–iv) and predicted hepatic (HR: 0.46, 95%CI: 0.33–0.63) and renal events (HR: 0.39, 95%CI: 0.21–0.70), but not cardiovascular events (HR: 1.05, 95%CI: 0.77–1.43) or non-AIDS cancers (HR: 0.78, 95%CI: 0.59–1.03).
Latest CD4+ count is the best predictor of serious endpoints. CD4+ slope independently predicts all-cause and non-AIDS deaths.
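A hazard ratio reported per log2 rise applies to each doubling of the CD4+ count, so the relative hazard between two counts is the ratio raised to the number of doublings separating them. The sketch below uses the reported point estimate for all-cause death (0.48) as a default; the function name is hypothetical, and the calculation ignores the confidence interval and any other covariates in the model.

```python
import math


def relative_hazard(count_from, count_to, hr_per_log2=0.48):
    """Hazard at count_to relative to count_from, given a per-doubling HR."""
    doublings = math.log2(count_to / count_from)
    return hr_per_log2 ** doublings


# Two doublings (for example 200 to 800 cells/mm3) give 0.48 squared.
two_doublings = relative_hazard(200, 800)
```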
CD4+; CD4+ counts; serious non-AIDS events; immunodeficiency; AIDS
Although molecular prognostics in breast cancer are among the most successful examples of translating genomic analysis to clinical applications, optimal approaches to breast cancer clinical risk prediction remain controversial. The Sage Bionetworks–DREAM Breast Cancer Prognosis Challenge (BCC) is a crowdsourced research study for breast cancer prognostic modeling using genome-scale data. The BCC provided a community of data analysts with a common platform for data access and blinded evaluation of model accuracy in predicting breast cancer survival on the basis of gene expression data, copy number data, and clinical covariates. This approach offered the opportunity to assess whether a crowdsourced community Challenge would generate models of breast cancer prognosis commensurate with or exceeding current best-in-class approaches. The BCC comprised multiple rounds of blinded evaluations on held-out portions of data on 1981 patients, resulting in more than 1400 models submitted as open source code. Participants then retrained their models on the full data set of 1981 samples and submitted up to five models for validation in a newly generated data set of 184 breast cancer patients. Analysis of the BCC results suggests that the best-performing modeling strategy outperformed previously reported methods in blinded evaluations; model performance was consistent across several independent evaluations; and aggregating community-developed models achieved performance on par with the best-performing individual models.
Many prognostic models for cancer use biomarkers that have utility in early detection. For example, in prostate cancer, models predicting disease-specific survival use serum prostate-specific antigen levels. These models typically show that higher marker levels are associated with poorer prognosis. Consequently, they are often interpreted as indicating that detecting disease at a lower threshold of the biomarker is likely to generate a survival benefit. However, lowering the threshold of the biomarker is tantamount to early detection. For survival benefit to not be simply an artifact of starting the survival clock earlier, we must account for the lead time of early detection. It is not known whether the existing prognostic models imply a survival benefit under early detection once lead time has been accounted for. In this article, we investigate survival benefit implied by prognostic models where the predictor(s) of disease-specific survival are age and/or biomarker level at disease detection. We show that the benefit depends on the rate of biomarker change, the lead time, and the biomarker level at the original date of diagnosis as well as on the parameters of the prognostic model. Even if the prognostic model indicates that lowering the threshold of the biomarker is associated with longer disease-specific survival, this does not necessarily imply that early detection will confer an extension of life expectancy.
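The dependence on lead time and biomarker growth can be made concrete with a deliberately simple toy model. Everything here is an assumption for illustration and is not the authors' derivation: survival after detection is exponential with hazard proportional to the biomarker raised to a power `beta` (a proportional hazards model on the log biomarker), and the biomarker grows exponentially before diagnosis at rate `growth_rate`. All names and parameter values are hypothetical.

```python
import math


def mean_survival(marker, baseline_hazard, beta):
    """Mean survival from detection under a constant hazard
    baseline_hazard * marker**beta (exponential survival assumed)."""
    return 1.0 / (baseline_hazard * marker ** beta)


def implied_life_extension(marker_at_dx, growth_rate, lead_time,
                           baseline_hazard, beta):
    """Predicted survival from early detection, minus the lead time,
    minus predicted survival from the original diagnosis date."""
    # Biomarker level `lead_time` years before the original diagnosis.
    marker_early = marker_at_dx * math.exp(-growth_rate * lead_time)
    from_early = mean_survival(marker_early, baseline_hazard, beta)
    from_original = mean_survival(marker_at_dx, baseline_hazard, beta)
    return from_early - lead_time - from_original


# Same model and lead time; only the biomarker growth rate differs.
fast_growth = implied_life_extension(10.0, 0.20, 3.0, 0.02, 0.5)
slow_growth = implied_life_extension(10.0, 0.05, 3.0, 0.02, 0.5)
```

In this toy setting the fast-growing case yields a positive implied extension and the slow-growing case a negative one, even though the prognostic model assigns longer survival to lower biomarker levels in both.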
Disease-specific survival; Early Detection; Proportional hazards model
A better understanding of patients’ views on the benefit and burden obtained from palliative chemotherapy would facilitate shared decision making. We evaluated palliative cancer patients’ reported outcomes (PROs) for toxicity and investigated the survival threshold for which they would repeat chemotherapy (CTx).
Patients who had received a minimum of three months of palliative CTx for advanced colorectal (CRC) or non-colorectal (non-CRC: upper gastrointestinal, lung and head-and-neck) cancer were assessed by questionnaire. Patients were questioned about PROs for toxicity and subjective burden from side effects, and were asked for the survival threshold necessary for them to repeat CTx. Expected survival (sum of indicated survival threshold and median survival time with best supportive care) was compared to the patients’ actual survival.
One hundred and thirty-four patients (CRC: 58; non-CRC: 76) were surveyed. The most frequent PRO grade 3/4 toxicities were acne (12.8%), fatigue (9.0%), and diarrhea (8.5%). The symptom causing the highest subjective burden was fatigue and was worse than expected in 29.9% of the patients. The median survival threshold for which patients would repeat CTx was significantly longer in CRC than in non-CRC patients (p=0.01). Median expected survival was significantly longer than actual median survival (CRC: 44.0 months [22.0-65.9] compared with 30.0 months of actual survival [20.9-39.1]; non-CRC: 22.0 months [15.3-28.6] compared with 19.0 months of actual survival [15.1-22.9], p=0.03).
Fatigue deserves more attention when toxicity of treatment and symptoms of disease are explained to patients. Patients’ survival expectations from palliative chemotherapy are higher than previously described, exceed the median survival time known from phase III trials, and are significantly longer than their actual survival.
Chemotherapy; Palliative care; Survival threshold; Fatigue
The objectives of this study were to (1) systematically review the reporting and methods used in the development of clinical prediction models for recurrent stroke or myocardial infarction (MI) after ischemic stroke; (2) meta-analyze their external performance; and (3) compare clinical prediction models to informal clinicians’ prediction in the Edinburgh Stroke Study (ESS).
We searched Medline, EMBASE, reference lists and forward citations of relevant articles from 1980 to 19 April 2013. We included articles which developed multivariable clinical prediction models for the prediction of recurrent stroke and/or MI following ischemic stroke. We extracted information to assess aspects of model development as well as metrics of performance to determine predictive ability. Model quality was assessed against a pre-defined set of criteria. We used random-effects meta-analysis to pool performance metrics.
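Random-effects pooling of a performance metric can be sketched with the DerSimonian-Laird estimator. The abstract does not name the estimator or the scale on which the metric was pooled, so both are assumptions here (the sketch pools on the raw scale), and the function name is hypothetical.

```python
import math


def dersimonian_laird(estimates, std_errors):
    """Pool study estimates with a DerSimonian-Laird random-effects model.

    Returns (pooled estimate, 95% confidence interval, tau squared).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / total_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c)  # between-study variance, floored at zero
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    se_pooled = math.sqrt(1.0 / sum(re_weights))
    interval = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return pooled, interval, tau2
```

When the studies agree closely, the between-study variance is estimated as zero and the result coincides with the fixed-effect inverse-variance estimate.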
We identified twelve model development studies and eleven evaluation studies. Investigators often did not report effective sample size, regression coefficients, or handling of missing data; typically categorized continuous predictors; and used data-dependent methods to build models. A meta-analysis of the area under the receiver operating characteristic curve (AUROCC) was possible for the Essen Stroke Risk Score (ESRS) and for the Stroke Prognosis Instrument II (SPI-II); the pooled AUROCCs were 0.60 (95% CI 0.59 to 0.62) and 0.62 (95% CI 0.60 to 0.64), respectively. An evaluation among minor stroke patients in the ESS demonstrated that clinicians discriminated poorly between those with and those without recurrent events and that this was similar to clinical prediction models.
The available models for recurrent stroke discriminate poorly between patients with and without a recurrent stroke or MI after stroke. Models had a similar discrimination to informal clinicians' predictions. Formal prediction may be improved by addressing commonly encountered methodological problems.
Systematic review; Meta-analysis; Stroke; Prediction; Statistical modelling; Evaluation; Development
We investigated whether intracranial pressure (ICP) pulse morphological metrics could be used to realize continuous detection of low cerebral blood flow (CBF). Sixty-three acutely brain-injured patients with ICP monitoring, daily 133Xenon CBF, and daily Transcranial Doppler (TCD) assessment were studied. Their ICP recordings were time-aligned with the CBF and TCD measurements so that a one-hour ICP segment near each CBF and TCD measurement was obtained. Each of these recordings was processed by the Morphological Cluster and Analysis of Intracranial Pressure (MOCAIP) algorithm to extract pulse morphological metrics. The Differential Evolution algorithm was then used to find the combination of metrics that, using regularized linear discriminant analysis, provided the largest combined positive predictivity and sensitivity. At a CBF threshold of 20 ml/min/100g, a sensitivity of 81.8 ± 0.9% and specificity of 50.1 ± 0.2% were obtained using the optimal combination of conventional TCD and blood analysis metrics as input to a regularized linear classifier. However, the optimal combination of the MOCAIP metrics alone achieved a sensitivity of 92.5 ± 0.7% and specificity of 84.8 ± 0.8%. Searching for the optimal combination of all available metrics achieved the best result, which was marginally better than that from using MOCAIP alone. This study demonstrated that the role of ICP monitoring may be extended to provide an indicator of low global cerebral blood perfusion.
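The quantities reported above, and the objective used to select metric combinations, can be computed from a confusion matrix as sketched below. How positive predictivity and sensitivity were "combined" is not stated, so the plain sum used here is an assumption; the function name is hypothetical, and the search over metric subsets itself is not reproduced.

```python
def detection_scores(y_true, y_pred):
    """Sensitivity, specificity and positive predictivity for binary labels
    (1 = low cerebral blood flow, 0 = otherwise)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        # Assumed form of the selection objective: an unweighted sum.
        "objective": ppv + sensitivity,
    }
```

A subset search would evaluate this objective on held-out predictions for each candidate combination of metrics and keep the combination with the highest value.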
cerebral blood flow; intracranial pressure; machine learning; cerebral ischemia; morphology; brain injury