The performance of prediction models can be assessed using a variety of different methods and metrics. Traditional measures for binary and survival outcomes include the Brier score to indicate overall model performance, the concordance (or c) statistic for discriminative ability (or area under the receiver operating characteristic (ROC) curve), and goodness-of-fit statistics for calibration.
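As an illustration of two of the traditional measures named above, the following minimal sketch computes the Brier score and the c statistic for a binary outcome. The function names and the toy data are assumptions made for this example and do not come from the case study.

```python
from itertools import product


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def c_statistic(y_true, y_prob):
    """Share of (event, non-event) pairs in which the event received the
    higher predicted probability; tied predictions count as one half."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p_event, p_non_event in product(events, non_events):
        if p_event > p_non_event:
            score += 1.0
        elif p_event == p_non_event:
            score += 0.5
    return score / (len(events) * len(non_events))


# Toy data (hypothetical): observed outcomes and predicted probabilities.
outcomes = [1, 0, 1, 0]
predictions = [0.9, 0.2, 0.4, 0.6]
print(brier_score(outcomes, predictions))
print(c_statistic(outcomes, predictions))
```

For a binary outcome the c statistic computed this way equals the area under the ROC curve; a lower Brier score indicates better overall performance.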
Several new measures have recently been proposed that can be seen as refinements of discrimination measures, including variants of the c statistic for survival, reclassification tables, net reclassification improvement (NRI), and integrated discrimination improvement (IDI). Moreover, decision–analytic measures have been proposed, including decision curves to plot the net benefit achieved by making decisions based on model predictions.
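The net benefit plotted in a decision curve can be sketched as follows, using the commonly cited definition: true positives per patient minus false positives per patient, with false positives weighted by the odds of the decision threshold. The function names and the toy data are assumptions for illustration only.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating every patient whose predicted risk is at
    least `threshold` (a probability strictly between 0 and 1)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # A false positive is weighted by the odds of the threshold probability.
    weight = threshold / (1.0 - threshold)
    return true_pos / n - (false_pos / n) * weight


def decision_curve(y_true, y_prob, thresholds):
    """Net benefit at each threshold: the points of a decision curve."""
    return [(t, net_benefit(y_true, y_prob, t)) for t in thresholds]


# Toy data (hypothetical).
outcomes = [1, 0, 1, 0, 0]
predictions = [0.8, 0.3, 0.6, 0.7, 0.1]
print(net_benefit(outcomes, predictions, 0.5))         # model-based decisions
print(net_benefit(outcomes, [1.0] * 5, 0.5))           # "treat all" reference
print(decision_curve(outcomes, predictions, [0.2, 0.5]))
```

Comparing the model's curve with the "treat all" and "treat none" (net benefit 0) references shows over which threshold range the model adds value.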
We aimed to define the role of these relatively novel approaches in the evaluation of the performance of prediction models. For illustration we present a case study of predicting the presence of residual tumor versus benign tissue in patients with testicular cancer (n=544 for model development, n=273 for external validation).
We suggest that reporting discrimination and calibration will always be important for a prediction model. Decision-analytic measures should be reported if the predictive model is to be used for making clinical decisions. Other measures of performance may be warranted in specific applications, such as reclassification metrics to gain insight into the value of adding a novel predictor to an established model.
The study compared actual with predicted survival estimates in advanced stage non-small cell lung cancer patients. Regardless of years of experience, physicians overestimated the survival duration of these patients.
Because most cases of non-small cell lung cancer (NSCLC) are diagnosed at an advanced stage with a poor prognosis, patient inclusion in clinical trials is critical. Most trials require an estimated life expectancy >3 months, based on clinician estimates of patient survival probability, without providing formal guidelines. The aim of this study was to assess the accuracy of clinicians' predictions of survival in NSCLC patients (stages IIIB and IV) and the possible impact of patient quality of life on survival estimation.
At diagnosis, clinical, biological, and quality of life data (QLQ-C30 questionnaire) were recorded, and doctors “forecast” each patient's estimated survival. Concordance between predicted and actual survival was assessed with the intraclass correlation coefficient.
Eighty-five patients with a mean age of 62.2 years, 81.1% male, were included (squamous cell carcinoma, 33; adenocarcinoma, 42; large cell carcinoma, 8; neuroendocrine carcinoma, 2). The mean follow-up was 40 months and median survival time was 11.7 (range, 0.4–143.7) weeks. All clinicians (residents, registrars, and consultants) overestimated patient survival time, with a moderate concordance between predicted and actual survival time. A worse global health status was associated with a lower discrepancy between estimated and actual patient survival, and a worse role functioning was associated with a larger difference between estimated and actual patient survival.
The absence of specific recommendations to estimate patient survival may introduce major selection in clinical studies. Further research should investigate whether the accuracy of patient survival estimates by clinicians would be improved by taking into account patient quality of life.
Advanced stage non-small cell lung cancer; Survival; Quality of life; Prognostic factors; Predictive estimation
The era of personalized medicine for cancer therapeutics has taken an important step forward in making accurate prognoses for individual patients with the adoption of high-throughput microarray technology. However, microarray technology in cancer diagnosis or prognosis has been primarily used for the statistical evaluation of patient populations, and thus excludes inter-individual variability and patient-specific predictions. Here we propose a metric called clinical confidence that serves as a measure of prognostic reliability to facilitate the shift from population-wide to personalized cancer prognosis using microarray-based predictive models. The performance of sample-based models predicted with different clinical confidences was evaluated and compared systematically using three large clinical datasets studying the following cancers: breast cancer, multiple myeloma, and neuroblastoma. Survival curves for patients, with different confidences, were also delineated. The results show that the clinical confidence metric separates patients with different prediction accuracies and survival times. Samples with high clinical confidence were likely to have accurate prognoses from predictive models. Moreover, patients with high clinical confidence would be expected to live for a notably longer or shorter time if their prognosis was good or grim based on the models, respectively. We conclude that clinical confidence could serve as a beneficial metric for personalized cancer prognosis prediction utilizing microarrays. Ascribing a confidence level to prognosis with the clinical confidence metric provides the clinician an objective, personalized basis for decisions, such as choosing the severity of the treatment.
Uncovering the dominant molecular deregulation among the multitude of pathways implicated in aggressive prostate cancer is essential to intelligently developing targeted therapies. Paradoxically, published prostate cancer gene expression signatures of poor prognosis share little overlap and thus do not reveal shared mechanisms. The authors hypothesize that, by analyzing gene signatures with quantitative models of protein–protein interactions, key pathways will be elucidated and shown to be shared.
The authors statistically prioritized common interactors between established cancer genes and genes from each prostate cancer signature of poor prognosis independently via a previously validated single protein analysis of network (SPAN) methodology. Additionally, they computationally identified pathways among the aggregated interactors across signatures and validated them using a similarity metric and patient survival.
Using an information-theoretic metric, the authors assessed the mechanistic similarity of the interactor signature. Its prognostic ability was assessed in an independent cohort of 198 patients with high-Gleason prostate cancer using Kaplan–Meier analysis.
Of the 13 prostate cancer signatures that were evaluated, eight interacted significantly with established cancer genes (false discovery rate <5%) and generated a 42-gene interactor signature that showed the highest mechanistic similarity (p<0.0001). Via parameter-free unsupervised classification, the interactor signature dichotomized the independent prostate cancer cohort with a significant survival difference (p=0.009). Interpretation of the network not only recapitulated phosphatidylinositol-3 kinase/NF-κB signaling, but also highlighted less well established relevant pathways such as the Janus kinase 2 cascade.
SPAN methodology provides a robust means of abstracting disparate prostate cancer gene expression signatures into clinically useful, prioritized pathways as well as useful mechanistic pathways.
Prostate cancer; protein networks; systems biology; information theory; network modeling; Simulation of complex systems (at all levels: molecules to work groups to organizations); knowledge representations; Uncertain reasoning and decision theory; languages and computational methods; statistical analysis of large datasets; advanced algorithms; discovery and text and data mining methods; Natural-language processing; Automated learning; Ontologies
The categorical definition of response assessed via the Response Evaluation Criteria in Solid Tumors has documented limitations. We sought to identify alternative metrics for tumor response that improve prediction of overall survival.
Individual patient data from three North Central Cancer Treatment Group trials (N0026, n=117; N9741, n=1109; N9841, n=332) were used. Continuous metrics of tumor size based on longitudinal tumor measurements were considered in addition to a trichotomized response (TriTR: Response vs. Stable vs. Progression). Cox proportional hazards models, adjusted for treatment arm and baseline tumor burden, were used to assess the impact of the metrics on subsequent overall survival, using a landmark analysis approach at 12-, 16- and 24-weeks post baseline. Model discrimination was evaluated using the concordance (c) index.
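The concordance index for censored survival data can be sketched in the form usually attributed to Harrell. This is a simplified illustration, not the trial analysis itself: it does not fit a Cox model or apply a landmark, it skips pairs with tied survival times, and the function name and toy data are assumptions.

```python
def harrell_c_index(times, events, risk_scores):
    """Concordance index for right-censored data.

    A pair is usable when the patient with the shorter time had an observed
    event (events[i] == 1). The pair is concordant when that patient also has
    the higher risk score; tied risk scores count as one half.
    Pairs with identical times are skipped in this simplified version.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy data (hypothetical): weeks of follow-up, event flag (1 = death), risk score.
weeks = [5, 8, 12, 20]
died = [1, 0, 1, 1]
risk = [0.9, 0.3, 0.2, 0.6]
print(harrell_c_index(weeks, died, risk))
```

A value of 0.5 corresponds to chance-level ordering and 1.0 to perfect ordering, which is the scale on which the reported c-indices should be read.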
The overall best response rates for the three trials were 26%, 45%, and 25% respectively. While nearly all metrics were statistically significantly associated with overall survival at the different landmark time points, the c-indices for the traditional response metrics ranged from 0.59-0.65; for the continuous metrics from 0.60-0.66 and for the TriTR metrics from 0.64-0.69. The c-indices for TriTR at 12-weeks were comparable to those at 16- and 24-weeks.
Continuous tumor-measurement-based metrics provided no predictive improvement over traditional response based metrics or TriTR; TriTR had better predictive ability than best TriTR or confirmed response. If confirmed, TriTR represents a promising endpoint for future Phase II trials.
continuous; tumor measurement; RECIST; prediction; survival
Clinicians need to predict the prognosis of Alzheimer's disease (AD), and researchers need models of progression to develop biomarkers and clinical trial designs. We tested a calculated initial progression rate to see whether it predicted performance on cognition, function and behavior over time, and to see whether it predicted survival.
We used standardized approaches to assess baseline characteristics and to estimate disease duration, and calculated the initial (pre-progression) rate in 597 AD patients followed for up to 15 years. We designated slow, intermediate and rapidly progressing groups. Using mixed effects regression analysis, we examined the predictive value of a pre-progression group for longitudinal performance on standardized measures. We used Cox survival analysis to compare survival time by progression group.
Patients in the slow and intermediate groups maintained better performance on the cognitive (ADAScog and VSAT), global (CDR-SB) and complex activities of daily living measures (IADL) (P values < 0.001 slow versus fast; P values < 0.003 to 0.03 intermediate versus fast). Interaction terms indicated that slopes of ADAScog and PSMS change for the slow group were smaller than for the fast group, and that rates of change on the ADAScog were also slower for the intermediate group, but that CDR-SB rates increased in this group relative to the fast group. Slow progressors survived longer than fast progressors (P = 0.024).
A simple, calculated progression rate at the initial visit gives reliable information regarding performance over time on cognition, global performance and activities of daily living. The slowest progression group also survives longer. This baseline measure should be considered in the design of long duration Alzheimer's disease clinical trials.
Several methodological approaches have been used to estimate distance in health service research. In this study, focusing on cardiac catheterization services, Euclidean, Manhattan, and the less widely known Minkowski distance metrics are used to estimate distances from patient residence to hospital. Distance metrics typically produce less accurate estimates than actual measurements, but each metric provides a single model of travel over a given network. Therefore, distance metrics, unlike actual measurements, can be directly used in spatial analytical modeling. Euclidean distance is most often used, but it is unlikely to be the most appropriate metric. Minkowski distance is a more promising method. Distances estimated with each metric are contrasted with road distance and travel time measurements, and an optimized Minkowski distance is implemented in spatial analytical modeling.
Road distance and travel time are calculated from the postal code of residence of each patient undergoing cardiac catheterization to the pertinent hospital. The Minkowski metric is optimized, to approximate travel time and road distance, respectively. Distance estimates and distance measurements are then compared using descriptive statistics and visual mapping methods. The optimized Minkowski metric is implemented, via the spatial weight matrix, in a spatial regression model identifying socio-economic factors significantly associated with cardiac catheterization.
The Minkowski coefficient that best approximates road distance is 1.54; 1.31 best approximates travel time. The latter is also a good predictor of road distance, thus providing the best single model of travel from patient's residence to hospital. The Euclidean metric and the optimal Minkowski metric are alternatively implemented in the regression model, and the results compared. The Minkowski method produces more reliable results than the traditional Euclidean metric.
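The Minkowski metric generalizes the Manhattan (coefficient 1) and Euclidean (coefficient 2) distances. The sketch below computes it for planar coordinates and includes a simple grid search for the coefficient that best reproduces a set of measured distances. The grid search is a stand-in chosen for illustration, since the optimization procedure actually used is not described here; the function names, coordinates, and candidate grid are likewise assumptions.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance between points a and b (same length, same units)."""
    if p < 1:
        raise ValueError("p must be at least 1 for a valid distance metric")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def fit_minkowski_p(point_pairs, measured, candidates=None):
    """Pick the coefficient p (by grid search, a hypothetical stand-in) that
    minimizes the squared error against measured distances."""
    if candidates is None:
        candidates = [1.0 + i / 100 for i in range(101)]  # 1.00, 1.01, ..., 2.00

    def squared_error(p):
        return sum((minkowski_distance(a, b, p) - m) ** 2
                   for (a, b), m in zip(point_pairs, measured))

    return min(candidates, key=squared_error)


# Hypothetical residence and hospital in projected coordinates (km).
home, hospital = (0.0, 0.0), (3.0, 4.0)
print(minkowski_distance(home, hospital, 1))     # Manhattan
print(minkowski_distance(home, hospital, 2))     # Euclidean
print(minkowski_distance(home, hospital, 1.54))  # coefficient reported for road distance
```

With a coefficient between 1 and 2 the estimate falls between the Euclidean and Manhattan values, which is consistent with the former tending to underestimate and the latter to overestimate travel over a road network.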
Road distance and travel time measurements are the most accurate estimates, but cannot be directly implemented in spatial analytical modeling. Euclidean distance tends to underestimate road distance and travel time; Manhattan distance tends to overestimate both. The optimized Minkowski distance partially overcomes their shortcomings; it provides a single model of travel over the network. The method is flexible, suitable for analytical modeling, and more accurate than the traditional metrics; its use ultimately increases the reliability of spatial analytical models.
Detection and analysis of epileptic seizures is of clinical and research interest. We propose a novel seizure detection and analysis scheme based on the phase-slope index (PSI) of directed influence applied to multichannel electrocorticogram data. The PSI metric identifies increases in the spatio–temporal interactions between channels that clearly distinguish seizure from interictal activity. We form a global metric of interaction between channels and compare this metric to a threshold to detect the presence of seizures. The threshold is chosen based on a moving average of recent activity to accommodate differences between patients and slow changes within each patient over time. We evaluate detection performance over a challenging population of five patients with different types of epilepsy using a total of 47 seizures in nearly 258 h of recorded data. Using a common threshold procedure, we show that our approach detects all of the seizures in four of the five patients with a false detection rate less than two per hour. A variation on the global metric is proposed to identify which channels are strong drivers of activity in each patient. These metrics are computationally efficient and suitable for real-time application.
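The adaptive thresholding step can be illustrated with a minimal sketch in which a global interaction metric is compared against a multiple of its own moving average. Computing the PSI itself is not shown. The window length, the multiplicative factor, and the function name are hypothetical choices for this example and are not the values used in the study.

```python
from collections import deque


def detect_events(metric, window=5, factor=3.0):
    """Flag samples whose value exceeds `factor` times the mean of the
    preceding `window` samples. No detection is attempted until a full
    window of past samples is available."""
    recent = deque(maxlen=window)
    flags = []
    for value in metric:
        if len(recent) == window:
            threshold = factor * (sum(recent) / window)
            flags.append(value > threshold)
        else:
            flags.append(False)
        recent.append(value)  # the baseline follows slow changes over time
    return flags


# Hypothetical global metric with one abrupt increase.
series = [1, 1, 1, 1, 1, 10, 1, 1]
print(detect_events(series))
```

Because the threshold tracks recent activity, the same procedure can be applied across patients whose baseline interaction levels differ.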
Epilepsy; multichannel electrocorticogram (ECoG); phase-slope index (PSI); seizure detection; seizure evolution
Spinal metastasis occurs in up to 40% of cancer patients. We compared the Tokuhashi and Tomita scoring systems, two commonly used scoring systems for prognosis in spinal metastases. We also assessed the different variables separately with respect to their value in predicting postsurgical life expectancy. Finally, we suggest criteria for selecting patients for surgery based on the postoperative survival pattern.
Materials and Methods:
We retrospectively analyzed 102 patients who had been operated on for metastatic disease of the spine. Predictive scoring was done according to the scoring systems proposed by Tokuhashi and Tomita. Overall survival was assessed using Kaplan–Meier survival analysis. Using the log rank test and Cox regression model, we assessed the value of the individual components of each scoring system for predicting survival in these patients.
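The Kaplan–Meier estimate referred to above can be sketched with the standard product-limit formula. The function name and the toy follow-up data are assumptions for illustration; the log rank test and Cox regression are not shown.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


# Hypothetical follow-up in months; 0 marks a censored observation.
months = [2, 3, 3, 5, 8]
died = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(months, died):
    print(t, round(s, 3))
```

Reading the curve at 6 months gives the kind of postoperative survival probability against which score cut-offs can be judged.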
The factors that were most significantly associated with survival were the general condition score (Karnofsky Performance Scale) (P=.000, log rank test), metastasis to internal organs (P=.0002 log rank test), and number of extraspinal bone metastases (P=.0058). Type of primary tumor was not found to be significantly associated with survival according to the revised Tokuhashi scoring system (P=.9131, log rank test). Stepwise logistic regression revealed that the Tomita score correlated more closely with survival than the Tokuhashi score.
The patient's performance status, extent of visceral metastasis, and extent of bone metastases are significant predictors of survival in patients with metastatic disease. Both revised Tokuhashi and Tomita scores were significantly correlated with survival. A revised Tokuhashi score of 7 or more and a Tomita score of 6 or less indicated >50% chance of surviving 6 months postoperatively. We recommend that the Tomita score be used for prognostication in patients who are contemplating surgery, as it is simpler to score and has a higher strength of correlation with survival than the Tokuhashi score.
Spine metastasis; prognostic predictors; survival
The meta-analytic approach to evaluating surrogate end points assesses the predictiveness of treatment effect on the surrogate toward treatment effect on the clinical end point based on multiple clinical trials. Definition and estimation of the correlation of treatment effects were developed in linear mixed models and later extended to binary or failure time outcomes on a case-by-case basis. In a general regression setting that covers nonnormal outcomes, we discuss in this paper several metrics that are useful in the meta-analytic evaluation of surrogacy. We propose a unified 3-step procedure to assess these metrics in settings with binary end points, time-to-event outcomes, or repeated measures. First, the joint distribution of estimated treatment effects is ascertained by an estimating equation approach; second, the restricted maximum likelihood method is used to estimate the means and the variance components of the random treatment effects; finally, confidence intervals are constructed by a parametric bootstrap procedure. The proposed method is evaluated by simulations and applications to 2 clinical trials.
Causal inference; Meta-analysis; Surrogacy
Many approaches have been taken to adjust for smoking in modeling cancer risk. In case-control studies, these metrics are often used arbitrarily rather than being based on the properties of the metric in the context of the study. Depending on the underlying study design, hypotheses and base population, different metrics may be deemed most appropriate. We present our approach to evaluating different smoking metrics. We examine the properties of a new metric, “logcig-years”, that we initially derived from utilizing a biological model of DNA adduct formation. We compare this metric to three other smoking metrics, namely pack-years, square-root pack-years, and a model in which smoking duration and intensity are separate variables. Our comparisons use generalized additive models and logistic regression to examine the relationship between the logit probability of cancer and each of the metrics, while adjusting for other covariates. All models were fit using data from a lung cancer study of 1275 cases and 1269 controls that has focused on gene-smoking relationships. There was a very significant, linear relationship between logcig-years and the logit probability of lung cancer in this sample, without any need to adjust for smoking status. These properties together were not shared by the other metrics. In this sample, logcig-years captured more information about smoking that is important in lung cancer risk than the other metrics. In conclusion, we provide a general framework for evaluating different smoking metrics in studies where smoking is a critical variable.
Breast cancer is the most common malignancy in women and is responsible for hundreds of thousands of deaths annually. As with most cancers, it is a heterogeneous disease and different breast cancer subtypes are treated differently. Understanding the difference in prognosis for breast cancer based on its molecular and phenotypic features is one avenue for improving treatment by matching the proper treatment with molecular subtypes of the disease. In this work, we employed a competition-based approach to modeling breast cancer prognosis using large datasets containing genomic and clinical information and an online real-time leaderboard program used to speed feedback to the modeling team and to encourage each modeler to work towards achieving a higher ranked submission. We find that machine learning methods combined with molecular features selected based on expert prior knowledge can improve survival predictions compared to current best-in-class methodologies and that ensemble models trained across multiple user submissions systematically outperform individual models within the ensemble. We also find that model scores are highly consistent across multiple independent evaluations. This study serves as the pilot phase of a much larger competition open to the whole research community, with the goal of understanding general strategies for model optimization using clinical and molecular profiling data and providing an objective, transparent system for assessing prognostic models.
We developed an extensible software framework for sharing molecular prognostic models of breast cancer survival in a transparent collaborative environment and subjecting each model to automated evaluation using objective metrics. The computational framework presented in this study, our detailed post-hoc analysis of hundreds of modeling approaches, and the use of a novel cutting-edge data resource together represent one of the largest-scale systematic studies to date assessing the factors influencing accuracy of molecular-based prognostic models in breast cancer. Our results demonstrate the ability to infer prognostic models with accuracy on par with or greater than previously reported studies, with significant performance improvements by using state-of-the-art machine learning approaches trained on clinical covariates. Our results also demonstrate the difficulty in incorporating molecular data to achieve substantial performance improvements over clinical covariates alone. However, improvement was achieved by combining clinical feature data with intelligent selection of important molecular features based on domain-specific prior knowledge. We observe that ensemble models aggregating the information across many diverse models achieve among the highest scores of all models and systematically out-perform individual models within the ensemble, suggesting a general strategy for leveraging the wisdom of crowds to develop robust predictive models.
Currently used treatment response criteria in multiple myeloma (MM) are based in part on serum monoclonal protein (M-protein) measurements. A drawback of these criteria is that response is determined solely by the best level of M-protein reduction, without considering the serial trend. The authors hypothesized that metrics incorporating the serial trend of M-protein would be better predictors of progression-free survival (PFS).
Fifty-five patients with measurable disease at baseline (M-protein ≥1 g/dL) who received ≥4 cycles of treatment from 2 clinical trials in previously untreated MM were included. Three metrics based on the percentage of M-protein remaining relative to baseline (residual M-protein) were considered: metrics based on the number of times residual M-protein fell within prespecified thresholds, metrics based on area under the residual M-protein curve, and metrics based on the average residual M-protein reduction between Cycles 1 and 4. The predictive value of these metrics was assessed in Cox models using landmark analysis.
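The three families of trend metrics can be sketched as follows. The exact definitions used in the analysis are not given here, so every detail below is an assumption made for illustration: the 50% threshold, unit spacing between cycles in the trapezoidal area, the reading of "average reduction" as the mean of (100 minus residual) over Cycles 1 to 4, and all function names and values.

```python
def residual_percent(baseline, measurements):
    """M-protein remaining at each cycle, as a percentage of baseline."""
    if baseline <= 0:
        raise ValueError("baseline M-protein must be positive")
    return [100.0 * m / baseline for m in measurements]


def count_within_threshold(residuals, threshold=50.0):
    """Number of cycles at which residual M-protein is at or below the threshold."""
    return sum(1 for r in residuals if r <= threshold)


def area_under_curve(residuals, spacing=1.0):
    """Trapezoidal area under the residual M-protein curve."""
    return sum(spacing * (a + b) / 2.0 for a, b in zip(residuals, residuals[1:]))


def average_reduction(residuals):
    """Mean percentage reduction from baseline across the supplied cycles."""
    return sum(100.0 - r for r in residuals) / len(residuals)


# Hypothetical patient: baseline 4.0 g/dL, then Cycles 1 to 4.
residuals = residual_percent(4.0, [3.0, 2.0, 1.0, 1.0])
print(residuals)
print(count_within_threshold(residuals))
print(area_under_curve(residuals))
print(average_reduction(residuals))
```

Unlike best response, each of these summaries changes when the same nadir is reached earlier or later, which is the serial-trend information the hypothesis concerns.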
The average residual M-protein reduction was found to be significantly predictive of PFS (P = .02; hazard ratio, 0.37), in which a patient with a 10% lower average residual M-protein reduction from Cycle 1 to 4 was estimated to be at least 2.7× more likely to develop disease progression or die early. None of the other metrics was predictive of PFS. The concordance index for the average residual M-protein reduction was 0.63, compared with 0.56 for best response.
The average residual M-protein reduction metric is promising and needs further validation. This exploratory analysis is the first step in the search for treatment-based trend metrics predictive of outcomes in MM.
multiple myeloma; prediction; progression-free survival; response; serum monoclonal protein
This study was conducted to assess the treatment efficacy of radiotherapy (RT) and other therapeutic modalities, compared with palliative care only, in patients with advanced hepatocellular carcinoma (HCC).
From 2002 to 2010, we reviewed 47 patients with advanced HCC and recorded each patient's Child-Pugh class, ECOG performance status, serum alpha-fetoprotein level, and other baseline characteristics considered to be predictive of HCC prognosis. The 29 patients who had received RT formed one group, and the 18 patients who had received only palliative care formed the other. Survival was compared between the two groups to investigate the efficacy of RT.
In the survival analysis, the mean survival time for the total patient group was reported as between 30.1 months and 45.9 months in the RT group, while it was 4.8 months in the palliative care group. In the univariate analysis for all patients, the following factors significantly affected survival: ECOG performance status, Child-Pugh class, tumor size, tumor type, alpha-fetoprotein, transarterial chemoembolization, and RT. In the multivariate Cox regression analysis for all patients, the absence of radiotherapy and a higher Child-Pugh class were independent predictors of worse overall survival. In contrast, in the subset analysis of the 29 patients treated with radiotherapy, a higher serum alpha-fetoprotein level was an independent predictor of worse overall survival.
We found that the survival of patients with advanced HCC was better with radiotherapy than with palliative care. Therefore, radiotherapy could be a good option for patients with advanced HCC.
Hepatocellular carcinoma; Radiotherapy; Survival rate; Alpha-fetoprotein; Child-Pugh class
To investigate the relative predictive value of CD4+ metrics for serious clinical endpoints.
Patients (3012; 20317 person-years) from the control arms of the ESPRIT and SILCAAT trials were followed prospectively. We used Cox regression to identify CD4+ metrics (latest, baseline and nadir CD4+ count, latest CD4+%, time spent with CD4+ count below certain thresholds and CD4+ slopes) independently predictive of i) all-cause mortality; ii) non-AIDS deaths; iii) non-AIDS (cardiovascular, hepatic, renal and non-AIDS malignancy) events and iv) AIDS events. The Akaike information criterion (AIC) was calculated for each model. Significant metrics (p<0.05) were then additionally adjusted for latest CD4+ count.
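Two of the quantities above lend themselves to a short sketch: a CD4+ slope estimated by ordinary least squares over a series of visits, and the AIC used to compare models. The least-squares choice, the visit schedule, the category labels, and the function names are assumptions for illustration; the Cox models themselves are not fitted here.

```python
def least_squares_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired observations")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("times must not all be identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def slope_category(slope, band=10.0):
    """Classify a slope (cells/mm3/month) relative to a 0 +/- band reference."""
    if slope < -band:
        return "declining"
    if slope > band:
        return "rising"
    return "stable"


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical patient: 7 visits, 4 months apart, CD4+ count in cells/mm3.
visit_months = [0, 4, 8, 12, 16, 20, 24]
cd4_counts = [500, 440, 380, 320, 260, 200, 140]
slope = least_squares_slope(visit_months, cd4_counts)
print(slope, slope_category(slope))
print(aic(log_likelihood=-100.0, n_params=3))
```

For a Cox model the log-likelihood entering the AIC would be the partial log-likelihood reported by the fitting software.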
Non-AIDS deaths occurred at a higher rate than AIDS deaths (rate-ratio: 6.48, 95%CI: 5.1–8.1) and, similarly, non-AIDS events occurred at a higher rate than AIDS events (rate-ratio: 1.72, 95%CI: 1.65–1.79). Latest CD4+ count was strongly predictive of lower risk of death (HR per log2 rise: 0.48, 95%CI: 0.43–0.54), with the lowest AIC of all metrics. CD4+ slope over 7 visits, after additional adjustment for latest CD4+ count, was the only metric that was an independent predictor of all-cause (HR for slope <-10/mm3/month vs. 0±10: 3.04, 95%CI: 1.98–4.67) and non-AIDS deaths (HR for slope <-10/mm3/month vs. 0±10: 2.62, 95%CI: 1.62–4.22). Latest CD4+ count (per log2 rise) was the best predictor across all endpoints (i–iv) and predicted hepatic (HR: 0.46, 95%CI: 0.33–0.63) and renal events (HR: 0.39, 95%CI: 0.21–0.70), but not cardiovascular events (HR: 1.05, 95%CI: 0.77–1.43) or non-AIDS cancers (HR: 0.78, 95%CI: 0.59–1.03).
Latest CD4+count is the best predictor of serious endpoints. CD4+ slope independently predicts all-cause and non-AIDS deaths.
CD4+; CD4+ counts; serious non-AIDS events; immunodeficiency; AIDS
We investigated whether intracranial pressure (ICP) pulse morphological metrics could be used to realize continuous detection of low cerebral blood flow. Sixty-three acutely brain injured patients with ICP monitoring, daily 133Xenon CBF, and daily Transcranial Doppler (TCD) assessment were studied. Their ICP recordings were time-aligned with the CBF and TCD measurements so that a one-hour ICP segment near the CBF and TCD measurement was obtained. Each of these recordings was processed by the Morphological Cluster and Analysis of Intracranial Pressure (MOCAIP) algorithm to extract pulse morphological metrics. Then the Differential Evolution algorithm was used to find the optimal combination of the metrics that provided, using the regularized linear discriminant analysis, the largest combined positive predictivity and sensitivity. At a CBF threshold of 20 ml/min/100g, a sensitivity of 81.8 ± 0.9% and specificity of 50.1 ± 0.2% were obtained using the optimal combination of conventional TCD and blood analysis metrics as input to a regularized linear classifier. However, the optimal combination of the MOCAIP metrics alone achieved a sensitivity of 92.5 ± 0.7% and specificity of 84.8 ± 0.8%. Searching for the optimal combination of all available metrics achieved the best result, which was marginally better than that obtained using MOCAIP alone. This study demonstrated that the potential role of ICP monitoring may be extended to provide an indicator of low global cerebral blood perfusion.
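The classifier performance figures quoted above rest on three standard quantities, sketched below from a confusion matrix. How the study combined positive predictivity and sensitivity into a single objective is not stated, so no combined score is shown; the function name and toy labels are assumptions.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, and positive predictivity for 0/1 labels,
    where 1 marks the condition of interest (here, low cerebral blood flow)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("a metric is undefined for these labels")
    return {
        "sensitivity": tp / (tp + fn),           # detected share of true low-CBF cases
        "specificity": tn / (tn + fp),           # correctly cleared share of normal cases
        "positive_predictivity": tp / (tp + fp)  # share of alarms that are correct
    }


# Hypothetical reference labels and classifier outputs.
reference = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
print(classification_metrics(reference, predicted))
```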
cerebral blood flow; intracranial pressure; machine learning; cerebral ischemia; morphology; brain injury
Predicting the long-term viability of ischaemic bowel during surgery is challenging. We hypothesized that intraoperative near-infrared (NIR) angiography (NIR-AG) of ischaemic bowel might provide metrics that were predictive of long-term outcome.
Materials and Methods
NIR-AG using indocyanine green (ICG) was performed on N = 24 pigs before and after inducing bowel ischaemia to determine the feasibility of NIR-AG to detect compromised perfusion. Contrast-to-background ratio (CBR) over time was measured in regions of interest throughout the bowel, and various metrics of the CBR-time curve were developed. N = 60 rat bowels, with or without strangulation, were imaged intraoperatively and on postoperative day (POD) 3. CBR metrics and clinical findings obtained intraoperatively were assessed quantitatively for their ability to predict animal survival, histological grade of ischaemic injury, and visible necrosis at POD 3.
In the ischaemic bowels of pigs, various qualitative and quantitative CBR metrics appeared to correlate with bowel injury as a function of distance from normal bowel. In rats, intraoperative clinical assessment showed high specificity but low sensitivity for predicting outcome on POD 3. Qualitative patterns of the CBR-time curve, such as absence of an arterial inflow peak and presence of a NIR filling defect, resulted in higher accuracies for predicting animal survival, histological grade, and visible necrosis at POD 3 of 90%, 85% and 92%, respectively.
Bowel survival at POD 3 can be predicted by intraoperative NIR-AG with higher accuracy compared to clinical evaluation alone. NIR-AG may someday prove useful clinically for avoiding unnecessary resection.
Bowel ischaemia; bowel infarction; near-infrared fluorescence; indocyanine green
A better understanding of patients’ views on the benefit and burden obtained from palliative chemotherapy would facilitate shared decision making. We evaluated palliative cancer patients’ reported outcomes (PROs) for toxicity and investigated the survival threshold for which they would repeat chemotherapy (CTx).
Patients who had received a minimum of three months of palliative CTx for advanced colorectal (CRC) or non-colorectal (non-CRC: upper gastrointestinal, lung and head-and-neck) cancer were assessed by questionnaire. Patients were questioned about PROs for toxicity, subjective burden from side effects, and were asked for the survival threshold necessary for them to repeat CTx. Expected survival (sum of indicated survival threshold and median survival time with best supportive care) was compared to the patients’ actual survival.
One hundred and thirty-four patients (CRC: 58; non-CRC: 76) were surveyed. The most frequent PRO grade 3/4 toxicities were acne (12.8%), fatigue (9.0%), and diarrhea (8.5%). The symptom causing the highest subjective burden was fatigue and was worse than expected in 29.9% of the patients. The median survival threshold for which patients would repeat CTx was significantly longer in CRC than in non-CRC patients (p=0.01). Median expected survival was significantly longer than actual median survival (CRC: 44.0 months [22.0-65.9] compared with 30.0 months of actual survival [20.9-39.1]; non-CRC: 22.0 months [15.3-28.6] compared with 19.0 months of actual survival [15.1-22.9], p=0.03).
Fatigue deserves more attention when toxicity of treatment and symptoms of disease are explained to patients. Patients’ survival expectations from palliative chemotherapy are higher than previously described, exceed the median survival time known from phase III trials, and are significantly longer than their actual survival.
Chemotherapy; Palliative care; Survival threshold; Fatigue
To assess the public health risk of heat waves and to set criteria for alerts for excessive heat, various meteorologic metrics and models are used in different jurisdictions, generally without systematic comparisons of alternatives. We report such an analysis for New York City that compared maximum heat index with alternative metrics in models to predict daily variation in warm-season natural-cause mortality from 1997 through 2006.
Materials and methods
We used Poisson time-series generalized linear models and generalized additive models to estimate weather–mortality relationships using various metrics, lag and averaging times, and functional forms and compared model fit.
A model that included cubic functions of maximum heat index on the same and each of the previous 3 days provided the best fit, better than models using maximum, minimum, or average temperature, or spatial synoptic classification (SSC) of weather type. We found that goodness of fit and maximum heat index–mortality functions were similar using parametric and nonparametric models. Same-day maximum heat index was linearly related to mortality risk across its range. The slopes at lags of 1, 2, and 3 days were flat across moderate values but increased sharply between maximum heat index of 95°F and 100°F (35–38°C). SSC or other meteorologic variables added to the maximum heat index model moderately improved goodness of fit, with slightly attenuated maximum heat index–mortality functions.
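To make the structure of the best-fitting model concrete, the sketch below builds the predictor columns for cubic functions of maximum heat index on the same day and each of the previous 3 days. It is only an illustration of the lag-and-polynomial layout: the function name is hypothetical, the input values are invented, and the exact parametrization used in the study (centering, scaling, additional covariates) is not stated in the abstract.

```python
def lagged_cubic_features(heat_index, max_lag=3):
    """Build one row per day with x, x^2, x^3 for lags 0..max_lag.

    heat_index : daily maximum heat index values in chronological order
    Returns a list of rows; the first max_lag days are dropped because
    their lagged values are not available.
    """
    rows = []
    for day in range(max_lag, len(heat_index)):
        row = []
        for lag in range(max_lag + 1):
            x = heat_index[day - lag]
            row.extend([x, x ** 2, x ** 3])  # cubic polynomial terms for this lag
        rows.append(row)
    return rows


# Hypothetical daily maximum heat index values in degrees Fahrenheit
example_rows = lagged_cubic_features([80, 85, 90, 95, 100])
```

Each row would then be passed, together with the daily death count, to a Poisson regression; in practice the raw powers would usually be centered or replaced by an orthogonal basis to avoid numerical problems.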
In New York City, maximum heat index performed similarly to alternative and more complex metrics in estimating mortality risk during hot weather. The linear relationship supports issuing heat alerts in New York City when the heat index is forecast to exceed approximately 95–100°F. Periodic city-specific analyses using recent data are recommended to evaluate public health risks from extreme heat.
epidemiology; heat wave; meteorology; mortality; temperature
Despite being a common cancer worldwide, management of transitional cell carcinoma of the bladder currently relies primarily on clinical staging and histopathologic parameters. Assaying alterations in molecular pathways can contribute valuable information that can accurately predict outcome and chemotherapeutic response in individual patients with bladder cancer. Medium- to high-throughput gene-expression profiling technologies are now allowing multiplexed assessment of alterations responsible for the genesis and progression of bladder tumors. These investigations employ global or pathway-based approaches to define molecular signatures that can predict prognosis independent of traditional clinical performance metrics. Prognostic panels generated using these strategies can also elucidate the biology of tumor progression and identify potential therapeutic targets.
gene-expression profiling; global approach; microarray; multimarker analysis; pathway-specific approach; urothelial carcinoma
Although several prognostic genomic predictors have been identified from independent studies, it remains unclear whether these predictors are actually concordant with respect to their predictions for individual patients and which predictor performs best. We compared five prognostic genomic predictors, the V7RHS, the ColoGuideEx, the Meta163, the OncoDX, and the MDA114, in terms of predicting disease-free survival in two independent cohorts of patients with colorectal cancer.
Using original classification algorithms, we tested the predictions of five genomic predictors for disease-free survival in two cohorts of patients with colorectal cancer (n = 229 and n = 168) and evaluated concordance of predictors in predicting outcomes for individual patients.
We found that only two predictors, OncoDX and MDA114, demonstrated robust performance in identifying patients with poor prognosis in both independent cohorts. These two predictors also had modest but significant concordance of predicted outcome (r>0.3, P<0.001 in both cohorts).
Further validation of developed genomic predictors is necessary. Despite the limited number of genes shared by OncoDX and MDA114, individual-patient outcomes predicted by these two predictors were significantly concordant.
Survival for cancer patients is usually only reported as survival from time of diagnosis. For patients who survive 1 or more years after diagnosis, however, their survival probability changes over time, and is more accurately depicted by conditional survival. The specific aim of this project was to build a survival regression model and web-based tool to make individualized estimates of conditional survival for head and neck cancer patients based on tumor and patient characteristics. Using data from the Surveillance, Epidemiology, and End Results (SEER) database, we built a prediction modeling tool that can estimate prognosis for head and neck cancer patients who have already survived a period of time after diagnosis. We believe that having more accurate prognostic information may empower both patients and clinicians to be able to make more appropriate decisions regarding follow-up, surveillance testing, and future treatment.
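Conditional survival is the probability of surviving a further t years given survival to s years, which equals S(s + t) / S(s) for a cumulative survival function S. The sketch below shows that calculation only; the function name and the survival values are hypothetical and are not SEER estimates or the authors' regression model.

```python
def conditional_survival(surv, s, t):
    """Probability of surviving t more years given survival to year s.

    surv maps years since diagnosis to cumulative survival probability S(x).
    """
    if surv[s] <= 0:
        raise ValueError("no survivors at the conditioning time")
    return surv[s + t] / surv[s]


# Hypothetical cumulative survival curve (illustrative values only)
example_curve = {0: 1.00, 1: 0.80, 2: 0.70, 3: 0.64, 5: 0.56, 6: 0.54}

# Five-year survival for a patient who has already survived one year: S(6) / S(1)
cs_5_given_1 = conditional_survival(example_curve, 1, 5)
```

In this invented curve, five-year survival from diagnosis is 0.56, while a patient who has already survived one year has a conditional five-year survival of 0.54 / 0.80, which is higher.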
Proportional Hazards Models; Models; Statistical; Regression Analysis; survival modeling; head and neck cancer
AIM: To develop a prognostic model to predict survival of patients with colorectal cancer (CRC).
METHODS: Survival data of 837 CRC patients undergoing surgery between 1996 and 2006 were collected and analyzed by univariate analysis and Cox proportional hazard regression model to reveal the prognostic factors for CRC. All data were recorded using a standard data form and analyzed using SPSS version 18.0 (SPSS, Chicago, IL, United States). Survival curves were calculated by the Kaplan-Meier method. The log rank test was used to assess differences in survival. Univariate hazard ratios and significant, independent predictors of disease-specific survival were identified by Cox proportional hazard analysis. The stepwise procedure was set to a threshold of 0.05. Statistical significance was defined as P < 0.05.
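For readers unfamiliar with the Kaplan-Meier method mentioned above, the following is a minimal pure-Python sketch of the product-limit estimator. The study itself used SPSS; the function name and the toy follow-up data here are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if the event (death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk patients who survive time t
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve


# Toy data: six patients, two of them censored (event = 0)
km_curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
```

Censored patients contribute to the risk set up to their last follow-up time but never cause a drop in the curve, which is what distinguishes this estimator from a simple proportion of survivors.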
RESULTS: The survival rate was 74% at 3 years and 68% at 5 years. The results of univariate analysis suggested age, preoperative obstruction, serum carcinoembryonic antigen level at diagnosis, status of resection, tumor size, histological grade, pathological type, lymphovascular invasion, invasion of adjacent organs, and tumor node metastasis (TNM) staging were positive prognostic factors (P < 0.05). Lymph node ratio (LNR) was also a strong prognostic factor in stage III CRC (P < 0.0001). We divided 341 stage III patients into three groups according to LNR values (LNR1, LNR ≤ 0.33, n = 211; LNR2, LNR 0.34-0.66, n = 76; and LNR3, LNR ≥ 0.67, n = 54). Univariate analysis showed a significant statistical difference in 3-year survival among these groups: LNR1, 73%; LNR2, 55%; and LNR3, 42% (P < 0.0001). The multivariate analysis results showed that histological grade, depth of bowel wall invasion, and number of metastatic lymph nodes were the most important prognostic factors for CRC if we did not consider the interaction of the TNM staging system (P < 0.05). When the TNM staging was taken into account, histological grade lost its statistical significance, while the specific TNM staging system showed a statistically significant difference (P < 0.0001).
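The LNR grouping reported above can be written as a small function. This is a sketch under two assumptions that the abstract does not state: LNR is taken as the number of metastatic lymph nodes divided by the number of nodes examined, and the ratio is rounded to two decimals before the cutoffs are applied, so that values between 0.33 and 0.34 fall into a defined group. The function name is hypothetical.

```python
def lnr_group(positive_nodes, examined_nodes):
    """Assign a stage III patient to LNR1, LNR2 or LNR3.

    Assumes LNR = positive_nodes / examined_nodes, rounded to two decimals
    (an assumption; the abstract gives only the cutoffs).
    """
    if examined_nodes <= 0:
        raise ValueError("examined_nodes must be positive")
    lnr = round(positive_nodes / examined_nodes, 2)
    if lnr <= 0.33:
        return "LNR1"
    if lnr <= 0.66:
        return "LNR2"
    return "LNR3"


# Hypothetical patients: (positive nodes, examined nodes)
example_groups = [lnr_group(2, 10), lnr_group(5, 10), lnr_group(7, 10)]
```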
CONCLUSION: The overall survival of CRC patients has improved between 1996 and 2006. LNR is a powerful factor for estimating the survival of stage III CRC patients.
Colorectal cancer; Prognostic factors; Cox proportional hazard regression; Lymph node ratio
Purpose: An estimated 24%–45% of patients with cancer develop brain metastases. Individualized estimation of survival for patients with brain metastasis could be useful for counseling patients on clinical outcomes and prognosis. Methods: De-identified data for 2367 patients with brain metastasis from 7 Radiation Therapy Oncology Group randomized trials were used to develop and internally validate a prognostic nomogram for estimation of survival among patients with brain metastasis. The prognostic accuracy for survival from 3 statistical approaches (Cox proportional hazards regression, recursive partitioning analysis [RPA], and random survival forests) was calculated using the concordance index. A nomogram for 12-month, 6-month, and median survival was generated using the most parsimonious model. Results: The majority of patients had lung cancer, controlled primary disease, no surgery, Karnofsky performance score (KPS) ≥ 70, and multiple brain metastases and were in RPA class II or had a Diagnosis-Specific Graded Prognostic Assessment (DS-GPA) score of 1.25–2.5. The overall median survival was 136 days (95% confidence interval, 126–144 days). We built the nomogram using the model that included primary site and histology, status of primary disease, metastatic spread, age, KPS, and number of brain lesions. The potential use of individualized survival estimation is demonstrated by showing the heterogeneous distribution of the individual 12-month survival in each RPA class or DS-GPA score group. Conclusion: Our nomogram provides individualized estimates of survival, compared with current RPA and DS-GPA group estimates. This tool could be useful for counseling patients with respect to clinical outcomes and prognosis.
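The concordance index used above to compare the three statistical approaches can be illustrated with a short sketch of Harrell's c statistic for right-censored data. This is a simplified version that ignores tied event times; the function name and the toy data are hypothetical and unrelated to the trial data.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's c: fraction of usable pairs ranked correctly by risk score.

    A pair (i, j) is usable when patient i had an observed event strictly
    before patient j's follow-up time. It is concordant when i has the
    higher risk score; tied scores count as half.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy data: four patients, the third censored (event = 0)
toy_times = [1, 2, 3, 4]
toy_events = [1, 1, 0, 1]
c_perfect = concordance_index(toy_times, toy_events, [4, 3, 2, 1])
c_mixed = concordance_index(toy_times, toy_events, [4, 1, 3, 2])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by predicted risk.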
brain metastases; nomogram; prediction; prognosis; survival
Dedicated software packages incorporating prognostic models are meant to aid physicians in making accurate predictions of prognosis. This study concerns 742 predictions of 5-year survival on consecutive newly diagnosed patients with head and neck squamous cell carcinoma. The 5-year survival predictions made by the physicians are not compared with actual survival, but with a prediction made by OncologIQ, a dedicated software package. We used a linear regression and a linear mixed-effects model to look at absolute differences between both predictions and possible learning effects. Predictions made by the physicians were optimistic and inaccurate. Using the linear regression and linear mixed-effects models, the physicians’ learning effect showed little improvement per successive prediction. We conclude that prognostic predictions in general are imprecise. When given feedback on the model’s predicted survival, the accuracy increases, but only very modestly.
Head and neck oncology; Survival prediction; Accuracy; Learning effect; Dedicated software