Prediction error curves are increasingly used to assess and compare predictions in survival analysis. This article surveys the R package pec, which provides a set of functions for efficient computation of prediction error curves. The software implements inverse probability of censoring weights to deal with right-censored data and several variants of cross-validation to deal with the apparent error problem. In principle, all kinds of prediction models can be assessed, and the package readily supports most traditional regression modeling strategies, such as Cox regression or additive hazard regression, as well as state-of-the-art machine learning methods such as random forests, a nonparametric method that provides a promising alternative to traditional strategies in low- and high-dimensional settings. We show how the functionality of pec can be extended to as yet unsupported prediction models. As an example, we implement support for random forest prediction models based on the R packages randomSurvivalForest and party. Using data from the Copenhagen Stroke Study, we use pec to compare random forests to a Cox regression model derived from stepwise variable selection. Reproducible results on the user level are given for publicly available data from the German breast cancer study group.
Survival prediction; prediction error curves; random survival forest; R.
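The weighting idea behind prediction error curves can be illustrated with a minimal sketch of an inverse probability of censoring weighted Brier score at a single time point. This is not the pec implementation: the function names `censoring_km` and `ipcw_brier` are hypothetical, the censoring distribution is estimated with a plain Kaplan-Meier estimator, and tied event and censoring times are assumed to be absent.

```python
def censoring_km(times, events):
    """Kaplan-Meier estimate of the censoring survival function G.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if the time is censored
    Returns a function G(t, left=False) giving G(t), or G(t-) when left=True.
    """
    # Distinct censoring times in increasing order
    cens_times = sorted({t for t, e in zip(times, events) if e == 0})
    steps = []  # pairs of (censoring time, multiplicative factor)
    for u in cens_times:
        at_risk = sum(1 for t in times if t >= u)
        n_cens = sum(1 for t, e in zip(times, events) if t == u and e == 0)
        steps.append((u, 1.0 - n_cens / at_risk))

    def G(t, left=False):
        prob = 1.0
        for u, factor in steps:
            if u < t or (u == t and not left):
                prob *= factor
        return prob

    return G


def ipcw_brier(times, events, surv_pred, t):
    """IPCW Brier score at time t.

    surv_pred[i] is the predicted probability that subject i survives beyond t.
    """
    G = censoring_km(times, events)
    total = 0.0
    for T, e, s in zip(times, events, surv_pred):
        if T <= t and e == 1:
            # Event before t: true survival status is 0, weight 1 / G(T-)
            total += (0.0 - s) ** 2 / G(T, left=True)
        elif T > t:
            # Still at risk at t: true survival status is 1, weight 1 / G(t)
            total += (1.0 - s) ** 2 / G(t)
        # Subjects censored before t have unknown status and get weight zero
    return total / len(times)
```

Without censoring, all weights equal one and the score reduces to the ordinary Brier score. Evaluating `ipcw_brier` over a grid of time points would trace out a prediction error curve for one model.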
Prediction of cumulative incidences is often a primary goal in clinical studies with several end-points. We compare predictions among competing risks models with time-dependent covariates. For a series of landmark time points, we study the predictive accuracy of a multi-state regression model, where the time-dependent covariate represents an intermediate state, and of two alternative landmark approaches (Cortese & Andersen, 2010). At each landmark time point, the prediction performance is measured as the t-year expected Brier score, where pseudovalues are constructed in order to deal with right-censored event times. We apply the methods to data from a bone marrow transplant study where graft versus host disease (GvHD) is considered a time-dependent covariate for predicting relapse and death in remission.
Bone marrow transplant studies; Brier score; Competing risks; Prediction models; Pseudovalues; Time-dependent covariates
Targeted interventions for the long-term sick-listed may prevent permanent exclusion from the labour force. We aimed to develop a prediction method for identifying high risk groups for continued or recurrent long-term sickness absence, unemployment, or disability among persons on long-term sick leave.
We obtained individual characteristics and follow-up data from the Danish Register of Sickness Absence Compensation Benefits and Social Transfer Payments (RSS) during 2004 to 2010 for 189,279 Danes who experienced a period of long-term sickness absence (4+ weeks). In a learning data set, statistical prediction methods were built using logistic regression and a discrete event simulation approach for a one year prediction horizon. Personalized risk profiles were obtained for five outcomes: employment, unemployment, recurrent sickness absence, continuous long-term sickness absence, and early retirement from the labour market. Predictor variables included gender, age, socio-economic position, job type, chronic disease status, history of sickness absence, and prior history of unemployment. Separate models were built for times of economic growth (2005–2007) and times of recession (2008–2010). The accuracy of the prediction models was assessed with analyses of Receiver Operating Characteristic (ROC) curves and the Brier score in an independent validation data set.
In comparison with a null model which ignored the predictor variables, logistic regression achieved only moderate prediction accuracy for the five outcome states. Results obtained with discrete event simulation were comparable with logistic regression.
Only moderate prediction accuracy could be achieved using the selected information from the Danish register RSS. Other variables need to be included in order to establish a prediction method which provides more accurate risk profiles for long-term sick-listed persons.
Labour market; Long-term sick-listed; Risk profiling; Logistic regression; Discrete event simulation; Register data; Registry
Using a large, contemporary primary care population we aimed to provide absolute long-term risks of cardiovascular death (CVD) based on the QTc interval and to test whether the QTc interval is of value in risk prediction of CVD on an individual level.
Methods and results
Digital electrocardiograms from 173 529 primary care patients aged 50–90 years were collected during 2001–11. The Framingham formula was used for heart rate-correction of the QT interval. Data on medication, comorbidity, and outcomes were retrieved from administrative registries. During a median follow-up period of 6.1 years, 6647 persons died from cardiovascular causes. Long-term risks of CVD were estimated for subgroups defined by age, gender, cardiovascular disease, and QTc interval categories. In general, we observed an increased risk of CVD for both very short and long QTc intervals. Prolongation of the QTc interval resulted in the worst prognosis for men whereas in women, a very short QTc interval was equivalent in risk to a borderline prolonged QTc interval. The effect of the QTc interval on the absolute risk of CVD was most pronounced in the elderly and in those with cardiovascular disease whereas the effect was negligible for middle-aged women without cardiovascular disease. The most important improvement in prediction accuracy was noted for women aged 70–90 years. In this subgroup, a total of 9.5% were reclassified (7.2% more accurately vs. 2.3% more inaccurately) within clinically relevant 5-year risk groups when the QTc interval was added to a conventional risk model for CVD.
Important differences were observed across subgroups when the absolute long-term risk of CVD was estimated based on QTc interval duration. The accuracy of the personalized CVD prognosis can be improved when the QTc interval is introduced to a conventional risk model for CVD.
QTc interval; Gender; Marquette 12SL validation; Cardiovascular death; Risk prediction
The concordance probability is a widely used measure to assess discrimination of prognostic models with binary and survival endpoints. We formally define the concordance probability for a prognostic model of the absolute risk of an event of interest in the presence of competing risks and relate it to recently proposed time-dependent area under the receiver operating characteristic curve measures. For right-censored data, we investigate inverse probability of censoring weighted (IPCW) estimates of a truncated concordance index based on a working model for the censoring distribution. We demonstrate consistency and asymptotic normality of the IPCW estimate if the working model is correctly specified and derive an explicit formula for the asymptotic variance under independent censoring. The small sample properties of the estimator are assessed in a simulation study, including under misspecification of the working model. We further illustrate the methods by computing the concordance probability for a prognostic model of coronary heart disease (CHD) events in the presence of the competing risk of non-CHD death.
C index; Competing risks; Concordance probability; Coronary heart disease; Prognostic models; Time-dependent AUC
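A truncated concordance index with competing risks can be sketched for the simplest case of complete data, where no IPCW weights are needed. The pair definition used here is an assumption, since the abstract does not spell it out: subject i must have the event of interest (cause 1) by time tau, and subject j is comparable if it is still event-free at that time or has already had a competing event (cause 2). The function name `truncated_concordance` is hypothetical.

```python
def truncated_concordance(times, causes, risks, tau):
    """Truncated concordance for the event of interest, assuming no censoring.

    times  : event times
    causes : 1 = event of interest, 2 = competing event
    risks  : predicted absolute risks of the event of interest
    tau    : truncation time
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # Subject i must experience the event of interest no later than tau
        if causes[i] != 1 or times[i] > tau:
            continue
        for j in range(n):
            if j == i:
                continue
            # Subject j is comparable if it outlives i or had a competing event
            if times[i] < times[j] or causes[j] == 2:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count one half
    if comparable == 0:
        raise ValueError("no comparable pairs up to tau")
    return concordant / comparable
```

A value of 1 means that every subject with an early event of interest received a higher predicted risk than its comparable partners, and 0.5 corresponds to uninformative predictions. With right-censored data, each comparable pair would additionally need a weight derived from the estimated censoring distribution.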
The performance of prediction models can be assessed using a variety of different methods and metrics. Traditional measures for binary and survival outcomes include the Brier score to indicate overall model performance, the concordance (or c) statistic for discriminative ability (or area under the receiver operating characteristic (ROC) curve), and goodness-of-fit statistics for calibration.
Several new measures have recently been proposed that can be seen as refinements of discrimination measures, including variants of the c statistic for survival, reclassification tables, net reclassification improvement (NRI), and integrated discrimination improvement (IDI). Moreover, decision–analytic measures have been proposed, including decision curves to plot the net benefit achieved by making decisions based on model predictions.
We aimed to define the role of these relatively novel approaches in the evaluation of the performance of prediction models. For illustration we present a case study of predicting the presence of residual tumor versus benign tissue in patients with testicular cancer (n=544 for model development, n=273 for external validation).
We suggest that reporting discrimination and calibration will always be important for a prediction model. Decision-analytic measures should be reported if the predictive model is to be used for making clinical decisions. Other measures of performance may be warranted in specific applications, such as reclassification metrics to gain insight into the value of adding a novel predictor to an established model.
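Two of the measures discussed above, the Brier score and the net benefit plotted in a decision curve, can be sketched for a binary outcome as follows. The function names are hypothetical, and classifying a patient as positive when the predicted probability is at least the threshold (rather than strictly above it) is an assumed convention.

```python
def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def net_benefit(y_true, probs, threshold):
    """Net benefit of treating everyone whose predicted risk reaches the threshold.

    True positives count as benefit; false positives are penalized by the
    odds of the threshold, threshold / (1 - threshold).
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def decision_curve(y_true, probs, thresholds):
    """Net benefit of the model and of the treat-all strategy per threshold."""
    treat_all = [1.0] * len(y_true)
    return [
        (pt, net_benefit(y_true, probs, pt), net_benefit(y_true, treat_all, pt))
        for pt in thresholds
    ]
```

A model is useful for decision making at a given threshold only if its net benefit exceeds both the treat-all value and zero, the net benefit of treating no one.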
Plasmodium falciparum malaria kills nearly a million people annually. Over 90% of these deaths occur in children under five years of age in sub-Saharan Africa. A neutrophil-mediated mechanism, the antibody dependent respiratory burst (ADRB), was recently shown to correlate with protection from clinical malaria. Human neutrophils constitutively express the Fc gamma receptors FcγRIIA and FcγRIIIB, through which they interact with immunoglobulin G (IgG) subclass antibodies. Polymorphisms in exon 4 of FCGR2A and exon 3 of FCGR3B, the genes encoding FcγRIIA and FcγRIIIB respectively, have been described to alter the affinities of both receptors for IgG. Here, associations of specific polymorphisms encoding the FcγRIIA p.H166R and FcγRIIIB-NA1/NA2/SH variants with clinical malaria were investigated in a longitudinal malaria cohort study. FcγRIIA-p.166H/R was genotyped by gene specific polymerase chain reaction followed by allele specific restriction enzyme digestion. FCGR3B-exon 3 was sequenced in 585 children aged 1 to 12 years living in a malaria endemic region of Ghana. Multivariate logistic regression analysis found no association between the FcγRIIA-166H/R polymorphism and clinical malaria. The A-allele of FCGR3B-c.233C>A (rs5030738) was significantly associated with protection from clinical malaria under two out of three genetic models of inheritance (additive: p = 0.0061; recessive: p = 0.097; dominant: p = 0.0076). The FcγRIIIB-SH allotype (CTGAAA) containing the 233A-allele was associated with protection from malaria (p = 0.049). The FcγRIIIB-NA2*03 allotype (CTGCGA), a variant of the classical FcγRIIIB-NA2 (CTGCAA), was associated with susceptibility to clinical malaria (p = 0.0092). The present study is the first to report an association between a variant of FcγRIIIB-NA2 and susceptibility to clinical malaria and provides justification for further functional characterization of variants of the classical FcγRIIIB allotypes. This would be crucial to the improvement of neutrophil-mediated functional assays such as the ADRB assay aimed at assessing the functionality of antibodies induced by candidate malaria vaccines.
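The three genetic models of inheritance differ only in how a genotype enters the regression as a covariate. The sketch below shows the conventional coding in terms of the number of copies of the tested allele (here the A-allele); it is an illustration of the standard definitions, not a reproduction of the study's analysis code, and the function name `encode_genotype` is hypothetical.

```python
def encode_genotype(n_copies, model):
    """Covariate value for a genotype carrying n_copies (0, 1 or 2) of the tested allele.

    additive  : effect grows with each copy (0, 1, 2)
    dominant  : one copy is enough to have the full effect (0, 1, 1)
    recessive : two copies are needed to have an effect (0, 0, 1)
    """
    if n_copies not in (0, 1, 2):
        raise ValueError("a diploid genotype carries 0, 1 or 2 copies")
    if model == "additive":
        return n_copies
    if model == "dominant":
        return 1 if n_copies >= 1 else 0
    if model == "recessive":
        return 1 if n_copies == 2 else 0
    raise ValueError("unknown genetic model: " + model)


# Genotypes C/C, C/A and A/A carry 0, 1 and 2 copies of the A-allele
codings = {
    model: [encode_genotype(k, model) for k in (0, 1, 2)]
    for model in ("additive", "dominant", "recessive")
}
```

Each coding would then be used as the predictor for the polymorphism in a separate logistic regression, which is why one variant can yield three different p-values.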
Recently meta analysis has been widely utilized to combine information across multiple studies to evaluate a common effect. Integrating data from similar studies is particularly useful in genomic studies where the individual study sample sizes are not large relative to the number of parameters of interest. In this paper, we are interested in developing robust prognostic rules for the prediction of t-year survival based on multiple studies. We propose to construct a composite score for prediction by fitting a stratified semiparametric transformation model that allows the studies to have related but not identical outcomes. To evaluate the accuracy of the resulting score, we provide point and interval estimators for the commonly used accuracy measures including the time-specific ROC curves, and positive and negative predictive values. We apply the proposed procedures to develop prognostic rules for the 5-year survival of breast cancer patients based on five breast cancer genomic studies.
Biomarker; Classification; Conditional Kaplan-Meier; Meta Analysis; Nonparametric Maximum Likelihood; Predictive Values; Prognosis; ROC; Survival Analysis
Analyzing data obtained from genome-wide gene expression experiments is challenging due to the quantity of variables, the need for multivariate analyses, and the demands of managing large amounts of data. Here we present the R package pcaGoPromoter, which facilitates the interpretation of genome-wide expression data and overcomes the aforementioned problems. In the first step, principal component analysis (PCA) is applied to survey any differences between experiments and possible groupings. The next step is the interpretation of the principal components with respect to both biological function and regulation by predicted transcription factor binding sites. The robustness of the results is evaluated using cross-validation, and illustrative plots of PCA scores and gene ontology terms are available. pcaGoPromoter works with any platform that uses gene symbols or Entrez IDs as probe identifiers. In addition, support for several popular Affymetrix GeneChip platforms is provided. To illustrate the features of the pcaGoPromoter package, a serum stimulation experiment was performed and the genome-wide gene expression in the resulting samples was profiled using the Affymetrix Human Genome U133 Plus 2.0 chip. Array data were analyzed using pcaGoPromoter package tools, resulting in a clear separation of the experiments into three groups: controls, serum only and serum with inhibitor. Functional annotation of the axes in the PCA score plot showed the expected serum-promoted biological processes, e.g., cell cycle progression and the predicted involvement of expected transcription factors, including E2F. In addition, unexpected results, e.g., cholesterol synthesis in serum-depleted cells and NF-κB activation in inhibitor treated cells, were noted. In summary, the pcaGoPromoter R package provides a collection of tools for analyzing gene expression data. These tools give an overview of the input data via PCA, functional interpretation by gene ontology terms (biological processes), and an indication of the involvement of possible transcription factors.
Testicular dysgenesis syndrome (TDS) is a common disease that links testicular germ cell cancer, cryptorchidism and some cases of hypospadias and male infertility with impaired development of the testis. The incidence of these disorders has increased over the last few decades, and testicular cancer now affects 1% of the Danish and Norwegian male population.
To identify genetic variants that span the four TDS phenotypes, the authors performed a genome-wide association study (GWAS) using Affymetrix Human SNP Array 6.0 to screen 488 patients with symptoms of TDS and 439 selected controls with excellent reproductive health. Furthermore, they developed a novel integrative method that combines GWAS data with other TDS-relevant data types and identified additional TDS markers. The most significant findings were replicated in an independent cohort of 671 Nordic men.
Markers located in the region of TGFBR3 and BMP7 showed association with all TDS phenotypes in both the discovery and replication cohorts. An immunohistochemistry investigation confirmed the presence of transforming growth factor β receptor type III (TGFBR3) in peritubular and Leydig cells, in both fetal and adult testis. Single-nucleotide polymorphisms in the KITLG gene showed significant associations, but only with testicular cancer.
The association of single-nucleotide polymorphisms in the TGFBR3 and BMP7 genes, which belong to the transforming growth factor β signalling pathway, suggests a role for this pathway in the pathogenesis of TDS. Integrating data from multiple layers can highlight findings in GWAS that are biologically relevant despite having only borderline significance at currently accepted statistical levels.
TDS; systems biology; GWAS; infertility; testis cancer; reproductive medicine; genome-wide; genetics; epidemiology; diabetes; endocrinology; genetic epidemiology; cancer: urological; chromosomal; oncology; developmental
In applied statistics, tools from machine learning are popular for analyzing complex and high-dimensional data. However, few theoretical results are available that could guide the choice of an appropriate machine learning tool in a new application. Initial development of an overall strategy thus often implies that multiple methods are tested and compared on the same set of data. This is particularly difficult in situations that are prone to over-fitting, where the number of subjects is low compared to the number of potential predictors. The article presents a game which provides some grounds for conducting a fair model comparison. Each player selects a modeling strategy for predicting individual response from potential predictors. A strictly proper scoring rule, bootstrap cross-validation, and a set of rules are used to make the results obtained with different strategies comparable. To illustrate the ideas, the game is applied to data from the Nugenob Study, where the aim is to predict the fat oxidation capacity based on conventional factors and high-dimensional metabolomics data. Three players have chosen to use support vector machines, LASSO, and random forests, respectively.
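The core of such a comparison can be sketched as bootstrap cross-validation in which every strategy is trained and scored on exactly the same splits. The sketch below is an illustration under stated assumptions rather than the rules of the game described above: the response is taken to be binary so that the Brier score can serve as the strictly proper scoring rule, and the names `bootstrap_cv`, `fit_null` and `fit_split` as well as the two toy strategies are hypothetical.

```python
import random


def brier(prob, y):
    """Brier score contribution of one prediction for a binary outcome."""
    return (prob - y) ** 2


def bootstrap_cv(data, strategies, score, n_boot=100, seed=0):
    """Average out-of-bag score of each strategy over shared bootstrap splits.

    data       : list of (x, y) pairs
    strategies : dict mapping a name to fit(train) -> predict(x)
    score      : function score(prediction, y), lower is better
    """
    rng = random.Random(seed)
    n = len(data)
    totals = {name: 0.0 for name in strategies}
    used = 0
    for _ in range(n_boot):
        # Draw the training set with replacement; the rest is the test set
        in_bag = [rng.randrange(n) for _ in range(n)]
        in_bag_set = set(in_bag)
        out_of_bag = [i for i in range(n) if i not in in_bag_set]
        if not out_of_bag:
            continue
        train = [data[i] for i in in_bag]
        test = [data[i] for i in out_of_bag]
        # Every strategy sees the same training and test data in this round
        for name, fit in strategies.items():
            predict = fit(train)
            totals[name] += sum(score(predict(x), y) for x, y in test) / len(test)
        used += 1
    if used == 0:
        raise ValueError("no bootstrap sample left any out-of-bag subjects")
    return {name: total / used for name, total in totals.items()}


def fit_null(train):
    """Toy strategy: ignore x and predict the training prevalence."""
    mean = sum(y for _, y in train) / len(train)
    return lambda x: mean


def fit_split(train):
    """Toy strategy: predict the training prevalence on each side of x = 0.5."""
    overall = sum(y for _, y in train) / len(train)
    low = [y for x, y in train if x < 0.5]
    high = [y for x, y in train if x >= 0.5]
    p_low = sum(low) / len(low) if low else overall
    p_high = sum(high) / len(high) if high else overall
    return lambda x: p_high if x >= 0.5 else p_low
```

Calling `bootstrap_cv(data, {"null": fit_null, "split": fit_split}, brier)` returns one average out-of-bag score per strategy. Because the whole fitting procedure, including any tuning, is repeated inside each bootstrap sample, over-fitting in a strategy shows up as a worse out-of-bag score rather than being hidden.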
Optimal management of colon cancer (CC) requires detailed assessment of extent of disease. This study prospectively investigates the diagnostic accuracy of 2-deoxy-2-[18F]fluoro-D-glucose positron emission tomography/computed tomography (PET/CT) for staging and detection of recurrence in primary CC.
Material and methods
PET/CT for preoperative staging was performed in 66 prospectively included patients with primary CC. Diagnostic accuracy for PET/CT and CT was analyzed. In addition to routine follow up, 42 stages I–III CC patients had postoperative PET/CT examinations every 6 months for 2 years. Serological levels of tissue inhibitor of metalloproteinase-1 (TIMP-1), carcinoembryonic antigen, and liberated domain I of urokinase plasminogen activator receptor were analyzed.
The accuracies of tumor, nodal, and metastasis staging by PET/CT were 82% (95% confidence interval [CI]: 70; 91), 66% (CI: 51; 78), and 89% (CI: 79; 96); for CT the accuracies were 77% (CI: 64; 87), 60% (CI: 46; 73), and 69% (CI: 57; 80). Cumulative relapse incidences for stages I–III CC at 6, 12, 18, and 24 months were 7.1% (CI: 0; 15), 14.3% (CI: 4; 25), 19% (CI: 7; 31), and 21.4% (CI: 9; 34). PET/CT diagnosed all relapses detected during the first 2 years. High preoperative TIMP-1 levels were associated with a significantly increased hazard of recurrence and with shorter overall survival.
This study indicates PET/CT as a valuable tool for staging and follow up in CC. TIMP-1 provided prognostic information potentially useful in selection of patients for intensive follow up.
carcinoembryonic antigen; colonic neoplasms; colorectal neoplasms; neoplasm staging; positron emission tomography; prognosis; receptors; tissue inhibitor of metalloproteinase-1; urokinase plasminogen activator; X-ray computed tomography