The diagnostic likelihood ratio function, DLR, is a statistical measure used to evaluate risk prediction markers. The goal of this paper is to develop new methods to estimate the DLR function. Furthermore, we show how risk prediction markers can be compared using rank-invariant DLR functions. Various estimators are proposed that accommodate cohort or case–control study designs. Performances of the estimators are compared using simulation studies. The methods are illustrated by comparing a lung function measure and a nutritional status measure for predicting subsequent onset of major pulmonary infection in children suffering from cystic fibrosis. For continuous markers, the DLR function is mathematically related to the slope of the receiver operating characteristic (ROC) curve, an entity used to evaluate diagnostic markers. We show that our methodology can be used to estimate the slope of the ROC curve and illustrate use of the estimated ROC derivative in variance and sample size calculations for a diagnostic biomarker study.
Biomarker; density estimation; diagnosis; logistic regression; rank invariant; risk prediction; ROC–GLM
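To make the DLR concrete: at a marker value y it is the ratio of the case density to the control density there, and for a continuous marker it equals the slope of the ROC curve at the corresponding operating point. Below is a minimal kernel-density sketch on synthetic normal data; it illustrates the quantity itself, not the estimators developed in the paper.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Hypothetical synthetic marker: cases shifted upward relative to controls.
cases = rng.normal(1.0, 1.0, 500)
controls = rng.normal(0.0, 1.0, 500)

f_case = gaussian_kde(cases)      # density estimate among cases
f_ctrl = gaussian_kde(controls)   # density estimate among controls

def dlr(y):
    """Diagnostic likelihood ratio: ratio of case to control densities at y."""
    return f_case(y)[0] / f_ctrl(y)[0]

# A high marker value should carry more evidence for disease than a low one.
print(dlr(2.0) > dlr(-1.0))
```

Because the DLR is a ratio of densities, it is invariant to monotone transformations of the marker only through the ROC-slope representation, which is what makes rank-invariant comparisons possible.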
To assess the value of a continuous marker in predicting the risk of a disease, a graphical tool called the predictiveness curve has been proposed. It characterizes the marker's predictiveness, or capacity to risk-stratify the population, by displaying the distribution of risk endowed by the marker. Methods for making inference about the curve and for comparing curves in a general population have been developed. However, knowledge of a marker's performance in the general population alone is not enough. Since a marker's effect on the risk model and its distribution can both differ across subpopulations, its predictiveness may vary when applied to different subpopulations. Moreover, information about the predictiveness of a marker conditional on baseline covariates is valuable for individual decision making about having the marker measured or not. Therefore, to fully realize the usefulness of a risk prediction marker, it is important to study its performance conditional on covariates. In this article, we propose semiparametric methods for estimating covariate-specific predictiveness curves for a continuous marker. Unmatched and matched case-control study designs are accommodated. We illustrate application of the methodology by evaluating serum creatinine as a predictor of risk of renal artery stenosis.
A major biomedical goal associated with evaluating a candidate biomarker or developing a predictive model score for event-time outcomes is to accurately distinguish incident cases from controls surviving beyond t throughout the entire study period. Extensions of standard binary classification measures like time-dependent sensitivity, specificity, and receiver operating characteristic (ROC) curves have been developed in this context (Heagerty, P. J., and others, 2000. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 56, 337–344). We propose a direct, non-parametric method to estimate the time-dependent area under the curve (AUC), which we refer to as the weighted mean rank (WMR) estimator. The proposed estimator performs well relative to the semi-parametric AUC curve estimator of Heagerty and Zheng (2005. Survival model predictive accuracy and ROC curves. Biometrics 61, 92–105). We establish the asymptotic properties of the proposed estimator and show that the accuracy of markers can be compared very simply using the difference in the WMR statistics. Estimators of pointwise standard errors are provided.
AUC curve; Survival analysis; Time-dependent ROC
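For orientation, the cumulative/dynamic time-dependent AUC at time t can be written as P(Y_i > Y_j | T_i <= t < T_j): the probability that a subject failing by t has a higher marker than one surviving beyond t. The sketch below computes the uncensored empirical version on synthetic data; the WMR estimator itself additionally weights the comparisons to handle censoring, which is omitted here.

```python
import numpy as np

def cumulative_dynamic_auc(marker, event_time, t):
    """Uncensored cumulative/dynamic AUC at time t:
    P(marker_i > marker_j | T_i <= t < T_j), ties counted as 1/2."""
    case = marker[event_time <= t]   # failed by t
    ctrl = marker[event_time > t]    # still event-free at t
    diff = case[:, None] - ctrl[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(case) * len(ctrl))

rng = np.random.default_rng(1)
x = rng.normal(size=400)
# Toy model: higher marker implies a higher hazard, hence shorter survival.
times = rng.exponential(scale=np.exp(-x))
print(round(cumulative_dynamic_auc(x, times, t=1.0), 2))
```

Comparing two markers then amounts to differencing two such statistics at each t, which is the simplicity the abstract alludes to.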
Classification accuracy is the ability of a marker or diagnostic test to discriminate between two groups of individuals, cases and controls, and is commonly summarized using the receiver operating characteristic (ROC) curve. In studies of classification accuracy, there are often covariates that should be incorporated into the ROC analysis. We describe three different ways of using covariate information. For factors that affect marker observations among controls, we present a method for covariate adjustment. For factors that affect discrimination (i.e. the ROC curve), we describe methods for modelling the ROC curve as a function of covariates. Finally, for factors that contribute to discrimination, we propose combining the marker and covariate information, and ask how much discriminatory accuracy improves with the addition of the marker to the covariates (incremental value). These methods follow naturally when representing the ROC curve as a summary of the distribution of case marker observations, standardized with respect to the control distribution.
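The standardization idea in the last sentence can be made concrete: transform each case observation into its placement value in the control distribution (the fraction of controls exceeding it), and the ROC curve is then simply the CDF of those placement values. A minimal sketch on synthetic normal data:

```python
import numpy as np

rng = np.random.default_rng(2)
cases = rng.normal(1.0, 1.0, 300)
controls = rng.normal(0.0, 1.0, 300)

# Placement value: fraction of controls at or above each case observation,
# i.e. the case marker standardized to the control reference distribution.
pv = np.array([(controls >= y).mean() for y in cases])

def roc(t):
    """Empirical ROC: the CDF of the placement values, evaluated at FPR t."""
    return (pv <= t).mean()

print(round(roc(0.2), 2))  # sensitivity when the false positive rate is 20%
```

Covariate adjustment then amounts to computing placement values within covariate-specific control reference distributions.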
The receiver operating characteristic (ROC) curve, the positive predictive value (PPV) curve and the negative predictive value (NPV) curve are three measures of performance for a continuous diagnostic biomarker. The ROC, PPV and NPV curves are often estimated empirically to avoid assumptions about the distributional form of the biomarkers. Recently, there has been a push to incorporate group sequential methods into the design of diagnostic biomarker studies. A thorough understanding of the asymptotic properties of the sequential empirical ROC, PPV and NPV curves will provide more flexibility when designing group sequential diagnostic biomarker studies. In this paper we derive asymptotic theory for the sequential empirical ROC, PPV and NPV curves under case-control sampling using sequential empirical process theory. We show that the sequential empirical ROC, PPV and NPV curves converge to the sum of independent Kiefer processes and show how these results can be used to derive asymptotic results for summaries of the sequential empirical ROC, PPV and NPV curves.
Group Sequential Methods; Empirical Process Theory; Diagnostic Testing
This paper considers receiver operating characteristic (ROC) analysis for bivariate marker measurements. The research interest is to extend tools and rules from the univariate marker setting to the bivariate marker setting for evaluating predictive accuracy of markers using a tree-based classification rule. Using an and-or classifier, an ROC function together with a weighted ROC function (WROC) and their conjugate counterparts are proposed for examining the performance of bivariate markers. The proposed functions evaluate the performance of and-or classifiers among all possible combinations of marker values, and are ideal measures for understanding the predictability of biomarkers in the target population. Specific features of the ROC and WROC functions and other related statistics are discussed in comparison with the familiar properties for a univariate marker. Nonparametric methods are developed for estimating ROC-related functions, (partial) area under the curve and concordance probability. With emphasis on average performance of markers, the proposed procedures and inferential results are useful for evaluating marker predictability based on single or bivariate marker (or test) measurements with different choices of markers, and for evaluating different and-or combinations in classifiers. The inferential results developed in this paper also extend to multivariate markers with a sequence of arbitrarily combined and-or classifiers.
Concordance probability; Prediction accuracy; Tree-based classification; U-statistics
To develop more targeted intervention strategies, an important research goal is to identify markers predictive of clinical events. A crucial step towards this goal is to characterize the clinical performance of a marker for predicting different types of events. In this manuscript, we present statistical methods for evaluating the performance of a prognostic marker in predicting multiple competing events. To capture the potential time-varying predictive performance of the marker and incorporate competing risks, we define time- and cause-specific accuracy summaries by stratifying cases based on causes of failure. Such a definition allows one to evaluate the predictive accuracy of a marker for each type of event and compare its predictiveness across event types. Extending the nonparametric crude cause-specific ROC curve estimators by Saha and Heagerty (2010), we develop inference procedures for a range of cause-specific accuracy summaries. To estimate the accuracy measures and assess how covariates may affect the accuracy of a marker under the competing risk setting, we consider two forms of semiparametric models through the cause-specific hazard framework. These approaches enable a flexible modeling of the relationships between the marker and failure times for each cause, while efficiently accommodating additional covariates. We investigate the asymptotic properties of the proposed accuracy estimators and demonstrate the finite sample performance of these estimators through simulation studies. The proposed procedures are illustrated with data from a prostate cancer prognostic study.
Biomarker evaluation; Cause-specific Hazard; Competing risk; Negative predictive value; Positive predictive value; Receiver Operating Characteristics Curve (ROC curve); Survival analysis
The receiver operating characteristic (ROC) curve is used to evaluate a biomarker's ability for classifying disease status. The Youden Index (J), the maximum potential effectiveness of a biomarker, is a common summary measure of the ROC curve. In biomarker development, levels may be unquantifiable below a limit of detection (LOD) and missing from the overall dataset. Disregarding these observations may negatively bias the ROC curve and thus J. Several correction methods have been suggested for mean estimation and testing; however, little has been written about the ROC curve or its summary measures. We adapt non-parametric (empirical) and semi-parametric (ROC-GLM [generalized linear model]) methods and propose parametric methods (maximum likelihood (ML)) to estimate J and the optimal cut-point (c*) for a biomarker affected by a LOD. We develop unbiased estimators of J and c* via ML for normally and gamma distributed biomarkers. Alpha-level confidence intervals are proposed using delta and bootstrap methods for the ML, semi-parametric, and non-parametric approaches, respectively. Simulation studies are conducted over a range of distributional scenarios and sample sizes evaluating the estimators' bias, root-mean-square error, and coverage probability; the average bias was less than one percent for the ML and GLM methods across scenarios and decreased with increased sample size. An example using polychlorinated biphenyl levels to classify women with and without endometriosis illustrates the potential benefits of these methods. We address the limitations and usefulness of each method in order to give researchers guidance in constructing appropriate estimates of biomarkers' true discriminating capabilities.
Youden Index; ROC curve; Sensitivity and Specificity; Optimal Cut-Point
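Setting aside the LOD complication that the paper addresses, the complete-data empirical Youden index and optimal cut-point that its corrections build upon can be sketched as follows (synthetic data, with positivity defined as marker >= cut-point):

```python
import numpy as np

def youden(cases, controls):
    """Empirical Youden index J = max(sensitivity + specificity - 1)
    and the cut-point c* attaining it."""
    cuts = np.unique(np.concatenate([cases, controls]))
    sens = np.array([(cases >= c).mean() for c in cuts])
    spec = np.array([(controls < c).mean() for c in cuts])
    j = sens + spec - 1.0
    best = j.argmax()
    return j[best], cuts[best]

rng = np.random.default_rng(3)
j, cstar = youden(rng.normal(1.5, 1.0, 400), rng.normal(0.0, 1.0, 400))
```

With censoring at an LOD, the cut candidates below the LOD become unobservable, which is exactly where the ML and GLM corrections described in the abstract come in.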
In estimation of the ROC curve, when the true disease status is subject to nonignorable missingness, the observed likelihood involves the missing mechanism given by a selection model. In this paper, we propose a likelihood-based approach to estimate the ROC curve and the area under the ROC curve when the verification bias is nonignorable. We specify a parametric disease model in order to make the nonignorable selection model identifiable. With the estimated verification and disease probabilities, we construct four types of empirical estimates of the ROC curve and its area based on imputation and reweighting methods. In practice, a reasonably large sample size is required to estimate the nonignorable selection model in our settings. Simulation studies showed that all four estimators of the ROC area performed well, and the imputation estimators were generally more efficient than the other estimators proposed. We applied the proposed method to a dataset from Alzheimer's disease research.
Alzheimer’s disease; nonignorable missing data; ROC curve; verification bias
The receiver operating characteristic (ROC) curve displays the capacity of a marker or diagnostic test to discriminate between two groups of subjects, cases versus controls. We present a comprehensive suite of Stata commands for performing ROC analysis. Nonparametric, semiparametric and parametric estimators are calculated. Comparisons between curves are based on the area or partial area under the ROC curve. Alternatively, pointwise comparisons between ROC curves or inverse ROC curves can be made. Options to adjust these analyses for covariates and to perform ROC regression are described in a companion article. We use a unified framework by representing the ROC curve as the distribution of the marker in cases after standardizing it to the control reference distribution.
Two different approaches to analysis of data from diagnostic biomarker studies are commonly employed. Logistic regression is used to fit models for probability of disease given marker values, while ROC curves and risk distributions are used to evaluate classification performance. In this paper we present a method that simultaneously accomplishes both tasks. The key step is to standardize markers relative to the non-diseased population before including them in the logistic regression model. Among the advantages of this method are: (i) ensuring that results from regression and performance assessments are consistent with each other; (ii) allowing covariate adjustment and covariate effects on ROC curves to be handled in a familiar way; and (iii) providing a mechanism to incorporate important assumptions about structure in the ROC curve into the fitted risk model. We develop the method in detail for the problem of combining biomarker datasets derived from multiple studies, populations or biomarker measurement platforms, when ROC curves are similar across data sources. The methods are applicable to both cohort and case-control sampling designs. The dataset motivating this application concerns Prostate Cancer Antigen 3 (PCA3) for diagnosis of prostate cancer in patients with or without previous negative biopsy, where the ROC curves for PCA3 are found to be the same in the two populations. Constrained maximum likelihood and empirical likelihood estimators are derived. The estimators are compared in simulation studies and the methods are illustrated with the PCA3 dataset.
constrained likelihood; empirical likelihood; logistic regression; predictiveness curve; ROC curve
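The key step described above, standardizing the marker to the non-diseased population before logistic regression, can be sketched on synthetic data. A plain Newton-Raphson fit stands in here for whatever regression software one would actually use; the simulated marker and coefficients are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
d = rng.binomial(1, 0.3, n)          # disease status
y = rng.normal(0.8 * d, 1.0)         # marker, shifted upward in cases

ref = np.sort(y[d == 0])             # non-diseased reference sample
# Standardize every marker value to its percentile among controls.
x = np.searchsorted(ref, y, side="right") / len(ref)

# Newton-Raphson logistic fit of disease on the standardized marker.
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    W = p * (1 - p)
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (d - p))
```

Because the covariate is now a control percentile, the fitted risk model and the ROC curve live on a common scale, which is what makes the two analyses mutually consistent.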
It has become commonplace to use receiver operating characteristic (ROC) curve methodology to evaluate the incremental predictive accuracy of new markers in the presence of existing predictors. However, concerns have been raised about the validity of this practice. We have evaluated this issue in detail.
Simulations have been used that show clearly that use of risk predictors from nested models as data in subsequent tests comparing areas under the ROC curves of the models leads to grossly invalid inferences. Careful examination of the issue reveals two major problems: (1) the data elements are strongly correlated from case to case; and (2) the model that includes the additional marker has a tendency to interpret predictive contributions as positive information regardless of whether the observed effect of the marker is negative or positive. Both of these phenomena lead to profound bias in the test.
We recommend strongly against the use of ROC methods derived from risk predictors from nested regression models to test the incremental information of a new marker.
Inherited variability in genes that influence androgen metabolism has been associated with risk of prostate cancer. The objective of this analysis was to evaluate interactions for prostate cancer risk using classification and regression tree (CART) models (i.e. decision trees), and to evaluate whether these interactive effects add information about prostate cancer risk prediction beyond that of “traditional” risk factors.
We compared CART models to traditional logistic regression models for associations of factors with prostate cancer risk using 1084 prostate cancer cases and 941 controls. All analyses were stratified by race. We used unconditional logistic regression (LR) to complement the race-stratified CART results and compared the two approaches using the area under the curve (AUC) of the receiver operating characteristic (ROC) curves.
The CART modeling of prostate cancer risk showed different interaction profiles by race. For European Americans, interactions among CYP3A43 genotype, history of benign prostate hypertrophy, family history of prostate cancer and age at consent revealed a distinct hierarchy of gene-environment and gene-gene interactions. For African Americans, interactions among family history of prostate cancer, individual proportion of European ancestry, number of GGC AR repeats and CYP3A4/CYP3A5 haplotype revealed interaction effects distinct from those found in European Americans. For European Americans, the CART model had the highest AUC, while for African Americans, the LR model with the CART-discovered factors had the largest AUC.
Conclusion & Impact
These results provide new insight into underlying prostate cancer biology for European Americans and African Americans.
Decision tree; classification and regression tree (CART); androgen pathway; prostate cancer risk; ancestry
Consider a continuous marker for predicting a binary outcome. For example, serum concentration of prostate specific antigen (PSA) may be used to calculate the risk of finding prostate cancer in a biopsy. In this paper we argue that the predictive capacity of a marker has to do with the population distribution of risk given the marker and suggest a graphical tool, the predictiveness curve, that displays this distribution. The display provides a common meaningful scale for comparing markers that may not be comparable on their original scales. Some existing measures of predictiveness are shown to be summary indices derived from the predictiveness curve. We develop methods for making inference about the predictiveness curve, for making pointwise comparisons between two curves and for evaluating covariate effects. Applications to risk prediction markers in cancer and cystic fibrosis are discussed.
risk; classification; explained variation; biomarker; ROC curve; prediction
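A minimal illustration of the predictiveness curve: R(v), the risk attained at the v-th quantile of the marker, displays the population distribution of risk. The sketch below uses the true risk model on simulated data in place of an estimated one, purely for brevity; in practice the risks would come from a fitted model.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000
y = rng.normal(size=n)
risk_true = 1.0 / (1.0 + np.exp(-(-1.0 + 1.2 * y)))   # toy risk model
d = rng.binomial(1, risk_true)

# Empirical predictiveness curve: the v-th quantile of the risks
# (requires 0 < v <= 1).
risks = np.sort(risk_true)
def R(v):
    return risks[int(np.ceil(v * n)) - 1]

# Plotting R(v) over v in (0, 1) displays the distribution of risk;
# a steep curve indicates a marker that risk-stratifies the population well.
```

A useful sanity check is that the area under the curve equals the disease prevalence, since the average of the risks must match the event rate.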
The receiver operating characteristic (ROC) curve has been a popular statistical tool for characterizing the discriminating power of a classifier, such as a biomarker or an imaging modality for disease screening or diagnosis. It has been recognized that the accuracy of a given procedure may depend on some underlying factors, such as a subject's demographic characteristics or disease risk factors, among others. Nonparametric or parametric methods tend to be either inefficient or cumbersome when evaluating the effect of multiple covariates is the main focus. Here we propose a semi-parametric linear regression framework to model covariate effects. It allows the estimated sensitivity at a given specificity to vary according to the covariates and provides a way to model the area under the ROC curve indirectly. The estimation procedure and asymptotic theory are presented. Extensive simulation studies have been conducted to investigate the validity of the proposed method. We illustrate the new method on a diagnostic test dataset.
AUC; Covariate effect; Linear regression; ROC curve; Sensitivity
Rationale and Objectives
Semiparametric methods provide smooth and continuous receiver operating characteristic (ROC) curve fits to ordinal test results and require only that the data follow some unknown monotonic transformation of the model's assumed distributions. The quantitative relationship between cutoff settings or individual test-result values on the data scale and points on the estimated ROC curve is lost in this procedure, however. To recover that relationship in a principled way, we propose a new algorithm for “proper” ROC curves and illustrate it by use of the proper binormal model.
Materials and Methods
Several authors have proposed the use of multinomial distributions to fit semiparametric ROC curves by maximum-likelihood estimation. The resulting approach requires nuisance parameters that specify interval probabilities associated with the data, which are used subsequently as a basis for estimating values of the curve parameters of primary interest. In the method described here, we employ those “nuisance” parameters to recover the relationship between any ordinal test-result scale and true-positive fraction, false-positive fraction, and likelihood ratio. Computer simulations based on the proper binormal model were used to evaluate our approach in estimating those relationships and to assess the coverage of its confidence intervals for realistically sized datasets.
In our simulations, the method reliably estimated simple relationships between test-result values and the several ROC quantities.
The proposed approach provides an effective and reliable semiparametric method with which to estimate the relationship between cutoff settings or individual test-result values and corresponding points on the ROC curve.
Receiver operating characteristic (ROC) analysis; proper binormal model; likelihood ratio; test-result scale; maximum likelihood estimation (MLE)
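For orientation, the conventional (not proper) binormal model expresses the ROC curve as ROC(t) = Phi(a + b * Phi^{-1}(t)), with the closed-form AUC Phi(a / sqrt(1 + b^2)); the proper binormal model of the paper additionally constrains the curve through the likelihood ratio so that it cannot exhibit improper "hooks". A sketch of the conventional form only:

```python
import numpy as np
from scipy.stats import norm

def binormal_roc(t, a, b):
    """Conventional binormal ROC: TPF at FPF t is Phi(a + b * Phi^{-1}(t))."""
    return norm.cdf(a + b * norm.ppf(t))

# Closed-form AUC under the binormal model: Phi(a / sqrt(1 + b^2)).
a, b = 1.0, 1.0
auc = norm.cdf(a / np.sqrt(1 + b**2))
```

The semiparametric fits discussed above estimate (a, b) from ordinal data via maximum likelihood, with the interval probabilities entering as nuisance parameters.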
The receiver operating characteristics (ROC) curve is a widely used tool for evaluating the discriminative and diagnostic power of a biomarker. When the biomarker value is missing for some observations, the ROC analysis based solely on the complete cases loses efficiency due to the reduced sample size, and more importantly, it is subject to potential bias. In this paper, we investigate nonparametric multiple imputation methods for ROC analysis when some biomarker values are missing at random (MAR) and there are auxiliary variables that are fully observed and predictive of biomarker values and/or missingness of biomarker values. While a direct application of standard nonparametric imputation is robust to model misspecification, its finite sample performance suffers from the curse of dimensionality as the number of auxiliary variables increases. To address this problem, we propose new nonparametric imputation methods, which achieve dimension reduction through the use of one or two working models, namely, models for prediction and propensity scores. The proposed imputation methods provide a platform for a full range of ROC analysis, and hence are more flexible than existing methods that primarily focus on estimating the area under the ROC curve (AUC). We conduct simulation studies to evaluate the finite sample performance of the proposed methods, and find that the proposed methods are robust to various types of model misspecification and outperform the standard nonparametric approach even when the number of auxiliary variables is moderate. We further illustrate the proposed methods using an observational study of maternal depression during pregnancy.
Area Under Curve; Bootstrap Methods; Dimension Reduction; Multiple Imputation; Nearest Neighbor Methods; Nonparametric Imputation; Receiver Operating Characteristics Curve
Receiver operating characteristic (ROC) curves are useful tools to evaluate classifiers in biomedical and bioinformatics applications. However, conclusions are often reached through inconsistent use or insufficient statistical analysis. To support researchers in their ROC curve analysis we developed pROC, a package for R and S+ that contains a set of tools for displaying, analyzing, smoothing and comparing ROC curves in a user-friendly, object-oriented and flexible interface.
With data previously imported into the R or S+ environment, the pROC package builds ROC curves and includes functions for computing confidence intervals, statistical tests for comparing total or partial area under the curve or the operating points of different classifiers, and methods for smoothing ROC curves. Intermediary and final results are visualised in user-friendly interfaces. A case study based on published clinical and biomarker data shows how to perform a typical ROC analysis with pROC.
pROC is a package for R and S+ specifically dedicated to ROC analysis. It proposes multiple statistical tests to compare ROC curves, and in particular partial areas under the curve, allowing proper ROC interpretation. pROC is available in two versions: in the R programming language or with a graphical user interface in the S+ statistical software. It is accessible at http://expasy.org/tools/pROC/ under the GNU General Public License. It is also distributed through the CRAN and CSAN public repositories, facilitating its installation.
High-throughput studies have been extensively conducted in the research of complex human diseases. As a representative example, consider gene-expression studies where thousands of genes are profiled at the same time. An important objective of such studies is to rank the diagnostic accuracy of biomarkers (e.g. gene expressions) for predicting outcome variables while properly adjusting for confounding effects from low-dimensional clinical risk factors and environmental exposures. Existing approaches are often based fully on parametric or semi-parametric models and target the statistical significance of estimates rather than diagnostic accuracy. Receiver operating characteristic (ROC) approaches can be employed to tackle this problem. However, existing ROC ranking methods focus on biomarkers only and ignore effects of confounders. In this article, we propose a model-based approach which ranks the diagnostic accuracy of biomarkers using ROC measures with a proper adjustment of confounding effects. To this end, three different methods for constructing the underlying regression models are investigated. A simulation study shows that the proposed methods can accurately identify biomarkers with additional diagnostic power beyond confounders. Analysis of two cancer gene-expression studies demonstrates that adjusting for confounders can lead to substantially different rankings of genes.
ranking biomarkers; ROC; confounders; high-throughput data
In this article we propose a separation curve method to identify the range of false positive rates for which two ROC curves differ or one ROC curve is superior to the other. Our method is based on a general multivariate ROC curve model, including interaction terms between discrete covariates and false positive rates. It is applicable with most existing ROC curve models. Furthermore, we introduce a semiparametric least squares ROC estimator and apply the estimator to the separation curve method. We derive a sandwich estimator for the covariance matrix of the semiparametric estimator. We illustrate the application of our separation curve method through two real life examples.
Confidence band; Empirical distribution function; Least squares
We investigated whether metabolic biomarkers and single nucleotide polymorphisms (SNPs) improve diabetes prediction beyond age, anthropometry, and lifestyle risk factors.
RESEARCH DESIGN AND METHODS
A case-cohort study within a prospective study was designed. We randomly selected a subcohort (n = 2,500) from 26,444 participants, of whom 1,962 were diabetes free at baseline. Of the 801 incident type 2 diabetes cases identified in the cohort during 7 years of follow-up, 579 remained for analyses after exclusions. Prediction models were compared by receiver operating characteristic (ROC) curve and integrated discrimination improvement.
Case-control discrimination by the lifestyle characteristics (ROC-AUC: 0.8465) improved with plasma glucose (ROC-AUC: 0.8672, P < 0.001) and A1C (ROC-AUC: 0.8859, P < 0.001). ROC-AUC further improved with HDL cholesterol, triglycerides, γ-glutamyltransferase, and alanine aminotransferase (0.9000, P = 0.002). Twenty SNPs did not improve discrimination beyond these characteristics (P = 0.69).
Metabolic markers, but not genotyping for 20 diabetogenic SNPs, improve discrimination of incident type 2 diabetes beyond lifestyle risk factors.
Although the area under the receiver operating characteristic (ROC) curve (AUC) is the most popular measure of the performance of prediction models, it has limitations, especially when it is used to evaluate the added discrimination of a new risk marker in an existing risk model. Pencina et al. (2008) proposed two indices, the net reclassification improvement (NRI) and integrated discrimination improvement (IDI), to supplement the improvement in the AUC (IAUC). Their NRI and IDI are based on binary outcomes in case-control settings, which do not involve time-to-event outcome. However, many disease outcomes are time-dependent and the onset time can be censored. Measuring discrimination potential of a prognostic marker without considering time to event can lead to biased estimates. In this paper, we extended the NRI and IDI to time-to-event settings and derived the corresponding sample estimators and asymptotic tests. Simulation studies showed that the time-dependent NRI and IDI have better performance than Pencina’s NRI and IDI for measuring the improved discriminatory power of a new risk marker in prognostic survival models.
Improved discrimination; Prognostic survival models; Time-dependent NRI; Time-dependent IDI
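For reference, the original binary-outcome indices of Pencina et al. (2008), which the paper extends to time-to-event settings, can be sketched as follows (category-free NRI variant, with purely synthetic risk models standing in for fitted ones):

```python
import numpy as np

def nri_idi(p_old, p_new, d):
    """Category-free NRI and IDI for a binary outcome d in {0, 1}."""
    up, down = p_new > p_old, p_new < p_old
    ev, ne = d == 1, d == 0
    # NRI: net proportion of events moving up plus non-events moving down.
    nri = (up[ev].mean() - down[ev].mean()) + (down[ne].mean() - up[ne].mean())
    # IDI: improvement in mean risk separation between events and non-events.
    idi = (p_new[ev].mean() - p_old[ev].mean()) - (p_new[ne].mean() - p_old[ne].mean())
    return nri, idi

rng = np.random.default_rng(6)
n = 2000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
d = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + x1 + x2))))
p_old = 1 / (1 + np.exp(-(-1.0 + x1)))          # model without the new marker
p_new = 1 / (1 + np.exp(-(-1.0 + x1 + x2)))     # model with the new marker
```

The time-dependent versions proposed in the paper replace the binary event indicator with the (possibly censored) event status at each time t.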
Background Non-uniform reporting of relevant relationships and metrics hampers critical appraisal of the clinical utility of C-reactive protein (CRP) measurement for prediction of later coronary events.
Methods We evaluated the predictive performance of CRP in the Northwick Park Heart Study (NPHS-II) and the Edinburgh Artery Study (EAS) comparing discrimination by area under the ROC curve (AUC), calibration and reclassification. We set the findings in the context of a systematic review of published studies comparing different available and imputed measures of prediction. Risk estimates per-quantile of CRP were pooled using a random effects model to infer the shape of the CRP-coronary event relationship.
Results NPHS-II and EAS (3441 individuals, 309 coronary events): CRP alone provided modest discrimination for coronary heart disease (AUC 0.61 and 0.62 in NPHS-II and EAS, respectively) and only modest improvement in the discrimination of a Framingham-based risk score (FRS) (increment in AUC 0.04 and –0.01, respectively). Risk models based on FRS alone and FRS + CRP were both well calibrated, and the net reclassification improvement (NRI) was 8.5% in NPHS-II and 8.8% in EAS with four risk categories, falling to 4.9% and 3.0% for a 10-year coronary disease risk threshold of 15%. Systematic review (31 prospective studies, 84 063 individuals, 11 252 coronary events): pooled inferred values for the AUC for CRP alone were 0.59 (0.57, 0.61), 0.59 (0.57, 0.61) and 0.57 (0.54, 0.61) for studies of <5, 5–10 and >10 years follow-up, respectively. Evidence from 13 studies (7201 cases) indicated that CRP did not consistently improve performance of the Framingham risk score when assessed by discrimination, with AUC increments in the range 0–0.15. Evidence from six studies (2430 cases) showed that CRP provided statistically significant but quantitatively small improvement in calibration of models based on established risk factors in some but not all studies. The wide overlap of CRP values among people who later suffered events and those who did not appeared to be explained by the consistently log-normal distribution of CRP and a graded continuous increment in coronary risk across the whole range of values without a threshold, such that a large proportion of events occurred among the many individuals with near-average levels of CRP.
Conclusions CRP does not perform better than the Framingham risk equation for discrimination. The improvement in risk stratification or reclassification from addition of CRP to models based on established risk factors is small and inconsistent. Guidance on the clinical use of CRP measurement in the prediction of coronary events may require updating in light of this large comparative analysis.
C-reactive protein; prediction; coronary heart disease; primary prevention; risk stratification
The ROC (Receiver Operating Characteristic) curve is the most commonly used statistical tool for describing the discriminatory accuracy of a diagnostic test. Classical estimation of the ROC curve relies on data from a simple random sample from the target population. In practice, estimation is often complicated due to not all subjects undergoing a definitive assessment of disease status (verification). Estimation of the ROC curve based on data only from subjects with verified disease status may be badly biased. In this work we investigate the properties of the doubly robust (DR) method for estimating the ROC curve under verification bias, originally developed by Rotnitzky et al. (2006) for estimating the area under the ROC curve. The DR method can be applied to continuous-scale tests and allows for a nonignorable process of selection to verification. We develop the estimator's asymptotic distribution and examine its finite sample properties via a simulation study. We exemplify the DR procedure for estimation of ROC curves with data collected on patients undergoing electron beam computed tomography, a diagnostic test for calcification of the arteries.
Diagnostic test; Nonignorable; Semiparametric model; Sensitivity analysis; Sensitivity; Specificity
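The DR estimator combines an imputation model with a verification model; a sketch of its simpler inverse-probability-weighting ingredient (not the DR estimator itself) shows how complete-case analysis is biased when verification depends on the test result. Everything here is synthetic, with the verification probabilities treated as known for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5000
d = rng.binomial(1, 0.3, n)
y = rng.normal(d, 1.0)                       # continuous test result
pi = 1 / (1 + np.exp(-(-0.5 + 1.0 * y)))     # verification prob., depends on y
v = rng.binomial(1, pi)                      # who has disease status verified

# True-positive fraction at cutoff c: complete-case vs IPW-corrected.
c = 0.5
w = v / pi                                   # IPW weights (zero if unverified)
tpf_cc = ((y > c) & (d == 1) & (v == 1)).sum() / ((d == 1) & (v == 1)).sum()
tpf_ipw = (w * (y > c) * (d == 1)).sum() / (w * (d == 1)).sum()
```

Because high test results are verified more often, the complete-case TPF is biased upward; reweighting by the inverse verification probability restores the population quantity. The DR method retains this correction while adding an outcome model so that only one of the two models needs to be right.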
Decision curve analysis has been introduced as a method to evaluate prediction models in terms of their clinical consequences if used for a binary classification of subjects into a group who should and into a group who should not be treated. The key concept for this type of evaluation is the "net benefit", a concept borrowed from utility theory.
We recall the foundations of decision curve analysis and discuss some new aspects. First, we stress the formal distinction between the net benefit for the treated and for the untreated and define the concept of the "overall net benefit". Next, we revisit the important distinction between the concept of accuracy, as typically assessed using the Youden index and a receiver operating characteristic (ROC) analysis, and the concept of utility of a prediction model, as assessed using decision curve analysis. Finally, we provide an explicit implementation of decision curve analysis to be applied in the context of case-control studies.
We show that the overall net benefit, which combines the net benefit for the treated and the untreated, is a natural alternative to the benefit achieved by a model, being invariant with respect to the coding of the outcome, and conveying a more comprehensive picture of the situation. Further, within the framework of decision curve analysis, we illustrate the important difference between the accuracy and the utility of a model, demonstrating how poor an accurate model may be in terms of its net benefit. Finally, we show that decision curve analysis can be applied to case-control studies, where an accurate estimate of the true prevalence of a disease cannot be obtained from the data, with a few modifications to the original calculation procedure.
We present several interrelated extensions to decision curve analysis that will both facilitate its interpretation and broaden its potential area of application.
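The net benefit at risk threshold pt, the quantity plotted in a decision curve, is TP/n − (FP/n) · pt/(1 − pt). A minimal cohort-sampling sketch on synthetic data follows; the case-control modification discussed above would replace the empirical prevalence with an external estimate.

```python
import numpy as np

def net_benefit(risk, d, pt):
    """Net benefit of treating everyone with predicted risk >= threshold pt."""
    n = len(d)
    treat = risk >= pt
    tp = (treat & (d == 1)).sum() / n
    fp = (treat & (d == 0)).sum() / n
    return tp - fp * pt / (1 - pt)

rng = np.random.default_rng(7)
n = 3000
x = rng.normal(size=n)
risk = 1 / (1 + np.exp(-(-1.5 + 1.5 * x)))   # toy, well-calibrated model
d = rng.binomial(1, risk)

# A decision curve compares the model against the two trivial policies:
nb_model = net_benefit(risk, d, 0.3)
nb_all = d.mean() - (1 - d.mean()) * 0.3 / (1 - 0.3)   # treat everyone
nb_none = 0.0                                          # treat no one
```

The weight pt/(1 − pt) is the utility-theory exchange rate between false and true positives implied by a patient who is indifferent to treatment exactly at risk pt.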