Search tips
Search criteria

Results 1-25 (990165)

Clipboard (0)

Related Articles

1.  Covariate adjustment in estimating the area under ROC curve with partially missing gold standard 
Biometrics  2013;69(1):91-100.
In ROC analysis, covariate adjustment is advocated when the covariates impact the magnitude or accuracy of the test under study. Meanwhile, for many large scale screening tests, the true condition status may be subject to missingness because it is expensive and/or invasive to ascertain the disease status. The complete-case analysis may end up with a biased inference, also known as “verification bias”. To address the issue of covariate adjustment with verification bias in ROC analysis, we propose several estimators for the area under the covariate-specific and covariate-adjusted ROC curves (AUCx and AAUC). The AUCx is directly modelled in the form of binary regression, and the estimating equations are based on the U statistics. The AAUC is estimated from the weighted average of AUCx over the covariate distribution of the diseased subjects. We employ reweighting and imputation techniques to overcome the verification bias problem. Our proposed estimators are initially derived assuming that the true disease status is missing at random (MAR), and then with some modification, the estimators can be extended to the not-missing-at-random (NMAR) situation. The asymptotic distributions are derived for the proposed estimators. The finite sample performance is evaluated by a series of simulation studies. Our method is applied to a data set in Alzheimer's disease research.
PMCID: PMC3622116  PMID: 23410529
Alzheimer's disease; area under ROC curve; covariate adjustment; U statistics; verification bias; weighted estimating equations
2.  Direct Estimation of the Area Under the Receiver Operating Characteristic Curve in the Presence of Verification Bias 
Statistics in medicine  2009;28(3):361-376.
The area under a receiver operating characteristic (ROC) curve (AUC) is a commonly used index for summarizing the ability of a continuous diagnostic test to discriminate between healthy and diseased subjects. If all subjects have their true disease status verified, one can directly estimate the AUC nonparametrically using the Wilcoxon statistic. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Because estimators of the AUC based only on verified subjects are typically biased, it is common to estimate the AUC from a bias-corrected ROC curve. The variance of the estimator, however, does not have a closed-form expression and thus resampling techniques are used to obtain an estimate. In this paper, we develop a new method for directly estimating the AUC in the setting of verification bias based on U-statistics and inverse probability weighting. Closed-form expressions for the estimator and its variance are derived. We also show that the new estimator is equivalent to the empirical AUC derived from the bias-corrected ROC curve arising from the inverse probability weighting approach.
PMCID: PMC2626141  PMID: 18680124
Diagnostic test; Inverse probability weighting; Missing at random; U-statistic
3.  ROC curve estimation under test-result-dependent sampling 
Biostatistics (Oxford, England)  2012;14(1):160-172.
The receiver operating characteristic (ROC) curve is often used to evaluate the performance of a biomarker measured on continuous scale to predict the disease status or a clinical condition. Motivated by the need for novel study designs with better estimation efficiency and reduced study cost, we consider a biased sampling scheme that consists of a SRC and a supplemental TDC. Using this approach, investigators can oversample or undersample subjects falling into certain regions of the biomarker measure, yielding improved precision for the estimation of the ROC curve with a fixed sample size. Test-result-dependent sampling will introduce bias in estimating the predictive accuracy of the biomarker if standard ROC estimation methods are used. In this article, we discuss three approaches for analyzing data of a test-result-dependent structure with a special focus on the empirical likelihood method. We establish asymptotic properties of the empirical likelihood estimators for covariate-specific ROC curves and covariate-independent ROC curves and give their corresponding variance estimators. Simulation studies show that the empirical likelihood method yields good properties and is more efficient than alternative methods. Recommendations on number of regions, cutoff points, and subject allocation is made based on the simulation results. The proposed methods are illustrated with a data example based on an ongoing lung cancer clinical trial.
PMCID: PMC3577107  PMID: 22723502
Binormal model; Covariate-independent ROC curve; Covariate-specific ROC curve; Empirical likelihood method; Test-result-dependent sampling
4.  Bias in trials comparing paired continuous tests can cause researchers to choose the wrong screening modality 
To compare the diagnostic accuracy of two continuous screening tests, a common approach is to test the difference between the areas under the receiver operating characteristic (ROC) curves. After study participants are screened with both screening tests, the disease status is determined as accurately as possible, either by an invasive, sensitive and specific secondary test, or by a less invasive, but less sensitive approach. For most participants, disease status is approximated through the less sensitive approach. The invasive test must be limited to the fraction of the participants whose results on either or both screening tests exceed a threshold of suspicion, or who develop signs and symptoms of the disease after the initial screening tests.
The limitations of this study design lead to a bias in the ROC curves we call paired screening trial bias. This bias reflects the synergistic effects of inappropriate reference standard bias, differential verification bias, and partial verification bias. The absence of a gold reference standard leads to inappropriate reference standard bias. When different reference standards are used to ascertain disease status, it creates differential verification bias. When only suspicious screening test scores trigger a sensitive and specific secondary test, the result is a form of partial verification bias.
For paired screening tests with bivariate normally distributed scores, we give formulae and programs to quantify the effect of paired screening trial bias on a paired comparison of area under the curves. We fix the prevalence of disease, and the chance a diseased subject manifests signs and symptoms. We derive the formulas for true sensitivity and specificity, and those for the sensitivity and specificity observed by the study investigator.
The observed area under the ROC curves is quite different from the true area under the ROC curves. The typical direction of the bias is a strong inflation in sensitivity, paired with a concomitant slight deflation of specificity.
In paired trials of screening tests, when area under the ROC curve is used as the metric, bias may lead researchers to make the wrong decision as to which screening test is better.
PMCID: PMC2657218  PMID: 19154609
5.  A model for adjusting for nonignorable verification bias in estimation of ROC curve and its area with likelihood-based approach 
Biometrics  2010;66(4):1119-1128.
In estimation of the ROC curve, when the true disease status is subject to nonignorable missingness, the observed likelihood involves the missing mechanism given by a selection model. In this paper, we proposed a likelihood-based approach to estimate the ROC curve and the area under ROC curve when the verification bias is nonignorable. We specified a parametric disease model in order to make the nonignorable selection model identifiable. With the estimated verification and disease probabilities, we constructed four types of empirical estimates of the ROC curve and its area based on imputation and reweighting methods. In practice, a reasonably large sample size is required to estimate the nonignorable selection model in our settings. Simulation studies showed that all the four estimators of ROC area performed well, and imputation estimators were generally more efficient than the other estimators proposed. We applied the proposed method to a data set from research in the Alzheimer’s disease.
PMCID: PMC3618959  PMID: 20222937
Alzheimer’s disease; nonignorable missing data; ROC curve; verification bias
6.  Estimation of AUC or Partial AUC under Test-Result-Dependent Sampling 
The area under the ROC curve (AUC) and partial area under the ROC curve (pAUC) are summary measures used to assess the accuracy of a biomarker in discriminating true disease status. The standard sampling approach used in biomarker validation studies is often inefficient and costly, especially when ascertaining the true disease status is costly and invasive. To improve efficiency and reduce the cost of biomarker validation studies, we consider a test-result-dependent sampling (TDS) scheme, in which subject selection for determining the disease state is dependent on the result of a biomarker assay. We first estimate the test-result distribution using data arising from the TDS design. With the estimated empirical test-result distribution, we propose consistent nonparametric estimators for AUC and pAUC and establish the asymptotic properties of the proposed estimators. Simulation studies show that the proposed estimators have good finite sample properties and that the TDS design yields more efficient AUC and pAUC estimates than a simple random sampling (SRS) design. A data example based on an ongoing cancer clinical trial is provided to illustrate the TDS design and the proposed estimators. This work can find broad applications in design and analysis of biomarker validation studies.
PMCID: PMC3564679  PMID: 23393612
Area under ROC curve (AUC); Empirical likelihood; Nonparametric; Partial area under ROC curve (pAUC); Simple random sampling; Test-result-dependent sampling
7.  Estimating the ROC Curve in Studies that Match Controls to Cases on Covariates 
Academic radiology  2013;20(7):863-873.
Rationale and Objectives
Studies evaluating a new diagnostic imaging test may select control subjects without disease who are similar to case subjects with disease in regards to factors potentially related to the imaging result. Selecting one or more controls that are matched to each case on factors such as age, co-morbidities or study site improves study validity by eliminating potential biases due to differential characteristics of readings for cases versus controls. However it is not widely appreciated that valid analysis requires that the receiver operating characteristic (ROC) curve be adjusted for covariates. We propose a new computationally simple method for estimating the covariate adjusted ROC curve that is appropriate in matched case-control studies.
Materials and Methods
We provide theoretical arguments for the validity of the estimator and demonstrate its application to data. We compare the statistical properties of the estimator with those of a previously proposed estimator of the covariate adjusted ROC curve. We demonstrate an application of the estimator to data derived from a study of emergency medical services (EMS) encounters where the goal is to diagnose critical illness in non-trauma, non-cardiac arrest patients. A novel bootstrap method is proposed for calculating confidence intervals.
The new estimator is computationally very simple, yet we show it yields values that approximate the existing, more complicated estimator in simulated data sets. We found that the new estimator has excellent statistical properties, with bias and efficiency comparable with the existing method.
In matched case-control studies the ROC curve should be adjusted for matching covariates and can be estimated with the new computationally simple approach.
PMCID: PMC3679266  PMID: 23601953
risk prediction; logistic regression; diagnostic test; classification; case-control study
8.  Semiparametric estimation of the covariate-specific ROC curve in presence of ignorable veri cation bias 
Biometrics  2011;67(3):906-916.
Covariate-specific ROC curves are often used to evaluate the classification accuracy of a medical diagnostic test or a biomarker, when the accuracy of the test is associated with certain covariates. In many large-scale screening tests, the gold standard is subject to missingness due to high cost or harmfulness to the patient. In this paper, we propose a semiparametric estimation of the covariate-specific ROC curves with a partial missing gold standard. A location-scale model is constructed for the test result to model the covariates’ effect, but the residual distributions are left unspecified. Thus the baseline and link functions of the ROC curve both have flexible shapes. With the gold standard missing at random (MAR) assumption, we consider weighted estimating equations for the location-scale parameters, and weighted kernel estimating equations for the residual distributions. Three ROC curve estimators are proposed and compared, namely, imputation-based, inverse probability weighted and doubly robust estimators. We derive the asymptotic normality of the estimated ROC curve, as well as the analytical form the standard error estimator. The proposed method is motivated and applied to the data in an Alzheimer's disease research.
PMCID: PMC3596883  PMID: 21361890
Alzheimer's disease; covariate-specific ROC curve; ignorable missingness; verification bias; weighted estimating equations
9.  A robust method using propensity score stratification for correcting verification bias for binary tests 
Sensitivity and specificity are common measures of the accuracy of a diagnostic test. The usual estimators of these quantities are unbiased if data on the diagnostic test result and the true disease status are obtained from all subjects in an appropriately selected sample. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Estimators of sensitivity and specificity based on this subset of subjects are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias under the assumption that the missing data on disease status are missing at random (MAR), that is, the probability of missingness depends on the true (missing) disease status only through the test result and observed covariate information. When some of the covariates are continuous, or the number of covariates is relatively large, the existing methods require parametric models for the probability of disease or the probability of verification (given the test result and covariates), and hence are subject to model misspecification. We propose a new method for correcting verification bias based on the propensity score, defined as the predicted probability of verification given the test result and observed covariates. This is estimated separately for those with positive and negative test results. The new method classifies the verified sample into several subsamples that have homogeneous propensity scores and allows correction for verification bias. Simulation studies demonstrate that the new estimators are more robust to model misspecification than existing methods, but still perform well when the models for the probability of disease and probability of verification are correctly specified.
PMCID: PMC3276270  PMID: 21856650
Diagnostic test; Model misspecification; Propensity score; Sensitivity; Specificity
10.  Compare diagnostic tests using transformation-invariant smoothed ROC curves⋆ 
Receiver operating characteristic (ROC) curve, plotting true positive rates against false positive rates as threshold varies, is an important tool for evaluating biomarkers in diagnostic medicine studies. By definition, ROC curve is monotone increasing from 0 to 1 and is invariant to any monotone transformation of test results. And it is often a curve with certain level of smoothness when test results from the diseased and non-diseased subjects follow continuous distributions. Most existing ROC curve estimation methods do not guarantee all of these properties. One of the exceptions is Du and Tang (2009) which applies certain monotone spline regression procedure to empirical ROC estimates. However, their method does not consider the inherent correlations between empirical ROC estimates. This makes the derivation of the asymptotic properties very difficult. In this paper we propose a penalized weighted least square estimation method, which incorporates the covariance between empirical ROC estimates as a weight matrix. The resulting estimator satisfies all the aforementioned properties, and we show that it is also consistent. Then a resampling approach is used to extend our method for comparisons of two or more diagnostic tests. Our simulations show a significantly improved performance over the existing method, especially for steep ROC curves. We then apply the proposed method to a cancer diagnostic study that compares several newly developed diagnostic biomarkers to a traditional one.
PMCID: PMC3358774  PMID: 22639484
ROC curve; Smoothing spline; Bootstrap
11.  Asymptotic Properties of the Sequential Empirical ROC, PPV and NPV Curves Under Case-Control Sampling 
Annals of statistics  2011;39(6):3234-3261.
The receiver operating characteristic (ROC) curve, the positive predictive value (PPV) curve and the negative predictive value (NPV) curve are three measures of performance for a continuous diagnostic biomarker. The ROC, PPV and NPV curves are often estimated empirically to avoid assumptions about the distributional form of the biomarkers. Recently, there has been a push to incorporate group sequential methods into the design of diagnostic biomarker studies. A thorough understanding of the asymptotic properties of the sequential empirical ROC, PPV and NPV curves will provide more flexibility when designing group sequential diagnostic biomarker studies. In this paper we derive asymptotic theory for the sequential empirical ROC, PPV and NPV curves under case-control sampling using sequential empirical process theory. We show that the sequential empirical ROC, PPV and NPV curves converge to the sum of independent Kiefer processes and show how these results can be used to derive asymptotic results for summaries of the sequential empirical ROC, PPV and NPV curves.
PMCID: PMC3771874  PMID: 24039313
Group Sequential Methods; Empirical Process Theory; Diagnostic Testing
12.  A Unified Approach to Nonparametric Comparison of Receiver Operating Characteristic Curves for Longitudinal and Clustered Data 
We present a unified approach to nonparametric comparisons of receiver operating characteristic (ROC) curves for a paired design with clustered data. Treating empirical ROC curves as stochastic processes, their asymptotic joint distribution is derived in the presence of both between-marker and within-subject correlations. A Monte Carlo method is developed to approximate their joint distribution without involving nonparametric density estimation. The developed theory is applied to derive new inferential procedures for comparing weighted areas under the ROC curves, confidence bands for the difference function of ROC curves, confidence intervals for the set of specificities at which one diagnostic test is more sensitive than the other, and multiple comparison procedures for comparing more than two diagnostic markers. Our methods demonstrate satisfactory small-sample performance in simulations. We illustrate our methods using clustered data from a glaucoma study and repeated-measurement data from a startle response study.
PMCID: PMC2832229  PMID: 20209021
Area under the receiver operating characteristic curve; Clustered data; Confidence band; Intersection-union tests; Longitudinal data; Multiple comparison; Paired design; Partial area under the receiver operating characteristic curve; Quantile process; Repeated measurement
13.  ROC curve inference for best linear combination of two biomarkers subject to limits of detection 
The receiver operating characteristic (ROC) curve is a tool commonly used to evaluate biomarker utility in clinical diagnosis of disease. Often, multiple biomarkers are developed to evaluate the discrimination for the same outcome. Levels of multiple biomarkers can be combined via best linear combination (BLC) such that their overall discriminatory ability is greater than any of them individually. Biomarker measurements frequently have undetectable levels below a detection limit sometimes denoted as limit of detection (LOD). Ignoring observations below the LOD or substituting some replacement value as a method of correction has been shown to lead to negatively biased estimates of the area under the ROC curve for some distributions of single biomarkers. In this paper, we develop asymptotically unbiased estimators, via the maximum likelihood technique, of the area under the ROC curve of BLC of two bivariate normally distributed biomarkers affected by LODs. We also propose confidence intervals for this area under curve. Point and confidence interval estimates are scrutinized by simulation study, recording bias and root mean square error and coverage probability, respectively. An example using polychlorinated biphenyl (PCB) levels to classify women with and without endometriosis illustrates the potential benefits of our methods.
PMCID: PMC4159257  PMID: 22223252
Area under the curve; Best linear combinaton; Left censoring; Limit of detection; ROC
14.  Cut-off of body mass index and waist circumference to predict hypertension in Indian adults 
AIM: To determine the cut-off values of body mass index (BMI) and waist circumference to predict hypertension in adults in north India.
METHODS: A community based cross-sectional study was conducted in 801 subjects in Kanpur, aged 20 years and above, using multistage stratified random sampling technique. A pre-tested structured questionnaire was used to elicit the required information from the study participants and the diagnostic criteria for hypertension were taken according to the Seventh Joint National Committee Report on Hypertension (JNC-7). Receiver operating characteristic (ROC) analysis was used to estimate the cut-off values of BMI and waist circumference to predict hypertension.
RESULTS: The ROC analysis revealed that BMI is a good predictor of hypertension for both men (area under the ROC curve 0.714) and women (area under the ROC curve 0.821). The cut-off values of BMI for predicting hypertension were identified as ≥ 24.5 kg/m2 in men and ≥ 24.9 kg/m2 in women. Similarly, the ROC analysis for waist circumference showed that it is a good predictor of hypertension both for men (area under the ROC curve 0.784) and women (area under the ROC curve 0.815). The cut-offs for waist circumference for predicting hypertension were estimated as ≥ 83 cm for men and ≥ 78 cm for women. Adults with high BMI or high waist circumference had a higher prevalence of hypertension, respectively.
CONCLUSION: Simple anthropometric measurements such as BMI and waist circumference can be used for screening people at increased risk of hypertension in order to refer them for more careful and early diagnostic evaluation. Policies and programs are required for primary and secondary prevention of hypertension.
PMCID: PMC4097154  PMID: 25032202
Anthropometric indices; Body mass index; Waist circumference; Obesity, Hypertension; Adults
15.  Nonparametric Multiple Imputation for ROC Analysis When Some Biomarker Values Are Missing at Random 
Statistics in medicine  2011;30(26):3149-3161.
The receiver operating characteristics (ROC) curve is a widely used tool for evaluating discriminative and diagnostic power of a biomarker. When the biomarker value is missing for some observations, the ROC analysis based solely on the complete cases loses efficiency due to the reduced sample size, and more importantly, it is subject to potential bias. In this paper, we investigate nonparametric multiple imputation methods for ROC analysis when some biomarker values are missing at random (MAR) and there are auxiliary variables that are fully observed and predictive of biomarker values and/or missingness of biomarker values. While a direct application of standard nonparametric imputation is robust to model misspecification, its finite sample performance suffers from curse of dimensionality as the number of auxiliary variables increases. To address this problem, we propose new nonparametric imputation methods, which achieve dimension reduction through the use of one or two working models, namely, models for prediction and propensity scores. The proposed imputation methods provide a platform for a full range of ROC analysis, and hence are more flexible than existing methods that primarily focus on estimating the area under the ROC curve (AUC). We conduct simulation studies to evaluate the finite sample performance of the proposed methods, and find that the proposed methods are robust to various types of model misidentification and outperform the standard nonparametric approach even when the number of auxiliary variables is moderate. We further illustrate the proposed methods using an observational study of maternal depression during pregnancy.
PMCID: PMC3205437  PMID: 22025311
Area Under Curve; Bootstrap Methods; Dimension Reduction; Multiple Imputation; Nearest Neighbor Methods; Nonparametric Imputation; Receiver Operating Characteristics Curve
16.  Multivariate Normally Distributed Biomarkers Subject to Limits of Detection and Receiver Operating Characteristic Curve Inference 
Academic radiology  2013;20(7):838-846.
Rationale and Objectives
Biomarkers are of ever-increasing importance to clinical practice and epidemiologic research. Multiple biomarkers are often measured per patient. Measurement of true biomarker levels is limited by laboratory precision, specifically measuring relatively low, or high, biomarker levels resulting in undetectable levels below, or above, a limit of detection (LOD). Ignoring these missing observations or replacing them with a constant are methods commonly used although they have been shown to lead to biased estimates of several parameters of interest, including the area under the receiver operating characteristic (ROC) curve and regression coefficients.
Materials and Methods
We developed asymptotically consistent, efficient estimators, via maximum likelihood techniques, for the mean vector and covariance matrix of multivariate normally distributed biomarkers affected by LOD. We also developed an approximation for the Fisher information and covariance matrix for our maximum likelihood estimations (MLEs). We apply these results to an ROC curve setting, generating an MLE for the area under the curve for the best linear combination of multiple biomarkers and accompanying confidence interval.
Point and confidence interval estimates are scrutinized by simulation study, with bias and root mean square error and coverage probability, respectively, displaying behavior consistent with MLEs. An example using three polychlorinated biphenyls to classify women with and without endometriosis illustrates how the underlying distribution of multiple biomarkers with LOD can be assessed and display increased discriminatory ability over naïve methods.
Properly addressing LODs can lead to optimal biomarker combinations with increased discriminatory ability that may have been ignored because of measurement obstacles.
PMCID: PMC4160911  PMID: 23747152
Area under the curve; left censoring; limit of detection; maximum likelihood; ROC
17.  Statistical methods to correct for verification bias in diagnostic studies are inadequate when there are few false negatives: a simulation study 
A common feature of diagnostic research is that results for a diagnostic gold standard are available primarily for patients who are positive for the test under investigation. Data from such studies are subject to what has been termed "verification bias". We evaluated statistical methods for verification bias correction when there are few false negatives.
A simulation study was conducted of a screening study subject to verification bias. We compared estimates of the area-under-the-curve (AUC) corrected for verification bias varying both the rate and mechanism of verification.
In a single simulated data set, varying false negatives from 0 to 4 led to verification bias corrected AUCs ranging from 0.550 to 0.852. Excess variation associated with low numbers of false negatives was confirmed in simulation studies and by analyses of published studies that incorporated verification bias correction. The 2.5th – 97.5th centile range constituted as much as 60% of the possible range of AUCs for some simulations.
Screening programs are designed such that there are few false negatives. Standard statistical methods for verification bias correction are inadequate in this circumstance.
PMCID: PMC2600821  PMID: 19014457
18.  Estimating confidence intervals for the difference in diagnostic accuracy with three ordinal diagnostic categories without a gold standard 
Computational statistics & data analysis  2013;68:10.1016/j.csda.2013.07.007.
With three ordinal diagnostic categories, the most commonly used measures for the overall diagnostic accuracy are the volume under the ROC surface (VUS) and partial volume under the ROC surface (PVUS), which are the extensions of the area under the ROC curve (AUC) and partial area under the ROC curve (PAUC), respectively. A gold standard (GS) test on the true disease status is required to estimate the VUS and PVUS. However, oftentimes it may be difficult, inappropriate, or impossible to have a GS because of misclassification error, risk to the subjects or ethical concerns. Therefore, in many medical research studies, the true disease status may remain unobservable. Under the normality assumption, a maximum likelihood (ML) based approach using the expectation–maximization (EM) algorithm for parameter estimation is proposed. Three methods using the concepts of generalized pivot and parametric/nonparametric bootstrap for confidence interval estimation of the difference in paired VUSs and PVUSs without a GS are compared. The coverage probabilities of the investigated approaches are numerically studied. The proposed approaches are then applied to a real data set of 118 subjects from a cohort study in early stage Alzheimer’s disease (AD) from the Washington University Knight Alzheimer’s Disease Research Center to compare the overall diagnostic accuracy of early stage AD between two different pairs of neuropsychological tests.
PMCID: PMC3883051  PMID: 24415817
EM algorithm; Generalized pivot; Gold standard; Parametric bootstrap; Volume under the ROC surface
19.  On the convexity of ROC curves estimated from radiological test results 
Academic radiology  2010;17(8):960-968.e4.
Rationale and Objectives
Although an ideal observer’s receiver operating characteristic (ROC) curve must be convex — i.e., its slope must decrease monotonically — published fits to empirical data often display “hooks.” Such fits sometimes are accepted on the basis of an argument that experiments are done with real, rather than ideal, observers. However, the fact that ideal observers must produce convex curves does not imply that convex curves describe only ideal observers. This paper aims to identify the practical implications of non-convex ROC curves and the conditions that can lead to empirical and/or fitted ROC curves that are not convex.
Materials and Methods
This paper views non-convex ROC curves from historical, theoretical and statistical perspectives, which we describe briefly. We then consider population ROC curves with various shapes and analyze the types of medical decisions that they imply. Finally, we describe how sampling variability and curve-fitting algorithms can produce ROC curve estimates that include hooks.
We show that hooks in population ROC curves imply the use of an irrational decision strategy, even when the curve doesn’t cross the chance line, and therefore usually are untenable in medical settings. Moreover, we sketch a simple approach to improve any non-convex ROC curve by adding statistical variation to the decision process. Finally, we sketch how to test whether hooks present in ROC data are likely to have been caused by chance alone and how some hooked ROCs found in the literature can be easily explained as fitting artifacts or modeling issues.
In general, ROC curve fits that show hooks should be looked upon with suspicion unless other arguments justify their presence.
PMCID: PMC2897827  PMID: 20599155
Receiver operating characteristic (ROC) analysis; proper ROC curve; maximum likelihood estimation (MLE); contaminated ROC model
20.  Multicategory reclassification statistics for assessing improvements in diagnostic accuracy 
Biostatistics (Oxford, England)  2012;14(2):382-394.
In this paper, we extend the definitions of the net reclassification improvement (NRI) and the integrated discrimination improvement (IDI) in the context of multicategory classification. Both measures were proposed in Pencina and others (2008. Evaluating the added predictive ability of a new marker: from area under the receiver operating characteristic (ROC) curve to reclassification and beyond. Statistics in Medicine 27, 157–172) as numeric characterizations of accuracy improvement for binary diagnostic tests and were shown to have certain advantage over analyses based on ROC curves or other regression approaches. Estimation and inference procedures for the multiclass NRI and IDI are provided in this paper along with necessary asymptotic distributional results. Simulations are conducted to study the finite-sample properties of the proposed estimators. Two medical examples are considered to illustrate our methodology.
PMCID: PMC3695653  PMID: 23197381
Area under the ROC curve; Integrated discrimination improvement; Multicategory classification; Multinomial logistic regression; Net reclassification improvement
Statistica Sinica  2013;23:829-851.
The Receiver Operating Characteristic (ROC) curve is a widely used measure to assess the diagnostic accuracy of biomarkers for diseases. Biomarker tests can be affected by subject characteristics, the experience of testers, or the environment in which tests are carried out, so it is important to understand and determine the conditions for evaluating biomarkers. In this paper, we focus on assessing the effects of covariates on the performance of the ROC curves. In particular, we develop an accelerated ROC model by assuming that the effect of covariates relates to rescaling a baseline ROC curve. The proposed model generalizes the accelerated failure time model in the survival context to ROC analysis. An innovative method is developed to construct estimation and inference for model parameters. The obtained parameter estimators are shown to be asymptotically normal. We demonstrate the proposed method via a number of simulation studies, and apply it to analyze data from a prostate cancer study.
PMCID: PMC4013010  PMID: 24817797
Accelerated failure time model; asymptotic normality; receiver operating characteristic curve; regression models
22.  Non-parametric estimation of a time-dependent predictive accuracy curve 
A major biomedical goal associated with evaluating a candidate biomarker or developing a predictive model score for event-time outcomes is to accurately distinguish between incident cases from the controls surviving beyond t throughout the entire study period. Extensions of standard binary classification measures like time-dependent sensitivity, specificity, and receiver operating characteristic (ROC) curves have been developed in this context (Heagerty, P. J., and others, 2000. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 56, 337–344). We propose a direct, non-parametric method to estimate the time-dependent Area under the curve (AUC) which we refer to as the weighted mean rank (WMR) estimator. The proposed estimator performs well relative to the semi-parametric AUC curve estimator of Heagerty and Zheng (2005. Survival model predictive accuracy and ROC curves. Biometrics 61, 92–105). We establish the asymptotic properties of the proposed estimator and show that the accuracy of markers can be compared very simply using the difference in the WMR statistics. Estimators of pointwise standard errors are provided.
PMCID: PMC3520498  PMID: 22734044
AUC curve; Survival analysis; Time-dependent ROC
23.  Clinical Utility of Serologic Testing for Celiac Disease in Ontario 
Executive Summary
Objective of Analysis
The objective of this evidence-based evaluation is to assess the accuracy of serologic tests in the diagnosis of celiac disease in subjects with symptoms consistent with this disease. Furthermore the impact of these tests in the diagnostic pathway of the disease and decision making was also evaluated.
Celiac Disease
Celiac disease is an autoimmune disease that develops in genetically predisposed individuals. The immunological response is triggered by ingestion of gluten, a protein that is present in wheat, rye, and barley. The treatment consists of strict lifelong adherence to a gluten-free diet (GFD).
Patients with celiac disease may present with a myriad of symptoms such as diarrhea, abdominal pain, weight loss, iron deficiency anemia, dermatitis herpetiformis, among others.
Serologic Testing in the Diagnosis Celiac Disease
There are a number of serologic tests used in the diagnosis of celiac disease.
Anti-gliadin antibody (AGA)
Anti-endomysial antibody (EMA)
Anti-tissue transglutaminase antibody (tTG)
Anti-deamidated gliadin peptides antibodies (DGP)
Serologic tests are automated with the exception of the EMA test, which is more time-consuming and operator-dependent than the other tests. For each serologic test, both immunoglobulin A (IgA) or G (IgG) can be measured, however, IgA measurement is the standard antibody measured in celiac disease.
Diagnosis of Celiac Disease
According to celiac disease guidelines, the diagnosis of celiac disease is established by small bowel biopsy. Serologic tests are used to initially detect and to support the diagnosis of celiac disease. A small bowel biopsy is indicated in individuals with a positive serologic test. In some cases an endoscopy and small bowel biopsy may be required even with a negative serologic test. The diagnosis of celiac disease must be performed on a gluten-containing diet since the small intestine abnormalities and the serologic antibody levels may resolve or improve on a GFD.
Since IgA measurement is the standard for the serologic celiac disease tests, false negatives may occur in IgA-deficient individuals.
Incidence and Prevalence of Celiac Disease
The incidence and prevalence of celiac disease in the general population and in subjects with symptoms consistent with or at higher risk of celiac disease based on systematic reviews published in 2004 and 2009 are summarized below.
Incidence of Celiac Disease in the General Population
Adults or mixed population: 1 to 17/100,000/year
Children: 2 to 51/100,000/year
In one of the studies, a stratified analysis showed that there was a higher incidence of celiac disease in younger children compared to older children, i.e., 51 cases/100,000/year in 0 to 2 year-olds, 33/100,000/year in 2 to 5 year-olds, and 10/100,000/year in children 5 to 15 years old.
Prevalence of Celiac Disease in the General Population
The prevalence of celiac disease reported in population-based studies identified in the 2004 systematic review varied between 0.14% and 1.87% (median: 0.47%, interquartile range: 0.25%, 0.71%). According to the authors of the review, the prevalence did not vary by age group, i.e., adults and children.
Prevalence of Celiac Disease in High Risk Subjects
Type 1 diabetes (adults and children): 1 to 11%
Autoimmune thyroid disease: 2.9 to 3.3%
First degree relatives of patients with celiac disease: 2 to 20%
Prevalence of Celiac Disease in Subjects with Symptoms Consistent with the Disease
The prevalence of celiac disease in subjects with symptoms consistent with the disease varied widely among studies, i.e., 1.5% to 50% in adult studies, and 1.1% to 17% in pediatric studies. Differences in prevalence may be related to the referral pattern as the authors of a systematic review noted that the prevalence tended to be higher in studies whose population originated from tertiary referral centres compared to general practice.
Research Questions
What is the sensitivity and specificity of serologic tests in the diagnosis celiac disease?
What is the clinical validity of serologic tests in the diagnosis of celiac disease? The clinical validity was defined as the ability of the test to change diagnosis.
What is the clinical utility of serologic tests in the diagnosis of celiac disease? The clinical utility was defined as the impact of the test on decision making.
What is the budget impact of serologic tests in the diagnosis of celiac disease?
What is the cost-effectiveness of serologic tests in the diagnosis of celiac disease?
Literature Search
A literature search was performed on November 13th, 2009 using OVID MEDLINE, MEDLINE In-Process and Other Non-Indexed Citations, EMBASE, the Cumulative Index to Nursing & Allied Health Literature (CINAHL), the Cochrane Library, and the International Agency for Health Technology Assessment (INAHTA) for studies published from January 1st 2003 and November 13th 2010. Abstracts were reviewed by a single reviewer and, for those studies meeting the eligibility criteria, full-text articles were obtained. Reference lists were also examined for any additional relevant studies not identified through the search. Articles with unknown eligibility were reviewed with a second clinical epidemiologist, then a group of epidemiologists until consensus was established. The quality of evidence was assessed as high, moderate, low or very low according to GRADE methodology.
Studies that evaluated diagnostic accuracy, i.e., both sensitivity and specificity of serology tests in the diagnosis of celiac disease.
Study population consisted of untreated patients with symptoms consistent with celiac disease.
Studies in which both serologic celiac disease tests and small bowel biopsy (gold standard) were used in all subjects.
Systematic reviews, meta-analyses, randomized controlled trials, prospective observational studies, and retrospective cohort studies.
At least 20 subjects included in the celiac disease group.
English language.
Human studies.
Studies published from 2000 on.
Clearly defined cut-off value for the serology test. If more than one test was evaluated, only those tests for which a cut-off was provided were included.
Description of small bowel biopsy procedure clearly outlined (location, number of biopsies per patient), unless if specified that celiac disease diagnosis guidelines were followed.
Patients in the treatment group had untreated CD.
Studies on screening of the general asymptomatic population.
Studies that evaluated rapid diagnostic kits for use either at home or in physician’s offices.
Studies that evaluated diagnostic modalities other than serologic tests such as capsule endoscopy, push enteroscopy, or genetic testing.
Cut-off for serologic tests defined based on controls included in the study.
Study population defined based on positive serology or subjects pre-screened by serology tests.
Celiac disease status known before study enrolment.
Sensitivity or specificity estimates based on repeated testing for the same subject.
Non-peer-reviewed literature such as editorials and letters to the editor.
The population consisted of adults and children with untreated, undiagnosed celiac disease with symptoms consistent with the disease.
Serologic Celiac Disease Tests Evaluated
Anti-gliadin antibody (AGA)
Anti-endomysial antibody (EMA)
Anti-tissue transglutaminase antibody (tTG)
Anti-deamidated gliadin peptides antibody (DGP)
Combinations of some of the serologic tests listed above were evaluated in some studies
Both IgA and IgG antibodies were evaluated for the serologic tests listed above.
Outcomes of Interest
Positive and negative likelihood ratios
Diagnostic odds ratio (OR)
Area under the sROC curve (AUC)
Small bowel biopsy was used as the gold standard in order to estimate the sensitivity and specificity of each serologic test.
Statistical Analysis
Pooled estimates of sensitivity, specificity and diagnostic odds ratios (DORs) for the different serologic tests were calculated using a bivariate, binomial generalized linear mixed model. Statistical significance for differences in sensitivity and specificity between serologic tests was defined by P values less than 0.05, where “false discovery rate” adjustments were made for multiple hypothesis testing. The bivariate regression analyses were performed using SAS version 9.2 (SAS Institute Inc.; Cary, NC, USA). Using the bivariate model parameters, summary receiver operating characteristic (sROC) curves were produced using Review Manager 5.0.22 (The Nordiac Cochrane Centre, The Cochrane Collaboration, 2008). The area under the sROC curve (AUC) was estimated by bivariate mixed-efects binary regression modeling framework. Model specification, estimation and prediction are carried out with xtmelogit in Stata release 10 (Statacorp, 2007). Statistical tests for the differences in AUC estimates could not be carried out.
The study results were stratified according to patient or disease characteristics such as age, severity of Marsh grade abnormalities, among others, if reported in the studies. The literature indicates that the diagnostic accuracy of serologic tests for celiac disease may be affected in patients with chronic liver disease, therefore, the studies identified through the systematic literature review that evaluated the diagnostic accuracy of serologic tests for celiac disease in patients with chronic liver disease were summarized. The effect of the GFD in patiens diagnosed with celiac disease was also summarized if reported in the studies eligible for the analysis.
Summary of Findings
Published Systematic Reviews
Five systematic reviews of studies that evaluated the diagnostic accuracy of serologic celiac disease tests were identified through our literature search. Seventeen individual studies identified in adults and children were eligible for this evaluation.
In general, the studies included evaluated the sensitivity and specificity of at least one serologic test in subjects with symptoms consistent with celiac disease. The gold standard used to confirm the celiac disease diagnosis was small bowel biopsy. Serologic tests evaluated included tTG, EMA, AGA, and DGP, using either IgA or IgG antibodies. Indirect immunoflurorescence was used for the EMA serologic tests whereas enzyme-linked immunosorbent assay (ELISA) was used for the other serologic tests.
Common symptoms described in the studies were chronic diarrhea, abdominal pain, bloating, unexplained weight loss, unexplained anemia, and dermatitis herpetiformis.
The main conclusions of the published systematic reviews are summarized below.
IgA tTG and/or IgA EMA have a high accuracy (pooled sensitivity: 90% to 98%, pooled specificity: 95% to 99% depending on the pooled analysis).
Most reviews found that AGA (IgA or IgG) are not as accurate as IgA tTG and/or EMA tests.
A 2009 systematic review concluded that DGP (IgA or IgG) seems to have a similar accuracy compared to tTG, however, since only 2 studies identified evaluated its accuracy, the authors believe that additional data is required to draw firm conclusions.
Two systematic reviews also concluded that combining two serologic celiac disease tests has little contribution to the accuracy of the diagnosis.
MAS Analysis
The pooled analysis performed by MAS showed that IgA tTG has a sensitivity of 92.1% [95% confidence interval (CI) 88.0, 96.3], compared to 89.2% (83.3, 95.1, p=0.12) for IgA DGP, 85.1% (79.5, 94.4, p=0.07) for IgA EMA, and 74.9% (63.6, 86.2, p=0.0003) for IgA AGA. Among the IgG-based tests, the results suggest that IgG DGP has a sensitivity of 88.4% (95% CI: 82.1, 94.6), 44.7% (30.3, 59.2) for tTG, and 69.1% (56.0, 82.2) for AGA. The difference was significant when IgG DGP was compared to IgG tTG but not IgG AGA. Combining serologic celiac disease tests yielded a slightly higher sensitivity compared to individual IgA-based serologic tests.
IgA deficiency
The prevalence of total or severe IgA deficiency was low in the studies identified varying between 0 and 1.7% as reported in 3 studies in which IgA deficiency was not used as a referral indication for celiac disease serologic testing. The results of IgG-based serologic tests were positive in all patients with IgA deficiency in which celiac disease was confirmed by small bowel biopsy as reported in four studies.
The MAS pooled analysis indicates a high specificity across the different serologic tests including the combination strategy, pooled estimates ranged from 90.1% to 98.7% depending on the test.
Likelihood Ratios
According to the likelihood ratio estimates, both IgA tTG and serologic test combinationa were considered very useful tests (positive likelihood ratio above ten and the negative likelihood ratio below 0.1).
Moderately useful tests included IgA EMA, IgA DGP, and IgG DGP (positive likelihood ratio between five and ten and the negative likelihood ratio between 0.1 and 0.2).
Somewhat useful tests: IgA AGA, IgG AGA, generating small but sometimes important changes from pre- to post-test probability (positive LR between 2 and 5 and negative LR between 0.2 and 0.5)
Not Useful: IgG tTG, altering pre- to post-test probability to a small and rarely important degree (positive LR between 1 and 2 and negative LR between 0.5 and 1).
Diagnostic Odds Ratios (DOR)
Among the individual serologic tests, IgA tTG had the highest DOR, 136.5 (95% CI: 51.9, 221.2). The statistical significance of the difference in DORs among tests was not calculated, however, considering the wide confidence intervals obtained, the differences may not be statistically significant.
Area Under the sROC Curve (AUC)
The sROC AUCs obtained ranged between 0.93 and 0.99 for most IgA-based tests with the exception of IgA AGA, with an AUC of 0.89.
Sensitivity and Specificity of Serologic Tests According to Age Groups
Serologic test accuracy did not seem to vary according to age (adults or children).
Sensitivity and Specificity of Serologic Tests According to Marsh Criteria
Four studies observed a trend towards a higher sensitivity of serologic celiac disease tests when Marsh 3c grade abnormalities were found in the small bowel biopsy compared to Marsh 3a or 3b (statistical significance not reported). The sensitivity of serologic tests was much lower when Marsh 1 grade abnormalities were found in small bowel biopsy compared to Marsh 3 grade abnormalities. The statistical significance of these findings were not reported in the studies.
Diagnostic Accuracy of Serologic Celiac Disease Tests in Subjects with Chronic Liver Disease
A total of 14 observational studies that evaluated the specificity of serologic celiac disease tests in subjects with chronic liver disease were identified. All studies evaluated the frequency of false positive results (1-specificity) of IgA tTG, however, IgA tTG test kits using different substrates were used, i.e., human recombinant, human, and guinea-pig substrates. The gold standard, small bowel biopsy, was used to confirm the result of the serologic tests in only 5 studies. The studies do not seem to have been designed or powered to compare the diagnostic accuracy among different serologic celiac disease tests.
The results of the studies identified in the systematic literature review suggest that there is a trend towards a lower frequency of false positive results if the IgA tTG test using human recombinant substrate is used compared to the guinea pig substrate in subjects with chronic liver disease. However, the statistical significance of the difference was not reported in the studies. When IgA tTG with human recombinant substrate was used, the number of false positives seems to be similar to what was estimated in the MAS pooled analysis for IgA-based serologic tests in a general population of patients. These results should be interpreted with caution since most studies did not use the gold standard, small bowel biopsy, to confirm or exclude the diagnosis of celiac disease, and since the studies were not designed to compare the diagnostic accuracy among different serologic tests. The sensitivity of the different serologic tests in patients with chronic liver disease was not evaluated in the studies identified.
Effects of a Gluten-Free Diet (GFD) in Patients Diagnosed with Celiac Disease
Six studies identified evaluated the effects of GFD on clinical, histological, or serologic improvement in patients diagnosed with celiac disease. Improvement was observed in 51% to 95% of the patients included in the studies.
Grading of Evidence
Overall, the quality of the evidence ranged from moderate to very low depending on the serologic celiac disease test. Reasons to downgrade the quality of the evidence included the use of a surrogate endpoint (diagnostic accuracy) since none of the studies evaluated clinical outcomes, inconsistencies among study results, imprecise estimates, and sparse data. The quality of the evidence was considered moderate for IgA tTg and IgA EMA, low for IgA DGP, and serologic test combinations, and very low for IgA AGA.
Clinical Validity and Clinical Utility of Serologic Testing in the Diagnosis of Celiac Disease
The clinical validity of serologic tests in the diagnosis of celiac disease was considered high in subjects with symptoms consistent with this disease due to
High accuracy of some serologic tests.
Serologic tests detect possible celiac disease cases and avoid unnecessary small bowel biopsy if the test result is negative, unless an endoscopy/ small bowel biopsy is necessary due to the clinical presentation.
Serologic tests support the results of small bowel biopsy.
The clinical utility of serologic tests for the diagnosis of celiac disease, as defined by its impact in decision making was also considered high in subjects with symptoms consistent with this disease given the considerations listed above and since celiac disease diagnosis leads to treatment with a gluten-free diet.
Economic Analysis
A decision analysis was constructed to compare costs and outcomes between the tests based on the sensitivity, specificity and prevalence summary estimates from the MAS Evidence-Based Analysis (EBA). A budget impact was then calculated by multiplying the expected costs and volumes in Ontario. The outcome of the analysis was expected costs and false negatives (FN). Costs were reported in 2010 CAD$. All analyses were performed using TreeAge Pro Suite 2009.
Four strategies made up the efficiency frontier; IgG tTG, IgA tTG, EMA and small bowel biopsy. All other strategies were dominated. IgG tTG was the least costly and least effective strategy ($178.95, FN avoided=0). Small bowel biopsy was the most costly and most effective strategy ($396.60, FN avoided =0.1553). The cost per FN avoided were $293, $369, $1,401 for EMA, IgATTG and small bowel biopsy respectively. One-way sensitivity analyses did not change the ranking of strategies.
All testing strategies with small bowel biopsy are cheaper than biopsy alone however they also result in more FNs. The most cost-effective strategy will depend on the decision makers’ willingness to pay. Findings suggest that IgA tTG was the most cost-effective and feasible strategy based on its Incremental Cost-Effectiveness Ratio (ICER) and convenience to conduct the test. The potential impact of IgA tTG test in the province of Ontario would be $10.4M, $11.0M and $11.7M respectively in the following three years based on past volumes and trends in the province and basecase expected costs.
The panel of tests is the commonly used strategy in the province of Ontario therefore the impact to the system would be $13.6M, $14.5M and $15.3M respectively in the next three years based on past volumes and trends in the province and basecase expected costs.
The clinical validity and clinical utility of serologic tests for celiac disease was considered high in subjects with symptoms consistent with this disease as they aid in the diagnosis of celiac disease and some tests present a high accuracy.
The study findings suggest that IgA tTG is the most accurate and the most cost-effective test.
AGA test (IgA) has a lower accuracy compared to other IgA-based tests
Serologic test combinations appear to be more costly with little gain in accuracy. In addition there may be problems with generalizability of the results of the studies included in this review if different test combinations are used in clinical practice.
IgA deficiency seems to be uncommon in patients diagnosed with celiac disease.
The generalizability of study results is contingent on performing both the serologic test and small bowel biopsy in subjects on a gluten-containing diet as was the case in the studies identified, since the avoidance of gluten may affect test results.
PMCID: PMC3377499  PMID: 23074399
24.  Nonparametric ROC Based Evaluation for Survival Outcomes 
Statistics in medicine  2012;31(23):2660-2675.
For censored survival outcomes, it can be of great interest to evaluate the predictive power of individual markers or their functions. Compared with alternative evaluation approaches, the time-dependent ROC (receiver operating characteristics) based approaches rely on much weaker assumptions, can be more robust, and hence are preferred. In this article, we examine evaluation of markers’ predictive power using the time-dependent ROC curve and a concordance measure which can be viewed as a weighted area under the time-dependent AUC (area under the ROC curve) profile. This study significantly advances from existing time-dependent ROC studies by developing nonparametric estimators of the summary indexes and, more importantly, rigorously establishing their asymptotic properties. It reinforces the statistical foundation of the time-dependent ROC based evaluation approaches for censored survival outcomes. Numerical studies, including simulations and application to an HIV clinical trial, demonstrate the satisfactory finite-sample performance of the proposed approaches.
PMCID: PMC3743052  PMID: 22987578
time-dependent ROC; concordance measure; inverse-probability-of-censoring weighting; marker evaluation; survival outcomes
25.  Estimation and Comparison of Receiver Operating Characteristic Curves 
The Stata journal  2009;9(1):1.
The receiver operating characteristic (ROC) curve displays the capacity of a marker or diagnostic test to discriminate between two groups of subjects, cases versus controls. We present a comprehensive suite of Stata commands for performing ROC analysis. Non-parametric, semiparametric and parametric estimators are calculated. Comparisons between curves are based on the area or partial area under the ROC curve. Alternatively pointwise comparisons between ROC curves or inverse ROC curves can be made. Options to adjust these analyses for covariates, and to perform ROC regression are described in a companion article. We use a unified framework by representing the ROC curve as the distribution of the marker in cases after standardizing it to the control reference distribution.
PMCID: PMC2774909  PMID: 20161343

Results 1-25 (990165)