For censored survival outcomes, it can be of great interest to evaluate the predictive power of individual markers or their functions. Compared with alternative evaluation approaches, the time-dependent ROC (receiver operating characteristics) based approaches rely on much weaker assumptions, can be more robust, and hence are preferred. In this article, we examine evaluation of markers’ predictive power using the time-dependent ROC curve and a concordance measure which can be viewed as a weighted area under the time-dependent AUC (area under the ROC curve) profile. This study significantly advances from existing time-dependent ROC studies by developing nonparametric estimators of the summary indexes and, more importantly, rigorously establishing their asymptotic properties. It reinforces the statistical foundation of the time-dependent ROC based evaluation approaches for censored survival outcomes. Numerical studies, including simulations and application to an HIV clinical trial, demonstrate the satisfactory finite-sample performance of the proposed approaches.
time-dependent ROC; concordance measure; inverse-probability-of-censoring weighting; marker evaluation; survival outcomes
The predictiveness curve shows the population distribution of risk endowed by a marker or risk prediction model. It provides a means for assessing the model’s capacity for stratifying the population according to risk. Methods for making inference about the predictiveness curve have been developed using cross-sectional or cohort data. Here we consider inference based on case-control studies which are far more common in practice. We investigate the relationship between the ROC curve and the predictiveness curve. Insights about their relationship provide alternative ROC interpretations for the predictiveness curve and for a previously proposed summary index of it. Next the relationship motivates ROC based methods for estimating the predictiveness curve. An important advantage of these methods over previously proposed methods is that they are rank invariant. In addition they provide a way of combining information across populations that have similar ROC curves but varying prevalence of the outcome. We apply the methods to PSA, a marker for predicting risk of prostate cancer.
biomarker; classification; predictiveness curve; risk prediction; ROC curve; total gain
We present a unified approach to nonparametric comparisons of receiver operating characteristic (ROC) curves for a paired design with clustered data. Treating empirical ROC curves as stochastic processes, their asymptotic joint distribution is derived in the presence of both between-marker and within-subject correlations. A Monte Carlo method is developed to approximate their joint distribution without involving nonparametric density estimation. The developed theory is applied to derive new inferential procedures for comparing weighted areas under the ROC curves, confidence bands for the difference function of ROC curves, confidence intervals for the set of specificities at which one diagnostic test is more sensitive than the other, and multiple comparison procedures for comparing more than two diagnostic markers. Our methods demonstrate satisfactory small-sample performance in simulations. We illustrate our methods using clustered data from a glaucoma study and repeated-measurement data from a startle response study.
Area under the receiver operating characteristic curve; Clustered data; Confidence band; Intersection-union tests; Longitudinal data; Multiple comparison; Paired design; Partial area under the receiver operating characteristic curve; Quantile process; Repeated measurement
ROC analysis occupies an increasingly important role in technology assessment. ROC curves allow one to compare a set of ordinal estimates over the entire range of estimates. Sources of such estimates may include subjective probabilities, mathematical prediction models and empiric prediction models (like the APGAR score). The area under the ROC curve measures the ability of the estimation method to discriminate between two states (usually disease and non-disease). This paper discusses how one constructs ROC curves, what the area under the curve means, and how and why one compares two ROC curves. The computer program (ROC ANALYZER) allows easy performance of these analyses on MS-DOS compatible machines.
The receiver operating characteristic (ROC) curve, the positive predictive value (PPV) curve and the negative predictive value (NPV) curve are three measures of performance for a continuous diagnostic biomarker. The ROC, PPV and NPV curves are often estimated empirically to avoid assumptions about the distributional form of the biomarkers. Recently, there has been a push to incorporate group sequential methods into the design of diagnostic biomarker studies. A thorough understanding of the asymptotic properties of the sequential empirical ROC, PPV and NPV curves will provide more flexibility when designing group sequential diagnostic biomarker studies. In this paper we derive asymptotic theory for the sequential empirical ROC, PPV and NPV curves under case-control sampling using sequential empirical process theory. We show that the sequential empirical ROC, PPV and NPV curves converge to the sum of independent Kiefer processes and show how these results can be used to derive asymptotic results for summaries of the sequential empirical ROC, PPV and NPV curves.
Group Sequential Methods; Empirical Process Theory; Diagnostic Testing
Receiver operating characteristic (ROC) curve, plotting true positive rates against false positive rates as threshold varies, is an important tool for evaluating biomarkers in diagnostic medicine studies. By definition, ROC curve is monotone increasing from 0 to 1 and is invariant to any monotone transformation of test results. And it is often a curve with certain level of smoothness when test results from the diseased and non-diseased subjects follow continuous distributions. Most existing ROC curve estimation methods do not guarantee all of these properties. One of the exceptions is Du and Tang (2009) which applies certain monotone spline regression procedure to empirical ROC estimates. However, their method does not consider the inherent correlations between empirical ROC estimates. This makes the derivation of the asymptotic properties very difficult. In this paper we propose a penalized weighted least square estimation method, which incorporates the covariance between empirical ROC estimates as a weight matrix. The resulting estimator satisfies all the aforementioned properties, and we show that it is also consistent. Then a resampling approach is used to extend our method for comparisons of two or more diagnostic tests. Our simulations show a significantly improved performance over the existing method, especially for steep ROC curves. We then apply the proposed method to a cancer diagnostic study that compares several newly developed diagnostic biomarkers to a traditional one.
ROC curve; Smoothing spline; Bootstrap
Rationale and Objectives
To examine the effects of the number of categories in the rating scale used in an observer experiment on the results of ROC analysis by a simulation study.
Materials and Methods
We have previously evaluated the effects of computer-aided diagnosis (CAD) on radiologists’ characterization of malignant and benign breast masses in serial mammograms. The evaluation of the likelihood of malignancy was performed on a quasi-continuous (0-100 points) confidence-rating scale. In this study, we simulated the use of discrete confidence-rating scales with fewer number of categories and analyzed the results with receiver operating characteristic (ROC) methodology. The observers’ estimates of the likelihood of malignancy were also mapped to BI-RADS assessments with 5 and 7 categories and ROC analysis was performed. The area under the ROC curve and the partial area index obtained from ROC analysis of the different confidence-rating scales were compared.
The fitted ROC curves and the performance indices do not change significantly when the confidence-rating scales were varied from 6 to 101 points if the estimated operating points obtained directly from the data are distributed relatively evenly over the entire range of true-positive fraction (TPF) and false-positive fraction (FPF). The mapping of the likelihood of malignancy observer data to the 7-category BI-RADS assessment scale allowed reliable ROC analysis, whereas mapping to the 5-category BI-RADS scale could cause erratic ROC curve fitting because of the lack of operating points in the mid-range or failure in ROC curve fitting because of data degeneration for some observers.
ROC analysis of discrete confidence rating scales with few but relatively evenly distributed data points over the entire FPF and TPF range is comparable to that of a quasi-continuous rating scale. However, ROC analysis of discrete confidence rating scales with few and unevenly distributed data points may cause unreliable estimations.
Computer-Aided Diagnosis; Continuous and Discrete Confidence Rating Scales; ROC Observer Study; Classification; Mammography
Blood stasis syndrome (BSS) in traditional Asian medicine has been considered to correlate with the extent of atherosclerosis, which can be estimated using the cardioankle vascular index (CAVI). Here, the diagnostic utility of CAVI in predicting BSS was examined. The BSS scores and CAVI were measured in 140 stroke patients and evaluated with respect to stroke risk factors. Receiver operating characteristic (ROC) curve analysis was used to determine the diagnostic accuracy of CAVI for the diagnosis of BSS. The BSS scores correlated significantly with CAVI, age, and systolic blood pressure (SBP). Multiple logistic regression analysis showed that CAVI was a significant associate factor for BSS (OR 1.55, P = 0.032) after adjusting for the age and SBP. The ROC curve showed that CAVI and age provided moderate diagnostic accuracy for BSS (area under the ROC curve (AUC) for CAVI, 0.703, P < 0.001; AUC for age, 0.692, P = 0.001). The AUC of the “CAVI+Age,” which was calculated by combining CAVI with age, showed better accuracy (0.759, P < 0.0001) than those of CAVI or age. The present study suggests that the CAVI combined with age can clinically serve as an objective tool to diagnose BSS in stroke patients.
The objective of the present study is to find out the quantitative relationship between progression of liver fibrosis and the levels of certain serum markers using mathematic model. We provide the sparse logistic regression by using smoothly clipped absolute deviation (SCAD) penalized function to diagnose the liver fibrosis in rats. Not only does it give a sparse solution with high accuracy, it also provides the users with the precise probabilities of classification with the class information. In the simulative case and the experiment case, the proposed method is comparable to the stepwise linear discriminant analysis (SLDA) and the sparse logistic regression with least absolute shrinkage and selection operator (LASSO) penalty, by using receiver operating characteristic (ROC) with bayesian bootstrap estimating area under the curve (AUC) diagnostic sensitivity for selected variable. Results show that the new approach provides a good correlation between the serum marker levels and the liver fibrosis induced by thioacetamide (TAA) in rats. Meanwhile, this approach might also be used in predicting the development of liver cirrhosis.
Covariate-specific ROC curves are often used to evaluate the classification accuracy of a medical diagnostic test or a biomarker, when the accuracy of the test is associated with certain covariates. In many large-scale screening tests, the gold standard is subject to missingness due to high cost or harmfulness to the patient. In this paper, we propose a semiparametric estimation of the covariate-specific ROC curves with a partial missing gold standard. A location-scale model is constructed for the test result to model the covariates’ effect, but the residual distributions are left unspecified. Thus the baseline and link functions of the ROC curve both have flexible shapes. With the gold standard missing at random (MAR) assumption, we consider weighted estimating equations for the location-scale parameters, and weighted kernel estimating equations for the residual distributions. Three ROC curve estimators are proposed and compared, namely, imputation-based, inverse probability weighted and doubly robust estimators. We derive the asymptotic normality of the estimated ROC curve, as well as the analytical form the standard error estimator. The proposed method is motivated and applied to the data in an Alzheimer's disease research.
Alzheimer's disease; covariate-specific ROC curve; ignorable missingness; verification bias; weighted estimating equations
Consider a continuous marker for predicting a binary outcome. For example, serum concentration of prostate specific antigen (PSA) may be used to calculate the risk of finding prostate cancer in a biopsy. In this paper we argue that the predictive capacity of a marker has to do with the population distribution of risk given the marker and suggest a graphical tool, the predictiveness curve, that displays this distribution. The display provides a common meaningful scale for comparing markers that may not be comparable on their original scales. Some existing measures of predictiveness are shown to be summary indices derived from the predictiveness curve. We develop methods for making inference about the predictiveness curve, for making pointwise comparisons between two curves and for evaluating covariate effects. Applications to risk prediction markers in cancer and cystic fibrosis are discussed.
risk; classification; explained variation; biomarker; ROC curve; prediction
To develop more targeted intervention strategies, an important research goal is to identify markers predictive of clinical events. A crucial step towards this goal is to characterize the clinical performance of a marker for predicting different types of events. In this manuscript, we present statistical methods for evaluating the performance of a prognostic marker in predicting multiple competing events. To capture the potential time-varying predictive performance of the marker and incorporate competing risks, we define time- and cause-specific accuracy summaries by stratifying cases based on causes of failure. Such definition would allow one to evaluate the predictive accuracy of a marker for each type of event and compare its predictiveness across event types. Extending the nonparametric crude cause-specific ROC curve estimators by Saha and Heagerty (2010), we develop inference procedures for a range of cause-specific accuracy summaries. To estimate the accuracy measures and assess how covariates may affect the accuracy of a marker under the competing risk setting, we consider two forms of semiparametric models through the cause-specific hazard framework. These approaches enable a flexible modeling of the relationships between the marker and failure times for each cause, while efficiently accommodating additional covariates. We investigate the asymptotic property of the proposed accuracy estimators and demonstrate the finite sample performance of these estimators through simulation studies. The proposed procedures are illustrated with data from a prostate cancer prognostic study.
Biomarker evaluation; Cause-specific Hazard; Competing risk; Negative predictive value; Positive predictive value; Receiver Operating Characteristics Curve (ROC curve); Survival analysis
Rationale and Objectives
A basic assumption for a meaningful diagnostic decision variable is that there is a monotone relationship between the decision variable and the likelihood of disease. This relationship, however, generally does not hold for the binormal model. As a result, ROC-curve estimation based on the binormal model produces improper ROC curves that are not concave over the entire domain and cross the chance line. Although in practice the “improperness” is typically not noticeable, there are situations where the improperness is evident. Presently, standard statistical software does not provide diagnostics for assessing the magnitude of the improperness.
Materials and Methods
We show how the mean-to-sigma ratio can be a useful, easy-to-understand and easy-to-use measure for assessing the magnitude of the improperness of a binormal ROC curve by showing how it is related to the chance-line crossing. We suggest an improperness criterion based on the mean-to-sigma ratio.
Using a real-data example we illustrate how the mean-to-sigma ratio can be used to assess the improperness of binormal ROC curves, compare the binormal method with an alternative proper method, and describe uncertainty in a fitted ROC curve with respect to improperness.
By providing a quantitative and easily computable improperness measure, the mean-to-sigma ratio provides an easy way to identify improper binormal ROC curves and facilitates comparison of analysis strategies according to improperness categories in simulation and real-data studies.
receiver operating characteristic (ROC) curve; diagnostic radiology; mean-to-sigma ratio; binormal model; proper ROC model
Receiver operating characteristic (ROC) curves can be used to assess the accuracy of tests measured on ordinal or continuous scales. The most commonly used measure for the overall diagnostic accuracy of diagnostic tests is the area under the ROC curve (AUC). A gold standard test on the true disease status is required to estimate the AUC. However, a gold standard test may sometimes be too expensive or infeasible. Therefore, in many medical research studies, the true disease status of the subjects may remain unknown. Under the normality assumption on test results from each disease group of subjects, using the expectation-maximization (EM) algorithm in conjunction with a bootstrap method, we propose a maximum likelihood based procedure for construction of confidence intervals for the difference in paired areas under ROC curves in the absence of a gold standard test. Simulation results show that the proposed interval estimation procedure yields satisfactory coverage probabilities and interval lengths. The proposed method is illustrated with two examples.
Area under the ROC curve; EM algorithm; bootstrap method; gold standard test; maximum likelihood estimation
Early diagnosis of infection with the use of valuable markers leads to decreased mortality and morbidity. The aim of this study was to evaluate the value of procalcitonin (PCT) and C-reactive protein (CRP) for detecting nosocomial infection in hospitalized patients without localizing signs.
We conducted a prospective observational study on 150 hospitalized patients with fever > 38°C emerging 48-72 hours after their admission at Alzahra Hospital, Isfahan, Iran. The subjects did not have any localizing sign of infection. PCT and CRP values were determined using rapid tests and were compared with results of blood culture as the standard test. The sensitivity, specificity, positive and negative predictive values (PV) and likelihood ratios (LRs) were calculated for both PCT and CRP. Receiver operating characteristic (ROC) curves were also used to evaluate the diagnostic value of the PCT and CRP for detecting nosocomial infections. Finally, the areas under the resulting curves were compared.
PCT had a sensitivity of 57.1%, a specificity of 89.1%, a positive PV of 46.2%, and a negative PV of 92.7% while the corresponding percentages for CRP test were 76.2%, 48%, 19.3%, and 92.5%. PCT marker also had a higher positive LR and lower negative LR than did CRP marker. The observed areas under the ROC curves were 0.73 for CRP (95% CI, 0.63-0.82; p = 0.023) and 0.80 for PCT (95% CI, 0.68-0.91; p = 0.001). The optimal cut-off values (best diagnostic accuracy) were 39 mg/L for CRP and 7.5 ng/mL for PCT.
Determination of PCT and CRP is a valuable tool for identifying nosocomial infections. PCT showed better specificity, negative and positive PV. However CRP showed significantly better sensitivity compared with PCT. Therefore, these tests should be considered as part of initial work-up for patients with unknown source of infection.
Nosocomial Infection; Blood Culture; Procalcitonin; C-Reactive Protein
Diagnostic accuracy studies address how well a test identifies the target condition of interest.Sensitivity, specificity, predictive values and likelihood ratios (LRs) are all different ways of expressing test performance.Receiver operating characteristic (ROC) curves compare sensitivity versus specificity across a range of values for the ability to predict a dichotomous outcome. Area under the ROC curve is another measure of test performance.All of these parameters are not intrinsic to the test and are determined by the clinical context in which the test is employed.High sensitivity corresponds to high negative predictive value and is the ideal property of a “rule-out” test.High specificity corresponds to high positive predictive value and is the ideal property of a “rule-in” test.LRs leverage pre-test into post-test probabilities of a condition of interest and there is some evidence that they are more intelligible to users.
Rationale and Objectives
Semiparametric methods provide smooth and continuous receiver operating characteristic (ROC) curve fits to ordinal test results and require only that the data follow some unknown monotonic transformation of the model's assumed distributions. The quantitative relationship between cutoff settings or individual test-result values on the data scale and points on the estimated ROC curve is lost in this procedure, however. To recover that relationship in a principled way, we propose a new algorithm for “proper” ROC curves and illustrate it by use of the proper binormal model.
Materials and Methods
Several authors have proposed the use of multinomial distributions to fit semiparametric ROC curves by maximum-likelihood estimation. The resulting approach requires nuisance parameters that specify interval probabilities associated with the data, which are used subsequently as a basis for estimating values of the curve parameters of primary interest. In the method described here, we employ those “nuisance” parameters to recover the relationship between any ordinal test-result scale and true-positive fraction, false-positive fraction, and likelihood ratio. Computer simulations based on the proper binormal model were used to evaluate our approach in estimating those relationships and to assess the coverage of its confidence intervals for realistically sized datasets.
In our simulations, the method reliably estimated simple relationships between test-result values and the several ROC quantities.
The proposed approach provides an effective and reliable semiparametric method with which to estimate the relationship between cutoff settings or individual test-result values and corresponding points on the ROC curve.
Receiver operating characteristic (ROC) analysis; proper binormal model; likelihood ratio; test-result scale; maximum likelihood estimation (MLE)
A marker's capacity to predict risk of a disease depends on disease prevalence in the target population and its classification accuracy, i.e. its ability to discriminate diseased subjects from non-diseased subjects. The latter is often considered an intrinsic property of the marker; it is independent of disease prevalence and hence more likely to be similar across populations than risk prediction measures. In this paper, we are interested in evaluating the population-specific performance of a risk prediction marker in terms of positive predictive value (PPV) and negative predictive value (NPV) at given thresholds, when samples are available from the target population as well as from another population. A default strategy is to estimate PPV and NPV using samples from the target population only. However, when the marker's classification accuracy as characterized by a specific point on the receiver operating characteristics (ROC) curve is similar across populations, borrowing information across populations allows increased efficiency in estimating PPV and NPV. We develop estimators that optimally combine information across populations. We apply this methodology to a cross-sectional study where we evaluate PCA3 as a risk prediction marker for prostate cancer among subjects with or without previous negative biopsy.
Biomarker; Classification; NPV; PPV; Sensitivity; Specificity
New markers may improve prediction of diagnostic and prognostic outcomes. We review various measures to quantify the incremental value of markers over standard, readily available characteristics. Widely used traditional measures include the improvement in model fit or in the area under the receiver operating characteristic (ROC) curve (AUC). New measures include the net reclassification index (NRI) and decision–analytic measures, such as the fraction of true positive classifications penalized for false positive classifications (‘net benefit’, NB).
For illustration we discuss a case study on the presence of residual tumor versus benign tissue in 544 patients with testicular cancer. We assessed 3 tumor markers (AFP, HCG, and LDH) for their incremental value over currently standard clinical predictors. AUC and R2 values suggested adding continuous LDH and AFP whereas NB only favored HCG as a potentially promising marker at a clinically defendable decision threshold of 20% risk. Results based on the NRI fell in the middle, suggesting reclassification potential of all three markers.
We conclude that improvement in standard discrimination measures, which focus on finding variables that might be promising across all decision thresholds, may not detect the most informative markers at a specific threshold of particular clinical relevance. When a marker is intended to support decision making, calculation of the improvement in a decision–analytic measure, such as NB, is preferable over an overall judgment as obtained from the AUC in ROC analysis.
prediction; logistic regression model; performance measures; incremental value
In order to improve our understanding of the molecular pathways that mediate tumor proliferation and angiogenesis, and to evaluate the biological response to anti-angiogenic therapy, we analyzed the changes in the protein profile of glioblastoma in response to treatment with recombinant human Platelet Factor 4-DLR mutated protein (PF4-DLR), an inhibitor of angiogenesis.
U87-derived experimental glioblastomas were grown in the brain of xenografted nude mice, treated with PF4-DLR, and processed for proteomic analysis. More than fifty proteins were differentially expressed in response to PF4-DLR treatment. Among them, integrin-linked kinase 1 (ILK1) signaling pathway was first down-regulated but then up-regulated after treatment for prolonged period. The activity of PF4-DLR can be increased by simultaneously treating mice orthotopically implanted with glioblastomas, with ILK1-specific siRNA. As ILK1 is related to malignant progression and a poor prognosis in various types of tumors, we measured ILK1 expression in human glioblatomas, astrocytomas and oligodendrogliomas, and found that it varied widely; however, a high level of ILK1 expression was correlated to a poor prognosis.
Our results suggest that identifying the molecular pathways induced by anti-angiogenic therapies may help the development of combinaatorial treatment strategies that increase the therapeutic efficacy of angiogenesis inhibitors by association with specific agents that disrupt signaling in tumor cells.
The aim of this study was to assess the predictive capacity of body fat percentage (%BF) estimated by equations using body mass index (BMI) and waist circumference (WC) to identify hypertension and estimate measures of association between high %BF and hypertension in adults.
This is a cross-sectional population-based study conducted with 1,720 adults (20–59 years) from Florianopolis, southern Brazil. The area under the ROC curve, sensitivity, specificity, predictive values, and likelihood ratios of cutoffs for %BF were calculated. The association between %BF and hypertension was analyzed using Poisson regression, estimating the unadjusted and adjusted prevalence ratios and 95% CI.
The %BF equations showed good discriminatory power for hypertension (area under the ROC curve > 0.50). Considering the entire sample, the cutoffs for %BF with better properties for screening hypertension were identified in the equation with BMI for men (%BF = 20.4) and with WC for women (%BF = 34.1). Adults with high %BF had a higher prevalence of hypertension.
The use of simple anthropometric measurements allowed identifying the %BF, diagnosing obesity, and screening people at risk of hypertension in order to refer them for more careful diagnostic evaluation.
Anthropometry; Risk factors; Obesity; Hypertension; Adults
The receiver operating characteristic (ROC) curve is used to evaluate a biomarker’s ability for classifying disease status. The Youden Index (J), the maximum potential effectiveness of a biomarker, is a common summary measure of the ROC curve. In biomarker development, levels may be unquantifiable below a limit of detection (LOD) and missing from the overall dataset. Disregarding these observations may negatively bias the ROC curve and thus J. Several correction methods have been suggested for mean estimation and testing; however, little has been written about the ROC curve or its summary measures. We adapt non-parametric (empirical) and semi-parametric (ROC-GLM [generalized linear model]) methods and propose parametric methods (maximum likelihood (ML)) to estimate J and the optimal cut-point (c*) for a biomarker affected by a LOD. We develop unbiased estimators of J and c* via ML for normally and gamma distributed biomarkers. Alpha level confidence intervals are proposed using delta and bootstrap methods for the ML, semi-parametric, and non-parametric approaches respectively. Simulation studies are conducted over a range of distributional scenarios and sample sizes evaluating estimators’ bias, root-mean square error, and coverage probability; the average bias was less than one percent for ML and GLM methods across scenarios and decreases with increased sample size. An example using polychlorinated biphenyl levels to classify women with and without endometriosis illustrates the potential benefits of these methods. We address the limitations and usefulness of each method in order to give researchers guidance in constructing appropriate estimates of biomarkers’ true discriminating capabilities.
Youden Index; ROC curve; Sensitivity and Specificity; Optimal Cut-Point
To determine the generalizability of the predictions for 90-day mortality generated by Model for End-stage Liver Disease (MELD) and the serum sodium augmented MELD (MELDNa) to Atlantic Canadian adults with end-stage liver disease awaiting liver transplantation (LT).
The predictive accuracy of the MELD and the MELDNa was evaluated by measurement of the discrimination and calibration of the respective models’ estimates for the occurrence of 90-day mortality in a consecutive cohort of LT candidates accrued over a five-year period. Accuracy of discrimination was measured by the area under the ROC curves. Calibration accuracy was evaluated by comparing the observed and model-estimated incidences of 90-day wait-list failure for the total cohort and within quantiles of risk.
The area under the ROC curve for the MELD was 0.887 (95% CI 0.705 to 0.978) – consistent with very good accuracy of discrimination. The area under the ROC curve for the MELDNa was 0.848 (95% CI 0.681 to 0.965). The observed incidence of 90-day wait-list mortality in the validation cohort was 7.9%, which was not significantly different from the MELD estimate of 6.6% (95% CI 4.9% to 8.4%; P=0.177) or the MELDNa estimate of 5.8% (95% CI 3.5% to 8.0%; P=0.065). Global goodness-of-fit testing found no evidence of significant lack of fit for either model (Hosmer-Lemeshow χ2 [df=3] for MELD 2.941, P=0.401; for MELDNa 2.895, P=0.414).
Both the MELD and the MELDNa accurately predicted the occurrence of 90-day wait-list mortality in the study cohort and, therefore, are generalizable to Atlantic Canadians with end-stage liver disease awaiting LT.
End-stage liver disease; Liver transplantation; Mortality; Statistical models; Validation study; Wait list
Statistical evaluation of medical imaging tests used for diagnostic and prognostic purposes often employ receiver operating characteristic (ROC) curves. Two methods for ROC analysis are popular. The ordinal regression method is the standard approach used when evaluating tests with ordinal values. The direct ROC modeling method is a more recently developed approach that has been motivated by applications to tests with continuous values, such as biomarkers.
In this paper, we compare the methods in terms of model formulations, interpretations of estimated parameters, the ranges of scientific questions that can be addressed with them, their computational algorithms and the efficiencies with which they use data.
We show that a strong relationship exists between the methods by demonstrating that they fit the same models when only a single test is evaluated. The ordinal regression models are typically alternative parameterizations of the direct ROC models and vice-versa. The direct method has two major advantages over the ordinal regression method: (i) estimated parameters relate directly to ROC curves. This facilitates interpretations of covariate effects on ROC performance; and (ii) comparisons between tests can be done directly in this framework. Comparisons can be made while accommodating covariate effects and comparisons can be made even between tests that have values on different scales, such as between a continuous biomarker test and an ordinal valued imaging test. The ordinal regression method provides slightly more precise parameter estimates from data in our simulated data models.
While the ordinal regression method is slightly more efficient, the direct ROC modeling method has important advantages in regards to interpretation and it offers a framework to address a broader range of scientific questions including the facility to compare tests.
comparisons; covariates; diagnostic test; markers; ordinal regression; percentile values
The receiver operating characteristic (ROC) curve displays the capacity of a marker or diagnostic test to discriminate between two groups of subjects, cases versus controls. We present a comprehensive suite of Stata commands for performing ROC analysis. Non-parametric, semiparametric and parametric estimators are calculated. Comparisons between curves are based on the area or partial area under the ROC curve. Alternatively pointwise comparisons between ROC curves or inverse ROC curves can be made. Options to adjust these analyses for covariates, and to perform ROC regression are described in a companion article. We use a unified framework by representing the ROC curve as the distribution of the marker in cases after standardizing it to the control reference distribution.