Selecting controls that match cases on risk factors for the outcome is a pervasive practice in biomarker research studies. Yet such matching biases estimates of biomarker prediction performance, and the magnitude of that bias is unknown.
We examined the prediction performance of biomarkers and the improvements in prediction gained by adding biomarkers to risk factor information. We used data simulated from bivariate normal statistical models and data from a study to identify critically ill patients. We compared true performance with that estimated from case-control studies that do or do not use matching, quantifying performance with receiver operating characteristic curves. We propose a new statistical method to estimate prediction performance from matched studies when data on the matching factors are available for subjects in the population.
Performance estimated with standard analyses can be grossly biased by matching, especially when biomarkers are highly correlated with the matching risk factors. In our studies, the performance of the biomarker alone was underestimated, while the improvement in performance gained by adding the marker to risk factors was overestimated by 2- to 10-fold. We found examples in which the relative ranking of two biomarkers for prediction was inappropriately reversed by use of a matched design. The new estimation approach corrected for this bias in matched studies.
To properly gauge prediction performance in the population or the improvement gained by adding a biomarker to known risk factors, matched case-control studies must be supplemented with risk factor information from the population and must be analyzed with nonstandard statistical methods.
design; diagnosis; prediction; prognosis; receiver operating characteristic curve
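The bias mechanism described in the abstract above can be sketched in a small simulation. All distributions, coefficients, and sample sizes here are illustrative assumptions, not the study's actual models: a marker that is correlated with the matching risk factor loses apparent discrimination when controls are matched to cases on that factor.

```python
import numpy as np

rng = np.random.default_rng(0)

def auc(case_vals, ctrl_vals):
    """Empirical AUC: P(randomly chosen case value exceeds a control value)."""
    return (case_vals[:, None] > ctrl_vals).mean()

n = 4000
x = rng.normal(size=n)                       # matching risk factor
y = 0.8 * x + 0.6 * rng.normal(size=n)       # marker, correlated with x
risk = 1.0 / (1.0 + np.exp(-(-2.0 + 1.5 * x + 1.0 * y)))
d = rng.random(n) < risk                     # binary outcome

pop_auc = auc(y[d], y[~d])                   # marker performance in the population

# Matched design: for each case, select the control closest on x.
case_idx = np.flatnonzero(d)
ctrl_idx = np.flatnonzero(~d)
matched = np.array([ctrl_idx[np.argmin(np.abs(x[ctrl_idx] - x[i]))]
                    for i in case_idx])
matched_auc = auc(y[case_idx], y[matched])

print(pop_auc, matched_auc)   # matched-sample AUC is attenuated toward 0.5
```

Matching removes the case-control difference in x, so any discrimination the marker inherits from its correlation with x disappears from the matched comparison, which is why a standard analysis of matched data understates the marker's population performance.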
The diagnostic likelihood ratio function, DLR, is a statistical measure used to evaluate risk prediction markers. The goal of this paper is to develop new methods to estimate the DLR function. Furthermore, we show how risk prediction markers can be compared using rank-invariant DLR functions. Various estimators are proposed that accommodate cohort or case–control study designs. The performances of the estimators are compared in simulation studies. The methods are illustrated by comparing a lung function measure and a nutritional status measure for predicting subsequent onset of major pulmonary infection in children with cystic fibrosis. For continuous markers, the DLR function is mathematically related to the slope of the receiver operating characteristic (ROC) curve, a tool used to evaluate diagnostic markers. We show that our methodology can be used to estimate the slope of the ROC curve and illustrate use of the estimated ROC derivative in variance and sample size calculations for a diagnostic biomarker study.
Biomarker; density estimation; diagnosis; logistic regression; rank invariant; risk prediction; ROC–GLM
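As a rough illustration of the density-ratio view of the DLR (a simple kernel-density sketch under an assumed normal model, not the estimators proposed in the paper): the DLR at marker value y is the ratio of the case density to the control density there, which for continuous markers also equals the slope of the ROC curve at the corresponding operating point.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
cases = rng.normal(1.0, 1.0, 2000)     # marker in diseased, assumed N(1, 1)
controls = rng.normal(0.0, 1.0, 2000)  # marker in non-diseased, assumed N(0, 1)

f_case = gaussian_kde(cases)
f_ctrl = gaussian_kde(controls)

def dlr(y):
    """Kernel estimate of DLR(y) = f_D(y) / f_Dbar(y)."""
    return f_case(y) / f_ctrl(y)

# Under this normal model the true DLR(y) = exp(y - 0.5), and DLR(y) equals
# the ROC slope at the operating point whose threshold is y.
d_vals = dlr(np.array([0.0, 0.5, 1.0]))
print(d_vals)   # roughly exp(y - 0.5): increasing in y, near 1 at y = 0.5
```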
Statistical evaluations of medical imaging tests used for diagnostic and prognostic purposes often employ receiver operating characteristic (ROC) curves. Two methods for ROC analysis are popular. The ordinal regression method is the standard approach for evaluating tests with ordinal values. The direct ROC modeling method is a more recently developed approach, motivated by applications to tests with continuous values, such as biomarkers.
In this paper, we compare the methods in terms of model formulations, interpretations of estimated parameters, the ranges of scientific questions that can be addressed with them, their computational algorithms and the efficiencies with which they use data.
We show that a strong relationship exists between the methods by demonstrating that they fit the same models when only a single test is evaluated. The ordinal regression models are typically alternative parameterizations of the direct ROC models and vice versa. The direct method has two major advantages over the ordinal regression method: (i) estimated parameters relate directly to ROC curves, which facilitates interpretation of covariate effects on ROC performance; and (ii) comparisons between tests can be made directly in this framework. Comparisons can be made while accommodating covariate effects, and even between tests that have values on different scales, such as between a continuous biomarker test and an ordinal-valued imaging test. The ordinal regression method provides slightly more precise parameter estimates in our simulated data models.
While the ordinal regression method is slightly more efficient, the direct ROC modeling method has important advantages with regard to interpretation, and it offers a framework for addressing a broader range of scientific questions, including the facility to compare tests.
comparisons; covariates; diagnostic test; markers; ordinal regression; percentile values
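For intuition about the direct ROC modeling framework, the widely used binormal form ROC(t) = Phi(a + b * Phi^-1(t)) can be sketched as follows. The parameter values are arbitrary illustrative choices; the point is that the model's parameters describe the ROC curve itself, and the implied AUC has a known closed form.

```python
import numpy as np
from scipy.stats import norm

def binormal_roc(t, a, b):
    """Direct ROC model: ROC(t) = Phi(a + b * Phi^{-1}(t))."""
    return norm.cdf(a + b * norm.ppf(t))

a, b = 1.0, 1.0                     # illustrative intercept and slope
t = np.linspace(1e-6, 1.0 - 1e-6, 100001)
roc = binormal_roc(t, a, b)

# Area under the curve, two ways: trapezoid rule vs. the closed form
# AUC = Phi(a / sqrt(1 + b^2)) implied by the binormal model.
auc_numeric = np.sum(0.5 * (roc[1:] + roc[:-1]) * np.diff(t))
auc_closed = norm.cdf(a / np.sqrt(1.0 + b ** 2))
print(auc_numeric, auc_closed)
```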
The predictiveness curve is a graphical tool that characterizes the population distribution of Risk(Y) = P(D = 1|Y), where D denotes a binary outcome such as occurrence of an event within a specified time period and Y denotes predictors. A wider distribution of Risk(Y) indicates better performance of a risk model, in the sense that making treatment recommendations is easier for more subjects: decisions are more straightforward when a subject's risk is deemed to be high or low. Methods have been developed to estimate predictiveness curves from cohort studies. However, early-phase studies to evaluate novel risk prediction markers typically employ case-control designs. Here we present semiparametric and nonparametric methods for evaluating a continuous risk prediction marker that accommodate case-control data. Small-sample properties are investigated through simulation studies. The semiparametric methods are substantially more efficient than their nonparametric counterparts under a correctly specified model. We generalize them to settings where multiple prediction markers are involved. Applications to prostate cancer risk prediction markers illustrate methods for comparing the risk prediction capacities of markers and for evaluating the increment in performance gained by adding a marker to a baseline risk model. We propose a modified Hosmer-Lemeshow test for case-control data to assess calibration of the risk model, a natural complement to this graphical tool.
biomarker; case-control study; classification; Hosmer-Lemeshow test; predictiveness curve; risk; ROC curve
Consider a continuous marker for predicting a binary outcome. For example, serum concentration of prostate specific antigen (PSA) may be used to calculate the risk of finding prostate cancer in a biopsy. In this paper we argue that the predictive capacity of a marker is determined by the population distribution of risk given the marker, and we suggest a graphical tool, the predictiveness curve, that displays this distribution. The display provides a common, meaningful scale for comparing markers that may not be comparable on their original scales. Some existing measures of predictiveness are shown to be summary indices derived from the predictiveness curve. We develop methods for making inference about the predictiveness curve, for making pointwise comparisons between two curves, and for evaluating covariate effects. Applications to risk prediction markers in cancer and cystic fibrosis are discussed.
risk; classification; explained variation; biomarker; ROC curve; prediction
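A minimal sketch of the predictiveness curve idea, using an assumed normal-mixture risk model with an illustrative prevalence rather than the paper's estimators: compute Risk(Y) = P(D = 1|Y) for each subject, sort the risks, and plot them against their quantile level v. A steeper, wider curve means more subjects fall at clearly low or high risk.

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.2                        # assumed disease prevalence (illustrative)
n = 50000
d = rng.random(n) < rho
y = np.where(d, rng.normal(1.5, 1.0, n), rng.normal(0.0, 1.0, n))

def risk(y):
    """Risk(y) = P(D = 1 | Y = y) under the assumed normal mixture, via Bayes."""
    f1 = np.exp(-0.5 * (y - 1.5) ** 2)   # case density, up to a common constant
    f0 = np.exp(-0.5 * y ** 2)           # control density, same constant
    return rho * f1 / (rho * f1 + (1 - rho) * f0)

r = np.sort(risk(y))             # predictiveness curve: R(v) = v-th risk quantile
v = (np.arange(n) + 0.5) / n     # v is the x-axis of the predictiveness curve

# The risk distribution spreads around the prevalence, and its mean
# recovers the prevalence (a basic calibration check).
print(r.mean())
```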
The performance of a well-calibrated risk model for a binary disease outcome can be characterized by the population distribution of risk and displayed with the predictiveness curve. Better performance is characterized by a wider distribution of risk, since this corresponds to better risk stratification in the sense that more subjects are identified at low and high risk for the disease outcome. Although methods have been developed to estimate predictiveness curves from cohort studies, most studies to evaluate novel risk prediction markers employ case-control designs. Here we develop semiparametric methods that accommodate case-control data. The semiparametric methods are flexible, and naturally generalize methods previously developed for cohort data. Applications to prostate cancer risk prediction markers illustrate the methods.
Biased sampling; Biomarker; Case-control; Predictiveness curve; Risk prediction; Semiparametric method
The classification accuracy of a continuous marker is typically evaluated with the receiver operating characteristic (ROC) curve. In this paper, we study an alternative conceptual framework, the “percentile value.” In this framework, the controls only provide a reference distribution to standardize the marker. The analysis proceeds by analyzing the standardized marker in cases. The approach is shown to be equivalent to ROC analysis. Advantages are that it provides a framework familiar to a broad spectrum of biostatisticians and it opens up avenues for new statistical techniques in biomarker evaluation. We develop several new procedures based on this framework for comparing biomarkers and biomarker performance in different populations. We develop methods that adjust such comparisons for covariates. The methods are illustrated on data from 2 cancer biomarker studies.
Biomarker; Classification; Covariate adjustment; Percentile value; ROC; Standardization
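The equivalence between the percentile-value framework and ROC analysis can be checked numerically. In this sketch (simulated normal markers, illustrative only), each case's marker is standardized by the empirical control distribution, and the mean percentile value of the cases reproduces the Mann-Whitney estimate of the AUC.

```python
import numpy as np

rng = np.random.default_rng(3)
cases = rng.normal(1.0, 1.0, 1500)     # assumed case marker values
controls = rng.normal(0.0, 1.0, 1500)  # reference (control) marker values

# Percentile value: standardize each case by the empirical control CDF.
pv = np.searchsorted(np.sort(controls), cases) / controls.size

# Equivalence with ROC analysis: AUC = E[F_ctrl(Y_case)], i.e. the mean
# percentile value, which matches the Mann-Whitney estimate of the AUC.
auc_pv = pv.mean()
auc_mw = (cases[:, None] > controls).mean()
print(auc_pv, auc_mw)
```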
Consider a set of baseline predictors X to predict a binary outcome D, and let Y be a novel marker or predictor. This paper is concerned with evaluating the performance of the augmented risk model P(D = 1|Y,X) compared with the baseline model P(D = 1|X). The diagnostic likelihood ratio, DLRX(y), quantifies the change in risk obtained with knowledge of Y = y for a subject with baseline risk factors X. The notion is commonly used in clinical medicine to quantify the increment in risk prediction due to Y. It is contrasted here with the notion of the covariate-adjusted effect of Y in the augmented risk model. We also propose methods for making inference about DLRX(y). Case–control study designs are accommodated. The methods provide a mechanism to investigate whether the predictive information in Y varies with baseline covariates. In addition, we show that when combined with a baseline risk model and information about the population distribution of Y given X, covariate-specific predictiveness curves can be estimated. These curves can help an individual decide whether ascertainment of Y is likely to be informative for him or her. We illustrate with data from 2 studies: one is a study of the performance of hearing screening tests for infants, and the other concerns the value of serum creatinine in diagnosing renal artery stenosis.
Biomarker; Classification; Diagnostic likelihood ratio; Diagnostic test; Logistic regression; Posterior probability
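The core update that DLRX(y) supports is Bayes' theorem on the odds scale: posterior odds = prior odds times the likelihood ratio. A minimal sketch (a generic helper with illustrative numbers, not code from the paper) of how a baseline risk is revised once the marker value's likelihood ratio is known:

```python
def updated_risk(baseline_risk, dlr):
    """Post-test risk from pre-test risk and a diagnostic likelihood ratio:
    posterior odds = prior odds * DLR (Bayes' theorem on the odds scale)."""
    prior_odds = baseline_risk / (1.0 - baseline_risk)
    post_odds = prior_odds * dlr
    return post_odds / (1.0 + post_odds)

# A subject with 10% baseline risk whose marker value carries DLR = 4:
# prior odds 1/9, posterior odds 4/9, so posterior risk = 4/13.
print(updated_risk(0.10, 4.0))   # ~0.308
```

The same arithmetic, applied with a covariate-specific DLR, is what makes the increment in risk prediction due to Y interpretable for a subject with given baseline risk factors.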
Development of a disease screening biomarker involves several phases. In phase 2, its sensitivity and specificity are compared with established thresholds for minimally acceptable performance. Since we anticipate that most candidate markers will not prove to be useful, and since the availability of specimens and funding is limited, early termination of a study is appropriate if accumulating data indicate that the marker is inadequate. Yet, for markers that complete phase 2, we seek estimates of sensitivity and specificity in order to design subsequent phase 3 studies.
We suggest early stopping criteria and estimation procedures that adjust for the bias caused by the early termination option. An important aspect of our approach is to focus on properties of estimates conditional on reaching full study enrollment. We propose the conditional-UMVUE and contrast it with other estimates, including naïve estimators, the well-studied unconditional-UMVUE, and the mean and median Whitehead-adjusted estimators. The conditional-UMVUE appears to be a very good choice.
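The bias that motivates the conditional estimators can be seen in a small simulation (a generic two-stage design with illustrative numbers, not the paper's actual stopping rule): conditioning on passing a stage-1 futility threshold selects favorable stage-1 outcomes, so the naive sensitivity estimate among completed studies is inflated.

```python
import numpy as np

rng = np.random.default_rng(4)
p = 0.70            # true sensitivity (assumed, illustrative)
n1, n2 = 20, 30     # stage-1 and stage-2 case accruals
r1 = 15             # continue only if stage-1 true positives >= r1 (futility rule)

trials = 200000
s1 = rng.binomial(n1, p, trials)     # stage-1 true positives
s2 = rng.binomial(n2, p, trials)     # stage-2 true positives
passed = s1 >= r1                    # studies that reach full enrollment

# Naive estimator among completed studies overstates sensitivity, because
# conditioning on s1 >= r1 discards unlucky stage-1 results.
naive = (s1[passed] + s2[passed]) / (n1 + n2)
print(naive.mean(), p)    # conditional mean of the naive estimate exceeds p
```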