The receiver operating characteristic (ROC) curve is a fundamental tool to assess the discriminant performance for not only a single marker but also a score function combining multiple markers. The area under the ROC curve (AUC) for a score function measures the intrinsic ability for the score function to discriminate between the controls and cases. Recently, the partial AUC (pAUC) has received more attention than the AUC, because it can focus on a range of the false positive rate suited to the clinical situation at hand. However, existing pAUC-based methods only handle a few markers and do not take nonlinear combinations of markers into consideration.
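As a minimal sketch of what the pAUC measures, the empirical pAUC over the false positive rate range [0, max_fpr] can be computed by counting case-control pairs in which the control is among the highest-scoring controls. The function name `partial_auc`, the half-credit handling of ties, and the rounding of the range to a whole number of controls are assumptions made for illustration; this is not the boosting method described in these abstracts.

```python
def partial_auc(cases, controls, max_fpr):
    """Empirical partial AUC over false positive rates in [0, max_fpr].

    Assumption: max_fpr * len(controls) is (close to) a whole number k, so the
    range corresponds exactly to the k highest-scoring controls.
    """
    n1, n0 = len(cases), len(controls)
    k = int(max_fpr * n0 + 1e-9)  # number of controls allowed to be false positives
    top_controls = sorted(controls, reverse=True)[:k]
    total = 0.0
    for y in top_controls:
        for x in cases:
            if x > y:
                total += 1.0   # case correctly ranked above this control
            elif x == y:
                total += 0.5   # ties get half credit (assumption)
    return total / (n0 * n1)


if __name__ == "__main__":
    cases = [3, 4, 5]
    controls = [1, 2, 6, 0]
    print(partial_auc(cases, controls, 0.5))  # pAUC over FPR in [0, 0.5]
    print(partial_auc(cases, controls, 1.0))  # full AUC
```

With max_fpr = 1 the function reduces to the ordinary empirical AUC, and a perfectly separating score attains the maximum possible value, which equals max_fpr.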
We have developed a new statistical method that focuses on the pAUC based on a boosting technique. The markers are combined componentially for maximizing the pAUC in the boosting algorithm using natural cubic splines or decision stumps (single-level decision trees), according to the values of markers (continuous or discrete). We show that the resulting score plots are useful for understanding how each marker is associated with the outcome variable. We compare the performance of the proposed boosting method with those of other existing methods, and demonstrate the utility using real data sets. As a result, we have much better discrimination performances in the sense of the pAUC in both simulation studies and real data analysis.
The proposed method addresses how to combine the markers after a pAUC-based filtering procedure in a high-dimensional setting. Hence, it provides a consistent way of analyzing data based on the pAUC, from marker selection to marker combination, for discrimination problems. The method can capture not only linear but also nonlinear associations between the outcome variable and the markers; such nonlinearity is known to be necessary in general for the maximization of the pAUC. The method also emphasizes both the accuracy of classification and the interpretability of the association, by offering simple and smooth resultant score plots for each marker.
A combination of biomarkers in a multivariate model may predict disease with greater accuracy than a single biomarker employed alone. We developed a non-linear method of multivariate analysis, weighted digital analysis (WDA), and evaluated its ability to predict lung cancer employing volatile biomarkers in the breath.
WDA generates a discriminant function to predict membership in disease vs no disease groups by determining weight, a cutoff value, and a sign for each predictor variable employed in the model. The weight of each predictor variable was the area under the curve (AUC) of the receiver operating characteristic (ROC) curve minus a fixed offset of 0.55, where the AUC was obtained by employing that predictor variable alone, as the sole marker of disease. The sign (±) was used to invert the predictor variable if a lower value indicated a higher probability of disease. When employed to predict the presence of a disease in a particular patient, the discriminant function was determined as the sum of the weights of all predictor variables that exceeded their cutoff values. The algorithm that generates the discriminant function is deterministic because parameters are calculated from each individual predictor variable without any optimization or adjustment. We employed WDA to re-evaluate data from a recent study of breath biomarkers of lung cancer, comprising the volatile organic compounds (VOCs) in the alveolar breath of 193 subjects with primary lung cancer and 211 controls with a negative chest CT.
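The scoring rule described above can be sketched as follows. The abstract does not say how the cutoff values are chosen, so here they are supplied by the caller; the function names, the way the sign is derived (inverting a variable whose single-marker AUC is below 0.5), and the comparison on the sign-inverted scale are assumptions for illustration only.

```python
def single_marker_auc(cases, controls):
    """Empirical AUC of one predictor used alone (ties count one half)."""
    wins = sum((x > y) + 0.5 * (x == y) for x in cases for y in controls)
    return wins / (len(cases) * len(controls))


def wda_weight_and_sign(cases, controls, offset=0.55):
    """Weight = single-marker AUC minus a fixed offset; sign inverts the
    predictor when lower values indicate disease (assumed rule: AUC < 0.5)."""
    auc = single_marker_auc(cases, controls)
    sign = 1 if auc >= 0.5 else -1
    if sign < 0:
        auc = 1.0 - auc  # AUC of the inverted predictor
    return auc - offset, sign


def wda_score(values, weights, signs, cutoffs):
    """Sum the weights of all predictors that exceed their cutoff,
    comparing on the sign-inverted scale where needed."""
    return sum(
        w for v, w, s, c in zip(values, weights, signs, cutoffs) if s * v > s * c
    )


if __name__ == "__main__":
    # Two toy predictors: A is higher in cases, B is lower in cases.
    w_a, s_a = wda_weight_and_sign([5, 6, 7], [1, 2, 3])
    w_b, s_b = wda_weight_and_sign([1, 2], [3, 4])
    # Hypothetical cutoffs of 4 and 2.5 for a subject with values (6, 1.5).
    print(wda_score([6, 1.5], [w_a, w_b], [s_a, s_b], [4, 2.5]))
```

Because every parameter is computed from one predictor at a time, no iterative fitting is involved, which matches the deterministic character of the algorithm noted above.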
The WDA discriminant function accurately identified patients with lung cancer in a model employing 30 breath VOCs (ROC curve AUC = 0.90; sensitivity = 84.5%, specificity = 81.0%). These results were superior to multi-linear regression analysis of the same data set (AUC = 0.74, sensitivity = 68.4%, specificity = 73.5%). WDA test accuracy did not vary appreciably with TNM (tumor, node, metastasis) stage of disease, and results were not affected by tobacco smoking (ROC curve AUC = 0.92 in current smokers, 0.90 in former smokers). WDA was a robust predictor of lung cancer: random removal of 1/3 of the VOCs did not reduce the AUC of the ROC curve by >10% (99.7% CI).
A test employing WDA of breath VOCs predicted lung cancer with accuracy similar to chest computed tomography. The algorithm identified dependencies that were not apparent with traditional linear methods. WDA appears to provide a useful new technique for non-linear multivariate analysis of data.
Prediction of protein-protein interaction sites is one of the most challenging and intriguing problems in the field of computational biology. Although much progress has been achieved by using various machine learning methods and a variety of available features, the problem is still far from being solved.
In this paper, an ensemble method is proposed, which combines a bootstrap resampling technique, SVM-based fusion classifiers and a weighted voting strategy, to overcome the class imbalance problem and effectively utilize a wide variety of features. We evaluate the ensemble classifier using a dataset extracted from 99 polypeptide chains with 10-fold cross validation, and obtain an AUC score of 0.86, with a sensitivity of 0.76 and a specificity of 0.78, which are better than those of the existing methods. To improve the usefulness of the proposed method, two special ensemble classifiers are designed to handle the cases of missing homologues and structural information respectively, and the performance is still encouraging. The robustness of the ensemble method is also evaluated by effectively classifying interaction sites from surface residues as well as from all residues in proteins. Moreover, we demonstrate the applicability of the proposed method to identify interaction sites from the non-structural proteins (NS) of the influenza A virus, which may be utilized as potential drug target sites.
Our experimental results show that the ensemble classifiers are quite effective in predicting protein interaction sites. The Sub-EnClassifiers with resampling technique can alleviate the imbalanced problem and the combination of Sub-EnClassifiers with a wide variety of feature groups can significantly improve prediction performance.
Many markers have been indicated as predictors of type 2 diabetes. However, it is unknown whether non-glycaemic (blood) biomarkers and non-blood biomarkers have additive predictive utility when combined with glycaemic (blood) biomarkers. The aim of this study is to assess this additive utility in a large Japanese population.
We used data from a retrospective cohort study conducted from 1998 to 2002 for the baseline and 2002 to 2006 for follow-up, inclusive of 5,142 men (mean age of 51.9 years) and 4,847 women (54.1 years) at baseline. The cumulative incidence of diabetes [defined either as a fasting plasma glucose (FPG) ≥7.00 mmol/l or as clinically diagnosed diabetes] was measured. In addition to glycaemic biomarkers [FPG and hemoglobin A1c (HbA1c)], we examined the clinical usefulness of adding non-glycaemic biomarkers and non-blood biomarkers, using sensitivity and specificity, and the area under the curve (AUC) of the receiver operating characteristics.
The AUCs to predict diabetes were 0.874 and 0.924 for FPG, 0.793 and 0.822 for HbA1c, in men and women, respectively. Glycaemic biomarkers were the best and second-best for diabetes prediction among the markers. All non-glycaemic markers (except uric acid in men and creatinine in both sexes) predicted diabetes. Among these biomarkers, the highest AUC in the single-marker analysis was 0.656 for alanine aminotransferase (ALT) in men and 0.740 for body mass index in women. The AUC of the combined markers of FPG and HbA1c was 0.895 in men and 0.938 in women, which were marginally increased to 0.904 and 0.940 when adding ALT, respectively.
AUC increments were marginal when adding non-glycaemic biomarkers and non-blood biomarkers to the classic model based on FPG and HbA1c. For the prediction of diabetes, FPG and HbA1c are sufficient and the other markers may not be needed in clinical practice.
Rationale and Objectives
Receiver operating characteristic (ROC) analysis is often used to find the optimal combination of biomarkers. When subject-level covariates affect the magnitude and/or accuracy of the biomarkers, the combination rule should take covariate adjustment into account. The authors propose two new biomarker combination methods that make use of the covariate information.
Materials and Methods
The first method is to maximize the area under covariate-adjusted ROC curve (AAUC). To overcome the limitations of the AAUC measure, the authors further proposed the area under covariate standardized ROC curve (SAUC), which is an extension of the covariate-specific ROC curve. With a series of simulation studies, the proposed optimal AAUC and SAUC methods are compared with the optimal AUC method that ignores the covariates. The biomarker combination methods are illustrated by an example from Alzheimer's disease research.
The simulation results indicate that the optimal AAUC combination performs well in the current study population. The optimal SAUC method allows any reference population to be chosen, so that the results can be generalized to different populations.
The proposed optimal AAUC and SAUC approaches successfully address the covariate adjustment problem in estimating the optimal marker combination. The optimal SAUC method is preferred for practical use, because the biomarker combination rule can be easily evaluated for different populations of interest.
Biomarker combination; covariate adjustment; AUC; covariate standardization
An important application of microarrays is to discover genomic biomarkers, among tens of thousands of genes assayed, for disease diagnosis and prognosis. Thus it is of interest to develop efficient statistical methods that can simultaneously identify important biomarkers from such high-throughput genomic data and construct appropriate classification rules. It is also of interest to develop methods for evaluation of classification performance and ranking of identified biomarkers.
The ROC (receiver operating characteristic) technique has been widely used in disease classification with low dimensional biomarkers. Compared with the empirical ROC approach, the binormal ROC is computationally more affordable and robust in small sample size cases. We propose using the binormal AUC (area under the ROC curve) as the objective function for two-sample classification, and the scaled threshold gradient directed regularization method for regularized estimation and biomarker selection. Tuning parameter selection is based on V-fold cross validation. We develop Monte Carlo based methods for evaluating the stability of individual biomarkers and overall prediction performance. Extensive simulation studies show that the proposed approach can generate parsimonious models with excellent classification and prediction performance, under most simulated scenarios including model mis-specification. Application of the method to two cancer studies shows that the identified genes are reasonably stable with satisfactory prediction performance and biologically sound implications. The overall classification performance is satisfactory, with small classification errors and large AUCs.
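For orientation, the binormal AUC of a single score has the standard closed form Φ((μ1 − μ0) / sqrt(σ1² + σ0²)), where the subscripts 1 and 0 denote cases and controls. The sketch below plugs sample moments into that formula; the function name `binormal_auc` is an assumption, and the regularized estimation and biomarker selection steps described above are not reproduced here.

```python
from statistics import NormalDist, mean, variance


def binormal_auc(cases, controls):
    """Plug-in binormal AUC: Phi((mean1 - mean0) / sqrt(var1 + var0)).

    Assumes the scores are approximately normal within each group.
    """
    delta = mean(cases) - mean(controls)
    pooled_sd = (variance(cases) + variance(controls)) ** 0.5
    return NormalDist().cdf(delta / pooled_sd)


if __name__ == "__main__":
    # Toy scores: cases are shifted upward by one unit relative to controls.
    print(binormal_auc([2, 3, 4], [1, 2, 3]))
```

Because this objective is a smooth function of the group means and variances, it is cheaper to evaluate and differentiate than the empirical AUC, which is a sum of indicator functions over all case-control pairs.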
In comparison to existing methods, the proposed approach is computationally more affordable without losing the optimality possessed by the standard ROC method.
It has been demonstrated that genes in a cell do not act independently. They interact with one another to complete certain biological processes or to implement certain molecular functions. How to incorporate biological pathways or functional groups into the model and identify survival associated gene pathways is still a challenging problem. In this paper, we propose a novel iterative gradient based method for survival analysis with group Lp penalized global AUC summary maximization. Unlike LASSO, Lp (p < 1) (with its special implementation entitled adaptive LASSO) is asymptotically unbiased and has oracle properties. We first extend Lp for individual gene identification to group Lp penalty for pathway selection, and then develop a novel iterative gradient algorithm for penalized global AUC summary maximization (IGGAUCS). This method incorporates the genetic pathways into global AUC summary maximization and identifies survival associated pathways instead of individual genes. The tuning parameters are determined using 10-fold cross validation with training data only. The prediction performance is evaluated using test data. We apply the proposed method to survival outcome analysis with gene expression profile and identify multiple pathways simultaneously. Experimental results with simulation and gene expression data demonstrate that the proposed procedures can be used for identifying important biological pathways that are related to survival phenotype and for building a parsimonious model for predicting the survival times.
The area under the receiver operating characteristic curve (AUC of ROC) is a widely used measure of discrimination in risk prediction models. Routinely, the Mann–Whitney statistic is used as an estimator of AUC, while the change in AUC is tested by the DeLong test. However, very often, in settings where the model is developed and tested on the same dataset, the added predictor is statistically significantly associated with the outcome but fails to produce a significant improvement in the AUC. No conclusive resolution exists to explain this finding. In this paper, we will show that the reason lies in the inappropriate application of the DeLong test in the setting of nested models. Using numerical simulations and a theoretical argument based on generalized U-statistics, we show that if the added predictor is not statistically significantly associated with the outcome, the null distribution is non-normal, contrary to the assumption of the DeLong test. Our simulations of different scenarios show that the loss of power because of such a misuse of the DeLong test leads to a conservative test for small and moderate effect sizes. This problem does not exist in cases of predictors that are associated with the outcome and for non-nested models. We suggest that for nested models, only the test of association be performed for the new predictors, and if the result is significant, change in AUC be estimated with an appropriate confidence interval, which can be based on the DeLong approach.
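As a concrete reference point, the sketch below computes the Mann-Whitney estimate of a single AUC together with the usual DeLong variance estimate built from the per-subject placement values. The function names are assumptions, and the example covers one AUC only; it does not implement the comparison of nested models discussed above.

```python
from statistics import mean, variance


def psi(x, y):
    """Mann-Whitney kernel: 1 if the case outranks the control, 0.5 on ties."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def auc_with_delong_variance(cases, controls):
    """Return (AUC, estimated variance) for one score.

    Needs at least two cases and two controls for the sample variances.
    """
    m, n = len(cases), len(controls)
    # Placement values: average kernel value for each case and each control.
    v10 = [sum(psi(x, y) for y in controls) / n for x in cases]
    v01 = [sum(psi(x, y) for x in cases) / m for y in controls]
    auc = mean(v10)
    var = variance(v10) / m + variance(v01) / n
    return auc, var


if __name__ == "__main__":
    auc, var = auc_with_delong_variance([3, 4, 5], [1, 2, 6, 0])
    print(auc, var)
```

A normal-theory confidence interval for a single AUC can be formed from the returned variance; the point of the abstract is that the analogous normal approximation for a difference of AUCs breaks down under the null hypothesis for nested models.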
AUC; DeLong test; logistic regression; U-statistics; discrimination; risk prediction
A widely held viewpoint in the field of predictive biomarkers for disease holds that no single marker can provide high enough discrimination and that a panel of markers, combined in some type of algorithm, will be needed. Motivated by a recent study where 27 additional markers for ovarian cancer, many of which had good predictive value alone, failed to substantially increase the predictive ability of the primary marker of CA125, we explore the effect of additional markers on the area under the ROC curve (AUC). We develop a statistical model based on the multivariate normal distribution and linear algorithms and use it to explore how the magnitude and direction of statistical correlation among the markers (in diseased and in non-diseased) is critical in determining the added predictive value of additional markers. We show mathematically and empirically that if the additional marker(s) is negatively correlated with the primary marker, then it will always be able to provide increased AUC when combined with the primary marker (as compared to that obtained with the primary marker alone), even if it has little predictive ability on its own. In contrast, if the additional marker(s) is positively correlated with the primary marker, then it is unlikely to substantially increase the AUC when combined with the primary marker, even when it has good predictive ability on its own. Thus, univariate analyses alone may not be the best approach in choosing which markers to combine in a predictive panel of markers; patterns of statistical correlation should be considered in ranking top-performing biomarkers.
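The role of correlation can be illustrated with a simplified special case: two markers with unit variance and the same correlation rho in diseased and non-diseased subjects, differing only in their means by delta1 and delta2. Under that assumption the best linear combination has AUC Φ(sqrt(Δ²/2)), where Δ² is the squared Mahalanobis distance between the group means. The function names and the numerical values are illustrative assumptions, and the model in the abstract is more general because it allows the correlation to differ between groups.

```python
from statistics import NormalDist


def auc_one_marker(delta):
    """AUC of a single unit-variance normal marker with mean shift delta."""
    return NormalDist().cdf(delta / 2 ** 0.5)


def best_linear_auc(delta1, delta2, rho):
    """AUC of the optimal linear combination of two unit-variance markers
    sharing correlation rho in both groups (simplifying assumption)."""
    mahalanobis_sq = (
        delta1 ** 2 - 2 * rho * delta1 * delta2 + delta2 ** 2
    ) / (1 - rho ** 2)
    return NormalDist().cdf((mahalanobis_sq / 2) ** 0.5)


if __name__ == "__main__":
    primary, weak = 1.0, 0.2  # hypothetical mean shifts
    print("primary alone:      ", auc_one_marker(primary))
    print("added, rho = -0.5:  ", best_linear_auc(primary, weak, -0.5))
    print("added, rho = +0.2:  ", best_linear_auc(primary, weak, 0.2))
```

In this toy setting a weak marker that is negatively correlated with the primary marker raises the combined AUC noticeably, whereas at rho = delta2/delta1 = 0.2 the same marker adds nothing.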
correlation; ROC AUC; biomarkers; multivariate normal distribution; linear algorithm
The area under the ROC curve (AUC) and partial area under the ROC curve (pAUC) are summary measures used to assess the accuracy of a biomarker in discriminating true disease status. The standard sampling approach used in biomarker validation studies is often inefficient and costly, especially when ascertaining the true disease status is costly and invasive. To improve efficiency and reduce the cost of biomarker validation studies, we consider a test-result-dependent sampling (TDS) scheme, in which subject selection for determining the disease state is dependent on the result of a biomarker assay. We first estimate the test-result distribution using data arising from the TDS design. With the estimated empirical test-result distribution, we propose consistent nonparametric estimators for AUC and pAUC and establish the asymptotic properties of the proposed estimators. Simulation studies show that the proposed estimators have good finite sample properties and that the TDS design yields more efficient AUC and pAUC estimates than a simple random sampling (SRS) design. A data example based on an ongoing cancer clinical trial is provided to illustrate the TDS design and the proposed estimators. This work can find broad applications in design and analysis of biomarker validation studies.
Area under ROC curve (AUC); Empirical likelihood; Nonparametric; Partial area under ROC curve (pAUC); Simple random sampling; Test-result-dependent sampling
Objective. To investigate the endocrine and/or clinical characteristics of women with low anti-Müllerian hormone (AMH) that could improve the accuracy of IVF outcome prediction based on the female age alone prior to the first GnRH antagonist IVF cycle. Methods. Medical records of 129 patients with low AMH level (<6.5 pmol/L) who underwent their first GnRH antagonist ovarian stimulation protocol for IVF/ICSI were retrospectively analyzed. The main outcome measure was the area under the ROC curve (AUC-ROC) for the models combining age and other potential predictive factors for the clinical pregnancy. Results. Clinical pregnancy rate (CPR) per initiated cycles was 11.6%. For the prediction of clinical pregnancy, DHEAS and age showed AUC-ROC of 0.726 (95%CI 0.641–0.801) and 0.662 (95%CI 0.573–0.743), respectively (P = 0.522). The predictive accuracy of the model combining age and DHEAS (AUC-ROC 0.796; 95%CI 0.716–0.862) was significantly higher compared to that of age alone (P = 0.013). In patients <37.5 years with DHEAS >5.7 pmol/L, 60% (9/15) of all pregnancies were achieved with CPR of 37.5%. Conclusions. DHEAS appears to be predictive for clinical pregnancy in younger women (<37.5 years) with low AMH after the first GnRH antagonist IVF cycle. Therefore, DHEAS-age model could refine the pretreatment counseling on pregnancy prospects following IVF.
Rationale and Objectives
Two problems of the Dorfman-Berbaum-Metz (DBM) method for analyzing multireader ROC studies are that it tends to be conservative and can produce AUC estimates outside the parameter space – i.e., greater than one or less than zero. Recently it has been shown that the problem of AUC (or other accuracy) estimates outside the parameter space can be eliminated by using normalized pseudovalues, and it has been suggested that less data-based model simplification be used. Our purpose is to empirically investigate if these two modifications – normalized pseudovalues and less data-based model simplification – result in improved performance.
Materials and Methods
We examine the performance of the DBM procedure using the two proposed modifications for discrete and continuous ratings in a null simulation study comparing modalities with respect to the ROC area. The simulation study includes 144 different combinations of reader and case sample sizes, normal/abnormal case sample ratios, and variance components. The ROC area is estimated using parametric and nonparametric estimation.
The DBM procedure with both modifications performs better than either the original DBM procedure or the DBM procedure with only one of the modifications. For parametric estimation with discrete rating data, use of both modifications resulted in the mean type I error (0.043) closest to the nominal .05 level and the smallest range (0.050) and standard deviation (0.0108) across the 144 type I error rates.
We recommend that normalized pseudovalues and less data-based model simplification be used with the DBM procedure.
receiver operating characteristic (ROC) curve; DBM; diagnostic radiology; corrected F
Motivation: The performance of classifiers is often assessed using Receiver Operating Characteristic (ROC) [or accumulation curve (AC) or enrichment curve] curves and the corresponding areas under the curves (AUCs). However, in many fundamental problems ranging from information retrieval to drug discovery, only the very top of the ranked list of predictions is of any interest and ROCs and AUCs are not very useful. New metrics, visualizations and optimization tools are needed to address this ‘early retrieval’ problem.
Results: To address the early retrieval problem, we develop the general concentrated ROC (CROC) framework. In this framework, any relevant portion of the ROC (or AC) curve is magnified smoothly by an appropriate continuous transformation of the coordinates with a corresponding magnification factor. Appropriate families of magnification functions confined to the unit square are derived and their properties are analyzed together with the resulting CROC curves. The area under the CROC curve (AUC[CROC]) can be used to assess early retrieval. The general framework is demonstrated on a drug discovery problem and used to discriminate more accurately the early retrieval performance of five different predictors. From this framework, we propose a novel metric and visualization—the CROC(exp), an exponential transform of the ROC curve—as an alternative to other methods. The CROC(exp) provides a principled, flexible and effective way for measuring and visualizing early retrieval performance with excellent statistical power. Corresponding methods for optimizing early retrieval are also described in the Appendix.
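A rough sketch of the idea follows, assuming the exponential magnification takes the form f(x) = (1 − exp(−αx)) / (1 − exp(−α)), applied to the false positive rate axis. That functional form, the value α = 7, and the function names are assumptions for illustration rather than details stated in this abstract, and the code does not use the CROC package mentioned under Availability. Ties between scores are assumed absent.

```python
import math


def croc_exp(x, alpha=7.0):
    """Assumed exponential magnification of [0, 1]; maps 0 to 0 and 1 to 1
    while stretching the region near 0."""
    return (1.0 - math.exp(-alpha * x)) / (1.0 - math.exp(-alpha))


def auc_croc(positives, negatives, alpha=7.0):
    """Area under the step ROC curve after transforming the FPR axis."""
    n1, n0 = len(positives), len(negatives)
    area = 0.0
    # The j-th highest negative defines the ROC step over FPR in [(j-1)/n0, j/n0].
    for j, y in enumerate(sorted(negatives, reverse=True), start=1):
        tpr = sum(x > y for x in positives) / n1
        width = croc_exp(j / n0, alpha) - croc_exp((j - 1) / n0, alpha)
        area += tpr * width
    return area


if __name__ == "__main__":
    negatives = [1, 2, 3, 4]
    early = [10, 0]      # one positive ranked first, one ranked last
    middle = [2.5, 2.6]  # both positives in the middle of the list
    # Both rankings have an ordinary AUC of 0.5, but differ in early retrieval.
    print(auc_croc(early, negatives), auc_croc(middle, negatives))
```

The two toy rankings are indistinguishable by the ordinary AUC, while the transformed area rewards the ranking that places a positive at the very top.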
Availability: Datasets are publicly available. Python code and command-line utilities implementing CROC curves and metrics are available at http://pypi.python.org/pypi/CROC/
Although the area under the receiver operating characteristic (ROC) curve (AUC) is the most popular measure of the performance of prediction models, it has limitations, especially when it is used to evaluate the added discrimination of a new risk marker in an existing risk model. Pencina et al. (2008) proposed two indices, the net reclassification improvement (NRI) and integrated discrimination improvement (IDI), to supplement the improvement in the AUC (IAUC). Their NRI and IDI are based on binary outcomes in case-control settings, which do not involve time-to-event outcome. However, many disease outcomes are time-dependent and the onset time can be censored. Measuring discrimination potential of a prognostic marker without considering time to event can lead to biased estimates. In this paper, we extended the NRI and IDI to time-to-event settings and derived the corresponding sample estimators and asymptotic tests. Simulation studies showed that the time-dependent NRI and IDI have better performance than Pencina’s NRI and IDI for measuring the improved discriminatory power of a new risk marker in prognostic survival models.
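For reference, the binary-outcome versions of the two indices can be sketched as below: the NRI counts movements across risk categories separately in events and non-events, and the IDI compares the change in mean predicted risk between the two groups. The function names and the risk cutoff are hypothetical, and the time-dependent extensions proposed in this abstract, which must handle censoring, are not implemented here.

```python
from bisect import bisect_right


def nri(p_old, p_new, events, cutoffs):
    """Category-based net reclassification improvement for binary outcomes."""
    up_e = down_e = up_n = down_n = 0
    for po, pn, e in zip(p_old, p_new, events):
        # Risk category index under each model (cutoffs must be sorted).
        move = bisect_right(cutoffs, pn) - bisect_right(cutoffs, po)
        if e:
            up_e += move > 0
            down_e += move < 0
        else:
            up_n += move > 0
            down_n += move < 0
    n_e = sum(1 for e in events if e)
    n_n = len(events) - n_e
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


def idi(p_old, p_new, events):
    """Integrated discrimination improvement for binary outcomes."""
    diff_e = [pn - po for po, pn, e in zip(p_old, p_new, events) if e]
    diff_n = [pn - po for po, pn, e in zip(p_old, p_new, events) if not e]
    return sum(diff_e) / len(diff_e) - sum(diff_n) / len(diff_n)


if __name__ == "__main__":
    p_old = [0.2, 0.4, 0.6, 0.3]
    p_new = [0.1, 0.6, 0.7, 0.2]
    events = [0, 1, 1, 0]
    print(nri(p_old, p_new, events, cutoffs=[0.5]), idi(p_old, p_new, events))
```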
Improved discrimination; Prognostic survival models; Time-dependent NRI; Time-dependent IDI
The surge in biomarker development calls for research on statistical evaluation methodology to rigorously assess emerging biomarkers and classification models. Recently, several authors reported the puzzling observation that, in assessing the added value of new biomarkers to existing ones in a logistic regression model, statistical significance of new predictor variables does not necessarily translate into a statistically significant increase in the area under the ROC curve (AUC). Vickers et al. concluded that this inconsistency is because AUC “has vastly inferior statistical properties,” i.e., it is extremely conservative. This statement is based on simulations that misuse the DeLong et al. method. Our purpose is to provide a fair comparison of the likelihood ratio (LR) test and the Wald test versus diagnostic accuracy (AUC) tests.
We present a test to compare ideal AUCs of nested linear discriminant functions via an F test. We compare it with the LR test and the Wald test for the logistic regression model. The null hypotheses of these three tests are equivalent; however, the F test is an exact test whereas the LR test and the Wald test are asymptotic tests. Our simulation shows that the F test has the nominal type I error even with a small sample size. Our results also indicate that the LR test and the Wald test have inflated type I errors when the sample size is small, while the type I error converges to the nominal value asymptotically with increasing sample size as expected. We further show that the DeLong et al. method tests a different hypothesis and has the nominal type I error when it is used within its designed scope. Finally, we summarize the pros and cons of all four methods we consider in this paper.
We show that there is nothing inherently less powerful or disagreeable about ROC analysis for showing the usefulness of new biomarkers or characterizing the performance of classification models. Each statistical method for assessing biomarkers and classification models has its own strengths and weaknesses. Investigators need to choose methods based on the assessment purpose, the biomarker development phase at which the assessment is being performed, the available patient data, and the validity of assumptions behind the methodologies.
Biomarkers; Classification; Area under the ROC curve
The area under a receiver operating characteristic (ROC) curve (AUC) is a commonly used index for summarizing the ability of a continuous diagnostic test to discriminate between healthy and diseased subjects. If all subjects have their true disease status verified, one can directly estimate the AUC nonparametrically using the Wilcoxon statistic. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Because estimators of the AUC based only on verified subjects are typically biased, it is common to estimate the AUC from a bias-corrected ROC curve. The variance of the estimator, however, does not have a closed-form expression and thus resampling techniques are used to obtain an estimate. In this paper, we develop a new method for directly estimating the AUC in the setting of verification bias based on U-statistics and inverse probability weighting. Closed-form expressions for the estimator and its variance are derived. We also show that the new estimator is equivalent to the empirical AUC derived from the bias-corrected ROC curve arising from the inverse probability weighting approach.
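One way to picture an inverse-probability-weighted AUC is as a Mann-Whitney statistic in which each verified subject is weighted by the reciprocal of its verification probability. The sketch below follows that idea; the function name, the argument layout, and the assumption that the verification probabilities are known (rather than estimated) are illustrative, and the code should not be read as the exact estimator or variance formula derived in the paper.

```python
def ipw_auc(test, verified, disease, prob_verified):
    """Weighted Mann-Whitney AUC using only verified subjects.

    test          : test results for all subjects
    verified      : truthy if disease status was verified
    disease       : truthy for diseased (ignored when not verified)
    prob_verified : probability of verification for each subject (assumed known)
    """
    cases, controls = [], []
    for t, v, d, p in zip(test, verified, disease, prob_verified):
        if not v:
            continue  # unverified subjects contribute only through the weights
        (cases if d else controls).append((t, 1.0 / p))
    num = den = 0.0
    for tx, wx in cases:
        for ty, wy in controls:
            w = wx * wy
            den += w
            if tx > ty:
                num += w
            elif tx == ty:
                num += 0.5 * w
    return num / den


if __name__ == "__main__":
    # The control with test result 3 had only a 50% chance of being verified,
    # so it stands in for two subjects; the last subject was never verified.
    print(ipw_auc([2, 4, 1, 3, 9], [1, 1, 1, 1, 0],
                  [1, 1, 0, 0, None], [1, 1, 1, 0.5, 0.5]))
```

When every subject is verified with probability one, the weights are all equal and the function reduces to the ordinary Wilcoxon estimate of the AUC.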
Diagnostic test; Inverse probability weighting; Missing at random; U-statistic
An increasing number of genetic variants have been identified for many complex diseases. However, it is controversial whether risk prediction based on genomic profiles will be useful clinically. Appropriate statistical measures to evaluate the performance of genetic risk prediction models are required. Previous studies have mainly focused on the use of the area under the receiver operating characteristic (ROC) curve, or AUC, to judge the predictive value of genetic tests. However, AUC has its limitations and should be complemented by other measures. In this study, we develop a novel unifying statistical framework that connects a large variety of predictive indices together. We showed that, given the overall disease probability and the level of variance in total liability (or heritability) explained by the genetic variants, we can estimate analytically a large variety of prediction metrics, for example the AUC, the mean risk difference between cases and non-cases, the net reclassification improvement (ability to reclassify people into high- and low-risk categories), the proportion of cases explained by a specific percentile of population at the highest risk, the variance of predicted risks, and the risk at any percentile. We also demonstrate how to construct graphs to visualize the performance of risk models, such as the ROC curve, the density of risks, and the predictiveness curve (disease risk plotted against risk percentile). The results from simulations match very well with our theoretical estimates. Finally we apply the methodology to nine complex diseases, evaluating the predictive power of genetic tests based on known susceptibility variants for each trait.
Recently many genetic variants have been established for diseases, and the findings have raised hope for risk prediction based on genomic profiles. However, we need to have proper statistical measures to assess the usefulness of such tests. In this study, we developed a statistical framework which enables us to evaluate many predictive indices analytically. It is based on the liability threshold model, which postulates a latent liability that is normally distributed. Affected individuals are assumed to have a liability exceeding a certain threshold. We demonstrated that, given the overall disease probability and variance in liability explained by the genetic markers, we can compute a variety of predictive indices. An example is the area under the receiver operating characteristic (ROC) curve, or AUC, which is very commonly employed. However, the limitations of AUC are often ignored, and we proposed complementing it with other indices. We have therefore also computed other metrics like the average difference in risks between cases and non-cases, the ability of reclassification into high- and low-risk categories, and the proportion of cases accounted for by a certain percentile of population at the highest risk. We also derived how to construct graphs showing the risk distribution in population.
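The liability threshold model lends itself to a quick simulation check. In the sketch below, a standard normal liability is split into a genetic score explaining a chosen fraction of the variance plus an independent residual, and individuals whose liability exceeds the threshold implied by the prevalence are affected. The function name, sample size, and parameter values are assumptions for illustration; the paper derives such quantities analytically rather than by simulation.

```python
import random
from statistics import NormalDist


def simulate_auc(var_explained, prevalence, n=20000, seed=0):
    """Empirical AUC of a genetic score under the liability threshold model."""
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(1.0 - prevalence)
    sd_g = var_explained ** 0.5
    sd_e = (1.0 - var_explained) ** 0.5
    data = []
    for _ in range(n):
        g = rng.gauss(0.0, sd_g)               # genetic score
        liability = g + rng.gauss(0.0, sd_e)   # total liability, variance 1
        data.append((g, liability > threshold))
    # Rank-sum form of the Mann-Whitney AUC (scores are continuous, so ties
    # are not expected when var_explained > 0).
    data.sort(key=lambda pair: pair[0])
    n1 = sum(1 for _, affected in data if affected)
    n0 = n - n1
    rank_sum = sum(rank for rank, (_, affected) in enumerate(data, 1) if affected)
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)


if __name__ == "__main__":
    for frac in (0.2, 0.5, 0.8):
        print(frac, simulate_auc(frac, prevalence=0.1))
```

The simulated AUC rises with the fraction of liability variance explained and reaches one only when the score determines liability completely.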
Genome-wide association studies in human populations have facilitated the creation of genomic profiles which combine the effects of many associated genetic variants to predict risk of disease. The area under the receiver operating characteristic (ROC) curve is a well-established measure for determining the efficacy of tests in correctly classifying diseased and non-diseased individuals. We use quantitative genetics theory to provide insight into the genetic interpretation of the area under the ROC curve (AUC) when the test classifier is a predictor of genetic risk. Even when the proportion of genetic variance explained by the test is 100%, there is a maximum value for AUC that depends on the genetic epidemiology of the disease, i.e. either the sibling recurrence risk or heritability and disease prevalence. We derive an equation relating maximum AUC to heritability and disease prevalence. The expression can be reversed to calculate the proportion of genetic variance explained given AUC, disease prevalence, and heritability. We use published estimates of disease prevalence and sibling recurrence risk for 17 complex genetic diseases to calculate the proportion of genetic variance that a test must explain to achieve AUC = 0.75; this varied from 0.10 to 0.74. We provide a genetic interpretation of AUC for use with predictors of genetic risk based on genomic profiles. We provide a strategy, available as an online calculator, to estimate the proportion of genetic variance explained on the liability scale from estimates of AUC, disease prevalence, and heritability (or sibling recurrence risk).
Genome-wide association studies in human populations have facilitated the creation of genomic profiles that combine the effects of many associated genetic variants to predict risk of disease. However, genomic profiles are inherently constrained in their ability to distinguish diseased from non-diseased individuals, and this constraint is dictated by the genetic epidemiology of the disease. In this paper, we use a genetic interpretation to provide insight into the constraints on genomic profiles for risk prediction. We provide a strategy, available as an online calculator, to estimate the proportion of genetic variance explained on the liability scale from estimates of AUC, disease prevalence, and heritability.
Rationale and Objective
To assess similarities and differences between methods of performance comparisons under binary (yes/no) and receiver operating characteristic (ROC)-type pseudo-continuous (0-100) rating data ascertained during an observer performance study of interpretation of full-field digital mammography (FFDM) versus FFDM plus digital breast tomosynthesis (DBT).
Materials and Methods
Rating data consisted of ROC-type pseudo-continuous and binary ratings generated by 8 radiologists evaluating 77 digital mammography examinations. Overall performance levels were summarized with a conventionally used probability of correct discrimination, or equivalently the area under the ROC curve (AUC), which under a binary scale is related to Youden's index. Magnitudes of differences in the reader-averaged empirical AUCs between an FFDM alone mode versus an FFDM plus DBT mode were compared in the context of fixed and random reader variability of the estimates.
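The stated link between the AUC under a binary scale and Youden's index follows from the geometry of a one-point empirical ROC curve: the trapezoidal area through (0, 0), (1 - specificity, sensitivity), and (1, 1) equals (sensitivity + specificity) / 2, that is, (J + 1) / 2 for Youden's index J. A minimal sketch follows; the function names and the toy ratings are illustrative assumptions, not data from the study.

```python
def binary_operating_point(ratings, truth):
    """Sensitivity and specificity of binary ratings against the true status."""
    tp = sum(1 for r, t in zip(ratings, truth) if r and t)
    fn = sum(1 for r, t in zip(ratings, truth) if not r and t)
    tn = sum(1 for r, t in zip(ratings, truth) if not r and not t)
    fp = sum(1 for r, t in zip(ratings, truth) if r and not t)
    return tp / (tp + fn), tn / (tn + fp)


def binary_auc(sensitivity, specificity):
    """Trapezoidal area under the ROC curve defined by a single operating point."""
    fpr = 1.0 - specificity
    left = 0.5 * fpr * sensitivity                    # triangle from (0,0)
    right = 0.5 * (1.0 - fpr) * (sensitivity + 1.0)   # trapezoid up to (1,1)
    return left + right


def youden_index(sensitivity, specificity):
    return sensitivity + specificity - 1.0


if __name__ == "__main__":
    # Hypothetical ratings (1 = recall) and truth (1 = cancer) for eight cases.
    ratings = [1, 1, 0, 1, 0, 0, 0, 1]
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    sens, spec = binary_operating_point(ratings, truth)
    print(binary_auc(sens, spec), (youden_index(sens, spec) + 1.0) / 2.0)
```

Because the binary AUC depends only on the single operating point, a difference between modes on this scale reflects where each reader actually operates.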
The absolute differences between modes using the empirical AUCs were larger on average for the binary scale (0.12 vs. 0.07) and for the majority of individual readers (6 out of 8). Standardized differences were consistent with this finding (2.32 vs. 1.63 on average). Reader-averaged differences in AUCs standardized by fixed and random-reader variances were also smaller under the binary rating paradigm. The discrepancy between AUC differences depended on the location of the reader-specific binary operating points.
The human observer's operating point should be a primary consideration in designing an observer performance study. Although the ROC-type rating paradigm in general provides more detailed information on the characteristics of different modes, it does not reflect the actual operating point adopted by human observers. There are application-driven scenarios where analysis based on binary responses may provide statistical advantages.
breast cancer; digital breast tomosynthesis; observer performance; rating scale
Many efforts have been made to reduce prostate-specific antigen (PSA) overdiagnosis and overtreatment. To this end, the Prostate Health Index (phi) and Prostate Cancer Antigen 3 (PCA3) have been proposed as new, more specific biomarkers. We evaluated the ability of phi and PCA3 to identify prostate cancer (PCa) at initial prostate biopsy in men with a total PSA range of 2–10 ng/ml. The performance of phi and PCA3 was evaluated in 300 patients undergoing first prostate biopsy. ROC curve analyses tested the accuracy (AUC) of phi and PCA3 in predicting PCa. Decision curve analyses (DCA) were used to compare the clinical benefit of the two biomarkers. We found that the AUC value of phi (0.77) was comparable to those of %p2PSA (0.76) and PCA3 (0.73), with no significant differences in pairwise comparison (%p2PSA vs. phi p = 0.673, %p2PSA vs. PCA3 p = 0.417 and phi vs. PCA3 p = 0.247). These three biomarkers significantly outperformed fPSA (AUC = 0.60), %fPSA (AUC = 0.62) and p2PSA (AUC = 0.63). At DCA, phi and PCA3 exhibited a very close net benefit profile up to a threshold probability of 25%, above which phi showed higher net benefit than PCA3. Multivariable analysis showed that the addition of phi and PCA3 to the base multivariable model (age, PSA, %fPSA, DRE, prostate volume) increased predictive accuracy, whereas no model improved single biomarker performance. Finally, we showed that subjects with active surveillance (AS) compatible cancer had significantly lower phi and PCA3 values (p < 0.001 and p = 0.01, respectively). In conclusion, both phi and PCA3 comparably increase the accuracy in predicting the presence of PCa in the total PSA range 2–10 ng/ml at initial biopsy, outperforming the currently used %fPSA.
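The net benefit compared in a decision curve analysis can be sketched as follows. This uses the commonly cited definition, net benefit = TP/n - (FP/n) × pt/(1 - pt) at threshold probability pt, which is assumed here because the abstract does not state the formula. The function names and the toy predicted risks are hypothetical.

```python
def net_benefit(predicted_risk, outcome, threshold):
    """Net benefit of biopsying everyone whose predicted risk is >= threshold."""
    n = len(outcome)
    tp = sum(1 for p, y in zip(predicted_risk, outcome) if p >= threshold and y)
    fp = sum(1 for p, y in zip(predicted_risk, outcome) if p >= threshold and not y)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcome, threshold):
    """Reference strategy: biopsy every patient regardless of predicted risk."""
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Hypothetical predicted risks from a biomarker model and biopsy outcomes.
    risks = [0.10, 0.40, 0.35, 0.80]
    outcome = [0, 0, 1, 1]
    for pt in (0.10, 0.25, 0.50):
        print(pt, net_benefit(risks, outcome, pt), net_benefit_treat_all(outcome, pt))
```

Plotting net benefit against the threshold probability for each model, alongside the biopsy-all and biopsy-none (net benefit 0) strategies, gives the decision curve.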
Lung cancer is a complex polygenic disease. Although recent genome-wide association (GWA) studies have identified multiple susceptibility loci for lung cancer, most of these variants have not been validated in a Chinese population. In this study, we investigated whether a genetic risk score combining multiple.
Five single-nucleotide polymorphisms (SNPs) identified in previous GWA or large cohort studies were genotyped in 5068 Chinese case–control subjects. The genetic risk score (GRS) based on these SNPs was estimated by two approaches: a simple risk alleles count (cGRS) and a weighted (wGRS) method. The area under the receiver operating characteristic (ROC) curve (AUC) in combination with the bootstrap resampling method was used to assess the predictive performance of the genetic risk score for lung cancer.
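The two scoring approaches and the bootstrap assessment can be sketched as follows. Several details are assumptions, since the abstract does not specify them: genotypes are coded as risk-allele counts (0, 1, or 2), the wGRS weights are taken to be per-SNP log odds ratios, and the bootstrap is a simple percentile interval rather than the over-fitting adjustment used in the study. All names and values are illustrative.

```python
import random


def count_grs(genotype):
    """cGRS: simple count of risk alleles across SNPs."""
    return sum(genotype)


def weighted_grs(genotype, weights):
    """wGRS: risk-allele counts weighted per SNP (e.g. by log odds ratio)."""
    return sum(w * g for w, g in zip(weights, genotype))


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 1/2)."""
    wins = 0.0
    for c in case_scores:
        for d in control_scores:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def bootstrap_auc_interval(case_scores, control_scores, n_boot=500, seed=0):
    """Percentile 95% interval for the AUC, resampling cases and controls separately."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        cs = rng.choices(case_scores, k=len(case_scores))
        ds = rng.choices(control_scores, k=len(control_scores))
        stats.append(auc(cs, ds))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


if __name__ == "__main__":
    weights = [0.10, 0.20, 0.30, 0.40]           # hypothetical log odds ratios
    cases = [[1, 2, 1, 1], [2, 1, 2, 0], [1, 1, 2, 2]]
    controls = [[0, 1, 1, 0], [1, 0, 1, 1], [0, 2, 0, 1]]
    case_w = [weighted_grs(g, weights) for g in cases]
    control_w = [weighted_grs(g, weights) for g in controls]
    print(auc(case_w, control_w), bootstrap_auc_interval(case_w, control_w))
```

The pairwise AUC is quadratic in sample size, which is adequate for a sketch; a rank-based computation would be preferable for several thousand subjects.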
Four independent SNPs (rs2736100, rs402710, rs4488809 and rs4083914) were found to be associated with a risk of lung cancer. The wGRS based on these four SNPs was a better predictor than the cGRS. Using a liability threshold model, we estimated that these four SNPs accounted for only 4.02% of genetic variance in lung cancer. Smoking history contributed significantly to lung cancer risk (P < 0.001) [AUC = 0.619 (0.603-0.634)], and combining it with the wGRS gave an AUC of 0.639 (0.621-0.652) after adjustment for over-fitting. This model shows promise for assessing lung cancer risk in a Chinese population.
Our results indicate that although genetic variants related to lung cancer added only moderate discriminatory accuracy, they still improved the predictive ability of the assessment model in a Chinese population.
Chinese; Cumulative risk; Genetic risk score; Lung cancer; Risk assessment
We investigated whether metabolic biomarkers and single nucleotide polymorphisms (SNPs) improve diabetes prediction beyond age, anthropometry, and lifestyle risk factors.
RESEARCH DESIGN AND METHODS
A case-cohort study within a prospective study was designed. We randomly selected a subcohort (n = 2,500) from 26,444 participants, of whom 1,962 were diabetes free at baseline. Of the 801 incident type 2 diabetes cases identified in the cohort during 7 years of follow-up, 579 remained for analyses after exclusions. Prediction models were compared by receiver operating characteristic (ROC) curve and integrated discrimination improvement.
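The integrated discrimination improvement used to compare models can be sketched with its usual definition, assumed here because the abstract does not spell it out: the gain in discrimination slope (mean predicted risk in cases minus mean predicted risk in non-cases) when moving from the old model to the new one. The sketch ignores the sampling weights that a case-cohort design would normally require, and the predicted risks are hypothetical.

```python
def discrimination_slope(predicted_risk, outcome):
    """Mean predicted risk among cases minus mean predicted risk among non-cases."""
    cases = [p for p, y in zip(predicted_risk, outcome) if y]
    non_cases = [p for p, y in zip(predicted_risk, outcome) if not y]
    return sum(cases) / len(cases) - sum(non_cases) / len(non_cases)


def integrated_discrimination_improvement(old_risk, new_risk, outcome):
    """IDI: change in discrimination slope from the old to the new model."""
    return discrimination_slope(new_risk, outcome) - discrimination_slope(old_risk, outcome)


if __name__ == "__main__":
    outcome = [1, 1, 0, 0]
    lifestyle_only = [0.6, 0.5, 0.4, 0.3]   # hypothetical risks, basic model
    with_markers = [0.8, 0.7, 0.3, 0.2]     # hypothetical risks, extended model
    print(integrated_discrimination_improvement(lifestyle_only, with_markers, outcome))
```

A positive value indicates that the extended model separates cases from non-cases more sharply on the risk scale, which complements a comparison of ROC-AUC values.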
Case-control discrimination by the lifestyle characteristics (ROC-AUC: 0.8465) improved with plasma glucose (ROC-AUC: 0.8672, P < 0.001) and A1C (ROC-AUC: 0.8859, P < 0.001). ROC-AUC further improved with HDL cholesterol, triglycerides, γ-glutamyltransferase, and alanine aminotransferase (0.9000, P = 0.002). Twenty SNPs did not improve discrimination beyond these characteristics (P = 0.69).
Metabolic markers, but not genotyping for 20 diabetogenic SNPs, improve discrimination of incident type 2 diabetes beyond lifestyle risk factors.
Because debate continues over the role of combination, platinum-based chemotherapy for platinum-sensitive (PS), recurrent ovarian cancer (OC), we compared overall survival (OS), progression-free survival (PFS), confirmed complete response rate and time to treatment failure in this population.
Patients with recurrent stage III or IV OC, a progression-free and platinum-free interval of 6-24 months after first-line platinum-based chemotherapy and up to 12 courses of a non-platinum containing consolidation treatment were eligible. Patients were randomized to IV pegylated liposomal doxorubicin (PLD) (30 mg/m2) plus IV carboplatin (AUC = 5 mg/mL × min) once every 4 weeks (PLD arm) or IV carboplatin alone (AUC = 5 mg/mL × min) once every 4 weeks.
The PLD arm enrolled 31 patients and the carboplatin alone arm 30 for a total of 61 patients out of 900 planned. Response rates were 67% for the PLD arm and 32% for the carboplatin only arm (Fisher’s exact p=0.02). The estimated median PFS was 12 and 8 months for PLD versus carboplatin alone. The estimated median OS on the PLD arm was 26 months and 18 months on the carboplatin only arm (p=0.02). Twenty-six percent of the patients on the PLD arm reported grade 4 toxicities, all hematological in nature.
This study was closed early because of slow patient accrual. The response rate, median PFS and OS results are intriguing. These data suggest that there may be an advantage to the PLD plus carboplatin combination treatment in patients with PS, recurrent OC. The regimen should be further tested.
To investigate the validity of using electronic medical records (EMR) database in a large health organization for identifying patients with clinical depression.
The Massachusetts General Hospital EMR system was used to generate a sample of primary care patients seen in the primary care clinic in 2007. Using this sample, the validity of using certain fields in the EMR database (i.e., billing diagnosis, problem list, and medication list) to identify patients with clinical depression was compared to primary care physician (PCP) assessment by a written questionnaire. Based on this standard, the sensitivity, specificity, positive predictive value, negative predictive value, and the area under the receiver operating characteristic curve (AUC) of three specific EMR fields, individually and in combination, were calculated to identify which EMR field best predicted PCP classification.
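The validity measures listed above all derive from a two-by-two table of EMR field against the reference standard. A minimal sketch follows; the function name and the counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Validity measures for a binary EMR field against the PCP reference standard.

    tp: field positive, PCP positive    fp: field positive, PCP negative
    fn: field negative, PCP positive    tn: field negative, PCP negative
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),   # positive predictive value
        "npv": tn / (tn + fn),   # negative predictive value
    }


if __name__ == "__main__":
    # Hypothetical counts for one EMR field.
    for name, value in diagnostic_metrics(tp=40, fp=20, fn=10, tn=130).items():
        print(name, round(value, 3))
```

Unlike sensitivity and specificity, the two predictive values depend on how common depression is in the sample, so they do not transfer directly to settings with a different prevalence.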
The EMR fields “billing diagnosis,” “problem list,” and antidepressant in “medication list” were all able to identify patients’ diagnosis of depression by their PCPs reasonably well. Having one or more “billing diagnosis” of depression had the highest sensitivity and highest AUC (77% sensitivity, 76% specificity, AUC 0.77) of the fields used alone.
The “billing diagnosis” of depression performed the best of the three single fields tested, with an AUC of 0.77, corresponding to a test with moderate accuracy. This analysis demonstrates that specific EMR fields can be used as a proxy for PCP assessment of depression for this EMR system. Limitations to our analysis include the physician response rate to our survey as well as the quality of the data, which are collected primarily for administrative and clinical purposes. When using administrative and clinical data in mental health studies, researchers must first assess the accuracy of specific fields within their EMR system to determine whether they can be used as proxies for clinical diagnoses.
Depression; Electronic Medical Records; Clinical Research Methods
Pharmacokinetic and pharmacodynamic (PK/PD) indices are increasingly being used in the microbiological field to assess the efficacy of a dosing regimen. In contrast to methods using MIC, PK/PD-based methods reflect in vivo conditions and are more predictive of efficacy. Unfortunately, they entail the use of one PK-derived value such as AUC or Cmax and may thus lead to biased efficiency information when the variability is large. The aim of the present work was to evaluate the efficacy of a treatment by adjusting classical breakpoint estimation methods to the situation of variable PK profiles.
Methods and results
We propose a logical generalisation of the usual AUC methods by introducing the concept of "efficiency" for a PK profile, which involves the efficacy function as a weight. We formulated these methods for both classes of concentration- and time-dependent antibiotics. Using drug models and in silico approaches, we provide a theoretical basis for characterizing the efficiency of a PK profile under in vivo conditions. We also used the particular case of variable drug intake to assess the effect of the variable PK profiles generated and to analyse the implications for breakpoint estimation.
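One way to picture an efficacy-weighted AUC is sketched below. The abstract does not give the definition, so every modelling choice here is an assumption made for illustration: a one-compartment intravenous bolus concentration profile, a Hill-type efficacy function used as the weight, and trapezoidal integration of the efficacy over time. None of the names or parameter values come from the paper.

```python
import math


def concentration(t, dose, volume, k_elim):
    """Assumed one-compartment IV bolus profile: C(t) = (dose / V) * exp(-k * t)."""
    return dose / volume * math.exp(-k_elim * t)


def hill_efficacy(c, ec50, hill):
    """Assumed efficacy weight in [0, 1) as a function of concentration."""
    return c ** hill / (ec50 ** hill + c ** hill)


def trapezoid(values, step):
    """Trapezoidal rule for equally spaced samples."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def plain_and_weighted_auc(dose, volume, k_elim, ec50, hill, t_end, n_steps=2400):
    """Classical AUC of the profile and the integral of efficacy along the profile."""
    step = t_end / n_steps
    conc = [concentration(i * step, dose, volume, k_elim) for i in range(n_steps + 1)]
    plain = trapezoid(conc, step)
    weighted = trapezoid([hill_efficacy(c, ec50, hill) for c in conc], step)
    return plain, weighted


if __name__ == "__main__":
    print(plain_and_weighted_auc(dose=100.0, volume=10.0, k_elim=0.2,
                                 ec50=2.0, hill=2.0, t_end=24.0))
```

Under these assumptions, two profiles with the same classical AUC can yield different weighted values, because the weight saturates at high concentrations and vanishes at low ones.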
Compared to traditional methods, our weighted AUC approach gives a more powerful PK/PD link and reveals, through examples, interesting issues about the uniqueness of therapeutic outcome indices and antibiotic resistance problems.