The area under the receiver operating characteristic curve (AUC) is the most commonly reported measure of discrimination for prediction models with binary outcomes. However, recently it has been criticized for its inability to increase when important risk factors are added to a baseline model with good discrimination. This has led to the claim that the reliance on the AUC as a measure of discrimination may miss important improvements in clinical performance of risk prediction rules derived from a baseline model. In this paper we investigate this claim by relating the AUC to measures of clinical performance based on sensitivity and specificity under the assumption of multivariate normality. The behavior of the AUC is contrasted with that of discrimination slope. We show that unless rules with very good specificity are desired, the change in the AUC does an adequate job as a predictor of the change in measures of clinical performance. However, stronger or more numerous predictors are needed to achieve the same increment in the AUC for baseline models with good versus poor discrimination. When excellent specificity is desired, our results suggest that the discrimination slope might be a better measure of model improvement than AUC. The theoretical results are illustrated using a Framingham Heart Study example of a model for predicting the 10-year incidence of atrial fibrillation.
risk prediction; discrimination; AUC; IDI; Youden index; relative utility
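Under the multivariate-normality assumption used in the paper, the AUC has a closed form, AUC = Φ(δ/√2), where δ is the standardized mean difference of the linear predictor between events and non-events. A minimal Python sketch (an illustration of the binormal relation, not the authors' code) of why the same added predictor yields a smaller AUC increment on a strong baseline:

```python
from math import sqrt
from statistics import NormalDist

Phi = NormalDist().cdf

def auc_from_delta(delta):
    # Binormal model: AUC = Phi(delta / sqrt(2)) for standardized
    # mean difference `delta` between events and non-events.
    return Phi(delta / sqrt(2))

def delta_from_auc(auc):
    # Inverse of the relation above.
    return NormalDist().inv_cdf(auc) * sqrt(2)

def auc_after_adding(auc_base, delta_new):
    # An independent new predictor: effect sizes combine in quadrature.
    d = sqrt(delta_from_auc(auc_base) ** 2 + delta_new ** 2)
    return auc_from_delta(d)

# The same new predictor (delta = 0.5) added to a weak vs a strong baseline:
gain_weak = auc_after_adding(0.60, 0.5) - 0.60
gain_strong = auc_after_adding(0.85, 0.5) - 0.85
print(gain_weak, gain_strong)  # the gain shrinks as baseline AUC grows
```

This reproduces the abstract's qualitative point: stronger predictors are needed to achieve the same AUC increment when the baseline model already discriminates well.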
A breast cancer risk prediction model for black women, developed from data in the Women’s Contraceptive and Reproductive Experiences (CARE) study, has been validated in women aged 50 years or older but not among younger women or for specific breast cancer subtypes.
We assessed calibration and discrimination of the CARE model in the Black Women’s Health Study (BWHS) with data from 45 942 women aged 30 to 69 years at baseline.
During a mean follow-up of 9.5 years, we identified 852 invasive breast cancers. The CARE model predicted 749.6 breast cancers, yielding an expected-to-observed (E/O) ratio of 0.88 (95% confidence interval [CI] = 0.82 to 0.94). The E/O ratio did not appreciably differ between women aged less than 50 years and those aged 50 years or older. The model underpredicted risk to the greatest degree among women aged 25 years or older at birth of first child (E/O = 0.71, 95% CI = 0.63 to 0.81); the model was well calibrated among women aged less than 25 years at birth of first child. The prevalence of later age at birth of first child was higher in the BWHS than in the CARE study, and breast cancer incidence was higher in the BWHS compared with national rates used in the CARE model. With respect to discriminatory accuracy, the concordance statistic was 0.57 (95% CI = 0.55 to 0.59) for breast cancer overall, 0.59 (95% CI = 0.57 to 0.61) for estrogen receptor (ER)-positive breast cancer, and 0.54 (95% CI = 0.50 to 0.57) for ER-negative breast cancer.
The CARE model underpredicted breast cancer risk in the BWHS, at least in part because of older age at first birth in this cohort, which led to higher breast cancer incidence rates. Our results suggest that inclusion of age at first birth may improve model performance. Discriminatory accuracy was modest and worse for ER-negative breast cancer.
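The calibration summary used above, the expected-to-observed (E/O) ratio, is commonly given a Poisson-based confidence interval on the log scale. A sketch using the abstract's own totals (the CI formula is a standard approximation, assumed here rather than taken from the paper):

```python
from math import exp, sqrt

def eo_ratio(expected, observed, z=1.96):
    # Expected/observed ratio with a CI built on the log scale,
    # treating the observed count as Poisson: var(log O) ~ 1/O.
    ratio = expected / observed
    half = z * sqrt(1.0 / observed)
    return ratio, ratio * exp(-half), ratio * exp(half)

# Totals from the abstract: 749.6 predicted vs 852 observed cancers.
r, lo, hi = eo_ratio(749.6, 852)
print(round(r, 2), round(lo, 2), round(hi, 2))  # 0.88 0.82 0.94
```

The result matches the reported E/O of 0.88 (95% CI = 0.82 to 0.94).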
Cardiovascular disease (CVD) is among the leading causes of death and disability worldwide. Since its beginning, the Framingham study has been a leader in identifying CVD risk factors. Clinical trials have demonstrated that when the modifiable risk factors are treated and corrected, the chances of CVD occurring can be reduced. The Framingham study also recognized that CVD risk factors are multifactorial and interact over time to produce CVD. In response, Framingham investigators developed the Framingham Risk Functions (also called Framingham Risk Scores) to evaluate the chance or likelihood of developing CVD in individuals. These functions are multivariate functions (algorithms) that combine the information in CVD risk factors such as sex, age, systolic blood pressure, total cholesterol, high-density lipoprotein cholesterol, smoking behavior, and diabetes status to produce an estimate (or risk) of developing CVD or a component of CVD (such as coronary heart disease, stroke, peripheral vascular disease, or heart failure) over a fixed time, for example, the next 10 years. These estimates of CVD risk are often major inputs in recommending drug treatments such as cholesterol-lowering drugs.
Screening for osteoporosis with bone mineral density (BMD) is recommended for older adults. It is unclear whether repeating a BMD screening test improves fracture risk assessment.
To determine whether changes in BMD after 4 years provide additional information on fracture risk beyond baseline BMD and to quantify the change in fracture risk classification after a second BMD measure.

DESIGN, SETTING, AND PARTICIPANTS
Population-based cohort study involving 310 men and 492 women from the Framingham Osteoporosis Study with 2 measures of femoral neck BMD taken from 1987 through 1999.
MAIN OUTCOMES AND MEASURES
Risk of hip or major osteoporotic fracture through 2009 or 12 years following the second BMD measure.
Mean age was 74.8 years. The mean (SD) BMD change was −0.6% (1.8%) per year. Over a median follow-up of 9.6 years, 76 participants experienced an incident hip fracture and 113 participants experienced a major osteoporotic fracture. Annual percent BMD change per SD decrease was associated with risk of hip fracture (hazard ratio [HR], 1.43 [95% CI, 1.16 to 1.78]) and major osteoporotic fracture (HR, 1.21 [95% CI, 1.01 to 1.45]) after adjusting for baseline BMD. At 10 years' follow-up, a 1-SD decrease in annual percent BMD change compared with the mean BMD change was associated with 3.9 excess hip fractures per 100 persons. In receiver operating characteristic (ROC) curve analyses, the AUC was 0.71 (95% CI, 0.65 to 0.78) for the baseline BMD model compared with 0.68 (95% CI, 0.62 to 0.75) for the BMD percent change model; the addition of BMD change to a model with baseline BMD did not meaningfully improve performance (AUC, 0.72 [95% CI, 0.66 to 0.79]). Using the net reclassification index, a second BMD measure increased the proportion of participants reclassified as high risk of hip fracture by 3.9% (95% CI, −2.2% to 9.9%), whereas it decreased the proportion classified as low risk by 2.2% (95% CI, −4.5% to
CONCLUSIONS AND RELEVANCE
In untreated men and women of mean age 75 years, a second BMD measure after 4 years did not meaningfully improve the prediction of hip or major osteoporotic fracture. Repeating a BMD measure within 4 years to improve fracture risk stratification may not be necessary in adults this age untreated for osteoporosis.
The primary objective of this multicenter registry was to study the prognostic value of PET MPI and the improved classification of risk in a large cohort of patients with suspected or known coronary artery disease (CAD).
Limited prognostic data are available for myocardial perfusion imaging (MPI) with positron emission tomography (PET).
A total of 7,061 patients from 4 centers underwent clinically indicated rest/stress rubidium-82 PET MPI, with a median follow-up of 2.2 years. The primary outcome of this study was cardiac death (169 patients) and the secondary outcome was all-cause death (570 patients). Net reclassification improvement (NRI) and integrated discrimination improvement (IDI) analyses were performed.
Risk-adjusted hazard of cardiac death increased with each 10% of abnormal myocardium for mildly, moderately, or severely abnormal stress PET [hazard ratio 2.3 (95% CI 1.4–3.8, P=0.001), 4.2 (95% CI 2.3–7.5, P<0.001), and 4.9 (95% CI 2.5–9.6, P<0.0001), respectively; normal MPI: referent]. Addition of percent myocardium ischemic and scarred to clinical information (age, female sex, body mass index, history of hypertension, diabetes, dyslipidemia, smoking, angina, beta-blocker use, prior revascularization, and rest heart rate) improved model performance [C-statistic 0.805 (95% CI, 0.772–0.838) to 0.839 (95% CI, 0.809–0.869)] and risk reclassification for cardiac death [NRI 0.116 (95% CI 0.021–0.210)], with smaller improvements in risk assessment for all-cause death.
In patients with known or suspected CAD, the extent and severity of ischemia and scar on PET MPI provide powerful and incremental risk estimates of cardiac death and all-cause death compared with traditional coronary risk factors.
positron emission tomography; registry; prognosis; myocardial perfusion imaging; risk reclassification
The discrimination of a risk prediction model measures that model's ability to distinguish between subjects with and without events. The area under the receiver operating characteristic curve (AUC) is a popular measure of discrimination. However, the AUC has recently been criticized for its insensitivity in model comparisons in which the baseline model has performed well. Thus, 2 other measures have been proposed to capture improvement in discrimination for nested models: the integrated discrimination improvement and the continuous net reclassification improvement. In the present study, the authors use mathematical relations and numerical simulations to quantify the improvement in discrimination offered by candidate markers of different strengths as measured by their effect sizes. They demonstrate that the increase in the AUC depends on the strength of the baseline model, which is true to a lesser degree for the integrated discrimination improvement. On the other hand, the continuous net reclassification improvement depends only on the effect size of the candidate variable and its correlation with other predictors. These measures are illustrated using the Framingham model for incident atrial fibrillation. The authors conclude that the increase in the AUC, integrated discrimination improvement, and net reclassification improvement offer complementary information and thus recommend reporting all 3 alongside measures characterizing the performance of the final model.
area under curve; biomarkers; discrimination; risk assessment; risk factors
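The two newer measures discussed above have simple empirical forms: the IDI is the change in mean predicted risk among events minus the change among non-events, and the continuous NRI counts any upward risk movement for events and any downward movement for non-events. A minimal sketch with toy predicted risks (the risk values are made up purely for illustration):

```python
import numpy as np

def idi(p_old, p_new, y):
    # Integrated discrimination improvement: change in mean predicted
    # risk among events minus the change among non-events.
    y = np.asarray(y, bool)
    return ((p_new[y].mean() - p_old[y].mean())
            - (p_new[~y].mean() - p_old[~y].mean()))

def continuous_nri(p_old, p_new, y):
    # NRI(>0): any upward movement in predicted risk counts for events,
    # any downward movement counts for non-events.
    y = np.asarray(y, bool)
    up = p_new > p_old
    down = p_new < p_old
    events = up[y].mean() - down[y].mean()
    nonevents = down[~y].mean() - up[~y].mean()
    return events + nonevents

# Hypothetical predicted risks before and after adding a marker:
y = np.array([1, 1, 1, 0, 0, 0])
p_old = np.array([0.6, 0.5, 0.4, 0.4, 0.3, 0.2])
p_new = np.array([0.7, 0.6, 0.5, 0.3, 0.2, 0.3])
print(idi(p_old, p_new, y), continuous_nri(p_old, p_new, y))
```

Because the continuous NRI only registers the direction of each risk change, it is insensitive to the baseline model's strength, consistent with the abstract's conclusion.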
Understanding the risk for type 2 diabetes (T2D) early in the life course is important for prevention. Whether genetic information improves prediction models for diabetes from adolescence into adulthood is unknown.
With the use of data from 1030 participants in the Bogalusa Heart Study aged 12 to 18 followed into middle adulthood, we built Cox models for incident T2D with risk factors assessed in adolescence (demographics, family history, physical examination, and routine biomarkers). Models with and without a 38 single-nucleotide polymorphism diabetes genotype score were compared by C statistics and continuous net reclassification improvement indices.
Participant mean (± SD) age at baseline was 14.4 ± 1.6 years, and 32% were black. Ninety (8.7%) participants developed T2D over a mean 26.9 ± 5.0 years of follow-up. Genotype score significantly predicted T2D in all models. Hazard ratios ranged from 1.09 per risk allele (95% confidence interval 1.03–1.15) in the basic demographic model to 1.06 (95% confidence interval 1.00–1.13) in the full model. The addition of genotype score did not improve the discrimination of the full clinical model (C statistic 0.756 without and 0.760 with genotype score). In the full model, genotype score had weak improvement in reclassification (net reclassification improvement index 0.261).
Although a genotype score assessed among white and black adolescents is significantly associated with T2D in adulthood, it does not improve prediction over clinical risk factors. Genetic screening for T2D in its current state is not a useful addition to adolescents’ clinical care.
genetic predisposition to disease; diabetes mellitus, type 2; adolescent medicine
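A genotype score of the kind described above is typically the count of risk alleles carried across the SNP panel, and the reported per-allele hazard ratio compounds multiplicatively under a Cox model. A sketch (the coding of the study's 38-SNP score is an assumption here; the toy genotypes are invented):

```python
import numpy as np

def allele_count_score(genotypes):
    # Unweighted genotype score: total risk alleles carried across the
    # SNP panel, each SNP coded 0/1/2 risk alleles. Effect-size-weighted
    # scores are a common alternative construction.
    return np.asarray(genotypes).sum(axis=1)

g = np.array([[0, 1, 2],      # toy 3-SNP genotypes for 2 subjects
              [2, 2, 1]])
print(allele_count_score(g))  # [3 5]

# Interpreting a per-allele hazard ratio of 1.09 (as in the abstract):
# under a log-linear Cox model, k extra alleles multiply the hazard by HR**k.
print(round(1.09 ** 10, 2))   # ten extra alleles: roughly 2.4-fold hazard
```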
Smoking cessation reduces the risks of cardiovascular disease (CVD), but weight gain that follows quitting smoking may weaken the CVD benefit of quitting.
To test the hypothesis that weight gain following smoking cessation does not attenuate the benefits of smoking cessation among people with and without diabetes.
Design, Setting, and Participants
Prospective community-based cohort study using data from the Framingham Offspring Study collected from 1984 to 2011. At each 4-year exam, self-reported smoking status was assessed and categorized as smoker, recent quitter (≤ 4 years), long-term quitter (> 4 years), and non-smoker. Pooled Cox proportional hazards models were used to estimate the association between quitting smoking and 6-year CVD events and to test whether 4-year change in weight following smoking cessation modified the association between smoking cessation and CVD events.
Main outcome measure
Incidence over 6 years of total CVD events, comprising coronary heart disease, cerebrovascular events, peripheral artery disease, and congestive heart failure.
After a mean follow-up of 25 years (SD, 9.6), 631 CVD events occurred among 3251 participants. Median 4-year weight gain was greater for recent quitters without diabetes (2.7 kg; interquartile range [IQR], −0.5 to 6.4) and with diabetes (3.6 kg; IQR, −1.4 to 8.2) than for long-term quitters (0.9 kg; IQR, −1.4 to 3.2 and 0.0 kg; IQR, −3.2 to 3.2, respectively; p<0.001). Among people without diabetes, the age- and sex-adjusted incidence rate of CVD was 5.9/100 person-exams (95% confidence interval [CI], 4.9-7.1) in smokers, 3.2/100 person-exams (95% CI, 2.1-4.5) in recent quitters, 3.1/100 person-exams (95% CI, 2.6-3.7) in long-term quitters, and 2.4/100 person-exams (95% CI, 2.0-3.0) in non-smokers. After adjustment for CVD risk factors, compared with smokers, recent quitters had a hazard ratio (HR) for CVD of 0.47 (95% CI, 0.23-0.94) and long-term quitters had an HR of 0.46 (95% CI, 0.34-0.63); these associations changed only minimally after further adjustment for weight change. Among people with diabetes, point estimates were similar but did not reach statistical significance.
Conclusions and Relevance
In this community-based cohort, smoking cessation was associated with a lower risk of CVD events among participants without diabetes, and weight gain that occurred following smoking cessation did not modify this association. This supports a net cardiovascular benefit of smoking cessation despite subsequent weight gain.
We sought to examine the relation of galectin-3 (Gal-3), a marker of cardiac fibrosis, with incident heart failure (HF) in the community.
Gal-3 is an emerging prognostic biomarker in HF, and experimental studies suggest that Gal-3 is an important mediator of cardiac fibrosis. Whether elevated Gal-3 concentrations precede the development of HF is unknown.
Gal-3 concentrations were measured in 3,353 participants in the Framingham Offspring Cohort (mean age 59 years, 53% women). The relation of Gal-3 to incident HF was assessed using proportional hazards regression.
Gal-3 was associated with increased left ventricular mass in age- and sex-adjusted analyses (P=0.001); this association was attenuated in multivariable analyses (P=0.06). A total of 166 participants developed incident HF and 468 died during a mean follow-up of 8.1 years. Gal-3 was associated with risk of incident HF (HR 1.28 per 1 standard deviation increase in log-Gal-3, 95% CI 1.14–1.43, P<0.0001), and remained significant after adjustment for clinical variables and B-type natriuretic peptide (HR 1.23, 95% CI 1.04–1.47, P=0.02). Gal-3 was also associated with risk of all-cause mortality (multivariable-adjusted HR 1.15, 95% CI 1.04–1.28, P=0.01). The addition of Gal-3 to clinical factors resulted in negligible changes to the c-statistic and minor improvements in the net reclassification index.
Higher concentration of Gal-3, a marker of cardiac fibrosis, is associated with increased risk of incident HF and mortality. Future studies evaluating the role of Gal-3 in cardiac remodeling may provide further insights into the role of Gal-3 in the pathophysiology of HF.
heart failure; epidemiology; biomarker; prognosis
Data regarding the familial aggregation of left ventricular (LV) geometry and its relations to parental heart failure (HF) are limited.
Methods and Results
We evaluated concordance of LV geometry within 1093 nuclear families in 5758 participants of the Original (parents; N=2351) and Offspring (N=3407) cohorts of the Framingham Heart Study undergoing routine echocardiography in mid-to-late adulthood. LV geometry was categorized based on cohort- and sex-specific 80th percentile cutoffs of LV mass and relative wall thickness (RWT) into normal (both <80th percentile), concentric remodeling (LV mass <80th percentile, RWT >80th percentile), concentric hypertrophy (both >80th percentile), and eccentric hypertrophy (LV mass >80th percentile, RWT <80th percentile). Within nuclear families, LV geometry was concordant among related pairs (parent-child, sibling-sibling) (P=0.0015) but not among unrelated spousal pairs (P=0.60), a finding that remained unchanged after adjusting for clinical covariates known to influence LV remodeling (age, systolic blood pressure, body mass index), excluding individuals with prevalent HF and myocardial infarction, and varying the thresholds for defining LV geometry. The prevalence of abnormal LV geometry was higher in family members of affected individuals, with recurrence risks of 1.4 for concentric remodeling (95% CI, 1.2–1.7) and eccentric hypertrophy (95% CI, 1.1–1.8), and 3.9 (95% CI, 3.2–4.6) for concentric hypertrophy. In a subset of 1497 offspring, we observed an association between parental HF (N=458) and eccentric hypertrophy in offspring (P<0.0001).
Our investigation of a two-generational community-based sample demonstrates familial aggregation of LV geometry, with the greatest recurrence risk for concentric LV geometry, and establishes an association of eccentric LV geometry with parental HF.
echocardiography; remodeling; risk factors
Common carotid artery (CCA) intima-media thickness (cIMT), a measure of atherosclerosis, varies between peak-systole (PS) and end-diastole (ED). This difference might affect cardiovascular risk assessment.
Materials and methods
IMT measurements of the right and left CCA were synchronized with an electrocardiogram: R-wave for ED and T-wave for PS. IMT was measured in 2930 members of the Framingham Offspring Study. Multivariable regression models were generated with ED-IMT, PS-IMT and change in IMT as dependent variables and Framingham risk factors as independent variables. ED-IMT estimates were compared to the upper quartile of IMT based on normative data obtained at PS.
The average age of our population was 57.9 years. The average difference in IMT during the cardiac cycle was 0.037 mm (95% CI: 0.035–0.038 mm). ED-IMT and PS-IMT had similar associations with Framingham risk factors (total R2 = 0.292 versus 0.275) and were significantly associated with all risk factors. In a fully adjusted multivariable model, a thinner IMT at peak-systole was associated with pulse pressure (p < 0.0001), LDL-cholesterol (p = 0.0064), and age (p = 0.046), but with no other risk factors. Performing ED-IMT measurements while using upper-quartile PS-IMT normative data inappropriately increased the number of individuals in the fourth IMT quartile (high cardiovascular risk category) by 42.1%.
The difference in IMT between peak-systole and end-diastole is associated with pulse pressure, LDL-cholesterol, and age. In our study, the mean IMT difference during the cardiac cycle led to a 42.1% overestimation of the number of individuals at high risk for cardiovascular disease.
Ultrasonics; Risk Factors; Carotid Arteries; Blood Pressure; systole; diastole
It is often assumed that rare genetic variants will improve available risk prediction scores. We aimed to estimate the added predictive ability of rare variants for risk prediction of common diseases in hypothetical scenarios.
In simulated data, we constructed risk models with an area under the ROC curve (AUC) ranging between 0.50 and 0.95, to which we added a single variant representing the cumulative frequency and effect (odds ratio, OR) of multiple rare variants. The frequency of the rare variant ranged between 0.0001 and 0.01 and the OR between 2 and 10. We assessed the resulting AUC, increment in AUC, integrated discrimination improvement (IDI), net reclassification improvement (NRI(>0.01)) and categorical NRI. The analyses were illustrated by a simulation of atrial fibrillation risk prediction based on a published clinical risk model.
We observed minimal improvement in AUC with the addition of rare variants. All measures increased with the frequency and OR of the variant, but maximum increment in AUC remained below 0.05. Increment in AUC and NRI(>0.01) decreased with higher AUC of the baseline model, whereas IDI remained constant. In the atrial fibrillation example, the maximum increment in AUC was 0.02 for a variant with frequency = 0.01 and OR = 10. IDI and NRI showed at most minimal increase for variants with frequency greater than or equal to 0.005 and OR greater than or equal to 5.
Since rare variants are present in only a minority of affected individuals, their predictive ability is generally low at the population level. To improve the predictive ability of clinical risk models for complex diseases, genetic variants must be common and have substantial effect on disease risk.
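The qualitative finding above, tiny AUC gains even for a strong rare variant, is easy to reproduce. A rough simulation sketch under assumed parameters (carrier frequency 0.01, OR 10, a rare outcome; this is not the authors' exact simulation design):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
freq, or_variant = 0.01, 10.0   # assumed carrier frequency and odds ratio

# Baseline linear predictor with moderate discrimination; the rare
# variant adds log(OR) to carriers only.
lp = rng.normal(0.0, 1.0, n)
carrier = rng.random(n) < freq
lp_new = lp + np.log(or_variant) * carrier

p = 1 / (1 + np.exp(-(lp_new - 3.0)))   # rare outcome via logistic link
y = rng.random(n) < p

def auc(score, y):
    # Mann-Whitney AUC estimator via the rank-sum identity:
    # P(score_event > score_nonevent).
    order = np.argsort(score, kind="mergesort")
    ranks = np.empty(len(score))
    ranks[order] = np.arange(1, len(score) + 1)
    n1 = y.sum()
    return (ranks[y].sum() - n1 * (n1 + 1) / 2) / (n1 * (len(score) - n1))

print(auc(lp, y), auc(lp_new, y))  # the gain is small despite OR = 10
```

Because carriers make up only 1% of the population, the variant reorders very few event/non-event pairs, so the population-level AUC barely moves.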
Obesity affects one in three American adult women and is associated with overall mortality and major morbidities. A composite diet index to evaluate total diet quality may better assess the complex relationship between diet and obesity, providing insights for nutrition interventions. The purpose of the present investigation was to determine whether diet quality, defined according to the previously validated Framingham nutritional risk score (FNRS), was associated with the development of overweight or obesity in women. Over 16 years, we followed 590 normal-weight women (BMI < 25 kg/m2), aged 25 to 71 years, of the Framingham Offspring and Spouse Study who presented without CVD, cancer or diabetes at baseline. The nineteen-nutrient FNRS, derived from mean ranks of nutrient intakes from 3-day dietary records, was used to assess nutritional risk. The outcome was development of overweight or obesity (BMI ≥ 25 kg/m2) during follow-up. In a stepwise multiple logistic regression model adjusted for age, physical activity and smoking status, the FNRS was directly related to overweight or obesity (P for trend = 0.009). Women with lower diet quality (i.e. higher nutritional risk scores) were significantly more likely to become overweight or obese (OR 1.76; 95% CI 1.16, 2.69) compared with those with higher diet quality. Diet quality, assessed using a comprehensive composite nutritional risk score, predicted development of overweight or obesity. This finding suggests that overall diet quality should be considered a key component in planning and implementing programmes for obesity risk reduction and treatment recommendations.
Diet quality; Nutritional risk score; Obesity; BMI; Dietary quality index
Heart failure (HF) is a major public health burden worldwide. Of patients presenting with HF, 30–55% have a preserved ejection fraction (HFPEF) rather than a reduced ejection fraction (HFREF). Our objective was to examine discriminating clinical features in new-onset HFPEF vs. HFREF.
Methods and results
Of 712 participants in the Framingham Heart Study (FHS) hospitalized for new-onset HF between 1981 and 2008 (median age 81 years, 53% female), 46% had HFPEF (EF >45%) and 54% had HFREF (EF ≤45%). In multivariable logistic regression, coronary heart disease (CHD), higher heart rate, higher potassium, left bundle branch block, and ischaemic electrocardiographic changes increased the odds of HFREF; female sex and atrial fibrillation increased the odds of HFPEF. In aggregate, these clinical features predicted HF subtype with good discrimination (c-statistic 0.78). Predictors were examined in the Enhanced Feedback for Effective Cardiac Treatment (EFFECT) study. Of 4436 HF patients (median age 75 years, 47% female), 32% had HFPEF and 68% had HFREF. Distinguishing clinical features were consistent between FHS and EFFECT, with comparable discrimination in EFFECT (c-statistic 0.75). In exploratory analyses examining the traits of the intermediate EF group (EF 35–55%), CHD predisposed to a decrease in EF, whereas other clinical traits showed an overlapping spectrum between HFPEF and HFREF.
Multiple clinical characteristics at the time of initial HF presentation differed in participants with HFPEF vs. HFREF. While CHD was clearly associated with a lower EF, overlapping characteristics were observed in the middle of the left ventricular EF range spectrum.
Heart failure; Epidemiology; Risk factors; Ejection fraction
The area under the receiver operating characteristic curve (AUC of ROC) is a widely used measure of discrimination in risk prediction models. Routinely, the Mann–Whitney statistic is used as an estimator of the AUC, while the change in AUC is tested by the DeLong test. However, very often, in settings where the model is developed and tested on the same dataset, the added predictor is statistically significantly associated with the outcome but fails to produce a significant improvement in the AUC. No conclusive resolution exists to explain this finding. In this paper, we show that the reason lies in the inappropriate application of the DeLong test in the setting of nested models. Using numerical simulations and a theoretical argument based on generalized U-statistics, we show that if the added predictor is not statistically significantly associated with the outcome, the null distribution is non-normal, contrary to the assumption of the DeLong test. Our simulations of different scenarios show that the loss of power caused by this misuse of the DeLong test yields a conservative test for small and moderate effect sizes. This problem does not exist for predictors that are associated with the outcome or for non-nested models. We suggest that for nested models, only the test of association be performed for the new predictors, and, if the result is significant, the change in AUC be estimated with an appropriate confidence interval, which can be based on the DeLong approach.
AUC; DeLong test; logistic regression; U-statistics; discrimination; risk prediction
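The paired comparison that the abstract argues is misapplied to nested models can be sketched from placement values: each event's placement is the fraction of non-events it out-scores, and the DeLong variance is built from the sample covariances of these placements. A minimal illustration (my own sketch, not the authors' code):

```python
import numpy as np

def placements(score, y):
    # Placement values: for each event, the fraction of non-events it
    # out-scores (ties count 1/2), and symmetrically for non-events.
    x, z = score[y], score[~y]
    cmp = (x[:, None] > z[None, :]) + 0.5 * (x[:, None] == z[None, :])
    return cmp.mean(axis=1), cmp.mean(axis=0)

def delong_paired(s1, s2, y):
    # DeLong-style z statistic for the difference of two correlated AUCs.
    y = np.asarray(y, bool)
    v10_1, v01_1 = placements(np.asarray(s1, float), y)
    v10_2, v01_2 = placements(np.asarray(s2, float), y)
    a1, a2 = v10_1.mean(), v10_2.mean()        # the two AUC estimates
    m, k = v10_1.size, v01_1.size
    s10 = np.cov(v10_1, v10_2)                 # 2x2 sample covariances
    s01 = np.cov(v01_1, v01_2)
    var = ((s10[0, 0] + s10[1, 1] - 2 * s10[0, 1]) / m
           + (s01[0, 0] + s01[1, 1] - 2 * s01[0, 1]) / k)
    return a1, a2, (a2 - a1) / np.sqrt(var)

# Toy data: a weak score vs one that separates the classes perfectly.
y = np.array([1, 1, 1, 1, 0, 0, 0, 0], bool)
s1 = np.array([3.0, 2.0, 1.0, 0.5, 2.5, 1.5, 0.4, 0.1])
s2 = np.array([10.0, 9.0, 8.0, 7.0, 1.0, 2.0, 3.0, 4.0])
a1, a2, z = delong_paired(s1, s2, y)
print(a1, a2, z)
```

The z statistic is referred to a standard normal; the abstract's point is that under the nested-model null (an added predictor with no association) this normality assumption fails, making the test conservative.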
Cardiovascular risk prediction functions offer an important diagnostic tool for clinicians and patients themselves. They are usually constructed with the use of parametric or semi-parametric survival regression models. It is essential to be able to evaluate the performance of these models, preferably with summaries that offer natural and intuitive interpretations. The concept of discrimination, popular in the logistic regression context, has been extended to survival analysis. However, the extension is not unique. In this paper, we define discrimination in survival analysis as the model’s ability to separate those with longer event-free survival from those with shorter event-free survival within some time horizon of interest. This definition remains consistent with that used in logistic regression, in the sense that it assesses how well the model-based predictions match the observed data. Practical and conceptual examples and numerical simulations are employed to examine four C statistics proposed in the literature to evaluate the performance of survival models. We observe that they differ in the numerical values and aspects of discrimination that they capture. We conclude that the index proposed by Harrell is the most appropriate to capture discrimination described by the above definition. We suggest researchers report which C statistic they are using, provide a rationale for their selection, and be aware that comparing different indices across studies may not be meaningful.
discrimination; risk function; censoring; AUC; concordance
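Harrell's C, the index the authors recommend, counts concordance over "usable" pairs: those in which the subject with the shorter follow-up had an observed event. A minimal sketch (a quadratic-time illustration; production implementations handle time ties and scale better):

```python
def harrell_c(time, event, risk):
    # Among usable pairs (the earlier time is an observed event), count
    # the fraction where the higher predicted risk goes with the shorter
    # event-free time; ties in risk count 1/2.
    conc = usable = 0.0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if event[i] and time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    conc += 1
                elif risk[i] == risk[j]:
                    conc += 0.5
    return conc / usable

# Perfectly ordered example: higher risk, earlier event.
print(harrell_c([1, 2, 3, 4], [1, 1, 1, 1], [4.0, 3.0, 2.0, 1.0]))  # 1.0

# Censoring removes pairs: the subject censored at time 6 forms usable
# pairs only as the later member.
print(harrell_c([2, 4, 6], [1, 1, 0], [3.0, 1.0, 2.0]))
```

Restricting to usable pairs is what makes the index depend on the censoring distribution, one reason the C statistics compared in the paper differ numerically.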
The aim of the study was to evaluate whether knowledge of the circulating concentration of growth differentiation factor 15 (GDF-15) adds predictive information to the Global Registry of Acute Coronary Events (GRACE) score, a validated scoring system for risk assessment in non-ST-elevation acute coronary syndrome (NSTE-ACS). We also evaluated whether GDF-15 adds predictive information to a model containing the GRACE score and N-terminal pro-B-type natriuretic peptide (NT-proBNP), a prognostic biomarker already in clinical use.
Methods and results
The GRACE score, GDF-15, and NT-proBNP levels were determined on admission in 1122 contemporary patients with NSTE-ACS. Six-month all-cause mortality or non-fatal myocardial infarction (MI) was the primary endpoint of the study. To obtain GDF-15- and NT-proBNP-adjusted 6-month estimated probabilities of death or non-fatal MI, statistical algorithms were developed in a derivation cohort (n = 754; n = 66 reached the primary endpoint) and applied to a validation cohort (n = 368; n = 33). Adjustment of the GRACE risk estimate by GDF-15 increased the area under the receiver-operating characteristic curve (AUC) from 0.79 to 0.85 (P < 0.001) in the validation cohort. Discrimination improvement was confirmed by an integrated discrimination improvement (IDI) of 0.055 (P = 0.005). A net 31% of the patients without events were reclassified into lower risk, and a net 27% of the patients with events were reclassified into higher risk, resulting in a total continuous net reclassification improvement [NRI(>0)] of 0.58 (P = 0.002). Addition of NT-proBNP to the GRACE score led to a similar improvement in discrimination and reclassification. Addition of GDF-15 to a model containing GRACE and NT-proBNP led to a further improvement in model performance [increase in AUC from 0.84 for GRACE plus NT-proBNP to 0.86 for GRACE plus NT-proBNP plus GDF-15, P = 0.010; IDI = 0.024, P = 0.063; NRI(>0) = 0.42, P = 0.022].
We show that a single measurement of GDF-15 on admission markedly enhances the predictive value of the GRACE score and provides moderate incremental information to a model including the GRACE score and NT-proBNP. Our study is the first to provide simple algorithms that can be used by the practicing clinician to more precisely estimate risk in individual patients based on the GRACE score and a single biomarker measurement on admission. The rigorous statistical approach taken in the present study may serve as a blueprint for future studies exploring the added value of biomarkers beyond clinical risk scores.
GDF-15; NT-proBNP; GRACE score; Acute coronary syndrome; Risk stratification
Multiple studies have identified single-nucleotide polymorphisms (SNPs) that are associated with coronary heart disease (CHD). We examined whether SNPs selected based on predefined criteria will improve CHD risk prediction when added to traditional risk factors (TRFs).
SNPs were selected from the literature based on association with CHD, lack of association with a known CHD risk factor, and successful replication. A genetic risk score (GRS) was constructed based on these SNPs. Cox proportional hazards model was used to calculate CHD risk based on the Atherosclerosis Risk in Communities (ARIC) and Framingham CHD risk scores with and without the GRS.
The GRS was associated with risk for CHD (hazard ratio [HR] = 1.10; 95% confidence interval [CI]: 1.07–1.13). Addition of the GRS to the ARIC risk score significantly improved discrimination, reclassification, and calibration beyond that afforded by TRFs alone in non-Hispanic whites in the ARIC study. The area under the receiver operating characteristic curve (AUC) increased from 0.742 to 0.749 (Δ = 0.007; 95% CI, 0.004–0.013), and the net reclassification index (NRI) was 6.3%. Although the risk estimates for CHD in the Framingham Offspring (HR = 1.12; 95% CI: 1.10–1.14) and Rotterdam (HR = 1.08; 95% CI: 1.02–1.14) Studies were significantly improved by adding the GRS to TRFs, improvements in AUC and NRI were modest.
Addition of a GRS based on direct associations with CHD to TRFs significantly improved discrimination and reclassification in white participants of the ARIC Study, with no significant improvement in the Rotterdam and Framingham Offspring Studies.
Genetics; Risk factors; Coronary disease
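A genetic risk score of the kind constructed above is conventionally a weighted sum of risk-allele counts, with weights taken as the log of each SNP's published per-allele odds ratio. A minimal illustrative sketch follows; the SNP identifiers and odds ratios are hypothetical, not values from the study:

```python
from math import log

def genetic_risk_score(genotypes, odds_ratios):
    """Weighted GRS: sum over SNPs of (risk-allele count) * ln(per-allele OR).

    genotypes   -- dict mapping SNP id to risk-allele count (0, 1, or 2)
    odds_ratios -- dict mapping SNP id to its published per-allele odds ratio
    """
    return sum(genotypes[snp] * log(odds_ratios[snp]) for snp in genotypes)

# Hypothetical example: two SNPs with per-allele ORs of 1.10 and 1.20
ors = {"rs_hypothetical_1": 1.10, "rs_hypothetical_2": 1.20}
geno = {"rs_hypothetical_1": 2, "rs_hypothetical_2": 1}
score = genetic_risk_score(geno, ors)
```

An unweighted GRS (a simple count of risk alleles) corresponds to setting all weights equal, i.e., replacing `log(odds_ratios[snp])` with 1.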
New markers may improve prediction of diagnostic and prognostic outcomes. We review various measures to quantify the incremental value of markers over standard, readily available characteristics. Widely used traditional measures include the improvement in model fit or in the area under the receiver operating characteristic (ROC) curve (AUC). New measures include the net reclassification index (NRI) and decision-analytic measures, such as the fraction of true positive classifications penalized for false positive classifications (‘net benefit’, NB).
For illustration we discuss a case study on the presence of residual tumor versus benign tissue in 544 patients with testicular cancer. We assessed 3 tumor markers (AFP, HCG, and LDH) for their incremental value over currently standard clinical predictors. AUC and R² values suggested adding continuous LDH and AFP, whereas NB only favored HCG as a potentially promising marker at a clinically defensible decision threshold of 20% risk. Results based on the NRI fell in the middle, suggesting reclassification potential of all three markers.
We conclude that improvement in standard discrimination measures, which focus on finding variables that might be promising across all decision thresholds, may not detect the most informative markers at a specific threshold of particular clinical relevance. When a marker is intended to support decision making, calculation of the improvement in a decision-analytic measure, such as NB, is preferable over an overall judgment as obtained from the AUC in ROC analysis.
prediction; logistic regression model; performance measures; incremental value
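The net benefit referenced above is conventionally computed, as in decision curve analysis, as NB = TP/n − (FP/n) × pₜ/(1 − pₜ) at a chosen risk threshold pₜ. A minimal sketch with made-up predictions (not data from the testicular cancer case study):

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit at a risk threshold: true-positive rate minus
    false-positive rate weighted by the odds of the threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

# Made-up predictions for four subjects, evaluated at the 20% threshold
y = [1, 1, 0, 0]
p = [0.90, 0.80, 0.30, 0.10]
nb = net_benefit(y, p, 0.20)  # tp=2, fp=1 -> 0.5 - 0.25 * 0.25 = 0.4375
```

The odds weighting pₜ/(1 − pₜ) is what ties the measure to a specific clinically relevant threshold: at pₜ = 0.20, one false positive is traded off against a quarter of a true positive.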
Little is known about the familial aggregation of intermittent claudication (IC). Our objective was to examine whether parental IC increased adult offspring risk of IC independent of established cardiovascular risk factors. We evaluated Offspring cohort participants of the Framingham Heart Study (FHS) who were 30 years or older, cardiovascular disease (CVD) free, and had both parents enrolled in the FHS (n = 2970 unique participants, 53% women). Pooled proportional hazards regression was used to examine whether the 12-year risk for incident IC in offspring participants was associated with parental IC, adjusting for age, sex, diabetes, smoking, systolic blood pressure, total cholesterol, high-density lipoprotein (HDL) cholesterol, and antihypertensive and lipid treatment. Among 909 person-exams in the parental IC history group and 5397 person-exams in the no parental IC history group, there were 101 incident IC events (29 with parental IC history, 72 without) during follow-up. Age- and sex-adjusted 12-year cumulative incidence rates per 1000 person-years were 5.08 (95% CI: 2.74, 7.33) and 2.34 (95% CI: 1.46, 3.19) in participants with and without parental IC history, respectively. Parental history of IC significantly increased the risk of incident IC in offspring (multivariable-adjusted hazard ratio 1.81, 95% CI: 1.14, 2.88). The hazard ratio was unchanged with adjustment for occurrence of CVD (1.83, 95% CI: 1.15, 2.91). In conclusion, IC in parents increases risk for IC in adult offspring independent of established risk factors. These data suggest a genetic component of peripheral artery disease and support future research into genetic causes.
claudication; peripheral artery disease; risk factors; family history
The performance of prediction models can be assessed using a variety of different methods and metrics. Traditional measures for binary and survival outcomes include the Brier score to indicate overall model performance, the concordance (or c) statistic for discriminative ability (or area under the receiver operating characteristic (ROC) curve), and goodness-of-fit statistics for calibration.
Several new measures have recently been proposed that can be seen as refinements of discrimination measures, including variants of the c statistic for survival, reclassification tables, net reclassification improvement (NRI), and integrated discrimination improvement (IDI). Moreover, decision-analytic measures have been proposed, including decision curves to plot the net benefit achieved by making decisions based on model predictions.
We aimed to define the role of these relatively novel approaches in the evaluation of the performance of prediction models. For illustration we present a case study of predicting the presence of residual tumor versus benign tissue in patients with testicular cancer (n=544 for model development, n=273 for external validation).
We suggest that reporting discrimination and calibration will always be important for a prediction model. Decision-analytic measures should be reported if the predictive model is to be used for making clinical decisions. Other measures of performance may be warranted in specific applications, such as reclassification metrics to gain insight into the value of adding a novel predictor to an established model.
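Of the traditional measures listed above, the Brier score is the simplest to compute: the mean squared difference between the predicted probability and the observed binary outcome. A minimal sketch with made-up predictions:

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and outcomes.
    0 is perfect; an uninformative constant prediction of 0.5 scores 0.25."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# Made-up outcomes and predicted probabilities for four subjects
example = brier_score([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.4])
# (0.1^2 + 0.2^2 + 0.3^2 + 0.4^2) / 4 = 0.075
```

Because the squared error penalizes both miscalibration and poor discrimination, the Brier score serves as an overall performance summary rather than a pure discrimination measure.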
Limited data exist regarding the use of a genetic risk score for predicting risk of incident cardiovascular disease (CVD) in US based samples.
Using findings from recent GWAS, we constructed genetic risk scores (GRS) comprised of 13 genetic variants associated with myocardial infarction (MI) or other manifestations of CHD and 102 genetic variants associated with CHD or its major risk factors. We also updated the 13 SNP GRS with 16 SNPs recently discovered by GWAS. We estimated the association, discrimination and risk reclassification of each GRS for incident cardiovascular events and for prevalent coronary artery calcium (CAC).
In analyses adjusted for age, sex, CVD risk factors, and parental history of CVD, the 13 SNP GRS was significantly associated with incident hard CHD (hazard ratio [HR] 1.07, 95% confidence interval [CI] 1.00–1.15; p=0.04), CVD (HR per allele 1.05, 95% CI 1.01–1.09; p=0.03), and high CAC (defined as >75th age- and sex-specific percentile; odds ratio [OR] per allele 1.18, 95% CI 1.11–1.26; p=3.4 × 10⁻⁷). The GRS did not improve discrimination for incident CHD or CVD but led to modest improvements in risk reclassification. However, significant improvements in discrimination and risk reclassification were observed for the prediction of high CAC. The addition of 16 newly discovered SNPs to the 13 SNP GRS did not significantly modify these results.
A GRS comprised of 13 SNPs associated with coronary disease is an independent predictor of cardiovascular events and of high CAC, modestly improves risk reclassification for incident CHD, and significantly improves discrimination for high CAC. The addition of recently discovered SNPs did not significantly improve the performance of this GRS.
Genetics; single nucleotide polymorphisms; cardiovascular disease; coronary heart disease; risk prediction; reclassification
Net reclassification and integrated discrimination improvements have been proposed as alternatives to the increase in the AUC for evaluating improvement in the performance of risk assessment algorithms introduced by the addition of new phenotypic or genetic markers. In this paper, we demonstrate that in the setting of linear discriminant analysis, under the assumptions of multivariate normality, all three measures can be presented as functions of the squared Mahalanobis distance. This relationship affords an interpretation of the magnitude of these measures in the familiar language of effect size for uncorrelated variables. Furthermore, it allows us to conclude that net reclassification improvement can be viewed as a universal measure of effect size. Our theoretical developments are illustrated with an example based on the Framingham Heart Study risk assessment model for high risk men in primary prevention of cardiovascular disease.
AUC; biomarker; c statistic; model performance; risk prediction; ROC
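In the equal-covariance multivariate normal setting described above, the AUC has the standard binormal closed form AUC = Φ(δ/√2), where δ is the Mahalanobis distance between the two outcome groups and Φ is the standard normal CDF. A small sketch of this functional relationship (the specific expressions for NRI and IDI are in the paper itself and are not reproduced here):

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def auc_from_squared_mahalanobis(delta_sq):
    """AUC = Phi(delta / sqrt(2)) for two equal-covariance normal groups
    separated by Mahalanobis distance delta = sqrt(delta_sq)."""
    return normal_cdf(sqrt(delta_sq) / sqrt(2.0))

# delta^2 = 0 gives AUC = 0.5 (no discrimination);
# delta^2 = 2 gives delta / sqrt(2) = 1, so AUC = Phi(1), about 0.841
```

This monotone mapping is what lets the squared Mahalanobis distance act as a common effect-size scale for AUC and the reclassification measures.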