For all local predictive models, no interaction between ethnicity and any other covariate was statistically significant, so no interaction terms were included in the local predictive models. Local ARIC and SAHS models with a single combined race variable for Malays and Asian Indians had better model fit than their counterparts with two separate race variables. For the local SAHS models, the Akaike Information Criterion (AIC) values for models with one and two race variables were 588.6 and 590.6, respectively; for the local ARIC models, they were 587.9 and 589.8, respectively.
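Because AIC = 2k − 2 ln L, the reported AIC values can be converted into the log-likelihood gained by adding the second race variable. The sketch below assumes that the two-race model has exactly one extra parameter and that the one-race model is nested in it (Malay and Asian Indian coefficients constrained to be equal), so a likelihood-ratio test on 1 degree of freedom also applies. The function names are illustrative and do not come from the study's analysis code.

```python
import math


def implied_loglik_gain(aic_restricted: float, aic_full: float, extra_params: int = 1) -> float:
    """Log-likelihood gained by the fuller model, recovered from AIC = 2k - 2 ln L."""
    # AIC_full - AIC_restricted = 2 * extra_params - 2 * (lnL_full - lnL_restricted)
    gain = (aic_restricted - aic_full + 2 * extra_params) / 2
    # A nested fuller model cannot lose likelihood; clamp rounding noise in reported AICs.
    return max(gain, 0.0)


def lr_test_p_value(loglik_gain: float) -> float:
    """Likelihood-ratio P value for one extra parameter (chi-square, 1 df)."""
    statistic = 2 * loglik_gain
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2))


# Reported AICs: (one combined race variable, two separate race variables)
aics = {"SAHS": (588.6, 590.6), "ARIC": (587.9, 589.8)}
for model, (aic_one, aic_two) in aics.items():
    gain = implied_loglik_gain(aic_one, aic_two)
    print(f"{model}: ln L gain = {gain:.2f}, LR P = {lr_test_p_value(gain):.2f}")
```

With the reported values, the implied log-likelihood gain is essentially 0 for SAHS and 0.05 for ARIC, so the second race variable adds almost nothing beyond its AIC penalty of 2.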
Table shows comparisons of the coefficients of the three multivariate predictive functions that were estimated using the NHS-92 cohort with the published coefficients from the SAHS, ARIC and Framingham models.
Comparison of published and locally-estimated multivariate predictive functions
For the ARIC model, the 95% confidence intervals for the following risk factors did not include the published effect sizes from the original study: age, baseline FPG, waist circumference and triglycerides.
For the Framingham model, all locally-estimated effect sizes agreed well with the published ones, except for the effect of overweight, which was significantly larger in our local model.
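The comparison described above, whether a published coefficient falls inside the local 95% confidence interval, can be expressed as a simple check on the log-odds scale. This is a minimal sketch assuming Wald intervals (estimate ± 1.96 × SE); the coefficient and standard error in the example are hypothetical, not values from the tables.

```python
def published_within_local_ci(local_beta: float, local_se: float,
                              published_beta: float, z: float = 1.96) -> bool:
    """Return True if the published coefficient lies inside the local Wald CI."""
    lower = local_beta - z * local_se
    upper = local_beta + z * local_se
    return lower <= published_beta <= upper


# Hypothetical local estimate: beta = 0.90 (SE 0.15), i.e. a 95% CI of about 0.61 to 1.19.
print(published_within_local_ci(0.90, 0.15, published_beta=0.50))  # outside the interval
print(published_within_local_ci(0.90, 0.15, published_beta=1.00))  # inside the interval
```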
However, at this stage, it was unclear to what extent the estimates from the local models had been influenced by the exclusion of 942 subjects who lacked an FPG measurement at follow-up or had incomplete baseline information.
Table shows the multiple imputation estimates for the three local models. Interestingly, after accounting for dropout, the discrepancies between the local and published models were less pronounced. In particular, for the SAHS and ARIC models, the discrepancies in the effects of FPG and measures of adiposity (BMI or waist circumference) were no longer statistically significant. The age and gender effect sizes, however, were significantly smaller than in the published models. For the Framingham model, we found a significantly larger effect of overweight, quite possibly because the Caucasian definition of overweight (BMI ≥ 25) was used instead of the WHO recommendation for Singapore, which uses a cut-off of BMI ≥ 23 to define overweight individuals [22].
Comparisons of published and locally-estimated multivariate predictive functions after inclusion of the 942 subjects with incomplete baseline or follow-up measurements, using multiple imputation
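Estimates from multiply imputed data sets are usually combined with Rubin's rules: the pooled coefficient is the mean across imputations, and its variance adds the average within-imputation variance to an inflated between-imputation variance. The text does not state how many imputations were used or how they were pooled, so the sketch below is an assumption about the procedure, and its inputs are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets with Rubin's rules.

    estimates: the coefficient estimated in each imputed data set
    variances: its squared standard error in each imputed data set
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    q_bar = mean(estimates)                  # pooled point estimate
    within = mean(variances)                 # average within-imputation variance
    between = variance(estimates)            # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between   # total variance
    return q_bar, math.sqrt(total)


# Hypothetical coefficients and squared SEs from m = 3 imputations.
beta, se = pool_rubin([1.0, 1.2, 1.4], [0.04, 0.04, 0.04])
print(f"pooled beta = {beta:.2f}, SE = {se:.3f}")
```

The (1 + 1/m) factor accounts for using a finite number of imputations; the spread of estimates across imputations is what carries the extra uncertainty due to the missing data.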
Table compares the AUCs of the various predictive functions. Although the AUCs of all three locally-estimated multivariate models were slightly higher than those of their published counterparts, the difference was statistically significant only for the SAHS model. All locally-estimated multivariate models discriminated better than the model that used FPG only (all P < 0.001, Table), whereas the locally-estimated Framingham model was the only one that was not significantly better than the model that used 2hPG only (P = 0.110).
Comparisons of area under the Receiver Operating Characteristic curve (AUC) for various predictive models evaluated using NHS-92 Cohort
Of the three published functions, the ARIC model had the highest discrimination power (AUC = 0.847), followed by the SAHS model (AUC = 0.839) and the Framingham model (AUC = 0.805). The discrimination of the published SAHS and ARIC models did not differ significantly (P = 0.230), but both had significantly higher discrimination power than the Framingham model (P = 0.028 and 0.007, respectively). More importantly, the published SAHS and ARIC models were significantly better at discriminating T2DM cases from non-cases than the local models that used FPG only (P < 0.001) or 2hPG only (P = 0.021 and 0.011, respectively).
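The text does not say which test was used to compare AUCs. A common choice for correlated AUCs (two models scored on the same subjects) is DeLong's nonparametric method, sketched below in plain Python as an assumption rather than as the study's procedure. Each AUC is the Mann-Whitney probability that a randomly chosen case has a higher predicted risk than a randomly chosen non-case, with ties counted as one half.

```python
import math


def _psi(x, y):
    """Mann-Whitney kernel: 1 if the case scores higher, 0.5 for a tie, else 0."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def _components(case_scores, control_scores):
    """DeLong structural components (placement values) for one model."""
    v10 = [sum(_psi(x, y) for y in control_scores) / len(control_scores) for x in case_scores]
    v01 = [sum(_psi(x, y) for x in case_scores) / len(case_scores) for y in control_scores]
    return v10, v01


def _cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


def delong_paired_test(labels, scores_1, scores_2):
    """Two-sided test of AUC_1 = AUC_2 for two models scored on the same subjects."""
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    v10_1, v01_1 = _components([scores_1[i] for i in cases], [scores_1[i] for i in controls])
    v10_2, v01_2 = _components([scores_2[i] for i in cases], [scores_2[i] for i in controls])
    auc_1, auc_2 = sum(v10_1) / len(cases), sum(v10_2) / len(cases)
    var_diff = (
        (_cov(v10_1, v10_1) + _cov(v10_2, v10_2) - 2 * _cov(v10_1, v10_2)) / len(cases)
        + (_cov(v01_1, v01_1) + _cov(v01_2, v01_2) - 2 * _cov(v01_1, v01_2)) / len(controls)
    )
    if var_diff <= 0:
        return auc_1, auc_2, 1.0  # identical rankings: no evidence of a difference
    z = (auc_1 - auc_2) / math.sqrt(var_diff)
    return auc_1, auc_2, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value


# Toy data (not from the study): 2 cases, 2 non-cases, two competing risk scores.
labels = [1, 1, 0, 0]
auc_1, auc_2, p = delong_paired_test(labels, [0.9, 0.8, 0.3, 0.1], [0.9, 0.2, 0.3, 0.1])
print(auc_1, auc_2, round(p, 3))
```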
The NRI statistic showed that, overall, the published ARIC model was only marginally better than the published SAHS model (NRI = 0.127, P = 0.060). When cases and non-cases were examined separately, the ARIC model was not significantly better than the SAHS model at reclassifying cases (Figure). Specifically, compared with the SAHS model, the ARIC model appropriately reclassified 32 cases at the cost of inappropriately reclassifying 21 cases (NRI = 0.100, P = 0.131). However, the ARIC model was better at reclassifying non-cases: it appropriately reclassified 110 non-cases at the cost of inappropriately reclassifying 76 (NRI = 0.026, P = 0.013).
Comparisons of risk classifications for subjects in NHS-92 cohort using recalibrated ARIC (M1) and SAHS (M2) models.
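The NRI values and P values above can be recomputed from the reclassification counts with the asymptotic z-test of Pencina et al. The sketch below rests on one assumption: the numbers of cases (109) and non-cases (1,292) are not stated directly here and are inferred from the 7.8% incidence among the 1,401 subjects. With that split it gives NRIs of 0.101, 0.026 and 0.127 and P values of 0.131, 0.013 and 0.060 for cases, non-cases and overall, matching the reported figures up to rounding.

```python
import math
from dataclasses import dataclass


@dataclass
class NRIComponent:
    nri: float
    z: float
    p_value: float


def _component(nri: float, variance: float) -> NRIComponent:
    z = nri / math.sqrt(variance)
    return NRIComponent(nri, z, math.erfc(abs(z) / math.sqrt(2)))  # two-sided P value


def categorical_nri(n_events, events_up, events_down,
                    n_nonevents, nonevents_up, nonevents_down):
    """Categorical net reclassification improvement of a new model over an old one.

    'up'/'down' count subjects moved to a higher/lower risk category by the new model:
    moving up is appropriate for events (cases), moving down for non-events.
    """
    nri_events = (events_up - events_down) / n_events
    nri_nonevents = (nonevents_down - nonevents_up) / n_nonevents
    var_events = (events_up + events_down) / n_events ** 2
    var_nonevents = (nonevents_up + nonevents_down) / n_nonevents ** 2
    return {
        "cases": _component(nri_events, var_events),
        "non-cases": _component(nri_nonevents, var_nonevents),
        "overall": _component(nri_events + nri_nonevents, var_events + var_nonevents),
    }


# ARIC (new) vs SAHS (old); the 109 / 1,292 split is inferred, not reported.
results = categorical_nri(n_events=109, events_up=32, events_down=21,
                          n_nonevents=1292, nonevents_up=76, nonevents_down=110)
for group, r in results.items():
    print(f"{group}: NRI = {r.nri:.3f}, P = {r.p_value:.3f}")
```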
The calibration assessment showed that the local models were well calibrated (Table). In particular, the H-L statistics for the local models were all below 11.5, and the incidence rates predicted by all local models agreed well with the observed 13-year incidence rate in the NHS-92 cohort (7.8%). In contrast, the three published models were poorly calibrated, with the Framingham model performing worst. The published SAHS and ARIC models overestimated the incidence rate, whereas the Framingham model underestimated it: the estimated incidence rates were 13.5% and 9.8% for the SAHS and ARIC models, respectively, compared with 2.0% for the Framingham model.
Calibration quality of various predictive models evaluated using NHS-92 Cohort (N = 1,401)
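The Hosmer-Lemeshow (H-L) statistic compares observed and expected numbers of cases within groups of predicted risk, and a model's estimated incidence is its mean predicted risk. The sketch below assumes the usual grouping into deciles of predicted risk (the text does not state how groups were formed). The helper `chi2_sf_even` is an illustrative closed-form chi-square tail probability valid only for even degrees of freedom; g − 2 degrees of freedom is the conventional reference when the model was fitted to the same data. The toy data are invented.

```python
import math


def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow chi-square for outcomes y (0/1) and predicted risks p.

    Subjects are sorted by predicted risk and split into `groups` groups of
    (nearly) equal size. Each group contributes (O - E)^2 / (E * (1 - E / n)),
    where O and E are observed and expected cases and n is the group size.
    Requires at least `groups` subjects and risks strictly between 0 and 1.
    """
    pairs = sorted(zip(p, y))
    total = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * total // groups:(g + 1) * total // groups]
        n = len(chunk)
        expected = sum(pi for pi, _ in chunk)
        observed = sum(yi for _, yi in chunk)
        stat += (observed - expected) ** 2 / (expected * (1 - expected / n))
    return stat


def chi2_sf_even(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom."""
    assert df % 2 == 0 and df > 0
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


def estimated_incidence(p):
    """Model-based incidence: the mean predicted risk over all subjects."""
    return sum(p) / len(p)


# Toy data: 10 groups of 20 subjects; group g has predicted risk (g + 1) / 20
# and exactly g + 1 cases, i.e. a perfectly calibrated model.
y, p = [], []
for g in range(10):
    y += [1] * (g + 1) + [0] * (19 - g)
    p += [(g + 1) / 20] * 20
stat = hosmer_lemeshow(y, p)
print(f"H-L = {stat:.2f}, P = {chi2_sf_even(stat, df=8):.2f}, incidence = {estimated_incidence(p):.3f}")
```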
Recalibration improved the calibration of the ARIC model (Figure), but not that of the Framingham model. The recalibration procedure seemed to work reasonably well for the SAHS model for subjects in the lowest three quintiles (Figure); however, even after recalibration, the SAHS model still overestimated the number of cases in the two highest quintiles.
The poor performance of the Framingham model could be because the Framingham cohort used to derive the model consisted almost exclusively of one race, whereas the Singapore population comprises three races, with Chinese differing from Malays and Indians. To investigate this possibility, we performed local fitting and recalibration of the three published models separately in the Chinese and non-Chinese populations, with the race terms removed from the ARIC and SAHS models. In the Chinese population, the H-L statistics for the locally-fitted ARIC, Framingham and SAHS models were 4.22, 3.43 and 1.38, respectively, indicating good calibration. However, only the recalibrated ARIC and SAHS models showed acceptable calibration, with H-L statistics of 7.31 and 2.23, respectively; the recalibrated Framingham model remained poorly calibrated (H-L statistic = 26.12). The results in the non-Chinese population were very similar. The H-L statistics for the locally-fitted ARIC, Framingham and SAHS models were 1.53, 2.74 and 3.34, respectively. Among the recalibrated published models, only the ARIC model showed acceptable calibration (H-L statistic = 6.87); the recalibrated Framingham and SAHS models were poorly calibrated, with H-L statistics of 19.29 and 140.17, respectively. Thus, the poor performance of the Framingham model is unlikely to be due solely to differences in race effects between the Framingham cohort and the Singapore population; differences in the effect sizes of some risk factors are more likely to contribute as well.
Observed and predicted number of cases in the different risk quintiles.
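The recalibration procedure itself is not described in this section. For a risk function expressed on the logistic scale, one common approach is logistic recalibration: keep the published linear predictor and re-estimate only an intercept a and a slope b in the local cohort, logit(p_new) = a + b · logit(p_published). The plain-Python sketch below fits (a, b) by Newton-Raphson; it is an assumed illustration, not necessarily the method used in this study, and the function names and toy data are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


def fit_logistic_recalibration(y, p_published, iterations=25):
    """Fit logit(p_new) = a + b * logit(p_published) by Newton-Raphson."""
    lp = [logit(p) for p in p_published]
    a, b = 0.0, 1.0  # start from the published model (no recalibration)
    for _ in range(iterations):
        # Score vector and information matrix of the logistic log-likelihood.
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for yi, xi in zip(y, lp):
            mu = 1 / (1 + math.exp(-(a + b * xi)))
            w = mu * (1 - mu)
            g_a += yi - mu
            g_b += (yi - mu) * xi
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab ** 2
        # Newton step: solve [[h_aa, h_ab], [h_ab, h_bb]] @ [da, db] = [g_a, g_b].
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


def recalibrate(p_published, a, b):
    """Apply the fitted intercept and slope to the published risks."""
    return [1 / (1 + math.exp(-(a + b * logit(p)))) for p in p_published]


# Toy data: the published risks overestimate the observed event rate.
y = [0, 0, 1, 0, 0, 1, 0, 1]
p_pub = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
a, b = fit_logistic_recalibration(y, p_pub)
p_new = recalibrate(p_pub, a, b)
print(f"a = {a:.3f}, b = {b:.3f}, mean risk {sum(p_pub) / 8:.3f} -> {sum(p_new) / 8:.3f}")
```

At the fitted (a, b), the mean recalibrated risk equals the observed event rate, which is exactly the calibration-in-the-large property that recalibration is meant to restore; the slope b additionally shrinks or stretches the published risk gradient.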