Description of the derivation and validation dataset
Overall, 531 UK practices met our inclusion criteria, of which 355 were randomly assigned to the derivation dataset and 176 to the validation dataset. We excluded 20 practices: four had not uploaded all of their electronic data for the relevant study period, seven were from Scotland, and nine were from Northern Ireland.
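The allocation of whole practices to the two datasets can be sketched as follows. This is only an illustration: the practice identifiers, the seed, and the function name `split_practices` are hypothetical, and the text does not describe the randomisation procedure beyond the resulting counts (355 and 176).

```python
import random


def split_practices(practice_ids, n_derivation, seed=0):
    """Randomly assign whole practices to a derivation or validation set.

    Hypothetical helper: splitting by practice (not by patient) keeps
    the validation practices independent of the derivation practices.
    """
    rng = random.Random(seed)
    shuffled = list(practice_ids)
    rng.shuffle(shuffled)
    derivation = set(shuffled[:n_derivation])
    validation = set(shuffled[n_derivation:])
    return derivation, validation


# Hypothetical identifiers 1..531 standing in for the included practices.
practices = range(1, 532)
derivation, validation = split_practices(practices, n_derivation=355)
print(len(derivation), len(validation))
```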
The derivation cohort contained 2
578 patients, of whom 53 825 had type 1 or type 2 diabetes before the start of the study and were therefore excluded leaving 2
753 patients (1
135; 50.50% women) aged 25-79 years and free of diabetes at baseline for analysis. The validation cohort contained 1
419 patients aged 25-79, of whom 28 587 had a previous diagnosis of type 1 or type 2 diabetes leaving 1
832 patients for analysis (50.49% women).
Overall, we studied 3
585 patients contributing 24
172 person years, of whom 115 616 patients (78 081 in the derivation cohort and 37 535 in the validation cohort) had a new diagnosis of type 2 diabetes during follow-up. Table 1 compares the characteristics of eligible patients in the derivation and validation cohorts. Although this validation cohort was drawn from an independent group of practices, the baseline characteristics were very similar to those for the derivation cohort. Overall, 898 461 patients (23.81% of 3
585) had ethnicity recorded, and 122 736 (13.66%) of these were from a non-white ethnic group. Practices in areas where the proportion of patients from a non-white ethnic group is higher according to the 2001 census, such as London (28.9%), East Midlands (6.5%), and West Midlands (11.3%), also have higher rates of completeness of recording of ethnicity on the QResearch database (40.1%, 21.4%, and 30.1%, respectively).
Table 1 Characteristics of patients aged 25-79 free of diabetes at baseline in derivation and validation cohorts between 1993 and 2008. Values are numbers (percentages) unless stated otherwise
Patterns of missing data
Table 1 shows that 78.97% of women in the derivation cohort had body mass index recorded and 90.00% had smoking status recorded; 78.03% had both body mass index and smoking status recorded. For men, the corresponding figures were 71.19%, 83.24%, and 70.12%. Overall, 22.97% of women and 29.88% of men had either smoking or body mass index imputed by multiple imputation (data were not imputed for ethnicity—all patients with missing ethnicity were treated as white/not recorded). Similar figures were observed for men and women in the validation cohort, where multiple imputation was also used.
Table 2 shows the characteristics of men and women with complete data for smoking and body mass index compared with those who had missing data. Women with missing data had different patterns of risk factors—for example, women with complete data for body mass index were more likely to have a family history of diabetes, to be recorded as current smokers, and to have treated hypertension. They also had a lower 10 year observed risk of diabetes compared with women with missing body mass index data. Women with complete data for smoking were more likely to have a diagnosis of cardiovascular disease, a diagnosis of treated hypertension, and a family history of diabetes. The 10 year observed risk of diabetes was lower than for women whose smoking status was missing. The pattern was similar for men for most risk factors, except that the observed risks of diabetes were lower among men with missing data.
Table 2 Characteristics of men and women in derivation cohort with and without complete data for body mass index and smoking. Values are numbers (percentages) unless stated otherwise
Incidence of diabetes
Table 3 shows the crude and age standardised incidence rates of type 2 diabetes by sex, deprivation, and ethnicity in the derivation cohort. The age standardised rates for the white reference group were 4.13 (95% confidence interval 4.08 to 4.17) per 1000 person years for women and 5.31 (5.26 to 5.36) per 1000 person years for men. Both crude and age standardised rates varied widely between ethnic groups. Age standardised rates were significantly higher for men in every ethnic group compared with the white reference group, except for Chinese men. In women, age standardised incidence rates were higher for every group compared with the white reference group. The highest age standardised rates were in South Asians, and significant differences existed between the South Asian groups. For example, the rate for Bangladeshi women was 18.20 (12.93 to 23.47) per 1000 person years and that for Bangladeshi men was 19.34 (14.28 to 24.40) per 1000 person years. For Pakistanis, the corresponding rates per 1000 person years were 11.19 (9.16 to 13.21) for women and 13.22 (11.24 to 15.21) for men.
Table 3 Crude and age standardised incidence of type 2 diabetes per 1000 person years by sex, deprivation fifth, and ethnicity in derivation dataset
We also found a marked difference in the age standardised incidence rates of type 2 diabetes by deprivation, with a more than twofold difference for women when comparing the most deprived fifth (6.39 (6.25 to 6.54) per 1000 person years) with the most affluent fifth (3.00 (2.93 to 3.08) per 1000 person years). A similar, but less steep gradient was seen for men. The rates seen in the validation cohort were similar to those for the derivation cohort (data not shown).
Prevalence of risk factors by ethnicity
Table 4 shows the age standardised distribution of risk factors across each of the main ethnic groups. Substantial heterogeneity exists across the ethnic groups for risk factors, and the distribution also differs between men and women within ethnic groups. The notable results include substantial differences in the age standardised prevalence of smoking among men of Bangladeshi (46.04%, 95% confidence interval 43.16% to 48.92%), Caribbean (40.45%, 38.99% to 41.91%), Pakistani (32.82%, 31.29% to 34.35%), white/not recorded (33.49%, 33.40% to 33.58%), Chinese (26.63%, 24.23% to 29.03%), Indian (22.71%, 21.60% to 23.81%), and black African (17.95%, 16.76% to 19.14%) origin. Smoking rates were lower for women in each ethnic group compared with men but varied widely between women from different groups.
Table 4 Distribution of risk factors for type 2 diabetes by ethnic group in men and women in derivation cohort. Values are age standardised means and proportions with 95% confidence intervals
Treated hypertension was highest among black Caribbean and black African men and women and more than twice as high as that for the white reference group. Recorded family history of diabetes was highest among black Caribbean women (32.63%, 31.41% to 33.85%) and Indian men (29.95%, 28.78% to 31.11%), which was more than three times that for the white reference group who had the lowest rates (11.32%, 11.27% to 11.38% for women and 8.07%, 8.02% to 8.12% for men).
Bangladeshi men and women had the highest age standardised mean deprivation scores, followed by those of black African and black Caribbean origin. Indians and the white reference group had the lowest mean deprivation scores, as shown in table 4.
The highest mean body mass index was seen among black African women (age standardised mean 28.44, 28.29 to 28.58) compared with 25.47 (25.46 to 25.48) for women in the white reference group. The lowest value was in Chinese women (age standardised mean 22.87, 22.68 to 23.06). Similar patterns, although slightly less marked, were seen for men across the ethnic groups. Finally, 9.70% (7.76% to 11.65%) of Bangladeshi men had a recorded diagnosis of cardiovascular disease at baseline, which was more than twice that for men in the white reference group (4.54%, 4.50% to 4.57%) and more than four times that found in Chinese men (2.26%, 1.15% to 3.37%).
Table 5 shows the results of the Cox regression analysis for the QDScore. After adjustment for all other variables in the model, we found significant associations with risk of type 2 diabetes in both men and women for age, body mass index, family history of diabetes, smoking status, treated hypertension, use of corticosteroids, diagnosed cardiovascular disease, social deprivation, and ethnicity. We therefore included these variables in the final model and risk prediction algorithm.
Table 5 Adjusted hazard ratios (95% confidence interval) for QDScore in derivation cohort (see fig 1 for graphical representation of interaction terms)
We found significant heterogeneity of risk of type 2 diabetes by ethnic group compared with the white reference population, having adjusted for age, body mass index, deprivation, family history of diabetes, smoking status, treated hypertension, use of corticosteroids, and diagnosed cardiovascular disease, as shown in table 5. For example, among Bangladeshis, the adjusted hazard ratio for women was 4.07 (95% confidence interval 3.24 to 5.11) and that for men was 4.53 (3.67 to 5.59). These were significantly higher than the increased hazard ratios in Pakistani women and men (2.15, 1.84 to 2.52; and 2.54, 2.20 to 2.93). Both Pakistani and Bangladeshi men had significantly higher hazard ratios than Indian men. Black African men and Chinese women had increased risks compared with the corresponding white reference group. The only groups to have significantly lower risks than the white reference group were black African women (0.81, 0.66 to 0.98) and black Caribbean women (0.80, 0.70 to 0.92).
The fractional polynomial terms selected for inclusion in the model were as follows. For age in women the two terms were $(\text{age}/10)^{1/2}$ and $(\text{age}/10)^{3}$. For body mass index in women, the two terms were $(\text{bmi}/10)$ and $(\text{bmi}/10)^{3}$. For men, the two age terms were $\log(\text{age}/10)$ and $(\text{age}/10)^{3}$ and the two terms for body mass index were $(\text{bmi}/10)^{2}$ and $(\text{bmi}/10)^{3}$. Figure 1 shows the estimated adjusted hazard ratios by age and body mass index for these fractional polynomial terms in men and women.
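The transformations listed above can be written out as a short function. This is a minimal sketch: the function name `fp_terms` and the dictionary keys are hypothetical, and it computes only the raw transformed covariates. Any centring of the terms and the fitted coefficients are not given in this section and are not reproduced here.

```python
import math


def fp_terms(age, bmi, sex):
    """Return the fractional polynomial terms for age and body mass index.

    Powers follow the text: women use (age/10)^0.5, (age/10)^3,
    (bmi/10), (bmi/10)^3; men use log(age/10), (age/10)^3,
    (bmi/10)^2, (bmi/10)^3. Key names are illustrative only.
    """
    a = age / 10.0
    b = bmi / 10.0
    if sex == "female":
        return {"age_1": a ** 0.5, "age_2": a ** 3,
                "bmi_1": b, "bmi_2": b ** 3}
    if sex == "male":
        return {"age_1": math.log(a), "age_2": a ** 3,
                "bmi_1": b ** 2, "bmi_2": b ** 3}
    raise ValueError("sex must be 'female' or 'male'")


women = fp_terms(age=40, bmi=25, sex="female")
men = fp_terms(age=40, bmi=25, sex="male")
print(women)
print(men)
```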
Fig 1 Graphical representation of age interactions for men and women for risk of type 2 diabetes
We identified significant interactions between age and body mass index, age and family history of diabetes, and age and smoking status. We therefore included these interactions in the final model, and the general direction of the effects was that body mass index and family history of diabetes tended to have a greater impact on risk of diabetes at younger ages, as shown in fig 1. Smoking had a more complex relation with age; the risk peaked in middle age for both men and women.
In a comparison of models, the median Bayes information criterion for women for our final model (model A) was 875 203, for the model without deprivation and ethnicity (model B) it was 876 400, for the model without ethnicity (model C) it was 875 270, and for the model without deprivation (model D) it was 876 198, indicating that the model that incorporated both ethnicity and deprivation was superior to the other three. For men, the corresponding figures were 1
034, and 1
369, similarly supporting the inclusion of both ethnicity and deprivation into the final model.
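The model comparison for women can be reproduced from the four values quoted above, since a lower Bayes information criterion indicates the preferred model. The sketch below uses only the women's figures as printed in the text; the variable and function names are illustrative.

```python
# Median Bayes information criterion for women, as quoted in the text.
bic_women = {
    "A": 875203,  # final model (ethnicity and deprivation)
    "B": 876400,  # without deprivation and ethnicity
    "C": 875270,  # without ethnicity
    "D": 876198,  # without deprivation
}


def rank_by_bic(bic):
    """Sort models from lowest (preferred) to highest BIC."""
    return sorted(bic.items(), key=lambda item: item[1])


ranking = rank_by_bic(bic_women)
best = ranking[0][0]
# Difference of each model from the preferred one.
deltas = {model: value - bic_women[best] for model, value in bic_women.items()}
print(best, deltas)
```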
Calibration and discrimination of QDScore
Table 6 shows the results for the validation statistics for men and women after application of the QDScore and the Cambridge risk score in the validation dataset. The QDScore shows higher levels of discrimination than the Cambridge risk score. For example, in women the D statistic for the QDScore was 2.11 (95% confidence interval 2.08 to 2.14) compared with 1.88 (1.85 to 1.91) with the Cambridge risk score; a 0.1 difference in the D statistic indicates an important difference in prognostic separation between two risk algorithms.41
The QDScore explained a higher proportion of the variation—it explained 51.53% of the variation in women and 48.16% of that in men. The corresponding values for the Cambridge risk score were 45.77% and 41.82%. The Brier score, however, was slightly lower for the Cambridge risk score in both men and women.
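The explained variation for women is consistent with the measure that Royston and Sauerbrei derive from the D statistic, R²_D = (D²/κ²)/(π²/6 + D²/κ²) with κ² = 8/π. The sketch below assumes that this is the measure reported, which this section does not state explicitly. Small differences in the last decimal place arise because D is quoted to two decimal places.

```python
import math


def r2_from_d(d):
    """Explained variation implied by the D statistic.

    Uses R2_D = (D^2 / kappa^2) / (pi^2 / 6 + D^2 / kappa^2),
    where kappa^2 = 8 / pi. Assumed, not confirmed, to be the
    measure of explained variation reported in the text.
    """
    kappa_sq = 8.0 / math.pi
    scaled = d * d / kappa_sq
    return scaled / (math.pi ** 2 / 6.0 + scaled)


r2_qdscore_women = r2_from_d(2.11)    # D statistic for QDScore in women
r2_cambridge_women = r2_from_d(1.88)  # D statistic for Cambridge risk score in women
print(round(100 * r2_qdscore_women, 1), round(100 * r2_cambridge_women, 1))
```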
Table 6 Validation statistics for QDScore and Cambridge risk score in validation cohort. Values are mean (95% confidence interval)
Figure 2 compares the mean predicted scores from the QDScore with the observed risks at 10 years within each 10th of predicted risk in order to assess the calibration of the model in the validation sample. The close correspondence between predicted and observed 10 year risks within each model 10th suggests that the model was well calibrated. For example, in the top 10th of risk, the mean predicted risk was 18.31% (95% confidence interval 18.24% to 18.38%) in women and the observed risk was 18.82% (18.39% to 19.26%). The ratio of predicted to observed risk in this 10th was 0.97, indicating almost perfect calibration (a ratio of 1 indicates perfect calibration, that is, no under-prediction or over-prediction). We found similar results for men, with a ratio of 0.99 in the top 10th of predicted risk.
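The calibration ratio quoted for women in the top 10th of risk follows directly from the two percentages given. A minimal sketch, with a hypothetical function name:

```python
def calibration_ratio(mean_predicted, observed):
    """Ratio of mean predicted risk to observed risk within a risk group.

    A value of 1 indicates perfect calibration; below 1 indicates
    under-prediction and above 1 over-prediction.
    """
    if observed <= 0:
        raise ValueError("observed risk must be positive")
    return mean_predicted / observed


# Women in the top 10th of predicted risk (percentages from the text).
ratio_women_top_tenth = calibration_ratio(18.31, 18.82)
print(round(ratio_women_top_tenth, 2))
```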
Fig 2 QDScore predicted and observed risk of diabetes by 10th of predicted risk
Predictions with age, sex, deprivation, and ethnicity
Table 7 shows the percentages of men and women in the validation dataset with a 10 year predicted risk of being diagnosed as having type 2 diabetes according to a range of thresholds and by age band. For example, at the 10% threshold, 10.60% of women and 15.06% of men had a 10% or higher predicted risk of being diagnosed as having type 2 diabetes over 10 years. This varied markedly by age such that 21.43% of women aged 55-59 and 30.99% of women aged 65-69 had a 10% or greater risk of being diagnosed as having type 2 diabetes over 10 years. The corresponding figures for men were 33.28% and 44.08%.
Table 7 Percentage of patients in validation dataset with 10 year predicted risk of type 2 diabetes from QDScore of ≥10%, ≥15%, ≥20%, ≥30%, ≥40%, and ≥50% by age and sex
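The percentages in a table of this kind are the share of patients whose 10 year predicted risk meets or exceeds each threshold. The sketch below shows the calculation on a small set of made-up predicted risks. The data and the function name are hypothetical and are not drawn from the validation cohort.

```python
def percent_at_or_above(risks, thresholds=(10, 15, 20, 30, 40, 50)):
    """Percentage of patients with predicted risk >= each threshold.

    `risks` holds 10 year predicted risks expressed as percentages.
    """
    n = len(risks)
    if n == 0:
        raise ValueError("risks must not be empty")
    return {t: 100.0 * sum(r >= t for r in risks) / n for t in thresholds}


# Hypothetical predicted risks (%), for illustration only.
example_risks = [2.0, 4.5, 9.9, 10.0, 12.3, 15.0, 21.7, 33.0, 48.0, 55.5]
shares = percent_at_or_above(example_risks)
print(shares)
```

In practice the same function would be applied separately within each age band and sex.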
Tables 8 and 9 show the 10 year risk of type 2 diabetes among men and women of different ethnic groups and for those living in the most deprived and affluent areas. For example, 33.83% of Bangladeshi women had a 10 year risk of being diagnosed as having diabetes of 10% or more compared with 10.48% of women in the white reference group, and 15.03% of women in the most deprived fifth had a 10% or higher risk of developing diabetes over the next 10 years compared with 6.52% of women in the most affluent fifth. The difference between affluent and deprived fifths is more marked for women than for men; the corresponding figures are 15.65% for men in the most deprived fifth and 13.21% for men in the most affluent fifth.
Table 8 Percentage of patients in validation dataset with 10 year predicted risk of type 2 diabetes from QDScore of ≥10%, ≥15%, ≥20%, ≥30%, ≥40%, and ≥50% by ethnicity and sex
Table 9 Percentage of patients in validation dataset with 10 year predicted risk of type 2 diabetes from QDScore of ≥10%, ≥15%, ≥20%, ≥30%, ≥40%, and ≥50% by deprivation fifth and sex
Overall, almost half (15 450; 47.9%) of cases of diabetes occurred in the top 10th of the distribution (risk of ≥10.38%) and almost 70% (22 450) occurred in the top fifth (risk of ≥5.98%).