We developed and validated a cardiovascular risk algorithm that simultaneously takes account of ethnicity and deprivation. The algorithm has face validity in the setting in which it will be used and had good discrimination and calibration. There are three main reasons why this study is likely to make an important impact on the decisions of doctors, patients, and commissioners. Firstly, in this prospective study we developed and validated a risk prediction algorithm that provides an individualised estimate of cardiovascular risk and includes the independent contributions of ethnicity and deprivation. This permits identification of those individuals and groups likely to be most disadvantaged by use of existing treatment algorithms. Such patients include south Asian women, who would otherwise be less likely to be identified. This information will, if acted on, help to reduce health inequalities.
Secondly, it extends and improves on our original equation for cardiovascular risk
12 by incorporating important additional clinical conditions (such as rheumatoid arthritis, chronic kidney disease, and atrial fibrillation), allowing more accurate quantification of risks for individual patients. This information should be considered in the context of specific treatment guidelines. Knowledge of cardiovascular risk might be useful in assessing response efficacy and concordance with recommended healthcare interventions for these specific conditions.
Thirdly, it also allows better quantification of risk of cardiovascular disease for patients with type 2 diabetes, which is especially prevalent among south Asian patients. Though there are alternative cardiovascular risk algorithms for patients with diabetes,
18 48 none is based on a large nationally representative primary care cohort, has large numbers of incident events, and also simultaneously takes account of other important risk factors such as deprivation and ethnicity. Although current guidelines might indicate statins for people with diabetes, knowledge of cardiovascular risk can be useful in helping to identify patients at particularly low risk for whom a statin might not be needed.
Strengths and limitations
The strengths and limitations of using this approach and the QRESEARCH database to develop and validate a new risk prediction algorithm have been discussed previously.
12 27 We included more sophisticated modelling of the effect of age on risk factors, which results in greater weighting of some risk factors in younger patients, such as smoking status, family history of coronary heart disease, type 2 diabetes, systolic blood pressure, treated hypertension, BMI, deprivation, and atrial fibrillation. This also has the effect that in people without the risk factors the increase in risk with age will be steeper than with QRISK1. The inclusion of patients with type 2 diabetes in the main study population will have tended to increase the overall level of risk in the study population and this will also have tended to increase the risk for an individual, as can be seen from the hazard ratios (table 5).
We updated the analysis to include data until March 2008, increasing the number of patients with at least 10 years of follow-up data to almost 440 000 patients. We have furthermore included the linked cause of death as recorded by the Office for National Statistics (ONS). Death linkage increased cases of cardiovascular disease by about 7% across the entire study period, as the data from ONS were not available for the full study period at the time of the original study.
27 We used self assigned ethnicity as reported by the patient to their general practice; this has advantages over analyses where ethnicity is assigned by an informant rather than the patient or is imputed geographically or is related to country of birth. The latter is particularly problematic with increasing numbers of people from ethnic minorities now being born in the UK.
49 We also disaggregated the south Asian groups and reported on them separately, which addresses concerns with studies that tend to combine them into one group when there are differences in exposure to risk factors and rates and outcomes of diseases.
25 Though only a quarter of patients had self assigned ethnicity recorded, we think it is reasonable to assume that where patients have self assigned ethnicity recorded as Bangladeshi (for example), this is accurate and the patient was indeed Bangladeshi. Misclassification would most affect the reference category of “white or not recorded,” but because of the mix of the populations of England and Wales less than 10% of such patients were probably from a non-white ethnic group. This misclassification would therefore, if anything, tend to underestimate the relative effect of ethnicity on cardiovascular risk.
Just fewer than 3% of our total sample were classified as belonging to a minority ethnic group compared with the national proportion in this age group of 6.6% (based on projections for 2006
50). The comparison, however, is not “like for like” as national estimates are for 2006 and migration patterns and population demographics have probably changed over the 15 year period of our study. None the less, the lower percentage of patients from minority groups raises concerns about the possible under-representativeness of practices from ethnically diverse inner city areas or misclassification error, or both. We think under-representativeness of practices from ethnically diverse areas is unlikely as QRESEARCH practices are drawn from across England and Wales and have been shown to be similar to practices nationally for a range of measures.
26 In fact, QRESEARCH has proportionately more practices in areas of greater ethnic diversity such as the East Midlands, Yorkshire, and Humberside (fig 4). Also, table 1 shows that among patients from both cohorts, when ethnicity was recorded 11.7% were from a minority group. This is higher than census estimates for 2006, indicating either over-representation of practices from ethnically diverse areas or that practices in ethnically diverse areas are more likely to record ethnicity, or both. Therefore, the apparent under-representation of people from black and minority ethnic groups has probably arisen because we combined the not recorded and the white groups. This combined group will contain additional patients from groups classified as other than white. This would, if non-differential, result in a bias towards the null hypothesis of no difference in risk between ethnic groups. The net consequence of this would be, if anything, to underestimate hazard ratios in the minority ethnic populations in question rather than generate spurious associations.
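The direction of this bias can be illustrated with a small calculation. The sketch below is not part of the study analysis: it uses simple risk ratios as a stand-in for hazard ratios, and the function name, the true ratio of 1.5, and the contamination fractions are hypothetical values chosen only for illustration.

```python
def observed_risk_ratio(true_ratio: float, contamination: float) -> float:
    """Risk ratio observed when the reference group is contaminated.

    true_ratio:    risk in the minority group divided by risk in the truly
                   white group (hypothetical input).
    contamination: fraction of the combined "white or not recorded" group
                   that actually belongs to the minority group.
    """
    if not 0.0 <= contamination <= 1.0:
        raise ValueError("contamination must be between 0 and 1")
    # The baseline risk cancels, so the reference risk is a weighted mix
    # of 1 (truly white) and true_ratio (misclassified minority patients).
    mixed_reference = (1.0 - contamination) + contamination * true_ratio
    return true_ratio / mixed_reference


if __name__ == "__main__":
    # Illustrative values only; they are not estimates from the study.
    for fraction in (0.0, 0.05, 0.10):
        ratio = observed_risk_ratio(1.5, fraction)
        print(f"contamination {fraction:.2f}: observed ratio {ratio:.3f}")
```

Whenever the true ratio is above 1, any contamination of the reference group pulls the observed ratio towards 1, which is the bias towards the null described above.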
With a number of policy and legislative drivers co-aligning, ethnicity coding is likely to improve exponentially in the UK, and this evolving picture will therefore allow us to continue to monitor the impact of incorporating more complete ethnicity data into our models. But for the present, even though it is imperfect, incorporating ethnicity into our disease risk algorithm has, we believe, clearly been an important advance in understanding risk of disease in ethnically diverse populations. Furthermore, it is unlikely that a better estimate could be obtained for England and Wales given the difficulties of assembling a sufficiently large prospective cohort for follow-up over 10 or more years.
Another potential limitation of our study is that we have assumed that the absence of a recorded diagnosis of diabetes (or family history, for example) is equivalent to the person not having that factor. This is probably valid for diabetes as there have been consistent efforts in general practice over the past 15 years to develop and validate diabetes registers (including comparisons against prescribed medication for diabetes), though we accept there will additionally be large numbers of cases not yet diagnosed by clinicians. Recording of family history is less systematic in primary care and might be more susceptible to recording bias. As recording of risk factors becomes more complete over time, better estimates of the relevant hazard ratios will be possible.
Also relevant is that we have calculated 95% confidence intervals around the QRISK2 scores to give a better idea of precision. We have improved on the method for validation by using multiple imputation for missing values in the validation set rather than mean values by age and sex derived from the derivation dataset as in our original study and independent validation.
12 27 One important limitation, though, is that while we have validated the results in a physically discrete group of practices, these practices all use the same EMIS clinical system and hence there is a potential “home advantage” that might reduce the generalisability to other systems, although, conversely, it is ideally suited for use in the EMIS system. In other words, any comparison done in the one third sample of practices in QRESEARCH will tend to favour QRISK2 compared with other prognostic scores. Our previous study
27 was additionally validated in a database (THIN, “The Health Improvement Network”) derived from a set of practices using a different clinical system (In Practice Systems) and gave similar results (apart from the prevalence of family history, which was lower in the THIN database). This suggests that our findings are probably generalisable to the 20% of practices in England and Wales that use In Practice Systems in addition to the 60% of practices that already use the EMIS clinical system from which the equation is derived. Further validation of QRISK2 is not currently possible on the THIN database as the database does not have the linked ONS death certificate data and recording of ethnicity is too low (personal communication, THIN, 2008). The validation we have presented constitutes the best currently possible given the extent and nature of comparable datasets. The results should generalise to at least 80% of practices nationally. None the less, it is important that QRISK2 is validated by another team on external populations and an international version of QRISK2 is being developed to allow this and will be reported in due course. In particular, we are working with another primary care database (THIN) to link their data to ONS death certificate data so that this can be used as a data source for further validation. Ethnicity recording could be improved on primary care databases by linkage of individual level data on self assigned ethnicity from the 2001 census, and this will be undertaken and reported, assuming access to these data is granted.
Comparisons with the modified Framingham score
This study improves on our original equation for cardiovascular risk in terms of its potential application as outlined above and also because the more complex model has slightly better discrimination (that is, greater ability to separate patients at high and low risk) than our original model. The QRISK1 equation improved on other equations in use in the UK by including additional readily available risk factors such as deprivation, family history, BMI, and blood pressure treatment. With QRISK2, the improvement in discrimination and calibration compared with the modified Framingham score remains significant, although this is probably partly because the modelling was undertaken on a more contemporaneous population from England and Wales and we used a more sophisticated approach for modelling and included additional variables. We have not compared QRISK2 with the most recently published Framingham score as this uses a much broader definition of cardiovascular disease that is less relevant to UK guidelines.
51 QRISK2 seems to improve on the Framingham score based Ethrisk,
19 perhaps because of its greater precision, larger sample, and prospective study design.
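For readers less familiar with the two terms, the following sketch shows one simple way to express discrimination and calibration for a yes/no outcome. It is an illustration under stated assumptions, not the statistics used in this study: it ignores censoring and follow-up time, and the function names and the example data are hypothetical.

```python
from itertools import product
from typing import Sequence


def c_statistic(predicted: Sequence[float], observed: Sequence[int]) -> float:
    """Discrimination: the proportion of (event, non-event) patient pairs in
    which the patient who had the event was given the higher predicted risk.
    Tied predictions count as half. 0.5 is chance, 1.0 is perfect separation.
    """
    events = [p for p, y in zip(predicted, observed) if y == 1]
    non_events = [p for p, y in zip(predicted, observed) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e, n in product(events, non_events):
        if e > n:
            score += 1.0
        elif e == n:
            score += 0.5
    return score / (len(events) * len(non_events))


def calibration_ratio(predicted: Sequence[float], observed: Sequence[int]) -> float:
    """Calibration in the large: mean predicted risk divided by the observed
    proportion with an event. 1.0 is ideal; above 1.0 means overprediction.
    """
    mean_predicted = sum(predicted) / len(predicted)
    observed_rate = sum(observed) / len(observed)
    if observed_rate == 0:
        raise ValueError("no events observed")
    return mean_predicted / observed_rate


if __name__ == "__main__":
    # Hypothetical predicted 10 year risks and outcomes for six patients.
    risks = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40]
    outcomes = [0, 0, 1, 0, 0, 1]
    print("c statistic:", c_statistic(risks, outcomes))
    print("predicted/observed:", calibration_ratio(risks, outcomes))
```

A score can discriminate well while being poorly calibrated, and vice versa, which is why both properties are reported.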
In contrast to our previous study, we compared QRISK2 with the modified Framingham risk score recently recommended by NICE. The modified score, in common with the risk equation advocated by the Joint British Societies, involves summing risks from two risk equations for coronary heart disease and stroke, which is mathematically incorrect because these are not independent outcomes and therefore will give an invalid result. This addition of the two separate and non-independent risks results in some patients having an estimated risk of more than 100% and would also result in overestimation of risk for other individuals at lower estimates of risk. This might have accounted for some of the overprediction. The inflation factors of 1.4 for south Asian men and 1.5 for those with a family history of coronary heart disease, which have been developed by consensus rather than a mathematical model based on individual patient data, might also have accounted for some of the overprediction, although this was still present on our previous analysis where the inflation factors had not been applied.
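The arithmetic behind this point can be sketched as follows. The probability of at least one of two events is P(A) + P(B) − P(A and B), so a plain sum overcounts by the probability that both occur. The code is a minimal illustration, not the modified Framingham equation itself: the input risks are hypothetical, and applying both inflation factors multiplicatively to the summed risk is an assumption made here for the example.

```python
from typing import Tuple


def summed_risk(chd: float, stroke: float, inflation: float = 1.0) -> float:
    """Naive combined risk: add the two risks, then apply an inflation factor.
    Nothing constrains the result to stay at or below 1.0 (100%)."""
    return inflation * (chd + stroke)


def union_bounds(chd: float, stroke: float) -> Tuple[float, float]:
    """Range within which the true risk of at least one event must lie,
    whatever the dependence between the two outcomes."""
    return max(chd, stroke), min(1.0, chd + stroke)


def union_if_independent(chd: float, stroke: float) -> float:
    """Risk of at least one event if the two outcomes were independent
    (shown only for comparison; the outcomes are not independent)."""
    return 1.0 - (1.0 - chd) * (1.0 - stroke)


if __name__ == "__main__":
    # Hypothetical 10 year risks for one high risk patient.
    chd, stroke = 0.45, 0.25
    print("plain sum:", summed_risk(chd, stroke))
    # Assumption: both inflation factors (1.4 and 1.5) multiply the sum.
    print("sum with inflation:", summed_risk(chd, stroke, inflation=1.4 * 1.5))
    print("valid range for true risk:", union_bounds(chd, stroke))
    print("value under independence:", union_if_independent(chd, stroke))
```

With these illustrative inputs the inflated sum exceeds 100%, whereas a valid probability of at least one event can never exceed the upper bound returned by `union_bounds`.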
12 27
Comparisons with the literature
We found substantial heterogeneity between risk factors within south Asian populations and our prevalence figures for risk factors are comparable with the literature,
19 20 which increases the face validity of our findings. For example, as others have found, Bangladeshi men have higher rates of smoking but lower mean systolic blood pressure levels than Pakistani or Indian men.
20 Indian and Pakistani men and women have higher mean BMI than Bangladeshis.
20 Prevalence of type 2 diabetes was higher in Bangladeshis and Pakistanis than Indians.
20 Similarly, cholesterol/HDL ratio was higher among each of the south Asian groups compared with the white reference category.
20 Our findings also confirm Nazroo’s observations
52 and the findings of the Whitehall II study
53 of the independent effects of both ethnicity and deprivation. Overall, the results of our study add to a growing body of evidence that combining people of south Asian origin into one category is potentially misleading.
The magnitude of the increased cardiovascular risk among south Asians compared with white patients seems to be higher than the 40% previously thought in the absence of prospective incidence data.
22 24 For example, in our study, compared with the white reference group the adjusted risk is 45% higher (29% to 63%) among Indian men, 67% higher (40% to 101%) among Bangladeshi men, and 97% higher (70% to 129%) among Pakistani men, even after adjustment for multiple confounders including deprivation and diabetes. Similarly, the adjusted risks for Indian, Pakistani, and Bangladeshi women are all increased compared with the white reference population. Our results also suggest that the increased cardiovascular risks observed for Pakistani men are significantly higher than those for Indian men. The difference between these two groups for women is similar, although of borderline significance when a direct comparison is made, probably because of a lack of power.
There were also differences in the proportion of events that were stroke or transient ischaemic attacks rather than coronary heart disease. For example, a high proportion of first events among black Caribbean and black Africans was stroke or transient ischaemic attacks, which is consistent with the literature.
54 55 Other studies have found differences in mortality between different ethnic groups, such as the unexplained persistent higher mortality among Bangladeshis.
56 This deserves further study as to the underlying causes and potential missed opportunities for care.
Clinical implementation
QRISK2 has been designed to estimate cardiovascular risk for an entire population of patients in primary care by using data already collected within the patient’s electronic health record and by using default values for body mass index, cholesterol concentration, and systolic blood pressure where these data have not been recorded in the past five years. Computer generated risk scores have been integrated within routine clinical use of computers in UK primary care for the past 10 years, and, with QRISK2 embedded within computer applications, a rank ordered recall list can be generated so that those at greatest clinical need can be recalled first. Once such patients have been recalled, the individual can have a full clinical cardiovascular check to calculate an actual QRISK2 based on the most up to date data that are then used to guide decisions about treatment.
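The recall process described above can be sketched in a few lines. This is a hedged illustration only: the default values, the field names, and `toy_score` are hypothetical placeholders and are not the QRISK2 equation or its published defaults. The only features taken from the text are the substitution of defaults for measurements that are missing or more than five years old and the ranking of patients by estimated risk.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical default values; the real defaults are not reproduced here.
DEFAULTS: Dict[str, float] = {"bmi": 26.0, "chol_hdl_ratio": 4.0, "sbp": 130.0}
MAX_AGE = timedelta(days=5 * 365)  # "recorded in the past five years"


@dataclass
class Measurement:
    value: float
    recorded_on: date


def value_or_default(name: str, m: Optional[Measurement], today: date) -> float:
    """Use the recorded value if it is recent enough, otherwise the default."""
    if m is None or today - m.recorded_on > MAX_AGE:
        return DEFAULTS[name]
    return m.value


def recall_list(
    patients: Dict[str, Dict[str, Optional[Measurement]]],
    score: Callable[[Dict[str, float]], float],
    today: date,
) -> List[Tuple[str, float]]:
    """Return (patient id, estimated score) pairs, highest score first."""
    scored = []
    for patient_id, record in patients.items():
        inputs = {
            name: value_or_default(name, record.get(name), today)
            for name in DEFAULTS
        }
        scored.append((patient_id, score(inputs)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


def toy_score(inputs: Dict[str, float]) -> float:
    """Placeholder scoring function. This is NOT the QRISK2 algorithm."""
    return inputs["sbp"] / 10 + inputs["bmi"] / 10 + inputs["chol_hdl_ratio"]


if __name__ == "__main__":
    today = date(2008, 3, 31)
    patients = {
        "A": {"sbp": Measurement(180.0, date(2007, 1, 1))},  # recent value
        "B": {"sbp": Measurement(180.0, date(2000, 1, 1))},  # stale value
        "C": {},                                             # nothing recorded
    }
    for patient_id, estimate in recall_list(patients, toy_score, today):
        print(patient_id, round(estimate, 2))
```

Patients whose estimate rests on defaults would then have an actual score calculated from up to date measurements at the recall visit.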
The only item in QRISK2 that is not already routinely collected and recorded electronically is the Townsend deprivation score, which is linked to an individual postcode. This score has already been integrated into the EMIS clinical system and linked to the records of over 32 million patients. The mapping of postcode to deprivation score will also be made available, together with the supporting reference tables and algorithm itself. QRISK2 can then be integrated within clinical management systems so that it can be used on an ongoing basis to generate an estimated score based on existing data. QRISK2 will be updated as improved analytical techniques are developed for application to the QRESEARCH database. QRISK will evolve as data quality and completeness improve and population characteristics change (obesity is increasing, while incidence of cardiovascular disease is falling, for example). This will ensure that future versions of QRISK remain well calibrated to the population of England and Wales and make best use of technical developments. Lastly, the NHS’ electronic health record (NHS Care Record Service) is central to the NHS Connecting for Health’s national programme for information technology and this will, within a relatively short space of time, result in electronic health records replacing paper based records in hospitals in England.
57 The plan is for these eventually to incorporate computerised decision support tools and so this will allow disease risk algorithms such as QRISK2 to be largely automatically populated with routine electronically coded data as is already possible in primary care in the UK.
These estimates, like any predictive score, are an aid but not a replacement for judgment in individual clinical circumstances. We have specifically identified atrial fibrillation and rheumatoid arthritis for consideration as both are known to be associated with increased risk
31 32 58 59 and knowledge of them might inform clinical management for an individual patient. We recognise that the likely age and comorbidity of these individuals, however, might place them at high risk of cardiovascular disease, and they might therefore not be appropriate candidates for a primary prevention tool such as QRISK2. Nevertheless, if we had omitted rheumatoid arthritis and atrial fibrillation, the effect would be to underestimate risk for individuals with either of these two conditions who did not yet have concurrent cardiovascular disease. The prevalence of rheumatoid arthritis and atrial fibrillation is low so this will have a minimal impact on the overall precision of the model or its application at a population level, but we believe the additional complexity of the model is justified as no additional data entry will be required from most users, while it also provides relevant information to the individual patient with one or both of these conditions and their clinicians.
QRISK2 provides a mechanism for estimating absolute risk among individuals. Use of this information, however, should be tightly coupled with suitable guidelines. There are some patients in whom a QRISK2 score should not be calculated, including those with pre-existing cardiovascular disease (who we excluded from this study). Risk estimation should not be used for people with conditions such as peripheral vascular disease, heart failure, familial hypercholesterolaemia, or other conditions not specifically identified in the algorithm that are known to be associated with high risks of cardiovascular events.
5 We have not added further to the exclusions in this dataset as to do so would have added complexity with no appreciable gain in precision for people in whom we do not recommend the use of this score.
Clinical impacts and health inequalities
A risk prediction algorithm that does not include deprivation or ethnicity is likely to result in the inequitable definition of risk for affluent and deprived communities and also substantially underestimate the risk in south Asian people, especially women, in whom, as in men, cardiovascular disease is the commonest cause of premature death. Primary prevention programmes that do not take these variables into account risk exacerbating rather than reducing existing health inequalities,
6 7 8 especially as the evidence suggests that health inequalities naturally widen at the start of new health initiatives.
21 Other research highlights additional difficulties with accessing effective health promotion, including lack of risk awareness, influences of culture and lifestyle, time restrictions, and language difficulties
60 and this needs to be addressed once patients have been identified to improve clinical outcomes.
The QRISK2 algorithm, like its predecessor, has better calibration and is a better discriminator of risk of cardiovascular disease than the modified Framingham score. A major advantage of QRISK2 is the ability of the algorithm to be updated as population demographics, ethnic composition, prevalence of risk factors, and incidence of cardiovascular disease change. It also demonstrates the utility of linked electronic data for research to develop tools that can help doctors to make better decisions. The marked gradient with deprivation has already been demonstrated with QRISK1. The further identification of ethnicity as an independent factor additional to deprivation is an important consideration, particularly for south Asian women at high risk. A broader range of important clinical conditions included in QRISK2 but not in the modified Framingham score make it a more clinically relevant tool. Highlighting risks of conditions including type 2 diabetes and chronic renal disease supports further integration of vascular strategies and informs individual assessment.
The modified Framingham score underestimates risk in south Asian women. Like the earlier version, QRISK2 includes BMI and treatment for hypertension, neither of which is included in the Framingham score; in QRISK2, family history contributes an important additional weighting, particularly at younger ages. The clinical relevance, superior performance, and equitable assignment of QRISK2 make it an appropriate tool to assist in the delivery of public health programmes that recognise the broader determinants of cardiovascular health, such as ethnicity and deprivation. This has particular relevance to equity of delivery of health care to the UK’s south Asian communities and might help to reduce widening health inequalities.
What is already known on this topic
- A 10 year cardiovascular disease risk threshold of 20% is recommended for intervention with statins for the primary prevention of cardiovascular disease
- Current algorithms for risk of cardiovascular disease do not adequately account for the combined effect of socioeconomic status and ethnicity, leading to an underestimate of risk in high risk populations that might potentially exacerbate existing health inequalities
What this study adds
- Compared with a white reference population, there is a substantially increased risk of cardiovascular disease in south Asian men and women that is independent of social deprivation, diabetes, and family history
- The results of the calibration and discrimination statistics for QRISK2 were significantly better than those for the modified Framingham score in the validation sample
- At the 10 year risk threshold of 20%, the population identified by QRISK2 was at higher risk of a CV event than the population identified by the modified algorithm