Chronic Kidney Disease is a major cause of morbidity and interventions now exist which can reduce risk. We sought to develop and validate two new risk algorithms (the QKidney® Scores) for estimating (a) the individual 5 year risk of moderate-severe CKD and (b) the individual 5 year risk of developing End Stage Kidney Failure in a primary care population.
We conducted a prospective open cohort study using data from 368 QResearch® general practices to develop the scores. We validated the scores using two separate sets of practices - 188 separate QResearch® practices and 364 practices contributing to the THIN database.
We studied 775,091 women and 799,658 men aged 35-74 years in the QResearch® derivation cohort, who contributed 4,068,643 and 4,121,926 person-years of observation respectively.
We had two main outcomes (a) moderate-severe CKD (defined as the first evidence of CKD based on the earliest of any of the following: kidney transplant; kidney dialysis; diagnosis of nephropathy; persistent proteinuria; or glomerular filtration rate of < 45 mL/min) and (b) End Stage Kidney Failure.
We derived separate risk equations for men and women. We calculated measures of calibration and discrimination using the two separate validation cohorts.
Our final model for moderate-severe CKD included: age, ethnicity, deprivation, smoking, BMI, systolic blood pressure, diabetes, rheumatoid arthritis, cardiovascular disease, treated hypertension, congestive cardiac failure, peripheral vascular disease, NSAID use and family history of kidney disease. In addition, it included SLE and kidney stones in women. The final model for End Stage Kidney Failure was similar except it did not include NSAID use.
Both risk prediction algorithms performed well across all measures in both validation cohorts. For the THIN cohort, the model to predict moderate-severe CKD explained 56.38% of the total variation in women and 57.49% in men. The D statistic values were high, at 2.33 for women and 2.38 for men. The ROC statistic was 0.875 for women and 0.876 for men.
These new algorithms have the potential to identify high risk patients who might benefit from more detailed assessment, closer monitoring or interventions to reduce their risk.
Chronic Kidney Disease (CKD) is a significant cause of morbidity and mortality across the developed world. It is associated with increased risk of death from cardiovascular disease[1-3], as well as from End Stage Kidney Failure. The health burden due to CKD is likely to continue to rise with the ageing population and worldwide increase in Type 2 Diabetes. CKD is an insidious disease characterised by a clear natural history with a detectable asymptomatic period during which interventions (such as blood pressure control) could help prevent or delay progression to End Stage Kidney Failure. Despite this, patients are often identified and referred late, when up to half already require kidney dialysis or transplantation[6,7].
Whilst there are no randomised trials demonstrating that screening for CKD improves clinical outcomes, national programmes recommend that individuals who are at increased risk of CKD are tested for undetected kidney disease[8,9]. Reliable methods for identification of high risk patients are likely to be needed to identify and target early assessment and interventions to maximise health gain and improve outcomes.
High quality representative data within electronic patient records from primary care can be used to derive and validate robust risk prediction algorithms which can be then implemented and evaluated in clinical settings. This has already been achieved with the cardiovascular risk algorithm QRISK®2 which is integrated into the EMIS Clinic Computer System used by over half of general practices in the UK. The approach is being extended to other clinical conditions[11,12].
Although the case for developing a risk prediction model for CKD has been articulated, there are currently no widely accepted algorithms available to predict risk of CKD for an individual patient within primary care. In this paper, we describe the derivation and validation of two new risk prediction algorithms to predict 5 year risk of moderate-severe CKD and of End Stage Kidney Failure. Designed to integrate with QRISK®2 and the QDScore® for diabetes, the QKidney® Scores complete the triad of prediction algorithms developed to identify patients at high risk of vascular disease for intervention.
We did a prospective open cohort study in a large population of primary care patients using the QResearch® database (version 22). We included all practices in England and Wales who had been using their EMIS computer system for at least a year. We used established methods for the study design and analysis which we summarise here and which are described and reviewed in detail elsewhere[10-12,14-16].
We randomly allocated two thirds of practices to the derivation cohort and the remaining third to a validation cohort. We identified an open cohort of patients aged 35-74 years without recorded evidence of CKD at study entry, registered with practices between the study start date of 01 Jan 2002 and the study end date of 31 Dec 2008. Patients entered the study on the latest of study start date, date of first registration with the practice or date they became 35 years old. Patients were censored at the earliest date of development of CKD, death, de-registration with the practice, last upload of computerised data or the study end date. Patients could therefore have up to 7 years of follow up data available.
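The entry and censoring rules above can be sketched in a few lines. This is an illustration only, not the extraction code used in the study: the function and argument names are hypothetical, the handling of 29 February birthdays is our assumption, and the upper age limit of 74 years is not applied.

```python
from datetime import date
from typing import Optional

STUDY_START = date(2002, 1, 1)   # study start date given in the text
STUDY_END = date(2008, 12, 31)   # study end date for the QResearch cohorts


def birthday_35(dob: date) -> date:
    """Date on which the patient becomes 35 years old."""
    try:
        return dob.replace(year=dob.year + 35)
    except ValueError:
        # 29 February birthday in a non-leap target year (assumed rule: 1 March)
        return date(dob.year + 35, 3, 1)


def entry_date(registered: date, dob: date) -> date:
    """Latest of study start, registration date and 35th birthday."""
    return max(STUDY_START, registered, birthday_35(dob))


def exit_date(last_upload: date,
              ckd: Optional[date] = None,
              died: Optional[date] = None,
              deregistered: Optional[date] = None) -> date:
    """Earliest of CKD onset, death, de-registration, last upload and study end."""
    candidates = [STUDY_END, last_upload]
    candidates += [d for d in (ckd, died, deregistered) if d is not None]
    return min(candidates)


def follow_up_years(entry: date, exit_: date) -> float:
    """Person-years contributed by one patient (0 if exit precedes entry)."""
    return max(0, (exit_ - entry).days) / 365.25
```

A patient registered in 1999 and born on 15 March 1970 would enter on their 35th birthday in 2005; if CKD were first recorded on 30 June 2007, follow-up would stop on that date.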
We used the same inclusion and exclusion criteria to identify a separate sample of patients drawn from an independent set of practices contributing to the THIN primary care research database (http://www.thin-uk.com), except that the study end date for this sample was 30 June 2008.
We had two main outcomes. One outcome was recorded evidence of moderate-severe CKD defined as the first occurrence of any of the following during follow-up:
a) recorded kidney transplant;
b) recorded kidney dialysis;
c) recorded diagnosis of nephropathy;
d) glomerular filtration rate < 45 mL/min/1.73 m², corresponding to stage 3B CKD;
e) recorded diagnosis of proteinuria.
Our second main outcome was End Stage Kidney Failure which was defined as the first occurrence of any of the following during follow-up:
a) recorded kidney transplant;
b) recorded kidney dialysis;
c) glomerular filtration rate < 15 mL/min/1.73 m², corresponding to stage 5 CKD.
We calculated glomerular filtration rates with the MDRD equation, using laboratory-reported creatinine values.
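For readers unfamiliar with the MDRD equation, a minimal sketch follows. The text does not state which calibration of the equation was used or whether the ethnicity multiplier was applied, so the sketch assumes the four-variable form with the IDMS-traceable constant of 175 (the original publication of the equation used 186) and leaves both as parameters. The function names and the choice of µmol/L as the input unit are our assumptions.

```python
def egfr_mdrd(creatinine_umol_l: float,
              age_years: float,
              female: bool,
              black: bool = False,
              constant: float = 175.0) -> float:
    """Estimated GFR (mL/min/1.73 m^2) from the four-variable MDRD equation.

    constant=175 assumes IDMS-traceable creatinine; use 186 for the
    original calibration. Which one the study used is not stated.
    """
    scr_mg_dl = creatinine_umol_l / 88.4          # convert umol/L to mg/dL
    egfr = constant * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr


def meets_gfr_criterion(egfr: float, outcome: str) -> bool:
    """Apply the GFR thresholds used in the two outcome definitions."""
    thresholds = {"moderate_severe_ckd": 45.0, "end_stage_kidney_failure": 15.0}
    return egfr < thresholds[outcome]
```

Because the equation depends only on creatinine, age, sex and ethnicity, the same laboratory value can fall on different sides of the 45 mL/min/1.73 m² threshold for a man and a woman of the same age.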
We developed models for men and women for each of our two main outcomes. Our initial list of predictor variables included those known to be associated with increased risk of CKD based on the literature[1,2,5,18,19] and from national guidance which are also likely to be recorded in the patient's electronic health record. We also sought the opinions of two senior nephrologists including the National Clinical Director for Kidney Services in England, Dr Donal O'Donoghue.
Variables examined for inclusion in both models were:
• Age at study entry (in single years)
• Body mass index
• Systolic blood pressure (mmHg)
• Smoking status (non-smoker; ex-smoker; light smoker: < 10 cigarettes/day; moderate smoker: 10-19 cigarettes/day; heavy smoker: 20 or more cigarettes/day)
• Ethnic group
• Townsend deprivation score (derived from the patient's postcode) 
• Diagnosis of type 1 diabetes
• Diagnosis of type 2 diabetes
• Diagnosis of cardiovascular disease
• Diagnosis of rheumatoid arthritis
• Treated hypertension
• Diagnosis of congestive cardiac failure
• Diagnosis of peripheral vascular disease
• Diagnosis of systemic lupus erythematosus
• Two or more prescriptions for NSAIDs in the 6 months before study entry
• Evidence of kidney stones based on diagnosis or operative procedure at baseline
• Recorded family history of kidney disease including polycystic kidneys
• Prostatic hypertrophy at baseline (men)
In both models we used only diagnoses recorded prior to the baseline date as predictor variables. For body mass index, smoking status and systolic blood pressure we used the values recorded closest to the study entry date. Ethnic group was categorised as in previous publications.
We developed and validated the risk prediction algorithms using established methods described in detail elsewhere [10-12,14-16]. In summary, we used Cox's proportional hazards models to estimate the coefficients for each risk factor for both outcomes for men and women separately, adjusting for other baseline risk factors. We excluded patients with the outcome at baseline. We used fractional polynomials to model non-linear risk relationships with continuous variables. We compared models using the Akaike information criterion (AIC). We used multiple imputation to replace missing values for body mass index, systolic blood pressure and smoking status and used these values in our main analyses[22-25]. We carried out 5 imputations. We examined interactions between predictor variables and age, and we included significant variables and significant interaction terms in the final models. We took the regression coefficients for each variable from the final models and used these as weights which we combined with the baseline survivor function for moderate-severe CKD evaluated at 5 years to derive risk equations for (a) moderate-severe CKD and (b) End Stage Kidney Failure at 5 years' follow-up.
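The step from regression coefficients to an absolute 5 year risk can be sketched as follows. The general form for a Cox model is risk = 1 − S0(5)^exp(Σ β·x), where S0(5) is the baseline survivor function at 5 years. The sketch is illustrative only: the coefficient and baseline survival values in the usage note are invented and are not the published QKidney® values, and centring each predictor on a reference value is our assumption.

```python
import math
from typing import Dict, Optional


def five_year_risk(baseline_survival_5y: float,
                   coefficients: Dict[str, float],
                   values: Dict[str, float],
                   reference: Optional[Dict[str, float]] = None) -> float:
    """Absolute 5 year risk from a Cox model.

    baseline_survival_5y: S0(5), survival at 5 years for a patient whose
        predictors all equal their reference values.
    coefficients: log hazard ratios (the regression weights).
    values: the patient's predictor values.
    reference: values at which each predictor is centred (default 0).
    """
    reference = reference or {}
    linear_predictor = sum(
        beta * (values[name] - reference.get(name, 0.0))
        for name, beta in coefficients.items()
    )
    return 1.0 - baseline_survival_5y ** math.exp(linear_predictor)
```

For instance, with a hypothetical S0(5) of 0.98 and a single binary predictor with hazard ratio 2 (coefficient ln 2), a patient without the predictor has a risk of 2% and a patient with it has a risk of 1 − 0.98² = 3.96%.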
We applied the algorithms obtained from the derivation cohort to both validation cohorts and calculated measures of discrimination (D statistic, R² statistic for survival data and area under the receiver operating characteristic curve (ROC statistic)) and calibration (comparing observed with predicted risks by tenth of predicted risk). We used the THIN validation sample for our main validation as this is from practices using a different clinical computer system from QResearch practices. We used all the available data on each database to maximise the power and also generalisability of the results. We used Stata (version 11) for all analyses.
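Two of the validation measures named above can be sketched simply. The sketch is a simplification under stated assumptions: it treats the outcome as a binary flag and ignores censoring, whereas in a censored cohort the observed risk in each tenth would normally come from a survival estimate such as Kaplan-Meier. The function names are hypothetical, and the pairwise ROC calculation is only practical for small samples.

```python
from typing import List, Sequence, Tuple


def roc_statistic(risks: Sequence[float], events: Sequence[int]) -> float:
    """Area under the ROC curve: the probability that a randomly chosen
    case has a higher predicted risk than a randomly chosen non-case
    (ties count as one half). Requires at least one case and one non-case."""
    cases = [r for r, e in zip(risks, events) if e]
    non_cases = [r for r, e in zip(risks, events) if not e]
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in non_cases)
    return wins / (len(cases) * len(non_cases))


def calibration_by_tenth(risks: Sequence[float],
                         events: Sequence[int]) -> List[Tuple[int, float, float]]:
    """Mean predicted risk and crude observed proportion in each tenth
    of predicted risk, ordered from lowest to highest risk."""
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    n = len(order)
    rows = []
    for k in range(10):
        idx = order[k * n // 10:(k + 1) * n // 10]
        if not idx:
            continue  # fewer than 10 patients: some tenths are empty
        predicted = sum(risks[i] for i in idx) / len(idx)
        observed = sum(events[i] for i in idx) / len(idx)
        rows.append((k + 1, predicted, observed))
    return rows
```

A well calibrated model gives predicted and observed values that are close in every tenth, which is what the calibration graphs display.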
The project has been independently reviewed in accordance with the QResearch® agreement with Trent Multi-Centre Research Ethics Committee.
Overall, 556 QResearch® practices in England and Wales met our inclusion criteria, of which 368 were randomly assigned to the derivation dataset and 188 to the QResearch® validation dataset.
In the QResearch® derivation cohort there were 1,591,884 patients aged 35 to 74 at baseline of whom 17,135 had recorded evidence of pre-existing CKD and were therefore excluded leaving 1,574,749 patients for analysis. Of those with pre-existing CKD, 1,266 women and 1,524 men had End Stage Kidney Failure. In the QResearch® validation cohort there were 796,598 patients aged 35 to 74 at baseline of whom 8,278 had recorded evidence of CKD at baseline leaving 788,320 for analysis.
There were 364 practices from the THIN database which met our inclusion criteria. The THIN cohort consisted of 1,595,141 patients aged 35 to 74 of whom 13,396 had recorded evidence of CKD at baseline leaving 1,581,745 for analysis.
Baseline characteristics for both the THIN and QResearch® validation cohorts were very similar to the QResearch® derivation cohort in both men and women as shown in Table 1.
Table 1 also shows the proportions of patients with values recorded for smoking status, body mass index, systolic blood pressure and serum creatinine. Complete data for smoking status, body mass index and systolic blood pressure were available for 77.15% of patients in the QResearch derivation cohort and 77.57% of patients in the THIN validation cohort. There were differences in observed characteristics between those with and without missing data. As in previous studies[12,27], this pattern of missing data supports the use of multiple imputation (results available from the authors) under the assumption that data are missing at random.
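Estimates from multiply imputed datasets are conventionally combined using Rubin's rules. The text does not state how the five sets of estimates were pooled, so the sketch below shows the conventional approach as an assumption rather than as a description of the study's analysis; the input numbers in the usage note are invented.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float],
               variances: Sequence[float]) -> Tuple[float, float]:
    """Pool one coefficient across m imputed datasets (m >= 2).

    estimates: the coefficient (e.g. a log hazard ratio) from each dataset.
    variances: its squared standard error from each dataset.
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total
```

With five hypothetical log hazard ratios of 0.50, 0.52, 0.48, 0.51 and 0.49, each with variance 0.0004, the pooled estimate is 0.50 and the total variance is 0.0004 + 1.2 × 0.00025 = 0.0007, larger than the within-imputation variance alone because it reflects uncertainty about the missing values.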
Table 2 shows the age/sex incidence rates of moderate-severe CKD and Table 3 shows the incidence rates for End Stage Kidney Failure in both the QResearch® derivation cohort and the THIN validation cohort.
During the 4,068,643 person years of follow up for women in the derivation cohort without moderate-severe CKD at baseline, there were 23,786 incident cases of moderate-severe CKD giving an overall incidence rate of 58.46 per 10,000 person years. For men, there were 17,333 moderate-severe CKD cases arising from 4,121,926 person years giving a crude incidence rate of 42.05 per 10,000 person years. During the 4,177,287 person years of follow up for women in the derivation cohort without End Stage Kidney Failure at baseline, there were 1,266 incident cases of End Stage Kidney Failure giving an overall incidence rate of 3.03 per 10,000 person years. For men, there were 1,534 cases of End Stage Kidney Failure arising from 4,193,578 person years giving a crude incidence rate of 3.66 per 10,000 person years.
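The crude rates above are the number of incident cases divided by person-years of follow-up, scaled to 10,000 person-years. A short check using the counts quoted in the preceding paragraph (the dictionary keys are our own labels):

```python
def rate_per_10000(cases: int, person_years: int) -> float:
    """Crude incidence rate per 10,000 person-years."""
    return cases / person_years * 10_000


# (cases, person-years) as reported for the derivation cohort
reported = {
    "moderate-severe CKD, women": (23_786, 4_068_643),
    "moderate-severe CKD, men": (17_333, 4_121_926),
    "End Stage Kidney Failure, women": (1_266, 4_177_287),
    "End Stage Kidney Failure, men": (1_534, 4_193_578),
}

for label, (cases, person_years) in reported.items():
    print(f"{label}: {rate_per_10000(cases, person_years):.2f} per 10,000 person-years")
```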
The incidence rates for both outcomes in the THIN validation cohort were very similar to those for both QResearch® cohorts.
Table 4 shows the variables included in the final algorithm for moderate-severe CKD with the hazard ratios, fractional polynomial terms for the continuous variables and the associated interaction terms. The highest risks of moderate-severe CKD occurred with Type 1 diabetes (adjusted hazard ratio 12.30, 95% CI 10.3 to 14.6 for men and 8.21, 95% CI 6.74 to 9.99 for women). The adjusted hazard ratios were lower for Type 2 diabetes (6.07, 95% CI 5.61 to 6.57 for men and 4.50, 95% CI 4.14 to 4.89 for women).
Pakistani patients had the highest risks which were almost twice those in the "White or ethnicity not recorded" group. The adjusted hazard ratios were 2.00 (95% CI 1.70 to 2.35) for Pakistani men and 1.55 (95% CI 1.32 to 1.81) for Pakistani women. Lowest risks were observed among Black Caribbean women (adjusted hazard ratio 0.48, 95% CI 0.41 to 0.57) and Black African women (adjusted hazard ratio 0.56, 95% CI 0.43 to 0.74). Smokers, men and women from deprived areas, those with cardiovascular disease, congestive cardiac failure, peripheral vascular disease, NSAID use, family history of kidney disease, treated hypertension and rheumatoid arthritis also had increased risks compared with patients without those factors. Kidney stones and systemic lupus erythematosus were significant predictors in women but not in men. Prostatic hypertrophy was not an independent risk factor for men so was not included in the final model.
Table 5 shows the hazard ratios for End Stage Kidney Failure. Patients with type 1 and type 2 diabetes, congestive cardiac failure, treated hypertension and those with a family history of kidney disease all had more than twice the risk of End Stage Kidney Failure compared with patients without these factors. Kidney stones and systemic lupus erythematosus were significant predictors in women but not in men.
Other predictors in both men and women included deprivation, smoking, cardiovascular disease, rheumatoid arthritis and peripheral vascular disease. The increased risks associated with these factors were less marked than the factors above. NSAID use was not significant in men or women. Prostatic hypertrophy was not an independent risk factor for men so was not included in the final model.
For the THIN cohort, the model to predict moderate-severe CKD explained 56.38% of the total variation in women and 57.49% for men (Table 6). The D statistic values were high indicating good discrimination with values of 2.33 for women and 2.38 for men. The ROC statistic was also high with values of 0.875 for women and 0.876 for men.
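The explained variation and the D statistic are closely linked. Assuming the R² statistic was derived from D using Royston and Sauerbrei's formula for Cox models, R² = (D²/κ²) / (π²/6 + D²/κ²) with κ² = 8/π, the reported values can be reproduced approximately from the rounded D statistics. The text does not state the exact calculation used, so this is offered as a consistency check rather than as the study's method.

```python
import math


def r2_from_d(d: float) -> float:
    """Explained variation implied by a D statistic (Cox model form)."""
    kappa_sq = 8 / math.pi          # scaling constant in the D statistic
    scaled = d * d / kappa_sq
    return scaled / (math.pi ** 2 / 6 + scaled)


for sex, d in (("women", 2.33), ("men", 2.38)):
    print(f"{sex}: D = {d}, implied R2 = {100 * r2_from_d(d):.1f}%")
```

The implied values agree with the reported 56.38% and 57.49% to within the precision allowed by rounding D to two decimal places.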
The D statistic, R² statistic and ROC statistics had marginally higher values for the moderate-severe CKD model compared with the End Stage Kidney Failure model suggesting marginally better performance for the moderate-severe CKD model in both men and women (Table 6).
Both algorithms were well calibrated in the THIN cohort as shown by the close correspondence between observed and predicted values in the calibration graphs across tenths of predicted risk both for moderate-severe CKD (Figure 1) and End Stage Kidney Failure (Figure 2).
The corresponding results for all validation statistics for both models in men and women in the QResearch validation cohort were very similar as can be seen in Table 6.
The QKidney® Scores can also be used to risk stratify an entire population aged 35-74 years to identify patients at highest risk for more proactive intervention as part of the systematic Vascular Risk Assessment Programme currently underway in the UK. Since there are no established thresholds for risk of CKD comparable with the 20% threshold for cardiovascular disease[28-30], we defined these based on the distribution of predicted risks from the models within the QResearch validation cohort (which was extremely similar to that found in the THIN analysis).
For example, the cut off for the top tenth for risk of moderate-severe CKD gives a 5 year risk threshold of 5.46% in men and 8.01% in women. This top tenth contained 58.01% of all men in the QResearch validation cohort who developed moderate-severe CKD over the 5 year period from baseline and 55.68% of women.
For End Stage Kidney Failure, the cut off for the top tenth gives a 5 year risk threshold of 0.49% in men and 0.36% in women. This top tenth contained 55.93% of all men in the QResearch validation cohort who developed End Stage Kidney Failure over the 5 year period from baseline and 55.44% of women.
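The top-tenth thresholds and the share of cases they capture can be computed as sketched below. This is a simplified illustration with hypothetical names: it treats the outcome as a binary flag, ignores censoring, takes the top tenth as the highest-ranked 10% of patients (ties at the cut-off are broken arbitrarily) and assumes at least one event in the cohort.

```python
from typing import Sequence, Tuple


def top_tenth_summary(risks: Sequence[float],
                      events: Sequence[int]) -> Tuple[float, float]:
    """Risk threshold defining the top tenth of predicted risk, and the
    proportion of all events that occur in patients above it."""
    ranked = sorted(zip(risks, events), key=lambda pair: pair[0], reverse=True)
    size = max(1, len(ranked) // 10)       # number of patients in the top tenth
    top = ranked[:size]
    threshold = top[-1][0]                 # lowest predicted risk in the top tenth
    captured = sum(event for _, event in top)
    return threshold, captured / sum(events)
```

Applied separately to men and women, a calculation of this kind yields a cut-off (such as the 5.46% quoted for moderate-severe CKD in men) together with the proportion of eventual cases found in the top tenth.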
Applying the QResearch age/sex incidence rates of moderate-severe CKD to the estimated population of England and Wales aged 35-74 for 2007, we estimate there will be about 807,400 new cases of moderate-severe CKD in the next 5 years. This is a conservative estimate since the population is likely to age over the next 5 years. Using the moderate-severe CKD algorithm to identify the 10% of patients with the highest risk would be expected to identify approximately 457,700 of the patients who develop moderate or severe CKD over the next 5 years. Assuming an intervention with 10% effectiveness at reducing risk, then approximately 45,770 cases of CKD could be prevented in England and Wales over the next 5 years by targeting the intervention at those at greatest risk.
Applying the QResearch age/sex incidence rates of End Stage Kidney Failure to the estimated population of England and Wales aged 35-74 for 2007, we estimate there will be about 46,500 new cases of End Stage Kidney Failure in the next 5 years. Using the End Stage Kidney Failure algorithm to identify the 10% of patients with the highest risk would identify approximately 25,900 of these patients. Assuming an intervention with 10% effectiveness at reducing risk, then approximately 2,590 cases of End Stage Kidney Failure could be prevented in England and Wales over the next 5 years by targeting the intervention at those at greatest risk.
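The arithmetic behind these population estimates can be laid out explicitly. The sketch uses only the figures quoted in the two preceding paragraphs; the 10% effectiveness is the hypothetical value assumed in the text, and the function names are our own.

```python
def share_in_top_tenth(cases_in_top_tenth: int, all_new_cases: int) -> float:
    """Proportion of all expected new cases found in the highest-risk tenth."""
    return cases_in_top_tenth / all_new_cases


def cases_prevented(cases_in_top_tenth: int, effectiveness: float) -> float:
    """Expected cases avoided if the intervention reduces risk by
    `effectiveness` (a proportion) among the highest-risk tenth."""
    return cases_in_top_tenth * effectiveness


scenarios = {
    # outcome: (expected new cases over 5 years, cases in the top tenth)
    "moderate-severe CKD": (807_400, 457_700),
    "End Stage Kidney Failure": (46_500, 25_900),
}

for outcome, (all_cases, top_cases) in scenarios.items():
    share = share_in_top_tenth(top_cases, all_cases)
    prevented = cases_prevented(top_cases, 0.10)
    print(f"{outcome}: {share:.1%} of cases in top tenth, "
          f"about {prevented:,.0f} preventable at 10% effectiveness")
```

The number prevented scales linearly with the assumed effectiveness, so a different assumption can be explored by changing the second argument.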
We have derived and validated two new algorithms designed to predict an individual's 5 year risk of being diagnosed with (a) moderate-severe CKD or (b) End Stage Kidney Failure. Both algorithms are based on factors which the user is likely to know and which are likely to be recorded within the patient's electronic health record. The algorithms do not require a laboratory measurement. They are therefore suitable for situations where this information is not readily available and can be used as part of a Vascular Risk Assessment to flag up those patients who need referral to the GP for a more detailed assessment. They can also be used to inform patients about their level of absolute risk to help them make an informed choice regarding the need for further assessment or intervention.
At population level, these algorithms can be used to "risk stratify" the entire population to systematically identify those patients who need investigation (e.g. creatinine test) or further assessment or regular monitoring. This could be achieved by automatically applying these algorithms to the computerised medical records of all patients aged 35-74 registered with a practice. This meets a core requirement of the NHS Programme for IT, namely to "calculate the risk of the renal function deteriorating, taking all recorded risk factors into account, and the recalculation of risk on a regular basis to take account of changes as a result of ageing or whenever more patient information becomes available (e.g. test results)" (personal communication).
Once identified, high risk patients can then avoid nephrotoxic drugs (such as NSAIDs), have more energetic treatment to lower blood pressure, reduce blood pressure targets or have more frequent follow up of kidney function to allow earlier referral to secondary care services. Further research is required to identify the effectiveness of these interventions in a high risk population.
To our knowledge, these are the first algorithms to predict both the risk of moderate-severe CKD and the risk of End Stage Kidney Failure in UK primary care. They improve on a recently described algorithm to predict CKD derived using 9,470 participants from the American ARIC/CHS cohorts. Both studies used similar statistical methods for the derivation and validation of the algorithms. In the ARIC/CHS study, the outcome included patients with milder Stage 3a disease, which has less certain prognostic significance. Our algorithms include additional known risk factors such as family history of kidney disease, use of NSAIDs, kidney stones, rheumatoid arthritis, systemic lupus erythematosus, body mass index and smoking status. They also include more detailed variables for ethnic group and interactions with age, and they distinguish between type 1 and type 2 diabetes, which have markedly different risks. Our ROC values were all in excess of 0.82, which is substantially better than the ROC statistic of 0.70 reported in the ARIC/CHS study.
Strengths and weaknesses of our study are likely to be similar to those discussed in detail elsewhere[10-12,14-16]. Weaknesses, as with all observational studies, include the potential for bias. Misclassification bias of outcome or predictor variables could have occurred, which, if non-differential would tend to bias the hazard ratios towards one and reduce discrimination. However, it is probable that patients with established risk factors such as diabetes would be more likely to have blood or urine tests and this could have the effect of inflating hazard ratios associated with these risk factors. Nonetheless, our hazard ratios for the risk factors in the model apart from diabetes, are generally of a similar magnitude to those found in other similar studies which tested for chronic kidney disease in the entire study cohort. In addition, the assessment and recording of these factors in clinical practice is becoming increasingly routine and complete, so limiting the effect of this potential bias.
Whilst the outcomes were not adjudicated by a panel of clinicians, we think it unlikely that more than a small number of patients will have been misclassified as having the outcomes since the definitions are based on objective measurements or major operations or procedures. It is possible, indeed likely, that some patients had undetected or un-recorded kidney disease at baseline or follow-up since there is no systematic widespread testing of blood or urinalysis. This is in fact part of the justification for a systematic population based approach.
We have based the date of our outcome on the date of first recorded evidence of moderate-severe kidney disease and of end stage kidney failure. Given the insidious and gradual nature of decline in kidney function, it is likely that the real onset occurred before the date of the recorded onset. This will have tended to result in a general under-estimation of incidence rates which in turn would lead to an under-estimation in individualised risk estimates. However, we think in some cases the date of first recorded evidence may relate closely to the onset of symptoms leading to a consultation and blood or urine tests. Although some alternative analytical methods are available to allow for interval censored data these tend to make stronger assumptions than Cox regression about the distribution of the hazard rate, and often group all outcome dates into fixed intervals, hence potentially losing precision.
Key strengths include size, representativeness due to inclusion of entire practice populations and quality of the database used to derive the algorithm and its ability to generalise back into the setting where it can be applied. The algorithms are well calibrated to the setting in which they can be used and have good levels of discrimination. Our study has good face validity as the vast majority of risk factors identified in the literature or by consensus were found to be independent predictors and hence included in the QKidney® Scores[1,2,18]. As in other studies, we found an association between increasing levels of deprivation and risk of CKD as well as confirming ethnic differences. The inclusion of ethnic group and deprivation within the risk prediction scores should help avoid widening inequalities which can occur at the start of major new public health initiatives. Lastly, we have validated each algorithm in an external set of practices contributing to the THIN database and demonstrated comparable performance with the validation cohort from the QResearch® database. We found very close similarities for a wide range of population characteristics between the THIN and QResearch® cohorts. This helps validate both databases and is reassuring regarding the likely generalisability of results from research using either database to the rest of the UK.
We have developed and validated two new risk prediction algorithms designed to predict risk of moderate-severe CKD and End Stage Kidney Failure in primary care which have the potential to identify high risk patients who might benefit from more detailed assessment, closer monitoring or interventions to reduce their risk. Further research is required to verify these findings as well as to identify the effectiveness of using this approach with appropriate interventions in a high risk population.
JHC is co-director of QResearch® - a not-for-profit organisation which is a joint partnership between the University of Nottingham and EMIS (leading commercial supplier of IT for 60% of general practices in the UK). EMIS may implement the QKidney® Scores within its clinical system. JHC is also director and co-founder of ClinRisk Ltd which produces open and closed software to ensure the reliable and updatable implementation of clinical risk algorithms within clinical computer systems. JHC is also a general practitioner and professor of clinical epidemiology at the University of Nottingham. CC is associate professor of Medical Statistics at the University of Nottingham and a consultant statistician for ClinRisk Ltd. This work and any views expressed within it are solely those of the co-authors and not of any affiliated bodies or organisations.
JHC initiated the study, undertook the literature review, data extraction, data manipulation and primary data analysis and wrote the first draft of the paper. CC contributed to the design, analysis, interpretation and drafting of the paper. Both authors read and approved the final version of the manuscript.
JHC is professor of Clinical Epidemiology & General Practice at the University of Nottingham (UK). CC is associate professor of Medical Statistics at the University of Nottingham (UK).
The work was undertaken by ClinRisk. There was no external funding.
We acknowledge the contribution of the EMIS practices who contribute freely to the QResearch® database, and we thank David Stables (medical director of EMIS) for support in establishing, developing and maintaining the database. We thank EPIC for supplying data from the THIN database. We acknowledge the topic specific advice of Dr David Wheeler (reader in renal medicine) and Dr Donal O'Donoghue (National Director for Kidney Disease). We also acknowledge the constructive and helpful comments from the BMC peer reviewers.