This review is, to our knowledge, the first to systematically summarize studies that assess the validity of existing data related to kidney disease. Our results show large variability in accuracy of data items, depending on variable of interest, study population, and, importantly, the comparison gold standard. These results have important implications for clinicians, researchers, provider organizations and payers who seek to improve the health of kidney disease patients through analyses of secondary data sources.
Validation of the variable “kidney disease” is essential in CKD research, as many analyses are predicated on the correct diagnosis and coding of disease. In the 21 studies that address the presence of kidney disease, specificity is high regardless of gold standard, indicating relatively few false positives. In a study population with a high prevalence of CKD, therefore, a researcher using one of the study datasets could be confident that few patients described as having kidney disease were misclassified. However, caution and scrutiny are advisable when assembling a cohort of CKD patients, because sensitivity varied widely across studies and a code-based cohort may miss many true cases. Studies using a gold standard of medical record documentation reported higher sensitivities than those using eGFR. No studies incorporated an assessment of kidney function using repeated measurements, proteinuria, or measured GFR, the last arguably the best gold standard.
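The interplay between specificity, sensitivity, and prevalence described above can be made concrete with a short calculation. The following is a minimal sketch in Python, not taken from any of the reviewed studies. The counts and the names `TwoByTwo`, `ppv`, and `npv` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from comparing a coded diagnosis with a reference (gold) standard."""
    tp: int  # coded as kidney disease, reference standard positive
    fp: int  # coded as kidney disease, reference standard negative
    fn: int  # not coded, reference standard positive
    tn: int  # not coded, reference standard negative

    def sensitivity(self) -> float:
        # Share of true cases that the coded data capture.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of true non-cases that the coded data leave uncoded.
        return self.tn / (self.tn + self.fp)


def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value via Bayes' rule."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Hypothetical validation counts (illustrative only, not study data).
    table = TwoByTwo(tp=40, fp=5, fn=60, tn=895)
    sens, spec = table.sensitivity(), table.specificity()
    print(f"sensitivity={sens:.2f} specificity={spec:.3f}")

    # Apply the same test characteristics to populations that differ in prevalence.
    for prevalence in (0.05, 0.20, 0.50):
        print(
            f"prevalence={prevalence:.2f} "
            f"PPV={ppv(sens, spec, prevalence):.3f} "
            f"NPV={npv(sens, spec, prevalence):.3f}"
        )
```

Under these assumed inputs, the positive predictive value stays high when prevalence is high and falls as prevalence drops, whereas low sensitivity leaves many true cases outside a code-defined cohort at any prevalence.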
Gold standard choice affects the case population. Defining CKD by medical record likely underestimates the number of cases due to lack of physician recognition or documentation. Using eGFR as a gold standard may overestimate CKD cases by misclassifying normal elderly or people with AKI; further, not all patients may have measured serum creatinine, and eGFR calculated from non-standardized creatinine measurements may be inaccurate.[47] As an example, despite similar study populations, 67% of Winkelmayer’s[19] population had CKD (defined by eGFR) whereas So et al.[33] found only 17% with either AKI or CKD (defined by medical record). This discrepancy may introduce bias in certain studies. For instance, if white patients with less severe CKD are systematically misclassified as without kidney disease, a comparative effectiveness study of therapeutic interventions may not be generalizable to such patients.
Studies with the same gold standard were also variable. For example, So et al.,[33] Ferris et al.,[15] and Quan et al.[29] all evaluated ICD-coded kidney disease against the medical record, estimating sensitivities of 0.83, 0.50, and 0.42, respectively. Many reasons may underlie this variability. Geographic variation in coding practices is well described.[19] Practices may differ between inpatient and outpatient settings. Coding may change over time to incorporate new reimbursement practices, alterations in the numbers of allowable billing codes, and updates in the disease codes themselves.[17] For example, new CKD codes were recently introduced. Using data which differ by medical center, time, or coding practices may systematically bias results.
Validation of CKD-related variables such as etiology, associated comorbid conditions, and cause of death is inherently difficult. First, defining gold standards for variables can be subjective. Tissue diagnosis is considered the gold standard for most etiologies of CKD. However, some patients present too late in the course of disease for an informative biopsy, and biopsy is not often done for certain types of patients, such as those with longstanding diabetes or hypertension. Second, cause of death or kidney disease, if known, is often multifactorial and may not be reducible to one or two ICD codes. For instance, in a patient with multiple comorbidities, clinical differentiation between diabetes and hypertension as the cause of CKD is difficult. Third, many databases use the same sources of information, such as the Medical Evidence Report (Form 2728). Constructing a gold standard on a common source introduces incorporation bias, and overestimates the accuracy of the underlying data.
Despite these limitations, some interesting patterns emerge in the validation of CKD-related data. Among studies evaluating the cause of kidney failure, less agreement was found for polycystic kidney disease (PKD), perhaps because fewer PKD patients are hospitalized for complications of PKD than, for instance, patients with diabetic nephropathy are for complications of diabetes. Using inpatient claims data for analysis of a CKD cohort, therefore, may not only under-capture CKD patients, but may also skew the cause of disease toward the etiologies more commonly hospitalized for non-renal reasons, such as diabetes and hypertension.
In conclusion, this review shows that existing data sources need careful scrutiny before use in any research effort, particularly surveillance studies, outcomes research, quality improvement research, and comparative effectiveness research. The accuracy of kidney disease-related variables ranges widely, and additional research is required to investigate the sources of this variation. In addition, initiatives to improve the quality of underlying kidney disease data should be undertaken, including standardized reporting for prevalence estimates. In particular, efforts must be made to increase providers’ recognition and documentation of disease, such as through more systematic testing and standardized ICD coding. Finally, an electronic medical record, integrated with laboratory and vital statistics data, would greatly facilitate clinical and epidemiologic research.