For this validation study, we used data from our survey of all veterans in the Upper Midwest Network with a VA health care encounter (n=70,334); details are described elsewhere (12). Of these, 1,241 had undergone prior knee or hip replacement, as documented by the presence of an International Classification of Diseases, 9th revision (ICD-9) or Current Procedural Terminology (CPT) code for knee or hip replacement (00.70–00.76, 00.80–00.84, 81.51–81.55; 27437, 27438, 27440–27443, 27445–27447, 27486, 27487, 27125, 27130, 27132, 27134, 27137, 27138, and 27236). We obtained 4 random samples of 50 patients each: patients with neither a hip nor a knee replacement code, a knee replacement code only, a hip replacement code only, and both knee and hip replacement codes. This combined list of 200 patients, with names in alphabetic order, was provided to a physician (S.A.) trained in chart abstraction, who was blinded to the ICD and CPT codes as well as to how the sample was obtained. The Institutional Review Board at the Minneapolis VA Medical Center approved the study.
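The four-stratum draw described above can be sketched in code; the stratum sizes and patient identifiers below are hypothetical placeholders, not the study's actual lists:

```python
import random

# Sketch of the sampling scheme: 50 patients drawn at random from each of the
# four code-based strata, then combined and alphabetized before chart
# abstraction. All names and stratum sizes here are made up for illustration.
random.seed(42)  # reproducibility of the illustration only

strata = {
    "no_code":   [f"pt_a{i:04d}" for i in range(400)],
    "knee_only": [f"pt_b{i:04d}" for i in range(300)],
    "hip_only":  [f"pt_c{i:04d}" for i in range(300)],
    "both":      [f"pt_d{i:04d}" for i in range(241)],
}
samples = {s: random.sample(ids, 50) for s, ids in strata.items()}
# Combined list of 200 names, alphabetized so stratum membership is not evident
combined = sorted(name for picked in samples.values() for name in picked)
```

Alphabetizing the combined list is what keeps the abstractor blinded to which stratum a given chart came from.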
We used a standardized data extraction form to abstract demographics (age and gender), the date of knee or hip replacement, laterality (right or left), type of replacement (total or partial, primary or revision), the underlying diagnosis, and procedure details from the operative and other clinical notes. Chart documentation of knee or hip replacement surgery in the patient's medical record was the gold standard for the patient having undergone a knee or a hip replacement. All medical records (in- and outpatient visits) were reviewed starting from the first available encounter. We retrieved complete VA medical records, including paper and computerized records, for 140 patients (70%) and incomplete medical records for 26 patients (13%). No medical records were available for 34 patients (17%). Thus, the study sample consisted of 166 patients (83%). Main analyses were performed for the 140 patients with complete charts, with sensitivity analyses including all 166 patients. This chart retrieval rate is similar to that of other studies of the validity of diagnoses in the VA health care system (7).
We compared the administrative data definition of the presence of an ICD-9 or CPT code to the gold standard of chart documentation of knee or hip replacement for each patient. We calculated sensitivity, specificity, positive and negative predictive values, and the kappa statistic for administrative data. Sensitivity was the fraction of those with knee or hip replacement according to the gold standard who were correctly identified as positive by the data definition. Specificity was the fraction of those without joint replacement (knee or hip) according to the gold standard who were correctly identified as negative by the data definition. Positive predictive value was the proportion of those with a positive test definition who met the gold standard definition of medical chart documentation. Negative predictive value was the proportion of those with a negative test definition who did not meet the gold standard definition. The kappa coefficient was used to describe agreement (beyond chance) between the rheumatologist's diagnosis in the chart (gold standard) and the 4 database definitions (13).
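The four validity measures and kappa all follow directly from a 2×2 table of code status versus chart status. A minimal sketch, using made-up cell counts rather than the study's actual data:

```python
# Confusion-table metrics for a code-based case definition versus chart
# documentation (gold standard). Counts are illustrative, NOT study results.

def validity_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, NPV, and Cohen's kappa."""
    n = tp + fp + fn + tn
    sens = tp / (tp + fn)   # code-positive among gold-standard cases
    spec = tn / (tn + fp)   # code-negative among gold-standard non-cases
    ppv = tp / (tp + fp)    # chart-confirmed among code-positive
    npv = tn / (tn + fn)    # chart-negative among code-negative
    po = (tp + tn) / n      # observed agreement
    # expected agreement by chance, from the marginal totals
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (po - pe) / (1 - pe)
    return sens, spec, ppv, npv, kappa

# hypothetical table: 100 chart-confirmed cases, 100 non-cases
sens, spec, ppv, npv, kappa = validity_metrics(tp=90, fp=5, fn=10, tn=95)
```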
A receiver operating characteristic (ROC) curve analysis plotted the true positive rate against the false positive rate for the different possible cut-points of a case definition (14). A ROC curve shows how any increase in sensitivity is accompanied by a decrease in specificity. The area under the ROC curve measured the discrimination of the database test definition, i.e., its ability to correctly classify those with and without the respective joint replacement status. The 45-degree diagonal line represents the null hypothesis; a test definition that is no better than random will overlap the diagonal.
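Given the (false positive rate, true positive rate) pairs at the possible cut-points, the area under the ROC curve can be computed with the trapezoidal rule. A sketch with a single hypothetical cut-point (not a result from the study):

```python
def auc_trapezoid(points):
    """Area under a ROC curve from (FPR, TPR) points, via the trapezoidal
    rule; the endpoints (0,0) and (1,1) are always included."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# one hypothetical cut-point: sensitivity 0.90, specificity 0.95 (FPR 0.05)
auc = auc_trapezoid([(0.05, 0.90)])
```

With no informative cut-points the curve collapses onto the diagonal and `auc_trapezoid([])` returns 0.5, which is the "no better than random" reference described above.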
Because one may arrive at different conclusions depending on the relative importance given to sensitivity or specificity with the classic methods described above, we also performed a Bayesian analysis. Specificity and sensitivity can be regarded as utility measures of a test procedure under 2 unknown states of nature, i.e., having or not having the disease. A weighted average of these 2 quantities is the Bayes utility of a test. Bayes values for each diagnosis definition were calculated by giving a weight of importance P (ranging from 0 to 1) to sensitivity and 1−P to specificity, where 0 indicates the least importance and 1 the maximum importance. For example, if sensitivity is most critical, we choose the method with the highest sensitivity, i.e., a P of 1. However, in various situations sensitivity and specificity carry different weights of importance. Linear combinations of sensitivity and specificity for different values of P were graphed. The analyses were performed using SPSS software, version 11.5 (SPSS, Chicago, IL) and S-Plus 2000 (MathSoft, Seattle, WA).
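The Bayes utility described above is simply the linear combination P × sensitivity + (1 − P) × specificity. A sketch comparing two hypothetical definitions (their sensitivity/specificity pairs are invented, not the study's results) across the importance range:

```python
def bayes_utility(sens, spec, p):
    """Weighted utility: importance p given to sensitivity, (1-p) to
    specificity, with p ranging from 0 to 1."""
    return p * sens + (1 - p) * spec

# two hypothetical definitions: A is more specific, B is more sensitive
defs = {"A": (0.90, 0.95), "B": (0.98, 0.80)}  # (sensitivity, specificity)
grid = [i / 10 for i in range(11)]             # p = 0.0, 0.1, ..., 1.0
best = {p: max(defs, key=lambda d: bayes_utility(*defs[d], p)) for p in grid}
```

Graphing the two lines over P would show them crossing where the preferred definition switches: when specificity dominates (P near 0) definition A wins, and when sensitivity dominates (P near 1) definition B wins.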
To examine whether the dates of knee or hip replacement surgery in VA administrative databases are accurate, we identified the cohort of patients who underwent knee or hip replacement during fiscal years 1992–1998, since surgery dates were available for this time period in the administrative databases. Of the original cohort of 140 patients with complete charts, 94 qualified (41 had no knee/hip replacement procedure and 5 had procedures outside the 1992–1998 study period). The date difference was calculated in days between the date from the administrative database and the date in the gold standard chart documentation (most often from the operative or anesthesia note).
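The day-level date comparison amounts to a signed difference between two calendar dates; a sketch with hypothetical dates (not drawn from the study):

```python
from datetime import date

def date_diff_days(admin_date, chart_date):
    """Signed difference in days: administrative database date minus the
    gold standard chart documentation date."""
    return (admin_date - chart_date).days

# hypothetical example: administrative date 4 days after the chart date
diff = date_diff_days(date(1995, 3, 14), date(1995, 3, 10))
```

Keeping the sign distinguishes administrative dates recorded after the documented surgery from those recorded before it.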