Our study evaluated the diagnostic properties of ICD-9 codes for identifying putative bacterial infections in hospitalized rheumatoid arthritis patients. We found that using ICD-9 codes alone to identify bacterial infections in these patients may misclassify 15% to 46% of the infections, depending on the set of codes and the strength of the evidence desired to identify infections. We also developed diagnostic criteria for bacterial infections based on medical records abstraction, and showed that they had very high positive predictive values. A two-stage process, in which potential cases are identified using claims data and their medical records are then classified using the infection criteria, results in a PPV of 96% and eliminates the need for physicians to review each medical record.
Complementary to our work, Schneeweiss and colleagues developed infection criteria and identified suspected infections among general medical patients at a Veterans Administration (VA) hospital using the restricted list of ICD-9 codes that we adopted for this study [11]. In their study, the PPV of the claims data for identifying selected bacterial infections combined was similar (90%) to our finding (85%). However, compared to the comprehensive set of codes, this high PPV and its corresponding specificity come at the expense of lower sensitivity (48%). Factors that might account for the modest differences between our studies include inherent dissimilarity in coding practices between a VA and a university hospital, and differences in the characteristics of general medical patients versus RA patients. In addition, since both studies focused on assessing the validity of the infection codes, neither study separately assessed possible nosocomial infections. Distinguishing nosocomial from community-acquired infections would be important when assessing susceptibility to infections with the use of particular therapeutic interventions for managing RA. Schneeweiss and colleagues selected their study population using only the primary discharge diagnosis codes identified in VA data, excluding nosocomial infections [11]. In contrast, because we had to account for the possibility of financial incentives in coding for higher reimbursement, we included bacterial infection codes at any position of the discharge claims.
Given its higher sensitivity, a comprehensive set of administrative codes is better able to identify as many putative bacterial infections as possible. This is necessary when initially searching large administrative databases to identify any potential infections. A restricted set of codes, however, given its better specificity, will decrease the number of falsely identified cases of infection. The resultant risk estimates vary with differing sensitivity and specificity. It is important to note, though, that even if sensitivity is low, high specificity will result in unbiased relative effect measures [14]. In our past experience, we used a “more sensitive” method of applying the medical and pharmacy administrative claims codes from a health-care insurers’ database as a first step for identifying as many cases of presumed serious bacterial infections as possible [6]. Once the plausible bacterial infection cases were identified, these cases were further evaluated by applying “more specific” medical record criteria for bacterial infections. As the specificity of the infection criteria increased, fewer cases met the criteria for an infection (i.e. sensitivity decreased and the adjusted hazard ratio for bacterial infection associated with TNFα antagonists increased).
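The claim that high specificity protects relative effect measures can be checked with a short numerical sketch. It assumes nondifferential outcome misclassification (the same sensitivity and specificity in exposed and unexposed patients). The true risks of 10% and 5% and the function names are hypothetical, chosen only for illustration; the 48% sensitivity echoes the restricted code list discussed above.

```python
def observed_risk(true_risk, sensitivity, specificity):
    """Expected proportion classified as infected under outcome misclassification."""
    true_positives = sensitivity * true_risk
    false_positives = (1 - specificity) * (1 - true_risk)
    return true_positives + false_positives


def observed_risk_ratio(risk_exposed, risk_unexposed, sensitivity, specificity):
    """Risk ratio computed from misclassified outcomes (nondifferential error assumed)."""
    return (observed_risk(risk_exposed, sensitivity, specificity)
            / observed_risk(risk_unexposed, sensitivity, specificity))


# Hypothetical true risks: 10% in exposed, 5% in unexposed (true risk ratio = 2.0).
rr_perfect_specificity = observed_risk_ratio(0.10, 0.05, sensitivity=0.48, specificity=1.0)
rr_imperfect_specificity = observed_risk_ratio(0.10, 0.05, sensitivity=0.48, specificity=0.95)

print(f"Specificity 100%: observed RR = {rr_perfect_specificity:.2f}")
print(f"Specificity  95%: observed RR = {rr_imperfect_specificity:.2f}")
```

With perfect specificity the observed risk ratio equals the true value of 2.0 despite the low sensitivity; with 95% specificity, false positives dilute it toward the null (about 1.3 in this example).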
To our knowledge, only a few other studies have assessed the diagnostic accuracy of administrative data alone for identifying patients with suspected bacterial infections. One study compared 5 different claims-based algorithms to a standard of clinically diagnosed pneumonia. The prevalence of pneumonia in that study was 2.5%, and the PPVs of the claims algorithms ranged from 73% to 81%, compared to 89% in ours [16]. In another study using data from the General Practice Research Database, 62% of pneumococcal pneumonia administrative codes were confirmed by medical record review [17]. In both of these studies, the patients came from healthier general medical populations, in contrast to our RA patients, who are commonly on immunosuppressant medications. Despite these differences in patient populations, the results of our study are similar, suggesting that ICD-9 codes are capable of identifying pneumonia accurately.
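Because PPV depends on the prevalence of infection as well as on sensitivity and specificity, comparisons of PPVs across study populations need some care. The sketch below applies Bayes' theorem. The 2.5% prevalence is the figure quoted above, whereas the sensitivity (80%), specificity (99%), and the 10% comparison prevalence are hypothetical values chosen for illustration.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV from test characteristics and prevalence (Bayes' theorem)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Same hypothetical code set applied to populations with different prevalence.
ppv_low_prevalence = positive_predictive_value(0.80, 0.99, prevalence=0.025)
ppv_high_prevalence = positive_predictive_value(0.80, 0.99, prevalence=0.10)

print(f"Prevalence  2.5%: PPV = {ppv_low_prevalence:.2f}")
print(f"Prevalence 10.0%: PPV = {ppv_high_prevalence:.2f}")
```

Under these assumptions the same codes yield a PPV of about 0.67 at 2.5% prevalence and about 0.90 at 10% prevalence, so a difference in PPV between studies does not by itself imply a difference in coding accuracy.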
Our newly developed medical records-based infection criteria will be a particularly valuable resource when uniform classification criteria are needed. Examples of settings in which this may be particularly useful include multi-site trials where infectious adverse events are classified subjectively by individual site investigators or open label studies where investigators are not blinded to drug exposure status. They also may be useful in retrospective epidemiologic studies where abstractors can collect data and classify infections according to these validated medical records-based infection criteria without requiring case-by-case determinations from a physician.
Some bacterial infections have objective, pathognomonic laboratory findings and are therefore relatively simple to classify. For example, the diagnosis of bacteremia required the presence of a positive blood culture, and thus the specificity of its infection criteria was 100%. Classification of some other infections, however, was more challenging. For example, cellulitis, a predominantly clinical diagnosis, was defined mainly on the basis of history and physical examination findings. These were sometimes inadequately documented in the medical records, and supporting microbiologic evidence for cellulitis is typically scant and not useful. Thus, the specificity of the cellulitis criteria was only 70%. In comparison, the diagnosis of lower respiratory tract infections (mainly pneumonia) relied on both objective (laboratory and culture) and more subjective (i.e. clinical) evidence, and the infection criteria demonstrated an intermediate specificity of 82%.
As is true for any diagnostic test, higher specificity comes at the expense of lower sensitivity. This was made especially clear by the criteria for urinary tract infection (UTI), which had 100% specificity but a sensitivity of only 5%. The likely explanation for this very low sensitivity was the requirement for at least 10,000 colonies of an organism pathogenic to the genitourinary tract or, for lesser colony counts, antibiotic administration for at least 7 days. Many of the suspected UTIs we found were treated on the basis of an abnormal urinalysis, and no positive culture results were ever documented (even after hospital discharge).
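The accuracy measures discussed here all derive from a 2×2 table that compares a classification against the reference standard. A minimal sketch follows; the counts are hypothetical, chosen only so that they reproduce a sensitivity of 5% and a specificity of 100%, and are not the study's data.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from 2x2 table counts."""
    def ratio(numerator, denominator):
        # Return None when a measure is undefined (empty denominator).
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # true cases correctly classified
        "specificity": ratio(tn, tn + fp),  # non-cases correctly classified
        "ppv": ratio(tp, tp + fp),          # positives that are true cases
        "npv": ratio(tn, tn + fn),          # negatives that are true non-cases
    }


# Hypothetical counts: 20 true infections, of which only 1 meets strict criteria.
strict_criteria = diagnostic_accuracy(tp=1, fp=0, fn=19, tn=40)

for name, value in strict_criteria.items():
    print(f"{name}: {value:.2f}")
```

In this example the strict criteria never label a non-case as infected (specificity and PPV of 1.0) but miss 19 of 20 true infections.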
Varying approaches have been used to diagnose bacterial infections in observational studies. Infections have been defined using a combination of patient self-report, hospital records, physician reports, mortality records, and antibiotic prescriptions, or by the opinion of treating physicians [5]. A few of these studies developed operational criteria to identify their outcomes [2]. For example, Doran and colleagues required the presence of specific symptoms and laboratory values to define any infectious adverse events in RA patients in a retrospective longitudinal cohort study [2]. Leveille and colleagues [19] compared antibiotic prescription fills in automated pharmacy records with medical record review as the standard for infections. Although the face validity of these approaches is reasonable, they were not rigorously evaluated and validated, as ours were in this study, nor have they been widely adopted by subsequent investigators.
The major strengths of our study are its sample size and the full access we had to all the information in the medical records, including culture results that became positive even after patient discharge. All medical records were independently abstracted by two healthcare providers who had complete access to all the desired medical records and showed high agreement on the classification of infections. Additionally, we studied RA patients, who may experience unique patterns of infections compared to general medical patients.
Despite these strengths, our results must be interpreted in the context of our study design. Since this project was undertaken in one university health system, the results may not be generalizable to other health care settings where medical record documentation or ICD-9 coding practices differ. We were unable to assess the medical records of all patients without an ICD-9 code for infection; hence, verification bias might exist [21], which might increase the apparent sensitivity and decrease the apparent specificity of the ICD-9 codes. However, given the expectation that claims data may have high specificity [23], and the observation of only one infection (2%) among the reviewed medical records without an ICD-9 code for infection, this bias is likely limited. Additionally, the medical record infection criteria were intentionally designed to favor specificity at the expense of sensitivity; thus, they under-ascertained infections that did not have objective diagnostic details in the medical records. For some particular sites of bacterial infection (e.g. meningitis), the low prevalence of these infections limited our ability to draw conclusions about the diagnostic accuracy for these groups of infections. Finally, our analysis of administrative data examined only single ICD-9 codes; more complex claims-based algorithms may be able to achieve higher specificities and PPVs.
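The direction of this verification bias, and one simple way to adjust for it, can be illustrated numerically. The sketch below assumes that the reviewed code-negative records are a random sample of all code-negative records and reweights them by the inverse of the sampled fraction. All counts and the 10% sampling fraction are hypothetical (chosen so that 1 of 50 reviewed code-negative records, 2%, contains an infection); they are not data from this study.

```python
def accuracy_with_partial_verification(tp, fp, fn_reviewed, tn_reviewed, reviewed_fraction):
    """Return (sensitivity, specificity) pairs: naive and reweighted.

    tp, fp: code-positive records, all reviewed.
    fn_reviewed, tn_reviewed: counts among the reviewed code-negative sample.
    reviewed_fraction: assumed share of code-negative records that were reviewed.
    """
    # Naive estimates treat the reviewed sample as if it were the full table.
    naive = (tp / (tp + fn_reviewed), tn_reviewed / (tn_reviewed + fp))

    # Scale the code-negative sample up to the full code-negative group.
    weight = 1 / reviewed_fraction
    fn_estimated = fn_reviewed * weight
    tn_estimated = tn_reviewed * weight
    corrected = (tp / (tp + fn_estimated), tn_estimated / (tn_estimated + fp))
    return naive, corrected


naive, corrected = accuracy_with_partial_verification(
    tp=85, fp=15, fn_reviewed=1, tn_reviewed=49, reviewed_fraction=0.10
)
print(f"Naive:     sensitivity = {naive[0]:.2f}, specificity = {naive[1]:.2f}")
print(f"Corrected: sensitivity = {corrected[0]:.2f}, specificity = {corrected[1]:.2f}")
```

In this hypothetical example the naive estimates (sensitivity about 0.99, specificity about 0.77) overstate sensitivity and understate specificity relative to the reweighted estimates (about 0.89 and 0.97), which matches the direction of bias described above.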
In summary, the use of ICD-9 diagnosis codes alone may misclassify bacterial infections in hospitalized RA populations, although the level of misclassification varies depending on the codes used. When infections are identified by administrative claims codes alone, a more restricted set of codes with higher specificity will improve accuracy. Our novel, validated, medical records-based infection criteria for a broad range of serious bacterial infections may limit the need for physicians’ manual review of medical records. These criteria may be useful in both clinical trials and observational study populations, either to define the primary outcome or as part of a sensitivity analysis.