To evaluate the diagnostic properties of International Classification of Diseases, Version 9 (ICD-9) diagnosis codes and of infection criteria for identifying bacterial infections among rheumatoid arthritis (RA) patients.
We performed a cross-sectional study of RA patients with and without ICD-9 codes for bacterial infections. Sixteen bacterial infection criteria were developed. The diagnostic properties of comprehensive and restricted sets of ICD-9 codes and of the infection criteria were tested against an adjudicated review of medical records.
Medical records of 162 RA patients with and 50 without purported bacterial infections were reviewed. The positive (PPVs) and negative predictive values (NPVs) of the ICD-9 codes ranged from 54% to 85% and from 84% to 100%, respectively. The PPVs of the medical records-based criteria were 84% for “definite” and 89% for “definite or empirically treated” infections. When ICD-9 codes were first used to raise the prevalence of infection in the screened population, the PPV of the infection criteria increased by 50%.
ICD-9 codes alone may misclassify bacterial infections in hospitalized RA patients. The degree of misclassification varies with the specificity of the codes used and the strength of evidence required to confirm infection. Combining ICD-9 codes with the infection criteria identified infections with the greatest accuracy. These novel infection criteria may reduce the need for physician review of medical records.
Bacterial infections in rheumatoid arthritis (RA) are common [1, 2] and are of growing interest because of the increasing number of serious infections reported in patients receiving biologic therapies [3–6]. A comprehensive understanding of the associations between RA, specific therapeutic agents, and infection has been limited by the absence of objective criteria to correctly identify infections in studies of large populations. Misclassifying infections may mask the risks related to particular arthritis medications or could introduce bias if outcome assessment is subjective and reviewers are not blinded to medication exposure. Validated diagnostic criteria covering a broad range of infections are lacking in the medical literature; such criteria exist mainly for isolated infections, such as the Duke criteria for endocarditis. Additionally, although the accuracy of administrative claims data has been studied for various conditions [8–10], its ability to identify bacterial infections accurately in a hospitalized RA population is largely unknown.
To address these methodological gaps, we sought to evaluate, in a population of hospitalized RA patients, the accuracy of the International Classification of Diseases, Version 9, Clinical Modification (ICD-9) codes commonly used to identify infection outcomes in epidemiologic research. Additionally, we constructed medical records-based criteria for bacterial infections that could be applied to abstracted medical information to confirm the presence of infection. Both the claims-based algorithms and the medical records-based infection criteria were validated against a standard of physician panel review of the medical records of hospitalized RA patients.
After local institutional review board approval, we used administrative claims data from the University of Alabama at Birmingham (UAB) health system to identify adults (age ≥ 18 years at the time of hospitalization) with one or more diagnosis codes for RA (ICD-9 714.X in any position on the hospital discharge claim) who were hospitalized at our institution between January 2002 and December 2003. Based on expert consensus, we compiled two sets of ICD-9 codes for infections (Appendix 1): 1) a “comprehensive” set that included a wide range of codes, with the goal of maximizing sensitivity; and 2) a “restricted” set of presumably more specific ICD-9 codes, as used by Schneeweiss and colleagues to validate infections in a Veterans Affairs (VA) hospital. Both lists were grouped by anatomical site. Because hospitals may code as the principal diagnosis the condition that yields the highest reimbursement rather than the etiologic event that prompted the hospitalization, we allowed these infection codes to appear in any position on the billing claim. We abstracted the medical records of each patient’s first hospitalization during the study period that had an ICD-9 code for any bacterial infection (Figure 1). To ascertain the sensitivity of the administrative data for identifying bacterial infections, we randomly selected the medical records of 50 RA patients without a discharge diagnosis of infection in any position on the claims and reviewed them in the same way. The primary discharge diagnoses on the claims for these patients ranged from codes for arthritis-related hospitalizations to codes for other systemic conditions such as multiple sclerosis, myocardial infarction, and surgical treatments. None of these primary discharge diagnosis codes suggested admission for treatment of a putative infection.
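The claims-based case finding described above can be sketched as follows. This is a minimal illustration, not the authors’ code: the `Claim` record, its fields, and the helper names are hypothetical; codes are assumed to be stored as dotted ICD-9 strings; and only a few of the codes listed in Appendix 1 are included, with a trailing “X” treated as “any further digits” (a prefix match).

```python
from dataclasses import dataclass, field

# Hypothetical subset of Appendix 1 codes, grouped by anatomical site.
# Each entry is a tuple of prefixes; "590.X" and "590.XX" both become "590".
INFECTION_PREFIXES: dict[str, tuple[str, ...]] = {
    "urinary tract": ("590",),
    "device associated": ("996.6",),
    "skin and subcutaneous tissue": ("686.1",),
    "post-traumatic wound": ("958.3",),
    "postoperative wound": ("998.5",),
}
RA_PREFIX = "714"  # ICD-9 714.X


@dataclass
class Claim:
    """Hypothetical discharge claim holding diagnosis codes in all positions."""
    patient_id: str
    age: int
    dx_codes: list[str] = field(default_factory=list)


def is_eligible(claim: Claim) -> bool:
    """Adult (age >= 18) with an RA code in any diagnosis position."""
    return claim.age >= 18 and any(c.startswith(RA_PREFIX) for c in claim.dx_codes)


def infection_sites(claim: Claim) -> set[str]:
    """Sites with a matching infection code in any position, not only the principal one."""
    return {
        site
        for site, prefixes in INFECTION_PREFIXES.items()
        for code in claim.dx_codes
        if code.startswith(prefixes)
    }


claim = Claim("p1", 67, ["996.66", "714.0", "401.9"])
print(is_eligible(claim), infection_sites(claim))  # True {'device associated'}
```

Because no position is privileged, a claim whose principal code is, for example, a cardiac diagnosis but which lists 998.59 further down is still flagged as a possible postoperative wound infection.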
We developed medical records-based infection criteria that encompassed 16 groups of bacterial infections commonly treated in clinical practice. To do so, we conducted a comprehensive literature review and integrated the clinical, laboratory, microbiological, and radiological criteria used to diagnose infections (Appendix 2). These criteria were further refined in collaboration with infectious disease specialists (N.A., M.S., A.R.), a gastroenterologist (C.E.), and a radiologist (R.L.). We excluded microbes in bacterial cultures that were likely “contaminants”, and whenever possible we required objective evidence (e.g., culture data) to fulfill the infection criteria in order to maximize specificity. A medical record abstraction form was developed to collect relevant details of all bacterial infections covered by these criteria. Among numerous data elements, it captured the results of up to six bacterial cultures and the name and duration of use of all antibiotic treatments. Although our abstraction form was very similar to that used by Schneeweiss and colleagues, it additionally included intra-abdominal bacterial infections requiring hospitalization that occurred in the presence of cholecystitis, diverticulitis, or gastroenteritis. However, unlike Schneeweiss and colleagues, we did not capture non-bacterial atypical or opportunistic infections. The form was extensively pilot tested using medical records of RA patients hospitalized in our university facility.
All medical records with any ICD-9 code for bacterial infection were reviewed by a team of three trained reviewers: two physicians (N.M.P., G.G.T.) and one physician assistant (K.C.). Each medical record was independently abstracted by two of the three reviewers. Reviewers assessed the records for all categories of bacterial infection, not only those pre-identified by the initial ICD-9 code(s). Based on their clinical judgment, the reviewers assigned each type of infection to one of three categories: “definite”, “empirically treated”, or “no” bacterial infection. Anticipating some clinical uncertainty and occasionally sparse documentation, reviewers used “empirically treated” when, after reading the medical record, clear evidence of infection was lacking but the treating health care providers appeared to be managing a putative infection. Discordant assessments were adjudicated by a consensus of two physicians (K.G.S. or J.R.C.), who could assign case status only from among the categories on which the reviewers disagreed.
The reviewers’ assessment of the certainty of bacterial infection (“definite”, “empirically treated”, or “no”) served as the standard against which the ICD-9 codes and the medical records-based infection criteria were compared to determine sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Inter-rater reliability of this standard was assessed using Cohen’s kappa as the measure of agreement between the reviewers’ independent assessments of infection certainty. Kappa values below 0.40 represent poor agreement beyond chance, values from 0.40 to 0.75 fair to good agreement beyond chance, and values above 0.75 excellent agreement beyond chance. Because high specificity in identifying infections is sometimes desired, one analysis counted only “definite” infections under our reference standard as the outcome. Because this high specificity reduces sensitivity, a separate analysis grouped “definite” and “empirically treated” infections together. Against each of these two standards, we calculated the sensitivity, specificity, PPV, and NPV of the infection criteria and of the comprehensive and restricted sets of ICD-9 codes, both for at least one infection of any type during the hospitalization and for organ-specific infections (e.g., lung infections such as pneumonia). By Bayes’ theorem, predictive values vary with disease prevalence; for a known prevalence in a study population, they can be calculated from the sensitivity and specificity of the test. To estimate the prevalence of infection in the full cohort of 557 patients, we extrapolated the findings from the 50 reviewed patients without a discharge diagnosis of infection to the remaining hospitalized RA patients without ICD-9 evidence of bacterial infection.
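The four accuracy measures named above all come from a 2 × 2 table that cross-classifies the index test (ICD-9 codes or infection criteria) against the reviewers’ standard. A minimal Python sketch follows; the class name and the counts in the example are hypothetical and are not taken from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index test (rows) cross-classified against the reference standard (columns)."""
    tp: int  # test positive, infection per reviewers
    fp: int  # test positive, no infection per reviewers
    fn: int  # test negative, infection per reviewers
    tn: int  # test negative, no infection per reviewers

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


# Hypothetical counts for illustration only.
table = TwoByTwo(tp=80, fp=20, fn=10, tn=90)
print(f"sens={table.sensitivity:.3f} spec={table.specificity:.3f} "
      f"PPV={table.ppv:.3f} NPV={table.npv:.3f}")
# sens=0.889 spec=0.818 PPV=0.800 NPV=0.900
```

The “definite” and “definite or empirically treated” analyses each yield their own table, because empirically treated cases count as reference-positive only in the second.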
We used the sensitivity and specificity of the ICD-9 codes to calculate their predictive values over a range of assumed infection prevalences.
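The Bayes’ theorem relationship can be written directly in terms of sensitivity (Se), specificity (Sp), and prevalence (p): PPV = Se·p / [Se·p + (1 − Sp)(1 − p)] and NPV = Sp(1 − p) / [Sp(1 − p) + (1 − Se)p]. The sketch below applies these formulas over a range of assumed prevalences, in the spirit of Figures 2 and 3; the sensitivity and specificity of 0.90 used in the example are hypothetical.

```python
def ppv(sens: float, spec: float, prev: float) -> float:
    """Positive predictive value from Bayes' theorem."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)


def npv(sens: float, spec: float, prev: float) -> float:
    """Negative predictive value from Bayes' theorem."""
    true_neg = spec * (1 - prev)
    false_neg = (1 - sens) * prev
    return true_neg / (true_neg + false_neg)


# Hypothetical test characteristics, swept over assumed prevalences.
SENS, SPEC = 0.90, 0.90
for prev in (0.05, 0.10, 0.25, 0.50):
    print(f"prevalence {prev:.2f}: PPV {ppv(SENS, SPEC, prev):.2f}, "
          f"NPV {npv(SENS, SPEC, prev):.2f}")
```

Even with a test that is 90% sensitive and 90% specific, the PPV falls to about one third at a prevalence of 5% and rises to 90% at 50%, which is why enriching the population before applying a test raises its PPV.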
Of the 557 RA patients hospitalized during the study period, we abstracted all relevant records for the 162 patients hospitalized for the first time with a purported bacterial infection (based on ICD-9 codes) and for 50 admitted RA patients without claims for infection (Figure 1). The mean ± standard deviation (SD) age of the patients was 63 ± 14 years, and 73% were women. In the claims data, the median number of bacterial infections per patient per hospitalization was 1.0 (range 1 to 5, among 13 possible discharge codes). Review of the 50 medical records without claims for infection identified a bacterial infection in only one (2%).
Before adjudication, the inter-rater reliability (kappa) of the reviewers’ assessment for the presence of any bacterial infection was 0.85 (95% CI 0.74–0.97). Of persons hospitalized with a suspected infection, 41% (n = 87) had a “definite” infection using both the restricted and comprehensive codes. In contrast, 64% (n = 135) and 62% (n = 132) had a “definite or empirically treated” infection with the comprehensive and restricted ICD-9 codes, respectively. Among all 557 RA patients hospitalized for any condition, the prevalence of “definite” and “definite or empirically treated” infection was 16% and 25%, respectively.
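Cohen’s kappa compares the observed agreement between the two independent reviewers (p_o) with the agreement expected by chance from each reviewer’s marginal frequencies (p_e): κ = (p_o − p_e) / (1 − p_e). A minimal sketch with the interpretation bands given in the methods follows; the function names and the ratings are hypothetical and do not reproduce the study’s kappa of 0.85.

```python
from collections import Counter
from collections.abc import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two raters classifying the same records."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


def interpret(kappa: float) -> str:
    """Bands used in this study: <0.40 poor, 0.40-0.75 fair to good, >0.75 excellent."""
    if kappa < 0.40:
        return "poor"
    if kappa <= 0.75:
        return "fair to good"
    return "excellent"


# Hypothetical ratings of 100 records ("yes" = any bacterial infection).
reviewer_1 = ["yes"] * 40 + ["no"] * 60
reviewer_2 = ["yes"] * 35 + ["no"] * 5 + ["yes"] * 5 + ["no"] * 55
k = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {k:.2f} ({interpret(k)})")  # kappa = 0.79 (excellent)
```

The function accepts any number of categories, so the same code applies to the three-level “definite” / “empirically treated” / “no” classification.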
The sensitivity of the comprehensive ICD-9 codes was higher than that of the restricted set for both “definite” infections (100% vs. 59%) and “definite or empirically treated” infections (99% vs. 48%). As expected, the specificities of the comprehensive set of codes were lower than those of the restricted set (Table 1).
The PPVs and NPVs of the two sets of ICD-9 codes ranged from 54% to 85% and from 84% to 100%, respectively. Figure 2 shows the effect of varying infection prevalence on the PPVs of the comprehensive and restricted sets of ICD-9 codes. Across prevalence rates, the PPVs of the restricted set for identifying either “definite” or “definite or empirically treated” infections were very similar to the PPV of the comprehensive set for identifying “definite or empirically treated” infections, and all exceeded 80% at a prevalence of 25% (the prevalence of “definite or empirically treated” infections). Thus, the restricted codes identified infections more accurately than the comprehensive codes. The PPV of the comprehensive codes for identifying only “definite” infections did not exceed 80% unless the expected prevalence was at least 40%.
The medical records-based infection criteria were intentionally constructed to favor specificity over sensitivity. Their specificity for “definite” and “definite or empirically treated” infections was very high (94% and 91%, respectively), while their sensitivity was correspondingly low (48% and 41%, respectively) (Table 1). The PPV of the infection criteria was 84% for “definite” infections and 89% for “definite or empirically treated” infections. Figure 3 shows the PPVs of the infection criteria across a range of infection prevalence rates. It also shows that detection of bacterial infections can be improved by first screening with ICD-9 codes and then applying the medical records-based infection criteria. For example, at a 16% prevalence of infection, the PPV of the infection criteria for identifying “definite” bacterial infections was just 60%. Screening first with ICD-9 codes raised the prevalence of “definite” bacterial infections to 54%, and the PPV of the infection criteria then increased to 90%. Similarly, for identifying “definite or empirically treated” infections, the PPV of the criteria was 60% at a 25% prevalence; using the ICD-9 codes to prescreen potential cases raised the prevalence to 83%, and the PPV then reached 96%.
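As a consistency check, the two-stage figures above can be approximately recovered from Bayes’ theorem using only the sensitivities and specificities of the infection criteria reported in this paragraph. The sketch treats the ICD-9 prescreen simply as a step that raises the prevalence among the records reviewed (to the 54% and 83% stated above) and assumes that the criteria’s sensitivity and specificity are unchanged in the enriched population, the same assumption behind any fixed-sensitivity, fixed-specificity prevalence curve. It is an illustration, not the authors’ analysis code.

```python
def ppv(sens: float, spec: float, prev: float) -> float:
    """Positive predictive value from Bayes' theorem."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)


# (label, sensitivity, specificity, cohort prevalence, prevalence after ICD-9 screen)
# Values are those reported in the text for the medical records-based criteria.
SCENARIOS = [
    ("definite", 0.48, 0.94, 0.16, 0.54),
    ("definite or empirically treated", 0.41, 0.91, 0.25, 0.83),
]

for label, sens, spec, prev_cohort, prev_screened in SCENARIOS:
    before = ppv(sens, spec, prev_cohort)
    after = ppv(sens, spec, prev_screened)
    print(f"{label}: PPV {before:.0%} without screen -> {after:.0%} after ICD-9 screen")
# definite: PPV 60% without screen -> 90% after ICD-9 screen
# definite or empirically treated: PPV 60% without screen -> 96% after ICD-9 screen
```

The gain comes entirely from the higher pre-test prevalence: the criteria themselves are unchanged, but fewer of the patients they are applied to can generate false positives.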
Among the most common site-specific infections, the specificities of the medical records-based infection criteria for urinary tract infection, bacteremia, and device-associated infection were each 100%, and the specificities for the other sites of infection were only slightly lower (Table 2). The other site-specific infections we examined occurred less frequently and were therefore not reported separately. These included postoperative infections (n = 11), upper respiratory tract infections (n = 9), osteomyelitis (n = 8), meningitis (n = 7), gastroenteritis (n = 5), septic arthritis and intraabdominal abscesses (n = 3), diverticulitis (n = 2), endocarditis and cholecystitis (n = 1).
Our study evaluated the diagnostic properties of ICD-9 codes for identifying putative bacterial infections in hospitalized rheumatoid arthritis patients. We found that using ICD-9 codes alone may misclassify 15% to 46% of infections in these patients, depending on the set of codes used and the strength of evidence required to identify infections. We also developed diagnostic criteria for bacterial infections based on medical record abstraction and showed that they had very high positive predictive values. A two-stage process, in which potential cases are identified from claims data and their medical records are then classified using the infection criteria, yielded a PPV of up to 96% and could eliminate the need for physicians to review each medical record.
Complementary to our work, Schneeweiss and colleagues developed infection criteria and identified suspected infections among general medical patients at a Veterans Administration (VA) hospital using the restricted list of ICD-9 codes that we adopted for this study. In their study, the PPV of the claims data for identifying selected bacterial infections combined (90%) was similar to ours (85%). However, compared with the comprehensive set of codes, this high PPV and its corresponding specificity come at the expense of lower sensitivity (48%). The modest differences between our studies might be explained by inherent differences in coding practices between a VA and a university hospital and by differences between general medical patients and RA patients. In addition, because both studies focused on the validity of infection codes, neither separately assessed possible nosocomial infections. Distinguishing nosocomial from community-acquired infections would be important when assessing susceptibility to infection with particular therapeutic interventions for RA. Schneeweiss and colleagues selected their study population using only the primary discharge diagnosis codes in VA data, thereby excluding nosocomial infections. In contrast, because financial incentives may influence which diagnosis is coded as primary, we included bacterial infection codes in any position on the discharge claims.
Given its higher sensitivity, a comprehensive set of administrative codes is better able to capture as many putative bacterial infections as possible, which is necessary when initially searching large administrative databases for potential infections. A restricted set of codes, given its better specificity, will instead reduce the number of falsely identified infections. The resulting risk estimates vary with the sensitivity and specificity of the case definition. Notably, even when sensitivity is low, high specificity will result in unbiased relative effect measures [14, 15]. In our past experience, we first applied a “more sensitive” method using medical and pharmacy administrative claims codes from a health care insurer’s database to identify as many presumed serious bacterial infections as possible. These plausible cases were then further evaluated by applying “more specific” medical record criteria for bacterial infections. As the specificity of the infection criteria increased, fewer cases met the criteria for an infection (i.e., sensitivity decreased), and the adjusted hazard ratio for bacterial infection associated with TNFα antagonists increased.
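The point about specificity can be illustrated with the standard expression for an observed risk under outcome misclassification, observed risk = Se·r + (1 − Sp)(1 − r), where r is the true risk. The sketch below assumes nondifferential misclassification (the same Se and Sp in exposed and unexposed patients) and uses hypothetical true risks of 10% and 5% (a true risk ratio of 2.0); the numbers are illustrative, not from this study.

```python
def observed_risk(true_risk: float, sens: float, spec: float) -> float:
    """Proportion classified as cases when the outcome definition is imperfect."""
    return sens * true_risk + (1 - spec) * (1 - true_risk)


def observed_risk_ratio(risk_exposed: float, risk_unexposed: float,
                        sens: float, spec: float) -> float:
    """Risk ratio computed from nondifferentially misclassified outcomes."""
    return (observed_risk(risk_exposed, sens, spec)
            / observed_risk(risk_unexposed, sens, spec))


# Hypothetical true risks: 10% exposed vs. 5% unexposed (true RR = 2.0).
for sens, spec in [(1.0, 1.0), (0.5, 1.0), (0.5, 0.95)]:
    rr = observed_risk_ratio(0.10, 0.05, sens, spec)
    print(f"Se={sens:.2f}, Sp={spec:.2f}: observed RR = {rr:.2f}")
# Se=1.00, Sp=1.00: observed RR = 2.00
# Se=0.50, Sp=1.00: observed RR = 2.00
# Se=0.50, Sp=0.95: observed RR = 1.31
```

With perfect specificity, halving sensitivity scales both risks by the same factor and leaves the risk ratio at 2.0; losing 5 points of specificity adds false positives to both groups and pulls the estimate toward the null. This holds exactly for the risk ratio; measures such as the odds ratio are preserved only approximately, when the outcome is rare.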
To our knowledge, only a few other studies have assessed the diagnostic accuracy of administrative data alone for identifying patients with suspected bacterial infections. One study compared 5 claims-based algorithms with a standard of clinically diagnosed pneumonia; the prevalence of pneumonia in that study was 2.5%, and the PPVs of the claims algorithms ranged from 73% to 81%, compared with 89% in our study. In another study using data from the General Practice Research Database, 62% of pneumococcal pneumonia administrative codes were confirmed by medical record review. Both studies involved healthier general medical populations, whereas our RA patients commonly take immunosuppressive medications. Despite these differences in patient populations, the results of our study are similar, suggesting that ICD-9 codes can identify pneumonia accurately.
Our newly developed medical records-based infection criteria will be a particularly valuable resource when uniform classification criteria are needed, for example in multi-site trials where infectious adverse events are classified subjectively by individual site investigators, or in open-label studies where investigators are not blinded to drug exposure status. They may also be useful in retrospective epidemiologic studies, where abstractors can collect data and classify infections according to these validated criteria without requiring case-by-case determinations from a physician.
Some bacterial infections have objective, pathognomonic laboratory findings and are therefore relatively simple to classify. For example, the diagnosis of bacteremia required a positive blood culture, and the specificity of its infection criteria was accordingly 100%. Other infections, however, were more challenging to classify. Cellulitis, a predominantly clinical diagnosis, was defined mainly by history and physical examination findings, which were sometimes inadequately documented in the medical records, and supporting microbiologic evidence for cellulitis is typically scant and unhelpful. As a result, the specificity of the cellulitis criteria was only 70%. By comparison, the diagnosis of lower respiratory tract infections (mainly pneumonia) relied on both objective (laboratory and culture) and more subjective (clinical) evidence, and these criteria showed an intermediate specificity of 82%.
As with any diagnostic test, higher specificity generally comes at the expense of lower sensitivity. The criteria for urinary tract infection (UTI) illustrate this clearly: they had 100% specificity but a sensitivity of only 5%. The likely explanation for this very low sensitivity was the requirement for at least 10,000 colonies of an organism pathogenic to the genitourinary tract or, for lower colony counts, at least 7 days of antibiotic treatment. Many of the suspected UTIs we found were treated on the basis of an abnormal urinalysis, and no positive culture result was ever documented (even after hospital discharge).
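Explicit rules such as the UTI criterion are what let trained abstractors, rather than physicians, classify infections. A hypothetical encoding of the rule as paraphrased in this paragraph is sketched below. The `UrineCulture` record and the function name are illustrative; colony counts are assumed to be in CFU/mL; whether an organism is a genitourinary pathogen is assumed to be the abstractor’s judgment; and “lower colony counts” is interpreted as any growth of such an organism. The full criterion in Appendix 2 may contain further conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UrineCulture:
    organism: str
    colony_count: int   # assumed CFU/mL
    uropathogen: bool   # abstractor's judgment: pathogenic to the GU tract?


def meets_uti_criterion(cultures: list[UrineCulture], antibiotic_days: int) -> bool:
    """Hypothetical UTI rule: >= 10,000 colonies of a uropathogen, or any
    lower count of a uropathogen plus at least 7 days of antibiotics."""
    for culture in cultures:
        if not culture.uropathogen:
            continue  # likely contaminant; ignored
        if culture.colony_count >= 10_000:
            return True
        if culture.colony_count > 0 and antibiotic_days >= 7:
            return True
    return False


# Treated on the basis of an abnormal urinalysis, with no positive culture:
print(meets_uti_criterion([], antibiotic_days=10))  # False
print(meets_uti_criterion([UrineCulture("E. coli", 50_000, True)], antibiotic_days=3))  # True
```

The first call shows why sensitivity was so low: a patient treated for 10 days on the strength of an abnormal urinalysis alone never meets the criterion, however convincing the clinical picture.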
Varying approaches have been used to diagnose bacterial infections in observational studies. Infections have been defined using combinations of patient self-report, hospital records, physician reports, mortality records, antibiotic prescriptions, or the opinion of treating physicians [5, 6, 11, 18–20]. A few of these studies developed operational criteria to identify their outcomes [2, 11, 19, 20]. For example, Doran and colleagues required specific symptoms and laboratory values to define infectious adverse events in RA patients in a retrospective longitudinal cohort study. Leveille and colleagues compared antibiotic prescription fills in automated pharmacy records with medical record review as the standard for infection. Although these approaches have reasonable face validity, they were not rigorously evaluated and validated, as our criteria were in this study, nor have they been widely adopted by subsequent investigators.
The major strengths of our study are its sample size and our full access to all information in the medical records, including culture results that became positive after patient discharge. All medical records were independently abstracted by two health care providers with complete access to the records, who showed high agreement on the classification of infections. Additionally, we studied RA patients, who may experience patterns of infection that differ from those of general medical patients.
Despite its strengths, our results must be interpreted in the context of our study design. Because this project was undertaken in one university health system, the results may not generalize to health care settings where medical record documentation or ICD-9 coding practices differ. We were unable to review the medical records of all patients without an ICD-9 code for infection, so verification bias may exist [21, 22], which could increase the apparent sensitivity and decrease the apparent specificity of the ICD-9 codes. However, given the expectation that claims data may have high specificity, and the finding of only one infection (2%) among the reviewed records without an ICD-9 code for infection, this bias is likely small. Additionally, the medical record infection criteria were intentionally designed to favor specificity at the expense of sensitivity; thus, they under-ascertained infections without objective diagnostic details in the medical records. For some sites of bacterial infection (e.g., meningitis), the low prevalence of these infections precluded conclusions about the diagnostic accuracy of the criteria. Finally, our analysis of administrative data examined only single ICD-9 codes; more complex claims-based algorithms may achieve higher specificities and PPVs.
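One standard way to reduce verification bias when test-negative records are reviewed as a random sample, as the 50 code-negative records were here, is the Begg–Greenes correction: estimate the probability of infection separately among code-positive and code-negative patients from the reviewed records, then reweight by the size of each group in the full cohort. The sketch below is illustrative only; the function name and counts are hypothetical, not this study’s data, and the paper does not state that this correction was applied.

```python
def begg_greenes(n_pos: int, n_neg: int,
                 reviewed_pos: int, infected_pos: int,
                 reviewed_neg: int, infected_neg: int) -> tuple[float, float]:
    """Verification-bias-corrected (sensitivity, specificity).

    n_pos / n_neg: all patients with / without an infection code.
    reviewed_*: records reviewed in each group (a random sample within the group).
    infected_*: reviewed records with infection per the reference standard.
    """
    p_inf_pos = infected_pos / reviewed_pos   # P(infection | code present)
    p_inf_neg = infected_neg / reviewed_neg   # P(infection | code absent)
    p_pos = n_pos / (n_pos + n_neg)           # P(code present) in full cohort
    p_neg = 1 - p_pos
    sens = p_inf_pos * p_pos / (p_inf_pos * p_pos + p_inf_neg * p_neg)
    spec = ((1 - p_inf_neg) * p_neg
            / ((1 - p_inf_neg) * p_neg + (1 - p_inf_pos) * p_pos))
    return sens, spec


# Hypothetical cohort: 100 code-positive (all reviewed, 60 infected) and
# 400 code-negative (50 reviewed, 2 infected).
naive_sens = 60 / (60 + 2)
naive_spec = 48 / (48 + 40)
sens, spec = begg_greenes(100, 400, 100, 60, 50, 2)
print(f"naive: sens={naive_sens:.2f} spec={naive_spec:.2f}")
print(f"corrected: sens={sens:.2f} spec={spec:.2f}")
```

Restricting the calculation to reviewed records inflates sensitivity and deflates specificity, the direction described above; how large the shift is depends on how common infection is among the unreviewed code-negative patients.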
In summary, ICD-9 diagnosis codes alone may misclassify bacterial infections in hospitalized RA populations, although the level of misclassification depends on the codes used. When administrative claims codes alone are used to identify infections, a restricted set of codes with higher specificity is preferable. Our novel, validated medical records-based infection criteria for a broad range of serious bacterial infections may reduce the need for physicians’ manual review of medical records. These criteria may be useful in both clinical trials and observational studies, either as the primary outcome definition or as part of a sensitivity analysis.
We thank Drs. Nenad Avramovski and Ari Robicsek for guidance in constructing the medical records-based infection criteria, Dr. Robert Lopez for guidance in interpreting the radiological findings, Dr. Charles Elson for his expertise in gastroenterology, Ms. Karen Connor, physician assistant, for help in abstracting medical records, Mr. Bart Prevallet, MBA, for creating the data abstraction instrument, and Mr. Jorge Nunez for help with data entry.
Supported by the Engalitcheff Arthritis Outcomes Initiative, Maryland Chapter, Arthritis Foundation; grant HS10389 from the Agency for Healthcare Research and Quality; grant awards 5K24AR052361-03 and 1-K23-AR053351-01-A1; and the PhRMA Foundation Research Starter Grant in Health Outcomes.
| Discharge diagnosis | Sensitive (comprehensive) set | Specific (restricted) set |
|---|---|---|
| Pyelonephritis/urinary tract infection | 590.XX | 590.X |
| Upper respiratory tract infection | 34 | |
| Device-associated infections | 996.6X | |
| Local infections of skin and subcutaneous tissue | 686.1 | |
| Post-traumatic wound infection | 958.3 | |
| Postoperative wound infection | 998.5 | |
*This is not an exhaustive list; any organism, irrespective of colony count
+Polymicrobial infection = any number of organisms
*Any number of organisms irrespective of the colony count
*Recent history = 4 to 6 weeks prior to meningitis
+“Positive” CSF serology for syphilitic meningitis = any one of CSF VDRL, RPR, or FTA positive
*Presence of any organism irrespective of colony count
*IGNORE any Gram stain report
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.