To examine the impact of diagnostic coding error on estimates of hospital costs attributable to adverse events.
Original and reabstracted medical records of 9,670 complex medical and surgical admissions at 11 hospital corporations in Ontario from 2002 to 2004. Patient specific costs, not including physician payments, were retrieved from the Ontario Case Costing Initiative database.
Adverse events were identified among the original and reabstracted records using ICD10-CA (Canadian adaptation of ICD10) codes flagged as postadmission complications. Propensity score matching and multivariate regression analysis were used to estimate the cost of the adverse events and to determine the sensitivity of cost estimates to diagnostic coding error.
Estimates of the cost of the adverse events ranged from $16,008 (metabolic derangement) to $30,176 (upper gastrointestinal bleeding). Coding errors caused the total cost attributable to the adverse events to be underestimated by 16 percent. The impact of coding error on adverse event cost estimates was highly variable at the organizational level.
Estimates of adverse event costs are highly sensitive to coding error. Adverse event costs may be significantly underestimated if the likelihood of error is ignored.
A growing branch of research in the patient safety arena is the estimation of the economic cost of adverse events. Prior examinations have consistently reported alarmingly high estimates of the cost of adverse events (Kohn et al. 2000; Zhan and Miller 2003a; Zhan et al. 2006; Mello et al. 2007; Encinosa and Hellinger 2008). Not surprisingly, these estimates have caught the attention of policy makers who are increasingly incorporating adverse event cost estimates into hospital reimbursement schemes. The Centers for Medicare and Medicaid Services (CMS) decision to withhold additional hospital payments for certain “conditions that could reasonably have been prevented” and “serious preventable events” has probably been the most debated policy response to date (Rosenthal 2007; Wachter, Foster, and Dudley 2008).
While chart review has been the dominant method for identifying adverse events in hospitalized patients (Thomas, Lipsitz, and Studdert 2002; Zhan and Miller 2003b; Michel, Quenon, and de Sarasqueta 2004), patient safety researchers are increasingly turning to administrative data (e.g., Rosen et al. 2006; Encinosa and Hellinger 2008; Houchens, Elixhauser, and Romano 2008; Raleigh et al. 2008; Rivard et al. 2008; Friedman et al. 2009). While the advantages of administrative data over chart reviews in terms of cost and coverage are considerable, previous studies have found that adverse events are incorrectly coded in administrative data compared to patient charts or other reference standard datasets (Best et al. 2002; Romano et al. 2002; Quan, Parsons, and Ghali 2004; Leibson et al. 2008). In Ontario, the setting for this study, previous analysis has demonstrated that diagnosis coding error in administrative data can have important implications for cost and case mix analysis (Preyra 2004; Sutherland and Botz 2006). While some researchers have acknowledged the likelihood of coding error and hypothesized that it would lead to an underestimation of the cost of adverse events (Zhan and Miller 2003a), there is, as yet, no published study that has estimated the impact of coding error on estimates of adverse event costs.
This article aims to address this gap in the literature by measuring the effect of coding error on estimates of the costs attributable to Needleman et al.'s (2006) nursing sensitive adverse events. These events include central nervous system complications, deep venous thrombosis, hospital-acquired pneumonia, hospital-acquired sepsis, metabolic derangement, pressure ulcers, pulmonary failure, shock or cardiac arrest, upper gastrointestinal bleeding, urinary tract infections, and wound infections. The analysis relied on a large reabstraction study of medical records from 11 Ontario hospital corporations. Coding discrepancies between the original and reabstracted records were examined and used to design a reference standard dataset that served as the basis for assessing the sensitivity of cost estimates to coding error.
The Discharge Abstract Database (DAD) is Ontario's acute inpatient administrative database and is maintained by the Canadian Institute for Health Information (CIHI). For each discharge, hospital-employed coders prepare a DAD abstract that includes one most responsible diagnosis (MRDx) and up to 24 additional significant comorbidities. The coders also indicate whether each diagnosis was present on admission (POA) or manifested during the hospitalization. All diagnosis codes are abstracted using a Canadian adaptation of ICD10 (ICD10-CA).
Needleman et al.'s (2006) adverse events were identified based on the appropriate ICD10-CA codes that were flagged as postadmission complications. We selected Needleman et al.'s (2006) adverse events because they have been the subject of much research and the majority of the adverse events pertain to both medical and surgical patients. Other sets of adverse events, including AHRQ's Patient Safety Indicators, focus exclusively on surgical patients.
Canadian Institute for Health Information's coding standards state that diagnoses are to be abstracted only if the patient chart includes physician documentation that the condition satisfied at least one of the following criteria for significance: (1) the condition significantly affected the treatment received; (2) the condition required treatment beyond maintenance of the preexisting condition; or (3) the condition increased the length of stay by at least 24 hours (CIHI 2006). Although the standards aim to be specific, they may be open to multiple interpretations. For example, clinical evidence that attributes a 24-hour increase in a patient's hospital stay to a comorbid diagnosis may be ambiguous. Coders may therefore often be required to make a subjective assessment of the effect of diagnosis on a patient's hospital course. In the United States, CMS's guidelines for coding secondary diagnoses include similar criteria: “For reporting purposes the definition for ‘other diagnoses’ is interpreted as additional conditions that affect patient care in terms of requiring: clinical evaluation; or therapeutic treatment; or diagnostic procedures; or extended length of hospital stay; or increased nursing care and/or monitoring” (AMA 2008). Given the possibility for subjective coding decisions, it is relevant to consider whether applications of the data (e.g., for payment or performance measurement) might influence hospital coding practice.
Ontario's hospitals are nonprofit private corporations that receive funding from the provincial government equal to approximately 85 percent of their total operating expenses. Government funding is allocated using global budgets which are routinely adjusted for inflation, program changes, and population growth. At the time of the study, additional adjustments to the base budgets were made using a payment model based on hospitals’ relative cost per adjusted discharge. Hospitals received shares of incremental funding in proportion to the difference between their actual and expected cost per adjusted discharge. Expected hospital costs were adjusted for teaching intensity, size, geography, and case mix. While hospitals had little ability to affect the other adjustment factors, they could influence their measured case mix by coding more aggressively. Under this model, hospitals that coded more adverse events would, ceteris paribus, increase their share of available funding. This aspect of Ontario's payment model was similar to that used by CMS for the PPS prior to its “never events” policy.
Looking to assess the adequacy of coding for its case mix based payment model, the Ontario Ministry of Health and Long Term Care (Ministry) partnered with CIHI to conduct a reabstraction study of 2002/03 and 2003/04 records from Ontario's 11 case costing hospital corporations. The corporations collectively operated 16 individual hospital sites, each of which collects patient-specific cost data using a standardized methodology. The cost data are submitted to the Ministry's Ontario Case Costing Initiative (OCCI) database. OCCI data have been the subject of numerous quality reviews and are used by the Ministry and CIHI to develop case mix systems. The costs captured in the OCCI include total direct (e.g., nursing, laboratory, pharmacy, imaging) and indirect costs. Indirect costs are those associated with administrative and support departments and each patient's share of these costs is determined using a standardized methodology (Ontario Ministry of Health and Long Term Care 2010). OCCI costs do not include physician payments. The 11 OCCI hospital corporations included in this study accounted for approximately 23 percent of Ontario's total discharges during the study years. All costs presented are in Canadian dollars.
The reabstraction study focused on records with multiple comorbidities. Specially trained coders (reviewers) recoded 13,803 abstracts directly from the original patient charts while blind to the original abstracts. After completing each abstract, the reviewers compared their coding with the original coding and characterized the nature of each observed discrepancy using one of the following descriptions.
The original data are potentially limited for adverse event cost analysis because they include adverse events that, according to the reviewers, had been coded without adequate supporting chart documentation or in violation of CIHI's standards. The reabstraction data may also be suboptimal for the analysis because they exclude adverse events identified by both the original and reviewing coders but deemed only by the reviewers to have had an insignificant impact on the patient's course.
To redress the limitations of the original and reabstraction data, we established a hybrid dataset. We constructed the hybrid dataset by including the following: (1) all adverse events coded by both the hospital coders and the reviewers; (2) all adverse events coded only by the reviewers; and (3) all the originally coded adverse events that were omitted by the reviewers because of “significance” and “optional” discrepancies. The hybrid data therefore exclude adverse events deemed by the reviewers to have been originally coded without adequate supporting chart documentation or in violation of coding standards. While we propose the hybrid data as our reference standard for the cost analysis, they are still subject to the inherent limitations of administrative data. These limitations include the potential for poor concordance of ICD codes with physicians’ notes, misinterpretation of physician notes by coders, and transcribing errors by physicians and coders. The hybrid data therefore do not establish a gold standard, but they are appropriate for our examination because they minimize coding inconsistencies associated with subjective interpretations of the coding standards.
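The three inclusion rules for the hybrid dataset can be sketched as a small predicate. This is an illustration only, not the authors' code: the record layout (`coded_by_original`, `coded_by_reviewer`, `discrepancy`) and the label strings are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels. The source describes "significance" and "optional"
# discrepancies as the ones whose events are retained in the hybrid data.
KEEP_IF_ORIGINAL_ONLY = {"significance", "optional"}


@dataclass
class AdverseEvent:
    coded_by_original: bool
    coded_by_reviewer: bool
    discrepancy: Optional[str] = None  # reviewer's reason for a disagreement, if any


def in_hybrid(event: AdverseEvent) -> bool:
    """Return True if the event belongs in the hybrid dataset."""
    if event.coded_by_reviewer:
        # Rules (1) and (2): coded by both coders, or coded only by the reviewer.
        return True
    # Rule (3): originally coded and omitted by the reviewer only because of a
    # "significance" or "optional" discrepancy.
    return event.coded_by_original and event.discrepancy in KEEP_IF_ORIGINAL_ONLY
```

Under this sketch, an originally coded event that the reviewer rejected for inadequate chart documentation or a standards violation is excluded, which mirrors the exclusion described in the text.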
Similar to previous adverse event costing studies, we mitigated confounding in our analysis by preprocessing our data using propensity score matching prior to conducting our parametric analysis (Bates et al. 1997; Classen et al. 1997; Zhan and Miller 2003a; Encinosa and Hellinger 2008; Rivard et al. 2008). We estimated each patient's probability of experiencing any of the adverse events using logit regressions that controlled for the covariates presented in Table 3. The covariates included the following patient characteristics deemed likely to have confounding effects on cost and likelihood of experiencing an adverse event: age, gender, urgent admission, and an indicator variable for each of three types of care: medical, surgical, and major surgical. As was done in previous adverse event costing studies, we included indicator variables for the Charlson chronic conditions (Charlson et al. 1987; Zhan and Miller 2003a; Encinosa and Hellinger 2008; Rivard et al. 2008). The regressions also controlled for the fixed effects of 19 major clinical categories (MCCs) which provide a general description of the body system or type of clinical condition associated with the primary cause of admission.1 Because 2 years of data were pooled for the analysis, an indicator variable for the 2002/03 fiscal year was included to control for potential time trends across the years. The regressions also included indicator variables for the 11 hospital corporations to control for hospital level effects, including those potentially associated with teaching mission, size, and cost efficiency.
Matching of cases to controls was performed using a one-to-one nearest-neighbor matching algorithm. For each adverse event case, our matching algorithm selected a nonadverse event control within a 1 percent difference in risk of experiencing an adverse event. The 1 percent difference has been used in previous adverse event costing studies (Zhan and Miller 2003a) and was selected here after investigating the impact of higher and lower thresholds on the degree of balance in the resulting matched samples. Statistical analysis and matching were performed using R version 2.12 and its MatchIt package (Ho et al. 2007; Stuart et al. 2007).
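As a rough illustration of one-to-one nearest-neighbor matching with a caliper, the sketch below pairs each case with the closest unused control and leaves the case unmatched when no control lies within the threshold. It is not the MatchIt implementation used in the study. It assumes the 1 percent threshold means an absolute difference of 0.01 in propensity score, that matching is without replacement, and that cases are processed in the order given; none of these details is stated in the text.

```python
from typing import Dict


def match_nearest(cases: Dict[str, float], controls: Dict[str, float],
                  caliper: float = 0.01) -> Dict[str, str]:
    """Greedy one-to-one nearest-neighbor matching on propensity score.

    cases and controls map a record id to its estimated propensity score.
    Returns a mapping from case id to the id of its matched control.
    """
    available = dict(controls)  # controls not yet used
    matches: Dict[str, str] = {}
    for case_id, score in cases.items():
        if not available:
            break
        # Closest remaining control by absolute difference in propensity score.
        best = min(available, key=lambda cid: abs(available[cid] - score))
        if abs(available[best] - score) <= caliper:
            matches[case_id] = best
            del available[best]  # assumed: matching without replacement
    return matches
```

Cases absent from the returned mapping correspond to adverse event records for which no control met the threshold.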
In the three matched samples, we assessed balance using the percent improvement in difference in means for each covariate. This measure is defined as ((|a| − |b|)/|a|) × 100, where a is the difference in means between the adverse event and nonadverse event groups in the raw sample and b is the difference in the matched sample. Although they are widely used to assess balance, we did not use t-tests of differences in means because they can be misleading and should be avoided (Imai, Stuart, and King 2008). Our parametric analysis controlled for the potential of residual differences in the distribution of the covariates in the matched samples.
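The balance measure defined above translates directly into a short function. The function and argument names are illustrative; only the formula comes from the text.

```python
def percent_balance_improvement(raw_diff: float, matched_diff: float) -> float:
    """((|a| - |b|) / |a|) * 100, where a is the raw-sample difference in means
    and b is the matched-sample difference in means."""
    if raw_diff == 0:
        raise ValueError("raw difference in means is zero; improvement is undefined")
    return (abs(raw_diff) - abs(matched_diff)) / abs(raw_diff) * 100
```

A positive value indicates that matching brought the groups closer on that covariate. A negative value indicates that balance worsened, as the text reports for Year 2002 and the two liver disease indicators.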
We were interested in estimating the mean causal effect of the adverse events on patient cost averaged over all patients in the sample. This quantity is known in causal inference theory as the average treatment effect (ATE). In this section, we adopt the notation of Ho et al. (2007). Let Yi(1) be the cost that would be observed for patient i with an adverse event and characteristics Xi, and Yi(0) be the cost without the adverse event. The ATE is then defined as:

$$\mathrm{ATE} = \frac{1}{n}\sum_{i=1}^{n} E\left[\,Y_i(1) - Y_i(0) \mid X_i\,\right]$$

where the summation over i refers to the matched sample and n is the number of patients in that sample.
After matching, we used a generalized linear model (GLM) with a Gamma distribution and logarithmic link function to regress patient cost on the variables in Table 3 and indicator variables for each adverse event type, while controlling for the fixed effects of 19 MCCs and the 11 hospital corporations. We selected this model over an OLS regression on the natural logarithm of cost and other families of the GLM class of models using Manning and Mullahy's (2001) algorithm.
We used the estimated coefficients from the regressions to predict Yi(1) and Yi(0) for each patient and adverse event type. We then calculated the ATE of each adverse event type by taking the sample mean of the difference in predicted costs. We performed this analysis using each of the original, reabstraction, and hybrid datasets, and examined the sensitivity of cost estimates to coding error by comparing the estimated costs from the three datasets.
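The prediction step can be illustrated with a minimal sketch. It assumes the coefficients of a log-link model have already been estimated (the fitting itself is not shown), and the covariate names and coefficient values in any call are hypothetical. For each patient, the adverse event indicator is set to 1 and then to 0, and the difference in predicted cost is averaged over the sample.

```python
import math
from typing import Dict, List


def predict_cost(coef: Dict[str, float], row: Dict[str, float]) -> float:
    """Log-link prediction: exp(intercept + sum of coefficient * covariate)."""
    linear = coef["intercept"] + sum(coef[name] * value for name, value in row.items())
    return math.exp(linear)


def average_treatment_effect(coef: Dict[str, float],
                             rows: List[Dict[str, float]],
                             event: str) -> float:
    """Sample mean of predicted Y_i(1) - Y_i(0) for one adverse event indicator."""
    diffs = []
    for row in rows:
        with_event = dict(row, **{event: 1.0})     # counterfactual: event occurs
        without_event = dict(row, **{event: 0.0})  # counterfactual: event absent
        diffs.append(predict_cost(coef, with_event) - predict_cost(coef, without_event))
    return sum(diffs) / len(diffs)
```

Because the link is logarithmic, the dollar effect of an event differs across patients even though its coefficient is constant, which is why the difference is computed per patient before averaging.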
We performed additional analysis to examine the nature of the coding discrepancies in our data. For each adverse event type, we computed and compared the ATE of four subsets of adverse events: (1) events coded by both the original and reviewing coders; (2) events coded by the original coders but deemed insignificant or optional by the reviewers; (3) events coded by the original coders but deemed improperly coded by the reviewers based on chart documentation or standards; and (4) events coded only by the reviewers. The ATEs were estimated from the original matched sample using the methods described above but with the 11 adverse event indicator variables in our GLM model replaced with 44 indicators for each adverse event–discrepancy type combination.
After applying Needleman et al.'s (2006) exclusion criteria, our raw sample consisted of 9,670 medical and surgical records in which no individual appeared twice. Records with multiple diagnosis codes indicating a single type of adverse event were deemed to have a single occurrence of that adverse event type. Records could have more than one type of adverse event, and this occurred in 8 percent (783) of records in the raw sample.
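The counting rule described here, in which several diagnosis codes for the same adverse event type on one record count as a single occurrence, can be sketched as follows. The tuple layout (record id, adverse event type, diagnosis code) is a hypothetical representation chosen for the example.

```python
from typing import Iterable, Tuple


def count_events(codes: Iterable[Tuple[str, str, str]]) -> int:
    """Count adverse events, treating repeated codes for the same event type
    on the same record as one occurrence.

    Each item is (record_id, event_type, diagnosis_code).
    """
    return len({(record_id, event_type) for record_id, event_type, _ in codes})
```

A record with two different event types still contributes two events under this rule, consistent with records being allowed more than one type of adverse event.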
Table 1 shows that the original data included 3,620 adverse events, the reabstraction data included 2,586, and the hybrid data included 3,394. Table 1 also reports estimates of five agreement measures resulting from assessments of the original data against both the reabstraction and hybrid data. We present the results of both assessments because these estimates, in conjunction with the cost estimates presented subsequently, may help other researchers gauge how coding accuracy in their jurisdictions might affect estimates of adverse event costs.
When assessed against the reabstraction data, the median sensitivity of original adverse event coding was 0.67 and the median PPV was 0.52. Adverse events with the highest sensitivities included urinary tract infections (0.78) and shock or cardiac arrest (0.76). Adverse events with the lowest sensitivities were upper gastrointestinal bleeding (0.60) and pulmonary failure (0.62). Agreement measures are higher for all adverse events when the original data are compared against the hybrid data. The median sensitivity was 0.76 and the median PPV was 0.73. Metabolic derangement, central nervous system complications, and urinary tract infections have the largest differences in PPVs between the reabstraction and hybrid estimates (0.48, 0.38, and 0.35). Because they are largely driven by “significance” discrepancies, these differences indicate that coders had the most difficulty in assessing the effect of these conditions on a patient's hospital course. Not shown here, the median sensitivity of the Charlson comorbidities was 0.85 and the median PPV was 0.81.
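For readers less familiar with the two agreement measures quoted here, the sketch below computes sensitivity and PPV for one adverse event type from two sets of record identifiers. The function name and the set-based representation are illustrative assumptions; the definitions are the standard ones.

```python
from typing import Set, Tuple


def sensitivity_and_ppv(original: Set[str], reference: Set[str]) -> Tuple[float, float]:
    """Assess original coding against a reference standard.

    original:  ids of records flagged with the event in the original data
    reference: ids of records flagged with the event in the reference data
    sensitivity = TP / (TP + FN); PPV = TP / (TP + FP)
    """
    if not original or not reference:
        raise ValueError("both sets must be non-empty")
    true_positives = len(original & reference)
    sensitivity = true_positives / len(reference)
    ppv = true_positives / len(original)
    return sensitivity, ppv
```

Switching the reference from the reabstraction data to the hybrid data adds events that both coders recognized but the reviewers omitted, which moves some original codes from false positives to true positives.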
Table 2 shows transition matrices for diagnosis codes captured in the original and reabstraction data. There are 3,949 original adverse event codes in this table compared to the 3,620 adverse events reported elsewhere in the analysis because some records had multiple codes indicating the presence of a single adverse event type. These tables suggest two promising findings. First, the MRDx appears reliably coded; coders agreed on the MRDx in 82 percent of records. Second, if they agreed on the presence and significance of the diagnosis, coders reliably established the timing of diagnosis onset. For example, only 1 percent of originally coded Charlson comorbidities was reclassified as having manifested during the hospitalization and only 4 percent of originally coded adverse events were reclassified as having been POA.
Less promising is that coders had great trouble agreeing on the presence and significance of adverse events and comorbidities. The reviewers agreed with the code selection and typing of only 49 percent of the 3,949 originally coded adverse events. Moreover, the reviewers re-coded only 48 percent of the original Charlson comorbidities. The proportion of original codes that were not re-coded by the reviewers, shown in the second last column of Table 2, demonstrates the conservatism of the reviewers relative to the original abstractors. For example, the reviewers did not re-code 45 percent of the original adverse events. Not shown in the table, the reviewers deemed that these adverse events did not meet the criteria for significance (22 percent), had inadequate supporting documentation in the chart (15 percent), did not meet coding standards (7 percent), or were optional/not wrong to code (1 percent). The adverse events associated with significance and optional disagreements are excluded from the reabstraction data but ought to be included in the cost analysis because there is agreement among the coders on the presence of the adverse event. In contrast, the adverse events associated with disagreements over documentation and standards should be excluded because the reviewers disagreed with the original coders over the presence of the adverse events.
Despite their relative conservatism, the reviewers did code conditions that had been overlooked by the original coders. Shown in the lower section of Table 2, 20 percent of the 2,761 adverse events coded by the reviewers had not been captured in the original abstracts. The reviewers deemed that these adverse events were originally omitted due to information on the chart being missed by the original coders (15 percent), incorrectly deemed insignificant by the original coders (2 percent), or omitted in contravention of the coding standards (2 percent). These adverse events are apparent false negatives and should be included in the cost analysis.
Table 3 shows that the adverse event cases had different characteristics than the nonadverse event cases in the raw sample. Using our algorithm, we matched 2,194 adverse event records from the original data to 2,194 nonadverse event records on the basis of similarity in propensity scores. Matches that met our threshold of a maximum of 1 percent difference in predicted risk could not be found for 115 (5 percent) adverse event records. The last column of Table 3 shows that our matching algorithm improved the extent of balance between the case and control groups for all covariates except Year 2002, Mild Liver Disease, and Severe Liver Disease. That the mean propensity score was equal in the case and control groups (0.303) after matching indicates the overall success of our matching exercise. Similar improvements in balance were achieved for the reabstraction and hybrid matched samples.
Table 4 shows the excess unit cost of each adverse event estimated from the original, reabstraction, and hybrid datasets. The mean excess cost was $17,218 in the original data, $26,157 in the reabstraction data, and $22,642 in the hybrid data. The mean cost of all cases in the raw sample was $21,358, which reflects the focus of the reabstraction study on complex cases. The mean cost of all cases at the OCCI hospitals was approximately $7,000 in 2003/04.
Excess unit cost estimates derived from the original data were, on average, 24 percent less than the estimates derived from the hybrid data. Upper gastrointestinal bleeding events resulted in the highest excess costs ($30,176), while metabolic derangements resulted in the lowest excess costs ($16,008). The difference between the original and hybrid estimates was largest for deep venous thrombosis (41 percent).
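As a quick arithmetic check, the relative shortfall can be computed from the two mean excess costs reported for Table 4. The text describes the 24 percent figure as an average, which may have been taken across event types, so the calculation below on the overall means is only an approximation that happens to agree with it. The function name is illustrative.

```python
def percent_below(estimate: float, reference: float) -> float:
    """Percentage by which an estimate falls short of a reference value."""
    return (reference - estimate) / reference * 100


# Mean excess cost: $17,218 (original data) versus $22,642 (hybrid data).
shortfall = percent_below(17_218, 22_642)
print(round(shortfall, 1))  # 24.0
```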
The last eight columns of Table 4 show the counts and ATEs of each type of coding discrepancy. Adverse events listed under the “significance” column are those that were originally coded and subsequently deemed by the reviewers to have occurred but not to have satisfied CIHI's criteria for significance. It would therefore be intuitive to expect that the ATEs of these events would be positive but less than the ATEs of events coded by both coders, and this was the case for 8 of the 11 adverse event types. On average, the ATE of insignificant events was 31 percent less than the average ATE of events coded by both coders. The ATE of insignificant occurrences of sepsis was negative. Contrary to intuition, the ATEs of insignificant occurrences of thrombosis, shock or cardiac arrest, and upper gastrointestinal bleeding were higher than the ATEs of occurrences coded by both coders. Given that the reviewers did not find adequate information in the charts to support coding these events, it is curious that the ATEs of all events associated with chart documentation and standards discrepancies, save pressure ulcers, were positive. However, the average ATE of events with these discrepancies was 48 percent less than the average ATE of events coded by both coders. The ATEs of all events coded only by the reviewers were positive and, on average, 5 percent higher than the average ATE of events coded by both coders.
Table 5 shows results at the institutional level. Compared to the hybrid data, adverse events were, on average, 7 percent over-reported in the original data with a range across the hospitals of 26 percent over-reported (Hospital D) to 27 percent under-reported (Hospital A). Institutional estimates of the total cost attributable to the adverse events were derived by multiplying the number of adverse events at each hospital in each dataset by the corresponding unit cost estimate. Among the four largest institutions in the study (H, D, J, F), the extent to which the total costs attributable to adverse events varied between the original and hybrid estimates ranged from 3 percent underestimated (Hospital D) to 25 percent underestimated (Hospital F). Not shown in the table, the original total cost estimates were less than the hybrid estimates by 14 percent on average for the teaching hospitals and 20 percent on average for the large community hospitals. The last row of Table 5 shows that the estimate of the total cost attributable to adverse events in the original data was 16 percent less than the estimate derived from the hybrid data.
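The two institutional calculations described above can be sketched as follows. The function names are illustrative, and any per-type counts and unit costs passed to `total_attributable_cost` would come from Tables 4 and 5, which are not reproduced here. The over-reporting check uses the overall event counts reported for Table 1 (3,620 original versus 3,394 hybrid).

```python
from typing import Dict


def total_attributable_cost(counts: Dict[str, int], unit_costs: Dict[str, float]) -> float:
    """Sum over adverse event types of (number of events * excess unit cost)."""
    return sum(counts[event] * unit_costs[event] for event in counts)


def percent_over_reported(original_count: int, reference_count: int) -> float:
    """Positive when the original data contain more events than the reference."""
    return (original_count - reference_count) / reference_count * 100


# Overall counts from Table 1: about 7 percent over-reporting.
print(round(percent_over_reported(3_620, 3_394), 1))  # 6.7
```

Because the original data pair more events with lower unit costs, the total cost can be underestimated even when the event count is over-reported, which is the pattern the last row of Table 5 describes.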
Estimates of the excess unit and total costs attributable to adverse events are highly sensitive to diagnosis coding error. Coding error in the original data caused the excess unit costs to be underestimated on average, relative to our reference standard estimate, by 24 percent, and the total cost attributable to the adverse events to be underestimated by 16 percent. This is an important result because it suggests that the economic impact of adverse events might be under-estimated in studies that ignore the likelihood of error. Given this finding, previous assessments of the business case for patient safety may have been biased against the cost effectiveness of patient safety improvements.
The observed extent of institutional-level variation in adverse event coding and cost estimates suggests that Ontario's administrative data form an inconsistent basis for hospital performance measures related to adverse events. The variation also suggests that hospital payment schemes based on these inconsistent measures could be unjust and lead to the misdirection of efforts to improve quality and contain costs. These findings may have important implications for jurisdictions considering the implementation of hospital reimbursement systems that rely on administrative data to identify and estimate the cost of adverse events.
We believe that the finding that the ATEs of events associated with significance discrepancies were positive for all event types except sepsis, coupled with the implicit agreement among the coders on the occurrence of the events, supports their inclusion in our reference standard hybrid data. Moreover, despite finding that they had positive ATEs, we believe that the explicit disagreement among the coders on the occurrence of events associated with chart documentation and standards discrepancies supports their exclusion from our hybrid data. A reasonable alternative to our approach might be to use only the events coded by both the original and review coders as the reference standard. The results of this analysis are consistent with our findings; ATE estimates derived from the original data were, on average, 19 percent lower than estimates derived using only the events captured by both sets of coders.
Our findings suggest a need to critically review the reliability of coding standards pertaining to adverse events. A noteworthy difference between CIHI and CMS standards is that CIHI requires that a diagnosis extend the hospital stay by at least 24 hours, whereas CMS requires only the extension of the hospital stay. Given the potential difficulties in attributing a specific length of stay increase to a particular diagnosis code, it might be worthwhile to investigate whether CIHI's minimum threshold increases the requirement for subjective coding. Moreover, our findings related to the extent and ATEs of events coded only by the reviewers (i.e., those apparently missed by the original coders) also point to a need to examine more upstream processes associated with chart documentation. It is possible that many of the “missed” events were found by the reviewers due to information added to the charts after the preparation of the original abstracts.
There may be opportunities to improve on adverse event identifying algorithms in the absence of reforms to coding and data collection mechanisms. While procedure codes are not relevant for all adverse events, they have been used to augment the identification criteria for some adverse events (Wahl et al. 2010) and are generally coded more reliably than diagnoses (Juurlink et al. 2006). Results of studies that have enhanced administrative data with objective laboratory data for risk adjustment have been promising and suggest that similar approaches may be relevant for adverse event identification (Pine et al. 2007; Tabak et al. 2010). The potential of pharmacy data to identify clinical interventions associated with the management of adverse events should also be explored.
Adverse event coding in our sample of Ontario data appears to be at least as reliable as that in U.S. jurisdictions. Using the National Surgical Quality Improvement Program data to assess coding in the Department of Veteran Affairs’ Patient Treatment File, Best et al. (2002) found that only 7 percent of adverse event codes had sensitivities above 0.50, and only 4 percent of codes had positive predictive values above 0.50. Romano et al. (2002) used chart reviews to assess the quality of coding of postoperative complications among diskectomy patients in California and found that only 4 of 31 complications had sensitivities above 0.60. Looking to validate the complications included in the Complications Screening Program (Iezzoni et al. 1994), McCarthy et al. (2000) found that postoperative acute myocardial infarctions were well reported, but that <60 percent of other complications had adequate clinical evidence in the patient charts to support the diagnosis.
The costs reported herein are higher than those reported in previous studies. This may be due to differences in the costs being analyzed: we analyzed costs as reported by the treating hospitals, whereas previous articles have analyzed transacted payments (Encinosa and Hellinger 2008), charges (Zhan and Miller 2003a), or estimated costs based on hospital level cost-to-charge ratios applied to patient-level charges (Bates et al. 1997). Given these differences, jurisdictions outside Ontario may find the extent of variation in cost estimates (i.e., coding-error induced variation) of more interest than the point estimates. The differences in costs may also be due to the sampled patients. Our study focused on complex patients, whereas other studies investigated the impact of adverse events on cost for patients across all severity levels. No study has yet investigated whether the causal effect of adverse events on cost is constant across patient severities, but this would be a useful contribution to the literature.
Joint Acknowledgment/Disclosure Statement: This research was approved by the University of Toronto Ethics Review Board, and it was supported by a grant from the Canadian Institute for Health Research (CIHR grant 84310).
1Major clinical categories are referred to throughout this document and are registered trademarks of the CIHI.
Additional supporting information may be found in the online version of this article:
Appendix SA1: Author Matrix.