Like previous PSI validation studies of ICD-9-CM data, we focused on PPV in the ICD-10 administrative data as our primary interest. Our study revealed that PSI PPVs are, in most instances, sufficiently high to support widespread use of the PSIs for case finding. The low PPVs for some PSIs, such as sepsis, do not support their use in quality-of-care reporting for comparisons across jurisdictions.
The validity of the ICD-10 data varied by PSI. PSI 5—foreign body and PSI 13—sepsis had low PPVs, whereas PSI 7—infection, PSI 12—PE/DVT and PSI 15—laceration had high PPVs. The high PPV for PSI 12—PE/DVT (89.5%) is supported by one US study12
(PPV=79%), but is higher than the PPVs reported in four other US studies (22–55%).13–16
In contrast to our finding for PSI 13—sepsis (PPV=9.8%), Romano et al15
reported a higher PPV for PSI 13—sepsis (45%). Consistent with our finding for PSI 15—laceration (PPV=90.8%), Kaafarani et al16
reported high PPVs for PSI 15—laceration (91% and 85%).
Because estimating sensitivity with reasonable precision (ie, narrow 95% CIs) requires reviewing many charts for low-prevalence PSIs, all previous studies, except that conducted by Koch et al,
evaluated data quality using PPV. The PPV depends on the prevalence of the PSI and varies greatly across PSIs and studies. For example, the PPV for PSI 12—PE/DVT ranged from 22% to 79% across studies conducted in the USA.12–16
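The dependence of PPV on prevalence follows directly from Bayes' theorem. A minimal sketch (using hypothetical sensitivity, specificity and prevalence values, not figures from this study) illustrates how the same coding performance yields very different PPVs at different prevalences:

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical coding accuracy: sensitivity 80%, specificity 99%.
# PPV rises sharply as the PSI becomes more prevalent.
for prev in (0.001, 0.01, 0.05):
    print(f"prevalence={prev:.3f}  PPV={ppv(0.80, 0.99, prev):.2f}")
```

This is why PPVs for the same PSI can differ widely across studies even when hospitals code with similar accuracy: the underlying event rates differ.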
Koch et al18
compared agreement in PSIs among the ICD-9-CM data, the National Surgical Quality Improvement Program (NSQIP) and the Cardiovascular Information Registry (CVIR). Agreement was substantial for PSI 12—PE/DVT but poor for PSI 9—haemorrhage, PSI 11—respiratory failure and PSI 13—sepsis. Sensitivity was very low; for example, when the ICD-9-CM data and NSQIP were compared, it was 0.13% for PSI 9—haemorrhage, 1.35% for PSI 11—respiratory failure, 1.6% for PSI 12—PE/DVT and 0.13% for PSI 13—sepsis.
Variation in validity across PSIs is determined by factors related to physicians (ie, chart documentation) and to coders (ie, coding guidelines and coding practice). Coders code medical events after discharge based on chart documentation. We used chart review as our reference standard; therefore, the completeness of chart documentation could not be evaluated. Physicians might not document consequences of medical care in charts, leading to under-coding in hospital discharge abstracts. In addition, hospital coders are allotted a specific amount of time per chart on average, for example, 30 min in Alberta. Thus, in accordance with Canadian national coding guidelines, they might focus on coding diagnoses and procedures that contribute significantly to length of stay, such as PE/DVT, and ignore minor conditions such as infection or laboratory results that indicate sepsis. Our reviewers focused on determining the presence or absence of conditions based on all documented information in the chart, including diagnostic imaging and laboratory results. This is in contrast to general coding guidelines8
that instruct coders to confine their coding to clinical problems, conditions or circumstances that are identified in the record by the treating physicians as the clinically significant reason for the patient's admission, or that require or influence evaluation, treatment, management or care. Coders do not typically code problems that fail to meet these requirements, whereas the reviewers who conducted our ‘reference standard’ chart review included them regardless of the condition's significance for resource use during hospitalisation. Coders are also instructed that when a condition is suggested by diagnostic test results, they should code it only if it has been confirmed by physician documentation. Our previous studies demonstrated that hospital discharge abstract data quality is related not to coders’ employment status (full-time/part-time and length of employment) but to the quality of physician documentation.19
Excluding conditions present on admission improves PSI validity. For example, the PPV for PSI 12—PE/DVT increased from 79% to 89.5% when conditions present on admission were excluded. Canada has a long history of flagging the timing of condition occurrence. Some US and Australian states currently have similar data elements in their discharge abstract data, and the USA has recently begun coding the timing of conditions nationwide. Timing of condition occurrence is not intended to judge causal relationships between medical care and complications, only to flag whether the condition occurred or was diagnosed during the hospitalisation. To capture complications, Japan has specified fields for coding complications in its hospital discharge data, in addition to diagnoses and procedures.
Could AHRQ PSIs derived from hospital discharge abstract data be used to compare quality of care across countries and/or jurisdictions, or to monitor system performance within an institution? Because data quality contributes to the magnitude of PSIs, data validity has to be similar across comparison groups (such as countries, regions or jurisdictions) and over time. Thus, PSIs should not be compared across jurisdictions without validation, because adjustment for data validity is necessary. Our findings suggest that PSIs could be used to screen administrative data for potential cases with adverse events. Confirming the presence of these events requires additional clinical information, such as chart review. If PSIs are used for comparison, the validity of the data has to be adjusted for and considered in the analysis.
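One standard way to adjust an observed rate for known misclassification is the Rogan–Gladen correction. The sketch below (with hypothetical sensitivity and specificity values, not estimates from this study) shows how an under-coded rate can be corrected when validity parameters are available for each comparison group:

```python
def rogan_gladen(observed_rate: float, sensitivity: float, specificity: float) -> float:
    """Correct an observed (apparent) rate for misclassification.

    true_rate = (observed + specificity - 1) / (sensitivity + specificity - 1)
    Valid only when sensitivity + specificity > 1; result is clamped to [0, 1].
    """
    corrected = (observed_rate + specificity - 1) / (sensitivity + specificity - 1)
    return min(max(corrected, 0.0), 1.0)

# Hypothetical example: an observed PSI rate of 1.5% coded with
# sensitivity 0.60 and specificity 0.999 implies a higher true rate.
print(round(rogan_gladen(0.015, 0.60, 0.999), 4))
```

A correction of this kind requires validity estimates for each jurisdiction being compared, which is precisely why validation studies such as ours are a prerequisite for cross-jurisdictional use.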
If PSIs are used for monitoring quality-of-care improvement over time, the assumption of temporally consistent data validity has to be met. Unfortunately, we did not evaluate PSI validity over time. Quan et al21
evaluated the impact of ICD-10 implementation on data quality through the chart review of 32 conditions. Canadian ICD-10 data had significantly higher sensitivity for one condition and lower sensitivity for seven conditions relative to the ICD-9-CM data. The two databases had similar sensitivity values for the remaining 24 conditions. Walker et al22
compared coding practices between ICD-9-CM and ICD-10 and reported that, after implementation of ICD-10, the number of diagnoses coded decreased in four Canadian provinces and remained similar in the other five. Januel et al23
reported that of the 36 conditions assessed in Switzerland, κ values for the ICD-10 and chart data increased for 29 conditions and decreased for seven conditions compared with the ICD-9-CM and chart data.
Our study has limitations. First, of the 20 AHRQ PSIs, we intentionally evaluated five conditions that were expected to have high validity. The remaining PSIs should be evaluated in future studies. Second, we used chart data as the reference standard; conditions not documented in the chart could not be captured. Prospective data collection through clinical examination of these events should be conducted to establish a near-gold standard. Third, this study was conducted in one urban area; the validity of PSIs might vary across institutions or regions. Fourth, we evaluated validity using PPV alone. Sensitivity, specificity and NPV should be assessed for all the PSIs, although ascertaining sensitivity requires a large sample size and expensive, time-consuming resources because of the low prevalence of PSIs. Fifth, the sample sizes for certain PSIs were small, and their 95% CIs are relatively wide.
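The width of a 95% CI for a PPV shrinks roughly with the square root of the number of charts reviewed. A short sketch using the Wilson score interval (the chart counts are hypothetical) shows why small samples produce wide intervals:

```python
from math import sqrt

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a proportion (eg, a PPV)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# The same 90% point estimate from 20 vs 200 reviewed charts:
print(wilson_ci(18, 20))    # wide interval
print(wilson_ci(180, 200))  # much narrower interval
```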
In conclusion, our study supports the use of PSIs for case finding in ICD-10 hospital discharge abstract data. Even PSIs with low PPVs could be used to identify potential cases from the large volume of admissions for verification through chart review. In contrast, their sensitivity has not been well characterised because of the inherent challenge of reviewing the huge number of charts required to test sensitivity properly. Therefore, users should be cautious about employing PSIs for ‘quality of care reporting’ that presents PSI rates, because under-coded data would generate falsely low rates.