This study quantifies the limited utility of the AHRQ PDIs for hospital-level public reporting, owing to low adverse event rates and low pediatric volumes at most hospitals. While many children eligible for a subset of the PDIs are seen at hospitals with adequate volume to detect a doubling of the rate, many hospitals admitting pediatric patients have inadequate volume for any measure. We found that 25% or fewer of California hospitals had pediatric volumes that would permit detection of a doubling of PDI rates, even though we aggregated three years of data.
This is also the first study to assess PDI rates using a database containing present on admission (POA) indicators. As expected, we found that adverse event rates were lower when POA indicators were used, and were lower than in two prior reports,10,32 one of which used a national database that did not contain POA indicators.10
The percentage of hospitals with adequate volume to detect a two-fold difference in PDI event rates was low in our main and sensitivity analyses, regardless of the use of POA indicators.
A performance measurement system for identifying high adverse event rates that cannot identify hospitals with rates twice the state mean could mislead consumers, purchasers, and even the hospitals themselves into believing that such hospitals’ performance is near the benchmark. Restricting quality measurement to hospitals with adequate pediatric volume, on the other hand, would leave the care delivered at lower-volume institutions completely unmeasured and could exclude a large proportion (8–100%) of pediatric admissions.
The implications of our findings are immediately relevant to states that publicly report the quality of inpatient pediatric care. We are aware of three such efforts: Texas and Vermont currently report hospital-level performance for a subset of the PDIs,15–16 and Florida is considering doing the same.17 In Texas, even hospitals with very low volumes (e.g., fewer than 5 eligible patients) are categorized as high, average, or low quality,15 and our findings imply that “average” labels applied to hospitals with such volumes may be misleading.
Our study suggests some possible solutions to the problem of low event rates and small sample sizes in pediatrics. The analysis shows that for some measures, most strikingly pediatric cardiac surgery mortality, almost all patients in the state eligible for the measure are seen at hospitals with adequate volume to detect a two-fold difference. This indicates that some specialized care is regionalized, and that the quality of such care can be assessed and compared using this measure at the institutions where many children are seen. Thus, one approach to pediatric inpatient quality measurement is to compare only high-volume hospitals, using measures appropriate to their patient populations. However, a proportion of children are admitted to a large group of hospitals without adequate volume, so the need remains for measures that can assess the care of children seen at non-regional centers.
The analysis also shows that measures with low event rates (e.g., 0.6/1000 for accidental puncture) require a hospital volume that rarely occurs in pediatrics. Hence, another approach is to compare hospitals only on outcomes with a minimum event rate (e.g., 20–40/1000, which requires a volume of 200–400 patients annually to detect a two-fold difference). To develop measures that are likely to have adequate event rates and that apply to patients seen at most hospitals, not just tertiary care referral centers, we need a better understanding of the ecology of inpatient pediatric care: what types of patients (defined by diagnosis or age) are seen at lower-volume hospitals, and what relevant outcomes or processes of care occur often enough in most hospitals caring for children to demonstrate variability in hospital quality.
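To make the arithmetic behind these volume thresholds concrete, the following minimal sketch computes the number of eligible patients needed to detect a doubling of an event rate. It assumes a one-sided, normal-approximation one-sample test of a hospital’s rate against the statewide mean at alpha = 0.05 with 80% power; these particular test settings are our illustrative assumptions, not a restatement of the study’s methods.

    # Minimal sketch: eligible patients needed to detect a doubling of an
    # event rate. Assumes a one-sided normal-approximation test at
    # alpha = 0.05 with 80% power; these settings are illustrative
    # assumptions, not the study's actual specification.
    from math import ceil, sqrt
    from scipy.stats import norm

    def n_to_detect_doubling(p0: float, alpha: float = 0.05, power: float = 0.80) -> int:
        """Patients needed to detect a rate of 2*p0 against a baseline of p0."""
        p1 = 2 * p0
        z_alpha = norm.ppf(1 - alpha)   # one-sided critical value
        z_beta = norm.ppf(power)
        num = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
        return ceil((num / (p1 - p0)) ** 2)

    print(n_to_detect_doubling(0.0006))  # 0.6/1000 (accidental puncture): >13,000 patients
    print(n_to_detect_doubling(0.02))    # 20/1000: roughly 400 patients
    print(n_to_detect_doubling(0.04))    # 40/1000: roughly 200 patients

Under these assumptions the function reproduces the orders of magnitude above: roughly 200–400 patients annually for rates of 20–40/1000, versus more than ten thousand for a rate of 0.6/1000.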
Additionally, to improve public reporting for existing measures with a low event rate but for which care is mostly delivered at hospitals with adequate volume, such as selected infections due to medical care or postoperative sepsis, reporting conventions could be changed to reflect uncertainty when it exists. A common convention is to report only on hospitals with ≥30 eligible patients for a measure.34
Alternatively, Texas or Vermont, for instance, could decide on a clinically meaningful difference that must be detectable before reporting performance (e.g., “A hospital must have enough volume that we could detect an adverse event rate ‘X’ times the statewide mean, given the hospital’s expected event rate”). For hospitals whose volume falls below that minimum, the report could note “too few cases to rate,” which is more informative than labeling such hospitals “average.” Another advantage of this approach is that basing the required volume on each hospital’s expected rate lets the minimum vary with that hospital’s patient population: hospitals caring only for lower-risk patients would need more patients to reach the reporting threshold, while hospitals with higher-risk patients would need fewer. This is statistically and clinically more appropriate than a single threshold applied regardless of patient population.
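A brief hypothetical illustration of such a rate-dependent threshold, reusing n_to_detect_doubling() from the sketch above (the hospital names, volumes, and expected rates are invented for illustration):

    # Hypothetical: two hospitals with identical volume but different
    # expected (risk-adjusted) event rates receive different reporting
    # decisions. All inputs below are invented.
    hospitals = [
        ("Hospital A", 600, 0.030),  # higher-risk patient mix
        ("Hospital B", 600, 0.010),  # lower-risk patient mix
    ]
    for name, volume, expected_rate in hospitals:
        n_min = n_to_detect_doubling(expected_rate)
        label = "reportable" if volume >= n_min else "too few cases to rate"
        print(f"{name}: minimum volume {n_min}; {label}")

With these invented inputs, Hospital A (minimum volume of roughly 260) is reportable while Hospital B (minimum volume of roughly 790) is not, despite identical volumes, which is exactly the population-dependent behavior described above.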
Lastly, another solution to the problem of small numbers is to create a composite measure of multiple outcomes. In 2008, the PDI workgroup at AHRQ designed a composite PDI measure,35 a summarized, risk-adjusted, weighted measure of all provider-level PDIs except pediatric cardiac surgery. Composite measures offer two advantages: they likely improve the ability to detect differences between groups, since aggregation increases the numbers of eligible patients and events, and they may capture the quality of hospital care more globally than individual measures.35
We do not report on the PDI composite here for several reasons. It combines disparate indicators (e.g., decubitus ulcer rate, postoperative wound dehiscence, infections due to medical care, and others) and so does not reflect a single underlying domain of care (e.g., surgical care); this obscures from consumers of a public report the reasons for a hospital’s poor performance on the composite, limiting its usefulness in consumer decision making. It also is not a rate and so was not amenable to our analysis. In addition, the current specifications allow the user to determine the weight each indicator contributes to the composite score, which limits its usefulness in standardized comparisons of publicly reported hospital performance33 and leads to variable sample sizes needed to detect the same differences in performance, depending on the weighting scheme used. Thus, although the PDI composite has the potential to address the limitations of event rate and hospital volume noted in our analysis, these substantive and methodological questions, which are beyond the scope of this paper, remain.
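To illustrate why user-determined weights frustrate standardized comparison, consider the following hypothetical sketch. The indicator ratios and both weighting schemes are invented, and this is not the AHRQ composite algorithm:

    # Hypothetical: the same hospital's composite score under two different
    # user-chosen weighting schemes. All values and weights are invented.
    ratios = {  # risk-adjusted observed/expected ratios for one hospital
        "decubitus_ulcer": 1.40,
        "wound_dehiscence": 0.70,
        "infection_due_to_care": 0.95,
    }
    scheme_1 = {"decubitus_ulcer": 0.6, "wound_dehiscence": 0.2, "infection_due_to_care": 0.2}
    scheme_2 = {"decubitus_ulcer": 0.2, "wound_dehiscence": 0.2, "infection_due_to_care": 0.6}

    def composite(ratios, weights):
        """Weighted average of risk-adjusted indicator ratios."""
        return sum(ratios[k] * weights[k] for k in ratios)

    print(round(composite(ratios, scheme_1), 2))  # 1.17: appears worse than expected
    print(round(composite(ratios, scheme_2), 2))  # 0.99: appears near average

The same hospital appears worse than expected under one scheme and near average under the other, so two reports using different weights cannot be compared directly.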
This study is limited by three main factors. First, our use of OSHPD data, though a strength because it allowed us to exclude patients whose outcomes do not reflect inpatient quality of care, potentially limits the generalizability of our findings. California hospitals are, however, similar to hospitals nationally in terms of their eligible patient volumes, suggesting that the limitations we demonstrate in California will apply to hospitals nationwide.
Second, this study used a data imputation method to identify patients aged <18 years. This could have systematically biased our estimates if adverse events were more or less common among the pediatric discharges excluded by the imputation. Because the age masking is applied randomly, however, we think such systematic bias is unlikely.
Third, we assumed the same risk level for all patients eligible for a given measure at each hospital. In states that implement the PDIs, hospitals with lower-risk patients will have expected outcome rates below the state mean and so would need more eligible patients to detect a rate twice the state mean (and hospitals with higher expected rates would need fewer). This means that, within each PDI, the sample size required to detect a twofold difference varies among hospitals as their expected rates vary. Our comparisons of all hospitals to the statewide mean are therefore only an estimate, but one that adequately illustrates the limitations of sample size in the use of low-event-rate indicators for hospital comparison.