To determine the percentage of hospitals with adequate sample size to meaningfully compare performance using the Agency for Healthcare Research and Quality (AHRQ) pediatric inpatient quality indicators (PDIs). The PDIs measure pediatric inpatient adverse events such as decubitus ulcer and infections due to medical care, have been nationally endorsed, and are currently publicly reported in at least two states.
We performed a cross-sectional analysis of California hospital discharges from 2005–2007 for patients <18 years old. For nine hospital-level PDIs, after excluding discharges in which the PDI event was indicated as present on admission, we determined the volume of eligible pediatric patients for each measure at each hospital, the statewide mean rate, and the percentage of hospitals with adequate volume to identify an adverse event rate twice the statewide mean.
Unadjusted California-wide event rates for the PDIs during the study period (N=2,333,556 discharges) ranged from 0.2 to 38 per 1000 discharges. For example, event rates were 0.2/1000 for iatrogenic pneumothorax in non-neonates, 19/1000 for post-operative sepsis, and 38/1000 for pediatric heart surgery mortality, requiring patient volumes of 49,869, 419, and 201, respectively, to detect an event rate twice the statewide average; 0%, 6.6%, and 25% of California hospitals had these pediatric volumes.
Using these AHRQ-developed, nationally endorsed measures of the quality of inpatient pediatric care, one would be unable to identify many hospitals with performance two times worse than the statewide average, because of extremely low event rates and inadequate pediatric volume at most hospitals.
The quality of pediatric healthcare in the United States is less than optimal. In the outpatient setting, children receive recommended care only half the time1 and they are at significant risk for patient safety events in inpatient and emergency department settings.2–3 Efforts to improve the quality of pediatric healthcare depend on the availability of measures that accurately describe the quality of care delivered and are feasible to implement.4–5 Identifying and defining such measures is ongoing—multiple national organizations are developing or reporting pediatric quality indicators,6–11 and the Children’s Health Insurance Program Reauthorization Act of 2009 requires the development and refinement of pediatric quality measures for children nationally.12
The pediatric quality indicators (PDIs) developed by the Agency for Healthcare Research and Quality (AHRQ) are among the pediatric quality measures currently available and readily implementable using existing administrative data.10–11 AHRQ’s PDIs were designed to provide valid estimates, for population-based assessment or hospital comparison, of adverse events associated with significant morbidity and mortality, such as decubitus ulcer, foreign body left in during a procedure, and iatrogenic pneumothorax. A subset of the PDIs has been endorsed by the National Quality Forum,13–14 and selected for mandated public reporting in Texas and Vermont, and other states are contemplating the same.15–17 The National Quality Forum endorsement is likely to lead to wider use of these measures for comparative purposes.18 It is therefore crucial to examine the strengths and limitations of the PDIs for use in hospital comparison.
A priori, we know that the combination of low event rates and small numbers of patients eligible for a quality measure (i.e., small sample size) can impair our ability to distinguish between hospitals delivering high- and low-quality care. Previous work has illustrated the degree of this problem for adult hospital care,19–20 but we are unaware of work examining its extent for inpatient pediatric measures. While the PDIs measure important events, these events occur infrequently,10 and sample size varies widely across hospitals: the volume of children admitted is quite large at the minority of hospitals that are primarily pediatric and much smaller at the vast majority of hospitals that are not.21–24 Because low event rates and small sample sizes for the PDIs may impair our ability to compare hospitals on the quality of their pediatric care,25–26 we undertook this analysis to quantify this possible limitation, given the PDIs’ current policy relevance through National Quality Forum endorsement and use in statewide public reporting programs.13, 15–16 Using hospital discharge data from California, we determined mean PDI rates, the number of pediatric discharges from each hospital, and the proportion of hospitals with sufficient pediatric volume to identify adverse event rates twice the statewide mean. We anticipate that this analysis will inform the consideration of other inpatient pediatric quality measures, particularly measures proposed for comparative performance rating of all hospitals regardless of pediatric volume (e.g., for statewide public reporting).
The guiding premise of the following methodological choices was to provide a ‘best case’ scenario – that is, to determine the greatest proportion of hospitals that could meaningfully participate in quality reporting using the PDIs and existing data and public reporting conventions.
We used publicly available administrative data for 2005–2007 from the California Office of Statewide Health Planning and Development (OSHPD), which compiles discharge abstracts for all discharges from non-Federal hospitals in California, where 12% of United States children reside.27 Although it would have been preferable to provide national estimates of the scope of this problem, we used California’s discharge data because it offered two key methodological advantages over the two nationally representative databases containing pediatric admissions, the Healthcare Cost and Utilization Project’s Nationwide Inpatient Sample (NIS)28 and Kids’ Inpatient Database (KID).29 First, the OSHPD database contains a “present on admission” (POA) indicator for each diagnostic code; this indicator is not present in the NIS or KID datasets. California is one of only two states, and the larger of the two, that include POA indicators. The POA indicator is essential for excluding adverse events that are not due to care provided by that hospital during the admission; failing to exclude these would inappropriately inflate event rates and overestimate our ability to compare hospital quality. Second, OSHPD data can be used to calculate event rates for individual hospitals; hospital-level event rates cannot be calculated from the KID because it samples only a subset of discharges from participating hospitals.29
For all PDIs, a main inclusion criterion is age <18 years. To protect patient identities, OSHPD masks age information in a proportion of its public discharge records; none of the PDIs we analyzed requires a specific age in years. To restrict the dataset by age, we used the “age-in-years” variable when it was available (53% of discharges); for the remainder of discharges, we used two categorical age variables when they were available (40% of discharges), and imputation from diagnosis related groups (DRGs) for discharges with no age variables available (7%). Appendix A describes these methods in detail.
We examined hospital volume of eligible discharges over three years, rather than, for example, one or five years, for practical and conceptual reasons. In preliminary analyses, so few hospitals (range: 0%–6%) had sufficient volume in a single year to identify poor performers that we decided to pool data across multiple years. Assessing hospital performance over five years, however, would have required data too old to indicate current performance, thereby diluting or concealing any recent quality improvements. This concern also applies to three years of data, but less severely. Notably, the United States Department of Health and Human Services recently adopted a three-year aggregation for benchmarking comparison in its public reporting of outcome measures on the Hospitalcompare.gov website.30
The UCSF institutional review board approved the study.
We examined nine of the 13 provider-level PDIs (Table 1). The development and technical specifications of the PDIs are available from AHRQ.11 Each indicator defines its population at risk and its case inclusion and exclusion criteria using ICD-9 codes, length of stay, and patient characteristics such as birth weight. Eligible patients were identified and indicator rates were calculated using AHRQ software.11
We did not examine the remaining four provider-level AHRQ PDIs for three reasons. First, the OSHPD database lacked age in days, which is needed to determine eligibility for the iatrogenic pneumothorax in neonates PDI.11 Second, the transfusion reaction and foreign body left in during a procedure indicators were considered “never events,” intended to trigger an immediate hospital response and to be reported as absolute counts rather than rates.11, 31 Third, the hospital volume of pediatric cardiac surgeries was redundant with our study question (and is not a rate). We do not address the area-level PDIs (e.g., diabetes short-term complication rate, asthma admission rate), as these were created to monitor broader systems of care rather than individual hospitals.11
We determined the total number of events, eligible discharges, number of hospitals caring for eligible children, and statewide event rates for each PDI using OSHPD data for 2005–2007. Using the statewide mean PDI rates, we calculated the volume of discharges needed to detect a doubling of PDI rates. We chose a two-fold poorer performance a priori to highlight the proportion of hospitals that lacked the volume needed to detect what could be considered a moderate deficiency. This level of deficiency has been used as a detection threshold in prior work assessing other AHRQ quality indicators.19 Smaller differences may be clinically significant, but would be less likely to be detectable given the low national event rates for the PDIs.10
Lastly, we determined the percentage of California hospitals with enough pediatric discharges to detect a doubling of each adverse event-based PDI rate, and the proportion of pediatric discharges that occurred at hospitals with adequate volume.
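The pooled statewide rate and the two adequacy percentages described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `HospitalCounts` record and all numbers are hypothetical, counts are assumed to be already pooled over 2005–2007 for a single PDI, and the denominator for the hospital percentage is assumed to be hospitals with at least one eligible discharge. The required volume of 201 is borrowed from the cardiac surgery mortality example purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HospitalCounts:
    """Hypothetical per-hospital totals for one PDI, pooled over the study years."""
    name: str
    events: int    # adverse events (numerator)
    eligible: int  # eligible discharges (denominator)


def statewide_rate(hospitals: List[HospitalCounts]) -> float:
    """Pooled statewide rate: total events divided by total eligible discharges."""
    total_events = sum(h.events for h in hospitals)
    total_eligible = sum(h.eligible for h in hospitals)
    return total_events / total_eligible


def adequacy_summary(hospitals: List[HospitalCounts], required_n: int) -> Tuple[float, float]:
    """Return (% of hospitals with adequate volume, % of eligible discharges at those hospitals).

    Only hospitals with at least one eligible discharge enter the denominators.
    """
    caring = [h for h in hospitals if h.eligible > 0]
    adequate = [h for h in caring if h.eligible >= required_n]
    pct_hospitals = 100 * len(adequate) / len(caring)
    pct_discharges = 100 * sum(h.eligible for h in adequate) / sum(h.eligible for h in caring)
    return pct_hospitals, pct_discharges


# Toy data: one large hospital and several small ones (hypothetical)
toy = [
    HospitalCounts("A", events=2, eligible=900),
    HospitalCounts("B", events=0, eligible=150),
    HospitalCounts("C", events=1, eligible=120),
    HospitalCounts("D", events=0, eligible=30),
    HospitalCounts("E", events=0, eligible=0),  # no eligible patients: not counted
]
print(statewide_rate(toy))                     # 3 / 1200 = 0.0025
print(adequacy_summary(toy, required_n=201))   # (25.0, 75.0)
```

The toy output mirrors the pattern in the results: only one hospital in four qualifies, yet it accounts for most eligible discharges.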
To assess the generalizability of the California hospital estimates, we used the NIS, which samples all admissions from each hospital in a nationally representative group of hospitals.28 Using 2007 data, we compared the annual median number and interquartile range of eligible discharges per hospital in California (OSHPD) with those in hospitals nationally (NIS).
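The median and interquartile range comparison can be sketched with Python's standard library. The volumes below are hypothetical, and percentile conventions differ between packages (Python's `method="inclusive"` is not guaranteed to match the definition Stata uses), so this only illustrates the summary being compared, not the published values.

```python
from statistics import median, quantiles
from typing import List, Tuple


def median_iqr(volumes: List[int]) -> Tuple[float, float, float]:
    """Return (median, 25th percentile, 75th percentile) of per-hospital eligible discharges."""
    q1, _, q3 = quantiles(volumes, n=4, method="inclusive")
    return median(volumes), q1, q3


# Hypothetical annual eligible-discharge counts per hospital for one PDI
california = [10, 20, 30, 40, 50]
national = [5, 12, 18, 60, 200]
print(median_iqr(california))  # (30, 20.0, 40.0)
print(median_iqr(national))    # (18, 12.0, 60.0)
```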
The AHRQ software can incorporate POA variables and uses an algorithm to exclude hospitalizations in which the PDI event was POA. In this way, hospitals that admit patients to address complications (e.g., surgical revision for a decubitus ulcer) are not penalized. We used POA indicators because prior work has shown that rates of some PDIs differ considerably depending on whether POA conditions are excluded.32 Because only one other state collects POA variables, so that most states implementing the PDIs cannot exclude POA conditions, we also conducted a sensitivity analysis that included discharges with conditions present on admission.
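The effect of POA exclusion on a crude rate can be sketched as below. This is a deliberately simplified, hypothetical stand-in for the AHRQ algorithm: the real software also applies indicator-specific inclusion and exclusion rules, and the diagnosis codes here are placeholders, not ICD-9 codes. The assumption is that a discharge whose qualifying diagnosis was flagged POA is dropped from both numerator and denominator.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple


@dataclass
class Discharge:
    """Hypothetical discharge: list of (diagnosis_code, present_on_admission) pairs."""
    diagnoses: List[Tuple[str, bool]] = field(default_factory=list)


def pdi_rate(discharges: List[Discharge], event_codes: FrozenSet[str], exclude_poa: bool = True) -> float:
    """Crude event rate for one indicator.

    With exclude_poa=True, a discharge whose qualifying diagnosis was flagged
    present on admission is removed from numerator and denominator, so the
    hospital is not charged for a condition the patient arrived with.
    """
    events = 0
    at_risk = 0
    for d in discharges:
        flags = [poa for code, poa in d.diagnoses if code in event_codes]
        if exclude_poa and any(flags):
            continue  # condition arrived with the patient
        at_risk += 1
        if flags:
            events += 1
    return events / at_risk


codes = frozenset({"EVENT_CODE"})  # placeholder code set
sample = [
    Discharge([("EVENT_CODE", False)]),  # hospital-acquired event
    Discharge([("EVENT_CODE", True)]),   # event present on admission
    Discharge([("OTHER_CODE", False)]),
    Discharge([]),
]
print(pdi_rate(sample, codes))                     # 1 / 3
print(pdi_rate(sample, codes, exclude_poa=False))  # 2 / 4
```

As in the sensitivity analysis, counting POA conditions raises the crude rate.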
The statewide mean event rate for each PDI served as the benchmark for our sample size calculations, which determined the volume needed to detect a rate two-fold higher (worse) than this benchmark using a one-sided test (alpha=0.05) with 80% power. Using a one-sided test and power below 90% yields a lower, and therefore more conservative, estimate of the sample size necessary to detect a difference between hospitals, consistent with our best-case approach. We assumed that the statistical test used to compare proportions would be a Chi-squared test, and we used a one-sample test because we were comparing hospital rates with a fixed benchmark.
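The calculation above can be sketched with the standard normal-approximation formula for a one-sided, one-sample test of a proportion against a fixed benchmark p0, with alternative p1 = 2·p0: n = [(z(1−α)·√(p0(1−p0)) + z(1−β)·√(p1(1−p1))) / (p1 − p0)]². We assume this matches the calculation described in the text; the exact Stata command the authors used is not stated, and `one_sample_n` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def one_sample_n(p0: float, ratio: float = 2.0, alpha: float = 0.05, power: float = 0.80) -> int:
    """Eligible discharges needed to detect a rate `ratio` times the benchmark p0.

    Normal-approximation sample size for a one-sided, one-sample test of a
    proportion (assumed here to correspond to the test described in the text).
    """
    p1 = ratio * p0
    if not 0 < p0 < p1 < 1:
        raise ValueError("need 0 < p0 < ratio * p0 < 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)       # normal quantile for the desired power
    spread = z_alpha * math.sqrt(p0 * (1 - p0)) + z_beta * math.sqrt(p1 * (1 - p1))
    return math.ceil((spread / (p1 - p0)) ** 2)


# Statewide rates quoted in the text, per 1000 eligible discharges (rounded)
examples = {
    "Iatrogenic pneumothorax, non-neonates": 0.2,
    "Post-operative sepsis": 19,
    "Pediatric heart surgery mortality": 38,
}
for label, per_1000 in examples.items():
    print(f"{label}: {one_sample_n(per_1000 / 1000)} discharges")
```

With the rounded rates, this returns 201 for heart surgery mortality, matching the reported figure; for the rarer events it returns somewhat smaller numbers than the reported 419 and 49,869, because the rates quoted in the text are rounded. The same function gives roughly 190 to 390 discharges for rates of 20–40/1000.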
We used AHRQ software programmed per its technical specifications (AHRQ Windows Indicator Software, version 3.2, Rockville, MD)33 to calculate unadjusted PDI rates from administrative data for discharges at each hospital and statewide. All other calculations were performed using Stata/IC 10 (StataCorp, College Station, TX).
In total, there were 2,333,556 pediatric discharges from 381 California hospitals in 2005–2007. Statewide PDI rates ranged from 0.2/1000 eligible discharges for iatrogenic pneumothorax in non-neonates to 38/1000 for cardiac surgery mortality. There were few events statewide for any of the PDIs (Table 1).
The median number of pediatric discharges on which to measure quality at individual hospitals in California ranged from 3 to 373 annually, depending on the particular quality measure. Hospitals in the NIS had similar median numbers of eligible discharges for some measures and somewhat lower numbers for others (Table 2). For some measures, almost all hospitals were eligible for performance measurement (e.g., 97% of hospitals had patients eligible for measurement of selected infections due to medical care), while for others, only a small proportion of hospitals were eligible (6% of hospitals performed pediatric cardiac surgery).
Depending on the measure, the number of discharges required to detect a doubling of the adverse event-based PDI measures ranged from 201 (cardiac surgery mortality) to 49,869 (iatrogenic pneumothorax in non-neonates). Few hospitals had sufficient pediatric discharge volume over three years (2005–2007) to detect a doubling of rates for any of the indicators (maximum of 25% of hospitals with discharges for pediatric cardiac surgery, Table 3). Figure 1 illustrates the relationship for each indicator between the number of eligible discharges in hospitals in California 2005–2007 (the median and the interquartile range), the state mean rate for the indicator, and a line depicting the number of discharges needed to detect a doubling of event rate over a range of rates (0.1–1000 events per 1000 discharges). The proportion of discharges from hospitals with adequate volume ranged from 0% to 92%, depending on the indicator (Table 3).
The analysis that included all events, without excluding those that were present on admission, showed slightly increased event rates (Table 1) and a resulting decrease in the number of discharges needed to detect a two-fold difference in event rates, but found that only a few hospitals had adequate volume to detect the differences (up to 28% of hospitals with discharges eligible for “Selected infections due to medical care”, data not shown).
This study quantifies the limited utility of AHRQ PDIs for hospital-level public reporting due to low rates of the adverse events and low volumes of children in most hospitals. While many children who are eligible for a subset of the PDIs are seen at hospitals with adequate volumes to detect a doubling of the rate, there are many hospitals admitting pediatric patients that have inadequate volumes for any measure (Table 3). We found that 25% or fewer of California hospitals had pediatric volumes that would permit detection of a doubling of PDI rates, even though we aggregated three years of data.
This is also the first study to assess PDI rates using a database containing present on admission (POA) indicators. As expected, adverse event rates were lower when POA indicators were used (Table 1), and they were lower than those in two prior reports,10, 32 one of which used a national database that did not contain POA indicators.10 The percentage of hospitals with adequate volume to detect a two-fold difference in PDI event rates was low in our main and sensitivity analyses, regardless of the use of POA indicators.
A performance measurement system for identifying high adverse event rates that cannot identify hospitals with rates twice the state mean could mislead consumers, purchasers, and even the hospitals themselves into believing that such hospitals’ performance is near the benchmark. If one restricted quality measurement to hospitals with adequate pediatric volume, this strategy would leave completely unmeasured the care delivered at lower-volume institutions and could exclude a large proportion (8–100%) of pediatric admissions (Table 3).
The implications of our findings are immediately relevant to those states publicly reporting the quality of inpatient pediatric care. We are aware of three efforts to do so: Texas and Vermont are currently reporting hospital-level performance for a subset of the PDIs,15–16 and Florida is considering the same.17 In Texas, even hospitals with very low volumes (e.g., fewer than 5) are categorized as high, average, or low quality,15 and the findings in this paper imply that “average” labels applied to hospitals with such volumes may be misleading.
Our study suggests some possible solutions to the pediatric problem of low event rates and small sample sizes. The analysis shows that for some measures, most remarkably pediatric cardiac surgery mortality, almost all patients in the state eligible for the measure are seen at hospitals with adequate volume to detect a two-fold difference (Table 3). This provides evidence that some specialized care is regionalized, and demonstrates that the quality of this type of care can be assessed and compared using this measure at institutions where many children are seen. Thus, one approach to pediatric inpatient quality measurement is to compare only high-volume hospitals, using the measures most appropriate to their patient populations. However, a proportion of children are admitted to a large group of hospitals without adequate volume, so the need remains for measures that can assess care for children seen at non-regional centers.
The analysis also shows that measures with low event rates (e.g., 0.6/1000 for accidental puncture) require a hospital volume that rarely occurs in pediatrics. Hence, another approach is to compare hospitals only on outcomes with a minimum event rate (e.g., 20–40/1000, requiring a volume of 200–400 patients annually to detect a two-fold difference, Tables 1 and 3). In order to develop measures that are likely to have adequate event rates and apply to patients seen at most hospitals, not just tertiary care referral centers, we need a better understanding of the ecology of inpatient pediatric care: what types of patients (defined by diagnosis or age) are seen at lower-volume hospitals, and what types of relevant outcomes or processes of care occur often enough in most hospitals caring for children to demonstrate variability in hospital quality.
Additionally, to improve public reporting for existing measures with a low event rate for which most care is delivered at hospitals with adequate volume, such as selected infections due to medical care or postoperative sepsis, reporting conventions could be changed to reflect uncertainty when it exists. A common convention is to report only on hospitals with ≥30 eligible patients for a measure.34 Alternatively, Texas or Vermont, for instance, could decide on a clinically meaningful difference that must be detectable before it reports performance (e.g., “A hospital must have enough volume that we could detect an adverse event rate ‘X’ times the statewide mean, given the hospital’s expected event rate”). For hospitals whose volume is below the minimum number, the report could note “too few cases to rate,” which provides better information than labeling such hospitals as “average.” Another advantage of this approach is that basing the required volume on each hospital’s expected rate allows the minimum volume to vary with that hospital’s patient population: hospitals caring only for lower-risk patients would need more patients to reach the reporting threshold, and hospitals with higher-risk patients would need fewer. This would be statistically and clinically more appropriate than a single threshold applied regardless of a hospital’s patient population.
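The hospital-specific reporting rule proposed above can be sketched as follows, under explicit assumptions: `expected_rate` would come from a risk-adjustment model that is not shown, the multiplier X=2 together with the alpha and power values are illustrative choices, the sample size reuses the one-sample normal-approximation formula, and "eligible for rating" is a hypothetical placeholder for whatever performance category a state would then assign.

```python
import math
from statistics import NormalDist


def required_volume(expected_rate: float, ratio: float = 2.0, alpha: float = 0.05, power: float = 0.80) -> int:
    """Minimum eligible patients needed to detect `ratio` times the hospital's expected rate."""
    p0, p1 = expected_rate, ratio * expected_rate
    if not 0 < p0 < p1 < 1:
        raise ValueError("need 0 < expected_rate < ratio * expected_rate < 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided test
    z_beta = NormalDist().inv_cdf(power)
    spread = z_alpha * math.sqrt(p0 * (1 - p0)) + z_beta * math.sqrt(p1 * (1 - p1))
    return math.ceil((spread / (p1 - p0)) ** 2)


def rating_label(eligible: int, expected_rate: float, ratio: float = 2.0) -> str:
    """Hypothetical public-report label: hospitals below their own minimum volume are not rated."""
    if eligible < required_volume(expected_rate, ratio):
        return "too few cases to rate"
    return "eligible for rating"


# Two hypothetical hospitals with the same volume but different patient risk
print(rating_label(150, expected_rate=0.038))  # lower-risk mix: needs 201 patients
print(rating_label(150, expected_rate=0.06))   # higher-risk mix: needs fewer
```

The same volume of 150 patients is sufficient for the higher-risk hospital but not for the lower-risk one, which is the behavior the paragraph describes.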
Lastly, another solution to the issue of small numbers is to create a composite measure of multiple outcomes. In 2008, the PDI workgroup at AHRQ designed a composite PDI measure,35 a summarized, risk-adjusted, weighted measure of all provider-level PDIs except pediatric cardiac surgery. Composite measures have two likely advantages: by aggregating data to increase the numbers of patients and events, they improve the ability to detect differences between groups; and they may capture the quality of hospital care more globally than individual measures.35 We do not report on the PDI composite here for several reasons. It combines disparate indicators (e.g., decubitus ulcer rate, post-operative wound dehiscence, infections due to medical care, and others) and so does not reflect one underlying domain of care (e.g., surgical care). This obscures from consumers of a public report the reasons for a hospital’s poor performance on the composite and so limits its usefulness in consumer decision making. It is also not a rate and so was not amenable to our analysis. In addition, the current specifications allow the user to determine the weight each indicator contributes to the composite score, which limits its usefulness in standardized comparisons of publicly reported hospital performance,33 and means that the sample size needed to detect a given difference in hospital performance varies with the weighting scheme used. Thus, though the PDI composite has the potential to address the limitations of event rate and hospital volume noted in our analysis, these substantive and methodological questions, which are beyond the scope of this paper, remain.
This study is limited by three main factors. First, our use of OSHPD data, though a strength because it allowed us to exclude patients whose outcomes do not reflect inpatient quality of care, potentially limits the generalizability of our findings. California hospitals are, however, similar to hospitals nationally in their eligible patient volumes (Table 2), suggesting that the limitations we demonstrate in California for detecting differences in hospital quality will apply similarly to hospitals nationwide.
Second, this study used a data imputation method to identify patients aged <18 years. This may have systematically biased estimates if adverse events were more or less common in a group of pediatric discharges excluded by the imputation. Because age masking is applied randomly, however, we think such systematic bias is unlikely.
Third, we assumed the same risk level for each group of patients eligible for measures at individual hospitals. In states that implement the PDIs, hospitals with lower-risk patients will have lower expected outcome rates than the state mean and so would need a larger number of eligible patients to detect a rate twofold the state mean rate (and hospitals with higher expected rates would need fewer patients). Thus, within each PDI, the sample size required to detect a twofold difference actually varies among hospitals as their expected rates vary. Our comparison of all hospitals with the statewide mean is therefore only an estimate, but one that adequately illustrates the limitations of sample size in the use of low event rate indicators for hospital comparison.
The PDIs are adverse event measures that are easy to calculate in most states and are being used by some to compare all hospitals caring for children. We find, however, that using the PDIs in this way is limited by low adverse event rates among pediatric patients and low hospital volumes of pediatric patients, even when combining three years of data.
We would like to acknowledge James Anderson, BA, at the University of California San Francisco for his assistance in the data management and figure creation.
Dr. Bardach is supported by an Institutional Training Grant from the National Institute of Child Health and Human Development (T32 HD044331; PI: Hawgood).
Dr. Chien is supported by a Career Development Award from the Agency for Healthcare Research and Quality (K08 HS17146-01. PI: Chien). Dr. Dudley’s work was supported by a Robert Wood Johnson Foundation Investigator Award in Health Policy.
Appendix A. Determination of the population aged <18 years using the publicly available OSHPD dataset
For all PDIs, a main inclusion criterion is age <18 years. A proportion of the OSHPD public dataset discharges have the “age-in-years” variable masked to protect patient privacy. To limit the 2005–2007 OSHPD dataset of all discharges to discharges of patients aged <18 years, we used the “age-in-years” variable when it was available (53% of discharges); when this variable was not available, we used two categorical age variables, a 20-category variable (used in 33% of the discharges) and a 5-category variable (used in 7% of the discharges), and diagnosis related groups (DRGs) for the remainder of the discharges (7%).
The 20-category age variable was defined as follows: under 1 year, 1–4 years, 5–9 years, 10–14 years, 15–19 years, and 15 further categories for ages >19 years. We considered all discharges in the under 1 year through 10–14 years categories to be <18 years. For the 15–19 year category, we set the age in years for these discharges to the mean age of discharges in the category, which was greater than 18 years. Consequently, none of the discharges in the 15–19 year category (1.3% of all discharges) were included in the analysis, because the mean age for patients in that category with only that age variable available was >18 years. The 5-category age variable included a category of 0–17 years; we excluded any discharges in the other 4 categories.
For observations with all age variables masked, we imputed age based on DRG category as follows: age was set to the mean age of all discharges with the same DRG that also had the age in years variable available. Lastly, the AHRQ software excludes any discharges with an adult DRG, but none of the discharges in our cleaned dataset met this criterion for exclusion.
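The cascade described in this appendix can be sketched as a single classification function. This is a hypothetical illustration, not the authors' Stata code: the `Record` fields and the category labels are invented stand-ins for OSHPD's actual variable names and codes, the mean age assigned to the 15–19 year category is passed in as a parameter (the appendix reports only that it exceeded 18), and records with no age variables and a DRG lacking reference ages are treated as not pediatric, which is an assumption the appendix does not address.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    """Hypothetical discharge record; field names and labels are illustrative only."""
    age_years: Optional[int] = None
    age_cat20: Optional[str] = None  # e.g. "<1", "1-4", "5-9", "10-14", "15-19", ...
    age_cat5: Optional[str] = None   # e.g. "0-17", ...
    drg: Optional[str] = None


UNDER_15 = {"<1", "1-4", "5-9", "10-14"}


def drg_mean_ages(records: List[Record]) -> Dict[str, float]:
    """Mean age per DRG, using only records where age in years is available."""
    totals = defaultdict(lambda: [0, 0])
    for r in records:
        if r.age_years is not None and r.drg is not None:
            totals[r.drg][0] += r.age_years
            totals[r.drg][1] += 1
    return {drg: s / n for drg, (s, n) in totals.items()}


def is_under_18(r: Record, mean_age_15_19: float, drg_means: Dict[str, float]) -> bool:
    """Apply the fallback order: age in years, 20-category, 5-category, DRG imputation."""
    if r.age_years is not None:
        return r.age_years < 18
    if r.age_cat20 is not None:
        if r.age_cat20 in UNDER_15:
            return True
        if r.age_cat20 == "15-19":
            return mean_age_15_19 < 18  # whole category gets the category mean age
        return False  # categories for ages >19
    if r.age_cat5 is not None:
        return r.age_cat5 == "0-17"
    if r.drg in drg_means:
        return drg_means[r.drg] < 18  # age imputed from DRG mean
    return False  # assumption: no usable age information


reference = [Record(age_years=5, drg="A"), Record(age_years=30, drg="A"), Record(age_years=2, drg="B")]
means = drg_mean_ages(reference)  # {"A": 17.5, "B": 2.0}
print(is_under_18(Record(drg="A"), 18.4, means))            # imputed 17.5 -> True
print(is_under_18(Record(age_cat20="15-19"), 18.4, means))  # category mean 18.4 -> False
```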
Conflicts of interest:
The authors have no potential conflicts of interest to disclose.
Data access: Dr. Bardach had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.