Objective: To examine whether high performance on one measure of quality is associated with high performance on others and to develop a data-driven explanatory model of neonatal intensive care unit (NICU) performance.
Design: We conducted a cross-sectional data analysis of a statewide perinatal care database. Risk-adjusted NICU ranks were computed for each of 8 measures of quality selected on the basis of expert input. Correlations across measures were tested using the Pearson correlation coefficient. Exploratory factor analysis was used to determine whether underlying factors were driving the correlations.
Setting: Twenty-two regional NICUs in California.
Participants: In total, 5445 very low-birth-weight infants cared for between January 1, 2004, and December 31, 2007.
Main Outcome Measures: Pneumothorax, growth velocity, health care–associated infection, antenatal corticosteroid use, hypothermia during the first hour of life, chronic lung disease, mortality in the NICU, and discharge on any human breast milk.
Results: The NICUs varied substantially in their clinical performance across measures of quality. Of 28 unit-level correlations, only 6 were significant (P < .05). Correlations between pairs of quality measures were strong (ρ > .5) for 1 pair, moderate (.3 < |ρ| < .5) for 8 pairs, weak (.1 < |ρ| < .3) for 5 pairs, and negligible (|ρ| < .1) for 14 pairs. Exploratory factor analysis revealed 4 underlying factors of quality in this sample. Pneumothorax, mortality in the NICU, and antenatal corticosteroid use loaded on factor 1; growth velocity and health care–associated infection loaded on factor 2; chronic lung disease loaded on factor 3; and discharge on any human breast milk loaded on factor 4.
Conclusions: In this sample, the ability of individual measures of quality to explain overall quality of neonatal intensive care was modest.
Quality of care delivered by providers is increasingly scrutinized in an attempt to increase efficiency and improve the quality of patient care.1, 2 In other areas of medicine, performance measurement and financial incentives are common.3-5 In the neonatal intensive care unit (NICU) setting, multi-stakeholder health care organizations (such as the National Quality Forum6) and payers of health care are promoting performance assessments of perinatal care providers.
Two facets of performance measurement have received little attention. First, is it fair to draw conclusions about institutional performance from a single measure or a limited set of measures of quality of care? Such conclusions assume that measured aspects of quality reflect unmeasured aspects of care. However, a study of hospital quality assessments based on hospitalwide mortality rates alone found substantial discrepancies in performance depending on the method used to calculate mortality rates, which calls into question whether it is valid to draw conclusions about quality of care from hospitalwide mortality rates.7 In the NICU setting, good performance on one measure of quality (eg, the proportion of infants with chronic lung disease) is assumed to indicate good performance on related measures of quality (eg, duration of mechanical ventilation) and on unrelated measures (eg, rates of health care-associated infection).
The use of a limited quality measure set for comparative performance measurement would be supported if NICU performance were strongly correlated across multiple measures of quality of care. However, in other areas of health care, studies have found weak or no correlation across measures of quality of care.8-12 If intra-institutional correlations among quality measures are weak and performance is inconsistent, then inferences about quality from 1 or a few measures of quality are likely uninformative and potentially misleading.13 Instead, quality should be assessed by combining multiple measures of quality into 1 or more composite indicators of quality.14
Second, should quality improvement efforts be directed toward individual measures of quality or toward building more tightly connected systems of care, so that performance can be improved on several measures of quality simultaneously? Traditional approaches to quality improvement have typically addressed individual measures sequentially.15, 16 In many instances, this has promoted better, safer care, but the gains have often been temporary. A growing body of literature suggests that sustained and widespread improvements in quality require changes to the system in which care occurs. For example, improvements in unit safety culture, which varies widely across NICUs17, have been linked to lasting improvements.18, 19 Because the system supporting care delivery is interconnected with the quality of care provided, correlations between measures of quality might be interpreted as reflecting the degree of care-systems integration. Weak correlations might suggest a low degree of systems integration, in which care processes are largely functionally independent.9 Such a finding might signal the need for interventions that could more broadly affect performance, such as improvements in safety culture20 or composite measurement of quality.14, 21
Neonatal intensive care presents a natural laboratory to test whether comparative performance measurement should be approached via limited or expanded sets of measures of quality of care. Specifically, high-quality clinical data are being collected by the California Perinatal Quality Care Collaborative (CPQCC) and other quality-of-care consortia. Our group has been working with CPQCC data to develop a composite indicator of the quality of neonatal intensive care provided to very low-birth-weight infants, the Baby-MONITOR.21, 22 This study uses CPQCC data and 8 of the measures of quality selected for inclusion in the Baby-MONITOR to examine the consistency of NICU performance rankings. We hypothesized that correlations of NICU rankings across measures of quality would be at least moderate. The specific objectives of this study were to examine whether high performance on one measure of quality is associated with high performance on others and to develop a data-driven explanatory model for overall NICU performance measurement.
The CPQCC15 is a multi-stakeholder group of public and private obstetric and neonatal providers, health care purchasers, public health professionals, and private sector health industry specialists, committed to improving care and outcomes for the state's pregnant mothers and newborns. The collaborative includes more than 130 member hospitals, of which 24 are designated as regional centers. This roster accounts for most of the preterm infants requiring critical care in California.
In total, 5445 very low-birth-weight infants cared for at 22 of 24 California level III regional centers between January 1, 2004, and December 31, 2007, met inclusion criteria for the study. Of these centers, 15 are designated as level IIID on the basis of open-heart surgery performance, and the remainder are designated as level IIIC.23 We used a multiyear analysis because some institutions care for few very low-birth-weight infants. Detailed descriptions of measure selection, definition, and exclusion criteria have been published elsewhere21 and are summarized in Table 1. Additional details are provided in the eAppendix (http://www.jamapeds.com).
We chose 8 quality-of-care measures that had been selected by an expert panel in a modified Delphi experiment for inclusion in the Baby-MONITOR and that were subsequently confirmed by a sample of clinical neonatologists. Measure definitions were derived from standard CPQCC and Vermont Oxford Network algorithms.21, 24 Measures included the following: (1) antenatal corticosteroid use, (2) hypothermia (<36°C) during the first hour of life, (3) nonsurgically induced pneumothorax, (4) health care-associated bacterial or fungal infection, (5) survival to discharge or to 36 weeks’ gestational age with chronic lung disease, (6) discharge on any human breast milk, (7) mortality in the NICU during the birth hospitalization, and (8) growth velocity. Growth velocity was determined according to a logarithmic function.25 We aligned all variables so that a higher value represents a better outcome. The statistical modeling described below required transformation of continuous variables into categorical ones; therefore, we empirically dichotomized growth velocity into high and low growth groups at the median velocity of 12.4 g/kg/d, derived from the central 95% of the sample. The denominators for the variables differ slightly. For example, infants who died in the NICU or who survived but remained in the NICU for more than 6 months were not included in the denominator for the breast milk variable.
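The growth velocity calculation and its dichotomization can be illustrated with a short Python sketch. It assumes that the exponential growth model credited in the Additional Contributions takes the commonly used form GV = 1000 × ln(W_end / W_start) / (day_end - day_start), in g/kg/d. The text does not give the exact formula or the start and end points of the measurement interval, so the function and parameter names below are hypothetical.

```python
import math

# Cut point taken from the text: median growth velocity of 12.4 g/kg/d.
MEDIAN_GV = 12.4


def growth_velocity(w_start_g: float, w_end_g: float, day_start: int, day_end: int) -> float:
    """Growth velocity (g/kg/d) under an assumed exponential growth model:
    GV = 1000 * ln(W_end / W_start) / (day_end - day_start).
    Weights are in grams; days are days of life."""
    if day_end <= day_start:
        raise ValueError("day_end must be later than day_start")
    if w_start_g <= 0 or w_end_g <= 0:
        raise ValueError("weights must be positive")
    return 1000.0 * math.log(w_end_g / w_start_g) / (day_end - day_start)


def is_high_growth(gv: float, cut: float = MEDIAN_GV) -> bool:
    """Dichotomize at the cut point. Values exactly at the cut point count as
    high growth, which is an assumption; the text does not say how ties were handled."""
    return gv >= cut


# Hypothetical infant: 1000 g on day 7 of life, 1500 g on day 35.
gv = growth_velocity(1000, 1500, 7, 35)
print(round(gv, 1), is_high_growth(gv))  # prints 14.5 True
```

Working on the log scale makes the result a relative rate (grams gained per kilogram of body weight per day), which is why the formula divides the log weight ratio by the interval length rather than using a simple weight difference.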
We applied standard CPQCC operational definitions to all independent variables. Patients were grouped into gestational age strata at birth of 25 0/7 to 27 6/7, 28 0/7 to 29 6/7, and 30 0/7 or more weeks, chosen so that the groups contained similar numbers of patients. Apgar score was categorized as 3 or less, 4 to 6, or greater than 6.
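A minimal Python sketch of these categorization rules follows. Gestational age is converted to total days so that boundaries such as 27 6/7 and 28 0/7 weeks fall on the correct side. Infants outside the listed strata (for example, below 25 0/7 weeks) return None because the text does not say how they were handled; the function names and category labels are illustrative, not CPQCC definitions.

```python
from typing import Optional


def ga_stratum(weeks: int, days: int) -> Optional[str]:
    """Map gestational age at birth (completed weeks + days) to the strata
    described in the text; returns None outside the listed ranges."""
    if not 0 <= days <= 6:
        raise ValueError("days must be between 0 and 6")
    total_days = weeks * 7 + days
    if 25 * 7 <= total_days <= 27 * 7 + 6:
        return "25 0/7-27 6/7"
    if 28 * 7 <= total_days <= 29 * 7 + 6:
        return "28 0/7-29 6/7"
    if total_days >= 30 * 7:
        return ">=30 0/7"
    return None  # below 25 0/7 weeks: handling not specified in the text


def apgar_category(score: int) -> str:
    """Apgar score categories: 3 or less, 4 to 6, or greater than 6."""
    if not 0 <= score <= 10:
        raise ValueError("Apgar score must be between 0 and 10")
    if score <= 3:
        return "<=3"
    if score <= 6:
        return "4-6"
    return ">6"
```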
Basic descriptive analyses examined the variation in unadjusted measures across sites. For hospital-level analyses, each level III NICU was the unit of analysis. To adjust for confounding due to differences in case mix, we developed a risk adjustment model for each measure. For each measure, we selected a set of candidate variables on the basis of reported associations in the literature or clinical relevance and tested their associations with the outcome of interest in univariate analyses, using the Fisher exact test for categorical variables and, depending on the underlying variable distribution, the t test or the 2-sample Wilcoxon rank sum test for continuous variables. Variables associated with the outcome at P < .25 were entered into a logistic regression model, and variables with P > .05 were then removed one at a time after a log-likelihood ratio test was used to check their contribution to model fit.26
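The univariate screening step for a categorical candidate variable can be sketched in Python, assuming the variable and the outcome are summarized as a 2 × 2 table. The sketch implements a two-sided Fisher exact test with the standard library and keeps variables with P < .25. It does not cover the t test, the rank sum test, or the subsequent logistic regression and backward elimination, and the variable names and table layout are assumptions for illustration.

```python
from math import comb
from typing import Dict, List, Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    With the row and column totals fixed, the top-left cell follows a
    hypergeometric distribution; the P value sums the probabilities of all
    tables that are no more likely than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)

    def prob(x: int) -> float:
        return comb(col1, x) * comb(n - col1, row1 - x) / total

    p_obs = prob(a)
    x_min, x_max = max(0, row1 + col1 - n), min(row1, col1)
    probs = [prob(x) for x in range(x_min, x_max + 1)]
    # A small relative tolerance keeps floating-point ties from being dropped.
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-7)))


def screen_candidates(tables: Dict[str, Tuple[int, int, int, int]],
                      threshold: float = 0.25) -> List[str]:
    """Return candidate variables whose univariate P value is below threshold.
    Each table is (a, b, c, d) = [[level 1 & outcome, level 1 & no outcome],
    [level 2 & outcome, level 2 & no outcome]] (a hypothetical layout)."""
    return [name for name, t in tables.items() if fisher_exact_two_sided(*t) < threshold]
```

The liberal P < .25 threshold is deliberate in this kind of purposeful selection: it lets variables that matter only jointly with others reach the multivariable model, where the stricter P > .05 removal rule then applies.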
To rank NICU performance on each quality measure, we used a method developed by Draper and Gittoes27 for use in the United Kingdom educational system, which is relevant and valid in any profiling setting with dichotomous outcomes. For each NICU and each quality measure, a z score was computed as the observed rate minus the expected rate, divided by its estimated standard error. A NICU's expected rate was computed as a weighted mean of the rates (eg, the survival rate) in the overall database across all levels of the risk adjustment variables, weighted by that NICU's case mix.
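The observed-minus-expected z score can be sketched in Python, assuming that risk strata are defined by combinations of the risk adjustment variable levels. The expected rate is the average of the database-wide stratum rates over the NICU's own infants, which weights those rates by the NICU's case mix. The standard error uses a simple binomial approximation that may differ from the formula in Draper and Gittoes; the record format and function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# One record per infant: (risk_stratum, outcome), with outcome 1 = better outcome.
Record = Tuple[str, int]


def stratum_rates(database: List[Record]) -> Dict[str, float]:
    """Database-wide outcome rate within each risk stratum."""
    counts: Dict[str, List[int]] = {}
    for stratum, y in database:
        c = counts.setdefault(stratum, [0, 0])  # [infants, better outcomes]
        c[0] += 1
        c[1] += y
    return {s: k / n for s, (n, k) in counts.items()}


def nicu_z_score(nicu: List[Record], rates: Dict[str, float]) -> float:
    """z = (observed rate - expected rate) / SE for one NICU and one measure.

    Each infant contributes the database-wide rate of his or her stratum, so
    the expected rate is weighted by this NICU's case mix. The SE is a simple
    binomial approximation (an assumption, not necessarily the published one)."""
    n = len(nicu)
    observed = sum(y for _, y in nicu) / n
    p = [rates[s] for s, _ in nicu]
    expected = sum(p) / n
    se = math.sqrt(sum(pi * (1 - pi) for pi in p)) / n
    if se == 0:
        raise ValueError("standard error is zero; z score is undefined")
    return (observed - expected) / se


# Hypothetical data: two risk strata with database-wide rates 0.5 and 0.8.
db = [("A", 1)] * 5 + [("A", 0)] * 5 + [("B", 1)] * 8 + [("B", 0)] * 2
rates = stratum_rates(db)
z = nicu_z_score([("A", 1), ("A", 1), ("B", 1), ("B", 0)], rates)
print(round(z, 2))  # prints 0.44
```

In the example, the NICU's observed rate (0.75) exceeds its case-mix expected rate (0.65), so the z score is positive, matching the convention that positive values indicate better-than-expected performance.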
We used 2 approaches to examine the degree to which superior performance on one key measure (survival) was associated with superior performance on the other measures.9 First, we ranked NICU performance on each measure according to its z score and calculated correlations between the z scores using the Pearson correlation coefficient. Correlations were rated as negligible, weak, moderate, or strong according to conventional thresholds.28 Second, we counted, for each NICU, the number of measures on which it ranked in the top 4 and compared the distribution of these counts with a binomial distribution using a χ2 test. A statistically nonsignificant test indicates that the hypothesis of independence cannot be rejected.
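The first approach can be illustrated with a short Python sketch that computes the Pearson correlation between 2 measures' z scores across NICUs and labels it with the thresholds reported in the Results (strong, above .5; moderate, .3 to .5; weak, .1 to .3; negligible, below .1). The labels are applied to the absolute value of the coefficient, which is an assumption for the strong category (written in the text as ρ > .5), and a value exactly on a threshold falls into the lower category. Significance testing is omitted; the function names are illustrative.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def strength(r: float) -> str:
    """Label |r| with the thresholds quoted in the Results."""
    a = abs(r)
    if a > 0.5:
        return "strong"
    if a > 0.3:
        return "moderate"
    if a > 0.1:
        return "weak"
    return "negligible"


def classify_pairs(z_by_measure: Dict[str, List[float]]) -> Dict[Tuple[str, str], str]:
    """Correlate every pair of measures across NICUs (the NICUs must be in the
    same order in every list) and label each correlation."""
    return {
        (m1, m2): strength(pearson(z_by_measure[m1], z_by_measure[m2]))
        for m1, m2 in combinations(sorted(z_by_measure), 2)
    }
```

With 8 measures, `itertools.combinations` yields the 28 unit-level pairs counted in the Results.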
We performed an exploratory factor analysis to determine whether underlying factors were driving the correlations. Factor loadings in excess of 0.5 were used to classify variables into factors.
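Assigning variables to factors with the 0.5 cutoff can be sketched as below, given a loading matrix from a factor analysis whose extraction and rotation methods are not described here. Comparing the absolute value of each loading with the cutoff is an assumption, and the loadings in the example are made up for illustration; they are not the values in Table 5.

```python
from typing import Dict, List


def assign_to_factors(loadings: Dict[str, List[float]],
                      cutoff: float = 0.5) -> Dict[str, List[int]]:
    """Map each variable to the 1-based factors on which its absolute loading
    exceeds the cutoff; an empty list means it loads on no factor."""
    return {
        var: [k + 1 for k, value in enumerate(row) if abs(value) > cutoff]
        for var, row in loadings.items()
    }


# Made-up loadings for three hypothetical variables on three factors.
example = {
    "var_a": [0.72, 0.10, 0.05],
    "var_b": [0.05, -0.61, 0.20],
    "var_c": [0.30, 0.20, 0.45],
}
print(assign_to_factors(example))  # prints {'var_a': [1], 'var_b': [2], 'var_c': []}
```

A variable such as `var_c`, with no loading above the cutoff, corresponds to a measure that does not load on any factor.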
The CPQCC data are collected for quality improvement and meet the criteria for deidentified data. The dataset is then further deidentified with respect to hospital for use as a research dataset. This study was approved by the CPQCC and by the Baylor College of Medicine institutional review board.
Table 2 gives characteristics of the study sample. The means for the measures of quality of care are adjusted for illness severity at birth.
Table 3 lists the z scores of performance on each measure (the standardized observed-minus-expected rate), with the NICUs labeled A through V in descending order of survival. A z score of 0 indicates that observed results on a quality measure equal the expected (ie, risk-adjusted) results; a positive z score indicates better-than-expected performance. We found substantial variation between NICUs on every measure of quality of care except pneumothorax. A separate analysis using random-effects models showed significant NICU-level variation for all outcomes except pneumothorax (data available from the author on request).
Table 4 gives the NICU-level correlation matrix among measures of quality of care. Of 28 unit-level correlations, 6 were significant (P < .05). Correlations between pairs of measures of quality of care were strong (ρ > .5) for 1 pair, moderate (.3 < |ρ| < .5) for 8 pairs, weak (.1 < |ρ| < .3) for 5 pairs, and negligible (|ρ| < .1) for 14 pairs.
High performance was not consistent across measures within NICUs. The number of measures on which a NICU ranked among the top 4 (a high performer) ranged from 0 (never among the top 4) to 4 (in the top 4 for 4 of the 8 measures). Figure 1 shows the observed distribution and the distribution expected if high performance on different measures occurred at random (a binomial distribution in which the probability of success on each trial is 4/22 = 0.18 and the 8 trials are independent). The observed distribution did not differ significantly from this binomial distribution (P > .9). Nevertheless, the sum of ranks across measures of quality of care (Figure 2) suggests that hospitals performing well on survival tend to do well on other measures of quality.
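The top-4 tally, the sum of ranks, and the expected binomial distribution can be reproduced from a table of z scores with the Python sketch below. Rank 1 is taken to be the highest z score, ties keep input order, and a lower sum of ranks means better overall performance; these conventions are assumptions, because the text does not state them. The χ2 comparison itself is not shown, and the function names are illustrative.

```python
from math import comb
from typing import Dict, List


def ranks_desc(values: List[float]) -> List[int]:
    """Rank 1 = highest z score (best). Ties keep input order (sorted is stable)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def top_k_counts(z_by_measure: Dict[str, List[float]], k: int = 4) -> List[int]:
    """For each NICU, the number of measures on which it ranks in the top k."""
    n_units = len(next(iter(z_by_measure.values())))
    counts = [0] * n_units
    for z in z_by_measure.values():
        for i, r in enumerate(ranks_desc(z)):
            if r <= k:
                counts[i] += 1
    return counts


def sum_of_ranks(z_by_measure: Dict[str, List[float]]) -> List[int]:
    """For each NICU, the sum of its ranks across measures (lower = better)."""
    n_units = len(next(iter(z_by_measure.values())))
    totals = [0] * n_units
    for z in z_by_measure.values():
        for i, r in enumerate(ranks_desc(z)):
            totals[i] += r
    return totals


def observed_distribution(counts: List[int], n_measures: int) -> List[int]:
    """How many NICUs were top-k performers on 0, 1, ..., n_measures measures."""
    dist = [0] * (n_measures + 1)
    for c in counts:
        dist[c] += 1
    return dist


def binomial_expected(n_units: int = 22, n_measures: int = 8, k: int = 4) -> List[float]:
    """Expected number of NICUs with j top-k rankings if each ranking
    independently places a NICU in the top k with probability k / n_units."""
    p = k / n_units
    return [n_units * comb(n_measures, j) * p ** j * (1 - p) ** (n_measures - j)
            for j in range(n_measures + 1)]
```

With its defaults, `binomial_expected()` uses p = 4/22, as in the text, and returns the expected number of NICUs with 0 through 8 top-4 rankings; comparing it with `observed_distribution(top_k_counts(...), 8)` gives the two curves plotted in Figure 1.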
Exploratory factor analysis revealed 4 underlying factors of quality in this sample (Table 5). Pneumothorax, mortality in the NICU, and antenatal corticosteroid use loaded on factor 1; growth velocity and health care-associated infection loaded on factor 2; chronic lung disease loaded on factor 3; and discharge on any human breast milk loaded on factor 4. Hypothermia during the first hour of life did not load on any factor. These factors might be clinically interpreted as follows: factor 1 may reflect the quality of perinatal care, because the consequences of good perinatal care are low rates of pneumothorax and high survival; factor 2 may reflect the quality of supporting healthy development, which would be endangered by poor growth velocity and health care-associated infections; factor 3 may represent the quality of respiratory care, because good respiratory care results in low rates of chronic lung disease; and factor 4 may reflect maternal involvement, which is key to achieving high rates of discharge on any human breast milk.
In this article, we examined NICU performance on 8 measures of quality of care. Except for pneumothorax, we found significant variation among NICUs in clinical processes and outcomes on each measure of quality. Correlations between most measures of quality were modest, and performance on one measure of quality had little predictive accuracy regarding performance on another. The only exception was the pair of high growth velocity and absence of health care-associated infection, which showed a stronger correlation. An exploratory factor analysis revealed 4 underlying factors of quality in this sample.
Our results have important implications for comparative performance measurement. Given the modest correlations among measures of quality of care and the inconsistency of relative performance, one should not infer overall NICU quality from a single measure or a few measures of quality. Our findings also call into question an assumption that underlies current benchmarking efforts in health care: that measurement based on a few measures of quality in a handful of diseases (eg, diabetes, heart disease, and hypertension) will lead to widespread improvements in quality.
Quality improvement efforts may need to focus on multidimensional improvement and build more tightly connected systems of care so that performance can be raised on several measures of quality simultaneously. We believe that the exploratory factor analysis yielded results that have a meaningful clinical interpretation and may help inform a multidimensional conceptual model for measuring and understanding NICU quality. Future improvement efforts could focus on the underlying aspects of quality that these factors may represent, which may causally influence the outcomes and might need to be addressed together to improve overall performance. We are currently testing whether the Baby-MONITOR, if designed according to this model, better predicts other quality-related constructs, such as safety culture. If replicated elsewhere, this approach could lead to a more parsimonious set of quality measures for assessing overall NICU quality and offer welcome relief to those who have to collect them. However, a limitation of such a data-driven approach is that it may exclude the wisdom of the clinical community.
Our findings are open to different interpretations. Measures of quality may be functionally independent of each other. However, on the basis of the clinical literature, we would a priori have expected stronger correlations. Several of the measures are causally linked; for example, randomized controlled trials have shown that antenatal corticosteroid use reduces mortality.29 We speculate that providers that excelled in one area of quality would similarly excel in others and, furthermore, that NICUs that reliably followed processes to avoid health care-associated infections would achieve better growth and lower mortality.
We interpret our results as pointing to a low degree of systems integration within the NICU setting. Neonatal intensive care may not function as a tightly integrated and standardized care delivery system; NICUs appear able to excel in some areas of care but not in others.
One way to promote systems-based care may be to measure overall quality of care meaningfully by combining individual measures of quality into a composite indicator of quality.21 One study30 showed that, although adherence to individual process measures of surgical infection prevention did not predict postoperative infection, a composite of the prevention measures did; similarly, in the present study we found a quality signal in the sum of ranks across measures, which would have been difficult to detect from individual measures of quality. Our group is working to develop a composite indicator of NICU quality, the Baby-MONITOR, based on an explicit and rigorous framework.14 Until such composite indicators have been developed and rigorously tested for internal and external validity, conclusions about overall quality of care based on restricted measure sets should be viewed with skepticism.
This study must be evaluated within the context of its design. Our investigation relies on data submitted to the CPQCC by the NICUs and not by independent medical records abstractors. This may raise concern regarding the validity of the data. However, little incentive exists for NICUs to systematically submit inaccurate data because this would diminish the usefulness of data feedback from the CPQCC, a service that NICUs pay for. In addition, data validity is strengthened by the CPQCC's use of standardized data abstraction protocols and operation manuals, as well as by automated data quality management tools to identify potentially inaccurate data entries.
An alternative explanation for our findings of modest correlation of NICU performance across different measures of quality could be that quality of care among our relatively small sample of California regional NICUs was similar. It may be that specific state-level policies foster care processes and cultures that are alike, making it harder to find diverging performance. On the other hand, investigations have found large differences in performance across other networks.31 Nevertheless, the specific attributes of the present study may hamper generalizability to other states and types of NICUs.
We developed individual risk adjustment models to control for confounding due to clinical risk at birth. These models have not been validated in other samples; therefore, it is possible that the models introduced bias into our results, although the direction of this bias is not easily ascertained. Similarly, residual confounding introduced by unobserved variables (such as academic affiliation or staffing ratios) may have influenced our results.
In conclusion, modest correlations of NICU performance on multiple measures of quality were observed. Benchmarking of NICU quality based on isolated indicators of quality may not reflect or improve overall quality of care. Multidimensional measurement of performance via composite indicators might promote multidimensional improvement using system-based interventions.
Funding support: Jochen Profit's contribution is supported in part by the Eunice Kennedy Shriver National Institute of Child Health and Human Development #1 K23 HD056298 (PI: Profit). Dr. Petersen was a recipient of the American Heart Association Established Investigator Award (#0540043N) at the time this work was conducted. Drs. Petersen and Hysong also receive support from a Veterans Administration Center Grant (VA HSR&D CoE HFP90-20). Dr. Hysong's contribution is supported in part by the Department of Veterans Affairs Health Services Research and Development Program (CD2-07-0181).
The authors have no financial relationships relevant to this article to disclose.
Author Contributions: Drs. Profit and Pietz had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Individual author contributions: ICMJE criteria for authorship read and met: JP JBG DD MAK KP JAZ SJH LAP. Agree with the manuscript's results and conclusions: JP JBG DD MAK KP JAZ SJH LAP. Designed the experiments/the study: JP JAZ LP DD KP JBG LAP. Analyzed the data: JP MAK DD KP. Wrote the first draft of the paper: JP LAP JBG JAZ. Assisted with approach and selection of data inputs: JP LAP JBG JAZ.
Assisted with interpretation of results: JP LAP JBG DD JAZ MK SJH. Contributed to revision of the paper: JP JBG DD MAK KP JAZ SJH LAP.
Additional Contributions: We thank Aloka Patel and Rush University Medical Center for granting Dr. Profit a nonexclusive license to use Rush's exponential infant growth model for noncommercial research purposes.
Conflict of Interest Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest and none were reported.