Previous studies of very low birth weight (VLBW) hospital volume effects on in-hospital mortality have used standard risk-adjusted models that account only for observable confounders and not for self-selection bias due to unobservable confounders.
To assess the effects of hospital volume of VLBW infants on in-hospital mortality while explicitly accounting for unobservable confounders and self-selection bias using an instrumental-variables model.
The sample includes 4,553 VLBW infants born in 63 hospitals in 2000–2004 in New Jersey. We employ instrumental-variables analysis using as instruments the differences between the patient’s distances to the nearest low (≤50 VLBW infants annually), moderate (51–100 VLBW infants annually) and high (>100 VLBW infants annually) volume hospitals. We evaluate several volume measures and adjust for observable infant and hospital characteristics.
We find beneficial volume effects on survival that are significantly underestimated in classical risk-adjusted models, under which low and moderate volumes compared to high volumes increase mortality odds by 1.8 and 1.88 times, respectively (risk ratios of 1.4 and 1.5, respectively). However, using an instrumental variables approach, we find that low and moderate volumes increase mortality odds by 5.42 and 3.51 times, respectively (risk ratios of 2.76 and 2.21, respectively). These findings suggest unobservable confounders that increase the selection of infants at a greater mortality risk into higher volume hospitals.
Accounting for unobserved self-selection bias reveals large survival benefits from delivering and treating VLBW infants at high-volume hospitals. This supports policies regionalizing the delivery and care for pregnancies at risk for VLBW at high-volume hospitals.
Infants born at very low birth weight (VLBW) – less than 1500 grams – have high neonatal/infant mortality rates. In 2001 in the US, where 1.4% of live births were VLBW, neonatal mortality was 21.38%.1 While infant mortality rates have decreased significantly over the past two decades, the decline in mortality for VLBW infants has been approximately 50% of that for higher birth weight infants,1 indicating increasing survival disparity between VLBW infants and those at higher birth weight.
Regionalizing delivery and care for VLBW infants has received considerable research and consumer-group attention as an approach for enhancing survival.2, 3 The underlying theory is that higher-volume hospitals acquire knowledge and experience from treating more cases and increase their staff human capital and technological resources, which in turn improve care quality and patient outcomes.
Few studies have evaluated the effects of VLBW hospital volume on in-hospital mortality in the US and other developed countries.2, 4–7 Most studies have generally found small to moderate reductions in mortality with increasing volume.2, 4–6 However, one study has reported small reductions in mortality with volume up to 50 VLBW infants annually, and small mortality increases with volume above this threshold.7
A main analytical challenge in volume-outcome studies that only rely on adjusting for observable confounders is the potential bias in volume-effect estimates due to self-selection into hospitals based on volume and “unobserved” individual-level confounders, such as maternal/fetal health risks. The direction of such bias is theoretically ambiguous. Volume benefits may be underestimated if more severe/complicated VLBW cases are more likely to be delivered at higher-volume hospitals (adverse self-selection). In contrast, volume benefits may be overestimated if less severe/complicated cases select more into higher-volume hospitals (favorable self-selection). All previous studies of VLBW-volume effects on mortality have only adjusted for observable confounders and none has explicitly accounted for self-selection bias from unobservable confounders.2, 4–7
In this study, we assess the effects of hospital volume of VLBW infants on in-hospital mortality using an instrumental-variable model in order to account for self-selection into hospital volume based on unobservable confounders.
The study sample includes 4,553 VLBW infants born in 63 hospitals in New Jersey (NJ) between 2000 and 2004 and identified from the State Inpatient Dataset (NJ-SID).8 The NJ-SID includes the entire population of in-hospital births in NJ. The data are from inpatient hospital-discharge records systematically reported from hospitals to the NJ Department of Health and Senior Services and are released under the auspices of the Agency for Healthcare Research and Quality (AHRQ). The 2000–2004 American Hospital Association Annual Surveys of Hospitals are used to obtain data on hospital beds, staffing level, and teaching status. Data from the 2009 American Academy of Pediatrics list of neonatal intensive care units (NICUs) are used to assign hospital-NICU levels. NICU data for years 2000–2004 are unavailable. However, it is unlikely that hospital-NICU levels frequently change over a few years, and we only use NICU level as a covariate. Furthermore, we find that the results are overall insensitive to excluding NICU level altogether. We include zip-code level average school years completed among adult females and median-household income from the 2000 U.S. Census and county-level unemployment rates from the U.S. Bureau of Labor Statistics for years 2000–2004.
We limit the main sample to in-hospital delivered VLBW infants who were either discharged home after delivery or died in the hospital. Infants transferred to another hospital are excluded from the main analyses for two reasons. First, we focus on identifying the effects of “being born and cared for” at lower- versus higher-volume hospitals. This is the main question of interest for researchers and policymakers when evaluating the effectiveness of care for VLBW infants at lower-volume hospitals. Excluding transferred infants should not bias the volume effect for non-transferred infants who remain hospitalized at lower-volume hospitals. Second, in addition to this conceptual justification, we face a data limitation: even though we observe discharge status for each infant, we cannot link with certainty pre- and post-transfer data for transferred infants since the dataset has no unique-infant identifiers. Previous studies with identifiers have generally assigned transferred infants to the volumes of the hospitals to which they were transferred. This may bias the volume effect downward by ignoring the pre-transfer care at lower-volume hospitals. An opposite bias may occur if transferred infants are assigned to the volumes of the transferring hospitals. We employ sensitivity analyses described below that explicitly account for infant transfers and find that our inference is robust to excluding transferred infants.
We exclude infants of mothers residing outside of NJ for whom we cannot calculate the distance instruments defined below and infants with additional health conditions (including congenital anomalies, birth weight less than 500 grams, gestational age less than 24 weeks, and multiple birth) that may modify the volume effects but are a small group to study alone. Finally, we exclude infants with incomplete data on study variables.
The outcome is in-hospital mortality of VLBW infants, defined by death after delivery and before discharge to home based on hospital discharge status. The main volume measures are two binary indicators for ranges of the number of VLBW infants born in the hospital in a given year. Using indicators for volume ranges accounts for potential non-linearity in volume effects. We designate hospitals as low, moderate, and high volumes where the number of VLBW births per year was ≤ 50, 51–100, and >100, respectively, and estimate the effects of low and moderate volumes relative to high volume (reference category). These ranges are consistent with cutoffs in previous studies,4, 6 allowing comparison to their results, and have well-balanced frequencies given the sample’s distribution of the hospitals’ numbers of VLBW infants (about 19%, 34%, and 47% of infants are in low, moderate, and high volume categories, respectively).
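The volume coding described above can be sketched in a few lines of Python. This is only an illustration of the stated cutoffs; the function names are hypothetical and are not taken from the study's analysis code.

```python
def volume_category(annual_vlbw_births: int) -> str:
    """Classify a hospital-year by its number of VLBW births (cutoffs from the text)."""
    if annual_vlbw_births <= 50:
        return "low"
    if annual_vlbw_births <= 100:
        return "moderate"
    return "high"


def volume_indicators(annual_vlbw_births: int) -> dict:
    """Two binary indicators; high volume is the omitted reference category."""
    category = volume_category(annual_vlbw_births)
    return {
        "low": int(category == "low"),
        "moderate": int(category == "moderate"),
    }


# A hospital with 120 VLBW births in a year falls in the reference category.
print(volume_category(120), volume_indicators(120))
```

Because high volume is the reference, both indicators equal zero for a high-volume hospital, so each estimated coefficient is a contrast against high volume.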
We also evaluate a continuous volume measure representing the hospital annual number of VLBW infants. Furthermore, we evaluate the effects of this continuous measure within two volume thresholds: up to 50 VLBW infants annually and above 50 VLBW infants in order to compare to previous research.7 We do this by adding an interaction term between continuous volume and a binary threshold indicator (≤50 versus >50 VLBW infants).
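One way to read the threshold specification is sketched below in Python. Whether the model also included a main effect for the threshold indicator is not stated in the text, so the term names and the exact set of regressors here are assumptions made for illustration.

```python
def volume_terms(annual_vlbw_births: int, threshold: int = 50) -> dict:
    """Continuous volume plus its interaction with an above-threshold indicator."""
    above = 1 if annual_vlbw_births > threshold else 0
    return {
        "volume": float(annual_vlbw_births),
        "above_threshold": above,
        "volume_x_above": float(annual_vlbw_births * above),
    }


def volume_slope(b_volume: float, b_interaction: float, above: int) -> float:
    """Implied log-odds change per additional infant within a volume range."""
    return b_volume + b_interaction * above


# Hypothetical coefficients, chosen only to show how the two slopes differ.
print(volume_terms(80))
print(volume_slope(-0.002, -0.017, above=0), volume_slope(-0.002, -0.017, above=1))
```

Under this coding, the coefficient on continuous volume is the slope at or below the threshold, and the slope above the threshold is the sum of that coefficient and the interaction coefficient.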
Instrumental variables (IV) are used to identify treatment effects accounting for self-selection into treatments (hospital volume here) based on unobserved confounders.9–11 The method employs variables, called instruments, that should be: 1) strongly related to treatment selection; and 2) unrelated to the outcome except through their effect on treatment selection (i.e. the instruments should not affect the outcome directly or correlate with unobserved confounders). Under these conditions, the instruments exploit exogenous variation in the treatment that is independent of self-selection bias.
There are numerous IV applications.12–15 Among those studying hospital treatments, the distance between the patient’s residence and the nearest hospital that provides the treatment, or the difference between this distance and the distance to the nearest hospital that does not provide the treatment – differential-distance instrument – is a commonly used instrument.13–15 The strength of such instruments is that distance is strongly predictive of treatment selection and can be theorized, particularly in the case of differential-distance instruments, to be unrelated to unobserved confounders affecting treatment selection and outcomes.
We use as instruments the differences between each of the distances from the mother’s residence to the nearest low- and moderate-volume hospitals and the distance to the nearest high-volume hospital. Specifically, we subtract the distance to the nearest high-volume hospital from the distances to the nearest low- and moderate-volume hospitals. Since data on mothers’ addresses are only available at the zip-code level, distances (measured in estimated drive time minutes) are calculated from the mother’s zip-code centroid to the nearest hospital in each volume category (full addresses are available for hospitals). We use these two instruments for all the binary and continuous volume measures.
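The construction of the two instruments can be sketched as follows in Python. The hospital labels and drive times are hypothetical, and the helper names are assumptions; the sketch only mirrors the subtraction described above.

```python
def nearest_drive_minutes(drive_minutes_by_hospital: dict, category_by_hospital: dict) -> dict:
    """Shortest drive time from one zip-code centroid to a hospital in each volume category."""
    nearest = {}
    for hospital, minutes in drive_minutes_by_hospital.items():
        category = category_by_hospital[hospital]
        if category not in nearest or minutes < nearest[category]:
            nearest[category] = minutes
    return nearest


def differential_distance_instruments(nearest: dict) -> dict:
    """Subtract the distance to the nearest high-volume hospital from the other two."""
    return {
        "low_minus_high": nearest["low"] - nearest["high"],
        "moderate_minus_high": nearest["moderate"] - nearest["high"],
    }


# Hypothetical drive times (minutes) from one mother's zip-code centroid.
drive = {"A": 12.0, "B": 30.0, "C": 18.0, "D": 25.0}
category = {"A": "low", "B": "low", "C": "moderate", "D": "high"}
instruments = differential_distance_instruments(nearest_drive_minutes(drive, category))
print(instruments)
```

A negative value means the mother lives closer to the lower-volume hospital than to the nearest high-volume hospital; a positive value means the high-volume hospital is closer.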
The differential-distance instrument has a theoretical advantage over distances to the nearest hospital in each volume category as instruments.16, 17 If any unobserved confounders influence residential location in general, it is unlikely that they determine residential choice based on the difference in distances to hospitals of various volumes. Unobserved maternal/fetal health risks are likely the main unobserved confounders. Pregnancy complications due to poor maternal health (hypertension, diabetes, infections, or other chronic/acute conditions) or fetal health problems (e.g., very poor fetal growth) that may predispose to delivering at higher-volume hospitals but also increase VLBW mortality are unlikely to relate to the differential-distance instrument. However, it is impossible to fully test this assumption due to the unobservable factors, which highlights the importance of selecting instruments based on theory.
We estimate the instrumental-variable model using the two-stage residual-substitution method.18 The first stage regresses the volume measures, including the interaction term between continuous volume and volume-threshold indicator when using that measure,19 on the instruments and all covariates using ordinary-least-squares (OLS) and obtains the residual terms. The second stage adds these residuals as regressors into a logistic regression for mortality. This controls for unobserved confounders correlated with both volume and mortality. The usual standard errors in the second-stage IV-logistic regression are biased due to including the first-stage residuals. Therefore, we bootstrap the IV model (both stages) with 2,000 replications in order to consistently estimate 95% confidence intervals for volume effects. Bootstrap is an appropriate method for estimating variance in this model.18
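The two stages can be illustrated with a deliberately simplified Python sketch on simulated data. It assumes one continuous volume measure, one instrument, and no covariates, and it omits the bootstrap; all data-generating values are hypothetical, and this is not the study's Stata code.

```python
import math
import random


def ols_residuals(x, y):
    """Stage 1: residuals from an OLS fit of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def logit_fit(rows, y, iters=15):
    """Logistic regression by Newton-Raphson; each row starts with 1.0 for the intercept."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(rows, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            eta = max(-30.0, min(30.0, eta))  # guard against overflow in exp
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * xi[i]
                for j in range(k):
                    hess[i][j] += w * xi[i] * xi[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


# Simulated data: u is an unobserved risk that raises both volume and mortality.
rng = random.Random(0)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]   # instrument (stand-in for differential distance)
u = [rng.gauss(0, 1) for _ in range(n)]   # unobserved confounder
v = [zi + ui + 0.5 * rng.gauss(0, 1) for zi, ui in zip(z, u)]  # standardized "volume"
y = [1 if rng.random() < 1 / (1 + math.exp(-(-1.0 - 0.5 * vi + ui))) else 0
     for vi, ui in zip(v, u)]             # true volume effect on log-odds is -0.5

resid = ols_residuals(z, v)                                          # stage 1
naive_beta = logit_fit([[1.0, vi] for vi in v], y)                   # classical model
iv_beta = logit_fit([[1.0, vi, ri] for vi, ri in zip(v, resid)], y)  # stage 2
print("naive volume coefficient:", naive_beta[1])
print("2SRI volume coefficient: ", iv_beta[1], "residual coefficient:", iv_beta[2])
```

In this simulation the naive model understates the protective volume effect because higher-risk cases sort into higher volume, which is the adverse self-selection pattern the residual term is meant to absorb. Valid confidence intervals would still require resampling both stages together, as described above.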
This IV method requires a consistent estimator of the first stage residuals. Since OLS consistently estimates these residuals, it is an appropriate choice for estimating the first stage.18 While estimating the first stage for the three-category volume measure by multinomial logit may seem appealing, it is unclear how to define and consistently estimate residuals from that model that would have appropriate properties for the IV residual substitution method. Since both residual terms from the two first stage OLS regressions for the low and moderate volumes are included in the mortality function, this accounts for any correlations between these error terms due to unobservable confounders related to both volumes.
Both stages are adjusted for relevant observed characteristics including demographics (race, sex), insurance status, infant’s birth weight, and an indicator for antenatal and delivery complications (hematologic, intrauterine growth restriction, oligohydramnios, RH isoimmunization, breech presentation, chorioamnionitis, cord prolapse, and fetal distress problems). There is no continuous measure of gestational age in the dataset but only an indicator for gestational age less than 24 weeks (an exclusion factor in our study). We also adjust for zip-code level average schooling years among females older than 18 years and mean household income, county-level unemployment rate, and hospital-level characteristics including numbers of beds and full-time staff, teaching status, and NICU level. In order to compare to the IV model, we estimate a classical risk-adjusted logistic regression for mortality with standard errors clustered at the hospital-level.
We report odds ratios (ORs) for volume effects in order to compare to previous studies, which only reported ORs. Since ORs may not accurately estimate relative risk for relatively common outcomes such as mortality in this sample, we also report risk ratios (RRs) for the main models.
We evaluate the significance of the instruments in the first-stage regressions using a partial F-statistic. Since the instruments are calculated at the mothers’ zip-code level, we cluster the first-stage standard errors at the zip-code level. As a partial assessment of whether the instruments may satisfy the second IV assumption, we evaluate their relationships with observable individual-level confounders by regressing them on these variables; we find no significant relationships, supporting the exogeneity of the instruments. All analyses were done using Stata 11.
In order to gauge the sensitivity of volume effects to the exclusion of transferred infants, we define a three-category outcome that includes transferred infants in a separate category in addition to the two categories of dead or discharged alive. Next, we simultaneously estimate the volume effects on these three categories using multinomial logistic regression. This analysis formally accounts for differences in transfers by volume when estimating the volume effects on mortality and allows for volume effects to vary between transfers and mortality. As shown below, we find that our general inference is insensitive to accounting for transfers. In another analysis, we evaluate interactions between volume and NICU level but find no significant interactions. Finally, in order to evaluate the effect of potential measurement error in NICU level, we estimate a model that excludes NICU level.
The total sample that meets the study inclusion criteria includes 6,176 infants. Of these, 1,623 infants meet the exclusion criteria, resulting in a final sample of 4,553 infants for the main analysis. Figure 1 shows a flow chart of sample exclusions.
Table 1 reports the distribution of the study outcome and variables for the total sample and subgroups stratified by VLBW volume. About 13.7% of the sample infants died before discharge. The unadjusted mortality rates were higher in low- and moderate-volume hospitals than high-volume hospitals (15.8%, 16.3%, and 11%, respectively). Hospital volume was significantly correlated with all observed characteristics except female birth. Birth weight and rates of antenatal and delivery complications were higher at low- and moderate-volume hospitals compared to high-volume hospitals. Publicly insured, uninsured, and black infants were more likely to be delivered at low- and moderate-volume hospitals compared to high-volume ones.
Table 2 reports the first-stage OLS regressions. The instruments have significant effects on hospital-volume choice, with partial F-statistics of 22 for low volume and 275 for moderate volume, both of which exceed the minimum threshold of 10 for non-weak instruments.20 Mothers living closer to high-volume than low-volume hospitals (greater positive differential distance between the nearest low- and high-volume hospitals) are less likely to deliver at a low-volume hospital and vice-versa. A 10-minute increase in differential driving time to a low- versus high-volume hospital increases the probability of delivering at a high-volume hospital by 0.056. In contrast, an increase in distance to the nearest moderate-volume hospital relative to distance to the nearest high-volume hospital increases the probability of selecting a low-volume hospital. Similar instrument effects are observed for moderate-volume choice. A 10-minute increase in differential driving time to a moderate- versus high-volume hospital increases the probability of delivering at a high-volume hospital by 0.152. The F-statistics for instrument effects on continuous volume and its interaction with the volume-threshold indicator are 79.4 and 13.7, respectively.
Table 3 reports the F-statistics from the regressions of the instruments on individual-level covariates, one at a time. The instruments are only marginally significantly related to insurance status and are not significantly related to the other variables, which supports their exogeneity.
Table 4 reports the low- and moderate-volume effects relative to high volume on in-hospital mortality as estimated from the IV model and classical logistic regression. Supplementary Table A1 reports full regression results.
We first describe the ORs as reported in previous studies. Using risk-adjusted logistic regression, low and moderate volumes compared to high volume are significantly associated with 1.8-fold and 1.88-fold increases in mortality odds, respectively. Slightly smaller effects are observed in the unadjusted regression. The volume effects are significant and noticeably larger under the IV model than the classical logistic regression. Using the IV model, low and moderate volumes compared to high volume increase mortality odds by 5.42 and 3.51 times, respectively. The hypothesis that volume measures are exogenous and that estimates from classical risk-adjusted logistic regression are unbiased is rejected for moderate volume based on a Hausman-type endogeneity test for the significance of the first-stage residual coefficients in the mortality function (p<0.05), indicating unobservable confounders related to both volume selection and mortality.21, 22
As expected, ORs overestimate relative risk and exceed RRs. However, overall inference is unaffected by using RRs. Under the classical risk-adjusted model, low and moderate volumes have RRs of 1.45 and 1.49, respectively, compared to 2.76 and 2.21 under the IV model.
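The gap between ORs and RRs for a common outcome can be seen from the unadjusted mortality rates reported above for low- and high-volume hospitals (15.8% and 11%). The Python sketch below uses only those two crude rates; it does not reproduce the adjusted or IV estimates in Table 4, and the function names are hypothetical.

```python
def risk_ratio(p_exposed: float, p_reference: float) -> float:
    """Ratio of mortality risks."""
    return p_exposed / p_reference


def odds_ratio(p_exposed: float, p_reference: float) -> float:
    """Ratio of mortality odds, where odds = p / (1 - p)."""
    return (p_exposed / (1 - p_exposed)) / (p_reference / (1 - p_reference))


# Unadjusted mortality: low-volume 15.8% versus high-volume 11%.
rr = risk_ratio(0.158, 0.11)
or_ = odds_ratio(0.158, 0.11)
print(f"crude RR = {rr:.3f}, crude OR = {or_:.3f}")
```

The crude RR is roughly 1.44 while the crude OR is roughly 1.52. The OR moves further from 1 than the RR whenever the outcome is not rare, and the gap widens as the estimated effect grows.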
Similar differences between the classical and IV estimates are observed for continuous volume (Table 4). In the unadjusted logistic regression, an increase in VLBW volume by 10 infants is associated with a 4% decrease in mortality odds (OR=0.96); the adjusted effect is insignificant (OR=0.95). In the IV model, an increase in VLBW volume by 10 infants reduces mortality odds by 16% (OR=0.84). The classical risk-adjusted estimates are rejected based on significance of the first-stage residual in the mortality function (significant at p<0.01). The RRs for an increase in volume by 10 infants are 0.97 in the classical model and 0.9 in the IV model.
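To make the scaling of the continuous-volume results explicit, the Python sketch below converts a per-infant log-odds coefficient into an OR for a 10-infant increase. The per-infant coefficient is back-calculated from the reported OR of 0.84 purely for illustration; it is not a value reported in the paper.

```python
import math


def odds_ratio_for_increase(beta_per_infant: float, increase: int = 10) -> float:
    """OR implied by a log-odds coefficient for a given increase in annual volume."""
    return math.exp(beta_per_infant * increase)


def percent_change_in_odds(odds_ratio: float) -> float:
    """Percentage change in odds implied by an OR (negative means a reduction)."""
    return 100.0 * (odds_ratio - 1.0)


# Back-calculated (hypothetical) per-infant coefficient consistent with OR = 0.84 per 10 infants.
beta = math.log(0.84) / 10
or_10 = odds_ratio_for_increase(beta, increase=10)
print(f"OR per 10 infants = {or_10:.2f}, change in odds = {percent_change_in_odds(or_10):.0f}%")
```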
The classical model suggests insignificant volume-change effects within either range of 50 infants or less or more than 50 infants. In contrast, the IV model suggests a larger and significant effect of volume increases at volumes above 50 infants with a volume increase by 10 infants decreasing mortality odds by 17% (OR=0.83); RR is 0.89. However, volume increases below the 50-infant threshold have smaller and insignificant effects.
Supplementary Table A2 reports the volume effects on mortality from the multinomial logistic regression including transferred infants in a separate category in addition to discharged alive and dead. This analysis adds 1,777 transferred infants that have complete data on study variables and pass other exclusion criteria for the main sample. A similar pattern of results is observed to the main analysis excluding transferred infants. The volume effects on mortality using the classical risk-adjusted model change little. The IV estimates of volume effects on mortality from this multinomial model are even larger than the ones excluding transfers. Therefore, the main finding of underestimated volume benefits in classical risk-adjusted models is not sensitive to including or excluding transferred infants. Furthermore, our classical estimates are within range of those from previous studies with post-transfer information,6 which provides further assurance that transfer status is not biasing our results.
We find an overall similar pattern of results for volume effects when excluding NICU level from the model (supplementary Table A3). Therefore, it is unlikely that potential measurement error in NICU level is significantly biasing our volume effects.
We find beneficial volume effects for in-hospital survival using the IV-model that are significantly underestimated in classical risk-adjusted models. The analysis rejects the estimates from classical risk-adjusted models that only account for observable confounders. The results suggest adverse self-selection into higher-volume hospitals. Specifically, infants with “unobserved” characteristics that increase their mortality risks are more likely to be delivered at higher-volume hospitals. These may include poor fetal growth and other maternal/fetal health risks that may be identified during prenatal care, all of which are often inadequately captured in secondary data sources. Ignoring self-selection on unobservable confounders may seriously underestimate the improvement in survival with delivery and care for VLBW infants at higher-volume hospitals. The study, which is the first to explicitly account for unobserved self-selection bias in this context, highlights the importance of accounting for unobservable confounders for obtaining accurate estimates of hospital-volume effects on infant health. Since administrative datasets cannot measure all relevant confounders, direct risk-adjustment alone may not remove self-selection bias. Therefore, it is important to employ models that explicitly account for such bias.
The results support regionalizing the delivery and care for VLBW infants at high-volume hospitals (>100 VLBW infants annually). Based on our IV estimates, 87 infants (64%) of the 136 infants who died in low-volume hospitals would have survived if they were delivered at high-volume hospitals. Similarly, 138 infants (55%) of the 250 infants who died at moderate-volume hospitals would have survived if they were delivered at high-volume hospitals. Therefore, policies that increase the access of at-risk pregnancies to high-volume hospitals will have large survival returns.
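The text does not spell out how these counts were derived. One standard calculation that approximately reproduces them is the attributable fraction among the exposed, (RR − 1)/RR, applied to the observed deaths with the IV risk ratios of 2.76 and 2.21. The Python sketch below is an assumption about the method, not a statement of it.

```python
def attributable_fraction(risk_ratio: float) -> float:
    """Share of deaths among the exposed attributable to the exposure: (RR - 1) / RR."""
    return (risk_ratio - 1.0) / risk_ratio


def avertable_deaths(observed_deaths: int, risk_ratio: float) -> float:
    """Deaths that would not have occurred at the reference (high-volume) risk."""
    return observed_deaths * attributable_fraction(risk_ratio)


low = avertable_deaths(136, 2.76)       # low-volume deaths, IV risk ratio
moderate = avertable_deaths(250, 2.21)  # moderate-volume deaths, IV risk ratio
print(f"low volume: {low:.1f} of 136 ({100 * attributable_fraction(2.76):.0f}%)")
print(f"moderate volume: {moderate:.1f} of 250 ({100 * attributable_fraction(2.21):.0f}%)")
```

This gives about 87 deaths (64%) for low volume and about 137 deaths (55%) for moderate volume. The one-infant difference from the reported 138 is plausibly due to rounding of the risk ratio or of the percentage.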
The study results should be considered with some qualifications. While the study sample is fairly large and population-based, it is based on one state and is smaller than those of previous VLBW volume-mortality studies using classical models. Therefore, it is important to replicate this study in larger samples from other states. Also, the model does not significantly reject the classical estimate for the low- versus high-volume effect (p=0.13). However, the nearly significant p-value and the rejection of the classical estimates for the moderate and continuous volume measures suggest that the classical estimate for low volume may not have been rejected because of limited power, which is a common limitation of such tests. Also, we cannot capture pregnancies from NJ delivered in nearby states. However, insurance coverage restrictions on seeking care from non-contracted hospitals reduce such occurrences. This does not bias our estimates, but limits the generalizability of the results to births in NJ hospitals to resident mothers. Finally, the dataset does not allow for linking maternal and infant data. Observing maternal characteristics is useful for risk adjustment and evaluating the sources of self-selection bias. However, adjusting for general maternal characteristics available in birth registry data such as prenatal care use or general health conditions is unlikely to fully account for self-selection bias, as these characteristics were included in previous studies reporting comparable classical estimates. Future studies that evaluate the effects of hospital volume on health outcomes of surviving infants are needed. Also, evaluating if hospital volume effects vary by socioeconomic and clinical characteristics is important for identifying any disparities in benefiting from improved care effectiveness at larger volumes.
Data analysis was in part supported by NIH grant 1R03 DE018394. The NIH had no role in the study design, data collection and analysis, or manuscript writing and submission.
George L. Wehby, Dept. of Health Management and Policy, University of Iowa, 105 River Street, N248 CPHB, Iowa City, IA 52242, Phone: 319-384-5133, Fax: 319-384-5125.
Fred Ullrich, Dept. of Health Management and Policy, University of Iowa, Iowa City, IA 52242.
Yang Xie, Global Outcomes Research, Merck & Co., Inc., One Merck Drive, WS-2E55, Whitehouse Station, NJ 08889, Phone: 908-423-4091, Fax: 908-775-1688.