|Home | About | Journals | Submit | Contact Us | Français|
To provide a method for any hospital to evaluate patient mortality using a hierarchical risk-adjustment equation derived from a reference sample.
American College of Surgeons National Trauma Data Bank (NTDB).
Hierarchical logistic regression models predicting mortality were estimated from NTDB data. Risk-adjusted hospital effects obtained directly from models using standard software were compared with approximations derived from a summary equation and data from each individual hospital.
Theoretical approximations were similar to results using standard software.
To allow independent verification, agencies using reference databases for hospital mortality “report cards” should publish their risk-adjustment equations. Similar hospitals not in the reference database may also use the published equations along with the approximations described to evaluate their own outcomes using their own data.
There is increasing interest in methods to evaluate hospital performance, especially patient mortality, with respect to some group standard. Hospital “profiles” for cardiac surgery have been published in several regions, leading to careful study of the associated statistical issues (Gatsonis et al. 1995; Normand, Glickman, and Gatsonis 1997; Hannan et al. 2005;). Extending this approach, the Agency for Healthcare Research and Quality (AHRQ) has published Inpatient Quality Indicators (IQI), described at http://qualityindicators.ahrq.gov, that allow hospitals to compare their performance in several areas to the outcomes expected from the reference databases of the AHRQ Healthcare Cost and Utilization Project (HCUP). The American College of Surgeons (ACS) has also been a leader in this effort, by supporting the development of a National Trauma Data Bank (NTDB) to combine registry data from multiple trauma centers and compare their outcomes.
Many hospital performance evaluations today rely on multilevel (ML) or hierarchical logistic regression modeling. While these methods can have theoretical advantages in modeling institutional mortality and are increasingly accessible due to software advances, they can present challenges as well. For example, the AHRQ or ACS NTDB data may be used to compute performance measures for a set of hospitals, but other hospitals not included in the original analysis may also be interested in comparing their outcomes to these standards. The purpose of this paper is to derive a method by which such a comparison can be made.
The method we propose could be useful in several contexts. Analysts can use this method to compute performance measures for hospitals that were not included in an original analysis. Hospitals that were included in the original analyses can also use this method to verify “report cards” provided by an agency conducting the analysis. Hospitals or other observers can use this method to derive updated scores based on the original model but incorporate newer data that become available. Finally, experiments can be conducted to see how the scores might change if certain aspects of the data were to change.
Let us consider the problem of comparing hospitals with respect to their mortality rates after statistically adjusting for the background characteristics of their patients. Define the outcome variable for patient i as yij=1 if that patient died and yij=0 if not, for i=1, …, nj patients in hospital j with pij=prob (yij=1), the probability of death. We expect this probability to depend upon patient background characteristics and also upon a hospital-specific contribution bj. The aim is to get a good estimate of bj for each hospital, so that each hospital can be compared with a set of reference hospitals. If hospital j is one in a large sample of hospitals, we might formulate a logistic regression model for this purpose (Iezzoni et al. 1992):
where b=(b1, b2, …, bm) are constant coefficients and bj is a fixed effect associated with hospital j. The odds ratio obtained from exponentiating bj may be used as a measure of effect assuming that the covariates in the model correctly incorporate all other sources of variability affecting mortality. The significance of an estimate of bj may be judged by a confidence interval or p-value obtained from maximum likelihood theory (Hosmer and Lemeshow 2000).
Allowance for “clustering” of patient characteristics has led to the development of “hierarchical” or “ML” models (Normand, Glickman, and Gatsonis 1997; Snijders and Bosker 1999; Raudenbush and Bryk 2002; Goldstein 2003;). The simplest ML model of mortality, known as the “random intercept model,” uses the logit link function just as in equation (1), but allows the intercept term to vary randomly by hospital. The specific hospital intercepts bj are regarded as distributed around a mean b0 with a variance v2 to be estimated from the data. In most applications, analysts have assumed that the hospital effects are independently normally distributed, that is,
so that uj is a hospital-specific random effect with respect to the overall mean intercept b0.
In the ML theory, the hospital's contribution to mortality predicted by this equation is a “shrunken” estimate that weights the observed mortality by its reliability; that is, the hospital-specific estimate of uj is shrunken or “pulled” toward zero, with the hospitals producing the least data for estimation experiencing the greatest shrinkage. Theoretical considerations and empirical evidence (Raudenbush and Bryk 2002) suggest that, on average, the shrunken estimates will tend to be more accurate than those based on the fixed effects model. The ML method has been used for numerous comparisons of interhospital or intersurgeon differences (Gatsonis et al. 1995; DeLong et al. 1997; Moerbeek, van Breukelen, and Berger 2003;). The major drawback to the use of ML modeling has been computational complexity, which is only recently being overcome.
The ML approach offers several theoretical advantages, including (1) appropriately modeling the hierarchical nature of the data and consequent correlation of outcomes within hospitals; (2) reducing the incidence of outlying mortality estimates based on scant data; and (3) allowing for incorporation of hospital characteristics in the model. In addition, ML models produce estimates for hospitals with no observed mortality (whose performance must be shrunken at least slightly toward the mean), whereas standard regression will fail due to collinearity.
Let us now consider the problem of obtaining a reasonable shrunken estimate for some hospital, call it hospital q, that did not contribute data for the reference sample from which estimates of the parameters (b0, b1, b2, …, bm, v2) were obtained. If certain assumptions to be described below are met, we can compute a good estimate as follows (see Appendix A for the derivation):
The final estimate of the risk-adjusted hospital mortality effect can be considered as uq(k), and the final estimate of the shrinkage factor can be considered as λq(k). An approximate variance of uq will be v2(1−λq), which can be used to construct confidence intervals or test pairwise differences between hospitals (Snijders and Bosker 1999). This measure of effect on the logit scale may be exponentiated to approximate an observed/expected (O/E) ratio for each hospital (DeLong et al. 1997; Hannan et al. 2005;).
One key assumption underlying this procedure is that the random intercept model generating the data for the reference sample of hospitals j=1, …, J is also the correct model for generating the data for hospitals q=1, …, Q of interest but not in the reference sample. This assumption would appear plausible when hospital q draws its patients from a population that is similar to those represented by the reference sample, and it can be checked if a summary of background information on patients in the reference sample is available. For this reason, registries or databases to be used for interhospital comparisons should report summary background data on patients.
A second assumption is that the number of hospitals in the reference sample is large enough to assume that the parameters (b0, b1, b2, …, bm, v2) are only negligibly different from their estimates. These assumptions are in addition to the usual assumptions of the random intercept model (normality of random effects, constant random effects variance, linearity of the log-odds of mortality in the covariates).
Our strategy for validating the procedure was first to compute estimates for a large reference sample containing data from hospitals j=1, …, J. We took the shrinkage estimators of the hospital-specific random effects uj, based on standard software, as the criterion for evaluating our procedure. Next, for each hospital j, we used the parameter estimates to compute hospital-specific effects as described in steps 1–4 above. We reasoned that if our procedure is correct, it should essentially reproduce the criterion estimates generated by the software.
NTDB data for 2001–2005 were obtained from the ACS in compliance with its standard Data Use Agreement. Relevant patient variables were extracted, and independent variables were centered on their population means. ML logistic regression models predicting hospital mortality for cases of blunt trauma were created using the “xtmelogit” command in Stata (version 10, Stata Corp., College Station, TX). The number of quadrature points was started at 1 (Laplace approximation) and increased in increments of 2 until the variance of the random effect did not change at three significant digits. After estimation of the random effects model, a posterior modal estimate of the effect for each hospital, and its standard error, were generated using the “predict, reffects” and “predict, reses” commands in Stata. These estimates were confirmed using the Laplace approximation in HLM6 (Scientific Software International, Lincolnwood, IL). The hospital-specific effects and their variances were obtained after estimation in HLM6 from the Level-2 residual file as the variables “ebintrcp” and “pv00.”
ML models and hospital effects estimated by Stata and HLM6 were virtually identical. Hospital effects (u0j) obtained using the algorithm described above (steps 1–4) converged within a few iterations to the effects reported by the software packages.
Analytical methods for hospital profiling must be practical and statistically valid, and they must be presented in such a way that they can be understood, and therefore trusted, by the institutions being evaluated. In many cases, hierarchical (ML) methods are now being used to generate estimates of hospital performance. However, it is not uncommon to encounter situations where it would be desirable to compute a performance score for a hospital that was not included in the original analysis, or to provide institutions with the ability to verify their published scores. We have presented a method for deriving such estimates.
The idea of publishing a risk adjustment equation to enable trauma centers to compare their mortality to a reference database originated with the Major Trauma Outcome Study (MTOS) coordinated by the ACS more than 20 years ago (Champion et al. 1990). As successor to the MTOS, the NTDB has steadily increased the number of contributing hospitals, but a third of the major trauma centers in the nation still find themselves unable to participate due to state regulations, financial constraints, personnel transitions, legal concerns, or other temporary or permanent factors. A method such as we describe would still allow these institutions to use the NTDB as a quality improvement tool, using ML statistical models if desired.
The AHRQ has taken a similar approach in the publication of their IQI. Any hospital may input its own data into the IQI computer programs and compare its performance in several areas to the outcomes expected from the HCUP databases. Since 2007, the equivalent of a ML or hierarchical method has been imbedded in the IQI risk-adjustment software, although the derivations of the specific models and algorithms are explained only in general terms.
These methods assume that if one more hospital were added to the reference database, the regression coefficients derived without that hospital would change very little. Therefore, they are valid only if the reference database is very large compared with the size of any one hospital, and only if the patient population at the additional hospital is sufficiently similar to that from which the reference database was derived (Shahian and Normand 2008). To improve validity, separate risk-adjustment equations could be provided for different geographic regions, hospital characteristics, etc., or additional variables reflecting these differences can be included in the equations.
We have described methods and theoretical justifications by which shrunken risk-adjusted hospital effects on mortality can be obtained for hospitals not necessarily included in a reference database, assuming their patient populations and data management are similar. In the interest of maximum transparency in the process of hospital profiling, we support the practice of making risk-adjustment equations public so that hospitals in a reference database may use these methods to verify their evaluations and other hospitals may evaluate themselves with respect to this standard.
Joint Acknowledgment/Disclosure Statement: This work was supported in part by grant R01HS015656 from the Agency for Healthcare Research and Quality, and a supplemental grant from the Maine Medical Center Research Strategic Plan. Dr. Clark has been a member and chairman of the National Trauma Data Bank Subcommittee of the American College of Surgeons Committee on Trauma. Dr. Raudenbush is an author of the HLM6 software.
Let yij=1 if patient i in hospital j dies and 0 if that patient survives. We assume that the probability of death for this patient may be expressed as
where uj is a hospital-specific random effect distributed as Normal(0, v2). The parameters (β, v2) have presumably been estimated from an appropriate large-scale database in which hospital j is not necessarily included. Indeed, we assume that the reference database is so large that the estimates of (β, v2) are essentially equivalent to the parameters.
We can model the outcome as
Expanding pij in a first-order Maclaurin series, we have
Moving all observables to the left-hand side and setting , we have the linearized dependent variable
Note that the error term in this model has mean zero and variance
This gives us a weighted least squares problem solved by using the inverse variance wij as the weight in estimating equation (A4):
We then assume that the true uj and the estimate are bivariate normal, that is
from which we can obtain the conditional distribution of uj given as
where . We may then use the conditional mean and variance of uj given from (A8) to make inferences about the unknown uj.
The approximation can be improved by expanding equation (A1) in a Taylor series around :
where Now we can compute a new linearized dependent variable
where we now use . We now compute the weighted least squares estimate
Inference proceeds as before, and the process of iteration can continue until convergence.
Additional supporting information may be found in the online version of this article:
Appendix SA1: Author Matrix.
Please note: Wiley-Blackwell is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.