Let us consider the problem of comparing hospitals with respect to their mortality rates after statistically adjusting for the background characteristics of their patients. Define the outcome variable for patient *i* as *y*_{ij} = 1 if that patient died and *y*_{ij} = 0 if not, for *i* = 1, …, *n*_{j} patients in hospital *j*, with *p*_{ij} = prob(*y*_{ij} = 1), the probability of death. We expect this probability to depend upon patient background characteristics *x*_{1ij}, …, *x*_{mij} and also upon a hospital-specific contribution *b*_{j}. The aim is to get a good estimate of *b*_{j} for each hospital, so that each hospital can be compared with a set of reference hospitals. If hospital *j* is one in a large sample of hospitals, we might formulate a logistic regression model for this purpose (Iezzoni et al. 1992):

log[*p*_{ij}/(1 − *p*_{ij})] = *b*_{j} + *β*_{1}*x*_{1ij} + … + *β*_{m}*x*_{mij},   (1)

where **β** = (*β*_{1}, *β*_{2}, …, *β*_{m}) are constant coefficients and *b*_{j} is a fixed effect associated with hospital *j*. The odds ratio obtained from exponentiating *b*_{j} may be used as a measure of effect, assuming that the covariates in the model correctly incorporate all other sources of variability affecting mortality. The significance of an estimate of *b*_{j} may be judged by a confidence interval or *p*-value obtained from maximum likelihood theory (Hosmer and Lemeshow 2000).
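As a concrete reading of equation (1), the fixed-effects log-odds, the implied death probability, and the exponentiated hospital effect can be computed directly. A minimal sketch in Python; the coefficients, covariate values, and hospital effect below are purely illustrative, not taken from any fitted model:

```python
import math

# Hypothetical values for illustration only: two covariates and one hospital
beta = [0.8, -0.5]        # constant coefficients beta_1, beta_2
b_j = 0.3                 # fixed effect for hospital j
x = [1.2, 0.4]            # patient i's covariate values x_1ij, x_2ij

# Equation (1): log-odds of death for patient i in hospital j
logit_p = b_j + sum(b * xv for b, xv in zip(beta, x))
p_ij = 1.0 / (1.0 + math.exp(-logit_p))   # probability of death
odds_ratio = math.exp(b_j)                # hospital effect as an odds ratio
```

Exponentiating *b*_{j} alone, as in the last line, is what yields the hospital's measure of effect on the odds scale.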

Allowance for “clustering” of patient characteristics has led to the development of “hierarchical” or “multilevel” (ML) models (Normand, Glickman, and Gatsonis 1997; Snijders and Bosker 1999; Raudenbush and Bryk 2002; Goldstein 2003). The simplest ML model of mortality, known as the “random intercept model,” uses the logit link function just as in equation (1), but allows the intercept term to vary randomly by hospital. The specific hospital intercepts *b*_{j} are regarded as distributed around a mean *b*_{0} with a variance *v*² to be estimated from the data. In most applications, analysts have assumed that the hospital effects are independently normally distributed, that is,

*b*_{j} = *b*_{0} + *u*_{j},  *u*_{j} ~ N(0, *v*²),

so that *u*_{j} is a hospital-specific random effect with respect to the overall mean intercept *b*_{0}.
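The random intercept model is straightforward to simulate, which is a useful way to see what the variance *v*² of the hospital effects means in practice. A minimal sketch; every numerical setting here (number of hospitals, patients per hospital, *b*_{0}, *v*², the single covariate and its coefficient) is chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
J, n_j = 50, 100                  # illustrative: 50 hospitals, 100 patients each
b0, v2 = -1.4, 0.25               # illustrative mean intercept and variance v^2
beta = 0.8                        # illustrative coefficient for one covariate
x = rng.normal(size=(J, n_j))     # one standardized patient background covariate

u = rng.normal(0.0, np.sqrt(v2), size=J)   # hospital random effects u_j ~ N(0, v^2)
logit_p = b0 + u[:, None] + beta * x       # random intercept model on logit scale
p = 1.0 / (1.0 + np.exp(-logit_p))         # patient-level death probabilities
y = rng.binomial(1, p)                     # simulated mortality outcomes y_ij
```

Each row of `y` plays the role of one hospital's patients; the spread of the `u` values is governed entirely by *v*².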

In the ML theory, the hospital's contribution to mortality predicted by this equation is a “shrunken” estimate that weights the observed mortality by its reliability; that is, the hospital-specific estimate of *u*_{j} is shrunken or “pulled” toward zero, with the hospitals producing the least data for estimation experiencing the greatest shrinkage. Theoretical considerations and empirical evidence (Raudenbush and Bryk 2002) suggest that, on average, the shrunken estimates will tend to be more accurate than those based on the fixed effects model. The ML method has been used for numerous comparisons of interhospital or intersurgeon differences (Gatsonis et al. 1995; DeLong et al. 1997; Moerbeek, van Breukelen, and Berger 2003). The major drawback to the use of ML modeling has been computational complexity, which is only recently being overcome.

The ML approach offers several theoretical advantages, including (1) appropriately modeling the hierarchical nature of the data and the consequent correlation of outcomes within hospitals; (2) reducing the incidence of outlying mortality estimates based on scant data; and (3) allowing for incorporation of hospital characteristics in the model. In addition, ML models produce estimates for hospitals with no observed mortality (whose performance must be shrunken at least slightly toward the mean), whereas standard logistic regression fails for such hospitals: with no observed deaths the data are completely separated, and the maximum likelihood estimate of the hospital effect diverges.

Let us now consider the problem of obtaining a reasonable shrunken estimate for some hospital, call it hospital *q*, that did not contribute data to the reference sample from which estimates of the parameters (*b*_{0}, *β*_{1}, *β*_{2}, …, *β*_{m}, *v*²) were obtained. If certain assumptions to be described below are met, we can compute a good estimate as follows (see Appendix A for the derivation):

1. Estimate the parameters (*b*_{0}, *β*_{1}, *β*_{2}, …, *β*_{m}, *v*²) from the hospitals *j* = 1, …, *J* in the reference dataset using whatever ML software is most practical and effective. Let us denote these estimates as (*b̂*_{0}, *β̂*_{1}, *β̂*_{2}, …, *β̂*_{m}, *v̂*²). This step can be carried out, and the results published, by a central agency like the ACS with access to the entire database; the remaining steps of the algorithm can be carried out by any hospital *q*, with access only to its own data and the published model, regardless of whether hospital *q* is in the reference database.

2. For each patient *i* in hospital *q*, compute a provisional risk-adjusted probability of mortality

   *p̂*_{iq}^{(0)} = 1/{1 + exp[−(*b̂*_{q}^{(0)} + *β̂*_{1}*x*_{1iq} + … + *β̂*_{m}*x*_{miq})]}.

   The estimate of the specific hospital effect, *b̂*_{q}^{(0)} = *b̂*_{0} + *u*_{q}^{(0)}, is equal to the sum of the estimated intercept from the reference sample, that is, *b̂*_{0}, and a provisional estimate of the random effect, denoted *u*_{q}^{(0)}. It would typically make sense to set the initial value *u*_{q}^{(0)} = 0.

3. Compute the next estimate of the random effect as follows:

   3a. Estimate the *raw effect* of hospital *q* at iteration 1 as

   *ũ*_{q}^{(1)} = *u*_{q}^{(0)} + Σ_{i}(*y*_{iq} − *p̂*_{iq}^{(0)}) / Σ_{i} *p̂*_{iq}^{(0)}(1 − *p̂*_{iq}^{(0)}).

   3b. Calculate an approximate Level-1 variance (on the logit scale) at iteration 1 as

   *V*_{q}^{(1)} = [Σ_{i} *p̂*_{iq}^{(0)}(1 − *p̂*_{iq}^{(0)})]^{−1}.

   3c. Partition the total variance so that it can be used as a “shrinkage factor,” calculating the quantity *λ*_{q}^{(1)} as

   *λ*_{q}^{(1)} = *v̂*² / (*v̂*² + *V*_{q}^{(1)}).

   3d. Compute a corrected (“shrunken”) estimate of the hospital random effect as

   *u*_{q}^{(1)} = *λ*_{q}^{(1)} *ũ*_{q}^{(1)}.

   3e. Now update the patient-specific predicted probability of mortality as

   *p̂*_{iq}^{(1)} = 1/{1 + exp[−(*b̂*_{q}^{(1)} + *β̂*_{1}*x*_{1iq} + … + *β̂*_{m}*x*_{miq})]}, where *b̂*_{q}^{(1)} = *b̂*_{0} + *u*_{q}^{(1)}.

4. Reiterate steps 3a–3e, estimating *u*_{q}^{(k)} from *u*_{q}^{(k−1)} and *p̂*_{iq}^{(k−1)}, until the estimate *u*_{q}^{(k)} at iteration *k* is negligibly different from the estimate *u*_{q}^{(k−1)} at iteration *k* − 1.
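The iterative procedure above can be sketched as a short function. This is one possible implementation under the stated setup, not the authors' code: the function name and convergence tolerance are our own choices, and the Level-1 variance is taken as the reciprocal of the summed binomial information, as in step 3b:

```python
import numpy as np

def shrunken_hospital_effect(y, X, beta_hat, b0_hat, v2_hat,
                             tol=1e-8, max_iter=100):
    """Iteratively compute a shrunken random effect u_q for a hospital
    outside the reference sample, following steps 3a-3e.

    y        : (n,) array of 0/1 mortality outcomes for the hospital's patients
    X        : (n, m) array of patient background covariates
    beta_hat : (m,) published covariate coefficients from the reference model
    b0_hat   : published mean intercept estimate
    v2_hat   : published random-effect variance estimate
    Returns (u_q, lambda_q).
    """
    u = 0.0                                # initial value u_q^(0) = 0
    eta_fixed = b0_hat + X @ beta_hat      # fixed part of each patient's logit
    lam = v2_hat / (v2_hat + 1.0)          # placeholder; overwritten below
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(eta_fixed + u)))   # predicted P(death)
        w = np.sum(p * (1.0 - p))                    # summed binomial information
        u_raw = u + np.sum(y - p) / w                # 3a: raw logit-scale effect
        V = 1.0 / w                                  # 3b: approx. Level-1 variance
        lam = v2_hat / (v2_hat + V)                  # 3c: shrinkage factor
        u_new = lam * u_raw                          # 3d: shrunken estimate
        if abs(u_new - u) < tol:                     # step 4: convergence check
            return u_new, lam
        u = u_new                                    # 3e: update and reiterate
    return u, lam
```

With the published estimates in hand, a hospital needs only its own outcome vector and covariate matrix to run this, mirroring the decentralized design described in step 1.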

The final estimate of the risk-adjusted hospital mortality effect can be taken as *u*_{q}^{(k)}, and the final estimate of the shrinkage factor as *λ*_{q}^{(k)}. An approximate variance of *u*_{q} will be *v*²(1 − *λ*_{q}), which can be used to construct confidence intervals or test pairwise differences between hospitals (Snijders and Bosker 1999). This measure of effect on the logit scale may be exponentiated to approximate an observed/expected (O/E) ratio for each hospital (DeLong et al. 1997; Hannan et al. 2005).
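Given converged values of *u*_{q} and *λ*_{q}, the approximate variance, a Wald-style 95 percent confidence interval, and the approximate O/E ratio follow in a few lines. A minimal sketch; the numerical inputs below are purely illustrative:

```python
import numpy as np

# Illustrative published/converged values (hypothetical numbers)
v2_hat = 0.25    # estimated random-effect variance from the reference sample
u_q = 0.41       # converged shrunken hospital effect, logit scale
lam_q = 0.20     # converged shrinkage factor

var_u = v2_hat * (1.0 - lam_q)                 # approximate Var(u_q) = v^2(1 - lambda_q)
se_u = np.sqrt(var_u)
ci = (u_q - 1.96 * se_u, u_q + 1.96 * se_u)    # approximate 95% CI, logit scale
oe_ratio = np.exp(u_q)                         # approximate O/E ratio
oe_ci = (np.exp(ci[0]), np.exp(ci[1]))         # CI for the O/E ratio
```

Exponentiating the interval endpoints, as in the last line, carries the logit-scale interval over to the O/E scale.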

One key assumption underlying this procedure is that the random intercept model generating the data for the reference sample of hospitals *j*=1, …, *J* is also the correct model for generating the data for hospitals *q*=1, …, *Q* of interest but not in the reference sample. This assumption would appear plausible when hospital *q* draws its patients from a population that is similar to those represented by the reference sample, and it can be checked if a summary of background information on patients in the reference sample is available. For this reason, registries or databases to be used for interhospital comparisons should report summary background data on patients.

A second assumption is that the number of hospitals in the reference sample is large enough to assume that the parameters (*b*_{0}, *β*_{1}, *β*_{2}, …, *β*_{m}, *v*²) are only negligibly different from their estimates. These assumptions are in addition to the usual assumptions of the random intercept model (normality of random effects, constant random effects variance, linearity of the log-odds of mortality in the covariates).