To minimize the presence of confounding factors in our data, our analysis is restricted to major surgery patients between the ages of 18 and 64 covered by private health insurance, which was defined as private FFS, HMO, or PPO. We begin with the following basic specification:

where *C*_{ijt} equals 1 if patient *i* undergoing major surgery in hospital *j* at time *t* experienced one or more complications and 0 otherwise. MC_{ijt} is the categorical variable for patient *i*'s payer type, with private FFS insurance as the referent category. *P*_{ijt} is a vector of patient characteristics, and *H*_{jt} is a vector of hospital characteristics. MK_{jt} represents the market conditions for hospital *j* at time *t*. *ω*_{ijt} represents the unobserved patient heterogeneity that affects the patient's likelihood of complication in hospital *j* at time *t*, while *λ*_{jt} represents the unobserved quality of hospital *j* at time *t*. Because the dependent variable is binary, (1) is estimated via a binary logistic regression model, under the assumption that the error term *ε*_{ijt} is logistically distributed with mean 0. Our basic approach is to see how *α*_{1} changes under different model specifications.
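Written out in latent-variable form, a specification consistent with these definitions is sketched below. Only *α*_{1} is named in the text; the remaining coefficient labels and the latent index *C*^{*}_{ijt} are assumed notation introduced here for illustration.

```latex
% Sketch only: alpha_0, alpha_2, ..., alpha_4 and C^{*} are assumed labels.
\begin{align*}
C^{*}_{ijt} &= \alpha_0 + \alpha_1 MC_{ijt} + \alpha_2 P_{ijt} + \alpha_3 H_{jt}
              + \alpha_4 MK_{jt} + \omega_{ijt} + \lambda_{jt} + \varepsilon_{ijt}, \\
C_{ijt} &= \mathbf{1}\{\, C^{*}_{ijt} > 0 \,\}.
\end{align*}
```

With a logistically distributed *ε*_{ijt}, this form implies that the probability of a complication is the logistic function of the right-hand side excluding *ε*_{ijt}.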

To test whether the association between patients' payer types and complication rates is driven by selective contracting between hospitals and MCOs based on hospital quality, we use hospital fixed effects (i.e., dummy variables for 174 hospitals, appropriate if the unobserved hospital quality *λ*_{jt} is time invariant) and hospital-year fixed effects (i.e., dummy variables for 174 hospitals × 4 years, appropriate if *λ*_{jt} varies over time) to capture *λ*_{jt}, while assuming that the unobserved patient heterogeneity, *ω*_{ijt}, is 0. If selective contracting drives the association, then compared with a naïve model in which both *ω*_{ijt} and *λ*_{jt} are assumed to be 0, these fixed-effects models should yield an estimate of *α*_{1} that is closer to 0. That is, if MCOs selectively contract with lower-quality hospitals, controlling for the unobserved hospital quality via hospital fixed effects should eliminate the positive and significant effect of the payer type variables.
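The logic of this comparison can be illustrated with a small numerical sketch. The counts below are hypothetical, not study data, and the calculation uses simple differences in complication rates rather than the logit models estimated here. It only shows how a pooled payer-type gap can vanish once patients are compared within the same hospital.

```python
# Hypothetical counts, not study data: (hospital, payer) -> (patients, complications).
# Hospital A has a 20% complication rate and mostly MCO patients;
# hospital B has a 10% rate and mostly FFS patients.
COUNTS = {
    ("A", "MCO"): (80, 16),
    ("A", "FFS"): (20, 4),
    ("B", "MCO"): (20, 2),
    ("B", "FFS"): (80, 8),
}


def rate(cells):
    """Complication rate pooled over a list of (patients, complications) cells."""
    patients = sum(n for n, _ in cells)
    events = sum(c for _, c in cells)
    return events / patients


def naive_gap(counts):
    """MCO minus FFS complication rate, ignoring which hospital treated the patient."""
    mco = [cell for (_, payer), cell in counts.items() if payer == "MCO"]
    ffs = [cell for (_, payer), cell in counts.items() if payer == "FFS"]
    return rate(mco) - rate(ffs)


def within_hospital_gap(counts):
    """Patient-weighted average of the MCO minus FFS gap computed inside each hospital."""
    hospitals = sorted({hospital for hospital, _ in counts})
    total = sum(n for n, _ in counts.values())
    gap = 0.0
    for hospital in hospitals:
        mco, ffs = counts[(hospital, "MCO")], counts[(hospital, "FFS")]
        weight = (mco[0] + ffs[0]) / total
        gap += weight * (rate([mco]) - rate([ffs]))
    return gap


if __name__ == "__main__":
    print(f"naive gap:           {naive_gap(COUNTS):.3f}")
    print(f"within-hospital gap: {within_hospital_gap(COUNTS):.3f}")
```

In this constructed example the pooled gap is 6 percentage points, while the within-hospital gap is zero, because the entire pooled difference comes from MCO patients being concentrated in the lower-quality hospital.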

It is important to note that this is only an indirect test of the selective contracting hypothesis. In other words, we are unable to conclude from this part of our analysis whether selective contracting based on hospital quality does or does not occur. Rather, the purpose is to show whether selective contracting is a plausible explanation for the observed association between payer types and complication rates.

If the positive and significant effect remains even after controlling for the unobserved hospital quality via fixed effects, we then test whether the association is driven by the unobserved patient heterogeneity, *ω*_{ijt}. To do so, we take a more structural approach. More specifically, we specify a two-equation model as follows:

Under this hypothesis, the positive and significant estimate of *β*_{1} arises from the cross-equation correlation between the error terms of (2) and (3); that is, if MCO patients are in worse health than their FFS counterparts, then they may be more likely to experience a complication. To account for this correlation, the error terms are decomposed as follows:

Thus, the cross-equation correlation of the error terms arises from the presence of *ω*_{ijt} in both (2) and (3).

Put differently, the difference between the naïve model shown in Equation (1) and the two-equation model is the explicit recognition that MCO patients are different from FFS patients in terms of the unobserved characteristics that contribute to one's likelihood of experiencing complications. The naïve model implicitly assumes that patients across all payer types are essentially homogeneous in terms of their unobserved characteristics, whereas the two-equation model allows MCO patients to be systematically different from FFS patients.

Discrete Factor Approximation Method (DFM)

To remove the effects of *ω*_{ijt} from our model, we use the DFM (Mroz 1999). Rather than imposing a stringent distributional assumption on *ω*_{ijt}, DFM assumes that the distribution may be approximated using a finite discrete set of mass points (Heckman and Singer 1984), or “factors.” Mechanically, DFM estimates (2) and (3) simultaneously via maximum likelihood estimation (MLE), conditional on each mass point representing a value of the random variable *ω*_{ijt}. Then, the conditional likelihood values are weighted by the probability of each mass point and summed over all the mass points to obtain the unconditional likelihood values.

Intuitively, the mass points represent the unobserved “types” of individuals in the data set. That is, we assume that there is a finite number of patient types in our data set and that those who belong to the same type are similar to one another in terms of unobserved health status. Thus, rather than assuming an arbitrary distribution for *ω*_{ijt}, which is unknown, we approximate the empirical distribution of the types in our data with discrete mass points. Conceptually, once we know the approximate empirical distribution of the types, we can then “integrate out” the unobserved *ω*_{ijt} from our equations by summing over all discrete values of *ω*_{ijt}.
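The “integrating out” step can be sketched for a single patient as follows. This is a minimal illustration under stated assumptions, not the estimation code used in this study: payer type is collapsed to a binary MCO indicator (the payer-type variable in this study is categorical), both equations are treated as logits, the loading on the factor is fixed at 1 in the payer-type equation, and all function and argument names are hypothetical.

```python
import math


def logistic(z):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))


def bernoulli(y, p):
    """Probability of a binary outcome y given success probability p."""
    return p if y == 1 else 1.0 - p


def unconditional_likelihood(mc, c, index_payer, index_comp, beta1, rho2,
                             mass_points, probs):
    """Likelihood of one patient's (payer type, complication) pair.

    mc, c        : observed MCO indicator and complication indicator (0 or 1)
    index_payer  : linear index of observed covariates in the payer-type equation
    index_comp   : linear index of observed covariates in the complication equation
    beta1        : assumed effect of MCO coverage on the complication index
    rho2         : assumed factor loading in the complication equation
    mass_points  : candidate values of the unobserved factor
    probs        : probability attached to each mass point (must sum to one)
    """
    total = 0.0
    for eta, pi in zip(mass_points, probs):
        # Likelihood conditional on the patient being of "type" eta.
        p_mc = logistic(index_payer + eta)  # loading fixed at 1
        p_c = logistic(index_comp + beta1 * mc + rho2 * eta)
        conditional = bernoulli(mc, p_mc) * bernoulli(c, p_c)
        # Weight by the type's probability and accumulate.
        total += pi * conditional
    return total


if __name__ == "__main__":
    points = [-1.0, 0.0, 0.5, 2.0]  # hypothetical mass points
    weights = [0.1, 0.4, 0.3, 0.2]  # hypothetical probabilities
    print(unconditional_likelihood(1, 1, -0.3, -1.2, 0.4, 0.7, points, weights))
```

Because the same factor value enters both conditional probabilities before the weighted sum is taken, the two outcomes are correlated in the unconditional likelihood even though they are independent conditional on the type.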

Although our DFM model is technically identified by its functional form, identification is strengthened via an exclusion restriction. In our model, the complication rate among the publicly insured patients of each hospital serves as the exclusion restriction. That is, this variable appears on the right-hand side of (3) to predict whether a privately insured patient experiences a complication, but it does not predict the payer type in (2). Publicly insured patients are defined as those covered by Medicare or Medicaid (mostly Medicare, as Medicaid patients accounted for <8 percent of total hospital admissions in Florida during our study period [Encinosa and Bernard 2005]). To the extent that how a hospital treats its publicly insured patients is strongly correlated with how it treats its privately insured patients, this is a reasonable variable to include in our complication model. There is ample literature to suggest that this may indeed be a reasonable assumption (Needleman et al. 2003; Wennberg et al. 2004; Baker, Fisher, and Wennberg 2008).

The complication rate among the publicly insured patients in each hospital thus represents the underlying risk of complication experienced by all patients treated in that hospital, that is, unobserved hospital quality. To help identify our model, we further assume that the unobserved hospital quality is uncorrelated with the payer types of the privately insured patients. This exclusion restriction would therefore not hold if the hospital-MCO selective contracting hypothesis were true. That is, if MCOs are able to observe hospital quality and thus “steer” their enrollees to particular hospitals, the complication rates among the publicly insured patients would be correlated with the privately insured patients' payer types as well. However, we believe this is unlikely based on our findings presented below. In short, we find little evidence of MCO-hospital matching based on unobserved hospital quality, suggesting that our exclusion restriction is warranted.

We use DFM rather than the traditional instrumental variable (IV) method because the functional form of DFM aids identification of the parameters of interest and thus avoids the restrictive requirements of IV. Using our current data source, it is difficult to identify a plausible IV that predicts patients' payer types but does not directly affect patients' complication rates. Instead, we have identified a variable that is correlated with the complication rates but not with the payer types. Thus, we are able to achieve identification using an exclusion restriction and the functional form of DFM rather than resorting to an implausible IV.

DFM Model Specification

Let *η* represent one of M mass points approximating the distribution of *ω*_{ijt}. Combining equations (2) through (6), the two-equation model can be expressed as follows:

Since *η* represents a mass point in a discrete probability distribution, each *η*_{m} has a probability, denoted by *π*_{m}, associated with it such that the probabilities sum to one across the M mass points. Thus, the probability corresponding to each *η*_{m} is given by the following logit transformation:

Accordingly, we estimate *ϕ*_{m} instead of *π*_{m}, which can be calculated from (8). Following Mroz's suggestion, *ρ*_{1} is constrained to one to determine the scale of the discrete factors. The estimated magnitude and sign of *ρ*_{2}, therefore, indicate the extent to which the unobserved patient heterogeneity leads to biased estimates of *β*_{1}. The number of mass points is chosen by the analyst. Deb (2001) suggests that three or four mass points are usually sufficient to approximate the discrete empirical distribution of *ω*_{ijt}. Thus, our strategy is to obtain separate model estimates with two, three, and four mass points and choose the one that yields the highest likelihood value, which in this case is the model with four mass points. For more details on DFM and the derivation of the likelihood function, see Appendix SA2.
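A logit transformation of this kind can be sketched as below. The sketch assumes a common normalization in which the last mass point is the reference category with its parameter fixed at 0, so that M − 1 free parameters map to M probabilities; the exact normalization used in (8) may differ, and the function name is hypothetical.

```python
import math


def mass_point_probabilities(phi):
    """Map M-1 unconstrained parameters to M probabilities that sum to one.

    Assumed normalization: the last mass point has its parameter fixed at 0,
    so pi_m = exp(phi_m) / (1 + sum_k exp(phi_k)) for the free mass points.
    """
    padded = list(phi) + [0.0]  # reference mass point
    shift = max(padded)         # subtract the max for numerical stability
    weights = [math.exp(v - shift) for v in padded]
    denom = sum(weights)
    return [w / denom for w in weights]


if __name__ == "__main__":
    # Four mass points need three free parameters (values are hypothetical).
    print(mass_point_probabilities([0.5, -1.0, 0.2]))
```

Estimating the unconstrained parameters rather than the probabilities directly keeps every probability strictly between 0 and 1 without imposing explicit constraints on the optimizer.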