

Econom J. Author manuscript; available in PMC 2010 August 5.

Published in final edited form as:

Econom J. 2009; 12(1): S230–S234.

doi: 10.1111/j.1368-423X.2008.00269.x

PMCID: PMC2917105

NIHMSID: NIHMS196306

James J. Heckman, University of Chicago, University College Dublin, Cowles Foundation, Yale University and American Bar Foundation; Petra E. Todd, University of Pennsylvania and NBER.


The probability of selection into a treatment, also called the propensity score, plays a central role in classical selection models and in matching models (see, e.g., Heckman, 1980; Heckman and Navarro, 2004; Heckman and Vytlacil, 2007; Hirano et al., 2003; Rosenbaum and Rubin, 1983).^{1} Heckman and Robb (1986, reprinted 2000), Heckman and Navarro (2004) and Heckman and Vytlacil (2007) show how the propensity score is used differently in matching and selection models. They also show that, given the propensity score, both matching and selection models are robust to choice-based sampling, which occurs when treatment group members are over- or under-represented relative to their frequency in the population. Choice-based sampling designs are frequently used in evaluation studies to reduce the costs of data collection and to obtain more observations on treated individuals. The robustness arises because, given a consistent estimate of the propensity score, both matching and classical selection methods are defined conditional on treatment and comparison group status.

This note extends the analysis of Heckman and Robb (1985, 1986, reprinted 2000) to the case where population weights are unknown, so that the propensity score cannot be consistently estimated. In evaluation settings, the population weights are often unknown or cannot easily be estimated.^{2} For example, for the National Supported Work training program studied in LaLonde (1986), Dehejia and Wahba (1999, 2002) and Smith and Todd (2005), the population consists of all persons eligible for the program, which was targeted at drug addicts, ex-convicts, and welfare recipients. Few datasets contain the information needed to determine whether a person is eligible for the program, so it would be difficult to estimate the population weights needed to consistently estimate propensity scores.

In this note, we establish that matching and selection procedures can still be applied when the propensity score is estimated on unweighted choice-based samples. The idea is simple. To implement both matching and classical selection models, only a monotonic transformation of the propensity score is required. In choice-based samples, the odds ratio of the propensity score estimated using misspecified weights is monotonically related to (indeed, a scalar multiple of) the odds ratio of the true propensity score. Thus, selection and matching procedures can identify population treatment effects using misspecified propensity score estimates fit on choice-based samples.

Let *D* = 1 if a person is a treatment group member and *D* = 0 if the person is a member of the comparison group, and let *x* denote a realization of the covariates *X*. In the population generated by random sampling, the joint density is

$$g(d,x)=\left[\Pr(D=1\mid x)\right]^{d}\left[\Pr(D=0\mid x)\right]^{1-d}g(x),\qquad d\in\{0,1\},$$

where *g* is the density of the data. Letting *P* = Pr(*D* = 1), Bayes's theorem gives

$$g(x\mid D=1)\,P=g(x)\Pr(D=1\mid x) \tag{1a}$$

and

$$g(x\mid D=0)\,(1-P)=g(x)\Pr(D=0\mid x). \tag{1b}$$

Taking the ratio of (1a) to (1b) gives

$$\frac{g(x\mid D=1)}{g(x\mid D=0)}\left(\frac{P}{1-P}\right)=\frac{\Pr(D=1\mid x)}{\Pr(D=0\mid x)}. \tag{2}$$

Assume 0 < Pr(*D* = 1 | *x*) < 1. From knowledge of the densities of the data in the two samples, *g*(*x* | *D* = 1) and *g*(*x* | *D* = 0), one can form a scalar multiple of the odds ratio of the propensity score without knowing *P*: by (2), the density ratio equals the odds ratio multiplied by the constant (1 − *P*)/*P*. The odds ratio is a monotonic function of the propensity score, and forming this scalar multiple of it does not require knowledge of the true population weights. In a choice-based sample, both the numerator and the denominator of the first term in (2) can be consistently estimated. This monotonic function can replace the propensity score *P*(*x*) = Pr(*D* = 1 | *x*) in implementing both matching and nonparametric selection models.
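As a minimal numerical check of (1a), (1b) and (2), the sketch below uses a hypothetical three-cell discrete covariate with made-up values of *g*(*x*) and Pr(*D* = 1 | *x*) (none of these numbers come from the paper), builds the two conditional densities by Bayes's theorem, and verifies that the density ratio times *P*/(1 − *P*) reproduces the true odds. Python's standard `fractions` module is used so that the equalities hold exactly rather than up to rounding.

```python
from fractions import Fraction

# Hypothetical population (illustrative numbers only): three covariate cells with
# marginal density g(x) and true propensity score Pr(D = 1 | x).
g_x = {0: Fraction(1, 2), 1: Fraction(3, 10), 2: Fraction(1, 5)}
pr1 = {0: Fraction(1, 10), 1: Fraction(3, 10), 2: Fraction(3, 5)}

# Population treatment share P = Pr(D = 1) = sum over x of g(x) Pr(D = 1 | x).
P = sum(g_x[x] * pr1[x] for x in g_x)


def odds(p):
    """Odds p / (1 - p) of a probability strictly between 0 and 1."""
    return p / (1 - p)


# Conditional densities of x in the treated and comparison groups, from (1a) and (1b).
g_x_d1 = {x: g_x[x] * pr1[x] / P for x in g_x}
g_x_d0 = {x: g_x[x] * (1 - pr1[x]) / (1 - P) for x in g_x}

# Density ratio g(x | D = 1) / g(x | D = 0): computable from the two samples alone.
density_ratio = {x: g_x_d1[x] / g_x_d0[x] for x in g_x}

for x in g_x:
    # Equation (2): the density ratio times P / (1 - P) equals the true odds.
    assert density_ratio[x] * odds(P) == odds(pr1[x])
    print(x, density_ratio[x], odds(pr1[x]))
```

In this example *P* = 13/50, so the density ratio equals the odds multiplied by (1 − *P*)/*P* = 37/13 and orders the cells exactly as the propensity score does.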

However, estimating *g*(*x* | *D* = *d*) is demanding of the data when *X* is high-dimensional. Instead of estimating these densities, we can substitute for the left-hand side of (2) the odds ratio of the conditional probabilities estimated on the choice-based sample with the wrong weights (for example, by ignoring the fact that the data are a choice-based sample). The odds ratio of the estimated probabilities is a scalar multiple of the true odds ratio. It can therefore be used instead of Pr(*D* = 1 | *x*) to match or to construct nonparametric control functions in selection bias models.

In the choice-based sample, let $\widetilde{\Pr}(D=1\mid x)$ be the conditional probability that *D* = 1, and let $P^{*}$ be the unconditional probability of sampling *D* = 1, where $P^{*}\neq P$, the true population proportion. The joint density of the data from the sampled population is

$$\left[g(x\mid D=1)\,P^{*}\right]^{d}\left[g(x\mid D=0)\,(1-P^{*})\right]^{1-d}.$$

Using (1a) and (1b) to solve for *g*(*x* | *D* = 1) and *g*(*x* | *D* = 0), one may write the data density as

$$\left[\frac{\Pr(D=1\mid x)\,g(x)}{P}\,P^{*}\right]^{d}\left[\frac{\Pr(D=0\mid x)\,g(x)}{1-P}\,(1-P^{*})\right]^{1-d},$$

so

$$\widetilde{\Pr}(D=1\mid x)=\frac{\Pr(D=1\mid x)\,g(x)\,\dfrac{P^{*}}{P}}{g(x\mid D=1)\,P^{*}+g(x\mid D=0)\,(1-P^{*})} \tag{3a}$$

and

$$\widetilde{\Pr}(D=0\mid x)=\frac{\Pr(D=0\mid x)\,g(x)\,\dfrac{1-P^{*}}{1-P}}{g(x\mid D=1)\,P^{*}+g(x\mid D=0)\,(1-P^{*})}. \tag{3b}$$

Under random sampling within the treatment and comparison strata, the right-hand sides of (3a) and (3b) are the limits to which the unweighted choice-based probability estimates converge. Taking the ratio of (3a) to (3b), assuming the latter is not zero, one obtains

$$\frac{\widetilde{\Pr}(D=1\mid x)}{\widetilde{\Pr}(D=0\mid x)}=\frac{\Pr(D=1\mid x)}{\Pr(D=0\mid x)}\left(\frac{P^{*}}{1-P^{*}}\right)\left(\frac{1-P}{P}\right). \tag{4}$$
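Equations (3a) to (4) can be checked exactly in the same hypothetical three-cell example (repeated here so the block runs on its own). The sketch assumes a choice-based design with $P^{*}=1/2$, computes the limits of the unweighted choice-based probabilities from (3a) and (3b), and checks that their odds ratio equals the true odds times the constant in (4). All numbers are illustrative assumptions, not values from the paper.

```python
from fractions import Fraction

# Hypothetical discrete population (illustrative numbers, not from the paper).
g_x = {0: Fraction(1, 2), 1: Fraction(3, 10), 2: Fraction(1, 5)}
pr1 = {0: Fraction(1, 10), 1: Fraction(3, 10), 2: Fraction(3, 5)}
P = sum(g_x[x] * pr1[x] for x in g_x)  # true population share of treated

# Assumed choice-based design: half of the sample is drawn from the treated.
P_star = Fraction(1, 2)


def odds(p):
    return p / (1 - p)


def choice_based_limits(x):
    """Limits of the unweighted Pr~(D=1|x) and Pr~(D=0|x), equations (3a) and (3b)."""
    g1 = g_x[x] * pr1[x] / P                  # g(x | D = 1), from (1a)
    g0 = g_x[x] * (1 - pr1[x]) / (1 - P)      # g(x | D = 0), from (1b)
    denom = g1 * P_star + g0 * (1 - P_star)   # density of x in the sampled population
    return g1 * P_star / denom, g0 * (1 - P_star) / denom


scale = odds(P_star) * (1 - P) / P  # the constant multiplying the true odds in (4)
for x in g_x:
    tilde1, tilde0 = choice_based_limits(x)
    assert tilde1 + tilde0 == 1
    # Equation (4): choice-based odds = true odds times a constant common to all x.
    assert tilde1 / tilde0 == odds(pr1[x]) * scale
    print(x, tilde1, pr1[x])
```

For the first cell the choice-based limit is 37/154 rather than the true propensity score 1/10, yet the odds differ from the true odds only by the factor 37/13 shared by all cells, so the ranking of cells is preserved.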

Thus, one can estimate the odds ratio of the propensity score up to scale (the scale is the product of the last two terms on the right-hand side of (4)). Instead of estimating matching or semiparametric selection models using Pr(*D* = 1 | *x*) (as in, for example, Ahn and Powell, 1993; Heckman, 1980; Heckman and Hotz, 1989; Heckman et al., 1998; Heckman and Robb, 1986; Powell, 2001), one can use the odds ratio of the estimate $\widetilde{\Pr}(D=1\mid x)$, which is monotonically related to the true Pr(*D* = 1 | *x*). In the case of a logit *P*(*x*) = exp(*xβ*)/(1 + exp(*xβ*)), the log of this ratio becomes

$$\ln\frac{\widetilde{\Pr}(D=1\mid x)}{\widetilde{\Pr}(D=0\mid x)}=x\tilde{\beta},$$

where the slope coefficients are the true values and the intercept is

$$\tilde{\beta}_{0}=\beta_{0}+\ln\left(\frac{P^{*}}{1-P^{*}}\right)+\ln\left(\frac{1-P}{P}\right),$$

where $\beta_{0}$ is the true intercept.^{3}
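A simulation sketch of the logit case follows, assuming numpy and made-up parameter values ($\beta_{0}=-2$, $\beta_{1}=1$, standard normal *x*, $P^{*}=1/2$). It draws a choice-based sample with equal numbers of treated and comparison persons from a large hypothetical population, fits an unweighted logit with a hand-written Newton-Raphson routine (an illustrative helper, not a library estimator), and compares the estimates with the true slope and the shifted intercept given above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population with a logit propensity score (assumed beta0 = -2, beta1 = 1).
beta0, beta1 = -2.0, 1.0
N = 1_000_000
x_pop = rng.normal(size=N)
p_pop = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x_pop)))
d_pop = rng.random(N) < p_pop
P = d_pop.mean()  # true population share, unknown to the analyst in practice

# Choice-based sample: equal numbers of treated and comparison persons, so P* = 1/2.
n_each = 20_000
treated = rng.choice(np.flatnonzero(d_pop), size=n_each, replace=False)
controls = rng.choice(np.flatnonzero(~d_pop), size=n_each, replace=False)
idx = np.concatenate([treated, controls])
X = np.column_stack([np.ones(idx.size), x_pop[idx]])
y = d_pop[idx].astype(float)
P_star = y.mean()


def fit_logit(X, y, iters=25):
    """Unweighted logit MLE by Newton-Raphson; ignores the choice-based design."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        b += np.linalg.solve(hess, grad)
    return b


b_tilde = fit_logit(X, y)
# Intercept predicted by the formula above: beta0 + ln(P*/(1-P*)) + ln((1-P)/P).
predicted_intercept = beta0 + np.log(P_star / (1 - P_star)) + np.log((1 - P) / P)
print("slope:", b_tilde[1], "true:", beta1)
print("intercept:", b_tilde[0], "predicted:", predicted_intercept)
```

The hand-written Newton-Raphson step only keeps the sketch dependent on numpy alone; any unweighted logit routine should approach the same limits as the sample grows.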

In implementing nearest-neighbor matching estimators, matching on the estimated log odds ratio gives estimates identical to matching on the (unknown) true log odds ratio of Pr(*D* = 1 | *x*), because by (4) the two differ only by an additive constant, which preserves the distances between, and hence the ranking of, the neighbors. In applying either matching or classical selection bias correction methods, one must account for the usual problems of using estimated log odds ratios instead of true values.^{4}
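The matching argument can be illustrated with a brute-force one-to-one nearest-neighbor matcher (the helper `nn_match` is hypothetical, written for this sketch). Hypothetical true propensity scores are drawn at random, and the estimated log odds from a choice-based sample are modeled, following (4), as the true log odds plus the constant $\ln(P^{*}/(1-P^{*}))+\ln((1-P)/P)$, evaluated at the illustrative values $P^{*}=0.5$ and $P=0.26$. Because the constant cancels in every treated-comparison distance, both scores should select the same neighbors.

```python
import numpy as np


def nn_match(score_treated, score_control):
    """Index of the nearest comparison unit (absolute distance) for each treated unit."""
    dist = np.abs(score_treated[:, None] - score_control[None, :])
    return dist.argmin(axis=1)


def log_odds(p):
    return np.log(p / (1 - p))


rng = np.random.default_rng(1)
# Hypothetical true propensity scores for 50 treated and 200 comparison units.
p_treated = rng.uniform(0.05, 0.95, size=50)
p_control = rng.uniform(0.05, 0.95, size=200)

# Constant from equation (4) at illustrative values P* = 0.5 and P = 0.26.
P_star, P = 0.5, 0.26
shift = np.log(P_star / (1 - P_star)) + np.log((1 - P) / P)

# Matching on the true log odds versus on the shifted (estimated) log odds.
matches_true = nn_match(log_odds(p_treated), log_odds(p_control))
matches_est = nn_match(log_odds(p_treated) + shift, log_odds(p_control) + shift)
print("identical matches:", bool((matches_true == matches_est).all()))
```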

^{*}This research was supported by NSF SBR 93-21-048 and NSF 97-09-873 and NICHD 40-4043-000-85-261.

^{1}It also plays a key role in instrumental variables models (see Heckman et al., 2006). Heckman and Vytlacil (2007) discuss the different roles played by the propensity score in matching, IV and selection models.

^{2}The methods of Manski and Lerman (1977) and Manski (1986) for adjusting for choice-based sampling in estimating the discrete choice probabilities cannot be applied when the weights are unknown and cannot be identified from the data.

^{3}See Manski and McFadden (1981, p. 26).

^{4}For discussions of using estimated propensity scores, see Hahn (1998); Heckman, Ichimura, Smith and Todd (1998); Heckman, Ichimura and Todd (1998); Hirano et al. (2003).

James J. Heckman, University of Chicago, University College Dublin, Cowles Foundation, Yale University and American Bar Foundation.

Petra E. Todd, University of Pennsylvania and NBER.

- Ahn H, Powell J. Semiparametric estimation of censored selection models with a nonparametric selection mechanism. Journal of Econometrics. 1993 July;58(1-2):3–29.
- Dehejia R, Wahba S. Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. Journal of the American Statistical Association. 1999 December;94(448):1053–1062.
- Dehejia R, Wahba S. Propensity score matching methods for nonexperimental causal studies. Review of Economics and Statistics. 2002 February;84(1):151–161.
- Hahn J. On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica. 1998 March;66(2):315–331.
- Heckman JJ. Addendum to sample selection bias as a specification error. In: Stromsdorfer E, Farkas G, editors. Evaluation Studies Review Annual. Vol. 5. Beverly Hills: Sage Publications; 1980.
- Heckman JJ, Hotz VJ. Choosing among alternative nonexperimental methods for estimating the impact of social programs: The case of Manpower Training. Journal of the American Statistical Association. 1989 December;84(408):862–874. Rejoinder also published in Vol. 84, No. 408 (Dec. 1989).
- Heckman JJ, Ichimura H, Smith J, Todd PE. Characterizing selection bias using experimental data. Econometrica. 1998 September;66(5):1017–1098.
- Heckman JJ, Ichimura H, Todd PE. Matching as an econometric evaluation estimator. Review of Economic Studies. 1998 April;65(223):261–294.
- Heckman JJ, Navarro S. Using matching, instrumental variables, and control functions to estimate economic choice models. Review of Economics and Statistics. 2004 February;86(1):30–57.
- Heckman JJ, Robb R. Alternative methods for evaluating the impact of interventions: An overview. Journal of Econometrics. 1985 October-November;30(1-2):239–267.
- Heckman JJ, Robb R. Alternative methods for solving the problem of selection bias in evaluating the impact of treatments on outcomes. In: Wainer H, editor. Drawing Inferences from Self-Selected Samples. New York: Springer-Verlag; 1986. pp. 63–107. Reprinted Mahwah, NJ: Lawrence Erlbaum Associates; 2000.
- Heckman JJ, Urzua S, Vytlacil EJ. Understanding instrumental variables in models with essential heterogeneity. Review of Economics and Statistics. 2006;88(3):389–432.
- Heckman JJ, Vytlacil EJ. Econometric evaluation of social programs, part II: Using the marginal treatment effect to organize alternative economic estimators to evaluate social programs and to forecast their effects in new environments. In: Heckman J, Leamer E, editors. Handbook of Econometrics. Vol. 6B. Amsterdam: Elsevier; 2007. pp. 4875–5144.
- Hirano K, Imbens GW, Ridder G. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica. 2003 July;71(4):1161–1189.
- LaLonde RJ. Evaluating the econometric evaluations of training programs with experimental data. American Economic Review. 1986 September;76(4):604–620.
- Manski CF. Semiparametric analysis of binary response from response-based samples. Journal of Econometrics. 1986 February;31(1):31–40.
- Manski CF, Lerman SR. The estimation of choice probabilities from choice based samples. Econometrica. 1977 November;45(8):1977–1988.
- Manski CF, McFadden D. Statistical analysis of discrete probability models. In: Manski CF, McFadden D, editors. Structural Analysis of Discrete Data with Econometric Applications. Cambridge, MA: MIT Press; 1981. pp. 2–49.
- Powell JL. Semiparametric estimation of bivariate latent variable models. In: Hsiao C, Morimune K, Powell JL, editors. Nonlinear Statistical Modeling: Proceedings of the Thirteenth International Symposium in Economic Theory and Econometrics: Essays in Honor of Takeshi Amemiya. New York: Cambridge University Press; 2001.
- Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983 April;70(1):41–55.
- Smith JA, Todd PE. Does matching overcome LaLonde's critique of nonexperimental estimators? Journal of Econometrics. 2005 March-April;125(1-2):305–353.
