Search tips
Search criteria 


Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
Biometrics. Author manuscript; available in PMC 2012 March 1.
Published in final edited form as:
PMCID: PMC3030650

Estimation and Inference for the Causal Effect of Receiving Treatment on a Multinomial Outcome: An Alternative Approach


Recently Cheng (Biometrics, 2009) proposed a model for the causal effect of receiving treatment when there is all-or-none compliance in one randomization group, with maximum likelihood estimation based on convex programming. We discuss an alternative approach that involves a model for all-or-none compliance in two randomization groups and estimation via a perfect fit or an EM algorithm for count data. We believe this approach is easier to implement, which would facilitate the reproduction of calculations.

Keywords: All-or-none compliance, Causal effect, Multinomial outcomes, Noncompliance, Perfect fit, Principal stratification, Randomized trials

1. Introduction

Cheng (2009) proposed a causal analysis of a randomized trial to investigate the effect of seeing a specialist on the development of depression symptoms among elderly patients. The role of the specialist was to increase adherence to the depression treatments via education and assessment. Twenty primary care practices were randomized to either seeing a specialist or usual care which involved no access to the specialist. For simplicity, group randomization was ignored, and randomization was considered at the level of the individual. An important outcome was depression symptoms (none, minor, and major) at four months. Some subjects randomized to seeing a specialist did not see the specialist and so essentially received usual care. To analyze these data, Cheng (2009) specified a causal model with latent compliance classes for multinomial outcomes with all-or-none compliance in one randomization group. Cheng (2009) computed maximum likelihood (ML) estimates using a grid search and a convex programming algorithm. For a more general causal model with all-or-none compliance in two groups, we propose ML estimation via a perfect fit estimate or a simple EM algorithm for count data. Our estimation approach is simple, which is helpful if others wish to reproduce the results. Reproducibility of research is an important desideratum in statistical analyses (Peng, 2009).

2. Model

All-or-none-compliance, sometimes called all-or-none switching of interventions, has a precise meaning with important implications for the formulation of a causal model (Baker, 1997, Baker and Kramer 2005). Suppose subjects are randomly assigned to either T0 or T1, which denote different programs of interventions starting at randomization. All-or-none compliance in one randomization group means that all subjects randomized to T0 receive T0, and some subjects randomized to T1 receive T0 and some receive T1. All-or-none compliance in two randomization groups means that in each randomization group some subjects receive T0 and some receive T1. Under all-or-none compliance, one can use latent compliance classes, randomization, and reasonable assumptions to estimate the causal effect of receiving intervention (e.g. Baker and Kramer, 2005). In a more general setting, latent compliance classes are called principal strata (Frangakis and Rubin, 2002).

Models for latent compliance classes with all-or-none compliance in one group were formulated by Bloom (1984) for continuous outcomes and Sommer and Zeger (1991) and Connor et al (1991) for binary outcomes. Similar models for latent compliance classes with all-or-none compliance in two groups were formulated by Baker and Lindeman (1994) for binary outcomes (with before-and-after time periods treated like randomization groups) and Angrist Imbens and Rubin (1996) for continuous outcomes.

We generalize the model of Cheng et al (2009) to multinomial outcomes with all-or-none compliance in two groups instead of one. Let r index randomization group with r = 0 for assignment to intervention T0 and r = 1 for assignment to intervention T1. Let a index intervention received, where a = 0 for receipt of T0 and a = 1 for receipt of T1. Let j = 1, 2, …, J index multinomial outcomes. The counts are denoted

  • nraj = number of persons in randomization group r who receive intervention a and have outcome j.

In the terminology of Angrist et al (1996), there are four latent compliance classes:

  • A, always-takers, who would receive intervention T1 if randomized to either group;
  • C, compliers, who would receive the intervention to which they were randomized;
  • N, never-takers, who would receive intervention T0 if randomized to either group;
  • D, defiers, who would receive the opposite intervention to which they were randomized.

Following Cheng (2009), we make the same assumptions listed in Angrist et al (2006) and described below without mathematical notation.

  • ASSUMPTION 1: Stable unit treatment value assumption. The outcome for a subject is unaffected by the particular assignments of treatments to the other subjects (Rubin, 1980).
  • ASSUMPTION 2: Random assignment to randomization groups.
  • ASSUMPTION 3: Exclusion restriction. For always-takers and never-takers, the distribution of outcomes does not depend on randomization group.
  • ASSUMPTION 4: The fraction of subjects who receive each intervention varies by randomization group.
  • ASSUMPTION 5: Monotonicity. There are no defiers.

Extending the notation in Cheng (2009), we define the following parameters under the latent compliance model:

  • πC, the probability of being a complier,
  • πA, the probability of being an always-taker,
  • πN, the probability of being a never-taker,
  • νj, the probablity a complier in randomization group 0 has outcome j,
  • tj, the probablity a complier in randomization group 1 has outcome j,
  • sj, the probablity a never-taker has outcome j,
  • bj, the probablity an always-taker has outcome j,

where πC + πA + πN = 1 and Σj tj = Σj sj = Σj νj = Σj bj = 1. See Table 1 for the distributions of nraj in terms of these parameters.

Table 1
Model for latent compliance classes

The quantity of interest is the change in a weighted sum of outcomes due to receiving T1 instead of T0 among compliers,


where wj denotes a weight for outcome j based on the relative severity of the outcome. This quantity is called a complier average causal effect (CACE)(Angrist, Imbens, and Rubin, 1996).

2. Estimation

Estimation is via maximum likelihood with the count data. The kernel of the likelihood is

L=[product]j=1J[(πNsj+πCνj)n00j(πA  bj)n01j(πNsj)n10j(πCtj+πA  bj)n11j,

which generalizes the likelihood in Baker and Lindeman (1994) from binomial to multinomial outcomes and the likelihood in Cheng (2009) from all-or-none compliance in one group to all-or-none compliance in two groups. Let [theta w/ hat]CACE denote the ML estimate of θCACE. There are two cases for [theta w/ hat]CACE: [theta w/ hat]CACE(PerfectFit) if ML estimates of all parameters lie in the interior of the parameter space, and [theta w/ hat]CACE(Boundary) if ML estimates of some parameters lie on the boundary of the parameter space.

A perfect fit estimate is possible with this model because the number of independent parameters equals the number of independent cell counts. There are 4 J − 2 independent parameters, corresponding to J − 1 values for each of sj, bj, tj, and νj plus the 2 parameters πN and πA. There are also 4 J − 2 independent cells counts, with 4 J counts for {nraj} and a fixed total in each randomization group. Thus the model is saturated, and one can compute [theta w/ hat]CACE(PerfectFit) by setting observed counts equal their expected values. As derived in Appendix A,

θ^CACE(PerfectFit)=j=1Jwj(t^jν^j), wheret^j=n11jn1++n01jn0++π^C,  ν^j=n00jn0++n10jn1++π^C,  π^C=1n01+n0++n10+n1++.

When [pi]C, [nu with circumflex]j, and tj are each between zero and one, [theta w/ hat]CACE(PerfectFit) is the ML estimate. This is because the observed counts are the ML estimates for a different saturated model in which each expected count is a parameter, there is a one-to-one function relating the parameters in the model for latent compliance classes and the aforementioned parameters corresponding to expected counts, and a theorem states that estimates derived from a one-to-one function of ML estimates are also ML estimates (Rohatgi, 1976).

The perfect fit estimate of causal effect equals the intent-to-treat estimate, [theta w/ hat]ITT, divided by the difference in the fraction receiving T1 in each group,

θ^CACE(PerfectFit)=θ^ITTπ^C, where θ^ITT=j=1Jwj(n0+jn0++n1+jn1++),

which parallels the perfect fit estimate for a causal effect with binary outcomes (Baker and Lindeman, 1994) and the instrumental variables estimate for a causal effect with continuous outcomes (Angrist Imbens and Rubin, 1996). The estimated asymptotic variance of [theta w/ hat]CACE(PerfectFit) can be computed using the Mulitnomial-Poisson transformation (Baker, 1994),

var^{θ^CACE(PerfectFit)}=r=01a=01j=1J{[partial differential]θ^CACE(PerfectFit)[partial differential]nraj}2nraj,

where the derivatives are readily computed using software for symbolic algebra.

If either [pi]C, [nu with circumflex]j, or tj in (3) is less than zero or greater than one, the perfect fit estimate is not admissible, and we require the ML estimate of θCACE when some parameters lies on the boundary of the parameter space. In this case ML estimates can be computed using the following EM algorithm. The E-step computes the complete-data counts{eruj} corresponding to the expected the number of persons in randomization group r who are in latent compliance class u = N, C, A and have outcome j. The M-step estimates the parameters from the complete data on the E-step. Let superscript (i) index each iteration of the combined E-step and M-step. Based on the parameter estimates from the previous M-step, iteration (i) of the E-step is

e0Nj(i)=n00jπ^N(i)s^j(i)π^N(i)s^j(i)+π^C(i)ν^j(i),  e0Cj(i)=n00jπ^C(i)ν^j(i)π^N(i)s^j(i)+π^C(i)ν^j(i),e0Aj(i)=n01j,  e1Nj(i)=n10j,e1Cj(i)=n11jπ^C(i)t^j(i)π^C(i)t^j(i)+π^A(i)b^j(i),e1Aj(i)=n11jπ^A(i)b^j(i)π^C(i)t^j(i)+π^A(i)b^j(i),

and iteration (i) of the M-step is

π^N(i+1)=e+N+(i)e+++(i),π^C(i+1)=e+C+(i)e+++(i),  π^A(i+1)=e+A+(i)e+++(i),s^j(i+1)=e+Nj(i)e+N+(i),   b^j(i+1)=e+Aj(i)e+A+(i),   ν^j(i+1)=e0Cj(i)e0C+(i),t^j(i+1)=e1Cj(i)e1C+(i).

The parameter estimates in (7) are used for iteration (i + 1) of the E-Step. The ML estimate of θCACE is obtained by substituting the final M-step estimates at convergence into (1). Starting values for parameter estimates equal the perfect fit parameter estimates in (3) with any perfect fit estimate of πC, νj, or tj that is less than zero set equal to zero, and any perfect fit estimate of πC, νj, or tj that is greater than one set equal to one. Imbens and Rubin (1997) also computed ML estimates for a multinomial model via an EM algorithm but did not provide details.

Consider the important special case of a binomial outcome with all-or-none compliance in one randomization group. The counts in group r = 0 are {n00j} and the counts in group r = 1 are {n1aj}, for a = 0, 1 and j = 0, 1. In this case, both ML perfect fit and ML boundary estimates can be written in closed-form,

θ^CACE(PerfectFit)=t^1ν^1,when 0<ν^1<1,θ^CACE(Boundary)={t^1,when ν^10,t^11,when ν^11,where t^1=n111n11+,  ν^1=n001n00+n101n1++π^C,  π^C=n11+n1++.

To our knowledge, the above ML boundary estimate, which is derived in Appendix B, is a new result. A previous ML estimate for this particular boundary scenario involved an iterative algorithm (Cheng et al, 2009).

4. Re-analysis of depression data in Cheng (2009)

For the analysis of the depression data from Cheng (2009) in Table 2, we modified the formulas for the special case of all-or-none compliance in one randomization group, so there are no always-takers. Assumption 1 holds because a patient's outcome did not depend on the treatment of another patient. Assumption 2 holds because of random assignment. Assumption 3 holds because a person who would receive usual care in either group (a never-taker) receives the same usual care if randomized to either group. Assumptions 4 and 5 hold because seeing a specialist was not possible in the control group.

We computed [theta w/ hat]CACE for the weights in Cheng (2009) of (w1, w2, w3) = (1, 2, 3) and (1, 3, 4). We also investigated weights (0, −0.5, −1.0) under the premise that minor depression is half the severity of major depression, with a negative sign indicating a decrease in these detrimental outcomes. With these latter weights, [theta w/ hat]CACE can be interpreted as the expected reduction in major depression "equivalents" per person due to seeing a specialist instead of receiving usual care. Confidence intervals computed asymptotically and by bootstrapping were similar (Table 3), with all but one of ten thousand bootstrap iterations yielding a perfect fit estimate in the interior of the parameter space. For the proposed weighting, [theta w/ hat]CACE(PerfectFit) = 0.009 with a 95% confidence interval of (− 0.14, 0.16). Thus, there is little evidence that seeing a specialist reduces the probability of developing depression symptoms, but the wide confidence intervals may preclude a definitive conclusion.

Table 3
Comparison of results

5. Conclusion

With the growing emphasis on reproducible research (Peng, 2009), this simpler approach to ML estimation in causal models for multinomial data should be attractive to many researchers.

Supplementary Material

Supplementary materials


The author thanks Jing Cheng for helpful comments and kindly providing the data.


Perfect fit estimates

Using the following procedure we derived the perfect fit estimates in (3) for all-or-none compliance in two randomization groups with multinomial outcomes. Setting observed counts equal to their expected values yields


n01j=n0++πA  bj,

n10j=n1++πN  sj,

n11j=n1++{(1πNπA)tj+πA  bj}.

Summing (A.2) and (A.3) over j yields n01+ = n0++ πA and n10+ = n1++ πN. Therefore [pi]A = n01+/n0++, [pi]N = n10+/n1++, and [pi]C= 1 − [pi]A[pi]N. Substituting [pi]A and [pi]N into (A.2) and (A.3) gives bj = n01j/n01+ and ŝj = n10j/n10+. Substituting these estimated parameters into (A.1) and (A.4) gives the perfect fit estimates for νj and tj in (3). For estimates under all-or-none compliance in one randomization group, one substitutes n01j = 0 into the aforementioned estimates of πC, sj, tj, and νj.


ML boundary estimate with binary outcomes

Using the following procedure we derived the closed-form ML boundary estimates in (8) for all-or-none compliance in only one randomization group and a binomial outcome. The counts in group r = 0 are {n0a}, and the counts in group r = 1 are {n1aj} for j = 0, 1. The kernel of the log-likelihood is

L=n000log{πN(1s1)+πC(1ν1)}+n001 log(πNs1+πCν1)+n100 log{πN(1s1)}+n101 log(πNs1)+n110 log{πC(1t1)}+n111 log(πCt1).

Let n = n0+ + n1++. Using software for symbolic algebra (Wolfram Research, 2008) to solve [partial differential]L/[partial differential]πC = [partial differential]L/[partial differential]s1 = [partial differential]L/[partial differential]t1 = 0 yields t1 = n111/n11+ and

π^C=n11+(n000+f)nf,s^1=f(n001+n101)n000n100+f(n001+n10+), for ν1=0,π^C=n11+(n001+k)nk,s^1=n101(n001+k)n001n101+k(n000+n10+), for ν1=1.

where f = n100 + n11+ and k = n101 + n11+.


Supplementary materials

Computer code in Mathematica 7.0 (Wolfram Research, Inc., 2008) to reproduce these results is available under the Paper Information link at the Biometrics website


  • Angrist JD, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables. Journal of the American Statistical Association. 1996;91:44–455.
  • Baker SG. The Multinomial-Poisson transformation. The Statistician. 1994;43:495–504.
  • Baker SG. Compliance, all-or-none. In: Kotz S, Read CR, Banks DL DL, editors. The Encyclopedia of Statistical Science, Update Volume. New York: John Wiley and Sons, Inc.; 1997. pp. 134–138.
  • Baker SG, Kramer BS. Simple maximum likelihood estimates of efficacy in randomized trials and before-and-after studies, with implications for meta-analysis. Statistical Methods in Medical Research. 2005;14:1–19. correction 14, 349. [PubMed]
  • Baker SG, Lindeman KS. The paired availability design: a proposal for evaluating epidural analgesia during labor. Statistics in Medicine. 1994;13:2269–2278. correction 14, 1841. [PubMed]
  • Bloom HS. Accounting for no-shows in experimental evaluation designs. Evaluation Review. 1984;8:225–246.
  • Cheng J. Estimation and inference for the causal effect of receiving treatment on a multinomial outcome. Biometrics. 2009;65:96–103. [PubMed]
  • Cheng J, Small DS, Tan Z, Ten Have TR. Efficient nonparametric estimation of causal effects in randomized trials with noncompliance. Biometrika. 2009;96:19–36.
  • Connor R, Prorok PC, Weed DL. The case-control design and the assessment of the efficacy of cancer screening. Journal of Clinical Epidemiology. 1991:44.1215–44.1221. [PubMed]
  • Frangakis CE, Rubin DB. Principal stratification in causal inference. Biometrics. 2002;58:21–29. [PubMed]
  • Peng RD. Reproducible research and Biostatistics. Biostatistics. 2009;10:405–408. [PubMed]
  • Imbens GW, Rubin DB. Estimating outcome distributions for compliers in instrumental variables models. Review of Economic Studies. 1997;64:555–574.
  • Rohatgi VK. An Introduction to Probability Theory and Mathematical Statistics. New York: John Wiley and Sons, Inc.; 1976. p. 383.
  • Rubin DB. Comment on 'Randomization analysis of experimental data: The Fisher randomization test' by D. Basau. Journal of the American Statistical Association. 1990;75:591–593.
  • Sommer A, Zeger SL. On estimating efficacy from clinical trials. Statistics in Medicine. 1991;10:45–52. [PubMed]
  • Wolfram Research, Inc. Mathematica, Version 7.0. Champaign, IL: Wolfram Research, Inc.; 2008.