“Censoring by death” has become shorthand for the problem of characterizing the effect of a treatment or exposure *X* on an outcome *Y* when death (*Z*; 1=alive, 0=dead) precludes the development or observation of that outcome.

Pearl (2011) and VanderWeele (2011) accept the notion advanced by Frangakis and Rubin (2002) and others that the combined PS/NEA approach defines a target quantity, the survivor average causal effect (SACE), of great interest^{2}. This section will provide simple numerical examples to demonstrate that the SACE, while an interesting quantity, does not fully capture the notion of the effect of treatment on the censored outcome in the presence of censoring by death and is in general inadequate for decision making or regulatory purposes. We then discuss other quantities of interest, and consider important differences between censoring by death and the problem of post-infection outcomes, which has sometimes been treated as the same problem.

3.1. Problems with principal stratification

A continuous outcome *Y* may be well-defined only for subjects who live. Frangakis and Rubin (2002) argue that the causal effect of treatment on *Y* is only defined in the principal stratum in which subjects live whether or not they are treated (i.e., *Z*_{0}=*Z*_{1}=1) and so *Y* is well-defined under both treatment levels. Thus, they argue, causal effects on *Y*, as contrasts of the potential outcomes, are well defined only for this principal stratum.

It is productive to treat as a causal contrast the comparison between the poorly defined or missing potential outcome for subjects who die and the well-defined potential outcome for subjects who do not. Let · denote the value of *Y* when a subject is dead and so *Y* cannot be measured. When the proportions of the population for whom treatment causes and prevents death are the same and so treatment has no net effect on mortality, this results in a simple measure of the net effect of treatment on *Y* adjusting for survival, as described below in this section. Section 3.2.3 considers a generalization to settings where treatment affects marginal survival distributions. This redefinition of the effects of *X* on *Y* allows us to avoid the apparent paradox in which treatment has no net effect on *Z* and no effect on *Y* (under Frangakis and Rubin’s definition), yet affects the distribution of *Y*.

The data in and illustrate these points. In these examples, treatment *X* has no effect overall on mortality, but causes and prevents death in equal numbers of subjects. In , treatment has no effect on *Y* in the “immune to death” principal stratum (i.e., *E*(*Y*_{1}–*Y*_{0}|*Z*_{0}=*Z*_{1}=1)=0). We would be tempted to conclude that treatment is neither beneficial nor harmful, since it has no effect on *Y* in the immune principal stratum and has no overall effect on mortality.

Even though treatment does not affect *Y* in the immune principal stratum, it does affect the distribution of *Y* in the population (i.e., *pr*(*Y*_{0}=*y*)≠*pr*(*Y*_{1}=*y*), where *Y*_{x}=· if *Z*_{x}=0), and further, the joint distribution of the potential outcomes is different under the two levels of treatment (i.e., *pr*(*Y*_{0}=*y*,*Z*_{0}=*z*)≠*pr*(*Y*_{1}=*y*,*Z*_{1}=*z*)). Thus, the average outcome conditional on surviving differs between the two levels of treatment (i.e., *E*(*Y*_{0}|*Z*_{0}=1)=0.5≠*E*(*Y*_{1}|*Z*_{1}=1)=1); this is represented in . One should conclude from these data that treatment is, overall, beneficial (if higher levels of *Y* are good); this is supported by a decision-theoretic approach to the problem (Joffe, Small, and Hsu, 2007), in which the utility under a treatment is a function only of the marginal (with respect to *X*) distribution of variables under that treatment and not a function of the joint (with respect to *X*) distribution of potential outcomes under different treatments. A sole concentration on the immune principal stratum (this focus is nearly explicit in Rubin (2006)) would lead to an incorrect conclusion, even about the marginal effects of treatment on *Y*.
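The pattern described in this example can be reproduced with a small hypothetical population. The sketch below does not reproduce the paper's table: the stratum proportions and outcome values are assumptions chosen only to match the quantities quoted in the text (no effect in the immune stratum, equal proportions of deaths caused and prevented, *E*(*Y*_{0}|*Z*_{0}=1)=0.5 and *E*(*Y*_{1}|*Z*_{1}=1)=1).

```python
from fractions import Fraction

# Hypothetical principal strata (NOT the paper's table). Each row is
# (label, proportion, Z0, Z1, Y0, Y1); None stands for the undefined "·" outcome.
STRATA = [
    ("immune",                       Fraction(1, 3), 1, 1, 1, 1),
    ("death caused by treatment",    Fraction(1, 3), 1, 0, 0, None),
    ("death prevented by treatment", Fraction(1, 3), 0, 1, None, 1),
]

def survival(x):
    """pr(Z_x = 1): marginal survival under treatment level x."""
    return sum(p for _, p, z0, z1, _, _ in STRATA if (z0, z1)[x] == 1)

def mean_among_survivors(x):
    """E(Y_x | Z_x = 1): mean outcome among those alive under level x."""
    rows = [(p, (y0, y1)[x]) for _, p, z0, z1, y0, y1 in STRATA if (z0, z1)[x] == 1]
    return sum(p * y for p, y in rows) / sum(p for p, _ in rows)

def sace():
    """E(Y_1 - Y_0 | Z_0 = Z_1 = 1): effect within the immune stratum."""
    rows = [(p, y1 - y0) for _, p, z0, z1, y0, y1 in STRATA if z0 == 1 and z1 == 1]
    return sum(p * d for p, d in rows) / sum(p for p, _ in rows)

print("pr(Z0=1), pr(Z1=1):", survival(0), survival(1))
print("SACE:", sace())
print("E(Y0|Z0=1), E(Y1|Z1=1):", mean_among_survivors(0), mean_among_survivors(1))
```

In this assumed population the SACE and the net effect on mortality are both zero, yet the mean outcome among survivors is twice as high under treatment.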

Consideration of all principal strata simultaneously is here interesting and informative. We would correctly note that treatment has no effect on *Y* in the immune principal stratum, but that in the “die only if untreated” stratum (i.e., *Z*_{1}=1,*Z*_{0}=0), the level of *Y* is higher than in the “die only if treated” stratum (*Z*_{1}=0,*Z*_{0}=1). A difficulty here is that membership in a principal stratum is not directly observed, and both individual-level membership and proportions in a stratum are not identified without further assumptions. Since these strata have the same size, it would be nice to say that, overall, treatment increases *Y* even after adjusting appropriately for mortality. We might do this by allowing the missing values for *Y* to cancel here, since the strata are of equal size. It is not immediately apparent how to do this when treatment affects the overall probability of death and the two strata are not of equal size.

In , treatment reduces the level of *Y* in the immune principal stratum but has no effect on the marginal distribution of *Y* or the joint distribution of *Y* and *Z*. If lower levels of *Y* are undesirable, a focus solely on the distribution of *Y* in the immune principal stratum (with or without also considering the effect of *X* on *Z*) would lead to the incorrect conclusion that treatment is harmful, since it is harmful in the immune principal stratum and has no overall effect on mortality. Nonetheless, treatment has no overall effect on the distribution of *Y* in the population (i.e., *pr*(*Y*_{0}=*y*)=*pr*(*Y*_{1}=*y*), where *Y*_{x}=· if *Z*_{x}=0), and further, the joint distribution of the potential outcomes is the same under both levels of treatment (i.e., *pr*(*Y*_{0}=*y*,*Z*_{0}=*z*)=*pr*(*Y*_{1}=*y*,*Z*_{1}=*z*)), and so the average outcome conditional on surviving is the same under both levels of treatment (i.e., *pr*(*Y*_{0}=*y*|*Z*_{0}=1)=*pr*(*Y*_{1}=*y*|*Z*_{1}=1)=0.5). One should conclude from these data that treatment is, overall, neither beneficial nor harmful. A sole concentration on the immune principal stratum would lead to an incorrect conclusion, even about the marginal effects of treatment on *Y*.
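The opposite pattern can be sketched the same way. The population below is again hypothetical (it is not the paper's table); its values are assumptions picked so that treatment lowers *Y* in the immune stratum while the marginal joint distributions of (*Z*_{x},*Y*_{x}) coincide and the mean among survivors is 0.5 under both treatment levels.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical principal strata (NOT the paper's table). Each row is
# (proportion, (Z0, Y0), (Z1, Y1)); Y is None (the "·" value) when Z = 0.
STRATA = [
    (Fraction(1, 3), (1, 1), (1, 0)),      # immune: treatment lowers Y from 1 to 0
    (Fraction(1, 3), (1, 0), (0, None)),   # death caused by treatment
    (Fraction(1, 3), (0, None), (1, 1)),   # death prevented by treatment
]

def marginal_joint(x):
    """pr(Z_x = z, Y_x = y), obtained by summing stratum proportions."""
    dist = defaultdict(Fraction)
    for row in STRATA:
        dist[row[1 + x]] += row[0]
    return dict(dist)

# Effect within the immune stratum, E(Y_1 - Y_0 | Z_0 = Z_1 = 1).
immune = [(p, a, b) for p, a, b in STRATA if a[0] == 1 and b[0] == 1]
sace = sum(p * (b[1] - a[1]) for p, a, b in immune) / sum(p for p, _, _ in immune)

print("SACE:", sace)
print("pr(Z0, Y0):", marginal_joint(0))
print("pr(Z1, Y1):", marginal_joint(1))
```

Here the SACE is negative, but a decisionmaker whose utility depends only on the distribution of (*Z*_{x},*Y*_{x}) under each treatment would be indifferent between the two.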

An assumption often made in the PS literature is monotonicity of the potential auxiliary variable (i.e., *Z*_{1}≥*Z*_{0} with probability 1 or *Z*_{1}≤*Z*_{0} with probability 1). Consider the data in , which could represent observed data in a randomized trial (changing *x* to *X*). The combination of monotonicity and the data in leads to the data in . Under the (incorrect) monotonicity assumption, the effect of treatment in the immune principal stratum leads to the same contrast as derived above; i.e., *E*(*Y*_{1}|*Z*_{0}=*Z*_{1}=1)–*E*(*Y*_{0}|*Z*_{0}=*Z*_{1}=1)=*E*(*Y*_{1}|*Z*_{1}=1)–*E*(*Y*_{0}|*Z*_{0}=1)=1–0.5=0.5. That incorrect assumptions can lead to an appropriate estimand, while correct assumptions can lead to unreasonable conclusions, is curious but not accidental in this setting. This will be explored further below.

3.2. Alternatives to principal stratification

The above examples suggest that sole focus on the PS estimand can be misleading. While our example above is suggestive of problems with the PS approach, it involves a situation in which treatment has no effect on the marginal distribution of mortality. The motivation for the PS approach derives from settings in which treatment does affect mortality and so standard comparisons conditional on observed survival are biased; one may view our examples above as a special case of settings in which treatment may have arbitrary effects on the survival distribution. We consider several alternative approaches: death blocking, marginal joint distributions, and marginal adjusted conditional contrasts. Finally, we discuss and compare the roles of each of the different estimands.

3.2.1. Death blocking

One approach to this situation is to suppose that there are latent outcomes that would have been measured if each subject received his/her treatment *X* but death were prevented. This approach is implicit in some approaches to survival analysis in the presence of competing risks and is explicit in Robins (1986; chapter 12). These approaches rely on the assumption that, “at least conceptually, deaths ... could be eliminated in a manner that does not affect past or future covariate status or ...” the (latent) outcome *Y*_{x}. The combination of the implausibility of this assumption with the vagueness of the associated counterfactuals (How is death to be eliminated? Would all means of blocking death lead to the same outcomes?) makes this option unattractive (e.g., Kalbfleisch and Prentice, 2002); the NEA view seems attractive here.

We might view this as the controlled direct effect of treatment controlling for death (by setting it to 0). A less drastic intervention might involve equalizing death between different levels of treatment (Robins and Greenland, 1989); this might be formulated in terms of natural or pure direct effects (Robins and Greenland, 1992; Pearl, 2001) or in terms of stochastic interventions on death (Geneletti, 2007); such approaches would still be unattractive because of the vagueness of the counterfactuals or hypothetical interventions.

3.2.2. Marginal joint distributions

One can consider the effects of treatment on the joint distribution of the outcomes *Z* and *Y*. This would lead to comparisons of *pr*(*Z*_{0},*Y*_{0}) and *pr*(*Z*_{1},*Y*_{1}). Such contrasts are adequate for making decisions and comparing utilities of various treatments. Let *U*(*x*,*Z*_{x},*Y*_{x}) denote the utility function for a particular decision; in general, it may be a function of all three arguments. In a decision-theoretic approach (Joffe, Small, and Hsu, 2007), one would seek to choose the value *x* of treatment which maximizes the expected utility

*E*{*U*(*x*,*Z*_{x},*Y*_{x})}=Σ_{z}Σ_{y}*U*(*x*,*z*,*y*)*pr*(*Z*_{x}=*z*,*Y*_{x}=*y*)

(here shown for discrete *Y* and *Z*). Because different decisionmakers will have different utility functions, one would thus want to have estimates of the marginal joint distributions *pr*(*Z*_{x},*Y*_{x}) and not just of *E*{*U*(*x*,*Z*_{x},*Y*_{x})}. In the setting of censoring by death, death is an important outcome that one would generally want to consider in making decisions (Rosenbaum, 2006; Joffe, Small, and Hsu, 2007). These contrasts of joint distributions, however, fail to isolate the effects of *X* on *Y* and so are unsatisfying for explanatory purposes.
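The decision rule described above amounts to a weighted sum over the marginal joint distribution for each treatment. A minimal sketch follows; the probabilities and the utility function are hypothetical, chosen only for illustration, and a real decisionmaker would supply their own.

```python
from fractions import Fraction

# Hypothetical marginal joint distributions pr(Z_x = z, Y_x = y) for x = 0, 1.
# Keys are (z, y); y is None (the "·" value) whenever z = 0 (dead).
JOINT = {
    0: {(0, None): Fraction(1, 3), (1, 0): Fraction(1, 3), (1, 1): Fraction(1, 3)},
    1: {(0, None): Fraction(1, 3), (1, 1): Fraction(2, 3)},
}

def utility(x, z, y):
    """Illustrative utility: 0 if dead, otherwise 1 for being alive plus the outcome y.

    The argument x is unused here, although in general U may depend on it
    (for example through the cost of treatment).
    """
    return 0 if z == 0 else 1 + y

def expected_utility(x):
    """E{U(x, Z_x, Y_x)}: sum over (z, y) of U(x, z, y) * pr(Z_x = z, Y_x = y)."""
    return sum(utility(x, z, y) * p for (z, y), p in JOINT[x].items())

best = max(JOINT, key=expected_utility)
print({x: expected_utility(x) for x in JOINT}, "-> choose x =", best)
```

Only the marginal joint distribution under each treatment enters the calculation; no information about the joint distribution of potential outcomes across treatments is needed.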

One may factorize the marginal joint distributions as *pr*(*Z*_{x},*Y*_{x})=*pr*(*Z*_{x})*pr*(*Y*_{x}|*Z*_{x}). While contrasts of *pr*(*Z*_{1}) and *pr*(*Z*_{0}) are standard aggregate causal effects, contrasts of *pr*(*Y*_{1}|*Z*_{1}=1) with *pr*(*Y*_{0}|*Z*_{0}=1) are not standard causal effects, because they involve contrasts between different sets of subjects (Rosenbaum, 1984; Robins, 1986; Frangakis and Rubin, 2002). In particular, when *X* affects the marginal distribution of *Z*, these contrasts will be unsatisfying for explanatory purposes, because subjects with *Z*_{1}=1 may systematically differ from subjects with *Z*_{0}=1 with respect to *Y*_{1} or *Y*_{0}.

3.2.3. Marginal adjusted conditional contrasts

In this section, we consider another quantity that attempts to capture the net effect of treatment on *Y* adjusting for censoring by death. Suppose that we observe not only whether but also when a subject fails, *T*. Under treatment level *x*, we may observe *T*_{x},*Y*_{x}. For observations made at a fixed follow-up time *m*, *Y*_{x}=· if *T*_{x}<*m*. Let *T*_{0,x}≡*T*_{0}(*T*_{x},*x*), where *T*_{0}(*t*,*x*)≡*S*_{0}^{−1}{*S*_{x}(*t*)}, *S*_{x}(*t*)≡*pr*(*T*_{x}≥*t*), and *S*_{0}^{−1} is the inverse of *S*_{0}. *T*_{0}(*t*,*x*) involves a mapping of survival distributions under treatment at level *x* to treatment level 0. For illustration, consider a simple accelerated failure-time model, in which withholding treatment lengthens the distribution of lifetimes by a factor exp(β). We then have *T*_{0}(*t*,*x*)=*t*exp(*x*β), and *T*_{0,x}=*T*_{x}exp(*x*β).

*T*_{0,x} is a variable whose distribution is not a function of treatment. We can then factorize the marginal joint distribution of *T*_{x},*Y*_{x} as *pr*(*T*_{x},*Y*_{x})=*pr*(*T*_{0,x},*Y*_{x})=*pr*(*T*_{0,x})*pr*(*Y*_{x}|*T*_{0,x}). Contrasts of *pr*(*Y*_{1}|*T*_{0,1}=*t*) with *pr*(*Y*_{0}|*T*_{0,0}=*t*) are a sort of marginal (with respect to the joint distribution of potential outcomes under different treatments) contrast of the potential outcomes *Y*_{x} adjusted for treatment-adjusted survival, a variable whose distribution is not affected by treatment. These are numerical contrasts of levels of the outcome for *T**≡min(*T*_{0,0},*T*_{0,1})≥*m*, *T*_{0,1}≥*m*. We can also marginalize contrasts over *T**≥*m*. When rank preservation for *T* holds (i.e., *T*_{0,x}=*T*_{0} with probability 1), this contrast is a contrast of *pr*{*Y*_{1}|*T*_{1}=*t*,*T*_{0}=*T*_{0}(*t*,1)} with *pr*{*Y*_{0}|*T*_{1}=*t*,*T*_{0}=*T*_{0}(*t*,1)}; i.e., a PS type of estimand. Rank preservation is stronger than and implies monotonicity. Rank preservation is usually an implausible assumption.
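The accelerated failure-time mapping and the distinction between a treatment-free distribution and rank preservation can be illustrated numerically. The sketch below is a toy example under stated assumptions: the value of β, the function name `map_to_untreated_scale`, and the five pairs of potential failure times are all hypothetical.

```python
import math

BETA = math.log(2.0)  # hypothetical AFT parameter: withholding treatment doubles lifetimes

def map_to_untreated_scale(t, x, beta=BETA):
    """T_0(t, x) = t * exp(x * beta) under the simple accelerated failure-time model."""
    return t * math.exp(x * beta)

# Hypothetical potential failure times for five subjects. The treated times are
# the untreated times halved and then reassigned across subjects, so the
# ordering of subjects differs between the two treatment levels.
T0 = [2.0, 4.0, 6.0, 8.0, 10.0]   # failure times if untreated
T1 = [3.0, 1.0, 5.0, 2.0, 4.0]    # failure times if treated

T00 = [map_to_untreated_scale(t, 0) for t in T0]  # T_{0,0}, identical to T_0
T01 = [map_to_untreated_scale(t, 1) for t in T1]  # T_{0,1}, treated times rescaled

# The mapped variables share one distribution, but subject-level values differ,
# so rank preservation (T_{0,1} equal to T_0 for every subject) fails here.
same_distribution = all(math.isclose(a, b) for a, b in zip(sorted(T00), sorted(T01)))
rank_preserved = all(math.isclose(a, b) for a, b in zip(T0, T01))
print("same distribution:", same_distribution, "| rank preserved:", rank_preserved)
```

In this toy population *T*_{0,1} and *T*_{0,0} have the same distribution, which is all the factorization requires; equality for each subject is the much stronger rank-preservation condition.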

We view these contrasts as appropriate control for a survival variable whose distribution is not affected by treatment. The contrast is adjusted for the effect of treatment on survival. The joint distribution may be parametrized in part using joint structural nested failure-time models for the effect of treatment on survival and structural nested distribution (or mean) models for the conditional effects of treatment on outcome *Y* (see Robins, 2008, who considers joint models with outcomes *Y* meaningful after failure). Greene (2011, personal communication) proposed a version marginalizing over *T**>*m*, which may most closely approximate an overall assessment of the effect of *X* on *Y* controlling for death.

In the data in and , the marginal adjusted conditional contrasts are simply contrasts of *pr*(*Y*_{1}|*T*_{1}>*m*)=*pr*(*Y*_{1}|*Z*_{1}=1) with *pr*(*Y*_{0}|*T*_{0}>*m*)=*pr*(*Y*_{0}|*Z*_{0}=1), since treatment has no effect on the distribution of *Z*. These contrasts appropriately find that, in (see ), treatment has an effect, whereas in it does not.

Contrasts between *pr*(*Y*_{1}|*T*_{0,1}=*t*) and *pr*(*Y*_{0}|*T*_{0,0}=*t*) are not formal causal contrasts in the sense of comparisons of individual outcomes among a common group of subjects. Nonetheless, they may be viewed in a broader sense as causal inasmuch as they are a component of the effect of treatment on the joint distribution of *T* and *Y*. The factorization provided above best isolates the effect of *X* on *Y* from its effect on *T*, or adjusts the effect of *X* on *Y* for its effect on *T*. More expanded treatment of these sorts of joint models will be provided elsewhere.

3.3. Post-infection outcomes

The problem of post-infection outcomes can productively be viewed as similar to the problem of censoring by death (Gilbert, Bosch, and Hudgens, 2003). The purpose of some vaccines in development for HIV is to reduce the level of some adverse outcome (e.g., viral load) after infection instead of just reducing the rate of infection. The outcome of interest *Y* is considered meaningful only if a subject is infected; thus, if *Z*=1 denotes absence of infection, *Y* is considered available only if *Z*=0. For binary *Z*, the problem is isomorphic with censoring by death if one switches the coding of *Z*.

This isomorphism is broken when infection is considered a failure-time outcome. Now, if *T* denotes the time of infection, *Y* is available only if infection has occurred by the time of measurement (*T*<*m*), rather than if *T*≥*m* as with death. Joint structural nested models would need to be reformulated to reflect this change.

3.4. Roles of different estimands

Based on the foregoing considerations, we consider the roles of different estimands in the setting of censoring by death. We view many of the different quantities as complementary and so are loath to identify a single estimand as being of sole or primary interest^{3}.

Of primary interest for decision problems is the joint distribution of the failure-time outcome and other outcomes that would be seen under a given treatment and the contrasts of these joint distributions under different treatments; i.e., we would like to estimate and contrast *pr*(*Z*_{0},*Y*_{0}) and *pr*(*Z*_{1},*Y*_{1}). The decisionmaker can then assign a utility to each combination of treatment and potential outcomes and choose a treatment to maximize the expected utility. In populations and conditions identical to those from which extant data are derived, we can often (e.g., in randomized trials) simply estimate *pr*(*Z*_{x},*Y*_{x}) from data on the joint distribution *pr*(*X*,*Z*,*Y*).

Often we are interested in decisions under different conditions than those obtaining in a particular study. Following Pearl and Bareinboim (2011), denote the marginal joint distributions of the potential outcomes under the new conditions by *pr**(*Z*_{x},*Y*_{x}). Here decisions should ideally be made based on *pr**(*Z*_{x},*Y*_{x}) and a decisionmaker’s utility function. Identifying *pr**(*Z*_{x},*Y*_{x}) under other than the precise conditions of the study from which data are collected can be challenging.

Making decisions is not the sole object of scientific inference, and sometimes not even the primary object. Understanding of the processes involved in generating the data can be a primary goal; this has been true in a wide variety of scientific disciplines, including astronomy, meteorology, and evolutionary biology. Such understanding may be enhanced by appropriate factorization of the marginal joint distributions of potential outcomes *pr*(*Z*_{x},*Y*_{x}) and by consideration of the joint distributions of the potential outcomes (e.g., *pr*(*Z*_{0},*Y*_{0},*Z*_{1},*Y*_{1})) used in PS.

To see this, consider the data in . These data are indistinguishable in observable data from the data in . Nonetheless, the story told by the two tables is rather different. In , treatment has no effect on *Z* and no average effect on *Y* in the always survivor stratum; in , treatment affects *Z*, and *Y*, but the positive and negative effects on both balance each other. The different tables would generate different expectations of what would happen in populations with different survival experiences. Although even experimental data cannot distinguish between the two settings, external theory or knowledge is sometimes available which allows us to say something about the joint distribution.

Similarly, when treatment affects survival, comparisons of *pr*(*Y*_{1}|*T*_{0,1}>*m*) and *pr*(*Y*_{0}|*T*_{0,0}>*m*) are likely more informative than contrasts of *pr*(*Y*_{1}|*T*_{1}>*m*) with *pr*(*Y*_{0}|*T*_{0}>*m*). Differences in the latter quantities may be due to the effect of treatment on *Y* or the net effect of *X* on *T*, which is associated with *Y*. In contrast, the marginal adjusted conditional contrast of *pr*(*Y*_{1}|*T*_{0,1}>*m*) and *pr*(*Y*_{0}|*T*_{0,0}>*m*) has sought to adjust for the average effect of treatment on survival and so is closer to a standard causal contrast.
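A toy calculation can make the difference between the two kinds of contrast concrete. Everything in the sketch below is an assumption made for illustration: five subjects, a treatment that doubles every lifetime (so exp(β)=0.5 maps treated times to the untreated scale), no effect of treatment on *Y*, and outcome values that rise with underlying survival time.

```python
M = 2.5      # follow-up time m at which Y is measured (hypothetical)
SCALE = 0.5  # exp(beta): hypothetical AFT factor mapping treated times to the untreated scale

# Hypothetical potential outcomes for five subjects. Treatment doubles every
# lifetime and has no effect on Y; None marks Y undefined because of death before M.
T = {0: [1, 2, 3, 4, 5], 1: [2, 4, 6, 8, 10]}
Y = {0: [None, None, 30, 40, 50], 1: [None, 20, 30, 40, 50]}

def mean_y(x, times):
    """Mean of Y_x among subjects whose entry in `times` exceeds M."""
    selected = [y for t, y in zip(times, Y[x]) if t > M]
    return sum(selected) / len(selected)

# Treatment-adjusted survival T_{0,x} = T_x * exp(x * beta).
T_adj = {0: T[0], 1: [t * SCALE for t in T[1]]}

naive = mean_y(1, T[1]) - mean_y(0, T[0])             # E(Y1|T1>m) - E(Y0|T0>m)
adjusted = mean_y(1, T_adj[1]) - mean_y(0, T_adj[0])  # E(Y1|T_{0,1}>m) - E(Y0|T_{0,0}>m)
print("naive contrast:", naive, "| adjusted contrast:", adjusted)
```

The naive contrast is nonzero only because treatment keeps a frailer subject alive past *m*; conditioning on treatment-adjusted survival compares like with like and returns zero in this constructed case.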

Pearl (2009) has argued for the primacy of causal over observational knowledge, in part because causal knowledge is based on a more fundamental understanding of causal processes and more readily generalizes to new situations; these considerations apply here as well.

Similarly, qualitative aspects of causal effects are often more fundamental building blocks of our knowledge than their precise quantification, which may change from one setting to the next. Thus, the statement that smoking causes lung cancer is based on a fundamental understanding of biological process and generalizes more broadly than statements about the precise value of the average causal effect of smoking, which will be dependent on a host of measured and unmeasured factors. In such circumstances, we can gain a useful understanding of causal processes without being able to choose definitively an optimal treatment. A similar phenomenon arises in DAGs, where knowledge of the causal structure is more fundamental and prior to precise quantification.

Pearl (2009) has argued for the complementary nature of multiple conceptualizations of causality, including the graphical, nonparametric structural equations, and counterfactual. Here too, multiple approaches and estimands can provide complementary information which can enhance scientific understanding and may aid prediction^{4}.