PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of nihpaAbout Author manuscriptsSubmit a manuscriptNIH Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
 
J Am Stat Assoc. Author manuscript; available in PMC Sep 1, 2011.
Published in final edited form as:
J Am Stat Assoc. Sep 1, 2010; 105(491): 912–929.
doi:  10.1198/jasa.2010.ap08739.
PMCID: PMC3035160
NIHMSID: NIHMS192613
CAUSAL EFFECTS OF TREATMENTS FOR INFORMATIVE MISSING DATA DUE TO PROGRESSION/DEATH
Keunbaik Lee,1* Michael J. Daniels,2 and Daniel J. Sargent3
1Assistant Professor, Biostatistics Program, Louisiana State University-Health Sciences Center, New Orleans, LA 70112, USA
2Professor, Department of Statistics, University of Florida, Gainesville FL 32611, U.S.A.
3Professor of Biostatistics and Oncology, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN 55905, U.S.A.
* ; klee4/at/lsuhsc.edu
; mdaniels/at/stat.ufl.edu
; sargent.daniel/at/mayo.edu
SUMMARY
In longitudinal clinical trials, when outcome variables at later time points are only defined for patients who survive to those times, the evaluation of the causal effect of treatment is complicated. In this paper, we describe an approach that can be used to obtain the causal effect of three treatment arms with ordinal outcomes in the presence of death using a principal stratification approach. We introduce a set of flexible assumptions to identify the causal effect and implement a sensitivity analysis for non-identifiable assumptions which we parameterize parsimoniously. Methods are illustrated on quality of life data from a recent colorectal cancer clinical trial.
Keywords: Principal stratification, QOL, Ordinal data, Sensitivity analysis
In longitudinal clinical trials, outcome variables at later time points are only defined for patients who survive to those times. This complicates analyses, since those who survive on each treatment arm to any given time point are not a random sample of those randomized to each treatment arm. Standard approaches to account for missing data in the literature implicitly ‘impute’ values of outcome after dropout. For quality of life (QOL) data, if a subject drops out due to deteriorating disease status or death, the QOL will not be defined after the dropout time. If we restrict the analyses to subgroup of patients who survived, the resulting estimate of the treatment effect will no longer have a causal interpretation.
One way to conduct analyses in this circumstance is to construct the joint distribution of the longitudinal responses and death times (Hogan and Laird, 1997; Pauler et al., 2003; Kurland and Heagerty, 2005; Lee and Daniels, 2007). Hogan and Laird (1997) proposed a likelihood-based mixture model approach to model the joint distribution for incomplete repeated measurements and right-censored event times. Pauler et al. (2003) proposed a pattern mixture model for longitudinal quality of life data with nonignorable missingness due to dropout and censorship by death. Recently, Kurland and Heagerty (2005) explored regression models conditioning on being alive as a valid target of inference using regression models that condition on survival status rather than a specific survival time. Lee and Daniels (2007) used a pattern mixture approach with patterns defined on the observed tumor progression or death times which is similar to ideas in Hogan and Laird (1997) and Pauler et al. (2003). Each of these approaches to drawing inference in the presence of death, in effect, stratify the analysis using events that occur after randomization and do not maintain the causal interpretation of the treatment effect.
To obtain the causal effect of treatment, we start by defining potential outcomes. Potential outcomes are all the outcomes that would be observed if each of the treatments had been applied to each of the subjects (Rubin, 1974, 1978; Holland, 1986). Frangakis and Rubin (2002) used the concept of potential outcomes in an approach called ‘principal stratification’. Principal stratification partitions subjects into sets with respect to posttreatment variables. For example, consider two treatments, Tx = 0, 1 and a binary response Y(Tx). The causal effect of the treatment on response is Y(1) − Y(0). However, we only observe either Y(1) or Y(0) for each subject. Let D(Tx) be a binary posttreatment variable. In this case, there are four principal strata defined by the pairs of potential values of D(Tx), (i) ({i|D(0) = 0, D(1) = 0}), (ii) ({i|D(0) = 0, D(1) = 1}), (iii) ({i|D(0) = 1, D(1) = 0}), and (iv) ({i|D(0) = 1, D(1) = 1}). The principal strata are not affected by treatment assignment. Causal effects are defined within these principal strata.
Frangakis and Rubin (2002) discussed causal effects in studies where the outcome does not exist due to death. Rubin (2000) and Hayden et al. (2005) referred to the estimand in Frangakis and Rubin (2002) as ‘the survivors average causal effect (SACE)’. Egleston et al. (2007) proposed assumptions to identify the SACE among survivors at a fixed time point in a cohort study of older adults and implemented a sensitivity analysis for some of those assumptions. Rubin (2006) introduced the causal effect of a treatment on an outcome that is censored by death in a Quality of Life study. Recently, Lee and Daniels (2008) used a ‘principal stratification’ approach to estimate the SACE for ordinal data in a two treatment arm setting.
The identification of the SACE relies on untestable assumptions. For example, the widely used monotonicity assumption (in the context of death) states that “no subject would be dead if randomized to an active treatment, but would be alive if randomized to a control treatment”. This assumption (and variations) has been used in a variety of settings, including death and noncompliance (Zhang and Rubin, 2003; Gilbert et al., 2003; Cheng and Small, 2006; Egleston et al., 2007; Lee and Daniels, 2008; Cheng, 2009; Imai 2008; Jin and Rubin, 2008). Stochastic monotonicity assumptions have been developed recently as a weaker and more realistic alternative in the context of compliance (Roy et al., 2008) and instrumental variables (Small and Tan, 2008).
In this paper, we describe an approach that can be used to obtain the causal effect of treatment in the presence of death in a setting of three treatment arms with longitudinal ordinal outcomes. We replace the common deterministic monotonicity assumption with a more flexible set of assumptions that allows incorporation of subject specific covariates to increase its applicability to our setting. Overall, we provide a realistic framework that has only four sensitivity parameters for longitudinal ordinal data involving comparisons among three treatments. Although our main interest will be treatment differences at the end of the study, we will make efficient use of all available longitudinal data which is essential due to the high rate of dropouts unrelated to death.
The paper is arranged as follows. In section 2, we describe the clinical trial that motivated our analyses. In section 3, we formally define the causal effects of interest, provide some assumptions that are necessary to identify them, and then derive how the causal effects can be estimated from these assumptions along with modeling assumptions for the observed data in this complex setting. We also propose a sensitivity analysis procedure for some of these untestable assumptions. Methods are illustrated on the quality of life data in section 4. Section 5 contains conclusions and a discussion.
As a secondary outcome, many clinical trials measure aspects of a patient’s sense of well-being and ability to perform various tasks to assess the effects that cancer and its treatment have on the patient. Such quality of life (QOL) outcomes in clinical trials are generally obtained repeatedly over time using questionnaires that can be completed by the patient without direct supervision. These standardized questionnaires rate factors such as pain, treatment side-effects, mood, energy level, family and social interactions, sexual function, ability to work, and ability to maintain routine daily activities. Here, we describe the QOL data set with which we will work.
We analyze QOL data from a recent colorectal cancer clinical trial (Trial N9741, Goldberg et al., 2004). The trial randomized patients with advanced colorectal cancer to compare three different two drug combinations. The main objective of the trial was to find a better treatment for colorectal cancer. The trial led to a new standard of care.
A total of 795 patients with colorectal cancer were randomly assigned to one of three treatments between May 1999 and April 2001. Patients received either irinotecan and bolus fluorouracil plus leucovorin (IFL arm), oxaliplatin and infused fluorouracil plus leucovorin (FOLFOX arm), or irinotecan and oxaliplatin (IROX arm). At the time the trial was initiated, the IFL regimen was the FDA approved standard of care (Saltz et al., 2000). Based on the primary endpoints of time to progression and overall survival, the FOLFOX arm was identified as the superior treatment and FOLFOX was also comparatively safe relative to the other treatments.
However, given that the toxicity profiles were quite different on the three treatment arms, it was of interest to see if there was a negative impact of the ‘better’ treatment (FOLFOX) on patient’s QOL. In this paper we compare the quality of life data on the three treatments.
2.1 QOL outcomes
The QOL outcomes measured here included fatigue, nausea, vomiting, diarrhea, and dehydration. Previous analyses split the QOL scores into 12 week windows. The patient’s visit is the window number during which the survey was filled out (0=baseline, 1=1–84 days after going on study, 2=85–168 days after going on study,…, 5=337–420 days after going on study). We will analyze the data from the first five windows (up to about one year).
For illustration, we focus on one QOL measure, fatigue. Fatigue is measured on a 5 point ordinal scale (1: I am usually not tired at all; 2: I am occasionally rather tired; 3: There are frequently periods when I am quite tired; 4: I am usually very tired; 5: I feel exhausted most of the time). Because very few patients reported category 5, we collapsed category 4 and 5 into one category.
2.2 Dropouts
Similar to other longitudinal studies, the QOL data collection in this clinical trial had substantial dropout. The overall dropout rates were similar in the three treatment groups (96% for IFL; 98% for IROX; 98% for FOLFOX). About 95% of dropouts occurred between baseline and the fourth visit. Dropout was related to whether cancer had progressed, death, toxicity, and other factors. A large number of subjects dropped out due to tumor progression or death (40.5%). The dropout rates due to progression/death were marginally higher on the IFL and IROX arms than on the FOLFOX arm. An interesting feature of this dataset is that we have the actual progression/death times for all subjects including those who dropped out for other reasons. More detailed description of the pattern of dropouts in this study is given in Lee and Daniels (2007).
We will use this data to illustrate the models and methods introduced in the subsequent sections.
Consider a three-arm trial. In our application the three treatments were IFL (standard of care) and FOLFOX and IROX (both experimental). Let Txi be the treatment indicator for subject i = 1, … , N,
equation M1
and let Dit(Txi) be tumor-progression/death indicator for subject i at visit t = 1, … ,T,
equation M2
Note T is last visit. Obviously, if Dit(Txi) = 1, then Dis(Txi) = 1 for s > t. To maintain clarity, we will refer to the tumor progression/death indicator Dit(·) as the death indicator for rest of this paper.
We define the following eight (principal) strata:
  • At(0) = {i|(Dt(0), Dt(1), Dt(2)) = (0, 0, 0)}: the subjects who would be alive under all three arms at visit t.
  • At(1) = {i|(Dt(0), Dt(1), Dt(2)) = (1, 0, 0)}: the subjects who would be alive under the two experimental treatments but not alive under the standard of care (IFL) at visit t.
  • At(2) = {i|(Dt(0), Dt(1), Dt(2)) = (1, 1, 0)}: the subjects who would be alive under FOLFOX but not alive under the standard of care (IFL) and IROX at visit t.
  • At(3) = {i|(Dt(0), Dt(1), Dt(2)) = (1, 0, 1)}: the subjects who would be alive under IROX but not alive under the standard of care (IFL) and FOLFOX at visit t.
  • At(4) = {i|(Dt(0), Dt(1), Dt(2)) = (0, 0, 1)}: the subjects who would be alive under the standard of care (IFL) and IROX but not alive under FOLFOX at visit t.
  • At(5) = {i|(Dt(0), Dt(1), Dt(2)) = (0, 1, 0)}: the subjects who would be alive under the standard of care (IFL) and FOLFOX but not alive under IROX at visit t.
  • At(6) = {i|(Dt(0), Dt(1), Dt(2)) = (0, 1, 1)}: the subjects who would be alive under standard of care (IFL) but not alive under the two experimental treatments at visit t.
  • At(7) = {i|(Dt(0), Dt(1), Dt(2)) = (1, 1, 1)}: the subjects who would not be alive under any treatment at visit t.
Now, let Yit(Txi) be the potential ordinal outcome (Yit(Txi) = k for k = 1, … , K) for subject i at time t. The full set of potential outcomes at visit t is
equation M3
where, for treatment tx, Yt(tx) is the potential outcome if a subject is alive at visit t (Dt(tx) = 0). If the patient is alive (Dit(Txi) = 0), define Rit to be the indicator that Yit = Yit(Txi) is observed. In the following, we assume monotone dropout (Rit = 1 [implies] Rit−1 = 1). Wi and Si respectively denote the vector of baseline covariates and progression/death time (Si = j [left and right double arrow ] {Dij = 1, Dij−1 = 0}). So, the observed data for individual i is
equation M4
Unfortunately, given Oit alone, we cannot determine to which stratum individual i belongs.
In the next section, we formally define the causal effects of interest and then state some additional assumptions that are necessary to estimate them.
3.1 Defining the Causal Effects
Suppress i here for clarity. We assume each individual at visit T (last visit) belongs to one of 8 basic principal strata (Frangakis and Rubin, 2002) defined by the unique combinations of (DT(0), DT(1), DT(2)). Some of these strata will be empty because of the monotonicity assumptions (Assumption I in Section 3.2).
For each pairwise treatment comparison, we define the three causal effects of interest at time T (last visit) for k = 1, … , K − 1,
  • equation M5
  • equation M6
  • equation M7
    There are three SACEs for comparing IROX and IFL, three for comparing FOLFOX and IFL, and three for comparing IROX and FOLFOX. The SACEk,1(1, 0) is the odds ratio that the outcome at time T is larger than k for comparing IROX and IFL for subjects who would be alive under all three treatment arms. The SACEk,2(1, 0) is the odds ratio that the outcome at time T is larger than k for comparing IROX and IFL for subjects who would be alive under IROX and IFL but not alive under FOLFOX. The SACEk,3(1, 0) is the odds ratio that the outcome at time T is larger than k for comparing IROX and IFL for subjects who would be alive either under all three treatment arms or only under IROX and IFL. The SACEk,.(2, 0) and SACEk,.(2, 1) are defined similarly for comparisons of FOLFOX and IFL, and FOLFOX and IROX, respectively.
3.2 Assumptions for Identifiability of a Causal Effect
Define Pt = (P1, … , Pt) to be the history of potential outcomes up to and including the outcomes at visit t. Before introducing the assumptions, it is convenient to define some additional quantities. Let gT (tx, w) = P(DT (tx) = 0|w) for tx = 0, 1, 2 where w is the vector of baseline covariates. We assume that
equation M8
(1)
That is, the probability of being alive on the treatments tx = 0, 1, 2 is a function of baseline covariates with coefficients depending on treatment tx. Also, let
equation M9
(2)
equation M10
(3)
equation M11
(4)
thus, equation M12 is the probability of a patient with baseline covariates w being alive on IROX at time T, given that they would be alive on IFL at time T. Similarly, equation M13 represents the probability of being alive on FOLFOX at time T, given that they would be alive on IFL at time T for a patient with baseline covariates, w. The equation M14 is the probability of a patient with baseline covariates w being dead on FOLFOX at time T given that they would be dead on the other two arms at time T. Finally, let
equation M15
(5)
The hT,tx(k) is the probability that the (ordinal) QOL response is greater than k for a subject on treatment tx that has not died or progressed by time T. We now describe the assumptions.
We first make the common Stable Unit Treatment Value Assumption (SUTVA, Rubin, 1978) which states that a patient’s potential outcomes are unrelated to the treatment status of other patients. Next, we list the assumptions necessary to identify the causal effect of interest:
Assumption 1 (Monotonicity)
  • Deterministic: The assumption that
    equation M16
    (6)
    implies that there will be no subjects who die on both experimental treatments and survive on the standard of care.
  • Stochastic I: We assume equation M17 for j = 1, 2 are given by
    equation M18
    (7)
    where equation M19. To understand Uj(w), note that given gT (j, w), we have the following bounds on the conditional distribution,
    equation M20
    This modeling assumption was used in Roy et al. (2008) in the context of compliance in a smoking cessation trial. Note that if ρ = 0, then equation M21, which implies that DT(1) is independent of DT(0) and DT(2) is also independent of DT(0). If ρ = 1, then equation M22, which is the largest possible probability that is compatible with the marginal distributions. The assumption states that being alive on FOLFOX (or IROX) is positively correlated with being alive on IFL (standard of care); i.e., if a subject is alive on IFL, they are more likely to be alive on FOLFOX (or IROX).
  • Stochastic II: We assume equation M23 is given by
    equation M24
    (8)
    where equation M25. Similarly, we have
    equation M26
    This assumption is similar to b) above, but addresses all three treatments. It says that dying on FOLFOX is positively correlated with dying on both IROX and IFL; that is, patients who would die on IROX and IFL (standard of care) would be more likely to die on FOLFOX.
Both stochastic monotonicity assumptions address the concern of a deterministic monotonicity assumption not being reasonable between two treatment arms and implicitly allows the assumption to vary with subject specific covariates, w. In addition, they both imply a positive covariance (correlation) between the event on the left and right of the conditioning. The sensitivity parameters in Assumption I are ρ and ν.
Assumption 2 (Ignorability): Tx [perpendicular] PT
Recall Pt is all the potential outcome up to and including time t. Assumption 2 states that the treatment arm is unrelated to the set of potential outcomes; this holds by randomization of the treatments.
Assumption 3 (Sensitivity Parameter Reduction via Proportional odds)
For k = 1, … , K − 1,
equation M27
(9)
equation M28
(10)
equation M29
(11)
equation M30
(12)
and
equation M31
(13)
Note that AT(1), AT(4), or AT(5) in equations (9)(11) are the principal strata in which subjects would be dead under one of three arms. In contrast, AT(2) or AT(3) in equations (12)(13) are the principal strata in which subjects would be dead under two of three arms. The sensitivity parameters, τ and λ correspond to the odds ratios formed from conditioning on the subjects dying on either one or two treatment arms respectively (vs. surviving on all three arms). More specifically, the first sensitivity parameter, τ, corresponds to the ratio of odds of fatigue, for a given treatment, for a patient in a principal stratum in which they would survive on only two arms to the odds for a patient in a principal stratum in which they would survive on all three arms. The second sensitivity parameter, λ corresponds to the ratio of odds of fatigue, for a given treatment, for a patient in a principal stratum in which they would survive on only one arm to the odds for a patient in a principal stratum in which they would survive on all three arms.
Note also that the two sensitivity parameters, τ and λ, do not depend on k. As such, the assumption is referred to as proportional odds. In total, Assumption 3 has two sensitivity parameters.
Assumptions like Assumption 3, which involve a few sensitivity parameters, are fundamental to the ability to reasonably elicit sensitivity parameters to conduct inference (Egleston et al., 2009). The semiparametric models of Robins and colleagues (Robins et al., 1995; Rotnitzky et al., 1998; Scharfstein et al., 1999) for MNAR missingness also make simplifying assumptions to reduce the sensitivity parameters (in that context, in the missing data mechanism). Egleston et al. (2007) and Lee and Daniels (2008) use similar assumptions to Assumption 3 in much simpler contexts. Other variations on Assumption 3 are possible and would be specific to the application. The assumptions in this case were carefully formulated with the expertise and guidance of Dr. Sargent, who was the statistician on the N9741 clinical trial.
3.3 Identification and Estimation of the SACE
Our goal in the following will be to use the observed data, Ot, along with the three assumptions in Section 3.2 to draw inference about the SACE. For clarity of presentation, we demonstrate the identification of the SACE in two parts. First, we identify the joint distribution of DT = (DT(0), DT(1), DT(2)); and second, we identify the conditional distribution of YT given DT.
3.3.1 Identification and Calculation of Joint Distribution of DT
We identify the marginal and joint distributions of DT using Theorem I.
Theorem I
(Identification of the joint distribution of DT): Under Assumptions 1 and 2, P(DT) is identified.
A detailed proof of this result can be found in the Appendix.
Using Theorem I, the marginal and joint probabilities of DT can be written as functions of (1)(4) as follows:
  • The marginal distributions, P(DT (tx) = 0) for tx = 0, 1, 2 can be expressed as
    equation M32
    (14)
  • The joint distribution, P(DT), which corresponds to P(AT (j)) for j = 0, … , 6 can be expressed as
    equation M33
    These expressions are used for the identification and estimation of the causal effects in the subsequent sections.
3.3.2 Identification of SACE
We now consider identification of the conditional distributions of YT given DT. Note that Assumption 2 implies
equation M34
for tx = 0, 1, 2. We demonstrate the identification of the SACEs using this result, Theorem I, and Assumption 3.
Theorem II
Under assumptions 1, 2, and 3, SACEk,.(1, 0) is identified.
A simple proof of this result starts by showing that the numerators and denominators of SACEk,.(1, 0) can all be written as functions of P(YT(0) > k|AT(0)) and P(YT(1) > k|AT(0)) using Assumption 3. We show this next.
From Assumption 3, we have
equation M35
(15)
equation M36
(16)
By identifying P(YT(0) > k|AT (0)), we identify (15) and by identifying P(YT(1) > k|AT(0)), we identify (16).
For SACEk,3(1, 0), we need
equation M37
(17)
equation M38
(18)
which are functions of P(AT(0) [union operator] AT(4)), P(AT(0)), and P(AT(4)) (all identified by Theorem I). We showed above that P(YT(0) > k|AT(4)) and P(YT(1) > k|AT(4)) can be written as functions of P(YT(0) > k|AT(0)) and P(YT(1) > k|AT(0)). The details on the identification of P(YT(0) > k|AT(0)) and P(YT(1) > k|AT(0)) can be found in the Appendix.
Similar arguments can be used to identify SACEk,.(2, 0) and SACEk,.(2, 1). We state this formally in the following two corollaries.
Corollary I
Under assumptions 1, 2, and 3, SACEk,.(2, 0) is identified.
Given the identification results in Theorems I and II and following a similar argument to Theorem I, all the relevant numerators and denominators can be written as a function of the quantities necessary to compute SACEk,.(1, 0) except for P(YT(2) > k|AT (0)). The Appendix contains a proof of the identification of this probability.
Corollary II
Under assumptions 1, 2, and 3, SACEk,.(2, 1) is identified.
The entire proof is in the Appendix.
3.4 Estimation and modeling of gT (tx, w), equation M39, and hT,tx(k)
As shown in the previous two subsections (and the Appendix), the SACE is a function of gT (tx, w), equation M40, and hT,tx(k). The parameters for gT (tx, w), ψtx, can be estimated using maximum likelihood based on NT(tx) subjects, where NT(tx) is the number of subjects who were alive after the study was complete under Tx = tx (see Table 1). We estimate equation M41 in Assumption 1 using ĝT (1, w), ĝT (2, w), and ĝT (0, w) with fixed ρ. We estimate equation M42 using ĝT (·, w) with fixed ν.
Table 1
Table 1
Breakdown of subjects who were alive by treatment group at the time of study completion. Proportions are in parentheses
Note that we can use the empirical expectation to estimate Ew {f(w; tx)} for an arbitrary function f(·) which is given by
equation M43
For example, Ew {gT(tx, w)} and equation M44 for j = 1, 2; tx = 0, 1, 2 can be estimated by
equation M45
To estimate hT,tx(k), given in (5), we need to specify models for the observed data and missing data due to loss to follow-up (not progression/death). Before stating the models, we specify one additional assumption that allows us to estimate the above quantities from the observed data. Recall that S is the progression/death time. So, S = j [left and right double arrow ] {Dj = 1, Dj−1 = 0}.
Assumption 4 (Conditional MAR)
Rt [perpendicular] {Yt, … , Yj−1}|S = j, Yt−1; Tx for t < j.
This assumption states that missingness of outcome is independent of the value of the outcome given the previous history of outcomes and Tx for all subjects with progress/death time j. It is MAR conditional on progression/death time. Missingness assumptions like this have recently been termed partially ignorable (Harel and Schafer, 2009). This assumption will allow us to use all the available data on survivors for the analysis.
We specify marginalized transition models of order two, IOMTM(2) (Lee and Daniels, 2007), conditional on progression/death time as follows, S,
equation M46
(19)
equation M47
(20)
where β(j, tx) are the coefficients of treatment arm tx, β01(j) > (...) > β0K−1(j) and equation M48 for l = 1, 2 are the linear and quadratic coefficients of of the lag one and two responses, respectively. Model (19) is a cumulative logit model, a common model for ordinal data. However, the link function in the conditional probabilities (20) is a multinomial logit because we expect the K − 1 (here, K = 4) conditional probabilities to have different coefficients equation M49, depending on k.
Clearly, for models (19) and (20) to form a coherent probability model, the Δitk(j, tx) will be constrained. The Δitk(j, tx) is found using the following identity
equation M50
We refer the reader to Lee and Daniels (2007) for more details on these models. Within this modeling framework, hT,tx(k) is given by
equation M51
(21)
where P(S = j|tx) is a multinomial mass function for S, and P(YT > k | S = j, tx) is calculated from (19) given tx and is only defined for t < j. Via Assumption 4, we assume that for a given progression/death time (pattern), that missing data before the progression/death time is MAR (conditional on pattern); this situation occurs when dropout is not due to progression/death. These models were chosen since they were shown to fit the observed data well for non-causal inference in Lee and Daniels (2007).
3.5 Standard Errors for SACE
To calculate standard errors for SACE, we use the delta method. Let
equation M52
where equation M53 and βT(·) = (βT(0), βT(1), βT(2)). The variance of the SACEK−1(Tx, Tx’) for Tx ≠ Tx’ is given by
equation M54
where ΔSACEK−1(Tx, Tx’) is the gradient of SACEK−1(Tx, Tx’) with respect to θ. Writing equation M55, we have
equation M56
Details on computing the gradient and variance of [theta w/ hat] are given in the Supplemental web materials (http://www.amstat.org/publications/jasa/supplemental materials).
3.6 Sensitivity Analysis
The identification of the SACE relies on untestable assumptions (in particular, Assumptions 1 and 3). We will use a sensitivity analysis procedure to draw inference about SACE by varying the sensitivity parameters (τ,λ) in Assumption 3, and (ρ ν) in Assumption 1. We discuss reasonable ranges for the former two sensitivity parameters in Section 4.1.
We use the approach proposed in Section 3 to analyze the QOL data from Section 2. We start out with a discussion of the elicitation of reasonable ranges for the sensitivity parameters.
4.1 Elicitation of Sensitivity Parameters
Dr. Sargent, the primary statistician for this clinical trial, provided the expertise on the sensitivity parameters for the analysis (including the expertise needed for the specification of Assumption 3). Dr. Sargent is both a PhD level statistician and a professor of oncology. Having both the quantitative and clinical knowledge is an important asset in eliciting values for sensitivity parameters. The feasibility of the eliciting of sensitivity parameters in general has been discussed in Shepherd et al. (2007). The following decisions regarding reasonable values for the sensitivity parameters were elicited through extensive discussion. The sensitivity parameters τ and λ are unlikely to be outside the range of (1/2, 1.5); that is, τ, λ [set membership] [1/2, 1.5]. The key to these ranges is that it was thought very unlikely for more than a 50% increase or decrease in these odds ratios.
We conduct a sensitivity analysis and draw inferences in the next section.
4.2 Causal inference results
We are interested in subjects who would be alive on at least two treatment arms until the end of the study. Table 1 gives the number of patients alive at the end of the study by treatment.
4.2.1 Estimation of survival and fatigue probabilities
Baseline covariates related to survival, w, included in the analyses were age (AGE), patient performance status (PS), and number of sites of metastatic disease (NOSITE). These three factors are well established to impact patient survival in this setting. To simplify computations of the empirical expectations in Section 3.4, we categorized each of the covariates. The cutpoints were chosen by Dr. Sargent using his extensive experience with the trial and the disease. Age was discretized as < 70 (AGE = 0) and ≥ 70 (AGE = 1). Performance status (PS) is a measurement of the patient’s ability to carry out activities of daily living, measured on a 5 point ordinal scale, where 0 represents full health and 4 represents death. Thus a higher PS indicates worse health. We discretized performance status into high {0 or 1} (PS = 0) vs. low {2,3,4} (PS = 1). The number of sites of metastatic disease was broken into three categories, {1,2, > 2} and coded using two indicator variables, NOSITE1 and NOSITE2, respectively as {(0,0), (1,0), (0,1)}.
We fit the logistic regression models in (1) using these covariates. The regression coefficient estimates, [psi], are given in Table 2. For FOLFOX, the estimated coefficients of performance status and number of metastatic sites were significant; for IFL, performance status was significant and for IROX, number of metastatic sites was significant. For all three arms, the coefficients indicated that the estimated survival probability decreased for patients with lower performance status and decreased for patients with more sites of metastases.
Table 2
Table 2
Maximum likelihood estimates of ψtx. Standard errors are in parentheses. AGE=I{age < 70}; PS=I{performance status [set membership] (2; 3; 4)}; NOSITE1 = I{# of sites = 2}; NOSITE2 = I{# of sites > 2}.
The MLE’s of hT,.(K − 1) based on the IOMTM(2) (specified in (19) and (20)) are given in Table 3. The probability of fatigue for survivors under IROX was highest among the three arms. The probability under IFL was the lowest.
Table 3
Table 3
Maximum Likelihood Estimates of hT,tx(K − 1)) (standard errors) where Tx [set membership] {IFL, IROX, FOLFOX}.
4.2.2 Estimation of SACE
In the following, we present inferential results and sensitivity analyses for log SACEK−1,.(1, 0), log SACEK−1,.(2, 0), and log SACEK−1,.(2, 1). Note that under Assumption 3, we have following equalities
equation M57
As such, we only focus SACEK−1,1(·,·) and SACEK−1,3(·,·) in the following. For each given set of sensitivity parameters, log SACEk,1(1, 0) and log SACEk,3(1, 0) are the log odds ratios that the fatigue score at time T is larger than k for comparing IROX and IFL (standard of care) for subjects who would be alive under all three treatment arms and who would be alive either under all three treatment arms or only under IFL (standard of care) and IROX. The log SACEk,1(2, 0) and log SACEk,3(2, 0) are the log odds ratios similarly defined for comparing IFL and FOLFOX; and log SACEk,1(2, 1) and log SACEk,3(2, 1), for comparing the two experimental treatments, IROX and FOLFOX.
The estimated values of log SACEK−1,1(1, 0) and associated p-values testing whether they are different from zero over the range of τ and λ elicited in Section 4.1 are given in Figure 1 for various values of ρ and ν. When (ρ, ν) = (0, 0), which corresponds to dying on IROX being independent of dying on IFL, dying on FOLFOX being independent of dying on IFL, and dying on FOLFOX being independent of dying on both IROX and IFL, the estimated values of log SACEK−1,1(1, 0) increased as τ and λ respectively increased and decreased. The estimated values of log SACEK−1,1(1, 0) were greater than 0 for all τ and λ (Figure 1(a)), implying that the probability of fatigue on the IROX treatment was greater than the probability of a patient’s fatigue on the IFL treatment. The estimated values of log SACEK−1,1(1, 0) increased as λ decreased for (ρ, ν) = (0.5, 0.5) which corresponds to a moderate correlation between death probabilities on the two treatments and three treatments, respectively (Figure 1(c)). In contrast, when (ρ,ν) = (1, 1) (highest correlation), the log SACEK−1,1(1, 0) decreased as τ increased (Figure 1(e)). However, for all values of (ρ, ν) there were no significant differences between IROX and IFL. (Figure 1(b), 1(d), 1(f)).
Figure 1
Figure 1
Maximum likelihood estimates of log SACEK−1,1(1, 0) with ρ, ν [set membership] {0, 0.5, 1.0}, respectively, and corresponding p-values for log SACEK−1,1(1, 0).
Figure 2 presents estimated values of log SACEK𢄒1,3(1, 0) and corresponding p-values. The pattern was similar for log SACEK𢄒1,1(1, 0) (Figure 2(a), 2(c), 2(e)) and there were no significant differences between the two treatment arms for all values of (ρ, ν) (Figure 2(b), 2(d), 2(f)). We note that the plots for estimated values of log SACEK−1,1(1, 0) and log SACEK−1,3(1, 0) looked similar. However, there were small difference between these estimated values.
Figure 2
Figure 2
Maximum likelihood estimates of log SACEK−1,1(1, 0) with ρ, ν [set membership] {0, 0.5, 1.0}, respectively, and corresponding p-values for log SACEK−1,3(1, 0).
Figures 3 and and44 show similar results for log SACEK−1,1(2, 0) and log SACEK−1,3(2, 0). The estimated values of log SACEK−1,1(2, 0) and log SACEK−1,3(2, 0) were greater than 0 for all values of (ρ, ν). However, there were no significant differences between FOLFOX and IFL.
Figure 3
Figure 3
Maximum likelihood estimates of log SACEK−1,1(2, 0) with ρ, ν [set membership] {0, 0.5, 1.0}, respectively, and corresponding p-values for log SACEK−1,1(2, 0).
Figure 4
Figure 4
Maximum likelihood estimates of log SACEK−1,3(2, 0) with ρ, ν [set membership] {0, 0.5, 1.0}, respectively, and corresponding p-values for log SACEK−1,3(2, 0).
Figure 5 and and66 give the estimated values of log SACEK−1,1(2, 1) and log SACEK−1,3(2, 1) and the corresponding p-values. The estimated values of SACEK−1,1(2, 1) were less than 0 for all values of (ρ, ν), implying that the probability of fatigue on IROX was larger than the probability of fatigue on FOLFOX. However, again the differences were not significant.
Figure 5
Figure 5
Maximum likelihood estimates of log SACEK−1,1(2, 1) with ρ, ν [set membership] {0, 0.5, 1.0}, respectively, and corresponding p-values for log SACEK−1,1(2, 1).
Figure 6
Figure 6
Maximum likelihood estimates of log SACEK−1,3(2, 1) with ρ, ν [set membership] {0, 0.5, 1.0}, respectively, and corresponding p-values for log SACEK−1,3(2, 1).
In this paper, we have proposed a principal stratification approach to estimate the causal effect of treatment in the presence of death for a three arm study with two active treatments and a longitudinal ordinal outcome. Our formulation of sensitivity parameters in Assumption 3 is very much context specific. Based on this approach, we concluded that patients’ fatigue levels were not influenced by treatment, an important finding that allows the use of the treatment with the most favorable anti-cancer activities. However, we note that the power was low for this comparison with few patients still on study at one year.
In our approach, instead of just using the data YT observed at the end of the study, we used all available longitudinal data using the models in Section 3.4 along with Assumption 4. This was essential to the viability of our approach as only 18 subjects in total had the QOL measure at time T, but more than 45% of the subjects were alive at the end of the study. Assumption 4 (Conditional MAR) could be weakened by including covariates in the model specified in (19)(20).
A key assumption in our proposed approach is deterministic monotonicity, Assumption Ia, which assumes that there are no patients who would die on both experimental arms but survive on the control arm. While this is weaker than a more typical assumption that no patient would die on either experimental arm but be alive on the control arm, it still requires justification. In this trial, both experimental arms were superior for overall survival compared to the control, and were also less toxic. In addition, despite extensive analyses of both baseline clinical and blood-based predictors (pharmacogenomics) reported elsewhere (Sanoff et al., 2008; McLeod et al., 2003), no predictors of differential treatment efficacy between the three treatments has been identified. Thus, while personalized medicine has the hope to identify the optimal therapy for each patient (which for some patients may be the control), at the present time for this trial, Assumption Ia is not unreasonable.
A difficult research problem is the best way to ’honestly’ elicit sensitivity parameters in causal inference problems, often from physicians. Here we had the benefit of a statistician, Dr. Sargent, who is also a professor of oncology and was intimately involved in this trial. However eliciting from experts without this dual expertise remains a challenge.
In the current approach, we compute the large sample variance of SACE using the delta method. We plan in future work to examine its operating characteristic in small samples. We also plan to explore a fully Bayesian approach and as such, use informative priors for the sensitivity parameters, (τ, λ, ρ, ν), to obtain a single inference that characterizes our uncertainty about the sensitivity parameters. This will also allow us to easily make non-large sample inferences. This is ongoing work.
Supplementary Material
web appendix
Acknowledgments
We would like to thank Ms. Erin Green of the Mayo Clinic for her help in data collection and clarifying issues with the data. The authors gratefully acknowledge the associate editor and two referees for insightful comments that have greatly improved the manuscript. This project was supported by NIH grants CA85295, HL079457, and CA25224.
Appendix
Proof of Theorem I
We know that the marginal distributions, P(DT(j) = 0) for j = 0, 1, 2 are identified via Assumption 2 and (1),
equation M58
(22)
Now we move to identification of the bivariate distributions P(DT (j) = l,DT (j′) = l′), l, l′ [set membership]{0, 1}, j, j′ [set membership] {0, 1, 2}. From Assumption 1 and (2), we have the following results
equation M59
Therefore, both P(DT(1) = 0, DT(0) = 0) and P(DT(1) = 1,DT(0) = 0) are identified. Since
equation M60
P(DT(1) = 0|DT(0) = 1) is identified. From the following result
equation M61
P(DT(1) = 1;DT(0) = 1) is identified.
We now show that
equation M62
are identified. From Assumption 1 and (3), we have the following results
equation M63
Therefore, both P(DT(2) = 0, DT(0) = 0) and P(DT(2) = 1, DT(0) = 0) are identified.
Since
equation M64
P(DT(2) = 0|DT(0) = 1) is identified. Finally, P(DT(2) = 1|DT(0) = 1) is identified since
equation M65
Now we will prove P(AT(j)) for j = 0, 1, … , 6 are identified. More specifically, we just need to show that
equation M66
for j, l, k = 0, 1 are identified (except (j, l, k) = (1, 1, 1)). We first identify P(AT (5)) = P(DT(0) = 0, DT(1) = 1, DT(2) = 0),
equation M67
We now identify P(AT (0)) = P(DT(0) = 0, DT(1) = 0, DT(2) = 0).
equation M68
To identify P(AT(2)) = P(DT(0) = 1, DT(1) = 1, DT(2) = 0), we have
equation M69
To identify P(AT(1)) = P(DT(0) = 1, DT(1) = 0, DT(2) = 0), we need the following equation
equation M70
To identify P(AT (4)) = P(DT(0) = 0, DT(1) = 0, DT(2) = 1), we use the following relations,
equation M71
To identify P(AT(3)) = P(DT(0) = 1, DT(1) = 0, DT(2) = 1), we have the following equalities
equation M72
Finally, to identify P(AT (6)) = P(DT(0) = 0, DT(1) = 1, DT(2) = 0), we use the following equality
equation M73
Proof of Theorem II
In addition to (15) and (16), we have from Assumption 3
equation M74
(23)
equation M75
(24)
equation M76
(25)
To identify P(YT(0) > k|AT (0)), we use the following equality,
equation M77
Now we replace P(YT(0) > k|AT(4)) and P(YT(0) > k|AT(5)) with (15) and (23), respectively. Then we have the following equality,
equation M78
(26)
where
equation M79
The probabilities P(AT(j)|DT(0) = 0) for j = 0, 4, 5 are identified by Theorem I so the coefficients (A0, B0, C0) are identified. Here, we need a solution of the cubic equation (26) in [0, 1]. In our example, only one solution was in this range.
To identify P(YT(1) > k|AT(0)), we use following equation
equation M80
We replace P(YT(1) > k|AT (1)), P(YT(1) > k|AT (3)), and P(YT(1) > k|AT (4)) with (24), (25) and (16), respectively. Then we have the following equation,
equation M81
(27)
where
equation M82
The probabilities P(AT (0)|DT(0) = 0), P(AT (4)|DT(0) = 0), P(AT (1)|DT(0) = 0), P(AT (5)|DT(0) = 0), P(AT (0)|DT(1) = 0), P(AT (1)|DT(1) = 0), P(AT (3)|DT(1) = 0), and P(AT (4)|DT(1) = 0) are identified by Theorem I so the coefficients (A1;B1;C1;D1) are identified. Similarly to the solution to (26), we have only one solution in [0, 1].
Identification of SACEk,.(2, 0)
Via the following equations,
equation M83
(28)
equation M84
(29)
and the fact that P(AT(0)[union operator]AT(5)), P(AT(0)), and P(AT(5)) are identified from Theorem 1, the identification of P(YT(0) > k|AT(0) [union operator] AT(5)) and P(YT(2) > k|AT(0) [union operator] AT(5)) only requires the additional identification of P(YT(2) > k|AT(0)) for SACEk,1(2, 0), and P(YT(2) > k|AT(5)) for SACEk,2(2, 0), respectively.
From Assumption 3, we have following relations
equation M85
(30)
equation M86
(31)
equation M87
(32)
Therefore, we can identify P(YT(2) > k|AT(1)), P(YT(2) > k|AT(2)), and P(YT(2) > k|AT(5)) by identifying P(YT(2) > k|AT(0)). The identification of this final quantity is given in Corollary I, whose proof follows.
Proof of Corollary I
To identify P(YT(2) > k|AT(0)), we have following equation
equation M88
Now we replace P(YT(2) > k|AT(1)) and P(YT(2) > k|AT(5)) with (11) and P(YT(2) > k|AT(2)) with (13), respectively. Then we have the following equations,
equation M89
(33)
where
equation M90
The probabilities P(AT(0)|DT(2) = 0), P(AT(1)|DT(2) = 0), P(AT(2)|DT(2) = 0), and P(AT(5)|DT(2) = 0) are identified from Theorem 1. In the application, we only found one solution to the quartic equation in [0, 1].
Identification of SACEk,.(2, 1) (Corollary II)
Similar to (17), (18), (28), and (29), we have following expressions for SACEk,3(2, 1),
equation M91
(34)
equation M92
(35)
In the process of identifying SACEk,3(1, 0) and SACEk,3(2, 0), we identified P(YT(1) > k|AT(0)), P(YT(1) > k|AT(1)), P(YT(2) > k|AT(0)), and P(YT(2) > k|AT(5)). In addition, the mixing probabilities were identified in Theorem I.
  • Cheng J. Estimation and inference for the causal effect of receiving treatment on a multinomial outcome. Biometrics. 2009;65:96–103. [PubMed]
  • Cheng J, Small DS. Bounds on causal effects in three-arm trials with non-compliance. Journal of the Royal Statistical Society, Series B. 2006;68:815–836.
  • Egleston BL, Scharfstein DO, Freeman ER, West SK. Causal inference for non-mortality outcomes in the presence of death. Biostatistics. 2007;8:526–545. [PubMed]
  • Egleston BL, Scharfstein DO, MacKenzie E. On estimation of the survivor average causal effect in observational studies when important confounders are missing due to death. Biometrics. 2009;65:497–504. [PMC free article] [PubMed]
  • Frangakis CE, Rubin DB. Principal stratification in causal inference. Biometrics. 2002;58:21–29. [PubMed]
  • Gilbert PB, Ronald JB, Hudgens MG. Sensitivity analysis for the assessment of causal vaccine effects on viral load in HIV vaccine trials. Biometrics. 2003;59:531–541. [PubMed]
  • Goldberg RM, Sargent DJ, Morton RF, Fuchs CS, Ramanathan RK, Williamson SK, Findlay BP, Pitot HC, Albets SR. A randomized controlled trial of fluorouracil plus leucovorin, irinotecan, and oxaliplatin combinations in patients with previously untreated metastatic colorectal cancer. Journal of Clinical Oncology. 2004;22:23–30. [PubMed]
  • Harel O, Schafer JL. Partial and latent ignorability in missing-data problems. Biometrika. 2009;96:37–50.
  • Hayden D, Pauler DK, Schoenfeld D. An estimator for treatment comparisons among survivors in randomized trials. Biometrics. 2005;61:305–310. [PubMed]
  • Hogan JW, Laird NM. Mixture models for the joint distribution of repeated measures and event times. Statistics in Medicine. 1997;16:239–257. [PubMed]
  • Holland P. Statistics and causal inference. Journal of the American Statistical Association. 1986;81:945–961.
  • Imai K. Sharp bounds on the causal effects in randomized experiments with truncation-by-death. Statistics and Probability Letters. 2008;78:144–149.
  • Jin H, Rubin D. Principal stratification for causal inference with extended partial compliance. Journal of the American Statistical Association. 2008;103:101–110.
  • Kurland BF, Heagerty PJ. Directly parameterized regression conditioning on being alive: analysis of longitudinal data truncated by deaths. Biostatistics. 2005;6:241–258. [PubMed]
  • Lee K, Daniels M. A class of Markov models for longitudinal ordinal data. Biometrics. 2007;63:1060–1067. [PMC free article] [PubMed]
  • Lee K, Daniels M. Marginalized Models for Longitudinal Ordinal Data with Application to Quality of Life Studies. Statistics in Medicine. 2008;27:4359–4380. [PMC free article] [PubMed]
  • McLeod HL, Sargent DJ, Marsh S, Fuchs C, Ramanathan RK, Williamson S, Findlay B, Thibodeau S, Petersen G, Goldberg R. Pharmaco-genetic analysis of systemic toxicity and response after 5-fluorouracil (5FU)/CPT-11, 5FU/oxaliplatin (oxal), or CPT-11/oxal therapy for advanced colorectal cancer (CRC): Results from an intergroup trial. Proc Am Soc Clin Oncol. 2003;22 abstract 1013.
  • Pauler DK, McCoy S, Moinpour C. Pattern mixture models for longitudinal quality of life studies in advanced stage disease. Statistics in Medicine. 2003;22:795–809. [PubMed]
  • Robins JM, Rotnitzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association. 1995;90:106–121.
  • Rotnitzky A, Robins JM, Scharfstein DO. Semiparametric regression for repeated outcomes with nonignorable nonresponse. Journal of the American Statistical Association. 1998;93:1321–1339.
  • Roy J, Hogan JW, Marcus BH. Principal stratification with predictors of compliance for randomized trials with 2 active treatments. Biostatistics. 2008;9:277–289. [PubMed]
  • Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology. 1974;66:688–701.
  • Rubin DB. Bayesian inference for causal effects. Annals of Statistics. 1978;6:34–58.
  • Rubin DB. Dawid AP, editor. Causal inference without counterfactuals. Journal of the American Statistical Association. 2000;6:34–58. Comment on.
  • Rubin DB. Causal inference through potential outcomes and principal stratification: Application to studies with ‘censoring’ due to death. Statistical Science. 2006;21:299–309.
  • Saltz LB, Cox JV, Blanke C, Rosen LS, Fehrenbacher L, Moore MJ, Maroun JA, Ackland SP, Locker PK, Pirotta N, Elfring GL, Miller LL. for the Irinotecan Study Group. Irinotecan plus fluorouracil and leucovorin for metastatic colorectal cancer. New England Journal of Medicine. 2000;343:905–914. [PubMed]
  • Sanoff HK, Sargent DJ, Campbell ME, Morton RF, Fuchs CS, Ramanathan RK, Williamson SK, Findlay BP, Pitot HC, Goldberg RM. Five-Year Data and Prognostic Factor Analysis of Oxaliplatin and Irinotecan Combinations for Advanced Colorectal Cancer: N9741. Journal of Clinical Oncology. 2008;26:5721–5727. [PMC free article] [PubMed]
  • Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for Non-ignorable Drop-out Using Semiparametric Non-response Models (with Discussion) Journal of the American Statistical Association. 1999;94:1096–1146.
  • Shepherd BE, Gilbert PB, Mehrotra DV. Eliciting a counterfactual sensitivity parameter. American Statistician. 2007;61:56–63.
  • Small D, Tan Z. A stochastic monotonicity assumption for the instrumental variables method. Working paper. 2008
  • Zhang JL, Rubin DB. Estimation of causal effects via principal stratification when some outcomes are truncated by “death” Journal of Educational and Behavioral Statistics. 2003;28:353–368.