Search tips
Search criteria 


Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
Biometrics. Author manuscript; available in PMC 2017 March 24.
Published in final edited form as:
PMCID: PMC5116003

Estimating the Average Treatment Effect on Survival Based on Observational Data and Using Partly Conditional Modeling


Treatments are frequently evaluated in terms of their effect on patient survival. In settings where randomization of treatment is not feasible, observational data are employed, necessitating correction for covariate imbalances. Treatments are usually compared using a hazard ratio. Most existing methods which quantify the treatment effect through the survival function are applicable to treatments assigned at time 0. In the data structure of our interest, subjects typically begin follow-up untreated; time-until-treatment and the pre-treatment death hazard are both heavily influenced by longitudinal covariates; and subjects may experience periods of treatment ineligibility. We propose semiparametric methods for estimating the average difference in restricted mean survival time attributable to a time-dependent treatment, the average effect of treatment among the treated, under current treatment assignment patterns. The pre- and post-treatment models are partly conditional, in that they use the covariate history up to the time of treatment. The pre-treatment model is estimated through recently developed landmark analysis methods. For each treated patient, fitted pre- and post-treatment survival curves are projected out, then averaged in a manner which accounts for the censoring of treatment times. Asymptotic properties are derived and evaluated through simulation. The proposed methods are applied to liver transplant data in order to estimate the effect of liver transplantation on survival among transplant recipients under current practice patterns.

Keywords: Landmark analysis, Observational data, Partly conditional model, Proportional hazards regression, Time-varying covariates, Treatment effect

1. Introduction

It is often of interest in biomedical settings to evaluate the benefit of a treatment on survival. In many clinical settings, treatment is not administered right at the time of diagnosis, such that a period of waiting time occurs for some (or perhaps all) patients. In cases where treatment is not randomized, it is often useful to assess the benefit of treatment under current treatment assignment patterns. Through the average effect-of-treatment-on-the-treated (ETT; Pearl, 2009), one can evaluate the benefit of treatment as currently practiced.

Survival probabilities are easily understood by health care professionals, as is the area under the survival curve (restricted mean lifetime). Various authors have proposed using Cox regression with the primary goal not being to estimate hazard ratios, but to compare differences in survival and/or restricted mean lifetime. For example, Zucker (1998) and Chen and Tsiatis (2001) proposed methods that involved averaging over fitted values from Cox models. Zhang and Schaubel (2011) extended the methods of Chen and Tsiatis (2001) to accommodate dependent censoring, then subsequently developed double-robust methods (Zhang and Schaubel, 2012). Each of the afore-described methods applies to treatments assigned at baseline, as opposed to time-varying treatments.

In the data structure of interest in this report, all patients begin follow-up untreated, with some eventually receiving treatment and others dying beforehand. Pre-treatment mortality and treatment assignment rates are dependent on longitudinal covariates (including periods during which a subject is declared treatment-ineligible), such that a patient’s pre-treatment death is dependently censored by the receipt of treatment. Post-treatment survival is dependent on a subject’s condition at the time of treatment, and the duration of pre-treatment follow-up time. Our objective is to contrast two scenarios: (a) treatment is never assigned (b) treatment is assigned according to current practice patterns.

The proposed methods are motivated by the end-stage liver disease (ESLD) setting. The number of available deceased-donor livers is always less than the number of patients in need of liver transplantation. As a result, medically suitable patients are placed on a liver transplant waiting list. Patients typically begin follow-up on the wait list (‘untreated’; i.e., not transplanted), such that transplantation can be viewed as a time-dependent treatment. In the U.S., chronic end-stage liver disease patients are sequenced in decreasing order of Model for End-Stage Liver Disease (MELD) score, a very strong predictor of pre-transplant mortality. Transplantation results in the dependent censoring of pre-transplant death, since MELD scores predict both wait list mortality and transplant rates. Note that patients may be removed from the wait list (or made inactive) and, in such cases, are permanently (or temporarily) ineligible to receive a transplant. In the setting of our interest, the effect of treatment on the treated is of greater interest than the average causal effect, due to the implausibility of all patients receiving treatment.

Our analysis in Section 5 is different from that in Gong and Schaubel (2013) since (i) the former only looked at pre-transplant survival; (ii) did not compare post- versus pre-transplant survival; (iii) reported contrasts only in terms of the hazard ratio; and (iv) did not exclude Status 1 (acute liver failure) patients and, in fact, focused on contrasting them with chronic ESLD patients.

We develop semiparametric methods to estimate the average effect-of-treatment-on-the-treated through partly conditional modeling. The proposed method involves averaging over the observed instances of treatment initiation, with the averaging accounting for the various complexities in data structure; e.g., treatment initiation times are subject to right censoring; patients may die before treatment is received; and patients cannot initiate treatment while ineligible. For each treated patient, we use the accrued history (up to the time of treatment initiation) to project out a survival curve for post-treatment residual lifetime. Based on the same accrued pre-treatment history, we also project out the survival curve that would apply in the absence of treatment. This set-up lends itself well to partly conditional modeling (Zheng and Heagerty, 2005; Gong and Schaubel, 2013); see also the closely related concept of landmark analysis (Feuer et al., 1992; van Houwelingen, 2007; van Houwelingen and Putter, 2012; Parast, Tian and Cai, 2014). Gong and Schaubel (2013) developed methods for fitting partly conditional hazard regression models which apply to the absence-of-treatment setting in our set-up. We extend the ideas in Gong and Schaubel (2013) to estimate the average ETT through residual survival and restricted mean survival time. Although we focus on partly conditional modeling in this report, it should be noted that other pertinent methods exist, as described in Section 6.

The remainder of this article is organized as follows. In Section 2, we describe the proposed methods. Asymptotic properties are provided in Section 3 (for proofs, see Supplementary Materials), with finite-sample properties evaluated through simulation in Section 4. We apply the proposed methods to the motivating data set in Section 5. Concluding remarks are made in Section 6.

2. Proposed Methods

2.1 Set-up and Notation

We now formalize the ideas introduced in Section 1, in the absence of censoring. We remove subscripting, such that defined variates pertain to any hypothetical subject. We let T represent treatment time, with T > 0 since subjects begin follow-up untreated. Death time in the absence of treatment is denoted by D0. Note that, consistent with the intent-to-treat principle, patients that initiate treatment are considered to be ‘treated’ thereafter. Let x2130(s) = 1 if the patient is treatment-eligible (i.e., eligible to initiate treatment) at time s, and 0 otherwise. A patient may oscillate between the eligible and ineligible states before time D0T, where ab = min(a, b). In particular, x2130(s) = 0 for s > D0T, since a patient cannot initiate treatment more than once, and cannot initiate treatment after death. Note that a patient may only initiate treatment while eligible; i.e., dI(Ts) = x2130(s)dI(Ts).

Under the above-listed Scenario (a), T = ∞. Under Scenario (b), treatment only occurs when T < D0, in which case D0 is considered latent; D0 serves as a competing risk for T. For a patient with treatment time T = s, D1 is the death time, such that (D1s)+ is the residual post-treatment survival, with a+ = a · I(a > 0) and I(·) being the familiar 0/1 indicator function. The quantity (D0s)+ then represents the residual survival beyond s that would have been observed in the absence of treatment. Note that if D0 < T, then D1 is undefined.

The covariate vector, which contains some time-varying elements, is denoted by Z*(s). The patient’s covariate and eligibility history up to time s is given by H(s) = {Z*(u), x2130(u); 0 ≤ u < s}. The above described set-up is illustrated in Figure 1. For a patient with treatment-initiation time T = s, we are interested in the average difference between (D1s)+ and (D0s)+ given [H(s), T = s], with the average being taken with respect to the subdistribution function for T.

Figure 1
History on [0, s) and residual survival beyond s under two scenarios. In each case, covariate (demographic, biological) and treatment-eligibility history H(s) has accumulated, and the subject is treatment-eligible at time s, x2130(s) = 1. ...

2.2 Treatment Effect: Conditional and Average

For a patient initiating treatment at time T = s, there are two death times of interest; the post-treatment residual death time, (D1s)+, and the residual death time that would have occurred in the absence of treatment, (D0s)+. At the time of treatment (e.g., T = s), we observe H(s), and x2130(s) = 1. Conditional on [H(s), T = s], we contrast



the survival functions pertaining to the counterfactual variates (D1s)+ and (D0s)+, respectively. Note that, in both S1(t; s|·) and S0(t; s|·), the time index s represents conditioning time, while t refers to residual survival t time units beyond the conditioning time, s. That is, Sj(t; s|·) pertains to a gap of t units beyond time s, which equals total time (s + t). We assume strong ignorability (Rubin, 1978), permitting inference on the counterfactuals (D1s)+ and (D0s)+, through observed data. The strong ignorability assumption is detailed in the Supplementary Materials. An implication this assumption is that S0(t; s|H(s), T = s) = S0(t; s|H(s), x2130(s) = 1), consistent with the counterfactuals (D1s)+ and (D0s)+ being independent of the receipt of treatment at time s.

For fixed L > 0, restricted mean residual survival times are given by



Conditioning on [H(s), T = s], a pertinent contrast in survival functions is then


while a contrast in restricted mean residual lifetime is defined as


Average survival functions are then defined as



where, in each case, the expectation is taken with respect to the joint distribution of [H(T), T)] over the identifiable range of T which would in practice be capped by the maximum follow-up time. Correspondingly, average restricted mean residual lifetimes are:



The ETT can then be defined in terms of mean difference in survival probability as


and in terms of average difference in residual mean survival time, by


Having specified the treatment effect of interest, the remaining subsections in Section 2 describe the proposed methods for estimating δ(t) and Δ(L).

2.3 Observed data: Notation and set-up

We let Di denote the death time for subject i (i = 1, …, n). The time of treatment is given by Ti, with Ti = ∞ when Di < Ti. Treatment and death times are subject to independent right censoring, Ci, intended to represent the combination of administrative censoring and random loss to follow-up. Observation time is then given by Xi = DiCi. We define counting processes for death, treatment and censoring, as Ni(t) = I(DitCi), NiT(t)=I(TitDiCi) and NiC(t)=I(CitDi), respectively. Recall that x2130i(u) equals 1 if patient i is eligible for treatment at time u, and 0 otherwise. Note that NiT(t)=0ti(u)dNiT(u), since treatment can only be initiated for an eligible subject. The covariate vector, observed longitudinally, is denoted by Zi*(t). The covariate and treatment-eligibility history for subject i as of time t is denoted by i(t)={Zi*(u),i(u);u[0,t)}. Covariate information is assumed to not be available after treatment is assigned, such that the total observed history for subject i is given by Hi(XiTi); such data are not required by the proposed methods.

2.4 Assumed Models and Estimation Methods

We now describe the assumed models for (Di1Ti)+,(Di0Ti)+, Ti and Ci. As implied by (7) and (8), our target ETT implies averaging over the observed [Ti, Hi(Ti)] distribution. Per (1) and (2), we achieve this by working with [(Di1s)+|i(s),Ti=s] and [(Di0s)+|i(s),Ti=s] directly, after which we will then average explicitly. We model the partly conditional hazard function for [(Di1s)+|i(s),Ti=s], which uses in the covariate vector all pertinent information in the history prior to the time of treatment, Hi(Ti). The model is partly conditional since the covariate is not updated after the time treatment is initiated. The covariate is not updated after time Ti since we want to project residual survival from Ti onward, and a survival projections based on traditional time-dependent model would require a model for Hi(s + t). In many cases, a model for Hi(s + t) is complicated to fit accurately, and is of little inherent interest to the investigators.

2.4.1 Post-Treatment Survival

We let λ1(t; s|H(s), T = s) denote the hazard function corresponding to S1(t; s|H(s), T = s) from (1). We assume the following proportional hazards model,

λ1(t;s|i(s),Ti=s)=λ01(t) exp{β1Zi1(s)},

where the covariate Zi1(s) is chosen to summarize the pre-treatment history, {Hi(s), Ti = s}, pertinent to predicting post-treatment survival. Typically, time until treatment, Ti, would be represented parametrically in the covariate vector, Zi1(s). Note that the Zi1(s) covariate is fixed at treatment time Ti = s, reflecting the partly conditional (Zheng and Heagerty, 2005; Gong and Schaubel, 2013) nature of (11), which uses time-dependent data ‘frozen’ at time of treatment. This could also be considered a ‘landmark’ analysis (e.g., van Houwelingen, 2007), with landmark times being customized to each subject and set to Ti.

We assume that treatment times are independently censored by Ci. Assuming that (DiTi)+ is independently censored by (CiTi)+ given [Zi1(Ti), Ti], parameter estimation for model (11) proceeds through unweighted partial likelihood. We denote the resulting estimators for model (11) by [beta]1 and [Lambda with circumflex]01(t), with the latter being the Breslow-Aalen (1972) estimator. We estimate S1(t; s|Hi(s), Ti = s) by Ŝ1(t; s|Zi1(s)) = exp{−[Lambda with circumflex]1(t; s|Zi1(s))}, where Λ^1(t;s|Zi1(s))=Λ^01(t) exp{β^1Zi1(s))}, and μ1(L; s|Hi(s), Ti = s) by μ^1(L;s|Zi1(s))=0LŜ1(t;s|Zi1(s))dt.

2.4.2 Survival in the Absence of Treatment

We begin by describing the assumed hazard model for survival in the absence of treatment. We then outline the proposed data augmentation, which involves selecting calendar date cross-sections. Next, we detail fitting the model through an inverse weighted and stratified log rank estimating function.

We let λ0(t; s|H(s), T = s) denote the hazard function corresponding to (2). Under strong ignorability, note that λ0(t; s|Hi(s), Ti = s) = λ0(t; s|Hi(s), x2130i(s) = 1), which we use in listing the assumed model,

λ0(t;s|i(s),i(s)=1)=λ00(t) exp{β0Zi0(s)},

where Zi0(s) is chosen such that λ0(t; s|Hi(s), x2130i(s) = 1) = λ0(t; s|Zi0(s)), implying that Zi0(s) contains all elements of the history pertinent to predicting (Di0s)+, including all appropriate functions of time-already-survived, s. Model (12) is partly conditional (Zheng and Heagerty, 2005; Gong and Schaubel, 2013) since, although the hazard at time s + t is of interest, the model conditions on information which is ‘frozen’ at time s. In contrast, a typical (fully) conditional or ‘time-dependent’ model would condition on Hi(s + t).

Partly Conditional Model

The motivation for using a partly conditional model is described at the start of Section 2.4. Generally, fitting a partly conditional model requires some form of data augmentation in which the records corresponding to each subject’s observed data are restructured in order to facilitate fitting the assumed model. After such augmentation, each input record has a prior time survived (e.g., si) and corresponding prior history Hi(si), with residual survival in the absence of treatment, (Disi)+ then being analyzed. In fitting the post-treatment residual survival model (11), there is an obvious choice for each treated subject’s conditioning time, namely si := Ti. In accordance with (2), we actually need to project residual survival (in the absence of treatment) beyond this same conditioning time. Although the appropriate conditioning time for projecting (1) is clear, the nature of the data augmentation for fitting model (12) requires consideration.

Calendar Time Cross-sections

In landmark analysis, typically survival from a specific follow-up time point (or set of specific time points) is desired, with survival probability projected out after the chosen landmark time(s). In our case, since treatment can occur at any time point (e.g., T = s), we need to be able to project conditional survival forward from any conditioning time s. This suggests a partly conditional model which includes terms representing previous time survived, s. Variation in previous time survived is then required, which means that sampling component of the data augmentation should be based on something other then s itself. We choose to sample based on calendar time, since each calendar time cross-section will contain wide variation in previous time survived. As we later describe, we stratify the model by cross-section for computational savings, which is important in large data sets like that we analyze in Section 5. For instance, Gong and Schaubel (2013) developed a partly conditional model which chooses the conditioning times to be the si values observed on a randomly selected calendar date. For example, consider a particular calendar date (e.g., 07/01/2004); input records for fitting the model would consist of si (subject i’s prior follow-up time as of 07/01/2004), the corresponding Hi(si), and (Xisi)+ among subjects who (as of 07/01/2004) were alive, uncensored, yet-untreated, but eligible to initiate treatment; i.e., {i : Xi > si, x2130i(si) = 1}.

Method of Gong and Schaubel (2013)

The estimation of β0 from model (12) was developed by Gong and Schaubel (2013). The essential ideas are presented here for continuity, and because the authors only derived the properties of [beta]0, but not those of Ŝ0(t; s|Zi0(s)), [mu]0(L; s|Zi0(s)), Ŝ0(t) or [mu]0(L).

To begin, we choose a set of K calendar dates, {CS1, …, CSK}. Each cross-section date CSk is intended to represent a calendar date at which a set of treatment-eligible patients (could have been but) was not treated; we model residual survival in the absence of treatment from this date forward. For calendar date CSk, we select the cross-section of treatment-eligible patients who were not treated (on or before that day). For patient i, follow-up time (previous time survived) as of calendar date CSk is denoted by sik. Hence, a patient selected into cross-section CSk must, as follow-up time sik be: alive (Di > sik), uncensored (Ci > sik), untreated (Ti > sik) and treatment-eligible x2130i(sik) = 1. Three remarks are important at this juncture. First, treatment-eligibility is a cross-section inclusion criterion, but not a censoring criterion; e.g., having been included in cross-section k and, hence, with x2130i(sik) = 1, patient i is not censored upon subsequently being deemed treatment-ineligible. Second, the covariate will be frozen at sik, such that the survival projection for the residual time (Di0sik)+ is based on Hi(sik). Third, a patient included in cross-section k is censored if treated; this induces dependent censoring. Each of these remarks is formalized shortly.

We now establish additional notation pertinent to model (12). Since survival time from cross-section is modeled, we define the following times-since-cross-section: Dik = (Disik)+, Tik = (Tisik)+ and Cik = (Cisik)+ as the death, treatment and censoring time respectively corresponding to the ith patient and measured from the kth cross section date. Figure 2 provides an illustration of how the treatment-free observation time is transformed into time-since-cross-section times. A modified counting and at-risk processes are also defined as Ni0k(t) = Ni(sik + t)I(Ti > sik + t) and Yi0k(t) = I(DikCikt), respectively.

Figure 2
Examples of the relationship between cross-section time and follow-up time. Four subjects (i = 1, …, i = 4) and two cross sections (k = 1, 2) are shown. The four subjects begin follow-up at different calendar dates. For subject i = 1, failure ...

Following Gong and Schaubel (2013), we estimate β0 through the stratified model,

λ0k(t;s|i(sik),i(sik)=1)=λ00k(t) exp{β0Zi0(sik)},

where β0 is the same parameter in the unstratified model of interest, (12). Model (13) is quite flexible. Non-proportionality can be accommodated by replacing β0 with β0(t), a parametric function on t. The parameter vector could also be allowed to be a parametric function of previous time survived; e.g., β0k, or β0(sik). Moreover, interactions between si and elements of Hi(si) are also possible. Alternatively, van Houwelingen and Putter (2015) suggested a stopped Cox model to avoid non-proportionality, with artificial censoring at t = L. By breaking the stratification on k, one could also model the effect of calendar time.

Inverse weighting

Model (13) conditions on Hi(sik). However, we anticipate that Hi(sik + t) would be predictive of both the treatment hazard and the pre-treatment death hazard at time (sik + t). The mutual association, even conditional on Hi(sik), between pre-treatment death after sik, the probability of treatment after sik and Hi(sik + t) sets up dependent censoring of (Disik)+ by (Tisik)+. The potential bias due to such dependent censoring can be corrected through a variant of Inverse Probability of Censoring Weighting (IPCW; e.g., Robins and Rotnitzky, 1992) which requires a model for the treatment-initiation hazard. We fit the following two treatment hazard models:

λiT(t|i(t),i(t))=(sik)i(t)λ0T(t) exp{θ0Zi(t)},

(sikλik(t;sik|Zi0(sik),(sik)=(sik)λ0k(t) exp{θ1Zi(sik))},

with model (14) assumed to be the correct model; model (15) is expected to be misspecified, but is only used to provide a weight stabilizer. We assume no-unmeasured-confounders for treatment, λiT(t|i(t))=λiT(t|i(Di),Di), and that λiT(t|i(t))=λiT(t|Zi(t)). Note that λiT(t|i(t),i(t)) in (14) uses (total) follow-up time t (measured from time 0) as the time axis, conditions on information on [0, t), while λik(t;sik|Zi0(sik),(sik)=1)) in (15) uses (residual) time since sik and conditions on the history over [0, sik] given [x2130i(sik) = 1]. Parameters in (14) and (15) are estimated through standard partial likelihood (Cox, 1975).

As derived in Gong and Schaubel (2013), an appropriate weight function is given by

WikA(t)=Yi0k(t) exp{ΛiT(sik+t)ΛiT(sik)},

where ΛiT(t)=0ti(u)λ0T(u) exp{θ0Zi(u)}du. The quantity WikA(t) can be thought of as the inverse of the conditional probability of remaining untreated at time (sik + t), given that the subject was untreated and treatment-eligible at time sik. Gong and Schaubel (2013) suggest the following stabilized inverse weight,


Note that artificially censoring subjects at t = L would be an alternative to the stabilizer.

Parameter Estimation for Model (12)

An estimator for β0, denoted by [beta]0, is obtained through solving the following inverse-weighted score function,


with Z¯0k(t;β0)=R0k(1)(t;β0)/R0k(0)(t;β0) and R0k(d)(t;β0)=n1i=1ni(sik)Wik(t)Zi0(sik)d exp{β0Zi0(sik)} with d = 0, 1, 2 and where τ0k satisfies P{Yi0k0k) = 1} > 0, and can in practice be set to the largest Xik among subjects with x2130i(sik) = 1. A Breslow-Aalen estimator pooled across strata is obtained as


for t [set membership] (0, L], where R0(0)(u;β0)=k=1KR0k(0)(u;β0).

2.4.3 Conditional Treatment Effect

Consider patient i, treated at follow-up time Ti = s with covariate history Hi(s). Post-treatment survival probability for this patient is predicted by Ŝ1(t; s|Hi(s), Ti = s), while predicted L−year restricted mean post-treatment lifetime is given by [mu]1(L; s|Hi(s), Ti = s). Correspondingly, in the absence of treatment, predicted survival and L-year restricted mean lifetime for subject i (from Ti onward) would be given by Ŝ0(t; s|Hi(s), Ti = s) and μ^0(L|i(s),i(s)=1)=0LŜ0(t|i(s),i(s)=1)dt, respectively. The treatment effect corresponding to treatment initiation by subject i at follow-up time Ti can then be estimated by


in terms of survival probability, and


in terms of restricted residual mean survival time.

2.4.4 Average Treatment Effect

Having established how to estimate the treatment effect for a subject treated at Ti = s with covariate history Hi(s), we now describe how to estimate the quantities of chief interest, namely δ(t) = E[δ(t|Hi(s), Ti = s)] and Δ(L)=0Lδ(t)dt from (9). In the absence of censoring, we could average with respect to the empirical distribution of {Ti, Hi(Ti)} values. Right censoring of Ti values rules out using the sample mean, since this averaging would then generally depend on the Ci distribution. This implies inverse weighting the observed treatment assignments, such that the inverse weighted distribution reflects that which would have been obtained in the absence of censoring. We use the result,


where FiT(t|i(t))=E[0tdI(Tiu)|i(u)] is analogous to the cumulative incidence function for Ti (with Di serving as a competing risk) and with Gi(u) = P(Ci > u|Zi(0)). We assume the following proportional hazards model for Ci,

λiC(t)=λ0C(t) exp{α0Zi(0)}.

Observed data used to fit model (23) include {Xi, I(Ci < Di), Zi(0)}, with α0 and Λ0C(t)=0tλ0C(u)du estimated through unweighted Cox regression. Note that Ci is viewed in this report as administrative censoring, in which case (23) may not even depend on Zi(0). If in fact λiC(t) depended on the Hi(t), model (23) could easily be enriched to accommodate such dependence, with little subsequent modification to the procedures next described.

Finally, estimators of δ(t) and Δ(L) are given by



respectively, where Ĝi(u)=exp{Λ^iC(u)}, and with τ satisfying P(Xi ≥ τ) > 0 and typically chosen to be the maximum observed follow-up time.

3. Asymptotic Properties

We assume that the random vectors {Xi, Ni(Xi), NiT(Xi), Hi(XiTi)} are independent and identically distributed for i = 1 … n, with all elements of Hi(t) bounded for t [set membership] (0, τ]. A complete list of regularity conditions is provided in the Supplementary Materials document.

Theorem 1

Under certain regularity conditions, n1/2{[delta with circumflex](t) − δ(t)} and n1/2{[Delta with circumflex](L) − Δ(L)} each converge asymptotically to zero-mean Gaussian processes with covariance functions Ej(t)2] and E[ηj2], respectively, where1(t), …, ξn(t)} and1(L), …, ηn(L)} are i.i.d. with mean 0 asymptotically. Expressions for ξi(t) and ηi(L)=0Lξi(t)dt, which are quite lengthly, are provided in the Supplementary Materials.

Variance estimators for [delta with circumflex](t) and [Delta with circumflex](L) are given by n2i=1nξ^i(t)2 and n2i=1nη^i(L)2, respectively; where [eta w/ hat]i(L) and [Xi w/ hat]i(t) are computed by replacing all limiting values by their empirical counterparts. A proof of Theorem 1 is given in the Appendix. The essence of the proof is demonstrating that, asymptotically, n1/2{δ^(t)δ(t)}=n1/2i=1nξi(t)+op(1) through a sequence of Taylor series expansions and applications of empirical process results.

The proof is provided for the weight, ŴikA(t). In practice, the stabilized weight, ŴikB(t) would often be preferred. As implied by Theorem 1, the computation of the variance is quite involved, and such computation becomes more complicated when a stabilizer is incorporated. Such concerns motivate a computationally simpler form for the variance estimator, resulting from taking Ĝi(t)−1 and ŴikA(t), or ŴikB(t) as the case may be, as fixed. Variance estimators for [delta with circumflex](t) and [Delta with circumflex](L) then simplify considerably. We evaluate the performance of these simplified variance estimators through simulation in Section 4.

4. Simulations

We generated treatment-free survival to follow the assumed partly conditional model using methods from Gong and Schaubel (2013). First, subject i enters the study on calendar date, Bi, which is generated from a Uniform(0, b) distribution. We then generate a single binary (0,1) group indicator Zia, taking the value 1 with probability 0.5. A longitudinal covariate, Zi(sik), is then created and assumed to be measured at a common set of cross-section dates: CS1, CS2, …, CSK. To generate data {Di, Zia, Zib} where Zib = vec{Zi(sik)}, we first let Zib0=bi+k=1Klog(Vik)/γ2, where bi ~ N(μ, σ2) and Vik ~ P(ρ), independent positive stable random variables with index ρ. A pre-treatment death time, Di0, is then generated with hazard λi0(t)=Vi01/ρλ0(t) exp{γ1Zia+γ2Zib0}, where Vi0 ~ P(ρ) and is independent of Vik, with Λ0(t) = (t/a)1/ρ2 and a is a constant. Setting Zi(sik) = Zib0 − log(Vik)/γ2, the pre-treatment death hazard can then be written as λi0(t)=Vi01/ρλ0(t) exp{γ1Zia+γ2Zi(sik)+log(Vik)}. Treatment time, Ti, is generated from the proportional hazards model, λiT(t)=λ0T(t) exp{θ01Zia+θ02I(Ri>t)}, where λ0T(t)=d3 and θ0=(θ01,θ02) and the time of treatment-ineligibility, Ri, is generated with hazard λiR(t)=λ0R(t) exp{d1Vi0}, where λ0R(t)=d2. Thus, Ri and Di are positively correlated, which is consistent with the data which motivated the proposed methods. Independent censoring time, Ci, is generated from hazard λiC(t)=λ0C(t) exp{α0Zia}, where λ0C(t)=d4. Note that treatment time and pre-treatment death time, Ti, and Di are dependent since both depend on treatment-ineligibility time, Ri. However, the independent censoring time Ci is independent of Di conditional on Zia.

After obtaining the pertinent survival function, transforming the time scale to represent time since cross-section (setting tk = tsik), then averaging, we obtain


Setting Λ0(t) = (t/a)1/ρ2 and λ0(tk + sik20(tk + sik)}2−1) = 1/a yields

λi(tk|Zia,Zi(sik),Di>sik)=exp{ρ2γ1Zia+ρ2γ2Zi(sik)}/[a cos(πρ/2)(ρ+1)].

If we define λi0k(t; sik) = λi(tk|Zia, Zi(sik), Di > sik), λ00k(t) = [a cos(πρ/2)(ρ+1)]−1 and β0 = (β01, β02) = (ρ2γ1, ρ2γ2), then the proportional hazards model for pre-treatment death time is given by λi0k(t; sik) = λ00k(t) exp{β01Zia + β02Zi(sik)}.

For patients who received treatment prior to dying (Di > Ti), a post-treatment death time (Di1Ti)+, is then generated via the hazard, λi1(t; Ti) = λ01(t) exp{β11Zia+β12Zi(Ti)}, where t represents time from treatment and β1=(β11,β12)=(ρ2γ1,ρ2γ2). We set λ01(t) = a1.

The complexity in the data generator is necessary to induce the partly conditional structure of the pre-treatment survival model. The positive stable frailty has become a common choice in the simulation of multivariate survival set-ups due to its preservation of the proportional hazards assumption both conditionally and marginally. Analogous set-ups were used by Zheng and Heagerty (2005) and Gong and Schaubel (2013).

We used K = 10 cross section dates, with CSk = 100 × k. For the simulation results presented, parameter specifications were as follows: b = 500, (θ01, θ02) = (−1, −1), μ = 18, σ = 1, (γ1, γ2) = (−1, −0.5), d1 = d2 = d3 = d4 = 0.001, and ρ = 0.8, which implies (β01, β02) = (β11, β12) = (−0.64, −0.32); We varied a from a = 2000, to a = 5000 and a = 7000, which led to treatment initiation rates of 10%, 15% and 20%, respectively; with similar independent censoring rates in each case. Each data configuration was replicated 1000 times, with n = 500 subjects per replicate.

We present settings where treatment has no effect (δ(t) = Δ(L) = 0), for which a1 = [a cos(πρ/2)(ρ+1)]−1. We also list results for a setting with a positive treatment effect (δ(t) > 0, Δ(L) > 0) induced by specifying a1 = 0.5 × 10−4. In developing appropriate parameter settings, we conceptualized the time scale as representing days. For reporting purposes, time is recorded in years, with results presented for [delta with circumflex](1), [delta with circumflex](2), [delta with circumflex](3) and [Delta with circumflex](3). The weight ŴikB(t) was used throughout, with the simplified variance estimators applied.

Table 1 presents simulation results for settings with Δ(L) = 0 and Δ(L) > 0. The quantity Δ(L), with L = 3, can be interpreted as the difference of 3-year restricted mean survival time due to treatment, among the treated. The proposed estimators appear to be approximately unbiased, with coverage probabilities close to the nominal 95% level. Some degree of under-coverage is observed, which is due to the approximation of the results from Section 3 by treated the (random) weights as fixed. The under-coverage is not in unacceptable amounts, particularly relative to the great reduction in complexity and hence computational burden associated with the approximation.

Table 1
Simulation results: n = 500, with weight function WikB(t)

We examined the performance of the proposed methods under various degrees of model misspecification (see Supplementary Materials). The methods generally perform adequately, although some bias is introduced, and increases with increasing model misfit. The method appears to be most sensitive to misspecification of the treatment initiation hazard.

5. Application to Liver Transplant Data

We applied the proposed methods to estimate the average effect of liver transplantation among the transplanted, by Model for End-stage Liver Disease (MELD) score. This study used data from the Scientific Registry of Transplant Recipients (SRTR). The SRTR data system includes data on all donor, wait-listed candidates, and transplant recipients in the U.S., submitted by the members of the Organ Procurement and Transplantation Network (OPTN), and has been described elsewhere. The Health Resources and Services Administration (HRSA), U.S. Department of Health and Human Services provides oversight to the activities of the OPTN and SRTR contractors.

The study population included patients age ≥ 18 wait listed between 03/01/2002 and 12/31/2009. We excluded patients who were Status 1 (acute liver failure) or previously transplanted. Cross-section dates were chosen every 7 days, 30 days or 90 days from 03/01/2002 to 12/31/2009, which led to K=409, 96, or 32 cross sections respectively. The transplant hazard model, λirT(t)=i(t)λ0rT(t) exp{θ0Zi(t)}, was stratified by United Network for Organ Sharing (UNOS) Region (r = 1, …, 11). The covariate, Zi(t), included MELD score, albumin, age, gender, race, diagnosis of Hepatitis C, body mass index, diabetes, hospitalization, blood type, dialysis within prior week, encephalopathy, ascites and serum creatinine.

The pre-transplant death model, λi0kr(t)=λ00kr(t) exp{β0Zi(sik)}, was also stratified, where k = 1, …, K stands for cross section and r again denotes UNOS Region. The covariate, Zi(sik), included MELD score, albumin, age, gender, race, diagnosis, body mass index, diabetes, hospitalization status at listing, previous dialysis, malignancy, time on wait-list (i.e., sik itself), slope of MELD score over [0, sik], slope of albumin, percentage of time spent in inactive status, and percent of time receiving dialysis. In the post-transplant death model, λi1(t;Ti)=λ01(t) exp{β1Zi1(Ti)}, Zi1(Ti) included terms for Ti, MELD score, albumin, age, gender, race, diagnosis, body mass index, diabetes, hospitalization status at listing, previous dialysis and malignancy and Donor Risk Index (DRI; Feng et al., 2006).

The pre-transplant study sample consisted of n = 66, 884 patients, of which 34,539 were observed to receive a deceased-donor liver transplant. For the MELD 30–40 subgroup, weekly cross section dates were chosen. For MELD 18–29 cross sections were drawn monthly. For MELD 6–17, cross sections were drawn every 3 months. Note that, we also tried weekly cross section dates for MELD 6–29 patients, which yielded almost identical results. The analysis was based on the weight, WikB(t).

Figure 3 shows the estimated survival curves for MELD groups 6–8, 15–17, 20–22 and 36–40. Note that the MELD score categories refer to MELD at transplant. Within a MELD category, Ŝ1(t) can be interpreted as the average survival probability, with t representing residual time post-transplant. Analogously, Ŝ0(t) can be interpreted as the average survival that would have resulted in the absence of liver transplantation, among patients who received a liver transplant. For the MELD 6–8 group, survival in the absence-of-transplantation exceeds post-transplant survival until approximately t = 2 years post-transplant. However, Ŝ1(t) > Ŝ0(t) for t > 2 years, with the distance between the curves widening as t increases. The early survival advantage (absence-of-transplant versus with a transplant) for patients in the MELD 6–8 group is the combination of relatively mortality in this subgroup, combined with the risk of surgery-related mortality (not faced unless transplantation occurs). The early survival advantage without transplant is even observed in MELD 15–17 patients, but is much less pronounced and very short-lived. In fact, Ŝ1(t) > Ŝ0(t) for t > 0.25 years in this subgroup. For MELD 36–40 group, the absence-of-transplant survival curve drops dramatically during the first couple of months, then steadily declines thereafter. Note that Ŝ1(t) curves are quite similar across MELD subgroups, with Ŝ0(t) decreasing strongly as MELD increases.

Figure 3
Analysis of SRTR data: Estimated survival curves after with a liver transplant (solid line) and in the absence of liver transplantation (dashed line) among liver transplant recipients. The time axis t is years post-transplant.

In Table 2, we list estimates of the difference in survival probability, [delta with circumflex](t) for t = 1, 3, 5 years, as well as [Delta with circumflex](5), the difference in 5-year restricted mean residual lifetime. The group that benefits the most from liver transplantation is clearly MELD 36–40, with an average gain in residual survival time of [Delta with circumflex](5) ≈ 2.4 years. The next greatest gain is observed in the MELD 30–35 group, with [Delta with circumflex](5) = 1.4 years. For MELD scores between 15 and 30, there is little difference in the gain in 5-year restricted mean residual survival time, with [Delta with circumflex](5) fluctuating about 1 year across the MELD 26–29, 23–25, 20–22, 18–19 and 15–17 subgroups. Only for the MELD 6–8 group is H0 : Δ(5) = 0 not rejected.

Table 2
Analysis of SRTR data: Estimating the effect of liver transplantation on the transplanted (with 95% confidence interval in parentheses), by MELD score at transplant.

In the Supplementary Materials, we provide results based on the Sequential Stratification method (Schaubel, Wolfe and Port, 2006; Schaubel et al., 2009), which features inverse weighted time-dependent stratification to create customized comparisons groups for each subject receiving the time-dependent treatment. Comparing our results in Table 2 to those based on Sequential Stratification, the main difference is in the MELD 6–8 group; the models from Sharma et al (2015) report a hazard ratio of 2.04 (p < 10−4), indicating that liver transplant is associated with a doubling of the mortality hazard in this subgroup. In the presence of non-proportionality (which is clear in Figure 3, particularly for this subgroup), the hazard ratio and difference in restricted mean do not have to agree.

Additional analysis is presented in the Supplementary Materials. For each MELD category, multiplying the number of transplants by the [delta with circumflex](5) yields the number of life-years saved via liver transplantation (considering only the first 5 post-transplant). The largest number of transplants was in the MELD 15–17 category (5,028), but the greatest number of life-years saved (7,649) was in the MELD 36–40 group. We estimate that 34,757 years of life were spared based on the liver transplants observed in this analysis. The Supplementary Materials also present plots of pre-transplant MELD profiles over time, the baseline pre-transplant mortality hazard, the liver transplant baseline hazard, and cumulative incidence of transplantation.

6. Discussion

In this report, we develop methods for estimating the average effect on the treated of a time-dependent treatment. The methods can be used to evaluate the benefit, in terms of patient survival, of a treatment under current treatment assignment practices. The methods were applied to quantify the survival benefit of deceased-donor liver transplantation among the transplanted, by Model for End-stage Liver Disease (MELD) score.

The proposed methods are not intended to guide treatment decisions. For example, the fact that we estimate a larger treatment effect for MELD 36–40 than for 30–35 does not imply that a patient with MELD=32 should wait until his/her MELD score increases to ≥ 36 before they agree to be transplanted. The proposed methods cannot generally be used to compare treatment effects, since each treatment effect is averaged differently. For example, the difference in the treatment effect between patients transplanted at MELD 15–17 ([Delta with circumflex](5) = 1.00) and MELD 12–14 ([Delta with circumflex](5) = 0.59) is partly attributable to the former group being transplanted with higher quality donor livers.

There are now many methods available for evaluating a time-dependent treatments. Marginal Structural Models (MSM; e.g., Hernán, Brumback and Robins, 2000; Robins, Hernán and Brumback, 2000) are not well-suited to our set-up due to the potential for treatment to interact with time-varying covariates. Structural Nested Failure Time Models (SNFTMs; e.g., Robins, 1988; Joffe et al., 1998; Keiding et al., 1999; Hernan et al., 2005; Taubman et al., 2009; Vock et al., 2013) are an alternative. In particular, the method of Vock et al. (2013) was motivated by the lung transplant setting. Versions of Sequential Stratification, which involves stratified and inverse weighted Cox regression, have been used to evaluate the benefit of kidney transplantation (Schaubel, Wolfe and Port, 2006) and liver transplantation (Schaubel et al., 2009). An advantage the proposed method over SNFTMs and Sequential Stratification is the avoidance of any parametric assumptions regarding the treatment effect. SNFTMs assume that treatment alters the time scale through a constant, while Sequential Stratification assumes proportionality of the pre- and post-treatment hazard functions. A further advantage of our proposed methods over SNFTMs relates to implementation. Although explicit coding would be required for either approach, the ‘core’ models in our method merely involve Cox regression and, therefore, can be fitted using standard statistical software (SAS, R) after modifying the input data appropriately.

In estimating the ETT, we consider the absence of treatment; i.e., Ti = ∞. In setting where this is found to be too ambitious a goal (e.g., lack of sufficiently long follow-up, in a setting where treatment is inevitable), one could change [Ti = ∞] to [Ti > L] in describing the absence-of-treatment scenario.

An alternative to the measures proposed in (9) and (10) would be to redefine S1(t) to be the population average survival (i.e., averaging over the current treated and untreated experiences), with S0(t) then representing the average population survival in the absence of treatment. Unless strong or unrealistic assumptions were made, the ‘core’ models for this approach would be quite similar to those in the proposed approach, except for the pre-treatment hazard model. The proposed averaging would be preferred in many practical settings (including the liver transplant setting which motivated our current work) since the absence of a treatment benefit among non-recipients is made explicit.

Supplementary Material

Supp Info


This work was supported in part by National Institutes of Health Grant R01-DK070869. The data reported here have been supplied by the Minneapolis Medical Research Foundation (MMRF) as the contractor for the Scientific Registry of Transplant Recipients (SRTR). The interpretation and reporting of these data are the responsibility of the authors and in no way should be seen as an official policy of or interpretation by the SRTR or the U.S. Government. The authors wish to thank the Associate Editor and two Reviewers, whose comments and suggestions led to considerable improvement of the manuscript. They also thank Min Zhang for her many thoughtful suggestions.


Supplementary Materials

Supplementary Materials, referenced in Sections 3, 4 and 5, are available with this paper at the Biometrics website on Wiley Online Library.


  • Andersen PK, Gill RD. Cox’s regression model for counting processes: A large sample study. The Annals of Statistics. 1982;10:1100–1120.
  • Breslow NE. Contribution to the discussion of paper by D.R. Cox. Journal of the Royal Statistical Society Series B. 1972;34:216–217.
  • Chen P, Tsiatis AA. Causal inference on the difference of the restricted mean life between two groups. Biometrics. 2001;57:1030–1038. [PubMed]
  • Cox DR. Regression models and life tables (with Discussion) Journal of the Royal Statistical Society, Series B. 1972;34:187–200.
  • Cox DR. Partial likelihood. Biometrika. 1975;62:269–275.
  • Feng S, Goodrich NP, Bragg-Gresham JL, Dykstra DM, Punch JD, DebRoy MA, Greenstein SM, Merion RM. Characteristics associated with liver graft failure: The concept of a donor risk index. American Journal of Transplantation. 2006;6:783–790. [PubMed]
  • Feuer EJ, Hankey BF, Gaynor JJ, Wesley MN, Baker SG, Meyer JS. Graphical representation of survival curves associated with a binary non-reversible time dependent covariate. Statistics in Medicine. 1992;11:455–474. [PubMed]
  • Gong Q, Schaubel DE. Partly conditional estimation of the effect of a time-dependent factor in the presence of dependent censoring. Biometrics. 2013;69:338–347. [PMC free article] [PubMed]
  • Hernán MA, Cole SR, Margolick J, Cohen M, Robins JM. Structural accelerated failure time models for survival analysis in studies with time-varying treatments. Pharmacoepidemiology and drug safety. 2005;14:477–491. [PubMed]
  • Joffe MM, Hoover DR, Jacobson LP, Kingsley L, Chmiel JS, Visscher BR, Robins JM. Estimating the effect of Zidovudine on Karposi’s sarcoma form observational data using a rank preserving structural failure time model. Statistics in Medicine. 1998;17:1073–1102. [PubMed]
  • Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association. 1958;282:457–481.
  • Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data: 2nd Edition. New York: Wiley; 2002.
  • Keiding N, Filiberti M, Esbjerg S, Robins JM, Jacobsen N. The graft versus leukemia effect after bone marrow transplantation: A case study using structural nested failure time models. Biometrics. 1999;57:23–28. [PubMed]
  • Parast L, Tian L, Cai T. Landmark estimation of survival and treatment effect in a randomized clinical trial. Journal of the American Statistical Association. 2014;109:383–394. [PMC free article] [PubMed]
  • Pearl J. Causality: Models, Reasoning, and Inference. New York: Cambridge; 2009.
  • Robins JM. The control of confounding by intermediate variables. Statistics in Medicine. 1988;8:679–701. [PubMed]
  • Robins JM, Rotnitzky A. Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell N, Dietz K, Farewell B, editors. AIDS Epidemiology - Methodological Issues. Boston: Birkhäuser; 1992. pp. 297–331.
  • Rubin DB. Bayesian inference for causal effect: The role of randomization. Annals of Statistics. 1978;6:34–58.
  • Schaubel DE, Wolfe RA, Port FK. A sequential stratification method for estimating the effect of a time-dependent experimental treatment in observational studies. Biometrics. 2006;62:910–917. [PubMed]
  • Schaubel DE, Wolfe RA, Sima CS, Merion RM. Estimating the effect of a time-dependent treatment by levels of an internal time-dependent covariate. Journal of the American Statistical Association. 2009;104:49–59.
  • Sharma P, Schaubel DE, Goodrich NP, Merion RM. Serum sodium and the survival benefit of liver transplantation. Liver Transplantation. 2015;21:308–313. [PMC free article] [PubMed]
  • Van Houwelingen HC. Dynamic prediction by landmarking in event history analysis. Scandinavian Journal of Statistics. 2007;34:70–85.
  • Van Houwelingen HC, Putter H. Dynamic prediction in clinical survival analysis. New York: CRC Press; 2012.
  • Van Houwelingen HC, Putter H. Comparison of stopped Cox regression with direct methods such as pseudo-values and binomial regression. Lifetime Data Analysis. 2015;21:180–196. [PubMed]
  • Vock DM, Tsiatis AA, Davidian M, Laber EB, Tsuang WM, Finlen Copeland CA, Palmer SM. Assessing the causal effect of organ transplantation on the distribution of residual lifetime. Biometrics. 2013;69:820–829. [PMC free article] [PubMed]
  • Zhang M, Schaubel DE. Estimating differences in restricted mean lifetime using observational data subject to dependent censoring. Biometrics. 2011;67:740–749. [PMC free article] [PubMed]
  • Zhang M, Schaubel DE. Double-robust semiparametric estimator for differences in restricted mean lifetimes in observational studies. Biometrics. 2012;68:999–1009. [PMC free article] [PubMed]
  • Zheng YY, Heagerty PJ. Partly conditional survival models for longitudinal data. Biometrics. 2005;61:379–391. [PubMed]
  • Zucker DM. Restricted mean life with covariates: Modification and extension of a useful survival analysis method. Journal of the American Statistical Association. 1998;93:702–709.