Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC4820319

Formats

Article sections

- Abstract
- 1. Introduction
- 2. Marginal Structural Cox Model Estimators
- 3. Asymptotic Properties
- 4. Simulation
- 5. Additional Considerations
- Supplementary Material
- References

Authors

Related links

Stat Sin. Author manuscript; available in PMC 2017 April 1.

Published in final edited form as:

Stat Sin. 2016 April; 26(2): 509–526.

doi: 10.5705/ss.2014.015PMCID: PMC4820319

NIHMSID: NIHMS730938

Department of Biostatistics, Brown University, ude.nworb.tats@eelh

Department of Biostatistics, University of North Carolina at Chapel Hill, ude.cnu.liame@snegduhm
ude.cnu.soib@iac

Department of Epidemiology, University of North Carolina at Chapel Hill, ude.cnu@eloc

A common objective of biomedical cohort studies is assessing the effect of a time-varying treatment or exposure on a survival time. In the presence of time-varying confounders, marginal structural models fit using inverse probability weighting can be employed to obtain a consistent and asymptotically normal estimator of the causal effect of a time-varying treatment. This article considers estimation of parameters in the semiparametric marginal structural Cox model (MSCM) from a case-cohort study. Case-cohort sampling entails assembling covariate histories only for cases and a random subcohort, which can be cost effective, particularly in large cohort studies with low outcome rates. Following Cole et al. (2012), we consider estimating the causal hazard ratio from a MSCM by maximizing a weighted-pseudo-partial-likelihood. The estimator is shown to be consistent and asymptotically normal under certain regularity conditions. Finite sample performance of the proposed estimator is evaluated in a simulation study. In the corresponding supplementary document, computation of the estimator using standard survival analysis software is presented.

Biomedical cohort studies are often conducted with the goal of assessing the effect of a time-varying treatment (or exposure) on a survival time. In such studies there may exist time-dependent covariates which are simultaneously (i) confounders and (ii) affected by prior treatment. In the presence of time-varying confounders affected by prior treatment, standard methods such as Cox regression modeling with time-varying covariates do not in general yield consistent estimators of the causal effect of treatment (Robins (1986, 1998); Robins and Rotnitzky (1992); Hernán, Brumback, and Robins (2001)). On the other hand, marginal structural models (MSM) fit using inverse probability weighting can be employed to obtain consistent estimators of the causal effect of a time-varying treatment on an outcome of interest, even if there are time-varying confounders affected by prior treatment (Robins (1999); Hernán, Brumback, and Robins (2001)).

Recently, Cole et al. (2012) considered fitting MSCMs via inverse probability weighting in the presence of case-cohort sampling. The case-cohort study design is a cost-efficient approach to estimate treatment effects in large cohorts with low event rates when treatment or covariate information is expensive. The design entails randomly selecting a subcohort from the entire cohort. Covariate information is then collected only from the random subcohort and from individuals that are observed to experience an event (i.e., cases), saving cost and effort relative to obtaining covariate information from the full cohort. In addition to being cost efficient, the case-cohort design enjoys other benefits. For instance, the subcohort can serve as a basis for real-time covariate monitoring during the course of the study. Also, because the subcohort is chosen randomly, survival times to different diseases can be analyzed using the same subcohort (Self and Prentice (1988)).

In the presence of case-cohort sampling, Cole et al. (2012) considered estimating the causal hazard ratio of a MSCM via inverse probability weighting. Simulation studies indicated that their estimator could perform well empirically, but no formal justification for their estimator has been developed to date. Following Cole et al. (2012), we consider estimating the causal hazard ratio of a MSCM via inverse probability weighting in case-cohort studies and establish consistency and asymptotic normality for the estimator that maximizes a weighted-pseudopartial- likelihood (WPPL) under certain regularity conditions.

The approach utilized in this paper entails standard counting process and martingale theory. This formulation readily enables practical implementation of the methods using existing survival analysis software. Framing the problem using counting processes may also be helpful in future work, e.g., in fitting MSCMs to data from nested case-control studies or in the presence of competing risks. In the special situation that the subcohort is the full cohort, the proposed inverse probability weighted estimator is asymptotically equivalent to the estimator in Robins (1999). In this case our proof gives an alternative consistency and normality proof to the one in Robins (1999) that does not utilize the usual counting process framework. We also derive a new variance estimator that arises from the counting process formulation under both full and case-cohort settings. Empirical results presented here indicate that, in certain scenarios, the proposed variance estimator may be preferred to the so-called “robust” variance estimator (Lin and Ying (1993)) employed in Cole et al. (2012).

The outline of the remainder of this paper is as follows. In Section 2, estimators of the hazard ratio of a MSCM in the presence of case-cohort sampling are introduced, including the estimator proposed by Cole et al. (2012). Consistency and asymptotic normality results are presented in Section 3 and a simulation study is in Section 4. Some additional considerations are in Section 5. Regularity conditions and discussion of the conditions are deferred to the Appendix. The supplementary document accompanying this paper includes detailed proofs for Theorems 3.1 – 3.6, a description of how a MCSM can easily be fit via inverse probability weighting for either the full cohort or case-cohort setting using standard survival analysis software, such as R or SAS, additional simulation study results including performance of the baseline cumulative hazard estimator, and a summary of notation.

Consider an observational cohort study where the outcome of interest is a survival time *T*, based on the time from study entry until some particular outcome occurs. We assume *T* is continuous so that there are no tied failure times. During the study, individuals may dropout or discontinue participation so that *T* is right censored. Individuals may or may not elect to receive treatment at various points of time during the study; let *A _{i}*(

Let *ā* denote a possible (static) treatment plan, *ā*= {*a*(*t*) : 0 ≤ *t* ≤ *τ*}, where *τ* is the study duration. Assume *τ* = 1 without loss of generality. Each possible value of *ā* can be interpreted as a prespecified treatment plan. Given a single treatment, examples of *ā* are never treat (*a*(*t*) = 0 for all *t* [0, 1]), treat starting at a prespecified time *t*_{1} < 1 (*a*(*t*) = *I*{*t* ≥ *t*_{1}} where *I*{·} is the usual indicator function), treat from baseline (*a*(*t*) = 1 for all *t* [0, 1]). Let *T _{ā}* be a subject’s potential failure time had (possibly contrary to what was observed in the actual study) the subject been treated according to

$$T={T}_{\overline{a}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\forall \overline{a}\phantom{\rule{0.16667em}{0ex}}\text{such}\phantom{\rule{0.16667em}{0ex}}\text{that}\phantom{\rule{0.16667em}{0ex}}a(t)=A(t)\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\forall t\le T,$$

(2.1)

$${T}_{\overline{a}}\u2aebA(t)\mid \overline{A}({t}^{-}),\overline{L}(t)\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\forall \overline{a},$$

(2.2)

$$\text{pr}[A(t)\mid \overline{A}({t}^{-}),\overline{L}(t)]>0\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\forall t\in [0,1]\phantom{\rule{0.16667em}{0ex}}\text{such}\phantom{\rule{0.16667em}{0ex}}\text{that}\phantom{\rule{0.16667em}{0ex}}\text{pr}[\overline{A}({t}^{-}),\overline{L}(t)]>0.$$

(2.3)

These are referred to as the *causal consistency*, *conditional exchangeability*, and *positivity* assumptions, respectively. Assumption (2.1) states that, in the absence of censoring, the observed failure time *T* equals the potential failure time *T _{ā}* for all treatment plans

Consider the MSCM

$${\lambda}_{{T}_{\overline{a}}}(t)={\lambda}_{0}(t)exp\{{\beta}_{0}^{\prime}f(\overline{a}(t))\}$$

where *λ*_{Tā} (*t*) is the hazard of failure at time *t* if all individuals in the population had followed treatment plan *ā* through time *t*, *λ*_{0}(*t*) is an unspecified baseline hazard function corresponding to the hazard if all individuals had been untreated through time *t*, *f*(*ā*(*t*)) is a specified function of treatment history up to time *t*, and *β*_{0} is an unknown parameter vector. Hereafter, we consider the MSCM

$${\lambda}_{{T}_{\overline{a}}}(t)={\lambda}_{0}(t)r\{{\beta}_{0}^{\prime}a(t)\}$$

(2.4)

where, for notational convenience, we let *r*{·} = exp{·}. For example, if we are interested in the causal effect of current AZT treatment on mortality of HIV-positive homosexual men, then *r*(*β*_{0}) is the ratio of the hazard of death at time *t* had all subjects in the population alive at time *t* been exposed to AZT compared to being unexposed at time *t*. Here (2.4) focuses on the effect of current treatment status only; however, the results presented below are valid for any specified *f*(*ā*(*t*)).

We employ the counting process framework to study the large sample behavior of estimators of *β*_{0}. All processes discussed hereafter refer to observed processes. Let (Ω, , ) be a complete probability space and let {* _{t}* :

If we can correctly model the probability of receiving treatment at time *t* given the past treatment history and covariate history, then we can consistently estimate the weights

$${W}^{T}(t)=\prod _{k\le t}\frac{\text{pr}[A(k)\mid \overline{A}({k}^{-})]}{\text{pr}[A(k)\mid \overline{A}({k}^{-}),\overline{L}(k)]}.$$

(2.5)

These are referred to as *stabilized* inverse-probability-of-treatment-weights (IPTWs). If the numerator probabilities in (2.5) were replaced with 1, then the weights are referred to as *unstabilized* IPTWs. We can consistently estimate the numerator probabilities in (2.5) based on sample proportions because *A*(·) is assumed to have at most a finite number of jumps over the study period. Under (2.2) to (2.3), in the absence of censoring, Robins (1999) showed that a consistent estimator of the unknown parameter *β*_{0} in (2.4) can be obtained by fitting an ordinary time-dependent Cox model with the contribution of subject *i* to the risk set at time *t* weighted by estimates of (2.5). Informally we can think of the analysis via IPTWs as reweighting the observed data set such that it has the same properties as a random sample, with respect to the measured confounders *L*, from a population where (*t*) *A*(*t*)|*Ā*(*t*^{−}) holds at time *t*. The weighted study population is sometimes called a *pseudo-population*.

Dropout may introduce selection bias if it is associated with exposure and the outcome. In the presence of such censoring, we can still obtain a consistent estimator of *β*_{0} by fitting the ordinary Cox model, but weighting a subject alive and uncensored at time *t* by estimates of *W ^{T}*(

$${W}^{C}(t)=\prod _{k\le t}\frac{\text{pr}[C(k)=0\mid \overline{C}({k}^{-})=0,\overline{A}({k}^{-})]}{\text{pr}[C(k)=0\mid \overline{C}({k}^{-})=0,\overline{A}({k}^{-}),\overline{L}(k)]}.$$

(2.6)

This is under the assumption of no unmeasured confounders for censoring, an analogous assumption to (2.3) for censoring, and assuming that we can correctly model the denominator probabilities in (2.6) (Robins (1999)). Here the weighted study population can be thought of as a pseudo-population in which there is no confounding due to measured covariates or selection bias due to censoring. In Section 2.3, we make use of the stabilized weights defined by *W*(*t*) *W ^{T}* (

We now briefly describe estimation of the random weights *W*(*t*), denoted by *Ŵ*(*t*). One can specify a pooled logistic model (treating each person-visit as an observation) to estimate the probability in the denominators of (2.5) and (2.6) at each time (for example, at each visit), then plug in the estimated probabilities (Hernán et al. (2000, 2001)). In the presence of case-cohort sampling, the same model can be used to obtain *Ŵ*(*t*) after weighting subcohort controls by the inverse probability of subcohort selection. Example SAS code to obtain *Ŵ*(*t*) using case-cohort data is provided in Section 2 of the supplement. We assume throughout that the models to estimate denominator probabilities in the IPWs are correctly specified. In practice, investigators may wish to explore the sensitivity of treatment effect estimates to different model specifications for estimating the weights.

We consider two weighted-pseudo-partial-likelihoods (WPPL) that form the basis for obtaining consistent estimators of *β*_{0} in the presence of case-cohort sampling. They are formed by weighting individual contributions to the usual partial likelihoods by *W _{i}*(

The log-WPPL created by individual-time-specific weights at time *t* under the full cohort setting is given by

$$l(\beta ,t;W)=\sum _{i=1}^{n}{\int}_{0}^{t}{W}_{i}(u)\phantom{\rule{0.16667em}{0ex}}\left[{\beta}^{\prime}{A}_{i}(u)-log\sum _{l=1}^{n}{W}_{l}(u){Y}_{l}(u)r\{{\beta}^{\prime}{A}_{l}(u)\}\right]\phantom{\rule{0.16667em}{0ex}}{dN}_{i}(u);$$

(2.7)

it is motivated by the weighted estimating equations proposed by Robins (1993). The log-WPPL in the case-cohort setting is

$$\stackrel{\sim}{l}(\beta ,t;W)=\sum _{i=1}^{n}{\int}_{0}^{t}{W}_{i}(u)\phantom{\rule{0.16667em}{0ex}}\left[{\beta}^{\prime}{A}_{i}(u)-log\sum _{l\in \stackrel{\sim}{\mathcal{C}}}{W}_{l}(u){Y}_{l}(u)r\{{\beta}^{\prime}{A}_{l}(u)\}\right]\phantom{\rule{0.16667em}{0ex}}{dN}_{i}(u).$$

(2.8)

This is slightly different from the log-WPPL proposed by Cole et al. (2012),

$${l}^{\ast}(\beta ,t;W)=\sum _{i=1}^{n}{\int}_{0}^{t}{W}_{i}(u)\phantom{\rule{0.16667em}{0ex}}\left[{\beta}^{\prime}{A}_{i}(u)-log\sum _{l\in \stackrel{\sim}{\mathcal{C}}\cup \{i\}}{W}_{l}(u){Y}_{l}(u)r\{{\beta}^{\prime}{A}_{l}(u)\}\right]\phantom{\rule{0.16667em}{0ex}}{dN}_{i}(u).$$

(2.9)

Here (2.8) and (2.9) differ only in whether a case outside the subcohort contributes to the risk set. If *W _{i}*(

Let , , and *β*^{*} be solutions to *l*(*β,* 1; *Ŵ*)/*β*= 0, (*β,* 1; *Ŵ*)/*β* = 0, and *l*^{*}(*β,* 1; *Ŵ*)/*β* = 0, respectively. Based on arguments as in Andersen and Gill (1982), in Theorems 3.1 - 3.2 we show and are consistent estimators of *β*_{0}. That *β*^{*}_{p} β_{0} can be shown analogously. Asymptotic normality of and will be shown via asymptotic normality of score statistics corresponding to (2.7) and (2.8).

To make use of counting process and martingale theory, under (2.1) each (observed) counting process *N _{i}*(·)(

$${N}_{i}(t)={\int}_{0}^{t}{\lambda}_{i}(u)du+{M}_{i}(t),\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}t\in [0,1],$$

(2.10)

where the intensity process is given by

$${\lambda}_{i}(t)={Y}_{i}(t)r\{{\beta}_{0}^{\prime}{A}_{i}(t)\}{\lambda}_{0}(t),$$

(2.11)

which embodies the same parameters as in (2.4).

In this section, we present the main results: consistency and asymptotic normality of the estimators , , and *β*^{*}. Sufficient conditions for these results are stated in the Appendix, followed by discussion of the conditions. Proofs of the theorems are given in S1 of the supplementary document.

*(Consistency of under full cohort) Under conditions A–F, _{p} β*

*(Consistency of under the case-cohort) Under conditions A–G, _{p} β*

It is straightforward to show that the estimator based on (2.9) converges in probability to the same limit as . An individual case’s contribution to at its failure time (which is weighted by its IPWs) is asymptotically negligible in the sense that IPWs are bounded at all times and weighted subcohort averages are asymptotically stable (see conditions B and G-3 in the Appendix).

*Under conditions A–G, * − *β ^{*} _{p}* 0.

We need some additional notation. Let *c*^{0} = 1, *c*^{1} = *c*, *c*^{2} = *cc*′, and *c _{i}* denote the

(Asymptotic normality of the full cohort MSCM score statistic)

*Let U*(*β*_{0}*, t*) = *l*(*β, t*)/*β*|_{β=β0}
*be the full cohort MSCM score process at time t. Under conditions A–F,*

$${n}^{-1/2}U({\beta}_{0},1){\to}_{d}N(0,{\mathrm{\sum}}_{U})$$

*where* Σ* _{U}* = Σ

$${\mathrm{\Delta}}_{{W}_{(1)},{W}_{(2)}}={\int}_{0}^{1}{\{{e}_{{W}_{(2)}}({\beta}_{0},u)-{e}_{{W}_{(1)}}({\beta}_{0},u)\}}^{\otimes 2}{s}_{{W}_{(2)}}^{(0)}({\beta}_{0},u){\lambda}_{0}(u)du.$$

Theorem 3.4 can be used to show asymptotic normality of the MSCM case-cohort score statistic:

(Asymptotic normality of the case-cohort MSCM score statistic)

*Let Ũ*(*β*_{0}, *t*) = (*β, t*)/*β*|_{β=β0}
*be the case-cohort MSCM Score process at time t. Under conditions A–G,*

$${n}^{-1/2}\stackrel{\sim}{U}({\beta}_{0},1){\to}_{d}N(\mathbf{0},{\mathrm{\sum}}_{\stackrel{\sim}{U}})$$

*where* Σ* _{Ũ}* = Σ

$$\begin{array}{l}{\mathrm{\Delta}}_{\alpha}={\int}_{0}^{1}{\int}_{0}^{1}G({\beta}_{0},x,v){\lambda}_{0}(x){\lambda}_{0}(v)\mathit{dxdv},\\ G({\beta}_{0},x,v)=(1-\alpha ){\alpha}^{-1}[{h}^{(1)}({\beta}_{0},x,v)-{e}_{{W}_{(1)}}({\beta}_{0},x){h}^{(2)}{({\beta}_{0},x,v)}^{\prime}-{h}^{(2)}({\beta}_{0},v,x){e}_{{W}_{(1)}}{({\beta}_{0},v)}^{\prime}+{e}_{{W}_{(1)}}({\beta}_{0},x){e}_{{W}_{(1)}}{({\beta}_{0},v)}^{\prime}{h}^{(0)}({\beta}_{0},x,v)],\\ {h}^{(0)}(\beta ,x,v)={q}^{(0)}(\beta ,x,v)-{s}_{{W}_{(1)}}^{0}(\beta ,x){s}_{{W}_{(1)}}^{(0)}(\beta ,v),\\ {h}^{(1)}(\beta ,x,v)={q}^{(1)}(\beta ,x,v)-{s}_{{W}_{(1)}}^{(1)}(\beta ,x){s}_{{W}_{(1)}}^{(1)}{(\beta ,v)}^{\prime},\phantom{\rule{0.38889em}{0ex}}\mathit{and}\\ {h}^{(2)}(\beta ,x,v)={q}^{(2)}(\beta ,x,v)-{s}_{{W}_{(1)}}^{(0)}(\beta ,x){s}_{{W}_{(1)}}^{(1)}(\beta ,v),\end{array}$$

*with q*^{(}^{j}^{)}(*β*, ·, ·) *defined in condition G-2 in the *Appendix.

(Asymptotic normality of ) Under conditions A-G,

$${n}^{1/2}(\stackrel{\sim}{\beta}-{\beta}_{0}){\to}_{d}N(\mathbf{0},{\mathrm{\sum}}_{{W}_{(1)}}^{-1}{\mathrm{\sum}}_{\stackrel{\sim}{U}}{\mathrm{\sum}}_{{W}_{(1)}}^{-1})$$

*where* Σ* _{Ũ} is given in Theorem 3.5*.

Based on Theorem 3.6, we propose a new variance estimator

$$\widehat{var}(\stackrel{\sim}{\beta})={n}^{-1}{\widehat{\mathrm{\sum}}}_{{W}_{(1)}}^{-1}({\widehat{\mathrm{\sum}}}_{{W}_{(2)}}+{\widehat{\mathrm{\Delta}}}_{{W}_{(1)},{W}_{(2)}}+{\widehat{\mathrm{\Delta}}}_{\alpha}){\widehat{\mathrm{\sum}}}_{{W}_{(1)}}^{-1},$$

(3.1)

where

$$\begin{array}{l}\stackrel{\sim}{\mathcal{I}}(\stackrel{\sim}{\beta},1;W)=-{\partial}^{2}\stackrel{\sim}{l}(\beta ,1;W)/\partial {\beta}^{2}{\mid}_{\beta =\stackrel{\sim}{\beta}},\\ {\widehat{\mathrm{\sum}}}_{{W}_{(1)}}={n}^{-1}\stackrel{\sim}{\mathcal{I}}(\stackrel{\sim}{\beta},1;W=\widehat{W}),\end{array}$$

(3.2)

$${\widehat{\mathrm{\sum}}}_{{W}_{(2)}}={n}^{-1}\stackrel{\sim}{\mathcal{I}}(\stackrel{\sim}{\beta},1;{W}^{2}={\widehat{W}}^{2}),$$

(3.3)

$${\widehat{\mathrm{\Delta}}}_{{W}_{(1)},{W}_{(2)}}={n}^{-1}\sum _{i=1}^{\stackrel{\sim}{n}}{\int}_{0}^{1}{\widehat{W}}_{i}{(u)}^{2}\phantom{\rule{0.16667em}{0ex}}{\left[{\stackrel{\sim}{E}}_{\{{W}_{(2)}={\widehat{W}}^{2}\}}(\stackrel{\sim}{\beta},u),-{\stackrel{\sim}{E}}_{\{{W}_{(1)}=\widehat{W}\}}(\stackrel{\sim}{\beta},u)\right]}^{\otimes 2}{dN}_{i}(u),$$

(3.4)

$${\widehat{\mathrm{\Delta}}}_{\alpha}={n}^{-2}{\int}_{0}^{1}{\int}_{0}^{1}\widehat{G}(\stackrel{\sim}{\beta},x,v){\stackrel{\sim}{S}}_{W(1)}^{(0)}{(\stackrel{\sim}{\beta},x)}^{-1}\times {\stackrel{\sim}{S}}_{W(1)}^{(0)}{(\stackrel{\sim}{\beta},v)}^{-1}d{\overline{N}}_{\widehat{W}}(x)d{\overline{N}}_{\widehat{W}}(v).$$

(3.5)

Here *Ẽ*_{{W(2)=Ŵ2}} and *Ẽ*_{{W(1)=Ŵ}} denote that the IPWs in *Ẽ*_{W(2)} and *Ẽ*_{W(1)} are replaced by *Ŵ*^{2} or *Ŵ*, * _{Ŵ}*(

The proposed variance estimator (3.1) is different from the robust estimator proposed by Lin and Ying (LY, Lin and Ying (1993)) that is used in most MSM analyses. Both (3.1) and the LY estimator are sandwich-type estimators where the “bread” of sandwich (
${\widehat{\mathrm{\sum}}}_{{W}_{(1)}}^{-1}$) is the same. However, the “meat” is different. In particular, the “meat” of the LY estimator is based on (weighted) score residuals whereas the “meat” of (3.1) is given by an estimator of Σ* _{Ũ}* + Δ

A simulation study was conducted to examine the finite sample bias of and *β*^{*}, and the performance of the proposed variance estimator (3.1) as well as the LY variance estimator. Simulations were conducted as in Cole et al. (2012). Briey, potential survival times were generated according to the MSCM given in (2.4), and observed survival times were generated by stochastically generating time varying exposures and confounders for cohorts of size *n* = 1, 000 (see Cole et al. (2012) for details). While they considered only one scenario in which 25% of individuals were cases and a 20% subcohort fraction, *ñn*^{−1} × 100 = 20, we considered 36 scenarios by varying both the subcohort fraction and the event rate from 5 to 30%. Censoring times were generated from uniform distributions with support chosen to achieve the desired event rate. We did not incorporate (2.6) when calculating IPWs because the censoring times were generated independently of the exposure and potential survival times. Following Cole et al. (2012), stabilized weights were used to calculate IPWs. For each scenario 5,000 data sets were generated under the null *β*_{0} = 0 and the alternative *β*_{0} = log(1/2).

Results from the simulation study are summarized in Table 4.1. Only results obtained from six scenarios under the null are presented; results from other scenarios and under the alternative were similar (see S2 of the supplementary document for results under the alternative). For all scenarios, under both the null and alternative, and *β*^{*} were nearly unbiased; that the two estimators performed similarly is not surprising in light of Theorem 3.3. Under the null, the proposed variance estimator was always less biased than the LY variance estimator when the subcohort fraction was only 5%, regardless of the event rate. Similarly, (3.1) was less biased regardless of the subcohort fraction when the event rate was 5%. Both the proposed and the LY variance estimators were approximately unbiased when the subcohort fraction and event rate were both at least 20%. Wald confidence intervals (CIs) using the LY variance estimator tended to undercover when the subcohort fraction was 5%, whereas Wald CIs using (3.1) exhibited coverage close to the nominal level for all scenarios considered. In summary, both and *β ^{*}*, along with the proposed variance estimator and CIs, exhibited good finite sample properties for the scenarios considered, while performance of the LY variance estimator depended on subcohort size and event rate.

In addition to the treatment effect *β*, it may be of interest to estimate the cumulative baseline hazard function. Similar to the cumulative baseline hazard estimator proposed in Self and Prentice (1988), a consistent estimator of
${\mathrm{\Lambda}}_{0}(t)={\int}_{0}^{t}{\lambda}_{0}(u)du$ with case-cohort sampling is

$${\stackrel{\sim}{\mathrm{\Lambda}}}_{\widehat{W}}(\stackrel{\sim}{\beta},t)=\stackrel{\sim}{n}{n}^{-1}{\int}_{0}^{t}{\left[\sum _{i\in \stackrel{\sim}{\mathcal{C}}}{\widehat{W}}_{i}(u){Y}_{i}(u)r\{{\stackrel{\sim}{\beta}}^{\prime}{A}_{i}(u)\}\right]}^{-1}\sum _{i=1}^{n}{\widehat{W}}_{i}(u){dN}_{i}(u).$$

(5.1)

This estimator is equivalent to Self and Prentice’s estimator when *Ŵ _{i}*(

While Cole et al. (2012) extensively discussed limitations of the MSCM with case-cohort sampling, we reiterate some of the issues that are important from a theoretical point of view. First, the asymptotic results presented here correct model specification for both the treatment assignment model (2.5) and the censoring model (2.6). If (2.2) holds, flexible parametric regression models fit adjusting for all measured confounders and their histories should provide a good approximation to denominator probabilities in (2.5) and (2.6). However, (2.2) is an untestable assumption and thus in practice analysts may want to explore sensitivity of treatment effect estimates to departures from this assumption.

Second, the proposed MSCM estimators for the case-cohort study, which are based on Prentice (1986) and Self and Prentice (1988) type estimators, are not expected to be fully efficient. After Prentice (1986) and Self and Prentice (1988), various methods have been proposed as means of improving the efficiency of the hazard ratio estimator in the standard case-cohort Cox regression analysis. Those methods seek to improve efficiency mostly by using weighted partial-likelihood estimation. For example, Barlow (1994), Barlow et al. (1999), Kong, Cai, and Sen (2006), and Kong and Cai (2009) utilize fixed and time-varying inverse-probability-sampling weights to account for subcohort sampling to improve efficiency. Borgan et al. (2000), Kulich and Lin (2004), and Breslow et al. (2009a, b) consider leveraging phase 1 covariates (available from the entire cohort) to improve efficiency. These methods could potentially be extended to develop more efficient MSCM estimators in the presence of case-cohort sampling.

We thank Professor Danyu Lin for expert advice and the Editor and three anonymous reviewers for helpful comments that improved the paper. The authors were partially supported by the Lifespan/Tufts/Brown Center for AIDS Research (CFAR), University of North Carolina CFAR, and National Institutes of Health grants P30 AI042853, P30 AI50410, R01 AI085073, UL1RR025747, R01 ES021900, P01 CA142538, and R01 AI100654.

In what follows, norms are defined by ||*c*^{2}|| = sup_{i}_{,}* _{j}* |

- A(Uniform consistency of estimated weights)$$\underset{\begin{array}{c}i\in \{1,\dots ,n\}\\ t\in [0,1]\end{array}}{sup}\mid {\widehat{W}}_{i}(t)-{W}_{i}(t)\mid \equiv {\mathrm{M}}_{\widehat{W}}{\to}_{p}0.$$
- B(Stability of weights) Individual time-specific weights
*W*(_{i}*t*) and the corresponding estimators*Ŵ*(_{i}*t*) are strictly positive and bounded, with positive real numbers M_{1}and M_{2}such that$$\underset{\begin{array}{c}i\in \{1,\dots ,n\}\\ t\in [0,1]\end{array}}{sup}{W}_{i}(t)\le {\mathrm{M}}_{1},\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{and}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\underset{\begin{array}{c}i\in \{1,\dots ,n\}\\ t\in [0,1]\end{array}}{sup}{\widehat{W}}_{i}(t)\le {\mathrm{M}}_{2}.$$ - C(Finite interval) ${\int}_{0}^{1}{\lambda}_{0}(t)dt<\infty $.
- D(Asymptotic stability)
- There exists a neighborhood
_{0}of*β*_{0}and functions*s*^{(0)},*s*^{(1)}, and*s*^{(2)}defined on_{0}× [0, 1] such that$$\underset{\begin{array}{c}\beta \in {\mathcal{B}}_{0}\\ t\in [0,1]\end{array}}{sup}\Vert {S}^{(j)}(\beta ,t)-{s}^{(j)}(\beta ,t)\Vert \phantom{\rule{0.16667em}{0ex}}{\to}_{p}0,\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}j=0,1,2.$$ - There exists a neighborhood of
*β*_{0},_{0}, and functions ${s}_{{W}_{(k)}}^{(j)}$ defined on × [0, 1] such that$$\underset{\begin{array}{c}\beta \in \mathcal{B}\\ t\in [0,1]\end{array}}{sup}\Vert {S}_{{W}_{(k)}}^{(j)}(\beta ,t)-{s}_{{W}_{(k)}}^{(j)}(\beta ,t)\Vert \phantom{\rule{0.16667em}{0ex}}{\to}_{p}0,\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}j=0,1,2;k=1,2.$$ - ${n}^{1/2}\{{S}_{{W}_{(1)}}^{(0)}({\beta}_{0},t)-{s}_{{W}_{(1)}}^{0}({\beta}_{0},t)\}{\to}_{d}N(0,{\sigma}^{2}(t))$ uniformly in
*t*[0, 1], for some*σ*^{2}(*t*).

- E(Lindeberg condition) For any
*ε*> 0,*j*= 1, …,*p*$${n}^{-1}{\int}_{0}^{1}\sum _{i=1}^{n}{W}_{i}{(u)}^{2}{[{A}_{ij}(u)-{E}_{{W}_{(1)}}{({\beta}_{0},u)}_{j}]}^{2}{Y}_{i}(u)r\{{\beta}_{0}^{\prime}{A}_{i}(u)\}\times I\{{n}^{-1/2}{W}_{i}(u)\mid {A}_{ij}(u)-{E}_{{W}_{(1)}}{({\beta}_{0},u)}_{j}\mid \phantom{\rule{0.16667em}{0ex}}>\epsilon \}{\lambda}_{0}(u)du{\to}_{p}0$$ - F(Asymptotic regularity conditions)
*s*^{(}^{j}^{)}(*β*,*t*) and ${s}_{{W}_{(k)}}^{(j)}(\beta ,t)$ are continuous functions of*β*uniformly in*t*[0, 1] that are bounded on × [0, 1] for*j*= 0, 1, 2 and*k*= 1, 2. For all (*β*,*t*) × [0, 1] and*m*= 0, 1,$${s}^{(m+1)}(\beta ,t)=\frac{\partial {s}^{(m)}(\beta ,t)}{\partial \beta},\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{s}_{{W}_{(k)}}^{(m+1)}(\beta ,t)=\frac{\partial {s}_{{W}_{(k)}}^{(m)}(\beta ,t)}{\partial \beta}.$$Here*s*^{(0)}and ${s}_{{W}_{(k)}}^{(0)}$ are bounded away from zero and the matrices Σ and Σ_{W(k)}are positive definite. - G-1(Stability of subcohort average)
- (Nontrivial subcohort)
*ñn*^{−1}→for some_{p}α*α*(0, 1]. - (Asymptotic normality of subcohort averages at
*β*_{0}) For any*ε*> 0$$\begin{array}{l}\underset{t\in [0,1]}{sup}{n}^{-1}\sum _{i=1}^{n}{W}_{i}{(t)}^{2}{Y}_{i}(t)r{\{{\beta}_{0}^{\prime}{A}_{i}(t)\}}^{2}I\{{n}^{-1/2}{W}_{i}(t){Y}_{i}(t)r\{{\beta}_{0}^{\prime}{A}_{i}(t)\}>\epsilon \}{\to}_{p}0,\\ \underset{t\in [0,1]}{sup}{n}^{-1}\sum _{i=1}^{n}{W}_{i}{(t)}^{2}{Y}_{i}(t){\Vert {r}^{(1)}\{{\beta}_{0}^{\prime}{A}_{i}(t)\}\Vert}^{2}I\{{n}^{-1/2}{W}_{i}(t){Y}_{i}(t)\Vert {r}^{(1)}({\beta}_{0}^{\prime}{A}_{i}(t))\Vert \phantom{\rule{0.16667em}{0ex}}>\epsilon \}{\to}_{p}0,\end{array}$$and the sequences of distributions of*n*^{1}^{/}^{2}{*Ẽ*(*β*_{0},*t*) −*E*(*β*_{0},*t*)} are tight on the product space of cadlag functions equipped with the product Skorohod topology, and so are*n*^{1}^{/}^{2}{*Ẽ*_{W(1)}(*β*_{0},*t*) −*E*_{W(1)}(*β*_{0},*t*)}.

- G-2(Asymptotic stability and regularity of covariance function) There exists a neighborhood of
*β*_{0}and functions*q*^{(}^{j}^{)}(*β*,*t*,*u*) for*j*= 0, 1, 2, defined on × [0, 1]^{2}such that*q*^{(}^{j}^{)}(*β*,*t*,*u*) are continuous functions of*β*uniformly in (*t*,*u*) [0, 1]^{2}, the*q*^{(}^{j}^{)}are bounded on × [0, 1]^{2}and$$\begin{array}{l}\underset{\begin{array}{c}\beta \in \mathcal{B}\\ (t,u)\in {[0,1]}^{2}\end{array}}{sup}\Vert {Q}^{(j)}(\beta ,t,u)-{q}^{(j)}(\beta ,t,u)\Vert \phantom{\rule{0.16667em}{0ex}}{\to}_{p}0,\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}j=0,1,2,\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{where}\\ {Q}^{(0)}(\beta ,t,u)={n}^{-1}\sum _{i=1}^{n}{W}_{i}(t){Y}_{i}(t)r\{{\beta}_{0}^{\prime}{A}_{i}(t)\}{W}_{i}(u){Y}_{i}(u)r\{{\beta}_{0}^{\prime}{A}_{i}(u)\},\\ {Q}^{(1)}(\beta ,t,u)={n}^{-1}\sum _{i=1}^{n}{W}_{i}(t){Y}_{i}(t){r}^{(1)}\{{\beta}_{0}^{\prime}{A}_{i}(t)\}{W}_{i}(u){Y}_{i}(u){r}^{(1)}{\{{\beta}_{0}^{\prime}{A}_{i}(u)\}}^{\prime}\\ {Q}^{(2)}(\beta ,t,u)={n}^{-1}\sum _{i=1}^{n}{W}_{i}(t){Y}_{i}(t)r\{{\beta}_{0}^{\prime}{A}_{i}(t)\}{W}_{i}(u){Y}_{i}(u){r}^{(1)}\{{\beta}_{0}^{\prime}{A}_{i}(u)\}.\end{array}$$Moreover, sup_{n}_{≥1}[*Q*^{(}^{j}^{)}(*β*,*t*,*u*)] for*j*= 0, 1, 2 are bounded sequences where denote expectation. - G-3(Asymptotic stability of subcohort averages) If
^{(}^{j}^{)}(*β*,*t*,*u*) are covariance functions based on subcohort members*i*= 1, …,*ñ*, then$$\underset{\begin{array}{c}\beta \in \mathcal{B}\\ t\in [0,1]\end{array}}{sup}\Vert {\stackrel{\sim}{S}}_{{W}_{(k)}}^{(0)}(\beta ,t)-{s}_{{W}_{(k)}}^{(0)}(\beta ,t)\Vert \phantom{\rule{0.16667em}{0ex}}{\to}_{p}0\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}k=1,2,$$the subcohort average converges to the limit of the full cohort in probability uniformly in*β*and*t*[0, 1], and$$\underset{\begin{array}{c}\beta \in \mathcal{B}\\ (t,u)\in {[0,1]}^{2}\end{array}}{sup}\Vert {\stackrel{\sim}{Q}}^{(j)}(\beta ,t,u)-{q}^{(j)}(\beta ,t,u)\Vert \phantom{\rule{0.16667em}{0ex}}{\to}_{p}0,\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}j=0,1,2,$$the subcohort covariance functions converge in probability uniformly in*β*and*t*[0, 1] to the full cohort covariance functions. In addition,$${n}^{1/2}\{{\stackrel{\sim}{S}}_{{W}_{(1)}}^{(0)}({\beta}_{0},t)-{s}_{{W}_{(1)}}^{(0)}({\beta}_{0},t)\}{\to}_{d}N(0,{\stackrel{\sim}{\sigma}}^{2}(t)),\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{uniformly}\phantom{\rule{0.16667em}{0ex}}\text{in}\phantom{\rule{0.16667em}{0ex}}t\in [0,1]$$for some^{2}(*t*).

All conditions except A and B are extensions of the regularity conditions from Self and Prentice (1988) by incorporating IPWs. Conditions A and B are required for IPWs to ensure asymptotic properties of the proposed estimators.

*Ŵ*(·) and *W*(·) are assumed to be predictable with respect to the filtration * _{t}* because weights are determined by predictable processes:

All weights discussed in Section 2.2 satisfy conditions A and B in general, except the unstabilized weights. Unstabilized weights satisfy conditions A and B only when the assumption of finite support of *A*(·) and *C*(·) is met. The IPWs are strictly positive by (2.3).

Condition C is the same as in Self and Prentice (1988). When the IPWs are equal to 1,
${S}_{{W}_{(k)}}^{(j)}$ in D is *S*^{(}^{j}^{)} in Self and Prentice (1988) for all *j* = 0, 1, 2 and *k* = 1, 2. Then Σ_{W(k)} defined in Section 3 and Δ* _{α}* defined in Theorem 3.5 are Σ and Δ defined in Self and Prentice (1988), respectively. Hence, Σ

If the treatment process *A*(·) is bounded (as assumed here) and B and F are satisfied, then E holds trivially.

*e*_{W(k)} can be interpreted as the weighted average of a treatment function with the weights taking an inverse-probability weighted exponential form. The positive definite condition on Σ in Andersen and Gill (1982) can easily be extended to the Σ_{W(k)} assuming *W*(*t*) are bounded away from zero on *t* [0, 1].

This is the same as condition G in Self and Prentice (1988), incorporating individual-specific time-varying weights *W _{i}*(

The online supplementary material contains proofs corresponding to consistency and asymptotic normality of MSCM parameter estimators along with consistency of the cumulative baseline hazard estimator proposed in Section 5.1, implementation of the method using standard survival analysis software such as R or SAS, and additional simulation study results including performance of the proposed baseline cumulative hazard estimator. A summary of notation introduced in the main text and the supplement is presented.

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

- Andersen PK, Gill RD. Cox’s regression model for counting processes: A large sample study. The Annals of Statistics. 1982;10:1100–1120.
- Barlow WE. Robust variance estimation for the case-cohort design. Biometrics. 1994;50:1064–1072. [PubMed]
- Barlow WE, Ichikawa L, Rosner D, Izumi S. Analysis of case-cohort designs - A prospective cohort study. Journal of Clinical Epidemiology. 1999;52:1165–1172. [PubMed]
- Borgan O, Langholz B, Samuelsen SO, Goldstein L, Pogoda J. Exposure stratified case-cohort designs. Lifetime Data Analysis. 2000;6:39–58. [PubMed]
- Breslow NE, Lumley T, Ballantyne CM, Chambless LE, Kulich M. Improved Horvitz-Thompson estimation of model parameters from two-phase stratified samples: applications in epidemiology. Stat Biosc. 2009a;1:32–49. [PMC free article] [PubMed]
- Breslow NE, Lumley T, Ballantyne CM, Chambless LE, Kulich M. Using the whole cohort in the analysis of case-cohort data. American Journal of Epidemiology. 2009b;169:1398–1405. [PMC free article] [PubMed]
- Cole SR, Hernán MA, Robins JM, Anastos K, Chmiel J, Detels R, Ervin C, Feldman J, Greenblatt R, Kingsley L, Lai S, Young M, Cohen M, Munoz A. Effect of highly active antiretroviral therapy on time to acquired immunodeficiency syndrome or death using marginal structural models. American Journal of Epidemiology. 2003;158:687–694. [PubMed]
- Cole SR, Hernán MA. Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology. 2008;168:656–664. [PMC free article] [PubMed]
- Cole SR, Hudgens MG, Tien PC, Anastos K, Kingsley L, Chmiel JS, Jacobson LP. Marginal structural models for case-cohort study designs to estimate the association of antiretroviral therapy initiation with incident AIDS or death. American Journal of Epidemiology. 2012;175:381–390. erratum: 175 (7), 732. [PMC free article] [PubMed]
- Hernán MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology. 2000;11:561–570. [PubMed]
- Hernán MA, Brumback B, Robins JM. Marginal structural models to estimate the joint causal effect of nonrandomized treatments. Journal of the American Statistical Association. 2001;96:440–448.
- Kulich M, Lin D. Improving the efficiency of relative-risk estimation in case-cohort studies. Journal of the American Statistical Association. 2004;99:832–844.
- Kong L, Cai J, Sen PK. Asymptotic results for fitting semiparametric transformation models to failure time data from case-cohort studies. Statistica Sinica. 2006;16:135–151.
- Kong L, Cai J. Case-cohort analysis with accelerated failure time model. Biometrics. 2009;65:135–142. [PMC free article] [PubMed]
- Lin DY, Ying Z. Cox regression with incomplete covariate measurements. Journal of American Statistical Association. 1993;88:1341–1349.
- Prentice RL. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika. 1986;73:1–11.
- Robins JM. A new approach to causal inference in mortality studies with a sustained exposure period - Application to control of the healthy worker survivor effect. Mathematical Modelling. 1986;7:1393–1512.
- Robins JM, Rotnitzky A. Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell NP, Dietz K, Farewell VT, editors. AIDS Epidemiology - Methodological Issues. Birkhäuser; Boston: 1992. pp. 297–331.
- Robins JM. Information recovery and bias adjustment in proportional hazards regression analysis of randomized trials using surrogate markers. Proceedings of the Biopharmaceutical Section, American Statistical Associtaion. 1993:24–33.
- Robins JM. Correction for non-compliance in equivalence trials. Statistics in Medicine. 1998;17:269–302. [PubMed]
- Robins JM. Marginal structural models versus structural nested models as tools for causal inference. In: Halloran ME, Berry DA, editors. Statistical Models Epidemiology: The Environment and Clinical Trials. Springer-Verlag; NY: 1999. pp. 95–134.
- Robins JM, Rotnitzky A, Scharfstein D. Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In: Halloran M, Berry D, editors. Statistical Models in Epidemiology: The Environment and Clinical Trials. Springer-Verlag; NY: 1999. pp. 1–92.
- Self SG, Prentice RL. Asymptotic distribution theory and efficiency results for case-cohort studies. The Annals of Statistics. 1988;16:64–81.
- Therneau TM, Li H. Computing the Cox model for case cohort designs. Lifetime Data Analysis. 1999;5:99–112. [PubMed]
- Therneau T. A Package for Survival Analysis in S. R package version 2.37-4. 2013 http://CRAN.R-project.org/package=survival.
- VanderWeele TJ. Concerning the consistency assumption in causal inference. Epidemiology. 2009;20:880–883. [PubMed]
- Xiao Y, Abrahamowicz M, Moodie EEM. Accuracy of conventional and marginal structural Cox model estimators: A simulation study. The International Journal of Biostatistics. 2010;6:Article 13. [PubMed]

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |