Home | About | Journals | Submit | Contact Us | Français |

**|**Biostatistics**|**PMC3202304

Formats

Article sections

- Abstract
- 1. INTRODUCTION
- 2. CHANGE-POINT SHARED PARAMETER MODELS
- 3. LIKELIHOOD-BASED ESTIMATION METHOD
- 4. APPLICATION TO THE ENRICHD STUDY
- 5. SIMULATION STUDY
- 6. DISCUSSION
- SUPPLEMENTARY MATERIAL
- FUNDING
- Supplementary Material
- References

Authors

Related links

Biostatistics. 2011 October; 12(4): 737–749.

Published online 2011 January 24. doi: 10.1093/biostatistics/kxq084

PMCID: PMC3202304

Office of Biostatistics Research, National Heart, Lung and Blood Institute, Bethesda, MD 20892, USA ; Email: vog.hin.iblhn@cuw

Received 2010 February 27; Revised 2010 December 17; Accepted 2010 December 18.

Copyright Published by Oxford University Press 2011.

This article has been cited by other articles in PMC.

We investigate a change-point approach for modeling and estimating the regression effects caused by a concomitant intervention in a longitudinal study. Since a concomitant intervention is often introduced when a patient's health status exhibits undesirable trends, statistical models without properly incorporating the intervention and its starting time may lead to biased estimates of the intervention effects. We propose a shared parameter change-point model to evaluate the pre- and postintervention time trends of the response and develop a likelihood-based method for estimating the intervention effects and other parameters. Application and statistical properties of our method are demonstrated through a longitudinal clinical trial in depression and heart disease and a simulation study.

Change-point models are often used for evaluating the intervention effects in longitudinal clinical trials or epidemiological studies. By treating the starting time of an intervention as a change-point time, the longitudinal trends of the response variable in these models are described using different regression models before and after the intervention. Estimation and inferences can be obtained using well-known procedures, such as the maximum likelihood estimation (MLE), restricted maximum likelihood estimation (REMLE), and generalized estimation equations (e.g. Verbeke and Molenberghs, 2000; Diggle *and others*, 2002). This approach works well when the intervention is randomly assigned. When an intervention is not randomly assigned, treating the intervention as a time-dependent covariate in a change-point model may lead to model misspecification and biased estimation of the intervention effects (Wall *and others*, 2005).

Concomitant interventions are generally not randomly assigned in biomedical studies. The Enhancing Recovery in Coronary Heart Disease (ENRICHD) Study is a typical example which involves a concomitant intervention in addition to the randomly assigned treatment regimens. This is a randomized clinical trial for evaluating the efficacy of a 6-month psychosocial treatment (PT) versus the usual cardiovascular care (UC) on survival, hospitalization, myocardial infarction, and depression severity in 2481 patients with depression and/or low perceived social support after acute myocardial infarction (ENRICHD, 2001, ENRICHD, 2003). Depression severity was measured by Hamilton Rating Scale for Depression (HRSD) at the baseline and 6-month visits and Beck Depression Inventory (BDI) at regularly scheduled visits during the treatment and follow-up periods, where higher HRSD and BDI scores indicate worsened depression. BDI scores for patients in the PT arm were repeatedly measured at weekly visits during the treatment and 4 yearly follow-up visits, while BDI scores for patients in the UC arm were only measured at baseline, the 6-month visit and the yearly follow-up visits. Patients in both treatment arms were eligible for pharmacotherapy with antidepressants as a concomitant intervention if they had high baseline BDI scores or nondecreasing BDI trends after enrollment. In addition, antidepressants were prescribed at the requests of the patients or their primary care physicians. Taylor *and others* (2005) investigated the effects of antidepressants on cardiovascular morbidity and mortality among 1834 depressed ENRICHD patients and found that pharmacotherapy improved survival for this patient population. Their results, however, did not address the question of whether pharmacotherapy was beneficial for lowering patients' depression severity. In an analysis using a subsample of 91 ENRICHD patients in the PT arm who have received pharmacotherapy within the treatment period, Wu *and others* (2008) showed that the naïve mixed-effects models were misspecified when a concomitant intervention was present and proposed a varying-coefficient mixed-effects model to reduce the potential bias associated with the estimated pharmacotherapy effects. A main drawback of Wu *and others* (2008) is the potential loss of information because their regression model cannot be applied to patients who have already received a concomitant intervention at baseline or have not received any concomitant intervention during the study.

We propose in this paper a comprehensive regression method for evaluating the concomitant intervention effects which is capable to incorporate information from all the patients in a longitudinal study. Based on the framework of shared parameter models (Follmann and Wu, 1995), we describe the covariate effects on the response variable through a change-point mixed-effects model and incorporate the random coefficients and the intervention starting time (change-point time) through a series of joint distributions. Patients who have received a concomitant intervention at baseline or have not received any concomitant intervention during the study are treated as censored. A likelihood-based method is established for statistical estimation and inferences and its computation is implemented through a 2-stage iteration procedure. Our application to the ENRICHD pharmacotherapy data and simulation results suggest that the proposed method leads to adequate estimates when a concomitant intervention is present, while the naïve mixed-effects model is likely misspecified under such situations.

We describe our shared parameter model and its interpretations in Section 2, develop the likelihood-based estimation and inference procedures in Section 3, and present the application to the ENRICHD pharmacotherapy data and the simulation results in Sections 4 and 5, respectively. In Section 6, we discuss some potential extensions of our method. Further theoretical and technical results can be found in the supplementary material available at *Biostatistics* online.

Let _{0} and _{1} be the beginning and ending times of a study, and *n* the total number of randomly selected subjects. The *i*th subject has *n*_{i} visits and observations (*T*_{ij},*Y*_{ij},**X**_{i}) at the *j*th visit, where *T*_{ij}, the study time, is the time elapsed from the beginning of the study to the *j*th visit, **X**_{i} is a time-invariant covariate vector, and *Y*_{ij} is the real-valued outcome variable. For simplicity, we assume in this paper that the study involves only one concomitant intervention, and each subject can change from “without intervention” to “intervention” only once during the study. Extension to multiple concomitant interventions and multiple intervention changes involves more notation and computational complexity, hence, is not discussed in the current manuscript. When the context is clear, we may simply refer a “concomitant intervention” as an intervention. Define *S*_{i} to be the intervention starting time or change-point time, and *λ*_{ij} = 0 or 1, if *T*_{ij} ≤ *S*_{i} or *T*_{ij} > *S*_{i}, respectively, to be the intervention indicator for the *i*th subject. Since not every subject changes from without intervention to intervention during the study, the *i*th subject's change-point time is observed if *T*_{i1} ≤ *S*_{i} ≤ *T*_{ini}. If *S*_{i} < *T*_{i1} or *S*_{i} > *T*_{ini}, the subject's change-point time is left or right censored, respectively. The indicator variable for censoring *δ*_{i}^{(c)} is defined by *δ*_{i}^{(c)} = 0 if *T*_{i1} ≤ *S*_{i} ≤ *T*_{ini}, 1 if *S*_{i} > *T*_{ini}, and 2 if *S*_{i} < *T*_{i1}. The observed change-point times are {_{i}^{(c)} = (*S*_{i}^{(c)},*δ*_{i}^{(c)});*i* = 1,…,*n*}, where *S*_{i}^{(c)} = *S*_{i} if *δ*_{i}^{(c)} = 0, *T*_{ini} if *δ*_{i}^{(c)} = 1, and *T*_{i1} if *δ*_{i}^{(c)} = 2.

The change-point model assumes that *Y*_{ij} follows different trajectories before and after an intervention. The *i*th subject's trajectory before the intervention is *μ*_{0}(*T*_{ij},**X**_{i};**a**_{i}), which is parameterized by the subject-specific parameter **a**_{i}^{T} = (*a*_{i1},…,*a*_{id0})^{T}, *d*_{0} ≥ 1. The change of the trajectory after the intervention is *μ*_{1}(*T*_{ij},**X**_{i},*R*_{ij};**b**_{i}), which is parameterized by the subject-specific parameter **b**_{i} = (*b*_{i1},…,*b*_{id1})^{T}, *d*_{1} ≥ 1, and may depend on the “intervention duration time” *R*_{ij} = *T*_{ij} − *S*_{i} as well as *T*_{ij} and **X**_{i}. Following the usual mixed-effects model framework (Davidian and Giltinan, 1995; Verbeke and Molenberghs, 2000; Diggle *and others*, 2002), a naïve mixed-effects model for evaluating the pre- and postintervention trajectories is

(2.1)

where (**a**_{i}^{T},**b**_{i}^{T})^{T} has unknown mean (*α*^{T},*β*^{T})^{T} and covariance matrix Σ, *ϵ*_{ij} are mean-zero random errors with cov(*ϵ*_{ij1},*ϵ*_{ij2}) = *σ*_{ij1j2}, and *ϵ*_{i1j1} and *ϵ*_{i2j2} are independent if *i*_{1}≠*i*_{2}. The joint distribution of **a**_{i} and **b**_{i} is usually assumed to be multivariate Gaussian {(*α*^{T},*β*^{T})^{T},Σ}. A positive (or negative) value for *μ*_{1}(*T*_{ij},**X**_{i},*R*_{ij};**b**_{i}) would suggest that the intervention tends to increase (or decrease) the mean of *Y*_{ij} given (*T*_{ij},**X**_{i},*R*_{ij}). Wu *and others* (2008) showed that (2.1) could be a misspecified model even if *μ*_{0}(·;**a**_{i}), *μ*_{1}(·;**b**_{i}), *ϵ*_{ij}, and *G*(·) were correctly specified because the “self-selectiveness” of the intervention was ignored.

If the change-point time *S*_{i} depends on the preintervention trend of *Y*_{ij}, a natural extension of (2.1) is

(2.2)

where *F*_{a}(·) is the cumulative distribution function (CDF) of **a**_{i}, *F*_{s}(·|**a**_{i}) and *F*_{b}(·|**a**_{i}) are the conditional CDFs of *S*_{i} and **b**_{i}, respectively, given **a**_{i}, and **b**_{i} and *S*_{i} are independent given **a**_{i}. In contrast to the varying-coefficient model of Wu *and others* (2008), where the conditional means of **a**_{i} and **b**_{i} given *S*_{i} are used, (2.2) incorporates *S*_{i} through *F*_{s}(·|**a**_{i}), which allows *S*_{i} to be left or right censored. We assume for simplicity in (2.2) that **b**_{i} does not depend on *S*_{i}, although further generalizations may allow the distribution of **b**_{i} to depend on (*S*_{i},**a**_{i}). In practice, further specifications of *F*_{a}(·), *F*_{s}(·|**a**_{i}), and *F*_{b}(·|**a**_{i}) are usually needed to balance the computational feasibility and flexibility of the model.

A useful and mathematically tractable specification for (2.2) is to assume that *S*_{i} and **b**_{i} depend on **a**_{i} only through their conditional means. If the conditional means of *S*_{i} and **b**_{i} are linear models of **a**_{i}, a linear shared parameter model for (2.2) is

(2.3)

where *α* = (*α*_{1},…,*α*_{d0})^{T}, *α*_{d}*R*, **β** = (*β*_{1}^{T},…,*β*_{d1}^{T})^{T}, *β*_{l} = (*β*_{l0},…,*β*_{ld0})^{T}, *β*_{ld}*R*, *γ* = (*γ*_{0},…,*γ*_{d0})^{T}, and {*ϵ*_{i} = (*ϵ*_{i1},…,*ϵ*_{ini})^{T},*e*_{i}^{(a)},*e*_{i}^{(b)},*e*_{i}^{(s)}} are independent mean-zero random errors with covariance matrices {**V**_{y},**V**_{a},**V**_{b},*σ*_{s}^{2}}, respectively. Here, the unknown parameters are the mean components *θ* = (*α*^{T},*β*_{1}^{T},…,*β*_{d1}^{T},*γ*^{T})^{T} and the covariance structures **V** = {**V**_{y},**V**_{a},**V**_{b},*σ*_{s}^{2}}.

When the relationship between *S*_{i} and **a**_{i} are unknown, a nonparametric model is *S*_{i} = *μ*^{(s)}(**a**_{i}) + *ϵ*_{i}^{(s)}, where *μ*^{(s)}(**a**_{i}) = *E*(*S*_{i}|**a**_{i}) is a smooth function of **a**_{i}. Since unstructured estimation of *μ*^{(s)}(**a**_{i}) could be difficult when **a**_{i} is a high-dimensional vector, a simple additive model is to replace the relationship between *S*_{i} and **a**_{i} in (2.3) with

(2.4)

where *μ*_{d}^{(s)}(*a*_{id}) are smooth functions of *a*_{id}. Further generalizations of (2.4) are theoretically possible but at the expense of computational complexity.

The parameters in (2.2) can be estimated by a likelihood approach if the distributions can be specified. Let **Y**_{i} = (*Y*_{i1},…,*Y*_{ini})^{T}, **T**_{i} = (*T*_{i1},…,*T*_{ini})^{T}, _{i} = (**T**_{i},**X**_{i}), and *f*_{y}(·), *f*_{b}(·), *f*_{s}(·), and *f*_{a}(·) be the densities of *Y*_{ij}, **b**_{i}, *S*_{i}, and **a**_{i}. The joint density of (**b**_{i},*S*_{i},**a**_{i}) is *f*(**b**_{i},*S*_{i},**a**_{i}) = *f*_{b}(**b**_{i}|**a**_{i})*f*_{s}(*S*_{i}|**a**_{i})*f*_{a}(**a**_{i}). Integrating over **a**_{i} and **b**_{i}, the conditional density of (**Y**_{i},*S*_{i}) given _{i} = (**T**_{i},**X**_{i}) is

The conditional density *f*_{s}(*S*_{i}^{(c)},*δ*_{i}^{(c)}|**a**_{i}) of _{i}^{(c)} = (*S*_{i}^{(c)},*δ*_{i}^{(c)}) given **a**_{i} is *f*_{s}(*S*_{i}|**a**_{i}) if *δ*_{i}^{(c)} = 0, 1 − *F*_{s}(*T*_{ini}|**a**_{i}) if *δ*_{i}^{(c)} = 1, and *F*_{s}(*T*_{i1}|**a**_{i}) if *δ*_{i}^{(c)} = 2. The log-likelihood function for (**Y**_{i},_{i}^{(c)}) conditioning on _{i}, *i* = 1,…,*n*, is

(3.1)

where *f*_{(y,s)}(·|·) is given above, *f*_{(y,1)}(·|_{i},**a**_{i}) and *f*_{(y,2)}(·|_{i},**a**_{i},**b**_{i}) are the densities of **Y**_{i} given {_{i},**a**_{i},*δ*_{i}^{(c)} = 1} and {_{i},**a**_{i},**b**_{i},*δ*_{i}^{(c)} = 2}, respectively, *f*_{(y,1)}(**Y**_{i}|_{i}) = ∫*f*_{(y,1)}(**Y**_{i}|_{i},**a**_{i}){1 − *F*_{s}(*T*_{ini}|**a**_{i})}*f*_{a}(**a**_{i})d**a**_{i} and

When *f*_{(y,s)}(·|·) belongs to a parametric family {*f*_{(y,s)}(·;|_{i}); = (*θ*,**V**)}, we can estimate = (*θ*,**V**) by maximizing the log-likelihood (3.1).

Estimation for the additive model (2.4) can be achieved by an approximate likelihood method. Under some mild smoothness conditions on *μ*_{d}^{(s)}(·) (Huang *and others*, 2004), we can approximate *μ*_{d}^{(s)}(·) by the following B-spline expansion:

(3.2)

where {*B*_{p}^{(d)}(·);1 ≤ *p* ≤ *P*_{d}} is a spline basis, **B**^{(d)}(*a*_{id}) = (*B*_{1}^{(d)}(*a*_{id}),…,*B*_{Pd}^{(d)}(*a*_{id}))^{T}, and *γ*^{(d)} = (*γ*_{1}^{(d)},…,*γ*_{Pd}^{(d)})^{T} is a set of real-valued coefficients. It follows from (3.2) that *S*_{i}≈∑_{d = 0}^{d0}(*γ*^{(d)})^{T}**B**^{(d)}(*a*_{id}) + *ϵ*_{i}^{(s)}. If we substitute ∑_{d = 0}^{d0}*μ*_{d}^{(s)}(*a*_{id}) of (2.4) with ∑_{d = 0}^{d0}(*γ*^{(d)})^{T}**B**^{(d)}(*a*_{id}), the parameters are *θ* = (*α*^{T},*β*_{1}^{T},…,*β*_{d1}^{T},*γ*^{T})^{T}, *γ* = ((*γ*^{(0)})^{T},…,(*γ*^{(d0)})^{T})^{T}, and **V** = {**V**_{y},**V**_{a},**V**_{b},*σ*_{s}^{2}}. Let *f*_{s}^{*}(·;*γ*,*σ*_{s}|**a**_{i}) be the conditional density of ∑_{d = 0}^{d0}{(*γ*^{(d)})^{T}**B**^{(d)}(*a*_{id})} + *ϵ*_{i}^{(s)} given **a**_{i}. If the distributions of {*ϵ*_{ij},*e*_{i}^{(a)},*e*_{i}^{(b)},*e*_{i}^{(s)}} are parameterized by **V**, we can approximate *f*_{s}(*S*_{i}|**a**_{i}) by *f*_{s}^{*}(*S*_{i};*γ*,*σ*_{s}|**a**_{i}) and obtain the following approximate log-likelihood function for (**Y**_{i},_{i}^{(c)}) given _{i},

(3.3)

where = (*θ*,**V**), *f*_{(y,s)}^{*}(·|_{i}), *f*_{(y,k)}^{*}(·|_{i}), *k* = 1,2, are given in (3.1) with *f*_{s}(*S*_{i}|**a**_{i}) replaced by *f*_{s}^{*}(*S*_{i};*γ*,*σ*_{s}|**a**_{i}). If *L*_{d}^{*}() satisfies the regularity conditions for MLEs, the approximate MLE satisfy .

Since the computation of maximum likelihood (ML) or approximate ML estimators requires a nonlinear maximization over the parameter spaces of *θ* and **V**, a global maximization over *θ* and **V** simultaneously could be computationally unfeasible in practice. We propose a 2-stage procedure which combines REMLE with the Newton–Raphson algorithm:

Stage 1. Assume that {*ϵ*_{ij},**a**_{i},**b**_{i},*S*_{i}} of (2.3) or (2.4) are independent random variables with covariance matrices **V** = {**V**_{y},**V**_{a},**V**_{b},*σ*_{s}^{2}}, that is, the naïve mixed-effects model (2.1) holds. Compute of **V** using the REMLE procedure.

Stage 2. Substitute **V** with and maximize with respect to *θ* using the Newton–Raphson procedure. The maximizer is the approximate ML estimator for *θ*.

By (3.1) and the expressions of *f*_{(y,s)}(**Y**_{i},*S*_{i}|_{i}) and *f*_{(y,k)}(**Y**_{i}|_{i}) for *k* = 1,2, the Newton–Raphson algorithm for maximizing at Stage 2 involves multidimensional integrations over the functions of **a**_{i}, **b**_{i}, and *S*_{i} with respect to the joint distributions of **a**_{i} and **b**_{i}. Monte Carlo simulations are used to compute all the necessary quantities, including the log-likelihoods, and their gradients and Hessian matrices. Implementation of the Newton–Raphson algorithm can be computationally costly, since, in each iteration, large Monte Carlo samples are required to compute the gradient and the Hessian matrix. The computational complexity further increases with the number of subjects. If a suitable initial estimator is available, computation of the algorithm can be significantly reduced by a “1-step” Newton–Raphson procedure (Bickel, 1975; Cai *and others*, 2000). The estimators computed from the REMLE can be used as a natural candidate for the initial estimator . Initial estimators of *γ* can be computed by fitting the regression model using those subjects with *S*_{i} observed (i.e. *δ*_{i}^{(c)} = 0), where is the predicted value for **a**_{i}. Further details of this algorithm are provided in Appendix C of the supplementary material available at *Biostatistics* online.

Inferences for the MLE of a parametric model (3.1) may be obtained using their asymptotic distributions. Let be the MLE of . The asymptotic normality of the MLE's (Serfling, 1980, Chapter 4) implies that, when *n* is large, has approximately the multivariate normal distribution (, Var()). An approximate [100×(1 − *α*)]th confidence interval (CI) for (), a linear combination of , is , where is the variance estimator and *Z*_{α/2} is the [100×(1 − *α*/2)]th percentile of the standard normal distribution. For the additive model (2.4), where nonparametric components are present, asymptotic distributions of the approximate MLE in (3.2) have not yet been developed. As a practical alternative, we can generate bootstrap samples by resampling the subjects with replacement and compute the corresponding bootstrap estimators. The estimates obtained from the original sample are natural choices for the initial estimates for the bootstrap samples. A [100×(1 − *α*)]th CI based on percentiles is given by the corresponding lower and upper [100×(*α*/2)]th percentiles (*L*_{α/2},*U*_{α/2}) of the bootstrap estimators. This bootstrap approach has been used in Wu *and others* (2008) among others. We investigate in Section 5 the empirical coverage probabilities (CP) of our CIs in a simulation study.

We applied our method to the ENRICHD Study and investigated whether the use of antidepressants as a pharmacotherapy could lower the BDI scores for patients during the PT period. Wu *and others* (2008) investigated this question by applying their varying-coefficient model to the subsample of 91 depressed patients in the ENRICHD PT arm who had their change-point time *S*_{i} observed during the treatment period and concluded that the use of antidepressants was beneficial for lowering the BDI scores for patients who were eligible for pharmacotherapy. Our analysis was based on all 557 depressed patients who had their status of antidepressant use (yes or no) and exact dates of antidepressant starting time recorded and attended 5 or more PT sessions during the treatment period. In order to have a meaningful interpretation for this study, 3 types of patients were excluded from our analysis: patients in the usual care arm because of the lack of BDI scores observed between the baseline and the end of the treatment period, patients whose pharmacotherapy status and starting dates of antidepressant use were not recorded, and patients who had poor adherence to the required weekly PT sessions, that is, attended less than 5 (approximately 20%) of the sessions. Because antidepressant use for each patient was individually monitored and recorded as accurate as possibly by study psychiatrists (Taylor *and others*, 2005, p. 794), it is reasonable to assume that the missing records on antidepressant starting dates were missing at random. Within our sample, 11 patients used antidepressants before baseline, 92 started antidepressant during the treatment period, and 454 did not use antidepressants before and during the treatment period. The number of visits for these patients ranges from 5 to 36 and has a median of 12.

Let *Y*_{ij}, *T*_{ij}, *S*_{i}^{(c)}, and *R*_{ij} = *T*_{ij} − *S*_{i}^{(c)} be the *i*th patient's BDI score, trial time (months), starting time (months) of antidepressant use, and antidepressant duration time (months), respectively, at the *j*th visit. For all 1 ≤ *i* ≤ *n*, *T*_{i1} = 0, *T*_{ini} is the trial time at the last visit, and *λ*_{ij} = 1_{[Si < Tij]} is the indicator of antidepressants use at *T*_{ij}. The observed (*S*_{i}^{(c)},*δ*_{i}^{(c)}) is (*S*_{i}^{(c)} = *S*_{i},*δ*_{i}^{(c)} = 0) if the *i*th patient used antidepressants within the treatment period, (*S*_{i}^{(c)} = *T*_{ini},*δ*_{i}^{(c)} = 1) if the patient did not use antidepressant within the treatment period, and (*S*_{i} = 0,*δ*_{i}^{(c)} = 2) if the patient used antidepressants before baseline. When the linear models *μ*_{0}(*T*_{ij};*a*_{i0},*a*_{i1}) = *a*_{i0} + *a*_{i1}*T*_{ij} and *μ*_{1}(*R*_{ij};*b*_{i0},*b*_{i1}) = *b*_{i0} + *b*_{i1}*R*_{ij} are used, (*a*_{i0},*a*_{i1}) represents the intercept and slope of the *i*th subject's BDI trajectory before antidepressant use, and (*b*_{i0},*b*_{i1}) is the intercept and slope of the change of the subject's BDI trajectory after antidepressant use.

In order to have a clear and parsimonious interpretation of the overall trend of BDI over the trial time, we examined some preliminary analysis of the data and approximated the relationship between *Y*_{ij} and *T*_{ij} by a simple linear model. Using the naïve mixed-effects model (2.1), we first estimated the unknown parameters *α*_{0}, *α*_{1}, *β*_{00}, and *β*_{10} of

(4.1)

where (*α*_{0},*α*_{1},*β*_{00},*β*_{10})^{T} is the unknown mean vector and Σ is the unstructured covariance matrix for the multivariate normal distribution (·,·). Intuitively, *α*_{0} and *α*_{1} are the mean intercept and slope for BDI scores before antidepressant use, and *β*_{00} and *β*_{10} are the mean intercept and slope for the “correction term” after antidepressant use. Zero value of *β*_{00} + *β*_{10}*R*_{ij} indicates ignorable antidepressant effect on the mean BDI scores.

The left panel of Table 1 shows the estimates of *α*_{0}, *α*_{1}, *β*_{00}, and *β*_{10}, and their standard errors (SEs) and 95% CIs computed using REMLE with unstructured covariance matrix Σ. The negative estimate suggests that the mean BDI score for these patients tends to decrease over the trial time since the start of the PT sessions, while the positive estimate and its 95% CI seem to suggest that the use of antidepressants increase the patients' mean BDI scores. As discussed in Section 2.2, the self-selectiveness of the antidepressant use as a concomitant intervention is not considered in (4.1), so that the positive estimate of *β*_{00} may not reflect the real effect of pharmacotherapy on depression severity.

To account for the possible link between antidepressant starting time and the BDI trend before the use of antidepressants, we examined a series of preliminary plots among *S*_{i}, (*a*_{i0},*a*_{i1}) and (*b*_{i0},*b*_{i1}) using patients in the PT arm who had started using antidepressants during the treatment period. These preliminary results (see Appendix D.1 of the supplementary material available at *Biostatistics* online) suggested that the following linear shared parameter model could give a parsimonious approximation to the variables considered:

(4.2)

where *ϵ*_{i}^{(a)} and *ϵ*_{i}^{(b)} = (*ϵ*_{i0}^{(b)},*ϵ*_{i1}^{(b)})^{T} are mean-zero bivariate normal random vectors with unstructured covariance matrices Σ^{(a)} and Σ^{(b)}, respectively, *ϵ*_{i}^{(s)} is a mean-zero normal random variable with variance *σ*_{s}^{2}, and *ϵ*_{i}^{(a)}, *ϵ*_{i}^{(b)}, and *ϵ*_{i}^{(s)} are independent. Similar to (4.1), (*α*_{0},*α*_{1}) represents the mean intercept and slope for the BDI trend over the trial time before antidepressant use. Given (*a*_{i0},*a*_{i1})^{T}, (*β*_{00},*β*_{01},*β*_{02},*β*_{10},*β*_{11},*β*_{12})^{T} specifies the conditional mean structure for the BDI correction term after antidepressant use.

The middle panel of Table 1 shows the estimates of the coefficients in (4.2) computed using the 2-stage ML procedure with 10 Newton–Raphson iterations, and their SE and 95% percentile CIs computed from the bootstrap procedure. Since the 95% CIs for *β*_{01}, *β*_{02}, *β*_{10}, and *β*_{11} in (4.2) include zero, we repeated the estimation procedure for the submodel of (4.2) with *β*_{01}, *β*_{02}, *β*_{10}, and *β*_{11} set to zero and summarized the estimates, their SEs, and 95% percentile CIs from this submodel in the right panel of Table 1, which gives and . Contrary to the interpretation from (4.1), we obtain from this submodel that since 0 ≤ *R*_{ij} ≤ 6 and , which suggests that pharmacotherapy has on average a beneficial effect for lowering a patient's depression severity. The 95% CIs for *γ*_{1} obtained from (4.2) and its submodel suggests that (4.1) is likely a misspecified model for this data set. These results are consistent with the analysis of Wu *and others* (2008).

The simulation study has 2 objectives: (i) to demonstrate the biases of the naïve mixed-effects models; and (ii) to investigate the practical properties of our ML and approximate MLE procedures. In order to resemble a moderate sized longitudinal study with data structures similar to the ENRICHD pharmacotherapy data, each of our simulated samples contains *n* = 200 or 300 independent subjects who have unequal number of repeated measurements. For *i* = 1,…,*n*, the *i*th subject in the sample has a maximum of 30 “scheduled visits” at time points {*T*_{i,1},…,*T*_{i,30}} = {0,0.2 + *e*_{1},…,5.8 + *e*_{29}}, where {*e*_{l}} are independent “distortions” generated from the uniform *U*( − 0.1,0.1) distribution, but each scheduled visit has 40% probability to be skipped. The remaining observed visiting times are {*T*_{ij};*j* = 1,…,*n*_{i}} with possibly unequal *n*_{i} for different subjects. For each 1 ≤ *i* ≤ *n*, the random parameter **a**_{i} = (*a*_{i0},*a*_{i1})^{T} is generated by adding (*α*_{0},*α*_{1})^{T} = (20,10)^{T} to a randomly generated *ϵ*_{i}^{(a)} from the mean-zero bivariate normal distribution with covariance matrix diag(2.5,1), and **b**_{i} = (*b*_{i0},*b*_{i1})^{T} is generated by adding the conditional mean (*β*_{00} + *β*_{01}*a*_{i0},*β*_{12}*a*_{i1})^{T} with (*β*_{00},*β*_{01},*β*_{12})^{T} = ( − 4,0.01,0.02)^{T} to a randomly generated *ϵ*_{i}^{(b)} from the standard bivariate normal distribution. For each (**a**_{i}^{T},**b**_{i}^{T})^{T}, we generate *S*_{i} from 2 different scenarios: (a) *S*_{i}~*N*(*γ*_{1}*a*_{i0} + *γ*_{2}*a*_{i1},1) with *γ* = (*γ*_{1},*γ*_{2})^{T} = (1, − 1.7)^{T}; (b) *S*_{i}~*N*(*μ*_{0}^{(s)}(*a*_{i0}),0.04), where *μ*_{0}^{(s)}(*a*_{i0}) = − 0.75(*a*_{i0} − 18)^{2} + 6 if *a*_{i0} ≤ 20, and 0.75(*a*_{i0} − 22)^{2} if *a*_{i0} > 20. The purpose of (b) is to evaluate the B-spline estimate when the parametric form of *μ*_{0}^{(s)}(*a*_{i0}) is unknown. For each {*T*_{ij},*S*_{i},**a**_{i},**b**_{i}}, we generate *Y*_{ij} from the subject-specific change-point model

(5.1)

where *λ*_{ij} = 1_{[Tij > Si]}, *R*_{ij} = max(*T*_{ij} − *S*_{i},0), and *ϵ*_{ij}~*N*(0,1). When *S*_{i} < *T*_{i1} or *S*_{i} > *T*_{ini}, we use censored *S*_{i} and obtain {(*Y*_{ij},*T*_{ij},*λ*_{ij},*S*_{i}^{(c)},*δ*_{i}^{(c)});*j* = 1,…,*n*_{i}}.

The simulation was separately repeated 200 times under scenarios (a) and (b) with *n* = 200 and 300. Within each simulation, we used *μ*_{0}(*T*_{ij};**a**_{i}) = *a*_{i0} + *a*_{i1}*T*_{ij} and *μ*_{1}(*T*_{ij},*R*_{ij};**b**_{i}) = *b*_{i0} + *b*_{i1}*R*_{ij}, and estimated the parameters *α* = (*α*_{0},*α*_{1})^{T} and *β* = (*β*_{00},*β*_{01},*β*_{12})^{T} using the naïve mixed-effects model (2.1) and the shared parameter model (2.2). For estimation under the naïve mixed-effects model (2.1), we ignored the relationship between **a**_{i} and *S*_{i} and computed the restricted ML estimators .

For the Gaussian linear shared parameter model (2.3) with *S*_{i} generated from (a), we computed the MLE under (3.1) using the 2-stage estimation procedure with diagonal estimated covariance matrices and 1-step Newton–Raphson iteration. The REML estimators $\stackrel{~}{\alpha}$ and $\stackrel{~}{\beta}$ were used as the initial estimators of *α* and *β* for the Newton–Raphson iteration. For the initial estimator $\stackrel{~}{\gamma}$ of *γ*, we used the subjects with *S*_{i} observed (i.e. *δ*_{i}^{(c)} = 0), computed the predicted values and calculated $\stackrel{~}{\gamma}$ by the least squares estimate from the linear model .

Table 2 summarizes the averages of the estimates and their SEs, the mean squared errors (MSEs), and the empirical CP for the 95% bootstrap CIs based on the simulated samples with *S*_{i} generated from (a). In each simulated sample, the SE and CP are computed using *B* = 200 bootstrap samples. The estimates of *α* and *β* obtained under the naïve mixed-effects model (2.1) have significantly larger biases and MSEs than the estimates obtained under the linear shared parameter model (2.3). The 95% bootstrap CIs constructed under (2.1) have low empirical CPs, while the CPs for (2.3) are close to the nominal level of 95%. These results suggest that, because of the potential model misspecification and estimation bias, the naïve mixed-effects models may lead to erroneous conclusions when a concomitant intervention is present, while the estimation bias can be dramatically reduced by a shared parameter model.

For samples with *S*_{i} generated from (b), we used the additive shared parameter model (2.4) with *S*_{i} = *μ*_{0}^{(s)}(*a*_{i0}) + *ϵ*_{i}^{(s)} and Gaussian errors. We approximated *μ*_{0}^{(s)}(*a*_{i0}) using the quadratic B-splines with one interior knot at *a*_{i0} = 20 and estimated (*α*^{T},*β*^{T})^{T} using the approximate likelihood (3.2) and the 2-stage estimation procedure with diagonal estimated covariance matrices and 1-step Newton–Raphson iteration. The estimated curve *μ*_{0}^{(s)}(*a*_{i0}) was computed by the B-splines with the estimated spline coefficients. We computed the SEs and the 95% bootstrap CIs for (*α*^{T},*β*^{T})^{T} using the “resampling-subject” bootstrap procedure with *B* = 200 replications. Pointwise 95% CIs for *μ*_{0}^{(s)}(*a*_{i0}) were also computed using the same bootstrap procedure.

Table 3 summarizes the averages of the estimates and their SEs, the MSEs and the empirical CPs for the 95% bootstrap CIs based on the 200 simulated samples with *S*_{i} generated from (b). Similar to Table 2, the estimates obtained under the naïve mixed-effects model (2.1) have mostly larger MSEs than the estimates obtained under the additive shared parameter model (2.4), and the CIs obtained under (2.1) have lower empirical CPs than those obtained under (2.4). The left panel of Figure 1 shows the true curve *μ*_{0}^{(s)}(*a*_{i0}), and the spline estimated curve for *μ*_{0}^{(s)}(*a*_{i0}) together with its 95% bootstrap CIs at 600 equally spaced *a*_{i0} points within [17,23] computed from one randomly selected simulated sample. The right panel of Figure 1 shows the empirical CPs of these pointwise 95% bootstrap CIs computed over all the simulated samples with *n* = 200, and the average of these CPs over the 600 equally spaced *a*_{i0} points within [17,23]. The spline estimated curve appears to be reasonably close to the true curve *μ*_{0}^{(s)}(*a*_{i0}), and the CPs of the pointwise 95% bootstrap CIs are generally higher than the nominal level of 95% in the interior of the range of *a*_{i0}.

True parameter values, averages of the parameter estimates and their SEs, the MSEs, and the empirical CPs of the 95% CIs computed over the simulated samples generated from the additive shared parameter model (2.4)

Our approach suggests that concomitant interventions should not be treated the same way as a usual time-dependent covariate, and a shared parameter model may reduce the estimation bias by incorporating the self-selectiveness of the intervention. A number of extensions are worthwhile for further investigations. First, a patient may have multiple concomitant interventions in practice, so that a multiple change-point extension of (2.2) would be needed in such situations. Second, a worthy future development is to incorporate missing data and multiple concomitant interventions into a single framework of joint models. Third, asymptotic distributions of our estimators may provide useful insights and should be developed. Finally, since it may not be always clear whether an intervention is a concomitant intervention, model diagnostic methods for evaluating the appropriateness of a shared parameter model would be a valuable tool to be developed.

Supplementary material is available at http://biostatistics.oxfordjournals.org.

Financial support of the ENRICHD study was provided by contracts N01-HC-55140, N01-HC-55141, N01-HC-55142, N01-HC-55143, N01-HC-55144, N01-HC-55145, N01-HC-55146, N01-HC-55147, and N01-HC-55148 from the National Heart, Lung and Blood Institute.

We would like to thank Drs Myron Waclawiw and Minjung Kwak, and the reviewers for their thoughtful comments. *Conflict of Interest:* None declared.

- Bickel PJ. One-step Huber estimates in linear models. Journal of the American Statistical Association. 1975;34:584–653.
- Cai J, Fan J, Li R. Efficient estimation and inferences for varying-coefficient models. Journal of the American Statistical Association. 2000;95:888–902.
- Davidian M, Giltinan DM. Nonlinear Models for Repeated Measurement Data. London: Chapman Hall; 1995.
- Diggle PJ, Heagerty P, Liang K-Y, Zeger SL. Analysis of Longitudinal Data. 2nd edition. Oxford: Oxford University Press; 2002.
- Investigators ENRICHD. Enhancing recovery in coronary heart disease patients (ENRICHD): study intervention rationale and design. Psychosomatic Medicine. 2001;63:747–755. [PubMed]
- Investigators ENRICHD. Enhancing recovery in coronary heart disease patients (ENRICHD): the effects of treating depression and low perceived social support on clinical events after myocardial infarction. Journal of the American Medical Association. 2003;289:3106–3116. [PubMed]
- Follmann D, Wu MC. An approximate generalized linear model with random effects for informative missing data. Biometrics. 1995;51:151–168. [PubMed]
- Huang JZ, Wu CO, Zhou L. Polynomial spline estimation and inference for varying coefficient models with longitudinal data. Statistica Sinica. 2004;14:763–788.
- Serfling RJ. Approximation Theorems of Mathematical Statistics. New York: Wiley & Sons; 1980.
- Taylor CB, Youngblood ME, Catellier D, Veith RC, Carney RM, Burg MM, Kaufmann P, Shuster J, Mellman T, Blumenthal J. and others. Effects of antidepressant medication on morbidity and mortality in depressed patients after myocardial infarction. Archives of General Psychiatry. 2005;62:792–298. [PubMed]
- Verbeke G, Molenberghs G. Linear Mixed Models for Longitudinal Data. New York: Springer; 2000.
- Wall MM, Dai Y, Eberly LE. GEE estimation of a misspecified time-varying covariate: an example with the effect of alcoholism treatment on medical utilization. Statistics in Medicine. 2005;24:925–939. [PubMed]
- Wu CO, Tian X, Bang H. A varying-coefficient model for the evaluation of time-varying concomitant intervention effects in longitudinal studies. Statistics in Medicine. 2008;27:3042–3056. [PubMed]

Articles from Biostatistics (Oxford, England) are provided here courtesy of **Oxford University Press**

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |