Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2746699

Formats

Article sections

- Abstract
- 1. INTRODUCTION
- 2. APPLICATION: SMOKING CESSATION TRIAL
- 3. JOINT MODELS AND PRIORS
- 4. POSTERIOR SAMPLING
- 5. MODEL SELECTION AND GOODNESS OF FIT
- 6. DATA ANALYSIS
- 7. CONCLUSIONS AND DISCUSSION
- Supplementary Material
- REFERENCES

Authors

Related links

J Am Stat Assoc. Author manuscript; available in PMC 2010 June 1.

Published in final edited form as:

J Am Stat Assoc. 2009 June 1; 104(486): 429–438.

doi: 10.1198/016214508000000904PMCID: PMC2746699

NIHMSID: NIHMS143238

Xuefeng Liu, Department of Biostatistics and Epidemiology, East Tennessee State University, Johnson City, TN 37614;

Email: mdaniels/at/stat.ufl.edu

See other articles in PMC that cite the published article.

Joint models for the association of a longitudinal binary and a longitudinal continuous process are proposed for situations in which their association is of direct interest. The models are parameterized such that the dependence between the two processes is characterized by unconstrained regression coefficients. Bayesian variable selection techniques are used to parsimoniously model these coefficients. A Markov chain Monte Carlo (MCMC) sampling algorithm is developed for sampling from the posterior distribution, using data augmentation steps to handle missing data. Several technical issues are addressed to implement the MCMC algorithm efficiently. The models are motivated by, and are used for, the analysis of a smoking cessation clinical trial in which an important question of interest was the effect of the (exercise) treatment on the relationship between smoking cessation and weight gain.

In some longitudinal studies, although one time-varying outcome may be of primary interest, several related processes are measured. Examples in smoking cessation studies include smoking status and weight change and smoking status and alcohol use. In such studies, the association between the processes can reveal a great deal about the mechanism of behavior change. For example, a motivation for using exercise as an adjunct therapy for smoking cessation is to reduce the dependence between weight gain and relapse back to smoking and between the fear of weight gain and the inability to make a successful quit attempt. In this study we built models in the setting of two processes: a longitudinal binary process (e.g., smoking cessation) and a longitudinal continuous process (e.g., weight change). Our primary interest was to model the association between the two processes. We apply the models to a recent smoking cessation trial (Marcus et al. 2003), where the investigators were interested in the relation between smoking status and weight change.

Our approach builds on and extends recent research on joint modeling of mixed outcomes and Bayesian variable selection. For joint modeling, a well-known technique of joint modeling of mixed outcomes is based on introducing a partly observed random variable following a bivariate normal distribution, where one component defines the continuous outcome and the second, latent component defines the binary outcome through the common probit transformation. Authors taking this type of approach include Catalano and Ryan (1992), Cox and Wermuth (1992), Dunson (2000), Fitzmaurice and Laird (1995), Gueorguieva and Agresti (2001), Regan and Catalano (1999), Roy and Lin (2000), and Sammel, Ryan, and Legler (1997). In the present work, we extend this approach to longitudinal data with *T* individual measurements by considering a partly observed random variable following a 2T-variate normal distribution where the first *T* components define the binary outcomes by applying probit transformations and the last *T* components are continuous outcomes. Moreover, the main question of interest in previous studies was the effect of some treatment or therapy on the mean of the response vector. The effect of the treatment on the association between two outcomes was not of main interest. We propose similar models in which the association between two processes is of concern. To do this, we use a Bartlett decomposition of the covariance matrix (Bartlett 1933).

The association matrix induced by the Bartlett decomposition is high-dimensional and expected to be sparse, so we borrow ideas from the Bayesian variable selection literature to reduce the number of parameters (George and McCulloch 1993, 1997; Smith and Kohn 2002). Other related work includes that of Carlin and Chib (1995), Chipman (1996), Hoeting, Raftery, and Madigan (1996), and Wakefield and Bennett (1996). George and McCulloch developed stochastic search variable selection (SSVS) to select promising subsets of a set of covariates *X*_{1}, . . ., *X _{p} for* further consideration in regression models. Smith and Kohn (2002) proposed similar techniques for modeling a covariance matrix with high dimension for longitudinal data. In this study we construct a hierarchical prior to parsimoniously model the association between two longitudinal processes by extending the ideas of Smith and Kohn (2002). The hierarchical specification has the advantage that potential 0’s in the association matrix can be identified and estimates of the parameters can be calculated to account for the model uncertainty associated with determining which elements are 0’s.

We develop a Markov chain Monte Carlo (MCMC) algorithm to estimate the posterior distribution of the parameters in the models. To implement the MCMC algorithm more efficiently, we address several technical issues, including efficiently sampling from truncated multivariate normal (TMVN) distributions and efficiently sampling a correlation matrix from its full conditional distribution. Geweke (1991) and Robert (1995) proposed a Gibbs sampling algorithm to sample from a TMVN distribution. This algorithm is somewhat inefficient in that it requires repeatedly evaluating conditional means and variances from univariate conditional normals and can result in high autocorrelations in the chain. Barnard, McCulloch, and Meng (2000) and Chib and Greenberg (1998) suggested using the Griddy Gibbs (GG) sampler and the random-walk Metropolis-Hastings (RW-MH) algorithm to sample a correlation matrix. Although the GG sampler is simple to implement, it is not computationally efficient. The RW-MH algorithm has the problem of potentially slow mixing. We propose better ways to handle these issues.

The article is organized as follows. We introduce a smoking cessation clinical trial that motivates this research in Section 2. In Section 3 we propose joint models and hierarchical priors used to parsimoniously model the association between two processes. In Section 4 we describe MCMC sampling techniques to estimate the posterior distribution of parameters and address several technical issues regarding efficient implementation of the MCMC algorithm. We discuss the deviance information criterion (DIC) for model comparison and the calibrated posterior predictive *p* value for goodness of fit in Section 5. Finally, we present the results and conclusions from the analysis of the clinical trial in Section 6.

Commit to Quit II (CTQ II) (Marcus et al. 2003, 2005) was a 4-year randomized controlled clinical trial designed to test the efficacy of moderate-intensity physical activity as an aid for smoking cessation among women. This study was a logical progression of previous work (CTQ I) (Marcus, King, Albrecht, Parisi, and Abrams 1997; Marcus et al. 1999) on the efficacy of vigorous-intensity exercise to aid smoking cessation and weight regulation in women smokers, because moderate-intensity exercise is less arduous and can be performed by healthy individuals without medical supervision. In the CTQ II trial, 217 healthy women aged 18-65 who had regularly smoked 5 or more cigarettes per day for at least 1 year and who had routinely participated in moderate or vigorous intensity physical activity for 90 minutes or less each week were recruited and randomized to one of the two conditions (treatments): a moderate-intensity exercise condition or a contact condition. These two treatments are designated *exercise* and *wellness*. All recruited women participated in an 8-week cognitive-behavioral group-based smoking cessation program, followed by a 12-month follow-up. Participants in the exercise group were required to attend one supervised exercise session per week, on their smoking cessation treatment night. They were also given written instructions for home exercises. The duration and intensity of the exercise were gradually increased to 165 minutes per week, which could be performed onsite or at home. Participants in the contact group received lectures, films, and handouts on a variety of health and lifestyle issues. All participants were encouraged to attend makeup sessions if they failed to attend any session during the 8 weeks of treatments. Smoking status was determined through self-report and carbon monoxide testing at each session. In addition, participants were weighed on a weekly basis during the 8 weeks of treatment. This design allowed for a comparison of the effect of moderate-intensity physical activity plus standard smoking cessation with the effect of contact plus standard smoking cessation.

The primary outcome was quit status (a longitudinal binary outcome), but another outcome—weight change (a longitudinal continuous outcome)—also was measured. The investigators were interested in two questions:

- Does moderate-intensity exercise have significant effects on smoking cessation? In the study of the CTQ I trial, the investigators found significant differences between vigorous activity and contact control group through 12 months of follow-up.
- Does exercise effect smoking cessation by weakening the association between smoking status and weight gain?

Question 2 motivates our development of joint models for the association of longitudinal binary and continuous processes to examine the differential patterns of association across treatments in Section 3. But our approach also allows us to answer question 1.

Several authors have developed joint models for the analysis of multivariate longitudinal data using latent normal variables (Daniels and Normand 2006; Dunson 2003; Gueorguieva and Sanacora 2006). In this section we propose similar joint models for the association of a longitudinal binary and a longitudinal continuous process. In our setting the time-dependent covariance matrix is modeled as a function of predictors, however, and is of primary interest.

Denote the binary outcome (in our example, smoking status) for subject *j* in treatment *i* at week *t(i* = 1, . . . , *m;j* = 1, . . . , *n _{i}*;

(1)

where **X**_{ij} is the design matrix, ** β** is the vector of regression coefficients, and

Using the probit formulation for the binary process, we have *Q _{ij,t}* =

then the new models can be expressed as

(2)

and

(3)

where **B**_{i} = **Σ**_{i,21} **Σ**^{-1}_{i,11} is the matrix that reflects association between **Y**_{ij} and **W**_{ij}, **ε**_{1i} ~ N(**0, Σ**_{i,11}), and with . The reparameterization of **Σ**_{i} to is known in the literature as the Bartlett decomposition of a covariance matrix (Bartlett 1933).

It is easy to see that (2) is a correlated probit model and (3) is a standard correlated regression model, conditional on the latent variable **Y**_{ij}. For identifiability, it is common to restrict **Σ**_{i,11} to be a correlation matrix, **R**_{i,11} (Chib and Greenberg 1998); in the rest of this article, we use **R**_{i,11} instead of **Σ**_{i,11} as notation for the covariance matrix in (2). The advantage of this factorization is that the components of **B**_{i} in (3) are directly related to the variance and correlation terms in **Σ**_{i,12}. In addition, this factorization provides a convenient parameterization for examining the association between **Y**_{ij} (**Q**_{ij}) and **W**_{ij}, because the components of the **B**_{i} matrix are unconstrained.

Beside being unconstrained, the association matrix **B**_{i} in model (3) can be easily interpreted. The *t*th row of **B**_{i} reflects the association of the continuous process at week *t* with the binary process at all weeks *(t* = 1, . . . , *T)*. In particular, it corresponds to the regression model where *b*_{i,t} = (*b*_{i,t1}, . . . , *b*_{i,tT})’ is the *t*th row of **B**_{i}. Because the covariates associated with **b**_{i,t}**, Y _{ij}** -

For Bayesian inference, we need to specify priors for parameters in the models described in Section 3.1. Let **b**_{i} denote the column vector obtained by stringing the rows of **B**_{i} (*i* = 1, . . . , *m*); that is, **b**_{i} = (*B*_{i,11}, . . . , *B*_{i,1T}, . . . , *B _{i,TT}*)’. Let

(4)

This specification implies that *β*, b_{i}, and are a priori jointly independent of **R**_{i,11} and, marginally, ** β** and are a priori independent. Because we have little prior information for

(5)

(6)

and

(7)

where *r _{i,jk}* (

We now provide details on the prior . Because the components of **b**_{i} are the regression coefficients of **W**_{ij} on **Y**_{ij}, we expect many of the components of **b**_{i} to be 0’s based on conditional independence (Markov-type) arguments for longitudinal data. Consider, for example, the regression for *w _{ij,t}*. Once we condition on {

A key feature of the hierarchical prior is that each component of **b**_{i} (recall that each component is on the same scale because they are standardized regression coefficients) can be exactly 0 with positive probability. To attain this, we introduce the latent indicator vector *δ*_{i} associated with **b**_{i} such that

where **b**_{i,tl} is the component of the *t*th row and *l*th column in **B**_{i} and *δ*_{i,tl} is the corresponding binary indicator of *δ*_{i} that is associated with *b _{i,tl.}* The nonzero components of the vector

and covariance matrix

where **Z**_{ij} = **W**_{ij} - **X**_{2ij}*β*_{2}, **C**_{ijδ} is obtained by removing from **C**_{ij} the columns corresponding to zero elements of **b**_{i}, and **C**_{ij} is *T* × *T*^{2} matrix with elements that are functions of **Y**_{ij} and the *t*th row, .The vector **δ**_{i} is then given the prior

(8)

(9)

where *δ*_{i} = (*δ*_{i,11}, . . . , *δ*_{i,1T}, . . . , *δ*_{i,T1}, . . . , *δ*_{i,TT})’; *δ*_{i,tl} (*i* = 1, . . . , *m*; *l, t* = 1, . . . , *T*) is the element of *δ*_{i} associated with *b _{i,tl}*, which is the regression coefficient of

The prior for **b**_{i}, given *δ*_{i}, is derived based on

(10)

The rationale for (10) is that the prior provides only 1/*n*th of the weight provided by the likelihood.

Based on this prior construction, the quantity in prior (8) can be considered the prior probability that *b _{i,tl}* will require a nonzero estimate, and |

A complication with the prior for **b**_{i} is that it is a function of , which complicates the form of the full conditional distributions for the MCMC algorithm described in the Web appendix, Section I (www.amstat.org/publications/jasa/supplemental_materials). To address this issue, we replace the mean and covariance for the prior with *μ*^_{biδ} and covariance _{biδ}, defined later. These quantities are computed from an MCMC run with **b**_{i} = 0 as and , where *k* indexes the posterior sample. An alternative would be to just use a mean and variance with the posterior means of plugged in.

Given that we have little prior information on ** β**, and

We develop an MCMC algorithm to sample from the posterior distribution of the parameter , where , **p** = (*p*_{1}, . . . , *p*_{m})’, and **R**_{1}, **b**, and are defined as in (4). To simplify implementation of the algorithm, we include a data augmentation (DA) step (Tanner and Wong 1987) that we use to impute the latent data and the missing values. Let **Q**_{obs} be the vector of observed binary outcomes, **Q**_{mis} be the vector of missing binary outcomes, **W**_{obs} be the vector of observed continuous outcomes, and **W**_{mis} be the vector of missing continuous outcomes. Let **Y**_{obs} denote the latent vector associated with **Q**_{obs} and let **Y**_{mis} denote the latent vector related to **Q**_{mis.} Define **Y** to be **Q** to be and **W** to be . We use the generic notation *f* for the distribution of responses, ** π** for the prior and posterior distributions of related parameters, and

- DAI step. Sample latent data
**Y**and missing values**W**_{mis}from*f*(**Y, W**_{mis}|**θ**,**Q**_{obs},**W**_{obs}). To do this, factor this distribution as(11) - PS step (posterior sampling step). Generate
from*θ**f*(|*θ***Y, W**) using Gibbs sampling. The DAI step involves sampling in the order from [**Y**_{obs}|*θ*, Q_{obs},**W**_{obs}], [**Y**_{mis}|*θ*, Y_{obs},**W**_{obs}], and [**W**_{mis}|*θ*, Y_{obs},**Y**_{mis},**W**_{obs}]. Once we obtain latent data and missing values, we need to sample from full conditional distributions of the components of. This can be completed using the Gibbs sampler. All of the full conditional distributions for the Gibbs sampler are derived in the Web appendix, Section I (www.amstat.org/publications/jasa/supplemental_materials).*θ*

To implement the MCMC algorithm efficiently, two technical issues must be addressed: imputing the latent data and sampling the correlation matrix.

One of the challenges with the multivariate probit models is the simulation of latent variables from the TMVN distribution of **Y**_{obs} given (*θ*, Q_{obs}, **W**_{obs}). We propose an algorithm to sample from this distribution efficiently.

For simplicity, assume that **Y** is a *T* × 1 vector and **Y** ~ N(** μ**,

and

then we have where

(12)

Here *U*_{1t} = {*y _{t}*

Suppose that **Y**~ TN(** μ, Σ**)

This proposition is motivated by the integral calculation method mentioned by Chib and Greenberg (1998, p. 354). The idea behind this proposition is to obtain a more efficient implementation of the Gibbs sampler based on the new set of *T* conditional distributions of components of **Z**. These distributions are simple in the sense that we do not need to compute means and variances in (12), and the univariate truncation intervals *U*_{2t} can be easily derived. For example, in our model, **Y**~ TN(** μ, R**)

Thus, at iteration *k*, , where *ν*_{t} is the *t*th element of ** ν** and U

(13)

The Markov chain in our implementation of the Gibbs sampler can be obtained by first generating all components of **z** one by one from (*t* = 1, . . . , *T*), and then translating back to **y**^{(k)} by **y**^{(k)} = **Pz**^{(k)}.

Sampling correlation matrices in MCMC algorithms can be problematic. In addition to the positive definite constraint of covariance matrices, they have diagonal elements fixed at 1. The ideas for data augmentation and parameter expansion, introduced by Liu, Rubin, and Wu (1998), Liu and Wu (1999), and van Dyk and Meng (2001) to speed up convergence of algorithms (EM, DA, or others), provide a useful tool for addressing this problem. Liu (2007) and Liu and Daniels (2006) developed two-stage parameter expanded reparameterization and Metropolis-Hastings (PX-RPMH) algorithms for sampling a correlation matrix **R** by extending this idea. In these algorithms, the difficulty of simulating **R** can be overcome by creating an “expanded” model in which **R** can be transformed to a less constrained covariance matrix **Ψ** by borrowing the scale parameters from an expansion parameter matrix. In what follows, we derive an PX-RPMH algorithm for sampling the correlation matrix **R**_{i,11} in the joint models in Section 3. For notational convenience, we denote **R**_{i,11} as **R**_{i} in this section.

Define *θ*_{(-Ri)} to be the parameter vector not including **R**_{i}, and define **D**_{i} to be the expansion parameter, which is a diagonal matrix that we introduce to transform **R**_{i} into a less constrained covariance matrix, **Ψ**_{i} = **D**_{i}**R**_{i}**D**_{i}. Consider the following one-to-one mapping from {**Y**_{ij}, **R**_{i},**B**_{i} to :

(14)

where for any *t* = 1, . . . , *T*. Given ** β**, the step that draws

(15)

where *a _{i}* is a constant to be determined, we can derive a (parameter-expanded) candidate density (PXCD) for

If we choose priors as specified in Section 3.2, then, from the likelihood function for the complete data in (2), transformation (14) and candidate prior (15), we obtain

(16)

where *ν*_{i} = *n*_{i} - *T*, , , and . That is, , **B**_{i}, ** β** has an inverse-Wishart distribution with degrees of freedom

Proposition 2 gives the PXCD of **Ψ**_{i} to use as the proposal density in the Metropolis-Hastings stage. In this stage, we first simulate **Ψ**_{i} from (16), and then obtain the correlation matrix **R**_{i} through the reduction function . Second, we keep the candidate **R**_{i} with probability *α _{i}* (the acceptance rate in the Metropolis-Hastings algorithm). Sampling

Assume that *θ*_{(-Ri)} and **R**_{i} are a priori independent, that is, ** π**(

Theorem 1 provides a simple way to simulate the correlation matrix in the models proposed in this study.

Previous work (Liu 2007; Liu and Daniels 2006) has shown that the PX-RPMH algorithm is more efficient than other methods, such as the GG sampler (Ritter and Tanner 1992) and RW-MH algorithms (Chib and Greenberg 1998), for sampling a correlation matrix. Those authors compared the performance of the algorithms in detail, and the algorithms will generalize to our setting, the joint models specified in Section 3.

For sampling from the TMVN as described in Section 5.2.1, we conducted several simulations to compare our method with the Gibbs sampling technique of Robert (1995). In particular, we evaluated the mixing of Markov chain output from the two algorithms by calculating the lag-*n* (*n* = 1, 2, 3,. . .) autocorrelation of each component of **Y**. (The faster the decay of autocorrelation, the faster the mixing.) We denote our algorithm by LD-A and that of Robert (1995) by R-A. The decay of the autocorrelation was much faster for the LD-A algorithm than for the R-A algorithm; see the Web appendix, Section III (www.amstat.org/publications/jasa/supplemental_materials) for additional details. We next provide some summary remarks.

One computational inefficiency of the R-A for sampling from the TMVN distribution is that we need to repeatedly compute conditional means and variances in (12). The LD-A does not require this updating. The computational gains from this aspect are minimal, however.

The mixing of the chain from the R-A grows slower as the truncation region *U*_{1} increases; however, the LD-A has the advantage of fast mixing, regardless of the volume of *U*_{1}. Specifically, when the truncation region approaches infinity (i.e., no truncation), the LD-A provides an iid sample.

The correlation between components of **Y** has little influence on the mixing of the chain from the LD-A, whereas it affects the performance of the R-A. We see this when comparing the two algorithms for various choices for **Σ**.

We now consider the problem of comparing alternative models. For the models in Section 3, competing models arise from restrictions on the prior for the association matrix **B**_{i}, correlation matrices **R**_{i,11}, and/or conditional covariance matrices For example, we might assume that **R**_{i,11} = **R**_{11} and when there are insufficient data to estimate all of the parameters. Although Bayes factors and marginal likelihoods are appealing, these approaches are very difficult to implement in a complex hierarchical model, such as the one proposed in this article (Liu 2006). As a result, we derive the deviance information criterion (DIC) (Spiegelhalter, Best, Carlin, and van der Linde 2002) to compare the alternative models. A main reason that we use the DIC is that its computation will be a ready byproduct of the MCMC simulations. But we note its drawbacks, which include lack of invariance to reparameterization and potential ambiguity in the choice of likelihood.

The DIC is defined as a classical estimate of fit plus a penalty,

where is the posterior mean of the deviance and is the effective number of parameters in the model. Given that we have missing data from subjects dropping out, we use the observed data likelihood as the likelihood to construct the DIC (Celeux, Forbes, Robert, and Titterington 2006; Daniels and Hogan 2008). Thus is defined as

and *p _{D}* is given by

where ** θ** = (

The “best” model chosen from models under consideration according to the DIC still may not fit the data well. To check model fit, we use posterior predictive checks based on a discrepancy function (Gelman, Meng, and Stern 1996). The “significance” of these checks often is summarized with the posterior predictive *p* value (ppp) (Gelman et al. 1996); however, the ppp has the major problem that its distribution under the null tends to concentrate around .5 (Robins, Ventura, and van der Vaart 2000). As a result, we check the goodness of fit of the selected joint models based on the calibrated ppp (cppp) (Hjort, Dahl, and Steinbakk 2006), defined as

(17)

where **u** = (**Q**’,**W**’)’, *ppp*(**u**^{obs}) is the ppp derived by Gelman, Mechelen, Verbeke, Heitjan, and Meulders (2004), and **U** has the distribution implied by the prior and the model. Note that **u**^{obs} corresponds to the observed vector of responses along with the data augmented responses sampled at each iteration (i.e., **Q**_{mis} and **W**_{mis}). We discuss the form of the discrepancy function in the example. Hjort et al. (2006) showed that the distribution of *ppp*(**U**) is a Uniform(0,1) and that the cppp in (17) is a proper *p* value. To calculate *ppp*(**U**) in (17), we need to first produce a sample of **U**. We can obtain **U** by first sampling ** θ** from (4) and then drawing

To approximately sample ** β** and from flat priors, we use the following diffuse priors in place of the improper priors for

We use the methodology described in Sections 3-5 to analyze the association of longitudinal quit status and weight change in the CTQ II clinical trial described in Section 2. We removed the observations at week 1 and 2 from the data, because the quit rates in these two weeks were very low (0 and 1.10%) due to the design of the study in which subjects were not supposed to try to quit smoking until week 3. Although participants were encouraged to make up missed sessions, there still existed intermittent missing values in quit status and/or weight gain. In addition, a large number of subjects dropped out before the end of the experiment. In what follows, we assume this missingness is ignorable.

For the mean of the two longitudinal responses as a function of covariates, we set ** β** in (1) to be the vector of means at each time point across treatments; thus, the

We considered several models arising from restrictions on the association matrixes **B**_{i}, the prior for the association matrices **B**_{i}, the correlation matrices **R**_{i,11} and conditional covariance matrices . We fit a total of nine models to the CTQ II trial data. Denote the *k*th alternative model by *M _{k}*. Table 1 gives the details of all of the models considered.

For each model in Table 1, we computed the DIC using the methodology derived in Section 5. The DICs for all of the models are given in Table 2. From Table 2, we can see that the DIC is the smallest for model *M _{5}* and the largest for model

For the CTQ II data, we defined the discrepancy function to be weekly quit rates or weekly average weight gains. Table 3 gives the cppp for quit rates and average weight gains at each week across treatments for model 5 (*M*_{5}). These *p* values were calculated based on comparison of 2,000 pairs of [*ppp*(**u**), *ppp*(**u**^{obs})], with the ppp obtained using 5,000 iterations after burn-in. From this table, we can see that there are no extreme *p* values (<.05 or >.95). These checks suggest that the joint models (*M*_{5}) fit the mean structure of the CTQ II clinical trial data well.

Given the results presented in Sections 6.1 and 6.2, we base inference on model *M*_{5} (Table 1). We ran the MCMC algorithm described in Section 4.1 until convergence (determined by examining trace plots of multiple chains) and based inference on the last 10,000 iterations after burn-in.

Posterior means and 95% credible intervals (CIs) for quit rates across treatments are given in Table 4. The quit rates over time were slightly lower in the exercise treatment than in the wellness treatment. The 95% CI for the difference of quit rates between two treatments at the final week was (-.014, .150), marginally significant. This suggests that the exercise treatment does not have a positive effect on smoking cessation (more later).

We now turn our attention to the association between smoking cessation and weight gain. The *p _{i}* (

This weakening also can be seen by examining the posterior means of association matrixes across treatments, as given in Table 5. We have removed from the table those elements of the **B**_{i} matrix with probabilities of the corresponding indicators being equal to 1 of <.1. The weakened associations between smoking cessation and weight gain is obvious by noting the presence of more 0’s under the exercise treatment and the larger magnitude of the (standardized) coefficients.

Table 6 shows the posterior means of pairwise correlations with 95% credible intervals; correlations whose 95% credible intervals covered 0 are excluded. We can see that smoking cessation and weight gain appear to have a lagged correlation structure, and that exercise weakens pairwise correlations. In particular, we point out the 2 × 2 blocks in the upper right corners of pairwise correlation matrixes under both treatments (in bold-type). For the wellness treatment, the four pairwise correlations between weight gain at the beginning of the study and quit status at weeks 5 and 6 are all negative. This means that people who gain weight early in the trial are more likely to be smoking at the end. The corresponding correlations are essentially 0’s (no longer significant) under the exercise arm. In addition, looking at the last row in Table 6 for the wellness arm, the correlations indicate those who quit early in the study are more likely to gain weight by the end (week 6); the corresponding relationship in the exercise arm is weaker.

We have developed joint models for the association of longitudinal binary and continuous processes and applied them to the analysis of the CTQ II clinical trial to gain insight into the joint evolution of smoking status and weight gain. The results show that moderate-intensity exercise was not successful for smoking cessation, but that it did appear to weaken the association between smoking status and weight gain, supporting the hypothesis that exercise has an effect on smoking cessation by weakening the association between quitting smoking and gaining weight.

But we should be cautious in overinterpreting these results, because of the low compliance in the exercise arm. We might expect the low compliance to negatively bias the smoking cessation results (probability of quitting was too low) on the exercise arm; that is, the intention-to-treat effect (randomization to the exercise arm) reported here might be expected to be quite different from the causal effect (adherence to the exercise regimen). But the ability to still see the weakened association between weight gain and smoking on the exercise arm despite the low compliance supports this as the mechanism of action for exercise as a therapy for smoking cessation. In future work, we will extend these joint models to estimate causal effects and allow for nonignorable dropout.

An alternative way to factor model (1) is to decompose it into a marginal model for **W**_{ij} and a conditional probit regression model for **Y**_{ij}, conditioning on **W**_{ij}. This factorization could potentially greatly increase the computational burden in sampling the correlation matrix if the diagonal elements of **Σ**_{i,11} were still fixed at 1 (due to not being identified); however, this computational problem could be avoided by fixing the diagonal elements of the conditional covariance matrix to be l’s (i.e., make this matrix a correlation matrix). For interpretation of the mean parameters, ** β**, this computationally simpler approach would require adjusting the

The general methodology proposed here can be applied to analysis of other data sets where there are two processes and the question of interest is the association between the two processes. The methodology also can be directly extended to other longitudinal cases, such as modeling the association between longitudinal ordinal and continuous processes or between two continuous processes (Liu 2006).

Xuefeng Liu, Department of Biostatistics and Epidemiology, East Tennessee State University, Johnson City, TN 37614.

Michael J. Daniels, Department of Statistics, University of Florida, Gainesville, FL 32611.

Bess Marcus, Department of Psychiatry & Human Behavior, Brown University, Providence, RI 02912.

- Barnard J, McCulloch R, Meng XL. Modeling Covariance Matrices in Terms of Standard Deviations and Correlations With Application to Shrinkage. Statistica Sinica. 2000;10:1281–1311.
- Bartlett MS. On the Theory of Statistical Regression. Proceedings of the Royal Society of Edinburgh. 1933;53:260–283.
- Carlin BP, Chib S. Bayesian Model Choice via Markov Chain Monte Carlo Methods. Journal of the Royal Statistical Society, Ser. B. 1995;57:473–484.
- Catalano PJ, Ryan LM. Bivariate Latent Variable Models for Clustered Discrete and Continuous Outcomes. Journal of the American Statistical Association. 1992;87:651–658.
- Celeux G, Forbes F, Robert CP, Titterington DM. “Deviance Information Criteria for Missing Data Models” (with discussion) Bayesian Analysis. 2006;1:651–706.
- Chen M-H, Shao Q-M. Properties of Prior and Posterior Distributions for Multivariate Categorical Response Data Models. Journal of Multivariate Analysis. 1999;71:277–296.
- Chib S, Greenberg E. Bayesian Analysis of Multivariate Probit Models. Biometrika. 1998;85:347–361.
- Chipman H. Bayesian Variable Selection With Related Predictors. Canadian Journal of Statistics. 1996;24:17–36.
- Cox DR, Wermuth N. Response Models for Mixed Binary and Quantitative Variables. Biometrika. 1992;79:441–461.
- Daniels MJ. Bayesian Modelling of Several Covariance Matrices and Some Results on Propriety of the Posterior for Linear Regression With Correlated and/or Heterogeneous Errors. Journal of Multivariate Analysis. 2006;97:1185–1207.
- Daniels MJ, Hogan JW. Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. Chapman & Hall/CRC Press; Boca Ration, FL: 2008.
- Daniels MJ, Kass RE. Shrinkage Estimators for Covariance Matrices. Biometrics. 2001;57:1173–1184. [PMC free article] [PubMed]
- Daniels MJ, Normand SL. Longitudinal Profiling of Health Care Units Based on Continuous and Discrete Patient Outcomes. Biostatistics. 2006;7:1–15. [PMC free article] [PubMed]
- Dunson DB. Bayesian Latent Variable Models for Clustered Mixed Outcomes. Journal of the Royal Statistical Society, Ser. B. 2000;62:355–366.
- Dunson DB. Dynamic Latent Trait Models for Multidimensional Longitudinal Data. Journal of the American Statistical Association. 2003;98:555–563.
- Fitzmaurice GM, Laird NM. Regression Models for a Bivariate Discrete and Continuous Outcome With Clustering. Journal of the American Statistical Association. 1995;90:845–852.
- Gelman A, Mechelen IV, Verbeke G, Heitjan DF, Meulders M. Multiple Imputation for Model Checking: Completed-Data Plots With Missing and Latent Data. Biometrics. 2004;61:74–85. [PubMed]
- Gelman A, Meng X-L, Stern H. “Posterior Predictive Assessment of Model Fitness via Realized Discrepancies” (with discussion) Statistica Sinica. 1996;6:733–807.
- George EI, McCulloch RE. Variable Selection via Gibbs Sampling. Journal of the American Statistical Association. 1993;88:881–889.
- George EI, McCulloch RE. Approaches for Bayesian Variable Selection. Statistica Sinica. 1997;7:339–373.
- Geweke J. Efficient Simulation From the Multivariate Normal and Student
*t*-Distributions Subject to Linear Constraints; Computer Sciences and Statistics, Proceedings of the 23rd Symposium Interface; 1991.pp. 571–578. - Gueorguieva RV, Agresti A. A Correlated Probit Model for Joint Modelling of Clustered Binary and Continuous Responses. Journal of the American Statistical Association. 2001;96:1102–1112.
- Gueorguieva RV, Sanacora G. Joint Analysis of Repeatedly Observed Continuous and Ordinal Measures of Disease Severity. Statistics in Medicine. 2006;25:1307–1322. [PubMed]
- Hjort NL, Dahl FA, Steinbakk GH. Post-Processing Posterior Predictive p Values. Journal of the American Statistical Association. 2006;101:1157–1174.
- Hoeting J, Raftery AE, Madigan D. A Method for Simultaneous Variable Selection and Outlier Identification in Linear Regression. Computational Statistics and Data Analysis. 1996;22:251–270.
- Joe H. Generating Random Correlation Matrices Based on Partial Correlations. Journal of Multivariate Analysis. 2006;97:2177–2189.
- Kass RE, Wasserman L. The Selection of Prior Distributions by Formal Rules. Journal of the American Statistical Association. 1996;91:1343–1370. Corr. (1998), 93, 412.
- Liu C, Rubin DB, Wu YN. Parameter Expansion to Accelerate EM: The PX-EM Algorithm. Biometrika. 1998;85:755–770.
- Liu JS, Wu Y. Parameter Expansion for Data Augmentation. Journal of the American Statistical Association. 1999;94:1264–1274.
- Liu XF. unpublished doctoral dissertation. University of Florida, Dept. of Statistics; 2006. Bayesian Methodology for Models With Multivariate (Longitudinal) Outcomes.
- Liu XF. Parameter Expansion for Sampling a Correlation Matrix: An Efficient GPX-RPMH Algorithm. Journal of Statistical Computation and Simulation. 2008;78:1065–1076.
- Liu XF, Daniels MJ. A New Algorithm for Simulating a Correlation Matrix Based on Parameter Expansion and Re-Parameterization. Journal of Computational and Graphical Statistics. 2006;16:897–914.
- Marcus BH, Albrecht AE, King TK, Parisi AF, Pinto BM, Roberts M, Niaura RS, Abrams DB.
^{“}The Efficacy of Exercise as an Aid for Smoking Cessation in Women. Archives of Internal Medicine. 1999;36:479–492. [PubMed] - Marcus BH, King TK, Albrecht AE, Parisi AF, Abrams D. Rationale, Design, and Baseline Data for Commit to Quit: An Exercise Efficacy Trial for Smoking Cessation Among Women. Preventive Medicine. 1997;26:586–597. [PubMed]
- Marcus BH, Lewis BA, Hogan J, King TK, Albrecht AE, Bock B, Parisi AF, Niaura R, Abrams DB. The Efficacy of Moderate-Intensity Exercise as an Aid for Smoking Cessation in Women: A Randomized Controlled Trial. Nicotine & Tobacco Research. 2005;7:871–880. [PubMed]
- Marcus BH, Lewis BA, King TK, Albrecht AE, Hogan J, Bock B, Parisi AF, Abrams DB. Rationale, Design, and Baseline Data for Commit to Quit II: An Evaluation of the Efficacy of Moderate-Intensity Physical Activity as an Aid to Smoking Cessation in Women. Preventive Medicine. 2003;36:479–492. [PubMed]
- Pourahmadi M, Daniels MJ. Dynamic Conditionally Linear Mixed Models for Longitudinal Data. Biometrics. 2002;58:225–231. [PMC free article] [PubMed]
- Regan MM, Catalano PJ. Likelihood Models for Clustered Binary and Continuous Outcomes: Application to Developmental Toxicology. Biometrics. 1999;55:760–768. [PubMed]
- Ritter C, Tanner MA. Facilitating the Gibbs Sampler: The Gibbs Stopper and the Griddy-Gibbs Sampler. Journal of the American Statistical Association. 1992;87:861–868.
- Robert CP. Simulation of Truncated Normal Variables. Statistics and Computing. 1995;5:121–125.
- Robins JM, Ventura V, van der Vaart A. Asymptotic Distribution of
*P*Values in Composite Null Models. Journal of the American Statistical Association. 2000;95:1143–1156. - Roy J, Lin X. Latent Variable Models for Longitudinal Data With Multiple Continuous Outcomes. Biometrics. 2000;56:1047–1054. [PubMed]
- Sammel MD, Ryan LM, Legler JM. Latent Variable Models for Mixed Discrete and Continuous Outcomes. Journal of the Royal Statistical Society, Ser. B. 1997;59:667–678.
- Smith M, Kohn R. Parsimonious Covariance Matrix Estimation for Longitudinal Data. Journal of the American Statistical Association. 2002;97:1141–1153.
- Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian Measures of Model Complexity and Fit” (with discussion) Journal of the Royal Statistical Society, Ser. B. 2002;64:583–639.
- Tanner MA, Wong WH.
^{“}The Calculation of Posterior Distribution by Data Augmentation” (with discussion) Journal of the American Statistical Association. 1987;82:528–550. - van Dyk DA, Meng XL. The Art of Data Augmentation” (with discussion) Journal of Computational and Graphical Statistics. 2001;10:1–111.
- Wakefield JC, Bennett JE. The Bayesian Modelling of Covariates for Population Pharmacokinetic Models. Journal of the American Statistical Association. 1996;91:917–927.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's Canada Institute for Scientific and Technical Information in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |