Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2928653

Formats

Article sections

- Abstract
- 1. Introduction
- 2. The joint longitudinal and survival model
- 3. Estimation
- 4. Model comparison
- 5. Application
- 6. Discussion
- REFERENCES

Authors

Related links

Ann Appl Stat. Author manuscript; available in PMC 2010 August 26.

Published in final edited form as:

Ann Appl Stat. 2009 September 1; 3(3): 1163–1182.

PMCID: PMC2928653

NIHMSID: NIHMS167978

See other articles in PMC that cite the published article.

We present a new joint longitudinal and survival model aimed at estimating the association between the risk of an event and the change in and history of a biomarker that is repeatedly measured over time. We use cubic B-splines models for the longitudinal component that lend themselves to straight-forward formulations of the slope and integral of the trajectory of the biomarker. The model is applied to data collected in a long term follow-up study of HIV infected infants in Uganda. Estimation is carried out using MCMC methods. We also explore using the deviance information criteria, the conditional predictive ordinate and ROC curves for model selection and evaluation.

In longitudinal studies it is common to monitor one or more biomarkers repeatedly over time while following participants until the occurrence of an event. Researchers are often interested in examining both the repeated measures and the time to event to gain an understanding of the underlying disease process. Additionally, the risk of an event may not depend solely on the level of the biomarker but also on the rate at which that biomarker is changing or its past average level. For example, two patients may present with the same biomarker value, but one patient's biomarker trajectory may be increasing while the other's is remaining constant. Their prognosis may appear to be be the same if only the current value is accounted for, but in fact the patient with the increasing biomarker may be at higher risk. In this paper we present a model to estimate the association between the risk of an event and the current value, as well as the rate of change or history of a longitudinally sampled biomarker. We illustrate the approach from a sub-study of HIVNET 012 [Guay et al. (1999); Jackson et al. (2003)] of HIV disease progression in Ugandan children who acquired HIV vertically, either in utero, during delivery or via breastmilk. In this sub-study children who tested positive before 18 months of age were followed until five years of age with blood samples drawn every 6 months. From these samples, the lab determined viral load, total lymphocyte count (TLC), CD4 percent and other related biomarkers. One aim was to determine the predictive value of these measures for time until death. In this manuscript we explore the association between change in and history of these markers and the risk of death. Overall, 128 children were followed after their first positive HIV test with lab measurements taken at regular intervals. Of these 128 infants, 70 died during follow-up.

In estimating the association between trends in a biomarker or set of biomarkers and the risk of an event, we face two important and distinct challenges. The first is selecting the correct model when the biomarker is collected in discrete time with error. The second is to determine how to obtain different summaries of the biomarker(s) and include them in the time to event model.

The first issue and its resolution through joint modeling of the time to event and longitudinal marker has been studied extensively as summarized by Tsiatis and Davidian (2004) when dealing with the current level of a biomarker and the risk of an event. In summary, survival analyses with time-dependent covariates can be biased if we simply include the raw measurements in the survival analysis [Prentice (1982)]. Joint longitudinal and survival models resolve this issue by modeling the biomarker process over time and including subject-specific parameters from the longitudinal model as covariates in the survival model. These same issues exist when trying to include other summaries of the biomarker process in the survival model, and may, in fact, be exacerbated.

Wulfsohn and Tsiatis (1997) and Faucett and Thomas (1996) introduced likelihood-based methods for analyzing a longitudinal marker and its association with the time to event simultaneously. This class of models is not restricted to linking the time-varying value of the longitudinal marker to the time-varying hazard through a regression on the current value of longitudinal model. They may instead include other information from the longitudinal trajectories summarized by the longitudinal model. Several authors have proposed models that group the trajectories into latent classes, then link the hazard and the longitudinal model by the latent class [Lin et al. (2002); Proust-Lima, Letenneur and Jacqmin-Gadda (2007); Han, Slate and Pena (2007); Proust-Lima et al. (2009)]. These models can help characterize longitudinal trajectories that may indicate a higher risk for event. In these models the latent class for the trajectory of the biomarker is defined based on the entire follow-up for the biomarker, which may have the drawback of using information about the biomarker from the future to estimate risk in the present.

Other descriptors from the longitudinal model have also been used to link the trajectory of the biomarker to risk of event. Yu, Taylor and Sandler (2008) developed individual risk prediction models for prostate cancer recurrence based on PSA trajectories. PSA was modeled using a nonlinear exponential decay and exponential growth model. The current value as well as slope (first derivative) of the PSA trajectory were included as time-varying covariates in the hazard model. Ye, Lin and Taylor (2008) presented a two-stage regression calibration approach for modeling longitudinal and time-to-event data. Their longitudinal model included smoothing splines at the population level with individual deviations from the smoothing spline allowed through a mean 0 integrated Wiener process and subject-specific slopes and intercepts. Both the current value and slope of the subject-specific trajectories were included as covariates in the hazard model.

In this manuscript we extend the work of Brown, Ibrahim and DeGruttola (2005) who used cubic B-splines to model the impact of multiple biomarkers on time to event to include the slope and integral of the cubic B-spline models as time-varying covariates in the hazard model. They showed that cubic B-splines provided flexibility in the biomarker model that simple parametric models could not. Because cubic B-spline trajectory models and their slopes and integrals are linear functions of the parameters that do not increase exponentially with time, they avoid computational instability. Additionally, because the basis functions of the cubic B-splines weight more heavily on local (in time) information for estimating the value of the trajectory, estimates of the slope early in follow-up do not rely heavily on values of the biomarker observed late in follow-up.

The paper proceeds as follows. In the next section we review cubic B-spline models for longitudinal data and outline the model associating change and cumulative exposure with time to event. Next, we discuss estimation and model selection procedures. We then show an example from HIVNET 012. We conclude with a discussion.

In this section we describe a joint longitudinal and survival model to estimate the association between the rate of change in a biomarker or cumulative history of a biomarker and the risk of an event. We begin with a description of the notation and a review of the longitudinal cubic B-spline model, including expressions for the first derivatives and integrals. We then introduce the model linking the biomarker and its rate of change or cumulative history to the risk of event.

Let *y _{ijl}* denote the

$${y}_{\mathit{ijl}}={\psi}_{l}\left({t}_{\mathit{ij}}\right)+{e}_{\mathit{ijl}},$$

where the errors are independent and normally distributed such that (*e*_{ij1}, …, *e _{ijL}*)′ ~

$${\psi}_{l}\left(t\right)=\sum _{k=1}^{q}{\beta}_{\mathit{ilk}}{B}_{k}\left(t\right),$$

(2.1)

where {*B _{k}*(·)} is a -dimensional basis for spline functions on [0,

An individual's contribution to the likelihood of the longitudinal marker can be expressed as

$$p({Y}_{i}\mid \Sigma ,{\beta}_{i})\propto \frac{1}{{\mid \Sigma \mid}^{{m}_{i}}}\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\left\{-\frac{1}{2}\sum _{j=1}^{{m}_{i}}{({Y}_{\mathit{ij}}-\psi \left({t}_{\mathit{ij}}\right))}^{\prime}{\Sigma}^{-1}({Y}_{\mathit{ij}}-\psi \left({t}_{\mathit{ij}}\right))\right\},$$

(2.2)

where *Y _{ij}* = (

We next review the form of the joint model when we are only interested in modeling the relationship between the level of the marker at time *t* and the hazard at time *t*. We then show how the spline model of the trajectory can be used to describe the rate of change in and history of the biomarker. Next, we incorporate these functionals into the hazard, thus extending the joint model to estimate the impact of the rate of change and history on the risk of an event.

The usual joint longitudinal and survival model assumes proportional hazards, and the effect of the trajectories of the biomarkers on the hazard is modeled as

$$\lambda \left(t\right)={\lambda}_{0}\left(t\right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}({\gamma}^{\prime}\psi \left(t\right)+{Z}_{i}^{\prime}\zeta ),$$

(2.3)

where *λ*_{0}(*t*) is the baseline hazard at time *t*, *γ* is a parameter vector of length *L* linking the trajectory vector at time *t* to the hazard at time *t*, and *ζ* is a vector of parameters of length *p _{z}* linking the vector of baseline covariates

$${\lambda}_{0}\left(t\right)={\lambda}_{j},\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}{w}_{j}\le t<{w}_{j+1},j=1,\dots ,J,$$

where the *w _{j}*'s are the jump points with

Then the cumulative hazard,

$${\int}_{0}^{{s}_{i}}\lambda \left(u\right){e}^{{\gamma}^{\prime}\psi \left(u\right)+{z}_{i}^{\prime}\zeta}\mathit{du},$$

can be rewritten as

$${e}^{{z}_{i}^{\prime}\zeta}\sum _{j=1}^{J}{H}_{\mathit{ij}}(\beta ,\gamma ,\lambda ),$$

where

$${H}_{\mathit{ij}}(\beta ,\gamma ,\lambda )=I\{{s}_{i}\ge {u}_{j-1}\}{\lambda}_{j}{\int}_{{u}_{j-1}}^{\mathrm{min}({u}_{j},{s}_{i})}{e}^{{\gamma}^{\prime}\psi \left(u\right)+{z}_{i}^{\prime}\zeta}\mathit{du}$$

(2.4)

and *I* {*s _{i}* ≥

Extending the model to include the relationship between another function or functions of the time-varying covariates and the hazard requires adding another term to the model. If we are interested in the relationship between rate of change of the biomarker and the hazard, we include the first derivative of *ψ*(*t*). If we are interested in the relationship between the cumulative history of the biomarker and the hazard, we include the integral of *ψ*(*t*).

For clarity, we drop the subscripts for definition of the first derivative and integral. The first derivative of the trajectory function as shown by de Boor [(2001), page 116] can be expressed as

$${\psi}^{\prime}\left(t\right)={B}^{\left(2\right)\ast}{\left(t\right)}^{\prime}\beta ,$$

(2.5)

where *B*^{(2)}* (*t*) is a vector of length *q* with the *j*th element equal to $\frac{{B}_{j}^{\left(2\right)}\left(t\right)}{{u}_{j+3}-{u}_{j}}-\frac{{B}_{j+1}^{\left(2\right)}\left(t\right)}{{u}_{j+4}-{u}_{j+1}}$, and *B*^{(2)} (*t*) is the quadratic B-spline based on the same sequence of knots as the original B-spline in (2.1) with the first and last knot removed. Equation (2.5) is written as a linear function of the elements of *β _{i}*; however, it is a quadratic B-spline and still retains many of the desirable properties of B-splines mentioned earlier.

The general idea for calculating the integral of the cubic B-spline was laid out by de Boor [(2001), page 128]. We derived the integral of the trajectory up to time *t* to be

$${\int}_{0}^{t}\psi \left(v\right)\mathit{dv}={\psi}^{(-1)}\left(t\right)={B}^{\ast \left(4\right)}{\left(t\right)}^{\prime}\beta ,$$

(2.6)

where *B**^{(4)}(*t*) is a vector of length *q* with *j*th element equal to ${\Sigma}_{k=j+1}^{q+1}\phantom{\rule{thinmathspace}{0ex}}{B}_{k}^{\left(4\right)}\left(t\right)\times \left(\frac{{u}_{j+4}-{u}_{j}}{4}\right)$, and $\left\{{B}_{k}^{\left(4\right)}(\cdot )\right\}$ is the vector of *q* + 1-dimensional basis of a quartic B-spline based on the knot vector *u* augmented by two arbitrary knots, *u*_{0} < *u*_{1} and *u*_{q+5} > *u*_{q+4}.

To link the rate of change and history of the trajectories to the hazard, we use the following regression model:

$$\lambda \left(t\right)={\lambda}_{0}\left(t\right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\{{\gamma}^{\prime}\psi \left(t\right)+{\gamma}_{s}^{\prime}{\psi}^{\prime}\left(t\right)+{\gamma}_{h}^{\prime}{\psi}^{(-1)}\left(t\right)+{Z}_{i}^{\prime}\zeta \},$$

(2.7)

where *γ _{s}* is the

As in the case where only the trajectory value at time *t* is linked to the hazard at time *t*, the cumulative hazard can be written as

$${e}^{{z}_{i}^{\prime}\zeta}\sum _{j=1}^{J}{H}_{\mathit{ij}}(\beta ,\gamma ,{\gamma}_{s},{\gamma}_{h},\lambda ),$$

where

$$\begin{array}{cc}\hfill {H}_{\mathit{ij}}& (\beta ,\gamma ,{\gamma}_{s},{\gamma}_{h},\lambda )\hfill \\ \hfill & \phantom{\rule{thinmathspace}{0ex}}=I\{{s}_{i}\ge {u}_{j-1}\}{\lambda}_{j}\hfill \\ \hfill & \phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\times {\int}_{{u}_{j-1}}^{\mathrm{min}({u}_{j},{s}_{i})}\mathrm{exp}\{{\gamma}^{\prime}\psi \left(u\right)+{\gamma}_{s}^{\prime}{\psi}^{\prime}\left(u\right)+{\gamma}_{h}^{\prime}{\psi}^{(-1)}\left(u\right)\}\mathit{du}\hfill \end{array}$$

(2.8)

and *γ _{s}* = (

The distribution for an individual's time to event, *s _{i}*, given the trajectory function and choice of hazard regression, is given by

$$\begin{array}{c}\hfill f({s}_{i},{v}_{i}\mid {Y}_{i})=\lambda {\left({s}_{i}\right)}^{{v}_{i}}\mathrm{exp}\left\{{v}_{i}({\gamma}^{\prime}\psi \left({s}_{i}\right)+{\gamma}_{s}^{\prime}{\psi}^{\prime}\left({s}_{i}\right)+{\gamma}_{h}^{\prime}{\psi}^{(-1)}\left({s}_{i}\right)+{z}_{i}^{\prime}\zeta )\right\}\hfill \\ \hfill \times \phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\left\{-{e}^{{z}_{i}^{\prime}\zeta}\sum _{j=1}^{J}{H}_{\mathit{ij}}(\beta ,\gamma ,{\gamma}_{s},{\gamma}_{h},\lambda )\right\},\hfill \end{array}$$

(2.9)

where *ν _{i}* is the censoring indicator for subject

We can now express the *i*th subject's contribution to the joint likelihood function as

$$\begin{array}{cc}\hfill f({Y}_{i},{s}_{i},{v}_{i})=& f({s}_{i},{v}_{i}\mid {Y}_{i})\times f\left({Y}_{i}\right),\hfill \\ \hfill f({Y}_{i},{s}_{i},{v}_{i})\propto & \lambda {\left({s}_{i}\right)}^{{v}_{i}}\mathrm{exp}\left\{{v}_{i}({\gamma}^{\prime}\psi \left({s}_{i}\right)+{\gamma}_{s}^{\prime}{\psi}^{\prime}\left({s}_{i}\right)+{\gamma}_{h}^{\prime}{\psi}^{(-1)}\left({s}_{i}\right)+{z}_{i}^{\prime}\zeta )\right\}\hfill \\ \hfill & \times \mathrm{exp}\left\{-{e}^{{z}_{i}^{\prime}\zeta}\sum _{j=1}^{J}{H}_{\mathit{ij}}(\beta ,\gamma ,{\gamma}_{s},{\gamma}_{h},\lambda )\right\}\hfill \\ \hfill & \times \frac{1}{{\mid \Sigma \mid}^{{m}_{i}}}\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\left\{-\frac{1}{2}\sum _{j=1}^{{m}_{i}}{({Y}_{\mathit{ij}}-\psi \left({t}_{\mathit{ij}}\right))}^{\prime}{\Sigma}^{-1}({Y}_{\mathit{ij}}-\psi \left({t}_{\mathit{ij}}\right))\right\}.\hfill \end{array}$$

We then specify the prior distributions for the parameters in the likelihood as follows. We assume (*γ*, *γ _{s}*,

We use the Gibbs sampler [Gelfand and Smith (1990)] to sample from the posterior distribution of the parameters. Because the full conditionals of *γ*, *γ _{s}*,

We examine two statistics for model comparison, the deviance information criterion (DIC) [Spiegelhalter et al. (2002)] and the conditional predictive ordinate (CPO) [Gelfand, Dey and Chang (1992)].

The DIC is a measure of the deviance penalized for the number of parameters in the model which may be difficult to ascertain in hierarchical models and is therefore estimated. The DIC is the sum of the deviance estimated using the posterior estimates of the parameters, *D*(ϴ̄), and twice the effective number of parameters, *p _{D}*. The effective number of parameters is estimated by ${p}_{D}=\stackrel{\u2012}{D\left(\u03f4\right)}-D\left(\stackrel{-}{\u03f4}\right)$, where $\stackrel{\u2012}{D\left(\u03f4\right)}$ is the posterior mean of the deviance (the average of the deviances calculated using the estimated parameters at each step of the MCMC sampler). For the model presented in this paper, the DIC can be expressed as

$$\mathit{DIC}=2\frac{1}{G}\sum _{g=1}^{G}\sum _{i=1}^{N}\mathrm{log}\left\{f({s}_{i},{v}_{i},{Y}_{i}\mid {\Theta}^{\left(g\right)})\right\}-\sum _{i=1}^{N}\mathrm{log}\left\{f({s}_{i},{v}_{i},{Y}_{i}\mid \stackrel{-}{\Theta})\right\},$$

where ϴ^{(g)} denotes the parameter samples at the *g*th, *g* = 1,…, *G*, iteration of the Gibbs sampler and ϴ̄ represents the means of the posterior samples. A smaller DIC indicates a better fit when comparing models.

For the *i*th observation, the CPO statistic is defined as

$${\mathit{CPO}}_{i}=f({s}_{i},{v}_{i},{Y}_{i}\mid {D}^{(-i)})=\int f({s}_{i},{v}_{i},{Y}_{i}\mid \Theta ,{D}_{i})\pi (\Theta \mid {D}^{(-i)})d\Theta ,$$

(4.1)

where ϴ denotes the model parameters, *D _{i}* denotes the

$$\widehat{{\mathit{CPO}}_{i}}={\left(\frac{1}{G}\sum _{g=1}^{G}\frac{1}{f({s}_{i},{v}_{i},{Y}_{i}\mid {\Theta}^{\left(g\right)})}\right)}^{-1},$$

where ϴ^{(g)} denotes the parameter samples at the *g*th, *g* = 1,…, *G*, iteration of the Gibbs sampler. A large *CPO* value indicates a better fit. We can compare different models using the sums of the logs of the CPOs of the individual observations, also known as the log pseudo-marginal likelihood (LPML). Models with greater $\mathrm{LPML}=\Sigma \phantom{\rule{thinmathspace}{0ex}}\mathrm{log}\left(\widehat{{\mathit{CPO}}_{i}}\right)$ values will indicate a better fit. Both the DIC and LPML are designed to measure a model's predictive ability, although the DIC is based on a penalized deviance approach and the LPML is based on a cross-validated approach.

HIVNET 012, conducted in Uganda, was a double-blinded controlled randomized trial of single dose nevirapine for the mother and newborn infant versus AZT administered only to the mother to prevent mother to child transmission (MTCT) of HIV. In spite of the success of nevaripine in reducing the risk of transmission, many infants still experienced MTCT of HIV. To better understand HIV infection in young children, 128 of those infants were enrolled in a long term follow-up study and followed until age 5 or death. CD4 cell percent and HIV viral load are known indicators of disease progression. TLC is also being studied for its potential use in resource-poor settings where routine CD4 and viral load monitoring may be cost prohibitive. In this section we examine the association between longitudinal measures of CD4 cell percent, viral load and TLC and time until death using separate models (one biomarker per model) in the HIVNET 012 long term follow-up study using the methods proposed in the previous sections.

Seventy infants died during follow-up. Jump points for the baseline hazard function were selected based on quantiles of the event times so that approximately 8 events occurred between the jump points. Infants had between one and thirteen longitudinal measures with a median of four. Overall, there were a total of 594 measurements of CD4 percent, 603 measurements of viral load and 763 measurements of TLC. 13% of the CD4 percent measures, 7% of the viral load measures and 16% of the TLC measures are taken at time 0. In this analysis we placed the knots for the splines based on the quantiles of the data; therefore, this clumping of measurement times limits the number of knots we could select before we start placing multiple knots at 0, making the slope undefined. Therefore, for CD4 percent, we fit models with *q* = 5,…, 10. For TLC, we fit models with *q* = 5,…, 9. For comparison to potential models with more basis functions, we also=fit models with equally spaced knots with *q* = 11 for CD4 percent and *q* = 10 for TLC. For viral load, we can fit models with *q* = 5,…, 19. Additionally, we included age at first positive HIV test as a covariate in the hazard model; therefore, the interpretation of *ζ* is the change in the log hazard associated with a 1 month increase in the age of the infant at the time of the first positive HIV test. We implemented the Gaussian quadrature procedure using 10 points. We also ran models with higher *q* using 50 points and found no difference in the estimates.

Table 1 shows the estimated values of the parameters of interest along with their 95% credible intervals obtained from fitting the data to three versions of the hazard model shown in equation (2.7), one with *γ _{s}* = 0 and

For viral load, DIC selected *q* = 6 for all three models. The LPML increased as *q* increased except for the model with *γ* and *γ _{s}* where it selected

For CD4 percent, the DIC selected *q* = 7 for all three models. The LPML selected *q* = 7 for the current value and current value plus slope models and *q* = 8 for the current value plus history model (although the value was close to that for *q* = 7). Here, we focus on the results for *q* = 7. A 10 unit decrease in CD4 percent is associated with a 2.6-fold increase in the hazard [95% CI: (1.86, 3.82)]. There is no suggestion of a further association between risk of death and the change in CD4 percent or its history. Figure 2 shows the fit of the spline model to the observed CD4 percent data for 12 infants. The majority of the infants in Figure 2 have an initial sharp decline in CD4 percent which is fit well by the model without forcing decreases where the data do not suggest it (infant 6).

For TLC, the DIC and LPML selected *q* = 9. The density of measurements near time zero forced quantile-based knots very close together if not equal shortly after infection leading to undefined slopes for models with *q* > 9. Using equally spaced knots did not result in improved DIC or LPML. Here we focus on the results for *q* = 9. A possible association between TLC and risk of death was suggested in this model with a 1000 unit increase in TLC being associated with a 34% [95% CI: (20, 49)] decrease in the risk of death. Figure 3 shows the fit from the longitudinal models for TLC.

The models all estimate a similar association between age at first positive test and risk of death, with a one month increase in age at first positive test associated with an approximate 10% reduction in risk given the trajectory of the longitudinal marker.

Figure 4 shows the posterior estimate of the cumulative hazards for the three markers when only the current value of the marker is included in the model. TLC provide better fits to the Kaplan–Meier estimate in the first year, while viral load provides a better fit between 2 and 3 years.

Fitted cumulative hazard curves for the selected models for TLC, viral load and CD4 percent. The solid and dashed stepped lines represent the Kaplan–Meier fit with 95% confidence intervals, with the small vertical lines representing censored observation **...**

We propose using ROC curves for censored data [Heagerty, Lumley and Pepe (2000)] as an additional comparison of joint longitudinal and survival models. We plotted ROC curves for predicting death within 1 year based on linear predictor, $\gamma \psi \left(t\right)+{\gamma}_{s}{\psi}^{\prime}\left(t\right)+{\gamma}_{h}{\psi}^{(-1)}\left(t\right))+{Z}_{i}^{\prime}\zeta $, at *t* = 6, 12 and 18 months infection (Figure 5). Taking *t* = 6 months as an example, we treat the baseline time for survival as 6 months, and calculate the ROC curve for death within 12 months (18 months after infection) with the linear predictor calculated at 6 months as the biomarker. Infants who died or were censored before 6 months are not included. The same procedure was used for 12 and 18 months. These results do not suggest any large improvement in prediction when either the rate of change in or history of the biomarker are included in the models, except possibly for including slope in the viral load model. This agrees with the results from the model and the DIC and LPML. Overall, TLC does not compare favorably to either viral load or CD4 percent in predicting death within one year, and viral load appears to be the best predictor.

Understanding the relationship between trends in a biomarker and risk of an event can yield important insight into the mechanisms of disease progression. Clearly, this is recognized scientifically, as the *Journal of the American Medical Association* recently made an exception to its own policies on publishing data over five years old to report an analysis of the association between CD4 slopes estimated from linear mixed effects models and time to development of AIDS or death in antiretroviral naive HIV-infected adults [Mellors et al. (2007)]. Here, we present a model that goes several steps further to addressing that question by jointly modeling the time to event and the slope and looking at local changes as opposed to long term trends. Additionally, we do not restrict ourselves to a linear model that assumes constant rate of change. This model is motivated and illustrated by a long-term follow-up study of HIV-infected children in Africa. However, it can be used in many settings where understanding the relationship between trends in a biomarker and time to an event is of interest. Other examples may include cognitive function and death or subclinical coronary disease and clinical coronary events.

This model gives interpretable parameters, although these values are difficult to translate into practice. For example, although we estimate that a 10-fold (1 log unit) increase per month in viral load is associated with a 2.8-fold increase in the hazard, we cannot easily measure the instantaneous change in viral load in practice. To do so, we would have to measure it frequently, which is not feasible in practice, especially in resource-poor settings. However, this approach may indicate if there is more information available in collected information than just the current value and what that information might be (change versus average history, for example).

We have proposed a model that provides additional insight into the biological processes of disease progression by linking the hazard of the event to the rate of change or cumulative history of a biomarker. This model expands the possibilities of examining disease progression within the class of joint longitudinal and survival models.

The author thanks an Associate Editor and the referees for their comments that have greatly improved the presentation of the article.

We use Gibbs sampling to sample from the joint posterior distribution of the parameters: *β*, *α*, *γ*, *γ _{s}*,

- Let
*β*=_{il}*(β*_{il1},…,*β*_{ilq})^{′},*i*= 1, …,*N, l*= 1, …,*L*. Then use ARS to sample*[β*|_{il}*rest,D*] from$$\begin{array}{cc}\hfill p({\beta}_{i}\mid \mathit{rest},D)\propto \mathrm{exp}& \left\{-\frac{1}{2}\left[\sum _{j=1}^{{m}_{i}}{({Y}_{\mathit{ij}}-{\psi}_{\beta}\left({t}_{\mathit{ij}}\right))}^{\prime}{\Sigma}^{-1}({Y}_{\mathit{ij}}-{\psi}_{\beta}\left({t}_{\mathit{ij}}\right))+\sum _{l=1}^{L}{({\beta}_{\mathit{il}}-{b}_{0l}-{X}_{i}^{\prime}{\alpha}_{l})}^{\prime}{V}_{0l}^{-1}({\beta}_{\mathit{il}}-{b}_{0l}-{X}_{i}^{\prime}{\alpha}_{l})\right]\right\}\hfill \\ \hfill & \phantom{\rule{1em}{0ex}}\times \mathrm{exp}\left\{{v}_{i}({\gamma}^{\prime}\psi \left({s}_{i}\right)+{\gamma}_{s}^{\prime}{\psi}^{\prime}\left({s}_{i}\right)+{\gamma}_{h}^{\prime}{\psi}^{(-1)}\left({s}_{i}\right)+{z}_{i}^{\prime}\zeta )-{e}^{{z}_{i}^{\prime}\zeta}\sum _{j=1}^{J}{H}_{\mathit{ij}}(\beta ,\gamma ,{\gamma}_{s},{\gamma}_{h},\lambda )\right\}.\hfill \end{array}$$ - Sample$$\begin{array}{cc}\hfill & [{V}_{0l}^{-1}\mid \mathit{rest},D]\hfill \\ \hfill & \phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}~\text{Wishart}\left({\left({S}_{{v}_{0l}}^{-1}+\sum _{i=1}^{N}({\beta}_{\mathit{il}}-{b}_{0l}-{x}_{i}^{\prime}\alpha ){({\beta}_{\mathit{il}}-{b}_{0l}-{x}_{i}^{\prime}\alpha )}^{\prime}\right)}^{-1},N+{v}_{{v}_{0l}}\right).\hfill \end{array}$$
- Sample$$\begin{array}{cc}\hfill & [{\Sigma}^{-1}\mid \mathit{rest},D]\hfill \\ \hfill & \phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}~\text{Wishart}\left({\left({S}_{\Sigma}^{-1}+\sum _{i=1}^{N}\sum _{j=1}^{{m}_{i}}({Y}_{\mathit{ij}}-\psi \left({t}_{\mathit{ij}}\right)){({Y}_{\mathit{ij}}-\psi \left({t}_{\mathit{ij}}\right))}^{\prime}\right)}^{-1},\sum _{i=1}^{N}{m}_{i}+{v}_{\Sigma}\right).\hfill \end{array}$$
- Let
*b*_{0l}= (*b*_{0l1}, …,*b*_{0lq})′,*i*= 1, …,*N*,*l*= 1, …,*L*. Then Samplewhere$$[{b}_{0l}\mid \mathit{rest},D]~{N}_{2}({\mu}_{{b}_{0l}},{\Sigma}_{{b}_{0l}}),$$and$${\mu}_{{b}_{0l}}={\Sigma}_{{b}_{0l}}\left({V}_{0l}^{-1}\sum _{i=1}^{N}({\beta}_{\mathit{il}}-{x}_{i}^{\prime}\alpha )+{A}_{1l}^{-1}{A}_{0l}\right)$$$${\Sigma}_{{b}_{0l}}={({\mathit{NV}}_{0l}^{-1}+{A}_{1l}^{-1})}^{-1}.$$ - Use ARS to sample [
*γ, γ*|_{s}, γ_{h}*rest, D*] from$$\begin{array}{cc}\hfill p(\gamma \mid \mathit{rest},D)\propto & \mathrm{exp}\left\{\sum _{i=1}^{N}\left({v}_{i}({\gamma}^{\prime}\psi \left({s}_{i}\right)+{\gamma}_{s}^{\prime}{\psi}^{\prime}\left({s}_{i}\right)+{\gamma}_{h}^{\prime}{\psi}^{(-1)}\left({s}_{i}\right)+{z}_{i}^{\prime}\zeta )-{e}^{{z}_{i}^{\prime}\zeta}\sum _{j=1}^{J}{H}_{\mathit{ij}}(\beta ,\gamma ,{\gamma}_{s},{\gamma}_{h},\lambda )\right)\right\}\hfill \\ \hfill & \phantom{\rule{thinmathspace}{0ex}}\times \mathrm{exp}\left\{-\frac{1}{2}{({(\gamma ,{\gamma}_{s},{\gamma}_{h})}^{\prime}-{g}_{0})}^{\prime}{g}_{1}^{-1}({(\gamma ,{\gamma}_{s},{\gamma}_{h})}^{\prime}-{g}_{0})\right\}.\hfill \end{array}$$ - Sample $[{\lambda}_{j}\mid \mathit{rest},D]=\text{gamma}({d}_{0j}+{n}_{j},{\Sigma}_{i=1}^{N}{e}^{{z}_{i}^{\prime}\zeta}{H}_{\mathit{ij}}(\beta ,\gamma ,{\gamma}_{s},{\gamma}_{h},1)+{d}_{1j})$, where
*n*is the number of events in the_{j}*j*th interval and*H*(_{ij}*β, γ*, 1) is*H*(_{ij}*β*,*γ*,*γ*,_{s}*γ*,_{h}*λ*) evaluated with*λ*= 1._{j} - Samplewhere$$[\alpha \mid \mathit{rest},D]~{N}_{p}({\mu}_{\alpha},{\Sigma}_{\alpha}),$$and$${\mu}_{\alpha}={\Sigma}_{\alpha}\left(\sum _{i=1}^{N}{X}_{i}{1}_{q}{V}_{0l}^{-1}({\beta}_{\mathit{il}}-{b}_{0l})+{C}_{1}^{-1}{C}_{0}\right)$$where$${\Sigma}_{\alpha}={\left(\sum _{i=1}^{N}{X}_{i}{1}_{q}{V}_{0l}^{-1}{1}_{q}{X}_{i}^{\prime}+{C}_{1}^{-1}\right)}^{-1},$$
**1**_{q}is a vector of ones of length*q*.

- Brown ER, Ibrahim JG, Degruttola V. A flexible B-spline model for multiple longitudinal biomarkers and survival. Biometrics. 2005;61:64–73. MR2129202. [PubMed]
- Chen M-H, Shao Q-M, Ibrahim JG. Monte Carlo Methods in Bayesian Computation. Springer-Verlag; New York: 2000. MR1742311.
- de Boor C. A Practical Guide to Splines. revised ed. Springer-Verlag; New York: 2001. MR1900298.
- Faucett CL, Thomas DC. Simultaneously modelling censored survival data and repeatedly measured covariates: A Gibbs sampling approach. Statist. Med. 1996;15:1663–1685. [PubMed]
- Gelfand AE, Dey DK, Chang H. Model determination using predictive distributions with implementation via sampling-based methods (with discussion) In: Bernardo JM, Berger JO, Dawid AP, Smith AFM, editors. Bayesian Statistics 4. Oxford University Press; Oxford: 1992. MR1380275.
- Gelfand AE, Smith AFM. Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc. 1990;85:398–409. MR1141740.
- Gilks WR, Wild P. Adaptive rejection sampling for Gibbs sampling. Appl. Statist. 1992;41:337–348.
- Guay L, Musoke P, Fleming T, Bagenda D, Allen M, Nakabiito C, Sherman J, Bakaki P, Ducar C, Deseyve M, Emel L, Mirochnick M, Fowler M, Mofenson L, Miotti P, Dransfield K, Bray D, Mmiro F, Jackson J. Intrapartum and neonatal single-dose nevirapine compared with zidovudine for prevention of mother-to-child transmission of HIV-1 in Kampala, Uganda: HIVNET 012 randomised trial. Lancet. 1999;354:795–802. [PubMed]
- Han J, Slate EH, Pena EA. Parametric latent class joint model for a longitudinal biomarker and recurrent events. Statist. Med. 2007;26:5285–5302. MR2415667. [PubMed]
- Heagerty PJ, Lumley T, Pepe MS. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics. 2000;56:337–344. [PubMed]
- Jackson J, Musoke P, Fleming T, Guay L, Bagenda D, Allen M, Nakabiito C, Sherman J, Bakaki P, Owor M, Ducar C, Deseyve M, Mwatha A, Emel L, Duefield C, Mirochnick M, Fowler M, Mofenson L, Miotti P, Gigliotti M, Bray D, Mmiro F. Intrapartum and neonatal single-dose nevirapine compared with zidovudine for prevention of mother-to-child transmission of HIV-1 in Kampala, Uganda: 18-month follow-up of the HIVNET 012 randomised trial. Lancet. 2003;362:859–868. [PubMed]
- Lin H, Turnbull BW, Mcculloch CE, Slate EH. Latent class models for joint analysis of longitudinal biomarker and event process data: Application to longitudinal prostate-specific antigen readings and prostate cancer. J. Amer. Statist. Assoc. 2002;97 MR1947272.
- Mellors JW, Margolick JB, Phair JP, Rinaldo CR, Detels R, Jacobson LP, Munoz A. Prognostic value of HIV-1 RNA, CD4 cell count, and CD4 cell count slope for progression to AIDS and death in untreated HIV-1 infection. J. Amer. Med. Assoc. 2007;297:2349–2350. [PubMed]
- Neal RM. Slice sampling. Ann. Statist. 2003;31:705–767. MR1994729.
- Plummer M, Best N, Cowles K, Vines K. CODA: Convergence diagnosis and output analysis for MCMC. R News. 2006. pp. 7–11. Available at http://CRAN.R-project.org/doc/Rnews/
- Proust-Lima C, Joly P, Dartigues J, Jacqmin-Gadda H. Joint modelling of multivariate longitudinal outcomes and a time-to-event: A nonlinear latent class approach. Comput. Statist. Data Anal. 2009;53:1142–1154.
- Prentice RL. Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika. 1982;69:331–342.
- Proust-Lima C, Letenneur L, Jacqmin-Gadda H. A nonlinear latent class model for joint analysis of multivariate longitudinal data and a binary outcome. Statist. Med. 2007;26:2229–2245. MR2368118. [PubMed]
- R Development Core Team . R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; Vienna, Austria: 2006. ISBN 3-900051-07-0. Available at http://www.R-project.org.
- Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of complexity and fit. J. Roy. Statist. Soc. Ser. B. 2002;64:583–639. MR1979380.
- Tsiatis A, Davidian M. An overview of joint modeling of longitudinal and time-to-event data. Statist. Sinica. 2004;14:793–818. MR2087974.
- Wulfsohn MS, Tsiatis AA. A joint model for survival and longitudinal data measured with error. Biometrics. 1997;53:330–339. MR1450186. [PubMed]
- Ye W, Lin X, Taylor JMG. Semiparametric modeling of longitudinal measurements and time-to-event data—A two-stage regression calibration approach. Biometrics. 2008;64:1238–1246. [PubMed]
- Yu M, Taylor JMG, Sandler HM. Individual prediction in prostate cancer studies using a joint longitudinal survival cure model. J. Amer. Statist. Assoc. 2008;103:178–187.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |