Search tips
Search criteria 


Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
Biometrics. Author manuscript; available in PMC 2011 February 27.
Published in final edited form as:
PMCID: PMC3045702

A Bayesian Semi-parametric Survival Model with Longitudinal Markers


We consider inference for data from a clinical trial of treatments for metastatic prostate cancer. Patients joined the trial with diverse prior treatment histories. The resulting heterogenuous patient population gives rise to challenging statistical inference problems when trying to predict time to progression on different treatment arms. Inference is further complicated by the need to include a longitudinal marker as a covariate. To address these challenges, we develop a semi-parametric model for joint inference of longitudinal data and an event time. The proposed approach includes the possibility of cure for some patients. The event time distribution is based on a non-parametric Pólya tree prior. For the longitudinal data we assume a mixed effects model. Incorporating a regression on covariates in a non-parametric event time model in general, and for a Póolya tree model in particular, is a challenging problem. We exploit the fact that the covariate itself is a random variable. We achieve an implementation of the desired regression by factoring the joint model for the event time and the longitudinal outcome into a marginal model for the event time and a regression of the longitudinal outcomes on the event time, i.e., we implicitly model the desired regression by modeling the reverse conditional distribution.

Keywords: Bayesian non-parametric models, Pólya tree, survival, regression

1 Introduction

We discuss inference for data from a phase III clinical trial on treatments of metastatic prostate cancer. The challenges include patient heterogeneity due to prior treatment history and the need to include a regression on prostate specific antigen (PSA) as a longitudinal marker. We constuct a semi-parametric Bayesian model to address these challenges. It implements joint inference on event time and longitudinal observations, with the possibility that some patients are cured.

Let T be the event time and Y be the longitudinal covariate. Most existing approaches are based on factoring the joint model as P(T, Y) = P(Y)P(T | Y). The first factor is the longitudinal submodel P(Y), typically assumed to be a mixed model. The second factor is the survival submodel P(T | Y). In the following discussion, we use the terms event time, survival time, time to progression and failure time exchangably. There is an extensive literature on the joint modeling of longtitudinal and event time data without a cured fraction (De Gruttola and Tu, 1994; Tsiatis et al., 1995; Lavalley and De Gruttola, 1996; Wulfsohn and Tsiatis, 1997; Dafni and Tsiatis, 1998; Henderson et al., 2000; Xu and Zeger, 2001; Lin et al., 2002; Ibrahim et al., 2004). A review can be found in Tsiatis and Davidian (2004). Less work has been published on the joint modeling of longitudinal and event time data with cure. Law et al. (2002) proposed a model with the longitudinal process described by an exponential-decay-exponential-growth model and a mixture model to accomodate cure. The imputed values of the longitudinal measurements are covariates in a proportional hazard model. Brown and Ibrahim (2003) and Chen et al. (2004) assume event times to arise from the development of an unobserved number of metastatis-competent tumer (MCT) cells, modeled by a Poisson distribution. Subjects with zero MCT cells constitute the cure group. Yu et al. (2004) provide a recent review of joint longitudinal-survival-cure models.

Specific to modeling PSA, Pauler and Finkelstein (2002) propose a joint analysis using a change-point regression model for PSA trajectory, and a Cox model for cancer recurrence time with time-dependent covariates including functions of longitudinal parameters and imputed PSA mean function. Lin et al. (2002) considered a latent class model to uncover subpopulation structure for both PSA trajectories and a survival outcome. Given latent class membership, the longitudinal marker and outcome are assumed independent. The model assumes class-specific baseline hazard functions and accommodates possibly time-dependent covariates. Yu et al. (2008) investigated individual prediction in prostate cancer studies using a joint longitudinal-survival-cure model. A logistic model is specified for the probability of an individual being in the susceptible group and separate nonlinar mixed effect models are assumed for the cure and susceptible groups. The event-time process is modeled by a proportional hazard model with time-dependent covariates including the current slope and current value of the PSA trajectory.

In the joint analysis of longitudinal and event time data, most researchers assume parametric or semi-parametric models for P(T | Y). However, it is difficult to implement non-parametric models for P(T | Y) because most non-parametric models do not allow straightforward incorporation of a regression on covariates. We propose to use the alternative factorization, P(T, Y) = P(T)P(Y | T). We proceed under the Bayesian paradigm. Choosing a non-parametric model for P(T) is the traditional problem of non-parametric inference for an event time. The model P(Y | T) is part of a convenient factorization of the joint model. It should not be interpreted as a predicting model of Y by future outcome T. We propose a mixed effect model for P(Y | T), with T conveniently included as a univariate covariate. The opposite, including a high-dimensional covariate Y in a non-parametric model for T, would pose challenging technical problems (This is distinct from the related problem of non-parametrically modeling a mean function E(T | Y), e.g., kernel smoothers). Both factorizations lead to a joint model, P(T, Y), describing the dependence between T and Y. It is this joint model that ultimately allows improved prediction of the event time given repeated measurements of the marker. Under the factorization P(T) P(Y | T) the desired regression P(T | Y) is not explicitly parametrized, but implied by Bayes theorem. In a parametric model and using maximum likelihood estimation the same factorization is used in Pawitan and Self (1993) to model longitudinal markers for AIDS patients. They assume Weibull models for the infection time and disease occurrence and a generalized linear model for the longitudinal measurements of T4 counts and T4/T8 ratio, with the intercept and slope being functions of the event times.

We use a Polya tree (PT) prior to model the event time distribution. The reasons for this modeling choice are the possibility to model multimodal distributions reflecting the diversity of the patient population, the computational simplicity, and the easy a priori centering at a parametric model. A PT prior can be constructed to give probability one to the set of continuous or absolutely continuous probability measures (Lavine, 1992).

Muliere and Walker (1997) implemented PT models in a survival analysis. Walker and Mallick (1997; 1999) applied PT priors in hierarchical generalized linear models, frailty models, and accelerated failure time models. Hanson and Johnson (2002) developed a general approach to modeling residual distributions with a mixture of PT. Neath (2003) used PT to model censored data. Paddock et al. (2003) developed randomized PT models, which uses random partitions to smooth out the effect of partitions on posterior inference. Hanson et al. (2007) used mixtures of PTs to construct a joint model for time-dependent covariates and survival time. They introduced flexible PT priors for the baseline distributions in the Cox model, the proportional odds model, and an accelerated failure time model accommodating time-dependent covariates. Their approach uses the factorization P(T, Y) = P(Y)P(T | Y). See Hanson (2006) for a review of recent development in finite PT models.

2 A Clinical Study

Androgen ablation (AA) is the preferred treatment for metastatic prostate cancer. AA therapy alters the natural history of the disease by disrupting the growth promoting effects mediated by androgen receptor signaling, which is usually accomplished by medical suppression of testicular endocrine function. Unfortunately, most patients with clinically detectable metastatic disease when the AA therapy started will eventually progress to androgen independent prostate cancer (AIPC). AIPC is a relentlessly progressive disease state, and is the cause of death for the vast majority of men in whom it develops. By this mechanism, prostate cancer leads to an annual death toll of more than 27,000 men in the United States.

To date, no treatment has been found to be curative for AIPC, and it is only fairly recently that some therapies are shown to alter the natural history of the disease. A chemotherapy demonstrated a survival advantage over historical results in a phase II trial conducted at M.D. Anderson Cancer Center (Ellerhorst et al., 1997). This therapy, dubbed KA/VE, treats patients with ketoconazole and doxorubicin alternating with vinblastine and estramustine.

In this paper we analyze data from a phase III trial at M.D. Anderson Cancer Center that comapred conventional AA therapy versus AA therapy plus three 8-week cycles of KA/VE. The aim of this trial was to investigate whether better clinical benefit can be achieved by applying the chemotherapy “early”, i.e., before the metastatic prostate cancer develops into the far-advanced AIPC. The two treatment arms are denoted by AA and CH, respectively. The patient population includes metastatic prostate cancer patients whose high risk of developing AIPC justifies long-term, sustained, androgen ablation. The primary endpoint is the time to progression (TTP) to AIPC, which is diagnosed by the following criteria: 1) Symptoms attributed by the treating physician to reflect progressive cancer; 2) Radiographic progression; 3) Rising PSA, with value greater than 1 and doubling time < 9 months; 4) Treatment with chemotherapy. The first 3 also require demonstration of testosterone < 50 and withdrawal of antiandrogens. More details about the clinical trial can be found in Millikan et al. (2008).

Besides TTP, we also observed the longitudinal measurements of PSA level from each patient. Carter et al. (2006) demonstrated that PSA velocity is associated with prostate cancer death even 10–15 years before diagnosis. To further improve the understanding of this important marker we propose to build a joint model of TTP and PSA.

To statisticians, a challenge posed by this clinical trial is the considerable heterogeneity among patients. Before coming to M.D. Anderson Cancer Center these patients had been treated by different physicians with different therapies at different institutions. These differences might have a long-term impact on the development of prostate cancer. Second, there is no completely satisfactory way to define “early” in the natural history of metastatic prostate cancer. As a practical solution, the clock start of the trial is defined as the initiation of the AA therapy. Thus at the beginning of the trial, the true stage of cancer might not be exactly the same for each patient.

3 Notation and Model

We use v = 1, 2 to denote the two treatment arms (1 for CH and 2 for AA). Let nv be the number of subjects in each arm. For the i-th subject on arm v, we use yvi = {yvij, j = 1, (...) , mvi} to denote the longitudinal PSA measurements, where mvi is the number of repeat measurements. We define Tvi to be the TTP, which is the time between the start of the CH/AA treatment and the progression to AIPC. We use tvi to denote the censoring time for censored observations, and the actual TTP for non-censored observations. We introduce a failure indicator dvi with dvi = 1 if Tvi = tvi and dvi = 0 if Tvi > tvi. The number of observed and unobserved TTP in each arm are denoted by nv1 and nv0, respectively. In summary, the observed data from each subject is (yvi, tvi, dvi). We use [X] and [X | Y] to generically indicate the probability model for a random variable X and the conditional distribution of X given Y.

3.1 The Likelihood

We define the sampling model for the observed data (yvi, tvi, dvi) from each patient. If dvi = 1, the progression time Tvi is observed. Therefore all subjects with dvi = 1 belong to the susceptible group. On the other hand, we only observe Tvi > tvi when dvi = 0. In this case the subject could be either in the susceptible group or in the cure group. We define a variable ωvi = 0/1 indicating membership in the susceptible/cure group. For dvi = 1, we have ωvi [equivalent] 0 by definition, and Tvi = tvi. The following discussion simplifies greatly by introducing latent variables Tvi and ωvi for subjects with censored TTP, dvi = 0.

If ωvi = 0, the subject is at risk of developing AIPC. We assume Tvi to be a random sample from distribution Gv, i.e., [Tvi | ωvi = 0,Gv] = gv(Tvi). Here gv(·) is the density function of Gv. If ωvi = 1, the subject is a long-term survivor. We assume Tvi = tc, where tc is an extremely long TTP that could not be observed in the clinical trial. Thus the model for TTP is [Tvi | ωvi,Gv] = {δ(tc)}ωvi{gv(Tvi)}1−ωvi, where δ(tc) denotes a point mass at tc. We assume ωvi|pviid Bernoulli(pv) with pv being the probability of cure under treatment v. In summary, we assume that Tvi arises from the mixture of a point mass and a continuous distribution. The prior distribution of TTP is Gv for all susceptible subjects under treatment v. The posterior distribution of TTP given PSA, however, may be very different across subjects.

Given Tvi, the longitudinal measurements yvi are modeled by a mixed effects model [yvi | Tvi, Ψ], indexed by parameters Ψ. We include Tvi in [yvi | Tvi, Ψ] via a regression on u(Tvi), as a subject-specific covariate. Here u(·) is a certain function of Tvi. With different specifications of u(Tvi) and [yvi | Tvi, Ψ], we can model various mechanisms between Tvi and yvi. In summary, the likelihood factors corresponding to (yvi, tvi, dvi) are


Note that Lvi0 is an augmented likelihood with latent variable Tvi and ωvi.

For Lvi0, the two values taken by ωvi lead to two models of different dimensions. If ωvi = 0, Tvi is a random parameter with prior Gv. In contrast, Tvi is fixed at Tvi = tc if ωvi = 1. Such a change in dimension complicates posterior simulation (Green, 1995). We use the pseudo prior approach by Carlin and Chib (1995) to avoid this complication. In words, we augment the smaller probability model under ωvi = 1 by defining a prior probability model for a hypothetical Tvi (but keep tc in the regression for yvi). The new variable Tvi has no meaningful interpretation under ωvi = 1. It is only introduced to match the model dimensions. The pseudo prior mechanism serves the same purpose as reversible jump (Green, 1995), but avoids the changing dimension of the parameter vector (Dellaportas et al., 2002). See Web Appendix A for implementation details.

3.2 The Prior Probability Model

We assume prior independence, [Ψ,pv,Gv,v=1,2]=[Ψ]·v=12[pv]·[Gv]. The prior specification [Ψ] and posterior inference for mixed models of repeated measurements have been discussed extensively. See, for example, Ibrahim et al. (2004). For priors [pv], v = 1, 2, we assume pv ~ Beta(ap, bp) with ap and bp being fixed hyperparameters.

For the unknown survival distribution Gv, we consider two choices. The first choice is a parametric model assuming Gv to be indexed by a finite-dimensional parameter vector. In this case the prior specification involves assigning priors to these parameters. Recall that Gv is the therapy-specific prior TTP distribution for susceptible subjects, who might be different in many important aspects. A parametric model may not suffice to characterize the complexity of Gv. This difficulty motivates the second choice, a non-parametric method. A Bayesian non-parametric prior defines Gv as a random probability measure, i.e., a prior for the unknown distribution Gv, denoted by PTv,[mathematical script A]v).

A PT prior is indexed by two hyper-parameters: a nested sequence of partitions Π = {B0, B1, B00, B01, …, Bε0,Bε1, …} of the sample space S, with S = B0 [union or logical sum] B1 and Bε = Bε0 [union or logical sum] Bε1; and parameters [mathematical script A] = {α0, α1, α00, α01, …, αε0, αε1, …}. Here ε = ε1 (...) εm denotes a binary sequence of length m. We can center the PT prior around a given distribution G, by setting αε0 = αε1 and defining Bε at level m to coincide with quantiles G−1(k/2m), k = 0, 1, (...) , 2m. The parameter [mathematical script A] has a similar role as the precision parameter in a Dirichlet process prior. Berger and Guglielmi (2001) considered a family of the form αε1, (...), εm = c·ρ(m), where ρ(m) = m2, m3, 2m, 4m or 8m, and c > 0 is a constant. In general, any ρ(m) such that m=1ρ(m)1< guarantees the PT to be absolutely continuous. See Hanson (2006) for a recent review. More details are presented in Web Appendix B.

3.3 Posterior Inference and Model Validation

To facilitate discussion, we define the following notation. The set of observed and unobserved TTPs under treatment v are denoted by tv1={tvi:dvi=1}andTv0={Tvi:dvi=0}, respectively. We also define ωv0={ωvi:dvi=0} to be the set of unknown indicators of cure. Without loss of generality, we assume that dvi = 0 for i = 1, (...) , nv0, and dvi = 1 for i = nv0+1, (...) , nv. Let Λ denote the collection of model parameters, including Ψ,Gv,Tv0,ωv0andpv. We have the full posterior distribution:


where (Y, t, d) = {(yvi, tvi, dvi) : v = 1, 2; i = 1, (...), nv}. We implement posterior inference by Markov Chain Monte Carlo (MCMC) posterior simulation.

Before proceeding with posterior MCMC, we analytically marginalize (2) with respect to Gv. Recall that each subject with ωvi = 0 is assumed to have a TTP arising from Gv. Define Tvs=tv1{Tvi;dvi=ωvi=0} to be the set of observed and unobserved TTP in the susceptible group. The size of Tvsisnvs=nvi=1nvωvi. Finally, let Tvjs denote the j-th element in Tvs, with the index j assigned arbitrarily. The joint probability model of Tvs and Gv is [Tvs,Gv]=j=1nvs[Tvjs|Gv]·[Gv]. We marginalize Gv by replacing [Tvs,Gv]with[Tvs]=j=2nvs[Tvjs|Tv1s,,Tv,j1s]·G˜v(Tv1s). Here [Tvjs|Tv1s,,Tv,j1s] is the (posterior) predictive distribution under a PT model, defined in Web Appendix B. The marginalization is important. Instead of working with the infinite dimensional random distributions Gv, it allows us to manipulate only the (finite dimensional) set of event times Tvs. Details of the MCMC transition probabilities are presented in Web Appendix C.

We compare the proposed model with four natural alternatives. Details of the competing models and results are described later, in Section 5. We use the conditional predictive ordinates (CPO) proposed by Gelfand et al. (1992) to compare different models. The CPO for subject i in group v (henceforth subject (v, i)) is defined as the posterior predictive distribution evaluated for the observation from subject (v, i), conditional on all the data minus the response from subject (v, i). Formally, letting (Y(−vi), t(−vi), d(−vi)) = (Y, t, d) \ (yvi, tvi, dvi), we define CPOvi = [yvi, tvi | Y(−vi), t(−vi)], where dvi is assumed given. Then we compute a summary statistic called the logarithm of the pseudomarginal likelihood (LPML), LPML=v=12i=1nvlog(CPOvi). A small value of LPML suggests disagreement between the observations and the model. Gelfand et al. (1992) show how the CPO for each subject, in our case v = 1, 2 and i = 1, …, nv, can be evaluated through an importance sampling scheme. We describe the computation of CPO in Web Appendix D.

4 A Phase III Study of Prostate Cancer

We return to the clinical trial from Section 2. The phase III trial for advanced prostate cancer had a total enrollment of 286 patients, with n1 = 137 in the CH arm and n2 = 149 in the AA arm. Starting from the diagnosis of prostate cancer, the PSA level of each patient was monitored for up to 10 years. On average, about 30 PSA measurements were collected from each patient. We use yvij (j = 1, (...) ,mvi) to denote the log-transformed longitudinal PSA measurement, yvij = log(1 + PSA). The age at which yvij was recorded is denoted by svij. As reference points, the age at diagnosis of prostate cancer is denoted by uvi0, and the age at the initiation of the CH/AA treatment is denoted by uvi1. The number of observed TTP events in the two treatment arms are n11 = 87 and n21 = 98, respectively. Figure 1 shows the Kaplan-Meier estimates of the survival function under the two treatments. There are plateaus at the end of the curves. This observation suggests that a significant portion of subjects have an excessively long event time and a cure model is appropriate.

Figure 1
The horizontal axis indicates years after treatment. The censoring times are marked by +. “K-M Est.” denotes Kaplan-Meier estimates. “Model Est.” denotes estimates based on model M1, where TTP are assumed to arise from ...

PSA level normally increases as the prostate enlarges with age. When prostate cancer develops, however, it increases much faster. The typical effect of a treatment on PSA level is a sharp drop in PSA level immediately after the treatment. Then gradually, the body adjusts to offset the treatment effect, and the PSA level bounces back. The speed of rebound depends on the progress of cancer. Web Figure 1 plots the longitudinal profiles of four randomly selected patients. Note the variability among the profiles. Exploratory analysis indicates a negative correlation between the PSA slope and TTP. Based on these considerations, the longitudinal submodel [yvi | Tvi, Ψ] is specified as yvij = fvi(svij) + evij with


where (x)+ = x if x > 0, and (x)+ = 0 otherwise. We assume independent normal residuals, evijiidN(0,σ2). The first two terms define a line with intercept θ0vi and slope θ1vi, describing the baseline linear trend of PSA over age. The coefficients are subject-specific. Parameters ηv and [var phi]1v model the size and the slope of the drop after the intervention with CH or AA. As age svij moves beyond uvi1, ηv{exp[−[var phi]1v(svijuvi1)+] − 1} drops from 0 and eventually levels off at −ηv, i.e., ηv controls the depth and [var phi]1v controls the slope of the drop. A smaller value of [var phi]1v indicates that the treatment effect persists longer. Similarly, parameters γ2vi and [var phi]0vi model the size and the slope of the drop due to the initial therapy right after the diagnosis of prostate cancer. We assume γ2vi and [var phi]0vi to vary individually since information about the initial therapy is unavailable. We use γ1v to model the average change of slope in the baseline trend, induced by treatment v. Finally, model (3) reflects our belief that subjects with flatter longitudinal profiles take longer to progress. To see this, first we observe that in (3) the slope after treatment, i.e., the coefficient of (svijuvi1)+, is θ1vi + γ1v + (θ1vi + γ1v)[exp(−ξvTvi) − 1]. Here ξv is constrained to be positive. With Tvi changing between 0 and +∞, the slope changes from θ1vi + γ1v to 0. We substitute a realistic upper bound for the limiting Tvi → ∞ using tc = 18 years. Matching with the earlier notation [yvi | Tvi, Ψ] used in (2), we have Ψ = (θ0, θ1, γ1, γ2, η, [var phi]0, [var phi]1, ξ, σ2). Here θ0 = {θ0vi, v = 1, 2; i = 1, (...), nv}, η = {ηv, v = 1, 2}, and θ1, γ2, [var phi]0, [var phi]1, ξ are defined in the same fashion. In (3) we use u(Tvi) = exp(−ξvTvi) − 1. In summary, besides TTP, the covariates considered include age, treatment, and time under treatment.

As for the PT priors, Gv ~ PTv,[mathematical script A]v), we use Π1 = Π2 = Π and [mathematical script A]1 = [mathematical script A]2 =[mathematical script A]. Thus E(Gv) = G for v = 1, 2. That is, the two PTs are centered around the same distribution a priori. The matching hyperprior parameters for the two PT priors ensures that posterior inference about the differences between two treatment groups reflects the evidence from data. For the centering measure G, we assume a Weibull distribution, G(t) = Weibull(t; τ, β). Here β and τ are, respectively, the shape and scale parameter. The partition Π is specified by the dyadic quantile sets of G. The elements of [mathematical script A] at the mth level are specified to be c·m2, with c being a constant.

The mixture probability pv is assumed to be Unif(0, 1), i.e., ap = bp = 1. The prior of Ψ and other hyperprior parameters are specified as follows. We assume (θ0vi, θ1vi, γ2vi)′ | µ, Σ ~ N3(µ,Σ), γ1v ~ N(0, 100), ηv ~ N(0, 100), ξv ~ Ga(a, b), [var phi]0vi ~ Ga(a, b), [var phi]1v ~ Ga(a, b), and 1/σ2 ~ Ga(a, b), all with a = b = 0.01. Here Ga(a, b) denotes a Gamma prior with mean a/b. We further assume µ ~ N3(0, 100I) and Σ ~ IW(3, 0.01I3). Here IW(ν, A) indicates an inverse Wishart prior with ν degree of freedom and matrix parameter A. The specification of hyper-parameter (τ, β) for G is based on estimation of the Weibull model M2, described in Section 5. We set τ = 4.52 and β = 1.23, which are the posterior means of Weibull parameters from the CH group.

5 Results

Model Selection

To validate the proposed model we consider comparisons with four alternative models. Let M1 denote the proposed model (2). The second model, M2, is also based on the factorization P(T, Y) = P(T)P(Y | T), with P(Y | T) as in (3), but P(T) being fully parametric. We assume a Weibull regression model for (Tvi | ωvi = 0) with an indicator of treatment as the covariate. The third model, M3, assumes no cure group. It is obtained from model (2) by setting ωvi = 0 for all patients. The last two models, M4 and M5, are constructed under the factorization P(T, Y) = P(Y)P(T | Y), where the longitudinal submodel P(Y) is specified as yvij = fvi(svij) + evij with


and evij ~ N(0, σ2). The survival submodel P(T | Y) is assumed to be a proportional hazard model with a cure fraction pv. The mean longitudinal process, fvi(svij), together with the PSA slope, fvi(svi)=fvi(svi)/svi, are included as time-dependent covariates (Yu et al., 2004). We assume the following hazard function,


where hv0(t) is the baseline hazard and (ζ1v, ζ2v) are scaling parameters. Under M4, we model hv0(t) as a piecewise constant function of J = 8 steps. For 0 < q1 < q2 < (...) < qJ−1 < ∞, we assume hv0(t) = κv1 if tq1, hv0(t) = κv2 if q1 < tq2, (...), and hv0(t) = κvJ if t > qJ−1. Gamma priors are assumed for κvj. More details can be found in Ibrahim et al. (2004). Under M5, we model hv0(t) by a Weibull hazard, with Gamma priors for the scale and shape parameters. The priors of the other parameters are specified as in M1.

The estimated LPML under M1 through M5 are 4833.6, 5115.2, 5007.9, 4889.1, and 5155.0, respectively. Clearly M1 achieves the best performance. The nonparametric PT model allows the density function to deviate from the form imposed by the Weibull assumption. Assuming a cure group further improves the model fit. Model M4 has the second best performance, which indicates that the PSA trajectory does play an important role in prostate cancer progression. The inferior performance of M5 suggests that the Weibull hazard assumption might be too restrictive for our data. We further validate the survival and cure aspect of the model based on subject specific martingale residuals (Barlow and Prentice, 1988; Therneau et al., 1990; Lin et al., 2002). The residuals are scattered horizontally over age (with three outliers), suggesting no evidence against the proposed model. The residual plot is shown in Web Figure 2.

The posterior distribution of TTP

The estimated cure probabilities pv (v = 1, 2) for the CH and AA treatments are 0.167 and 0.154, respectively. For advanced prostate cancer patients, here “cure” means that those patients take a very long time to progress to AIPC. Figure 2a shows the estimated densities of TTP in the susceptible group, E(Gv | Y, t, d), under the two treatments. The horizontal axis is in years after the treatments. For comparison, Figure 2b plots the posterior estimate of Weibull densities under M2. Figure 2 clearly shows deviation from the parametric Weibull distribution. For example, there is a small bump in the CH density curve around 7.5, which is also visible in the Kaplan-Meier estimates in Figure 1. This feature can not be captured by M2. In Figure 1 we also plot the posterior estimate of the survival function under model M1, where TTP are assumed to arise from the mixture of a point mass at tc and an unknown distribution Gv. Because a PT prior with a fixed partition has discontinuities at the partition points, we used additional kernel smoothing for the densities shown in Figure 2. Finally, we can assess the posterior uncertainty on Gv by plotting multiple random samples from its posterior distribution (Web Figure 3).

Figure 2
Posterior estimated E(Gv | Y, t, d) under M1 and M2. The horizontal axis shows years after treatment.

The dependence of event times on longitudinal profiles

Under model M1, different PSA profiles lead to different posterior distributions of Tvi. In Figure 3 we compare for four patients with censored TTP the PSA profiles (1st column) and the estimated posterior probability of “cure” Pvi = 1) and the conditional hazard curve of Tvi given ωvi = 0 (2nd column). Each row corresponds to one patient, with the first two under treatment CH, and the last two under AA. We plot the PSA profiles after initiation of the therapies. Figure 3 demonstrates the flexibility of M1. Each patient has a hazard curve of a different shape.

Figure 3
Posterior prediction of TTP (hazard given ω = 0 and P(ω = 1)) for four censored patients. The horizontal axis is time in years from the start of the AA/CH therapy.

The longitudinal model parameters

Web Figure 1 plots the longitudinal PSA profiles of four patients together with fitted values. Table 1 lists the posterior means and standard deviations of some parameters in M1. The posterior estimates of ξv (v = 1, 2) are practically identical, implying that the impact of TTP on the trajectory of PSA profiles are similar across the two treatments. The estimates of γ1v indicate that the PSA profiles of patients in the AA arm on average have an increased slope after treatment. The level and slope of the drop in PSA after the CH/AA treatment are modeled by lv(t) = ηv[exp(−[var phi]1vt) −1], where t ≥ 0 is the time from the start of treatment v. The patients under CH therapy experience a deeper and longer drop in PSA. We plot lv(t) in Web Figure 4.

Table 1
Parameter Estimates in M1

Continuously reassessing the risk of progression

Given a currently observed PSA profile, we can use the proposed method to obtain the predictive distribution of TTP, which provides a good assessment of progression risk. This predictive distribution can be continuously updated with additional PSA measurements. We demonstrate this learning process in Figure 4. The left panel plots the PSA profiles of two hypothetical patients from the AA arm. Each point denotes a PSA measurement. The two patients have their PSA level measured at the same time points. Within the first two years the two PSA profiles are identical, and then they deviate: the first patient’s PSA level stays low, while the second gradually rises. The center panel shows the continuously updated posterior estimates of P(ω = 1 | yt), with yt being the accumulated PSA measurements up to the time of assessment. We interpret P(ω = 1 | yt) as the individual probability of long term survival. The right panel shows the continuously updated posterior estimates of E(T | ω = 0, yt).

Figure 4
The left panel plots the PSA profiles of two hypothetical patients from the AA arm. The horizontal axis is time in years from the initiation of treatment. Each point denotes a PSA measurement. The center panel shows the continuously updated posterior ...

Sensitivity analysis

We conducted a sensitivity analysis to explore the impact of tc and c on posterior inference. We tried two values for c, (0.1, 1), and three values for tc, (15, 18, 20). The posterior means of pv and LPML are listed in Table 2. The estimated cure rate slightly increases with larger c, which implies stronger shrinkage of Gv to the parametric centering measure G. The parametric Weibull model G can not represent the secondary mode that we see in the data and in the non-parametric inference for Gv. Compensating the missing secondary mode by an increased cure fraction could explain the change in the posterior means of pv. The estimated LPML indicates the sensitivity of model fitting to tc and c. This issue can be resolved by expanding the model with hyperpriors on c and tc. We set them fixed to keep the discussion focused.

Table 2
Sensitivity Analysis

6 Discussion

The proposed model allows researchers to relax parametric assumptions on the survival submodel imposed by existing methods. An important limitation is that P(T, Y) = P(T)P(Y | T) does not explicitly state how T is affected by Y. Given a particular longitudinal profile, we need to carry out posterior simulation to learn about the posterior survival distribution given Y. The proposed approach can readily be generalized to problems with more than two treatments. The longitudinal data model (3) is appropriate for the discussed application to the prostate cancer trial. In general, any well specified model with a regression on the event time could be used.

Supplementary Material

Supp Data


We thank the two reviewers and associate editor for their constructive suggestion. We thank Randall Millikan for supplying the data set for analysis. Our work was partially supported by the NIH CTSA Grant UL1 RR024982, the Cancer Center Support Core Grant CA16672, SPORE in prostate cancer grant CA90270 from the National Cancer Institute, National Institute of Health.


Supplementary Materials

Web Appendices and Figures referenced in the paper are available under the Paper Information link at the Biometrics website


  • Barlow WE, Prentice RL. Residuals for relative risk regression. Biometrika. 1988;75:65–74.
  • Berger JO, Guglielmi A. Bayesian and conditional frequentist testing of a parametric model versus nonparametric alternatives. Journal of the American Statistical Association. 2001;96(453):174–184.
  • Brown ER, Ibrahim JG. Bayesian approaches to joint cure-rate and longitudinal models with applications to cancer vaccine trials. Biometrics. 2003;59(3):686–693. [PubMed]
  • Carlin BP, Chib S. Bayesian model choice via Markov chain Monte Carlo methods. Journal of the Royal Statistical Society, Series B: Methodological. 1995;57:473–484.
  • Carter B, Ferrucci L, Ketterman A, Landis P, Wright J, Epstein JI, Trock B, Metter J. Detection of life-threatening prostate cancer with prostate-specific antigen velocity during a window of curability. Journal of the National Cancer Institute. 2006;98:1521–1527. [PMC free article] [PubMed]
  • Chen M-H, Ibrahim JG, Sinha D. A new joint model for longitudinal and survival data with a cure fraction. Journal of Multivariate Analysis. 2004;91(1):18–34.
  • Dafni UG, Tsiatis AA. Evaluating surrogate markers of clinical outcome when measured with error. Biometrics. 1998;54:1445–1462. [PubMed]
  • De Gruttola V, Tu X. Modelling progression of CD4-Lymphocyte count and its relationship to survival time. Biometrics. 1994;50:1003–1014. [PubMed]
  • Dellaportas P, Forster JJ, Ntzoufras I. On bayesian model and variable selection using mcmc. Statistics and Computing. 2002;12(1):27–36.
  • Ellerhorst J, Tu S, Amato R, Finn L, Millikan R, Pagliaro L, Jackson A, Logothetis C. Phase II trial of alternating weekly chemohormonal therapy for patients with androgen-independent prostate cancer. Clinical Cancer Research. 1997;3:2371–2376. [PubMed]
  • Gelfand A, Dey D, Chang H. Model determination using predictive distribution with implementation via sampling-based methods (with discussion). Bayesian Statistics 4 – Proceedings of the Fourth Valencia International Meeting; Oxford University Press.1992.
  • Green PJ. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika. 1995;82:711–732.
  • Hanson T, Branscum A, Johnson W. Technical report. University of Minnesota; 2007. Joint modeling of longitudinal and survival data using mixtures of polya trees.
  • Hanson TE. Inference for mixtures of finite Polya tree models. Journal of the American Statistical Association. 2006;101(476):1548–1565.
  • Hanson T, Johnson WO. Modeling regression error with a mixture of Polya trees. Journal of the American Statistical Association. 2002;97(460):1020–1033.
  • Henderson R, Diggle P, Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics (Oxford) 2000;1(4):465–480. [PubMed]
  • Ibrahim JG, Chen M-H, Sinha D. Bayesian methods for joint modeling of longitudinal and survival data with applications to cancer vaccine trials. Statistica Sinica. 2004;14(3):863–883.
  • Lavalley MP, De Gruttola V. Models for empirical Bayes estimators of longitudinal CD4 counts (Disc: P2337–2340) Statistics in Medicine. 1996;15:2289–2305. [PubMed]
  • Lavine M. Some aspects of Polya tree distributions for statistical modelling. The Annals of Statistics. 1992;20:1222–1235.
  • Law NJ, Taylor JMG, Sandler H. The joint modeling of a longitudinal disease progression marker and the failure time process in the presence of cure. Biostatistics (Oxford) 2002;3(4):547–563. [PubMed]
  • Lin H, Turnbull BW, McCulloch CE, Slate EH. Latent class models for joint analysis of longitudinal biomarker and event process data: Application to longitudinal prostate-specific antigen readings and prostate cancer. Journal of the American Statistical Association. 2002;97(457):53–65.
  • Millikan RE, Wen S, Pagliaro LC, Brown MA, Moomey B, Do K, Logothetis CJ. Phase iii trial of androgen ablation with or without three cycles of systemic chemotherapy for advanced prostate cancer. Journal of Clinical Oncology. 2008;26(36):5936–5942. [PubMed]
  • Muliere P, Walker S. A Bayesian non-parametric approach to survival analysis using Polya trees. Scandinavian Journal of Statistics. 1997;24(3):331–340.
  • Neath AA. Polya tree distributions for statistical modeling of censored data. Journal of Applied Mathematics and Decision Sciences. 2003;7(3):175–186.
  • Paddock SM, Ruggeri F, Lavine M, West M. Randomized Polya tree models for nonparametric Bayesian inference. Statistica Sinica. 2003;13(2):443–460.
  • Pauler DK, Finkelstein DM. Predicting time to prostate cancer recurrence based on joint models for non-linear longitudinal biomarkers and event time outcomes. Statistics in Medicine. 2002;21(24):3897–3911. [PubMed]
  • Pawitan Y, Self S. Modeling disease marker processes in AIDS. Journal of the American Statistical Association. 1993;88:719–726.
  • Therneau TM, Grambsch PM, Fleming TR. Martingale-based residuals for survival models. Biometrika. 1990;77:147–160.
  • Tsiatis AA, Davidian M. Joint modeling of longitudinal and time-to-event data: An overview. Statistica Sinica. 2004;14(3):809–834.
  • Tsiatis AA, De Gruttola V, Wulfsohn MS. Modeling the relationship of survival to longitudinal data measured with error. Applications to survival and CD4 counts in patients with AIDS. Journal of the American Statistical Association. 1995;90:27–37.
  • Walker SG, Mallick BK. Hierarchical generalized linear models and frailty models with Bayesian nonparametric mixing. Journal of the Royal Statistical Society, Series B: Methodological. 1997;59:845–860.
  • Walker S, Mallick BK. A Bayesian semiparametric accelerated failure time model. Biometrics. 1999;55:477–483. [PubMed]
  • Wulfsohn MS, Tsiatis AA. A joint model for survival and longitudinal data measured with error. Biometrics. 1997;53:330–339. [PubMed]
  • Xu J, Zeger SL. Joint analysis of longitudinal data comprising repeated measures and times to events. Journal of the Royal Statistical Society, Series C: Applied Statistics. 2001;50(3):375–387.
  • Yu M, Law NJ, Taylor JMG, Sandler HM. Joint longitudinal-survival-cure models and their application to prostate cancer. Statistica Sinica. 2004;14(3):835–862.
  • Yu M, Taylor JMG, Sandler HM. Individual prediction in prostate cancer studies using a joint longitudinal survival-cure model. Journal of the American Statistical Association. 2008;103(481):178–187.