Search tips
Search criteria 


Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
J R Stat Soc Ser C Appl Stat. Author manuscript; available in PMC 2010 April 12.
Published in final edited form as:
J R Stat Soc Ser C Appl Stat. 2009 September 3; 59(1): 19–34.
doi:  10.1111/j.1467-9876.2009.00680.x
PMCID: PMC2853262

A Bayesian hierarchical mixture model for platelet derived growth factor receptor phosphorylation to improve estimation of progression-free survival in prostate cancer


Advances in understanding the biological underpinnings of many cancers have led increasingly to the use of molecularly targeted anti-cancer therapies. Because the platelet-derived growth factor receptor (PDGFR) has been implicated in the progression of prostate cancer bone metastases, it is of great interest to examine possible relationships between PDGFR inhibition and therapeutic outcomes. Here, we analyze the association between change in activated PDGFR (p-PDGFR) and progression free survival (PFS) time based on large within-patient samples of cell-specific p-PDGFR values taken before and after treatment from each of 88 prostate cancer patients. To utilize these paired samples as covariate data in a regression model for PFS time, and because the p-PDGFR distributions are bimodal, we first employ a Bayesian hierarchical mixture model to obtain a deconvolution of the pre-treatment and post-treatment within-patient p-PDGFR distributions. We evaluate fits of the mixture model and a non-mixture model that ignores the bimodality by using a supnorm metric to compare the empirical distribution of each p-PDGFR data set with the corresponding fitted distribution under each model. Our results show that first using the mixture model to account for the bimodality of the within-patient p-PDGFR distributions, and then using the posterior within-patient component mean changes in p-PDGFR so obtained as covariates in the regression model for PFS time provides an improved estimation.

Keywords: Bayesian analysis, Survival analysis, Markov chain Monte Carlo, Platelet derived growth factor receptor, Prostate cancer

1. Introduction

In recent years, clarification of the molecular underpinnings of many types of cancer has yielded improved therapeutic strategies involving so-called “targeted therapies.” If a molecular target is found to be present in a tumor, the strategy with targeted therapy is to achieve anti-disease effect by specific inhibition of the target. Biomarkers associated with therapeutic outcome may facilitate the process of choosing among existing targeted treatments for individual patients.

The platelet-derived growth factor receptor (PDGFR) is an established therapeutic target in several important cancers. PDGFR is of particular interest in men with advanced prostate cancer involving bone metastases because it frequently is over-expressed in this cancer. Denote phosphorylated PDGFR by p-PDGFR. Mathew et al. (2007) conducted a randomized, placebo-controlled clinical trial of the chemotherapeutic agent docetaxel combined with either the oral p-PDGFR inhibitor imatinib mesylate (DI) or placebo (DP) in patients with this disease, with time to disease progression or death (progression free survival, PFS) as the primary endpoint. A key secondary endpoint in this trial was the amount of inhibition of p-PDGFR since the putative effect of imatinib on inhibition of p-PDGFR coupled with the relationship seen between p-PDGFR inhibition in an experimental model of bone metastases (Uehara, et al. 2003) suggested that change in p-PDGFR level from its baseline level pre-treatment to its level post-treatment may be related to PFS time. In order to measure p-PDGFR inhibition, peripheral blood samples were drawn from each patient at two different time points: prior to chemotherapy, and after treatment with one cycle of chemotherapy. In each sample, the intensity of expression of activated p-PDGFR was measured in each of approximately 2000 individual peripheral blood leucocytes using immunoflourescent antibodies, capturing these with laser-scanning cytometry. It was hypothesized that the decrease in p-PDGFR level might be larger in the DI arm due to p-PDGFR inhibition by imatinib and that, in turn, the magnitude of the decrease might be associated with improved PFS. Because very large samples of cell-specific p-PDGFR values were obtained at both measurement times for each patient, very reliable estimates of mean pre-treatment and post-treatment p-PDGFR levels within each patient were available. The difference between the pre- and post-treatment within-patient sample mean p-PDGFR levels proved to be associated with PFS (Mathew et al., 2007). It was noted that in the control arm with docetaxel alone, DP, mean p-PDGFR rose after therapy, but in the DI arm a significantly smaller decrease in p-PDGFR was seen. On average, patients who had a smaller increase in p-PDGFR, corresponding to greater p-PDGFR inhibition, had longer PFS times.

Visual examination of the histograms of the within-patient p-PDGFR samples indicates that their distributions are clearly bimodal, both pre-treatment and post-treatment. This is illustrated by Figure 1, which gives histograms of the log-transformed p-PDGFR values for three typical patients. This observation motivated the statistical re-analysis of this data set that we report here. The main idea of our analysis is to exploit the fact that one may reliably estimate the entire distribution of p-PDGFR expression values at each measurement time for each patient, rather than only their sample means. We hypothesized that accounting for the bimodality of the p-PDGFR distributions would improve the estimation of PFS by providing a more refined representation of p-PDGFR inhibition for use as covariates in the regression model.

Figure 1
Histograms of the log-transformed p-PDGFR values for each of three typical patients,

Technically, the problem is to estimate each within-patient pre- and post-treatment p-PDGFR distribution, and then choose a small number of features from each distribution for use as covariates in a regression model for PFS. To carry out this analysis, we first fit a Bayesian hierarchical mixture model to the p-PDGFR data accounting for the observed bimodality (McLachlan and Peel, 2000; Gelman, et al., 2004). We compared the fit of this model with that of a corresponding, simpler Bayesian hierarchical model ignoring the bimodality, using a supnorm metric to quantify the distance between the empirical distribution of each observed within-patient p-PDGFR sample and the corresponding fitted distribution obtained under each model. We then used the posteriors of parameters characterizing mean change in p-PDGFR level obtained from either components of the fitted mixture model or, as a basis for comparison, from the fitted non-mixture model, as covariates in a regression model for estimating PFS. In particular, under either hierarchical model for the p-PDGFR data, the posterior mean changes are random quantities, and we account for this randomness when using these as covariates in the regression models for PFS time. Our results indicate that accounting for the bimodality of the p-PDGFR distributions yields a substantive improvement in the fit of the regression model for PFS.

In Section 2, we describe the data structure of the prostate cancer clinical trial and the Bayesian mixture and survival models. In Section 3, we describe the algorithm that was used to fit the mixture model. We present our analyses of the p-PDGFR data in Section 4, and close with a brief discussion in Section 5.

2. Data structure and models

2.1 Data structure

As mentioned in Section 1, Mathew et al. (2007) conducted a randomized clinical trial to compare DI (docetaxel plus imatinib) and DP (docetaxel plus placebo) in patients with advanced prostate cancer involving bone metastases. They accrued patients between April 2003 and July 2005 at five tertiary cancer care centers in the United States. Paired samples of the p-PDGFR values from peripheral blood leukocytes from 88 men (41 in DI and 47 in DP) were available before and after treatment. The within-patient leukocyte sample sizes {mi} taken pre-treatment and {ni} taken post-treatment were very similar, with median{mi} = 2021 (90% ci 2000 – 2074) and median {ni} = 2031 (90% ci 2000 – 2103).

For patient i = 1,…,N, let Xi = (Xi,1,…,Xi,mi) denote the p-PDGFR values of the mi cells in the pre-treatment blood sample, Yi = (Yi,1,…,Yi,ni) the ni p-PDGFR values in the post-treatment sample, with X = (X1,…,XN) and Y = (Y1,…,YN). Let Ti denote PFS time, Tio the observed value of Ti or right-censoring time and εi = 1 if Ti=Tio and 0 otherwise. We will utilize the covariates Z1i = 1 if patient i received DI and 0 if DP, Z2i = 1 if the pre-treatment hemoglobin (Hb) level is greater than 11 g/dl and 0 if not, and Z3i the pre-treatment to post-treatment increase in prostate-specific antigen (PSA) level, and we denote Zi = (Z1i, Z2i, Z3i) and Z = (Z1,…,ZN).

2.2 Non-mixture model for the p-PDGFR data

As a basis for comparison, we first fit the following model that ignores the observed bimodality of the p-PDGFR data. Denote the mean pre-treatment and post-treatment p-PDGFR values for the ith patient by μxi and μyi, respectively, with corresponding precision (inverse variance) parameters τxi and τyi. Each of these parameters corresponds to the p-PDGFR values of all leukocytes in the patient’s peripheral blood at the time the sample was taken. We assume patient-specific means μxi=μxi*+ξiandμyi=μyi*+ξi,whereξi=μxiμxi*=μyiμyi* is an effect associated with patient i that applies both before and after treatment. Given the patient-specific parameters θi=(ξi,μxi*,μyi*,τxi,τyi), we assume that each pair Xij and Yik are conditionally independent with the following normal distributions:


We assume that the elements of θi are a priori mutually independent with priors ξiN(μ˜ξ,τ˜ξ1),μxi*N(μ˜x*,τ˜x*1),μyi*N(μ˜y*,τ˜y*1), τxi ~ Ga(ãx, bx), and τyi ~ Ga(ãy, by). Collecting terms, we denote the hyperparameter vector by θ˜=(μ˜ξ,τ˜ξ,μ˜x*,τ˜x*,μ˜y*,τ˜y*,a˜x,b˜x,a˜y,b˜y), and the ith prior by p1(θi | [theta w/ tilde]). To complete the hierarchical model specification, we assume vague normal hyperpriors with mean 0 and precision 0.001 for μ˜x*,μ˜y*, and [mu]ξ and vague gamma hyperpriors with both shape and inverse scale 0.001 for τ˜x*,τ˜y*, [tau with tilde]ξ, ãx, bx, ãy, and by, and we denote the second level prior by p2([theta w/ tilde] | ϕ), where ϕ is the vector of fixed numerical parameters values that determine the hyperpriors. Denoting θ = (θ1,…,θN), the posterior of θ given the p-PDGFR data X, Y is proportional to the product of the patient-specific likelihoods and priors and the hyperprior,

p(θ | X,Y)i=1N[{j=1mif(Xij | ξi,μxi*,τxi)k=1nif(Yik | ξi,μyi*,τyi)}p1(θi | θ˜)] p2(θ˜ | ϕ).

Under this model, ξi accounts for correlations between pairs (Xij, Yik). Since E(Xij)=μxi*+ξi,eachμxi* accounts for correlations between pairs (Xij, Xik), and μyi* accounts for correlations between pairs (Yij, Yik). These correlations may be computed analytically as functions of [mu]ξ, μ˜x*,μ˜y*,τ˜ξ,τ˜x*,τ˜y*, τxi and τyi. We will focus attention on δi = μyi − μxi, the post-treatment minus pre-treatment change in within-patient mean p-PDGFR level, when used as a covariate in a time-to-event regression analysis of PFS.

2.3 Mixture model for the p-PDGFR data

To account for the bimodality observed in the within-patient p-PDGFR distributions, we now specify a more general version of the model given in section 2.2 by assuming that each within-patient pre-treatment and post-treatment p-PDGFR distribution is a mixture of two components, a left (L) and a right (R) distribution. To construct these mixture distributions, we first augment the observed p-PDGFR data (Xi, Yi) of the ith patient with the unobserved (latent) patient-specific mixture indicators ζxij = 1 if the pre-treatment value Xij is drawn from the left side component distribution and 0 if from the right side. Similarly, ζyik = 1 for the left and 0 for the right side of the post-treatment distribution. We assume that ζxij ~ Bernoullixi) and ζyik ~ Bernoulliyi). Generalizing the non-mixture model formulation given above, we assume the following normal distributions for each of the L and R component distributions of Xi and Yi :

Left SideRight Side



Under the mixture model, the patient-specific parameter vector now has the more refined structure θi = (ξi, θLi, θRi, λi) where θLi=(μxLi*,τxLi,μyLi*,τyLi),θRi=(μxRi*,τxRi,μyRi*,τyRi) and λi = (λxi, λyi). Thus, θi has 11 elements under the mixture model compared to 5 elements under the non-mixture model.

Given θi, each pair Xij and Yik are again conditionally independent, but in order to specify the likelihood contribution of (Xi, Yi) we now also require the latent mixture indicators ζxi = (ζxi1,…,ζximi) and ζyi = (ζyi1,…,ζyini). Denoting the pdf of a normal distribution with mean μ and precision τ by fN(· | μ, τ), the mixture model likelihood for the ith patient’s p-PDGFR data takes the form

j=1mi[λxifN(Xij | μxLi*+ξi,τxLi)]ζxij[(1λxi)fN(Xij | μxRi*+ξi,τxRi)]1ζxij× k=1ni[λyifN(Yik | μyLi*+ξi,τyLi)]ζyik[(1λyi)fN(Yik | μyRi*+ξi,τyRi)]1ζyik

To generalize the priors to account for the L and R components of the mixture, we assume that the elements of θi are a priori independent, with ξiN(μ˜ξ,τ˜ξ1),μxLi*N(μ˜xL*,τ˜xL*1),μxRi*N(μ˜xR*,τ˜xR*1),μyLi*N(μ˜yL*,τ˜yL*1),μyRi*N(μ˜yR*,τ˜yR*1), τxLi ~ Ga(ãxL, bxL), τxRi ~ Ga(ãxR, bxR), τyLi ~ Ga(ãyL, byL), τyRi ~ Ga(ãyR, byR), λxi ~ Be([alpha]x, betax), and λyi ~ Be([alpha]y, betay). We denote the hyperparameter vector by [theta w/ tilde] = ([theta w/ tilde]1, [theta w/ tilde]2) where θ˜1=(μ˜ξ,τ˜ξ,μ˜xL*,τ˜xL*,μ˜xR*,τ˜xR*,μ˜yL*,τ˜yL*,μ˜yR*,τ˜yR*), [theta w/ tilde]2 = (ãxL, bxL, ãxR, bxR, ãyL, byL, ãyR, byR, [alpha]x, betax, [alpha]y, betay). To avoid non-identifiability in the mixture model, we require the restrictions μxLi*<μxRi* and μyLi*<μyRi*, which formalize what can be seen clearly in Figure 1. There are now two post-treatment minus pre-treatment mean differences, δLi = μyLi − μxLi for the L component distributions and δRi = μyRi − μxRi for the R components. We assume vague normal hyperpriors with mean 0 and precision 0.001 for [mu]ξ, μ˜xL*,μ˜xR*,μ˜yL* and [mu]yR, and vague gamma hyperpriors with both shape and inverse scale 0.001 for the rest of the hyperparameters.

2.4 Regression models for estimating PFS

For the regression analyses of PFS, we first fit a set of candidate time-to-event distributions to the PFS time data, including the exponential, Weibull, gamma, and log-normal. In each model, the linear term includes a main treatment effect (DI versus DP, the imatinib effect), treatment group-specific effects of Hb, Z2i, and treatment group-specific effects of change in PSA, Z3i. Under the non-mixture model, the linear term for patient i is


The main imatinib effect is β1, the two treatment group-specific effects of Hb are β2 and β3, the two treatment group-specific effects of change in PSA are β4 and β5, and the two treatment group-specific effects of the mean change in p-PDGFR, δi, are β6 and β7.

Under the mixture model, the linear term for patient i is


This model includes four terms corresponding to the effects of the mean changes in each component of the mixture model for p-PDGFR, β6 and β7 for the effects of δLi within the two treatment arms and, similarly, β8 and β9 for the corresponding effects of δRi. The formulations (2) and (3) differ from a conventional Bayesian regression model in that, for each patient, whereas each of the covariates Z1i, Z2i, Z3i is a single value, each covariate δi in (2) or δiL, δiR in (3) is itself a random parameter whose posterior distribution is estimated from the p-PDGFR data. In section 3.2, we will provide details of the model and computational alogorithm accounting for the randomness of δi from the non-mixture model, or (δLi, δRi) from the mixture model.

Denoting either the non-mixture or mixture model form of the linear term by η, we considered an exponential distribution with pdf f(t | η) = e−η exp{−e−ηt}, a Weibull with pdf f(t | η, ν) = νtν−1e−η exp{−e−ηtν}, a gamma with pdf f(t | η, ν) = {exp(−η)}ν tν−1 exp(−e−ηt)/Γ(ν), and a log-normal with pdf f(t | η, ν) = (ν/2π)1/2t−1 exp[−ν{log(t)−η}2/2]. We assumed vague normal prior distributions with mean 0 and precision 0.001 for the βj’s in each linear term and a vague gamma prior distribution with both shape and inverse scale 0.001 for any of the additional scale or shape parameters ν appearing in the time-to-event model pdf. To choose a model for the PFS analyses, we compared the fits of the four distributions using the Deviance Information Criterion (DIC, Spiegel-halter, Best, Carlin and van der Linde, 2002) and the Bayes Information Criterion (BIC, Schwarz, 1978).

3. Model fitting

We used Markov chain Monte Carlo (MCMC) methods to obtain samples from the posterior distributions of the parameters for both the non-mixture and mixture models (Gilks, Richardson and Spiegelhalter, 1996; McLachlan and Peel, 2000), and also for the time-to-event regression model fits. Each algorithm was run in five parallel chains to assess convergence of the MCMC algorithm. A burn-in of 1,000 and a chain of length 20,000, retaining every 10th sample, provided adequate convergence. The sampling scheme that we used to compute posteriors for the hierarchical mixture model is described in the Appendix.

3.1 Computation of the regression model parameter posterior

Since the patient-specific parameters δi under the non-mixture model, or δiL, δiR under the mixture model, are used as covariates in the linear terms (2) and (3) of the regression models, it is important to clarify how the posteriors are computed. For simplicity, denote the PFS time data by T, the changes in mean p-PDGFR by δ, and the covariate parameters by β. The posterior of the regression model parameters is

p(β,ν | T,Z,X,Y)=p(β,ν | T,Z,δ)p(δ | X,Y)dδ.

This accounts for the uncertainty in δ given the p-PDGFR data. Equation (4) relies on the assumption that δ is conditionally independent of (T, Z) given (X, Y). We computed the posterior in (4) using the method given in section 4.3 of Mwalili, Lesaffre and Declerck (2005), as follows. At each iteration of the MCMC algorithm for obtaining the posterior of (β, ν), for each patient i = 1,…,N, a value of θi was sampled from p(θi | X, Y), and the resulting δi = δ(θi) under the non-mixture model or (δLi, δRi) = (δLi(θi), δRi(θi)) under the mixture model were incorporated into the linear component of the regression model for PFS.

3.2 Goodness-of-fit analysis for the p-PDGFR models

To compare how well the mixture and non-mixture models fit the p-PDGFR data, we used the following supnorm metric. The basic idea is to compute the maximum distance between the empirical distribution of each pre-treatment and post-treatment within-patient p-PDGFR sample and the corresponding predictive distribution under each fitted model, and then use these maximum distances to determine whether the mixture or non-mixture model provides a better fit. To do this, we first computed the empirical CDF’s of Xi and Yi for each i = 1,…,N,

F^i(x)=1mij=1miI(Xijx)  and  G^i(y)=1nik=1niI(Yiky),

where I(A) denotes the indicator of the event A. Next, we fixed x0 < mini,j{Xij} and y0 < mini,j{Yij} and a sufficiently small increment Δ in the domains of X and Y, and computed the corresponding empirical estimated probability increments




for h = 0,…,H and i = 1,…,N. We set x0 = y0 = 0.49, Δ = 0.02, and H = 75, after taking the ranges and variability of X and Y into account. For the mixture model, given ξi and θxi=(μxLi*,μxRi*,τxLi,τxRi,λxi), the pre-treatment p-PDGFR distribution was the mixture of normals

fximix(x | ξi,θxi)=λxifN (x | ξi+μxLi*,τxLi1)+(1λxi)fN (x | ξi+μxRi*,τxRi1),

and we defined the estimated mixture distribution f^ximix(x) to be the posterior mean of fximix(x | ξi,θxi), with the estimated post-treatment p-PDGFR mixture distribution f^yimix(y) obtained similarly. The supnorm metrics for patient i with data Xi and Yi are




and we used Si = max{Sxi, Syi} for our goodness-of-fit analysis of the mixture model. For the non-mixture model, we applied the same method using the conventional uni-modal normal distributions

fxinonmix(x | ξi,μxi*,τxi)=fN (x | ξi+μxi*,τxi1)


fyinonmix(y | ξi+μyi*,τyi)=fN (y | ξi+μyi*,τyi1).

4. Results

4.1 Fit of the non-mixture and mixture models to the p-PDGFR data

The population distributions of the patient-specific mean p-PDGFR differences between post-treatment and pre-treatment under the non-mixture and mixture models are summarized in Table 1, which gives the posterior means and standard devisions in the population-level distributions for δi, δLi and δRi. The most important message in Table 1 is that the left and right component mean changes in p-PDGFR, δLi and δRi, have very different population distributions. Comparing these two distributions with that of δi shows that the non-mixture model obfuscates the bimodality of the p-PDGFR distributions, and that δi is actually a weighted average of δLi and δRi. An interesting property of the pre- and post-treatment mixing proportions, λxi and λyi, is that the individual posterior estimates of λxi and λyi are virtually both on average .20, but both vary substantially, roughly between .05 and .40. Moreover, the dispersion of the left component distribution is much larger than that of the right component. The posterior estimates of the patient-specific standard deviations of the left components τxLi1/2 and τyLi1/2 are on average about .16, while those of the right components τxRi1/2 and τyRi1/2 are on average about .05. These statistics reflect what can be seen in Figure 1. Thus, the hierarchical model appears to be quite appropriate for all elements of θi.

Table 1
Summary of the posterior distributions of the means and standard deviations (SDs) of the population-level parameters for the patient-specific post-treatment minus pre-treatment mean p-PDGFR differences, obtained under either the non-mixture model or the ...

Computing the supnorm metrics S1,…,S88 to compare goodness-of-fit for the non-mixture and mixture models for the within-patient p-PDGFR data gave median{Si} = .098 with 5th and 95th percentiles .048 and .155 for the non-mixture model, compared to median{Si} = .033 with 5th and 95th percentiles .016 and .055 for the mixture model. Thus, the mixture model appears to provide a substantially better fit.

4.2 Regression analyses for PFS time

Table 2 gives the DIC and BIC values for the fitted regression models for PFS, under the mixture formulation, for each of four event time distributions. Both statistics indicate that the log-normal gives the best fit. In particular, this allows the hazard of PFS time to be non-monotone.

Table 2
Deviance information criterion (DIC) and Bayesian information criterion (BIC) for four competing event time distributions used to estimate progression free survival time under the mixture model.

Table 3 summarizes three fitted log-normal regression models, first using the non-mixture model for p-PDGFR to obtain a single mean change δi for use as a covariate, then using the pair of mean changes δLi and δRi obtained from the mixture model as covariates, and finally a reduced model using only the right component estimated means δRi. In all fits, as shown earlier in the definitions of ηinonmix and ηimix, the effects of these covariates on PFS were evaluated assuming a fully interactive model with different covariate effects within the two treatment arms. Under the Bayesian formulation, for a given covariate a coefficient β with posterior concentrated around 0 corresponds to no effect, whereas a value of Pr(β > 0 | data) close to either 0 or 1 has the interpretation that PFS changes substantively with the covariate. This corresponds to a small p-value in a frequentist analysis, but without the usual problem of interpretability. Under the log-normal model, β > 0 (β < 0) corresponds to longer (shorter) PFS, on average. Thus, the positive values of the pre-treatment Hb effects correspond to a higher pre-treatment Hb level being associated with longer PFS. The negative values of the post-treatment PSA - pre-treatment PSA effects correspond to the well-known fact that a rising PSA is associated with disease progression in prostate cancer. However, this effect was much larger in the placebo arm than in the imatinib arm, which suggests that imatinib may disrupt the effect of change in PSA on PFS time. While the mean change δi obtained from the non-mixture model is moderately associated with PFS, the effect of the mean change δRi from the mixture model is quite strong in the DP arm. It thus appears that δi has been deconvoluted into a strongly associated component, δRi, and a weakly associated component, δLi, by the mixture model. This is supported by the DIC and BIC values in Table 3, which indicate that the regression model including δRi but not δLi in each arm gives a better fit than either the regression model including both δLi and δRi, or including only δi from the non-mixture model.

Table 3
Fitted log-normal non-mixture and mixture regression models for estimating progression-free survival time. The reduced mixture model includes the right-hand mean changes but not the left-hand mean changes in p-PDGFR. The posterior mean of each regression ...

Figure 2 gives plots of the estimated covariate-adjusted hazard functions under the reduced log-normal model that includes δRi but not δLi. In the figure, the left and right columns correspond to the DI and DP arms, the rows correspond to increase in PSA evaluated at its 10, 50th and 90th percentiles, and within each of the six plots the estimated hazard is given for δRi evaluated at its 10%, mean, and 90% points. This figure may be regarded as a graphical analysis of covariance, illustrating the increasing hazard of disease progression with larger increase in PSA, a higher hazard in the imatinib arm compared to placebo, a decreasing hazard with larger δRi, and the interactions of both the PSA increase and δRi with treatment arm.

Figure 2
Plots of the estimated hazard functions under the final log-normal mixture model, given in Table 3, that includes δRi but not δLi. Within each of the six plots, the three hazards correspond to δRi set equal to its 10% (dotted line), ...

5. Discussion

In this analysis, we assumed patient-specific mean and precision parameters to analyze the p-PDGFR data, and introduced patient-specific mixture parameters for the purpose of deconvoluting the p-PDGFR distributions. Rather than fixing hyperparameters of priors for the patient-specific parameters, we assumed a hierarchical model. One could assume simpler, more parsimonious models, for example, a model where common data precision parameters are assumed among patients. For the p-PDGFR data, however, the goodness-of-fit analyses based on the supnorm metric indicated that the models including patient-specific precision parameters provide better fits.

The fact that the δRi effect was much larger in the placebo arm compared to the imatinib arm, like the PSA effect, may indicate that imatinib disrupts the effect of change in p-PDGFR. It may be hypothesized that the two component distributions of phosphorylated p-PDGFR represent different subsets of leukocytes, i.e. neutrophils, monocytes or lymphocytes, with variable roles in facilitating antitumor efficacy of taxane therapy. Alternatively, they may simply represent two leukocyte subpopulations having different capacities as cellular surrogates for a pharmacodynamic signature of antitumor efficacy, but with no functional difference. A third possibility is that the left component distribution is an artifact of a problem with laboratory method, although we have not been able to identify such a source of the bimodality.

Our analyses show that, while a decline in PSA and a rise in p-PDGFR are associated with longer PFS, these effects are much smaller in the imatinib arm. These results suggest that imatinib may disrupt the effects of these covariates on PFS. To validate this apparent effect of imatinib, however, further investigation with a larger patient cohort would be required.


Satoshi Morita’s work was supported in part by grant H20-CLINRES-G-009 from the Ministry of Health, Labour, and Welfare in Japan. The authors thank two referees for their constructive commments and suggestions, which led to substantive improvements in the paper.


The MCMC sampling scheme that we used to fit the mixture model was carried out as follows. We used the Gibbs sampling from standard conditional distributions (Steps 1 to 8) in combination with Metropolis-Hastings steps for sampling from nonstandard conditional distributions (Steps 9 and 10).

Step 0: Initialize the parameters ξi(0),μxLi*(0),μxRi*(0),μyLi*(0),μyRi*(0),τxLi*(0),τxRi*(0),τyLi*(0),τyRi*(0),λxi(0) and λyi(0) for i = 1,…,N and the hyperparameters μ˜ξ(0),μ˜xL*(0),μ˜xR*(0),μ˜yL*(0),μ˜yR*(0),τ˜ξ(0),τ˜xL*(0),τ˜xR*(0),τ˜yL*(0),τ˜yR*(0),α˜x(0),β˜x(0),α˜y(0),β˜y(0),a˜xL(0),b˜xL(0),a˜xR(0),b˜xR(0),a˜yL(0),b˜yL(0),a˜yR(0) and b˜yR(0) using the within-patient pre-treatment and post-treatment empirical means and variances of the p-PDGFR samples to obtain the initial values.

For each iteration t = 1,2,…,

Step 1: Sample the latent mixture indicator variables ζxij(t+1) and ζyik(t+1), for j = 1,…,mi and k = l,…,ni, respectively from Bernoulli(zxij(t))andBernoulli(zyik(t)) distributions, where


and repeat this step for each i = 1,…,N.

Step 2: Sample ξi(t+1) for i = 1,…,N from the following normal distribution,

N (Mξ(t)Sξ(t),(Sξ(t))1)

where Mξ(t)=τ˜ξ(t)μ˜ξ(t)+τxLi(t)(j=1miζxij(t+1))(X¯LiμxLi*(t))+τxRi(t)(j=1mi(1ζxij(t+1)))(X¯RiμxRi*(t))+τyLi(t)(k=1niζyik(t+1))(Y¯LiμyLi*(t))+τyRi(t)(k=1ni(1ζyik(t+1)))(Y¯RiμyRi*(t)) and Sξ(t)=τ˜ξ(t)+τxLi(t)j=1miζxij(t+1)+τxRi(t)j=1mi(1ζxij(t+1))+τyLi(t)k=1niζyik(t+1)+τyRi(t)k=1ni(1ζyik(t+1)). And X¯Li=(j=1miζxij(t+1)Xij)/j=1miζxij(t+1),X¯Ri=(j=1mi(1ζxij(t+1))Xij)/(j=1mi(1ζxij(t+1))),Y¯Li=(k=1niζyik(t+1)Yik)/k=1niζyik(t+1), and Y¯Ri=(k=1ni(1ζyik(t+1))Yik)/(k=1ni(1ζyik(t+1))).

Step 3: Sample μxLi*(t+1) and μxRi*(t+1) for i = 1,…,N, under the constraint μxLi*(t+1)<μxRi*(t+1), using the inverse cumulative distribution method, through the following sub-steps 3-1 to 3-4.

Step 3-1: Sample μxLi(t+1) from the following normal distribution,

N (MμxL(t)SμxL(t),(SμxL(t))1)

where MμxL(t)=τ˜xL*(t)μ˜xL*(t)+τxLi(t)(j=1miζxij(t+1))(X˜Liξi(t+1)) and SμxL(t)=τ˜xL*(t)+τxLi(t)j=1miζxij(t+1).

Step 3-2: Calculate μxLi*(t+1)ϕ(μxRi*)dμxRi*=1Φ(μxLi*(t+1)), where the quantity of Φ(μxLi*(t+1)) is the normal CDF at μxLi*(t+1) under the following normal distribution,

N (MμxR(t)SμxR(t),(SμxR(t))1)

where MμxR(t)=τ˜xR*(t)μ˜xR*(t)+τxRi(t)(j=1mi(1ζxij(t+1)))(X˜Riξi(t+1)) and SμxR(t)=τ˜xR*(t)+τxRi(t)j=1mi(1ζxij(t+1)).

Step 3-3: Sample u2(t+1) from U[Φ(μxLi*(t+1)),1].

Step 3-4: Sample μxRi*(t+1) from Φ1(u2(t+1)), where Φ1(u2(t+1)) is the inverse normal CDF evaluated at u2(t+1) under the normal distribution (7).

Step 4: Sample μyLi*(t+1) and μyRi*(t+1) for i = 1,…,N, similarly to Step 3, under the restriction that μyLi*(t+1)<μyRi*(t+1).

Step 5: Sample the mixture parameters λxi(t+1) and λyi(t+1) for i = 1,…,N respectively from




where ([alpha]x, betax) and ([alpha]y, betay) denote the hyperparameters of the priors of λxi and λyi, respectively. The simulations in this step are subject to the constraint that λxi(t+1) and λyi(t+1) are restricted to the interval [0.001, 0.999].

Step 6: Sample τxLi(t+1),τxRi(t+1),τyLi(t+1), and τyRi(t+1) for i = 1,…,N respectively from


where (ãxL, bxL), (ãxR, bxR),(ãyL, byL), and (ãyR, byR) denote the hyperparameters of the priors of τxLi, τxRi, τyLi, and τyRi, respectively.

Step 7: sample μ˜ξ(t+1),μ˜xL*(t+1),μ˜yL*(t+1),μ˜xR*(t+1), and μ˜yR*(t+1) respectively from the following normal distributions,

N (τξϕμξϕ+Nτ˜ξ(t)μ¯ξ(t+1)τξϕ+Nτ˜ξ(t),(τξϕ+Nτ˜ξ(t))1),N (τxLϕμxLϕ+Nτ˜xL*(t)μ¯xL*(t+1)τxLϕ+Nτ˜xL*(t),(τxLϕ+Nτ˜xL*(t))1),N (τxRϕμxRϕ+Nτ˜xR*(t)μ¯xR*(t+1)τxRϕ+Nτ˜xR*(t),(τxRϕ+Nτ˜xR*(t))1),N (τyLϕμyLϕ+Nτ˜yL*(t)μ¯yL*(t+1)τyLϕ+Nτ˜yL*(t),(τyLϕ+Nτ˜yL*(t))1),andN (τyRϕμyRϕ+Nτ˜yR*(t)μ¯yR*(t+1)τyRϕ+Nτ˜yR*(t),(τyRϕ+Nτ˜yR*(t))1),

where μ˜ξ(t+1)=N1i=1Nξi(t+1),μ˜xL*(t+1)=N1i=1NμxLi*(t+1),μ˜yL*(t+1)=N1i=1NμyLi*(t+1),μ˜xR*(t+1)=N1i=1NμxRi*(t+1) and μ˜yR*(t+1)=N1i=1NμyRi*(t+1). The pairs of (μξϕ,τξϕ),(μxLϕ,τxLϕ),(μxRϕ,τxRϕ),(μyLϕ,τyLϕ) and (μyRϕ,τyRϕ) are fixed numerical mean and precision parameter values that determine the hyperpriors for μ˜ξ,μ˜xL*,μ˜yL*,μ˜xR*, and μ˜yR*, respectively.

Step 8: Sample τ˜ξ(t+1),τ˜xL*(t+1),τ˜yL*(t+1),τ˜xR*(t+1), and τ˜yR*(t+1) respectively from


The pairs of (aξϕ,bξϕ),(axLϕ,bxLϕ),(axRϕ,bxRϕ),(ayLϕ,byLϕ) and (ayRϕ,byRϕ) are fixed numerical shape and inverse scale parameter values that determine the hyperpriors for [tau with tilde]ξ, τ˜xL*,τ˜yL*,τ˜xR* and τ˜yR*, respectively.

Step 9: Generate (α˜x(t+1),β˜x(t+1)) and (α˜y(t+1),β˜y(t+1)) using the the Metropolis-Hastings algorithm. The the Metropolis-Hastings algorithm was implemented using a product of two independent log-normal distributions as a proposal distribution.

Step 9-1: Draw a sample as a candidate for new [alpha]x from a log-normal distribution with mean mα˜x(t) and standard deviation σ[alpha]x fixed at 0.1 on the log scale, mα˜x(t) is given by log(α˜x(t))+σα˜x2. Perform the same sampling for betax with as σbetax fixed at 0.15.

Step 9-2: Compute a ratio r(t+1) denned as

r(t+1)=p(α˜xnew,β˜xnew | λx)q(α˜x(t),β˜x(t) | α˜xnew,β˜xnew)p(α˜x(t),β˜x(t) | λx)q(α˜xnew,β˜xnew | α˜x(t),β˜x(t))

where q(·) denotes a proposal distribution and λx = (λx1,…,λxN).

Step 9-3: Accept the new candidates with a probability min(l, r(t+1)), otherwise the values of [alpha]x and betax remain unchanged.

Step 9-4: Generate α˜y(t+1) and β˜y(t+1), similarly to Steps 9-1, 9-2, 9-3.

Step 10: Generate (a˜xL(t+1),b˜xL(t+1)),(a˜xR(t+1),b˜xR(t+1)),(a˜yL(t+1),b˜yL(t+1)), and (a˜yR(t+1),b˜yR(t+1)) using the Metropolis-Hastings algorithms, similarly to Step 9.


  • Belin TR, Rubin DB. The analysis of repeated-measures data on schizophrenic reaction times using mixture models. Statistics in Medicine. 1995;14:747–768. [PubMed]
  • Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. 2nd Edition. New York: Chapman and Hall/CRC; 2004.
  • Gilks W, Richardson S, Spiegelhalter D. Markov Chain Monte Carlo in Practice. London: Chapman and Hall; 1996.
  • Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970;57:97–109.
  • Ibrahim JG, Chen M-H, Sinha D. Bayesian Survival Analysis. New York: Springer Verlag; 2001.
  • Lindsey JK. Applying Generalized Linear Models. New York: Springer; 2000.
  • Mathew P, Thall PF, Bucana CD, Oh WK, Morris MJ, Jones DM, et al. Platelet-derived growth factor receptor inhibition and chemotherapy for castration-resistant prostate cancer with bone metastases. Clinical Cancer Research. 2007;13:5816–5824. [PubMed]
  • McLachlan G, Peel D. Finite Mixture Models. New York: Wiley; 2000.
  • Mwalili SM, Lesaffre E, Declerck D. A Bayesian ordinal logistic regression model to correct for interobserver measurement error in a geographical oral health study. J. R. Statist. Soc. C. 2005;54:77–93.
  • Pietras K, Sjoblom T, Rubin K, Heldin C-H, Ostman A. PDGF receptors as cancer drug targets. Cancer Cell. 2003;3:439–443. [PubMed]
  • Schwarz G. Estimating the dimension of a model. Annals of Statistics. 1978;6:461–464.
  • Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of model complexity and fit (with discussion) Journal of the Royal Statistical Society, Series B. 2002;64:583–639.
  • Uehara H, Kim SJ, Karashima T, David LS, Dominic F, Tsan R, et al. Effects of blocking platelet-derived growth factor-receptor signaling in a mouse model of experimental prostate cancer bone metastases. Journal of the National Cancer Institute. 2003;95:458–570. [PubMed]