Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2870722

Formats

Article sections

- Summary
- 1. Introduction
- 2. Method
- 3. Simulation Studies
- 4. Colorectal Cancer Study
- 5. Concluding Comments
- Supplementary Material
- References

Authors

Related links

Biometrics. Author manuscript; available in PMC 2010 May 14.

Published in final edited form as:

PMCID: PMC2870722

NIHMSID: NIHMS196112

The publisher's final edited version of this article is available at Biometrics

See other articles in PMC that cite the published article.

Colorectal cancer is the second leading cause of cancer related deaths in the United States, with more than 130,000 new cases of colorectal cancer diagnosed each year. Clinical studies have shown that genetic alterations lead to different responses to the same treatment, despite the morphologic similarities of tumors. A molecular test prior to treatment could help in determining an optimal treatment for a patient with regard to both toxicity and efficacy. This article introduces a statistical method appropriate for predicting and comparing multiple endpoints given different treatment options and molecular profiles of an individual. A latent variable-based multivariate regression model with structured variance covariance matrix is considered here. The latent variables account for the correlated nature of multiple endpoints and accommodate the fact that some clinical endpoints are categorical variables and others are censored variables. The mixture normal hierarchical structure admits a natural variable selection rule. Inference was conducted using the posterior distribution sampling Markov chain Monte Carlo method. We analyzed the finite-sample properties of the proposed method using simulation studies. The application to the advanced colorectal cancer study revealed associations between multiple endpoints and particular biomarkers, demonstrating the potential of individualizing treatment based on genetic profiles.

In most cancer clinical trials, some patients respond to chemotherapy very well while others show no sign of response. Likewise, some patients experience more toxicity than others given the same treatment. Recent clinical studies have suggested that patients who possess specific genetic alterations or mutations may respond differently to the same treatment for colorectal cancer (Milano and McLeod, 2000). However, tools for individualizing chemotherapy treatment using genetic profiles are not yet fully developed (McLeod and Murray, 1999).

The objective of a randomized phase III trial (Goldberg et al., 2004), initiated by the Mayo Clinic in 1997, was to compare the effect of combinations of chemotherapy agents in patients with advanced colorectal cancer. At that time, two chemotherapy drugs had been approved by the Food and Drug Administration for treatment of advanced colon cancer: 5-fluorouracil (5-FU) and irinotecan (CPT-11), while oxaliplatin (OXAL), a cisplatinum analogue with activity in colorectal cancer, was an investigational agent in the United States and Canada. Two experimental combinations of regimens, 5-FU+OXAL and OXAL+CPT-11, were compared to the standard regimen, 5-FU+CPT-11, in the trial. We refer to these regimens as arm F, arm G, and the control as arm A, respectively. A total of 1705 patients were included in the study, of which 513 (115 patients in arm A, 292 patients in arm F, and 106 patients in arm G) were genotyped for 23 biomarkers. These biomarkers were selected based on previous reports indicating that they were related to bioactivity of the chemotherapies by direct or indirect mechanisms. Descriptive summaries of the covariates are shown in Table 1.

In this article, we develop a statistical model, given multiple treatment options and characteristics of an individual, which predicts and compares toxicity and efficacy simultaneously. A direct comparison of treatments will compare the probability that the predicted multiple endpoints for an individual patient or a group of patients fall in a favorable region of treatment outcomes (Δ). This process involves two tasks: (1) building a predictive model with appropriate prognostic and predictive factors and (2) for each treatment, predicting the probability of being in region Δ. To be concrete, let *y* be a *k*-dimensional vector of outcomes (*y* = (*y*_{1}, …, *y _{k}*)),

Methods that involve Bayesian inference for variable selection, motivated by the seminal work of George and McCulloch (1993), have been applied in many studies (e.g., Brown, Vannucci, and Fearn, 1998; Chen and Dey, 2003; Sha, Tadesse, and Vannucci, 2006). Bayesian methods have the potential to account for small samples and selection of derived covariates, e.g., interaction terms, through a proper specification of the priors. In this article, we propose a statistical method to individualize treatment in the Bayesian framework.

Another challenge to conducting an appropriate analysis is that Δ is a complicated multidimensional space. For example, a patient with a confirmed tumor response may have a longer survival. A multivariate analysis is a reasonable choice when outcomes are not independent, as information is borrowed from the potentially correlated outcomes for variable selection. Furthermore, by estimating the posterior predictive probability of multiple outcomes in region Δ, one can provide a single score to compare treatments from multiple perspectives which may be related, e.g., time to progression, and overall survival. This is of critical importance, because an informed therapeutic decision is often based on consideration of multiple endpoints. When outcomes follow different type of distributions the coefficients of the same regressor across the different outcomes are not directly comparable. In the context of a multivariate regression model, properly scaled coefficients are needed. This can been done using latent variables; in addition, the variance–covariance matrix of such a multivariate regression model is often structured, which dictates additional care when specifying a prior. To address these issues, we propose a method called multivariate Bayesian selection of interactions (MBSI). The MBSI method features joint modeling of categorical and survival responses, a nonconjugate prior for structured variance components, selection of interactions with limited sample size, and a variable selection rule. Each issue has been addressed separately in literature; nevertheless, the particular combination has not been studied.

In Section 2, we describe the hierarchical structure of our MBSI model, and we will show how the decision rules for variable selection are derived. The performance of the MBSI is studied through simulations in Section 3. In Section 4, the method will be illustrated in an analysis of the large phase III colorectal cancer study. Finally, a short discussion is given in Section 5.

We consider the problem of modeling the relationship between *K* multiple responses (including categorical and survival) and *p* predictor variables with sample size *n* in a Bayesian framework. In the motivating example, we are interested in *K* 3 outcomes. Let *y*_{1} and *y*_{2} be the binary outcomes of toxicity and tumor response, *y*_{3} be the survival outcome, *x _{i}* be the

We relate the regressors with the binary responses *y*_{1} and *y*_{2} through a probit link, which yields

$$\mathit{\text{Pr}}\left({y}_{k}=1|\mathit{\text{X}},{\alpha}_{k},{\beta}_{(k)}\right)=\Phi \left({\alpha}_{k}+\mathit{\text{X}}{\beta}_{(k)}\right),$$

where Φ is the normal cumulative distribution function and *k* = 1, 2. We introduce a *n* × 1 vector of latent variable *z _{k}* (Albert and Chib, 1993). For each individual (

$${y}_{ik}=\{\begin{array}{cc}1& \text{if}\phantom{\rule{0.4em}{0ex}}{z}_{ik}>0\\ 0& \text{if}\phantom{\rule{0.4em}{0ex}}{z}_{ik}\le 0\end{array}.$$

The latent variable has a linear model form: *z _{k}* =

With respect to the survival outcome *y*_{3}, denote the censoring indicator by a *n* × 1 vector *c*, where *c _{i}* = 0 if

$$\text{log}({y}_{i3})\phantom{\rule{0.2em}{0ex}}\{\begin{array}{cc}={z}_{i3}& \text{if}\phantom{\rule{0.4em}{0ex}}{c}_{i}=1\\ <{z}_{i3}& \text{if}\phantom{\rule{0.4em}{0ex}}{c}_{i}=0\end{array}.$$

The complete variable has a normal distribution as *z*_{3} ~ *N*(*α*_{3} + *X**β*_{(3)}, *σ*^{2}). Note that when *c* = 0, the variable *z*_{3} is unobservable. Our approach is same as the data augmentation Gibbs sampling method used in Bayesian variable selection with log-normal accelerated failure time model in Sha et al. (2006). This results in explicit full conditional distributions of ** β** [

Having introduced the latent variables (*z*_{1}, *z*_{2}) for binary responses and the complete variable (*z*_{3}) for survival response, we can rewrite the preceding model in a multivariate regression form. Letting ** α** = [

$$\begin{array}{l}\begin{array}{ccc}\mathit{\text{Z}}={1}_{n}{\mathit{\alpha}}^{T}+\mathit{\text{X}}\beta +\epsilon & \text{and}& \text{vec}({\epsilon}^{T})\sim N(0,{\mathit{\text{I}}}_{n}\otimes \sum )\end{array},\\ \begin{array}{cc}\text{where}& \sum =\left[\begin{array}{ccc}1& {\rho}_{1}& {\rho}_{2}\sigma \\ {\rho}_{1}& 1& {\rho}_{3}\sigma \\ {\rho}_{2}\sigma & {\rho}_{3}\sigma & {\sigma}^{2}\end{array}\right]\end{array}.\end{array}$$

(1)

Note that denotes the Kronecker product, and vec (*ε** ^{T}*) is the vector obtained by stacking the columns of

We introduce a *p* × 3 matrix of binary latent variables ** γ** with components

$$\left[\begin{array}{ccc}(1-{\gamma}_{j1}+{\gamma}_{j1}{c}^{2}){\tau}^{2}& 0& 0\\ 0& (1-{\gamma}_{j2}+{\gamma}_{j2}{c}^{2}){\tau}^{2}& 0\\ 0& 0& (1-{\gamma}_{j3}+{\gamma}_{j3}{c}^{2}){\tau}^{2}{\sigma}^{2}\end{array}\right].$$

(2)

Here **Σ_{β}** corresponds to an a priori independence assumption for the coefficients. Note that for the survival outcome in equation (2), the variance is adjusted by a quantity

In this prior setting, the variable selection problem is formulated in terms of making inferences regarding ** γ**. A value of

We assume that all three intercepts are always included in the model. Hence, selection of the intercepts is not performed. We assume a simple diffuse multivariate normal prior for ** a** independent of

$$\begin{array}{ccc}\alpha |\sigma \sim {N}_{3}(0,{\sum}_{\alpha}),& \text{where}& {\sum}_{\alpha}=\left[\begin{array}{ccc}{\sigma}_{\alpha}^{2}& 0& 0\\ 0& {\sigma}_{\alpha}^{2}& 0\\ 0& 0& {\sigma}^{2}{\sigma}_{\alpha}^{2}\end{array}\right]\end{array}.$$

Note that **Σ_{α}** is adjusted for the survival outcome in the same manner as in equation (2).

In a Bayesian regression framework, a Wishart distribution is often used as a prior for the variance matrix **Σ** for convenience. However, in our case the Wishart distribution is not a proper prior due to the constraint on the variance of the binary outcomes *y*_{1} and *y*_{2}. Therefore, we modeled **Σ** using the separation strategy of Barnard, McCulloch, and Meng (2000).

The prior distributions for *σ* and ** ρ** = [

$$1-{\rho}_{1}^{2}-{\rho}_{2}^{2}-{\rho}_{3}^{2}+2{\rho}_{1}{\rho}_{2}{\rho}_{3}>0.$$

(3)

Hence, a joint uniform prior for *ρ*_{1}, *ρ*_{2}, *ρ*_{3} is specified as

$$p({\rho}_{1},{\rho}_{2},{\rho}_{3})=\frac{2}{{\pi}^{2}}{\mathit{\text{I}}}_{(1-{\rho}_{1}^{2}-{\rho}_{2}^{2}-{\rho}_{3}^{2}+2{\rho}_{1}{\rho}_{2}{\rho}_{3}>0,{\rho}_{1}{\rho}_{2},{\rho}_{3}\in (-1,1))}$$

with informative marginal distribution

$$\begin{array}{ccc}p({\rho}_{k})=\frac{2}{\pi}\sqrt{1-{\rho}_{k}^{2}},& {\rho}_{k}\in (-1,1),& k=1,2,3.\end{array}$$

Given the other two elements of ** ρ**, say

We first assume that the selection of regressors is a priori independent across the outcomes. Second, within each outcome, we assume that the selection of a main effect is dependent on the selection of all its interaction terms. Because interaction terms represent deviation from an additive model, we adopt the convention that a model containing an interaction term should also contain the corresponding main effects (Neter et al., 1996).

Define a *p* × 3 matrix of indicator variables ** ξ** with components

$$\begin{array}{l}p({\gamma}_{jk}=1|{\Omega}_{jk},{\pi}_{jk})=\{\begin{array}{lll}1& \text{if}\phantom{\rule{0.2em}{0ex}}{\xi}_{jk}=1\phantom{\rule{0.2em}{0ex}}\text{and}& \sum _{{\gamma}_{{j}^{\prime}k}\in {\Omega}_{jk}}{\gamma}_{{j}^{\prime}k}>0\\ {\pi}_{jk}& \text{if}\phantom{\rule{0.2em}{0ex}}{\xi}_{jk}=1\phantom{\rule{0.2em}{0ex}}\text{and}& \sum _{{\gamma}_{{j}^{\prime}k}\in {\Omega}_{jk}}{\gamma}_{{j}^{\prime}k}=0\\ {\pi}_{jk}& \text{if}\phantom{\rule{0.2em}{0ex}}{\xi}_{jk}=0& \end{array}\\ {\pi}_{jk}\sim \text{beta}(a,b).\end{array}$$

To favor parsimonious models or when *n* < *p*, the parameters (*a, b*) in the beta prior can be chosen to force *π _{jk}* to be small.

It is commonly perceived that the optimal predictive model is the model with highest joint posterior probability, but this is not necessarily the case as discussed in Barbieri and Berger (2004). Our simulation studies also showed that a method based on marginal posterior probabilities was better than that of the highest joint posterior probability method regarding the prediction performance (Table 2 in Section 3.2 and Web Figure 1).

The major differences between choosing a model (using highest joint posterior probability) and choosing important predictors (using marginal posterior probabilities) are the following: (1) The selection domain is *p* for predictors in contrast to 2* ^{p}* for models, assuming no restriction. For a fixed number of MCMC iterations, it is desirable to search in a small parameter domain space. Determining marginal probabilities is computationally simpler than determining highest joint probabilities (Barbieri and Berger, 2004). (2) The joint prior probabilities are affected by the size of

There are two types of errors in the variable selection problem: (1) selecting a variable that in truth is not a predictor (false discovery); and (2) not selecting a variable that in truth is a predictor (false negative). These two errors can be quantified by two complementary Bayesian losses: posterior expected FDR
$(\overline{\text{FDR}})$ and posterior expected FNR
$(\overline{\text{FNR}})$. For the sake of simplicity in this section only, we introduce notation omitting subscript *k* for the *k*th outcome, because the definition is the same for all three outcomes. Let *d _{j}* denote the decision of inclusion (

$$\begin{array}{ll}\overline{\text{FDR}}\hfill & =\{\begin{array}{ll}{E}_{\gamma |\text{Data}}\left(\frac{\sum {d}_{j}(1-{\gamma}_{j})}{D}|\text{Data}\right)\hfill & \\ \phantom{\rule{1.5em}{0ex}}=\frac{\sum {d}_{j}(1-{\nu}_{j})}{D}\hfill & \text{if}\phantom{\rule{0.4em}{0ex}}D>0\hfill \\ 0\hfill & \text{if}\phantom{\rule{0.4em}{0ex}}D=0\hfill \end{array}\hfill \\ \text{and}\hfill & \\ \overline{\text{FNR}}\hfill & =\{\begin{array}{ll}{E}_{\gamma |\text{Data}}\left(\frac{\sum (1-{d}_{j}){\gamma}_{j}}{p-D}|\text{Data}\right)\hfill & \\ \phantom{\rule{1.5em}{0ex}}=\frac{\sum (1-{d}_{j}){\nu}_{j}}{m-D}\hfill & \text{if}\phantom{\rule{0.4em}{0ex}}D<p\hfill \\ 0\hfill & \text{if}\phantom{\rule{0.4em}{0ex}}D=p,\hfill \end{array}\hfill \end{array}$$

(4)

where *ν _{j}* =

Accounting for the relationship between main effects and interactions, the
$\overline{\text{FDR}}$ and
$\overline{\text{FNR}}$ will be calculated separately within the set of main effects (*s*_{1}) and the set of interactions terms (*s*_{2}). Hence, given *α _{L}* for main effects and

As the decision in *s*_{1} is affected by the decision of their higher-order terms in *s*_{2}, we start with minimizing the
${\overline{\text{FNR}}}_{{\mathit{\text{s}}}_{2}}$ followed by minimizing the
${\overline{\text{FNR}}}_{{\mathit{\text{s}}}_{1}}$. After the decisions are made for all the terms in *s*_{2}, a subset (denoted as
${{\mathit{\text{s}}}^{\prime}}_{1}$) of the lower-order terms in *s*_{1} is forced to be included in the model due to the constraint. Hence, those terms are not involved in the decision to minimize the
${\overline{\text{FNR}}}_{{\mathit{\text{s}}}_{1}}$. Decisions will be made for the remaining terms in the complement of
${{\mathit{\text{s}}}^{\prime}}_{1}$ (denoted as
$\overline{{{\mathit{\text{s}}}^{\prime}}_{1}}$). The following algorithm illustrates the steps to reach these decisions. First, if *t* is a threshold such that the decision *d _{j}* =

The choice of *α _{L}* and

Let *M _{l}*(

Analytically evaluating the summation and integral is difficult due to the large model space. An immediate Monte Carlo estimate can be calculated as follows. We obtain draws of the parameters
${\mathit{\theta}}_{l}^{(1)},\dots ,{\mathit{\theta}}_{l}^{({T}_{l})}$ from model *M _{l}*. For each set of parameters
${\mathit{\theta}}_{l}^{(t)}$,

To decide which treatment is best for a given patient with covariates *x*_{new}, we computed posterior predictive probabilities of superiority for the three treatments (A, F, and G). For each posterior sample of the model parameters
${\mathit{\theta}}_{A}^{(t)}$, *t* = 1, …, *T*, compute
${\mathit{\text{I}}}_{A}^{(t)}$ = {Pr(*y*_{new} *θ*_{A}, x_{new}) > Pr(*y*_{new} *θ*_{F}, x_{new}) and Pr(*y*_{new} *θ*_{A}, x_{new}) > Pr(*y*_{new} *θ*_{G}, x_{new})}, and analogously compute
${\mathit{\text{I}}}_{F}^{(t)}$ and
${\mathit{\text{I}}}_{G}^{(t)}$. Averaging these indicators over the *T* posterior samples would yield relative probabilities for superiority
${p}_{A}=\frac{1}{T}{\sum}_{t=1}^{T}{\mathit{\text{I}}}_{A}^{(t)}$ (and likewise, *p _{F}* and

To investigate the performance of the variable selection, the prediction of multiple non-Gaussian outcomes, and the effect of sample size on our proposed method, we performed a series of simulation studies. For each sample size (*n* = 30 or 200), we set the number of potential predictors *p* 55, which consisted of 10 main effects and 45 pairwise interaction terms. We assume that all of the 10 main effects are independent and from a *N*(0, 1) distribution.

Three responses were simulated: two binary outcomes (with probit link) and one survival outcome (with a lognormal link). Among the 55 potential predictors, the true predictors and their coefficients were set as 0.3 + *X*_{1} + *X*_{2} + *X*_{3} + *X*_{4} + *X*_{1} *X*_{2} + *X*_{1}*X*_{4} for the first binary outcome, −0.3 + *X*_{2} + *X*_{3} + *X*_{4} + *X*_{5} + *X*_{2}*X*_{3} + *X*_{4}*X*_{5} for the second binary outcome, and 0.3*X*_{3} + 0.3*X*_{4} + 0.3*X*_{5} − 0.2*X*_{3}*X*_{4} + 0.2*X*_{4}*X*_{5} for the survival outcome. We drew a *n* × 3 matrix of latent variables ** Z** from the trivariate normal density (1) given these true predictors. The parameters in the variance–covariance matrix of

For all the simulated data sets, we applied the MBSI method with *c* = 10 and *τ* = 0.05. For the hyperparameters in the beta prior, the mode was set to 3/*p* and the mean was 20% larger than the mode. The length of the MCMC chain was set to be 20,000, from which the first 5000 iterations were discarded. The significance levels (*α _{L}, α_{H}*) were set to (0.1, 0.2) and (0.3, 0.8), respectively, for the sample sizes 200 and 30. When there was a sufficiently large sample size (

We evaluated prediction using the posterior predictive distributions as follows. Let *y _{test}* and

The true probability Pr(*y _{test}* Δ), the estimated mean and standard error of posterior predictive probability, and the 90% CPI are shown in Table 2. Two model selection approaches are compared in the table. One selects a model with the highest joint posterior probability, while the other selects a model that satisfies the
$\overline{\text{FDR}}$ criteria. Graphical illustration of the comparison between the two modeling approaches can be found in Web Figure 1.

When the sample size equals 200, there was no significant difference between the two model selection methods in terms of estimated mean and coverage. When the sample size dropped to *n* = 30, the prediction uncertainty increased greatly. Test cases 4 and 5 were underestimated using the joint posterior model selection method. The cause of this biased estimation was likely due to the exclusion of the important predictors. The joint posterior probabilities were dominated by the prior distribution, which favors parsimonious models. The predictions for cases 4 and 5 were slightly improved when using the
$\overline{\text{FDR}}$ criteria.

Now we return to the colorectal study described in the Introduction. We dichotomized the 23 biomarkers to mutant (0) or wild type (1). The endpoints observed for each patient in this trial were an indication of toxicities, tumor response, and time to tumor progression (TTP). Several types of toxicities were monitored, e.g., nausea, dehydration, neutropenia, vomiting, diarrhea, febrile neutropenia, and paresthesia. We chose the maximum grade among all types of toxicities and set a binary outcome toxicity (TOX) to 1 if the maximum grade was 4 or 5, which represents life threatening or fatal toxicity on a 5 point scale (National Cancer Institute Common Toxicity Criteria Version 2), and 0 otherwise. If a patient had at least two consecutive complete or partial tumor regressions, the binary outcome tumor response (RSP) was set to 1 and 0 otherwise. The third outcome was the TTP, which was a censored continuous variable. The total number of regressors is 325: 25 main effects (23 biomarkers plus AGE and SEX) plus 300 two-way interactions.

Preliminary investigation of the multiple endpoints (RSP, TOX, and TTP) indicated that they were not independent of each other, as there were positive correlations between RSP and TTP across all the treatment groups. Patients who had a confirmed tumor response consistently had a longer progression time relative to those who had no confirmed tumor response. To check the log-normal assumption for the survival model, we fit a parametric log-normal model to the failure times with no predictors using the survreg function in R. Based on the fit, we plotted the residuals (See Web Figure 2). In all three arms, log-normality appeared to be a tenable assumption.

We applied the proposed MBSI method on each treatment arm and chose *c* = 10 and *τ* = 0.05 for the mixture normal prior. This implied that if a coefficient was less than about 2*τ* = 0.1, we were comfortable excluding that regressor from the model. For the hyperparameters in the beta prior, the mode was set to 0.2 and mean was 0.25, which is 25% larger than the mode.

The estimated marginal posterior distributions of the regressors by treatment arms are shown in Figure 1a. The first 25 regressors are main effects and the rest are interaction terms. All the main effects show very high marginal posterior probabilities. The reason is that the marginal probability of a main effect consists of two sources: one is from itself, the other is from its interactions. The more interaction terms, the more the marginal probabilities for the main effects will be inflated. When *n* > *p*, there is more separation between signal and noise. See Web Figure 3 for the marginal posterior probabilities in our simulation study. However, when *n* < *p*, the beta hyperprior plays an important role. When the anticipated proportion of true predictors (the mean of beta hyperprior) was set to be 0.25, the marginal posterior probabilities for interactions were influenced by this prior accordingly (Figure 1a). To analyze the sensitivity to the choice of beta prior, we used another set of hyperparameters with mode equal to 0.1 and mean 0.05. This setting kept the basic shape of beta distribution unchanged, but the distribution is flatter and skewed more toward smaller values, suggesting that a priori there were fewer true predictors and we would be more uncertain about the number of the true predictors. The estimated marginal posterior probabilities of the interaction terms (Figure 1b) indicated that the posterior probabilities are sensitive to the prior, which is not a surprise as *n* < *p* in each treatment group. However, for those interaction terms whose estimated posterior probabilities of being selected are above average, the ordering from the highest to the lowest probabilities is very similar from either settings of beta prior. There is much uncertainty about most regressors, yet the strong signals from several of the interactions are worthy of further investigation.

Marginal posterior distributions of the regressors by treatment arms in the colorectal cancer study for the sensitivity analysis of beta priors.

To select the important predictors, we set the
$\overline{\text{FDR}}$ threshold *α _{L}* = 0.001 and

Note that the estimated coefficients were not significantly different from 0 for some main effects. For example, in arm F the estimated mean and standard error (SE) of the coefficient were 0.07(0.12) for marker 11 and 0.14(0.14) for marker 23. However, their interaction had estimated mean and SE of −0.65(0.28), which was significantly different from zero. This demonstrates that even though main effects were not significant, their significant joint effect was detected by the MBSI method.

In other examples (i.e., marker 10 and marker 12 in arm F), the coefficients of main effects were not significantly different from zero, and no interactions were selected. This may have one of the three possible explanations. First, fitting all the important variables that are selected due to their high marginal probabilities into a single multiple regression model does not guarantee that all of the coefficients are still significant. The multicollinearity among some important variables may lead to the nonsignificant coefficients. Second, the threshold for $\overline{\text{FDR}}$ procedure could be too low, leading to a high FDR. In our analysis, the threshold for the main effects was set very close to zero, which rules out this possibility. Third, the estimated high marginal probability of a main effect could be due to the falsely selected interactions during the MCMC procedure. A possible remedy for this problem is to apply the MBSI method on the main effects only and review the marginal probabilities of the main effects in question.

The estimated variance components are presented in Web Table 1. A positive correlation between RSP and TTP (*ρ*_{2}) was found in all the study arms. Across all three arms, arm F resulted in the largest model size (Table 3), which is defined as the total number of distinctive regressors in a multivariate model allowing different regressors for different outcomes. The model size will tend to increase with the size of sample, because the prediction variance will usually be reduced with increased sample size (Miller, 2002).

To compare the three treatment regimens, we defined a favorable region (Δ) of treatment outcomes as a union of Δ_{1} (RSP = 0 or 1, TOX = 0, and TTP > 365) and Δ_{2} (RSP = 1, TOX = 0 or 1, and TTP > 730). The definition of Δ can be very flexible and can change from study to study. A favorable treatment outcome often represents a trade-off between efficacy and toxicity: if a cancer treatment has high efficacy, sometimes patients are willing to accept more toxicity.

In this study, 18 variables, including AGE, were selected. If we select the design points of the continuous variable AGE at age 50 and 70 (20th and 80th percentile of age distribution), together with the other 17 binary variables, the total possible number of different configurations of patient profiles is 2^{18} = 262,144. For each of these 262,144 hypothetical or future patients, we can predict the probability of outcome in region Δ given treatment A, F, or G.

In general, the experimental treatment F had a slightly higher probability of achieving an outcome in the favorable region Δ compared to the standard treatment A, whereas the treatment G had a lower probability (See Web Figure 7). Under treatment A, patients with younger age, mutant markers 5 and 23, and wild-type marker 13 had greater than 50% chance to achieve the outcome region Δ. Patients with mutant marker 7 have worse outcome than those with wild-type marker 7 given treatment G.

Because we are dealing with the *n* < *p* situation in this application, further sensitivity analyses of model size were done using different settings of the tuning parameters *c* and *τ* (Web Table 2). The range of *c* where the predictive performance is invariant is narrower in our application (with *n* < *p*) than that recommended by George and McCulloch (1993). Possible explanations of this phenomenon are the small transition probabilities and sample size, discussed in Web Appendix A.

Because of the strong influence of tuning parameter *c*, we used a 10-fold cross-validation (CV) approach to choose *c*. We randomly divided data into 10 parts, stratified on treatment arms. In each of 10 subsets, one was set aside as testing data and the remaining nine were used as training set for modeling with different values of *c*. Optimal treatments for the testing set were predicted (see Section 2.5 for the calculation) using the model trained from the training set. We then separated the 513 patients into two groups: group 1 consisted of patients who received nonoptimal treatment; group 2 consisted of patients who received optimal treatment. When *c* = 10, the survival curve and percentages of toxicities and responses of each group are shown in Figure 2a. The median time to progression of group 2 was 11.5 months which was 53% greater than that of group 1 (7.5 months). Figures with other *c* values are shown in Web Figures 8–10. The predictive performance with *c* = 5 (Web Figure 8) was less than that of *c* = 10. This is likely due to the smaller model size, which left out a few important predictors. The predictive performance with *c* = 20 (Web Figure 9) was less than that of *c* = 10. This is probably due to the large model size, which led to model overfitting. Hence, *c* = 10 was chosen to have the best prediction performance.

To further validate whether or not the proposed individualized therapy model results in any difference in treatment effect, the 10-fold method in Section 4.4 is not appropriate because it used all the data for selecting *c*. Therefore, to obtain an assessment of prediction, a nested 10-fold CV was performed. We again divided data into 10 parts. For the 9/10 parts that were used for model building, we did inner 9-fold CV to select *c* – building models for a grid of *c* values (*c* = 5, 10, 15, 20) on the 8/9 parts and choosing the one that performed best in predicting the held-out part. Then, whichever *c* was the best, it was used on predicting the 1/10 part that was not used in selecting *c*. This was repeated for each of the ten 9/10 splits resulting in a valid assessment of prediction shown in Figure 2b.

The statistical testing and confidence intervals in Figure 2 are for illustration only, and the significance has to be interpreted with caution. Lusa et al. (2007) reported that the CV process results in inflated testing type I error rates. As pointed out by one referee, the only way to *truly* assess the value of individualized therapy would be to perform a prospective trial in which the patients were treated based on an individualized plan.

In this work, we proposed a multivariate Bayesian regression model for individualizing cancer treatments. We have addressed three issues: comparing the treatments using a quantitative score derived from predicting the multiple endpoints, modeling jointly non-Gaussian outcomes, and building a regression model with the selection of interactions. The endpoints considered were categorical and censored continuous variables, which are very common in most phase III clinical trial settings. The MBSI procedure which incorporates the selection rule for interaction terms by controlling posterior expected FDR was implemented. This particular selection rule increased the power of detecting interactions, which becomes more and more important in defining useful comprehensive models for complex diseases.

Our simulation study suggests the feasibility of predicting the multiple non-Gaussian outcomes simultaneously. Although the categorical outcomes in the current model were assumed to be binary outcomes, it is straightforward to extend the proposed approach to multilevel outcomes. The survival outcome in the colorectal cancer study was assumed to follow a log-normal accelerated failure time model for the simplicity of computation. We suggest checking this assumption before implementation in the data analysis.

Regarding the priors, although spike and slab mixture priors for variable selection were proposed and applied in the literature, we consider the narrow normal priors for nonsignificant regressors as proposed by George and McCulloch (1993) to be more practical. An important issue is the choice of *c*. We used CV type of methods to choose *c* in this application. These are all fairly computationally intensive methods. Further theoretical development is needed in this area.

When applying the MBSI method in the colorectal cancer study, we modeled each arm separately. Because different treatments work through different mechanisms, the possibility of treatment-biomarker interactions exists. Separate models eliminate those interaction terms and result in a less complex model structure and faster computation. As pointed out by one referee, it would be better if the modeling was not completely separate, but rather borrowed strength across treatments in estimating the covariance parameters. A hierarchical model component could be introduced that allowed each treatment to have its own covariance parameters. Nonetheless, the increased model complexity due to the additional level is beyond the scope of this article.

The authors thank the editor, the associate editor, and the referees for their comments that considerably improved this article. Part of the work was supported by the National Institutes of Health through Karmanos Cancer Institute Support Grant (5 P30 CA022453).

Supplementary Materials: Web Tables and Figures referenced in Sections 3.2 and 4.1–4.4 are available under the Paper Information link at the *Biometrics* website http://www.biometrics.tibs.org.

- Albert JH, Chib S. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association. 1993;88:669–679.
- Barbieri MM, Berger JO. Optimal predictive model selection. The Annals of Statistics. 2004;32:870–897.
- Barnard J, McCulloch R, Meng XL. Modelling covariance matrices in terms of standard deviations and correlations, with applications to shrinkage. Statistica Sinica. 2000;10:1281–1311.
- Brown PJ, Vannucci M, Fearn T. Multivariate Bayesian variable selection and prediction. (B).Journal of the Royal Statistical Society. 1998;60:627–641.
- Chen MH, Dey DK. Variable selection for multivariate logistic regression models. Journal of Statistical Planning and Inference. 2003;111:37–55.
- Chen W, Ghosh D, Raghunathan TE, Sargent DJ. A false-discovery-rate-based loss framework for selection of interactions. Statistics in Medicine. 2008;27:2004–2021. [PubMed]
- George EI. Discussion of Bayesian model averaging and model search strategies by M.A. Clyde. In: Bernardo JM, Berger JO, Dawid AP, Smith AFM, editors. Bayesian Statistics. Vol. 6. Oxford: Oxford University Press; 1999. pp. 175–177.
- George EI, McCulloch RE. Variable selection via Gibbs sampling. Journal of the American Statistical Association. 1993;88:881–889.
- Goldberg RM, Sargent DJ, Morton RF, Fuchs CS, Ramanathan RK, Williamson SK, Findlay BP, Pitot HC, Alberts SR. A randomized controlled trial of fluorouracil plus leucovorin, irinotecan, and oxaliplatin combinations in patients with previously untreated metastatic colorectal cancer. Journal of Clinical Oncology. 2004;22:23–30. [PubMed]
- Jeffreys H. The Theory of Probability. 3rd. Oxford: Oxford University Press; 1961.
- Lusa L, McShane LM, Radmacher MD, Shih JH, Wright GW, Simon R. Appropriateness of some resampling-based inference procedures for assessing performance of prognostic classifiers derived from microarray data. Statististics in Medicine. 2007;26:1102–1113. [PubMed]
- McLeod HL, Murray GI. Tumor markers of prognosis in colorectal cancer. British Journal of Cancer. 1999;79:191–203. [PMC free article] [PubMed]
- Milano G, McLeod HL. Can dihydropyrimidine dehydrogenase impact 5FU-based treatment? European Journal of Cancer Prevention. 2000;36:37–42. [PubMed]
- Miller AJ. Subset Selection in Regression. 2nd. London: Chapman & Hall; 2002.
- Neter J, Kutner MH, Nachtsheim CJ, Wasserman W. Applied Linear Statistical Models. New York: McGraw-Hill; 1996.
- Sha N, Tadesse MG, Vannucci M. Bayesian variable selection for the analysis of microarray data with censored outcomes. Bioinformatics. 2006;22:2262–2268. [PubMed]

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |