Clin Trials. Author manuscript; available in PMC 2010 June 14.
Published in final edited form as:
PMCID: PMC2884971
NIHMSID: NIHMS203347

Model Calibration in the Continual Reassessment Method

Abstract

Background

The continual reassessment method (CRM) is an adaptive model-based design used to estimate the maximum tolerated dose in dose finding clinical trials. Indifference intervals offer one way to evaluate the sensitivity of a given CRM model, which comprises the functional form of the dose-toxicity curve, the prior distribution on the model parameter, and the initial guesses of toxicity probability at each dose. While the indifference interval technique provides a succinct summary of model sensitivity, there are infinitely many possible ways to specify the initial guesses of toxicity probability. In practice, these are generally specified by trial and error through extensive simulations.

Methods

By using indifference intervals, the initial guesses used in the CRM can be selected by specifying a range of acceptable toxicity probabilities in addition to the target probability of toxicity. An algorithm is proposed for obtaining the indifference interval that maximizes the average percentage of correct selection across a set of scenarios of true probabilities of toxicity and providing a systematic approach for selecting initial guesses in a much less time consuming manner than the trial and error method. The methods are compared in the context of two real CRM trials.

Results

For both trials, the initial guesses selected by the proposed algorithm had operating characteristics similar to those of the initial guesses used during the conduct of the trials, which were obtained by trial and error through a time consuming calibration process. Operating characteristics were measured by percentage of correct selection, average absolute difference between the true probability of the dose selected and the target probability of toxicity, percentage treated at each dose, and overall percentage of toxicity. For the trial and error method versus the proposed approach, the average percentages of correct selection for the scenarios considered were 61.5% and 62.0% in the lymphoma trial, and 62.9% and 64.0% in the stroke trial.

Limitations

We only present detailed results for the empiric dose toxicity curve, although the proposed methods are applicable for other dose toxicity models such as the logistic.

Conclusions

The proposed method provides a fast and systematic approach for selecting initial guesses of probabilities of toxicity used in the CRM that are competitive with those obtained by trial and error through a time consuming process, thus simplifying the model calibration process for the CRM.

Introduction

The estimation of the maximum tolerated dose (MTD) is considered the main objective of phase I clinical trials. The MTD is defined as the dose at which a specified proportion of the patients will experience dose limiting toxicity (DLT). Various methods for estimating the MTD have been proposed throughout the years among them the continual reassessment method (CRM; [1]). This method uses a single parameter dose toxicity model to describe the relationship between dose and probability of DLT. Using the observed toxicity data from each patient, the model parameter is estimated and the dose associated with the DLT probability closest to a prespecified target is administered to the next patient. The CRM, being a Bayesian model-based method, requires the specification of (i) the functional form of dose-toxicity model, (ii) the prior distribution of its parameter, and (iii) the ‘dose levels’. While Chevret [2] studies extensively the impacts of the functional form of the model and the prior distribution of its parameter on the performance of the method, the selection of the ‘dose levels’ has not been addressed in the literature. When using the CRM, the dose levels are not the doses administered, but rather they are obtained via backward substitution of the initial guesses of the DLT probabilities in the model. Ideally, these initial guesses correspond to the ‘desirable range of toxicity probabilities for testing’ [1]. However, in most circumstances that information is not available and selection of the initial guesses is performed in an ad hoc manner. In practice, they are generally chosen by trial and error based on extensive simulations to examine the operating characteristics of various sets of initial guesses. This process is complicated since there are as many initial guesses as the number of doses.

The challenges in the calibration of CRM are presented using two motivating examples. The first one is a dose finding trial in patients with previously untreated diffuse large B cell or mantle cell non-Hodgkin’s lymphoma [3]. The main objective of the trial was to determine the MTD of VELCADE when administered in combination with CHOP + Rituximab (CHOP-R). DLT was defined as life threatening or disabling neurologic toxicity, very low platelet count or symptomatic non-neurologic or non-hematologic toxicity requiring intervention. The target probability of DLT was 0.25. Eighteen patients were treated for six 21-day cycles (126 days). The standard dose for CHOP-R was administered every 21 days. There were five doses of VELCADE with the third dose level being the starting dose. Dose escalation was conducted according to the time-to-event CRM (TITE-CRM; [4]). In this paper, however, we will focus on the dose-toxicity modeling issue and disregard the time-to-event component.

The second motivating example is a dose finding trial of short-term high dose lovastatin for the treatment of stroke within 24 hours of symptom onset [5]. The MTD was defined as the dose associated with a target probability of DLT of 0.10, where DLT was defined as liver or muscle toxicity based on changes in laboratory markers or symptoms. The treatment was administered to 33 patients for 30 days, with dose escalation for lovastatin occurring in the first 3 days and all patients receiving 20 mg/day for the following 27 days. The five doses of lovastatin that were evaluated up to 30 days after treatment were 1, 3, 6, 8 and 10 mg/kg. Dose escalation for the stroke trial was done in two stages. The first 3 patients were assigned the first dose level. If no DLT was observed, dose escalation was done as described in Table 1 per the coherence principle [10]. Once a DLT is observed, the trial would be switched to the TITE-CRM.

Table 1
Stroke Trial: Dose escalation plan before any DLT was observed

The second author served as the statistician on both trials. Initial guesses of the DLT probabilities were chosen based on model sensitivity [7] and simulated operating characteristics under a pre-specified set of dose-toxicity configurations. As there was no systematic way to choose the initial guesses for evaluation, many sets were tested, resulting in a time-consuming ad hoc process. In both trials, the prior MTD was set at the third dose; thus the initial DLT rate at dose level 3 was set at the corresponding pre-specified target. Therefore, the search for the remaining initial guesses was done in a four-dimensional space. This could be a daunting task that deters applied statisticians from using the CRM. The motivation of this paper is to provide a systematic approach for choosing the initial guesses of the probabilities of DLT and consequently the dose levels. We propose selecting initial guesses used in the CRM by specifying a range of acceptable toxicity probabilities in addition to the target probability of toxicity. An algorithm is provided for obtaining the range of acceptable toxicity probabilities by maximizing the average percentage of correct selection across a set of scenarios of true probabilities of toxicity. The algorithm provides a fast and systematic approach for selecting near optimal initial guesses in a much less time consuming manner than the trial and error method. Thus, it simplifies the design calibration process, and enhances the use of the CRM.

Model specification in the CRM

Suppose that we are interested in estimating the dose level associated with a target DLT probability pT. Let d1, d2,…,dK be the K test doses, and F(d, β) be the dose toxicity model that is strictly increasing in the dose d for all β. (The model F(·,·) also needs to satisfy some mild regularity conditions; see Appendix 1.) Common choices for the dose toxicity model are:

Empiric: $F(d, \beta) = d^{\beta}$

One-parameter logistic: $F(d, \beta) = \{1 + \exp(-a - \beta d)\}^{-1}$ where a is a fixed constant

As the CRM was originally proposed in a Bayesian framework, a prior distribution π(·) on the model parameter β is also assumed. Given the prior distribution and the data accrued up to the first n patients (i.e. the doses assigned to the patients and their corresponding toxicity outcomes), β can be estimated by the posterior mean (denoted as $\hat\beta_n$). The dose level recommended for the (n + 1)st patient is the dose with the model-based DLT probability closest to pT, i.e.,

$$\arg\min_{d_j} \left| F(d_j; \hat\beta_n) - p_T \right|.$$

This process is continued until a pre-specified number of patients has been treated.
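The update-and-recommend step described above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes the empiric model in the form $d^{\exp(\beta)}$ with a normal prior of mean 0 and variance 1.34 (the choices stated later for the applications), approximates the posterior mean by simple grid quadrature on [−5, 5], and uses hypothetical function names.

```python
import numpy as np

def empiric(d, beta):
    """Empiric dose-toxicity model, assumed here as F(d, beta) = d ** exp(beta)."""
    return d ** np.exp(beta)

def posterior_mean(labels, doses_given, tox, sigma2=1.34, n_grid=2001):
    """Approximate posterior mean of beta under a N(0, sigma2) prior on a grid."""
    beta = np.linspace(-5.0, 5.0, n_grid)
    log_post = -beta ** 2 / (2.0 * sigma2)            # log prior up to a constant
    for j, y in zip(doses_given, tox):                # one Bernoulli term per patient
        p = np.clip(empiric(labels[j], beta), 1e-12, 1 - 1e-12)
        log_post = log_post + (np.log(p) if y else np.log1p(-p))
    w = np.exp(log_post - log_post.max())             # unnormalised posterior weights
    return float(np.sum(beta * w) / np.sum(w))

def recommend(labels, beta_hat, target):
    """0-based index of the dose whose model-based DLT probability is closest to target."""
    return int(np.argmin(np.abs(empiric(labels, beta_hat) - target)))

# Hypothetical usage: five dose levels, one patient treated at the third dose with a DLT.
labels = np.array([0.05, 0.12, 0.25, 0.40, 0.55])
beta_hat = posterior_mean(labels, doses_given=[2], tox=[1])
print(beta_hat, recommend(labels, beta_hat, target=0.25))
```

Because every dose level lies in (0, 1), the model probability decreases in β, so an observed DLT pulls the posterior mean below zero and a non-DLT pushes it above zero.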

Here, we note an important distinction between the dose levels d1, d2,…,dK and the doses administered (e.g. 1 mg/kg, 3 mg/kg, etc.). In the context of the CRM, the dose level dj is obtained by substituting the initial guess of the DLT probability pj into the specified dose toxicity model. Precisely, dj is defined such that $p_j = F(d_j; \hat\beta_0)$, for j = 1,…,K, where $\hat\beta_0$ denotes the prior mean of β. Thus, to utilize the CRM it is necessary to specify p1,…,pK in addition to the functional form of F and the prior distribution π so that the doses d1, d2,…,dK can be determined.

Chevret suggests via simulations that the performance of the CRM as measured by the percentage of correct selection depends on the choice of F, with one-parameter logistic with a = 3 being superior; see Table II in [2]. This conclusion is based on an exponential prior π(β) = exp(−β) and a particular choice of the initial guesses of DLT probabilities. In what follows, we take a different approach by which the initial guesses are chosen by first fixing F and π, and demonstrate that by selecting the appropriate initial guesses, the empiric dose toxicity model can have comparable performance to the one-parameter logistic with a = 3.

Table 2
Lower (pL) and upper (pU) probabilities of the calibration sets for pT of 0.10, 0.20, 0.25 and 0.33 where pL and pU are defined such that (pT/(1 − pT))/(pL/(1 − pL)) = (pU/(1 − pU))/(pT/(1 − pT)) = ψ.

Choosing dose levels based on indifference intervals

Assume that we have a dose finding trial with K doses (d1,…,dK) and the target probability of DLT is pT. Let Θ = [b1, bK+1] be the parameter space (i.e. β ∈ Θ) and H1 = [b1, b2), Hk = (bk, bk+1) for k = 2,…,K − 1 and HK = (bK, bK+1] where bk is the solution for F(dk−1, bk) + F(dk, bk) = 2pT for k = 2,…,K. Shen and O’Quigley [8] showed that for large enough n the CRM will recommend the true MTD (l) with certainty, if βk ∈ Hl for all k, where βk is defined such that F(dk, βk) = µk and µk is the true DLT rate associated with dose k. Cheung and Chappell [7] postulated that if the true dose toxicity function is steep around the MTD, for large enough n, the dose recommended by the CRM is the true MTD under the more relaxed conditions whereby $\beta_l \in H_l$, $\beta_k \in \bigcup_{i=k+1}^{K} H_i$ for k = 1,…,l − 1, and $\beta_k \in \bigcup_{i=1}^{k-1} H_i$ for k = l + 1,…,K.

While these consistency conditions are theoretically revealing, their practical use is limited because the conditions involve the unknown true DLT probabilities µk. Therefore, Cheung and Chappell suggested converting the intervals in the parameter space for β to intervals on the probability scale. Then, the indifference interval for a given correct dose level l was defined as an interval of DLT probabilities associated with the neighboring doses such that these neighboring doses may be selected instead of the true MTD (l). In notation, the indifference interval for the MTD (l) can be expressed as (NA, F(dl+1, bl+1)) for l = 1, (F(dl−1, bl), F(dl+1, bl+1)) for l = 2,…,K − 1 and (F(dl−1, bl), NA) for l = K.

For example, in the lymphoma trial pT is 0.25 and the dose toxicity model is assumed to be empiric where β ∈ [−5, 5]. Assume that the initial guesses of the probabilities of DLT at each dose are 0.05, 0.12, 0.25, 0.40 and 0.55, respectively. Then the sets are H1 = [−5, −0.6), H2 = (−0.6, −0.2), H3 = (−0.2, 0.22), H4 = (0.22, 0.63) and H5 = (0.63, 5]. If l = 3, the condition specified by Cheung and Chappell requires that µ1 ∈ (0, 0.193), µ2 ∈ (0, 0.176), µ3 ∈ (0.179, 0.319), µ4 ∈ (0.319, 1) and µ5 ∈ (0.3254, 1) for the CRM to select the MTD for large enough n. If µ2 ∈ (0.176, µ3), the CRM may select the second dose as the MTD, but the probability of DLT is close enough to the target probability of DLT that the investigator would be indifferent if the incorrect dose was selected. The same can be said if µ4 ∈ (µ3, 0.319). Thus, the indifference interval in this case is (0.176, 0.319). Indifference intervals can be calculated assuming that the MTD is each one of the doses and they are (NA, 0.31), (0.19, 0.32), (0.18, 0.32), (0.18, 0.32) and (0.18, NA) for l = 1,…,5. The overall indifference interval is then defined by taking the union of the indifference intervals for l = 1,…,K. Thus, for the lymphoma trial with the assumptions above, the overall indifference interval is (0.18, 0.32). Thus, by only specifying pT and dose levels or the initial guesses of the probabilities of DLT at each dose, indifference intervals provide a range in which the DLT probability of the recommended dose will fall, providing an approach to evaluate the sensitivity of different dose toxicity models.
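The boundaries bk and the indifference intervals in this example can be recomputed with a short script. This is a sketch under stated assumptions: the empiric model is taken as $d^{\exp(\beta)}$ with prior mean 0, so that each dose level equals its initial guess, and the equation defining bk is solved by plain bisection on [−5, 5]. The bisection and the function names are illustrative choices, not part of the original method.

```python
import math

def empiric(d, beta):
    """Empiric model, assumed here as F(d, beta) = d ** exp(beta)."""
    return d ** math.exp(beta)

def boundary(d_lo, d_hi, target, lo=-5.0, hi=5.0, tol=1e-10):
    """Solve F(d_lo, b) + F(d_hi, b) = 2 * target for b by bisection."""
    def g(b):
        return empiric(d_lo, b) + empiric(d_hi, b) - 2.0 * target
    # g decreases in b because both dose levels lie in (0, 1)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def indifference_intervals(labels, target):
    """Return (b_2..b_K, interval for each assumed MTD); None marks an NA end."""
    K = len(labels)
    b = [boundary(labels[k], labels[k + 1], target) for k in range(K - 1)]
    intervals = []
    for m in range(K):                      # m is the 0-based assumed MTD
        lower = empiric(labels[m - 1], b[m - 1]) if m > 0 else None
        upper = empiric(labels[m + 1], b[m]) if m < K - 1 else None
        intervals.append((lower, upper))
    return b, intervals

b, intervals = indifference_intervals([0.05, 0.12, 0.25, 0.40, 0.55], 0.25)
print([round(x, 2) for x in b])
print([tuple(None if x is None else round(x, 2) for x in iv) for iv in intervals])
```

The overall indifference interval is then the smallest lower end and the largest upper end across the per-dose intervals.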

In this article, instead of comparing different dose toxicity models using indifference intervals, we propose to calibrate the CRM model by first specifying the target length of an indifference interval. Specifying the indifference interval in turn facilitates the selection of the initial guesses of the probabilities of DLT due to the correspondence between indifference intervals and the initial guesses of the probabilities of DLT, as follows. Given the prior MTD (v), the total number of dose levels (K), the target probability of DLT (pT) and the dose toxicity model (F(d, β)), dv can be obtained from pv, which equals pT, using backward substitution since $p_v = F(d_v; \hat\beta_0)$ where $\hat\beta_0$ denotes the prior mean of β. If an indifference interval of length 2δ is desired, dose levels dv−1 and dv+1 can be obtained from the following equations:

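$$F(d_{v-1}, b_v) + F(d_v, b_v) = 2p_T \quad\text{and}\quad F(d_v, b_{v+1}) + F(d_{v+1}, b_{v+1}) = 2p_T \tag{1}$$
$$F(d_{v-1}, b_v) = p_T - \delta \quad\text{and}\quad F(d_{v+1}, b_{v+1}) = p_T + \delta \tag{2}$$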

Equations (1) are based on the definition of bk for k = 2,…,K. Equations (2) are based on the definition of the indifference interval and its specified length. Subtracting (2) from (1) gives F(dv, bv) = pT + δ and F(dv, bv+1) = pT − δ. Thus, $d_{v-1} = F_\beta^{-1}(p_T - \delta)$ with β such that F(dv, β) = pT + δ, and $d_{v+1} = F_\beta^{-1}(p_T + \delta)$ with β such that F(dv, β) = pT − δ, where $F_\beta^{-1}$ denotes the inverse of F(·, β) in the dose. These can be obtained using the equations above and they are unique given the regularity conditions for F. Keeping the indifference interval at each dose level the same as the overall indifference interval and using the same procedures above, dose levels d1,…,dK can be iteratively obtained. From d1,…,dK we can obtain p1,…,pK using the dose toxicity model since $p_i = F(d_i; \hat\beta_0)$. Thus, by specifying the desired half-length (δ) of the indifference interval, the target probability of DLT (pT), the number of dose levels (K), the prior MTD (v) and the functional form of the dose toxicity model (F), we can obtain the initial guesses of DLT probabilities, without the need to search for good initial guesses (and hence dose levels) in a high dimensional space.
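As a worked instance, the step from dv to dv+1 has a closed form under the empiric model. The derivation below is a sketch that assumes the empiric model in the form $F(d, \beta) = d^{\exp(\beta)}$ with prior mean 0, so that $d_i = p_i$; it combines F(dv, bv+1) = pT − δ with the second equation in (2).

```latex
\begin{align*}
d_v^{\exp(b_{v+1})} = p_T - \delta
  &\;\Longrightarrow\;
  \exp(b_{v+1}) = \frac{\log(p_T - \delta)}{\log d_v}, \\
d_{v+1}^{\exp(b_{v+1})} = p_T + \delta
  &\;\Longrightarrow\;
  \log d_{v+1} = \frac{\log(p_T + \delta)}{\exp(b_{v+1})}
               = \frac{\log(p_T + \delta)\,\log d_v}{\log(p_T - \delta)}.
\end{align*}
```

Exponentiating the last line gives the upward recursion for the empiric model; exchanging the roles of pT − δ and pT + δ gives the downward recursion from dv to dv−1.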

Additionally, it can be shown that given the regularity conditions specified on the dose toxicity model, as δ increases, the distance between pi and pT for i ≠ v also increases. However, the spread between each pair of doses does not necessarily increase.

Theorem 1: Let F(d, β) be the dose toxicity function such that it is strictly increasing in d for all β and strictly monotone in the parameter β in the same direction for all d, then |pv+i(δ) − pT| is strictly increasing in δ, where pv+i(δ) is the initial guess of toxicity probability for dose v + i given δ and i is an integer such that 1 − v ≤ i ≤ K − v and i ≠ 0.

The proof for Theorem 1 is provided in Appendix 2. From Theorem 1, we can infer that as δ increases, the set of initial guesses of probabilities of toxicity will be steeper. Simulations were also performed to examine the effect of δ on the prior probability of choosing a dose, P(β ∈ Hk). For example, in the lymphoma trial, if δ = 0.03 then P(β ∈ Hi) for i = 1,…,5 are (0.42, 0.06, 0.06, 0.06, 0.40). If δ = 0.12 then P(β ∈ Hi) for i = 1,…,5 are (0.19, 0.19, 0.25, 0.19, 0.17). Thus, small values of δ have higher prior probability of selecting extreme doses while large values of δ have higher prior probability of selecting the prior MTD.
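These prior probabilities can also be approximated analytically rather than by simulation. The sketch below assumes the empiric model $d^{\exp(\beta)}$ with a normal prior of mean 0 and variance 1.34 (the prior stated for the applications), ignores the truncation of β to [−5, 5], and uses the fact that, by construction, each boundary bk satisfies F(dk, bk) = pT + δ. The input guesses for δ = 0.12 are the values reported for the lymphoma trial; the function names are hypothetical.

```python
import math

def normal_cdf(x, sigma2):
    """CDF of a N(0, sigma2) random variable."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0 * sigma2)))

def prior_dose_probs(guesses, target, delta, sigma2=1.34):
    """P(beta in H_k) for k = 1..K under the empiric model with d_k = p_k."""
    # b_k solves p_k ** exp(b_k) = target + delta for k = 2..K
    cuts = [math.log(math.log(target + delta) / math.log(p)) for p in guesses[1:]]
    cdf = [0.0] + [normal_cdf(c, sigma2) for c in cuts] + [1.0]
    return [cdf[k + 1] - cdf[k] for k in range(len(guesses))]

probs = prior_dose_probs([0.003, 0.058, 0.25, 0.509, 0.719], target=0.25, delta=0.12)
print([round(p, 2) for p in probs])
```

Under these assumptions the output should land near the simulated values quoted above for δ = 0.12, with small differences expected from rounding of the guesses and simulation error.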

Calibrating δ in finite-sample settings

For large sample sizes, the method guarantees that the DLT probability of the recommended dose will fall in the specified indifference interval. However, it is necessary to provide some guidelines for the selection of δ in the finite sample setting. We expect that for finite samples the optimal δ value that yields the highest percentage of correct dose selection (PCS) will depend on the other design parameters (pT, K, v, F) as well as the sample size. We propose an algorithm to calibrate the CRM by selecting the δ that yields the highest PCS given a calibration set of true probabilities of DLT.

Algorithm

  1. Let δ = 0.01
  2. Calculate the initial guesses of the probabilities of toxicity at each dose (p1,…,pK). If F(d, β) is the empiric model,
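    $$p_{i+1} = \exp\left(\frac{\log(p_T+\delta)\,\log(p_i)}{\log(p_T-\delta)}\right) \quad\text{for } i = v,\ldots,K-1. \tag{3}$$
    $$p_{i-1} = \exp\left(\frac{\log(p_T-\delta)\,\log(p_i)}{\log(p_T+\delta)}\right) \quad\text{for } i = 2,\ldots,v. \tag{4}$$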
    If F(d, β) is the logistic model,
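    $$d_{i+1} = \frac{\left[\log\left(\frac{p_T+\delta}{1-(p_T+\delta)}\right) - a\right] d_i}{\log\left(\frac{p_T-\delta}{1-(p_T-\delta)}\right) - a} \quad\text{for } i = v,\ldots,K-1,$$
    $$d_{i-1} = \frac{\left[\log\left(\frac{p_T-\delta}{1-(p_T-\delta)}\right) - a\right] d_i}{\log\left(\frac{p_T+\delta}{1-(p_T+\delta)}\right) - a} \quad\text{for } i = 2,\ldots,v,$$
    $$p_i = \frac{\exp(a + d_i)}{1 + \exp(a + d_i)} \quad\text{for } i = 1,\ldots,K.$$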
  3. Specify the calibration set and assume the K different scenarios of true probabilities of DLT follow the plateau configuration where pi = pL for i < l, pi = pU for i > l and pi = pT for i = l where l = 1,…,K, ψ = 2, pU = pTψ/(1 − pT (1 − ψ)) and pL = pT/(ψ + pT(1 − ψ))
  4. Perform simulations using the CRM under each of the scenarios indicated in step 3 and obtain the corresponding PCS.
  5. Average the PCS across all K scenarios of the calibration set.
  6. Repeat steps 2–5 for δ values from 0.02 up to the minimum of pT − 0.005 and the upper limit of the desired indifference interval, on a discrete domain with a grid width of 0.01.
  7. Select the δ that yields the highest average PCS.
  8. Calculate the initial guesses of the probabilities of toxicity at each dose (p1,…,pK) for the δ selected in step 7 and use these as the initial guesses of the probabilities of toxicity for the CRM.
    The following steps are optional:
  9. Repeat steps 1 through 8 with ψ = 3 and ψ = 5
  10. For step 7, in addition to selecting the δ that yields the highest average PCS, select the range of δ that yields PCS within 1 percentage point of the highest average PCS for each value of ψ.
    • If the intersection of the ranges of δ for the various values of ψ is non empty, select the δ in the intersection that yields the highest average PCS across all values of ψ.
    • If the intersection of the ranges of δ for the various values of ψ is empty, select the δ values that yield the highest average PCS for each ψ and perform a sensitivity analysis using the validation set to select the optimal value of δ.
  11. Calculate the initial guesses of the probabilities of toxicity at each dose (p1,…,pK) for the δ value or values selected in step 10 and use these as the initial guesses of the probabilities of toxicity for the CRM.
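Step 2 of the algorithm for the empiric model can be sketched in a few lines. This is an illustrative implementation of equations (3) and (4), not the `getprior` function of the R package mentioned in the Discussion; the function name and argument names are hypothetical.

```python
import math

def initial_guesses(target, delta, n_doses, prior_mtd):
    """Initial DLT guesses p_1..p_K for the empiric model via equations (3)-(4).

    prior_mtd is the 1-based index v of the prior MTD.
    """
    log_lo = math.log(target - delta)
    log_hi = math.log(target + delta)
    p = [0.0] * n_doses
    p[prior_mtd - 1] = target                      # p_v equals the target
    for i in range(prior_mtd, n_doses):            # equation (3): doses above v
        p[i] = math.exp(log_hi * math.log(p[i - 1]) / log_lo)
    for i in range(prior_mtd - 2, -1, -1):         # equation (4): doses below v
        p[i] = math.exp(log_lo * math.log(p[i + 1]) / log_hi)
    return p

# Hypothetical usage: six doses, prior MTD at dose 3, target 0.25, delta 0.08.
print([round(x, 3) for x in initial_guesses(0.25, 0.08, 6, 3)])
```

Looping this function over a grid of δ values gives the candidate sets of initial guesses that steps 4 to 7 then compare by simulation.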

The plateau configuration was selected as the calibration set because it provides the most conservative scenario for the CRM. The performance of the CRM will improve for scenarios when the true probabilities of DLT are steeper. The size of the jump is specified using the odds ratio, ψ. The pL and pU associated with the most common target probabilities of DLT for ψ = 2, 3, 5 are listed in Table 2. This suggests that ψ = 2 yields the most relevant calibration set for practical applications with a range of pT. It also yields the most conservative calibration set. PCS is expected to increase with increasing odds ratio since the CRM performs better when the true probabilities of DLT are steeper. Thus, in most cases it is not necessary to perform the optional part of the algorithm. The optional section is for scenarios when the true probabilities of toxicities are believed to be steep. The algorithm along with the optional steps will be referred to as the “Extended Algorithm”.
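The calibration set of step 3 follows directly from the odds-ratio definitions of pL and pU. The sketch below builds the K plateau scenarios for a given target and ψ; the function name is hypothetical.

```python
def plateau_scenarios(target, n_doses, psi=2.0):
    """Return (p_L, p_U, scenarios) for the plateau calibration set.

    Scenario l (0-based) has p_L below the MTD, the target at the MTD,
    and p_U above it.
    """
    p_upper = target * psi / (1.0 - target * (1.0 - psi))
    p_lower = target / (psi + target * (1.0 - psi))
    scenarios = [
        [p_lower] * l + [target] + [p_upper] * (n_doses - l - 1)
        for l in range(n_doses)
    ]
    return p_lower, p_upper, scenarios

p_lower, p_upper, scenarios = plateau_scenarios(0.25, 5, psi=2.0)
print(round(p_lower, 3), round(p_upper, 3))
for s in scenarios:
    print([round(x, 3) for x in s])
```

For pT = 0.25 and ψ = 2 the odds of toxicity are 1/3 at the target, so pU has odds 2/3 and pL has odds 1/6.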

General Applications

The proposed algorithm is used to obtain the δ that yields the highest average PCS for various scenarios of target probability of DLT, sample size, number of doses and prior MTD, given calibration sets using an odds ratio of 2. The target probabilities of DLT selected are 0.10, 0.20, 0.25 and 0.33. The sample sizes are 20, 25, 30, 35 and 40. The number of doses ranges from 4 to 7 and the prior MTD ranges from the lowest dose to the median dose. These were selected based on the most common scenarios encountered in practice. The dose toxicity model is assumed to be empiric with the prior distribution for β being normal with µ = 0 and σ² = 1.34 in all cases. The model was chosen based on prior publications on the CRM [9]. We ran 2000 simulations for each scenario except for a sample size of 20 with 7 doses, as on average there would be fewer than 3 patients per dose. The CRM did not allow dose skipping during escalation. The CRM also did not allow for dose escalation immediately after a DLT was observed [10]. However, that dose can be assigned after outcomes from more patients are obtained.
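Steps 4 and 5 of the algorithm can be sketched as a small simulation. This is a simplified illustration, not the `crmsim` function of the R package mentioned in the Discussion: it assumes the empiric model $d^{\exp(\beta)}$ with the N(0, 1.34) prior, cohorts of one patient, posterior means computed on a grid over [−5, 5], and the two escalation restrictions described above applied to every recommendation including the final one. All function names are hypothetical.

```python
import numpy as np

GRID = np.linspace(-5.0, 5.0, 2001)        # quadrature grid for beta
LOG_PRIOR = -GRID ** 2 / (2.0 * 1.34)      # N(0, 1.34) log prior up to a constant

def simulate_trial(labels, truth, target, n, start, rng):
    """Simulate one CRM trial; return the 0-based dose recommended at the end."""
    labels = np.asarray(labels, dtype=float)
    # model DLT probability for every (dose, beta) pair
    p = np.clip(labels[:, None] ** np.exp(GRID)[None, :], 1e-12, 1 - 1e-12)
    log_p, log_q = np.log(p), np.log1p(-p)
    log_post = LOG_PRIOR.copy()
    dose, highest_tried = start, start
    for _ in range(n):
        y = rng.random() < truth[dose]                 # simulated DLT outcome
        log_post = log_post + (log_p[dose] if y else log_q[dose])
        w = np.exp(log_post - log_post.max())
        beta_hat = np.sum(GRID * w) / np.sum(w)        # posterior mean
        rec = int(np.argmin(np.abs(labels ** np.exp(beta_hat) - target)))
        rec = min(rec, highest_tried + 1)              # no dose skipping in escalation
        if y:
            rec = min(rec, dose)                       # no escalation right after a DLT
        dose = rec
        highest_tried = max(highest_tried, dose)
    return dose

def average_pcs(labels, scenarios, target, n, start, n_sim=200, seed=0):
    """Average percentage of correct selection (as a fraction) across scenarios."""
    rng = np.random.default_rng(seed)
    pcs = []
    for truth in scenarios:
        mtd = int(np.argmin(np.abs(np.asarray(truth) - target)))
        hits = sum(simulate_trial(labels, truth, target, n, start, rng) == mtd
                   for _ in range(n_sim))
        pcs.append(hits / n_sim)
    return float(np.mean(pcs))
```

Calling `average_pcs` once per candidate δ, with the corresponding initial guesses as `labels` and the plateau calibration set as `scenarios`, gives the quantity that step 7 maximizes. The default of 200 replicates is only to keep the sketch quick; the paper uses 2000.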

For each scenario, Tables 3–6 display the δ that yields the highest average PCS as well as the range of δ for which the average PCS was within 1% of that of the optimal δ. Tables 3, 4, 5 and 6 display the optimal δ corresponding to target probabilities of DLT of 0.10, 0.20, 0.25 and 0.33, respectively. It can be observed that the value of the optimal δ decreases as sample size and the number of doses increase. Additionally, the value of the optimal δ increases as a function of the prior MTD.

Table 3
Optimal δ for OR=2 and pT = 0.10 given number of doses (K), prior MTD (v) and sample size
Table 4
Optimal δ for OR=2 and pT = 0.20 given number of doses (K), prior MTD (v) and sample size
Table 5
Optimal δ for OR=2 and pT = 0.25 given number of doses (K), prior MTD (v) and sample size
Table 6
Optimal δ for OR=2 and pT = 0.33 given number of doses (K), prior MTD (v) and sample size

To illustrate the application of the tables to a particular study, we calibrate the hypothetical example presented in Chevret. The example had a sample size of 25 with a target probability of DLT of 0.25 (pT = 0.25). It had six doses (K=6) with the third dose level being the prior MTD (v = 3). Given these parameters, we use Table 5 to find the optimal δ which in this case is 0.08. The initial guesses of the probabilities of DLT associated with a δ of 0.08 given the specifications above are 0.029, 0.109, 0.25, 0.42, 0.581, 0.712. These are obtained using equations (3) and (4).

The validation set of true probabilities of DLT included the six scenarios originally included in Chevret. For each scenario of the validation set, we obtain the distribution of the MTD based on 2000 simulations, first using the initial guesses of probabilities of DLT used by Chevret (i.e. 0.05, 0.10, 0.25, 0.35, 0.50, 0.70) along with a logistic dose toxicity model with a=3 and a=1, and then using the initial guesses corresponding to a δ of 0.08 along with an empiric dose toxicity model. For each scenario of the validation set, Table 7 displays, for each of these designs, the percentage with which each dose level is recommended (% Recommendation), the average absolute difference between the true probability of the dose selected (d*) and pT (Average |p(d*)–pT|), the percentage of patients treated at each dose level (% Treated) and the percentage of patients with DLT (% DLT). The proposed algorithm performs slightly better than the logistic with a=3 when the MTD is in the middle and slightly worse when the MTD is at the extremes. However, the overall performance is similar, suggesting that the initial guesses obtained using the proposed algorithm with an empiric model are competitive with the logistic with a=3, which is the best performing model according to Chevret’s paper. The initial guesses of probabilities used by Chevret are not appropriate when the dose toxicity model is logistic with a=1 since the PCS is only 17% when the MTD is the highest dose. The average absolute difference between the true probability of the dose selected and pT is between 0.05 and 0.07 in most cases with the exception of the logistic with a=1 when the MTD is the highest dose.

Table 7
Operating characteristics of the initial guesses in Chevret with logistic a=3 (CL3) and logistic a=1 (CL1) models versus the initial guesses obtained using the Algorithm (A) with empiric model assuming pT = 0.25 and the prior MTD is dose level 3. The ...

Applications

The Extended Algorithm is applied to calibrate the two motivating examples. The calibration is based on 2000 simulations. For both of the motivating examples, the dose toxicity model is assumed to be empiric ($F(d; \beta) = d^{\exp(\beta)}$) with the prior distribution for β being normal with µ = 0 and σ² = 1.34. In addition, we performed 2000 simulations using the CRM to compare the initial guesses of probabilities of DLT obtained using the Extended Algorithm to those selected previously based on extensive simulations. The comparisons are done using a validation set of true probabilities of DLT. This is the set of true probabilities of DLT that were originally used when designing the trial. The CRM does not allow dose skipping during escalation. In addition, the CRM does not allow for dose escalation immediately after a DLT is observed [10].

Application to the Lymphoma Trial

For the lymphoma trial the initial guesses of the probabilities of toxicity at each dose were 0.05, 0.12, 0.25, 0.40 and 0.55, respectively. These were selected based on extensive simulations evaluating the operating characteristics of various initial guesses of the probabilities of DLT based on 2000 simulations using the validation set. Table 8 shows the validation set of the true probabilities of DLT. The other design parameter values are: pT = 0.25, v = 3 and K = 5 (see section 1). The starting dose is the third dose.

Table 8
Lymphoma Trial: Operating characteristics of the extensive simulation, Extended Algorithm (δ=0.10), the optimal δ for ψ=5 (δ=0.12) and a δ value of 0.03 given pT = 0.25 and the prior MTD is dose level 3. p(d*) ...

We calibrate the lymphoma trial using the Extended Algorithm. For each calibration set, we iterate the δ values between 0.01 and 0.24. Each δ value corresponds to a set of initial probabilities of DLT. For example, given a δ value of 0.05 which corresponds to an indifference interval of (0.20, 0.30), we can calculate the initial guesses of the probabilities of DLT using equations (3) and (4) as follows:

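$$d_2 = p_2 = \exp\left(\frac{\log(0.20)\,\log(0.25)}{\log(0.30)}\right) = 0.157$$
$$d_1 = p_1 = \exp\left(\frac{\log(0.20)\,\log(0.157)}{\log(0.30)}\right) = 0.084$$
$$d_4 = p_4 = \exp\left(\frac{\log(0.30)\,\log(0.25)}{\log(0.20)}\right) = 0.355$$
$$d_5 = p_5 = \exp\left(\frac{\log(0.30)\,\log(0.355)}{\log(0.20)}\right) = 0.460$$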

Thus, the initial guesses of the probabilities of DLT at each dose are 0.084, 0.157, 0.250, 0.355, 0.460 respectively, and they yield an indifference interval of (0.20, 0.30).

The results are displayed in Figure 1. As expected, PCS increases with increasing odds ratio. However, the range of δ that yields high PCS varies very little with the odds ratio. For an odds ratio (ψ) of 2, the highest average PCS is 48.8% when δ is 0.09. The average PCS was within 1% for δ values between 0.06 and 0.10. For an odds ratio of 3, δ of 0.09 also yielded the highest average PCS of 64.6%. The range of δ is 0.09 to 0.13. The highest average PCS for an odds ratio of 5 was 79.0% which occurred when δ is 0.12. The range of δ is 0.10 to 0.14. Thus, across the various odds ratios, a δ of 0.10 is the only δ value that gives an average PCS that is within 1% of the highest average PCS. The initial guesses of the probabilities of DLT at each dose for a δ of 0.10 are 0.011, 0.082, 0.25, 0.464, 0.654. We compare the performance of the initial guesses of the probabilities of DLT corresponding to a δ value of 0.10 to the initial guesses based on extensive simulations using the validation set. To evaluate the sensitivity of the performance to the δ selected based on different odds ratios, we also compared the above initial guesses to the initial guesses of the probabilities of DLT corresponding to a δ value of 0.12 which is the optimal for an odds ratio of 5. The initial guesses of the probabilities of DLT at each dose for a δ of 0.12 are 0.003, 0.058, 0.25, 0.509, 0.719. To evaluate the effect of δ on the operating characteristics, we also compared the above δ values to a δ value of 0.03.

Figure 1
Lymphoma Trial: Proportion of Correct Selection versus δ

The validation set of true probabilities of DLT included four scenarios. For each scenario of the validation set, Table 8 displays the percentage with which each dose level is recommended (% Recommendation), the average absolute difference between the true probability of the dose selected (d*) and pT (Average |p(d*) −pT|), the percentage of patients treated at each dose level (% Treated) and the percentage of patients with DLT (% DLT). The results are very similar with initial guesses from the extensive simulations doing slightly better when the MTD is at the extreme and slightly worse when the MTD is in the middle. The results for δ values of 0.10 and 0.12 were similar, suggesting that the performance is robust to the choice of odds ratio. By increasing the value of δ from 0.03 to 0.12, the PCS increased substantially when the MTD is in the middle and decreased substantially when the MTD is the highest dose. This is also demonstrated by the average difference between the true probability at the dose selected and pT.

Application to the Stroke Trial

For the stroke trial, the initial guesses of the probabilities of DLT based on extensive simulations were 0.02, 0.06, 0.10, 0.18 and 0.30 for each dose respectively. These were selected similarly to the ones for the lymphoma trial. Table 9 shows the validation set for the stroke trial. The other design parameter values are: pT = 0.10, v = 3 and K = 5. To calibrate the trial using the Extended Algorithm, we examine calibration sets within the same range of odds ratio (2, 3, 5). For each calibration set, we iterate the δ values between 0.01 and 0.095 since the target probability of DLT is 0.10. The results are displayed in Figure 2. It is visible in this example that for small odds ratios the PCS is low regardless of the choice of δ. Across all calibration sets, a δ of 0.04 yielded an average PCS that was within 1% of that of the optimal δ. The initial guesses of the probabilities of DLT at each dose for a δ of 0.04 are 0.009, 0.037, 0.10, 0.20, 0.325.

Figure 2
Stroke Trial: Proportion of Correct Selection versus δ
Table 9
Stroke Trial: Operating characteristics of the extensive simulation versus the Extended Algorithm given pT = 0.10. The initial guesses obtained using the Extended Algorithm correspond to δ = 0.04. An initial design is used prior to observing the ...

Using the validation set, we compare the performance of the initial guesses of the probabilities of DLT corresponding to a δ value of 0.04 with that of the initial guesses chosen previously. The validation set of true probabilities of DLT for the stroke trial included five scenarios. For each scenario of the validation set, Table 9 displays the percentage with which each dose level is recommended (% Recommendation), the average absolute difference between the true probability of the dose selected (d*) and pT (Average |p(d*) − pT|), the percentage of patients treated at each dose level (% Treated) and the percentage of patients with DLT (% DLT) using extensive simulations and the Extended Algorithm. The results are similar, with the initial guesses from the extensive simulations doing slightly better when the MTD is at an extreme dose level and slightly worse when the MTD is in the middle.
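To make two of these summaries concrete, the sketch below computes the PCS and Average |p(d*) − pT| for a single scenario from the percentage of simulated trials recommending each dose. It assumes that the true MTD in a scenario is the dose whose true probability of DLT is closest to pT. The helper name `summarise_selection` and all input numbers are hypothetical and are not taken from Table 9.

```r
# Sketch: two summaries of a simulated design for one scenario.
# `summarise_selection` is an illustrative helper, not a dfcrm function.
summarise_selection <- function(p_true, rec_pct, target) {
  stopifnot(length(p_true) == length(rec_pct),
            abs(sum(rec_pct) - 100) < 1e-8)
  gap <- abs(p_true - target)
  mtd <- which.min(gap)                # assumed MTD: dose closest to the target
  list(
    mtd = mtd,
    pcs = rec_pct[mtd],                # % of trials recommending the true MTD
    avg_abs_diff = sum(rec_pct / 100 * gap)   # Average |p(d*) - pT|
  )
}

# Hypothetical inputs for illustration only (NOT taken from Table 9)
p_true  <- c(0.02, 0.05, 0.10, 0.25, 0.40)   # true probabilities of DLT
rec_pct <- c(2, 18, 60, 18, 2)               # % Recommendation per dose
res <- summarise_selection(p_true, rec_pct, target = 0.10)
print(res)
```

Averaging the `pcs` value over the scenarios of a validation or calibration set gives the average PCS used to compare sets of initial guesses.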

Discussion

To estimate the MTD using the CRM, it is necessary to specify the initial guesses of the probability of DLT at each dose. The literature on the CRM generally uses, without any justification, the same set of initial guesses of the probabilities of DLT that was used in the original paper by O’Quigley et al. [1]. The task of selecting the initial guesses can be daunting given the high dimensionality of the problem and the extensive number of simulations involved. We believe this deters many applied statisticians from using the CRM. The focus of this paper is to provide a systematic and simple approach for choosing the initial guesses of the probabilities of DLT. We provide the optimal δ for the most common scenarios encountered, assuming an empiric dose toxicity model, along with an example that illustrates how to use the tables. By simplifying the selection of the initial guesses of the probabilities of DLT in practice, we hope to enhance the use of the CRM. All simulations in this paper can be easily performed using the getprior and crmsim functions in the R package ‘dfcrm’ [11], [12]. The getprior function calculates the initial guesses of the probabilities of DLT at each dose given δ, pT, K, v, and the dose toxicity model. The crmsim function performs the simulations given the true probabilities of DLT, the initial guesses, pT, the sample size, the number of simulations, and the dose toxicity model to be used. See Appendix 3 for the R program for the Chevret example. In addition to being simple, our method yielded results for both motivating examples and for the hypothetical example in Chevret that were very similar to those obtained using the ad hoc extensive simulations and to the results published by Chevret.

The method proposed to select the initial guesses of the probabilities of DLT is based on the concept of indifference intervals. It can be argued that the indifference interval is an asymptotic concept, whereas sample sizes are small in dose finding trials. However, it has been demonstrated that poor asymptotic behavior implies poor finite sample operating characteristics. In cases when the investigator can specify the range of acceptable probabilities of toxicity, it is only necessary to calculate the initial guesses of the probability of toxicity at each dose based on that indifference interval and to examine the operating characteristics under given scenarios of true probabilities of toxicity. In cases when the investigator cannot specify a desired indifference interval, the proposed algorithm can be used to calibrate the CRM and to screen for model specifications. In addition, we propose an algorithm that chooses the optimal δ, and we provide the optimal δ for some common real life scenarios based on simulations examining the operating characteristics in the finite sample setting.

For all the examples and simulations, the dose toxicity model was assumed to be empiric, as we generally use the empiric model in practice. The optimal δ will be different if the dose toxicity model is assumed to be logistic. The optimal δ values for the logistic model are not provided, but they can be obtained using the approach described in this paper for the motivating examples. The ‘dfcrm’ package also supports a logistic dose toxicity model.
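As an indication of how the same construction carries over to a logistic model, the sketch below derives initial guesses from δ under a one-parameter logistic curve with a fixed intercept. The model form, the intercept value a = 3 and the function name `logistic_guesses` are assumptions made for illustration; they are not taken from the paper, and the resulting values are not the calibrated δ or initial guesses for any trial discussed here.

```r
# Sketch: logistic analogue of the indifference-interval construction, assuming
#   F(d, beta) = 1 / (1 + exp(-(a + exp(beta) * d)))  with fixed intercept a.
# `logistic_guesses` is an illustrative name; a = 3 is an assumed value.
logistic_guesses <- function(delta, target, nu, K, a = 3) {
  stopifnot(delta > 0, target - delta > 0, target + delta < plogis(a),
            nu >= 1, nu <= K)
  d <- numeric(K)
  d[nu] <- qlogis(target) - a          # dose label giving pT at beta = 0
  lo <- qlogis(target - delta) - a
  hi <- qlogis(target + delta) - a
  # Upward: slope s with a + s * d[k] = logit(target - delta),
  # then d[k + 1] with a + s * d[k + 1] = logit(target + delta).
  if (nu < K) for (k in nu:(K - 1)) {
    s <- lo / d[k]
    d[k + 1] <- hi / s
  }
  # Downward: slope s with a + s * d[k] = logit(target + delta),
  # then d[k - 1] with a + s * d[k - 1] = logit(target - delta).
  if (nu > 1) for (k in nu:2) {
    s <- hi / d[k]
    d[k - 1] <- lo / s
  }
  plogis(a + d)                        # initial guesses at beta = 0
}

logit_guesses <- logistic_guesses(0.04, 0.10, 3, 5)
print(round(logit_guesses, 3))
```

Because the curve has a different shape, the same δ generally yields different initial guesses than under the empiric model, which is consistent with the optimal δ being model dependent.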

Appendix 1 Regularity conditions for dose toxicity model F

  • $F$ is strictly increasing in $d$ for all $\beta$.
  • $F$ is strictly monotone in the parameter $\beta$ in the same direction for all $d$.
  • Given any $p_T \in (0,1)$, for each $d$, there exists $\beta$ in the interior of $\Theta$, the parameter space, such that $F(d; \beta) = p_T$.
  • $F(d_k, \beta)$ is bounded away from 0 and 1 for all $k$ and $\beta \in \Theta$.
  • $F'(d, \beta) = \partial F(d, \beta)/\partial \beta$ is uniformly bounded in $\beta$.
  • For each $0 < t < 1$ and each $d$, the function
    $$t\,\frac{F'(d,\beta)}{F(d,\beta)} + (1-t)\,\frac{-F'(d,\beta)}{1-F(d,\beta)}$$
    is continuous and strictly monotone in $\beta$.

Appendix 2

Theorem 1: Let $F(d, \beta)$ be a dose toxicity function that is strictly increasing in $d$ for all $\beta$ and strictly monotone in the parameter $\beta$ in the same direction for all $d$. Then $|p_{v+i}(\delta) - p_T|$ is strictly increasing in $\delta$, where $p_{v+i}(\delta)$ is the initial guess of toxicity probability for dose $v + i$ given $\delta$, and $i$ is an integer such that $1 - v \le i \le K - v$ and $i \ne 0$.

Without loss of generality, assume that $F(d, \beta)$ is strictly decreasing in $\beta$ for all $d$, $i > 0$ and $\delta_1 < \delta_2$. Thus $p_T < p_{v+i}$. It suffices to show that $p_{v+i}(\delta_1) < p_{v+i}(\delta_2)$ for $i = 1, \ldots, K - v$. This can be proved by induction.

For $i = 1$, we need to show that $p_{v+1}(\delta_1) < p_{v+1}(\delta_2)$. Since $F(d, \beta)$ is decreasing in $\beta$, $F_{d_v}^{-1}(p_T - \delta_1) < F_{d_v}^{-1}(p_T - \delta_2)$. This implies that $F_{\beta(\delta_1)}^{-1}(p_T + \delta_1) < F_{\beta(\delta_2)}^{-1}(p_T + \delta_2)$, where $F(d_v, \beta(\delta_k)) = p_T - \delta_k$ for $k = 1, 2$, since $F(d, \beta)$ is strictly increasing in $d$ and $F^{-1}(p, \beta)$ is strictly increasing in $\beta$. Now, since $F(d, \beta)$ is strictly increasing in $d$, $F(d_{v+1}(\delta_1); 1) < F(d_{v+1}(\delta_2); 1)$, where $F(d_{v+1}(\delta_k), \beta(\delta_k)) = p_T + \delta_k$ for $k = 1, 2$. Thus, $p_{v+1}(\delta_1) < p_{v+1}(\delta_2)$.

Now, suppose that $p_{v+j}(\delta_1) < p_{v+j}(\delta_2)$. We want to show that $p_{v+j+1}(\delta_1) < p_{v+j+1}(\delta_2)$. Since $F(d, \beta)$ is strictly decreasing in $\beta$ and $F^{-1}(p, \beta)$ is strictly increasing in $\beta$, $F_{d_{v+j}(\delta_1)}^{-1}(p_T - \delta_1) < F_{d_{v+j}(\delta_2)}^{-1}(p_T - \delta_2)$. This implies that $F_{\beta(\delta_1)}^{-1}(p_T + \delta_1) < F_{\beta(\delta_2)}^{-1}(p_T + \delta_2)$, where $F(d_{v+j}(\delta_k), \beta(\delta_k)) = p_T - \delta_k$, since $F(d, \beta)$ is strictly increasing in $d$ and $F^{-1}(p, \beta)$ is strictly increasing in $\beta$. Since $F(d, \beta)$ is strictly increasing in $d$, $F(d_{v+j+1}(\delta_1); 1) < F(d_{v+j+1}(\delta_2); 1)$, where $F(d_{v+j+1}(\delta_k), \beta(\delta_k)) = p_T + \delta_k$ for $k = 1, 2$. Thus, $p_{v+j+1}(\delta_1) < p_{v+j+1}(\delta_2)$.

Thus, by induction, $p_{v+i}(\delta_1) < p_{v+i}(\delta_2)$ for $i = 1, \ldots, K - v$. Since $p_T < p_{v+i}(\delta_1) < p_{v+i}(\delta_2)$, $|p_{v+i}(\delta) - p_T|$ is strictly increasing in $\delta$. The proof for $1 - v \le i \le -1$ is similar and thus omitted.
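As an illustration of the theorem, the recursion used in the proof has a closed form for the empiric model. The following is a sketch under the assumption that the empiric model is written as $F(d, \beta) = d^{\beta}$ with $0 < d < 1$ and $\beta > 0$, and that the initial guesses are $p_k = F(d_k; 1) = d_k$; this parameterization and the symbol $r(\delta)$ are introduced here for the illustration only.

```latex
% Adjacent doses share a parameter value b with
%   F(d_{v+j}, b) = p_T - \delta  and  F(d_{v+j+1}, b) = p_T + \delta.
% Solving the first equation for b and substituting into the second gives
\[
  \log d_{v+j+1}
    = \log d_{v+j}\,\frac{\log(p_T+\delta)}{\log(p_T-\delta)}
    = r(\delta)\,\log d_{v+j},
  \qquad
  r(\delta) = \frac{\log(p_T+\delta)}{\log(p_T-\delta)} \in (0,1).
\]
% Starting from d_v = p_T and iterating in either direction:
\[
  p_{v+i}(\delta) = p_T^{\,r(\delta)^{i}},
  \qquad 1 - v \le i \le K - v .
\]
```

Because $r(\delta)$ is strictly decreasing in $\delta$ and $0 < p_T < 1$, $p_{v+i}(\delta)$ increases with $\delta$ when $i > 0$ and decreases with $\delta$ when $i < 0$. In both cases $|p_{v+i}(\delta) - p_T|$ is strictly increasing in $\delta$, in agreement with the theorem.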

Appendix 3 R program for Chevret example

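library(dfcrm)  # provides getprior() and crmsim()

nsim <- 2000    # number of simulated trials
n <- 25         # sample size
target <- 0.25  # target probability of DLT (pT)
K <- 6          # number of dose levels
nu <- 3         # prior guess of the MTD (v); also passed to crmsim as the starting dose

# using the delta from calibration
bestdelta <- 0.08
prior <- getprior(bestdelta, target, nu, K)

# true probabilities of DLT for the scenario
ptrue <- c(0.09, 0.16, 0.27, 0.38, 0.57, 0.75)
new <- crmsim(ptrue, prior, target, n, nu, nsim = nsim, count = FALSE)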

References

1. O’Quigley J, Pepe M, Fisher L. Continual Reassessment Method: A practical design for Phase I clinical Trials in Cancer. Biometrics. 1990;46:33–48.
2. Chevret S. The continual reassessment method in cancer phase I clinical trials: A simulation study. Statistics in Medicine. 1993;12:1093–1108.
3. Leonard JP, Furman RR, Cheung YKK, et al. Phase I/II trial of bortezomib plus CHOP-Rituximab in diffuse large B cell (DLBCL) and mantle cell lymphoma (MCL): Phase I results. Blood. 2005;106(11):147A–147A. 491 Part1.
4. Cheung YK, Chappell R. Sequential designs for phase I clinical trials with late-onset-toxicities. Biometrics. 2000;56(4):1177–1182.
5. Elkind MS, Sacco RL, MacArthur RB, et al. The Neuroprotection with Statin Therapy for Acute Recovery Trial (NeuSTART): an adaptive design phase I dose-escalation study of high-dose lovastatin in acute ischemic stroke. Int J Stroke. 2008;3(3):210–218.
6. Cheung YK. Coherence principles in dose-finding studies. Biometrika. 2005;92(4):863–873.
7. Cheung YK, Chappell R. A simple technique to evaluate model sensitivity in the Continual Reassessment Method. Biometrics. 2002;58:671–674.
8. Shen L, O’Quigley J. Consistency of the continual reassessment method under model misspecification. Biometrika. 1996;83:395–405.
9. O’Quigley J, Shen L. Continual reassessment method: A likelihood approach. Biometrics. 1996;52:673–684.
10. Cheung YK. Coherence principles in dose-finding studies. Biometrika. 2005;92:863–873.
11. R Development Core Team. R Foundation for Statistical Computing. Vienna, Austria: 2008. R: A language and environment for statistical computing. ISBN 3-900051-07-0, URL http://www.R-project.org.
12. Cheung YK. dfcrm: Dose-finding by the continual reassessment method. 2008. R package version 0.1–1. http://www.columbia.edu/~yc632.