Interest in Bayesian Phase I designs has increased, particularly in the oncology community. The most prominent of these designs form a class of methods usually referred to by the acronym CRM (Continual Reassessment Method) [1]. An extensive literature [2–11] has covered the statistical properties of CRM and the modifications and extensions that followed. Many authors [6, 8, 12–14] have shown through simulations that their proposed modifications improve on the original CRM with respect to a particular outcome of interest. As pointed out by Eisenhauer *et al*. [15], the question of which designs described in the literature meet the criteria for safety, efficiency, and precision in estimating the maximum tolerated dose (MTD) remains unresolved. We sought to evaluate the practical benefits of CRM-based methods over the standard ‘3 + 3’ dose escalation scheme in order to inform an institutional policy regarding the selection of designs for Phase I trials.

While CRM designs have been applied in some recent Phase I trials [16–18], the standard method remains far more widely used [19], not only because of its simplicity but also because it is well understood and accepted by clinicians. However, the standard method often underestimates the MTD, as shown by He *et al*. [20], resulting in the selection of a dose whose toxicity rate is lower than the target rate. Clinical investigators are interested in designs that can estimate the MTD accurately while using fewer patients. In addition, for certain agents, investigators are not always certain how many dose levels to test or where the MTD might lie. If CRM does reach the MTD faster by allowing rapid escalation through sub-optimal doses, it is plausible that more dose levels could be tested, by skipping the lower doses, without increasing the total number of patients accrued to the trial. If so, CRM offers the opportunity for a more substantial improvement over the standard method.
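The tendency of the standard method to select doses below the target can be explored with a small simulation. The sketch below is illustrative only and is not the simulation code used in this article: it assumes a simplified ‘3 + 3’ rule (escalate on 0/3 or at most 1/6 dose-limiting toxicities, otherwise stop and declare the next lower dose the MTD), and the function names and the toxicity curve in the usage note are hypothetical.

```python
import random
from collections import Counter


def simulate_3plus3(true_tox, rng):
    """Run one simplified '3 + 3' trial (an assumed variant, not the authors' code).

    true_tox: assumed true DLT probability at each dose level (lowest first).
    Returns (selected_level, n_patients); selected_level is -1 when even the
    lowest dose is judged too toxic.
    """
    level = 0
    n_patients = 0
    while True:
        # Treat a first cohort of three patients at the current level.
        dlt = sum(rng.random() < true_tox[level] for _ in range(3))
        n_patients += 3
        if dlt == 1:
            # Exactly one DLT in three: expand the cohort to six patients.
            dlt += sum(rng.random() < true_tox[level] for _ in range(3))
            n_patients += 3
        if dlt >= 2:
            # Two or more DLTs: stop and select the next lower dose.
            return level - 1, n_patients
        if level == len(true_tox) - 1:
            # Highest dose tolerated: select it.
            return level, n_patients
        level += 1


def selection_fractions(true_tox, n_trials, seed=0):
    """Fraction of simulated trials that select each dose level."""
    rng = random.Random(seed)
    counts = Counter(simulate_3plus3(true_tox, rng)[0] for _ in range(n_trials))
    return {lvl: c / n_trials for lvl, c in sorted(counts.items())}
```

For example, calling `selection_fractions([0.05, 0.10, 0.20, 0.30, 0.45, 0.60], 10000)` with a hypothetical target rate of 0.30 (dose index 3) gives the share of trials selecting each level, which can then be compared against the target dose.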

The performance of CRM and its sample size requirements have been examined in simulation studies under a fixed sample framework, as well as with stopping rules [12, 21, 22]. The fixed sample approach assumes that the original CRM [1], under certain conditions, will converge to the true maximum tolerated dose (MTD) when the total sample size is sufficiently large (in the range of 20–25) [1, 4]. Goodman *et al*. suggested treating patients in cohorts, with or without a fixed sample, and compared the total sample size between several CRM versions and the standard method [12]. These simulations showed that the sample size ranged from 18 to 20 using CRM with cohorts of 1, 2, or 3 patients, but on average the standard method required three fewer subjects than the CRM to test six dose levels. Previous work [9, 13] used the width of the confidence interval around the dose-toxicity parameter as a stopping rule; an average sample size of 24 was shown to be adequate with this rule, making it similar to the fixed sample approach. Zohar and Chevret [22] extensively compared different stopping rules by setting the maximum sample size at 10, 20, and 30, and confirmed that at least 20 patients are needed to reach an accurate estimate of the MTD. As in other studies [23, 24] that have examined sample size requirements in the context of CRM, the number of dose levels was most often held constant at six, and in some cases the sample size was held fixed. As a result, it is difficult to generalize the conclusions from these studies regarding the sample size needed by the different CRM methods, in comparison to the standard method, under different scenarios.
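To make the model-based updating behind these comparisons concrete, the following is a minimal sketch of a single CRM step. It rests on assumptions not stated in the text above: a one-parameter power model, p_i(a) = skeleton_i^exp(a), a normal prior on a with variance 1.34 (a common choice in the CRM literature), and a simple grid approximation of the posterior. The function name `crm_update` and all parameter names are hypothetical. The posterior standard deviation of a is returned because a stopping rule based on the precision of the dose-toxicity parameter could be built from it.

```python
import math


def crm_update(skeleton, target, doses_given, dlt_observed,
               prior_sd=math.sqrt(1.34), grid_n=2001, span=4.0):
    """One CRM step under an assumed power model p_i(a) = skeleton[i] ** exp(a).

    skeleton: prior guesses of the DLT probability per dose, each in (0, 1).
    doses_given / dlt_observed: dose index and 0/1 DLT outcome per patient.
    Returns (recommended_level, posterior_tox, posterior_sd_of_a).
    """
    lo, hi = -span * prior_sd, span * prior_sd
    grid = [lo + (hi - lo) * k / (grid_n - 1) for k in range(grid_n)]

    weights = []
    for a in grid:
        # Log of the normal prior density, up to an additive constant.
        log_w = -0.5 * (a / prior_sd) ** 2
        for d, y in zip(doses_given, dlt_observed):
            # Work with log p to avoid underflow for large exp(a).
            log_p = math.exp(a) * math.log(skeleton[d])
            log_w += log_p if y else math.log1p(-math.exp(log_p))
        weights.append(math.exp(log_w))
    total = sum(weights)

    # Posterior mean DLT probability at each dose level.
    post_tox = [
        sum(w * s ** math.exp(a) for a, w in zip(grid, weights)) / total
        for s in skeleton
    ]
    # Posterior mean and standard deviation of the model parameter a.
    mean_a = sum(a * w for a, w in zip(grid, weights)) / total
    var_a = sum((a - mean_a) ** 2 * w for a, w in zip(grid, weights)) / total

    # Recommend the dose whose estimated toxicity is closest to the target.
    best = min(range(len(skeleton)), key=lambda i: abs(post_tox[i] - target))
    return best, post_tox, math.sqrt(var_a)
```

In a sequential trial this function would be called after each patient or cohort, with the recommended level assigned to the next patient; a fixed sample design stops at a pre-specified number of patients, whereas a stopping rule would instead monitor a quantity such as the returned posterior standard deviation.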

The objective of this article is to determine whether a CRM-based design should be used routinely in Phase I trials and under which circumstances CRM is more appropriate. We also ask which CRM design among the various modified versions should be used, and whether a fixed sample approach is as accurate as one with a sequential stopping rule. We compared various CRM-based methods with the standard method. The methods include the original CRM, two-stage methods that combine rule-based and model-based approaches, and CRM that accrues patients in cohorts. We evaluated the CRM-based methods using both a pre-specified fixed sample size and a stopping rule approach. We evaluated the methods under realistic scenarios that vary the location of the true MTD, covering situations where the MTD is located at the lower, middle, or higher doses. Comparisons were also performed by varying the number of dose levels from five to eight. We report standard endpoints, including overall measures of accuracy, precision, safety, and trial duration, as well as how quickly the MTD is reached under the different methods and the total sample size needed when a stopping rule is used.

In the sections below we cover the methodological background, describe the assumptions of the simulations, present the results, and provide recommendations on which design to use in practice.