Biostatistics. 2016 April; 17(2): 304–319.
Published online 2015 November 9. doi:  10.1093/biostatistics/kxv045
PMCID: PMC4834949

A decision-theoretic phase I–II design for ordinal outcomes in two cycles

Juhee Lee*
Department of Applied Mathematics and Statistics, Baskin School of Engineering, University of California, 1156 High Street, Mail Stop SOE2, Santa Cruz, CA 95064, USA
Peter F. Thall
Department of Biostatistics, University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
Yuan Ji
Program of Computational Genomics & Medicine, NorthShore University Health System, Evanston, IL, USA and Department of Public Health Sciences, The University of Chicago, Chicago, IL, USA


This paper is motivated by a phase I–II clinical trial of a targeted agent for advanced solid tumors. We study a stylized version of this trial with the goal to determine optimal actions in each of two cycles of therapy. A design is presented that generalizes the decision-theoretic two-cycle design of Lee and others (2015. Bayesian dose-finding in two treatment cycles based on the joint utility of efficacy and toxicity. Journal of the American Statistical Association, to appear) to accommodate ordinal outcomes. Backward induction is used to jointly optimize the actions taken for each patient in each of the two cycles, with the second action accounting for the patient's cycle 1 dose and outcomes. A simulation study shows that simpler designs obtained by dichotomizing the ordinal outcomes either perform very similarly to the proposed design, or have much worse performance in some scenarios. We also compare the proposed design with the simpler approaches of optimizing the doses in each cycle separately, or ignoring the distinction between cycles 1 and 2.

Keywords: Adaptive design, Bayesian design, Decision theory, Dynamic treatment regime, Latent probit model, Ordinal outcomes, Phase I–II clinical trial

1. Introduction and motivation

This paper is motivated by the problem of designing a dose-finding trial of a new agent for cancer patients with advanced solid tumors. The agent aims to inhibit a kinase, which regulates cell metabolism and proliferation, in the cancer cells to reduce or eradicate the disease. The agent is given orally each day of a 28-day cycle at one of five doses, 2, 4, 6, 8, or 10 mg, combined with a fixed dose of standard chemotherapy. Because both efficacy and toxicity are used for dose-finding, it is a phase I–II trial (Thall and Cook, 2004; Yin and others, 2006; Zhang and others, 2006; Thall and Nguyen, 2012). Both outcomes are 3-level ordinal variables, with toxicity defined as None/Mild (grade 0,1), Moderate (grade 2), or Severe (grade 3,4) and efficacy defined in terms of disease status compared with baseline, with possible values progressive disease (PD), stable disease (SD), or partial or complete response (PR/CR).

We study a stylized version of this trial with the more ambitious goal to determine optimal doses or actions for each patient in each of two cycles of therapy. This is a major departure from conventional dose-finding designs, which focus on choosing a dose for only the first cycle. While virtually all clinical protocols for dose-finding trials include rules for making within-patient dose adjustments in cycles after the first, this aspect usually is ignored in the trial design. In practice, each patient's doses in cycle 2, or later cycles, are chosen subjectively by the attending physician. To choose a patient's cycle 2 dose using a formal rule, it is desirable to use the patient's dose-outcome data from cycle 1, as well as data from other patients treated previously in the trial. Thus, ideally, a decision rule that is adaptive both within and between patients is needed.

Recent papers on designs accounting for multiple treatment cycles include Cheung and others (2014) and Lee and others (2015). In this paper, we build on the latter, who use a decision-theoretic approach for dose-finding in two cycles based on joint utilities of binary outcomes in each cycle. We extend the model to accommodate ordinal outcomes, and use a decision criterion that accounts for the many possible (efficacy,toxicity) outcomes in each of the two treatment cycles, including the risk-benefit trade-offs between the levels of efficacy and toxicity. In the stylized version of the trial described above, since there are 3-level ordinal toxicity and efficacy outcomes in each cycle, accounting for two cycles there are 81 possible elementary outcomes for each patient. Consequently, dose-finding is a much more complex problem than in a conventional phase I–II trial with two binary outcomes that chooses a dose for cycle 1 only.

Aside from the issue of accounting for two cycles, an important question is whether the additional complexity required to account for ordinal outcomes provides practical benefits compared with the common approach of dichotomizing efficacy and toxicity, which would allow the two-cycle design of Lee and others (2015) to be applied. Simulations, described in Section 4.4 of the main text, Figure 2, and Section 3 of Supplementary Material (available at Biostatistics online), show that reducing ordinal outcomes to binary variables produces a design that either performs very similarly to the proposed design, or has much worse performance in certain scenarios. Moreover, the behavior of the simplified design depends heavily on how one chooses to reduce the two ordinal outcomes to two binary variables.

Fig. 2.
Plots of equation M148 for a comparison of DTD-O2 vs. a design with binary outcomes. Here, equation M149, equation M150, and equation M151 represent empirical mean utilities of patients treated in the trial, true mean utilities of treatments given to patients in the trial, and true expected utilities chosen ...

A naive design might aim to optimize the doses given in the two cycles separately. This may not be optimal. To see this, denote a patient's toxicity outcome by equation M1 and efficacy outcome by equation M2 for equation M3, and denote the current data from equation M4 patients by equation M5. We include equation M6 as a possible action in either cycle for cases where it has been determined that no dose is acceptable, so the action equation M7 in each of cycles equation M8 may be either to choose a dose or equation M9, that is, equation M10, with 1 and equation M11 denoting the minimum and maximum dose levels, respectively. Suppose that some optimality criterion has been defined. If one derives optimal adaptive actions equation M12 for cycle 1 and equation M13 for cycle 2 separately, each based on the current data equation M14, an inherent flaw is that, in choosing one equation M15 for all patients, it ignores each patient's cycle 1 data. As in Lee and others (2015), we derive optimal decision rules equation M16 = equation M17 with the important property that equation M18 = equation M19 is a function of the first cycle decision equation M20 and response equation M21. This is implemented by applying backward induction (e.g. Bellman, 1957). The method accounts for the patient's cycle 1 dose and outcomes, as well as other patients' data, in making an optimal decision for cycle 2.

Iasonos and others (2011) and Van Meter and others (2012) studied the use of ordinal toxicity outcomes for a generalized continual reassessment method and reported that gains in performance of their ordinal toxicity designs are not substantial in comparison to binary toxicity designs. However, the comparison looks quite different for the model-based two-cycle design for bivariate ordinal (efficacy, toxicity) outcomes that we propose in this paper. In simulations described in Section 4.4, we show that the use of ordinal rather than binary outcomes can substantially improve design performance in our setting. In simulations reported in Section 4.5, we compare the proposed design with designs that do not properly model association between cycles.

Section 2 describes the proposed decision-theoretic method for ordinal outcomes in two cycles (DTD-O2). Sections 3 and 4 include decision criteria using utilities and a simulation study. The last section concludes with a final discussion.

2. A decision-theoretic design

2.1. Actions and optimal sequential decisions

For notational convenience, we denote the possible levels of toxicity by equation M22 and efficacy by equation M23. For the motivating trial, these are equation M24 for None/Mild, 1 for Moderate, and 2 for Severe, and equation M25 for PD, 1 for SD, and 2 for CR/PR, so equation M26. If the adaptively chosen cycle 1 action equation M27 for any patient, then the trial is stopped and no more patients are enrolled. Otherwise, the patient receives a dose equation M28 of the agent in cycle 1. A cycle 2 action is a function mapping the cycle 1 dose and outcomes, equation M29, to an action in equation M30. For example, if the cycle 1 action equation M31 produced None/Mild toxicity (equation M32), one possible cycle 2 action is equation M33 if equation M34, and equation M35 if equation M36 or 2. That is, if there was little or no toxicity but PD in cycle 1, then the action equation M37 increases the dose in cycle 2, but if the patient had SD or better then it repeats the cycle 1 dose. The design thus involves an alternating sequence of decisions and observed outcomes, equation M38, equation M39, equation M40 and equation M41.

We apply a Bayesian decision-theoretic paradigm to determine an optimal decision rule. First, focus on cycle 1, and temporarily ignore cycle 2. The general setup of a Bayesian decision problem involves actions equation M42, observable data equation M43, parameters equation M44 that index a sampling model equation M45 for the data, and a prior probability model equation M46 for the parameters. We discuss specification of equation M47 in more detail below. A utility function equation M48 formalizes relative preferences for alternative actions under hypothetical outcomes equation M49 and assumed truth equation M50. Starting from first principles, one can then argue (Robert, 2007, Chapter 2) that a rational decision-maker chooses the action equation M51 that maximizes utility in expectation, that is

equation M52

The integral is the expected utility equation M53 with the expectation taken with respect to equation M54. To simplify notation, we will henceforth suppress conditioning on equation M55 in the notation.

In the two-cycle dose-finding problem, the sequential nature of the within-patient decisions complicates the solution. In the second cycle, the utility equation M56 is replaced by the expected utility under optimal continuation. Denote equation M57 and equation M58. We get an alternating sequence of optimization and expectation

equation M59

with the second cycle expected total utility as a function of equation M60, equation M61 and the optimal second cycle decision equation M62. When we substitute equation M63 and take the expectation with respect to equation M64 we obtain

equation M65

which is maximized to determine the optimal decision for cycle 1, equation M66. This alternating sequence of maximization and expectation, called dynamic programming, is characteristic of sequential decision problems. While it often leads to intractable computational problems (Parmigiani and Inoue, 2009, Chapter 12), in the present setting with ordinal outcomes the problem is solvable. Dynamic programming recently has been applied in other clinical trial design settings (Murphy, 2003; Zhao and others, 2011; Lee and others, 2015; Cheung and others, 2014).
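The alternation of maximization and expectation can be illustrated with a minimal sketch. It assumes a toy setting in which the outcome probabilities are known functions rather than posterior quantities, and in which the total utility is a cycle 1 utility plus a weight `lam` times a cycle 2 utility. The names `backward_induction`, `p1`, `p2`, `u`, `lam`, `u_no_treat`, and the three dose levels are hypothetical and are not taken from the paper.

```python
from itertools import product

LEVELS = (0, 1, 2)                        # ordinal levels of toxicity and efficacy
OUTCOMES = list(product(LEVELS, LEVELS))  # 9 (toxicity, efficacy) pairs per cycle
DOSES = (1, 2, 3)                         # hypothetical dose levels
NO_TREAT = 0                              # stands in for the "do not treat" action


def backward_induction(p1, p2, u, lam=1.0, u_no_treat=0.0):
    """Return (best cycle 1 action, cycle 2 policy, optimal expected total utility).

    p1(y1, a1)         -> probability of cycle 1 outcome y1 under dose a1
    p2(y2, a2, a1, y1) -> probability of cycle 2 outcome y2 given the history
    u(y)               -> utility of one cycle's (toxicity, efficacy) outcome
    """
    policy, value1 = {}, {}
    for a1 in DOSES:
        total = 0.0
        for y1 in OUTCOMES:
            # Maximization: expected cycle 2 utility of each action given (a1, y1).
            q2 = {NO_TREAT: u_no_treat}
            for a2 in DOSES:
                q2[a2] = sum(p2(y2, a2, a1, y1) * u(y2) for y2 in OUTCOMES)
            best_a2 = max(q2, key=q2.get)
            policy[(a1, y1)] = best_a2
            # Expectation: weight the optimal continuation by the cycle 1 outcome.
            total += p1(y1, a1) * (u(y1) + lam * q2[best_a2])
        value1[a1] = total
    # Assumption: not treating in cycle 1 also means no treatment in cycle 2.
    value1[NO_TREAT] = u_no_treat + lam * u_no_treat
    best_a1 = max(value1, key=value1.get)
    return best_a1, policy, value1[best_a1]
```

The returned `policy` is a lookup table from (cycle 1 dose, cycle 1 outcome) to a cycle 2 action, which mirrors the idea that the second decision is a function of the first decision and its observed response. In an actual trial the probabilities would be replaced by posterior predictive quantities and the action sets would be restricted for safety.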

2.2. Utility function

We construct a utility function

equation M67

as a sum over cycle-specific utilities equation M68, equation M69 where equation M70 is a scale parameter. If equation M71 then the cycle 2 utility is ignored in selecting equation M72 while equation M73 corresponds to treating utilities in the two cycles equally. Optimal decisions may change under different values of equation M74. Even with equation M75, however, the importance of jointly modeling the two cycles remains in that inference on equation M76 can be enhanced through borrowing information across cycles. For the simulations in Section 4, we used equation M77. A sensitivity analysis in equation M78 is reported in the Supplementary Materials (available at Biostatistics online). The utility function (2.4) focuses on the clinical outcomes and is a function of equation M79 only. That is, the inference on equation M80 does not affect utility, and we do not initially consider preferences across doses equation M81. We thus drop equation M82 and equation M83 from the arguments of equation M84 hereafter.

In practice, numerical utilities of the equation M85 elementary outcomes must be elicited from the clinical collaborators, with specific numerical values reflecting physicians' relative preferences (cf. Thall and Nguyen, 2012). In our stylized illustrative trial, we fix the utilities of the best and worst possible outcomes to be equation M86 and equation M87. In general, any convenient function with equation M88 and equation M89 that gives higher utilities to more desirable outcomes may be used. For future reference, we note that equation M90 is the expected utility corresponding to equation M91, i.e. do not treat the patient. Table 1 shows the utilities that will be used for our simulation studies.

Table 1.
An example of elicited utilities, equation M92

To reduce notation, we denote the utility equation M93 as a function of hypothetical outcomes equation M94, and drop the arguments equation M95 and equation M96. Upper case equation M97 denotes expected utility, with data equation M98, equation M99 removed by marginalization and decisions equation M100, equation M101 substituted by maximization, as in (2.2). In addition to the cycle index equation M102, the arguments of equation M103 clarify the level of marginalization and maximization. Maximizing equation M104 in (2.1) and equation M105 inside the integral in (2.2) yields the optimal action pair equation M106, where equation M107 is either a dose or equation M108, equation M109 is applicable only when equation M110 is a dose, and equation M111 is a function of equation M112 and the patient's cycle 1 outcomes, equation M113. Assuming that the utility function takes the additive form (2.4), we define cycle-specific expected utilities, with the expected utility for cycle 2 given by

equation M114

Figure 1(a)–(c) illustrates equation M115 under the assumed simulation truth of Scenario 3 (discussed in Section 4.2), and shows how equation M116 changes with equation M117, given equation M118. Figure 1(d) illustrates the assumed true equation M119 over equation M120 for the simulation scenarios discussed in Section 4.2.

Fig. 1.
(a)–(c) The true expected cycle 2 utilities of taking equation M121 given equation M122, equation M123 with equation M124 for scenario 3. Each panel corresponds to one of the three possible outcomes of equation M125. equation M126 is acceptable only when its expected utility is greater than that of equation M127, equation M128. equation M129 is marked with a ...

Some practical guidelines of using utility functions for a design with ordinal outcomes in the two-cycle setting are provided in Section 1 of the Supplementary Material (available at Biostatistics online).

2.3. Action set

Equation (2.2) includes two maximizations to determine equation M155 and equation M156. In the discussion thus far, we have not used the particular elements of equation M157 and they might have been any actions. In actual dose-finding, ethical and practical constraints are motivated by the knowledge that, in general, higher doses carry a higher risk of more severe toxicity. We thus require a more restrictive action set, with additional conditions for the acceptability of a dose assignment.

The first additional criterion is that we do not skip untried dose levels when escalating. This rule is imposed almost invariably in actual trials with adaptive dose-finding methods. Let equation M158 denote the highest dose level among the dose levels that have been tried in cycle 1 and equation M159 the highest dose level among those that have been tried in either cycle 1 or cycle 2. The search for the optimal actions is constrained such that equation M160 and equation M161. In addition, we do not escalate a patient's dose level in cycle 2 if severe toxicity was observed in cycle 1 (equation M162). Both restrictions are due to safety concerns.

A third safety restriction is defined implicitly in terms of the cycle-specific utility equation M163. A patient is not treated (equation M164) if there is no dose with expected utility equation M165. For equation M166 the expected utility equation M167 is compared with the expected utility of not receiving any treatment in both cycles, equation M168 (horizontal dotted line in Figure 1(d)). Any equation M169 with equation M170 below the line is not considered acceptable treatment. For equation M171, the expected utility equation M172 is similarly compared with the expected utility of equation M173, equation M174 (horizontal dotted line in Figure 1(a)–(c)), and any equation M175 with equation M176 below the line is not acceptable.

At any interim point in the trial, let equation M177 denote the current data, including dose assignments for previously enrolled patients. The three conditions together make the action sets for equation M178 and equation M179 dependent on equation M180, equation M181 and equation M182. We let equation M183 and equation M184 denote the action sets for equation M185 and equation M186, respectively, that are implied by these three restrictions.
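The three restrictions can be sketched as filters on candidate doses. This is only an illustration under stated assumptions: the no-skipping rule is read as "at most one level above the highest dose tried so far", acceptability is read as an expected utility strictly greater than that of not treating, and the function names, argument names, and the codes `SEVERE` and `NO_TREAT` are hypothetical.

```python
SEVERE = 2     # toxicity level coded as severe
NO_TREAT = 0   # stands in for the "do not treat" action


def cycle1_action_set(exp_util, highest_tried_c1, u_no_treat):
    """Acceptable cycle 1 doses; exp_util maps dose level -> expected utility."""
    allowed = [
        d for d in sorted(exp_util)
        if d <= highest_tried_c1 + 1     # do not skip an untried dose level
        and exp_util[d] > u_no_treat     # must be better than not treating
    ]
    return allowed or [NO_TREAT]


def cycle2_action_set(exp_util2, a1, tox1, highest_tried_any, u_no_treat):
    """Acceptable cycle 2 doses given the cycle 1 dose a1 and toxicity tox1."""
    allowed = []
    for d in sorted(exp_util2):
        if d > highest_tried_any + 1:    # do not skip an untried dose level
            continue
        if tox1 == SEVERE and d > a1:    # no escalation after severe toxicity
            continue
        if exp_util2[d] <= u_no_treat:   # must be better than not treating
            continue
        allowed.append(d)
    return allowed or [NO_TREAT]
```

Returning a list that contains only `NO_TREAT` when no dose survives the filters matches the convention that a patient is not treated when no dose is acceptable.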

2.4. Inference model

Thus far, our discussion of optimal decisions has not included a particular probability model. We will assume a 4D ordinal probit model for equation M187 with a regression on doses equation M188 and equation M189 standardized to the domain [0, 1], with equation M190 and equation M191. Let equation M192 denote a vector of latent probit scores for the equation M193th patient and let equation M194 and equation M195 denote fixed cutoffs that define equation M196 if equation M197 and equation M198 if equation M199 equation M200. While varying the mean of distributions of equation M201 and equation M202 across cycles, the same cutoffs are used for all cycles. The equation M203 and equation M204 are multivariate normal probit scores,

equation M205

and equation M206. The covariance matrix implies associations across cycles and across outcomes through equation M207 and equation M208. Given that the ordinality of the outcomes is accounted for by the latent probit scores and fixed cutoff parameters equation M209 and equation M210, a simple yet flexible model for regression on dose is obtained by assuming equation M211 = equation M212 with equation M213 for toxicity and equation M214 for efficacy. A discussion of nonlinear dose–response models is given by Bretz and others (2005). We assume that the toxicity and efficacy probabilities increase monotonically in dose by requiring equation M215 and equation M216. Denote equation M217, equation M218 and equation M219. We complete the model with a normal prior equation M220, equation M221.
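To make the latent-score mechanism concrete, the following sketch draws one patient's four ordinal outcomes from a 4-dimensional normal vector and cuts each coordinate at fixed cutoffs. It is an illustration, not the paper's fitted model: the ordering of the latent vector (toxicity and efficacy in cycle 1, then in cycle 2), the linear mean `beta0 + beta1 * dose`, and all function and argument names are assumptions, and the covariance matrix is supplied by the caller rather than given the specific structure of (2.6).

```python
import bisect
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw_latent(mean, cov, rng):
    """One draw from a multivariate normal with the given mean and covariance."""
    low = cholesky(cov)
    e = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(low[i][k] * e[k] for k in range(i + 1))
            for i, m in enumerate(mean)]


def to_ordinal(z, cutoffs):
    """Level 0 if z <= cutoffs[0], level 1 if z lies in (cutoffs[0], cutoffs[1]], else 2."""
    return bisect.bisect_left(cutoffs, z)


def draw_outcomes(d1, d2, beta0, beta1, cov, cut_tox, cut_eff, rng):
    """Return (tox1, eff1, tox2, eff2) for standardized doses d1 and d2 in [0, 1]."""
    doses = (d1, d1, d2, d2)
    mean = [b0 + b1 * d for b0, b1, d in zip(beta0, beta1, doses)]
    z = draw_latent(mean, cov, rng)
    cuts = (cut_tox, cut_eff, cut_tox, cut_eff)  # same cutoffs in both cycles
    return tuple(to_ordinal(zi, c) for zi, c in zip(z, cuts))
```

Off-diagonal entries of `cov` induce association between outcomes within a cycle and across cycles, and nonnegative entries of `beta1` give outcome probabilities that shift toward higher levels as dose increases.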

3. Trial design

3.1. Adaptive randomization

Denote equation M222. Although, in terms of the utility-based objective function, equation M223 yields the best clinical outcomes for the next patient, the performance of the design, in terms of frequentist operating characteristics, can be improved by including adaptive randomization (AR) among actions giving values of the objective function near the maximum at equation M224. Using AR decreases the probability of getting stuck at a suboptimal equation M225 and also has the effect of treating more patients at doses having larger utilities, on average. The problem that a “greedy” search algorithm may get stuck at suboptimal actions, and the simple solution of introducing additional randomness into the search process, are well known in the optimization literature (cf. Tokic, 2010). This has been dealt with only very recently in dose-finding (Bartroff and Lai, 2010; Azriel and others, 2011; Braun and others, 2012; Thall and Nguyen, 2012).

To implement AR, we first define equation M226 to be a function decreasing in patient index equation M227 and denote equation M228. We define the set of equation M229-optimal doses for cycle 1 to be

equation M230

The set equation M231 contains doses equation M232 in equation M233 whose equation M234 is within equation M235 of the maximum posterior mean utility. Similarly, we define the set of equation M236-optimal doses for cycle 2 given equation M237 to be

equation M238

equation M239 in (3.1) is based on (2.5). Our design randomizes patients uniformly among doses in equation M240 for equation M241 and equation M242 for equation M243, which we call AR(equation M244). Numerical values of equation M245 depend on the range of equation M246, and are determined by preliminary trial simulations in which equation M247 is varied.
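The randomization step can be sketched as follows, assuming the posterior mean utilities of the acceptable actions have already been computed and stored in a dictionary. The names `epsilon_optimal_set` and `adaptive_randomize` are hypothetical, and the utility values in the usage note are made up for illustration.

```python
import random


def epsilon_optimal_set(exp_util, eps):
    """Actions whose expected utility is within eps of the maximum."""
    best = max(exp_util.values())
    return [a for a, v in exp_util.items() if v >= best - eps]


def adaptive_randomize(exp_util, eps, rng=random):
    """Choose uniformly at random among the eps-optimal actions."""
    return rng.choice(epsilon_optimal_set(exp_util, eps))
```

For instance, with made-up utilities `{1: 40.0, 2: 58.0, 3: 60.0, 4: 61.0}` and `eps = 2.0`, the eligible set is doses 3 and 4, and with `eps = 0.0` only dose 4 remains, which recovers the greedy choice. Shrinking `eps` as patients accrue therefore moves the design from exploration toward pure maximization.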

3.2. Illustrative trial

Our illustrative trial studied in the simulations is a stylized version of the phase I–II chemotherapy trial with five dose levels described in Section 1, but here accounting for two cycles of therapy. The maximum sample size is 60 patients with a cohort size of 2. Based on preliminary simulations, we set equation M248 for the first 10 patients, equation M249 for the next 10 patients, and equation M250 for the remaining 40 patients. An initial cohort of 2 patients is treated at the lowest dose level in cycle 1, their cycle 1 toxicity and efficacy outcomes are observed, the posterior of equation M251, equation M252 and equation M253 is computed, and actions are taken for cycle 2 of the initial cohort. If equation M254 then patient equation M255 does not receive a second cycle of treatment. If equation M256, then AR(equation M257) is used to choose an action for cycle 2 from equation M258. When the toxicity and efficacy outcomes are observed from cycle 2, the posterior of equation M259 is updated. The second cohort is not enrolled until the first cohort has been evaluated for cycle 1. For all cohorts after the first, after the outcomes of all previous cohorts are observed, the posterior is updated, the posterior expected utility equation M260 is computed using equation M261, and equation M262 is determined. Using equation M263 and equation M264, we find equation M265 and search for equation M266. If equation M267 for any interim equation M268 then equation M269, and the trial is terminated. If equation M270, we then choose a cycle 1 dose from equation M271 using AR(equation M272). Once the outcomes in cycle 1 are observed, the posterior is updated. Using equation M273 and equation M274, equation M275 is searched. If equation M276 contains equation M277 only, then equation M278 and no cycle 2 dose is given to patient equation M279. Otherwise, equation M280 is selected from equation M281 using AR(equation M282). The toxicity and efficacy outcomes are observed from cycle 2 and the posterior of equation M283 is updated. The above steps are repeated until either the trial has been stopped early or equation M284 has been reached. At the end of the trial, we record equation M285 as the recommended first cycle dose equation M286 and equation M287 as the optimal policy equation M288. If the trial is terminated early, let equation M289 and equation M290 for all equation M291.

4. Simulation study

4.1. Designs for comparison

Let DTD-O2 denote the proposed decision-theoretic two-cycle design. We compare DTD-O2 with three other designs. The first is obtained by reducing each 3-level efficacy and toxicity outcome to a 2-category (binary) variable by combining categories, but using the same probability model to ensure a fair comparison. The next two comparators are single cycle designs. The first, called Single Cycle Comparator 1 (SCC1), assumes no association between cycles and optimizes equation M292 and equation M293 separately. The second, called Single Cycle Comparator 2 (SCC2), does not distinguish between cycles and treats the two cycles identically.

For SCC1, we assume patient-specific random probit scores, independent over cycles, equation M294 equation M295 equation M296, where equation M297 and equation M298 is the equation M299 covariance matrix. We let equation M300 be the upper-left partition of equation M301 in (2.6). Owing to the independence of probit scores over cycles within a patient, SCC1 models the association between equation M302 and equation M303 within the same cycle only and does not assume any association between outcomes in different cycles, for example, equation M304 and equation M305. The rest of the model specification, including the regression of equation M306 on the dose in Section 2.4, stays the same. For SCC2, in addition to having patient- and cycle-specific random probit scores as in SCC1, we assume that the mean dose effects are identical in the two cycles by dropping the cycle index from equation M307 in Section 2.4, i.e. setting equation M308, equation M309 for all equation M310. For these two methods, we apply the acceptability rules in Section 2.3 and the AR rules in Section 3.1 for each cycle separately. For example, a trial is terminated if equation M311 for all equation M312 and equation M313 is defined with equation M314 only. Also, the no-escalation rule after equation M315, the no-skipping rule, and AR, similar to those implemented in the proposed method, are applied to SCC1 and SCC2.

4.2. Simulation setup

We simulated trials under each of 8 scenarios using each of the designs. A total of equation M316 trials were simulated for each design under each scenario. The simulation scenarios were determined by fixing a set of marginal probabilities and regression coefficients on probit scores, given in Table 2 and Supplementary Material Table S1 (available at Biostatistics online). Each simulation scenario is specified by the marginal distributions of equation M317 and equation M318. Table 2 gives the true equation M319 and equation M320 under each scenario. The corresponding probit scores are equation M321 and equation M322, where equation M323 is the cumulative distribution function of the standard normal distribution. To ensure a fair comparison, we intentionally define a simulation truth that is different from the assumed model used by the design methodology. The simulation model is best described as a generative model, first for equation M324, then equation M325 given equation M326, and then equation M327 given equation M328.
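As a hedged illustration of how marginal probabilities can be turned into probit scores, the sketch below applies the inverse standard normal distribution function to cumulative category probabilities. Whether the scores in the simulation setup are defined exactly this way is an assumption, since the defining expressions are not reproduced here; the function name `probit_scores` and the example probabilities are hypothetical.

```python
from itertools import accumulate
from statistics import NormalDist


def probit_scores(probs):
    """Inverse normal CDF of the cumulative probabilities, excluding the final 1.

    probs lists the marginal probabilities of the ordered categories and is
    expected to sum to 1 with every cumulative value strictly between 0 and 1.
    """
    cumulative = list(accumulate(probs))[:-1]
    std_normal = NormalDist()
    return [std_normal.inv_cdf(c) for c in cumulative]
```

With made-up probabilities `(0.5, 0.3, 0.2)` for the three levels, the cumulative probabilities are 0.5 and 0.8, so the first score is 0 and the second is roughly 0.84.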

Table 2.
Assumed probabilities, equation M329 and equation M330. These marginal probabilities are used to determine probit scores, equation M331 and equation M332

Generating equation M333: We first generate equation M334 from the distribution specified by equation M335 where equation M336. For later reference, we define a rescaled variable equation M337 as equation M338 which is evenly spaced in equation M339.

Generating equation M340: Conditional on equation M341 we specify a distribution of equation M342 by letting

equation M343

with coefficient equation M344. Here, equation M345 induces association between the cycle 1 outcomes, equation M346 and equation M347. A negative value of equation M348 leads to a negative association between equation M349 and equation M350, that is, equation M351, equation M352. For later use, we define equation M353 by rescaling equation M354 to be evenly spaced in equation M355 similarly to equation M356.

Generating equation M357: We generate equation M358 using

equation M359

Here, equation M360 is a standardized dose in equation M361. We restrict equation M362 and equation M363 to induce a positive association of equation M364 with equation M365 and equation M366 and negative association with equation M367. Here, equation M368 determines how equation M369 and equation M370 jointly affect equation M371. A large negative value of equation M372 implies that given that equation M373 (severe toxicity) is observed at equation M374, the probability of observing equation M375, equation M376 greatly increases for all equation M377. Similarly, observing equation M378 (mild toxicity) at equation M379 greatly increases the probability of observing equation M380 for all equation M381, implying a large positive value of equation M382.

Generating equation M383: We use

equation M384

where equation M385 and equation M386. Similar to equation M387, equation M388 determines a joint effect of equation M389 and equation M390 on equation M391. The detailed specification of the coefficients, equation M392 and equation M393 for each simulation scenario is described in the Supplementary Materials (available at Biostatistics online). Table 3 shows the optimal actions, equation M394 and equation M395, over two cycles under each of the 8 simulation scenarios under the simulation truth. For example, in Scenario 3, the optimal cycle 1 action is to give dose level 3, and the optimal cycle 2 action is to treat patients with equation M396 at equation M397, and at equation M398 if equation M399.

Table 3.
True optimal actions, equation M400 and equation M401

We calibrate the fixed hyperparameters, equation M428, for equation M429 and equation M430 and the cutoff points, equation M431, using effective sample size (ESS), described in the Supplementary Materials (available at Biostatistics online). We set equation M432 and the cutoffs, equation M433 and equation M434, and simulate 1000 pseudo-samples of equation M435, equation M436, equation M437 and equation M438. We then compute probabilities of interest based on the pseudo-samples, such as equation M439 and equation M440, equation M441. For all simulations, we determined equation M442 to give each prior ESS between 0.5 and 2, using the approximation obtained by matching moments with a Dirichlet distribution. We used the same equation M443 for SCC1 and SCC2.

4.3. Evaluation criteria

We evaluate design performance for the patients treated in the trial using three different summary statistics, equation M444, equation M445, and equation M446. Recall that in a trial we record the clinical outcomes of the equation M447 patients with their assigned doses and recommended doses for future patients, equation M448, equation M449, equation M450 and equation M451, and equation M452 respectively. We index the equation M453 simulated replications of the trial by equation M454. We define average utility for the equation M455 patients in the equation M456th simulated trial in two different ways: equation M457 and equation M458. Note that equation M459 is a function only of the observed outcomes, equation M460, whereas equation M461 depends on the true utilities of assigned doses equation M462. For equation M463 and equation M464, equation M465 is used as the utility for patients with equation M466. The empirical mean total payoffs taken over all simulated trials are

equation M467

One may regard equation M468 and equation M469 as indexes of the ethical desirability of the method, given equation M470.

The proposed method gives an optimal action equation M471 for cycle 1, and policy equation M472 for cycle 2. We let equation M473 for all equation M474 if equation M475 so the trial is terminated early. We use equation M476 and equation M477 to evaluate performance in terms of future patient benefit. Under SCC1 and SCC2, equation M478 is not a function of equation M479. For SCC2, equation M480 and equation M481 are identical. Assuming that the simulation truth is known, we define the expected payoff in cycle 1 of giving action equation M482 to a future patient as equation M483 for equation M484. That is the expected utility with respect to the assumed distribution of equation M485 when equation M486 is given. For equation M487, let equation M488. This expectation is computed under the distribution of equation M489 given equation M490. If the rule equation M491 is used, the expected cycle 2 payoff is

equation M492

where equation M493 becomes equation M494 if equation M495. The total expected payoff to a future patient treated using the optimal regime equation M496 = equation M497 is defined to be equation M498.
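The following minimal sketch shows one way the expected payoff of a two-cycle regime could be evaluated when the simulation truth is known. It assumes discrete cycle 1 outcomes, lookup tables for the true expected utilities, and a weight `lam` on the cycle 2 payoff; the function name, the table layout, the weight, and all numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def total_expected_payoff(d1, policy, u1, p1, u2, lam=1.0):
    """Expected payoff of giving d1 in cycle 1 and policy[y1] in cycle 2.

    u1[d1]           : true expected cycle 1 utility of action d1
    p1[d1][y1]       : true probability of cycle 1 outcome y1 under d1
    u2[(d1, y1, d2)] : true expected cycle 2 utility of d2 given (d1, y1)
    lam              : assumed weight on the cycle 2 payoff
    """
    # Average the cycle 2 payoff over the cycle 1 outcome distribution.
    cycle2 = sum(prob * u2[(d1, y1, policy[y1])] for y1, prob in p1[d1].items())
    return u1[d1] + lam * cycle2


# Made-up truth for a single cycle 1 dose and two cycle 1 outcome categories.
u1 = {1: 50.0}
p1 = {1: {"low_tox": 0.75, "high_tox": 0.25}}
u2 = {
    (1, "low_tox", 1): 40.0, (1, "low_tox", 2): 60.0,
    (1, "high_tox", 1): 40.0, (1, "high_tox", 2): 20.0,
}

adaptive = {"low_tox": 2, "high_tox": 1}  # escalate only after low toxicity
fixed = {"low_tox": 2, "high_tox": 2}     # always escalate

print(total_expected_payoff(1, adaptive, u1, p1, u2))
print(total_expected_payoff(1, fixed, u1, p1, u2))
```

In this toy example the rule that uses the cycle 1 outcome has the larger expected payoff, which is the kind of gain a design that conditions cycle 2 actions on cycle 1 data is meant to capture.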

4.4. Comparison to designs with binary outcomes

We first compare DTD-O2 with designs obtained by collapsing each trinary toxicity and efficacy outcome to a binary variable. This mimics what often is done in practice in order to apply a phase I–II design based on binary efficacy and toxicity. We use an appropriately reduced version of our assumed underlying model to ensure a fair comparison. Since this reduction is not unique, we exhaustively define binary outcomes in four different ways, binary cases 1–4, given in Section 4 of the Supplementary Material (available at Biostatistics online). The utilities associated with the binary outcomes are defined accordingly based on the utilities in Table 1. The results, in terms of equation M499, equation M500, and equation M501, are summarized graphically in Figure 2. Scenario 8 is not included in Figure 2 because the optimal action is equation M502 in both cycles, and in this case all designs stop the trial early with high probability. The figure shows that reducing to binary outcomes can produce designs with much worse performance than DTD-O2, while for some cases the performance may be comparable. The binary outcome design's performance also varies substantially with the particular dichotomization used. Since different physicians may combine ordinal categories in different ways, the practical implication is that the additional complexity of the ordinal outcome design is worthwhile, in terms of benefit to both the patients treated in the trial and future patients.
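To make the non-uniqueness of the reduction concrete, the sketch below collapses a three-level ordinal outcome to a binary indicator at each of its two possible cutpoints, for toxicity and efficacy separately. Pairing the two toxicity cutpoints with the two efficacy cutpoints gives four codings; this is only one reading that happens to yield four cases, and the actual definitions of binary cases 1–4 are those in the Supplementary Material. The level coding 0, 1, 2 and the function names are assumptions of this example.

```python
from itertools import product


def dichotomize(level, cut):
    """Collapse an ordinal level in {0, 1, 2} to 1 if level >= cut, else 0."""
    return int(level >= cut)


def binary_codings(tox_level, eff_level, cuts=(1, 2)):
    """Return the (toxicity, efficacy) binary pair for every cutpoint pairing."""
    return [
        (dichotomize(tox_level, tc), dichotomize(eff_level, ec))
        for tc, ec in product(cuts, cuts)
    ]


# A patient in the middle category of both outcomes is coded four different
# ways, depending on which adjacent categories are merged.
print(binary_codings(1, 1))
```

Because the middle category can be merged with either neighbour, the same patient can count as a toxicity or not, and as a response or not, which is consistent with the observation that binary-design performance depends on the dichotomization chosen.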

4.5. Comparison to single cycle designs

The simulation results for DTD-O2, SCC1, and SCC2 are summarized in Figure 3. Scenarios 1–4 have the same marginal toxicity and efficacy probabilities, but different values of coefficients (equation M503), yielding different probit scores and different association structures of equation M504, equation M505, equation M506 and equation M507. Scenario 1 has large equation M508 and equation M509, so that the cycle 1 toxicity outcome greatly affects cycle 2 expected utilities in the simulation truth. As shown in Table 3, the optimal action in cycle 2 after observing severe toxicity in cycle 1 is equation M510 regardless of the cycle 1 efficacy outcome. Scenario 4 is similar to Scenario 1, but the cycle 1 efficacy outcome heavily affects the cycle 2 treatment in that all cycle 2 treatments are less desirable than equation M511 when PD is observed in cycle 1. In Scenarios 2 and 3, the two cycle 1 outcomes jointly determine the cycle 2 treatment as shown in the tables. Scenario 3 has larger association between equation M512 and equation M513 within each cycle. In Scenarios 1–4, modeling dependence across cycles improves the performance, as shown in Figure 3, where DTD-O2 is superior to SCC1 and SCC2 in terms of all three criteria, equation M514, equation M515, and equation M516. Since the only difference between DTD-O2 and SCC1 is whether the two cycles are modeled jointly or separately, the results show that the joint modeling significantly improves the performance. Differences in the performance are smaller for Scenarios 1 and 4. This may be because the true structure in these scenarios, in which a single cycle 1 outcome dominates the cycle 2 decisions, is not easily accommodated under the assumed covariance structure in (2.6), and each trial enrolls only a small number of patients. In such a case, separate estimation for the two cycles may not be a very poor approach.
In addition, the three methods are compared using equation M517 and equation M518 based on the last 20 patients in each trial for the three designs (not shown). This comparison shows that the improvement by DTD-O2 over the other two methods becomes greater, especially for Scenarios 1 and 4. This may imply that learning takes more patients for DTD-O2 when there is a discrepancy between the truth and the model assumption.

Fig. 3.
Plot of equation M519 for a comparison with SCC1 and SCC2. Here, equation M520, equation M521, and equation M522 represent empirical mean utilities of patients treated in the trial, true mean utilities of treatments given to patients in the trial, and true expected utilities chosen for future patients, ...

Scenarios 5–7 have different shapes for equation M526 as a function of equation M527. The cycle 1 utilities are U-shaped in Scenario 5, monotone increasing in Scenario 6, and monotone decreasing in Scenario 7. Very mild associations between outcomes and between cycles are assumed for these scenarios. For Scenarios 5 and 6, DTD-O2 achieves notably better performance (see Figure 3), with equation M528 and equation M529 similar to each other for DTD-O2. This implies that DTD-O2 identifies desirable actions early in the trial, treats many of the patients with the desirable actions, and has a high probability of selecting truly optimal actions at the end of a trial. In Scenario 7, DTD-O2 shows slightly worse performance (see the rightmost panel of Figure 3). In the simulation truth of Scenario 7, the cycle 1 expected utility does not change much with equation M530 but the cycle 2 expected utility is very sensitive to equation M531, equation M532, and equation M533. This is a very challenging case for DTD-O2, and in this particular scenario not modeling dependence between the cycles leads to better performance than modeling it incorrectly. Scenario 8 has no acceptable dose in either cycle. All three methods terminate the trials with probability 1 in this case, with mean sample sizes 9.11, 8.33, and 8.29.

In all 8 scenarios, SCC2 yields better results than SCC1. This may be because equation M534 and equation M535 happen to be identical in many cases, so combining outcomes from the two cycles works well. However, the results for Scenarios 1–4 show that using each patient's cycle 1 dose and outcomes to select equation M536 gives significantly superior performance in cases where there is significant dependence between the two cycles. More results are summarized using empirical toxicity and efficacy probabilities in Section 3 of the Supplementary Material (available at Biostatistics online).

We carried out a sensitivity analysis in equation M537 under Scenarios 2 and 5, including the four binary outcome designs, SCC1, SCC2, and DTD-O2, for equation M538, 0.4, 0.8, and 1.0. The results, given in Section 5 of the Supplementary Material (available at Biostatistics online), show that changes in design performance with equation M539 are very small, but equation M540, corresponding to no use of cycle 2 utility in making a decision at cycle 1, yields higher early termination probabilities for binary outcome cases 1 and 3.

5. Discussion

We have extended the decision-theoretic two-cycle phase I–II dose-finding method in Lee and others (2015) to accommodate ordinal outcomes. Our simulations show that incorporating cycle 1 information into the cycle 2 treatment decision yields good performance for both patients treated in a trial and future patients. The simulations in Figure 2 show that this extension may greatly improve design performance, quantified by equation M541, equation M542, and equation M543, compared with using binary toxicity and efficacy indicators. The proposed model and method also compared quite favorably with either assuming the two cycles are independent or ignoring the distinction between cycles 1 and 2.

In theory, DTD-O2 could be extended to more than two cycles. For this to be tractable, additional modeling assumptions may be required to control the number of parameters, since decisions must be made based on small sample sizes. Two possible approaches are to model dependence among cycles as a function of distance between cycles, or to make a Markovian assumption.


Y.J.'s research was supported in part by NIH R01 CA132897. P.F.T.'s research was supported in part by NIH R01 CA 83932. P.M.'s research was supported in part by NIH 1-R01-CA157458-01A1. This research was supported in part by NIH through resources provided by the Computation Institute and the Biological Sciences Division of the University of Chicago and Argonne National Laboratory, under grant S10 RR029030-01.

We specifically acknowledge the assistance of Lorenzo Pesce (University of Chicago). Conflict of Interest: None declared.


  • Azriel D., Mandel M., Rinott Y. (2011). The treatment versus experimentation dilemma in dose finding studies. Journal of Statistical Planning and Inference 141(8), 2759–2768.
  • Bartroff J., Lai T. L. (2010). Approximate dynamic programming and its applications to the design of phase I cancer trials. Statistical Science 25(5), 245–257.
  • Bellman R. (1957). Dynamic Programming, 1st edition. Princeton, NJ, USA: Princeton University Press.
  • Braun T. M., Kang S., Taylor J. M. G. (2012). A phase I/II trial design when response is unobserved in subjects with dose-limiting toxicity. Statistical Methods in Medical Research. 0962280212464541.
  • Bretz F., Pinheiro J. C., Branson M. (2005). Combining multiple comparisons and modeling techniques in dose–response studies. Biometrics 61(3), 738–748.
  • Cheung Y. K., Chakraborty B., Davidson K. W. (2014). Sequential multiple assignment randomized trial (SMART) with adaptive randomization for quality improvement in depression treatment program. Biometrics 71(2), 450–459.
  • Iasonos A., Zohar S., O'Quigley J. (2011). Incorporating lower grade toxicity information into dose finding designs. Clinical Trials 8(4), 370–379.
  • Lee J., Thall P. F., Ji Y., Müller P. (2015). Bayesian dose-finding in two treatment cycles based on the joint utility of efficacy and toxicity. Journal of the American Statistical Association 110(510), 711–722.
  • Murphy S. A. (2003). Optimal dynamic treatment regimes. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65(2), 331–355.
  • Parmigiani G., Inoue L. (2009). Decision Theory: Principles and Approaches. New York: Wiley.
  • Robert C. P. (2007). The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation, 2nd edition. Berlin: Springer.
  • Thall P. F., Cook J. D. (2004). Dose-finding based on efficacy–toxicity trade-offs. Biometrics 60(3), 684–693.
  • Thall P. F., Nguyen H. Q. (2012). Adaptive randomization to improve utility-based dose-finding with bivariate ordinal outcomes. Journal of Biopharmaceutical Statistics 22(4), 785–801.
  • Tokic M. (2010). Adaptive ε-greedy exploration in reinforcement learning based on value differences. In: KI 2010: Advances in Artificial Intelligence. Berlin: Springer, pp. 203–210.
  • Van Meter E. M., Garrett-Mayer E., Bandyopadhyay D. (2012). Dose-finding clinical trial design for ordinal toxicity grades using the continuation ratio model: an extension of the continual reassessment method. Clinical Trials 9(3), 303–313.
  • Yin G., Li Y., Ji Y. (2006). Bayesian dose-finding in phase I/II clinical trials using toxicity and efficacy odds ratios. Biometrics 62(3), 777–787.
  • Zhang W., Sargent D. J., Mandrekar S. (2006). An adaptive dose-finding design incorporating both toxicity and efficacy. Statistics in Medicine 25(14), 2365–2383.
  • Zhao Y., Zeng D., Socinski M. A., Kosorok M. R. (2011). Reinforcement learning strategies for clinical trials in nonsmall cell lung cancer. Biometrics 67(4), 1422–1433.

Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press