The majority of Phase I designs use the presence of severe toxicities as a guide to finding the maximum tolerated dose (MTD), which is the aim of Phase I trials. Dose-limiting toxicities (DLTs) are defined as pre-specified severe adverse events (AEs) of grade 3 or higher, based on the Common Terminology Criteria for Adverse Events (CTCAE v4.0) [1]. The CTCAE are international guidelines that grade the severity of an AE from mild (grade 1) to death-related (grade 5). Most Phase I trials [2] use a binary response for each patient, which indicates the presence or absence of a DLT.
Many statistical papers have explored the use of individual grades, either by combining various toxicities into a summary measure and assuming a continuous response [3, 4], or by fitting multivariate models for an ordinal response, where the outcome is any toxicity grade on the scale of 1-5 [5, 6, 7]. A number of authors [4, 8] have proposed methodology that transforms the observed toxicities per patient, from mild to severe, into a single summary measure, an “equivalent toxicity score” or “toxicity burden”. This summary measure takes the form of a weighted linear combination of the observed toxicities, resulting in a single continuous or quasi-continuous outcome whose expected value is a weighted sum of the rates of different toxicity types and grades. Chen et al. [9] have extended escalation with overdose control (EWOC) designs [10] to use a quasi-continuous variable as a toxicity response. Specifically, they proposed a novel toxicity score system that quantitatively maps the multiple toxicities per patient to a normalized toxicity score. Some of these designs have been shown via simulations to be superior to designs that use a binary response under certain conditions. However, there are practical implications that one should take into account. These designs map the summary scores onto a scale that is not easily interpretable by clinical investigators and, in addition, they target an acceptable toxicity level based on an arbitrary cutoff that is not directly related to the original 1-5 scale. Moreover, under a given weighting scheme, certain configurations of toxicities can yield an acceptable “summary score” relative to a pre-selected cutoff when in fact, taken individually, these toxicities could be deemed alarming or unsafe by clinicians.
More recently, Lee et al. [11] proposed a toxicity burden score that summarizes different types and grades of toxicities into a single outcome per patient and uses this within the continual reassessment method (CRM) framework. The approach of Lee et al. has a lot of promise since, in actual clinical studies, information on toxicities is always of this nature. Their idea is to address this specificity directly so that the different types of toxicity, if not necessarily equally weighted, appear on an equal footing when the problem is considered. Other authors [12, 13] have argued that the usual binary information on toxicity (DLT yes/no) can be refined by taking explicit account of lower and intermediary toxicity grades. Our focus here is to retain the single binary outcome (DLT yes/no) as the main outcome variable of interest, but to see to what extent auxiliary information on lower and intermediary grades can help us in that endeavor. Our development is close to that of Wang et al. [12], but we pay particular attention to the use of information on intermediary grades in the context of two-stage CRM designs.
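The concern raised above about summary scores and pre-selected cutoffs can be made concrete with a small sketch. The weights, toxicity types, and cutoff below are hypothetical values chosen purely for illustration; they are not taken from any of the cited scoring systems. The sketch computes a weighted-sum score per patient and compares it with the binary DLT indicator (any grade 3 or higher event, assuming all listed types are pre-specified as dose limiting).

```python
from typing import Dict, List, Tuple

Toxicity = Tuple[str, int]  # (toxicity type, CTCAE grade)

# Hypothetical weights per (type, grade); illustrative only.
WEIGHTS: Dict[Toxicity, float] = {
    ("nausea", 1): 0.05, ("nausea", 2): 0.15, ("nausea", 3): 0.40,
    ("fatigue", 1): 0.05, ("fatigue", 2): 0.20, ("fatigue", 3): 0.50,
    ("neuropathy", 1): 0.10, ("neuropathy", 2): 0.30, ("neuropathy", 3): 1.00,
}

# Hypothetical "acceptable score" threshold.
CUTOFF = 0.6


def toxicity_score(events: List[Toxicity]) -> float:
    """Weighted sum of a patient's observed toxicities."""
    return sum(WEIGHTS[event] for event in events)


def has_dlt(events: List[Toxicity]) -> bool:
    """Binary DLT indicator: any event of grade 3 or higher."""
    return any(grade >= 3 for _, grade in events)


# Patient A: a single grade 3 event (a DLT) whose score stays below the cutoff.
patient_a = [("nausea", 3)]
# Patient B: three grade 2 events (no DLT) whose combined score exceeds the cutoff.
patient_b = [("nausea", 2), ("fatigue", 2), ("neuropathy", 2)]

for name, events in [("A", patient_a), ("B", patient_b)]:
    score = toxicity_score(events)
    print(name, "score above cutoff:", score > CUTOFF, "| DLT:", has_dlt(events))
```

Under these assumed weights, patient A experiences a DLT yet has an "acceptable" score, while patient B has no DLT yet exceeds the cutoff. This is the kind of disagreement between a summary score and clinical judgment that the text describes.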
If we decide to retain the presence of a DLT as our single outcome criterion, then there is still an important question concerning the information that may be contained in lower-grade toxicities. We might like to know whether information on individual grades can provide a more accurate or more efficient estimate of the MTD. If so, then this information could be used to improve the operating characteristics of a Phase I design. For example, a design might use the occurrence of a DLT as its essential outcome and the occurrence of a lower-grade toxicity as a secondary, or auxiliary, outcome variable. In this paper, we compare designs based on the Continual Reassessment Method (CRM) that use information on grades in various ways. For example, one can use individual grades only at the beginning of a two-stage design, when limited data are available, in order to enrich the early dose escalation with the information obtained from non-DLTs and possibly reach the MTD faster. Another design can use the grades throughout the trial by modifying the CRM algorithm for an ordinal response. In this situation, we assume a model in which the rate of occurrence of lower-grade toxicities is related to the rate of occurrence of DLTs. As the data are gathered sequentially, information can be obtained about the relative rates of lower-grade toxicities and DLTs. If this knowledge can help in identifying the MTD more accurately then, of course, it could be put to use in dose-finding studies.
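For orientation, the binary-outcome CRM that these designs build on can be sketched as follows. This is a minimal illustration assuming the commonly used one-parameter power working model, p_i(a) = alpha_i^exp(a), with a normal prior on a. The skeleton values, prior standard deviation, target rate, and grid settings are hypothetical, and the function name `crm_next_dose` is introduced here only for illustration. It does not reproduce the graded-toxicity models compared in this paper, and it omits practical safeguards such as restrictions on dose skipping.

```python
import math
from typing import List, Sequence, Tuple


def crm_next_dose(
    skeleton: Sequence[float],
    data: Sequence[Tuple[int, int]],
    target: float = 0.25,
    prior_sd: float = 1.0,
    grid_size: int = 2001,
    half_width: float = 6.0,
) -> Tuple[int, List[float]]:
    """Return (recommended dose index, posterior mean DLT rate per dose).

    skeleton: working-model DLT rates alpha_i, increasing in dose.
    data: (dose index, DLT indicator 0/1) for each treated patient.
    """
    step = 2 * half_width / (grid_size - 1)
    grid = [-half_width + k * step for k in range(grid_size)]

    # Unnormalized log posterior of a on the grid: normal prior + Bernoulli likelihood.
    log_post = []
    for a in grid:
        lp = -0.5 * (a / prior_sd) ** 2
        expo = math.exp(a)
        for dose, dlt in data:
            p = skeleton[dose] ** expo
            p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
            lp += math.log(p) if dlt else math.log(1 - p)
        log_post.append(lp)

    # Normalize on the grid (subtract the max for numerical stability).
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)

    # Posterior mean DLT rate at each dose level.
    post_mean = [
        sum(w * alpha ** math.exp(a) for w, a in zip(weights, grid)) / total
        for alpha in skeleton
    ]

    # Recommend the dose whose estimated DLT rate is closest to the target.
    best = min(range(len(skeleton)), key=lambda i: abs(post_mean[i] - target))
    return best, post_mean


# Hypothetical skeleton and accumulated data: (dose index, DLT yes=1 / no=0).
skeleton = [0.05, 0.10, 0.25, 0.40, 0.55]
observed = [(0, 0), (0, 0), (0, 0), (1, 0), (1, 0), (1, 1)]
next_dose, estimates = crm_next_dose(skeleton, observed)
print(next_dose, [round(p, 3) for p in estimates])
```

A design that also uses lower grades would replace the Bernoulli likelihood above with one that accounts for the graded outcome; how that is done is a modelling choice, which is the subject of the comparisons in this paper.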
The definition of a DLT itself combines a lot of information on grades. This is done as part of a complex, if informal, procedure to determine which kinds of toxicities are, broadly, unacceptable. Taking account of lower-grade toxicities is unlikely to make very big changes to our ability to accurately locate the MTD. However, even intuitively, we can see that there are cases where lower-grade toxicities will provide some information, if only by their absence. If there is no indication of any kind of reaction to treatment, or drug levels, then the chance must be quite high that we are still experimenting too low to be in the neighborhood of the MTD. It can also be of interest in its own right to learn something of the relationship between the occurrence of lower-grade toxicity and the occurrence of DLT. In the comparative work that follows, we consider two-stage CRM designs that make different use of lower-grade toxicity information. Specifically, we study a two-stage design that uses lower-grade toxicities in the first stage only, and two-stage designs in which lower grades are used throughout the trial via explicit models. For purposes of reference within this article, we call these CRMG(1,0), CRMG(1,1) and CRMG(1,2) as shown in .