HHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal
 
J Biopharm Stat. Author manuscript; available in PMC 2018 January 1.
Published in final edited form as:
J Biopharm Stat. 2017; 27(1): 25–33.
Published online 2016 February 16. doi:  10.1080/10543406.2016.1148716
PMCID: PMC5635844
NIHMSID: NIHMS900356

Boundary problem in Simon’s two-stage clinical trial designs

Abstract

The activity of a new treatment in clinical trials with binary endpoints can be assessed by comparing the observed response rate to the target response rate. Traditionally, a one-sided hypothesis is used for statistical inference, and the actual type I error rate has to be computed over the parameter space of the null hypothesis. The monotonicity property is a fundamental property that guarantees that the actual type I error rate occurs at the boundary. One-arm two-stage designs are considered in this article. We theoretically prove this property when the final threshold value of a design is less than the first stage sample size and another weak condition is satisfied. The method used in this article may eventually lead to a complete proof of this property. We also verify numerically that the monotonicity property is satisfied for designs with first stage and second stage sample sizes from 10 to 100.

Keywords: Boundary, minimax design, monotonicity property, optimal design, Simon’s two-stage design

1 Introduction

The activity of a new treatment or therapy in Phase II clinical trials with dichotomized outcomes, such as Oncology studies, is often assessed by comparing the observed response rate to the target response rate [1, 2, 3]. It is often the case that the effectiveness of the treatment is determined when a high response rate is observed. Therefore, a one-sided hypothesis is traditionally used for statistical inference. Under such hypotheses, the actual type I error rate should be computed as the worst case scenario of the tail probability over the parameter space under the null hypothesis, and so is the type II error rate. However, existing multi-stage designs either test a simple hypothesis at two response rates, or assume that the type I error is obtained at the boundary of the hypothesis and check this assumption only for the obtained optimal design.

The monotonicity property is an important property for non-inferiority and superiority trials with discrete endpoints [4, 5, 6, 7, 8, 9, 11]. In the existing literature, discrete data, e.g., binomial and Poisson data, are often considered in these trials, and a test statistic is used to order the sample space. That is, the monotonicity property is associated with a test statistic. Röhmel and Mansmann [12] showed that the type I error is obtained at the boundary of the one-sided null space for comparing two independent binomial distributions. The optimal multi-stage design, in contrast, is usually found by an iterative search, and no test statistic is involved in this procedure. Thus, it is extremely difficult to prove the monotonicity property, which would ensure that the actual type I error rate and power occur on the boundary of the hypothesis. This fundamental property of monotonicity can be used to reduce the computational intensity of searching for the optimal design.

Simon’s two-stage designs are the most widely used multi-stage designs in early phase clinical trials with binary endpoints. Two types of optimal designs were proposed by Simon [1]: the optimal design with the smallest expected sample size under the boundary of the null hypothesis, and the minimax design with the smallest possible total sample size and the smallest expected sample size under the boundary of the null hypothesis. These two optimal two-stage designs are commonly used in studies such as oncology research [13], AIDS research [14] and gastroesophageal research [15]. In such studies, the activity of the new treatment will be rejected for a low response rate. Therefore, a one-sided hypothesis would be appropriate for statistical inference. Due to the computational burden, it is often assumed that the actual type I error rate is obtained on the boundary rather than at any other parameter value within the null hypothesis space.

We theoretically prove this important property when the final threshold value of the design is less than the first stage sample size and another weak condition is met. When the threshold value of the design is greater than or equal to the first stage sample size, we derive the detailed formula for the derivative, and illustrate the proof by using two specific Simon’s two-stage designs. In addition, we verify numerically that the type I error rate occurs at the boundary, using an extensive numerical study of all possible designs with first stage and second stage sample sizes from 10 to 100. The remainder of this article is organized as follows. Section 2 presents the monotonicity property of two-stage designs, and proves this property under certain conditions. Section 3 provides some remarks.

2 Methods

A new treatment or therapy in clinical studies will be rejected when the number of responses is small, and the associated hypotheses are presented as

$$H_0: p \le p_u,$$

against the alternative

$$H_a: p \ge p_a,$$

where pu and pa represent the unacceptable and acceptable response rates for the new treatment, respectively. The acceptable rate is always larger than the unacceptable rate, pa > pu. For example, in a clinical trial for urothelial cancer with neoadjuvant therapy [13], the unacceptable and acceptable response rates were pu = 35% and pa = 50%, respectively.

In the widely used Simon’s two-stage optimal designs, four design parameters, (r1, n1, r, n2), need to be determined for given type I (α) and type II (β) error rates, where r1 and r are the threshold values of the number of responses for the first stage and both stages combined, and n1 and n2 are the numbers of patients required in the first and second stages, respectively. It should be noted that these optimal designs are terminated in the first stage only for futility, when a low response rate is observed. That is, the trial will be terminated if the number of first stage responses is less than or equal to r1 out of n1 patients. When the number of responses from the first stage is larger than r1, an additional n2 patients will be enrolled, and the final decision will be made after comparing the total number of responses with r. The null hypothesis is rejected at the end of a study if at least r + 1 responses are observed in n = n1 + n2 patients. Suppose xi is the number of responses from the i-th stage, where i = 1, 2. For an optimal design with design parameters (r1, n1, r, n2), the rejection region is

Ω={(x1,x2):x1>r1,(x1+x2)>r},

and the associated tail probability at the probability p is given as

$$P(r_1,n_1,r,n_2|p)=\sum_{x_1=r_1+1}^{\min(n_1,r)}\text{dbinom}(x_1,n_1,p)\,[1-\text{pbinom}(r-x_1,n_2,p)]+\sum_{x_1=\min(n_1,r)+1}^{n_1}\text{dbinom}(x_1,n_1,p),$$

where dbinom and pbinom represent the probability and cumulative functions of a binomial distribution, respectively. It follows that the actual type I error rate is

$$\text{TIE}=\max_{p\in H_0}P(r_1,n_1,r,n_2|p).$$

Due to the complexity of the tail probability, it is not easy to prove the monotonicity property that the actual type I error is obtained on the boundary of the null hypothesis, pu. Therefore, one has to compute the tail probability over all {p: 0 < p ≤ pu}, to determine the maximum of these quantities as the actual type I error rate. If this step is added to the search for the optimal design, it would be very computationally intense. For this reason, existing articles assume that the actual type I error occurs on the boundary during the design search, and this assumption is only checked for the obtained design.
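The brute-force step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the grid size, and the choice to sum the binomial upper tail directly (instead of computing 1 − pbinom) are assumptions made for this sketch. It evaluates the tail probability on a grid over (0, pu] and reports the grid maximum together with the point where it occurs.

```python
from math import comb


def dbinom(x, n, p):
    """Binomial probability mass P(X = x) for X ~ Bin(n, p)."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)


def upper_tail(k, n, p):
    """P(X > k) for X ~ Bin(n, p), i.e. 1 - pbinom(k, n, p), summed directly."""
    return sum(dbinom(x, n, p) for x in range(max(k + 1, 0), n + 1))


def tail_prob(r1, n1, r, n2, p):
    """Probability of the rejection region: x1 > r1 and x1 + x2 > r."""
    return sum(dbinom(x1, n1, p) * upper_tail(r - x1, n2, p)
               for x1 in range(r1 + 1, n1 + 1))


def type_one_error_grid(r1, n1, r, n2, pu, grid=200):
    """Largest tail probability on a grid over (0, pu], and where it occurs."""
    points = [pu * i / grid for i in range(1, grid + 1)]
    values = [tail_prob(r1, n1, r, n2, p) for p in points]
    best = max(range(grid), key=values.__getitem__)
    return values[best], points[best]


if __name__ == "__main__":
    # Design quoted later in Example 2.1: (r1, r, n1, n) = (31, 35, 35, 40),
    # so n2 = 5, with pu = 0.85.
    tie, at = type_one_error_grid(31, 35, 35, 5, pu=0.85)
    print(f"grid maximum {tie:.4f} attained at p = {at:.4f}")
```

Running such a search inside every step of a design search multiplies the cost of each candidate design by the grid size, which is the computational burden the boundary assumption avoids.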

From a practical perspective, the trial should be stopped for futility or efficacy when no response or all responses are observed from the first stage. It is also reasonable to reject the null hypothesis if all patients from the second stage respond to the treatment. All these extreme cases were excluded in the proof of the monotonicity property of the tail probability because such designs are often not optimal, and their tail probability functions may not be monotonic, as seen in our numerical studies. Although these studies are not presented in this article, they are available on request from the authors. After excluding these extreme cases, we provide the following theorem to prove that the tail probability f(p) = P (r1, n1, r, n2|p) is monotonic when r < n1.

Theorem 2.1

The monotonicity property of two-stage designs is satisfied when the following two conditions are met:

$$r < n_1 \tag{1}$$

and

$$\frac{n_1}{n_1-r}\left(\frac{n_1(n_1-r_1-2)}{(2n_1-r)(n_1-r_1-1)}\right)^{n_1-r_1-2} \le n_2-r_2+1, \tag{2}$$

where r2 = r − r1. Specifically, the tail probability f(p) is a monotonic increasing function of p, 0 < p < 1, when Equations (1) and (2) are satisfied.

Proof

For simplicity, we use x to replace x1 in the tail probability f(p), which can be alternatively expressed as

$$f(p)=1-\text{pbinom}(r_1,n_1,p)-\sum_{x=r_1+1}^{\min(n_1,r)}\text{dbinom}(x,n_1,p)\,\text{pbinom}(r-x,n_2,p).$$

The derivative of the tail probability with regards to p is calculated as

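$$\frac{\partial f}{\partial p}=(n_1-r_1)\binom{n_1}{r_1}p^{r_1}(1-p)^{n_1-r_1-1}+p^{r}(1-p)^{n_1+n_2-r-1}\sum_{x=r_1+1}^{\min[n_1,r]}\binom{n_1}{x}\binom{n_2}{r-x}\left[n_2-r+x-p^{n_1-x-1}(n_1-2n_1p+px)\right].$$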

This derivative can then be written as

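$$\frac{\partial f}{\partial p}=p^{r_1}(1-p)^{n_1-r_1-1}\left[(n_1-r_1)\binom{n_1}{r_1}+p^{r-r_1}(1-p)^{n_2-r+r_1}\sum_{x=r_1+1}^{\min[n_1,r]}\binom{n_1}{x}\binom{n_2}{r-x}h(x|p)\right],$$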

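where r1 + 1 ≤ x ≤ min(r, n1) and

$$h(x|p)=n_2-r+x-p^{n_1-x-1}(n_1-2n_1p+px).$$

In the function h(x|p), the product term

$$p^{n_1-x-1}(n_1-2n_1p+px)=(2n_1-x)\left(\frac{n_1}{2n_1-x}-p\right)p^{n_1-x-1}$$

is a unimodal function of p, and its maximum is obtained at $p_x$, the stationary point of this product term. The derivative of the logarithm of the product term with respect to p is

$$\frac{\partial \log\left[p^{n_1-x-1}(n_1-2n_1p+px)\right]}{\partial p}=-\frac{n_1+n_1x+2n_1^2p+px^2-n_1^2-3n_1px}{p(n_1-2n_1p+px)},$$

and setting this derivative to zero gives

$$p_x=\frac{n_1^2-n_1-n_1x}{2n_1^2-3n_1x+x^2}.$$

Thus, h(x|p) achieves its minimum value at p = $p_x$ as

$$h_{\min}(x)=n_2-r+x-\frac{n_1}{n_1-x}\left(\frac{n_1(n_1-x-1)}{(2n_1-x)(n_1-x)}\right)^{n_1-x-1}.$$

Given r < n1 and r1 + 1 ≤ x ≤ min(r, n1) = r, it is easy to show that

$$h_{\min}(x)\ge n_2-r+r_1+1-\frac{n_1}{n_1-r}\left(\frac{n_1}{2n_1-r}\cdot\frac{n_1-r_1-2}{n_1-r_1-1}\right)^{n_1-r_1-2}.$$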

Equation (2) guarantees that this lower bound is non-negative, so that hmin(x) ≥ 0. It follows that h(x|p) ≥ hmin(x) ≥ 0 is satisfied for any x in the range of r1 + 1 ≤ x ≤ r. Therefore, f(p) is a monotonic increasing function of p, 0 < p < 1, when Equations (1) and (2) are satisfied. □
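For readers who want the intermediate step behind the stationary point, the calculation can be written out as below. The symbols g and a are introduced here purely for illustration and do not appear in the original proof; the sketch assumes that h(x|p) equals n2 − r + x minus the product term, as in the displayed formulas.

```latex
\begin{align*}
g(p) &= p^{a}\bigl(n_1 - (2n_1 - x)\,p\bigr), \qquad a = n_1 - x - 1,\\
g'(p) &= p^{a-1}\bigl[a\,n_1 - (a+1)(2n_1 - x)\,p\bigr],\\
p_x &= \frac{a\,n_1}{(a+1)(2n_1 - x)}
     = \frac{n_1(n_1 - x - 1)}{(n_1 - x)(2n_1 - x)},\\
g(p_x) &= p_x^{\,a}\,\frac{n_1}{a+1}
        = \frac{n_1}{n_1 - x}\,p_x^{\,n_1 - x - 1}.
\end{align*}
```

Subtracting g(p_x) from n2 − r + x then gives the expression for hmin(x) used in the proof.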

The condition in Equation (2) is a very weak condition, as it only excludes a very small percentage of all possible designs with r < n1. We conducted an extensive numerical study to calculate this proportion, with n1 and n2 from 10 to 100, 1 < r1 < n1, and r1 < r < n1. The proportion of possible designs that do not satisfy the condition in Equation (2) is less than 0.9%. These cases often have r values very close to n1, with the ratio r/n1 between 90% and 99%.
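A count of this kind can be sketched in a few lines of Python. The code below is an illustration under stated assumptions: it uses Equation (2) in the form (n1/(n1 − r)) · [n1(n1 − r1 − 2)/((2n1 − r)(n1 − r1 − 1))]^(n1 − r1 − 2) ≤ n2 − r2 + 1 with r2 = r − r1, the function names are hypothetical, and the demonstration range is deliberately small so it runs quickly; it is not a reproduction of the reported 0.9% figure.

```python
def satisfies_conditions(r1, n1, r, n2):
    """True when r < n1 (Equation 1) and the inequality in Equation (2) hold."""
    if not (r < n1 and r1 + 1 < n1):
        return False
    base = n1 * (n1 - r1 - 2) / ((2 * n1 - r) * (n1 - r1 - 1))
    # Python evaluates 0.0 ** 0 as 1.0, which covers the case r1 = n1 - 2.
    lhs = n1 / (n1 - r) * base ** (n1 - r1 - 2)
    return lhs <= n2 - (r - r1) + 1


def failing_proportion(lo, hi):
    """Share of designs with 1 < r1 < r < n1 that violate Equation (2)."""
    total = failed = 0
    for n1 in range(lo, hi + 1):
        for n2 in range(lo, hi + 1):
            for r1 in range(2, n1):
                for r in range(r1 + 1, n1):
                    total += 1
                    failed += not satisfies_conditions(r1, n1, r, n2)
    return failed / total


if __name__ == "__main__":
    # Small illustrative range; the study in the text uses 10 to 100.
    print(f"proportion failing Equation (2): {failing_proportion(10, 20):.4%}")
```

Calling `failing_proportion(10, 100)` would cover the range described in the text, at the cost of a much longer run time in pure Python.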

Remark 2.1

All the commonly used optimal and minimax Simon’s two-stage designs with r < n1 in Simon [1] meet the condition in Equation (2).

We present the following conjecture for the boundary problem in two-stage designs.

Conjecture 1

The tail probability f(p) is a monotonic increasing function of p for any sample sizes n1 and n2 in the first stage and the second stage when 0 < r1 < n1 and r1 < r < r1 + n2.

Monotonicity is a fundamental property in two-stage designs with a one-sided hypothesis, to guarantee the actual type I error rate and power at the boundary of the hypothesis. To the best of our knowledge, there is no existing research to prove this important property for the design.

We conducted an extensive numerical study with sample sizes n1 and n2 from 10 to 100, and the threshold values r1 and r as presented in the conjecture: 0 < r1 < n1 and r1 < r < r1 + n2. For each design parameter (r1, r, n1, n2), the derivative value ∂f(p)/∂p is calculated at 1000 uniformly distributed points p between 0 and 1. We found that all these derivative values over 0 to 1 are positive for all possible designs. The sample size range considered in this numerical study covers almost all designs in practice.
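A related check can be sketched in Python. Note that this sketch swaps in a different technique from the one described above: instead of evaluating the derivative, it compares exact values of f(p) at neighbouring grid points using rational arithmetic, which avoids floating-point sign errors where f is extremely flat. The function names and the grid size are assumptions for illustration.

```python
from fractions import Fraction
from math import comb


def tail_prob_exact(r1, n1, r, n2, p):
    """Exact P(x1 > r1 and x1 + x2 > r) for a rational response rate p."""
    q = 1 - p
    total = Fraction(0)
    for x1 in range(r1 + 1, n1 + 1):
        stage1 = comb(n1, x1) * p**x1 * q ** (n1 - x1)
        stage2 = sum(comb(n2, x2) * p**x2 * q ** (n2 - x2)
                     for x2 in range(max(r - x1 + 1, 0), n2 + 1))
        total += stage1 * stage2
    return total


def is_increasing_on_grid(r1, n1, r, n2, points=200):
    """True when f is strictly increasing across an equally spaced grid in (0, 1)."""
    values = [tail_prob_exact(r1, n1, r, n2, Fraction(i, points + 1))
              for i in range(1, points + 1)]
    return all(a < b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    # A handful of small designs satisfying 0 < r1 < n1 and r1 < r < r1 + n2.
    for n1 in (10, 11):
        for n2 in (10, 11):
            for r1 in (2, 5):
                for r in (r1 + 1, r1 + n2 - 1):
                    ok = is_increasing_on_grid(r1, n1, r, n2, points=50)
                    print((r1, n1, r, n2), "increasing on grid:", ok)
```

A grid check of this kind can only support the conjecture for the designs and points examined; it does not replace a proof.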

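In designs with r ≥ n1, the derivative ∂f(p)/∂p can be simplified as polynomials of p

$$\frac{\partial f}{\partial p}=p^{r_1}(1-p)^{n_1-r_1-1}\left[(n_1-r_1)\binom{n_1}{r_1}+p^{r-r_1-1}(1-p)^{n_2-r+r_1}A(p)\right], \tag{3}$$

where $A(p)=D(0)+D(1)p+\sum_{k=2}^{n_1-r_1-1}D(k)p^k+D(n_1-r_1)p^{n_1-r_1}$, and

$$D(0)=-n_1\binom{n_2}{r-n_1},$$

$$D(1)=\binom{n_2}{r-n_1}\left(n_1-\frac{n_1^2(n-r)}{r-n_1+1}\right)+\sum_{x=r_1+1}^{n_1}\binom{n_1}{x}\binom{n_2}{r-x}(n_2-r+x),$$

$$D(k)=\binom{n_1}{k-1}\binom{n_2}{r-n_1+k-1}\left(k+n_1-1-\frac{n_1(n_1-k+1)(n-k-r+1)}{k(k-n_1+r)}\right),\quad k=2,\ldots,n_1-r_1-1,$$

$$D(n_1-r_1)=\binom{n_2}{r-r_1-1}\binom{n_1}{r_1+1}(2n_1-r_1-1).$$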

The following two examples are used to demonstrate the proof of the monotonicity property of f(p) when r ≥ n1 by using the derivative from Equation (3).

Example 2.1

For the minimax two-stage design to attain 90% power at the significance level of 0.05, with pu = 0.85 and pa = 0.95, the design parameters are calculated as [1]:

(r1,r,n1,n)=(31,35,35,40).

In this case, r = n1.

Proof

In this setting with design parameters r1 = 31, n1 = 35, r = 35, n = 40, the derivative has the specific form as

$$\frac{\partial f}{\partial p}(31/35;35/40)=5p^{31}(1-p)^{3}\left[41888-B(p)(1-p)p^{3}\right]=5p^{31}(1-p)^{3}\times E(p),$$

where $B(p) = -497420p^4 + 414120p^3 + 40390p^2 - 28673p + 7$, and $E(p) = 41888 - B(p)\cdot(1-p)\cdot p^3$.

It is easy to show that $(1-p)p^3$ reaches its maximum value of $3^3/4^4$ at p = 3/4 and it is increasing from 0 to 3/4 and decreasing from 3/4 to 1.

  1. For p between 0 and 3/4, we bound each component of B(p) from above: $-497420p^4 \le -497420 \times 0^4 = 0$, $414120p^3 \le 414120 \times (3/4)^3$, $40390p^2 \le 40390 \times (3/4)^2$, and $-28673p \le 0$. It follows that
    $$B(p) \le 414120\times(3/4)^3+40390\times(3/4)^2+7=1.9743\times10^5.$$
    The maximum of B(p) is less than or equal to the sum of the maxima of its components. This approach is used in this example as well as the next example to find the upper bound of a quantity.
  2. For p between 3/4 and 1, it is easy to show that
    $$B(p) \le -497420(3/4)^4+414120+40390-28673(3/4)+7=2.7563\times10^5.$$
    Thus $\max_{0<p<1} B(p) \le \max(1.9743 \times 10^5, 2.7563 \times 10^5) = 2.7563 \times 10^5$.
    It follows that $E(p) = 41888 - B(p)\cdot(1-p)\cdot p^3 \ge 41888 - 3^3/4^4 \times 2.7563 \times 10^5 \approx 1.2818 \times 10^4 > 0$.

Therefore, the derivative of f(p) is always positive, and f(p) is an increasing function of p. □

Plots of f(p) and E(p) as a function of p are presented on the left side and the right side of Figure 1. An increasing trend is observed for f(p). It can be seen that E(p) is always positive, and its minimum is achieved at p between 0.6 and 0.8. The value of $5p^{31}(1-p)^3$ is always positive for any p, 0 < p < 1. Therefore, the derivative ∂f(p)/∂p is always positive. We proved the monotonicity property of a design with r = n1 in this example, and the next example, for a design with r > n1, is also provided to illustrate the proof.

Figure 1
Tail probability f(p) and E(p) from the derivative are plotted as a function of p for the minimax design in Example 2.1 with design parameters (r1, r, n1, n) = (31, 35, 35, 40).
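The behaviour of E(p) described for Example 2.1 can also be inspected numerically. The following Python sketch evaluates the polynomial B(p) and the function E(p) exactly as stated in the example on an equally spaced grid; the helper name `grid_minimum` and the grid size are assumptions made for illustration.

```python
def B(p):
    """Polynomial B(p) from Example 2.1."""
    return -497420 * p**4 + 414120 * p**3 + 40390 * p**2 - 28673 * p + 7


def E(p):
    """E(p) = 41888 - B(p) * (1 - p) * p^3 from Example 2.1."""
    return 41888 - B(p) * (1 - p) * p**3


def grid_minimum(func, points=999):
    """Smallest value of func on an equally spaced grid in (0, 1), and its location."""
    grid = [i / (points + 1) for i in range(1, points + 1)]
    p_min = min(grid, key=func)
    return func(p_min), p_min


if __name__ == "__main__":
    value, p_min = grid_minimum(E)
    print(f"smallest E(p) on the grid: {value:.1f} at p = {p_min:.3f}")
```

A grid evaluation like this complements the termwise bound in the proof: the bound is what establishes positivity for every p, while the grid shows where E(p) comes closest to zero.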

Example 2.2

For the optimal two-stage design to attain 80% power at the significance level of 0.05, with pu = 0.3 and pa = 0.5, the design parameters are calculated as [1]:

(r1,r,n1,n)=(5,18,15,46).

Proof

After a lengthy algebraic calculation, the derivative is obtained as

$$\frac{\partial f}{\partial p}(5/15;18/46)=p^{5}(1-p)^{9}\left[30030-4495\,p^{12}(1-p)^{18}\,C(p)\right],$$

where $C(p) = -3771167400p^{10} - 430990560p^{9} + 421370235p^{8} + 481016250p^{7} + 231306075p^{6} + 63882000p^{5} + 10636353p^{4} + 1050462p^{3} + 57855p^{2} - 7379398237p + 15$.

It is easy to show that $(1-p)^{18}p^{12}$ reaches its maximum value of $2^{12}3^{18}/5^{30} \approx 1.704 \times 10^{-9}$ at p = 2/5, and it is increasing from 0 to 2/5 and decreasing from 2/5 to 1, i.e., $(1-p)^{18}p^{12} \le 2^{12}3^{18}/5^{30}$ for any p.

  1. For any p : 0 ≤ p ≤ 1/3,
    $C(p) \le 421370235/3^8 + 481016250/3^7 + 231306075/3^6 + 63882000/3^5 + 10636353/3^4 + 1050462/3^3 + 57855/3^2 + 15 = 1041010.3992$,
    Thus $4495\,p^{12}(1-p)^{18}C(p) - 30030 \le 4495 \times 2^{12}3^{18}/5^{30} \times 1041010.3992 - 30030 \approx -30022.0 < 0$. It follows that $\frac{\partial f}{\partial p}(5/15;18/46) > 0$.
  2. For any p : 1/3 ≤ p ≤ 1,
    $C(p) \le -3771167400/3^{10} - 430990560/3^9 + 421370235 + 481016250 + 231306075 + 63882000 + 10636353 + 1050462 + 57855 - 7379398237/3 + 15 = -1.2506 \times 10^9 < 0$,
    It follows that $4495\,p^{12}(1-p)^{18}C(p) - 30030 < 0$ and $\frac{\partial f}{\partial p}(5/15;18/46) > 0$.

From the results in cases 1 and 2, the derivative of f(p) is always positive for any p between 0 and 1; therefore, f(p) is an increasing function of p. □
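The arithmetic in the two cases of Example 2.2 can be re-checked with a short Python sketch. The code takes the coefficients of C(p) as printed in the example; the function names are hypothetical, and the grid evaluation of the bracketed factor is an added sanity check rather than part of the original argument.

```python
COEFFS = [  # coefficients of C(p), from the p**10 term down to the constant
    -3771167400, -430990560, 421370235, 481016250, 231306075,
    63882000, 10636353, 1050462, 57855, -7379398237, 15,
]


def C(p):
    """Evaluate C(p) by Horner's rule."""
    value = 0.0
    for c in COEFFS:
        value = value * p + c
    return value


def bracket(p):
    """Bracketed factor of the derivative: 30030 - 4495 p^12 (1-p)^18 C(p)."""
    return 30030 - 4495 * p**12 * (1 - p) ** 18 * C(p)


def case_bounds():
    """Termwise upper bounds on C(p) for p <= 1/3 and for p >= 1/3."""
    powers = range(10, -1, -1)
    # p <= 1/3: drop negative terms, evaluate positive terms at p = 1/3.
    low = sum(c / 3**k for c, k in zip(COEFFS, powers) if c > 0)
    # p >= 1/3: negative terms at p = 1/3, positive terms at p = 1.
    high = sum(c / 3**k if c < 0 else c for c, k in zip(COEFFS, powers))
    return low, high


if __name__ == "__main__":
    low, high = case_bounds()
    print(f"bound for p <= 1/3: {low:.4f}")
    print(f"bound for p >= 1/3: {high:.4e}")
    print("smallest bracket on grid:",
          min(bracket(i / 1000) for i in range(1, 1000)))
```

The two bounds correspond to the sums displayed in cases 1 and 2, so a reader can confirm the quoted constants without redoing the long additions by hand.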

Although we are not able to prove this conjecture completely, these examples may motivate other researchers to prove the conjecture by following an approach similar to the one presented here. The minimum of the derivative may be computed over mutually exclusive sets of p; for example, the mutually exclusive sets (0,3/4) and (3/4,1) were used in the first example.

One of the reviewers suggested rewriting the tail probability P (r1, n1, r, n2|p) as another function of p, and provided the proof of the following theorem that P (r1, n1, r, n2|p) is always non-decreasing when p ≤ (r + 1)/n, which is included in the null space of all p ≤ pa.

Theorem 2.2

The monotonicity property of two-stage designs is satisfied when p ≤ (r + 1)/n.

Proof

P (r1, n1, r, n2|p) can be rewritten as

$$\sum_{x_1=r_1+1}^{\min(n_1,r)}\text{dbinom}(x_1,n_1,p)\,[1-\text{pbinom}(r-x_1,n_2,p)]+\sum_{x_1=\min(n_1,r)+1}^{n_1}\text{dbinom}(x_1,n_1,p),$$

By ordering this expression according to terms $p^x(1-p)^{n-x}$, this expression can be simplified to yield

$$P(r_1,n_1,r,n_2|p)=\sum_{k=r+1}^{n}p^{k}(1-p)^{n-k}H(n_1,n_2,r_1,k),$$

where $H(n_1,n_2,r_1,k)=\sum_{x_1=r_1+1}^{k}\binom{n_1}{x_1}\binom{n_2}{k-x_1}$ when $n_1 \le r$, and $H(n_1,n_2,r_1,k)=\binom{n_1}{k}+\sum_{x_1=r_1+1}^{r}\binom{n_1}{x_1}\binom{n_2}{k-x_1}$ when $n_1 > r$.

It is easy to show that the coefficients H(n1, n2, r1, k) are non-negative integers and are independent of the parameter p. The derivative of terms $p^k(1-p)^{n-k}$ with respect to p is

$$p^{k-1}(1-p)^{n-k-1}(k-np).$$

When all these derivatives are non-negative, P (r1, n1, r, n2|p) is a monotonic function of p, that is,

$$p \le k/n,$$

for all k from r + 1 to n. Then, we can conclude that the derivative of P (r1, n1, r, n2|p) is nonnegative for all p with p ≤ (r + 1)/n. □
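The regrouping by total number of responses can be checked numerically. The Python sketch below is an illustration under an explicit assumption: it computes the coefficient for each total k by direct enumeration of the rejection region (summing over all first stage counts x1 > r1 with x1 ≤ min(n1, k)), rather than by the case-split closed form given above. The function names are hypothetical.

```python
from math import comb


def H(n1, n2, r1, k):
    """Count of outcomes with total k and x1 > r1, weighted by binomial coefficients."""
    # comb(n2, k - x1) is 0 whenever k - x1 exceeds n2, so no extra guard is needed.
    return sum(comb(n1, x1) * comb(n2, k - x1)
               for x1 in range(r1 + 1, min(n1, k) + 1))


def tail_from_H(r1, n1, r, n2, p):
    """Tail probability written as a sum over totals k = r + 1, ..., n."""
    n = n1 + n2
    return sum(H(n1, n2, r1, k) * p**k * (1 - p) ** (n - k)
               for k in range(r + 1, n + 1))


def tail_direct(r1, n1, r, n2, p):
    """Tail probability summed directly over (x1, x2) in the rejection region."""
    total = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        w1 = comb(n1, x1) * p**x1 * (1 - p) ** (n1 - x1)
        for x2 in range(max(r - x1 + 1, 0), n2 + 1):
            total += w1 * comb(n2, x2) * p**x2 * (1 - p) ** (n2 - x2)
    return total


if __name__ == "__main__":
    for design in [(31, 35, 35, 5), (5, 15, 18, 31)]:
        for p in (0.2, 0.5, 0.8):
            gap = abs(tail_from_H(*design, p) - tail_direct(*design, p))
            print(design, p, f"difference {gap:.2e}")
```

Because every coefficient is a non-negative integer that does not depend on p, the sign of the derivative of each term is governed by the factor (k − np) alone, which is what the theorem exploits.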

The overall goal is to prove that the monotonicity property of two-stage designs is satisfied when 0 < p < 1 or equivalently 0 < p ≤ pa. It is observed from all optimal designs provided in Simon [1] that (r + 1)/n is generally less than pa. Therefore, the suggestion from the reviewer proves the monotonicity of a two-stage design over part of the parameter space, not the complete null space. One reason could be the strong requirement that every term's derivative be non-negative in order to guarantee a non-negative overall derivative. In order to prove this property completely, we believe the coefficient H(n1, n2, r1, k) should be utilized in the proof procedure, as we demonstrated in the two specific examples, Example 2.1 and Example 2.2, when r ≥ n1.

3 Discussion

The monotonicity property is of fundamental importance in multi-stage designs with one-sided hypotheses to guarantee that actual type I and II error rates occur on the boundary. In each search step, one does not need to find the actual type I error over the null hypothesis if this property is met. This property needs to be proved in order to make sure the search is appropriate. We proved this property for two-stage designs when r < n1 and another weak condition is met in the theorem. In addition, we numerically proved this property with sample size from 10 to 100 in both stages, which covers most commonly used designs. We were not able to prove this property when r ≥ n1, but we prove two specific examples, and we hope the method used in these examples can finally lead to the complete proof of this property in the future.

Modifications of traditional two-stage designs also assume that the monotonicity property is met, e.g., adaptive two-stage designs [16]. It is well known that in Simon’s two-stage designs, the sample size in the second stage, n2, is fixed when the number of responses in the first stage is above the pre-determined threshold for the first stage, r1. Several adaptive designs have been proposed to allow the second stage sample size to depend on the results from the first stage. For example, Banerjee and Tsiatis [16] derived an optimal adaptive two-stage design by using a Bayesian decision-theoretic construct. Recently, Englert and Kieser [17] applied an efficient integer programming method, the branch-and-bound algorithm [18, 10], to search for the optimal adaptive two-stage designs without restrictions. We consider the proof of the monotonicity property for these adaptive designs as future work.

Acknowledgments

The authors are very grateful to the Associate Editor and two reviewers for their insightful comments that helped improve the manuscript. Shan’s research is partially supported by a grant from the National Institute of General Medical Sciences 5U54GM104944 from the National Institutes of Health. Chen’s research is partially supported by National Institutes of Health grants: U54MD007584, G12MD007601, P20GM103466, and U54GM104944.

References

1. Simon R. Optimal two-stage designs for phase II clinical trials. Controlled Clinical Trials. 1989 Mar;10(1):1–10.
2. Kepner JL. On group sequential designs comparing two binomial proportions. Journal of Biopharmaceutical Statistics. 2010 Jan;20(1):145–159.
3. Shan G, Hutson AD, Wilding GE. Two-stage k-sample designs for the ordered alternative problem. Pharmaceut Statist. 2012;11(4):287–294.
4. Barnard GA. A new test for 2 × 2 tables. Nature. 1945;156:177.
5. Barnard GA. Significance tests for 2 × 2 tables. Biometrika. 1947;34(1/2):123–138.
6. Berger RL, Sidik K. Exact unconditional tests for a 2 × 2 matched-pairs design. Statistical Methods in Medical Research. 2003 Mar;12(2):91–108.
7. Röhmel J. Problems with existing procedures to calculate exact unconditional P-values for non-inferiority/superiority and confidence intervals for two binomials and how to resolve them. Biometrical Journal. 2005 Feb;47(1):37–47.
8. Shan G. Exact approaches for testing non-inferiority or superiority of two incidence rates. Statistics & Probability Letters. 2014 Feb;85:129–134.
9. Shan G, Ma C. Unconditional tests for comparing two ordered multinomials. Statistical Methods in Medical Research. 2012 In press.
10. Shan G, Wilding GE, Hutson AD, Gerstenberger S. Optimal adaptive two-stage designs for early Phase II clinical trials. Statistics in Medicine. 2015 In press.
11. Liu M, Hsueh HM. Exact tests of the superiority under the Poisson distribution. Statistics & Probability Letters. 2013 May;83(5):1339–1345.
12. Röhmel J, Mansmann U. Unconditional Non-Asymptotic One-Sided Tests for Independent Binomial Proportions When the Interest Lies in Showing Non-Inferiority and/or Superiority. Biom J. 1999 May;41(2):149–170.
13. Siefker-Radtke AO, Dinney CP, Shen Y, Williams DL, Kamat AM, Grossman HB, Millikan RE. A phase 2 clinical trial of sequential neoadjuvant chemotherapy with ifosfamide, doxorubicin, and gemcitabine followed by cisplatin, gemcitabine, and ifosfamide in locally advanced urothelial cancer. Cancer. 2013 Feb;119(3):540–547.
14. Zheng L, Rosenkranz SL, Taiwo B, Para MF, Eron JJ, Hughes MD. The Design of Single-Arm Clinical Trials of Combination Antiretroviral Regimens for Treatment-Naive HIV-Infected Patients. AIDS Research and Human Retroviruses. 2012 Dec;29(4):652–657.
15. Katz PO, Gerson LB, Vela MF. Guidelines for the diagnosis and management of gastroesophageal reflux disease. The American Journal of Gastroenterology. 2013 Mar;108(3)
16. Banerjee A, Tsiatis AA. Adaptive two-stage designs in phase II clinical trials. Statistics in Medicine. 2006 Oct;25(19):3382–3395.
17. Englert S, Kieser M. Optimal adaptive two-stage designs for phase II cancer clinical trials. Biometrical Journal. 2013 Nov;55(6):955–968.
18. Wolsey LA. Integer Programming. 1. Wiley-Interscience; 1998.