In terms of estimation,

and

should be recommended as they perform better than the other estimators, in particular when the true response rate is higher than the one under H0, i.e. in cases when estimation is the most important. Although our simulations did not encompass all possible ranges of response rates and treatment effects, they cover a wide range of plausible situations, in which no clear advantage of the bias-corrected estimator over the UMVUE could be found.
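To make the notion of unbiasedness concrete, the following minimal Python sketch computes a UMVUE for a two-stage design with futility stopping and checks its expectation by exact enumeration. It assumes the usual Rao-Blackwell construction (the conditional expectation of the first patient's response given the stopping stage and the total number of responses), which may differ in notation from the definition used earlier in the paper. The function names and the design parameters (n1 = 10, r1 = 1, n2 = 19) are illustrative assumptions, not taken from the simulations.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def umvue(stage, s, n1, r1, n2):
    """UMVUE of the response rate (assumed Rao-Blackwell form).

    stage: 1 if the trial stopped after stage 1 (X1 <= r1), 2 otherwise.
    s: total number of responses observed when the trial ended.
    """
    if stage == 1:
        return s / n1
    lo = max(r1 + 1, s - n2)  # smallest admissible first-stage count
    hi = min(n1, s)           # largest admissible first-stage count
    num = sum(comb(n1 - 1, x1 - 1) * comb(n2, s - x1) for x1 in range(lo, hi + 1))
    den = sum(comb(n1, x1) * comb(n2, s - x1) for x1 in range(lo, hi + 1))
    return num / den


def mle(stage, s, n1, r1, n2):
    """Naive estimator: observed proportion of responders."""
    return s / n1 if stage == 1 else s / (n1 + n2)


def expected_value(estimator, p, n1, r1, n2):
    """Exact unconditional expectation of an estimator under response rate p."""
    total = 0.0
    for x1 in range(n1 + 1):
        p1 = binom_pmf(x1, n1, p)
        if x1 <= r1:  # trial stops for futility after stage 1
            total += p1 * estimator(1, x1, n1, r1, n2)
        else:         # trial continues to stage 2
            for x2 in range(n2 + 1):
                total += p1 * binom_pmf(x2, n2, p) * estimator(2, x1 + x2, n1, r1, n2)
    return total


# Hypothetical design used only for illustration.
for p in (0.1, 0.3, 0.5):
    print(p, expected_value(mle, p, 10, 1, 19), expected_value(umvue, p, 10, 1, 19))
```

Under these assumptions the enumeration returns the true response rate for the UMVUE, up to floating-point error, while the naive proportion falls below it. Enumeration only addresses bias; comparing RMSE would require the second moment as well.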
The choice between conditional and unconditional inference is clearly overlooked in practical applications. Conditional inference, and conditional bias in particular, has attracted some interest in the setting of group sequential phase III trials, where concerns were mostly directed at the conditional bias of the estimator of the treatment effect when trials were stopped early for efficacy [30, 31]. In the setting of Simon’s two-stage phase II trials, conditional inference would rather be favored when the trial did not stop at the first stage, especially if the trial was deemed successful at the end [13]. To our knowledge, however, such aspects of conditional inference have rarely been discussed [13, 32]. Results show that unbiased or almost unbiased estimation can be performed using the UMVCUE [13] or the proper conditional distribution [12], respectively, both with very similar RMSE. In addition, both performed well even when the sample size at the second stage was slightly different from its planned value. To construct an estimator that would be both conditionally and unconditionally unbiased, one could also derive an estimator for trials stopping at the first stage that would use the conditional distribution given X1 ≤ r1. In such a case, the estimator would be conditionally unbiased whether the trial was stopped at the first or the second stage, and thus would be unconditionally unbiased. Using a distribution of outcomes conditional on early stopping makes little sense, if any, when r1 is small. For instance, if r1 = 0, then the only potential outcome in case of early stopping is X1 = 0, which leads to a single possible value for the estimator of Π. It is therefore not possible to construct an estimator that is unbiased for every value of Π in this case. We therefore did not develop this point further in the paper. Another solution, however, would be to use a bias-corrected estimator such as Whitehead’s [19] or Guo’s [7] when the trial was stopped early. This has already been mentioned by Pepe et al. [13], without further investigation.
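The argument about small r1 can be checked numerically. The sketch below is a minimal illustration, with hypothetical function names and design values, of the distribution of X1 conditional on early stopping and of the conditional expectation of an arbitrary estimator defined on the early-stopping outcomes. It assumes a response rate strictly between 0 and 1.

```python
from math import comb


def early_stop_pmf(n1, r1, p):
    """Distribution of X1 given early stopping (X1 <= r1), for 0 < p < 1."""
    weights = [comb(n1, x) * p**x * (1 - p) ** (n1 - x) for x in range(r1 + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def conditional_mean(values, n1, r1, p):
    """E[estimator | X1 <= r1], where values[x] is the estimate reported when X1 = x."""
    pmf = early_stop_pmf(n1, r1, p)
    return sum(v * q for v, q in zip(values, pmf))


# With r1 = 0 the only early-stopping outcome is X1 = 0, so whatever single
# value the estimator reports (0.07 here is arbitrary), its conditional
# expectation does not depend on the true response rate.
for p in (0.1, 0.2, 0.4):
    print(p, conditional_mean([0.07], n1=10, r1=0, p=p))
```

Because the conditional expectation is the same constant for every response rate, it can match at most one value of Π, which is the point made above. With a larger r1 the list of reported values has r1 + 1 free entries, and the same function can be used to explore how close to conditional unbiasedness one can get.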
In this study, we have concentrated on Simon’s design for phase II cancer trials. Other designs or adaptations exist, however. In particular, Jovic and Whitehead have recently proposed point estimates, confidence intervals and p-values for a modified Simon’s design with early stopping for efficacy [33]. Other extensions of Simon’s design could also have been considered [5, 34]. In cases where early stopping for efficacy is possible, the methods proposed by Jovic and Whitehead could have been used. Tsai et al. also applied their conditional method to Shuster’s design [34]. Nevertheless, a brief look at the cancer literature shows that a majority of cancer phase II trials still use Simon’s design.
In practical applications, it may occur that the actual number of patients recruited differs slightly from the preplanned value. For instance, some patients may be unevaluable for response, or they may withdraw their consent during the study. Conversely, some patients may be included in the study before recruitment is formally closed. For these cases, where the decrease or increase of the second stage sample size may be considered as non informative, Koyama and Chen proposed inference procedures based on conditional power [11]. They clearly state in their article that the properties of their estimators, p-values and confidence intervals need to be further studied. In our numerical settings, it turned out that the UMVUE, which can still be used because it only makes use of boundary decisions at the second stage, performed better than the Koyama–Chen method. The behaviour of both estimators with modified sample size, however, deserves further investigation. Concerning confidence intervals, the mid-p intervals performed better than the so-called exact confidence intervals in most settings for both unconditional and conditional inference. Koyama and Chen, however, did not consider such an approach, and their confidence intervals rely on the Clopper–Pearson method. Using a mid-p approach with their modified p-value (equation 11) may also have improved the coverage probabilities of the confidence intervals.
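The difference between the two interval constructions is easiest to see in the simplest case. The sketch below is only an illustration for a single-stage binomial sample, not the two-stage sampling distribution nor the Koyama–Chen p-value of equation 11. It obtains Clopper–Pearson and mid-p limits by bisection on the tail probabilities, and all function names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(x, n, p, weight):
    """P(X > x) + weight * P(X = x); increasing in p."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1, n + 1)) + weight * binom_pmf(x, n, p)


def lower_tail(x, n, p, weight):
    """P(X < x) + weight * P(X = x); decreasing in p."""
    return sum(binom_pmf(k, n, p) for k in range(x)) + weight * binom_pmf(x, n, p)


def solve_increasing(f, target, tol=1e-10):
    """Bisection on [0, 1] for an increasing function f, solving f(p) = target."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def binomial_ci(x, n, alpha=0.05, mid_p=False):
    """Two-sided interval: Clopper-Pearson (weight 1) or mid-p (weight 1/2)."""
    w = 0.5 if mid_p else 1.0
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: upper_tail(x, n, p, w), alpha / 2)
    upper = 1.0 if x == n else solve_increasing(
        lambda p: -lower_tail(x, n, p, w), -alpha / 2)
    return lower, upper


# Illustrative data: 7 responses among 29 patients.
print(binomial_ci(7, 29))              # Clopper-Pearson
print(binomial_ci(7, 29, mid_p=True))  # mid-p
```

The only change between the two methods is the weight given to the probability of the observed outcome, which makes the mid-p interval lie inside the Clopper–Pearson interval. The same half-weighting could in principle be applied to any p-value function that is monotone in the response rate.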
Another interesting field of further research concerns inference in adaptive phase II trials, where the second stage sample size can be adapted according to the first stage results [16, 17]. In such cases, the decrease or increase in sample size cannot be considered as non informative anymore, and the method of Koyama and Chen does not apply. New developments are thus needed here.