
J Biopharm Stat. Author manuscript; available in PMC 2018 January 1.

Published in final edited form as:

J Biopharm Stat. 2017; 27(1): 1–24.

Published online 2016 February 18. doi: 10.1080/10543406.2016.1148710. PMCID: PMC4990829

NIHMSID: NIHMS768439

Toshimitsu Ochiai,^{1,}^{2} Toshimitsu Hamasaki,^{2,}^{3,}^{4,}^{*} Scott R. Evans,^{5} Koko Asakura,^{3,}^{4} and Yuko Ohno^{2}

We discuss group-sequential three-arm noninferiority clinical trial designs that include active and placebo controls for evaluating both assay sensitivity and noninferiority. We extend two existing approaches, the fixed margin and fraction approaches, into a group-sequential setting with two decision-making frameworks. We investigate the operating characteristics, including power, Type I error rate, and maximum and expected sample sizes, as design factors vary. In addition, we discuss sample size recalculation and its impact on the power and Type I error rate via a simulation study.

Active-controlled noninferiority trial designs are an alternative to placebo-controlled superiority designs when use of a placebo control is ethically undesirable due to the availability of a proven effective medical intervention. Active-controlled noninferiority trial designs include an existing effective intervention such as an effective standard of care. In contrast to superiority trials where there is interest in evaluating if an intervention is superior to a control (e.g., placebo), noninferiority trials evaluate if an intervention is noninferior to the control. In a noninferiority trial, the null hypothesis of inferiority is assumed to be true unless there is sufficient data to reject it in favor of the alternative (noninferiority). Noninferiority is assessed by evaluating whether inferiority of a pre-specified magnitude (called a noninferiority margin) can be ruled out with reasonable confidence using confidence intervals. The noninferiority margin is carefully selected to ensure that a noninferiority result would: (1) imply retention of some of the effect that the active control has historically displayed (i.e., when compared to placebo), and (2) rule out clinically important levels of inferiority so that clinical application would be ethical and clinically acceptable.

For example, EMERALD 1 (conducted in the United States) and EMERALD 2 (conducted in Europe) are randomized, controlled, open-label, noninferiority clinical trials to evaluate the efficacy and safety of peginesatide as the maintenance treatment of anemia in patients with chronic renal failure who were receiving hemodialysis and previously treated with epoetin (Fishbane et al., 2013). Both trials included a 6-week screening period, a 28-week initial dose-adjustment period, an 8-week evaluation period, and a longer-term follow-up period (≥16 additional weeks). Eligible participants were randomly assigned, in a 2:1 ratio, to receive peginesatide once every 4 weeks or to continue to receive epoetin (epoetin alfa in EMERALD 1, and epoetin beta in EMERALD 2) one to three times a week. The frequency and route of administration of epoetin were determined based on the treatment regimen during the screening period. The primary efficacy endpoint was the change from the baseline hemoglobin level during the evaluation period. Noninferiority for both trials would be established if the lower limit of the two-sided 95% confidence interval was −1.0 g per deciliter or higher, indicating that inferiority of more than 1.0 g per deciliter, compared to epoetin, could be ruled out with reasonable confidence.

For noninferiority clinical trials to be valid, two assumptions (constancy and assay sensitivity) must be satisfied (D’Agostino, Massaro, and Sullivan, 2003; ICH, 2000; FDA, 2010). An active intervention that has been shown to be efficacious (e.g., superior to placebo) in a historical trial may be considered as the active control in a noninferiority trial, but the most effective should be selected. The constancy assumption states that the demonstrated effect of the active control over placebo in the historical trial has not changed over time, i.e., would be the same as the effect in the current trial if a placebo group were included. This may not be the case if there were differences in trial conduct (e.g., differences in treatment administration, endpoints, or population) between the historical and current trials. This assumption is not testable in a trial without a concurrent placebo group.

Another important design assumption is assay sensitivity, i.e., the ability of the trial to detect differences between strategies if they truly exist. Otherwise noninferiority may be concluded simply due to insensitivity of the trial to detect differences. In noninferiority trials, assay sensitivity can be reduced (intentionally or unintentionally), essentially making strategies appear similar, by diluting effects through subtle choices about design and conduct. Many factors can affect assay sensitivity including: poor disease diagnosis, endpoint selection and timing, poor adherence, loss-to-follow-up, prior therapy, inclusion of subgroups where treatment effects may be small, and use of concomitant therapies. Furthermore, the active-control nature of most noninferiority trials can make clinicians and participants more likely to rate outcomes positively, driving the results toward noninferiority.

The methodologies for two-arm (an experimental intervention and an effective active control) noninferiority clinical trials have been well-established. However, two-arm noninferiority trials often lack the necessary support for the assay sensitivity and constancy assumptions. As a result, inclusion of a third arm (placebo) into the trial has been proposed to address these concerns (Pigeot et al., 2003; Koch and Röhmel, 2004; Hauschke and Pigeot, 2005a). Regulatory authorities often recommend the use of such a three-arm (experimental intervention, active control, and placebo) noninferiority trial design (ICH, 2000; CHMP, 2005; FDA, 2010). The three-arm noninferiority trial offers several scientific advantages (ICH, 2000). Specifically, these designs provide the opportunity to establish the validity of the assay sensitivity via a comparison of the placebo to the active control intervention within the trial. Although the three-arm noninferiority design provides such scientific advantages, it also presents challenges: (1) there may be ethical constraints to using a placebo, and (2) there is the added complexity of evaluating two distinct objectives: evaluation of (i) the superiority of the active control intervention to placebo (assay sensitivity) and (ii) the noninferiority of the experimental intervention to the active control intervention (noninferiority). This may result in a sample size that is too large and impractical to conduct. One approach to address this concern is the use of group-sequential designs. The group-sequential design offers the possibility to stop a trial early when evidence is overwhelming and thus offers efficiency (i.e., potentially fewer trial participants and minimizing the amount of time that participants receive a placebo, compared to fixed-sample designs).

In this paper, we discuss group-sequential designs for three-arm noninferiority clinical trials. We extend two existing approaches for evaluating noninferiority and assay sensitivity into a group-sequential setting. One approach is discussed by Koch and Röhmel (2004), and Hida and Tango (2011a, 2013) (hereafter we call this the “fixed margin approach”), and the other is the so-called “fraction approach” proposed by Pigeot et al. (2003). We consider a three-arm noninferiority trial that has two co-primary objectives: (i) to evaluate if the control intervention is superior to placebo (assay sensitivity: AS) and (ii) to evaluate if the experimental intervention is not less effective than the control intervention by a prespecified noninferiority margin (noninferiority: NI). Objective (ii) is relevant when the experimental intervention has advantages over the control (e.g., safer, more convenient, or less costly). On the other hand, in many noninferiority clinical trials, especially in a regulatory setting, demonstrating the superiority of the experimental intervention to placebo is desirable. However, as Gao and Ware (2008) discuss, if the assay sensitivity assumption does not hold, then there will be uncertainty regarding whether a noninferiority result means that the interventions are similarly effective or similarly ineffective. In this paper, when there is a concern about the assay sensitivity, to make the evaluation of objective (ii) more interpretable, we evaluate a direct comparison of the control intervention to the placebo. For related discussions, please see Hauschke and Pigeot (2005a, 2005b) and Stucke and Kieser (2012).

Three-arm noninferiority clinical trials in a group-sequential setting have been discussed (Li and Gao, 2010; Schlömer and Brannath, 2013), but methodologies are still needed. Extensions of the fraction approach are discussed by Li and Gao (2010) and the fixed margin approach by Schlömer and Brannath (2013), in a setting of two-stage group-sequential three-arm noninferiority clinical trials with continuous or binary outcomes. We discuss two decision-making frameworks for the two approaches when the primary endpoint is continuous. We also discuss a method for sample size recalculation based on the observed effect size at an interim timepoint of the trial.

This paper is structured as follows: in Section 2, we describe the statistical settings and provide methods for the overall power for rejecting the null hypotheses for assay sensitivity and noninferiority when using the two decision-making frameworks for the fixed margin and the fraction approaches in a group-sequential setting. Then, in Section 3, we evaluate the operating characteristics, including power, Type I error rate, and sample sizes, as design factors vary. In Section 4, we discuss sample size recalculation and consider its impact on the power and Type I error rate via a simulation study. In Section 5, we discuss a further extension, and we summarize the findings in Section 6.

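Consider a three-arm noninferiority group-sequential clinical trial with a maximum of *K* planned analyses (*K* ≥ 2). Let $n_{\mathrm{E}k}$, $n_{\mathrm{R}k}$ and $n_{\mathrm{P}k}$ be the cumulative numbers of participants in the experimental intervention (E), active control (R) and placebo (P) groups at the *k*th analysis (*k* = 1, …, *K*), with allocation ratios $C_{\mathrm{R}} = n_{\mathrm{R}k}/n_{\mathrm{E}k}$ and $C_{\mathrm{P}} = n_{\mathrm{P}k}/n_{\mathrm{E}k}$.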

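Assume that the group outcomes $X_{\mathrm{E}i_{\mathrm{E}}}$, $X_{\mathrm{R}i_{\mathrm{R}}}$ and $X_{\mathrm{P}i_{\mathrm{P}}}$ are independently and normally distributed with common variance $\sigma^{2}$ as $X_{\mathrm{E}i_{\mathrm{E}}} \sim \mathrm{N}(\mu_{\mathrm{E}}, \sigma^{2})$, $X_{\mathrm{R}i_{\mathrm{R}}} \sim \mathrm{N}(\mu_{\mathrm{R}}, \sigma^{2})$ and $X_{\mathrm{P}i_{\mathrm{P}}} \sim \mathrm{N}(\mu_{\mathrm{P}}, \sigma^{2})$, respectively ($i_{\mathrm{E}} = 1, \dots, n_{\mathrm{E}k}$; $i_{\mathrm{R}} = 1, \dots, n_{\mathrm{R}k}$; $i_{\mathrm{P}} = 1, \dots, n_{\mathrm{P}k}$).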

For the fixed margin approach, the hypotheses for evaluating AS and NI are:

$${\mathrm{H}}_{0}^{\text{AS}}:{\mu}_{\mathrm{R}}-{\mu}_{\mathrm{P}}\le \mathrm{\Delta}\phantom{\rule{0.16667em}{0ex}}\text{versus}\phantom{\rule{0.16667em}{0ex}}{\mathrm{H}}_{1}^{\text{AS}}:{\mu}_{\mathrm{R}}-{\mu}_{\mathrm{P}}>\mathrm{\Delta},$$

(1)

$${\mathrm{H}}_{0}^{\text{NI}}:{\mu}_{\mathrm{E}}-{\mu}_{\mathrm{R}}\le -\mathrm{\Delta}\phantom{\rule{0.16667em}{0ex}}\text{versus}\phantom{\rule{0.16667em}{0ex}}{\mathrm{H}}_{1}^{\text{NI}}:{\mu}_{\mathrm{E}}-{\mu}_{\mathrm{R}}>-\mathrm{\Delta},$$

(2)

where Δ(> 0) is a pre-specified
noninferiority margin (Hida and Tango,
2011a). This approach imposes an extra condition on the hypothesis
testing for the AS, that is superiority of the control intervention to the
placebo is demonstrated with a margin Δ. However, the key feature of the
approach is that the inequalities *μ*_{P} <
*μ*_{R} − Δ<
*μ*_{E} hold for any value of Δ if
both of the null hypotheses ${\mathrm{H}}_{0}^{\text{AS}}$ and ${\mathrm{H}}_{0}^{\text{NI}}$ are rejected at the significance level of
*α* for a one-sided test. This means that the
superiority of the experimental intervention relative to the placebo can be
indirectly demonstrated if ${\mathrm{H}}_{0}^{\text{AS}}$ and ${\mathrm{H}}_{0}^{\text{NI}}$ are rejected, without direct comparison of the
experimental intervention to the placebo. This avoids introduction of further
complexities in adjustment to the Type I or Type II error (Hida and Tango, 2011a).
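
As a worked step restating the argument above (not an additional result), adding the two inequalities that hold when both null hypotheses are rejected gives the indirect comparison with placebo:

$$\mu_{\mathrm{R}}-\mu_{\mathrm{P}}>\Delta \ \text{ and } \ \mu_{\mathrm{E}}-\mu_{\mathrm{R}}>-\Delta \ \Longrightarrow \ \mu_{\mathrm{E}}-\mu_{\mathrm{P}}=(\mu_{\mathrm{E}}-\mu_{\mathrm{R}})+(\mu_{\mathrm{R}}-\mu_{\mathrm{P}})>-\Delta+\Delta=0.$$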

We are now interested in hypothesis testing for AS and NI based on the
fixed margin approach within a group-sequential setting. The corresponding
statistics for testing hypotheses (1) and (2) at the
*k*th analysis are given by

$${T}_{k}^{\text{AS}}=\frac{{\overline{X}}_{\mathrm{R}k}-{\overline{X}}_{\mathrm{P}k}-\mathrm{\Delta}}{\sigma \sqrt{1/{n}_{\mathrm{R}k}+1/{n}_{\mathrm{P}k}}}\text{and}\phantom{\rule{0.16667em}{0ex}}{T}_{k}^{\text{NI}}=\frac{{\overline{X}}_{\mathrm{E}k}-{\overline{X}}_{\mathrm{R}k}+\mathrm{\Delta}}{\sigma \sqrt{1/{n}_{\mathrm{E}k}+1/{n}_{\mathrm{R}k}}},$$

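where $\overline{X}_{\mathrm{E}k}$, $\overline{X}_{\mathrm{R}k}$ and $\overline{X}_{\mathrm{P}k}$ are the sample means of the three groups at the *k*th analysis. The correlation between ${T}_{k}^{\text{AS}}$ and ${T}_{k}^{\text{NI}}$ at the same analysis is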

$$\text{corr}[{T}_{k}^{\text{AS}},{T}_{k}^{\text{NI}}]=\rho =-\sqrt{\frac{{n}_{\mathrm{E}k}{n}_{\mathrm{P}k}}{({n}_{\mathrm{E}k}+{n}_{\mathrm{R}k})({n}_{\mathrm{R}k}+{n}_{\mathrm{P}k})}}=-\sqrt{\frac{{C}_{\mathrm{P}}}{(1+{C}_{\mathrm{R}})({C}_{\mathrm{R}}+{C}_{\mathrm{P}})}}.$$

The correlation is determined by the allocation ratios
*C*_{P} and *C*_{R} (Hida and Tango, 2011a). The test statistics
( ${T}_{k}^{\text{AS}},{T}_{k}^{\text{NI}}$) are negatively correlated and the correlation
is *ρ* = −0.5 if the intervention groups
are equally sized, i.e., *C*_{R} =
*C*_{P} = 1. Furthermore, the joint
distribution of ( ${T}_{1}^{\text{AS}},\dots ,{T}_{K}^{\text{AS}},{T}_{1}^{\text{NI}},\dots ,{T}_{K}^{\text{NI}}$) is 2*K*-variate normal
with correlations given by $\text{corr}({T}_{{k}^{\prime}}^{\text{AS}},{T}_{k}^{\text{AS}})=\text{corr}({T}_{{k}^{\prime}}^{\text{NI}},{T}_{k}^{\text{NI}})=\sqrt{{n}_{\mathrm{E}{k}^{\prime}}/{n}_{\mathrm{E}k}}$, and $\text{corr}({T}_{{k}^{\prime}}^{\text{AS}},{T}_{k}^{\text{NI}})=\text{corr}({T}_{{k}^{\prime}}^{\text{NI}},{T}_{k}^{\text{AS}})=\rho \sqrt{{n}_{\mathrm{E}{k}^{\prime}}/{n}_{\mathrm{E}k}}\ (1\le {k}^{\prime}\le k\le K)$, since ${T}_{k}^{\text{AS}}$ and ${T}_{k}^{\text{NI}}$ can be rewritten as ${T}_{k}^{\text{AS}}=\sqrt{{n}_{\mathrm{E}k}}({\overline{X}}_{\mathrm{R}k}-{\overline{X}}_{\mathrm{P}k}-\mathrm{\Delta})/(\sigma \sqrt{1/{C}_{\mathrm{R}}+1/{C}_{\mathrm{P}}})$ and ${T}_{k}^{\text{NI}}=\sqrt{{n}_{\mathrm{E}k}}({\overline{X}}_{\mathrm{E}k}-{\overline{X}}_{\mathrm{R}k}+\mathrm{\Delta})/(\sigma \sqrt{1+1/{C}_{\mathrm{R}}})$.
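
The fixed margin statistics and their correlation can be sketched in a few lines of Python. This is a minimal illustration, treating σ as known as in the formulas above; the function names and the per-group size of 100 are assumptions made for the example, and the allocation ratios are taken as C_R = n_R/n_E and C_P = n_P/n_E, which is what the correlation formula implies.

```python
import math


def fixed_margin_stats(xbar_e, xbar_r, xbar_p, n_e, n_r, n_p, sigma, delta):
    """Return (T_AS, T_NI) for the fixed margin approach with known sigma."""
    t_as = (xbar_r - xbar_p - delta) / (sigma * math.sqrt(1 / n_r + 1 / n_p))
    t_ni = (xbar_e - xbar_r + delta) / (sigma * math.sqrt(1 / n_e + 1 / n_r))
    return t_as, t_ni


def fixed_margin_corr(c_r, c_p):
    """corr(T_AS, T_NI) from allocation ratios C_R = n_R/n_E, C_P = n_P/n_E."""
    return -math.sqrt(c_p / ((1 + c_r) * (c_r + c_p)))


# Equal allocation (C_R = C_P = 1) gives the correlation quoted in the text.
rho_equal = fixed_margin_corr(1.0, 1.0)

# Hypothetical inputs: sample means (10, 10, 5), 100 participants per group,
# sigma = 6.5 and margin delta = 2.5.
t_as, t_ni = fixed_margin_stats(10.0, 10.0, 5.0, 100, 100, 100, 6.5, 2.5)
print(rho_equal, t_as, t_ni)
```

With equal group sizes and these inputs the two statistics coincide, because both numerators equal Δ and both standard errors are the same.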

For the fraction approach, the hypotheses for evaluating AS and NI are as follows:

$${\mathrm{H}}_{0}^{\text{AS}}:{\mu}_{\mathrm{R}}-{\mu}_{\mathrm{P}}\le 0\phantom{\rule{0.16667em}{0ex}}\text{versus}\phantom{\rule{0.16667em}{0ex}}{\mathrm{H}}_{1}^{\text{AS}}:{\mu}_{\mathrm{R}}-{\mu}_{\mathrm{P}}>0,$$

(3)

$${\mathrm{H}}_{0}^{\text{NI}}:({\mu}_{\mathrm{E}}-{\mu}_{\mathrm{P}})/({\mu}_{\mathrm{R}}-{\mu}_{\mathrm{P}})\le \theta \phantom{\rule{0.16667em}{0ex}}\text{versus}\phantom{\rule{0.16667em}{0ex}}{\mathrm{H}}_{1}^{\text{NI}}:({\mu}_{\mathrm{E}}-{\mu}_{\mathrm{P}})/({\mu}_{\mathrm{R}}-{\mu}_{\mathrm{P}})>\theta ,$$

(4)

where *θ* (0 < *θ* < 1) is pre-specified and determined by
*θ* = 1 − Δ/(*μ*_{R} − *μ*_{P}) as a fraction of the difference
between *μ*_{R} and *μ*_{P}, using the noninferiority margin
Δ (Pigeot et al., 2003). In
addition, hypothesis testing is logically ordered, i.e., ${\mathrm{H}}_{0}^{\text{AS}}$ is tested first and then ${\mathrm{H}}_{0}^{\text{NI}}$ is tested if and only if ${\mathrm{H}}_{0}^{\text{AS}}$ is rejected at the prespecified significance
level of *α*. If both null hypotheses ${\mathrm{H}}_{0}^{\text{AS}}$ and ${\mathrm{H}}_{0}^{\text{NI}}$ are rejected, then
*μ*_{E} >
*μ*_{P} irrespective of
*θ* since *μ*_{E}
− *μ*_{P} >
*θ*(*μ*_{R} −
*μ*_{P}) > 0. Many authors have discussed
the fraction approach in fixed-sample designs; binary outcomes are discussed by
Tang and Tang (2004) and Kieser and Friede (2007), time to event
outcomes by Mielke et al. (2008) and
Kombrink et al. (2013), and
continuous outcomes with heterogeneous variances by Hasler et al. (2008).

We focus on hypothesis testing based on the fraction approach within a
group-sequential setting. Assuming *μ*_{R}
− *μ*_{P} > 0, the hypothesis (4) can be rewritten as

$${\mathrm{H}}_{0}^{\text{NI}}:{\mu}_{\mathrm{E}}-\theta {\mu}_{\mathrm{R}}-(1-\theta ){\mu}_{\mathrm{P}}\le 0\phantom{\rule{0.16667em}{0ex}}\text{versus}\phantom{\rule{0.16667em}{0ex}}{\mathrm{H}}_{1}^{\text{NI}}:{\mu}_{\mathrm{E}}-\theta {\mu}_{\mathrm{R}}-(1-\theta ){\mu}_{\mathrm{P}}>0.$$

The corresponding statistics for testing hypotheses (3) and (4) at the *k*th analysis are
given by

$${T}_{k}^{\text{AS}}=\frac{{\overline{X}}_{\mathrm{R}k}-{\overline{X}}_{\mathrm{P}k}}{\sigma \sqrt{1/{n}_{\mathrm{R}k}+1/{n}_{\mathrm{P}k}}}\text{and}\phantom{\rule{0.16667em}{0ex}}{T}_{k}^{\text{NI}}=\frac{{\overline{X}}_{\mathrm{E}k}-\theta {\overline{X}}_{\mathrm{R}k}-(1-\theta ){\overline{X}}_{\mathrm{P}k}}{\sigma \sqrt{1/{n}_{\mathrm{E}k}+{\theta}^{2}/{n}_{\mathrm{R}k}+{(1-\theta )}^{2}/{n}_{\mathrm{P}k}}}.$$

The joint distribution of ( ${T}_{1}^{\text{AS}},\dots ,{T}_{K}^{\text{AS}},{T}_{1}^{\text{NI}},\dots ,{T}_{K}^{\text{NI}}$) is 2*K*-variate normal
with the same correlation structure as in
the fixed margin approach, except that the correlation of ${T}_{k}^{\text{AS}}$ and ${T}_{k}^{\text{NI}}$ is given by

$$\text{corr}\left[{T}_{k}^{\text{AS}},{T}_{k}^{\text{NI}}\right]=\rho =\frac{-\theta /{C}_{\mathrm{R}}+(1-\theta )/{C}_{\mathrm{P}}}{\sqrt{1+{\theta}^{2}/{C}_{\mathrm{R}}+{(1-\theta )}^{2}/{C}_{\mathrm{P}}}\sqrt{1/{C}_{\mathrm{R}}+1/{C}_{\mathrm{P}}}}.$$

The correlation is determined by the fraction *θ*
and the allocation ratios *C*_{P} and *C*_{R}. In particular, $\rho =(1-2\theta )/\left(2\sqrt{1-\theta +{\theta}^{2}}\right)$ if the intervention groups are equally sized,
i.e., *C*_{R} = *C*_{P}
= 1.
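
As a hedged illustration of the two formulas above, the sketch below computes the fraction θ from a margin and then the correlation ρ for given allocation ratios. The function names are assumptions made for this example; the inputs Δ = 2.5, μ_R = 10 and μ_P = 5 are the values used later in the operating-characteristics section.

```python
import math


def fraction_from_margin(delta, mu_r, mu_p):
    """theta = 1 - delta / (mu_R - mu_P); requires mu_R > mu_P."""
    if mu_r <= mu_p:
        raise ValueError("mu_R must exceed mu_P")
    return 1 - delta / (mu_r - mu_p)


def fraction_corr(theta, c_r, c_p):
    """corr(T_AS, T_NI) for the fraction approach."""
    num = -theta / c_r + (1 - theta) / c_p
    den = (math.sqrt(1 + theta ** 2 / c_r + (1 - theta) ** 2 / c_p)
           * math.sqrt(1 / c_r + 1 / c_p))
    return num / den


def fraction_corr_equal(theta):
    """Closed form of the correlation for C_R = C_P = 1."""
    return (1 - 2 * theta) / (2 * math.sqrt(1 - theta + theta ** 2))


theta = fraction_from_margin(2.5, 10.0, 5.0)
rho = fraction_corr(theta, 1.0, 1.0)
print(theta, rho)
```

For these inputs θ = 0.5, so under equal allocation the numerator 1 − 2θ vanishes and the two statistics are uncorrelated at a given analysis.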

There are important differences between the two approaches (Röhmel and Pigeot, 2011; Hida and Tango, 2011a, 2011b; Stucke and
Kieser, 2012). Specifically, the concept of “assay
sensitivity” is different. The two approaches lead to different conclusions
when *μ*_{R} − Δ <
*μ*_{P} <
*μ*_{R} is true (Hida and Tango, 2011b). The fraction approach can
reject ${\mathrm{H}}_{0}^{\text{NI}}$, but the fixed margin approach cannot. Whether
the fraction approach can allow demonstration of noninferiority of the
experimental intervention to the control intervention is questionable under
*μ*_{R} − Δ<
*μ*_{P}. For further discussion, please see
Röhmel and Pigeot (2011),
Hida and Tango (2011b), and Stucke and Kieser (2012).

We consider the two decision-making frameworks associated with hypothesis testing. The first decision-making framework is flexible, where testing hypotheses for AS and NI are logically ordered similarly as in the fraction approach, i.e., NI is evaluated only after the AS is demonstrated and a trial is terminated if ${\mathrm{H}}_{0}^{\text{AS}}$ and ${\mathrm{H}}_{0}^{\text{NI}}$ are rejected at any analysis (i.e., not necessarily simultaneously) (DF-A). The other framework is relatively simple and a special case of DF-A, where a clinical trial is terminated if and only if both ${\mathrm{H}}_{0}^{\text{AS}}$ and ${\mathrm{H}}_{0}^{\text{NI}}$ are rejected simultaneously at the same analysis (DF-B). We separately describe the two decision-making frameworks, corresponding stopping rules and power definitions.

Under DF-A, a trial stops if the AS and the NI are achieved at any analysis (i.e., not necessarily simultaneously). NI is evaluated only after the AS is demonstrated. If AS is demonstrated but NI is not, then the trial continues and subsequent hypothesis testing is repeatedly conducted only for NI until the NI is demonstrated. The stopping rule for DF-A is formally given as follows:

- At the *k*th analysis (*k* = *k*′, …, *K* − 1),
  - if ${T}_{{k}^{\prime}}^{\text{AS}}>{c}_{{k}^{\prime}}^{\text{AS}}$ for some *k*′ (1 ≤ *k*′ ≤ *k*) and ${T}_{k}^{\text{NI}}>{c}_{k}^{\text{NI}}$, then reject ${\mathrm{H}}_{0}^{\text{NI}}$ and stop the trial,
  - otherwise, continue the trial,
- at the *K*th analysis,
  - if ${T}_{{k}^{\prime}}^{\text{AS}}>{c}_{{k}^{\prime}}^{\text{AS}}$ for some *k*′ (1 ≤ *k*′ ≤ *K*) and ${T}_{K}^{\text{NI}}>{c}_{K}^{\text{NI}}$, then reject ${\mathrm{H}}_{0}^{\text{NI}}$,
  - otherwise, do not reject ${\mathrm{H}}_{0}^{\text{NI}}$,

where ${c}_{k}^{\text{AS}}$ and ${c}_{k}^{\text{NI}}$ are the critical boundaries at the
*k*th analysis, which are constant and selected
separately for AS and NI to preserve the Type I error of
*α* for each hypothesis, using any
group-sequential method such as Lan-DeMets (LD) alpha-spending method (Lan and DeMets, 1983), analogously to a
trial with a single primary objective. For example, consider a three-arm
noninferiority clinical trial with a maximum number of analyses
*K* = 4 and equally spaced increments of
information, and the O’Brien-Fleming boundary (O’Brien and Fleming, 1979) is used to
reject the null hypothesis for the AS and the NI tests with the same
significance level of *α* = 2.5% for
a one-sided test. The boundaries for each analysis are 4.3326, 2.9631,
2.3590, and 2.0141, respectively. If the AS test is statistically
significant at the third analysis, then the NI test is evaluated twice with
the boundary of 2.3590 at the third analysis and 2.0141 at the final
analysis as if the Type I error for the NI test has been already spent at
the first and second analyses despite no test being conducted. Even if the
AS test is statistically significant at the third analysis, the remaining
Type I error of 1.5% (=2.5−1.0) is not reallocated
to the hypothesis test for NI. If the remaining Type I error rate of
1.5% for the AS test were reallocated to the hypothesis test for NI,
then the size of the hypothesis tests for AS and NI would be at most
*α* = 4.0%
(= 1.5 + 2.5), since the test is an intersection-union test.
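
The 2.5 − 1.0 arithmetic in this example can be checked numerically. The sketch below assumes that the quoted boundaries come from the O'Brien-Fleming-type Lan-DeMets spending function, α*(t) = 2 − 2Φ(z_{α/2}/√t); the text names only the O'Brien-Fleming boundary, so this choice is an assumption, and the function name is illustrative. Only the cumulative error spent and the first boundary are computed, since later boundaries require multivariate normal integration.

```python
import math
from statistics import NormalDist


def ld_of_alpha_spent(t, alpha=0.025):
    """Cumulative one-sided alpha spent at information fraction t (0 < t <= 1)
    under the O'Brien-Fleming-type Lan-DeMets spending function (assumed)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / math.sqrt(t)))


# Four equally spaced analyses, as in the example in the text.
fractions = [0.25, 0.50, 0.75, 1.00]
spent = [ld_of_alpha_spent(t) for t in fractions]

# At the first analysis the boundary is the normal quantile of the alpha spent.
first_boundary = NormalDist().inv_cdf(1 - spent[0])

# Error not yet spent after the third analysis.
remaining_after_third = 0.025 - spent[2]
print(spent, first_boundary, remaining_after_third)
```

Under this assumption the cumulative error spent by the third analysis is about 1.0%, leaving about 1.5%, and the first boundary is close to the quoted 4.3326.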

Therefore, the overall power for rejecting both ${\mathrm{H}}_{0}^{\text{AS}}$ and ${\mathrm{H}}_{0}^{\text{NI}}$ under ${\mathrm{H}}_{1}^{\text{AS}}$ and ${\mathrm{H}}_{1}^{\text{NI}}$ in DF-A is

$$1-\beta =Pr\phantom{\rule{0.16667em}{0ex}}\left[\underset{1\le {k}^{\prime}\le k\le K}{\cup}\left\{\{{T}_{{k}^{\prime}}^{\text{AS}}>{c}_{{k}^{\prime}}^{\text{AS}}\}\cap \{{T}_{k}^{\text{NI}}>{c}_{k}^{\text{NI}}\}\right\}\mid {\mathrm{H}}_{1}^{\text{AS}}\cap {\mathrm{H}}_{1}^{\text{NI}}\right].$$

This power can be evaluated using the numerical integration method in Genz (1992) or other methods.

When using the fixed margin approach, DF-A allows for dropping of
the placebo group if AS is demonstrated at the interim. However, when using
the fraction approach, DF-A cannot allow this, as the test statistic for
NI includes the placebo group mean ${\overline{X}}_{\mathrm{P}k}$.

Under DF-B, a trial is stopped if AS and NI are demonstrated at the same analysis simultaneously. Otherwise the trial will continue and the subsequent hypothesis testing is repeatedly conducted for both AS and NI until simultaneous significance is reached. The stopping rule for DF-B is formally given as follows:

- At the *k*th analysis (*k* = 1, …, *K* − 1),
  - if ${T}_{k}^{\text{AS}}>{c}_{k}^{\text{AS}}$ and ${T}_{k}^{\text{NI}}>{c}_{k}^{\text{NI}}$ simultaneously, then reject ${\mathrm{H}}_{0}^{\text{AS}}$ and ${\mathrm{H}}_{0}^{\text{NI}}$, and stop the trial,
  - otherwise, continue the trial,
- at the *K*th analysis,
  - if ${T}_{K}^{\text{AS}}>{c}_{K}^{\text{AS}}$ and ${T}_{K}^{\text{NI}}>{c}_{K}^{\text{NI}}$, then reject ${\mathrm{H}}_{0}^{\text{AS}}$ and ${\mathrm{H}}_{0}^{\text{NI}}$,
  - otherwise, do not reject ${\mathrm{H}}_{0}^{\text{AS}}$ and ${\mathrm{H}}_{0}^{\text{NI}}$.

As in DF-A, the critical boundaries at the
*k*th analysis ${c}_{k}^{\text{AS}}$ and ${c}_{k}^{\text{NI}}$ are constant and selected separately for
the AS and the NI tests to preserve the Type I error of
*α* for each hypothesis, using any
group-sequential method, analogously to a trial with a single primary
objective. Therefore, the overall power for rejecting both ${\mathrm{H}}_{0}^{\text{AS}}$ and ${\mathrm{H}}_{0}^{\text{NI}}$ under ${\mathrm{H}}_{1}^{\text{AS}}$ and ${\mathrm{H}}_{1}^{\text{NI}}$ in DF-B is

$$1-\beta =Pr\phantom{\rule{0.16667em}{0ex}}\left[\underset{k=1}{\overset{K}{\cup}}\left\{\{{T}_{k}^{\text{AS}}>{c}_{k}^{\text{AS}}\}\cap \{{T}_{k}^{\text{NI}}>{c}_{k}^{\text{NI}}\}\right\}\mid {\mathrm{H}}_{1}^{\text{AS}}\cap {\mathrm{H}}_{1}^{\text{NI}}\right].$$

Power can also be numerically assessed by using multivariate normal integrals.
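
As an alternative to numerical integration, the two power definitions can be approximated by simulation. The following is a rough Monte Carlo sketch for the fixed margin approach, not the authors' method. It assumes equal allocation, known σ, equally spaced analyses, and the four boundaries quoted earlier applied to both the AS and NI tests; the function name, the per-stage group size of 25, the number of replicates and the seed are all hypothetical choices.

```python
import math
import random


def simulate_power(n_per_stage, mu=(10.0, 10.0, 5.0), sigma=6.5, delta=2.5,
                   bounds=(4.3326, 2.9631, 2.3590, 2.0141),
                   n_sim=2000, seed=1):
    """Estimate the overall power under DF-A and DF-B (fixed margin approach).

    mu holds the means of the (E, R, P) groups; each group gains
    n_per_stage participants at every analysis (equal allocation).
    """
    rng = random.Random(seed)
    hits_a = hits_b = 0
    for _ in range(n_sim):
        sums = [0.0, 0.0, 0.0]   # running outcome totals for E, R, P
        as_shown = False         # AS boundary crossed at this or an earlier look
        win_a = win_b = False
        for k, bound in enumerate(bounds, start=1):
            for g in range(3):
                # Sum of n iid N(mu, sigma^2) outcomes is N(n*mu, n*sigma^2).
                sums[g] += rng.gauss(n_per_stage * mu[g],
                                     sigma * math.sqrt(n_per_stage))
            n_k = k * n_per_stage
            xbar_e, xbar_r, xbar_p = (s / n_k for s in sums)
            se = sigma * math.sqrt(2.0 / n_k)
            as_now = (xbar_r - xbar_p - delta) / se > bound
            ni_now = (xbar_e - xbar_r + delta) / se > bound
            as_shown = as_shown or as_now
            if as_shown and ni_now:   # DF-A: AS at some k' <= k, NI at k
                win_a = True
            if as_now and ni_now:     # DF-B: both at the same analysis
                win_b = True
        hits_a += win_a
        hits_b += win_b
    return hits_a / n_sim, hits_b / n_sim


power_a, power_b = simulate_power(25)
print(power_a, power_b)
```

Because every simulated path that satisfies the DF-B event also satisfies the DF-A event, the DF-A estimate can never fall below the DF-B estimate on the same paths, which mirrors DF-B being a special case of DF-A.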

Based on the powers for DF-A and DF-B discussed above, in a
group-sequential setting, we describe two sample size concepts, i.e., the
maximum sample size (MSS) and the average sample number (ASN). The MSS is
the sample size required for the final analysis to achieve the desired
overall power 1 − *β* for rejecting both null
hypotheses for AS and NI. The MSS is the smallest integer not less than
*N*_{*K*} satisfying the desired power for a
group-sequential strategy at the prespecified hypothetical values of
parameters

To identify the value of
*n*_{E}* _{k}* or

In this section, we investigate the operating characteristics of the fixed
margin and fraction approaches for group-sequential designs based on the two
decision-making frameworks, where the number of planned analyses is
*K* = 4. Specifically we evaluate the overall Type I
error rate and overall power under a given sample size. Referring to the settings
discussed in Hida and Tango (2011a), assume
the means (*μ*_{E},
*μ*_{R}, *μ*_{P})
are (10,10,5) with a common standard deviation *σ* =
6.5. The pre-specified noninferiority margin for the fixed margin approach is
Δ = 2.5 and the corresponding fraction for the fraction approach is
*θ* = 0.5. The three allocation ratios
*n*_{E}* _{k}*:

Figures 1 and 2 illustrate the behavior of the power for rejecting both null hypotheses as a function of the experimental intervention group sample size, when using DF-A and DF-B for the fixed margin and fraction approaches.

Behavior of the power for rejecting both null hypotheses, (i) for AS and (ii) for NI, as a function of the experimental intervention group sample size, when using DF-A or DF-B for the fixed margin approach, where the number of planned analyses is …

Behavior of the power for rejecting both null hypotheses, (i) for AS and (ii) for NI, as a function of the experimental intervention group sample size, when using DF-A or DF-B for the fraction approach, where the number of planned analyses is *K* = 4. …

For the fixed margin approach, there is no practical difference in the overall power between DF-A and DF-B although DF-A provides a slightly higher power than DF-B in all of the stopping boundary combinations. In all three allocation ratios for DF-A and DF-B, the highest power is given by OF-OF and the lowest is by PC-PC. For the fraction approach, there is also no practical difference in the overall power between DF-A and DF-B although DF-A provides a slightly higher power than DF-B in all of the stopping boundary combinations. In all three allocation ratios for DF-A and DF-B, the highest power is given by OF-OF or PC-OF and the lowest by OF-PC or PC-PC. Comparing the powers for the fixed margin and fraction approaches, the fixed margin approach provides consistently lower power than the fraction approach in all of the decision-making frameworks, the stopping boundary combinations, and allocation ratios.

Figures 3 to 6 illustrate the behavior of the Type I error rate for
rejecting both null hypotheses as a function of the sample size for an
experimental intervention group, when using DF-A and DF-B for the fixed margin
and fraction approaches. For the fixed margin approach, the maximum of the Type
I error is not inflated over the targeted significance level of
*α =* 2.5% in any of the
decision-making frameworks, the stopping boundary combinations, or allocation
ratios, but the Type I error rate is small and conservative, especially when ${\mathrm{H}}_{0}^{\text{AS}}$ and ${\mathrm{H}}_{0}^{\text{NI}}$ are true. There is no significant difference in
the Type I error rates between DF-A and DF-B, but DF-B provides a smaller Type I
error rate than DF-A. In all three allocation ratios and null hypothesis
settings, the largest Type I error rate is given by OF-OF or PC-OF for DF-A, and
OF-OF for DF-B. For the fraction approach, the maximum of the Type I error is
similarly not inflated over the prespecified significance level of
*α =* 2.5% in any of the
decision-making frameworks, the stopping boundary combinations or allocation
ratios, but the Type I error rate is small especially when ${\mathrm{H}}_{0}^{\text{AS}}$ and ${\mathrm{H}}_{0}^{\text{NI}}$ are true. There is no practical difference
between the Type I error rates of DF-A and DF-B, but DF-B provides a smaller
Type I error rate than DF-A. In all of the three allocation ratios and null
hypothesis settings, the largest Type I error rate is given by OF-OF for both
DF-A and DF-B.

Behavior of the Type I error rate for rejecting both null hypotheses, (i) for AS and (ii) for NI, as a function of the experimental intervention group sample size, when using DF-A for the fixed margin approach, where the number of planned analyses is *K* …

Behavior of the Type I error rate for rejecting both null hypotheses, (i) for AS and (ii) for NI, as a function of the experimental intervention group sample size, when using DF-B for the fraction approach, where the number of planned analyses is *K* = 4. …

Comparing the fixed margin and fraction approaches, the fixed margin approach provides consistently smaller power than the fraction approach in all of the decision-making frameworks, the stopping boundary combinations and allocation ratios.

Table 1 displays the MSS and ASN
required for evaluating AS and NI with the power of 1 −
*β* =80% at the significance level of
*α =* 2.5% for a one-sided test, when
using the fixed margin and fraction approaches based on DF-A and DF-B. For the
fixed margin approach based on DF-A, the ASN is calculated under ${\mathrm{H}}_{1}^{\text{AS}}$ and ${\mathrm{H}}_{1}^{\text{NI}}$ in two ways: in one strategy the placebo group
is not discontinued until NI is demonstrated even when AS is demonstrated at an
analysis (ASN1); while in the other strategy the placebo group is discontinued
when AS is demonstrated at an analysis (ASN2). The definitions of ASN1 and ASN2
are given in the Appendix.

Table 1. The MSS and ASN for demonstrating AS and NI with the power of 1 − *β* = 80% at the significance level of *α* = 2.5% for a one-sided test, where the maximum planned number of analyses is *K* = 4 and the means (*μ*_{E}, *μ* …

For both the fixed margin and fraction approaches, in all of the stopping boundary combinations and allocation ratios, there is a modest difference in the MSS and ASN between the DF-A and DF-B although DF-A provides a slightly smaller sample size than DF-B. For the fixed margin approach, the smallest MSS is given by OF-OF and the largest by PC-PC in all of the allocation ratios. The smallest ASN1 is associated with OF-OF or PC-PC and the largest with PC-OF or OF-PC. The largest ASN2 is provided by PC-OF or OF-PC in all of the allocation ratios. For the fraction approach, the smallest MSS is provided by OF-OF or PC-OF and the largest by OF-PC or PC-PC in all of the allocation ratios. The smallest ASN1 is consistently produced with PC-PC and the largest with OF-PC. Comparing the fixed margin and fraction approaches, the fraction approach provides smaller MSS and ASN than the fixed margin approach in all of the decision-making frameworks, the stopping boundary combinations and allocation ratios.

Clinical trials are designed based on assumptions that are often constructed from prior data. However, prior data may be limited or an inaccurate indication of future data, resulting in trials that are over- or underpowered. Interim analyses of accumulating data provide an opportunity to evaluate the accuracy of the design assumptions and potentially make design adjustments (e.g., to the sample size) if the assumptions were markedly inaccurate. Group-sequential designs allow for early stopping when there is sufficient statistical evidence of assay sensitivity and noninferiority. However, more modern adaptive designs may also allow for increases (or decreases) in the sample size if effects are smaller (or larger) than assumed. Such adjustments must be conducted carefully for several reasons, especially to maintain control of statistical error rates. In this section, we discuss sample size recalculation based on the observed intervention effects at an analysis, with a focus on the control of statistical error rates.

We now consider a scenario where the maximum sample size
*n*_{E}* _{k}* in the
experimental intervention group is recalculated to ${n}_{\mathrm{E}K}^{\prime}$ at the

Consider the Cui-Hung-Wang (CHW) statistics (Cui et al., 1999) for sample size recalculation in
group-sequential designs for three-arm clinical trials to preserve the overall
Type I error rate at a prespecified significance level of
*α* even when the sample size is increased and
conventional test statistics are used. When using the fixed margin approach, the
CHW statistics for the AS and the NI are

$$\begin{array}{c}{T}_{k+l}^{\text{AS}}=\sqrt{\frac{{n}_{\mathrm{E}k}}{{n}_{\mathrm{E}k+l}}}{T}_{k}^{\text{AS}}+\sqrt{\frac{{n}_{\mathrm{E}k+l}-{n}_{\mathrm{E}k}}{{n}_{\mathrm{E}k+l}}}\frac{{\overline{X}}_{\mathrm{R}k+l}^{\prime}-{\overline{X}}_{\mathrm{P}k+l}^{\prime}-\mathrm{\Delta}}{\sigma \sqrt{1/({n}_{\mathrm{E}k+l}^{\prime}-{n}_{\mathrm{E}k})}\sqrt{1/{C}_{\mathrm{R}}+1/{C}_{\mathrm{P}}}},\text{and}\\ {T}_{k+l}^{\text{NI}}=\sqrt{\frac{{n}_{\mathrm{E}k}}{{n}_{\mathrm{E}k+l}}}{T}_{k}^{\text{NI}}+\sqrt{\frac{{n}_{\mathrm{E}k+l}-{n}_{\mathrm{E}k}}{{n}_{\mathrm{E}k+l}}}\frac{{\overline{X}}_{\mathrm{E}k+l}^{\prime}-{\overline{X}}_{\mathrm{R}k+l}^{\prime}+\mathrm{\Delta}}{\sigma \sqrt{1/({n}_{\mathrm{E}k+l}^{\prime}-{n}_{\mathrm{E}k})}\sqrt{1+1/{C}_{\mathrm{R}}}},\end{array}$$

where ${\overline{X}}_{\mathrm{E}k+l}^{\prime}=({\sum}_{i={n}_{\mathrm{E}k}+1}^{{n}_{\mathrm{E}k+l}^{\prime}}{X}_{\mathrm{E}i})/({n}_{\mathrm{E}k+l}^{\prime}-{n}_{\mathrm{E}k}),\phantom{\rule{0.16667em}{0ex}}{\overline{X}}_{\mathrm{R}k+l}^{\prime}=({\sum}_{i={n}_{\mathrm{R}k}+1}^{{n}_{\mathrm{R}k+l}^{\prime}}{X}_{\mathrm{R}i})/({n}_{\mathrm{R}k+l}^{\prime}-{n}_{\mathrm{R}k})$, and ${\overline{X}}_{\mathrm{P}k+l}^{\prime}=({\sum}_{i={n}_{\mathrm{P}k}+1}^{{n}_{\mathrm{P}k+l}^{\prime}}{X}_{\mathrm{P}i})/({n}_{\mathrm{P}k+l}^{\prime}-{n}_{\mathrm{P}k})$. The sample size is increased or decreased when
the conditional power evaluated at the *k*th analysis is lower or
higher than the desired power 1 − *β*. Under the
planned maximum sample size and a given observed value of ( ${T}_{k}^{\text{AS}},{T}_{k}^{\text{NI}}$), if the decision-making for rejecting the
null-hypotheses ${\mathrm{H}}_{0}^{\text{AS}}$ and ${\mathrm{H}}_{0}^{\text{NI}}$ is based on DF-A, then the conditional power at
the *k*th analysis is given by

$$\text{CP}({\delta}_{1},{\delta}_{2})=\{\begin{array}{ll}{\mathrm{\Phi}}_{2}\phantom{\rule{0.16667em}{0ex}}\left(-\frac{{c}_{K}^{\text{AS}}-\sqrt{m}{t}_{k}^{\text{AS}}}{\sqrt{1-m}}+\sqrt{{n}_{\mathrm{E}K}-{n}_{\mathrm{E}k}}\frac{{\delta}_{1}}{\sqrt{1/{C}_{\mathrm{R}}+1/{C}_{\mathrm{P}}}},-\frac{{c}_{K}^{\text{NI}}-\sqrt{m}{t}_{k}^{\text{NI}}}{\sqrt{1-m}}+\sqrt{{n}_{\mathrm{E}K}-{n}_{\mathrm{E}k}}\frac{{\delta}_{2}}{\sqrt{1+1/{C}_{\mathrm{R}}}}\right),\hfill & \text{if}\phantom{\rule{0.16667em}{0ex}}{T}_{l}^{\text{NI}}\le {c}_{l}^{\text{NI}}\phantom{\rule{0.16667em}{0ex}}\text{and}\phantom{\rule{0.16667em}{0ex}}{T}_{l}^{\text{AS}}\le {c}_{l}^{\text{AS}}\phantom{\rule{0.16667em}{0ex}}\text{for}\phantom{\rule{0.16667em}{0ex}}\text{all}\phantom{\rule{0.16667em}{0ex}}l=1,\dots ,k\hfill \\ 1-{\mathrm{\Phi}}_{1}\phantom{\rule{0.16667em}{0ex}}\left(\frac{{c}_{K}^{\text{NI}}-\sqrt{m}{t}_{k}^{\text{NI}}}{\sqrt{1-m}}-\sqrt{{n}_{\mathrm{E}K}-{n}_{\mathrm{E}k}}\frac{{\delta}_{2}}{\sqrt{1+1/{C}_{R}}}\right),\hfill & \text{if}\phantom{\rule{0.16667em}{0ex}}{T}_{l}^{\text{NI}}\le {c}_{l}^{\text{NI}}\phantom{\rule{0.16667em}{0ex}}\text{for}\phantom{\rule{0.16667em}{0ex}}\text{all}\phantom{\rule{0.16667em}{0ex}}l=1,\dots ,k,\text{and}\phantom{\rule{0.16667em}{0ex}}{T}_{{l}^{\prime}}^{\text{AS}}>{c}_{{l}^{\prime}}^{\text{AS}}\phantom{\rule{0.16667em}{0ex}}\text{for}\phantom{\rule{0.16667em}{0ex}}\text{any}\phantom{\rule{0.16667em}{0ex}}{l}^{\prime}=1,\dots ,k,\hfill \end{array}$$

where *m* =
*n*_{E}* _{k}*/

$$\text{CP}({\delta}_{1},{\delta}_{2})={\mathrm{\Phi}}_{2}\phantom{\rule{0.16667em}{0ex}}\left(-\frac{{c}_{K}^{\text{AS}}-\sqrt{m}{t}_{k}^{\text{AS}}}{\sqrt{1-m}}+\sqrt{{n}_{\mathrm{E}K}-{n}_{\mathrm{E}k}}\frac{{\delta}_{1}}{\sqrt{1/{C}_{\mathrm{R}}+1/{C}_{\mathrm{P}}}},-\frac{{c}_{K}^{\text{NI}}-\sqrt{m}{t}_{k}^{\text{NI}}}{\sqrt{1-m}}+\sqrt{{n}_{\mathrm{E}K}-{n}_{\mathrm{E}k}}\frac{{\delta}_{2}}{\sqrt{1+1/{C}_{\mathrm{R}}}}\right).$$
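As a rough numerical illustration of the conditional power expressions above, the following Python sketch evaluates the bivariate normal probability Φ_2 by one-dimensional numerical integration. The function names, the Simpson-rule integrator, the explicit correlation argument `rho` (set to −0.5 here, which is what equal allocation would imply for the fixed margin statistics), and the reading of *m* as the information fraction `n_interim / n_max` are all assumptions made for illustration, and the interim values in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal

def phi2(a, b, rho, steps=4000, lower=-9.0):
    """P(Z1 <= a, Z2 <= b) for a standard bivariate normal with |rho| < 1.

    Integrates phi(x) * Phi((b - rho*x) / sqrt(1 - rho^2)) over x in [lower, a]
    with composite Simpson's rule (steps must be even).
    """
    a = min(a, 9.0)
    if a <= lower:
        return 0.0
    s = sqrt(1.0 - rho * rho)
    h = (a - lower) / steps

    def f(x):
        return STD.pdf(x) * STD.cdf((b - rho * x) / s)

    total = f(lower) + f(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lower + i * h)
    return total * h / 3.0

def conditional_power(t_as, t_ni, c_as, c_ni, n_interim, n_max,
                      delta1, delta2, c_r=1.0, c_p=1.0, rho=-0.5):
    """First case: neither AS nor NI has been demonstrated up to analysis k."""
    m = n_interim / n_max  # assumed information fraction
    rem = sqrt(n_max - n_interim)
    a = -(c_as - sqrt(m) * t_as) / sqrt(1 - m) + rem * delta1 / sqrt(1 / c_r + 1 / c_p)
    b = -(c_ni - sqrt(m) * t_ni) / sqrt(1 - m) + rem * delta2 / sqrt(1 + 1 / c_r)
    return phi2(a, b, rho)

def conditional_power_ni_only(t_ni, c_ni, n_interim, n_max, delta2, c_r=1.0):
    """Second case: AS already demonstrated, NI not yet demonstrated."""
    m = n_interim / n_max  # assumed information fraction
    arg = ((c_ni - sqrt(m) * t_ni) / sqrt(1 - m)
           - sqrt(n_max - n_interim) * delta2 / sqrt(1 + 1 / c_r))
    return 1.0 - STD.cdf(arg)

# Hypothetical interim look at half of the planned information.
cp = conditional_power(t_as=1.2, t_ni=1.0, c_as=2.0, c_ni=2.0,
                       n_interim=71, n_max=142,
                       delta1=2.5 / 6.5, delta2=2.5 / 6.5)
print(f"conditional power, neither hypothesis rejected yet: {cp:.3f}")
```

Passing `rho` explicitly keeps the sketch usable for other allocation ratios, where the correlation between the two statistics differs.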

The details of the derivation of the conditional powers are given in the Appendix. On the other hand, when using the fraction approach, the CHW statistics are given by

$$\begin{array}{c}{T}_{k+l}^{\text{AS}}=\sqrt{\frac{{n}_{\mathrm{E}k}}{{n}_{\mathrm{E}k+l}}}{T}_{k}^{\text{AS}}+\sqrt{\frac{{n}_{\mathrm{E}k+l}-{n}_{\mathrm{E}k}}{{n}_{\mathrm{E}k+l}}}\frac{{\overline{X}}_{\mathrm{R}k+l}^{\prime}-{\overline{X}}_{\mathrm{P}k+l}^{\prime}}{\sigma \sqrt{1/({n}_{\mathrm{E}k+l}^{\prime}-{n}_{\mathrm{E}k})}\sqrt{1/{C}_{\mathrm{R}}+1/{C}_{\mathrm{P}}}}\text{and}\\ {T}_{k+l}^{\text{NI}}=\sqrt{\frac{{n}_{\mathrm{E}k}}{{n}_{\mathrm{E}k+l}}}{T}_{k}^{\text{NI}}+\sqrt{\frac{{n}_{\mathrm{E}k+l}-{n}_{\mathrm{E}k}}{{n}_{\mathrm{E}k+l}}}\frac{{\overline{X}}_{\mathrm{E}k+l}^{\prime}-\theta {\overline{X}}_{\mathrm{R}k+l}^{\prime}-(1-\theta ){\overline{X}}_{\mathrm{P}k+l}^{\prime}}{\sigma \sqrt{1/({n}_{\mathrm{E}k+l}^{\prime}-{n}_{\mathrm{E}k})}\sqrt{1+{\theta}^{2}/{C}_{\mathrm{R}}+{(1-\mathrm{\theta})}^{2}/{C}_{\mathrm{P}}}}.\end{array}$$

The conditional power can be calculated in the same manner as for the fixed
margin approach, except that *δ*_{1} =
(*μ*_{R} −
*μ*_{P})/*σ* and
*δ*_{2} =
(*μ*_{E} −
*θμ*_{R} − (1 −
*θ*)*μ*_{P})/*σ*,
and the denominator of the *δ*_{2} term changes from $\sqrt{1+1/{C}_{\mathrm{R}}}$ to $\sqrt{1+{\theta}^{2}/{C}_{\mathrm{R}}+{(1-\mathrm{\theta})}^{2}/{C}_{\mathrm{P}}}$.
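The CHW statistics for both approaches share the same structure: the statistic from analysis *k* and a z-statistic computed only from the data accrued afterwards are combined with weights fixed by the originally planned sample sizes. The Python sketch below illustrates this for the NI statistic; the function and argument names are hypothetical, and the variance *σ*² is assumed known, as in the formulas above.

```python
from math import sqrt

def incremental_z_ni_fixed_margin(xbar_e, xbar_r, margin, sigma, n_inc, c_r=1.0):
    """NI z-statistic from the post-interim data only (fixed margin approach).

    n_inc is the number of experimental-arm participants actually accrued
    after analysis k, i.e. the recalculated size minus the interim size.
    """
    return (xbar_e - xbar_r + margin) / (sigma * sqrt(1.0 / n_inc) * sqrt(1.0 + 1.0 / c_r))

def incremental_z_ni_fraction(xbar_e, xbar_r, xbar_p, theta, sigma, n_inc,
                              c_r=1.0, c_p=1.0):
    """NI z-statistic from the post-interim data only (fraction approach)."""
    num = xbar_e - theta * xbar_r - (1.0 - theta) * xbar_p
    den = sigma * sqrt(1.0 / n_inc) * sqrt(1.0 + theta**2 / c_r + (1.0 - theta)**2 / c_p)
    return num / den

def chw_statistic(t_prev, z_inc, n_planned_prev, n_planned_next):
    """Combine the earlier statistic and the incremental z-statistic.

    The weights depend only on the planned experimental-arm sample sizes,
    not on the recalculated one, so their squares always sum to one.
    """
    w_prev = sqrt(n_planned_prev / n_planned_next)
    w_inc = sqrt((n_planned_next - n_planned_prev) / n_planned_next)
    return w_prev * t_prev + w_inc * z_inc

# Hypothetical numbers: interim at 71 of a planned 142 per arm, with the
# post-interim cohort enlarged to 120 participants per arm.
z_inc = incremental_z_ni_fixed_margin(xbar_e=10.2, xbar_r=10.0, margin=2.5,
                                      sigma=6.5, n_inc=120)
print(chw_statistic(t_prev=1.0, z_inc=z_inc, n_planned_prev=71, n_planned_next=142))
```

Because the weights are fixed in advance, enlarging the post-interim cohort changes only the incremental z-statistic, which is the mechanism by which the CHW approach preserves the Type I error rate.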

When recalculating the sample size, we consider three possible options, i.e., (i) only allowing an increase in the sample size, (ii) only allowing a decrease in the sample size, and (iii) allowing an increase or a decrease in sample size. The recalculated maximum sample size ${n}_{\mathrm{E}K}^{\prime}$ required for the experimental intervention group for each respective option is:

- ${n}_{\mathrm{E}K}^{\prime}=\{\begin{array}{ll}min({n}_{\mathrm{E}K}^{\ast},\lambda {n}_{\mathrm{E}K}),\hfill & \text{if}\phantom{\rule{0.16667em}{0ex}}\text{CP}({\delta}_{1},{\delta}_{2})<\gamma (1-\beta ),\hfill \\ {n}_{\mathrm{E}K},\hfill & \text{otherwise};\hfill \end{array}$
- ${n}_{\mathrm{E}K}^{\prime}=\{\begin{array}{ll}{n}_{\mathrm{E}K}^{\ast},\hfill & \text{if}\phantom{\rule{0.16667em}{0ex}}\text{CP}({\delta}_{1},{\delta}_{2})>\eta (1-\beta ),\hfill \\ {n}_{\mathrm{E}K},\hfill & \text{otherwise};\hfill \end{array}$
- ${n}_{\mathrm{E}K}^{\prime}=\{\begin{array}{ll}min({n}_{\mathrm{E}K}^{\ast},\lambda {n}_{\mathrm{E}K}),\hfill & \text{if}\phantom{\rule{0.16667em}{0ex}}\text{CP}({\delta}_{1},{\delta}_{2})<\gamma (1-\beta ),\hfill \\ {n}_{\mathrm{E}K}^{\ast},\hfill & \text{if}\phantom{\rule{0.16667em}{0ex}}\text{CP}({\delta}_{1},{\delta}_{2})>\eta (1-\beta ),\hfill \\ {n}_{\mathrm{E}K},\hfill & \text{otherwise};\hfill \end{array}$

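where ${n}_{\mathrm{E}K}^{\ast}$ is the recalculated sample size at which the
conditional power based on the observed effect sizes achieves the target power
of 1 − *β*, *λ* is the prespecified maximum factor by which the planned
sample size may be increased, and *γ* (0 <
*γ* < 1) and
*η* (*η >* 1) are
prespecified constants that set the conditional power thresholds for triggering a recalculation.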
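The three recalculation options can be written as a single decision rule. The following Python sketch is one possible encoding; the function name, the `option` labels, and the default values of `gamma`, `eta`, and `lam` (taken from the simulation settings used later in this section) are illustrative choices, and the value `n_star` is assumed to have been computed separately from the conditional power.

```python
def recalculated_n(cp, n_planned, n_star, option,
                   beta=0.2, gamma=0.875, eta=1.125, lam=1.5):
    """Return the recalculated maximum sample size for the experimental arm.

    cp        : conditional power at the interim analysis
    n_planned : originally planned maximum sample size
    n_star    : sample size at which the conditional power reaches 1 - beta
    option    : "increase" (i), "decrease" (ii), or "both" (iii)
    """
    if option not in ("increase", "decrease", "both"):
        raise ValueError("option must be 'increase', 'decrease', or 'both'")
    target = 1.0 - beta
    if option in ("increase", "both") and cp < gamma * target:
        # Increase, but never beyond lam times the planned size.
        return min(n_star, lam * n_planned)
    if option in ("decrease", "both") and cp > eta * target:
        return n_star
    return n_planned

# Hypothetical interim results for a planned 142 participants per arm.
print(recalculated_n(cp=0.50, n_planned=142, n_star=300, option="increase"))
print(recalculated_n(cp=0.95, n_planned=142, n_star=100, option="both"))
print(recalculated_n(cp=0.80, n_planned=142, n_star=150, option="both"))
```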

We investigate the impact of the sample size recalculation on the overall power and the Type I error rate for rejecting both null hypotheses using Monte-Carlo simulation.

Consider group sequential designs with four analyses (i.e., three
interim and one final analysis) for the fixed margin and the fraction approaches
with the decision-making frameworks DF-A or DF-B, where analyses are conducted
with equally spaced increments in information. The original planned total sample
size is calculated to test both null hypotheses for the AS and the NI with the
power of 1 − *β* = 80% at the
prespecified significance level of *α* =
2.5% for a one-sided test in the fixed-sample designs, where
(*μ*_{E},
*μ*_{R},
*μ*_{P}) = (10, 10, 5),
*σ* = 6.5, and Δ = 2.5 for
the fixed margin approach, and *θ* = 0.5 for the
fraction approach. The total sample sizes are 426 for the fixed margin approach
and 240 for the fraction approach.
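The planned totals can be checked with a short fixed-sample power calculation. The Python sketch below assumes equal allocation, known *σ*, one-sided z-tests at the 2.5% level for both hypotheses, and correlations between the AS and NI statistics of −0.5 for the fixed margin approach and (1 − 2*θ*)/√(2(1 + *θ*² + (1 − *θ*)²)) for the fraction approach; these correlations follow from the shared reference and placebo means under equal allocation but are not stated in this section, and all function names are hypothetical. Under these assumptions, 142 and 80 participants per group are the smallest sizes that reach 80% power, which agrees with the totals of 426 and 240 given above.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()
Z_ALPHA = STD.inv_cdf(0.975)  # one-sided 2.5% critical value

def bvn_cdf(a, b, rho, steps=2000, lower=-9.0):
    """P(Z1 <= a, Z2 <= b), standard bivariate normal, |rho| < 1 (Simpson's rule)."""
    a = min(a, 9.0)
    if a <= lower:
        return 0.0
    s = sqrt(1.0 - rho * rho)
    h = (a - lower) / steps

    def f(x):
        return STD.pdf(x) * STD.cdf((b - rho * x) / s)

    total = f(lower) + f(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lower + i * h)
    return total * h / 3.0

def power_fixed_margin(n, mu_e, mu_r, mu_p, sigma, margin):
    """Joint power for AS and NI with n participants per group."""
    d1 = (mu_r - mu_p - margin) / sigma
    d2 = (mu_e - mu_r + margin) / sigma
    a = -Z_ALPHA + sqrt(n) * d1 / sqrt(2.0)
    b = -Z_ALPHA + sqrt(n) * d2 / sqrt(2.0)
    return bvn_cdf(a, b, rho=-0.5)

def power_fraction(n, mu_e, mu_r, mu_p, sigma, theta):
    """Joint power for AS and NI with n participants per group."""
    d1 = (mu_r - mu_p) / sigma
    d2 = (mu_e - theta * mu_r - (1.0 - theta) * mu_p) / sigma
    v_ni = 1.0 + theta**2 + (1.0 - theta)**2
    rho = (1.0 - 2.0 * theta) / sqrt(2.0 * v_ni)
    a = -Z_ALPHA + sqrt(n) * d1 / sqrt(2.0)
    b = -Z_ALPHA + sqrt(n) * d2 / sqrt(v_ni)
    return bvn_cdf(a, b, rho)

for n in (141, 142):
    print("fixed margin", n, round(power_fixed_margin(n, 10, 10, 5, 6.5, 2.5), 4))
for n in (79, 80):
    print("fraction", n, round(power_fraction(n, 10, 10, 5, 6.5, 0.5), 4))
```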

One sample size recalculation is considered based on the observed
effects with three options: (i) only allowing an increase in the sample size,
(ii) only allowing a decrease in the sample size, and (iii) allowing an increase
or decrease in sample size, evaluated at the first, second or third interim
analyses. The sample size recalculation is performed when the conditional power is
less than 70% (i.e., *γ* is set as 0.875) and/or
exceeds 90% (i.e., *η* is set as 1.125). The
sample size for the experimental intervention group can be increased up to 1.5
times the original planned sample size (i.e., *λ* is
set as 1.5). As in Section 3, the critical values are determined based
on the three stopping boundary combinations considered using the LD
alpha-spending method with equally spaced increments in information: (i) the OF
for both AS and NI (OF-OF), (ii) the PC for AS and the OF for NI (PC-OF), and
(iii) the PC for both AS and NI (PC-PC). The empirical overall power is
evaluated under the situation where both ${\mathrm{H}}_{1}^{\text{AS}}$ and ${\mathrm{H}}_{1}^{\text{NI}}$ are true with
(*μ*_{E},
*μ*_{R},
*μ*_{P}) = (10, 10, 5), and the
actual overall Type I error rate is evaluated under three situations, i.e., (i) both ${\mathrm{H}}_{0}^{\text{AS}}$ and ${\mathrm{H}}_{0}^{\text{NI}}$ are true, (ii) only ${\mathrm{H}}_{0}^{\text{AS}}$ is true, or (iii) only ${\mathrm{H}}_{0}^{\text{NI}}$ is true. The number of replications for the simulation
is set to 1,000,000 for the evaluation of the Type I error rate and 100,000
for the power. These numbers were
determined based on the precision: 1,000,000 replications provide a two-sided
95% confidence interval with a width equal to 0.001 when the proportion
is 2.5%, while 100,000 replications provide a two-sided 95%
confidence interval with a width equal to 0.005 when the proportion is
80%. We limit the discussion to the overall empirical power and Type I
error rate for DF-A with equally sized groups of *C*_{R}
= *C*_{P} = 1 as there are no appreciable
differences in the power and Type I error rates between DF-A and DF-B.
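The stated simulation precision can be checked with the usual normal-approximation (Wald) interval for a proportion. The Python sketch below assumes that this is the interval intended; the function name is hypothetical. Under this assumption the width for the power setting is about 0.005, and the width for the Type I error setting is about 0.0006, which is consistent with the stated 0.001 when rounded to three decimal places.

```python
from math import sqrt
from statistics import NormalDist

def wald_ci_width(p, n_rep, level=0.95):
    """Full width of a two-sided Wald confidence interval for a proportion p
    estimated from n_rep independent simulation replications."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return 2.0 * z * sqrt(p * (1.0 - p) / n_rep)

print(wald_ci_width(0.025, 1_000_000))  # Type I error rate setting
print(wald_ci_width(0.80, 100_000))     # power setting
```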

Figure 7 summarizes the overall empirical power for the fixed margin and fraction approaches when sample size recalculation is performed based on DF-A under the design setting and parameter configurations described above. For both the fixed margin and fraction approaches, regardless of the stopping boundary combination, the overall power increases by up to 15% compared to a design without sample size recalculation. Larger increases are observed with later timing of the sample size recalculation when only allowing an increase in the sample size, and when allowing an increase or decrease in the sample size. As shown in Table 2, the ASN is approximately equal to or smaller than the planned sample size. When only allowing a decrease in the sample size, the overall power cannot maintain the targeted power of 80%, although the expected sample size can be reduced more than with the other recalculation options. Figures 8 and 9 summarize the actual overall Type I error rates. For both the fixed margin and fraction approaches, regardless of the stopping boundary combination, in all three recalculation options, the actual Type I error rates do not exceed the prespecified significance level of 2.5%, but are small and conservative, especially in the fixed margin approach.

The impact of the sample size recalculation on the overall empirical power for
rejecting both null hypotheses for the AS and the NI, when using DF-A for the
fixed margin and fraction approaches, where the number of planned analyses is
*K* = 4. One sample **...**

The impact of sample size recalculation on overall actual Type I error rate for
rejecting both null hypotheses for the AS and the NI, when using DF-A for the
fixed margin approach, where the number of planned analyses is
*K* = 4. One sample size recalculation **...**

The impact of sample size recalculation on overall actual Type I error rate for
rejecting both null hypotheses for the AS and the NI, when using DF-A for the
fraction approach, where the number of planned analyses is *K*
= 4. One sample size recalculation **...**

When constructing efficient group-sequential designs in three-arm noninferiority clinical trials, a major issue is that the Type I error rate is small and conservative because the rejection region of the null hypotheses ${\mathrm{H}}_{0}^{\text{AS}}$ and ${\mathrm{H}}_{0}^{\text{NI}}$ is restricted, even when group-sequential designs are used for evaluating the AS and the NI, as with the fixed-sample designs. This is due to the requirement that the allocation of the Type I error to each analysis for the AS and the NI be prespecified using an alpha-spending method.

To overcome this issue, the DF-A can be modified to allocate adaptively the
Type I error to each analysis for the NI although the Type I error allocation for
the AS is prespecified. This idea was first discussed by Tsong et al. (2004) in group-sequential three-arm
clinical trials when assessing the equivalence and efficacy of a generic product,
where the co-primary objectives of the trial are to assess whether the generic and
reference product are effective relative to placebo and whether the generic is
equivalent to the reference product with a prespecified equivalence margin. Their
method evaluates equivalence only after both null hypotheses of efficacy are
rejected and then specifies the Type I error allocation before the equivalence
evaluation is performed. In the three-arm noninferiority clinical trials for the
assessment of the AS and the NI, the NI is evaluated only after the AS is
demonstrated and the Type I error allocation for the NI is specified just before the
NI evaluation is performed. Figure 10
illustrates the behavior of the Type I error rate for rejecting both null hypotheses
for the AS and the NI as a function of sample size for an experimental intervention
group, when using the fixed margin approach based on DF-A with adaptive Type I error
allocation for the NI in a group-sequential setting, where the parameter settings
and configuration are the same as in Figure 3, but
for
*n*_{E}* _{k}*:

Noninferiority clinical trials have recently received a great deal of attention from regulatory authorities (CHMP, 2005; FDA, 2010) and in the clinical trials literature (e.g., extensive references can be found in Rothman et al. (2012)). Noninferiority clinical trials have complexities requiring careful design, monitoring, analysis, and reporting. When designing noninferiority clinical trials, constancy and assay sensitivity are important assumptions. The selection of the active control for a noninferiority trial should be done carefully, ensuring that it has demonstrated and precisely measured superiority over placebo and that its effect has not changed compared to the historical trials that demonstrated its efficacy (constancy assumption). To assess these issues in regulatory medical product development, the use of a three-arm noninferiority design including an experimental intervention, an active control intervention, and a placebo has been considered a gold-standard design. However, such a trial may raise ethical issues and result in a sample size that is too large to be practical.

In this paper, we discuss three-arm noninferiority clinical trials and extend two existing approaches, i.e., the fixed margin and fraction approaches, for evaluating noninferiority and assay sensitivity to a group-sequential setting with two decision-making frameworks. We evaluate the operating characteristics including power, Type I error rate, and maximum and expected sample sizes as design factors vary. We also discuss sample size recalculation and consider its impact on the power and Type I error rate via a simulation study. Our findings are summarized as follows:

- The decision-making frameworks of DF-A and DF-B for the fixed margin and the fraction approaches provide the possibility of stopping a trial early when evidence is overwhelming, thus offering efficiency (e.g., an ASN potentially 4% to 15% smaller than that of the fixed-sample designs with equally sized groups and four analyses).
- There are no major differences in either MSS or ASN between DF-A and DF-B for the fixed margin and the fraction approaches, although DF-A is slightly more powerful than DF-B. By using DF-A for the fixed margin approach, the time that participants are exposed to placebo can be minimized, as DF-A allows dropping of the placebo group once assay sensitivity has been demonstrated at an analysis.
- For the fixed margin approach, selecting the O’Brien-Fleming-type boundary for both AS and NI could lead to fewer participants for the MSS and the ASN compared with other boundary combinations. On the other hand, for the fraction approach, selecting the O’Brien-Fleming-type boundary for both AS and NI, or the Pocock-type boundary for AS and the O’Brien-Fleming-type boundary for NI provides better efficiency with respect to the MSS and the ASN compared with other boundary combinations.
- When considering sample size recalculation during a trial, only allowing a decrease in the sample size may not be a desirable option in either the fixed margin or the fraction approach, as the power does not reach desired levels, although the expected sample size can be reduced more than with the other recalculation options. In addition, the timing of the sample size recalculation should be carefully considered. Power is increased if the sample size recalculation is carried out later in the trial, but the expected sample size is larger.

We caution that these findings are based on one set of design parameter configurations except for the allocation ratio. Further investigation will be required to evaluate how the power and Type I error rate behave with other design assumptions.

Behavior of the Type I error rate for rejecting both null hypotheses (i) for the
AS and (ii) NI as a function of the experimental intervention group sample size,
when using DF-B for the fixed margin approach, where the number of planned
analyses is *K* **...**

The authors are grateful to the two anonymous referees and the associate editor for their valuable suggestions and helpful comments that improved the content and presentation of the paper. Research reported in this publication was supported by JSPS KAKENHI under Grant Number 26330038 and the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Number UM1AI104681. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

The ASN is the expected sample size under hypothetical reference values and provides information regarding the number of participants anticipated in a group-sequential design in order to reach a decision point. We briefly describe the definitions of the ASN corresponding to the decision-making frameworks.

When using DF-A or DF-B for the fixed margin and fraction approaches, if the placebo group is not terminated until NI is demonstrated even when the AS is demonstrated at an analysis, then the ASN can be calculated by

$$\begin{array}{l}\text{ASN}1=\sum _{k=1}^{K}{N}_{k}{P}_{k}({\mu}_{\mathrm{E}},{\mu}_{\mathrm{R}},{\mu}_{\mathrm{P}},\mathrm{\Delta},{\sigma}^{2})+\left(1-\sum _{k=1}^{K}{P}_{k}({\mu}_{\mathrm{E}},{\mu}_{\mathrm{R}},{\mu}_{\mathrm{P}},\mathrm{\Delta},{\sigma}^{2})\right)\phantom{\rule{0.16667em}{0ex}}{N}_{K}\\ ={N}_{K}+\sum _{k=1}^{K-1}({N}_{k}-{N}_{K}){P}_{k}({\mu}_{\mathrm{E}},{\mu}_{\mathrm{R}},{\mu}_{\mathrm{P}},\mathrm{\Delta},{\sigma}^{2}),\end{array}$$

where *N _{k}*
(=

The stopping probability *P _{k}* based on DF-A is
given by

$${P}_{k}({\mu}_{\mathrm{E}},{\mu}_{\mathrm{R}},{\mu}_{\mathrm{P}},\mathrm{\Delta},{\sigma}^{2})=\{\begin{array}{ll}Pr[\{{T}_{1}^{\text{AS}}>{c}_{1}^{\text{AS}}\}\cap \{{T}_{1}^{\text{NI}}>{c}_{1}^{\text{NI}}\}],\hfill & k=1,\hfill \\ \begin{array}{l}Pr[{\bigcap}_{l=1}^{k-1}\{{T}_{l}^{\text{AS}}\le {c}_{l}^{\text{AS}}\}\cap \{{T}_{k}^{\text{AS}}>{c}_{k}^{\text{AS}}\}\cap \{{T}_{k}^{\text{NI}}>{c}_{k}^{\text{NI}}\}]\\ +Pr[\{{T}_{1}^{\text{AS}}>{c}_{1}^{\text{AS}}\}\cap {\bigcap}_{l=1}^{k-1}\{{T}_{l}^{\text{NI}}\le {c}_{l}^{\text{NI}}\}\cap \{{T}_{k}^{\text{NI}}>{c}_{k}^{\text{NI}}\}]\\ +{\sum}_{2\le l<k}Pr[{\bigcap}_{m=1}^{l-1}\{{T}_{m}^{\text{AS}}\le {c}_{m}^{\text{AS}}\}\cap \{{T}_{l}^{\text{AS}}>{c}_{l}^{\text{AS}}\}\cap {\bigcap}_{n=l}^{k-1}\{{T}_{n}^{\text{NI}}\le {c}_{n}^{\text{NI}}\}\cap \{{T}_{k}^{\text{NI}}>{c}_{k}^{\text{NI}}\}],\end{array}\hfill & k\ge 2.\hfill \end{array}$$

For instance, at *K* = 2, the stopping
probabilities *P*_{1} and *P*_{2}
based on DF-A are calculated by multivariate normal integrals as follows:

$$\begin{array}{l}{P}_{1}({\mu}_{\mathrm{E}},{\mu}_{\mathrm{R}},{\mu}_{\mathrm{P}},\mathrm{\Delta},{\sigma}^{2})=Pr[\{{T}_{1}^{\text{AS}}>{c}_{1}^{\text{AS}}\}\cap \{{T}_{1}^{\text{NI}}>{c}_{1}^{\text{NI}}\}]={\int}_{{c}_{1}^{\text{AS}}}^{\infty}{\int}_{{c}_{1}^{\text{NI}}}^{\infty}{f}_{2}({t}_{1}^{\text{AS}},{t}_{1}^{\text{NI}})\phantom{\rule{0.16667em}{0ex}}{dt}_{1}^{\text{NI}}{dt}_{1}^{\text{AS}}\phantom{\rule{0.16667em}{0ex}}\text{and}\\ {P}_{2}({\mu}_{\mathrm{E}},{\mu}_{\mathrm{R}},{\mu}_{\mathrm{P}},\mathrm{\Delta},{\sigma}^{2})=Pr[\{{T}_{1}^{\text{AS}}\le {c}_{1}^{\text{AS}}\}\cap \{{T}_{2}^{\text{AS}}>{c}_{2}^{\text{AS}}\}\cap \{{T}_{2}^{\text{NI}}>{c}_{2}^{\text{NI}}\}]+Pr[\{{T}_{1}^{\text{AS}}>{c}_{1}^{\text{AS}}\}\cap \{{T}_{1}^{\text{NI}}\le {c}_{1}^{\text{NI}}\}\cap \{{T}_{2}^{\text{NI}}>{c}_{2}^{\text{NI}}\}]\\ =Pr[\{{T}_{1}^{\text{AS}}\le {c}_{1}^{\text{AS}}\}\cap \{{T}_{1}^{\text{NI}}\le {c}_{1}^{\text{NI}}\}\cap \{{T}_{2}^{\text{AS}}>{c}_{2}^{\text{AS}}\}\cap \{{T}_{2}^{\text{NI}}>{c}_{2}^{\text{NI}}\}]\\ +Pr[\{{T}_{1}^{\text{AS}}\le {c}_{1}^{\text{AS}}\}\cap \{{T}_{1}^{\text{NI}}>{c}_{1}^{\text{NI}}\}\cap \{{T}_{2}^{\text{AS}}>{c}_{2}^{\text{AS}}\}\cap \{{T}_{2}^{\text{NI}}>{c}_{2}^{\text{NI}}\}]\\ +Pr[\{{T}_{1}^{\text{AS}}>{c}_{1}^{\text{AS}}\}\cap \{{T}_{1}^{\text{NI}}\le {c}_{1}^{\text{NI}}\}\cap \{{T}_{2}^{\text{AS}}>{c}_{2}^{\text{AS}}\}\cap \{{T}_{2}^{\text{NI}}>{c}_{2}^{\text{NI}}\}]\\ +Pr[\{{T}_{1}^{\text{AS}}>{c}_{1}^{\text{AS}}\}\cap \{{T}_{1}^{\text{NI}}\le {c}_{1}^{\text{NI}}\}\cap \{{T}_{2}^{\text{AS}}\le {c}_{2}^{\text{AS}}\}\cap \{{T}_{2}^{\text{NI}}>{c}_{2}^{\text{NI}}\}]\\ 
={\int}_{-\infty}^{{c}_{1}^{\text{AS}}}{\int}_{-\infty}^{{c}_{1}^{\text{NI}}}{\int}_{{c}_{2}^{\text{AS}}}^{\infty}{\int}_{{c}_{2}^{\text{NI}}}^{\infty}{f}_{4}({t}_{1}^{\text{AS}},{t}_{1}^{\text{NI}},{t}_{2}^{\text{AS}},{t}_{2}^{\text{NI}})\phantom{\rule{0.16667em}{0ex}}{dt}_{2}^{\text{NI}}\phantom{\rule{0.16667em}{0ex}}{dt}_{2}^{\text{AS}}\phantom{\rule{0.16667em}{0ex}}{dt}_{1}^{\text{NI}}\phantom{\rule{0.16667em}{0ex}}{dt}_{1}^{\text{AS}}\\ +{\int}_{-\infty}^{{c}_{1}^{\text{AS}}}{\int}_{{c}_{1}^{\text{NI}}}^{\infty}{\int}_{{c}_{2}^{\text{AS}}}^{\infty}{\int}_{{c}_{2}^{\text{NI}}}^{\infty}{f}_{4}({t}_{1}^{\text{AS}},{t}_{1}^{\text{NI}},{t}_{2}^{\text{AS}},{t}_{2}^{\text{NI}})\phantom{\rule{0.16667em}{0ex}}{dt}_{2}^{\text{NI}}\phantom{\rule{0.16667em}{0ex}}{dt}_{2}^{\text{AS}}\phantom{\rule{0.16667em}{0ex}}{dt}_{1}^{\text{NI}}\phantom{\rule{0.16667em}{0ex}}{dt}_{1}^{\text{AS}}\\ +{\int}_{{c}_{1}^{\text{AS}}}^{\infty}{\int}_{-\infty}^{{c}_{1}^{\text{NI}}}{\int}_{{c}_{2}^{AS}}^{\infty}{\int}_{{c}_{2}^{NI}}^{\infty}{f}_{4}({t}_{1}^{\text{AS}},{t}_{1}^{\text{NI}},{t}_{2}^{\text{AS}},{t}_{2}^{\text{NI}})\phantom{\rule{0.16667em}{0ex}}{dt}_{2}^{\text{NI}}\phantom{\rule{0.16667em}{0ex}}{dt}_{2}^{\text{AS}}\phantom{\rule{0.16667em}{0ex}}{dt}_{1}^{\text{NI}}\phantom{\rule{0.16667em}{0ex}}{dt}_{1}^{\text{AS}}\\ +{\int}_{{c}_{1}^{\text{AS}}}^{\infty}{\int}_{-\infty}^{{c}_{1}^{\text{NI}}}{\int}_{-\infty}^{{c}_{2}^{\text{AS}}}{\int}_{{c}_{2}^{\text{NI}}}^{\infty}{f}_{4}({t}_{1}^{\text{AS}},{t}_{1}^{\text{NI}},{t}_{2}^{\text{AS}},{t}_{2}^{\text{NI}})\phantom{\rule{0.16667em}{0ex}}{dt}_{2}^{\text{NI}}\phantom{\rule{0.16667em}{0ex}}{dt}_{2}^{\text{AS}}\phantom{\rule{0.16667em}{0ex}}{dt}_{1}^{\text{NI}}\phantom{\rule{0.16667em}{0ex}}{dt}_{1}^{\text{AS}},\end{array}$$

where *f _{k}*(·)
is the probability density function of

$${P}_{k}({\mu}_{\mathrm{E}},{\mu}_{\mathrm{R}},{\mu}_{\mathrm{P}},\mathrm{\Delta},{\sigma}^{2})=\{\begin{array}{ll}Pr[\{{T}_{1}^{\text{AS}}>{c}_{1}^{\text{AS}}\}\cap \{{T}_{1}^{\text{NI}}>{c}_{1}^{\text{NI}}\}],\hfill & k=1,\hfill \\ Pr\phantom{\rule{0.16667em}{0ex}}\left[{\bigcap}_{l=1}^{k-1}\{\{{T}_{l}^{\text{AS}}\le {c}_{l}^{\text{AS}}\}\cup \{{T}_{l}^{\text{NI}}\le {c}_{l}^{\text{NI}}\}\}\cap \{{T}_{k}^{\text{AS}}>{c}_{k}^{\text{AS}}\}\cap \{{T}_{k}^{\text{NI}}>{c}_{k}^{\text{NI}}\}\right],\hfill & k\ge 2.\hfill \end{array}$$
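Once the stopping probabilities *P _{k}* are available from either framework, ASN1 is a simple weighted sum. The Python sketch below illustrates the second form of the ASN1 formula; the function name is hypothetical, *N _{k}* is assumed to be the cumulative total sample size across all groups at the *k*th analysis, and the probabilities in the usage lines are made-up values rather than results from this paper.

```python
def asn1(total_n_by_analysis, stop_probs):
    """ASN1 = N_K + sum_k (N_k - N_K) * P_k.

    total_n_by_analysis : cumulative total sample sizes N_1, ..., N_K
    stop_probs          : stopping probabilities P_1, ..., P_K
    """
    if len(total_n_by_analysis) != len(stop_probs):
        raise ValueError("need one stopping probability per analysis")
    n_max = total_n_by_analysis[-1]
    # The k = K term vanishes because N_K - N_K = 0.
    return n_max + sum((n_k - n_max) * p_k
                       for n_k, p_k in zip(total_n_by_analysis, stop_probs))

# Hypothetical design with four equally spaced analyses.
print(asn1([100, 200, 300, 400], [0.1, 0.2, 0.3, 0.2]))
```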

When using DF-A for the fixed margin approach, we have an option for discontinuing the placebo group at the interim when the AS is demonstrated. In this situation, the ASN can be calculated by

$$\begin{array}{l}\text{ASN}2=\sum _{k=1}^{K}\sum _{l=1}^{k}\{{N}_{k}-({n}_{\mathrm{P}k}-{n}_{\mathrm{P}l})\}{P}_{k\mid l}({\mu}_{\mathrm{E}},{\mu}_{\mathrm{R}},{\mu}_{\mathrm{P}},\mathrm{\Delta},{\sigma}^{2})+\left(1-\sum _{k=1}^{K}{P}_{k}({\mu}_{\mathrm{E}},{\mu}_{\mathrm{R}},{\mu}_{\mathrm{P}},\mathrm{\Delta},{\sigma}^{2})\right)\phantom{\rule{0.16667em}{0ex}}{N}_{K}\\ ={N}_{K}+\sum _{k=1}^{K}\sum _{l=1}^{k}\{{N}_{k}-{N}_{K}-({n}_{\mathrm{P}k}-{n}_{\mathrm{P}l})\}{P}_{k\mid l}({\mu}_{\mathrm{E}},{\mu}_{\mathrm{R}},{\mu}_{\mathrm{P}},\mathrm{\Delta},{\sigma}^{2}),\end{array}$$

where
*P _{k|l}*(

$${P}_{k\mid l}({\mu}_{\mathrm{E}},{\mu}_{\mathrm{R}},{\mu}_{\mathrm{P}},\mathrm{\Delta},{\sigma}^{2})=\{\begin{array}{ll}Pr[\{{T}_{1}^{\text{AS}}>{c}_{1}^{\text{AS}}\}\cap \{{T}_{1}^{\text{NI}}>{c}_{1}^{\text{NI}}\}],\hfill & \text{if}\phantom{\rule{0.16667em}{0ex}}k=l=1,\hfill \\ Pr\phantom{\rule{0.16667em}{0ex}}\left[{\bigcap}_{l=1}^{k-1}\{{T}_{l}^{\text{AS}}\le {c}_{l}^{\text{AS}}\}\cap \{{T}_{k}^{\text{AS}}>{c}_{k}^{\text{AS}}\}\cap \{{T}_{k}^{\text{NI}}>{c}_{k}^{\text{NI}}\}\right],\hfill & \text{if}\phantom{\rule{0.16667em}{0ex}}k=l\ge 2,\hfill \\ Pr\phantom{\rule{0.16667em}{0ex}}\left[\{{T}_{1}^{\text{AS}}>{c}_{1}^{\text{AS}}\}\cap {\bigcap}_{l=1}^{k-1}\{{T}_{l}^{\text{NI}}\le {c}_{l}^{\text{NI}}\}\cap \{{T}_{k}^{\text{NI}}>{c}_{k}^{\text{NI}}\}\right],\hfill & \text{if}\phantom{\rule{0.16667em}{0ex}}k>l=1,\hfill \\ Pr\phantom{\rule{0.16667em}{0ex}}\left[{\bigcap}_{m=1}^{l-1}\{{T}_{m}^{\text{AS}}\le {c}_{m}^{\text{AS}}\}\cap \{{T}_{l}^{\text{AS}}>{c}_{l}^{\text{AS}}\}\cap {\bigcap}_{n=l}^{k-1}\{{T}_{n}^{\text{NI}}\le {c}_{n}^{\text{NI}}\}\cap \{{T}_{k}^{\text{NI}}>{c}_{k}^{\text{NI}}\}\right],\hfill & \text{if}\phantom{\rule{0.16667em}{0ex}}k>l\ge 2.\hfill \end{array}$$
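ASN2 can be evaluated in the same way once the probabilities *P _{k|l}* are known. The following Python sketch implements the second form of the ASN2 formula; the function name and the dictionary layout keyed by (*k*, *l*) are illustrative assumptions, and the probabilities in the usage lines are hypothetical.

```python
def asn2(total_n, placebo_n, stop_probs_kl):
    """ASN2 = N_K + sum_{k} sum_{l<=k} {N_k - N_K - (n_Pk - n_Pl)} * P_{k|l}.

    total_n       : cumulative total sample sizes N_1, ..., N_K
    placebo_n     : cumulative placebo sample sizes n_P1, ..., n_PK
    stop_probs_kl : dict mapping (k, l), 1-based with l <= k, to P_{k|l}
    """
    n_max = total_n[-1]
    value = n_max
    for (k, l), prob in stop_probs_kl.items():
        if not 1 <= l <= k <= len(total_n):
            raise ValueError("indices must satisfy 1 <= l <= k <= K")
        # Placebo participants not enrolled between analyses l and k.
        placebo_saved = placebo_n[k - 1] - placebo_n[l - 1]
        value += (total_n[k - 1] - n_max - placebo_saved) * prob
    return value

# Hypothetical design: 30 participants per group added at each of 4 analyses.
probs = {(1, 1): 0.1, (2, 1): 0.2, (2, 2): 0.1}
print(asn2([90, 180, 270, 360], [30, 60, 90, 120], probs))
```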

We briefly describe the derivation of conditional powers discussed in Section 4. As the powers based on the DF-A or DF-B for the fixed margin and the fraction approaches can be derived in the same way, we only focus on the conditional powers based on the DF-A for the fixed margin approach.

Under the planned maximum sample size and a given observed value of ( ${T}_{k}^{\text{AS}},{T}_{k}^{\text{NI}}$), the conditional power based on the DF-A
evaluated at the *k*th analysis is

$$\text{CP}=\{\begin{array}{ll}Pr[\{{T}_{K}^{\text{AS}}>{c}_{K}^{\text{AS}}\}\cap \{{T}_{K}^{\text{NI}}>{c}_{K}^{\text{NI}}\}\mid {T}_{k}^{\text{AS}}={t}_{k}^{\text{AS}},{T}_{k}^{\text{NI}}={t}_{k}^{\text{NI}}],\hfill & \text{if}\phantom{\rule{0.16667em}{0ex}}{T}_{l}^{\text{NI}}\le {c}_{l}^{\text{NI}}\phantom{\rule{0.16667em}{0ex}}\text{and}\phantom{\rule{0.16667em}{0ex}}{T}_{l}^{\text{AS}}\le {c}_{l}^{\text{AS}}\phantom{\rule{0.16667em}{0ex}}\text{for}\phantom{\rule{0.16667em}{0ex}}\text{all}\phantom{\rule{0.16667em}{0ex}}l=1,\dots ,k,\hfill \\ \phantom{\rule{0ex}{0ex}}Pr[{T}_{K}^{\text{NI}}>{c}_{K}^{\text{NI}}\mid {T}_{k}^{\text{NI}}={t}_{k}^{\text{NI}}],\hfill & \text{if}\phantom{\rule{0.16667em}{0ex}}{T}_{l}^{\text{NI}}\le {c}_{l}^{\text{NI}}\phantom{\rule{0.16667em}{0ex}}\text{for}\phantom{\rule{0.16667em}{0ex}}\text{all}\phantom{\rule{0.16667em}{0ex}}l=1,\dots ,k,\text{and}\phantom{\rule{0.16667em}{0ex}}{T}_{{l}^{\prime}}^{\text{AS}}>{c}_{{l}^{\prime}}^{\text{AS}}\phantom{\rule{0.16667em}{0ex}}\text{for}\phantom{\rule{0.16667em}{0ex}}\text{any}\phantom{\rule{0.16667em}{0ex}}{l}^{\prime}=1,\dots ,k.\hfill \end{array}$$

(B1)

For the fixed margin approach, the conditional distribution of ( ${T}_{K}^{\text{AS}},{T}_{K}^{\text{NI}}\mid {T}_{k}^{\text{AS}}={t}_{k}^{\text{AS}},{T}_{k}^{\text{NI}}={t}_{k}^{\text{NI}}$) is a bivariate normal distribution with mean vector given as

$${\left(\sqrt{m}{t}_{k}^{\text{AS}}+\sqrt{1-m}\sqrt{{n}_{\mathrm{E}K}-{n}_{\mathrm{E}k}}\frac{{\delta}_{1}}{\sqrt{1/{C}_{\mathrm{R}}+1/{C}_{\mathrm{P}}}},\sqrt{m}{t}_{k}^{\text{NI}}+\sqrt{1-m}\sqrt{{n}_{\mathrm{E}K}-{n}_{\mathrm{E}k}}\frac{{\delta}_{2}}{\sqrt{1+1/{C}_{\mathrm{R}}}}\right)}^{\mathrm{T}}$$

and covariance matrix given as $(1-m)\left(\begin{array}{cc}1& \rho \\ \rho & 1\end{array}\right)$, where *m* =
*n*_{E}* _{k}*/

$$\text{CP}({\delta}_{1},{\delta}_{2})=\{\begin{array}{c}{\mathrm{\Phi}}_{2}\phantom{\rule{0.16667em}{0ex}}\left(-\frac{{c}_{K}^{\text{AS}}-\sqrt{m}{t}_{k}^{\text{AS}}}{\sqrt{1-m}}+\sqrt{{n}_{\mathrm{E}K}-{n}_{\mathrm{E}k}}\frac{{\delta}_{1}}{\sqrt{1/{C}_{\mathrm{R}}+1/{C}_{\mathrm{P}}}},-\frac{{c}_{K}^{\text{NI}}-\sqrt{m}{t}_{k}^{\text{NI}}}{\sqrt{1-m}}+\sqrt{{n}_{\mathrm{E}K}-{n}_{\mathrm{E}k}}\frac{{\delta}_{2}}{\sqrt{1+1/{C}_{\mathrm{R}}}}\right),\\ \text{if}\phantom{\rule{0.16667em}{0ex}}{T}_{l}^{\text{NI}}\le {c}_{l}^{\text{NI}}\phantom{\rule{0.16667em}{0ex}}\text{and}\phantom{\rule{0.16667em}{0ex}}{T}_{l}^{\text{AS}}\le {c}_{l}^{\text{AS}}\phantom{\rule{0.16667em}{0ex}}\text{for}\phantom{\rule{0.16667em}{0ex}}\text{all}\phantom{\rule{0.16667em}{0ex}}l=1,\dots ,k,\\ 1-{\mathrm{\Phi}}_{1}\phantom{\rule{0.16667em}{0ex}}\left(\frac{{c}_{K}^{\text{NI}}-\sqrt{m}{t}_{k}^{\text{NI}}}{\sqrt{1-m}}-\sqrt{{n}_{\mathrm{E}K}-{n}_{\mathrm{E}k}}\frac{{\delta}_{2}}{\sqrt{1+1/{C}_{\mathrm{R}}}}\right),\hfill \\ \text{if}\phantom{\rule{0.16667em}{0ex}}{T}_{l}^{\text{NI}}\le {c}_{l}^{\text{NI}}\phantom{\rule{0.16667em}{0ex}}\text{for}\phantom{\rule{0.16667em}{0ex}}\text{all}\phantom{\rule{0.16667em}{0ex}}l=1,\dots ,k,\text{and}\phantom{\rule{0.16667em}{0ex}}{T}_{{l}^{\prime}}^{\text{AS}}>{c}_{{l}^{\prime}}^{\text{AS}}\phantom{\rule{0.16667em}{0ex}}\text{for}\phantom{\rule{0.16667em}{0ex}}\text{any}\phantom{\rule{0.16667em}{0ex}}{l}^{\prime}=1,\dots ,k,\end{array}$$

where $\Phi_1(\cdot)$ and $\Phi_2(\cdot)$ are the cumulative distribution functions of the standardized univariate and bivariate normal distributions, respectively.
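As an illustration only, the conditional power expression above can be evaluated numerically along the following lines. This is a minimal sketch, not the authors' implementation: the function names (`phi1`, `phi2`, `conditional_power`) and the flag `as_already_shown` are hypothetical, the correlation $\rho$ is assumed to be supplied by the user, and the bivariate normal probability is approximated by Simpson's rule on a one-dimensional integral rather than by a dedicated multivariate normal routine.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal, mean 0 and variance 1


def phi1(x):
    """Standard univariate normal CDF, Phi_1 in the text."""
    return _STD.cdf(x)


def phi2(a, b, rho, steps=2000):
    """Standard bivariate normal CDF P(X <= a, Y <= b), Phi_2 in the text.

    Assumption: |rho| < 1. Uses the identity
        P = integral_{-inf}^{a} pdf(x) * Phi_1((b - rho*x) / sqrt(1 - rho^2)) dx,
    with the lower limit truncated at -8 and Simpson's rule (steps must be even).
    """
    lo = -8.0
    a = min(a, 8.0)  # mass beyond 8 is negligible
    if a <= lo:
        return 0.0
    s = sqrt(1.0 - rho * rho)

    def f(x):
        return _STD.pdf(x) * _STD.cdf((b - rho * x) / s)

    h = (a - lo) / steps
    total = f(lo) + f(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def conditional_power(t_as, t_ni, c_as, c_ni, n_k, n_K,
                      delta1, delta2, c_r, c_p, rho,
                      as_already_shown=False):
    """Conditional power at interim analysis k (fixed margin approach).

    t_as, t_ni : observed interim statistics t_k^AS and t_k^NI
    c_as, c_ni : final critical values c_K^AS and c_K^NI
    n_k, n_K   : experimental-arm sample sizes n_Ek and n_EK
    c_r, c_p   : the constants C_R and C_P in the formula
    as_already_shown : True selects the second case of the formula
                       (AS boundary already crossed, only NI remains)
    """
    m = n_k / n_K                       # information fraction
    rem = sqrt(n_K - n_k)               # sqrt of remaining sample size
    drift_as = rem * delta1 / sqrt(1.0 / c_r + 1.0 / c_p)
    drift_ni = rem * delta2 / sqrt(1.0 + 1.0 / c_r)
    z_ni = (c_ni - sqrt(m) * t_ni) / sqrt(1.0 - m)
    if as_already_shown:
        return 1.0 - phi1(z_ni - drift_ni)
    z_as = (c_as - sqrt(m) * t_as) / sqrt(1.0 - m)
    return phi2(-z_as + drift_as, -z_ni + drift_ni, rho)


# Illustrative inputs only; none of these values come from the article.
cp_joint = conditional_power(1.5, 1.0, 1.98, 1.98, 50, 100,
                             0.4, 0.3, 1.0, 0.5, rho=-0.4)
cp_ni_only = conditional_power(1.5, 1.0, 1.98, 1.98, 50, 100,
                               0.4, 0.3, 1.0, 0.5, rho=-0.4,
                               as_already_shown=True)
print(cp_joint, cp_ni_only)
```

The second case never gives a smaller value than the first for the same inputs, since it drops the requirement on the assay sensitivity statistic. In practice a dedicated multivariate normal routine would usually replace the Simpson approximation.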

- Committee for medical products for human use (CHMP) [Accessed November 12, 2015];Guideline on the choice of the non-inferiority margin. 2005 Available at: http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500003636.pdf.
- Cui L, Hung HMJ, Wang SJ. Modification of sample size in group sequential clinical trials. Biometrics. 1999;55:853–857. doi: 10.1111/j.0006-341X.1999.00853.x. [PubMed] [Cross Ref]
- D’Agostino RB, Massaro JM, Sullivan LM. Non-inferiority trials: design concepts and issues: the encounters of academic consultants in statistics. Statistics in Medicine. 2003;22:169–186. doi: 10.1002/sim.1425. [PubMed] [Cross Ref]
- Fishbane S, Schiller B, Locatelli F, Covic AC, Provenzano R, Wiecek A, Levin NW, Kaplan M, Macdougall IC, Francisco C, Mayo MR, Polu KR, Duliege AM, Besarab A. for the EMERALD Study Groups. Peginesatide in patients with anemia undergoing hemodialysis. New England Journal of Medicine. 2013;368:307–19. doi: 10.1056/NEJMoa1203165. [PubMed] [Cross Ref]
- Food and Drug Administration (FDA) Guidance for industry non-inferiority trials. Rockville, MD: Food and Drug Administration; 2010. [Accessed July 14, 2015]. Available at: http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM202140.pdf.
- Gao P, Ware JH. Assessing non-inferiority: a combination approach. Statistics in Medicine. 2008;27:392–406. doi: 10.1002/sim.2938. [PubMed] [Cross Ref]
- Genz A. Numerical computation of multivariate normal probabilities. Journal of Computational and Graphical Statistics. 1992;1:141–149. doi: 10.1080/10618600.1992.10477010. [Cross Ref]
- Hauschke D, Pigeot I. Establishing efficacy of a new experimental treatment in the ‘gold standard’ design. Biometrical Journal. 2005a;47:782–786. doi: 10.1002/bimj.200510169. [PubMed] [Cross Ref]
- Hauschke D, Pigeot I. Rejoinder to “Establishing efficacy of a new experimental treatment in the ‘gold standard’ design” Biometrical Journal. 2005b;47:797–798. doi: 10.1002/bimj.200510179. [PubMed] [Cross Ref]
- Hasler M, Vonk R, Hothorn LA. Assessing non-inferiority of a new treatment in a three-arm trial in the presence of heteroscedasticity. Statistics in Medicine. 2008;27:490–503. doi: 10.1002/sim.3052. [PubMed] [Cross Ref]
- Hamasaki T, Asakura K, Evans SR, Sugimoto T, Sozu T. Group-sequential strategies in clinical trials with multiple co-primary endpoints. Statistics in Biopharmaceutical Research. 2015;7:36–54. doi: 10.1080/19466315.2014.1003090. [PMC free article] [PubMed] [Cross Ref]
- Hamasaki T, Sugimoto T, Evans SR, Sozu T. Sample size determination for clinical trials with co-primary outcomes: Exponential event times. Pharmaceutical Statistics. 2013;12:28–34. doi: 10.1002/pst.1545. [PMC free article] [PubMed] [Cross Ref]
- Hida E, Tango T. On the three-arm non-inferiority trial including a placebo with a prespecified margin. Statistics in Medicine. 2011a;30:224–231. doi: 10.1002/sim.4099. [PubMed] [Cross Ref]
- Hida E, Tango T. Response to Joachim Röhmel and Iris Pigeot. Statistics in Medicine. 2011b;30:3165. doi: 10.1002/sim.4313. [Cross Ref]
- Hida E, Tango T. Three-arm noninferiority trials with a prespecified margin for inference of the difference in the proportions of binary endpoints. Journal of Biopharmaceutical Statistics. 2013;23:774–789. doi: 10.1080/10543406.2013.789893. [PubMed] [Cross Ref]
- International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) [Accessed July 14, 2015];ICH Harmonised Tripartite Guideline E10: Choice of control group and related issues in clinical trials. 2000 Jul; Available at: http://www.ich.org/pdfICH/e10step4.pdf.
- Kieser M, Friede T. Planning and analysis of three-arm non-inferiority trials with binary endpoints. Statistics in Medicine. 2007;26:253–273. doi: 10.1002/sim.2543. [PubMed] [Cross Ref]
- Koch A, Röhmel J. Hypothesis testing in the “gold standard” design for proving the efficacy of an experimental treatment. Journal of Biopharmaceutical Statistics. 2004;14:315–325. doi: 10.1081/BIP-120037182. [PubMed] [Cross Ref]
- Kombrink K, Munk A, Friede T. Design and semiparametric analysis of non-inferiority trials with active and placebo control for censored time-to-event data. Statistics in Medicine. 2013;32:3055–3066. doi: 10.1002/sim.5769. [PubMed] [Cross Ref]
- Lan KKG, DeMets DL. Discrete sequential boundaries for clinical trials. Biometrika. 1983;70:659–663. doi: 10.1093/biomet/70.3.659. [Cross Ref]
- Li G, Gao S. A group sequential type design for three-arm non-inferiority trials with binary endpoints. Biometrical Journal. 2010;52:504–518. doi: 10.1002/bimj.200900188. [PubMed] [Cross Ref]
- Mielke M, Munk A, Schacht A. The assessment of non-inferiority in a gold standard design with censored, exponentially distributed endpoints. Statistics in Medicine. 2008;27:5093–5110. doi: 10.1002/sim.3348. [PubMed] [Cross Ref]
- O’Brien PC, Fleming TR. A multiple testing procedure for clinical trials. Biometrics. 1979;35:549–556. doi: 10.2307/2530245. [PubMed] [Cross Ref]
- Pigeot I, Schäfer J, Röhmel J, Hauschke D. Assessing non-inferiority of a new treatment in a three-arm clinical trial including a placebo. Statistics in Medicine. 2003;22:883–899. doi: 10.1002/sim.1450. [PubMed] [Cross Ref]
- Pocock SJ. Group sequential methods in the design and analysis of clinical trials. Biometrika. 1977;64:191–199. doi: 10.1093/biomet/64.2.191. [Cross Ref]
- Röhmel J, Pigeot I. Statistical strategies for the analysis of clinical trials with an experimental treatment, an active control and placebo, and a prespecified fixed non-inferiority margin for the difference in means. Statistics in Medicine. 2011;30:3162–3164. doi: 10.1002/sim.4299. [PubMed] [Cross Ref]
- Rothmann MD, Wiens BL, Chan ISF. Design and Analysis of Non-Inferiority Trials. Chapman & Hall/CRC; 2012.
- Schlömer P, Brannath W. Group sequential designs for three-arm ‘gold standard’ non-inferiority trials with fixed margin. Statistics in Medicine. 2013;32:4875–4899. doi: 10.1002/sim.5950. [PubMed] [Cross Ref]
- Stucke K, Kieser M. A general approach for sample size calculation for the three-arm ‘gold standard’ non-inferiority design. Statistics in Medicine. 2012;31:3579–3596. doi: 10.1002/sim.5461. [PubMed] [Cross Ref]
- Sugimoto T, Sozu T, Hamasaki T. A convenient formula for sample size calculations in clinical trials with multiple co-primary continuous endpoints. Pharmaceutical Statistics. 2012;11:118–128. doi: 10.1002/pst.505. [PubMed] [Cross Ref]
- Tang ML, Tang NS. Test of noninferiority via rate difference for three-arm clinical trials with placebo. Journal of Biopharmaceutical Statistics. 2004;14:337–347. doi: 10.1081/BIP-120037184. [PubMed] [Cross Ref]
- Tsong Y, Zhang J, Wang SJ. Group sequential design and analysis of clinical equivalence assessment for generic nonsystematic drug products. Journal of Biopharmaceutical Statistics. 2004;14:359–373. doi: 10.1081/BIP-120037186. [PubMed] [Cross Ref]
