HHS Public Access. Author manuscript; accepted for publication in a peer-reviewed journal.
 
J Biopharm Stat. Author manuscript; available in PMC 2018 January 1.
Published in final edited form as:
J Biopharm Stat. 2017; 27(1): 1–24.
Published online 2016 February 18. doi:  10.1080/10543406.2016.1148710
PMCID: PMC4990829
NIHMSID: NIHMS768439

Group-sequential three-arm noninferiority clinical trial designs

Abstract

We discuss group-sequential three-arm noninferiority clinical trial designs that include active and placebo controls for evaluating both assay sensitivity and noninferiority. We extend two existing approaches, the fixed margin and fraction approaches, into a group-sequential setting with two decision-making frameworks. We investigate the operating characteristics, including power, Type I error rate, and maximum and expected sample sizes, as design factors vary. In addition, we discuss sample size recalculation and its impact on the power and Type I error rate via a simulation study.

Keywords: Average sample number, Assay sensitivity, Cui-Hung-Wang Statistics, Fixed margin approach, Fraction approach, Maximum sample size, Sample size recalculation, Type I error

1. Introduction

Active-controlled noninferiority trial designs are an alternative to placebo-controlled superiority designs when use of a placebo control is ethically undesirable due to the availability of a proven effective medical intervention. Active-controlled noninferiority trial designs include an existing effective intervention such as an effective standard of care. In contrast to superiority trials, where there is interest in evaluating whether an intervention is superior to a control (e.g., placebo), noninferiority trials evaluate whether an intervention is noninferior to the control. In a noninferiority trial, the null hypothesis of inferiority is assumed to be true unless there are sufficient data to reject it in favor of the alternative (noninferiority). Noninferiority is assessed by evaluating whether inferiority of a pre-specified magnitude (called a noninferiority margin) can be ruled out with reasonable confidence using confidence intervals. The noninferiority margin is carefully selected to ensure that a noninferiority result would: (1) imply retention of some of the effect that the active control has historically displayed (i.e., when compared to placebo), and (2) rule out clinically important levels of inferiority so that clinical application would be ethical and clinically acceptable.

For example, EMERALD 1 (conducted in the United States) and EMERALD 2 (conducted in Europe) are randomized, controlled, open-label, noninferiority clinical trials to evaluate the efficacy and safety of peginesatide as the maintenance treatment of anemia in patients with chronic renal failure who were receiving hemodialysis and were previously treated with epoetin (Fishbane et al., 2013). Both trials included a 6-week screening period, a 28-week initial dose-adjustment period, an 8-week evaluation period, and a longer-term follow-up period (≥16 additional weeks). Eligible participants were randomly assigned, in a 2:1 ratio, to receive peginesatide once every 4 weeks or to continue to receive epoetin (epoetin alfa in EMERALD 1, and epoetin beta in EMERALD 2) one to three times a week. The frequency and route of administration of epoetin were determined based on the treatment regimen during the screening period. The primary efficacy endpoint was the change from the baseline hemoglobin level during the evaluation period. Noninferiority for both trials would be established if the lower limit of the two-sided 95% confidence interval was −1.0 g per deciliter or higher, indicating that inferiority of more than 1.0 g per deciliter, compared to epoetin, could be ruled out with reasonable confidence.

For noninferiority clinical trials to be valid, two assumptions (constancy and assay sensitivity) must be satisfied (D’Agostino, Massaro, Sullivan, 2003; ICH 2000; FDA, 2010). An active intervention that has been shown to be efficacious (e.g., superior to placebo) in a historical trial may be considered as the active control in a noninferiority trial, but the most effective should be selected. The constancy assumption states that the demonstrated effect of the active control over placebo in the historical trial has not changed over time, i.e., would be the same as the effect in the current trial if a placebo group were included. This may not be the case if there were differences in trial conduct (e.g., differences in treatment administration, endpoints, or population) between the historical and current trials. This assumption is not testable in a trial without a concurrent placebo group.

Another important design assumption is assay sensitivity, i.e., the ability of the trial to detect differences between strategies if they truly exist. Otherwise noninferiority may be concluded simply due to insensitivity of the trial to detect differences. In noninferiority trials, assay sensitivity can be reduced (intentionally or unintentionally), essentially making strategies appear similar, by diluting effects through subtle choices about design and conduct. Many factors can affect assay sensitivity, including: poor disease diagnosis, endpoint selection and timing, poor adherence, loss-to-follow-up, prior therapy, inclusion of subgroups where treatment effects may be small, and use of concomitant therapies. Furthermore, the active-control nature of most noninferiority trials can make clinicians and participants more likely to rate outcomes positively, driving the results toward noninferiority.

The methodologies for two-arm (an experimental intervention and an effective active control) noninferiority clinical trials have been well-established. However, two-arm noninferiority trials often lack the necessary support for the assay sensitivity and constancy assumptions. As a result, inclusion of a third arm (placebo) into the trial has been proposed to address these concerns (Pigeot et al., 2003; Koch and Röhmel, 2004; Hauschke and Pigeot, 2005a). Regulatory authorities often recommend the use of such a three-arm (experimental intervention, active control, and placebo) noninferiority trial design (ICH, 2000; CHMP, 2005; FDA, 2010). The three-arm noninferiority trial offers several scientific advantages (ICH, 2000). Specifically, these designs provide the opportunity to establish assay sensitivity via a comparison of the placebo to the active control intervention within the trial. Although the three-arm noninferiority design provides such scientific advantages, it also poses challenges: (1) there may be ethical constraints to using a placebo, and (2) there is the added complexity of evaluating two distinct objectives: evaluation of (i) the superiority of the active control intervention to placebo (assay sensitivity) and (ii) the noninferiority of the experimental intervention to the active control intervention (noninferiority). This may result in a sample size that is too large for the trial to be practical. One approach to address this concern is the use of group-sequential designs. The group-sequential design offers the possibility to stop a trial early when evidence is overwhelming and thus offers efficiency (i.e., potentially fewer trial participants and less time that participants receive a placebo, compared to fixed-sample designs).

In this paper, we discuss group-sequential designs for three-arm noninferiority clinical trials. We extend two existing approaches for evaluating noninferiority and assay sensitivity into a group-sequential setting. One approach is discussed by Koch and Röhmel (2004), and Hida and Tango (2011a, 2013) (hereafter we call this the “fixed margin approach”), and the other is the so-called “fraction approach” proposed by Pigeot et al. (2003). We consider a three-arm noninferiority trial that has two co-primary objectives: (i) to evaluate if the control intervention is superior to placebo (assay sensitivity: AS) and (ii) to evaluate if the experimental intervention is not less effective than the control intervention by a prespecified non-inferiority margin (noninferiority: NI). Objective (ii) is relevant when the experimental intervention has advantages over the control (e.g., safer, more convenient, or less costly). On the other hand, in many noninferiority clinical trials, especially in a regulatory setting, demonstrating the superiority of the experimental intervention to placebo is desirable. However, as Gao and Ware (2008) discuss, if the assay sensitivity assumption does not hold, then there will be uncertainty regarding whether a noninferiority result means that the interventions are similarly effective or similarly ineffective. In this paper, when there is a concern about the assay sensitivity, to make the evaluation of objective (ii) more interpretable, we evaluate a direct comparison of the control intervention to the placebo. For related discussions, please see Hauschke and Pigeot (2005a, 2005b) and Stucke and Kieser (2012).

Three-arm noninferiority clinical trials in a group-sequential setting have been discussed (Li and Gao, 2010; Schlömer and Brannath, 2013), but methodologies are still needed. Extensions of the fraction approach are discussed by Li and Gao (2010) and the fixed margin approach by Schlömer and Brannath (2013), in a setting of two-stage group-sequential three-arm noninferiority clinical trials with continuous or binary outcomes. We discuss two decision-making frameworks for the two approaches when the primary endpoint is continuous. We also discuss a method for sample size recalculation based on the observed effect size at an interim timepoint of the trial.

This paper is structured as follows: in Section 2, we describe the statistical settings and provide methods for the overall power for rejecting the null hypotheses for assay sensitivity and noninferiority when using the two decision-making frameworks for the fixed margin and the fraction approaches in a group-sequential setting. Then, in Section 3, we evaluate the operating characteristics, including power, Type I error rate, and sample sizes, as design factors vary. In Section 4, we discuss sample size recalculation and consider its impact on the power and Type I error rate via a simulation study. In Section 5, we discuss a further extension, and we summarize the findings in Section 6.

2. Assessment of assay sensitivity and noninferiority in group-sequential designs

Consider a three-arm noninferiority group-sequential clinical trial with a maximum of K planned analyses (K ≥ 2). Let $n_{Ek}$, $n_{Rk}$ and $n_{Pk}$ be the cumulative numbers of participants in the experimental intervention (E), control intervention (R), and placebo (P) groups, respectively, at the kth analysis (k = 1, …, K). Let the allocation ratios of the control intervention and placebo groups relative to the experimental intervention group be $n_{Ek} : n_{Rk} : n_{Pk} = 1 : C_R : C_P$, where $C_R > 0$ and $C_P > 0$. When the groups are equally sized, $C_R = C_P = 1$. Hence up to $n_{EK}$, $n_{RK} = C_R n_{EK}$ and $n_{PK} = C_P n_{EK}$ participants are recruited and randomly assigned to the three groups, so that the sample size required for the final analysis is $N_K = n_{EK} + n_{RK} + n_{PK} = (1 + C_R + C_P) n_{EK}$.

Assume that the group outcomes $X_{Ei_E}$, $X_{Ri_R}$ and $X_{Pi_P}$ are independently and normally distributed with common variance $\sigma^2$, i.e., $X_{Ei_E} \sim N(\mu_E, \sigma^2)$, $X_{Ri_R} \sim N(\mu_R, \sigma^2)$ and $X_{Pi_P} \sim N(\mu_P, \sigma^2)$, respectively ($i_E = 1, \ldots, n_{Ek}$; $i_R = 1, \ldots, n_{Rk}$; $i_P = 1, \ldots, n_{Pk}$), where $\mu_E$, $\mu_R$ and $\mu_P$ are the means of the experimental intervention, active control and placebo groups, respectively, and that a larger mean represents a more preferable outcome. For simplicity, the variance $\sigma^2$ is assumed to be known.

2.1 The fixed margin approach

For the fixed margin approach, the hypotheses for evaluating AS and NI are:

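$$H_0^{AS}: \mu_R - \mu_P \le \Delta \quad \text{versus} \quad H_1^{AS}: \mu_R - \mu_P > \Delta, \tag{1}$$

$$H_0^{NI}: \mu_E - \mu_R \le -\Delta \quad \text{versus} \quad H_1^{NI}: \mu_E - \mu_R > -\Delta, \tag{2}$$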

where Δ(> 0) is a pre-specified noninferiority margin (Hida and Tango, 2011a). This approach imposes an extra condition on the hypothesis testing for the AS, that is superiority of the control intervention to the placebo is demonstrated with a margin Δ. However, the key feature of the approach is that the inequalities μP < μR − Δ< μE hold for any value of Δ if both of the null hypotheses H0AS and H0NI are rejected at the significance level of α for a one-sided test. This means that the superiority of the experimental intervention relative to the placebo can be indirectly demonstrated if H0AS and H0NI are rejected, without direct comparison of the experimental intervention to the placebo. This avoids introduction of further complexities in adjustment to the Type I or Type II error (Hida and Tango, 2011a).
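The ordering of the three means follows directly from the two alternative hypotheses. The short derivation below is only a restatement of hypotheses (1) and (2) in the paper's notation, not an additional result.

```latex
\begin{align*}
\text{reject } H_0^{AS} &\;\Rightarrow\; \mu_R - \mu_P > \Delta
  \;\Rightarrow\; \mu_P < \mu_R - \Delta, \\
\text{reject } H_0^{NI} &\;\Rightarrow\; \mu_E - \mu_R > -\Delta
  \;\Rightarrow\; \mu_R - \Delta < \mu_E, \\
\text{hence}\quad & \mu_P < \mu_R - \Delta < \mu_E .
\end{align*}
```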

We are now interested in hypothesis testing for AS and NI based on the fixed margin approach within a group-sequential setting. The corresponding statistics for testing hypotheses (1) and (2) at the kth analysis are given by

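$$T_k^{AS} = \frac{\bar{X}_{Rk} - \bar{X}_{Pk} - \Delta}{\sigma\sqrt{1/n_{Rk} + 1/n_{Pk}}} \quad \text{and} \quad T_k^{NI} = \frac{\bar{X}_{Ek} - \bar{X}_{Rk} + \Delta}{\sigma\sqrt{1/n_{Ek} + 1/n_{Rk}}},$$

where $\bar{X}_{Ek}$, $\bar{X}_{Rk}$ and $\bar{X}_{Pk}$ are the sample means in the experimental intervention, active control intervention and placebo groups at the kth analysis, given by $\bar{X}_{Ek} = \left(\sum_{i_E=1}^{n_{Ek}} X_{Ei_E}\right)/n_{Ek}$, $\bar{X}_{Rk} = \left(\sum_{i_R=1}^{n_{Rk}} X_{Ri_R}\right)/n_{Rk}$ and $\bar{X}_{Pk} = \left(\sum_{i_P=1}^{n_{Pk}} X_{Pi_P}\right)/n_{Pk}$. Then $(T_k^{AS}, T_k^{NI})$ is bivariate normally distributed with the correlation

$$\mathrm{corr}[T_k^{AS}, T_k^{NI}] = \rho = -\sqrt{\frac{n_{Ek}\, n_{Pk}}{(n_{Ek} + n_{Rk})(n_{Rk} + n_{Pk})}} = -\sqrt{\frac{C_P}{(1 + C_R)(C_R + C_P)}}.$$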

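The correlation is determined by the allocation ratios $C_P$ and $C_R$ (Hida and Tango, 2011a). The test statistics $(T_k^{AS}, T_k^{NI})$ are negatively correlated, and the correlation is $\rho = -0.5$ if the intervention groups are equally sized, i.e., $C_R = C_P = 1$. Furthermore, the joint distribution of $(T_1^{AS}, \ldots, T_K^{AS}, T_1^{NI}, \ldots, T_K^{NI})$ is 2K-variate normal with correlations given by $\mathrm{corr}(T_{k'}^{AS}, T_k^{AS}) = \mathrm{corr}(T_{k'}^{NI}, T_k^{NI}) = \sqrt{n_{Ek'}/n_{Ek}}$ and $\mathrm{corr}(T_{k'}^{AS}, T_k^{NI}) = \mathrm{corr}(T_{k'}^{NI}, T_k^{AS}) = \rho\sqrt{n_{Ek'}/n_{Ek}}$ $(1 \le k' \le k \le K)$, since $T_k^{AS}$ and $T_k^{NI}$ can be rewritten as $T_k^{AS} = \sqrt{n_{Ek}}(\bar{X}_{Rk} - \bar{X}_{Pk} - \Delta)/(\sigma\sqrt{1/C_R + 1/C_P})$ and $T_k^{NI} = \sqrt{n_{Ek}}(\bar{X}_{Ek} - \bar{X}_{Rk} + \Delta)/(\sigma\sqrt{1 + 1/C_R})$.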
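As a numerical check on the correlation structure of the fixed margin approach, the following minimal sketch evaluates ρ from the allocation ratios and scales it across analyses. The function names `fixed_margin_corr` and `cross_look_corr` are illustrative and do not come from the paper.

```python
import math


def fixed_margin_corr(c_r: float, c_p: float) -> float:
    """Correlation between T_k^AS and T_k^NI under the fixed margin approach.

    c_r and c_p are the allocation ratios of the control and placebo groups
    relative to the experimental group (n_E : n_R : n_P = 1 : c_r : c_p).
    """
    if c_r <= 0 or c_p <= 0:
        raise ValueError("allocation ratios must be positive")
    return -math.sqrt(c_p / ((1.0 + c_r) * (c_r + c_p)))


def cross_look_corr(rho: float, n_early: float, n_late: float) -> float:
    """Correlation between an AS statistic and an NI statistic taken at two
    different analyses, where n_early <= n_late are the cumulative
    experimental-group sample sizes at those analyses."""
    if not 0 < n_early <= n_late:
        raise ValueError("need 0 < n_early <= n_late")
    return rho * math.sqrt(n_early / n_late)


rho_equal = fixed_margin_corr(1.0, 1.0)    # 1:1:1 allocation
rho_541 = fixed_margin_corr(4 / 5, 1 / 5)  # 5:4:1 allocation
```

With equal allocation the function reproduces the value ρ = −0.5 stated in the text. The correlation is always negative because the control mean enters the two statistics with opposite signs.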

2.2 The fraction approach

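For the fraction approach, the hypotheses for evaluating AS and NI are as follows:

$$H_0^{AS}: \mu_R - \mu_P \le 0 \quad \text{versus} \quad H_1^{AS}: \mu_R - \mu_P > 0, \tag{3}$$

$$H_0^{NI}: (\mu_E - \mu_P)/(\mu_R - \mu_P) \le \theta \quad \text{versus} \quad H_1^{NI}: (\mu_E - \mu_P)/(\mu_R - \mu_P) > \theta, \tag{4}$$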

where $\theta$ $(0 < \theta < 1)$ is pre-specified and determined by $\theta = 1 - \Delta/(\mu_R - \mu_P)$ as a fraction of the difference between $\mu_R$ and $\mu_P$, using the noninferiority margin $\Delta$ (Pigeot et al., 2003). In addition, hypothesis testing is logically ordered, i.e., $H_0^{AS}$ is tested first and then $H_0^{NI}$ is tested if and only if $H_0^{AS}$ is rejected at the prespecified significance level of $\alpha$. If both null hypotheses $H_0^{AS}$ and $H_0^{NI}$ are rejected, then $\mu_E > \mu_P$ irrespective of $\theta$ since $\mu_E - \mu_P > \theta(\mu_R - \mu_P) > 0$. Many authors have discussed the fraction approach in fixed-sample designs; binary outcomes are discussed by Tang and Tang (2004) and Kieser and Friede (2007), time-to-event outcomes by Mielke et al. (2008) and Kombrink et al. (2013), and continuous outcomes with heterogeneous variances by Hasler et al. (2008).

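We focus on hypothesis testing based on the fraction approach within a group-sequential setting. Assuming $\mu_R - \mu_P > 0$, the hypothesis (4) can be rewritten as

$$H_0^{NI}: \mu_E - \theta\mu_R - (1 - \theta)\mu_P \le 0 \quad \text{versus} \quad H_1^{NI}: \mu_E - \theta\mu_R - (1 - \theta)\mu_P > 0.$$

The corresponding statistics for testing hypotheses (3) and (4) at the kth analysis are given by

$$T_k^{AS} = \frac{\bar{X}_{Rk} - \bar{X}_{Pk}}{\sigma\sqrt{1/n_{Rk} + 1/n_{Pk}}} \quad \text{and} \quad T_k^{NI} = \frac{\bar{X}_{Ek} - \theta\bar{X}_{Rk} - (1 - \theta)\bar{X}_{Pk}}{\sigma\sqrt{1/n_{Ek} + \theta^2/n_{Rk} + (1 - \theta)^2/n_{Pk}}}.$$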

The joint distribution of $(T_1^{AS}, \ldots, T_K^{AS}, T_1^{NI}, \ldots, T_K^{NI})$ is 2K-variate normal with the same correlation structure as in the fixed margin approach, except that the correlation of $T_k^{AS}$ and $T_k^{NI}$ is given by

$$\mathrm{corr}[T_k^{AS}, T_k^{NI}] = \rho = \frac{-\theta/C_R + (1 - \theta)/C_P}{\sqrt{1 + \theta^2/C_R + (1 - \theta)^2/C_P}\,\sqrt{1/C_R + 1/C_P}}.$$

The correlation is determined by the fraction $\theta$ and the allocation ratios $C_P$ and $C_R$; it reduces to $\rho = (1 - 2\theta)/(2\sqrt{1 - \theta + \theta^2})$ if the intervention groups are equally sized, i.e., $C_R = C_P = 1$.
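Unlike the fixed margin case, the fraction-approach correlation can be negative, zero, or positive depending on θ. The sketch below evaluates the general expression and compares it with the equal-allocation closed form; `fraction_corr` is a hypothetical helper name used only for illustration.

```python
import math


def fraction_corr(theta: float, c_r: float, c_p: float) -> float:
    """Correlation between T_k^AS and T_k^NI under the fraction approach
    for allocation n_E : n_R : n_P = 1 : c_r : c_p and fraction theta."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    if c_r <= 0 or c_p <= 0:
        raise ValueError("allocation ratios must be positive")
    numerator = -theta / c_r + (1.0 - theta) / c_p
    denominator = (
        math.sqrt(1.0 + theta ** 2 / c_r + (1.0 - theta) ** 2 / c_p)
        * math.sqrt(1.0 / c_r + 1.0 / c_p)
    )
    return numerator / denominator


def fraction_corr_equal(theta: float) -> float:
    """Closed form for equal allocation (c_r = c_p = 1)."""
    return (1.0 - 2.0 * theta) / (2.0 * math.sqrt(1.0 - theta + theta ** 2))


rho_half = fraction_corr(0.5, 1.0, 1.0)  # theta = 0.5 with equal allocation
```

For θ = 0.5 and equal allocation the two statistics are uncorrelated, since the numerator −θ + (1 − θ) vanishes.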

There are important differences between the two approaches (Röhmel and Pigeot, 2011; Hida and Tango, 2011a, 2011b; Stucke and Kieser, 2012). Specifically, the concept of “assay sensitivity” is different. The two approaches lead to different conclusions when $\mu_R - \Delta < \mu_P < \mu_R$ is true (Hida and Tango, 2011b): the fraction approach can reject $H_0^{NI}$, but the fixed margin approach cannot. Whether the fraction approach should allow demonstration of noninferiority of the experimental intervention to the control intervention is questionable under $\mu_R - \Delta < \mu_P$. For further discussion, please see Röhmel and Pigeot (2011), Hida and Tango (2011b), and Stucke and Kieser (2012).

2.3 Decision-making frameworks, stopping rules and powers

We consider the two decision-making frameworks associated with hypothesis testing. The first decision-making framework is flexible, where testing hypotheses for AS and NI are logically ordered similarly as in the fraction approach, i.e., NI is evaluated only after the AS is demonstrated and a trial is terminated if H0AS and H0NI are rejected at any analysis (i.e., not necessarily simultaneously) (DF-A). The other framework is relatively simple and a special case of DF-A, where a clinical trial is terminated if and only if both H0AS and H0NI are rejected simultaneously at the same analysis (DF-B). We separately describe the two decision-making frameworks, corresponding stopping rules and power definitions.

DF-A

Under DF-A, a trial stops if the AS and the NI are achieved at any analysis (i.e., not necessarily simultaneously). NI is evaluated only after the AS is demonstrated. If AS is demonstrated but NI is not, then the trial continues and subsequent hypothesis testing is repeatedly conducted only for NI until the NI is demonstrated. The stopping rule for DF-A is formally given as follows:

  • At the kth analysis (k = k′, …, K − 1),
    • if $T_{k'}^{AS} > c_{k'}^{AS}$ for some k′ (1 ≤ k′ ≤ k) and $T_k^{NI} > c_k^{NI}$, then reject $H_0^{NI}$ and stop the trial,
    • otherwise, continue the trial,
  • at the Kth analysis,
    • if $T_{k'}^{AS} > c_{k'}^{AS}$ for some k′ (1 ≤ k′ ≤ K) and $T_K^{NI} > c_K^{NI}$, then reject $H_0^{NI}$,
    • otherwise, do not reject $H_0^{NI}$,

where $c_k^{AS}$ and $c_k^{NI}$ are the critical boundaries at the kth analysis, which are constant and selected separately for AS and NI to preserve the Type I error of α for each hypothesis, using any group-sequential method such as the Lan-DeMets (LD) alpha-spending method (Lan and DeMets, 1983), analogously to a trial with a single primary objective. For example, consider a three-arm noninferiority clinical trial with a maximum number of analyses K = 4 and equally spaced increments of information, in which the O’Brien-Fleming boundary (O’Brien and Fleming, 1979) is used to reject the null hypotheses for the AS and the NI tests with the same significance level of α = 2.5% for a one-sided test. The boundaries for the four analyses are 4.3326, 2.9631, 2.3590, and 2.0141, respectively. If the AS test is statistically significant at the third analysis, then the NI test is evaluated twice, with the boundary of 2.3590 at the third analysis and 2.0141 at the final analysis, as if the Type I error for the NI test had already been spent at the first and second analyses despite no test being conducted. Even if the AS test is statistically significant at the third analysis, the remaining Type I error of 1.5% (=2.5−1.0) for the AS test is not reallocated to the hypothesis test for NI. If it were reallocated, then the size of the hypothesis tests for AS and NI would be at most α = 4.0% (=1.5+2.5), since the test is an intersection-union test.
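The figures in this example can be partly reproduced with the Lan-DeMets O’Brien-Fleming-type spending function, α*(t) = 2{1 − Φ(z_{α/2}/√t)}. The sketch below assumes this standard form of the spending function and uses only the Python standard library. The function name is illustrative. Only the first boundary is computed, because later boundaries require integration over the joint distribution of the sequential statistics.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def of_type_alpha_spent(t: float, alpha: float = 0.025) -> float:
    """Cumulative one-sided Type I error spent at information fraction t
    under the Lan-DeMets O'Brien-Fleming-type spending function."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction must be in (0, 1]")
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - _STD_NORMAL.cdf(z / t ** 0.5))


# Error spent by the third of four equally spaced analyses (t = 0.75),
# and the amount left for the final analysis.
spent_at_third = of_type_alpha_spent(0.75)
remaining = 0.025 - spent_at_third

# At the first analysis the boundary depends only on the error spent there,
# so it can be obtained from a univariate normal quantile.
first_boundary = _STD_NORMAL.inv_cdf(1.0 - of_type_alpha_spent(0.25))
```

Under these assumptions roughly 1.0% of the 2.5% is spent by the third analysis, leaving about 1.5%, and the first boundary agrees with the quoted value of 4.3326 to the displayed precision.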

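Therefore the overall power for rejecting both $H_0^{AS}$ and $H_0^{NI}$ under $H_1^{AS}$ and $H_1^{NI}$ in DF-A is

$$1 - \beta = \Pr\left[\bigcup_{1 \le k' \le k \le K} \left\{ \{T_{k'}^{AS} > c_{k'}^{AS}\} \cap \{T_k^{NI} > c_k^{NI}\} \right\} \,\middle|\, H_1^{AS} \cap H_1^{NI} \right].$$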

This power can be evaluated using the numerical integration method in Genz (1992) or other methods.

When using the fixed margin approach, DF-A allows the placebo group to be dropped if AS is demonstrated at an interim analysis. However, when using the fraction approach, DF-A does not allow this because the test statistic for NI includes the placebo sample mean $\bar{X}_{Pk}$.

DF-B

Under DF-B, a trial is stopped if AS and NI are demonstrated at the same analysis simultaneously. Otherwise the trial will continue and the subsequent hypothesis testing is repeatedly conducted for both AS and NI until simultaneous significance is reached. The stopping rule for DF-B is formally given as follows:

  • At the kth analysis (k = 1, …, K − 1),
    • if $T_k^{AS} > c_k^{AS}$ and $T_k^{NI} > c_k^{NI}$ simultaneously, then reject $H_0^{AS}$ and $H_0^{NI}$, and stop the trial,
    • otherwise, continue the trial,
  • at the Kth analysis,
    • if $T_K^{AS} > c_K^{AS}$ and $T_K^{NI} > c_K^{NI}$, then reject $H_0^{AS}$ and $H_0^{NI}$,
    • otherwise, do not reject $H_0^{AS}$ and $H_0^{NI}$.

Similarly as in the DF-A, the critical boundaries at the kth analysis ckAS and ckNI are constant and selected separately for the AS and the NI tests to preserve the Type I error of α for each hypothesis, using any group-sequential method, analogously to a trial with a single primary objective. Therefore, the overall power for rejecting both H0AS and H0NI under H1AS and H1NI in DF-B is

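$$1 - \beta = \Pr\left[\bigcup_{k=1}^{K} \left\{ \{T_k^{AS} > c_k^{AS}\} \cap \{T_k^{NI} > c_k^{NI}\} \right\} \,\middle|\, H_1^{AS} \cap H_1^{NI} \right].$$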

Power can also be numerically assessed by using multivariate normal integrals.
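The two power definitions can also be approximated by simulation, which may help when checking a numerical-integration implementation. The sketch below is a Monte Carlo stand-in rather than the integration method the paper uses. It assumes the fixed margin approach, 1:1:1 allocation, equally spaced analyses, and the same boundaries for AS and NI, taken from the K = 4 O’Brien-Fleming example in the text. The function name, the per-look sample size, and the seed are illustrative assumptions.

```python
import math
import random

# Boundaries for K = 4 quoted in the text (one-sided alpha = 2.5%).
BOUNDS = (4.3326, 2.9631, 2.3590, 2.0141)


def simulate_power(n_per_look, mu_e, mu_r, mu_p, sigma, delta,
                   bounds=BOUNDS, n_trials=2000, seed=1):
    """Estimate the probability of rejecting both null hypotheses under
    DF-A and DF-B for the fixed margin approach.

    Assumes 1:1:1 allocation with n_per_look new participants per arm at
    each analysis. Returns (power_df_a, power_df_b).
    """
    rng = random.Random(seed)
    wins_a = wins_b = 0
    for _ in range(n_trials):
        sums = {"E": 0.0, "R": 0.0, "P": 0.0}
        as_shown = False            # AS demonstrated at this or an earlier look
        reject_a = reject_b = False
        for k, bound in enumerate(bounds, start=1):
            # The sum of n outcomes N(mu, sigma^2) is N(n * mu, n * sigma^2).
            for arm, mu in (("E", mu_e), ("R", mu_r), ("P", mu_p)):
                sums[arm] += rng.gauss(n_per_look * mu,
                                       sigma * math.sqrt(n_per_look))
            n_k = n_per_look * k
            se = sigma * math.sqrt(2.0 / n_k)
            mean_e, mean_r, mean_p = (sums[a] / n_k for a in ("E", "R", "P"))
            as_now = (mean_r - mean_p - delta) / se > bound
            ni_now = (mean_e - mean_r + delta) / se > bound
            as_shown = as_shown or as_now
            # DF-A: NI crosses at a look where AS has already been shown.
            reject_a = reject_a or (as_shown and ni_now)
            # DF-B: AS and NI cross at the same look.
            reject_b = reject_b or (as_now and ni_now)
        wins_a += reject_a
        wins_b += reject_b
    return wins_a / n_trials, wins_b / n_trials


power_a, power_b = simulate_power(30, 10.0, 10.0, 5.0, 6.5, 2.5)
```

All looks are evaluated in every simulated trial because the rejection indicators cannot change once set, so early stopping does not affect the estimated power. Since every DF-B rejection is also a DF-A rejection on the same data, the DF-A estimate is never smaller than the DF-B estimate.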

Based on the powers for DF-A and DF-B discussed above, in a group-sequential setting, we describe two sample size concepts, i.e., the maximum sample size (MSS) and the average sample number (ASN). The MSS is the sample size required for the final analysis to achieve the desired overall power 1 − β for rejecting both null hypotheses for AS and NI. The MSS is the smallest integer not less than NK satisfying the desired power for a group-sequential strategy at the prespecified hypothetical values of parameters μE, μR and μP, σ2, and Δ with Fisher’s information time for the interim analyses. The ASN is the expected sample size under hypothetical reference values and provides information regarding the number of participants anticipated in a group-sequential design in order to reach a decision point. The definitions of ASNs corresponding to the two decision-making frameworks for the fixed margin and fraction approaches are given in the Appendix.

To identify the value of $n_{EK}$ or $N_K$, a simple strategy is to implement a grid search that gradually increases (or decreases) $n_{EK}$ until the power under $n_{EK}$ exceeds (or falls below) the desired power. The grid search often requires considerable computing time, especially with a larger number of analyses or a small effect size. To reduce the computing time, the Newton–Raphson algorithm in Sugimoto et al. (2012) or the basic linear interpolation algorithm in Hamasaki et al. (2013) may be utilized. In this paper, we use the basic linear interpolation algorithm to reduce the computing time.

3. Operating characteristics

In this section, we investigate the operating characteristics of the fixed margin and fraction approaches for group-sequential designs based on the two decision-making frameworks, where the number of planned analyses is K = 4. Specifically we evaluate the overall Type I error rate and overall power under a given sample size. Referring to the settings discussed in Hida and Tango (2011a), assume the means (μE, μR, μP) are (10,10,5) with a common standard deviation σ = 6.5. The pre-specified noninferiority margin for the fixed margin approach is Δ = 2.5 and the corresponding fraction for the fraction approach is θ = 0.5. The three allocation ratios nEk:nRk:nPk, considered are: (i) 1:1:1 (CR = CP = 1), (ii) 2:1:1 (CR = CP = 1/2), and (iii) 5:4:1(CR = 4/5, CP = 1/5). The critical values are determined based on the O’Brien–Fleming-type boundary (OF), Pocock-type boundary (PC) (Pocock, 1977) or their combinations, using the LD alpha-spending method with equally spaced increment in information. Four stopping boundary combinations are considered: (i) the OF for both AS and NI (OF-OF), (ii) the OF for AS and the PC for NI (OF-PC), (iii) the PC for AS and the OF for NI (PC-OF), and (iv) the PC for AS and NI (PC-PC). Under these parameter configurations and settings, for the overall power evaluation, the probability of rejecting both of the null hypotheses for AS and NI is calculated at the significance level of α= 2.5% for a one-sided test under H1AS and H1NI. For the overall Type I error rate evaluation, the probability is calculated under the three situations: (i) H0AS and H0NI, (ii) H0AS or (iii) H0NI. We also evaluate the behavior of MSS and ASN to demonstrate both AS and NI with the desirable power of 1 − β = 80% at the prespecified significance level of α = 2.5% for a one-sided test.
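The design constants used in this section follow directly from the allocation ratios and the margin. The sketch below, with illustrative helper names, converts an allocation ratio into (C_R, C_P) and derives the fraction θ from Δ through θ = 1 − Δ/(μR − μP).

```python
from fractions import Fraction


def allocation_constants(n_e: int, n_r: int, n_p: int):
    """Convert an allocation ratio n_E : n_R : n_P into (C_R, C_P),
    the control and placebo ratios relative to the experimental group."""
    if min(n_e, n_r, n_p) <= 0:
        raise ValueError("all parts of the allocation ratio must be positive")
    return Fraction(n_r, n_e), Fraction(n_p, n_e)


def fraction_from_margin(delta: float, mu_r: float, mu_p: float) -> float:
    """theta = 1 - delta / (mu_R - mu_P); requires mu_R > mu_P."""
    if mu_r <= mu_p:
        raise ValueError("the control mean must exceed the placebo mean")
    return 1.0 - delta / (mu_r - mu_p)


# The three allocation ratios considered in this section.
constants = [allocation_constants(*ratio)
             for ratio in ((1, 1, 1), (2, 1, 1), (5, 4, 1))]

# Margin 2.5 with means (mu_R, mu_P) = (10, 5).
theta = fraction_from_margin(2.5, 10.0, 5.0)
```

With Δ = 2.5 and μR − μP = 5 this gives θ = 0.5, matching the fraction stated above.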

3.1 Behavior of the power and Type I error rate

Figures 1 and 2 illustrate the behavior of the power for rejecting both null hypotheses as a function of the experimental intervention group sample size, when using DF-A and DF-B for the fixed margin and fraction approaches.

Figure 1
Behavior of the power for rejecting both (i) the null hypotheses for the AS and (ii) NI, as a function of the experimental intervention group sample size, when using DF-A or DF-B for the fixed margin approach, where the number of planned analyses is ...
Figure 2
Behavior of the power for rejecting both (i) the null hypotheses for the AS and (ii) NI, as a function of the experimental intervention group sample size, when using DF-A or DF-B for the fraction approach, where the number of planned analyses is K = 4. ...

For the fixed margin approach, there is no practical difference in the overall power between DF-A and DF-B although DF-A provides a slightly higher power than DF-B in all of the stopping boundary combinations. In all three allocation ratios for DF-A and DF-B, the highest power is given by OF-OF and the lowest is by PC-PC. For the fraction approach, there is also no practical difference in the overall power between DF-A and DF-B although DF-A provides a slightly higher power than DF-B in all of the stopping boundary combinations. In all three allocation ratios for DF-A and DF-B, the highest power is given by OF-OF or PC-OF and the lowest by OF-PC or PC-PC. Comparing the powers for the fixed margin and fraction approaches, the fixed margin approach provides consistently lower power than the fraction approach in all of the decision-making frameworks, the stopping boundary combinations, and allocation ratios.

Figures 3 to 6 illustrate the behavior of the Type I error rate for rejecting both null hypotheses as a function of the sample size for an experimental intervention group, when using DF-A and DF-B for the fixed margin and fraction approaches. For the fixed margin approach, the maximum of the Type I error is not inflated over the targeted significance level of α = 2.5% in any of the decision-making frameworks, the stopping boundary combinations, or allocation ratios, but the Type I error rate is small and conservative, especially when H0AS and H0NI are true. There is no significant difference in the Type I error rates between DF-A and DF-B, but DF-B provides a smaller Type I error rate than DF-A. In all three allocation ratios and null hypothesis settings, the largest Type I error rate is given by OF-OF or PC-OF for DF-A, and OF-OF for DF-B. For the fraction approach, the maximum of the Type I error is similarly not inflated over the prespecified significance level of α = 2.5% in any of the decision-making frameworks, the stopping boundary combinations or allocation ratios, but the Type I error rate is small especially when H0AS and H0NI are true. There is no practical difference between the Type I error rates of DF-A and DF-B, but DF-B provides a smaller Type I error rate than DF-A. In all of the three allocation ratios and null hypothesis settings, the largest Type I error rate is given by OF-OF for both DF-A and DF-B.

Figure 3
Behavior of the Type I error rate for rejecting both null hypotheses (i) for the AS and (ii) NI as a function of the experimental intervention group sample size, when using DF-A for the fixed margin approach, where the number of planned analyses is K ...
Figure 6
Behavior of the Type I error rate for rejecting both null hypotheses (i) for the AS and (ii) NI as a function of the experimental intervention group sample size, when using DF-B for the fraction approach, where the number of planned analyses is K = 4. ...

Comparing the fixed margin and fraction approaches, the fixed margin approach provides consistently smaller power than the fraction approach in all of the decision-making frameworks, the stopping boundary combinations and allocation ratios.

3.2 Behavior of the sample size

Table 1 displays the MSS and ASN required for evaluating AS and NI with the power of 1 − β = 80% at the significance level of α = 2.5% for a one-sided test, when using the fixed margin and fraction approaches based on DF-A and DF-B. For the fixed margin approach based on DF-A, the ASN is calculated under H1AS and H1NI in two ways: in one strategy, the placebo group is not discontinued until NI is demonstrated, even when AS is demonstrated at an analysis (ASN1); in the other strategy, the placebo group is discontinued when AS is demonstrated at an analysis (ASN2). The definitions of ASN1 and ASN2 are given in the Appendix.

Table 1
The MSS and ASN for demonstrating the AS and the NI with the power of 1 − β =80% at the significance level of α = 2.5% for a one-sided test, where the maximum planned number of analyses is K = 4 and the means (μE, μ ...

For both the fixed margin and fraction approaches, in all of the stopping boundary combinations and allocation ratios, there is a modest difference in the MSS and ASN between the DF-A and DF-B although DF-A provides a slightly smaller sample size than DF-B. For the fixed margin approach, the smallest MSS is given by OF-OF and the largest by PC-PC in all of the allocation ratios. The smallest ASN1 is associated with OF-OF or PC-PC and the largest with PC-OF or OF-PC. The largest ASN2 is provided by PC-OF or OF-PC in all of the allocation ratios. For the fraction approach, the smallest MSS is provided by OF-OF or PC-OF and the largest by OF-PC or PC-PC in all of the allocation ratios. The smallest ASN1 is consistently produced with PC-PC and the largest with OF-PC. Comparing the fixed margin and fraction approaches, the fraction approach provides smaller MSS and ASN than the fixed margin approach in all of the decision-making frameworks, the stopping boundary combinations and allocation ratios.

4. Sample size recalculation

Clinical trials are designed based on assumptions that are often constructed from prior data. However, prior data may be limited or an inaccurate indication of future data, resulting in trials that are over- or underpowered. Interim analyses of accumulating data provide an opportunity to evaluate the accuracy of the design assumptions and potentially make design adjustments (e.g., to the sample size) if the assumptions were markedly inaccurate. Group-sequential designs allow for early stopping when there is sufficient statistical evidence of assay sensitivity and noninferiority. However, more modern adaptive designs may also allow for increases (or decreases) in the sample size if effects are smaller (or larger) than assumed. Such adjustments must be conducted carefully for several reasons, especially to maintain control of statistical error rates. In this section, we discuss sample size recalculation based on the observed intervention effects at an analysis, with a focus on the control of statistical error rates.

4.1 Cui-Hung-Wang test statistics and conditional power

We now consider a scenario where the maximum sample size $n_{EK}$ in the experimental intervention group is recalculated to $n'_{EK}$ at the kth analysis, allowing either an increase or a decrease in sample size. Suppose $n'_{EK}$ is subject to $n_{Ek} < n'_{EK} \le \lambda n_{EK}$, where λ is a pre-specified constant for the maximum allowable sample size. In addition, let the sample size at the (k + l)th analysis be $n'_{E,k+l} = \{(n'_{EK} - n_{Ek})/(n_{EK} - n_{Ek})\}(n_{E,k+l} - n_{Ek}) + n_{Ek}$ $(l = 1, \ldots, K - k)$.
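The rescaling of the remaining analyses is a linear stretch of the planned increments beyond the kth analysis. A minimal sketch is given below. The function name, the argument layout, and the default cap `lam=2.0` are assumptions made for illustration only.

```python
def rescaled_look_sizes(planned, k, new_max, lam=2.0):
    """Recalculated cumulative experimental-group sample sizes for the
    analyses after the kth one.

    planned : planned cumulative sizes (n_E1, ..., n_EK)
    k       : 1-based index of the analysis at which recalculation occurs
    new_max : recalculated maximum sample size n'_EK
    lam     : cap factor; new_max may not exceed lam * n_EK (assumed default)
    """
    n_looks = len(planned)
    if not 1 <= k < n_looks:
        raise ValueError("k must be an interim analysis (1 <= k < K)")
    n_k, n_max = planned[k - 1], planned[-1]
    if not n_k < new_max <= lam * n_max:
        raise ValueError("need n_Ek < new_max <= lam * n_EK")
    scale = (new_max - n_k) / (n_max - n_k)
    # Stretch each planned increment beyond look k by the same factor.
    return [scale * (n - n_k) + n_k for n in planned[k:]]


# Planned 25, 50, 75, 100 per look; the maximum is raised to 150 at look 2.
new_sizes = rescaled_look_sizes([25, 50, 75, 100], k=2, new_max=150)
```

In this example the increments after the second analysis are doubled, so the third and fourth analyses move from 75 and 100 to 100 and 150.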

Consider the Cui-Hung-Wang (CHW) statistics (Cui et al., 1999) for sample size recalculation in group-sequential designs for three-arm clinical trials to preserve the overall Type I error rate at a prespecified significance level of α even when the sample size is increased and conventional test statistics are used. When using the fixed margin approach, the CHW statistics for the AS and the NI are

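$$T^{\prime AS}_{k+l} = \sqrt{\frac{n_{Ek}}{n_{E,k+l}}}\,T^{AS}_{k} + \sqrt{\frac{n_{E,k+l}-n_{Ek}}{n_{E,k+l}}}\;\frac{\bar{X}'_{R,k+l}-\bar{X}'_{P,k+l}-\Delta}{\sigma\sqrt{1/(n'_{E,k+l}-n_{Ek})}\,\sqrt{1/C_R+1/C_P}}$$

and

$$T^{\prime NI}_{k+l} = \sqrt{\frac{n_{Ek}}{n_{E,k+l}}}\,T^{NI}_{k} + \sqrt{\frac{n_{E,k+l}-n_{Ek}}{n_{E,k+l}}}\;\frac{\bar{X}'_{E,k+l}-\bar{X}'_{R,k+l}+\Delta}{\sigma\sqrt{1/(n'_{E,k+l}-n_{Ek})}\,\sqrt{1+1/C_R}},$$

where $\bar{X}'_{E,k+l} = \left(\sum_{i=n_{Ek}+1}^{n'_{E,k+l}} X_{Ei}\right)/(n'_{E,k+l}-n_{Ek})$, $\bar{X}'_{R,k+l} = \left(\sum_{i=n_{Rk}+1}^{n'_{R,k+l}} X_{Ri}\right)/(n'_{R,k+l}-n_{Rk})$, and $\bar{X}'_{P,k+l} = \left(\sum_{i=n_{Pk}+1}^{n'_{P,k+l}} X_{Pi}\right)/(n'_{P,k+l}-n_{Pk})$. The sample size is increased or decreased when the conditional power evaluated at the $k$th analysis is lower or higher than the desired power $1-\beta$. Under the planned maximum sample size and a given observed value of $(T^{AS}_{k}, T^{NI}_{k})$, if the decision-making for rejecting the null hypotheses $H_0^{AS}$ and $H_0^{NI}$ is based on DF-A, then the conditional power at the $k$th analysis is given by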

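$$CP(\delta_1,\delta_2)=\begin{cases}
\Phi_2\!\left(-\dfrac{c_K^{AS}-\sqrt{m}\,t_k^{AS}}{\sqrt{1-m}}+\dfrac{\sqrt{n_{EK}-n_{Ek}}\,\delta_1}{\sqrt{1/C_R+1/C_P}},\; -\dfrac{c_K^{NI}-\sqrt{m}\,t_k^{NI}}{\sqrt{1-m}}+\dfrac{\sqrt{n_{EK}-n_{Ek}}\,\delta_2}{\sqrt{1+1/C_R}}\right), & \text{if } T_l^{NI}\le c_l^{NI} \text{ and } T_l^{AS}\le c_l^{AS} \text{ for all } l=1,\ldots,k,\\[2ex]
1-\Phi_1\!\left(\dfrac{c_K^{NI}-\sqrt{m}\,t_k^{NI}}{\sqrt{1-m}}-\dfrac{\sqrt{n_{EK}-n_{Ek}}\,\delta_2}{\sqrt{1+1/C_R}}\right), & \text{if } T_l^{NI}\le c_l^{NI} \text{ for all } l=1,\ldots,k, \text{ and } T_l^{AS}> c_l^{AS} \text{ for any } l=1,\ldots,k,
\end{cases}$$

where $m = n_{Ek}/n_{EK}$, $\delta_1 = (\mu_R - \mu_P - \Delta)/\sigma$ and $\delta_2 = (\mu_E - \mu_R + \Delta)/\sigma$. In addition, $\Phi_1(\cdot)$ and $\Phi_2(\cdot)$ are the cumulative distribution functions of the standardized univariate and bivariate normal distributions. The critical values $c_l^{AS}$ and $c_l^{NI}$ are the same critical values utilized for the case without sample size recalculation. If the decision-making is based on DF-B, then the conditional power is

$$CP(\delta_1,\delta_2)=\Phi_2\!\left(-\frac{c_K^{AS}-\sqrt{m}\,t_k^{AS}}{\sqrt{1-m}}+\frac{\sqrt{n_{EK}-n_{Ek}}\,\delta_1}{\sqrt{1/C_R+1/C_P}},\; -\frac{c_K^{NI}-\sqrt{m}\,t_k^{NI}}{\sqrt{1-m}}+\frac{\sqrt{n_{EK}-n_{Ek}}\,\delta_2}{\sqrt{1+1/C_R}}\right).$$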

The details of the derivation of the conditional powers are given in Appendix B. On the other hand, when using the fraction approach, the CHW statistics are given by

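$$T^{\prime AS}_{k+l} = \sqrt{\frac{n_{Ek}}{n_{E,k+l}}}\,T^{AS}_{k} + \sqrt{\frac{n_{E,k+l}-n_{Ek}}{n_{E,k+l}}}\;\frac{\bar{X}'_{R,k+l}-\bar{X}'_{P,k+l}}{\sigma\sqrt{1/(n'_{E,k+l}-n_{Ek})}\,\sqrt{1/C_R+1/C_P}}$$

and

$$T^{\prime NI}_{k+l} = \sqrt{\frac{n_{Ek}}{n_{E,k+l}}}\,T^{NI}_{k} + \sqrt{\frac{n_{E,k+l}-n_{Ek}}{n_{E,k+l}}}\;\frac{\bar{X}'_{E,k+l}-\theta\bar{X}'_{R,k+l}-(1-\theta)\bar{X}'_{P,k+l}}{\sigma\sqrt{1/(n'_{E,k+l}-n_{Ek})}\,\sqrt{1+\theta^2/C_R+(1-\theta)^2/C_P}}.$$

The conditional power can be calculated in the same manner as for the fixed margin approach, except that $\delta_1 = (\mu_R - \mu_P)/\sigma$ and $\delta_2 = (\mu_E - \theta\mu_R - (1-\theta)\mu_P)/\sigma$, and the term $\sqrt{1+1/C_R}$ scaling $\delta_2$ is replaced by $\sqrt{1+\theta^2/C_R+(1-\theta)^2/C_P}$.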
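The defining feature of the CHW statistics is that the weights are fixed by the planned sample sizes, while the second-stage statistic is computed from the data actually observed. The following Python sketch illustrates this for the AS statistic of the fixed margin approach. The helper names and the simulated data are hypothetical. The example leaves the sample size unchanged, in which case the CHW statistic should coincide with the conventional cumulative statistic.

```python
import math
import random


def standardized(diff, sigma, n_e, var_factor):
    """Z-statistic for a contrast with variance sigma^2 * var_factor / n_e."""
    return diff / (sigma * math.sqrt(1.0 / n_e) * math.sqrt(var_factor))


def chw_combine(t_k, z_inc, n_k, n_kl_planned):
    """CHW statistic: weights use the PLANNED sizes n_Ek and n_E,k+l."""
    w_old = math.sqrt(n_k / n_kl_planned)
    w_new = math.sqrt((n_kl_planned - n_k) / n_kl_planned)
    return w_old * t_k + w_new * z_inc


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical data with C_R = C_P = 1 and no change in sample size.
random.seed(1)
sigma, delta = 6.5, 2.5
n_k, n_kl = 40, 80
ref = [random.gauss(10.0, sigma) for _ in range(n_kl)]
pla = [random.gauss(5.0, sigma) for _ in range(n_kl)]
var_as = 1.0 / 1.0 + 1.0 / 1.0   # 1/C_R + 1/C_P

t_k = standardized(mean(ref[:n_k]) - mean(pla[:n_k]) - delta, sigma, n_k, var_as)
z_inc = standardized(mean(ref[n_k:]) - mean(pla[n_k:]) - delta,
                     sigma, n_kl - n_k, var_as)
t_chw = chw_combine(t_k, z_inc, n_k, n_kl)
t_conventional = standardized(mean(ref) - mean(pla) - delta, sigma, n_kl, var_as)
print(abs(t_chw - t_conventional) < 1e-9)  # True
```

When the sample size is changed, `z_inc` would be computed from the recalculated increment $n'_{E,k+l}-n_{Ek}$ while `chw_combine` keeps the planned weights. The NI statistic follows the same pattern with its own contrast and variance factor.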

When recalculating the sample size, we consider three possible options, i.e., (i) only allowing an increase in the sample size, (ii) only allowing a decrease in the sample size, and (iii) allowing an increase or a decrease in sample size. The recalculated maximum sample size nEK required for the experimental intervention group for each respective option is:

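  1. $n'_{EK}=\begin{cases}\min(\tilde{n}_{EK},\lambda n_{EK}), & \text{if } CP(\delta_1,\delta_2)<\gamma(1-\beta),\\ n_{EK}, & \text{otherwise;}\end{cases}$
  2. $n'_{EK}=\begin{cases}\tilde{n}_{EK}, & \text{if } CP(\delta_1,\delta_2)>\eta(1-\beta),\\ n_{EK}, & \text{otherwise;}\end{cases}$
  3. $n'_{EK}=\begin{cases}\min(\tilde{n}_{EK},\lambda n_{EK}), & \text{if } CP(\delta_1,\delta_2)<\gamma(1-\beta),\\ \tilde{n}_{EK}, & \text{if } CP(\delta_1,\delta_2)>\eta(1-\beta),\\ n_{EK}, & \text{otherwise;}\end{cases}$

where $\tilde{n}_{EK}$ is the calculated sample size at which the conditional power based on the observed effect sizes achieves the target power of $1-\beta$, and $\gamma$ ($0<\gamma<1$) and $\eta$ ($\eta>1$) are pre-specified constants for allowance.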
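The three recalculation options can be written as one small decision rule. The Python sketch below is illustrative only: the function and argument names are hypothetical, `n_required` stands for the sample size at which the conditional power reaches the target (assumed to be computed elsewhere), and the lower bound on the recalculated size stated in Section 4.1 is not enforced here.

```python
def recalculated_max(cp, n_planned, n_required, target=0.80,
                     gamma=0.875, eta=1.125, lam=1.5, option="both"):
    """Recalculated maximum sample size for the experimental arm.

    option: "increase" (i), "decrease" (ii), or "both" (iii).
    """
    low, high = gamma * target, eta * target
    if option in ("increase", "both") and cp < low:
        # Increase, but never beyond lam times the planned maximum.
        return min(n_required, lam * n_planned)
    if option in ("decrease", "both") and cp > high:
        return n_required
    return n_planned


print(recalculated_max(0.50, 142, 250))                     # 213.0
print(recalculated_max(0.95, 142, 110))                     # 110
print(recalculated_max(0.95, 142, 110, option="increase"))  # 142
```

With the defaults shown, the thresholds are 70% and 90%, matching the values of $\gamma$ and $\eta$ used in the simulation study of Section 4.2.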

4.2 Simulation study

We investigate the impact of the sample size recalculation on the overall power and the Type I error rate for rejecting both null hypotheses using Monte-Carlo simulation.

Consider group sequential designs with four analyses (i.e., three interim and one final analysis) for the fixed margin and the fraction approaches with the decision-making frameworks DF-A or DF-B, where analyses are conducted with equally spaced increments in information. The original planned total sample size is calculated to test both null hypotheses for the AS and the NI with the power of 1 − β = 80% at the prespecified significance level of α = 2.5% for a one-sided test in the fixed-sample designs, where (μE, μR, μP) = (10, 10, 5), σ = 6.5, and Δ = 2.5 for the fixed margin approach, and θ = 0.5 for the fraction approach. The total sample sizes are 426 for the fixed margin approach and 240 for the fraction approach.
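The planned totals can be checked with a short calculation. The Python sketch below assumes equal allocation, a one-sided critical value $z_{0.975}$ for both tests, and a joint power defined as the probability that both statistics exceed it. The correlations used ($-0.5$ for the fixed margin approach and $(1-2\theta)/\sqrt{2(1+\theta^2+(1-\theta)^2)}$ for the fraction approach) are derived here under independent arms with a common variance. The function names are hypothetical, and the integration routine is a simple stand-in for a dedicated multivariate normal algorithm.

```python
import math
from statistics import NormalDist

_N = NormalDist()
Z = _N.inv_cdf(0.975)


def upper_orthant(a, b, rho, steps=2000):
    """P(Z1 > a, Z2 > b), standard bivariate normal, by Simpson's rule."""
    s = math.sqrt(1.0 - rho * rho)
    lo, hi = a, max(a, 0.0) + 10.0

    def f(z):
        return _N.pdf(z) * (1.0 - _N.cdf((b - rho * z) / s))

    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def joint_power(n, approach, mu=(10.0, 10.0, 5.0), sigma=6.5,
                delta=2.5, theta=0.5):
    """Power to reject both null hypotheses with n participants per arm."""
    mu_e, mu_r, mu_p = mu
    root_n = math.sqrt(n)
    if approach == "fixed":
        mean_as = root_n * (mu_r - mu_p - delta) / (sigma * math.sqrt(2.0))
        mean_ni = root_n * (mu_e - mu_r + delta) / (sigma * math.sqrt(2.0))
        rho = -0.5
    else:
        v_ni = 1.0 + theta ** 2 + (1.0 - theta) ** 2
        mean_as = root_n * (mu_r - mu_p) / (sigma * math.sqrt(2.0))
        mean_ni = root_n * (mu_e - theta * mu_r - (1.0 - theta) * mu_p) \
            / (sigma * math.sqrt(v_ni))
        rho = (1.0 - 2.0 * theta) / math.sqrt(2.0 * v_ni)
    return upper_orthant(Z - mean_as, Z - mean_ni, rho)


def smallest_n(approach, power=0.80):
    n = 2
    while joint_power(n, approach) < power:
        n += 1
    return n


total_fixed = 3 * smallest_n("fixed")
total_fraction = 3 * smallest_n("fraction")
print(total_fixed, total_fraction)
```

Under these assumptions the search is expected to return 142 and 80 participants per arm, in line with the totals of 426 and 240 stated above.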

One sample size recalculation is considered based on the observed effects with three options: (i) only allowing an increase in the sample size, (ii) only allowing a decrease in the sample size, and (iii) allowing an increase or decrease in sample size, evaluated at the first, second, or third interim analysis. The sample size recalculation is performed when the conditional power is less than 70% (i.e., γ is set to 0.875) and/or exceeds 90% (i.e., η is set to 1.125). The sample size for the experimental intervention group can be increased up to 1.5 times the original planned sample size (i.e., λ is set to 1.5). As in Section 3, the critical values are determined based on the three stopping boundary combinations considered using the LD alpha-spending method with equally spaced increments in information: (i) the OF for both AS and NI (OF-OF), (ii) the PC for AS and the OF for NI (PC-OF), and (iii) the PC for both AS and NI (PC-PC). The empirical overall power is evaluated under the situation where both $H_1^{AS}$ and $H_1^{NI}$ are true with (μE, μR, μP) = (10, 10, 5), and the actual overall Type I error rate is evaluated under three situations, i.e., (i) $H_0^{AS}$ and $H_0^{NI}$, (ii) $H_0^{AS}$, or (iii) $H_0^{NI}$ being true. The number of replications is set to 1,000,000 for the evaluation of the Type I error rate and 100,000 for the power, chosen on the basis of precision: 1,000,000 replications provide a two-sided 95% confidence interval with a width of approximately 0.001 when the proportion is 2.5%, while 100,000 replications provide a two-sided 95% confidence interval with a width of approximately 0.005 when the proportion is 80%. We limit the discussion to the overall empirical power and Type I error rate for DF-A with equally sized groups (CR = CP = 1), as there are no appreciable differences in the power and Type I error rates between DF-A and DF-B.
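The precision argument for the number of replications can be checked directly. The sketch below assumes a normal-approximation (Wald) interval for a simulated proportion, which the text does not state explicitly. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def ci_width(p, n_rep, level=0.95):
    """Full width of a two-sided Wald interval for a proportion p."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return 2.0 * z * math.sqrt(p * (1.0 - p) / n_rep)


w_type1 = ci_width(0.025, 1_000_000)   # Type I error rate simulations
w_power = ci_width(0.80, 100_000)      # power simulations
print(round(w_type1, 4), round(w_power, 4))  # 0.0006 0.005
```

Both widths agree with the stated values of 0.001 and 0.005 once rounded to three decimal places.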

Figure 7 summarizes the overall empirical power for the fixed margin and fraction approaches when sample size recalculation is performed based on DF-A under the design setting and parameter configurations described above. For both the fixed margin and fraction approaches, regardless of the stopping boundary combination, the overall power increases by up to 15% compared to a design without sample size recalculation. Larger increases are observed with a later timing of the sample size recalculation when only allowing an increase in the sample size and when allowing an increase or decrease in the sample size. As shown in Table 2, the ASN is approximately equal to or smaller than the planned sample size. When only allowing a decrease in the sample size, the overall power cannot maintain the targeted power of 80%, although the expected sample size can be reduced more than with the other recalculation options. Figures 8 and 9 summarize the actual overall Type I error rates. For both the fixed margin and fraction approaches, regardless of the stopping boundary combination, in all three recalculation options, the actual Type I error rates do not exceed the prespecified significance level of 2.5%, but are small and conservative, especially in the fixed margin approach.

Figure 7
The impact of the sample size recalculation on the overall empirical power for rejecting both null hypotheses for the AS and the NI, when using DF-A for the fixed margin and fraction approaches, where the number of planned analyses is K = 4. One sample ...
Figure 8
The impact of sample size recalculation on overall actual Type I error rate for rejecting both null hypotheses for the AS and the NI, when using DF-A for the fixed margin approach, where the number of planned analyses is K = 4. One sample size recalculation ...
Figure 9
The Impact of sample size recalculation on overall actual Type I error rate for rejecting both null hypotheses for the AS and the NI, when using DF-A for the fraction approach, where the number of planned analyses is K = 4. One sample size recalculation ...
Table 2
The ASN when sample size recalculation is performed, using DF-A for the fixed margin and fraction approaches, where the number of planned analyses is K = 4. One sample size recalculation is conducted based on the observed effect at either of 1/4, 1/2, ...

5. A further extension

When constructing efficient group-sequential designs in three arm noninferiority clinical trials, a major issue is that the Type I error rate is small and conservative as the rejection region of the null hypotheses H0AS and H0NI is restricted even when group-sequential designs are used for evaluating the AS and the NI, as with the fixed-sample designs. This is due to the requirement that the allocation of the Type I error to each analysis for the AS and the NI should be prespecified and determined, using an alpha-spending method.

To overcome this issue, the DF-A can be modified to adaptively allocate the Type I error to each analysis for the NI, although the Type I error allocation for the AS is prespecified. This idea was first discussed by Tsong et al. (2004) in group-sequential three-arm clinical trials for assessing the equivalence and efficacy of a generic product, where the co-primary objectives of the trial are to assess whether the generic and reference products are effective relative to placebo and whether the generic is equivalent to the reference product with a prespecified equivalence margin. Their method evaluates equivalence only after both null hypotheses of efficacy are rejected and then specifies the Type I error allocation before the equivalence evaluation is performed. In three-arm noninferiority clinical trials for the assessment of the AS and the NI, the NI is evaluated only after the AS is demonstrated, and the Type I error allocation for the NI is specified just before the NI evaluation is performed. Figure 10 illustrates the behavior of the Type I error rate for rejecting both null hypotheses for the AS and the NI as a function of the sample size for the experimental intervention group, when using the fixed margin approach based on DF-A with adaptive Type I error allocation for the NI in a group-sequential setting, where the parameter settings and configuration are the same as in Figure 3, but for $n_{Ek}:n_{Rk}:n_{Pk}$ = 1:1:1 (CR = CP = 1). The Type I error rate is less conservative than that seen in Figure 3, but it is not inflated over the targeted significance level. This improvement in the Type I error rate is expected to provide smaller sample sizes but will increase complexity. Further investigation will be required to evaluate how much the method may improve the efficiency and whether this outweighs the complexity. The methodology for adaptive Type I error allocation may be applicable to the fraction approach. However, as seen in Section 2, the correlation between the two test statistics for the AS and the NI is determined by the allocation ratio and the fraction margin, and the test statistics may be positively correlated in some situations (e.g., $n_{Ek}:n_{Rk}:n_{Pk}$ = 1:1:1 and θ < 0.5). By analogy to work by Hamasaki et al. (2015), the Type I error rate is inflated over the targeted significance level when the test statistics are positively correlated. Therefore, the use of adaptive Type I error allocation for the fraction approach should be carefully considered.

Figure 10
Behavior of the Type I error for rejecting both null hypotheses (i) for AS and (ii) NI as a function of the experimental intervention group sample size, when using adaptive Type I error allocation for the fixed margin approach, where the planned number ...

6. Summary

Noninferiority clinical trials have recently received a great deal of attention from regulatory authorities (CHMP, 2005; FDA, 2010) and in the clinical trials literature (e.g., the extensive references found in Rothmann et al. (2012)). Noninferiority clinical trials have complexities requiring careful design, monitoring, analyses, and reporting. When designing noninferiority clinical trials, constancy and assay sensitivity are the important assumptions. The selection of the active control for a noninferiority trial should be done carefully, ensuring that it has demonstrated and precisely measured superiority over placebo and that its effect has not changed compared to the historical trials that demonstrated its efficacy (constancy assumption). To assess these issues in regulatory medical product development, the three-arm noninferiority design, which includes an experimental intervention, an active control intervention, and a placebo, has been considered a gold-standard design. However, such a trial may raise ethical issues and may require a sample size that is too large to be practical.

In this paper, we discuss three-arm noninferiority clinical trials and extend two existing approaches, i.e., the fixed margin and fraction approaches, for evaluating noninferiority and assay sensitivity to a group-sequential setting with two decision-making frameworks. We evaluate the operating characteristics including power, Type I error rate, and maximum and expected sample sizes as design factors vary. We also discuss sample size recalculation and consider its impact on the power and Type I error rate via a simulation study. Our findings are summarized as follows:

  • The decision-making frameworks of DF-A and DF-B for the fixed margin and the fraction approaches provide the possibility of stopping a trial early when evidence is overwhelming, thus offering efficiency (e.g., an ASN potentially 4% to 15% smaller than that of the fixed-sample designs with equally sized groups and four analyses).
  • There are no major differences in either MSS or ASN between DF-A and DF-B for the fixed margin and the fraction approaches, although DF-A is slightly more powerful than DF-B. By using the DF-A for the fixed margin approach, the time that participants are exposed to placebo can be minimized, as the DF-A allows dropping of the placebo group if assay sensitivity has been demonstrated at an analysis.
  • For the fixed margin approach, selecting the O’Brien-Fleming-type boundary for both AS and NI could lead to fewer participants for the MSS and the ASN compared with other boundary combinations. On the other hand, for the fraction approach, selecting the O’Brien-Fleming-type boundary for both AS and NI, or the Pocock-type boundary for AS and the O’Brien-Fleming-type boundary for NI, provides better efficiency with respect to the MSS and the ASN compared with other boundary combinations.
  • When considering sample size recalculation during a trial, only allowing a decrease in the sample size may not be a desirable option in either the fixed margin or the fraction approach, as the power does not reach desired levels, although the expected sample size can be reduced more than with the other recalculation options. In addition, the timing of the sample size recalculation should also be carefully considered. Power is increased if the sample size recalculation is carried out later in the trial, but the expected sample size is larger.

We caution that these findings are based on one set of design parameter configurations except for the allocation ratio. Further investigation will be required to evaluate how the power and Type I error rate behave with other design assumptions.

Figure 4
Behavior of the Type I error rate for rejecting both null hypotheses (i) for the AS and (ii) NI as a function of the experimental intervention group sample size, when using DF-B for the fixed margin approach, where the number of planned analyses is K ...
Figure 5
Behavior of the Type I error rate for rejecting both null hypotheses (i) for the AS and (ii) NI as a function of the experimental intervention group sample size, when using DF-A for the fraction approach, where the number of planned analyses is K = 4. ...

Acknowledgments

The authors are grateful to the two anonymous referees and the associate editor for their valuable suggestions and helpful comments that improved the content and presentation of the paper. Research reported in this publication was supported by JSPS KAKENHI under Grant Number 26330038 and the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Numbers UM1AI104681. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Appendix A: Average sample number

The ASN is the expected sample size under hypothetical reference values and provides information regarding the number of participants anticipated in a group-sequential design in order to reach a decision point. We briefly describe the several definitions of the ASN corresponding to the decision-making frameworks.

When using DF-A or DF-B for the fixed margin and fraction approaches, if the placebo group is not terminated until NI is demonstrated even when the AS is demonstrated at an analysis, then the ASN can be calculated by

$$\mathrm{ASN}_1=\sum_{k=1}^{K}N_kP_k(\mu_E,\mu_R,\mu_P,\Delta,\sigma^2)+\left(1-\sum_{k=1}^{K}P_k(\mu_E,\mu_R,\mu_P,\Delta,\sigma^2)\right)N_K=N_K+\sum_{k=1}^{K-1}(N_k-N_K)P_k(\mu_E,\mu_R,\mu_P,\Delta,\sigma^2),$$

where $N_k$ ($= n_{Ek} + n_{Rk} + n_{Pk}$) is the cumulative number of participants at the $k$th analysis, and $P_k(\mu_E,\mu_R,\mu_P,\Delta,\sigma^2)$ is the stopping probability at the $k$th analysis assuming that the true values of the intervention means are $(\mu_E,\mu_R,\mu_P)$. If the analysis is conducted with equally spaced increments in information, then $N_k$ can be rewritten as $(k/K)N_K$.
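The two expressions for ASN1 are algebraically identical, which can be confirmed numerically. In the Python sketch below, the stopping probabilities are made-up values chosen only to exercise the formula, and equally spaced analyses are assumed so that $N_k=(k/K)N_K$. The function name is hypothetical.

```python
def asn1(n_total_max, stop_probs):
    """Return ASN1 computed from both forms of the formula."""
    K = len(stop_probs)
    sizes = [(k / K) * n_total_max for k in range(1, K + 1)]
    # First form: stop at analysis k with probability P_k, otherwise run to N_K.
    form1 = sum(n * p for n, p in zip(sizes, stop_probs)) \
        + (1.0 - sum(stop_probs)) * n_total_max
    # Second form: savings relative to N_K from stopping before the last analysis.
    form2 = n_total_max + sum((n - n_total_max) * p
                              for n, p in zip(sizes[:-1], stop_probs[:-1]))
    return form1, form2


# Hypothetical stopping probabilities for K = 4 and N_K = 426.
form1, form2 = asn1(426, [0.05, 0.25, 0.35, 0.20])
print(round(form1, 6), round(form2, 6))  # 319.5 319.5
```

The second form makes explicit that only stopping before the final analysis reduces the expected sample size.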

The stopping probability Pk based on DF-A is given by

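$$P_k(\mu_E,\mu_R,\mu_P,\Delta,\sigma^2)=\begin{cases}
\Pr\left[\{T_1^{AS}>c_1^{AS}\}\cap\{T_1^{NI}>c_1^{NI}\}\right], & k=1,\\[1ex]
\Pr\left[\bigcap_{l=1}^{k-1}\{T_l^{AS}\le c_l^{AS}\}\cap\{T_k^{AS}>c_k^{AS}\}\cap\{T_k^{NI}>c_k^{NI}\}\right]\\
\quad+\Pr\left[\{T_1^{AS}>c_1^{AS}\}\cap\bigcap_{l=1}^{k-1}\{T_l^{NI}\le c_l^{NI}\}\cap\{T_k^{NI}>c_k^{NI}\}\right]\\
\quad+\sum_{2\le l<k}\Pr\left[\bigcap_{m=1}^{l-1}\{T_m^{AS}\le c_m^{AS}\}\cap\{T_l^{AS}>c_l^{AS}\}\cap\bigcap_{n=l}^{k-1}\{T_n^{NI}\le c_n^{NI}\}\cap\{T_k^{NI}>c_k^{NI}\}\right], & k\ge 2.
\end{cases}$$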

For instance, at K = 2, the stopping probabilities P1 and P2 based on DF-A are calculated by multivariate normal integrals as follows:

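$$P_1(\mu_E,\mu_R,\mu_P,\Delta,\sigma^2)=\Pr\left[\{T_1^{AS}>c_1^{AS}\}\cap\{T_1^{NI}>c_1^{NI}\}\right]=\int_{c_1^{AS}}^{\infty}\int_{c_1^{NI}}^{\infty}f_2(t_1^{AS},t_1^{NI})\,dt_1^{NI}\,dt_1^{AS}$$

and

$$\begin{aligned}
P_2(\mu_E,\mu_R,\mu_P,\Delta,\sigma^2)
&=\Pr\left[\{T_1^{AS}\le c_1^{AS}\}\cap\{T_2^{AS}>c_2^{AS}\}\cap\{T_2^{NI}>c_2^{NI}\}\right]+\Pr\left[\{T_1^{AS}>c_1^{AS}\}\cap\{T_1^{NI}\le c_1^{NI}\}\cap\{T_2^{NI}>c_2^{NI}\}\right]\\
&=\Pr\left[\{T_1^{AS}\le c_1^{AS}\}\cap\{T_1^{NI}\le c_1^{NI}\}\cap\{T_2^{AS}>c_2^{AS}\}\cap\{T_2^{NI}>c_2^{NI}\}\right]\\
&\quad+\Pr\left[\{T_1^{AS}\le c_1^{AS}\}\cap\{T_1^{NI}>c_1^{NI}\}\cap\{T_2^{AS}>c_2^{AS}\}\cap\{T_2^{NI}>c_2^{NI}\}\right]\\
&\quad+\Pr\left[\{T_1^{AS}>c_1^{AS}\}\cap\{T_1^{NI}\le c_1^{NI}\}\cap\{T_2^{AS}>c_2^{AS}\}\cap\{T_2^{NI}>c_2^{NI}\}\right]\\
&\quad+\Pr\left[\{T_1^{AS}>c_1^{AS}\}\cap\{T_1^{NI}\le c_1^{NI}\}\cap\{T_2^{AS}\le c_2^{AS}\}\cap\{T_2^{NI}>c_2^{NI}\}\right]\\
&=\int_{-\infty}^{c_1^{AS}}\int_{-\infty}^{c_1^{NI}}\int_{c_2^{AS}}^{\infty}\int_{c_2^{NI}}^{\infty}f_4(t_1^{AS},t_1^{NI},t_2^{AS},t_2^{NI})\,dt_2^{NI}\,dt_2^{AS}\,dt_1^{NI}\,dt_1^{AS}\\
&\quad+\int_{-\infty}^{c_1^{AS}}\int_{c_1^{NI}}^{\infty}\int_{c_2^{AS}}^{\infty}\int_{c_2^{NI}}^{\infty}f_4(t_1^{AS},t_1^{NI},t_2^{AS},t_2^{NI})\,dt_2^{NI}\,dt_2^{AS}\,dt_1^{NI}\,dt_1^{AS}\\
&\quad+\int_{c_1^{AS}}^{\infty}\int_{-\infty}^{c_1^{NI}}\int_{c_2^{AS}}^{\infty}\int_{c_2^{NI}}^{\infty}f_4(t_1^{AS},t_1^{NI},t_2^{AS},t_2^{NI})\,dt_2^{NI}\,dt_2^{AS}\,dt_1^{NI}\,dt_1^{AS}\\
&\quad+\int_{c_1^{AS}}^{\infty}\int_{-\infty}^{c_1^{NI}}\int_{-\infty}^{c_2^{AS}}\int_{c_2^{NI}}^{\infty}f_4(t_1^{AS},t_1^{NI},t_2^{AS},t_2^{NI})\,dt_2^{NI}\,dt_2^{AS}\,dt_1^{NI}\,dt_1^{AS},
\end{aligned}$$

where $f_k(\cdot)$ is the probability density function of the $k$-variate normal distribution under the alternative hypotheses $H_1^{AS}$ and $H_1^{NI}$. On the other hand, the stopping probability $P_k$ based on DF-B is given by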

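$$P_k(\mu_E,\mu_R,\mu_P,\Delta,\sigma^2)=\begin{cases}
\Pr\left[\{T_1^{AS}>c_1^{AS}\}\cap\{T_1^{NI}>c_1^{NI}\}\right], & k=1,\\[1ex]
\Pr\left[\bigcap_{l=1}^{k-1}\left\{\{T_l^{AS}\le c_l^{AS}\}\cup\{T_l^{NI}\le c_l^{NI}\}\right\}\cap\{T_k^{AS}>c_k^{AS}\}\cap\{T_k^{NI}>c_k^{NI}\}\right], & k\ge 2.
\end{cases}$$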
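The stopping rules behind the two sets of probabilities can also be explored by simulation rather than by multivariate normal integration. The Python sketch below is a simplified illustration for the fixed margin approach with equal allocation. The critical values are the classical O'Brien-Fleming constants $c_k = 2.024\sqrt{K/k}$ for $K = 4$, quoted from standard tables as an assumption and not the LD alpha-spending boundaries used in this paper. The per-stage size of 36 per arm, the seed, and the function names are likewise hypothetical.

```python
import math
import random


def stop_times(rng, crit, n_stage, mu=(10.0, 10.0, 5.0), sigma=6.5, delta=2.5):
    """Return (stop_A, stop_B): analysis at which DF-A / DF-B rejects both nulls."""
    sums = [0.0, 0.0, 0.0]            # running totals for arms E, R, P
    as_shown = False
    stop_a = stop_b = None
    for k, c in enumerate(crit, start=1):
        for arm in range(3):
            # Sum of n_stage normal observations, drawn in one step.
            sums[arm] += rng.gauss(mu[arm] * n_stage, sigma * math.sqrt(n_stage))
        n = k * n_stage
        se = sigma * math.sqrt(2.0 / n)
        t_as = ((sums[1] - sums[2]) / n - delta) / se
        t_ni = ((sums[0] - sums[1]) / n + delta) / se
        as_now, ni_now = t_as > c, t_ni > c
        as_shown = as_shown or as_now
        # DF-A: NI is tested once AS has been shown at this or an earlier analysis.
        if stop_a is None and as_shown and ni_now:
            stop_a = k
        # DF-B: both statistics must cross at the same analysis.
        if stop_b is None and as_now and ni_now:
            stop_b = k
    return stop_a, stop_b


K, reps = 4, 20000
crit = [2.024 * math.sqrt(K / k) for k in range(1, K + 1)]
rng = random.Random(2024)
count_a, count_b = [0] * K, [0] * K
ordered = True
for _ in range(reps):
    a, b = stop_times(rng, crit, n_stage=36)
    if a is not None:
        count_a[a - 1] += 1
    if b is not None:
        count_b[b - 1] += 1
        ordered = ordered and a is not None and a <= b
p_a = [c / reps for c in count_a]
p_b = [c / reps for c in count_b]
print(p_a, p_b, ordered)
```

In every simulated trial DF-A stops no later than DF-B, since any analysis at which both statistics cross also satisfies the DF-A rule. This is consistent with the slightly smaller expected sample sizes reported for DF-A.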

When using DF-A for the fixed margin approach, we have an option for discontinuing the placebo group at the interim when the AS is demonstrated. In this situation, the ASN can be calculated by

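$$\mathrm{ASN}_2=\sum_{k=1}^{K}\sum_{l=1}^{k}\{N_k-(n_{Pk}-n_{Pl})\}P_{k|l}(\mu_E,\mu_R,\mu_P,\Delta,\sigma^2)+\left(1-\sum_{k=1}^{K}P_k(\mu_E,\mu_R,\mu_P,\Delta,\sigma^2)\right)N_K=N_K+\sum_{k=1}^{K}\sum_{l=1}^{k}\{N_k-N_K-(n_{Pk}-n_{Pl})\}P_{k|l}(\mu_E,\mu_R,\mu_P,\Delta,\sigma^2),$$

where $P_{k|l}(\mu_E,\mu_R,\mu_P,\Delta,\sigma^2)$ is given by

$$P_{k|l}(\mu_E,\mu_R,\mu_P,\Delta,\sigma^2)=\begin{cases}
\Pr\left[\{T_1^{AS}>c_1^{AS}\}\cap\{T_1^{NI}>c_1^{NI}\}\right], & \text{if } k=l=1,\\[1ex]
\Pr\left[\bigcap_{m=1}^{k-1}\{T_m^{AS}\le c_m^{AS}\}\cap\{T_k^{AS}>c_k^{AS}\}\cap\{T_k^{NI}>c_k^{NI}\}\right], & \text{if } k=l\ge 2,\\[1ex]
\Pr\left[\{T_1^{AS}>c_1^{AS}\}\cap\bigcap_{n=1}^{k-1}\{T_n^{NI}\le c_n^{NI}\}\cap\{T_k^{NI}>c_k^{NI}\}\right], & \text{if } k>l=1,\\[1ex]
\Pr\left[\bigcap_{m=1}^{l-1}\{T_m^{AS}\le c_m^{AS}\}\cap\{T_l^{AS}>c_l^{AS}\}\cap\bigcap_{n=l}^{k-1}\{T_n^{NI}\le c_n^{NI}\}\cap\{T_k^{NI}>c_k^{NI}\}\right], & \text{if } k>l\ge 2.
\end{cases}$$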

Appendix B: Conditional power derivation

We briefly describe the derivation of conditional powers discussed in Section 4. As the powers based on the DF-A or DF-B for the fixed margin and the fraction approaches can be derived in the same way, we only focus on the conditional powers based on the DF-A for the fixed margin approach.

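Under the planned maximum sample size and a given observed value of $(T_k^{AS},T_k^{NI})$, the conditional power based on the DF-A evaluated at the $k$th analysis is

$$CP=\begin{cases}
\Pr\left[\{T_K^{AS}>c_K^{AS}\}\cap\{T_K^{NI}>c_K^{NI}\}\mid T_k^{AS}=t_k^{AS},T_k^{NI}=t_k^{NI}\right], & \text{if } T_l^{NI}\le c_l^{NI} \text{ and } T_l^{AS}\le c_l^{AS} \text{ for all } l=1,\ldots,k,\\[1ex]
\Pr\left[T_K^{NI}>c_K^{NI}\mid T_k^{NI}=t_k^{NI}\right], & \text{if } T_l^{NI}\le c_l^{NI} \text{ for all } l=1,\ldots,k, \text{ and } T_l^{AS}>c_l^{AS} \text{ for any } l=1,\ldots,k.
\end{cases}\tag{B1}$$

For the fixed margin approach, the conditional distribution of $(T_K^{AS},T_K^{NI}\mid T_k^{AS}=t_k^{AS},T_k^{NI}=t_k^{NI})$ is a bivariate normal distribution with mean vector given as

$$\left(\sqrt{m}\,t_k^{AS}+\sqrt{1-m}\,\frac{\sqrt{n_{EK}-n_{Ek}}\,\delta_1}{\sqrt{1/C_R+1/C_P}},\;\sqrt{m}\,t_k^{NI}+\sqrt{1-m}\,\frac{\sqrt{n_{EK}-n_{Ek}}\,\delta_2}{\sqrt{1+1/C_R}}\right)^{T}$$

and covariance matrix given as $(1-m)\begin{pmatrix}1&\rho\\ \rho&1\end{pmatrix}$, where $m = n_{Ek}/n_{EK}$, $\delta_1 = (\mu_R - \mu_P - \Delta)/\sigma$ and $\delta_2 = (\mu_E - \mu_R + \Delta)/\sigma$. Therefore, the conditional power (B1) can be rewritten as

$$CP(\delta_1,\delta_2)=\begin{cases}
\Phi_2\!\left(-\dfrac{c_K^{AS}-\sqrt{m}\,t_k^{AS}}{\sqrt{1-m}}+\dfrac{\sqrt{n_{EK}-n_{Ek}}\,\delta_1}{\sqrt{1/C_R+1/C_P}},\; -\dfrac{c_K^{NI}-\sqrt{m}\,t_k^{NI}}{\sqrt{1-m}}+\dfrac{\sqrt{n_{EK}-n_{Ek}}\,\delta_2}{\sqrt{1+1/C_R}}\right), & \text{if } T_l^{NI}\le c_l^{NI} \text{ and } T_l^{AS}\le c_l^{AS} \text{ for all } l=1,\ldots,k,\\[2ex]
1-\Phi_1\!\left(\dfrac{c_K^{NI}-\sqrt{m}\,t_k^{NI}}{\sqrt{1-m}}-\dfrac{\sqrt{n_{EK}-n_{Ek}}\,\delta_2}{\sqrt{1+1/C_R}}\right), & \text{if } T_l^{NI}\le c_l^{NI} \text{ for all } l=1,\ldots,k, \text{ and } T_l^{AS}>c_l^{AS} \text{ for any } l=1,\ldots,k,
\end{cases}$$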

where Φ1(·) and Φ2(·) are the cumulative distribution functions of the standardized univariate and bivariate normal distributions.

References

  • Committee for Medicinal Products for Human Use (CHMP). Guideline on the choice of the non-inferiority margin. 2005. Available at: http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500003636.pdf. [Accessed November 12, 2015].
  • Cui L, Hung HMJ, Wang SJ. Modification of sample size in group sequential clinical trials. Biometrics. 1999;55:853–857. doi: 10.1111/j.0006-341X.1999.00853.x.
  • D’Agostino RB, Massaro JM, Sullivan LM. Non-inferiority trials: design concepts and issues: the encounters of academic consultants in statistics. Statistics in Medicine. 2003;22:169–186. doi: 10.1002/sim.1425.
  • Fishbane S, Schiller B, Locatelli F, Covic AC, Provenzano R, Wiecek A, Levin NW, Kaplan M, Macdougall IC, Francisco C, Mayo MR, Polu KR, Duliege AM, Besarab A, for the EMERALD Study Groups. Peginesatide in patients with anemia undergoing hemodialysis. New England Journal of Medicine. 2013;368:307–19. doi: 10.1056/NEJMoa1203165.
  • Food and Drug Administration (FDA). Guidance for industry non-inferiority trials. Rockville, MD: Food and Drug Administration; 2010. Available at: http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM202140.pdf. [Accessed July 14, 2015].
  • Gao P, Ware JH. Assessing non-inferiority: a combination approach. Statistics in Medicine. 2008;27:392–406. doi: 10.1002/sim.2938.
  • Genz A. Numerical computation of multivariate normal probabilities. Journal of Computational and Graphical Statistics. 1992;1:141–149. doi: 10.1080/10618600.1992.10477010.
  • Hauschke D, Pigeot I. Establishing efficacy of a new experimental treatment in the ‘gold standard’ design. Biometrical Journal. 2005a;47:782–786. doi: 10.1002/bimj.200510169.
  • Hauschke D, Pigeot I. Rejoinder to “Establishing efficacy of a new experimental treatment in the ‘gold standard’ design”. Biometrical Journal. 2005b;47:797–798. doi: 10.1002/bimj.200510179.
  • Hasler M, Vonk R, Hothorn LA. Assessing non-inferiority of a new treatment in a three-arm trial in the presence of heteroscedasticity. Statistics in Medicine. 2008;27:490–503. doi: 10.1002/sim.3052.
  • Hamasaki T, Asakura K, Evans SR, Sugimoto T, Sozu T. Group-sequential strategies in clinical trials with multiple co-primary endpoints. Statistics in Biopharmaceutical Research. 2015;7:36–54. doi: 10.1080/19466315.2014.1003090.
  • Hamasaki T, Sugimoto T, Evans SR, Sozu T. Sample size determination for clinical trials with co-primary outcomes: Exponential event times. Pharmaceutical Statistics. 2013;12:28–34. doi: 10.1002/pst.1545.
  • Hida E, Tango T. On the three-arm non-inferiority trial including a placebo with a prespecified margin. Statistics in Medicine. 2011a;30:224–231. doi: 10.1002/sim.4099.
  • Hida E, Tango T. Response to Joachim Röhmel and Iris Pigeot. Statistics in Medicine. 2011b;30:3165. doi: 10.1002/sim.4313.
  • Hida E, Tango T. Three-arm noninferiority trials with a prespecified margin for inference of the difference in the proportions of binary endpoints. Journal of Biopharmaceutical Statistics. 2013;23:774–789. doi: 10.1080/10543406.2013.789893.
  • International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH). ICH Harmonised Tripartite Guideline E10: Choice of control group and related issues in clinical trials. 2000 Jul. Available at: http://www.ich.org/pdfICH/e10step4.pdf. [Accessed July 14, 2015].
  • Kieser M, Friede T. Planning and analysis of three-arm non-inferiority trials with binary endpoints. Statistics in Medicine. 2007;26:253–273. doi: 10.1002/sim.2543.
  • Koch A, Röhmel J. Hypothesis testing in the “gold standard” design for proving the efficacy of an experimental treatment. Journal of Biopharmaceutical Statistics. 2004;14:315–325. doi: 10.1081/BIP-120037182.
  • Kombrink K, Munk A, Friede T. Design and semiparametric analysis of non-inferiority trials with active and placebo control for censored time-to-event data. Statistics in Medicine. 2013;32:3055–3066. doi: 10.1002/sim.5769.
  • Lan KKG, DeMets DL. Discrete sequential boundaries for clinical trials. Biometrika. 1983;70:659–663. doi: 10.1093/biomet/70.3.659.
  • Li G, Gao S. A group sequential type design for three-arm non-inferiority trials with binary endpoints. Biometrical Journal. 2010;52:504–518. doi: 10.1002/bimj.200900188.
  • Mielke M, Munk A, Schacht A. The assessment of non-inferiority in a gold standard design with censored, exponentially distributed endpoints. Statistics in Medicine. 2008;27:5093–5110. doi: 10.1002/sim.3348.
  • O’Brien PC, Fleming TR. A multiple testing procedure for clinical trials. Biometrics. 1979;35:549–556. doi: 10.2307/2530245.
  • Pigeot I, Schäfer J, Röhmel J, Hauschke D. Assessing non-inferiority of a new treatment in a three-arm clinical trial including a placebo. Statistics in Medicine. 2003;22:883–899. doi: 10.1002/sim.1450.
  • Pocock SJ. Group sequential methods in the design and analysis of clinical trials. Biometrika. 1977;64:191–199. doi: 10.1093/biomet/64.2.191.
  • Röhmel J, Pigeot I. Statistical strategies for the analysis of clinical trials with an experimental treatment, an active control and placebo, and a prespecified fixed non-inferiority margin for the difference in means. Statistics in Medicine. 2011;30:3162–3164. doi: 10.1002/sim.4299.
  • Rothmann MD, Wiens BL, Chan ISF. Design and Analysis of Non-Inferiority Trials. Chapman & Hall/CRC; 2012.
  • Schlömer P, Brannath W. Group sequential designs for three-arm ‘gold standard’ non-inferiority trials with fixed margin. Statistics in Medicine. 2013;32:4875–4899. doi: 10.1002/sim.5950.
  • Stucke K, Kieser M. A general approach for sample size calculation for the three-arm ‘gold standard’ non-inferiority design. Statistics in Medicine. 2012;31:3579–3596. doi: 10.1002/sim.5461.
  • Sugimoto T, Sozu T, Hamasaki T. A convenient formula for sample size calculations in clinical trials with multiple co-primary continuous endpoints. Pharmaceutical Statistics. 2012;11:118–128. doi: 10.1002/pst.505.
  • Tang ML, Tang NS. Test of noninferiority via rate difference for three-arm clinical trials with placebo. Journal of Biopharmaceutical Statistics. 2004;14:337–347. doi: 10.1081/BIP-120037184.
  • Tsong Y, Zhang J, Wang SJ. Group sequential design and analysis of clinical equivalence assessment for generic nonsystematic drug products. Journal of Biopharmaceutical Statistics. 2004;14:359–373. doi: 10.1081/BIP-120037186.