
Clin Trials. Author manuscript; available in PMC 2011 August 1.


Published online 2010 June 23. doi: 10.1177/1740774510373629

PMCID: PMC3085081

NIHMSID: NIHMS288455

Song Zhang

Department of Clinical Sciences, UT Southwestern Medical Center, Dallas, TX

Jing Cao

Department of Statistical Science, Southern Methodist University, Dallas, TX

Department of Clinical Sciences, UT Southwestern Medical Center, Dallas, TX

Correspondence should be sent to Song Zhang, Ph.D., Department of Clinical Sciences, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, Texas 75390-9066, U.S.A. Email: Song.Zhang@UTSouthwestern.edu



**Abstract**

**Background** Makuch and Simon [1] developed a sample size formula for historical control trials. When assessing power, they assumed the true control treatment effect to be equal to the observed effect from the historical control group. Many researchers have pointed out that the M-S approach does not preserve the nominal power and type I error when the uncertainty in the true historical control treatment effect is taken into account.

**Purpose** To develop a sample size formula that properly accounts for the underlying randomness in the observations from the historical control group.

**Methods** We reveal the extremely skewed nature of the distributions of power and type I error, obtained over all the random realizations of the historical control data. The skewness motivates us to derive a sample size formula that controls the percentiles, instead of the means, of the power and type I error.

**Results** A closed-form sample size formula is developed to control arbitrary percentiles of the power and type I error for historical control trials. A simulation study further demonstrates that this approach preserves the operational characteristics in a more realistic scenario where the population variances are unknown and replaced by sample variances.

**Limitations** The closed-form sample size formula is derived for continuous outcomes. The formula is more complicated for binary or survival time outcomes.

**Conclusions** We have derived a closed-form sample size formula that controls the percentiles, instead of the means, of the power and type I error in historical control trials, which have extremely skewed distributions over all the possible realizations of the historical control data.

**1 Introduction**

Randomized clinical trials (RCTs) have become the gold standard for comparing treatment effects. Despite their rigorous scientific basis, there are situations where RCTs are infeasible due to concerns about ethics, patient preference, cost, or regulatory acceptability. For example, the resources required by an RCT might be prohibitive for some phase II trials that are intended to obtain preliminary data on the effectiveness of a new treatment [2]. Another example is that when evidence already exists showing the superiority of a new treatment over the standard one, it might be unethical for an RCT to assign patients to a potentially inferior treatment. One solution is to use a historical control trial (HCT), where the experimental therapy is compared with a control therapy (referred to as the historical control, or HC) that has been evaluated in a previously conducted trial. Because an HCT can be smaller in size and easier to conduct, it has been widely applied in clinical research [3, 4, 5, 6, 7, 8, 9].

Makuch and Simon [1] developed a sample size formula for HCTs with a binary outcome. In power calculation, they assumed that the observed response rate from the HC group was the true control response rate. Their formula was based on the two-sample test statistic employed in RCTs but the power calculation only accounted for the sampling variability in the experimental group. The sample size solution was obtained through a numerical search. Using a similar idea, Dixon and Simon [10] provided a sample size formula for HCTs with exponential survival outcomes. Chang et al. [11] presented a two-stage design for phase II clinical trials with HC and continuous outcomes. More discussions about the HCT sample size calculation can be found in [12] and [13].

The estimated sample size for an HCT is usually much smaller than that required by an RCT. Lee and Tseng [14] pointed out that the sample size reduction in HCTs is largely unjustified due to the strong assumption that the observed historical control response rate is equal to the true control response rate. They proposed a uniform power method to control the expected power, taking into account the uncertainty in the HC response rate. The resulting sample size is closer to the RCT sample size than the one based on Makuch and Simon's (M-S) method. Korn and Freidlin [15] compared three approaches to HCT design: the M-S approach, the RCT approach, and the one-sample approach (based on a one-sample test that the experimental treatment effect is greater than the observed HC treatment effect). The authors suggested adopting the RCT approach because it preserves the unconditional power over the random HC observations.

In this study, we investigate the sample size calculation for HCTs with continuous outcomes, accounting for the uncertainty caused by the unknown true HC treatment effect. We provide a unified framework for the M-S, RCT and one-sample approaches, where they are shown to control either the mean or the median of the random power and type I error, obtained over all the possible realizations of the HC data given the true HC effect. We further demonstrate through simulation that the distributions of power and type I error are extremely skewed. This extreme skewness leads to undesirable properties of sample sizes calculated to control the means of power and type I error. One revealing example in our simulation is that with the mean power controlled at 0.8, a slight decrease in the mean type I error from 0.06 to 0.05 leads to a drastic increase in sample size from 286 to 487. This observation motivates us to develop a sample size formula that controls the percentiles, instead of the means, of the random power and type I error. To our knowledge, this is the first study in HCT design to demonstrate the extreme skewness in the distributions of power and type I error, and to estimate sample size based on the percentiles of power and type I error. It provides researchers with a sensible way to assess the risk in an HCT. The proposed formula has a closed form, which can be easily computed using a scientific calculator.

The rest of this paper is organized as follows. In Section 2 we review the three different approaches (M-S, RCT, and one-sample) to sample size calculation in HCT under a unified framework. A simulation study is conducted to demonstrate the extreme skewness in the distributions of power and type I error. In Section 3 we present a sample size formula to control arbitrary percentiles of power and type I error. We evaluate its performance through simulation in two scenarios: an ideal scenario (population variances known) and a more realistic scenario (population variances unknown). In Section 4 we provide a real application of the proposed method. The final section is devoted to discussion.

**2 A Unified Framework**

We briefly review the M-S, RCT, and one-sample approaches to HCT sample size calculation. Suppose in a clinical trial we compare the outcomes between an experimental group and an HC group. The outcome variable is continuous and normally distributed. Let **Y** = {Y_{1}, …, Y_{m}} be the *m* observations from the HC group, with Y_{j} ~ N(θ_{0}, σ_{0}^{2}), and **X** = {X_{1}, …, X_{n}} be the *n* observations from the experimental group, with X_{i} ~ N(θ_{1}, σ_{1}^{2}). We define the two-sample test statistic

Z(**X**, **Y**) = (X̄ − Ȳ) / (σ_{1}^{2}/n + σ_{0}^{2}/m)^{1/2},     (1)

where X̄ and Ȳ are the sample means from the two groups. Given type I error α, power 1 − β, *m*, σ_{0}^{2}, σ_{1}^{2}, and the difference in treatment effects Δ = θ_{1} − θ_{0}, the sample size *n* is obtained by solving

P( Z(**X**, **Y**) > z_{1−α} | θ_{1} − θ_{0} = Δ ) = 1 − β,     (2)

where *z*_{1−α} is the 100(1 − α)th percentile of the standard normal distribution. Here and in the rest of the paper we do not differentiate between the estimated sample size and the solution to the sample size equations, with the understanding that the final sample size is the smallest integer greater than or equal to the solution.

**In the M-S approach**, the *Y _{j}*s from the HC group are considered not subject to sampling variability because they have been observed before the clinical trial. This consideration leads to the following manipulation of (2),

1 − β = Φ( (θ_{1} − Ȳ) n^{1/2}/σ_{1} − z_{1−α}(1 + nσ_{0}^{2}/(mσ_{1}^{2}))^{1/2} ),     (3)

where Φ(·) is the cumulative distribution function of the standard normal distribution. Thus we find *n* by solving

Φ( Δ n^{1/2}/σ_{1} − z_{1−α}(1 + nσ_{0}^{2}/(mσ_{1}^{2}))^{1/2} ) = 1 − β.     (4)

Since the true HC effect (θ_{0}) is usually unknown, it is cancelled out in the equation by assuming Ȳ = θ_{0}. This is a strong assumption, especially when the number of HC observations is limited. Traditionally Equation (4) has been solved through a numerical search [11]. Here we present a closed-form solution and a sufficient and necessary condition for its existence.

**Theorem 1**. Define a = z_{1−α}^{2}σ_{0}^{2}/m − Δ^{2}, b = 2Δz_{1−β}σ_{1}, and c = (z_{1−α}^{2} − z_{1−β}^{2})σ_{1}^{2}. In clinical trials we usually specify α and β such that α ≤ β. Equation (4) has a unique sample size solution if and only if Δ > z_{1−α}σ_{0}/m^{1/2}, and the solution is

n_{0} = [ (b + (b^{2} − 4ac)^{1/2}) / (−2a) ]^{2}.     (5)

**Proof.** See Appendix A.1.

Theorem 1 helps researchers avoid a time-consuming numerical search. It also points out a potential pitfall of the M-S approach: too small an assumed difference in the treatment effects (Δ ≤ z_{1−α}σ_{0}/m^{1/2}) leads to no solution for the sample size.
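Since the definitions of *a*, *b*, and *c* did not survive extraction here, the sketch below uses one consistent reconstruction of the quadratic in r = n^{1/2} (a = z_{1−α}^{2}σ_{0}^{2}/m − Δ^{2}, b = 2Δz_{1−β}σ_{1}, c = (z_{1−α}^{2} − z_{1−β}^{2})σ_{1}^{2}) that reproduces the example value n_{0} = 144 reported later in this section; it should be checked against the published Theorem 1.

```python
from math import ceil, sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard normal quantile function

def n0_closed_form(m, s0, s1, delta, alpha, beta):
    """Closed-form M-S sample size via the quadratic a*r^2 + b*r + c = 0
    in r = sqrt(n); a solution requires delta > z_{1-alpha} * s0 / sqrt(m)."""
    za, zb = z(1 - alpha), z(1 - beta)
    a = za ** 2 * s0 ** 2 / m - delta ** 2
    b = 2 * delta * zb * s1
    c = (za ** 2 - zb ** 2) * s1 ** 2
    if a >= 0:
        raise ValueError("no solution: need delta > z_{1-alpha} * s0 / sqrt(m)")
    r = (-b - sqrt(b * b - 4 * a * c)) / (2 * a)  # the unique positive root
    return ceil(r * r)

print(n0_closed_form(80, 1.0, 1.0, 0.3, 0.05, 0.2))  # 144
```

Under this reading, too small a Δ makes `a` non-negative and the function raises, mirroring the pitfall noted above.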

**In the one-sample approach**, the hypotheses are specified as H_{0}: θ_{1} ≤ Ȳ and H_{1}: θ_{1} > Ȳ, based on the assumption that Ȳ = θ_{0}. The one-sample test statistic Z_{1}(**X**, **Y**) = (X̄ − Ȳ) n^{1/2}/σ_{1} is employed. The sample size estimate is n_{1} = (z_{1−α} + z_{1−β})^{2}σ_{1}^{2}/Δ^{2}.

**In the RCT approach**, the HC group is treated as a regular control arm in an RCT. The sample size is estimated by n_{2} = σ_{1}^{2} / ( Δ^{2}/(z_{1−α} + z_{1−β})^{2} − σ_{0}^{2}/m ), which is based on a two-sample test.

The three approaches produce drastically different sample size estimates. For example, given *m* = 80, σ_{0} = σ_{1} = 1, Δ = 0.3, α = 0.05, and 1 − β = 0.8, the estimated sample sizes are *n*_{0} = 144, *n*_{1} = 69, and *n*_{2} = 487, respectively. The formulas of *n*_{0}, *n*_{1} and *n*_{2} do not depend on the specific values of the HC sample mean (Ȳ) or the true mean (θ_{0}). The test statistics, however, are calculated based on the HC sample mean. In a particular study, the unknown difference between Ȳ and θ_{0} has a great impact on the realized power and type I error. We conduct Simulation 1 to compare the performance of *n*_{0}, *n*_{1} and *n*_{2}. Details of the simulation algorithm are presented in Appendix A.2.
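The three estimates can be checked numerically. The sketch below assumes unit variances (σ_{0} = σ_{1} = 1, the values consistent with the reported 144/69/487) and standard readings of the three formulas; the M-S size is found by the smallest *n* satisfying Equation (4).

```python
from math import ceil, sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard normal quantile function

def n_one_sample(m, s0, s1, delta, alpha, beta):
    """One-sample approach: n1 = (z_{1-a} + z_{1-b})^2 * s1^2 / delta^2."""
    return ceil((z(1 - alpha) + z(1 - beta)) ** 2 * s1 ** 2 / delta ** 2)

def n_rct(m, s0, s1, delta, alpha, beta):
    """RCT approach: solve delta / sqrt(s1^2/n + s0^2/m) = z_{1-a} + z_{1-b}."""
    zsum = z(1 - alpha) + z(1 - beta)
    return ceil(s1 ** 2 / (delta ** 2 / zsum ** 2 - s0 ** 2 / m))

def n_ms(m, s0, s1, delta, alpha, beta):
    """M-S approach: smallest n with
    delta*sqrt(n)/s1 - z_{1-a}*sqrt(1 + n*s0^2/(m*s1^2)) >= z_{1-b};
    a solution exists when delta > z_{1-a}*s0/sqrt(m)."""
    n = 1
    while (delta * sqrt(n) / s1
           - z(1 - alpha) * sqrt(1 + n * s0 ** 2 / (m * s1 ** 2))) < z(1 - beta):
        n += 1
    return n

args = (80, 1.0, 1.0, 0.3, 0.05, 0.2)  # m, s0, s1, delta, alpha, beta
print(n_ms(*args), n_one_sample(*args), n_rct(*args))  # 144 69 487
```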

The realization of a particular HC data set (**Y**^{(k)}) leads to a conditional power (q_{v}^{(k)}) and a conditional type I error (h_{v}^{(k)}) under sample size *n _{v}*. Over the random realizations of **Y**, the power and type I error are therefore random variables.

Without loss of generality, we set the true HC effect at θ_{0} = 0. We also assume *m* = 80, σ_{0} = σ_{1} = 1, Δ = 0.3, α = 0.05, and 1 − β = 0.8. Figure 1 shows the results of Simulation 1. The graphs in the first column indicate that both the conditional power and type I error decrease monotonically as the difference between the observed and true HC treatment effect, (Ȳ − θ_{0}), increases. Table 1 lists the conditional power and type I errors given **Y**^{(k)}, with Ȳ changing between two standard errors below and above θ_{0}. For example, under *n*_{0}, when the observed HC effect (Ȳ) is one standard error (σ_{0}/m^{1/2}) away from the true effect (θ_{0}), the conditional power changes from 0.313 to 0.987 and the conditional type I error from 0 to 0.088, which deviate far from the nominal levels of 1 − β = 0.8 and α = 0.05. Note that such a deviation is not a rare event, because for a particular HC data set there is a 32% chance that the sample mean is one standard error or further away from its true mean. The second and third columns of Figure 1 show that the distributions of type I error and power are extremely skewed, which is also revealed by the difference between their means and medians (achieved at Ȳ = θ_{0}) in Table 1.

The type I errors (h_{v}) and powers (q_{v}) under *n*_{0}, *n*_{1}, and *n*_{2}. The first column plots h_{v} and q_{v} versus the difference between the observed and true HC effects, Ȳ − θ_{0}, shown as black and red curves. The second and third columns plot the histograms of *h*_{v} and *q*_{v}, respectively.

We briefly explain why the power and type I error have skewed distributions over the random realizations of the HC data. Taking the one-sample approach (*n*_{1}) as an example, for a particular HC data set **Y**^{(k)}, it can be shown that the conditional type I error is

h_{1}^{(k)} = Φ( −z_{1−α} − (Ȳ^{(k)} − θ_{0}) n_{1}^{1/2}/σ_{1} ).

In the parentheses, *z*_{1−α} is usually the dominant term and shifts the probability computation to the tail area of the normal distribution. As a result, although the sample mean (Ȳ) is symmetric around the true mean (θ_{0}), the impact of the sample mean being greater or smaller than the true mean is different. For Ȳ^{(k)} > θ_{0}, the conditional type I errors have a range of (0, α), which is narrow under commonly specified significance levels. On the other hand, for Ȳ^{(k)} < θ_{0}, the conditional type I errors have a much wider range, (α, 1). In summary, it is because researchers usually set the power and significance level in the tail area (i.e., α close to 0 and 1 − β close to 1) that the random power and type I error are skewed.
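The skewness is easy to see numerically. The sketch below draws the HC sample mean Ȳ ~ N(θ₀, σ₀²/m) and evaluates the conditional type I error of the one-sample rule in closed form, assuming the earlier setting (m = 80, n₁ = 69, θ₀ = 0, unit variances):

```python
import random
from math import sqrt
from statistics import NormalDist, mean, median

nd = NormalDist()
z95 = nd.inv_cdf(0.95)  # z_{1-alpha} for alpha = 0.05

def h1(ybar, n1=69, sigma1=1.0, theta0=0.0):
    # Conditional type I error of the one-sample test given the observed HC
    # mean; decreasing in ybar, and exactly alpha when ybar == theta0.
    return nd.cdf(-z95 - (ybar - theta0) * sqrt(n1) / sigma1)

random.seed(1)
# HC sample mean Ybar ~ N(0, sigma0^2 / m) with m = 80, sigma0 = 1.
draws = [h1(random.gauss(0.0, 1.0 / sqrt(80))) for _ in range(20000)]
print("median:", round(median(draws), 3))  # close to alpha = 0.05
print("mean:  ", round(mean(draws), 3))    # well above the median: right skew
```

The median sits at α (the one-sample approach controls the median, per Theorem 2 below in spirit), while the mean is more than twice as large, which is exactly the skewness visible in Figure 1.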

Finally, Table 1 provides empirical evidence for the theory presented in Theorem 2, which states a unified framework for *n _{v}* (v = 0, 1, 2).

**Theorem 2**. The sample sizes (n_{0}, n_{1}, n_{2}) control the random power and type I error in such a way that

- The M-S approach (n_{0}) controls the mean of the type I error at α and the median of the power at 1 − β;
- The one-sample approach (n_{1}) controls the medians of the type I error and power at α and 1 − β, respectively;
- The RCT approach (n_{2}) controls the means of the type I error and power at α and 1 − β, respectively.

**Proof**. See Appendix A.3.

Theorem 2 suggests that the M-S approach tries to reach a compromise between the one-sample and RCT approaches by controlling the mean type I error at α and the median power at 1 − β.

**3 Sample Size Controlling the Percentiles**

Simulation 1 shows that the distributions of power and type I error, observed over all the random realizations of the HC data, are extremely skewed. For random variables with extremely skewed distributions, making decisions based on a location parameter such as a percentile is usually more desirable than relying on the mean. We propose a sample size formula to control arbitrary percentiles of the random power and type I error. It provides a more sensible way to assess the risk in HCTs.

**Theorem 3**. Suppose in an HCT the goal is to control the (1 − p_{q})th percentile of the power at 1 − β, and the p_{h}th percentile of the type I error at α. Then the required sample size is

n* = (z_{1−α} + z_{1−β})^{2} σ_{1}^{2} / ( Δ − (z_{p_{h}} + z_{p_{q}}) σ_{0}/m^{1/2} )^{2},     (6)

and the null hypothesis is rejected if

Z*(**X**, **Y**) = (X̄ − Ȳ − z_{p_{h}} σ_{0}/m^{1/2}) n^{1/2}/σ_{1} > z_{1−α}.

The parameters p_{q} and p_{h} can be specified arbitrarily as long as the condition Δ > (z_{p_{h}} + z_{p_{q}}) σ_{0}/m^{1/2} holds.

**Proof**. See Appendix A.4.

According to Theorem 3, let *q** and *h** be the random power and type I error under sample size *n**. Then we have *P*(*q** > 1 − β) = *p _{q}* and *P*(*h** < α) = *p _{h}*.

In other words, we propose sample size *n** to achieve the goal that the operational characteristics (realized power and type I error) of an HCT are more desirable than the nominal levels with pre-specified probabilities (*p _{q}* and *p _{h}*).
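A quick numerical sketch of this calculation, implementing Formula (6) as we read it from the Appendix A.4 derivation (a hypothetical reconstruction; verify against the published formula). A useful sanity check: at p_{h} = p_{q} = 0.5 the percentile adjustment vanishes and n* collapses to the one-sample estimate, consistent with Theorem 2.

```python
from math import ceil, sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard normal quantile function

def n_star(m, s0, s1, delta, alpha, beta, p_h, p_q):
    """Percentile-controlling sample size (reconstruction of Formula (6)):
    n* = (z_{1-a} + z_{1-b})^2 s1^2 / (delta - (z_{p_h} + z_{p_q}) s0/sqrt(m))^2,
    valid when delta > (z_{p_h} + z_{p_q}) * s0 / sqrt(m)."""
    adj = delta - (z(p_h) + z(p_q)) * s0 / sqrt(m)
    if adj <= 0:
        raise ValueError("condition delta > (z_ph + z_pq)*s0/sqrt(m) violated")
    return ceil((z(1 - alpha) + z(1 - beta)) ** 2 * s1 ** 2 / adj ** 2)

# p_h = p_q = 0.5: reduces to the one-sample estimate n1 = 69;
# stricter percentiles inflate the required sample size.
print(n_star(80, 1, 1, 0.3, 0.05, 0.2, 0.5, 0.5))  # 69
print(n_star(80, 1, 1, 0.3, 0.05, 0.2, 0.7, 0.7))
```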

Based on the same setting as in Simulation 1, we conduct Simulation 2 to explore the properties of *n**. We consider different combinations of *p _{q}* and *p _{h}*.

In Simulations 1 and 2, we have assumed the population variances of the HC and experimental groups (σ_{0}^{2} and σ_{1}^{2}) to be known, which is usually unrealistic in practice. We conduct Simulation 3 to further assess the performance of *n** in a more realistic scenario. It proceeds as follows: a) To compute the required sample size *n**, the assumed Δ and σ_{1}^{2} are plugged into Equation (6); however, σ_{0}^{2} is replaced by σ̂_{0}^{2}, the HC sample variance. b) In hypothesis testing, we compute the test statistic *Z**(**X**, **Y**) with the population variances replaced by the corresponding sample variances.

Table 3 lists the results of Simulation 3. The sample size *n** becomes random when we replace the HC population variance (σ_{0}^{2}) with a random sample variance (σ̂_{0}^{2}).

**4 Example**

The safety and efficacy of laparoscopic rectopexy for rectal prolapse will be compared with those of the open rectopexy procedure, which was conducted several months earlier by the same group of surgeons at the same institution [16]. Data will be collected prospectively for the laparoscopic rectopexy group and by hospital chart review for the HC group. The HC group includes 24 consecutive patients who had undergone conventional open rectopexy without concomitant gynecologic procedures. These patients required an average of 71.5 milligrams of morphine during the first 48 hours after the procedure, with a standard deviation of 45.9 milligrams. It is expected that the average amount of morphine needed during the first 48 hours after laparoscopic rectopexy will be 41.5 milligrams, with a standard deviation of 35.0 milligrams. We estimate the number of patients needed to detect the difference in morphine requirement during the first 48 hours between the open and laparoscopic procedures, controlling the 70th percentile (*p _{h}* = 0.7) of the type I error at 5% and the 30th percentile (*p _{q}* = 0.7) of the power at 80%.
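Under one plausible reading of Formula (6), namely n* = (z_{1−α} + z_{1−β})²σ_{1}²/(Δ − (z_{p_h} + z_{p_q})σ_{0}/√m)² (a reconstruction from the appendix derivation, not the published display), the example's numbers give:

```python
from math import ceil, sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf

# Historical open-rectopexy group: m = 24, mean 71.5 mg, SD 45.9 mg.
# Expected laparoscopic group: mean 41.5 mg, SD 35.0 mg -> delta = 30 mg.
m, s0, s1, delta = 24, 45.9, 35.0, 71.5 - 41.5
alpha, beta, p_h, p_q = 0.05, 0.20, 0.7, 0.7

adj = delta - (z(p_h) + z(p_q)) * s0 / sqrt(m)   # percentile adjustment
n = ceil((z(1 - alpha) + z(1 - beta)) ** 2 * s1 ** 2 / adj ** 2)
print(n)
```

Under this reading the design needs roughly 19 laparoscopic patients; the value should be checked against the paper's own calculation.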

**5 Discussion**

We have provided a unified framework for three existing approaches (M-S, one-sample, and RCT) to HCTs, by showing that each controls either the mean or the median of the power and type I error. We further developed a closed-form sample size formula to control arbitrary percentiles of the random power and type I error. It provides more flexibility in assessing the risk in HCTs and accommodates the extreme skewness in the distributions of power and type I error. We limited our discussion to HCTs with continuous outcomes. In the future we will extend it to HCTs with binary and survival time outcomes.

Similar to the existing approaches, the proposed sample size formula (*n**) requires the population variances of the HC and experimental groups to be known. Through a simulation study, we demonstrated that the proposed approach successfully controls the percentiles of power and type I error in a more realistic scenario, where the true variances are unknown and replaced with observed sample variances. One reviewer kindly pointed out that in situations where the measurements are continuous with bounded support, say on (*a*, *b*), a sample size formula can be derived without requiring the population variances. Specifically, we can rescale each observation to (0, 1) by defining W = (Y − a)/(b − a). Applying the arcsin transformation, we can calculate the sample size based on sin^{−1}(W^{1/2}), whose variance is approximately free of the sampled population's true variance.
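The variance-stabilizing property behind the reviewer's suggestion can be checked in the classic binary-proportion case (a stand-in for bounded outcomes, not the paper's own computation): Var(sin⁻¹(√p̂)) ≈ 1/(4n), whatever the true p.

```python
import random
from math import asin, sqrt
from statistics import variance

# Monte Carlo check of the arcsin variance-stabilizing transform:
# Var(asin(sqrt(phat))) ~= 1/(4n), independent of the true proportion p.
random.seed(7)
n = 200
results = {}
for p in (0.2, 0.5, 0.8):
    tvals = [asin(sqrt(sum(random.random() < p for _ in range(n)) / n))
             for _ in range(4000)]
    results[p] = variance(tvals) * 4 * n   # should hover near 1 for every p
print(results)
```

Because the transformed variance depends only on n, a sample size formula based on the transformed scale needs no variance input.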

Lee and Tseng [14] presented sample size calculation for HCTs with binary outcomes controlling the means of power and type I error. Theorem 2 states that the same goal is achieved by *n*_{2} for HCTs with continuous outcomes. The computation in [14] is more complicated due to the transformation performed on binary data. For continuous outcomes, when the HC variance is assumed to be known, the sample size formula does not depend on observations from the HC group. Thus one pair of null and alternative hypotheses leads to one unique sample size estimate. For binary outcomes, the sample size formula computed under the arcsin transformation depends on the observations from the HC group. Thus one pair of hypotheses leads to many possible sample size estimates, each determined by a random realization of the HC data. In [14], the authors had to deal with the expectation of sample sizes.

The term (z_{p_{h}}σ_{0}/m^{1/2}) in the numerator of *Z**(**X**, **Y**) can be viewed as an adjustment to the observed HC effect that accounts for the sampling variability of Ȳ.

**Acknowledgments**

This study is supported in part by NIH grants UL1 RR024982 and P50 CA70907. The authors thank the two reviewers and the associate editor for their constructive comments and suggestions.

**Appendix A.1: Proof of Theorem 1**

*Proof*. Assuming Ȳ = θ_{0} and applying some simple algebra, we transform (4) to

Δ r/σ_{1} − z_{1−β} = z_{1−α}( 1 + r^{2}σ_{0}^{2}/(mσ_{1}^{2}) )^{1/2},

where r = n^{1/2}.

Squaring both sides and rearranging, we have

a r^{2} + b r + c = 0,     (7)

where a, b, and c are defined in Theorem 1. From (7) we can find a closed-form solution for r subject to the constraint that r > 0.

First we need *b*^{2} − 4*ac* ≥ 0, where *a*, *b* and *c* are defined in (5). When this holds, there are two possible roots, r_{1} = (−b + (b^{2} − 4ac)^{1/2})/(2a) and r_{2} = (−b − (b^{2} − 4ac)^{1/2})/(2a).

**Fact 1**. No plausible solution exists under Δ ≤ z_{1−α}σ_{0}/m^{1/2}.

- If Δ = z_{1−α}σ_{0}/m^{1/2}, then a = 0, and the solution to (7) is r = −c/b. Because c > 0 when α < β and b > 0 by definition, we eliminate r due to the positive constraint on r = n^{1/2}.
- If Δ < z_{1−α}σ_{0}/m^{1/2}, then a > 0 and 4ac > 0. It is easy to show that r_{1} < 0 and r_{2} < 0.

**Sufficiency**: We demonstrate that the condition Δ > z_{1−α}σ_{0}/m^{1/2} implies (5) being the unique sample size solution. From the condition we have *a* < 0 and 4*ac* < 0. Together with *b* > 0, it is easy to show that *r*_{1} < 0 and *r*_{2} > 0. Thus (5) is the unique sample size solution.

**Necessity**: We demonstrate that (5) being the unique sample size solution implies Δ > z_{1−α}σ_{0}/m^{1/2}. (5) being the unique solution is equivalent to *r*_{2} being the unique positive solution of (7). There are two scenarios:

- *b*^{2} − 4*ac* = 0, which requires a > 0 and hence Δ < z_{1−α}σ_{0}/m^{1/2}. This is eliminated due to Fact 1.
- *b*^{2} − 4*ac* > 0 with *r*_{1} < 0 and *r*_{2} > 0. We eliminate Δ ≤ z_{1−α}σ_{0}/m^{1/2} based on Fact 1. The validity of Δ > z_{1−α}σ_{0}/m^{1/2} is established by Sufficiency.

Thus we complete the proof.

**Appendix A.2: Simulation 1**

**Simulation 1**. First we compute sample sizes n_{v} for a given set of (m, σ_{0}^{2}, σ_{1}^{2}, Δ, α, β), where v = 0, 1, 2 denote the M-S, one-sample, and RCT approach, respectively. Then we generate null experimental data sets **X**_{v}^{0(l)} from N(θ_{0}, σ_{1}^{2}), and alternative experimental data sets **X**_{v}^{1(l)} from N(θ_{0} + Δ, σ_{1}^{2}), each of size n_{v}, for l = 1, …, L and v = 0, 1, 2. The superscript 0 indicates that the null distribution is true, and the superscript (l) indicates the lth experimental data set generated. For iteration k = 1, …, K,

- Simulate HC data **Y**^{(k)} of size m from N(θ_{0}, σ_{0}^{2});
- Estimate the conditional type I error given **Y**^{(k)} by h_{v}^{(k)} = L^{−1} Σ_{l=1}^{L} I{ Z_{v}(**X**_{v}^{0(l)}, **Y**^{(k)}) > z_{1−α} }. Note that Z_{v}(**X**, **Y**) = Z(**X**, **Y**) for v = 0 and 2;
- Estimate the conditional power given **Y**^{(k)} by q_{v}^{(k)} = L^{−1} Σ_{l=1}^{L} I{ Z_{v}(**X**_{v}^{1(l)}, **Y**^{(k)}) > z_{1−α} }.

The superscript (k) of h_{v}^{(k)} and q_{v}^{(k)} indicates that they are computed given the kth simulated HC data. We set K = L = 5000.
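The outer loop of Simulation 1 can be sketched as follows for the RCT sample size n₂ = 487 (θ₀ = 0, unit variances, m = 80 assumed). As a shortcut, and unlike the empirical inner loop described above, the conditional error and power given the simulated HC mean are evaluated in closed form:

```python
import random
from math import sqrt
from statistics import NormalDist, mean, median

nd = NormalDist()

m, n2, delta, alpha = 80, 487, 0.3, 0.05
s = sqrt(1 / n2 + 1 / m)          # SD of Xbar - Ybar under unit variances
z_crit = nd.inv_cdf(1 - alpha)

random.seed(3)
h2, q2 = [], []
for _ in range(5000):
    ybar = random.gauss(0.0, 1 / sqrt(m))          # simulated HC sample mean
    # Reject when Xbar > ybar + z_crit * s; Xbar has SD 1/sqrt(n2).
    h2.append(1 - nd.cdf((ybar + z_crit * s) * sqrt(n2)))          # theta1 = 0
    q2.append(1 - nd.cdf((ybar + z_crit * s - delta) * sqrt(n2)))  # theta1 = delta
print(round(mean(h2), 3), round(mean(q2), 3))      # near alpha and 1 - beta
print(round(median(h2), 3), round(median(q2), 3))  # medians far off: skewness
```

The means land near α = 0.05 and 1 − β = 0.8 (the RCT approach controls the means, per Theorem 2), while the medians are nowhere near them, illustrating the skewness.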

**Appendix A.3: Proof of Theorem 2**

*Proof*. We first state the facts that Ȳ ~ N(θ_{0}, σ_{0}^{2}/m), X̄ ~ N(θ_{1}, σ_{1}^{2}/n), E(Ȳ) = θ_{0}, and median(Ȳ) = θ_{0}. For *n*_{0} and *n*_{2}, the null hypothesis is rejected if *Z*(**X**, **Y**) > z_{1−α}.

We have the third equality through a random variable transformation, where *U _{v}* denotes the standardized test statistic, and it is easy to show that *U _{v}* ~ *N*(0, 1).

In a similar fashion, we can show that *E*(*q*_{2}) = 1 − β:

We have the second equality by defining U = (Ȳ − θ_{0}) m^{1/2}/σ_{0}. The third equality is obtained by plugging in the expression of *n*_{2} and recognizing *U* ~ *N*(0, 1).

We then show that *median*(*q*_{0}) = 1 − β. From (3) we have

First we recognize that q_{0} is a decreasing function of Ȳ. Second, *n*_{0} is the solution to q_{0} = 1 − β obtained by setting Ȳ = θ_{0} = median(Ȳ). These two points lead to the conclusion that median(*q*_{0}) = 1 − β. Note that *E*(*q*_{2}) and q_{2}^{(k)} have different expressions because the former marginalizes with respect to the random Ȳ, while the latter is defined conditional on a particular **Y**^{(k)}.

Now we show that *median*(*h*_{1}) = α, using the expression of h_{1} given in Section 2.

Thus h_{1} is a decreasing function of Ȳ, and h_{1} = α at Ȳ = θ_{0} = median(Ȳ). Thus we conclude median(*h*_{1}) = α. A similar argument leads to the conclusion that median(*q*_{1}) = 1 − β.

**Appendix A.4: Proof of Theorem 3**

*Proof*. First we demonstrate that, based on *Z**(**X**, **Y**), the *p _{h}*th percentile of the type I error is controlled at α. The conditional type I error given Ȳ is

h* = Φ( −z_{1−α} − (Ȳ − θ_{0} + z_{p_{h}}σ_{0}/m^{1/2}) n^{1/2}/σ_{1} ).

Thus *h** = α when Ȳ = θ_{0} − z_{p_{h}}σ_{0}/m^{1/2}, which is the (1 − *p _{h}*)th percentile of Ȳ. Together with the fact that *h** is monotonically decreasing in Ȳ, we conclude that P(h* ≤ α) = p_{h}.

Then we solve for *n**, which controls the (1 − *p _{q}*)th percentile of power at 1 − β. The conditional power given Ȳ is

q* = Φ( (Δ − (Ȳ − θ_{0}) − z_{p_{h}}σ_{0}/m^{1/2}) n^{1/2}/σ_{1} − z_{1−α} ).     (8)

It is obvious that *q** is monotonically decreasing in Ȳ. Using this property, if we set *q** = 1 − β at Ȳ = θ_{0} + z_{p_{q}}σ_{0}/m^{1/2}, the *p _{q}*th percentile of Ȳ, we achieve the goal of controlling the (1 − *p _{q}*)th percentile of power at 1 − β.

Thus by plugging *q** = 1 − β and Ȳ = θ_{0} + z_{p_{q}}σ_{0}/m^{1/2} into (8), we can solve for *n** from the following equation,

z_{1−β} = ( Δ − (z_{p_{h}} + z_{p_{q}})σ_{0}/m^{1/2} ) n^{1/2}/σ_{1} − z_{1−α}.

The solution for *n**, equation (6), can be obtained after some algebra. The condition Δ > (z_{p_{h}} + z_{p_{q}})σ_{0}/m^{1/2} is due to the positive constraint on n^{1/2}.

**Simulation 3**. For iteration k = 1, …, K,

- Generate HC data **Y**^{(k)} from N(θ_{0}, σ_{0}^{2}). Compute the sample variance σ̂_{0}^{2(k)};
- Estimate the required sample size n*^{(k)} based on Formula (6), with σ_{0}^{2} replaced by σ̂_{0}^{2(k)};
- Given sample size n*^{(k)}, generate null experimental data sets **X**^{0(k,l)} from N(θ_{0}, σ_{1}^{2}), and alternative experimental data sets **X**^{1(k,l)} from N(θ_{0} + Δ, σ_{1}^{2}), for l = 1, …, L;
- Compute the empirical type I error h*^{(k)} by L^{−1} Σ_{l=1}^{L} I{ Z*(**X**^{0(k,l)}, **Y**^{(k)}) > z_{1−α} }. Note that we replace the population variances (σ_{0}^{2} and σ_{1}^{2}) in Z*(**X**^{0(k,l)}, **Y**^{(k)}) by sample variances (σ̂_{0}^{2(k)} and σ̂_{1}^{2(k,l)}). Here σ̂_{1}^{2(k,l)} is the sample variance of **X**^{0(k,l)}. Similarly, we compute the empirical power q*^{(k)}.

**References**

[1] Makuch RW, Simon RM. Sample size considerations for non-randomized comparative studies. Journal of Chronic Diseases. 1980;33(3):175–181.

[2] Vickers AJ, Ballen V, Scher HI. Setting the bar in phase II trials: The use of historical data for determining "go/no go" decision for definitive phase III testing. Clinical Cancer Research. 2007;13(3):972–976.

[3] Cho SD, Krishnaswami S, Mckee JC, Zallen G, Silen ML, Bliss DW. Analysis of 29 consecutive thoracoscopic repairs of congenital diaphragmatic hernia in neonates compared to historical controls. Journal of Pediatric Surgery. 2009;44(1):80–86.

[4] Abe T, Kakemura T, Fujinuma S, Maetani I. Successful outcomes of EMR-L with 3D-EUS for rectal carcinoids compared with historical controls. World Journal of Gastroenterology. 2008;14(25):4054–4058.

[5] Storm C, Steffen I, Schefold JC, Krueger A, Oppert M, Jorres A, Hasper D. Mild therapeutic hypothermia shortens intensive care unit stay of survivors after out-of-hospital cardiac arrest compared to historical controls. Critical Care. 2008;12(3).

[6] Van Rooij WJ, De Gast AN, Sluzewski M. Results of 101 aneurysms treated with polyglycolic/polylactic acid microfilament Nexus coils compared with historical controls treated with standard coils. American Journal of Neuroradiology. 2008;29(5):991–996.

[7] Ando R, Nakamura A, Nagatani M, Yamakawa S, Ohira T, Takagi M, Matsushima K, Aoki A, Fujita Y, Tamura K. Comparison of past and recent historical control data in relation to spontaneous tumors during carcinogenicity testing in Fischer 344 rats. Journal of Toxicologic Pathology. 2008;21(1):53–60.

[8] Song JY, Chung BS, Choi KC, Shin BS. A 5-year period clinical observation on herpes zoster and the incidence of postherpetic neuralgia (2002–2006); a comparative analysis with the historical control group of a previous study (1995–1999). Korean Journal of Dermatology. 2008;46(4):431–436.

[9] Loudon I. The use of historical controls and concurrent controls to assess the effects of sulphonamides, 1936–1945. Journal of the Royal Society of Medicine. 2008;101(3):148–155.

[10] Dixon DO, Simon R. Sample size considerations for studies comparing survival curves using historical controls. Journal of Clinical Epidemiology. 1988;41(12):1209–1213.

[11] Chang MN, Shuster JJ, Kepner JL. Group sequential designs for phase II trials with historical controls. Controlled Clinical Trials. 1999;20(4):353–364.

[12] Kepner J, Wackerly D. Some observations on the Makuch/Simon approach to sample size determination in clinical trials with historical controls. Communications in Statistics Part B: Simulation and Computation. 2001;30(3):611–621.

[13] Chang MN, Shuster JJ, Kepner JL. Sample sizes based on exact unconditional tests for phase II clinical trials with historical controls. Journal of Biopharmaceutical Statistics. 2004;14(1):189–200.

[14] Lee JJ, Tseng C. Uniform power method for sample size calculation in historical control studies with binary response. Controlled Clinical Trials. 2001;22(4):390–400.

[15] Korn EL, Freidlin B. Conditional power calculations for clinical trials with historical controls. Statistics in Medicine. 2006;25(17):2922–2931.

[16] Abraham NS, DuraiRaj R, Young JM, Young CJ, Solomon MJ. How does an historic control study of a surgical procedure compare with the "gold standard"? Diseases of the Colon and Rectum. 2006;49(8):1141–1148.
