In genetic association studies, the 2-stage design has recently received much attention as a cost-effective approach. In this note, we focus on the 2-stage design in which DNA pooling is used in the first stage and individual genotyping in the second stage. An important problem with such a design is how to optimize it. Zuo and others (2008) investigated this problem under a given cost; the objective of this note is to solve it under a given statistical power. In practice, the sample from the first stage can be reused in the second stage; such a scheme is called "the 2-stage dependent design". Alternatively, 2 separate samples may be used in the 2 stages, one for screening and the other for confirmation; such a scheme is called "the 2-stage independent design" (Zuo and others, 2006). We consider how to optimize the parameters of these 2 kinds of 2-stage design so that the total cost of the study is minimized while a given power is attained. As noted by Satagopan and Elston (2003), this can be done by minimizing the cost fraction, that is, the cost of the 2-stage design expressed as a fraction of the cost of the 1-stage design using individual genotyping, where the 2 designs have the same overall significance level and the same total sample size, and their powers are as close as possible.
Using the notation in Zuo and others (2008), the cost functions for the 2-stage dependent and independent designs can be expressed as
respectively. For the 1-stage design with individual genotyping, the cost function is given by
where N is the sample size attaining the desired power of 1−β at an overall significance level of α. Thus, when the total sample size of the 1-stage design equals that of the 2-stage design, the cost of the 1-stage design is fixed, so minimizing the cost of the 2-stage dependent (or independent) design is equivalent to minimizing its cost fraction, that is, its cost divided by the cost of the 1-stage design.
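To make the benchmark N and the cost fraction concrete, here is a minimal sketch. It is not the procedure of this note or of Zuo and others (2008): it assumes an allele-based two-sample z-test with equal numbers of cases and controls, a Bonferroni-corrected two-sided level α/M, and a hypothetical 1-stage cost of N × M individual genotypes. The function names are illustrative only.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal N(0, 1)


def one_stage_sample_size(p_case, p_control, alpha, power, n_markers):
    """Total N (equal numbers of cases and controls) for the 1-stage design.

    Assumed test: allele-based two-sample z-test at the Bonferroni-corrected
    two-sided level alpha / n_markers, using the pooled variance
    pbar * (1 - pbar) in both the alpha and the power term.
    """
    delta = abs(p_case - p_control)
    pbar = (p_case + p_control) / 2
    z_alpha = _STD.inv_cdf(1 - alpha / (2 * n_markers))
    z_beta = _STD.inv_cdf(power)
    # Var(pA_hat - pU_hat) ~= pbar(1 - pbar) / n for n subjects (2n alleles) per group
    n_per_group = ceil(pbar * (1 - pbar) * ((z_alpha + z_beta) / delta) ** 2)
    return 2 * n_per_group


def one_stage_power(n_total, p_case, p_control, alpha, n_markers):
    """Two-sided power of the same z-test with n_total // 2 subjects per group."""
    delta = abs(p_case - p_control)
    pbar = (p_case + p_control) / 2
    crit = _STD.inv_cdf(1 - alpha / (2 * n_markers))
    mu = delta * sqrt((n_total // 2) / (pbar * (1 - pbar)))
    return _STD.cdf(mu - crit) + _STD.cdf(-mu - crit)


def cost_fraction(two_stage_cost, n_total, n_markers, cost_per_genotype=1.0):
    """Cost of a 2-stage design divided by the 1-stage cost, taken here to be
    n_total * n_markers individual genotypes (a hypothetical cost model)."""
    return two_stage_cost / (n_total * n_markers * cost_per_genotype)
```

For example, `one_stage_sample_size(0.25, 0.20, 0.05, 0.8, 500)` gives a benchmark N for pA−pU=0.05 and M=500; a candidate 2-stage design with the same total N is then scored by `cost_fraction`, smaller values meaning larger savings.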
The powers of the 2-stage dependent and independent designs are required to stay close to the target power 1−β of the 1-stage design, with allowed deviations eDe and eIn, respectively, where eDe and eIn are small numbers such as 0.01 and 0.03.
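A power constraint of this kind can be checked once a power model for the 2-stage design is available. The sketch below uses the independent design, because its 2 stages use disjoint samples and their pass probabilities therefore multiply. Everything in the model is an assumption for illustration: stage 1 compares pooled allele frequencies split into `pools` pools per group, each pool estimate carrying a hypothetical measurement-error variance `tau2`; stage 2 genotypes a fresh sample individually at level α/(M·α1); and the constraint is taken to be a symmetric band of width `tol` around 1−β.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal N(0, 1)


def two_sided_power(mu, level):
    """P(|Z| > z_{1 - level/2}) for Z ~ N(mu, 1)."""
    crit = _STD.inv_cdf(1 - level / 2)
    return _STD.cdf(mu - crit) + _STD.cdf(-mu - crit)


def independent_design_power(n1, n2, alpha1, p_case, p_control, alpha,
                             n_markers, pools=1, tau2=0.0):
    """Power for one disease marker in a 2-stage independent design.

    Hypothetical model: stage 1 uses n1 cases and n1 controls in `pools` pools
    per group (independent pool errors of variance tau2); stage 2 uses a new
    sample of n2 cases and n2 controls at level alpha / (n_markers * alpha1).
    """
    pbar = (p_case + p_control) / 2
    pq = pbar * (1 - pbar)
    delta = abs(p_case - p_control)
    var1 = pq / n1 + 2 * tau2 / pools  # sampling variance + pooling measurement error
    mu1 = delta / sqrt(var1)
    mu2 = delta / sqrt(pq / n2)
    alpha2 = min(1.0, alpha / (n_markers * alpha1))
    # Disjoint samples: P(pass stage 1) * P(significant in stage 2)
    return two_sided_power(mu1, alpha1) * two_sided_power(mu2, alpha2)


def meets_power_constraint(power, target, tol):
    """Assumed form of the constraint: |power - target| <= tol."""
    return abs(power - target) <= tol
```

In this model, more pools shrink the pooling-error term `2 * tau2 / pools` and so raise the stage-1 power; whether that translates into a lower cost depends on the cost model, which this sketch does not include.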
To obtain the optimal parameters of a 2-stage design with a desired power, we use a calculation procedure provided in the supplementary material available at Biostatistics online (http://www.biostatistics.oxfordjournals.org). We consider a population frequency of allele A of p=0.05, 0.2, or 0.7 and an allele frequency difference between cases and controls of pA−pU=0.05 or 0.10, and we assume an overall significance level of α=0.05 and a 1-stage power of 1−β=0.8. We set the total number of markers to M=25, 500, or 10^6 and the number of true disease markers to K=1 or 5, and let , , and .
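The grid of settings just described can be laid out programmatically, with the 1-stage benchmark N attached to each setting. This is a sketch, not the supplementary procedure: it restates the same normal-approximation sample-size formula (allele-based z-test at Bonferroni level α/M, an assumption) so that it stands alone, and it uses a hypothetical mapping pU = p and pA = p + (pA−pU) from the population frequency to the 2 groups.

```python
from itertools import product
from math import ceil
from statistics import NormalDist

ALPHA, POWER = 0.05, 0.8           # overall level alpha and 1-stage power 1 - beta
ALLELE_FREQS = (0.05, 0.2, 0.7)    # population frequency p of allele A
FREQ_DIFFS = (0.05, 0.10)          # pA - pU
N_MARKERS = (25, 500, 10**6)       # total number of markers M
N_DISEASE = (1, 5)                 # number of true disease markers K


def benchmark_n(p_control, delta, n_markers, alpha=ALPHA, power=POWER):
    """Total N of the 1-stage individual-genotyping design (normal
    approximation to an allele-based z-test at level alpha / M; an assumption)."""
    p_case = p_control + delta
    pbar = (p_case + p_control) / 2
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / (2 * n_markers)) + z(power)
    return 2 * ceil(pbar * (1 - pbar) * (z_sum / delta) ** 2)


def scenarios():
    """All 3 x 2 x 3 x 2 = 36 settings, each with its benchmark N.
    Hypothetical mapping: pU = p and pA = p + (pA - pU)."""
    rows = []
    for p, delta, m, k in product(ALLELE_FREQS, FREQ_DIFFS, N_MARKERS, N_DISEASE):
        rows.append({"p": p, "delta": delta, "M": m, "K": k,
                     "N": benchmark_n(p, delta, m)})
    return rows
```

Under these assumptions N does not depend on K; K would enter only through the 2-stage cost and power calculations, which this sketch does not attempt.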
Our calculations show that the 2-stage dependent design yields very large cost savings, especially when the total number of markers is large. Genotyping errors at common rates have little effect on the cost savings, although the savings increase slightly as the genotyping error rate increases. However, measurement errors in DNA pooling have a large effect on the optimal 2-stage dependent design; forming multiple pools can reduce this effect substantially. For the 2-stage independent design, the cost savings depend largely on the measurement error rates in the first stage. At the usual error rates for DNA pooling, the optimal design tends to be the 1-stage individual genotyping design, in which case there is essentially no cost saving. Also, unlike for the 2-stage dependent design, forming multiple pools does not necessarily increase the cost savings of the 2-stage independent design. However, when the measurement error rates are very small, the optimal design tends to be the 1-stage DNA pooling design, and the cost savings can then be substantial.
Comparing the 2-stage dependent and independent designs, the dependent design saves much more money than the independent design, as Figure 1 makes clear.
National Institutes of Health (AI62247-01 and AI59773) to H.L.; National Natural Science Foundation of China (70625004, 10721101, and 70221001) to G.Z.
The authors are grateful to the referee for the insightful comments and suggestions. Conflict of Interest: None declared.