


Biostatistics. 2009 April; 10(2): 324–326.

Published online 2008 December 3. doi: 10.1093/biostatistics/kxn038

PMCID: PMC2648901

Jiexun Wang

Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China and Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan

Hua Liang

Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY 14642, USA

Guohua Zou^{*}

Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China and Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY 14642, USA; Email: zough2002@yahoo.com

Received 2008 March 31; Revised 2008 July 22; Accepted 2008 October 13.

Copyright © The Author 2008. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

In genetic association studies, the 2-stage design has recently received much attention as a cost-effective design. In this note, we focus on the 2-stage design in which DNA pooling is used in the first stage and individual genotyping is used in the second stage. An important problem with such a design is how to optimize it. Zuo *and others* (2008) investigated this problem under a given cost. The objective of this note is to solve the problem under the requirement of a given statistical power. In practical applications, the sample in the first stage can be reused in the second stage; such a 2-stage scheme is called “the 2-stage dependent design”. Alternatively, we may use 2 separate samples in the 2 stages, with one sample used for screening and the other for confirmation. Such a 2-stage scheme is called “the 2-stage independent design” (Zuo *and others*, 2006). We consider how to optimize the parameters in these 2 kinds of 2-stage design so that the total cost of the study is minimized when a given power is required. As mentioned by Satagopan and Elston (2003), this task can be completed by minimizing the cost fraction between the 2-stage design and the 1-stage design using individual genotyping, where the overall significance levels and the total sample sizes are the same for the 2 designs, and their powers are as close as possible.

Using the notation in Zuo *and others* (2008), the cost functions for the 2-stage dependent and independent designs can be expressed as

and

respectively. For the 1-stage design with individual genotyping, the cost function is given by

where *N* is the sample size attaining the desired power of $1-\beta$ with an overall significance level of *α*. Thus, when the total sample size of the 1-stage design equals that of the 2-stage design, the goal of minimizing ${T}_{2,\text{De}}/{T}_{1}$ (or ${T}_{2,\text{In}}/{T}_{1}$) is equivalent to minimizing $S{T}_{\text{De}}/S{T}_{1}\equiv {\omega}_{\text{De}}$ (or $S{T}_{\text{In}}/S{T}_{1}\equiv {\omega}_{\text{In}}$) for the 2-stage dependent (or independent) design, where

with $r={C}_{\text{pool}}/{C}_{\text{ind}}$.
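For intuition about the 1-stage sample size *N* that attains power $1-\beta$ at a given significance level, the following is a minimal sketch based on the textbook normal approximation for comparing two allele frequencies. It is not the formula used in Zuo *and others* (2008): it ignores genotyping errors, treats the 2 alleles of each individual as independent, and takes the per-marker significance level as an input, leaving it to the caller how the overall level *α* is divided across markers. The function and parameter names are illustrative assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist


def individuals_per_group(p_case, p_control, alpha, power):
    """Approximate number of cases (and, equally, of controls) needed to
    detect an allele frequency difference with a two-sided test.

    Assumption: textbook two-proportion normal approximation, with each
    individual contributing 2 independent alleles and no genotyping error.
    """
    if p_case == p_control:
        raise ValueError("allele frequencies must differ")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)           # quantile for power 1 - beta
    p_bar = (p_case + p_control) / 2
    null_sd = sqrt(2 * p_bar * (1 - p_bar))
    alt_sd = sqrt(p_case * (1 - p_case) + p_control * (1 - p_control))
    alleles = ((z_alpha * null_sd + z_beta * alt_sd) / (p_case - p_control)) ** 2
    return ceil(alleles / 2)  # 2 alleles per individual


if __name__ == "__main__":
    # Hypothetical inputs chosen only for illustration.
    print(individuals_per_group(0.25, 0.20, alpha=0.05, power=0.80))
```

As expected, the required sample size grows when the frequency difference shrinks or when the significance level is made more stringent, which is why the number of markers tested matters for the 1-stage cost.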

The constraints on the powers of the 2-stage dependent and independent designs (denoted by ${P}_{\text{De}}$ and ${P}_{\text{In}}$) are $(1-\beta )-{P}_{\text{De}}\le {e}_{\text{De}}$ and $(1-\beta )-{P}_{\text{In}}\le {e}_{\text{In}}$, respectively, where ${e}_{\text{De}}$ and ${e}_{\text{In}}$ are some small numbers such as 0.01 and 0.03.
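The structure of this problem (minimize the cost ratio $\omega$ subject to the power shortfall being at most $e$) can be sketched as a simple grid search. This is only an illustration, not the calculation procedure from the supplementary material: `toy_cost_ratio` and `toy_power` are hypothetical placeholder functions, not the cost or power formulas of the paper, and the 2 design parameters are left abstract.

```python
from itertools import product


def optimize_design(cost_ratio, power, candidates, target_power, tolerance):
    """Return (params, omega) with the smallest cost ratio among candidates
    satisfying target_power - power(params) <= tolerance, or None."""
    best = None
    for params in candidates:
        if target_power - power(params) <= tolerance:
            omega = cost_ratio(params)
            if best is None or omega < best[1]:
                best = (params, omega)
    return best


# Hypothetical placeholders: NOT the formulas from the paper.
def toy_cost_ratio(params):
    a, b = params
    return 0.6 * a + 0.5 * b


def toy_power(params):
    a, b = params
    return min(1.0, 0.5 + 0.3 * a + 0.4 * b)


# Abstract design parameters on a grid of step 0.1 in [0, 1] x [0, 1].
grid = [(i / 10, j / 10) for i, j in product(range(11), repeat=2)]

best_design = optimize_design(
    toy_cost_ratio, toy_power, grid, target_power=0.8, tolerance=0.01
)

if __name__ == "__main__":
    print(best_design)
```

In practice the grid would range over the actual design parameters, and the two callables would be replaced by the cost ratio and power expressions for the dependent or independent design.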

To obtain the optimal choices of the parameters in the 2-stage design with a desired power, we use a calculation procedure provided in the supplementary material available at *Biostatistics* online (http://www.biostatistics.oxfordjournals.org). We consider the population frequency of allele *A* of *p*=0.05, 0.2, or 0.7 and the allele frequency difference between the cases and controls of *p _{A}*−

Our calculation results show that for the 2-stage dependent design, the cost saving is substantial, especially when the total number of markers is large. We also observe that genotyping errors at common error rates have no large effect on the cost saving, although the saving increases slightly as the genotyping error rates increase. However, the measurement errors with DNA pooling have a large effect on the optimal 2-stage dependent design. By forming multiple pools, this effect can be reduced substantially. For the 2-stage independent design, we find that the cost saving largely depends on the measurement error rates in the first stage. For the usual error rates with DNA pooling, the optimal design tends to be the 1-stage individual genotyping design, and in this case there is essentially no saving in cost. Also, unlike in the 2-stage dependent design, forming multiple pools does not necessarily increase the cost saving for the 2-stage independent design. However, when the measurement error rates are very small, the optimal design tends to be the 1-stage DNA pooling design, and in this case the saving in cost can be substantial.

Comparing the 2-stage dependent and independent designs, we find that the 2-stage dependent design yields a considerably larger cost saving. This becomes clearer by observing Figure 1.

National Institutes of Health (AI62247-01 and AI59773) to H.L.; National Natural Science Foundation of China (70625004, 10721101, and 70221001) to G.Z.

The authors are grateful to the referee for the insightful comments and suggestions. *Conflict of Interest:* None declared.

- Satagopan J M, Elston R C. Optimal two-stage genotyping in population-based association studies. Genetic Epidemiology. 2003;25:149–157.
- Zuo Y, Zou G, Wang J, Zhao H, Liang H. Optimal two-stage design for case-control association analysis incorporating genotyping errors. Annals of Human Genetics. 2008;72:375–387.
- Zuo Y, Zou G, Zhao H. Two-stage designs in case-control association analysis. Genetics. 2006;173:1747–1760.

