In the Georgia Centenarian Study (Poon et al., Exceptional Longevity, 2006), centenarian cases and young controls are classified according to three categories (age, ethnic origin, and single nucleotide polymorphisms [SNPs] of candidate longevity genes), where each factor has two possible levels. Here we provide methodologies to determine the minimum sample size needed to detect dependence in 2 × 2 × 2 tables based on Fisher's exact test evaluated exactly or by Markov chain Monte Carlo (MCMC), assuming only the case total L and the control total N are known. While our MCMC method uses serial computing, parallel computing techniques are employed to solve the exact sample size problem. These tools will allow researchers to design efficient sampling strategies and to select informative SNPs. We apply our tools to 2 × 2 × 2 tables obtained from a pilot study of the Georgia Centenarian Study, and the sample size results provided important information for the subsequent major study. A comparison between the results of an exact method and those of an MCMC method showed that the MCMC method needed much less computation time on average (10.16 times faster on average for the situations examined, with S.E. = 2.60), but its sample size results were as a rule only valid for larger sample sizes (in the hundreds).
The genetic analysis of human longevity reaches back some 20 years, with the early studies focused on the association of human leukocyte antigen (HLA) frequencies with survival to old age (Proust et al., 1982; Takata et al., 1987). This use of associative genetics in case-control studies was dictated by the almost insurmountable difficulty of sampling pedigrees possessing living members who display exceptional longevity in more than one generation. The most highly replicated achievement of this genetic approach is the implication of alleles of the APOE gene in determining life span (Kervinen et al., 1994; Louhija et al., 1994; Schachter et al., 1994). In addition, several other notable accomplishments have been realized in this area (reviewed in Franceschi et al., 2005). The polygenic nature of longevity (Finch, 1990; Carey, 2003) has also contributed to the preponderance of associative genetics in the analysis of human aging. Genetic linkage analysis is difficult in the case of complex traits, in which each locus accounts for only a small fraction of the variance for that trait, making associative genetics, which can operate under these conditions, an attractive alternative (Lander and Schork, 1994).
The unbiased genome-wide search for longevity genes stands in juxtaposition to the candidate gene approach that has been the staple in human genetics of aging. In established populations, linkage disequilibrium extends over relatively short distances. This situation requires many thousands of markers to saturate the genome, although in some genomic regions this problem may be less severe. The multiple gene-hypothesis problem becomes acute as the number of markers increases (Lander and Schork, 1994). The potential inadequacy of the whole genome scan using single-nucleotide polymorphisms (SNPs) as markers has been pointed out (Hagmann, 1999). However, the emergence of the haplotype map may ameliorate some of the difficulties (The International HapMap Consortium, 2005). The considerations reviewed here have resulted in the tilt toward candidate genes in associative genetics of human longevity. It is important to note that the quandaries discussed above with respect to the genetics of longevity become more severe when the more complex phenotype of aging is the subject of analysis. This is because the genetics of aging requires the equivalent of an assessment of the biological age as opposed to the chronological age (Karasik et al., 2005).
In the Georgia Centenarian study (Poon et al., 2006), data are collected in a case (centenarian)-control (young control) fashion. Cases and controls are further classified according to their ethnic origin (Caucasian or African-American) and SNPs in candidate longevity genes. Thus the data are organized into a 2 × 2 × 2 contingency table according to three categories (age, race, SNP allele). The important question is what minimum sample size is needed to detect nonrandom associations between factors considered for a given significance level, power, and “effect size.”
There are at least two classes of methods to detect associations in 2 × 2 × 2 tables and to solve the corresponding sample size problem: exact tests and tests whose properties are based on asymptotic results. Exact tests are based on the exact distribution of the test statistic itself, often on an underlying hypergeometric distribution, while asymptotic methods rely upon normal or chi-squared approximations to the null distribution of statistics such as the Pearson chi-squared statistic. Asymptotic tests are valid only for large sample sizes and intermediate marginal frequencies. In contrast, exact tests carry no such sample size restriction and in principle work equally well with all marginal frequencies.
Fisher's exact test is well known for 2 × 2 tables (Fisher, 1935). Bennett and Hsu (1960) gave the power function of the exact test for 2 × 2 tables in the case of one or two fixed margins, and subsequently many authors have studied the sample size problem in these cases (e.g., Gail and Gart, 1973; Haseman, 1978). Fu and Arnold (1992) and Sánchez et al. (2006) studied the sample size problem for 2 × 2 tables and 2 × 3 tables, respectively, both assuming that none of the margins is fixed and only the grand total N is known. Mielke and Berry (1988) proposed a generalization of the hypergeometric distribution to dimensions higher than two. The exact test of significance in higher-dimension tables has been described by Zelterman, Chan, and Mielke (1995).
Because studies of an exact test in principle require complete enumeration of an exact distribution and hence are very time consuming, various approximations to sample sizes have been proposed in the case of one fixed margin (Casagrande, Pike, and Smith, 1978; Fleiss, 1981). An alternative, less computationally demanding approach is using Monte Carlo sampling (Agresti, Wackerly, and Boyett, 1972; Agresti, 1992). Besag and Clifford (1989) suggested Markov chain Monte Carlo (MCMC) methods for performing significance tests. MCMC procedures for exact conditional tests have been described (Kreiner, 1987; Smith, Forster, and McDonald, 1996). To our knowledge, this is the first application of MCMC methods to address the sample size problem for exact tests.
Here we examine an exact test for 2 × 2 × 2 tables based on Fisher's 2 × 2 exact test (Fisher, 1935; Fleiss, 1981; Yates, 1984; Fu and Arnold, 1992). With this exact test, we determine the minimum sample size required to detect varied nonrandom associations in 2 × 2 × 2 tables by enumerating margins exactly as well as by MCMC sampling, assuming the case total and the control total are known with given nominal significance level and power. We determined the minimum sample sizes to detect a particular level of association for each candidate SNP with longevity using the parameters obtained from the pilot study of the Georgia Centenarian Project. Based on the sample size results, the most informative SNPs will be used in the subsequent major study of the project.
Consider the 2 × 2 × 2 table in Table 1. The SNP data are classified by three factors—age, race, and allele—each having two levels. The symbol Ca refers to a Caucasian, and Am refers to an African-American. The number L is the number of centenarian alleles, and N is the number of “young control” alleles. The capital letter T refers to one allele in a SNP, while t refers to the other allele. In the 2 × 2 × 2 table in Table 1, we have a total of eight cell counts, represented by a, b, c, d, e, f, g, and h. Marginal counts are represented by k, L − k, l, L − l, m, N − m, n, and N − n. Given L and N, the letter M is used to represent the set of possible margins, and each margin consists of four marginal counts k, l, m, and n. In Table 2, we show the corresponding probability model, with “frequencies” instead of counts. All cell counts in a 2 × 2 × 2 table are completely determined by L, N, the margin M (k, l, m, n), and two cell counts a and e. Given L and N, when the margin M is fixed, the counts a and e are subject to the constraints:
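Given a margin, the remaining cells follow by subtraction, and the free cells a and e each range over a hypergeometric-style interval. The sketch below illustrates this bookkeeping; the assignment of k, l to the case table and m, n to the control table is an assumption consistent with the notation above, not a reproduction of Table 1.

```python
def free_cell_range(row, col, total):
    """Admissible values of the free cell in a 2x2 table with
    row margin `row`, column margin `col`, and grand total `total`:
    max(0, row + col - total) <= cell <= min(row, col)."""
    return range(max(0, row + col - total), min(row, col) + 1)

def cells_from_margin(L, N, k, l, m, n, a, e):
    """Recover all eight cell counts of the 2x2x2 table from L, N,
    the margin (k, l, m, n), and the free cells a and e
    (hypothetical labeling: k, l margin the case table; m, n the controls)."""
    b = k - a           # case table: rest of the k margin
    c = l - a           # case table: rest of the l margin
    d = L - k - l + a   # remaining case cell
    f = m - e           # control table: rest of the m margin
    g = n - e           # control table: rest of the n margin
    h = N - m - n + e   # remaining control cell
    return a, b, c, d, e, f, g, h
```

Enumerating `free_cell_range` for a in the case table and e in the control table visits every table compatible with a fixed margin exactly once.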
Odds ratios permit us to make an inference about the nonrandom associations between SNPs, age and race:
The odds ratios ρ1 and ρ2 are used to measure the association between SNPs and race in each age group, and ρca and ρam are used to measure the association between SNPs and longevity in each race. There are only two independent odds ratios because ρ1/ρ2 = ρca/ρam. Because the probability table can be completely defined by two odds ratios (ρ1 and ρ2 or ρca and ρam) and four margin probabilities (p, q, r, s), the saturated model has six independent parameters. In our case, the motivation for structuring the tables as in Table 1 and using ρ1 and ρ2 as input parameters, that is, the effects, to solve the sample size is that it reflects the case and control sampling design and that this structure leads to bound (5), enabling the computation of the power.
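The constraint ρ1/ρ2 = ρca/ρam can be checked directly from the cell counts. In the sketch below the exact row/column assignment of the eight cells is an assumption (any labeling consistent with Table 1 yields the same identity):

```python
def odds_ratios(a, b, c, d, e, f, g, h):
    """Four odds ratios of a 2x2x2 case-control table.
    Assumed layout: cases a,b (Ca: T,t) and c,d (Am: T,t);
    controls e,f (Ca: T,t) and g,h (Am: T,t)."""
    r1  = (a * d) / (b * c)   # SNP-by-race association among cases
    r2  = (e * h) / (f * g)   # SNP-by-race association among controls
    rca = (a * f) / (b * e)   # SNP-by-longevity association in Caucasians
    ram = (c * h) / (d * g)   # SNP-by-longevity association in African-Americans
    return r1, r2, rca, ram
```

Multiplying out the two quotients shows both equal (a d f g)/(b c e h), which is why only two of the four odds ratios are independent.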
Because the counts in cases and controls are independent, the probability distribution for a 2 × 2 × 2 table conditional on its margins can be written as:
When ρ1 = 1 and ρ2 = 1, there is no association between longevity and race; that is, under the null hypothesis, the distribution reduces to a simpler formula:
To assign an achieved significance level to an observed table, the probabilities under the null hypothesis of those tables that are at least as extreme as the observed one are summed using (2). The exact test defines the critical region RC for the null hypothesis H0: RC is the region where H0 is rejected in favor of the alternative hypothesis Ha. The critical region RC consists of a collection of 2 × 2 × 2 tables whose probabilities in (2) sum to a value ≤ α, the nominal significance level. RC for each margin is determined by: (i) calculating the probability of each set of counts under the null hypothesis (2), (ii) ordering those sets of counts with probability ≤ α in ascending order, and (iii) summing the probabilities until adding the next set would exceed α.
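The three-step construction of the critical region can be sketched generically; here `tables` is any enumeration of the candidate tables for one margin and `null_prob` their null probability under (2) (both hypothetical stand-ins for the exact enumeration in the text):

```python
def critical_region(tables, null_prob, alpha):
    """Build the critical region RC for one margin: order tables by
    null probability (ascending) and keep adding them while the
    cumulative probability stays <= alpha."""
    ordered = sorted(tables, key=null_prob)
    rc, total = [], 0.0
    for t in ordered:
        if total + null_prob(t) > alpha:
            break                    # adding this table would exceed alpha
        total += null_prob(t)
        rc.append(t)
    return rc, total                 # region and its attained size
```

Because the tables are visited in ascending probability order, the first table that would push the sum past α also excludes all later ones, so filtering to probability ≤ α first is implicit.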
The expected power of an exact test for a 2 × 2 × 2 table of sample size L and N is then given by the probability of obtaining sets of cell counts that fall within this rejection region for H0. The probability of a margin M is given by the sum of the probability in (1) over the admissible range of cell counts a and e for that margin:
The expected power is thus obtained by the sum, over the possible margins M, of the product between Pr(M) in (3) and β(M), which is the conditional probability in (1) for those sets of cell counts belonging to the critical region RC of that margin for H0,
In the case of tables with ρ1 ≤ 1 and ρ2 ≤ 1 (all the tables can be rearranged to meet this condition without loss of generality), there is an upper bound for the probability of each margin:
The margins with very small upper bound probabilities can be ignored in order to save computation time in solving the sample size problem.
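The expected power is therefore a truncated sum over margins of Pr(M) times the conditional power β(M). A minimal sketch, with the caveat that the exact program screens margins by the upper bound (5) rather than by Pr(M) itself as done here for brevity:

```python
def expected_power(margins, pr_margin, cond_power, cutoff=1e-20):
    """Expected power: sum over margins M of Pr(M) * beta(M), skipping
    margins whose probability falls below `cutoff` (the program screens
    on the upper bound (5); using Pr(M) here is a simplification)."""
    total = 0.0
    for M in margins:
        p = pr_margin(M)
        if p < cutoff:
            continue        # negligible margin: ignore to save time
        total += p * cond_power(M)
    return total
```

In the toy test below the third margin has probability far below the cutoff and contributes nothing, mirroring the truncation described above.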
To obtain the minimum sample size to achieve a given power, by trial and error, one finds the smallest sample size resulting in a power not less than the target power. In many studies the cases are much more difficult to obtain than the controls, in which case the control total N could be fixed in advance and the smallest case total L is pursued. For this purpose, we first specify the six independent parameters that completely define the 2 × 2 × 2 probability table: the log values of two odds ratios (ρ1 and ρ2 or ρca and ρam) and four margin probabilities (p, q, r, s) in Table 2. The initial guesses for the minimum L and the control N are provided.
Because of the extremely large number of 2 × 2 × 2 tables to be enumerated, determining specific values of the power is a computationally intensive process. To save computation time, we ignore all margins with very low upper bound probabilities in (5), that is, <10−20 (Fu and Arnold, 1992). Parallel computing techniques are employed to run our C program to reduce further the computation time. Specifically, given the totals L and N, there are a total of (L + 1)(L + 1)(N + 1)(N + 1) possible margins that need to be enumerated. This enumeration task is divided in such a way that each processor enumerates a subset of margins in parallel with other processors, and each margin is enumerated exactly one time by one of these processors. After the enumeration procedure, the master processor collects the intermediate results from each processor and adds them together appropriately. With programs written in C and MPI (message passing interface; Snir et al., 1998), the numerical analysis was performed using 32 processors on the IBM p655 supercomputer in the Research Computing Center at the University of Georgia.
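The actual MPI decomposition is not reproduced in the text; a round-robin assignment over the flattened margin index is one simple scheme (an assumption, not the authors' code) that gives each processor a disjoint share and covers every margin exactly once:

```python
def my_margins(L, N, rank, nprocs):
    """Margins (k, l, m, n) assigned to processor `rank` when the
    (L+1)(L+1)(N+1)(N+1) margins are dealt out round-robin across
    `nprocs` processors; each margin goes to exactly one processor."""
    idx = 0
    for k in range(L + 1):
        for l in range(L + 1):
            for m in range(N + 1):
                for n in range(N + 1):
                    if idx % nprocs == rank:
                        yield (k, l, m, n)
                    idx += 1
```

After each processor sums power contributions over its own margins, a single reduction (the role played by the master processor above) adds the partial sums together.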
Enumerating all the possible margins M (i.e., k, l, m, and n) is very time consuming, especially for large L and N (Table 1). MCMC methodology may provide a way to save computation time by sampling a subset of margins. The expected power for the given L and N was estimated by the average of the conditional power in (1) over the sampled margins. The Markov chain is initialized to the observed margin and is constructed with the Metropolis–Hastings algorithm in (6) below. A new margin (Mn) is proposed by randomly updating one of the marginal counts k, l, m, or n of the current margin (Mo). The update is accomplished by adding a random value δ, chosen uniformly between −Δ and +Δ, to one of them. The acceptance probability is:
where Pr(M) is computed with (3). The step-width Δ is initialized to L/2 or N/2. Because the acceptance rate affects the efficiency of the Metropolis algorithm (Roberts, Gelman, and Gilks, 1997), the step-width Δ is adjusted as follows to maintain a moderate sample-acceptance rate (i.e., 0.33–0.67) and thereby improve the performance of the algorithm. The adjustor in (7) is predicated on an inverse relation between the acceptance rate and the step-width. Every 20 margin proposals, Δ is adjusted according to the ratio (A) of the number of accepted margins to the number of proposed margins:
where v is initialized to 0.5 and is adjusted based on both A and the past adjustment of the step-width Δ. In the case of A > 2/3, if Δ was increased last time, v was set to v × 1.25; otherwise, we set v to v × 0.75. (The effect of increasing the step-width should be to decrease A.) In the case of A < 1/3, if Δ was increased last time, v was set to v × 0.75; otherwise, we set v to v × 1.25. (The effect of decreasing the step-width should be to increase A.) In the case of 1/3 ≤ A ≤ 2/3, v was not changed. The history of step-width adjustments is tracked through v to prevent oscillations in the step-width during a Monte Carlo run. The program starts to compute the conditional power after the chain finishes the 1-million-step burn-in stage, and the chain stops after running 10 million steps. The MCMC method was implemented serially on one processor of the IBM p655 supercomputer.
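Since formula (7) itself is not reproduced above, the sketch below assumes a multiplicative update Δ ← Δ(1 + v) or Δ/(1 + v); the bookkeeping for v, however, follows the rules just described:

```python
class StepWidth:
    """Adaptive step-width for the margin-update proposals. The exact
    adjustor (7) is not shown in the text; the multiplicative form
    Delta*(1+v) / Delta/(1+v) used here is an assumption."""
    def __init__(self, delta, v=0.5):
        self.delta = delta
        self.v = v
        self.last_increased = False   # was Delta increased last time?

    def adjust(self, accepted, proposed=20):
        A = accepted / proposed
        if A > 2/3:                   # accepting too often: widen steps
            self.v *= 1.25 if self.last_increased else 0.75
            self.delta *= 1 + self.v
            self.last_increased = True
        elif A < 1/3:                 # accepting too rarely: narrow steps
            self.v *= 0.75 if self.last_increased else 1.25
            self.delta /= 1 + self.v
            self.last_increased = False
        # 1/3 <= A <= 2/3: leave both v and Delta unchanged
```

Growing v when consecutive adjustments move in the same direction, and damping it when the direction flips, is what suppresses oscillation of Δ during a run.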
Asymptotic solutions to the sample size problem were computed as described in Harter and Owen (1970, p. 10) using the noncentral chi-squared distribution tabulated by Haynam, Govindarajulu, and Leone (1970).
In the pilot study of the Georgia Centenarian Project, data on 27 SNPs from three candidate longevity genes were collected from 25 centenarians and 24 young controls (one Hispanic young control was not included). The probability tables for these SNPs were estimated by the maximum likelihood method. Six independent parameter estimates for each SNP were then derived from the corresponding table of counts (Table 1) for a power calculation. Because the study calls for 400 young control alleles, the number of young control alleles is fixed at 400. Then the minimum number of case alleles necessary to detect an association was obtained under the following standard levels: (i) nominal significance level α = 0.05 and (ii) power 1 − β = 0.8. The “effect size” was fixed at the observed ρ1 and ρ2 (or equivalently ρca and ρam) from the pilot study. The sampling design for the 2 × 2 × 2 table (i.e., a simple random sample from cases and controls) represents an approximation to the more elaborate stratified sampling design for the entire Centenarian Study (Poon et al., 2006).
In Table 3, we present a sample of results computed from these pilot data. For the 2nd, 9th, 11th, and 17th SNPs, fewer than 10 centenarian alleles (with 400 young control alleles) are required to detect nonrandom associations between the factors considered under the above target significance level and power. These four SNPs are likely to be more informative than the other SNPs and were recommended for the subsequent major study. In contrast, for the 4th, 13th, 26th, and 27th SNPs, more than 3000 centenarian alleles are required to detect the possible association between the factors considered under the nominal significance level and desired power. Because an insufficient number of centenarians is available in this region, it would be pointless to use these four SNPs as genetic markers in the major study of our project.
Because there were 27 SNPs involved in the study, controlling the number of errors in these multiple hypothesis tests should be taken into account. A variety of ways have been reported to control overall error rate in multiple tests (Verhoeven, Simonsen, and McIntyre, 2005). One way is to control the false discovery rate (FDR), the expected proportion of errors among the rejected hypotheses (Benjamini and Hochberg, 1995). There are several advantages to this approach over, for example, a Bonferroni correction. A major one is protecting power. Their procedure is also more biologically plausible for the SNP application in that a false positive does not extract as severe a penalty as a Bonferroni correction because there is some redundancy in the utility of different SNPs (Verhoeven et al., 2005).
In the Benjamini and Hochberg procedure the P-values from the exact test with FDR q* are ranked from low to high, P(1), P(2),…,P(27). Let k be the largest i such that P(i) ≤ q*(i/m), where m = 27 in Table 3. The recommended panel of SNPs comprises those with P-values P(1), P(2),…,P(k). Using Benjamini and Hochberg's FDR procedure, the 1st, 16th, and 18th SNPs are shown to be significant with an FDR of q* = 0.05 and k = 3. However, at the adjusted lower significance levels [0.05 (1/27), 0.05 (2/27), and 0.05 (3/27)] and with 400 young control alleles, the minimal numbers of centenarian alleles needed to detect nonrandom associations for these three SNPs were found to be unchanged compared with the level of 0.05 (unadjusted for multiple testing). Therefore, under the study design with a panel of 27 SNPs, given the assumptions in Table 3, the minimal number of cases (the minimal L) required for the study would be 54, 27, and 16 centenarians, such that one can declare one or more SNPs significant at an FDR-controlled level of 0.05. Because the same centenarians are genotyped for these three SNPs, the design would require 54 centenarians to ensure an FDR of no more than 0.05.
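The step-up rule can be sketched directly; the function returns the indices of the hypotheses rejected at FDR level q:

```python
def benjamini_hochberg(pvalues, q):
    """Benjamini-Hochberg step-up procedure: reject the hypotheses with
    the k smallest P-values, where k is the largest rank i such that
    P(i) <= q * i / m."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= q * rank / m:
            k = rank                  # remember the largest passing rank
    return [order[j] for j in range(k)]   # indices of rejected hypotheses
```

Note the scan keeps the largest passing rank even if an intermediate rank fails, which is what distinguishes the step-up procedure from a simple sequential stop.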
It would be useful to have an estimate of the FDR from the P-values obtained on the 27 SNPs. If we are willing to assume that the tests are independent (which is NOT biologically reasonable because of the haplotype structure associating some of the SNPs), we can estimate the FDR and the fraction of null hypotheses that are true (π0) among the 27 null hypotheses tested (Storey, 2002). Storey (2002) advocates an alternative multiple-testing procedure. Storey would reject for P(1),…,P(estl) such that
Suppose we set γ = P(l). Then the estFDR(γ) reduces to:
The Storey (2002) procedure becomes:
The equivalence is evidently between Storey (2002) controlling at q* and Benjamini and Hochberg controlling at q*/π0.
So, we used this equivalence to estimate the FDR for Benjamini and Hochberg's (1995) procedure, using an optimal choice of λ suggested by Storey (2002, p. 493). The choice of λ was optimized on a grid from 0.001 to 0.999 in increments of 0.001 using 500 bootstrap samples from the 27 P-values for each λ. We first cycled through Storey's procedure to obtain for q* = 0.05: estl = 5, P(l) = 0.0456, estπ0 = 0.4498, optimized λ = 0.506, and estFDR = 0.0045. We then readjusted Storey's procedure to a newq* = estπ0 × oldq* = 0.0225, exploiting the equivalence of the two procedures (Benjamini and Hochberg, 1995; Storey, 2002). Then we cycled through Storey's procedure a second time at q* = 0.0225: estl = 4, P(l) = 0.00081, estπ0 = 0.4554 (i.e., 12 of 27 tests are estimated to be truly null), optimized λ = 0.512, and estFDR = 0.0082. By the equivalence, this should be the estimated operating characteristics of the Benjamini and Hochberg (1995) procedure, assuming the tests are independent (which is unlikely to be true).
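The standard Storey (2002) point estimators behind these numbers can be sketched for a fixed tuning parameter λ and rejection threshold γ (the bootstrap optimization of λ over a grid, as used above, is omitted here):

```python
def storey_pi0(pvalues, lam):
    """Storey's estimate of the proportion of true null hypotheses:
    pi0(lambda) = #{P > lambda} / ((1 - lambda) * m)."""
    m = len(pvalues)
    return sum(p > lam for p in pvalues) / ((1 - lam) * m)

def storey_fdr(pvalues, gamma, lam):
    """Point estimate of the FDR when rejecting all P <= gamma:
    estFDR(gamma) = pi0 * m * gamma / #{P <= gamma}."""
    m = len(pvalues)
    rejected = max(sum(p <= gamma for p in pvalues), 1)  # avoid 0/0
    return storey_pi0(pvalues, lam) * m * gamma / rejected
```

The intuition: P-values above λ are presumed to come mostly from true nulls, so their density estimates π0, and π0·m·γ approximates the expected number of false rejections at threshold γ.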
The input parameters (or effects) that we employed in the exact test to solve the sample size problem are the odds ratios ρ1 and ρ2, which measure the association between SNPs and race in each age group. A major concern is that the association between longevity and SNPs, measured by ρca and ρam, is confounded with an association between SNP and race. Because ρca and ρam are interrelated with ρ1 and ρ2, the associations to be detected could be an interesting association with longevity and/or with race. In Figure 1 we show how the odds ratio ρca is related to the sample size for the 2 × 2 × 2 table at the level of admixture in the pilot study.
To show the effect of the odds ratio ρca on the sample size needed, the sample size (the minimum number of the case total) was computed using five fixed parameters [log ρ1 and the four margin frequencies (p, q, r, s) from the 17th SNP] and one varying parameter [log ρca, the association of longevity with SNP in Caucasians]. This condition means that in the 2 × 2 × 2 probability table, the probability table for the centenarians is fixed, while the probability table for the young controls varies (consistent with the fact that the probability structure of centenarians in Georgia is not changing for this particular SNP). We obtained the sample sizes as log ρca varies from 0.5 to 1.0 in increments of 0.1 (Figure 1). The sample size necessary to detect the associations between the factors of interest increases from 4 to more than 900 as log ρca decreases from 1.0 to 0.5. In this example log ρ1 is fixed at −0.6592. The results show that the required sample size changes dramatically as log ρca varies from a small number to a relatively large number. For the 17th SNP, an association with longevity can be detected down to about log ρca ≈ 0.86 (Figure 1). This indicates that the probability table of the control group plays a very important role in determining the power of the statistical test in our study when the other parameters are fixed. We can also see that when the log odds ratios are close to zero, that is, when the nonrandom associations are very small, a large sample size is required to detect the associations.
Table 4 shows a comparison among the exact method, MCMC, and asymptotic analysis for the sample size problem of 2 × 2 × 2 tables for some of the SNPs under the same conditions. For SNPs that require relatively large L values to detect the associations determined by the exact method, such as SNPs 15, 24, 26, and 27, the Ls determined by MCMC are close to the corresponding Ls required by the exact method. However, for SNPs that require relatively small L values to detect associations, such as SNPs 17, 18, and 1, the L values determined by MCMC are generally smaller than the Ls required by the exact method, and the N values are also smaller than 400. Compared with the results of the exact method and MCMC, the L values determined by the asymptotic analysis are the smallest. The CPU time used by the exact method is on average 10.16 [S.D. = 9.98, S.E. = 9.88/sqrt(12) = 2.60] times the corresponding time used by the MCMC method, depending on the SNPs considered.
Many medical and biological studies involve a 2 × 2 × 2 table, and many of these studies use two independent samples, case and control. Our study provides a solution to the sample size problem for an omnibus exact test detecting nonrandom associations in a 2 × 2 × 2 table; it does not test for individual nonrandom associations in a particular direction in the parameter space (e.g., a particular odds ratio). Our results help experimental researchers optimize their sampling strategies by providing a method for computing the minimum sample size needed to detect nonrandom associations in 2 × 2 × 2 tables. Solving the sample size problem is a computationally intensive process. The time in seconds to calculate the power for the test with sample sizes L and N is approximately proportional to L³N³. In this case, parallel computing techniques were necessary for implementation. We would also like to point out that in those cases where the nonrandom associations between the factors of interest are very small, it takes more computation time to obtain the minimum L. A comparison between the results of the exact method and those of the MCMC method showed that for SNPs that require relatively large L values to detect the associations, MCMC provides a very good approximation faster (10.16 times on average for the examples considered) than our exact method, as suggested by Guo and Thompson (1992) in another context. However, for SNPs that require small L values to detect the associations, the exact method is the only method that provides correct solutions.
It should be noted that there are several forms of the exact test (Yates, 1984; Agresti, 1992) including: (i) Fisher's one-sided exact test where the larger of the two tails is not larger than α/2; (ii) Fisher's one-sided exact test where the larger of the two tails is not larger than α; and (iii) Fisher's two-sided exact test. Fu and Arnold (1992) did not find appreciable differences in the solutions for these three tests for 2 × 2 tables. It was also found that the two-sided test tended to be the most conservative with respect to sample size. For these reasons and others (Sánchez et al., 2006), solutions for the sample size problem for the 2 × 2 × 2 table have only been reported for the two-sided test against an omnibus alternative.
An interesting generalization of the sample size problem considered here is the multiple testing problem involving hundreds or even thousands of SNPs simultaneously (Zou and Zuo, 2006). In solving the sample size problem they control the FDR or a related criterion and use an asymptotic solution for the sample sizes, with all of its inherent limitations. As shown in the Results, a sample size recommendation can be made to control the FDR without resorting to asymptotic solutions for the sample sizes.
The Georgia Centenarian Study (Leonard W. Poon, PI) is funded by 1P01-AG17553 from the National Institute on Aging, a collaboration among The University of Georgia, Louisiana State University, Boston University, University of Kentucky, Emory University, Duke University, Rosalind Franklin University of Medicine and Science, Iowa State University, and University of Michigan. The authors acknowledge the contributions of the Study's project and core leaders to this article: L.W. Poon, S. M. Jazwinski, R. C. Green, M. Gearing, W. R. Markesbery, J. L. Woodard, M. A. Johnson, J. S. Tenover, I. C. Siegler, P. Martin, M. MacDonald, C. Rott, W. L. Rodgers, D. Hausman, J. Arnold, and A. Davey. We also acknowledge M. A. Batzer, E. Cress, and L. S. Miller for their contributions. Authors acknowledge the valuable recruitment and data acquisition effort from M. Burgess, K. Grier, E. Jackson, E. McCarthy, K. Shaw, L. Strong, and S. Reynolds, data acquisition team manager; S. Anderson, E. Cassidy, M. Janke, and T. Savla, data management; M. Durden for project fiscal management. We thank H.-B. Schüttler for the suggestion of the step-width adjustor for the MCMC method. The work was supported by NIH Program Project Grant NIH 5P01 AG017553-03. We also thank the UGA College of Agricultural and Environmental Sciences for their support. We thank the reviewers and associate editor, whose comments substantially improved the manuscript.