The power calculations require genotype data on a large representative sample of common SNPs from the population, as well as a list of which of these representative SNPs are the tag SNPs (the SNPs to be genotyped). Power is computed in three steps. First, the best tag SNP for each of the representative SNPs is found. Then, the power for detecting association is computed for each representative SNP, assuming that SNP directly influences the phenotype. For this computation, it is assumed that the study will be performed by testing for genotype frequency differences between cases and controls using a two-degree-of-freedom χ² test, with multiple tests corrected for using the Bonferroni correction. This test explicitly assumes a codominant model. I use this test because it is the most general, at the cost of reduced power relative to a model-specific test. While a multimarker tagging approach could be taken [13], this added level of complexity is not usually included in a first-pass analysis of genome-wide association data; including it in the power calculation would therefore inflate the power one might expect in real-world applications of genome-wide association studies. Finally, the average power over all the SNPs is taken to be the power of the study.
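The three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used for the results reported here: the squared correlation of 0/1/2 genotype codes stands in for the LD measure r², the per-SNP power function is left as a caller-supplied placeholder (`power_given_r2`), and all names and toy genotypes are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation of two genotype vectors coded 0/1/2.

    Assumption: used here as a stand-in for the LD measure r^2.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic SNP: correlation is undefined
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return min(1.0, sxy * sxy / (sxx * syy))  # clamp rounding overshoot


def best_tag_r2(genotypes: Dict[str, List[int]], tags: List[str]) -> Dict[str, float]:
    """Step 1: for every representative SNP, the highest r^2 to any tag SNP."""
    return {
        snp: max(r_squared(g, genotypes[t]) for t in tags)
        for snp, g in genotypes.items()
    }


def study_power(
    genotypes: Dict[str, List[int]],
    tags: List[str],
    power_given_r2: Callable[[float], float],
) -> float:
    """Steps 2 and 3: per-SNP power, then the average over all SNPs."""
    best = best_tag_r2(genotypes, tags)
    per_snp = [power_given_r2(r2) for r2 in best.values()]
    return sum(per_snp) / len(per_snp)


# Hypothetical toy data: three SNPs typed in six individuals, one tag SNP.
genotypes = {
    "snp_a": [0, 1, 2, 1, 0, 2],
    "snp_b": [0, 1, 2, 1, 0, 1],
    "snp_c": [2, 0, 1, 0, 2, 1],
}
tags = ["snp_a"]
best = best_tag_r2(genotypes, tags)
# Placeholder power function (power equal to r^2), for illustration only.
toy_power = study_power(genotypes, tags, power_given_r2=lambda r2: r2)
```

A tag SNP is its own best tag, so it enters the average with r² = 1; any realistic `power_given_r2` would also depend on sample size, allele frequency, GRR, and the Bonferroni-corrected significance level.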
Taking the average power over all the SNPs is justified using probability theory. Assume there are $N$ SNPs present in a given population, each one represented as $S_i$. Let $C_i$ represent SNP $i$ being causative, and $D_i$ represent SNP $i$ being detected. Assume that one of these SNPs is the causative SNP, but it is unknown which one. Then the overall power of the study is given by

$$\text{Power} = \sum_{i=1}^{N} \Pr(D_i \cap C_i).$$

The power computed for a specific SNP $S_i$ is given as $P_i = \Pr(D_i \mid C_i)$. Thus, if each $P_i$ is multiplied by $\Pr(C_i)$, we get

$$\sum_{i=1}^{N} P_i \Pr(C_i) = \sum_{i=1}^{N} \Pr(D_i \mid C_i) \Pr(C_i) = \sum_{i=1}^{N} \Pr(D_i \cap C_i).$$

The added assumption that each SNP is equally likely to be causative, $\Pr(C_i) = 1/N$, yields

$$\text{Power} = \frac{1}{N} \sum_{i=1}^{N} P_i.$$

This final equation is the same as taking the average power over all the SNPs.
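The averaging argument can be checked numerically with a small simulation. The sketch below assumes a handful of made-up per-SNP powers (they are not values from this study): it picks the causative SNP uniformly at random, declares it detected with its own per-SNP power, and compares the observed detection rate with the simple average.

```python
import random


def simulate_overall_power(per_snp_power, n_trials=200_000, seed=0):
    """Monte Carlo estimate of the probability that the causative SNP is detected."""
    rng = random.Random(seed)
    n = len(per_snp_power)
    detected = 0
    for _ in range(n_trials):
        i = rng.randrange(n)                 # causative SNP, chosen uniformly
        if rng.random() < per_snp_power[i]:  # detected with its per-SNP power
            detected += 1
    return detected / n_trials


per_snp_power = [0.9, 0.5, 0.2, 0.8]  # hypothetical per-SNP powers
analytic = sum(per_snp_power) / len(per_snp_power)
simulated = simulate_overall_power(per_snp_power)
```

With this many trials the simulated value should agree with the analytic average to within Monte Carlo error.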
This method was applied to examine the power of genome-wide association studies in the four populations studied in the International HapMap Project [19]. I examined the performance of the tag SNPs provided by the major high-density genotyping platforms available commercially: 100 K and 500 K SNP sets from Affymetrix and 300 K and 550 K SNP sets from Illumina. (Since then, more products have come on the market; the same approach can be taken with them.) I first asked how many SNPs on each of these arrays would be useful for studying a given population by asking what percentage of the tag SNPs provided by each platform are common (minor allele frequency > 5%) in each of the four HapMap populations (Table 1). The largest fraction of common SNPs is found when the Illumina chip is used in the CEU population. As the Illumina chip was designed to optimize coverage of the CEU population, this result is unsurprising.
Table 1 The number of SNPs present in each population and present in each commercial genotyping system
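The percentage of array SNPs that are common in a population can be computed directly from genotype counts. The following is a minimal sketch, assuming genotype counts are available as (AA, AB, BB) tuples per SNP; the function names, the toy counts, and the choice to count SNPs without population data as not common are all assumptions made for illustration.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from the three genotype counts."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        return 0.0
    freq_b = (n_ab + 2 * n_bb) / total_alleles
    return min(freq_b, 1 - freq_b)


def fraction_common(array_snps, genotype_counts, maf_cutoff=0.05):
    """Fraction of array SNPs with MAF above the cutoff in one population.

    Assumption: SNPs with no genotype data in the population count as not common.
    """
    common = sum(
        1
        for snp in array_snps
        if snp in genotype_counts
        and minor_allele_frequency(*genotype_counts[snp]) > maf_cutoff
    )
    return common / len(array_snps)


# Hypothetical genotype counts (AA, AB, BB) in one population.
counts = {"rs1": (50, 40, 10), "rs2": (98, 2, 0), "rs3": (0, 0, 100)}
array = ["rs1", "rs2", "rs3", "rs4"]
share = fraction_common(array, counts)
```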
I next asked how power changes with increasing sample size for the various genotyping platforms (Figure 1), populations, and models. For all sets of tag SNPs, as expected, power increases both as the sample size increases and as the magnitude of effect, measured by the genotype relative risk (GRR), increases. While Figure 1 shows these data only for a multiplicative model in the CEU population, similarly shaped curves were observed in the other populations and for other models [see Additional file 1]. In the Affymetrix 500 K and Illumina 300 K SNP sets, the slope of the power curve starts leveling off (approaching zero) with a few thousand individuals when GRR is more than 1.5. For smaller GRRs, the sample size required for adequate (at least 50%) power becomes quite large.
Figure 1 Power for the test of genotypic association as a function of sample size at different genotype relative risks (GRR). All panels are for the CEU HapMap population when the number of cases equals the number of controls and a multiplicative model is used.
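To make the dependence of power on sample size and GRR concrete, the power of the two-degree-of-freedom test at a single SNP can be approximated as follows. This is a hedged sketch rather than the exact computation behind Figure 1: it assumes Hardy-Weinberg proportions, a rare disease (so that controls have the population genotype frequencies), a noncentrality parameter obtained by plugging the expected frequencies into the Pearson statistic, and, for indirect association, a noncentrality scaled by r². All function names are hypothetical.

```python
import math


def case_control_genotype_freqs(p, grr):
    """Genotype frequencies (0, 1, 2 risk alleles) in cases and controls."""
    pop = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    weighted = [f * grr ** k for k, f in enumerate(pop)]  # multiplicative model
    total = sum(weighted)
    return [w / total for w in weighted], pop


def noncentrality(cases, controls, n_cases, n_controls):
    """Pearson chi-square evaluated at the expected genotype frequencies."""
    n = n_cases + n_controls
    scale = n_cases * n_controls / n
    ncp = 0.0
    for pc, pk in zip(cases, controls):
        pooled = (n_cases * pc + n_controls * pk) / n
        if pooled > 0:
            ncp += scale * (pc - pk) ** 2 / pooled
    return ncp


def ncx2_sf_2df(x, ncp):
    """P(X > x) for a noncentral chi-square with 2 df.

    Uses the Poisson mixture of central chi-squares with 2 + 2j df, whose
    survival functions have a closed form for even degrees of freedom.
    """
    half_x, half_l = x / 2.0, ncp / 2.0
    if half_l == 0:
        return math.exp(-half_x)
    j_max = int(half_l + 10 * math.sqrt(half_l) + 50)
    term = math.exp(-half_x)
    tail = term  # survival function of the central chi-square with 2 df
    total = 0.0
    for j in range(j_max + 1):
        if j > 0:
            term *= half_x / j
            tail += term  # survival function with 2 + 2j df
        log_weight = -half_l + j * math.log(half_l) - math.lgamma(j + 1)
        total += math.exp(log_weight) * tail
    return min(total, 1.0)


def genotypic_power(p, grr, n_cases, n_controls, n_tests, r2=1.0, alpha=0.05):
    """Approximate power of the 2-df test with Bonferroni correction."""
    cases, controls = case_control_genotype_freqs(p, grr)
    ncp = r2 * noncentrality(cases, controls, n_cases, n_controls)
    critical = -2.0 * math.log(alpha / n_tests)  # 2-df chi-square quantile
    return ncx2_sf_2df(critical, ncp)


# Example call: risk allele frequency 0.3, GRR 1.5, 500,000 tests.
curve = [genotypic_power(0.3, 1.5, n, n, 500_000) for n in (1000, 2000, 4000)]
```

The critical value uses the fact that the survival function of a central χ² with two degrees of freedom is exp(-x/2), so no statistical library is needed.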
One critique of this approach is that the non-specific test used may not be the most powerful approach if we know the genetic model the disease follows. For instance, to study a trait that we believe follows a multiplicative model, a 2 × 2 contingency table to test for allelic association may be more appropriate. Power calculations for this test (Figure 2) show that the relative pattern is the same as for a test of genotypic association, but the power is generally increased when an allelic test is used instead of a genotypic test. Similar power calculations can be done if one wants to use an explicit test for a dominant or recessive mode of inheritance. However, as can be seen in this comparison between the Affymetrix 500 K and Illumina 550 K genotyping systems, choice of SNPs and sample size can play a bigger role in determining power than choice of test. For the specified GRR of 1.5, the Illumina 550 K system with a genotypic test is more powerful than the Affymetrix 500 K system when sample size is greater than 2000 individuals (Figure 2).
Figure 2 Power for genotypic and allelic tests. Data are shown for a GRR of 1.5 under a multiplicative model, the CEU HapMap population, and the specified genotyping system.
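For comparison with the genotypic test, the power of the one-degree-of-freedom allelic test on a 2 × 2 table of allele counts can be approximated in closed form. The sketch below makes the same simplifying assumptions as a rare-disease, Hardy-Weinberg calculation (controls carry the population allele frequency, and under a multiplicative model cases remain in Hardy-Weinberg proportions at a shifted allele frequency); the function names are hypothetical and the code is not taken from the analysis behind Figure 2.

```python
import math
from statistics import NormalDist


def allele_freqs(p, grr):
    """Risk-allele frequency in cases and in controls (multiplicative model)."""
    case_freq = p * grr / (p * grr + (1 - p))
    return case_freq, p


def allelic_power(p, grr, n_cases, n_controls, n_tests, alpha=0.05):
    """Approximate power of the 1-df allelic test with Bonferroni correction."""
    pc, pk = allele_freqs(p, grr)
    a_cases, a_controls = 2 * n_cases, 2 * n_controls  # two alleles per person
    total = a_cases + a_controls
    pooled = (a_cases * pc + a_controls * pk) / total
    ncp = (a_cases * a_controls / total) * (pc - pk) ** 2 / (pooled * (1 - pooled))
    norm = NormalDist()
    z_crit = -norm.inv_cdf(alpha / (2 * n_tests))  # two-sided critical value
    root = math.sqrt(ncp)
    # A noncentral chi-square with 1 df is the square of a shifted normal.
    return norm.cdf(root - z_crit) + norm.cdf(-root - z_crit)


# Example call: risk allele frequency 0.3, GRR 1.5, 500,000 tests.
allelic_curve = [allelic_power(0.3, 1.5, n, n, 500_000) for n in (1000, 2000, 4000)]
```

Because this test spends a single degree of freedom on the same allele-frequency difference, its critical value is lower than that of the two-degree-of-freedom test at the same corrected significance level.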
Another possible criticism of this method is that the SNPs genotyped as part of the International HapMap Project may not be a representative subset of the common SNPs in the genome as a whole. To investigate this possibility, I compared the coverage provided by the various tag SNP sets in the ENCODE and non-ENCODE regions of the HapMap project (Figure 3). Since the ENCODE regions of the HapMap project were completely resequenced in a subset of 48 individuals, I hypothesized that almost all common (minor allele frequency > 5%) variants would have been identified in those regions. If the SNPs genotyped as part of the HapMap are a representative subset of all of the common SNPs, then the coverage of an arbitrary set of tag SNPs should be equal for the two data sets. Assuming tag SNPs were chosen similarly for the ENCODE and non-ENCODE regions, relying on the HapMap data slightly overestimates r² with the tag SNPs and therefore could slightly inflate the power estimate. As the fraction of SNPs with an r² greater than the cutoff differs between the ENCODE and non-ENCODE regions by at most ten percentage points, and by an average of three percentage points, this overestimation is not likely to be extreme.
Figure 3 Coverage of tag SNPs. Fraction of non-tag SNPs in LD with a tag SNP with r² above the specified threshold for the ENCODE and non-ENCODE regions of the HapMap project for the CEU and YRI populations. Results are shown for the Illumina 550 K (A) and Affymetrix (more ...)
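Coverage in the sense used above is the fraction of non-tag SNPs whose best r² with any tag SNP reaches a threshold. A minimal sketch of that calculation, and of the difference between two regions in percentage points, is given below. The best-r² values are invented for illustration, the function names are hypothetical, and treating the threshold as inclusive (`>=`) is an assumption.

```python
def coverage(best_r2, tag_snps, thresholds=(0.5, 0.8)):
    """Fraction of non-tag SNPs whose best r^2 to a tag SNP meets each threshold."""
    non_tag = [r2 for snp, r2 in best_r2.items() if snp not in tag_snps]
    return {t: sum(r2 >= t for r2 in non_tag) / len(non_tag) for t in thresholds}


def percentage_point_gap(cov_a, cov_b):
    """Absolute coverage difference between two regions, in percentage points."""
    return {t: abs(cov_a[t] - cov_b[t]) * 100 for t in cov_a}


# Hypothetical best-r^2 values for two regions; "t" is the tag SNP itself.
region_1 = {"a": 0.9, "b": 0.4, "c": 1.0, "d": 0.6, "t": 1.0}
region_2 = {"e": 0.9, "f": 0.85, "g": 1.0, "h": 0.6, "t": 1.0}
cov_1 = coverage(region_1, {"t"})
cov_2 = coverage(region_2, {"t"})
gap = percentage_point_gap(cov_1, cov_2)
```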
An easy and useful way to compare the power of different tag SNP sets in different populations is the sample size needed to achieve 80% power. The Illumina 550 K clearly performs best in all three populations (Figure 4). For the CEU population, the Illumina 300 K outperforms the Affymetrix 500 K, while in the other two populations the Affymetrix 500 K is better. This is not surprising, as the Illumina chips were optimized on CEU HapMap data. As the Affymetrix 500 K set is really two independent 250 K sets, I also looked at the power of each 250 K set individually. While the complete 500 K set of SNPs has more power than either half, the number of individuals required for 80% power using one half of the set is never twice the number required for the full set. This means that when the number of chips that can be run, rather than the number of available samples, is the limiting factor, it might make more sense to genotype more individuals using only one chip than to genotype fewer individuals using both chips. To test this hypothesis, I plotted power versus the number of chips needed for the components of the Affymetrix 500 K system (Figure 5). The number of chips is simply the sample size for Nsp and Sty alone, and twice the sample size for the Nsp+Sty combination. Except in cases where power gets very high due to a large GRR and/or sample size, for a constant number of chips, using only one of Nsp or Sty on more individuals provides a more powerful study.
Figure 4 Total individuals required for 80% power. The computations assume the number of cases equals the number of controls and a GRR of 1.75. CEU, JPT+CHB, and YRI are the HapMap populations. Affy 250 K Nsp and Affy 250 K Sty represent the two chips that make (more ...)
Figure 5 Power as a function of the number of chips needed for the Affymetrix 500 K system and its two components. Calculations are done for a GRR of (A) 1.5 and (B) 2.0.
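The chip-budget comparison amounts to evaluating each design at the number of individuals a fixed number of chips can pay for. The sketch below shows only that bookkeeping. The two power curves are made-up saturating functions chosen for illustration, not the curves computed for the Affymetrix sets, and all names are hypothetical.

```python
import math
from typing import Callable, Dict, Tuple


def compare_designs(
    n_chips: int,
    designs: Dict[str, Tuple[int, Callable[[int], float]]],
) -> Dict[str, Tuple[int, float]]:
    """For a fixed chip budget, return (individuals genotyped, power) per design.

    Each design is (chips per individual, power as a function of individuals).
    """
    results = {}
    for name, (chips_per_individual, power_fn) in designs.items():
        n_individuals = n_chips // chips_per_individual
        results[name] = (n_individuals, power_fn(n_individuals))
    return results


def toy_curve(scale: float) -> Callable[[int], float]:
    """Made-up saturating power curve; a placeholder for a real power calculation."""
    return lambda n: 1.0 - math.exp(-n / scale)


designs = {
    "Nsp only": (1, toy_curve(3000.0)),  # one chip per individual
    "Nsp+Sty": (2, toy_curve(2000.0)),   # two chips per individual
}
res = compare_designs(4000, designs)
```

In this toy setting the combined design is more powerful per individual, but the single-chip design reaches twice as many individuals for the same budget; which effect dominates depends entirely on the real power curves.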
I have presented a method to compute the power of a genome-wide association study in which a fixed set of tag SNPs will be genotyped. For the sake of simplicity, I only considered one straightforward single-SNP analysis scheme. While this approach has been used successfully [6], others have suggested that greater power can be obtained by looking at multiple tags or haplotypes [18]. This method for computing power can be adapted to such strategies provided it is possible to compute the power of detecting each SNP in the population given the set of tagging SNPs. I also assume that each SNP is equally likely to be functional. If we knew a priori the probability that a given SNP is functional, we could use this to weight the average power over all the SNPs. Such a weighting scheme would prioritize SNPs more likely to be of interest because of either functional considerations or location [21]. For instance, assume we assigned each SNP a probability of being the causative SNP based on external evidence such as a prior linkage study. If these probabilities are normalized to sum to one, they can be used to compute a weighted average power in this approach.
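The weighted variant is a one-line change to the averaging step. A minimal sketch, assuming per-SNP powers and non-negative prior weights are held in dictionaries keyed by SNP name (the names and numbers are hypothetical):

```python
def weighted_power(per_snp_power, prior_weights):
    """Average power weighted by each SNP's prior probability of being causative.

    The weights are normalized to sum to one before use.
    """
    if any(w < 0 for w in prior_weights.values()):
        raise ValueError("prior weights must be non-negative")
    total = sum(prior_weights.values())
    if total <= 0:
        raise ValueError("at least one prior weight must be positive")
    return sum(per_snp_power[snp] * w / total for snp, w in prior_weights.items())


power = {"snp_a": 0.9, "snp_b": 0.5, "snp_c": 0.1}    # hypothetical per-SNP powers
uniform = {"snp_a": 1.0, "snp_b": 1.0, "snp_c": 1.0}  # every SNP equally likely
linkage = {"snp_a": 2.0, "snp_b": 1.0, "snp_c": 1.0}  # snp_a favored by prior evidence
equal_weight_power = weighted_power(power, uniform)
prior_weighted_power = weighted_power(power, linkage)
```

With equal weights the function reduces to the simple average used throughout this paper.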