Because of the complexity of human LD patterns, many questions of interest cannot be addressed analytically. We have described in detail our simulation method, HAPGEN, for generating large samples of case and control data at every HapMap SNP, which mimic the patterns of diversity and LD present in the HapMap data. The software can simulate case data under a single causal disease SNP model for specified genotypic relative risks. We have used the method here to assess the power of various commercially available genotyping chips for case-control genome-wide association studies, but note that it could be utilised to assess other design questions, in the evaluation of analytical methods, and in considering follow-on studies such as resequencing and fine-mapping.
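As a minimal sketch of what a single causal SNP model with specified genotypic relative risks implies for cases, the genotype distribution at the disease SNP can be obtained by reweighting the population genotype frequencies by the relative risks. This is an illustration only and not the HAPGEN implementation: the function names are hypothetical, Hardy-Weinberg proportions in the population are assumed, and the numerical values are made up.

```python
import random

def case_genotype_probs(raf, rr_het, rr_hom):
    """P(genotype | case) for 0, 1 and 2 copies of the risk allele.

    Assumes Hardy-Weinberg proportions in the population and relative
    risks rr_het and rr_hom measured against the 0-copy genotype.
    """
    q = 1.0 - raf
    weights = [q * q, 2.0 * raf * q * rr_het, raf * raf * rr_hom]
    total = sum(weights)
    return [w / total for w in weights]

def sample_case_genotypes(n, raf, rr_het, rr_hom, seed=0):
    """Draw n case genotypes (risk-allele counts) at the causal SNP."""
    rng = random.Random(seed)
    probs = case_genotype_probs(raf, rr_het, rr_hom)
    return rng.choices([0, 1, 2], weights=probs, k=n)

# Illustrative values: RAF 0.2, multiplicative model with per-allele RR 1.5.
probs = case_genotype_probs(0.2, 1.5, 1.5 ** 2)
case_raf = probs[1] / 2 + probs[2]  # risk allele frequency among cases
print(probs, case_raf)
```

Under these assumptions the risk allele is enriched in cases relative to the population frequency, which is the signal an association test looks for.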
In Caucasian populations the differences in power afforded by current-generation genotyping chips are not large, and the power of these chips is close to that of an optimal chip which always directly genotyped the causal SNP. Listed in order of decreasing power for the CEU population, averaged over all potential disease SNPs with RAF ≥5%, the chips we considered were: Illumina 1M, Illumina 650 k, Illumina 610 k, Affymetrix 6.0, Illumina 300 k, Affymetrix 500 k and Affymetrix 100 k. In line with our previous work we have shown that imputation can boost the power of each chip substantially and that the resulting power will approach that which could be obtained by a hypothetical ‘complete’ chip that types all the SNPs in HapMap.
One limitation of the approach we (and others) have used is that the causal SNP is assumed to be one of those SNPs in the HapMap panel, and this will not always be true. Other studies have shown that the majority of SNPs not in HapMap will be highly correlated with the SNPs that are in HapMap, and this is especially true for the more common SNPs. This means there is a slight bias in our power results for each chip and for the use of imputation, but we do not expect it to be large. A consequence of this point is that the power we estimate for the ‘complete’ chip approximates the power we might obtain if we had a chip which typed all the SNPs that exist in the human genome.
A main conclusion from our analysis is that study size is a crucial determinant of the power to detect a causal variant. Increasing study size typically has a larger effect on power than increasing the number or coverage of SNPs on the chip, at least amongst chips currently available. Even for effect sizes at the larger end of those estimated to date for common human diseases (RRs of 1.3–1.5) quite large sample sizes, at least 2000 cases and 2000 controls and ideally more, are needed to give good power to detect the causal variant. When case numbers are limited, there are still non-trivial gains in power available from increasing just the number of controls. Care is needed in assessing the appropriateness of a set of controls, but as larger sets of control genotypes are made publicly available this strategy has considerable appeal, whatever the number of available cases. SNPs with smaller effect sizes are unlikely to be detected even in studies of the sizes currently undertaken, but as has been shown empirically for several diseases, these can be found by meta-analyses which combine different GWAs, or by follow-up in large samples of SNPs which look promising in the original GWA but fail to meet the low levels of significance thought appropriate for GWAS.
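The dependence of power on study size can be illustrated with a rough analytical approximation for the best case in which the causal SNP is genotyped directly. The sketch below is not the simulation method used in this paper: it assumes a multiplicative model, population controls whose allele frequency equals the population RAF, a two-sided allelic z-test, and a significance threshold `alpha` whose default is an illustrative assumption. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def case_allele_freq(raf, rr):
    """Risk allele frequency in cases under a multiplicative model."""
    return rr * raf / (rr * raf + 1.0 - raf)

def allelic_test_power(n_cases, n_controls, raf, rr, alpha=5e-7):
    """Approximate power of a two-sided allelic test at the causal SNP.

    Uses a normal approximation to the difference in allele frequencies
    between 2*n_cases case alleles and 2*n_controls control alleles,
    and ignores the negligible contribution of the opposite tail.
    """
    p_case = case_allele_freq(raf, rr)
    p_ctrl = raf
    se = sqrt(p_case * (1 - p_case) / (2 * n_cases)
              + p_ctrl * (1 - p_ctrl) / (2 * n_controls))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p_case - p_ctrl) / se - z_crit)

# Illustrative comparison: RAF 0.3, per-allele RR 1.3.
for n_cases, n_controls in [(1000, 1000), (2000, 2000), (2000, 6000)]:
    power = allelic_test_power(n_cases, n_controls, raf=0.3, rr=1.3)
    print(n_cases, n_controls, round(power, 3))
```

Under these assumptions, power rises steeply between 1000 and 2000 cases, and adding controls while holding the number of cases fixed gives a further gain, in line with the qualitative conclusions above.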
When the causal SNP is rare (MAF<10%), all chips have low power unless its effect is large and sample sizes are large. This conclusion would hold even if the chip directly genotyped the causal SNP. The relative ordering of different chips, on the basis of power, also changes in this context.
As would be expected, power is also lower for all chips for samples which match the patterns of LD seen in the Yoruba HapMap sample, and again the relative ordering of chips changes in this setting. It is not yet clear how well the results for the Yoruba would extend to other African populations.
An often-quoted metric in assessing chips is the coverage of each chip: an estimate of the proportion of SNPs which have r² > 0.8 with at least one SNP on the chip. Although coverage is relatively simple to calculate (and even simpler to miscalculate), not least because it does not depend on study size, our results show that it can be a poor surrogate for power, and that relatively large differences between chips in coverage do not translate to large differences in power.
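The coverage definition above can be sketched in a few lines. This is an illustrative toy, not the procedure used to produce the coverage figures for any chip: the function names and the haplotype panel are made up, r² is taken as the squared Pearson correlation between 0/1 allele vectors, and SNPs on the chip are counted as covering themselves.

```python
from statistics import mean

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length allele vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic SNP: r^2 undefined, treat as 0
        return 0.0
    return sxy * sxy / (sxx * syy)

def coverage(panel, chip_snps, threshold=0.8):
    """Fraction of panel SNPs with r^2 > threshold to at least one chip SNP."""
    tagged = sum(
        1 for alleles in panel.values()
        if any(r_squared(alleles, panel[c]) > threshold for c in chip_snps)
    )
    return tagged / len(panel)

# Toy reference panel: 8 haplotypes at 4 SNPs (made-up data).
panel = {
    "snp1": [0, 0, 0, 0, 1, 1, 1, 1],
    "snp2": [0, 0, 0, 0, 1, 1, 1, 1],  # perfectly correlated with snp1
    "snp3": [0, 1, 0, 1, 0, 1, 0, 1],  # uncorrelated with snp1
    "snp4": [0, 0, 0, 1, 1, 1, 1, 1],  # r^2 = 0.6 with snp1
}
print(coverage(panel, chip_snps=["snp1"]))  # 0.5
```

Note that snp4, with r² = 0.6 to the chip SNP, counts as uncovered here even though a test at snp1 would retain appreciable power to detect an effect at snp4. This is one way in which a thresholded coverage figure can understate power.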
The sets of SNPs on Illumina chips are chosen in part to maximize particular criteria, such as coverage, for certain populations, typically those in HapMap. One difficulty of analyses such as those in this paper is that these resources are also the natural ones with which to assess properties of the chips. Thus when Illumina chips “tuned” to one population (say the 610 k chip for CEU) are used in other populations, power might be systematically lower than the levels assessed here. In contrast, SNP sets of Affymetrix chips are chosen largely in a non-population-specific way. While power is likely to vary in populations other than those we have considered here, there is not the same systematic effect which would lead to a decrease in power. A quantitative assessment of this phenomenon will be possible when dense genotype data are available for other populations, such as HapMap Phase 3.
We have assumed here that accurate genotypes are available for all SNPs on each chip. In practice some SNPs on each chip will fail QC tests and not be available for analyses. As a consequence, our study will overestimate power, though this effect is unlikely to be large. We are only able to use SNPs in HapMap as potential disease SNPs. These may not be systematically representative of all potential disease SNPs. HapMap SNPs have systematically higher MAFs than do arbitrary SNPs, but for SNPs within a particular range of MAF, it seems unlikely that their LD properties will differ systematically, so, for example, we would expect our results for common SNPs to extend beyond those in HapMap.
We have focussed on the most common GWA design, namely of a single-stage study, and the simplest disease model. The flexibility of the simulation approach allows many other practical aspects of study design to be incorporated into power calculations. These include more complex disease models, two-stage strategies (the starting point for our work was a comparison of power for one- and two-stage designs in the context of the WTCCC study), genotyping errors, QC filters, misidentification of cases as controls and simple types of population structure. The HAPGEN software also provides a useful tool for the development and comparison of more sophisticated multi-marker approaches to detecting disease association (e.g. imputation). We therefore believe that simulations are an essential tool in the design of association studies by allowing a focus on study power and an assessment of the effect on power of following a given study design. We hope that this method will continue to find use and can be extended to new catalogs of genetic variation such as the 1000 Genomes Project (http://www.1000genomes.org/).
As in other areas of science, power seems a central consideration in study design and choice of genotyping chip. But other issues may also play a role. These include coverage of particular genes, or genomic regions of interest; the utility of GWA data for directing downstream studies such as resequencing and fine mapping; data quality for particular chips; and the extent to which a chip reliably assays other forms of genetic variation such as copy number polymorphisms. Adding data to existing studies is straightforward if the same chip is used, but the success of imputation methods, in particular in meta-analyses, means that this is not essential.
In general, Affymetrix chips have more redundancy than do Illumina chips, in the sense of containing sets of SNPs which are correlated with each other. The immediate consequence of this is lower coverage and lower power for the same number of SNPs, but there can be advantages to this redundancy: loss of a particular SNP to QC filters may not be as costly; and signals of association are likely to include more SNPs, thus making them easier to distinguish from genotyping artefacts.
Ultimately power can only be calculated under an alternative model. Thus on a practical level the optimal choice of assays and sample sizes will depend on the researcher's belief regarding the unknown distribution of effect sizes and models relating genotype and phenotype. In particular we show that one might adopt different strategies depending on the expected frequency of the disease-causing variant, the effect size and even the population from which cases and controls are sampled.
In the continuing search to better understand the genetic basis of common human diseases, numerous study designs can be adopted which may involve combining data sets, imputing missing SNPs, distilling signals of association over multiple experimental stages, and so on. In this complex setting study power will remain a central criterion in study design, and the kinds of approaches developed here will continue to allow informed decision making by experimenters.