GWA studies undoubtedly constitute the present state of the art in efforts to elucidate the genetic aetiology of complex phenotypes. Several commercial products offering the potential to simultaneously assay hundreds of thousands of SNPs genome-wide are available from companies such as Affymetrix and Illumina. These have varying SNP content and density, and have been designed using diverse marker selection strategies. For example, arrays with an exon-centric SNP content, such as the Illumina Human-1, reflect strategies focusing on potentially functional variants. LD-based platforms contain tag sets of SNPs selected to maximize the amount of common variation captured on the basis of HapMap data. Affymetrix platforms comprise quasi-randomly distributed SNPs or a combination of random and tag SNPs. In recognition of their potential role in complex disease susceptibility, copy number variants (CNVs) are also increasingly featured.
Overview of marker content and array design across commercially available platforms and coverage of common variation (MAF > 0.05) based on HapMap phase II data
The table above summarizes the extent to which different platforms capture common (MAF > 0.05) variation based on published evaluations in the three different HapMap phase II populations [11]. Coverage in European- and East Asian-descent populations is very high and has substantially improved with next generation chips. Information capture in African-descent populations is lower, reflecting higher recombination rates and lower levels of inter-marker correlation. However, it has been shown theoretically that coverage of all common variation based on HapMap has been overestimated and that larger sample sizes and denser marker sets are required for more accurate estimation of tagging SNP efficacy [19]. The overestimation of previously reported coverage has also been empirically confirmed by the analysis of sequence-derived variation data from 76 genes in HapMap samples [21]. Although variation capture is an important consideration in GWA study design, it is not the sole determinant of power.
The statistical power of a GWA study to detect variants associated with disease is a function of sample size, the magnitude of the effect at the susceptibility locus, the risk allele frequency of the queried SNP and its correlation with the causal variant. Although the allelic architecture of complex traits has not yet been fully characterized, recent GWA scans and follow-up studies have highlighted that common susceptibility loci are likely to have modest or small effect sizes [allelic odds ratios (ORs) between 1.1 and 1.5]. In a genome-wide setting, the large number of tests performed requires stringent thresholds for declaring genome-wide statistical significance (P = 5 × 10⁻⁸), necessitating large-scale sample sizes. For example, in order to achieve 90% power to detect a risk allele with 0.20 frequency and an allelic OR of 1.2 (at the genome-wide significance level), more than 6000 affected individuals and twice as many controls would be required (Figure 1). To achieve the same power to detect similar effects at lower frequency variants (frequency of 0.05 or less), a GWA study would need upwards of 20 000 cases (Figure 1).
Figure 1: Number of affected individuals required (given a case/control ratio of 1:2) in order to achieve 10, 50 and 90% power to detect an effect at α = 5 × 10⁻⁸ for variants with modest to low effect sizes (allelic odds ratios 1.10, 1.15, ...)
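The sample size reasoning above can be illustrated with a rough calculation. The sketch below is not the method used to produce Figure 1; it assumes a two-sided allelic test treated as a two-proportion z-test on allele counts, a directly typed causal variant, and a control allele frequency equal to the stated risk allele frequency. The function name `cases_required` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def cases_required(risk_allele_freq, odds_ratio, power=0.90,
                   alpha=5e-8, controls_per_case=2.0):
    """Approximate number of cases needed for a two-sided allelic test.

    Assumptions (illustrative only): the control allele frequency equals
    `risk_allele_freq`, the causal variant is typed directly, and the
    normal approximation to the two-proportion z-test holds.
    """
    if odds_ratio == 1.0:
        raise ValueError("an odds ratio of 1 implies no effect to detect")
    p0 = risk_allele_freq
    # Allele frequency in cases implied by the allelic odds ratio.
    p1 = odds_ratio * p0 / (1.0 - p0 + odds_ratio * p0)
    k = controls_per_case
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    # Pooled frequency under the null, weighted by the case/control ratio.
    p_bar = (p1 + k * p0) / (1.0 + k)
    null_sd = math.sqrt(p_bar * (1.0 - p_bar) * (1.0 + 1.0 / k))
    alt_sd = math.sqrt(p1 * (1.0 - p1) + p0 * (1.0 - p0) / k)
    case_alleles = ((z_alpha * null_sd + z_power * alt_sd) / (p1 - p0)) ** 2
    return math.ceil(case_alleles / 2.0)  # two alleles per individual


if __name__ == "__main__":
    for freq in (0.20, 0.05):
        print(freq, cases_required(freq, odds_ratio=1.2))
```

Under these assumptions the calculation yields case numbers of the same order as those quoted in the text; imperfect tagging of the causal variant would increase them further.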
Along with sample size considerations, GWA studies have also given rise to several logistical challenges: for example, issues relating to automated but accurate genotype calling, programmatic data handling and parsing, genotype quality control (QC) standards and analytical considerations that did not previously apply to smaller scale studies.
Genotype calling is the process by which hybridization intensities on genome-wide chips are translated into genotypes. Typically, intensities are normalized and transformed into coordinates which yield distinct genotype clouds. As high call rate and accuracy of genotype calling are important factors in safeguarding QC standards in GWA scans, a variety of genotype calling algorithms have been developed and continue to evolve [24–27]. The possible adverse effects of inaccurate genotype calling on downstream analyses have long been recognized [28]. Therefore, inspection of intensity plots for interesting association signals is an essential aspect of genotype QC.
Genotype QC is an extremely important step in GWA studies, as it can dramatically reduce the number of false positive associations. The field has converged on an essential set of QC checks; the flowchart below summarizes the sample- and SNP-based QC steps that are typically employed.
Flowchart of the main quality control steps in a GWA study.
SNP call rate is a good indicator of genotype probe performance. Removing SNPs with a high proportion of missing genotypes is essential to control for false positives, as spurious associations can arise due to non-random missingness. Checking for gross departure from Hardy–Weinberg equilibrium (HWE) can help to identify SNPs with genotyping errors (e.g. an excess of heterozygotes). As clustering algorithms tend to perform less well for SNPs with low-frequency alleles, it is current practice in GWA studies to exclude rare SNPs from single-point analyses (these are underpowered to detect effects anyway). Genotype calling algorithms have the potential to make incorrect calls. Therefore, inspecting intensity plots, though not feasible on a genome-wide scale, is necessary for SNPs with interesting association signals.
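As an illustration of the SNP-based checks just described, the following sketch computes call rate, minor allele frequency and a one-degree-of-freedom chi-square HWE test for a single SNP. The genotype coding (0, 1, 2 copies of one allele, `None` for missing), the function names and the default thresholds are assumptions chosen for illustration; the text does not prescribe specific cut-offs, and exact HWE tests are often preferred in practice.

```python
import math


def hwe_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test for departure from Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def snp_qc(genotypes, min_call_rate=0.95, min_maf=0.01, min_hwe_p=1e-4):
    """Summarize one SNP; thresholds are illustrative placeholders."""
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    counts = [called.count(g) for g in (0, 1, 2)]
    freq_b = sum(called) / (2 * len(called)) if called else 0.0
    maf = min(freq_b, 1.0 - freq_b)
    hwe_p = hwe_pvalue(*counts)
    passed = (call_rate >= min_call_rate
              and maf >= min_maf
              and hwe_p >= min_hwe_p)
    return {"call_rate": call_rate, "maf": maf,
            "hwe_p": hwe_p, "passed": passed}
```

A SNP called as heterozygous in every sample, for example, would have a very small HWE P-value and be flagged for exclusion or for inspection of its intensity plot.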
Sample call rate is a good indicator of hybridization performance; high rates of missingness usually indicate low DNA quality or problematic arrays. Discrepancies in gender assignment (SNP data versus phenotype data) can help identify sample mix-ups. Excess genome-wide heterozygosity may indicate possible contamination leading to a larger proportion of heterozygous genotypes. Accidentally duplicated and related individuals in large-scale studies can be identified through identity-by-descent estimation given identity-by-state information in a relatively large homogeneous sample [29]. Typically, the sample with the lowest call rate from each pair of related individuals is removed. Finally, ethnic outliers can be detected and either removed or accounted for in downstream analyses.
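Two of the sample-based checks above, call rate and genome-wide heterozygosity, can be sketched as follows. The genotype coding (0, 1, 2, or `None` for missing), the function names and the rule of flagging samples more than three standard deviations from the mean heterozygosity are assumptions for illustration, not criteria taken from the text.

```python
from statistics import mean, pstdev


def sample_metrics(genotypes):
    """Return (call rate, heterozygosity rate) for one sample."""
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    # Heterozygous genotypes are coded as 1.
    het_rate = called.count(1) / len(called) if called else 0.0
    return call_rate, het_rate


def flag_heterozygosity_outliers(het_rates, n_sd=3.0):
    """Indices of samples whose heterozygosity deviates from the mean.

    The n_sd cut-off is an illustrative assumption.
    """
    mu = mean(het_rates)
    sd = pstdev(het_rates)
    if sd == 0:
        return []
    return [i for i, h in enumerate(het_rates) if abs(h - mu) > n_sd * sd]
```

Flagged samples with excess heterozygosity would then be candidates for the contamination check described above.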
Population stratification can be a major confounding factor in GWA studies, both for case/control designs and population-based quantitative analyses. If undetected, it can lead to false positive associations due to differences in allele frequency between the different populations [30]. To guard against it, most GWA scans attempt to match cases and controls for broad ethnic background from the outset and then rely on statistical approaches to detect population substructure and correct for it [29]. Genomic control (λ) is an estimate of the degree of inflation of the test statistics genome-wide and can serve as a crude correction factor [31]. Principal component analysis [32] and multidimensional scaling [29] are methods employed to identify individuals of different ethnic origin by projecting them onto axes of genetic variation, typically visualized in two dimensions. Inferred principal components can be included as covariates in association analyses.
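The genomic control factor mentioned above is commonly estimated as the median of the observed one-degree-of-freedom chi-square association statistics divided by the median expected under the null. A minimal sketch follows, assuming the input is a list of such statistics; the function names are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 df (about 0.455), obtained
# as the square of the 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(chi2_stats):
    """Estimate lambda as observed median over expected null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def apply_genomic_control(chi2_stats):
    """Deflate statistics by lambda; leave them unchanged if lambda < 1."""
    lam = max(genomic_control_lambda(chi2_stats), 1.0)
    return [x / lam for x in chi2_stats]
```

A value of λ close to 1 suggests little genome-wide inflation, whereas larger values point to stratification or other systematic bias.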
Directly typed SNPs in GWA studies are typically analysed by single-point methods, most frequently under the additive or multiplicative model. General models are less frequently tested as they increase dimensionality; dominant and recessive models are equally parsimonious but generally less powerful than the additive model. Multimarker tests (such as sliding haplotype window analyses) are less feasible at the genome-wide scale. However, imputation approaches have recently been developed to take into account information from multiple surrounding markers in order to infer genotypes at untyped loci [33]. Imputation therefore currently allows testing for association at >2.5 million markers genome-wide, thus maximizing information output from GWA studies, and additionally serves as an ideal tool for the combination of data from GWA scans that have been carried out on different platforms. The analysis of imputed data necessitates taking into account uncertainty by analysing the full genotype probability distribution appropriately.
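A single-point test under the additive model can be sketched as a trend test: the statistic N·r², where r is the correlation between allele count and case status, is the Cochran–Armitage trend statistic. The sketch below also converts imputed genotype probabilities into an expected allele dosage, which is one simple, approximate way of carrying imputation uncertainty into the test; it is a simplification and not necessarily the approach the imputation methods cited above recommend. All function names are hypothetical.

```python
import math


def expected_dosage(probs):
    """Expected count of allele B from (P(AA), P(AB), P(BB))."""
    return probs[1] + 2.0 * probs[2]


def trend_test(dosages, status):
    """Additive-model trend test; returns (chi-square, P-value).

    dosages: allele counts or expected dosages (0 to 2) per individual.
    status:  1 for cases, 0 for controls.
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_y = sum(status) / n
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, status))
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((y - mean_y) ** 2 for y in status)
    if sxx == 0 or syy == 0:
        return 0.0, 1.0  # monomorphic SNP or a single phenotype class
    chi2 = n * sxy ** 2 / (sxx * syy)  # N times squared correlation
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square, 1 df
    return chi2, p_value
```

For directly typed SNPs the dosages are simply the observed allele counts 0, 1 and 2.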
The sheer number of SNPs tested for association with disease raises important statistical considerations about type I error and statistical significance levels. To account for the inflation in false positives, a variety of approaches, such as the conservative Bonferroni correction and the less stringent control of the false discovery rate [34], have been proposed. Obtaining empirical P-values from hundreds of thousands or millions of permutations is an alternative but prohibitively computer-intensive way to assess statistical significance. To overcome the multiple testing problem, stringent genome-wide significance thresholds have been proposed: adjustment for 1–2 million independent tests at common variants genome-wide has resulted in the aforementioned generally accepted significance threshold of P = 5 × 10⁻⁸. In practice, most GWA studies prioritize signals for follow-up on the basis of their relative statistical strength for association and on evidence accrued from bioinformatics approaches. Replication in independent datasets (of the same variant, in the same direction, under the same model) constitutes the gold standard in genetic association studies of any scale.
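The two correction strategies named above can be sketched briefly. A Bonferroni adjustment of a 0.05 significance level for one million independent tests gives the 5 × 10⁻⁸ threshold, while the Benjamini–Hochberg step-up procedure is shown here as one common way of controlling the false discovery rate (the text does not specify which FDR procedure is meant). Function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, fdr=0.05):
    """Indices of tests declared significant by the step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose P-value lies under its threshold.
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


if __name__ == "__main__":
    print(bonferroni_threshold(0.05, 1_000_000))
```

Bonferroni controls the chance of any false positive and is therefore the more conservative of the two; FDR control tolerates a set proportion of false positives among the reported signals.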
T2D serves as a prime example of the success of the GWA scan approach. Over the past 2 years, multiple GWA scans have been published, greatly accelerating progress in identifying novel susceptibility variants for the disease [24]. This first wave of studies collectively raised the number of established T2D loci to 11.
Approaches aiming to identify complex trait susceptibility loci have recently also extended to the meta-analysis of diverse scans carried out for the same phenotype. This move in the field has been brought about by the realization that the effect sizes of the common variants that remain to be found are increasingly small. As Figure 1 attests, sample size is one of the most important factors in boosting power for an association study. Synergy across research groups, leading to the synthesis of GWA scan results, can greatly increase sample size and, hence, power to detect small individual effects. Several design and analytical challenges are associated with GWA scan meta-analysis (reviewed in [43]). These collaborative efforts have recently started to successfully extend the list of robustly replicating associations with complex traits [44–48]. For example, the Diabetes Genetics Initiative, Finland–United States Investigation of NIDDM and Wellcome Trust Case Control Consortium T2D scans undertook a three-way meta-analysis, which led to the identification of six novel susceptibility loci [44