The genome-wide imputation of genotypes has recently attracted significant attention given its broad applicability in the era of GWAS. It is likely that many disease variants have small effects, so that even today's large studies are underpowered to detect most of them. Therefore, combining data across multiple studies will be essential to uncovering the genetic complexity of common human diseases. Given this, one of the major uses of genotype imputation is combining data from studies that use different genotyping chips to facilitate the meta-analysis of multiple GWAS [5]. We explored three critical issues related to GWI: (1) the accuracy of the imputed genotypes, (2) the extent to which imputation increases the power to detect associations, and (3) the degree to which imputation increases the resolution of the association peak.
The first issue we explored is particularly relevant for combining datasets. We systematically assessed a number of factors with the potential to influence the accuracy of imputed genotypes. First, accuracy depends on SNP density: we observed that the Ilmn650Y was superior to earlier versions of arrays. Further, we expect the new Affymetrix SNP 6.0 to offer imputation performance similar to that of the Ilmn650Y because their genetic coverages are comparable. Hence, the price of the array could be the most important factor in choosing a genotyping platform, given that sample size has a profound impact on statistical power [3]. Second, the similarity of LD patterns between the study sample and the reference has a significant impact on accuracy, because untyped SNPs are imputed based on haplotypes seen in the reference population (e.g. HapMap samples). Our findings indicate the need to extend the HapMap project to additional ethnic groups. For example, extending HapMap to include data on Native Americans would be useful for GWI in Hispanic American subjects. Third, we found that high accuracy could not be achieved when the untyped SNPs were in weak LD with the assayed ones. For example, untyped Ilmn650Y SNPs were imputed less successfully from Ilmn317K genotypes than were randomly selected HapMap SNPs. This finding is expected on theoretical grounds and is consistent with simulation studies in which Pei et al found that all imputation methods performed better in strong LD regions than in weak LD regions [13]. In addition, we observed lower GWI accuracy in African Americans than in Caucasian Americans, because the genetic coverage of the SNP arrays was lower in African populations. This reflects the dependence of imputation accuracy on LD strength, given the relatively weak LD in African subjects [21]. Zhao et al recently reported that the IMPUTE software achieved substantially higher accuracy in Caucasian and Asian subjects compared to African subjects, a difference explained by their LD differences [21]. Arrays with even higher marker density are necessary to capture more genetic information in the genomes of African subjects, and such arrays will boost GWI performance in African American samples. Fourth, when studying GWI accuracy by randomly masking out SNPs, we also binned the masked SNPs into minor allele frequency (MAF) categories (Additional file 6) and examined whether MAF affected accuracy. At high QS values (e.g. QS ≥ 0.8), MAF had little impact. At low QS values, we saw significantly more errors for high frequency SNPs. These results are consistent with those of Pei et al, who found that MAF affected accuracy in low but not high LD regions [13]. Finally, population structure could affect GWI accuracy as well as bias downstream association tests. We found that eigenstrat and the Human Genome Diversity Project (HGDP) data were highly powered for detecting population admixture, as has been shown in previous reports [17]. More importantly, application of these resources to our study samples revealed their relationships to the 51 ethnic groups collected worldwide, guiding the appropriate choice of GWI reference. For example, HGDP elucidated the admixture of Native American and Caucasian genetic components in the Hispanic American samples. Therefore, including Native American data in the GWI reference panel would be critical to achieving high accuracy when imputing Hispanic American subjects. One caveat should be noted: while GWI can be successfully carried out in the presence of population admixture, such admixture could nevertheless lead to false-positive associations unless proper adjustments are made.
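The masking evaluation described above can be illustrated with a minimal sketch. It assumes that, for each masked SNP, the true and imputed genotypes are available as allele counts (0, 1, 2) together with the SNP's MAF and QS. The function names, the dictionary layout, and the MAF bin edges are hypothetical choices made for illustration and are not taken from the study or from any imputation software.

```python
from collections import defaultdict


def maf_bin(maf, edges=(0.05, 0.1, 0.2, 0.3, 0.5)):
    """Return a label for the MAF bin containing `maf` (bin edges are illustrative)."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie in [0, 0.5]")
    lower = 0.0
    for upper in edges:
        if maf <= upper:
            return f"({lower:.2f}, {upper:.2f}]"
        lower = upper


def concordance_by_bin(snps, qs_cutoff):
    """Fraction of masked genotypes imputed correctly, per MAF bin.

    Each SNP is a dict with keys 'maf', 'qs', 'true' and 'imputed'
    (hypothetical layout); 'true' and 'imputed' are equal-length lists of
    genotype codes. SNPs with QS below `qs_cutoff` are excluded.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for snp in snps:
        if snp["qs"] < qs_cutoff:
            continue
        label = maf_bin(snp["maf"])
        for true_gt, imputed_gt in zip(snp["true"], snp["imputed"]):
            total[label] += 1
            correct[label] += int(true_gt == imputed_gt)
    return {label: correct[label] / total[label] for label in total}
```

Calling `concordance_by_bin` with several QS cutoffs on the same masked SNPs shows how the accuracy within each MAF bin changes as less confidently imputed SNPs are filtered out.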
The second issue we explored in the context of GWI was statistical power, one of the most critical issues in genetic studies involving complex traits. Whether GWI genotypes provide extra power in a GWAS setting has been studied via simulation [4]. However, such studies make a number of modeling assumptions that may or may not be true in practice. By leveraging the ~40,000 expression phenotypes measured in the liver gene expression cohort [15], we were able to assess statistical power empirically. Incorporating GWI provided a 5.5% increase in power with respect to the Ilmn317K array. This increase in power is likely due to the incomplete coverage of the Ilmn317K array, so that GWI is able to extract moderately more information from the genome. Similar results were obtained for the Affx500K array [5]. In contrast, the power gain from GWI is only 3.3% over that achieved by the Ilmn650Y array, given the already high genetic coverage of this array. Taken together, the power increase achieved by imputing genotypes is not more dramatic because the SNP arrays considered in this study are already quite dense. In addition, imputation introduces more tests, resulting in an increase in the multiple testing penalty. Interestingly, the Ilmn317K SNPs + imputed SNPs provided lower power than the Ilmn650Y SNPs to detect cis eQTLs in the deLiver cohort (Figure ), indicating that the regions poorly covered by the Ilmn317K SNPs cannot be recovered by imputation. That is, the latest high density arrays cannot simply be replaced by GWI, even for studies on Caucasians. These observations are consistent with previous reports that there was no substantial gain in power from genotyping all common SNPs compared to genotyping only those SNPs represented on the Ilmn650Y array [3]. The Ilmn650Y and the Affymetrix SNP 6.0 are both tag SNP arrays designed using the HapMap data (270 individuals and about 2.5 million common SNPs). Hence, GWI using HapMap data as a reference will not provide much additional information. Given that 7.5 million common SNPs exist in the human genome [1], it is essential to generate a reference SNP panel that goes beyond HapMap (e.g., incorporating novel SNPs and recruiting a greater diversity of individuals) if we hope to significantly increase the power gains that can be achieved by GWI in GWAS.
Finally, we found that GWI provided a denser association map with superior resolution, enhancing our ability to define the boundaries of the association peak and infer the true causal variants. While the majority of cis eQTLs could be identified using only assayed SNPs, we found that imputation yielded more significant p-values for 43.2% of the cis eQTLs. More importantly, incorporating GWI shifted the QTL peaks (i.e., the smallest p-value in the QTL) closer to the structural genes. For example, the TSS eQTLs whose significance improved by at least one order of magnitude with imputation moved considerably closer to the TSS (median distance shifted from -19.4 Kb to +1.6 Kb with imputation). The strongest eQTLs tend to cluster near the genes' TSS and TES regions, forming a bimodal distribution (Figure ). These observations match our current understanding that transcription initiation at the TSS of a gene is among the most important determinants of transcript levels, and they support a growing number of observations from the ENCODE project and others that many transcription factors bind near the TES of a gene. In addition, miRNAs are known to affect transcript stability and often bind transcript regions that are near the TES. Given the above considerations, we believe the shift of QTL peaks when incorporating GWI indicates that the association hits are more proximal to the causal variants.
Multiple testing is a critical issue for GWAS with or without incorporating GWI genotypes. In this paper, we did not focus on the SNP-trait association p-value, because it requires rigorous correction for multiple testing. Instead, we derived the FDR empirically, which addresses multiple testing and allows direct comparison of the number of discoveries (i.e., relative statistical power) among the various SNP panels. An alternative is the Bayes factor, which measures the impact of the data on the support for H0 in preference to Ha. It has been used in an eQTL study by Veyrieras et al [26]. The interpretation of a Bayes factor obviates the need for an adjustment for multiple comparisons. The frequentist and Bayesian approaches have been compared on simulated data [12], where the two strategies performed similarly at low FDRs (e.g. 10%). However, it is difficult to compute the FDR using Bayes factors on real data, where the truth is unknown. In contrast, the FDR is straightforward to derive using frequentist methods. As discussed above, the FDR provides a path to empirically assess statistical power, and so we chose the frequentist approach for the analyses carried out herein.
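One common way to derive an FDR empirically is by permutation, and the sketch below illustrates that idea. It is an assumption made for illustration: the text above does not specify the estimator, and the function names and inputs are hypothetical. The FDR at a p-value threshold is estimated as the mean number of tests passing the threshold in permuted (null) data divided by the number passing in the observed data.

```python
def empirical_fdr(observed_pvalues, permuted_pvalue_sets, threshold):
    """Estimate the FDR at `threshold` from permutation results.

    `permuted_pvalue_sets` holds one list of p-values per permutation
    (assumed permutation scheme, e.g. shuffled sample labels).
    """
    n_observed = sum(p <= threshold for p in observed_pvalues)
    if n_observed == 0:
        return 0.0
    null_counts = [sum(p <= threshold for p in perm) for perm in permuted_pvalue_sets]
    mean_null = sum(null_counts) / len(null_counts)
    # The ratio can exceed 1 when the null yields more hits than the data.
    return min(1.0, mean_null / n_observed)


def discoveries_at_fdr(observed_pvalues, permuted_pvalue_sets, target_fdr):
    """Largest number of discoveries whose estimated FDR is at most `target_fdr`."""
    best = 0
    for threshold in sorted(set(observed_pvalues)):
        fdr = empirical_fdr(observed_pvalues, permuted_pvalue_sets, threshold)
        if fdr <= target_fdr:
            n_observed = sum(p <= threshold for p in observed_pvalues)
            best = max(best, n_observed)
    return best
```

Under these assumptions, comparing `discoveries_at_fdr` between two SNP panels at the same target FDR gives the kind of relative power comparison discussed above.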
Clearly, GWI is very accurate when based on genotypes from today's high density arrays. Previously we proposed methods to incorporate genotype uncertainty in association tests [27], and found that low genotype error rates (e.g. 2%) had almost no impact on power or on point estimation of effect size. Therefore, conventional test methods might be sufficient. The MACH algorithm outputs the QS for each SNP, which provides a path to control the imputation uncertainty. As shown in Additional file 4, we chose different QS cutoffs to filter out less accurately imputed SNPs and found that statistical power was not sensitive to the filtering. In summary, we found that the Ilmn650Y and the Affx500K + custom array could impute the entire HapMap set accurately, at least among Caucasians. This is encouraging news for researchers seeking to merge data generated on different platforms. Our results may also serve as a guide for choosing an array type for a given study. Because sample size has a more profound impact on GWAS statistical power than the genetic coverage of the SNP array [3], genotyping more subjects using cheaper arrays will provide significantly more power to detect associations between SNPs and traits of interest.