|Home | About | Journals | Submit | Contact Us | Français|
A major challenge in genome-wide association studies (GWASs) is to derive the multiple testing threshold when hypothesis tests are conducted using a large number of single nucleotide polymorphisms. Permutation tests are considered the gold standard in multiple testing adjustment in genetic association studies. However, it is computationally intensive, especially for GWASs, and can be impractical if a large number of random shuffles are used to ensure accuracy. Many researchers have developed approximation algorithms to relieve the computing burden imposed by permutation. One particularly attractive alternative to permutation is to calculate the effective number of independent tests, Meff, which has been shown to be promising in genetic association studies. In this study, we compare recently developed Meff methods and validate them by the permutation test with 10,000 random shuffles using two real GWAS data sets: an Illumina 1M BeadChip and an Affymetrix GeneChip® Human Mapping 500K Array Set. Our results show that the simpleM method produces the best approximation of the permutation threshold, and it does so in the shortest amount of time. We also show that Meff is indeed valid on a genome-wide scale in these data sets based on statistical theory and significance tests. The significance thresholds derived can provide practical guidelines for other studies using similar population samples and genotyping platforms.
With the advent of high throughput genotyping technology, genome-wide association studies (GWASs) are becoming more common. One of the major challenges in GWASs is to derive the multiple testing correction threshold. There are two simple methods in practice: Bonferroni and Šidák corrections[Šidák 1967]. These methods try to keep the experimental-wise error rate (αe) at a nominal level, e.g. 0.05, by adjusting the point-wise error rate (αp). Suppose that there are M independent tests, Šidák correction gives αp = 1−(1−αe)1/M and Bonferroni correction gives αp = αe/M. These two methods are very closely related and lead to similar results. Unfortunately, they can be conservative if the independence assumption does not hold, and this is usually the case for densely typed single nucleotide polymorphisms (SNPs) in current genetic association studies.
In genetic association studies, researchers treat permutation tests as the gold standard for multiple correction adjustment. For the permutation tests of population-based studies, phenotypes are randomly shuffled while the SNP genotypes are kept the same [Churchill and Doerge 1994]. Hence, the correlation structure among SNPs is preserved and the phenotype is permuted to simulate the null hypothesis of no association. However, permutation tests are hindered by the large number of random shuffles required, which take considerable computing time and resources. If a large number of SNPs, e.g. 1M or 500K, are involved in the genetic association study, permutation tests become impractical for GWASs because it can take years to finish on a single modern PC if a substantial number of random shuffles are carried out [Conneely and Boehnke 2007; Gao, et al. 2008]. Reducing the number of random shuffles to make the computation feasible results in thresholds that suffer from high levels of variation. If we increase the number of random shuffles, the computing time increases tremendously. Therefore, less computationally demanding methods that can approximate permutation tests with a large number of shuffles are desirable.
One particularly attractive alternative to permutation tests is to calculate the effective number of independent tests, Meff [Cheverud 2001; Gao, et al. 2008; Li and Ji 2005; Moskvina and Schmidt 2008; Nyholt 2004; Nyholt 2005]. Despite criticism from Salyakina et al. , the validity of Meff has been shown by many researchers. The idea of Meff-based approaches is simple: “filter out” the correlation among tests, leaving only the effective number of independent ones, Meff, and then use the Bonferroni or the Šidák correction by replacing M with Meff in the corresponding formula.
Meff-based corrections have been shown to be effective and fast. The Meff approach was first proposed in genetic studies by Cheverud  and later refined by Nyholt . Later, it was found that Nyholt’s approach can be conservative when haplotype block structure exists [Salyakina, et al. 2005]. Salyakina et al.  also reported some anti-conservative behavior for Nyholt’s approach for data with strong linkage disequilibrium (LD) though these observations have not been seen by the other investigators. Li and Ji  proposed another Meff estimation formula and showed an improvement in power over Nyholt’s approach but at the expense of the type I error rate. Cheverud’s, Nyholt’s and Li and Ji’s Meff methods, however, were validated using only a small number of SNPs, and it remained to be seen how well they performed using a large number of correlated SNPs. Recently, Gao et al.  proposed to a principal component approach to derive Meff, named simpleM, and showed that it provided a better estimate than Li and Ji’s method in the presence of a large number of SNPs. Moskvina and Schmidt  derived a new Meff estimation formula, calling it Keff, and validated it using a GWAS data set.
Gao et al’s simpleM method performed accurately and efficiently when a large number of SNPs were involved, even under different LD situations and showed potential for GWASs [Gao, et al. 2008]. It uses composite LD among SNPs to capture the correlation and derives the Meff using the number of principal components that contribute to 99.5% of variation. When a large number of SNPs are used, they need to be partitioned into smaller blocks in order to calculate eigenvalues effectively [Gao, et al. 2008]. The authors tested two partition methods, fixed length and haplotype blocks inferred from HaploView software [Barrett, et al. 2005]. Partitions based on haplotype blocks gave slightly better Meff estimations than fixed length partitions. However, computing haplotype blocks is slow and impractical on the genome-wide scale. Moskvina and Schmidt’s Keff algorithm estimates the overall type I error probability using pair-wise correlations among SNPs[Moskvina and Schmidt 2008]. However, it can be very time consuming to derive the full correlation matrix among SNPs and the authors developed an approximation method by using only small window sizes, e.g. 20, for the correlation matrix. The Keff approach was shown to be able to give a relatively accurate though slightly conservative estimation when tested using a GWAS data set.
In this study, we evaluate the simpleM (available at https://dsgweb.wustl.edu/rgao/) as tested in the original paper [Gao, et al. 2008], and the Keff algorithm (available at http://x001.psycm.uwcm.ac.uk/) as tested in the original paper [Moskvina and Schmidt 2008], using two real GWAS data sets: an Illumina 1M BeadChip and an Affymetrix GeneChip® Human Mapping 500K SNP data set. We also show that Meff is indeed valid on a genome-wide scale in these data sets based on statistical theory and significance tests.
In our evaluation of the permutation approximation algorithm, we first used the Illumina 1M BeadChip data, from our genetic study of aspirin responsiveness (GeneSTAR) project (https://dsgweb.wustl.edu/genestar/). In total, 1,004,037 SNPs were genotyped. For the multiple testing evaluations, we only used common SNPs with a minor allele frequency greater than or equal to 0.05, reducing the set to 778,629 SNPs. 656 unrelated white individuals were used. The missing value rate among these common SNPs was about 0.26%. We used the K-nearest neighbor (KNN) approach [Hastie, et al. 2001] to fill in missing genotypes. Other imputation algorithms, e.g. Mach [Li and Abecasis 2006], IMPUTE [Marchini, et al. 2007], Beagle [Browning and Browning 2007] and BimBam [Servin and Stephens 2007], should work well too, but can take more computing time. For the second data set, we used the Affymetrix GeneChip® Human Mapping 500K Array Set, from the Framingham Heart Study [Splansky, et al. 2007] (http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gap). In total, there are 485,424 SNPs in the database. We also only used common SNPs, reducing the set to 383,213 SNPs. 698 unrelated white individuals were used. The missing value rate was about 1.4% among these common SNPs and we used the KNN approach to fill in missing genotypes.
We compared the simpleM and the Keff methods to permutation and Bonferroni corrections. We conducted permutation tests with 10,000 random shuffles using the Cochran-Armitage trend test[Armitage 1955; Cochran 1954]. We set experiment-wise error rates to be 0.05. Details of permutation tests have been described previously[Gao, et al. 2008]. Results for each chromosome (chromosome-wide αe = 0.05) and genome-wide (genome-wide αe = 0.05) are shown in Table 1 for the Illumina 1M data and Table 2 for the Affymetrix 500K data. For each chromosome, we calculated the permutation threshold and derived the αps for the simpleM, Keff and Bonferroni methods by setting chromosome-wide αes to 0.05. For the simpleM and Keff methods, we inferred the effective number of independent tests first and then applied the Šidák formula to derive the corresponding αps (see the Table 1 and and2).2). The simpleM method gives the best approximation (the smallest distance) to the permutation thresholds, while the Keff method gives more conservative estimations, and Bonferroni gives the most conservative results. We also derived the thresholds for the whole genome by setting genome-wide αes to 0.05. In Table 1 (the Illumina 1M data), the genome-wide permutation threshold is 1.36e–7 and the corresponding estimated Meff is log(1-0.05)/log(1–1.355645e–7) = 378368. The simpleM, Keff and Bonferroni methods give the thresholds as 1.24e–7, 9.70e–8 and 6.42e–8, respectively. Again, we see that Bonferroni is very conservative, Keff is better and simpleM gives the best approximation. For the genome-wide estimation of the Meff, the simpleM and Keff estimate 34992 and 150656 more tests than the permutation, respectively. The Keff estimates 115664 more tests than the simpleM method, and hence is more conservative. In Table 2 (the Affymetrix 500K data), the permutation threshold is 2.46e–7 and the corresponding estimated Meff is log(1-0.05)/log(1–2.46119e–7) = 208408. The simpleM, Keff and Bonferroni methods give the thresholds as 2.23e–7, 1.95e–7 and 1.30e–7, respectively. In order to see their difference clearly, we can treat the permutation threshold as the “true” baseline. The simpleM, Keff and Bonferroni thresholds are 91%, 79% and 53% of the “true” value. Therefore, we continue to see that Bonferroni is conservative, Keff is better and simpleM gives the best approximation. For the genome-wide estimation of the Meff, the simpleM and Keff estimate 21352 and 54306 more tests than the permutation, respectively. The Keff estimates 32954 more tests than the simpleM method, and thus is more conservative.
With the empirical distribution from permutation available, we can also derive the corresponding type I error rate for each correction threshold. For the Illumina 1M BeadChip, the type I error rate is 0.05 for the genome wide permutation threshold, 1.36e–7, since we set αe = 0.05. The genome-wide type I error rate for the simpleM method is 0.045 because 450 permutation tests out of 10,000 have p-values less than the derived threshold, 1.24e–7. Similarly, the Keff and Bonferroni genome-wide thresholds, 9.70e-8 and 6.42e–8, correspond to type I error rates of 0.035 and 0.022, respectively. Again, we see that the simpleM method gives the best approximation to the desirable type I error rate, 0.05. For the Affymetrix 500K data set, the type I error rate is 0.05 for the genome-wide permutation threshold, 2.46e–7. Similarly, the simpleM, Keff and Bonferroni genome-wide thresholds, 2.23e–7, 1.95e–7 and 1.30e–7, correspond to type I error rates of 0.046, 0.040 and 0.026, respectively. Again, we see that the simpleM method gives the best approximation to the desirable type I error rate, 0.05. We also see that the Bonferroni correction produces a more conservative type I error rate, 0.026 (the 500K data) vs. 0.022 (the 1M data), when the LD among SNP increases, while the simpleM method gives consistent type I error. As the SNP density goes higher and higher, e.g. some imputed datasets can contain up to 2.5M SNPs, the advantage of simpleM over Bonferroni would be more apparent.
To demonstrate the computing time required for each method, we will use the Illumina 1M BeadChip chromosome 1 data as an example. There were 63560 common SNPs in chromosome 1. Permutation tests with 10,000 random shuffles took more than 200 hours on a personal computer (PC) (Intel® Core™2 Duo 2.6G CPU with 3GB memory). Calculating Keff, with a window size of 20 (w = 20), took about 18 minutes. We also calculated Keff using different window sizes, because bigger window sizes should result in better estimations [Moskvina and Schmidt 2008]. It took about 48 minutes to calculate Keff when we used a window size of 50. The Keff = 41155 (w = 50) estimation did not significantly improve over Keff = 41438 (w = 20) even though the computing time more than doubled. Therefore, we did not try larger window sizes for Keff. In comparison, it took less than one minute for the simpleM method to infer Meff for the chromosome 1 data. If we extended these calculations on a single PC for the entire genome for the Illumina 1M BeadChip, it would take about 15 weeks for the permutation method with 10K shuffles, about 221 minutes for the Keff and only about 13 minutes for the simpleM.
It is known that the minimum P value in each permutation should follow a beta(1, M) distribution if the M markers are statistically independent [Casella and Berger, 2001]. This allows us to verify the consistency between observed minimum p-values with their theoretical distribution. Hence, we fitted a Beta distribution to the observed 10,000 minimum p-values from the permutation tests for each GWAS data set and estimated the two beta parameters by maximum likelihood estimation (MLE) using SAS version 9.1.3 (SAS Institute, Cary, NC, USA). For the Illumina 1M BeadChip, the MLEs for the two beta parameters, α and β, are = 1.02 and =413549.7. If we fix α=1 and only estimate β, then the MLE =404001, which is in line with the Meff suggested by the permutation tests, 378368. Goodness-of-fit tests based on empirical distribution function (EDF) statistics did not reject the null hypothesis (the data follow a specified distribution) at the 0.25 significance level, which suggests that the data (minimum p-values) follow the specified beta distribution. The p-values are >0.25, >0.25 and >0.25 for the EDF tests: Kolmogorov-Smirnow, Cramér-von Mises and Anderson-Darling statistic, respectively. The quantile-quantile plot for minimum p-values from the permutation tests is shown in Figure 1. From the plot, we see that the empirical distribution and theoretical distribution agree with each other and the data points fall on the diagonal line, indicating that the minimum p-values follow the beta distribution very well. In Table 1, we also find that the genome-wide Meff estimation from the simpleM, 413360, gives a better (closer) approximation to the MLE estimate and the permutation suggested Meff than the Keff estimate.
For the Affymetrix 500K data, we saw a pattern similar to the Illumina data. The MLEs for the two beta parameters, α and β, are = 1.03 and = 222043.9. The MLE = 214986.2 if we fix α=1, which is consistent with the Meff suggested by the permutation tests, 208408. Goodness-of-fit tests based on EDF statistics did not reject the null hypothesis at the 0.1 significance level, which suggests that the data follow the specified beta distribution. The p-values are 0.19, 0.13 and 0.13 for the EDF tests: Kolmogorov-Smirnow, Cramér-von Mises and Anderson-Darling statistic, respectively. The quantile-quantile plot for minimum p-values from the permutation tests is shown in Figure 2. From the plot, we see that the empirical distribution and theoretical distribution agree with each other, indicating that the minimum p-values follow the beta distribution very well. In Table 2, we find that the genome-wide Meff estimation from simpleM, 229760, gives a better approximation to the MLE estimate and the permutation suggested Meff than the Keff estimate. Therefore, the effective number of independent tests is indeed valid for these data sets based on statistical theory and significance tests and is a reasonable approach in multiple testing corrections for GWASs.
Many other multiple testing correction methods exist. Lin  proposed using a Monte Carlo approach to approximate the joint distribution of test statistics and then using a Monte Carlo distribution to adjust p-values. Seaman and Müller-Myhsok  proposed a direct simulation approach that approximates p-values from permutation tests. Both methods avoid permutation by simulating test statistics directly from the asymptotic distribution. Conneely and Boehnke  developed a p-value adjustment method through numerical integration assuming an asymptotic multivariate normal distribution for the test statistics and achieved even greater speed compared to simulation approaches. However, the numerical integration can only handle up to 1,000 dimensions at present. Furthermore, these methods can take much longer to compute than the simpleM method. Conneely and Boehnke  estimated that it would take several hours for a 400K genome-wide scale, and for the Affymetrix 500K data, the simpleM method only took about 7 minutes. Moreover, the simpleM method is a non-parametric method that does not rely on the joint distribution of test statistics to be known.
With the increasing popularity of GWASs, the simpleM algorithm can serve as a fast approximation to the computationally intensive permutation procedure. It gives a highly accurate estimation of the permutation threshold for multiple testing correction while greatly reducing the computing time and resources required. We validated the method using each individual autosomal chromosome and together in a genome-wide format. We believe the simpleM method, an effective number of independent tests approach, can be useful to current large-scale genetic association studies for quickly estimating the permutation threshold. Moreover, the multiple testing thresholds derived in this study can provide practical guidelines for other GWASs using similar population samples, genotyping platforms and quality control protocols.
We thank Drs. Ingrid Borecki and Aldi Kraja for help in gaining access to the data sets analyzed. This work was supported in part by NIH grants 5R01HL08770002, 1R01HL08768801, 5R01HL087698, HL088655, HL087700, HL088215 and AG023746 and NIEHS T32 ES007126.