With the advent of high throughput genotyping technology, genome-wide association studies (GWASs) are becoming more common. One of the major challenges in GWASs is to derive the multiple testing correction threshold. There are two simple methods in practice: Bonferroni and Šidák corrections[
Šidák 1967]. These methods try to keep the experimental-wise error rate (α
e) at a nominal level, e.g. 0.05, by adjusting the point-wise error rate (α
p). Suppose that there are M independent tests, Šidák correction gives α
p = 1−(1−α
e)
1/M and Bonferroni correction gives α
p = α
e/M. These two methods are very closely related and lead to similar results. Unfortunately, they can be conservative if the independence assumption does not hold, and this is usually the case for densely typed single nucleotide polymorphisms (SNPs) in current genetic association studies.
In genetic association studies, researchers treat permutation tests as the gold standard for multiple correction adjustment. For the permutation tests of population-based studies, phenotypes are randomly shuffled while the SNP genotypes are kept the same [
Churchill and Doerge 1994]. Hence, the correlation structure among SNPs is preserved and the phenotype is permuted to simulate the null hypothesis of no association. However, permutation tests are hindered by the large number of random shuffles required, which take considerable computing time and resources. If a large number of SNPs, e.g. 1M or 500K, are involved in the genetic association study, permutation tests become impractical for GWASs because it can take years to finish on a single modern PC if a substantial number of random shuffles are carried out [
Conneely and Boehnke 2007;
Gao, et al. 2008]. Reducing the number of random shuffles to make the computation feasible results in thresholds that suffer from high levels of variation. If we increase the number of random shuffles, the computing time increases tremendously. Therefore, less computationally demanding methods that can approximate permutation tests with a large number of shuffles are desirable.
One particularly attractive alternative to permutation tests is to calculate the effective number of independent tests, M
eff [
Cheverud 2001;
Gao, et al. 2008;
Li and Ji 2005;
Moskvina and Schmidt 2008;
Nyholt 2004;
Nyholt 2005]. Despite criticism from
Salyakina et al. [2005], the validity of M
eff has been shown by many researchers. The idea of M
eff-based approaches is simple: “filter out” the correlation among tests, leaving only the effective number of independent ones, M
eff, and then use the Bonferroni or the Šidák correction by replacing M with M
eff in the corresponding formula.
M
eff-based corrections have been shown to be effective and fast. The M
eff approach was first proposed in genetic studies by
Cheverud [2001] and later refined by
Nyholt [2004]. Later, it was found that Nyholt’s approach can be conservative when haplotype block structure exists [
Salyakina, et al. 2005].
Salyakina et al. [2005] also reported some anti-conservative behavior for Nyholt’s approach for data with strong linkage disequilibrium (LD) though these observations have not been seen by the other investigators.
Li and Ji [2005] proposed another M
eff estimation formula and showed an improvement in power over Nyholt’s approach but at the expense of the type I error rate. Cheverud’s, Nyholt’s and Li and Ji’s M
eff methods, however, were validated using only a small number of SNPs, and it remained to be seen how well they performed using a large number of correlated SNPs. Recently,
Gao et al. [2008] proposed to a principal component approach to derive M
eff, named simpleM, and showed that it provided a better estimate than Li and Ji’s method in the presence of a large number of SNPs.
Moskvina and Schmidt [2008] derived a new M
eff estimation formula, calling it K
eff, and validated it using a GWAS data set.
Gao et al’s simpleM method performed accurately and efficiently when a large number of SNPs were involved, even under different LD situations and showed potential for GWASs [
Gao, et al. 2008]. It uses composite LD among SNPs to capture the correlation and derives the M
eff using the number of principal components that contribute to 99.5% of variation. When a large number of SNPs are used, they need to be partitioned into smaller blocks in order to calculate eigenvalues effectively [
Gao, et al. 2008]. The authors tested two partition methods, fixed length and haplotype blocks inferred from HaploView software [
Barrett, et al. 2005]. Partitions based on haplotype blocks gave slightly better M
eff estimations than fixed length partitions. However, computing haplotype blocks is slow and impractical on the genome-wide scale. Moskvina and Schmidt’s K
eff algorithm estimates the overall type I error probability using pair-wise correlations among SNPs[
Moskvina and Schmidt 2008]. However, it can be very time consuming to derive the full correlation matrix among SNPs and the authors developed an approximation method by using only small window sizes, e.g. 20, for the correlation matrix. The K
eff approach was shown to be able to give a relatively accurate though slightly conservative estimation when tested using a GWAS data set.
In this study, we evaluate the simpleM (available at
https://dsgweb.wustl.edu/rgao/) as tested in the original paper [
Gao, et al. 2008], and the K
eff algorithm (available at
http://x001.psycm.uwcm.ac.uk/) as tested in the original paper [
Moskvina and Schmidt 2008], using two real GWAS data sets: an Illumina 1M BeadChip and an Affymetrix GeneChip® Human Mapping 500K SNP data set. We also show that M
eff is indeed valid on a genome-wide scale in these data sets based on statistical theory and significance tests.
In our evaluation of the permutation approximation algorithm, we first used the Illumina 1M BeadChip data, from our genetic study of aspirin responsiveness (GeneSTAR) project (
https://dsgweb.wustl.edu/genestar/). In total, 1,004,037 SNPs were genotyped. For the multiple testing evaluations, we only used common SNPs with a minor allele frequency greater than or equal to 0.05, reducing the set to 778,629 SNPs. 656 unrelated white individuals were used. The missing value rate among these common SNPs was about 0.26%. We used the K-nearest neighbor (KNN) approach [
Hastie, et al. 2001] to fill in missing genotypes. Other imputation algorithms, e.g. Mach [
Li and Abecasis 2006], IMPUTE [
Marchini, et al. 2007], Beagle [
Browning and Browning 2007] and BimBam [
Servin and Stephens 2007], should work well too, but can take more computing time. For the second data set, we used the Affymetrix GeneChip® Human Mapping 500K Array Set, from the Framingham Heart Study [
Splansky, et al. 2007] (
http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gap). In total, there are 485,424 SNPs in the database. We also only used common SNPs, reducing the set to 383,213 SNPs. 698 unrelated white individuals were used. The missing value rate was about 1.4% among these common SNPs and we used the KNN approach to fill in missing genotypes.
We compared the simpleM and the K
eff methods to permutation and Bonferroni corrections. We conducted permutation tests with 10,000 random shuffles using the Cochran-Armitage trend test[
Armitage 1955;
Cochran 1954]. We set experiment-wise error rates to be 0.05. Details of permutation tests have been described previously[
Gao, et al. 2008]. Results for each chromosome (chromosome-wide α
e = 0.05) and genome-wide (genome-wide α
e = 0.05) are shown in for the Illumina 1M data and for the Affymetrix 500K data. For each chromosome, we calculated the permutation threshold and derived the α
ps for the simpleM, K
eff and Bonferroni methods by setting chromosome-wide α
es to 0.05. For the simpleM and K
eff methods, we inferred the effective number of independent tests first and then applied the Šidák formula to derive the corresponding α
ps (see the and ). The simpleM method gives the best approximation (the smallest distance) to the permutation thresholds, while the K
eff method gives more conservative estimations, and Bonferroni gives the most conservative results. We also derived the thresholds for the whole genome by setting genome-wide α
es to 0.05. In (the Illumina 1M data), the genome-wide permutation threshold is 1.36e–7 and the corresponding estimated M
eff is log(1-0.05)/log(1–1.355645e–7) = 378368. The simpleM, K
eff and Bonferroni methods give the thresholds as 1.24e–7, 9.70e–8 and 6.42e–8, respectively. Again, we see that Bonferroni is very conservative, K
eff is better and simpleM gives the best approximation. For the genome-wide estimation of the M
eff, the simpleM and K
eff estimate 34992 and 150656 more tests than the permutation, respectively. The K
eff estimates 115664 more tests than the simpleM method, and hence is more conservative. In (the Affymetrix 500K data), the permutation threshold is 2.46e–7 and the corresponding estimated M
eff is log(1-0.05)/log(1–2.46119e–7) = 208408. The simpleM, K
eff and Bonferroni methods give the thresholds as 2.23e–7, 1.95e–7 and 1.30e–7, respectively. In order to see their difference clearly, we can treat the permutation threshold as the “true” baseline. The simpleM, K
eff and Bonferroni thresholds are 91%, 79% and 53% of the “true” value. Therefore, we continue to see that Bonferroni is conservative, K
eff is better and simpleM gives the best approximation. For the genome-wide estimation of the M
eff, the simpleM and K
eff estimate 21352 and 54306 more tests than the permutation, respectively. The K
eff estimates 32954 more tests than the simpleM method, and thus is more conservative.
| Table 1Derived significance thresholds for the Illumina 1M data. |
| Table 2Derived significance thresholds for the Affymetrix 500K data. |
With the empirical distribution from permutation available, we can also derive the corresponding type I error rate for each correction threshold. For the Illumina 1M BeadChip, the type I error rate is 0.05 for the genome wide permutation threshold, 1.36e–7, since we set αe = 0.05. The genome-wide type I error rate for the simpleM method is 0.045 because 450 permutation tests out of 10,000 have p-values less than the derived threshold, 1.24e–7. Similarly, the Keff and Bonferroni genome-wide thresholds, 9.70e-8 and 6.42e–8, correspond to type I error rates of 0.035 and 0.022, respectively. Again, we see that the simpleM method gives the best approximation to the desirable type I error rate, 0.05. For the Affymetrix 500K data set, the type I error rate is 0.05 for the genome-wide permutation threshold, 2.46e–7. Similarly, the simpleM, Keff and Bonferroni genome-wide thresholds, 2.23e–7, 1.95e–7 and 1.30e–7, correspond to type I error rates of 0.046, 0.040 and 0.026, respectively. Again, we see that the simpleM method gives the best approximation to the desirable type I error rate, 0.05. We also see that the Bonferroni correction produces a more conservative type I error rate, 0.026 (the 500K data) vs. 0.022 (the 1M data), when the LD among SNP increases, while the simpleM method gives consistent type I error. As the SNP density goes higher and higher, e.g. some imputed datasets can contain up to 2.5M SNPs, the advantage of simpleM over Bonferroni would be more apparent.
To demonstrate the computing time required for each method, we will use the Illumina 1M BeadChip chromosome 1 data as an example. There were 63560 common SNPs in chromosome 1. Permutation tests with 10,000 random shuffles took more than 200 hours on a personal computer (PC) (Intel® Core™2 Duo 2.6G CPU with 3GB memory). Calculating K
eff, with a window size of 20 (w = 20), took about 18 minutes. We also calculated K
eff using different window sizes, because bigger window sizes should result in better estimations [
Moskvina and Schmidt 2008]. It took about 48 minutes to calculate K
eff when we used a window size of 50. The K
eff = 41155 (w = 50) estimation did not significantly improve over K
eff = 41438 (w = 20) even though the computing time more than doubled. Therefore, we did not try larger window sizes for K
eff. In comparison, it took less than one minute for the simpleM method to infer M
eff for the chromosome 1 data. If we extended these calculations on a single PC for the entire genome for the Illumina 1M BeadChip, it would take about 15 weeks for the permutation method with 10K shuffles, about 221 minutes for the K
eff and only about 13 minutes for the simpleM.
It is known that the minimum P value in each permutation should follow a beta(1, M) distribution if the M markers are statistically independent [
Casella and Berger, 2001]. This allows us to verify the consistency between observed minimum p-values with their theoretical distribution. Hence, we fitted a Beta distribution to the observed 10,000 minimum p-values from the permutation tests for each GWAS data set and estimated the two beta parameters by maximum likelihood estimation (MLE) using SAS version 9.1.3 (SAS Institute, Cary, NC, USA). For the Illumina 1M BeadChip, the MLEs for the two beta parameters, α and β, are
![[alpha]](/corehtml/pmc/pmcents/agrcirc.gif)
= 1.02 and
![[beta]](/corehtml/pmc/pmcents/bgrcirc.gif)
=413549.7. If we fix α=1 and only estimate β, then the MLE
![[beta]](/corehtml/pmc/pmcents/bgrcirc.gif)
=404001, which is in line with the M
eff suggested by the permutation tests, 378368. Goodness-of-fit tests based on empirical distribution function (EDF) statistics did not reject the null hypothesis (the data follow a specified distribution) at the 0.25 significance level, which suggests that the data (minimum p-values) follow the specified beta distribution. The p-values are >0.25, >0.25 and >0.25 for the EDF tests: Kolmogorov-Smirnow, Cramér-von Mises and Anderson-Darling statistic, respectively. The quantile-quantile plot for minimum p-values from the permutation tests is shown in . From the plot, we see that the empirical distribution and theoretical distribution agree with each other and the data points fall on the diagonal line, indicating that the minimum p-values follow the beta distribution very well. In , we also find that the genome-wide M
eff estimation from the simpleM, 413360, gives a better (closer) approximation to the MLE estimate
![[beta]](/corehtml/pmc/pmcents/bgrcirc.gif)
and the permutation suggested M
eff than the K
eff estimate.
For the Affymetrix 500K data, we saw a pattern similar to the Illumina data. The MLEs for the two beta parameters, α and β, are
![[alpha]](/corehtml/pmc/pmcents/agrcirc.gif)
= 1.03 and
![[beta]](/corehtml/pmc/pmcents/bgrcirc.gif)
= 222043.9. The MLE
![[beta]](/corehtml/pmc/pmcents/bgrcirc.gif)
= 214986.2 if we fix α=1, which is consistent with the M
eff suggested by the permutation tests, 208408. Goodness-of-fit tests based on EDF statistics did not reject the null hypothesis at the 0.1 significance level, which suggests that the data follow the specified beta distribution. The p-values are 0.19, 0.13 and 0.13 for the EDF tests: Kolmogorov-Smirnow, Cramér-von Mises and Anderson-Darling statistic, respectively. The quantile-quantile plot for minimum p-values from the permutation tests is shown in . From the plot, we see that the empirical distribution and theoretical distribution agree with each other, indicating that the minimum p-values follow the beta distribution very well. In , we find that the genome-wide M
eff estimation from simpleM, 229760, gives a better approximation to the MLE estimate
![[beta]](/corehtml/pmc/pmcents/bgrcirc.gif)
and the permutation suggested M
eff than the K
eff estimate. Therefore, the effective number of independent tests is indeed valid for these data sets based on statistical theory and significance tests and is a reasonable approach in multiple testing corrections for GWASs.
Many other multiple testing correction methods exist.
Lin [2005] proposed using a Monte Carlo approach to approximate the joint distribution of test statistics and then using a Monte Carlo distribution to adjust p-values.
Seaman and Müller-Myhsok [2005] proposed a direct simulation approach that approximates p-values from permutation tests. Both methods avoid permutation by simulating test statistics directly from the asymptotic distribution.
Conneely and Boehnke [2007] developed a p-value adjustment method through numerical integration assuming an asymptotic multivariate normal distribution for the test statistics and achieved even greater speed compared to simulation approaches. However, the numerical integration can only handle up to 1,000 dimensions at present. Furthermore, these methods can take much longer to compute than the simpleM method.
Conneely and Boehnke [2007] estimated that it would take several hours for a 400K genome-wide scale, and for the Affymetrix 500K data, the simpleM method only took about 7 minutes. Moreover, the simpleM method is a non-parametric method that does not rely on the joint distribution of test statistics to be known.
With the increasing popularity of GWASs, the simpleM algorithm can serve as a fast approximation to the computationally intensive permutation procedure. It gives a highly accurate estimation of the permutation threshold for multiple testing correction while greatly reducing the computing time and resources required. We validated the method using each individual autosomal chromosome and together in a genome-wide format. We believe the simpleM method, an effective number of independent tests approach, can be useful to current large-scale genetic association studies for quickly estimating the permutation threshold. Moreover, the multiple testing thresholds derived in this study can provide practical guidelines for other GWASs using similar population samples, genotyping platforms and quality control protocols.