We have proposed a simple and fast S-GSB-HWD procedure that uses smoothed weights for GWAS. The procedure applies Holm's generalized sequential Bonferroni procedure to incorporate information from the HWD test using cases only into the CA trend test. When HWE holds or is only slightly violated in the general population, our S-GSB-HWD procedure controls the FWER very well. Compared to the widely used CA trend test (which assumes an additive mode of inheritance), our S-GSB-HWD procedure has much higher power for the recessive disease model and higher or comparable power for the dominant model, while roughly maintaining the power of the CA trend test for the additive and multiplicative models. In addition, for case-control data with larger sample sizes, the power gain of our S-GSB-HWD procedure over the CA trend test under the recessive and dominant disease models is considerably larger (tables , , , ). Therefore, our proposed method is more suitable for data with large sample sizes (such as ≥1,000 cases and ≥1,000 controls).
In the S-GSB-HWD procedure, the smoothed weights depend on the smoothing parameter λ; how to determine the optimal value of λ is an open question. Based on our simulation studies, the optimal value of λ appears to depend only on the underlying disease model and not on the number of SNPs. When the underlying disease model is unknown, setting λ = 0.6 is a good choice.
We note that when the MAF is low (such as ≤10%), our S-GSB-HWD procedure gains little power over the CA trend test for any GRR value under the dominant model and for GRR ≤2.0 under the recessive model, because the procedure does not gain much information from the HWD test using cases only.
From section 2.3.1 on the GSB procedure, we know that one required condition to reject the null hypothesis of no association at a marker is that the p value from the CA trend test (pc) is less than the nominal FWER level α (such as 0.05), i.e., pc < α (this means that the marker has at least some weak association signal). In other words, if pc ≥ α, no matter how small the p value from the HWD test (or how large the weight for the marker), the S-GSB-HWD procedure cannot reject the null hypothesis. In some situations, genotyping error can cause very small p values in the HWD test using cases only. This feature of the GSB procedure can therefore prevent some false positive findings due to very small p values caused by genotyping errors.
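The rejection condition described above can be illustrated with a minimal sketch of a weighted Holm-type step-down procedure. This is an assumption-laden illustration, not the S-GSB-HWD procedure itself: the function name `weighted_holm` is hypothetical, the weights are passed in directly rather than derived from smoothed HWD p values, and a strict inequality is used to match the condition pc < α stated in the text.

```python
def weighted_holm(pvals, weights, alpha=0.05):
    """Weighted Holm-type step-down test (illustrative sketch only).

    pvals   -- p values from the association test (e.g. the CA trend test)
    weights -- positive weights, one per marker (hypothetical inputs here)
    Returns a list of booleans, True where the null hypothesis is rejected.
    """
    m = len(pvals)
    if m != len(weights):
        raise ValueError("pvals and weights must have the same length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")

    # Visit markers in increasing order of weighted p value p_i / w_i.
    order = sorted(range(m), key=lambda k: pvals[k] / weights[k])

    # suffix[pos] = total weight of the markers not yet rejected at step pos.
    suffix = [0.0] * (m + 1)
    for pos in range(m - 1, -1, -1):
        suffix[pos] = suffix[pos + 1] + weights[order[pos]]

    rejected = [False] * m
    for pos, i in enumerate(order):
        # Reject if p_i / w_i < alpha / (sum of remaining weights).
        # Since w_i / suffix[pos] <= 1, this can only hold when p_i < alpha.
        if pvals[i] / weights[i] < alpha / suffix[pos]:
            rejected[i] = True
        else:
            break  # step-down: stop at the first non-rejection
    return rejected


# A large weight helps a marker with p < alpha ...
print(weighted_holm([0.04, 0.001, 0.5], [100.0, 1.0, 1.0]))  # [True, True, False]
# ... but no weight, however large, rescues a marker with p >= alpha.
print(weighted_holm([0.05, 0.5], [1e9, 1.0]))                # [False, False]
```

Because each marker's share of the remaining weight is at most 1, its threshold never exceeds α, which is why an extreme HWD p value caused by genotyping error cannot by itself produce a rejection.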
In our S-GSB-HWD procedure, we used p values from the HWD test statistic THWD, computed from cases only, to estimate weights. When HWE holds or is slightly violated, the HWD test by itself cannot control the FWER for multiple testing, but this does not prevent the S-GSB-HWD procedure from controlling the FWER well. To control the FWER, our S-GSB-HWD procedure requires only that the weights are independent of the p values from the CA trend test. For data with a small sample size or with very small MAF (such as <10%), it might be better to use the HWD exact test statistic [18] in place of the traditional HWD test statistic THWD for more accurate results. When HWE holds in the population, the p values from the exact test may not follow a uniform distribution U[0, 1] under the null hypothesis of no association [18], depending on allele frequencies and sample sizes. Therefore, these p values cannot be used in Fisher's method, which requires that the p values from each test follow a U[0, 1] distribution. However, the p values from the HWD exact test can be used to estimate weights in our S-GSB-HWD procedure. We also note that when HWE holds in the general population, if the sample size is small (such as with 1,000 cases), Fisher's method often has a type I error rate that slightly exceeds the nominal level (see our simulation results). When HWE is violated in the general population, Fisher's method has a type I error rate much higher than the nominal level.
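For reference, the two ingredients discussed in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions: the HWD test is written as the standard asymptotic Pearson goodness-of-fit test with 1 degree of freedom, which may differ in form from the THWD statistic used in the paper, and the function names `hwd_chisq_pvalue` and `fisher_combined_pvalue` are hypothetical. Fisher's combination is shown for two p values, where the chi-square tail with 4 degrees of freedom has a closed form.

```python
import math


def hwd_chisq_pvalue(n_aa, n_ab, n_bb):
    """Asymptotic chi-square test of Hardy-Weinberg proportions (1 df).

    Takes genotype counts (e.g. among cases only) and returns (stat, pval).
    Assumed here to be the usual Pearson goodness-of-fit form.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first allele
    q = 1.0 - p
    if p <= 0.0 or q <= 0.0:
        raise ValueError("marker is monomorphic; the test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    pval = math.erfc(math.sqrt(stat / 2.0))
    return stat, pval


def fisher_combined_pvalue(p1, p2):
    """Fisher's combination of two p values.

    Valid only if both are independent and U[0, 1] under the null.
    """
    x = -2.0 * (math.log(p1) + math.log(p2))
    # Survival function of chi-square with 4 df: exp(-x / 2) * (1 + x / 2).
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


stat, pval = hwd_chisq_pvalue(300, 400, 300)  # excess homozygotes
print(stat, pval)
print(fisher_combined_pvalue(0.05, 0.05))
```

The uniformity requirement in `fisher_combined_pvalue` is the reason exact-test p values, which are discrete, cannot be plugged into Fisher's method, whereas a weighting scheme needs only that the weights be independent of the CA trend test p values.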
In the S-GSB-HWD procedure, we calculate weights from the reciprocal of the p values from the HWD test. We also tested other functions of the p values, such as –log, square root, and entropy. The reciprocal function appears to be optimal because it generated the highest power in our simulation studies. When using the reciprocal function of p values in the S-GSB-HWD procedure, we essentially compare the product of a p value from the HWD test and a p value from the CA trend test with the corresponding threshold. We note that Fisher's method is based on the distribution of the logarithm of this product. However, Fisher's method cannot control the FWER well, whereas our S-GSB-HWD procedure can.
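The relation between the reciprocal weights and the product of p values can be written out explicitly. The following is a sketch under simplifying assumptions: the smoothing by λ is omitted because its form is not given in this section, and the symbols p_{c,i}, p_{h,i}, w_i, and R are notation introduced here for illustration only.

```latex
% Assumed notation: p_{c,i} is the CA trend test p value and p_{h,i} the
% case-only HWD p value at marker i; R is the set of markers not yet rejected.
\[
  w_i = \frac{1}{p_{h,i}}, \qquad
  \frac{p_{c,i}}{w_i} < \frac{\alpha}{\sum_{j \in R} w_j}
  \;\Longleftrightarrow\;
  p_{c,i}\, p_{h,i} < \frac{\alpha}{\sum_{j \in R} 1/p_{h,j}} .
\]
% Fisher's method refers the same product to a fixed reference distribution:
\[
  X_i = -2 \ln\!\left( p_{c,i}\, p_{h,i} \right) \sim \chi^2_4
  \quad \text{if } p_{c,i} \text{ and } p_{h,i} \text{ are independent } U[0,1].
\]
```

In the first display the threshold for the product adapts to the weights of all markers still under test, whereas Fisher's method relies on both p values being exactly uniform under the null.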
Our S-GSB-HWD procedure assumes independence between markers, i.e., the absence of linkage disequilibrium (LD), and therefore cannot handle LD among markers. Wu et al. [26] proposed an SNP-set analysis method that can account for LD and interaction among a set of SNPs and reduce the number of tests in GWAS. Using SNP-set analysis in GWAS can therefore achieve higher power than using individual SNP tests. In our future research, we hope to incorporate SNP-set analysis into our S-GSB-HWD procedure.
Our S-GSB-HWD procedure also assumes no population stratification and no genotyping error. These issues often arise in the analysis of genetic data for both candidate association studies and GWAS. Since the S-GSB-HWD procedure estimates weights from the p values of the HWD test using cases only, it may be sensitive to genotyping errors. Song and Elston [6] showed that their HWD trend test (which detects the difference in HWD coefficients between cases and controls) can be robust to genotyping errors and population stratification. In addition, Price et al. [27] suggested that a generalization of the CA trend test based on principal components analysis can be used to control for stratification. Our future research will include extending our S-GSB-HWD procedure to handle genotyping errors and stratification. We plan to replace the HWD test using cases only and the CA trend test by the HWD trend test and the generalized CA trend test, respectively, in the S-GSB-HWD procedure. However, when there is no obvious stratification in the case-control sample data, especially for data from a homogeneous population, use of the HWD test (using cases only) and the CA trend test as in our S-GSB-HWD procedure can generate higher power. We note that it may be difficult to extend the (non-permutation-based) approximation methods of Zang et al. [11] for MAX3 to handle population stratification.
Our methods in the present study for GWAS are based on controlling the FWER for multiple testing. It is not difficult to extend the method to control the false discovery rate.
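One possible route to such an extension, shown here purely as an assumption and not as the authors' method, is a weighted Benjamini-Hochberg step-up procedure in which the weights are rescaled to average 1 and the usual BH rule is applied to the weighted p values. The function name `weighted_bh` and the example inputs are hypothetical.

```python
def weighted_bh(pvals, weights, alpha=0.05):
    """Weighted Benjamini-Hochberg step-up procedure (illustrative sketch).

    Weights are rescaled to have mean 1, then BH is applied to p_i / w_i.
    Returns a list of booleans, True where the null hypothesis is rejected.
    """
    m = len(pvals)
    if m != len(weights):
        raise ValueError("pvals and weights must have the same length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")

    mean_w = sum(weights) / m
    weighted = [p / (w / mean_w) for p, w in zip(pvals, weights)]
    order = sorted(range(m), key=lambda k: weighted[k])

    # Step-up: find the largest rank whose weighted p value meets its threshold.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if weighted[i] <= alpha * rank / m:
            cutoff = rank

    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


print(weighted_bh([0.001, 0.02, 0.04, 0.5], [1.0, 1.0, 1.0, 1.0]))
print(weighted_bh([0.001, 0.02, 0.04, 0.5], [0.5, 0.5, 2.5, 0.5]))
```

With equal weights this reduces to the ordinary BH procedure; whether the reciprocal or smoothed HWD weights retain false discovery rate control in this setting would need to be checked by simulation.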