Genome-wide association studies (GWAS) have been widely applied to identify informative SNPs associated with common and complex diseases. Beyond single-SNP analysis, interactions between SNPs are believed to play an important role in disease risk because of the complex network of genetic regulation. While many approaches have been proposed for detecting SNP interactions, the relative performance and merits of these methods in practice remain largely unclear. In this paper, we report a ground-truth-based comparative study of 9 popular SNP detection methods using realistic simulation datasets. The results characterize these methods and provide guidelines that may be informative to biological investigators.
Single-nucleotide polymorphisms (SNPs) are the most common type of DNA sequence variation; a SNP occurs when a single nucleotide in the genome differs between members of a species (or between paired chromosomes in an individual). The human genome contains millions of SNPs, some of which may directly cause changes in disease traits or influence disease risk together with other factors. GWAS allows researchers to genotype a large number of SNPs across many subjects and to explore the association between SNPs and disease phenotypes. Once new genetic associations are identified, this information can help unveil disease mechanisms and develop better strategies to detect, treat and prevent disease.
So far, single-SNP analysis is widely performed with traditional statistical filtering methods (e.g. the chi-square test, Fisher's exact test (FET) and logistic regression (LR)), which test SNPs one at a time. Although single-SNP analysis has discovered some novel disease-risk loci that were confirmed in multiple independent cohorts, these findings explain only a small fraction (usually less than 20%) of heritability. A frequently cited reason is that most common diseases have complex mechanisms, in which the phenotype is determined by interactions (also called epistatic effects) among multiple SNPs and other factors. Searching for SNP interactions in high-dimensional SNP data is a daunting computational task, and some unique characteristics of SNP data pose further challenges for the effective detection of informative SNPs.
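To make single-SNP statistical filtering concrete, the sketch below computes a Pearson chi-square statistic on the 2 x 3 table of case and control counts by genotype (0, 1 or 2 copies of the minor allele). It assumes the genotypic 2-degree-of-freedom form of the test; the text does not state whether a genotypic or an allelic test was used, and the function name is hypothetical. For 2 degrees of freedom the chi-square survival function has the closed form exp(-x/2), so no statistics library is needed.

```python
import math

def genotypic_chi_square(case_counts, control_counts):
    """Pearson chi-square test of one SNP on a 2 x 3 genotype table.

    case_counts / control_counts: numbers of individuals carrying 0, 1, 2
    copies of the minor allele. Returns (statistic, p_value) with 2 df.
    Assumes every genotype column has a non-zero total.
    """
    rows = [case_counts, control_counts]
    n = sum(case_counts) + sum(control_counts)
    row_totals = [sum(r) for r in rows]
    col_totals = [case_counts[g] + control_counts[g] for g in range(3)]
    statistic = 0.0
    for i, row in enumerate(rows):
        for g in range(3):
            expected = row_totals[i] * col_totals[g] / n
            statistic += (row[g] - expected) ** 2 / expected
    # Survival function of a chi-square distribution with 2 df is exp(-x/2).
    p_value = math.exp(-statistic / 2)
    return statistic, p_value

# Single-SNP screening ranks every SNP by this p-value, smallest first.
stat, p = genotypic_chi_square([30, 20, 10], [10, 20, 30])
print(stat, p)  # statistic 20.0, p = exp(-10), about 4.5e-05
```

Because each SNP is tested independently, the cost grows linearly with the number of SNPs, but a SNP whose effect appears only in combination with another SNP can go unnoticed.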
Driven by the goals of efficiency and effectiveness, many methods [4-9] based on various underlying techniques and assumptions have been developed to detect SNP interactions; however, their relative performance and advantages remain largely unknown.
The purpose of this paper is to provide a thorough and objective comparison of the performance of several benchmark SNP epistasis detection methods. The comparison reveals the difficulties and open problems in SNP epistasis detection and provides guidelines for applying these methods in GWAS. In addition, by analyzing the merits and principles of these methods, we aim to gain insight into the key problems of the existing methods, which may help in developing more effective ones. In section 2, we first discuss the data characteristics and challenges in GWAS, and then briefly introduce the basic principles of 9 representative SNP detection methods. In section 3, we compare the performance of these methods experimentally on simulation datasets derived from real genotype data, with synthetically generated (and hence known) ground-truth SNP interactions. Finally, we present the general characteristics of these methods and guidelines for applying them.
We first briefly discuss the characteristics and unique challenges of detecting interacting SNPs.
Given the 5 problems mentioned above, methods for detecting epistatically interacting SNPs need two important features:
In recent years, many methods have been proposed for detecting informative SNPs. We tested 9 representative methods based on different underlying techniques and assumptions.
These 9 methods can be classified into several categories according to their underlying principles.
Two main factors determine the applicability of these methods. The first is the sensitivity and specificity of the criterion function, i.e. whether the method exploits the data characteristics to provide a sensitive criterion that selects the informative SNPs with few false positives. The second is computational complexity: detecting SNP interactions in a large-scale SNP set is an even greater challenge than the other difficulties discussed previously, so only methods with high computational efficiency are realistically applicable.
In our study, we used the run time of the methods in the same computational environment to evaluate the computational complexity, and used the following two measures to assess the sensitivity and specificity of the detection principles:
We constructed 11 simulation datasets for a ground-truth-based comparative study. Each dataset contains two data files, with 100 and 1000 SNPs respectively. Each data file contains 2000 simulated individuals (about 1000 cases and 1000 controls), randomly drawn (with replacement) from existing human subjects genotyped with the 317K-SNP Illumina HumanHap300 BeadChip in the New York City Cancer Control Project and a lupus study. The simulated data retain the basic patterns of linkage disequilibrium and the allele frequencies observed in the original genome scan data.
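The resampling step can be sketched as follows. Drawing whole individuals (rows) with replacement keeps each subject's genotypes together, which is why the correlation between SNPs and the allele frequencies of the source data carry over to the simulated data in expectation. This is a minimal sketch under stated assumptions: genotypes are coded as 0/1/2 minor-allele counts, the function name is hypothetical, and picking the kept SNP columns uniformly at random is an illustrative choice rather than the authors' exact procedure.

```python
import random

def build_simulated_panel(source_rows, n_individuals, n_snps, seed=None):
    """Resample whole individuals with replacement and keep a subset of SNPs.

    source_rows: one genotype row per real subject, each a list of
    0/1/2 minor-allele counts. Returns (panel, kept_columns).
    """
    rng = random.Random(seed)
    n_available = len(source_rows[0])
    # Hypothetical choice: a random subset of SNP columns, kept in genomic order.
    kept_columns = sorted(rng.sample(range(n_available), n_snps))
    panel = []
    for _ in range(n_individuals):
        subject = rng.choice(source_rows)  # drawn with replacement
        panel.append([subject[j] for j in kept_columns])
    return panel, kept_columns
```

For example, `build_simulated_panel(real_rows, 2000, 100, seed=1)` would produce one 100-SNP data file of 2000 individuals, where `real_rows` stands for the genotyped source subjects.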
Assuming that disease risk is fully (100%) explained by genetic factors, ground-truth SNPs are sampled according to the requirements of the penetrance functions in which they participate (randomly, within a narrow tolerance window of minor allele frequency), and the remaining SNPs are chosen at random. The disease labels are determined jointly by several penetrance functions of the ground-truth SNPs. Several 1-way, 2-way, 3-way and 5-way interaction models are defined. Their penetrance functions are built under various models, which are set as either fully or incompletely penetrant in order to give a comprehensive comparison. The 11 datasets were generated using the penetrance functions given at the following link: http://www.cbil.ece.vt.edu/software/SNP%20Simulation.pdf
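The two ingredients of this procedure, choosing ground-truth SNPs inside a minor-allele-frequency window and drawing disease labels from a penetrance table, can be sketched as below. This is an illustration under assumptions: it uses a single 2-way penetrance table, whereas the datasets combine several penetrance functions (see the link above for the actual ones), and all function names and the checkerboard table are hypothetical.

```python
import random

def minor_allele_frequency(column):
    """MAF of one SNP coded as 0/1/2 copies of an allele."""
    freq = sum(column) / (2 * len(column))
    return min(freq, 1 - freq)

def snps_in_maf_window(panel, target_maf, tolerance):
    """Indices of SNPs whose MAF lies within target_maf +/- tolerance."""
    n_snps = len(panel[0])
    return [j for j in range(n_snps)
            if abs(minor_allele_frequency([row[j] for row in panel]) - target_maf) <= tolerance]

def assign_labels_2way(panel, snp_a, snp_b, penetrance, seed=None):
    """Draw case (1) / control (0) labels from a 2-way penetrance table.

    penetrance[ga][gb] is P(disease | genotype ga at snp_a, gb at snp_b).
    Entries of 0 or 1 give a fully penetrant model; values strictly in
    between give an incompletely penetrant one.
    """
    rng = random.Random(seed)
    return [1 if rng.random() < penetrance[row[snp_a]][row[snp_b]] else 0
            for row in panel]

# Hypothetical fully penetrant 2-way model with no marginal pattern in the table.
checkerboard = [[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]]
```

Since `random()` returns values in [0, 1), a penetrance of exactly 1 always yields a case and 0 always yields a control, so the fully penetrant example is deterministic.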
For the first 8 methods on the 11 datasets, we recorded the run time, the sensitivity at FPR = 0.5%, the ROC curves, and the AUC (not essential, but it gives additional detail). Because of the high computational complexity of PLR (it took 2 days to select the top 30 SNPs in the 100-SNP data), we only recorded its sensitivity at FPR = 0.5% on the 100-SNP data.
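Given a full SNP ranking and the known ground truth, both accuracy measures are easy to compute. The sketch below is our reading of them, not necessarily the authors' exact procedure: sensitivity at a fixed FPR counts the ground-truth SNPs ranked above the point where the false-positive budget is exceeded, and AUC is the fraction of (true, false) SNP pairs ranked in the correct order. It assumes a ranking without ties; the function names are hypothetical.

```python
import math

def sensitivity_at_fpr(ranked_snps, truth, max_fpr=0.005):
    """Fraction of ground-truth SNPs ranked before the FPR budget is exceeded.

    ranked_snps: full ranking of SNP ids, best first.
    truth: set of ground-truth SNP ids.
    """
    positives = sum(1 for s in ranked_snps if s in truth)
    negatives = len(ranked_snps) - positives
    allowed_fp = math.floor(max_fpr * negatives)
    tp = fp = 0
    for snp in ranked_snps:
        if snp in truth:
            tp += 1
        else:
            fp += 1
            if fp > allowed_fp:
                break
    return tp / positives

def auc_from_ranking(ranked_snps, truth):
    """AUC as the fraction of (true, false) SNP pairs ranked in the right order."""
    positives = sum(1 for s in ranked_snps if s in truth)
    negatives = len(ranked_snps) - positives
    positives_seen = 0
    correctly_ordered = 0
    for snp in ranked_snps:
        if snp in truth:
            positives_seen += 1
        else:
            # every true SNP seen so far outranks this false one
            correctly_ordered += positives_seen
    return correctly_ordered / (positives * negatives)

ranking = ["s3", "s7", "s1", "s9", "s2"]
truth = {"s3", "s1"}
print(sensitivity_at_fpr(ranking, truth))  # 0.5
print(auc_from_ranking(ranking, truth))    # 0.8333...
```

Under this reading, 0.5% of fewer than 200 false SNPs rounds down to zero allowed false positives, so on the 100-SNP data the sensitivity counts only the true SNPs ranked above the first false one. This is why a method can have a good AUC yet a poor sensitivity at low FPR.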
Parameters are set as close as possible to the default settings in [4-9] or in the related software, and we modified only a few parameters to keep the computational cost and performance within a reasonable range. When testing MDR on the 1000-SNP data, we used heuristic search (100,000 evaluations) instead of exhaustive search because of limited memory and high computational complexity. When testing BEAM, we increased the number of Markov chains from 1 to 10 and the number of initial tries from 20 to 40 in order to obtain a more complete result.
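A quick count shows why exhaustive search was not feasible for MDR on the 1000-SNP data. Assuming each k-way SNP combination counts as one evaluation and that all orders from 1 up to 3 are searched (MDR is run up to 3-way interactions in this study), the number of combinations is a sum of binomial coefficients; the helper name is hypothetical.

```python
from math import comb

def search_space(n_snps, max_order):
    """Number of SNP combinations an exhaustive search evaluates for all
    interaction orders 1..max_order."""
    return sum(comb(n_snps, k) for k in range(1, max_order + 1))

print(search_space(100, 3))               # 166750
print(search_space(1000, 3))              # 166667500
print(100_000 / search_space(1000, 3))    # about 0.0006, i.e. 0.06%
```

Going from 100 to 1000 SNPs multiplies the 3-way search space by roughly 1000, and the 100,000-evaluation heuristic budget covers only about 0.06% of it, so the heuristic result depends heavily on how well the search is guided.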
For the statistical filtering methods (chi-square test, FET and LR), we only detect the marginal effect of each SNP. For FIM, IG and PLR, we detect up to 2-way SNP interactions with exhaustive search, and for MDR and SH, up to 3-way SNP interactions. The run time (including data loading) needed by each of the 9 methods to output the full ranking of SNPs is recorded in Table 1. All experiments were run on a desktop with a 3 GHz CPU and 2 GB RAM running Windows XP.
For the 100-SNP and 1000-SNP data in all 11 datasets, the AUC and the sensitivity at FPR = 0.5% are shown in Tables 2-5. The last row of each table gives the mean value of each performance measure across all datasets.
Although the global performance (AUC) of these 9 methods is fairly good, their sensitivity at low FPR is not satisfactory: in most datasets, only a small fraction of the ground-truth SNPs is selected by the 9 methods at FPR = 0.5%. As stated in section 2.3, sensitivity at low FPR is more critical than AUC, so the results are in fact unsatisfactory. This may be caused by the aforementioned challenges in detecting SNP epistasis, or by problems and limitations of the methods themselves. For example, statistical filtering methods cannot detect SNP interactions well; the chi-square test and LR rely on asymptotic distributions and therefore require large sample sizes; FET is an exact test but assumes either a dominant or a recessive genetic model; FIM has many degrees of freedom, which also requires large sample sizes and degrades the accuracy of the statistical significance; and the cross-validation prediction error of the multifactor model (MDR) is not sensitive enough to differentiate ground-truth SNPs from other SNPs.
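The dominant/recessive assumption behind FET can be made concrete. To obtain a 2 x 2 table, the three genotype groups must be collapsed into two, either carriers versus non-carriers of the minor allele (dominant) or homozygous minor versus the rest (recessive); a SNP whose effect follows neither pattern loses signal in the collapse. The sketch below implements a two-sided Fisher's exact test from the hypergeometric distribution; the function names and the small relative tolerance used to treat near-equal probabilities as ties are choices of this sketch, not taken from the compared software.

```python
from math import comb

def collapse_genotypes(counts, model):
    """Collapse 0/1/2 minor-allele genotype counts into two groups.

    dominant: no minor allele vs. at least one copy.
    recessive: fewer than two copies vs. two copies.
    """
    if model == "dominant":
        return counts[0], counts[1] + counts[2]
    if model == "recessive":
        return counts[0] + counts[1], counts[2]
    raise ValueError(f"unknown genetic model: {model}")

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def prob(x):
        # hypergeometric probability of x in the top-left cell, margins fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # sum the probabilities of all tables no more probable than the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)

def fet_snp(case_counts, control_counts, model="dominant"):
    """FET p-value for one SNP under an assumed genetic model."""
    a, b = collapse_genotypes(case_counts, model)
    c, d = collapse_genotypes(control_counts, model)
    return fisher_exact_two_sided(a, b, c, d)
```

For instance, `fisher_exact_two_sided(8, 2, 1, 5)` returns 280/8008, about 0.035. The same SNP can give quite different p-values under `"dominant"` and `"recessive"`, which is the model dependence noted above.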
Comparing the relative performance of the 9 methods, we see that although the statistical filtering methods (FET, chi-square test, LR) can only detect marginal effects of SNPs, their performance is relatively good. The methods that consider SNP interactions (FIM, IG, MDR, SH, BEAM, PLR) do not achieve significantly better sensitivity or ROC than the statistical filtering methods. Tables 2-5 show that IG, FIM and BEAM have similar but slightly lower mean sensitivity and mean AUC than FET, the chi-square test and LR, while MDR and SH perform noticeably worse. Only PLR has slightly (but not consistently) better sensitivity on the 100-SNP data, but its high computational complexity prevented us from further evaluating it on the 1000-SNP data.
By examining the SNP interactions selected by FIM, IG, MDR, SH, BEAM and PLR, we found that they do detect some ground-truth SNP interactions that cannot be detected by statistical filtering methods. These methods have merit in that they consider interaction effects and take into account the nonlinearity and complexity of SNP interactions. Moreover, some of them, such as SH, have explicitly devised principles for detecting SNP interactions with insignificant marginal effects. An interesting observation, however, is that although some ground-truth SNP interactions are detected, many false positives are also ranked highly, and these false-positive SNPs are usually selected because they are mistakenly paired with ground-truth SNPs that have strong marginal effects. The performance of these methods is therefore heavily degraded by false positives. Their criterion functions are not sensitive enough to differentiate ground-truth SNPs from false positives, for various reasons, e.g. multiple testing, high degrees of freedom, and improper underlying models.
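The false-positive pattern described above has a simple explanation for criteria that score a SNP pair by its joint association with disease. As a minimal sketch, using plug-in mutual information as a stand-in for such a joint criterion (it is not the exact criterion of any of the compared methods), the chain rule I(Y; A, B) = I(Y; A) + I(Y; B | A) guarantees that pairing any SNP B with a strong-marginal SNP A scores at least as high as A alone, even when B carries no signal. The toy genotypes and labels are hypothetical.

```python
from collections import Counter
from math import log

def mutual_information(labels, *snp_columns):
    """Plug-in I(Y; X) in nats, where X is the joint genotype of the given SNPs."""
    n = len(labels)
    joint = list(zip(*snp_columns))
    count_y = Counter(labels)
    count_x = Counter(joint)
    count_xy = Counter(zip(joint, labels))
    return sum(nxy / n * log(nxy * n / (count_x[x] * count_y[y]))
               for (x, y), nxy in count_xy.items())

labels = [1, 1, 1, 1, 0, 0, 0, 0]
strong = [2, 2, 1, 1, 0, 0, 0, 1]  # SNP with a strong marginal effect
noise = [0, 1, 0, 1, 0, 1, 0, 1]   # SNP unrelated to disease status

print(mutual_information(labels, noise))          # 0.0: no marginal signal
print(mutual_information(labels, strong))         # marginal score of the strong SNP
print(mutual_information(labels, strong, noise))  # pair score, never below the line above
```

Every pair that contains the strong SNP therefore inherits its marginal score, so unless the criterion discounts the marginal contribution, the pairs (strong SNP, unrelated SNP) crowd the top of the ranking.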
As to computational complexity, the run time of the statistical filtering methods is quite modest on the 100-SNP data and increases linearly on the 1000-SNP data. For methods that consider interactions, the run time increases quickly. Among them, SH is the fastest, but its efficiency comes at the price of SNP discovery accuracy. For IG, FIM and BEAM, computation grows at least quadratically with the number of SNPs. We therefore question their applicability in realistic scenarios: with about 500k SNPs, IG alone is estimated to take at least 300 days. For MDR and PLR, the computational burden is even heavier, which makes them impractical on large datasets. Nevertheless, algorithms that consider interactions remain useful for candidate-gene association studies; alternatively, they can serve as post-processing methods after the whole SNP set has been pre-screened to a manageable size by other methods, such as statistical filtering.
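The effect of quadratic growth can be estimated with simple arithmetic. Assuming run time is proportional to the number of SNP pairs scored (the lower bound stated above for IG, FIM and BEAM), a run time measured on a small panel can be extrapolated to a genome-wide panel. The helper names are hypothetical, and the 60-second input is an illustrative value, not a figure from Table 1.

```python
from math import comb

def pairwise_scaling_factor(n_measured, n_target):
    """How many times more SNP pairs must be scored at n_target SNPs."""
    return comb(n_target, 2) / comb(n_measured, 2)

def extrapolate_runtime(measured_seconds, n_measured, n_target):
    """Extrapolate run time assuming cost proportional to the number of SNP pairs."""
    return measured_seconds * pairwise_scaling_factor(n_measured, n_target)

factor = pairwise_scaling_factor(1000, 500_000)
print(round(factor))  # 250250
# A hypothetical 60 s run on 1000 SNPs would take about 174 days at 500k SNPs.
print(extrapolate_runtime(60, 1000, 500_000) / 86_400)
```

Moving from 1000 to 500k SNPs multiplies the number of pairs by about 250,000, so even a pairwise method that takes only seconds on the 1000-SNP data becomes impractical genome-wide; this is the motivation for pre-screening before interaction analysis.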
Detecting informative SNP markers associated with complex genetic diseases is a challenging task because of the particular characteristics of SNP data and biological facts such as the large number of SNPs and nonlinear, complex SNP interactions whose form is unknown a priori. Many methods have been proposed in this field to improve both sensitivity and computational efficiency, but so far no study has comprehensively evaluated these methods and provided guidelines for their applicability. This paper presents a comparative study of representative GWAS methods, in which multidimensional performance is measured via run time, ROC, AUC and sensitivity. The results on simulation data show that although the newly proposed methods [4-7] benefit from detecting SNP interactions with weak marginal effects, they do not achieve clearly improved performance compared to traditional statistical filtering methods. We conclude that criterion functions need to be sensitive enough to differentiate true SNP interactions from false positives caused by marginal effects, and that heuristic search strategies need to be further explored according to the data characteristics so that efficiency is achieved without too much sacrifice in performance. Software for MDR, BEAM, SH and PLR can be found in the original papers or on the authors' websites. For the convenience of peers, we also provide free software for the other 5 methods at: http://www.cbil.ece.vt.edu/ResearchOngoingSNP.htm
This work was supported in part by the US National Institutes of Health under Grant HL090567.