Determining the most promising single-nucleotide polymorphisms (SNPs) presents a challenge in genome-wide association studies, when hundreds of thousands of association tests are conducted. The power to detect genetic effects is dependent on minor allele frequency (MAF), and genome-wide association studies SNP arrays include SNPs with a wide distribution of MAFs. Therefore, it is critical to understand MAF's effect on the false positive rate.
Data from the Framingham Heart Study simulated data (Problem 3, with answers) was used to examine the effects of varying MAFs on the likelihood of false positives. Replication set 1 was used to generate 1 million permutations of case/control status in unrelated individuals. Logistic regression was used to test for the association between each SNP and myocardial infarction using an additive model. We report the number of "significant" tests by MAF at α = 10-4, 10-5, and 10-6.
Common SNPs exhibited fewer false positives than expected. At α = 10-4, SNPs with MAF 25% and 50% resulted in 69.2 [95%CI: 62.8-75.6] and 70.8 [95%CI: 61.3-80.4] false positives, respectively, compared to 100 expected. Rare SNPs exhibited more variability but did not show more false-positive results than expected by chance. However, at α = 10-4, MAF = 5% exhibited significantly more false positives (105.5 [95%CI: 81-130.1]) than MAF = 25% and 50%. Similar results were seen at the other alpha values.
These results suggest that removal of low MAF SNPs from analysis due to concerns about inflated false-positive results may not be appropriate.