In this study, we found that, within a genomic region, common SNPs overall are not highly correlated with the number of rare alleles and are therefore not powerful for tagging the presence of rare alleles. In subpopulations, however, common SNPs capture some information on the presence of rare variants; the increases in correlation are statistically significant but often small (Table ). We also found that including tagging SNPs in strong LD with each other helps in detecting rare alleles.
Common SNPs have higher correlations with the presence of rare SNPs in the subpopulations, which indicates that population structure influences the tagging power. The common SNPs have lower correlations with the presence of nonsynonymous SNPs, especially in the Yoruba population, which may indicate difficulty in capturing rare functional variants in that population. In addition to the presence of rare alleles, we also analyzed the correlation between common SNPs and another variable, a collapsing statistic for rare SNPs [7], which has the value 1 if a rare allele is present and the value 0 if no rare alleles are present among several randomly selected SNPs within a genome region. We obtained similar results with the collapsing variable (data not shown).
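The collapsing variable described above can be illustrated with a minimal sketch. The function name, the genotype coding (minor-allele counts of 0, 1, or 2), the toy data, and the number of sampled SNPs are assumptions made for illustration only; they are not taken from the study.

```python
import random


def collapsing_indicator(genotypes, rare_sites):
    """Return one value per individual: 1 if the individual carries a rare
    allele at any of the selected sites, 0 otherwise.

    genotypes:  list of per-individual lists of minor-allele counts (0, 1, 2).
    rare_sites: column indices of the rare SNPs to collapse.
    """
    sites = list(rare_sites)
    return [1 if any(row[j] > 0 for j in sites) else 0 for row in genotypes]


# Hypothetical toy data: 4 individuals x 5 rare SNPs in one region.
genotypes = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 2],
    [0, 0, 0, 0, 0],
]

# Collapse over every rare SNP in the region.
all_sites = collapsing_indicator(genotypes, range(5))

# Collapse over a random subset of rare SNPs (subset size 3 is an assumption).
rng = random.Random(0)
rare_sites = sorted(rng.sample(range(5), 3))
collapsed = collapsing_indicator(genotypes, rare_sites)
```

The resulting 0/1 vector can then be correlated with common-SNP genotypes in the same way as the count of rare alleles.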
Our study suggests that SNPs in strong LD with each other (e.g., r² > 0.95) should not be excluded from the set of tagging SNPs in an association study, because they can help to detect rare SNPs. Rare SNPs are less helpful for predicting disease risk, however, because their attributable risk is so small, but significant associations detected with them could be important for identifying new metabolic pathways.
The adjusted multiple correlation R² could be overadjusted because the adjustment assumes independence of the common SNPs, which does not hold in our study. Nevertheless, we obtain an increased adjusted R² for tagging rare SNPs when SNPs in strong LD with each other are included among the tagging SNPs, which indicates their importance in an association study for detecting causal variants.
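To make the overadjustment concern concrete, the sketch below assumes that the adjustment in question is the standard adjusted R², 1 - (1 - R²)(n - 1)/(n - p - 1), with n individuals and p tagging SNPs; the study may have used a different correction. The sample size and R² value are hypothetical. Because the penalty counts every tagging SNP as an independent predictor, SNPs in strong LD with each other are penalized as if each contributed fully independent information.

```python
def adjusted_r2(r2, n, p):
    """Standard adjusted R^2 for a regression with n observations and
    p predictors (assumed here to be the adjustment discussed above)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 observations")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical values: the same unadjusted R^2 with more tagging SNPs.
n = 101
r2 = 0.30
for p in (2, 5, 10):
    # The penalty grows with p whether or not the added SNPs are independent.
    print(p, round(adjusted_r2(r2, n, p), 4))
```

Under this formula, an adjusted R² that still increases after correlated SNPs are added reflects a gain in unadjusted R² large enough to outweigh the extra penalty.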