Figure 2 shows the results of the single-SNP analysis for 1,379 common SNPs with and without PCA adjustment. To summarize the 200 replicates, we use the 95% quantile of each SNP's 200 p-values. In the simulation, Q1 is influenced by 39 SNPs in 9 genes, including 2 common SNPs (MAF > 0.05) and 32 rare SNPs (MAF < 0.01). Our analysis detected the two causal common SNPs both before and after adjustment for population stratification. C13S523 has a relatively high MAF (0.165) with a mild effect, and C4S1878 has a lower MAF (0.067) with a moderate effect. In the analysis without PCA, 144 of the 1,377 null common SNPs (the 1,379 common SNPs minus the 2 causal ones) were declared significant, giving a false-positive rate (type I error) of 144/1,377 = 0.105. The false-positive rate dropped to 0 after adjusting for population stratification. Figure 3 is the Manhattan plot of 10,648 rare SNPs before and after PCA adjustment. Forty-four null SNPs were declared significant before PCA and 21 after PCA, giving type I errors of 0.004 and 0.002, respectively.
Figure 2 Manhattan plot of 1,379 common nonsynonymous SNPs (MAF > 0.05). Top panel: before PCA adjustment. Bottom panel: after PCA adjustment. Dashed line corresponds to the linkage-disequilibrium-adjusted Bonferroni significance level of 9.8 × ...
Figure 3 Manhattan plot of 10,648 rare nonsynonymous SNPs (MAF < 0.01). Top panel: before PCA adjustment. Bottom panel: after PCA adjustment. Dashed line corresponds to the linkage-disequilibrium-adjusted Bonferroni significance level of 1.7 × ...
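The replicate summary and the false-positive rates quoted above can be sketched in a few lines. This is a minimal illustration, not the analysis code: the function names, the toy p-values, and the threshold of 0.01 are hypothetical, and it assumes that a SNP is declared significant when the 95% quantile of its replicate p-values falls below the significance level.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def summarize_replicates(pvals_by_snp, q=0.95):
    """Collapse each SNP's replicate p-values into a single summary p-value."""
    return {snp: quantile(pvals, q) for snp, pvals in pvals_by_snp.items()}


def false_positive_rate(summary, causal, threshold):
    """Fraction of null (non-causal) SNPs whose summary p-value is below threshold."""
    null_snps = [snp for snp in summary if snp not in causal]
    hits = sum(1 for snp in null_snps if summary[snp] < threshold)
    return hits / len(null_snps)


# Toy data (hypothetical): three SNPs, ten replicates each.
toy = {
    "snpA": [0.001] * 10,          # consistently small p-values
    "snpB": [0.5] * 10,            # consistently large p-values
    "snpC": [0.0001] * 9 + [0.9],  # small in most replicates, large in one
}
summary = summarize_replicates(toy)
toy_fpr = false_positive_rate(summary, causal={"snpA"}, threshold=0.01)

# Arithmetic check of the common-SNP rate reported in the text.
fpr_common_before = 144 / (1379 - 2)
print(round(fpr_common_before, 3))
```

Because the summary is an upper quantile, a SNP such as the toy `snpC`, which is significant in most replicates but not in all of them, is not counted as a detection.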
These results suggest that the effect of PCA depends on MAF. We next examined the absolute difference in −log10(p-value) before and after PCA across MAF ranges (Figure 4). The median difference increases with MAF. The difference between SNPs with low MAF (<0.01) and those with high MAF (>0.05) was statistically significant (Wilcoxon rank-sum test, p < 2.2 × 10−16). Thus adjusting for principal components changes p-values more substantially for SNPs with higher MAF.
Figure 4 Boxplot of the absolute difference in −log10(p-value) before and after PCA by MAF
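The MAF comparison described above can be illustrated with a short sketch. It computes the absolute difference in −log10(p-value) and compares two groups with a Wilcoxon rank-sum test using the normal approximation (midranks for ties, no tie correction in the variance). The function names and example values are hypothetical, and the original analysis may have used an exact or tie-corrected version of the test.

```python
import math


def abs_neglog10_diff(p_before, p_after):
    """|(-log10 p_before) - (-log10 p_after)| for paired p-values."""
    return [abs(math.log10(a) - math.log10(b)) for b, a in zip(p_before, p_after)]


def rank_sum_test(x, y):
    """Wilcoxon rank-sum test, normal approximation; returns (z, two-sided p)."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(combined)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at i and give each the midrank.
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    var_w = n1 * n2 * (n1 + n2 + 1) / 12
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical example: differences for low-MAF versus high-MAF SNPs.
low_maf_diff = abs_neglog10_diff([0.01, 0.02, 0.03], [0.012, 0.025, 0.04])
high_maf_diff = abs_neglog10_diff([1e-6, 1e-5, 1e-4], [0.01, 0.02, 0.05])
z, p = rank_sum_test(low_maf_diff, high_maf_diff)
```

A negative `z` here means the first group (low MAF) tends to have smaller differences than the second, which is the direction reported in the text.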
We also tested association at the gene level, comparing three collapsing methods before and after PCA (Figure 5). Before adjustment for population stratification, all three methods declared three causal genes (KDR, FLT1, and VEGFC) significant. The indicator, proportion, and data-adaptive sum test methods falsely detected 29, 29, and 35 null genes, respectively (type I errors of 0.016, 0.016, and 0.020). After adjustment for population stratification, we detected two causal genes. The number of falsely detected genes fell to four, four, and seven, giving type I errors of 0.0022, 0.0022, and 0.0039 for the indicator, proportion, and data-adaptive sum test methods, respectively.
We then explored the effect of PCA on power. The accompanying table gives the number of times each causal gene was detected across the 200 simulations for the three methods. Overall, power to detect genes in individual replicates was low; only KDR was identified with greater than 80% power without PCA adjustment, and with PCA adjustment its power dropped to about 25%. Of the three methods, the indicator method had the lowest power to detect KDR, with or without PCA adjustment. In short, adjustment for population stratification greatly reduced the number of false positives but also reduced the power to detect true causal genes.
Figure 5 Manhattan plot of genes for the three collapsing methods. Left panels: before PCA adjustment. Right panels: after PCA adjustment. Dashed line corresponds to the linkage-disequilibrium-adjusted Bonferroni significance level of 7.86 × 10−5.
Number of replicates with true discovery for the causal genes before and after PCA adjustment
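For readers unfamiliar with collapsing, the sketch below shows how the indicator and proportion scores are commonly computed for one individual and one gene, and how a detection count over 200 replicates converts to empirical power. It assumes the usual definitions of these two scores, which may differ in detail from the implementation used here. The function names and example genotypes are hypothetical, and the data-adaptive sum test is not shown.

```python
def indicator_score(genotypes):
    """1 if the individual carries a minor allele at any rare site in the gene, else 0."""
    return 1 if any(g > 0 for g in genotypes) else 0


def proportion_score(genotypes):
    """Fraction of the gene's rare sites at which the individual carries a minor allele."""
    return sum(1 for g in genotypes if g > 0) / len(genotypes)


def empirical_power(n_detected, n_replicates=200):
    """Share of replicates in which a causal gene was declared significant."""
    return n_detected / n_replicates


# Hypothetical individual: minor-allele counts (0, 1, or 2) at four rare sites.
example_genotypes = [0, 1, 0, 2]
ind = indicator_score(example_genotypes)    # carrier of at least one rare allele
prop = proportion_score(example_genotypes)  # two of the four sites carry a minor allele

# Hypothetical count: a gene detected in 50 of 200 replicates has 25% power.
power_example = empirical_power(50)
```

The per-individual score then replaces the individual SNP genotypes as the predictor in the association test, with or without principal components as covariates.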
We also investigated the effect of population stratification on phenotypes Q2 and Q4 (data not shown). Q2 showed the same pattern as Q1, supporting our contention that PCA does not perform well for rare variants. Q4 is not associated with any SNPs and was therefore used to assess the effect of PCA on false-positive rates. No significant association was identified for Q4 either before or after PCA; for this phenotype, the effect of population stratification appeared to be diminished.