One way to control for population stratification is to use markers distributed across the genome, which reflect the demographic history of a population. Nevertheless, the smaller regions that actually harbor disease-causing variants may be subject to demographic history, natural selection pressure, or random fluctuations in admixture. Thus, it becomes important to examine local ancestry in the chromosomal regions where causal variants may exist, in order to better control for population stratification. In the FHS sample, estimates of global ancestry do not explain local ancestry PCs, as illustrated by the low adjusted r² values from the regression models. The low correlations observed between the global and local PCs further suggest that estimates of global and local ancestry each provide different information about ancestry.
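The comparison of global and local PCs described above can be illustrated with a minimal sketch. It regresses one local PC on a set of global PCs by ordinary least squares and reports the adjusted r². The function name `adjusted_r2` and the simulated PCs are illustrative assumptions, not the FHS data or the exact regression models used in the analysis.

```python
import numpy as np

def adjusted_r2(local_pc, global_pcs):
    """Adjusted r^2 from an OLS regression of one local PC on global PCs."""
    y = np.asarray(local_pc, dtype=float)
    X = np.asarray(global_pcs, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, p = X.shape
    # Design matrix with an intercept column.
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    # Penalize for the number of predictors p.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Simulated example (hypothetical data): two global PCs, one local PC that is
# unrelated to them and one that closely tracks the first global PC.
rng = np.random.default_rng(0)
n = 500
global_pcs = rng.normal(size=(n, 2))
unrelated_local = rng.normal(size=n)
related_local = global_pcs[:, 0] + 0.1 * rng.normal(size=n)

r2_unrelated = adjusted_r2(unrelated_local, global_pcs)
r2_related = adjusted_r2(related_local, global_pcs)
```

A low value, as for `r2_unrelated`, corresponds to the situation reported here, in which global PCs explain little of the variation in a local PC.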
We observed that the distributions of p-values adjusted for either global ancestry or both local and global ancestry best fit the expected distribution under the null, while the distribution adjusted for local ancestry alone departs slightly from the null. It is known that many variants underlie height, which has an estimated heritability of 0.8 in world populations. Three recent genome-wide association studies uncovered over 50 independent SNPs for height, cumulatively accounting for approximately 2-4% of the variation per study [6-8], suggesting that many more variants remain to be found. We would therefore expect some degree of departure of the observed p-value distribution from the null, although further investigation is required to assess the degree of this departure. Many of the SNPs either are causal or are in LD with causal variants and can contribute to the departure from the null of the distribution of p-values adjusted for local ancestry, although detecting them requires much larger sample sizes.
The effectiveness of adjusting for local ancestry can be observed from the results of the association analysis between height and SNPs in the LCT gene, an association known to be a false-positive finding. All three ancestry adjustment methods remove the spurious association observed in unadjusted analyses. The decrease in significance for the top SNPs is much more substantial when using local ancestry, or both global and local ancestry, than when using global ancestry alone, indicating that incorporating local ancestry estimates can adequately control the effect of population stratification.
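How an ancestry covariate removes a spurious association can be sketched with simulated data. In the example below, a binary ancestry indicator shifts both the allele frequency of a SNP and mean height, while the SNP itself has no effect on height. The function `snp_t_statistic`, the effect sizes, and the use of a single known ancestry indicator in place of estimated PCs are all assumptions made for illustration; this is not the model fitted in the analysis.

```python
import numpy as np

def snp_t_statistic(phenotype, snp, covariates=None):
    """t-statistic for the SNP coefficient in an OLS regression."""
    y = np.asarray(phenotype, dtype=float)
    g = np.asarray(snp, dtype=float)
    n = y.shape[0]
    cols = [np.ones(n), g]  # intercept, then SNP genotype
    if covariates is not None:
        C = np.asarray(covariates, dtype=float)
        cols.append(C if C.ndim == 2 else C[:, None])
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = n - X.shape[1]
    sigma2 = float(resid @ resid) / dof
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    return float(beta[1] / np.sqrt(cov_beta[1, 1]))

# Hypothetical stratified sample: ancestry drives allele frequency and height.
rng = np.random.default_rng(1)
n = 2000
ancestry = rng.integers(0, 2, size=n)             # 0/1 ancestry indicator
snp = rng.binomial(2, 0.2 + 0.6 * ancestry)       # genotype coded 0/1/2
height = 2.0 * ancestry + rng.normal(size=n)      # no true SNP effect

t_unadjusted = snp_t_statistic(height, snp)
t_adjusted = snp_t_statistic(height, snp, covariates=ancestry)
```

In this simulation the unadjusted statistic is large because the SNP acts as a proxy for ancestry, and it shrinks toward zero once the ancestry covariate enters the model. In practice the covariates would be estimated global or local PCs rather than a known indicator.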
In theory, the best way to adjust for population stratification is to use the true ancestry at the testing locus. However, the true ancestry at a locus is difficult to estimate, especially for a population formed from several ancestral populations. The local PCs estimated from the SNPs around the locus may better approximate the ancestry at the locus than the global PCs estimated from SNPs across the genome. However, the number of SNPs required to adequately estimate the ancestry at a locus needs further investigation. In the LCT gene, we demonstrate that selecting less correlated SNPs does not affect the estimation of local PCs and that, for the most significant LCT SNPs, using the maximal number of SNPs in a local region may provide the best reduction in spurious association.
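The idea of estimating local PCs from the SNPs around a locus can be sketched as follows. The function `local_pcs`, its window definition (a fixed distance on either side of the locus), and the simulated genotypes are assumptions for illustration only; the region size and SNP selection used in the analysis are not reproduced here.

```python
import numpy as np

def local_pcs(genotypes, positions, locus, half_window, n_pcs=2):
    """PC scores computed from SNPs within half_window of a locus.

    genotypes: (n_samples, n_snps) array coded 0/1/2.
    positions: (n_snps,) array of base-pair positions.
    """
    G = np.asarray(genotypes, dtype=float)
    pos = np.asarray(positions)
    in_window = np.abs(pos - locus) <= half_window
    local = G[:, in_window]
    # Center each SNP and drop monomorphic SNPs, which carry no information.
    local = local - local.mean(axis=0)
    local = local[:, local.std(axis=0) > 0]
    # PCA through the singular value decomposition of the centered matrix.
    U, S, _ = np.linalg.svd(local, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs]

# Hypothetical sample of two groups whose allele frequencies differ.
rng = np.random.default_rng(2)
group = np.repeat([0, 1], 100)
positions = np.arange(60) * 1000
freqs = np.where(group[:, None] == 0, 0.2, 0.8) * np.ones((1, 60))
genotypes = rng.binomial(2, freqs)

pcs = local_pcs(genotypes, positions, locus=30000, half_window=10000)
pc1_group_corr = abs(np.corrcoef(pcs[:, 0], group)[0, 1])
```

Changing `half_window` or thinning the SNPs in the window before calling the function is one way to examine how the number of SNPs affects the local PC estimates.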
This analysis does not explore whether varying the size of the local PC region would alter the ancestry estimates or change the observed associations between significant SNPs and height. We also do not explore the optimal number of PCs needed to account for ancestry, whether local or global, and extraneous PCs in the model may cause additional loss of power. Certainly, the first global and local PCs are the most correlated and capture most of the ancestral information; however, adjusting for both local and global ancestry may result in statistical over-adjustment and a potential loss of statistical power. Previous studies suggest that the largest 10 PCs are able to capture the ancestry in most of the current world populations [9]. We also note that studying a population with more stratification (e.g., African-Americans) may offer more insight into the comparison of global versus local ancestry adjustment.