We developed a novel method to systematically evaluate the directional differentiation of Risk Allele Frequencies (RAF) of ensembles of cross-ethnic SNPs for 12 common diseases across 11 populations from the HapMap project and 53 indigenous populations from the HGDP project. We found that type 2 diabetes (T2D) demonstrated significant differentiation of RAF among diverse populations, compared with European frequency-matched control genomic alleles and with risk alleles for other diseases (Figure S4). T2D showed the most extreme differentiation among the 12 common diseases, regardless of whether we used cross-ethnic SNPs that had been replicated in five different populations or SNPs that had been replicated in two studies (Figures S5). This extreme differentiation arises because all T2D risk alleles share a consistent pattern of gradually decreasing population frequencies from Sub-Saharan Africa through Europe to East Asia (Figure S2).
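The directional pattern described above can be illustrated with a minimal sketch. It is not the method developed in this study, which is not specified in this section; it only checks, per SNP, whether the risk allele frequency strictly decreases along the Sub-Saharan Africa, Europe, East Asia axis. The SNP labels, the frequency values, and the function names are hypothetical placeholders, not data from the analysis.

```python
from typing import Dict, List

# Order of regions along the axis described in the text.
REGIONS = ["Sub-Saharan Africa", "Europe", "East Asia"]


def is_strictly_decreasing(freqs: List[float]) -> bool:
    """Return True if each frequency is lower than the one before it."""
    return all(a > b for a, b in zip(freqs, freqs[1:]))


def fraction_decreasing(raf_by_snp: Dict[str, List[float]]) -> float:
    """Fraction of SNPs whose RAF strictly decreases across REGIONS."""
    if not raf_by_snp:
        return 0.0
    hits = sum(is_strictly_decreasing(f) for f in raf_by_snp.values())
    return hits / len(raf_by_snp)


# Hypothetical RAF values (illustrative only, not from the study),
# listed in the same order as REGIONS.
example_raf = {
    "snp_a": [0.80, 0.55, 0.30],  # decreases at every step
    "snp_b": [0.65, 0.60, 0.20],  # decreases at every step
    "snp_c": [0.40, 0.45, 0.35],  # rises from Africa to Europe
}

share = fraction_decreasing(example_raf)
print(f"{share:.2f} of SNPs show a strictly decreasing RAF")
```

A fuller analysis would compare such a statistic against frequency-matched control alleles, as the text describes, rather than interpreting the raw fraction on its own.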
This phenomenon, in which T2D risk alleles decreased in frequency as humans migrated, suggests several potential explanations. One likely cause is adaptation to the disparities in agricultural development across continents. It has been previously reported that some T2D SNPs have higher risk allele frequencies in populations where cereals are the main dietary component, and that observed risk allele frequencies might be related to historical events, such as the dispersal out of Sub-Saharan Africa to regions with different climates and the adoption of more specialized, often less diverse, diets (i.e., farming and animal husbandry vs. foraging). There were three major events in human evolution: early migration from 200 KYA to 10 KYA, the agricultural revolution and population expansion from 10 KYA to 4 KYA, and New World discovery and the associated mass migration and admixture after 4 KYA. The significantly decreased frequencies of T2D risk alleles in East Asia might be caused by the agricultural revolution, including the cultivation of white rice and the production of pork in China. A related explanation stems from the thrifty genotype hypothesis, which asserts that a predisposition to insulin resistance may have protected individuals during periods of food deprivation by reducing muscle utilization of glucose and favoring glucose utilization in organs, such as the brain, that operate through an insulin-independent mechanism. Combining these two related explanations, we speculate that the decreasing T2D risk allele frequencies are caused by the promotion of energy storage and usage appropriate to environments and insistent energy intakes.
Another possibility is that T2D is rooted in a mismatch between our genetics and our environment, with food being a significant environmental factor. As humans migrated, environmental change may have led to a mismatch between genetics and available diet, and put positive evolutionary pressure on the frequencies of T2D protective alleles. Therefore, the decreasing T2D risk allele frequencies are expected, whereas the other diseases appear unusual given the underlying demographic history. Future evolutionary analysis of these T2D SNPs may provide some insight into the origin of this pandemic disease, as will more population-specific genetic studies.
Having shown the extreme differentiation of T2D RAF, we further combined the effect sizes from all independent risk variants and calculated a Predicted Genetic Risk (PGR) for each of 1,397 individuals in the HapMap3 project. T2D showed the most significant population differentiation among 40 diseases, after correcting for control genomic genotypes. We identified a consistent pattern of high PGR in African populations and low PGR in Asian populations, regardless of whether we used ethnic-specific SNPs, validated risk scores, or different genotyping/sequencing technologies. Our results indicate that there is indeed a differential T2D genetic risk across populations on different continents. The distributions we have found are very similar to those in a recent report measuring 19 common variants in populations from five continents, with the highest risk in the African populations and the lowest risk in the East Asian populations.
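To make the idea of combining effect sizes across independent risk variants concrete, the following is a minimal sketch of one common approach: an additive score on the log-odds scale. This section does not state the exact PGR formula, so this should be read as an assumption rather than the calculation used in the study. The SNP labels, odds ratios, genotype, and function name are hypothetical.

```python
import math
from typing import Dict


def additive_log_odds_score(
    risk_allele_counts: Dict[str, int],
    odds_ratios: Dict[str, float],
) -> float:
    """Sum of (risk allele count * log odds ratio) over the scored SNPs.

    Assumes independent variants and a per-allele multiplicative effect.
    SNPs missing from the genotype contribute nothing to the score.
    """
    score = 0.0
    for snp, odds_ratio in odds_ratios.items():
        count = risk_allele_counts.get(snp, 0)  # 0, 1, or 2 risk alleles
        score += count * math.log(odds_ratio)
    return score


# Hypothetical per-allele odds ratios and one individual's genotype
# (illustrative values only, not taken from the study).
example_odds_ratios = {"snp_a": 1.2, "snp_b": 1.1}
example_genotype = {"snp_a": 2, "snp_b": 1}

log_score = additive_log_odds_score(example_genotype, example_odds_ratios)
# Exponentiating gives the combined odds relative to carrying no risk alleles.
print(f"combined relative odds: {math.exp(log_score):.3f}")
```

Comparing such scores between populations would still require the correction against control genomic genotypes that the text describes.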
The populations examined in this study are distributed broadly around the world, representing a wide range of environmental exposures and lifestyles. Hence, it is challenging to associate the increased prevalence of risk-associated alleles with actual manifestations of T2D, which we know to be heavily influenced by environmental factors. However, studies in England and the United States have consistently shown that individuals with African ancestry have increased diabetes rates relative to their neighbors of European or East Asian ancestry, while those with Chinese ancestry had a lower incidence than others in a recent 10-year Canadian study. At the same time, citizens in China have a higher prevalence of T2D within their own country. Disparities in T2D rates may be attributed to social, cultural, and economic differences or to possible genetic confounders such as admixture of ancestral ethnicities, though our results suggest that differential genetics may indeed play some role in these differences in incidence rates.
We also found that African populations had higher PGR for prostate cancer than other populations. Epidemiological data from the Centers for Disease Control and Prevention from 1999 to 2007 show that the incidence of prostate cancer is 1.56 times higher in African Americans than in white Americans. Further investigation of the genetic reasons behind the observed disparity in disease incidence rates across ethnic/racial groups might inform personalized medicine approaches to reduce these health disparities.
Many challenges remain in evaluating the population differentiation of RAF and PGR. Foremost, many of the SNPs identified from genome-wide association studies (GWASs) are tag SNPs and are therefore not assumed to be causal. However, each of the 12 cross-ethnic T2D SNPs shares the same risk allele and similar effect sizes across 34 different studied populations (Figure S1, Table S2), suggesting that they are the best representatives of the causal alleles based on the current data. The consistent observation of differential T2D genetic risk with different SNPs, risk scores, and technologies suggests that the finding will remain valid as new causal variants are identified, but this remains a hypothesis that needs to be tested in the future. Second, we acknowledge that we adopted a relaxed p value cutoff of p < 1×10⁻⁶ to identify cross-ethnic SNPs for a wide variety of diseases for comparison. With more GWASs in diverse population groups, a more rigorous cutoff and ethnicity-specific effects should be used. Third, there may be some ethnic-specific gene-environment interactions. Fourth, our observed disparity of PGR between population groups might be related to disparities in the application of modern genetic tools to study diseases across ethnicities. Finally, a large component of heritable risk is still missing for most common diseases, and is consequently missing from our analysis here. Future GWASs and sequencing studies on different ethnic groups under diversified environmental conditions will likely further reveal and illustrate the origins of complex diseases.
In conclusion, we found that T2D risk alleles demonstrated extreme differentiation compared with those of other diseases, with population frequencies decreasing from Sub-Saharan Africa through Europe to East Asia. These patterns may contribute to the observed disparity in T2D incidence rates across ethnic populations worldwide.