Different populations suffer from different rates of obesity and type-2 diabetes (T2D). Little is known about the genetic or adaptive component, if any, that underlies these differences. Given the cultural, geographic, and dietary variation that accumulated among humans over the last 60,000 years, we examined whether loci identified by genome-wide association studies for these traits have been subject to recent selection pressures. Using genome-wide SNP data on 938 individuals in 53 populations from the Human Genome Diversity Panel, we compare population differentiation and haplotype patterns at these loci to the rest of the genome. Using an “expanding window” approach (100 to 1,600 kb) for the individual loci as well as the loci as ensembles, we find a high degree of differentiation for the ensemble of T2D loci. This differentiation is most pronounced for East Asians and sub-Saharan Africans, suggesting that these groups experienced natural selection at loci associated with T2D. Haplotype analysis suggests an excess of obesity loci with evidence of recent positive selection among South Asians and Europeans, compared to sub-Saharan Africans and Native Americans. We also identify individual loci that may have been subjected to natural selection, such as the T2D locus, HHEX, which displays both elevated differentiation and extended haplotype homozygosity in comparisons of East Asians with other groups. Our findings suggest that there is an evolutionary genetic basis for population differences in these traits, and we have identified potential group-specific genetic risk factors.
Obesity and type-2 diabetes (T2D) are major health concerns, and are of growing concern across most human populations (Deitel, 2003; Seidell, 2000). The rates, types and consequences of T2D and obesity have been found to differ between populations (Diamond, 2003; Goran, 2008; Haslam et al., 2005; Seidell, 2000; Wang et al., 2007), possibly reflecting genetic differences between these populations (Cheng et al., 2009; Fernandez et al., 2003; Tang et al., 2006; Williams et al., 2000). It is still unclear whether these putative genetic differences arose due to natural selection, and what specific genetic factors, if any, are responsible for these population differences.
Several evolutionary hypotheses have been proposed to explain the disproportionate burden of obesity and T2D among some populations (Benyshek et al., 2001; Benyshek et al., 2006; Diamond, 2003; Hancock et al., 2008; Neel, 1962; Wells, 2007). These range from environmentally induced epigenetic modifications, to longer term adaptation to different climates and/or subsistence modes. For instance, current-day genetic susceptibility to obesity and T2D is often attributed to specific selective pressures in our evolutionary past, favoring behavioral or physiological traits that buffered our ancestors from starvation in times of food shortage (Neel, 1962). Since the spread of modern humans to other parts of the globe, some groups have adopted sedentary agricultural lifestyles, whereas others have maintained a hunter-gatherer lifestyle. This, along with other ecological factors (e.g., climate, geography, diet), may have resulted in selection pressures in some populations for or against behavioral or physiological parameters associated with greater body weight and glucose regulation. Although it is often hypothesized that these population differences arose due to different selective pressures, there is very little genetic evidence to support this argument (Hindorff et al., 2009; Myles et al., 2008; Pickrell et al., 2009; Southam et al., 2009).
With the convergence of recent increases in genotyping capacity, the ability to conduct large-scale genome-wide association studies (GWAS), worldwide panels of DNA samples, and methodological advances enabling the detection of natural selection from genetic data, these hypotheses are becoming more easily testable. We therefore examined the worldwide pattern of differentiation and recent natural selection in the regions surrounding a total of 32 GWAS-identified risk alleles for obesity and T2D (16 each), using genome-wide SNP data on the individuals in the HGDP-CEPH panel of world-wide populations (Li et al., 2008). We hypothesize that humans have undergone selection pressures within the last 60,000 years, related to body weight regulation and blood glucose regulation, resulting in population differentiation and unusually extended haplotypes at these loci. We also hypothesize that these signatures of selection will be restricted to specific groups. By using several overlapping window sizes for the FST analysis (100 to 1,600 kb), we also hypothesize that, where differentiation has occurred, its extent will decay as the window size gets larger. This pattern is expected because if selection is acting on or near the locus of interest, but not in nearby regions, then the pattern of between-population differentiation will be broken down by recombination at greater distances from the region under selection.
We used the publicly available data from the HGDP/CEPH collection of 938 individuals from 53 different populations, genotyped at over 650,000 SNPs on the Illumina 650Y platform, and phased using fastPHASE (Li et al., 2008). Following recent publications on this dataset (Myles et al., 2008; Pickrell et al., 2009) we decided to group individual populations into seven broad geographical regions: Sub-Saharan Africa (n=102), Middle East (n=160), South Asia (n=200), Europe (n=156), East Asia (n=229), Oceania (n= 28), and America (n= 63). A listing of the populations in each group can be found in the Supplementary Material.
The set of loci utilized in this study was determined from published reviews on obesity and T2D GWASs (Florez, 2008; Hofker et al., 2009; O'Rahilly, 2009; Walley et al., 2009), and risk alleles were obtained from the online Catalog of Published Genome-Wide Association Studies (http://www.genome.gov/gwastudies/; accessed 10/25/2009). We examined a total of 28 variants in 16 obesity-associated loci and a total of 30 variants in 16 T2D-associated loci. Chimpanzee and Macaque reference alleles were obtained from the UCSC Genome browser (Kuhn et al., 2009). We examined the frequency of each obesity and T2D risk allele in each of the 7 population groups. A listing of the SNPs examined can be found in Supplementary Tables 1 and 2.
FST partitions the total genetic variance into within- and between-population components, thereby quantifying the extent of population differentiation. An elevated FST at a given locus suggests that selection has driven differentiation between populations. Previous research has shown that single allele estimates of FST are highly variable and may therefore be unreliable indicators of differentiation at a genomic locus (Gardner et al., 2007; Weir et al., 2005). Therefore, we considered that a more conservative approach would be to calculate an average of FST values for all SNPs contained in varying window sizes, from 100 kb to 1.6 Mb, each centered on the obesity and T2D-associated SNP. In order to account for differences in recombination patterns across the genome, we also performed these analyses with cM (centimorgan) distance (0.1 to 1.6 cM) instead of kb distance. Although we are examining overlapping windows, we expect that by “zooming out” by a factor of two, we would see a slow decay in FST much as haplotype-based tests of selection test for a slow decay of haplotypes. In some cases, several risk SNPs have been identified in the same region, usually concentrated in a relatively small region (<50 kb), so we defined the window of interest based on the center-most SNP. For the 100 kb windows, the number of SNPs contained in a window ranges from 7 to 42 SNPs. To calculate FST, we used the method of Weir and Cockerham (Weir et al., 1984; Weir et al., 2002). We calculated a single global estimate of FST (based on all 7 population groups), as well as all 21 pair-wise estimates of FST. FST was not calculated for a SNP if it was monomorphic in the groups being compared. Negative values of FST were given a value of 0 since negative values are biologically meaningless.
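The expanding-window averaging described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: for brevity it swaps in a simple Hudson-style pairwise estimator instead of the Weir and Cockerham method, and the function names, allele frequencies, sample sizes, and positions are all hypothetical. It does keep the two conventions stated above: monomorphic SNPs are skipped and negative estimates are set to 0.

```python
def hudson_fst(p1, n1, p2, n2):
    """Per-SNP pairwise FST (Hudson-style), a stand-in for Weir and Cockerham.

    p1, p2 are allele frequencies; n1, n2 are numbers of sampled chromosomes.
    Returns None when the SNP is monomorphic across the two groups.
    """
    denom = p1 * (1 - p2) + p2 * (1 - p1)
    if denom == 0:  # both groups fixed for the same allele
        return None
    numer = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    return max(0.0, numer / denom)  # negative estimates are set to 0


def window_mean_fst(snps, center_bp, width_bp):
    """Mean FST over SNPs lying within width_bp / 2 of center_bp.

    snps is a list of (position_bp, fst) pairs; entries with fst None are skipped.
    """
    half = width_bp / 2
    values = [f for pos, f in snps if f is not None and abs(pos - center_bp) <= half]
    return sum(values) / len(values) if values else None


def expanding_windows(snps, center_bp, widths_kb=(100, 200, 400, 800, 1600)):
    """Mean FST for each window width, doubling from 100 kb to 1,600 kb."""
    return {w: window_mean_fst(snps, center_bp, w * 1000) for w in widths_kb}


# Hypothetical SNPs around a risk SNP at position 1,000,000.
toy = [
    (1_000_000, hudson_fst(0.9, 200, 0.1, 200)),  # strongly differentiated
    (1_030_000, hudson_fst(0.8, 200, 0.2, 200)),
    (1_150_000, hudson_fst(0.5, 200, 0.5, 200)),  # negative estimate, set to 0
    (1_600_000, hudson_fst(1.0, 200, 1.0, 200)),  # monomorphic, skipped
]
profile = expanding_windows(toy, center_bp=1_000_000)
for width_kb, value in profile.items():
    print(width_kb, round(value, 3))
```

In this toy example the mean FST is highest in the narrowest windows and drops once the weakly differentiated flanking SNP enters the window, which is the decay pattern the windowed analysis looks for.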
To control for population-specific demographic effects on the genome, we compared the FST in the risk window to a null distribution of random windows. Random windows were chosen in the following way. For each risk locus, we randomly chose 1000 equally sized (in bp) windows along the same chromosome, with similar genic/non-genic content (± 10%), since FST tends to be slightly higher for genic SNPs (Coop et al., 2009). The genic/non-genic classification was performed according to the annotation provided by Sullivan et al. (https://slep.unc.edu/evidence/?tab=Downloads) which classifies a SNP based on whether it is in the transcribed region of a gene. SNP annotations were created using the TAMAL database (Hemminger et al., 2006) based chiefly on UCSC genome browser files (Hinrichs et al., 2006), HapMap (Altshuler et al., 2005), and dbSNP (Wheeler et al., 2006). For the 7-way FST as well as all 21 possible pair-wise FST values, we obtained percentile ranks of the obesity and T2D SNP-centered window, compared to the 1000 randomly centered windows. For testing each of the risk loci separately, we used a Bonferroni correction for multiple testing. A p-value cutoff of 0.0031 (0.05/16; the 99.69th percentile) keeps the family-wise type I error rate at 0.05. Since we are interested in examining the 16 obesity and 16 T2D regions as ensembles, as well as each risk region separately, we first averaged the 16 percentiles of the 7-way FST. In order to obtain a group-specific FST percentile for a given group, which we will refer to as GSFST, we averaged the FST percentiles of the six pair-wise comparisons that contain the group in question, and averaged the GSFST percentiles over all 16 loci to examine the loci as ensembles. In order to determine whether this average is an outlier, we simulated a null distribution by generating a random number between 0 and 1 representing the FST percentile rank for each locus and window size, and averaged these over 16 simulated loci. 
The percentile ranks of FST are distributed uniformly, hence the uniform distribution of random numbers between 0 and 1 is a suitable representation of the distribution of the FST percentile ranks. We repeated this averaging 10,000 times and determined the 95th, 97.5th, and 99th percentile cut-off values.
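The ensemble test described above can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function names, the random seed, and the dictionary layout for pair-wise percentiles are hypothetical. It shows the percentile rank against random windows, the GSFST average over the pair-wise comparisons involving one group, the Bonferroni cutoff, and the simulated null distribution for the mean of 16 uniform percentile ranks.

```python
import random


def percentile_rank(observed, null_values):
    """Fraction of random windows whose FST is at or below the observed FST."""
    return sum(v <= observed for v in null_values) / len(null_values)


def group_specific_percentile(pairwise, group):
    """GSFST percentile: mean over the pair-wise comparisons involving `group`.

    pairwise maps frozenset({group_a, group_b}) to an FST percentile rank.
    """
    values = [p for pair, p in pairwise.items() if group in pair]
    return sum(values) / len(values)


def null_ensemble_cutoffs(n_loci=16, n_reps=10_000,
                          quantiles=(0.95, 0.975, 0.99), seed=1):
    """Cut-offs for the mean of n_loci uniform(0, 1) percentile ranks."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    means = sorted(
        sum(rng.random() for _ in range(n_loci)) / n_loci
        for _ in range(n_reps)
    )
    return {q: means[min(n_reps - 1, int(q * n_reps))] for q in quantiles}


# Bonferroni cutoff for testing 16 loci separately at a family-wise 0.05.
bonferroni_cutoff = 0.05 / 16

cutoffs = null_ensemble_cutoffs()
for q, value in cutoffs.items():
    print(q, round(value, 3))
```

With 16 loci, the simulated 99th-percentile cut-off should land close to the 66.8 threshold quoted in the results, up to simulation noise.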
In order to estimate variances of the percentile ranks, we used bootstrapping to generate confidence intervals on the FST estimates and subsequent percentiles. We generated 1000 bootstrap samples, calculated FST for each, and examined the 95% confidence intervals for the GSFST percentiles. Due to computational limitations we restricted this analysis to a test case (HHEX) in order to get a general idea of the variance in our estimate.
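A percentile bootstrap of this kind can be sketched as below. The text does not specify the resampling unit or the interval construction, so this sketch assumes resampling of observations with replacement and a simple percentile interval; the function names and the toy values are hypothetical.

```python
import random


def bootstrap_ci(sample, statistic, n_boot=1000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for statistic(sample)."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    n = len(sample)
    stats = sorted(
        statistic([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def mean(values):
    return sum(values) / len(values)


# Hypothetical percentile values for two groups at one window size.
group_a = [0.90 + 0.001 * i for i in range(50)]
group_b = [0.40 + 0.001 * i for i in range(50)]
ci_a = bootstrap_ci(group_a, mean)
ci_b = bootstrap_ci(group_b, mean)
print(ci_a, ci_b, intervals_overlap(ci_a, ci_b))
```

Non-overlapping 95% intervals are a conservative indication that two groups differ, which is how the HHEX test case is read in the results.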
Extended haplotype homozygosity (EHH) is defined as the probability that two randomly chosen chromosomes carrying the core haplotype of interest are identical by descent, and the relative EHH (REHH) is the factor by which EHH decays on the tested core haplotype compared to that of other core haplotypes combined (Sabeti et al., 2002). The REHH thus corrects for local variation in recombination rates. We obtained REHH values using Sweep software v1.1 (Sabeti et al., 2002), (downloaded from http://www.broadinstitute.org/mpg/sweep). Using the same phased haplotype data as above, we examined REHH at haplotypes containing the obesity risk SNP, and all haplotypes contained in the surrounding 400 kb region (200 kb in either direction) of each risk SNP. Core haplotypes were defined according to the definition of a haplotype block in Gabriel et al. (Gabriel et al., 2002), and REHH was measured 300 kb in either direction of each core. For each region and each population group, we compared the REHH in the risk SNP region to the entire chromosome on which the risk SNP resides, to determine if the candidate region contains haplotypes with exceptionally high REHH, binning by haplotype frequency. Empirical significance was therefore determined separately in each of 20 bins of core haplotype frequency. We only considered core haplotypes with frequency greater than 5%. We counted instances of extreme REHH (p<0.01, and p<0.001) values for each gene region and for each population group. We used a generalized version of Fisher's exact test to determine if there are differences among groups in the number of loci with at least one extreme REHH value (p<0.01). We then tested specific pair-wise comparisons, using a two-sided Fisher's exact test, and corrected for multiple testing using a False Discovery Rate method as implemented in SAS (Cary, NC). 
We also noted instances in which a risk SNP was in the core of a haplotype with an REHH above the 95th percentile, and whether that core contained the risk or the non-risk allele.
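To make the EHH and REHH definitions concrete, the sketch below computes both for toy phased haplotypes. It is an illustration only and does not reproduce the Sweep software: haplotypes are represented as hypothetical strings of 0/1 alleles, the core and extended regions are given as Python slices, and identity by descent is approximated by identity of the haplotype strings over the extended region. REHH is taken here as the EHH of the tested core divided by the pooled EHH of all other cores, which is one common reading of the definition.

```python
from collections import Counter
from math import comb


def identical_pairs(haplotypes):
    """Return (pairs with identical haplotypes, total pairs)."""
    counts = Counter(haplotypes)
    same = sum(comb(c, 2) for c in counts.values())
    return same, comb(len(haplotypes), 2)


def ehh(chromosomes, core_slice, core, extent_slice):
    """Probability that two random carriers of `core` match over extent_slice."""
    carriers = [c[extent_slice] for c in chromosomes if c[core_slice] == core]
    same, total = identical_pairs(carriers)
    return same / total if total else None


def rehh(chromosomes, core_slice, core, extent_slice):
    """EHH of `core` relative to the pooled EHH of all other core haplotypes."""
    target = ehh(chromosomes, core_slice, core, extent_slice)
    same_other = total_other = 0
    for other in {c[core_slice] for c in chromosomes} - {core}:
        carriers = [c[extent_slice] for c in chromosomes if c[core_slice] == other]
        same, total = identical_pairs(carriers)
        same_other += same
        total_other += total
    if target is None or same_other == 0:
        return None  # undefined when the comparison EHH is zero or missing
    return target / (same_other / total_other)


# Hypothetical phased chromosomes: the first two sites form the core.
chromosomes = [
    "1100", "1100", "1100", "1100",  # core "11": identical extended haplotypes
    "0010", "0001", "0010", "0011",  # core "00": diverse extended haplotypes
]
core_slice, extent_slice = slice(0, 2), slice(0, 4)
ehh_core = ehh(chromosomes, core_slice, "11", extent_slice)
rehh_core = rehh(chromosomes, core_slice, "11", extent_slice)
print(ehh_core, rehh_core)
```

A core whose carriers remain identical far from the core while other cores have already diversified yields a high REHH, the signature that the empirical percentile comparison is designed to flag.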
XP-EHH refers to a cross population comparison of EHH and is generally more powerful for detecting selection events that have gone to fixation (Sabeti et al., 2007). This method may therefore complement EHH and/or FST by detecting genomic regions that have experienced older selection events than those detectable by EHH. To determine XP-EHH for each obesity risk region, we used the HGDP Selection Browser (http://hgdp.uchicago.edu/cgi-bin/gbrowse/HGDP/). We entered the risk SNP, viewed the surrounding 500 kb window, and noted which population groups had XP-EHH that exceeded 2.5 (the −log10 of the p-value for a window centered at the SNP) at least once in that 500 kb region. This value reflects the degree to which a SNP is an outlier compared to the rest of the genome. As described by Pickrell et al. (Pickrell et al., 2009) since XP-EHH is the comparison of EHH between populations, the comparisons used were between Bantu (sub-Saharan Africa) and each of the non-African groups, whereas Europe was used as the reference for the Bantu group.
Out of 28 obesity risk alleles, 14 are derived since they differ from the reference allele in Chimpanzee (and Macaque, in most cases) (see Supplementary Table 1). Out of the 30 T2D risk alleles, 14 are derived (see Supplementary Table 2). We also find that the frequencies for some of the risk alleles differ substantially between populations, especially in FTO, SH2B1, NEGR1, and KCTD15 among the obesity SNPs, and HHEX, THADA, and KCNQ1 among the T2D SNPs (see Supplementary Table 1).
The percentile for the 7-way FST, averaged separately over the 16 obesity and 16 T2D loci, is shown in Figure 1 for varying window sizes. We find that, as an ensemble, the degree of differentiation at the 16 obesity loci slightly exceeds the 95th percentile (see Figure 1). The ensemble of T2D loci exceeds the 99th percentile for 100 kb windows, and the 95th percentile for 200–800 kb windows (see Figure 1). For the 100 kb window of T2D loci, the FST percentiles for 4 out of 16 loci are between the 90th and 100th percentile (TSPAN8, TCF7L2, HHEX, KCNQ1), and 4 are between 80th and 90th percentile. Therefore half of the 16 loci are above the 80th percentile. The rest of the loci lie above the 31st percentile, resulting in an average FST percentile of 69.4, which is higher than the 66.8 threshold representing the 99th percentile of the null distribution. Notably, we find that the degree of differentiation generally decays as the window size gets larger, further suggesting the localized action of natural selection. Upon examining the mean GSFST percentiles across each set of loci, we find evidence of elevated levels of differentiation (i.e. mean of 16 loci is above 50th percentile) at the ensemble of obesity loci (see Figure 2A) for most groups, although none reach the 95th percentile. Among the T2D loci, we find that sub-Saharan Africans are highly differentiated (exceeding the 95th percentile) from other groups for the 100 kb window size. East Asians are highly differentiated from other groups for all window sizes and their mean GSFST percentile exceeds the 95th percentile for all window sizes greater than 100 kb (see Figure 2B). The high level of differentiation between East Asians and other groups, and between sub-Saharan Africans and other groups, as shown in Figure 2B, is responsible for the pattern we observe for the 7-way FST for T2D loci, as shown in Figure 1. 
For all above analyses, we find similar results using cM distance as opposed to kb distance, with the main exception being that the GSFST for East Asians at the set of T2D loci exceeds the 95th percentile only for the 0.1 and 0.8 cM windows.
We next examined patterns of differentiation among the specific loci. None of the loci exhibit empirical p-values that reach the statistical significance threshold after Bonferroni correction. The regions that are above the 95th percentile for the overall 7-way FST for at least one window size are NEGR1 (100 kb window), FTO (200–400 kb window), FAIM2 (800–1600 kb window), SH2B1 (200–400 kb window), and for T2D: TSPAN8 (100–400 kb), PPARG (800 kb), HHEX (1600 kb), and WFS1 (1600 kb) (data not shown). Using cM instead of kb distance, we observe the same pattern for FAIM2 in which the global FST reaches statistical significance (100th percentile) for the 0.8 cM window. For the other obesity loci mentioned above, we observe similar patterns, although not reaching the 95th percentile. In addition, we observe that the global FST at the 1.6 cM window surrounding the PTER locus nearly reaches statistical significance (99.6th percentile) according to the Bonferroni correction, and that the 0.4 cM window of KCTD15, and the 1.6 cM window surrounding NPC1 exceed the 95th percentile. Among the T2D loci, we find similar results using cM-based distance with the exception that PPARG and TSPAN8 do not quite exceed the 95th percentile, JAZF1 exceeds the 95th percentile for the 0.4 and 0.8 cM windows, and TCF7L2 exceeds the 95th percentile for the 0.1 and 0.2 cM windows.
Finally, we examined GSFST (group-specific FST) for specific loci. The regions surrounding the NEGR1 and PTER loci are highly differentiated (above 95th percentile in some windows) in sub-Saharan Africans (see Figure 3 and Supplementary Figure 1). The region surrounding SEC16B appears to have undergone more differentiation among Oceanians than among other groups (see Supplementary Figure 2). Also of note among obesity loci is that the region surrounding NCR3 has universally experienced little differentiation, especially for sub-Saharan Africans, Oceanians, and East Asians (see Supplementary Figure 2). Among T2D loci, we find that HHEX and THADA are highly differentiated between East Asians and other groups (see Figure 4). Using bootstrap re-sampling, we find no overlap between the 95% confidence interval for East Asians and that of any other group for the 200 kb window size surrounding HHEX, suggesting that this magnitude of difference is statistically significant. We also observe that the region surrounding CDC123 appears to have undergone more differentiation among Oceanians than other groups (see Supplementary Material 3). For the above analyses, we observe similar results using cM-based distance.
First, we counted all instances in which the SWEEP-identified haplotypes in the 400 kb surrounding each risk SNP had REHH values that were in the top 1% for each population group and for the respective chromosome (see Tables 1 and 2). We also counted all instances where a risk allele was contained in a haplotype that was in the top 0.1% of the respective chromosome (see Supplementary Table 3). Among the obesity loci, we find that there is a significant difference among groups in the number of loci with at least one REHH value above the 99th percentile (p=0.0066; see Table 1). Pair-wise comparisons show that South Asians (9) and Europeans (7) have more loci having at least one 99th percentile REHH value (with haplotype frequency >0.05) than Sub Saharan Africans (1) and Native Americans (1), although this difference is not statistically significant (p=0.059 for South Asians; p=0.019 for Europeans) after correction for 21 possible two-sided pair-wise comparisons. However, we consider this statistical test to be conservative because we do not take into account the fact that at several loci, South Asians and Europeans exhibit single loci with anywhere from 2 to 9 extreme REHH values. Among the T2D loci, we find no significant differences among groups in the number of loci having at least one 99th percentile REHH value (p=0.15) (see Table 2). We also examined instances in which the risk SNP is in the core of a haplotype that has an extreme REHH value (top 5%), and noted whether the risk or non-risk allele is contained in that particular core. For the obesity loci, among the 21 cases in which we observe such a haplotype in a given group and for a given locus, 15 of the cores of these haplotypes contain the risk allele, while the rest contain the non-risk allele. For the T2D loci, among the 7 cases, 5 haplotype cores contain the risk allele (see Supplementary Tables 4 and 5). 
This result suggests that there has been recent positive selection for variants that are associated with higher body weight and insulin resistance.
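The pair-wise group comparisons reported above rest on a two-sided Fisher's exact test, which can be sketched without external libraries. The counts below (9 of 16 obesity loci with an extreme REHH value in South Asians versus 1 of 16 in sub-Saharan Africans) come from the text; the function itself is a generic textbook implementation with a hypothetical name, not the SAS procedure used in the study, and it returns an unadjusted p-value with no False Discovery Rate correction, so it is not expected to reproduce the corrected values reported here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def weight(x):
        # Unnormalised hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    extreme = sum(
        weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed
    )
    return extreme / total


# Loci with at least one extreme REHH value, out of 16 obesity loci per group.
south_asia_hits, africa_hits, n_loci = 9, 1, 16
p_unadjusted = fisher_exact_two_sided(
    south_asia_hits, n_loci - south_asia_hits,
    africa_hits, n_loci - africa_hits,
)
print(p_unadjusted)
```

Working with integer weights and dividing only once at the end avoids floating-point ties when deciding which tables count as at least as extreme as the observed one.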
Using the HGDP genome browser, we find very few instances of outlier XP-EHH in the regions surrounding the obesity risk alleles. There does not appear to be an excess of instances of elevated XP-EHH in any particular population group (see Supplementary Table 6A). On a gene by gene basis, there does appear to be an excess of elevated XP-EHH in the MAF, PTER and MC4R regions (see Supplementary Table 6B).
For the T2D-associated regions, we find a slight excess of elevated XP-EHH among East Asians, specifically at KCNJ11, HHEX, and THADA, although the difference between groups is not statistically significant (Supplementary Table 6B). The region surrounding KCNJ11 appears to have extended haplotypes in all groups compared to sub-Saharan Africans (Bantu). THADA also shows evidence of extended haplotypes among Native Americans.
We have tested the hypothesis that some groups of humans have recently experienced more evolutionary change at loci found to be associated with obesity and T2D compared to the rest of the genome. We have examined FST, a measure of population differentiation, and measures of shared extended haplotypes indicative of recent positive selection on new variation. Although our findings are not entirely consistent across tests, they have uncovered general as well as population- and gene-specific patterns.
First, with respect to the derived vs. ancestral status of the risk alleles, we find no evidence that the risk alleles tend to be either ancestral or derived for either the obesity or the T2D loci. We expected that if the thrifty genotype hypothesis applied specifically to the entire human species as an outlier among other primates, a majority of risk alleles would be derived. However, it is difficult to make any firm conclusions on the basis of this finding, since we are only considering markers that are still polymorphic in humans, and since the risk alleles that are reported in GWASs are unlikely to be the causative alleles, and are instead likely to only be associated with the causative variants.
Given the above-mentioned limitation, and the fact that these specific variants have been found to explain a very small proportion of the expected genetic variance (Hofker et al., 2009; Willer et al., 2009), we chose to examine average FST and haplotype patterns in the surrounding regions of each reported risk SNP (up to 800 kb in either direction). This enabled us to take into account more of the variation that is associated with any particular SNP, and may give some indication as to the timing and strength of selection. For example, depending on the population, an elevated GSFST that extends over a long stretch of DNA may indicate more recent positive selection. Also, averaging FST over many SNPs in a region may be a more sensitive approach given the highly variable nature of FST across neighboring loci.
We have found that the regions harboring T2D loci, as an ensemble, have experienced unusually high levels of differentiation compared to random regions of the genome, as assessed by the genotyped SNPs. Differentiation decays with distance as expected. Obesity loci, as an ensemble, also show unusually high levels of differentiation, but to a lesser extent than T2D loci. Our results further suggest that East Asians and sub-Saharan Africans have experienced higher levels of group-specific differentiation than other groups at the ensemble of T2D loci. We also find, as expected, that the degree of differentiation quickly decays with larger window sizes for sub-Saharan Africans, given overall reduced LD in these groups. Pickrell et al. (2009) used the same dataset to examine the single SNP with the highest FST in each T2D-associated region (within a 100 kb window) and found that sub-Saharan Africans are significantly differentiated from East Asians and Europeans at these loci. Our results confirm this finding and also uncover a high degree of differentiation among East Asians at larger window sizes. Our results also confirm the results of Pickrell et al. and others (Helgason et al., 2007; Southam et al., 2009) that the loci TCF7L2, JAZF1, and TSPAN8 show signatures of natural selection.
A high degree of differentiation is usually interpreted as being due to natural selection. However, various types of natural selection could explain any given pattern of differentiation: purifying selection in one or several groups, or positive selection in one group but not others, or in all groups except for one. These could represent one of several evolutionary/historical scenarios. One is that there was selection either for or against insulin resistance among East Asians and sub-Saharan Africans. Another possibility is that East Asians and sub-Saharan Africans underwent a relaxation of selection pressures at these genes due to a diet that did not select for insulin resistance and gluconeogenesis.
Among the T2D-associated loci, we find that HHEX is the most strongly differentiated between groups, specifically between East Asians and other groups. HHEX (hematopoietically-expressed homeobox protein) encodes a transcriptional regulator involved in pancreatic development (Bort et al., 2004). The risk allele has been found to be associated with reduced pancreatic β-cell function (Pascoe et al., 2007), and there is evidence that HHEX belongs to a highly conserved “genomic regulatory block” (Ragvin et al., 2010). Among sub-Saharan Africans, no single locus explains the overall T2D trend, suggesting that it is the effect of many moderately differentiated loci that contributes to the overall pattern for the ensemble of T2D loci.
Among the obesity loci, NEGR1 (Neuronal growth regulator 1) was found to be highly differentiated among sub-Saharan Africans. This gene has a role in neuronal outgrowth (Schafer et al., 2005) and is highly expressed in the hypothalamus (Willer et al., 2009). The region surrounding this allele has also been found to contain a large copy number polymorphism that could be a causal variant (Willer et al., 2009).
Whereas FST is most powerful for detecting selection on already standing genetic variation, present on multiple haplotype backgrounds, becoming favoured in one geographic region, REHH is most powerful to detect recent strong positive selection on a novel mutation that has reached an intermediate haplotype frequency in the population. Our results show that there are differences among groups in the number of obesity loci that show evidence of recent positive selection according to the REHH test. It appears that South Asians and Europeans exhibit more such loci compared, most notably, to sub-Saharan Africans and Native Americans.
An interesting result from the FST and REHH tests is the case of the region near NCR3. The risk SNP is near both NCR3 (natural cytotoxicity triggering receptor 3 precursor) and AIF1 (allograft inflammatory factor 1). Being in the HLA region of chromosome 6, this region appears to be highly conserved among different human populations, since it shows very little differentiation compared to the rest of the genome (see Supplementary Figure 2). However, this same region has among the strongest evidence of extended haplotypes among South and East Asian populations. In instances of extended haplotypes containing the risk SNP, we have determined that it is the risk allele that is contained in these haplotypes, suggesting that selection has recently favored variation in this region that enables individuals to avoid an energy deficit.
Finally, for the XP-EHH test, we do not observe major differences in signals between groups for the obesity loci. However, for the T2D loci, we find that East Asians exhibit more evidence of recent positive selection, most notably at HHEX and THADA, a result that converges with the extreme differentiation that East Asians exhibit at these loci.
Although we do observe some overlap of genetic and geographic regions identified by the three tests considered (FST, REHH, XP-EHH), the lack of complete overlap could be due to several factors. Haplotype-based measures, such as those based on EHH, test for a very restricted and likely rare set of cases of positive selection acting on newly arisen variation, that is relatively recent, and in which the haplotype quickly rises to an intermediate frequency in a given population (best seen by EHH) or a high frequency in one population but not another (best seen by XP-EHH). Therefore, the congruence of these various tests will depend on the type, timing, and strength of selection for each particular genetic and geographical region.
Our findings, along with other published evidence, appear to be slightly more consistent with the hypothesis that cycles of feast and famine were as or more severe among agricultural populations (Benyshek et al., 2006; Cordain et al., 1999). The REHH results among South Asians, which suggest recent natural selection favoring obesity risk alleles, are consistent with evidence of major famines in South Asia (Wells, 2007). It may be that the adoption of agriculture, along with its associated features such as sedentary life-ways, resulted in an inflexible over-reliance on a more highly variable food supply. We find that while Eurasian populations show REHH signatures of selection at several obesity loci, American and sub-Saharan African populations show signs of selection at only one locus each (Table 1). This is consistent with the fact that the relatively isolated sub-Saharan and Native American populations adopted agriculture later than Eurasian populations. These findings should be interpreted with caution since the loci that we have examined have been associated with body weight only among European or European-derived populations. If agriculture did indeed select for thrifty genes, we are left with the puzzle of explaining why rates of obesity and T2D are relatively low among individuals of European ancestry, for example. It suggests that non-genetic factors could more readily explain population differences in obesity and T2D prevalence. These could be environmental factors that are not yet well understood (Gravlee et al., 2009; McAllister et al., 2009), including infectious agents (Ley et al., 2006; Vijay-Kumar et al., 2010; Wells, 2009; Whigham et al., 2006) that would perfectly track genetic admixture proportions.
A limitation of our findings is that we have tested whether these candidate loci are outliers compared to the rest of the genome with respect to population differentiation and extended haplotype homozygosity. It is presently difficult to determine with certainty whether such outlier loci are the result of natural selection, as opposed to other evolutionary forces such as genetic drift. Another limitation of our findings, as mentioned above, is that we have examined loci that have been found to be associated with these traits among Europeans. Although several GWASs have recently been conducted in other populations (Cho et al., 2009; Liu et al., 2010; Tsai et al., 2010), there is still some uncertainty as to whether the same loci explain variation in these traits in different populations. If, as we have shown, there has been differentiation and selection at these loci, it may be that the genetic architecture of these traits is different in different populations. There may be loci that affect these traits in non-Europeans that we have not considered in this analysis. Our findings of greater evidence of thrifty genes among Eurasians may therefore be biased by the possibility that these loci are found to be associated in GWASs in Europeans, precisely because they underwent recent selection in those groups. It should also be noted that our results could be influenced by a subset of the several populations within each broader group that we are using.
In conclusion, our results have shown that genetic regions surrounding loci associated with T2D, and to a lesser extent, obesity, have been subject to unusually high levels of change in the last 50,000 to 100,000 years. Most notably, sub-Saharan Africans and East Asians appear to have undergone selection at T2D loci. Identifying specific targets of recent selection in the human genome can aid in determining population-specific risk variants, especially insofar as prevalence differs between populations (Ayodo et al., 2007). We anticipate that future studies will be at a finer scale at the population, genetic, and phenotypic levels, potentially further elucidating the genetic basis of obesity and T2D, and the population-specific genetic or non-genetic mechanisms that lead to different rates, types, and consequences of obesity and T2D.
The authors thank the individuals in the HGDP sample, Vinodh Srinivasasainagendra for computational assistance, and the UAB High Performance Computing Center. The authors also thank Nick Pajewski, Guo-Bo Chen, Nathan Wineinger, Robert Makowsky, and Charity Morgan for help with analyses. This work was funded by NIH-T32HL007457.
Funded by: NIH T32HL007457 from the National Heart, Lung, and Blood Institute
Conflict of Interest: The authors declare that they have no conflict of interest.