Recent positive selection on newly arising alleles produces a strong genetic signature: a long haplotype of unexpectedly high frequency¹³. In contrast, weak polygenic selection on standing variation acts on multiple haplotypes simultaneously¹⁴–¹⁶. As a result, the effects of polygenic adaptation on patterns of variation are generally modest and spread across many haplotypes at any one locus. To overcome this difficulty, we implemented an approach that combines evidence for selection across many loci. Specifically, we examined the single nucleotide polymorphisms (SNPs) tested in genome-wide association studies (GWAS) to identify which of the two alleles at each SNP is associated with increased trait values (the “trait-increasing allele”), and then tested these trait-increasing alleles as a group for systematic, directional differences in allele frequencies between populations. Under polygenic selection, we expect the trait-increasing alleles to have, on average, higher frequencies in the population with higher trait values than in the population with lower trait values¹⁰,¹⁷.
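The central step of this approach, orienting every GWAS SNP to its trait-increasing allele and taking that allele's frequency difference between the two populations, can be sketched in Python as below. This is a minimal illustration, assuming the GWAS reports a signed effect `beta` for one allele and that the frequency of that same allele is known in both populations; the `SnpRecord` class and `trait_increasing_differences` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SnpRecord:
    """One GWAS SNP; all field names are hypothetical."""
    snp_id: str
    effect_allele: str  # allele whose effect `beta` is reported by the GWAS
    beta: float         # signed effect of effect_allele on the trait
    freq_high: float    # effect_allele frequency in the higher-trait population
    freq_low: float     # effect_allele frequency in the lower-trait population


def trait_increasing_differences(snps):
    """Per-SNP frequency difference (high minus low) of the trait-increasing allele."""
    diffs = []
    for s in snps:
        if s.beta == 0:
            continue  # no direction, so no trait-increasing allele to orient to
        if s.beta > 0:
            hi, lo = s.freq_high, s.freq_low
        else:
            # The other allele increases the trait: use its frequencies, 1 - f.
            hi, lo = 1.0 - s.freq_high, 1.0 - s.freq_low
        diffs.append(hi - lo)
    return diffs
```

Under polygenic selection the returned differences should lean positive; the share of positive values and their mean are what the sign test and t-test summarize.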
We propose that adult height in Europe might provide an example of polygenic adaptation in humans. Northern Europeans are typically taller than Southern Europeans (Supplementary Table 1), and although nongenetic factors can produce phenotypic differences between groups¹⁸,¹⁹, we suspected that the height differences between these closely related populations might be partially explained by genetic differences due to widespread selection on standing variation. We tested this hypothesis using recent GWAS data for height generated by the Genetic Investigation of ANthropometric Traits (GIANT) consortium²⁰ and Northern- and Southern-European allele frequency estimates based on two separate datasets, MIGen²¹ and POPRES. Under this hypothesis, we expect the height-increasing allele at height-associated loci to be more frequent in Northern- than in Southern-European populations.
We first compared the Northern- and Southern-European allele frequencies of 139 variants that are known to be associated with height at genome-wide significance²⁰ and were directly genotyped in the MIGen study. We used 257 U.S. individuals of Northern-European ancestry and 254 Spanish individuals from MIGen as the Northern- and Southern-European populations, respectively (Supplementary Note and Supplementary Figure 1). We found that the height-increasing alleles are more likely to have higher frequencies in Northern than in Southern Europeans (85 out of 139, sign test p = 0.011; mean frequency difference = 0.012, t-test p = 4.3×10⁻⁴). This result was robust when compared with 10,000 sets of SNPs drawn at random from the genome, matched on a per-SNP basis to the known height SNPs by average Northern- and Southern-European allele frequency (p = 0.0056 for mean frequency difference; Online Methods). We observed similar results in an independent dataset, POPRES (Supplementary Table 2 and Supplementary Figure 2a). Thus, the group of height-increasing alleles at known associated variants is more common in Northern than in Southern Europe, indicating that the phenotypic difference between these two populations is at least partly due to genetic factors.
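The two summary tests above can be sketched as follows: an exact two-sided sign test on the number of SNPs whose height-increasing allele is more frequent in the Northern sample, and an empirical p-value against frequency-matched random SNP sets. The function names are hypothetical, the +1 correction in `empirical_p` is a common convention rather than something stated in the text, and drawing the matched SNP sets themselves is not shown.

```python
import math


def sign_test_two_sided(k, n):
    """Exact two-sided binomial sign test: k of n SNPs in one direction, p = 0.5."""
    tail = max(k, n - k)  # count on the more extreme side
    upper = sum(math.comb(n, i) for i in range(tail, n + 1))
    return min(1.0, 2 * upper / 2 ** n)


def empirical_p(observed, null_values):
    """One-sided empirical p-value: share of null statistics >= observed.

    The +1 in numerator and denominator keeps the estimate above zero.
    """
    hits = sum(1 for v in null_values if v >= observed)
    return (hits + 1) / (len(null_values) + 1)
```

With the MIGen counts, `sign_test_two_sided(85, 139)` comes out at roughly 0.011, in line with the reported value. For the matched-SNP comparison, `null_values` would hold the mean frequency difference of each of the 10,000 matched sets.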
[Caption: Comparisons of the mean allele frequency (AF) difference and the maximum likelihood estimate of s in pairwise combinations of populations across Europe.]
[Caption: Mean allele frequency difference of height SNPs, matched SNPs and genome-wide SNPs between Northern- and Southern-European populations.]
We noted that the randomly matched SNPs used as a control in this analysis also showed a subtle trend towards the height-increasing allele being more common in Northern than in Southern Europeans (mean frequency difference across 10,000 matched SNP sets = 0.0035). In fact, throughout much of the genome, the predicted height-increasing alleles are more likely to have higher frequency in Northern than in Southern Europeans (Supplementary Figure 2b). This observation suggested that, beyond the 180 known loci²⁰, many additional height-associated SNPs in the genome may reach genome-wide significance in GWAS as power improves (consistent with previous modeling²⁰,²³), and that the height-increasing alleles at these variants may further contribute to the height difference between these populations.
While there appears to be a genome-wide trend for the height-increasing allele to be the Northern-predominant allele (i.e., the allele that is more common in Northern than in Southern Europeans), we also considered confounding by ancestry as a possible explanation for this observation²⁴–²⁷. The GIANT consortium took multiple steps to control for ancestry²⁰, but if these steps were not completely effective, then SNPs with an allele frequency difference between Northern and Southern Europeans would tend to be spuriously associated with height, with the Northern-predominant allele appearing to be a height-increasing allele.
We therefore estimated the effect sizes of the Northern-predominant alleles on height in a family-based cohort (the Framingham Heart Study), using a sibship-based regression analysis that is immune to stratification (see Online Methods), and compared these estimates with those from GIANT. For the ~1,400 most strongly associated SNPs, the estimated effects of the Northern-predominant alleles on height were indistinguishable between the sibship-based test and the GIANT data set (paired t-test p = 0.36; Supplementary Figure 3). For the remaining SNPs, the average effect-size estimates from the family-based analysis fall towards zero slightly faster than the GIANT estimates (Supplementary Figure 4a). This faster decrease could be due to low power in the smaller family-based sample and/or residual stratification in the remaining GIANT data, although there is clearly a signal of true association beyond these ~1,400 SNPs (Supplementary Figures 4a, 4b). To ensure that our conclusions are not confounded by stratification, we focus our subsequent analyses on this set of ~1,400 independent SNPs. The frequency of these ~1,400 height-increasing alleles is significantly higher in Northern than in Southern Europeans, including in multiple comparisons within MIGen and within POPRES (significant by t-test in all comparisons). We also found that the frequencies in a central European population (Swiss-French from POPRES) fall between those of the Northern- and Southern-European POPRES populations. Thus, the observation that many height-increasing alleles are more common in Northern than in Southern Europeans is not explained by stratification. Rather, consistent with selection, the data suggest a small but systematic increase in the frequency of height-increasing alleles in Northern Europe and/or a decrease in frequency in Southern Europe.
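Why a sibship-based estimate resists stratification can be seen in a generic within-family regression: centring genotypes and phenotypes on their family means removes anything siblings share, including ancestry. This is a minimal sketch under that assumption, not the specific model described in the Online Methods; `within_family_slope` and its inputs are hypothetical.

```python
from collections import defaultdict


def within_family_slope(family_ids, genotypes, phenotypes):
    """Within-family regression slope of phenotype on allele count (0, 1, 2).

    Genotype and phenotype are centred on family means, so any effect shared
    by all members of a family (including ancestry) cancels out.
    """
    groups = defaultdict(list)
    for fam, g, y in zip(family_ids, genotypes, phenotypes):
        groups[fam].append((g, y))

    sxy = sxx = 0.0
    for members in groups.values():
        if len(members) < 2:
            continue  # singletons carry no within-family information
        g_mean = sum(g for g, _ in members) / len(members)
        y_mean = sum(y for _, y in members) / len(members)
        for g, y in members:
            sxy += (g - g_mean) * (y - y_mean)
            sxx += (g - g_mean) ** 2
    if sxx == 0:
        raise ValueError("no within-family genotype variation")
    return sxy / sxx
```

In a toy example where one family is 50 cm taller at baseline than another (a stand-in for stratification) but each allele adds 2 cm within families, the function returns 2.0, whereas a pooled regression ignoring family would be pulled upward by the baseline gap.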
[Caption: Within-family analyses of height and the Northern-predominant alleles across the genome.]
Finally, we asked whether this systematic change in frequency of height-increasing alleles could be explained by genetic drift alone or whether the data are more consistent with a model that also incorporates selection (Online Methods). In the absence of selection, the difference in allele frequency between two populations, $p_1 - p_2$, has an expected mean of 0 and a variance of

$$\operatorname{Var}(p_1 - p_2) = p(1 - p)\left(2 \times F_{ST} + \frac{1}{2N_1} + \frac{1}{2N_2}\right),$$

where $p$ is the estimated ancestral allele frequency, $F_{ST}$ is estimated using the genome-wide data, and $N_i$ are the population sample sizes²⁸. The expected effect of selection on allele frequency differences is estimated as a function of $T$, the number of generations of differential selection, and $w$, the selective pressure per allele per generation (Online Methods). We used a likelihood ratio test (LRT) to compare models incorporating selection and drift with a model of drift alone. Using simulations (Supplementary Note), we verified that the LRT gave the expected results under the null model of drift alone (Supplementary Figures 5, 6) and in models incorporating both drift and selection (Supplementary Table 3), and that it was robust to the choice of ancestral allele frequency, $p$ (data not shown).
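The drift-versus-selection comparison can be illustrated with a Gaussian likelihood for the per-SNP frequency differences. This sketch assumes a drift-only variance of p(1 − p)(2F_ST + 1/(2N₁) + 1/(2N₂)) with N₁ and N₂ counting sampled individuals. Because the selection expression in terms of T and w is not reproduced in this section, it substitutes a single hypothetical shift parameter `delta` acting through δ·p(1 − p), the usual first-order form for additive selection. The p-value uses the χ² (1 df) approximation for the LRT, whereas the text reports checking the test's behaviour by simulation.

```python
import math


def drift_variance(p, fst, n1, n2):
    """Var(p1 - p2) under drift alone: p(1-p)(2*Fst + 1/(2*n1) + 1/(2*n2))."""
    return p * (1 - p) * (2 * fst + 1 / (2 * n1) + 1 / (2 * n2))


def gaussian_loglik(diffs, means, variances):
    """Sum of independent normal log-densities, one per SNP."""
    ll = 0.0
    for d, m, v in zip(diffs, means, variances):
        ll += -0.5 * math.log(2 * math.pi * v) - (d - m) ** 2 / (2 * v)
    return ll


def selection_lrt(diffs, anc_freqs, fst, n1, n2):
    """LRT of drift + selection vs. drift alone (illustrative model only).

    ASSUMPTION: selection shifts E[p1 - p2] by delta * p(1 - p), with delta a
    single free parameter standing in for the paper's function of T and w.
    """
    variances = [drift_variance(p, fst, n1, n2) for p in anc_freqs]
    shapes = [p * (1 - p) for p in anc_freqs]
    # Closed-form MLE of delta for a Gaussian model with known variances.
    num = sum(d * a / v for d, a, v in zip(diffs, shapes, variances))
    den = sum(a * a / v for a, v in zip(shapes, variances))
    delta_hat = num / den
    ll_null = gaussian_loglik(diffs, [0.0] * len(diffs), variances)
    ll_alt = gaussian_loglik(diffs, [delta_hat * a for a in shapes], variances)
    stat = 2 * (ll_alt - ll_null)
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2))  # chi-square, 1 df
    return delta_hat, stat, p_value
```

Summing per-SNP log-likelihoods mirrors the combined likelihood across independent SNPs; treating the variances as known is what gives `delta_hat` a closed form.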
By calculating the combined likelihood of the frequency data at the ~1,400 independent SNPs under each of the different models, we found that models incorporating both selection and drift were more consistent with the data than models of drift alone, with LRT p-values of ~10⁻¹⁶ over a range of values of T (Supplementary Tables 4–10; see Supplementary Tables 11 and 12 for results using a larger genome-wide set of SNPs). Given the typical effect sizes of height-associated variants, which are generally 10⁻² standard deviations or smaller (1 standard deviation ≈ 6.5 cm), we estimate that, in a model where selection is proportional to effect size, the typical selective pressure on individual height-associated variants would be ~10⁻³ per allele per generation. Thus, the data are much more consistent with widespread weak selection on standing variation than with drift alone.
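As an order-of-magnitude reading of the numbers quoted above, the effect-size bound converts to centimetres as shown below. If selection is written as proportional to effect size, w_i = c β_i (c is a hypothetical proportionality constant introduced only for this illustration), the two quoted magnitudes imply c of roughly 0.1 per standard deviation.

$$
\beta \lesssim 10^{-2}\,\text{SD} \times 6.5\,\frac{\text{cm}}{\text{SD}} = 0.065\,\text{cm} \approx 0.65\,\text{mm}
$$

$$
w_i = c\,\beta_i \;\Rightarrow\; c \approx \frac{10^{-3}}{10^{-2}\,\text{SD}} = 0.1\ \text{per SD}
$$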
We also addressed several other factors that could confound our results. First, we considered whether demographic biases in GIANT could have produced our results. Because GIANT consists largely of individuals of Northern-European ancestry, the consortium could have greater power to identify height-associated variants whose frequencies are closer to 0.5 in Northern Europeans. However, when we reordered the GIANT GWAS results based on discovery power in Southern Europeans (Supplementary Note), our results were essentially unchanged (Supplementary Table 2, Supplementary Figures 7, 8). Second, the height SNPs were limited to SNPs contained in HapMap, which itself ascertained SNPs in part by sequencing Northern- but not Southern-European samples. This ascertainment bias could in theory influence the Northern- and Southern-European minor allele frequency distributions of HapMap SNPs, and hence of the height-associated SNPs. However, the minor allele frequency distribution of the ~1,400 height-associated SNPs is indistinguishable between Northern and Southern Europeans (Kolmogorov–Smirnov p = 0.996). Furthermore, we showed through simulations using an even more biased scheme of SNP ascertainment based on 1000 Genomes²⁹ that such bias does not account for our results (Supplementary Note). Importantly, our results show a directional rather than an overall shift in allele frequencies, so ascertainment biases in GIANT or HapMap would be relevant only if height-increasing alleles were systematically biased towards being the major or the minor allele. However, there is no statistically significant bias in either the known height-increasing alleles (70/138 major alleles in Northern Europeans, 71/139 major alleles in Southern Europeans) or the expanded set of ~1,400 SNPs (752/1,434 major alleles in Northern Europeans, 740/1,436 major alleles in Southern Europeans; all p > 0.05). Thus, our results cannot be explained by the ascertainment of height-associated SNPs largely in Northern Europeans.
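The Kolmogorov–Smirnov comparison of minor allele frequency distributions can be sketched with the two-sample statistic and the asymptotic Kolmogorov distribution, using a commonly applied effective-sample-size correction. This is an illustrative stand-in rather than the authors' code; in practice `scipy.stats.ks_2samp` performs the same test. The function names here are hypothetical.

```python
import math


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x (handles ties).
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def kolmogorov_sf(lam):
    """Survival function Q(lam) of the asymptotic Kolmogorov distribution."""
    if lam < 0.2:
        return 1.0  # Q(0.2) differs from 1 by less than 1e-12
    if lam < 1.18:
        # Series that converges quickly for small lam: Q = 1 - K(lam).
        s = sum(math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8 * lam * lam))
                for k in range(1, 20))
        return max(0.0, min(1.0, 1.0 - math.sqrt(2 * math.pi) / lam * s))
    # Alternating series that converges quickly for large lam.
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 20))
    return max(0.0, min(1.0, 2.0 * s))


def ks_two_sample_p(a, b):
    """Return (D, approximate two-sided p-value) for the two-sample KS test."""
    d = ks_statistic(a, b)
    ne = len(a) * len(b) / (len(a) + len(b))
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    return d, kolmogorov_sf(lam)
```

Passing the Northern and Southern minor allele frequencies of the same SNP set as `a` and `b` gives the kind of comparison summarized by the reported p = 0.996.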
Another important potential bias is that we studied a phenotype (height) and a pair of populations (Northern and Southern Europeans) in which the phenotype was already known to differ. As discussed by Orr¹⁷, once we selected a phenotype known to be differentiated, it may not be surprising to observe more height-increasing alleles in the taller population. To test whether height in Northern and Southern Europeans could simply be an extreme example of a neutrally evolving trait, we simulated 10,000 neutrally evolving traits with the same genetic architecture as height (Supplementary Note). We estimate that we would have had to ascertain height in Northern and Southern Europeans from more than 10¹⁶ neutrally evolving trait/population pairs to obtain the level of differentiation observed in the actual data (Supplementary Figure 9), suggesting that our observations are not simply the extreme end of neutrally evolving traits but rather reflect the effects of selection.
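The ascertainment argument can be illustrated with a rough simulation of neutral traits. Each simulated trait draws drift-only frequency differences for its trait-increasing alleles and then labels whichever population has the higher genetic value as "taller", which reproduces the Orr-style bias toward positive differences even without selection. Because a few thousand draws cannot resolve probabilities near 10⁻¹⁶, the sketch extrapolates with a normal fit to the simulated tail. The per-SNP variances, effect sizes, function names and the normal extrapolation are all assumptions for illustration, not the procedure in the Supplementary Note.

```python
import math
import random
import statistics


def simulate_neutral_traits(betas, variances, n_traits, seed=0):
    """Mean trait-increasing frequency difference (taller minus shorter population).

    betas: positive effect sizes of the trait-increasing alleles (hypothetical).
    variances: drift-only variance of each SNP's frequency difference.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n_traits):
        d = [rng.gauss(0.0, math.sqrt(v)) for v in variances]
        genetic_gap = sum(b * x for b, x in zip(betas, d))
        sign = 1.0 if genetic_gap >= 0 else -1.0  # call the higher-value population 'taller'
        out.append(sign * sum(d) / len(d))
    return out


def pairs_needed(observed, null_means):
    """Rough number of neutral trait/population pairs needed to see `observed`.

    Fits a normal distribution to the simulated statistics and uses its upper tail.
    """
    mu = statistics.fmean(null_means)
    sd = statistics.stdev(null_means)
    z = (observed - mu) / sd
    tail = 0.5 * math.erfc(z / math.sqrt(2))
    return math.inf if tail == 0.0 else 1.0 / tail
```

With equal effect sizes every simulated statistic is non-negative, which makes the point of the ascertainment concern concrete: the null must be built after conditioning on which population is taller, not centred at zero.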
In summary, we have provided an empirical example of widespread weak selection on standing variation. We observed genetic differences using multiple populations across Europe, thereby showing that adult height differences across Europe are not due entirely to environmental differences but are at least partly genetic, arising from selection. Height differences across populations outside Europe may also be genetic in origin, but potential nongenetic factors, such as differences in the timing of secular trends, mean that this inference would need to be tested directly with genetic data in additional populations. By aggregating evidence of directionally consistent intra-European frequency differences over many individual height-increasing alleles, none of which individually shows a clear signal of selection, we could observe a combined signature of widespread weak selection. However, we were not able to distinguish whether this differential weak selection (either positive or negative) favored increased height in Northern Europe and/or decreased height in Southern Europe. One intriguing possibility is that sexual selection or assortative mating (mate choice for partners with similar height percentiles) fueled the selective process. It also remains possible that selection acted not on height per se, but on a phenotype closely correlated with height or on a combination of phenotypes that includes height.
Our analysis was feasible because many variants have been reproducibly associated with height, and it also suggests that many more loci with small effects on height remain to be identified. As more genome-wide association data become available for human traits and diseases, this approach can be used to search for other examples of human polygenic adaptation, including traits or diseases associated with climate or other environmental variables that vary across otherwise closely related populations⁸,³⁰,³¹.