The spread of modern humans across the globe has led to genetic adaptations to diverse local environments. Recent developments in genomic technologies, statistical analyses, and expanded sampling of populations have led to improved identification and fine-mapping of genetic variants associated with adaptations to regional living conditions and dietary practices. Ongoing efforts in sequencing genomes of indigenous populations, accompanied by the growing availability of “-omics” and ancient DNA data, promise a new era in our understanding of recent human evolution and the origins of variable traits and disease risks.
Modern humans originated ~200,000 years ago in Africa. Over the past 100,000 years, humans spread across the globe into a variety of habitats, from tropical to arctic, from high altitudes to lowlands and even to toxic environments. After humans migrated out of Africa, they encountered and interbred with archaic populations such as Neandertals and Denisovans, resulting in introgression of archaic genomes into non-African modern human genomes (~1 to 6% of modern genomes) (1). Introgression within Africa also likely occurred (2) but is more challenging to quantify because ancient DNA (aDNA) does not preserve well in that region, and no archaic African genomes are currently available. Within the past 10,000 years, most human populations have transitioned from a hunting-gathering lifestyle to practicing agriculture and pastoralism, resulting in rapid population growth, increased population densities, and an increase in infectious diseases. The selection pressures for adapting to local environments and new diets have resulted in population- or region-specific genetic variants that influence variable phenotypes (such as height, innate immune response, lactose tolerance, fatty acid metabolic efficiency, and hemoglobin levels).
Establishing a complete picture of local human adaptation can be challenging because it involves identifying the genomic regions under selection, the phenotypes that selection is acting upon, and ideally, the external conditions driving the selection. Populations that have adapted to environments that severely challenge survival provide well-characterized cases for local adaptation. Here, we review several recent examples that illustrate the use of emerging data, such as aDNA and genome-wide association studies (GWAS), and the impact these adaptations have on disease risk (Fig. 1).
The advent of cattle domestication in the Middle East and North Africa, ~10,000 years ago, led to strong selection pressure for the ability to drink milk as adults. Variants near the LCT locus, which codes for the lactase enzyme that metabolizes lactose (the main carbohydrate in milk), show some of the strongest signals of selection in the human genome (3). In most mammals and in most humans, the level of the lactase enzyme decreases after weaning [lactase nonpersistence (LNP)]. However, many populations that have traditionally practiced dairying maintain high levels of lactase into adulthood [lactase persistence (LP)].
A genetic variant associated with LP in Europeans was mapped to intron 13 of MCM6 upstream of LCT (4). Additional variants located within 100 base pairs (bp) of the European variant were identified in African populations (5). The LCT region was repeatedly reported as a target of a recent strong selective sweep (Fig. 2A) in Europeans and Africans, on the basis of numerous statistical tests including allele frequency comparisons between global populations [fixation index (FST) analyses] and extended haplotype homozygosity tests within [extended haplotype homozygosity (EHH) and integrated haplotype score (iHS)] and across [cross-population EHH (XP-EHH)] populations (Fig. 3) (3, 6, 7). Indeed, the homozygosity of African haplotypes containing the LP-associated variants extends on average nearly 2 Mb, but only ~2000 bp in ancestral haplotypes (5, 8). In vitro studies showed that these derived alleles enhance the expression of LCT (5, 9). The European LP-associated variant is estimated to be ~9000 years old, whereas the most common East African LP variant is ~5000 years old, which is consistent with archeological evidence for cattle domestication in the Middle East and east Africa (5). Sequencing of aDNA indicates that the European LP-associated allele was absent in early Neolithic Central Europeans and was at low frequency in late Neolithic Europeans (10), suggesting that LP spread recently (within the past ~4000 years) across Europe (11).
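The haplotype-based statistics named above can be made concrete with a short sketch. The Python below assumes phased biallelic haplotypes coded 0 (ancestral) and 1 (derived) plus a list of physical positions; it computes EHH for carriers of one allele at a core SNP, integrates EHH over distance into iHH, and reports the unstandardized iHS as ln(iHH_ancestral / iHH_derived). The function names, the 0.05 stopping cutoff, and the toy haplotypes are illustrative assumptions; published implementations also standardize iHS within allele-frequency bins and handle gaps, missing data, and recombination maps.

```python
from collections import Counter
from math import comb, log

def ehh(haplotypes, core, end, allele):
    """EHH: probability that two random haplotypes carrying `allele` at `core`
    are identical at every site between `core` and `end` (inclusive)."""
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return 0.0
    lo, hi = min(core, end), max(core, end)
    counts = Counter(tuple(h[lo:hi + 1]) for h in carriers)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)

def ihh(haplotypes, positions, core, allele, cutoff=0.05):
    """Integrate EHH over physical distance on both sides of the core
    (trapezoid rule), stopping once EHH drops below `cutoff`."""
    total = 0.0
    for step in (-1, 1):
        prev, i = 1.0, core  # EHH is 1 at the core itself
        while 0 <= i + step < len(positions):
            cur = ehh(haplotypes, core, i + step, allele)
            if cur < cutoff:
                break
            total += (prev + cur) / 2 * abs(positions[i + step] - positions[i])
            prev, i = cur, i + step
    return total

def unstandardized_ihs(haplotypes, positions, core):
    """ln(iHH_ancestral / iHH_derived); negative values mean the derived
    allele sits on unusually long haplotypes."""
    return log(ihh(haplotypes, positions, core, 0) /
               ihh(haplotypes, positions, core, 1))

# Hypothetical toy data: the derived allele (1 at index 2) is on one long haplotype.
positions = [0, 1000, 2000, 3000, 4000]
haps = [[0, 1, 1, 1, 0]] * 4 + [
    [0, 0, 0, 0, 0], [1, 0, 0, 1, 1], [0, 1, 0, 0, 1], [1, 1, 0, 1, 0],
]
print(round(unstandardized_ihs(haps, positions, core=2), 3))  # -1.099
```

In this toy example the derived haplotypes stay identical across the whole window while the ancestral ones break down within one marker, so iHH is three times larger for the derived allele and iHS is negative, the pattern expected around an allele such as the LP variants.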
The genetic adaptations resulting in LP are examples of convergent evolution in modern humans (causative genetic variants arose independently in geographically diverse populations owing to strong selective pressure for an adaptive phenotype). However, the identified LP-associated variants do not entirely explain the LP phenotype, particularly in western Africans and some central and southern Asian pastoralist populations. For example, the Fulani populations from Nigeria and Cameroon have the European variant at moderate frequency but lack the eastern African LP-associated variants and have a distinct haplotype with extended homozygosity near LCT, suggesting the presence of additional unknown functional variants (8). Furthermore, an outstanding question is the role that the human gut microbiome plays in LP.
The Inuit populations in Alaska, Canada, and Greenland have adapted to a cold and dark Arctic environment and a marine diet rich in omega-3 polyunsaturated fatty acids. A recent study compared genomic diversity in Greenlandic Inuits with Europeans and Chinese using the population branch statistic (PBS) (Fig. 3B, iii) (12) and found that the most differentiated region encompasses a gene cluster that codes for fatty acid desaturase enzymes (FADS), which are important modulators of fatty acid composition. Two variants in the FADS region were significantly associated with short stature in the Inuits, possibly because of the influence of fatty acid composition on growth hormone regulation (12). These variants were also associated with height in a larger study of Europeans, but the variants were present at low frequencies and would have been hard to discover without studying the Inuit population, demonstrating why studies of indigenous populations can be informative for identifying variants of functional importance across ethnic groups (13).
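PBS converts pairwise F_ST among three populations into branch lengths, T = -ln(1 - F_ST), and estimates the branch leading to the focal population as PBS = (T_focal,ref1 + T_focal,ref2 - T_ref1,ref2) / 2, so a large value means the focal population has diverged more than expected from the other two. The sketch below assumes per-SNP allele frequencies and uses Hudson's F_ST estimator without sample-size correction; that estimator choice, the cap just below F_ST = 1, and the example frequencies are assumptions for illustration, not details taken from the Inuit study.

```python
from math import log

def hudson_fst(p1, p2):
    """Per-SNP Hudson F_ST from two allele frequencies (no sample-size correction)."""
    denom = p1 * (1 - p2) + p2 * (1 - p1)
    return 0.0 if denom == 0 else (p1 - p2) ** 2 / denom

def branch_length(fst):
    """T = -ln(1 - F_ST); F_ST is capped just below 1 so that fixed
    differences do not produce an infinite branch."""
    return -log(1 - min(fst, 1 - 1e-12))

def pbs(p_focal, p_ref1, p_ref2):
    """Population branch statistic for the focal population."""
    t_f1 = branch_length(hudson_fst(p_focal, p_ref1))
    t_f2 = branch_length(hudson_fst(p_focal, p_ref2))
    t_12 = branch_length(hudson_fst(p_ref1, p_ref2))
    return (t_f1 + t_f2 - t_12) / 2

# Hypothetical SNP: allele at 0.9 in the focal population, 0.1 in both references.
print(round(pbs(0.9, 0.1, 0.1), 3))  # 1.516
```

Swapping the arguments so that one of the references is the outlier drives the focal PBS toward zero, which is why PBS singles out differentiation specific to one population rather than any pairwise difference.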
Tropical rainforests are some of the harshest environments in the world, characterized by high temperature and humidity as well as the prevalence of parasites and other pathogens. Individuals living in tropical environments often have very short life spans, which directly affects reproductive success and, hence, can act as a selective pressure.
One distinctive phenotype thought to be adaptive to a tropical rainforest environment is short stature (commonly referred to as a “pygmy” phenotype), which is defined as an average height of <150 cm in adult males. The short stature trait is an example of convergent evolution in rainforest hunter-gatherer (RFHG) populations across Africa, Asia, and South America. Selection for small body size may be due to limited food resources, resistance to heat stress, immune response, and/or a trade-off between early onset of reproduction and cessation of growth (14, 15). Perturbations in the GH1-IGF1 pathway have been implicated in short stature in RFHG populations in central Africa and southeast Asia (16).
In central African RFHG populations where admixture with neighboring populations is common, short stature is significantly correlated with ancestry and is highly heritable (14). Central African RFHG–specific genomic adaptations were identified through comparisons with other African populations by using methods such as locus-specific branch length (LSBL) (17) and XP-EHH (3) tests, which have high power to detect population-specific positive selection (Fig. 3B). These tests identified a 15-Mb region on chromosome 3 that shows signatures of strong positive selection (14). Several genes in this region are associated with short stature in Pygmies, including DOCK3, which is associated with height variation in non-Africans, and CISH, which is important in immune response but also inhibits human growth hormone receptor activity, indicating that short stature could be a product of selection acting on pleiotropic loci (14). A subsequent study discovered a ~200-kb haplotype containing HESX1 at high frequency in central African RFHGs but low frequency in other African populations; HESX1 is involved in the development of the anterior pituitary, where growth hormone is produced (18). This haplotype was not tagged in previous genotyping arrays, demonstrating the importance of including ethnically diverse populations in whole-genome sequencing studies (18). However, RFHG and neighboring agriculturalist populations in Uganda exhibit a distinct set of loci associated with short stature, which raises the possibility of convergent evolution of this trait across the RFHGs in Africa (15).
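LSBL assigns the focal population the branch length (d12 + d13 - d23) / 2, where the d values are pairwise F_ST distances among three populations (17). The simplified sketch below assumes per-SNP allele frequencies and Hudson's F_ST, averages LSBL over non-overlapping windows, and flags windows above the empirical 99th percentile, in the spirit of outlier-based genome scans. The window size, percentile threshold, and toy frequencies are hypothetical choices, not the settings used in the cited studies.

```python
import statistics

def hudson_fst(p1, p2):
    """Per-SNP Hudson F_ST from two allele frequencies (no sample-size correction)."""
    denom = p1 * (1 - p2) + p2 * (1 - p1)
    return 0.0 if denom == 0 else (p1 - p2) ** 2 / denom

def lsbl(p_focal, p2, p3):
    """Locus-specific branch length of the focal population: (d12 + d13 - d23) / 2."""
    return (hudson_fst(p_focal, p2) + hudson_fst(p_focal, p3)
            - hudson_fst(p2, p3)) / 2

def outlier_windows(freqs, window, top_percent=1):
    """Average LSBL in non-overlapping windows of `window` SNPs and return the
    start index of each window above the (100 - top_percent)th percentile."""
    scores = [lsbl(*f) for f in freqs]
    starts = range(0, len(scores) - window + 1, window)
    means = [statistics.fmean(scores[s:s + window]) for s in starts]
    cutoff = statistics.quantiles(means, n=100)[99 - top_percent]
    return [s for s, m in zip(starts, means) if m > cutoff]

# Hypothetical scan: 1,000 SNPs with equal frequencies in all three populations,
# except SNPs 500-509, where the focal population is strongly differentiated.
freqs = [(0.5, 0.5, 0.5)] * 1000
freqs[500:510] = [(0.9, 0.1, 0.1)] * 10
print(outlier_windows(freqs, window=10))  # [500]
```

An empirical percentile cutoff makes no assumption about the neutral distribution of LSBL, but it will always flag some windows; whether a flagged region reflects selection still has to be judged against demographic models and complementary tests such as XP-EHH.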
The studies discussed above suggest that short stature in central African RFHG is mostly driven by strong selection affecting a relatively small number of loci of moderate to strong effect. In contrast, hundreds of loci are associated with European height variation (19, 20). Detecting selection on a polygenic trait such as height is difficult because the causative variants can be ancient polymorphisms each of relatively small effect. Several recent methods use GWAS to identify trait-associated variants and then test for allele frequency shifts between populations [either unweighted (20) or weighted by effect size (21)] greater than expected from neutral drift. Polygenic selection tests using GWAS data (Fig. 2C) have suggested weak selection influencing average height differences among European populations, with Northern Europeans generally being taller than Southern Europeans (20, 21). A recent aDNA analysis of Neolithic European populations suggests that the North-South European height gradient may reflect selection for shorter height in Early Neolithic migrants into southern Europe and admixture of taller steppe populations with northern Europeans (11). Although analysis of aDNA holds great promise for revealing human phenotypic history, this approach faces challenges for studies of indigenous populations such as RFHG because DNA is not well preserved in tropical climates, and large-scale GWAS are not available.
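The core idea of an effect-size-weighted test can be sketched in a few lines: sum 2 × beta × (frequency difference) over GWAS-associated alleles to get the difference in mean additive genetic value between two populations, then ask whether that sum is larger than a null expectation. As a deliberate simplification, the sketch below builds the null by randomly flipping the sign of each effect, on the assumption that under neutral drift the direction of a frequency change is unrelated to the direction of an allele's effect. The cited methods (20, 21) use more careful null models; the function name, permutation scheme, and example numbers here are all hypothetical.

```python
import random

def polygenic_shift(betas, freqs_pop1, freqs_pop2, n_perm=2000, seed=1):
    """Difference in mean additive genetic value (pop1 - pop2), where each beta
    is the per-copy effect of the allele whose frequencies are given, plus a
    two-sided permutation p-value from randomly flipped effect signs."""
    def shift(signs):
        return sum(2 * s * b * (q1 - q2)
                   for s, b, q1, q2 in zip(signs, betas, freqs_pop1, freqs_pop2))

    observed = shift([1] * len(betas))
    rng = random.Random(seed)
    null = [shift([rng.choice((-1, 1)) for _ in betas]) for _ in range(n_perm)]
    extreme = sum(abs(x) >= abs(observed) for x in null)
    return observed, (extreme + 1) / (n_perm + 1)

# Hypothetical example: 50 height-increasing alleles, each slightly more common in pop1.
betas = [0.1] * 50
obs, p = polygenic_shift(betas, [0.6] * 50, [0.4] * 50)
print(round(obs, 6), p < 0.01)  # 2.0 True
```

The example shows why coordinated small shifts are informative: no single allele differs dramatically, but all 50 move in the trait-increasing direction, which random signs almost never reproduce. Real analyses must also contend with ascertainment bias, population stratification in the GWAS, and linkage among loci.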
Pathogenic environments are an important driver of local adaptation in humans (22), and nowhere is the challenge of survival from pathogens greater than in tropical rainforests. In particular, malaria, a mosquito-borne protozoan parasitic infection, is a major cause of mortality in sub-Saharan Africa. The most lethal malarial species, Plasmodium falciparum, causes >1 million child deaths in Africa each year (23). Tropical populations have long been known to have genetic variants that confer resistance to malaria, including the sickle cell, α+- and β+-thalassemia–causing alleles at the hemoglobin loci, as well as variants at ABO, GYPA, GYPB, GYPE, and G6PD [reviewed in (23)]. These variants are likely adaptive, as initially evidenced by the strong correlation between allele frequency and prevalence of malaria infection. However, several malaria-protective variants also cause common Mendelian diseases in hemi- or homozygotes [such as sickle cell anemia, G6PD deficiency, and thalassemia (23)] and are maintained through balancing selection.
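Heterozygote advantage, one form of balancing selection, has a classic one-locus description. If homozygotes for the common allele have relative fitness 1 - s (for example, reduced survival from malaria), heterozygotes have fitness 1, and homozygotes for the protective allele have fitness 1 - t (for example, from sickle cell anemia), the protective allele approaches the stable equilibrium frequency q* = s / (s + t) rather than being lost or fixed. The Python sketch below iterates the standard deterministic recursion, assuming an infinite, randomly mating population; s and t are hypothetical values, not estimates for any real malaria-protective variant.

```python
def next_freq(q, s, t):
    """One generation of viability selection on allele S with relative fitnesses
    w(AA) = 1 - s, w(AS) = 1, w(SS) = 1 - t (heterozygote advantage)."""
    p = 1 - q
    w_bar = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
    return (p * q + q * q * (1 - t)) / w_bar

def equilibrium(s, t):
    """Analytical stable equilibrium frequency of S."""
    return s / (s + t)

# Hypothetical selection coefficients, chosen only for illustration.
s, t = 0.1, 0.8
q = 0.01
for _ in range(500):
    q = next_freq(q, s, t)
print(round(q, 4), round(equilibrium(s, t), 4))  # 0.1111 0.1111
```

Because selection against SS homozygotes (t) is strong, the equilibrium frequency stays well below 0.5, yet the allele is maintained indefinitely as long as the pathogen keeps AA homozygotes at a disadvantage.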
Another example of adaptation to pathogens resulting in high frequencies of genetic variants associated with disease is at the APOL1 locus. The “G1” and “G2” genetic variants in the last exon of APOL1 are associated with chronic kidney disease, which is disproportionately common among individuals of African descent (24). These variants also confer resistance to human African trypanosomes (causing “African sleeping sickness”) by enabling the APOL1 protein to lyse Trypanosoma brucei rhodesiense, a parasitic protozoan transmitted by the tsetse fly. The region flanking the G1 and G2 variants appears to be under recent positive selection, as indicated by extended haplotype homozygosity (25). G1 and G2 are common in some western African populations, but T.b. rhodesiense is currently common only in eastern and southern Africa, which raises questions about the evolutionary history of the APOL1 locus and the pleiotropic effects of G1 and G2 (25).
Living at high altitude (>2500 m above sea level) in regions such as the Tibetan Plateau, the Andean Altiplano, and the Semien plateau of Ethiopia can be deadly because of an insufficient supply of oxygen to vital organs (hypoxia). However, populations living in these high-altitude regions for thousands of years have adapted and thrived, with varying physiological adaptations to hypoxic environments [reviewed in (26)].
Recent genome-wide scans of selection (such as LSBL, PBS, iHS, and XP-EHH) (Fig. 3B) uncovered genetic adaptations to high altitude by comparing Andean, Ethiopian, and Tibetan genomes with lowland populations with similar genetic ancestry [reviewed in (27)]. Signatures of positive selection were found repeatedly at genes involved in the hypoxia-inducible factor (HIF) pathway [reviewed in (28)] but on different haplotype backgrounds (for example, EGLN1 in Andean and Tibetan populations) owing to convergent evolution (29). One of the strongest signals of selection in the Tibetan populations is at EPAS1, a transcription factor influencing the HIF pathway. Sequence analysis suggests that the selected Tibetan haplotype may have originated from introgression of genomic DNA from Denisovans (Fig. 2D) (30).
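XP-EHH compares the same core SNP across two populations: it integrates haplotype homozygosity in each population and takes ln(iHH_A / iHH_B), so a strongly positive value suggests a sweep in population A (for example, a highland group) that is absent in population B (a related lowland group). The sketch below is a simplified illustration, assuming phased haplotypes, pooling all alleles at the core, using a 0.05 stopping cutoff, and standardizing raw scores genome-wide by mean and standard deviation; the function names and toy data are hypothetical, and published implementations differ in details such as how the integration limits are chosen.

```python
from collections import Counter
from math import comb, log
import statistics

def site_ehh(haps, core, end):
    """Probability that two random haplotypes (all alleles pooled) are
    identical at every site between `core` and `end`."""
    lo, hi = min(core, end), max(core, end)
    counts = Counter(tuple(h[lo:hi + 1]) for h in haps)
    return sum(comb(c, 2) for c in counts.values()) / comb(len(haps), 2)

def ihh_pooled(haps, positions, core, cutoff=0.05):
    """Trapezoid-rule integral of site EHH on both sides of the core."""
    total = 0.0
    for step in (-1, 1):
        prev, i = site_ehh(haps, core, core), core
        while 0 <= i + step < len(positions):
            cur = site_ehh(haps, core, i + step)
            if cur < cutoff:
                break
            total += (prev + cur) / 2 * abs(positions[i + step] - positions[i])
            prev, i = cur, i + step
    return total

def xpehh_raw(haps_a, haps_b, positions, core):
    """ln(iHH_A / iHH_B): positive values suggest a sweep in population A."""
    return log(ihh_pooled(haps_a, positions, core) /
               ihh_pooled(haps_b, positions, core))

def standardize(values):
    """Standardize raw scores to mean 0 and unit standard deviation."""
    mu, sd = statistics.fmean(values), statistics.stdev(values)
    return [(v - mu) / sd for v in values]

positions = [0, 1000, 2000, 3000, 4000]
highland = [[0, 1, 1, 1, 0]] * 4  # hypothetical: one long shared haplotype
lowland = [[0, 0, 0, 0, 0], [1, 0, 0, 1, 1], [0, 1, 0, 0, 1], [1, 1, 0, 1, 0]]
print(round(xpehh_raw(highland, lowland, positions, core=2), 3))  # 1.099
```

Comparing against a lowland population with similar ancestry, as in the high-altitude studies, helps separate local adaptation from haplotype structure that both groups inherited from a shared history.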
Arsenic is acutely toxic to humans but is naturally present at high levels in groundwater across the globe (31). San Antonio de los Cobres (SAC), Argentina, is one such locale with high levels of arsenic yet has been settled by human populations for the past ~11,000 years. Arsenic metabolism involves methylating inorganic arsenic to monomethylarsonic acid (MMA) and subsequently to dimethylarsinic acid (DMA), which is less toxic. The ratio of arsenic metabolites (%DMA/%MMA) in urine indicates the efficiency of arsenic metabolism, with higher values reflecting more complete conversion of MMA to DMA. Individuals in SAC show particularly low urinary excretion of MMA, suggesting a local adaptation to arsenic. A recent association study in SAC identified potential protective regulatory variants upstream of AS3MT, a gene involved in the arsenic methylation pathway. The variants are likely under positive selection because they are embedded in long haplotypes in the SAC population and are at higher frequency than in neighboring low-arsenic groups with similar genetic ancestry (31).
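The metabolite percentages and their ratio are simple arithmetic on measured urinary concentrations. The sketch below assumes that each percentage is taken relative to the sum of inorganic arsenic (iAs), MMA, and DMA, which is one common convention but an assumption here; the concentrations are hypothetical.

```python
def arsenic_profile(ias, mma, dma):
    """Percentages of iAs, MMA, and DMA among the summed urinary arsenic
    species (all in the same units), plus the %DMA/%MMA ratio."""
    total = ias + mma + dma
    pct = {name: 100 * value / total
           for name, value in (("iAs", ias), ("MMA", mma), ("DMA", dma))}
    pct["DMA/MMA"] = pct["DMA"] / pct["MMA"]
    return pct

# Hypothetical urinary concentrations.
print(arsenic_profile(ias=10, mma=10, dma=80))
# {'iAs': 10.0, 'MMA': 10.0, 'DMA': 80.0, 'DMA/MMA': 8.0}
```

Because both percentages share the same denominator, the %DMA/%MMA ratio equals the raw DMA:MMA concentration ratio; a lower MMA fraction, as reported in SAC, pushes the ratio up.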
Variation in human skin color is one of the most striking examples of human phenotypic diversity. Unlike other primates, human skin is not covered by dense body hair and is the primary interface between our body and the environment. Ultraviolet radiation (UVR) exposure is an important driver of pigmentation evolution in humans, with selection pressure for darker skin at low latitudes for UVR protection and for lighter skin at higher latitudes, possibly to maintain vitamin D photosynthesis (32).
The earliest studies of the genetics of human pigmentation were based on candidate genes identified in model organisms and highly penetrant genetic variants of Mendelian disorders [for example, SLC24A5 in zebrafish color patterns, MC1R in mouse coat colors, and OCA1-4 in human oculocutaneous albinism (33)]. GWAS in European populations have identified additional candidate loci associated with light skin, a subset of which (such as OCA2, TYRP1, TYR, SLC24A5, and SLC45A2) are under strong selection, as evidenced by multiple genome-wide scans of selection [reviewed in (33)]. Furthermore, a recent analysis of 230 ancient Eurasian genomes found that the allele associated with light skin pigmentation likely rose from very low frequency during the Neolithic period to fixation in modern Europeans, owing to strong selection pressure over the past ~4000 years (11).
Although many genetic variants associated with light skin color in Europeans have been identified, little is known about the genetic basis of skin color in Asia and Africa. Indeed, Africans exhibit high variability in skin color (ranging from dark-skinned Nilotic pastoralists to light-skinned San hunter-gatherers), and the genetic basis of pigmentation in these populations has only recently begun to be explored (34).
Genetic variants that were adaptive in the past can be maladaptive in modern environments. For example, the high prevalence of type 2 diabetes has been proposed to result from common variants that were adaptive for the efficient conversion of food into energy in the past, when resources were scarce, but are maladaptive in modern urban environments (the “thrifty gene hypothesis”). This hypothesis remains controversial; an alternative hypothesis is that the disease-associated variants were never beneficial but became common through genetic drift (the “drifty gene hypothesis”) (35).
A recent study of Samoans provides functional evidence supporting the thrifty gene hypothesis. Over 80% of Samoans are overweight or obese [body mass index (BMI) > 26 kg/m²], one of the highest prevalences in the world (36). By genotyping ~3000 Samoans, a missense variant in CREBRF was found to be associated with BMI and fasting glucose levels. This variant is under strong recent positive selection, as evidenced by extended haplotype homozygosity and high allele frequency in the Samoan population but not other populations. Functional expression experiments in adipose cells showed that the Samoan variant decreases energy use and increases adipose fat storage, suggesting that this variant may have been adaptive in the past by increasing tolerance to periods of starvation but is associated with risk for obesity and type 2 diabetes in modern populations.
aDNA sequencing provides a direct historical record of genomic variation and opens new possibilities for inferring the recent evolutionary history of modern human phenotypes. Studies based on comparison of modern human with Neandertal and Denisovan genomes have found evidence of adaptive archaic haplotypes in genes related to innate immune response, metabolism, and skin phenotypes in ethnically diverse modern human populations [reviewed in (1)]. Future studies of aDNA will be informative for reconstructing the origin of functional variants and inferring the strength of selection based on direct observation of changes in allele frequencies over time (37). However, current understanding of ancient adaptation events is limited by sparse aDNA data over broad geographic and temporal scales. Moreover, methods for studying ancient genotype variation tend to focus on ascertained variants in specific populations, particularly Europeans (1).
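Allele frequencies observed at two time points can be turned into a rough selection coefficient. Under a deterministic model in which each copy of the favored allele multiplies fitness by (1 + s), the allele's odds p/(1 - p) grow by exactly (1 + s) per generation, so s = exp[(logit p_end - logit p_start)/t] - 1 for t generations. The sketch below assumes that model, ignores drift, sampling error in aDNA, and admixture (which likelihood-based aDNA methods try to account for), and uses hypothetical frequencies and a hypothetical 28-year generation time; it does not reproduce any published estimate.

```python
from math import exp, log

def logit(p):
    """Log-odds of an allele frequency strictly between 0 and 1."""
    return log(p / (1 - p))

def estimate_s(p_start, p_end, generations):
    """Selection coefficient under multiplicative (genic) selection with no drift."""
    return exp((logit(p_end) - logit(p_start)) / generations) - 1

def forward(p, s, generations):
    """Project the allele frequency forward under the same deterministic model."""
    for _ in range(generations):
        p = p * (1 + s) / (1 + p * s)
    return p

# Hypothetical time series: 5% to 95% over ~4000 years,
# assuming 28 years per generation (about 143 generations).
s = estimate_s(0.05, 0.95, 143)
print(round(s, 3))  # 0.042
```

Running `forward(0.05, s, 143)` recovers the 95% endpoint, a quick consistency check on the estimate. Even a few percent advantage per generation is enough to carry an allele from rarity to near fixation within a few thousand years.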
Structural variants (SVs), including duplications, copy number variants, deletions, insertions, and inversions, encompass a much larger fraction of the genome than single-nucleotide polymorphisms (SNPs) [~4.1 million to 5.0 million base pairs for SNPs compared with ~20 million base pairs for SVs (38)] and may consequently have substantial contributions to human adaptive evolution (39). For example, Perry et al. (40) observed higher copy numbers of AMY1, which encodes an enzyme that breaks down starch, in populations with high-starch diets. Further, a common inversion polymorphism at chromosome 17q21.31 is hypothesized to be under positive selection in Europeans because of increased fecundity of women carrying the inverted haplotype (41). However, SVs are extremely challenging to detect by using short-read sequencing technologies. The development of cost-effective high-throughput methods for obtaining long-sequence reads and novel computational approaches will facilitate identification and characterization of SVs in the future (42).
Although numerous methods for detecting selection have been developed, there is a known lack of concordance across methods (43), partly because of the different selection time scales each method is capable of detecting (44). Strong positive selection on a de novo (newly arising) mutation is the classic process that many methods have been devised to detect, in which the haplotype containing the beneficial variant rapidly rises in frequency in the population, resulting in long stretches of identical haplotypes around the selected variant within the population (Figs. 2 and 3). There is some debate over the fraction of the human genome affected by selective sweeps within the past ~250,000 years, in part owing to complexities in accounting for background selection that purges deleterious genetic variation (45, 46) and the difficulty in detecting simultaneous selection on multiple beneficial haplotypes at a locus (“soft sweeps”) (Fig. 2B) (47). Although recent methods have increased the power to scan for soft sweeps (48, 49), their detection is still limited because of relatively weak genomic signals (50) and because regions flanking a hard sweep (the “shoulders”) can mimic a soft sweep (51, 52). Development of machine-learning methods, which can effectively “learn” the appropriate signals within complex data when given training examples, may be effective for detecting hard and soft selective sweeps (53).
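One published family of haplotype statistics aimed at soft sweeps, Garud and colleagues' H12 and H2/H1, illustrates the distinction; we do not claim these are the specific methods cited above. H1 is the sum of squared haplotype frequencies in a window, H12 pools the two most common haplotypes before squaring, and H2 removes the most common haplotype from H1. A hard sweep leaves one dominant haplotype (H2/H1 near 0), whereas a soft sweep leaves several common haplotypes (higher H2/H1) while still producing high H12. The minimal sketch below assumes phased haplotypes within a single window; the toy data are hypothetical.

```python
from collections import Counter

def garud_h(haplotypes):
    """Return (H1, H12, H2/H1) for one window of phased haplotypes."""
    n = len(haplotypes)
    freqs = sorted((c / n for c in Counter(map(tuple, haplotypes)).values()),
                   reverse=True)
    h1 = sum(f * f for f in freqs)
    # H12 = (p1 + p2)^2 + sum_{i>2} p_i^2, which equals H1 + 2 * p1 * p2.
    h12 = h1 + 2 * freqs[0] * freqs[1] if len(freqs) > 1 else h1
    h2 = h1 - freqs[0] ** 2
    return h1, h12, h2 / h1

hard = [[1, 0, 1, 1]] * 10                      # one haplotype swept to fixation
soft = [[1, 0, 1, 1]] * 5 + [[0, 1, 1, 0]] * 5  # two haplotypes at equal frequency
print(garud_h(hard))  # (1.0, 1.0, 0.0)
print(garud_h(soft))  # (0.5, 1.0, 0.5)
```

Both windows give H12 = 1, so H12 flags them equally, while H2/H1 separates them. In real data, recombination in the shoulders of a hard sweep can also raise H2/H1, which is exactly the confounding described above.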
Perhaps the largest challenge to detecting selection is the fact that most traits are polygenic. It is difficult to distinguish small frequency shifts over large numbers of independent loci (Fig. 2C) from neutral drift without knowing a priori which trait is under selection and the genetic variants that influence the trait. Incorporating the results of GWAS into selection tests (20, 21) is one promising avenue to identify cases of polygenic adaptation, although such integration will have to overcome issues common to genetic association tests, including ascertainment bias in the set of genotyped variants, population stratification, and a lack of observed heritability (“missing heritability”).
The best-characterized human adaptive variants are those related to Mendelian, or near Mendelian, traits for which the adaptive phenotype can be easily distinguished (for example, LP and skin pigmentation). Characterizing the functional variants that affect nonobvious or intermediate phenotypes—such as blood metabolite levels, gene expression across cell types, and epigenetic modifications—motivates the need for detailed phenotyping of global populations based on “-omics” profiling (such as transcriptomics, metabolomics, or epigenomics). Additionally, integration of genome-wide selection scans with GWAS helps improve the power of genotype-phenotype association analyses by localizing putative regions with a functional impact. This is particularly important in studies of indigenous populations, in which obtaining large sample sizes is challenging. Lastly, functional experiments in model organisms can directly establish the link between candidate adaptive variants and phenotypes. For example, a derived allele at the EDAR locus (EDAR370A) is significantly associated with hair thickness and dental morphology [reviewed in (27)] and is a target of recent strong positive selection in Asian populations (3, 54). Development of humanized mice containing EDAR370A replicated the hair thickness phenotype and also identified previously unknown phenotypic impact on mammary and eccrine glands (55). With the development of in vitro and in vivo technologies, such as tissue-specific cell lines, genome/RNA-editing (such as CRISPR technology), and induced pluripotent stem cells, it is increasingly possible to validate the function of adaptive variants in humans.
Selection pressures in response to regional conditions have influenced global human genomic diversity, as evidenced by the reviewed local adaptations to diverse physical environments, pathogen exposure, and dietary practices. None of these insights would be possible without global genome-wide population genetic data collected over the past 15 years. New insights into local human adaptation will require high-coverage whole-genome sequencing of ethnically diverse populations and detailed phenotyping. Extending current GWAS-based methods for detecting polygenic adaptation to include more accurate models of genetic architecture, which account for nonadditive, epistatic, and gene-environment effects, is an important future direction. The integration of high-quality genomic data from ancient and modern populations, detailed phenotype data, and advances in computational approaches will illuminate the mode and tempo of local adaptation as humans settled the globe.
We thank M. Rubel for her assistance with creating Fig. 1 in this manuscript and members of the Tishkoff laboratory for helpful scientific discussions. This work is funded by NIH grants 1R01DK104339-01 and 1R01GM113657-01 and NSF grant BCS-1317217 to S.A.T., NIH grant T32ES019851-02 to M.E.B.H. through the Center of Excellence in Environmental Toxicology at the University of Pennsylvania, and NIH grants LM009012 and LM010098 to Y.L. through J. H. Moore.