The recent availability of genome-scale genotyping data has led to the identification of regions of the human genome that seem to have been targeted by selection. These findings have increased our understanding of the evolutionary forces that affect the human genome, have augmented our knowledge of gene function and promise to increase our understanding of the genetic basis of disease. However, inferences of selection are challenged by several confounding factors, especially the complex demographic history of human populations, and concordance between studies is variable. Although such studies will always be associated with some uncertainty, steps can be taken to minimize the effects of confounding factors and improve our interpretation of their findings.
The past few years have seen an explosion of studies using molecular data to detect Darwinian natural selection1–6. With the recent availability of large-scale genotyping data, genome-wide scans for genes or genomic regions that have been targeted by selection have become feasible. These studies have greatly advanced our understanding of human evolution and molecular evolution in general, but they have also sparked considerable controversy.
The interest in detecting selection is twofold. First, it stems from a natural curiosity about our evolutionary past and the basic mechanisms that govern molecular evolution. Much of the work in this field through the past four decades has focused on quantifying the relative importance of Darwinian selection and random genetic drift in determining levels of variability within species, as well as divergence between species (for example, REFS 7,8). However, as evidence accumulates for a strong role of selection, efforts are increasingly concentrating on identifying and characterizing particular instances of selection and adaptation at the molecular level. In humans, in particular, there has been a strong interest in identifying genes that have undergone recent selection relating to key human traits such as cognitive abilities4,9,10.
A second motivation for studying selection stems from the realization that inferences about selection can provide important functional information. For example, genes that are targeted by selection acting on segregating mutations are more likely to be associated with disease (for example, REF. 3). Even small fitness effects can, on an evolutionary timescale, leave a distinct pattern. Therefore, it might be possible to identify putative genetic disease factors by identifying regions of the human genome that currently are under selection3,11. In general, positions in the genome that are under selection must be of functional importance, otherwise selection could not be operating.
The aim of this Review is to discuss some of the major findings regarding selection in humans, and explain why the conclusions of these studies have at times been controversial with low levels of concordance among studies. We focus particularly on recent selection; that is, selection that might have affected current population genetic variation. We first address the question of the likely relative contributions of negative and positive selection to genetic variation in human populations, and explore how identifying these types of selection might contribute to our understanding of human evolution and gene function. We then discuss the different approaches that are taken to detecting the recent and ongoing positive selection that has affected the human genome, followed by a detailed discussion of recent genome-wide studies that have provided many new potentially selected genomic regions and individual genes. The key problems that face studies of selection are then addressed, along with a discussion of why low concordance has been seen among some of the studies that have been carried out so far. Finally, we bring together a discussion of studies that have provided insights into the patterns of selection that are likely to characterize Mendelian and complex human diseases, with the potential to aid the discovery of further disease-associated genes.
Although it is clear that selection is pervasive in humans and other organisms, the relative importance of positive and negative selection is still debated. Much of the natural selection acting on genomes may be negative selection acting to remove new deleterious mutations. Most exons in protein-coding regions are highly conserved between species, because many potential mutations would disrupt protein function. Therefore, the conservation of genic regions provides evidence of past negative selection and provides an important route to genome annotation. Similar evidence for conservation and negative selection in non-coding regions provides the basis for an important approach for detecting functional elements, such as microRNAs (for example, REFS 12,13).
Eyre-Walker and Keightley14 estimated that at least 38% of all new amino-acid altering mutations in the human genome are being eliminated by negative selection, assuming that all mutations are either deleterious or neutral (that is, having no effect on organismal fitness). As noted by the authors of this study, this is probably an underestimate, and subsequent studies15–17 have suggested that as much as 70–75% of amino-acid altering mutations are affected by moderate or strong negative selection. Importantly, however, much of this selection might act at the level of gametogenesis, on mature gametes or during early development. Mutations that are strongly deleterious will be quickly eliminated by natural selection, and only mutations that have, at worst, a mildly negative fitness effect will be observed as segregating in the population. A. R. Boyko et al. (unpublished observations) estimated that the proportion of amino-acid altering mutations in humans that have a negative fitness effect, but are so weakly selected that they might still be segregating in the population, is approximately 30–40%.
Positive selection occurs when a new (or previously rare) mutation confers a fitness advantage to the individuals carrying it. Much attention has focused on this type of selection because it provides the footprints of evolutionary adaptation at the molecular level. Identifying genomic regions that have been influenced by positive selection provides a key to understanding the processes that lead to differences among species and a subset of heritable phenotypic differences within species. For example, we can learn much about the biological basis of the evolution of modern humans by studying how positive selection has affected the human genome over the past few hundred thousand years. In general, positive selection is relevant when we seek to understand species-specific adaptations or processes that relate to dynamic interactions between the organism and its environment.
If positive selection acts on protein-coding genes, and if it occurs by repeated rounds of favouring multiple mutations in a gene (that is, selection is recurrent), positive selection can be detected as an increased rate of amino-acid substitution. Many studies have recently taken advantage of this fact to quantify positive selection acting in the genome on the human lineage leading from the ancestor of humans and chimpanzees to modern humans18–20. In general, these studies have identified genes involved in immune-related functions, spermatogenesis, olfaction and sensory perception, and have highlighted several other functional gene categories with an increased likelihood of having experienced positive selection. Genes in these categories are likely to be involved in direct interactions with the environment, and will be under selective pressure in the face of environmental change. In particular, genes involved in dynamic competitive or co-evolutionary interactions are expected to experience more positive selection. A prime example of this is immunity and defence-related genes, which are involved in dynamic interactions with pathogens. As a category, these genes have experienced by far the most positive selection in humans and other organisms18–20.
There are several theories regarding selection acting on spermatogenesis, one being that most selection is related to post-mating competition between sperm from different males for fertilization21. In this case, the changing environment is the phenotype of sperm from other males. Alternative theories suggest that the selection is related to interactions between egg and sperm cells22,23, or that it is driven by selfish mutations causing segregation distortion20.
Several individual genes that might underlie human-specific adaptations have been highlighted in interspecies studies, including genes involved in speech and cognition, such as forkhead box P2 (FOXP2), genes associated with pregnancy, such as the progesterone receptor (PGR), genes associated with skeletal development, such as tolloid-like 2 (TLL2), and numerous other genes4,18. However, genomic comparisons between species alone do not inform us about ongoing and recent selection within species, and have little power if genes have been affected by only a single, recent selective event, even if the selection acting on the mutation is strong. To detect such selection, population genetic data are needed. These data are informative about selection that occurred less than approximately 4Ne generations ago (where Ne is the effective population size).
As a positively selected mutation increases in frequency in the population, it leaves a distinct mark on the pattern of genomic variation (FIG. 1). The pattern that is produced by such a ‘selective sweep’24 includes a reduction in the amount of variation, a temporary increase in the strength of linkage disequilibrium and a skew in the distribution of allele frequencies towards more alleles of low frequencies24–28 in the genomic region around the selected mutation. Because of recombination, the effect will be strongest in the immediate vicinity of the selected mutations, and will diminish with increasing genetic distance from them. When the selective sweep is ‘complete’, that is, when the favoured allele goes to fixation, all local variation is removed except that which has arisen by mutation and recombination during the sweep. If the sweep was rapid, then local variation will diminish to zero followed by a re-accumulation of variation through the combined processes of mutation and recombination, with a resulting site frequency spectrum that is skewed strongly towards rare alleles (FIG. 1).
Much interest has focused on identifying incomplete selective sweeps, which are seen when positively selected mutations are currently on the rise in the human populations but have not yet reached a frequency of 100%. The pattern that is left by such mutations is distinctive, involving some locally identical haplotypes that segregate at moderate or high frequencies, whereas the remaining haplotypes show normal levels of variability (FIG. 1).
One of the most famous examples of an incomplete sweep is that at the lactase (LCT) locus in European populations. Variants in this gene influence whether the ability to produce lactase, which enables the digestion of milk, persists into adulthood. Lactase persistence is thought to have increased in frequency as a result of positive selection during the past 10,000 years after the emergence of dairy farming29–33. The striking pattern of genomic variability that is observed in this locus involves a long, high-frequency haplotype that contains an allele associated with lactase persistence34 (FIG. 2). The haplotypes that carry the allele are almost identical in regions close to the location of the causative SNP, whereas haplotypes that do not carry the allele show a normal level of variability. This is exactly the pattern we would expect to observe if the allele has recently increased in frequency as a result of positive selection. Even more striking is that some African populations that use dairy farming also carry a high-frequency, long-range haplotype associated with lactase persistence, but the mutation is distinct from the one observed in Europeans35. Another gene that shows almost as strong evidence for an incomplete selective sweep is the glucose-6-phosphate dehydrogenase gene (G6PD): deficiency alleles confer resistance to malaria and show a signature of positive selection36–38.
The LCT and G6PD loci provide some of the most striking examples of ongoing selective sweeps in the human genome. These genes were identified a priori as candidate genes on the basis of functional information. Several recent papers have aimed at detecting loci under positive selection without such prior knowledge, based on genome-wide genotyping data. These methods can be used equally well to detect selection in non-coding and protein-coding regions, but the results are usually interpreted in terms of predictions for protein-coding regions, because most functional annotation is focused on genes. The following discussion will, therefore, also focus mostly on the results obtained for protein-coding regions.
The striking haplotype pattern that is observed at the LCT locus helped to motivate the development of the relative extended haplotype homozygosity (rEHH) and integrated haplotype score (iHS) tests for incomplete selective sweeps39,40 (BOX 1). These methods identify selection when a high-frequency haplotype with little intra-allelic variability is observed. The most comprehensive application of these methods made use of samples from the International HapMap Project, using some 800,000 SNPs in 89 Japanese and Chinese, 60 European and 60 Yoruban individuals40. Although there was significant overlap among populations, much of the evidence for selection was found to be specific to just one of them. This is not surprising as these methods have power primarily to detect incomplete selective sweeps, which might not have spread among different human populations. In addition to the LCT and G6PD loci, Voight et al.40 also found signatures of selection for the 17q21 inversion in Europeans41, many cytochrome P450 genes, including CYP3A5 (REF. 42), the alcohol dehydrogenase (ADH) cluster in Asians43 and the olfactory-receptor clusters on chromosome 11 in Africans44. Cytochrome P450 genes, which are important in detoxification of plant secondary compounds, showed a signature of excess positive selection in all populations, with CYP3A5, CYP2E1 and CYP1A2 standing out. Finally, many genes involved in skin pigmentation showed signatures of positive selection outside of Africa, which would be consistent with the hypothesis that alleles that confer lighter skin colour have a selective advantage in regions that are further from the tropics.
Interestingly, some of these same genes, and categories of genes, are also detected by comparative studies of selection based on human–chimpanzee divergence, including the olfactory-receptor clusters on chromosome 11 (REFS 18,20), suggesting that the selection acting on these genes occurred not only during recent human evolution, but also deeper in our ancestral past.
An influential approach for detecting recent and strong natural selection is the extended haplotype test91 and its derivatives40. The extended haplotype test relies on the linkage-disequilibrium structure of local regions of the genome. A haplotype at high frequency with high homozygosity that extends over large regions is a sign of an incomplete selective sweep. The method identifies tracts of homozygosity within a ‘core’ haplotype, using the ‘extended haplotype homozygosity’ (EHH) as a statistic. A relative EHH (rEHH) is calculated by comparing the EHH of the core haplotype to the EHH of all other haplotypes in the region. In the version by Voight et al.40, the EHH is summed over all sites away from a core SNP, and compared between the haplotypes that carry the ancestral and the derived allele in the SNP. The statistic (iHS — integrated haplotype score) is then normalized to have a mean of 0 and variance of 1. A related test was proposed by Wang et al.6, called the linkage-disequilibrium decay (LDD) test, which makes use of only homozygous SNP sites and therefore does not require separate phasing of haplotypes.
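As a rough illustration of the EHH idea described above, the following Python sketch computes EHH for the chromosomes that carry a given allele at a core SNP, and then sums it over all other sites. It is a simplified sketch, not the implementation used in the cited studies: haplotypes are assumed to be phased strings of '0' (ancestral) and '1' (derived), sites are assumed to be equally spaced, and the function names `ehh` and `integrated_ehh` are hypothetical. The published iHS additionally integrates over genetic distance and standardizes the score, which is omitted here.

```python
from collections import Counter
from math import comb

def ehh(haplotypes, core, allele, upto):
    """EHH among chromosomes carrying `allele` at site `core`, extended to site `upto`.

    Defined here as the probability that two randomly drawn carrier
    chromosomes are identical at every site between `core` and `upto`.
    """
    lo, hi = sorted((core, upto))
    carriers = [h[lo:hi + 1] for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return 0.0
    classes = Counter(carriers)  # identical extended haplotypes share a class
    return sum(comb(c, 2) for c in classes.values()) / comb(n, 2)

def integrated_ehh(haplotypes, core, allele):
    """Sum of EHH over all sites away from the core (unit spacing assumed)."""
    n_sites = len(haplotypes[0])
    return sum(ehh(haplotypes, core, allele, j) for j in range(n_sites) if j != core)

# Toy data (made up for illustration): derived-allele carriers share one long haplotype.
haps = ["1111", "1111", "1111", "0101", "0010", "0110"]
derived_total = integrated_ehh(haps, core=0, allele="1")
ancestral_total = integrated_ehh(haps, core=0, allele="0")
```

In this toy sample the derived-allele carriers are identical at every site, so their summed EHH is much larger than that of the ancestral-allele carriers, which is the kind of contrast that the haplotype-based tests look for.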
Several classical methods for detecting selection are based on the distribution of allele frequencies in SNPs, or the site frequency spectrum (SFS). The SFS can be ‘unfolded’, in which case the spectrum tallies the counts in the sample of the derived (more recently arisen) allele. For a sample of k chromosomes, the unfolded SFS gives the number of sites with 1, 2, 3, …, k–1 copies of the derived allele. The ‘folded’ SFS is applied when one does not know which is the ancestral and which is the derived allele. In this case, the classes with j and k–j, where j < k/2, copies of the derived allele are not distinguishable, and so they are pooled. A selective sweep strongly affects the SFS, leading to a deficiency of alleles of intermediate frequency right after a selective sweep and an increase of such alleles during the selective sweep. Several methods for detecting selective sweeps take advantage of this fact46,92. Tajima’s D test46 detects an excess (indicated by positive values of D) or deficiency (indicated by negative values of D) of mutations of intermediate frequency relative to mutations that segregate at low or high frequencies. Fay and Wu’s47 test extends the Tajima’s D test by providing the power to detect an excess of high-frequency derived alleles, a clear signal of positive selection. The method of Kim and Stephan93 and its derivatives49,64 use the spatial pattern of the SFS to identify the location of a selective sweep.
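The SFS and Tajima’s D can be made concrete with a short Python sketch. It assumes that the input is a list of derived-allele counts per segregating site in a sample of k chromosomes; the function names are hypothetical, and the D statistic uses the standard constants from Tajima’s original formulation. A sample dominated by rare alleles gives a negative D, and a sample dominated by intermediate-frequency alleles gives a positive D.

```python
from math import comb, sqrt

def unfolded_sfs(derived_counts, k):
    """sfs[i] = number of sites with i derived copies (0 < i < k) in k chromosomes."""
    sfs = [0] * (k + 1)
    for c in derived_counts:
        if 0 < c < k:  # ignore sites that are not segregating in the sample
            sfs[c] += 1
    return sfs

def folded_sfs(sfs):
    """Pool classes i and k - i when the ancestral state is unknown."""
    k = len(sfs) - 1
    folded = [0] * (k // 2 + 1)
    for i in range(1, k):
        folded[min(i, k - i)] += sfs[i]
    return folded

def tajimas_d(sfs):
    """Tajima's D computed from an unfolded SFS (index = derived-allele count)."""
    k = len(sfs) - 1
    s = sum(sfs[1:k])  # number of segregating sites
    if s == 0:
        return 0.0
    a1 = sum(1.0 / i for i in range(1, k))
    a2 = sum(1.0 / i ** 2 for i in range(1, k))
    b1 = (k + 1) / (3.0 * (k - 1))
    b2 = 2.0 * (k * k + k + 3) / (9.0 * k * (k - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (k + 2) / (a1 * k) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Mean number of pairwise differences, summed over sites
    pi = sum(sfs[i] * i * (k - i) for i in range(1, k)) / comb(k, 2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

# Toy spectra for k = 10 chromosomes (made-up data)
rare = unfolded_sfs([1] * 10, 10)          # ten singletons
intermediate = unfolded_sfs([5] * 10, 10)  # ten sites at frequency 0.5
```

Because D is computed from the unfolded counts only through i(k − i), the same value is obtained from a folded spectrum, which is why the test does not require knowledge of the ancestral state.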
Locally increased levels of population subdivision can be caused by recent positive selection (for example, REF. 52). Several methods have been proposed for detecting selection based on this idea, for example, that of Akey et al.1, which identifies areas of increased FST, the traditional population genetic measure of population subdivision.
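To show what a per-SNP measure of population subdivision looks like, the sketch below computes FST from allele frequencies in several populations using the simple definition FST = (HT − HS)/HT, where HT is the expected heterozygosity of the pooled population and HS is the mean within-population heterozygosity. This is an assumption made for illustration: it weights populations equally and ignores sampling error, and it is not necessarily the estimator used in the studies cited here. The function name `fst` and the example frequencies are hypothetical.

```python
def fst(freqs):
    """Per-SNP F_ST from the frequency of one allele in each population.

    Uses F_ST = (H_T - H_S) / H_T with equal population weights and
    no correction for finite sample size.
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = 2.0 * p_bar * (1.0 - p_bar)  # heterozygosity of the pooled population
    if h_t == 0.0:
        return 0.0  # monomorphic everywhere: no differentiation to measure
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t

# Made-up frequencies for one SNP in two populations
similar = fst([0.50, 0.50])    # identical frequencies
divergent = fst([0.10, 0.90])  # strongly differentiated
fixed = fst([0.0, 1.0])        # alternative alleles fixed
```

A genome scan of the kind described above would compute such a value for every SNP and flag regions whose values fall in the extreme tail of the genome-wide distribution.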
A selective sweep causes a strong temporary reduction in the level of variability. The first and most well-known test of neutrality based on detecting such regions is the HKA test94, which compares levels of diversity in different genes or genomic regions (calibrated by interspecific divergence rates) to test whether the levels are significantly increased or reduced in a particular region.
The length of the conserved haplotypes that can be detected by the extended haplotype-based tests depends on the timing and strength of selection. A strongly selected mutation, caught at a time when its frequency is around 0.5–0.7, will show the strongest signature. Two particular regions of the genomes that were highlighted in the Voight et al.40 study are noteworthy for the length of the haplotypes observed. Near the β-glucosidase gene (GBA), which is associated with Gaucher disease, a glycogen-storage disorder, East Asians have a common haplotype that extends 1.39 cM and well over 1 Mb in its physical span. Another gene that is associated with carbohydrate metabolism and blood-sugar regulation is NKX2-2, which has a 1.25-cM haplotype in the European population. It is tempting to speculate that whole-genome association studies will be able to detect differences in some physiological attributes between alternative genotypes in these regions, given the strength of selection that is implied by these large haplotypes.
Finally, Wang et al. conducted a similar study using another haplotype-based test, the linkage-disequilibrium decay (LDD) test6 (BOX 1), to scan the genome for selection based on the HapMap data. They argued that 1,800 genes, or 1.6% of the genes in the genome, are currently undergoing selective sweeps. In contrast to Voight et al.40, they found that most of these regions are not specific to one population, but are shared among at least two. The categories of genes that show excess evidence for selection in this study included pathogen response, cell cycle, neuronal function, reproduction, DNA metabolism and protein metabolism.
These tests use the allele frequencies in individual segregating nucleotide sites to detect selection (BOX 1). As previously mentioned, a selective sweep causes a skew in the distribution of allele frequencies towards more alleles of low frequencies. Carlson et al.45 used Tajima’s D46 and Fay and Wu’s H tests47 (BOX 1) to scan for distortions from the expected neutral-site frequencies in the human genome using the Perlegen data48, which was supported by additional sequencing of candidate genes. They identified 7, 23 and 29 aberrant regions for populations of African, European and Chinese descent, respectively. One region contained CYP3A4 and CYP3A5, which have a central role in the metabolism of some prescribed drugs. Another region contained vitamin K epoxide reductase complex, subunit 1 (VKORC1), which has been linked to human warfarin dosing. Carlson et al. proposed that regions with extreme frequency spectra may provide important targets for genotype–phenotype studies45. However, they did not provide an analysis of the correlation of these regions with specific functional or biological categories of genes.
Williamson et al.5 used a composite-likelihood approach49 to detect selection, also based on the Perlegen SNP set48. This method compares models that are fitted to the data with and without a selective sweep to quantify the evidence in favour of such a sweep having occurred in a particular region of the genome. In contrast to haplotype-based methods, this method primarily detects recently completed selective sweeps, and is likely to have little or reduced power to detect incomplete sweeps. This is illustrated by the fact that the prime example of an incomplete sweep, the LCT locus, was not identified in this study. Williamson et al.5 found 101 genes for which there was strong evidence for a selective sweep having occurred (at a significance level of P < 10⁻⁵) and, as in Voight et al.40, there was wide variation among populations in locations of these regions. As in previous studies, genes that showed evidence of a selective sweep included those for olfactory receptors, as well as genes related to the nervous system, pigmentation and immunity. In addition, the method found increased signatures of selective sweeps around centromeres. There are several possible explanations for this pattern, one of which is increased selection associated with genetic elements that are responsible for meiotic drive in these regions50.
Another approach for detecting ongoing selection is to study between-population differences in allele frequencies. Positive selection may increase levels of genetic differentiation among populations for two reasons. First, selection might act locally and be related to adaptations to the local environment. An example might be genes relating to skin pigmentation in humans where selection is possibly related to adaptation to the local climate. Second, selective sweeps acting on mutations that arise in specific geographical regions might cause increased levels of population subdivision during the period of time in which the mutation is still increasing in frequency51,52. Even if the mutation is beneficial in all environments, the fact that the mutation arose in a particular geographical location might temporarily increase levels of population differentiation in the genomic region that is affected by selection. For example, allele frequencies for the LCT locus differ dramatically among European populations and between Europe and other continents34. The genetic differentiation in this locus among geographical regions may not be primarily caused by differences in the strength or direction of selection among regions, but rather by a historical contingency. The beneficial mutation that confers lactose tolerance in adults might have arisen in Europe first, and might therefore have so far reached the highest frequency in Europe as opposed to other continents.
Although the first attempts to use population subdivision to identify selection date back to 1973 (REF. 53), the methodology has been revived in the context of genome-wide studies1,54,55. In a study of 26,530 SNPs from African Americans, European Americans and East Asians, Akey et al.1 identified several candidate genes including the CFTR gene, the gene that is famously associated with cystic fibrosis (BOX 1). Weir et al.54 examined the HapMap data using a similar approach, and also identified several regions that showed a drastically increased level of population subdivision, including the LCT locus. The HapMap phase I publication56 also included the results of scanning genome-wide SNP data for evidence of selective sweeps based on haplotype structure, population differentiation and SFS. They identified a number of candidate regions that harbour genes with extreme differences in allele frequencies among populations, such as ALMS1, which is associated with Alström syndrome, a rare hereditary disease with clinical features including hyperinsulinaemia, visual impairment and obesity.
Inferences about human adaptation are often controversial; this is particularly clear from the debate surrounding studies of the FOXP2, asp (abnormal spindle) homologue, microcephaly-associated (ASPM) and microcephalin (MCPH1) genes, for which claims of human-specific adaptation have been made (BOX 2). Unfortunately, there is little consensus in the field of population genetics about what should be considered convincing evidence of selection. In the following, we will discuss some of the statistical issues that plague inferences of selection.
Much of the positive selection in humans, as well as in other organisms, is related to immune-defence functions and other processes that are unrelated to human-specific evolution. A list of some of the genes that show evidence for positive selection is given in REF. 4. However, there has recently been much interest in identifying loci that are involved in human-specific adaptation, particularly adaptation of cognitive skills. A prime example is the forkhead box P2 (FOXP2) gene, mutations in which cause deficiencies in language skills including grammatical competence, and are additionally associated with the motor-control of craniofacial muscles95–97. Only four FOXP2 mutations occur in the evolutionary tree of mice, macaques, orangutans, gorillas, chimpanzees and humans, two of which occur in the evolutionary lineage leading to humans. This relative speed-up in the evolution of this gene in humans is highly suggestive of positive selection1,10,98. One possible interpretation is that this selection introduced a change in the FOXP2 gene that was a necessary step to the development of speech. However, the selected phenotype could also have been unrelated to speech. FOXP2 is fairly ubiquitously expressed and also has an essential role in lung development. In addition, one of the mutations seems to have occurred independently on the carnivore lineage61, suggesting that this substitution might have been selected for reasons other than language development. Nonetheless, the gene shows a very negative value of Tajima’s D (–2.2) and other evidence for a recent selective sweep that might have coincided temporally with the emergence of anatomically modern humans10.
Other examples of genes for which there is evidence of potential involvement in human-specific adaptation include the asp (abnormal spindle) homologue, microcephaly-associated gene (ASPM) and the microcephalin gene (MCPH1). These genes are cell-cycle regulators, and loss-of-function mutations in these genes are known to cause several deleterious phenotypic effects, including reduced brain size. These genes have been proposed to have unusually long haplotypes and extensive geographical variation, which are both typical signs of recent ongoing selection9,99; the authors of these studies suggested that these patterns arose as a result of positive selection relating to increased brain size. However, Currat et al.100 re-analysed their data and concluded that human demographic models that include population structure followed by population growth can explain the patterns observed for ASPM and MCPH1 without invoking selection. Furthermore, Yu et al.101 demonstrated that ASPM is not unusual compared with other anonymously selected regions in the genome, with respect to tests for selection based on population subdivision, haplotype structure and frequency spectrum, and argued that recent positive selection at this locus is unlikely. In addition, subsequent tests for an association between IQ and specific variants of ASPM and MCPH1 proved to be negative102.
Methods for detecting selection have historically been challenged by the confounding effects of demography57–62. For example, it is well known57,60 that Tajima’s D will falsely reject neutrality in the presence of population bottlenecks or certain types of population structure (FIG. 3). In particular, a recent bottleneck tends to mimic the effects of a selective sweep in several ways27,63. In light of this concern, there has been a considerable effort to develop more robust methods of detecting recent positive selection, and some recently described methods are highly robust under various demographic models that are relevant for human populations20,64.
The issue of sensitivity to demographic factors for different types of test for selection has not been fully explored. However, some methods, such as the composite-likelihood ratio test, which is based on comparing allele frequencies in different parts of the genome, have been shown directly, using simulations, to be highly robust towards perturbations of the demographic model49. Other methods for detecting selective sweeps that are based on haplotypes are possibly also more robust than methods based on population subdivision or allele frequencies, because they compare variation between different allelic classes. Although the overall pattern of genetic variation can be strongly influenced by demographics, this influence might be smaller for the relative variability in different allelic classes. For example, under the simulation conditions of FIG. 3, we have found that the rEHH test never reveals more than 5% significant results at the 5% level when applied to a random haplotype. Although the test is unlikely to be applied to a random haplotype in real-life applications, this finding suggests some robustness to assumptions about demography. Nonetheless, no method can claim to be 100% robust to such assumptions.
A particularly worrying effect that was recently discovered is the phenomenon of ‘allelic surfing’65,66. As a population expands geographically, rare alleles can, by random genetic drift, be caught on the haplotype with the favoured allele, and ride the wave in expansion of the favoured allele, increasing in frequency in restricted geographical areas. The resulting pattern includes large extended haplotypes, and may mimic selective sweeps. Allelic surfing may provide a new challenge that many current methods do not address, and it is currently unknown whether it will be possible to develop methods that can distinguish between allelic surfing and positive selection with great accuracy.
Another fundamental problem is that ascertainment bias confounds much of the large-scale human genomic data, affecting allele frequencies, levels of population subdivision and patterns of linkage disequilibrium (BOX 3; FIG. 3b). This ascertainment bias arises when the data are obtained not through direct sequencing but by genotyping SNPs that have been discovered in another sample. Patterns of allele frequencies, linkage disequilibrium, population differentiation and so on that are observed in the data depend on the exact procedure that is used to discover SNPs for inclusion in the genotyping effort. The effects of this bias on neutrality tests can be particularly worrying when the SNP-discovery protocol differs among different genomic regions. Regions in which many sequences were used for ascertainment might appear to have more SNP alleles segregating at low frequencies and a more defined haplotype structure; that is, more haplotypes are represented in the data. The observed levels of population subdivision will also depend on the ethnic make-up of the SNP-discovery panel in the local genomic region. In such cases, apparent evidence for selection might simply be caused by variation in the discovery protocol among different genomic regions. Only when the SNP-discovery protocol is well defined can this effect be controlled for statistically5,67,68. However, in cases in which the SNP-discovery protocol is not known, it might not be possible to directly control for this effect. In most studies, little or no attempt has been made to correct for ascertainment bias, and the effect of this bias on these studies is currently unknown. However, several studies, such as the study by Carlson et al.45, used only the Perlegen data48, and not the full HapMap data, because it is known that ascertainment does not vary among regions in the former data set.
In the Voight et al.40 study, a simulation procedure was used to generate data with the same allele frequencies as in the HapMap data, in order to at least partially control for the effects of ascertainment biases.
Almost all genome-wide human population genetic data sets are based on SNP genotyping data (so far, the prime exception is that of Bustamante et al.3, which is based on resequencing of coding genes). This type of data is obtained through a process in which the SNPs are first identified by direct sequencing of a small panel of individuals, and then subsequently genotyped in a larger panel. This two-stage process introduces biases in allele frequencies, patterns of linkage disequilibrium, overall levels of variability, and so on67,68,103,104. For example, because alleles that segregate at low frequencies are unlikely to be detected in the small panel that is initially sequenced, the genotyped data will be deficient in low-frequency alleles. The effect on the statistical methods for detecting selection can be severe. An example is shown in FIG. 3b, using the Tajima’s D46 statistic (BOX 1). When the ascertainment sample size is small, Tajima’s D46 is biased towards large values, with a decreased rate of false positives from below (shown in red in FIG. 3a) and an increased rate of false positives from above (shown in blue in FIG. 3a).
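The direction of this bias can be illustrated with a minimal sketch, which is not the procedure used to produce FIG. 3b. It computes Tajima’s D from per-site derived-allele counts using the standard formula, and applies a deliberately simple discovery rule in which a SNP is kept only if it is polymorphic within a small discovery panel. The toy data set, the panel of two chromosomes and the function names are all hypothetical.

```python
import math

def tajimas_d(derived_counts, n):
    """Tajima's D from derived-allele counts at each site in n chromosomes."""
    counts = [k for k in derived_counts if 0 < k < n]  # segregating sites only
    s = len(counts)
    if s == 0:
        return float("nan")  # D is undefined without segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in counts)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

def ascertain(sites, panel):
    """Keep only sites that are polymorphic within the discovery panel."""
    return [c for c in sites if 0 < len(c & panel) < len(panel)]

# Hypothetical sample of 10 chromosomes; each site is the set of chromosomes
# carrying the derived allele. Ten singletons, two common SNPs, one doubleton.
N = 10
SITES = [{i} for i in range(N)] + [{0, 2, 4, 6, 8}, {1, 2, 3, 4, 5}, {0, 1}]
PANEL = {0, 1}  # assumed discovery panel: the first two chromosomes

d_full = tajimas_d([len(c) for c in SITES], N)
d_asc = tajimas_d([len(c) for c in ascertain(SITES, PANEL)], N)
print(f"full data: D = {d_full:.2f}; ascertained SNPs: D = {d_asc:.2f}")
```

In this toy example most singletons are never discovered, so the ascertained data are enriched for intermediate-frequency alleles and D changes sign from negative to positive.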
Although it is clear that some methods, for example methods that gain much of their information from the distribution of allele frequencies, such as Tajima’s D test46, will be largely invalidated by the ascertainment bias (FIG. 3b), the effect on other methods, such as methods based on haplotype patterns, is largely unexplored. The effect on patterns of linkage disequilibrium can be severe67, but the effect on many of the statistics that are used to detect selection is unclear. Tests based on haplotype structure, such as extended haplotype homozygosity (EHH) and integrated haplotype score (iHS), are commonly thought to be less affected by ascertainment bias because, even when there is a strong bias towards alleles of high frequency, haplotype structure might not be strongly affected as long as the selected SNPs adequately tag the haplotypes. For example, for a simple ascertainment scheme such as the one in FIG. 3b, with constant ascertainment based on the same chromosomes for all SNPs, the distribution of the relative EHH (rEHH) statistics for a randomly chosen haplotype is largely unaffected, with 6% false positives at the 5% significance level for an ascertainment sample size of n = 2.
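For readers unfamiliar with haplotype-based statistics, the following sketch computes EHH under its usual definition: the probability that two randomly chosen chromosomes carrying the core allele are identical at all markers between the core and a given position. It covers only the basic EHH quantity, not rEHH or iHS, and the haplotypes and function name are hypothetical.

```python
from collections import Counter
from math import comb

def ehh(haplotypes, core_index, core_allele, end_index):
    """EHH from the core marker out to end_index (inclusive).

    haplotypes: equal-length strings of allele codes, one per chromosome.
    """
    lo, hi = sorted((core_index, end_index))
    # Restrict to chromosomes carrying the core allele, and to the interval.
    carriers = [h[lo:hi + 1] for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return float("nan")  # no pairs of chromosomes to compare
    groups = Counter(carriers)  # count identical extended haplotypes
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)

# Hypothetical phased data: six chromosomes, four markers, core at marker 0.
HAPS = ["1100", "1100", "1101", "1110", "0011", "0001"]
decay = [ehh(HAPS, 0, "1", end) for end in range(4)]
print(decay)
```

EHH starts at 1 at the core and decays with distance as recombination and mutation break up the core haplotype; a slow decay around a common allele is the signal that the haplotype-based tests look for.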
Most methods can be corrected for ascertainment bias problems if the SNP-discovery protocol is well documented67. Unfortunately, such information is not available for much of the genome-wide data that have been generated for humans.
Tests for selection can also be highly sensitive to assumptions about recombination rate. Regions of low recombination may produce an upward bias in detection of selection, because the variance in most statistics is increased in regions of low recombination. Distinct from the issue of bias, the statistical power of all of the tests will be a function of the recombination rate, owing to the fact that a selective sweep will leave a much stronger signal in regions of low recombination (FIG. 1). For example, one of the reasons the signal seems to be so strong for LCT might be that this gene occurs in a region of low recombination (FIG. 2).
We examined the average recombination rate for genes or regions that have been determined to be under selection in various studies (TABLE 1), based on the linkage maps of Kong et al.69 and the linkage-disequilibrium map of Myers et al.70. The regions identified by some of the studies show a drastically reduced level of recombination. For example, the regions in the Wang et al.6 study show an average recombination rate of 0.49 cM Mb−1 and 0.26 cM Mb−1, according to the Kong et al.69 and Myers et al.70 estimates, respectively. This should be compared with an average recombination rate in the genome of 1.29 cM Mb−1 and 1.33 cM Mb−1 from the two genetic maps, respectively. This does not directly show that the conclusions from the Wang et al.6 study, or any other study, are invalid. It is entirely possible that the low level of recombination in the regions identified in the Wang et al.6 study is caused by an increase in statistical power to detect selection in regions of low recombination. Nonetheless, the results of these studies should be interpreted in the context of the fact that they primarily identify regions of low recombination. An obvious solution to this problem in future studies is to use critical values that are specific to the local recombination rate, and/or to primarily compare regions with similar recombination rates71. It might also be preferable to use methods with low sensitivity to the local recombination rate. However, the relative robustness of different methods to assumptions about recombination rates has not been systematically explored.
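A minimal sketch of recombination-specific critical values is given below, assuming that null values of a test statistic are available (for example, from neutral simulations) together with the local recombination rate of each simulated region. The bin edge, the toy null values and the function names are hypothetical; the toy values are constructed only so that the statistic has a larger variance in the low-recombination bin.

```python
import bisect
import math

def rate_bin(rate, edges):
    """Index of the recombination-rate bin (edges in cM/Mb, ascending)."""
    return bisect.bisect_right(edges, rate)

def critical_values(null_scores, null_rates, edges, q=0.95):
    """Empirical q-quantile of the null statistic within each rate bin."""
    by_bin = {}
    for score, rate in zip(null_scores, null_rates):
        by_bin.setdefault(rate_bin(rate, edges), []).append(score)
    crit = {}
    for b, vals in by_bin.items():
        vals.sort()
        # Nearest-rank quantile; the small offset guards against float error.
        rank = max(1, math.ceil(q * len(vals) - 1e-9))
        crit[b] = vals[rank - 1]
    return crit

def is_significant(score, rate, edges, crit):
    """Compare a score with the critical value of its own rate bin."""
    return score > crit[rate_bin(rate, edges)]

# Toy null values: wide spread at 0.3 cM/Mb, narrow spread at 2.0 cM/Mb.
EDGES = [1.0]
null_scores = [0.5 * i for i in range(1, 21)] + [0.1 * i for i in range(1, 21)]
null_rates = [0.3] * 20 + [2.0] * 20
CRIT = critical_values(null_scores, null_rates, EDGES)

print(is_significant(3.0, 0.3, EDGES, CRIT))  # low-recombination region
print(is_significant(3.0, 2.0, EDGES, CRIT))  # high-recombination region
```

The same observed value is judged against a stricter threshold in the low-recombination bin, which is the intended effect of conditioning on the local rate. How finely to bin is a design choice that depends on how many null replicates are available per bin.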
Because of the statistical uncertainties in detecting selection that are discussed above, several studies have not assigned p values to specific genes, but have simply relied on detecting outliers in the genome (for example, REF. 40). We stress that this is not the equivalent of a determination of statistical significance. Teshima et al. evaluated the accuracy of such outlier approaches using simulations and concluded that, although these approaches can identify many genes under selection, they may tend to give a biased view of which genes are under selection, depending on the specific assumptions that are used to define outliers71. Application of an outlier approach does not circumvent the problem of assigning statistical confidence. It is obviously preferable to have an accurately assigned p value which can be used to measure confidence in inferences about selection.
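The distinction between an outlier and a statistically significant result can be made concrete with a small sketch. An empirical outlier rule returns a fixed fraction of loci whatever the data look like, so the size of the candidate list carries no information about how many loci, if any, are under selection. The scores and locus names below are hypothetical.

```python
def top_outliers(scores, percent=1):
    """Return the loci in the top `percent` of the empirical distribution."""
    k = max(1, len(scores) * percent // 100)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical genome A: 200 loci with unremarkable scores.
genome_a = {f"locus{i}": (i % 10) / 10 for i in range(200)}
# Hypothetical genome B: the same scores, plus one locus with an extreme value.
genome_b = dict(genome_a, locus7=25.0)

outliers_a = top_outliers(genome_a)
outliers_b = top_outliers(genome_b)
print(len(outliers_a), len(outliers_b))
```

Both genomes yield the same number of candidates, although only the second contains a locus that stands apart from the rest of the distribution.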
One potentially worrying issue is that many of the studies described above have identified different sets of genes as having been subject to recent or ongoing selection. For example, only three regions have been identified as being under selection in each of the Williamson et al.5, Voight et al.40 and Carlson et al.45 studies. A summary of the overlap among gene sets is provided in TABLE 2. The relatively low rate of overlap is not surprising for comparisons of some of these studies, such as those of Williamson et al.5 and Voight et al.40, as these studies aimed to detect completed and incomplete sweeps, respectively, and are therefore expected to pick up different genes. The power of each test depends on the strength of selection, the time that has elapsed since the mutation arose, the dominance of the selected mutations and so on4,71,72.
More worrying, however, is that only 7 of the 90 genes flagged by the Wang et al.6 study are also found among the 713 candidate genes of the Voight et al.40 study. Both aim at detecting incomplete selective sweeps based on extended haplotype methods. Although this might lead to some natural concern about the validity of the conclusions in genome-wide scans for selection, there are several reasons why we believe that the results of most of the studies discussed are reliable, if they are interpreted correctly. First, several of the methods have been evaluated extensively in simulation studies or by theory, and have been shown to be highly robust to the underlying assumptions3,5,49. Second, several of the methods find their strongest signal in regions that have been independently identified as being under selection (for example, REF. 40) such as the LCT region. And finally, we know from theory and simulations that all of the methods are sufficiently statistically sensitive to identify regions of strong selection. So, in the presence of selection, they will pick up selected loci. However, genome-wide studies should be interpreted in light of the fact that the false-negative and false-positive rates are often effectively unknown owing to statistical uncertainties, and that any set of candidate genes, except the most restrictive ones, can contain some false positives. Inferences of positive selection from population genetic data may always be associated with some uncertainty. However, as we learn more about the robustness of individual tests, the demographic history of human populations and the recombination landscape, this level of uncertainty is reduced.
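One way to put an overlap between two candidate lists into context is to compare it with the overlap expected if the two lists had been drawn independently at random from the same set of genes. The sketch below does this with a hypergeometric model. The list sizes (90 and 713) and the observed overlap (7) are taken from the text, but the total number of genes considered is not stated there; the value of 20,000 is purely an illustrative assumption, as is the premise that both studies surveyed the same set of genes.

```python
from math import comb

def expected_overlap(n_a, n_b, n_total):
    """Expected number of shared genes between two random lists."""
    return n_a * n_b / n_total

def prob_overlap_at_least(k, n_a, n_b, n_total):
    """P(X >= k) when n_a genes are drawn at random from n_total genes,
    of which n_b belong to the other list (hypergeometric model)."""
    total = comb(n_total, n_a)
    upper = min(n_a, n_b)
    tail = sum(comb(n_b, x) * comb(n_total - n_b, n_a - x)
               for x in range(k, upper + 1))
    return tail / total

N_TOTAL = 20_000  # ASSUMPTION: illustrative gene count, not from the text
exp = expected_overlap(90, 713, N_TOTAL)
p_tail = prob_overlap_at_least(7, 90, 713, N_TOTAL)
print(f"expected overlap by chance: {exp:.2f}; P(overlap >= 7): {p_tail:.3f}")
```

The result depends directly on the assumed gene total and on the independence model, so it should be read as an order-of-magnitude reference point rather than as a test of either study.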
The uncertainty regarding inferences of natural selection has led to calls for functional studies to verify claims of natural selection73. One example of such a functional study is that from Shu et al.74 showing that FOXP2 knockout mice are unable to produce certain vocalizations. However, it is important to realize that much of the selection that is occurring might be acting without any detectable phenotypic effects: it might be acting only under certain conditions (for example, in the presence of certain pathogens), and the selective effects might be so subtle that they are not easily measurable. Additionally, whereas functional effects are often caused by selection, functional differences alone do not demonstrate the past or present action of selection75. For example, the fact that humans from northern Europe have light-coloured hair does not in itself show that there has been selection for light hair colour in these populations. In theory, the evolutionary change in hair colour could have been caused by genetic drift as selection to maintain a dark hair colour has been relaxed, or it could have been caused by indirect effects due to selection relating to skin colour. Almost 30 years ago, Gould and Lewontin76 launched a crusade against the adaptive paradigm that functional differences must be adaptive, that is, caused by natural selection. Although their arguments were controversial at the time, they have been highly influential on the community of evolutionary biologists. As a new generation of biologists with a background in genomics, molecular biology or bioinformatics has taken leadership in the field of genomic evolutionary biology, the old lessons from Gould and Lewontin76 seem to have been forgotten. It is a desirable addition to a story of selection to identify possible functional reasons why selection might be acting, but it will never be a method for identifying or verifying selection.
Genetic factors underlying both complex and Mendelian diseases should, under most circumstances, be affected by selection, because most diseases will have an effect on organismal survival or reproduction. One exception might be diseases with late onset, under the assumption that older individuals do not contribute to the fitness of their offspring. Examples of negative selection acting on disease-causing mutations include mutations in GBA causing Gaucher disease77, mutations in the nucleotide-binding oligomerization-domain-containing 2 gene (NOD2; also known as CARD15) causing Crohn disease78, mutations in CMH1, CMH2, CMH3, and CMH4 causing familial hypertrophic cardiomyopathy79, and a host of other genetic disorders that are caused by de novo mutations or segregating recessive mutations. If negative selection alone is operating, the disease mutations are expected to segregate at low frequencies, and to be predominantly recessive. However, there are rare examples of partially dominant disease mutations segregating, such as in some forms of familial hypertrophic cardiomyopathy79, in cases in which the fitness effects of the mutations are relatively low.
In some cases, disease-causing mutations can segregate at relatively high frequencies (for example, diabetes32,80–82), which is not easily explainable if only negative selection has been acting on the disease mutations. Possible explanations for this include balancing selection, for example, in the case of mutations in the G6PD locus or in the β-globin gene, which cause G6PD enzyme deficiency and sickle-cell anaemia, respectively, in the homozygous state, but can confer partial protection against malaria in the heterozygous state83,84. Another example is provided by mutations in the CFTR locus, which cause cystic fibrosis in the homozygous state but protect against asthma in the heterozygous state85.
Another possible explanation for the segregation of disease alleles at moderate or high frequencies is that genetic drift has acted on mutations that have only moderate fitness effects, possibly exacerbated by bottlenecks in the population size, as suggested for Gaucher disease in Ashkenazi Jews77. Yet another explanation is that there might have been a recent change in the direction of selection. For example, according to the popular thrifty-genotype hypothesis86, selection originally worked to maximize metabolic efficiency, especially in population groups that often encountered a scarcity of food. With (evolutionarily) recent dietary changes, the direction of selection might have been reversed, causing many common alleles that are now related to metabolic diseases and/or diabetes to be selected against. There are also other reasons why disease genes might be associated with positive selection; for example, an increased frequency of moderately deleterious mutations due to genetic hitch-hiking25 during a selective sweep.
There is considerable interest in further elucidating the relationship between heritable diseases and selection. Bustamante et al.3 compared human genetic variation across more than 11,000 protein-coding genes that were re-sequenced in 39 individuals (19 African Americans and 20 European Americans). Comparing polymorphism and divergence between humans and chimpanzees at synonymous versus non-synonymous sites, they quantified the amount of positive or negative selection acting on each gene, including both current selection and that which has occurred during the shared evolutionary history of humans and chimpanzees. The study showed that mutations in evolutionarily constrained genes are disproportionately associated with heritable disorders. Specifically, although less than 12% of known genes have been associated with a Mendelian disorder, genes that show plentiful amino-acid variation in human populations (at least four amino-acid-replacement SNPs), but no divergence between humans and chimpanzees, have a 50% chance of causing at least one Mendelian disease. This is the expected pattern in genes if negative selection is acting on new mutations.
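The logic of contrasting polymorphism with divergence can be sketched with the neutrality index, a simple summary of the four counts involved. This is a deliberately simplified stand-in and should not be taken as the model-based analysis used by Bustamante et al.3; the counts and the function name are hypothetical.

```python
def neutrality_index(pn, ps, dn, ds):
    """Neutrality index NI = (Pn / Ps) / (Dn / Ds).

    pn, ps: non-synonymous and synonymous polymorphisms within the species.
    dn, ds: non-synonymous and synonymous fixed differences between species.
    NI > 1 indicates an excess of amino-acid polymorphism relative to
    divergence; NI < 1 indicates an excess of amino-acid divergence.
    """
    numerator = pn * ds
    denominator = ps * dn
    if denominator == 0:
        # No synonymous polymorphism or no amino-acid divergence.
        return float("inf") if numerator > 0 else float("nan")
    return numerator / denominator

# Hypothetical gene with excess amino-acid polymorphism.
print(neutrality_index(pn=4, ps=8, dn=2, ds=8))
# Hypothetical gene with excess amino-acid divergence.
print(neutrality_index(pn=2, ps=8, dn=6, ds=8))
# Hypothetical gene with amino-acid polymorphism but no amino-acid divergence.
print(neutrality_index(pn=4, ps=5, dn=0, ds=7))
```

The third case corresponds to the pattern described above, in which amino-acid variants segregate within humans but none has been fixed between humans and chimpanzees, as expected if such variants are mildly deleterious and are removed before reaching fixation.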
These results have been extended in a recent analysis by R. M. Bleckham et al. (unpublished observations) aimed at identifying differences in selective constraint among Mendelian disease genes, genes that contribute to complex disease and genes that are not associated with a disease. This study used the same data as in the Bustamante et al.3 study, in addition to divergence data from human–macaque comparisons to quantify selective constraints. The authors correlated their findings with a hand-curated version of the Online Mendelian Inheritance in Man database (OMIM), and concluded that Mendelian disease genes tend to be more constrained than those that contribute to non-Mendelian disease, with stronger purifying selection acting on genes with dominant rather than recessive disease mutations. As previously demonstrated by Thomas et al.87, they also found that genes that are implicated in complex diseases tend to be under less purifying selection than either Mendelian disease genes or non-disease genes, with some showing evidence of recent positive selection as reflected by high values of Tajima’s D. This might be taken as support for the thrifty-genotype hypothesis, but could also be consistent with balancing selection acting on these genes. A recent survey by Zlotogora88 discusses 14 common autosomal recessive diseases that show genetic heterogeneity, even in isolated populations. The author argues that the observed pattern of multiple mutations segregating at high frequency can be adequately explained only if selection is, or has been, acting in favour of the mutations.
Whatever the role of positive and negative selection in explaining the presence of disease mutations, it is clear that there is a well-established relationship between disease status and selection that can be exploited when searching for the genetic causes of heritable diseases. This can be done at two levels: by selecting candidate loci using bioinformatic methods for detecting selection, or by prioritizing candidate SNPs or haplotypes by ranking them according to the magnitude of their inferred fitness effects. The latter methodology is already well established in the use of computational methods for determining levels of conservation, such as SIFT89 or PolyPhen90.
Although the field of evolutionary genetics has been challenged by difficulties in separating the effects of demographic history and natural selection, the availability of improved statistical methods has helped to improve the accuracy of predictions of natural selection. With the emerging full-scale re-sequencing of human genomes, other challenges relating to the currently available SNP genotyping data will be eliminated. The re-sequencing of large, diverse panels of human populations will help to settle many of the outstanding questions regarding selection in humans. In addition, it will provide a vehicle for exploiting the link between selection and disease, which will inform many studies on heritable diseases. However, even in the presence of large data sets based on full re-sequencing, there may be debates about historical evolutionary events that cannot be settled. Evolutionary biology is by and large a historical science, and practitioners in this field will have to learn to live with the uncertainty that arises from making inferences about an experiment of nature that cannot be repeated.
We would like to thank D. Reich, M. Przeworski and two anonymous reviewers for their helpful comments on earlier versions of this manuscript. This work was supported by Danmarks Grundforskningsfond and the US National Institutes of Health grants R01HG003229 and U01HL084706.