In recent years, great efforts have been made to assess the role played by natural selection during human evolution. Genome-wide scans for recent positive natural selection identified a putative list of non-neutrally evolving genes involved in specific biological pathways including metabolism, immune function, and skin pigmentation. These findings suggest that selective pressures related to adaptation to local environmental conditions might have contributed to shaping human genetic variation.
Here we developed a statistical framework for identifying signals of adaptation to local environments. We correlated the spatial allele frequency distribution of a large sample of SNPs, genotyped in more than 50 populations distributed worldwide, to a set of environmental factors, describing local geographical features such as climate conditions, diet regimes (measured as subsistence strategies) and pathogen loads.
Previous studies have shown that SNPs with an increased degree of population genetic differentiation (measured using FST or other statistics) are enriched for genic SNPs. Our analyses confirm these observations by finding a significant enrichment of genic SNPs, in particular non-synonymous SNPs, vs. intergenic SNPs for high values of the regression prediction accuracy, Q2, which provides a measure of genetic differentiation among populations relative to the defined set of environmental variables.
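As a rough illustration of a cross-validated prediction accuracy of this kind, the sketch below computes a leave-one-out Q2 for a single SNP. It assumes the common definition Q2 = 1 - PRESS/TSS and a single environmental predictor; the function names `fit_line` and `loo_q2` and the toy inputs are hypothetical, and the regression model actually used in the study may differ.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def loo_q2(env, freq):
    """Leave-one-out Q2 = 1 - PRESS / TSS (assumed definition).

    env  : list with one environmental value per population
    freq : list with one allele frequency per population
    """
    n = len(env)
    mean_f = sum(freq) / n
    press = 0.0
    for i in range(n):
        # Refit the model without population i, then predict it.
        a, b = fit_line(env[:i] + env[i + 1:], freq[:i] + freq[i + 1:])
        press += (freq[i] - (a + b * env[i])) ** 2
    # Total sum of squares around the overall mean frequency.
    tss = sum((f - mean_f) ** 2 for f in freq)
    return 1.0 - press / tss
```

Unlike the in-sample R2, this quantity can be negative when the environmental variable predicts held-out populations worse than the mean frequency does, which is why it is a stricter measure of association.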
Interestingly, the enrichment of non-synonymous SNPs is quantitatively greater than the enrichment of genic variants, in line with the hypothesis that a larger fraction of non-synonymous SNPs has direct functional effects.
We can exclude the possibility that this enrichment is explained by different distributions of recombination rates, allele frequencies or FST between genic and intergenic SNPs, as the enrichment is also apparent when stratifying according to MAF, recombination rate or FST.
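The stratification argument above can be sketched as follows: within each MAF bin, the fraction of genic SNPs in the upper tail of Q2 is compared with the fraction of genic SNPs in the whole bin. This is a minimal sketch under stated assumptions; the tuple layout `(maf, q2, is_genic)`, the equal-width binning, and the default tail fraction are hypothetical choices, not those of the study.

```python
from collections import defaultdict


def stratified_enrichment(snps, tail_fraction=0.05, n_bins=5):
    """Fold enrichment of genic SNPs in the top Q2 tail, per MAF bin.

    snps: iterable of (maf, q2, is_genic) tuples (hypothetical layout).
    Returns {bin_index: genic fraction in tail / genic fraction in bin}.
    """
    bins = defaultdict(list)
    for maf, q2, is_genic in snps:
        # MAF lies in (0, 0.5]; map it to one of n_bins equal-width bins.
        idx = min(int(maf / 0.5 * n_bins), n_bins - 1)
        bins[idx].append((q2, is_genic))

    result = {}
    for idx, members in sorted(bins.items()):
        # Rank SNPs in this bin by Q2, highest first, and take the tail.
        members.sort(key=lambda m: m[0], reverse=True)
        k = max(1, int(len(members) * tail_fraction))
        tail = members[:k]
        genic_all = sum(g for _, g in members) / len(members)
        genic_tail = sum(g for _, g in tail) / len(tail)
        result[idx] = genic_tail / genic_all if genic_all > 0 else float("nan")
    return result
```

A fold enrichment above 1 in every bin would indicate that the excess of genic SNPs at high Q2 is not driven by differences in allele frequency; the same scheme applies to bins of recombination rate or FST.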
It is worth noting that the SNP data analyzed here suffer from an ascertainment bias, owing to the protocols used for selecting SNPs for the genotyping platforms. One of the main effects of the ascertainment bias is a shift toward common variants, with non-negligible consequences for various statistics, such as measures of population structure. However, as previously mentioned, the results hold up even when stratifying with respect to MAF and FST, suggesting that ascertainment biases, which primarily affect the data through the allele frequency, do not have a strong effect on our results.
Overall, these results strongly indicate that the enrichment of genic and non-synonymous variants among SNPs with a high value of Q2 may truly reflect the action of natural selection.
Importantly, we find a quantitatively higher, and statistically significant, enrichment of putative functional SNPs for high values of Q2 for models comprising pathogens as predictors rather than climate or diet, even when testing additional climate variables such as the annual ranges of temperature and precipitation. Although all the environmental factors we have investigated contribute to Q2, our results suggest that pathogens are a more important driver of local adaptation than the other factors explored in this paper.
To further investigate this issue, we computed partial Mantel correlations between the locus-specific population genetic distance and three different matrices describing pathogen load, diet regimes or climate conditions. In doing so, we used the average distance of allele frequencies as a covariate to control for background demographic processes. As expected, most (approx. 95%) of the variance in allele frequencies among populations can be explained by non-adaptive processes. Nonetheless, we were able to identify a non-negligible contribution of selection. Several loci showed large values (>15%) in the improvement of explained variance, I(R2), when adding a specific environmental matrix (pathogen, diet or climate; see Materials and Methods, Figure S3, Table S5).
Genes with a statistically significant I(R2) are likely targets of local selection because I(R2) measures the increase in explained variance by an environmental factor when taking average distances among populations into account. In particular, there is a strikingly larger number of genes significantly correlated to the distance matrix describing pathogen diversity compared to the ones related to climate conditions or diet regimes. A total of 103 genes are significantly correlated in frequency with pathogen predictors while none correlates with climate or subsistence strategies. This predominant role of pathogen-driven selection in the human genome is confirmed when testing each variable within each environmental category separately (229, 10 and 9 genes significantly correlated in frequency with at least one pathogen, subsistence and climate variable, respectively).
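The partial Mantel correlation and the improvement in explained variance can be illustrated with a short sketch. It assumes that I(R2) is the difference between the R2 of a regression of locus-specific distances on both the background and the environmental distances and the R2 obtained from the background distances alone, and it assesses significance by jointly permuting rows and columns of the genetic distance matrix, which is only one of several permutation schemes in use. All function names are hypothetical and the study's exact procedure may differ.

```python
import math
import random


def upper(m):
    """Flatten the upper triangle (i < j) of a square distance matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(a, b):
    """Pearson correlation between two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def _correlations(gen, env, bg):
    y, x, z = upper(gen), upper(env), upper(bg)
    return pearson(y, x), pearson(y, z), pearson(x, z)


def partial_mantel_r(gen, env, bg):
    """Correlation of genetic and environmental distances given background."""
    ryx, ryz, rxz = _correlations(gen, env, bg)
    return (ryx - ryz * rxz) / math.sqrt((1 - ryz ** 2) * (1 - rxz ** 2))


def improvement_r2(gen, env, bg):
    """Assumed I(R2): R2(background + environment) minus R2(background)."""
    ryx, ryz, rxz = _correlations(gen, env, bg)
    r2_full = (ryx ** 2 + ryz ** 2 - 2 * ryx * ryz * rxz) / (1 - rxz ** 2)
    return r2_full - ryz ** 2


def partial_mantel_p(gen, env, bg, n_perm=999, seed=0):
    """One-sided permutation p-value for the partial Mantel correlation."""
    rng = random.Random(seed)
    observed = partial_mantel_r(gen, env, bg)
    order = list(range(len(gen)))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(order)
        # Relabel populations in the genetic matrix only.
        shuffled = [[gen[i][j] for j in order] for i in order]
        if partial_mantel_r(shuffled, env, bg) >= observed:
            hits += 1
    return hits / (n_perm + 1)
```

Under this assumed definition, I(R2) equals the squared partial Mantel correlation multiplied by the fraction of variance left unexplained by the background distances, so a locus can only reach a large I(R2) if demography alone fits it poorly.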
Furthermore, we validated our results using low-coverage sequencing data for a smaller set of SNPs and populations, ruling out the possibility that ascertainment bias coupled with a bias in reporting pathogen diversity may lead to the observed prevalence of pathogen-driven selection. We should add that factors other than those examined here could affect local adaptation. The quantitative measures used here may not be the ones that correlate most closely with the components of the environment that affect fitness. Other measures of local climate or subsistence that include variables not examined here might show a stronger effect on local adaptation. However, among the quantitative measures of environmental factors explored here, it is clear that pathogen load has been the most important factor shaping human genetic diversity.
It is perhaps not surprising that selection related to pathogens appears to be the dominant driver of local adaptation, given the number of studies reporting pathogen-related selection in humans, including selection on proteins used by pathogens to infect cells (such as certain blood group antigens), pathogen receptors (such as the TLR family and glycosylated extracellular membrane proteins) and selection on gene products directly involved in the immune/defense response to pathogens.
Infectious diseases have represented, and still represent, one of the major causes of death for human populations, especially in developing countries. Not surprisingly, genes responsible for heritable variation in the response to pathogens are likely targets of natural selection.
It may be more surprising that the pressure imposed by parasitic worms (helminths) on human genes has been stronger than that due to viral, protozoan or bacterial agents. Perhaps this is because helminths evolve more slowly than unicellular or viral agents and often have complex life cycles, which results in a relatively stable geographic distribution. Evolutionary changes in helminths, therefore, occur on a time-scale similar to that of humans, allowing for a true co-evolutionary interaction between humans and the pathogen. Faster evolving species (e.g., viruses) may not exert the same selective pressure for long enough to induce a sufficiently strong change in allele frequencies.
We identified signatures of pathogen-mediated selection in 103 distinct human genes. Overall, genes highly correlated with pathogen diversity show a significant enrichment of immunity-related functions, according to Gene Ontology analysis. Again, these findings strongly suggest that the candidate loci we detected truly are targeted by natural selection due to adaptation to pathogens.
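An over-representation analysis of this kind is often based on a one-sided hypergeometric test. The sketch below shows that calculation as an assumption only: the text does not state which test was used for the Gene Ontology analysis, and the function name `hypergeom_upper_tail` and its parameter names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(n_background, n_annotated, n_selected, n_overlap):
    """P(X >= n_overlap) under sampling without replacement.

    n_background : genes in the background set
    n_annotated  : background genes carrying the annotation of interest
    n_selected   : candidate genes drawn from the background
    n_overlap    : candidate genes carrying the annotation
    """
    total = comb(n_background, n_selected)
    upper = min(n_annotated, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

A small upper-tail probability means that the candidate set contains more annotated genes than expected from random draws from the background; in practice such p-values are corrected for the number of categories tested.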
Among the 103 genes targeted by pathogen-driven selection, 23 are directly related to immunity processes, according to the ImmPort database (www.immport.org). These genes encode signaling molecules involved in the inflammatory response, such as IL6; cell surface proteins participating in immune functions, such as ADAM17 and ITGAL; and signal transducers of the innate and adaptive immune response, such as MYD88. In particular, ADAM17 has been shown to be involved in viral entry and to participate in intestinal inflammation triggered by Toll-like receptors (TLRs). In addition to ADAM17, we have identified 9 other genes with high I(R2) values when using pathogen diversity as covariate that also participate in the Toll-like receptor signaling pathway. One of these genes, MYD88, encodes a cytosolic adapter protein central for the transduction of the immune response. This protein is implicated in sensing retroviral infections by endosomes and is also implicated in the immune response to Bacteroides fragilis, Plasmodium berghei and helminth infections. Several of the 23 immunity-related genes with high I(R2) values have previously been reported to be related to pathogen infection, mainly bacterial and viral infections (Table S8).
Interestingly, the two enriched signaling pathways we identified relate to two very different categories of immune response, and they function in the defense against different pathogen groups. Toll-like receptors (TLRs) are molecules involved in innate immunity and account for the first-line defense against viruses, bacteria, fungi and protozoa, although previous studies have demonstrated that the TLR-mediated signaling pathway is also important for resistance to helminths in mice (Schistosomal-derived lysophosphatidylcholine is involved in eosinophil activation and recruitment through Toll-like receptor-2-dependent mechanisms).
While different TLRs have previously been shown to be targets of natural selection, our data indicate that pathogens have also exerted a pressure on genes that impinge on the cellular pathways associated with these receptors.
The second signaling pathway, enriched with 13 genes targeted by pathogen-driven selection, is Leishmaniasis. Leishmania are obligate intracellular parasites (protozoa) that produce diseases in humans and mice. When associated with malnutrition, Leishmania infection can produce extremely serious symptoms, and a recent WHO survey indicates that epidemics of visceral leishmaniasis can lead to massive deaths in affected areas (http://www.who.int/leishmaniasis/). Thus, the parasite is likely to have exerted a strong selective pressure during human evolutionary history.
Dendritic cells (DC), sentinels of the immune system, detect Leishmania in vivo. It has been shown that MyD88-dependent receptors are implicated in the direct recognition of Leishmania by DC, pointing again to MyD88 as an important element in host-pathogen recognition.
Genes related to immunity and inflammation regulation are known to be common targets of natural selection. In particular, recent reports have suggested that a portion of susceptibility alleles for autoimmune diseases might be maintained in human populations because they confer increased resistance against infection. The identification of several autoimmune disease-related genes as targets of natural selection may be consistent with the hygiene hypothesis. This model states that humans have adapted to a pathogen-rich environment that no longer exists in industrialized societies. This change has reduced the exposure of the immune system to antigens, causing an overreacting immune response which favors the development of chronic inflammatory conditions.
Indeed, our data indicate that SNPs with allele frequencies that correlate highly with pathogen variables are enriched for GWAS SNPs associated with autoimmune diseases. Specifically, among our candidate genes we identified several loci that have been associated with celiac disease, ulcerative colitis (UC), type 1 diabetes (T1D), Crohn's disease (CD), and multiple sclerosis (MS) (both susceptibility and disease severity). Signatures of natural selection at risk alleles for celiac disease, UC and CD have previously been described, although these variants were located in genes different from the ones we describe herein. Conversely, only a minority of genes involved in the susceptibility to T1D and MS have been described as possible selection targets, although a certain degree of overlap among genes involved in MS pathogenesis and loci subjected to virus-driven selection has previously been noticed. Therefore, our data further support the notion that natural selection has contributed to shaping the pattern of genetic variability relating to this common disorder.
Hancock and colleagues recently performed a genome-wide scan for selection signals by detecting SNPs strongly correlated in frequency with climate. They investigated genetic variation in a set of populations and a data set of genotyped SNPs similar to those of this study. They retrieved a number of SNPs putatively subjected to climate-mediated selection, while we found only weak signals of genetic adaptation to climate conditions. There are several possible reasons for this apparent discrepancy. First, the method of Hancock and colleagues and ours are intrinsically different both in the analyzed elements (SNPs versus genes, respectively) and in the approach to detecting significant signals (extreme Bayes factors versus p-values, respectively). Most likely, our criterion for selecting extreme genes is more conservative than the one used by Hancock and colleagues. However, when applying their approach to our data set, we retrieved a significant overlap of genes correlated with different environmental factors (Table S5, Table S6). These observations suggest that the two studies, although examining different climate variables in a different sample of populations, lead to concordant results. Second, they found evidence of selection for SNPs located in immune-related genes or previously associated with autoimmune diseases and inflammatory conditions. As stated by the authors themselves, it is likely that the selective pressure imposed on these genes is related to pathogen resistance/susceptibility, which is in agreement with our main results.
A major assumption in this study is that the number of different pathogen species (pathogen richness or diversity) transmitted in a given geographic location is a good estimate of the pathogen-driven selective pressure for populations living in that area. Indeed, there is evidence that pathogen richness is a suitable and more effective measure than standard epidemiological parameters (like prevalence or mortality) for estimating the selective pressure exerted by infectious agents, and that it better captures the signatures left by adaptation to specific pathogens throughout recent human evolution. It is worth noting that our measure of pathogen diversity is noisy, discrete, possibly affected by reporting biases and calculated at the country level.
More accurate worldwide epidemiological data, as well as more detailed descriptions of diet regimes for human populations, are required to obtain a clearer picture of the effect of genetic adaptation to pathogen load or subsistence strategies, especially when comparing with adaptation to climate.
However, any inadequacies of the statistics we use to measure the pathogenic environment will lead us to underestimate the role of the pathogenic environment in human local adaptation. Perhaps pathogen-related selection plays an even stronger role in human evolution than has been shown in this study and in previous studies.