|Home | About | Journals | Submit | Contact Us | Français|
More than 2 billion individuals worldwide suffer from helminth infections. The highest parasite burdens occur in children and helminth infection during pregnancy is a risk factor for preterm delivery and reduced birth weight. Therefore, helminth infections can be regarded as a strong selective pressure.
Here we propose that candidate susceptibility genes for parasitic worm infections can be identified by searching for SNPs that display a strong correlation with the diversity of helminth species/genera transmitted in different geographic areas. By a genome-wide search we identified 3478 variants that correlate with helminth diversity. These SNPs map to 810 distinct human genes including loci involved in regulatory T cell function and in macrophage activation, as well as leukocyte integrins and co-inhibitory molecules. Analysis of functional relationships among these genes identified complex interaction networks centred around Th2 cytokines. Finally, several genes carrying candidate targets for helminth-driven selective pressure also harbour susceptibility alleles for asthma/allergy or are involved in airway hyper-responsiveness, therefore expanding the known parallelism between these conditions and parasitic infections.
Our data provide a landscape of human genes that modulate susceptibility to helminths and indicate parasitic worms as one of the major selective forces in humans.
Helminth infections are estimated to infect about 2 billion individuals worldwide (reviewed in ). Although rarely fatal, these parasites cause high rates of morbidity by establishing chronic infections. In particular, the highest parasite burdens are observed in pre-school and school-aged children, often resulting in anemia, undernourishment and growth stunting (reviewed in ). During pregnancy helminth infection is a risk factor for preterm delivery, reduced birth weight and maternal mortality (reviewed in ). Moreover, by chronically infecting their host, parasitic worms increase the susceptibility to other pathogens such as viruses, bacteria and protozoa . Previous works have indicated that the intensity of helminth infection is a heritable trait, although measures of heritability vary among studies and depend on the parasite beanalyzed .
These observations suggest that helminth infections have represented a very strong selective pressure for humans, a selective pressure that is very likely to also be remarkably constant over time. Indeed, most vertebrates have been hosting a wide range of parasitic worms for million years and humans have had their share before emerging as a species (reviewed in ). We have previously addressed the role of helminths as selective agents in human evolution analyzing a large set of human genes encoding interleukins and their receptors; we demonstrated that the pressure imposed by parasitic worms on these genes has been stronger than the one due to viral and microbial agents . The reasons for this observation likely lie in the long-term relationship between humans and helminths, in the relatively slow evolutionary rates of these parasites and in their geographic distribution being considerably stable. Here we aimed at exploiting the selection signatures left by these pathogens on human genes to identify, at the genome-wide level, candidate genes and variants that may have been subjected to helminth-driven selective pressure.
In order to search for candidate variants subjected to helminth-driven selection, an estimate of the selective pressure exerted by these pathogens needs to be defined. It has been previously suggested [4-6] that the selective pressure exerted by infectious diseases in different geographic areas can be estimated by counting the number of different pathogen species/genera that are transmitted in these regions. In particular, in the case of helminthiases, we consider parasite diversity to be a better estimate of helminth-driven selective pressure than prevalence for different reasons. First of all, comprehensive data on prevalence are impossible to retrieve for many parasite species/genera or countries and even when available, prevalence data may vary considerably within the same country depending on the surveyed regions and the population surveyed (e.g. city dwellers rather than farmers/bushmen/nomads, children rather than adults). Moreover, the prevalence of specific helminthiases might have changed greatly over recent years as a result of eradication campaigns, and historical prevalence data are rarely available. Also, it should be considered that prevalence data are difficult to combine since in endemic regions polyparasitism is common ; indeed, in these regions subjects infected with multiple helminths tend to harbor the most intense infections, possibly due to an additive and/or multiplicative impact on nutrition and organ pathology (reviewed in ).
These same observations also apply to other possible measures such as parasite burden or infection pathogenicity, which are very difficult to quantify and that may have varied considerably along human evolutionary history. Conversely, the diversity of helminth species transmitted in different geographic locations has been shown to depend upon climatic variables  which, in turn, may be considered as relatively stable over time, suggesting that diversity may better describe long-term evolutionary pressures. Thus, we calculated helminth diversity from the Global Infectious Disease and Epidemiology Network database (Gideon); as described in the method section, all entries for single helminthiases were manually inspected and all species/genera transmitted in a given location were counted as present irrespective of their prevalence. The number of different helminth species/genera per country is reported in Additional file 1 (Table S1).
We next wished to verify whether the prevalence of the most common helminth infections (as reported in ) correlates with helminth diversity across the 52 populations genotyped in the HGDP-CEPH panel (Additional file 1, Table S1). Again, we retrieved prevalence and diversity data from Gideon by manually inspecting single and retrieving all prevalence surveys. The prevalence of single helminthiases per country was obtained by averaging all surveys; data obtained on HIV-seropositive individuals were not included in the analyses because of the known modulation of HIV on the susceptibility on helminth-based infections [1,9,10]. Finally, countries with no survey data were not included in the analysis for the specific helminth. Parasites were then grouped into five major classes (following ). As shown in table table1,1, the prevalence of all parasite groups correlated with helminth diversity.
Previous analysis have indicated that several genes coding for interleukins and interleukin receptors are subjected to helminth-driven selective pressure . Interleukins are central mediators of immunity and inflammation and, in general, we might expect genes with a role in immune response to be preferential targets of helminth-driven selective pressure. Specifically, we expect variants within these genes to be more frequently associated with helminth diversity than observed for randomly sampled loci. We verified this prediction by analyzing the ImmPort list which contains 2,287 genes involved in immune response and covered in the HGDP-CEPH panel, this latter containing data for more than 660,000 single nucleotide polymorphisms genotyped in almost 950 individuals sampled throughout the world (Additional file 1, Table S1) .
We calculated Kendall's τ rank correlation coefficient between allele frequencies of SNPs in ImmPort genes and helminth diversity. A SNP was defined as being significantly associated with helminth-diversity if it displayed a significant correlation (p value after Bonferroni correction < 0.01; uncorrected p value < 1.94 × 10-7) and a τ value higher than the 95th percentile in the distribution of correlation coefficients calculated over all SNPs having minor allele frequency (MAF) similar (in the 1% range) to that of the SNP being analyzed. This latter requirement stems from the need to account for the influence of non-selective events, as a few HGDP-CEPH population result from recent or ancient admixture  and population history, migratory events and genetic drift have affected human genetic variability [12,13]. Among 2,287 ImmPort genes, 246 contained at least one SNP significantly associated with helminth diversity (Additional file 2, Table S2). The likelihood to obtain an equal or higher number of genes carrying significantly associated SNPs was assessed by a re-sampling approach. Specifically, we divided all genes with at least one SNP typed in the HGDP-CEPH panel in 24 intervals based on the number of typed SNPs; 10,000 re-samplings were then performed by selecting for each ImmPort gene, a randomly selected gene with a similar number of SNPs (i.e. a gene located in the same interval) (see methods). The number of SNPs in ImmPort genes did not differ significantly from the average number in the re-samplings (p = 0.22) and the empirical probability of obtaining 246 genes with at least one significant SNP resulted equal to 0.041, indicating that immune response genes more frequently display variants correlating with helminth diversity compared to randomly chosen loci.
When the same analysis was performed using the prevalence of soil-transmitted nematodes, filarial nematodes, schistosomes and food-borne trematodes no significant enrichment for ImmPort genes was observed (empirical p values = 0.83, 0.96, 0.45, and 0.39, respectively).
In agreement with previous suggestions , these data indicate that helminth diversity may be a reliable estimator helminth-driven selective pressure.
Given these results, we wished to identify SNPs significantly associated with helminth diversity on a genome-wide base. We therefore calculated Kendall's rank correlations between allele frequency and helminth diversity for all SNPs (n = 660,832) typed in the HGDP-CEPH panel. We next searched for instances which withstood Bonferroni correction with α = 0.05 (i.e. uncorrected p value < 7.6 × 10-8) and displayed a τ percentile rank higher than the 95th among MAF-matched SNPs. A total of 3,478 SNPs mapping to 810 distinct genes satisfied both requirements (Additional file 3, Table S3). We next verified whether climatic variables could be responsible for the correlations detected between these SNPs and helminth diversity. Hence, for all countries where HGDP-CEPH populations are located we obtained the following parameters: average annual minimum and maximum temperature, short wave radiation flux and precipitation rate (annual maximum and mean). None of these SNPs withstood Bonferroni corrections in these analyses.
Previous works have reported an enrichment of selection signatures within or in close proximity to human genes [12,14-17]. In line with these data we verified that helminth-associated SNPs are more frequently located within gene regions compared to a control set of MAF-matched variants (χ2 test, p = 0.014).
A full list of the 3,478 SNPs that showed a significant correlation with helminth diversity is available as Additional file 3 (Table S3). Table Table22 shows the 20 strongest correlations together with a short comment on the possible role of candidate genes in immune response or helminth resistance.
Among the 810 genes subjected to helminth-driven selective pressure, we identified ectodysplasin A (EDA) and its receptor (EDAR). EDAR has been subjected to strong positive selection in populations of Asian descent  and is responsible for hair thickness in these populations . One coding variant (rs3827760, 370Val/Ala) in EDAR is thought to be the selection target but it has not been genotyped in the HGDP-CEPH panel. We therefore wished to verify whether the variants we found to correlate with helminth diversity are located on the same haplotype as the selected 370Ala allele. We used the Sweep software  to analyze the haplotype structure in the genomic region encompassing EDAR; we selected a core containing the 370Ala/Val variant and used HapMap data from Asian individuals (Chinese and Japanese). The results indicated that most chromosomes carrying the putatively selected 370Ala variant also harbor four alleles we found to be significantly correlated with helminth diversity (Figure (Figure11).
We investigated the role and functional relationship among helminth diversity-associated genes using the Ingenuity Pathway Analysis and the PANTHER classification system [20-22]. For these analyses, a SNP was ascribed to a given gene if it was located within the transcribed region or in the 25 kb upstream of the transcription start site.
Unsupervised IPA analysis retrieved five high-scoring networks (p < 10-12) (Figure (Figure22 and Additional file 4, Figure S1) and two additional networks with lower scores (p < 10-9). The two highest scoring networks were merged, as well as networks 3 and 5. As shown in figure figure2,2, 37 and 32 genes in merged networks 1 and 2 correlated with helminth diversity, respectively, corresponding to almost 60% of network nodes.
We next investigated the over-representation of PANTHER classification categories among genes significantly associated with helminth diversity. Table Table33 shows the 5 significantly over-represented PANTHER pathways with the contributing genes, as well as the most significantly over-represented molecular functions and biological processes. Notably, genes involved in cytokine-mediated inflammation and integrin signaling accounted for two significantly over-represented pathways. While the multiple functions of integrins and cytokines on the immune system are established, the role of glutamate and α adrenergic receptor signaling in immune related processes are less understood. Recent evidence has indicated that ionotropic glutamate receptors are expressed by T lymphocytes (reviewed in ) and glutamate can exert different actions on these cells including triggering adhesion, proliferation and chemotaxis (reviewed in ). Similarly, α adrenergic receptors are expressed by different immunocompetent cells (reviewed in ); in particular, expression of ADRA1A is induced in human monocytes by inflammatory cytokines . Finally, it is interesting to notice that components of the treleasing hormone (TRH) receptor signaling pathway are also over-represented among genes subjected to helminth driven selective pressure. Tacts on specific receptors within the pituitary gland to stimulate the release of thyroid stimulating hormone and prolactin. The immunomodulatory role of thyroid hormones on immune functions are clearlyestablished (reviewed in ). Moreover, experiments in mice have indicated that thyroxine plays a role in the establishment of Schistosoma mansoniinfection [27,28] and animals treated with the hormone display increased parasite numbers and development of giant worms . Notably, among variants correlating with helminth diversity at the genome-wide level, we also found one SNP relatively close to the gene encoding prolactin (rs13198653, τ = 0.64, p = 4.4 × 10-9) and one variant within PRLR (prolactin receptor, rs4235652, τ = 0.62, p = 1.8 × 10-9) (Additional file 3, Table S3). PRL was shown to be an immunomodulator and acts as a cytokine on many different immune cells; indeed, PRLR is expressed by B, T and NK cells, as well as macrophages (reviewed in ) and PRL expression in T cells is regulated by IL-2, IL-4 and IL-1β. These data therefore suggest a role for both thyroid hormones and prolactin in the resistance to parasitic worms.
We next wished to analyze the relationship between variants/genes associated with helminth diversity and the genetic susceptibility to asthma and allergy. We searched among published genome-wide association studies (GWAS) for SNPs that have been associated with allergy, asthma or related traits (serum IgE levels and plasma eosinophil count). Only 12 SNPs were retrieved, 9 of them genotyped in the HGDP-CEPH panel. One of these SNPs (rs12619285) displayed a significant correlation (p = 5.8 × 10-8) with helminth diversity (Table (Table4)4) and the allele associated with high eosinophil counts  positively correlated with the diversity of parasitic worms. In order to gain further insight into this issue, we focused on genes rather than variants, in line with the view that the gene rather than the allele should be regarded as the replication unit. Given the low consistency that often plagues association studies, only robust allergy/asthma susceptibility genes were considered (see methods). As shown in Table Table4,4, we observed that 12 allergy/atopy genes displayed at least one SNP (either genic or intergenic) significantly associated, at the genome-wide level, with helminth diversity. Among these genes, one SNP in IL4 that has been associated with asthma (rs2070874, +33C/T) has also been genotyped in the HGDP-CEPH panel; again, the allele that positively correlates with helminth diversity (T) is associated with asthma .
It is worth noting that we found several SNPs located upstream the transcription start site of CLTA4 and subjected to helminth driven selective pressure. CLTA4 is located on the long arm of chromosome 2, telomeric to CD28. This raises the possibility that the observed allele associations at these two genes (see Figure Figure3)3) derive from linkage to a single selected allele. Yet, analysis of linkage disequilibrium (LD) (Additional file 5, Figure S2) indicates that LD is not extensive across the genomic region (also due to the presence of a recombination hot-spot in between the two genes), suggesting that CD28 and the promoter region of CTLA4 are independent selection targets.
Here we propose that candidate susceptibility genes for parasitic worm infections can be identified by searching for SNPs that display a strong correlation with the diversity of helminth species/genera transmitted in different geographic areas. Our approach relies on the assumption that helminth-driven selective pressure has affected the spatial distribution of these variants and it may suffer from a few limits and caveats. First, we used the diversity of parasitic worms as a measure of helminth-driven selective pressure for the reasons reported above. Yet, helminth diversity may be affected by report biases (e.g. under-reporting in developing countries due to limited research and clinical facilities) and it weights equally all helminth species irrespective of their prevalence and disease burden. As an example, rare helminth species that are reported in Gideon and included in our diversity measure may have exerted a very limited selective force (although prevalence may have changed over time and it is difficult to infer selective pressure for single species). In order to evaluate the impact of rare species, we recalculated helminth diversity by taking into account only parasites that are common in at least one country (see methods and Additional file 1, Table S1). This diversity measure strongly correlated (τ = 0.75, p < 10-5) with parasite diversity calculated over all species, suggesting that the inclusion of rare helminths should not largely affect our results. Another possible confounding factor is accounted for by the co-variation of helminth diversity with other environmental variables (e.g. climate and other infectious agents). We verified that variants associated with helminth diversity are significantly enriched within immune response genes and that the SNPs we identified do not correlate with climatic factors, suggesting that climate does not act as an important confounding factor. Yet, we cannot rule out the possibility that other infectious agents have affected the spatial distribution of the variants we identified as the diversity of helminths correlates with that of other human pathogens [4,8] across geographic locations.
Finally, although we searched for SNPs strongly associated with helminth diversity (uncorrected p value < 7.6 × 10-8) and we applied a correction based on the MAF-matched distribution of Kendall's rank correlation coefficients, we cannot exclude that the spatial distribution of a fraction of the variants we identified is due to population demography, migration history and drift, therefore representing false positives.
Despite these limitations we were able to identify several genes that can be regarded as good candidates as modulators of susceptibility to helminth infection as testified by our interaction network and PANTHER analyses. Mammalian hosts respond to parasitic worms in a relatively uniform manner by producing specific cytokines (mainly IL-4, IL-10, IL-5 and IL-13) and IgE, as well as through the activation of effector cells such as eosinophils, basophils and mast cells . Overall, the response to helminth infection is Th2-dominated and serves to both oppose the parasite and to contain tissue-damage. In line with this concept, IL4 and IL10, as well as IL13 are hub genes in two interaction networks we identified (Figure (Figure2).2). Nonetheless, the role of other immune components during helminth infections is becoming increasingly clear. In addition to Th1 cells that mediate host response in some stages of Schistosoma and Brugia malayi infection (reviewed in [31,32]), the role of regulatory T cells (Treg) is now recognized (reviewed in [31,32]). An increase in Tregs has been observed in different experimental mouse models of helminthiases and in infected humans (reviewed in ). Notably, among the 20 genes more strongly associated with helminth diversity (Table (Table2)2) we identified two loci, namely CD200R1L  and STIM2  that are involved in the development and function of Treg cells. Additionally, we found several SNPs located upstream the transcription start site of CTLA4 to strongly correlate with helminth diversity (Table (Table4).4). The gene encodes a co-inhibitory lymphocyte molecule that is preferentially expressed by Treg cells  and is thought to be at least partially responsible for the hypo-responsive phenotype of Th2 effector cells (referred to as "conditioned Th2") which is often observed in helminth infections (reviewed in ). As an example, blockade of CTLA-4 during Nippostrongylus brasiliensis infection results in higher Th2 cytokine production and decreased parasite numbers . CTLA-4 competes with CD28 for binding to CD86 (and CD80) (reviewed in ). Both CD28 and CD86 carry variants significantly associated with helminth diversity (Figure (Figure33 and Additional file 3, Table S3) and their binding provides a co-stimulatory signal for naive T cells; however, binding of CTLA-4 to CD86/CD80 on dendritic and T cells leads to functional inhibition (reviewed in ). Similarly to CTLA4, both CD86 and CD28 have been implicated in the immune response against helminths. Specifically, previous studies have indicated that anti-CD86 treatment blocks immune response to Schistosoma mansoni and Heligmosomoides polygyrus and cd28-/- mice display increased susceptibility to S. mansoni . Interestingly, we also found CD247, another co-inhibitory molecule to correlate with helminth diversity (Additional file 3, Table S3). A recent report  has shown that Schistosoma induces anergy of CD4+ and CD8+ T cells by up-regulation of CD247 expression on macrophages. These latter cells are considered an important component of anti-helminth response and an alternative form of macrophages has been described in subjects infected by parasitic worms. These cells up-regulate arginase instead of iNOS and express specific molecules including RETNLB (in humans the entry corresponding to FIZZ1/retnla has been discontinued and replaced with RETNLB) and CHIA (acidic chitinase or AMCase) (reviewed in [31,32]). Notably, we found one SNP in RETNLB to significantly correlate with helminth diversity (Additional file 3, Table S3); with respect to CHIA, we noticed that one variant (rs10494133) displayed a strong correlation with helminth diversity although it did not withstand Bonferroni correction at the genome-wide level (τ = 0.49, p = 3.3 × 10-6).
We found several integrins and adhesion molecules to correlate with helminth diversity (Table (Table3).3). Among these, ITGAM, ITGAL and ITGAX (Figure (Figure2)2) encode integrin α chains that combine with the β 2 chain (encoded by ITGB2) to form leukocyte-specific heterodimeric integrins. These molecules regulate lymphocyte adhesion and transendothelial migration, playing therefore a central role in inflammatory processes. ITGAL and ITGAM are bound by neutrophil inhibitory factor (NIF), an antiadhesive glycoprotein isolated from the canine hookworm Ancylostoma caninum [40,41], suggesting that leukocyte integrins are relevant to the immune response to helminth infections. Interestingly, more recent evidence  has revealed that the genome of the human parasite Necator americanus encodes at least 9 genes with similarity to NIF, suggesting that their products might play a similar role in establishing an immunocompromised niche for the parasite. The ITGAM/ITGB2 and ITGAX/ITGB2 integrins bind iC3b (Figure (Figure2),2), a cleavage product of complement component 3 (C3). Previous studies have shown that iC3b is deposited on N. brasiliensis larvae [43,44] and c3 deficient mice carry high lung larval burdens . In line with these findings, C3 is essential for killing Strongyloides stercoralis larvae in mice  and c3-/- mice do not develop an effective Th2 response after infection with S.mansoni and cannot clear the parasite after chemotherapy .
As far as the second network is concerned (Figure (Figure2),2), it is worth mentioning that NOTCH1 and ETS1 have been implicated in multiple immune functions. The NOTCH signaling pathway is involved in the intrathymic differentiation of T cells, as well as in Th cell development in the periphery . In line with the role of Treg cells in helminth infection, NOTCH1 has recently been shown to be involved in Treg function  and to cooperate with TGFb for regulation of the FOXP3 promoter . With respect to ETS1, it functions as a transcriptional regulator of several cytokine genes including IL5, IL2 and GMCSF [50-52]. It also regulates expression of CD226  and it is known to induce of Th1 mediated inflammation .
Also, network B (Figure (Figure2)2) contains two genes coding for ryanodine receptors (RYR1 and RYR2); we also found RYR3 to correlate significantly with helminth diversity (Additional file 3, Table S3). The function of these molecules in the immune system is poorly understood yet, both RYR1 and RYR3 have been involved in calcium signaling in T cells [55,56]. Moreover, recent evidences have indicated that dendritic cells express RYR1 and activation of the receptor causes a rapid increase in the expression of MHCII molecules on the surface of these cells .
Finally, it is interesting to notice that among the genes subjected to helminth-driven selective pressure in network B we found SYK, encoding a tyrosine kinase that interacts with the high affinity IgE receptor and mediates IgE signaling in mast cells and basophils [58,59]. Similarly, CD226 and FYN (both in network A, Figure Figure2)2) have been involved in mast cell activation mediated by the high affinity IgE receptor , suggesting a role for these genes in allergic inflammation. Indeed, syk has been shown to mediate airway hyper-responsiveness in an experimental mouse model  and most genes discussed above have been involved in the elicitation of allergic phenomena. In addition genes reported in Table Table4,4, the interaction of CD28 with CD86 is central to induction of allergic airway inflammation in mice  and CD86 antisense oligonucleotides suppress airway hyper-responsiveness in allergic animals . Variants in C3 have been associated with asthma  and mice deficient in C3 exhibit diminished airway hyper-responsiveness and lung eosinophilia when challenged with allergen ; also, NOTCH1 is involved CD8+ T cell-mediated development of airway hyper-responsiveness and inflammation , while ITGAL/ITGB2 mediates altered responsiveness of atopic asthmatic airway smooth muscle in rabbits . Finally, Ets-1 induces tenascin expression in bronchial fibroblasts . In this respect it is worth mentioning that, although subjects genotyped in the HGDP-CEPH panel are supposed to be healthy, a proportion of them may suffer from relatively mild diseases including asthma, atopy and related disorders; this may be especially true in some areas such as Latin America, for example, where urban centres have the highest reported prevalence of asthma worldwide (reviewed in ). While this possibility does not affect the results we reported herein, it highlights the fact that the epidemiology of these disorders is rapidly changing, and several reports have revealed a general increase in prevalence with urbanization, leading to the suggestion that environmental factors (including helminth infections) may play a central role in modulating the susceptibility to these diseases (reviewed in ). The relationship between asthma/allergy susceptibility and parasitic worms is though to be complex (reviewed in ). On one hand helminth-driven selective pressure is expected to favor individuals carrying alleles that allow a strong Th2 response and, therefore to promote the transmission and spread of asthma-susceptibility variants. On the other hand, lack of parasites in developed countries has likely removed the immunomodulatory role of these organisms, eventually leading to the increased incidence of atopic conditions. The current knowledge of asthma/allergy susceptibility alleles (12 alleles identified by GWAS) is too limited to warrant extensive speculation on the first issue. Still, our data indicate that many genes we identified carry variants associated with asthma/allergy or have been involved in the elicitation of airway hyper-responsiveness. Therefore, our results expand the previously noticed parallelism between genes involved in the development of asthma/allergy and those responsible for responding to parasitic worms, suggesting that the evolutionary scenario underlying the increase in asthma, allergy and related phenotype envisages a relevant role for these long-standing parasites.
Among the genes subjected to helminth-driven selective pressure we identified EDAR and EDA, its ligand. Binding of ectodysplasin to EDAR activates the NF-kB pathway through the NEMO protein. The EDA/EDAR pair mediates signals needed for the development of ectodermal appendages and mutations in both genes result in hypohidrotic ectodermal dysplasia. Many studies [15,17,71,72] have indicated that EDAR has been subjected to a strong selective pressure resulting in the rapid spread of the putatively selected 370Ala allele in Asian populations. This allele is responsible for the hair phenotype of these populations but the selective pressure underlying the selective sweep is unknown. Hypotheses have been proposed that increased hair thickness might be protective against cold climates or be favored through sexual selection [18,73]. We found that in Asian populations, most chromosomes carrying the selected allele also carry four SNPs subjected to helminth-driven selective pressure (Figure (Figure1).1). Both EDA and EDAR are expressed in human lymphocytes and dendritic cells (see methods), suggesting that they may function as NF-kB activators in these cell types, as well. It is therefore tempting to speculate that helminths represent the selective pressure underlying the spread of a selected allele in Asia. This idea is consistent with the concept whereby infectious agents have represented one of the major selective forces for human populations.
In summary, our data are consistent with the notion whereby parasitic worms have acted as a powerful selective force on human populations and have contributed to shape nucleotide variability at a number of genes involved in immune responses. We also show that several genes associated with helminth diversity are involved in the pathogenesis of atopic conditions or in airway hyper-responsiveness.
Helminth absence/presence matrices for the 21 countries where HGDP-CEPH populations are located were derived from the Gideon database. Information in Gideon is weekly updated and derives from World Health Organization reports, National Health Ministries, PubMed searches and epidemiology meetings. The Gideon Epidemiology module follows the status of known infectious diseases globally, as well as in individual countries, with specific notes indicating the disease's history, incidence and distribution per country. We manually curated helminth absence/presence matrices by extracting information from single Gideon entries. Following previous suggestions [4-6], we recorded only helminths that are transmitted in the 21 countries, meaning that cases of transmission due to tourism and immigration were not taken into account. A total of 60 helminth species were identified in at least one country (Additional file 6, Table S4). Prevalence data for single helminth infections were similarly obtained from Gideon, as described in the text. In order to calculate parasite diversity for species that are common in at least one country, we inspected Gideon entries for survey data or prevalence notes; helminth infections reported as "rare in humans" were discarded; similarly, parasites with no prevalence estimates or notes were considered as rare; therefore, this diversity measure should be regarded as an approximate estimate.
The annual minimum and maximum temperature were retrieved from the NCEP/NCAR database (Legates and Willmott Average, re-gridded dataset) using the geographic coordinates reported by HGDP-CEPH website for each population. Similarly, net short wave radiation flux data were obtained from NCEP/NCAR (Reanalysis 1: Surface Flux); these data were read using Grid Analysis and Display System (GrADS).
Since helminth diversity, due to data organization in Gideon, can only be calculated per country (rather than per population), the same procedure was applied to climatic variables. Therefore the values of annual temperature, radiation flux and precipitation rate were averaged for populations located in the same country. This assures that a similar number of ties is maintained in all correlation analyses.
Data concerning the HGDP-CEPH panel derive from a previous work . Atypical or duplicated samples and pairs of close relatives were removed . Following previous indications [4,5], Bantu individuals (South Africa) were considered as one population.
A SNP was ascribed to a specific gene if it was located within the transcribed region or no farther than 500 bp upstream the transcription start site. MAF for any single SNP was calculated as the average over all populations. The list of immune response genes was derived from the Immunology Database and Analysis Portal (ImmPort). Expression data were obtained from SymAtlas. SNPs identified in GWAS and associated with allergy, asthma or related traits (serum IgE levels and plasma eosinophil count) were derived from the A Catalog of Published Genome-Wide Association Studies. The list of allergy/asthma susceptibility genes was obtained from a previous review  or from the Online Medelian Inheritance in Man website (MIM: 600807).
All correlations were calculated by Kendall's rank correlation coefficient (τ), a non-parametric statistic used to measure the degree of correspondence between two rankings. The reason for using this test is that even in the presence of ties, the sampling distribution of τ satisfactorily converges to a normal distribution for values of n larger than 10 .
In order to estimate the probability of obtaining 246 genes carrying at least one significantly associated SNP out of a group of 2,287 genes (the number of ImmPort genes), we applied a re-sampling approach after dividing genes on the basis of the number of SNPs typed in the HGDP-CEPH panel. In particular, all genes covered by at least one SNP in the HGDP-CEPH panel (number of genes = 15,280) were divided in 24 intervals based on the distribution of typed SNPs per gene (Additional file 7, Table S5). Samples of 2,287 genes were randomly extracted from a list of all genes covered by at least one SNP in the HGDP-CEPH panel by applying the criterion that for each ImmPort gene, a control gene was selected from the same interval. For each sample the number of genes with at least one significant SNP were counted. The empirical probability of obtaining 246 genes was then calculated from the distribution of counts deriving from 10,000 random samples. Similarly, the number of SNPs in ImmPort genes was compared to the distribution of SNPs in the 10,000 re-samplings.
Analysis of PANTHER over-represented functional categories and pathways was performed using the "Compare Classifications of Lists" tool available at the PANTHER classification system website. Briefly, gene lists are compared to the reference list using the binomial test  for each molecular function, biological process, or pathway term in PANTHER. All p values were Bonferroni corrected. All calculation were performed in the R environment . For PANTHER analysis we widened the inclusion criteria in that SNPs located within the transcribed region or in the 25 kb upstream the transcription start site were ascribed to the gene.
eQTL data were derived from the eQTL Resource web site held at the University of Chicago (Prtichard Lab).
Biological network analysis was performed with Ingenuity Pathways Analysis (IPA) software using an unsupervised analysis. IPA builds networks by querying the Ingenuity Pathways Knowledge Base for interactions between the identified genes and all other gene objects stored in the knowledge base; it then generates networks with a maximum network size of 35 genes/proteins. We used all genes showing at least one significantly associated SNP as the input set; in this case a SNP was ascribed to a gene if it was located within the transcribed region or in the 25 kb upstream. All network edges are supported by at least one published reference or from canonical information stored in the Ingenuity Pathways Knowledge Base. To determine the probability of the analyzed genes to be found together in a network from Ingenuity Pathways Knowledge Base due to random chance alone, IPA applies a Fisher's exact test. The network score represents the -log (p value).
SNP: single nucleotide polymorphism; Treg: regulatory T cell; LD: linkage disequilibrium; MAF: minor allele frequency; NIF: neutrophil inhibitory factor; TRH: treleasing hormone; PRL: prolactin.
MF, UP, RC and MS performed the analyzes, analyzed and interpreted the data; GPC and NB participated in the study coordination; MC and MS conceived the study and wrote the paper.
All authors read and approved the final manuscript.
The Immunology Database and Analysis Portal, https://www.immport.org
Ingenuity Pathway Analysis, Ingenuity Systems, http://www.ingenuity.com
NCEP/NCAR, Surface flux, http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.surfaceflux.html
Catalog of published GWAS, http://www.genome.gov
eQTL resources @ the pritchard lab, http://eqtl.uchicago.edu/Home.html
UCSC Genome Browser, http://genome.ucsc.edu
HGDP-CEPH Panel, http://hagsc.org/hgdp/
Sweep software, http://www.broadinstitute.org/mpg/sweep/
Table S1. Populations in the HGDP-CEPH panel and helminth diversity estimates.
Table S2. Genes in the ImmPort list that display at least one SNP significantly associated with helminth diversity. For each gene the SNP showing the strongest correlation is reported. SNPs are ranked according to the value of τ.
Table S3. SNPs significantly associated with helminth diversity. The table reports all SNPs that withstood Bonferroni correction at the genome-wide level (with alpha = 0.5) and displayed a tau percentile rank higher than the 95th among MAF-matched SNPs, as described in the main text. SNPs are ranked according to the value of τ. If the SNP is located within a genic region (or in the 500 upstream nucleotides) the gene symbol is reported. Alternatively, the gene closest to the SNP and its distance (in bp) are indicated.
Figure S1. In addition to the two merged networks in the main text, IPA identified three additional networks (A-C) with p < 10-9. Genes are represented as nodes, edges indicate known interactions between proteins (sold lines depicts direct and dashed lines depict indirect interaction). Genes are color coded as follows: green, genes with at least one SNP significantly associated with helminth diversity; gray, genes covered by at least one SNP in the HGDP-CEPH panel; white, genes with no SNPs in the panel.
Figure S2. Analysis of LD in the genomic region encompassing CD28 and CTLA4. SNPs significantly associated with helminth diversity are shown in red while the region covered by CD28 and CTLA4 are shown in blue. The location of DNAse hypersensitive sites in CD4+ T cells is shown in gray while recombination hot-spots are in black. LD plots (r2) are shown for Yoruba (YRI), Europeans (CEU) and Asians (JPT+CHB). The image was generated by using the "add custom track" utility available through the UCSC Genome Browser.
Table S4. Helminth species/genera transmitted in at least one country and that are common in at least one country.
Table S5. Gene subdivision on the basis of SNP number. Genes were divided in 24 intervals according to the number of SNPs typed in the HGDP-CEPH panel.
M.S. is part of the Doctorate School in Molecular Medicine, University of Milan.
This study was supported by grants from Istituto Superiore di Sanita' "Programma Nazionale di Ricerca sull' AIDS", the EMPRO and AVIP EC WP6 Projects, the nGIN EC WP7 Project, the Japan Health Science Foundation, 2008 Ricerca Finalizzata [Italian Ministry of Health], 2008 Ricerca Corrente [Italian Ministry of Health], Progetto FIRB RETI: Rete Italiana Chimica Farmaceutica CHEM-PROFARMA-NET [RBPR05NWWC], and Fondazione CARIPLO.