|Home | About | Journals | Submit | Contact Us | Français|
Many genes involved in immunity evolve rapidly. It remains unclear, however, to what extent pattern-recognition receptors (PRRs) of the innate immune system in vertebrates are subject to recurrent positive selection imposed by pathogens, as suggested by studies in Drosophila, or whether they are evolutionarily constrained. Here, we show that Toll-like receptor 5 (TLR5), a member of the Toll-like receptor family of innate immunity genes that responds to bacterial flagellin, has undergone a history of adaptive evolution in primates. We have identified specific residues that have changed multiple times, sometimes in parallel in primates, and are thus likely candidates for selection. Most of these changes map to the extracellular leucine-rich repeats involved in pathogen recognition, and some are likely to have an effect on protein function due to the radical nature of the amino acid substitutions that are involved. These findings suggest that vertebrate PRRs might show similar patterns of evolution to Drosophila PRRs, in spite of the acquisition of the more complex and specific vertebrate adaptive immune system. At shorter timescales, however, we found no evidence of adaptive evolution in either humans or chimpanzees. In fact, we found that one mutation that abolishes TLR5 function is present at high frequencies in many human populations. Patterns of variation indicate that this mutation is not young, and its high frequency suggests some functional redundancy for this PRR in humans.
Vertebrate immune systems include acquired and innate components. Pattern-recognition receptors (PRRs) are an essential component of the innate immune system. PRRs recognize and bind pathogen-associated molecular patterns (PAMPs), conserved molecular motifs that are shared by infectious agents but which are absent in the host. The interaction between PRRs and PAMPs illustrates two fundamental aspects of the innate immune system: 1) the ability to discriminate between self and nonself and 2) the targeting of components essential for microbial fitness, which are therefore functionally constrained (Medzhitov and Janeway 1997). Toll-like receptors (TLRs) constitute the best characterized PRRs of the innate immune system of vertebrates, and so far, 10 have been described in humans (Akira and Takeda 2004). After stimulation with their ligands, TLRs form homo- or heterodimers and trigger intracellular signaling cascades that induce the expression of a variety of genes. This in turn leads to the activation of innate immunity effector mechanisms as well as the development of adaptive immunity (Akira and Takeda 2004).
Because TLRs interact with microbial invaders, theory predicts that over evolutionary time they may be engaged in coevolutionary arms races with their microbial ligands. Recent results from the comparison of several Drosophila genomes support this hypothesis, showing that among innate immunity loci, PRRs constitute a functional class that evolves quickly between species (Sackton et al. 2007). It remains unclear whether vertebrates and invertebrates are similar in this respect. On the other hand, given the extremely conserved nature of the molecular patterns targeted by TLRs, they might be evolutionarily inflexible. In fact, they are often cited as an example of evolutionary conservation due to the fundamental constraint imposed by the inability of pathogens to tolerate mutations in molecular motifs that are essential to their fitness (Medzhitov and Janeway 1997).
Here, we attempt to distinguish between these two competing hypotheses using the evolution of Toll-like receptor 5 (TLR5) in primates as an example. So far, the limited evidence about the patterns of molecular evolution of TLRs in primates is inconclusive. Although Ortiz et al. (2008) claimed that all TLRs, except for TLR1, have evolved under purifying selection in primates, Nakajima et al. (2008) using a more extensive phylogenetic sampling suggested that TLR4 has been under positive selection in Old World monkeys.
TLR5 targets monomeric flagellin, the main component of the bacterial flagella and a potent virulence factor (Hayashi et al. 2001; Ramos et al. 2004). Recently, Andersen-Nissen et al. (2005) showed that members of the α and ϵ Proteobacteria that are important human pathogens, such as Campylobacter jejuni and Helicobacter pylori, are able to evade TLR5 recognition by mutating key residues in the TLR5 recognition site. These mutations abolish flagellar motility, but the pathogens acquire compensatory mutations in other parts of the flagellin molecule that restore motility, which is essential for efficient infectivity. These results demonstrate that pathogens can evolve to evade PRR recognition while remaining fully functional and capable of infection. More importantly, these findings suggest opportunities for coevolution between PRRs and their microbial ligands, in spite of some overall functional constraint.
Additional motivation for studying PRRs in primates comes from ideas concerning the relationship between mating systems and disease risk. Based on the finding that the basal number of white blood cells in primates and carnivores is correlated with the degree of sexual promiscuity, but not with other life-history traits expected to influence disease risk, Nunn and colleagues proposed the controversial idea that mating system drives the evolution of the immune system (Nunn et al. 2000, 2003; Nunn 2002). The underlying hypothesis is that in promiscuous species, the increased risk of acquiring sexually transmitted diseases has resulted in the evolution of stronger immune systems. This hypothesis has not been broadly tested at the molecular level. As a secondary goal, we take advantage of the variation in mating system among primate species to test predictions of this hypothesis.
A final motivation for studying TLR5 comes from association studies in humans, which showed that a premature stop codon (TLR5392STOP) was linked to susceptibility to Legionnaire's disease, a type of pneumonia produced by the flagellated bacterium Legionella pneumophila (Hawn et al. 2003) and resistance to two autoimmune disorders: Crohn's disease (Gewirtz et al. 2006) and Systemic Lupus Erythematosus (SLE) (Hawn et al. 2005). TLR5392STOP results in a loss of function, acts in a negative-dominant fashion, and has been reported to segregate in different populations at frequencies between 5% and 10% (Hawn et al. 2003, 2005; Gewirtz et al. 2006). Hawn et al. (2005) suggested that the high population frequency of TLR5392STOP might be due to an evolutionary advantage associated with defective TLR5-mediated signaling, at least in some situations. The “less is more” hypothesis proposed by Olson (1999) and Olson and Varki (2003) suggests that gene loss might be advantageous and an important engine of evolutionary change. This idea has received considerable attention recently in light of several reports of adaptive pseudogenizations in the human lineage (Tournamille et al. 1995; Ali et al. 1998; Wang et al. 2006; Seixas et al. 2007). The idea that TLR5 might be another case of adaptive gene loss in humans is intriguing because of its putative important immunologic function.
Here, we have analyzed the entire TLR5 coding sequence of 22 species of old and new world primates and apes in a phylogenetic framework, and surveyed sequence variation in both coding and noncoding regions in population samples of humans and chimpanzees to answer the following questions: 1) Has TLR5 undergone adaptive evolution in primates? 2) Is there any support for the promiscuity/disease-risk hypothesis in the rates of protein evolution across primates? 3) Are there signatures of positive selection in the patterns of nucleotide variation at TLR5 in humans and chimpanzees? and 4) Has the premature stop codon in humans increased in frequency due to recent positive selection? We found convincing evidence of positive selection at TLR5 throughout the primate phylogeny, involving amino acids that might mediate flagellin recognition, suggesting that innate immunity genes may experience some of the same evolutionary pressures previously described for adaptive immunity genes. Only four of six independent transitions to increased sexual promiscuity were associated with increased rates of protein evolution, arguing against the hypothesis that mating system plays a major role in TLR5 evolution. In humans and chimpanzees, patterns of DNA sequence variation are largely consistent with neutral expectations, suggesting that the relatively high frequency and widespread distribution of the human TLR5392STOP mutation might be a consequence of functional redundancy.
The species used in the phylogenetic analyses are shown in figure 1, and the origins of the samples are given in supplementary table 1, Supplementary Material online. Samples were collected in accordance with Institutional Animal Care and Use Committee guidelines. Additionally, coding sequences of Homo sapiens and Macaca mulatta were retrieved from GenBank (accession numbers NM_003268 and XM_001099501, respectively).
DNA Samples from 19 Pan troglodytes verus, 3 Pan troglodytes troglodytes and 18 humans (9 Africans, 9 Europeans) from the Y-Chromosome Consortium DNA collection were provided by Dr. Michael Hammer at the University of Arizona. Human sequence data (24 African Americans and 23 European Americans) for two nonoverlapping fragments that together include ~17 kb were gathered from the Innate Immunity Database (www.innateimmunity.net).
Nine hundred and fifty individuals from the Human Genome Diversity Panel (Cann et al. 1999, 2002) were used to estimate the worldwide frequency and geographic distribution of the TLR5392STOP mutation. This Human Genome Diversity Panel (HGDP) excludes samples previously identified as related individuals or duplicates (Rosenberg 2006).
The entire coding region of TLR5 (~2.5 kb) was polymerase chain reaction (PCR)-amplified and sequenced from the 19 primate species listed above, using primers designed in conserved regions of published primate sequences. Together with the Macaca sequence from GenBank and the human and chimpanzee sequences (see below), the phylogenetic analyses included 22 species.
Two nonoverlapping genomic fragments were PCR amplified and sequenced from 18 humans (~12 and ~5 kb) to match similar gene regions available from the Innate Immunity Database (see below). A 5-kb fragment was also PCR amplified and sequenced in 19 P. t. verus and 3 P. t. troglodytes. In humans and chimpanzees, the sequenced regions contain the complete coding region as well as adjacent noncoding sequence.
PCR was performed in 25–50 μl reactions using Platinum Taq High Fidelity DNA Polymerase (Invitrogen, San Diego, CA). A complete list of amplification and sequencing primers for all fragments and the corresponding annealing temperatures and PCR protocols are provided in supplementary tables 2 and 3, Supplementary Material online. PCR products were purified using the Qiagen PCR purification kit (Qiagen, Valencia, CA) and sequenced using an ABI 3700 automated sequencer (Applied Biosystems, Foster City, CA). Sequences have been deposited in GenBank under the following accession numbers: Primates other than humans and chimpanzees: FJ542200–FJ542219; Chimpanzees: FJ546349–FJ546370; and Humans: FJ556974–FJ556991.
Sequence editing and assembly were performed using SEQUENCHER (Gene Codes, Ann Arbor, MI). DNA sequences were aligned using ClustalX (Thompson et al. 1997) with manual alignment of small indels using the amino acid sequence as a reference. Gametic phase was computationally determined using PHASE (Stephens et al. 2001).
We tested for positive selection in the primate phylogeny by comparing the number of nonsynonymous substitutions per nonsynonymous site (dN) with the number of synonymous substitutions per synonymous site (dS) in a maximum likelihood (ML) framework. A ratio of dN/dS(ω) greater than one is usually taken as evidence of selection. We used the accepted primate phylogeny (Purvis 1995; Bininda-Emonds et al. 2007) in all analyses. We also used the TLR5 data to estimate phylogenetic relationships using Neighbor-Joining. The resulting tree was similar to the well-accepted primate phylogeny (Bininda-Emonds et al. 2007) with only four branches placed in slightly different positions. Analyses of selection using the TLRs tree yielded very similar results to those obtained using the accepted primate phylogeny, so we report only the latter below.
First, we evaluated selection at individual codons, not allowing variation among lineages. We ran a series of nested models implemented in PAML ver 4 (Yang 1997, 2007), in which the “neutral” models restrict ω to values ≤1, whereas “selection” models include a class of sites with dN/dS > 1. A likelihood ratio test (LRT) was then used to compare nested models (table 1). To check for convergence, all analyses were run twice, using initial ω values of 0.5 and 1.5. Amino acids under selection for model M8 were identified using a Bayes Empirical Bayes approach (BEB) (Yang et al. 2005). Two models of codon frequencies were used: F3x4 and F61.
A recent improvement in statistical methods to infer selection in a phylogenetic context is the incorporation of variation in the rate of synonymous substitution (Pond and Muse 2005). Kosakovsky Pond and Frost (2005) proposed a series of models to study selection on a codon basis. They classify previous methods as either “counting methods,” “random effect models,” or “fixed effect models.” Counting methods reconstruct the ancestral sequences to estimate the number of synonymous and nonsynonymous changes at each codon. Random effect models assume a distribution of rates across sites and then infer the rate at which individual sites evolve. Fixed effect models estimate the ratio of nonsynonymous-to-synonymous substitution on a site-by-site basis, without assuming a priori a distribution of rates across sites. Single likelihood ancestor counting (SLAC), random effects likelihood (REL), and fixed effects likelihood (FEL) methods, new versions of the “counting,” “random effect” and “fixed effect” models, respectively, which allow variation in the synonymous substitution rate (Kosakovsky Pond and Frost 2005), were implemented at the DATAMONKEY web server (Pond and Frost 2005).
Finally, to detect variation in ω among lineages, a model with one ω (M0) was compared with a “free-ratio” model that allows each branch to have a separate ω value while keeping variation among sites constant (Nielsen and Yang 1998; Yang 1998). Because a parameter-rich model does not necessarily fit the data better than simpler models, a model selection scheme was performed in DATAMONKEY to find the variable-branch model with the best fit to the data.
Parallel amino acid changes were inferred using maximum parsimony in MacClade (Sinauer Associates, Sunderland, MA).
Nucleotide heterozygosity, π (Nei and Li 1979), and the proportion of segregating sites, θw (Waterson 1975), were estimated for the entire human and chimpanzee data sets, and also for different functional regions (coding, noncoding), and different human populations separately.
Tajima's D (Tajima 1989) and Fu and Li's D* (Fu and Li 1993) were calculated to assess whether the allele frequency spectrum deviates from neutral expectations. Coalescent simulations, conditioned on the observed number of segregating sites, were used to generate the null distributions of these test statistics. The ratio of nonsynonymous-to-synonymous polymorphisms in humans or chimpanzees was compared with the ratio of nonsynonymous-to-synonymous fixed differences with respect to the orangutan sequence (Mcdonald and Kreitman 1991). These analyses were performed using DnaSP (Rozas et al. 2003) and SITES (Hey and Wakeley 1997). To test for selection at putative regulatory regions as in Andolfatto (2005), we compared the ratio of polymorphism within humans with human–chimpanzee divergence at silent sites in the coding region and at two 1-kb regions directly upstream of two alternative human transcripts.
To study population structure in the chimpanzee data, 50,000 neutral genealogies of 38 chromosomes were simulated under panmixia using the program “ms” (Hudson 2002) using the observed level of variability and the recombination rate estimated from the data. To test for an excess of linkage disequilibrium (LD) due to admixture/population structure in chimpanzees, we computed the number of congruent sites (pb), defined as sites that determine only two haplotypes, and gd, defined as the maximum distance between any two congruent sites, using the script lbcalc (Garrigan et al. 2005). We then compared these values with the simulated distribution to calculate the probability of obtaining values more extreme than the observed ones.
To further evaluate the likelihood of gene flow between the chimpanzee subspecies, we fitted an isolation with migration model (Nielsen and Wakeley 2001; Hey and Nielsen 2004) using a Markov chain Monte Carlo method implemented in the program IMa (Hey and Nielsen 2007). Under this model, two populations split and diverge in isolation, with some level of gene flow. We used the largest nonrecombining region of the combined verus–troglodytes sample, which includes 1,660 bases of noncoding sequence, to run the program with a burn-in period of 2,000,000 steps using 15 chains with a geometric heating scheme. After the burn-in period, we ran the program for 15,399,385 steps, recording the results every 10 steps. We checked for convergence by comparing multiple runs.
The TLR5392STOP mutation was genotyped by restriction analysis with DdeI in the HGDP as in Hawn et al. (2003).
We obtained the coding sequence of TLR5 for a relatively broad array of primates including New World primates, Old World primates and apes. To address whether specific codons in the protein have been subject repeatedly to positive directional selection in different species, we first investigated models in which the dN/dS ratio is allowed to vary among different classes of sites. LRTs showed that models that incorporate selection fit significantly better than neutral models (table 1). For model 8, the most stringent of the models implemented in PAML, a small proportion of the codons (3.4% or 29 codons) was estimated to be under selection, with a ω value of 4.34, of which 13 were identified by the BEB approach with posterior probabilities above 0.8 (table 2).
We then compared these results with those from methods that incorporate synonymous rate variation (table 2). Using significance thresholds of P < 0.2 for SLAC and FEL (consistent with a true Type I error rate of ~5%, as suggested by Kosakovsky Pond and Frost 2005 and a Bayes factor >20 for REL [corresponding approximately to a P value of 0.05]) SLAC and FEL identified 1 and 14 codons, respectively, and REL identified 11 codons as targets of selection. Eleven codons (104, 158, 292, 312, 354, 482, 523, 530, 567, 586, and 847) were picked by at least two methods.
Although not independent from previous results, we also considered parallel amino acid changes (independent changes at the same codon position, from the same initial state to the same final state) as potential candidates for selection. At TLR5, 24 codon sites show parallel evolution in two lineages and eight sites have evolved in parallel in three lineages (table 2). Most of these do not fall at CpG sites (on either strand) and are thus not likely to be the product of mutational bias and/or increased mutation rate. Ten of the parallel changes (aa 158, 292, 312, 354, 482, 523, 530, 567, 586, and 847) correspond to sites that were identified by more than one ML method as targets of selection. Interestingly, parallel changes have not accumulated on specific branches but instead are relatively scattered across the primate phylogeny. A possible explanation for the high number of parallel changes is functional constraint due to the presence of many leucine-rich repeats in the extracellular domain. Such motifs typically contain a conserved 11 aa motif (LXXLXLXXNXL, where “L” is Leu, Ile, Val, or Phe; “N” is Asn; and X is any amino acid) and a variable region (Matsushima et al. 2007). In this case, all the parallel changes that occurred in the conserved portion of the LRRs involve “X” residues, suggesting that if functional constraint to maintain this motif exists, it does not seem to be responsible for the high number of parallel changes. We thus infer that selection might have played a role in driving these substitutions.
We investigated the radical or conservative nature of amino acid substitutions using U, an empirically derived universal index based on the genetic code that measures amino acid exchangeability during evolution (Tang et al. 2004) (table 2). In principle, more radical changes are more likely to affect function. U varies from 0.241 to 2.490 with lower values representing more radical (less common) changes (Tang et al. 2004). U is weakly correlated with other conventional measures, such as Grantham's distance, that determine amino acid exchangeability based on a combination of physicochemical properties such as volume and polarity (Grantham 1974), but it is a considerably better predictor of the observed pattern of amino acid substitution in a variety of taxa (Tang et al. 2004). Several sites show relatively radical amino acid changes (table 2).
Of the 11 sites that were identified by more than one ML method, amino acids 292, 312, 530, and 567 show the strongest evidence of selection because they were consistently identified by at least three ML methods, they show parallel changes, and they involve relatively radical amino acid changes. Particularly compelling is the evidence for selection on aa 530. This is the only site identified by all four ML methods, and it displays a radical change occurring in three independent lineages. Of the remaining seven sites, two deserve special attention. Site 354 involves a moderately radical change, and together with site 312, falls within the flagellin recognition domain (Andersen-Nissen et al. 2007). Site 847 also shows the same amino acid transition in three independent instances and is located in the very conserved TIR signaling domain.
Having shown that TLR5 evolution in primates is consistent with recurrent positive selection, we were interested in looking for heterogeneity in rates of protein evolution among different lineages and in investigating whether these differences were correlated with reported levels of sexual promiscuity. The best-fit model that allows variation in dN/dS among lineages grouped branches under four different rates: ω = 3.13, ω = 0.51, ω = 0.25, and ω = 0.06. The full model, which assigns a different rate to each branch, had a higher likelihood but not a significantly better fit than a model with a single rate for all branches. Nonetheless, we compared the dN/dS values obtained in this full model with the variation in mating systems among species (fig. 1). We categorized mating systems as less promiscuous (monogamous + polygynous) or more promiscuous (promiscuous + dispersed), based on information compiled by Dixon (1998) and Lindenfors and Tullberg (1998). To avoid the problem of uncertainty in reconstructing mating system along long branches, we focused only on the terminal branches. We observed an increase in the rate of evolution associated with an increase in promiscuity in four of the six independent transitions from less promiscuous to more promiscuous mating systems (fig. 1). This was true when we included all sites or when we included only the extracellular domain where most positively selected sites were located. For the extracellular domain, the average ω for more promiscuous branches (= 0.84; SD = 0.79) was higher than the average ω for less promiscuous branches ( = 0.46; SD = 0.22), but this difference was not significant (t-test, P = 0.093). Thus, there is no compelling evidence for a causal link between mating system and molecular evolution at TLR5 in these data.
Levels of variation at TLR5 in humans are summarized in table 3. In general, both coding and noncoding regions showed polymorphism levels similar to those seen at other genes (Akey et al. 2004). Overall, humans presented an excess of rare variants with negative values of Tajima's D and Fu and Li's D* for both the coding and noncoding regions (table 3). The African samples showed strongly negative values, whereas the European samples showed either less negative values (coding) or slightly positive (noncoding) values. Differences in the level and pattern of variation between the African and European samples in noncoding regions are largely in agreement with well-accepted demographic scenarios for African Americans, and European Americans (Stajich and Hahn 2005). For example, our Tajima's D values are not outliers in the distribution of Tajima's D for a large set of genes sequenced in African Americans and European Americans (Stajich and Hahn 2005), suggesting that demographic effects rather than positive selection best explain the deviations from the null model at noncoding sites.
For the coding region, both the African and European samples showed a more pronounced excess of low-frequency variants than in the noncoding region (table 3). Tajima's D for nonsynonymous sites was −1.495 and −1.020 for silent sites. This lower value for nonsynonymous polymorphisms is consistent with the idea that some of these mutations may be weakly deleterious. This is also supported by a slightly, but not significantly, higher ratio of polymorphism to divergence for nonsynonymous mutations than for synonymous mutations (table 4) using polymorphism data from both humans and chimpanzees.
Of the 13 observed replacement changes observed in humans, three had frequencies above 5% (C1174T [TLR5392STOP], freq = 0.069; A1775G, freq = 0.12; and T1846C, freq = 0.29). The high frequency of these mutations raises the question of whether they represent functional variants maintained at high frequency by selection. Merx et al. (2006) showed that only three of all known nonsynonymous single nucleotide polymorphisms (SNPs) at TLR5 had functional effects when tested on a site-by-site basis in transiently transfected CHO-K1 cells using reporter assays: One was TLR5392STOP and the other two were very rare SNPs not present in our sample. Each of these mutations resulted in a nonresponsive receptor (loss of function) after stimulation with flagellin. The T1846C and A1775G mutations do not have similar effects in this assay, and thus there is no experimental evidence that they affect function. Their high frequency might simply be due to drift.
We used a modified McDonald Kreitman (MK) test to compare the ratio of polymorphism to divergence for silent versus putative regulatory sites as in Andolfatto (2005) and found no deviation from the neutral expectation (table 5).
Levels of nucleotide variability in western chimpanzees (P. t. verus) are presented in table 3 and are similar to genomewide averages (Yu et al. 2003; Fischer et al. 2006). No significant deviations from neutrality were detected using tests of the allele frequency spectrum (table 3) or the MK test (table 4).
However, examination of the table of polymorphism revealed the presence of two major haplogroups (fig. 2, supplementary table 5, Supplementary Material online). Divergence between these haplogroups was 0.15%, close to the average value between chimpanzee subspecies (Yu et al. 2003; Fischer et al. 2006). To gain more insight into the origin of this variation, we sequenced three individuals of P. t. troglodytes. We found that the least frequent haplotype class (8 of 38) from P. t. verus is present in P. t. troglodytes in five of six chromosomes, whereas another haplotype present only in a single copy in P. t. troglodytes is more closely related to the major haplogroup in P. t. verus (fig. 2).
Three possible explanations for divergent haplotypes shared between subspecies are 1) unsorted ancestral polymorphism, 2) admixture (i.e., gene flow between groups), or 3) old balancing selection. Distinguishing among these is difficult. We note that the estimated divergence time between P. t. verus and P. t. troglodytes of 422,000 years (Won and Hey 2005) is less than the average time required to achieve reciprocal monophyly (E(t) ≈ 4Ne generations = 530,300 years, using the ancestral population size estimated by Won and Hey (2005) of Ne = 5,300 and a generation time of 25 years). Although the variance associated with this estimation is very large (Tajima 1983), this comparison suggests that ancestral variation could still be segregating between these subspecies. However, variation that is ancestral should have relatively little LD, whereas variation that is due to recent admixture should have higher levels of LD, an idea formalized into a test by Wall (2000) to detect ancient admixture in humans. We applied this test to our data. We computed the number of congruent sites (pb) and the maximum distance between any two congruent sites (gd), and compared these values with a simulated distribution generated by sampling neutral genealogies conditioned on the observed level of variation. The probability of obtaining both pb = 6 and gd = 0.285 under panmixia was 0.039 (using the level of recombination estimated from the data), indicating the existence of population structure or historical gene flow. We also fitted an isolation model with gene flow, as in Won and Hey (2005), and found evidence of gene flow between subspecies, although most of this gene flow was from P. t. verus to P. t. troglodytes. In light of the relative excess of LD revealed by the Wall test, some form of admixture or population structure seems to be the most likely explanation for the patterns of variation seen at TLR5 in P. t. verus, although we note that more complex scenarios involving retention of ancestral polymorphism and selection could also contribute to the observed patterns.
Two lines of evidence suggest that TLR5392STOP has functional consequences. First in vitro assays showed that it encodes a defective receptor (Merx et al. 2006). Second, it is associated with disease phenotypes in human populations (Hawn et al. 2003, 2005; Gewirtz et al. 2006). Because of these observations, we were interested in measuring the frequency of TLR5392STOP in different populations and exploring the idea that this mutation might be under recent strong positive selection in humans. We genotyped the mutation in the HGDP and estimated a global frequency of 4.2%. The genotype frequencies were close to Hardy–Weinberg expectations (supplementary table 5, Supplementary Material online). TLR5392STOP is distributed nearly worldwide, with the mutation present in at least one copy in approximately half of the populations sampled (fig. 3). Because the mutation is often relatively rare, it is possible that the mutation is present at low frequencies in more populations than those reported here. Notably, some populations in the Middle East and Southern Asia have considerably higher frequencies, such as Balochi and Baruscho from Pakistan (14.5% and 12.0%, respectively), Miaozu and Naxi from China (10.0% and 11.0%, respectively), Cambodia (16.7%), Papua-New Guinea (14.7%), and Melanesia (22.7%).
If TLR5392STOP has increased in frequency due to selection in the recent past, the mutation is expected to be embedded in unusually long haplotypes. For example, selection at G6pd has generated LD over more than 1 Mb (Saunders et al. 2005). However, only two sites (positions 9,946 and 11,185) show significant LD (measured as D') with TLR5392STOP (position 33,309) after Bonferroni correction for multiple testing (table 6). The distances between TLR5392STOP and these sites are 23,963 and 22,724 nt, respectively. Because TLR5392STOP (or any linked marker) is not present in the Hapmap, we were not able to evaluate the extent of LD at longer distances, but the fact that the haplotype containing TLR5392STOP extends less than 25 kb suggests that if selection is responsible for the actual frequencies, it is not recent and strong.
Immunity genes are among the fastest evolving classes of genes in mammalian genomes (Gibbs et al. 2004; Mikkelsen et al. 2005; Nielsen et al. 2005), an observation that is usually interpreted as evidence of positive selection due to their potential engagement in host–pathogen coevolution. Despite this generalization, it has been unclear whether genes of the adaptive and innate branches of immunity show similar patterns of evolution or whether they are characterized by very different levels of functional constraint. By studying both phylogeny-based estimates of evolutionary rates and patterns of nucleotide variation within and between closely related primate species, we sought to provide an integrated understanding of the molecular evolution of an innate immunity receptor at different evolutionary timescales.
The results of several ML approaches provide strong evidence that TLR5 has experienced positive selection in primates. Conservatively, we identified 11 sites that show congruence between different ML methods as the strongest candidates of adaptive evolution. Of these, 10 sites are localized in leucine-rich repeats of the extracelullar domain (table 2), and three are located within a 228 aa region where the putative flagellin recognition site lies (Andersen-Nissen et al. 2007). Although we still do not have a complete picture of the flagellin-TLR5 interaction surface, this observation strengthens the case of adaptive evolution at TLR5. Moreover, based on the modeled 3D structure of the extracellular domain, Andersen-Nissen et al. (2007) hypothesized that amino acids near a conserved concavity within the 228 aa region could mediate species-specific patterns of TLR5 recognition. It is worth noting that site 268 lies adjacent to a residue (267) that was identified by mutagenesis as responsible for differences in specificity between human and mouse TLR5 (Andersen-Nissen et al. 2007). It is possible that residue 268 or some of the other sites identified as candidates for being under selection are also involved in TLR5 species specificity, a matter that functional studies will be able to clarify. “U,” the evolutionary index (Tang et al. 2004), provides additional information about the likelihood that specific mutations affect function and thus may be under selection. Six of the 11 sites under selection (aa292, aa312, aa354, aa482, aa530, and aa567) show relatively radical changes, with U ranging between 0.375 and 0.732.
The identification of several sites under selection within the pathogen interaction domain fits the expectation of coevolutionary models. This is in line with the finding that several flagellated Proteobacteria are able to evade human TLR5 recognition (Andersen-Nissen et al. 2005). However, we note that only a small proportion of sites (11/858 = 1.3%) show strong evidence of positive selection. Thus, most of the protein, including the TIR (signaling) domain, shows strong functional constraint, in agreement with the most generally accepted paradigm of TLR function. This duality of strong positive selection on a few sites against a background of strong purifying selection over most of the TLR5 protein is in sharp contrast with antiretroviral genes such as APOBEC3G, TRIM5α. These genes show a much larger proportion of sites (30% and 18%, respectively) under positive selection (Sawyer et al. 2004, 2005). These differences between TLR5 and APOBEC3G or TRIM5α may reflect general differences between PRRs and genes involved in “intrinsic” immunity (i.e., genes that typically do not participate in the classic innate immunity pathways but nevertheless can confer pre-exposure resistance against certain pathogens). These differences might also reflect differences between genes whose products interact with bacteria versus those whose products interact with viruses. It is possible, for example, that due to their higher mutation rates and faster turnover, viruses impose stronger selection than do bacteria.
Vertebrate immune systems differ from invertebrate immune systems in many ways, but most notably in the presence of an adaptive immune response. The acquisition of adaptive immunity could have fundamentally changed the evolutionary dynamics of vertebrate PRRs. The recent publication of genomewide patterns of evolution of innate immunity genes in Drosophila by Sackton et al. (2007) allows us to start comparing patterns of evolution of PRRs and other classes of innate immunity genes between Drosophila and vertebrates. Using a similar codon-based ML approach as the one used here, Sackton et al. (2007) found that among 245 Drosophila immunity genes, PRRs constitute the class with the highest proportion of positively selected sites (followed by signaling peptides and then antimicrobial peptides) in the D. melanogaster group. In contrast, Schlenke and Begun (2003) reported that adaptive fixations are also common in signaling molecules in Drosophila simulans. In vertebrates, similar genomewide analyses of innate immunity genes are missing, but some evidence points to the possibility that innate immunity genes are also under strong selection. Recent examples include APOBEC3G and TRIM5α (Sawyer et al. 2004, 2005), TRIM22 (Sawyer et al. 2007), TLR4 (Nakajima et al. 2008), RNASEL (Summers and Crespi 2008), and PKR (Elde et al. 2008). Our results demonstrate that some PRRs can also evolve rapidly between species.
We tested the mating system/disease-risk hypothesis with six phylogenetically independent contrasts between promiscuous and monogamous/polygynous mating systems in the primate phylogeny. We found that dN/dS changed in the predicted direction in four of six cases and that the average dN/dS was not significantly higher in more promiscuous lineages. Thus, rates of molecular evolution at TLR5 do not seem to support this controversial hypothesis and suggest that lineage-specific effects are more important than the effect, if any, of mating system. A more complete test will require analysis of similar data from many immunity genes. An interesting observation is that the increase in ω in the more promiscuous group was accompanied by an increased variance. It is possible that promiscuous mating systems are associated with stronger natural selection on immunity genes only some of the time (or only on a subset of these genes) leading to a higher average ω and also to a greater variance in ω in more promiscuous lineages compared with less promiscuous lineages.
Patterns of nucleotide variation within humans and chimpanzees were largely consistent with neutral expectations. The deviations from neutral predictions in the spectrum of allele frequencies were similar to those seen at other genes, suggesting that demographic effects, rather than selection, are responsible for these patterns. Thus, despite the strong evidence for adaptive evolution at TLR5 over deeper evolutionary timescales in primates (see above), we did not find evidence for adaptive evolution within humans or chimpanzees. This suggests that adaptive evolution at TLR5 may be somewhat episodic, or at least not marked by continual turnover of new adaptive alleles as might be expected under an arms race model of host–pathogen coevolution.
We estimated the rate of adaptive fixations from our phylogenetic comparisons to get a sense of the likelihood of detecting selection within species. Using the 11 sites with the strongest evidence of selection (table 2), we estimated the rate of adaptive fixation by dividing the total number of amino acid substitutions (39) at these sites by the total length of the tree (417.2 My) using divergence times from Bininda-Emonds et al. (2007). This yielded a value of approximately one adaptive fixation every 10 My. This is probably an underestimate of the true rate, because the ML methods used here only have power to detect recurrent positive selection on the same sites. However, even if the true rate were an order of magnitude higher than this estimate, it would not be surprising to fail to find evidence of selection within humans or chimpanzees. Polymorphism-based tests of selection typically have power to detect selection over fairly recent timescales, often on the order of less than N generations (~250,000 years in humans) (Braverman et al. 1995; Simonsen et al. 1995; Przeworski 2002).
One result worth noting was the observation of low-frequency replacement polymorphisms in humans. These polymorphisms contribute to a ratio of replacement to silent variation that is slightly higher within species than between species (table 4). Along with negative values of Tajima's D for replacement polymorphisms, this suggests that many of these polymorphisms may be weakly deleterious, consistent with the general pattern of functional constraint revealed by the phylogenetic analysis.
Patterns of nucleotide variation within chimpanzees differed from those seen in humans. Levels of variation were lower in chimpanzees, in spite of similar effective population sizes (or slightly higher in P. t. verus) (Yu et al. 2004). The distribution of allele frequencies differed too, with an excess of rare variants in humans and a trend toward an excess of intermediate frequency variants in chimpanzees at noncoding sites. Chimpanzees exhibited two divergent haplogroups in both P. t. verus and in P. t. troglodytes. The presence of these shared haplotypes is probably best explained by gene flow between subspecies at some point in the recent past or by some more complicated form of population structure.
Recently, several cases of adaptive gene loss in humans have been reported (Tournamille et al. 1995; Ali et al. 1998; Wang et al. 2006; Seixas et al. 2007). This somewhat counterintuitive idea, positive selection favoring gene loss, has been proposed as a potentially important mechanism in human evolution (Olson 1999; Olson and Varki 2003).
TLR5392STOP, a loss-of-function mutation, segregates in humans at a considerable frequency along with the normal variants. This raises the question of whether 1) it is being constantly generated by recurrent mutation, 2) it has increased in frequency due to positive selection, in which case there might be a trade-off between the disadvantage of loosing the function and some other benefit, or 3) it has drifted in the population to its present frequency.
The frequency of TLR5392STOP is clearly not compatible with mutation-selection balance. Assuming a mutation rate, μ, of 2 × 10−8 (or ~10−7 for a CpG site) (Nachman and Crowell 2000) and an equilibrium frequency, qe, of 0.042 we can calculate the selection coefficient, s, against a dominant mutation as s ≈ μ/qe (Haldane 1932). The estimated s (6.0 × 10−7 or 2.4 × 10−6 for a CpG site) is so small as to be effectively neutral in human populations, where the effective population size is approximately 10,000 (Zhao et al. 2006). If s was 0.01, then the mutation rate would have to be on the order of 10−4 to account for the observed frequencies, and this is clearly unrealistic. Moreover, the fact that the TLR5392STOP always appears on the same haplotype argues against recurrent mutation.
We found no evidence of strong, recent selection on TLR5392STOP either in patterns of LD, which were unremarkable, or in levels of variability, which were average. Moreover, the distribution of allele frequencies at TLR5 fits well with generally accepted demographic models. This leaves drift as the most likely explanation for the present frequency of TLR5392STOP. Given the difficulties of detecting selection from polymorphism data in humans, we cannot rule out the possibility that TLR5392STOP has been under weak positive selection, especially in light of the phenotypes associated with this mutation. For instance, SLE has a relatively high prevalence (up to 160/100,000) and mostly affects women in reproductive age (Danchenko et al. 2006) making the hypothesis of selection for protection against autoimmune diseases at least reasonable. There are marked geographic differences in SLE burden that might reflect underlying genetic variation for resistance/susceptibility or variation in environmental factors. Microbial infections are common triggers of autoimmunity through TLRs (Anders et al. 2005). It would be interesting to correlate worldwide abundance of flagellated pathogens with the prevalence of SLE.
If drift took the mutation to its present frequency, then the mutation must be relatively old. An estimate of the age of an allele based on its frequency, q, is given by E(t) = (−2q)(ln q)/(1 − q), where age is measured in units of 2N generations (Kimura and Ohta 1973). The global frequency of TLR5392STOP is 0.042. Assuming that N = 10,000, the estimated age is 5,560 generations or 139,000 years (assuming a generation of 25 years). Another way to estimate the age of the TLR5392STOP mutation is from the decay of LD as a function of time and recombination rate. The time required to erode linkage to the observed level is given by: (Hedrick 2000), where is the observed LD in the data, is the initial LD (assumed to be complete when the TLR5392STOP mutation arose, D′ = 1), and c is the recombination distance calculated using the average recombination rate for chromosome 1 of 1.2 cM/Mb (Jensen-Seaman et al. 2004). Using five sites that show significant LD (table 6) t was estimated as 2,096 generations or 52,398 years.
The observation that TLR5392STOP, a null variant, is present at frequencies up to 23% in some populations suggests that TLR5 function might be partially compensated by other genes (i.e., functional redundancy for TLR5). A similar case is provided by Verdu et al. (2006) who, based on the patterns of nucleotide variation and absence of extended LD, concluded that MBL2, another innate immune receptor that activates the lectin–complement pathways, is functionally redundant in human innate immune defenses. Redundancy in PAMP recognition might be a common theme in the innate immune response (Miao et al. 2007). The recognition of viral RNA provides a good example in which several TLRs participate in the detection of ssRNA and dsRNA in endosomal compartments, while another suite of genes responds to the same PAMPs in the cytosol. It is possible that this recognition at multiple levels is an important and previously unappreciated feature of the innate immune system. The recent identification of a second flagellin receptor (cytosolic), Ipaf (Franchi et al. 2006), is consistent with this idea. However, the downstream effects of both genes are quite different, and they also respond to different types of bacteria (reviewed in Miao et al. 2007), suggesting that TLR5 and Ipaf might cooperate in recognizing flagellated bacteria rather than being completely functionally redundant.
We especially thank the following people/institutions for providing DNA or tissue samples: Drs. M. Hammer, O. Ryder, and B. Beer, The Museum of Vertebrate Zoology, Berkeley, California, The Southwest National Primate Research Center, and the Gladys Porter, Toronto, San Diego, and Los Angeles Zoos. We thank Dr. W. Klimecki for providing access to the raw human sequence data deposited in the Innate Immunity Database, Dr. M. Saunders and Dr. H. Norton for help with analysis, A. Woerner for the use of scripts to handle/analyze polymorphism data, and Donata Vercelli, Armando Geraldes, Matt Dean, Jeff Good Tovah Salcedo, Miguel Carneiro, and Mari Sans-Fuentes for very useful discussions of the data. The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the Centers for Disease Control and Prevention. This work was supported by an NIH grant to M.W.N. (GM074245). The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the Centers for Disease Control and Prevention.