Inflammatory bowel disease 5 (IBD5) is a 250 kb haplotype on chromosome 5 that is associated with an increased risk of Crohn’s disease in Europeans. The OCTN1 gene is centrally located on IBD5 and encodes a transporter of the antioxidant ergothioneine (ET). The 503F variant of OCTN1 is strongly associated with IBD5 and is a gain-of-function mutation that increases absorption of ET. Although 503F has been implicated as the variant potentially responsible for Crohn’s disease susceptibility at IBD5, there is little evidence beyond statistical association to support its role in disease causation. We hypothesize that 503F is a recent adaptation in Europeans that swept to relatively high frequency and that disease association at IBD5 results not from 503F itself, but from one or more nearby hitchhiking variants, in the genes IRF1 or IL5. To test for evidence of recent positive selection on the 503F allele, we employed the iHS statistic, which was significant in the European CEU HapMap population (P = 0.0007) and European Human Genome Diversity Panel populations (P ≤ 0.01). To evaluate the hypothesis of disease-variant hitchhiking, we performed haplotype association tests on high-density microarray data in a sample of 1,868 Crohn’s disease cases and 5,540 controls. We found that 503F haplotypes with recombination breakpoints between OCTN1 and IRF1 or IL5 were not associated with disease (odds ratio [OR]: 1.05, P = 0.21). In contrast, we observed strong disease association for 503F haplotypes with no recombination between these three genes (OR: 1.24, P = 2.6 × 10⁻⁸), as expected if the sweeping haplotype harbored one or more disease-causing mutations in IRF1 or IL5. To further evaluate these disease-gene candidates, we obtained expression data from lower gastrointestinal biopsies of healthy individuals and Crohn’s disease patients.
We observed a 72% increase in gene expression of IRF1 among Crohn’s disease patients (P = 0.0006) and no significant difference in expression of OCTN1. Collectively, these data indicate that the 503F variant has increased in frequency due to recent positive selection and that disease-causing variants in linkage disequilibrium with 503F have hitchhiked to relatively high frequency, thus forming the IBD5 risk haplotype. Finally, our association results and expression data support IRF1 as a strong candidate for Crohn’s disease causation.
As an advantageous allele spreads through a population during a selective sweep, alleles in linkage disequilibrium (LD) with the advantageous allele can rapidly increase in frequency as a result of genetic hitchhiking. Under some conditions, genetic hitchhiking can also drive deleterious disease-causing alleles to high frequency (Wagener and Cavalli-Sforza 1975; Rice 1987). However, the study of this phenomenon has been limited to complete selective sweeps, primarily in nonrecombining genomes (Wagener and Cavalli-Sforza 1975; Rice 1987; Charlesworth B and Charlesworth D 2000; Seger et al. 2010). As a consequence, earlier studies of genetic hitchhiking are not directly relevant to the appreciable number of incomplete selective sweeps that have recently been identified in various human populations (Voight et al. 2006; Hawks et al. 2007; Pickrell et al. 2009; Simonson et al. 2010), and thus, the role of genetic hitchhiking in human disease remains unclear. Here, we analyze the disease implications of incomplete selective sweeps in large recombining genomes and identify the conditions that can lead to an increase in the frequency of disease-causing alleles. We then apply these results to the analysis of the inflammatory bowel disease 5 (IBD5) haplotype on chromosome 5.
Among Europeans, the IBD5 haplotype is associated with an increased risk of developing Crohn’s disease (Ma et al. 1999; Rioux et al. 2000, 2001; Burton et al. 2007), a chronic inflammatory disorder of the gastrointestinal tract. IBD5 has a frequency of approximately 40% in healthy Europeans but has a very low frequency (<5%) in African and East Asian populations (Fisher et al. 2006; Tosa et al. 2006; Silverberg et al. 2007). Due to extensive LD extending across 250 kb, multiple single nucleotide polymorphisms (SNPs) in this haplotype have equivalent statistical association with Crohn’s disease, including SNPs in the genes P4HA2, PDLIM4, OCTN1, OCTN2, and IRF1 (Onnie et al. 2006; Silverberg et al. 2007; Franke et al. 2010). We hypothesize that the disease association and extensive LD at IBD5 are the results of a recent selective sweep (fig. 1). Here, we evaluate the case for positive selection at IBD5 with statistical tests for selection in European populations. We then identify patterns of Crohn’s disease association at IBD5 that are consistent with genetic hitchhiking of disease-causing alleles on a sweeping haplotype.
The simulations in figure 2 model the interaction between a strongly favored allele and mildly deleterious alleles at a neighboring locus, as a special case of asymmetric Hill–Robertson interference (Hill and Robertson 1966). All simulations begin with a 5N generation burn-in time, where N is the effective population size. The simulations are conditioned on reaching the desired frequency of the favored allele by rejection sampling. The deleterious mutation rate is 1.1 × 10⁻⁸ per site per generation (Roach et al. 2010). The segregating sites within the disease susceptibility locus are nonrecombining. The fitness effects of multiple alleles are assumed to combine multiplicatively, although given the asymmetric relationship between the advantageous allele and deleterious alleles (s = 0.02 vs. s ≤ −0.005), the specific fitness model is of little consequence.
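As a rough illustration of the kind of forward simulation described above, the following Python sketch follows a favored allele that arises on a chromosome carrying one deleterious allele and uses rejection sampling to condition on the favored allele reaching a target frequency. It is a deliberately reduced two-locus haploid Wright–Fisher model, not the code used for figure 2: it omits the burn-in, new mutation, and the multi-site disease locus, and the function name, population size, recombination fraction, and starting deleterious frequency are all illustrative assumptions.

```python
import random


def simulate_sweep(n=1000, s_adv=0.02, s_del=-0.005, r=0.001,
                   q0=0.05, target=0.4, max_tries=10_000, seed=1):
    """Return the fraction of favored-allele chromosomes that still carry
    the original deleterious allele when the favored allele first reaches
    `target` frequency. All default values are hypothetical."""
    rng = random.Random(seed)
    # Haplotype order: (A,d), (A,+), (a,d), (a,+); A = favored, d = deleterious.
    # Fitness effects combine multiplicatively.
    fitness = [(1 + s_adv) * (1 + s_del), 1 + s_adv, 1 + s_del, 1.0]
    n_del = max(1, round(q0 * n))
    for _ in range(max_tries):
        # The favored mutation arises as one copy on a d-bearing chromosome.
        counts = [1, 0, n_del - 1, n - n_del]
        while 0 < counts[0] + counts[1] < target * n:
            x = [c / n for c in counts]
            # Selection: reweight haplotype frequencies by fitness.
            w = [xi * fi for xi, fi in zip(x, fitness)]
            total = sum(w)
            x = [wi / total for wi in w]
            # Recombination: reduce linkage disequilibrium by r * D.
            d = x[0] * x[3] - x[1] * x[2]
            x = [x[0] - r * d, x[1] + r * d, x[2] + r * d, x[3] - r * d]
            # Drift: sample the next generation of n chromosomes.
            draws = rng.choices(range(4), weights=x, k=n)
            counts = [draws.count(i) for i in range(4)]
        n_adv = counts[0] + counts[1]
        if n_adv >= target * n:
            return counts[0] / n_adv
        # Otherwise the favored allele was lost: reject this run and retry.
    raise RuntimeError("favored allele never reached the target frequency")
```

Calling `simulate_sweep(r=0.0)` always returns 1.0 in this sketch, because without recombination the favored allele cannot escape its original background; increasing `r` lets the deleterious allele be shed during the sweep.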
To test for recent positive selection on 503F, we calculated iHS scores from phased haplotype data across the genome in a Utah population (HapMap CEU) and in European populations from the HGDP (Consortium 2005; Li et al. 2008). For HapMap data, we used phased data from HapMap phase 2 (http://hapmap.ncbi.nlm.nih.gov/downloads/phasing/2006-07_phaseII/phased/; Frazer et al. 2007); for HGDP data, we phased the Illumina 550K SNP microarray data from a Crohn’s disease cohort reported previously (Imielinski et al. 2009) using fastPHASE (Scheet and Stephens 2006). We evaluated the statistical significance of the iHS test from the empirical distribution of iHS scores across the genome for each population and for all sites with a derived allele frequency between 20% and 80%. The iHS statistic is defined as the log of the ratio of integrated Extended Haplotype Homozygosity (iHH) scores at each site for the derived and ancestral alleles, standardized by the derived allele frequency (Voight et al. 2006). To calculate iHH for each allele at each site, we integrated the expected Extended Haplotype Homozygosity (EHH) in both directions from the core SNP until either expected EHH reached 0.05 or all haplotypes were unique (Voight et al. 2006; Huff et al. 2010). Because iHS has limited statistical power for sample sizes smaller than 20 (Pickrell et al. 2009), we restricted our analysis to population samples with at least 20 individuals.
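The iHH integration described above can be sketched in a few lines of Python. This is a minimal illustration, assuming haplotypes are tuples of 0/1 alleles (1 = derived) and marker positions are given in cM; the function names are hypothetical, trapezoidal integration is an assumption, and the sign convention ln(iHH_ancestral / iHH_derived) follows Voight et al. (2006). The genome-wide standardization by derived allele frequency is not shown.

```python
from collections import Counter
from math import comb, log


def ehh(haps, lo, hi):
    """Probability that two randomly chosen haplotypes are identical
    across marker indices lo..hi (inclusive)."""
    n = len(haps)
    classes = Counter(tuple(h[lo:hi + 1]) for h in haps)
    return sum(comb(c, 2) for c in classes.values()) / comb(n, 2)


def ihh(haps, positions, core, cutoff=0.05):
    """Integrate EHH in both directions from the core marker until EHH
    falls to `cutoff` (which also covers the all-unique case, EHH = 0)."""
    if len(haps) < 2:
        raise ValueError("need at least two haplotypes carrying the allele")
    area = 0.0
    for step in (-1, 1):
        prev_ehh, prev_pos = 1.0, positions[core]
        j = core + step
        while 0 <= j < len(positions):
            cur = ehh(haps, min(core, j), max(core, j))
            area += 0.5 * (prev_ehh + cur) * abs(positions[j] - prev_pos)
            if cur <= cutoff:
                break
            prev_ehh, prev_pos = cur, positions[j]
            j += step
    return area


def unstandardized_ihs(haps, positions, core):
    """ln(iHH_ancestral / iHH_derived) at the core marker."""
    derived = [h for h in haps if h[core] == 1]
    ancestral = [h for h in haps if h[core] == 0]
    return log(ihh(ancestral, positions, core) / ihh(derived, positions, core))


# Toy data: three identical derived haplotypes, three diverse ancestral ones.
positions = [0.0, 0.1, 0.2, 0.3, 0.4]
haps = [(0, 0, 1, 0, 0)] * 3 + [(0, 1, 0, 0, 1), (1, 0, 0, 1, 0), (0, 0, 0, 0, 0)]
score = unstandardized_ihs(haps, positions, core=2)
```

With this sign convention, a negative score indicates unusually long haplotypes on the derived allele, which is the pattern expected for a recently favored derived variant.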
The population frequencies of 503F were determined by genotyping SNP L503F (rs1050152) in 85 populations across the Old World, including 954 individuals from 48 HGDP populations (Li et al. 2008) and 772 individuals from 37 additional populations described previously (Jorde et al. 1995; Bamshad et al. 1998; Watkins et al. 1999; Bulayeva et al. 2003; Xing et al. 2009, 2010). The SNP was genotyped by fluorescent primer extension using SNaPshot chemistry (Applied Biosystems) and analyzed on an ABI 3100 genetic analyzer (Applied Biosystems). The populations genotyped and their population frequencies of the 503F allele are shown in supplementary table S2, Supplementary Material online.
The genotypes of the IBD5 region in 1,868 Crohn’s disease cases and 5,540 controls (Imielinski et al. 2009) were determined by genotyping individuals using the Illumina 550K SNP microarray at the Center for Applied Genomics at the Children’s Hospital of Philadelphia. All patients met the standard diagnostic criteria for Crohn’s disease (Silverberg et al. 2007). We used genotypes from 639 SNPs spanning 1 Mb upstream and downstream of OCTN1 and selected only subjects of European ancestry as determined both by self-reported ancestry and by STRUCTURE (Falush et al. 2003) runs using ancestry informative markers as in Imielinski et al. (2009). We phased the genotype data using BEAGLE (Browning SR and Browning BL 2007). After phasing, we constructed haplotype bifurcation diagrams using the program SWEEP (Sabeti et al. 2002) from the phased genotype data of the cases and a subset of controls (the 1,262 samples collected in Utah and Atlanta), with the core haplotypes defined by the 503F and 503L alleles of OCTN1.
We examined colonic expression of selected candidate genes located in the IBD5 LD block. Gene expression was assayed in individual colonic biopsy specimens from subjects with early-onset Crohn’s disease (n = 30) and from healthy controls (n = 11). Inflammation was quantified in colon biopsies by using the Crohn’s Disease Histological Index of Severity. After informed consent, colonic biopsies were obtained from subjects with Crohn’s disease and healthy controls. All of the biopsies for Crohn cases and healthy controls were obtained from the ascending colon. Colon biopsies were immediately placed in RNAlater stabilization reagent (Qiagen, Germany) at 4 °C. Total RNA was isolated by an RNeasy Plus Mini Kit (Qiagen) and stored at −80 °C. Samples were then submitted to the Cincinnati Children's Hospital Medical Center Digestive Health Center Microarray Core where the quality and concentration of RNA were measured by the Agilent Bioanalyser 2100 (Hewlett Packard) using an RNA 6000 Nano Assay to confirm a 28S/18S ratio of 1.6:2.0. We amplified 100 ng of total RNA by using a Target 1-round Aminoallyl-aRNA Amplification Kit 101 (Epicentre, WI). The biotinylated complementary RNA was hybridized to Affymetrix GeneChip Human Genome HG-U133 Plus 2.0 arrays, containing probes for 22,634 genes. The images were captured by an Affymetrix Genechip Scanner 3000. The complete data set is available at the NCBI Gene Expression Omnibus (GEO): colonic gene expression data set, GSE10616. Data were normalized to an internal control within each batch and to the healthy control samples to allow for array-to-array comparisons.
To analyze the properties of deleterious hitchhiking alleles in large recombining genomes during an incomplete selective sweep, we employed forward-in-time simulations of a Wright–Fisher model (see Materials and Methods). The simulations model a newly arising advantageous mutation and a linked locus at which variation is constrained by purifying selection (see fig. 2). We are interested in the conditions that produce LD between the advantageous allele and one or more deleterious alleles in the linked locus. These conditions produce statistical association between disease risk and alleles in a sweeping haplotype block.
We restrict the simulations such that the new advantageous mutation originates on a chromosome with one or more deleterious alleles. In the absence of this restriction, the advantageous allele is expected to be at or near linkage equilibrium with deleterious alleles at the end of the modeled sweep in all of the scenarios we evaluated. Therefore, the most important factor controlling the behavior of deleterious hitchhiking is the probability that the advantageous mutation originates on a chromosome with a deleterious allele. This probability is equal to the population frequency of chromosomes with one or more deleterious alleles. At mutation–selection equilibrium, this probability is equal to μ/s, where μ is the deleterious mutation rate of the locus under purifying selection and s is the strength of selection against new mutants at this locus. Therefore, the scenario most likely to give rise to deleterious hitchhiking is one in which many weakly constrained mutational targets are in close proximity to the favorable allele. This scenario may be particularly relevant to complex genetic diseases for two reasons. First, because complex genetic diseases involve more genes than Mendelian diseases, there may be more mutational targets that can potentially contribute to disease susceptibility (higher μ). Second, the genes involved in complex diseases may be substantially less constrained by purifying selection than those involved in Mendelian diseases (lower s; Blekhman et al. 2008).
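The μ/s relationship above lends itself to a short worked example. The sketch below uses the per-site mutation rate given in Materials and Methods; the mutational target size (10,000 sites) and the two values of s are hypothetical numbers chosen only to show how a larger target or weaker constraint raises the probability.

```python
def prob_deleterious_background(mu_per_site, n_sites, s):
    """Approximate equilibrium frequency of chromosomes carrying a
    deleterious allele at the linked locus, mu_locus / s (capped at 1)."""
    mu_locus = mu_per_site * n_sites  # locus-wide deleterious mutation rate
    return min(1.0, mu_locus / s)


# Hypothetical target of 10,000 sites at 1.1e-8 mutations per site per generation.
weak_constraint = prob_deleterious_background(1.1e-8, 10_000, s=0.005)
strong_constraint = prob_deleterious_background(1.1e-8, 10_000, s=0.05)
```

Under these assumed values the weakly constrained locus gives a probability of about 0.022, ten times that of the strongly constrained one, which is the sense in which many weakly constrained targets near the favorable allele make deleterious hitchhiking more likely.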
Our simulation results show that the two other major factors controlling the behavior of deleterious hitchhiking are the selective advantage of the advantageous allele and the genetic distance between the advantageous and deleterious alleles (fig. 2). As genetic distance increases, LD between advantageous and deleterious alleles decreases due to recombination between them (fig. 2A). The expected distance at which approximate linkage equilibrium will be reached (the equilibrium distance) depends almost exclusively on the selective advantage of the favorable allele (fig. 2). This result holds for the following reasons. First, recombination is the primary force for reestablishing equilibrium. Second, controlling for genetic distance, the expected amount of recombination since the origin of the favorable allele depends on the span of time since the start of the selective sweep. Finally, this time span depends primarily on the selective advantage of the favorable allele. Thus, selective advantage controls equilibrium distance.
The time since the start of the selective sweep, and therefore the equilibrium distance, is relatively insensitive to both the initial population size and changes in population size (Hawks et al. 2007; fig. 2C and G). Recombination can introduce additional deleterious alleles into the sweeping haplotype block, as shown in the green area of figure 2A, but the rate at which additional deleterious alleles are introduced is lower than the rate at which the initial deleterious allele is lost. For example, at a distance of 0.4 cM, the frequency of the initial deleterious allele has dropped from 100% to 47%, whereas the frequency of other deleterious alleles has increased to only 6% (fig. 2A). Although mutation can introduce new deleterious alleles during the selective sweep, the mutation rate has no appreciable effect on equilibrium distance (fig. 2E).
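The link between sweep duration and equilibrium distance can be illustrated with a back-of-the-envelope calculation. The sketch below computes the probability that a sweeping chromosome has experienced no recombination between the favored site and a linked site after a given number of generations. This is a simplification rather than the simulation model: it treats cM/100 as the per-generation recombination fraction, and because recombination between two favored chromosomes need not remove the linked allele, it is a lower bound on retention. The function name and the example durations are hypothetical.

```python
def retained_fraction(distance_cm, generations):
    """Probability of no recombination over `generations` between two
    sites separated by `distance_cm` centimorgans."""
    r = distance_cm / 100.0  # small-distance approximation
    return (1.0 - r) ** generations


# A faster sweep (fewer generations) preserves linkage over longer distances.
short_sweep = [retained_fraction(d, 200) for d in (0.1, 0.4, 1.0)]
long_sweep = [retained_fraction(d, 800) for d in (0.1, 0.4, 1.0)]
```

Because a stronger selective advantage shortens the sweep, it shifts the whole retention curve toward larger distances, which is the qualitative behavior described above.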
When deleterious hitchhiking occurs in a disease susceptibility gene, common SNPs in LD with the advantageous allele should consistently show signals of disease association resulting from LD with multiple disease-causing variants. However, these disease-causing variants may be individually too rare to produce association signals. This is a type of synthetic association: Observed disease association at a common SNP resulting from variants separated from the common SNP by large genetic distances (Dickson et al. 2010). In the absence of positive selection, if rare mutations make a large contribution to disease risk, synthetic association at common SNPs can occasionally occur at distances greater than 2.5 cM from the disease-causing variants (Dickson et al. 2010). In contrast, for a genomic region influenced by a strong selective sweep, synthetic association is more predictable, with an equilibrium distance of less than 3 cM from the advantageous allele for most selective sweeps (fig. 2A).
The 503F variant (rs1050152) of OCTN1, a gene located near the center of the IBD5 haplotype, has been associated with Crohn’s disease in several studies (Rioux et al. 2001; Mirza et al. 2003; Peltekova et al. 2004; Fisher et al. 2006; Silverberg et al. 2007; Imielinski et al. 2009; Franke et al. 2010). However, no convincing causal link between this variant and Crohn’s disease has been established. The key substrate of the transporter encoded by OCTN1 is ergothioneine (ET), an antioxidant synthesized by fungi and present in most plants and animals (Grundemann et al. 2005; Ey et al. 2007). 503F is a gain-of-function mutation that increases ET transport efficiency by 50% and ET substrate affinity by 3-fold (Taubert et al. 2005). This variant is common in European and Middle Eastern populations but is rare throughout the rest of the world (fig. 3B). Thus, 503F is characterized by several unusual properties: It is rare in Africa and East Asia but common in Europe; it is associated with a specific haplotype background characterized by extensive LD in Europeans; it confers a gain-of-function on the protein; and it is associated with disease, although there is no direct evidence for causation. Collectively, these observations are consistent with the hypothesis that 503F was influenced by positive selection and that the association of this variant with disease is the result of genetic hitchhiking. This hypothesis can account for the extensive LD in the IBD5 haplotype, the geographic distribution of the 503F allele, and disease association with 503F in the absence of direct causation (Wagener and Cavalli-Sforza 1975; Rice 1987).
We propose that 503F is an adaptation to low dietary levels of ET among early Neolithic farmers in the Fertile Crescent. Although ET content is relatively high in meat and a variety of plant foods, it is conspicuously low in many of the plants first domesticated in the Fertile Crescent, including wheat, barley, lentils, and peas (Ey et al. 2007; table 1). Because the function of ET is not well understood, it is difficult to identify the specific fitness consequences that Neolithic farmers would have faced as a result of low levels of ET in their diet. Despite the lack of direct evidence, there are several lines of evidence supporting the importance of ET. ET functions both as an antioxidant and a neuroprotective agent (Moncaster et al. 2002) and OCTN1 functions exclusively as an ET transporter (Grundemann et al. 2005). OCTN1 is highly conserved among vertebrates, with >90% protein identity between human and other primates and 75% protein identity between human and chicken (Altschul et al. 1997). In addition, despite being exclusively synthesized by fungi and mycobacteria, ET is found in a wide variety of plants and animals (table 1; Ey et al. 2007). Further, depletion of ET has been shown to increase susceptibility to oxidative stress in mammalian cells, resulting in increased mitochondrial DNA damage, protein oxidation, and lipid peroxidation (Paul and Snyder 2010). Finally, ET has been shown to play a specific role in response to UV-induced oxidative stress (Markova et al. 2009), which may have been particularly relevant to Fertile Crescent farmers if they were as light skinned as many of their descendants are. If the transition from hunting and gathering to early agriculture in the Fertile Crescent resulted in a dietary ET deficiency, a genetic variant increasing the absorption of ET might have been favored by positive selection, allowing it to spread rapidly through the population.
Previous studies have suggested that a recent selective sweep may have occurred in one or more Eurasian populations near the IBD5 haplotype in the region containing IL13, which is approximately 350 kb downstream of OCTN1 (Sakagami et al. 2004; Zhou et al. 2004; Tarazona-Santos and Tishkoff 2005). Because signals of positive selection can extend for long genomic distances (Grossman et al. 2010), the patterns observed at IL13 could potentially be explained by positive selection on the 503F variant of OCTN1. To test for evidence of positive selection on the 503F variant, we employed the iHS statistic, which is the most powerful method for detecting recent positive selection when the favorable allele is still polymorphic in the population (Voight et al. 2006; Huff et al. 2010). This test measures the decay of LD around a polymorphic site and is designed to detect extended haplotype blocks that are produced by a recent selective sweep. When the test is applied to the advantageous allele, the statistical power is greater than 80% at the 0.01 significance level in a sample size of 50 individuals (Voight et al. 2006; Huff et al. 2010). The empirical one-tailed test for 503F in the HapMap CEU sample resulted in a P value of 0.0007 (table 2). Of the four Human Genome Diversity Panel (HGDP) European populations we tested, three (Russia, Sardinia, and France) were significant at the 0.01 level, with P = 0.012 in the Basque sample. This result provides strong evidence that the 503F variant has been influenced by recent positive selection in European populations (table 2). By itself, the iHS test is rarely able to conclusively identify the variant that has been targeted by selection due to significant iHS signals from nearby hitchhiking alleles (see supplementary table S5, Supplementary Material online).
Therefore, although these results support the hypothesis of positive selection acting on 503F, we cannot rule out the possibility that the target of selection was a variant other than 503F on the IBD5 haplotype.
To estimate the age of 503F, we measured the decay of LD in the HapMap CEU sample with the method described in Reich and Goldstein (1999) using the implementation details from Sabeti et al. (2005). Our estimate incorporated all SNPs in HapMap Phase 2 that are within 0.04 cM of 503F (Consortium 2005) (see supplementary table S4, Supplementary Material online). The estimated age of origin of 503F was 12,550 years ago (95% confidence interval = 7,750–19,025, 25-year generation time), which is consistent with the earliest archaeological evidence for the domestication of wheat (10,600 years ago) and barley (9,500 years ago) (Hillman 1975; Zeist and Bakker-Heeres 1982; Badr et al. 2000; Ozkan et al. 2002). The origin age of 503F is also consistent with the domestication of the earliest pulses, which are particularly low in ET content (see peas and lentils in table 1) and appear in the archaeological record soon after wheat and barley (Ladizinsky 1979).
To better assess the geographic distribution of 503F, we genotyped this variant in 954 individuals from 48 HGDP populations and 772 individuals from 37 additional populations. Figure 3 compares the frequency of 503F across the Old World with the ages of early Neolithic archaeological sites (see supplementary table S3, Supplementary Material online), suggesting a close relationship between the geographic distribution of 503F and the origin of agriculture in the Fertile Crescent. With an allele frequency of 0.4 and an age of origin of 12,550 (7,750–19,025) years ago, we estimate that the selective advantage was approximately 1.9% (1.3–3.2%) for early Neolithic farmers with one copy of 503F (estimated assuming additive deterministic selection with the best-fitting model of European population history from Schaffner et al. (2005)).
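For intuition about how an allele frequency and an allele age translate into a selection coefficient, the following sketch uses the deterministic logistic approximation for additive selection, under which the log-odds of the allele frequency increase by roughly s per generation. It is not the procedure used above: it assumes a constant population and a single starting copy (frequency 1/(2N)) in place of the Schaffner et al. (2005) demographic model, and the effective size of 10,000 and the function name are hypothetical.

```python
from math import log


def selection_coefficient(p_now, years, generation_time=25.0, n_e=10_000):
    """Estimate s from the change in log-odds of allele frequency,
    assuming the allele started as one copy among 2 * n_e chromosomes."""
    generations = years / generation_time
    p_start = 1.0 / (2 * n_e)

    def logit(p):
        return log(p / (1.0 - p))

    return (logit(p_now) - logit(p_start)) / generations


s_point = selection_coefficient(0.4, 12_550)   # point estimate of allele age
s_young = selection_coefficient(0.4, 7_750)    # younger allele needs larger s
s_old = selection_coefficient(0.4, 19_025)     # older allele needs smaller s
```

Under these simplifying assumptions the point estimate comes out close to 1.9%, with the younger and older age bounds giving larger and smaller values, respectively. The agreement should be read as a plausibility check on the order of magnitude, not as a reproduction of the model-based estimate.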
Modeling disease variants as genetic hitchhikers provides a framework for constructing haplotype association tests to assist in disease-gene mapping efforts. The haplotype bifurcation diagram in figure 4A depicts recombination events that have occurred on haplotypes with the 503F allele. Each bifurcation point represents an SNP that defines one or more recombinant haplotypes, which were created by recombination events upstream of that SNP. The disease-hitchhiking model predicts that haplotypes created by recombination events between the favorable allele and the disease locus will not be associated with disease and that the signal of disease association should be concentrated in haplotypes created by more distant recombination events (fig. 1). Our simulation results predict that hitchhiking disease variants should be located within 1.4 cM of 503F (fig. 2A).
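The logic of the haplotype association test can be sketched as a simple classification step. The example below assumes each 503F-bearing haplotype is a string of alleles ordered along the chromosome, and labels it "intact" if it matches one of the common core haplotypes across the interval between 503F and the candidate genes, and "recombinant" otherwise. The function name, the data layout, and the toy alleles are hypothetical, and treating every mismatch as a recombination event is a simplification, since mutation or genotyping error can also produce mismatches.

```python
def classify_503f_haplotypes(haplotypes, core_haplotypes, start, end):
    """Tally cases and controls among intact and recombinant haplotypes.

    haplotypes: list of (allele_string, is_case) for 503F-bearing chromosomes.
    core_haplotypes: allele strings of the common non-recombinant haplotypes.
    start, end: marker indices (inclusive) spanning 503F to the candidate genes.
    """
    cores = {core[start:end + 1] for core in core_haplotypes}
    table = {"intact": [0, 0], "recombinant": [0, 0]}  # [cases, controls]
    for alleles, is_case in haplotypes:
        label = "intact" if alleles[start:end + 1] in cores else "recombinant"
        table[label][0 if is_case else 1] += 1
    return table


# Toy data: four haplotypes compared with one core over markers 0..2.
toy = [("CCTA", True), ("CCTG", False), ("CGTA", True), ("TCTA", False)]
counts = classify_503f_haplotypes(toy, ["CCTA"], start=0, end=2)
```

The resulting case and control counts for each class can then be compared with the corresponding counts for chromosomes lacking 503F to obtain class-specific odds ratios.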
Among the genes in the IBD5 region, there are two strong a priori biological candidates for Crohn’s disease causation: IRF1 (0.057 cM from 503F) and IL5 (0.1 cM from 503F). Both genes are involved in mechanisms that may contribute to Crohn’s disease pathogenesis. IL5 encodes the cytokine IL5, whose expression is increased in lymphocytes isolated from lamina propria cells in ulcerative colitis patients, though not from Crohn’s disease patients (Fuss et al. 1996). Overexpression of IL5 is associated with intestinal inflammation, though distinct in character from murine colitis (Lee et al. 1997). IL5 knockout mice demonstrate an apparently normal adaptive immune response but demonstrate significantly attenuated eosinophilia in response to parasitic challenge (Kopf et al. 1996). IRF1’s potential role in Crohn’s disease causation is supported by several lines of biological evidence. IRF1-deficient mice develop severe infection from the intracellular bacteria Mycobacterium bovis (Kamijo et al. 1994; Yamada et al. 2002). The absence of IRF1 results in the decoupling of the MyD88–toll-like receptor signaling pathway (Negishi et al. 2006), a pathway critical for host defense against intracellular pathogens. Thus, a deficiency in IRF1 appears to result in defects in innate immunity, particularly in pathways important in the clearing of intracellular bacteria. Defects in intracellular bacterial clearance are an emerging theme in the pathogenesis of Crohn’s disease, highlighted by the identification of the Crohn’s disease susceptibility loci ATG16L1, NOD2, and IRGM (Singh et al. 2006; Cooney et al. 2010; Travassos et al. 2010).
To test for evidence of disease hitchhiking in IRF1 and IL5, we performed a haplotype association analysis using Illumina 550K SNP microarray data from 1,868 Crohn’s disease cases and 5,540 controls (Imielinski et al. 2009; fig. 4). The odds ratio (OR) for the 503F allele itself was 1.24 (P = 5.5 × 10⁻⁹, allele frequency 48.2% in cases vs. 42.7% in controls). At about 200 kb downstream of 503F, past IRF1 and IL5, the core haplotype splits into two common haplotypes at rs11739623 (see fig. 4). These two haplotypes are defined by 31 SNPs (supplementary table S1, Supplementary Material online); we refer to them as haplotypes C and T, designating the alleles that differentiate them at rs11739623:C/T (see fig. 4). For the remaining group of individually rare 503F haplotypes, where a recombination event has occurred between 503F and IL5 (gray shading in fig. 4), the difference in frequency between cases and controls was not significant, with an OR of 1.05 (P = 0.21, allele frequency 10.7% in cases vs. 10.2% in controls, see fig. 4). In contrast, the two haplotypes with no detectable recombination between 503F and IRF1 or IL5 (blue shading in fig. 4) were both associated with Crohn’s disease (haplotype C: OR = 1.11, P = 0.021; haplotype T: OR = 1.26, P = 1.0 × 10⁻⁶; fig. 4). The OR for haplotypes C and T combined was 1.24 (P = 2.6 × 10⁻⁸, allele frequency 37.4% in cases vs. 32.4% in controls). The pattern of haplotype association matches the prediction under the hypothesis that the ancestral sweeping haplotype harbored one or more disease-causing mutations in IRF1 or IL5; 503F haplotypes with no recombination between 503F and IRF1 or IL5 are strongly associated with disease, whereas the remaining 503F recombinant haplotypes are not associated with disease (fig. 4). From the simulation results in figure 2A, we can obtain a rough approximation of how severe a disease-causing mutation would need to be in IBD5.
An OR of 1.4 for disease-causing mutations in IRF1 would be required to produce the observed allele frequency difference (5.5%) at 503F.
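The allelic odds ratios reported above can be approximately recovered from the case and control frequencies alone. The short sketch below does this arithmetic; the function name is hypothetical, and because the published frequencies are rounded to one decimal place, the recomputed values can differ from the reported ORs in the second decimal.

```python
def odds_ratio(freq_cases, freq_controls):
    """Allelic odds ratio from haplotype frequencies in cases and controls."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


or_503f = odds_ratio(0.482, 0.427)         # all 503F chromosomes
or_recombinant = odds_ratio(0.107, 0.102)  # recombinant 503F haplotypes
or_intact = odds_ratio(0.374, 0.324)       # haplotypes C and T combined
```

The comparison makes the central contrast explicit: nearly all of the excess of 503F in cases is carried by the non-recombinant haplotypes, while the recombinant class has an odds ratio close to 1.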
To further evaluate the potential functional importance of genes in the IBD5 region, we measured mRNA expression levels in six genes (P4HA2, PDLIM4, OCTN1, OCTN2, IRF1, and IL5) from colonic biopsies of 30 childhood-onset Crohn’s disease cases and 11 healthy controls. Notably, we observed no significant difference in expression of OCTN1 between cases and controls (fig. 5). After correction for multiple comparisons, significant mRNA expression level differences were observed only in IRF1 and OCTN2. For OCTN2, we observed lower mRNA mean expression levels among Crohn’s disease cases compared with controls (0.67 vs. 1.0; uncorrected P = 0.003). OCTN2 is a paralogue of OCTN1 and encodes a membrane transporter for carnitine. As with many genetic variants on the IBD5 haplotype, genetic variants in OCTN2 are associated with Crohn’s disease (Peltekova et al. 2004). However, a role for OCTN2 in Crohn’s disease causation is not supported by our haplotype association results and has limited biological plausibility. We observed a striking difference in expression of IRF1, with a 72% increase in Crohn’s disease cases relative to controls (mean expression levels in Crohn 1.72 vs. controls 1.0; uncorrected P = 0.0006, fig. 5). This result is consistent with a previous study of IRF1 expression levels from colonic biopsies, which reported a 90% increase in Crohn’s disease cases relative to controls (Clavell et al. 2000). The increased expression in Crohn’s patients provides additional support for a potential role in disease causation for IRF1, although the IBD5 disease-causing variants may not be directly responsible for the difference in expression. In the absence of genotype data on these subjects, we are unable to determine whether IRF1 mRNA expression differences are associated with the IBD5 haplotype, but such studies are justified by our observations.
The pattern of LD surrounding the 503F allele of OCTN1 provides strong evidence for recent positive selection in populations ancestral to Europeans starting approximately 12,500 years ago. We propose that 503F was an adaptation to early agriculture in the Fertile Crescent, increasing the absorption of ET to compensate for the lack of ET in the diet. The association between 503F and Crohn’s disease is probably the result of one or more disease-causing variants that increased in frequency via genetic hitchhiking after becoming linked to 503F. Several lines of evidence collectively implicate IRF1 as a strong candidate for the disease-susceptibility locus at IBD5: our detailed haplotype association analysis, our mRNA expression results, which are consistent with previous reports (Clavell et al. 2000), and previous studies demonstrating that the IRF1 null mouse is susceptible to intracellular bacteria (Kamijo et al. 1994; Yamada et al. 2002; Negishi et al. 2006). Furthermore, our results demonstrate that genetic hitchhiking of multiple deleterious variants can be a common occurrence during a strong selective sweep at distances of up to 3 cM from the advantageous allele. A greater understanding of the phenomenon of deleterious hitchhiking should advance efforts to identify causal variants in genomic regions with overlapping signals of disease association and recent positive selection.
We thank Alan Rogers and Jon Seger for their helpful comments and suggestions. An allocation of computer time from the Center for High Performance Computing at the University of Utah is gratefully acknowledged. This work was supported by the Primary Children’s Medical Center Foundation and National Institute of Diabetes and Digestive and Kidney Diseases grant DK069513 (to S.L.G.), The University of Luxembourg—Institute for Systems Biology Program, the Gene Expression and Sequencing Core of the National Institutes of Health (NIH)-supported Cincinnati Children’s Hospital Research Foundation Digestive Health Center (1P30DK078392-01), and NIH grant 1T32HL105321-01. L.A.D. is supported by NIH grants R01 DK078683 and R01 DK068164. J.X. is supported by NIH/National Human Genome Research Institute K99HG005846. D.W. and L.B.J. are supported by NIH grant GM59290. This investigation was also supported by Public Health Service research grant UL1-RR025764 and CO6-RR11234 from the National Center for Research Resources.