We undertook a meta-analysis of six Crohn’s disease (CD) genome-wide association studies (GWAS) comprising 6,333 cases and 15,056 controls, and followed up the top association signals in 15,694 cases, 14,026 controls and 414 parent/offspring trios. Thirty new susceptibility loci meeting genome-wide significance (P-value <5×10⁻⁸) were identified. A series of in silico analyses highlighted particular genes within these loci and, together with manual curation, implicated functionally interesting candidate genes including SMAD3, ERAP2, IL10, IL2RA, TYK2, FUT2, DNMT3A, DENND1B, BACH2 and TAGAP. Combined with previously confirmed loci, the results described here identify a total of 71 distinct loci with genome-wide significant evidence for association with Crohn’s disease.
Crohn’s disease (OMIM #266600) results from the interaction of environmental factors, including the intestinal microbiota, with host immune mechanisms in genetically susceptible individuals. Along with ulcerative colitis (UC), it is one of the main subphenotypes of inflammatory bowel disease (IBD). GWAS have highlighted key CD pathogenic mechanisms, including autophagy and Th17 pathways. A meta-analysis of these early scans implicated 32 susceptibility loci, but only accounted for 20% of the genetic contribution to disease risk - suggesting that more loci await discovery1. Recognizing that an increased sample size would be required to detect these, we have expanded the International IBD Genetics Consortium (IIBDGC), approximately doubling the discovery panel size in comparison with the first meta-analysis.
The discovery panel for the current study comprised 6,333 CD subjects and 15,056 controls, all of European descent, with data derived from six index GWAS (for an overview see Supplementary Table 1)2-6. Imputation using HapMap3 reference data allowed us to test for association at 953,241 autosomal SNPs. Our discovery panel had 80% power to detect variants conferring odds ratios ≥1.18 at the genome-wide significance level of P<5×10⁻⁸, assuming a minor allele frequency ≥20% in healthy controls. Under the same conditions, the sample size of our original meta-analysis had only 11% power1.
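The power figure quoted above can be approximated with a simple two-proportion allelic test. The sketch below is illustrative only and is not the calculation used in the study: it assumes a normal approximation, a multiplicative (allelic) odds ratio, and that the control allele frequency stands in for the population frequency. The function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def case_allele_freq(control_freq: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by an allelic odds ratio."""
    return odds_ratio * control_freq / (1.0 + control_freq * (odds_ratio - 1.0))


def allelic_test_power(n_cases: int, n_controls: int, control_freq: float,
                       odds_ratio: float, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided allelic test (normal approximation)."""
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    # Each individual contributes two alleles to the comparison.
    se = (p1 * (1.0 - p1) / (2 * n_cases)
          + p0 * (1.0 - p0) / (2 * n_controls)) ** 0.5
    ncp = (p1 - p0) / se  # expected Z-score under the alternative
    z_crit = -_STD_NORMAL.inv_cdf(alpha / 2.0)  # two-sided critical value
    return _STD_NORMAL.cdf(ncp - z_crit) + _STD_NORMAL.cdf(-ncp - z_crit)


# Discovery panel sizes and effect size taken from the text above.
power = allelic_test_power(6333, 15056, control_freq=0.20, odds_ratio=1.18)
print(f"approximate power: {power:.2f}")
```

Under these simplifying assumptions the estimate comes out at about 0.8, in line with the figure quoted in the text; a dedicated power calculator with an explicit disease model would be expected to differ slightly.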
A quantile-quantile plot of the primary meta-statistic, using single-SNP Z-scores combined across all sample sets, showed a marked excess of significant associations (Supplementary Figure 1). A total of 2,024 SNPs within 107 distinct genomic loci, including all previously defined significant hits from our earlier meta-analysis, demonstrated association with P-values <10⁻⁵. A Manhattan plot is shown in Supplementary Figure 2. Fifty-one of the regions, representing new loci associated at P<5×10⁻⁶, were followed up by genotyping the most significant SNPs in an independent panel of 15,694 CD cases, 14,026 controls and 414 parent/offspring trios (see Table 1 and Supplementary Table 2).
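One common way to combine single-SNP Z-scores across sample sets is a sample-size-weighted (Stouffer) sum. The sketch below shows that scheme as an illustration; the weighting by the square root of an effective sample size is an assumption, since the text does not state the weights used, and the per-study numbers are hypothetical.

```python
import math


def effective_n(n_cases: int, n_controls: int) -> float:
    """Effective sample size of an unbalanced case-control set."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def combine_z(z_scores, sample_sizes) -> float:
    """Weighted Stouffer combination with weights sqrt(sample size)."""
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z: float) -> float:
    """Two-sided P-value for a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical Z-scores for one SNP in three sample sets (illustration only).
study_z = [2.1, 3.0, 1.4]
study_n = [effective_n(1000, 2000), effective_n(1500, 3000), effective_n(800, 1600)]
z_meta = combine_z(study_z, study_n)
p_meta = two_sided_p(z_meta)
print(f"combined Z = {z_meta:.2f}, P = {p_meta:.2e}")
```

With equal weights the combined statistic reduces to the sum of the Z-scores divided by the square root of the number of sample sets, so several modest but concordant signals can add up to a strong combined one.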
Variants within 30 distinct new loci met a genome-wide significance threshold of P<5×10⁻⁸ for association with CD in the combined discovery plus replication panel, with at least nominal association in the replication panel (see Table 1). Two additional loci, encompassing the CARD9 and IL18RAP genes, had previously been reported as associated with CD in a candidate gene study7 and were here both replicated and confirmed at P<5×10⁻⁸. Five loci were identified at genome-wide significance in GWAS published after our replication experiment had been designed. One, the FUT2 locus, was from a recent adult CD GWAS6. Four more (ZMIZ1, IL27 at 16p11, 19q13 and 22q12) were identified in a pediatric IBD population5 and replicated here in our current sample set. Two further loci had produced “suggestive” evidence of association with replication in our earlier study1. Here, these clearly exceeded the genome-wide significance threshold in the meta-analysis alone and, given the previous replication evidence, were not followed up further (see Table 1). Thus cumulatively, 39 additional loci can now be added to the 32 confirmed CD susceptibility loci identified at the time of the Barrett et al. study. We did not observe statistically significant heterogeneity of the odds ratios (defined as a Breslow-Day test P-value <0.05 after Bonferroni correction; Supplementary Table 4) between the panels from our 15 different countries (Supplementary Tables 1 and 2) for any of the 71 loci. Nor was any evidence of interaction between the associated loci observed (Supplementary Figure 3).
Regional association plots of all 71 susceptibility loci including the underlying genes are shown in detail in Supplementary Figure 4, and complete genotype data including odds ratios and allele frequencies are shown in Supplementary Tables 3 and 4. Five loci had evidence for more than one independently associated variant (Table 1). While 6 of the 30 novel regions contain just a single gene, which is thereby strongly implicated in CD pathogenesis (e.g. SMAD3, NDFIP1 and BACH2), 22 include more than one gene within the associated interval, and the remaining two regions contain no gene or gene prediction (Table 1). We thus applied additional in silico analyses to refine the list of functional candidate genes further. These were:
Summary results of these analyses are shown in the rightmost column of Table 1. The highlighted genes are described briefly in Box 1, as are genes that constitute particularly noteworthy candidates from intervals containing one or few genes. While we believe that these evidence-based approaches are helpful in identifying likely functional candidates, in some instances the different techniques highlight different genes. This reflects uncertainty as to which is causal, and highlights the need for functional studies.
Thirty new signals were identified here beyond those described in the earlier meta-analysis1 and other subsequent publications. The new associations were driven primarily by increased power arising from the expanded sample size rather than improved imputation, as more than two-thirds of the novel loci identified here have good proxies (r²>0.8) on both earlier-generation arrays (Illumina 300K and Affymetrix 500K Set). Extending this argument beyond the current analysis, it seems likely that many more loci of modest effect size still await discovery.
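For readers unfamiliar with the proxy criterion, the squared correlation r² between two biallelic SNPs can be computed from haplotype and allele frequencies. The sketch below uses the standard definition; the frequencies in the example are hypothetical and the function names are illustrative.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared correlation between two biallelic SNPs.

    p_ab is the frequency of the haplotype carrying allele A at the first SNP
    and allele B at the second; p_a and p_b are the two allele frequencies.
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


def is_good_proxy(r2: float, threshold: float = 0.8) -> bool:
    """Proxy criterion used in the text: r2 above the threshold."""
    return r2 > threshold


# Hypothetical frequencies for an associated SNP and an array SNP.
r2 = ld_r2(p_ab=0.28, p_a=0.30, p_b=0.30)
print(f"r2 = {r2:.3f}, good proxy: {is_good_proxy(r2)}")
```

Two SNPs that always occur on the same haplotypes give r² = 1, and statistically independent SNPs give r² = 0.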
For many of the novel loci, associations have been reported previously in other complex diseases, comprising mostly chronic inflammatory disorders (Table 1). Such diseases can cluster both within families and individuals, reflecting shared genetic risk factors. For example, IBD and ankylosing spondylitis can co-segregate and both are associated with IL23R2,10 and TNFSF1511,12. The IL10 locus was previously associated with UC13 and was identified as a novel CD locus in the present study. Thus IL10 is a generic IBD locus, which is a functionally intuitive finding of potential therapeutic significance.
For loci previously associated with other inflammatory diseases the direction of effect in CD is usually the same, but in five cases the risk allele for one disease appears to be protective in another disease (see arrow symbol in “Reported association” column in Table 1). In most such instances, functional annotation suggests modulation of T cell and other immune pathways. Indeed, GRAIL highlights a number of such genes. These inverse associations may reflect overlap in the pathways by which the host regulates effector functions in defense and regulatory functions in self-tolerance. This is a delicate balance and, in the face of competing requirements, selection pressures may have conferred advantage for divergent alleles in a cell- and environmentally dependent manner.
The associated SNP rs281379 at 19q13, recently also identified by McGovern et al.6, is highly correlated (r²>0.80) with a common nonsense variant (rs601338, also known as G428A or W142X) at FUT2. This is classically referred to as the non-secretor variant, as individuals homozygous for this null allele do not secrete blood group antigens at epithelial surfaces. Recently, non-secretors were identified as having near-complete protection from symptomatic GII.4 norovirus infection14, and the same null allele is identified here as a CD risk factor. This suggests one potential elusive link between infection and immune-mediated disease.
In contrast to the implication of coding variation in the FUT2 gene, our previous data demonstrated that most CD-associated SNPs were not in LD with coding polymorphisms1, suggesting that regulatory effects are likely to be a more common mechanism of disease susceptibility. Providing further direct evidence for this, a number of new eQTL effects were identified here (see Table 1 and Supplementary Results Section), notably including CARD9 (LOD=12.4), ERAP2 (LOD=47.2) and TNFSF11 (RANKL) (LOD=5.9). The last of these maps adjacent to but outside the associated recombination interval, suggesting another potential long-range cis-regulatory effect, as previously described for PTGER4 in CD (ref. 4). RANKL has pleiotropic immunological effects and also stimulates osteoclast activity. This finding may be relevant to the osteoporosis clinically associated with CD.
Given the importance of regulatory effects, it is intriguing that variants within the gene encoding a key mediator of epigenetic regulation, DNA methyltransferase 3a (DNMT3A), should be associated with CD. By inducing transcriptional silencing, DNMT3a is known to play an important role in immunoregulation. For example, it methylates IL-4 and IFN-γ promoters following T cell receptor stimulation, hence regulating T cell polarization15, and induces dynamic regulation of TNF-α transcription following lipopolysaccharide exposure in leukocytes16. Genetically determined alterations in DNMT3a activity could thus have far-reaching effects.
The 32 loci described up to 2008 explained approximately 20% of CD heritability. Adding the 39 loci described since increases the proportion of heritability explained to just 23.2%. This pattern of common alleles, explaining a logarithmically decreasing fraction of heritability (Figure 2), is consistent with a recent model of effect size distribution17, which predicted (based on the previous CD meta-analysis) that our current sample size would likely identify 48 new loci. Furthermore, it is likely that more high-frequency CD risk alleles of even smaller effect size remain unidentified: The same model predicts that 140 loci would be identified by a sample size of 50,000, but these would explain only a few more percent of CD heritability. It is clear, therefore, that larger GWAS alone will not explain all of the missing heritability in CD.
One key shortcoming of our current model of heritability explained by these loci is a direct consequence of the extent to which GWAS tag SNPs are often imperfect proxies for causal alleles, and thus substantially underestimate the true attributable risk. For example, the best tag SNP at the NOD2 locus in our meta-analysis appears to explain just 0.8% of genetic variance, whereas the three NOD2 coding mutations themselves account for 5%. If an analogous situation applies to even a small fraction of the other 70 CD susceptibility loci, the proportion of overall heritability explained will increase significantly. Indeed, one study of LD between tag SNPs and causal variants in the heritability of human height18 suggests that this effect might double the total fraction of heritability explained by GWAS SNPs. Coding variants identified here from the 1000 Genomes Project which are in strong LD with the focal SNPs in several of our regions (see Supplementary Table 4) thus now require direct assessment in order to explore this possibility.
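The NOD2 figures above can be read as an attenuation factor. As a rough sketch, and assuming a single causal variant with additive effects (a simplification, since NOD2 carries three coding mutations), the variance explained by a tag SNP is approximately r² times the variance explained by the causal variant. The function names below are hypothetical.

```python
def tagged_variance(causal_variance: float, r2: float) -> float:
    """Variance attributed to a tag SNP in LD (r2) with a single causal variant."""
    return r2 * causal_variance


def captured_fraction(tag_variance: float, causal_variance: float) -> float:
    """Share of the causal signal that the tag SNP recovers."""
    return tag_variance / causal_variance


# Percentages of genetic variance quoted in the text for NOD2.
nod2_fraction = captured_fraction(tag_variance=0.8, causal_variance=5.0)
print(f"tag SNP captures {nod2_fraction:.0%} of the NOD2 signal")
```

On these numbers the best tag SNP recovers only about one sixth of the variance attributable to the coding mutations, which is why even partial tagging at other loci could noticeably understate the heritability explained.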
Other factors will also account for the heritability gap, including uncertain epidemiological estimates of disease prevalence and total heritability, as well as our observation that several of the new regions contain more than one independent risk allele. The likelihood is that many more such effects will be identified. Indeed, detailed future analyses will play a key role in helping us to understand the absolute contribution of common causal alleles, as well as identifying less common variants and rare (even family-specific) mutations. By contrast, our lack of evidence for epistasis among the loci described here suggests that non-additive interactions among common risk alleles do not play an important role in the genetic architecture of CD.
The current study has approximately doubled the number of confirmed CD susceptibility loci. For many of these loci we have identified potentially causal genes, accepting that confirmation of their role must await detailed fine mapping, expression and functional studies. While the alleles detected only modestly affect disease risk, they continue to enhance our understanding of the genetic etiology of CD. Analysis for evidence of sub-phenotype associations represents an important future goal for the consortium. Thus, we are working towards sharing of detailed genotype and clinical data to allow this. In the meantime, extensive resequencing, together with large-scale fine mapping exercises using custom array-based technologies, are already underway and will further elucidate the pathogenic mechanisms of IBD.
Box 1. Noteworthy genes within loci newly implicated in Crohn’s disease pathogenesis. N.B. Although we highlight these as interesting genes, we do not yet have data to confirm causality.
We thank all subjects who contributed samples, and physicians and nursing staff who helped with recruitment globally. This study was supported by the German Ministry of Education and Research through the National Genome Research Network and by infrastructure support through the DFG cluster of excellence “Inflammation at Interfaces”. It was also supported by the Italian Ministry for Health (GR-2008-1144485), with case collections supported by the Italian Group for IBD and the Italian Society for Paediatric Gastroenterology, Hepatology and Nutrition. We acknowledge funding provided by Royal Brisbane and Women’s Hospital Foundation; University of Queensland (Ferguson Fellowship); National Health and Medical Research Council, Australia; the European Community (5th PCRDT); and the European Crohn’s and Colitis Organization. UK case collections were supported by the National Association for Colitis and Crohn’s disease, Wellcome Trust, Medical Research Council UK and Peninsular College of Medicine and Dentistry, Exeter. We also acknowledge the NIHR Biomedical Research Centre awards to Guy’s & St Thomas’ NHS Trust / King’s College London and to Addenbrooke’s Hospital / University of Cambridge School of Clinical Medicine. The NIDDK IBD Genetics Consortium is funded by the following grants: DK062431 (S.R.B.), DK062422 (J.H.C.), DK062420 (R.H.D.), DK062432 & DK064869 (J.D.R.), DK062423 (M.S.S.), DK062413 (D.P.B.M.), DK76984 (MD), DK084554 (MD and DPBM) and DK062429 (J.H.C.). J.H.C. is also funded by the Crohn’s and Colitis Foundation of America, and SLG by DK069513 and Primary Children’s Medical Center Foundation. Cedars-Sinai is supported by NCRR grant M01-RR00425; NIH/NIDDK grant P01-DK046763; DK 063491; and Cedars-Sinai Medical Center Inflammatory Bowel Disease Research Funds.
RW is supported by a clinical fellow grant (90700281) from the Netherlands Organization for Scientific Research; EL, DF and SV are senior clinical investigators for the Funds for Scientific Research (FWO/FNRS) Belgium. SB was supported by the “Deutsche Forschungsgemeinschaft” (DFG; BR 1912/5-1). JCB is supported by Wellcome Trust grant WT089120/Z/09/Z. Replication genotyping was supported by unrestricted grants from Abbott Laboratories Ltd and Giuliani SpA. We acknowledge the Wellcome Trust Case Control Consortium. We thank the 1958 British Birth Cohort and Banco Nacional de ADN, Salamanca, Spain who supplied control DNA samples. The CHS research reported in this article was supported by contract numbers N01-HC-85079 through N01-HC-85086, N01-HC-35129, N01 HC-15103, N01 HC-55222, N01-HC-75150, N01-HC-45133, grant numbers U01 HL080295 and R01 HL087652 from the National Heart, Lung, and Blood Institute, with additional contribution from the National Institute of Neurological Disorders and Stroke. A full list of principal CHS investigators and institutions can be found at http://www.chs-nhlbi.org/pi.htm. Other significant contributors: K. Hanigan, Z.-Z. Zhao, N. Huang, P. Webb, N. Hayward, A. Rutherford, R. Gwilliam, J. Ghori, D Strachan, W. McCardle, W. Ouwehand, M. Newsky, S. Ehlers, I. Pauselius, K. Holm, C. Sina, L. Baidoo, A. Andriulli and M.C. Renda.
Contribution of authors: AF, DPBM, GRS, TA, JL, RR, JB, TH, AL, CGM, NP, JIR, PS, YS, LS, KDT, DW, CW, GKU, JDR, MD’A, RW, SV, RHD, JS, SS, VA, HH were involved in establishing DNA collections, and/or assembling phenotypic data; AF, DE, JCB, KW, TG, SR, CAA, LJ, MJD performed statistical analyses; DPBM, GRS, CWL, EMF, RNB, MB, TMB, SB, CB, AC, J-FC, MC, SC, TD, MdV, RD’I, MD, CE, TF, DF, RG, JG, AVG, SLG, JH, DH, J-PH, DL, IL, ML, AL, CL, EL, CM, WN, JP, AP, DDP, MR, PR, JS, MS, FS, AHS, PCFS, SRT, LT, TW, SRB, RW, SK, AMG, JCM, SV, RHD, MSS, JS, SS, JHC, VA recruited patients; AF, DPBM, TB, SB, KT, MG, GM supervised laboratory work; AF, DPBM, JCB, KW, SB, RHD, JS, SS, JHC, MJD, MP contributed to writing the manuscript. All authors read and approved the final manuscript before submission.
All authors declare no financial interest.