Identifying shared and disease-specific susceptibility loci for Crohn’s disease (CD) and ulcerative colitis (UC) would help define the biologic relationship between the inflammatory bowel diseases. More than 30 CD susceptibility loci have been identified. These represent important candidate susceptibility loci for UC. Loci discovered by the index genome scans in CD have previously been tested for association with UC, but those identified in the recent meta-analysis await such investigation. Furthermore, the recently identified UC locus at ECM1 requires formal testing for association with CD.
We analyzed 45 single nucleotide polymorphisms, tagging 29 of the loci recently associated with CD in 2527 UC cases and 4070 population controls. We also genotyped the UC-associated ECM1 variant rs11205387 in 1560 CD patients and 3028 controls.
Nine regions showed association with UC at a threshold corrected for the 29 loci tested (P < .0017). The strongest association (P = 4.13 × 10⁻⁸; odds ratio = 1.27) was identified with a 170-kilobase region on chromosome 1q32 that contains 3 genes. We also found association with JAK2 and replicated a recently reported association with STAT3, further implicating the role of this signaling pathway in inflammatory bowel disease. Additional novel UC susceptibility genes were LYRM4 and CDKAL1. Twenty of the loci were not associated with UC, and several appear to be specific to CD. ECM1 variation was not associated with CD.
Collectively, these data help define the genetic relationship between CD and UC and characterize common, as well as disease-specific mechanisms of pathogenesis.
Genetic epidemiology data provide compelling evidence that the chronic inflammatory bowel diseases Crohn’s disease (CD) and ulcerative colitis (UC) are related polygenic diseases. This has catalyzed a series of molecular studies aimed at gene identification in both diseases. Following the initial success of hypothesis-free linkage analyses and positional cloning in identifying NOD2/CARD15 as the first CD susceptibility gene in 2001,1,2 further progress was eagerly anticipated. The technology of genome-wide association scanning (GWAS) is now yielding multiple susceptibility loci across the range of complex disease, and nowhere has it been more successfully applied than in CD.
Others and we have recently reported the results of index GWAS studies in CD3-11 and followed this by undertaking an international meta-analysis of GWAS studies with replication in a large independent sample set.12 In total, 28 new CD susceptibility genes or loci have been identified and confirmed by these studies, with an additional 10 loci showing nominal evidence of association (that is, modest association, which in the sample sets studied does not attain sufficient statistical support to withstand correction for multiple testing). Two major pathogenic themes have emerged: first, the evident importance of interleukin 23 and the T helper cell (Th)17 pathway of T-cell differentiation, where CD associations have been confirmed in multiple components including genes IL23R, IL12B, JAK2, STAT3, and CCR6; second, the critical contribution of defects in innate immunity has been confirmed, with autophagy genes ATG16L1 and IRGM being added to NOD2 as established susceptibility genes for CD. An emerging hypothesis is that CD may be driven by commensal bacteria infecting cells of the intestinal mucosa and that it is the failure of innate immune mechanisms to appropriately clear intracellular bacteria that leads to activation of adaptive immunity and inflammation.
Many other genes and loci that are not known to contribute to either Th17 or NOD2/autophagy pathways have also been identified. The function and pathogenic role of many of these are less well characterized but will now be subjected to detailed investigation.
The recent meta-analysis involving data from the index CD GWAS studies carried out in Great Britain, North America, and Belgium/France has led to the identification of several new susceptibility loci in CD, which require interrogation in UC. Most are of modest effect size in CD and were therefore revealed only in the meta-analysis and follow-up rather than in the individual index GWAS studies. Indeed, the CD susceptibility loci identified to date, with the notable exceptions of NOD2 and IL23R, mostly confer odds ratios (OR) of less than 1.4, effect sizes typical of those we might expect to see in UC. This emphasizes the need to study these loci in large panels.
In a previous report, we carried out a nonsynonymous single nucleotide polymorphism (SNP) scan in UC, which confirmed the known contribution of the major histocompatibility complex, and identified a new UC locus at ECM1.13 In that study, we also followed up the confirmed CD hits identified in our index genome-wide scan and identified variants in IL23R, IL12B, NKX2-3, and the MST1 locus on chromosome 3p21 as showing significant association with UC. A parallel German case-control study, based on following up hits from our CD data set within the Wellcome Trust Case Control Consortium (WTCCC) study, identified association between UC and variants in STAT3, BSN, NKX2-3, HERC2, and CCNY and a borderline association at PTPN2.14 The study also detected nominal association to interleukin (IL)12B, thus replicating our finding of association between variants in this locus and UC risk.
Variants within nel-like 1 precursor (NELL1, chromosome 11p15) have previously been associated with UC risk.8 In addition, MDR1 polymorphisms have been associated with UC risk,15,16 and, although not all studies have replicated this association,17 a recent meta-analysis firmly implicated this locus as a determinant both of disease susceptibility and of severity.18
GWAS studies have yet to be reported for UC, but are underway. While awaiting such studies, a potentially fruitful approach to the identification of new UC susceptibility genes is the interrogation of all GWAS-derived CD loci in UC sample sets. This has the additional benefit of identifying which loci are shared and which are specific to CD and UC, hence illuminating the genetic relationship between the different forms of inflammatory bowel disease (IBD).
To identify additional UC susceptibility genes, we investigated the contribution to UC of 28 loci implicated in CD by the recent international meta-analysis.12 In addition, we formally tested the ECM1 variant recently associated with UC for evidence of association with CD. Using the large, well-characterized panels of patients and controls available to the UK IBD Genetics Consortium, our data identify several new UC susceptibility loci.
A total of 3026 UC and 1727 CD patients were recruited in participating centers across the United Kingdom. Diagnosis of UC and CD was based on standard clinical, endoscopic, radiologic, and histologic criteria. Patient demographics, family history, smoking history, and subphenotype data were ascertained by combination of questionnaire and case note review. Details of the cases analyzed after quality control are presented in Table 1. Ethics committee approval was granted in each of the four lead UK centers, and all participants gave their signed consent for inclusion.
We genotyped 45 SNPs from 29 distinct candidate loci (primer sequences in Supplementary Table 1; see supplementary Table 1 online at www.gastrojournal.org). These comprised 28 loci from the meta-analysis (19 confirmed and 9 nominally replicated) plus one (HERC2) that was reported to be associated with UC by Franke et al14 in their follow-up of unconfirmed CD hits from the WTCCC. Genotyping was undertaken with iPLEX chemistry on a matrix-assisted laser desorption/ionization time-of-flight MassARRAY platform (Sequenom, San Diego, CA), with the 45 SNPs split across 2 genomic pools. Genotyping was carried out in 2 batches, with 2120 individuals in the first batch and 906 in the second. For each batch of cases, and on a per-pool basis, individuals missing ≥20% of genotypes were removed. We also removed 66 UC patients identified as non-white ethnicity in our previous nonsynonymous SNP scan13 and 27 likely duplicates (identical by state [IBS] >0.98). Of UC cases, 2527 satisfied quality control criteria for at least 1 genomic pool. We removed 2 SNPs (rs2872507 and rs744166) with ≥20% missing genotype data from the largest batch of cases. Post-quality control, the genotype success rate was 0.932. Taking account of the small number of SNPs genotyped in this study, these measures represent stringent data quality control. All SNPs were in Hardy-Weinberg equilibrium.
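The two per-sample and per-SNP checks described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names, the dictionary layout of the genotype calls, and the choice of a 1-degree-of-freedom chi-square test for Hardy-Weinberg equilibrium are all assumptions (the text does not state which Hardy-Weinberg test was applied).

```python
import math


def filter_samples(calls, max_missing=0.20):
    """Keep samples whose fraction of missing genotypes is below max_missing.

    `calls` (hypothetical layout) maps a sample ID to a list of genotype
    calls for one genomic pool; None marks a failed call.
    """
    kept = {}
    for sample, genotypes in calls.items():
        missing = sum(g is None for g in genotypes) / len(genotypes)
        if missing < max_missing:  # samples missing >= 20% are dropped
            kept[sample] = genotypes
    return kept


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg proportions.

    Takes the three observed genotype counts and returns (chi2, p_value).
    Assumes both alleles are observed, so no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With toy input, a sample missing exactly 1 of 5 genotypes (20%) is removed, while genotype counts of 25, 50, and 25 match Hardy-Weinberg proportions exactly and give a chi-square of 0.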
The ECM1 variant rs11205387 previously associated with UC was genotyped in 1726 CD cases using the Taqman biallelic discrimination system (Applied Biosystems, Carlsbad, CA) with an ABI 7900HT analyzer. Following inspection of the genotype clusters, genotypes were available for 1560 individuals (9.6% missing data).
One thousand one hundred thirty-two UK population controls were drawn from the National Blood Service (NBS) collection. The validity of blood donor DNA as a source of control genotypes for case/control association studies has previously been established by the WTCCC.9 Genotyping was carried out using Sequenom iPLEX technology as part of the replication phase of the CD meta-analysis,12 with additional genotyping of a further 110 individuals. Forty-four of 45 SNPs genotyped in the UC cases were available for comparison in this cohort (rs916977 [HERC2] was not genotyped). Individuals of non-white ethnicity were identified and removed from further analysis.
Of the 45 SNPs genotyped in our UC cases, 26 are present on the Affymetrix (Santa Clara, CA) SNP array 5.0 and 19 on the Illumina HumanHap550 platform (San Diego, CA). Use of these arrays in our earlier experiments allowed us to incorporate the previously generated control genotyping data for these markers in the current analysis.
Thus, 1480 individuals from the 1958 British Birth Cohort (58C) and 1458 from the NBS (total 2938) had been genotyped by the WTCCC and provided control data for the 26 of 45 SNPs present on the Affymetrix SNP array 5.0 platform. The 1480 58C subjects had also been genotyped using the Illumina HumanHap550 platform by the Sanger Institute and provided data for the remaining 19 SNPs. Again, individuals of non-white ethnicity were identified and removed prior to analysis. To allow a genotype concordance check between Sequenom iPLEX, Affymetrix SNP array 5.0, and Illumina HumanHap550 genotypes, we separately regenotyped 376 of the NBS DNAs and 294 of the 58C DNAs using the iPLEX platform, across all SNPs in the current study. Concordance for genotype calls between iPLEX and the Affymetrix array was 99.38% and with the Illumina array was 99.36%.
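A cross-platform concordance figure of this kind can be computed as the proportion of identical calls among genotypes that were successfully called on both platforms. The sketch below assumes a hypothetical data layout (a dictionary keyed by sample and SNP) and assumes that alleles have already been harmonized to the same strand and ordering; it is illustrative only.

```python
def concordance(calls_a, calls_b):
    """Fraction of matching genotype calls between two platforms.

    Each argument (hypothetical layout) maps a (sample_id, snp_id) pair to a
    genotype string such as "AG", or None for a failed call. Only pairs
    called on both platforms are compared.
    """
    compared = 0
    matched = 0
    for key, geno_a in calls_a.items():
        geno_b = calls_b.get(key)
        if geno_a is None or geno_b is None:
            continue  # skip genotypes missing on either platform
        compared += 1
        if geno_a == geno_b:
            matched += 1
    return matched / compared if compared else float("nan")
```

Excluding genotypes missing on either platform keeps call-rate differences from being counted as discordance; call rate is assessed separately in the quality control steps.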
We previously genotyped rs11205387 (ECM1) using Taqman technology (Applied Biosystems) in 1894 NBS individuals (1465 available after quality control).13
As part of the same project, we genotyped 1728 NBS individuals for rs11205387 (ECM1) using iPLEX chemistry (1563 after quality control).13
Cochran-Armitage trend tests, implemented through PLINK,19 were used to detect case-control association. For the UC experiment, P values less than .0017 are significant following Bonferroni correction for the number of independent loci (29) tested. Because only a single SNP was tested for association in our CD experiment, a P value less than .05 is statistically significant.
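For readers unfamiliar with the test, the following is a minimal sketch of the Cochran-Armitage trend statistic with additive allele-count scores (0, 1, 2) and of the Bonferroni threshold quoted above. It is not the PLINK implementation; the function names and the example genotype counts are hypothetical.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test on a 2 x 3 genotype table.

    `cases` and `controls` are genotype counts ordered by the number of
    copies of the tested allele (0, 1, 2). Returns (chi2, p_value), with
    chi2 referred to a chi-square distribution with 1 degree of freedom.
    """
    n_case = sum(cases)
    n_ctrl = sum(controls)
    n = n_case + n_ctrl
    col = [r + s for r, s in zip(cases, controls)]  # genotype column totals
    # Weighted difference between case and control genotype counts
    t = sum(w * (r * n_ctrl - s * n_case)
            for w, r, s in zip(weights, cases, controls))
    # Variance of t under the null hypothesis of no association
    squares = sum(w * w * c * (n - c) for w, c in zip(weights, col))
    cross = sum(weights[i] * weights[j] * col[i] * col[j]
                for i in range(3) for j in range(i + 1, 3))
    var = (n_case * n_ctrl / n) * (squares - 2 * cross)
    chi2 = t * t / var
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Threshold used for the UC experiment: 0.05 / 29 loci, approximately 0.0017
threshold = bonferroni_threshold(0.05, 29)
```

The additive scores encode the assumption that risk changes linearly with the number of copies of the tested allele; other score choices would test dominant or recessive patterns instead.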
We inspected 3 subphenotypes of UC (disease extent defined macroscopically as extensive, left-sided, or proctitis only), 3 disease modifiers (age at diagnosis, smoking status at diagnosis, and family history of disease), and 1 disease outcome measure (surgery). A linear regression model was used to test for association between the genotyped SNPs and age at diagnosis (in whole years). For the remaining within-case analyses, individuals were partitioned into 2 groups according to affection status. Individuals smoking at diagnosis were contrasted to nonsmokers/exsmokers at diagnosis. Cochran-Armitage trend tests were used to detect association between each binary phenotype and the genotyped SNPs.
Power calculations were performed using the online Genetic Power Calculator (http://pngu.mgh.harvard.edu/~purcell/gpc/)20 to allow a better interpretation of negative association results. Calculations were undertaken assuming a population prevalence of 0.0024 for UC,21 a multiplicative disease model, a false-positive rate (α) of .0017, and an r² of 1 between the marker and disease locus. For each SNP, power was estimated using the given case/control ratio and the estimated allelic risk for CD from the combined case/control and transmission disequilibrium test analysis from Barrett et al 200812 (ie, assuming equivalent effect size for UC as seen for CD).
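The dependence of power on sample size, allele frequency, and effect size can be illustrated with a simple normal approximation for an allelic case/control comparison. This sketch is not the method of the Genetic Power Calculator and will not reproduce its output exactly: it ignores disease prevalence, treats the control allele frequency as the population frequency, and assumes the genotyped marker is the causal variant. The function names, the allele frequency of 0.30, and the odds ratios in the usage lines are hypothetical; only the sample sizes and α are taken from the text.

```python
import math
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by an allelic odds ratio."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)


def approx_power(n_cases, n_controls, control_freq, odds_ratio, alpha):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    norm = NormalDist()
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    # Each individual contributes two alleles
    se = math.sqrt(p1 * (1 - p1) / (2 * n_cases)
                   + p0 * (1 - p0) / (2 * n_controls))
    z = abs(p1 - p0) / se
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Probability of exceeding the critical value in either tail
    return norm.cdf(z - z_crit) + norm.cdf(-z - z_crit)


# Illustrative only: hypothetical allele frequency and odds ratios
power_small_effect = approx_power(2527, 4070, 0.30, 1.10, 0.0017)
power_larger_effect = approx_power(2527, 4070, 0.30, 1.20, 0.0017)
```

Under these assumptions, power rises steeply as the odds ratio increases from 1.10 to 1.20, which is why loci with small effects in CD may go undetected in UC even in a panel of this size.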
Of the 45 SNPs included in the study, 13 showed significant association with UC (see supplementary Table 2 online at www.gastrojournal.org), and these comprise 9 independent loci (Table 2). The candidate region showing the strongest association was 1q32. Both SNPs genotyped within this locus showed robust evidence of association (rs2297909, P = 4.13 × 10⁻⁸; rs11584383, P = 5.71 × 10⁻⁷). A further 3 loci are novel associations with UC (JAK2, LYRM4, and CDKAL1). Both SNPs genotyped within the JAK2 locus show evidence of association (rs10758669, P = 1.02 × 10⁻⁵; rs7849191, P = .0015). Only a single SNP was genotyped within LYRM4 (rs12529198, P = 1.10 × 10⁻⁴) and CDKAL1 (rs6908425, P = 1.89 × 10⁻⁴).
Our study also provides the first independent replication of association between UC and variants within loci encoding IL18RAP,22 CCNY,14 and STAT314 (Table 2). Reassuringly from a technical perspective, given the known sample overlap with our earlier nonsynonymous SNP study of 905 UC cases and 1465 controls,13 we observed strong association with 2 positive controls, IL12B and the MHC locus. Importantly, for each locus showing association, we observed the same risk-increasing allele as reported previously for CD.
We observed nominal evidence of association (P < .05 but > .0017) at 8 of the remaining 20 loci. Our power to detect association (assuming the same effect size as documented in CD) at each of the 19 remaining CD risk loci is given in Table 3. Estimated power varied widely between SNPs, and, for those loci for which we had little power, analyses in larger cohorts will be required to elucidate their role in UC. The previously reported association between HERC2 and UC risk was not replicated in our data (P = .085).
We saw no evidence of association between the genotyped SNPs and any of the UC subphenotypes, disease modifiers, or outcomes that we investigated (see supplementary Table 3 online at www.gastrojournal.org). This is true even assuming a nonconservative significance threshold of P < .0017, ie, uncorrected for the number of subphenotypes tested.
There was no evidence of association between the ECM1 variant rs11205387 and CD (P = .269). In our CD cases, the minor allele frequency was 26.8%, and control panels 3 and 4 had a minor allele frequency of 25.9% and 25.5%, respectively.
Through follow-up of discoveries from the recent CD meta-analysis, this study identifies 4 new susceptibility loci for UC. In addition, we provide replicated association and thereby confirmation of 3 recently reported UC loci. These results therefore add significantly to the body of evidence for shared pathways between the 2 diseases, an observation of considerable importance both in understanding disease pathogenesis and in discovering new therapeutic targets.
Choice of appropriate statistical thresholds for significance in association studies is important in understanding the validity of the findings. The field now recognizes the need for stringent thresholds to distinguish true signals from false-positive findings. In the current study, we only tested loci showing confirmed or nominal association with CD, for which the prior probability of association with UC is significantly elevated. Accounting for the 29 independent loci tested, Bonferroni correction suggests that P < .0017 is appropriate. Of note, all the new UC loci that we identify meet this criterion by at least an order of magnitude.
The strongest association in this study was with 2 SNPs in tight linkage disequilibrium on chromosome 1q32, establishing this association in UC unequivocally for the first time. Following up WTCCC hits in CD, Franke et al14 had previously found some evidence for association between this locus and UC, but at P = .0017 this was of borderline significance given the 50 loci tested. Three genes map to the 170-kilobase region (as defined by HapMap recombination hot spots) highlighted by the association signal. These are C1orf106, KIF21B, and CACNA1S. Fine mapping is required to identify which gene is relevant to UC (and indeed CD) pathogenesis.
One of the strongest new signals was observed at a locus encoding Janus kinase 2 (JAK2) on chromosome 9p24. In addition, we confirmed association between UC and signal transducer and activator of transcription 3 (STAT3) as recently reported by Franke et al.14 These findings thus confirm that the JAK-STAT pathway is a common feature of both UC and CD pathogenesis. This is a focal point in the downstream transmission of signals from cell surface receptors to the nucleus to modify transcription of various genes. Several cytokines and other immunoactive molecules utilize this signal transduction pathway. These include interferons, epidermal growth factor, IL-5, IL-6, and notably also the IL12/23 axis within which variants are also associated in common with both UC and CD.10,13 JAK2 and STAT3 play a key role in IL23R signaling, and STAT3 is critical for maturation of naïve CD4+ve T cells to the proinflammatory Th17 cells increasingly recognized as central to driving inflammation in IBD.
The genes highlighted by the 2 other novel UC association signals are currently poorly characterized. LYRM4 (C6orf149) on chromosome 6p25 codes for a mitochondrial ribosomal protein with sequence homology to NADH,23 and CDKAL1 on chromosome 6p22 is noteworthy for being recently confirmed as a type 2 diabetes susceptibility gene.24
Our study provides the first independent replication of association between UC and variants in IL18RAP and CCNY. The IL-18 receptor accessory protein (IL18RAP) is involved in IL-18 signaling and has sequence homology to the IL-1 receptor accessory protein (IL1RACP). IL-18 is released by macrophages and, with IL-12, induces cell-mediated immunity following microbial infection.25-27 As well as association with CD, variants in IL18RAP have also been reported to be associated with celiac disease.22,28 CCNY (Cyclin fold protein-1) has also previously shown association with both CD and UC. The protein product belongs to the cyclin protein superfamily and contains a protein-binding domain that plays a role in cell-cycle and transcription control by regulating cyclin-dependent kinases.
None of the UC loci that we have identified showed significant subphenotype association. With regard to disease extent, this is perhaps unsurprising given that these are all also CD susceptibility loci. As generic IBD loci, it would be surprising if any were associated with one subclass of UC as defined by extent of colonic involvement but not others. Such subphenotype specific loci are more likely to be associated with UC alone and to derive from forthcoming GWAS studies in UC, although a surprising feature has been the paucity of such effects in CD.10
We do not see significant evidence of association at the previously reported UC susceptibility locus, HERC2 (P = .085), even though our power to detect association at rs916977 given the reported UC effect size (OR, 1.46) is estimated to be 100% (assuming a multiplicative disease model, a population prevalence of 0.0024,21 and a false-positive rate [α] of .0017). The minor allele frequency in the Franke et al 2008 population controls is 0.106, yet, in our combined controls, the frequency is 0.140 (similar to the case frequencies of both studies). The minor allele frequency of this SNP in the CEU HapMap B36 data is 0.133. It appears that the unusually low minor allele frequency at this SNP in the control samples from the Franke et al 2008 study underlies the unreplicated association at this locus. Additional, high-powered, case-control, and family-based association studies are therefore needed to elucidate fully the role of HERC2 in UC.
A number of loci previously associated with CD did not show evidence of association with UC. For some loci, this may reflect a lack of power because extremely large sample sets would be required to reliably detect small effects with ORs <1.15. For others, our UC sample set had good or high power to detect an effect comparable with that seen in CD (Table 3). Of note, such loci include those encoding genes ICOSLG and CCR6. Both encode proteins that appear to play a key role in T-cell activation and differentiation. Therefore, it appears that CD-specific risk loci are not limited to disruption of innate immune pathways such as autophagy and NOD2.
We compared effect sizes for the shared UC/CD loci but did not observe any evidence of significant heterogeneity (Cochran Q test: P > .05). Although there is some uncertainty in these effect size estimates, we can rule out scenarios in which any of these loci has a substantially different effect in UC vs CD. Overall, both phenotypes conform to the expectation of many genes of modest effect.
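As an illustration of how such a heterogeneity comparison works for a single locus, the sketch below computes Cochran's Q for two effect estimates (for example, the log odds ratio in UC and in CD) using inverse-variance weights. The function name and the inputs in the usage lines are hypothetical, and the sketch is restricted to two estimates so that Q has 1 degree of freedom; it is not necessarily the implementation used for the analysis above.

```python
import math


def cochran_q_two_studies(beta_a, se_a, beta_b, se_b):
    """Cochran's Q test for heterogeneity between two effect estimates.

    `beta_*` are effect sizes on the log odds ratio scale and `se_*` their
    standard errors. Returns (q, p_value), with q referred to a chi-square
    distribution with 1 degree of freedom.
    """
    w_a = 1 / se_a ** 2  # inverse-variance weights
    w_b = 1 / se_b ** 2
    pooled = (w_a * beta_a + w_b * beta_b) / (w_a + w_b)
    q = w_a * (beta_a - pooled) ** 2 + w_b * (beta_b - pooled) ** 2
    p_value = math.erfc(math.sqrt(q / 2))
    return q, p_value


# Hypothetical inputs: odds ratios of 1.27 and 1.20 with equal standard errors
q, p_value = cochran_q_two_studies(math.log(1.27), 0.04, math.log(1.20), 0.04)
```

A large P value here means only that the data are compatible with equal effects in the two diseases; the width of the confidence intervals determines how large a difference can be excluded.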
ECM1 was not associated with CD in the current study. We had 100% power to detect association at this SNP (assuming a multiplicative allelic OR of 1.23, a population prevalence of 0.00145,21 and α = .05). Even allowing for a much reduced OR of 1.10, our power to detect association to this SNP is 82%. It therefore appears that variants within ECM1 specifically confer susceptibility to UC.
What are the implications of this study with regard to the overall genetic architecture of the IBDs and the molecular relationship between CD and UC? Clearly, a complete understanding must await full, well-powered GWAS experiments in UC to compare with the GWAS studies in CD. In the meantime, interrogation of CD GWAS hits in our UC panel has identified a number of shared loci but has also found some key differences. Of the shared pathways, the Th17 inflammatory axis is particularly noteworthy and highlighted by the fact that a number of its molecular components show association with susceptibility to CD and UC. This is in the context of earlier reports highlighting innate immunity pathways of NOD2 and autophagy as specific to CD and the current study suggesting that ECM1 is UC specific. The precise causal variants of the 9 loci shared between UC and CD have not yet been defined, and their functional significance remains to be elucidated. These are immediate priorities in the fast moving field of IBD genetics, and their resolution should further illuminate both the underlying mechanisms of chronic intestinal inflammation and the cellular processes that lead to the distinct phenotypes of CD and UC.
The authors thank the Medical Research Council and Wellcome Trust for funding the 1958 British Birth Cohort, the National Association for Colitis and Crohn’s Disease and the Wellcome Trust, which supported case collections, and all subjects who contributed samples plus consultants and nursing staff across the United Kingdom who helped with recruitment of study subjects.
The authors disclose the following: Supported by the NIHR Cambridge Biomedical Research Centre, by the Wellcome Trust (to C.A.A.), and by the Medical Research Council (to D.C.O.M.).