Celiac disease is a common (1% prevalence) small intestinal inflammatory condition induced by dietary wheat, rye, and barley. However, despite high heritability (estimated at 87% from twin studies [1]), no non-HLA genetic factors have been identified and convincingly replicated. The majority of celiacs possess HLA-DQ2 (the remainder mostly HLA-DQ8 [2]), and how HLA-DQ2 presents cereal peptides to intestinal T cells is understood [3]. However, HLA-DQ2 is common in healthy individuals, demonstrating it is necessary but not sufficient for disease development.
We therefore designed a genome-wide association study to identify predisposing genetic factors in celiac disease. We genotyped samples with Illumina BeadChips (Supplementary Methods). After quality control, association analysis was performed on 310,605 SNPs with minor allele frequency >1% genotyped in 778 UK celiac cases and 1,422 UK population controls (Supplementary Table 1). Overall SNP call rate was 99.87%. Single SNP association statistics are presented in Supplementary Figure 1.
Highly significant association was seen around the HLA locus, as expected. Association was strongest at rs2187668, which maps to the first intron of HLA-DQA1 (χ², frequency of A allele in controls 13.8%, cases 53.1%, odds ratio (OR) 7.04 [95% CI 6.08 - 8.15]). When compared with classical HLA typing (Supplementary Methods), the rs2187668-A allele efficiently tagged HLA-DQ2.5cis (r² = 0.97, Supplementary Table 2). HLA-DQ2.5cis is the most common HLA-DQ2 haplotype associated with celiac disease, where the two chains of the DQ2 heterodimer are encoded on the same chromosome. One or two copies of HLA-DQ2.5cis (inferred by rs2187668 genotype) were present in 89.2% of UK celiac patients versus 25.5% of population controls. To identify other HLA predisposing variants occurring in the presence, or absence, of HLA-DQ2.5cis, we performed further analyses stratified by rs2187668 genotype. In cases (n=558) and controls (n=331) of rs2187668-AG genotype, peak association was seen at rs9357152 (P=5.2 × 10⁻¹⁴); and in cases (n=83) and controls (n=1059) of rs2187668-GG genotype, peak association was seen at rs9275141 (P=3.9 × 10⁻¹⁶). Numbers of rs2187668-AA controls (n=31) were too small for analysis. The finding that rs2187668, rs9275141 and rs9357152 map within or adjacent to HLA-DQA1 and -DQB1 underpins the critical role of HLA-DQ2/8 in antigen presentation in celiac disease.
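As a rough consistency check on the odds ratio quoted for rs2187668, the sketch below recomputes an allelic odds ratio with a Woolf (log-scale) 95% confidence interval. The allele counts are assumptions back-calculated from the rounded allele frequencies and the nominal sample sizes (778 cases, 1,422 controls), not the study's genotype counts. The function name is illustrative, and the authors' own method (Supplementary Methods) may differ.

```python
import math

def allelic_odds_ratio(case_a, case_other, ctrl_a, ctrl_other, z=1.96):
    """Allelic odds ratio with a Woolf (log-scale) confidence interval."""
    odds_ratio = (case_a * ctrl_other) / (case_other * ctrl_a)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / case_a + 1 / case_other + 1 / ctrl_a + 1 / ctrl_other)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Assumed counts: rounded allele frequency times two chromosomes per person.
case_chromosomes = 2 * 778
ctrl_chromosomes = 2 * 1422
case_a = round(0.531 * case_chromosomes)
ctrl_a = round(0.138 * ctrl_chromosomes)

or_, lo, hi = allelic_odds_ratio(
    case_a, case_chromosomes - case_a, ctrl_a, ctrl_chromosomes - ctrl_a
)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

With these approximate counts the result lands close to the reported 7.04 [6.08 - 8.15]; a small gap is expected from rounding and from samples removed at quality control.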
Outside the HLA region, we observed more significant SNPs than would be expected by chance, with 56 SNPs showing association at P<10⁻⁴ (Supplementary Table 3). Many of these SNPs are in close proximity, suggesting that some of the excess of low P value SNPs might be due to true disease associations among multiple SNPs in linkage disequilibrium with nearby disease variants. We therefore prioritised these findings for rapid replication (interim results in Supplementary Table 3), whilst designing a more extensive SNP replication study. We noted weak evidence for association in the previously reported region CD28-CTLA4-ICOS [4] (rs4675374 P=0.007, rs11681040 P=0.008) but not at MYO9B [5].
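To make "more than would be expected by chance" concrete, the following sketch computes the number of SNPs expected to reach P<10⁻⁴ under the null, and a Poisson upper-tail probability for observing 56 or more. Two assumptions are made here and are not from the study: all 310,605 analysed SNPs are used as an upper bound for the non-HLA count, and SNPs are treated as independent. The text itself notes that many of the SNPs are in linkage disequilibrium, so the tail probability overstates the evidence and is illustrative only.

```python
import math

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term             # add P(X = i)
        term *= lam / (i + 1)   # advance to P(X = i + 1)
    return 1.0 - cdf

n_snps = 310_605   # SNPs analysed after quality control (upper bound for non-HLA SNPs)
threshold = 1e-4
expected = n_snps * threshold
tail = poisson_upper_tail(56, expected)
print(f"expected by chance: {expected:.1f}; P(X >= 56 | independence): {tail:.1e}")
```

About 31 SNPs would be expected below the threshold by chance, against the 56 observed.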
The most significant (non-HLA) finding was rs13119723 (P=2.0 × 10⁻⁷, frequency of G allele in controls 15.8%, cases 10.1%). Permutation of affection status labels demonstrated genome-wide significance: in 9 of 200 permutations (P=0.045), the most significant permuted P value was ≤2.0 × 10⁻⁷. The location of rs13119723 close to IL2 and IL21 made this region a highly plausible celiac disease candidate. We did not observe any evidence for statistical interaction between rs13119723 genotype and inferred HLA-DQ2.5cis genotype (P>0.20). We then confirmed association of rs13119723 with celiac disease in two separate collections. The G allele of rs13119723 was more common in controls in each collection, and meta-analysis (all 4,680 samples) established highly significant disease association at rs13119723 (P=4.8 × 10⁻¹¹).
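The permutation procedure described above can be sketched as a max-statistic test: case/control labels are shuffled, the best statistic across all SNPs is recorded for each shuffle, and the empirical P value is the fraction of shuffles that match or beat the observed best. This is a minimal sketch on toy data. The allelic chi-square statistic, the function names and the genotype coding (minor-allele counts 0/1/2) are assumptions for illustration, not the authors' documented implementation. For 1-df tests, the largest chi-square corresponds to the smallest P value.

```python
import random

def allelic_chi2(genotypes, labels):
    """1-df allelic chi-square for one SNP (genotypes are allele counts 0/1/2)."""
    n_case = 2 * sum(labels)
    n_ctrl = 2 * (len(labels) - sum(labels))
    a_case = sum(g for g, y in zip(genotypes, labels) if y == 1)
    a_ctrl = sum(g for g, y in zip(genotypes, labels) if y == 0)
    table = [[a_case, n_case - a_case], [a_ctrl, n_ctrl - a_ctrl]]
    total = n_case + n_ctrl
    chi2 = 0.0
    for i, row_total in enumerate((n_case, n_ctrl)):
        for j in range(2):
            expected = row_total * (table[0][j] + table[1][j]) / total
            if expected > 0:  # skip empty columns (monomorphic SNP)
                chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

def max_stat_permutation_p(snps, labels, n_perm=200, seed=0):
    """Fraction of label permutations whose best SNP matches or beats the observed best."""
    rng = random.Random(seed)
    observed = max(allelic_chi2(g, labels) for g in snps)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max(allelic_chi2(g, shuffled) for g in snps) >= observed:
            hits += 1
    return hits / n_perm

# Toy data: 50 cases, 50 controls, one strongly associated SNP plus four random SNPs.
labels = [1] * 50 + [0] * 50
toy_rng = random.Random(1)
snps = [[2] * 50 + [0] * 50]
snps += [[toy_rng.choice([0, 1, 2]) for _ in labels] for _ in range(4)]
print(max_stat_permutation_p(snps, labels))
```

In the study, 9 of 200 permutations reached the observed significance, giving the empirical P of 0.045.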
Chromosome 4q27 markers in UK genome wide association scan, and replication collections.
The rs13119723 SNP maps to a region of strong linkage disequilibrium (Supplementary Fig. 2). We had genotyped 27 SNPs in this 4q27 region, extending ~480kb from rs6835946 to rs6840978. In addition to rs13119723, four other SNPs showed association with celiac disease at P<10⁻⁴ in the UK dataset. We further genotyped rs6822844, rs13151961 and rs6840978 (all strongly correlated with rs13119723) in the Dutch and Irish collections and replicated the UK dataset associations (Supplementary Table 4). The strongest association overall was seen at rs6822844, approximately 24kb 5' of IL21 (meta-analysis P=1.3 × 10⁻¹⁴, OR 0.63 [0.57 - 0.71]).
Analysis of chromosome 4q27 region around rs13119723
Markers on the HumanHap300 BeadChip (Illumina) are haplotype tag SNPs. We found that the 27 SNPs genotyped in the UK collection highly efficiently captured the common genetic variation in the ~480kb region (161 of 165 common phase I+II HapMap SNPs pairwise tagged at r² > 0.8 in the CEU population [6], Supplementary Methods). Genotyping of further markers in the UK collection was therefore unlikely to contribute substantial additional information.
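For readers less familiar with pairwise tagging, the sketch below shows how r² between two biallelic SNPs is computed from haplotype and allele frequencies, and how a tagging coverage fraction follows from a threshold. The function names and the example frequencies are hypothetical; the HapMap-based analysis itself is described in the Supplementary Methods.

```python
def r_squared(p_ab, p_a, p_b):
    """Pairwise LD r^2 from haplotype frequency p_ab and allele frequencies p_a, p_b."""
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

def tagged_fraction(r2_to_best_tag, threshold=0.8):
    """Fraction of SNPs whose best genotyped tag exceeds the r^2 threshold."""
    tagged = sum(1 for r2 in r2_to_best_tag if r2 > threshold)
    return tagged / len(r2_to_best_tag)

# Hypothetical frequencies, for illustration only.
print(r_squared(p_ab=0.15, p_a=0.16, p_b=0.17))

# Coverage reported in the text: 161 of 165 common SNPs tagged at r^2 > 0.8.
print(f"{161 / 165:.1%}")
```

The reported coverage corresponds to roughly 98% of common HapMap SNPs in the region.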
Finer analysis of haplotype structure in the ~480kb region in the UK collection showed subdivision into two closely correlated ~439kb and ~40kb haplotype blocks (using strict criteria [7]). The rs13119723-G allele was found on a single strongly associated haplotype in both blocks (Supplementary Fig. 2), with haplotype frequencies in the 439kb block of 10.1% in cases and 15.3% in controls (P=2.1 × 10⁻⁶), and in the 40kb block of 16.3% in cases and 21.5% in controls (P=4.3 × 10⁻⁵). We genotyped 10 additional SNPs to tag >5% frequency haplotypes (in addition to the 4 SNPs already tested) in the Dutch and Irish collections, and found similar haplotype structure and association across all three populations (Supplementary Table 4). Due to extensive linkage disequilibrium, these analyses did not enable us to determine the causal variant associated with celiac disease in the 4q27 region. The population-specific genetic variance at the associated 4q27 markers (CEU HapMap data) is relatively high, suggesting possible selection in the Northern European population.
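The haplotype frequencies reported above for the 439kb block can be turned into an approximate 2×2 chi-square test, as sketched below. The chromosome counts are assumptions derived from the rounded frequencies and nominal sample sizes, and a plain Pearson test is used here as a stand-in. The authors' haplotype analysis handles phase uncertainty and may use a different statistic, so only order-of-magnitude agreement with the published P value should be expected.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def chi2_1df_p(chi2):
    """Upper-tail P value of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(chi2 / 2))

# Assumed chromosome counts: two per individual, 778 cases and 1,422 controls.
case_chr, ctrl_chr = 2 * 778, 2 * 1422
case_hap = round(0.101 * case_chr)  # associated haplotype, cases
ctrl_hap = round(0.153 * ctrl_chr)  # associated haplotype, controls

stat = chi2_2x2(case_hap, case_chr - case_hap, ctrl_hap, ctrl_chr - ctrl_hap)
p_value = chi2_1df_p(stat)
print(f"chi2 = {stat:.1f}, P = {p_value:.1e}")
```

Under these assumptions the P value falls in the 10⁻⁶ range, in line with the magnitude reported for this block.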
The 4q27 celiac disease associated region contains three known protein-coding genes (IL2, IL21 and NP_640336.1/Tenr) and a predicted gene of unknown function (KIAA1109/Q6ZS70). We manually annotated the human genome sequence in the region (not shown), but did not identify further genes. IL-2, secreted in an autocrine fashion by antigen-stimulated T cells, is a key cytokine for T cell activation and proliferation. Another T cell derived cytokine, IL-21, enhances B, T and NK cell proliferation and interferon-γ production. Both cytokines are implicated in the mechanisms of other intestinal inflammatory diseases [8,9]. Expression profiles for the four genes across multiple cell/tissue types were examined in the GNF SymAtlas database (Supplementary Methods). Tenr is specifically testis expressed, and is an unlikely candidate for the causal celiac disease susceptibility gene. The function of KIAA1109 is largely unknown [10], although KIAA1109 is widely expressed as multiple splice variants in multiple tissue types. We specifically looked at gene expression in duodenal tissue from normal and celiac disease individuals (with normal histology, and with villous atrophy). Tenr expression levels were mostly undetectable. No differences were seen between normal and treated celiac individuals for KIAA1109. In the presence of inflammation (untreated celiac disease), KIAA1109 levels showed a modest reduction and IL21 an increase (Supplementary Fig. 3). The syntenic mouse region to human 4q27 (Idd3) determines susceptibility to multiple autoimmune diseases in the NOD mouse model, by a mechanism influencing IL-2 mRNA/protein levels and CD4+ regulatory T cell activity [11]. However, further studies are required to determine the human celiac disease susceptibility gene in this region.
Our genome-wide association study has identified genetic variation in a linkage disequilibrium block encompassing the KIAA1109/Tenr/IL2/IL21 genes as a novel susceptibility factor for celiac disease. In addition to further investigation of this 4q27 region, the next steps in dissecting the genetic causes of celiac disease include larger scale replication of other putative associations and additional genome-wide analyses (e.g. of copy number variation [12]).