DNA polymorphisms in a region on chromosome 5q33.1 which contains two genes, immunity related GTPase related family, M (IRGM) and zinc finger protein 300 (ZNF300), are associated with Crohn's disease (CD). The deleted allele of a 20 kb copy number variation (CNV) upstream of IRGM was recently shown to be in strong linkage disequilibrium (LD) with the CD-associated single nucleotide polymorphisms and is itself associated with CD (P < 0.01). The deletion was correlated with increased or reduced expression of IRGM in transformed cells, depending on the cell line, and has been proposed as a likely causal variant. We report here that small insertion/deletion polymorphisms in the promoter and 5′ untranslated region of IRGM are, together with the CNV, strongly associated with CD (P = 1.37 × 10⁻⁵ to 1.40 × 10⁻⁹), and that the CNV and the 5′-untranslated region variant −308(GTTT)5 contribute independently to CD susceptibility (P = 2.6 × 10⁻⁷ and P = 2 × 10⁻⁵, respectively). We also show that the CD risk haplotype is associated with a significant decrease in IRGM expression (P < 10⁻¹²) in untransformed lymphocytes from CD patients. Further analysis of these variants in a Japanese CD case–control sample, and of IRGM expression in HapMap populations, revealed that neither the IRGM insertion/deletion polymorphisms nor the CNV was associated with CD or with altered IRGM expression in the Asian population. This suggests either that the involvement of the IRGM risk haplotype in the pathogenesis of CD requires gene–gene or gene–environment interactions that are absent in Asian populations, or that none of the variants analysed is causal and the true causal variants arose after the European–Asian split.
Genome-wide association scans (GWAS) have been very successful in identifying susceptibility loci for Crohn's disease (CD), one form of chronic inflammatory bowel disease [reviewed in (1)]. The discovery by the Wellcome Trust Case Control Consortium (WTCCC) that single nucleotide polymorphisms (SNPs) near the immunity related GTPase related family, M (IRGM) gene on chromosome 5q33.1 were associated with CD provided a potentially important clue to its pathogenesis (2,3). IRGM is an atypical member of the IRG family of p47 immunity-related GTPase genes (4,5), which are characteristically induced by interferon and provide resistance to intracellular pathogens. The gene has had an unusual evolutionary history: disruption of the open reading frame generated a non-functional pseudogene in Old and New World monkeys, and a truncated version was apparently restored in humans and African great apes (5). Although human IRGM lacks interferon-inducible elements in its promoter, reduction of its expression in cultured cells was associated with impaired induction of autophagy and impaired clearance of intracellular bacteria (6,7). The region of association with CD also includes ZNF300, whose product is reported to bind the promoter of the gene encoding the interleukin 2 receptor beta-chain (8), which is involved in T cell-mediated immunity. ZNF300 is expressed predominantly in heart, skeletal muscle and brain (9), with weaker expression in the small intestine. In addition to IRGM and ZNF300, which is transcribed in the opposite direction to IRGM (right to left in Fig. 1), the region contains LOC134466, a pseudogene of ZNF300 (Fig. 1). Apart from IRGM, the nearest gene of functional interest at this locus is TNIP1, located 80 kb distal to the region of association.
TNIP1 encodes the tumour necrosis factor alpha-induced protein 3 (TNFAIP3)-interacting protein 1, which inhibits NF-κB activation by tumour necrosis factor (10) and could conceivably be regulated by sequences within the region of association with CD. The association of this locus with CD, first reported in (2) and replicated in (3), has since been confirmed in several other studies (7,11–16).
The association of this locus with CD is clearly robust, but significant questions remain regarding the nature of its contribution to pathogenesis. In particular, we need to establish whether the association is driven by the IRGM gene itself or by other genes in the region, and to identify the causal variants in order to understand their effects on gene expression and function. Identifying causal variants also has the potential to provide more precise genetic markers of disease susceptibility (1,17). We reported previously (2) that extensive re-sequencing of the IRGM coding region did not reveal any obvious causal variants. A recent study by McCarroll et al. (7) showed that the deletion allele of a 20 kb copy number variant (CNV) that maps 1.6 kb upstream of IRGM is completely correlated (r² = 1.0) with the CD risk allele at the SNP rs13361189 (3). They also showed that the CNV deletion allele itself was significantly associated with CD in 172 cases and 344 controls (P < 0.01), and that the risk haplotype was correlated with altered expression levels of IRGM in cultured cells: IRGM expression from the risk haplotype was reduced in HeLa cells and in lymphoblastoid cell lines from 10 individuals, but increased in a colon carcinoma cell line and in smooth muscle cells. They therefore proposed that the CD association results from altered regulation of IRGM, and that the common deletion polymorphism is likely to be the causal variant. This was further supported by the finding that the strongest association with CD from this region in a North American GWAS was with rs13361189 (P = 3.02 × 10⁻⁴), just upstream of IRGM (7,18).
The fact that IRGM plays a role in autophagy, and that SNPs in another autophagy-related gene, autophagy 16-like isoform 1 (ATG16L1), are also associated with CD (2,18,19), adds weight to the hypothesis that IRGM is the causal gene at this locus. However, given the extent of the association signal and the lack of experimental evidence that the CNV itself directly regulates IRGM expression, we have undertaken a detailed genetic analysis of the contribution of this locus to susceptibility to CD. We have used the results of a large meta-analysis of three GWAS in CD, which combined data from 3230 cases and 4829 controls (16), to provide a more robust estimate of the extent of the association across this locus. In addition, we have carried out fine mapping in the region of association and screened all exons of ZNF300, together with the previously neglected IRGM promoter and exon 1, for novel genetic variants. This was followed by an association study and conditional analysis of novel and known variants in a large UK-based case–control cohort (1800 cases and 2000 controls). Finally, we investigated the expression of candidate genes and the association of candidate variants in different populations, and examined IRGM expression in a physiologically relevant primary tissue (lymphocytes) from CD patients of known risk genotype. Our results provide novel insights into the contribution of sequence variants at this locus to disease susceptibility.
The WTCCC study (2) found strong association of 11 SNPs with CD, spanning a 110 kb region of chromosome 5 (from rs13361189 at 150 203 580 bp to rs931058 at 150 313 891 bp, NCBI build 36). In addition, a meta-analysis of three GWAS in CD (16) included 35 SNPs in the 200 kb interval from 150 150 000 bp to 150 350 000 bp on chromosome 5. The results, which are plotted onto the physical map of the region in Figure 1, show strong association with CD from the SNP rs2112637 at position 150 162 627 bp to SNP rs931058 at 150 313 891 bp. The most significant SNPs were rs11747270 and rs1000113 (P = 6.37 × 10⁻¹¹ and 7.5 × 10⁻¹¹), which are both located within the 42 kb of non-coding DNA between IRGM and ZNF300, and rs13361189 (P = 8.17 × 10⁻¹¹) just upstream of IRGM. These data suggest that, purely on the basis of physical location, both IRGM and ZNF300 should be considered as candidates for the source of the association signal.
To evaluate the extent of the association signal, and to detect any additional associated common haplotypes not well tagged by the Affymetrix 500K SNP array, we used genotypes from the HapMap CEU panel (Utah residents of European ancestry) to identify eight additional SNPs that provided more complete tagging of the region. These were genotyped in 931 CD cases and 976 controls (Supplementary Material, Table S1). The only new tagging SNP that was associated with CD was rs12659118, located within LOC134466 (minor allele frequency (MAF) 0.116 in cases versus 0.084 in controls, P = 0.0017). This SNP is in strong linkage disequilibrium (LD) with rs13361189 in controls (r² = 0.79), and was not associated with CD in individuals who did not carry the risk allele at rs13361189 (MAF 0.019 in cases versus 0.016 in controls, P = 0.55).
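As an illustration of the allelic case–control comparison reported above, a minimal Python sketch of a Pearson chi-square test on a 2×2 table of allele counts follows. It assumes one degree of freedom, no continuity correction and alleles counted as independent observations; the paper does not specify which test implementation was used, and the counts in the usage line are hypothetical rather than data from this study.

```python
import math

def allelic_chisq(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df, no continuity correction) on allele counts.

    Returns (chi2, p_value, odds_ratio) for the 'alt' allele in cases versus
    controls. Counting alleles rather than people assumes Hardy-Weinberg
    equilibrium within each group.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio

# Hypothetical allele counts: 200 of 2000 case alleles and 150 of 2000 control alleles.
chi2, p, odds = allelic_chisq(200, 1800, 150, 1850)
print(f"chi2 = {chi2:.2f}, P = {p:.4f}, OR = {odds:.2f}")
```

Because the real call rates per SNP are not given in the text, plugging the reported MAFs and sample sizes into this sketch would not be expected to reproduce the published P values exactly.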
The strong association across the region suggested that re-sequencing the IRGM and ZNF300 genes to screen for possible causal variants was warranted. The six exons and adjacent splice sites of the ZNF300 gene were sequenced (see Materials and Methods) in 45 cases. The only variant detected was a synonymous SNP, rs17800771, which is in strong LD (r² = 0.88) with rs2290989, a SNP that was genotyped in the WTCCC scan and was not associated with CD (P = 0.42).
Previous re-sequencing of the coding regions of IRGM in more than 700 CD cases detected two non-synonymous SNPs, E17D and T94K, which were not associated with CD in a sample of 769 cases and 705 controls, and an exonic synonymous SNP (L105 or rs10065172), which was associated (3). We extended this study by genotyping E17D and T94K in an expanded panel of 1400 cases and 1800 controls. Neither E17D (MAF 0.0028 in cases versus 0.0013 in controls) nor T94K (MAF 0.044 versus 0.042) was associated with CD (P = 0.19 and 0.86, respectively).
In view of the lack of association of potential functional coding variants, we investigated the upstream region of this gene for variants that might affect IRGM expression. The human and ape versions of IRGM are unusual in that the ancestral promoter has been supplanted by the promoter element of an inserted endogenous retrovirus (ERV9) long terminal repeat (LTR) ~1.6 kb upstream of the initiation codon (4,5). This insertion also introduced an upstream exon (exon 1), which encodes the first 695 bp of the 1.11 kb 5′ untranslated region (UTR) of IRGM and contains the ERV9 U5 repeat elements (Fig. 2). A 2.9 kb region spanning the IRGM initiation codon, the entire 5′-UTR (including exon 1 and the intervening intron), the ERV9 LTR and the promoter was sequenced in 94 unrelated individuals, including 43 cases of CD. Two insertion/deletion (indel) polymorphisms were detected (Fig. 2). One is a 4 bp insertion in the promoter region of the ERV9 LTR (−1644insTGGG); the other is a 12 bp insertion in exon 2 (308 bp upstream of the initiation codon), which has also been detected in a Ghanaian population (20). The −308 variant is a microsatellite with a common allele, (GTTT)2, and two additional alleles, (GTTT)4 and (GTTT)5. The −1644ins is located between three closely juxtaposed transcription factor (TF) binding sites for nuclear factor gamma, myeloid zinc finger 1 and GATA binding protein 2. The region upstream of IRGM also contains a CNV (21–23) which is correlated with altered expression of IRGM (6). Fine-mapping of the CNV on a high-resolution array and sequencing of the breakpoint revealed a deletion of 20 101 bp, spanning 150 183 354 to 150 203 455 on chromosome 5 (NCBI 36; Fig. 2). The position and size of the CNV as we identified it are in close agreement with previous reports (5,7).
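The sizes quoted for the −308 alleles and the CNV follow from simple arithmetic, sketched below in Python. The reported deletion length equals the difference between the two breakpoint coordinates (counting both endpoints inclusively would give one more base), so the sketch assumes the convention that matches the reported 20 101 bp. The same repeat arithmetic is consistent with the alternative splice acceptor described later lying 139 bp downstream of the canonical site on the (GTTT)2 allele but 151 bp on the (GTTT)5 allele. The helper names are illustrative only.

```python
REPEAT_UNIT = "GTTT"  # tetranucleotide repeated at IRGM -308
COMMON_COPIES = 2     # (GTTT)2 is the common allele

def span_bp(start, end):
    """Distance between two breakpoint coordinates (end - start)."""
    return end - start

def extra_bases(copies, common=COMMON_COPIES, unit=REPEAT_UNIT):
    """Bases a (GTTT)n allele carries beyond the common (GTTT)2 allele."""
    return len(unit) * (copies - common)

cnv_bp = span_bp(150_183_354, 150_203_455)  # NCBI build 36 breakpoints on chromosome 5
ins_gttt4 = extra_bases(4)                  # rare allele
ins_gttt5 = extra_bases(5)                  # CD risk allele in Europeans
alt_acceptor_gttt5 = 139 + ins_gttt5        # 139 bp on (GTTT)2 plus the extra repeat units

print(cnv_bp, ins_gttt4, ins_gttt5, alt_acceptor_gttt5)  # 20101 8 12 151
```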
The association of the IRGM promoter indel, microsatellite and upstream CNV with CD was investigated by analysing these variants together with a strongly associated and replicated SNP from the WTCCC study (rs13361189) (2,3) and the synonymous coding SNP (L105 or rs10065172) in 1848 CD cases and 2025 population controls. Mapping of the CNV breakpoints enabled the design of a robust, qualitative assay with a common forward primer positioned immediately upstream of the CNV and two allele-specific primers, one located in the deleted region and the other immediately downstream of the CNV [Materials and Methods and (24)]. The results (Table 1) show that all these variants are strongly associated with CD, with the most significant signals coming from the deletion allele of the CNV and from the WTCCC SNP rs13361189 (allelic P = 1.4 × 10⁻⁹ and 3.7 × 10⁻⁹, respectively). Allele frequencies and odds ratios were very similar for all the variants except the CD-associated allele −308(GTTT)5. As reported by McCarroll et al. (7), the CNV was in strong LD with rs13361189 and rs10065172 (Fig. 3). The promoter variant −1644ins was also in strong LD with the CNV and with both of these SNPs (r² > 0.9 in cases and controls), whereas −308(GTTT)5 was in only moderate LD with the other four variants.
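The r² values used throughout to describe LD between the CNV, the indels and the SNPs measure the squared correlation between alleles at two biallelic loci. A minimal sketch of the calculation from two-locus haplotype frequencies follows. It assumes the haplotype frequencies are already known; in practice they are usually estimated from unphased genotypes (for example by an EM algorithm, not shown), and the frequencies in the example are hypothetical.

```python
def ld_r2(hap_freqs):
    """Squared correlation r^2 between two biallelic loci.

    hap_freqs maps the four two-locus haplotypes 'AB', 'Ab', 'aB' and 'ab'
    to frequencies summing to 1 (upper case = allele of interest).
    """
    p_a = hap_freqs["AB"] + hap_freqs["Ab"]  # frequency of allele A at locus 1
    p_b = hap_freqs["AB"] + hap_freqs["aB"]  # frequency of allele B at locus 2
    d = hap_freqs["AB"] - p_a * p_b          # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical haplotype frequencies: the two minor alleles almost always travel together.
print(round(ld_r2({"AB": 0.10, "Ab": 0.01, "aB": 0.01, "ab": 0.88}), 3))  # 0.806
```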
The existence of multiple highly correlated variants, some of which have potential functional significance and all of which are associated with disease risk, raises the question of which, if any, of them is a causal variant driving the association at this locus. We investigated this by conditional logistic regression analysis across the five loci (Table 2). The CNV remained highly significantly associated with disease when conditioned on the variants at −1644 and −308; that is, when all the association at either of these variants was accounted for, significant independent association with the CNV remained (P = 1.6 × 10⁻⁵ and 2.6 × 10⁻⁷, respectively). However, the effect of the CNV on disease was not significant when conditioned on either of the two SNPs, rs13361189 or rs10065172 (P = 0.221 and 0.251, respectively). Thus the effect of the CNV could not be distinguished from the effect of either SNP, which is consistent with the strong LD between these three variants. Similarly, the two SNPs showed association that was independent of the variants at −1644 and −308 but not of each other or of the CNV. The −1644 variant showed significant independent association when conditioned on either the CNV (P = 3.1 × 10⁻⁴) or −308 (P = 9.3 × 10⁻⁶), but was not significant, or only marginally so, when conditioned on the two SNPs (P = 0.142 and 0.047, respectively). In contrast, −308 showed highly significant independent association when conditioned on any of the other four variants (P = 9.4 × 10⁻⁶ to 2.38 × 10⁻⁴). The apparently independent effect of −1644 appeared to be due to two very rare haplotypes (haplotypes 10 and 11 in Table 3), which had a combined frequency of 0.0035 in CD cases but were not present in controls; one of these (haplotype 10 in Table 3) carried the risk (del) allele at the CNV and the common (del) allele at −1644.
In our case–control study, the −308(GTTT)4 allele (the rare 8 bp insertion) was detected in only four CD cases and in none of the controls (haplotype 11 in Table 3). It is possible that these very rare haplotypes, which were seen only in CD cases (n = 7), inflated the test statistic. The conditional regression analysis was therefore repeated excluding rare haplotypes with a frequency <0.005 (Table 4). In this analysis the independent effect previously observed for −1644 disappeared, so the effects of the CNV, −1644 and the two SNPs were indistinguishable. However, all of these remained significant when conditioned on −308, and conversely −308 retained highly significant independent association with CD when conditioned on each of the other four variants (P = 4.24 × 10⁻⁵ to 3.9 × 10⁻⁴). At least part of the independent effect of the −308 variant appeared to be due to a haplotype carrying the non-risk (non-deleted) allele at the CNV but the high-risk (GTTT)5 allele at −308 (haplotype 3 in Table 3); this haplotype had a frequency of 5.2% in CD cases and 4.1% in controls and was associated with CD (P = 0.038; Table 3).
The analysis was repeated on the subset (75%) of cases (1265) and controls (1609) with complete genotypes at all five loci and produced very similar results (data not shown). The regression analysis suggests that the haplotype represented by the CNV, the two SNPs and −1644ins constitutes one independent effect on disease risk, and that −308(GTTT)5 may represent another.
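The logic of conditioning one variant on another can be sketched as a pair of nested logistic regressions: disease status is regressed on the conditioning variant alone, then on both variants, and a 1-df likelihood ratio test asks whether the tested variant adds information. The Python sketch below assumes additive allele-dosage coding (0, 1 or 2 risk alleles) and simulated, hypothetical data in which only one of two correlated variants affects risk. The paper does not state the exact model behind Tables 2 and 4, and a real analysis would normally use an established statistics package rather than this hand-rolled Newton–Raphson fit.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def logistic_loglik(rows, y, iters=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson; return the maximised log-likelihood."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for x, yi in zip(rows, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
        loglik += math.log(p) if yi else math.log(1.0 - p)
    return loglik

def conditional_test(test, cond, y):
    """1-df likelihood ratio test for `test` dosage after adjusting for `cond` dosage."""
    null = logistic_loglik([[1.0, c] for c in cond], y)
    full = logistic_loglik([[1.0, c, t] for c, t in zip(cond, test)], y)
    stat = max(0.0, 2.0 * (full - null))
    return stat, math.erfc(math.sqrt(stat / 2.0))

def simulate(n=3000, freq=0.3, copy_prob=0.8, log_or=0.8, seed=1):
    """Hypothetical data: only `causal` affects risk; `proxy` copies it on ~80% of haplotypes."""
    rng = random.Random(seed)
    causal, proxy, y = [], [], []
    for _ in range(n):
        ga = gb = 0
        for _hap in range(2):
            a = int(rng.random() < freq)
            b = a if rng.random() < copy_prob else int(rng.random() < freq)
            ga, gb = ga + a, gb + b
        risk = 1.0 / (1.0 + math.exp(-(-1.0 + log_or * ga)))
        causal.append(ga)
        proxy.append(gb)
        y.append(int(rng.random() < risk))
    return causal, proxy, y

causal, proxy, y = simulate()
print("causal | proxy: LRT = %.1f, P = %.2g" % conditional_test(causal, proxy, y))
print("proxy | causal: LRT = %.1f, P = %.2g" % conditional_test(proxy, causal, y))
```

In such simulations the causal variant typically stays highly significant after conditioning on its proxy, whereas the proxy adds little once the causal variant is in the model; this asymmetry is what allows −308 to be separated from the CNV haplotype, while the CNV and the two SNPs, being almost interchangeable, cannot be separated.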
The difficulty of identifying the origin of association signals at the IRGM locus in European populations led us to investigate the frequencies of these variants and the LD structure in other populations, since weaker LD might facilitate fine mapping. We genotyped the CNV, promoter variant and microsatellite in five of the HapMap3 populations for which genotype data were available for the original associated SNP rs13361189: the Han Chinese from Beijing (CHB), the Japanese from Tokyo, Japan (JPT), the Yoruba in Ibadan, Nigeria (YRI) and two other African populations, the Maasai in Kinyawa, Kenya (MKK) and the Luhya in Webuye, Kenya (LWK). The frequencies of all four CD risk variants were much higher in all these populations than in the white UK population (Fig. 4). Indeed, the CNV deletion (CD risk) allele is the common allele in the YRI population, whereas the −308(GTTT)5 CD risk allele is the common allele in both Asian populations. Also of interest, the −308(GTTT)4 allele, which is very rare in European populations, has a frequency of 8–18% in the African populations; the frequencies of the −308 alleles in the Yoruba are similar to those reported in the West African population of Ghana (20). LD between the CNV and rs13361189 was lower in the JPT (r² = 0.88) and CHB (r² = 0.84) than in Europeans (r² = 0.95), and was substantially lower in two of the three African populations (YRI: r² = 0.70; LWK: r² = 0.66). LD between the CNV and both the −1644 and −308 variants was also lower in Asian and African populations than in Europeans (Supplementary Material, Fig. S1). The reduced LD across this locus in Asian and African populations is consistent with the substantial differences in allele frequencies, and suggested that investigating the association of these variants with CD in other populations might provide further insight into their contribution to CD.
An African case–control sample was not available, so we focused our analysis on a well-studied Japanese collection. A recent analysis of 484 Japanese CD cases and 470 controls from this collection found no association of SNPs rs13361189 or rs4958847 at the IRGM locus with CD (25). We genotyped the CNV, the variants at −1644 and −308 and SNPs rs13361189 and rs10065172 in the same 484 CD cases and in an expanded set of 933 Japanese controls. The results (Table 5) show no association of these variants with CD, with the exception of a weak protective effect of the −308(GTTT)5 allele (P = 0.03), which is the risk allele in Europeans. As in the UK population, there was very strong LD between the CNV, −1644, rs13361189 and rs10065172 (r² > 0.95), with −308 again in weaker LD with the other four variants (r² = 0.48–0.55). One obvious explanation for this lack of association is inadequate power to detect small effects. Reported odds ratios for the IRGM locus were 1.33 and 1.34 in two meta-analyses (15,16) and 1.44 in a recent study of a large Dutch–Belgian cohort (13). With an allele frequency of 0.38 in controls, this Japanese case–control sample has >90% power to detect association at the CNV for an OR of 1.35 at P = 0.05, so the study is well powered, assuming that the effect size in the Japanese population is similar to that in European populations.
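The power statement above can be checked approximately with the normal approximation for comparing two allele frequencies. The Python sketch below assumes a two-sided allelic test at α = 0.05, a per-allele odds ratio of 1.35, a risk-allele frequency of 0.38 in controls, and allele counts of twice the numbers of cases (484) and controls (933); the paper does not state which method or software it used. For these inputs the sketch returns about 0.96, consistent with the >90% quoted.

```python
import math
from statistics import NormalDist

def allelic_power(n_cases, n_controls, p_control, odds_ratio, alpha=0.05):
    """Approximate power of a two-sided allelic (2x2) test via the normal approximation."""
    odds_case = odds_ratio * p_control / (1.0 - p_control)
    p_case = odds_case / (1.0 + odds_case)  # risk-allele frequency expected in cases
    m1, m0 = 2 * n_cases, 2 * n_controls    # allele counts (assumes Hardy-Weinberg equilibrium)
    p_bar = (m1 * p_case + m0 * p_control) / (m1 + m0)
    sd_null = math.sqrt(p_bar * (1 - p_bar) * (1 / m1 + 1 / m0))
    sd_alt = math.sqrt(p_case * (1 - p_case) / m1 + p_control * (1 - p_control) / m0)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    delta = abs(p_case - p_control)
    # The rejection region in the opposite direction contributes negligibly and is ignored.
    return NormalDist().cdf((delta - z_crit * sd_null) / sd_alt)

power = allelic_power(484, 933, 0.38, 1.35)
print(f"power = {power:.3f}")
```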
In view of the lack of obvious pathogenic or disease-associated variants in the coding regions of IRGM, and the presence of several potential regulatory variants upstream of the ATG start codon (including −1644, −308 and the CNV), we next asked whether the risk haplotype was correlated with IRGM expression levels. McCarroll et al. (7) reported variable effects on IRGM expression in different cell lines and cell types. We analysed IRGM expression from the high- and low-risk haplotypes by sequencing cDNA and genomic DNA prepared from primary lymphocytes of eight CD patients who were heterozygous for the risk haplotype, and comparing the relative expression of the C (low-risk) and T (high-risk) alleles of the exonic SNP rs10065172 in these individuals (Supplementary Material, Fig. S2). Expression of the T allele was markedly lower than that of the C allele (C/T peak height ratio in cDNA: 1.82–353.44) in seven of the eight samples tested (P = 0.015). We also analysed primary lymphocytes from 25 CD patients of defined IRGM genotype and measured expression of both genes (IRGM and ZNF300) by real-time quantitative RT–PCR. The risk haplotype is relatively rare in European populations, and IRGM expression levels varied widely between individuals. Nonetheless, IRGM expression was significantly lower (P < 10⁻¹²) in carriers of one or two risk alleles at each of the three loci (the CNV, −1644ins and −308(GTTT)5) than in individuals homozygous for the non-risk alleles (Fig. 5). Because of the strong LD between the three variants, most individuals tested had the same genotype at all of them, so we could not test for independent effects of the variants on expression.
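One common way to express allelic expression imbalance from sequencing traces is to divide the C/T peak-height ratio measured in cDNA by the same ratio in genomic DNA from the same heterozygote, so that allele-specific differences in peak height caused by the sequencing chemistry cancel out. The text above reports the raw cDNA ratio, so the normalisation in this Python sketch is an assumption for illustration, and the peak heights are hypothetical.

```python
import math

def normalised_allelic_ratio(cdna_c, cdna_t, gdna_c, gdna_t):
    """C/T peak-height ratio in cDNA divided by the same ratio in genomic DNA.

    Values above 1 indicate relative under-expression of the T (high-risk) allele;
    the genomic DNA ratio, where both alleles are present in equal copy number,
    corrects for peak-height differences that are not due to expression.
    """
    return (cdna_c / cdna_t) / (gdna_c / gdna_t)

# Hypothetical peak heights for one heterozygous individual (not data from this study).
ratio = normalised_allelic_ratio(cdna_c=1800, cdna_t=600, gdna_c=1000, gdna_t=1000)
print(f"C/T ratio = {ratio:.2f}, log2 = {math.log2(ratio):.2f}")  # C/T ratio = 3.00, log2 = 1.58
```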
We next analysed the effect of IRGM genotype on IRGM expression using microarray expression data from lymphoblastoid cell lines in the Asian and extended African HapMap populations (26; Stranger et al., in preparation). The risk alleles for the CNV, rs13361189 and −308(GTTT)5 were associated with a highly significant reduction in expression of IRGM in the YRI population and in pooled data from the three African populations (YRI, MKK, LWK), with much weaker association for −1644ins (Table 6). However, there was no association of any of the loci with altered IRGM expression in the Japanese or Chinese populations. Interestingly, overall expression of IRGM was higher in the JPT and CHB samples than in the three African populations, with the lowest expression across all populations observed in Europeans (P < 10⁻¹⁵; Fig. 6).
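Associations between genotype and expression of the kind summarised in Table 6 are often tested by regressing expression on risk-allele dosage. The minimal least-squares sketch below assumes an additive model, a single population (pooled analyses would usually include population as a covariate) and hypothetical expression values; the paper does not state the exact model used. It returns the slope and its t statistic; a P value would come from a t distribution with n − 2 degrees of freedom (for example via `scipy.stats.t.sf`), which is omitted here to keep the sketch dependency-free.

```python
import math

def additive_eqtl(dosage, expression):
    """Ordinary least-squares fit of expression on risk-allele dosage (0, 1 or 2).

    Returns (slope, t_statistic); a negative slope means lower expression per risk allele.
    """
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosage, expression))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / se_slope

# Hypothetical log-scale expression values for six individuals.
slope, t = additive_eqtl([0, 0, 1, 1, 2, 2], [10.0, 11.0, 8.0, 9.0, 6.0, 7.0])
print(f"slope = {slope:.2f}, t = {t:.2f}")  # slope = -2.00, t = -6.53
```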
In addition to these quantitative effects, we noted that the −308(GTTT)5 allele potentially strengthens an alternative splice acceptor site 139 bp [or 151 bp on the (GTTT)5 allele] downstream of the canonical splice site (Fig. 2 and Supplementary Material, Fig. S3) by extending its polypyrimidine tract. We therefore investigated the splicing of IRGM mRNA by RT–PCR in four individuals representing the three possible genotypes at −308: (GTTT)5/(GTTT)5, (GTTT)5/(GTTT)2 and (GTTT)2/(GTTT)2. The identity of each mRNA was determined by sequencing gel-extracted products. As predicted, the −308(GTTT)5 allele resulted in use of the alternative splice site, with removal of 139 bp from the 5′ untranslated region of the IRGM transcript (Supplementary Material, Fig. S3).
Finally, we investigated the expression of the other gene at this locus, ZNF300, since it remains a possible source of the association with CD. We looked for a correlation between the risk haplotype and ZNF300 expression in lymphocytes from the same 25 CD patients in whom a correlation was found for IRGM, but found none (P = 0.10 for patients typed for the CNV; data not shown). Similarly, analysis of microarray expression data from 141 HapMap samples did not detect a significant correlation between the risk haplotype and ZNF300 expression (P = 0.45). This does not exclude possible qualitative effects of the risk haplotype on ZNF300 expression.
In this study, we have addressed a question generic to the follow-up of GWAS in complex disease, which is how to define the causal genes and variants that are driving an association at a specific locus. At the IRGM locus, very significant association of SNPs with CD is seen across an interval which includes two known genes, IRGM and ZNF300. The biological evidence for the role of IRGM in autophagy, coupled with the correlation of the risk haplotype with altered IRGM expression, constitutes strong support for its role in the pathogenesis of CD. However, since the strongest association signals extend from just upstream of IRGM to a position midway between IRGM and ZNF300, and since IRGM does not contain an obvious functional variant, the ZNF300 gene cannot be formally excluded as the source of the association signal. The strong LD between the CNV upstream of IRGM and SNPs associated with CD, and the correlation of the deletion allele with altered IRGM expression (7), further support a primary role for IRGM in pathogenesis and for the deletion as the causal variant. However, functional evidence is needed for the role of this gene in intestinal inflammation and for a direct regulatory effect of the CNV (as opposed to an association with altered IRGM expression). We have sought to address the question of which gene and which variants might be driving the association by sequencing the upstream region of IRGM and the coding region of ZNF300 to look for other potential causal variants, and by conditional analysis of the most strongly associated variants in a well-powered sample of CD cases and controls. In addition, we have investigated the association of these variants with IRGM expression in CD cases from the UK as well as in five other populations from Africa and East Asia, and carried out a case–control analysis in a Japanese population.
Sequencing of all the exons and adjacent splice sites of ZNF300 did not reveal any functionally relevant or CD-associated variants. However, sequencing of the 5′-UTR, intron and complex ERV9-derived promoter region of IRGM identified two insertion/deletion polymorphisms, one of which is novel and located within the proximal promoter (c.IRGM −1644) between three TF binding sites. This variant is strongly associated with CD but is in tight LD with the CNV. The other indel, at c.IRGM −308, has previously been described as a tetranucleotide repeat (microsatellite) in a Ghanaian population (20). We found that the −308(GTTT)5 allele was common in the UK population and was also significantly associated with CD. We also detected the 8 bp insertion allele −308(GTTT)4, which had a frequency of 0.1% in CD cases and 0% in controls. This suggested that both alleles are possible new CD risk alleles, which was supported by the multi-allelic association test. Conditional regression analysis of our data provided highly significant support for an independent effect of the −308 microsatellite polymorphism. The −308(GTTT)5 allele reinforces an alternative splice site that removes 139 bp from the 5′ untranslated region of the IRGM transcript. The consequence of this interstitial deletion is not known, but it may affect the stability of the transcript or the rate at which it is translated.
A previous study (7) has shown that the risk haplotype at the IRGM locus is associated with either a reduction or an increase in IRGM expression, depending on the cell line analysed. We find that the risk alleles −308(GTTT)5, CNVdel and −1644ins are all significantly associated with down-regulation of IRGM expression in untransformed lymphocytes from CD patients. Given the strong LD between all three variants, a very large sample of individuals of known genotype would be required to determine whether each variant has independent effects on gene expression. Other, as yet unknown, variants at this locus may have different effects; the SNP −261C>T (rs9637876) has recently been reported to confer protection against tuberculosis caused by Mycobacterium tuberculosis (27). Direct functional analysis of the effect of all these candidate causal variants on IRGM expression is likely to be required to fully resolve their contribution to CD pathogenesis.
The lack of association of the index SNPs or candidate causal variants with CD in the Japanese population is intriguing. This does not appear to be due to phenotypic differences, since CD in Japanese patients is clinically indistinguishable from CD in European patients. It is also unlikely to be due to insufficient statistical power, unless the effect size is significantly smaller in this population. An alternative explanation is that the contribution of IRGM to pathogenesis requires gene–gene or gene–environment interactions which are absent in the Japanese population. It is also possible that none of the variants tested, including the CNV, are causal and that the causal variant at this locus arose after the European–Asian split, as is the case for the major CD susceptibility gene NOD2/CARD15 (28,29). A further possibility relates to the much higher expression of IRGM that we observed in CHB and JPT HapMap samples than in CEU samples. If the variants studied here cause only a modest decrease in IRGM levels, the relative effect may be insufficient to influence disease risk in Japanese individuals, who show significantly higher expression than Europeans. Conversely, in European individuals, in whom we observed a much lower baseline expression, these variants may have a larger relative effect that is sufficient to influence disease risk. This would be consistent with the lack of correlation between IRGM expression levels and risk haplotype in the JPT and CHB cell lines.
In conclusion, we have shown that multiple sequence variants upstream of the IRGM gene with potential gene regulatory effects are strongly associated with CD and with reduced IRGM expression in untransformed cells from CD patients. The lack of association of these variants with CD in the Japanese population suggests that they may have population-specific effects on pathogenesis, or that more recent, undescribed mutations may be responsible for the association in European populations. A combination of genetic and functional approaches will be required to fully understand the contribution of this locus to the development of this form of chronic inflammatory bowel disease.
More than 1800 patients with CD were recruited from specialist IBD clinics in London and Newcastle (30) after informed consent and ethical review (REC 05/Q0502/127). Two thousand population controls were obtained from the 1958 British Birth Cohort, which includes subjects born in a single week of March 1958 in England, Scotland and Wales (31). UK case–control studies were restricted to white Caucasian individuals. The Japanese CD cases (484) and controls (933) are described elsewhere (32). HapMap DNA samples were purchased from Coriell Cell Repositories, Camden, NJ, USA.
A 2.9 kb region of genomic DNA upstream of IRGM, encompassing the entire 5′-UTR (including the upstream exon 1 and intervening intron), the ERV9 LTR and the promoter, was amplified in 94 unrelated individuals (43 CD patients with known risk haplotype, 29 with ulcerative colitis (UC) and 22 unaffected) using 8 pmol of each primer (5′-ACAATGAGTGTGTGAAACAGACCT-3′ and 5′-CATAGTGATGTTAACTGGTGTCCTG-3′), 1× PCR Master mix (Promega) and 25 ng of template genomic DNA in a 10 μl reaction. PCR conditions were 5 min at 95°C, followed by 35 cycles of 30 s at 95°C, 30 s at 62°C and 3 min at 72°C, with a final extension of 10 min at 72°C. Products were cleaned up with ExoSAP-IT (USB Europe, Staufen, Germany) and then cycle-sequenced in the forward and reverse directions in ten independent reactions, using 8 pmol of each of the overlapping nested sequencing primers (Supplementary Material, Table S2) and 0.25 μl of ABI BigDye v3.1 (Applied Biosystems) in a 5 μl reaction volume under standard conditions. Products were analysed on an ABI3730xl DNA sequencer (Sequence analysis, Applied Biosystems) and aligned to the published genomic sequence using the Sequencher 4.7 package (GeneCodes).
Fine-mapping of the CNV was performed on a custom NimbleGen 385k array across a 300 kb interval encompassing the BAC on the WGTP array on which the CNV was first identified (22). The median spacing between probes was 45 bp. This custom array also targeted a number of other CNVs, which are not described here. The results confirmed that the CNV is a bi-allelic polymorphism that comprises either the presence or absence of 20 kb of sequence on chromosome 5q. The non-ancestral deletion spans from 150 183 354 to 150 203 455 on chromosome 5 (NCBI36). These data were subsequently used to design PCR primers, 5′-TTGCTGATGGCATGATCTTC-3′ and 5′-ATATGGCGAGAGCAGCAACT-3′, for amplifying and sequencing the deletion breakpoint.
The regions flanking the IRGM −1644 and −308 polymorphisms were amplified in separate reactions, each containing 4 pmol of each primer of the relevant pair: 5′-AAATGGACCAATCAGCAGGA-3′ (5′-labelled with 6-FAM fluorescent dye) and 5′-AGGGGCCAGGTATTTGAGAC-3′ for −1644, and 5′-TGCCCACAGATACGACAGAG-3′ (5′-labelled with HEX fluorescent dye) and 5′-GGACGCAGATATTGCAGTGA-3′ for −308. The reaction mix also included 1× PCR Master Mix (Promega) and 25 ng of genomic DNA in a 10 μl reaction volume. PCR conditions were as follows: 2 min at 95°C, followed by 30 cycles of 20 s at 95°C, 30 s at 60°C and 30 s at 72°C, with a final extension of 5 min at 72°C.
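Because the −308 alleles differ in the number of GTTT units, fragment sizes from capillary electrophoresis translate directly into repeat counts: the (GTTT)5 and (GTTT)2 alleles should differ by 3 × 4 = 12 bp. The sketch below shows that conversion under stated assumptions. The reference size of the (GTTT)5 allele is a hypothetical placeholder (the amplicon sizes are not given here) and would in practice be calibrated against samples of known genotype; reading a single peak as a homozygote is also an assumption.

```python
from typing import List, Optional, Tuple

# HYPOTHETICAL calibration values: the real amplicon sizes are not given in the text.
REF_SIZE_BP = 200.0  # assumed observed size of the (GTTT)5 allele on this run
REF_REPEATS = 5
UNIT_BP = 4  # length of one GTTT repeat unit


def call_repeats(size_bp: float, tol_bp: float = 1.0) -> Optional[int]:
    """Convert a fragment size into a (GTTT)n repeat count.

    Returns None if the size is more than tol_bp away from a whole number of
    repeat units relative to the reference allele.
    """
    n_units = round((size_bp - REF_SIZE_BP) / UNIT_BP)
    if abs(size_bp - (REF_SIZE_BP + n_units * UNIT_BP)) > tol_bp:
        return None
    return REF_REPEATS + n_units


def call_genotype(peaks_bp: List[float]) -> Optional[Tuple[int, int]]:
    """One peak is read as a homozygote, two peaks as a heterozygote."""
    calls = [call_repeats(p) for p in peaks_bp]
    if not 1 <= len(calls) <= 2 or None in calls:
        return None
    if len(calls) == 1:
        calls = calls * 2
    return (min(calls), max(calls))
```

For example, peaks at 188.2 and 199.6 bp would be called (GTTT)2/(GTTT)5 under these assumed calibration values.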
The CNV upstream of IRGM was genotyped by allele-specific PCR with a common forward primer and two allele-specific reverse primers. The common forward primer (5′-AACAGTGACCTATCTGAAAAGGAAA-3′) was 5′-labelled with 6-FAM fluorescent dye and is complementary to sequence immediately upstream of the copy-number region. Of the two allele-specific reverse primers, one (5′-TTGAAATTTTGTAGAGATTGCATTG-3′) is complementary to sequence within the copy-number region, immediately adjacent to the forward primer, and yields a product only when the 20-kb sequence is present. The other (5′-TGCAGGGTACTGACTGTCCA-3′) is complementary to sequence immediately downstream of the copy-number region and yields a product only when the 20-kb sequence is absent (deleted), because only then does it lie close enough to the forward primer to be amplified. The assay was validated on eight HapMap samples of known CNV status (22) (2 copies: NA07000, NA07348; 1 copy: NA11995, NA12874, NA18501; 0 copies: NA18545, NA18547, NA18555); all gave genotypes consistent with the CGH data. PCR products for all three variants (CNV, −308, −1644) were diluted 1 in 50, pooled for each individual and separated by capillary electrophoresis on an ABI3730xl Genetic Analyser with 10 μl of HiDi formamide and 0.125 μl of GS500LIZ size standard (both Applied Biosystems). SNP rs10065172 (L105) was genotyped using a validated TaqMan assay (ABI), with allelic discrimination by endpoint read on an ABI7900HT Sequence Detection System. Genotypes at all variants were in Hardy–Weinberg equilibrium (P > 0.01).
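The genotype-calling logic of the allele-specific CNV assay, and a Hardy–Weinberg check on the resulting counts, can be sketched as follows. The text does not say which Hardy–Weinberg test was used; the Pearson chi-square test (1 df) below is an assumption, and an exact test is often preferred when one genotype class is rare. Function names are illustrative.

```python
import math
from typing import Optional, Tuple


def call_cnv(present_product: bool, deleted_product: bool) -> Optional[int]:
    """Copy number of the 20-kb segment from the two allele-specific PCR products."""
    if present_product and deleted_product:
        return 1  # one allele carries the segment, the other is deleted
    if present_product:
        return 2  # segment present on both chromosomes
    if deleted_product:
        return 0  # homozygous deletion
    return None  # no product at all: failed reaction, re-genotype


def hwe_chi2(n_hom1: int, n_het: int, n_hom2: int) -> Tuple[float, float]:
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium."""
    n = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom1, n_het, n_hom2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Under the criterion used in the text, a variant would be flagged if `hwe_chi2(...)[1]` fell to 0.01 or below.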
Lymphocytes were harvested from 40 ml of peripheral blood from patients with CD who had been genotyped for all IRGM and upstream variants. Peripheral blood mononuclear cells were isolated using Lymphoprep (Axis-Shield, UK) and cultured at a density of 2 × 10⁶ cells/ml in RPMI (Sigma Aldrich, UK) supplemented with 2 mM glutamine (Sigma Aldrich, UK) and 10% FCS (Sigma Aldrich, UK) in 24-well plates for 2 h at 37°C in a humidified atmosphere with 5% CO₂. The non-adherent cell fraction (lymphocytes) was then removed and washed twice in PBS. The cell pellet was resuspended in 0.5 ml RNAlater (Sigma Aldrich, UK), incubated for 24 h at 4°C and stored at −80°C. Total RNA was extracted from the primary lymphocytes using the RiboPure kit (Ambion) and quantified using the Agilent Bioanalyzer RNA 6000 Nano chip (Agilent Technologies UK Limited). cDNA was synthesised from 500 ng of total RNA per sample using the iScript cDNA Synthesis Kit (Bio-Rad Laboratories, CA, USA). HapMap RNA was purified from cells purchased from Coriell Cell Repositories, Camden, NJ, USA.
The exonic SNP rs10065172 was sequenced in cDNA and genomic DNA (gDNA) samples using the standard procedure described above, with exonic primers flanking the SNP (sequences available on request). Mean C and T peak heights in duplicate cDNA and gDNA samples from eight CD patients were estimated from the sequence electropherograms using Sequence Scanner Software v1.0 (Applied Biosystems, Foster City, CA, USA). The C:T peak-height ratios in cDNA were compared with those in gDNA using the Wilcoxon signed-rank test. Because gDNA carries the two alleles in equal amounts, a cDNA C:T ratio exceeding the gDNA ratio (cDNA/gDNA > 1) indicates higher expression of the C allele (33).
Quantitative fluorescent real-time RT–PCR was carried out in triplicate on 1 μl of cDNA from primary lymphocyte samples of 24 CD patients, using a custom 6-FAM-labelled fluorogenic TaqMan MGB probe (5′-TGCCCACAGATACGAC-3′) with flanking primers 5′-CCCGCCTGATGAGCTTACTC-3′ and 5′-AAGAGGTTAAGGATGCAGCTAATAGAG-3′, and a parallel reaction with a GAPDH endogenous control (Eurogentec Ltd, Southampton, UK). The IRGM genotypes of the 24 patients had previously been determined for each of the risk variants: −1644 (11/8/5), −308 (9/7/5) and CNV (10/8/5). Real-time quantitative PCR was carried out on an ABI7900HT system. Results were analysed by the ΔCt method for relative cDNA quantitation (http://www3.appliedbiosystems.com/cms/groups/mcb_support/documents/generaldocuments/cms_042380.pdf). Briefly, a threshold fluorescence level was selected at which amplification of the target sequence was in the exponential (log-linear) phase, and the cycle number at which each reaction crossed that threshold was recorded (the threshold cycle, Ct). The relative quantity of target cDNA in each individual (ΔCt) was calculated as the difference between the mean Ct of the target and the mean Ct of the endogenous control, and all values were calibrated (normalised) against cDNA from a placental sample expressing IRGM at a low level.
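The relative quantitation described above can be sketched as follows. The sketch assumes the standard comparative Ct calculation, in which IRGM and GAPDH amplify with equal (close to 100%) efficiency, so that each cycle corresponds to a two-fold difference in template. Function names are illustrative. Because Ct falls as template increases, a higher ΔCt means lower IRGM expression.

```python
from statistics import mean
from typing import Sequence


def delta_ct(target_cts: Sequence[float], control_cts: Sequence[float]) -> float:
    """ΔCt = mean Ct of the target (IRGM) minus mean Ct of the control (GAPDH)."""
    return mean(target_cts) - mean(control_cts)


def relative_quantity(sample_dct: float, calibrator_dct: float) -> float:
    """Expression relative to the calibrator, 2^-(ΔCt_sample - ΔCt_calibrator).

    Assumes equal, ~100% amplification efficiency for target and control.
    """
    ddct = sample_dct - calibrator_dct
    return 2 ** (-ddct)
```

For example, a sample whose ΔCt is 2 cycles lower than that of the placental calibrator would be estimated to express about four times as much IRGM.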
Alternative splicing of IRGM mRNA was investigated by RT–PCR of cDNA from four individuals representing three genotypes at the IRGM −308 variant [(GTTT)5/(GTTT)5, (GTTT)5/(GTTT)2, (GTTT)2/(GTTT)2], using forward primer 5′-GTCTCAAATACCTGGCCCCT-3′ and reverse primer IRGMPROM_PCR_rev (Supplementary Material, Table S2). The identity of each cDNA species was confirmed by sequencing of the gel-extracted product.
Expression of IRGM and ZNF300 in HapMap3 RNA samples was analysed on Illumina human whole-genome expression arrays as previously described (26), but using Illumina WG-6 v2 arrays.
Association analysis for qualitative (CD) and quantitative (IRGM expression) trait loci, including conditional regression analysis, was performed using UNPHASED v3.0.12 (34); the conditional analyses assumed a full haplotype model. Haploview v4.1 (35) was used to calculate linkage disequilibrium coefficients (r²). All other statistical analyses were performed using R v2.7.0 (www.r-project.org). IRGM expression estimated by Q-RT–PCR (ΔCt values) was analysed by linear regression with repeated measures, to account for the multiple replicates per individual. The relationship between IRGM expression (from Illumina microarray data) and IRGM genotype in the different HapMap populations was also analysed by linear regression.
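Two of the quantities used above can be illustrated with short, self-contained calculations: the pairwise LD coefficient r² that Haploview reports, computed here from the four two-locus haplotype frequencies, and a regression of expression on genotype coded as additive risk-allele dosage (0, 1 or 2). The regression part is a simplified stand-in, not the repeated-measures model used in this study: it averages replicates per individual and then fits ordinary least squares. Additive coding and all function names are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Sequence, Tuple


def ld_r2(p11: float, p12: float, p21: float, p22: float) -> float:
    """r^2 between two biallelic loci from haplotype frequencies.

    p11 = freq(allele 1 at locus A with allele 1 at locus B), p12 = (A1, B2), etc.
    """
    total = p11 + p12 + p21 + p22
    p11, p12, p21, p22 = (p / total for p in (p11, p12, p21, p22))
    pa1 = p11 + p12  # frequency of allele 1 at locus A
    pb1 = p11 + p21  # frequency of allele 1 at locus B
    d = p11 - pa1 * pb1
    denom = pa1 * (1 - pa1) * pb1 * (1 - pb1)
    if denom == 0:
        raise ValueError("a locus is monomorphic")
    return d * d / denom


def per_individual_means(records: Iterable[Tuple[str, int, float]]) -> List[Tuple[int, float]]:
    """Collapse replicates; each record is (individual_id, dosage, expression)."""
    values: Dict[str, List[float]] = defaultdict(list)
    dosage: Dict[str, int] = {}
    for ind, d, v in records:
        values[ind].append(v)
        dosage[ind] = d
    return [(dosage[ind], mean(vals)) for ind, vals in values.items()]


def ols(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Intercept and slope of y = a + b * x by ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("no variation in genotype dosage")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b
```

A mixed model with a random intercept per individual would use the replicates directly rather than averaging them first, and is closer in spirit to the repeated-measures regression described in the text.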
Conflict of Interest statement. None declared.
This work was supported by the Wellcome Trust (081808/C.G.M.), the National Institute for Health Research Biomedical Research Centres at Guy's and St Thomas' NHS Foundation Trust in partnership with King's College London and University College Hospital Trust with University College London; and the Guy's and St Thomas' Charity. We acknowledge use of the British 1958 Birth Cohort DNA collection, funded by Medical Research Council grant G0000934 and Wellcome Trust grant 068545/Z/02. Funding to pay the Open Access charge was provided by the Wellcome Trust.