We present a genome-wide association study of ileal Crohn's disease (CD) and two independent replication studies that identify five novel regions of association to CD. Specifically, in addition to the previously established CARD15 and IL23R associations, we report strong associations with independent replication to variation in the genomic regions encoding the PHOX2B, NCF4 and ATG16L1 genes, as well as a predicted gene on 16q24.1 (FAM92B) and an intergenic region on 10q21.1. We further demonstrate that the ATG16L1 gene is expressed in intestinal epithelial cell lines and that functional knock down of this gene abrogates autophagy of Salmonella typhimurium. Together these findings suggest that autophagy and host cell responses to intra-cellular microbes are involved in the pathogenesis of CD.
Crohn's disease (CD) and ulcerative colitis (UC) represent the two common forms of idiopathic inflammatory bowel disease (IBD), each with a prevalence of roughly 100-150 per 100,000 individuals of European ancestry1. CD most commonly involves the ileum and colon but can affect any region of the gut. UC always involves the rectum, and inflammation may extend as far as the cecum in a contiguous pattern2. Strong familial aggregation, twin studies and established genetic associations attest to the important role of genetics in IBD pathogenesis3-5. There is also very strong evidence that the enteric microflora plays a central role in the initiation and maintenance of disease. Therefore, like most complex trait diseases, IBD results from a combination of genetic and non-genetic risk factors, where each individual factor may be expected to have a relatively modest effect on disease risk.
While a combination of genome-wide linkage, candidate gene and targeted association mapping studies has been successful in the identification of CD-associated genetic variants in CARD15 and the IBD5 haplotype, these explain only a small fraction of the heritability of CD6-8. We therefore embarked upon a genome-wide association (GWA) study of CD in order to find additional genetic risk factors. Phenotypes for both CD and UC vary considerably among individuals, primarily with regard to sites of inflammation, disease behavior, severity and extraintestinal manifestations. Furthermore, CD site and behavior are likely under genetic control based on clustering within affected sibling pairs9, as well as specific observations that CARD15 mutations are a greater risk factor for ileal CD and stricturing behavior10. Therefore we have exclusively focused on patients with CD involving the ileal region of the small intestine (with or without other sites of involvement) in an attempt to minimize clinical and genetic heterogeneity. Based on an interim analysis approximately halfway through this study, we identified, confirmed and published the discovery of genetic variants in the IL23R gene that significantly influence the risk of developing CD and UC11. Specifically at that point, 567 non-Jewish ileal CD cases had been scanned and analyzed. Here we report the results of the completed scan, with genotyping of 988 ileal CD patients and 1007 controls performed using the Illumina HumanHap300 Genotyping BeadChip, replication of the significantly associated SNPs in independent family-based and case-control cohorts, and preliminary functional data that begin to place these novel genes into biological context and suggest potential pathogenic mechanisms.
In order to identify genetic factors influencing the risk of developing CD, we typed 308,332 SNPs using DNA samples from 988 patients with ileal CD and 1007 controls that were collected by the members of the North American NIDDK IBD Genetics Consortium. Overall, 98.6% of genotypes were called in this experiment; however, before analysis, quality filtering of both samples and SNPs was performed to ensure robust association tests (see Methods). After performing these data filtering measures, we had a final data set of genotypes consisting of 304,413 SNPs in 946 cases and 977 controls with an average call rate of 99.35%. Our baseline analysis was a Cochran-Mantel-Haenszel (CMH) test using two groups (Jewish and non-Jewish) to accommodate potential subtle differences in the genetic background of the two groups. The median chi-square observed was slightly inflated, so we corrected all statistics using a genomic control (GC) factor of 1.056.
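The genomic control step described above can be illustrated with a minimal sketch. It assumes the common convention of estimating the inflation factor as the observed median chi-square divided by the median of a 1-df chi-square distribution (about 0.455); the function names are hypothetical and this is not the authors' code.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def gc_lambda(chi2_stats):
    """Estimate the genomic control inflation factor from 1-df test statistics."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_correct(chi2_stats, lam=None):
    """Divide every statistic by lambda (never inflating statistics if lambda < 1)."""
    if lam is None:
        lam = gc_lambda(chi2_stats)
    lam = max(lam, 1.0)
    return [x / lam for x in chi2_stats]


def chi2_1df_pvalue(x):
    """Upper-tail p-value of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(x / 2.0))
```

With the factor reported in the text, a raw statistic `x` would be rescaled as `x / 1.056` before its p-value is computed with `chi2_1df_pvalue`.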
This GC-corrected CMH analysis produced a significant excess of positive associations – for example 27 SNPs in 9 distinct genomic regions were associated with p<10−5, a level which we would informally expect to observe roughly 3 times by chance in this scan given the number of tests performed (See Fig. 1 and Table 1). The two most significant loci, with multiple SNPs being significantly associated, corresponded to the CARD15 and IL23R genes that have previously been confirmed as IBD genes. The remaining regions identified with extreme p-values were all novel, showed no unusual pattern of missing data or departure from Hardy-Weinberg equilibrium (HWE) in controls (of the best SNPs in the top 23 regions examined further below, all had HWE-p > .01 in controls and 18 were missing less than 1% genotypes), and many were supported by nearby correlated SNPs also showing association. It should be noted that the confirmed IBD5 CD risk haplotype was not among the top loci in this study of ileal CD as it only had modest evidence of association (p<0.01).
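The informal expectation of roughly three chance findings at p<10−5 follows from multiplying the number of SNPs tested by the threshold. The sketch below reproduces that arithmetic and, as a rough illustration only, treats the 9 associated regions as independent Poisson events; that independence assumption is ours, not the authors', and ignores correlation between neighboring SNPs.

```python
import math


def expected_hits(n_tests, alpha):
    """Expected number of tests below alpha under the global null."""
    return n_tests * alpha


def poisson_tail(lam, k_min):
    """P(X >= k_min) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(k_min))
    return 1.0 - cdf


# 304,413 SNPs passed quality filtering; threshold p < 1e-5.
lam = expected_hits(304_413, 1e-5)   # about 3 expected by chance
p_nine_regions = poisson_tail(lam, 9)  # chance of 9 or more independent regions
```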
While certainly encouraging, even an overall high-quality data set and robust analyses cannot be assumed to perfectly filter out all artifactual sources of false positives. We therefore tested the most significantly associated SNP in the 23 independently associated regions with p<5×10−5 in an independent set of 650 individuals with ileal CD and their parents (Replication Cohort #1), to attempt replication of these findings in a robust family-based setting and with an alternative genotyping technology (iPlex; Sequenom). Although we did not succeed in typing three of these SNPs (and the replication of CARD15 and IL23R had already been established, so these are not considered here) the remaining 18 SNPs had high quality genotype data that passed our quality thresholds. After removal of DNAs with low genotype call rates and families with an excess of Mendel errors, the final analyses included 530 trios. Association testing of this final dataset from Replication Cohort #1 identified that six out of the 18 SNPs had significant evidence of replication (Table 1; see Supplementary Table 1 online): rs2241880 (p=0.00089); rs16853571 (p=0.00037); rs224136 (p=0.00021); rs4821544 (p=0.019); rs8050910 (p=0.024); rs11617463 (p=0.037). To provide further evidence for the association of these replicated SNPs, we typed these six in another independent cohort of 353 patients with ileal CD and 207 controls (Replication Cohort #2) on yet another independent genotyping platform (Taqman; Applied Biosystems). One of these failed to type but the other five were successfully typed and provided additional evidence of replication (Table 1; see Supplementary Table 1 online). Considering the two replication studies together (using Mantel-Haenszel statistics), two of these SNPs were unequivocally confirmed as novel CD risk factors (rs2241880, p<5×10−8; rs224136, p<5×10−7).
In combination with the genome-wide scan data, these two SNPs are considerably in excess of the most conservative genome-wide thresholds (rs2241880, p<10−13; rs224136, p<10−10). Three additional loci (rs16853571, rs4821544, and rs8050910) showed very strong evidence of replication (p<0.01). Even when setting aside the two most significantly associated novel SNPs (rs2241880 and rs224136), observing replication of 3/16 SNPs with p<0.01 is highly unlikely to occur by chance (p=0.0005). Since there is only a 1% chance of observing even two such results by chance, it is highly likely that these three SNPs represent true CD susceptibility variants, but further confirmation is required to achieve the same level of certainty as attained for the IL23R, rs2241880 and rs224136 SNPs identified in this study.
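The two chance probabilities quoted above (p=0.0005 for three or more replications, and about 1% for two or more) are consistent with a simple binomial calculation over 16 independent SNPs, each with a 0.01 probability of replicating at p<0.01 under the null. The sketch below is our reconstruction of that arithmetic under the independence assumption, with a hypothetical helper name.

```python
from math import comb


def binom_tail(n, k_min, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# 16 remaining SNPs, each replicating at p < 0.01 by chance with probability 0.01.
p_three_or_more = binom_tail(16, 3, 0.01)  # approximately 0.0005
p_two_or_more = binom_tail(16, 2, 0.01)    # approximately 0.01
```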
Given the significant evidence of replication for these five novel loci, we examined the linkage disequilibrium structure surrounding the replicated SNPs using the genotype data from the GWA screen, and examined the association mapping data from the GWA, replication studies as well as some additional higher density typing in order to map these association signals to specific genomic regions and genes. As shown in Supplementary Figure 1 online, in four of these instances, the association mapping implicates specific genes, whereas in the fifth it implicates an intergenic region.
Specifically, the LD structure and association mapping around the most associated SNP (GWA, p=6.4E-08; Replication, p=4.1E-08), rs2241880, implicates a region on chromosome 2q37.1 containing a single gene known as ATG16 autophagy related 16-like 1 (S. cerevisiae) or ATG16L1 gene. In fact, this SNP encodes a non-synonymous amino acid change – an alanine to threonine substitution in exon 8 – Ala197Thr. Logistic regression analyses conditional on the Ala197Thr in the family-based samples indicate that this coding variant can fully explain the association signal to this locus – therefore we consider this to be the causal risk variant. In all of the datasets tested in this study, the threonine allele is the minor allele and has a protective effect; in the GWA dataset the estimated odds ratio is 0.692 (95% CI of 0.608-0.788). Interestingly, the threonine is the ancestral allele and is the major allele in the YRI, JPT, and HCB samples typed in the International HapMap Project (see Supplementary Fig. 2 online). ATG16L1 is part of a family of genes involved in autophagy – a biological process involved in protein degradation, antigen processing, regulation of cell signaling and many other pathways essential to the initiation and regulation of the inflammatory response – and therefore this association suggests that this biological process is likely to have an important role in disease pathogenesis.
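As a back-of-envelope reading of the reported odds ratio and confidence interval, the standard error of the log odds ratio can be recovered from the interval width, assuming a symmetric Wald interval on the log scale. This is an illustrative consistency check with a hypothetical helper, not the authors' calculation; the GWA p-value in the text comes from the GC-corrected CMH test and need not match exactly.

```python
import math


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover SE(log OR), the Wald z-score and a two-sided p-value from a 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# Values reported for the threonine allele of rs2241880 in the GWA dataset.
se, z, p = wald_from_ci(0.692, 0.608, 0.788)
```

A negative `z` here corresponds to the protective direction of the minor allele (odds ratio below 1).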
The next SNP with incontrovertible association to ileal CD is rs224136 (GWA, p=7.9E-06; Replication, p=2.9E-07). The association mapping in this case, however, implicates a genomic region on chromosome 10q21.1 of approximately 70 kb that does not contain any known protein coding genes. This associated intergenic region is flanked on the centromeric side by the zinc finger gene known as ZNF365 and on the telomeric side by a predicted protein (c10orf22) and by the known early growth response gene EGR2, also a zinc finger protein. Although there is little known about the ZNF365 and c10orf22 genes, mutations in EGR2 are known to lead to the development of congenital hypomyelinating neuropathy (MIM#605253) or of Charcot-Marie-Tooth disease (MIM#607678). Potentially more relevant, however, is the observation that EGR2 appears to be a negative regulator of T cell activation in mice12, and although speculative, genetic variation within this intergenic region could play a role in the control of expression of EGR2 and influence biological function.
The replicated SNP rs16853571 (GWA, p=7.7E-07; Replication, p=8.4E−03) is located in the promoter region of the Paired-Like Homeobox 2B (PHOX2B) gene on chromosome 4p13 (see Supplementary Fig. 1 online) - 2143 bp upstream of the first known exon and 10 bp upstream of a putative alternate first exon. PHOX2B is a homeodomain transcription factor, predominantly expressed in differentiating neurons. Human mutations in the PHOX2B gene have been implicated in disorders of neural crest development, congenital central hypoventilation syndrome and Hirschsprung disease13,14. Inactivation of PHOX2B in mice disrupts noradrenergic differentiation throughout the nervous system. Autonomic ganglia fail to develop properly and these mice have striking defects in enteric neurons14-16.
The genomic region identified by rs4821544 (GWA, p=2.9E-05; Replication, p=9.0E-03) also contains a single gene known as NCF4 (rs4821544 itself is in intron 1). NCF4 encodes the p40phox protein that has previously been demonstrated to play an important role in NADPH oxidase activity and the production of reactive oxygen species (ROS) upon phagocytosis17,18; both important for mounting an effective anti-microbial response.
Finally, rs8050910 (GWA, p=3.3E-05; Replication, p=8.5E-03) is located within the 4th intron of the predicted gene FAM92B located on chromosome 16q24.1. This gene prediction is supported by multiple mRNAs and spliced ESTs, as well as significant conservation across multiple species (data not shown). This gene encodes a protein with no known function or recognizable motifs.
It should be noted that when using a stratified analysis approach, none of these three loci appeared to have any statistically significant epistatic interactions with the CARD15, IBD5 or IL23R genes (data not shown). Though some association with UC is seen for rs224136, little or no evidence is seen at the other newly identified SNPs (see Supplementary Table 1 online).
Since we were interested in determining more about ATG16L1's function and biological context, we examined the expression distribution of ATG16L1 and other known autophagy components by quantitative real-time RT-PCR in a variety of epithelial and immune cell lines (see Fig. 2A, B and C). ATG5, 7 and 16 were broadly expressed, with SW480 cells having the lowest overall mRNA abundance relative to GAPDH controls. Apart from the markedly elevated levels of ATG7 seen in THP-1 cells, expression of each of the three components followed a similar pattern in the cells tested, as might be suggested by the stoichiometric relationship between these components in the homeostatic autophagy process. We also investigated the expression of ATG16L1 in primary human immune cells, to gain a better understanding of its role in the association with CD (see Fig. 2D). Amongst samples tested, we found that ATG16L1 was most highly expressed in the T cell compartment, with both CD4+ and CD8+ cells showing high levels (approximately 13- and 10-fold higher than placental control RNA, respectively). CD19+ cells (B cells) also showed almost 5-fold greater expression of ATG16L1 than both mononuclear cells and placental control RNA. Thus it is possible that ATG16L1 and its variants play roles in both the epithelial- and immune-driven aspects of CD.
PHOX2B is known to be expressed in the autonomic nervous system and enteric neurons16. Attempts to determine expression using real-time RT-PCR were unsuccessful, most likely due to expression being confined to specific cell types and therefore overall transcript levels being low in the tissues studied. To overcome this limitation we obtained a specific antibody and stained mouse and human gut samples (ileum and colon respectively). These clearly showed that the PHOX2B protein was expressed in a specific subset of epithelial cells, possibly neuroendocrine cells (See Supplementary Fig. 3 online).
NCF4 encodes p40phox, a component of the NADPH oxidase complex required for optimal ROS generation in immune cells. Expression is restricted to haematopoietic cells including neutrophils, monocytes, eosinophils, mast cells and basophils (See Supplementary Figure 4 and reference 19). The p40phox protein serves to enhance delivery of p47 and p67phox to the membrane and thus promotes high-level ROS generation. However, its role in cells that do not generate high levels of ROS and lack expression of p47 and p67phox remains unknown.
Although the yeast homologue of human ATG16L1 is required for autophagy20, it was not known whether the mammalian isoforms were similarly essential. We addressed this by utilizing oligo-based siRNA directed against ATG16L1 isoforms 1 and 2. Using co-transfection of a flag-tagged ATG16L1 expression plasmid and siRNA oligos we established specific knockdown of ATG16L1 in HEK293 cells (see Fig. 3A). In order to study the role of ATG16L1 in autophagy during a well-characterized host-pathogen interaction, we performed ATG16L1 knockdown in HeLa cells, followed by S. Typhimurium infection. We confirmed efficient knockdown of endogenous ATG16L1 in HeLa cells by real-time RT-PCR (see Fig. 3B). Since siRNA 2 was the most effective, this was used in subsequent experiments, co-transfected with a plasmid expressing GFP-LC3, a marker for the autophagic compartment21. It is known that a subpopulation of internalized Salmonella is targeted by autophagy in this system22. Infections with S. Typhimurium showed a marked difference between control and siRNA 2-treated cells: control cells targeted a mean of 17.5% (+/− 1.3 s.e.m.) of intracellular bacteria to autophagic LC3+ vacuoles within 1 hour, while siRNA 2 transfection reduced this to only 2% (+/− 0.5 s.e.m.) (see Fig. 3C). The autophagic targeting rates in control-treated and ATG16L1-knockdown cells correspond well to those seen in ATG5+/+ and ATG5−/− mouse embryonic fibroblasts (20% and 3%, respectively)22, confirming that, like ATG5, ATG16L1 is required for autophagy. Confocal imaging of infected cells revealed a near-complete lack of LC3 localisation with intracellular bacteria in ATG16L1-knockdown cells, despite the presence of abundant cytoplasmic LC3-GFP and complete envelopment of targeted bacteria in control cells (see Fig. 3D). In addition, we utilized classical autophagic stimuli to demonstrate the inhibitory effect of ATG16L1 knockdown (see Supplementary Fig. 5 online); both serum starvation and rapamycin treatment for 24 hours induced LC3-GFP vesicles only in control siRNA transfected cells, not those with ATG16L1 knockdown. Inhibition of lysosome fusion by ammonium chloride also induced rapid accumulation of LC3+ vesicles in control cells, but these structures were not found in ATG16L1-siRNA treated cells (see Supplementary Fig. 5 online).
The potential of GWA studies to discover modest genetic risk factors in complex disease has been widely debated for several years. The simultaneous emergence of resources, such as the data from the International HapMap Project and dramatic leaps in genotyping technology (both clearly motivated by this potential), has now enabled the actual test of this hypothesis. In the case of CD, years of research effort by a substantial international community of researchers had identified only two uniformly replicated, genuinely associated, genetic risk factors for this disease (CARD15 and IBD5). Encouragingly, after only the first, moderately-sized GWA study and modest replication attempts, compelling evidence that this list has now grown substantially has been provided, as unequivocally strong evidence of replication has emerged from this study for variation at IL23R, ATG16L1 and an intergenic region on 10q. Moreover, the excess of additional low p-values in this study – in particular three additional top-ranking SNPs (in PHOX2B, NCF4, and FAM92B) which replicated at p<.01 - along with the fact that other, large GWA attempts are ongoing, offers the potential that the list of replicated loci will grow even further in a short period of time. In fact, since the submission of the current study, reports describing genome-wide association studies of 735 German patients with CD and of 547 Belgian patients with CD have been published23,24. In the German study, after association analyses of 7,159 nonsynonymous SNPs, the Ala197Thr in ATG16L1 was found to be associated with CD, confirming this gene's involvement in CD susceptibility. In contrast to our findings, however, this group reports an epistatic interaction with CARD15.
It should be noted, however, that rather than aberrant joint frequency estimates in cases (as would be expected in the case of epistasis promoting a rare disease), the interaction signal in that study was mostly due to deviation from expectation in the control genotypes. Although this is possible for some genetic models, it requires a careful investigation. Furthermore, since different IBD genes show different degrees of phenotypic specificity, genes which promote the same specific sub-phenotype may be perceived as interacting if that specificity is not accounted for in the analyses. In the Belgian study, using the Illumina HumanHap300 Genotyping BeadChip, the authors identified an associated SNP (rs1373692) that implicated the prostaglandin receptor EP4 (PTGER4) gene in CD pathogenesis. In the current study, we also found evidence of association of the rs1373692 SNP with ileal CD (p=5.37E-04), as well as of a SNP (rs4613763; p=1.21E-04) less than 40kb away, providing the first replication of this finding. Full release and integration of results from these and other ongoing studies seems likely to add further to this burgeoning list of confirmed associations.
It is without doubt, however, that there are limitations to the GWA approach, just like any other genetic approach, and therefore complementary strategies for gene discovery will continue to be necessary25. One limitation is related to our ability to type large enough cohorts to have sufficient statistical power to detect loci of very modest genetic effect25. One example from our current study is the lack of genome-wide significant association of the IBD5 haplotype, despite the fact that this is a confirmed CD risk haplotype26 and despite the fact that in a targeted replication study performed with the Replication Cohort #1 presented in the current study we find significant evidence of replication27. It should be noted, however, that the strength of the association of IBD5 in the current GWA study is consistent with the modest risk conferred by this locus28. While recognizing the limitations inherent to the GWA study design, these current and future association findings also have the potential for dramatically changing our understanding of the biological processes that are crucial to disease pathogenesis, given the fact that genome-wide searches are not biased by our current understanding of pathogenic mechanisms. A prime example of this is our discovery that the ATG16L1 gene is associated with IBD, given that neither this gene nor the biological pathway of autophagy to which it belongs had been implicated in IBD pathogenesis before the current generation of genome scans.
Autophagy is a constitutive process required for proper cellular homeostasis and organelle turnover. However, recent data have revealed a role for autophagy in innate and adaptive immune responses to pathogens29. This places autophagy as a gate-keeper for innate immune ligands and antigen presentation from intracellular compartments. So far the role of autophagy in innate immune recognition has been little studied, but several systems have been identified which show a potential role in host-defense30-32. We have exploited a model system using the invasive bacterium S. Typhimurium in cultured human epithelial cells22 to demonstrate that in addition to classical autophagic stimuli, such as serum starvation and rapamycin, ATG16L1 is required for autophagic targeting of a subset of intracellular Salmonella exposed to the cytoplasm. While we do not propose that this represents the physiological stimulus involved in CD, we do believe it provides initial evidence that ATG16L1 variants with altered autophagic efficiency/efficacy might alter either bacterial replication and immune control, or delivery of antigens to adaptive immune pathways. For example, interactions between innate immunity, the inflammasome and autophagy have been proposed for Legionella pneumophila33. The proposed mechanism relies upon a NOD-LRR family member (Naip5) to detect invading L. pneumophila and subsequently activate autophagy, or if Naip5 signalling is sustained, proinflammatory cell death via caspase 1. Associations between the response to microbial products and chronic intestinal inflammation have been noted for flagellin, with both colitic mice and CD patients exhibiting elevated anti-flagellin serum IgG34. Since it is now known that TLR7-mediated detection of viral replication is autophagy-dependent35, it is tempting to speculate that differential autophagic effectiveness might enhance defects in NOD-LRR family based signaling from intracellular pathogens.
Adaptive responses are also influenced by autophagic processing, since HLA class II molecule loading occurs in lysosomes and autophagy delivers cytoplasmic components into this compartment. Antigen loading via this pathway has been observed in professional antigen presenting cells as well as epithelial cells36. In cells with low levels of endocytosis, such as epithelial cells, this is likely to be an important route for antigen presentation and immune surveillance37. It is possible that differential rates or substrate specificities induced by variation within the autophagic apparatus will result in differential antigen presentation, as has been observed for B cells in starvation stress37. In many models of CD and other inflammatory disorders, T-cells are the primary effectors and autophagy is a vital process for T cell maintenance and homeostasis, as demonstrated by the increased cell death seen in ATG5−/− T cells and their inability to proliferate following stimulation38. T cells may also be stimulated directly by bacterial products in the gut microenvironment, demonstrated by the discovery of the I2 superantigen39. Given the role of autophagy in controlling T-cell death and proliferation, ATG16L1-mediated alterations in the proliferative and survival abilities of T cells provide a possible explanation for some of the T-cell driven pathology observed in CD. The relationships between T cell survival and regulation and the ATG16L1 variants discussed here will be an important area of further study.
Although our observation of potential association of the NCF4 gene to ileal CD is not unequivocally confirmed, the known functions of this gene warrant further examination. In particular, previous studies of the p40phox protein encoded by the NCF4 gene have demonstrated that it plays an important role in NADPH oxidase activity and the production of reactive oxygen species (ROS) upon phagocytosis17,18; both important for mounting an effective anti-microbial response. An alteration in gene expression could potentially affect phagosome function, leading to less effective killing of phagocytosed microbes, and could result in prolonged immune activation, incomplete pathogen clearance or might influence TLR activation and antigen presentation due to inappropriate ROS generation40,41. The constant exposure to bacterial challenge and high antigen load in the gut may pose special difficulties for even slightly altered innate immune responses, resulting in gut-restricted phenotypes. Such gut restriction has been previously observed for much more severe phox deficiencies in a number of chronic granulomatous disease (CGD) patients, whose initial symptoms resulted in a diagnosis of CD. Thus we propose that mutations in NCF4 might interfere with the stoichiometry of interaction with p67 and other p47phox interacting partners resulting in relatively moderate phagosome dysfunction phenotypes sufficient to trigger the onset of IBD.
Although additional functional studies are required to place all of these novel loci into a biological context that will enable a deeper understanding of their roles in normal homeostasis and disease pathogenesis, there can be little question that the opportunity to make advances in our understanding and eventual treatment of CD is being offered by the successful application of robust GWA studies.
The Screening Cohort used for this GWA study consisted of 988 patients with ileal CD and 1007 controls from the North American NIDDK IBD Genetics Consortium (IBDGC). Cases and geographically matched controls were ascertained through the Cedars-Sinai Medical Center, Johns Hopkins University, University of Chicago, University of Montreal, University of Pittsburgh, and the University of Toronto Genetics Research Centers (GRCs), with additional age and ethnicity-matched controls provided by the New York Health project. In all cases, informed consent was obtained using protocols approved by the local institutional review board in all participating institutions. The diagnosis of IBD requires a) one or more symptoms of diarrhea, rectal bleeding, abdominal pain, fever, or complicated perianal disease, b) occurrence of symptoms on two or more occasions separated by at least 8 weeks or ongoing symptoms of at least 6 weeks duration, and c) objective evidence of inflammation from radiologic, endoscopic, and histologic evaluation. Ileal CD involvement was defined as mucosal ulceration, cobblestoning, stricturing or bowel wall thickening from endoscopy reports, barium X-rays, operative reports and/or pathology resection specimen reports. This was an inclusive definition where individuals with either “ileal only” or “ileo-colonic” were part of the “ileal CD” category. We previously reported the association between the IL23R gene and IBD from a genome-wide study of 567 CD patients and 571 controls11. These samples were included in the current study as well as an additional 421 ileal CD and 436 control individuals that were ascertained in an identical manner.
Replication Cohort #1 consisted of 883 nuclear families, which were independent of the Screening Cohort. These study subjects were collected by the same IBDGC Genetics Research Centers as described above, were previously described11 and included 650 individuals with ileal CD and their parents. The Replication Cohort #1 was typed for 19/23 SNPs that had significant association (p<5E-05) in the GWA screen (the CARD15 and IL23R had previously been typed and two other SNPs failed at the assay design stage).
Replication Cohort #2 consisted of 978 patients with IBD (353 of which had ileal CD as defined above) and 207 control individuals that were ascertained as part of ongoing genetic studies at the Inflammatory Bowel Center at Cedars-Sinai Medical Center, Los Angeles, California. Recruitment of subjects has been approved by the Cedars-Sinai Medical Center Institutional Review Board. Replication Cohort #2 was typed for the six SNPs that had significant evidence of replication (p<0.05) in Replication Cohort #1 (rs2241880, rs16853571, rs224136, rs4821544, rs8050910, and rs11617463).
For the genome-wide association study, approximately 750ng of genomic DNA was used to genotype each sample on the Illumina HumanHap300 Genotyping BeadChip (Illumina, San Diego) at the Feinstein Institute for Medical Research. Samples were processed according to the Illumina Infinium 2 assay manual. Briefly, each sample was whole-genome amplified, fragmented, precipitated and resuspended in appropriate hybridization buffer. Denatured samples were hybridized on prepared HumanHap300 beadchips for a minimum of 16 hours at 48°C. Following hybridization, the beadchips were processed for the single base extension reaction, staining and imaging on an Illumina Bead Array Reader. Normalized bead intensity data obtained for each sample were loaded into the Illumina Beadstudio 2.0 software, which converted fluorescent intensities into SNP genotypes. The replication study of all of the putative loci (Table 1), and the additional genotyping of the ATG16L1 region, were performed using primer extension chemistry and mass spectrometric analysis (iPlex assay, Sequenom, San Diego) using Sequenom Genetics Services (Sequenom, San Diego). TaqMan MGB technology was used to perform the genotyping of the six variants with nominal replication in replication study #1 using the design available from Applied Biosystems and following the manufacturer's recommendations (Bulletin #4317594) in the Cedars-Sinai Inflammatory Bowel Disease Center cohort. Quality of the SNP data was checked by reproducibility with 5% of samples duplicated.
Overall, 98.6% of genotypes were called in this experiment; however, before analysis, quality filtering of both samples and SNPs was performed to ensure robust association tests. Based on an evaluation of empirical distributions, data quality, and likely introduction of false positive associations, we required that samples pass a 93% genotyping call rate threshold and SNPs pass a 95% call rate threshold in order to be included in the analysis (see Supplementary Fig. 6 online). The data from the genome-wide association studies were used to detect possible relatedness in the case-control cohorts. The Hardy-Weinberg equilibrium (HWE) test was performed on the genotype data from controls, and we investigated the relationship between HWE and genotype yield (call rate). The call rate distribution suggested 93% and 95% genotype completion thresholds for inclusion of samples and SNPs, respectively, in the genetic association analyses. At sample call rates of 90% and below, there was an elevation in observed heterozygosity, suggesting either a bias in missing data or the presence of false heterozygous calls. To avoid this, we selected a threshold greater than 90%, at a point consistent with the tail of successful samples. Below a SNP success rate of 95% there was an excess of markers out of HWE (P-value <0.001 in controls) and a significant bias in missing data between cases and controls. We looked at segments of the data just above and below this cutoff and observed that SNPs with a 90-95% call rate had much more substantial inflation than those in the 95-97% call rate range (genomic control correction 1.34 vs. 1.16), indicating a significant excess of false positives due to lower data quality or biases in missing data, so we opted not to lower this threshold further.
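The call rate filters and the genomic control inflation factor described above can be illustrated with a minimal sketch. The genotype matrix layout, the function names and the order of filtering (samples first, then SNPs on the retained samples) are assumptions made for illustration and are not taken from the study's pipeline; the default thresholds are the 93% and 95% values stated above, and the inflation factor is the conventional ratio of the median observed 1-df chi-squared statistic to its expected median.

```python
import statistics
from typing import List, Optional, Sequence, Tuple

Genotype = Optional[int]  # 0, 1 or 2 copies of an allele; None means no call

# Median of a chi-squared distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of non-missing genotype calls."""
    if not calls:
        return 0.0
    return sum(g is not None for g in calls) / len(calls)


def filter_by_call_rate(
    matrix: List[List[Genotype]],
    sample_min: float = 0.93,
    snp_min: float = 0.95,
) -> Tuple[List[int], List[int]]:
    """Return (kept_sample_indices, kept_snp_indices).

    matrix[i][j] is the genotype of sample i at SNP j. Samples are
    filtered first; SNP call rates are then computed on the retained
    samples only (an assumed ordering, not stated in the text).
    """
    kept_samples = [i for i, row in enumerate(matrix) if call_rate(row) >= sample_min]
    n_snps = len(matrix[0]) if matrix else 0
    kept_snps = [
        j
        for j in range(n_snps)
        if call_rate([matrix[i][j] for i in kept_samples]) >= snp_min
    ]
    return kept_samples, kept_snps


def genomic_control_lambda(chi2_stats: Sequence[float]) -> float:
    """Inflation factor: median observed 1-df chi-squared over its expected median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
```

Computing the inflation factor separately for SNPs binned by call rate (for example 90-95% versus 95-97%) would give a comparison of the kind reported above.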
As a final step, using identity-by-state (IBS) counts from this data set, we identified 8 duplicate samples and 10 additional pairs of samples that shared 10% or more of their genome by descent (first cousins through full siblings); we eliminated one member of each of these 18 pairs of samples. The data from the Jewish and non-Jewish case-control cohorts were analyzed jointly using a Cochran-Mantel-Haenszel chi-squared test; we used the test as implemented in R (http://www.r-project.org/) with the option that gives an exact p-value.
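To make the stratified analysis concrete, the following is a minimal sketch of the Cochran-Mantel-Haenszel statistic for a set of 2×2 tables, one per stratum (here, the Jewish and non-Jewish cohorts). It computes the asymptotic chi-squared form with an optional continuity correction, not the exact conditional test used in the study; the function name, the table layout and the example counts in the test are hypothetical.

```python
import math
from typing import List, Tuple

# One 2x2 table per stratum: (a, b, c, d) =
# (case allele 1, case allele 2, control allele 1, control allele 2)
Table = Tuple[int, int, int, int]


def cmh_test(strata: List[Table], correct: bool = True) -> Tuple[float, float, float]:
    """Return (chi2, p_value, mantel_haenszel_odds_ratio).

    Asymptotic 1-df Cochran-Mantel-Haenszel test across strata.
    """
    sum_a = sum_expected = sum_var = or_num = or_den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        sum_a += a
        sum_expected += (a + b) * (a + c) / n
        sum_var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
    if sum_var == 0:
        raise ValueError("no informative strata")
    diff = abs(sum_a - sum_expected)
    if correct:
        diff = max(0.0, diff - 0.5)  # continuity correction
    chi2 = diff * diff / sum_var
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = or_num / or_den if or_den > 0 else math.inf
    return chi2, p_value, odds_ratio
```

Pooling evidence within strata in this way avoids the confounding that could arise if allele frequencies and case-control ratios both differ between the two cohorts.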
For the replication study, the most significantly associated SNP from each of the top 23 ranking independently associated regions (p-value=5×10^-5 in the combined screening analyses) was selected. Of these, 2 failed in assay design, one failed to type, and 2 were located within previously reported associated loci, leaving 18 SNPs in the replication analysis. In addition to these, 19 SNPs were selected to test the variability in the ATG16L1 region. This set of SNPs was evaluated for replication in a family-based ileal-CD cohort composed of 650 mother-father-affected offspring trios. The genotype data from the family-based cohorts were used in determining Mendelian inconsistencies and departures from HWE. After removal of failing SNPs (monomorphic, call rate below 75%, or causing a high number of Mendel errors), families with excess Mendelian inconsistencies and uninformative families (where a parent or the sole affected offspring had a genotyping call rate below 90%), there remained 530 affected trios (from 431 independent nuclear families) with both parents and at least one affected offspring genotyped. Following the removal of additional SNPs causing excess Mendel errors, the final genotyping call rate for each of the 18 replication markers, as well as the 19 ATG16L1 tagging SNPs, ranged from 90-100%, and Mendelian inconsistencies numbered fewer than 4 for every marker and fewer than 4 per family. Single marker association tests were performed in the family data using PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/). We then tested the replicated SNPs in the set of 530 independent IBD trios (post-QC) from the NIDDK IBDGC and in the independent Cedars-Sinai cohort of 350 patients with ileal CD and 207 controls; one-tailed p-values are reported. The analysis of ATG16L1 SNPs conditional on Ala197Thr was performed by logistic regression in the family-based sample using the software WHAP (http://pngu.mgh.harvard.edu/~purcell/whap).
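The text does not specify which of PLINK's family-based tests was applied. As an illustration only, the sketch below assumes a transmission disequilibrium test (TDT) on allele transmissions from heterozygous parents and shows how a one-tailed p-value can be derived when the direction of effect is fixed in advance by the screening result. The function name and the counts in the test are hypothetical.

```python
import math
from typing import Tuple


def tdt(
    transmitted: int,
    untransmitted: int,
    expect_overtransmission: bool = True,
) -> Tuple[float, float, float]:
    """Return (chi2, two_tailed_p, one_tailed_p) for a single marker.

    transmitted / untransmitted: number of times heterozygous parents
    did or did not transmit the putative risk allele to an affected child.
    expect_overtransmission: direction predicted by the discovery scan.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_two = math.erfc(math.sqrt(chi2 / 2.0))
    if expect_overtransmission:
        in_expected_direction = transmitted >= untransmitted
    else:
        in_expected_direction = transmitted <= untransmitted
    # Halve the two-tailed p-value only when the effect is in the
    # predicted direction; otherwise the one-tailed p-value is large.
    p_one = p_two / 2.0 if in_expected_direction else 1.0 - p_two / 2.0
    return chi2, p_two, p_one
```

A one-tailed test is appropriate in replication only because the risk allele is specified before the replication data are examined.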
RNA templates were purified from cell lines using the RNeasy miniprep kit (Qiagen). In the case of human primary immune cell subsets, RNA was purchased from a commercial supplier (Human Blood Fractions MTC Panel, Clontech). In all cases cDNA was made from RNA templates using a standard reverse transcriptase protocol (iScript, BioRad), with 500 ng of RNA per reaction. The resulting template cDNAs were diluted 1:10 in nuclease-free water and 10 μl of this template was used for each 40 μl qPCR reaction. Quantitative RT-PCR was performed using the real-time SYBR Green method on a BioRad iQ5 thermal cycler, using iQ SYBR Green Supermix (BioRad) and specific primers. Primer sequences were selected using MGH Primerbank (http://pga.mgh.harvard.edu/primerbank/); all primer sequences are shown in Supplementary Table 2. PCR products were visualized on a 2% agarose gel to confirm correct band sizes. Each reaction was performed in duplicate, with final calculations based on the means of duplicate wells. Normalization for cDNA quantity was performed with GAPDH control primers for each template, and final abundance figures were adjusted to yield an arbitrary value of 1 for control templates (SW480 for cell lines or placental control RNA for primary cells) using the delta-delta Ct method.
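The normalization described above can be written out as a short worked sketch. It averages duplicate wells, normalizes the target gene to GAPDH, and expresses abundance relative to a control template that is set to 1. The assumption of perfect doubling per cycle (an amplification factor of 2), the function name and the Ct values in the test are illustrative and are not taken from the study.

```python
from statistics import mean
from typing import Sequence


def relative_abundance(
    target_ct: Sequence[float],
    gapdh_ct: Sequence[float],
    control_target_ct: Sequence[float],
    control_gapdh_ct: Sequence[float],
    amplification_factor: float = 2.0,
) -> float:
    """Relative expression by the delta-delta Ct method.

    Each argument holds the Ct values of replicate wells (duplicates here).
    The control template (e.g. SW480 or placental RNA) evaluates to 1.0
    when compared against itself.
    """
    # Delta Ct: target gene normalized to GAPDH within each template.
    delta_sample = mean(target_ct) - mean(gapdh_ct)
    delta_control = mean(control_target_ct) - mean(control_gapdh_ct)
    # Delta-delta Ct: sample relative to the control template.
    delta_delta = delta_sample - delta_control
    # A lower Ct means more template, hence the negative exponent.
    return amplification_factor ** (-delta_delta)
```

For example, a sample whose GAPDH-normalized Ct is two cycles lower than that of the control template has a relative abundance of 4 under the doubling assumption.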
Paraffin-embedded tissues from both mouse ileum and human colon biopsies were dewaxed, rehydrated, blocked in serum and stained using rabbit anti-PHOX2B antibody (Sigma), incubated overnight at a dilution of 1:500. Tissues were subsequently incubated with biotinylated anti-rabbit secondary antibody and Vector ABC reagent, and finally developed with DAB substrate (Vector Laboratories) as has been previously described42. Sections were mounted and observed under phase contrast to visualize tissue architecture.
HeLa cells were obtained from ATCC and cultured in DMEM supplemented with 10% iron-supplemented calf serum (CSFe) (Hyclone) and 20 μg/ml gentamicin sulfate (Sigma). Salmonella enterica subsp. enterica serovar Typhimurium (S. Typhimurium), strain SL1344, was transformed with a DsRed2 expression plasmid (Clontech, CA) to facilitate bacterial visualization (DsRed2 is under the control of the lac promoter, which is constitutive in the absence of lacI). The plasmid was maintained with 100 μg/ml ampicillin selection during culture in LB broth.
We obtained modified siRNA duplexes (Stealth siRNA) directed against ATG16L1 (siRNA 1 & 2) as well as a non-targeting scrambled sequence (siRNA control) from a commercial source (Invitrogen). Cells were co-transfected in the absence of antibiotics with siRNAs and plasmids using Lipofectamine (Invitrogen) according to the manufacturer's instructions. We optimized siRNA conditions using a Flag-tagged ATG16L1 overexpression construct in HEK293 cells, allowing us to easily assess knockdown by Western blotting. HEK293 cells grown in 12-well plates were transfected with 500 ng Flag-ATG16L1 expression plasmid and 20 pmol siRNA duplex. For knockdown of endogenous ATG16L1, HeLa cells were plated onto 18 mm glass coverslips in 12-well plates at a density of 1×10^5 cells per well and allowed to grow for 24 hours prior to transfection. Each well received 500 ng of GFP-LC3 plasmid and 20 pmol siRNA duplex. Four hours later the medium was changed again and the cells were allowed to grow for a further 48 hours. Knockdown was confirmed for each experiment using real-time quantitative RT-PCR with specific primers to ATG16L1, normalized to GAPDH controls, with RNA isolated from specimen wells.
Autophagy was induced in HeLa cells using serum starvation or rapamycin. Cells were transfected as described above and after 48 hours the medium was changed to either 1% serum, or 10% serum plus 200 nM rapamycin, for a further 24 hours. Ammonium chloride treatment was performed with a final concentration of 50 mM for 2 hours in 1% serum medium. S. Typhimurium infections were performed as previously described, with slight modifications43. Briefly, S. Typhimurium SL1344 carrying a DsRed2 expression plasmid was grown overnight in LB broth containing 100 μg/ml ampicillin at 37°C with aeration and subcultured at a dilution of 1:33 for a further 3 hours in LB. This culture was further diluted in DMEM with 10% CSFe without antibiotics, to yield an m.o.i. of 100. Infections were allowed to proceed for 20 minutes, cells were washed once in complete medium containing 100 μg/ml gentamicin sulfate, and then incubated in fresh high-gentamicin medium for 1 hour.
Infected, autophagy-induced and control HeLa cells were fixed in 4% formalin in PBS for 15 minutes, washed in PBS and mounted in aqueous mountant (Polysciences, PA). Slides were then either viewed under wide-field fluorescence illumination (Zeiss Axioplan) for counting, or laser scanning confocal microscopy (BioRad Radiance 2000) was used to obtain high-resolution z-stacks, which were subsequently projected onto single images using LSM Image software (Carl Zeiss GmbH, Germany). The total number of bacteria per cell and the number of LC3-GFP positive bacteria were assessed in randomly chosen fields, with at least 100 cells counted for each condition. The numbers of LC3-GFP positive bacteria were then calculated as a percentage of total bacteria. Significance was assessed using the two-tailed, unequal variance Student's t-test.
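The quantification above can be sketched in two steps: the percentage of LC3-GFP positive bacteria for each replicate, and an unequal-variance (Welch) t statistic comparing two conditions. The text does not state what constitutes a replicate, so treating each list entry as one independent replicate percentage is an assumption, as are the function names and the values in the test. The sketch stops at the t statistic and its Welch-Satterthwaite degrees of freedom; the two-tailed p-value would be read from the t distribution with those degrees of freedom, for instance with SciPy's `scipy.stats.ttest_ind(x, y, equal_var=False)`.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def percent_lc3_positive(lc3_positive: int, total_bacteria: int) -> float:
    """LC3-GFP positive bacteria as a percentage of all bacteria counted."""
    if total_bacteria <= 0:
        raise ValueError("no bacteria counted")
    return 100.0 * lc3_positive / total_bacteria


def welch_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (t, degrees_of_freedom) for an unequal-variance t-test."""
    nx, ny = len(x), len(y)
    if nx < 2 or ny < 2:
        raise ValueError("each group needs at least two replicates")
    # Squared standard error contributed by each group (sample variance / n).
    vx, vy = variance(x) / nx, variance(y) / ny
    se2 = vx + vy
    if se2 == 0:
        raise ValueError("zero variance in both groups")
    t = (mean(x) - mean(y)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se2 * se2 / (vx * vx / (nx - 1) + vy * vy / (ny - 1))
    return t, df
```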
We thank the patients and their families for participating in these studies. We are grateful to all of the clinicians, research nurses and study coordinators for their essential contributions to the work. The authors would like to thank D. Caplan, G. Charron, C. Labbé, C. Lefebvre, and D. Miclaus for their help in the preparation of the manuscript, J. Adams for help with immunohistochemistry, A. Landry for help with expression studies, and D. Altshuler for his critical reading of the manuscript. The National Institute of Diabetes and Digestive and Kidney Diseases IBD Genetics Consortium is funded by the following grants: DK62431 (S.R.B.), DK62420 (R.H.D.), DK62432 (J.D.R.), DK62423 (M.S.S.), DK62413 (K.D.T.), and DK62422 and DK62429 (J.H.C.). A.H.S. is on the Scientific Advisory Boards of Shire Pharmaceuticals, Schering (Canada), and Procter & Gamble Pharmaceuticals. The work on the Cedars-Sinai cohort was supported by project 1 of DK 46763 (J.I.R.). R.J.X. is supported by grants AI062773 and DK43351.