Type 1 diabetes (T1D) is a common autoimmune disorder that arises from the action of multiple genetic and environmental risk factors. We report the findings of a new genome-wide association study of T1D, combined in a meta-analysis with two previously published studies. The total sample set included 7,514 cases and 9,045 reference samples. Forty-one distinct genomic locations provided evidence for association to T1D in the meta-analysis (P < 10⁻⁶). After excluding previously reported associations, 27 regions were further tested in an independent set of 4,267 cases, 4,463 controls and 2,319 affected sib-pair (ASP) families. Of these, 18 regions were replicated (P < 0.01; overall P < 5 × 10⁻⁸) and four additional regions provided nominal evidence of replication (P < 0.05). The many new candidate genes suggested by these results include IL10, IL19, IL20, GLIS3, CD69 and IL27.
Results from linkage and association studies in T1D have long supported a model in which the major risk factor for T1D resided in the HLA region on chromosome 6p21. Candidate gene studies carried out over a number of years identified four non-HLA T1D risk loci: INS, CTLA4, PTPN22, and IL2RA¹⁻⁴. Recently, the application of genome-wide SNP typing technology to large sample sets and comparisons with results from other immune-mediated diseases have provided convincing support for 19 additional T1D loci⁵⁻¹³, all with allelic odds ratios (ORs) of less than 1.3.
In order to have adequate power to detect additional T1D risk loci with ORs in the range of 1.1 to 1.3, we performed a new genome-wide association scan using British cases and controls and used this dataset in a meta-analysis which included 7,514 cases and 9,045 reference samples (Table 1). The other datasets included in the meta-analysis were from the Wellcome Trust Case Control Consortium (WTCCC) study⁷ and a study¹² that utilized T1D cases from the Genetics of Kidneys in Diabetes (GoKinD) study of diabetic nephropathy¹⁴, and reference samples from the National Institute of Mental Health (NIMH) study¹⁵.
The two earlier studies (WTCCC and GoKinD/NIMH) used Affymetrix 500K platforms while the new (T1DGC) study used the Illumina 550K platform. Of the 841,622 SNPs genotyped in these studies which had minor allele frequencies (MAF) exceeding 1% and passed our quality control standards, 328,044 were only genotyped by the Affymetrix platform, 437,739 only by the Illumina platform, and 75,839 were genotyped by both platforms. Since only 9% of SNPs are shared between these platforms, imputation was used to combine results across studies. To develop imputation rules, we took advantage of the fact that 1,422 of the original WTCCC controls which were included in the T1DGC study had been genotyped on both platforms (Methods).
An analysis using Mantel’s extension to the 1 degree of freedom (1 df) Cochran-Armitage trend test, which combined comparisons over the three studies, yielded 41 distinct genomic locations with P-values < 10⁻⁶ (Figure 1; individual plots for each study are in Supplementary Figure 1). Fifteen of these sites were in regions where there were prior reports of association to T1D (Table 2). The remaining 26 locations, along with one weaker association on the X chromosome, were chosen for further analysis.
To address the possible effects of population structure, the analyses were stratified by geographical region in the case of the British studies and by a “propensity score” based on principal components analysis in the US study. This was only partially successful in reducing the over-dispersion of test statistics, a large part of which derived from the US data (Table 3). If the residual over-dispersion were due to population structure, there would be a strong case for correcting the P-values (as shown in Table 3). However, the modest effect of the stratified analysis on over-dispersion, taken together with the absence of any over-dispersion in case-only interaction tests (see below), suggests that it is more likely due to differential genotyping errors. In this case, correction of the most significant P-values would be over-conservative since we have carefully checked all genotyping cluster plots for associated SNPs. The genomic control corrected P-values are nevertheless shown in Supplementary Table 1. The strongest associations tended to become somewhat less significant, but the choice of regions for follow-up, based on the criterion of P < 10⁻⁶, was not affected.
We also carried out, for SNPs with minor allele frequency exceeding 10%, 2 df “genotype” tests, which would be more sensitive to associations showing marked dominance (deviation from an additive model on the log scale). Significance was notably increased, by 3 to 4 orders of magnitude, at three SNPs, but was otherwise lower than in the corresponding 1 df tests (Supplementary Table 1), yielding no additional findings at P < 10⁻⁶. The results of both simple and stratified 1 df tests of these SNPs, separated by study, are shown in Supplementary Tables 3 and 4. Quantile-quantile plots for tests in our new (T1DGC) study, and in the meta-analysis, after removal of tests for SNPs in linkage disequilibrium (LD) regions surrounding known and putative associations, are shown in Supplementary Figure 2a and 2b.
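The over-dispersion and genomic control correction discussed above can be illustrated with a short sketch. It assumes the common definition of the inflation factor as the median observed 1 df chi-squared statistic divided by the median of the chi-squared distribution on 1 df (about 0.455). The function names and the toy statistics are hypothetical and are not taken from the study.

```python
import math
from statistics import NormalDist, median

# Median of a chi-squared variable on 1 df: the squared 75th normal percentile.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2  # about 0.455

def inflation_factor(chi2_stats):
    """Over-dispersion factor: observed median / expected median under the null."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN

def genomic_control(chi2_stats):
    """Return (lambda, corrected statistics, corrected 1 df P-values)."""
    lam = max(inflation_factor(chi2_stats), 1.0)  # never inflate the statistics
    corrected = [x / lam for x in chi2_stats]
    # Upper tail of chi-squared on 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_values = [math.erfc(math.sqrt(x / 2.0)) for x in corrected]
    return lam, corrected, p_values

# Toy statistics (hypothetical) whose median is twice the null expectation.
toy_stats = [CHI2_1DF_MEDIAN, 2 * CHI2_1DF_MEDIAN, 30.0]
lam, corrected, p_values = genomic_control(toy_stats)
```

Dividing every statistic by the same factor leaves the ranking of SNPs unchanged but makes the largest statistics less extreme, which is why such a correction can be over-conservative when the inflation does not come from population structure.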
The most significantly T1D associated SNPs from each of the 27 novel regions selected for replication were genotyped in a further 4,267 cases and 4,670 controls and in 4,342 trios from 2,319 T1DGC families with multiple affected offspring. Genotype data passed design and quality control criteria for 25 of these SNPs. Eighteen regions replicated with P < 0.01 and showed genome-wide significant (P < 5 × 10⁻⁸) association in the joint analysis of the genome scans and replication samples (Table 4, individual scan data in Supplementary Table 2). A further three of the remaining seven SNPs also showed P < 0.01 in the replication studies, and a fourth had P < 0.05, but these failed to reach overall P < 5 × 10⁻⁸ (Table 4). This study, therefore, adds 18 T1D risk loci to the existing 24, and provides suggestive support for four more. As expected, nearly all of these loci have OR < 1.2, as larger effects would likely have been discovered in earlier studies. Two of the new associations (10q23 and 16q23) contradict this trend and highlight the disparity between genomic coverage of the older Affymetrix 500K chip and the newer Illumina 550K: these loci do not have a good proxy on the Affymetrix chip, explaining why they were not previously identified despite relatively large effect sizes (OR ~ 1.3).
The families utilized for replication were derived from affected sib-pair linkage studies. One consequence of ascertainment on the basis of at least two affected siblings was a high frequency of high risk HLA genotypes¹⁶. It has been reported that relative risks for several non-HLA loci are reduced in subjects carrying high risk HLA genotypes¹⁷,¹⁸, reflecting deviation from a multiplicative model for joint effects, and this would lead us to expect reduced effect sizes in multiple-case families. Indeed, the results of the replication study were generally less convincing in the family data than in the case-control data, reflecting smaller effect sizes in the families. One potential explanation for these different effect sizes lies in possible statistical interaction among risk loci leading to a less-than-multiplicative accumulation of risk in samples (such as those from multiplex families) with a large number of risk variants. This hypothesis is difficult to test because power to detect interaction terms is much lower than power to find main effects of equivalent size, and the problem is further compounded when specific causal variants (rather than tag SNPs from a GWA scan) are not known.
We tested for deviation from the model of multiplicative effects with HLA, on a genome-wide basis, by first calculating predictive risk scores using SNPs in the MHC region on each platform, and testing for association between this score and every other SNP in the remainder of the genome. These tests are “case-only” tests for statistical interaction reflecting variation of allelic relative risks with the level of HLA-attributable risk. As noted earlier, these test statistics did not show the over-dispersion which would have been indicative of population stratification (Supplementary Figure 2c). However, the subset of these tests concerning established T1D susceptibility loci tended to have larger chi-squared values than expected by chance (Supplementary Figure 2d). In the majority of cases (31/45), the interaction tests took the opposite sign from the main effect test, consistent with high MHC risk leading to lower risk for other loci. Of the five interactions which reached P < 0.05, four were of this type (loci near 2q24.2/IFIH1, 1p13.2/PTPN22, 17p13.1 and 2q33.2/CTLA4). We carried out a further test by calculating a T1D risk score using all associated loci excluding the MHC region and testing, in cases only, for correlation between this score and the MHC risk score. We found a weak, but significant (P=0.0007) negative correlation, again indicating that risk from HLA and non-HLA sources accumulates at a rate less than expected based on the model of multiplicative effects, so that there is a general tendency for relative risks for non-HLA loci to be reduced when HLA-related risk is high.
Several of the 18 regions identified here contain genes of possible functional relevance to T1D. These include the region 1q32.1 containing the potent immunoregulatory cytokine genes, IL10, IL19 and IL20. The region of strong LD at 9p24.2 contains only a single gene, GLIS3. Mutations in GLIS3 have been reported in children from three different consanguineous families with permanent neonatal diabetes associated with congenital hypothyroidism and other clinical complications¹⁹. The region on 12p13.31 harbors a number of immunoregulatory genes including CD69, which is induced by activation of T cells and functions in thymic egress²⁰. Several other members of the calcium-dependent (C-type) lectin (CLEC) domain family with immune functions also map to this region. Overall, our results provide a rich new source of candidate genes, but until further genotyping, re-sequencing and functional studies are performed, it is not possible to be more specific in regard to which genes might be causal.
The WTCCC study has been described elsewhere⁷. Cases were recruited from pediatric and adult diabetes clinics at 150 National Health Service Hospitals across Great Britain as part of the Genetic Resource for Investigating Diabetes (GRID) collection (www.childhood-diabetes.org.uk/grid.shtml) of the JDRF/WT DIL⁹. Half of the controls were drawn from the British 1958 Birth Cohort²¹ and half from a group of blood donors recruited by the WTCCC in collaboration with the UK Blood Services⁷. The former group was subsequently genotyped on the Illumina 550K platform and was used as controls in the new T1DGC study reported here. Since the removal of this group from the WTCCC study left it somewhat short of controls, we used a group of 1,868 patients with bipolar disorder as additional reference samples; this group was conspicuous in the WTCCC studies for its lack of significant differences from control allele frequencies⁷.
Our new study added approximately 2,500 new controls from the British 1958 Birth Cohort to the 1,500 described above, and compared these with a new group of approximately 4,000 British cases from the JDRF/WT DIL collection. All cases and controls were resident in Great Britain. To minimize the effects of population structure, the case-control comparisons in the WTCCC and T1DGC studies have been stratified by the 12 regions of Great Britain⁵,⁷. Sample exclusions in the genome-wide studies are discussed in Supplementary Methods.
Replication studies were carried out in two groups of cases and controls as well as 2,319 affected sib-pair families previously recruited and characterized by the T1DGC⁶. The British cases were from the JDRF/WT DIL, and the controls were drawn from the British 1958 Birth Cohort and the UK Blood Service controls of the WTCCC. The second set of cases and controls, from Denmark, was recruited from a nationwide registry. All cases (49% females) were diagnosed before age 18 years and the mean age at onset was 9.02 years. Control subjects were randomly selected from the Inter99 study²².
For the T1DGC study, the 4,000 T1D case and 2,500 control DNA samples were selected on two criteria: they had not been used in a previous genome-wide association study, and the genomic DNA migrated as a high molecular weight band (~23 kb) on electrophoresis in a 0.75% agarose gel. All DNA samples were extracted using a chloroform-based method and quantified in triplicate using Picogreen®. Once selected, the case and control DNA samples were randomized by columns into a 96-well plate format.
For the T1DGC study, genotyping was performed on the Illumina 550K Infinium platform and, for comparability, all genotypes were re-scored using the ILLUMINUS algorithm²³. The WTCCC study used the Affymetrix GeneChip Human Mapping 500K Array set, while the GoKinD/NIMH study used genotype data generated with the Affymetrix Genome-wide Human SNP Array 5.0. The 5.0 array incorporates all of the SNPs on the earlier 500K array but on a single chip along with an additional 420K non-polymorphic probes. Details of the scoring of genotypes may be found in the original publications⁷,¹². The criteria for discarding some SNPs from the analysis are discussed in Supplementary Methods.
For the replication studies, genotyping was performed in a fully blinded fashion using Taqman assays as previously described⁹.
One degree of freedom tests are Cochran-Armitage tests for trend alternatives, extended to pool information across multiple studies or across multiple strata within a single study by the method described by Mantel²⁴. The two degree of freedom tests follow similar principles. Testing for association with SNPs on the X chromosome was carried out using the method proposed by Clayton²⁵. More details are given in Supplementary Methods.
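As an illustration of the pooled 1 df trend test, the sketch below computes, for each stratum, a score U (the genotype total in cases minus its expectation) and its permutation variance V, then sums both over strata before forming the chi-squared statistic. This is one standard way of writing the stratified trend test and is offered under that assumption; it is not the snpMatrix implementation, and the function names and genotype counts are hypothetical.

```python
import math

def stratum_score(case_counts, control_counts):
    """Score U and permutation variance V for one stratum.

    case_counts and control_counts are (n0, n1, n2): the numbers of subjects
    carrying 0, 1 or 2 copies of the tested allele.
    """
    n_cases = sum(case_counts)
    n_total = n_cases + sum(control_counts)
    totals = [a + b for a, b in zip(case_counts, control_counts)]
    mean = sum(g * n for g, n in enumerate(totals)) / n_total
    case_sum = sum(g * n for g, n in enumerate(case_counts))
    u = case_sum - n_cases * mean
    sum_sq = sum(n * (g - mean) ** 2 for g, n in enumerate(totals))
    v = n_cases * (n_total - n_cases) * sum_sq / (n_total * (n_total - 1))
    return u, v

def stratified_trend_test(strata):
    """Pool U and V over strata; return (chi-squared on 1 df, P-value).

    Assumes at least one stratum is polymorphic, so the pooled V is positive.
    """
    scores = [stratum_score(cases, controls) for cases, controls in strata]
    u = sum(s[0] for s in scores)
    v = sum(s[1] for s in scores)
    chi2 = u * u / v
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical genotype counts: (cases, controls) per stratum.
one_stratum = [((10, 20, 30), (30, 20, 10))]
chi2_one, p_one = stratified_trend_test(one_stratum)
chi2_two, p_two = stratified_trend_test(one_stratum * 2)
```

Because scores and variances are summed within strata before squaring, cases are only ever compared with controls from the same stratum (a study, a geographical region, or a principal component score band).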
The meta-analysis involved studies that used different platforms, necessitating the use of imputation. Since we had a substantial sample typed on both platforms, we used a simple linear regression approach to imputation²⁶. Details of this, and other methods used in the meta-analysis, are given in Supplementary Methods. Supplementary Figure 3 shows the distribution of the quality of imputation, as measured by the coefficient of determination, R².
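The idea of regression-based imputation can be sketched as follows. The example assumes a single predictor SNP, which is a simplification: the actual imputation rules are described in Supplementary Methods and may combine several predictors. A rule is fitted in subjects genotyped on both platforms and then applied to subjects typed on one platform only. All names and genotypes are hypothetical.

```python
def fit_imputation_rule(predictor, target):
    """Least-squares regression of the target genotype on a predictor genotype.

    Both arguments are lists of genotypes coded 0, 1 or 2 for the same
    subjects. Returns (intercept, slope, r_squared).
    """
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(target) / n
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    syy = sum((y - mean_y) ** 2 for y in target)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, target))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)  # coefficient of determination
    return intercept, slope, r_squared

def impute(rule, predictor):
    """Expected target genotype (a dosage between 0 and 2 for a good rule)."""
    intercept, slope, _ = rule
    return [intercept + slope * x for x in predictor]

# Hypothetical subjects typed on both platforms.
typed_snp = [0, 0, 1, 1, 2, 2]    # SNP present on the study's own platform
untyped_snp = [0, 0, 1, 2, 2, 2]  # SNP present only on the other platform
rule = fit_imputation_rule(typed_snp, untyped_snp)
dosages = impute(rule, [0, 1, 2])
```

An R² close to 1 indicates that the imputed dosage carries almost as much information as a directly typed genotype, while a low R² marks a SNP with no good proxy on the other platform.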
Analysis of the replication case-control studies was carried out in a similar manner, by 1 df comparisons of allele frequencies with Danish and UK studies treated as separate strata. The family study was analyzed by the transmission/disequilibrium test (TDT).
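For the family data, the TDT compares how often heterozygous parents transmit the tested allele to an affected child with how often they do not. The sketch below uses the usual McNemar-type statistic, (b − c)² / (b + c) on 1 df, and counts transmissions from trios with genotypes coded as the number of copies of the tested allele. The helper names and the toy trios are hypothetical, and the handling of multiple affected offspring per family is not addressed.

```python
import math

def count_transmissions(trios):
    """Count transmissions (b) and non-transmissions (c) of the tested allele.

    trios is a list of (father, mother, child) genotypes coded 0, 1 or 2.
    Only heterozygous parents are informative; a homozygous 2 parent always
    transmits one copy, so those copies are subtracted from the child's total.
    """
    b = c = 0
    for father, mother, child in trios:
        parents = (father, mother)
        n_het = sum(1 for g in parents if g == 1)
        n_hom = sum(1 for g in parents if g == 2)
        from_het = child - n_hom  # copies received from heterozygous parents
        if not 0 <= from_het <= n_het:
            raise ValueError("Mendelian inconsistency in trio")
        b += from_het
        c += n_het - from_het
    return b, c

def tdt(b, c):
    """Return (chi-squared on 1 df, P-value) for the TDT."""
    chi2 = (b - c) ** 2 / (b + c)
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical trios.
trios = [(1, 0, 1)] * 3 + [(1, 1, 2), (1, 2, 1), (1, 1, 1)]
b, c = count_transmissions(trios)
chi2_tdt, p_tdt = tdt(b, c)
```

Because the comparison is made within families, the statistic is not affected by population structure, which makes it a useful complement to the case-control replication.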
The MHC risk score was derived by an adaptation of the lasso approach²⁷ to logistic regression of case/control status versus all SNPs in the MHC region (defined as spanning from 24.7 Mb to 34.0 Mb on chromosome 6). This was applied to the combined Affymetrix data, with a dummy variable in the regression to differentiate the WTCCC and GoKinD/NIMH studies, and separately to the T1DGC Illumina data. The coefficients for the selected regression equations are shown in Supplementary Table 3. The degree of risk prediction, as demonstrated by the receiver operating characteristic curves (Supplementary Figure 4), was very similar in the three study groups.
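A minimal sketch of an L1-penalized (lasso) logistic regression risk score is given below. It uses proximal gradient descent with soft-thresholding, chosen here only because it is short and self-contained; it is not a reproduction of the routines in the lasso2 package, and the penalty values, step size, function names and toy genotypes are all assumptions made for illustration.

```python
import math

def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; small values become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso_logistic(rows, labels, penalty, step=0.1, iterations=2000):
    """Fit logistic regression with an L1 penalty on the SNP coefficients.

    rows holds one list of genotypes (0, 1 or 2) per subject and labels holds
    1 for cases and 0 for controls. The intercept is not penalized.
    """
    n, p = len(rows), len(rows[0])
    intercept, weights = 0.0, [0.0] * p
    for _ in range(iterations):
        grad_b, grad_w = 0.0, [0.0] * p
        for x, y in zip(rows, labels):
            z = intercept + sum(w * v for w, v in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y
            grad_b += error / n
            for j in range(p):
                grad_w[j] += error * x[j] / n
        intercept -= step * grad_b
        weights = [soft_threshold(w - step * g, step * penalty)
                   for w, g in zip(weights, grad_w)]
    return intercept, weights

def risk_score(intercept, weights, genotypes):
    """Linear predictor (log odds) used as the risk score for one subject."""
    return intercept + sum(w * g for w, g in zip(weights, genotypes))

# Hypothetical data: SNP 1 differs between cases and controls, SNP 2 does not.
rows = [[2, 0], [2, 1], [1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [1, 1]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
_, weights_strong = fit_lasso_logistic(rows, labels, penalty=0.3)
intercept_weak, weights_weak = fit_lasso_logistic(rows, labels, penalty=0.05)
```

With a large penalty every coefficient is shrunk to exactly zero, and as the penalty is relaxed only the most informative SNPs enter the score, which is what makes the approach practical when the region contains many correlated SNPs.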
A case-only test for statistical interaction between each SNP and MHC risk score was carried out by a 1 df test based on the covariance between MHC risk score and the SNP genotype coded 0, 1 or 2. These tests were stratified within study by geographical region or by principal component score, and information pooled across strata and studies as described above. A 2 df test for association, possibly modified by MHC, was calculated by adding the chi-squared interaction test on 1 df to the 1 df chi-squared statistic for the stratified association test.
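The case-only interaction test and the 2 df combination described above can be sketched as follows. The variance used for the covariance score is the permutation variance within each stratum, which is an assumption about the exact form of the test; the function names and toy values are hypothetical. The 2 df P-value uses the fact that the upper tail of a chi-squared distribution on 2 df is exp(−x/2).

```python
import math

def case_only_stratum(risk_scores, genotypes):
    """Covariance score U and permutation variance V within one stratum of cases."""
    n = len(genotypes)
    mean_s = sum(risk_scores) / n
    mean_g = sum(genotypes) / n
    u = sum((s - mean_s) * (g - mean_g) for s, g in zip(risk_scores, genotypes))
    ss_s = sum((s - mean_s) ** 2 for s in risk_scores)
    ss_g = sum((g - mean_g) ** 2 for g in genotypes)
    return u, ss_s * ss_g / (n - 1)

def case_only_interaction_test(strata):
    """Pool over strata; return (signed score, chi-squared on 1 df, P-value)."""
    scores = [case_only_stratum(s, g) for s, g in strata]
    u = sum(item[0] for item in scores)
    v = sum(item[1] for item in scores)
    chi2 = u * u / v
    return u, chi2, math.erfc(math.sqrt(chi2 / 2.0))

def combined_2df_test(chi2_association, chi2_interaction):
    """Sum the two 1 df statistics and refer the total to chi-squared on 2 df."""
    total = chi2_association + chi2_interaction
    return total, math.exp(-total / 2.0)

# Hypothetical cases in one stratum: MHC risk scores and SNP genotypes.
mhc_scores = [0.0, 1.0, 2.0, 3.0]
snp_genotypes = [0, 1, 1, 2]
u_int, chi2_int, p_int = case_only_interaction_test([(mhc_scores, snp_genotypes)])
chi2_2df, p_2df = combined_2df_test(19.8, chi2_int)
```

The sign of the pooled score indicates the direction of the interaction: a negative value for a risk allele corresponds to a smaller relative risk at that SNP when MHC-related risk is high.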
The lasso analysis of the MHC risk prediction was carried out using the lasso2 package in the R statistical system²⁸. All the remaining analysis was carried out using the snpMatrix package from the Bioconductor project²⁹.
This research utilizes resources provided by the Type 1 Diabetes Genetics Consortium, a collaborative clinical study sponsored by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institute of Allergy and Infectious Diseases (NIAID), National Human Genome Research Institute (NHGRI), National Institute of Child Health and Human Development (NICHD), and Juvenile Diabetes Research Foundation International (JDRF) and supported by U01 DK062418. Further support was provided by a grant from the NIDDK (DK46635) to PC and a joint JDRF and Wellcome Trust grant to the Diabetes and Inflammation Laboratory at Cambridge, which also received support from the National Institute for Health Research Cambridge Biomedical Research Centre. DC is the recipient of a Wellcome Trust Principal Research Fellowship.
We acknowledge the contributions of the following individuals: Julie Alipaz, Anna Simpson, Judy Brown and Joe Garsetti for assistance with project management; Matt Hardy and Kate Downes for genotyping; Meeta Maisuria for DNA sample coordination and quality control; Joan Hilner and June Pierce for managing T1DGC Data and DNA resources; James Allen, Nigel Ovington, Vin Everett, Geoff Dolman, and Mark Brown for data services and computing; and Luc Smink, Oliver Burren, Joe Mychaleckyj, and Nat Goodman for bioinformatics support.
We gratefully acknowledge the following groups and individuals who provided biological samples or data for this study. We obtained DNA samples from the British 1958 Birth Cohort collection, funded by the Medical Research Council and the Wellcome Trust. We thank The Avon Longitudinal Study of Parents and Children laboratory in Bristol and the British 1958 Birth Cohort team, including S. Ring, R. Jones, M. Pembrey, W. McArdle, D. Strachan and P. Burton for preparing and providing the control DNA samples. We thank the Human Biological Data Interchange and Diabetes UK for providing DNA samples from USA and UK multiplex families, respectively. Danish patients were from the Danish Society of Childhood Diabetes (DSBD) and control DNA samples were kindly provided by Drs. T. Hansen, O. Pedersen, K. Borch-Johnsen, and T. Joergensen. This study makes use of data generated by the Wellcome Trust Case Control Consortium, funded by Wellcome Trust award 076113, and a full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk.
We gratefully acknowledge the Genetics of Kidneys in Diabetes (GoKinD) study for generously allowing the use of their sample SNP allele intensity and genotype data, which was obtained from the Genetic Association Information Network (GAIN) database found at http://view.ncbi.nlm.nih.gov/dbgap/ through dbGaP accession number phs000018.v1.p1¹².
We gratefully acknowledge the National Institute of Mental Health for generously allowing the use of their control CEL and genotype data. Control subjects from the National Institute of Mental Health Schizophrenia Genetics Initiative (NIMH-GI), data and biomaterials are being collected by the “Molecular Genetics of Schizophrenia II” (MGS-2) collaboration. The Investigators and co-investigators are: ENH/Northwestern University, Evanston, IL, MH059571, Pablo V. Gejman, M.D. (Collaboration Coordinator: PI), Alan R. Sanders, M.D.; Emory University School of Medicine, Atlanta, GA, MH59587, Farooq Amin, M.D. (PI); Louisiana State University Health Sciences Center; New Orleans, Louisiana, MH067257, Nancy Buccola APRN, BC, MSN (PI); University of California-Irvine, Irvine, CA, MH60870, William Byerley, M.D. (PI); Washington University, St. Louis, MO, U01, MH060879, C. Robert Cloninger, M.D. (PI); University of Iowa, Iowa, IA, MH59566, Raymond Crowe, M.D. (PI), Donald Black, M.D.; University of Colorado, Denver, CO, MH059565, Robert Freedman, M.D. (PI); University of Pennsylvania, Philadelphia, PA, MH061675, Douglas Levinson M.D. (PI); University of Queensland, Queensland, Australia, MH059588, Bryan Mowry, M.D. (PI); Mt. Sinai School of Medicine, New York, NY MH59586, Jeremy Silverman, Ph.D. (PI). The samples were collected by V L Nimgaonkar’s group at the University of Pittsburgh, as part of a multi-institutional collaborative research project with J Smoller, M.D. D.Sc. and P Sklar, M.D. Ph.D. (Massachusetts General Hospital) (grant MH 63420).
URLs. Further information about all T1D loci, including those newly reported here, can be found in T1DBase: http://www.t1dbase.org/