Results from linkage and association studies in T1D have long supported a model in which the major risk factor for T1D resides in the HLA region on chromosome 6p21. Candidate gene studies carried out over a number of years identified four non-HLA T1D risk loci: INS, CTLA4, PTPN22, and IL2RA [1-4]. Recently, the application of genome-wide SNP typing technology to large sample sets, and comparisons with results from other immune-mediated diseases, have provided convincing support for 19 additional T1D loci [5-13], all with allelic odds ratios (ORs) of less than 1.3.
To have adequate power to detect additional T1D risk loci with ORs in the range of 1.1 to 1.3, we performed a new genome-wide association scan using British cases and controls and used this dataset in a meta-analysis that included 7,514 cases and 9,045 reference samples (). The other datasets included in the meta-analysis were from the Wellcome Trust Case Control Consortium (WTCCC) study [7] and from a study [12] that utilized T1D cases from the Genetics of Kidneys in Diabetes (GoKinD) study of diabetic nephropathy [14] and reference samples from the National Institute of Mental Health (NIMH) study [15].
Samples from three genome-wide association analyses of type 1 diabetes used in this analysis.
The two earlier studies (WTCCC and GoKinD/NIMH) used Affymetrix 500K platforms while the new (T1DGC) study used the Illumina 550K platform. Of the 841,622 SNPs genotyped in these studies which had minor allele frequencies (MAF) exceeding 1% and passed our quality control standards, 328,044 were only genotyped by the Affymetrix platform, 437,739 only by the Illumina platform, and 75,839 were genotyped by both platforms. Since only 9% of SNPs are shared between these platforms, imputation was used to combine results across studies. To develop imputation rules, we took advantage of the fact that 1,422 of the original WTCCC controls which were included in the T1DGC study had been genotyped on both platforms (Methods).
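The platform overlap quoted above follows directly from the three SNP counts. The short check below is only an illustration using the numbers given in the text; the variable names are hypothetical and not taken from the study.

```python
# SNP counts quoted in the text (MAF > 1%, passing quality control).
affymetrix_only = 328_044
illumina_only = 437_739
both_platforms = 75_839

# The three categories are mutually exclusive, so they should sum to the total.
total_snps = affymetrix_only + illumina_only + both_platforms

# Fraction of all SNPs that were genotyped directly on both platforms.
shared_fraction = both_platforms / total_snps

print(f"total SNPs: {total_snps}")
print(f"shared between platforms: {100 * shared_fraction:.1f}%")
```

The small shared fraction is the reason direct comparison across studies is not possible for most SNPs without imputation.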
An analysis using Mantel’s extension to the 1 degree of freedom (1 df) Cochran-Armitage trend test, which combined comparisons over the three studies, yielded 41 distinct genomic locations with P-values < 10⁻⁶ (Figure 1; individual plots for each study are in Supplementary Figure 1). Fifteen of these sites were in regions where there were prior reports of association with T1D (). The remaining 26 locations, along with one weaker association on the X chromosome, were chosen for further analysis. To address the possible effects of population structure, the analyses were stratified by geographical region in the British studies and by a “propensity score” based on principal components analysis in the US study. This was only partially successful in reducing the over-dispersion of test statistics, a large part of which derived from the US data (). If the residual over-dispersion were due to population structure, there would be a strong case for correcting the P-values (as shown in ). However, the modest effect of the stratified analysis on over-dispersion, taken together with the absence of any over-dispersion in case-only interaction tests (see below), suggests that it is more likely due to differential genotyping errors. In this case, correction of the most significant P-values would be over-conservative, since we have carefully checked all genotyping cluster plots for associated SNPs. The genomic control corrected P-values are nevertheless shown in Supplementary Table 1. The strongest associations tended to become somewhat less significant, but the choice of regions for follow-up, based on the criterion of P < 10⁻⁶, was not affected. We also carried out, for SNPs with minor allele frequency exceeding 10%, 2 df “genotype” tests, which would be more sensitive to associations showing marked dominance (deviation from an additive model on the log scale). Significance was notably increased, by 3 to 4 orders of magnitude, at three SNPs, but these tests were otherwise less significant than the corresponding 1 df tests (Supplementary Table 1), yielding no additional findings at P. The results of both simple and stratified 1 df tests of these SNPs, separated by study, are shown in Supplementary Tables 3. Quantile-quantile plots for tests in our new (T1DGC) study and in the meta-analysis, after removal of tests for SNPs in linkage disequilibrium (LD) regions surrounding known and putative associations, are shown in Supplementary Figures 2a and 2b.
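To make the stratified 1 df trend test concrete, the following is a minimal sketch of one standard way to compute a Mantel-type combined statistic from genotype counts. It is not the study's analysis code: the function name, the input layout, and the toy counts are all hypothetical, and imputed genotypes would need expected allele dosages rather than integer counts.

```python
import math

def mantel_trend_test(strata):
    """Stratified 1 df trend test (illustrative sketch).

    strata: iterable of (case_counts, control_counts); each is a 3-tuple of
    genotype counts for 0, 1 and 2 copies of the tested allele.
    Returns (chi-squared statistic, P-value).
    """
    score_sum = 0.0  # observed minus expected allele score in cases, summed over strata
    var_sum = 0.0    # permutation variance of that score, summed over strata
    for case_counts, control_counts in strata:
        n_case, n_ctrl = sum(case_counts), sum(control_counts)
        n = n_case + n_ctrl
        if n < 2:
            continue  # a stratum this small carries no information
        totals = [a + b for a, b in zip(case_counts, control_counts)]
        sum_x = sum(g * t for g, t in enumerate(totals))
        sum_x2 = sum(g * g * t for g, t in enumerate(totals))
        case_x = sum(g * c for g, c in enumerate(case_counts))
        score_sum += case_x - n_case * sum_x / n
        var_sum += n_case * n_ctrl * (sum_x2 - sum_x ** 2 / n) / (n * (n - 1))
    if var_sum == 0:
        raise ValueError("no variation in genotype or case status")
    chi2 = score_sum ** 2 / var_sum
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    return chi2, p_value

# Invented toy counts for two strata (not data from the study).
toy_strata = [
    ((30, 50, 20), (40, 45, 15)),
    ((25, 40, 35), (35, 45, 20)),
]
chi2, p_value = mantel_trend_test(toy_strata)
print(f"chi2 = {chi2:.2f}, P = {p_value:.3g}")
```

Scores and variances are summed across strata before forming the statistic, so each stratum (here standing in for a geographical region or propensity-score bin) contributes only within-stratum case-control differences.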
Figure 1. Genome-wide plots of -log10 P-values from stratified 1 df tests combining results from all three studies. Values of -log10 P greater than 10 are plotted at 10. SNPs only present on the Illumina chip are shown in blue, those only present on the Affymetrix ...
Results for locations of known susceptibility loci for type 1 diabetes.
Over-dispersion factors (λ) of 1 df association tests
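As an illustration of what an over-dispersion factor and a genomic control correction involve, the sketch below uses the common median-based estimator of λ. The text does not state which estimator was used, so this choice is an assumption, as are the function names and the toy statistics.

```python
import math
import statistics

# Median of a chi-squared distribution with 1 df
# (the square of the 0.75 quantile of the standard normal).
CHI2_1DF_MEDIAN = 0.454936

def genomic_control_lambda(chi2_values):
    """Over-dispersion factor: observed median statistic over its null expectation."""
    return statistics.median(chi2_values) / CHI2_1DF_MEDIAN

def gc_corrected_p(chi2, lam):
    """P-value of a 1 df statistic after dividing it by lambda.

    Values of lambda below 1 are treated as 1, so the correction never
    makes a result more significant.
    """
    adjusted = chi2 / max(lam, 1.0)
    return math.erfc(math.sqrt(adjusted / 2))

# Invented statistics for illustration only (not values from the study).
toy_chi2 = [0.1, 0.3, 0.6, 0.9, 1.4, 2.2, 30.0]
lam = genomic_control_lambda(toy_chi2)
print(f"lambda = {lam:.2f}")
print(f"uncorrected P = {gc_corrected_p(30.0, 1.0):.3g}")
print(f"corrected P   = {gc_corrected_p(30.0, lam):.3g}")
```

Because every statistic is divided by the same λ, the correction weakens the strongest signals most in absolute terms, which is why it can be over-conservative when the inflation comes from genotyping artefacts at other SNPs rather than from population structure.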
The most significantly T1D-associated SNPs from each of the 27 novel regions selected for replication were genotyped in a further 4,267 cases and 4,670 controls and in 4,342 trios from 2,319 T1DGC families with multiple affected offspring. Genotype data passed design and quality control criteria for 25 of these SNPs. Eighteen regions replicated with P < 0.01 and showed genome-wide significant (P < 5 × 10⁻⁸) association in the joint analysis of the genome scans and replication samples (, individual scan data in Supplementary Table 2). A further three of the remaining seven SNPs also showed P < 0.01 in the replication studies, and a fourth had P < 0.05, but these failed to reach overall P < 5 × 10⁻⁸ (). This study, therefore, adds 18 T1D risk loci to the existing 24 and provides suggestive support for four more. As expected, nearly all of these loci have OR < 1.2, as larger effects would likely have been discovered in earlier studies. Two of the new associations (10q23 and 16q23) contradict this trend and highlight the disparity in genomic coverage between the older Affymetrix 500K chip and the newer Illumina 550K chip: these loci do not have a good proxy on the Affymetrix chip, which explains why they were not previously identified despite relatively large effect sizes (OR ~ 1.3).
Replication study of new type 1 diabetes risk loci
The families utilized for replication were derived from affected sib-pair linkage studies. One consequence of ascertainment on the basis of at least two affected siblings was a high frequency of high-risk HLA genotypes [16]. It has been reported that relative risks for several non-HLA loci are reduced in subjects carrying high-risk HLA genotypes [17, 18], reflecting deviation from a multiplicative model for joint effects, and this would lead us to expect reduced effect sizes in multiple-case families. Indeed, the results of the replication study were generally less convincing in the family data than in the case-control data, reflecting smaller effect sizes in the families. One potential explanation for these different effect sizes lies in possible statistical interaction among risk loci, leading to a less-than-multiplicative accumulation of risk in samples (such as those from multiplex families) with a large number of risk variants. This hypothesis is difficult to test because the power to detect interaction terms is much lower than the power to find main effects of equivalent size, and the problem is compounded when specific causal variants (rather than tag SNPs from a GWA scan) are not known.
We tested for deviation from the model of multiplicative effects with HLA, on a genome-wide basis, by first calculating predictive risk scores using SNPs in the MHC region on each platform and then testing for association between this score and every SNP in the remainder of the genome. These tests are “case-only” tests for statistical interaction, reflecting variation of allelic relative risks with the level of HLA-attributable risk. As noted earlier, these test statistics did not show the over-dispersion that would have been indicative of population stratification (Supplementary Figure 2c). However, the subset of these tests concerning established T1D susceptibility loci tended to have larger chi-squared values than expected by chance (Supplementary Figure 2d). In the majority of cases (31/45), the interaction test took the opposite sign from the main effect test, consistent with high MHC risk leading to lower risk for other loci. Of the five interactions that reached P < 0.05, four were of this type (loci near 2q24.2/IFIH1, 17p13.1 and 2q33.2/CTLA4). We carried out a further test by calculating a T1D risk score using all associated loci excluding the MHC region and testing, in cases only, for correlation between this score and the MHC risk score. We found a weak but significant (P = 0.0007) negative correlation, again indicating that risk from HLA and non-HLA sources accumulates at a rate lower than expected under the model of multiplicative effects, so that there is a general tendency for relative risks for non-HLA loci to be reduced when HLA-related risk is high.
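The final case-only test can be pictured as a correlation between two per-case risk scores. The sketch below assumes a Pearson correlation with a Fisher z approximation for the P-value; the text does not specify the correlation measure or the test, so both are assumptions, and the function name and toy scores are hypothetical.

```python
import math

def case_only_score_correlation(mhc_scores, non_mhc_scores):
    """Pearson correlation between two risk scores measured in cases only,
    with a two-sided P-value from the Fisher z approximation."""
    n = len(mhc_scores)
    if n != len(non_mhc_scores) or n < 4:
        raise ValueError("need two equal-length score lists with at least 4 cases")
    mean_x = sum(mhc_scores) / n
    mean_y = sum(non_mhc_scores) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(mhc_scores, non_mhc_scores))
    sxx = sum((x - mean_x) ** 2 for x in mhc_scores)
    syy = sum((y - mean_y) ** 2 for y in non_mhc_scores)
    if sxx == 0 or syy == 0:
        raise ValueError("a score with no variation cannot be correlated")
    r = sxy / math.sqrt(sxx * syy)
    r_bounded = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r_bounded) * math.sqrt(n - 3)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return r, p_value

# Invented per-case scores (log relative risk scale), for illustration only.
toy_mhc = [2.1, 1.8, 2.5, 0.9, 1.2, 2.9, 1.5, 2.0]
toy_non_mhc = [0.4, 0.7, 0.3, 0.9, 0.8, 0.2, 0.6, 0.6]
r, p_value = case_only_score_correlation(toy_mhc, toy_non_mhc)
print(f"r = {r:.2f}, P = {p_value:.3g}")
```

Under a multiplicative model with independent loci, the two scores should be uncorrelated among cases, so a negative correlation is what a less-than-multiplicative accumulation of risk would produce.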
Several of the 18 regions identified here contain genes of possible functional relevance to T1D. These include the region 1q32.1 containing the potent immunoregulatory cytokine genes, IL10, IL19. The region of strong LD at 9p24.2 contains only a single gene, GLIS3. Mutations in GLIS3 have been reported in children from three different consanguineous families with permanent neonatal diabetes associated with congenital hypothyroidism and other clinical complications [19]. The region on 12p13.31 harbors a number of immunoregulatory genes including CD69, which is induced by activation of T cells and functions in thymic egress [20]. Several other members of the calcium-dependent (C-type) lectin (CLEC) domain family with immune functions also map to this region. Overall, our results provide a rich new source of candidate genes, but until further genotyping, re-sequencing and functional studies are performed, it is not possible to be more specific about which genes might be causal.