|Home | About | Journals | Submit | Contact Us | Français|
To test chromosomes carrying the same DRB1-DQA1-DQB1 haplotype for SNPs in the major histocompatibility complex (MHC) that might mark subgroups of the haplotype with different risks for type 1 diabetes (T1D).
Chromosomes from T1D children, their parents, and non-diabetic siblings in families of the Type 1 Diabetes Genetics Consortium (T1DGC) were analyzed by two haplotype-based methods: (1) logistic regression analysis restricted to phased chromosomes carrying the same DRB1-DQA1-DQB1 haplotype but differentiated by the two alleles at MHC SNPs which were individually tested for association with T1D; (2) homozygous parent TDT (hpTDT) testing for biased transmission of a SNP allele to diabetic children from parents who are heterozygous at the SNP but homozygous for the specific DRB1-DQA1-DQB1 haplotype being evaluated.
A number of SNPs gave nominally significant (p<0.05) evidence of marking two subsets of the 301-501-201 haplotype that might differ with respect to their diabetogenic potency. However, none of the SNPs achieved experiment-wide significance and hence may be false-positive associations.
We discuss limitations and possible deficiencies of our study suggesting further work which might yield more robust SNP associations marking two subgroups of a DRB1-DQA1-DQB1 haplotype with different T1D risks.
The primary genetic determinants of susceptibility to type 1 diabetes (T1D) in Caucasians appear to be haplotypes defined by alleles at the Human Leukocyte Antigen (HLA) class II loci DRB1, DQA1 and DQB1 . In particular, the common DRB1-DQA1-DQB1 haplotypes (0301-0501-0201, 0401-0301-0302, 0402-0301-0302, 0405-0301-0302) account for a major portion of T1D susceptibility conferred by the HLA region, and furthermore other DRB1-DQA1-DQB1 haplotypes are also associated with lesser degrees of T1D susceptibility or resistance [1,2]. However there are a number of important unresolved issues about the role of DRB1-DQA1-DQB1 haplotypes in T1D etiology. One unresolved question is the degree to which susceptibility to type 1 diabetes is altered by additional HLA variants other than those at DRB1, DQA1 and DQB1. Such additional HLA variants might act independently of DRB1-DQA1-DQB1 haplotypes and perhaps be located at a substantial distance from the class II loci. Alternatively, the additional HLA variants might depend on the presence of a specific DRB1-DQA1-DQB1 haplotype, perhaps because the additional variant(s) physically modulate the diabetes risk of the haplotype or simply because the variant(s) “tag” two or more subsets of the haplotype that differ in their diabetogenic potency.
Here we describe our use of the Type 1 Diabetes Genetics Consortium (T1DGC) MHC fine-mapping data to search for evidence of this second type of HLA variant, namely variants whose association with diabetes risk may depend on the presence or activity of a particular DRB1-DQA1-DQB1 haplotype. To search for such variants we pursued two independent strategies which each statistically controlled for T1D risk contributed by the DRB1-DQA1-DQB1 haplotype being considered while simultaneously testing other HLA variants for additional association with T1D beyond that explained by the DRB1-DQA1-DQB1 haplotype. One of these two strategies involved phasing of genotypes across the MHC region into extended haplotypes followed by logistic regression analysis that considered only a single DRB1-DQA1-DQB1 haplotype and tested individual SNPs carried on the haplotype for additional association with T1D. Besides being a first step in this strategy, phasing of MHC chromosomes also provided us with a valuable overview of DRB1-DQA1-DQB1 haplotype frequencies across the T1DGC datasets, enabling population differences and similarities to be noted. Until very recently, it would have been impossible to phase the ~2900 genotyped MHC SNPs into haplotypes for the ~11000 T1DGC samples since the available haplotype-phasing software could not process more than a few hundred subjects with reasonable speed and computer memory. However, the newly released Beagle software can accurately resolve unphased genotypes at thousands of SNPs into haplotypes for datasets of several thousand individuals in a reasonable (3-4 hour) timeframe . By comparison, our attempt to phase chromosomes in one T1DGC cohort using FASTPHASE took on the order of days. We therefore capitalized on Beagle's availability by phasing genotypes at 8 classical HLA genes together with high-density MHC SNP genotypes in each of the T1DGC datasets in order to apply the haplotype regression-based analysis we describe.
Our second strategy to search for HLA variants that mark differential disease risk within a specific DRB1-DQA1-DQB1 haplotype was to apply the “homozygous parent” TDT (hpTDT), a technique that appears to have been first adopted by Li et al . The hpTDT counts transmissions of each marker allele from heterozygous parents to affected offspring, but only includes transmissions from parents who are also homozygous at one or more additional polymorphisms. By requiring both parental chromosomes to be identical (i.e. homozygous) at the additional polymorphisms, the hpTDT removes any influence of these polymorphisms on biasing transmissions at other nearby markers on the same chromosome that are being tested for association with disease. The polymorphisms required to be homozygous are usually known to be associated with disease and the goal of removing their influence is to detect transmission bias emanating from other undiscovered polymorphisms in the same region that also alter disease susceptibility. For our application of the hpTDT, we required that DRB1, DQA1, and DQB1 be homozygous for a specific combination of classical HLA alleles and hence SNP alleles being tested by the hpTDT are carried on the DRB1-DQA1-DQB1 haplotype defined by the required combination of homozygous alleles. Consequently, the hpTDT is a haplotype-based analysis which, like the regression analyses we applied, only considers the subset of chromosomes that carry a specific DRB1-DQA1-DQB1 haplotype and searches within this subset for SNP allele variation that exhibits association with T1D susceptibility. However unlike the regression analysis, the hpTDT haplotype-based analysis does not rely on prior phasing of haplotypes by inferential steps which might sometimes be inaccurate or yield artefacts peculiar to the haplotype-phasing algorithm. Thus, the hpTDT analysis provided an important complementary strategy to our haplotype-based logistic regression analyses.
We analyzed subjects genotyped for the T1DGC MHC Fine Mapping project and identify the specific cohorts by the abbreviations adopted by the T1DGC (AP=Asia Pacific, BDA=British Diabetic Association, DAN=Danish, EUR=European, HBDI=Human Biological Data Interchange, JOS=Joslin Diabetes Center, NA=North American, SAR=Sardinian, UK=United Kingdom). Due to differences in coding of classical HLA genes in the BDA cohort compared to the other cohorts, BDA subjects were omitted from some of the analyses. The samples and genotype data in the T1DGC MHC fine-mapping study are described in greater detail in .
Phasing of chromosomes genotyped at 2919 MHC SNPs and at 8 classical HLA Class I and II genes was executed in Beagle software . We phased chromosomes across the MHC region separately for parents and children, and also separately for each T1DGC cohort. Beagle phasing was blind to affection status of subjects. These steps produced two phased MHC chromosomes for each T1DGC subject defined by alleles at the 2919 SNPs and 8 classical HLA genes. Combined phasing of the 2919 MHC SNPs and of the 8 classical HLA loci was anticipated to provide enhanced accuracy in the phased haplotypes since Beagle was able to incorporate information from the strong linkage disequilibrium which extends across the ~4.5 Mb region covered by the MHC SNPs .
Our logistic regression analyses aimed to test individual SNPs for association with T1D which might be due to: (1) altering the diabetes risk of a specific DRB1-DQA1-DQB1 haplotype or (2) marking a specific DRB1-DQA1-DQB1 haplotype for two subgroups with different diabetes risk. We therefore conducted logistic regression analyses for each common DRB1-DQA1-DQB1 haplotype that only considered the subset of phased MHC chromosomes carrying the DRB1-DQA1-DQB1 haplotype being evaluated and which tested individual SNPs carried on the haplotype for association with diabetes. Phased chromosomes were designated as “affected” (diabetic) or “unaffected/control” and coded as 1 or 0, respectively, to form the dependent variable of the regression equation. There were two predictor variables in the equation: (1) T1DGC cohort (coded as a factor) so that SNP allele frequencies for diabetic and control chromosomes were compared within cohort; and (2) an indicator variable to code for the presence (or absence) on the haplotype of a particular allele at the SNP being tested for association with diabetes. Note that this regression analysis treats the two haplotypes in each subject as independent observations analogous to standard allele tests for disease association by the Pearson chi-square or Armitage Trend statistics and hence no genetic model (recessive, dominant, etc.) is specified. Such tests of allelic rather than genotypic association with disease are often conducted in genetic investigations  and have been shown to be both valid and more powerful than genotypic tests for many disease models (for example, see [8,9]). Each SNP's association with diabetes was evaluated in the regression analysis by the standard likelihood ratio test which is distributed as chi-square with one degree of freedom. Logistic regression was carried out in R software (http://www.r-project.org/) with separate analyses being conducted for the following six common DRB1-DQA1-DQB1 haplotypes: 0401-0301-0302, 0405-0301-0302, 0701-0201-0303, 0301-0501-0201, 0101-0101-0501, 0701-0201-0201. Two relatively rare, diabetes-associated haplotypes (0801-0401-0402, 01601-0102-0502) were not evaluated due to their small sample size and likely limited statistical power. Prior to the regression analyses, we removed from the 11,279 total subjects approximately 11% who were missing genotypes at DRB1, DQA1, or DQB1.
In our initial efforts to apply logistic regression to Beagle-phased chromosomes, we assigned “affected” status only to parental chromosomes that were transmitted to all diabetic children in a family and “unaffected” status to the alternate parental chromosome not transmitted to any diabetic children in the family. We also explored a simpler approach in which, for each diabetic child, a parent's two chromosomes were tabulated as affected or unaffected depending on which was transmitted to the diabetic child. However, for many parental chromosomes (represented by extended, possibly recombinant, haplotypes), there were difficulties in unambiguously determining which chromosome had been transmitted to a diabetic offspring which resulted in substantial missing data whose missingness might not be random. We therefore ultimately settled on two other approaches to the haplotype-based regression analysis that did not require determination of which parental chromosome had been transmitted to a diabetic child. One of these analyses was to select one affected and one unaffected sibling from each T1DGC family that had an unaffected child and designate the two chromosomes in the diabetic sib as “affected” and those in the non-diabetic as “unaffected”. This regression analysis considered the 824 pairs of diabetic and non-diabetic sibs which were available in the T1DGC families. The second analysis was to select, from each T1DGC family, one diabetic child not used in the first regression analysis and to designate their chromosomes as “affected” while chromosomes from the child's parents were designated as “unaffected”. This second regression analysis was based on a sample set of 1387 diabetic children and their 2774 parents. Since these two regression analyses evaluated different subjects, we regard them as providing independent tests of whether a particular SNP is associated with diabetes when carried on chromosomes with a specific DRB1-DQA1-DQB1 haplotype.
We applied a homozygous parent TDT (hpTDT) analysis similar to that described by Lie et al  by creating our own software. In our version of hpTDT, we only considered SNP allele transmissions from parents who were homozygous for a specific haplotype at DRB1-DQA1-DQB1. To apply the hpTDT, we first identified parents homozygous for the particular DRB1-DQA1-DQB1 haplotype and among these parents identified the subset heterozygous at the SNP to be evaluated by the TDT for preferential transmission of one of its alleles to diabetic children. For this subset of heterozygous parents, we counted transmissions of the two SNP alleles to diabetic children and calculated the standard TDT chi-square statistic with one degree of freedom [χ2=(a−b)2/(a+b) where “a” and “b” are total counts of alleles A and B transmitted from heterozygous A/B parents to affected children].
Some DRB1-DQA1-DQB1 haplotypes are not sufficiently frequent to be found as homozygotes or are not carried as homozygotes by sufficient parents who are also heterozygous at the SNP to be evaluated by the hpTDT. To provide sufficient statistical power and to reduce multiple testing in our hpTDT analyses, we only considered SNPs with at least 10 allele transmissions from heterozygous parents to affected children. Due to total TDT transmissions falling below this threshold, a number of DRB1-DQA1-DQB1 haplotypes were not evaluated and, for other DRB1-DQA1-DQB1 haplotypes, we only considered a subset of the genotyped SNPs. In total, 5288 hpTDT tests were conducted representing ~30% of the ~17,500 regression tests that could be conducted if the 6 common HLA haplotypes (see above) were tested at all of the 2919 genotyped SNPs.
To combine chi-square results from the two regression analyses for each SNP and DRB1-DQA1-DQB1 haplotype, we applied the meta-analysis method of Stouffer . In this method, a chi-square value is converted to its corresponding Z-score and appropriate direction of association (positive or negative) followed by summing of the Z-scores and division by the standard deviation to achieve an overall Z-score for the independent components (Z-scores) of a study. Since the Z-scores are independent random variables, the standard deviation of the overall Z-score is the square root of the number of independent study components (hence we divided the summed Z-scores for the two T1DGC regression analyses by the square root of 2). This enabled us to achieve an overall p-value for each SNP which combined the regression results for the two independent analyses.
SNPs providing highly significant evidence for association were inspected for cluster plots and also assessed in relation to other genotyped SNPs if pairwise linkage disequilibrium with the statistically significant SNP was very high (r2>0.9) as determined from online MHC SNP genotypes in Caucasians . For example, SNP rs2040410 was highly significant within the 0301-0501-0201 haplotype when tested by the hpTDT (p<4.6×10−9); but we subsequently concluded this was due to genotyping artefact (see Appendix I). Evidence for genotyping error was partly provided by another genotyped SNP (rs6914950) with a pairwise r2 value of 0.93 with rs2040410. Following cluster plot inspection, rs2040410 was removed from further analysis. The cluster plots of rs2040410 and of another SNP (rs763026) having an unconfirmed association signal were suggestive of the presence of a nearby copy number variant (CNV) which is consistent with an elevated density of CNVs in this region (http://projects.tcag.ca/variation/).
For each T1DGC dataset, Table 1 shows the frequencies of DRB1-DQA1-DQB1 haplotypes in T1DGC parents as phased by Beagle software. The table lists all DRB1-DQA1-DQB1 haplotypes with at least 5 counts in the T1DGC parents considered in aggregate, and names of haplotypes known to be predisposing to type 1 diabetes [1,2] are highlighted in black. In general, haplotype frequencies are similar across the datasets except for those in the Sardinia (SAR). Sardinians appear to completely lack the diabetes-predisposing 0401-0301-0302 haplotype which is very frequent (>14%) in all of the other populations. Furthermore, other diabetes-predisposing haplotypes (0301-0501-0201, 0405-0301-0302, 1601-0102-0502) appear to be much more frequent in Sardinia than elsewhere. Marked differences in DRB1-DQA1-DQB1 haplotype frequencies between Sardinia and mainland Europe have previously been described .
As described above, we applied logistic regression to phased chromosomes carrying the same DRB1-DQA1-DQB1 haplotype in order to detect MHC SNPs that might exhibit association with type 1 diabetes because the SNP marks two subgroups of the DRB1-DQA1-DQB1 haplotype that have different diabetes risks. For each DRB1-DQA1-DQB1 haplotype that we evaluated, separate regression analyses were applied to two different datasets derived from the T1DGC families: (1) 824 pairs of diabetic and non-diabetic siblings identified by selecting one discordant sib-pair from each T1DGC family that had an unaffected child; (2) 1387 diabetic children and their 2774 parents identified by selecting, from each T1DGC family, one diabetic child not included in dataset (1). The results of regression analyses conducted on these two datasets can be regarded as being independent since the datasets consist of different subjects. Therefore, as a strategy to identify true-positive associations and filter out false positives and artefacts, we decided to focus on SNPs that gave evidence of significant association (p<0.05) in the regression analyses of both data sets. Although hpTDT results are not independent of the regression analysis results, we also applied the additional filter of requiring a SNP be statistically significant (p<0.05) in hpTDT analysis of the DRB1-DQA1-DQB1 haplotype in order to cross-check and validate the associations observed in the two regression analyses.
In applying these filters, we found that very few SNPs were statistically significant in both regression analyses conducted for the same DRB1-DQA1-DQB1 haplotype. Of the six common DRB1-DQA1-DQB1 haplotypes we evaluated, three haplotypes (0401-0301-0302, 0405-0301-0302, 0701-0201-0303) did not yield a single SNP with p<0.05 in both regression analyses. For the other three DRB1-DQA1-DQB1 haplotypes, most of the SNPs with significant (p<0.05) regression and hpTDT results were found for the 0301-0501-0201 haplotype and these results are displayed in Table 2. In some instances, the significant results in Table 2 are for two or more SNPs in strong pairwise LD (r2>0.9, see column 2 of the table) which increases confidence that the significant result is not due to genotyping error. The few remaining SNPs that gave significant results for 0101-0101-0501 or for 0701-0201-0201 in both regression analyses are shown in Table 3. Only one SNP in Table 3 (rs2523676 for the 0101-0101-0501 haplotype) was also significant in the hpTDT analysis.
The convergence of significant regression results for the particular SNPs listed in Tables Tables22 and and33 raises the question of whether any of these SNPs exhibit experiment-wide statistical significance that accounts for multiple testing. To provide an approximate p-value cutpoint for experiment-wide significance, we take 5288 as a lower bound on the number of statistical tests since this was the number of hpTDT tests conducted across the six DRB1-DQA1-DQB1 haplotypes after discarding hpTDT analyses for SNPs with fewer than 10 total TDT transmissions (see Methods). This lower bound for statistical tests implies a Bonferroni-corrected p-value of 9.4×10−6 for experiment-wide significance. To determine if any SNPs in Tables Tables22 or or33 yield a p-value below 9.4×10−6, we note that the two regression analyses conducted for each DRB1-DQA1-DQB1 haplotype can be considered independent and hence we combine the two results to achieve an overall p-value for each SNP (see Methods). The lowest such overall p-value from Tables Tables22 and and33 is 0.00011 (for rs3130685 in Table 2) and hence none of the combined regression analysis results appear to have experiment-wide significance. Nor do any of the hpTDT results, considered on their own, achieve experiment-wide significance. So, in summary, the SNPs highlighted in Tables Tables22 and and33 provide provisional evidence of marking subsets of DRB1-DQA1-DQB1 haplotypes with different diabetes risk; but none of the SNPs provided convincing experiment-wide significance and hence the SNPs may be false positives.
In this paper, we described results of a novel approach that searched for MHC SNPs that might mark chromosomes having different diabetes risk despite carrying the same haplotype at DRB1-DQA1-DQB1. This novel approach relied heavily on the newly released Beagle software which is able to rapidly phase genotypes from several hundred thousand SNPs in datasets consisting of several thousand subjects. We identified a number of SNPs that gave nominally significant evidence for marking subsets of the 301-501-201 haplotype that might differ with respect to their diabetogenic potency. However, none of the SNPs achieved experiment-wide significance and hence the SNPs may be false-positive associations.
To try to determine if any of these SNPs might be true-positives, we followed up these results by attempting to replicate them in the BDA cohort. The BDA families were not included in our primary analyses because DQA1 was not genotyped in BDA samples and the genotypes at DRB1 and DQB1 were coded somewhat differently than in the other T1DGC cohorts. However the BDA coding was unambiguous for carriers of the 301-501-201 haplotype enabling us to analyze this haplotype in the BDA samples using the same two regression analyses performed in the primary analysis (i.e. one diabetic versus non-diabetic sib per family; one affected child per family versus the parents). To increase the power of this attempted replication, we combined the results of the two BDA regression analyses by the same method used in the primary study (see Methods). When we compared these combined BDA regression results with those of the SNPs listed in Table 2, we found that none of the SNPs were replicated (i.e. p<0.05) in the BDA cohort. Though the smaller size of the BDA dataset might lack sufficient power to replicate the modest effect sizes in Table 2, the failure to replicate reinforces the impression that the SNPs may be false-positives.
This raises the issue of whether failure to identify more robust findings might be due to limitations or deficiencies of the analysis we conducted. Several items should be mentioned in this regard: The regression comparisons of affected versus non-affected sib, and of affected child versus parents are likely to be less powerful than a comparison of diabetic cases versus unrelated controls . Another possible weakness might arise from an unidentified feature of the Beagle software or its phased haplotypes which are not well suited to the haplotype-based regression analysis we conducted. We included the hpTDT analysis to guard against this possibility and to provide an alternative test for association that did not depend on haplotype phasing; however, a limitation of the hpTDT is that many combinations of heterozygous SNP and homozygous DRB1-DQA1-DQB1 haplotype were found in too few parents to test. Further work to address these possible deficiencies and limitations might identify more robust associations in which SNPs mark subgroups of DRB1-DQA1-DQB1 haplotypes with different diabetes risk. In addition, phasing of MHC haplotypes in large numbers of subjects may help answer a related question we did not address: namely whether any MHC variants modulate type 1 diabetes independent of specific DRB1-DQA1-DQB1 haplotypes.
This research utilizes resources provided by the Type 1 Diabetes Genetics Consortium, a collaborative clinical study sponsored by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institute of Allergy and Infectious Diseases (NIAID), National Human Genome Research Institute (NHGRI), National Institute of Child Health and Human Development (NICHD), and Juvenile Diabetes Research Foundation International (JDRF) and supported by U01 DK062418.
This work was supported by the Wellcome Trust.
As noted in the main text, our initial hpTDT analyses identified SNP rs2040410 as providing highly significant TDT evidence (p<4.6×10−9) of marking two subsets of the 0301-0501-0201 haplotype with very different diabetes risks. However, further examination of this result indicated that it is likely to be due to a limited number of genotype errors in which homozygous parents who are “1/1” at rs2040410 have been miscalled as “1/3” heterozygotes; and hence TDT allele transmissions to affected children are tabulated for these (apparently) heterozygous parents. Since the other parent in the 24 families generating TDT significance is almost always a “3/3” homozygote, it is important to note that all children in the 24 families (affected and unaffected) have been called as “1/3” heterozygotes – this is exactly what their genotypes should be if their parent's genotypes are actually “1/1” and “3/3” at rs2040410. The affected “1/3” children appear to always inherit allele 1 (and never allele 3) from the likely miscalled “1/3” parent which, in turn, generates the highly significant hpTDT result. However, evidence that the “1/3” parents are miscalled is provided by the fact that all non-diabetic children in the 24 families also exhibit the same extremely skewed inheritance of allele 1 as diabetic children. Furthermore, strong evidence for miscalling of rs2040410 in the “1/3” parents is also provided by rs6914950 which is in strong LD with rs2040410 in HapMap CEU subjects (r2=0.93). All of the “1/3” parents at rs2040410 in the 24 families are called as “1/1” homozygotes at rs6914950. Consequently, no significant hpTDT result is generated at rs6914950 because there are almost no parents in the 24 families who are heterozygous at rs6914950 and homozygous for 301-501-201 at DRB1-DQA1-DQB1. This again indicates that the correct genotype for the “1/3” parents at rs2040410 is “1/1”.
CONFLICT OF INTEREST STATEMENT
The authors declare that they have no conflict of interests in publishing this article.