|Home | About | Journals | Submit | Contact Us | Français|
We recently reported an association with type 1 diabetes of a telomeric MHC SNP rs1233478. As further families have been analyzed in the Type 1 Diabetes Genetics Consortium (T1DGC), we sought to test replication of the association and with more data analyze haplotypic associations.
We have since analyzed an additional 2,717 case and 1,315 control chromosomes from the T1DGC, with HLA-typing and data for 2,837 SNPs across the MHC region.
We confirmed the association of rs1233478 [new data only: p=2.2E-5, OR=1.4]. We also found two additional SNPs nearby which were significantly associated with type 1 diabetes (new data only rs3131020: p=8.3E-9, OR=0.65; rs1592410 p=2.2E-8, OR=1.5). For studies of type 1 diabetes in the MHC region it is critical to account for linkage disequilibrium with the HLA genes. Logistic regression analysis of this new data indicated that the effects of rs3131020 and rs1592410 on type 1 diabetes risk are independent of HLA alleles (rs3131020: p=2.3E-3, OR=0.73; rs1592410: p=2.1E-3, OR=1.4). Haplotypes of 12 SNPs (including the three highly significant SNPs) stratify diabetes risk (high risk, protective, and neutral), with high risk haplotypes limited to approximately 20,000 base pairs in length. The 20,000 base pair region is telomeric of the UBD gene and contains LOC729653, a hypothetical gene.
We believe that polymorphisms of the telomeric MHC locus LOC729653 may confer risk for type 1 diabetes.
Type 1A diabetes is a complex autoimmune disease and is associated with genes within the major histocompatibility complex (MHC) and in other regions of the genome. Genome-wide association studies and other studies have identified more than 40 type 1 diabetes-associated genes, including insulin (INS), PTPN22, CTLA4, IFIH1 and IL2RA/CD251–17. However, the most significant association is with the MHC region, particularly the HLA-DRB1, HLA-DQA1, and HLA-DQB1 genes. The highest risk genotype for type 1 diabetes is DRB1*03-DQA1*0501-DQB1*0201/DRB1*04-DQA1*0301-DQB1*0302 (DR3/4)18,19.
There are genes other than HLA-DR/DQ within the MHC that also contribute to type 1 diabetes risk. Alleles of HLA-DPB1, HLA-B and HLA-A have also been associated with type 1 diabetes, independent of linkage disequilibrium with HLA-DR/DQ20,21. In particular, the HLA-B*39 allele is very high risk for type 1 diabetes, although it has a low frequency (4% of T1DGC case chromosomes)22,23. HLA-DPB1*0402 is protective against the development of anti-islet autoimmunity and diabetes24–26. We believe there are additional genes in the MHC region that contribute to type 1 diabetes risk as DR3/4 siblings that share 2 MHC haplotypes identical by descent with the proband have an 85% risk for developing anti-islet autoimmunity whereas those individuals that share 1 or 0 haplotypes identical by descent only have a risk of 20%27. As expected, class I HLA alleles associated with type 1 diabetes such as B*39 and A*24 are uncommon in the DAISY sibling case dataset given their low overall frequency.
Using an initial partial dataset from the Type 1 Diabetes Genetics Consortium (T1DGC), we previously reported an association of rs1233478 at the telomeric end of the MHC region with type 1 diabetes risk28. In this study, we analyzed additional data from the T1DGC, which provided an additional 1,060 affected sibling pairs and their parents for classical HLA alleles and 2,837 SNPs (single nucleotide polymorphisms) across the MHC region. We have confirmed the previous association at the telomeric end of the MHC region, identified additional strongly associated linked SNPs, and have detected a transcribed gene LOC729653 within the candidate region.
This analysis included 2,300 affected sibling pair families (10,012 individuals typed for HLA and/or SNPs) from the Type 1 Diabetes Genetics Consortium (T1DGC), using the 2007.11.MHC data freeze29. Affected sibling pairs and their parents were enrolled in 9 cohorts worldwide. Within the analyzed cohorts of Asia-Pacific, Europe, North America, UK, British Diabetes Association (BDA), Danish, Human Biological Data Interchange (HBDI), Joslin and Sardinian, 99% of individuals are classified as white/Caucasian or unknown. The T1DGC performed basic quality control analyses on the data. All study participants or their parents/surrogates provided written informed consent to participate, and the study protocol was approved by the relevant Ethics Committees and Institutional Review Boards.
Genotyping was completed for 3,072 SNPs at the Wellcome Trust Sanger Institute, using custom Illumina exon-centric and mapping panels [2957 distinct SNPs (1536 SNPs in each panel with 115 overlapping SNPs) with 2837 of 2957 SNPs successfully typed, yielding a 96% SNP success rate]. In addition, complete 4 digit HLA typing (HLA-DPB1, HLA-DPA1, HLA-DQB1, HLA-DQA1, HLA-DRB1, HLA-B, HLA-C and HLA-A), performed using immobilized probe linear arrays, was available for all samples30.
SNP positions used NCBI Build 36. T1DGC chromosomes were generated from SNP and HLA genotype data using multiple software packages. First, to establish that the genotype data demonstrated a Mendelian inheritance pattern within each family, the PedCheck program (http://watson.hgen.pitt.edu)31 was used on data from both Illumina panels and HLA separately. Mendelian inheritance patterns were present for all families. Next, data from the Illumina mapping SNP panel, the exon-centric SNP panel, and HLA were combined using custom Java programs. Merlin software (www.sph.umich.edu/csg/abecasis/Merlin)32 was used to phase the SNP and HLA genotype data from families into chromosomes. In situations of ambiguous phase (resulting from heterozygous SNPs or HLA in all family members), phase was not inferred and instead unphased alleles were labeled as such and were excluded from analyses where appropriate.
Founder chromosomes were used in these analyses, yielding 4 unique chromosomes per family. AFBAC (affected family based control) methodology was used to assign case or control status to chromosomes using Microsoft Excel macros as previously described33–36. Founder chromosomes were labeled as case if they were ever transmitted to a case individual. Correspondingly, chromosomes were labeled as control if they were never transmitted to a case individual. Two affected children in each family were used to label the chromosomes as either case or control, as this was the ascertainment scheme for T1DGC.
LOC729653 expression was analyzed using the TissueScan Human Major Tissue qPCR Array Panel (cDNA from 48 normal human tissues) according to the manufacturer’s protocols (Origene Technologies, Rockville, MD). Reactions were amplified using primers generated against exon 3 (Ex3F) and the 3’ untranslated region (UTR1R) (Supplemental Table 1). The amplification conditions were as follows: 3 minutes at 94°C, 40 cycles (94°C for 15 seconds, 57°C for 1 minute, 72°C for 1 minute), and a final extension of 72°C for 15 minutes.
Expression results were verified using Taqman quantitative PCR (Applied Biosystems, Foster City, CA), the TissueScan Human Major Tissue qPCR Array Panel and purified human whole pancreas and pancreatic islets (> 75% pure, >75% viable) from healthy non-diabetic donors procured from the JDRF nPOD initiative and the NIH/NIDDK-sponsored Islet Cell Resource (ICR) distribution program respectively. Total RNA was extracted from human pancreas and isolated islets by TRIzol reagent® (Invitrogen, Carlsbad, CA). RNA was purified by RNeasy columns (Qiagen, Valencia, CA), quantified and quality verified by capillary electrophoresis (Agilent-2100 Bioanalyzer; Agilent, Palo Alto CA). cDNA was synthesized using iScript® cDNA Synthesis Kit (BioRad, Hercules, CA). The Taqman primers and probes designed and purchased from Applied Biosystems were as follows: forward primer: CATTTCTGGATTGCTGTCTGTC (anneals between residues 37 and 58 with a Tm of 56°), reverse primer: CAGGAGATGTGGTTCTGGTAGCT (anneals between residues 134 and 112 with a Tm of 58°), FAM /TAMRA probe: CATCTGCCCTGCTCAGTCCTGGG (anneals between residues 83 and 105 with a Tm of 68°). Three micrograms (µg) of cDNA was amplified in a 30 µl standard amplification reaction using Taqman Universal Master Mix in an Applied Biosystems Thermocycler ABI Prism® 7000. Two experiments were performed using 40 cycles, and the final experiment was performed using 45 cycles. Mean Ct (Table 3) was calculated across the three experiments, substituting either 40 or 45 for the Taqman calculations where there was no amplification (>40 or >45 in Supplemental table 5).
Further characterization of the LOC729653 gene sequence was performed using First Strand cDNA Human Stomach (Origene Technologies, Rockville, MD) with primers listed in Supplemental Table 1 (Ex2F and UTR1R, Ex3F and UTR1R, Ex4F and UTR2R), followed by Sanger sequencing of each resulting product. In addition, Sanger sequencing was performed on 8 T1DGC individuals, each homozygous for one of four different haplotypes surrounding LOC729653. Sequencing focused on regions surrounding exons and the 3’ UTR of LOC729653, and partially covered the region from 29.571 Mb to 29.586 Mb. One low risk haplotype, one neutral and two different high risk haplotypes were represented, with sequencing performed for two individuals homozygous for each of the four haplotypes. Results are presented in supplemental table 6.
The 5’ end of LOC729653 was mapped by using rapid amplification of cDNA ends (5’ RACE) using the SMARTer RACE cDNA Amplification Kit and the Advantage® 2 PCR Kit according to instructions supplied by the manufacturer (Clontech Laboratories, Mountain View, CA). Primers used for the 5’ RACE were generated against the 3’ UTR and exon 4: one at the start of the 3’ UTR (RACE_OUT) and the other was a nested primer located in exon 4 (RACE_IN). 5’ RACE-ready cDNA was obtained from First Strand cDNA Human Stomach (Origene Technologies, Rockville, MD). The initial touchdown PCR conditions were 5 cycles (94°C for 30 seconds and 72°C for 3 minutes), 5 cycles (94°C for 30 seconds, 70°C for 30 seconds, and 72°C for 3 minutes) and 25 cycles (94°C for 30 seconds, 68°C for 30 seconds, and 72°C for 3 minutes), using the RACE_OUT primer. The reaction product was then used in a second round of PCR using the same conditions and the RACE_IN primer, followed by Sanger sequencing of the resulting products.
The Fisher’s exact test (two-sided) was used to calculate p-values for association with type 1A diabetes, using α=0.05. In addition, to verify that the significant findings were not caused by population stratification, the transmission disequilibrium test (TDT) was computed on the SNPs surrounding LOC729653, using Haploview version 4.2 (http://www.broadinstitute.org/mpg/haploview)37. We also used Haploview version 4.2 to determine Hardy-Weinberg Equilibrium (HWE) values for SNPs of interest and to generate the linkage disequilibrium plot in Figure 2. For the linkage disequilibrium plot, we used the standard color scheme and the default Gabriel et al. confidence interval definition of haplotype blocks38. An in-house script was used to complete the 12 SNP and 14 SNP haplotype analyses on case/control founder chromosomes. SAS software (version 9.2) was used to perform logistic regression analysis.
Utilizing additional data provided on 1,060 additional families from the Type 1 Diabetes Genetics Consortium (T1DGC), we were able to replicate the association of rs1233478 (identified in 1,240 initial families) at the telomeric end of the MHC region with type 1 diabetes (p=2.2E-5, OR=1.4) (Figure 1A, Supplemental Table 2). We also found two additional SNPs nearby which were significantly associated with type 1 diabetes (rs3131020: p=8.3E-9, OR=0.65; rs1592410: p=2.2E-8, OR=1.5). Given the larger combined dataset, three SNPs within and near LOC729653 are significantly associated with type 1 diabetes [rs3131020 p=1.6E-20, odds ratio (OR)=0.64; rs1233478 p=3.2E-25, OR=1.9; rs1592410 p=6.6E-20, OR=1.5] (Figure 1B, Supplemental Table 3). A summary of these results is presented in Table 1. We also focused specifically on the LOC729653 region (Figure 1C). These results were confirmed when we used the TDT to analyze transmission to affected children for each SNP. For the combined dataset, the “A” allele for all three SNPs was significantly over-transmitted to affected children (rs3131020: p=1.6E-22; rs1233478: p=3.3E-27; rs1592410: p=1.2E-21).
We performed haplotypic analyses to further localize the association signal. The high risk alleles for the three SNPs (in the T1DGC dataset) are rs3131020 “A”, rs1233478 “A” and rs1592410 “A.” A 12 SNP haplotype that includes A-A-A for the three SNPs of interest is the only haplotype (with frequency greater than 1%) that is significantly over-transmitted (CAGTGCAACAAC) (Table 2). When this is expanded to a 14 SNP haplotype (one additional SNP on each side), three haplotypes are significantly over-transmitted, all with the same core 12 SNP haplotype (Table 2). This suggests that the association signal [or causative high risk allele(s)] is within this 14 SNP range, or 19,440 base pairs. In addition, there are low risk or protective haplotypes present at this locus. The protective haplotypes of these 14 SNPs have GCTG and GCCG for rs3131020, rs1233478, rs3094572 and rs1592410, indicating that there is already a difference in risk across these 4 SNPs (8,066 base pairs), suggesting that the causative protective polymorphism may be within this region. We examined the linkage disequilibrium across the 14 SNPs of interest and found that there is one haplotype block in the region, encompassing the 12 central SNPs identified in our analysis (Figure 2).
As shown with HLA-DR/DQ stratification (Supplemental Table 4), the diabetes association of the rs1233478 “A” allele was not due to a single HLA-DR/DQ haplotype, as 12/14 haplotypes (with at least 100 chromosomes available for analysis) have positive odds ratios. To further evaluate the association of these SNPs in the context of HLA-DR/DQ effects in addition to class I HLA effects, we used logistic regression analysis to determine if the diabetes association of the 12 SNP haplotype is independent of the classical HLA alleles. We looked at the new/replication data and the combined dataset. In the replication dataset, two of the three SNPs have an effect independent of HLA-DQB1, HLA-DRB1, HLA-B and HLA-A (rs3131020: p=2.3E-3, OR=0.73; rs1233478: p=0.4, OR=1.1; rs1592410: p=2.1E-3, OR=1.4). In the combined data, each SNP by itself has an effect independent of HLA-DQB1, HLA-DRB1, HLA-B and HLA-A (rs3131020: p=0.02, OR=0.83; rs1233478: p=4.1E-3, OR=1.3; rs1592410: p=7.3E-3, OR=1.2). A summary of these results is presented in Table 1. We also tested the 12 SNP haplotype (coded as the high risk haplotype, CAGTGCAACAAC or not, which included all other haplotypes), and found that in the combined data the haplotype does have an effect independent of HLA-DQB1, HLA-DRB1, HLA-B and HLA-A (haplotype: p=0.01, OR=1.3). In a model with the three SNPs (rs3131020, rs1233478 and rs1592410) and the 12 SNP haplotype, the final model includes the 12 SNP haplotype and rs1592410. Extending this analysis, we tested the effect of the three SNPs only (rs3131020, rs1233478 and rs1592410) and found that in the combined data, the final model includes only rs3131020 and rs1233478, not rs1592410. We performed another logistic regression analysis to test the effect of the 12 SNP haplotype with specific high risk HLA alleles, and the final model included the 12 SNP haplotype (p=0.04, OR=1.2), HLA-DQB1, HLA-DRB1, HLA-B*39 and HLA-A*24.
Motivated by these results, we studied LOC729653. There is limited data available for LOC729653 from NCBI UniGene, other than two ESTs (expressed sequence tags) found in prostate tissue. We assayed cDNA from 48 different tissues, and found that spliced LOC729653 cDNA is present at low levels in 11 tissues (Table 3). We verified these results using Taqman quantitative PCR in three separate experiments (Table 3, detailed results in Supplemental Table 5). We found that LOC729653 cDNA is present at low levels in 19 tissues (defined as expression in 3 of 3 Taqman experiments), with less consistent expression results in an additional 20 tissues (defined as expression in 1 or 2 Taqman experiments). We also tested both whole pancreas and adult pancreatic islets, and by Taqman found expression in whole pancreas but not adult pancreatic islets. We next performed a combination of sequencing and 5’ RACE to determine the actual sequence of the transcribed LOC729653 mRNA. We found that LOC729653 contains at least three exons and has an unusual 3’ UTR that includes an intron that is spliced from the cDNA. A diagram of the transcript we have identified thus far is presented in Figure 3 and the sequence we obtained is in Supplemental Figure 1. We also sequenced eight T1DGC individuals, each homozygous for one of four haplotypes across LOC729653. We found that there was only one SNP at which the two individuals homozygous for the neutral haplotype differed, and the two individuals homozygous for the first high risk haplotype also differed at this SNP, which is presented in Supplemental Table 6. The only SNP which has one allele present in all the high risk individuals and the other allele present in all the low risk and neutral individuals is rs1233478. The only coding SNP identified was rs734961, and all eight individuals were homozygous for the common allele at this SNP.
We report here the confirmation of the association of SNPs at the telomeric end of the MHC region independent of HLA alleles (rs1233478, p=2.2E-5, OR=1.4) by analyzing additional families from the T1DGC. Using haplotype analysis, we define a conserved 20,000 base pair region for high risk haplotypes which includes the transcribed gene LOC729653. We have analyzed the expression pattern of LOC729653, and find that the gene is expressed in at least 19 different tissues.
Logistic regression analysis indicated an association of SNPs within this telomeric locus independent of HLA-DQB1, HLA-DRB1, HLA-B and HLA-A (rs1233478 p=4.1E-3, OR=1.3; haplotype p=0.01, OR=1.3). Given the logistic regression analysis with multiple loci, the p-values for the SNPs and the haplotype drop drastically when correction for classical HLA alleles is applied. There are 13 HLA-DRB1, 12 HLA-DQB1, 26 HLA-B and 15 HLA-A alleles in the dataset, which adds significant complexity to the logistic regression model. Even though the odds ratio for this haplotype is only 1.3 compared to higher odds ratios for B*39 and A*24 (HLA-B*39 p=3.3E-6, OR=2.7 and HLA-A*24 p=4.5E-5, OR=1.6 in the logistic regression model with the LOC729653 haplotype, HLA-DQB1 and HLA-DRB1), this high risk haplotype is much more common (22%) than A24 (10%) or B39 (3%). This odds ratio of 1.3 for the haplotype is equivalent to or greater than most non-MHC type 1 diabetes loci [INS relative risk (RR)=1.7, PTPN22 RR=1.6, CTLA4 RR=1.15, IL2RA RR=1.1, and IFIH1 RR=0.88]17.
Although finding a similar initial association of rs1233478, another group (Nejentsev and Howson) did not find a significant association after recursive partitioning with HLA-DRB1, HLA-DQB1, HLA-B, HLA-A and HLA-DPB1 alleles22,23. These analyses used recursive partitioning with genotypes rather than actual phased chromosomal haplotypes (that can be derived from the T1DGC data due to family analysis) to model the effect of HLA-DRB1 and HLA-DQB1 genotypes, followed by conditional logistic regression for HLA-B, HLA-A and HLA-DPB1. In the current study, we utilized defined chromosomal haplotypes and thus were able to define control chromosomes (non-transmitted chromosomes), while Nejentsev and Howson studied genotypes (both chromosomes together). Analysis of genotypes limits the number of controls with the DR3/4 highest risk genotype, as it is present in only 2.4% of the general population but in 30% of type 1 diabetes patients19. Analysis of defined chromosomal haplotypes allows for ascertainment of the direct contribution of DR3 and DR4 chromosomes that are present in approximately 22% of non-transmitted chromosomes.
A recent report describes a genome-wide linkage scan in the rat39. The results of this indicate that a genomic region, termed Iddm37, is linked to susceptibility to KRV-induced diabetes in LEW.1WR1 rats. One-hundred percent of LEW.1WR1 rats develop diabetes when treated with poly I:C and injected with kilham rat virus (KRV), whereas the control strain, LEW, does not develop diabetes under these conditions. This region includes the rat UBD gene, and is homologous to the human region that contains the human UBD gene. The authors also analyzed UBD mRNA expression levels in the rat model. UBD mRNA expression in both the spleen and pancreatic lymph node is higher in the LEW.1WR1 rat than in the LEW rat both prior to poly I:C treatment and when treated with poly I:C and KRV. Another recent report has described the association of a UBD non-synonymous SNP (rs11724) with celiac disease, another autoimmune disease40. Unfortunately, this SNP was not typed by the T1DGC. However, the 3’ end of the UBD gene is only 35,159 base pairs from the centromeric end of our defined 20 kb region. One major UBD function is to direct proteins to the proteasome using E1, E2 and E3 molecules that are different from the usual ubiquitin pathway members41,42. Additionally, UBD is inducible in the setting of inflammation as IFN-γ and TNF-α are cytokines known to increase UBD expression43. Finally, UBD has an important role in dendritic cell maturation44,45 and therefore has a potential role in generating auto-reactive T-cells that recognize self-peptides39. While the 12 SNP haplotype discussed here excludes UBD gene itself, there could be potential regulatory elements in the 20 kb region that affect UBD. Additionally, the haplotypic analysis discussed here is not exhaustive given the complexity of the MHC region. There are additional SNPs in the region surrounding LOC729653 that were not genotyped by the T1DGC, and future analyses could include either imputation of these SNPs using data from the 1000 Genomes Project and the International HapMap Project or further genotyping of additional SNPs in T1DGC individuals.”
We have now established the expression of LOC729653 RNA in at least nineteen different human tissues. This includes lymph node, thymus, stomach, pancreas, bone marrow, cervix, colon, duodenum, epididymus, small intestine, fat, lung, prostate, rectum, testis, urethra, urinary bladder, vagina, and uterus. We did not find expression in human pancreatic islets. The function of this gene is at present unknown. Although several of the type 1 diabetes autoantigens [e.g. glutamate decarboxylase 1 (GAD1), chromogranin A (CHGA), and dystrophia myotonica-protein kinase (DMPK)] are widely expressed in tissues outside of pancreatic islets, lack of a restricted expression pattern (e.g. pancreatic islets or specific lymphocyte subsets) for LOC729653 makes its direct contribution to diabetes pathogenesis less clear.
Analysis of DR3/4 siblings of patients with type 1 diabetes reveals that those siblings who share two MHC haplotypes identical by descent have a risk of 80% for anti-islet autoimmunity, compared to DR3/4 siblings who do not share both MHC haplotypes identical by descent and have a risk of approximately 20%27. This suggests that non-DR/DQ MHC loci may result in a four fold increase of risk in individuals with the DR3/4 genotype. A major portion of this non-DR/DQ association with the MHC region is likely not accounted for by low-frequency HLA-B or HLA-A alleles, nor by the telomeric locus reported in this study (odds ratio after correction is 1.3). We thus believe additional studies with direct sequencing of multiple HLA haplotypes and an additional in depth study of the MHC is essential to better define the genetics of type 1A diabetes.
The current study identifies a telomeric region of the MHC that is associated with type 1 diabetes independent of HLA alleles. We have identified a transcribed gene of unknown function as a candidate.
This study not only confirms our prior association study of a telomeric type 1 diabetes locus but identifies a gene candidate.