Wolfram syndrome 1 (WFS1) single nucleotide polymorphisms (SNPs) are associated with risk of type 2 diabetes. In this study we aimed to refine this association and investigate the role of low-frequency WFS1 variants in type 2 diabetes risk.
For fine-mapping, we sequenced WFS1 exons, splice junctions, and conserved noncoding sequences in samples from 24 type 2 diabetic case and 68 control subjects, selected tagging SNPs, and genotyped these in 959 U.K. type 2 diabetic case and 1,386 control subjects. The same genomic regions were sequenced in samples from 1,235 type 2 diabetic case and 1,668 control subjects to compare the frequency of rarer variants between case and control subjects.
Of 31 tagging SNPs, the strongest associated was the previously untested 3′ untranslated region rs1046320 (P = 0.008); odds ratio 0.84 and P = 6.59 × 10⁻⁷ on further replication in 3,753 case and 4,198 control subjects. High correlation between rs1046320 and the original strongest SNP (rs10010131) (r² = 0.92) meant that we could not differentiate between their effects in our samples. There was no difference in the cumulative frequency of 82 rare (minor allele frequency [MAF] <0.01) nonsynonymous variants between type 2 diabetic case and control subjects (P = 0.79). Two intermediate frequency (MAF 0.01–0.05) nonsynonymous changes also showed no statistical association with type 2 diabetes.
We identified six highly correlated SNPs that show strong and comparable associations with risk of type 2 diabetes, but further refinement of these associations will require large sample sizes (>100,000) or studies in ethnically diverse populations. Low frequency variants in WFS1 are unlikely to have a large impact on type 2 diabetes risk in white U.K. populations, highlighting the complexities of undertaking association studies with low-frequency variants identified by resequencing.
The post genome-wide association study era presents several challenges. These include fine-mapping association signals to genes and/or variants within the genomic regions of interest, assessing the impact of low frequency variants (not tagged in previous association studies) on diseases/traits, and understanding the functional mechanisms behind genetic associations.
WFS1 encodes wolframin (1,2), an endoplasmic reticulum (ER) membrane protein with a role in ER calcium homeostasis (3–5) and in the ER stress response (6,7). Loss-of-function mutations in WFS1 cause Wolfram syndrome (MIM 222300), which includes young onset nonautoimmune insulin-dependent diabetes (8). Common single nucleotide polymorphisms (SNPs) at WFS1 have recently been shown to be reproducibly associated with type 2 diabetes risk in white European populations (9–11). However, the strongest associated SNP, rs10010131, is intronic and is not associated with WFS1 expression in HapMap lymphoblastoid cell lines (12), suggesting that it is tagging a causal variant(s).
Given the impact of rare and common WFS1 variants on Mendelian and common forms of diabetes, respectively, WFS1 is an excellent candidate gene in which to look for low frequency variants with intermediate effects on diabetes risk. Furthermore, anecdotal evidence suggests increased type 2 diabetes susceptibility in obligate carriers of Wolfram syndrome mutations (13).
We aimed to refine the association between WFS1 common variants and type 2 diabetes by sequencing exons, splice junctions, and conserved intragenic and upstream noncoding regions in a subset of case (n = 24) and control (n = 68) subjects from the Cambridgeshire case-control study. We used these data to select tagging SNPs to capture common (minor allele frequency [MAF] >0.05) and nonsynonymous variants and genotyped these tagging SNPs in two U.K. case-control studies (total 959 case and 1,386 control subjects). Replication studies were conducted in four additional studies: two U.K., one Swedish, and one Ashkenazi (total 3,753 type 2 diabetic case and 4,198 control subjects). We also aimed to test for the presence of independent type 2 diabetes association signals from low-frequency (MAF <0.05) putative functional WFS1 variants by sequencing 1,235 type 2 diabetic case and 1,668 control subjects from two U.K. case-control studies.
The Cambridgeshire (14) (552 type 2 diabetic case and 552 control subjects), European Prospective Investigation into Cancer and Nutrition (EPIC)-Norfolk (15) (417 case and 834 control subjects), Anglo-Danish-Dutch Study of Intensive Treatment in People With Screen-Detected Diabetes in Primary Care (ADDITION)/Ely (16,17) (926 case and 1,497 control subjects), and Exeter (18–20) (601 case and 610 control subjects) studies comprise white U.K. participants. The Ashkenazi study comprises 930 type 2 diabetic case and 461 control subjects of Ashkenazi Jewish origin (21). The Västerbotten study comprises predominantly northern Swedish whites (1,296 type 2 diabetic case and 1,412 control subjects) (11). The online appendix (available at http://diabetes.diabetesjournals.org/cgi/content/full/db09-0920/DC1) describes the cohorts in detail.
PCR was performed on genomic DNA from Cambridgeshire case-control participants or whole-genome amplified DNA from ADDITION and Ely study participants. Fourteen primer pairs (sequences and cycling conditions available upon request), designed using Primer3 software (http://frodo.wi.mit.edu/primer3/), were required to amplify the eight WFS1 exons, including splice junctions, untranslated regions (UTRs), and selected conserved regions. Coverage is shown in supplemental Fig. 1 in the online appendix. PCR and bi-directional sequencing were performed using standard conditions and following manufacturers' protocols (supplemental Methods). Sequencing reactions were run on ABI3730 capillary machines (Applied Biosystems) and analyzed using an automatic SNP caller, ExoTrace (S. Leonard, Wellcome Trust Sanger Institute, unpublished data). The results of SNP calling were displayed and low-frequency variants were manually reviewed in a specific implementation of GAP4 (Staden Sequence Analysis Package software). All regions produced usable sequences for >90% of samples.
Details are provided in the supplemental Results and supplemental Fig. 2. Linkage disequilibrium (LD) was calculated using Haploview version 4.0 (http://www.broad.mit.edu/mpg/haploview), and pairwise tagging SNPs were selected by Tagger using r² ≥ 0.8, force including nonsynonymous variants.
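The pairwise tagging criterion can be illustrated with a minimal sketch. It assumes phased haplotypes coded as 0/1 alleles (the study data were unphased genotypes, for which Haploview estimates haplotype frequencies), and the greedy selection below is a simplified stand-in for Tagger, not its implementation. The function names and the toy haplotypes are hypothetical.

```python
from itertools import combinations


def r_squared(haplotypes, i, j):
    """Squared correlation (r^2) between biallelic sites i and j.

    `haplotypes` is a list of phased haplotypes, each a sequence of 0/1 alleles.
    """
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n
    p_j = sum(h[j] for h in haplotypes) / n
    p_ij = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        return 0.0  # monomorphic site: r^2 is undefined, treat as not captured
    d = p_ij - p_i * p_j  # linkage disequilibrium coefficient D
    return d * d / denom


def greedy_pairwise_tags(haplotypes, threshold=0.8, forced=()):
    """Choose tags so that every site has r^2 >= threshold with at least one tag.

    `forced` lists site indices that must be tags (for example, nonsynonymous
    variants that are force-included).
    """
    n_sites = len(haplotypes[0])
    r2 = {(i, j): r_squared(haplotypes, i, j)
          for i, j in combinations(range(n_sites), 2)}

    def captures(i, j):
        return i == j or r2[(min(i, j), max(i, j))] >= threshold

    tags = list(forced)
    uncovered = {s for s in range(n_sites)
                 if not any(captures(t, s) for t in tags)}
    while uncovered:
        # Pick the uncovered site that captures the most uncovered sites
        best = max(sorted(uncovered),
                   key=lambda c: sum(captures(c, s) for s in uncovered))
        tags.append(best)
        uncovered = {s for s in uncovered if not captures(best, s)}
    return tags


# Toy data: sites 0 and 1 are perfectly correlated, site 2 is independent
toy_haplotypes = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
tags = greedy_pairwise_tags(toy_haplotypes)
forced_tags = greedy_pairwise_tags(toy_haplotypes, forced=(1,))
```

In the toy data one tag suffices for the two correlated sites, while the independent site needs its own tag; forcing site 1 simply replaces site 0 as the tag for that pair.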
Tagging SNPs were genotyped using the Sequenom iPlex platform, whereas rs1046320 and rs7691824 were genotyped using TaqMan MGB chemistry (Applied Biosystems, Foster City, CA) according to the manufacturers' instructions (conditions and primers available upon request).
Variants were excluded if they departed from Hardy-Weinberg equilibrium (P < 0.001) or had low call rates (n < 85%) and/or if there was discrepancy in the call rate between case and control subjects (P < 0.001) (details provided in the online appendix).
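As an illustration of these exclusion criteria, the sketch below applies a 1-df chi-square test for Hardy-Weinberg equilibrium and a call-rate threshold to genotype counts. The choice of the chi-square test (rather than an exact test) is an assumption, since the test used is not specified here; the comparison of call rates between case and control subjects is omitted, and the function names are hypothetical.

```python
import math


def hwe_chi_square_p(n_aa, n_ab, n_bb):
    """P value of the 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(n_aa, n_ab, n_bb, n_attempted,
              hwe_alpha=0.001, min_call_rate=0.85):
    """Keep a variant only if its call rate and HWE P value clear the thresholds."""
    call_rate = (n_aa + n_ab + n_bb) / n_attempted
    return (call_rate >= min_call_rate
            and hwe_chi_square_p(n_aa, n_ab, n_bb) >= hwe_alpha)
```

For example, `passes_qc(250, 500, 250, 1000)` keeps a variant whose genotype counts match Hardy-Weinberg proportions exactly, whereas a marked excess of homozygotes or a call rate below 85% leads to exclusion.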
The relative expression of rs1046320 alleles was assessed using total RNA from 10 HapMap lymphoblastoid cell lines (12) heterozygous for rs1046320 (six CEU, two YRI, and two CHN) and a TaqMan RNA-to-CT 2-Step Kit according to the manufacturer's instructions (primers and probes available upon request). Genomic DNA was used as a control.
Statistical analyses were conducted in StataSE 9. Logistic regression was used to assess the contribution of individual SNPs under a log-additive model (1 df) to risk of type 2 diabetes using study as a categorical covariate. Log-likelihood ratio tests were used to assess whether associated SNPs independently contributed to risk of type 2 diabetes by comparing the log likelihood of a nested model (2 df) containing an associated SNP and study with that of the full model (3 df) also containing the test SNP. The difference in prevalence of type 2 diabetes in carriers versus noncarriers of rare variants was analyzed using Fisher exact test. Power was calculated using the Power and Sample Size Program (22) and Quanto version 1.1.1 (http://hydra.usc.edu/gxe). Fixed-effects meta-analyses were performed using the metan command, combining summary estimates (log odds ratios and lower and upper CIs for each study), weighted using the inverse-variance method. An expectation-maximization algorithm was used to estimate haplotype frequencies, and GENEBPM software was used to cluster haplotypes by allelic make-up and risk of type 2 diabetes to obtain a Bayes' factor (BF) in favor of association (23).
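The fixed-effects pooling step can be sketched as follows. This is a minimal illustration of inverse-variance weighting from per-study odds ratios and 95% CIs, not the Stata metan command itself; the function name is hypothetical, and the standard error is recovered from the CI width under a normal approximation on the log odds ratio scale.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def fixed_effect_meta(estimates):
    """Pool (OR, lower 95% CI, upper 95% CI) tuples by inverse-variance weighting.

    Returns the pooled OR, its 95% CI, and a two-sided P value.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for odds_ratio, lower, upper in estimates:
        beta = math.log(odds_ratio)
        # Standard error of the log OR recovered from the CI width
        se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
        weight = 1.0 / se ** 2
        weighted_sum += weight * beta
        weight_total += weight
    beta_pooled = weighted_sum / weight_total
    se_pooled = math.sqrt(1.0 / weight_total)
    z = beta_pooled / se_pooled
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    ci = (math.exp(beta_pooled - Z_975 * se_pooled),
          math.exp(beta_pooled + Z_975 * se_pooled))
    return math.exp(beta_pooled), ci, p_value
```

Calling `fixed_effect_meta([(0.80, 0.64, 1.00), (0.80, 0.64, 1.00)])` with two identical hypothetical studies returns the same point estimate with a narrower CI, which is the expected behavior when studies agree.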
Of 31 tagging SNPs, 24 passed quality control and captured 81% of the common (MAF >0.05) and/or nonsynonymous WFS1 variants in the Cambridgeshire case-control samples and 98% of the common WFS1 region variants in HapMap CEU trios.
Eight SNPs were nominally associated with type 2 diabetes risk (P < 0.05) in a pooled analysis of Cambridgeshire and EPIC-Norfolk studies (Table 1). The LD between these eight SNPs (supplemental Table 1) and the consistency of their effect size suggest that they are linked to similar extents with the real causal allele(s). However, we were unable to demonstrate that any of the associated SNPs were contributing to type 2 diabetes risk independently of the other seven (supplemental Table 2).
In our data, the strongest association with type 2 diabetes risk (rs1046320, P = 0.008) mapped within a putative functional region (3′ UTR). We therefore genotyped it in four further case-control studies (Exeter, Ashkenazi, ADDITION/Ely, and Västerbotten; 3,753 case and 4,198 control subjects) to improve the accuracy of the effect-size estimate and compare it with rs10010131, the strongest SNP from the original study (10,11). In meta-analyses, rs1046320 and rs10010131 demonstrated similar magnitudes of association with type 2 diabetes risk (OR 0.856 [95% CI 0.804–0.912], P = 1.25 × 10⁻⁶ and 0.854 [0.800–0.912], P = 2.58 × 10⁻⁶, respectively) (Fig. 1). The high correlation between these SNPs in our samples (r² = 0.92) suggests >100,000 samples would be required to have 80% power to distinguish between their effects with a significance level P < 10⁻³ (supplemental Fig. 4). In an assessment of the possible function of rs1046320, we found no difference in allele-specific expression in lymphoblastoid cell lines from 10 heterozygous HapMap individuals (data not shown).
To test whether haplotypes tag the causal variant(s) better than individual SNPs, we estimated the frequency of haplotypes across 20 genotyped SNPs (excluding three variants with MAF <0.01) in Cambridgeshire and EPIC-Norfolk samples. We found nine haplotypes with MAF >0.01 (supplemental Table 3) that fell into two clusters according to allelic make-up and type 2 diabetes risk (supplemental Fig. 3). One cluster contained haplotypes that are protective against type 2 diabetes relative to the most common haplotype. There were six SNPs (including rs10010131 and rs1046320) that partition the two clusters entirely and, due to high LD between them, were each sufficient to separate the clusters. When the analysis was repeated with haplotypes made up of each SNP in turn, we found the evidence in favor of association was stronger for the single SNPs than for the haplotypes. For the overall haplotype analysis, the estimated log10 BF was 0.64, whereas for the single SNPs, the strongest log10 BF was 1.07 for rs1046320. This suggests that haplotypes made up of SNPs in this study do not tag the causal variant(s) any better than any of the individual SNPs. However, this does not preclude the possibility of independent causal variants in the region that we cannot tag with our SNPs (either pairwise or through the use of haplotypes).
To increase coverage in the region, we imputed missing genotypes at 66 additional loci (supplemental Methods) (Fig. 2). In this analysis, rs1046320 remained the most strongly associated SNP in Cambridgeshire and EPIC-Norfolk studies, except for the imputed intronic rs7691824. However, genotyping of rs7691824 in Cambridgeshire and EPIC studies showed that there were no carriers among our samples.
Sequencing of exons, splice junctions, and conserved noncoding regions in 1,235 type 2 diabetic case and 1,668 control subjects from the Cambridgeshire and ADDITION/Ely studies revealed 290 variants (supplemental Table 4). Of 250 rare (MAF <0.01) changes, 94% were novel, demonstrating the value of deep resequencing for identifying rare changes.
Given the sample size, our study is underpowered to detect effects of each rare variant tested individually. For example, 76 of the 82 rare, nonsynonymous variants have MAF <0.001, for which we have only 25% power to detect an OR of ~3. To improve power, we collapsed rare (MAF <0.01) variants together by comparing the prevalence of type 2 diabetes in carriers versus noncarriers. We collapsed only nonsynonymous variants in the first instance, as their relative paucity at higher MAFs in the population suggests they are enriched for functional changes under negative selective pressure (supplemental Fig. 5). However, there was no significant increase in risk of type 2 diabetes in carriers compared with noncarriers (OR 1.04 [95% CI 0.79–1.37], Fisher exact test P = 0.79) (Table 2). Adding rare variants in conserved noncoding sequences and TargetScan (http://www.targetscan.org/)-predicted miRNA seed sequences to the rare nonsynonymous changes made no material difference (P = 0.67) (Table 2).
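The collapsing approach can be illustrated with a short sketch: each subject is reduced to carrier or noncarrier status across all rare variants, and the resulting 2 × 2 table is tested with a two-sided Fisher exact test. The data layout, function names, and toy subjects are hypothetical, and the two-sided P value follows the common convention of summing the probabilities of all tables no more likely than the observed one.

```python
from math import comb


def collapse(subjects):
    """Build a 2x2 table from (is_case, rare_allele_counts) pairs.

    Rows: carrier / noncarrier of any rare variant. Columns: case / control.
    """
    table = [[0, 0], [0, 0]]
    for is_case, counts in subjects:
        row = 0 if any(c > 0 for c in counts) else 1
        col = 0 if is_case else 1
        table[row][col] += 1
    return table


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    lowest, highest = max(0, col1 - (n - row1)), min(row1, col1)
    p_observed = prob(a)
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)


def carrier_odds_ratio(table):
    """Odds ratio of disease for carriers versus noncarriers."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# Toy subjects: (is_case, allele counts at three hypothetical rare variants)
toy_subjects = [(True, [0, 1, 0]), (True, [0, 0, 0]),
                (False, [0, 0, 0]), (False, [1, 0, 0]), (False, [0, 0, 0])]
toy_table = collapse(toy_subjects)
```

Collapsing trades per-variant resolution for power: a subject carrying several rare alleles counts once, and all variants are implicitly assumed to act in the same direction.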
A comparative study of synonymous variants (MAF <0.01), assumed to be functionally neutral, yielded similar results (Table 2). Further exploratory analyses, including examination of mutation load, PANTHER scores, and combined analysis of rare and intermediate frequency variants, also did not yield significant results (supplemental Results and supplemental Table 6).
To avoid diluting effects of rare (MAF <0.01) nonsynonymous variants on disease risk by pooling them with neutral nonsynonymous changes, we restricted analysis to nonsynonymous changes most likely to impact protein function. Variants were selected based on three criteria: 1) previous biochemical evidence that the variant causes loss of wolframin function, 2) previous genetic evidence for involvement in Wolfram syndrome, and 3) predicted deleterious functional effects by three programs: SIFT (http://sift.jcvi.org/), PolyPhen (http://genetics.bwh.harvard.edu/pph/), and PANTHER (http://www.pantherdb.org/tools/csnpScoreForm.jsp). Using these criteria, we inferred 23 functionally important mutations (supplemental Table 5), but carriers were at type 2 diabetes risk comparable with that of noncarriers (OR 0.99 [95% CI 0.65–1.48], P > 0.99) (Table 2).
Two nonsynonymous SNPs, V871M and R456H, had MAFs of 0.013 and 0.042, respectively. In single SNP analyses of pooled Cambridgeshire and ADDITION/Ely studies, neither was associated with type 2 diabetes (P = 0.13 and P = 0.25, respectively).
We performed a comprehensive fine-mapping and low-frequency variant analysis for WFS1, a locus associated with type 2 diabetes risk (9–11). Using a sequencing, SNP-tagging, and genotyping approach, we identified a number of putative causal variants for type 2 diabetes association. However, due to strong LD between the SNPs within the candidate interval, we were unable to distinguish between their effects on disease risk. None of the associated SNPs have obvious functional properties, and real-time PCR revealed no difference in allele-specific expression of rs1046320 (the strongest associated in this study) in lymphoblastoid cell lines, suggesting this SNP is unlikely to affect mRNA stability or processing in this tissue. However, we cannot rule out rs1046320-associated expression changes in other tissues.
Deep resequencing of WFS1 exons, splice junctions, and conserved noncoding sequences in 1,235 type 2 diabetic case and 1,668 control subjects revealed no statistically significant differences in the cumulative frequency of rare (MAF <0.01) nonsynonymous variants (P = 0.79). Given that ~8% of study participants carried at least one rare nonsynonymous change, we had >80% power to detect ORs >1.43 at P < 0.05. This study was therefore well powered to detect previously reported effect sizes for rare variants on complex traits (the average being OR 3.74) (24). Restricting the analysis to those variants most likely to be functional reduced the frequency of the exposure (carrier status) to ~4%, but retained >80% power to detect ORs >1.65. Still, there were no statistical differences between case and control subjects (P > 0.99), suggesting rare variants in WFS1 do not have a large impact (ORs >2) on type 2 diabetes risk. It is worth noting that our analyses assumed all rare variants have the same direction of effect. Our power to detect significant effects on type 2 diabetes would have been reduced if the variants were a mixture of protective and susceptibility alleles. Finally, our study had >80% power to detect moderate effect sizes of intermediate-frequency SNPs V871M and R456H on risk of type 2 diabetes (ORs >1.93 and >1.45, respectively), though neither was statistically associated with type 2 diabetes (P = 0.13 and P = 0.25). Selecting case subjects enriched for early onset/family history of the disease might have increased our power to find rarer variants of slightly higher penetrance that might segregate in the family. However, this kind of analysis was not feasible in our study, as we have no DNA from family members.
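The order of magnitude of these power figures can be checked with a rough sketch. It uses a normal approximation for comparing carrier frequencies between case and control subjects, not the Power and Sample Size Program or Quanto, so its results need not match the reported values exactly. Treating the ~8% carrier frequency as the frequency among control subjects is an assumption, and the function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def power_carrier_test(n_cases, n_controls, control_carrier_freq, odds_ratio,
                       z_alpha=1.959964):
    """Approximate power of a two-sided test comparing carrier frequencies.

    The default z_alpha corresponds to a two-sided significance level of 0.05.
    """
    p0 = control_carrier_freq
    odds_cases = odds_ratio * p0 / (1 - p0)
    p1 = odds_cases / (1 + odds_cases)  # implied carrier frequency in cases
    se = math.sqrt(p1 * (1 - p1) / n_cases + p0 * (1 - p0) / n_controls)
    return normal_cdf(abs(p1 - p0) / se - z_alpha)


# Sample sizes from the resequencing study; carrier frequency is an assumption
power_all_rare = power_carrier_test(1235, 1668, 0.08, 1.43)
power_large_effect = power_carrier_test(1235, 1668, 0.08, 2.0)
```

Under this approximation, power rises steeply with the assumed odds ratio, which is why a collapsed test with ~8% carriers can exclude large effects even though individual rare variants cannot be tested.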
Our attempts to refine the WFS1 association signal demonstrate that while high LD is useful for minimizing the amount of genotyping required to discover a genetic association, it can compromise attempts to further refine the association signal. Studying populations with different and/or weaker patterns of LD may help refine signals. For example, the LD block spanning the WFS1 gene is more fragmented in HapMap samples of African descent, and correlation between SNPs is generally weaker (r² = 0.204 between SNPs rs10010131 and rs1046320 in YRI HapMap samples compared with r² > 0.92 in CEU samples). In this setting, studies with ~10,000 samples (compared with >100,000) would be well powered to distinguish their effects (supplemental Fig. 4), assuming that this locus is associated with type 2 diabetes in this population. An alternative strategy is to test SNPs within the candidate region for association with proximal traits, which may provide greater power to distinguish between SNP effects (25).
Limitations of our fine-mapping study design are that we were underpowered to detect associations with SNPs at MAF <0.05, and we limited sequencing to regions most likely to harbor functional variation. Though we were able to impute 66 additional known SNPs in the region (most common and in high LD [r² > 0.8] with directly genotyped SNPs), 7 had MAF <0.05 and were not well correlated (r² < 0.8) with genotyped SNPs. As illustrated by follow-up genotyping of the imputed SNP (rs7691824), monomorphic in our samples, imputation of rare variants is less accurate. This could potentially lead to false negative results if rare variants of poorer imputation quality have larger effect sizes than more common SNPs.
In conclusion, we have undertaken the most comprehensive fine-mapping and rare variant analysis in a type 2 diabetes gene to date. We identified six SNPs that have comparable associations with type 2 diabetes, with ORs ranging from 0.85 to 0.87. We also show that low-frequency variants in putative functional regions of WFS1 are not associated with diabetes risk in our U.K. populations. Future whole exome/genome resequencing studies should consider that functionality of rare variants is difficult to predict and that pooling variants and candidate genes together for purposes of analysis might diminish the power to detect true risk alleles.
I.B. acknowledges funding from Wellcome Trust Grant 077016/Z/05/Z and European Union contract LSHM-CT-2006-037197. Work on the Umeå cohorts was supported in part by grants from Novo Nordisk, the Swedish Heart-Lung Foundation, and the Heart Foundation of Northern Sweden (all to P.W.F.). S.L.R. is funded by the British Heart Foundation. The Ashkenazi Jewish cohort was ascertained by the Israel Diabetes Research Group, with support from the Russell Berrie Foundation, d-Cure, Diabetes Care in Israel, and an unrestricted research grant from Novo Nordisk, Denmark. I.B. and her spouse report stock in GlaxoSmithKline and Incyte. No other potential conflicts of interest relevant to this article were reported.
We thank the Wellcome Trust Sanger Institute high-throughput sequencing facility and informatics team (team leader, Sarah Hunt) for their technical support. We thank Jason Cooper at the Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, for calculating power to distinguish correlated SNPs and providing supplemental Figure 4. We also thank Åsa Agren, Kerstin Enquist, and other staff of the Umeå Medical Biobank for their cooperation and expertise in sample retrieval and data organization.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.