Genome-wide association (GWA) analysis has provided a powerful stimulus to the discovery of common variants influencing type 2 diabetes risk, and, to date, ~20 susceptibility loci have been identified with high levels of statistical confidence (1). However, these known variants account for only a small proportion of the inherited component of disease risk (probably <10%), and the molecular basis of the majority of the genetic predisposition to type 2 diabetes has yet to be established (1).
The success of the GWA approach contrasts with the slow progress that characterized previous efforts to map susceptibility loci through genome-wide linkage (2). However, now that many of the common variants of largest effect have been identified (in European-descent populations at least), there are cogent reasons to revisit regions previously identified through genome-wide linkage. First, variants within the genomic intervals representing replicated linkage signals can be considered to have raised prior odds for a susceptibility effect, and this information can be used to prioritize GWA signals (particularly those with only modest evidence of association) for targeted replication. Second, genuine linkage signals are likely to be driven by causal variants (particularly low-frequency SNPs or copy number variants not captured by the commodity GWA arrays) with effect sizes larger than those currently detectable by GWA (3). Because alleles with these characteristics will have a more marked impact on individual disease predisposition than the common variants found by GWA, identification of causal variants underpinning replicated linkage signals should accelerate efforts to obtain better predictors of disease (4).
For type 2 diabetes, there appears to be only limited overlap between the regions identified by genome-wide linkage and those revealed by GWA (5). Although the discovery of TCF7L2 was prompted by a search for causal variants within a region of replicated type 2 diabetes linkage, neither the common variants in TCF7L2 nor those in HHEX (a second nearby GWA signal) account for that linkage signal (6). Thus, the discovery of TCF7L2 reflects either serendipity or the co-localization of common and rare causal variants in the same locus, with the former driving the association and the latter the linkage. Similarly, whereas common variants in HNF4A have been reported to explain the chromosome 20 linkage signals seen in Finns and Ashkenazim (7), these associations have proved difficult to replicate (9).
Chromosome 1q (in particular the 30-Mb stretch adjacent to the centromere) ranks alongside the regions on chromosomes 10 and 20 as among the strongest in terms of the replicated evidence for genome-wide linkage to type 2 diabetes. Linkage has been reported in samples of European (U.K., French, Amish, Utah), East Asian (Chinese, Hong Kong), and Native American (Pima) origin (summarized in Supplementary Table 1, which is available in the online-only appendix at http://diabetes.diabetesjournals.org/cgi/content/full/db09-0081/DC1; ref. 2). The region concerned is gene rich and contains a disproportionate share of excellent biological candidates (2). The homologous region has also emerged as a diabetes susceptibility locus from mapping efforts in several well-characterized rodent models (10).
The International 1q Consortium represents a coordinated effort by the groups with the strongest evidence for 1q linkage to identify variants causal for that signal. Here, we report efforts to map causal variants using a custom linkage-disequilibrium (LD) mapping approach, predominantly based around common SNP variants, applied to a well-powered set of multiethnic samples.
To improve power, ascertainment of type 2 diabetes cases for this study aimed to enrich for 1q causal alleles through 1) a focus on populations and samples that had shown 1q linkage; 2) selection for positive family history; and 3) for some samples, preferential recruitment on the basis of patterns of identity-by-descent sharing in the 1q region. For each set of case subjects, we selected a control sample of individuals from the same population. Details of the recruitment have been reported previously (14) and are summarized in the supplementary material available in the online appendix. In all, the case-control part of the study included 2,198 samples (1,000 case subjects, 1,198 control subjects) of European descent, 281 (140 case subjects, 141 control subjects) of East Asian origin, and 285 (144 case subjects, 141 control subjects) who were Native American (Pima) (supplementary Table 2). We also included a small sample of individuals of African American origin (242 case subjects, 173 control subjects) and an additional 599 Pima individuals (520 affected, 79 nondiabetic after age 45 years) from 255 families who, after combination with the Pima case-control set, were used for family-based association analyses.
These samples were submitted to dense-map SNP typing of the core region of interest (from 147.0 to 169.7 Mb [Build 35]) using a series of 1,536-plex BeadArray designs (Golden Gate, Illumina, San Diego, CA). Design of these arrays was contemporaneous with development of dbSNP and HapMap (15). Thus, whereas the first arrays were LD-agnostic and compiled using genomic localization as the primary consideration for inclusion, subsequent arrays used LD information from the CEU (European ancestry) and CHB/JPT (Asian ancestry) components of HapMap to guide SNP selection and maximize coverage of the region. In all, we designed assays for 6,023 SNPs, of which 5,290 provided reliable data in all populations after passing through our extensive quality control (see the supplementary material). We estimate that after imputation (using the appropriate set of HapMap data as a reference), coverage of the region (minor allele frequency >0.05; r² > 0.8) reaches ~80% in the European and ~72% in the East Asian samples. Coverage is harder to estimate (and imputation likely to be less valuable) in Pima and African American samples, since reference data from these populations are not available, although we estimate ~49% coverage in West Africans based on YRI data.
Genotyping quality was generally good, with over 91% of SNPs passing quality control in each population (see the supplementary material) and <0.45% of SNPs failing (P) tests of within-sample Hardy-Weinberg equilibrium. Significant departures from expectation in the distributions of test statistics observed in the Amish and Pima samples (as revealed by QQ plots; see supplementary material) likely reflect residual relatedness between subjects from those populations. We adjusted for this (and any population stratification effects) through genomic control methods (16). Association analyses treated each study as a separate stratum and used standard meta-analysis approaches to deliver estimates of joint effect size and statistical significance (see supplementary material). A series of nested meta-analyses were performed including 1) European-descent samples only (“4-way”); 2) non–African-descent samples only (“7-way”); and 3) all samples (“8-way”).
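The combination of genomic control and stratified meta-analysis described above can be sketched as follows. This is a minimal illustration only: it assumes an inverse-variance, fixed-effects combination of per-study log odds ratios, which is one common reading of "standard meta-analysis approaches"; the exact procedure used in the study is given in the supplementary material and may differ. The function names and the example inputs are hypothetical.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(chi2_stats):
    """Inflation factor: observed median chi-square over its null expectation.

    Values below 1 are clamped to 1 so that statistics are never made
    more significant by the correction.
    """
    return max(1.0, median(chi2_stats) / CHI2_1DF_MEDIAN)


def fixed_effect_meta(log_ors, std_errs, lambdas=None):
    """Inverse-variance fixed-effects meta-analysis of per-study log ORs.

    Each study's variance is multiplied by its genomic control lambda,
    which is equivalent to dividing its 1-df chi-square by lambda.
    Returns (pooled OR, pooled SE of the log OR, two-sided P value).
    """
    if lambdas is None:
        lambdas = [1.0] * len(log_ors)
    weights = [1.0 / (se * se * lam) for se, lam in zip(std_errs, lambdas)]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return math.exp(pooled), pooled_se, p_value


# Hypothetical per-study estimates for one SNP in four strata.
or_, se, p = fixed_effect_meta(
    log_ors=[0.30, 0.35, 0.25, 0.40],
    std_errs=[0.12, 0.15, 0.14, 0.20],
    lambdas=[1.00, 1.00, 1.08, 1.00],
)
print(f"pooled OR = {or_:.2f}, SE(log OR) = {se:.3f}, P = {p:.2e}")
```

Applying lambda per stratum before pooling keeps residual relatedness in one population (for example, an inflated stratum) from dominating the combined statistic.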
Under an additive model with allele frequency of 0.25, our sample provides ~80% power to detect per-allele odds ratios (ORs) of >1.36 (“8-way”) or >1.43 (“4-way”) for α = 5 × 10⁻⁶. Given that the region covers ~1% of the genome, we consider this a reasonable benchmark for “region-wide” significance (equivalent to consensus genome-wide thresholds of 5 × 10⁻⁸). These power calculations are conservative: given the case ascertainment enrichment strategies used in this study, we would expect to detect variants with population-level effects in the 1.2–1.3 range. Under reasonable assumptions (three independent alleles contributing to a linkage signal with a locus-specific sibling relative risk of ~1.15), we can expect the effect size of the variants we were seeking to detect (i.e., those responsible for the 1q linkage) to be substantially greater than this (e.g., allelic OR 1.6 for a variant with risk allele frequency of 25%). Our study was therefore well powered to detect putatively causal alleles within the European and/or combined datasets.
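The threshold and power figures in this paragraph can be approximated with a short calculation. The sketch below is not the authors' power method (which is not specified here): it assumes a simple allele-count test with a normal approximation to the log OR, treats 0.25 as the control allele frequency, and pools the case and control totals reported above without modelling strata or ascertainment. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def region_wide_alpha(genome_alpha=5e-8, genome_fraction=0.01):
    """Scale a genome-wide threshold to a region covering a fraction of the genome."""
    return genome_alpha / genome_fraction


def case_allele_freq(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by a per-allele OR."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def allelic_power(n_cases, n_controls, control_freq, odds_ratio, alpha):
    """Approximate power of a two-sided allelic test at significance level alpha.

    Uses the Woolf variance of the log OR from the expected 2x2 allele-count
    table (two alleles per subject); the negligible opposite tail is ignored.
    """
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    var = (
        1.0 / (2 * n_cases * p1)
        + 1.0 / (2 * n_cases * (1.0 - p1))
        + 1.0 / (2 * n_controls * p0)
        + 1.0 / (2 * n_controls * (1.0 - p0))
    )
    z_effect = math.log(odds_ratio) / math.sqrt(var)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(z_effect - z_crit)


alpha = region_wide_alpha()  # 5e-8 / 0.01, i.e. about 5e-6

# European-descent ("4-way") totals: 1,000 cases, 1,198 controls.
power_4way = allelic_power(1000, 1198, 0.25, 1.43, alpha)

# All case-control samples ("8-way"): European + East Asian + Pima + African American.
cases_8way = 1000 + 140 + 144 + 242
controls_8way = 1198 + 141 + 141 + 173
power_8way = allelic_power(cases_8way, controls_8way, 0.25, 1.36, alpha)

print(f"alpha = {alpha:.1e}, 4-way power = {power_4way:.2f}, 8-way power = {power_8way:.2f}")
```

Under these simplifying assumptions both power estimates come out at roughly 0.8, in line with the ~80% quoted in the text.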
Across these analyses, none of the SNPs showed an association with type 2 diabetes that withstood genome-wide correction (P < 5 × 10⁻⁸). However, two clusters of SNPs showed association signals that approached or exceeded “region-wide” significance thresholds. The first of these, involving rs7538490 and nearby SNPs, mapped to a 51.4-kb interval (at ~160.35 Mb) within the first intron of NOS1AP (nitric oxide synthase 1 [neuronal] adaptor protein) with an estimated per-allele OR (in the 4-way, European-only analysis) of 1.38 (95% CI 1.21–1.57, P = 1.4 × 10⁻⁶, additive model). The SNP rs7538490 lies ~5.6 kb from one of the SNPs (rs10494366) previously shown to influence cardiac repolarization and QT interval (18): the two SNPs are in modest LD (r² = 0.47 in HapMap CEU), and rs10494366 shows some evidence for association with type 2 diabetes (P = 3.1 × 10⁻⁴) in the same 4-way meta-analysis.
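As a consistency check on reported estimates of this kind, an OR and its 95% CI can be converted back to an approximate standard error, z score, and P value. The sketch below assumes a symmetric Wald interval on the log OR scale; because the published OR and CI are rounded to two decimals, the recovered P value will only match the reported one to within rounding. The function name is hypothetical.

```python
import math


def p_from_or_ci(odds_ratio, ci_low, ci_high, z_ci=1.959964):
    """Recover SE, z, and two-sided P from an OR with a Wald 95% CI.

    Assumes the interval is exp(log(OR) +/- z_ci * SE).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_ci)
    z = math.log(odds_ratio) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return se, z, p_value


# rs7538490, 4-way additive model: OR 1.38 (95% CI 1.21-1.57).
se, z, p = p_from_or_ci(1.38, 1.21, 1.57)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, P = {p:.1e}")
```

For the rs7538490 estimate this gives a P value on the order of 10⁻⁶, consistent with the reported 1.4 × 10⁻⁶.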
Association results for rs7538490 in the NOS1AP gene
FIG. 1. Single-point type 2 diabetes associations within the 1q region. This plot shows the “4-way” (European-descent samples only) meta-analysis using the additive model (Cochran-Armitage trend test). Directly typed SNPs are shown in orange and …
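For readers unfamiliar with the Cochran-Armitage trend test named in the figure legend, the following sketch computes the 1-df trend statistic for a single SNP from genotype counts, using additive scores of 0, 1, and 2 risk alleles. The genotype counts are invented for illustration and are not data from this study; the function name is hypothetical.

```python
import math


def cochran_armitage_trend(case_counts, control_counts):
    """Cochran-Armitage trend test with additive scores (0, 1, 2).

    Each argument is a tuple of genotype counts ordered by number of
    risk alleles. Returns (chi-square with 1 df, two-sided P value).
    """
    r0, r1, r2 = case_counts
    s0, s1, s2 = control_counts
    n1, n2 = r1 + s1, r2 + s2          # column totals for scores 1 and 2
    n_cases = r0 + r1 + r2
    n_total = n_cases + s0 + s1 + s2
    score_sum = n1 + 2 * n2            # sum of scores over all subjects
    numerator = n_total * (n_total * (r1 + 2 * r2) - n_cases * score_sum) ** 2
    denominator = (
        n_cases
        * (n_total - n_cases)
        * (n_total * (n1 + 4 * n2) - score_sum ** 2)
    )
    if denominator == 0:
        raise ValueError("trend test undefined: no genotype or status variation")
    chi2 = numerator / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    return chi2, p_value


# Hypothetical genotype counts (0, 1, 2 risk alleles).
chi2, p = cochran_armitage_trend((40, 45, 15), (50, 40, 10))
print(f"trend chi-square = {chi2:.3f}, P = {p:.3f}")
```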
The second signal includes ~10 SNPs in a 220-kb region of extensive LD at ~152.1 Mb. This region includes the coding sequences of the genes encoding liver pyruvate kinase (PKLR) and ash1 (absent, small, or homeotic)-like (Drosophila) (ASH1L), among others. At the lead SNP (rs11264371), the estimated OR for the 4-way analysis was 1.36 (1.18–1.58) (P = 3.5 × 10⁻⁵) under the additive model. The effect size and significance were marginally greater (1.48 [1.18–1.76], P = 1.0 × 10⁻⁵) under a dominant model.
Association results for rs11264371 in the PKLR/ASH1L region
Both signals were most prominent in the European samples, and there was no evidence that an equivalent association signal extended to the East Asian, Native American, or African American samples. Though the association P values for these two signals remained strong in the 8-way meta-analysis of all data (5.4 × 10⁻⁵ for rs7538490 and 2.9 × 10⁻⁴ for rs11264371), in each case they were driven by the larger European samples. Analyses in the larger Pima family-based association dataset also found no evidence of association (rs7538490, P = 0.59; rs11264372 [r² of 1 with rs11264371 in CEU and CHB/JPT HapMap], P = 0.79).
Although neither signal was of sufficient effect size to be considered causal for the 1q linkage signal (the estimated sibling relative risk attributable to these loci in combination is only 1.045), we reasoned that these signals might nevertheless be pointers toward nearby causal variants (of lower frequency but higher penetrance) that were inadequately tagged by the SNPs we had typed. However, before proceeding to resequencing and fine-mapping, we first sought replication of our findings in independent datasets. Mindful that our case ascertainment strategies may have led to inflated estimates of effect size compared with those evident in unselected case subjects, we recognized that large sample sizes would be required to test the observed associations. Because the signals were clearest in European-descent samples, we focused replication on samples from Northern Europe.
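The link between per-allele effect sizes and locus-specific sibling relative risk can be illustrated with a commonly used multiplicative-model approximation, in which a risk allele of frequency p and genotype relative risk γ contributes λs = (1 + w/2)², with w = p(1 − p)(γ − 1)² / (pγ + 1 − p)². The sketch below is an assumption-laden illustration rather than the authors' calculation: it treats the allelic OR as the genotype relative risk and multiplies λs across independent alleles. It reproduces the scenario quoted in the power discussion (three independent alleles with OR 1.6 at 25% frequency); risk-allele frequencies for the two lead SNPs are not given in this section, so the combined figure of 1.045 is not recomputed. Function names are hypothetical.

```python
def locus_sibling_relative_risk(risk_allele_freq, allelic_rr):
    """Sibling relative risk for one biallelic variant, multiplicative model."""
    p = risk_allele_freq
    q = 1.0 - p
    w = p * q * (allelic_rr - 1.0) ** 2 / (p * allelic_rr + q) ** 2
    return (1.0 + w / 2.0) ** 2


def combined_sibling_relative_risk(variants):
    """Product of per-variant values, assuming independent multiplicative effects.

    `variants` is an iterable of (risk_allele_freq, allelic_rr) pairs.
    """
    total = 1.0
    for freq, rr in variants:
        total *= locus_sibling_relative_risk(freq, rr)
    return total


# Scenario from the power discussion: three independent alleles,
# each with allelic OR 1.6 and risk-allele frequency 0.25.
lam = combined_sibling_relative_risk([(0.25, 1.6)] * 3)
print(f"combined sibling relative risk = {lam:.3f}")
```

Under this approximation the three-allele scenario gives a combined sibling relative risk in the neighbourhood of the ~1.15 quoted above, whereas common variants with ORs near 1.4 contribute far less per allele.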
First, we used GWA data from the Wellcome Trust Case Control Consortium (19). After removing 429 overlapping case subjects, we examined 1,495 additional type 2 diabetes case subjects and 2,938 control subjects with Affymetrix 500k data (using imputation to test for association at the lead SNPs in each interval). No association was evident (rs7538490, P = 0.57; rs11264371, P = 0.33). Similarly, analysis of GWA data from the Diabetes Genetics Initiative (21) and FUSION (22) studies provided no corroboration of either signal. Furthermore, when analyzed jointly (4,549 case subjects, 5,579 control subjects), these three studies also failed to reveal any additional common variant signals of interest (P) across the wider 1q region (23) and provided no corroboration of any of the lesser signals evident in the 1q consortium analyses.
Finally, we genotyped the two lead SNPs (rs7538490, rs11264371) using fluorogenic 5′-nuclease (Taqman) assays in 4,572 case subjects and 6,941 control subjects from the U.K. (the UK Type 2 Diabetes Genetics Consortium and Warren 2 cases/1958 British Birth Cohort strata). Once again, there was no evidence of replication. Taking into account all replication samples (~8,500 case subjects, ~12,400 control subjects), there was no significant association with type 2 diabetes (rs7538490, OR 1.03 [95% CI 0.98–1.07], P = 0.29; rs11264371, OR 1.02 [0.97–1.06], P = 0.54 [additive], 1.06 [0.99–1.13], P = 0.10 [dominant]). Nominal significance was retained for rs7538490 when these replication data were combined with the original 1q consortium case-control data (rs7538490, P = 0.015; rs11264371, P = 0.069), but these associations are unimpressive in either the region-wide or genome-wide context. Even allowing for some heterogeneity of effect size between the primary and replication datasets (due to ascertainment differences and the “winner's curse”), there seems to be no substantive evidence that the variants identified in NOS1AP and the PKLR/ASH1L region are genuinely associated with type 2 diabetes.
In summary, we have undertaken a detailed survey of common variants across the region of replicated 1q linkage, achieving coverage that exceeds that of available GWA data for the region. Despite analysis of multiple ethnic groups in samples sufficiently powered (in the European-descent component at least) to have detected common variants causal for the linkage, we found no compelling signals.
Should we conclude therefore that the original evidence for 1q linkage was false? Although this possibility cannot be discounted, it is worth considering that recent experience from GWA studies has shown that, for common susceptibility variants at least, effect sizes are modest and that none is of magnitude sufficient to generate a linkage signal detectable in achievable sample sizes. Efforts to explain the “missing heritability” for type 2 diabetes (that is, the disparity between the predisposition attributable to the known loci and independent estimates of overall heritability and familiality) are now shifting toward the search for low-frequency, medium-penetrance alleles. Alleles with these characteristics are likely to have escaped detection through the genome-wide approaches available so far, since they would be insufficiently penetrant to be detected with traditional linkage approaches applied to monogenic families and too infrequent to be reliably identified through GWA studies (4). Yet, low-frequency, medium-penetrance alleles could, particularly if several independent alleles map to the same locus, generate the kinds of linkage signals detectable in family-based studies (as is the case for NOD2/CARD15 and Crohn's, for example) (24).
Detection of low-frequency susceptibility variants will require new approaches based around next-generation resequencing and large-scale fine-mapping. Genome-wide resequencing of large case-control samples remains economically and logistically unfeasible, but targeted resequencing of selected regions is not, and the future plans of the 1q consortium include deep resequencing of the 1q region of interest, focusing at least initially on exons and conserved sequence.