Despite numerous candidate gene and linkage studies, the field of type 2 diabetes (T2D) genetics had until recently succeeded in identifying few genuine disease-susceptibility loci. The advent of genome-wide association (GWA) scans has transformed the situation, leading to an expansion in the number of established, robustly replicating T2D loci to almost 20. These novel findings offer unique insights into the pathogenesis of T2D and in the main point towards the etiological importance of disorders of beta-cell development and function. All associated variants have common allele frequencies in the discovery populations, and exert modest to small effects on the risk of disease, characteristics which limit their prognostic and diagnostic potential. However, ongoing studies focussing on the role of copy number variation and targeting low frequency polymorphisms should identify additional T2D-susceptibility loci, some of which may have larger effect sizes and offer better individual prediction of disease risk.
Type 2 diabetes (T2D) is a common complex disease, thought to be caused by an interplay between environmental and genetic factors. The sibling recurrence risk ratio (λs) for T2D, a measure of the familial aggregation of the disease, has been estimated to be approximately 3 in European populations (1). As with other common complex diseases, the hunt for T2D-susceptibility genes started a few decades ago. Over this period, study designs evolved as a result of progress in technology, in the understanding of the patterns of human genome sequence variation and in the availability of appropriately-ascertained samples. Hundreds of candidate gene studies and more than 30 genome-wide linkage scans for T2D (2,3) were published. Unfortunately, neither approach proved particularly successful in identifying robustly-replicating T2D-susceptibility loci.
This is certainly the case with genome-wide linkage approaches. Such studies typically utilized a few hundred microsatellite markers distributed across the genome to type sib-pairs or families ascertained for diabetes, and relied on available recombination rate information to infer the likely location of disease signals. Although linkage scans took a genome-wide gene-agnostic approach, and might therefore have been expected to have a higher success rate than gene-centric studies, few, if any, regions generated credible statistical evidence for linkage across multiple studies. To date, no susceptibility variant that can reliably account for any of the replicated T2D-linkage peaks has been identified, even though several resource-intensive fine-mapping efforts have been completed. One possible explanation for this may be that the fine-mapping studies have concentrated on evaluating the role of common polymorphisms. It seems increasingly probable (in the light of more recent findings) that most genuine linkage signals reflect the actions of multiple rare variants (copy number or single base pair changes).
Until recently, the candidate-gene approach fared little better in delivering genuine T2D-susceptibility effects, and several factors contributed to this disappointing outcome. Firstly, most studies were underpowered: the small sample sizes deployed were inadequate to detect the kinds of effect sizes that we now know are realistic for most complex traits. Secondly, the candidate genes studied were selected on the basis of limited knowledge of the underlying etiopathology of disease. Thirdly, poor understanding of the architecture of genetic variation, together with the time-consuming, low-throughput genotyping technologies then available, meant that comprehensive assessment of the variants within the regions of interest was not possible. Finally, irreproducibility was compounded by liberal thresholds for declaring significance and a widespread tendency to over-interpret results.
The last few years have seen substantial changes in the way genetic association studies are conducted. The availability of large sample collections has led to increased power in both discovery and replication studies. In addition, the development of affordable, high-accuracy, high-throughput genotyping technologies has made large-scale examination of genetic variants (and particularly single nucleotide polymorphisms (SNPs)) possible. International efforts like the Human Genome Project and the HapMap (4) have contributed to a much improved understanding of patterns of sequence variation. Genetic association studies have subsequently taken advantage of knowledge about correlation patterns, or linkage disequilibrium (LD), between variants, thereby achieving more streamlined and efficient study designs. These advances have made it possible to undertake genome-wide surveys of common variant association (genome-wide association scans or GWAs). Such studies have been facilitated by improved guidelines and tools for study design, analysis and interpretation. The growing number of robustly replicating T2D susceptibility loci is a testament to these recent advances.
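Tagging depends on how strongly two variants are correlated. A widely used measure of pairwise LD is r², the squared correlation between alleles at two biallelic loci, computed from the haplotype frequency and the two allele frequencies. The sketch below assumes phased haplotype frequencies are already known. The function name and the example frequencies are illustrative, not taken from any study cited here.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared correlation (r^2) between two biallelic variants.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and allele B at locus 2
    p_a, p_b: population frequencies of alleles A and B
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both variants must be polymorphic")
    return d * d / denom


# Hypothetical tag SNP and nearby variant that usually travel on the same haplotype
print(round(ld_r2(p_ab=0.28, p_a=0.30, p_b=0.30), 3))  # prints 0.819
```

An r² close to 1 means that genotyping the tag SNP captures nearly all of the information at the untyped variant. An r² near 0 means the untyped variant is effectively invisible to an array that relies on that tag.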
For the purposes of this review, we define T2D-susceptibility loci as signals reaching genome-wide significance levels for T2D, regardless of their primary association. By consensus, genome-wide significance is usually taken as a p value of <5×10⁻⁸, which equates to the standard p<0.05 criterion Bonferroni-adjusted for ~1M tests (roughly the potential number of independent common variant signals in non-African individuals, after allowing for linkage disequilibrium). Even so, deciding what does and does not constitute a T2D-susceptibility locus is not entirely straightforward. T2D predisposition is causally related to several other phenotypes, most notably body mass index (BMI) and other measures of adiposity, raising a debate as to whether a gene like FTO (which has a primary effect on BMI and obesity risk (5)) should really be counted as a T2D locus. Some would argue not, since the BMI effect is clearly primary; others would argue that the effect of FTO variants on T2D-risk remains the same whether or not the mechanism responsible is known. Based on the definition given above, there are currently 19 established T2D loci (Table 1), a number poised to increase substantially.
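The threshold arithmetic can be made explicit. The short sketch below divides 0.05 by the ~1M tests quoted above and converts the result to the corresponding two-sided critical z statistic, assuming a normally distributed test statistic; only Python's standard library is used.

```python
from statistics import NormalDist

alpha = 0.05
n_tests = 1_000_000  # ~1M independent common-variant tests, as quoted in the text
threshold = alpha / n_tests  # Bonferroni-corrected per-test significance level

# Two-sided critical value: |z| must exceed this to reach genome-wide significance
z_crit = -NormalDist().inv_cdf(threshold / 2)

print(f"{threshold:.0e}")  # prints 5e-08
print(round(z_crit, 2))    # prints 5.45
```

The corresponding critical value of about 5.45 standard errors is what makes modest effects so hard to detect in single studies of a few thousand samples.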
Only four of these 19 T2D-associated variants have been identified through candidate gene studies (Table 1). This represents a rather meagre return given the enormous number of candidate gene studies carried out. The non-synonymous P12A and E23K variants, in PPARG and KCNJ11 respectively, were the first robustly replicating signals to emerge (6,7). The two genes, encoding peroxisome proliferator-activated receptor gamma (PPARG) and Kir6.2 (KCNJ11) were selected for study as their products represent drug targets used in the treatment of diabetes. The more recent findings of WFS1 and HNF1B (previously known as TCF2) SNP associations with T2D arose from in-depth studies of genes in candidate biological pathways (8-10).
The single largest effect size for T2D identified to date resides within the TCF7L2 gene. This locus was unearthed following a linkage peak fine-mapping study on chromosome 10 (11). Although there is evidence for linkage across the region in which the TCF7L2 gene resides, the common polymorphisms driving the T2D association do not account for the linkage signal observed. It therefore remains unclear whether this discovery was pure serendipity, or whether rare causal variants (as yet undiscovered) in TCF7L2 are responsible for the linkage signal.
The 14 remaining T2D-susceptibility loci (mapping near CDKN2A/B, CDKAL1, SLC30A8, IGF2BP2, HHEX/IDE, FTO, KCNQ1, NOTCH2, CDC123/CAMK1D, ADAMTS9, THADA, TSPAN8/LGR5, JAZF1 and MTNR1B; Table 1) have been identified over the course of the last 18 months through individual GWA studies (12-19), through the meta-analysis of GWA scans (20), or in the case of MTNR1B, as follow-up to a signal originally identified through analyses of continuous glycemic traits (21-23).
GWA scans have been the most important contributors in finding novel, reproducibly associated T2D variants. For the purposes of this review article, we define GWA scans as studies that have examined over 150,000 SNPs (on the basis that the genome coverage of studies with less than this number is relatively low). Thus far, the literature counts six such GWA scans carried out in European populations (12-18) and one scan carried out in Japanese individuals (19).
The European GWA scans for T2D all used a case-control study design and employed commercially available, fixed-content genotyping platforms. Sample ascertainment, the specific populations examined, sample size and follow-up strategies differed among studies: these factors underlie some of the heterogeneity in the findings. The first published study was conducted in 661 cases and 614 controls from France, using the Illumina platform, and reported the HHEX/IDE and SLC30A8 regions as novel T2D susceptibility loci (16). Four further scans followed shortly afterwards (12-15, 18). Steinthorsdottir et al (18) typed 1399 cases and 5275 controls from Iceland for approximately 300,000 SNPs on the Illumina chip and reported CDKAL1 as a new T2D locus. This same locus was independently found by three further contemporaneous scans performed by the Wellcome Trust Case Control consortium (WTCCC), Diabetes Genetics Initiative (DGI) and Finland-United States Investigation of NIDDM genetics (FUSION) (12-15). The WTCCC (14,15) typed ~500,000 SNPs using the Affymetrix platform in 1924 cases and 2938 controls from the UK. The DGI study (12) employed the same genotyping technology and examined 1464 cases and 1467 controls from Finland and Sweden. The FUSION scan (13) also focused on individuals from Finland (1161 cases, 1174 controls) and used Illumina to type ~300,000 SNPs across the genome. These last three studies collaborated by sharing data and coordinating replication studies, their efforts leading to identification of the CDKN2A/2B, FTO and IGF2BP2 disease loci, as well as confirming other previously-reported loci such as PPARG, KCNJ11 and TCF7L2. The last scan to date published on individuals of European descent (17) examined 500 cases and 497 controls from four populations (Finland, Israel, Germany and UK) but, primarily for reasons of restricted power, failed to reveal any novel T2D susceptibility loci.
The most recent published scan was performed in 194 T2D cases and 1558 controls, all from Japan (19). Perlegen array technology was used to genotype over 200,000 tag SNPs, selected to capture >50% of common variation in the Japanese population. This study confirmed association of the CDKAL1 and IGF2BP2 signals and, through replication efforts, additionally identified KCNQ1 as a novel T2D susceptibility locus. Association at this locus was also identified by a second, smaller scan, again carried out in Japanese individuals (24). The same association signal can be detected in European samples, though differences in allele frequency render the signal much less visible than in East Asians (19,24).
Although genome-wide association scans have proved successful in identifying T2D susceptibility variants, the power of individual studies to detect small or modest effects at common SNPs has been limited. For example, the WTCCC study, with 1924 cases and 2938 controls, had only 9% power to detect an allelic odds ratio (OR) of 1.2 at a SNP with minor allele frequency (MAF) of 30% at genome-wide significance levels (p=5×10⁻⁸).
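The exact method behind the 9% figure is not stated here. A simple normal approximation to the per-allele (allelic) case-control test gives a similar number. The sketch below assumes that the risk allele is the minor allele at 30% frequency in controls, a multiplicative per-allele odds ratio, and Hardy-Weinberg equilibrium. The function name is illustrative.

```python
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, raf_controls, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided allelic case-control test.

    Assumptions (illustrative only): multiplicative per-allele odds ratio,
    Hardy-Weinberg equilibrium, normal approximation to the difference in
    risk-allele frequency between cases and controls.
    """
    norm = NormalDist()
    p0 = raf_controls
    # Risk-allele frequency in cases implied by the per-allele odds ratio
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    # Standard error of the frequency difference (two alleles per person)
    se = (p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    z_crit = -norm.inv_cdf(alpha / 2)
    shift = abs(p1 - p0) / se  # expected test statistic under the alternative
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


# WTCCC-sized study from the text: 1924 cases, 2938 controls, OR 1.2, frequency 0.30
print(round(allelic_test_power(1924, 2938, 0.30, 1.2), 2))  # prints 0.09
```

Under these assumptions the expected test statistic is only about 4.1, well short of the ~5.45 needed at p=5×10⁻⁸. This is why pooling samples across scans pays off.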
These limitations have served as an impetus for collaborative meta-analyses of GWA scan data, designed to improve the power to detect genuine susceptibility loci. Power can be increased both through the combination of datasets to achieve a larger sample size, and through the concomitant examination of both directly typed and imputed SNPs. The development and implementation of statistical approaches for the imputation of genotypes at untyped variants on the basis of a common reference (25), such as the HapMap (4), has also facilitated the integration of GWA scans derived from different genotyping platforms.
Investigators from the WTCCC, DGI and FUSION scans joined forces as the DIAGRAM (Diabetes Genetics, Replication And Meta-Analysis) Consortium to provide the first complex disease GWA scan meta-analysis. DIAGRAM (20) combined data across 4549 T2D cases and 5579 controls, all of European descent, from the 3 studies (stage 1). Imputation of genotypes at untyped HapMap SNPs within each scan and subsequent stringent quality control resulted in the final examination of 2,202,892 variants across 10128 samples. These efforts boosted power to detect common susceptibility loci with modest effects, although the power to detect smaller effects remained low (Figure 1), justifying even larger meta-analysis efforts in the future. Following the synthesis of summary results across the genome, 65 independent promising signals were prioritized for replication genotyping in up to 22426 further samples of European descent (stage 2). Then, the 10 strongest signals emerging from stage 2 were selected for a further round of genotyping in an additional, independent set of up to 57366 T2D cases and controls, all of European descent.
When data from all 3 stages were combined using a meta-analysis framework, six novel T2D susceptibility loci reached genome-wide significance levels (mapping near NOTCH2, CDC123/CAMK1D, ADAMTS9, THADA, TSPAN8/LGR5, JAZF1; Table 1). Of these, only NOTCH2 could have been considered a reasonable biological candidate prior to this work. Several other loci (such as DCD and VEGFA) failed to reach genome-wide significance, but were nevertheless highlighted as interesting candidates. Incorporation of further data will undoubtedly clarify their true relationship to T2D predisposition.
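The specific weighting scheme used by DIAGRAM is not described here. A common way to combine per-stage estimates is a fixed-effect, inverse-variance-weighted meta-analysis of log odds ratios, which assumes that the same true effect underlies every stage. The sketch below uses that approach with hypothetical numbers. It shows how three stages, none genome-wide significant on its own, can jointly pass the 5×10⁻⁸ threshold.

```python
import math
from statistics import NormalDist


def fixed_effect_meta(log_ors, std_errs):
    """Fixed-effect inverse-variance meta-analysis of per-study log odds ratios.

    Returns (pooled odds ratio, standard error of pooled log OR, two-sided p value).
    """
    weights = [1.0 / se ** 2 for se in std_errs]  # more precise studies weigh more
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p = 2 * NormalDist().cdf(-abs(z))
    return math.exp(pooled), pooled_se, p


# Hypothetical: three stages, each estimating OR 1.1 with SE 0.03 on the log scale
b = math.log(1.1)
single_p = 2 * NormalDist().cdf(-b / 0.03)
pooled_or, pooled_se, pooled_p = fixed_effect_meta([b, b, b], [0.03, 0.03, 0.03])
print(single_p < 5e-8, pooled_p < 5e-8)  # prints False True
```

If effects genuinely differ between populations or stages, a random-effects model (or a test of heterogeneity) would be the more cautious choice.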
The GWA studies performed to date have only been designed to detect a particular subset of potential susceptibility effects – those attributable to common SNPs (or copy number variants (CNVs)) well-tagged by the particular variants represented on the various commodity genotyping arrays. Even with the latest wave of high-density arrays, an appreciable minority of common SNPs remains poorly-tagged, even in the European-descent populations which have been the focus of most studies (26). In addition, there is limited coverage of low-frequency variants (27) and, despite the advances in design represented by the new generation of “CNV-aware” chips, the capacity to genotype structural variants (a collective term for polymorphic duplications, deletions and inversions) across the allele frequency spectrum lags well behind that of SNPs (28).
These considerations restrict the allele frequency range of the association signals that can be reliably detected. Since common variants of large effect seem to be few and far between, the consequence has also been to constrain the effect size spectrum of proven susceptibility variants (29). Variants within TCF7L2 remain the basis for the strongest T2D-susceptibility signal in European-descent samples, all other associations displaying per-allele odds ratios between 1.1 and 1.2. Where available, robust estimates of the true effect sizes of many of these proven variants indicate that chance (the “winner’s curse”) often contributed to their initial detection, and that, by inference, many more loci of broadly similar size remain to be found. For some of the confirmed signals, it is probable that fine-mapping efforts will, in time, identify variants causal for the association which have effects larger than those of the tagged variants which led to their discovery (30). In contrast, “winner’s curse” effects may mean that, for others, current measures of effect size overestimate the true state of affairs (31).
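The "winner's curse" follows directly from selecting on significance. The small simulation below is a sketch with arbitrary, hypothetical parameters: a true OR of 1.15 and a log-OR standard error of 0.04. It draws many replicate discovery studies of a single locus and keeps only those passing p<5×10⁻⁸. Among the "discoveries", the average estimated effect is larger than the true one.

```python
import math
import random
from statistics import NormalDist


def winners_curse_demo(true_or=1.15, se=0.04, n_replicates=10_000, alpha=5e-8, seed=1):
    """Simulate repeated discovery studies of one locus with a fixed true effect.

    Each replicate draws an estimated log OR from Normal(log(true_or), se); only
    replicates passing the significance threshold count as discoveries.
    Returns (geometric-mean OR among discoveries, fraction of replicates discovered).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    beta = math.log(true_or)
    z_crit = -NormalDist().inv_cdf(alpha / 2)
    hits = [est for est in (rng.gauss(beta, se) for _ in range(n_replicates))
            if abs(est) / se > z_crit]
    if not hits:
        return float("nan"), 0.0
    return math.exp(sum(hits) / len(hits)), len(hits) / n_replicates


or_hat, detected = winners_curse_demo()
print(f"true OR 1.15, mean OR among discoveries {or_hat:.2f}, detected {detected:.1%}")
```

Because a study only reaches significance when its estimate exceeds about 5.45 standard errors, every discovery here has an estimated OR above roughly 1.24. Independent replication samples are therefore needed for unbiased effect sizes.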
How much of the genetic predisposition to T2D is explained by the known variants? Before we benchmark the known T2D-susceptibility variants against some measure of the overall genetic contribution to disease, it is important to acknowledge that estimates of heritability are, by their nature, population- and time-specific. Moreover, we have little understanding of the extent to which epigenetic effects (such as the influence of maternal uterine environment on offspring diabetes) (32) are capable of masquerading as apparent genetic predisposition, thereby inflating estimates of heritability. For dichotomous traits, the sibling relative risk (or λs) is often used as a convenient measure of the extent of familial aggregation, though this encapsulates both shared environment and genetic effects. Thus, whilst the frequently-quoted sibling relative risk estimate of T2D in Europeans (~3) (1) can only be considered to provide an upper bound for the total genetic influence, there seems little doubt that the combined effect attributable to known variants (a λs of 1.07) falls well short of any credible assessment of the genetic contribution to T2D. Most probably, the 20 or so known variants explain only 5-10% of the inherited predisposition.
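The way λs values are built up from individual loci and compared with the overall figure can be sketched as follows. The code assumes a multiplicative model within and across loci. It uses Risch's variance-component formula, λs = 1 + (V_A/2 + V_D/4)/K², for each locus, where V_A and V_D are the additive and dominance variances of penetrance and K is the prevalence. Expressing the explained share as ln(λs known)/ln(λs total) is one common convention and is assumed here. The text does not say how its 5-10% estimate was derived. The per-locus inputs in the test are hypothetical.

```python
import math


def locus_lambda_s(p, grr):
    """Sibling recurrence risk ratio attributable to one biallelic locus.

    Assumes Hardy-Weinberg equilibrium and a multiplicative model with genotype
    relative risks 1, grr, grr**2 for 0, 1, 2 risk alleles; p is the risk-allele
    frequency. Penetrances are on an arbitrary scale, which cancels in the ratio.
    """
    q = 1 - p
    f0, f1, f2 = 1.0, grr, grr ** 2
    k = q * q * f0 + 2 * p * q * f1 + p * p * f2  # prevalence on the same scale
    v_a = 2 * p * q * (p * (f2 - f1) + q * (f1 - f0)) ** 2  # additive variance
    v_d = (p * q) ** 2 * (f2 - 2 * f1 + f0) ** 2  # dominance variance
    return 1 + (v_a / 2 + v_d / 4) / k ** 2


def combined_lambda_s(loci):
    """Under a multiplicative model across loci, locus-specific lambda_s values multiply."""
    return math.prod(locus_lambda_s(p, grr) for p, grr in loci)


def fraction_of_familial_aggregation(lambda_known, lambda_total):
    """Share of familial aggregation explained, measured on the log(lambda_s) scale."""
    return math.log(lambda_known) / math.log(lambda_total)


# Figures quoted in the text: known variants lambda_s ~1.07, overall lambda_s ~3
print(round(fraction_of_familial_aggregation(1.07, 3.0), 3))  # prints 0.062
```

A single locus with a risk-allele frequency of 0.3 and a relative risk of 1.2 per allele contributes a λs of only about 1.0075. Many such loci multiply together to reach a λs of about 1.07, and on this log scale that is roughly 6% of a total λs of 3.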
What could explain the missing heritability (or genetic "dark matter" as it has been termed)? Certainly, continued efforts to uncover more common causal variants through GWA meta-analysis and fine-mapping should fill some of the gap. However, given the modest locus effect sizes likely to result from these future discovery efforts, the overall additional contribution to heritability is unlikely to be substantial. At the same time, efforts to use the GWA approach to identify common susceptibility variants in other ethnic groups should, as the recent revelations concerning KCNQ1 demonstrate, improve the proportion of genetic variance explicable in non-European descent populations (19,24,33).
Common CNVs are also likely to be involved in disease-susceptibility (28). In terms of the base pair numbers involved, the sequence diversity attributable to CNVs is roughly equivalent to that due to SNPs, so one might naively expect the overall contribution to disease predisposition made by SNPs and CNVs to be quite similar. However, growing evidence that common CNVs are typically well-tagged by adjacent SNPs (34) suggests that many of the signals for which common CNVs are causal will already have been detected through SNP-based GWA studies. Ongoing efforts to explore global CNV associations for T2D and other common traits should clarify these issues.
Most evaluations of the joint effects of variants assume independence, so non-additivity (in the form of gene-gene and/or gene-environment interactions) could be another factor contributing to the missing heritability. For the common T2D-susceptibility loci, little evidence of either has been observed (14); this may in part reflect the intrinsically low power to detect interaction effects, particularly when the causal variants have not yet been defined. However, it has been argued on population genetics grounds that gene-gene interaction (GGI) is unlikely to have a major effect (35), and, in some scenarios, the variance explained under non-additivity can be LESS than that assumed if the effects are independent (36).
For many researchers in the field, the most probable source of missing heritability lies in the putative contribution of low frequency and/or rare variants (both SNPs and CNVs) of intermediate penetrance. Variants with such characteristics will certainly show marked familial aggregation, though the penetrance would generally be insufficient to allow for detection using classical linkage approaches: at the same time, the risk allele frequency is below the threshold detected reliably through GWAs. As a result, where such variants exist, they are likely to have been refractory to the kinds of systematic genome-wide surveys that have been so successful for rare, penetrant variants (at one extreme) and common variants of modest effect (at the other) (30). The arrival of “next-generation” sequencing technologies brings such variants into play and, in the coming years, it will become possible to examine the hypothesis that susceptibility to T2D (and other common traits) involves the action of multiple low-frequency variants.
The identification of DNA sequence variants shown to have a robust causal relationship to biomedical traits of interest represents one of the most powerful tools available for understanding the mechanisms involved in the pathogenesis of human disease. In principle, definitive statements relating a given association signal to a precise set of molecular and physiological events are generally only possible following identification of the causal variants (most likely through resequencing and fine-mapping) and/or detailed functional studies. In practice, however, it is often possible to make reasonably confident inferences about etiopathological mechanisms before such analyses have been completed.
Analysis of T2D-associated variants in healthy populations has clearly demonstrated that most of the T2D-susceptibility signals so far examined (notably the signals near KCNJ11, TCF7L2, CDKAL1, CDKN2A/B, IGF2BP2, HNF1B, HHEX/IDE, JAZF1 and SLC30A8) have a predominant effect on insulin secretion, with only FTO (which acts primarily through an effect on BMI and risk of obesity) (5) and PPARG shown to influence insulin action (18,37-41). These data reinforce the growing evidence (42) supporting the crucial role of defects in beta-cell function and mass in the development of T2D.
To go beyond these more general physiological observations to inferences about pathways depends on the confidence with which the likely causal genes can be identified. For some signals, the associated interval contains more than one highly credible candidate (e.g. HHEX and IDE in the region on chromosome 10). Nonetheless, some clear stories are emerging. Both HHEX and TCF7L2 are implicated in the regulation of Wnt-signalling, highlighting dysfunction of this pathway in the islet as a key process in T2D development (43). The association signal in SLC30A8 clearly points the finger at disturbances in zinc transport within insulin-secretory granules (16,44).
Perhaps most informative is the evidence incriminating abnormalities in cell cycle regulation. This evidence remains somewhat circumstantial, but at least four of the 18 regions can be tied to genes thought to be involved in this process (these are the signals close to CDKAL1, CDKN2A/B, CDC123 and CDKN1C [KCNQ1]). In rodents, Cdkn2a overexpression compromises maintenance of beta-cell mass and recapitulates the T2D phenotype (45). Further studies at these loci may provide valuable clues to which aspects of the regulation of beta-cell mass (development, regeneration or senescence) are central to the development of diabetes.
Several of the genes implicated in T2D-susceptibility appear to have effects on other common disease traits: these pleiotropic relationships often extend beyond accepted boundaries of co-morbidity. Thus, variation in CDKAL1 has been implicated in inflammatory bowel disease as well as T2D (46), and CDKN2A/B, JAZF1 and HNF1B are each known to play a role in cancer (10,47,48). In the case of HNF1B, alleles at the same variant have opposing effects on predisposition to T2D and prostate cancer, an inverse relationship which is consistent with the epidemiological data (49). The links between diabetes and cancer may betray a variety of shared etiological pathways: these include overlaps in signalling pathways as well as contrasting effects on the regulation of the cell cycle.
So far, the modest effect sizes of the common T2D-susceptibility variants limit the extent to which they can be used, individually or in combination, as tools for individual prediction of disease risk (50). Although it is certainly possible to use genetic risk scores to identify individuals who, by virtue of inheriting particularly large or small numbers of risk alleles, differ appreciably in their risk of diabetes (e.g. the top and bottom 1% differ about 4-fold in risk), it is by no means clear that such information is clinically actionable. Indeed, traditional risk factors, such as BMI and age, are much more effective tools for estimating future risk of diabetes, in adults at least, than any available combination of genetic variants. This does not mean that the prospects for individualized genetic profiling will remain of limited value: if, as described above, the coming years provide a bumper harvest of medium-penetrance, low-frequency susceptibility variants, these may provide a much more effective platform for individual prediction than the array of common variants currently available.
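A genetic risk score of the kind mentioned above is often computed as a sum of risk-allele counts (0, 1 or 2 per variant), optionally weighted by the log of each variant's per-allele odds ratio. The sketch below uses the weighted form and assumes independent, multiplicative effects. Under that assumption, the difference between two scores exponentiates to an odds ratio. The odds ratios used are hypothetical, chosen to sit in the 1.1-1.2 range typical of the known loci.

```python
import math


def weighted_risk_score(risk_allele_counts, odds_ratios):
    """Sum of risk-allele counts (0, 1, 2) weighted by log per-allele odds ratios."""
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio per variant is required")
    return sum(c * math.log(o) for c, o in zip(risk_allele_counts, odds_ratios))


def relative_odds(score_a, score_b):
    """Disease odds of person A relative to person B (multiplicative, independent effects)."""
    return math.exp(score_a - score_b)


# Hypothetical per-allele odds ratios for four variants
ors = [1.2, 1.15, 1.1, 1.1]
high = weighted_risk_score([2, 2, 2, 2], ors)  # homozygous for every risk allele
low = weighted_risk_score([0, 0, 0, 0], ors)   # carries no risk alleles
print(round(relative_odds(high, low), 2))  # prints 2.79
```

Even between these two extreme, and rare, genotype combinations the odds differ less than three-fold. This illustrates why scores built from modest-effect common variants discriminate poorly between individuals compared with factors such as BMI and age.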
In the long term, of course, we can expect novel insights into T2D pathogenesis generated by genetic discoveries to feed forward into drug development. Proof-of-principle is provided by the observation that two of the proven T2D-susceptibility signals lie in genes (PPARG, KCNJ11) whose protein products are already the targets of established and effective therapeutic agents. But what about using genetic information to improve treatment of diabetes at the level of the individual patient? Prescribing on the basis of personal molecular diagnosis is already established practice for those with monogenic forms of diabetes (e.g. sulphonylureas for those with neonatal diabetes due to KCNJ11 mutations) (51,52). It is also possible to show, in cross-sectional studies, that TCF7L2 genotype influences the efficiency of the response to sulphonylureas in those with common forms of T2D (53). However, as with disease risk, the modest effect sizes typical of T2D-susceptibility loci mean that the quality of these predictions is far too low for clinical implementation at the level of the individual patient. One potentially powerful use of genetic variant information may be to define groups of individuals who differ appreciably in disease risk as a means to improve the efficiency of therapeutic clinical trials.
So where do we stand? On the one hand, the explosion in confirmed signals, not just for T2D, but also for a range of related traits including BMI and hyperlipidaemia, has been spectacular. This is all the more so, coming, as it does, after so many years of faltering progress. At the same time, recognition that these discoveries have, so far, explained only a small proportion of the inherited predisposition limits our ability to deliver a complete understanding of disease pathogenesis and our capacity to undertake useful disease prediction.
However, the primary stated objective of GWA studies has always been to gain biological insights. This objective alone justifies the search for confirmed signals and, as with monogenic diseases, the biological impact of such discoveries bears little relation to the proportion of population variance they explain. Substantial work will often be required to define the molecular mechanisms through which these association signals mediate disease risk. However, even before definitive evidence can be produced (through fine-mapping and functional studies), it is possible to make informed and credible inferences about the processes involved and thereby gain important clues to pathogenesis that can initiate translational advances. An obvious example in the T2D field is the relationship between the SLC30A8 association signal and dysfunction of the islet zinc transporter, ZnT-8, which has encouraged efforts to develop pharmaceuticals that target zinc uptake into beta-cell granules (16,44).
Whilst we may well need to look elsewhere to find the bulk of the inherited predisposition to T2D (particularly amongst low frequency variants), there can be little doubt that the implementation of GWA studies has taught us a great deal, not only about diabetes itself, but about the need for rigorous standards of analysis and interpretation, and the value of international collaboration. These lessons will stand us in good stead as we move forward to tackle these new challenges.
The authors would like to thank Mandy van Hoek for her contribution in power calculations.