Several single nucleotide polymorphisms (SNPs) associated with type 2 diabetes mellitus (T2DM) have been identified, but there is little information on their role in populations at high risk for T2DM. We genotyped SNPs at 63 T2DM loci in 3,421 individuals from a high-risk American Indian population. Nominally significant (P < 0.05) associations were observed at nine SNPs in a direction consistent with the established association. A genetic risk score derived from all loci was strongly associated with T2DM (odds ratio 1.05 per risk allele, P = 6.2 × 10⁻⁶) and, in 292 nondiabetic individuals, with lower insulin secretion (by 4% per copy, P = 4.1 × 10⁻⁶). Genetic distances between American Indians and HapMap populations at T2DM markers did not differ significantly from genomic expectations. Analysis of U.S. national survey data suggested that 66% of the difference in T2DM prevalence between African Americans and European Americans, but none of the difference between American Indians and European Americans, was attributable to allele frequency differences at these loci. These analyses suggest that, in general, established T2DM loci influence T2DM in American Indians and that risk is mediated in part through an effect on insulin secretion. However, differences in allele frequencies do not account for the high population prevalence of T2DM.
In recent years, more than 70 distinct genomic regions have been identified in which single nucleotide polymorphism (SNP) markers show reproducible association with type 2 diabetes mellitus (T2DM) at genome-wide statistical significance (P < 5 × 10⁻⁸) (1–17). Most of these variants were discovered by genome-wide association studies (GWAS) in European populations, and their effects are best characterized in populations of European ancestry. Studies in other ethnic groups suggest that effects on T2DM are similar to those seen in Europeans for most variants (18,19), but clear examples of heterogeneity in effects have been observed (8,10,20). There is limited information on the role of these established variants in populations at high risk for T2DM or on the extent to which differences in allele frequencies at these variants account for differences in population risk. In the present study, we analyze 63 established T2DM-susceptibility variants in Pima Indians, an American Indian population in whom the prevalence of T2DM is extraordinarily high (21).
Subjects were participants in a longitudinal study conducted in the Gila River Indian Community in central Arizona, where most residents are Pima Indians (21). The present study consisted of 3,421 individuals whose self-reported heritage was full Pima, Tohono O’odham, or a mixture of these closely related tribes and who had DNA available. These individuals constituted 1,951 sibships. There were 1,964 women and 1,457 men; mean ± SD age at last examination was 40.6 ± 16.5 years. Height and weight were measured, and a 75-g oral glucose tolerance test was administered; diabetes was diagnosed in 1,615 individuals (47.2%) according to 1997 American Diabetes Association criteria (22), i.e., 2-h postload plasma glucose ≥11.1 mmol/L, fasting plasma glucose ≥7.0 mmol/L, or a diagnosis during routine clinical care.
A subset of individuals participated in detailed physiologic studies to assess metabolic predictors of T2DM. Body composition was measured by hydrodensitometry or by DEXA, as previously described (23), in 405 nondiabetic full-heritage Pimas (172 women and 233 men; mean ± SD age 26.7 ± 6.1 years). Insulin sensitivity was measured in these 405 individuals by the hyperinsulinemic-euglycemic clamp (23). Insulin was infused at physiologic levels (~130 µmol/L), and glucose was infused to maintain euglycemia. Rate of glucose uptake, normalized to estimated metabolic body size (EMBS), was taken as a measure of insulin sensitivity (M) (milligrams per kilogram EMBS per minute). Insulin secretion was measured as the acute insulin response (microunits per milliliter) 3–5 min after a 25-g intravenous glucose challenge (23) in 292 individuals (105 women and 187 men; mean ± SD age 26.7 ± 6.1 years) with normal glucose tolerance (2-hour postload glucose <7.8 mmol/L).
A sentinel SNP for each region was selected for genotyping from previously reported GWAS (1–17). Two SNPs were selected for KCNQ1 and CDC123, where two distinct sets of variants have been described. In addition, 45 ancestry-informative markers (24) were genotyped for estimation of the individual proportion of European heritage (25). Genotyping was conducted by the SNPplex method (Life Technologies, Carlsbad, CA) or the BeadXpress system (Illumina, San Diego, CA) according to the manufacturer’s instructions. Results for 18 SNPs were reported previously (20,26–29). They are included here for a more complete characterization of the effects of established T2DM loci.
Association between genotype and T2DM at the last research examination was analyzed by a logistic regression model, which was fit by the generalized estimating equation procedure to account for sibship. Genotype was coded as a numeric variable representing number of risk alleles as defined in previous GWAS. Thus, an odds ratio (OR) >1 indicates association in the same direction as the established association and an OR <1 indicates association in the opposite direction. Continuous variables were analyzed with a linear mixed model in which genotype and other covariates were fixed effects and sibship was a random effect. The logarithm of each variable was analyzed, and the regression coefficient was exponentiated to obtain the effect per copy of the T2DM risk allele, expressed as a multiplier.
For assessment of whether associations in Pimas were consistent with those in Europeans, ORs were compared by the Cochran Q test of homogeneity, and heterogeneity was quantified by the I² measure (30). ORs for Europeans were taken from previous publications (1,2,8,9,14–17,31–40). For assessment of whether GWAS-defined risk alleles contribute in aggregate to T2DM in Pimas, a multiallelic genetic risk score (GRS) was created by summing the number of risk alleles over all loci. To avoid reduction in sample size resulting from missing data at a few loci, we calculated the probability that an individual was of each possible genotype for each missing value from the genotypes in the individual’s relatives using MLINK (41); these probabilities were used in calculating the GRS.
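The GRS construction can be illustrated with a minimal sketch. The text does not state exactly how the genotype probabilities entered the score; the sketch assumes that a missing genotype contributes its expected number of risk alleles. The function names and the input layout are hypothetical and are not taken from the study's software.

```python
from typing import Optional, Sequence, Tuple

# (P(0 risk alleles), P(1 risk allele), P(2 risk alleles)); hypothetical input format
GenotypeProbs = Tuple[float, float, float]


def expected_risk_alleles(observed: Optional[int],
                          probs: Optional[GenotypeProbs]) -> float:
    """Return the observed risk-allele count, or its expectation when missing."""
    if observed is not None:
        return float(observed)
    if probs is None:
        raise ValueError("a missing genotype needs genotype probabilities")
    p0, p1, p2 = probs
    # Assumption: a missing genotype is replaced by its expected allele count.
    return 0.0 * p0 + 1.0 * p1 + 2.0 * p2


def genetic_risk_score(observed: Sequence[Optional[int]],
                       probs: Sequence[Optional[GenotypeProbs]]) -> float:
    """Unweighted GRS: sum of (expected) risk-allele counts over all loci."""
    return sum(expected_risk_alleles(g, p) for g, p in zip(observed, probs))


# Illustrative individual with four loci, the third one ungenotyped.
observed = [2, 1, None, 0]
probs = [None, None, (0.25, 0.5, 0.25), None]
score = genetic_risk_score(observed, probs)
```

A weighted score of the kind mentioned in Results would multiply each locus term by the logarithm of its published OR before summing.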
To test for heterogeneity across all loci, we combined P values derived from the heterogeneity test for individual SNPs by constructing a signed Z score. The Z score was computed for each SNP as $Z_i = \operatorname{sign}[\ln(\mathrm{OR}_{\mathrm{EU},i}) - \ln(\mathrm{OR}_{\mathrm{PI},i})]\,\Phi^{-1}(P_{\mathrm{het},i}/2)$, where $\mathrm{OR}_{\mathrm{EU},i}$ represents the OR for the ith SNP in Europeans, $\mathrm{OR}_{\mathrm{PI},i}$ represents the corresponding OR in Pimas, $P_{\mathrm{het},i}$ is the P value for heterogeneity, and $\Phi^{-1}$ represents the inverse of the cumulative normal probability function. The sum of the Z scores across all SNPs divided by the square root of the number of SNPs (Z*) was used to calculate a P value for the null hypothesis of homogeneity across all markers (42). If Z* is negative, it indicates that ORs on average are weaker in Pimas than in Europeans, whereas if Z* is positive it indicates that ORs are stronger in Pimas.
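The combined heterogeneity test can be sketched as follows, using only the Python standard library. The sketch assumes that the per-SNP heterogeneity P value comes from a Cochran Q test on two inverse-variance-weighted log ORs, which with two estimates reduces to a squared z statistic on 1 df, and that the P value for Z* is two-sided. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and SD 1


def heterogeneity_p(log_or_eu: float, se_eu: float,
                    log_or_pima: float, se_pima: float) -> float:
    """Cochran Q P value for two log ORs (Q = z**2 on 1 df)."""
    z = (log_or_eu - log_or_pima) / math.sqrt(se_eu ** 2 + se_pima ** 2)
    return 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z)))


def signed_z(or_eu: float, or_pima: float, p_het: float) -> float:
    """Z_i = sign[ln(OR_EU) - ln(OR_PI)] * inverse_normal(P_het / 2).

    The inverse-normal term is never positive, so Z_i is negative when the
    OR is larger in Europeans. Requires 0 < p_het <= 1.
    """
    sign = math.copysign(1.0, math.log(or_eu) - math.log(or_pima))
    return sign * _STD_NORMAL.inv_cdf(p_het / 2.0)


def combined_z(z_scores):
    """Return (Z*, two-sided P) where Z* = sum(Z_i) / sqrt(number of SNPs)."""
    z_star = sum(z_scores) / math.sqrt(len(z_scores))
    p_value = 2.0 * _STD_NORMAL.cdf(-abs(z_star))
    return z_star, p_value
```

Under these assumptions, a Z* of −4.12 corresponds to a two-sided P value of approximately 3.8 × 10⁻⁵, in line with the value reported in Results.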
Frequency of the risk allele was estimated by maximum likelihood methods using the ILINK program to account for family membership (41). Data for these 63 SNPs were obtained from the International HapMap Project (http://hapmap.ncbi.nlm.nih.gov/) or, if not available from HapMap, from the 1000 Genomes Project (http://www.1000genomes.org/). For comparison of allele frequencies with other major continental ethnic groups, data were obtained for individuals of European ancestry from Centre d’Etude du Polymorphisme Humain families in Utah (CEU), for East Asians from Han Chinese in Beijing (CHB), and for Africans from Yoruba in Ibadan, Nigeria (YRI), HapMap populations. The likelihood ratio test was used to test significance of the difference in risk allele frequency in Pimas (fR-Pima) with that in each HapMap population (fR-CEU, fR-CHB, and fR-YRI). For assessment of whether T2DM risk alleles were systematically higher in one population than in another, the mean of the GRS (μGRS) was compared between populations.
For a more general comparison of genetic distance between Pimas and other populations, the coancestry coefficient (FST) was calculated across all T2DM-susceptibility variants by the method of moments (43). Since interpretation of FST is most straightforward when sample sizes are equal, Pima allele frequencies used in these calculations were derived from a random sample of equal effective size to the corresponding HapMap population; effective sample size was estimated by the method of Yang et al. (44). For comparison of FST calculated across the T2DM markers with its genomic expectation, random markers were selected from a GWAS in Pimas (45). Since SNP characteristics may have influenced detection of the T2DM markers, each T2DM-associated SNP was matched to potential random SNPs by minor allele frequency in CEU, base pair type, and chromosome type (autosomal vs. X chromosome); to avoid selecting markers highly concordant with those for susceptibility to T2DM, we excluded a 2-Mb region on either side of the sentinel SNP from this selection. A total of 294,467 potentially matching random SNPs were thus identified. Significance of the difference between FST at T2DM variants and FST at random markers was calculated by a bootstrap procedure in which one random marker was selected for each T2DM variant in each iteration.
For quantification of the extent to which differences in T2DM risk allele frequencies can explain the difference in T2DM prevalence between Pimas and Europeans, standard multivariable epidemiologic methods for calculation of attributable fraction (46) were modified to calculate the genetic attributable fraction (GAF) for the population difference in prevalence. We define this as the proportion of the excess T2DM prevalence in a high-risk “target” population compared with a low-risk “reference” population attributable to differences in risk allele frequency. If P0 represents prevalence in the reference population (e.g., Europeans) and P1 is prevalence in the target population (e.g., Pimas), then

$$\mathrm{GAF} = \frac{P_1 - P_{1\mathrm{adj}}}{P_1 - P_0}$$
where P1adj is prevalence in the target population adjusted for the allele frequency differences (i.e., prevalence if the target population had the same risk allele frequencies as the reference population) (Eq. 1). Data from non-Hispanic white participants in the oral glucose tolerance subset of the U.S. National Health and Nutrition Examination Survey (NHANES) 2005–2010 were used for the reference population (http://www.cdc.gov/nchs/nhanes/nhanes_questionnaires.htm). These data were from 3,282 individuals age 12–84 years (1,585 women, 1,697 men; mean ± SD age 46.8 ± 20.1 years); 523 individuals (15.9%) had diabetes. These data were combined with Pimas of the same age range for calculation of GAF.
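As a worked illustration of Eq. 1, the sketch below takes GAF to be the share of the excess prevalence that is removed by the allele frequency adjustment, (P1 − P1adj)/(P1 − P0). This reading is an assumption; it was chosen because it reproduces the GAF reported in Results from the prevalences given there. The function name is hypothetical.

```python
def genetic_attributable_fraction(p_ref: float, p_target: float,
                                  p_target_adj: float) -> float:
    """GAF = (P1 - P1adj) / (P1 - P0), with prevalences as proportions."""
    excess = p_target - p_ref
    if excess == 0:
        raise ValueError("target and reference prevalence are identical")
    return (p_target - p_target_adj) / excess


# Prevalences from Results: 8.2% in non-Hispanic whites, 48.2% in Pimas,
# and 55.9% in Pimas after adjustment to CEU risk allele frequencies.
gaf = genetic_attributable_fraction(0.082, 0.482, 0.559)
print(round(gaf, 2))  # -0.19
```

A negative GAF arises when the adjusted prevalence exceeds the observed prevalence, i.e., when the target population carries fewer risk alleles on balance than the reference population.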
If genotypic data for all markers were available for all individuals, the quantities needed to calculate GAF could be derived from multivariable logistic regression. However, such data are not readily available for NHANES participants, so we developed an approximation that uses allele frequency and OR estimates from other sources. Adjusted prevalence in each population was obtained from the following logistic regression equation: $\operatorname{logit}(\text{prevalence}) = \alpha_0 + \alpha_1 I + \gamma_1(\mathrm{cov}_1) + \dots + \gamma_m(\mathrm{cov}_m)$, where $I$ is an indicator variable that takes the value of 0 for the reference population and 1 for the target population and $\gamma_1$–$\gamma_m$ represent the coefficients corresponding to $m$ covariates (centered at the mean values in the target population). Under an additive model with assumptions of Hardy-Weinberg equilibrium in both populations, independence among SNPs, and that the population OR changes as a function of the OR associated with each SNP and the difference in risk allele frequency, the expected value of $\alpha_1$, given that allele frequencies are the same as in the reference population, is as follows: $\alpha_{1\mathrm{adj}} = \alpha_1 - \sum_i 2\beta_i\left(f_{R1i}^2 + f_{R1i}[1 - f_{R1i}] - f_{R0i}^2 - f_{R0i}[1 - f_{R0i}]\right)$, where $\beta_i$ is the logarithm of the OR for the ith SNP, $f_{R1i}$ is the risk allele frequency in the target population, and $f_{R0i}$ is the frequency in the reference population. For the present analyses, allele frequencies in the HapMap CEU population were taken as representative of the reference population. The values required for calculation of GAF (see Eq. 1) are as follows:

$$P_0 = \frac{1}{1 + e^{-\alpha_0}}, \qquad P_1 = \frac{1}{1 + e^{-(\alpha_0 + \alpha_1)}}, \qquad P_{1\mathrm{adj}} = \frac{1}{1 + e^{-(\alpha_0 + \alpha_{1\mathrm{adj}})}}$$
Simulation studies suggest that estimates of GAF derived by this method provide a good approximation of those derived from a multivariable regression in which all data are available for all individuals (Fig. 1). CIs and hypothesis tests for GAF were derived from a bootstrap procedure in which Pima, NHANES, and CEU populations were resampled and ORs were sampled from published values and standard errors.
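The approximation described above can be sketched in a few lines. Because f² + f(1 − f) = f, each term of the adjustment reduces to 2β(f1 − f0), the per-SNP log OR multiplied by the difference in the expected number of risk alleles. The sketch assumes that prevalence is evaluated with covariates at their centering values, so that the covariate terms drop out. The function names and the two-SNP inputs are hypothetical and are not study data.

```python
import math


def expit(x: float) -> float:
    """Inverse logit: convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def adjusted_alpha1(alpha1, log_ors, freq_target, freq_ref):
    """alpha1_adj = alpha1 - sum_i 2 * beta_i * (f_R1i - f_R0i)."""
    shift = sum(2.0 * beta * (f1 - f0)
                for beta, f1, f0 in zip(log_ors, freq_target, freq_ref))
    return alpha1 - shift


# Intercept and population effect chosen to give prevalences of 8.2% and 48.2%.
alpha0 = logit(0.082)
alpha1 = logit(0.482) - alpha0

# Two invented SNPs whose risk alleles are less common in the target population.
log_ors = [math.log(1.10), math.log(1.15)]
freq_target = [0.30, 0.20]
freq_ref = [0.45, 0.40]

alpha1_adj = adjusted_alpha1(alpha1, log_ors, freq_target, freq_ref)
p1_adj = expit(alpha0 + alpha1_adj)  # exceeds 0.482 for these inputs
```

When the risk alleles are on balance less common in the target population, the shift is negative, the adjusted prevalence rises above the observed prevalence, and GAF becomes negative.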
Eight SNPs (rs17106184 in FAF1, rs7578597 in THADA, rs3923113 in GRB14, rs831571 in PSMD6, rs6808574 in LPP, rs1531343 in HMGA2, rs7957197 in HNF1A, and rs17782313 in MC4R) were nearly monomorphic (minor allele frequency <0.01) in Pimas and were not analyzed for association. Table 1 shows the association for each of the remaining 55 SNPs with T2DM in Pima Indians, along with the test for heterogeneity in ORs between Pimas and Europeans. Nine SNPs, those in GCKR, ZBED3, CDKAL1, ZFAND3, KCNQ1, SPRY2, HMG20A, PRC1, and FTO, had nominally significant associations (P < 0.05) in Pimas in the same direction as the established association. The previously reported result with the KCNQ1 SNP rs2237892 was the strongest association. Ten SNPs, in IRS1, ADAMTS9, ARL15, ZFAND3, PTPRD, TCF7L2, MPHOSPH9, C2CD4A, SLC16A11, and DUSP9, showed nominally significant heterogeneity between Pimas and Europeans. Nonetheless, ORs were in the same direction as the established association for 39 of the 55 SNPs.
Results for SNPs with nominally significant and directionally consistent associations with metabolic traits are shown in Table 2. Results for all SNPs are shown in Supplementary Table 1. The T2DM risk allele was associated with lower insulin secretion for SNPs in PROX1, IGF2BP2, ZBED3, DGKB-TMEM195, GLIS3, CDC123, HHEX, KCNQ1, and MTNR1B. The T2DM risk allele for SNPs in IRS1, PPARG, MTNR1B, PRC1, and SRR was associated with lower values of insulin sensitivity. Since the MTNR1B SNP was associated with both insulin secretion and total body insulin sensitivity, we further investigated its relationship with hepatic insulin sensitivity, measured in the clamp using radiolabeled glucose, and found that the risk allele was associated with lower sensitivity (r = −0.15, P = 0.002). The T2DM risk allele was significantly associated with higher percentage body fat for SNPs in PRC1 and ZFAND3. When BMI was analyzed in the larger population, the T2DM risk alleles for variants in GCK and FTO were associated with significantly higher BMI (Supplementary Table 2).
Associations with the multiallelic GRS are shown in Fig. 2. The sum of the number of risk alleles over all 55 SNPs was significantly associated with T2DM (OR 1.05 per copy of a risk allele, P = 6.2 × 10⁻⁶). There was also a strong association between a greater number of T2DM risk alleles and lower values of insulin secretion such that each copy of a risk allele was associated with a 4% decrease in insulin secretion (P = 4.1 × 10⁻⁶). There was little association with insulin sensitivity or percentage body fat. When alleles were weighted by the logarithms of the published ORs in constructing the GRS, similar results were obtained (data not shown). When BMI was analyzed, results were similar to those seen with percentage body fat, but the inverse association was statistically significant (lower by 0.4% per risk allele, P = 1.2 × 10⁻⁵) (Supplementary Fig. 1). When the GRS was constructed using only the nine insulin secretion–associated SNPs, each risk allele was associated with a 13% decrease in insulin secretion; similarly, in a score constructed from the five insulin sensitivity SNPs, each risk allele was associated with a 7% decrease in insulin sensitivity (Supplementary Fig. 2). The insulin secretion score was associated with T2DM (OR 1.09, P = 2.7 × 10⁻⁴); when these nine SNPs were excluded from the global GRS, the T2DM association was modestly attenuated (OR 1.04, P = 1.1 × 10⁻³).
The test for heterogeneity in the effect on T2DM across all SNPs was statistically significant (P = 3.9 × 10⁻⁵) and negative in sign (Z* = −4.12); this indicates that the effects of these SNPs are on average weaker in Pimas than in Europeans. When the 18 SNPs with nominally significant association with T2DM or significant heterogeneity were excluded, the effects of the GRS on T2DM (OR 1.04, P = 4.9 × 10⁻⁴) and insulin secretion (effect −4%, P = 8.0 × 10⁻⁶) remained significant, as did evidence for heterogeneity (Z* = −2.82, P = 4.8 × 10⁻³).
The difference in frequency of the T2DM risk allele between Pimas and HapMap populations is shown for each locus in Supplementary Fig. 3. The distribution of the GRS in each population is shown in Fig. 3. Mean GRS in Pimas (68.4) was slightly but significantly lower than in CEU (69.2, P = 0.049); mean GRS in Pimas was also significantly lower than in YRI (73.7, P = 4.4 × 10⁻³⁸) but higher than in CHB (66.6, P = 9.6 × 10⁻⁵). When loci were weighted by the logarithms of the ORs in constructing the GRS, results were similar, except that the contrast in mean GRS between Pimas and CEU was more pronounced (P = 1.2 × 10⁻¹⁰).
Genetic distances among populations across all 63 T2DM markers and across random markers are summarized in Fig. 4. FST across these T2DM loci was 0.163 (95% CI 0.154, 0.173) between Pimas and CEU, 0.138 (0.125, 0.152) between Pimas and CHB, and 0.232 (0.221, 0.244) between Pimas and YRI. These values were not significantly different from those derived from matched sets of SNPs randomly selected across the genome: FST 0.158 (0.106, 0.209) between Pimas and CEU (P = 0.83 for difference in FST), 0.129 (0.078, 0.180) between Pimas and CHB (P = 0.74), and 0.241 (0.173, 0.309) between Pimas and YRI (P = 0.80). Thus, differences in allele frequency are generally similar to those expected given genetic distances between populations.
Results for the calculation of GAF for Pimas compared with Europeans are shown in Fig. 5A. The age-sex adjusted prevalence of T2DM was 48.2% in Pima Indians and 8.2% in non-Hispanic whites from NHANES (OR 10.5). The prevalence in Pimas adjusted to the frequency of risk alleles in CEU was slightly higher at 55.9%, resulting in a GAF of −0.19 (95% CI −0.34, −0.03); the negative value of GAF reflects the lower mean GRS in Pimas. When the 10 SNPs with statistically significant heterogeneity in the ORs were excluded from the calculation, GAF was −0.03 (−0.19, 0.08). Calculations were also conducted comparing non-Hispanic blacks in NHANES (n = 1,610, mean ± SD age 39.5 ± 20.1 years, 812 with diabetes) with non-Hispanic whites using allele frequencies derived from the African ancestry in the southwest U.S. (ASW) HapMap population. These analyses suggest that 66% of the excess prevalence in the black population is potentially attributable to allele frequency differences at these loci (GAF 0.66 [95% CI 0.32, 1.07]) (Fig. 5B).
In recent years, many genetic variants reproducibly associated with T2DM have been identified. These have mostly been identified by GWAS in European populations. Many of these variants are also associated with T2DM in non-European populations, but there are instances of heterogeneity (8,10,20). The extent of association in high-risk populations, such as American Indians, is not well characterized. Our previous analyses in Pima Indians, with a much smaller number of SNPs, identified associations with SNPs in FTO and KCNQ1 (27,28); the KCNQ1 associations are subject to parent-of-origin effects and are particularly strong in Pimas (28). KLF14 variants also show parent-of-origin effects (28). Statistically significant heterogeneity between Pimas and Europeans at TCF7L2 was also observed, and a multiallelic score from eight SNPs was modestly associated with T2DM in Pimas and with diminished insulin secretion (20,27). In the present study, we have conducted a more complete survey of T2DM susceptibility variants in Pimas, including a total of 63 SNPs reproducibly associated with T2DM at genome-wide significance. These analyses identify additional SNPs that are nominally significantly associated with T2DM in Pimas in the same direction as in Europeans, including those in GCKR, ZBED3, CDKAL1, ZFAND3, SPRY2, HMG20A, and PRC1. Many of the T2DM susceptibility SNPs have effects in Pimas that are directionally consistent with those in Europeans, even if they were not individually statistically significant. Indeed, a multiallelic GRS that assesses effects of these variants in aggregate was statistically significant, even when SNPs with nominally significant effects or heterogeneity were excluded. The GRS was also strongly associated with diminished insulin secretion. 
Thus, the present findings suggest that the majority of T2DM-susceptibility variants do have modest effects on T2DM in this high-risk population but that some do not achieve statistical significance in the current sample size. Analyses in European populations suggest that the majority of T2DM-susceptibility variants influence T2DM risk through an effect on insulin secretion (2,47), and the current analyses suggest that this is also the case in Pimas.
Despite general consistency for most SNPs between the direction of association with T2DM in Pimas and that observed in the original GWAS, there were several SNPs that showed evidence for heterogeneity in effect between Pimas and Europeans. In addition to TCF7L2, nominally significant heterogeneity was observed at IRS1, ADAMTS9, ARL15, ZFAND3, PTPRD, C2CD4A, MPHOSPH9, SLC16A11, and DUSP9. With the exception of ZFAND3, which has previously been described as associated in East Asians but not Europeans (8), the effect in Pimas was weaker than that in Europeans. Furthermore, the combined test of heterogeneity across all loci indicated that effects were generally weaker in Pimas than in Europeans (even when SNPs with nominally significant association or heterogeneity were excluded). Thus, while most T2DM-susceptibility variants do have an effect on T2DM risk in Pimas, this effect is generally not as strong as it is in Europeans. It is possible that, despite the large sample sizes, this heterogeneity reflects overestimation of effects in Europeans. Given that functional variants at most of these loci have not been identified, however, some heterogeneity between Europeans and other populations might be expected on account of differing linkage disequilibrium patterns. Indeed, fine-mapping studies have suggested that population heterogeneity at GWAS signals derived from Europeans is at least partly due to differences in linkage disequilibrium patterns (18).
Recent studies have described divergence in allele frequency at T2DM-susceptibility variants between major continental populations that is greater than expected given genetic distances between these populations and a gradient in genetic risk for T2DM with risk alleles at highest frequency in Africans and at lowest frequency in East Asians (48,49). Such divergence in allele frequencies may reflect effects of natural selection in the different evolutionary histories of these populations. Prevalence of T2DM among Pima Indians is among the highest reported in the world, and if such evolutionary factors are responsible for this high prevalence, one might expect to see established T2DM risk alleles at high frequency in Pimas. The present analyses are consistent with previous studies, conducted with fewer SNPs (48), in that we observed the highest genetic risk scores in Africans (YRI) and the lowest in East Asians (CHB). However, genetic risk scores in Pimas were not particularly high and were comparable with, or lower than, those from low-risk populations, such as Europeans. The population differences in GRS observed here could reflect effects of genetic drift or natural selection. One study found that the Africa–East Asia gradient was greater than expected with random markers (49), which suggests natural selection, but a recent study that analyzed several global populations at 65 established T2DM-susceptibility loci suggested that T2DM-susceptibility alleles are generally evolutionarily neutral (50). Further work is needed to determine whether the higher genetic risk scores for T2DM in African versus Asian populations reflect genetic drift or natural selection. However, in the present general analysis of genetic distances, we did not observe significant excess in the divergence between Pimas and other continental populations across established T2DM-susceptibility variants.
This suggests that any overall effects of natural selection at these variants do not appear to have contributed to the high risk of T2DM in Pimas.
Regardless of the mechanisms by which population differences in risk allele frequency have arisen, such differences could explain population differences in prevalence of T2DM. The present analyses of GAF, however, suggest that differences in allele frequency at these established T2DM variants account for little of the increased population risk for T2DM in Pimas compared with European Americans. GWAS within Amerindian-derived populations may identify variants that are more likely to explain these population differences. Our recent GWAS comparing Pimas with young-onset T2DM to older nondiabetic individuals found association with a variant in DNER in Pimas but not in Europeans (45); this variant (rs1861612) shows little difference in allele frequency, however. A recent study suggested that the risk allele of rs75493593 in SLC16A11, which is more common in American Indians than Europeans, could explain ~20% of the excess risk in Mexican Americans compared with European Americans, ignoring the effects of all other loci (16). In the present study, we found that the risk allele at SLC16A11 is much more common in Pima Indians than in Europeans; however, its effect is outweighed by other variants at which the risk allele is less common in Pimas, such that the overall extent to which established T2DM risk alleles can account for the excess prevalence in Pimas is negligible. In contrast, the present analyses suggest that 66% of the difference in T2DM prevalence between African Americans and European Americans is potentially attributable to allele frequency differences at these loci. Since transferability of European-derived T2DM variants to African Americans is somewhat uncertain given the highly divergent linkage disequilibrium patterns, the validity of the assumption that European-derived ORs represent causal effects may be questionable.
Nonetheless, in light of the high proportion of excess prevalence between African Americans and Europeans that is attributable to differences in allele frequency at established T2DM variants, the fact that they account for none of the excess T2DM prevalence in Pimas seems remarkable.
In summary, the present analyses suggest that established T2DM variants are largely transferable to high-risk populations, such as Pima Indians, albeit with weaker effects than in Europeans. However, differences in allele frequency across these established T2DM alleles account for little, if any, of the high T2DM prevalence in Pimas compared with populations of European ancestry. Thus, the high prevalence of T2DM in Pimas is likely the result of environmental factors or of genetic factors that remain largely unidentified.
Acknowledgments. The authors thank the participants who volunteered for the study and the staff of the Phoenix Epidemiology and Clinical Research Branch who provided assistance.
Funding. This work was supported by the intramural research program of the National Institute of Diabetes and Digestive and Kidney Diseases.
Duality of Interest. No potential conflicts of interest relevant to this article were reported.
Author Contributions. R.L.H. wrote the manuscript, researched data, and contributed to discussion. R.R., S.K., Y.L.M., E.J.W., J.M.C., R.G.N., and L.J.B. researched data, contributed to discussion, and reviewed and edited the manuscript. R.L.H. is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Prior Presentation. Parts of this study were presented in abstract form at the Annual Meeting of the American Society of Human Genetics, San Francisco, CA, 6–10 November 2012.
This article contains Supplementary Data online at http://diabetes.diabetesjournals.org/lookup/suppl/doi:10.2337/db14-1715/-/DC1.