The goal of this study was to identify genes and regions in the human genome that are associated with the acute insulin response to glucose (AIRg), an important predictor of type 2 diabetes, in Hispanic-American participants from the Insulin Resistance Atherosclerosis Family Study (IRAS FS).
A two-stage genome-wide association scan (GWAS) was performed in IRAS FS Hispanic-American samples. In the first stage, 318K single nucleotide polymorphisms (SNPs) were assessed in 229 Hispanic-American DNA samples (from 34 families) from San Antonio, TX. SNPs with the most significant associations with AIRg were genotyped in the entire set of IRAS FS Hispanic-American samples (n = 1190). In chromosomal regions with evidence of association, additional SNPs were genotyped to capture variation in genes.
No individual SNP achieved genome-wide levels of significance (P < 5 × 10⁻⁷); however, two regions, chromosomes 6p21 and 20p11, had multiple highly ranked SNPs that were associated with AIRg. Additional genotyping in these regions supported the initial evidence for variants contributing to variation in AIRg. One region resides in a gene desert between PXT1 and KCTD20 on 6p21, while the region on 20p11 has several viable candidate genes (ENTPD6, PYGB, GINS1 and RP4-691N24.1).
A GWAS in Hispanic-American samples identified several candidate genes and loci that may be associated with AIRg. These associations explain a small component of variation in AIRg. The genes identified are involved in phosphorylation and ion transport and provide preliminary evidence that these processes have importance in beta cell response.
An individual’s risk of developing insulin resistance and type 2 diabetes is determined, in part, by genetic factors . The transition from normal glucose homeostasis to type 2 diabetes is thought to be primarily a function of increasingly dysfunctional beta cells [2-6]. While estimates of the contribution to beta cell response vary, there is little argument that discovery of genes that account for variation in the manner in which the beta cell responds to glucose could identify important pathways for type 2 diabetes risk prediction, intervention and treatment.
Both candidate gene evaluations and genome-wide linkage scans have been used to search for genes with modest to large effect that contribute to variation in beta cell response; however, these methods have not been highly successful. The genome-wide association scan (GWAS) has become a popular approach for detecting those genes with small to modest effect. GWAS studies in type 2 diabetes have been especially productive and have had an extraordinary impact on current understanding of genetic susceptibility to type 2 diabetes [7-13].
The vast majority of GWAS for type 2 diabetes have used cases and controls from European or European-derived populations. Although over a dozen genes/regions have been identified with robust statistical significance, it is unclear whether these genes/regions also will be found in non-European-derived populations [14, 15]. Currently, the genes that have been identified for type 2 diabetes from the European-derived populations have suggested that most, if not all, of the type 2 diabetes genes mediate their influence through the beta cell and not through insulin resistance pathways.
In the current report, we evaluate a quantitative, directly assessed measure of beta cell response in a non-European population. Herein, we present results of a two-stage GWAS in Hispanic Americans from the IRAS FS. Through a high-density SNP scan and follow-up genotyping, a series of genes and regions may have been identified that contribute to the variation in the acute insulin response of the beta cell to a glucose challenge.
The description of the study design, recruitment and phenotyping for IRAS FS have been presented previously . Briefly, the IRAS FS is a multi-center study designed to identify the genetic determinants of quantitative measures of glucose homeostasis in African-American and Hispanic American populations in the USA. Members of large families of self-reported Hispanic ancestry (n=1268 individuals in 92 pedigrees; San Antonio, TX; San Luis Valley, CO) were recruited and used in this report. The Institutional Review Board of each participating clinical and analysis site approved the study protocol and all participants provided their written informed consent.
A clinical examination was performed that included an in-depth medical history interview, a frequently sampled intravenous glucose tolerance test (FSIGT), anthropometric measurements, and collection of samples for blood chemistry and biomarker analysis. Measures of glucose homeostasis were derived using mathematical models [MINMOD, 17] from glucose and insulin values obtained during the FSIGT [18-20]. These estimates of glucose homeostasis include insulin sensitivity (SI), glucose effectiveness (SG), disposition index (DI; with DI=AIR × SI), and the acute insulin response to glucose (AIRg). This report is focused on the AIRg phenotype of glucose homeostasis.
A collection of IRAS FS DNA samples from Hispanic-American subjects (229 individuals from 34 families) were chosen from the San Antonio study sample for the first stage of the GWAS. These samples were chosen from participants without type 2 diabetes who had complete data for glucose homeostasis and obesity phenotypes, but with an age, BMI and sex distribution consistent with that of the entire IRAS FS collection. The participants appear to be representative of a relatively homogeneous population based upon STRUCTURE analysis  using microsatellite polymorphisms from an earlier genome-wide linkage scan [22, 23]. DNA used in the high throughput genotyping (318K SNPs) was obtained from EBV-transformed lymphoblastoid cell lines.
Genotyping was performed using 1.5 μg of genomic DNA (15 μl of 100 ng/μl stock) and the Illumina Infinium II HumanHap 300 BeadChips at Cedars-Sinai Medical Center using the Illumina Infinium II assay protocol. Genotypes were determined based on clustering of the raw intensity data for the two dyes using Illumina BeadStudio software. Consistency of genotyping was checked using 18 repeat samples; the concordance rate was 100%. Repeat genotyping of DNA samples was performed once if the overall call rate was less than 98%; the sample was rejected if there was no improvement in call rate. The average sample call rate was 99.76%. SNPs with Hardy-Weinberg Equilibrium P < 0.001, minor allele frequency (MAF) less than 0.05, or more than 5% missing genotypes were excluded from subsequent analysis. Genotypes with GenCall scores less than 0.15 were set to missing (0.25%). For highly associated SNPs, clustering was repeated to exclude spurious significance. All genotypes were oriented to the forward strand. There is little risk of strand ambiguities as there are no C/G or A/T polymorphisms included on the Illumina 300K HumanHap panel.
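The SNP-level exclusion criteria above (Hardy-Weinberg P < 0.001, MAF < 0.05, more than 5% missing genotypes) can be illustrated with a minimal sketch. The function names are hypothetical, and the Hardy-Weinberg test shown is a simple 1-df chi-square on genotype counts that assumes unrelated individuals; the test actually used in the study is not specified here and may differ.

```python
import math

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test of Hardy-Weinberg proportions.

    Assumption: unrelated individuals and a plain chi-square test;
    this is an illustration, not necessarily the study's test.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0                    # monomorphic SNP: nothing to test
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))

def passes_qc(n_aa, n_ab, n_bb, n_missing,
              hwe_p_min=0.001, maf_min=0.05, max_missing=0.05):
    """Return True if a SNP survives the three exclusion criteria."""
    called = n_aa + n_ab + n_bb
    total = called + n_missing
    if called == 0:
        return False
    missing_rate = n_missing / total
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1 - freq_a)
    return (missing_rate <= max_missing
            and maf >= maf_min
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_p_min)
```

A SNP is retained only if all three conditions hold, which mirrors the "excluded if any criterion fails" wording of the text.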
SNPs with evidence of association in the GWAS were validated in the entire Hispanic cohort (excluding subjects with type 2 diabetes). A total of 1536 SNPs were chosen for genotyping on all Hispanic samples for which glucose homeostasis data were available (n=1190). Genotyping was performed at Cedars-Sinai Medical Center using the Illumina GoldenGate Assay. SNPs with low call frequencies (<98%) were manually re-clustered (~15% of all SNPs). Of the 1536 SNPs, a total of 3.5% were excluded based on call frequency less than 0.7 and/or cluster separation less than 0.3. The average SNP call frequency was 99.48%. Duplicate genotyping of 12 of samples yielded a 100% concordance rate. The minimum acceptable sample call rate was 95%; the average sample call rate was 99.5%. SNP selection for this second stage was based upon 1) identification of the most strongly associated 50-100 SNPs for each of the glucose homeostasis and related phenotypes (SI, SG, AIRg, DI) from the initial GWAS; 2) tag SNPs in genes with high evidence for association across more than one phenotype (tags for SNPs with MAF >0.1 using Haploview v4); and 3) ancestry-informative markers (AIMs) for Hispanic populations [26, 27].
Individual genes/regions with confirmed evidence for association from the GWAS and the validation genotyping were targeted for additional genotyping using tag SNPs. This genotyping was performed using iPLEX Gold SBE assays on the Sequenom MassArray Genotyping System. Locus-specific primers were designed using the MassARRAY Assay Design 3.0 software (Sequenom, Cambridge, MA). Mass spectrograms were analyzed using MassARRAY TYPER software (Sequenom). The minimum acceptable call frequency was 98%; no SNP failed this criterion as the average call frequencies were >99.14%. Fifty-one blind duplicate samples were included to evaluate genotyping accuracy; the concordance rate was 99.94%. SNPs were selected to capture common variation within LD haplotype blocks as defined by the CEPH (CEU) population of the International HapMap project. Specifically, genotype data from the genomic interval containing the candidate gene +/-5 kb were exported from the HapMap database and imported into Haploview. For genes with few LD blocks (e.g., ETV7), SNPs were selected to tag the entire genic region with a mean r² = 0.80 with forced inclusion of previously genotyped SNPs. For larger genes (e.g., STK38), SNP selection focused on the LD block containing the SNP associated in the validation genotyping; additional SNPs were selected to tag the block with a mean r² = 0.80 with forced inclusion of previously genotyped SNPs.
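To make the r² criterion used in tag SNP selection concrete, the following sketch computes r² between two SNPs as the squared Pearson correlation of allele dosages (0, 1 or 2 copies of the coded allele). This is an approximation chosen for illustration: Haploview estimates r² from phased or inferred haplotypes, so its values can differ, and the function name is hypothetical.

```python
def dosage_r2(g1, g2):
    """Squared Pearson correlation between two SNPs' allele dosages.

    g1, g2: equal-length sequences of 0/1/2 dosages, with None for
    missing genotypes. Individuals missing either genotype are dropped.
    """
    pairs = [(a, b) for a, b in zip(g1, g2)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return 0.0
    mean1 = sum(a for a, _ in pairs) / n
    mean2 = sum(b for _, b in pairs) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in pairs)
    var1 = sum((a - mean1) ** 2 for a, _ in pairs)
    var2 = sum((b - mean2) ** 2 for _, b in pairs)
    if var1 == 0 or var2 == 0:
        return 0.0   # a monomorphic SNP carries no LD information
    return cov * cov / (var1 * var2)
```

Under this sketch, a candidate tag SNP would be said to capture another SNP when `dosage_r2` for the pair meets the chosen threshold (0.80 in the text).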
For quality control, each SNP was examined for Mendelian inconsistencies using PedCheck . There were 1657 SNPs exhibiting inconsistencies and these genotypes were converted to “missing.” Maximum likelihood estimates of allele frequencies were computed using the largest set of unrelated Hispanic-American individuals (n=34); SNP genotypes were tested for departures from Hardy-Weinberg Equilibrium proportions. SNPs with no evidence of a difference in AIRg values between individuals with and without missing genotype data (P > 0.05), and no evidence of departure from HWE (P > 0.001) were included in subsequent analyses.
To test for association between individual SNPs and AIRg, variance component measured genotype analyses were performed as implemented in SOLAR . X-chromosome SNPs (used for ancestry adjustment) were not used in the primary analyses. For statistical testing, AIRg was transformed using the signed-square root to best approximate the distributional assumptions of the test and minimize heterogeneity of the variance. SNPs were ranked using P-values from the additive genetic model. The primary statistical inference was the additive genetic model. All tests and levels of significance were computed after adjustment for age, sex, and BMI.
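The signed-square-root transformation and the additive coding of genotypes can be sketched as follows. This is only an illustration under simplifying assumptions: the slope is an ordinary least-squares estimate that ignores both the covariates (age, sex, BMI) and the family structure that the variance component model in SOLAR accounts for, and the function names are hypothetical.

```python
import math

def signed_sqrt(x):
    """Signed square root: sign(x) * sqrt(|x|).

    Keeps the sign of negative AIRg values while compressing
    the right tail of the distribution.
    """
    return math.copysign(math.sqrt(abs(x)), x)

def additive_slope(dosages, trait):
    """Least-squares slope of the trait on allele dosage (0/1/2).

    Under an additive genetic model the slope is the expected change
    in the trait per copy of the coded allele. Assumption: unrelated
    individuals and no covariates, unlike the analysis in the text.
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    sxy = sum((g - mean_g) * (y - mean_y)
              for g, y in zip(dosages, trait))
    return sxy / sxx

# Usage: transform raw AIRg values, then regress on dosage.
# transformed = [signed_sqrt(v) for v in raw_airg]
# beta = additive_slope(dosages, transformed)
```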
Analyses for validation and locus-specific genotyping data used the same modeling framework as employed for the analysis of the GWAS data, except that covariate adjustment included a term for the site of recruitment (San Antonio, San Luis Valley) and admixture. For incorporation of admixture into the combined analyses of GWAS and validation data, a collection of ancestry-informative markers (AIMs) was used. These AIMs were selected from the literature on studies performed in Hispanics [26, 27]. The GWAS had 80 SNPs (including 14 on the X chromosome) and the validation genotyping had 149 SNPs (including 23 on the X chromosome). The 149 AIMs were available on 1279 subjects, and these data were merged with HapMap data for CEPH (n=90) and Yoruba (n=90) populations.
A principal components analysis was performed on the 149 AIMs as well as the 80 AIMs in common between the GWAS (317K SNP panel) and the validation (1536 SNP panel) experiments. The total proportion of variance explained by the first three principal components with the 80 AIMs (PC1, 10.2%; PC2, 5.1%; PC3, 2.7%) differed little from the proportion of variance explained by the 149 AIMs (PC1, 10.3%; PC2, 4.8%; PC3, 1.9%). However, there were differences overall between the Hispanic-American sites with respect to PC2 (P = 2.35 × 10⁻⁵³). Hispanic Americans from the two sites differed in two measures of glucose homeostasis (glucose effectiveness, SG, P = 2.65 × 10⁻¹⁷; disposition index, DI, P = 6.22 × 10⁻¹⁰) and two measures of adiposity (BMI, P = 1.46 × 10⁻¹²; visceral fat, VAT, P = 7.37 × 10⁻⁵), but not AIRg (Spearman correlation 0.09, P = 0.09). For AIRg, the proportion of variance explained by the center of ascertainment is 2.56%; thus, all results are presented with adjustment for admixture in addition to age, sex, BMI, and center of ascertainment.
A total of 229 Hispanic-American subjects from the San Antonio population who did not have type 2 diabetes but had complete data for glucose homeostasis, critical covariates, and DNA obtained from EBV-transformed cell lines were used in the pilot GWAS. A sample of 961 subjects with DNA and baseline data was used for replication. The total study sample of 1,190 Hispanic-American subjects was 58.6% female, with an average age of 42.8 years, mean AIRg (excluding those with type 2 diabetes) of 767 pmol/l and BMI of 29.0 kg/m² (Table 1). There were no significant differences between the groups in these phenotypes.
A total of 309,200 SNPs met all quality control criteria and were evaluated for association with AIRg. SNPs were ranked using P-values from the additive genetic model. The Q-Q plot for the stage 1 GWAS indicated that the majority of SNPs exhibited a -log10(P-value) less than 2, and the observed distribution of P-values matched the expectation for the majority of the observed data (Electronic Supplementary Material (ESM) Figure 1). There was some departure from the null distribution at P < 10⁻³, so this value was used as a rough cut-point for selection of SNPs for follow-up. The highest-ranking SNPs associated with AIRg were chosen for genotyping on all non-diabetic Hispanic-American participants in the IRAS FS (n=1190). A total of 672 SNPs with evidence of association with AIRg and other glucose homeostasis phenotypes (SI, SG, DI), or SNPs that tag genes associated with multiple phenotypes, were included in a 1536-SNP custom chip (another 461 SNPs were tested based upon association with adiposity and related phenotypes). For AIRg, 157 SNPs were chosen based on AIRg alone (125 SNPs) or AIRg plus other traits (32 SNPs). Of these 157, 149 SNPs passed clustering-related quality control.
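The comparison underlying a Q-Q plot can be sketched briefly: observed -log10(P) values, sorted from most to least significant, are paired with the values expected if all P-values were uniform under the null. The plotting positions (i + 0.5)/m are one common convention and an assumption here, as are the function names; the second helper simply counts how many tests would fall below a threshold by chance.

```python
import math

def qq_points(pvalues):
    """Pair expected and observed -log10(P), most significant first.

    Expected values use the plotting positions (i + 0.5) / m for the
    i-th smallest P-value (i = 0 .. m-1) under a uniform null.
    """
    m = len(pvalues)
    observed = sorted((-math.log10(p) for p in pvalues), reverse=True)
    expected = [-math.log10((i + 0.5) / m) for i in range(m)]
    return list(zip(expected, observed))

def expected_null_hits(n_tests, threshold):
    """Number of tests expected below `threshold` if every null is true."""
    return n_tests * threshold

# With the SNP count from the text and the follow-up cut-point:
null_hits = expected_null_hits(309200, 1e-3)
```

Points lying above the diagonal at the significant end of the plot correspond to more small P-values than the null count predicts, which is the kind of departure described above.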
Results from the AIRg analyses (full admixture-adjusted analyses, initial GWAS and the independent replication analysis), representing the most associated SNPs that reside in genes, are shown in Table 2. SNP locations within and in proximity to genes were determined by dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/). The most associated SNP with AIRg is rs1566458 (P = 5.5 × 10⁻⁴) in the zinc finger DHHC-type containing-7 gene on chromosome 16q24.1 (ZDHHC7). A second gene, RGS6 (rs2238256, P = 8.6 × 10⁻⁴) on 14q24.3, encodes proteins that negatively regulate heterotrimeric G protein signaling and modulate neuronal, cardiovascular, and lymphocytic activity. Three of the most associated genic SNPs for AIRg (rs1061632, rs2300064 and rs12190911) are located in the KCTD20 (potassium channel tetramerization domain containing 20) / STK38 (serine/threonine kinase 38) region on 6p21.3, within the human MHC and near the SNP in the NCR2 (natural cytotoxicity triggering receptor 2) locus also associated with AIRg. Only two associated SNPs are in exons (rs1061632 in KCTD20 and rs3748400 in ZCCHC14).
The region on 6p21 (ESM Figure 2) contains five blocks that span approximately 200 kb, based upon the MEX HapMap (genotyping of 71 Mexican Americans from Los Angeles, California, USA). This region is covered by a 5-SNP haplotype (rs12190911/rs1061632/rs2300064/rs612399/rs7772334). A smaller LD block is tagged by rs12190911 (admixture-adjusted P = 1.4 × 10⁻³) and contains two genes, ETV7 (ets variant gene 7, TEL2 oncogene, a member of the E26 transformation-specific family of transcription factors) and PXT1 (peroxisomal, testis-specific 1). A larger LD block is tagged by rs1061632 (P = 9.0 × 10⁻⁴), rs2300064 (P = 9.1 × 10⁻⁴), rs612399 (P = 6.2 × 10⁻³) and rs7772334 (P = 7.8 × 10⁻⁴), and contains the genes KCTD20 (potassium channel tetramerization domain containing 20), STK38 (serine/threonine kinase 38) and SFRS3 (splicing factor, arginine/serine-rich 3). For the entire region, 22 SNPs were chosen to capture common variation (r² = 0.98) for follow-up analyses of these candidate genes in the entire IRAS FS Hispanic sample. The AIRg effect in the 6p21 region appears to be composed of three sets of two adjacent SNPs spanning a 105 kb region bounded by SNPs rs12190911 (Pgen = 0.001) and rs614028 (Pgen = 0.04), including the genes KCTD20 and STK38. Locations and genotypic means for these SNPs can be found in the Electronic Supplementary Material (ESM Table 1).
A second region of clustered associated SNPs lies on chromosome 20p11.2-p11.1 and includes the genes ENTPD6 (ectonucleoside triphosphate diphosphohydrolase 6), PYGB (brain glycogen phosphorylase), ABHD12 (abhydrolase domain containing 12), GINS1 (GINS complex subunit 1, Psf1 homolog) and RP4-691N24.1 (KIAA0980) (ESM Figure 3). In this case, however, only the ABHD12 gene did not have at least one SNP that was significantly associated with AIRg. ENTPD6 (rs2179638, P = 0.001), PYGB (rs6138553, P = 0.026), GINS1 (rs6076347, P = 0.0037; rs2500406, P = 0.003) and RP4-691N24.1 (rs16987806, P = 0.001; rs6083877, P = 0.035) all exhibited associations with variation in AIRg. Similar results are shown for SNPs that do not have recognized genes within 10 kb of the associated SNP (Table 3). The second most associated SNP in the entire data set (rs7772334, P = 7.8 × 10⁻⁴) is 13.6 kb from NCR2 (natural cytotoxicity triggering receptor 2) on 6p21.1, which is involved in natural killer cell activity. Locations and genotypic means for these SNPs can be found in the Electronic Supplementary Material (ESM Table 2).
Measurement of beta cell response and insulin sensitivity derived from the frequently sampled intravenous glucose tolerance test (FSIGT) with minimal model analysis (MINMOD) is highly correlated with corresponding measures obtained from the euglycemic and hyperglycemic hyperinsulinemic clamp studies [31, 32]. The hyperglycemic clamp, unlike the euglycemic clamp, provides direct estimates of beta cell function. Variation in these measures of glucose homeostasis, including AIRg, appears to be controlled more extensively by genetic factors than other surrogate measures of insulin sensitivity, such as fasting insulin or HOMA . These parameters of glucose homeostasis may represent important phenotypes for prediction of the overall and genetic risk of type 2 diabetes through the mechanism of action of the beta cell.
To our knowledge, this is the first report of GWAS analysis of a direct measure of beta cell response, the acute insulin response to glucose (AIRg). In this analysis, multiple regions of the human genome were identified as likely harboring genes that contribute to variation in AIRg. Two regions, 6p21 and 20p11, were notable in exhibiting associations at more than one SNP. In each case, the region is complex, with multiple genes and extensive LD structure. The region on 6p21 contains KCTD20 (proposed to participate in potassium ion transport) and STK38 (protein serine/threonine kinase activity, including ATP binding, magnesium ion binding, and protein binding) [34, 35]. The KCTD20 gene is of particular interest given the key role of potassium ion transport in glucose-stimulated insulin secretion. However, the strongest region of significance on 6p21 is centered on a 1.8 kb region without any known genes. It remains to be determined whether the observed SNP associations are due to a single or multiple causal variants. Preliminary analyses, in which each associated SNP is used as a covariate, suggest that at least two areas may contain independent variants, one associated with the PXT1/KCTD20/STK38 region and one associated with the TREM1/NCR2 region (ESM Table 3).
In contrast, the region associated with AIRg on chromosome 20p11 contains several interesting candidate genes. Similar to the preliminary analyses in the candidate region on chromosome 6, conditioning on SNP rs1555286 produced little change in the significance of the GINS1 SNP rs6076347, suggesting that at least two independent variants may exist (ESM Table 4). The ENTPD6 gene encodes E-type NTPases (such as CD39) that participate in purine and pyrimidine metabolism, calcium ion binding, hydrolase activity, magnesium ion binding, and nucleoside-diphosphatase activity. The protein encoded by PYGB is a glycogen phosphorylase that catalyzes the rate-determining step in glycogen degradation. In addition, the rs6076347 SNP in GINS1 is a missense mutation in exon 5 (Ile -> Val). Genistein exposure results in an increase in GINS1 mRNA expression, and genistein is known to modulate hepatic glucose- and lipid-regulating enzyme activities in C57BL/KsJ-db/db mice. How the functions of these genes may relate to beta cell response is not immediately apparent.
Recent meta-analyses of GWAS in type 2 diabetes have uncovered at least 18 genes/regions that appear to be common in the general population (minor allele frequencies of the variant associated with type 2 diabetes greater than 1%) but with relatively small effect . A number of the associated SNPs have identified genes (HHEX, CDKN2A/2B, CDKAL1) that may be implicated in beta cell development and function. In this study, the most associated SNPs in the initial GWAS for several glucose homeostasis and obesity traits have been followed with a 1536 SNP analysis in an expanded population sample of Hispanic-American ethnicity from San Antonio and San Luis Valley. Of these, there were four SNPs in two genes (at the P < 0.01 level for an additive model) that were contained in the 18 type 2 diabetes-associated genes as evaluated by GWAS in the IRAS FS sample. These two genes are THADA (thyroid adenoma associated [death receptor-interacting protein]; rs7595299) and CAMK1D (calcium/calmodulin-dependent protein kinase ID; rs2768367, rs2399866, rs1004247) and appear to have a common pathway, involved in regulating the number of insulin-producing cells in the pancreas. The remaining 16 type 2 diabetes-associated genes as evaluated by GWAS were not associated with AIRg in the current study.
While the GWAS results in the IRAS FS Hispanic-American sample have provided interesting candidates for consideration, there are limitations to the interpretation of study results. First, as there are very few comparable studies with similarly defined phenotypes of beta cell response, the ability to replicate these findings is limited. Second, while the associations are ranked based upon rigorous statistical criteria, the admixture-adjusted P-values obtained do not meet genome-wide levels of significance (e.g., P < 5 × 10⁻⁷). Third, the effect sizes reflected in the genotypic means for each associated SNP average 25% of a standard deviation. In the full Hispanic-American sample of 1269 individuals, for SNPs with MAF = 0.15, the power of this study to detect that effect size with a type 1 error (P) of 10⁻³ and 10⁻⁴ is 89% and 71%, respectively. The power to detect a similar effect at a genome-wide level of significance is only 32%. Further examination of populations and correlated phenotypes for replication is needed.
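The dependence of power on sample size, allele frequency, effect size and significance threshold can be illustrated with a rough calculation. The sketch below rests on assumptions not stated in the text: individuals are treated as unrelated, the 25% of a standard deviation is interpreted as a per-allele effect under an additive model, and a two-sided normal approximation to the 1-df test is used. Because the study's power was computed for a family-based design, this sketch need not reproduce the reported figures; the function name is hypothetical.

```python
from statistics import NormalDist

def additive_power(n, maf, effect_sd, alpha):
    """Approximate power of a 1-df additive test in unrelated subjects.

    n:         number of individuals
    maf:       minor allele frequency
    effect_sd: per-allele effect in trait standard deviations (assumed)
    alpha:     two-sided type 1 error rate
    """
    nd = NormalDist()
    # Fraction of trait variance explained by the SNP (additive model).
    r2 = 2 * maf * (1 - maf) * effect_sd ** 2
    # Noncentrality parameter of the 1-df chi-square test.
    ncp = n * r2 / (1 - r2)
    shift = ncp ** 0.5
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Probability that |Z + shift| exceeds the critical value.
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)

# Inputs taken from the text; thresholds as discussed above.
power_1e3 = additive_power(1269, 0.15, 0.25, 1e-3)
power_1e4 = additive_power(1269, 0.15, 0.25, 1e-4)
power_gw = additive_power(1269, 0.15, 0.25, 5e-7)
```

The sketch makes the qualitative point of the paragraph explicit: for a fixed sample and effect size, tightening the threshold from 10⁻³ toward a genome-wide level sharply reduces power.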
In summary, a multistage GWAS for the acute insulin response to glucose (AIRg), a measure of beta cell response, was performed in a Hispanic-American sample from the IRAS Family Study. Based upon these results, there are numerous SNPs and regions of the genome that may contain variants that account for common variation in AIRg. Two regions were interrogated with increased genotyping, and support was observed for a gene desert on chromosome 6p21 (in the region containing PXT1, KCTD20 and STK38) and for several genes on chromosome 20p11 (ENTPD6, PYGB, GINS1 and RP4-691N24.1). The latter series of genes suggests a mechanism of action that could be involved in phosphorylation and ion transport. However, the power of the study is limited, the proportion of the observed heritability of AIRg explained by these genes is not substantial, and there is need for replication. In conclusion, several candidate genes have been identified as possibly contributing to variation in AIRg, a predictor of type 2 diabetes.
ESM Figure 1. QQ Plot of AIRg in the GWAS.
ESM Figure 2. HapMap (MEX) chromosome 6p21 containing candidate genes (ETV7, PXT1, KCTD20, STK38) for AIRg.
Figure 2A. Linkage disequilibrium as r²
Figure 2B. Linkage disequilibrium as D’
ESM Figure 3. HapMap (MEX) chromosome 20p11 containing candidate genes (ENTPD6, PYGB, ABHD12, GINS1, RP4-691N24.1) for AIRg.
Figure 3A. Linkage disequilibrium as r²
Figure 3B. Linkage disequilibrium as D’
This research was supported in part by NIH grants HL060894, HL060931, HL060944, HL061019, and HL061210; the University of Virginia Harrison Chair in Public Health Sciences (SSR), and the Cedars-Sinai Board of Governors’ Chair in Medical Genetics (JIR).
Duality of Interest. The authors declare that there is no duality of interest associated with this manuscript.