1.  Identifying Plausible Genetic Models Based on Association and Linkage Results: Application to Type 2 Diabetes 
Genetic epidemiology  2012;10.1002/gepi.21668.
When planning re-sequencing studies for complex diseases, previous association and linkage studies can constrain the range of plausible genetic models for a given locus. Here, we explore the combinations of causal risk allele frequency (RAF_C) and genotype relative risk (GRR_C) consistent with no or limited evidence for affected sibling pair (ASP) linkage and strong evidence for case-control association. We find that significant evidence for case-control association combined with no or moderate evidence for ASP linkage can define a lower bound for the plausible RAF_C. Using data from large type 2 diabetes (T2D) linkage and genome-wide association study meta-analyses, we find that under reasonable model assumptions, 23 of 36 autosomal T2D risk loci are unlikely to be due to causal variants with combined RAF_C < 0.005, and four of the 23 are unlikely to be due to causal variants with combined RAF_C < 0.05.
PMCID: PMC3578091  PMID: 22865662
gene mapping; genetics; genetic structure; complex diseases
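The link between a locus's causal risk allele frequency, its genotype relative risk, and the linkage signal it should produce can be made concrete with the classical single-locus formula for the sibling recurrence risk ratio (λ_s), which drives ASP linkage power. The sketch below is a minimal illustration, assuming a multiplicative penetrance model and a user-chosen baseline penetrance `f0`. It is not the authors' full modeling procedure, and the function name is hypothetical.

```python
def sibling_recurrence_ratio(raf, grr, f0):
    """Sibling recurrence risk ratio (lambda_s) for one biallelic locus.

    raf: risk allele frequency p.
    grr: genotype relative risk per risk allele (multiplicative model assumed).
    f0:  baseline penetrance of the non-risk homozygote (hypothetical input).
    """
    p, q = raf, 1.0 - raf
    f1, f2 = f0 * grr, f0 * grr ** 2           # multiplicative penetrances
    prevalence = q * q * f0 + 2 * p * q * f1 + p * p * f2
    a = (f2 - f0) / 2.0                        # half the homozygote difference
    d = f1 - (f0 + f2) / 2.0                   # dominance deviation of heterozygote
    alpha = a + d * (q - p)                    # average effect of allele substitution
    v_add = 2 * p * q * alpha ** 2             # additive genetic variance
    v_dom = (2 * p * q * d) ** 2               # dominance genetic variance
    # Siblings share 2 alleles IBD w.p. 1/4 and 1 allele w.p. 1/2 (Risch 1990).
    return 1.0 + (v_add / 2.0 + v_dom / 4.0) / prevalence ** 2


if __name__ == "__main__":
    # Compare a common modest-effect model with a rare larger-effect model.
    for raf, grr in [(0.3, 1.2), (0.005, 3.0)]:
        print(raf, grr, round(sibling_recurrence_ratio(raf, grr, f0=0.1), 4))
```

At a fixed allele frequency, a larger GRR raises λ_s and hence the expected ASP linkage evidence. This is the quantity that allows an absent or weak linkage signal to rule out some rare, high-GRR models.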
2.  Importance of different types of prior knowledge in selecting genome-wide findings for follow-up 
Genetic epidemiology  2013;37(2):10.1002/gepi.21705.
Biological plausibility and other prior information could help select genome-wide association (GWA) findings for further follow-up, but there is no consensus on which types of knowledge should be considered or how to weight them. We used experts’ opinions and empirical evidence to estimate the relative importance of 15 types of information at the single nucleotide polymorphism (SNP) and gene levels. Opinions were elicited from ten experts using a two-round Delphi survey. Empirical evidence was obtained by comparing the frequency of each type of characteristic in SNPs established as being associated with seven disease traits through GWA meta-analysis and independent replication, with the corresponding frequency in a randomly selected set of SNPs. SNP and gene characteristics were retrieved using a specially developed bioinformatics tool. Both the expert and the empirical evidence rated previous association in a meta-analysis or more than one study as conferring the highest relative probability of true association, while previous association in a single study ranked much lower. High relative probabilities were also observed for location in a functional protein domain, while location in a region evolutionarily conserved in vertebrates was ranked high by the data but not by the experts. Our empirical evidence did not support the importance attributed by the experts to whether the gene encodes a protein in a pathway or shows interactions relevant to the trait. Our findings provide insight into the selection and weighting of different types of knowledge in SNP or gene prioritization, and point to areas requiring further research.
PMCID: PMC3725558  PMID: 23307621
Gene prioritization; Genome-wide association studies; Bioinformatics databases
3.  SNP prioritization using a Bayesian probability of association 
Genetic epidemiology  2012;37(2):10.1002/gepi.21704.
Prioritization is the process whereby a set of possible candidate genes or SNPs is ranked so that the most promising can be taken forward into further studies. In a genome-wide association study, prioritization is usually based on the p-values alone, but researchers sometimes take account of external annotation information about the SNPs, such as whether the SNP lies close to a good candidate gene. Using external information in this way is inherently subjective and is often not formalized, making the analysis difficult to reproduce. Building on previous work that has identified fourteen important types of external information, we present an approximate Bayesian analysis that produces an estimate of the probability of association. The calculation combines four sources of information: the genome-wide data, SNP information derived from bioinformatics databases, empirical SNP weights, and the researchers’ subjective prior opinions. The calculation is fast enough to be applied to millions of SNPs, and although it relies on subjective judgments, those judgments are made explicit so that the final SNP selection can be reproduced. We show that the resulting probability of association is intuitively more appealing than the p-value because it is easier to interpret and it makes allowance for the power of the study. We illustrate the use of the probability of association for SNP prioritization by applying it to a meta-analysis of kidney function genome-wide association studies and demonstrate that SNP selection performs better using the probability of association than using p-values alone.
PMCID: PMC3725584  PMID: 23280596
replication; prior knowledge; genome-wide studies
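As a rough illustration of how genome-wide data, annotation-based weights, and a subjective prior can be combined into a probability of association, the sketch below uses Wakefield's approximate Bayes factor together with prior odds multiplied by per-annotation weights. This is a substitute technique chosen for illustration, assuming a normal prior N(0, W) on the effect size. The abstract does not give the authors' exact formulation, and the weights and function names are hypothetical.

```python
import math


def log_bf_association(beta_hat, se, prior_sd):
    """Log approximate Bayes factor for H1 (association) versus H0.

    H0: beta = 0.  H1: beta ~ N(0, prior_sd**2).  The estimate beta_hat is
    treated as N(beta, se**2), following Wakefield's approximation.
    """
    v, w = se ** 2, prior_sd ** 2
    z2 = (beta_hat / se) ** 2
    return 0.5 * math.log(v / (v + w)) + 0.5 * z2 * w / (v + w)


def _logistic(x):
    """Numerically stable 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def probability_of_association(beta_hat, se, prior_sd, base_prior, weights=()):
    """Posterior probability of association for one SNP.

    base_prior: researcher's prior probability before annotation.
    weights:    hypothetical multiplicative prior-odds weights, one per
                annotation the SNP carries (e.g. near a candidate gene).
    """
    log_odds = math.log(base_prior / (1.0 - base_prior))
    log_odds += sum(math.log(w) for w in weights)            # annotation evidence
    log_odds += log_bf_association(beta_hat, se, prior_sd)   # genome-wide data
    return _logistic(log_odds)


if __name__ == "__main__":
    # Hypothetical SNP: beta_hat = 0.05 (se 0.01), one annotation with weight 3.
    print(probability_of_association(0.05, 0.01, 0.05, 1e-4, weights=(3.0,)))
```

Unlike a p-value, the Bayes factor depends on the standard error relative to the prior spread, which is one way a probability of association can reflect study power.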
4.  Re-sequencing Expands Our Understanding of the Phenotypic Impact of Variants at GWAS Loci 
PLoS Genetics  2014;10(1):e1004147.
Genome-wide association studies (GWAS) have identified >500 common variants associated with quantitative metabolic traits, but in aggregate such variants explain at most 20–30% of the heritable component of population variation in these traits. To further investigate the impact of genotypic variation on metabolic traits, we conducted re-sequencing studies in >6,000 members of a Finnish population cohort (The Northern Finland Birth Cohort of 1966 [NFBC]) and a type 2 diabetes case-control sample (The Finland-United States Investigation of NIDDM Genetics [FUSION] study). By sequencing the coding sequence and 5′ and 3′ untranslated regions of 78 genes at 17 GWAS loci associated with one or more of six metabolic traits (serum levels of fasting HDL-C, LDL-C, total cholesterol, triglycerides, plasma glucose, and insulin), and conducting both single-variant and gene-level association tests, we obtained a more complete understanding of phenotype-genotype associations at eight of these loci. At all eight of these loci, the identification of new associations provides significant evidence for multiple genetic signals to one or more phenotypes, and at two loci, in the genes ABCA1 and CETP, we found significant gene-level evidence of association to non-synonymous variants with MAF<1%. Additionally, two potentially deleterious variants that demonstrated significant associations (rs138726309, a missense variant in G6PC2, and rs28933094, a missense variant in LIPC) were considerably more common in these Finnish samples than in European reference populations, supporting our prior hypothesis that deleterious variants could attain high frequencies in this isolated population, likely due to the effects of population bottlenecks. Our results highlight the value of large, well-phenotyped samples for rare-variant association analysis, and the challenge of evaluating the phenotypic impact of such variants.
Author Summary
Abnormal serum levels of various metabolites, including measures relevant to cholesterol, other fats, and sugars, are known to be risk factors for cardiovascular disease and type 2 diabetes. Identification of the genes that play a role in generating such abnormalities could advance the development of new treatment and prevention strategies for these disorders. Investigations of common genetic variants carried out in large sets of research subjects have successfully pinpointed such genes within many regions of the human genome. However, these studies often have not led to the identification of the specific genetic variations affecting metabolic traits. To attempt to detect such causal variations, we sequenced genes in 17 genomic regions implicated in metabolic traits in >6,000 people from Finland. By conducting statistical analyses relating specific variations (individually and grouped by gene) to the measures for these metabolic traits observed in the study subjects, we added to our understanding of how genotypes affect these traits. Our findings support a long-held hypothesis that the unique history of the Finnish population provides important advantages for analyzing the relationship between genetic variations and biomedically important traits.
PMCID: PMC3907339  PMID: 24497850
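Gene-level tests pool rare variants that are individually too infrequent to test alone. The abstract does not say which gene-level tests were used. The sketch below shows one simple burden-style approach, assuming minor-allele-count genotypes, a MAF cutoff of 1% (matching the threshold mentioned above), no covariates, and a normal approximation to the P-value that is only reasonable in large samples. The function names are hypothetical.

```python
import math
from statistics import mean


def burden_scores(genotypes, mafs, maf_cutoff=0.01):
    """Per-person count of minor alleles at variants with MAF below maf_cutoff.

    genotypes: one list per person of minor-allele counts (0, 1 or 2).
    """
    rare = [j for j, m in enumerate(mafs) if m < maf_cutoff]
    return [sum(person[j] for j in rare) for person in genotypes]


def burden_test(scores, trait):
    """Regress a quantitative trait on the burden score.

    Returns (slope, t statistic, approximate two-sided P from the normal).
    """
    n = len(scores)
    mx, my = mean(scores), mean(trait)
    sxx = sum((x - mx) ** 2 for x in scores)
    if sxx == 0:
        raise ValueError("no variation in burden score")
    sxy = sum((x - mx) * (y - my) for x, y in zip(scores, trait))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(scores, trait))
    se = math.sqrt(rss / (n - 2) / sxx)        # standard error of the slope
    t = slope / se
    return slope, t, math.erfc(abs(t) / math.sqrt(2.0))
```

Collapsing variants into one score gains power when most rare variants in the gene act in the same direction. It loses power when effects are mixed, which is why variance-component tests are often run alongside it.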
5.  Joint Analysis of Individual Participants’ Data from 17 Studies on the Association of the IL6 Variant -174G>C with Circulating Glucose Levels, Interleukin-6 Levels, and Body-Mass Index 
Annals of medicine  2009;41(2):128-138.
Several studies have investigated associations between the -174G>C polymorphism (rs1800795) of the IL6 gene and type 2 diabetes mellitus (T2DM)-related phenotypes, but have reported inconsistent results.
This joint analysis aimed to clarify whether IL6 -174G>C is associated with T2DM-related quantitative phenotypes.
Individual-level data from all studies of the IL6-T2DM consortium on Caucasian subjects with available BMI were collected. As study-specific estimates did not show heterogeneity (P>0.1), they were combined using the inverse-variance fixed-effect model.
The main analysis included 9,440, 7,398, 24,117, and 5,659 nondiabetic and manifest T2DM subjects for fasting glucose, 2-hour glucose, BMI, and circulating interleukin-6 levels, respectively. IL6 -174 C-allele carriers had significantly lower fasting glucose (−0.091 mmol/L, P=0.014). There was no evidence of association between IL6 -174G>C and BMI or interleukin-6. In an additional analysis of 641 subjects who later developed T2DM, the IL6 -174 CC genotype was associated with higher baseline interleukin-6 (+0.75 pg/mL, P=0.004), consistent with higher interleukin-6 in the 966 manifest T2DM subjects (+0.50 pg/mL, P=0.044).
Our data suggest an association between IL6 -174G>C and quantitative glucose levels, and exploratory analysis indicated modulated interleukin-6 levels in pre-diabetic subjects. This is in line with the SNP’s previously reported association with T2DM and with a role for circulating interleukin-6 as an intermediate phenotype.
PMCID: PMC3801210  PMID: 18752089
blood glucose; body mass index; diabetes mellitus, type 2; epidemiology, molecular; genes; inflammation mediators; interleukin-6; intermediate phenotype; meta-analysis; polymorphism, single nucleotide
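The inverse-variance fixed-effect model used in the joint analysis above weights each study's estimate by the reciprocal of its squared standard error. A minimal sketch follows, with Cochran's Q and I² as heterogeneity summaries. The abstract reports only a heterogeneity P-value, so using Q here is an assumption about how it was obtained, and the function name is hypothetical.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance fixed-effect meta-analysis.

    Returns (pooled estimate, its SE, two-sided P, Cochran's Q, df, I^2).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))           # two-sided normal P-value
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0    # share of variation due to heterogeneity
    return pooled, pooled_se, p, q, df, i2
```

To turn Q into a heterogeneity P-value, compare it with a chi-square distribution on `df` degrees of freedom, for example with `scipy.stats.chi2.sf(q, df)`.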
6.  Exome array analysis identifies novel loci and low-frequency variants for insulin processing and secretion 
Nature genetics  2012;45(2):197-201.
Insulin secretion plays a critical role in glucose homeostasis, and failure to secrete sufficient insulin is a hallmark of type 2 diabetes. Genome-wide association studies (GWAS) have identified loci contributing to insulin processing and secretion [1,2]; however, a substantial fraction of the genetic contribution remains undefined. To examine low-frequency (minor allele frequency (MAF) 0.5% to 5%) and rare (MAF<0.5%) nonsynonymous variants, we analyzed exome array data in 8,229 non-diabetic Finnish males. We identified low-frequency coding variants associated with fasting proinsulin levels at the SGSM2 and MADD GWAS loci and three novel genes with low-frequency variants associated with fasting proinsulin or insulinogenic index: TBC1D30, KANK1, and PAM. We also demonstrate that the interpretation of single-variant and gene-based tests needs to consider the effects of noncoding SNPs nearby and megabases (Mb) away. This study demonstrates that exome array genotyping is a valuable approach to identify low-frequency variants that contribute to complex traits.
PMCID: PMC3727235  PMID: 23263489
7.  A common variant in the melatonin receptor gene (MTNR1B) is associated with increased risk of future type 2 diabetes and impaired early insulin secretion 
Nature genetics  2008;41(1):82-88.
Genome-wide association studies revealed that variation in the Melatonin Receptor 1B gene (MTNR1B) is associated with insulin and glucose concentrations. Here we show that the risk genotype of the associated SNP predicts future type 2 diabetes (T2D) in two large prospective studies. Specifically, the risk genotype was associated with impairment of early insulin response to both oral and intravenous glucose and with faster deterioration of insulin secretion over time. We also show that the Melatonin Receptor 1B mRNA is expressed in human islets, and immunocytochemistry confirms that it is primarily localized in β-cells in islets. Non-diabetic individuals carrying the risk allele and patients with T2D showed increased expression of the receptor in islets. Insulin release from clonal β-cells in response to glucose was inhibited in the presence of melatonin. These data suggest that the circulating hormone melatonin, which is predominantly released from the pineal gland in the brain, is involved in the pathogenesis of T2D. Given the increased expression of Melatonin Receptor 1B in individuals at risk of T2D, the pathogenic effects are likely exerted via a direct inhibitory effect on β-cells. In view of these results, blocking the melatonin ligand-receptor system could be a therapeutic avenue in T2D.
PMCID: PMC3725650  PMID: 19060908
8.  Gene × Physical Activity Interactions in Obesity: Combined Analysis of 111,421 Individuals of European Ancestry 
PLoS Genetics  2013;9(7):e1003607.
Numerous obesity loci have been identified using genome-wide association studies. A UK study indicated that physical activity may attenuate the cumulative effect of 12 of these loci, but replication studies are lacking. Therefore, we tested whether the aggregate effect of these loci is diminished in adults of European ancestry reporting high levels of physical activity. Twelve obesity-susceptibility loci were genotyped or imputed in 111,421 participants. A genetic risk score (GRS) was calculated by summing the BMI-associated alleles of each genetic variant. Physical activity was assessed using self-administered questionnaires. Multiplicative interactions between the GRS and physical activity on BMI were tested in linear and logistic regression models in each cohort, with adjustment for age, age², sex, study center (for multicenter studies), and the marginal terms for physical activity and the GRS. These results were combined using meta-analysis weighted by cohort sample size. The meta-analysis yielded a statistically significant GRS × physical activity interaction effect estimate (P_interaction = 0.015). However, a statistically significant interaction effect was apparent only in North American cohorts (n = 39,810, P_interaction = 0.014), not in European cohorts (n = 71,611, P_interaction = 0.275). In secondary analyses, both the FTO rs1121980 (P_interaction = 0.003) and the SEC16B rs10913469 (P_interaction = 0.025) variants showed evidence of SNP × physical activity interactions. This meta-analysis of 111,421 individuals provides further support for an interaction between physical activity and a GRS in obesity predisposition, although these findings hinge on the inclusion of cohorts from North America, indicating that these results are either population-specific or non-causal.
Author Summary
We undertook analyses in 111,421 adults of European descent to examine whether physical activity diminishes the genetic predisposition to obesity conferred by 12 single nucleotide polymorphisms, as previously reported in a study of 20,000 UK adults (Li et al., PLoS Med. 2010). Although the study by Li et al. is widely cited, to our knowledge the original report has not been replicated. Therefore, we sought to confirm or refute the original study's findings in a combined analysis of 111,421 adults. Our analyses yielded a statistically significant interaction effect (P_interaction = 0.015), confirming the original study's results; we also identified an interaction between the FTO locus and physical activity (P_interaction = 0.003), verifying previous analyses (Kilpelainen et al., PLoS Med. 2010), and we detected a novel interaction between the SEC16B locus and physical activity (P_interaction = 0.025). We also examined the power constraints of interaction analyses, showing that within- and between-study heterogeneity and the way data are treated can hinder the detection of interaction effects in meta-analyses that combine many cohorts with varying characteristics. This suggests that combining many small studies that have measured environmental exposures differently may be a relatively inefficient way to detect gene × environment interactions.
PMCID: PMC3723486  PMID: 23935507
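The two computational steps described above, an unweighted allele-count GRS and a pooling of per-cohort results weighted by sample size, can be sketched as follows. The sketch assumes that each cohort contributes a signed Z-statistic for the GRS × physical activity term and uses Stouffer's method with weights √n. The abstract does not state the exact weighting scheme, so treat this as one plausible variant; the function names are hypothetical.

```python
import math


def genetic_risk_score(risk_allele_counts):
    """Unweighted GRS: number of BMI-increasing alleles carried (0-2 per SNP)."""
    return sum(risk_allele_counts)


def sample_size_weighted_z(z_scores, sample_sizes):
    """Combine per-cohort signed Z-statistics with weights sqrt(n).

    Returns the combined Z and its two-sided normal P-value.
    """
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    z = numerator / math.sqrt(sum(w * w for w in weights))
    return z, math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical person carrying 2, 1, 0, 1 risk alleles at four SNPs.
    print(genetic_risk_score([2, 1, 0, 1]))
```

Because the weights ignore how precisely each cohort measured physical activity, this scheme is sensitive to the kind of between-study heterogeneity the authors discuss.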
9.  Hyperglycemia and a Common Variant of GCKR Are Associated With the Levels of Eight Amino Acids in 9,369 Finnish Men 
Diabetes  2012;61(7):1895-1902.
We investigated the association of glycemia and 43 genetic risk variants for hyperglycemia/type 2 diabetes with amino acid levels in the population-based Metabolic Syndrome in Men (METSIM) Study, including 9,369 nondiabetic or newly diagnosed type 2 diabetic Finnish men. Plasma levels of eight amino acids were measured with proton nuclear magnetic resonance spectroscopy. Increasing fasting and 2-h plasma glucose levels were associated with increasing levels of several amino acids and decreasing levels of histidine and glutamine. Alanine, leucine, isoleucine, tyrosine, and glutamine predicted incident type 2 diabetes in a 4.7-year follow-up of the METSIM Study, and their effects were largely mediated by insulin resistance (except for glutamine). We also found significant correlations between insulin sensitivity (Matsuda insulin sensitivity index) and mRNA expression of genes regulating amino acid degradation in 200 subcutaneous adipose tissue samples. Only 1 of 43 risk single nucleotide polymorphisms for type 2 diabetes or hyperglycemia, the glucose-increasing major C allele of rs780094 of GCKR, was significantly associated with decreased levels of alanine and isoleucine and elevated levels of glutamine. In conclusion, the levels of branched-chain and aromatic amino acids and alanine increased and the levels of glutamine and histidine decreased with increasing glycemia, reflecting, at least in part, insulin resistance. Only one single nucleotide polymorphism regulating hyperglycemia was significantly associated with amino acid levels.
PMCID: PMC3379649  PMID: 22553379
10.  Human longevity and common variations in the LMNA gene: a meta-analysis 
Aging Cell  2012;11(3):475-481.
A mutation in the LMNA gene is responsible for the most dramatic form of premature aging, Hutchinson-Gilford progeria syndrome (HGPS). Several recent studies have suggested that protein products of this gene might have a role in normal physiological cellular senescence. To further explore LMNA's possible role in normal aging, we genotyped 16 SNPs spanning 75.4 kb of the LMNA gene in a sample of long-lived individuals (US Caucasians aged ≥95 years, N=873) and genetically matched younger controls (N=443). We tested all common non-redundant haplotypes (frequency ≥0.05) based on subgroups of these 16 SNPs for association with longevity. The most significant haplotype, based on 4 SNPs, remained significant after adjustment for multiple testing (OR = 1.56, P = 2.5×10⁻⁵, multiple-testing-adjusted P = 0.0045). To attempt to replicate these results, we genotyped 3,448 subjects from four independent samples of long-lived individuals and control subjects: 1) the New England Centenarian Study (NECS) (N=738), 2) the Southern Italian Centenarian Study (SICS) (N=905), 3) France (N=1,103), and 4) the Einstein Ashkenazi Longevity Study (N=702). We replicated the association with the most significant haplotype from our initial analysis in the NECS sample (OR = 1.60, P = 0.0023), but not in the other three samples (P > 0.15). In a meta-analysis combining all five samples, the best haplotype remained significantly associated with longevity after adjustment for multiple testing in the initial and follow-up samples (OR = 1.18, P = 7.5×10⁻⁴, multiple-testing-adjusted P = 0.037). These results suggest that LMNA variants may play a role in human lifespan.
PMCID: PMC3350595  PMID: 22340368
longevity gene; human; longevity; genetics
11.  Estimating Hepatic Glucokinase Activity Using a Simple Model of Lactate Kinetics 
Diabetes Care  2012;35(5):1015-1020.
Glucokinase (GCK) acts as a component of the “glucose sensor” in pancreatic β-cells and possibly in other tissues, including the brain. However, >99% of GCK in the body is located in the liver, where it serves as a “gatekeeper”, determining the rate of hepatic glucose phosphorylation. Mutations in GCK are a cause of maturity-onset diabetes of the young (MODY), and GCKR, the regulator of GCK in the liver, is a diabetes susceptibility locus. In addition, several GCK activators are being studied as potential regulators of blood glucose. The ability to estimate liver GCK activity in vivo for genetic and pharmacologic studies may provide important physiologic insights into the regulation of hepatic glucose metabolism.
Here we introduce a simple, linear, two-compartment kinetic model that exploits lactate and glucose kinetics observed during the frequently sampled intravenous glucose tolerance test (FSIGT) to estimate liver GCK activity (K_GK), glycolysis (K_12), and whole-body fractional lactate clearance (K_01).
To test our working model of lactate, we used cross-sectional FSIGT data on 142 nondiabetic individuals chosen at random from the Finland–United States Investigation of NIDDM Genetics study cohort. Parameters K_GK, K_12, and K_01 were precisely estimated. Median model parameter estimates were consistent with previously published values.
This novel model of lactate kinetics extends the utility of the FSIGT protocol beyond whole-body glucose homeostasis by providing estimates for indices pertaining to hepatic glucose metabolism, including hepatic GCK activity and glycolysis rate.
PMCID: PMC3329822  PMID: 22456868
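The model's structure is not spelled out in the abstract. The sketch below simulates one plausible reading, assuming the common compartmental convention that K_ij denotes transfer from compartment j to compartment i, with compartment 2 as a hepatic glycolytic intermediate pool fed by glucose phosphorylation and compartment 1 as plasma lactate. The compartment assignments, units, and forward-Euler integration are all assumptions for illustration, not the published model, and the function name is hypothetical.

```python
def simulate_lactate(glucose, k_gk, k12, k01, dt, x1_0=0.0, x2_0=0.0):
    """Forward-Euler simulation of an assumed linear two-compartment model.

    Assumed structure:
      dX2/dt = k_gk * G(t) - k12 * X2   (hepatic intermediate pool)
      dX1/dt = k12 * X2   - k01 * X1    (plasma lactate, cleared at rate k01)
    glucose: plasma glucose values sampled every dt time units.
    Returns the lactate level X1 at each sample time.
    """
    x1, x2 = x1_0, x2_0
    lactate = []
    for g in glucose:
        lactate.append(x1)
        dx2 = k_gk * g - k12 * x2       # phosphorylation in, glycolysis out
        dx1 = k12 * x2 - k01 * x1       # glycolysis in, whole-body clearance out
        x2 += dt * dx2
        x1 += dt * dx1
    return lactate


if __name__ == "__main__":
    # Hypothetical constant glucose of 5.0 for 2,000 time units.
    traj = simulate_lactate([5.0] * 20000, k_gk=0.02, k12=0.1, k01=0.05, dt=0.1)
    print(traj[-1])
```

Estimating the rate constants would then mean choosing `k_gk`, `k12`, and `k01` so that the simulated lactate curve matches the FSIGT lactate measurements, for example with a least-squares optimizer such as `scipy.optimize.least_squares`.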
12.  FTEC: a coalescent simulator for modeling faster than exponential growth 
Bioinformatics  2012;28(9):1282-1283.
Summary: Recent genetic studies as well as recorded history point to massive growth in human population sizes during the recent past. To model and understand this growth accurately we introduce FTEC, an easy-to-use coalescent simulation program capable of simulating haplotype samples drawn from a population that has undergone faster than exponential growth. Samples drawn from a population that has undergone faster than exponential growth show an excess of very rare variation and more rapid LD decay when compared with samples drawn from a population that has maintained a constant size over time.
Availability: Source code for FTEC is freely available for download from the University of Michigan Center for Statistical Genetics Wiki at
Supplementary information: Supplementary data are available at Bioinformatics online
PMCID: PMC3338023  PMID: 22441586
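FTEC's own algorithm and growth parameterization are not described in this summary. As a generic illustration of coalescent simulation under a changing population size, the sketch below uses the standard time-rescaling idea: with k lineages, coalescence occurs at rate C(k,2)/N(t), so an event happens once the integrated rate exceeds an Exp(1) draw. The step size `dt`, the convention that N(t) counts gene copies, and the example growth history are assumptions; the function names are hypothetical.

```python
import math
import random


def coalescent_times(n, pop_size, rng, dt=0.1, t_max=1e7):
    """Times (generations before present) of the n-1 coalescent events.

    pop_size: callable t -> number of gene copies N(t), t generations ago.
    While k lineages remain, the integrated rate C(k,2)/N(t) is accumulated
    with a midpoint rule until it exceeds an Exp(1) draw.
    """
    times, t, k = [], 0.0, n
    while k > 1:
        target = rng.expovariate(1.0)
        pairs = k * (k - 1) / 2.0
        hazard = 0.0
        while hazard < target:
            hazard += pairs / pop_size(t + dt / 2.0) * dt
            t += dt
            if t > t_max:
                raise RuntimeError("no coalescence before t_max")
        times.append(t)
        k -= 1
    return times


def exponential_growth(n_now, rate, n_ancestral):
    """Hypothetical history: size shrinks backward in time at `rate` per
    generation until it reaches n_ancestral."""
    return lambda t: max(n_ancestral, n_now * math.exp(-rate * t))


if __name__ == "__main__":
    rng = random.Random(1)
    print(coalescent_times(5, exponential_growth(1e4, 0.01, 1e3), rng))
```

Any callable can be passed as `pop_size`, so a history whose growth rate itself increases toward the present (faster than exponential) plugs in the same way. The numerical integration places each event at most `dt` generations late.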
13.  Quantifying and correcting for the winner's curse in quantitative-trait association studies 
Genetic epidemiology  2011;35(3):133-138.
Quantitative traits (QT) are an important focus of human genetic studies both because of interest in the traits themselves, and because of their role as risk factors for many human diseases. For large-scale QT association studies including genome-wide association studies (GWAS), investigators usually focus on genetic loci showing significant evidence for SNP-QT association, and genetic effect size tends to be overestimated as a consequence of the winner’s curse. In this paper, we study the impact of the winner’s curse on QT association studies in which the genetic effect size is parameterized as the slope in a linear regression model. We demonstrate by analytical calculation that the overestimation in the regression slope estimate decreases as power increases. To reduce the ascertainment bias, we propose a three-parameter maximum likelihood method and then simplify this to a one-parameter method by excluding nuisance parameters. We show that both methods reduce the bias when power to detect association is low or moderate, and that the one-parameter model generally results in smaller variance in the estimate.
PMCID: PMC3500533  PMID: 21284035
quantitative trait; winner’s curse; ascertainment bias; genome-wide association study; linear regression; maximum likelihood
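One standard way to correct a selected estimate for the winner's curse is to maximize the likelihood of the slope conditional on the estimate having passed the significance threshold. The sketch below does this for a single SNP, assuming the slope estimate is approximately normal with known standard error and that selection was on a two-sided Z-test at genome-wide α = 5×10⁻⁸. It resembles the one-parameter approach described above but is not claimed to be the authors' exact estimator, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def _norm_sf(x):
    """Upper-tail standard normal probability, accurate far into the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def conditional_loglik(beta, beta_hat, se, z_crit):
    """Log-likelihood of beta given that beta_hat was selected because
    |beta_hat / se| > z_crit (additive constants dropped)."""
    mu = beta / se
    p_selected = _norm_sf(z_crit - mu) + _norm_sf(z_crit + mu)
    return -0.5 * ((beta_hat - beta) / se) ** 2 - math.log(p_selected)


def corrected_slope(beta_hat, se, alpha=5e-8, grid_points=20001):
    """Conditional maximum-likelihood slope, found by grid search."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    span = 2.0 * abs(beta_hat) + 5.0 * se
    best_beta, best_ll = beta_hat, -math.inf
    for i in range(grid_points):
        beta = -span + 2.0 * span * i / (grid_points - 1)
        ll = conditional_loglik(beta, beta_hat, se, z_crit)
        if ll > best_ll:
            best_beta, best_ll = beta, ll
    return best_beta


if __name__ == "__main__":
    # Hypothetical hits: one just past the threshold, one very strong.
    print(corrected_slope(0.12, 0.02), corrected_slope(0.30, 0.02))
```

For an estimate that only just clears the threshold the correction is large, while for very strong signals (high power) it is negligible. This matches the abstract's observation that overestimation shrinks as power increases.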
14.  Effects of 34 Risk Loci for Type 2 Diabetes or Hyperglycemia on Lipoprotein Subclasses and Their Composition in 6,580 Nondiabetic Finnish Men 
Diabetes  2011;60(5):1608-1616.
We investigated the effects of 34 genetic risk variants for hyperglycemia/type 2 diabetes on lipoprotein subclasses and particle composition in a large population-based cohort.
The study included 6,580 nondiabetic Finnish men from the population-based Metabolic Syndrome in Men (METSIM) study (aged 57 ± 7 years; BMI 26.8 ± 3.7 kg/m²). Genotyping of 34 single nucleotide polymorphisms (SNPs) for hyperglycemia/type 2 diabetes was performed. Proton nuclear magnetic resonance spectroscopy was used to measure particle concentrations of 14 lipoprotein subclasses and their composition in native serum samples.
The glucose-increasing allele of rs780094 in GCKR was significantly associated with low concentrations of VLDL particles (independently of their size) and small LDL and was nominally associated with low concentrations of intermediate-density lipoprotein, all LDL subclasses, and high concentrations of very large and large HDL particles. The glucose-increasing allele of rs174550 in FADS1 was significantly associated with high concentrations of very large and large HDL particles and nominally associated with low concentrations of all VLDL particles. SNPs rs10923931 in NOTCH2 and rs757210 in HNF1B genes showed nominal or significant associations with several lipoprotein traits. The genetic risk score of 34 SNPs was not associated with any of the lipoprotein subclasses.
Four of the 34 risk loci for type 2 diabetes or hyperglycemia (GCKR, FADS1, NOTCH2, and HNF1B) were significantly associated with lipoprotein traits. A GCKR variant predominantly affected the concentration of VLDL, and the FADS1 variant affected very large and large HDL particles. Only a limited number of risk loci for hyperglycemia/type 2 diabetes significantly affect lipoprotein metabolism.
PMCID: PMC3292337  PMID: 21421807
15.  Common Variants Show Predicted Polygenic Effects on Height in the Tails of the Distribution, Except in Extremely Short Individuals 
PLoS Genetics  2011;7(12):e1002439.
Common genetic variants have been shown to explain a fraction of the inherited variation for many common diseases and quantitative traits, including height, a classic polygenic trait. The extent to which common variation determines the phenotype of highly heritable traits such as height is uncertain, as is the extent to which common variation is relevant to individuals with more extreme phenotypes. To address these questions, we studied 1,214 individuals from the top and bottom extremes of the height distribution (tallest and shortest ∼1.5%), drawn from ∼78,000 individuals from the HUNT and FINRISK cohorts. We found that common variants still influence height at the extremes of the distribution: common variants (49/141) were nominally associated with height in the expected direction more often than is expected by chance (p<5×10⁻²⁸), and the odds ratios in the extreme samples were consistent with the effects estimated previously in population-based data. To examine more closely whether the common variants have the expected effects, we calculated a weighted allele score (WAS), which is a weighted prediction of height for each individual based on the previously estimated effect sizes of the common variants in the overall population. The average WAS was consistent with expectation in the tall individuals, but was not as extreme as expected in the shortest individuals (p<0.006), indicating that some of the short stature is explained by factors other than common genetic variation. The discrepancy was more pronounced (p<10⁻⁶) in the most extreme individuals (height<0.25 percentile). The results at the extreme short tails are consistent with a large number of models incorporating either rare genetic non-additive or rare non-genetic factors that decrease height. We conclude that common genetic variants are associated with height at the extremes as well as across the population, but that additional factors become more prominent at the shorter extreme.
Author Summary
Although there are many loci in the human genome that have been discovered to be significantly associated with height, it is unclear whether these loci have similar effects in extremely tall and short individuals. Here, we examine hundreds of extremely tall and short individuals in two population-based cohorts to see whether these known height-determining loci are as predictive as expected in these individuals. We found that these loci are generally as predictive of height as expected in these individuals but that they begin to be less predictive in the most extremely short individuals. We showed that this result is consistent with models that not only include the common variants but also multiple low frequency genetic variants that substantially decrease height. However, this result is also consistent with non-additive genetic effects or rare non-genetic factors that substantially decrease height. This finding suggests the possibility of a major role of low frequency variants, particularly in individuals with extreme phenotypes, and has implications for whole-genome or whole-exome sequencing efforts to discover rare genetic variation associated with complex traits.
PMCID: PMC3248463  PMID: 22242009
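The weighted allele score described above is a dot product between each person's allele dosages and the previously published per-allele effect sizes. A minimal sketch follows, assuming dosages are coded as counts of the height-increasing allele (0 to 2) and effect sizes are in consistent units. The function names are hypothetical.

```python
from statistics import mean


def weighted_allele_score(dosages, effect_sizes):
    """Sum of height-increasing allele dosages, each weighted by its
    previously estimated per-allele effect."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("one effect size per variant is required")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


def mean_was(group_dosages, effect_sizes):
    """Average WAS over a group, e.g. the tallest or shortest ~1.5%."""
    return mean(weighted_allele_score(d, effect_sizes) for d in group_dosages)
```

Comparing `mean_was` for the short and tall extremes with the value predicted from population-based effect sizes is the kind of contrast the abstract reports. How the expected value was derived is not given here.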
16.  Global epigenomic analysis of primary human pancreatic islets provides insights into type 2 diabetes susceptibility loci 
Cell metabolism  2010;12(5):443-455.
Identifying cis-regulatory elements is important to understand how human pancreatic islets modulate gene expression in physiologic or pathophysiologic (e.g., diabetic) conditions. We conducted genome-wide analysis of DNase I hypersensitive sites, histone H3 lysine methylation modifications (K4me1, K4me3, K79me2), and CCCTC-binding factor (CTCF) binding in human islets. This identified ~18,000 putative promoters (several hundred unannotated and islet-active). Surprisingly, active promoter modifications were absent at genes encoding islet-specific hormones, suggesting a distinct regulatory mechanism. Of 34,039 distal (non-promoter) regulatory elements, 47% are islet-unique and 22% are CTCF-bound. In the 18 type 2 diabetes (T2D)-associated loci, we identified 118 putative regulatory elements and confirmed enhancer activity for 12/33 tested. Among 6 regulatory elements harboring T2D-associated variants, 2 exhibit significant allele-specific differences in activity. These findings present a global snapshot of the human islet epigenome and should provide functional context for non-coding variants emerging from genetic studies of T2D and other islet disorders.
PMCID: PMC3026436  PMID: 21035756
17.  Fine Mapping of Five Loci Associated with Low-Density Lipoprotein Cholesterol Detects Variants That Double the Explained Heritability 
PLoS Genetics  2011;7(7):e1002198.
Complex trait genome-wide association studies (GWAS) provide an efficient strategy for evaluating large numbers of common variants in large numbers of individuals and for identifying trait-associated variants. Nevertheless, GWAS often leave much of the trait heritability unexplained. We hypothesized that some of this unexplained heritability might be due to common and rare variants that reside in GWAS-identified loci but lack appropriate proxies in modern genotyping arrays. To assess this hypothesis, we re-examined 7 genes (APOE, APOC1, APOC2, SORT1, LDLR, APOB, and PCSK9) in 5 loci associated with low-density lipoprotein cholesterol (LDL-C) in multiple GWAS. For each gene, we first catalogued genetic variation by re-sequencing 256 Sardinian individuals with extreme LDL-C values. Next, we genotyped variants identified by us and by the 1000 Genomes Project (totaling 3,277 SNPs) in 5,524 volunteers. We found that in one locus (PCSK9) the GWAS signal could be explained by a previously described low-frequency variant and that in three loci (PCSK9, APOE, and LDLR) there were additional variants independently associated with LDL-C, including a novel and rare LDLR variant that seems specific to Sardinians. Overall, this more detailed assessment of SNP variation in these loci increased estimates of the heritability of LDL-C accounted for by these genes from 3.1% to 6.5%. All association signals and the heritability estimates were successfully confirmed in a sample of ∼10,000 Finnish and Norwegian individuals. Our results thus suggest that focusing on variants accessible via GWAS can lead to clear underestimates of the trait heritability explained by a set of loci. Further, our results suggest that, as a prelude to large-scale sequencing efforts, targeted re-sequencing efforts paired with large-scale genotyping will increase estimates of complex trait heritability explained by known loci.
Author Summary
Despite the striking success of genome-wide association studies in identifying genetic loci associated with common complex traits and diseases, much of the heritable risk for these traits and diseases remains unexplained. Higher-resolution investigation of the genome through sequencing studies is expected to clarify the sources of this missing heritability. As a preview of what we might learn in these more detailed assessments of genetic variation, we used sequencing to identify potentially interesting variants in seven genes associated with low-density lipoprotein cholesterol (LDL-C) in 256 Sardinian individuals with extreme LDL-C levels, followed by large-scale genotyping in 5,524 individuals, to examine newly discovered and previously described variants. We found that a combination of common and rare variants in these loci contributes to variation in LDL-C levels, and that the estimate of the heritability explained by these loci doubled relative to the initial estimate. Importantly, our results include a Sardinian-specific rare variant, highlighting the need for sequencing studies in isolated populations. Our results provide insights into what extensive whole-genome sequencing efforts are likely to reveal about the genetic architecture of complex traits.
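To make "heritability explained by a set of loci" concrete, the sketch below sums the additive variance 2p(1 − p)β² that each biallelic variant contributes under Hardy-Weinberg equilibrium and divides by the trait variance. It assumes independent variants (no linkage disequilibrium between them) and per-allele effects β in trait units; the function names and example numbers are hypothetical, not values from the study.

```python
def variance_explained(freqs, betas, trait_var):
    """Fraction of trait variance explained by independent additive variants.

    A biallelic variant with allele frequency p and per-allele effect beta
    contributes 2 * p * (1 - p) * beta**2 under Hardy-Weinberg equilibrium.
    """
    if len(freqs) != len(betas):
        raise ValueError("freqs and betas must have the same length")
    additive_var = sum(2 * p * (1 - p) * b ** 2 for p, b in zip(freqs, betas))
    return additive_var / trait_var


def share_of_heritability(freqs, betas, trait_var, h2):
    """Proportion of the total trait heritability h2 accounted for by the variants."""
    return variance_explained(freqs, betas, trait_var) / h2


# Hypothetical loci: one common variant, then the same variant plus a rare,
# large-effect variant of the kind a GWAS array might lack a proxy for.
common_only = variance_explained([0.3], [0.1], trait_var=1.0)
with_rare = variance_explained([0.3, 0.005], [0.1, 1.0], trait_var=1.0)
```

In this toy case the rare variant adds more variance than the common one despite its low frequency, which illustrates how re-sequencing can raise the explained fraction for loci already found by GWAS.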
PMCID: PMC3145627  PMID: 21829380
18.  Detailed Physiologic Characterization Reveals Diverse Mechanisms for Novel Genetic Loci Regulating Glucose and Insulin Metabolism in Humans 
Diabetes  2010;59(5):1266-1275.
Recent genome-wide association studies have revealed loci associated with glucose and insulin-related traits. We aimed to characterize 19 such loci using detailed measures of insulin processing, secretion, and sensitivity to help elucidate their roles in the regulation of glucose control, insulin secretion, and/or insulin action.
We investigated associations of loci identified by the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC) with circulating proinsulin, measures of insulin secretion and sensitivity from oral glucose tolerance tests (OGTTs), euglycemic clamps, insulin suppression tests, or frequently sampled intravenous glucose tolerance tests in nondiabetic humans (n = 29,084).
The glucose-raising allele in MADD was associated with abnormal insulin processing (a strong association with higher proinsulin levels but no association with insulinogenic index) at an extremely high level of statistical significance (P = 2.1 × 10⁻⁷¹). Defects in insulin processing and insulin secretion were seen in glucose-raising allele carriers at TCF7L2, SLC30A8, GIPR, and C2CD4B. Abnormalities in early insulin secretion were suggested in glucose-raising allele carriers at MTNR1B, GCK, FADS1, DGKB, and PROX1 (lower insulinogenic index; no association with proinsulin or insulin sensitivity). Two loci previously associated with fasting insulin (GCKR and IGF1) were associated with OGTT-derived insulin sensitivity indices in a consistent direction.
Genetic loci identified through their effect on hyperglycemia and/or hyperinsulinemia demonstrate considerable heterogeneity in associations with measures of insulin processing, secretion, and sensitivity. Our findings emphasize the importance of detailed physiological characterization of such loci for improved understanding of pathways associated with alterations in glucose homeostasis and eventually type 2 diabetes.
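The insulinogenic index used above to flag early insulin secretion defects is commonly computed from the 0- and 30-minute OGTT samples as the rise in insulin divided by the rise in glucose. The sketch below assumes that definition (variants such as dividing by the 30-minute glucose alone are also in use), so it is not necessarily the exact index computed in these cohorts; the function name and example values are hypothetical.

```python
def insulinogenic_index(insulin_0, insulin_30, glucose_0, glucose_30):
    """Early insulin response: (I30 - I0) / (G30 - G0).

    Assumes fasting (0 min) and 30-minute samples from an oral glucose
    tolerance test, with insulin and glucose each in consistent units.
    Raises ValueError when glucose does not rise, because the ratio is then
    undefined or not physiologically interpretable.
    """
    delta_glucose = glucose_30 - glucose_0
    if delta_glucose <= 0:
        raise ValueError("30-minute glucose must exceed fasting glucose")
    return (insulin_30 - insulin_0) / delta_glucose


# Hypothetical subject: insulin in pmol/L, glucose in mmol/L.
index = insulinogenic_index(40.0, 340.0, 5.0, 8.0)
```

A lower value for carriers of a glucose-raising allele, with unchanged proinsulin and insulin sensitivity, is the pattern the abstract reads as impaired early secretion.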
PMCID: PMC2857908  PMID: 20185807
19.  Genome-wide association studies in diverse populations 
Nature reviews. Genetics  2010;11(5):356-366.
Genome-wide association (GWA) studies have identified a large number of single-nucleotide polymorphisms (SNPs) associated with disease phenotypes. As most GWA studies have been performed in populations of European descent, this review examines the issues involved in extending GWA studies to diverse worldwide populations. Although challenges exist with such issues as imputation, admixture, and replication, investigation of diverse populations in GWA studies has significant potential to advance the project of mapping the genetic determinants of complex diseases for the human population as a whole.
PMCID: PMC3079573  PMID: 20395969
20.  Transferability of Type 2 Diabetes Implicated Loci in Multi-Ethnic Cohorts from Southeast Asia 
PLoS Genetics  2011;7(4):e1001363.
Recent large genome-wide association studies (GWAS) have identified multiple loci that harbor genetic variants associated with type 2 diabetes mellitus (T2D), many of which encode proteins not previously suspected to be involved in the pathogenesis of T2D. Most GWAS for T2D have focused on populations of European descent, and GWAS conducted in other populations with different ancestry offer a unique opportunity to study the genetic architecture of T2D. We performed genome-wide association scans for T2D in 3,955 Chinese (2,010 cases, 1,945 controls), 2,034 Malays (794 cases, 1,240 controls), and 2,146 Asian Indians (977 cases, 1,169 controls). In addition to the search for novel variants implicated in T2D, these multi-ethnic cohorts serve to assess the transferability and relevance of the previous findings from European descent populations in the three major ethnic populations of Asia, comprising half of the world's population. Of the SNPs associated with T2D in previous GWAS, only variants at CDKAL1 and HHEX/IDE/KIF11 were strongly associated with T2D in the meta-analysis including all three ethnic groups. However, consistent direction of effect was observed for many of the other SNPs in our study and in those carried out in European populations. Close examination of the associations at both the CDKAL1 and HHEX/IDE/KIF11 loci provided some evidence of locus and allelic heterogeneity in relation to the associations with T2D. We also detected variation in linkage disequilibrium between populations for most of these previously identified loci. These factors, combined with limited statistical power, may contribute to the failure to detect associations across populations of diverse ethnicity. These findings highlight the value of surveying diverse racial/ethnic groups in fine-mapping efforts for causal variants and in the search for variants that may be population-specific.
Author Summary
Type 2 diabetes mellitus (T2D) is a chronic disease that can lead to complications such as heart disease, stroke, hypertension, blindness due to diabetic retinopathy, amputations from peripheral vascular diseases, and kidney disease from diabetic nephropathy. The increasing prevalence and complications of T2D are likely to increase the health and economic burden on individuals, families, health systems, and countries. Our study carried out in three major Asian ethnic groups (Chinese, Malays, and Indians) in Singapore suggests that the findings of studies carried out in populations of European ancestry (which represent most studies to date) may be relevant to populations in Asia. However, our study also raises the possibility that different genes, and within the genes different variants, may confer susceptibility to T2D in these populations. These findings are particularly relevant in Asia, where the greatest growth of T2D is expected in the coming years, and emphasize the importance of studying diverse populations when trying to localize the regions of the genome associated with T2D. In addition, we may need to consider novel methods for combining data across populations.
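The combined analysis across the Chinese, Malay, and Asian Indian cohorts is a meta-analysis. A common way to pool per-cohort log odds ratios is the fixed-effect inverse-variance method, sketched below together with Cochran's Q as a simple heterogeneity statistic. The abstract does not say which pooling model was used, so treat this as one plausible choice; the inputs in the usage line are hypothetical.

```python
import math


def fixed_effect_meta(log_ors, std_errors):
    """Inverse-variance fixed-effect meta-analysis of per-study log odds ratios.

    Returns (pooled log OR, pooled SE, z, two-sided P, Cochran's Q).
    """
    if not log_ors or len(log_ors) != len(std_errors):
        raise ValueError("need matching, non-empty lists")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    z = pooled / pooled_se
    # Two-sided normal P value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations of each study from the pooled value.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    return pooled, pooled_se, z, p_value, q


# Hypothetical per-population estimates (log OR, SE) for one SNP.
result = fixed_effect_meta([0.20, 0.15, 0.25], [0.06, 0.09, 0.08])
```

A Q that is large relative to a chi-square distribution with k − 1 degrees of freedom would point toward the kind of between-population heterogeneity (differing LD, locus or allelic heterogeneity) discussed above, in which case a random-effects model may be more appropriate.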
PMCID: PMC3072366  PMID: 21490949
Genetic epidemiology  2010;34(7):739-746.
Meta-analysis has become a key component of well-designed genetic association studies due to the boost in statistical power achieved by combining results across multiple samples of individuals and the need to validate observed associations in independent studies. Meta-analyses of genetic association studies based on multiple SNPs and traits are subject to the same multiple testing issues as single-sample studies, but it is often difficult to adjust accurately for the multiple tests. Procedures such as Bonferroni may control the type I error rate but will generally provide an overly harsh correction if SNPs or traits are correlated. Depending on study design, availability of individual-level data, and computational requirements, permutation testing may not be feasible in a meta-analysis framework. In this paper we present methods for adjusting for multiple correlated tests under several study designs commonly employed in meta-analyses of genetic association tests. Our methods are applicable to both prospective meta-analyses in which several samples of individuals are analyzed with the intent to combine results, and retrospective meta-analyses, in which results from published studies are combined, including situations in which 1) individual-level data are unavailable, and 2) different sets of SNPs are genotyped in different studies due to random missingness or two-stage design. We show through simulation that our methods accurately control the rate of type I error and achieve improved power over multiple testing adjustments that do not account for correlation between SNPs or traits.
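To illustrate why Bonferroni is overly harsh when SNPs are correlated, the sketch below compares the raw number of tests with an effective number of independent tests in the style of Nyholt (2004), which shrinks the count according to the spread of the eigenvalues of the SNP correlation matrix. This is a different and simpler technique than the methods the authors propose; it requires NumPy, and the correlation matrix is a toy input.

```python
import numpy as np


def effective_number_of_tests(corr):
    """Nyholt-style effective number of independent tests.

    corr is an M x M correlation matrix (e.g. pairwise SNP genotype
    correlations). Meff = 1 + (M - 1) * (1 - Var(eigenvalues) / M), using
    the sample variance (ddof=1) of the eigenvalues.
    """
    corr = np.asarray(corr, dtype=float)
    m = corr.shape[0]
    if m == 1:
        return 1.0
    eigvals = np.linalg.eigvalsh(corr)  # symmetric matrix -> real eigenvalues
    return 1.0 + (m - 1) * (1.0 - np.var(eigvals, ddof=1) / m)


def per_test_alpha(alpha, n_tests):
    """Bonferroni-style per-test threshold for n_tests tests."""
    return alpha / n_tests


# Four SNPs forming two tightly linked pairs, uncorrelated across pairs.
r = np.array([
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
])
m_eff = effective_number_of_tests(r)
bonferroni = per_test_alpha(0.05, r.shape[0])
adjusted = per_test_alpha(0.05, m_eff)
```

Independent SNPs leave the count unchanged, while perfectly correlated SNPs collapse to a single effective test; the toy matrix falls in between, so its per-test threshold is less stringent than plain Bonferroni.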
PMCID: PMC3070606  PMID: 20878715
meta-analysis; association study; multiple testing; SNPs
22.  A Variance-Component Framework for Pedigree Analysis of Continuous and Categorical Outcomes 
Statistics in biosciences  2009;1(2):181-198.
Variance-component methods are popular and flexible analytic tools for elucidating the genetic mechanisms of complex quantitative traits from pedigree data. However, variance-component methods typically assume that the trait of interest follows a multivariate normal distribution within a pedigree. Studies have shown that violation of this normality assumption can lead to biased parameter estimates and inflated type I error. This limits the application of variance-component methods to more general trait outcomes, whether continuous or categorical in nature. In this paper, we develop and apply a general variance-component framework for pedigree analysis of continuous and categorical outcomes. We develop appropriate models using generalized linear mixed model theory and fit such models using approximate maximum-likelihood procedures. Using our proposed method, we demonstrate that one can perform variance-component pedigree analysis on outcomes that follow any exponential-family distribution. We also show how one can modify the method to perform pedigree analysis of ordinal outcomes, and we discuss extensions of our variance-component framework to accommodate pedigrees ascertained based on trait outcome. We demonstrate the feasibility of our method using both simulated data and data from a genetic study of ovarian insufficiency.
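The standard Gaussian variance-component model that this framework generalizes writes the covariance of a pedigree's trait values as Σ = 2Φσ²_a + Iσ²_e, where Φ is the kinship matrix, σ²_a the additive genetic variance, and σ²_e the residual variance. The sketch below builds Φ for a hypothetical nuclear family and evaluates the multivariate normal log-likelihood with NumPy. It shows only the baseline model whose normality assumption the paper relaxes, not the authors' GLMM-based method; function names are hypothetical.

```python
import numpy as np


def nuclear_family_kinship(n_children):
    """Kinship matrix for two unrelated, non-inbred parents and their children.

    Order: father, mother, then children. Self-kinship is 0.5; parent-child
    and full-sibling kinship are 0.25; the two parents are unrelated (0).
    """
    n = 2 + n_children
    phi = np.full((n, n), 0.25)
    phi[0, 1] = phi[1, 0] = 0.0
    np.fill_diagonal(phi, 0.5)
    return phi


def vc_loglik(y, mu, sigma2_a, sigma2_e, phi):
    """Multivariate normal log-likelihood with covariance 2*phi*sigma2_a + I*sigma2_e."""
    y = np.asarray(y, dtype=float)
    n = y.size
    sigma = 2.0 * phi * sigma2_a + np.eye(n) * sigma2_e
    resid = y - mu
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        raise ValueError("covariance matrix is not positive definite")
    quad = resid @ np.linalg.solve(sigma, resid)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)


phi = nuclear_family_kinship(2)
ll = vc_loglik([1.2, 0.4, 1.0, 0.9], mu=0.5, sigma2_a=0.6, sigma2_e=0.4, phi=phi)
```

Maximizing this likelihood over μ, σ²_a, and σ²_e gives the usual heritability estimate σ²_a / (σ²_a + σ²_e); a non-normal trait breaks the Gaussian density used here, which is the gap a GLMM formulation is meant to fill.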
PMCID: PMC2860148  PMID: 20436936
Variance component model; linkage analysis; generalized linear mixed model
23.  Association of 18 Confirmed Susceptibility Loci for Type 2 Diabetes With Indices of Insulin Release, Proinsulin Conversion, and Insulin Sensitivity in 5,327 Nondiabetic Finnish Men 
Diabetes  2009;58(9):2129-2136.
We investigated the effects of 18 confirmed type 2 diabetes risk single nucleotide polymorphisms (SNPs) on insulin sensitivity, insulin secretion, and conversion of proinsulin to insulin.
A total of 5,327 nondiabetic men (age 58 ± 7 years, BMI 27.0 ± 3.8 kg/m²) from a large population-based cohort were included. Oral glucose tolerance tests and genotyping of SNPs in or near PPARG, KCNJ11, TCF7L2, SLC30A8, HHEX, LOC387761, CDKN2B, IGF2BP2, CDKAL1, HNF1B, WFS1, JAZF1, CDC123, TSPAN8, THADA, ADAMTS9, NOTCH2, KCNQ1, and MTNR1B were performed. HNF1B rs757210 was excluded because its genotype distribution deviated from Hardy-Weinberg equilibrium.
Six SNPs (TCF7L2, SLC30A8, HHEX, CDKN2B, CDKAL1, and MTNR1B) were significantly (P < 6.9 × 10⁻⁴) and two SNPs (KCNJ11 and IGF2BP2) were nominally (P < 0.05) associated with early-phase insulin release (InsAUC(0–30)/GluAUC(0–30)), adjusted for age, BMI, and insulin sensitivity (Matsuda ISI). The combined effect of these eight SNPs was a 32% reduction in InsAUC(0–30)/GluAUC(0–30) in carriers of ≥11 vs. ≤3 weighted risk alleles. Four SNPs (SLC30A8, HHEX, CDKAL1, and TCF7L2) were significantly or nominally associated with indices of proinsulin conversion. Three SNPs (KCNJ11, HHEX, and TSPAN8) were nominally associated with Matsuda ISI (adjusted for age and BMI). The effect of HHEX on Matsuda ISI became significant after additional adjustment for InsAUC(0–30)/GluAUC(0–30). Nine SNPs did not show any associations with the examined traits.
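Both the early insulin-release index InsAUC(0–30)/GluAUC(0–30) and the Matsuda ISI are computed from OGTT samples. The sketch below uses the trapezoidal rule for the 0–30 minute areas and the widely used Matsuda formula 10,000 / sqrt(G0 · I0 · Gmean · Imean), with glucose in mg/dL and insulin in μU/mL. The sampling times and values are hypothetical, and the study's exact sampling scheme is not stated in this abstract.

```python
import math


def trapezoid_auc(times, values):
    """Area under a sampled curve by the trapezoidal rule."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching time points")
    return sum(
        (t1 - t0) * (v0 + v1) / 2.0
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    )


def early_insulin_release(times, insulin, glucose, end=30):
    """InsAUC(0-end) / GluAUC(0-end) using samples taken at or before `end` minutes."""
    idx = [i for i, t in enumerate(times) if t <= end]
    t_sub = [times[i] for i in idx]
    ins_auc = trapezoid_auc(t_sub, [insulin[i] for i in idx])
    glu_auc = trapezoid_auc(t_sub, [glucose[i] for i in idx])
    return ins_auc / glu_auc


def matsuda_isi(insulin, glucose):
    """Matsuda insulin sensitivity index from OGTT samples.

    The first sample is taken as fasting; the means are over all samples.
    Glucose in mg/dL, insulin in microU/mL.
    """
    g0, i0 = glucose[0], insulin[0]
    g_mean = sum(glucose) / len(glucose)
    i_mean = sum(insulin) / len(insulin)
    return 10000.0 / math.sqrt(g0 * i0 * g_mean * i_mean)


# Hypothetical OGTT with samples at 0, 30, and 120 minutes.
times = [0, 30, 120]
insulin = [10.0, 70.0, 40.0]
glucose = [100.0, 160.0, 120.0]
release = early_insulin_release(times, insulin, glucose)
isi = matsuda_isi(insulin, glucose)
```

Adjusting the release index for Matsuda ISI, as the abstract does, separates reduced secretion from the compensatory secretion expected in insulin-resistant subjects.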
Eight type 2 diabetes–related loci were significantly or nominally associated with impaired early-phase insulin release. Effects of SLC30A8, HHEX, CDKAL1, and TCF7L2 on insulin release could be partially explained by impaired proinsulin conversion. HHEX might influence both insulin release and insulin sensitivity.
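Contrasting carriers of ≥11 vs. ≤3 weighted risk alleles implies a weighted genetic risk score. One common construction, assumed here because the weighting is not described in this abstract, multiplies each risk-allele count (0, 1, or 2) by the SNP's log odds ratio and rescales the sum so the score stays on the scale of an unweighted allele count. Function names and weights are hypothetical.

```python
def weighted_risk_alleles(allele_counts, log_ors):
    """Weighted risk-allele count rescaled to the unweighted count scale.

    score = sum(w_i * x_i) * n / sum(w_i), where x_i is the number of risk
    alleles (0-2) at SNP i and w_i its log odds ratio. With equal weights
    the score equals the plain risk-allele count.
    """
    if len(allele_counts) != len(log_ors):
        raise ValueError("one weight per SNP is required")
    if any(x not in (0, 1, 2) for x in allele_counts):
        raise ValueError("allele counts must be 0, 1, or 2")
    total_w = sum(log_ors)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive value")
    raw = sum(w * x for w, x in zip(log_ors, allele_counts))
    return raw * len(log_ors) / total_w


def score_group(score, low=3, high=11):
    """Bin a score into the extreme groups contrasted in the abstract."""
    if score <= low:
        return "low"
    if score >= high:
        return "high"
    return "middle"


# Hypothetical genotypes at eight SNPs with hypothetical log-OR weights.
counts = [2, 1, 2, 2, 1, 2, 1, 2]
weights = [0.30, 0.10, 0.12, 0.15, 0.10, 0.12, 0.08, 0.09]
group = score_group(weighted_risk_alleles(counts, weights))
```

Rescaling keeps the ≥11 and ≤3 cutoffs interpretable as allele counts, while SNPs with larger effects move the score more.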
PMCID: PMC2731523  PMID: 19502414
24.  Genotype-Based Matching to Correct for Population Stratification in Large-Scale Case-Control Genetic Association Studies 
Genetic epidemiology  2009;33(6):508-517.
Genome-wide association studies are helping to dissect the etiology of complex diseases. Although case-control association tests are generally more powerful than family-based association tests, population stratification can lead to spurious disease-marker association or mask a true association. Several methods have been proposed to match cases and controls prior to genotyping, using family information or epidemiological data, or using genotype data for a modest number of genetic markers. Here, we describe a genetic similarity score matching (GSM) method for efficient matched analysis of cases and controls in a genome-wide or large-scale candidate gene association study. GSM comprises three steps: 1) calculating similarity scores for pairs of individuals using the genotype data; 2) matching sets of cases and controls based on the similarity scores so that matched cases and controls have similar genetic background; and 3) using conditional logistic regression to perform association tests. Through computer simulation we show that GSM correctly controls false positive rates and improves power to detect true disease-predisposing variants. We compare GSM to genomic control using computer simulations, and find improved power using GSM. We suggest that initial matching of cases and controls prior to genotyping combined with careful re-matching after genotyping is a method of choice for genome-wide association studies.
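Steps 1 and 2 of GSM can be sketched as follows. As an assumption, the similarity score here is plain identity-by-state (IBS) allele sharing averaged over markers, and matching is greedy 1:1 without replacement; the published GSM score and matching procedure may differ, and step 3 (conditional logistic regression on the matched sets) is left to a statistics package. Function names and genotypes are hypothetical.

```python
def ibs_similarity(g1, g2):
    """Mean identity-by-state sharing between two genotype vectors.

    Genotypes are coded 0, 1, 2 (copies of the minor allele); None marks a
    missing call. Per marker, 2 - |a - b| alleles are shared, scaled to
    [0, 1] by dividing by 2.
    """
    shared, n = 0.0, 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += (2 - abs(a - b)) / 2.0
        n += 1
    if n == 0:
        raise ValueError("no markers genotyped in both individuals")
    return shared / n


def greedy_match(cases, controls):
    """Match each case to its most similar unused control (1:1).

    cases and controls map IDs to genotype vectors. Cases are processed in
    sorted ID order; ties are broken by the larger control ID.
    Returns a dict of case ID -> control ID.
    """
    available = dict(controls)
    matches = {}
    for case_id in sorted(cases):
        if not available:
            break
        best = max(
            available,
            key=lambda c: (ibs_similarity(cases[case_id], available[c]), c),
        )
        matches[case_id] = best
        del available[best]
    return matches


cases = {"case1": [0, 0, 2, 2], "case2": [2, 2, 0, 0]}
controls = {"ctl1": [2, 2, 0, 1], "ctl2": [0, 0, 2, 1]}
pairs = greedy_match(cases, controls)
```

Greedy matching is simple but order-dependent; an optimal assignment (for example, maximizing total similarity over all pairs) avoids early cases taking controls that later cases need.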
PMCID: PMC2732762  PMID: 19170134
population stratification; genome-wide association; genetic similarity
25.  Underlying Genetic Models of Inheritance in Established Type 2 Diabetes Associations 
American Journal of Epidemiology  2009;170(5):537-545.
For most associations of common single nucleotide polymorphisms (SNPs) with common diseases, the genetic model of inheritance is unknown. The authors extended and applied a Bayesian meta-analysis approach to data from 19 studies on 17 replicated associations with type 2 diabetes. For 13 SNPs, the data fit an additive model of inheritance for the diabetes risk allele very well; for 4 SNPs, the data were consistent with either an additive or a dominant model; and for 2 SNPs, the data were consistent with either an additive or a recessive model. Results were robust to the use of different priors and after exclusion of data for which index SNPs had been examined indirectly through proxy markers. The Bayesian meta-analysis model yielded point estimates for the genetic effects that were very similar to those previously reported based on fixed- or random-effects models, but uncertainty about several of the effects was substantially larger. The authors also examined the extent of between-study heterogeneity in the genetic model and found generally small between-study deviation values for the genetic model parameter. Heterosis could not be excluded for 4 SNPs. Information on the genetic model of robustly replicated association signals derived from genome-wide association studies may be useful for predictive modeling and for designing biologic and functional experiments.
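In the genetic-model-free meta-analysis approach that analyses of this kind typically build on, the model is summarized by a parameter λ = log OR(heterozygote) / log OR(risk homozygote): λ = 0 corresponds to a recessive model, 0.5 to an additive model on the log-odds scale, 1 to a dominant model, and values outside [0, 1] to heterosis. The sketch below assumes that parameterization (the abstract does not spell it out), computes λ from two odds ratios, and names the nearest classical model; the tolerance is arbitrary and the function names are hypothetical.

```python
import math


def genetic_model_lambda(or_het, or_hom):
    """lambda = log(OR_het) / log(OR_hom), both relative to non-risk homozygotes."""
    if or_het <= 0 or or_hom <= 0:
        raise ValueError("odds ratios must be positive")
    if math.isclose(or_hom, 1.0):
        raise ValueError("lambda is undefined when OR_hom equals 1")
    return math.log(or_het) / math.log(or_hom)


def classify_model(lam, tol=0.1):
    """Name the classical model closest to lambda (tol is an arbitrary cutoff)."""
    if lam < -tol or lam > 1 + tol:
        return "heterosis"
    for name, target in (("recessive", 0.0), ("additive", 0.5), ("dominant", 1.0)):
        if abs(lam - target) <= tol:
            return name
    return "intermediate"


# Hypothetical odds ratios: heterozygote 1.2, risk homozygote 1.44.
lam = genetic_model_lambda(1.2, 1.44)
model = classify_model(lam)
```

In the Bayesian version, λ gets a prior and a posterior rather than a point value, so "consistent with either an additive or a dominant model" corresponds to posterior mass spread between 0.5 and 1.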
PMCID: PMC2732984  PMID: 19602701
Bayes theorem; diabetes mellitus, type 2; meta-analysis; models, genetic; polymorphism, genetic; population characteristics
