The question of which statistical approach is the most effective for investigating gene-environment (G-E) interactions in the context of genome-wide association studies (GWAS) remains unresolved. By using 2 case-control GWAS (the Nurses’ Health Study, 1976–2006, and the Health Professionals Follow-up Study, 1986–2006) of type 2 diabetes, the authors compared 5 tests for interactions: standard logistic regression-based case-control; case-only; semiparametric maximum-likelihood estimation of an empirical-Bayes shrinkage estimator; and 2-stage tests. The authors also compared 2 joint tests of genetic main effects and G-E interaction. Elevated body mass index was the exposure of interest and was modeled as a binary trait to avoid an inflated type I error rate that the authors observed when the main effect of continuous body mass index was misspecified. Although both the case-only and the semiparametric maximum-likelihood estimation approaches assume that the tested markers are independent of exposure in the general population, the authors did not observe any evidence of inflated type I error for these tests in their studies with 2,199 cases and 3,044 controls. Both joint tests detected markers with known marginal effects. Loci with the most significant G-E interactions using the standard, empirical-Bayes, and 2-stage tests were strongly correlated with the exposure among controls. Study findings suggest that methods exploiting G-E independence can be efficient and valid options for investigating G-E interactions in GWAS.
case-control studies; case study; diabetes mellitus, type 2; epidemiologic methods; genome-wide association study; genotype-environment interaction
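The case-only test mentioned above can be illustrated with a small sketch. It assumes a binary genotype (carrier vs. non-carrier), a binary exposure, and G-E independence in the source population, in which case the G-E odds ratio among cases estimates the multiplicative interaction. The function name and the counts are hypothetical and are not taken from the study.

```python
import math


def case_only_interaction(n_g1_e1, n_g1_e0, n_g0_e1, n_g0_e0):
    """Return (odds ratio, Wald z) for the G-E association among cases only.

    Arguments are counts of cases cross-classified by genotype (g1 = carrier,
    g0 = non-carrier) and exposure (e1 = exposed, e0 = unexposed).
    """
    odds_ratio = (n_g1_e1 * n_g0_e0) / (n_g1_e0 * n_g0_e1)
    # Standard error of the log odds ratio for a 2x2 table (Woolf's formula).
    se_log_or = math.sqrt(1 / n_g1_e1 + 1 / n_g1_e0 + 1 / n_g0_e1 + 1 / n_g0_e0)
    z = math.log(odds_ratio) / se_log_or
    return odds_ratio, z


# Hypothetical counts, for illustration only.
odds_ratio, z = case_only_interaction(40, 60, 100, 200)
print(f"case-only OR = {odds_ratio:.2f}, z = {z:.2f}")
```

Because controls are not used, the estimate is only valid when genotype and exposure are independent in the population, which is the assumption the abstract highlights.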
The primary circulating form of vitamin D is 25-hydroxy-vitamin D (25(OH)D), a modifiable trait linked with a growing number of chronic diseases. In addition to environmental determinants of 25(OH)D, including dietary sources and skin ultraviolet B (UVB) exposure, twin and family-based studies suggest that genetics contribute substantially to vitamin D variability, with heritability estimates ranging from 43% to 80%. Genome-wide association studies (GWAS) have identified SNPs located in four gene regions associated with 25(OH)D. These SNPs collectively explain only a fraction of the heritability in 25(OH)D estimated by twin and family-based studies. Using 25(OH)D concentrations and GWAS data on 5,575 subjects drawn from 5 cohorts, we hypothesized that genome-wide data, in the form of (1) a polygenic score comprised of hundreds or thousands of SNPs that do not individually reach GWAS significance, or (2) a linear-mixed-model for genome-wide complex trait analysis, would explain variance in measured circulating 25(OH)D beyond that explained by known genome-wide significant 25(OH)D-associated SNPs. GWAS-identified SNPs explained 5.2% of the variation in circulating 25(OH)D in these samples, and there was little evidence that additional markers significantly improved predictive ability. On average, a polygenic score comprised of GWAS-identified SNPs explained a larger proportion of variation in circulating 25(OH)D than scores comprised of thousands of SNPs that were, on average, non-significant. Employing a linear-mixed-model for genome-wide complex trait analysis explained little additional variability (range 0-22%). The absence of a significant polygenic effect in this relatively large sample suggests an oligogenetic architecture for 25(OH)D.
vitamin D; heritability; genome wide association; polygenic score
Fetuin-A interferes with insulin action in animal studies, but data on fetuin-A and diabetes risk in humans are sparse and the role of nonalcoholic fatty liver disease in this association is unknown. From 2000 to 2006, we prospectively identified 470 matched incident diabetes case-control pairs in the Nurses’ Health Study, for whom levels of plasma fetuin-A, alanine transaminase (ALT), and γ-glutamyltranspeptidase (GGT) were measured. After multivariate adjustment for covariates, including ALT and GGT, the odds ratio (OR) (95% CI) comparing extreme fetuin-A quintiles was 1.81 (1.07–3.06) (P for trend = 0.009). A mediational analysis showed that this positive association was largely (79.9%) explained by fasting insulin and hemoglobin A1c levels; after further adjustment for these factors, the OR (95% CI) comparing extreme quintiles was attenuated to 1.09 (0.56–2.10) (P for trend = 0.42). In addition, liver enzymes did not modify this association (P for interaction = 0.91 for ALT and 0.58 for GGT). When results from this study were pooled with those from three prior prospective investigations of the same association, a consistent, positive association was observed between high fetuin-A levels and diabetes risk: the relative risk (95% CI) comparing high versus low fetuin-A levels was 1.69 (1.39–2.05) (P for heterogeneity = 0.45). These findings suggest that plasma fetuin-A levels were independently associated with higher risk of developing type 2 diabetes.
Genome-wide association studies have identified novel type 2 diabetes loci, each of which has a modest impact on risk.
To examine the joint effects of several type 2 diabetes risk variants and their combination with conventional risk factors on type 2 diabetes risk in 2 prospective cohorts.
Nested case–control study.
2809 patients with type 2 diabetes and 3501 healthy control participants of European ancestry from the Health Professionals Follow-up Study and Nurses’ Health Study.
A genetic risk score (GRS) was calculated on the basis of 10 polymorphisms in 9 loci.
After adjustment for age and body mass index (BMI), the odds ratio for type 2 diabetes with each point of GRS, corresponding to 1 risk allele, was 1.19 (95% CI, 1.14 to 1.24) and 1.16 (CI, 1.12 to 1.20) for men and women, respectively. Persons with a BMI of 30 kg/m² or greater and a GRS in the highest quintile had an odds ratio of 14.06 (CI, 8.90 to 22.18) compared with persons with a BMI less than 25 kg/m² and a GRS in the lowest quintile after adjustment for age and sex. Persons with a positive family history of diabetes and a GRS in the highest quintile had an odds ratio of 9.20 (CI, 5.50 to 15.40) compared with persons without a family history of diabetes and with a GRS in the lowest quintile. The addition of the GRS to a model of conventional risk factors improved discrimination by 1% (P < 0.001).
The study focused only on persons of European ancestry; whether GRS is associated with type 2 diabetes in other ethnic groups remains unknown.
Although its discriminatory value is currently limited, a GRS that combines information from multiple genetic variants might be useful for identifying subgroups with a particularly high risk for type 2 diabetes.
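The genetic risk score described above can be sketched as a simple allele count. This is a minimal illustration, assuming an unweighted sum of risk alleles (0, 1, or 2 per polymorphism) and a log-linear per-allele effect. The function names and the example genotype are hypothetical and are not taken from the study.

```python
def genetic_risk_score(risk_allele_counts):
    """Unweighted score: total number of risk alleles across polymorphisms."""
    for count in risk_allele_counts:
        if count not in (0, 1, 2):
            raise ValueError("each polymorphism must carry 0, 1, or 2 risk alleles")
    return sum(risk_allele_counts)


def odds_ratio_for_difference(per_allele_or, allele_difference):
    """Odds ratio implied by a score difference, assuming a log-linear effect."""
    return per_allele_or ** allele_difference


# Hypothetical genotype for one person at 10 polymorphisms.
example_counts = [1, 0, 2, 1, 1, 0, 2, 1, 0, 1]
score = genetic_risk_score(example_counts)  # 9 risk alleles

# With the per-allele odds ratio reported for men (1.19), a 5-allele
# difference corresponds to 1.19 ** 5 under the log-linear assumption.
print(score, round(odds_ratio_for_difference(1.19, 5), 2))
```

A weighted score, in which each allele count is multiplied by its estimated log odds ratio, would follow the same pattern with one extra list of weights.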
Large-scale genome-wide association studies (GWAS) have identified over 40 genomic regions significantly associated with type 2 diabetes mellitus. However, GWAS results are not always straightforward to interpret, and linking these loci to meaningful disease etiology is often difficult without extensive follow-up studies. The authors expanded on previously reported type 2 diabetes mellitus GWAS from the nested case-control studies of 2 prospective US cohorts by incorporating expression single nucleotide polymorphism (SNP) information and applying SNP set enrichment analysis to identify sets of SNPs associated with genes that could provide further biologic insight to traditional genome-wide analysis. Using data collected between 1989 and 1994 in these previous studies to form a nested case-control study, the authors found that 3 of the most significantly associated SNPs to type 2 diabetes mellitus in their study are expression SNPs to the lymphocyte antigen 75 gene (LY75), the ubiquitin-specific peptidase 36 gene (USP36), and the phosphatidylinositol transfer protein, cytoplasmic 1 gene (PITPNC1). SNP set enrichment analysis of the GWAS results identified enrichment for expression SNPs to the macrophage-enriched module and the Gene Ontology (GO) biologic process fat cell differentiation human, which includes the transcription factor 7-like 2 gene (TCF7L2), as well as other type 2 diabetes mellitus-associated genes. Integrating genome-wide association, gene expression, and gene set analysis may provide valuable biologic support for potential type 2 diabetes mellitus susceptibility loci and may be useful in identifying new targets or pathways of interest for the treatment and prevention of type 2 diabetes mellitus.
expression single nucleotide polymorphism; gene set enrichment analysis; genome-wide association study; integrative genomic analysis; single nucleotide polymorphism; type 2 diabetes
Aims/hypothesis: Genome-wide association studies have identified over 50 new genetic loci for type 2 diabetes (T2D). Several studies conclude that higher dietary heme iron intake increases the risk of T2D. Therefore we assessed whether the relation between genetic loci and T2D is modified by dietary heme iron intake.
Methods: We used Affymetrix Genome-Wide Human 6.0 array data [681,770 single nucleotide polymorphisms (SNPs)] and dietary information collected in the Health Professionals Follow-up Study (n = 725 cases; n = 1,273 controls) and the Nurses’ Health Study (n = 1,081 cases; n = 1,692 controls). We assessed whether genome-wide SNPs or iron metabolism SNPs interacted with dietary heme iron intake in relation to T2D, testing for associations in each cohort separately and then meta-analyzing to pool the results. Finally, we created 1,000 synthetic pathways matched to an iron metabolism pathway on number of genes, and number of SNPs in each gene. We compared the iron metabolic pathway SNPs with these synthetic SNP assemblies in their relation to T2D to assess if the pathway as a whole interacts with dietary heme iron intake.
Results: Using a genomic approach, we found no significant gene–environment interactions with dietary heme iron intake in relation to T2D at a Bonferroni-corrected genome-wide significance level of 7.33 × 10−8 (top SNP in pooled analysis: intergenic rs10980508; p = 1.03 × 10−6). Furthermore, no SNP in the iron metabolic pathway significantly interacted with dietary heme iron intake at a Bonferroni-corrected significance level of 2.10 × 10−4 (top SNP in pooled analysis: rs1805313; p = 1.14 × 10−3). Finally, neither the main genetic effects (pooled empirical p by SNP = 0.41) nor gene–dietary heme iron interactions (pooled empirical p-value for the interactions = 0.72) were significant for the iron metabolic pathway as a whole.
Conclusions: We found no significant interactions between dietary heme iron intake and common SNPs in relation to T2D.
type 2 diabetes; gene environment interactions; dietary heme iron; pathway analysis
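The genome-wide threshold quoted in the Results is consistent with a Bonferroni correction of a 0.05 significance level across the 681,770 SNPs on the array. The sketch below reproduces that arithmetic. The 0.05 level is an assumption (the abstract does not state it), and the implied number of pathway tests is an inference from the reported threshold, not a figure given by the authors.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def implied_number_of_tests(alpha, threshold):
    """Number of tests implied by a reported Bonferroni threshold."""
    return alpha / threshold


ALPHA = 0.05  # assumed family-wise level

genome_wide = bonferroni_threshold(ALPHA, 681_770)
print(f"{genome_wide:.2e}")  # 7.33e-08

# The pathway-level threshold of 2.10e-4 implies roughly 238 tests.
print(round(implied_number_of_tests(ALPHA, 2.10e-4)))
```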
Genome-wide association study (GWAS) consortia and collaborations formed to detect genetic loci for common phenotypes or investigate gene-environment (G*E) interactions are increasingly common. While these consortia effectively increase sample size, phenotype heterogeneity across studies represents a major obstacle that limits successful identification of these associations. Investigators are faced with the challenge of how to harmonize previously collected phenotype data obtained using different data collection instruments which cover topics in varying degrees of detail and over diverse time frames. This process has not been described in detail. We describe here some of the strategies and pitfalls associated with combining phenotype data from varying studies. Using the Gene Environment Association Studies (GENEVA) multi-site GWAS consortium as an example, this paper provides an illustration to guide GWAS consortia through the process of phenotype harmonization and describes key issues that arise when sharing data across disparate studies. GENEVA is unusual in the diversity of disease endpoints and so the issues it faces as its participating studies share data will be informative for many collaborations. Phenotype harmonization requires identifying common phenotypes, determining the feasibility of cross-study analysis for each, preparing common definitions, and applying appropriate algorithms. Other issues to be considered include genotyping timeframes, coordination of parallel efforts by other collaborative groups, analytic approaches, and imputation of genotype data. GENEVA's harmonization efforts and policy of promoting data sharing and collaboration, not only within GENEVA but also with outside collaborations, can provide important guidance to ongoing and new consortia.
phenotype; harmonization; genome-wide association studies; GENEVA; consortia
Genome-wide scans of nucleotide variation in human subjects are providing an increasing number of replicated associations with complex disease traits. Most of the variants detected have small effects and, collectively, they account for a small fraction of the total genetic variance. Very large sample sizes are required to identify and validate findings. In this situation, even small sources of systematic or random error can cause spurious results or obscure real effects. The need for careful attention to data quality has been appreciated for some time in this field, and a number of strategies for quality control and quality assurance (QC/QA) have been developed. Here we extend these methods and describe a system of QC/QA for genotypic data in genome-wide association studies. This system includes some new approaches that (1) combine analysis of allelic probe intensities and called genotypes to distinguish gender misidentification from sex chromosome aberrations, (2) detect autosomal chromosome aberrations that may affect genotype calling accuracy, (3) infer DNA sample quality from relatedness and allelic intensities, (4) use duplicate concordance to infer SNP quality, (5) detect genotyping artifacts from dependence of Hardy-Weinberg equilibrium (HWE) test p-values on allelic frequency, and (6) demonstrate sensitivity of principal components analysis (PCA) to SNP selection. The methods are illustrated with examples from the ‘Gene Environment Association Studies’ (GENEVA) program. The results suggest several recommendations for QC/QA in the design and execution of genome-wide association studies.
GWAS; DNA sample quality; genotyping artifact; Hardy-Weinberg equilibrium; chromosome aberration
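As an illustration of the Hardy-Weinberg equilibrium check referred to above, the sketch below computes a one-degree-of-freedom chi-square test from genotype counts at a single biallelic SNP. It is a minimal example under that assumption. The abstract does not say which HWE test the authors used (an exact test is a common alternative), and the function name is hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square HWE test for one biallelic SNP; returns (statistic, p-value)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("monomorphic SNP: HWE test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts, for illustration only.
chi2, p_value = hwe_chi_square(298, 489, 213)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

Plotting such p-values against allele frequency across SNPs is one way to look for the frequency-dependent artifacts the abstract describes.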
Retinol is one of the most biologically active forms of vitamin A and is hypothesized to influence a wide range of human diseases including asthma, cardiovascular disease, infectious diseases and cancer. We conducted a genome-wide association study of 5006 Caucasian individuals drawn from two cohorts of men: the Alpha-Tocopherol, Beta-Carotene Cancer Prevention (ATBC) Study and the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial. We identified two independent single-nucleotide polymorphisms associated with circulating retinol levels, which are located near the transthyretin (TTR) and retinol binding protein 4 (RBP4) genes which encode major carrier proteins of retinol: rs1667255 (P = 2.30 × 10−17) and rs10882272 (P = 6.04 × 10−12). We replicated the association with rs10882272 in RBP4 in independent samples from the Nurses’ Health Study and the Invecchiare in Chianti Study (InCHIANTI) that included 3792 women and 504 men (P = 9.49 × 10−5), but found no association for retinol with rs1667255 in TTR among women, thus suggesting evidence for gender dimorphism (P-interaction = 1.31 × 10−5). Discovery of common genetic variants associated with serum retinol levels may provide further insight into the contribution of retinol and other vitamin A compounds to the development of cancer and other complex diseases.
Post-traumatic stress disorder (PTSD) is a prevalent, disabling anxiety disorder that constitutes a major health care burden. Despite evidence supporting a genetic predisposition to PTSD, the precise genetic loci remain unclear. Herein we review the current state and limitations of genetic research on PTSD. Although recent years have seen an exponential increase in the number of studies examining the influence of candidate genes on PTSD diagnosis and symptomatology, most studies have been characterized by relatively low rates of PTSD, with apparent inconsistencies in gene associations linked to marked differences in methodology. We further discuss how current advances in the genetics field can be applied to studies of PTSD, emphasizing the need to adopt a genome-wide approach that facilitates discovery rather than hypothesis testing. Genome-wide association studies offer the best opportunity to identify novel “true” risk variants for the disorder, which in turn have the potential to inform our understanding of PTSD etiology.
Post-traumatic stress disorder; Trauma; Genetics; Genome-wide association; Gene–environment interaction
In genome-wide association studies (GWAS) of common genetic variants associated with circulating alpha- and gamma-tocopherol concentrations in two adult cohorts comprising 5006 men of European descent, we observed three loci associated with alpha-tocopherol levels: two novel single-nucleotide polymorphisms (SNPs), rs2108622 on 19pter-p13.11 (P = 1.7 × 10−8) and rs11057830 on 12q24.31 (P = 2.0 × 10−8), and a previously reported locus marked by rs964184 on 11q23.3 (P = 2.7 × 10−10). The three SNPs have been reported to be associated with lipid metabolism and/or regulation. We replicated these findings in a combined meta-analysis with two independent samples, P = 7.8 × 10−12 (rs964184 on 11q23.3 near BUD13, ZNF259 and APOA1/C3/A4/A5), P = 1.4 × 10−10 (rs2108622 on 19pter-p13.11 near CYP4F2) and P = 8.2 × 10−9 (rs11057830 on 12q24.31 near SCARB1). Combined, these SNPs explain 1.7% of the residual variance in log alpha-tocopherol levels. In one of the two male GWAS cohorts (n = 992), no SNPs were significantly associated with gamma-tocopherol concentrations after including data from the replication sample for the 71 independent SNPs identified with P < 1 × 10−4.
To identify type 2 diabetes (T2D) susceptibility loci, we conducted genome-wide association (GWA) scans in nested case–control samples from two prospective cohort studies, including 2591 patients and 3052 controls of European ancestry. Validation was performed in 11 independent GWA studies of 10 870 cases and 73 735 controls. We identified significantly associated variants near RBMS1 and ITGB6 genes at 2q24, best-represented by SNP rs7593730 (combined OR = 0.90, 95% CI = 0.86–0.93; P = 3.7 × 10−8). The frequency of the risk-lowering allele T is 0.23. Variants in this region were nominally related to lower fasting glucose and HOMA-IR in the MAGIC consortium (P < 0.05). These data suggest that the 2q24 locus may influence the T2D risk by affecting glucose metabolism and insulin resistance.
Plasma soluble leptin receptor (sOB-R) levels were inversely associated with diabetes risk factors, including adiposity and insulin resistance, and highly correlated with the expression levels of leptin receptor, which is ubiquitously expressed in most tissues. We conducted a genome-wide association study of sOB-R in 1504 women of European ancestry from the Nurses' Health Study. The initial scan yielded 26 single nucleotide polymorphisms (SNPs) significantly associated with sOB-R levels (P < 5 × 10−8), all mapping to the leptin receptor gene (LEPR). Analysis of imputed genotypes on autosomal chromosomes revealed an additional 106 SNPs in and adjacent to this gene that reached the genome-wide significance level. Of these 132 SNPs (including two non-synonymous SNPs, rs1137100 and rs1137101), rs2767485, rs1751492 and rs4655555 remained associated with sOB-R levels at the 0.05 level (P = 9.1 × 10−9, 0.0105 and 0.0267, respectively) after adjustment for other univariately associated SNPs in a forward selection procedure. Significant associations with these SNPs were replicated in an independent sample of young males (n = 875) residing in Cyprus (P < 1 × 10−4). These data provide novel evidence revealing the role of polymorphisms in LEPR in modulating plasma levels of sOB-R and may further our understanding of the complex relationships among leptin, leptin receptor and diabetes-related traits.
Blood soluble E-selectin (sE-selectin) levels have been related to various conditions such as type 2 diabetes. We performed a genome-wide association study among women of European ancestry from the Nurses' Health Study, and identified genome-wide significant associations between a cluster of markers at the ABO locus (9q34) and plasma sE-selectin concentration. The strongest association was with rs651007, which explained ∼9.71% of the variation in sE-selectin concentrations. SNP rs651007 was also nominally associated with soluble intercellular adhesion molecule-1 (sICAM-1) (P = 0.026) and TNF-R2 levels (P = 0.018), independent of sE-selectin. In addition, the genetically inferred ABO blood group genotypes were associated with sE-selectin concentrations (P = 3.55 × 10−47). Moreover, we found that the genetically inferred blood group B was associated with a decreased risk (OR = 0.44, 0.27–0.70) of type 2 diabetes compared with blood group O, adjusting for sE-selectin, sICAM-1, TNF-R2 and other covariates. Our findings indicate that the genetic variants at the ABO locus affect plasma sE-selectin levels and diabetes risk. The genetic associations with diabetes risk were independent of sE-selectin levels.
We report the first genome-wide association study of habitual caffeine intake. We included 47,341 individuals of European descent based on five population-based studies within the United States. In a meta-analysis adjusted for age, sex, smoking, and eigenvectors of population variation, two loci achieved genome-wide significance: 7p21 (P = 2.4×10−19), near AHR, and 15q24 (P = 5.2×10−14), between CYP1A1 and CYP1A2. Both the AHR and CYP1A2 genes are biologically plausible candidates as CYP1A2 metabolizes caffeine and AHR regulates CYP1A2.
Caffeine is the most widely consumed psychoactive substance in the world. Although demographic and social factors have been linked to habitual caffeine consumption, twin studies report a large heritable component. Through a comprehensive search of the human genome involving over 40,000 participants, we discovered two loci associated with habitual caffeine consumption: the first near AHR and the second between CYP1A1 and CYP1A2. Both the AHR and CYP1A2 genes are biologically plausible candidates, as CYP1A2 metabolizes caffeine and AHR regulates CYP1A2. Caffeine intake has been associated with manifold physiologic effects and both detrimental and beneficial health outcomes. Knowledge of the genetic determinants of caffeine intake may provide insight into underlying mechanisms and may provide ways to study the potential health effects of caffeine more comprehensively.
IL-18 is a proinflammatory cytokine involved in the processes of innate and acquired immunities and associated with cardiovascular disease and type 2 diabetes. We sought to identify the common genetic variants associated with IL-18 levels.
Methods and Results
We performed a two-stage genome-wide association study among women of European ancestry from the Nurses’ Health Study (NHS) and Women’s Genome Health Study (WGHS). IL-18 levels were measured by ELISA. In the discovery stage (NHS, n = 1523), 7 SNPs at the IL18-BCO2 locus were associated with IL-18 concentrations at the 1 × 10−5 significance level. The strongest association was found for SNP rs2115763 in the BCO2 gene (P value = 6.31 × 10−8). In silico replication in WGHS (435 women) confirmed these findings. The combined analysis of the two studies indicated that SNPs rs2115763, rs1834481, and rs7106524 reached the genome-wide significance level (P < 5 × 10−8). Forward selection analysis indicated that SNPs rs2115763 and rs1834481 were independently associated with IL-18 levels (P = 0.0002 and 0.0006, respectively). The two SNPs together explained 2.9% of the variation in plasma IL-18 levels.
This study identified several novel variants at the IL18-BCO2 locus associated with IL-18 levels.
GWAS; IL-18; IL18 gene
To investigate the associations between obesity-predisposing genetic variants, cardiovascular biomarkers, and cardiovascular disease (CVD) risk in women with preexisting type 2 diabetes.
Methods and Results
We genotyped polymorphisms at nine established obesity loci in 1,395 women with diabetes from the Nurses’ Health Study; 449 of these women developed CVD and 946 did not. A genetic risk score (GRS) was derived by summing risk alleles for each individual. Four polymorphisms, rs9939609 (FTO), rs11084753 (KCTD15), rs10838738 (MTCH2), and rs10938397 (GNPDA2), showed nominally significant associations with CVD. The GRS combining all obesity loci was linearly related to CVD risk (P for trend = 0.013). The OR was 1.08 per risk allele (95% CI: 1.02–1.15; P = 0.01) after adjustment for BMI and other conventional risk factors. Women in the highest quartile of GRS had a 53% (6% – 122%) increased CVD risk compared with those in the lowest quartile (P = 0.024). In addition, higher GRS was associated with lower adiponectin levels (P = 0.02). Further adjustment for BMI and other covariates did not change the association (P = 0.006). Higher GRS was also correlated with lower levels of HDL (P = 0.01).
Obesity-predisposing variants may jointly affect CVD risk among women with diabetes.
Cardiovascular disease; type 2 diabetes; obesity gene; polymorphism
For most associations of common single nucleotide polymorphisms (SNPs) with common diseases, the genetic model of inheritance is unknown. The authors extended and applied a Bayesian meta-analysis approach to data from 19 studies on 17 replicated associations with type 2 diabetes. For 13 SNPs, the data fitted very well to an additive model of inheritance for the diabetes risk allele; for 4 SNPs, the data were consistent with either an additive model or a dominant model; and for 2 SNPs, the data were consistent with an additive or recessive model. Results were robust to the use of different priors and after exclusion of data for which index SNPs had been examined indirectly through proxy markers. The Bayesian meta-analysis model yielded point estimates for the genetic effects that were very similar to those previously reported based on fixed- or random-effects models, but uncertainty about several of the effects was substantially larger. The authors also examined the extent of between-study heterogeneity in the genetic model and found generally small between-study deviation values for the genetic model parameter. Heterosis could not be excluded for 4 SNPs. Information on the genetic model of robustly replicated association signals derived from genome-wide association studies may be useful for predictive modeling and for designing biologic and functional experiments.
Bayes theorem; diabetes mellitus, type 2; meta-analysis; models, genetic; polymorphism, genetic; population characteristics
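To make the notion of a genetic model parameter concrete, the sketch below uses one parameterization found in the meta-analysis literature: lambda, the ratio of the heterozygote log odds ratio to the homozygote log odds ratio. Under it, values near 0, 0.5, and 1 correspond to recessive, additive (on the log-odds scale), and dominant models, and values outside the 0 to 1 range suggest heterosis. Whether the authors used exactly this definition is an assumption, since the abstract does not define its parameter. The function names and the tolerance are hypothetical.

```python
import math


def genetic_model_parameter(or_heterozygote, or_homozygote):
    """lambda = log(OR_het) / log(OR_hom), both relative to non-carriers."""
    if or_homozygote == 1:
        raise ValueError("lambda is undefined when the homozygote OR equals 1")
    return math.log(or_heterozygote) / math.log(or_homozygote)


def describe_model(lam, tol=0.1):
    """Label a lambda value; tol is an arbitrary illustrative tolerance."""
    if lam < -tol or lam > 1 + tol:
        return "heterosis"
    if abs(lam) <= tol:
        return "recessive"
    if abs(lam - 0.5) <= tol:
        return "additive"
    if abs(lam - 1.0) <= tol:
        return "dominant"
    return "intermediate"


# Hypothetical odds ratios: heterozygotes 1.2, risk-allele homozygotes 1.44.
lam = genetic_model_parameter(1.2, 1.44)
print(round(lam, 2), describe_model(lam))
```

In a Bayesian analysis the parameter would carry a posterior distribution rather than a single value, which is why a SNP can be consistent with more than one model.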
For most associations of common polymorphisms with common diseases, the genetic model of inheritance is unknown. We extended and applied a Bayesian meta-analysis approach to data from 19 studies on 17 replicated associations for type 2 diabetes. For 13 polymorphisms, the data fit very well to an additive model, for 4 polymorphisms the data were consistent with either an additive or dominant model, and for 2 polymorphisms with an additive or recessive model of inheritance for the diabetes risk allele. Results were robust to using different priors and after excluding data where index polymorphisms had been examined indirectly through proxy markers. The Bayesian meta-analysis model yielded point estimates for the genetic effects that are very similar to those previously reported based on fixed or random effects models, but uncertainty about several of the effects was substantially larger. We also examined the extent of between-study heterogeneity in the genetic model and found generally small values of the between-study deviation for the genetic model parameter. Heterosis could not be excluded in 4 SNPs. Information on the genetic model of robustly replicated GWA-derived association signals may be useful for predictive modeling, and for designing biological and functional experiments.
Genome-wide association studies (GWAS) have emerged as powerful means for identifying genetic loci related to complex diseases. However, the role of environment and its potential to interact with key loci has not been adequately addressed in most GWAS. Networks of collaborative studies involving different study populations and multiple phenotypes provide a powerful approach for addressing the challenges in analysis and interpretation shared across studies. The Gene, Environment Association Studies (GENEVA) consortium was initiated to: identify genetic variants related to complex diseases; identify variations in gene-trait associations related to environmental exposures; and ensure rapid sharing of data through the database of Genotypes and Phenotypes. GENEVA consists of several academic institutions, including a coordinating center, two genotyping centers and 14 independently designed studies of various phenotypes, as well as several Institutes and Centers of the National Institutes of Health led by the National Human Genome Research Institute. Minimum detectable effect sizes include relative risks ranging from 1.24 to 1.57 and proportions of variance explained ranging from 0.0097 to 0.02. Given the large number of research participants (N > 80,000), an important feature of GENEVA is harmonization of common variables, which allow analyses of additional traits. Environmental exposure information available from most studies also enables testing of gene-environment interactions. Facilitated by its sizeable infrastructure for promoting collaboration, GENEVA has established a unified framework for genotyping, data quality control, analysis and interpretation. By maximizing knowledge obtained through collaborative GWAS incorporating environmental exposure information, GENEVA aims to enhance our understanding of disease etiology, potentially identifying opportunities for intervention.
genome-wide association; complex disease; quantitative traits; gene-environment interaction; phenotype harmonization
Coffee; Caffeine; CYP1A2; Myocardial infarction; Nutrigenetics
In conducting genome-wide association studies (GWAS), analytical approaches leveraging biological information may further understanding of the pathophysiology of clinical traits. To discover novel associations with estimated glomerular filtration rate (eGFR), a measure of kidney function, we developed a strategy for integrating prior biological knowledge into the existing GWAS data for eGFR from the CKDGen Consortium. Our strategy focuses on single nucleotide polymorphisms (SNPs) in genes that are connected by functional evidence, determined by literature mining and gene ontology (GO) hierarchies, to genes near previously validated eGFR associations. It then requires association thresholds consistent with multiple testing, and finally evaluates novel candidates by independent replication. Among the samples of European ancestry, we identified a genome-wide significant SNP in FBXL20 (P = 5.6 × 10−9) in meta-analysis of all available data, and additional SNPs at the INHBC, LRP2, PLEKHA1, SLC3A2 and SLC7A6 genes meeting multiple-testing corrected significance for replication and overall P-values of 4.5 × 10−4–2.2 × 10−7. Neither the novel PLEKHA1 nor FBXL20 associations, both further supported by association with eGFR among African Americans and with transcript abundance, would have been implicated by eGFR candidate gene approaches. LRP2, encoding the megalin receptor, was identified through connection with the previously known eGFR gene DAB2 and extends understanding of the megalin system in kidney function. These findings highlight integration of existing genome-wide association data with independent biological knowledge to uncover novel candidate eGFR associations, including candidates lacking known connections to kidney-specific pathways. The strategy may also be applicable to other clinical phenotypes, although more testing will be needed to assess its potential for discovery in general.
Although cross-sectional studies have linked higher body mass index (BMI) and type 2 diabetes (T2D) to shortened telomeres, whether these metabolic conditions play a causal role in telomere biology is unknown. We therefore examined whether genetic predisposition to higher BMI or T2D was associated with shortened leukocyte telomere length (LTL).
We conducted an analysis of 3,968 women of European ancestry aged 43–70 years from the Nurses' Health Study, who were selected as cases or controls in genome-wide association studies and studies of telomeres and disease. Pre-diagnostic relative telomere length in peripheral blood leukocytes, collected in 1989–1990, was measured by quantitative PCR. We combined information from multiple risk variants by calculating genetic risk scores based on 32 polymorphisms near 32 loci for BMI, and 36 polymorphisms near 35 loci for T2D.
After adjustment for age and case-control status, there was no association between the BMI genetic risk score and LTL (β per standard deviation increase: −0.01; SE: 0.02; P = 0.52). Similarly, the T2D genetic score was not associated with LTL (β per standard deviation increase: −0.006; SE: 0.02; P = 0.69).
In this population of middle-aged and older women of European ancestry, those genetically predisposed to higher BMI or T2D did not possess shortened telomeres. Although we cannot exclude weak or modest effects, our findings do not support a causal relation of strong magnitude between these metabolic conditions and telomere dynamics.
Early menopause (EM) affects up to 10% of the female population, reducing reproductive lifespan considerably. Currently, it constitutes the leading cause of infertility in the western world, affecting mainly those women who postpone their first pregnancy beyond the age of 30 years. The genetic aetiology of EM is largely unknown in the majority of cases. We have undertaken a meta-analysis of genome-wide association studies (GWASs) in 3493 EM cases and 13 598 controls from 10 independent studies. No novel genetic variants were discovered, but the 17 variants previously associated with normal age at natural menopause as a quantitative trait (QT) were also associated with EM and primary ovarian insufficiency (POI). Thus, EM has a genetic aetiology which overlaps variation in normal age at menopause and is at least partly explained by the additive effects of the same polygenic variants. The combined effect of the common variants captured by the single nucleotide polymorphism arrays was estimated to account for ∼30% of the variance in EM. The association between the combined 17 variants and the risk of EM was greater than the best validated non-genetic risk factor, smoking.