Coronary artery calcification (CAC) detected by computed tomography is a non-invasive measure of coronary atherosclerosis, which underlies most cases of myocardial infarction (MI). We aimed to identify common genetic variants associated with CAC and further investigate their associations with MI.
Methods and Results
Computed tomography was used to assess quantity of CAC. A meta-analysis of genome-wide association studies for CAC was carried out in 9,961 men and women from five independent community-based cohorts, with replication in three additional independent cohorts (n=6,032). We examined the top single nucleotide polymorphisms (SNPs) associated with CAC quantity for association with MI in multiple large genome-wide association studies of MI. Genome-wide significant associations with CAC for SNPs on chromosome 9p21 near CDKN2A and CDKN2B (top SNP: rs1333049, P=7.58×10−19) and 6p24 (top SNP: rs9349379, within the PHACTR1 gene, P=2.65×10−11) replicated for CAC and for MI. Additionally, there is evidence for concordance of SNP associations with both CAC and with MI at a number of other loci, including 3q22 (MRAS gene), 13q34 (COL4A1/COL4A2 genes), and 1p13 (SORT1 gene).
SNPs in the 9p21 and PHACTR1 gene loci were strongly associated with CAC and MI, and there are suggestive associations with both CAC and MI of SNPs in additional loci. Multiple genetic loci are associated with development of both underlying coronary atherosclerosis and clinical events.
cardiac computed tomography; coronary artery calcification; coronary atherosclerosis; genome-wide association studies; myocardial infarction
Weight-loss interventions generally improve lipid profiles and reduce cardiovascular disease risk, but effects are variable and may depend on genetic factors. We performed a genetic association analysis of data from 2,993 participants in the Diabetes Prevention Program to test the hypotheses that a genetic risk score (GRS) based on deleterious alleles at 32 lipid-associated single-nucleotide polymorphisms modifies the effects of lifestyle and/or metformin interventions on lipid levels and nuclear magnetic resonance (NMR) lipoprotein subfraction size and number. Twenty-three loci previously associated with fasting LDL-C, HDL-C, or triglycerides replicated (P = 0.04–1×10−17). Except for total HDL particles (r = −0.03, P = 0.26), all components of the lipid profile correlated with the GRS (partial |r| = 0.07–0.17, P = 5×10−5–1×10−19). The GRS was associated with higher baseline-adjusted 1-year LDL cholesterol levels (β = +0.87, SEE±0.22 mg/dl/allele, P = 8×10−5, Pinteraction = 0.02) in the lifestyle intervention group, but not in the placebo (β = +0.20, SEE±0.22 mg/dl/allele, P = 0.35) or metformin (β = −0.03, SEE±0.22 mg/dl/allele, P = 0.90; Pinteraction = 0.64) groups. Similarly, a higher GRS predicted a greater number of baseline-adjusted small LDL particles at 1 year in the lifestyle intervention arm (β = +0.30, SEE±0.012 ln nmol/L/allele, P = 0.01, Pinteraction = 0.01) but not in the placebo (β = −0.002, SEE±0.008 ln nmol/L/allele, P = 0.74) or metformin (β = +0.013, SEE±0.008 nmol/L/allele, P = 0.12; Pinteraction = 0.24) groups. Our findings suggest that a high genetic burden confers an adverse lipid profile and predicts attenuated response in LDL-C levels and small LDL particle number to dietary and physical activity interventions aimed at weight loss.
The study included 2,993 participants from the Diabetes Prevention Program, a randomized clinical trial of intensive lifestyle intervention, metformin treatment, and placebo control. We examined associations between 32 gene variants that have been reproducibly associated with dyslipidemia and concentrations of lipids and NMR lipoprotein particle sizes and numbers. We also examined whether genetic background influences a person's response to cardioprotective interventions on lipid levels. Our analysis, which focused on whether common genetic variants modify the effects of cardioprotective interventions on lipid levels and lipoprotein particle size, shows that in persons with a high genetic risk score the benefit of intensive lifestyle intervention on LDL and small LDL particle levels is substantially diminished. This finding may inform the targeted prevention of dyslipidemia, as it suggests that genetics might help identify persons in whom lifestyle intervention is likely to be an effective treatment for elevated lipids and lipoproteins. The NMR subfraction analyses provide novel insight into the biology of dyslipidemia by illustrating how numerous genetic variants that have previously been associated with lipid levels also modulate NMR lipoprotein particle sizes and number. These results may also inform the targeted prevention of cardiovascular disease.
The genetic loci that have been found by genome-wide association studies to modulate risk of coronary heart disease explain only a fraction of its total variance, and gene-gene interactions have been proposed as a potential source of the remaining heritability. Given the potentially large testing burden, we sought to enrich our search space with real interactions by analyzing variants that may be more likely to interact on the basis of two distinct hypotheses: a biological hypothesis, under which MI risk is modulated by interactions between variants that are known to be relevant for its risk factors; and a statistical hypothesis, under which interacting variants individually show weak marginal association with MI. In a discovery sample of 2,967 cases of early-onset myocardial infarction (MI) and 3,075 controls from the MIGen study, we performed pair-wise SNP interaction testing using a logistic regression framework. Despite having reasonable power to detect interaction effects of plausible magnitudes, we observed no statistically significant evidence of interaction under these hypotheses, and no clear consistency between the top results in our discovery sample and those in a large validation sample of 1,766 cases of coronary heart disease and 2,938 controls from the Wellcome Trust Case-Control Consortium. Our results do not support the existence of strong interaction effects as a common risk factor for MI. Within the scope of the hypotheses we have explored, this study places a modest upper limit on the magnitude that epistatic risk effects are likely to have at the population level (odds ratio for MI risk 1.3–2.0, depending on allele frequency and interaction model).
High coverage whole genome sequencing provides near complete information about genetic variation. However, other technologies can be more efficient in some settings by (a) reducing redundant coverage within samples and (b) exploiting patterns of genetic variation across samples. To characterize as many samples as possible, many genetic studies therefore employ lower coverage sequencing or SNP array genotyping coupled to statistical imputation. To compare these approaches individually and in conjunction, we developed a statistical framework to estimate genotypes jointly from sequence reads, array intensities, and imputation. In European samples, we find similar sensitivity (89%) and specificity (99.6%) from imputation with either 1× sequencing or 1 M SNP arrays. Sensitivity is increased, particularly for low-frequency polymorphisms, when low coverage sequence reads are added to dense genome-wide SNP arrays; the converse, however, is not true. At sites where sequence reads and array intensities produce different sample genotypes, joint analysis reduces genotype errors and identifies novel error modes. Our joint framework informs the use of next-generation sequencing in genome wide association studies and supports development of improved methods for genotype calling.
In this work we address a series of questions prompted by the rise of next-generation sequencing as a data collection strategy for genetic studies. How does low coverage sequencing compare to traditional microarray based genotyping? Do studies increase sensitivity by collecting both sequencing and array data? What can we learn about technology error modes based on analysis of SNPs for which sequence and array data disagree? To answer these questions, we developed a statistical framework to estimate genotypes from sequence reads, array intensities, and imputation. Through experiments with intensity and read data from the HapMap and 1000 Genomes (1000 G) Projects, we show that 1 M SNP arrays used for genome wide association studies perform similarly to 1× sequencing. We find that adding low coverage sequence reads to dense array data significantly increases rare variant sensitivity, but adding dense array data to low coverage sequencing has only a small impact. Finally, we describe an improved SNP calling algorithm used in the 1000 G project, inspired by a novel next-generation sequencing error mode identified through analysis of disputed SNPs. These results inform the use of next-generation sequencing in genetic studies and model an approach to further improve genotype calling methods.
More than a thousand disease susceptibility loci have been identified via genome-wide association studies (GWAS) of common variants; however, the specific genes and full allelic spectrum of causal variants underlying these findings generally remain to be defined. We used pooled next-generation sequencing to study 56 genes in regions associated with Crohn’s disease (CD) in 350 cases and 350 controls. Follow-up genotyping of 70 rare and low-frequency protein-altering variants (MAF ~0.001–0.05) in nine independent case-control series (16,054 CD patients, 12,153 UC patients, 17,575 healthy controls) identified four additional independent risk factors in NOD2, two additional protective variants in IL23R, a highly significant association with a novel, protective splice variant in CARD9 (p < 1e-16, OR ~ 0.29), as well as additional associations with coding variants in IL18RAP, CUL2, C1orf106, PTPN22 and MUC19. We extend the results of successful GWAS by providing novel, rare, and likely functional variants that will empower functional experiments and predictive models.
The let-7 tumor suppressor microRNAs are known for their regulation of oncogenes, while the RNA-binding proteins Lin28a/b promote malignancy by blocking let-7 biogenesis. In studies of the Lin28/let-7 pathway, we discovered unexpected roles in regulating metabolism. When overexpressed in mice, both Lin28a and LIN28B promoted an insulin-sensitized state that resisted high fat diet-induced diabetes, whereas muscle-specific loss of Lin28a and overexpression of let-7 resulted in insulin resistance and impaired glucose tolerance. These phenomena occurred in part through let-7-mediated repression of multiple components of the insulin-PI3K-mTOR pathway, including IGF1R, INSR, and IRS2. The mTOR inhibitor rapamycin abrogated the enhanced glucose uptake and insulin-sensitivity conferred by Lin28a in vitro and in vivo. In addition, we found that let-7 targets were enriched for genes that contain SNPs associated with type 2 diabetes and fasting glucose in human genome-wide association studies. These data establish the Lin28/let-7 pathway as a central regulator of mammalian glucose metabolism.
Over 30 loci have been associated with risk of type 2 diabetes at genome-wide statistical significance. Genetic risk scores (GRSs) developed from these loci predict diabetes in the general population. We tested if a GRS based on an updated list of 34 type 2 diabetes–associated loci predicted progression to diabetes or regression toward normal glucose regulation (NGR) in the Diabetes Prevention Program (DPP).
RESEARCH DESIGN AND METHODS
We genotyped 34 type 2 diabetes–associated variants in 2,843 DPP participants at high risk of type 2 diabetes from five ethnic groups representative of the U.S. population, who had been randomized to placebo, metformin, or lifestyle intervention. We built a GRS by weighting each risk allele by its reported effect size on type 2 diabetes risk and summing these values. We tested its ability to predict diabetes incidence or regression to NGR in models adjusted for age, sex, ethnicity, waist circumference, and treatment assignment.
In multivariate-adjusted models, the GRS was significantly associated with increased risk of progression to diabetes (hazard ratio [HR] = 1.02 per risk allele [95% CI 1.00–1.05]; P = 0.03) and a lower probability of regression to NGR (HR = 0.95 per risk allele [95% CI 0.93–0.98]; P < 0.0001). At baseline, a higher GRS was associated with a lower insulinogenic index (P < 0.001), confirming an impairment in β-cell function. We detected no significant interaction between GRS and treatment, but the lifestyle intervention was effective in the highest quartile of GRS (P < 0.0001).
A high GRS is associated with increased risk of developing diabetes and lower probability of returning to NGR in high-risk individuals, but a lifestyle intervention attenuates this risk.
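The weighted GRS construction described in the methods above (each risk allele weighted by its reported effect size, then summed) can be sketched as follows. This is a minimal illustration, not the study's code: the function name `weighted_grs`, the example genotype counts, and the odds ratios are hypothetical, and taking the natural log of a reported odds ratio as the weight is an assumption about how "effect size" is defined.

```python
from math import log

def weighted_grs(risk_allele_counts, weights):
    """Sum of risk-allele counts (0, 1, or 2 per variant), each multiplied
    by that variant's weight. Both arguments are hypothetical inputs."""
    if len(risk_allele_counts) != len(weights):
        raise ValueError("one weight is required per variant")
    return sum(count * weight for count, weight in zip(risk_allele_counts, weights))

# Hypothetical example: three variants with made-up odds ratios.
# Assumption: weight = ln(odds ratio); the abstract only says "reported effect size".
example_odds_ratios = [1.10, 1.25, 1.40]
example_weights = [log(odds_ratio) for odds_ratio in example_odds_ratios]
example_counts = [2, 1, 0]  # risk alleles carried at each variant

score = weighted_grs(example_counts, example_weights)
```

An unweighted score is the special case in which every weight equals 1, so the score reduces to a simple count of risk alleles.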
The risk of type 2 diabetes is approximately 2-fold higher in African Americans than in European Americans even after adjusting for known environmental risk factors, including socioeconomic status (SES), suggesting that genetic factors may explain some of this population difference in disease risk. However, relatively few genetic studies have examined this hypothesis in a large sample of African Americans with and without diabetes. Therefore, we performed an admixture analysis using 2,189 ancestry-informative markers in 7,021 African Americans (2,373 with type 2 diabetes and 4,648 without) from the Atherosclerosis Risk in Communities Study, the Jackson Heart Study, and the Multiethnic Cohort to 1) determine the association of type 2 diabetes and its related quantitative traits with African ancestry controlling for measures of SES and 2) identify genetic loci for type 2 diabetes through a genome-wide admixture mapping scan. The median percentage of African ancestry of diabetic participants was slightly greater than that of non-diabetic participants (study-adjusted difference = 1.6%, P<0.001). The odds ratio for diabetes comparing participants in the highest vs. lowest tertile of African ancestry was 1.33 (95% confidence interval 1.13–1.55), after adjustment for age, sex, study, body mass index (BMI), and SES. Admixture scans identified two potential loci for diabetes at 12p13.31 (LOD = 4.0) and 13q14.3 (Z score = 4.5, P = 6.6×10−6). In conclusion, genetic ancestry has a significant association with type 2 diabetes above and beyond its association with non-genetic risk factors for type 2 diabetes in African Americans, but no single gene with a major effect is sufficient to explain a large portion of the observed population difference in risk of diabetes. There undoubtedly is a complex interplay among specific genetic loci and non-genetic factors, which may both be associated with overall admixture, leading to the observed ethnic differences in diabetes risk.
The potential benefits of using population isolates in genetic mapping, such as reduced genetic, phenotypic and environmental heterogeneity, are offset by the challenges posed by the large amounts of direct and cryptic relatedness in these populations, which confound basic assumptions of independence. We have evaluated four representative specialized methods for association testing in the presence of relatedness: (i) within-family, (ii) within- and between-family, and (iii) mixed-model methods, using simulated traits for 2906 subjects with known genome-wide genotype data from an extremely isolated population, the Island of Kosrae, Federated States of Micronesia. We report that mixed models optimally extract association information from such samples, demonstrating 88% power to rank the true variant as among the top 10 genome-wide with 56% achieving genome-wide significance, a >80% improvement over the other methods, and demonstrate that population isolates have similar power to non-isolate populations for observing variants of known effects. We then used the mixed-model method to reanalyze data for 17 published phenotypes relating to metabolic traits and electrocardiographic measures, along with another 8 previously unreported. We replicate nine genome-wide significant associations with known loci of plasma cholesterol, high-density lipoprotein, low-density lipoprotein, triglycerides, thyroid stimulating hormone, homocysteine, C-reactive protein and uric acid, with only one detected in the previous analysis of the same traits. Further, we leveraged shared identity-by-descent genetic segments in the region of the uric acid locus to fine-map the signal, refining the known locus by a factor of 4. Finally, we report a novel association for height (rs17629022, P< 2.1 × 10−8).
We performed a meta-analysis of 14 genome-wide association studies of coronary artery disease (CAD) comprising 22,233 cases and 64,762 controls of European descent, followed by genotyping of top association signals in 60,738 additional individuals. This genomic analysis identified 13 novel loci harboring one or more SNPs that were associated with CAD at P<5×10−8 and confirmed the association of 10 of 12 previously reported CAD loci. The 13 novel loci displayed risk allele frequencies ranging from 0.13 to 0.91 and were associated with a 6 to 17 percent increase in the risk of CAD per allele. Notably, only three of the novel loci displayed significant association with traditional CAD risk factors, while the majority lie in gene regions not previously implicated in the pathogenesis of CAD. Finally, five of the novel CAD risk loci appear to have pleiotropic effects, showing strong association with various other human diseases or traits.
Genome-wide association studies have begun to elucidate the genetic architecture of type 2 diabetes. We examined whether single nucleotide polymorphisms (SNPs) identified through targeted complementary approaches affect diabetes incidence in the at-risk population of the Diabetes Prevention Program (DPP) and whether they influence a response to preventive interventions.
RESEARCH DESIGN AND METHODS
We selected SNPs identified by prior genome-wide association studies for type 2 diabetes and related traits, or capturing common variation in 40 candidate genes previously associated with type 2 diabetes, implicated in monogenic diabetes, encoding type 2 diabetes drug targets or drug-metabolizing/transporting enzymes, or involved in relevant physiological processes. We analyzed 1,590 SNPs for association with incident diabetes and their interaction with response to metformin or lifestyle interventions in 2,994 DPP participants. We controlled for multiple hypothesis testing by assessing false discovery rates.
We replicated the association of variants in the metformin transporter gene SLC47A1 with metformin response and detected nominal interactions in the AMP kinase (AMPK) gene STK11, the AMPK subunit genes PRKAA1 and PRKAA2, and a missense SNP in SLC22A1, which encodes another metformin transporter. The most significant association with diabetes incidence occurred in the AMPK subunit gene PRKAG2 (hazard ratio 1.24, 95% CI 1.09–1.40, P = 7 × 10−4). Overall, there were nominal associations with diabetes incidence at 85 SNPs and nominal interactions with the metformin and lifestyle interventions at 91 and 69 mostly nonoverlapping SNPs, respectively. The lowest P values were consistent with experiment-wide 33% false discovery rates.
We have identified potential genetic determinants of metformin response. These results merit confirmation in independent samples.
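The false discovery rate assessment mentioned in the methods above can be illustrated with a short sketch. The abstract does not say which FDR procedure was used, so the Benjamini-Hochberg step-up adjustment shown here is an assumption, and the function name and example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (q-values), in input order.
    Assumption: BH is used only as a common example of an FDR procedure."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values

# Hypothetical P values for four SNPs.
example_q = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

A q-value of 0.33 for a SNP would mean that, among all SNPs with P values at least as small, roughly a third are expected to be false discoveries, which is how a statement such as "33% false discovery rate" is usually read.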
Rare coding variants constitute an important class of human genetic variation, but are underrepresented in current databases that are based on small population samples. Recent studies show that variants altering amino acid sequence and protein function are enriched at low variant allele frequency, 2 to 5%, but because of insufficient sample size it is not clear if the same trend holds for rare variants below 1% allele frequency.
The 1000 Genomes Exon Pilot Project has collected deep-coverage exon-capture data in roughly 1,000 human genes, for nearly 700 samples. Although medical whole-exome projects are currently afoot, this is still the deepest reported sampling of a large number of human genes with next-generation technologies. According to the goals of the 1000 Genomes Project, we created effective informatics pipelines to process and analyze the data, and discovered 12,758 exonic SNPs, 70% of them novel, and 74% below 1% allele frequency in the seven population samples we examined. Our analysis confirms that coding variants below 1% allele frequency show increased population-specificity and are enriched for functional variants.
This study represents a large step toward detecting and interpreting low frequency coding variation, clearly lays out technical steps for effective analysis of DNA capture data, and articulates functional and population properties of this important class of genetic variation.
The number and volume of cells in the blood affect a wide range of disorders including cancer and cardiovascular, metabolic, infectious and immune conditions. We consider here the genetic variation in eight clinically relevant hematological parameters, including hemoglobin levels, red and white blood cell counts and platelet counts and volume. We describe common variants within 22 genetic loci reproducibly associated with these hematological parameters in 13,943 samples from six European population-based studies, including 6 associated with red blood cell parameters, 15 associated with platelet parameters and 1 associated with total white blood cell count. We further identified a long-range haplotype at 12q24 associated with coronary artery disease in 9,479 cases and 10,527 controls. We show that this haplotype demonstrates extensive disease pleiotropy, as it contains known risk loci for type 1 diabetes, hypertension and celiac disease and has been spread by a selective sweep specific to European and geographically nearby populations.
We sequenced all protein-coding regions of the genome (the “exome”) in two family members with combined hypolipidemia, marked by extremely low plasma levels of low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, and triglycerides. These two participants were compound heterozygotes for two distinct nonsense mutations in ANGPTL3 (encoding the angiopoietin-like 3 protein). ANGPTL3 has been reported to inhibit lipoprotein lipase and endothelial lipase, thereby increasing plasma triglyceride and HDL cholesterol levels in rodents. Our finding of ANGPTL3 mutations highlights a role for the gene in LDL cholesterol metabolism in humans and shows the usefulness of exome sequencing for identification of novel genetic causes of inherited disorders. (Funded by the National Human Genome Research Institute and others.)
The CHEK2-1100delC mutation is recurrent in the population and is a moderate risk factor for breast cancer. To identify additional CHEK2 mutations potentially contributing to breast cancer susceptibility, we sequenced 248 cases with early-onset disease, functionally characterized new variants, and conducted a population-based case–control analysis to evaluate their contribution to breast cancer risk. We identified 1 additional null mutation and 5 missense variants in the germline of cancer patients. In vitro, the CHEK2-H143Y variant resulted in gross protein destabilization, while others had variable suppression of in vitro kinase activity using BRCA1 as a substrate. The germline CHEK2-1100delC mutation was present among 8/1,646 (0.5%) sporadic, 2/400 (0.5%) early-onset and 3/302 (1%) familial breast cancer cases, but undetectable amongst 2,105 multiethnic controls, including 633 from the US. CHEK2-positive breast cancer families also carried a deleterious BRCA1 mutation. 1100delC appears to be the only recurrent CHEK2 mutation associated with a potentially significant contribution to breast cancer risk in the general population. Another recurrent mutation with attenuated in vitro function, CHEK2-P85L, is not associated with increased breast cancer susceptibility, but exhibits a striking difference in frequency across populations with different ancestral histories. These observations illustrate the importance of genotyping ethnically diverse groups when assessing the impact of low-penetrance susceptibility alleles on population risk. Our findings highlight the notion that clinical testing for rare missense mutations within CHEK2 may have limited value in predicting breast cancer risk, but that testing for the 1100delC variant may be valuable in phenotypically- and geographically-selected populations.
CHEK2; susceptibility; breast; cancer; mutation
Discovering the molecular basis of mitochondrial respiratory chain disease is challenging given the large number of both mitochondrial and nuclear genes involved. We report a strategy of focused candidate gene prediction, high-throughput sequencing, and experimental validation to uncover the molecular basis of mitochondrial complex I (CI) disorders. We created five pools of DNA from a cohort of 103 patients and then performed deep sequencing of 103 candidate genes to spotlight 151 rare variants predicted to impact protein function. We used confirmatory experiments to establish genetic diagnoses in 22% of previously unsolved cases, and discovered that defects in NUBPL and FOXRED1 can cause CI deficiency. Our study illustrates how large-scale sequencing, coupled with functional prediction and experimental validation, can reveal novel disease-causing mutations in individual patients.
Technological advances make it possible to use high-throughput sequencing as a primary discovery tool of medical genetics, specifically for assaying rare variation. Still this approach faces the analytic challenge that the influence of very rare variants can only be evaluated effectively as a group. A further complication is that any given rare variant could have no effect, could increase risk, or could be protective. We propose here the C-alpha test statistic as a novel approach for testing for the presence of this mixture of effects across a set of rare variants. Unlike existing burden tests, C-alpha, by testing the variance rather than the mean, maintains consistent power when the target set contains both risk and protective variants. Through simulations and analysis of case/control data, we demonstrate good power relative to existing methods that assess the burden of rare variants in individuals.
Developments in sequencing technology now enable us to assay all genetic variation, much of which is extremely rare. We propose to test the distribution of rare variants we observe in cases versus controls. To do so, we present a novel application of the C-alpha statistic to test these rare variants. C-alpha aims to determine whether the set of variants observed in cases and controls is a mixture, such that some of the variants confer risk or protection or are phenotypically neutral. Risk variants are expected to be more common in cases; protective variants more common in controls. C-alpha is sensitive to this imbalance, regardless of its origin (risk, protective, or both), but is ideally suited for a mixture of protective and risk variants. Variation in APOB nicely illustrates a mixture, in that certain rare variants increase triglyceride levels while others decrease them. The hallmark feature of C-alpha is that it uses the distribution of variation observed in cases and controls to detect the presence of a mixture, thus implicating genes or pathways as risk factors for disease.
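The idea of testing the variance rather than the mean can be made concrete with a small sketch. Neither passage above gives the formula, so the version below is an assumption: it uses a binomial overdispersion form in which each variant's case count is compared with a Binomial(n, p0) null, where p0 is the fraction of sampled individuals who are cases. The function name, argument names, and example counts are hypothetical, and singleton pooling and permutation-based P values are deliberately left out.

```python
from math import comb, erfc, sqrt

def c_alpha(case_counts, total_counts, p0):
    """Overdispersion-style statistic across a set of rare variants.

    case_counts[i]  : copies of variant i seen in cases (hypothetical input)
    total_counts[i] : copies of variant i seen in cases plus controls
    p0              : fraction of sampled individuals who are cases
    Returns (T, Z, one-sided P from a normal approximation).
    """
    t_stat = 0.0
    variance = 0.0
    for y, n in zip(case_counts, total_counts):
        null_var = n * p0 * (1 - p0)
        # Excess of the squared deviation over its binomial expectation.
        t_stat += (y - n * p0) ** 2 - null_var
        # Null variance of that term, summed over all possible case counts u.
        for u in range(n + 1):
            prob = comb(n, u) * p0 ** u * (1 - p0) ** (n - u)
            variance += ((u - n * p0) ** 2 - null_var) ** 2 * prob
    if variance == 0:
        raise ValueError("null variance is zero; variants are uninformative")
    z = t_stat / sqrt(variance)
    p_value = 0.5 * erfc(z / sqrt(2))  # upper-tail normal probability
    return t_stat, z, p_value

# Hypothetical mixture: two variants seen only in cases, two only in controls.
t_mix, z_mix, p_mix = c_alpha([6, 0, 5, 0], [6, 6, 5, 5], p0=0.5)
```

In this made-up mixture the summed case counts (11 of 22 copies) match the null expectation exactly, so a test of the mean would see nothing, while the squared deviations still push the statistic upward.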
Investigators have linked rare copy number variants (CNVs) to neuropsychiatric diseases, such as schizophrenia. One hypothesis is that CNV events cause disease by affecting genes with specific brain functions. Under these circumstances, we expect that CNV events in cases should impact brain-function genes more frequently than those events in controls. Previous publications have applied “pathway” analyses to genes within neuropsychiatric case CNVs to show enrichment for brain functions. While such analyses have been suggestive, they often have not rigorously compared the rates of CNVs impacting genes with brain function in cases to controls, and therefore do not address important confounders such as the large size of brain genes and overall differences in rates and sizes of CNVs. To demonstrate the potential impact of confounders, we genotyped rare CNV events in 2,415 unaffected controls with Affymetrix 6.0; we then applied standard pathway analyses using four sets of brain-function genes and observed an apparently highly significant enrichment for each set. The enrichment is simply driven by the large size of brain-function genes. Instead, we propose a case-control statistical test, cnv-enrichment-test, to compare the rate of CNVs impacting specific gene sets in cases versus controls. With simulations, we demonstrate that cnv-enrichment-test is robust to case-control differences in CNV size, CNV rate, and systematic differences in gene size. Finally, we apply cnv-enrichment-test to rare CNV events published by the International Schizophrenia Consortium (ISC). This approach reveals nominal evidence of case-association in the neuronal-activity and learning gene sets, but not the other two examined gene sets. The neuronal-activity genes have been associated in a separate set of schizophrenia cases and controls; however, testing in independent samples is necessary to definitively confirm this association. Our method is implemented in the PLINK software package.
Specific rare deletion and duplication events in the genome have now been shown to be associated with neuropsychiatric diseases, for example 16p11.2 with autism and 22q11.21 with schizophrenia. However, controversy remains as to whether rare events impacting certain pathways as a group increase the risk of disease, and if so, what those pathways are. Other studies have used standard gene-set enrichment approaches to demonstrate that events discovered in cases contain more genes in neuro-developmental pathways than would be expected by chance. However, these analyses do not explicitly compare the relative enrichment in cases to any enrichment that may also be present in controls. Therefore, they can be confounded by the large size of brain genes or by larger size or frequency of CNVs in cases. Here we propose a case-control statistical test to assess whether a key pathway is differentially impacted by CNVs in cases compared to controls. Our approach is robust to skewed gene sizes and case-control differences in CNV rate and size.
For most associations of common single nucleotide polymorphisms (SNPs) with common diseases, the genetic model of inheritance is unknown. The authors extended and applied a Bayesian meta-analysis approach to data from 19 studies on 17 replicated associations with type 2 diabetes. For 13 SNPs, the data fitted very well to an additive model of inheritance for the diabetes risk allele; for 4 SNPs, the data were consistent with either an additive model or a dominant model; and for 2 SNPs, the data were consistent with an additive or recessive model. Results were robust to the use of different priors and after exclusion of data for which index SNPs had been examined indirectly through proxy markers. The Bayesian meta-analysis model yielded point estimates for the genetic effects that were very similar to those previously reported based on fixed- or random-effects models, but uncertainty about several of the effects was substantially larger. The authors also examined the extent of between-study heterogeneity in the genetic model and found generally small between-study deviation values for the genetic model parameter. Heterosis could not be excluded for 4 SNPs. Information on the genetic model of robustly replicated association signals derived from genome-wide association studies may be useful for predictive modeling and for designing biologic and functional experiments.
Bayes theorem; diabetes mellitus, type 2; meta-analysis; models, genetic; polymorphism, genetic; population characteristics
It has recently been hypothesized that many of the signals detected in genome-wide association studies (GWAS) for type 2 diabetes (T2D) and other diseases, despite being observed at common variants, might in fact result from causal mutations that are rare. One prediction of this hypothesis is that the allelic associations should be population-specific, as the causal mutations arose after the migrations that established different populations around the world. We selected 19 common variants found to be reproducibly associated with T2D risk in European populations and studied them in a large multiethnic case-control study (6,142 cases and 7,403 controls) among men and women from 5 racial/ethnic groups (European Americans, African Americans, Latinos, Japanese Americans, and Native Hawaiians). In analysis pooled across ethnic groups, the allelic associations were in the same direction as the original report for all 19 variants, and 14 of the 19 were significantly associated with risk. When the number of risk alleles was summed for each individual, the per-allele associations were highly statistically significant (P<10−4) and similar in all populations (odds ratios 1.09–1.12), except in Japanese Americans, in whom the estimated effect per allele was larger than in the other populations (odds ratio 1.20; Phet = 3.8×10−4). We did not observe ethnic differences in the distribution of risk that would explain the increased prevalence of type 2 diabetes in these groups as compared to European Americans. The consistency of allelic associations in diverse racial/ethnic groups is not predicted under the hypothesis of Goldstein regarding “synthetic associations” of rare mutations in T2D.
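The risk-allele summation and per-allele odds ratio described above can be illustrated with a minimal sketch. It assumes an unadjusted logistic regression of case status on the risk-allele count, fitted by Newton's method; the abstract does not state the covariates or software used in the study, and the function names are hypothetical.

```python
import math

def risk_allele_count(genotypes):
    """Sum risk alleles (0, 1, or 2 per variant) across variants for one person."""
    return sum(genotypes)

def per_allele_odds_ratio(counts, is_case, n_iter=25):
    """Fit logit P(case) = b0 + b1 * count by Newton's method; return exp(b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(counts, is_case):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g0 += y - p
            g1 += (y - p) * x
            # Entries of the (negated) Hessian, X'WX.
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(X'WX) * gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return math.exp(b1)
```

For example, `risk_allele_count([0, 1, 2, 1])` gives the score for a person genotyped at four variants, and `per_allele_odds_ratio` takes one such score and one 0/1 case indicator per person.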
Single rare causal alleles and/or collections of multiple rare alleles have been suggested to create “synthetic associations” with common variants in genome-wide association studies (GWAS). This model predicts that associations with common variants will not be consistent across populations. In this study, we examined 19 T2D variants for association with T2D risk in 6,142 cases and 7,403 controls from five racial/ethnic populations in the Multiethnic Cohort (European Americans, African Americans, Latinos, Japanese Americans, and Native Hawaiians). In racial/ethnic pooled analysis, all 19 variants were associated with T2D risk in the same direction as previous reports in Europeans, and the sum total of risk variants was significantly associated with T2D risk in each racial/ethnic group. The consistent associations across populations do not support the Goldstein hypothesis that rare causal alleles underlie GWAS signals. We also did not find evidence that these markers underlie racial/ethnic disparities in T2D prevalence. Large-scale GWAS and sequencing studies in these populations are necessary in order to both improve the current set of markers at these risk loci and identify new risk variants for T2D that may be difficult, or impossible, to detect in European populations.
Mitochondrial dysfunction has been observed in skeletal muscle of people with diabetes and insulin-resistant individuals. Furthermore, inherited mutations in mitochondrial DNA can cause a rare form of diabetes. However, it is unclear whether mitochondrial dysfunction is a primary cause of the common form of diabetes. To date, common genetic variants robustly associated with type 2 diabetes (T2D) are not known to affect mitochondrial function. One possibility is that multiple mitochondrial genes contain modest genetic effects that collectively influence T2D risk. To test this hypothesis we developed a method named Meta-Analysis Gene-set Enrichment of variaNT Associations (MAGENTA; http://www.broadinstitute.org/mpg/magenta). MAGENTA, in analogy to Gene Set Enrichment Analysis, tests whether sets of functionally related genes are enriched for associations with a polygenic disease or trait. MAGENTA was specifically designed to exploit the statistical power of large genome-wide association (GWA) study meta-analyses whose individual genotypes are not available. This is achieved by combining variant association p-values into gene scores and then correcting for confounders, such as gene size, variant number, and linkage disequilibrium properties. Using simulations, we determined the range of parameters for which MAGENTA can detect associations likely missed by single-marker analysis. We verified MAGENTA's performance on empirical data by identifying known relevant pathways in lipid and lipoprotein GWA meta-analyses. We then tested our mitochondrial hypothesis by applying MAGENTA to three gene sets: nuclear regulators of mitochondrial genes, oxidative phosphorylation genes, and ∼1,000 nuclear-encoded mitochondrial genes. The analysis was performed using the most recent T2D GWA meta-analysis of 47,117 people and meta-analyses of seven diabetes-related glycemic traits (up to 46,186 non-diabetic individuals). 
This well-powered analysis found no significant enrichment of associations to T2D or any of the glycemic traits in any of the gene sets tested. These results suggest that common variants affecting nuclear-encoded mitochondrial genes have at most a small genetic contribution to T2D susceptibility.
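The general workflow described above (variant p-values combined into gene scores, correction for a confounder, then a gene-set enrichment test) can be sketched as follows. This is a simplified illustration and not the MAGENTA implementation. The scoring rule (best variant p-value per gene), the single-confounder linear correction, the top-5% cutoff, and all function names are assumptions made for the sketch.

```python
import math
import random

def gene_scores(variant_pvalues_by_gene):
    """Score each gene by -log10 of its smallest variant p-value (assumed rule)."""
    return {g: -math.log10(min(ps)) for g, ps in variant_pvalues_by_gene.items()}

def adjust_for_confounder(scores, confounder):
    """Return residual gene scores after a least-squares fit on one confounder,
    for example the number of variants per gene."""
    genes = list(scores)
    x = [confounder[g] for g in genes]
    y = [scores[g] for g in genes]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return {g: yi - (my + slope * (xi - mx)) for g, xi, yi in zip(genes, x, y)}

def enrichment_pvalue(adjusted, gene_set, top_fraction=0.05, n_perm=5000, seed=0):
    """Compare the number of gene-set members above a score cutoff with the
    counts seen in random gene sets of the same size."""
    rng = random.Random(seed)
    genes = list(adjusted)
    ranked = sorted(adjusted.values(), reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    cutoff = ranked[n_top - 1]
    members = [g for g in gene_set if g in adjusted]
    observed = sum(adjusted[g] >= cutoff for g in members)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(members))
        if sum(adjusted[g] >= cutoff for g in sample) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Only summary-level p-values are needed as input, which matches the stated design goal of working with meta-analyses whose individual genotypes are not available.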
Mitochondria play a crucial role in metabolic homeostasis, and alteration of mitochondrial function is a hallmark of diabetes. While mitochondrial activity is reduced in people with diabetes, it is unclear whether mitochondrial dysfunction is a cause or effect of type 2 diabetes. Genome-wide association studies for type 2 diabetes have explained ≈10% of the heritability of the disease, but none of the loci are known to affect mitochondrial activity. It is possible though that a mitochondrial contribution is hidden in the remaining 90%. Hence, we tested the hypothesis that multiple mitochondria-related genes encoded in the nucleus, each having a weak effect (hard to detect individually), can collectively influence type 2 diabetes. To address this, we developed a computational method (MAGENTA) that allowed us to adequately analyze large collective datasets of human genetic variation obtained from collaborative studies of type 2 diabetes and related glycemic traits. Despite the increased sensitivity of MAGENTA compared to single-DNA variant analysis, we found no support for a causal relationship between mitochondrial dysfunction and type 2 diabetes. These results may help steer future efforts in understanding the pathogenesis of the disease. MAGENTA is broadly applicable to testing associations between other biological pathways and common diseases or traits.
For most associations of common polymorphisms with common diseases, the genetic model of inheritance is unknown. We extended and applied a Bayesian meta-analysis approach to data from 19 studies on 17 replicated associations for type 2 diabetes. For 13 polymorphisms, the data fit very well to an additive model, for 4 polymorphisms the data were consistent with either an additive or dominant model, and for 2 polymorphisms with an additive or recessive model of inheritance for the diabetes risk allele. Results were robust to using different priors and after excluding data where index polymorphisms had been examined indirectly through proxy markers. The Bayesian meta-analysis model yielded point estimates for the genetic effects that are very similar to those previously reported based on fixed or random effects models, but uncertainty about several of the effects was substantially larger. We also examined the extent of between-study heterogeneity in the genetic model and found generally small values of the between-study deviation for the genetic model parameter. Heterosis could not be excluded in 4 SNPs. Information on the genetic model of robustly replicated GWA-derived association signals may be useful for predictive modeling, and for designing biological and functional experiments.
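To make the idea of a genetic model parameter concrete, the sketch below assumes the common "genetic model-free" parameterization, in which the parameter is the ratio of the heterozygote log odds ratio to the risk-allele homozygote log odds ratio. The abstract does not define the parameter, so this definition and the function name are assumptions. Under this definition a value of 0 corresponds to a recessive model, 0.5 to an additive model on the log-odds scale, 1 to a dominant model, and values outside the range 0 to 1 to heterosis.

```python
import math

def genetic_model_lambda(case_counts, control_counts):
    """Ratio of heterozygote to homozygote log odds ratios from genotype counts.

    Counts are ordered (non-risk homozygote, heterozygote, risk homozygote).
    All counts must be positive, and the homozygote odds ratio must differ from 1.
    """
    a0, a1, a2 = case_counts
    c0, c1, c2 = control_counts
    log_or_het = math.log((a1 * c0) / (a0 * c1))
    log_or_hom = math.log((a2 * c0) / (a0 * c2))
    return log_or_het / log_or_hom
```

A single-study point estimate like this is very noisy when the homozygote odds ratio is close to 1, which is one reason to pool studies in a meta-analysis rather than estimate the parameter study by study.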
Cardiac conduction, as assessed by electrocardiographic PR interval and QRS duration, is an important electrophysiological trait and a determinant of arrhythmia risk.
We sought to identify common genetic determinants of these measures of cardiac conduction time.
We examined 1604 individuals from the island of Kosrae, Federated States of Micronesia, an isolated founder population. We adjusted for covariates and estimated the heritability of quantitative electrocardiographic QRS duration, PR interval and secondarily its subcomponents P wave duration and PR segment. Finally, we performed a genome-wide association study (GWAS) in a subset of 1262 individuals genotyped using the Affymetrix GeneChip Human Mapping 500K microarray.
The heritability of PR interval was 34% (SE 5%, p=4×10−18), of PR segment 31% (SE 6%, p=3.2×10−13) and P wave duration 17% (SE 5%, p=5.8×10−6) but for QRS duration only 3% (SE 4%, p=0.20). Hence, GWAS was performed only for PR interval and its subcomponents. A total of 338,049 SNPs passed quality criteria. For PR interval, the most significantly associated SNPs were located in and downstream of the alpha-subunit of the cardiac voltage-gated sodium channel gene SCN5A with a 4.8 msec (SE 1.0) or 0.23 standard deviation increase in adjusted PR interval for each minor allele copy of rs7638909 (p=1.6×10−6, minor allele frequency 0.40). These SNPs were also associated with P wave duration (p=1.5×10−4) and PR segment (p=0.01) but not with QRS duration (p≥0.22).
PR interval and its subcomponents showed substantial heritability in a South Pacific islander population and were associated with common genetic variation in SCN5A.
Conduction; electrocardiography; electrophysiology; genetics; ion channels
The architecture of natural variation present in a contemporary population is a result of multiple population genetic forces, including population bottleneck and expansion, selection, drift, and admixture. We seek to untangle the contribution of admixture to genetic diversity on the Micronesian island of Kosrae. Toward this goal, we used a complete genetic approach by combining a dense genome-wide map of 100 000 single-nucleotide polymorphisms (SNPs) with data from uniparental markers from the mitochondrial genome and the nonrecombining portion of the Y chromosome. These markers were typed in ∼3200 individuals from Kosrae, representing 80% of the adult population of the island. We developed novel software that uses SNP data to delineate ancestry for individual segments of the genome. Through this analysis, we determined that 39% of Kosraens have some European ancestry. However, the vast majority of admixed individuals (77%) have European alleles spanning less than 10% of their genomes. Data from uniparental markers show most of this admixture to be male, introduced in the late nineteenth century. Furthermore, pedigree analysis shows that the majority of European admixture on Kosrae is because of the contribution of one individual. This approach shows the benefit of combining information from autosomal and uniparental polymorphisms and provides new methodology for determining ancestry in a population.
admixture; haplotype; population genetics; mitochondria; Y chromosome; Oceania