DNA sequencing identifies common and rare genetic variants for association studies, but studies typically focus on variants in nuclear DNA and ignore the mitochondrial genome. Analyzing variants in mitochondrial DNA (mtDNA) sequences presents special problems, which we resolve here with a general solution for the analysis of mtDNA in next-generation sequencing studies. The new program package comprises 1) an algorithm that identifies mtDNA variants (i.e., homoplasmies and heteroplasmies), incorporating the sequencing error rate at each base in a likelihood calculation and allowing allele fractions at a variant site to differ across individuals; and 2) a method that estimates mtDNA copy number per cell directly from whole-genome sequencing data. We apply the methods to DNA sequence from lymphocytes of ~2,000 SardiNIA Project participants. As expected, mothers and offspring share all homoplasmies but a smaller proportion of heteroplasmies. Both homoplasmies and heteroplasmies show 5-fold higher transition/transversion ratios than variants in nuclear DNA. Heteroplasmy also increases with age, though on average only ~1 heteroplasmy reaches the 4% level between ages 20 and 90. In addition, we find that mtDNA copy number averages ~110 copies per lymphocyte and is ~54% heritable, implying substantial genetic regulation of the level of mtDNA. Copy number also decreases modestly but significantly with age, and females on average have significantly more copies than males. Copy number is significantly associated with waist circumference (p = 0.0031) and waist-hip ratio (p = 2.4 × 10−5), but not with body mass index, indicating an association with central fat distribution. To our knowledge, this is the largest population analysis to date of mtDNA dynamics, revealing the age-related increase in heteroplasmy, the relatively high heritability of copy number, and the association of copy number with metabolic traits.
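The variant-calling idea can be illustrated with a minimal per-site likelihood sketch. It assumes a single site with two candidate alleles, Phred-scaled base qualities, and sequencing errors spread evenly over the three other bases; the function names and the grid search over the allele fraction are illustrative choices, not the package's actual implementation.

```python
import math
from typing import List, Tuple

Read = Tuple[str, int]  # (observed base, Phred base quality)


def base_prob(observed: str, true_allele: str, qual: int) -> float:
    """P(observed base | true allele) under a Phred-scaled error model.

    Assumption: an error turns the true base into each of the other three
    bases with equal probability.
    """
    err = 10 ** (-qual / 10)
    return 1 - err if observed == true_allele else err / 3


def site_log_likelihood(reads: List[Read], major: str, minor: str, f: float) -> float:
    """Log-likelihood of one individual's pileup when a fraction f of mtDNA copies carry `minor`."""
    ll = 0.0
    for base, qual in reads:
        # A read comes from a major-allele copy with prob. 1 - f, a minor-allele copy with prob. f.
        p = (1 - f) * base_prob(base, major, qual) + f * base_prob(base, minor, qual)
        ll += math.log(p)
    return ll


def call_site(reads: List[Read], major: str, minor: str, steps: int = 1000) -> Tuple[float, float]:
    """Return (maximum-likelihood allele fraction, likelihood-ratio statistic vs. the best homoplasmy)."""
    homoplasmy_ll = max(site_log_likelihood(reads, major, minor, f) for f in (0.0, 1.0))
    best_f, best_ll = 0.0, -math.inf
    for i in range(steps + 1):  # grid search over f in [0, 1], endpoints included
        f = i / steps
        ll = site_log_likelihood(reads, major, minor, f)
        if ll > best_ll:
            best_f, best_ll = f, ll
    return best_f, 2 * (best_ll - homoplasmy_ll)
```

A large statistic relative to the homoplasmic fits points to a heteroplasmy at that individual's estimated fraction; because the fraction is fitted separately for each person, it can differ across individuals at the same site.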
We present a new program package that provides a general solution for the analysis of variation in mtDNA (the small circular genome in mitochondria, separate from the DNA in the nucleus). This is needed because many large-scale genetic studies use new DNA sequencing technologies to assess genetic variation and its effects on disease, but the mitochondrial genome is often ignored because it exists in many copies per cell, which complicates analysis. Our approach both identifies variants in the mitochondrial genome and estimates mtDNA copy number. Applying the programs to DNA sequence from ~2,000 SardiNIA Project participants, we show that heteroplasmies (sites at which an individual's mtDNA copies carry more than one allele) increase with age, and that copy number is relatively highly heritable and correlated with metabolic traits, particularly central fat levels. The program package can facilitate comprehensive mtDNA analysis from any whole-genome sequencing data, improving understanding of mtDNA dynamics and their potential role in aging and metabolism.
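Copy-number estimation from whole-genome data is often sketched as a depth ratio: each mtDNA copy is haploid while the nuclear autosomes are diploid, so twice the ratio of mean mtDNA depth to mean autosomal depth approximates copies per cell. The sketch below assumes that simple estimator (the package's own estimator may apply further corrections), and the function names are hypothetical.

```python
MT_GENOME_LENGTH = 16569  # length of the revised Cambridge Reference Sequence (rCRS), in bp


def mean_depth(n_reads: int, read_length: int, region_length: int) -> float:
    """Average sequencing depth over a region from the number of reads mapped to it."""
    return n_reads * read_length / region_length


def mtdna_copies_per_cell(mt_depth: float, autosomal_depth: float, nuclear_ploidy: int = 2) -> float:
    """Approximate mtDNA copies per cell; each cell holds `nuclear_ploidy` copies of each autosome."""
    if autosomal_depth <= 0:
        raise ValueError("autosomal depth must be positive")
    return nuclear_ploidy * mt_depth / autosomal_depth
```

For example, an illustrative sample (depths not taken from the study) with 4x autosomal depth and 220x mean mtDNA depth gives `mtdna_copies_per_cell(220, 4) == 110.0`.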
Advances in exome sequencing and the development of exome genotyping arrays are enabling explorations of association between rare coding variants and complex traits. To increase power for these rare variant analyses, a variety of association tests that group variants by gene or functional unit have been proposed. Here, we extend these tests to family-based studies. We develop family-based burden tests, variable frequency threshold tests and sequence kernel association tests (SKAT). Through simulations we compare the performance of different tests. We describe situations where family-based studies provide greater power than studies of unrelated individuals to detect rare variants associated with moderate to large changes in trait values. Broadly speaking, we find that when sample sizes are limited and only a modest fraction of all trait-associated variants can be identified, family samples are more powerful. Finally, we illustrate our approach by analyzing the relationship between coding variants and HDL cholesterol in 11,556 individuals from the HUNT and SardiNIA studies, demonstrating association for coding variants in the APOC3, CETP, LIPC, LIPG, and LPL genes and illustrating the value of family samples, meta-analysis and gene-level tests. Our methods are implemented in freely available C++ code.
Rare variant; association; family sample; population sample; gene-level tests; study design; meta-analysis
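A family-based burden test can be sketched as a generalized least squares score test in which relatedness enters through the phenotypic covariance V = 2Φσ²g + Iσ²e, with Φ the kinship matrix. This sketch assumes the variance components are already estimated and treated as known; it illustrates the idea and is not the authors' C++ implementation, and all names are hypothetical.

```python
import math

import numpy as np


def burden_scores(genotypes: np.ndarray, maf: np.ndarray, threshold: float = 0.01) -> np.ndarray:
    """Per-person count of minor alleles over variants with MAF below `threshold`.

    genotypes: n x m matrix of minor-allele counts (0, 1, 2) for one gene.
    """
    return genotypes[:, maf < threshold].sum(axis=1)


def phenotypic_covariance(kinship: np.ndarray, sigma_g2: float, sigma_e2: float) -> np.ndarray:
    """V = 2 * kinship * sigma_g2 + I * sigma_e2 (variance components assumed known)."""
    n = kinship.shape[0]
    return 2.0 * kinship * sigma_g2 + np.eye(n) * sigma_e2


def family_burden_test(y: np.ndarray, X: np.ndarray, burden: np.ndarray, V: np.ndarray):
    """GLS score test of a burden score; returns (1-df chi-square statistic, p-value)."""
    V_inv = np.linalg.inv(V)
    xtvx = X.T @ V_inv @ X
    # P = V^-1 - V^-1 X (X' V^-1 X)^-1 X' V^-1 removes the null-model covariates.
    P = V_inv - V_inv @ X @ np.linalg.solve(xtvx, X.T @ V_inv)
    score = burden @ P @ y           # score for the burden effect
    score_var = burden @ P @ burden  # its variance under the null model
    stat = float(score ** 2 / score_var)
    return stat, math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square(1)
```

With unrelated individuals, the kinship matrix is 0.5 times the identity and the test reduces to an ordinary regression-based burden score test.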
Substance use is heritable but few common genetic variants have been associated with these behaviors. Rare non-synonymous exonic variants can now be efficiently genotyped, allowing exome-wide association tests. We identified and tested nonsynonymous variants for association with behavioral disinhibition and the use/misuse of nicotine, alcohol, and illicit drugs.
We performed comprehensive genotyping of exonic variation, combined with single-variant and gene-based tests of association, in 7181 individuals; 172 candidate addiction genes were evaluated in greater detail. We also evaluated the aggregate effects of nonsynonymous variants on these phenotypes using GCTA.
No variant or gene was significantly associated with any phenotype. No association was found for any of the 172 candidate genes, even at reduced significance thresholds. All nonsynonymous variants jointly accounted for 35% of the heritability in illicit drug use and, when combined with common variants from a genome-wide array, accounted for 84% of the heritability.
Rare nonsynonymous variants may be important in the etiology of illicit drug use, but detection of individual variants will require very large samples.
Behavioral Disinhibition; Addiction; Exome; Nonsynonymous; Tobacco; Alcohol; Drug
Despite progress in identifying genes associated with breast cancer, many more risk loci exist. Genome-wide association analyses in genetically homogeneous populations, such as that of Sardinia (Italy), could represent an additional approach to detect low-penetrance alleles.
We performed a genome-wide association study comparing 1431 Sardinian patients with non-familial, BRCA1/2-mutation-negative breast cancer to 2171 healthy Sardinian blood donors. DNA was genotyped using GeneChip Human Mapping 500 K Arrays or Genome-Wide Human SNP Arrays 6.0. To increase genomic coverage, genotypes of additional SNPs were imputed using data from HapMap Phase II. After quality control filtering of genotype data, 1367 cases (9 men) and 1658 controls (1156 men) were analyzed on a total of 2,067,645 SNPs.
Overall, 33 genomic regions (67 candidate SNPs) were associated with breast cancer risk at the p < 10−6 level. Twenty of these regions contained defined genes, including one already associated with breast cancer risk, TOX3. Lowering the threshold for preliminary significance to p < 10−5, we identified 11 additional SNPs in FGFR2, a well-established breast cancer-associated gene. Excluding SNPs already associated with breast cancer, we selected ten candidate SNPs for technical validation and for replication in 1668 samples from the same population. Only SNP rs345299, located in intron 1 of VAV3, remained suggestively associated (p = 1.16 × 10−5), but it was not associated with breast cancer risk in pooled data from two large, mixed-population cohorts.
This study indicated the role of TOX3 and FGFR2 as breast cancer susceptibility genes in BRCA1/2-wild-type breast cancer patients from the Sardinian population.
Electronic supplementary material
The online version of this article (doi:10.1186/s12885-015-1392-9) contains supplementary material, which is available to authorized users.
Breast cancer risk; BRCA1/2 mutation analysis; Genome-wide association study; Sardinian population
Sequencing studies of exonic regions aim to identify rare variants contributing to complex traits. With high coverage and large sample size, these studies tend to apply simple variant calling algorithms. However, coverage is often heterogeneous; sites with insufficient coverage may benefit from sophisticated calling algorithms used in low-coverage sequencing studies. We evaluate the potential benefits of different calling strategies by performing a comparative analysis of variant calling methods on exonic data from 202 genes sequenced at 24x in 7,842 individuals. We call variants using individual-based, population-based and linkage disequilibrium (LD)-aware methods with stringent quality control. We measure genotype accuracy by the concordance with on-target GWAS genotypes and between 80 pairs of sequencing replicates. We validate selected singleton variants using capillary sequencing.
Using these calling methods, we detected over 27,500 variants at the targeted exons; >57% were singletons. The singletons identified by individual-based analyses were of the highest quality. However, individual-based analyses generated more missing genotypes (4.72%) than population-based (0.47%) and LD-aware (0.17%) analyses. Moreover, individual-based genotypes were the least concordant with array-based genotypes and replicates. Population-based genotypes were less concordant than genotypes from LD-aware analyses with extended haplotypes. We reanalyzed the same dataset with a second set of callers and showed again that the individual-based caller identified more high-quality singletons than the population-based caller. We also replicated this result in a second dataset of 57 genes sequenced at 127.5x in 3,124 individuals.
We recommend population-based analyses for high quality variant calls with few missing genotypes. With extended haplotypes, LD-aware methods generate the most accurate and complete genotypes. In addition, individual-based analyses should complement the above methods to obtain the most singleton variants.
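The completeness and accuracy metrics compared above (missing-genotype rate and concordance with array genotypes or replicates) can be computed as sketched below, assuming genotypes coded as alternate-allele counts with None for a missing call. The non-reference form of concordance is a common refinement assumed here for illustration; the study's exact definitions may differ.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # alternate-allele count 0/1/2, or None when the call is missing


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of genotype calls that are missing."""
    return sum(c is None for c in calls) / len(calls)


def concordance(a: Sequence[Genotype], b: Sequence[Genotype], nonref_only: bool = False) -> float:
    """Fraction of matching genotypes among sites called in both sets.

    With nonref_only=True, only sites where at least one set carries a
    non-reference genotype are compared, so the many homozygous-reference
    sites cannot inflate the rate.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if nonref_only:
        pairs = [(x, y) for x, y in pairs if x != 0 or y != 0]
    if not pairs:
        raise ValueError("no comparable genotypes")
    return sum(x == y for x, y in pairs) / len(pairs)
```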
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0489-0) contains supplementary material, which is available to authorized users.
Next-generation sequencing; Targeted sequencing; Variant calling
We previously reported linkage of schizophrenia and schizoaffective disorder to 13q32–34 in the European-descent Afrikaner population from South Africa. The nature of genetic variation underlying linkage peaks in psychiatric disorders remains largely unknown, and both rare and common variants may be contributing. Here, we examine the contribution of common variants located under the 13q32–34 linkage region. We used densely spaced SNPs to fine map the linkage peak region using both a discovery sample of 415 families and a meta-analysis incorporating two additional replication family samples. In a second phase of the study, we used one family-based data set with 237 families and independent case–control data sets for fine mapping of the common variant association signal using HapMap SNPs. We report a significant association with a genetic variant (rs9583277) within the gene encoding the myosin heavy-chain Myr 8 (MYO16), which has been implicated in neuronal phosphoinositide 3-kinase signaling. Follow-up analysis of HapMap variation within MYO16 in a second set of Afrikaner families and additional case–control data sets of European descent highlighted a region across introns 2–6 as the most likely region to harbor common MYO16 risk variants. Expression analysis revealed a significant increase in the level of MYO16 expression in the brains of schizophrenia patients. Our results suggest that common variation within MYO16 may contribute to the genetic liability to schizophrenia.
association; Epidemiology; expression; linkage; Neurogenetics; Psychiatry & Behavioral Sciences; schizophrenia/Antipsychotics; SNPs; schizophrenia; MYO16; CNVs
Rare genetic variants contribute to complex disease risk; however, the abundance of rare variants in human populations remains unknown. We explored this spectrum of variation by sequencing 202 genes encoding drug targets in 14,002 individuals. We find rare variants are abundant (one every 17 bases) and geographically localized, such that even with large sample sizes, rare variant catalogs will be largely incomplete. We used the observed patterns of variation to estimate population growth parameters, the proportion of variants in a given frequency class that are putatively deleterious, and mutation rates for each gene. Overall we conclude that, due to rapid population growth and weak purifying selection, human populations harbor an abundance of rare variants, many of which are deleterious and have relevance to understanding disease risk.
The QT interval, an electrocardiographic measure reflecting myocardial repolarization, is a heritable trait. QT prolongation is a risk factor for ventricular arrhythmias and sudden cardiac death (SCD) and could indicate the presence of the potentially lethal Mendelian Long QT Syndrome (LQTS). Using a genome-wide association and replication study in up to 100,000 individuals, we identified 35 common variant QT interval loci that collectively explain ~8–10% of QT variation and highlight the importance of calcium regulation in myocardial repolarization. Rare variant analysis of 6 novel QT loci in 298 unrelated LQTS probands identified coding variants not found in controls but of uncertain causality and therefore requiring validation. Several newly identified loci encode proteins that physically interact with other recognized repolarization proteins. Our integration of common variant association, expression and orthogonal protein-protein interaction screens provides new insights into cardiac electrophysiology and identifies novel candidate genes for ventricular arrhythmias, LQTS, and SCD.
genome-wide association study; QT interval; Long QT Syndrome; sudden cardiac death; myocardial repolarization; arrhythmias
Elevated intraocular pressure (IOP) is a major risk factor for glaucoma and is influenced by genetic and environmental factors. Recent genome-wide association studies (GWAS) reported associations with IOP at TMCO1 and GAS7, and with primary open-angle glaucoma (POAG) at CDKN2B-AS1, CAV1/CAV2, and SIX1/SIX6. To identify novel genetic variants and replicate the published findings, we performed GWAS and meta-analysis of IOP in >6,000 subjects of European ancestry collected in three datasets: the NEI Glaucoma Human genetics collaBORation, GLAUcoma Genes and ENvironment study, and a subset of the Age-related Macular Degeneration-Michigan, Mayo, AREDS and Pennsylvania study. While no signal achieved genome-wide significance in individual datasets, a meta-analysis identified significant associations with IOP at TMCO1 (rs7518099-G, p = 8.0 × 10−8). Focused analyses of five loci previously reported for IOP and/or POAG, i.e., TMCO1, CDKN2B-AS1, GAS7, CAV1/CAV2, and SIX1/SIX6, revealed associations with IOP that were largely consistent across our three datasets, and replicated the previously reported associations in both effect size and direction. These results confirm the involvement of common variants in multiple genomic regions in regulating IOP and/or glaucoma risk.
Genetic and genomic studies have enhanced our understanding of complex neurodegenerative diseases that exert a devastating impact on individuals and society. One such disease, age-related macular degeneration (AMD), is a major cause of progressive and debilitating visual impairment. Since the pioneering discovery in 2005 of complement factor H (CFH) as a major AMD susceptibility gene, extensive investigations have confirmed 19 additional genetic risk loci, and more are anticipated. In addition to common variants identified by now-conventional genome-wide association studies, targeted genomic sequencing and exome-chip analyses are uncovering rare variant alleles of high impact. Here, we provide a critical review of the ongoing genetic studies and of common and rare risk variants at a total of 20 susceptibility loci, which together explain 40–60% of the disease heritability but provide limited power for diagnostic testing of disease risk. Identification of these susceptibility loci has begun to untangle the complex biological pathways underlying AMD pathophysiology, pointing to new testable paradigms for treatment.
complex disease; genetic susceptibility; neurodegeneration; retina; blindness
Summary: Although the 1000 Genomes haplotypes are the most commonly used reference panel for imputation, medical sequencing projects are generating large alternate sets of sequenced samples. Imputation in African Americans using 3384 haplotypes from the Exome Sequencing Project, compared with 2184 haplotypes from 1000 Genomes Project, increased effective sample size by 8.3–11.4% for coding variants with minor allele frequency <1%. No loss of imputation quality was observed using a panel built from phenotypic extremes. We recommend using haplotypes from Exome Sequencing Project alone or concatenation of the two panels over quality score-based post-imputation selection or IMPUTE2’s two-panel combination.
Supplementary data are available at Bioinformatics online.
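Effective sample size for imputed variants is often approximated by treating a variant imputed with squared dosage-genotype correlation r² as worth n × r² directly genotyped samples, so the relative gain from a new reference panel reduces to a ratio of mean r². The sketch assumes that approximation, and the r² values in the test are made up for illustration, not taken from the study.

```python
from statistics import mean
from typing import Sequence


def effective_sample_size(n: int, r2: Sequence[float]) -> float:
    """Approximate effective sample size: n times the mean imputation r^2 across variants."""
    return n * mean(r2)


def percent_gain(n: int, r2_new: Sequence[float], r2_old: Sequence[float]) -> float:
    """Percent change in effective sample size when switching reference panels."""
    return 100 * (effective_sample_size(n, r2_new) / effective_sample_size(n, r2_old) - 1)
```

Under this approximation the sample size n cancels out of the percent gain, which is why gains can be reported as percentages independent of cohort size.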
Blood lipid levels are heritable, treatable risk factors for cardiovascular disease. We systematically assessed genome-wide coding variation to identify novel lipid genes, fine-map known lipid loci, and evaluate whether low-frequency variants with large effects exist. Using an exome array, we genotyped 80,137 coding variants in 5,643 Norwegians. We followed up 18 variants in 4,666 Norwegians to identify 10 loci with coding variants associated with a lipid trait (P < 5×10−8). One coding variant in TM6SF2 (p.Glu167Lys), residing in a GWAS locus for lipid levels, modifies total cholesterol levels and is associated with myocardial infarction. Transient overexpression and knockdown of TM6SF2 in mice produce alterations in serum lipid profiles consistent with the association observed in humans, identifying TM6SF2 as the functional gene at a large GWAS locus previously known as NCAN/CILP2/PBX4 or 19p13. This study demonstrates that systematic assessment of coding variation can quickly point to a candidate causal gene.
The vast majority of connections between complex disease and common genetic variants were identified through meta-analysis, a powerful approach that enables large sample sizes while protecting against common artifacts due to population structure, repeated small sample analyses, and/or limitations with sharing individual level data. As the focus of genetic association studies shifts to rare variants, genes and other functional units are becoming the unit of analysis. Here, we propose and evaluate new approaches for performing meta-analysis of rare variant association tests, including burden tests, weighted burden tests, variable threshold tests and tests that allow variants with opposite effects to be grouped together. We show that our approach retains useful features of single variant meta-analytic approaches and demonstrate its utility in a study of blood lipid levels in ∼18,500 individuals genotyped with exome arrays.
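Gene-level meta-analysis of this kind is often built from per-study score statistics: each study shares a vector U of single-variant score statistics and their covariance matrix V, these are summed across studies, and a weighted burden statistic (wᵀU)²/(wᵀVw) is referred to a χ² distribution with 1 degree of freedom. The sketch assumes that formulation; the weights and function names are illustrative, and tests that let variants with opposite effects be grouped (SKAT-like) need a mixture-of-χ² p-value that is omitted here.

```python
import math
from typing import Sequence

import numpy as np


def pool_studies(scores: Sequence[np.ndarray], covariances: Sequence[np.ndarray]):
    """Sum per-study score vectors U_k and covariance matrices V_k over studies."""
    return np.sum(scores, axis=0), np.sum(covariances, axis=0)


def weighted_burden_test(U: np.ndarray, V: np.ndarray, weights: np.ndarray):
    """Burden statistic (w'U)^2 / (w'Vw), referred to chi-square(1); returns (stat, p)."""
    numerator = float(weights @ U)
    variance = float(weights @ V @ weights)
    stat = numerator ** 2 / variance
    return stat, math.erfc(math.sqrt(stat / 2))
```

Equal weights give a simple burden test; frequency-dependent weights (for example, larger weights for rarer variants) give a weighted burden test without changing the pooling step.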
Knowledge of individual ancestry is important for genetic association studies where population structure leads to false positive signals. Estimating individual ancestry with targeted sequence data, which constitutes the bulk of current sequence datasets, is challenging. Here, we propose a new method for accurate estimation of genetic ancestry. Our method skips genotype calling and directly analyzes sequence reads. We validate the method using simulated and empirical data and show that the method can accurately infer worldwide continental ancestry with whole genome shotgun coverage as low as 0.001X. For estimates of fine-scale ancestry within Europe, the method performs well with coverage of 0.1X. At an even finer-scale, the method improves discrimination between exome-sequenced participants originating from different provinces within Finland. Finally, we show that our method can be used to improve case-control matching in genetic association studies and reduce the risk of spurious findings due to population structure.
Summary: RAREMETAL is a computationally efficient tool for meta-analysis of rare variants genotyped using sequencing or arrays. RAREMETAL facilitates analyses of individual studies, accommodates a variety of input file formats, handles related and unrelated individuals, executes both single variant and burden tests and performs conditional association analyses.
Availability and implementation: http://genome.sph.umich.edu/wiki/RAREMETAL for executables, source code, documentation and tutorial.
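Conditional association analysis from summary statistics can be sketched with per-variant score statistics U and their covariance matrix V pooled across studies: conditioning a target variant t on a set c adjusts its score to U_t − V_tc V_cc⁻¹ U_c and its variance to V_tt − V_tc V_cc⁻¹ V_ct. This minimal sketch assumes U and V are already pooled on a common scale; it does not reproduce RAREMETAL's own options, and the function name is hypothetical.

```python
import math
from typing import Sequence

import numpy as np


def conditional_score_test(U: Sequence[float], V: Sequence[Sequence[float]],
                           target: int, conditioning: Sequence[int]):
    """Score test for variant `target` conditional on the variants in `conditioning`.

    Returns (chi-square(1) statistic, p-value).
    """
    U = np.asarray(U, dtype=float)
    V = np.asarray(V, dtype=float)
    c = list(conditioning)
    v_tc = V[target, c]              # covariances between target and conditioning variants
    v_cc = V[np.ix_(c, c)]           # covariance among conditioning variants
    w = np.linalg.solve(v_cc, v_tc)  # V_cc^-1 V_ct (V_cc is symmetric)
    u_cond = U[target] - w @ U[c]
    var_cond = V[target, target] - w @ v_tc
    stat = float(u_cond ** 2 / var_cond)
    return stat, math.erfc(math.sqrt(stat / 2))
```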
Low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, triglycerides, and total cholesterol are heritable, modifiable, risk factors for coronary artery disease. To identify new loci and refine known loci influencing these lipids, we examined 188,578 individuals using genome-wide and custom genotyping arrays. We identify and annotate 157 loci associated with lipid levels at P < 5×10−8, including 62 loci not previously associated with lipid levels in humans. Using dense genotyping in individuals of European, East Asian, South Asian, and African ancestry, we narrow association signals in 12 loci. We find that loci associated with blood lipids are often associated with cardiovascular and metabolic traits including coronary artery disease, type 2 diabetes, blood pressure, waist-hip ratio, and body mass index. Our results illustrate the value of genetic data from individuals of diverse ancestries and provide insights into biological mechanisms regulating blood lipids to guide future genetic, biological, and therapeutic research.
Summary: Recent advances in sequencing technologies have revolutionized genetic studies. Although high-coverage sequencing can uncover most variants present in the sequenced sample, low-coverage sequencing is appealing for its cost effectiveness. Here, we present AbCD (arbitrary coverage design) to aid the design of sequencing-based studies. AbCD is a user-friendly interface providing pre-estimated effective sample sizes, specific to each minor allele frequency category, for designs with arbitrary coverage (0.5–30×) and sample size (20–10 000), and for four major ethnic groups (Europeans, Africans, Asians and African Americans). In addition, we present two software tools, ShotGun and DesignPlanner, which were used to generate the estimates behind AbCD. ShotGun is a flexible short-read simulator for arbitrary user-specified read length and average depth, allowing cycle-specific sequencing error rates and realistic read depth distributions. DesignPlanner is a full pipeline that uses ShotGun to generate sequence data, performs initial SNP discovery, calls genotypes with our previously presented linkage disequilibrium-aware method, and, finally, provides minor allele frequency-specific effective sample sizes. ShotGun plus DesignPlanner can provide effective sample size estimates for any combination of high-depth and low-depth data (for example, whole-genome low-depth plus exonic high-depth) or combination of sequence and genotype data [for example, whole-exome sequencing plus genotyping from an existing genome-wide association study (GWAS)].
Availability and implementation: AbCD, including its downloadable terminal interface and web-based interface, and the associated tools ShotGun and DesignPlanner, including documentation, examples and executables, are available at http://www.unc.edu/~yunmli/AbCD.html.
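The read-simulation step can be illustrated with a toy simulator in the spirit of ShotGun: reads of fixed length start at uniformly random positions, the number of reads is chosen to reach a target average depth, and each sequencing cycle has its own error rate. Uniform read starts give roughly Poisson depth rather than the realistic depth distributions ShotGun supports; the function and its parameters are hypothetical, not ShotGun's interface.

```python
import random
from typing import List, Sequence, Tuple

BASES = "ACGT"


def simulate_reads(genome: str, read_length: int, mean_depth: float,
                   cycle_error: Sequence[float], seed: int = 0) -> List[Tuple[int, str]]:
    """Simulate single-end reads as (start position, sequence) pairs.

    cycle_error[k] is the substitution-error probability at sequencing cycle k.
    """
    if len(cycle_error) != read_length:
        raise ValueError("need one error rate per sequencing cycle")
    rng = random.Random(seed)
    n_reads = round(mean_depth * len(genome) / read_length)  # reads needed for the target depth
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_length + 1)
        bases = []
        for cycle, true_base in enumerate(genome[start:start + read_length]):
            if rng.random() < cycle_error[cycle]:
                # substitution error: one of the three other bases, chosen uniformly
                bases.append(rng.choice([b for b in BASES if b != true_base]))
            else:
                bases.append(true_base)
        reads.append((start, "".join(bases)))
    return reads
```

An error list that rises along the read, such as `[0.001 + 0.0002 * k for k in range(100)]` for 100-bp reads, mimics the cycle-dependent error profiles mentioned above.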
Genetic studies might provide new insights into the biological mechanisms underlying lipid metabolism and risk of coronary artery disease (CAD). We therefore conducted a genome-wide association study to identify novel genetic determinants of LDL-c, HDL-c and triglycerides.
Methods and results
We combined genome-wide association data from eight studies, comprising up to 17,723 participants with information on circulating lipid concentrations. We performed independent replication studies in up to 37,774 participants from eight populations and also in a population of Indian Asian descent. We also assessed the association between SNPs at lipid loci and risk of CAD in up to 9,633 cases and 38,684 controls.
We identified four novel genetic loci that showed reproducible associations with lipids (P values 1.6 × 10−8 to 3.1 × 10−10). These include a potentially functional SNP in the SLC39A8 gene for HDL-c, SNPs near the MYLIP/GMPR and PPP1R3B genes for LDL-c, and a SNP at the AFF1 gene for triglycerides. SNPs showing strong statistical association with one or more lipid traits at the APOE-C1-C4-C2 cluster, LPL, ZNF259-APOA5-A4-C3-A1 cluster and TRIB1 loci were also associated with CAD risk (P values 1.1 × 10−3 to 1.2 ×
We have identified four novel loci associated with circulating lipids. We also show that, in addition to loci largely associated with LDL-c, genetic loci mainly associated with circulating triglycerides and HDL-c are also associated with risk of CAD. These findings potentially provide new insights into the biological mechanisms underlying lipid metabolism and CAD risk.
lipids; lipoproteins; genetics; epidemiology
The Centre for Applied Genomics of the Hospital for Sick Children and the University of Toronto hosted the 10th Human Genome Variation (HGV) Meeting in Toronto, Canada, in October 2008, welcoming about 240 registrants from 34 countries. During the 3 days of plenary workshops, a keynote address, and poster sessions, a strong cross-disciplinary trend was evident, integrating expertise from technology and computation, through biology and medicine, to ethics and law. Single nucleotide polymorphisms (SNPs) as well as larger copy number variants (CNVs) are detected by ever-improving array and next-generation sequencing technologies, and the data are being incorporated into studies that are increasingly genome-wide as well as global in scope. A greater challenge is to convert these data into information, through databases, and to use that information to better understand human variation. In the wake of the publication of the first individual genome sequences, an inaugural public forum provided the opportunity to debate whether we are ready for personalized medicine through direct-to-consumer testing. The HGV meetings foster collaboration, and the fruits of the 2008 interactions are anticipated at the 11th annual meeting in September 2009.
SNP; CNV; GWAS; personalized medicine
A genome-wide association scan of ~6.6 million genotyped or imputed variants in 882 Sardinian multiple sclerosis (MS) cases and 872 controls suggested association of CBLB gene variants with disease, which was confirmed in 1,775 cases and 2,005 controls (overall P = 1.60 × 10−10). CBLB encodes a negative regulator of adaptive immune responses, and mice lacking the orthologue are prone to experimental autoimmune encephalomyelitis, the animal model of MS.
Personality can be thought of as a set of characteristics that influence people’s thoughts, feelings, and behaviour across a variety of settings. Variation in personality is predictive of many outcomes in life, including mental health. Here we report on a meta-analysis of genome-wide association (GWA) data for personality in ten discovery samples (17 375 adults) and five in-silico replication samples (3 294 adults). All participants were of European ancestry. Personality scores for Neuroticism, Extraversion, Openness to Experience, Agreeableness, and Conscientiousness were based on the NEO Five-Factor Inventory. Genotype data were available for ~2.4M Single Nucleotide Polymorphisms (SNPs; directly typed and imputed using HAPMAP data). In the discovery samples, classical association analyses were performed under an additive model followed by meta-analysis using the weighted inverse variance method. Results showed genome-wide significance for Openness to Experience near the RASA1 gene on 5q14.3 (rs1477268 and rs2032794, P = 2.8 × 10−8 and 3.1 × 10−8) and for Conscientiousness in the brain-expressed KATNAL2 gene on 18q21.1 (rs2576037, P = 4.9 × 10−8). We further conducted a gene-based test that confirmed the association of KATNAL2 with Conscientiousness. In-silico replication did not, however, show significant associations of the top SNPs with Openness and Conscientiousness, although the direction of effect of the KATNAL2 SNP on Conscientiousness was consistent in all replication samples. Larger scale GWA studies and alternative approaches are required for confirmation of KATNAL2 as a novel gene affecting Conscientiousness.
Personality; Five-Factor Model; Genome-wide association; Meta-analysis; Genetic variants
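The weighted inverse variance method used for the discovery meta-analysis can be sketched as fixed-effect pooling of per-sample SNP effect estimates. This minimal version assumes independent samples and one SNP at a time, and its function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def ivw_meta(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float, float]:
    """Fixed-effect inverse-variance weighted meta-analysis of per-study effect estimates.

    Returns (pooled beta, pooled standard error, z-score, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]  # each study weighted by 1 / SE^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    return beta, se, z, math.erfc(abs(z) / math.sqrt(2))
```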
Sequencing efforts, including the 1000 Genomes Project and disease-specific efforts, are producing large collections of haplotypes that can be used for genotype imputation in genome-wide association studies (GWAS). Imputing from these reference panels can help identify new risk alleles, but the use of large panels with existing methods imposes a high computational burden. To keep imputation broadly accessible, we introduce a strategy called “pre-phasing” that maintains the accuracy of leading methods while cutting computational costs by orders of magnitude. In brief, we first statistically estimate the haplotypes for each GWAS individual (“pre-phasing”) and then impute missing genotypes into these estimated haplotypes. This reduces the computational cost because: (i) the GWAS samples must be phased only once, whereas standard methods would implicitly re-phase with each reference panel update; (ii) it is much faster to match a phased GWAS haplotype to one reference haplotype than to match unphased GWAS genotypes to a pair of reference haplotypes. This strategy will be particularly valuable for repeated imputation as reference panels evolve.
Insulin secretion plays a critical role in glucose homeostasis, and failure to secrete sufficient insulin is a hallmark of type 2 diabetes. Genome-wide association studies (GWAS) have identified loci contributing to insulin processing and secretion [1,2]; however, a substantial fraction of the genetic contribution remains undefined. To examine low-frequency (minor allele frequency (MAF) 0.5% to 5%) and rare (MAF<0.5%) nonsynonymous variants, we analyzed exome array data in 8,229 non-diabetic Finnish males. We identified low-frequency coding variants associated with fasting proinsulin levels at the SGSM2 and MADD GWAS loci and three novel genes with low-frequency variants associated with fasting proinsulin or insulinogenic index: TBC1D30, KANK1, and PAM. We also demonstrate that the interpretation of single-variant and gene-based tests needs to consider the effects of noncoding SNPs nearby and megabases (Mb) away. This study demonstrates that exome array genotyping is a valuable approach to identify low-frequency variants that contribute to complex traits.
Given the anthropometric differences between men and women and previous evidence of sex-difference in genetic effects, we conducted a genome-wide search for sexually dimorphic associations with height, weight, body mass index, waist circumference, hip circumference, and waist-to-hip-ratio (133,723 individuals) and took forward 348 SNPs into follow-up (additional 137,052 individuals) in a total of 94 studies. Seven loci displayed significant sex-difference (FDR<5%), including four previously established (near GRB14/COBLL1, LYPLAL1/SLC30A10, VEGFA, ADAMTS9) and three novel anthropometric trait loci (near MAP3K1, HSD17B4, PPARG), all of which were genome-wide significant in women (P<5×10−8), but not in men. Sex-differences were apparent only for waist phenotypes, not for height, weight, BMI, or hip circumference. Moreover, we found no evidence for genetic effects with opposite directions in men versus women. The PPARG locus is of specific interest due to its role in diabetes genetics and therapy. Our results demonstrate the value of sex-specific GWAS to unravel the sexually dimorphic genetic underpinning of complex traits.
Men and women differ substantially in height, weight, and body fat. Interestingly, previous work on waist-to-hip ratio, a measure of body fat distribution, found that many of the detected genetic effects showed sex differences. However, systematic searches for sex differences in genetic effects have not yet been conducted. Therefore, we undertook a genome-wide search for sexually dimorphic genetic effects on anthropometric traits in a large meta-analysis of 133,723 individuals and followed up promising variants in a further 137,052 individuals, for a total of 94 studies. We identified seven loci with significant sex differences, including four previously established (near GRB14/COBLL1, LYPLAL1/SLC30A10, VEGFA, ADAMTS9) and three novel anthropometric trait loci (near MAP3K1, HSD17B4, PPARG), all of which were significant in women but not in men. Notably, sex differences were observed only for waist phenotypes, not for height or body mass index. We found no evidence for sex differences with opposite effect directions in men and women. The PPARG locus is of specific interest due to its link to diabetes genetics and therapy. Our findings demonstrate the importance of investigating sex differences, which may lead to a better understanding of disease mechanisms with potential relevance to treatment options.
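Tests for sex differences of this kind are commonly computed from sex-specific effect estimates as z = (β_w − β_m)/√(SE_w² + SE_m² − 2r·SE_w·SE_m), where r is the correlation between the female and male estimates (zero for independent samples), followed by multiple-testing correction. The sketch assumes that statistic and uses the Benjamini-Hochberg procedure as one standard way to control the FDR; it is illustrative and not the consortium's pipeline.

```python
import math
from typing import List, Sequence


def sex_difference_z(beta_w: float, se_w: float, beta_m: float, se_m: float, r: float = 0.0) -> float:
    """z-score for the difference between female and male effect estimates.

    r is the correlation between the two estimates (0 for independent samples).
    """
    return (beta_w - beta_m) / math.sqrt(se_w ** 2 + se_m ** 2 - 2 * r * se_w * se_m)


def two_sided_p(z: float) -> float:
    """Two-sided normal p-value."""
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvals: Sequence[float], q: float = 0.05) -> List[int]:
    """Indices of tests declared significant at false discovery rate q (step-up procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k_max = rank  # largest rank meeting its threshold
    return sorted(order[:k_max])
```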
Economic variables such as income, education, and occupation are known to affect mortality and morbidity, such as cardiovascular disease, and have also been shown to be partly heritable. However, very little is known about which genes influence economic variables, although these genes may have both a direct and an indirect effect on health. We report results from the first large-scale collaboration that studies the molecular genetic architecture of an economic variable, entrepreneurship, operationalized using self-employment, a widely available proxy. Our results suggest that common SNPs, when considered jointly, explain about half of the narrow-sense heritability of self-employment estimated in twin data (σg²/σP² = 25%, h² = 55%). However, a meta-analysis of genome-wide association studies across sixteen studies comprising 50,627 participants did not identify genome-wide significant SNPs. Fifty-eight SNPs with p < 10−5 were tested in a replication sample (n = 3,271), but none replicated. Furthermore, a gene-based test shows that none of the genes previously suggested in the literature to influence entrepreneurship reveal significant associations. Finally, SNP-based genetic scores that use results from the meta-analysis capture less than 0.2% of the variance in self-employment in an independent sample (p ≥ 0.039). Our results are consistent with a highly polygenic molecular genetic architecture of self-employment, with many genetic variants of small effect. Although self-employment is a multi-faceted, heavily environmentally influenced, and biologically distal trait, our results are similar to those for other genetically complex and biologically more proximate outcomes, such as height, intelligence, personality, and several diseases.
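A SNP-based genetic score is typically a weighted sum of allele dosages, with weights taken from meta-analysis effect estimates, and the variance it captures in an independent sample can be summarized as the squared correlation between score and phenotype. The sketch assumes that linear score and the squared Pearson correlation as the variance-explained measure (binary outcomes such as self-employment are sometimes summarized with other R² measures); the names are hypothetical. For reference, 0.25/0.55 ≈ 0.45, the "about half" of twin heritability noted above.

```python
from typing import List, Sequence


def genetic_score(dosages: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted sum of allele dosages (0-2) per person; weights are per-SNP effect estimates."""
    return [sum(w * d for w, d in zip(weights, row)) for row in dosages]


def variance_explained(score: Sequence[float], phenotype: Sequence[float]) -> float:
    """Squared Pearson correlation between the score and the phenotype."""
    n = len(score)
    mean_s = sum(score) / n
    mean_p = sum(phenotype) / n
    cov = sum((s - mean_s) * (p - mean_p) for s, p in zip(score, phenotype))
    var_s = sum((s - mean_s) ** 2 for s in score)
    var_p = sum((p - mean_p) ** 2 for p in phenotype)
    return cov * cov / (var_s * var_p)
```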