Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants.
Imputation uses genotype information from SNP arrays to infer the genotypes of missing markers. Here, the authors show that an imputation reference panel derived from whole-genome sequencing of 3,781 samples from the UK10K project improves the imputation accuracy and coverage of low frequency variants compared to existing methods.
Fasting glucose (FG) and insulin are intermediate traits for type 2 diabetes (T2D). Here we explore the role of coding variation in these traits by analysis of variants on the HumanExome BeadChip in 60,564 non-diabetic individuals and in 16,491 T2D cases and 81,877 controls. We identify a novel association of a low-frequency nonsynonymous SNV in GLP1R (A316T; rs10305492; MAF=1.4%) with lower FG (β=−0.09±0.01 mmol L−1, p=3.4×10−12), T2D risk (OR[95%CI]=0.86[0.76–0.96], p=0.010), early insulin secretion (β=−0.07±0.035 pmol insulin per mmol glucose, p=0.048), but higher 2-h glucose (β=0.16±0.05 mmol L−1, p=4.3×10−4). We identify a gene-based association with FG at G6PC2 (pSKAT=6.8×10−6) driven by four rare protein-coding SNVs (H177Y, Y207S, R283X and S324P). We identify rs651007 (MAF=20%) in the first intron of ABO at the putative promoter of an antisense lncRNA, associated with higher FG (β=0.02±0.004 mmol L−1, p=1.3×10−8). Our approach identifies novel coding variant associations and extends the allelic spectrum of variation underlying diabetes-related quantitative traits and T2D susceptibility.
Supplemental Digital Content is available in the text.
High blood pressure is a major contributor to the global burden of disease, and discovering novel causal pathways of blood pressure regulation has been challenging. We tested blood pressure associations with 280 fasting blood metabolites in 3980 TwinsUK females. Survival analysis for all-cause mortality was performed on significant independent metabolites (P<8.9×10−5). Replication was conducted in 2 independent cohorts, KORA (n=1494) and Hertfordshire (n=1515). Three independent animal experiments were performed to establish causality: (1) blood pressure change after increasing circulating metabolite levels in Wistar–Kyoto rats; (2) circulating metabolite change after salt-induced blood pressure elevation in spontaneously hypertensive stroke-prone rats; and (3) mesenteric artery response to noradrenaline and carbachol in metabolite-treated and control rats. Of the 15 metabolites that showed an independent significant association with blood pressure, only hexadecanedioate, a dicarboxylic acid, showed concordant association with blood pressure (systolic BP: β [95% confidence interval], 1.31 [0.83–1.78], P=6.81×10−8; diastolic BP: 0.81 [0.5–1.11], P=2.96×10−7) and mortality (hazard ratio [95% confidence interval], 1.49 [1.08–2.05]; P=0.02) in TwinsUK. The blood pressure association was replicated in KORA and Hertfordshire. In the animal experiments, we showed that oral hexadecanedioate increased both circulating hexadecanedioate and blood pressure in Wistar–Kyoto rats, whereas blood pressure elevation with oral sodium chloride in hypertensive rats did not affect hexadecanedioate levels. Vascular reactivity to noradrenaline was significantly increased in mesenteric resistance arteries from hexadecanedioate-treated rats compared with controls, as indicated by the leftward shift of the concentration–response curve (P=0.013). Relaxation to carbachol did not differ between groups.
Our findings indicate that hexadecanedioate is causally associated with blood pressure regulation through a novel pathway that merits further investigation.
blood pressure; fatty acid synthases; hypertension; metabolomics; mortality
In recent years, multiple loci dispersed across the genome have been shown to be associated with coronary artery disease (CAD). We investigated whether these common genetic variants also hold value for CAD prediction in a large cohort of patients with familial hypercholesterolemia (FH). We genotyped a total of 41 single-nucleotide polymorphisms (SNPs) in 1701 FH patients, of whom 482 (28.3%) had at least one coronary event during an average follow-up of 66 years. The association of each SNP with event-free survival time was calculated with a Cox proportional hazards model. In the analysis adjusted for cardiovascular disease risk factors, the most significant SNP was rs1122608:G>T in the SMARCA4 gene near the LDL-receptor (LDLR) gene, with a hazard ratio for CAD of 0.74 (95% CI 0.49–0.99; P-value 0.021). However, none of the SNPs reached the Bonferroni-corrected significance threshold. Of all the known CAD loci analyzed, the SMARCA4 locus near LDLR had the strongest negative association with CAD in this high-risk FH cohort, an effect contrary to expectation. None of the other loci showed association with CAD.
CAD; FH; GWAS; association
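In the familial hypercholesterolemia abstract above, the best SNP (P=0.021) did not reach the Bonferroni threshold for 41 tests. A minimal worked check, assuming a family-wise α of 0.05 (the abstract does not state the α used); the helper name bonferroni_threshold is hypothetical:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that controls the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumed family-wise alpha of 0.05; 41 SNPs as stated in the abstract.
threshold = bonferroni_threshold(0.05, 41)
reported_p = 0.021  # rs1122608, risk-factor-adjusted analysis

print(f"Bonferroni threshold: {threshold:.2e}")
print("significant" if reported_p < threshold else "not significant")
```

The threshold is roughly 1.2×10−3, so a P-value of 0.021 falls well short of it.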
Quantitative ultrasound of the heel captures heel bone properties that independently predict fracture risk and, together with bone mineral density (BMD) assessed by dual-energy X-ray absorptiometry (DXA), may offer convenient alternatives for evaluating osteoporosis and fracture risk. We performed a meta-analysis of genome-wide association (GWA) studies to assess the genetic determinants of heel broadband ultrasound attenuation (BUA; n = 14 260), velocity of sound (VOS; n = 15 514) and BMD (n = 4566) in 13 discovery cohorts. Independent replication involved seven cohorts with GWA data (in silico n = 11 452) and new genotyping in 15 cohorts (de novo n = 24 902). In a combined random-effects meta-analysis of the discovery and replication cohorts, nine single nucleotide polymorphisms (SNPs) had genome-wide significant (P < 5 × 10−8) associations with heel bone properties. Alongside SNPs within or near previously identified osteoporosis susceptibility genes including ESR1 (6q25.1: rs4869739, rs3020331, rs2982552), SPTBN1 (2p16.2: rs11898505), RSPO3 (6q22.33: rs7741021), WNT16 (7q31.31: rs2908007), DKK1 (10q21.1: rs7902708) and GPATCH1 (19q13.11: rs10416265), we identified a new locus on chromosome 11q14.2 (rs597319 close to TMEM135, a gene recently linked to osteoblastogenesis and longevity) significantly associated with both BUA and VOS (P < 8.23 × 10−14). In meta-analyses involving 25 cohorts with up to 14 985 fracture cases, six of 10 SNPs associated with heel bone properties at P < 5 × 10−6 also had the expected direction of association with any fracture (P < 0.05), including three SNPs with P < 0.005: 6q22.33 (rs7741021), 7q31.31 (rs2908007) and 10q21.1 (rs7902708). In conclusion, this GWA study reveals the effect of several genes common to central DXA-derived BMD and heel ultrasound/DXA measures and points to a new genetic locus with potential implications for better understanding of osteoporosis pathophysiology.
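The heel-ultrasound abstract above pools cohorts with a random-effects meta-analysis but does not say which estimator was used. Purely as an illustration, here is the DerSimonian–Laird method, one widely used random-effects estimator, applied to per-cohort effect sizes (beta) and standard errors. The function name dersimonian_laird and the toy inputs are hypothetical and are not data from the study.

```python
import math
from typing import Sequence, Tuple


def dersimonian_laird(betas: Sequence[float],
                      ses: Sequence[float]) -> Tuple[float, float, float]:
    """Random-effects pooled estimate using the DerSimonian-Laird method.

    Returns (pooled_beta, pooled_se, tau2), where tau2 is the estimated
    between-study variance.
    """
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need at least two studies with matching betas and ses")
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in ses]
    sum_w = sum(w)
    beta_fe = sum(wi * bi for wi, bi in zip(w, betas)) / sum_w
    # Cochran's Q heterogeneity statistic.
    q = sum(wi * (bi - beta_fe) ** 2 for wi, bi in zip(w, betas))
    df = len(betas) - 1
    # Method-of-moments estimate of between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau2 to each within-study variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    beta_re = sum(wi * bi for wi, bi in zip(w_re, betas)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    return beta_re, se_re, tau2
```

For two toy cohorts, dersimonian_laird([0.0, 0.5], [0.1, 0.1]) gives a pooled beta of about 0.25, a between-study variance of about 0.115 and a pooled standard error of about 0.25. This is larger than the fixed-effect standard error of about 0.071 because the heterogeneity is carried into the weights.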
Tissue plasminogen activator (tPA), a serine protease, catalyzes the conversion of plasminogen to plasmin, the major enzyme responsible for endogenous fibrinolysis. In some populations, elevated plasma levels of tPA have been associated with myocardial infarction and other cardiovascular diseases (CVD). We conducted a meta-analysis of genome-wide association studies (GWAS) to identify novel correlates of circulating levels of tPA.
Approach and Results
Fourteen cohort studies with tPA measures (N=26,929) contributed to the meta-analysis. Three loci were significantly associated with circulating tPA levels (P<5.0×10−8). The first locus is on 6q24.3, with the lead SNP (rs9399599, P=2.9×10−14) within STXBP5. The second locus is on 8p11.21; its lead SNP (rs3136739, P=1.3×10−9) is intronic to POLB and less than 200 kb from the tPA-encoding gene PLAT. We identified a non-synonymous SNP (rs2020921) in modest LD with rs3136739 (r2 = 0.50) within exon 5 of PLAT (P=2.0×10−8). The third locus is on 12q24.33, with the lead SNP (rs7301826, P=1.0×10−9) within intron 7 of STX2. We further found evidence for association of the lead SNPs in STXBP5 and STX2 with expression levels of the respective transcripts. In in vitro studies, silencing STXBP5 decreased release of tPA from vascular endothelial cells, whereas silencing STX2 increased tPA release. Through an in silico lookup, we found no associations of the three lead SNPs with coronary artery disease or stroke.
We identified three loci associated with circulating tPA levels, the PLAT region, STXBP5 and STX2. Our functional studies implicate a novel role for STXBP5 and STX2 in regulating tPA release.
tissue plasminogen activator; genome-wide association study; meta-analysis; cardiovascular disease risk; fibrinolysis; hemostasis
Blood cells derive from hematopoietic stem cells through stepwise fating events. To characterize the gene expression programs driving lineage choice, we sequenced RNA from eight primary human hematopoietic progenitor populations representing the major myeloid commitment stages and the main lymphoid stage. We identify extensive cell-type-specific expression changes (6,711 genes and 10,724 transcripts), enriched in non-protein-coding elements at early stages of differentiation. In addition, we discovered 7,881 novel splice junctions and 2,301 differentially used alternative splicing events, enriched in genes involved in regulatory processes. We experimentally demonstrate cell-specific isoform usage, identifying NFIB as a regulator of the maturation of megakaryocytes, the platelet precursors. Our data highlight the complexity of fating events in closely related progenitor populations, the understanding of which is essential for the advancement of transplantation and regenerative medicine.
We tested for interactions between body mass index (BMI) and common genetic variants affecting serum urate levels, genome-wide, in up to 42,569 participants. Both stratified genome-wide association (GWAS) analyses, in lean, overweight and obese individuals, and regression-type analyses in a non-BMI-stratified overall sample were performed. The former did not uncover any novel locus with a major main effect but supported modulation of effects for some known and potentially new urate loci. The latter highlighted a SNP at RBFOX3 reaching genome-wide significance (effect size 0.014, 95% CI 0.008–0.02, Pinter = 2.6 × 10−8). Two top loci in interaction term analyses, RBFOX3 and ERO1LB-EDARADD, also displayed suggestive differences in main effect size between the lean and obese strata. All top-ranking loci for urate effect differences between BMI categories were novel, and most had small-magnitude but opposite-direction effects between strata. They include the locus RBMS1-TANK (men, Pdiff lean vs overweight = 4.7 × 10−8), a region that has been associated with several obesity-related traits, and TSPYL5 (men, Pdiff lean vs overweight = 9.1 × 10−8), which regulates adipocyte-produced estradiol. The top-ranking known urate locus was ABCG2, the strongest known gout risk locus, with an effect halved in obese compared with lean men (Pdiff lean vs obese = 2 × 10−4). Finally, pathway analysis suggested a role for N-glycan biosynthesis as a prominent urate-associated pathway in the lean stratum. These results illustrate a potentially powerful way to monitor changes occurring in an obesogenic environment.
Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10−9) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10−14). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10−9) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10−11). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function.
Levels of circulating thyrotropin and free thyroxine reflect thyroid function; however, their genetic underpinnings remain poorly understood. Taylor et al. take advantage of whole-genome sequence data from cohorts within the UK10K project to identify novel variants associated with these traits.
The analysis of rich catalogues of genetic variation from population-based sequencing provides an opportunity to screen for functional effects. Here we report a rare variant in APOC3 (rs138326449-A, minor allele frequency ~0.25% (UK)) associated with plasma triglyceride (TG) levels (−1.43 standard deviations per minor allele (standard error (s.e.) = 0.27), p-value = 8.0×10−8), discovered in 3,202 individuals with low read-depth whole-genome sequence. We replicate this in 12,831 participants from five additional samples of Northern and Southern European origin (−1.0 standard deviation (s.e. = 0.173), p-value = 7.32×10−9). This is consistent with an effect of between 0.5 and 1.5 mmol/L, depending on the population. We show that a single predicted splice donor variant is responsible for the association signals and is independent of known common variants. Analyses suggest an independent relationship between rs138326449 and high-density lipoprotein (HDL) levels. This represents one of the first examples of a rare, large-effect variant identified from whole-genome sequencing at a population scale.
Whole genome sequence; triglycerides; APOC3
Fasting glucose (FG) and insulin are intermediate traits for type 2 diabetes (T2D). Here we explore the role of coding variation in these traits by analysis of variants on the HumanExome BeadChip in 60,564 non-diabetic individuals and in 16,491 T2D cases and 81,877 controls. We identify a novel association of a low-frequency nonsynonymous SNV in GLP1R (A316T; rs10305492; MAF=1.4%) with lower FG (β=−0.09±0.01 mmol l−1, P=3.4 × 10−12), T2D risk (OR[95%CI]=0.86[0.76–0.96], P=0.010), early insulin secretion (β=−0.07±0.035 pmol insulin per mmol glucose, P=0.048), but higher 2-h glucose (β=0.16±0.05 mmol l−1, P=4.3 × 10−4). We identify a gene-based association with FG at G6PC2 (pSKAT=6.8 × 10−6) driven by four rare protein-coding SNVs (H177Y, Y207S, R283X and S324P). We identify rs651007 (MAF=20%) in the first intron of ABO at the putative promoter of an antisense lncRNA, associated with higher FG (β=0.02±0.004 mmol l−1, P=1.3 × 10−8). Our approach identifies novel coding variant associations and extends the allelic spectrum of variation underlying diabetes-related quantitative traits and T2D susceptibility.
Both rare and common variants contribute to the aetiology of complex traits such as type 2 diabetes (T2D). Here, the authors examine the effect of coding variation on glycaemic traits and T2D, and identify low-frequency variation in GLP1R significantly associated with these traits.
Anorexia nervosa (AN) is a complex and heritable eating disorder characterized by dangerously low body weight. Neither candidate gene studies nor an initial genome-wide association study (GWAS) have yielded significant and replicated results. We performed a GWAS in 2,907 cases with AN from 14 countries (15 sites) and 14,860 ancestrally matched controls as part of the Genetic Consortium for AN (GCAN) and the Wellcome Trust Case Control Consortium 3 (WTCCC3). Individual association analyses were conducted in each stratum and meta-analyzed across all 15 discovery datasets. Seventy-six (72 independent) SNPs were taken forward for in silico (two datasets) or de novo (13 datasets) replication genotyping in 2,677 independent AN cases and 8,629 European ancestry controls, along with 458 AN cases and 421 controls from Japan. The final global meta-analysis across discovery and replication datasets comprised 5,551 AN cases and 21,080 controls. AN subtype analyses (1,606 AN restricting; 1,445 AN binge-purge) were performed. No findings reached genome-wide significance. Two intronic variants were suggestively associated: rs9839776 (P=3.01×10−7) in SOX2OT and rs17030795 (P=5.84×10−6) in PPP3CA. Two additional signals were specific to Europeans: rs1523921 (P=5.76×10−6) between CUL3 and FAM124B and rs1886797 (P=8.05×10−6) near SPATA13. Comparing discovery to replication results, 76% of the effects were in the same direction, an observation highly unlikely to be due to chance (P=4×10−6), strongly suggesting that true findings exist but that our sample, the largest yet reported, was underpowered for their detection. The accrual of large genotyped AN case-control samples should be an immediate priority for the field.
anorexia nervosa; eating disorders; GWAS; genome-wide association study; body mass index; metabolic
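The direction-concordance argument in the anorexia nervosa abstract above can be framed as a one-sided binomial sign test: under the null hypothesis, each replication effect is equally likely to agree or disagree with the discovery direction. The sketch below assumes that all 76 SNPs taken forward were counted and that 58 of them (about 76%) agreed in direction; both counts are illustrative assumptions, since the abstract reports only the percentage, and the helper name sign_test_upper is hypothetical.

```python
from math import comb


def sign_test_upper(k: int, n: int) -> float:
    """One-sided exact binomial P(X >= k) for X ~ Binomial(n, 0.5)."""
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    # Sum the exact binomial probabilities of k or more concordant effects.
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# Illustrative counts (assumed): 58 of 76 effects concordant (~76%).
p = sign_test_upper(58, 76)
print(f"P = {p:.1e}")
```

Exact integer arithmetic via math.comb avoids the rounding problems of summing small floating-point probabilities. Note that treating the 76 SNPs as independent trials ignores the fact that only 72 were described as independent.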
Using a nontargeted metabolomics approach of 447 fasting plasma metabolites, we searched for novel molecular markers that arise before and after hyperglycemia in a large population-based cohort of 2,204 females (115 type 2 diabetic [T2D] case subjects, 192 individuals with impaired fasting glucose [IFG], and 1,897 control subjects) from TwinsUK. Forty-two metabolites from three major fuel sources (carbohydrates, lipids, and proteins) were found to significantly correlate with T2D after adjusting for multiple testing; of these, 22 were previously reported as associated with T2D or insulin resistance. Fourteen metabolites were found to be associated with IFG. Among the metabolites identified, the branched-chain keto-acid metabolite 3-methyl-2-oxovalerate was the strongest predictive biomarker for IFG after glucose (odds ratio [OR] 1.65 [95% CI 1.39–1.95], P = 8.46 × 10−9) and was moderately heritable (h2 = 0.20). The association was replicated in an independent population (n = 720, OR 1.68 [1.34–2.11], P = 6.52 × 10−6) and validated in 189 twins with urine metabolomics taken at the same time as plasma (OR 1.87 [1.27–2.75], P = 1 × 10−3). Results confirm an important role for catabolism of branched-chain amino acids in T2D and IFG. In conclusion, this T2D-IFG biomarker study has surveyed the broadest panel of nontargeted metabolites to date, revealing both novel and known associated metabolites and providing potential novel targets for clinical prediction and a deeper understanding of causal mechanisms.
Genome-wide association scans with high-throughput metabolic profiling provide unprecedented insights into how genetic variation influences metabolism and complex disease. Here we report the most comprehensive exploration of genetic loci influencing human metabolism to date, including 7,824 adult individuals from two European population studies. We report genome-wide significant associations at 145 metabolic loci and their biochemical connectivity regarding more than 400 metabolites in human blood. We extensively characterize the resulting in vivo blueprint of metabolism in human blood by integrating it with information regarding gene expression, heritability, overlap with known drug targets, previous association with complex disorders and inborn errors of metabolism. We further developed a database and web-based resources for data mining and results visualization. Our findings contribute to a greater understanding of the role of inherited variation in blood metabolic diversity, and identify potential new opportunities for pharmacologic development and disease understanding.
The analysis of rich catalogues of genetic variation from population-based sequencing provides an opportunity to screen for functional effects. Here we report a rare variant in APOC3 (rs138326449-A, minor allele frequency ~0.25% (UK)) associated with plasma triglyceride (TG) levels (−1.43 s.d. per minor allele (s.e.=0.27), P-value=8.0 × 10−8), discovered in 3,202 individuals with low read-depth whole-genome sequence. We replicate this in 12,831 participants from five additional samples of Northern and Southern European origin (−1.0 s.d. (s.e.=0.173), P-value=7.32 × 10−9). This is consistent with an effect of between 0.5 and 1.5 mmol l−1, depending on the population. We show that a single predicted splice donor variant is responsible for the association signals and is independent of known common variants. Analyses suggest an independent relationship between rs138326449 and high-density lipoprotein (HDL) levels. This represents one of the first examples of a rare, large-effect variant identified from whole-genome sequencing at a population scale.
Population-based genome sequencing provides an increasingly rich resource for the identification of low-frequency, large effect variants associated with clinically important phenotypes. Timpson et al. use UK10K data to identify a variant of the APOC3 gene strongly associated with plasma triglyceride levels.
Estimates of the heritability of plasma fibrinogen concentration, an established predictor of cardiovascular disease (CVD), range from 34 to 50%. Genetic variants so far identified by genome-wide association (GWA) studies only explain a small proportion (< 2%) of its variation.
Methods and Results
We conducted a meta-analysis of 28 GWA studies, including more than 90,000 subjects of European ancestry, the first GWA meta-analysis of fibrinogen levels in 7 African American studies totaling 8,289 samples, and a GWA study in Hispanic Americans totaling 1,366 samples. Evaluation for association of SNPs with clinical outcomes included a total of 40,695 cases and 85,582 controls for coronary artery disease (CAD), 4,752 cases and 24,030 controls for stroke, and 3,208 cases and 46,167 controls for venous thromboembolism (VTE). Overall, we identified 24 genome-wide significant (P<5×10−8) independent signals in 23 loci, including 15 novel associations, together accounting for 3.7% of plasma fibrinogen variation. Gene-set enrichment analysis highlighted key roles in fibrinogen regulation for the three structural fibrinogen genes and pathways related to inflammation, adipocytokines and thyrotrophin-releasing hormone signaling. Whereas lead SNPs in a few loci were significantly associated with CAD, the combined effect of all 24 fibrinogen-associated lead SNPs was not significant for CAD, stroke or VTE.
We identify 23 robustly associated fibrinogen loci, 15 of which are new. Clinical outcome analysis of these loci does not support a causal relationship between circulating levels of fibrinogen and CAD, stroke or VTE.
Fibrinogen; cardiovascular disease; genome-wide association study
The formation of mature cells by blood stem cells is very well understood at the cellular level, and we know many of the key transcription factors that control fate decisions. However, many upstream signalling and downstream effector processes are only partially understood. Genome-wide association studies (GWAS) have been particularly useful in providing new directions to dissect these pathways. A GWAS meta-analysis identified 68 genetic loci controlling platelet size and number. Only a quarter of those genes, however, are known regulators of hematopoiesis. To determine the function of the remaining genes, we performed a medium-throughput genetic screen in zebrafish using antisense morpholino oligonucleotides (MOs) to knock down protein expression, followed by histological analysis of selected genes using a wide panel of different hematopoietic markers. The information generated by the initial knockdown was used to profile phenotypes and to position candidate genes hierarchically in hematopoiesis. Further analysis of brd3a revealed its essential role in the differentiation, but not the maintenance and survival, of thrombocytes. Using this from-GWAS-to-function strategy, we have not only identified a series of genes that represent novel regulators of thrombopoiesis and hematopoiesis but also provided, to our knowledge, the first example of a functional genetic screening strategy, a critical step toward obtaining biologically relevant functional data from GWA studies of blood cell traits.
In this manuscript we report on a follow-up study of the GWAS loci associated with platelet size and number. A GWAS meta-analysis identified 68 genetic loci controlling platelet size and number. Only a quarter of those genes, however, are known regulators of hematopoiesis. To determine the function of the remaining genes, we performed a medium-throughput genetic screen in zebrafish using morpholinos (MOs) to knock down selected candidate genes. Here, we report on two major findings. First, we identified 15 genes (corresponding to 12 human genes) required for distinct stages of specification or differentiation of hematopoietic stem cells (HSCs) in zebrafish. A detailed review of databases and literature revealed limited knowledge about the functional role of Satb1, Rcor1 and Brd3 in hematopoiesis, and for the remaining nine genes our work represents the first study of their putative role in hematopoiesis. Second, we demonstrate that brd3a is critical for establishing, but not maintaining, the thrombopoietic compartment. Importantly, our study introduces zebrafish as a model system for functional follow-up of GWAS loci and generates a valuable resource for prioritizing platelet size- and number-associated genes for future in-depth mechanistic analyses. Following this route of investigation, new regulatory molecules of hematopoiesis will be added to critical pathways.
Elevated resting heart rate is associated with greater risk of cardiovascular disease and mortality. In a 2-stage meta-analysis of genome-wide association studies in up to 181,171 individuals, we identified 14 new loci associated with heart rate and confirmed associations with all 7 previously established loci. Experimental downregulation of gene expression in Drosophila melanogaster and Danio rerio identified 20 genes at 11 loci that are relevant for heart rate regulation and highlight a role for genes involved in signal transmission, embryonic cardiac development and the pathophysiology of dilated cardiomyopathy, congenital heart failure and/or sudden cardiac death. In addition, genetic susceptibility to increased heart rate is associated with altered cardiac conduction and reduced risk of sick sinus syndrome, and both heart rate–increasing and heart rate–decreasing variants associate with risk of atrial fibrillation. Our findings provide fresh insights into the mechanisms regulating heart rate and identify new therapeutic targets.
Many existing cohorts contain a range of relatedness between genotyped individuals, either by design or by chance. Haplotype estimation in such cohorts is a central step in many downstream analyses. Using genotypes from six cohorts from isolated populations and two cohorts from non-isolated populations, we have investigated the performance of different phasing methods designed for nominally ‘unrelated’ individuals. We find that SHAPEIT2 produces much lower switch error rates in all cohorts than other methods, including those designed specifically for isolated populations. In particular, when large amounts of identity-by-descent (IBD) sharing are present, SHAPEIT2 infers close to perfect haplotypes. Based on these results, we have developed a general strategy for phasing cohorts with any level of implicit or explicit relatedness between individuals. First, SHAPEIT2 is run ignoring all explicit family information. We then apply a novel HMM method (duoHMM) to combine the SHAPEIT2 haplotypes with any family information to infer the inheritance pattern of each meiosis at all sites across each chromosome. This allows switch errors to be corrected and recombination events and genotyping errors to be detected. We show that the method detects numbers of recombination events that align very well with expectations based on genetic maps, and that it infers far fewer spurious recombination events than Merlin. The method can also detect genotyping errors and infer recombination events in otherwise uninformative families, such as trios and duos. The detected recombination events can be used in association scans for recombination phenotypes. The method provides a simple and unified approach to haplotype estimation, which will be of interest to researchers in the fields of human, animal and plant genetics.
Every individual carries two copies of each chromosome (haplotypes), one from each parent, each consisting of a long sequence of alleles. Modern genotyping technologies do not measure haplotypes directly but rather the combined sum (or genotype) of alleles at each site. Statistical methods are needed to infer (or phase) the haplotypes from the observed genotypes. Haplotype estimation is a key first step of many disease and population genetic studies. Much recent work in this area has focused on phasing in cohorts of nominally unrelated individuals. So-called ‘long range phasing’ is a relatively recent concept for phasing individuals with intermediate levels of relatedness, such as cohorts taken from population isolates. Methods also exist for phasing genotypes for individuals within explicit pedigrees. Whilst high-quality phasing techniques are available for each of these demographic scenarios, to date no single method is applicable to all three. In this paper, we present a general approach for phasing cohorts that contain any level of relatedness between the study individuals. We demonstrate high levels of accuracy in all demographic scenarios, as well as the ability to detect (Mendelian consistent) genotyping errors and recombination events in duos and trios, the first method with such a capability.
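The phasing abstracts above compare methods by switch error rate. This metric measures how often the inferred phase flips relative to the true phase between consecutive heterozygous sites. The sketch below makes several assumptions. One individual's true and inferred haplotypes are available as 0/1 allele lists; in practice, true phase is often taken from trio or pedigree data. Switches are normalized per pair of consecutive heterozygous sites, which is one common convention. The function name switch_error_rate is hypothetical, and this is not code from SHAPEIT2 or duoHMM.

```python
from typing import List, Sequence


def switch_error_rate(true_hap: Sequence[int], inferred_hap: Sequence[int],
                      genotypes: Sequence[int]) -> float:
    """Fraction of consecutive heterozygous-site pairs where the phase switches.

    true_hap / inferred_hap: one haplotype (0/1 alleles) per site; at
    heterozygous sites the other haplotype is implied by the genotype.
    genotypes: allele counts (0, 1 or 2) per site; only sites with 1 are used.
    """
    het = [i for i, g in enumerate(genotypes) if g == 1]
    if len(het) < 2:
        return 0.0
    # At each heterozygous site, record whether the inferred haplotype matches
    # the true haplotype (True) or its complement (False).
    orientation: List[bool] = [true_hap[i] == inferred_hap[i] for i in het]
    # A switch error is a change of orientation between adjacent het sites.
    switches = sum(1 for a, b in zip(orientation, orientation[1:]) if a != b)
    return switches / (len(het) - 1)


# One switch among five heterozygous sites (four adjacent pairs).
print(switch_error_rate([0, 1, 0, 1, 1], [0, 1, 1, 0, 0], [1, 1, 1, 1, 1]))
```

An inferred haplotype that equals the complement of the truth at every heterozygous site has a switch error rate of zero, because only changes of orientation count as errors.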
Blood pressure (BP) is a heritable determinant of risk for cardiovascular disease (CVD). To investigate genetic associations with systolic BP (SBP), diastolic BP (DBP), mean arterial pressure (MAP) and pulse pressure (PP), we genotyped ∼50 000 single-nucleotide polymorphisms (SNPs) that capture variation in ∼2100 candidate genes for cardiovascular phenotypes in 61 619 individuals of European ancestry from cohort studies in the USA and Europe. We identified novel associations between rs347591 and SBP (chromosome 3p25.3, in an intron of HRH1) and between rs2169137 and DBP (chromosome 1q32.1, in an intron of MDM4), as well as an association between rs2014408 and SBP (chromosome 11p15, in an intron of SOX6), a locus previously reported to be associated with MAP. We also confirmed 10 previously known loci associated with SBP, DBP, MAP or PP (ADRB1, ATP2B1, SH2B3/ATXN2, CSK, CYP17A1, FURIN, HFE, LSP1, MTHFR, SOX6) at array-wide significance (P < 2.4 × 10−6). We then replicated these associations in an independent set of 65 886 individuals of European ancestry. Expression QTL (eQTL) analysis showed associations of SNPs in the MDM4 region with MDM4 expression. We did not find any evidence of association of the two novel SNPs in MDM4 and HRH1 with sequelae of high BP, including coronary artery disease (CAD), left ventricular hypertrophy (LVH) or stroke. In summary, we identified two novel loci associated with BP and confirmed multiple previously reported associations. Our findings extend our understanding of genes involved in BP regulation, some of which may eventually provide new targets for therapeutic intervention.
Approaches exploiting extremes of the trait distribution may reveal novel loci for common traits, but it is unknown whether such loci generalize to the general population. In a genome-wide search for loci associated with the upper vs. lower 5th percentiles of body mass index, height and waist-hip ratio, as well as with clinical classes of obesity, in up to 263,407 individuals of European ancestry, we identified four new loci (IGFBP4, H6PD, RSRC1, PPP2R2A) influencing height detected in the tails and seven new loci (HNF4G, RPTOR, GNAT2, MRPS33P4, ADCY9, HS6ST3, ZZZ3) for clinical classes of obesity. Further, we show that there is large overlap in genetic structure and distribution of variants between traits based on extremes and the general population, and little etiologic heterogeneity between obesity subgroups.