Elevated serum urate concentrations can cause gout, a prevalent and painful inflammatory arthritis. By combining data from >140,000 individuals of European ancestry within the Global Urate Genetics Consortium (GUGC), we identified and replicated 28 genome-wide significant loci in association with serum urate concentrations (18 new regions in or near TRIM46, INHBB, SFMBT1, TMEM171, VEGFA, BAZ1B, PRKAG2, STC1, HNF4G, A1CF, ATXN2, UBE2Q2, IGF1R, NFAT5, MAF, HLF, ACVR1B-ACVRL1 and B3GNT4). Associations for many of the loci were of similar magnitude in individuals of non-European ancestry. We further characterized these loci for associations with gout, transcript expression and the fractional excretion of urate. Network analyses implicate the inhibins-activins signaling pathways and glucose metabolism in systemic urate control. New candidate genes for serum urate concentration highlight the importance of metabolic control of urate production and excretion, which may have implications for the treatment and prevention of gout.
The prevalence of hypertension in African Americans (AAs) is higher than in other US groups, yet few genome-wide association studies (GWASs) have been performed in AAs. Among people of European descent, GWASs have identified genetic variants at 13 loci that are associated with blood pressure. It is unknown if these variants confer susceptibility in people of African ancestry. Here, we examined genome-wide and candidate gene associations with systolic blood pressure (SBP) and diastolic blood pressure (DBP) using the Candidate Gene Association Resource (CARe) consortium consisting of 8591 AAs. Genotypes included genome-wide single-nucleotide polymorphism (SNP) data utilizing the Affymetrix 6.0 array with imputation to 2.5 million HapMap SNPs and candidate gene SNP data utilizing a 50K cardiovascular gene-centric array (ITMAT-Broad-CARe [IBC] array). For Affymetrix data, the strongest signal for DBP was rs10474346 (P= 3.6 × 10−8) located near GPR98 and ARRDC3. For SBP, the strongest signal was rs2258119 in C21orf91 (P= 4.7 × 10−8). The top IBC association for SBP was rs2012318 (P= 6.4 × 10−6) near SLC25A42 and for DBP was rs2523586 (P= 1.3 × 10−6) near HLA-B. None of the top variants replicated in additional AA (n = 11 882) or European-American (n = 69 899) cohorts. We replicated previously reported European-American blood pressure SNPs in our AA samples (SH2B3, P= 0.009; TBX3-TBX5, P= 0.03; and CSK-ULK3, P= 0.0004). These genetic loci represent the best evidence of genetic influences on SBP and DBP in AAs to date. More broadly, this work supports the notion that blood pressure among AAs is a trait with genetic underpinnings but also with significant complexity.
The QT interval, an electrocardiographic measure reflecting myocardial repolarization, is a heritable trait. QT prolongation is a risk factor for ventricular arrhythmias and sudden cardiac death (SCD) and could indicate the presence of the potentially lethal Mendelian Long QT Syndrome (LQTS). Using a genome-wide association and replication study in up to 100,000 individuals, we identified 35 common variant QT interval loci that collectively explain ∼8-10% of QT variation and highlight the importance of calcium regulation in myocardial repolarization. Rare variant analysis of 6 novel QT loci in 298 unrelated LQTS probands identified coding variants not found in controls but of uncertain causality and therefore requiring validation. Several newly identified loci encode proteins that physically interact with other recognized repolarization proteins. Our integration of common variant association, expression and orthogonal protein-protein interaction screens provides new insights into cardiac electrophysiology and identifies novel candidate genes for ventricular arrhythmias, LQTS, and SCD.
genome-wide association study; QT interval; Long QT Syndrome; sudden cardiac death; myocardial repolarization; arrhythmias
Forced vital capacity (FVC), a spirometric measure of pulmonary function, reflects lung volume and is used to diagnose and monitor lung diseases. We performed a genome-wide association study meta-analysis of FVC in 52,253 individuals from 26 studies and followed up the top associations in 32,917 additional individuals of European ancestry. We found six new regions associated at genome-wide significance (P < 5 × 10−8) with FVC in or near EFEMP1, BMP6, MIR-129-2/HSD17B12, PRDM11, WWOX, and KCNJ2. Two loci previously associated with spirometric measures (GSTCD and PTCH1) were related to FVC. Newly implicated regions were followed up in samples of African American, Korean, Chinese, and Hispanic individuals. We detected transcripts for all six newly implicated genes in human lung tissue. The new loci may inform mechanisms involved in lung development and pathogenesis of restrictive lung disease.
Plasma fibrinogen is an acute phase protein that plays an important role in the blood coagulation cascade and is strongly associated with smoking, alcohol consumption and body mass index (BMI). Genome-wide association studies (GWAS) have identified a variety of gene regions associated with elevated plasma fibrinogen concentrations. However, little is yet known about how associations between environmental factors and fibrinogen might be modified by genetic variation. Therefore, we conducted large-scale meta-analyses of genome-wide interaction studies to identify possible interactions of genetic variants and smoking status, alcohol consumption or BMI on fibrinogen concentration. The present study included 80,607 subjects of European ancestry from 22 studies. Genome-wide interaction analyses were performed separately in each study for about 2.6 million single nucleotide polymorphisms (SNPs) across the 22 autosomal chromosomes. For each SNP and risk factor, we performed a linear regression under an additive genetic model including an interaction term between SNP and risk factor. Interaction estimates were meta-analysed using a fixed-effects model. No genome-wide significant interaction with smoking status, alcohol consumption or BMI was observed in the meta-analyses. The most suggestive interaction was found for smoking and rs10519203, located in the LOC123688 region on chromosome 15, with a p value of 6.2×10−8. This large genome-wide interaction study including 80,607 participants found no strong evidence of interaction between genetic variants and smoking status, alcohol consumption or BMI on fibrinogen concentrations. Further studies are needed to yield deeper insight into the interplay between environmental factors and gene variants in the regulation of fibrinogen concentrations.
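The pooling step described above can be illustrated with a minimal sketch. It assumes each study contributes the interaction coefficient (the coefficient of the SNP × risk factor term in a model of the form fibrinogen ~ SNP + risk factor + SNP × risk factor) together with its standard error, and that the fixed-effects model is the usual inverse-variance weighted one; the abstract does not state the weighting scheme or the covariates, so both are assumptions. The function name `fixed_effects_meta` and the example numbers are hypothetical.

```python
import math


def fixed_effects_meta(betas, ses):
    """Pool per-study estimates with inverse-variance (fixed-effects) weights.

    betas: per-study interaction coefficients (hypothetical inputs)
    ses:   their standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se ** 2) for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p value under a standard normal null: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Made-up estimates from three hypothetical studies, for illustration only.
beta, se, z, p = fixed_effects_meta([0.02, 0.05, -0.01], [0.03, 0.04, 0.05])
print(f"pooled beta={beta:.4f}, se={se:.4f}, z={z:.2f}, p={p:.3g}")
```

Under these assumptions, a SNP would be declared genome-wide significant when the pooled p value falls below the conventional 5×10−8 threshold.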
Modern genetic data combined with appropriate statistical methods have the potential to contribute substantially to our understanding of human history. We have developed an approach that exploits the genomic structure of admixed populations to date and characterize historical mixture events at fine scales. We used this to produce an atlas of worldwide human admixture history, constructed using genetic data alone and encompassing over 100 events occurring over the past 4,000 years. We identify events whose dates and participants suggest they describe genetic impacts of the Mongol Empire, Arab slave trade, Bantu expansion, first millennium CE migrations in eastern Europe, and European colonialism, as well as unrecorded events, revealing admixture to be an almost universal force shaping human populations.
In an isolated population, individuals are likely to share large genetic regions inherited from common ancestors. Identity by descent (IBD) can be inferred from SNP genotypes, which is useful in a number of applications, including identifying genetic variants influencing complex disease risk, and planning efficient cohort-sequencing strategies. We present ANCHAP, a method for detecting IBD in isolated populations. We compare the accuracy of the method against other long-range and local phasing methods, using parent–offspring trios. In our experiments, we show that ANCHAP performs similarly to the other long-range method, but requires an order of magnitude less computational resources. A local phasing model is able to achieve similar sensitivity, but only at the cost of higher false discovery rates. In some regions of the genome, the studied individuals share haplotypes particularly often, which hints at the history of the populations studied. We demonstrate the method using SNP genotypes from three isolated island populations, as well as in a cohort of unrelated individuals. In samples from three isolated populations of around 1000 individuals each, an average individual shares a haplotype at a genetic locus with 9–12 other individuals, compared with only 1 individual within the non-isolated population. We describe an application of ANCHAP to optimally choose samples in resequencing studies. We find that with sample sizes of 1000 individuals from an isolated population genotyped using a dense SNP array, and with 20% of these individuals sequenced, 65% of sequences of the unsequenced subjects can be partially inferred.
IBD; isolates; resequencing
To investigate whether bioelectrical impedance analysis could be used to identify overweight individuals at increased cardiometabolic risk, defined as the presence of metabolic syndrome and/or diabetes.
Design and Methods
Cross-sectional study of a Scottish population including 1210 women and 788 men. The diagnostic performance of thresholds of percentage body fat measured by bioelectrical impedance analysis to identify people at increased cardiometabolic risk was assessed using receiver-operating characteristic curves. Odds ratios for increased cardiometabolic risk in body mass index categories associated with values above compared to below sex-specific percentage body fat thresholds with optimal diagnostic performance were calculated using multivariable logistic regression analyses. The validity of bioelectrical impedance analysis to measure percentage body fat in this population was tested by examining agreement between bioelectrical impedance analysis and dual-energy X-ray absorptiometry in a subgroup of individuals.
Participants were aged 16-91 years and the optimal bioelectrical impedance analysis cut-points for percentage body fat for identifying people at increased cardiometabolic risk were 25.9% for men and 37.1% for women. Stratifying by these percentage body fat cut-points, the prevalence of increased cardiometabolic risk was 48% and 38% above the threshold and 24% and 19% below these thresholds for men and women, respectively. By comparison, stratifying by percentage body fat category had little impact on identifying increased cardiometabolic risk in normal weight and obese individuals. Fully adjusted odds ratios of being at increased cardiometabolic risk among overweight people with percentage body fat ≥25.9/37.1% compared with percentage body fat <25.9/37.1% as a reference were 1.93 (95% confidence interval: 1.20–3.10) for men and 1.79 (1.10–2.92) for women.
Percentage body fat measured using bioelectrical impedance analysis above a sex-specific threshold could be used in overweight people to identify individuals at increased cardiometabolic risk, who could benefit from risk factor management.
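As an illustration of how a cut-point with "optimal diagnostic performance" might be read off a receiver-operating characteristic curve, the sketch below picks the threshold that maximises Youden's J (sensitivity + specificity − 1). The abstract does not say which optimality criterion was used, so Youden's J is an assumption, and the function name `youden_cutpoint` and the toy data are hypothetical.

```python
def youden_cutpoint(values, at_risk):
    """Find the threshold maximising Youden's J for the rule 'value >= threshold'.

    values:  percentage body fat measurements (toy data in the example below)
    at_risk: matching booleans, True if the person has increased cardiometabolic risk
    Returns (threshold, sensitivity, specificity).
    """
    if len(values) != len(at_risk):
        raise ValueError("values and at_risk must have the same length")
    n_pos = sum(1 for r in at_risk if r)
    n_neg = len(at_risk) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one at-risk and one not-at-risk person")
    best = None
    # Every observed value is a candidate threshold on the ROC curve.
    for threshold in sorted(set(values)):
        true_pos = sum(1 for v, r in zip(values, at_risk) if r and v >= threshold)
        true_neg = sum(1 for v, r in zip(values, at_risk) if not r and v < threshold)
        sensitivity = true_pos / n_pos
        specificity = true_neg / n_neg
        j = sensitivity + specificity - 1.0
        if best is None or j > best[0]:
            best = (j, threshold, sensitivity, specificity)
    return best[1], best[2], best[3]


# Toy example with made-up measurements; not data from the study.
body_fat = [18.0, 21.5, 24.0, 26.5, 29.0, 33.0]
risk = [False, False, True, False, True, True]
print(youden_cutpoint(body_fat, risk))
```

In the study the thresholds were derived separately for men and women, which under this sketch would correspond to calling the function once per sex.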
Refractive error (RE) is a complex, multifactorial disorder characterized by a mismatch between the optical power of the eye and its axial length that causes object images to be focused off the retina. The two major subtypes of RE are myopia (nearsightedness) and hyperopia (farsightedness), which represent opposite ends of the distribution of the quantitative measure of spherical refraction. We performed a fixed effects meta-analysis of genome-wide association results of myopia and hyperopia from 9 studies of European-derived populations: AREDS, KORA, FES, OGP-Talana, MESA, RSI, RSII, RSIII and ERF. One genome-wide significant region was observed for myopia, corresponding to a previously identified myopia locus on 8q12 (p = 1.25×10−8), which has been reported by Kiefer et al. as significantly associated with myopia age at onset and by Verhoeven et al. as significantly associated with mean spherical-equivalent (MSE) refractive error. We observed two genome-wide significant associations with hyperopia. These regions overlapped with loci on 15q14 (minimum p value = 9.11×10−11) and 8q12 (minimum p value = 1.82×10−11) previously reported for MSE and myopia age at onset. We also used an intermarker linkage-disequilibrium-based method for calculating the effective number of tests in targeted regional replication analyses. We analyzed myopia (which represents the closest phenotype in our data to the one used by Kiefer et al.) and showed replication of 10 additional loci associated with myopia previously reported by Kiefer et al. This is the first replication of these loci using myopia as the trait under analysis. “Replication-level” association was also seen between hyperopia and 12 of Kiefer et al.'s published loci. For the loci that show evidence of association with both myopia and hyperopia, the estimated effects of the risk alleles were in opposite directions for the two traits. This suggests that these loci are important contributors to variation of refractive error across the distribution.
Estimates of the heritability of plasma fibrinogen concentration, an established predictor of cardiovascular disease (CVD), range from 34 to 50%. Genetic variants so far identified by genome-wide association (GWA) studies only explain a small proportion (< 2%) of its variation.
Methods and Results
We conducted a meta-analysis of 28 GWA studies, including more than 90,000 subjects of European ancestry, the first GWA meta-analysis of fibrinogen levels in 7 African American studies totaling 8,289 samples, and a GWA study in Hispanic-Americans totaling 1,366 samples. Evaluation for association of SNPs with clinical outcomes included a total of 40,695 cases and 85,582 controls for coronary artery disease (CAD), 4,752 cases and 24,030 controls for stroke, and 3,208 cases and 46,167 controls for venous thromboembolism (VTE). Overall, we identified 24 genome-wide significant (P<5×10−8) independent signals in 23 loci, including 15 novel associations, together accounting for 3.7% of plasma fibrinogen variation. Gene-set enrichment analysis highlighted key roles in fibrinogen regulation for the three structural fibrinogen genes and pathways related to inflammation, adipocytokines and thyrotrophin-releasing hormone signaling. Whereas lead SNPs in a few loci were significantly associated with CAD, the combined effect of all 24 fibrinogen-associated lead SNPs was not significant for CAD, stroke or VTE.
We identify 23 robustly associated fibrinogen loci, 15 of which are new. Clinical outcome analysis of these loci does not support a causal relationship between circulating levels of fibrinogen and CAD, stroke or VTE.
Fibrinogen; cardiovascular disease; genome-wide association study
Obesity is of global health concern. There are well-described inverse relationships between female pubertal timing and obesity. Recent genome-wide association studies of age at menarche identified several obesity-related variants. Using data from the ReproGen Consortium, we employed meta-analytical techniques to estimate the associations of 95 a priori and recently identified obesity-related (body mass index (weight (kg)/height (m)²), waist circumference, and waist:hip ratio) single-nucleotide polymorphisms (SNPs) with age at menarche in 92,116 women of European descent from 38 studies (1970–2010), in order to estimate associations between genetic variants associated with central or overall adiposity and pubertal timing in girls. Investigators in each study performed a separate analysis of associations between the selected SNPs and age at menarche (ages 9–17 years) using linear regression models and adjusting for birth year, site (as appropriate), and population stratification. Heterogeneity of effect-measure estimates was investigated using meta-regression. Six novel associations of body mass index loci with age at menarche were identified, and 11 adiposity loci previously reported to be associated with age at menarche were confirmed, but none of the central adiposity variants individually showed significant associations. These findings suggest complex genetic relationships between menarche and overall obesity, and to a lesser extent central obesity, in normal processes of growth and development.
adiposity; body mass index; genetic association studies; menarche; obesity; waist circumference; waist:hip ratio; women's health
Variation in plasma levels of cortisol, an essential hormone in the stress response, is associated in population-based studies with cardio-metabolic, inflammatory and neuro-cognitive traits and diseases. Heritability of plasma cortisol is estimated at 30–60% but no common genetic contribution has been identified. The CORtisol NETwork (CORNET) consortium undertook genome wide association meta-analysis for plasma cortisol in 12,597 Caucasian participants, replicated in 2,795 participants. The results indicate that <1% of variance in plasma cortisol is accounted for by genetic variation in a single region of chromosome 14. This locus spans SERPINA6, encoding corticosteroid binding globulin (CBG, the major cortisol-binding protein in plasma), and SERPINA1, encoding α1-antitrypsin (which inhibits cleavage of the reactive centre loop that releases cortisol from CBG). Three partially independent signals were identified within the region, represented by common SNPs; detailed biochemical investigation in a nested sub-cohort showed all these SNPs were associated with variation in total cortisol binding activity in plasma, but some variants influenced total CBG concentrations while the top hit (rs12589136) influenced the immunoreactivity of the reactive centre loop of CBG. Exome chip and 1000 Genomes imputation analysis of this locus in the CROATIA-Korcula cohort identified missense mutations in SERPINA6 and SERPINA1 that did not account for the effects of common variants. These findings reveal a novel common genetic source of variation in binding of cortisol by CBG, and reinforce the key role of CBG in determining plasma cortisol levels. In turn this genetic variation may contribute to cortisol-associated degenerative diseases.
Cortisol is a steroid hormone from the adrenal glands that is essential in the response to stress. Most cortisol in blood is bound to corticosteroid binding globulin (CBG). Diseases causing cortisol deficiency (Addison's disease) or excess (Cushing's syndrome) are life-threatening. Variations in plasma cortisol have been associated with cardiovascular and psychiatric diseases and their risk factors. To dissect the genetic contribution to variation in plasma cortisol, we formed the CORtisol NETwork (CORNET) consortium and recruited collaborators with suitable samples from more than 15,000 people. The results reveal that the major genetic influence on plasma cortisol is mediated by variations in the binding capacity of CBG. This is determined by differences in the circulating concentrations of CBG and also in the immunoreactivity of its ‘reactive centre loop’, potentially influencing not only binding affinity for cortisol but also the stability of CBG and hence the tissue delivery of cortisol. These findings provide the first evidence for a common genetic effect on levels of this clinically important hormone, suggest that differences in CBG between individuals are biologically important, and pave the way for further research to dissect causality in the associations of plasma cortisol with common diseases.
Visual refractive errors (REs) are complex genetic traits with a largely unknown etiology. To date, genome-wide association studies (GWASs) of moderate size have identified several novel risk markers for RE, measured here as mean spherical equivalent (MSE). We performed a GWAS using a total of 7280 samples from five cohorts: the Age-Related Eye Disease Study (AREDS); the KORA study (‘Cooperative Health Research in the Region of Augsburg’); the Framingham Eye Study (FES); the Ogliastra Genetic Park-Talana (OGP-Talana) Study and the Multiethnic Study of Atherosclerosis (MESA). Genotyping was performed on Illumina and Affymetrix platforms with additional markers imputed to the HapMap II reference panel. We identified a new genome-wide significant locus on chromosome 16 (rs10500355, P = 3.9 × 10−9) in a combined discovery and replication set (26 953 samples). This single nucleotide polymorphism (SNP) is located within the RBFOX1 gene which is a neuron-specific splicing factor regulating a wide range of alternative splicing events implicated in neuronal development and maturation, including transcription factors, other splicing factors and synaptic proteins.
Elevated resting heart rate is associated with greater risk of cardiovascular disease and mortality. In a 2-stage meta-analysis of genome-wide association studies in up to 181,171 individuals, we identified 14 new loci associated with heart rate and confirmed associations with all 7 previously established loci. Experimental downregulation of gene expression in Drosophila melanogaster and Danio rerio identified 20 genes at 11 loci that are relevant for heart rate regulation and highlight a role for genes involved in signal transmission, embryonic cardiac development and the pathophysiology of dilated cardiomyopathy, congenital heart failure and/or sudden cardiac death. In addition, genetic susceptibility to increased heart rate is associated with altered cardiac conduction and reduced risk of sick sinus syndrome, and both heart rate–increasing and heart rate–decreasing variants associate with risk of atrial fibrillation. Our findings provide fresh insights into the mechanisms regulating heart rate and identify new therapeutic targets.
The etiology of Parkinson disease (PD) is complex and multifactorial, with hereditary and environmental factors contributing. Monogenic forms have provided molecular clues to disease mechanisms but genetic modifiers of idiopathic PD are still to be determined.
We carried out whole-genome expression profiling of isolated human substantia nigra (SN) neurons from patients with PD vs. controls followed by association analysis of tagging single-nucleotide polymorphisms (SNPs) in differentially regulated genes. Association was investigated in a German PD sample and confirmed in Italian and British cohorts.
We identified four differentially expressed genes located in PD candidate pathways, ie, MTND2 (mitochondrial, p = 7.14 × 10−7), PDXK (vitamin B6/dopamine metabolism, p = 3.27 × 10−6), SRGAP3 (axon guidance, p = 5.65 × 10−6), and TRAPPC4 (vesicle transport, p = 5.81 × 10−6). We identified a DNA variant (rs2010795) in PDXK associated with an increased risk of PD in the German cohort (p = 0.00032). This association was confirmed in the British (p = 0.028) and Italian (p = 0.0025) cohorts individually and reached a combined value of p = 1.2 × 10−7 (odds ratio [OR], 1.3; 95% confidence interval [CI], 1.18–1.44).
We provide an example of how microgenomic genome-wide expression studies in combination with association analysis can help to identify genetic modifiers in neurodegenerative disorders. The detection of a genetic variant in PDXK, together with evidence accumulating from clinical studies, emphasizes the impact of vitamin B6 status and metabolism on disease risk and therapy in PD.
Mega- or meta-analytic studies (e.g. genome-wide association studies) are increasingly used in behavior genetics. An issue in such studies is that phenotypes are often measured by different instruments across study cohorts, requiring harmonization of measures so that more powerful fixed effect meta-analyses can be employed. Within the Genetics of Personality Consortium, we demonstrate for two clinically relevant personality traits, Neuroticism and Extraversion, how Item-Response Theory (IRT) can be applied to map item data from different inventories to the same underlying constructs. Personality item data were analyzed in >160,000 individuals from 23 cohorts across Europe, USA and Australia in which Neuroticism and Extraversion were assessed by nine different personality inventories. Results showed that harmonization was very successful for most personality inventories and moderately successful for some. Neuroticism and Extraversion inventories were largely measurement invariant across cohorts, in particular when comparing cohorts from countries where the same language is spoken. The IRT-based scores for Neuroticism and Extraversion were heritable (48 and 49 %, respectively, based on a meta-analysis of six twin cohorts, total N = 29,496 and 29,501 twin pairs, respectively) with a significant part of the heritability due to non-additive genetic factors. For Extraversion, these genetic factors qualitatively differ across sexes. We showed that our IRT method can lead to a large increase in sample size and therefore statistical power. The IRT approach may be applied to any mega- or meta-analytic study in which item-based behavioral measures need to be harmonized.
Electronic supplementary material
The online version of this article (doi:10.1007/s10519-014-9654-x) contains supplementary material, which is available to authorized users.
Personality; Item-Response Theory; Measurement; Genome-wide association studies; Consortium; Meta-analysis
Using the ImmunoChip custom genotyping array, we analysed 14,498 multiple sclerosis subjects and 24,091 healthy controls for 161,311 autosomal variants and identified 135 potentially associated regions (p-value < 1.0 × 10−4). In a replication phase, we combined these data with previous genome-wide association study (GWAS) data from an independent 14,802 multiple sclerosis subjects and 26,703 healthy controls. In these 80,094 individuals of European ancestry we identified 48 new susceptibility variants (p-value < 5.0 × 10−8); three found after conditioning on previously identified variants. Thus, there are now 110 established multiple sclerosis risk variants in 103 discrete loci outside of the Major Histocompatibility Complex. With high resolution Bayesian fine-mapping, we identified five regions where one variant accounted for more than 50% of the posterior probability of association. This study enhances the catalogue of multiple sclerosis risk variants and illustrates the value of fine-mapping in the resolution of GWAS signals.
Low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, triglycerides, and total cholesterol are heritable, modifiable, risk factors for coronary artery disease. To identify new loci and refine known loci influencing these lipids, we examined 188,578 individuals using genome-wide and custom genotyping arrays. We identify and annotate 157 loci associated with lipid levels at P < 5×10−8, including 62 loci not previously associated with lipid levels in humans. Using dense genotyping in individuals of European, East Asian, South Asian, and African ancestry, we narrow association signals in 12 loci. We find that loci associated with blood lipids are often associated with cardiovascular and metabolic traits including coronary artery disease, type 2 diabetes, blood pressure, waist-hip ratio, and body mass index. Our results illustrate the value of genetic data from individuals of diverse ancestries and provide insights into biological mechanisms regulating blood lipids to guide future genetic, biological, and therapeutic research.
Triglycerides are transported in plasma by specific triglyceride-rich lipoproteins; in epidemiologic studies, increased triglyceride levels correlate with higher risk for coronary artery disease (CAD). However, it is unclear whether this association reflects causal processes. We used 185 common variants recently mapped for plasma lipids (P<5×10−8 for each) to examine the role of triglycerides on risk for CAD. First, we highlight loci associated with both low-density lipoprotein cholesterol (LDL-C) and triglycerides, and show that the direction and magnitude of both are factors in determining CAD risk. Second, we consider loci with only a strong magnitude of association with triglycerides and show that these loci are also associated with CAD. Finally, in a model accounting for effects on LDL-C and/or high-density lipoprotein cholesterol, a polymorphism's strength of effect on triglycerides is correlated with the magnitude of its effect on CAD risk. These results suggest that triglyceride-rich lipoproteins causally influence risk for CAD.
Many existing cohorts contain a range of relatedness between genotyped individuals, either by design or by chance. Haplotype estimation in such cohorts is a central step in many downstream analyses. Using genotypes from six cohorts from isolated populations and two cohorts from non-isolated populations, we have investigated the performance of different phasing methods designed for nominally ‘unrelated’ individuals. We find that SHAPEIT2 produces much lower switch error rates in all cohorts compared to other methods, including those designed specifically for isolated populations. In particular, when a large amount of IBD sharing is present, SHAPEIT2 infers close to perfect haplotypes. Based on these results we have developed a general strategy for phasing cohorts with any level of implicit or explicit relatedness between individuals. First, SHAPEIT2 is run ignoring all explicit family information. We then apply a novel HMM method (duoHMM) to combine the SHAPEIT2 haplotypes with any family information to infer the inheritance pattern of each meiosis at all sites across each chromosome. This allows the correction of switch errors and the detection of recombination events and genotyping errors. We show that the method detects numbers of recombination events that align very well with expectations based on genetic maps, and that it infers far fewer spurious recombination events than Merlin. The method can also detect genotyping errors and infer recombination events in otherwise uninformative families, such as trios and duos. The detected recombination events can be used in association scans for recombination phenotypes. The method provides a simple and unified approach to haplotype estimation that will be of interest to researchers in the fields of human, animal and plant genetics.
Every individual carries two copies of each chromosome (haplotypes), one from each of their parents, that consist of a long sequence of alleles. Modern genotyping technologies do not measure haplotypes directly, but the combined sum (or genotype) of alleles at each site. Statistical methods are needed to infer (or phase) the haplotypes from the observed genotypes. Haplotype estimation is a key first step of many disease and population genetic studies. Much recent work in this area has focused on phasing in cohorts of nominally unrelated individuals. So called ‘long range phasing’ is a relatively recent concept for phasing individuals with intermediate levels of relatedness, such as cohorts taken from population isolates. Methods also exist for phasing genotypes for individuals within explicit pedigrees. Whilst high quality phasing techniques are available for each of these demographic scenarios, to date, no single method is applicable to all three. In this paper, we present a general approach for phasing cohorts that contain any level of relatedness between the study individuals. We demonstrate high levels of accuracy in all demographic scenarios, as well as the ability to detect (Mendelian consistent) genotyping error and recombination events in duos and trios, the first method with such a capability.
A recent genome-wide association study identified hepatocyte nuclear factor 1-α (HNF1A) as a key regulator of fucosylation. We hypothesized that loss-of-function HNF1A mutations causal for maturity-onset diabetes of the young (MODY) would display altered fucosylation of N-linked glycans on plasma proteins and that glycan biomarkers could improve the efficiency of a diagnosis of HNF1A-MODY. In a pilot comparison of 33 subjects with HNF1A-MODY and 41 subjects with type 2 diabetes, 15 of 29 glycan measurements differed between the two groups. The DG9-glycan index, which is the ratio of fucosylated to nonfucosylated triantennary glycans, provided optimum discrimination in the pilot study and was examined further among additional subjects with HNF1A-MODY (n = 188), glucokinase (GCK)-MODY (n = 118), hepatocyte nuclear factor 4-α (HNF4A)-MODY (n = 40), type 1 diabetes (n = 98), type 2 diabetes (n = 167), and nondiabetic controls (n = 98). The DG9-glycan index was markedly lower in HNF1A-MODY than in controls or other diabetes subtypes, offered good discrimination between HNF1A-MODY and both type 1 and type 2 diabetes (C statistic ≥0.90), and enabled us to detect three previously undetected HNF1A mutations in patients with diabetes. In conclusion, glycan profiles are altered substantially in HNF1A-MODY, and the DG9-glycan index has potential clinical value as a diagnostic biomarker of HNF1A dysfunction.
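For readers unfamiliar with the C statistic quoted above, the following is a minimal sketch of how it can be computed for a marker that is lower in cases than in comparison subjects, as reported for the DG9-glycan index. It uses the pairwise (Mann-Whitney) definition; the function name `c_statistic_lower_in_cases` and the example values are hypothetical and are not study data.

```python
def c_statistic_lower_in_cases(case_values, comparison_values):
    """C statistic (area under the ROC curve) for a marker expected to be LOWER in cases.

    It is the probability that a randomly chosen case has a lower marker value
    than a randomly chosen comparison subject, counting ties as one half.
    """
    if not case_values or not comparison_values:
        raise ValueError("both groups must be non-empty")
    concordant = 0.0
    for case in case_values:
        for other in comparison_values:
            if case < other:
                concordant += 1.0
            elif case == other:
                concordant += 0.5
    return concordant / (len(case_values) * len(comparison_values))


# Made-up index values for illustration only.
hnf1a_mody = [0.10, 0.12, 0.20]
type2_diabetes = [0.18, 0.25, 0.30, 0.35]
print(c_statistic_lower_in_cases(hnf1a_mody, type2_diabetes))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.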
Central corneal thickness (CCT) is associated with eye conditions including keratoconus and glaucoma. We performed a meta-analysis on >20,000 individuals in European and Asian populations that identified 16 new loci associated with CCT at genome-wide significance (P < 5 × 10−8). We further showed that 2 CCT-associated loci, FOXO1 and FNDC3B, conferred relatively large risks for keratoconus in 2 cohorts with 874 cases and 6,085 controls (rs2721051 near FOXO1 had odds ratio (OR) = 1.62, 95% confidence interval (CI) = 1.4–1.88, P = 2.7 × 10−10, and rs4894535 in FNDC3B had OR = 1.47, 95% CI = 1.29–1.68, P = 4.9 × 10−9). FNDC3B was also associated with primary open-angle glaucoma (P = 5.6 × 10−4; tested in 3 cohorts with 2,979 cases and 7,399 controls). Further analyses implicate the collagen and extracellular matrix pathways in the regulation of CCT.
The length of female reproductive lifespan is associated with multiple adverse outcomes, including breast cancer, cardiovascular disease and infertility. The biological processes that govern the timing of the beginning and end of reproductive life are not well understood. Genetic variants are known to contribute to ∼50% of the variation in both age at menarche and menopause, but to date the known genes explain <15% of the genetic component. We have used genome-wide association in a bivariate meta-analysis of both traits to identify genes involved in determining reproductive lifespan. We observed significant genetic correlation between the two traits using genome-wide complex trait analysis. However, we found no robust statistical evidence for individual variants with an effect on both traits. A novel association with age at menopause was detected for a variant rs1800932 in the mismatch repair gene MSH6 (P = 1.9 × 10−9), which was also associated with altered expression levels of MSH6 mRNA in multiple tissues. This study contributes to the growing evidence that DNA repair processes play a key role in ovarian ageing and could be an important therapeutic target for infertility.
In conducting genome-wide association studies (GWAS), analytical approaches leveraging biological information may further understanding of the pathophysiology of clinical traits. To discover novel associations with estimated glomerular filtration rate (eGFR), a measure of kidney function, we developed a strategy for integrating prior biological knowledge into the existing GWAS data for eGFR from the CKDGen Consortium. Our strategy focuses on single nucleotide polymorphisms (SNPs) in genes that are connected by functional evidence, determined by literature mining and gene ontology (GO) hierarchies, to genes near previously validated eGFR associations. It then requires association thresholds consistent with multiple testing, and finally evaluates novel candidates by independent replication. Among the samples of European ancestry, we identified a genome-wide significant SNP in FBXL20 (P = 5.6 × 10−9) in meta-analysis of all available data, and additional SNPs at the INHBC, LRP2, PLEKHA1, SLC3A2 and SLC7A6 genes meeting multiple-testing corrected significance for replication and overall P-values of 4.5 × 10−4–2.2 × 10−7. Neither the novel PLEKHA1 nor FBXL20 associations, both further supported by association with eGFR among African Americans and with transcript abundance, would have been implicated by eGFR candidate gene approaches. LRP2, encoding the megalin receptor, was identified through connection with the previously known eGFR gene DAB2 and extends understanding of the megalin system in kidney function. These findings highlight integration of existing genome-wide association data with independent biological knowledge to uncover novel candidate eGFR associations, including candidates lacking known connections to kidney-specific pathways. The strategy may also be applicable to other clinical phenotypes, although more testing will be needed to assess its potential for discovery in general.
Fine structural details of glycans attached to the conserved N-glycosylation site not only significantly affect the function of individual immunoglobulin G (IgG) molecules but also mediate inflammation at the systemic level. By analyzing IgG glycosylation in 5,117 individuals from four European populations, we have revealed very complex patterns of changes in IgG glycosylation with age. Several IgG glycans (including FA2B, FA2G2, and FA2BG2) changed considerably with age, and the combination of these three glycans can explain up to 58% of variance in chronological age, significantly more than other markers of biological age like telomere lengths. The remaining variance in these glycans strongly correlated with physiological parameters associated with biological age. Thus, IgG glycosylation appears to be closely linked with both chronological and biological ages. Considering the important role of IgG glycans in inflammation, and because the observed changes with age promote inflammation, changes in IgG glycosylation also seem to represent a factor contributing to aging.
Glycosylation is the key posttranslational mechanism that regulates function of immunoglobulins, with multiple systemic repercussions to the immune system. Our study of IgG glycosylation in 5,117 individuals from four European populations has revealed very extensive and complex changes in IgG glycosylation with age. The combined index composed of only three glycans explained up to 58% of variance in age, considerably more than other biomarkers of age like telomere lengths. The remaining variance in these glycans strongly correlated with physiological parameters associated with biological age; thus, IgG glycosylation appears to be closely linked with both chronological and biological ages. The ability to measure human biological aging using molecular profiling has practical applications for diverse fields such as disease prevention and treatment, or forensics.
Aging; Glycome; Glycosylation; Immunoglobulin G; Inflammation.