A recent genome-wide association study identified hepatocyte nuclear factor 1-α (HNF1A) as a key regulator of fucosylation. We hypothesized that loss-of-function HNF1A mutations causal for maturity-onset diabetes of the young (MODY) would display altered fucosylation of N-linked glycans on plasma proteins and that glycan biomarkers could improve the efficiency of a diagnosis of HNF1A-MODY. In a pilot comparison of 33 subjects with HNF1A-MODY and 41 subjects with type 2 diabetes, 15 of 29 glycan measurements differed between the two groups. The DG9-glycan index, which is the ratio of fucosylated to nonfucosylated triantennary glycans, provided optimum discrimination in the pilot study and was examined further among additional subjects with HNF1A-MODY (n = 188), glucokinase (GCK)-MODY (n = 118), hepatocyte nuclear factor 4-α (HNF4A)-MODY (n = 40), type 1 diabetes (n = 98), type 2 diabetes (n = 167), and nondiabetic controls (n = 98). The DG9-glycan index was markedly lower in HNF1A-MODY than in controls or other diabetes subtypes, offered good discrimination between HNF1A-MODY and both type 1 and type 2 diabetes (C statistic ≥0.90), and enabled us to detect three previously undetected HNF1A mutations in patients with diabetes. In conclusion, glycan profiles are altered substantially in HNF1A-MODY, and the DG9-glycan index has potential clinical value as a diagnostic biomarker of HNF1A dysfunction.
Greater height and higher intelligence test scores are predictors of better health outcomes. Here, we used molecular (single-nucleotide polymorphism) data to estimate the genetic correlation between height and general intelligence (g) in 6,815 unrelated subjects (median age 57, IQR 49–63) from the Generation Scotland: Scottish Family Health Study cohort. The phenotypic correlation between height and g was 0.16 (SE 0.01). The genetic correlation between height and g was 0.28 (SE 0.09) with a bivariate heritability estimate of 0.71. Understanding the molecular basis of the correlation between height and intelligence may help explain any shared role in determining health outcomes. This study identified a modest genetic correlation between height and intelligence with the majority of the phenotypic correlation being explained by shared genetic influences.
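The three reported quantities can be related with a short back-of-envelope calculation. This is a minimal sketch, assuming the commonly used definition of bivariate heritability as the fraction of the phenotypic correlation due to shared genetics, r_g · sqrt(h²_height · h²_g) / r_p. The abstract does not state this formula, the function name is illustrative, and the inputs are the rounded values quoted above.

```python
def implied_sqrt_h2_product(r_g, r_p, bivariate_h2):
    """Solve bivariate_h2 = r_g * sqrt(h2_x * h2_y) / r_p for sqrt(h2_x * h2_y).

    Assumes the standard definition of bivariate heritability; the study's
    own estimation procedure may differ in detail.
    """
    return bivariate_h2 * r_p / r_g


# Rounded values quoted in the abstract.
implied = implied_sqrt_h2_product(r_g=0.28, r_p=0.16, bivariate_h2=0.71)
print(f"implied geometric-mean SNP heritability: {implied:.3f}")
```

Because the inputs are rounded and carry standard errors, the result is only a rough consistency check, not an estimate reported by the study.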
Electronic supplementary material
The online version of this article (doi:10.1007/s10519-014-9644-z) contains supplementary material, which is available to authorized users.
Height; Intelligence; Molecular genetics; Genetic correlation; Generation Scotland
Genome-wide association studies (GWAS) have provided valuable insights into the genetic basis of complex traits. However, they have explained relatively little trait heritability. Recently, we proposed a new analytical approach called regional heritability mapping (RHM) that captures more of the missing genetic variation. This method is applicable both to related and unrelated populations. Here, we demonstrate the power of RHM in comparison with single-SNP GWAS and gene-based association approaches under a wide range of scenarios with variable numbers of quantitative trait loci (QTL) with common and rare causal variants in a narrow genomic region. Simulations based on real genotype data were performed to assess power to capture QTL variance, and we demonstrate that RHM has greater power to detect rare variants and/or multiple alleles in a region than other approaches. In addition, we show that RHM can capture the QTL variance more accurately when it is caused by multiple independent effects and/or rare variants. To evaluate the effectiveness of this method in real data analysis, we applied RHM to three biometrical eye traits for which single-SNP GWAS have been published or performed, and detected some additional loci which were not detected by other GWAS methods. RHM has the potential to explain some of the missing heritability by capturing variance caused by QTL with low MAF and multiple independent QTL in a region, not captured by other GWAS methods. RHM analyses can be implemented using the software REACTA (http://www.epcc.ed.ac.uk/projects-portfolio/reacta).
common and rare variants; GWAS; regional heritability mapping; multiple independent effects; missing heritability
Measures of personality and psychological distress are correlated and exhibit genetic covariance. We conducted univariate genome-wide SNP (~2.5 million) and gene-based association analyses of these traits and examined the overlap in results across traits, including a prediction analysis of mood states using polygenic scores for personality. Measures of neuroticism, extraversion, and symptoms of anxiety, depression, and general psychological distress were collected in eight European cohorts (n ranged from 546 to 1 338; maximum total n=6 268) whose mean age ranged from 55 to 79 years. Meta-analysis of the cohort results was performed, with follow-up associations of the top SNPs and genes investigated in independent cohorts (n=527 to 6 032). Suggestive association (P=8×10−8) of rs1079196 in the FHIT gene was observed with symptoms of anxiety. Other notable associations (P<6.09×10−6) included SNPs in five genes for neuroticism (LCE3C, POLR3A, LMAN1L, ULK3, SCAMP2), KIAA0802 for extraversion, and NOS1 for general psychological distress. An association between symptoms of depression and rs7582472 (near MGAT5 and NCKAP5) was replicated in two independent samples, but other replication findings were less consistent. Gene-based tests identified a significant locus on chromosome 15 (spanning five genes) associated with neuroticism which replicated (P<0.05) in an independent cohort. Support for common genetic effects among personality and mood (particularly neuroticism and depressive symptoms) was found in terms of SNP association overlap and polygenic score prediction. The variance explained by individual SNPs was very small (up to 1%), confirming that there are no moderate/large effects of common SNPs on personality and related traits.
GWAS; extraversion; neuroticism; anxiety; depression
Calcium is vital to the normal functioning of multiple organ systems and its serum concentration is tightly regulated. Apart from CASR, the genes associated with serum calcium are largely unknown. We conducted a genome-wide association meta-analysis of 39,400 individuals from 17 population-based cohorts and investigated the 14 most strongly associated loci in ≤21,679 additional individuals. Seven loci (six new regions) in association with serum calcium were identified and replicated. Rs1570669 near CYP24A1 (P = 9.1E-12), rs10491003 upstream of GATA3 (P = 4.8E-09) and rs7481584 in CARS (P = 1.2E-10) implicate regions involved in Mendelian calcemic disorders. Rs1550532 in DGKD (P = 8.2E-11), also associated with bone density, and rs7336933 near DGKH/KIAA0564 (P = 9.1E-10) are near genes that encode distinct isoforms of diacylglycerol kinase. Rs780094 is in GCKR. We characterized the expression of these genes in gut, kidney, and bone, and demonstrate modulation of gene expression in bone in response to dietary calcium in mice. Our results shed new light on the genetics of calcium homeostasis.
Calcium is vital to many biological processes and its serum concentration is tightly regulated. Family studies have shown that serum calcium is under strong genetic control. Apart from CASR, the genes associated with serum calcium are largely unknown. We conducted a genome-wide association meta-analysis of 39,400 individuals from 17 population-based cohorts and investigated the 14 most strongly associated loci in ≤21,679 additional individuals. We identified seven loci (six new regions) as being robustly associated with serum calcium. Three loci implicate regions involved in rare monogenic diseases including disturbances of serum calcium levels. Several of the newly identified loci harbor genes linked to the hormonal control of serum calcium. In mouse experiments, we characterized the expression of these genes in gut, kidney, and bone, and explored the influence of dietary calcium intake on the expression of these genes in these organs. Our results shed new light on the genetics of calcium homeostasis and suggest a role for dietary calcium intake in bone-specific gene expression.
Twenty common genetic variants have been associated with risk of developing colorectal cancer (CRC) in genome-wide association studies to date. Since large differences between populations exist, the generalisability of findings to any specific population needs to be confirmed.
The aim of this study was to test the association between four risk variants (rs10795668, rs16892766, rs3802842 and rs4939827) and CRC risk in the Croatian population.
An association study was performed on 320 colorectal cancer cases and 594 controls recruited in Croatia. We genotyped four variants previously associated with CRC: rs10795668, rs16892766, rs3802842 and rs4939827.
The SMAD7 variant rs4939827 (18q21.1) was significantly associated with CRC risk in the Croatian population. The C allele was associated with a decreased risk, with an odds ratio (OR) of 0.70 (95% CI: 0.57-0.85, P=3.5E-04). Compared to TT homozygotes, risk was reduced by 34% in heterozygotes (OR=0.66, 95% CI: 0.47-0.92) and by 52% in CC homozygotes (OR=0.48, 95% CI: 0.33-0.72).
Our results show an association of rs4939827 with colorectal cancer risk in the Croatian population. The greater strength of the association in comparison to other studies suggests that population-specific environmental or genetic factors may be modifying the association. More studies are needed to further describe the role of rs4939827 in CRC. The likely reason for the failure to replicate the other three loci is inadequate study power.
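The percentage reductions quoted for rs4939827 follow directly from the reported odds ratios. The sketch below reproduces that arithmetic and checks whether each confidence interval excludes the null value of 1. The helper names are illustrative, and the figures are strictly reductions in odds, which approximate reductions in risk only when the outcome is uncommon.

```python
def percent_odds_reduction(odds_ratio):
    """Percent reduction in odds relative to the reference genotype."""
    return (1.0 - odds_ratio) * 100.0


def ci_excludes_null(lower, upper, null_value=1.0):
    """True when the confidence interval does not contain the null value."""
    return not (lower <= null_value <= upper)


# (OR, lower 95% CI, upper 95% CI) as quoted in the abstract, versus TT.
reported = {
    "CT vs TT": (0.66, 0.47, 0.92),
    "CC vs TT": (0.48, 0.33, 0.72),
}

summary = {
    label: (round(percent_odds_reduction(odds_ratio)), ci_excludes_null(lower, upper))
    for label, (odds_ratio, lower, upper) in reported.items()
}

for label, (reduction, significant) in summary.items():
    print(f"{label}: {reduction}% lower odds, CI excludes 1: {significant}")
```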
Age-related macular degeneration (AMD) is a leading cause of visual loss in Western populations. Susceptibility is influenced by age, environmental and genetic factors. Known genetic risk loci do not account for all the heritability. We therefore carried out a genome-wide association study of AMD in the UK population with 893 cases of advanced AMD and 2199 controls. This showed an association with the well-established AMD risk loci ARMS2 (age-related maculopathy susceptibility 2)–HTRA1 (HtrA serine peptidase 1) (P =2.7 × 10−72), CFH (complement factor H) (P =2.3 × 10−47), C2 (complement component 2)–CFB (complement factor B) (P =5.2 × 10−9), C3 (complement component 3) (P =2.2 × 10−3) and CFI (P =3.6 × 10−3) and with more recently reported risk loci at VEGFA (P =1.2 × 10−3) and LIPC (hepatic lipase) (P =0.04). Using a replication sample of 1411 advanced AMD cases and 1431 examined controls, we confirmed a novel association between AMD and single-nucleotide polymorphisms on chromosome 6p21.3 at TNXB (tenascin XB)–FKBPL (FK506 binding protein like) [rs12153855/rs9391734; discovery P =4.3 × 10−7, replication P =3.0 × 10−4, combined P =1.3 × 10−9, odds ratio (OR) = 1.4, 95% confidence interval (CI) = 1.3–1.6] and the neighbouring gene NOTCH4 (Notch 4) (rs2071277; discovery P =3.2 × 10−8, replication P =3.8 × 10−5, combined P =2.0 × 10−11, OR = 1.3, 95% CI = 1.2–1.4). These associations remained significant in conditional analyses which included the adjacent C2–CFB locus. TNXB, FKBPL and NOTCH4 are all plausible AMD susceptibility genes, but further research will be needed to identify the causal variants and determine whether any of these genes are involved in the pathogenesis of AMD.
A genome-wide association study of educational attainment was conducted in a discovery sample of 101,069 individuals and a replication sample of 25,490. Three independent SNPs are genome-wide significant (rs9320913, rs11584700, rs4851266), and all three replicate. Estimated effect sizes are small (R2 ≈ 0.02%), approximately 1 month of schooling per allele. A linear polygenic score from all measured SNPs accounts for ≈ 2% of the variance in both educational attainment and cognitive function. Genes in the region of the loci have previously been associated with health, cognitive, and central nervous system phenotypes, and bioinformatics analyses suggest the involvement of the anterior caudate nucleus. These findings provide promising candidate SNPs for follow-up work, and our effect size estimates can anchor power analyses in social-science genetics.
Several infrequent genetic polymorphisms in the SERPINA1 gene are known to substantially reduce concentration of alpha1-antitrypsin (AAT) in the blood. Since low AAT serum levels fail to protect pulmonary tissue from enzymatic degradation, these polymorphisms also increase the risk for early onset chronic obstructive pulmonary disease (COPD). The role of more common SERPINA1 single nucleotide polymorphisms (SNPs) in respiratory health remains poorly understood.
We present here an agnostic investigation of genetic determinants of circulating AAT levels in a general population sample by performing a genome-wide association study (GWAS) in 1392 individuals of the SAPALDIA cohort.
Five common SNPs, defined as having minor allele frequencies (MAFs) >5%, reached genome-wide significance, all located in the SERPINA gene cluster at 14q32.13. The top-ranking genotyped SNP rs4905179 was associated with an estimated effect of β = −0.068 g/L per minor allele (P = 1.20 × 10−12). However, denser SERPINA1 locus genotyping in 5569 participants with subsequent stepwise conditional analysis, as well as exon-sequencing in a subsample (N = 410), suggested that AAT serum level is causally determined at this locus by rare (MAF<1%) and low-frequency (MAF 1–5%) variants only, in particular by the well-documented protein inhibitor S and Z (PI S, PI Z) variants. Replication of the association of rs4905179 with AAT serum levels in the Copenhagen City Heart Study (N = 8273) was successful (P<0.0001), as was the replication of its synthetic nature (the effect disappeared after adjusting for PI S and Z, P = 0.57). Extending the analysis to lung function revealed a more complex situation. Only in individuals with severely compromised pulmonary health (N = 397) were associations of common SNPs at this locus with lung function driven by rarer PI S or Z variants. Overall, our meta-analysis of lung function in ever-smokers does not support a functional role of common SNPs in the SERPINA gene cluster in the general population.
Low levels of alpha1-antitrypsin (AAT) in the blood are a well-established risk factor for accelerated loss in lung function and chronic obstructive pulmonary disease. While a few infrequent genetic polymorphisms are known to influence the serum levels of this enzyme, the role of common genetic variants has not been examined so far. The present genome-wide scan for associated variants in approximately 1400 Swiss inhabitants revealed a chromosomal locus containing the functionally established variants of AAT deficiency and variants previously associated with lung function and emphysema. We used dense genotyping of this genetic region in more than 5500 individuals and subsequent conditional analyses to unravel which of these associated variants contribute independently to the phenotype's variability. All associations of common variants could be attributed to the rarer functionally established variants, a result which was then replicated in an independent population-based Danish cohort. Hence, this locus represents a textbook example of how a large part of a trait's heritability can be hidden in infrequent genetic polymorphisms. The attempt to transfer these results to lung function furthermore suggests that effects of common variants in this genetic region in ever-smokers may also be explained by rarer variants, but only in individuals with hampered pulmonary health.
Elevated serum urate concentrations can cause gout, a prevalent and painful inflammatory arthritis. By combining data from >140,000 individuals of European ancestry within the Global Urate Genetics Consortium (GUGC), we identified and replicated 28 genome-wide significant loci in association with serum urate concentrations (18 new regions in or near TRIM46, INHBB, SFMBT1, TMEM171, VEGFA, BAZ1B, PRKAG2, STC1, HNF4G, A1CF, ATXN2, UBE2Q2, IGF1R, NFAT5, MAF, HLF, ACVR1B-ACVRL1 and B3GNT4). Associations for many of the loci were of similar magnitude in individuals of non-European ancestry. We further characterized these loci for associations with gout, transcript expression and the fractional excretion of urate. Network analyses implicate the inhibins-activins signaling pathways and glucose metabolism in systemic urate control. New candidate genes for serum urate concentration highlight the importance of metabolic control of urate production and excretion, which may have implications for the treatment and prevention of gout.
It is a longstanding puzzle why non-coding variants in the complement factor H (CFH) gene are more strongly associated with age-related macular degeneration (AMD) than functional coding variants that directly influence the alternative complement pathway. The situation is complicated by tight genetic associations across the region, including the adjacent CFH-related genes CFHR3 and CFHR1, which may themselves influence the alternative complement pathway and are contained within a common deletion (CNP147) which is associated with protection against AMD. It is unclear whether this association is mediated through a protective effect of low plasma CFHR1 concentrations, high plasma CFH or both. We examined the triangular relationships of CFH/CFHR3/CFHR1 genotype, plasma CFH or CFHR1 concentrations and AMD susceptibility in combined case–control (1256 cases, 1020 controls) and cross-sectional population (n = 1004) studies and carried out genome-wide association studies of plasma CFH and CFHR1 concentrations. A non-coding CFH SNP (rs6677604) and the CNP147 deletion were strongly correlated both with each other and with plasma CFH and CFHR1 concentrations. The plasma CFH-raising rs6677604 allele and raised plasma CFH concentration were each associated with AMD protection. In contrast, the protective association of the CNP147 deletion with AMD was not mediated by low plasma CFHR1, since AMD-free controls showed increased plasma CFHR1 compared with cases, but it may be mediated by the association of CNP147 with raised plasma CFH concentration. The results are most consistent with a regulatory locus within a 32 kb region of the CFH gene, with a major effect on plasma CFH concentration and AMD susceptibility.
The analysis of less common variants in genome-wide association studies promises to elucidate complex trait genetics but is hampered by low power to reliably detect association. We show that addition of population-specific exome sequence data to global reference data allows more accurate imputation, particularly of less common SNPs (minor allele frequency 1–10%) in two very different European populations. The imputation improvement corresponds to an increase in effective sample size of 28–38%, for SNPs with a minor allele frequency in the range 1–3%.
Generation Scotland: Scottish Family Health Study (GS:SFHS) is a family-based biobank of 24,000 participants with rich phenotype and DNA available for genetic research. This paper describes the laboratory results from genotyping 32 single nucleotide polymorphisms (SNPs) on DNA from over 10,000 participants who attended GS:SFHS research clinics. The analysis described here was undertaken to test the quality of genetic information available to researchers. The success rate of each marker genotyped (call rate), minor allele frequency and adherence to Mendelian inheritance are presented. The few deviations in marker transmission in the 925 parent-child trios analysed were assessed as to whether they were likely to be miscalled genotypes, data or sample handling errors, or pedigree inaccuracies including non-paternity.
The first 10,450 GS:SFHS clinic participants who had spirometry and smoking data available and DNA extracted were selected. 32 SNPs were assayed, chosen as part of a replication experiment from a Genome-Wide Association Study meta-analysis of lung function.
In total 325,336 genotypes were returned. The overall project pass rate (32 SNPs on 10,450 samples) was 97.29%. A total of 925 parent-child trios were assessed for transmission of the SNP markers, with 16 trios indicating evidence of inconsistency in the recorded pedigrees.
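The reported pass rate can be checked against the genotype counts. This is a small sketch, assuming that "pass rate" means returned genotypes divided by attempted genotypes (SNPs × samples). The abstract does not define the term, so that reading is an assumption.

```python
n_snps = 32
n_samples = 10_450
genotypes_returned = 325_336

# Assumption: every SNP was attempted on every sample.
genotypes_attempted = n_snps * n_samples
pass_rate = genotypes_returned / genotypes_attempted

print(f"attempted: {genotypes_attempted:,}")
print(f"pass rate: {pass_rate:.2%}")
```

Under that assumption the counts and the quoted percentage agree to two decimal places.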
The Generation Scotland: Scottish Family Health Study used well-validated study methods and can produce good quality genetic data, with a low error rate. The GS:SFHS DNA samples are of high quality and the family groups were recorded and processed with accuracy during collection of the cohort.
Genetics; SNP Genotyping; Parent-child trios; Error rate; Non paternity; Generation Scotland; Biobank
Protein glycosylation is a ubiquitous modification that affects the structure and function of proteins. Our recent genome-wide association study identified transcription factor HNF1A as an important regulator of plasma protein glycosylation. To evaluate the potential impact of epigenetic regulation of HNF1A on protein glycosylation, we analyzed CpG methylation in 810 individuals. The association between methylation of four CpG sites and the composition of plasma and IgG glycomes was analyzed. Several statistically significant associations were observed between HNF1A methylation and plasma glycans, while there were no significant associations with IgG glycans. The most consistent association with HNF1A methylation was an increase in the proportion of highly branched glycans in the plasma N-glycome. The hypothesis that inactivation of HNF1A promotes branching of glycans was supported by the analysis of plasma N-glycomes in 61 patients with inactivating mutations in HNF1A, in whom an increase in plasma glycan branching was also observed. This study represents the first demonstration of epigenetic regulation of plasma glycome composition, suggesting a potential mechanism by which epigenetic deregulation of the glycome may contribute to disease development.
protein glycosylation; plasma glycome; HNF1A; CpG methylation; epigenetics
Glycosylation of immunoglobulin G (IgG) influences IgG effector function by modulating binding to Fc receptors. To identify genetic loci associated with IgG glycosylation, we quantitated N-linked IgG glycans using two approaches. After isolating IgG from human plasma, we performed 77 quantitative measurements of N-glycosylation using ultra-performance liquid chromatography (UPLC) in 2,247 individuals from four European discovery populations. In parallel, we measured IgG N-glycans using MALDI-TOF mass spectrometry (MS) in a replication cohort of 1,848 Europeans. Meta-analysis of genome-wide association study (GWAS) results identified 9 genome-wide significant loci (P<2.27×10−9) in the discovery analysis and two of the same loci (B4GALT1 and MGAT3) in the replication cohort. Four loci contained genes encoding glycosyltransferases (ST6GAL1, B4GALT1, FUT8, and MGAT3), while the remaining 5 contained genes that have not been previously implicated in protein glycosylation (IKZF1, IL6ST-ANKRD55, ABCF2-SMARCD3, SUV420H1, and SMARCB1-DERL3). However, most of them have been strongly associated with autoimmune and inflammatory conditions (e.g., systemic lupus erythematosus (SLE), rheumatoid arthritis, ulcerative colitis, Crohn's disease, type 1 diabetes, multiple sclerosis, Graves' disease, celiac disease, nodular sclerosis) and/or haematological cancers (acute lymphoblastic leukaemia, Hodgkin lymphoma, and multiple myeloma). Follow-up functional experiments in haplodeficient Ikzf1 knock-out mice showed the same general pattern of changes in IgG glycosylation as identified in the meta-analysis. As IKZF1 was associated with multiple IgG N-glycan traits, we explored the biomarker potential of affected N-glycans in 101 cases with SLE and 183 matched controls and demonstrated substantial discriminative power in a ROC-curve analysis (area under the curve = 0.842). Our study shows that it is possible to identify new loci that control glycosylation of a single plasma protein using GWAS. The results may also provide an explanation for the reported pleiotropy and antagonistic effects of loci involved in autoimmune diseases and haematological cancer.
After analysing glycans attached to human immunoglobulin G in 4,095 individuals, we performed the first genome-wide association study (GWAS) of the glycome of an individual protein. Nine genetic loci were found to associate with glycans with genome-wide significance. Of these, four contained genes encoding enzymes that directly participate in IgG glycosylation, so the observed associations are biologically plausible. The remaining five genetic loci had not previously been implicated in protein glycosylation, but most of them have been reported to be relevant for autoimmune and inflammatory conditions and/or haematological cancers. A particularly interesting gene, IKZF1, was found to be associated with multiple IgG N-glycans. This gene has been implicated in numerous diseases, including systemic lupus erythematosus (SLE). We analysed N-glycans in 101 cases with SLE and 183 matched controls and demonstrated their substantial biomarker potential. Our study shows that it is possible to identify new loci that control glycosylation of a single plasma protein using GWAS. Our results may also provide an explanation for opposite effects of some genes in autoimmune diseases and haematological cancer.
We have used targeted genomic sequencing of high-complexity DNA pools based on long-range PCR and deep DNA sequencing by the SOLiD technology. The method was used for sequencing of 286 kb from four chromosomal regions with quantitative trait loci (QTL) influencing blood plasma lipid and uric acid levels in DNA pools of 500 individuals from each of five European populations. The method shows very good precision in estimating allele frequencies as compared with individual genotyping of SNPs (r2=0.95, P<10−16). Validation shows that the method is able to identify novel SNPs and estimate their frequency in high-complexity DNA pools. In our five populations, 17% of all SNPs and 61% of structural variants are not available in the public databases. A large fraction of the novel variants show a limited geographic distribution, with 62% of the novel SNPs and 59% of novel structural variants being detected in only one of the populations. The large number of population-specific novel SNPs underscores the need for comprehensive sequencing of local populations in order to identify the causal variants of human traits.
pooling; next-generation DNA sequencing; SOLiD; SNP; indels
The limited proportion of complex trait variance identified in genome-wide association studies may reflect the limited power of single SNP analyses to detect either rare causative alleles or those of small effect. Motivated by studies that demonstrate that loci contributing to trait variation may contain a number of different alleles, we have developed an analytical approach termed Regional Genomic Relationship Mapping that, like linkage-based family methods, integrates variance contributed by founder gametes within a pedigree. This approach takes advantage of very distant (and unrecorded) relationships, and this greatly increases the power of the method, compared with traditional pedigree-based linkage analyses. By integrating variance contributed by founder gametes in the population, our approach provides an estimate of the Regional Heritability attributable to a small genomic region (e.g. 100 SNP window covering ca. 1 Mb of DNA in a 300,000 SNP GWAS) and has the power to detect regions containing multiple alleles that individually contribute too little variance to be detectable by GWAS as well as regions with single common GWAS-detectable SNPs. We use genome-wide SNP array data to obtain both a genome-wide relationship matrix and regional relationship (“identity by state” or IBS) matrices for sequential regions across the genome. We then estimate a heritability for each region sequentially in our genome-wide scan. We demonstrate by simulation and with real data that, when compared to traditional (“individual SNP”) GWAS, our method uncovers new loci that explain additional trait variation. We analysed data from three Southern European populations and from Orkney for exemplar traits – serum uric acid concentration and height. We show that regional heritability estimates are correlated with results from genome-wide association analysis but can capture more of the genetic variance segregating in the population and identify additional trait loci.
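The relationship matrices described above can be illustrated with a small sketch. It assumes genotypes coded as 0/1/2 allele counts, a simple mean identity-by-state (IBS) sharing score of 1 − |g_i − g_j|/2 per SNP, and non-overlapping 100-SNP windows. The authors' exact estimator and windowing scheme may differ, and the variance-component step that turns each regional matrix into a heritability estimate is not shown. The genotypes are simulated and the function names are illustrative.

```python
import numpy as np


def ibs_matrix(genotypes):
    """Mean identity-by-state sharing between all pairs of individuals.

    genotypes: array of shape (n_individuals, n_snps) with 0/1/2 allele counts.
    Returns an (n_individuals, n_individuals) matrix with values in [0, 1].
    """
    g = np.asarray(genotypes, dtype=float)
    # Pairwise absolute genotype differences at every SNP: shape (n, n, m).
    diff = np.abs(g[:, None, :] - g[None, :, :])
    return 1.0 - diff.mean(axis=2) / 2.0


def regional_windows(n_snps, window=100):
    """Yield (start, stop) index pairs for consecutive SNP windows."""
    for start in range(0, n_snps, window):
        yield start, min(start + window, n_snps)


# Simulated toy data: 20 individuals, 300 SNPs (illustrative only).
rng = np.random.default_rng(0)
toy_genotypes = rng.integers(0, 3, size=(20, 300))

genome_wide = ibs_matrix(toy_genotypes)
regional = [
    ibs_matrix(toy_genotypes[:, start:stop])
    for start, stop in regional_windows(toy_genotypes.shape[1])
]
print(genome_wide.shape, len(regional))
```

In a full analysis each regional matrix would be fitted alongside the genome-wide matrix in a mixed model, so that the regional variance component is estimated conditional on the genome-wide background.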
Genome-wide association studies (GWAS) aim to detect single nucleotide polymorphisms (SNP) associated with trait variation. However, due to the large number of tests, standard analysis techniques impose highly stringent significance thresholds, leaving potentially associated SNPs undetected, and much of the trait genetic variation unexplained. Pathway- and network-based methodologies applied to GWAS aim to detect associations missed by standard single-marker approaches. The complex and non-random architecture of the genome makes it a challenge to derive an appropriate testing framework for such methodologies. We developed a rapid and simple permutation approach that uses GWAS SNP association results to establish the significance of pathway associations while accounting for the linkage disequilibrium structure of SNPs and the clustering of functionally related elements in the genome. All SNPs used in the GWAS are placed in a “circular genome” according to their location. Then the complete set of SNP association P values is permuted by rotation with respect to the genomic locations of the SNPs. Once these “simulated” P values are assigned, the joint gene P values are calculated using Fisher’s combination test, and the association of pathways is tested using the hypergeometric test. The circular genomic permutation approach was applied to a human genome-wide association dataset. The data consist of 719 individuals from the ORCADES study genotyped for ∼300,000 SNPs and measured for 51 traits ranging from physical to biochemical measurements. KEGG pathways (n = 225) were used as the sets of pathways to be tested. Our results demonstrate that the circular genomic permutations provide robust association P values. The non-permuted hypergeometric analysis generates ∼1400 pathway-trait combination results with an association P value more significant than P ≤ 0.05, whereas applying circular genomic permutation reduces the number of significant results to a more credible 40% of that value. The circular permutation software (“genomicper”) is available as an R package at http://cran.r-project.org/.
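The permutation scheme described above can be sketched in a few functions. This is an illustration of the idea only, not the genomicper implementation. The toy gene and pathway definitions, the gene-level significance cut-off `alpha`, and all function names are assumptions made for the example.

```python
import math
import random


def rotate(p_values, offset):
    """Rotate P values around the circular genome, preserving their order."""
    offset %= len(p_values)
    return p_values[offset:] + p_values[:offset]


def fisher_combined_p(p_values):
    """Fisher's combination: -2*sum(ln p) is chi-square with 2k df under the null."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # test statistic / 2
    # Closed-form chi-square upper tail for even degrees of freedom (2k).
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def hypergeom_tail(k, M, n, N):
    """P(X >= k) when N genes are drawn from M, of which n are in the pathway."""
    top = sum(math.comb(n, i) * math.comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    return top / math.comb(M, N)


def pathway_p(snp_p, gene_to_snps, pathway, alpha=0.05):
    """Hypergeometric P value for enrichment of significant genes in a pathway."""
    gene_p = {g: fisher_combined_p([snp_p[i] for i in idx]) for g, idx in gene_to_snps.items()}
    hits = {g for g, p in gene_p.items() if p <= alpha}
    return hypergeom_tail(len(hits & pathway), len(gene_p), len(pathway), len(hits))


def circular_permutation_p(snp_p, gene_to_snps, pathway, n_perm=200, seed=0):
    """Empirical pathway P value from random rotations of the SNP P values."""
    rng = random.Random(seed)
    observed = pathway_p(snp_p, gene_to_snps, pathway)
    as_extreme = sum(
        pathway_p(rotate(snp_p, rng.randrange(1, len(snp_p))), gene_to_snps, pathway) <= observed
        for _ in range(n_perm)
    )
    return (as_extreme + 1) / (n_perm + 1)


# Toy data: 60 SNPs in genomic order, 12 genes of 5 adjacent SNPs each.
data_rng = random.Random(1)
toy_snp_p = [data_rng.uniform(1e-6, 1.0) for _ in range(60)]
toy_genes = {f"gene{g}": list(range(5 * g, 5 * g + 5)) for g in range(12)}
toy_pathway = {"gene0", "gene1", "gene2"}

empirical_p = circular_permutation_p(toy_snp_p, toy_genes, toy_pathway)
print(f"empirical pathway P value: {empirical_p:.3f}")
```

Rotation keeps neighbouring P values together, so local correlation between adjacent SNPs is carried into every permuted data set. Shuffling the P values independently would break that structure.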
GWAS; pathway-based; permutation method; genomicper R package; cardiac disease
We surveyed gene–gene interactions (epistasis) in human body mass index (BMI) in four European populations (n<1200) via exhaustive pair-wise genome scans where interactions were computed as F ratios by testing a linear regression model fitting two single-nucleotide polymorphisms (SNPs) with interactions against the one without. Before the association tests, BMI was corrected for sex and age, normalised and adjusted for relatedness. Neither single SNPs nor SNP interactions were genome-wide significant in any cohort based on the consensus threshold (P=5.0E−08) and a Bonferroni corrected threshold (P=1.1E−12), respectively. Next we compared sub genome-wide significant SNP interactions (P<5.0E−08) across cohorts to identify common epistatic signals, where SNPs were annotated to genes to test for gene ontology (GO) enrichment. Among the epistatic genes contributing to the commonly enriched GO terms, 19 were shared across study cohorts, of which 15 are previously published genome-wide association loci, including CDH13 (cadherin 13) associated with height and SORCS2 (sortilin-related VPS10 domain containing receptor 2) associated with circulating insulin-like growth factor 1 and binding protein 3. Interactions between the 19 shared epistatic genes and those involving BMI candidate loci (P<5.0E−08) were tested across cohorts, and eight were found to replicate at the SNP level (P<0.05) in at least one cohort; these were further tested and showed limited replication in a separate European population (n>5000). We conclude that genome-wide analysis of epistasis in multiple populations is an effective approach to provide new insights into the genetic regulation of BMI but requires additional efforts to confirm the findings.
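The pair-wise interaction test described above compares two nested regression models. The sketch below shows one such F ratio for a single SNP pair. It assumes additive 0/1/2 genotype coding with a single product term for the interaction, which may be simpler than the model used in the study. The data are simulated and the function names are illustrative.

```python
import numpy as np


def residual_sum_of_squares(design, y):
    """Least-squares fit of y on the design matrix; returns the residual SS."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return float(residuals @ residuals)


def interaction_f_ratio(snp1, snp2, y):
    """F ratio for the model with a SNP x SNP term against the model without it."""
    n = len(y)
    intercept = np.ones(n)
    reduced = np.column_stack([intercept, snp1, snp2])
    full = np.column_stack([intercept, snp1, snp2, snp1 * snp2])
    rss_reduced = residual_sum_of_squares(reduced, y)
    rss_full = residual_sum_of_squares(full, y)
    df_num = full.shape[1] - reduced.shape[1]
    df_den = n - full.shape[1]
    f_ratio = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f_ratio, df_num, df_den


# Simulated toy data: 500 individuals, two SNPs, phenotype with an interaction.
rng = np.random.default_rng(42)
snp_a = rng.integers(0, 3, size=500).astype(float)
snp_b = rng.integers(0, 3, size=500).astype(float)
noise = rng.normal(size=500)
phenotype = 0.8 * snp_a * snp_b + noise

f_epistatic, df1, df2 = interaction_f_ratio(snp_a, snp_b, phenotype)
print(f"F({df1}, {df2}) = {f_epistatic:.1f}")
```

An exhaustive scan repeats this test for every SNP pair, which is why a far stricter multiple-testing threshold is needed than for a single-SNP scan.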
body mass index; BMI; gene interaction; epistasis; pair-wise genome scan
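The F ratio for a pair-wise interaction compares the residual sum of squares of a model with the interaction against one without it. The sketch below illustrates that comparison under assumptions not stated in the abstract: genotypes are coded additively as 0/1/2, the interaction is a single product term (1 numerator degree of freedom), and the phenotype has already been adjusted. All function names are hypothetical, and the least-squares fit uses plain normal equations to stay dependency-free.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(design, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum(
        (yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(design, y)
    )


def interaction_f_ratio(snp1, snp2, y):
    """F ratio of the model with the SNP1 x SNP2 term against the one without."""
    reduced = [[1.0, a, b] for a, b in zip(snp1, snp2)]
    full = [[1.0, a, b, a * b] for a, b in zip(snp1, snp2)]
    rss_reduced, rss_full = rss(reduced, y), rss(full, y)
    df_num = 1                 # one extra parameter in the full model
    df_den = len(y) - 4        # n minus the parameters of the full model
    return ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
```

In an exhaustive scan this function would be evaluated for every SNP pair, which is why the Bonferroni-corrected threshold is so much stricter than the single-SNP consensus threshold.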
Stature is a classical and highly heritable complex trait, with 80%–90% of variation explained by genetic factors. In recent years, genome-wide association studies (GWAS) have successfully identified many common additive variants influencing human height; however, little attention has been given to the potential role of recessive genetic effects. Here, we investigated genome-wide recessive effects by an analysis of inbreeding depression on adult height in over 35,000 people from 21 different population samples. We found a highly significant inverse association between height and genome-wide homozygosity, equivalent to a height reduction of up to 3 cm in the offspring of first cousins compared with the offspring of unrelated individuals, an effect which remained after controlling for the effects of socio-economic status, an important confounder (χ² = 83.89, df = 1; p = 5.2×10⁻²⁰). There was, however, a high degree of heterogeneity among populations: whereas the direction of the effect was consistent across most population samples, the effect size differed significantly among populations. It is likely that this reflects true biological heterogeneity: whether or not an effect can be observed will depend on both the variance in homozygosity in the population and the chance inheritance of individual recessive genotypes. These results predict that multiple, rare, recessive variants influence human height. Although this exploratory work focuses on height alone, the methodology developed is generally applicable to heritable quantitative traits (QT), paving the way for an investigation into inbreeding effects, and therefore genetic architecture, on a range of QT of biomedical importance.
Studies investigating the extent to which genetics influences human characteristics such as height have concentrated mainly on common variants of genes, where having one or two copies of a given variant influences the trait or risk of disease. This study explores whether a different type of genetic variant might also be important. We investigate the role of recessive genetic variants, where two identical copies of a variant are required to have an effect. By measuring genome-wide homozygosity—the phenomenon of inheriting two identical copies at a given point of the genome—in 35,000 individuals from 21 European populations, and by comparing this to individual height, we found that the more homozygous the genome, the shorter the individual. The offspring of first cousins (who have increased homozygosity) were predicted to be up to 3 cm shorter on average than the offspring of unrelated parents. Height is influenced by the combined effect of many recessive variants dispersed across the genome. This may also be true for other human characteristics and diseases, opening up a new way to understand how genetic variation influences our health.
Serum concentrations of low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), triglycerides (TGs) and total cholesterol (TC) are important heritable risk factors for cardiovascular disease. Although genome-wide association studies (GWASs) of circulating lipid levels have identified numerous loci, a substantial portion of the heritability of these traits remains unexplained. Evidence of unexplained genetic variance can be detected by combining multiple independent markers into additive genetic risk scores. Such polygenic scores, constructed using results from the ENGAGE Consortium GWAS on serum lipids, were applied to predict lipid levels in an independent population-based study, the Rotterdam Study-II (RS-II). We additionally tested for evidence of a shared genetic basis for different lipid phenotypes. Finally, the polygenic score approach was used to identify an alternative genome-wide significance threshold before pathway analysis and those results were compared with those based on the classical genome-wide significance threshold. Our study provides evidence suggesting that many loci influencing circulating lipid levels remain undiscovered. Cross-prediction models suggested a small overlap between the polygenic backgrounds involved in determining LDL-C, HDL-C and TG levels. Pathway analysis utilizing the best polygenic score for TC uncovered extra information compared with using only genome-wide significant loci. These results suggest that the genetic architecture of circulating lipids involves a number of undiscovered variants with very small effects, and that increasing GWAS sample sizes will enable the identification of novel variants that regulate lipid levels.
serum lipids; polygenic; genome-wide association; polygenic score; pathway analysis
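A polygenic score of the kind described above is an additive sum of allele dosages weighted by GWAS effect sizes, restricted to markers passing a chosen P-value threshold. The following is a minimal sketch under assumed inputs: the dictionaries, SNP identifiers and function name are hypothetical and are not taken from the ENGAGE or RS-II analyses.

```python
def polygenic_score(dosages, effects, pvalues, p_threshold):
    """Additive polygenic score for one individual.

    dosages     -- effect-allele counts (0, 1 or 2) keyed by SNP identifier
    effects     -- per-allele effect sizes from the discovery GWAS
    pvalues     -- discovery-GWAS P values keyed by SNP identifier
    p_threshold -- only SNPs with P <= p_threshold enter the score
    """
    return sum(
        dosages[snp] * beta
        for snp, beta in effects.items()
        if snp in dosages and pvalues[snp] <= p_threshold
    )


# Hypothetical example with three independent markers.
effects = {"snpA": 0.5, "snpB": 0.25, "snpC": -0.125}
pvalues = {"snpA": 1e-9, "snpB": 0.01, "snpC": 0.4}
person = {"snpA": 2, "snpB": 1, "snpC": 1}

strict = polygenic_score(person, effects, pvalues, 5e-8)   # genome-wide hits only
relaxed = polygenic_score(person, effects, pvalues, 0.05)  # more liberal threshold
```

Repeating the calculation over a range of thresholds and checking which score best predicts lipid levels in the independent sample is one way to pick an alternative significance threshold; if a relaxed threshold predicts better than the genome-wide one, this suggests that true signals remain among the non-significant markers.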
Variation in the apolipoprotein E gene (APOE) has been reported to be associated with longevity in humans. The authors assessed the allelic distribution of APOE isoforms ε2, ε3, and ε4 among 10,623 participants from 15 case-control and cohort studies of age-related macular degeneration (AMD) in populations of European ancestry (study dates ranged from 1990 to 2009). The authors included only the 10,623 control subjects from these studies who were classified as having no evidence of AMD, since variation within the APOE gene has previously been associated with AMD. In an analysis stratified by study center, gender, and smoking status, there was a decreasing frequency of the APOE ε4 isoform with increasing age (χ² for trend = 14.9 (1 df); P = 0.0001), with a concomitant increase in the ε3 isoform (χ² for trend = 11.3 (1 df); P = 0.001). The association with age was strongest in ε4 homozygotes; the frequency of ε4 homozygosity decreased from 2.7% for participants aged 60 years or less to 0.8% for those over age 85 years, while the proportion of participants with the ε3/ε4 genotype decreased from 26.8% to 17.5% across the same age range. Gender had no significant effect on the isoform frequencies. This study provides strong support for an association of the APOE gene with human longevity.
aged; apolipoprotein E2; apolipoprotein E3; apolipoprotein E4; apolipoproteins E; longevity; meta-analysis; multicenter study
Myopia is a complex genetic disorder and a common cause of visual impairment among working age adults. Genome-wide association studies have identified susceptibility loci on chromosomes 15q14 and 15q25 in Caucasian populations of European ancestry. Here, we present a confirmation and meta-analysis study in which we assessed whether these two loci are also associated with myopia in other populations. The study population comprised 31 cohorts from the Consortium of Refractive Error and Myopia (CREAM) representing 4 different continents with 55,177 individuals; 42,845 Caucasians and 12,332 Asians. We performed a meta-analysis of 14 single nucleotide polymorphisms (SNPs) on 15q14 and 5 SNPs on 15q25 using linear regression analysis with spherical equivalent as a quantitative outcome, adjusted for age and sex. We calculated the odds ratio (OR) of myopia versus hyperopia for carriers of the top-SNP alleles using a fixed effects meta-analysis. At locus 15q14, all SNPs were significantly replicated, with the lowest P value 3.87 × 10⁻¹² for SNP rs634990 in Caucasians, and 9.65 × 10⁻⁴ for rs8032019 in Asians. The overall meta-analysis provided P value 9.20 × 10⁻²³ for the top SNP rs634990. The risk of myopia versus hyperopia was OR 1.88 (95% CI 1.64, 2.16, P < 0.001) for homozygous carriers of the risk allele at the top SNP rs634990, and OR 1.33 (95% CI 1.19, 1.49, P < 0.001) for heterozygous carriers. SNPs at locus 15q25 did not replicate significantly (P value 5.81 × 10⁻² for top SNP rs939661). We conclude that common variants at chromosome 15q14 influence susceptibility for myopia in Caucasian and Asian populations world-wide.
Electronic supplementary material
The online version of this article (doi:10.1007/s00439-012-1176-0) contains supplementary material, which is available to authorized users.
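A fixed effects meta-analysis of the kind used in the myopia study above is commonly computed by inverse-variance weighting of per-cohort log odds ratios. The sketch below assumes that approach; the abstract does not specify the weighting scheme, the function names are hypothetical, and the example inputs are illustrative numbers rather than CREAM results.

```python
import math


def fixed_effects_meta(log_ors, standard_errors):
    """Inverse-variance weighted pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


def odds_ratio_with_ci(log_or, se, z=1.96):
    """Back-transform a log odds ratio to an OR with a 95% confidence interval."""
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )


# Illustrative per-cohort estimates (not taken from the study).
pooled, pooled_se = fixed_effects_meta([0.2, 0.4], [0.1, 0.1])
odds_ratio, ci_low, ci_high = odds_ratio_with_ci(pooled, pooled_se)
```

Because each cohort is weighted by the inverse of its variance, large cohorts dominate the pooled estimate, and the pooled standard error shrinks as cohorts are added.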
Complement factor D catalyzes a critical step in the alternative complement activation pathway. The authors report a significant elevation in plasma CFD concentrations in age-related macular degeneration (AMD) patients compared with controls and a weak genetic association between CFD gene variants and AMD.
To examine the role of complement factor D (CFD) in age-related macular degeneration (AMD) by analysis of genetic association, copy number variation, and plasma CFD concentrations.
Single nucleotide polymorphisms (SNPs) in the CFD gene were genotyped and the results analyzed by binary logistic regression. CFD gene copy number was analyzed by gene copy number assay. Plasma CFD was measured by an enzyme-linked immunosorbent assay.
Genetic association was found between CFD gene SNP rs3826945 and AMD (odds ratio 1.44; P = 0.028) in a small discovery case-control series (462 cases and 325 controls) and replicated in a combined-cohort meta-analysis of 4765 cases and 2693 controls, with an odds ratio of 1.11 (P = 0.032); the association was almost confined to females. Copy number variation in the CFD gene was identified in 13 out of 640 samples examined, but there was no difference in frequency between AMD cases (1.3%) and controls (2.7%). Plasma CFD concentration was measured in 751 AMD cases and 474 controls and found to be elevated in AMD cases (P = 0.00025). The odds ratio for those in the highest versus lowest quartile for plasma CFD was 1.81. The difference in plasma CFD was again almost confined to females.
CFD regulates activation of the alternative complement pathway, which is implicated in AMD pathogenesis. The authors found evidence for genetic association between a CFD gene SNP and AMD and a significant increase in plasma CFD concentration in AMD cases compared with controls, consistent with a role for CFD in AMD pathogenesis.