Genome-wide association studies have identified a large number of single-nucleotide polymorphisms (SNPs) that individually predispose to diseases. However, many genetic risk factors remain unaccounted for. Proteins coded by genes interact in the cell, and it is most likely that certain variants mainly affect the phenotype in combination with other variants, termed epistasis. An exhaustive search for epistatic effects is computationally demanding, as several billions of SNP pairs exist for typical genotyping chips. In this study, the experimental knowledge on biological networks is used to narrow the search for two-locus epistasis. We provide evidence that this approach is computationally feasible and statistically powerful. By applying this method to the Wellcome Trust Case–Control Consortium data sets, we report four significant cases of epistasis between unlinked loci, in susceptibility to Crohn's disease, bipolar disorder, hypertension and rheumatoid arthritis.
association studies; genome-wide scan; epistasis; biological network
Recent genome-wide association studies have resulted in a dramatic increase in our knowledge of the genetic loci involved in type 2 diabetes. In a complementary approach to these single-marker studies, we attempted to identify biological pathways associated with type 2 diabetes. This approach could allow us to identify additional risk loci.
RESEARCH DESIGN AND METHODS
We used individual level genotype data generated from the Wellcome Trust Case Control Consortium (WTCCC) type 2 diabetes study, consisting of 393,143 autosomal SNPs, genotyped across 1,924 case subjects and 2,938 control subjects. We sought additional evidence from summary level data available from the Diabetes Genetics Initiative (DGI) and the Finland-United States Investigation of NIDDM Genetics (FUSION) studies. Statistical analysis of pathways was performed using a modification of the Gene Set Enrichment Algorithm (GSEA). A total of 439 pathways were analyzed from the Kyoto Encyclopedia of Genes and Genomes, Gene Ontology, and BioCarta databases.
After correcting for the number of pathways tested, we found no strong evidence for any pathway showing association with type 2 diabetes (top Padj = 0.31). The candidate WNT-signaling pathway ranked top (nominal P = 0.0007, excluding TCF7L2; P = 0.002), containing a number of promising single gene associations. These include CCND2 (rs11833537; P = 0.003), SMAD3 (rs7178347; P = 0.0006), and PRICKLE1 (rs1796390; P = 0.001), all expressed in the pancreas.
Common variants involved in type 2 diabetes risk are likely to occur in or near genes in multiple pathways. Pathway-based approaches to genome-wide association data may be more successful for some complex traits than others, depending on the nature of the underlying disease physiology.
With the exception of the major histocompatibility complex (MHC) and STAT4, no other rheumatoid arthritis (RA) linkage peak has been successfully fine-mapped to date. This apparent failure to identify association under peaks of linkage could be ascribed to the examination of common variation, when linkage is likely to be driven by rare variants. The purpose of this study was to investigate the overlap between genome-wide rare variant RA association signals observed in the Wellcome Trust Case Control Consortium (WTCCC) study and 11 replicating RA linkage peaks, defined as regions with evidence for linkage in >1 study.
The WTCCC data set contained 40,482 variants with minor allele frequency of ≤0.05 in 1,860 RA patients and 2,938 controls. Genotypes of all rare variants within a given gene region were collapsed into a single locus and a global P value was calculated per gene.
The distribution of rare variant signals (association P ≤ 10−5) was found to differ significantly between regions with and without linkage evidence (P = 2 × 10−17 by Fisher’s exact test). No significant difference was observed after data from the MHC region were removed or when the effect of the HLA–DRB1 locus was accounted for.
The results suggest that rare variant association signals are significantly overrepresented under linkage peaks in RA, but the effect is driven by the MHC. This is the first study to examine the overlap between linkage peaks and rare variant association signals genome-wide in a complex disease.
OBJECTIVE—This study examined how differences in the BMI distribution of type 2 diabetic case subjects affected genome-wide patterns of type 2 diabetes association and considered the implications for the etiological heterogeneity of type 2 diabetes.
RESEARCH DESIGN AND METHODS—We reanalyzed data from the Wellcome Trust Case Control Consortium genome-wide association scan (1,924 case subjects, 2,938 control subjects: 393,453 single-nucleotide polymorphisms [SNPs]) after stratifying case subjects (into “obese” and “nonobese”) according to median BMI (30.2 kg/m2). Replication of signals in which alternative case-ascertainment strategies generated marked effect size heterogeneity in type 2 diabetes association signal was sought in additional samples.
RESULTS—In the “obese-type 2 diabetes” scan, FTO variants had the strongest type 2 diabetes effect (rs8050136: relative risk [RR] 1.49 [95% CI 1.34–1.66], P = 1.3 × 10−13), with only weak evidence for TCF7L2 (rs7901695 RR 1.21 [1.09–1.35], P = 0.001). This situation was reversed in the “nonobese” scan, with FTO association undetectable (RR 1.07 [0.97–1.19], P = 0.19) and TCF7L2 predominant (RR 1.53 [1.37–1.71], P = 1.3 × 10−14). These patterns, confirmed by replication, generated strong combined evidence for between-stratum effect size heterogeneity (FTO: PDIFF = 1.4 × 10−7; TCF7L2: PDIFF = 4.0 × 10−6). Other signals displaying evidence of effect size heterogeneity in the genome-wide analyses (on chromosomes 3, 12, 15, and 18) did not replicate. Analysis of the current list of type 2 diabetes susceptibility variants revealed nominal evidence for effect size heterogeneity for the SLC30A8 locus alone (RRobese 1.08 [1.01–1.15]; RRnonobese 1.18 [1.10–1.27]: PDIFF = 0.04).
CONCLUSIONS—This study demonstrates the impact of differences in case ascertainment on the power to detect and replicate genetic associations in genome-wide association studies. These data reinforce the notion that there is substantial etiological heterogeneity within type 2 diabetes.
Given that genome wide association studies (GWAS) of psychiatric disorders have identified only a small number of convincingly associated variants, there is interest in seeking additional evidence for associated variants using tests of gene-gene interaction. Comprehensive pair-wise SNP-SNP interaction analysis is computationally intensive and the penalty for multiple testing is severe given the number of interactions possible. Aiming to minimize these statistical and computational burdens, we have explored approaches to prioritise SNPs for interaction analyses.
Primary interaction analyses were performed using the Wellcome Trust Case Control Consortium Bipolar Disorder GWAS (1868 cases, 2938 controls). Replication analyses were performed using the Genetic Association Information Network BD dataset (1001 cases, 1033 controls). SNPs were prioritized for interaction analysis that showed evidence for association that surpassed a number of nominally significant thresholds, are within genome-wide significant genes, or are within genes that are functionally related.
For no set of prioritized SNPs did we obtain evidence to support the hypothesis that the selection strategy identified pairs of variants that were enriched for true (statistical) interactions.
SNPs prioritized according to a number of criteria do not have a raised prior probability for significant interaction that is detectable in samples of this size. As is now widely accepted for single SNP analysis, we argue the use of significance levels reflecting only the number of tests performed does not offer an appropriate degree of protection against the potential for GWAS studies to generate an enormous number of false positive interactions.
GWAS; SNP; epistasis; association; interaction; gene
Genome-wide association (GWA) studies have identified a number of loci underlying variation in human serum uric acid (SUA) levels with the SLC2A9 gene having the largest effect identified so far. Gene-gene interactions (epistasis) are largely unexplored in these GWA studies. We performed a full pair-wise genome scan in the Italian MICROS population (n = 1201) to characterise epistasis signals in SUA levels. In the resultant epistasis profile, no SNP pairs reached the Bonferroni adjusted threshold for the pair-wise genome-wide significance. However, SLC2A9 was found interacting with multiple loci across the genome, with NFIA - SLC2A9 and SLC2A9 - ESRRAP2 being significant based on a threshold derived for interactions between GWA significant SNPs and the genome and jointly explaining 8.0% of the phenotypic variance in SUA levels (3.4% by interaction components). Epistasis signal replication in a CROATIAN population (n = 1772) was limited at the SNP level but improved dramatically at the gene ontology level. In addition, gene ontology terms enriched by the epistasis signals in each population support links between SUA levels and neurological disorders. We conclude that GWA epistasis analysis is useful despite relatively low power in small isolated populations.
We hypothesize that imputation based on data from the 1000 Genomes Project can identify novel association signals on a genome-wide scale due to the dense marker map and the large number of haplotypes. To test the hypothesis, the Wellcome Trust Case Control Consortium (WTCCC) Phase I genotype data were imputed using 1000 genomes as reference (20100804 EUR), and seven case/control association studies were performed using imputed dosages. We observed two ‘missed' disease-associated variants that were undetectable by the original WTCCC analysis, but were reported by later studies after the 2007 WTCCC publication. One is within the IL2RA gene for association with type 1 diabetes and the other in proximity with the CDKN2B gene for association with type 2 diabetes. We also identified two refined associations. One is SNP rs11209026 in exon 9 of IL23R for association with Crohn's disease, which is predicted to be probably damaging by PolyPhen2. The other refined variant is in the CUX2 gene region for association with type 1 diabetes, where the newly identified top SNP rs1265564 has an association P-value of 1.68 × 10−16. The new lead SNP for the two refined loci provides a more plausible explanation for the disease association. We demonstrated that 1000 Genomes-based imputation could indeed identify both novel (in our case, ‘missed' because they were detected and replicated by studies after 2007) and refined signals. We anticipate the findings derived from this study to provide timely information when individual groups and consortia are beginning to engage in 1000 genomes-based imputation.
genome-wide association study; the 1000 Genomes project; imputation
The P-value approach has been employed to prioritizing genome-wide association (GWA) scan signals, with a genome-wide significance defined by a prior P-value threshold, although this is not ideal. A rationale put forward is that the association signals rather should be expected to give less support for single nucleotide polymorphisms (SNPs) that are rare (with associated low-power tests) than for common SNPs with equivalent P-values, unless investigators believe, a priori, that rare causative variants contribute to the disease and have more pronounced effects.
Using data from a GWA scan for type 2 diabetes (1924 cases, 2938 controls, 393 453 SNPs), we compared P-values with four alternative signal measures: likelihood ratio (LR), Bayes factor (BF; with a specified prior distribution for true effects), ‘frequentist factor’ (FF; reflecting the ratio between estimated—post-data— ‘power’ and P-value) and probability of pronounced effect size (PrPES).
The 19 common SNPs [minor allele frequency (MAF) among the controls >29%] yielding strong P-value signals (P<5×10−7) were also top ranked by the other approaches. There was a strong similarity between the P-values, LR and BF signals, in terms of ranking SNPs. In contrast, FF and PrPES signals down-weighted rare SNPs (control MAF<10%) with low P-values.
For prioritization of signals that do not achieve compelling levels of evidence for association, the main driving force behind observed differences between the various association signals appears to be SNP MAF. The statistical power afforded by follow-up samples for establishing replication should be taken into account when tailoring the signal selection strategy.
Bayes factor; effect size; likelihood ratio; single nucleotide polymorphism; statistical power; statistics
Although they have demonstrated success in searching for common variants for complex diseases, Genome-Wide Association (GWA) studies are less successful in detecting rare genetic variants because of the poor statistical power of most of current methods. We developed a two-stage method that can apply to GWA studies for detecting rare variants. Here we report the results of applying this two-stage method to the Wellcome Trust Case Control Consortium (WTCCC) dataset that include 7 complex diseases: Bipolar disorder, Cardiovascular disease, Hypertension, Rheumatoid Arthritis, Crohn’s disease, Type 1 Diabetes and Type 2 Diabetes. We identified 24 genes or regions that reach genome wide significance. 8 of them are novel and were not reported in the WTCCC study. The cumulative risk (or protective) haplotype frequency for each of the 8 genes or regions is small, being at most 11%. For each of the novel genes, the risk (or protective) haplotype set cannot be tagged by the common SNPs available in chips (r2<0.32). The gene identified in hypertension was further replicated in the Framingham Heart Study (FHS), and is also significantly associated with Type 2 Diabetes. Our analysis suggests that searching for rare genetic variants is feasible in current genome-wide association studies and candidate gene studies, and the results can severe as guides to future resequencing studies to identify the underlying rare functional variants.
Motivation: Gene–gene interactions (epistasis) are thought to be important in shaping complex traits, but they have been under-explored in genome-wide association studies (GWAS) due to the computational challenge of enumerating billions of single nucleotide polymorphism (SNP) combinations. Fast screening tools are needed to make epistasis analysis routinely available in GWAS.
Results: We present BiForce to support high-throughput analysis of epistasis in GWAS for either quantitative or binary disease (case–control) traits. BiForce achieves great computational efficiency by using memory efficient data structures, Boolean bitwise operations and multithreaded parallelization. It performs a full pair-wise genome scan to detect interactions involving SNPs with or without significant marginal effects using appropriate Bonferroni-corrected significance thresholds. We show that BiForce is more powerful and significantly faster than published tools for both binary and quantitative traits in a series of performance tests on simulated and real datasets. We demonstrate BiForce in analysing eight metabolic traits in a GWAS cohort (323 697 SNPs, >4500 individuals) and two disease traits in another (>340 000 SNPs, >1750 cases and 1500 controls) on a 32-node computing cluster. BiForce completed analyses of the eight metabolic traits within 1 day, identified nine epistatic pairs of SNPs in five metabolic traits and 18 SNP pairs in two disease traits. BiForce can make the analysis of epistasis a routine exercise in GWAS and thus improve our understanding of the role of epistasis in the genetic regulation of complex traits.
Availability and implementation: The software is free and can be downloaded from http://bioinfo.utu.fi/BiForce/.
Supplementary data are available at Bioinformatics online.
The molecular mechanisms involved in the development of type 2 diabetes are poorly understood. Starting from genome-wide genotype data for 1,924 diabetic cases and 2,938 population controls generated by the Wellcome Trust Case Control Consortium, we set out to detect replicated diabetes association signals through analysis of 3,757 additional cases and 5,346 controls, and by integration of our findings with equivalent data from other international consortia. We detected diabetes susceptibility loci in and around the genes CDKAL1, CDKN2A/CDKN2B and IGF2BP2 and confirmed the recently described associations at HHEX/IDE and SLC30A8. Our findings provide insights into the genetic architecture of type 2 diabetes, emphasizing the contribution of multiple variants of modest effect. The regions identified underscore the importance of pathways influencing pancreatic beta cell development and function in the etiology of type 2 diabetes.
Genome-wide association studies (GWAS) using single nucleotide polymorphism (SNP) markers provide opportunities to detect epistatic SNPs associated with quantitative traits and to detect the exact mode of an epistasis effect. Computational difficulty is the main bottleneck for epistasis testing in large scale GWAS.
The EPISNPmpi and EPISNP computer programs were developed for testing single-locus and epistatic SNP effects on quantitative traits in GWAS, including tests of three single-locus effects for each SNP (SNP genotypic effect, additive and dominance effects) and five epistasis effects for each pair of SNPs (two-locus interaction, additive × additive, additive × dominance, dominance × additive, and dominance × dominance) based on the extended Kempthorne model. EPISNPmpi is the parallel computing program for epistasis testing in large scale GWAS and achieved excellent scalability for large scale analysis and portability for various parallel computing platforms. EPISNP is the serial computing program based on the EPISNPmpi code for epistasis testing in small scale GWAS using commonly available operating systems and computer hardware. Three serial computing utility programs were developed for graphical viewing of test results and epistasis networks, and for estimating CPU time and disk space requirements.
The EPISNPmpi parallel computing program provides an effective computing tool for epistasis testing in large scale GWAS, and the epiSNP serial computing programs are convenient tools for epistasis analysis in small scale GWAS using commonly available computer hardware.
The Wellcome Trust Case Control Consortium (WTCCC) primary genome-wide association (GWA) scan1 on seven diseases, including the multifactorial, autoimmune disease, type 1 diabetes (T1D), shows significant association (P < 5 × 10−7 between T1D and six chromosome regions: 12q24, 12q13, 16p13, 18p11, 12p13 and 4q27. Here, we attempted to validate these and six other top findings in 4,000 individuals with T1D, 5,000 controls and 2,997 family trios that were independent of the WTCCC study. We confirmed unequivocally the associations of 12q24, 12q13, 16p13 and 18p11 (Pfollow-up ≤ 1.35 × 10−9; Poverall ≤ 1.15 × 10−14), leaving eight regions with small effects or false-positive associations with T1D. We also obtained evidence for chromosome 18q22 (Poverall = 1.38 × 10−8) from a genome-wide association study of nonsynonymous SNPs. Several regions, including 18q22 and 18p11, showed association with autoimmune thyroid disease. This study increases the number of T1D loci with compelling evidence from six to at least ten.
Coronary artery disease (CAD) shares common risk factors with type 2 diabetes (T2DM). Variations in the transcription factor 7-like 2 (TCF7L2) gene, particularly rs7903146, increase T2DM risk. Potential links between genetic variants of the TCF7L2 locus and coronary atherosclerosis are uncertain. We therefore investigated the association between TCF7L2 polymorphisms and angiographically determined CAD in diabetic and non-diabetic patients.
We genotyped TCF7L2 variants rs7903146, rs12255372, and rs11196205 in a cross-sectional study including 1,650 consecutive patients undergoing coronary angiography for the evaluation of established or suspected stable CAD. Significant CAD was diagnosed in the presence of coronary stenoses ≥50%. Variant rs7903146 in the total study cohort was significantly associated with significant CAD (adjusted additive OR = 1.29 [1.09–1.53]; p = 0.003). This association was strong and significant in T2DM patients (n = 393; OR = 1.91 [1.32–2.75]; p = 0.001) but not in non-diabetic subjects (OR = 1.09 [0.90–1.33]; p = 0.370). The interaction risk allele by T2DM was significant (pinteraction = 0.002), indicating a significantly stronger impact of the polymorphism on CAD in T2DM patients than in non-diabetic subjects. TCF7L2 polymorphisms rs12255372 and rs11196205 were also significantly associated with CAD in diabetic patients (adjusted additive OR = 1.90 [1.31–2.74]; p = 0.001 and OR = 1.75 [1.22–2.50]; p = 0.002, respectively). Further, haplotype analysis demonstrated that haplotypes including the rare alleles of all investigated variants were significantly associated with CAD in the whole cohort as well as in diabetic subjects (OR = 1.22 [1.04–1.43]; p = 0.013 and OR = 1.67 [1.19–2.22]; p = 0.003, respectively).
These results suggest that TCF7L2 variants rs7903146 rs12255372, and rs11196205 are significantly associated with angiographically diagnosed CAD, specifically in patients with T2DM. TCF7L2 therefore appears as a genetic link between diabetes and atherosclerosis.
It has been postulated that multiple-marker methods may have added ability, over single-marker methods, to detect genetic variants associated with disease. The Wellcome Trust Case Control Consortium (WTCCC) provided the first successful large genome-wide association studies (GWAS) which included single-marker association analyses for seven common complex diseases. Of those signals detected, only one was associated with coronary artery disease (CAD), and none were identified for hypertension (HTN). Our objective was to find additional genetic associations and pathways for cardiovascular disease by examining the WTCCC data for variants associated with CAD and HTN using two-marker testing methods. We applied two-marker association testing to the WTCCC dataset, which includes ~2,000 affected individuals with each disorder, and a shared pool of ~3,000 controls, all genotyped using Affymetrix GeneChip 500 K arrays. For CAD, we detected single nucleotide polymorphisms (SNP) pairs in three genes showing genome-wide significance: HFE2, STK32B, and DIPC2. The most notable SNP pairs in a non-protein-coding region were at 9p21, a known major CAD-associated region. For HTN, we detected SNP pairs in five genes: GPR39, XRCC4, MYO6, ZFAT, and MACROD2. Four further associated SNP pair regions were at least 70 kb from any known gene. We have shown that novel, multiple-marker, statistical methods can be of use in finding variants in GWAS. We describe many new, associated variants for both CAD and HTN and describe their known genetic mechanisms.
Rheumatoid arthritis (RA) is an archetypal, common, complex autoimmune disease with both genetic and environmental contributions to disease aetiology. Two novel RA susceptibility loci have been reported from recent genome-wide and candidate gene association studies. We, therefore, investigated the evidence for association of the STAT4 and TRAF1/C5 loci with RA using imputed data from the Wellcome Trust Case Control Consortium (WTCCC). No evidence for association of variants mapping to the TRAF1/C5 gene was detected in the 1860 RA cases and 2930 control samples tested in that study. Variants mapping to the STAT4 gene did show evidence for association (rs7574865, P = 0.04). Given the association of the TRAF1/C5 locus in two previous large case–control series from populations of European descent and the evidence for association of the STAT4 locus in the WTCCC study, single nucleotide polymorphisms mapping to these loci were tested for association with RA in an independent UK series comprising DNA from >3000 cases with disease and >3000 controls and a combined analysis including the WTCCC data was undertaken. We confirm association of the STAT4 and the TRAF1/C5 loci with RA bringing to 5 the number of confirmed susceptibility loci. The effect sizes are less than those reported previously but are likely to be a more accurate reflection of the true effect size given the larger size of the cohort investigated in the current study.
Psychiatric phenotypes are currently defined according to sets of
descriptive criteria. Although many of these phenotypes are heritable, it
would be useful to know whether any of the various diagnostic categories in
current use identify cases that are particularly helpful for
To use genome-wide genetic association data to explore the relative genetic
utility of seven different descriptive operational diagnostic categories
relevant to bipolar illness within a large UK case–control bipolar
We analysed our previously published Wellcome Trust Case Control Consortium
(WTCCC) bipolar disorder genome-wide association data-set, comprising 1868
individuals with bipolar disorder and 2938 controls genotyped for 276 122
single nucleotide polymorphisms (SNPs) that met stringent criteria for
genotype quality. For each SNP we performed a test of association (bipolar
disorder group v. control group) and used the number of associated
independent SNPs statistically significant at P<0.00001 as a
metric for the overall genetic signal in the sample. We next compared this
metric with that obtained using each of seven diagnostic subsets of the group
with bipolar disorder: Research Diagnostic Criteria (RDC): bipolar I disorder;
manic disorder; bipolar II disorder; schizoaffective disorder, bipolar type;
DSM–IV: bipolar I disorder; bipolar II disorder; schizoaffective
disorder, bipolar type.
The RDC schizoaffective disorder, bipolar type (v. controls) stood
out from the other diagnostic subsets as having a significant excess of
independent association signals (P<0.003) compared with that
expected in samples of the same size selected randomly from the total bipolar
disorder group data-set. The strongest association in this subset of
participants with bipolar disorder was at rs4818065 (P =
2.42×10–7). Biological systems implicated included
gamma amniobutyric acid (GABA)A receptors. Genes having at least
one associated polymorphism at P<10–4 included
B3GALTS, A2BP1, GABRB1, AUTS2, BSN, PTPRG, GIRK2 and
Our findings show that individuals with broadly defined bipolar
schizoaffective features have either a particularly strong genetic
contribution or that, as a group, are genetically more homogeneous than the
other phenotypes tested. The results point to the importance of using
diagnostic approaches that recognise this group of individuals. Our approach
can be applied to similar data-sets for other psychiatric and non-psychiatric
Genome-wide association studies (GWAS) have identified single-nucleotide polymorphisms (SNPs) at multiple loci that are significantly associated with coronary artery disease (CAD) risk. In this study, we sought to determine and compare the predictive capabilities of 9p21.3 alone and a panel of SNPs identified and replicated through GWAS for CAD.
Methods and Results
We used the Ottawa Heart Genomics Study (OHGS) (3323 cases, 2319 control subjects) and the Wellcome Trust Case Control Consortium (WTCCC) (1926 cases, 2938 control subjects) data sets. We compared the ability of allele counting, logistic regression, and support vector machines. Two sets of SNPs, 9p21.3 alone and a set of 12 SNPs identified by GWAS and through a model-fitting procedure, were considered. Performance was assessed by measuring area under the curve (AUC) for OHGS using 10-fold cross-validation and WTCCC as a replication set. AUC for logistic regression using OHGS increased significantly from 0.555 to 0.608 (P=3.59×10–14) for 9p21.3 versus the 12 SNPs, respectively. This difference remained when traditional risk factors were considered in a subgroup of OHGS (1388 cases, 2038 control subjects), with AUC increasing from 0.804 to 0.809 (P=0.037). The added predictive value over and above the traditional risk factors was not significant for 9p21.3 (AUC 0.801 versus 0.804, P=0.097) but was for the 12 SNPs (AUC 0.801 versus 0.809, P=0.0073). Performance was similar between OHGS and WTCCC. Logistic regression outperformed both support vector machines and allele counting.
Using the collective of 12 SNPs confers significantly greater predictive capabilities for CAD than 9p21.3, whether traditional risks are or are not considered. More accurate models probably will evolve as additional CAD-associated SNPs are identified.
coronary disease; genetics; risk factors
Most pathway and gene-set enrichment methods prioritize genes by their main effect and do not account for variation due to interactions in the pathway. A portion of the presumed missing heritability in genome-wide association studies (GWAS) may be accounted for through gene–gene interactions and additive genetic variability. In this study, we prioritize genes for pathway enrichment in GWAS of bipolar disorder (BD) by aggregating gene–gene interaction information with main effect associations through a machine learning (evaporative cooling) feature selection and epistasis network centrality analysis. We validate this approach in a two-stage (discovery/replication) pathway analysis of GWAS of BD. The discovery cohort comes from the Wellcome Trust Case Control Consortium (WTCCC) GWAS of BD, and the replication cohort comes from the National Institute of Mental Health (NIMH) GWAS of BD in European Ancestry individuals. Epistasis network centrality yields replicated enrichment of Cadherin signaling pathway, whose genes have been hypothesized to have an important role in BD pathophysiology but have not demonstrated enrichment in previous analysis. Other enriched pathways include Wnt signaling, circadian rhythm pathway, axon guidance and neuroactive ligand-receptor interaction. In addition to pathway enrichment, the collective network approach elevates the importance of ANK3, DGKH and ODZ4 for BD susceptibility in the WTCCC GWAS, despite their weak single-locus effect in the data. These results provide evidence that numerous small interactions among common alleles may contribute to the diathesis for BD and demonstrate the importance of including information from the network of gene–gene interactions as well as main effects when prioritizing genes for pathway analysis.
eigenvector centrality; epistasis network; evaporative cooling machine learning feature selection; pathway enrichment analysis; regression-based genetic association interaction network (reGAIN); SNPrank
Genome-wide association study (GWAS) aims to find genetic factors underlying complex phenotypic traits, for which epistasis or gene-gene interaction detection is often preferred over single-locus approach. However, the computational burden has been a major hurdle to apply epistasis test in the genome-wide scale due to a large number of single nucleotide polymorphism (SNP) pairs to be tested.
We have developed a set of three efficient programs, FastANOVA, COE and TEAM, that support epistasis test in a variety of problem settings in GWAS. These programs utilize permutation test to properly control error rate such as family-wise error rate (FWER) and false discovery rate (FDR). They guarantee to find the optimal solutions, and significantly speed up the process of epistasis detection in GWAS.
A web server with user interface and source codes are available at the website http://www.csbio.unc.edu/epistasis/. The source codes are also available at SourceForge http://sourceforge.net/projects/epistasis/.
We surveyed gene–gene interactions (epistasis) in human body mass index (BMI) in four European populations (n<1200) via exhaustive pair-wise genome scans where interactions were computed as F ratios by testing a linear regression model fitting two single-nucleotide polymorphisms (SNPs) with interactions against the one without. Before the association tests, BMI was corrected for sex and age, normalised and adjusted for relatedness. Neither single SNPs nor SNP interactions were genome-wide significant in either cohort based on the consensus threshold (P=5.0E−08) and a Bonferroni corrected threshold (P=1.1E−12), respectively. Next we compared sub genome-wide significant SNP interactions (P<5.0E−08) across cohorts to identify common epistatic signals, where SNPs were annotated to genes to test for gene ontology (GO) enrichment. Among the epistatic genes contributing to the commonly enriched GO terms, 19 were shared across study cohorts of which 15 are previously published genome-wide association loci, including CDH13 (cadherin 13) associated with height and SORCS2 (sortilin-related VPS10 domain containing receptor 2) associated with circulating insulin-like growth factor 1 and binding protein 3. Interactions between the 19 shared epistatic genes and those involving BMI candidate loci (P<5.0E−08) were tested across cohorts and found eight replicated at the SNP level (P<0.05) in at least one cohort, which were further tested and showed limited replication in a separate European population (n>5000). We conclude that genome-wide analysis of epistasis in multiple populations is an effective approach to provide new insights into the genetic regulation of BMI but requires additional efforts to confirm the findings.
body mass index; BMI; gene interaction; epistasis; pair-wise genome scan
We performed a genome-wide search for pairs of susceptibility loci that jointly contribute to rheumatoid arthritis in families recruited by the North American Rheumatoid Arthritis Consortium. A complete two-dimensional (2D) non-parametric linkage scan was carried out using 380 autosomal microsatellite markers in 511 families. At each 2D peak we obtained the most likely underlying genetic model explaining the two-locus effects, defining epistasis as a departure from an additive or a multiplicative two-locus penetrance function. The highest peak in the surface identified an epistatic interaction between loci 6p21 and 16p12 (two-locus lod score = 18.02, epistasis P < 0.012). Significant and suggestive two-locus effects were also obtained for region 6p21 in combination with loci 18q21, 8p23, 1q41, and 6p22, while the highest 2D peaks excluding region 6p21 were observed at locus pairs 8p23-18q21 and 1p21-18q21. The 2D peaks were further examined using combined microsatellite and single-nucleotide polymorphism (SNP) marker genotypes in 744 families. The two-locus evidence for linkage increased for region pairs 6p21-18q12, 6p21-16p12, 6p21-8p23, 1q41-6p21, and 6p21-6p22, but decreased for pairs of regions that did not include locus 6p21. In conclusion, we obtained evidence for multi-locus interactions in rheumatoid arthritis that are mediated by the major susceptibility locus at 6p21.
In 2007, the Wellcome Trust Case Control Consortium (WTCCC) performed a genome-wide association study in 2,000 British coronary heart disease (CHD) cases and 3,000 controls after genotyping 469,557 single nucleotide polymorphisms (SNPs). Seven variants associated with CHD were initially identified, and 5 SNPs were later found in replication studies. In the current study, the authors aimed to determine whether the 12 SNPs reported by the WTCCC predicted incident CHD through 2004 in a biracial, prospective cohort study (Atherosclerosis Risk in Communities) comprising 15,792 persons aged 45–64 years who had been selected by probability sampling from 4 different US communities in 1987–1989. Cox proportional hazards models with adjustment for age and gender were used to estimate CHD hazard rate ratios (HRRs) over a 17-year period (1,362 cases in whites and 397 cases in African Americans) under an additive genetic model. The results showed that 3 SNPs in whites (rs599839, rs1333049, and rs501120; HRRs were 1.10 (P = 0.044), 1.14 (P < 0.001), and 1.14 (P = 0.030), respectively) and 1 SNP in African Americans (rs7250581; HRR = 1.60, P = 0.05) were significantly associated with incident CHD. This study demonstrates that genetic variants revealed in a case-control genome-wide association study enriched for early disease onset may play a role in the genetic etiology of CHD in the general population.
coronary disease; genetic variation; genetics; genomics; heart diseases; myocardial infarction; polymorphism, single nucleotide
Modern genotyping platforms permit a systematic search for inherited components of complex diseases. We performed a joint analysis of two genomewide association studies of coronary artery disease.
We first identified chromosomal loci that were strongly associated with coronary artery disease in the Wellcome Trust Case Control Consortium (WTCCC) study (which involved 1926 case subjects with coronary artery disease and 2938 controls) and looked for replication in the German MI [Myocardial Infarction] Family Study (which involved 875 case subjects with myocardial infarction and 1644 controls). Data on other single-nucleotide polymorphisms (SNPs) that were significantly associated with coronary artery disease in either study (P<0.001) were then combined to identify additional loci with a high probability of true association. Genotyping in both studies was performed with the use of the GeneChip Human Mapping 500K Array Set (Affymetrix).
Of thousands of chromosomal loci studied, the same locus had the strongest association with coronary artery disease in both the WTCCC and the German studies: chromosome 9p21.3 (SNP, rs1333049) (P=1.80×10−14 and P=3.40×10−6, respectively). Overall, the WTCCC study revealed nine loci that were strongly associated with coronary artery disease (P<1.2×10−5 and less than a 50% chance of being falsely positive). In addition to chromosome 9p21.3, two of these loci were successfully replicated (adjusted P<0.05) in the German study: chromosome 6q25.1 (rs6922269) and chromosome 2q36.3 (rs2943634). The combined analysis of the two studies identified four additional loci significantly associated with coronary artery disease (P<1.3×10−6) and a high probability (>80%) of a true association: chromosomes 1p13.3 (rs599839), 1q41 (rs17465637), 10q11.21 (rs501120), and 15q22.33 (rs17228212).
We identified several genetic loci that, individually and in aggregate, substantially affect the risk of development of coronary artery disease.
Several large, rare chromosomal copy number variants (CNVs) have recently been shown to increase risk for schizophrenia and other neuropsychiatric disorders including autism, ADHD, learning difficulties and epilepsy.
We wanted to examine the frequencies of these schizophrenia-associated variants in a large sample of individuals with non-psychiatric illnesses to better understand the robustness and specificity of the association with schizophrenia.
We used Affymetrix 500K microarray data from 10,259 individuals from the UK Wellcome Trust Case Control Consortium (WTCCC) who are affected with six non-psychiatric disorders (coronary artery disease, Crohn's disease, hypertension, rheumatoid arthritis, types 1 and 2 diabetes) to establish the frequencies of nine CNV loci strongly implicated in schizophrenia, and compared them with the previous findings.
Deletions at 1q21.1, 3q29, 15q11.2, 15q13.1 and 22q11.2 (VCFS region), and duplications at 16p11.2 were found significantly more often in schizophrenia cases, compared with the WTCCC reference set. Deletions at 17p12 and 17q12, were also more common in schizophrenia cases but not significantly so, while duplications at 16p13.1 were found at nearly the same rate as in previous schizophrenia samples. The frequencies of CNVs in the WTCCC non-psychiatric controls at three of the loci (15q11.2, 16p13.1 and 17p12) were significantly higher than those reported in previous control populations.
The evidence for association with schizophrenia is compelling for six rare CNV loci, while the remaining three require further replication in large studies. Risk at these loci extends to other neurodevelopmental disorders but their involvement in common non-psychiatric disorders should also be investigated.
CNV; Schizophrenia; WTCCC