Genome-wide association studies have identified a large number of single-nucleotide polymorphisms (SNPs) that individually predispose to diseases. However, many genetic risk factors remain unaccounted for. Proteins coded by genes interact in the cell, and it is most likely that certain variants mainly affect the phenotype in combination with other variants, a phenomenon termed epistasis. An exhaustive search for epistatic effects is computationally demanding, as several billions of SNP pairs exist for typical genotyping chips. In this study, the experimental knowledge on biological networks is used to narrow the search for two-locus epistasis. We provide evidence that this approach is computationally feasible and statistically powerful. By applying this method to the Wellcome Trust Case–Control Consortium data sets, we report four significant cases of epistasis between unlinked loci, in susceptibility to Crohn's disease, bipolar disorder, hypertension and rheumatoid arthritis.
association studies; genome-wide scan; epistasis; biological network
Genome-wide association (GWA) studies have identified a number of loci underlying variation in human serum uric acid (SUA) levels with the SLC2A9 gene having the largest effect identified so far. Gene-gene interactions (epistasis) are largely unexplored in these GWA studies. We performed a full pair-wise genome scan in the Italian MICROS population (n = 1201) to characterise epistasis signals in SUA levels. In the resultant epistasis profile, no SNP pairs reached the Bonferroni adjusted threshold for the pair-wise genome-wide significance. However, SLC2A9 was found interacting with multiple loci across the genome, with NFIA - SLC2A9 and SLC2A9 - ESRRAP2 being significant based on a threshold derived for interactions between GWA significant SNPs and the genome and jointly explaining 8.0% of the phenotypic variance in SUA levels (3.4% by interaction components). Epistasis signal replication in a CROATIAN population (n = 1772) was limited at the SNP level but improved dramatically at the gene ontology level. In addition, gene ontology terms enriched by the epistasis signals in each population support links between SUA levels and neurological disorders. We conclude that GWA epistasis analysis is useful despite relatively low power in small isolated populations.
Genome-wide association studies (GWAS) using single nucleotide polymorphism (SNP) markers provide opportunities to detect epistatic SNPs associated with quantitative traits and to detect the exact mode of an epistasis effect. Computational difficulty is the main bottleneck for epistasis testing in large scale GWAS.
The EPISNPmpi and EPISNP computer programs were developed for testing single-locus and epistatic SNP effects on quantitative traits in GWAS, including tests of three single-locus effects for each SNP (SNP genotypic effect, additive and dominance effects) and five epistasis effects for each pair of SNPs (two-locus interaction, additive × additive, additive × dominance, dominance × additive, and dominance × dominance) based on the extended Kempthorne model. EPISNPmpi is the parallel computing program for epistasis testing in large scale GWAS and achieved excellent scalability for large scale analysis and portability for various parallel computing platforms. EPISNP is the serial computing program based on the EPISNPmpi code for epistasis testing in small scale GWAS using commonly available operating systems and computer hardware. Three serial computing utility programs were developed for graphical viewing of test results and epistasis networks, and for estimating CPU time and disk space requirements.
The EPISNPmpi parallel computing program provides an effective computing tool for epistasis testing in large scale GWAS, and the epiSNP serial computing programs are convenient tools for epistasis analysis in small scale GWAS using commonly available computer hardware.
Motivation: Gene–gene interactions (epistasis) are thought to be important in shaping complex traits, but they have been under-explored in genome-wide association studies (GWAS) due to the computational challenge of enumerating billions of single nucleotide polymorphism (SNP) combinations. Fast screening tools are needed to make epistasis analysis routinely available in GWAS.
Results: We present BiForce to support high-throughput analysis of epistasis in GWAS for either quantitative or binary disease (case–control) traits. BiForce achieves great computational efficiency by using memory efficient data structures, Boolean bitwise operations and multithreaded parallelization. It performs a full pair-wise genome scan to detect interactions involving SNPs with or without significant marginal effects using appropriate Bonferroni-corrected significance thresholds. We show that BiForce is more powerful and significantly faster than published tools for both binary and quantitative traits in a series of performance tests on simulated and real datasets. We demonstrate BiForce in analysing eight metabolic traits in a GWAS cohort (323 697 SNPs, >4500 individuals) and two disease traits in another (>340 000 SNPs, >1750 cases and 1500 controls) on a 32-node computing cluster. BiForce completed analyses of the eight metabolic traits within 1 day, identified nine epistatic pairs of SNPs in five metabolic traits and 18 SNP pairs in two disease traits. BiForce can make the analysis of epistasis a routine exercise in GWAS and thus improve our understanding of the role of epistasis in the genetic regulation of complex traits.
Availability and implementation: The software is free and can be downloaded from http://bioinfo.utu.fi/BiForce/.
Supplementary data are available at Bioinformatics online.
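The Boolean bitwise operations mentioned in the Results can be illustrated with a minimal sketch. This is not BiForce's actual code: the function names `encode_snp` and `pair_table` are hypothetical, and Python integers stand in for the packed bit arrays a real implementation would probably use. Each SNP is stored as three bitmasks (one per genotype 0, 1, 2), and the 3 × 3 joint genotype table for a SNP pair is obtained by AND-ing masks and counting set bits.

```python
def encode_snp(genotypes):
    """Pack a list of genotypes (0, 1 or 2 minor alleles) into three bitmasks.

    Bit i of masks[g] is set when individual i carries genotype g.
    """
    masks = [0, 0, 0]
    for i, g in enumerate(genotypes):
        masks[g] |= 1 << i
    return masks


def pair_table(masks_a, masks_b):
    """Return the 3 x 3 table of joint genotype counts for two SNPs."""
    # One AND plus one bit count per cell replaces a loop over individuals.
    return [[bin(ma & mb).count("1") for mb in masks_b] for ma in masks_a]


# Toy genotypes for six individuals at two hypothetical SNPs.
snp_a = [0, 1, 2, 1, 0, 2]
snp_b = [0, 0, 2, 1, 1, 2]
table = pair_table(encode_snp(snp_a), encode_snp(snp_b))
print(table)  # [[1, 1, 0], [1, 1, 0], [0, 0, 2]]
```

The joint table is the input to most pair-wise interaction statistics, so building it with a handful of word-level operations rather than a per-individual loop is where this kind of encoding would save time.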
With the exception of the major histocompatibility complex (MHC) and STAT4, no other rheumatoid arthritis (RA) linkage peak has been successfully fine-mapped to date. This apparent failure to identify association under peaks of linkage could be ascribed to the examination of common variation, when linkage is likely to be driven by rare variants. The purpose of this study was to investigate the overlap between genome-wide rare variant RA association signals observed in the Wellcome Trust Case Control Consortium (WTCCC) study and 11 replicating RA linkage peaks, defined as regions with evidence for linkage in >1 study.
The WTCCC data set contained 40,482 variants with minor allele frequency of ≤0.05 in 1,860 RA patients and 2,938 controls. Genotypes of all rare variants within a given gene region were collapsed into a single locus and a global P value was calculated per gene.
The distribution of rare variant signals (association P ≤ 10^−5) was found to differ significantly between regions with and without linkage evidence (P = 2 × 10^−17 by Fisher’s exact test). No significant difference was observed after data from the MHC region were removed or when the effect of the HLA–DRB1 locus was accounted for.
The results suggest that rare variant association signals are significantly overrepresented under linkage peaks in RA, but the effect is driven by the MHC. This is the first study to examine the overlap between linkage peaks and rare variant association signals genome-wide in a complex disease.
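The collapsing step described in the methods above can be sketched as follows. This is an illustration only, assuming a simple carrier-indicator rule (an individual is coded 1 if they carry a minor allele at any rare variant in the gene region); the exact collapsing rule and test statistic used in the study are not stated here, and the function name `collapse_rare_variants` and the toy data are hypothetical.

```python
def collapse_rare_variants(genotype_rows, maf, maf_threshold=0.05):
    """Collapse rare variants in one gene region into a single carrier indicator.

    genotype_rows: one list per individual, each holding minor-allele counts
                   (0, 1 or 2) for every variant in the region.
    maf:           minor allele frequency of each variant, in the same order.
    Returns one 0/1 value per individual (1 = carries at least one rare allele).
    """
    rare = [j for j, f in enumerate(maf) if f <= maf_threshold]
    return [int(any(row[j] > 0 for j in rare)) for row in genotype_rows]


# Toy region with three variants; the second is common and is ignored.
maf = [0.01, 0.30, 0.04]
individuals = [
    [0, 1, 0],  # only the common variant     -> 0
    [1, 0, 0],  # rare allele at variant 1    -> 1
    [0, 2, 1],  # rare allele at variant 3    -> 1
    [0, 0, 0],  # no minor alleles at all     -> 0
]
collapsed = collapse_rare_variants(individuals, maf)
print(collapsed)  # [0, 1, 1, 0]
```

The collapsed indicator can then be compared between cases and controls with a single test per gene, which is what makes a per-gene global P value possible.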
We hypothesize that imputation based on data from the 1000 Genomes Project can identify novel association signals on a genome-wide scale due to the dense marker map and the large number of haplotypes. To test the hypothesis, the Wellcome Trust Case Control Consortium (WTCCC) Phase I genotype data were imputed using 1000 genomes as reference (20100804 EUR), and seven case/control association studies were performed using imputed dosages. We observed two ‘missed’ disease-associated variants that were undetectable by the original WTCCC analysis, but were reported by later studies after the 2007 WTCCC publication. One is within the IL2RA gene for association with type 1 diabetes and the other in proximity with the CDKN2B gene for association with type 2 diabetes. We also identified two refined associations. One is SNP rs11209026 in exon 9 of IL23R for association with Crohn's disease, which is predicted to be probably damaging by PolyPhen2. The other refined variant is in the CUX2 gene region for association with type 1 diabetes, where the newly identified top SNP rs1265564 has an association P-value of 1.68 × 10^−16. The new lead SNP for the two refined loci provides a more plausible explanation for the disease association. We demonstrated that 1000 Genomes-based imputation could indeed identify both novel (in our case, ‘missed’ because they were detected and replicated by studies after 2007) and refined signals. We anticipate the findings derived from this study to provide timely information when individual groups and consortia are beginning to engage in 1000 genomes-based imputation.
genome-wide association study; the 1000 Genomes project; imputation
We performed a genome-wide search for pairs of susceptibility loci that jointly contribute to rheumatoid arthritis in families recruited by the North American Rheumatoid Arthritis Consortium. A complete two-dimensional (2D) non-parametric linkage scan was carried out using 380 autosomal microsatellite markers in 511 families. At each 2D peak we obtained the most likely underlying genetic model explaining the two-locus effects, defining epistasis as a departure from an additive or a multiplicative two-locus penetrance function. The highest peak in the surface identified an epistatic interaction between loci 6p21 and 16p12 (two-locus lod score = 18.02, epistasis P < 0.012). Significant and suggestive two-locus effects were also obtained for region 6p21 in combination with loci 18q21, 8p23, 1q41, and 6p22, while the highest 2D peaks excluding region 6p21 were observed at locus pairs 8p23-18q21 and 1p21-18q21. The 2D peaks were further examined using combined microsatellite and single-nucleotide polymorphism (SNP) marker genotypes in 744 families. The two-locus evidence for linkage increased for region pairs 6p21-18q12, 6p21-16p12, 6p21-8p23, 1q41-6p21, and 6p21-6p22, but decreased for pairs of regions that did not include locus 6p21. In conclusion, we obtained evidence for multi-locus interactions in rheumatoid arthritis that are mediated by the major susceptibility locus at 6p21.
We surveyed gene–gene interactions (epistasis) in human body mass index (BMI) in four European populations (n<1200) via exhaustive pair-wise genome scans where interactions were computed as F ratios by testing a linear regression model fitting two single-nucleotide polymorphisms (SNPs) with interactions against the one without. Before the association tests, BMI was corrected for sex and age, normalised and adjusted for relatedness. Neither single SNPs nor SNP interactions were genome-wide significant in any cohort based on the consensus threshold (P=5.0E−08) and a Bonferroni corrected threshold (P=1.1E−12), respectively. Next we compared sub genome-wide significant SNP interactions (P<5.0E−08) across cohorts to identify common epistatic signals, where SNPs were annotated to genes to test for gene ontology (GO) enrichment. Among the epistatic genes contributing to the commonly enriched GO terms, 19 were shared across study cohorts of which 15 are previously published genome-wide association loci, including CDH13 (cadherin 13) associated with height and SORCS2 (sortilin-related VPS10 domain containing receptor 2) associated with circulating insulin-like growth factor 1 and binding protein 3. Interactions between the 19 shared epistatic genes and those involving BMI candidate loci (P<5.0E−08) were tested across cohorts, and eight were found to replicate at the SNP level (P<0.05) in at least one cohort; these were further tested and showed limited replication in a separate European population (n>5000). We conclude that genome-wide analysis of epistasis in multiple populations is an effective approach to provide new insights into the genetic regulation of BMI but requires additional efforts to confirm the findings.
body mass index; BMI; gene interaction; epistasis; pair-wise genome scan
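As a worked illustration of how a Bonferroni threshold for an exhaustive pair-wise scan arises, the sketch below counts SNP pairs and divides a family-wise error rate of 0.05 by that count. The SNP count of 300,000 is an assumption chosen for illustration (the abstract does not state how many SNPs were tested); it happens to give a value close to the reported threshold of P=1.1E−12. The function names are hypothetical.

```python
def n_pairs(n_snps):
    """Number of unordered SNP pairs in an exhaustive pair-wise scan."""
    return n_snps * (n_snps - 1) // 2


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests


n_snps = 300_000  # assumed for illustration, not taken from the abstract
pairs = n_pairs(n_snps)
threshold = bonferroni_threshold(pairs)
print(f"{pairs:,} pairs -> threshold {threshold:.2e}")
# 44,999,850,000 pairs -> threshold 1.11e-12
```

The same arithmetic shows why the pair-wise threshold is several orders of magnitude stricter than the single-SNP consensus threshold of 5.0E−08.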
Although they have demonstrated success in searching for common variants for complex diseases, Genome-Wide Association (GWA) studies are less successful in detecting rare genetic variants because of the poor statistical power of most current methods. We developed a two-stage method that can be applied to GWA studies for detecting rare variants. Here we report the results of applying this two-stage method to the Wellcome Trust Case Control Consortium (WTCCC) dataset, which includes 7 complex diseases: Bipolar disorder, Cardiovascular disease, Hypertension, Rheumatoid Arthritis, Crohn’s disease, Type 1 Diabetes and Type 2 Diabetes. We identified 24 genes or regions that reach genome-wide significance. 8 of them are novel and were not reported in the WTCCC study. The cumulative risk (or protective) haplotype frequency for each of the 8 genes or regions is small, being at most 11%. For each of the novel genes, the risk (or protective) haplotype set cannot be tagged by the common SNPs available in chips (r^2<0.32). The gene identified in hypertension was further replicated in the Framingham Heart Study (FHS), and is also significantly associated with Type 2 Diabetes. Our analysis suggests that searching for rare genetic variants is feasible in current genome-wide association studies and candidate gene studies, and the results can serve as guides to future resequencing studies to identify the underlying rare functional variants.
Given that genome wide association studies (GWAS) of psychiatric disorders have identified only a small number of convincingly associated variants, there is interest in seeking additional evidence for associated variants using tests of gene-gene interaction. Comprehensive pair-wise SNP-SNP interaction analysis is computationally intensive and the penalty for multiple testing is severe given the number of interactions possible. Aiming to minimize these statistical and computational burdens, we have explored approaches to prioritise SNPs for interaction analyses.
Primary interaction analyses were performed using the Wellcome Trust Case Control Consortium Bipolar Disorder GWAS (1868 cases, 2938 controls). Replication analyses were performed using the Genetic Association Information Network BD dataset (1001 cases, 1033 controls). SNPs were prioritized for interaction analysis if they showed evidence for association that surpassed a number of nominal significance thresholds, were within genome-wide significant genes, or were within genes that are functionally related.
For no set of prioritized SNPs did we obtain evidence to support the hypothesis that the selection strategy identified pairs of variants that were enriched for true (statistical) interactions.
SNPs prioritized according to a number of criteria do not have a raised prior probability for significant interaction that is detectable in samples of this size. As is now widely accepted for single SNP analysis, we argue the use of significance levels reflecting only the number of tests performed does not offer an appropriate degree of protection against the potential for GWAS studies to generate an enormous number of false positive interactions.
GWAS; SNP; epistasis; association; interaction; gene
Genome-wide association studies (GWAS) aim to find genetic factors underlying complex phenotypic traits, for which detection of epistasis (gene-gene interaction) is often preferred over single-locus approaches. However, the computational burden has been a major hurdle to applying epistasis tests at the genome-wide scale, owing to the large number of single nucleotide polymorphism (SNP) pairs to be tested.
We have developed a set of three efficient programs, FastANOVA, COE and TEAM, that support epistasis testing in a variety of problem settings in GWAS. These programs use permutation tests to properly control error rates such as the family-wise error rate (FWER) and the false discovery rate (FDR). They are guaranteed to find the optimal solutions, and significantly speed up the process of epistasis detection in GWAS.
A web server with a user interface and the source code are available at http://www.csbio.unc.edu/epistasis/. The source code is also available on SourceForge at http://sourceforge.net/projects/epistasis/.
Recent genome-wide association studies have resulted in a dramatic increase in our knowledge of the genetic loci involved in type 2 diabetes. In a complementary approach to these single-marker studies, we attempted to identify biological pathways associated with type 2 diabetes. This approach could allow us to identify additional risk loci.
RESEARCH DESIGN AND METHODS
We used individual level genotype data generated from the Wellcome Trust Case Control Consortium (WTCCC) type 2 diabetes study, consisting of 393,143 autosomal SNPs, genotyped across 1,924 case subjects and 2,938 control subjects. We sought additional evidence from summary level data available from the Diabetes Genetics Initiative (DGI) and the Finland-United States Investigation of NIDDM Genetics (FUSION) studies. Statistical analysis of pathways was performed using a modification of the Gene Set Enrichment Algorithm (GSEA). A total of 439 pathways were analyzed from the Kyoto Encyclopedia of Genes and Genomes, Gene Ontology, and BioCarta databases.
After correcting for the number of pathways tested, we found no strong evidence for any pathway showing association with type 2 diabetes (top Padj = 0.31). The candidate WNT-signaling pathway ranked top (nominal P = 0.0007, excluding TCF7L2; P = 0.002), containing a number of promising single gene associations. These include CCND2 (rs11833537; P = 0.003), SMAD3 (rs7178347; P = 0.0006), and PRICKLE1 (rs1796390; P = 0.001), all expressed in the pancreas.
Common variants involved in type 2 diabetes risk are likely to occur in or near genes in multiple pathways. Pathway-based approaches to genome-wide association data may be more successful for some complex traits than others, depending on the nature of the underlying disease physiology.
It has been postulated that multiple-marker methods may have added ability, over single-marker methods, to detect genetic variants associated with disease. The Wellcome Trust Case Control Consortium (WTCCC) provided the first successful large genome-wide association studies (GWAS), which included single-marker association analyses for seven common complex diseases. Of those signals detected, only one was associated with coronary artery disease (CAD), and none were identified for hypertension (HTN). Our objective was to find additional genetic associations and pathways for cardiovascular disease by examining the WTCCC data for variants associated with CAD and HTN using two-marker testing methods. We applied two-marker association testing to the WTCCC dataset, which includes ~2,000 affected individuals with each disorder, and a shared pool of ~3,000 controls, all genotyped using Affymetrix GeneChip 500 K arrays. For CAD, we detected single nucleotide polymorphism (SNP) pairs in three genes showing genome-wide significance: HFE2, STK32B, and DIPC2. The most notable SNP pairs in a non-protein-coding region were at 9p21, a known major CAD-associated region. For HTN, we detected SNP pairs in five genes: GPR39, XRCC4, MYO6, ZFAT, and MACROD2. Four further associated SNP pair regions were at least 70 kb from any known gene. We have shown that novel, multiple-marker, statistical methods can be of use in finding variants in GWAS. We describe many new, associated variants for both CAD and HTN and describe their known genetic mechanisms.
Rheumatoid arthritis (RA) is an archetypal, common, complex autoimmune disease with both genetic and environmental contributions to disease aetiology. Two novel RA susceptibility loci have been reported from recent genome-wide and candidate gene association studies. We, therefore, investigated the evidence for association of the STAT4 and TRAF1/C5 loci with RA using imputed data from the Wellcome Trust Case Control Consortium (WTCCC). No evidence for association of variants mapping to the TRAF1/C5 gene was detected in the 1860 RA cases and 2930 control samples tested in that study. Variants mapping to the STAT4 gene did show evidence for association (rs7574865, P = 0.04). Given the association of the TRAF1/C5 locus in two previous large case–control series from populations of European descent and the evidence for association of the STAT4 locus in the WTCCC study, single nucleotide polymorphisms mapping to these loci were tested for association with RA in an independent UK series comprising DNA from >3000 cases with disease and >3000 controls and a combined analysis including the WTCCC data was undertaken. We confirm association of the STAT4 and the TRAF1/C5 loci with RA bringing to 5 the number of confirmed susceptibility loci. The effect sizes are less than those reported previously but are likely to be a more accurate reflection of the true effect size given the larger size of the cohort investigated in the current study.
It is believed that interactions among genes (epistasis) may play an important role in susceptibility to common diseases [Moore and Williams, 2002; Ritchie et al., 2001].
To study the underlying genetic variants of diseases, genome-wide association studies (GWAS) that simultaneously assay several hundreds of thousands of SNPs are being increasingly used. Often, the data from these studies are analyzed with single-locus methods [Lambert et al., 2009; Reiman et al., 2007]. However, epistatic interactions may not be easily detected with single-locus methods [Marchini et al., 2005]. As a result, both parametric and nonparametric multi-locus methods have been developed to detect such interactions [Heidema et al., 2006]. However, efficiently analyzing epistasis using high-dimensional genome-wide data remains a crucial challenge.
We develop a method based on Bayesian networks and the minimum description length principle for detecting epistatic interactions. We compare its ability to detect gene-gene interactions and its efficiency to that of the combinatorial method multifactor dimensionality reduction (MDR) using 28000 simulated data sets generated from 70 different genetic models. We further apply the method to over 300,000 SNPs obtained from a GWAS involving late onset Alzheimer’s disease (LOAD). Our method outperforms MDR and we substantiate previous results indicating that the GAB2 gene is associated with LOAD. To our knowledge, this is the first successful model-based epistatic analysis using a high-dimensional genome-wide data set.
Alzheimer’s; APOE; GAB2; genome-wide; epistasis; Bayesian network; minimum description length
Though epistasis has long been postulated to have a critical role in genetic regulation of important pathways as well as provide a major source of variation in the process of speciation, the importance of epistasis for genomic selection in the context of plant breeding is still being debated. In this paper, we report the results on the prediction of genetic values with epistatic effects for 280 accessions in the Nebraska Wheat Breeding Program using adaptive mixed least absolute shrinkage and selection operator (LASSO). The development of adaptive mixed LASSO, originally designed for association mapping, for the context of genomic selection is reported. The results show that adaptive mixed LASSO can be successfully applied to the prediction of genetic values while incorporating both marker main effects and epistatic effects. In particular, the prediction accuracy is substantially improved by the inclusion of two-locus epistatic effects (more than onefold in some cases as measured by cross-validation correlation coefficient), which is observed for multiple traits and planting locations. This points to significant potential in using non-additive genetic effects for genomic selection in crop breeding practices.
adaptive mixed LASSO (least absolute shrinkage and selection operator); epistasis; genomic selection; plant breeding; wheat
Genome-wide association studies (GWAS) have identified single-nucleotide polymorphisms (SNPs) at multiple loci that are significantly associated with coronary artery disease (CAD) risk. In this study, we sought to determine and compare the predictive capabilities of 9p21.3 alone and a panel of SNPs identified and replicated through GWAS for CAD.
Methods and Results
We used the Ottawa Heart Genomics Study (OHGS) (3323 cases, 2319 control subjects) and the Wellcome Trust Case Control Consortium (WTCCC) (1926 cases, 2938 control subjects) data sets. We compared the predictive ability of allele counting, logistic regression, and support vector machines. Two sets of SNPs, 9p21.3 alone and a set of 12 SNPs identified by GWAS and through a model-fitting procedure, were considered. Performance was assessed by measuring area under the curve (AUC) for OHGS using 10-fold cross-validation and WTCCC as a replication set. AUC for logistic regression using OHGS increased significantly from 0.555 to 0.608 (P=3.59×10^−14) for 9p21.3 versus the 12 SNPs, respectively. This difference remained when traditional risk factors were considered in a subgroup of OHGS (1388 cases, 2038 control subjects), with AUC increasing from 0.804 to 0.809 (P=0.037). The added predictive value over and above the traditional risk factors was not significant for 9p21.3 (AUC 0.801 versus 0.804, P=0.097) but was for the 12 SNPs (AUC 0.801 versus 0.809, P=0.0073). Performance was similar between OHGS and WTCCC. Logistic regression outperformed both support vector machines and allele counting.
Using the collective of 12 SNPs confers significantly greater predictive capabilities for CAD than 9p21.3, whether traditional risks are or are not considered. More accurate models probably will evolve as additional CAD-associated SNPs are identified.
coronary disease; genetics; risk factors
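The allele-counting approach and the AUC measure used above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the study's pipeline: the score is an unweighted count of risk alleles, the AUC is computed in its pairwise (Mann-Whitney) form without cross-validation, and the function names and genotype data are hypothetical.

```python
def allele_count_score(genotypes):
    """Unweighted genetic risk score: total number of risk alleles carried."""
    return sum(genotypes)


def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Equals the probability that a random case scores higher than a random
    control, counting ties as one half.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy data: risk-allele counts (0, 1 or 2) at three hypothetical SNPs.
cases = [[2, 1, 1], [1, 1, 0], [2, 2, 1]]
controls = [[0, 1, 0], [1, 1, 0], [0, 0, 0]]
case_scores = [allele_count_score(g) for g in cases]        # [4, 2, 5]
control_scores = [allele_count_score(g) for g in controls]  # [1, 2, 0]
result = auc(case_scores, control_scores)
print(round(result, 3))  # 0.944
```

An AUC of 0.5 corresponds to a score that does not separate cases from controls at all, which puts the reported values of 0.555 and 0.608 in context.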
Most pathway and gene-set enrichment methods prioritize genes by their main effect and do not account for variation due to interactions in the pathway. A portion of the presumed missing heritability in genome-wide association studies (GWAS) may be accounted for through gene–gene interactions and additive genetic variability. In this study, we prioritize genes for pathway enrichment in GWAS of bipolar disorder (BD) by aggregating gene–gene interaction information with main effect associations through a machine learning (evaporative cooling) feature selection and epistasis network centrality analysis. We validate this approach in a two-stage (discovery/replication) pathway analysis of GWAS of BD. The discovery cohort comes from the Wellcome Trust Case Control Consortium (WTCCC) GWAS of BD, and the replication cohort comes from the National Institute of Mental Health (NIMH) GWAS of BD in European Ancestry individuals. Epistasis network centrality yields replicated enrichment of Cadherin signaling pathway, whose genes have been hypothesized to have an important role in BD pathophysiology but have not demonstrated enrichment in previous analysis. Other enriched pathways include Wnt signaling, circadian rhythm pathway, axon guidance and neuroactive ligand-receptor interaction. In addition to pathway enrichment, the collective network approach elevates the importance of ANK3, DGKH and ODZ4 for BD susceptibility in the WTCCC GWAS, despite their weak single-locus effect in the data. These results provide evidence that numerous small interactions among common alleles may contribute to the diathesis for BD and demonstrate the importance of including information from the network of gene–gene interactions as well as main effects when prioritizing genes for pathway analysis.
eigenvector centrality; epistasis network; evaporative cooling machine learning feature selection; pathway enrichment analysis; regression-based genetic association interaction network (reGAIN); SNPrank
Young-onset hypertension has a stronger genetic component than its late-onset counterpart; thus, the identification of genes related to its susceptibility is a critical issue for the prevention and management of this disease. We carried out a two-stage association scan to map young-onset hypertension susceptibility genes. The first-stage analysis, a genome-wide association study, analyzed 175 matched case-control pairs; the second-stage analysis, a confirmatory association study, verified the results of the first stage based on a total of 1,008 patients and 1,008 controls. Single-locus association tests, multilocus association tests and pair-wise gene-gene interaction tests were performed to identify young-onset hypertension susceptibility genes. After considering stringent adjustments for multiple testing, gene annotation and single-nucleotide polymorphism (SNP) quality, four SNPs from two SNP triplets with strong association signals (−log10(p)>7) and 13 SNPs from 8 interactive SNP pairs with strong interactive signals (−log10(p)>8) were carefully re-examined. The confirmatory study verified the association for a SNP quartet 219 kb and 495 kb downstream of LOC344371 (a hypothetical gene) and RASGRP3 on chromosome 2p22.3, respectively. The latter has been implicated in the abnormal vascular responsiveness to endothelin-1 and angiotensin II in diabetic-hypertensive rats. Intrinsic synergy involving IMPG1 on chromosome 6q14.2-q15 was also verified. IMPG1 encodes interphotoreceptor matrix proteoglycan 1 which has cation binding capacity. The genes are novel hypertension targets identified in this first genome-wide hypertension association study of the Han Chinese population.
Coronary artery disease (CAD) shares common risk factors with type 2 diabetes (T2DM). Variations in the transcription factor 7-like 2 (TCF7L2) gene, particularly rs7903146, increase T2DM risk. Potential links between genetic variants of the TCF7L2 locus and coronary atherosclerosis are uncertain. We therefore investigated the association between TCF7L2 polymorphisms and angiographically determined CAD in diabetic and non-diabetic patients.
We genotyped TCF7L2 variants rs7903146, rs12255372, and rs11196205 in a cross-sectional study including 1,650 consecutive patients undergoing coronary angiography for the evaluation of established or suspected stable CAD. Significant CAD was diagnosed in the presence of coronary stenoses ≥50%. Variant rs7903146 in the total study cohort was significantly associated with significant CAD (adjusted additive OR = 1.29 [1.09–1.53]; p = 0.003). This association was strong and significant in T2DM patients (n = 393; OR = 1.91 [1.32–2.75]; p = 0.001) but not in non-diabetic subjects (OR = 1.09 [0.90–1.33]; p = 0.370). The interaction risk allele by T2DM was significant (pinteraction = 0.002), indicating a significantly stronger impact of the polymorphism on CAD in T2DM patients than in non-diabetic subjects. TCF7L2 polymorphisms rs12255372 and rs11196205 were also significantly associated with CAD in diabetic patients (adjusted additive OR = 1.90 [1.31–2.74]; p = 0.001 and OR = 1.75 [1.22–2.50]; p = 0.002, respectively). Further, haplotype analysis demonstrated that haplotypes including the rare alleles of all investigated variants were significantly associated with CAD in the whole cohort as well as in diabetic subjects (OR = 1.22 [1.04–1.43]; p = 0.013 and OR = 1.67 [1.19–2.22]; p = 0.003, respectively).
These results suggest that TCF7L2 variants rs7903146, rs12255372, and rs11196205 are significantly associated with angiographically diagnosed CAD, specifically in patients with T2DM. TCF7L2 therefore appears as a genetic link between diabetes and atherosclerosis.
Although numerous candidate gene and genome-wide association studies have been performed on blood pressure, only a small number of regulatory genetic variants, each with a limited effect, have been identified. This phenomenon can partially be explained by possible gene-gene (epistatic) interactions, which have been little investigated so far.
We performed a pre-planned two-phase investigation: in phase 1, one hundred single nucleotide polymorphisms (SNPs) in 65 candidate genes were genotyped in 1,912 unrelated French adults in order to study their two-locus combined effects on blood pressure (BP) levels. In phase 2, the significant epistatic interactions observed in phase 1 were tested in an independent population comprising 1,755 unrelated European adults.
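To give a sense of the size of the phase 1 search space, the short sketch below counts the SNP pairs that 100 SNPs can form and the corresponding Bonferroni threshold. This is illustrative arithmetic only: the abstract does not say how many pairs were actually tested (for example, pairs within the same gene may have been excluded) or which multiple-testing adjustment produced the adjusted P-values, and the 0.05 family-wise level is an assumption.

```python
from math import comb

n_snps = 100            # SNPs genotyped in phase 1 (from the text)
family_alpha = 0.05     # assumed family-wise error level

n_pairs = comb(n_snps, 2)                   # unordered SNP pairs
bonferroni_threshold = family_alpha / n_pairs

print(f"{n_pairs} pairs, per-test threshold {bonferroni_threshold:.2e}")
```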
Among the 9 genetic variants significantly associated with systolic and diastolic BP in phase 1, some may act by altering the corresponding protein levels: SNPs rs5742910 (Padjusted ≤ 0.03) and rs6046 (Padjusted = 0.044) in F7 and rs1800469 (Padjusted ≤ 0.036) in TGFB1; whereas others may be functional by altering the corresponding protein structure: rs1800590 (Padjusted = 0.028, SE = 0.088) in LPL and rs2228570 (Padjusted ≤ 9.48×10−4) in VDR. The two epistatic interactions found for systolic and diastolic BP in the discovery phase, VCAM1 (rs1041163) * APOB (rs1367117) and SCGB1A1 (rs3741240) * LPL (rs1800590), were tested in the replication population, where we observed significant interactions on diastolic BP (DBP). In silico analyses yielded putative functional properties of the SNPs involved in these epistatic interactions through the alteration of the corresponding protein structures.
These findings support the hypothesis that different pathways, and thus different genes, may act synergistically to modify BP. This could highlight novel pathophysiologic mechanisms underlying hypertension.
Blood pressure; Epistasis; Single nucleotide polymorphism; Epidemiology
Genome-wide association studies have been instrumental in identifying genetic variants associated with complex traits such as human disease or gene expression phenotypes. It has been proposed that extending existing analysis methods by considering interactions between pairs of loci may uncover additional genetic effects. However, the large number of possible two-marker tests presents significant computational and statistical challenges. Although several strategies to detect epistasis effects have been proposed and tested for specific phenotypes, so far there has been no systematic attempt to compare their performance using real data. We made use of thousands of gene expression traits from linkage and eQTL studies, to compare the performance of different strategies. We found that using information from marginal associations between markers and phenotypes to detect epistatic effects yielded a lower false discovery rate (FDR) than a strategy solely using biological annotation in yeast, whereas results from human data were inconclusive. For future studies whose aim is to discover epistatic effects, we recommend incorporating information about marginal associations between SNPs and phenotypes instead of relying solely on biological annotation. Improved methods to discover epistatic effects will result in a more complete understanding of complex genetic effects.
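The recommended strategy can be sketched in two steps: restrict the pairwise search to markers with some marginal association, then control the false discovery rate among the pairs that were tested. The sketch below is a minimal illustration under assumptions, not the authors' pipeline: the function names and the marginal threshold are hypothetical, both markers of a pair are required to pass the filter (other variants require only one), and the Benjamini-Hochberg step-up procedure stands in for whatever FDR estimator the study used.

```python
from itertools import combinations


def pairs_from_marginal_hits(marginal_p, threshold):
    """Return index pairs (i, j), i < j, where both markers pass the marginal filter."""
    selected = [i for i, p in enumerate(marginal_p) if p <= threshold]
    return list(combinations(selected, 2))


def benjamini_hochberg(pvalues, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg procedure at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    cutoff_rank = 0
    for rank, k in enumerate(order, start=1):
        if pvalues[k] <= q * rank / m:
            cutoff_rank = rank  # largest rank whose p-value is under its threshold
    return sorted(order[:cutoff_rank])


# Toy example: four markers, three of which show marginal association.
marginal_p = [0.001, 0.3, 0.04, 0.02]
candidate_pairs = pairs_from_marginal_hits(marginal_p, threshold=0.05)
print(candidate_pairs)
```

The filter shrinks the number of two-marker tests, which relaxes the multiple-testing burden, at the cost of missing interactions between markers that show no marginal signal.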
The P-value approach has been employed to prioritize genome-wide association (GWA) scan signals, with genome-wide significance defined by a prior P-value threshold, although this is not ideal. A rationale put forward is that association signals should be expected to give less support for single nucleotide polymorphisms (SNPs) that are rare (with associated low-power tests) than for common SNPs with equivalent P-values, unless investigators believe, a priori, that rare causative variants contribute to the disease and have more pronounced effects.
Using data from a GWA scan for type 2 diabetes (1924 cases, 2938 controls, 393 453 SNPs), we compared P-values with four alternative signal measures: likelihood ratio (LR), Bayes factor (BF; with a specified prior distribution for true effects), ‘frequentist factor’ (FF; reflecting the ratio between estimated—post-data— ‘power’ and P-value) and probability of pronounced effect size (PrPES).
The 19 common SNPs [minor allele frequency (MAF) among the controls >29%] yielding strong P-value signals (P<5×10−7) were also top ranked by the other approaches. There was a strong similarity between the P-values, LR and BF signals, in terms of ranking SNPs. In contrast, FF and PrPES signals down-weighted rare SNPs (control MAF<10%) with low P-values.
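The link between minor allele frequency and power can be illustrated with a standard normal approximation for an allelic case-control test. The sketch below is not the study's FF or PrPES measure; it only shows why a test at a rare SNP has low power for a given effect size. The sample sizes and the 5×10−7 significance level come from the text, while the odds ratio of 1.3, the example MAF values, and the function name `allelic_power` are illustrative assumptions.

```python
from math import log, sqrt
from statistics import NormalDist

N_CASES, N_CONTROLS = 1924, 2938  # sample sizes from the text


def allelic_power(control_maf, odds_ratio, alpha=5e-7):
    """Approximate power of a two-sided Wald test on the allelic log odds ratio."""
    n1, n0 = 2 * N_CASES, 2 * N_CONTROLS          # allele counts per group
    p0 = control_maf
    odds1 = odds_ratio * p0 / (1 - p0)            # allele odds among cases
    p1 = odds1 / (1 + odds1)
    # Woolf-type standard error of the log odds ratio from expected cell counts.
    se = sqrt(1 / (n1 * p1) + 1 / (n1 * (1 - p1))
              + 1 / (n0 * p0) + 1 / (n0 * (1 - p0)))
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)
    shift = abs(log(odds_ratio)) / se
    return normal.cdf(shift - z_alpha) + normal.cdf(-shift - z_alpha)


for maf in (0.05, 0.10, 0.30):
    print(f"control MAF {maf:.2f}: power ~ {allelic_power(maf, 1.3):.3f}")
```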
For prioritization of signals that do not achieve compelling levels of evidence for association, the main driving force behind observed differences between the various association signals appears to be SNP MAF. The statistical power afforded by follow-up samples for establishing replication should be taken into account when tailoring the signal selection strategy.
Bayes factor; effect size; likelihood ratio; single nucleotide polymorphism; statistical power; statistics
Genome-wide interaction-based association (GWIBA) analysis has the potential to identify novel susceptibility loci. These interaction effects could be missed with the prevailing approaches in genome-wide association studies (GWAS). However, no convincing loci have been discovered exclusively from GWIBA methods, and the intensive computation involved is a major barrier for application. Here, we developed a fast, multi-thread/parallel program named “pair-wise interaction-based association mapping” (PIAM) for exhaustive two-locus searches. With this program, we performed a complete GWIBA analysis on seven diseases with stringent control for false positives, and we validated the results for three of these diseases. We identified one pair-wise interaction between a previously identified locus, C1orf106, and one new locus, TEC, that was specific for Crohn's disease, with a Bonferroni corrected P<0.05 (P = 0.039). This interaction was replicated with a pair of proxy linked loci (P = 0.013) on an independent dataset. Five other interactions had corrected P<0.5. We identified the allelic effect of a locus close to SLC7A13 for coronary artery disease. This was replicated with a linked locus on an independent dataset (P = 1.09×10−7). Through a local validation analysis that evaluated association signals, rather than locus-based associations, we found that several other regions showed association/interaction signals with nominal P<0.05. In conclusion, this study demonstrated that the GWIBA approach was successful for identifying novel loci, and the results provide new insights into the genetic architecture of common diseases. In addition, our PIAM program was capable of handling very large GWAS datasets that are likely to be produced in the future.
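The computational burden of an exhaustive two-locus search comes from the quadratic number of SNP pairs, and a parallel program has to divide those pairs among its workers. The sketch below shows one simple way to split the upper triangle of the pair matrix into contiguous row blocks holding similar numbers of pairs. It is an illustration under assumptions, not a description of how PIAM partitions its work; all function names are hypothetical, and the 500,000-SNP chip size is an assumed example.

```python
def pair_count(n_snps):
    """Number of unordered SNP pairs (i, j) with i < j."""
    return n_snps * (n_snps - 1) // 2


def balanced_row_chunks(n_snps, n_workers):
    """Split rows of the upper triangle into contiguous (start, stop) blocks.

    Row i contributes n_snps - 1 - i pairs, so early rows are heavier;
    a block is closed once it has reached the per-worker target.
    """
    target = pair_count(n_snps) / n_workers
    chunks, start, accumulated = [], 0, 0
    for i in range(n_snps):
        accumulated += n_snps - 1 - i
        if accumulated >= target and len(chunks) < n_workers - 1:
            chunks.append((start, i + 1))
            start, accumulated = i + 1, 0
    chunks.append((start, n_snps))  # last worker takes the remaining rows
    return chunks


def pairs_in_chunk(n_snps, chunk):
    """Number of pairs whose first index falls in the half-open row block."""
    lo, hi = chunk
    return sum(n_snps - 1 - i for i in range(lo, hi))


print(pair_count(500_000))  # pairs for an assumed 500,000-SNP chip
for chunk in balanced_row_chunks(1000, 4):
    print(chunk, pairs_in_chunk(1000, chunk))
```

Each worker can then run its two-locus tests on its own block independently, since no pair appears in more than one block.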
Recent studies on the genetic basis of common diseases have identified many loci that confer disease susceptibility. However, much of the heritability of these diseases remains unexplained. Loci involved in gene–gene interactions are considered cryptic, because they confer susceptibility, but may not generate a detectable signal on their own. These interactions may account for the “missing heritability” of common diseases. Theoretically, these interactions can be identified with the genome-wide interaction-based association analysis. But, in reality, very few gene–gene interactions have been identified with that method, and most were based on prior biological knowledge. Here, we applied a parallel computing technique that facilitated the identification of multiple new cryptic susceptibility loci involved in common diseases. We applied stringent control for false positives, and we validated our findings with independent datasets. This study demonstrated that interactions between gene loci could be successfully identified with the genome-wide interaction-based approach. With this approach, we also identified cryptic loci with moderate single-locus effects. The identified loci and interactions merit further investigations for fine mapping and functional analyses. Our results extend the current knowledge of common diseases for future studies in genetic mapping. This approach is applicable to current and future genome-wide association datasets.
Purely epistatic multi-locus interactions cannot generally be detected via single-locus analysis in case-control studies of complex diseases. Recently, many two-locus and multi-locus analysis techniques have been shown to be promising for epistasis detection. However, exhaustive multi-locus analysis requires prohibitively large computational effort when problems involve large-scale or genome-wide data. Furthermore, there is no explicit proof that a combination of multiple two-locus analyses can lead to the correct identification of multi-locus interactions.
The proposed 2LOmb algorithm performs an omnibus permutation test on ensembles of two-locus analyses. The algorithm consists of four main steps: two-locus analysis, a permutation test, global p-value determination and a progressive search for the best ensemble. 2LOmb is benchmarked against an exhaustive two-locus analysis technique, a set association approach, a correlation-based feature selection (CFS) technique and a tuned ReliefF (TuRF) technique. The simulation results indicate that 2LOmb produces a low false-positive error. Moreover, 2LOmb has the best performance in terms of an ability to identify all causative single nucleotide polymorphisms (SNPs) and a low number of output SNPs in purely epistatic two-, three- and four-locus interaction problems. The interaction models constructed from the 2LOmb outputs via a multifactor dimensionality reduction (MDR) method are also included for the confirmation of epistasis detection. 2LOmb is subsequently applied to a type 2 diabetes mellitus (T2D) data set, which is obtained as a part of the UK genome-wide genetic epidemiology study by the Wellcome Trust Case Control Consortium (WTCCC). After primarily screening for SNPs that are located within or near 372 candidate genes and exhibit no marginal single-locus effects, the T2D data set is reduced to 7,065 SNPs from 370 genes. The 2LOmb search in the reduced T2D data reveals that four intronic SNPs in PGM1 (phosphoglucomutase 1), two intronic SNPs in LMX1A (LIM homeobox transcription factor 1, alpha), two intronic SNPs in PARK2 (Parkinson disease (autosomal recessive, juvenile) 2, parkin) and three intronic SNPs in GYS2 (glycogen synthase 2 (liver)) are associated with the disease. The 2LOmb result suggests that there is no interaction between each pair of the identified genes that can be described by purely epistatic two-locus interaction models.
Moreover, there are no interactions between these four genes that can be described by purely epistatic multi-locus interaction models with marginal two-locus effects. The findings provide an alternative explanation for the aetiology of T2D in a UK population.
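The idea of an omnibus permutation test over an ensemble of two-locus analyses can be sketched as follows. This is a simplified illustration, not the 2LOmb implementation: it assumes a Pearson chi-square on the joint-genotype table as the two-locus analysis, sums those statistics as the omnibus statistic, and omits the progressive search for the best ensemble. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def two_locus_chi2(g1, g2, labels):
    """Pearson chi-square of case status (0/1) against the joint genotype of two SNPs."""
    cells = Counter(zip(g1, g2, labels))
    combos = {(a, b) for a, b, _ in cells}
    n, n_case = len(labels), sum(labels)
    stat = 0.0
    for a, b in combos:
        row_total = cells[(a, b, 0)] + cells[(a, b, 1)]
        for status, col_total in ((0, n - n_case), (1, n_case)):
            expected = row_total * col_total / n
            if expected > 0:
                stat += (cells[(a, b, status)] - expected) ** 2 / expected
    return stat


def omnibus_permutation_p(genotypes, pairs, labels, n_perm=200, seed=0):
    """Global p-value for an ensemble of SNP pairs via label permutation."""
    def omnibus(current_labels):
        return sum(two_locus_chi2(genotypes[i], genotypes[j], current_labels)
                   for i, j in pairs)

    rng = random.Random(seed)
    observed = omnibus(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)           # break any genotype-phenotype link
        if omnibus(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)    # add-one estimate, never exactly zero


def simulate_xor_data(n=400, seed=1):
    """Toy data: risk depends on the parity of snp_a + snp_b; snp_c is noise."""
    rng = random.Random(seed)
    draw = lambda: rng.choice([0, 1, 1, 2])   # genotype frequencies 1/4, 1/2, 1/4
    snp_a = [draw() for _ in range(n)]
    snp_b = [draw() for _ in range(n)]
    snp_c = [draw() for _ in range(n)]
    labels = [1 if rng.random() < (0.9 if (a + b) % 2 else 0.1) else 0
              for a, b in zip(snp_a, snp_b)]
    return [snp_a, snp_b, snp_c], labels


genotypes, labels = simulate_xor_data()
print(omnibus_permutation_p(genotypes, [(0, 1)], labels))
```

In the toy model each SNP has no marginal effect in expectation, because the other SNP has an odd genotype half of the time, so the signal is visible only when the pair is analysed jointly.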
An omnibus permutation test on ensembles of two-locus analyses can detect purely epistatic multi-locus interactions with marginal two-locus effects. The study also reveals that SNPs from large-scale or genome-wide case-control data which are discarded after single-locus analysis detects no association can still be useful for genetic epidemiology studies.