Genome-wide association studies have identified a large number of single-nucleotide polymorphisms (SNPs) that individually predispose to diseases. However, many genetic risk factors remain unaccounted for. Proteins coded by genes interact in the cell, and it is most likely that certain variants mainly affect the phenotype in combination with other variants, a phenomenon termed epistasis. An exhaustive search for epistatic effects is computationally demanding, as several billions of SNP pairs exist for typical genotyping chips. In this study, the experimental knowledge on biological networks is used to narrow the search for two-locus epistasis. We provide evidence that this approach is computationally feasible and statistically powerful. By applying this method to the Wellcome Trust Case–Control Consortium data sets, we report four significant cases of epistasis between unlinked loci, in susceptibility to Crohn's disease, bipolar disorder, hypertension and rheumatoid arthritis.
association studies; genome-wide scan; epistasis; biological network
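The size of an exhaustive two-locus search can be made concrete with a short calculation. The following is a minimal sketch, assuming a hypothetical chip of 500,000 SNPs (the abstract gives no count); the function names are illustrative and do not come from the study.

```python
from math import comb


def pair_count(n_snps: int) -> int:
    """Number of unordered SNP pairs among n_snps markers."""
    return comb(n_snps, 2)


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical chip size, chosen only for illustration.
n_pairs = pair_count(500_000)
print(n_pairs)  # 124999750000
print(bonferroni_threshold(n_pairs))
```

Restricting the search to pairs supported by a biological network shrinks `n_tests`, which both reduces computation and relaxes the Bonferroni threshold.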
Recent genome-wide association studies have resulted in a dramatic increase in our knowledge of the genetic loci involved in type 2 diabetes. In a complementary approach to these single-marker studies, we attempted to identify biological pathways associated with type 2 diabetes. This approach could allow us to identify additional risk loci.
RESEARCH DESIGN AND METHODS
We used individual level genotype data generated from the Wellcome Trust Case Control Consortium (WTCCC) type 2 diabetes study, consisting of 393,143 autosomal SNPs, genotyped across 1,924 case subjects and 2,938 control subjects. We sought additional evidence from summary level data available from the Diabetes Genetics Initiative (DGI) and the Finland-United States Investigation of NIDDM Genetics (FUSION) studies. Statistical analysis of pathways was performed using a modification of the Gene Set Enrichment Algorithm (GSEA). A total of 439 pathways were analyzed from the Kyoto Encyclopedia of Genes and Genomes, Gene Ontology, and BioCarta databases.
After correcting for the number of pathways tested, we found no strong evidence for any pathway showing association with type 2 diabetes (top Padj = 0.31). The candidate WNT-signaling pathway ranked top (nominal P = 0.0007, excluding TCF7L2; P = 0.002), containing a number of promising single gene associations. These include CCND2 (rs11833537; P = 0.003), SMAD3 (rs7178347; P = 0.0006), and PRICKLE1 (rs1796390; P = 0.001), all expressed in the pancreas.
Common variants involved in type 2 diabetes risk are likely to occur in or near genes in multiple pathways. Pathway-based approaches to genome-wide association data may be more successful for some complex traits than others, depending on the nature of the underlying disease physiology.
Genome-wide association (GWA) studies have identified a number of loci underlying variation in human serum uric acid (SUA) levels with the SLC2A9 gene having the largest effect identified so far. Gene-gene interactions (epistasis) are largely unexplored in these GWA studies. We performed a full pair-wise genome scan in the Italian MICROS population (n = 1201) to characterise epistasis signals in SUA levels. In the resultant epistasis profile, no SNP pairs reached the Bonferroni adjusted threshold for the pair-wise genome-wide significance. However, SLC2A9 was found interacting with multiple loci across the genome, with NFIA - SLC2A9 and SLC2A9 - ESRRAP2 being significant based on a threshold derived for interactions between GWA significant SNPs and the genome and jointly explaining 8.0% of the phenotypic variance in SUA levels (3.4% by interaction components). Epistasis signal replication in a Croatian population (n = 1772) was limited at the SNP level but improved dramatically at the gene ontology level. In addition, gene ontology terms enriched by the epistasis signals in each population support links between SUA levels and neurological disorders. We conclude that GWA epistasis analysis is useful despite relatively low power in small isolated populations.
Given that genome wide association studies (GWAS) of psychiatric disorders have identified only a small number of convincingly associated variants, there is interest in seeking additional evidence for associated variants using tests of gene-gene interaction. Comprehensive pair-wise SNP-SNP interaction analysis is computationally intensive and the penalty for multiple testing is severe given the number of interactions possible. Aiming to minimize these statistical and computational burdens, we have explored approaches to prioritise SNPs for interaction analyses.
Primary interaction analyses were performed using the Wellcome Trust Case Control Consortium Bipolar Disorder GWAS (1868 cases, 2938 controls). Replication analyses were performed using the Genetic Association Information Network BD dataset (1001 cases, 1033 controls). SNPs were prioritized for interaction analysis if they showed evidence for association that surpassed one of a number of nominal significance thresholds, were within genome-wide significant genes, or were within genes that are functionally related.
For no set of prioritized SNPs did we obtain evidence to support the hypothesis that the selection strategy identified pairs of variants that were enriched for true (statistical) interactions.
SNPs prioritized according to a number of criteria do not have a raised prior probability for significant interaction that is detectable in samples of this size. As is now widely accepted for single SNP analysis, we argue the use of significance levels reflecting only the number of tests performed does not offer an appropriate degree of protection against the potential for GWAS studies to generate an enormous number of false positive interactions.
GWAS; SNP; epistasis; association; interaction; gene
The P-value approach has been employed to prioritize genome-wide association (GWA) scan signals, with genome-wide significance defined by a prior P-value threshold, although this is not ideal. A rationale put forward is that association signals should be expected to give less support for single nucleotide polymorphisms (SNPs) that are rare (with associated low-power tests) than for common SNPs with equivalent P-values, unless investigators believe, a priori, that rare causative variants contribute to the disease and have more pronounced effects.
Using data from a GWA scan for type 2 diabetes (1924 cases, 2938 controls, 393 453 SNPs), we compared P-values with four alternative signal measures: likelihood ratio (LR), Bayes factor (BF; with a specified prior distribution for true effects), ‘frequentist factor’ (FF; reflecting the ratio between estimated—post-data— ‘power’ and P-value) and probability of pronounced effect size (PrPES).
The 19 common SNPs [minor allele frequency (MAF) among the controls >29%] yielding strong P-value signals (P<5×10−7) were also top ranked by the other approaches. There was a strong similarity between the P-values, LR and BF signals, in terms of ranking SNPs. In contrast, FF and PrPES signals down-weighted rare SNPs (control MAF<10%) with low P-values.
For prioritization of signals that do not achieve compelling levels of evidence for association, the main driving force behind observed differences between the various association signals appears to be SNP MAF. The statistical power afforded by follow-up samples for establishing replication should be taken into account when tailoring the signal selection strategy.
Bayes factor; effect size; likelihood ratio; single nucleotide polymorphism; statistical power; statistics
Motivation: Gene–gene interactions (epistasis) are thought to be important in shaping complex traits, but they have been under-explored in genome-wide association studies (GWAS) due to the computational challenge of enumerating billions of single nucleotide polymorphism (SNP) combinations. Fast screening tools are needed to make epistasis analysis routinely available in GWAS.
Results: We present BiForce to support high-throughput analysis of epistasis in GWAS for either quantitative or binary disease (case–control) traits. BiForce achieves great computational efficiency by using memory efficient data structures, Boolean bitwise operations and multithreaded parallelization. It performs a full pair-wise genome scan to detect interactions involving SNPs with or without significant marginal effects using appropriate Bonferroni-corrected significance thresholds. We show that BiForce is more powerful and significantly faster than published tools for both binary and quantitative traits in a series of performance tests on simulated and real datasets. We demonstrate BiForce in analysing eight metabolic traits in a GWAS cohort (323 697 SNPs, >4500 individuals) and two disease traits in another (>340 000 SNPs, >1750 cases and 1500 controls) on a 32-node computing cluster. BiForce completed analyses of the eight metabolic traits within 1 day, identified nine epistatic pairs of SNPs in five metabolic traits and 18 SNP pairs in two disease traits. BiForce can make the analysis of epistasis a routine exercise in GWAS and thus improve our understanding of the role of epistasis in the genetic regulation of complex traits.
Availability and implementation: The software is free and can be downloaded from http://bioinfo.utu.fi/BiForce/.
Supplementary data are available at Bioinformatics online.
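The idea of counting joint genotypes with Boolean bitwise operations can be sketched as follows. This is an illustrative example only, not BiForce's actual implementation: it assumes genotypes coded as 0, 1 or 2 copies of the minor allele, uses Python integers as bitsets, and all function names are hypothetical.

```python
def genotype_masks(genotypes):
    """Encode one SNP as three bitmasks; bit i of masks[g] is set when
    individual i carries genotype g (0, 1 or 2 minor alleles)."""
    masks = [0, 0, 0]
    for i, g in enumerate(genotypes):
        masks[g] |= 1 << i
    return masks


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def joint_table(masks_a, masks_b):
    """3x3 table of joint genotype counts for two SNPs.
    Each cell is the popcount of the AND of two genotype bitmasks."""
    return [[popcount(a & b) for b in masks_b] for a in masks_a]


# Toy data: six individuals, two SNPs.
snp_a = [0, 1, 2, 1, 0, 2]
snp_b = [0, 0, 2, 1, 1, 2]
table = joint_table(genotype_masks(snp_a), genotype_masks(snp_b))
print(table)  # [[1, 1, 0], [1, 1, 0], [0, 0, 2]]
```

Because each SNP is encoded once and every pair then needs only nine AND-plus-popcount operations, the per-pair cost no longer involves a loop over individuals in interpreted code, which is the kind of saving a bitwise design aims for.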
Genome-wide association studies (GWAS) using single nucleotide polymorphism (SNP) markers provide opportunities to detect epistatic SNPs associated with quantitative traits and to detect the exact mode of an epistasis effect. Computational difficulty is the main bottleneck for epistasis testing in large scale GWAS.
The EPISNPmpi and EPISNP computer programs were developed for testing single-locus and epistatic SNP effects on quantitative traits in GWAS, including tests of three single-locus effects for each SNP (SNP genotypic effect, additive and dominance effects) and five epistasis effects for each pair of SNPs (two-locus interaction, additive × additive, additive × dominance, dominance × additive, and dominance × dominance) based on the extended Kempthorne model. EPISNPmpi is the parallel computing program for epistasis testing in large scale GWAS and achieved excellent scalability for large scale analysis and portability for various parallel computing platforms. EPISNP is the serial computing program based on the EPISNPmpi code for epistasis testing in small scale GWAS using commonly available operating systems and computer hardware. Three serial computing utility programs were developed for graphical viewing of test results and epistasis networks, and for estimating CPU time and disk space requirements.
The EPISNPmpi parallel computing program provides an effective computing tool for epistasis testing in large scale GWAS, and the epiSNP serial computing programs are convenient tools for epistasis analysis in small scale GWAS using commonly available computer hardware.
Although they have demonstrated success in searching for common variants for complex diseases, Genome-Wide Association (GWA) studies are less successful in detecting rare genetic variants because of the poor statistical power of most current methods. We developed a two-stage method that can be applied to GWA studies for detecting rare variants. Here we report the results of applying this two-stage method to the Wellcome Trust Case Control Consortium (WTCCC) dataset, which includes 7 complex diseases: Bipolar disorder, Cardiovascular disease, Hypertension, Rheumatoid Arthritis, Crohn’s disease, Type 1 Diabetes and Type 2 Diabetes. We identified 24 genes or regions that reach genome wide significance. Eight of them are novel and were not reported in the WTCCC study. The cumulative risk (or protective) haplotype frequency for each of the 8 genes or regions is small, being at most 11%. For each of the novel genes, the risk (or protective) haplotype set cannot be tagged by the common SNPs available in chips (r2<0.32). The gene identified in hypertension was further replicated in the Framingham Heart Study (FHS), and is also significantly associated with Type 2 Diabetes. Our analysis suggests that searching for rare genetic variants is feasible in current genome-wide association studies and candidate gene studies, and the results can serve as guides to future resequencing studies to identify the underlying rare functional variants.
With the exception of the major histocompatibility complex (MHC) and STAT4, no other rheumatoid arthritis (RA) linkage peak has been successfully fine-mapped to date. This apparent failure to identify association under peaks of linkage could be ascribed to the examination of common variation, when linkage is likely to be driven by rare variants. The purpose of this study was to investigate the overlap between genome-wide rare variant RA association signals observed in the Wellcome Trust Case Control Consortium (WTCCC) study and 11 replicating RA linkage peaks, defined as regions with evidence for linkage in >1 study.
The WTCCC data set contained 40,482 variants with minor allele frequency of ≤0.05 in 1,860 RA patients and 2,938 controls. Genotypes of all rare variants within a given gene region were collapsed into a single locus and a global P value was calculated per gene.
The distribution of rare variant signals (association P ≤ 10−5) was found to differ significantly between regions with and without linkage evidence (P = 2 × 10−17 by Fisher’s exact test). No significant difference was observed after data from the MHC region were removed or when the effect of the HLA–DRB1 locus was accounted for.
The results suggest that rare variant association signals are significantly overrepresented under linkage peaks in RA, but the effect is driven by the MHC. This is the first study to examine the overlap between linkage peaks and rare variant association signals genome-wide in a complex disease.
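The collapsing step described in the methods can be illustrated with a small sketch. The abstract does not state the exact collapsing rule or the test used for the per-gene P value, so the carrier-indicator rule below (an individual is a carrier if any rare variant in the region has at least one minor allele) is an assumption, and the function names are hypothetical.

```python
def collapse_carriers(genotype_rows):
    """genotype_rows holds one list per individual with the minor-allele
    counts at the rare variants in a gene region. Returns 1 for individuals
    carrying at least one rare allele in the region, else 0."""
    return [1 if any(g > 0 for g in row) else 0 for row in genotype_rows]


def carrier_table(cases, controls):
    """2x2 table: rows are cases and controls, columns are carriers and non-carriers."""
    case_flags = collapse_carriers(cases)
    control_flags = collapse_carriers(controls)
    return [
        [sum(case_flags), len(case_flags) - sum(case_flags)],
        [sum(control_flags), len(control_flags) - sum(control_flags)],
    ]


# Toy data: four cases and four controls, three rare variants in one gene.
cases = [[0, 1, 0], [0, 0, 0], [1, 0, 0], [0, 0, 2]]
controls = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(carrier_table(cases, controls))  # [[3, 1], [1, 3]]
```

The resulting 2x2 table could then be passed to any standard test of association to obtain one P value per gene.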
OBJECTIVE—This study examined how differences in the BMI distribution of type 2 diabetic case subjects affected genome-wide patterns of type 2 diabetes association and considered the implications for the etiological heterogeneity of type 2 diabetes.
RESEARCH DESIGN AND METHODS—We reanalyzed data from the Wellcome Trust Case Control Consortium genome-wide association scan (1,924 case subjects, 2,938 control subjects: 393,453 single-nucleotide polymorphisms [SNPs]) after stratifying case subjects (into “obese” and “nonobese”) according to median BMI (30.2 kg/m2). Replication of signals in which alternative case-ascertainment strategies generated marked effect size heterogeneity in type 2 diabetes association signal was sought in additional samples.
RESULTS—In the “obese-type 2 diabetes” scan, FTO variants had the strongest type 2 diabetes effect (rs8050136: relative risk [RR] 1.49 [95% CI 1.34–1.66], P = 1.3 × 10−13), with only weak evidence for TCF7L2 (rs7901695 RR 1.21 [1.09–1.35], P = 0.001). This situation was reversed in the “nonobese” scan, with FTO association undetectable (RR 1.07 [0.97–1.19], P = 0.19) and TCF7L2 predominant (RR 1.53 [1.37–1.71], P = 1.3 × 10−14). These patterns, confirmed by replication, generated strong combined evidence for between-stratum effect size heterogeneity (FTO: PDIFF = 1.4 × 10−7; TCF7L2: PDIFF = 4.0 × 10−6). Other signals displaying evidence of effect size heterogeneity in the genome-wide analyses (on chromosomes 3, 12, 15, and 18) did not replicate. Analysis of the current list of type 2 diabetes susceptibility variants revealed nominal evidence for effect size heterogeneity for the SLC30A8 locus alone (RRobese 1.08 [1.01–1.15]; RRnonobese 1.18 [1.10–1.27]: PDIFF = 0.04).
CONCLUSIONS—This study demonstrates the impact of differences in case ascertainment on the power to detect and replicate genetic associations in genome-wide association studies. These data reinforce the notion that there is substantial etiological heterogeneity within type 2 diabetes.
Most pathway and gene-set enrichment methods prioritize genes by their main effect and do not account for variation due to interactions in the pathway. A portion of the presumed missing heritability in genome-wide association studies (GWAS) may be accounted for through gene–gene interactions and additive genetic variability. In this study, we prioritize genes for pathway enrichment in GWAS of bipolar disorder (BD) by aggregating gene–gene interaction information with main effect associations through a machine learning (evaporative cooling) feature selection and epistasis network centrality analysis. We validate this approach in a two-stage (discovery/replication) pathway analysis of GWAS of BD. The discovery cohort comes from the Wellcome Trust Case Control Consortium (WTCCC) GWAS of BD, and the replication cohort comes from the National Institute of Mental Health (NIMH) GWAS of BD in European Ancestry individuals. Epistasis network centrality yields replicated enrichment of Cadherin signaling pathway, whose genes have been hypothesized to have an important role in BD pathophysiology but have not demonstrated enrichment in previous analysis. Other enriched pathways include Wnt signaling, circadian rhythm pathway, axon guidance and neuroactive ligand-receptor interaction. In addition to pathway enrichment, the collective network approach elevates the importance of ANK3, DGKH and ODZ4 for BD susceptibility in the WTCCC GWAS, despite their weak single-locus effect in the data. These results provide evidence that numerous small interactions among common alleles may contribute to the diathesis for BD and demonstrate the importance of including information from the network of gene–gene interactions as well as main effects when prioritizing genes for pathway analysis.
eigenvector centrality; epistasis network; evaporative cooling machine learning feature selection; pathway enrichment analysis; regression-based genetic association interaction network (reGAIN); SNPrank
Genome-wide association studies (GWAS) have identified single-nucleotide polymorphisms (SNPs) at multiple loci that are significantly associated with coronary artery disease (CAD) risk. In this study, we sought to determine and compare the predictive capabilities of 9p21.3 alone and a panel of SNPs identified and replicated through GWAS for CAD.
Methods and Results
We used the Ottawa Heart Genomics Study (OHGS) (3323 cases, 2319 control subjects) and the Wellcome Trust Case Control Consortium (WTCCC) (1926 cases, 2938 control subjects) data sets. We compared the predictive ability of allele counting, logistic regression, and support vector machines. Two sets of SNPs, 9p21.3 alone and a set of 12 SNPs identified by GWAS and through a model-fitting procedure, were considered. Performance was assessed by measuring area under the curve (AUC) for OHGS using 10-fold cross-validation and WTCCC as a replication set. AUC for logistic regression using OHGS increased significantly from 0.555 to 0.608 (P=3.59×10–14) for 9p21.3 versus the 12 SNPs, respectively. This difference remained when traditional risk factors were considered in a subgroup of OHGS (1388 cases, 2038 control subjects), with AUC increasing from 0.804 to 0.809 (P=0.037). The added predictive value over and above the traditional risk factors was not significant for 9p21.3 (AUC 0.801 versus 0.804, P=0.097) but was for the 12 SNPs (AUC 0.801 versus 0.809, P=0.0073). Performance was similar between OHGS and WTCCC. Logistic regression outperformed both support vector machines and allele counting.
Using the collective of 12 SNPs confers significantly greater predictive capabilities for CAD than 9p21.3, whether traditional risks are or are not considered. More accurate models probably will evolve as additional CAD-associated SNPs are identified.
coronary disease; genetics; risk factors
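Allele counting and the AUC used to assess it can be sketched in a few lines. This is a minimal illustration on invented toy genotypes, not the study's pipeline: the score is an unweighted count of risk alleles, the AUC is computed as the Mann-Whitney probability that a random case outscores a random control, and the function names are hypothetical.

```python
def allele_count_score(genotypes):
    """Unweighted risk score: total number of risk alleles across the SNPs."""
    return sum(genotypes)


def auc(case_scores, control_scores):
    """Probability that a randomly chosen case scores higher than a randomly
    chosen control, with ties counted as one half."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy data: risk-allele counts (0, 1 or 2) at three SNPs per individual.
cases = [[2, 1, 1], [1, 1, 0], [2, 2, 1]]
controls = [[0, 1, 0], [1, 1, 0], [1, 0, 0]]
case_scores = [allele_count_score(g) for g in cases]
control_scores = [allele_count_score(g) for g in controls]
print(auc(case_scores, control_scores))
```

An AUC of 0.5 corresponds to no discrimination, which puts the reported values of 0.555 and 0.608 in context.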
The Wellcome Trust Case Control Consortium (WTCCC) primary genome-wide association (GWA) scan1 on seven diseases, including the multifactorial, autoimmune disease, type 1 diabetes (T1D), shows significant association (P < 5 × 10−7) between T1D and six chromosome regions: 12q24, 12q13, 16p13, 18p11, 12p13 and 4q27. Here, we attempted to validate these and six other top findings in 4,000 individuals with T1D, 5,000 controls and 2,997 family trios that were independent of the WTCCC study. We confirmed unequivocally the associations of 12q24, 12q13, 16p13 and 18p11 (Pfollow-up ≤ 1.35 × 10−9; Poverall ≤ 1.15 × 10−14), leaving eight regions with small effects or false-positive associations with T1D. We also obtained evidence for chromosome 18q22 (Poverall = 1.38 × 10−8) from a genome-wide association study of nonsynonymous SNPs. Several regions, including 18q22 and 18p11, showed association with autoimmune thyroid disease. This study increases the number of T1D loci with compelling evidence from six to at least ten.
It has been postulated that multiple-marker methods may have added ability, over single-marker methods, to detect genetic variants associated with disease. The Wellcome Trust Case Control Consortium (WTCCC) provided the first successful large genome-wide association studies (GWAS) which included single-marker association analyses for seven common complex diseases. Of those signals detected, only one was associated with coronary artery disease (CAD), and none were identified for hypertension (HTN). Our objective was to find additional genetic associations and pathways for cardiovascular disease by examining the WTCCC data for variants associated with CAD and HTN using two-marker testing methods. We applied two-marker association testing to the WTCCC dataset, which includes ~2,000 affected individuals with each disorder, and a shared pool of ~3,000 controls, all genotyped using Affymetrix GeneChip 500 K arrays. For CAD, we detected single nucleotide polymorphism (SNP) pairs in three genes showing genome-wide significance: HFE2, STK32B, and DIPC2. The most notable SNP pairs in a non-protein-coding region were at 9p21, a known major CAD-associated region. For HTN, we detected SNP pairs in five genes: GPR39, XRCC4, MYO6, ZFAT, and MACROD2. Four further associated SNP pair regions were at least 70 kb from any known gene. We have shown that novel, multiple-marker, statistical methods can be of use in finding variants in GWAS. We describe many new, associated variants for both CAD and HTN and describe their known genetic mechanisms.
Genome-wide association studies (GWAS) aim to find genetic factors underlying complex phenotypic traits, for which detection of epistasis, or gene-gene interaction, is often preferred over the single-locus approach. However, the computational burden has been a major hurdle to applying epistasis tests at the genome-wide scale, owing to the large number of single nucleotide polymorphism (SNP) pairs to be tested.
We have developed a set of three efficient programs, FastANOVA, COE and TEAM, that support epistasis tests in a variety of problem settings in GWAS. These programs use permutation tests to properly control error rates such as the family-wise error rate (FWER) and the false discovery rate (FDR). They are guaranteed to find the optimal solutions and significantly speed up epistasis detection in GWAS.
A web server with a user interface and the source code are available at http://www.csbio.unc.edu/epistasis/. The source code is also available at SourceForge: http://sourceforge.net/projects/epistasis/.
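Permutation-based control of the family-wise error rate can be sketched with the generic max-statistic procedure. This is not the algorithm used by FastANOVA, COE or TEAM, which the abstract does not describe; the test statistic (absolute difference in mean phenotype between two groups), the toy data and all names below are assumptions made for illustration.

```python
import random


def abs_mean_diff(phenotype, indicator):
    """Absolute difference in mean phenotype between individuals with
    indicator 1 and indicator 0; returns 0.0 if either group is empty."""
    g1 = [y for y, x in zip(phenotype, indicator) if x]
    g0 = [y for y, x in zip(phenotype, indicator) if not x]
    if not g1 or not g0:
        return 0.0
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def fwer_adjusted_p(phenotype, indicators, n_perm=1000, seed=0):
    """Max-statistic permutation: the adjusted p-value of each test is the
    fraction of phenotype permutations whose maximum statistic over ALL
    tests is at least as large as that test's observed statistic."""
    rng = random.Random(seed)
    observed = [abs_mean_diff(phenotype, ind) for ind in indicators]
    shuffled = list(phenotype)
    exceed = [0] * len(indicators)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_stat = max(abs_mean_diff(shuffled, ind) for ind in indicators)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    # Add-one correction keeps the estimated p-values strictly positive.
    return [(e + 1) / (n_perm + 1) for e in exceed]


# Toy data: eight individuals, two hypothetical SNP-pair indicators.
phenotype = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 1.1, 3.2]
indicators = [[0, 0, 0, 1, 1, 1, 0, 1], [0, 1, 0, 1, 0, 1, 0, 1]]
print(fwer_adjusted_p(phenotype, indicators))
```

Comparing each observed statistic against the permutation distribution of the maximum accounts for the number of tests without assuming they are independent, at the cost of recomputing every test in every permutation.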
Genome-wide association studies (GWAS) have emerged as a powerful approach for identifying susceptibility loci associated with polygenic diseases such as type 2 diabetes mellitus (T2DM). However, it is still a daunting task to prioritize single nucleotide polymorphisms (SNPs) from GWAS for further replication in different populations. Several recent studies have shown that genetic variation often affects gene expression at proximal (cis) as well as distal (trans) genomic locations by different mechanisms such as altering the rate of transcription, splicing or transcript stability.
To prioritize SNPs from GWAS, we combined results from two GWAS related to T2DM, the Diabetes Genetics Initiative (DGI) and the Wellcome Trust Case Control Consortium (WTCCC), with genome-wide expression data from pancreas, adipose tissue, liver and skeletal muscle of individuals with or without T2DM or animal models thereof to identify T2DM susceptibility loci.
We identified 1,170 SNPs associated with T2DM with P < 0.05 in both GWAS and 243 genes that were located in the vicinity of these SNPs. Out of these 243 genes, we identified 115 differentially expressed in publicly available gene expression profiling data. Notably five of them, IGF2BP2, KCNJ11, NOTCH2, TCF7L2 and TSPAN8, have subsequently been shown to be associated with T2DM in different populations. To provide further validation of our approach, we reversed the approach and started with 26 known SNPs associated with T2DM and related traits. We could show that 12 (57%) (HHEX, HNF1B, IGF2BP2, IRS1, KCNJ11, KCNQ1, NOTCH2, PPARG, TCF7L2, THADA, TSPAN8 and WFS1) out of 21 genes located in vicinity of these SNPs were showing aberrant expression in T2DM from the gene expression profiling studies.
Utilizing gene expression profiling data from different tissues of individuals with or without T2DM or animal models thereof is a powerful tool for prioritizing SNPs from GWAS for further replication studies.
Expression quantitative trait loci (eQTL), or genetic variants associated with changes in gene expression, have the potential to assist in interpreting results of genome-wide association studies (GWAS). eQTLs also have varying degrees of tissue specificity. By correlating the statistical significance of eQTLs mapped in various tissue types to their odds ratios reported in a large GWAS by the Wellcome Trust Case Control Consortium (WTCCC), we discovered that there is a significant association between diseases studied genetically and their relevant tissues. This suggests that eQTL data sets can be used to determine tissues that play a role in the pathogenesis of a disease, thereby highlighting these tissue types for further post-GWAS functional studies.
Recent genome wide association studies (GWAS) have identified DNA sequence variations that exhibit unequivocal statistical associations with many common chronic diseases. However, the vast majority of these studies identified variations that explain only a very small fraction of disease burden in the population at large, suggesting that other factors, such as multiple rare or low-penetrance variations and interacting environmental factors, are major contributors to disease susceptibility. Identifying multiple low penetrance variations (or ‘polygenes’) contributing to disease susceptibility will be difficult. We present a pathway analysis approach to characterizing the likely polygenic basis of seven common diseases using the Wellcome Trust Case Control Consortium (WTCCC) GWAS results. We identify numerous pathways implicated in disease predisposition that would have not been revealed using standard single-locus GWAS statistical analysis criteria. Many of these pathways have long been assumed to contain polymorphic genes that lead to disease predisposition. Additionally, we analyze the genetic relationships between the seven diseases, and based upon similarities with respect to the associated genes and pathways affected in each, propose a new way of categorizing the diseases.
Pathway; genome-wide; disease; common; diabetes; crohn’s; coronary; bipolar; arthritis; hypertension
In addition to the HLA-locus, six genetic risk factors for primary biliary cirrhosis (PBC) have been identified in recent genome-wide association studies (GWAS). To identify additional loci, we carried out a GWAS using 1,840 cases from the UK PBC Consortium and 5,163 UK population controls as part of the Wellcome Trust Case Control Consortium 3 (WTCCC3). Twenty-eight loci were followed up in an additional UK cohort of 620 PBC cases and 2,514 population controls. We identified 12 novel risk loci (P<5×10−8) and replicated all previously associated loci. Three further novel loci were identified by meta-analysis of data from our study and previously published GWAS results. New candidate genes include STAT4, DENND1B, CD80, IL7R, CXCR5, TNFRSF1A, CLEC16A, and NFKB1. This study has considerably expanded our knowledge of the genetic architecture of PBC.
To identify susceptibility alleles associated with rheumatoid arthritis, we genotyped 397 individuals with rheumatoid arthritis for 116,204 SNPs and carried out an association analysis in comparison to publicly available genotype data for 1,211 related individuals from the Framingham Heart Study1. After evaluating and adjusting for technical and population biases, we identified a SNP at 6q23 (rs10499194, ∼150 kb from TNFAIP3 and OLIG3) that was reproducibly associated with rheumatoid arthritis both in the genome-wide association (GWA) scan and in 5,541 additional case-control samples (P = 10−3, GWA scan; P < 10−6, replication; P = 10−9, combined). In a concurrent study, the Wellcome Trust Case Control Consortium (WTCCC) has reported strong association of rheumatoid arthritis susceptibility to a different SNP located 3.8 kb from rs10499194 (rs6920220; P = 5 × 10−6 in WTCCC)2. We show that these two SNP associations are statistically independent, are each reproducible in the comparison of our data and WTCCC data, and define risk and protective haplotypes for rheumatoid arthritis at 6q23.
Coronary artery disease (CAD) shares common risk factors with type 2 diabetes (T2DM). Variations in the transcription factor 7-like 2 (TCF7L2) gene, particularly rs7903146, increase T2DM risk. Potential links between genetic variants of the TCF7L2 locus and coronary atherosclerosis are uncertain. We therefore investigated the association between TCF7L2 polymorphisms and angiographically determined CAD in diabetic and non-diabetic patients.
We genotyped TCF7L2 variants rs7903146, rs12255372, and rs11196205 in a cross-sectional study including 1,650 consecutive patients undergoing coronary angiography for the evaluation of established or suspected stable CAD. Significant CAD was diagnosed in the presence of coronary stenoses ≥50%. Variant rs7903146 in the total study cohort was significantly associated with significant CAD (adjusted additive OR = 1.29 [1.09–1.53]; p = 0.003). This association was strong and significant in T2DM patients (n = 393; OR = 1.91 [1.32–2.75]; p = 0.001) but not in non-diabetic subjects (OR = 1.09 [0.90–1.33]; p = 0.370). The interaction risk allele by T2DM was significant (pinteraction = 0.002), indicating a significantly stronger impact of the polymorphism on CAD in T2DM patients than in non-diabetic subjects. TCF7L2 polymorphisms rs12255372 and rs11196205 were also significantly associated with CAD in diabetic patients (adjusted additive OR = 1.90 [1.31–2.74]; p = 0.001 and OR = 1.75 [1.22–2.50]; p = 0.002, respectively). Further, haplotype analysis demonstrated that haplotypes including the rare alleles of all investigated variants were significantly associated with CAD in the whole cohort as well as in diabetic subjects (OR = 1.22 [1.04–1.43]; p = 0.013 and OR = 1.67 [1.19–2.22]; p = 0.003, respectively).
These results suggest that TCF7L2 variants rs7903146, rs12255372, and rs11196205 are significantly associated with angiographically diagnosed CAD, specifically in patients with T2DM. TCF7L2 therefore appears as a genetic link between diabetes and atherosclerosis.
Rheumatoid arthritis (RA) is an archetypal, common, complex autoimmune disease with both genetic and environmental contributions to disease aetiology. Two novel RA susceptibility loci have been reported from recent genome-wide and candidate gene association studies. We, therefore, investigated the evidence for association of the STAT4 and TRAF1/C5 loci with RA using imputed data from the Wellcome Trust Case Control Consortium (WTCCC). No evidence for association of variants mapping to the TRAF1/C5 gene was detected in the 1860 RA cases and 2930 control samples tested in that study. Variants mapping to the STAT4 gene did show evidence for association (rs7574865, P = 0.04). Given the association of the TRAF1/C5 locus in two previous large case–control series from populations of European descent and the evidence for association of the STAT4 locus in the WTCCC study, single nucleotide polymorphisms mapping to these loci were tested for association with RA in an independent UK series comprising DNA from >3000 cases with disease and >3000 controls and a combined analysis including the WTCCC data was undertaken. We confirm association of the STAT4 and the TRAF1/C5 loci with RA bringing to 5 the number of confirmed susceptibility loci. The effect sizes are less than those reported previously but are likely to be a more accurate reflection of the true effect size given the larger size of the cohort investigated in the current study.
Psychiatric phenotypes are currently defined according to sets of descriptive criteria. Although many of these phenotypes are heritable, it would be useful to know whether any of the various diagnostic categories in current use identify cases that are particularly helpful for
To use genome-wide genetic association data to explore the relative genetic utility of seven different descriptive operational diagnostic categories relevant to bipolar illness within a large UK case–control bipolar
We analysed our previously published Wellcome Trust Case Control Consortium (WTCCC) bipolar disorder genome-wide association data-set, comprising 1868 individuals with bipolar disorder and 2938 controls genotyped for 276 122 single nucleotide polymorphisms (SNPs) that met stringent criteria for genotype quality. For each SNP we performed a test of association (bipolar disorder group v. control group) and used the number of associated independent SNPs statistically significant at P<0.00001 as a metric for the overall genetic signal in the sample. We next compared this metric with that obtained using each of seven diagnostic subsets of the group with bipolar disorder: Research Diagnostic Criteria (RDC): bipolar I disorder; manic disorder; bipolar II disorder; schizoaffective disorder, bipolar type; DSM–IV: bipolar I disorder; bipolar II disorder; schizoaffective disorder, bipolar type.
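The signal metric described above (a per-SNP case v. control test, followed by a count of SNPs passing P<0.00001) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: the abstract does not name the association test, so a 1-degree-of-freedom allelic chi-square test is assumed here; the step that restricts the count to independent SNPs is omitted; and the function names and allele counts are hypothetical.

```python
import math


def allelic_chi2_p(case_alt, case_total, ctrl_alt, ctrl_total):
    """P value of a 1-df chi-square test on a 2x2 table of allele counts.

    Rows are cases and controls; columns are alternate and reference
    alleles. Totals are allele counts (twice the number of individuals).
    """
    table = [[case_alt, case_total - case_alt],
             [ctrl_alt, ctrl_total - ctrl_alt]]
    row_sums = [case_total, ctrl_total]
    col_sums = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    n = case_total + ctrl_total
    if min(row_sums) == 0 or min(col_sums) == 0:
        return 1.0  # monomorphic SNP or empty group: no evidence either way

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 df, the chi-square upper tail equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def count_signals(snp_counts, threshold=1e-5):
    """Number of SNPs whose association P value falls below the threshold."""
    return sum(allelic_chi2_p(*counts) < threshold for counts in snp_counts)


# Hypothetical allele counts: (case_alt, case_total, ctrl_alt, ctrl_total).
snps = [(150, 200, 50, 200), (100, 200, 100, 200), (0, 200, 0, 200)]
print(count_signals(snps))
```

Applying `count_signals` to each diagnostic subset in turn would give the subset-specific counts that the comparison in the text relies on.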
The RDC schizoaffective disorder, bipolar type subset (v. controls) stood out from the other diagnostic subsets as having a significant excess of independent association signals (P<0.003) compared with that expected in samples of the same size selected randomly from the total bipolar disorder group data-set. The strongest association in this subset of participants with bipolar disorder was at rs4818065 (P = 2.42×10⁻⁷). Biological systems implicated included gamma-aminobutyric acid type A (GABA-A) receptors. Genes having at least one associated polymorphism at P<10⁻⁴ included B3GALTS, A2BP1, GABRB1, AUTS2, BSN, PTPRG, GIRK2 and
Our findings show that individuals with broadly defined bipolar schizoaffective features either have a particularly strong genetic contribution or, as a group, are genetically more homogeneous than the other phenotypes tested. The results point to the importance of using diagnostic approaches that recognise this group of individuals. Our approach can be applied to similar data-sets for other psychiatric and non-psychiatric
Genome-wide association studies (GWAS) have repeatedly shown an association between non-coding variants in the TCF7L2 locus and risk for type 2 diabetes (T2D), implicating a role for cis-regulatory variation within this locus in disease etiology. Supporting this hypothesis, we previously localized complex regulatory activity to the TCF7L2 T2D-associated interval using an in vivo bacterial artificial chromosome (BAC) enhancer-trapping reporter strategy. To follow up on this broad initial survey of the TCF7L2 regulatory landscape, we performed a fine-mapping enhancer scan using in vivo mouse transgenic reporter assays. We functionally interrogated approximately 50% of the sequences within the T2D-associated interval, utilizing sequence conservation within this 92-kb interval to determine the regulatory potential of all evolutionarily conserved sequences that exhibited conservation to the opossum, a non-eutherian mammal. Included in this study was a detailed functional interrogation of sequences spanning both protective and risk alleles of single nucleotide polymorphism (SNP) rs7903146, which has exhibited allele-specific enhancer function in pancreatic beta cells. Using these assays, we identified nine segments that regulate various aspects of the TCF7L2 expression profile and that constitute nearly 70% of the sequences tested. These results highlight the regulatory complexity of this interval and support the notion that a TCF7L2 cis-regulatory disruption leads to T2D predisposition.
We surveyed gene–gene interactions (epistasis) in human body mass index (BMI) in four European populations (n<1200) via exhaustive pair-wise genome scans, where interactions were computed as F ratios by testing a linear regression model fitting two single-nucleotide polymorphisms (SNPs) with their interaction against one without. Before the association tests, BMI was corrected for sex and age, normalised and adjusted for relatedness. Neither single SNPs nor SNP interactions were genome-wide significant in any cohort based on the consensus threshold (P=5.0E−08) and a Bonferroni corrected threshold (P=1.1E−12), respectively. Next we compared sub genome-wide significant SNP interactions (P<5.0E−08) across cohorts to identify common epistatic signals, where SNPs were annotated to genes to test for gene ontology (GO) enrichment. Among the epistatic genes contributing to the commonly enriched GO terms, 19 were shared across study cohorts, of which 15 are previously published genome-wide association loci, including CDH13 (cadherin 13) associated with height and SORCS2 (sortilin-related VPS10 domain containing receptor 2) associated with circulating insulin-like growth factor 1 and binding protein 3. Interactions between the 19 shared epistatic genes and those involving BMI candidate loci (P<5.0E−08) were tested across cohorts, and eight were replicated at the SNP level (P<0.05) in at least one cohort; these were further tested and showed limited replication in a separate European population (n>5000). We conclude that genome-wide analysis of epistasis in multiple populations is an effective approach to provide new insights into the genetic regulation of BMI but requires additional efforts to confirm the findings.
body mass index; BMI; gene interaction; epistasis; pair-wise genome scan
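The pair-wise interaction test described in the BMI abstract, an F ratio comparing a two-SNP regression with an interaction term against one without, can be sketched as below. This is a minimal illustration under assumptions that the abstract does not state: genotypes are coded additively as 0, 1 or 2, the interaction is a single product term (1 degree of freedom), and the phenotype has already been corrected and normalised. All function names and the simulated data are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a.x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
           for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)  # normal equations
    return sum((y[i] - sum(beta[j] * X[i][j] for j in range(k))) ** 2
               for i in range(n))


def interaction_f_ratio(g1, g2, y):
    """F ratio for adding the g1*g2 product term to y ~ g1 + g2."""
    product = [a * b for a, b in zip(g1, g2)]
    rss_reduced = rss([g1, g2], y)
    rss_full = rss([g1, g2, product], y)
    df_residual = len(y) - 4  # intercept, g1, g2 and the product term
    return (rss_reduced - rss_full) / (rss_full / df_residual)


# Simulated data, for illustration only: 90 individuals, balanced genotypes.
rng = random.Random(0)
g1 = [i % 3 for i in range(90)]
g2 = [(i // 3) % 3 for i in range(90)]
y = [0.5 * a * b + rng.gauss(0, 0.1) for a, b in zip(g1, g2)]
print(interaction_f_ratio(g1, g2, y))
```

In an exhaustive scan this ratio would be computed for every SNP pair and converted to a P value from the F distribution with 1 and n−4 degrees of freedom, which is why the multiple-testing threshold quoted in the text is so stringent.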