PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of giGenomics & InformaticsAims and ScopeInstructions to Authorse-SubmissionThis Article
 
Genomics Inform. 2012 December; 10(4): 256–262.
Published online 2012 December 31. doi:  10.5808/GI.2012.10.4.256
PMCID: PMC3543927

Network Graph Analysis of Gene-Gene Interactions in Genome-Wide Association Study Data

Abstract

Most common complex traits, such as obesity, hypertension, diabetes, and cancers, are known to be associated with multiple genes, environmental factors, and their epistasis. Recently, the development of advanced genotyping technologies has allowed us to perform genome-wide association studies (GWASs). For detecting the effects of multiple genes on complex traits, many approaches have been proposed for GWASs. Multifactor dimensionality reduction (MDR) is one of the powerful and efficient methods for detecting high-order gene-gene (GxG) interactions. However, the biological interpretation of GxG interactions identified by MDR analysis is not easy. In order to aid the interpretation of MDR results, we propose a network graph analysis to elucidate the meaning of identified GxG interactions. The proposed network graph analysis consists of three steps. The first step is for performing GxG interaction analysis using MDR analysis. The second step is to draw the network graph using the MDR result. The third step is to provide biological evidence of the identified GxG interaction using external biological databases. The proposed method was applied to Korean Association Resource (KARE) data, containing 8838 individuals with 327,632 single-nucleotide polymorphisms, in order to perform GxG interaction analysis of body mass index (BMI). Our network graph analysis successfully showed that many identified GxG interactions have known biological evidence related to BMI. We expect that our network graph analysis will be helpful to interpret the biological meaning of GxG interactions.

Keywords: gene-gene interaction, generalized multifactor dimensionality reduction, genome-wide association study, graph analysis, graphic processing units, network graph

Introduction

In recent years, a significant advance of technology has been achieved in both genotyping and computing fields, which has enabled us to perform large-scale genome-wide association studies (GWASs) [1, 2]. GWASs have provided enormous potential in identifying genetic variants for common complex diseases. By GWASs, researchers have discovered many genetic variants that affect common complex traits, such as height, type 2 diabetes, and blood pressure [3-5].

In particular, the nature of obesity is related with many chronic diseases, such as type 2 diabetes, cardiovascular disease, and cancers [6-8]. In addition, it has been found that obesity may be caused by multiple genes, environmental factors, and those interactions [9-11].

In order to identify the genes associated with obesity, body mass index (BMI) has been commonly used as an obesity-related phenotype or trait in GWASs. Many studies have reported that the genetic variants in FTO are associated with BMI [12, 13]. Most of these studies have used the single single-nucleotide polymorphism (SNP) approach, in which each single susceptible locus is searched in association with obesity. The single SNP approach has long been used in detecting genetic traits, and it is especially optimal for Mendelian traits. However, complex traits are well known to be caused by multiple genetic factors as well as several environmental factors and their interactions. Thus, the single SNP approach is not appropriate for detecting multiple genetic factors affecting common complex polygenic traits.

For the identification of multiple and joint genetic factors associated with complex traits, the multifactor dimensionality reduction (MDR) method has been proposed by Ritchie et al. [14] and applied in detecting gene-gene (GxG) interactions [15, 16]. However, their approach can be applied only to binary traits, and the covariates that may affect the phenotype can not be adjusted. In order to overcome these drawbacks, generalized MDR (GMDR) analysis was proposed by Lou et al. [17], in which both binary and quantitative traits can be considered phenotypes while adjusting for covariates.

Although MDR and GMDR have been widely used to detect GxG interactions in genetic association studies, their applications have been mainly involved in dealing with candidate genes or a small set of genes. Since enormous computational complexity and high-speed computing resources are required in GWASs, neither MDR nor GMDR has been successfully applied to GWASs. In order to overcome these computational limitations, we previously proposed an efficient stepwise approach to identify GxG interactions for GWAS data [18] using cuGWAM, a graphic processing units (GPU)-based MDR program [19].

However, it still remains a challenge to have a biological interpretation of the identified GxG interactions. In order to elucidate the identified GxG interactions, we propose a network graph analysis. Our network analysis method focuses not only on the biological meaning but also on the strength of interaction.

We applied our network graph approach to Korean Association Resource (KARE) data for GxG interactions and its interpretation. We first performed GMDR analysis for BMI. Using our network graph analysis, we then identified several GxG interactions having reasonable biological interpretation from the network graph using public biological resources.

Methods

The KARE data

The KARE project, established as part of the Korean Genome Epidemiology Study (KoGES) in 2001, has started to undertake a large-scale genome-wide association analysis in Korean population-based cohort. These cohorts consist of 10,038 participants in the urban Ansan (n = 5,020) and rural Ansung (n = 5,018) communities. The objective of the KARE project is to identify genetic factors of several clinical traits and lifestyle-related diseases by GWAS, based on population-based cohorts. In this analysis, we focused on BMI, one of more than 260 lifestyle-related traits.

DNA samples were isolated from the peripheral blood of all participants and were genotyped with Affymetrix Genome-Wide Human SNP array 5.0 (Affymetrix, Rockville, MD, USA). Genotypes were called using Bayesian Robust Linear Modeling using the Mahalanobis distance (BRLMM) algorithm [20]. We performed a quality control process for the sample and genotype as previously described in Cho et al. [2]. In order to increase the coverage of common variants and capture additional association signals, we performed SNP imputation with PLINK using the Japanese/Chinese (JPT/CHB) component of HapMap as the reference. After sample and genotype quality controls, 327,872 SNPs for 8,842 individuals were available in the KARE data. We applied our proposed regularization method to Korean genome-wide data.

Network graph analysis

GMDR analysis step

GWAS datasets usually have enormous number of markers, which causes a severe computational burden. In order to deal with this computation problem, the single SNP association test with covariate adjustment is performed to all SNPs in a dataset. This step can be achieved via several toolkits that support regression analysis, such as PLINK. Filtering by specific criteria, such as p-value, is subsequently performed in order to reduce the number of SNPs.

With the selected SNPs from the single SNP association analysis, exhaustive two-way GMDR analysis is then applied. However, general toolkits supporting GMDR analysis can only handle hundreds of SNPs. Although the number of SNPs in this step can be reduced by giving more stringent criteria, it may ignore the SNPs having marginally less significant effects. In addition, a scheme to reduce overfitting, such as cross-validation, requires increased linear computation time. In order to solve this problem, faster toolkits are recommended, such as GPU-based GMDR analysis toolkits like cuGWAM [19].

Since the number of combinations in GMDR analysis is usually very large, all results of GMDR can not be stored. Thus, a criterion is needed to store the specific interactions satisfying the criterion, such as balanced accuracy (BA).

Network graph construction

Network graph is often used to visualize multiple sets of interactions. It consists of nodes and edges connecting two nodes. For simplicity of representation, we focus on two-way interactions. Since the two-way interaction is for the relationship between two variables, it can be easily represented as a graph that contains two nodes representing two variables of interaction and an edge corresponding to the interaction. According to this scheme, all interactions identified by GMDR analysis can be visualized in the network graph.

However, this scheme can cause too complex a graph structure when the number of interactions is large. Thus, it is recommended to select a certain number of possible interactions in the network graph to reduce the complexity of the network graph. Alternatively, the complexity of the network graph can be reduced by gene mapping information. Since many SNPs are placed in a gene or adjacent to a gene, it is possible to annotate multiple SNPs into a single node of a gene according to their position. However, this approach requires a delicate mapping process.

In addition, the network graph can represent the strength of the interaction by the thickness of the edge-that is, the higher the number of interactions between two nodes, the thicker the edge becomes.

Network graph analysis

It is important to derive a reasonable biological interpretation of identified GxG interactions. Here, we propose a network graph analysis step to aid the biological interpretation of the identified GxG interaction by using existing biological resources. In other words, the goal of this step is to find known biological evidence that supports the identified GxG interactions.

Resources used in this step consist of either a pair of a marker and target or two related markers, where a marker corresponds to a gene or SNP and a target represents a biologically proven target related with a marker. From a resource with marker and target information, such as HuGENet [21], miRBase [22], and is-rSNP [23], we simply assume that there is biological evidence if two SNPs within an interaction share the same biological target defined in the biological resource. For resources, such as COXPRESdb [24], having information about the correlation among genes, we assume that all correlated genes in these resources have biological interactions. Although this is a naïve assumption, the use of these resources is mainly to support the interactions identified by GMDR analysis. Note that many known biological resources have their own unique data structures that differ from each other. Consequently, the integration of multiple resources is recommended in order to provide further biological evidence for a given GxG interaction.

Results

In our GMDR analysis using KARE data, we first performed the single SNP association test via PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/) using a linear regression model with covariate adjustment of age, area, and sex. We used the p-value from linear regression as a criterion of screening and used a p-value threshold of 3 × 10-1 for the screening process; 101,837 SNPs were carried out after the screening process, as a result of the single SNP association test. The top 10 SNPs with the smallest p-values are listed in Table 1. The GMDR analysis was then performed using our GPU-based software, cuGWAM [19], in order to evaluate all possible two-way interactions and used a 10-fold cross-validation scheme, with covariate adjustment of age, area, and sex. An exhaustive search for two-way interactions from the selected SNPs took 19 h on the GPU system with three GTX285 graphic cards. In this step, a total of 51,853,363,660 possible two-way interaction models were evaluated.

Table 1
Top 10 SNPs from linear regression analysis

In GMDR analysis, 10-fold cross-validation was used. For each cross-validation set, the top 10,000 interactions with the highest test BAs were selected, where BA is defined as the arithmetic mean of sensitivity and specificity [25]. We then calculated cross-validation consistency (CVC), which represents how many times the same two-way interaction is selected out of 10 cross-validation sets. Finally, we performed a screening step for every pair of interactions achieved from GMDR analysis that satisfied the two criteria, CVC ≥ 9 and test BA ≥ 0.5.

Finally, we found 524 two-way interactions under these screening criteria. Among these interactions, 127 are from genes that are known to be associated with obesity and are summarized in Table 2. Among these interactions, five genes-FTO, CBARA1, CHUK, SLIT3, and NRXN3-and 59 SNPs have been reported in previous studies [15, 16, 26-29]. Visualization of these 524 two-way interactions into a network graph was then made after annotating SNPs to the genes. We used Gephi (https://gephi.org) as a visualization tool. Gene annotation was performed according to the hg18 human genome reference and dbSNP 129. These 524 twoway interactions are displayed in the network graph (Fig. 1). Each node represents either an SNP or gene. If a SNP is annotated to a known gene, then it is denoted by the gene name. The number of interactions between nodes was represented as the thickness of the edge. For example, node FTO has a very thick edge with node NT5C2. Conversely, some nodes have a large number of nodes that are connected with it. We call these nodes hub nodes or hub SNPs. Hub nodes are represented by a gray background color.

Fig. 1
Visualized result of significant interactions that have their cross-validation consistency ≥9. Arranged for readability: gray background indicates hub node, and red, white, blue, and yellow names indicate that they are identified for their relation ...
Table 2
Result of two-way interaction test

We investigated which SNPs that were included in the 524 two-way interactions are related with obesity using DAVID [30], a comprehensive set of functional annotation tools. As a result, we detected six genes-NDUFA8, NDUFB8, NDUFC2, NDUFS2, COX5A, and ATP6V1B2. The role of these genes in the metabolism pathway is related to oxidative phosphorylation, which plays an important role of hepatic mitochondrial function in the development of obesity [31].

Among the 524 two-way interactions, some SNPs might have weak main effects but have strong interactions with other SNPs. Table 3 shows the SNPs from hub genes that show significant two-way interactions with other SNPs but have weak marginal effects in the single SNP analysis.

Table 3
Single nucleotide polymorphisms (SNPs) with weak marginal effect and strong interaction

For the network graph analysis, four public databases (HuGENet [21], COXPRESdb [24], miRBase [22], and isrSNP [23]) were collected and used to investigate the biological relationship between two SNPs with an interaction. A short summary of each database is given in Table 4. All databases used in this study were converted from the original bulk to a database table with automated script for the integrated investigation. The script for this conversion can be provided upon request.

Table 4
Short summary of biological knowledge used in this study

In order to improve the biological relevance of the interpretation, we excluded all interactions having linkage disequilibrium (LD) between two SNPs. One of the results of the network graph analysis is given in Fig. 2. We identified at least one line of biological evidence among 26% of the 524 GxG interactions.

Fig. 2
A visualization of gene-gene interaction interpretation with biological knowledge. Two red circles denote two single-nucleotide polymorphisms (SNPs) within a two-way interaction, and purple circles denote corresponding genes against two SNPs. Gray circles ...

In order to investigate the relationship between CVC and the rate of biological evidence, we performed an additional network analysis using 1,838 interactions (CVC ≥ 7) and found out that there was a strong relationship between CVC of the GMDR analysis and the presence of known biological evidence. As shown in Table 5, the proportion of interactions having a known biological interaction was significantly higher when CVC = 10 than when 7 ≤ CVC < 10 (39% vs. 17% on average). In addition, from the network analysis of 179 interactions having CVC = 10, 69 interactions showed biological evidence and 30 (43% of 69) shared a known relationship against BMI-related diseases, such as cardiovascular disease, body weight, hypertension.

Table 5
Proportion of known biological interactions by cross-validation consistency (CVC)

Discussion

The main purpose of a GWAS is to detect genetic variants represented by a single SNP or a set of SNPs that are associated with common complex polygenic traits of interest [2, 4, 5, 16, 17]. However, it is commonly known that multiple genetic factors as well as several environmental factors and their interactions are involved in causing complex diseases. Thus, the single SNP approach is not appropriate for detecting multiple genetic factors for common complex polygenic traits. Recently, many researchers have tried GxG interaction analysis in GWASs. However, they focused on rather restricted regions with a small number of SNPs due to computational complexity [32, 33].

In this paper, we have successfully performed two-way GxG interaction analysis of 8,838 individuals with the selected 101,837 SNPs, which cover most genomic regions. It became possible with the help of a GPU-based system with three GPU graphic cards. It took only 19 h to perform 51,853,363,660 possible two-way interaction analyses with 10-fold cross-validation.

Our results showed that the FTO, CBARA1, CHUK, SLIT3, and NRXN3 genes are strongly associated with obesity. In addition, our approach provided more information about possible GxG interactions. Note that some SNPs having weak main effects showed strong interactions with other SNPs (Table 3). It would have been very difficult to detect these SNPs through single SNP analysis, which demonstrates the usefulness of our approach in detecting GxG interactions.

In addition, our analysis provided some novel interactions that have not been reported in previous studies. For example, 9 SNPs from the FTO gene were found to have interactions with SNPs from CNNM2, AS3MT, and NT5C2. Similar interactions were found between LOC301738 and PDZRN4, as shown in Fig. 1.

Our network graph summarizes two-way interactions between SNPs very efficiently. We expect that this visualization tool could be helpful to researchers. It is also quite efficient to identify specific patterns and hub genes.

Through our network graph analysis, we also identified several pieces of evidence that supported the identified interactions with known biological information. From the investigation with four published databases, we found that 39% of a total of 179 interactions with CVC = 10 are known to have biological interactions from the four databases we collected. In addition, our network graph analysis showed a higher identification rate of shared biological relevance when the CVC was large than when the CVC was low, which suggests that interactions having a high value of CVC tend to share more known biological evidence that are actually related the phenotype.

In summary, our network graph analysis from GxG interaction analysis successfully identified the relationship between interactions from GMDR analysis and actual biological relevance. In addition, our strategy interpreted GxG interactions very effectively in finding known biological evidence using multiple public databases, especially in finding evidence that is actually related with the phenotype.

Acknowledgments

This work was supported by a National Research Foundation of Korea (NRF) grant, funded by the Korea government (MEST) (no. 2012R1A3A2026438).

References

1. Hirschhorn JN, Daly MJ. Genome-wide association studies for common diseases and complex traits. Nat Rev Genet. 2005;6:95–108. [PubMed]
2. Cho YS, Go MJ, Kim YJ, Heo JY, Oh JH, Ban HJ, et al. A large-scale genome-wide association study of Asian populations uncovers genetic factors influencing eight quantitative traits. Nat Genet. 2009;41:527–534. [PubMed]
3. Weedon MN, Lango H, Lindgren CM, Wallace C, Evans DM, Mangino M, et al. Genome-wide association analysis identifies 20 loci that influence adult height. Nat Genet. 2008;40:575–583. [PMC free article] [PubMed]
4. Voight BF, Scott LJ, Steinthorsdottir V, Morris AP, Dina C, Welch RP, et al. Twelve type 2 diabetes susceptibility loci identified through large-scale association analysis. Nat Genet. 2010;42:579–589. [PMC free article] [PubMed]
5. Newton-Cheh C, Johnson T, Gateva V, Tobin MD, Bochud M, Coin L, et al. Genome-wide association study identifies eight loci associated with blood pressure. Nat Genet. 2009;41:666–676. [PMC free article] [PubMed]
6. Hill JO, Peters JC. Environmental contributions to the obesity epidemic. Science. 1998;280:1371–1374. [PubMed]
7. Ichihara S, Yamada Y. Genetic factors for human obesity. Cell Mol Life Sci. 2008;65:1086–1098. [PubMed]
8. Hofker M, Wijmenga C. A supersized list of obesity genes. Nat Genet. 2009;41:139–140. [PubMed]
9. Feitosa MF, Borecki IB, Rich SS, Arnett DK, Sholinsky P, Myers RH, et al. Quantitative-trait loci influencing body-mass index reside on chromosomes 7 and 13: the National Heart, Lung, and Blood Institute Family Heart Study. Am J Hum Genet. 2002;70:72–82. [PubMed]
10. Farooqi IS, O'Rahilly S. Genetic factors in human obesity. Obes Rev. 2007;8(Suppl 1):37–40. [PubMed]
11. Awaya T, Yokosaki Y, Yamane K, Usui H, Kohno N, Eboshida A. Gene-environment association of an ITGB2 sequence variant with obesity in ethnic Japanese. Obesity (Silver Spring) 2008;16:1463–1466. [PubMed]
12. Frayling TM, Timpson NJ, Weedon MN, Zeggini E, Freathy RM, Lindgren CM, et al. A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science. 2007;316:889–894. [PMC free article] [PubMed]
13. Scuteri A, Sanna S, Chen WM, Uda M, Albai G, Strait J, et al. Genome-wide association scan shows genetic variants in the FTO gene are associated with obesity-related traits. PLoS Genet. 2007;3:e115. [PubMed]
14. Ritchie MD, Hahn LW, Roodi N, Bailey LR, Dupont WD, Parl FF, et al. Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet. 2001;69:138–147. [PubMed]
15. Julià A, Moore J, Miquel L, Alegre C, Barceló P, Ritchie M, et al. Identification of a two-loci epistatic interaction associated with susceptibility to rheumatoid arthritis through reverse engineering and multifactor dimensionality reduction. Genomics. 2007;90:6–13. [PubMed]
16. Brassat D, Motsinger AA, Caillier SJ, Erlich HA, Walker K, Steiner LL, et al. Multifactor dimensionality reduction reveals gene-gene interactions associated with multiple sclerosis susceptibility in African Americans. Genes Immun. 2006;7:310–315. [PubMed]
17. Lou XY, Chen GB, Yan L, Ma JZ, Mangold JE, Zhu J, et al. A combinatorial approach to detecting gene-gene and gene-environment interactions in family studies. Am J Hum Genet. 2008;83:457–467. [PubMed]
18. Lee SY, Oh SH, Kwon MS, Lee SY, Park TS. In: Di Bernardo D, Li GZ, Chan TF, Luo B, Chen J, Michalowski M, editors. Two-way interaction analysis of obesity trait from Korean population using generalized MDR; IEEE International Conference on Bioinformatics and Biomedicine Workshops; 2010 Dec 18-21; Hong Kong. pp. 353–358.
19. Kwon MS, Kim K, Lee S, Park T. cuGWAM: Genome-wide association multifactor dimensionality reduction using CUDA-enabled high-performance graphics processing unit. Int J Data Min Bioinform. 2012;6:471–481. [PubMed]
20. Rabbee N, Speed TP. A genotype calling algorithm for affymetrix SNP arrays. Bioinformatics. 2006;22:7–12. [PubMed]
21. Yu W, Gwinn M, Clyne M, Yesupriya A, Khoury MJ. A navigator for human genome epidemiology. Nat Genet. 2008;40:124–125. [PubMed]
22. Kozomara A, Griffiths-Jones S. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 2011;39:D152–D157. [PMC free article] [PubMed]
23. Macintyre G, Bailey J, Haviv I, Kowalczyk A. is-rSNP: a novel technique for in silico regulatory SNP detection. Bioinformatics. 2010;26:i524–i530. [PMC free article] [PubMed]
24. Obayashi T, Kinoshita K. COXPRESdb: a database to compare gene coexpression in seven model animals. Nucleic Acids Res. 2011;39:D1016–D1022. [PMC free article] [PubMed]
25. Velez DR, White BC, Motsinger AA, Bush WS, Ritchie MD, Williams SM, et al. A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction. Genet Epidemiol. 2007;31:306–315. [PubMed]
26. Bordicchia M, Battistoni I, Mancinelli L, Giannini E, Refi G, Minardi D, et al. Cannabinoid CB1 receptor expression in relation to visceral adipose depots, endocannabinoid levels, microvascular damage, and the presence of the Cnr1 A3813G variant in humans. Metabolism. 2010;59:734–741. [PubMed]
27. Vogel CI, Greene B, Scherag A, Müller TD, Friedel S, Grallert H, et al. Non-replication of an association of CTNNBL1 polymorphisms and obesity in a population of Central European ancestry. BMC Med Genet. 2009;10:14. [PMC free article] [PubMed]
28. Heard-Costa NL, Zillikens MC, Monda KL, Johansson A, Harris TB, Fu M, et al. NRXN3 is a novel locus for waist circumference: a genome-wide association study from the CHARGE Consortium. PLoS Genet. 2009;5:e1000539. [PMC free article] [PubMed]
29. Liu YJ, Guo YF, Zhang LS, Pei YF, Yu N, Yu P, et al. Biological pathway-based genome-wide association analysis identified the vasoactive intestinal peptide (VIP) pathway important for obesity. Obesity (Silver Spring) 2010;18:2339–2346. [PMC free article] [PubMed]
30. Huang da W, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37:1–13. [PMC free article] [PubMed]
31. Buchner DA, Yazbek SN, Solinas P, Burrage LC, Morgan MG, Hoppel CL, et al. Increased mitochondrial oxidative phosphorylation in the liver is associated with obesity and insulin resistance. Obesity (Silver Spring) 2011;19:917–924. [PubMed]
32. Neuman RJ, Wasson J, Atzmon G, Wainstein J, Yerushalmi Y, Cohen J, et al. Gene-gene interactions lead to higher risk for development of type 2 diabetes in an Ashkenazi Jewish population. PLoS One. 2010;5:e9903. [PMC free article] [PubMed]
33. Yu HH, Liu PH, Lin YC, Chen WJ, Lee JH, Wang LC, et al. Interleukin 4 and STAT6 gene polymorphisms are associated with systemic lupus erythematosus in Chinese patients. Lupus. 2010;19:1219–1228. [PubMed]

Articles from Genomics & Informatics are provided here courtesy of Korea Genome Organization