In our GMDR analysis of the KARE data, we first performed a single-SNP association test in PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/) using a linear regression model adjusted for age, area, and sex. The p-value from this regression served as the screening criterion, with a threshold of 3 × 10⁻¹; 101,837 SNPs passed the screening. The top 10 SNPs with the smallest p-values are listed in the table below. The GMDR analysis was then performed with our GPU-based software, cuGWAM [19], to evaluate all possible two-way interactions under a 10-fold cross-validation scheme, again adjusting for age, area, and sex. The exhaustive search for two-way interactions among the selected SNPs took 19 h on a GPU system with three GTX285 graphics cards. In this step, a total of 51,853,363,660 possible two-way interaction models were evaluated.
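The size of this search can be checked with a short calculation. The sketch below is illustrative only. Reading the reported total as one model evaluation per SNP pair per cross-validation fold is an assumption that happens to reproduce the number; the text does not state it.

```python
from math import comb

n_snps = 101_837   # SNPs retained after screening (from the text)
n_folds = 10       # 10-fold cross-validation (from the text)
hours = 19         # reported wall-clock time of the exhaustive search

# Number of unordered SNP pairs among the retained SNPs.
n_pairs = comb(n_snps, 2)

# Assumption: every pair is evaluated once in each cross-validation fold.
n_evaluations = n_pairs * n_folds

# Rough average throughput implied by the reported run time.
evaluations_per_second = n_evaluations / (hours * 3600)

print(n_pairs)
print(n_evaluations)
print(round(evaluations_per_second))
```

Under this reading, the reported 51,853,363,660 models equal ten times the number of distinct SNP pairs.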
Top 10 SNPs from linear regression analysis
In the GMDR analysis, 10-fold cross-validation was used. For each cross-validation set, the top 10,000 interactions with the highest test balanced accuracy (BA) were selected, where BA is defined as the arithmetic mean of sensitivity and specificity [25]. We then calculated the cross-validation consistency (CVC), which is the number of cross-validation sets, out of 10, in which the same two-way interaction is selected. We then retained only those interactions from the GMDR analysis that satisfied two criteria: CVC ≥ 9 and test BA ≥ 0.5.
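The BA and CVC definitions above can be sketched in a few lines. This is a minimal illustration, not the cuGWAM implementation. The function names, the placeholder SNP identifiers, the toy values, and the use of a per-pair mean test BA across folds are all assumptions made for the example.

```python
from collections import Counter

def balanced_accuracy(tp, fn, tn, fp):
    """Arithmetic mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

def cross_validation_consistency(top_models_per_fold):
    """Count the folds in which each SNP pair is among the selected models."""
    counts = Counter()
    for fold_models in top_models_per_fold:
        counts.update(set(fold_models))  # a pair counts at most once per fold
    return counts

def screen(cvc, test_ba, min_cvc=9, min_ba=0.5):
    """Keep pairs that satisfy both criteria from the text."""
    return sorted(pair for pair, count in cvc.items()
                  if count >= min_cvc and test_ba[pair] >= min_ba)

# Toy input: placeholder SNP names, not identifiers from the study.
pair_ab, pair_ac, pair_bc = ("snp_a", "snp_b"), ("snp_a", "snp_c"), ("snp_b", "snp_c")
top_models_per_fold = ([[pair_ab, pair_ac, pair_bc]] * 4
                       + [[pair_ab, pair_ac]] * 5
                       + [[pair_ab]])
# Assumed summary of test BA per pair (for example, a mean across folds).
test_ba = {pair_ab: 0.56, pair_ac: 0.49, pair_bc: 0.60}

cvc = cross_validation_consistency(top_models_per_fold)
selected = screen(cvc, test_ba)
print(balanced_accuracy(tp=30, fn=10, tn=25, fp=25))  # 0.625
print(selected)
```

In the toy input, only the first pair passes: the second has CVC = 9 but a test BA below 0.5, and the third has a high BA but is selected in only four folds.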
We found 524 two-way interactions under these screening criteria. Of these, 127 involve genes that are known to be associated with obesity and are summarized in the table below. Among them, five genes, including FTO and NRXN3, and 59 SNPs have been reported in previous studies [15]. After annotating the SNPs to genes, we visualized the 524 two-way interactions as a network graph using Gephi (https://gephi.org). Gene annotation was performed according to the hg18 human genome reference and dbSNP 129. The 524 two-way interactions are displayed in the network graph (Fig. 1). Each node represents either a SNP or a gene; a SNP that is annotated to a known gene is denoted by the gene name. The thickness of an edge represents the number of interactions between the two nodes it connects. For example, node FTO has a very thick edge with node NT5C2. Some nodes, in contrast, are connected to a large number of other nodes. We call these hub nodes or hub SNPs; they are drawn with a gray background.
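The construction of the network described above can be sketched as follows. This is a minimal illustration under stated assumptions: the SNP names and the SNP-to-gene mapping are placeholders, the degree cut-off for hub nodes is hypothetical (the text gives no threshold), and dropping pairs that map to the same gene is a choice made only for this sketch.

```python
from collections import Counter, defaultdict

def build_network(interactions, snp_to_gene):
    """Collapse SNP-SNP interactions into a weighted node-node network."""
    edge_weight = Counter()        # (node_a, node_b) -> number of interactions
    neighbors = defaultdict(set)   # node -> distinct nodes connected to it
    for snp_a, snp_b in interactions:
        # An annotated SNP is represented by its gene name, otherwise by itself.
        node_a = snp_to_gene.get(snp_a, snp_a)
        node_b = snp_to_gene.get(snp_b, snp_b)
        if node_a == node_b:
            continue  # sketch-only choice: skip pairs within the same gene
        edge_weight[tuple(sorted((node_a, node_b)))] += 1
        neighbors[node_a].add(node_b)
        neighbors[node_b].add(node_a)
    return edge_weight, neighbors

def hub_nodes(neighbors, min_degree):
    """Nodes connected to at least min_degree distinct nodes (hypothetical cut-off)."""
    return sorted(node for node, linked in neighbors.items()
                  if len(linked) >= min_degree)

# Toy input: placeholder SNP names and an invented annotation.
toy_interactions = [("snp_1", "snp_2"), ("snp_3", "snp_2"),
                    ("snp_1", "snp_4"), ("snp_5", "snp_6")]
toy_annotation = {"snp_1": "FTO", "snp_3": "FTO", "snp_2": "NT5C2"}

edge_weight, neighbors = build_network(toy_interactions, toy_annotation)
hubs = hub_nodes(neighbors, min_degree=2)
print(edge_weight[("FTO", "NT5C2")])  # 2
print(hubs)                           # ['FTO']
```

The edge weight corresponds to the edge thickness in the graph, and the number of distinct neighbors is what distinguishes a hub node from a node with a single thick edge.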
Result of two-way interaction test
Fig. 1 Visualization of significant interactions with cross-validation consistency ≥ 9. Arranged for readability: a gray background indicates a hub node, and red, white, blue, and yellow names indicate that they are identified for their relation ...
Using DAVID [30], a comprehensive set of functional annotation tools, we investigated which SNPs among the 524 two-way interactions are related to obesity. This analysis detected six genes, including NDUFA8 and ATP6V1B2. In the metabolic pathway, these genes are related to oxidative phosphorylation, through which hepatic mitochondrial function plays an important role in the development of obesity [31].
Among the 524 two-way interactions, some SNPs have weak main effects but strong interactions with other SNPs. The table below shows SNPs from hub genes that have significant two-way interactions with other SNPs but weak marginal effects in the single-SNP analysis.
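A toy example may help to show how a SNP can have no marginal effect and still take part in a strong interaction. The sketch below uses a simplified MDR-style classification on a binary trait, without covariates, residual scores, or cross-validation, so it is not the GMDR procedure used here. The data are synthetic and the function name is hypothetical.

```python
from collections import defaultdict

def mdr_balanced_accuracy(cells, labels):
    """Label each genotype cell high- or low-risk, then score the labeling by BA."""
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    counts = defaultdict(lambda: [0, 0])  # cell -> [controls, cases]
    for cell, label in zip(cells, labels):
        counts[cell][label] += 1
    # A cell is high-risk if its case:control ratio exceeds the overall ratio.
    high_risk = {cell for cell, (ctrl, case) in counts.items()
                 if case * n_ctrl > ctrl * n_case}
    tp = sum(1 for c, y in zip(cells, labels) if y == 1 and c in high_risk)
    tn = sum(1 for c, y in zip(cells, labels) if y == 0 and c not in high_risk)
    return (tp / n_case + tn / n_ctrl) / 2

# Synthetic data: the trait is "high" only when exactly one SNP carries the allele.
snp_a = [0, 0, 1, 1] * 25
snp_b = [0, 1, 0, 1] * 25
trait = [a ^ b for a, b in zip(snp_a, snp_b)]

ba_a = mdr_balanced_accuracy(snp_a, trait)
ba_b = mdr_balanced_accuracy(snp_b, trait)
ba_pair = mdr_balanced_accuracy(list(zip(snp_a, snp_b)), trait)
print(ba_a, ba_b, ba_pair)  # 0.5 0.5 1.0
```

Each SNP alone classifies no better than chance, whereas the two-SNP model separates the groups perfectly. Real data would show a far weaker contrast.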
Single nucleotide polymorphisms (SNPs) with weak marginal effect and strong interaction
For the network graph analysis, four public databases (HuGENet [21], COXPRESdb [24], miRBase [22], and isrSNP [23]) were collected and used to investigate the biological relationship between the two SNPs in each interaction. A short summary of each database is given in the table below. For the integrated investigation, every database was converted from its original bulk format into a database table by an automated script, which can be provided upon request.
Short summary of biological knowledge used in this study
To improve the biological relevance of the interpretation, we excluded all interactions in which the two SNPs were in linkage disequilibrium (LD). One result of the network graph analysis is shown in Fig. 2. We identified at least one line of biological evidence for 26% of the 524 gene-gene (GxG) interactions.
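The LD exclusion step can be illustrated as below. The text specifies neither the LD measure nor a cut-off, so both are assumptions here: the sketch uses the squared Pearson correlation of allele counts (0, 1, 2) as an approximate r² and a hypothetical threshold of 0.2. The genotype vectors and SNP names are invented.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two allele-count vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return (sxy * sxy) / (sxx * syy)

def exclude_ld_pairs(pairs, genotypes, max_r2=0.2):
    """Keep pairs whose approximate r2 is below the (hypothetical) cut-off."""
    return [(a, b) for a, b in pairs
            if genotype_r2(genotypes[a], genotypes[b]) < max_r2]

# Invented genotypes: snp_1 and snp_2 are identical, snp_3 is uncorrelated.
toy_genotypes = {
    "snp_1": [0, 0, 2, 2],
    "snp_2": [0, 0, 2, 2],
    "snp_3": [0, 2, 0, 2],
}
toy_pairs = [("snp_1", "snp_2"), ("snp_1", "snp_3")]

kept = exclude_ld_pairs(toy_pairs, toy_genotypes)
print(kept)  # [('snp_1', 'snp_3')]
```

Removing pairs in LD keeps apparent interactions between neighboring, correlated SNPs from being interpreted as biological interactions between genes.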
Fig. 2 A visualization of gene-gene interaction interpretation with biological knowledge. Two red circles denote the two single-nucleotide polymorphisms (SNPs) within a two-way interaction, and purple circles denote the genes corresponding to the two SNPs. Gray circles ...
To investigate the relationship between CVC and the rate of biological evidence, we performed an additional network analysis of 1,838 interactions (CVC ≥ 7) and found a strong relationship between the CVC from the GMDR analysis and the presence of known biological evidence. The proportion of interactions with a known biological interaction was significantly higher when CVC = 10 than when 7 ≤ CVC < 10 (39% vs. 17% on average), as summarized by CVC below. In addition, in the network analysis of the 179 interactions with CVC = 10, 69 interactions showed biological evidence, and 30 of them (43% of 69) shared a known relationship with BMI-related diseases and traits, such as cardiovascular disease, body weight, and hypertension.
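The percentages for the CVC = 10 group follow directly from the reported counts, as the short check below shows. The variable names are illustrative. Treating the 179 interactions with CVC = 10 as a subset of the 1,838 interactions with CVC ≥ 7 is an assumption based on how the two sets are defined.

```python
n_cvc_ge7 = 1838        # interactions with CVC >= 7 (from the text)
n_cvc10 = 179           # interactions with CVC = 10 (from the text)
n_with_evidence = 69    # CVC = 10 interactions with biological evidence
n_bmi_related = 30      # of those, related to BMI-related diseases

# Share of CVC = 10 interactions with biological evidence (reported as 39%).
pct_with_evidence = round(100 * n_with_evidence / n_cvc10)

# Share of evidence-backed interactions tied to BMI-related diseases (reported as 43%).
pct_bmi_related = int(100 * n_bmi_related / n_with_evidence)

# Assumption: the CVC = 10 set is contained in the CVC >= 7 set.
n_cvc_7_to_9 = n_cvc_ge7 - n_cvc10

print(pct_with_evidence, pct_bmi_related, n_cvc_7_to_9)
```

The 17% figure for 7 ≤ CVC < 10 cannot be rechecked in the same way, because the corresponding counts are not given in the text.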
Proportion of known biological interactions by cross-validation consistency (CVC)