As genotyping technologies progress and we move into the era of whole-genome sequencing, the need to improve analysis schemes is ever-present. This is especially true where gene-gene, gene-environment, and gene-drug interactions are concerned. Allowing our biological knowledge of gene and protein network dynamics to guide the search for the genetic basis of disease is a promising solution to this problem. Although our current biological knowledge is limited and will continue to grow over time, techniques that use the information we have while still exploring novel interactions give us a greater chance of success. Narrowing the dimensions of the search space makes the computational complexity of the problem much more amenable to current analytical techniques. In addition, interpretation of results is more straightforward. We used a list of 245 genes involved in the absorption, distribution, metabolism, and elimination of drugs and their metabolites to focus the search for gene-gene interactions associated with virologic failure during HIV treatment with efavirenz. No gene-gene interactions remained significant after correction for multiple testing, which may reflect the small sample size of this study: after stratification by race, the largest group in the analysis had 74 cases and 357 controls. Nevertheless, the analytic pipeline and software tools developed here should be useful for future analyses.
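The effect of narrowing the search space on the multiple-testing burden can be illustrated with simple arithmetic. The sketch below assumes an exhaustive scan of all two-SNP models and a Bonferroni correction; the panel sizes are hypothetical values chosen only for illustration and are not the SNP counts of this study, and the function names are our own.

```python
from math import comb


def pairwise_model_count(n_snps: int) -> int:
    """Number of distinct two-SNP models in an exhaustive pairwise scan."""
    return comb(n_snps, 2)


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical panel sizes (assumptions, not taken from the study).
genome_wide_snps = 500_000
candidate_snps = 2_000

exhaustive = pairwise_model_count(genome_wide_snps)
focused = pairwise_model_count(candidate_snps)

print(f"exhaustive models:    {exhaustive:,}")
print(f"focused models:       {focused:,}")
print(f"exhaustive threshold: {bonferroni_threshold(exhaustive):.2e}")
print(f"focused threshold:    {bonferroni_threshold(focused):.2e}")
```

Because the number of pairwise models grows quadratically with the number of SNPs, even a modest reduction in the candidate set relaxes the per-test threshold by several orders of magnitude.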
The interactions that appeared most significant in the logistic regression analysis occur between a SNP in the NME2 gene (rs2318785) and multiple SNPs in the NME7 gene. Both NME2 and NME7 are part of the NDK family, which codes for nucleoside diphosphate kinase enzymes involved in the synthesis of non-ATP nucleoside triphosphates. Although it is not readily apparent why purine and pyrimidine metabolism would be involved in predisposition to virologic failure, this could represent novel biological knowledge in this field. Currently known reasons for virologic failure include lack of adherence to the drug regimen, presence of drug resistance mutations in the HIV strain, and drug interactions that might limit efficacy. In the absence of environmental heterogeneity, little is known about the etiology of virologic failure. The small sample size precludes conclusions about the role of nucleoside triphosphate metabolism in the risk of virologic failure. Other SNP interaction models among the most significant results involve a SNP in the TAP1 gene (rs735883) and multiple SNPs in the ABCC9 gene. TAP1 encodes a transporter responsible for shuttling antigen into the endoplasmic reticulum for association with MHC class I, while ABCC9 is part of the MRP subfamily of ABC transporters associated with multi-drug resistance and codes for a protein thought to be a subunit of a pancreatic potassium channel responsible for drug-binding modulation of the channel. It could be that down-regulation of TAP1 through mutation prevents a proper immune response to the virus even after it has been affected by NNRTI action, allowing it to rebound during treatment. The results of the current study require validation in a larger sample before any firm conclusion can be drawn.
The current results are meant to demonstrate the analysis pipeline and the general approach rather than to support general statements about true biological associations with HIV therapy.
Despite the lack of statistical power to detect a significant genetic interaction, this study shows the promise of Biofilter for focusing the search for gene-gene interactions in large-scale genetic association studies. The number of polymorphisms typed in association studies is nearing the limit of our ability to explore two-way interactions exhaustively. Reducing the set of models to evaluate is a workable alternative. Using Biofilter to provide the set of models of interest and PLATO to perform the analysis has at least three advantages over traditional exhaustive gene-gene interaction analysis. First, it partially alleviates the multiple-comparisons problem. Second, interpretation of results is considerably easier because the models are constructed from existing biological knowledge. Third, the regression framework allows the analysis to be adjusted for important covariates. Although Biofilter may be a less promising option where very little biological knowledge exists about the phenotype being analyzed, in pharmacogenomics, where extensive drug metabolism networks have been elucidated, using this knowledge to direct the analysis is a superior alternative, particularly where epistasis is concerned. As the search for the genetic architecture underlying complex traits such as drug pharmacokinetics continues, utilities such as Biofilter can play an important role. Drug response is a nuanced trait and, as such, is likely to have both monogenic and multi-locus genetic components. Now that whole-genome sequencing technology is almost ready for widespread implementation, rare genetic variation is also likely to become an important component to consider for pharmacogenomic traits.
Because individual rare variants are typically carried by too few samples to be tested one at a time, the same pathway knowledge that Biofilter exploits to search for epistasis should be useful for grouping these variants to look for patterns that predict drug response. In summary, Biofilter is a tool that is likely to prove invaluable for the analysis of large-scale genetic association data for complex disease, especially pharmacogenomic data, where biological knowledge is extensive.
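The general idea of using pathway knowledge to restrict the set of interaction models can be sketched as follows. This is a minimal illustration and not Biofilter's actual implementation: the gene names, pathway labels, SNP identifiers, and the function `knowledge_guided_models` are all hypothetical, and the only rule assumed is that a SNP-SNP model is proposed when its two genes share at least one pathway annotation.

```python
from itertools import combinations, product

# Hypothetical annotations, invented purely for illustration.
gene_to_pathways = {
    "GENE_A": {"pathway_1"},
    "GENE_B": {"pathway_1", "pathway_2"},
    "GENE_C": {"pathway_2"},
    "GENE_D": {"pathway_3"},
}
gene_to_snps = {
    "GENE_A": ["snp_a1", "snp_a2"],
    "GENE_B": ["snp_b1"],
    "GENE_C": ["snp_c1", "snp_c2"],
    "GENE_D": ["snp_d1"],
}


def knowledge_guided_models(gene_to_pathways, gene_to_snps):
    """Return SNP-SNP pairs drawn from gene pairs that share a pathway."""
    models = set()
    for gene_1, gene_2 in combinations(sorted(gene_to_pathways), 2):
        # Keep the gene pair only if the two pathway sets overlap.
        if gene_to_pathways[gene_1] & gene_to_pathways[gene_2]:
            for pair in product(gene_to_snps[gene_1], gene_to_snps[gene_2]):
                models.add(pair)
    return sorted(models)


models = knowledge_guided_models(gene_to_pathways, gene_to_snps)
all_snps = [snp for snps in gene_to_snps.values() for snp in snps]
exhaustive_count = len(all_snps) * (len(all_snps) - 1) // 2

print(len(models), "guided models vs", exhaustive_count, "exhaustive pairs")
```

The same gene-to-pathway mapping could, under the same assumptions, be used to pool rare variants by pathway rather than to pair them.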