Protein-Protein Interactions (PPIs) play important roles in many biological functions. Protein domains, which are defined as independently folding structural blocks of proteins, physically interact with each other to perform these biological functions. Therefore, the identification of Domain-Domain Interactions (DDIs) is of great biological interest because it is generally accepted that PPIs are mediated by DDIs. As a result, much effort has been put into the prediction of domain pair interactions based on computational methods. Many DDI prediction tools using PPI network and domain evolution information have been reported. However, tools that combine the primary sequences, domain annotations, and structural annotations of proteins have not been evaluated before.
In this study, we report a novel approach called Gram-bAsed Interaction Analysis (GAIA). GAIA extracts peptide segments composed of a fixed number of contiguous amino acids, called n-grams (where n is the number of amino acids), from the annotated domain and DDI data set in Saccharomyces cerevisiae (budding yeast) and identifies a list of n-grams that may contribute to DDIs and PPIs based on the frequencies of their appearance. GAIA also reports the coordinate position of gram pairs on each interacting domain pair. We demonstrate that our approach improves on other DDI prediction approaches when tested against a gold-standard data set and achieves a true positive rate of 82% and a false positive rate of 21%. We also identify a list of 4-gram pairs that are significantly over-represented in the DDI data set and may mediate PPIs.
GAIA represents a novel and reliable way to predict DDIs that mediate PPIs. Our results, which show the localizations of interacting grams/hotspots, provide testable hypotheses for experimental validation. Complemented with other prediction methods, this study will allow us to elucidate the interactome of cells.
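The n-gram extraction described above can be illustrated with a minimal sketch. The abstract does not specify GAIA's scoring or statistics, so the function names (`extract_ngrams`, `count_gram_pairs`), the once-per-domain-pair counting rule, and the toy sequences below are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from itertools import product


def extract_ngrams(sequence, n):
    """Return (start_index, n-gram) for every window of n contiguous residues."""
    return [(i, sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


def count_gram_pairs(domain_pairs, n):
    """Count how many interacting domain pairs contain each ordered gram pair.

    domain_pairs: iterable of (sequence_a, sequence_b) for interacting domains.
    Assumption: a gram pair is counted at most once per domain pair.
    """
    counts = Counter()
    for seq_a, seq_b in domain_pairs:
        grams_a = {gram for _, gram in extract_ngrams(seq_a, n)}
        grams_b = {gram for _, gram in extract_ngrams(seq_b, n)}
        counts.update(product(grams_a, grams_b))
    return counts


# Toy sequences invented for illustration; they are not real yeast domains.
toy_pairs = [("MKVLA", "GAVLK"), ("KVLAT", "AVLKQ")]
pair_counts = count_gram_pairs(toy_pairs, n=4)
print(extract_ngrams("MKVLA", 4))
print(pair_counts.most_common(3))
```

The start indices returned by `extract_ngrams` correspond to the coordinate positions the abstract mentions. Deciding which gram pairs are over-represented would additionally require a background model, which is not sketched here.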
We define the Gaia system of life and its environment on Earth, review the status of the Gaia theory, introduce potentially relevant concepts from complexity theory, then try to apply them to Gaia. We consider whether Gaia is a complex adaptive system (CAS) in terms of its behaviour and suggest that the system is self-organizing but does not reside in a critical state. Gaia has supported abundant life for most of the last 3.8 Gyr. Large perturbations have occasionally suppressed life but the system has always recovered without losing the capacity for large-scale free energy capture and recycling of essential elements. To illustrate how complexity theory can help us understand the emergence of planetary-scale order, we present a simple cellular automata (CA) model of the imaginary planet Daisyworld. This exhibits emergent self-regulation as a consequence of feedback coupling between life and its environment. Local spatial interaction, which was absent from the original model, can destabilize the system by generating bifurcation regimes. Variation and natural selection tend to remove this instability. With mutation in the model system, it exhibits self-organizing adaptive behaviour in its response to forcing. We close by suggesting how artificial life ('Alife') techniques may enable more comprehensive feasibility tests of Gaia.
This paper presents the design of a new wireless sensor node (GAIA Soil-Mote) for precision horticulture applications which permits the use of precision agricultural instruments based on the SDI-12 standard. Wireless communication is achieved with a transceiver compliant with the IEEE 802.15.4 standard. The GAIA Soil-Mote software implementation is based on TinyOS. A two-phase methodology was devised to validate the design of this sensor node. The first phase consisted of laboratory validation of the proposed hardware and software solution, including a study on power consumption and autonomy. The second phase consisted of implementing a monitoring application in a real broccoli (Brassica oleracea L. var Marathon) crop in Campo de Cartagena in south-east Spain. In this way the sensor node was validated in real operating conditions. This type of application was chosen because there is a large potential market for it in the farming sector, especially for the development of precision agriculture applications.
Wireless Sensor Networks; Mote; TinyOS; Precision Horticulture
The coupled biosphere–atmosphere system entails a vast range of processes at different scales, from ecosystem exchange fluxes of energy, water and carbon to the processes that drive global biogeochemical cycles, atmospheric composition and, ultimately, the planetary energy balance. These processes are generally complex with numerous interactions and feedbacks, and they are irreversible in their nature, thereby producing entropy. The proposed principle of maximum entropy production (MEP), based on statistical mechanics and information theory, states that thermodynamic processes far from thermodynamic equilibrium will adapt to steady states at which they dissipate energy and produce entropy at the maximum possible rate. This issue focuses on the latest development of applications of MEP to the biosphere–atmosphere system including aspects of the atmospheric circulation, the role of clouds, hydrology, vegetation effects, ecosystem exchange of energy and mass, biogeochemical interactions and the Gaia hypothesis. The examples shown in this special issue demonstrate the potential of MEP to contribute to improved understanding and modelling of the biosphere and the wider Earth system, and also explore limitations and constraints to the application of the MEP principle.
thermodynamics; interactions; Earth system science; ecosystems
In the search for genetic determinants of complex disease, two approaches to association analysis are most often employed, testing single loci or testing a small group of loci jointly via haplotypes for their relationship to disease status. It is still debatable which of these approaches is more favourable, and under what conditions. The former has the advantage of simplicity but suffers severely when alleles at the tested loci are not in linkage disequilibrium (LD) with liability alleles; the latter should capture more of the signal encoded in LD, but is far from simple. The complexity of haplotype analysis could be especially troublesome for association scans over large genomic regions, which, in fact, is becoming the standard design. For these reasons, the authors have been evaluating statistical methods that bridge the gap between single-locus and haplotype-based tests. In this article, they present one such method, which uses non-parametric regression techniques embodied by Bayesian adaptive regression splines (BARS). For a set of markers falling within a common genomic region and a corresponding set of single-locus association statistics, the BARS procedure integrates these results into a single test by examining the class of smooth curves consistent with the data. The non-parametric BARS procedure generally finds no signal when no liability allele exists in the tested region (ie it achieves the specified size of the test) and it is sensitive enough to pick up signals when a liability allele is present. The BARS procedure provides a robust and potentially powerful alternative to classical tests of association, diminishes the multiple testing problem inherent in those tests and can be applied to a wide range of data types, including genotype frequencies estimated from pooled samples.
association study; adaptive regression splines; complex disease; genome scan; linkage disequilibrium (LD); non-parametric regression
Biological processes are regulated by complex interactions between transcription factors and signalling molecules, collectively described as Genetic Regulatory Networks (GRNs). The characterisation of these networks to reveal regulatory mechanisms is a long-term goal of many laboratories. However, compiling, visualising and interacting with such networks is non-trivial. Current tools and databases typically focus on GRNs within simple, single-celled organisms. However, data is available within the literature describing regulatory interactions in multi-cellular organisms, although not in any systematic form. This is particularly true within the field of developmental biology, where regulatory interactions should also be tagged with information about the time and anatomical location of development in which they occur.
We have developed myGRN, a web application for storing and interrogating interaction data, with an emphasis on developmental processes. Users can submit interaction and gene expression data, either curated from published sources or derived from their own unpublished data. All interactions associated with publications are publicly visible, and unpublished interactions can only be shared between collaborating labs prior to publication. Users can group interactions into discrete networks based on specific biological processes. Various filters allow dynamic production of network diagrams based on a range of information including tissue location, developmental stage or basic topology. Individual networks can be viewed using myGRV, a tool focused on displaying developmental networks, or exported in a range of formats compatible with third party tools. Networks can also be analysed for the presence of common network motifs. We demonstrate the capabilities of myGRN using a network of zebrafish interactions integrated with expression data from the zebrafish database, ZFIN.
Here we are launching myGRN as a community-based repository for interaction networks, with a specific focus on developmental networks. We plan to extend its functionality, as well as use it to study networks involved in embryonic development in the future.
Complex diseases are presumed to be the results of interactions of several genes and environmental factors, with each gene only having a small effect on the disease. Thus, methods that can account for gene-gene interactions, to search for a set of marker loci in different genes or across the genome and to analyze these loci jointly, are critical. In this article, we propose an ensemble learning approach (ELA) to detect a set of loci whose main and interaction effects jointly have a significant association with the trait. In the ELA, we first search for “base learners” and then combine the effects of the base learners by a linear model. Each base learner represents a main effect or an interaction effect. The result of the ELA is easy to interpret. When the ELA is applied to analyze a data set, we can get a final model, an overall P-value of the association test between the set of loci involved in the final model and the trait, and an importance measure for each base learner and each marker involved in the final model. The final model is a linear combination of some base learners. We know which base learner represents a main effect and which one represents an interaction effect. The importance measure of each base learner or marker can tell us the relative importance of the base learner or marker in the final model. We used intensive simulation studies as well as a real data set to evaluate the performance of the ELA. Our simulation studies demonstrated that the ELA is more powerful than the single-marker test in all the simulation scenarios. The ELA also outperformed the other three existing multi-locus methods in almost all cases.
In an application to a large-scale case-control study for Type 2 diabetes, the ELA identified 11 single nucleotide polymorphisms that have a significant multi-locus effect (P-value = 0.01), while none of the single nucleotide polymorphisms showed significant marginal effects and none of the two-locus combinations showed significant two-locus interaction effects.
epistasis; association study; complex disease; Type 2 diabetes
The Cochran–Armitage trend test (CATT) is well suited for testing association between a marker and a disease in case–control studies. When the underlying genetic model for the disease is known, the CATT optimal for the genetic model is used. For complex diseases, however, the genetic models of the true disease loci are unknown. In this situation, robust tests are preferable. We propose a two-phase analysis with model selection for the case–control design. In the first phase, we use the difference of Hardy–Weinberg disequilibrium coefficients between the cases and the controls for model selection. Then, an optimal CATT corresponding to the selected model is used for testing association. The correlation of the statistics used for selection and the test for association is derived to adjust the two-phase analysis with control of the Type-I error rate. The simulation studies show that this new approach has greater efficiency robustness than the existing methods.
Cochran–Armitage trend test; Disease risk; Efficiency robustness; Hardy–Weinberg disequilibrium; SNP
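As a minimal sketch of the two quantities named in this abstract, the code below computes the Cochran–Armitage trend statistic for a 2 × 3 genotype table under a chosen set of genotype scores, and the Hardy–Weinberg disequilibrium coefficient for one sample. The score sets in `WEIGHTS`, the function names and the example counts are assumptions for illustration; the abstract's model-selection threshold and the correlation adjustment of the two-phase procedure are not reproduced.

```python
import math

# Assumed genotype scores for 0, 1 and 2 copies of the risk allele.
WEIGHTS = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
}


def catt_z(cases, controls, weights):
    """Cochran-Armitage trend Z statistic for genotype counts (n0, n1, n2)."""
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n_all = n_cases + n_controls
    # Score-weighted difference between case and control genotype counts.
    t_stat = sum(t * (r * n_controls - s * n_cases)
                 for t, r, s in zip(weights, cases, controls))
    # Variance of t_stat under the null hypothesis of no association.
    variance = (n_cases * n_controls / n_all) * (
        n_all * sum(t * t * c for t, c in zip(weights, totals))
        - sum(t * c for t, c in zip(weights, totals)) ** 2
    )
    return t_stat / math.sqrt(variance)


def hwd_coefficient(counts):
    """Hardy-Weinberg disequilibrium: P(two risk alleles) - P(risk allele)^2."""
    n0, n1, n2 = counts
    total = n0 + n1 + n2
    p_risk = (n2 + 0.5 * n1) / total
    return n2 / total - p_risk ** 2


# Hypothetical genotype counts, invented for illustration.
cases, controls = (30, 50, 20), (50, 40, 10)
hwd_difference = hwd_coefficient(cases) - hwd_coefficient(controls)
for model, weights in WEIGHTS.items():
    print(model, round(catt_z(cases, controls, weights), 3))
print("HWD difference:", round(hwd_difference, 4))
```

Under the null hypothesis each Z statistic is approximately standard normal. In the approach described above, the difference in disequilibrium coefficients between cases and controls would guide which score set is used for the final test.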
We introduce an innovative multilocus test for disease association. It is an extension of an existing score test that gains power over alternative methods by incorporating a parsimonious one-degree-of-freedom model for interaction. We use our method in applications designed to detect interactions that generate hypotheses about the functionality of prostate cancer (PRCA) susceptibility regions.
Our proposed score test is designed to gain additional power through the use of a retrospective likelihood that exploits an assumption of independence between unlinked loci in the underlying population. Its performance is validated through simulation. The method is used in conditional scans with data from stage II of the Cancer Genetic Markers of Susceptibility PRCA genome-wide association study.
Our proposed method increases power to detect susceptibility loci in diverse settings. It identified two high-ranking, biologically interesting interactions: (1) rs748120 of NR2C2 and subregions of 8q24 that contain independent susceptibility loci specific to PRCA and (2) rs4810671 of SULF2 and both JAZF1 and HNF1B that are associated with PRCA and type 2 diabetes.
Our score test is a promising multilocus tool for genetic epidemiology. The results of our applications suggest functionality for poorly understood PRCA susceptibility regions. They motivate replication study.
Gene-gene interaction; Score test; Prostate cancer
Complex diseases are multifactorial in nature and can involve multiple loci with gene × gene and gene × environment interactions. Research on methods to uncover the interactions between those genes that confer susceptibility to disease has been extensive, but many of these methods have only been developed for sibling pairs or sibships. In this report, we assess the performance of two methods for finding gene × gene interactions that are applicable to arbitrarily sized pedigrees, one based on correlation in per-family nonparametric linkage scores and another that incorporates candidate loci genotypes as covariates into an affected relative pair linkage analysis. The power and type I error rate of both of these methods were assessed using the simulated Genetic Analysis Workshop 14 data. In general, we found detection of the interacting loci to be a difficult problem, and though we experienced some modest success there is a clear need to continue developing new methods and approaches to the problem.
Multiple prostate cancer (PCa) risk-related loci have been discovered by genome-wide association studies (GWAS) based on case–control designs. However, GWAS findings may be confounded by population stratification if cases and controls are inadvertently drawn from different genetic backgrounds. In addition, since these loci were identified in cases with predominantly sporadic disease, little is known about their relationships with hereditary prostate cancer (HPC). The association of seventeen reported PCa susceptibility loci with HPC was evaluated with a family-based association test using 1,979 hereditary PCa families of European descent collected by members of the International Consortium for Prostate Cancer Genetics, with a total of 5,730 affected men. The risk alleles for 8 of the 17 loci were significantly over-transmitted from parents to affected offspring, including SNPs residing in 8q24 (regions 1, 2 and 3), 10q11, 11q13, 17q12 (region 1), 17q24 and Xp11. In subgroup analyses, three loci, at 8q24 (regions 1 and 2) plus 17q12, were significantly over-transmitted in hereditary PCa families with five or more affected members, while loci at 3p12, 8q24 (region 2), 11q13, 17q12 (region 1), 17q24 and Xp11 were significantly over-transmitted in HPC families with an average age of diagnosis at 65 years or less. Our results indicate that at least a subset of PCa risk-related loci identified by case–control GWAS are also associated with disease risk in HPC families.
This work presents the results of 4 years of monitoring of the concentrations of SO2 gas and PM10 in the urban area around the copper smelter in Bor. The contents of the heavy metals Pb, Cd, Cu, Ni, and As in PM10 were determined, and the obtained values were compared to the limit values provided in EU Directives. Concentrations of all the components exceeding the limits many times over were registered in the atmosphere of the urban area of the town of Bor. By applying a multi-criteria analysis using the PROMETHEE/GAIA method, the zones were ranked according to the level of pollution.
Heavy metals; SO2 gas; PM10; Pollution; Distribution; PROMETHEE/GAIA
The goal of our analysis was to map functional loci that contribute to the case-control status of a trait of interest, using large pedigrees. We used logistic regression fitted with the generalized estimation equation to test associations between a dichotomous phenotype and all genotyped common and rare single-nucleotide polymorphisms. In addition to the association study, we also developed and applied a simple and fast identical-by-descent-based test to identify loci that were shared among affected individuals more often than expected by chance. Among the top significant loci, we assessed the statistical power and the false discovery rate of both methods. We also demonstrated that family-based studies, compared with the standard population-based association studies, have great value and advantages for the discovery of multiple rare causal variants.
The Earth, with its core-driven magnetic field, convective mantle, mobile lid tectonics, oceans of liquid water, dynamic climate and abundant life is arguably the most complex system in the known universe. This system has exhibited stability in the sense of, bar a number of notable exceptions, surface temperature remaining within the bounds required for liquid water and so a significant biosphere. Explanations for this range from anthropic principles in which the Earth was essentially lucky, to homeostatic Gaia in which the abiotic and biotic components of the Earth system self-organise into homeostatic states that are robust to a wide range of external perturbations. Here we present results from a conceptual model that demonstrates the emergence of homeostasis as a consequence of the feedback loop operating between life and its environment. Formulating the model in terms of Gaussian processes allows the development of novel computational methods in order to provide solutions. We find that the stability of this system will typically increase then remain constant with an increase in biological diversity and that the number of attractors within the phase space exponentially increases with the number of environmental variables while the probability of the system being in an attractor that lies within prescribed boundaries decreases approximately linearly. We argue that the cybernetic concept of rein control provides insights into how this model system, and potentially any system that is comprised of biological to environmental feedback loops, self-organises into homeostatic states.
Life on Earth is perhaps greater than three and a half billion years old and it would appear that once it started it never stopped. During this period a number of dramatic shocks and drivers have affected the Earth. These include the impacts of massive asteroids, runaway climate change and increases in brightness of the Sun. Has life on Earth simply been lucky in withstanding such perturbations? Are there any self-regulating or homeostatic processes operating in the Earth system that would reduce the severity of such perturbations? If such planetary processes exist, to what extent are they the result of the actions of life? In this study, we show how the regulation of environmental conditions can emerge as a consequence of life's effects. If life is both affected by and affects its environment, then this coupled system can self-organise into a robust control system that was first described during the early cybernetics movement around the middle of the twentieth century. Our findings are in principle applicable to a wide range of real world systems - from microbial mats to aquatic ecosystems up to and including the entire biosphere.
Following the identification of several disease-associated polymorphisms by whole genome association analysis, interest is now focussing on the detection of effects that, due to their interaction with other genetic (or environmental) factors, may not be identified by using standard single-locus tests. In addition to increasing power to detect association, there is also a hope that detecting interactions between loci will allow us to elucidate the biological and biochemical pathways underpinning disease. Here I provide a critical survey of the current methodological approaches (and related software packages) used to detect interactions between genetic loci that contribute to human genetic disease. I also discuss the difficulties in determining the biological relevance of statistical interactions.
The genetic etiology of autism is heterogeneous. Multiple disorders share genotypic and phenotypic traits with autism. Network based cross-disorder analysis can aid in the understanding and characterization of the molecular pathology of autism, but there are few tools that enable us to conduct cross-disorder analysis and to visualize the results.
We have designed Autworks as a web portal to bring together gene interaction and gene-disease association data on autism to enable network construction, visualization, and network comparisons with numerous other related neurological conditions and disorders. Users may examine the structure of gene interactions within a set of disorder-associated genes, compare networks of disorder/disease genes with those of other disorders/diseases, and upload their own sets for comparative analysis.
Autworks is a web application that provides an easy-to-use resource for researchers of varied backgrounds to analyze the autism gene network structure within and between disorders.
Autism; Autistic disorder; Autism spectrum disorders; Autism genetics; Autism genomics; Network biology; Network medicine; Translational bioinformatics; Protein-protein interactions
Hundreds of new loci have been discovered by genome-wide association studies of human traits. These studies mostly focused on associations between a single locus and a trait. Interactions between genes and between genes and environmental factors are of interest as they can improve our understanding of the genetic background underlying complex traits. Genome-wide testing of complex genetic models is a computationally demanding task. Moreover, testing of such models leads to multiple comparison problems that reduce the probability of new findings. Assuming that the genetic model underlying a complex trait can include hundreds of genes and environmental factors, testing of these models in genome-wide association studies represents a substantial difficulty.
We, and Pare and colleagues (2010), developed a method that overcomes such difficulties. The method is based on the fact that loci which are involved in interactions can show genotypic variance heterogeneity of a trait. Genome-wide testing of such heterogeneity can be a fast scanning approach which can point to the interacting genetic variants.
In this work we present a new method, SVLM, allowing for variance heterogeneity analysis of imputed genetic variation. The type I error and power of this test are investigated and contrasted with those of Levene's test. We also present an R package, VariABEL, implementing existing and newly developed tests.
Variance heterogeneity analysis is a promising method for detection of potentially interacting loci. The new method and software package developed in this work will facilitate such analysis in a genome-wide context.
single-nucleotide polymorphisms (SNPs); genome-wide association (GWA); gene-environment interactions (GxE); gene-gene interactions (GxG); variance heterogeneity; environmental sensitivity; VariABEL; the GenABEL project
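To make the idea of genotypic variance heterogeneity concrete, the following is a minimal sketch of the classical mean-centred Levene statistic, the comparison test named above, computed across genotype groups. It does not implement SVLM and does not use VariABEL; the function name `levene_w` and the trait values are invented for illustration.

```python
from statistics import mean


def levene_w(groups):
    """Levene's W statistic (mean-centred) for samples grouped by genotype."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group mean.
    deviations = [[abs(y - mean(g)) for y in g] for g in groups]
    group_means = [mean(d) for d in deviations]
    grand_mean = sum(sum(d) for d in deviations) / n_total
    between = sum(len(d) * (m - grand_mean) ** 2
                  for d, m in zip(deviations, group_means))
    within = sum((v - m) ** 2
                 for d, m in zip(deviations, group_means) for v in d)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical trait values for two genotype groups at one SNP.
# The second group has the larger spread, as expected at an interacting locus.
trait_by_genotype = [[1, 2, 3, 4], [0, 4, 8, 12]]
print(round(levene_w(trait_by_genotype), 3))
```

Under the null hypothesis of equal variances, W is referred to an F distribution with k - 1 and N - k degrees of freedom. A genome-wide scan would repeat this calculation per SNP, which is why a fast test is attractive.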
Driven by advances in molecular genetic technologies and statistical analysis methodologies, there have been huge strides taken in dissecting the complex genetic basis of many inflammatory dermatoses. One example is psoriasis where application of classical linkage analysis and genome wide association investigation has identified genetic loci of major and minor effect. Although most loci independently have modest genetic effect, they identify important biological pathways potentially relevant to disease pathogenesis and therapeutic intervention. In the case of psoriasis these appear to involve the epidermal barrier, NF-κB mechanisms and Th17 adaptive immune responses. The advent of next generation sequencing methods will permit a more detailed and complete map of disease genetic architecture, a key step in developing personalised medicine strategies in the clinical management of the complex inflammatory dermatoses.
Genetic discoveries are validated through the meta-analysis of genome-wide association scans in large international consortia. Because environmental variables may interact with genetic factors, investigation of differing genetic effects for distinct levels of an environmental exposure in these large consortia may yield additional susceptibility loci undetected by main effects analysis. We describe a method of joint meta-analysis of SNP and SNP by Environment (SNP×E) regression coefficients for use in gene-environment interaction studies.
In testing SNP×E interactions, one approach uses a two degree of freedom test to identify genetic variants that influence the trait of interest. This approach detects both main and interaction effects between the trait and the SNP. We propose a method to jointly meta-analyze the SNP and SNP×E coefficients using multivariate generalized least squares. This approach provides confidence intervals of the two estimates, a joint significance test for SNP and SNP×E terms, and a test of homogeneity across samples.
We present a simulation study comparing this method to four other methods of meta-analysis and demonstrate that the joint meta-analysis performs better than the others when both main and interaction effects are present. Additionally, we implemented our methods in a meta-analysis of the association between SNPs from the type 2 diabetes-associated gene PPARG and log-transformed fasting insulin levels and interaction by body mass index in a combined sample of 19,466 individuals from 5 cohorts.
2 degree of freedom meta-analysis; joint meta-analysis; PPARG; Gene-environment interaction meta-analysis
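A fixed-effect version of the joint meta-analysis can be sketched as inverse-covariance weighting of each cohort's pair of coefficients, followed by a two degree of freedom Wald test. This is an assumed textbook form of multivariate generalized least squares, not necessarily the exact estimator used by the authors; the function names and the cohort estimates are hypothetical, and the homogeneity test is omitted.

```python
import math


def inv2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Product of a 2 x 2 matrix and a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def joint_meta(betas, covs):
    """Pool (beta_SNP, beta_SNPxE) across cohorts by inverse-covariance weights.

    betas: list of [beta_snp, beta_interaction], one per cohort.
    covs:  list of 2 x 2 covariance matrices of those estimates.
    """
    weights = [inv2(v) for v in covs]
    weight_sum = [[sum(w[i][j] for w in weights) for j in range(2)]
                  for i in range(2)]
    weighted_betas = [matvec(w, b) for w, b in zip(weights, betas)]
    wb_sum = [sum(wb[i] for wb in weighted_betas) for i in range(2)]
    pooled_cov = inv2(weight_sum)
    pooled = matvec(pooled_cov, wb_sum)
    # Wald statistic b' Cov^-1 b; Cov^-1 equals the summed weight matrix.
    chi2 = sum(p * q for p, q in zip(pooled, matvec(weight_sum, pooled)))
    # Upper tail of a chi-square with 2 degrees of freedom is exp(-x / 2).
    p_value = math.exp(-chi2 / 2.0)
    return pooled, pooled_cov, chi2, p_value


# Hypothetical estimates from two cohorts, invented for illustration.
betas = [[0.2, 0.1], [0.2, 0.1]]
covs = [[[0.01, 0.0], [0.0, 0.04]], [[0.01, 0.0], [0.0, 0.04]]]
pooled, pooled_cov, chi2, p_value = joint_meta(betas, covs)
print(pooled, chi2, p_value)
```

Confidence intervals for the two pooled coefficients would follow from the diagonal of `pooled_cov`.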
The Genetic Analysis Workshop 15 Problem 3 simulated rheumatoid arthritis data set provided 100 replicates of simulated single-nucleotide polymorphism (SNP) and covariate data sets for 1500 families with an affected sib pair and 2000 controls, modeled after real rheumatoid arthritis data. The data generation model included nine unobserved trait loci, most of which have one or more of the generated SNPs associated with them. These data sets provide an ideal experimental test bed for evaluating new and old algorithms for selecting SNPs and covariates that can separate cases from controls, because the cases and controls are known as well as the identities of the trait loci. LASSO-Patternsearch is a new multi-step algorithm with a LASSO-type penalized likelihood method at its core, specifically designed to detect and model interactions between important predictor variables. In this article the original LASSO-Patternsearch algorithm is modified to handle the large number of SNPs plus covariates. We start with a screen step within the framework of parametric logistic regression. The patterns that survived the screen step were further selected by a penalized logistic regression with the LASSO penalty. Finally, a parametric logistic regression model was built on the patterns that survived the LASSO step. In our analysis of Genetic Analysis Workshop 15 Problem 3 data we have identified most of the associated SNPs and relevant covariates. Upon using the model as a classifier, very competitive error rates were obtained.
Dissecting the genetic architecture of fitness-related traits in wild populations is key to understanding evolution and the mechanisms maintaining adaptive genetic variation. We took advantage of a recently developed genetic linkage map and phenotypic information from wild pedigreed individuals from Ram Mountain, Alberta, Canada, to study the genetic architecture of ecologically important traits (horn volume, length, base circumference and body mass) in bighorn sheep. In addition to estimating sex-specific and cross-sex quantitative genetic parameters, we tested for the presence of quantitative trait loci (QTLs), colocalization of QTLs between bighorn sheep and domestic sheep, and sex × QTL interactions. All traits showed significant additive genetic variance and genetic correlations tended to be positive. Linkage analysis based on 241 microsatellite loci typed in 310 pedigreed animals resulted in no significant and five suggestive QTLs (four for horn dimension on chromosomes 1, 18 and 23, and one for body mass on chromosome 26) using genome-wide significance thresholds (Logarithm of odds (LOD) >3.31 and >1.88, respectively). We also confirmed the presence of a horn dimension QTL in bighorn sheep at the only position known to contain a similar QTL in domestic sheep (on chromosome 10 near the horns locus; nominal P<0.01) and highlighted a number of regions potentially containing weight-related QTLs in both species. As expected for sexually dimorphic traits involved in male–male combat, loci with sex-specific effects were detected. This study lays the foundation for future work on adaptive genetic variation and the evolutionary dynamics of sexually dimorphic traits in bighorn sheep.
adaptive variation; animal model; domestic sheep; Ovis aries; sexual dimorphism; sexual selection
Background and aims: Genetic variation in the chromosome 5q31 cytokine cluster (IBD5 risk haplotype) has been associated with Crohn’s disease (CD) in a Canadian population. We studied the IBD5 risk haplotype in both British and Japanese cohorts. Disease associations have also been reported for CARD15/NOD2 and TNF variants. Complex interactions between susceptibility loci have been shown in animal models, and we tested for potential gene-gene interactions between the three CD associated loci.
Methods: Family based association analyses were performed in 457 British families (252 ulcerative colitis, 282 CD trios) genotyped for the IBD5 haplotype, common CARD15, and TNF−857 variants. To test for possible epistatic interactions between variants, transmission disequilibrium test analyses were further stratified by genotype at other loci, and novel log linear analyses were performed using the haplotype relative risk model. Case control association analyses were performed in 178 Japanese CD patients and 156 healthy controls genotyped for the IBD5 haplotype.
Results: The IBD5 haplotype was associated with CD (p=0.007), but not with UC, in the British Caucasian population. The CARD15 variants and IBD5 haplotype showed additive main effects, and in particular no evidence for epistatic interactions was found. Variants from the IBD5 haplotype were extremely rare in the Japanese.
Conclusions: The IBD5 risk haplotype is associated with CD in the British population. Genetic variants predisposing to CD show heterogeneity and population-specific differences.
Crohn’s disease; ulcerative colitis; inflammatory bowel disease; IBD locus
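The stratified transmission disequilibrium test (TDT) described in the Methods above can be illustrated with a minimal Python sketch. It uses the standard McNemar-type TDT statistic, (b − c)² / (b + c), where b and c are the numbers of times heterozygous parents transmit or do not transmit the allele of interest to affected offspring. The counts and stratum labels are hypothetical and are not data from the study, and the function name is illustrative.

```python
import math


def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Return (chi-square, p-value) for the TDT with 1 degree of freedom."""
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no informative transmissions from heterozygous parents")
    chi2 = (transmitted - untransmitted) ** 2 / informative
    # Survival function of a chi-square with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # HYPOTHETICAL transmission counts, stratified by genotype at a second locus.
    strata = {"second-locus carrier": (60, 40), "second-locus non-carrier": (55, 45)}
    for label, (b, c) in strata.items():
        chi2, p_value = tdt(b, c)
        print(f"{label}: chi2={chi2:.2f}, p={p_value:.4f}")
```

Comparing the transmission distortion across strata gives a rough indication of whether the effect at one locus depends on genotype at another. A formal test of interaction needs a model such as the log-linear analysis mentioned in the Methods.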
Genome-wide association studies (GWAS) test hundreds of thousands of single-nucleotide polymorphisms (SNPs) for association with a trait, treating each marker equally and ignoring prior evidence of association with specific regions. Typically, promising regions are selected for further investigation based on p-values obtained from simple tests of association. However, loci that have only a weak, low-penetrance effect on the trait, and therefore produce only modest evidence of association, are not detectable in the context of a GWAS. Incorporating prior knowledge of association in GWAS could increase power, help distinguish between false and true positives, and identify better sets of SNPs for follow-up studies.
Here we performed a GWAS on rheumatoid arthritis (RA) patients and controls (Problem 1, Genetic Analysis Workshop 16). To include prior information in the analysis, we applied four methods that differ in how they handle markers in candidate genes in the context of GWAS. SNPs were divided into a random and a candidate subset, and we then applied empirical correction by permutation, false-discovery rate, false-positive report probability, and posterior odds of association using different prior probabilities. We repeated the same analyses on two different sets of candidate markers, each defined on the basis of previously reported association with RA using a different approach. The four methods showed similar relative behavior when applied to the two sets, with the proportion of candidate SNPs ranked among the top 2,000 varying from 0 to 100%. The use of different prior probabilities changed the stringency of the methods, but not their relative performance.
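Two of the four methods named above can be sketched briefly in Python to show where the prior probability enters. The false-positive report probability (FPRP) is computed from a significance level α, the power 1 − β, and a prior probability π as α(1 − π) / (α(1 − π) + (1 − β)π). The false-discovery rate is illustrated with the Benjamini-Hochberg adjustment, which is an assumption here because the abstract does not say which FDR procedure was used. All numbers are hypothetical.

```python
def fprp(alpha: float, power: float, prior: float) -> float:
    """False-positive report probability for a given prior probability of true association."""
    false_pos = alpha * (1.0 - prior)
    true_pos = power * prior
    return false_pos / (false_pos + true_pos)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # HYPOTHETICAL priors: a higher one for candidate SNPs, a lower one for random SNPs.
    for subset, prior in (("candidate", 0.1), ("random", 0.0001)):
        print(subset, fprp(alpha=1e-4, power=0.8, prior=prior))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

With the same α and power, a SNP in the candidate subset receives a much lower FPRP than one in the random subset, which is how a prior shifts the ranking of markers.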
Complex diseases are generally thought to be under the influence of multiple, and possibly interacting, genes. Many association methods have been developed to identify susceptibility genes assuming a single-gene disease model; these are referred to as single-locus methods. Multilocus methods consider the joint effects of multiple genes and environmental factors. One commonly used method for family-based association analysis is implemented in FBAT. The multifactor-dimensionality reduction method (MDR) is a multilocus method that identifies multiple genetic loci associated with the occurrence of complex disease. Many studies of late-onset complex diseases employ a discordant sib-pair design. We compared FBAT and MDR in their ability to detect susceptibility loci using a discordant sib-pair dataset generated from the simulated data made available to participants in the Genetic Analysis Workshop 14. Using FBAT, we were able to identify the effect of one susceptibility locus. However, the finding was not statistically significant. We were not able to detect any of the interactions using this method. This is probably because the FBAT test is designed to find loci with major effects, not interactions. Using MDR, the best result we obtained identified two interactions. However, neither of these reached statistical significance. This is mainly due to the heterogeneity of the disease trait and noise in the data.
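The core step of MDR can be sketched in a few lines of Python. Each multilocus genotype combination is labelled high-risk when its case-to-control ratio exceeds a threshold, which collapses the combinations into a single two-level variable whose classification accuracy is then scored. The sketch is simplified and omits the cross-validation and model search used by MDR software. The data and the function name are hypothetical.

```python
from collections import Counter

Genotype = tuple[int, ...]


def mdr_accuracy(
    genotypes: list[Genotype], status: list[int], threshold: float = 1.0
) -> tuple[set[Genotype], float]:
    """Return the high-risk genotype combinations and the training accuracy.

    genotypes: one multilocus genotype per subject, e.g. (0, 2) for two loci.
    status: 1 for affected, 0 for unaffected (e.g. members of discordant sib pairs).
    """
    cases = Counter(g for g, s in zip(genotypes, status) if s == 1)
    controls = Counter(g for g, s in zip(genotypes, status) if s == 0)
    # A cell is high-risk when cases exceed threshold * controls (avoids division by zero).
    high_risk = {g for g in set(cases) | set(controls) if cases[g] > threshold * controls[g]}
    correct = sum((g in high_risk) == (s == 1) for g, s in zip(genotypes, status))
    return high_risk, correct / len(genotypes)


if __name__ == "__main__":
    # HYPOTHETICAL two-locus genotypes for six subjects.
    genotypes = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 1), (1, 1)]
    status = [1, 1, 0, 0, 1, 0]
    print(mdr_accuracy(genotypes, status))
```

A threshold of 1.0 suits balanced data such as discordant sib pairs, where cases and controls are equal in number.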
Genome-wide association studies have been instrumental in identifying genetic variants associated with complex traits such as human disease or gene expression phenotypes. It has been proposed that extending existing analysis methods by considering interactions between pairs of loci may uncover additional genetic effects. However, the large number of possible two-marker tests presents significant computational and statistical challenges. Although several strategies to detect epistatic effects have been proposed and tested for specific phenotypes, so far there has been no systematic attempt to compare their performance using real data. We used thousands of gene expression traits from linkage and eQTL studies to compare the performance of different strategies. In yeast, we found that using information from marginal associations between markers and phenotypes to detect epistatic effects yielded a lower false discovery rate (FDR) than a strategy using biological annotation alone, whereas results from human data were inconclusive. For future studies whose aim is to discover epistatic effects, we recommend incorporating information about marginal associations between SNPs and phenotypes instead of relying solely on biological annotation. Improved methods to discover epistatic effects will result in a more complete understanding of complex genetic effects.
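The scale of the two-marker testing problem, and the effect of filtering on marginal associations, can be illustrated with a short Python sketch. For n markers there are n(n − 1)/2 possible pairs. The filter below keeps a pair only when both markers pass a marginal p-value threshold. This is one possible reading of "using information from marginal associations"; the abstract does not specify the exact rule. Marker names and p-values are hypothetical.

```python
from itertools import combinations
from math import comb


def candidate_pairs(marginal_p: dict[str, float], threshold: float) -> list[tuple[str, str]]:
    """Return marker pairs in which BOTH markers have a marginal p-value below the threshold."""
    kept = sorted(marker for marker, p in marginal_p.items() if p < threshold)
    return list(combinations(kept, 2))


if __name__ == "__main__":
    # Number of exhaustive pairwise tests for a typical SNP array size.
    print(comb(500_000, 2))

    # HYPOTHETICAL marginal p-values for four markers.
    marginal_p = {"m1": 0.001, "m2": 0.2, "m3": 0.004, "m4": 0.03}
    pairs = candidate_pairs(marginal_p, threshold=0.05)
    print(pairs)
    # Bonferroni-style per-test thresholds before and after filtering.
    print(0.05 / comb(len(marginal_p), 2), 0.05 / len(pairs))
```

Reducing the number of tests relaxes the multiple-testing correction, at the cost of missing interactions between markers that show no marginal effect.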