Over the last decade, the genome-wide study of both heritable and somatic human variability has gone from a theoretical concept to a broadly implemented, practical reality, covering the entire spectrum of human disease. Although several findings have emerged from these studies1, the results of genome-wide association studies (GWAS) have been mostly sobering. For instance, although several genes showing medium-to-high penetrance within heritable traits were identified by these approaches, the majority of heritable genetic risk factors for most common diseases remain elusive2–7. Additionally, due to impractical requirements for cohort size8 and lack of methodologies to maximize power for such detections, few epistatic interactions and low-penetrance variants have been identified9. At the opposite end of the germline versus somatic event spectrum, considering that tumor cells abide by the same evolutionary fitness principles but on accelerated timescales due to mutator phenotypes, extensive somatic genomic rearrangements in solid tumors10 yield so many alterations that distinguishing ‘drivers’ from ‘passengers’ has been challenging.
This raises the question of whether GWAS data sets could yield additional insight when combined with other data modalities. Indeed, a number of previous studies have integrated significant genotype-phenotype associations with databases of gene annotations, such as the Gene Ontology (GO)11, MSigDB12 or the Kyoto Encyclopedia of Genes and Genomes (KEGG)13. The goal of these studies is to recognize higher-order structure within the data through the aggregation of loci in genes with similar functions or that are in the same pathway.
The context-specific networks of molecular interactions that determine cell behavior provide a particularly relevant framework for the integration of data from multiple ‘omics’. The rationale is straightforward: within the space of all possible genetic and epigenetic variants, those contributing to a specific trait or disease likely have some coalescent properties, allowing their effect to be functionally canalized via the cell communication and cell regulatory machinery that allows distinct cells to interact and regulates their behavior. Notably, contrary to random networks, whose output is essentially unconstrained, regulatory networks produced by adaptation to specific fitness landscapes are optimized to produce only a finite number of well-defined outcomes as a function of a very large number of exogenous and endogenous signals. Thus, if a comprehensive and accurate map of all intra- and intercellular molecular interactions were available, then genetic and epigenetic events implicated in a specific trait or disease should cluster in subnetworks of closely interacting genes.
Thus, if regulatory networks controlling cell pathophysiology were known a priori, one could systematically reduce the number of statistical association tests between genomic variants and the trait or disease of interest by considering only events that cluster within regulatory networks, as topologically related events would be more likely to produce related phenotypic effects. Such a pathway-wide association study (PWAS) strategy14 may improve our ability to distinguish signals from background noise by mitigating the need to account for a large number of multiple-hypothesis testing. In general, however, the molecular pathways governing physiological and disease-related traits are poorly characterized. Indeed, the classical notion of a relatively linear and interpretable set of regulatory pathways should be revisited in light of the dynamic, multiscale, context-specific nature of gene regulatory networks. We thus favor an alternative approach requiring the simultaneous reconstruction of context-specific gene regulatory networks15 as well as of the genetic and epigenetic variability they harbor. We call this second strategy integrative network-based association studies (INAS) and suggest that INAS will become increasingly valuable as the context-specific logic of gene regulatory networks is further elucidated.
In this Perspective, we explore current advances in PWAS and INAS research, inspired by a regulatory network–oriented view of traits and disease, and examine future directions that are being pursued within the emerging community of systems geneticists. We explore how networks (and pathway motifs within them) can be reconstructed and validated and how they may provide a valuable integrative framework within which to interpret GWAS results as well as other data on genetic and epigenetic variation.