The current gold standard for diagnosis of hepatic fibrosis and cirrhosis is the traditional invasive liver biopsy. It is desirable to assess hepatic fibrosis with noninvasive means. Targeted proteomic techniques allow an unbiased assessment of proteins and might be useful to identify proteins related to hepatic fibrosis. We utilized Selected Reaction Monitoring (SRM) targeted proteomics combined with an organ-specific blood protein strategy to identify and quantify 38 liver-specific proteins. A combination of protein C and retinol binding protein 4 in serum gave promising preliminary results as candidate biomarkers to distinguish patients at different stages of hepatic fibrosis due to chronic infection with hepatitis C virus (HCV). Also, alpha-1-B glycoprotein, complement factor H and insulin-like growth factor binding protein acid labile subunit performed well in distinguishing patients from healthy controls.
hepatitis C; fibrosis; liver-specific blood biomarkers; quantitation; selected reaction monitoring
Biomolecular pathways and networks are dynamic and complex, and the perturbations to them which cause disease are often multiple, heterogeneous and contingent. Pathway and network visualizations, rendered on a computer or published on paper, however, tend to be static, lacking in detail, and ill-equipped to explore the variety and quantities of data available today, and the complex causes we seek to understand.
RCytoscape integrates R (an open-ended programming environment rich in statistical power and data-handling facilities) and Cytoscape (powerful network visualization and analysis software). RCytoscape extends Cytoscape's functionality beyond what is possible with the Cytoscape graphical user interface. To illustrate the power of RCytoscape, a portion of the Glioblastoma multiforme (GBM) data set from the Cancer Genome Atlas (TCGA) is examined. Network visualization reveals previously unreported patterns in the data suggesting heterogeneous signaling mechanisms active in GBM Proneural tumors, with possible clinical relevance.
Progress in bioinformatics and computational biology depends upon exploratory and confirmatory data analysis, upon inference, and upon modeling. These activities will eventually permit the prediction and control of complex biological systems. Network visualizations -- molecular maps -- created from an open-ended programming environment rich in statistical power and data-handling facilities, such as RCytoscape, will play an essential role in this progression.
Biological networks; Visualization; Exploratory data analysis; Statistical programming; Bioinformatics
Blood carries a wide array of biomolecules, including nutrients, hormones, and molecules that are secreted by cells for specific biological functions. The recent finding of stable RNA of both endogenous and exogenous origin in circulation raises a number of questions and opens a broad, new field: exploring the origins, functions, and applications of these extracellular RNA molecules. These findings raise many important questions, including: what are the mechanisms of export and cellular uptake, what is the nature and source of their stability, what molecules do they interact with in the blood, and what are the possible biological functions of the circulating RNA? This review summarizes some key recent developments in circulating RNA research and discusses some of the open questions in the field.
microRNA; exosomes; microvesicles; cell–cell communication; exogenous RNA
Patients with Type 1 Diabetes (T1D) are particularly vulnerable to the development of diabetic nephropathy (DN) leading to end-stage renal disease. Hence a better understanding of the factors affecting kidney disease progression in T1D is urgently needed. In recent years microRNAs have emerged as important post-transcriptional regulators of gene expression in many different health conditions. We hypothesized that the urinary microRNA profile of patients would differ across the stages of diabetic renal disease.
Methods and Findings
We studied urine microRNA profiles with qPCR in 40 T1D patients with >20 years of follow-up: 10 patients who never developed renal disease (N) matched against 10 patients who went on to develop overt nephropathy (DN), and 10 patients with intermittent microalbuminuria (IMA) matched against 10 patients with persistent microalbuminuria (PMA). A Bayesian procedure was used to normalize and convert raw signals to expression ratios. We applied formal statistical techniques to translate fold changes to profiles of microRNA targets, which were then used to make inferences about biological pathways in the Gene Ontology and REACTOME structured vocabularies. A total of 27 microRNAs were found to be present at significantly different levels in different stages of untreated nephropathy. These microRNAs mapped to overlapping pathways pertaining to growth factor signaling and renal fibrosis known to be targeted in diabetic kidney disease.
Urinary microRNA profiles differ across the different stages of diabetic nephropathy. Previous work using experimental, clinical chemistry or biopsy samples has demonstrated differential expression of many of these microRNAs in a variety of chronic renal conditions and diabetes. Combining expression ratios of microRNAs with formal inferences about their predicted mRNA targets and associated biological pathways may yield useful markers for early diagnosis and risk stratification of DN in T1D by inferring the alteration of renal molecular processes.
Human plasma has long been a rich source for biomarker discovery. It has recently become clear that plasma RNA molecules, such as microRNA, in addition to proteins are common and can serve as biomarkers. Surveying human plasma for microRNA biomarkers using next generation sequencing technology, we observed that a significant fraction of the circulating RNA appears to originate from exogenous species. With careful analysis of sequence error statistics and other controls, we demonstrated that there is a wide range of RNA from many different organisms, including bacteria and fungi as well as from other species. These RNAs may be associated with protein, lipid or other molecules protecting them from RNase activity in plasma. Some of these RNAs are detected in intracellular complexes and may be able to influence cellular activities under in vitro conditions. These findings raise the possibility that plasma RNAs of exogenous origin may serve as signaling molecules mediating, for example, the human-microbiome interaction and may affect and/or indicate the state of human health.
We describe some new conceptual tools for the rigorous, mathematical description of the “set-complexity” of graphs. This set-complexity has been shown previously to be a useful measure for analyzing some biological networks, and in discussing biological information in a quantitative fashion. The advances described here allow us to define some significant relationships between the set-complexity measure and the structure of graphs, and of their component sub-graphs. We show here that modular graph structures tend to maximize the set-complexity of graphs. We point out the relationship between modularity and redundancy, and discuss the significance of set-complexity in this regard. We specifically discuss the relationship between complexity and entropy in the case of complete-bipartite graphs, and present a new method for constructing highly complex, binary graphs. These results can be extended to the case of ternary graphs, and to other multi-edge graphs, which are fundamentally more relevant to biological structures and systems. Finally, our results lead us to an approach for extracting high complexity modular graphs from large, noisy graphs with low information content. We illustrate this approach with two examples.
Set-complexity; Biological networks; Modularity; Modular graphs; Bipartite graphs; Multi-partite graphs
MicroRNAs (miRNAs) are small, non-coding RNAs that regulate various biological processes, primarily through interaction with messenger RNAs. The levels of specific, circulating miRNAs in blood have been shown to associate with various pathological conditions including cancers. These miRNAs have great potential as biomarkers for various pathophysiological conditions. In this study we focused on different sample types’ effects on the spectrum of circulating miRNA in blood. Using serum and corresponding plasma samples from the same individuals, we observed higher miRNA concentrations in serum samples compared to the corresponding plasma samples. The difference between serum and plasma miRNA concentration showed some associations with miRNA from platelets, which may indicate that the coagulation process may affect the spectrum of extracellular miRNA in blood. Several miRNAs also showed platform dependent variations in measurements. Our results suggest that there are a number of factors that might affect the measurement of circulating miRNA concentration. Caution must be taken when comparing miRNA data generated from different sample types or measurement platforms.
RNA-seq, a transcriptomic profiling method based on next-generation sequencing (NGS) technologies, has been widely used to study global gene expression, alternative exon usage, new exon discovery, novel transcriptional isoforms and genomic sequence variations. However, this technique also poses many biological and informatics challenges for extracting meaningful biological information. RNA-seq data analysis is built on the foundation of high-quality initial genome localization and alignment information for RNA-seq sequences. Toward this goal, we have developed RNASEQR to accurately and effectively map millions of RNA-seq sequences. We have systematically compared RNASEQR with four of the most widely used tools using a simulated data set created from the Consensus CDS project and two experimental RNA-seq data sets generated from a human glioblastoma patient. Our results showed that RNASEQR yields more accurate estimates for gene expression, complete gene structures and new transcript isoforms, as well as more accurate detection of single nucleotide variants (SNVs). RNASEQR analyzes raw data from RNA-seq experiments effectively and outputs results in a manner that is compatible with a wide variety of specialized downstream analyses on desktop computers.
MicroRNAs (miRNAs) have been linked with various regulatory functions and disorders, such as cancers and heart diseases. They therefore present an important target for detection technologies for future medical diagnostics. We report here a novel method for rapid and sensitive miRNA detection and quantitation using surface plasmon resonance (SPR) sensor technology and a DNA*RNA antibody-based assay. The approach takes advantage of a novel high-performance portable SPR sensor instrument for spectroscopy of surface plasmons based on a special diffraction grating called a surface plasmon coupler and disperser (SPRCD). The surface of the grating is functionalized with thiolated DNA oligonucleotides which specifically capture miRNA from a liquid sample without amplification. Subsequently, an antibody that recognizes DNA*RNA hybrids is introduced to bind to the DNA*RNA complex and enhance sensor response to the captured miRNA. This approach allows detecting miRNA in less than 30 minutes at concentrations down to 2 pM with an absolute amount at high attomoles. The methodology is evaluated for analysis of miRNA from mouse liver tissues and is found to yield results which agree well with those provided by the quantitative polymerase chain reaction (qPCR).
microRNA; surface plasmon resonance; biosensor; liver toxicity; cancer diagnostics
MicroRNAs (miRNAs) are a recently discovered class of small, non-coding RNAs that regulate protein levels post-transcriptionally. miRNAs play important regulatory roles in many cellular processes, including differentiation, neoplastic transformation, and cell replication and regeneration. Because of these regulatory roles, it is not surprising that aberrant miRNA expression has been implicated in several diseases. Recent studies have reported significant levels of miRNAs in serum and other body fluids, raising the possibility that circulating miRNAs could serve as useful clinical biomarkers. Here, we provide a brief overview of miRNA biogenesis and function, the identification and potential roles of circulating extracellular miRNAs, and the prospective uses of miRNAs as clinical biomarkers. Finally, we address several issues associated with the accurate measurement of miRNAs from biological samples.
An endogenous molecular-cellular network for both normal and abnormal functions is assumed to exist. This endogenous network forms a nonlinear stochastic dynamical system, with many stable attractors in its functional landscape. Normal or abnormal robust states can be decided by this network in a manner similar to the neural network. In this context cancer is hypothesized as one of its robust intrinsic states.
This hypothesis implies that a nonlinear stochastic mathematical cancer model is constructible based on available experimental data and that its quantitative predictions are directly testable. Within such a model the genesis and progression of cancer may be viewed as stochastic transitions between different attractors. Thus it further suggests that progressions are not arbitrary. Other important issues in cancer, such as genetics vs. epigenetics, the double-edge effect, and dormancy, are discussed in the light of the present hypothesis. A different set of strategies for cancer prevention, cure, and care is therefore suggested.
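The idea of stochastic transitions between attractors can be illustrated with a deliberately simple toy system. The sketch below is not the endogenous network model discussed above; it assumes a hypothetical one-dimensional double-well potential U(x) = x^4/4 - x^2/2, whose two minima near x = -1 and x = +1 stand in for two robust states, and integrates it with the Euler-Maruyama scheme. The function names, noise level and threshold are illustrative assumptions.

```python
import math
import random


def drift(x):
    """Deterministic force -dU/dx for the assumed potential U(x) = x**4/4 - x**2/2."""
    return x - x ** 3


def simulate(x0, noise, steps=20000, dt=0.01, seed=0):
    """Euler-Maruyama integration of dx = drift(x) dt + noise dW."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(steps):
        x += drift(x) * dt + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def count_transitions(path, threshold=0.5):
    """Count switches between the basin near -1 and the basin near +1.

    Points with |x| <= threshold are treated as 'in transit' and ignored,
    so small fluctuations around the barrier are not counted as switches.
    """
    state = None
    switches = 0
    for x in path:
        if x > threshold:
            new_state = 1
        elif x < -threshold:
            new_state = -1
        else:
            continue
        if state is not None and new_state != state:
            switches += 1
        state = new_state
    return switches


# Without noise, each trajectory settles into the attractor of its own basin.
settled_high = simulate(0.2, noise=0.0)[-1]
settled_low = simulate(-0.2, noise=0.0)[-1]

# With noise, the same system can hop between the two attractors.
noisy_switches = count_transitions(simulate(1.0, noise=0.7))
```

In this toy setting the noiseless runs stay in their starting basin, whereas the noisy run may cross the barrier; the number of crossings depends on the assumed noise level and random seed.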
The ability to construct biologically meaningful gene networks and modules is critical for contemporary systems biology. Though recent studies have demonstrated the power of using gene modules to shed light on the functioning of complex biological systems, most modules in these networks have shown little association with meaningful biological function. We have devised a method which directly incorporates gene ontology (GO) annotation in construction of gene modules in order to gain better functional association.
We have devised a method, Semantic Similarity-Integrated approach for Modularization (SSIM), that integrates various gene-gene pairwise similarity values, including information obtained from gene expression, protein-protein interactions and GO annotations, in the construction of modules using affinity propagation clustering. We demonstrated the performance of the proposed method using data from two complex biological responses: 1. the osmotic shock response in Saccharomyces cerevisiae, and 2. the prion-induced pathogenic mouse model. In comparison with two previously reported algorithms, modules identified by SSIM showed significantly stronger association with biological functions.
The incorporation of semantic similarity based on GO annotation with gene expression and protein-protein interaction data can greatly enhance the functional relevance of inferred gene modules. In addition, the SSIM approach can also reveal the hierarchical structure of gene modules to gain a broader functional view of the biological system. Hence, the proposed method can facilitate comprehensive and in-depth analysis of high throughput experimental data at the gene network level.
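The integration step can be sketched as follows. The text does not state how SSIM combines the pairwise similarity values, so this example assumes, purely for illustration, that each source matrix is min-max rescaled and the results are combined by a weighted average. The function names, weights and the three-gene matrices are hypothetical.

```python
def rescale(matrix):
    """Min-max rescale off-diagonal entries to [0, 1]; the diagonal is set to 0."""
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant matrices
    return [
        [(matrix[i][j] - lo) / span if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]


def integrate(matrices, weights):
    """Weighted average of rescaled similarity matrices (an assumed combination rule)."""
    total = sum(weights)
    scaled = [rescale(m) for m in matrices]
    n = len(matrices[0])
    return [
        [sum(w * s[i][j] for w, s in zip(weights, scaled)) / total for j in range(n)]
        for i in range(n)
    ]


# Hypothetical similarities for three genes from three data sources.
expression = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]  # co-expression
ppi = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]                            # interaction present/absent
go = [[1.0, 0.8, 0.3], [0.8, 1.0, 0.3], [0.3, 0.3, 1.0]]           # GO semantic similarity

combined = integrate([expression, ppi, go], weights=[1.0, 1.0, 1.0])
```

Here genes 0 and 1 are similar in all three sources and receive the highest combined score. A matrix of this kind could then be passed to an affinity propagation implementation that accepts precomputed similarities, for example scikit-learn's `AffinityPropagation(affinity="precomputed")`.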
Chronic lung diseases are the third leading cause of death in the United States due in part to an incomplete understanding of pathways that govern the progressive tissue remodeling that occurs in these disorders. Adenosine is elevated in the lungs of animal models and humans with chronic lung disease where it promotes air-space destruction and fibrosis. Adenosine signaling increases the production of the pro-fibrotic cytokine interleukin-6 (IL-6). Based on these observations, we hypothesized that IL-6 signaling contributes to tissue destruction and remodeling in a model of chronic lung disease where adenosine levels are elevated.
We tested this hypothesis by neutralizing or genetically removing IL-6 in adenosine deaminase (ADA)-deficient mice that develop adenosine dependent pulmonary inflammation and remodeling. Results demonstrated that both pharmacologic blockade and genetic removal of IL-6 attenuated pulmonary inflammation, remodeling and fibrosis in this model. The pursuit of mechanisms involved revealed adenosine and IL-6 dependent activation of STAT-3 in airway epithelial cells.
These findings demonstrate that adenosine enhances IL-6 signaling pathways to promote aspects of chronic lung disease. This suggests that blocking IL-6 signaling during chronic stages of disease may provide benefit in halting remodeling processes such as fibrosis and air-space destruction.
We propose an innovative, integrated, cost-effective health system to combat major non-communicable diseases (NCDs), including cardiovascular, chronic respiratory, metabolic, rheumatologic and neurologic disorders and cancers, which together are the predominant health problem of the 21st century. This proposed holistic strategy involves comprehensive patient-centered integrated care and multi-scale, multi-modal and multi-level systems approaches to tackle NCDs as a common group of diseases. Rather than studying each disease individually, it will take into account their intertwined gene-environment, socio-economic interactions and co-morbidities that lead to individual-specific complex phenotypes. It will implement a road map for predictive, preventive, personalized and participatory (P4) medicine based on a robust and extensive knowledge management infrastructure that contains individual patient information. It will be supported by strategic partnerships involving all stakeholders, including general practitioners associated with patient-centered care. This systems medicine strategy, which will take a holistic approach to disease, is designed to allow the results to be used globally, taking into account the needs and specificities of local economies and health systems.
The antineoplastic drug bleomycin leads to the side effect of pulmonary fibrosis in both humans and mice. We challenged genetically diverse inbred lines of mice from the Collaborative Cross with bleomycin to determine the heritability of this phenotype. Sibling pairs of mice from 40 lines were treated with bleomycin. Lung disease was assessed by scoring lung pathology and by measuring soluble collagen levels in lavage fluid. Serum micro ribonucleic acids (miRNAs) were also measured. Inbred sibling pairs of animals demonstrated high coinheritance of the phenotypes of disease susceptibility or disease resistance. The plasma levels of one miRNA were clearly correlated in sibling mice. The results showed that, as in humans, the lines that comprise the Collaborative Cross exhibited wide genetic variation in response to this drug. This finding suggests that the genetically diverse Collaborative Cross animals may reveal drug effects that might be missed if a study were based on a conventional mouse strain.
collaborative cross; drug side effects; genetic diversity; disease susceptibility; disease resistance; bleomycin; lung disease
We analyzed the whole genome sequences of a family of four, consisting of two siblings and their parents. Family-based sequencing allowed us to delineate recombination sites precisely, identify 70% of the sequencing errors, and identify very rare SNVs. We also directly estimated a human intergeneration mutation rate of ∼1.1×10⁻⁸ per position per haploid genome. Both offspring in this family have two recessive disorders--Miller syndrome, for which the gene was concurrently identified, and primary ciliary dyskinesia, for which causative genes have been previously identified. Family-based genome analysis enabled us to narrow the candidate genes for both of these Mendelian disorders to only four. Our results demonstrate the unique value of complete genome sequencing in families.
whole genome sequencing; rare genetic disease; inheritance analysis; recessive models; de novo mutations; recombination hotspot; crossover; haploidentity; haploidentical block; inheritance state; inheritance vector; HMM; haplotype; Miller syndrome; POADS; DHODH; DNAH5; KIAA0556; CES1
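To give a sense of scale for the estimated mutation rate, the short calculation below converts a per-position, per-haploid-genome rate into an expected number of new mutations per child. The haploid genome size used here (3.1×10⁹ positions) is an assumed round figure and is not taken from the text, so the result is only an order-of-magnitude illustration.

```python
# Rate quoted in the text: per position, per haploid genome, per generation.
MUTATION_RATE = 1.1e-8

# Assumption for illustration only: approximate number of positions in a
# haploid human genome. This value does not come from the text.
HAPLOID_GENOME_SIZE = 3.1e9


def expected_new_mutations(rate, haploid_size, copies=2):
    """Expected de novo mutations in one child carrying `copies` haploid genomes."""
    return rate * haploid_size * copies


per_child = expected_new_mutations(MUTATION_RATE, HAPLOID_GENOME_SIZE)
print(f"Expected new mutations per child: about {per_child:.0f}")
```

Under these assumptions the expectation is a few tens of new mutations per diploid genome per generation; a different assumed genome size scales the figure proportionally.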
The molecular pathways involved in the interstitial lung diseases (ILDs) are poorly understood. Systems biology approaches, with global expression data sets, were used to identify perturbed gene networks, to gain some understanding of the underlying mechanisms, and to develop specific hypotheses relevant to these chronic lung diseases.
Lung tissue samples from patients with different types of ILD were obtained from the Lung Tissue Research Consortium and total cell RNA was isolated. Global mRNA and microRNA were profiled by hybridization and amplification-based methods. Differentially expressed genes were compiled and used to identify critical signaling pathways and potential biomarkers. Modules of genes were identified that formed a regulatory network, and studies were performed on cultured cells in vitro for comparison with the in vivo results.
By profiling mRNA and microRNA (miRNA) expression levels, we found subsets of differentially expressed genes that distinguished patients with ILDs from controls and that correlated with different disease stages and subtypes of ILDs. Network analysis, based on pathway databases, revealed several disease-associated gene modules, involving genes from the TGF-β, Wnt, focal adhesion, and smooth muscle actin pathways that are implicated in advancing fibrosis, a critical pathological process in ILDs. A more comprehensive approach was also adopted to construct a putative global gene regulatory network based on the perturbation of key regulatory elements, transcription factors and microRNAs. Our data underscore the importance of TGF-β signaling and the persistence of smooth muscle actin-containing fibroblasts in these diseases. We present evidence that, downstream of TGF-β signaling, microRNAs of the miR-23a cluster and the transcription factor Zeb1 could have roles in mediating an epithelial to mesenchymal transition (EMT) and the resultant persistence of mesenchymal cells in these diseases.
We present a comprehensive overview of the molecular networks perturbed in ILDs, discuss several potential key molecular regulatory circuits, and identify microRNA species that may play central roles in facilitating the progression of ILDs. These findings advance our understanding of these diseases at the molecular level, provide new molecular signatures in defining the specific characteristics of the diseases, suggest new hypotheses, and reveal new potential targets for therapeutic intervention.
Systems biology is an approach to science that views biology as an information science and studies biological systems as a whole, together with their interactions with the environment. This approach, for the reasons described here, has particular power in the search for informative diagnostic biomarkers of diseases because it focuses on the fundamental causes and keys on the identification and understanding of disease-perturbed molecular networks. In this review, we describe some recent developments that have used systems biology to address complex diseases (prion disease and drug-induced liver injury) and use these as examples to illustrate the importance of understanding network structure and dynamics. The knowledge of network dynamics through in vitro experimental perturbation and modeling allows us to determine the state of the networks, to identify molecular correlates, and to derive new disease treatment approaches to reverse the pathology or prevent its progress into a more severe state through the manipulation of network states. This general approach, including diagnostics and therapeutics, is becoming known as systems medicine.
Systems biology; biomarkers; systems medicine; prion disease; drug induced liver injury; microRNA; organ-specific proteins
Complex, non-additive genetic interactions are common and can be critical in determining phenotypes. Genome-wide association studies (GWAS) and similar statistical studies of linkage data, however, assume additive models of gene interactions in looking for genotype-phenotype associations. These statistical methods view the compound effects of multiple genes on a phenotype as a sum of influences of each gene and often miss a substantial part of the heritable effect. Such methods do not use any biological knowledge about underlying mechanisms. Modeling approaches from the artificial intelligence (AI) field that incorporate deterministic knowledge into models to perform statistical analysis can be applied to include prior knowledge in genetic analysis. We chose to use the most general such approach, Markov Logic Networks (MLNs), for combining deterministic knowledge with statistical analysis. Using simple, logistic regression-type MLNs we can replicate the results of traditional statistical methods, but we also show that we are able to go beyond finding independent markers linked to a phenotype by using joint inference without an independence assumption. The method is applied to genetic data on yeast sporulation, a complex phenotype with gene interactions. In addition to detecting all of the previously identified loci associated with sporulation, our method identifies four loci with smaller effects. Since their effect on sporulation is small, these four loci were not detected with methods that do not account for dependence between markers due to gene interactions. We show how gene interactions can be detected using more complex models, which can be used as a general framework for incorporating systems biology with genetics.
genetic interactions; genetic networks; genome-wide analysis; prior biological knowledge; probabilistic; logic-based modeling
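The limitation of purely additive models can be seen in a small constructed example. The sketch below is not a Markov Logic Network and does not use the yeast sporulation data; it assumes a hypothetical two-locus haploid data set in which the phenotype is present only when exactly one locus carries allele 1. Each locus then shows no marginal effect, while the joint (interaction) contrast is large.

```python
from itertools import product


def mean(values):
    return sum(values) / len(values)


def marginal_effect(samples, locus):
    """Difference in mean phenotype between allele 1 and allele 0 at one locus."""
    with_allele = [p for g, p in samples if g[locus] == 1]
    without_allele = [p for g, p in samples if g[locus] == 0]
    return mean(with_allele) - mean(without_allele)


def interaction_effect(samples):
    """Two-locus interaction contrast: (P11 - P10) - (P01 - P00)."""
    cell = {
        g: mean([p for gg, p in samples if gg == g])
        for g in product((0, 1), repeat=2)
    }
    return (cell[(1, 1)] - cell[(1, 0)]) - (cell[(0, 1)] - cell[(0, 0)])


# Hypothetical data: five individuals per genotype; the phenotype is 1.0
# only when exactly one of the two loci carries allele 1.
samples = [
    ((a, b), float(a != b))
    for a, b in product((0, 1), repeat=2)
    for _ in range(5)
]

effect_locus_0 = marginal_effect(samples, 0)  # single-marker view of locus 0
effect_locus_1 = marginal_effect(samples, 1)  # single-marker view of locus 1
joint_effect = interaction_effect(samples)    # joint view of both loci
```

A single-marker scan of this data set would find nothing, because both marginal effects are zero, yet the two loci jointly determine the phenotype. Real interactions are rarely this extreme, but the example shows why joint inference without an independence assumption can recover loci that marker-by-marker tests miss.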
The main conclusion is that systems biology approaches can indeed advance cancer research, having already proved successful in a very wide variety of cancer-related areas, and are likely to prove superior to many current research strategies. Major points include:
Systems biology and computational approaches can make important contributions to research and development in key clinical aspects of cancer and of cancer treatment, and should be developed for understanding and application to diagnosis, biomarkers, cancer progression, drug development and treatment strategies.
Development of new measurement technologies is central to successful systems approaches, and should be strongly encouraged. The systems view of disease combined with these new technologies and novel computational tools will over the next 5–20 years lead to medicine that is predictive, personalized, preventive and participatory (P4 medicine).
Major initiatives are in progress to gather extremely wide ranges of data for both somatic and germ-line genetic variations, as well as gene, transcript, protein and metabolite expression profiles that are cancer-relevant. Electronic databases and repositories play a central role to store and analyze these data. These resources need to be developed and sustained.
Understanding cellular pathways is crucial in cancer research, and these pathways need to be considered in the context of the progression of cancer at various stages. At all stages of cancer progression, major areas require modelling via systems and developmental biology methods including immune system reactions, angiogenesis and tumour progression.
A number of mathematical models of an analytical or computational nature have been developed that can give detailed insights into the dynamics of cancer-relevant systems. These models should be further integrated across multiple levels of biological organization in conjunction with analysis of laboratory and clinical data.
Biomarkers represent major tools in determining the presence of cancer, its progression and the responses to treatments. There is a need for sets of high-quality annotated clinical samples, enabling comparisons across different diseases and the quantitative simulation of major pathways leading to biomarker development and analysis of drug effects.
Education is recognized as a key component in the success of any systems biology programme, especially for applications to cancer research. It is recognized that a balance needs to be found between the need to be interdisciplinary and the necessity of having extensive specialist knowledge in particular areas.
A proposal from this workshop is to explore one or more types of cancer over the full scale of their progression, for example glioblastoma or colon cancer. Such an exemplar project would require all the experimental and computational tools available for the generation and analysis of quantitative data over the entire hierarchy of biological information. These tools and approaches could be mobilized to understand, detect and treat cancerous processes and establish methods applicable across a wide range of cancers.
Systems biology; EU-USA workshop; Cancer
The discovery of microRNAs (miRNAs) as a new class of regulators of gene expression has triggered an explosion of research activities, but has left many unanswered questions about how this regulation functions and how it is integrated with other regulatory mechanisms. A number of miRNAs have been found to be present in plasma and other body fluids of humans and mice in surprisingly high concentrations. This observation was unexpected in two respects: first, the fact that these molecules are present at all outside the cell at significant concentrations and second, that these molecules appear to be stable outside of the cell. In light of this it has been suggested that the biological function of miRNAs may also extend outside of the cell and mediate cell–cell communication. We report here that after serum deprivation several human cell lines tested promptly export a substantial amount of miRNAs into the culture medium and the export process is largely energy dependent. The exported miRNAs are found both within and outside of the 16.5 and 120 K centrifugation pellets which contain most of the known cell-derived vesicles, the microvesicles and exosomes. We have identified some candidate proteins involved in this system, and one of these proteins may also play a role in protecting extracellular miRNAs from degradation. Our results point to a hitherto unrecognized and uncharacterized miRNA trafficking system in mammalian cells that is consistent with the cell–cell communication hypothesis.
Over the last 15 years, investigators have identified small noncoding RNAs as regulators of gene expression. One type of noncoding RNA is termed microRNA (miRNA). miRNAs are evolutionarily conserved, approximately 22-nucleotide single-stranded RNAs that target genes by inducing mRNA degradation or by inhibiting translation. miRNAs are implicated in many critical cellular processes, including apoptosis, proliferation, and differentiation. Furthermore, it is estimated that miRNAs may be responsible for regulating the expression of nearly one-third of the human genome. Despite the identification of more than 500 mature miRNAs, very little is known about their biological functions and functional targets. In the last 5 years, researchers have increasingly focused on the functional relevance and role that miRNAs play in the pathogenesis of human disease. miRNAs are known to be important in solid organ and hematological malignancies, heart disease, and organ development, and as potential modulators of the immune response. It is anticipated that miRNA analysis will emerge as an important complement to proteomic and genomic studies to further our understanding of disease pathogenesis. Despite the application of genomics and proteomics to the study of human lung disease, few studies have examined miRNA expression. This perspective is not meant to be an exhaustive review of miRNA biology but will provide an overview of both miRNA biogenesis and our current understanding of the role of miRNAs in lung disease, as well as a perspective on the importance of integrating this analysis as a tool for identifying and understanding the biological pathways in lung-disease pathogenesis.
microRNA; epigenetics; genomics
Extraction of all the biological information inherent in large-scale genetic interaction datasets remains a significant challenge for systems biology. The core problem is essentially that of classification of the relationships among phenotypes of mutant strains into biologically informative “rules” of gene interaction. Geneticists have determined such classifications based on insights from biological examples, but it is not clear that there is a systematic, unsupervised way to extract this information. In this paper we describe such a method that depends on maximizing a previously described context-dependent information measure to obtain maximally informative biological networks. We have successfully validated this method on two examples from yeast by demonstrating that more biological information is obtained when analysis is guided by this information measure. The context-dependent information measure is a function only of phenotype data and a set of interaction rules, involving no prior biological knowledge. Analysis of the resulting networks reveals that the most biologically informative networks are those with the greatest context-dependent information scores. We propose that these high-complexity networks reveal genetic architecture at a modular level, in contrast to classical genetic interaction rules that order genes in pathways. We suggest that our analysis represents a powerful, data-driven, and general approach to genetic interaction analysis, with particular potential in the study of mammalian systems in which interactions are complex and gene annotation data are sparse.
Targeted genetic perturbation is a powerful tool for inferring gene function in model organisms. Functional relationships between genes can be inferred by observing the effects of multiple genetic perturbations in a single strain. The study of these relationships, generally referred to as genetic interactions, is a classic technique for ordering genes in pathways, thereby revealing genetic organization and gene-to-gene information flow. Genetic interaction screens are now being carried out in high-throughput experiments involving tens or hundreds of genes. These data sets have the potential to reveal genetic organization on a large scale, and require computational techniques that best reveal this organization. In this paper, we use a complexity metric based in information theory to determine the maximally informative network given a set of genetic interaction data. We find that networks with high complexity scores yield the most biological information in terms of (i) specific associations between genes and biological functions, and (ii) mapping modules of co-functional genes. This information-based approach is an automated, unsupervised classification of the biological rules underlying observed genetic interactions. It might have particular potential in genetic studies in which interactions are complex and prior gene annotation data are sparse.