Related Articles

1.  Detecting and Removing Inconsistencies between Experimental Data and Signaling Network Topologies Using Integer Linear Programming on Interaction Graphs 
PLoS Computational Biology  2013;9(9):e1003204.
Cross-referencing experimental data with our current knowledge of signaling network topologies is one central goal of mathematical modeling of cellular signal transduction networks. We present a new methodology for data-driven interrogation and training of signaling networks. While most published methods for signaling network inference operate on Bayesian, Boolean, or ODE models, our approach uses integer linear programming (ILP) on interaction graphs to encode constraints on the qualitative behavior of the nodes. These constraints are posed by the network topology and their formulation as ILP allows us to predict the possible qualitative changes (up, down, no effect) of the activation levels of the nodes for a given stimulus. We provide four basic operations to detect and remove inconsistencies between measurements and predicted behavior: (i) find a topology-consistent explanation for responses of signaling nodes measured in a stimulus-response experiment (if none exists, find the closest explanation); (ii) determine a minimal set of nodes that need to be corrected to make an inconsistent scenario consistent; (iii) determine the optimal subgraph of the given network topology which can best reflect measurements from a set of experimental scenarios; (iv) find possibly missing edges that would improve the consistency of the graph with respect to a set of experimental scenarios the most. We demonstrate the applicability of the proposed approach by interrogating a manually curated interaction graph model of EGFR/ErbB signaling against a library of high-throughput phosphoproteomic data measured in primary hepatocytes. Our methods detect interactions that are likely to be inactive in hepatocytes and provide suggestions for new interactions that, if included, would significantly improve the goodness of fit. Our framework is highly flexible and the underlying model requires only easily accessible biological knowledge. 
All related algorithms are implemented in the freely available toolbox SigNetTrainer, making this an appealing approach for various applications.
Author Summary
Cellular signal transduction is orchestrated by communication networks of signaling proteins commonly depicted on signaling pathway maps. However, each cell type may have distinct variants of signaling pathways, and wiring diagrams are often altered in disease states. The identification of truly active signaling topologies based on experimental data is therefore one key challenge in systems biology of cellular signaling. We present a new framework for training signaling networks based on interaction graphs (IG). In contrast to complex modeling formalisms, IG capture merely the known positive and negative edges between the components. This basic information, however, already sets hard constraints on the possible qualitative behaviors of the nodes when perturbing the network. Our approach uses Integer Linear Programming to encode these constraints and to predict the possible changes (down, neutral, up) of the activation levels of the involved players for a given experiment. Based on this formulation we developed several algorithms for detecting and removing inconsistencies between measurements and network topology. Demonstrated by EGFR/ErbB signaling in hepatocytes, our approach delivers direct conclusions on edges that are likely inactive or missing relative to canonical pathway maps. Such information drives the further elucidation of signaling network topologies under normal and pathological phenotypes.
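The idea that signed edges alone constrain the possible qualitative responses can be illustrated with a minimal sketch. It does not use ILP and is not the SigNetTrainer implementation: it enumerates all states by brute force, which only works for tiny graphs. The toy graph, the node names, and the simplified consistency rule (a node may change only in a direction supported by at least one incoming influence, and may stay unchanged only if it receives no influence or influences of both signs) are assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy interaction graph: (source, target, sign),
# where sign +1 is an activating edge and -1 an inhibiting edge.
EDGES = [("EGF", "EGFR", +1), ("PTP", "EGFR", -1), ("EGFR", "ERK", +1)]
NODES = ["EGF", "PTP", "EGFR", "ERK"]


def is_consistent(state, edges, inputs):
    """Check a qualitative state (node -> -1, 0 or +1) against the graph.

    Assumed simplified rule: each non-input node collects the influences
    sign * state[source] from its incoming edges. A changed node needs at
    least one influence of the same sign; an unchanged node needs either
    no nonzero influence or influences of both signs.
    """
    for node, value in state.items():
        if node in inputs:
            continue
        influences = {sign * state[src] for src, dst, sign in edges if dst == node}
        influences.discard(0)
        if value != 0 and value not in influences:
            return False
        if value == 0 and len(influences) == 1:
            return False
    return True


def consistent_states(nodes, edges, fixed):
    """Enumerate all states that agree with the fixed (stimulus) values."""
    free = [n for n in nodes if n not in fixed]
    results = []
    for values in product((-1, 0, 1), repeat=len(free)):
        state = {**fixed, **dict(zip(free, values))}
        if is_consistent(state, edges, inputs=set(fixed)):
            results.append(state)
    return results


# Stimulating EGF alone leaves one explanation: EGFR and ERK both go up.
only_egf = consistent_states(NODES, EDGES, {"EGF": 1, "PTP": 0})
# Raising the inhibitor PTP as well makes the outcome ambiguous.
both_up = consistent_states(NODES, EDGES, {"EGF": 1, "PTP": 1})
```

In this sketch a measurement such as "ERK down after EGF stimulation" would match none of the states in `only_egf`, which is the kind of inconsistency the framework is described as detecting. An ILP formulation encodes the same kind of rule as linear constraints so that a solver can search large networks without enumeration.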
PMCID: PMC3764019  PMID: 24039561
2.  Metabolic network reconstruction of Chlamydomonas offers insight into light-driven algal metabolism 
A comprehensive genome-scale metabolic network of Chlamydomonas reinhardtii, including a detailed account of light-driven metabolism, is reconstructed and validated. The model provides a new resource for research of C. reinhardtii metabolism and in algal biotechnology.
The genome-scale metabolic network of Chlamydomonas reinhardtii (iRC1080) was reconstructed, accounting for >32% of the estimated metabolic genes encoded in the genome, and including extensive details of lipid metabolic pathways.
This is the first metabolic network to explicitly account for stoichiometry and wavelengths of metabolic photon usage, providing a new resource for research of C. reinhardtii metabolism and developments in algal biotechnology.
Metabolic functional annotation and the largest transcript verification of a metabolic network to date were performed, at least partially verifying >90% of the transcripts accounted for in iRC1080. Analysis of the network supports hypotheses concerning the evolution of latent lipid pathways in C. reinhardtii, including very long-chain polyunsaturated fatty acid and ceramide synthesis pathways.
A novel approach for modeling light-driven metabolism was developed that accounts for both light source intensity and spectral quality of emitted light. The constructs resulting from this approach, termed prism reactions, were shown to significantly improve the accuracy of model predictions, and their use was demonstrated for evaluation of light source efficiency and design.
Algae have garnered significant interest in recent years, especially for their potential application in biofuel production. The hallmark model eukaryotic microalga Chlamydomonas reinhardtii has been widely used to study photosynthesis, cell motility and phototaxis, cell wall biogenesis, and other fundamental cellular processes (Harris, 2001). Characterizing algal metabolism is key to engineering production strains and understanding photobiological phenomena. Based on extensive literature on C. reinhardtii metabolism, its genome sequence (Merchant et al, 2007), and gene functional annotation, we have reconstructed and experimentally validated the genome-scale metabolic network for this alga, iRC1080, the first network to account for detailed photon absorption permitting growth simulations under different light sources. iRC1080 accounts for 1080 genes, associated with 2190 reactions and 1068 unique metabolites and encompasses 83 subsystems distributed across 10 cellular compartments (Figure 1A). Its >32% coverage of estimated metabolic genes is a tremendous expansion over previous algal reconstructions (Boyle and Morgan, 2009; Manichaikul et al, 2009). The lipid metabolic pathways of iRC1080 are considerably expanded relative to existing networks, and chemical properties of all metabolites in these pathways are accounted for explicitly, providing sufficient detail to completely specify all individual molecular species: backbone molecule and stereochemical numbering of acyl-chain positions; acyl-chain length; and number, position, and cis–trans stereoisomerism of carbon–carbon double bonds. Such detail in lipid metabolism will be critical for model-driven metabolic engineering efforts.
We experimentally verified transcripts accounted for in the network under permissive growth conditions, detecting >90% of tested transcript models (Figure 1B) and providing validating evidence for the contents of iRC1080. We also analyzed the extent of transcript verification by specific metabolic subsystems. Some subsystems stood out as more poorly verified, including chloroplast and mitochondrial transport systems and sphingolipid metabolism, all of which exhibited <80% of transcripts detected, reflecting incomplete characterization of compartmental transporters and supporting a hypothesis of latent pathway evolution for ceramide synthesis in C. reinhardtii. Additional lines of evidence from the reconstruction effort similarly support this hypothesis including lack of ceramide synthetase and other annotation gaps downstream in sphingolipid metabolism. A similar hypothesis of latent pathway evolution was established for very long-chain fatty acids (VLCFAs) and their polyunsaturated analogs (VLCPUFAs) (Figure 1C), owing to the absence of this class of lipids in previous experimental measurements, lack of a candidate VLCFA elongase in the functional annotation, and additional downstream annotation gaps in arachidonic acid metabolism.
The network provides a detailed account of metabolic photon absorption by light-driven reactions, including photosystems I and II, light-dependent protochlorophyllide oxidoreductase, provitamin D3 photoconversion to vitamin D3, and rhodopsin photoisomerase; this network accounting permits the precise modeling of light-dependent metabolism. iRC1080 accounts for effective light spectral ranges through analysis of biochemical activity spectra (Figure 3A), either reaction activity or absorbance at varying light wavelengths. Defining effective spectral ranges associated with each photon-utilizing reaction enabled our network to model growth under different light sources via stoichiometric representation of the spectral composition of emitted light, termed prism reactions. Coefficients for different photon wavelengths in a prism reaction correspond to the ratios of photon flux in the defined effective spectral ranges to the total emitted photon flux from a given light source (Figure 3B). This approach distinguishes the amount of emitted photons that drive different metabolic reactions. We created prism reactions for most light sources that have been used in published studies for algal and plant growth including solar light, various light bulbs, and LEDs. We also included regulatory effects resulting from lighting conditions, insofar as published studies allowed. Light and dark conditions have been shown to affect metabolic enzyme activity in C. reinhardtii on multiple levels: transcriptional regulation, chloroplast RNA degradation, translational regulation, and thioredoxin-mediated enzyme regulation. Through application of our light model and prism reactions, we were able to closely recapitulate experimental growth measurements under solar, incandescent, and red LED lights. Through unbiased sampling, we established that the accuracy of growth predictions achieved with prism reactions is highly statistically significant.
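The coefficient definition above (photon flux within an effective spectral range divided by the total emitted photon flux) can be sketched as follows. This is an illustration only, not the iRC1080 implementation: the spectrum values, the range names, and the wavelength bounds are hypothetical and were chosen to make the arithmetic easy to follow.

```python
def prism_coefficients(spectrum, effective_ranges):
    """Return the fraction of total photon flux falling in each range.

    spectrum: dict mapping wavelength in nm to emitted photon flux
              (any consistent unit).
    effective_ranges: dict mapping a range name to an inclusive
                      (low_nm, high_nm) pair.
    """
    total = sum(spectrum.values())
    if total <= 0:
        raise ValueError("total emitted photon flux must be positive")
    return {
        name: sum(flux for wl, flux in spectrum.items() if low <= wl <= high) / total
        for name, (low, high) in effective_ranges.items()
    }


# Hypothetical light source sampled at three wavelengths.
toy_spectrum = {450: 20.0, 550: 30.0, 680: 50.0}
# Hypothetical effective spectral ranges for two photon-utilizing reactions.
toy_ranges = {"blue_absorbing": (400, 500), "red_absorbing": (650, 700)}

coefficients = prism_coefficients(toy_spectrum, toy_ranges)
```

With these toy numbers, 20 of 100 flux units fall in the blue range and 50 of 100 in the red range, so the coefficients are 0.2 and 0.5. Photons at 550 nm fall in neither range and therefore drive neither reaction in this sketch.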
Finally, application of the photosynthetic model was demonstrated prospectively to evaluate light utilization efficiency under different light sources. The results suggest that, of the existing light sources, red LEDs provide the greatest efficiency, about three times as efficient as sunlight. Extending this analysis, the model was applied to design a maximally efficient LED spectrum for algal growth. The result was a 677-nm peak LED spectrum with a total incident photon flux of 360 μE/m²/s, suggesting that for the simple objective of maximizing growth efficiency, LED technology has already reached an effective theoretical optimum.
In summary, the C. reinhardtii metabolic network iRC1080 that we have reconstructed offers insight into the basic biology of this species and may be employed prospectively for genetic engineering design and light source design relevant to algal biotechnology. iRC1080 was used to analyze lipid metabolism and generate novel hypotheses about the evolution of latent pathways. The predictive capacity of metabolic models developed from iRC1080 was demonstrated in simulating mutant phenotypes and in evaluation of light source efficiency. Our network provides a broad knowledgebase of the biochemistry and genomics underlying global metabolism of a photoautotroph, and our modeling approach for light-driven metabolism exemplifies how integration of largely unvisited data types, such as physicochemical environmental parameters, can expand the diversity of applications of metabolic networks.
Metabolic network reconstruction encompasses existing knowledge about an organism's metabolism and genome annotation, providing a platform for omics data analysis and phenotype prediction. The model alga Chlamydomonas reinhardtii is employed to study diverse biological processes from photosynthesis to phototaxis. Recent heightened interest in this species results from an international movement to develop algal biofuels. Integrating biological and optical data, we reconstructed a genome-scale metabolic network for this alga and devised a novel light-modeling approach that enables quantitative growth prediction for a given light source, resolving wavelength and photon flux. We experimentally verified transcripts accounted for in the network and physiologically validated model function through simulation and generation of new experimental growth data, providing high confidence in network contents and predictive applications. The network offers insight into algal metabolism and potential for genetic engineering and efficient light source design, a pioneering resource for studying light-driven metabolism and quantitative systems biology.
PMCID: PMC3202792  PMID: 21811229
Chlamydomonas reinhardtii; lipid metabolism; metabolic engineering; photobioreactor
3.  A Factor Graph Nested Effects Model To Identify Networks from Genetic Perturbations 
PLoS Computational Biology  2009;5(1):e1000274.
Complex phenotypes such as the transformation of a normal population of cells into cancerous tissue result from a series of molecular triggers gone awry. We describe a method that searches for a genetic network consistent with expression changes observed under the knock-down of a set of genes that share a common role in the cell, such as a disease phenotype. The method extends the Nested Effects Model of Markowetz et al. (2005) by using a probabilistic factor graph to search for a network representing interactions among these silenced genes. The method also expands the network by attaching new genes at specific downstream points, providing candidates for subsequent perturbations to further characterize the pathway. We investigated an extension provided by the factor graph approach in which the model distinguishes between inhibitory and stimulatory interactions. We found that the extension yielded significant improvements in recovering the structure of simulated and Saccharomyces cerevisiae networks. We applied the approach to discover a signaling network among genes involved in a human colon cancer cell invasiveness pathway. The method predicts several genes with new roles in the invasiveness process. We knocked down two genes identified by our approach and found that both knock-downs produce loss of invasive potential in a colon cancer cell line. Nested effects models may be a powerful tool for inferring regulatory connections and genes that operate in normal and disease-related processes.
Author Summary
Biological processes are the result of the actions and interactions of many genes and the proteins that they encode. Our knowledge of interactions for many biological processes is limited, especially for cancer where genomic alterations may create entirely novel pathways not present in normal tissue. Perturbing gene expression (for example, by deleting a gene) has long been used as a tool in molecular biology to elucidate interactions but is very expensive and labor intensive. The search for new genes that may participate can be a daunting “fishing expedition.” We have devised a tool that automatically infers interactions using high-throughput gene expression data. When a gene is silenced, it causes other genes to be switched on or off, which provide clues about the pathway(s) in which the gene acts. Our method uses the genomewide on/off states as a fingerprint to detect interactions among a set of silenced genes. We were able to elucidate a network of interactions for several genes implicated in metastatic colon cancer. Genes newly connected to the network were found to operate in cancer cell invasion in human cells, validating the approach. Thus, the method enables an efficient discovery of the networks that underlie biological processes such as carcinogenesis.
PMCID: PMC2613752  PMID: 19180177
4.  Use of Pleiotropy to Model Genetic Interactions in a Population 
PLoS Genetics  2012;8(10):e1003010.
Systems-level genetic studies in humans and model systems increasingly involve both high-resolution genotyping and multi-dimensional quantitative phenotyping. We present a novel method to infer and interpret genetic interactions that exploits the complementary information in multiple phenotypes. We applied this approach to a population of yeast strains with randomly assorted perturbations of five genes involved in mating. We quantified pheromone response at the molecular level and overall mating efficiency. These phenotypes were jointly analyzed to derive a network of genetic interactions that mapped mating-pathway relationships. To determine the distinct biological processes driving the phenotypic complementarity, we analyzed patterns of gene expression to find that the pheromone response phenotype is specific to cellular fusion, whereas mating efficiency was a combined measure of cellular fusion, cell cycle arrest, and modifications in cellular metabolism. We applied our novel method to global gene expression patterns to derive an expression-specific interaction network and demonstrate applicability to global transcript data. Our approach provides a basis for interpretation of genetic interactions and the generation of specific hypotheses from populations assayed for multiple phenotypes.
Author Summary
Parallel advances in genotype and phenotype measurement technologies are yielding large-scale, multidimensional datasets that can potentially decipher the genetic etiology of complex traits. Understanding these data will require methods that combine the experimental power of molecular biology and the quantitative power of statistical genetics. In this work, we describe a novel approach that uses the complementary information encoded by multiple phenotypes in conjunction with genetic data to map genetic interaction networks in terms of quantitative variant-to-variant and variant-to-phenotype influences. We tested this method using a population of yeast strains with random combinations of five genetic mutations and derived an interaction network using molecular and colony-level assays of mating phenotypes. Distinct biological processes that underlie the two phenotypes were identified with gene expression analysis, validating the method's ability to exploit complementary biological information in multiple phenotypes. Our method generates data-driven models and testable hypotheses of how the genetic variation in a population combines to affect complex traits. It is designed to be flexible and scalable for application to populations with extensive genetic diversity.
PMCID: PMC3469415  PMID: 23071457
5.  Insight into human alveolar macrophage and M. tuberculosis interactions via metabolic reconstructions 
A human alveolar macrophage genome-scale metabolic reconstruction was built by tailoring a global human metabolic network, Recon 1, using computational algorithms and manual curation.
A genome-scale host–pathogen network of the human alveolar macrophage and Mycobacterium tuberculosis is presented. This involved integrating two genome-scale network reconstructions.
The reaction activity and gene essentiality predictions of the host–pathogen model represent a more accurate depiction of infection.
Integration of high-throughput data into a host–pathogen model followed by systems analysis was performed in order to elucidate major metabolic differences under different types of M. tuberculosis infection.
Mycobacterium tuberculosis (M. tb) is an insidious and highly persistent pathogen that affects one-third of the world's population (WHO, 2009). Metabolism is foundational to M. tb's infection ability and the ensuing host–pathogen interactions. In addition, M. tb has a heterogeneous clinical presentation and can infect virtually every tissue. Depending on the location of the infection, different metabolic pathways are active and inactive in both the host and pathogen cells. In this study, we sought to model the host–pathogen interactions of the human alveolar macrophage and M. tb as well as detail the metabolic differences in specific infection types using genome-scale metabolic reconstructions (Figure 4A).
Genome-scale metabolic reconstructions are knowledge bases of all known metabolic reactions of a given organism. Reconstructions have been shown to elucidate the mechanistic genotype-to-phenotype relationship through the integration of high-throughput and physiological data (Oberhardt et al, 2009). Genome-scale reconstructions are converted into mathematical models under the constraints-based reconstruction and analysis (COBRA) platform (Becker et al, 2007). COBRA models use network stoichiometry and steady-state mass balances to define a solution space of potential flux states that a network can take. Thus, the COBRA approach does not require kinetic parameters.
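The steady-state mass balance mentioned above requires that, for every internal metabolite, production and consumption fluxes cancel, i.e. S·v = 0 for stoichiometric matrix S and flux vector v. The sketch below checks this condition on a hypothetical three-reaction toy network. It is not taken from the COBRA toolbox, and it only tests whether a given flux vector lies in the steady-state solution space; searching that space for an optimal flux state requires a linear programming solver, which is not shown.

```python
def mass_balance_residual(S, v):
    """Return S·v: the net production rate of each metabolite."""
    return [sum(coef * flux for coef, flux in zip(row, v)) for row in S]


def is_steady_state(S, v, tol=1e-9):
    """True if every metabolite's net production rate is (numerically) zero."""
    return all(abs(r) <= tol for r in mass_balance_residual(S, v))


# Hypothetical toy network with metabolites A and B (rows) and three
# reactions (columns): uptake of A, conversion A -> B, export of B.
S = [
    [1, -1, 0],   # A: produced by uptake, consumed by conversion
    [0, 1, -1],   # B: produced by conversion, consumed by export
]

balanced = [2.0, 2.0, 2.0]     # same flux through the whole chain
unbalanced = [2.0, 1.0, 1.0]   # A is taken up faster than it is converted
```

Here `balanced` satisfies the constraint, while `unbalanced` leaves a residual of 1.0 for metabolite A, meaning A would accumulate. No kinetic parameters appear anywhere in the check, which is the point made in the paragraph above.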
Recently, the global human metabolic network, Recon 1, has been reconstructed (Duarte et al, 2007). To understand the metabolic host–pathogen integrations of M. tb with its human host, we first tailored the global human metabolic network into a cell-specific metabolic reconstruction of the human alveolar macrophage. This was carried out using established computational algorithms (Becker and Palsson, 2008; Shlomi et al, 2008) and manual curation to confirm the included and excluded reactions. The human alveolar macrophage reconstruction, iAB-AMØ-1410, accounts for 1410 genes, 3012 intracellular reactions, and 2572 metabolites (Figure 4C). iAB-AMØ-1410 was able to accurately predict maximum ATP and NO production rates obtained from experimental data (Griscavage et al, 1993; Newsholme et al, 1999).
The second step to studying host–pathogen interactions was integration of the human alveolar macrophage reconstruction with an existing genome-scale metabolic model of M. tb, iNJ661 (Jamshidi and Palsson, 2007). Interfacial constraints were set to create a phagosomal environment that was hypoxic, nitrosative, rich in fatty acids, and poor in carbohydrates. From the onset, it was apparent that some oxygen (<15% of in vitro uptake) was required for proper simulations. In addition, algorithmic tailoring of the M. tb biomass objective function was performed to better represent an infectious state. The integrated host–pathogen metabolic reconstruction was dubbed iAB-AMØ-1410-Mt-661.
Analysis of the integrated host–pathogen metabolic reconstruction resulted in three main findings. First, by setting interfacial constraints and tailoring the biomass objective function, the solution space better represents an infectious state. Without adding artificial constraints to the host portion of the integrated model, the iAB-AMØ-1410 solution space is greatly reduced (Figure 4B). Macrophage glycolysis and nitric oxide production are up-regulated and macrophage ATP production, nucleotide synthesis, and amino-acid metabolism are suppressed. In addition, M. tb glycolysis is suppressed and isocitrate lyase is up-regulated for generation of acetyl-CoA. Fatty acid oxidation pathways and production of mycolic acids are increased, while production of nucleotides, peptidoglycans, and phenolic glycolipids are reduced. The modified solution space of the alveolar macrophage and M. tb better represents the infectious state.
Second, the host–pathogen model more accurately predicts M. tb gene deletion tests than the current in vitro model, iNJ661. The host–pathogen model predicted 11 essential genes and 37 nonessential genes differently from iNJ661. A total of 22 of the differentially predicted genes have been experimentally characterized (Sassetti and Rubin, 2003; Sohaskey, 2008). The host–pathogen model correctly predicted 18 of the 22 genes. Thus, iAB-AMØ-1410-Mt-661 is a more accurate platform for studying infectious states of M. tb.
Finally, we sought to determine metabolic differences in both the macrophage and M. tb between three different types of infection: latent, pulmonary, and meningeal. Transcription profiling data of the macrophage for the three infections (Thuong et al, 2008) were integrated in the context of the host–pathogen network to elucidate the reaction activity of the three infections. There was wide heterogeneity in the three infection states; some of these differences are highlighted. Macrophage hyaluronan synthase and export were only active in the pulmonary infection. This is potentially interesting from a pharmaceutical viewpoint as hyaluronan has been implicated as a potential carbon source for extracellular M. tb (Hirayama et al, 2009). In addition, we detected metabolic activity differences in M. tb pathways that have been previously discussed as potential drug targets (Eoh et al, 2007; Boshoff et al, 2008). Polyprenyl metabolic reactions were only active in the latent state infection, while de novo synthesis of nicotinamide cofactors was only active in latent and meningeal M. tb infections.
Host-pathogen modeling represents a novel approach for studying metabolic interactions during infection. iAB-AMØ-1410-Mt-661 is a more accurate platform for understanding the biology and pathophysiology of M. tb infection. Most importantly, genome-scale metabolic reconstructions can act as scaffolds for integrating high-throughput data. Particularly, in this study we were able to discern reaction activity differences between different infection types.
Metabolic coupling of Mycobacterium tuberculosis to its host is foundational to its pathogenesis. Computational genome-scale metabolic models have shown utility in integrating -omic as well as physiologic data for systemic, mechanistic analysis of metabolism. To date, integrative analysis of host–pathogen interactions using in silico mass-balanced, genome-scale models has not been performed. We, therefore, constructed a cell-specific alveolar macrophage model, iAB-AMØ-1410, from the global human metabolic reconstruction, Recon 1. The model successfully predicted experimentally verified ATP and nitric oxide production rates in macrophages. This model was then integrated with an M. tuberculosis H37Rv model, iNJ661, to build an integrated host–pathogen genome-scale reconstruction, iAB-AMØ-1410-Mt-661. The integrated host–pathogen network enables simulation of the metabolic changes during infection. The resulting reaction activity and gene essentiality targets of the integrated model represent an altered infectious state. High-throughput data from infected macrophages were mapped onto the host–pathogen network and were able to describe three distinct pathological states. Integrated host–pathogen reconstructions thus form a foundation upon which understanding the biology and pathophysiology of infections can be developed.
PMCID: PMC2990636  PMID: 20959820
computational biology; host–pathogen; Mycobacterium tuberculosis; systems biology; macrophage
6.  Enhancing the Role of Veterinary Vaccines Reducing Zoonotic Diseases of Humans: Linking Systems Biology with Vaccine Development 
Vaccine  2011;29(41):7197-7206.
The aim of research on infectious diseases is their prevention, and brucellosis and salmonellosis are classic examples of worldwide zoonoses to which a systems biology approach can be applied for enhanced rational vaccine development. When used optimally, vaccines prevent disease manifestations, reduce transmission of disease, decrease the need for pharmaceutical intervention, and improve the health and welfare of animals, as well as indirectly protecting against zoonotic diseases of people. Advances in the last decade or so using comprehensive systems biology approaches linking genomics, proteomics, bioinformatics, and biotechnology with immunology, pathogenesis and vaccine formulation and delivery are expected to enable enhanced approaches to vaccine development. The goal of this paper is to evaluate the role of computational systems biology analysis of host:pathogen interactions (the interactome) as a tool for enhanced rational design of vaccines. Systems biology is bringing a new, more robust approach to veterinary vaccine design based upon a deeper understanding of the host-pathogen interactions and their impact on the host's molecular network of the immune system. A computational systems biology method was utilized to create interactome models of the host responses to Brucella melitensis (BMEL), Mycobacterium avium paratuberculosis (MAP), Salmonella enterica Typhimurium (STM), and a Salmonella mutant (isogenic ΔsipA, sopABDE2) and linked to the basis for rational development of vaccines for brucellosis and salmonellosis as reviewed by Adams and Ficht (Adams et al. 2009; Ficht et al. 2009). A bovine ligated ileal loop biological model was established to capture the host gene expression response at multiple time points post infection.
New methods based on Dynamic Bayesian Network (DBN) machine learning were employed to conduct a comparative pathogenicity analysis of 219 signaling and metabolic pathways and 1620 Gene Ontology (GO) categories that defined the host's biosignatures to each infectious condition. Through this DBN computational approach, the method identified significantly perturbed pathways and GO category groups of genes that define the pathogenicity signatures of the infectious agent. Our preliminary results provide deeper understanding of the overall complexity of host innate immune response as well as the identification of host gene perturbations that define a unique host temporal biosignature response to each pathogen. The application of advanced computational methods for developing interactome models based on DBNs has proven to be instrumental in elucidating novel host responses and improving functional biological insight into the host defensive mechanisms. Evaluating the unique differences in pathway and GO perturbations across pathogen conditions allowed the identification of plausible host-pathogen interaction mechanisms. Accordingly, a systems biology approach to study molecular pathway gene expression profiles of host cellular responses to microbial pathogens holds great promise as a methodology to identify, model and predict the overall dynamics of the host-pathogen interactome. Thus, we propose that such an approach has immediate application to the rational design of brucellosis and salmonellosis vaccines.
PMCID: PMC3170448  PMID: 21651944
7.  Computational reconstruction of tissue-specific metabolic models: application to human liver metabolism 
The first computational approach for the rapid generation of genome-scale tissue-specific models from a generic species model.
A genome-scale model of human liver metabolism, which is comprehensively tested and validated using cross-validation and the ability to carry out complex hepatic metabolic functions.
The model's flux predictions are shown to correlate with flux measurements across a variety of hormonal and dietary conditions, and are successfully used to predict biomarker changes in genetic metabolic disorders, both with higher accuracy than the generic human model.
The study of normal human metabolism and its alterations is central to the understanding and treatment of a variety of human diseases, including diabetes, metabolic syndrome, neurodegenerative disorders, and cancer. A promising systems biology approach for studying human metabolism is through the development and analysis of large-scale stoichiometric network models of human metabolism. The reconstruction of these network models has followed two main paths. The first is the reconstruction of generic (non-tissue-specific) models, characterizing the complete metabolic potential of human cells, based mostly on genomic data to trace enzyme-coding genes (Duarte et al, 2007; Ma et al, 2007). The second is the reconstruction of cell type- and tissue-specific models (Wiback and Palsson, 2002; Chatziioannou et al, 2003; Vo et al, 2004), based on a similar methodology, with the extra complexity of manual curation of literature evidence for the cell/system specificity of metabolic enzymes and pathways.
On this background, we present in this study, to the best of our knowledge, the first computational approach for a rapid generation of genome-scale tissue-specific models. The method relies on integrating the previously reconstructed generic human models with a variety of high-throughput molecular ‘omics' data, including transcriptomic, proteomic, metabolomic, and phenotypic data, as well as literature-based knowledge, characterizing the tissue in hand (Figure 1). Hence, it can be readily used to quite rapidly build and use a large array of human tissue-specific models. The resulting model satisfies stoichiometric, mass-balance, and thermodynamic constraints. It serves as a functional metabolic network that can then be used to explore the metabolic state of a tissue under various genetic and physiological conditions, simulating enzymatic inhibition or drug applications through standard constraint-based modeling methods, without requiring additional context-specific molecular data.
We applied this approach to build a genome scale model of liver metabolism, which is then comprehensively tested and validated. The model is shown to be able to simulate complex hepatic metabolic functions, as well as depicting the pathological alterations caused by urea cycle deficiencies. The liver model was applied to predict measured intra-cellular metabolic fluxes given measured metabolite uptake and secretion rates at different hepatic metabolic conditions. The predictions were tested using a comprehensive set of flux measurements performed by (Chan et al, 2003), showing that the liver model obtained more accurate predictions compared to those obtained by the original, generic human model (an overall prediction accuracy of 0.67 versus 0.46). Furthermore, it was applied to identify metabolic biomarkers for liver in-born errors of metabolism—once again, displaying superiority vs. the predictions generated by the generic human model (accuracy of 0.67 versus 0.59).
From a biotechnological standpoint, the liver model generated here can serve as a basis for future studies aiming to optimize the functioning of bio artificial liver devices. The application of the method to rapidly construct metabolic models of other human tissues can obviously lead to many other important clinical insights, e.g., concerning means for metabolic salvage of ischemic heart and brain tissues. Last but not least, the application of the new method is not limited to the realm of human modeling; it can be used to generate tissue models for any multi-tissue organism for which a generic model exists, such as the Mus musculus (Quek and Nielsen, 2008; Sheikh et al, 2005) and the model plant Arabidopsis thaliana (Poolman et al, 2009).
The computational study of human metabolism has been advanced with the advent of the first generic (non-tissue specific) stoichiometric model of human metabolism. In this study, we present a new algorithm for rapid reconstruction of tissue-specific genome-scale models of human metabolism. The algorithm generates a tissue-specific model from the generic human model by integrating a variety of tissue-specific molecular data sources, including literature-based knowledge, transcriptomic, proteomic, metabolomic and phenotypic data. Applying the algorithm, we constructed the first genome-scale stoichiometric model of hepatic metabolism. The model is verified using standard cross-validation procedures, and through its ability to carry out hepatic metabolic functions. The model's flux predictions correlate with flux measurements across a variety of hormonal and dietary conditions, and improve upon the predictive performance obtained using the original, generic human model (prediction accuracy of 0.67 versus 0.46). Finally, the model better predicts biomarker changes in genetic metabolic disorders than the generic human model (accuracy of 0.67 versus 0.59). The approach presented can be used to construct other human tissue-specific models, and be applied to other organisms.
PMCID: PMC2964116  PMID: 20823844
constraint based; hepatic; liver; metabolism
8.  Knowledge-fused differential dependency network models for detecting significant rewiring in biological networks 
BMC Systems Biology  2014;8:87.
Modeling biological networks serves as both a major goal and an effective tool of systems biology in studying mechanisms that orchestrate the activities of gene products in cells. Biological networks are context-specific and dynamic in nature. To systematically characterize the selectively activated regulatory components and mechanisms, modeling tools must be able to effectively distinguish significant rewiring from random background fluctuations. While differential networks cannot be constructed by existing knowledge alone, novel incorporation of prior knowledge into data-driven approaches can improve the robustness and biological relevance of network inference. However, the major unresolved roadblocks include: big solution space but a small sample size; highly complex networks; imperfect prior knowledge; missing significance assessment; and heuristic structural parameter learning.
To address these challenges, we formulated the inference of differential dependency networks that incorporate both conditional data and prior knowledge as a convex optimization problem, and developed an efficient learning algorithm to jointly infer the conserved biological network and the significant rewiring across different conditions. We used a novel sampling scheme to estimate the expected error rate due to “random” knowledge. Based on that scheme, we developed a strategy that fully exploits the benefit of this data-knowledge integrated approach. We demonstrated and validated the principle and performance of our method using synthetic datasets. We then applied our method to yeast cell line and breast cancer microarray data and obtained biologically plausible results. The open-source R software package and the experimental data are freely available at
Experiments on both synthetic and real data demonstrate the effectiveness of the knowledge-fused differential dependency network in revealing the statistically significant rewiring in biological networks. The method efficiently leverages data-driven evidence and existing biological knowledge while remaining robust to the false positive edges in the prior knowledge. The identified network rewiring events are supported by previous studies in the literature and also provide new mechanistic insight into the biological systems. We expect the knowledge-fused differential dependency network analysis, together with the open-source R package, to be an important and useful bioinformatics tool in biological network analyses.
PMCID: PMC4131167  PMID: 25055984
Biological networks; Probabilistic graphical models; Differential dependency network; Network rewiring; Network analysis; Systems biology; Knowledge incorporation; Convex optimization
9.  Gradient Descent Optimization in Gene Regulatory Pathways 
PLoS ONE  2010;5(9):e12475.
Gene Regulatory Networks (GRNs) have become a major focus of interest in recent years. Elucidating the architecture and dynamics of large scale gene regulatory networks is an important goal in systems biology. The knowledge of the gene regulatory networks further gives insights about gene regulatory pathways. This information leads to many potential applications in medicine and molecular biology, examples of which are identification of metabolic pathways, complex genetic diseases, drug discovery and toxicology analysis. High-throughput technologies allow studying various aspects of gene regulatory networks on a genome-wide scale and we will discuss recent advances as well as limitations and future challenges for gene network modeling. Novel approaches are needed to both infer the causal genes and generate hypotheses on the underlying regulatory mechanisms.
In the present article, we introduce a new method for identifying a set of optimal gene regulatory pathways by using structural equations as a tool for modeling gene regulatory networks. The method, first of all, generates data on reaction flows in a pathway. A set of constraints is formulated incorporating weighting coefficients. Finally the gene regulatory pathways are obtained through optimization of an objective function with respect to these weighting coefficients. The effectiveness of the present method is successfully tested on ten gene regulatory networks existing in the literature. A comparative study with the existing extreme pathway analysis also forms a part of this investigation. The results compare favorably with earlier experimental results. The validated pathways point to a combination of previously documented and novel findings.
We show that our method can correctly identify the causal genes and effectively output experimentally verified pathways. The present method has been successful in deriving the optimal regulatory pathways for all the regulatory networks considered. The biological significance and applicability of the optimal pathways has also been discussed. Finally the usefulness of the present method on genetic engineering is depicted with an example.
PMCID: PMC2933224  PMID: 20838430
10.  Knowledge-driven genomic interactions: an application in ovarian cancer 
BioData Mining  2014;7:20.
Effective cancer clinical outcome prediction for understanding of the mechanism of various types of cancer has been pursued using molecular-based data such as gene expression profiles, an approach that has promise for providing better diagnostics and supporting further therapies. However, clinical outcome prediction based on gene expression profiles varies between independent data sets. Further, single-gene expression outcome prediction is limited for cancer evaluation since genes do not act in isolation, but rather interact with other genes in complex signaling or regulatory networks. In addition, since pathways are more likely to co-operate together, it would be desirable to incorporate expert knowledge to combine pathways in a useful and informative manner.
Thus, we propose a novel approach for identifying knowledge-driven genomic interactions and applying it to discover models associated with cancer clinical phenotypes using grammatical evolution neural networks (GENN). In order to demonstrate the utility of the proposed approach, ovarian cancer data from The Cancer Genome Atlas (TCGA) were used for predicting clinical stage as a pilot project.
We identified knowledge-driven genomic interactions associated with cancer stage from single knowledge bases such as sources of pathway-pathway interaction, but also knowledge-driven genomic interactions across different sets of knowledge bases such as pathway-protein family interactions by integrating different types of information. Notably, an integration model from different sources of biological knowledge achieved 78.82% balanced accuracy and outperformed the top models with gene expression or single knowledge-based data types alone. Furthermore, the results from the models are more interpretable because they are framed in the context of specific biological pathways or other expert knowledge.
The success of the pilot study we have presented herein will allow us to pursue further identification of models predictive of clinical cancer survival and recurrence. Understanding the underlying tumorigenesis and progression in ovarian cancer through the global view of interactions within/between different biological knowledge sources has the potential for providing more effective screening strategies and therapeutic targets for many types of cancer.
PMCID: PMC4161273  PMID: 25214892
Knowledge-driven genomic interaction; Integrative analysis; Grammatical evolution neural network; Clinical outcome prediction; Ovarian cancer
11.  Metabolic Constraint-Based Refinement of Transcriptional Regulatory Networks 
PLoS Computational Biology  2013;9(12):e1003370.
There is a strong need for computational frameworks that integrate different biological processes and data-types to unravel cellular regulation. Current efforts to reconstruct transcriptional regulatory networks (TRNs) focus primarily on proximal data such as gene co-expression and transcription factor (TF) binding. While such approaches enable rapid reconstruction of TRNs, the overwhelming combinatorics of possible networks limits identification of mechanistic regulatory interactions. Utilizing growth phenotypes and systems-level constraints to inform regulatory network reconstruction is an unmet challenge. We present our approach Gene Expression and Metabolism Integrated for Network Inference (GEMINI) that links a compendium of candidate regulatory interactions with the metabolic network to predict their systems-level effect on growth phenotypes. We then compare predictions with experimental phenotype data to select phenotype-consistent regulatory interactions. GEMINI makes use of the observation that only a small fraction of regulatory network states are compatible with a viable metabolic network, and outputs a regulatory network that is simultaneously consistent with the input genome-scale metabolic network model, gene expression data, and TF knockout phenotypes. GEMINI preferentially recalls gold-standard interactions (p-value = 10^-172), significantly better than using gene expression alone. We applied GEMINI to create an integrated metabolic-regulatory network model for Saccharomyces cerevisiae involving 25,000 regulatory interactions controlling 1597 metabolic reactions. The model quantitatively predicts TF knockout phenotypes in new conditions (p-value = 10^-14) and revealed potential condition-specific regulatory mechanisms.
Our results suggest that a metabolic constraint-based approach can be successfully used to help reconstruct TRNs from high-throughput data, and highlights the potential of using a biochemically-detailed mechanistic framework to integrate and reconcile inconsistencies across different data-types. The algorithm and associated data are available at
Author Summary
Cellular networks, such as metabolic and transcriptional regulatory networks (TRNs), do not operate independently but work together in unison to determine cellular phenotypes. Further, the phenotype and architecture of one network constrains the topology of other networks. Hence, it is critical to study network components and interactions in the context of the entire cell. Typically, efforts to reconstruct TRNs focus only on immediately proximal data such as gene co-expression and transcription factor (TF)-binding. Herein, we take a different strategy by linking candidate TRNs with the metabolic network to predict systems-level responses such as growth phenotypes of TF knockout strains, and compare predictions with experimental phenotype data to select amongst the candidate TRNs. Our approach goes beyond traditional data integration approaches for network inference and refinement by using a predictive network model (metabolism) to refine another network model (regulation) – thus providing an alternative avenue to this area of research. Understanding how the networks function together in a cell will pave the way for synthetic biology and has a wide-range of applications in biotechnology, drug discovery and diagnostics. Further we demonstrate how metabolic models can integrate and reconcile inconsistencies across different data-types.
PMCID: PMC3857774  PMID: 24348226
12.  Augmenting Microarray Data with Literature-Based Knowledge to Enhance Gene Regulatory Network Inference 
PLoS Computational Biology  2014;10(6):e1003666.
Gene regulatory networks are a crucial aspect of systems biology in describing molecular mechanisms of the cell. Various computational models rely on random gene selection to infer such networks from microarray data. While incorporation of prior knowledge into data analysis has been deemed important, in practice, it has generally been limited to referencing genes in probe sets and using curated knowledge bases. We investigate the impact of augmenting microarray data with semantic relations automatically extracted from the literature, with the view that relations encoding gene/protein interactions eliminate the need for random selection of components in non-exhaustive approaches, producing a more accurate model of cellular behavior. A genetic algorithm is then used to optimize the strength of interactions using microarray data and an artificial neural network fitness function. The result is a directed and weighted network providing the individual contribution of each gene to its target. For testing, we used invasive ductal carcinoma of the breast to query the literature and a microarray set containing gene expression changes in these cells over several time points. Our model demonstrates significantly better fitness than the state-of-the-art model, which relies on an initial random selection of genes. Comparison to the component pathways of the KEGG Pathways in Cancer map reveals that the resulting networks contain both known and novel relationships. The p53 pathway results were manually validated in the literature. 60% of non-KEGG relationships were supported (74% for highly weighted interactions). The method was then applied to yeast data and our model again outperformed the comparison model. Our results demonstrate the advantage of combining gene interactions extracted from the literature in the form of semantic relations with microarray analysis in generating contribution-weighted gene regulatory networks.
This methodology can make a significant contribution to understanding the complex interactions involved in cellular behavior and molecular physiology.
Author Summary
We have developed a methodology that combines standard computational analysis of gene expression data with knowledge in the literature to identify pathways of gene and protein interactions. We extract the knowledge from PubMed citations using a tool (SemRep) that identifies specific relationships between genes or proteins. We string together networks of individual interactions that are found within citations that refer to the target pathways. Upon this skeleton of interactions, we calculate the weight of the interaction with the gene expression data captured over multiple time points using state-of-the-art analysis algorithms. Not surprisingly, this approach of combining prior knowledge into the analysis process significantly improves the performance of the analysis. This work is most significant as an example of how the wealth of textual data related to gene interactions can be incorporated into computational analysis, not solely to identify this type of pathway (a gene regulatory network) but for any type of similar biological problem.
PMCID: PMC4055569  PMID: 24921649
13.  Modularity-based credible prediction of disease genes and detection of disease subtypes on the phenotype-gene heterogeneous network 
BMC Systems Biology  2011;5:79.
Protein-protein interaction networks and phenotype similarity information have been synthesized together to discover novel disease-causing genes. Genetic or phenotypic similarities are manifested as certain modularity properties in a phenotype-gene heterogeneous network consisting of the phenotype-phenotype similarity network, protein-protein interaction network and gene-disease association network. However, the quantitative analysis of modularity in the heterogeneous network and its influence on disease-gene discovery are still unaddressed. Furthermore, the genetic correspondence of the disease subtypes can be identified by marking the genes and phenotypes in the phenotype-gene network. We present a novel network inference method to measure the network modularity, and in particular to suggest the subtypes of diseases based on the heterogeneous network.
Based on a measure which is introduced to evaluate the closeness between two nodes in the phenotype-gene heterogeneous network, we developed a Hitting-Time-based method, CIPHER-HIT, for assessing the modularity of disease gene predictions and credibly prioritizing disease-causing genes, and then identifying the genetic modules corresponding to potential subtypes of the queried phenotype. CIPHER-HIT does not rely on any preset parameters. We found that when taking into account the modularity levels, the CIPHER-HIT method can significantly improve the performance of disease gene predictions, which demonstrates that modularity is one of the key features for credible inference of disease genes on the phenotype-gene heterogeneous network. By applying CIPHER-HIT to the subtype analysis of Breast cancer, we found that the prioritized genes can be divided into two sub-modules, one containing the members of the Fanconi anemia gene family and the other containing a reported protein complex, MRE11/RAD50/NBN.
The phenotype-gene heterogeneous network contains abundant information for not only disease genes discovery but also disease subtypes detection. The CIPHER-HIT method presented here is effective for network inference, particularly on credible prediction of disease genes and the subtype analysis of diseases, for example Breast cancer. This method provides a promising way to analyze heterogeneous biological networks, both globally and locally.
PMCID: PMC3130676  PMID: 21599985
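To make the hitting-time idea in the abstract above concrete, the following is a minimal sketch of the expected hitting time of a simple random walk on a small undirected graph. It is not the CIPHER-HIT closeness measure itself, which the abstract does not specify. The graph, the node names, and the function `hitting_times` are hypothetical and chosen only for illustration.

```python
def hitting_times(graph, target, sweeps=1000):
    """Expected number of random-walk steps needed to first reach `target`.

    `graph` maps each node to a list of its neighbours (undirected, connected).
    Solves h(target) = 0 and h(i) = 1 + mean(h(j) for neighbours j of i)
    by repeated in-place sweeps (Gauss-Seidel style iteration).
    """
    h = {node: 0.0 for node in graph}
    for _ in range(sweeps):
        for node, neighbours in graph.items():
            if node != target:
                h[node] = 1.0 + sum(h[n] for n in neighbours) / len(neighbours)
    return h


# Hypothetical three-node path: phenotype -- gene_a -- gene_b
toy_graph = {
    "phenotype": ["gene_a"],
    "gene_a": ["phenotype", "gene_b"],
    "gene_b": ["gene_a"],
}

h = hitting_times(toy_graph, target="gene_b")
print(h)
```

In this toy graph a smaller hitting time means a node is closer to the target in the random-walk sense, which is the intuition behind ranking candidate genes by such a measure.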
14.  Systematic evaluation of objective functions for predicting intracellular fluxes in Escherichia coli 
The in vivo distribution of metabolic fluxes in Escherichia coli can be predicted from optimality principles.
At least two different sets of optimality principles govern the operation of the metabolic network under different environmental conditions.
Metabolism during unlimited growth on glucose in batch culture is best described by the nonlinear maximization of ATP yield per unit of flux.
Based on a long history of biochemical and lately genomic research, metabolic networks, in particular microbial ones, are among the best characterized cellular networks. Most components (genes, proteins and metabolites) and their interactions are known. This topological knowledge of the reaction stoichiometry allows the construction of metabolic models up to the level of genome scale (Price et al, 2004). Experimentally, sophisticated 13C-tracer-based methodologies were developed that enable tracking of the intracellular flux traffic through the reaction network (Sauer, 2006). With the accumulation of such experimental flux data, the question arises why a particular distribution of flux within the network is realized and not one of many alternatives?
Here, we address the question whether the intracellular flux state can be predicted from optimality principles, with the underlying rationale that evolution might have optimized metabolic operation toward particular objectives or combinations of multiple objectives. For this purpose, we performed a systematic and rigorous comparison between computational flux predictions and available experimental flux data (Emmerling et al, 2002; Perrenoud and Sauer, 2005; Nanchen et al, 2006) under six different environmental conditions for the model bacterium E. coli. For computational flux predictions, we used a constraint-based modeling approach that requires a stoichiometric model of metabolism (Stelling, 2004). More specifically, we employed flux balance analysis (FBA) where objective functions are defined that represent optimality principles of network operation (Price et al, 2004). This approach has been applied successfully to predict gene deletion lethality (Edwards and Palsson, 2000a, b; Forster et al, 2003; Kuepfer et al, 2005), network capacities and feasible network states (Edwards 2001, Ibarra 2002), but in only a few cases to predict the intracellular flux state (Beard et al, 2002; Holzhütter, 2004).
While different objective functions were proposed for different biological systems (Holzhütter, 2004; Price et al, 2004; Knorr et al, 2006), by far the most common assumption is that microbial cells maximize their growth. To address this issue more generally, we evaluated the accuracy of FBA-based flux predictions for 11 linear and nonlinear objective functions that were combined with eight adjustable constraints. For this purpose, we constructed a highly interconnected stoichiometric network model with 98 reactions and 60 metabolites of E. coli central carbon metabolism. Based on mathematical analyses, the overall model could be reduced to a set of 10 reactions that summarize the actual systemic degree of freedom.
As a quantitative measure of how accurately the experimental data are predicted, we defined predictive fidelity as a single value to quantify the overall deviation between in silico and in vivo fluxes. By comparing all in silico predictions to 13C-based in vivo fluxes, we show that prediction of intracellular steady-state fluxes from network stoichiometry alone is, within limits, possible. An unexpected key result is that no further assumptions on network operation in the form of additional and potentially artificial constraints are necessary, provided the appropriate objective function is chosen for a given condition.
While no single objective was able to describe the flux states under all six conditions, we identified two sets of objectives for biologically meaningful predictions without the need for further constraints. For unlimited growth on glucose in aerobic or nitrate-respiring batch cultures, we find that the most accurate and robust results are obtained with the nonlinear maximization of ATP yield per flux unit (Figure 1). Under nutrient scarcity in glucose- or ammonium-limited continuous cultures, in contrast, linear maximization of the overall ATP or biomass yields achieved the highest predictive accuracy.
Since these identified optimality principles describe the system behavior without preconditioning of the network through further constraints, they reflect, to some extent, the evolutionary selection of metabolic network regulation that realizes the various flux states. For conditions of nutrient scarcity, the maximization of energy or biomass yield objective is consistent with the generally observed physiology (Russell and Cook, 1995). The meaning of the maximization of ATP yield per flux unit objective for unlimited growth, however, is less obvious. Generally, it selects for small networks with yet high, albeit suboptimal ATP formation, which has three biological consequences. Firstly, resources are economically allocated since expenditures for enzyme synthesis are, on average, greater for longer pathways. Secondly, suboptimal ATP yields dissipate more energy and thus enable higher catabolic rates. Thirdly, at a constant catabolic rate, a small network results in shorter residence times of substrate molecules until they generate ATP. The relative contribution of these consequences to the evolution of network regulation is unclear, but simultaneous optimization for ATP yield and catabolic rate under this optimality principle identifies a trade-off between the contradicting objectives of maximum overall ATP yield and maximum rate of ATP formation (Pfeiffer et al, 2001).
To what extent can optimality principles describe the operation of metabolic networks? By explicitly considering experimental errors and in silico alternate optima in flux balance analysis, we systematically evaluate the capacity of 11 objective functions combined with eight adjustable constraints to predict 13C-determined in vivo fluxes in Escherichia coli under six environmental conditions. While no single objective describes the flux states under all conditions, we identified two sets of objectives for biologically meaningful predictions without the need for further, potentially artificial constraints. Unlimited growth on glucose in oxygen or nitrate respiring batch cultures is best described by nonlinear maximization of the ATP yield per flux unit. Under nutrient scarcity in continuous cultures, in contrast, linear maximization of the overall ATP or biomass yields achieved the highest predictive accuracy. Since these particular objectives predict the system behavior without preconditioning of the network structure, the identified optimality principles reflect, to some extent, the evolutionary selection of metabolic network regulation that realizes the various flux states.
PMCID: PMC1949037  PMID: 17625511
13C-flux; evolution; flux balance analysis; metabolic network; network optimality
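The contrast drawn in the entry above between linear maximization of ATP yield and maximization of ATP yield per flux unit can be illustrated with a toy calculation. This is only a sketch under stated assumptions. The two-route network, its ATP stoichiometry, and the definition of "flux unit" as the plain sum of all fluxes are hypothetical and are not taken from the paper. It uses a grid search rather than a linear-programming solver, so it is not a full flux balance analysis.

```python
def toy_fluxes(x):
    """Fluxes when a fraction x of one unit of substrate takes the long route.

    Hypothetical network: the long route has three reaction steps,
    the short route has a single step.
    """
    return {
        "uptake": 1.0,
        "long_1": x,
        "long_2": x,
        "long_3": x,
        "short": 1.0 - x,
    }


def atp_yield(x):
    # Assumed stoichiometry: 3 ATP per substrate via the long route, 2 via the short one.
    return 3.0 * x + 2.0 * (1.0 - x)


def atp_per_flux(x):
    # Assumed definition: ATP yield divided by the sum of all fluxes.
    return atp_yield(x) / sum(toy_fluxes(x).values())


grid = [i / 100 for i in range(101)]           # candidate split fractions
best_linear = max(grid, key=atp_yield)         # objective 1: maximal ATP yield
best_ratio = max(grid, key=atp_per_flux)       # objective 2: ATP yield per flux unit

print(best_linear, atp_yield(best_linear))
print(best_ratio, atp_yield(best_ratio))
```

Under these assumptions the first objective routes everything through the long, high-yield pathway, while the second prefers the short pathway with a lower ATP yield. This mirrors the abstract's statement that the per-flux objective selects for small networks with high but suboptimal ATP formation.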
15.  A Logical Model Provides Insights into T Cell Receptor Signaling 
PLoS Computational Biology  2007;3(8):e163.
Cellular decisions are determined by complex molecular interaction networks. Large-scale signaling networks are currently being reconstructed, but the kinetic parameters and quantitative data that would allow for dynamic modeling are still scarce. Therefore, computational studies based upon the structure of these networks are of great interest. Here, a methodology relying on a logical formalism is applied to the functional analysis of the complex signaling network governing the activation of T cells via the T cell receptor, the CD4/CD8 co-receptors, and the accessory signaling receptor CD28. Our large-scale Boolean model, which comprises 94 nodes and 123 interactions and is based upon well-established qualitative knowledge from primary T cells, reveals important structural features (e.g., feedback loops and network-wide dependencies) and recapitulates the global behavior of this network for an array of published data on T cell activation in wild-type and knock-out conditions. More importantly, the model predicted unexpected signaling events after antibody-mediated perturbation of CD28 and after genetic knockout of the kinase Fyn that were subsequently experimentally validated. Finally, we show that the logical model reveals key elements and potential failure modes in network functioning and provides candidates for missing links. In summary, our large-scale logical model for T cell activation proved to be a promising in silico tool, and it inspires immunologists to ask new questions. We think that it holds valuable potential in foreseeing the effects of drugs and network modifications.
Author Summary
T-lymphocytes are central regulators of the adaptive immune response, and their inappropriate activation can cause autoimmune diseases or cancer. The understanding of the signaling mechanisms underlying T cell activation is a prerequisite to develop new strategies for pharmacological intervention and disease treatments. However, much of the existing literature on T cell signaling is related to T cell development or to activation processes in transformed T cell lines (e.g., Jurkat), whereas information on non-transformed primary T cells is limited. Here, immunologists and theoreticians have compiled data from the existing literature that stem from analysis of primary T cells. They used this information to establish a qualitative Boolean network that describes T cell activation mechanisms after engagement of the TCR, the CD4/CD8 co-receptors, and CD28. The network comprises 94 nodes and can be extended to facilitate interpretation of new data that emerge from experimental analysis of T cell activation. Newly developed tools and methods allow in silico analysis, and manipulation of the network and can uncover hidden/unforeseen signaling pathways. Indeed, by assessing signaling events controlled by CD28 and the protein tyrosine kinase Fyn, we show that computational analysis of even a qualitative network can provide new and non-obvious signaling pathways which can be validated experimentally.
PMCID: PMC1950951  PMID: 17722974
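The logical (Boolean) formalism described in the entry above can be sketched with a very small example. The nodes and update rules below are hypothetical and are not the published 94-node T cell model. The function `steady_state` is likewise an illustrative name. It applies synchronous updates until the state stops changing, and a knock-out is modelled by forcing a node to stay off.

```python
# Hypothetical update rules: each node's next value depends on the current state.
RULES = {
    "kinase": lambda s: s["ligand"],
    "adaptor": lambda s: s["kinase"],
    "response": lambda s: s["adaptor"] and not s["phosphatase"],
}


def steady_state(inputs, knockouts=(), max_steps=50):
    """Synchronously update all rule-governed nodes until a fixed point is reached."""
    state = {name: False for name in RULES}
    state.update(inputs)  # input nodes keep their given values throughout
    for _ in range(max_steps):
        new_state = dict(state)
        for node, rule in RULES.items():
            # A knocked-out node is held at False regardless of its rule.
            new_state[node] = False if node in knockouts else bool(rule(state))
        if new_state == state:
            return state
        state = new_state
    raise RuntimeError("no fixed point reached (the network may oscillate)")


stimulus = {"ligand": True, "phosphatase": False}
wild_type = steady_state(stimulus)
kinase_ko = steady_state(stimulus, knockouts=("kinase",))

print(wild_type["response"], kinase_ko["response"])
```

Comparing the wild-type and knock-out steady states is the kind of qualitative prediction such a model can make without kinetic parameters.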
16.  Maximal Extraction of Biological Information from Genetic Interaction Data 
PLoS Computational Biology  2009;5(4):e1000347.
Extraction of all the biological information inherent in large-scale genetic interaction datasets remains a significant challenge for systems biology. The core problem is essentially that of classification of the relationships among phenotypes of mutant strains into biologically informative “rules” of gene interaction. Geneticists have determined such classifications based on insights from biological examples, but it is not clear that there is a systematic, unsupervised way to extract this information. In this paper we describe such a method that depends on maximizing a previously described context-dependent information measure to obtain maximally informative biological networks. We have successfully validated this method on two examples from yeast by demonstrating that more biological information is obtained when analysis is guided by this information measure. The context-dependent information measure is a function only of phenotype data and a set of interaction rules, involving no prior biological knowledge. Analysis of the resulting networks reveals that the most biologically informative networks are those with the greatest context-dependent information scores. We propose that these high-complexity networks reveal genetic architecture at a modular level, in contrast to classical genetic interaction rules that order genes in pathways. We suggest that our analysis represents a powerful, data-driven, and general approach to genetic interaction analysis, with particular potential in the study of mammalian systems in which interactions are complex and gene annotation data are sparse.
Author Summary
Targeted genetic perturbation is a powerful tool for inferring gene function in model organisms. Functional relationships between genes can be inferred by observing the effects of multiple genetic perturbations in a single strain. The study of these relationships, generally referred to as genetic interactions, is a classic technique for ordering genes in pathways, thereby revealing genetic organization and gene-to-gene information flow. Genetic interaction screens are now being carried out in high-throughput experiments involving tens or hundreds of genes. These data sets have the potential to reveal genetic organization on a large scale, and require computational techniques that best reveal this organization. In this paper, we use a complexity metric based in information theory to determine the maximally informative network given a set of genetic interaction data. We find that networks with high complexity scores yield the most biological information in terms of (i) specific associations between genes and biological functions, and (ii) mapping modules of co-functional genes. This information-based approach is an automated, unsupervised classification of the biological rules underlying observed genetic interactions. It might have particular potential in genetic studies in which interactions are complex and prior gene annotation data are sparse.
PMCID: PMC2659753  PMID: 19343223
17.  Network Modeling Identifies Molecular Functions Targeted by miR-204 to Suppress Head and Neck Tumor Metastasis 
PLoS Computational Biology  2010;6(4):e1000730.
Due to the large number of putative microRNA gene targets predicted by sequence-alignment databases and the relatively low accuracy of such predictions, which are conducted independently of biological context by design, systematic experimental identification and validation of every functional microRNA target is currently challenging. Consequently, biological studies have yet to identify, on a genome scale, key regulatory networks perturbed by altered microRNA functions in the context of cancer. In this report, we demonstrate for the first time how phenotypic knowledge of inheritable cancer traits and of risk factor loci can be utilized jointly with gene expression analysis to efficiently prioritize deregulated microRNAs for biological characterization. Using this approach we characterize miR-204 as a tumor suppressor microRNA and uncover previously unknown connections between microRNA regulation, network topology, and expression dynamics. Specifically, we validate 18 gene targets of miR-204 that show elevated mRNA expression and are enriched in biological processes associated with tumor progression in squamous cell carcinoma of the head and neck (HNSCC). We further demonstrate the enrichment of bottleneckness, a key molecular network topology, among miR-204 gene targets. Restoration of miR-204 function in HNSCC cell lines inhibits the expression of its functionally related gene targets, leads to reduced adhesion, migration and invasion in vitro, and attenuates experimental lung metastasis in vivo. Just as importantly, our investigation also provides experimental evidence linking the function of microRNAs that are located in the cancer-associated genomic regions (CAGRs) to the observed predisposition to human cancers. Specifically, we show miR-204 may serve as a tumor suppressor gene at the 9q21.1–22.3 CAGR locus, a well-established risk factor locus in head and neck cancers for which tumor suppressor genes have not been identified.
This new strategy, which integrates expression profiling, genetics and novel computational biology approaches, provides improved efficiency in the characterization and modeling of microRNA functions in cancer compared with the state of the art, and is applicable to the investigation of microRNA functions in other biological processes and diseases.
Author Summary
MicroRNAs regulate the expression of genes in cells and are important in cancer development and progression. Designing new microRNA-based treatments requires an understanding of their mechanisms of action. Previous biological studies lack depth because only a few genes have been confirmed as microRNA targets. Additionally, key biological systems perturbed by altered microRNA functions in the context of cancer remain to be identified. Here, we demonstrate for the first time how genetic knowledge about the inheritance of cancer can be utilized jointly with data about the expression of genes in cancer samples to model deregulated microRNAs and their functions at multiple scales of biology. Our approach further uncovers previously unknown connections between microRNAs, their regulated genes, and their dynamics. Using head and neck cancer as a model, we predict the presence, functions, and gene targets of a new tumor suppressor microRNA in a cancer-associated chromosomal region where a candidate gene has not been identified. We then confirm their validity with extensive and thorough biological characterization and show attenuation of lung metastasis in mice. The discovery of molecular networks regulated by microRNAs could be exploited for the design of new treatments as an alternative to the single-gene target paradigm.
PMCID: PMC2848541  PMID: 20369013
18.  Bayesian Inference of Signaling Network Topology in a Cancer Cell Line 
Bioinformatics  2012;28(21):2804-2810.
Motivation: Protein signaling networks play a key role in cellular function, and their dysregulation is central to many diseases, including cancer. To shed light on signaling network topology in specific contexts, such as cancer, requires interrogation of multiple proteins through time and statistical approaches to make inferences regarding network structure.
Results: In this study, we use dynamic Bayesian networks to make inferences regarding network structure and thereby generate testable hypotheses. We incorporate existing biology using informative network priors, weighted objectively by an empirical Bayes approach, and exploit a connection between variable selection and network inference to enable exact calculation of posterior probabilities of interest. The approach is computationally efficient and essentially free of user-set tuning parameters. Results on data where the true, underlying network is known place the approach favorably relative to existing approaches. We apply these methods to reverse-phase protein array time-course data from a breast cancer cell line (MDA-MB-468) to predict signaling links that we independently validate using targeted inhibition. The methods proposed offer a general approach by which to elucidate molecular networks specific to biological context, including, but not limited to, human cancers.
Availability: (code and data).
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3476330  PMID: 22923301
19.  Perturbation Biology: Inferring Signaling Networks in Cellular Systems 
PLoS Computational Biology  2013;9(12):e1003290.
We present a powerful experimental-computational technology for inferring network models that predict the response of cells to perturbations, and that may be useful in the design of combinatorial therapy against cancer. The experiments are systematic series of perturbations of cancer cell lines by targeted drugs, singly or in combination. The response to perturbation is quantified in terms of relative changes in the measured levels of proteins, phospho-proteins and cellular phenotypes such as viability. Computational network models are derived de novo, i.e., without prior knowledge of signaling pathways, and are based on simple non-linear differential equations. The prohibitively large solution space of all possible network models is explored efficiently using a probabilistic algorithm, Belief Propagation (BP), which is three orders of magnitude faster than standard Monte Carlo methods. Explicit executable models are derived for a set of perturbation experiments in SKMEL-133 melanoma cell lines, which are resistant to the therapeutically important inhibitor of RAF kinase. The resulting network models reproduce and extend known pathway biology. They empower potential discoveries of new molecular interactions and predict efficacious novel drug perturbations, such as the inhibition of PLK1, which is verified experimentally. This technology is suitable for application to larger systems in diverse areas of molecular biology.
Author Summary
Drugs that target specific effects of signaling proteins are promising agents for treating cancer. One of the many obstacles facing optimal drug design is inadequate quantitative understanding of the coordinated interactions between signaling proteins. De novo model inference of network or pathway models refers to the algorithmic construction of mathematical predictive models from experimental data without dependence on prior knowledge. De novo inference is difficult because of the prohibitively large number of possible sets of interactions that may or may not be consistent with observations. Our new method overcomes this difficulty by adapting a method from statistical physics, called Belief Propagation, which first calculates probabilistically the most likely interactions in the vast space of all possible solutions, then derives a set of individual, highly probable solutions in the form of executable models. In this paper, we test this method on artificial data and then apply it to model signaling pathways in a BRAF-mutant melanoma cancer cell line based on a large set of rich output measurements from a systematic set of perturbation experiments using drug combinations. Our results are in agreement with established biological knowledge, predict novel interactions, and predict efficacious drug targets that are specific to the experimental cell line and potentially to related tumors. The method has the potential, with sufficient systematic perturbation data, to model, de novo and quantitatively, the effects of hundreds of proteins on cellular responses, on a scale that is currently unreachable in diverse areas of cell biology. In a disease context, the method is applicable to the computational design of novel combination drug treatments.
PMCID: PMC3868523  PMID: 24367245
20.  Global Quantitative Modeling of Chromatin Factor Interactions 
PLoS Computational Biology  2014;10(3):e1003525.
Chromatin is the driver of gene regulation, yet understanding the molecular interactions underlying chromatin factor combinatorial patterns (or the “chromatin codes”) remains a fundamental challenge in chromatin biology. Here we developed a global modeling framework that leverages chromatin profiling data to produce a systems-level view of the macromolecular complex of chromatin. Our model utilizes maximum entropy modeling with regularization-based structure learning to statistically dissect dependencies between chromatin factors and produce an accurate probability distribution of chromatin code. Our unsupervised quantitative model, trained on genome-wide chromatin profiles of 73 histone marks and chromatin proteins from modENCODE, enabled making various data-driven inferences about chromatin profiles and interactions. We provided a highly accurate predictor of chromatin factor pairwise interactions validated by known experimental evidence, and for the first time enabled higher-order interaction prediction. Our predictions can thus help guide future experimental studies. The model can also serve as an inference engine for predicting unknown chromatin profiles: we demonstrated that with this approach we can leverage data from well-characterized cell types to help understand less-studied cell types or conditions.
Author Summary
Chromatin, like many other molecular biological systems, is composed of multiple interacting factors. Our knowledge about chromatin factors is mostly qualitative, and such qualitative knowledge can be insufficient for predicting collective behaviors. It is also extremely challenging to study collective behaviors involving multiple interacting factors through genetic and biochemical experiments. An alternative approach is to leverage large-scale genome-wide chromatin profiles and statistical modeling to create predictive models and infer underlying interaction mechanisms based on these observed high-throughput data. In this study, we developed a novel maximum entropy-based modeling approach to quantitatively capture interactions between chromatin factors at the same genomic location, which we see as a step toward quantitative understanding of chromatin organization that involves a system of multiple interacting factors. We applied this quantitative model to successfully infer functional properties of chromatin including interactions between chromatin factors. Furthermore, the model predicts unmeasured chromatin profiles with high accuracy based on its inferred dependencies with other factors within and across cell types. Thus our modeling approach effectively utilizes large-scale chromatin profiles to dissect chromatin factor interactions and to make data-driven inferences about chromatin regulation.
PMCID: PMC3967939  PMID: 24675896
21.  Proteins Encoded in Genomic Regions Associated with Immune-Mediated Disease Physically Interact and Suggest Underlying Biology 
PLoS Genetics  2011;7(1):e1001273.
Genome-wide association studies (GWAS) have defined over 150 genomic regions unequivocally containing variation predisposing to immune-mediated disease. Inferring disease biology from these observations, however, hinges on our ability to discover the molecular processes being perturbed by these risk variants. It has previously been observed that different genes harboring causal mutations for the same Mendelian disease often physically interact. We sought to evaluate the degree to which this is true of genes within strongly associated loci in complex disease. Using sets of loci defined in rheumatoid arthritis (RA) and Crohn's disease (CD) GWAS, we build protein–protein interaction (PPI) networks for genes within associated loci and find abundant physical interactions between protein products of associated genes. We apply multiple permutation approaches to show that these networks are more densely connected than chance expectation. To confirm biological relevance, we show that the components of the networks tend to be expressed in similar tissues relevant to the phenotypes in question, suggesting the network indicates common underlying processes perturbed by risk loci. Furthermore, we show that the RA and CD networks have predictive power by demonstrating that proteins in these networks, not encoded in the confirmed list of disease associated loci, are significantly enriched for association to the phenotypes in question in extended GWAS analysis. Finally, we test our method in 3 non-immune traits to assess its applicability to complex traits in general. We find that genes in loci associated to height and lipid levels assemble into significantly connected networks but do not detect excess connectivity among Type 2 Diabetes (T2D) loci beyond chance.
Taken together, our results constitute evidence that, for many of the complex diseases studied here, common genetic associations implicate regions encoding proteins that physically interact in a preferential manner, in line with observations in Mendelian disease.
Author Summary
Genome-wide association studies have uncovered hundreds of DNA changes associated with complex disease. The ultimate promise of these studies is the understanding of disease biology; this goal, however, is not easily achieved because each disease has yielded numerous associations, each one pointing to a region of the genome, rather than a specific causal mutation. Presumably, the causal variants affect components of common molecular processes, and a first step in understanding the disease biology perturbed in patients is to identify connections among regions associated to disease. Since it has been reported in numerous Mendelian diseases that protein products of causal genes tend to physically bind each other, we chose to approach this problem using known protein–protein interactions to test whether any of the products of genes in five complex trait-associated loci bind each other. We applied several permutation methods and find robustly significant connectivity within four of the traits. In Crohn's disease and rheumatoid arthritis, we are able to show that these genes are co-expressed and that other proteins emerging in the network are enriched for association to disease. These findings suggest that, for the complex traits studied here, associated loci contain variants that affect common molecular processes, rather than distinct mechanisms specific to each association.
PMCID: PMC3020935  PMID: 21249183
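The idea of testing whether a set of genes is "more densely connected than chance expectation" can be sketched as a simple permutation test. The Python sketch below is a deliberately simplified assumption, not the authors' procedure: it compares the number of interaction edges inside a gene set with the counts obtained for random gene sets of the same size, whereas the published permutation schemes may control for further properties of the network. The function names and the toy data in the usage note are hypothetical.

```python
import random
from typing import Iterable, List, Sequence, Tuple

Edge = Tuple[str, str]


def edges_within(genes: Iterable[str], edges: Sequence[Edge]) -> int:
    """Count interaction edges whose two endpoints both lie in the gene set."""
    gene_set = set(genes)
    return sum(1 for a, b in edges if a in gene_set and b in gene_set)


def permutation_p_value(
    genes: List[str],
    all_genes: List[str],
    edges: Sequence[Edge],
    n_perm: int = 1000,
    seed: int = 0,
) -> Tuple[int, float]:
    """Return (observed edge count, empirical one-sided p-value).

    Simplifying assumption: the null distribution is built from uniformly
    random gene sets of the same size as the tested set.
    """
    rng = random.Random(seed)
    observed = edges_within(genes, edges)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, len(genes))
        if edges_within(sample, edges) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

For example, with 20 hypothetical genes `g0` to `g19` and interactions only among `g0` to `g3`, testing the set `["g0", "g1", "g2", "g3"]` yields six observed edges and a small empirical p-value, because a random set of four genes rarely recovers all six interactions.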
22.  Gene regulatory network modeling via global optimization of high-order dynamic Bayesian network 
BMC Bioinformatics  2012;13:131.
Dynamic Bayesian network (DBN) is among the mainstream approaches for modeling various biological networks, including the gene regulatory network (GRN). Most current methods for learning DBN employ either local search such as hill-climbing, or a meta stochastic global optimization framework such as genetic algorithm or simulated annealing, which are only able to locate sub-optimal solutions. Further, current DBN applications have essentially been limited to small sized networks.
To overcome the above difficulties, we introduce here a deterministic global optimization based DBN approach for reverse engineering genetic networks from time course gene expression data. For such DBN models that consist only of inter time slice arcs, we show that there exists a polynomial time algorithm for learning the globally optimal network structure. The proposed approach, named GlobalMIT+, employs the recently proposed information theoretic scoring metric named mutual information test (MIT). GlobalMIT+ is able to learn high-order time delayed genetic interactions, which are common to most biological systems. Evaluation of the approach using both synthetic and real data sets, including a 733 cyanobacterial gene expression data set, shows significantly improved performance over other techniques.
Our studies demonstrate that deterministic global optimization approaches can infer large scale genetic networks.
PMCID: PMC3433362  PMID: 22694481
23.  Network Modeling Reveals Prevalent Negative Regulatory Relationships between Signaling Sectors in Arabidopsis Immune Signaling 
PLoS Pathogens  2010;6(7):e1001011.
Biological signaling processes may be mediated by complex networks in which network components and network sectors interact with each other in complex ways. Studies of complex networks benefit from approaches in which the roles of individual components are considered in the context of the network. The plant immune signaling network, which controls inducible responses to pathogen attack, is such a complex network. We studied the Arabidopsis immune signaling network upon challenge with a strain of the bacterial pathogen Pseudomonas syringae expressing the effector protein AvrRpt2 (Pto DC3000 AvrRpt2). This bacterial strain feeds multiple inputs into the signaling network, allowing many parts of the network to be activated at once. mRNA profiles for 571 immune response genes of 22 Arabidopsis immunity mutants and wild type were collected 6 hours after inoculation with Pto DC3000 AvrRpt2. The mRNA profiles were analyzed as detailed descriptions of changes in the network state resulting from the genetic perturbations. Regulatory relationships among the genes corresponding to the mutations were inferred by recursively applying a non-linear dimensionality reduction procedure to the mRNA profile data. The resulting static network model accurately predicted 23 of 25 regulatory relationships reported in the literature, suggesting that predictions of novel regulatory relationships are also accurate. The network model revealed two striking features: (i) the components of the network are highly interconnected; and (ii) negative regulatory relationships are common between signaling sectors. Complex regulatory relationships, including a novel negative regulatory relationship between the early microbe-associated molecular pattern-triggered signaling sectors and the salicylic acid sector, were further validated. 
We propose that prevalent negative regulatory relationships among the signaling sectors make the plant immune signaling network a “sector-switching” network, which effectively balances two apparently conflicting demands, robustness against pathogenic perturbations and moderation of negative impacts of immune responses on plant fitness.
Author Summary
When a plant detects pathogen attack, this information is conveyed through a molecular signaling network to turn on a large variety of immune responses. We investigated how this plant immune signaling network was organized using the model plant Arabidopsis. Wild type and mutant plants with defects in immune signaling were challenged with a pathogen. Then, expression levels of many genes were measured using microarrays. Detailed analysis of the mutation effects on gene expression allowed us to build a signaling network model composed of the genes corresponding to the mutations. This model predicted that the network components are highly interconnected and that it is very common for network components that mediate different signaling events to inhibit each other. The prevalent signaling inhibitions in the network suggest that only part of the signaling network is usually used but that if this part is attacked by pathogens, other parts kick in and back up the function of the attacked part. We speculate that plant immune signaling is highly tolerant to pathogen attack due to this backup mechanism. We also speculate use of only part of the network at any one time helps minimize negative impacts of the immune response on plant fitness.
PMCID: PMC2908620  PMID: 20661428
24.  Inference of Gene Regulatory Networks with Sparse Structural Equation Models Exploiting Genetic Perturbations 
PLoS Computational Biology  2013;9(5):e1003068.
Integrating genetic perturbations with gene expression data not only improves accuracy of regulatory network topology inference, but also enables learning of causal regulatory relations between genes. Although a number of methods have been developed to integrate both types of data, the need for efficient and powerful algorithms remains. In this paper, sparse structural equation models (SEMs) are employed to integrate both gene expression data and cis-expression quantitative trait loci (cis-eQTL), for modeling gene regulatory networks in accordance with biological evidence about genes regulating or being regulated by a small number of genes. A systematic inference method named sparsity-aware maximum likelihood (SML) is developed for SEM estimation. Using simulated directed acyclic or cyclic networks, the SML performance is compared with that of two state-of-the-art algorithms: the adaptive Lasso (AL) based scheme, and the QTL-directed dependency graph (QDG) method. Computer simulations demonstrate that the novel SML algorithm offers significantly better performance than the AL-based and QDG algorithms across all sample sizes from 100 to 1,000, in terms of detection power and false discovery rate, in all the cases tested that include acyclic or cyclic networks of 10, 30 and 300 genes. The SML method is further applied to infer a network of 39 human genes that are related to the immune function and are chosen to have a reliable eQTL per gene. The resulting network consists of 9 genes and 13 edges. Most of the edges represent interactions reasonably expected from experimental evidence, while the remaining edges may indicate new interactions. The sparse SEM and efficient SML algorithm provide an effective means of exploiting both gene expression and perturbation data to infer gene regulatory networks. An open-source computer program implementing the SML algorithm is freely available upon request.
Author Summary
Deciphering the structure of gene regulatory networks is crucial for understanding gene functions and cellular dynamics, as well as system-level modeling of individual genes and cellular functions. Computational methods exploiting gene expression and other types of data generated from high-throughput experiments provide an efficient and low-cost means of inferring gene networks. Sparse structural equation models are employed to: i) integrate both gene expression and genetic perturbation data for inference of gene networks; and, ii) develop an efficient sparsity-aware inference algorithm. Computer simulations corroborate that the novel algorithm markedly outperforms state-of-the-art alternatives. The algorithm is further applied to infer a real human gene network unveiling possible interactions between several genes. Since gene networks can be perturbed not only by genetic variations but also by other means such as gene copy number changes, gene knockdown or controlled gene over-expression, this paper's method can be applied to a number of practical scenarios.
PMCID: PMC3662697  PMID: 23717196
25.  Modeling Signal Transduction from Protein Phosphorylation to Gene Expression 
Cancer Informatics  2014;13(Suppl 1):59-67.
Signaling networks are of great importance for us to understand the cell’s regulatory mechanism. The rise of large-scale genomic and proteomic data, and prior biological knowledge has paved the way for the reconstruction and discovery of novel signaling pathways in a data-driven manner. In this study, we investigate computational methods that integrate proteomics and transcriptomic data to identify signaling pathways transmitting signals in response to specific stimuli. Such methods can be applied to cancer genomic data to infer perturbed signaling pathways.
We proposed a novel Bayesian Network (BN) framework to integrate transcriptomic data with proteomic data reflecting protein phosphorylation states for the purpose of identifying the pathways transmitting the signal of diverse stimuli in rat and human cells. We represented the proteins and genes as nodes in a BN in which edges reflect the regulatory relationship between signaling proteins. We designed an efficient inference algorithm that incorporated the prior knowledge of pathways and searched for a network structure in a data-driven manner.
We applied our method to infer rat- and human-specific networks given gene expression and proteomic datasets. We were able to effectively identify sparse signaling networks that modeled the observed transcriptomic and proteomic data. Our methods were able to identify distinct signaling pathways for rat and human cells in a data-driven manner, based on the fact that rat and human cells exhibited distinct transcriptomic and proteomic responses to a common set of stimuli. Our model performed well in the SBV IMPROVER challenge in comparison to other models addressing the same task. The capability of inferring signaling pathways in a data-driven fashion may contribute to cancer research by identifying distinct aberrations in signaling pathways underlying heterogeneous cancer subtypes.
PMCID: PMC4216050  PMID: 25392684
Bayesian Network; signaling pathways; protein phosphorylation; gene expression; species translation
