Related Articles

1.  HepatoNet1: a comprehensive metabolic reconstruction of the human hepatocyte for the analysis of liver physiology 
We present HepatoNet1, a manually curated large-scale metabolic network of the human hepatocyte that encompasses >2500 reactions in six intracellular and two extracellular compartments. Using constraint-based modeling techniques, the network has been validated to replicate numerous metabolic functions of hepatocytes corresponding to a reference set of diverse physiological liver functions. Taking the detoxification of ammonia and the formation of bile acids as examples, we show how these liver-specific metabolic objectives can be achieved by the variable interplay of various metabolic pathways under varying conditions of nutrient and oxygen availability.
The liver has a pivotal function in metabolic homeostasis of the human body. Hepatocytes are the principal site of the metabolic conversions that underlie diverse physiological functions of the liver. These functions include provision and homeostasis of carbohydrates, amino acids, lipids and lipoproteins in the systemic blood circulation, biotransformation, plasma protein synthesis and bile formation, to name a few. Accordingly, hepatocyte metabolism integrates a vast array of differentially regulated biochemical activities and is highly responsive to environmental perturbations such as changes in portal blood composition (Dardevet et al, 2006). The complexity of this metabolic network and the numerous physiological functions to be achieved within a highly variable physiological environment necessitate an integrated approach with the aim of understanding liver metabolism at a systems level. To this end, we present HepatoNet1, a stoichiometric network of human hepatocyte metabolism characterized by (i) comprehensive coverage of known biochemical activities of hepatocytes and (ii) due representation of the biochemical and physiological functions of hepatocytes as functional network states. The network comprises 777 metabolites in six intracellular (cytosol, endoplasmic reticulum and Golgi apparatus, lysosome, mitochondria, nucleus, and peroxisome) and two extracellular compartments (bile canaliculus and sinusoidal space) and 2539 reactions, including 1466 transport reactions. It is based on the manual evaluation of >1500 original scientific research publications to warrant a high-quality evidence-based model. The final network is the result of an iterative process of data compilation and rigorous computational testing of network functionality by means of constraint-based modeling techniques. We performed flux-balance analyses to validate whether for >300 different metabolic objectives a non-zero stationary flux distribution could be established in the network. 
Figure 1 shows one such functional flux mode associated with the synthesis of the bile acid glycochenodeoxycholate, one important hepatocyte-specific physiological liver function. Besides those pathways directly linked to the synthesis of the bile acid, the mevalonate pathway and the de novo synthesis of cholesterol, the flux mode comprises additional pathways such as gluconeogenesis, the pentose phosphate pathway or the ornithine cycle because the calculations were routinely performed on a minimal set of exchangeable metabolites, that is all reactants were forced to be balanced and all exportable intermediates had to be catabolized into non-degradable end products. This example shows how HepatoNet1 under the challenges of limited exchange across the network boundary can reveal numerous cross-links between metabolic pathways traditionally perceived as separate entities. For example, alanine is used as gluconeogenic substrate to form glucose-6-phosphate, which is used in the pentose phosphate pathway to generate NADPH. The glycine moiety for bile acid conjugation is derived from serine. Conversion of ammonia into non-toxic nitrogen compounds is one central homeostatic function of hepatocytes. Using the HepatoNet1 model, we investigated, as another example of a complex metabolic objective dependent on systemic physiological parameters, how the consumption of oxygen, glucose and palmitate is affected when an external nitrogen load is converted in varying proportions to the non-toxic nitrogen compounds: urea, glutamine and alanine. The results reveal strong dependencies between the available level of oxygen and the substrate demand of hepatocytes required for effective ammonia detoxification by the liver.
Oxygen demand is highest if nitrogen is exclusively transformed into urea. At lower fluxes into urea, an intriguing pattern for oxygen demand is predicted: oxygen demand attains a minimum if the nitrogen load is directed to urea, glutamine and alanine with relative fluxes of 0.17, 0.43 and 0.40, respectively (Figure 2A). Oxygen demand in this flux distribution is four times lower than for the maximum (100% urea) and still 77 and 33% lower than using alanine and glutamine as exclusive nitrogen compounds, respectively. This computationally predicted tendency is consistent with the notion that the zonation of ammonia detoxification, that is the preferential conversion of ammonia to urea in periportal hepatocytes and to glutamine in perivenous hepatocytes, is dictated by the availability of oxygen (Gebhardt, 1992; Jungermann and Kietzmann, 2000). The decreased oxygen demand in flux distributions using higher proportions of glutamine or alanine is accompanied by increased uptake of the substrates glucose and palmitate (Figure 2B). This is due to an increased demand of energy and carbon for the amidation and transamination of glutamate and pyruvate to discharge nitrogen in the form of glutamine and alanine, respectively. In terms of both scope and specificity, our model bridges the scale between models constructed specifically to examine distinct metabolic processes of the liver and modeling based on a global representation of human metabolism. The former include models for the interdependence of gluconeogenesis and fatty-acid catabolism (Chalhoub et al, 2007), impairment of glucose production in von Gierke's and Hers' diseases (Beard and Qian, 2005) and other processes (Calik and Akbay, 2000; Stucki and Urbanczik, 2005; Ohno et al, 2008). The hallmark of these models is that each of them focuses on a small number of reactions pertinent to the metabolic function of interest embedded in a customized representation of the principal pathways of central metabolism. 
HepatoNet1 currently outperforms liver-specific models that were computationally predicted (Shlomi et al, 2008) from global reconstructions of human metabolism (Duarte et al, 2007; Ma and Goryanin, 2008). In contrast to either of the aforementioned modeling scales, HepatoNet1 provides the combination of a system-scale representation of metabolic activities and representation of the cell type-specific physical boundaries and their specific transport capacities. This allows for a highly versatile use of the model for the analysis of various liver-specific physiological functions. Conceptually, from a biological system perspective, this type of model offers a large degree of comprehensiveness while retaining tissue specificity, a fundamental design principle of mammalian metabolism. HepatoNet1 is expected to provide a structural platform for computational studies on liver function. The results presented herein highlight how internal fluxes of hepatocyte metabolism and the interplay with systemic physiological parameters can be analyzed with constraint-based modeling techniques. At the same time, the framework may serve as a scaffold for complementation of kinetic and regulatory properties of enzymes and transporters for analysis of sub-networks with topological or kinetic modeling methods.
We present HepatoNet1, the first reconstruction of a comprehensive metabolic network of the human hepatocyte that is shown to accomplish a large canon of known metabolic liver functions. The network comprises 777 metabolites in six intracellular and two extracellular compartments and 2539 reactions, including 1466 transport reactions. It is based on the manual evaluation of >1500 original scientific research publications to warrant a high-quality evidence-based model. The final network is the result of an iterative process of data compilation and rigorous computational testing of network functionality by means of constraint-based modeling techniques. Taking the hepatic detoxification of ammonia as an example, we show how the availability of nutrients and oxygen may modulate the interplay of various metabolic pathways to allow an efficient response of the liver to perturbations of the homeostasis of blood compounds.
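As a rough illustration of the stationarity condition behind the flux-balance analyses mentioned above, the following sketch checks whether a flux vector is non-zero and balances every internal metabolite (S·v = 0). The three-reaction toy network, the metabolite labels and the function names are assumptions made for illustration only; they are not taken from HepatoNet1, and a real analysis would run a linear-programming solver over the full network.

```python
# Toy stoichiometric matrix (rows: metabolites A, B; columns: reactions).
# Hypothetical network: uptake (-> A), conversion (A -> B), export (B ->).
S = [
    [1, -1, 0],   # metabolite A
    [0, 1, -1],   # metabolite B
]

def mass_balance(S, v):
    """Return the net production rate of each metabolite, i.e. S . v."""
    return [sum(s * f for s, f in zip(row, v)) for row in S]

def is_stationary(S, v, tol=1e-9):
    """True if the flux vector is non-zero and balances every metabolite."""
    balanced = all(abs(r) <= tol for r in mass_balance(S, v))
    return balanced and any(abs(f) > tol for f in v)

balanced_flux = [2.0, 2.0, 2.0]     # every unit taken up is exported
unbalanced_flux = [2.0, 1.0, 1.0]   # metabolite A would accumulate

print(is_stationary(S, balanced_flux))
print(is_stationary(S, unbalanced_flux))
print(mass_balance(S, unbalanced_flux))
```

In this toy case the first flux vector is a valid stationary state, while the second leaves a surplus of metabolite A and is rejected.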
PMCID: PMC2964118  PMID: 20823849
computational biology; flux balance; liver; minimal flux
2.  Dynamic Changes in Protein Functional Linkage Networks Revealed by Integration with Gene Expression Data 
PLoS Computational Biology  2008;4(11):e1000237.
Response of cells to changing environmental conditions is governed by the dynamics of intricate biomolecular interactions. It may be reasonable to assume, proteins being the dominant macromolecules that carry out routine cellular functions, that understanding the dynamics of protein∶protein interactions might yield useful insights into the cellular responses. The large-scale protein interaction data sets are, however, unable to capture the changes in the profile of protein∶protein interactions. In order to understand how these interactions change dynamically, we have constructed conditional protein linkages for Escherichia coli by integrating functional linkages and gene expression information. As a case study, we have chosen to analyze UV exposure in wild-type and SOS deficient E. coli at 20 minutes post irradiation. The conditional networks exhibit similar topological properties. Although the global topological properties of the networks are similar, many subtle local changes are observed, which are suggestive of the cellular response to the perturbations. Some such changes correspond to differences in the path lengths among the nodes of carbohydrate metabolism correlating with its loss in efficiency in the UV treated cells. Similarly, expression of hubs under unique conditions reflects the importance of these genes. Various centrality measures applied to the networks indicate increased importance for replication, repair, and other stress proteins for the cells under UV treatment, as anticipated. We thus propose a novel approach for studying an organism at the systems level by integrating genome-wide functional linkages and the gene expression data.
Author Summary
Many cellular processes and the response of cells to environmental cues are determined by the intricate protein∶protein interactions. These cellular protein interactions can be represented in the form of a graph, where the nodes represent the proteins and the edges signify the interactions between them. However, the available protein functional linkage maps do not incorporate the dynamics of gene expression and thus do not portray the dynamics of true protein∶protein interactions in vivo. We have used gene expression data as well as the available protein functional interaction information for Escherichia coli to build the protein interaction networks for expressed genes in a given condition. These networks, named conditional networks, capture the differences in the protein interaction networks and hence the cell physiology. Thus, by exploring the dynamics of protein interaction profiles, we hope to understand the response of cells to environmental changes.
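The idea of a conditional network can be sketched in a few lines: starting from a fixed set of functional linkages, keep only those whose two endpoints are both expressed in a given condition, then compare simple node statistics such as degree. The gene labels, the linkages and the "expressed" sets below are hypothetical placeholders, and how a gene is called expressed (for example, by an expression threshold) is an assumption not specified in the summary above.

```python
from collections import defaultdict

def conditional_network(linkages, expressed):
    """Keep only linkages whose two endpoints are both expressed."""
    return [(a, b) for a, b in linkages if a in expressed and b in expressed]

def degrees(edges):
    """Count the number of linkages per node (a simple hub indicator)."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)

# Hypothetical functional linkages between placeholder genes g1..g5.
linkages = [("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5")]

# Hypothetical sets of genes called "expressed" in two conditions.
expressed_control = {"g1", "g2", "g3", "g4"}
expressed_treated = {"g1", "g3", "g4", "g5"}

net_control = conditional_network(linkages, expressed_control)
net_treated = conditional_network(linkages, expressed_treated)
print(net_control, degrees(net_control))
print(net_treated, degrees(net_treated))
```

Even in this toy example the same linkage map yields different local structure in the two conditions, which is the kind of difference the conditional networks are meant to expose.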
PMCID: PMC2580820  PMID: 19043542
3.  The extended TILAR approach: a novel tool for dynamic modeling of the transcription factor network regulating the adaption to in vitro cultivation of murine hepatocytes 
BMC Systems Biology  2012;6:147.
Network inference is an important tool to reveal the underlying interactions of biological systems. In the liver, a complex system of transcription factors is active to distribute signals and induce the cellular response following extracellular stimuli. Plenty of information is available about single transcription factors important for the different functions of the liver, but little is known about their causal relations to each other.
Given a DNA microarray time series dataset of collagen monolayer-cultured murine hepatocytes, we identified 22 differentially expressed genes for which the corresponding protein is known to exhibit transcription factor activity. We developed the Extended TILAR (ExTILAR) network inference algorithm based on the modeling concept of the previously published TILAR algorithm. Using ExTILAR, we inferred a transcription factor network based on gene expression data which puts these important genes into a functional context. This way, we identified a previously unknown relationship between Tgif1 and Atf3 which we validated experimentally. Besides its known role in metabolic processes, this extends the knowledge about Tgif1 in hepatocytes towards a possible influence on processes such as proliferation and cell cycle. Moreover, two positive (i.e. double negative) regulatory loops were predicted that could give rise to bistable behavior. We further evaluated the performance of ExTILAR by systematic inference of an in silico network.
We present the ExTILAR algorithm, which combines the advantages of the regression-based inference algorithm TILAR, such as the ability to process large networks at low computational cost, with the advantages of dynamic network models based on ordinary differential equations (i.e. in silico knock-down simulations). Like TILAR, ExTILAR makes use of various prior-knowledge types such as transcription factor binding site information and gene interaction knowledge to infer biologically meaningful gene regulatory networks. Therefore, ExTILAR is especially useful when a large number of genes is modeled using a small number of experimental data points.
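To show why a double negative loop can give rise to bistable behavior, the sketch below integrates a generic two-gene mutual-repression system with the forward-Euler method. The Hill-type repression terms, the parameter values (a = 4, n = 2) and the function name are illustrative assumptions; this is not the ExTILAR model, and the two variables are not meant to represent any specific genes from the study.

```python
def simulate(x0, y0, a=4.0, n=2, dt=0.01, steps=5000):
    """Forward-Euler integration of two mutually repressing genes.

    dx/dt = a / (1 + y**n) - x
    dy/dt = a / (1 + x**n) - y
    """
    x, y = x0, y0
    for _ in range(steps):
        dx = a / (1.0 + y ** n) - x
        dy = a / (1.0 + x ** n) - y
        x, y = x + dt * dx, y + dt * dy
    return x, y

# Two starting points that differ only in which gene is initially higher.
high_x = simulate(2.0, 0.1)
high_y = simulate(0.1, 2.0)
print(high_x)
print(high_y)
```

With these assumed parameters the two runs settle into opposite stable states (one gene high and the other low), so the final state depends on the starting condition, which is the hallmark of bistability.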
PMCID: PMC3573979  PMID: 23190768
Gene regulation; Dynamic network inference; Transcription factor networks; Key regulator identification; Linear modeling; Least angle regression; Hepatocytes; Liver; Atf3 - activating transcription factor 3; Dbp - D site albumin promoter binding protein; Tgif1 - TGFB-induced factor homeobox 1
4.  Proteomic snapshot of the EGF-induced ubiquitin network 
In this work, the authors report the first proteome-wide analysis of EGF-regulated ubiquitination, revealing surprisingly pervasive growth factor-induced ubiquitination across a broad range of cellular systems and signaling pathways.
Epidermal growth factor (EGF) triggers a novel ubiquitin (Ub)-based signaling cascade that appears to intersect both housekeeping and regulatory circuitries of cellular physiology. The EGF-regulated Ubiproteome includes scores of ubiquitinating and deubiquitinating enzymes, suggesting that the Ub signal might be rapidly transmitted and amplified through the Ub machinery. The EGF-Ubiproteome overlaps significantly with the EGF-phosphotyrosine proteome, pointing to a possible crosstalk between these two signaling mechanisms. The significant number of biological insights uncovered in our study (among them EphA2 as a novel, downstream ubiquitinated target of EGF receptor) illustrates the general relevance of such proteomic screens and calls for further analysis of the dynamics of the Ubiproteome.
Ubiquitination is a process by which one or more ubiquitin (Ub) monomers or chains are covalently attached to target proteins by E3 ligases. Deubiquitinating enzymes (DUBs) revert Ub conjugation, thus ensuring a dynamic equilibrium between pools of ubiquitinated and deubiquitinated proteins (Amerik and Hochstrasser, 2004). Traditionally, ubiquitination has been associated with protein degradation; however, it is now becoming apparent that this post-translation modification is an important signaling mechanism that can modulate the function, localization and protein/protein interaction abilities of targets (Mukhopadhyay and Riezman, 2007; Ravid and Hochstrasser, 2008).
One of the best-characterized signaling pathways involving ubiquitination is the epidermal growth factor (EGF)-induced pathway. Upon EGF stimulation, a variety of proteins are subject to Ub modification. These include the EGF receptor (EGFR), which undergoes both multiple monoubiquitination (Haglund et al, 2003) and K63-linked polyubiquitination (Huang et al, 2006), as well as components of the downstream endocytic machinery, which are modified by monoubiquitination (Polo et al, 2002; Mukhopadhyay and Riezman, 2007). Ubiquitination of the EGFR has been shown to have an impact on receptor internalization, intracellular sorting and metabolic fate (Acconcia et al, 2009). However, little is known about the wider impact of EGF-induced ubiquitination on cellular homeostasis and on the pleiotropic biological functions of the EGFR. In this paper, we attempt to address this issue by characterizing the repertoire of proteins that are ubiquitinated upon EGF stimulation, i.e., the EGF-Ubiproteome.
To achieve this, we employed two different purification procedures (endogenous, based on the purification of proteins modified by endogenous Ub from human cells; and tandem affinity purification (TAP), based on the purification of proteins modified by an ectopically expressed tagged-Ub from mouse cells) with stable isotope labeling with amino acids in cell culture-based MS to obtain both steady-state Ubiproteomes and EGF-induced Ubiproteomes. The steady-state Ubiproteomes consist of 1175 and 582 unambiguously identified proteins for the endogenous and TAP approaches, respectively, which we largely validated. Approximately 15% of the steady-state Ubiproteome was EGF-regulated at 10 min after stimulation; 176 of 1175 in the endogenous approach and 105 of 582 in the TAP approach. Both hyper- and hypoubiquitinated proteins were detected, indicating that EGFR-mediated signaling can modulate the ubiquitin network in both directions. Interestingly, many E2, E3 and DUBs were present in the EGF-Ubiproteome, suggesting that the Ub signal might be rapidly transmitted and amplified through the Ub machinery. Moreover, analysis of Ub-chain topology, performed using mass spectrometry and specific antibodies, suggested that the K63-linkage was the major Ub-based signal in the EGF-induced pathway.
To obtain a higher-resolution molecular picture of the EGF-regulated Ub network, we performed a network analysis on the non-redundant EGF-Ubiproteome (265 proteins). This analysis revealed that in addition to well-established liaisons with endocytosis-related pathways, the EGF-Ubiproteome intersects many circuitries of intracellular signaling involved in, e.g., DNA damage checkpoint regulation, cell-to-cell adhesion mechanisms and actin remodeling (Figure 5A).
Moreover, the EGF-Ubiproteome was enriched in hubs, proteins that can establish multiple protein/protein interactions and thereby regulate the organization of networks. These results are indicative of a crosstalk between EGFR-activated pathways and other signaling pathways through the Ub-network.
As EGF binding to its receptor also triggers a series of phosphorylation events, we examined whether there was any overlap between our EGF-Ubiproteome and published EGF-induced phosphotyrosine (pY) proteomes (Blagoev et al, 2004; Oyama et al, 2009; Hammond et al, 2010). We observed a significant overlap between ubiquitinated and pY proteins: 23% (61 of 265) of the EGF-Ubiproteome proteins were also tyrosine phosphorylated. Pathway analysis of these 61 Ub/pY-containing proteins revealed a significant enrichment in endocytic and signal-transduction pathways, while ‘hub analysis’ revealed that Ub/pY-containing proteins are enriched in highly connected proteins to an even greater extent than Ub-containing proteins alone. These data point to a complex interplay between the Ub and pY networks and suggest that the flow of information from the receptor to downstream signaling molecules is driven by two complementary and interlinked enzymatic cascades: kinases/phosphatases and E3 ligases/DUBs.
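The proportions quoted in this summary can be re-derived directly from the stated counts. The short check below uses only the numbers given in the text; the helper name is an assumption for illustration.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# Counts as quoted in the text.
print(percent(176, 1175))  # EGF-regulated proteins, endogenous approach
print(percent(105, 582))   # EGF-regulated proteins, TAP approach
print(percent(61, 265))    # EGF-Ubiproteome proteins also tyrosine phosphorylated
```

The endogenous approach gives 15.0% and the TAP approach 18.0%, so the "approximately 15%" figure matches the endogenous data set most closely, while 61 of 265 corresponds to the stated 23%.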
Finally, we provided a proof of principle of the biological relevance of our EGF-Ubiproteome. We focused on EphA2, a receptor tyrosine kinase, which is involved in development and is often overexpressed in cancer (Pasquale, 2008). We started from the observation that EphA2 is present in the EGF-Ubiproteome and that proteins of the EGF-Ubiproteome are enriched in the Ephrin receptor signaling pathway(s). We confirmed the MS data by demonstrating that EphA2 is ubiquitinated upon EGF stimulation. Moreover, EphA2 also undergoes tyrosine phosphorylation, indicating crosstalk between the two receptors. The EGFR kinase domain was essential for these modifications of EphA2, and a partial co-internalization with EGFR upon EGF activation was clearly detectable. Finally, we demonstrated by knockdown of EphA2 in MCF10A cells that this receptor is critically involved in EGFR biological outcomes, such as proliferation and migration (Figure 7).
Overall, our results unveil the complex impact of growth factor signaling on Ub-based intracellular networks to levels that extend well beyond what might have been expected and highlight the ‘resource’ feature of our EGF-Ubiproteome.
The activity, localization and fate of many cellular proteins are regulated through ubiquitination, a process whereby one or more ubiquitin (Ub) monomers or chains are covalently attached to target proteins. While Ub-conjugated and Ub-associated proteomes have been described, we lack a high-resolution picture of the dynamics of ubiquitination in response to signaling. In this study, we describe the epidermal growth factor (EGF)-regulated Ubiproteome, as obtained by two complementary purification strategies coupled to quantitative proteomics. Our results unveil the complex impact of growth factor signaling on Ub-based intracellular networks to levels that extend well beyond what might have been expected. In addition to endocytic proteins, the EGF-regulated Ubiproteome includes a large number of signaling proteins, ubiquitinating and deubiquitinating enzymes, transporters and proteins involved in translation and transcription. The Ub-based signaling network appears to intersect both housekeeping and regulatory circuitries of cellular physiology. Finally, as proof of principle of the biological relevance of the EGF-Ubiproteome, we demonstrated that EphA2 is a novel, downstream ubiquitinated target of epidermal growth factor receptor (EGFR), critically involved in EGFR biological responses.
PMCID: PMC3049407  PMID: 21245847
EGF; network; proteomics; signaling; ubiquitin
5.  Modeling Drug- and Chemical-Induced Hepatotoxicity with Systems Biology Approaches 
We provide an overview of computational systems biology approaches as applied to the study of chemical- and drug-induced toxicity. The concept of “toxicity pathways” is described in the context of the 2007 US National Academies of Science report, “Toxicity testing in the 21st Century: A Vision and A Strategy.” Pathway mapping and modeling based on network biology concepts are a key component of the vision laid out in this report for a more biologically based analysis of dose-response behavior and the safety of chemicals and drugs. We focus on toxicity of the liver (hepatotoxicity) – a complex phenotypic response with contributions from a number of different cell types and biological processes. We describe three case studies of complementary multi-scale computational modeling approaches to understand perturbation of toxicity pathways in the human liver as a result of exposure to environmental contaminants and specific drugs. One approach involves development of a spatial, multicellular “virtual tissue” model of the liver lobule that combines molecular circuits in individual hepatocytes with cell–cell interactions and blood-mediated transport of toxicants through hepatic sinusoids, to enable quantitative, mechanistic prediction of hepatic dose-response for activation of the aryl hydrocarbon receptor toxicity pathway. Simultaneously, methods are being developed to extract quantitative maps of intracellular signaling and transcriptional regulatory networks perturbed by environmental contaminants, using a combination of gene expression and genome-wide protein-DNA interaction data. A predictive physiological model (DILIsym™) to understand drug-induced liver injury (DILI), the most common adverse event leading to termination of clinical development programs and regulatory actions on drugs, is also described. The model initially focuses on reactive metabolite-induced DILI in response to administration of acetaminophen, and spans multiple biological scales.
PMCID: PMC3522076  PMID: 23248599
systems toxicology; toxicity pathways; virtual liver; multi-scale modeling; drug toxicity; chemical toxicity; computational toxicology
6.  Metabolic Constraint-Based Refinement of Transcriptional Regulatory Networks 
PLoS Computational Biology  2013;9(12):e1003370.
There is a strong need for computational frameworks that integrate different biological processes and data-types to unravel cellular regulation. Current efforts to reconstruct transcriptional regulatory networks (TRNs) focus primarily on proximal data such as gene co-expression and transcription factor (TF) binding. While such approaches enable rapid reconstruction of TRNs, the overwhelming combinatorics of possible networks limits identification of mechanistic regulatory interactions. Utilizing growth phenotypes and systems-level constraints to inform regulatory network reconstruction is an unmet challenge. We present our approach Gene Expression and Metabolism Integrated for Network Inference (GEMINI) that links a compendium of candidate regulatory interactions with the metabolic network to predict their systems-level effect on growth phenotypes. We then compare predictions with experimental phenotype data to select phenotype-consistent regulatory interactions. GEMINI makes use of the observation that only a small fraction of regulatory network states are compatible with a viable metabolic network, and outputs a regulatory network that is simultaneously consistent with the input genome-scale metabolic network model, gene expression data, and TF knockout phenotypes. GEMINI preferentially recalls gold-standard interactions (p-value = 10^-172), significantly better than using gene expression alone. We applied GEMINI to create an integrated metabolic-regulatory network model for Saccharomyces cerevisiae involving 25,000 regulatory interactions controlling 1597 metabolic reactions. The model quantitatively predicts TF knockout phenotypes in new conditions (p-value = 10^-14) and revealed potential condition-specific regulatory mechanisms.
Our results suggest that a metabolic constraint-based approach can be successfully used to help reconstruct TRNs from high-throughput data, and highlight the potential of using a biochemically-detailed mechanistic framework to integrate and reconcile inconsistencies across different data-types. The algorithm and associated data are available at
Author Summary
Cellular networks, such as metabolic and transcriptional regulatory networks (TRNs), do not operate independently but work together in unison to determine cellular phenotypes. Further, the phenotype and architecture of one network constrains the topology of other networks. Hence, it is critical to study network components and interactions in the context of the entire cell. Typically, efforts to reconstruct TRNs focus only on immediately proximal data such as gene co-expression and transcription factor (TF)-binding. Herein, we take a different strategy by linking candidate TRNs with the metabolic network to predict systems-level responses such as growth phenotypes of TF knockout strains, and compare predictions with experimental phenotype data to select amongst the candidate TRNs. Our approach goes beyond traditional data integration approaches for network inference and refinement by using a predictive network model (metabolism) to refine another network model (regulation) – thus providing an alternative avenue to this area of research. Understanding how the networks function together in a cell will pave the way for synthetic biology and has a wide range of applications in biotechnology, drug discovery and diagnostics. Further, we demonstrate how metabolic models can integrate and reconcile inconsistencies across different data-types.
PMCID: PMC3857774  PMID: 24348226
7.  Construction of a computable cell proliferation network focused on non-diseased lung cells 
BMC Systems Biology  2011;5:105.
Critical to advancing the systems-level evaluation of complex biological processes is the development of comprehensive networks and computational methods to apply to the analysis of systems biology data (transcriptomics, proteomics/phosphoproteomics, metabolomics, etc.). Ideally, these networks will be specifically designed to capture the normal, non-diseased biology of the tissue or cell types under investigation, and can be used with experimentally generated systems biology data to assess the biological impact of perturbations like xenobiotics and other cellular stresses. Lung cell proliferation is a key biological process to capture in such a network model, given the pivotal role that proliferation plays in lung diseases including cancer, chronic obstructive pulmonary disease (COPD), and fibrosis. Unfortunately, no such network has been available prior to this work.
To further a systems-level assessment of the biological impact of perturbations on non-diseased mammalian lung cells, we constructed a lung-focused network for cell proliferation. The network encompasses diverse biological areas that lead to the regulation of normal lung cell proliferation (Cell Cycle, Growth Factors, Cell Interaction, Intra- and Extracellular Signaling, and Epigenetics), and contains a total of 848 nodes (biological entities) and 1597 edges (relationships between biological entities). The network was verified using four published gene expression profiling data sets associated with measured cell proliferation endpoints in lung and lung-related cell types. Predicted changes in the activity of core machinery involved in cell cycle regulation (RB1, CDKN1A, and MYC/MYCN) are statistically supported across multiple data sets, underscoring the general applicability of this approach for a network-wide biological impact assessment using systems biology data.
To the best of our knowledge, this lung-focused Cell Proliferation Network provides the most comprehensive connectivity map in existence of the molecular mechanisms regulating cell proliferation in the lung. The network is based on fully referenced causal relationships obtained from extensive evaluation of the literature. The computable structure of the network enables its application to the qualitative and quantitative evaluation of cell proliferation using systems biology data sets. The network is available for public use.
PMCID: PMC3160372  PMID: 21722388
8.  Integrating Cellular Metabolism into a Multiscale Whole-Body Model 
PLoS Computational Biology  2012;8(10):e1002750.
Cellular metabolism continuously processes an enormous range of external compounds into endogenous metabolites and is as such a key element in human physiology. The multifaceted physiological role of the metabolic network fulfilling the catalytic conversions can only be fully understood from a whole-body perspective where the causal interplay of the metabolic states of individual cells, the surrounding tissue and the whole organism are simultaneously considered. We here present an approach relying on dynamic flux balance analysis that allows the integration of metabolic networks at the cellular scale into standardized physiologically-based pharmacokinetic models at the whole-body level. To evaluate our approach we integrated a genome-scale network reconstruction of a human hepatocyte into the liver tissue of a physiologically-based pharmacokinetic model of a human adult. The resulting multiscale model was used to investigate hyperuricemia therapy, ammonia detoxification and paracetamol-induced toxication at a systems level. The specific models simultaneously integrate multiple layers of biological organization and offer mechanistic insights into pathology and medication. The approach presented may in future support a mechanistic understanding in diagnostics and drug development.
Author Summary
Cellular metabolism is a key element in human physiology. Ideally the metabolic network needs to be considered within the context of the surrounding tissue and organism since the various levels of biological organization are mutually influencing each other. To mechanistically describe the interplay between intracellular space and extracellular environment, we here integrate the genome-scale metabolic network model HepatoNet1 at the cellular scale into physiologically-based pharmacokinetic models at the whole-body level. The resulting multiscale model allows the quantitative description of metabolic behavior in the context of time-resolved metabolite concentration profiles in the body and the surrounding liver tissue. The model has been applied to three case studies covering fundamental aspects of medicine and pharmacology: drug administration, biomarker identification and drug-induced toxication. Most notably, our multiscale approach fosters an improved quantitative understanding of drug action and the impact of metabolic disorders at an organism level, based on a genome-scale representation of cellular metabolism. Computational models such as the one presented include various aspects of human physiology and may therefore significantly support rational approaches in medical diagnostics and pharmaceutical drug development in the future.
PMCID: PMC3486908  PMID: 23133351
9.  Integrated Module and Gene-Specific Regulatory Inference Implicates Upstream Signaling Networks 
PLoS Computational Biology  2013;9(10):e1003252.
Regulatory networks that control gene expression are important in diverse biological contexts including stress response and development. Each gene's regulatory program is determined by module-level regulation (e.g. co-regulation via the same signaling system), as well as gene-specific determinants that can fine-tune expression. We present a novel approach, Modular regulatory network learning with per gene information (MERLIN), that infers regulatory programs for individual genes while probabilistically constraining these programs to reveal module-level organization of regulatory networks. Using edge-, regulator- and module-based comparisons of simulated networks of known ground truth, we find MERLIN reconstructs regulatory programs of individual genes as well as or better than existing approaches of network reconstruction, while additionally identifying modular organization of the regulatory networks. We use MERLIN to dissect global transcriptional behavior in two biological contexts: yeast stress response and human embryonic stem cell differentiation. Regulatory modules inferred by MERLIN capture co-regulatory relationships between signaling proteins and downstream transcription factors, thereby revealing the upstream signaling systems controlling transcriptional responses. The inferred networks are enriched for regulators with genetic or physical interactions, supporting the inference, and identify modules of functionally related genes bound by the same transcriptional regulators. Our method combines the strengths of per-gene and per-module methods to reveal new insights into transcriptional regulation in stress and development.
Author Summary
The state of a cell is largely determined by the genes the cell expresses. Transcriptional control of gene expression is exerted by transcription factor proteins that bind to regulatory regions of genes and affect their expression. Transcriptional programs have a modular organization enabling multiple genes to be coordinately regulated, and at the same time are fine-tuned for each gene through interactions of transcription factors with a gene's regulatory region. Transcription factors are themselves controlled by upstream signaling proteins, that in turn can be transcriptionally controlled. This complex process of gene expression control is described by a regulatory network that captures who regulates whom. A key challenge in systems biology is to reconstruct regulatory networks that capture precise gene-specific regulatory information, as well as the modular organization of transcriptional programs. We developed a novel regulatory network inference approach, MERLIN, Modular regulatory network learning with per gene information. When applied to examine transcriptional responses in two distinct processes, stress response and cellular differentiation, MERLIN accurately reconstructed regulatory programs of individual genes while revealing regulatory module organization and predicted upstream signaling proteins for regulatory modules. MERLIN is applicable to different environmental, developmental and disease contexts to dissect regulatory programs and ultimately build network-based predictive models of cellular states.
PMCID: PMC3798279  PMID: 24146602
10.  Identifying Tightly Regulated and Variably Expressed Networks by Differential Rank Conservation (DIRAC) 
PLoS Computational Biology  2010;6(5):e1000792.
A powerful way to separate signal from noise in biology is to convert the molecular data from individual genes or proteins into an analysis of comparative biological network behaviors. One of the limitations of previous network analyses is that they do not take into account the combinatorial nature of gene interactions within the network. We report here a new technique, Differential Rank Conservation (DIRAC), which permits one to assess these combinatorial interactions to quantify various biological pathways or networks in a comparative sense, and to determine how they change in different individuals experiencing the same disease process. This approach is based on the relative expression values of participating genes—i.e., the ordering of expression within network profiles. DIRAC provides quantitative measures of how network rankings differ either among networks for a selected phenotype or among phenotypes for a selected network. We examined disease phenotypes including cancer subtypes and neurological disorders and identified networks that are tightly regulated, as defined by high conservation of transcript ordering. Interestingly, we observed a strong trend to looser network regulation in more malignant phenotypes and later stages of disease. At a sample level, DIRAC can detect a change in ranking between phenotypes for any selected network. Variably expressed networks represent statistically robust differences between disease states and serve as signatures for accurate molecular classification, validating the information about expression patterns captured by DIRAC. Importantly, DIRAC can be applied not only to transcriptomic data, but to any ordinal data type.
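The rank-based idea described above can be illustrated with a small sketch. It assumes that network ordering is summarized by pairwise comparisons between genes, that a phenotype's template is the majority ordering of each pair, and that conservation is the mean fraction of pairs matching that template. All function names and the toy expression values are hypothetical and are not taken from the DIRAC software.

```python
from itertools import combinations

def pairwise_orderings(profile):
    """One boolean per gene pair (i < j): True if gene i is expressed below gene j."""
    return [profile[i] < profile[j] for i, j in combinations(range(len(profile)), 2)]

def rank_template(samples):
    """Majority-vote ordering of every gene pair across the samples of one phenotype."""
    orderings = [pairwise_orderings(s) for s in samples]
    n = len(orderings)
    return [sum(column) * 2 > n for column in zip(*orderings)]

def rank_matching_score(profile, template):
    """Fraction of gene pairs in one sample whose ordering agrees with the template."""
    ordering = pairwise_orderings(profile)
    matches = sum(a == b for a, b in zip(ordering, template))
    return matches / len(template)

def rank_conservation_index(samples):
    """Mean matching score over all samples of a phenotype (1.0 = identical orderings)."""
    template = rank_template(samples)
    scores = [rank_matching_score(s, template) for s in samples]
    return sum(scores) / len(scores)

# Toy three-gene network measured in three samples per phenotype (made-up values).
tight = [[1.0, 2.0, 3.0], [0.5, 1.5, 4.0], [2.0, 2.5, 9.0]]
loose = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 3.0, 1.0]]

tight_index = rank_conservation_index(tight)
loose_index = rank_conservation_index(loose)
print(tight_index, loose_index)
```

In this toy example the first phenotype keeps the same gene ordering in every sample and therefore scores higher than the second, which mirrors the distinction between tightly and loosely regulated networks. Because only orderings are used, the same calculation applies to any ordinal data type.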
Author Summary
The systems approach to medicine derives from the idea that diseased cells arise from one or more perturbed biological networks due to the net effect of interactions among multiple molecular agents; by measuring differences in the abundance of biomolecules (e.g., mRNA, proteins, metabolites) we can identify reporters of network states and uncover molecular signatures of disease. However, a major limitation of previously published network analyses is the focus on small numbers of individual, differentially-expressed genes, hence the failure to take into account combinatorial interactions. We report a new technique, Differential Rank Conservation, for identifying and measuring network-level perturbations. Our rank conservation index is based entirely on the relative levels of expression for participating genes and allows us to detect differences in network orderings between networks for a given phenotype and between phenotypes for a given network. In examining cancer subtypes and neurological disorders, we identified networks that are tightly and loosely regulated, as defined by the level of conservation of transcript ordering, and observed a strong trend to looser network regulation in more malignant phenotypes and later stages of disease. We also demonstrate that variably expressed networks represent robust differences between disease states.
PMCID: PMC2877722  PMID: 20523739
11.  Genes related to apoptosis predict necrosis of the liver as a phenotype observed in rats exposed to a compendium of hepatotoxicants 
BMC Genomics  2008;9:288.
Some of the biochemical events that lead to necrosis of the liver are well-known. However, the pathogenesis of necrosis of the liver from exposure to hepatotoxicants is a complex biological response to the injury. We hypothesize that gene expression profiles can serve as a signature to predict the level of necrosis elicited by acute exposure of rats to a variety of hepatotoxicants and postulate that the expression profiles of the predictor genes in the signature can provide insight to some of the biological processes and molecular pathways that may be involved in the manifestation of necrosis of the rat liver.
Rats were treated individually with one of seven known hepatotoxicants and were analyzed for gene expression by microarray. Liver samples were grouped by the level of necrosis exhibited in the tissue. Analysis of significantly differentially expressed genes between adjacent necrosis levels revealed that inflammation follows programmed cell death in response to the agents. Using a Random Forest classifier with feature selection, 21 informative genes were identified which achieved 90%, 80% and 60% prediction accuracies of necrosis against independent test data derived from the livers of rats exposed to acetaminophen, carbon tetrachloride, and allyl alcohol, respectively. Pathway and gene network analyses of the genes in the signature revealed several gene interactions suggestive of apoptosis as a process possibly involved in the manifestation of necrosis of the liver from exposure to the hepatotoxicants. Cytotoxic effects of TNF-α, as well as transcriptional regulation by JUN and TP53, and apoptosis-related genes possibly lead to necrosis.
The data analysis, gene selection and prediction approaches permitted grouping of the classes of rat liver samples exhibiting necrosis to improve the accuracy of predicting the level of necrosis as a phenotypic end-point observed from the exposure. The strategy, along with pathway analysis and gene network reconstruction, led to the identification of 1) expression profiles of genes as a signature of necrosis and 2) perturbed regulatory processes that exhibited biological relevance to the manifestation of necrosis from exposure of rat livers to the compendium of hepatotoxicants.
PMCID: PMC2478688  PMID: 18558008
12.  Coordination of frontline defense mechanisms under severe oxidative stress 
Inference of an environmental and gene regulatory influence network (EGRINOS) by integrating transcriptional responses to H2O2 and paraquat (PQ) has revealed a multi-tiered oxidative stress (OS)-management program to transcriptionally coordinate three peroxidase/catalase enzymes, two superoxide dismutases, production of rhodopsins, carotenoids and gas vesicles, metal trafficking, and various other aspects of metabolism. ChIP-chip, microarray, and survival assays have validated important architectural aspects of this network, identified novel defense mechanisms (including two evolutionarily distant peroxidase enzymes), and shown that general transcription factors of the transcription factor B family have an important function in coordinating the OS response (OSR) despite their inability to directly sense reactive oxygen species (ROS). A comparison of transcriptional responses to sub-lethal doses of H2O2 and PQ with predictions of these responses made by an EGRIN model generated earlier from responses to other environmental factors has confirmed that a significant fraction of the OSR is made up of a generalized component that is also observed in response to other stressors. Analysis of active regulons within EGRINOS across diverse environmental conditions has identified the specialized component of the OSR that is triggered by sub-lethal OS, but not by other stressors, including sub-inhibitory levels of redox-active metals, extreme changes in oxygen tension, and a sub-lethal dose of γ rays.
Reactive oxygen species (ROS), such as hydrogen peroxide (H2O2), superoxide (O2−), and hydroxyl (OH•) radicals, are normal by-products of aerobic metabolism. Evolutionarily conserved mechanisms including detoxification enzymes (peroxidase/catalase and superoxide dismutase (SOD)) and free radical scavengers manage this endogenous production of ROS. OS is a condition reached when certain environmental stresses or genetic defects cause the production of ROS to exceed the management capacity. The damage to diverse cellular components including DNA, proteins, lipids, and carbohydrates resulting from OS (Imlay, 2003; Apel and Hirt, 2004; Perrone et al, 2008) is recognized as an important player in many diseases and in the aging process (Finkel, 2005).
We have applied a systems approach to characterize the OSR of an archaeal model organism, Halobacterium salinarum NRC-1. This haloarchaeon grows aerobically at 4.3 M salt concentration in which it routinely faces cycles of desiccation and rehydration, and increased ultraviolet radiation—both of which can increase the production of ROS (Farr and Kogoma, 1991; Oliver et al, 2001). We have reconstructed the physiological adjustments associated with management of excessive OS through the analysis of global transcriptional changes elicited by step exposure to growth sub-inhibitory and sub-lethal levels of H2O2 and PQ (a redox-cycling drug that produces O2−; Hassan and Fridovich, 1979) as well as during subsequent recovery from these stresses. We have integrated all of these data into a unified model for OSR to discover conditional functional links between protective mechanisms and normal aspects of metabolism. Subsequent phenotypic analysis of gene deletion strains has verified the conditional detoxification functions of three putative peroxidase/catalase enzymes, two SODs, and the protective function of rhodopsins under increased levels of H2O2 and PQ. Similarly, we have also validated ROS scavenging by carotenoids and flotation by gas vesicles as secondary mechanisms that may minimize OS.
Given the ubiquitous nature of OS, it is not entirely surprising that most organisms have evolved similar multiple lines of defense—both passive and active. Although such mechanisms have been extensively characterized using other model organisms, our integrated systems approach has uncovered additional protective mechanisms in H. salinarum (e.g. two evolutionarily distant peroxidase/catalase enzymes) and revealed a structure and hierarchy to the OSR through conditional regulatory associations among various components of the response. We have validated some aspects of the architecture of the regulatory network for managing OS by confirming physical protein–DNA interactions of six transcription factors (TFs) with promoters of genes they were predicted to influence in EGRINOS. Furthermore, we have also shown the consequence of deleting two of these TFs on transcript levels of genes they control and survival rate under OS. It is notable that these TFs are not directly associated with sensing ROS, but, rather, they have a general function in coordinating the overall response. This insight would not have been possible without constructing EGRINOS through systems integration of diverse datasets.
Although it has been known that OS is a component of diverse environmental stress conditions, we quantitatively show for the first time that much of the transcriptional responses induced by the two treatments could indeed have been predicted using a model constructed from the analysis of transcriptional responses to changes in other environmental factors (UV and γ-radiation, light, oxygen, and six metals). However, using specific examples we also reveal the specific components of the OSR that are triggered only under severe OS. Notably, this model of OSR gives a unified perspective of the interconnections among all of these generalized and OS-specific regulatory mechanisms.
Complexity of cellular response to oxidative stress (OS) stems from its wide-ranging damage to nucleic acids, proteins, carbohydrates, and lipids. We have constructed a systems model of OS response (OSR) for Halobacterium salinarum NRC-1 in an attempt to understand the architecture of its regulatory network that coordinates this complex response. This has revealed a multi-tiered OS-management program to transcriptionally coordinate three peroxidase/catalase enzymes, two superoxide dismutases, production of rhodopsins, carotenoids and gas vesicles, metal trafficking, and various other aspects of metabolism. Through experimental validation of interactions within the OSR regulatory network, we show that despite their inability to directly sense reactive oxygen species, general transcription factors have an important function in coordinating this response. Remarkably, a significant fraction of this OSR was accurately recapitulated by a model that was earlier constructed from cellular responses to diverse environmental perturbations—this constitutes the general stress response component. Notwithstanding this observation, comparison of the two models has identified the coordination of frontline defense and repair systems by regulatory mechanisms that are triggered uniquely by severe OS and not by other environmental stressors, including sub-inhibitory levels of redox-active metals, extreme changes in oxygen tension, and a sub-lethal dose of γ rays.
PMCID: PMC2925529  PMID: 20664639
gene regulatory network; microbiology; oxidative stress
13.  Quantification of mRNA and protein and integration with protein turnover in a bacterium 
Determination of the average cellular copy number of 400 proteins under different growth conditions and integration with protein turnover and absolute mRNA levels reveals the dynamics of protein expression in the genome-reduced bacterium Mycoplasma pneumoniae.
Our study provides a fine-grained, quantitative picture in unprecedented detail of an established model organism for systems-wide studies. Our integrative approach reveals a novel, dynamic view on the processes, interactions and regulations underlying the central dogma pathway and the composition of protein complexes. Simulations using our quantitative data on mRNA, protein and turnover show how an organism copes with stochastic noise in gene expression in vivo. Our data serve as an important resource for colleagues both within our field of research and in related disciplines.
A hallmark of Systems Biology is the integration of diverse, large quantitative data sets with the aim of gaining novel insights into how biological processes work. We measured individual mRNA and protein abundances as well as protein turnover in the bacterium Mycoplasma pneumoniae. This human pathogen is an ideal model organism for organism-wide studies. It can be readily cultured under laboratory conditions and it has a very small genome with only 690 protein-coding genes. This comparatively low complexity allows for the exhaustive analysis of major cellular biomolecules, avoiding constraints introduced by limitations of available analysis techniques.
Using a recently developed mass spectrometry-based approach, we determined the average cellular copy number for over 400 individual proteins under different growth and stress conditions. The 20 most abundant proteins, including Elongation factor Tu, cellular chaperones, and proteins involved in metabolizing glucose, the major energy source of M. pneumoniae, account for nearly 44% of the total cellular protein mass. We observed abundance changes of many expected and several unexpected proteins in response to cellular stress, such as heat shock, DNA damage and osmotic stress, as well as along batch culture growth over 4 days.
Integration of the protein abundance data with quantitative mRNA measurements revealed a modest correlation between these two classes of biomolecules. However, for several classical stress-induced proteins, we observed a correlated induction of mRNA and protein in response to heat shock. A focused analysis of mRNA–protein abundance dynamics during batch culture growth suggested that the regulation of gene expression is largely decoupled from protein dynamics in M. pneumoniae, indicating extensive post-transcriptional and post-translational regulation influencing the cellular mRNA–protein ratios.
To investigate the factors influencing the cellular protein abundance, we measured individual protein turnover rates by mass spectrometry using a label-chase approach involving stable isotope-labelled amino acids. The average half-life of a protein in M. pneumoniae is 23 h. Based on the measured quantitative mRNA data, the protein abundances and their half-lives, we established an ordinary differential equations model for the estimation of individual in vivo protein degradation and translation efficiency rates. We found that translation efficiency rather than protein turnover is the dominating factor influencing protein abundance. Using our abundance and turnover data, we additionally performed stochastic simulations of gene expression. We observed that long protein half-life and low translational efficiency buffer gene expression noise propagating from low cellular mRNA levels in vivo.
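As a minimal sketch of how a half-life and a translation rate combine in an ordinary differential equation of this kind, the example below assumes the simplest one-gene form, dP/dt = k_tl * mRNA - k_deg * P, with first-order degradation k_deg = ln(2) / half-life. This form, the function names, and the mRNA copy number and translation rate are illustrative assumptions; only the 23 h half-life comes from the text, and dilution by cell growth is ignored.

```python
import math

def degradation_rate(half_life_h):
    """First-order degradation constant (per hour) from a half-life in hours."""
    return math.log(2) / half_life_h

def steady_state_protein(mrna, k_tl, half_life_h):
    """Protein copies at which synthesis (k_tl * mrna) balances degradation."""
    return k_tl * mrna / degradation_rate(half_life_h)

def simulate_protein(mrna, k_tl, half_life_h, hours, dt=0.01, p0=0.0):
    """Forward-Euler integration of dP/dt = k_tl * mrna - k_deg * P."""
    k_deg = degradation_rate(half_life_h)
    protein = p0
    for _ in range(int(round(hours / dt))):
        protein += dt * (k_tl * mrna - k_deg * protein)
    return protein

# Hypothetical gene: 0.5 mRNA copies per cell, 10 proteins per mRNA per hour,
# and the 23 h average half-life reported in the abstract.
p_star = steady_state_protein(mrna=0.5, k_tl=10.0, half_life_h=23.0)
p_after_one_half_life = simulate_protein(mrna=0.5, k_tl=10.0, half_life_h=23.0, hours=23.0)
print(round(p_star, 1), round(p_after_one_half_life / p_star, 3))
```

Under these assumptions the steady-state abundance scales linearly with both the translation rate and the half-life, and a protein starting from zero reaches half of its steady-state level after one half-life.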
We compared the abundance ratios of proteins associating into complexes in vivo with their expected functional stoichiometries. We observed that for stable protein complexes, such as the GroEL/ES chaperonin or DNA gyrase, our measured abundance ratios reflected the expected subunit stoichiometries. More dynamic protein complexes, such as the DnaK/J/GrpE chaperone system or RNA polymerase, showed several unusual subunit ratios, pointing towards transient interaction of sub-stoichiometric subunits for function. A detailed, quantitative analysis of the ribosome, the largest cellular protein complex, revealed large abundance differences of the 51 subunits. This observation indicates a multi-functionality for several, abundant ribosomal proteins.
Finally, a comparison of the determined average cellular protein abundances with a different pathogenic bacterium, Leptospira interrogans, revealed that cellular protein abundances closely reflect their respective lifestyles.
Our study represents an organism-wide, quantitative analysis of cellular protein abundances. Integrating our proteomics data with determined mRNA levels and protein turnover rates reveals insights into the dynamic interplay and regulation of mRNA and proteins, the central biomolecules of a cell.
Biological function and cellular responses to environmental perturbations are regulated by a complex interplay of DNA, RNA, proteins and metabolites inside cells. To understand these central processes in living systems at the molecular level, we integrated experimentally determined abundance data for mRNA, proteins, as well as individual protein half-lives from the genome-reduced bacterium Mycoplasma pneumoniae. We provide a fine-grained, quantitative analysis of basic intracellular processes under various external conditions. Proteome composition changes in response to cellular perturbations reveal specific stress response strategies. The regulation of gene expression is largely decoupled from protein dynamics and translation efficiency has a higher regulatory impact on protein abundance than protein turnover. Stochastic simulations using in vivo data show how low translation efficiency and long protein half-lives effectively reduce biological noise in gene expression. Protein abundances are regulated in functional units, such as complexes or pathways, and reflect cellular lifestyles. Our study provides a detailed integrative analysis of average cellular protein abundances and the dynamic interplay of mRNA and proteins, the central biomolecules of a cell.
PMCID: PMC3159969  PMID: 21772259
mRNA–protein; Mycoplasma pneumoniae; protein homeostasis; protein turnover; quantitative proteomics
14.  Genomic Analysis Reveals a Potential Role for Cell Cycle Perturbation in HCV-Mediated Apoptosis of Cultured Hepatocytes 
PLoS Pathogens  2009;5(1):e1000269.
The mechanisms of liver injury associated with chronic HCV infection, as well as the individual roles of both viral and host factors, are not clearly defined. However, it is becoming increasingly clear that direct cytopathic effects, in addition to immune-mediated processes, play an important role in liver injury. Gene expression profiling during multiple time-points of acute HCV infection of cultured Huh-7.5 cells was performed to gain insight into the cellular mechanism of HCV-associated cytopathic effect. Maximal induction of cell-death–related genes and appearance of activated caspase-3 in HCV-infected cells coincided with peak viral replication, suggesting a link between viral load and apoptosis. Gene ontology analysis revealed that many of the cell-death genes function to induce apoptosis in response to cell cycle arrest. Labeling of dividing cells in culture followed by flow cytometry also demonstrated the presence of significantly fewer cells in S-phase in HCV-infected relative to mock cultures, suggesting HCV infection is associated with delayed cell cycle progression. Regulation of numerous genes involved in anti-oxidative stress response and TGF-β1 signaling suggest these as possible causes of delayed cell cycle progression. Significantly, a subset of cell-death genes regulated during in vitro HCV infection was similarly regulated specifically in liver tissue from a cohort of HCV-infected liver transplant patients with rapidly progressive fibrosis. Collectively, these data suggest that HCV mediates direct cytopathic effects through deregulation of the cell cycle and that this process may contribute to liver disease progression. This in vitro system could be utilized to further define the cellular mechanism of this perturbation.
Author Summary
Chronic HCV infection is associated with progressive liver injury and subsequent development of fibrosis/cirrhosis. The cellular mechanisms by which HCV replication, and subsequent virus–host interactions, may mediate liver injury are unclear. Microarray experiments were performed to characterize the host transcriptional response to HCV infection of cultured hepatocytes in an attempt to gain insight into the mechanism of HCV-associated cell death. Analysis of the gene expression data revealed that many differentially regulated genes function to induce apoptosis in response to cell cycle arrest, possibly in response to DNA damage and oxidative stress. Labeling of dividing cells in culture followed by flow cytometry also demonstrated the presence of significantly fewer cells in S-phase in HCV-infected cultures relative to mock cultures, suggesting HCV infection is associated with delayed cell cycle progression. Finally, many of the cell-death–related genes whose expression changes in response to HCV infection of cultured hepatocytes were also differentially regulated in liver tissue from HCV-infected patients with histological evidence of fibrosis. In summary, HCV may mediate direct cytopathic effects through perturbation of the cell cycle which potentially contributes to liver disease progression.
PMCID: PMC2613535  PMID: 19148281
15.  Successful mouse hepatocyte culture with sandwich collagen gel formation 
Primary mammalian hepatocytes largely retain their liver-specific functions when they are freshly derived from donors. However, long-term cultures of functional hepatocytes are difficult to establish. To increase the longevity and maintain the differentiated functions of hepatocytes in primary culture, cells can be cultured in a sandwich configuration of collagen. In such a configuration, hepatocytes can be cultured for longer periods compared with cultures on single layers of collagen. However, research regarding mouse hepatocytes in sandwich culture is lacking.
Primary mouse hepatocytes were sandwiched between two layers of collagen to maintain the stability of their liver-specific functions. After gelation, 2 mL of hepatocyte culture medium was applied.
After 24 hours, 5 days, and 10 days of culture, the collagen gel sandwich maintained the cellular border and numbers of bile canaliculi more efficiently than a single collagen coating in both high- and low-density culture dishes. Reverse transcription-polymerase chain reaction analysis of alpha-1-antitrypsin (AAT), hepatocyte nuclear factor 4 alpha (HNF4A), alpha-fetoprotein, albumin, tryptophan oxygenase (TO), tyrosine aminotransferase, glucose-6-phosphatase, and glyceraldehyde-3-phosphate dehydrogenase in mouse primary hepatocytes cultured on collagen-coated dishes and collagen gels showed superior hepatocyte-related gene expression in cells grown using the collagen gel sandwich culture system. AAT, HNF4A, albumin, and TO were expressed in mouse hepatocytes cultured on collagen gels for 5 and 10 days. In contrast, mouse hepatocytes grown on collagen-coated dishes did not express these genes after 5 and 10 days of culture.
The collagen gel sandwich method is a suitable primary culture system for adult mouse hepatocytes.
PMCID: PMC3616273  PMID: 23577314
Collagen; Culture; Hepatocyte
16.  Network modeling of the transcriptional effects of copy number aberrations in glioblastoma 
DNA copy number aberrations (CNAs) are a characteristic feature of cancer genomes. In this work, Rebecka Jörnsten, Sven Nelander and colleagues combine network modeling and experimental methods to analyze the systems-level effects of CNAs in glioblastoma.
We introduce a modeling approach termed EPoC (Endogenous Perturbation analysis of Cancer), enabling the construction of global, gene-level models that causally connect gene copy number with expression in glioblastoma. On the basis of the resulting model, we predict genes that are likely to be disease-driving and validate selected predictions experimentally. We also demonstrate that further analysis of the network model by sparse singular value decomposition allows stratification of patients with glioblastoma into short-term and long-term survivors, introducing decomposed network models as a useful principle for biomarker discovery. Finally, in systematic comparisons, we demonstrate that EPoC is computationally efficient and yields more consistent results than mRNA-only methods, standard eQTL methods, and two recent multivariate methods for genotype–mRNA coupling.
Gains and losses of chromosomal material (DNA copy number aberrations; CNAs) are a characteristic feature of cancer genomes. At the level of a single locus, it is well known that increased copy number (gene amplification) typically leads to increased gene expression, whereas decreased copy number (gene deletion) leads to decreased gene expression (Pollack et al, 2002; Lee et al, 2008; Nilsson et al, 2008). However, CNAs also affect the expression of genes located outside the amplified/deleted region itself via indirect mechanisms. To fully understand the action of CNAs, it is therefore necessary to analyze their action in a network context. Toward this goal, improved computational approaches will be important, if not essential.
To determine the global effects on transcription of CNAs in the brain tumor glioblastoma, we develop EPoC (Endogenous Perturbation analysis of Cancer), a computational technique capable of inferring sparse, causal network models by combining genome-wide, paired CNA- and mRNA-level data. EPoC aims to detect disease-driving copy number aberrations and their effect on target mRNA expression, and stratify patients into long-term and short-term survivors. Technically, EPoC relates CNA perturbations to mRNA responses by matrix equations, derived from a steady-state approximation of the transcriptional network. Patient prognostic scores are obtained from singular value decompositions of the network matrix. The models are constructed by solving a large-scale, regularized regression problem.
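The steady-state relation mentioned above can be written out as a worked equation. The linearized form below is one common way to express such a model and is an assumption for illustration, not necessarily the authors' exact formulation. The symbols are hypothetical: y is the vector of mRNA responses, u the vector of copy number perturbations, A a sparse gene-to-gene interaction matrix, and G the resulting matrix mapping CNAs to expression.

```latex
% Assumed linearized transcriptional dynamics driven by copy number perturbations u
\frac{dy}{dt} \approx A\,y + u
% Steady-state approximation: set the time derivative to zero
0 = A\,y + u \quad\Longrightarrow\quad y = -A^{-1}u = G\,u
% Singular value decomposition of the network matrix
G = U\,\Sigma\,V^{\top}
```

Under these assumptions, a sparse estimate of A or G could be obtained from paired CNA and mRNA data by regularized regression, and the leading singular vectors of G would supply the directions onto which patient profiles are projected to form prognostic scores.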
We apply EPoC to glioblastoma data from The Cancer Genome Atlas (TCGA) consortium (186 patients). The identified CNA-driven network comprises 10 672 genes, and contains a number of copy number-altered genes that control multiple downstream genes. Highly connected hub genes include well-known oncogenes and tumor suppressor genes that are frequently deleted or amplified in glioblastoma, including EGFR, PDGFRA, CDKN2A and CDKN2B, confirming a clear association between these aberrations and transcriptional variability of these brain tumors. In addition, we identify a number of hub genes that have previously not been associated with glioblastoma, including interferon alpha 1 (IFNA1), myeloid/lymphoid or mixed-lineage leukemia translocated to 10 (MLLT10, a well-known leukemia gene), glutamate decarboxylase 2 (GAD2), a postulated glutamate receptor (GPR158) and Necdin (NDN). Furthermore, we demonstrate that the network model contains useful information on downstream target genes (including stem cell regulators), and possible drug targets.
We proceed to explore the validity of a small network region experimentally. Introducing experimental perturbations of NDN and other targets in four glioblastoma cell lines (T98G, U-87MG, U-343MG and U-373MG), we confirm several predicted mechanisms. We also demonstrate that the TCGA glioblastoma patients can be stratified into long-term and short-term survivors, using our proposed prognostic scores derived from a singular value decomposition of the network model. Finally, we compare EPoC to existing methods for mRNA network analysis and expression quantitative trait locus (eQTL) methods, and demonstrate that EPoC produces more consistent models between technically independent glioblastoma data sets, and that the EPoC models exhibit better overlap with known protein–protein interaction networks and pathway maps.
In summary, we conclude that large-scale integrative modeling reveals mechanistically and prognostically informative networks in human glioblastoma. Our approach operates at the gene level and our data support that individual hub genes can be identified in practice. Very large aberrations, however, cannot be fully resolved by the current modeling strategy.
DNA copy number aberrations (CNAs) are a hallmark of cancer genomes. However, little is known about how such changes affect global gene expression. We develop a modeling framework, EPoC (Endogenous Perturbation analysis of Cancer), to (1) detect disease-driving CNAs and their effect on target mRNA expression, and to (2) stratify cancer patients into long- and short-term survivors. Our method constructs causal network models of gene expression by combining genome-wide DNA- and RNA-level data. Prognostic scores are obtained from a singular value decomposition of the networks. By applying EPoC to glioblastoma data from The Cancer Genome Atlas consortium, we demonstrate that the resulting network models contain known disease-relevant hub genes, reveal interesting candidate hubs, and uncover predictors of patient survival. Targeted validations in four glioblastoma cell lines support selected predictions, and implicate the p53-interacting protein Necdin in suppressing glioblastoma cell growth. We conclude that large-scale network modeling of the effects of CNAs on gene expression may provide insights into the biology of human cancer. Free software in MATLAB and R is provided.
PMCID: PMC3101951  PMID: 21525872
cancer biology; cancer genomics; glioblastoma
17.  A Genomewide Functional Network for the Laboratory Mouse 
PLoS Computational Biology  2008;4(9):e1000165.
Establishing a functional network is invaluable to our understanding of gene function, pathways, and systems-level properties of an organism and can be a powerful resource in directing targeted experiments. In this study, we present a functional network for the laboratory mouse based on a Bayesian integration of diverse genetic and functional genomic data. The resulting network includes probabilistic functional linkages among 20,581 protein-coding genes. We show that this network can accurately predict novel functional assignments and network components and present experimental evidence for predictions related to Nanog homeobox (Nanog), a critical gene in mouse embryonic stem cell pluripotency. An analysis of the global topology of the mouse functional network reveals multiple biologically relevant systems-level features of the mouse proteome. Specifically, we identify the clustering coefficient as a critical characteristic of central modulators that affect diverse pathways as well as genes associated with different phenotype traits and diseases. In addition, a cross-species comparison of functional interactomes on a genomic scale revealed distinct functional characteristics of conserved neighborhoods as compared to subnetworks specific to higher organisms. Thus, our global functional network for the laboratory mouse provides the community with a key resource for discovering protein functions and novel pathway components as well as a tool for exploring systems-level topological and evolutionary features of cellular interactomes. To facilitate exploration of this network by the biomedical research community, we illustrate its application in function and disease gene discovery through an interactive, Web-based, publicly available interface at
Author Summary
Functionally related proteins interact in diverse ways to carry out biological processes, and each protein often participates in multiple pathways. Proteins are therefore organized into a complex network through which different functions of the cell are carried out. An accurate description of such a network is invaluable to our understanding of both the system-level features of a cell and those of an individual biological process. In this study, we used a probabilistic model to combine information from diverse genome-scale studies as well as individual investigations to generate a global functional network for mouse. Our analysis of the global topology of this network reveals biologically relevant systems-level characteristics of the mouse proteome, including conservation of functional neighborhoods and network features characteristic of known disease genes and key transcriptional regulators. We have made this network publicly available for search and dynamic exploration by researchers in the community. Our Web interface enables users to easily generate hypotheses regarding potential functional roles of uncharacterized proteins, investigate possible links between their proteins of interest and disease, and identify new players in specific biological processes.
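The clustering coefficient named in the abstract as a characteristic of central modulators can be made concrete with a short sketch. It uses the standard unweighted definition (the fraction of a node's neighbour pairs that are themselves linked); the published network has probabilistic, weighted linkages, so treating edges as simply present or absent is an assumption here, and the gene names are placeholders.

```python
from itertools import combinations

def local_clustering(adjacency, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
    return linked / (k * (k - 1) / 2)

# Hypothetical undirected network; names are placeholders, not real mouse genes.
edges = [("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g1", "g4"), ("g4", "g5")]
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

scores = {node: local_clustering(adjacency, node) for node in adjacency}
```

Here `g2` sits inside a closed triangle and scores 1.0, whereas `g1` bridges the triangle and a separate branch, so only one of its three neighbour pairs is linked.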
PMCID: PMC2527685  PMID: 18818725
18.  Global coordination of transcriptional control and mRNA decay during cellular differentiation 
We have systematically identified the targets of the Schizosaccharomyces pombe RNA-binding protein Meu5p, which is transiently induced during cellular differentiation. Meu5p-bound transcripts (>80) are expressed at low levels and have shorter half-lives in meu5 mutants, suggesting that Meu5p binding stabilizes its RNA targets. Most Meu5p targets are induced during differentiation by the activity of the Mei4p transcription factor. However, although most Mei4p targets display a sharp peak of expression, Meu5p targets are expressed for a longer period. In the absence of Meu5p, all Mei4p targets are expressed with similar kinetics (similar to non-Meu5p targets). Therefore, Meu5p determines the temporal profile of its targets. As the meu5 gene is itself a target of the transcription factor Mei4p, the RNA-binding protein Meu5p and their shared targets form a feed-forward loop (FFL), a network motif that is common in transcriptional networks. Our data highlight the importance of considering both transcriptional and posttranscriptional controls to understand dynamic changes in RNA levels, and provide insight into the structure of the regulatory networks that integrate transcription and RNA decay.
RNA levels are determined by the balance between RNA production (transcription) and degradation (decay or turnover). Therefore, cells can alter transcript levels by modulating either or both processes. Regulation of transcriptional initiation is one of the most common ways to regulate RNA levels. This function is frequently performed by transcription factors (TFs), which recognize specific sequence motifs on the promoters of their target genes and activate or repress their transcription. At the posttranscriptional level, RNA-binding proteins (RBPs) can bind to specific sequences on their target RNAs and regulate their rates of turnover.
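The balance between production and decay can be written down in a few lines. This sketch assumes the simplest textbook model, a constant synthesis rate k_s and first-order decay with rate constant k_d, so that dR/dt = k_s - k_d * R; the model choice, the function names and the numbers are illustrative assumptions rather than anything taken from the study.

```python
import math

def steady_state_level(synthesis_rate, decay_rate):
    """Level at which production balances first-order decay: dR/dt = k_s - k_d * R = 0."""
    return synthesis_rate / decay_rate

def half_life(decay_rate):
    """Half-life of a transcript decaying with first-order rate constant k_d."""
    return math.log(2) / decay_rate

# Hypothetical rates (arbitrary units per hour): halving decay doubles the level.
unstable = steady_state_level(10.0, 2.0)
stabilised = steady_state_level(10.0, 1.0)
```

Under this model a cell can reach the same change in transcript level either by doubling transcription or by halving the decay rate, which is why both processes have to be measured to explain an observed RNA level.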
RNA decay can be studied at the genome-wide level using microarrays or next-generation sequencing. The contribution of RNA turnover to transcript levels can be assessed by directly measuring decay rates. This is usually achieved by using microarrays to follow the decrease of RNA levels after inactivation of RNA polymerase II, or by in vivo labelling of newly synthesized RNA with modified nucleosides. These approaches can be applied to mutants in genes encoding RBPs, allowing the dissection of their specific functions in RNA turnover. Moreover, direct RBP targets can be identified by purifying RBP–RNA complexes, which are then analysed using microarrays (RIp-chip, for RBP Immunoprecipitation followed by analysis with DNA chips).
Many biological processes involve the establishment of complex programs of gene expression, in which the levels of hundreds of mRNAs are dynamically regulated. Although the genome-wide function of TFs in these processes has been studied extensively, much less is known about the contribution of RBPs, and especially about how the activity of TFs and RBPs is coordinated. Sexual differentiation of the fission yeast Schizosaccharomyces pombe culminates in meiosis and sporulation and is driven by an extensive gene expression program during which ∼40% of the genome (∼2000 genes) is regulated in complex temporal patterns. Transcriptional control is essential for the implementation of this program, and TFs responsible for the induction of most groups of upregulated genes have been identified. In particular, a transcription factor called Mei4p, which is itself transiently expressed during the meiotic divisions, induces the temporary expression of over 500 genes.
Here, we use genome-wide approaches to investigate the function of the Meu5p RBP, which is transiently induced by the Mei4p TF during the meiotic divisions. RIp-chip experiments identified >80 transcripts bound to Meu5p during meiosis, most of which were also targets of the Mei4p transcription factor. In meu5 mutants, Meu5p targets are expressed at low levels and have shorter half-lives, indicating that Meu5p stabilizes the transcripts it binds to. This stabilization has biological importance, as cells without meu5 are defective in spore formation.
Although the majority of Mei4p TF targets reach their peak in expression levels with similar kinetics, we noticed that the timing of their downregulation was heterogeneous. We could identify two discrete groups among Mei4p targets: a set of mRNAs with short (∼1 h) and sharp gene expression profiles (early decrease), and a group that displayed a broader expression pattern, with high levels of expression for 2–3 h (late decrease).
Most Meu5p RBP targets belonged to the late-decrease group, suggesting a simple model in which Meu5p might stabilize its targets, thus extending the duration of their expression. To test this idea, we followed gene expression in synchronized cultures of wild-type and meu5Δ meiotic cells. Although the expression of early decrease genes was not affected by the absence of meu5, late-decrease genes switched their profile to a pattern similar to that of early decrease genes. As transcription of meu5 is under the control of Mei4p, the TF Mei4p, the RBP Meu5p, and their common targets form a so-called feed-forward loop, in which a protein regulates a target both directly and indirectly through a second protein. This arrangement is common in transcriptional and protein phosphorylation networks.
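The feed-forward arrangement described above can be explored with a toy simulation. In this sketch a transient transcription factor pulse drives both a target mRNA and an RNA-binding protein, and the RNA-binding protein lowers the target's decay rate. Every rate constant, the one-hour pulse and the saturating form of the stabilization term are hypothetical choices made for illustration; they are not fitted to the Mei4p/Meu5p data.

```python
def simulate(with_rbp, dt=0.001, t_end=8.0):
    """Euler integration of a toy TF -> RBP -> target feed-forward loop."""
    rbp, target, trace = 0.0, 0.0, []
    for step in range(int(t_end / dt)):
        t = step * dt
        tf = 1.0 if t < 1.0 else 0.0               # transient TF pulse (1 h)
        rbp_synthesis = 5.0 * tf if with_rbp else 0.0
        decay = 4.0 / (1.0 + rbp / 0.2)            # RBP binding slows target decay
        rbp += dt * (rbp_synthesis - 0.5 * rbp)
        target += dt * (2.0 * tf - decay * target)
        trace.append(target)
    return trace

def time_above_half_peak(trace, dt=0.001):
    """Hours during which the target stays above half of its own maximum."""
    half = max(trace) / 2.0
    return sum(1 for level in trace if level > half) * dt

wild_type_width = time_above_half_peak(simulate(with_rbp=True))
mutant_width = time_above_half_peak(simulate(with_rbp=False))
```

With these assumed parameters the target stays above half of its peak for longer when the RNA-binding protein is present than when it is absent, which mirrors qualitatively the switch from a broad to a sharp expression profile reported for late-decrease genes in meu5Δ cells.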
Our results serve as a paradigm of how the coordination of the action of TFs and RBPs determines how RNA levels are dynamically regulated.
The function of transcription in dynamic gene expression programs has been extensively studied, but little is known about how it is integrated with RNA turnover at the genome-wide level. We investigated these questions using the meiotic gene expression program of Schizosaccharomyces pombe. We identified over 80 transcripts that co-purify with the meiotic-specific Meu5p RNA-binding protein. Their levels and half-lives were reduced in meu5 mutants, demonstrating that Meu5p stabilizes its targets. Most Meu5p-bound RNAs were also targets of the Mei4p transcription factor, which induces the transient expression of ∼500 meiotic genes. Although many Mei4p targets showed sharp expression peaks, Meu5p targets had broad expression profiles. In the absence of meu5, all Mei4p targets were expressed with similar kinetics, indicating that Meu5p alters the global features of the gene expression program. As Mei4p activates meu5 transcription, Mei4p, Meu5p and their common targets form a feed-forward loop, a motif common in transcriptional networks but not studied in the context of mRNA decay. Our data provide insight into the topology of regulatory networks integrating transcriptional and posttranscriptional controls.
PMCID: PMC2913401  PMID: 20531409
mRNA decay; RIp-chip; posttranscriptional control
19.  A synthetic library of RNA control modules for predictable tuning of gene expression in yeast 
The authors describe a library of synthetic RNA control elements that provide programmable post-transcriptional regulation of gene expression in yeast. This toolkit is then used to study endogenous regulation of the ergosterol biosynthetic pathway.
Rnt1p hairpins can act as effective posttranscriptional gene regulatory elements in the yeast Saccharomyces cerevisiae. Modification of the cleavage efficiency box (CEB) region of an Rnt1p hairpin can modulate Rnt1p cleavage rates, and thus the resulting gene regulatory activities of the hairpin control elements. A library of Rnt1p hairpins can act as a set of synthetic control modules that provide predictable tuning of gene expression over a wide range of expression levels. The Rnt1p-based control elements can be combined with any promoter to support titration of regulatory strategies encoded in transcriptional regulators, including feedback control around endogenous proteins.
The design of complex biological systems encoding desired functions requires the development of genetic tools for the precise control of protein levels in cells (Elowitz and Leibler, 2000; Gardner et al, 2000; Basu et al, 2004). For example, in the design of engineered metabolic networks, the tuning of enzyme levels is often critical for overcoming metabolic burden (Jones et al, 2000; Jin et al, 2003), the accumulation of toxic intermediates (Zhu et al, 2001; Pfleger et al, 2006) and detrimental consequences associated with the redirection of cellular resources from native pathways (Alper et al, 2005b; Paradise et al, 2008). Various examples of libraries of genetic control modules have been described that have been generated through the randomization of well-characterized gene expression control elements (Basu et al, 2004; Pfleger et al, 2006; Anderson et al, 2007). However, most of these studies have been conducted in Escherichia coli such that there is a lack of similar tools for other cellular chassis.
The budding yeast, Saccharomyces cerevisiae, is a relevant organism in industrial processes, including biosynthesis and biomanufacturing strategies (Ostergaard et al, 2000; Szczebara et al, 2003; Nguyen et al, 2004; Veen and Lang, 2004; Ro et al, 2006; Hawkins and Smolke, 2008). The majority of existing methods for tuning gene expression in yeast are through transcriptional control mechanisms in the form of inducible and constitutive promoter systems (Hawkins and Smolke, 2006; Nevoigt et al, 2006; Nevoigt et al, 2007). RNA-based control modules based on posttranscriptional mechanisms may offer an advantage in that they can be coupled to any promoter of choice, providing for enhanced control strategies and finer resolution tuning of protein expression levels. Although posttranscriptional control elements, such as internal ribosome entry sites and AU-rich elements, have been applied to regulate heterologous gene expression in yeast (Vasudevan and Peltz, 2001; Zhou et al, 2001; Lautz et al, 2010), these control elements have exhibited substantial variability in activity and have not been engineered as synthetic libraries exhibiting a wide range of predictable gene regulatory activities.
RNase III enzymes are a class of enzymes that cleave double-stranded RNA. The S. cerevisiae RNase III enzyme, Rnt1p, exhibits a number of unique features that allow it to recognize very specific RNA hairpin substrates that harbor a consensus AGNN tetraloop sequence. Despite extensive characterization of this enzyme and its demonstrated role in processing non-coding RNA and mRNA, neither natural nor synthetic Rnt1p substrates have been used to control gene expression levels in yeast. Therefore, we developed a genetic control system based on directed Rnt1p processing of a target transcript. Specifically, Rnt1p hairpins were immediately flanked by a clamp sequence (that insulates the hairpin structure from surrounding sequences) and placed downstream of a gene of interest, where they direct cleavage and thus inactivate the transcript, resulting in rapid transcript degradation. We validated this Rnt1p-based control system with two Rnt1p hairpins based on previous in vitro studies and demonstrated that Rnt1p hairpins can act as gene control modules in yeast.
Previous in vitro studies had identified three key regions in Rnt1p hairpins: the cleavage efficiency box (CEB), the binding stability box and the initial binding and positioning box (Lamontagne et al, 2003). The CEB region affects the processing of the hairpin stem by Rnt1p, such that nucleotide (nt) modifications in this region are expected to specifically modulate the cleavage rate. We created an Rnt1p hairpin library by randomizing the CEB region (12 nt). This library was placed downstream of a fluorescent reporter protein and a cell-based screening assay was used to identify functional members of the library that resulted in lowered fluorescence levels. The functional Rnt1p hairpin library comprises 16 unique sequences that span a large gene regulatory range—from 8 to 85% (Figure 3A)—and are fairly evenly distributed across this range. The negative controls for each sequence (constructed by mutating the required consensus tetraloop sequence) demonstrated that the majority of gene knockdown observed from each hairpin is due to Rnt1p processing (Figure 3B). A correlation analysis on the transcript and protein levels for each library hairpin construct indicated a strong positive correlation and a strong preservation of rank order between the two in vivo regulatory measurements (Figure 3C). Characterization of the hairpin library in a different genetic context supported the broader utility of these control modules for providing predictable gene control.
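The two statistics mentioned above, correlation and preservation of rank order, can be computed as follows. The sketch uses Pearson correlation for the first and Spearman rank correlation for the second; whether the authors used exactly these estimators is an assumption, and the transcript and protein values are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; assumes no ties, which holds for the toy data below."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical relative levels (% of a no-hairpin control) for five hairpins.
transcript = [12.0, 25.0, 41.0, 60.0, 83.0]
protein = [9.0, 30.0, 38.0, 66.0, 80.0]

r = pearson(transcript, protein)
rho = spearman(transcript, protein)
```

A Spearman coefficient of 1 means that sorting the hairpins by transcript level gives the same order as sorting them by protein level, even when the relationship between the two measurements is not perfectly linear.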
We applied the Rnt1p control modules to titrating a key enzyme component of the endogenous ergosterol biosynthesis network—the ERG9 genetic target. Squalene synthase, encoded by the ERG9 gene, is responsible for catalyzing the conversion of two molecules of farnesyl pyrophosphate to squalene, the first precursor in the ergosterol biosynthetic pathway in S. cerevisiae (Poulter and Rilling, 1981; Figure 6A). We integrated several members of the Rnt1p hairpin library downstream of the native ERG9 gene to cover the regulatory range of the library (Figure 6B). A strong positive correlation and preservation of rank order was observed between the ERG9 transcript levels and their yEGFP3 counterparts (Figure 6C). However, ERG9 expression levels did not fall below ∼40%, regardless of the Rnt1p hairpin strength, indicating that a previously identified endogenous feedback mechanism associated with the native ERG9 promoter acts to maintain ERG9 expression levels at that threshold value. In addition, most strains exhibited high relative ergosterol levels and growth rates, except for two strains harboring synthetic Rnt1p hairpins that resulted in the lowest expression levels, which exhibited a significant reduction in the amount of ergosterol produced and growth rate (Figure 6D and E). Our studies indicate that the endogenous feedback mechanism can be acting to increase ERG9 expression levels to the desired set point in the slow-growing strains, but the perturbations introduced in these strains may result in other impacts on the pathway that inhibit the endogenous control systems from restoring cellular growth to wild-type rates. These studies support the unique ability of the synthetic Rnt1p hairpin library to systematically titrate pathway enzyme levels by introducing precise perturbations around major control points while maintaining native cellular control strategies acting through transcriptional mechanisms.
Advances in synthetic biology have resulted in the development of genetic tools that support the design of complex biological systems encoding desired functions. The majority of efforts have focused on the development of regulatory tools in bacteria, whereas fewer tools exist for the tuning of expression levels in eukaryotic organisms. Here, we describe a novel class of RNA-based control modules that provide predictable tuning of expression levels in the yeast Saccharomyces cerevisiae. A library of synthetic control modules that act through posttranscriptional RNase cleavage mechanisms was generated through an in vivo screen, in which structural engineering methods were applied to enhance the insulation and modularity of the resulting components. This new class of control elements can be combined with any promoter to support titration of regulatory strategies encoded in transcriptional regulators and thus more sophisticated control schemes. We applied these synthetic controllers to the systematic titration of flux through the ergosterol biosynthesis pathway, providing insight into endogenous control strategies and highlighting the utility of this control module library for manipulating and probing biological systems.
PMCID: PMC3094065  PMID: 21364573
gene expression control; metabolic flux control; RNA controller; Rnt1p hairpin; synthetic biology
20.  Metabolomic and transcriptomic stress response of Escherichia coli 
GC-MS-based analysis of the metabolic response of Escherichia coli exposed to four different stress conditions reveals reduction of energy expensive pathways. Time-resolved response of E. coli to changing environmental conditions is more specific on the metabolite as compared with the transcript level. Cessation of growth during stress response as compared with stationary phase response invokes similar transcript but dissimilar metabolite responses. Condition-dependent associations between metabolites and transcripts are revealed applying co-clustering and canonical correlation analysis.
The response of biological systems to environmental perturbations is characterized by a fast and appropriate adjustment of physiology on every level of the cellular and molecular network.
Stress response is usually represented by a combination of both specific responses, aimed at minimizing deleterious effects or repairing damage (e.g. protein chaperones under temperature stress) and general responses which, in part, comprise the downregulation of genes related to translation and ribosome biogenesis. This in turn is reflected by growth cessation or reduction observed under essentially all stress conditions and is an important strategy to adjust cellular physiology to the new condition.
E. coli has been intensively investigated in relation to stress responses. Thus far, however, the majority of global analyses of E. coli stress responses have been limited to just one level, gene expression. To better understand system response to perturbation, we designed a time-resolved experiment to compare and integrate metabolic and transcript changes of E. coli using four stress conditions including non-lethal temperature shifts, oxidative stress, and carbon starvation relative to cultures grown under optimal conditions covering both states before and directly after stress application, resumption of growth after stress-induced lag phase, and finally the stationary phase.
Metabolic changes occurring after stress application were characterized by a reduction in metabolites of central metabolism (TCA cycle and glycolysis) as well as an increase in free amino acids. Whereas the latter is probably due to protein degradation and stalling of translation, the former supports and extends conclusions based on transcriptome data demonstrating a major decrease in energy-consuming processes as a general stress response. Further comparative analysis of the response on the metabolome and transcriptome, however, revealed major differences in addition to these similarities. Thus, the response on the metabolome displayed a significantly higher specificity towards the specific stress as compared with the transcriptome. Further, when comparing the metabolome of cells ceasing growth due to stress application with cells ceasing growth due to reaching stationary phase, the metabolome response differed to a significant extent between both growth arrest phases, whereas the transcriptome response showed significant overlap again, suggesting that the response of E. coli on the metabolome level displays a higher level of significance as compared with the transcriptome level.
Subsequently, both data sets were jointly analyzed using co-clustering and canonical correlation approaches to identify coordinated changes on the transcriptome and the metabolite level indicative of metabolite–transcript associations. A first outcome of this study was that no association was preserved across all conditions analyzed; rather, condition-specific associations were observed. One set of associations linked metabolites from the oxidative pentose phosphate pathway (such as glc-6-P, 6-P-gluconic acid, ribose-5-P, and E-4-P) and metabolites from the glycolytic pathway (3PGA and PEP in addition to glc-6-P) with two genes encoding pathway enzymes, that is, rpe encoding ribulose phosphate 3-epimerase and pps encoding PEP synthase.
A second example comprises metabolites of the TCA cycle such as pyruvic acid, 2-ketoglutaric acid, fumaric acid, malic acid, and succinic acid and the mqo gene encoding malate-quinone oxidoreductase (MQO). MQO catalyses the irreversible oxidation of malate to oxaloacetate that in turn regulates the activity of citrate synthase, which is a major rate determining enzyme of the TCA cycle. The strong association between mqo gene expression and multiple members of the TCA cycle as well as pyruvate suggests that mqo expression has a major function in the regulation of the TCA cycle, which needs to be experimentally validated.
The many further associations identified show, on the one hand, the power of integrative, systems-oriented approaches for developing new hypotheses. On the other hand, their condition-dependent behavior shows the extreme flexibility of the biological systems studied, and thus calls for a much more intense effort toward parallel analysis of biological systems under several environmental conditions.
Environmental fluctuations lead to a rapid adjustment of the physiology of Escherichia coli, necessitating changes on every level of the underlying cellular and molecular network. Thus far, the majority of global analyses of E. coli stress responses have been limited to just one level, gene expression. Here, we incorporate the metabolite composition together with gene expression data to provide a more comprehensive insight on system level stress adjustments by describing detailed time-resolved E. coli response to five different perturbations (cold, heat, oxidative stress, lactose diauxie, and stationary phase). The metabolite response is more specific as compared with the general response observed on the transcript level and is reflected by much higher specificity during the early stress adaptation phase and when comparing the stationary phase response to other perturbations. Despite these differences, the response on both levels still follows the same dynamics and general strategy of energy conservation as reflected by rapid decrease of central carbon metabolism intermediates coinciding with downregulation of genes related to cell growth. Application of co-clustering and canonical correlation analysis on combined metabolite and transcript data identified a number of significant condition-dependent associations between metabolites and transcripts. The results confirm and extend existing models about co-regulation between gene expression and metabolites demonstrating the power of integrated systems oriented analysis.
PMCID: PMC2890322  PMID: 20461071
Escherichia coli; metabolomic; response to stress; time course; transcriptomic
21.  Dynamic Circadian Protein–Protein Interaction Networks Predict Temporal Organization of Cellular Functions 
PLoS Genetics  2013;9(3):e1003398.
Essentially all biological processes depend on protein–protein interactions (PPIs). Timing of such interactions is crucial for regulatory function. Although circadian (∼24-hour) clocks constitute fundamental cellular timing mechanisms regulating important physiological processes, PPI dynamics on this timescale are largely unknown. Here, we identified 109 novel PPIs among circadian clock proteins via a yeast-two-hybrid approach. Among them, the interaction of protein phosphatase 1 and CLOCK/BMAL1 was found to result in BMAL1 destabilization. We constructed a dynamic circadian PPI network predicting the PPI timing using circadian expression data. Systematic circadian phenotyping (RNAi and overexpression) suggests a crucial role for components involved in dynamic interactions. Systems analysis of a global dynamic network in liver revealed that interacting proteins are expressed at similar times likely to restrict regulatory interactions to specific phases. Moreover, we predict that circadian PPIs dynamically connect many important cellular processes (signal transduction, cell cycle, etc.) contributing to temporal organization of cellular physiology in an unprecedented manner.
Author Summary
Circadian clocks are endogenous oscillators that drive daily rhythms in physiology, metabolism, and behavior. In mammals, circadian rhythms are generated within nearly every cell; and, although dysfunction of circadian clocks is associated with various diseases (including diabetes and cancer), the molecular mechanisms linking the clock machinery with output pathways are poorly understood. Since essentially all biological processes depend on protein–protein interactions, we investigated here on a systems-wide level how time-of-day-specific protein–protein interactions contribute to the temporal organization of cellular physiology. We constructed a circadian interactome using experimentally generated protein–protein interaction data and made this network dynamic by the incorporation of time-of-day-dependent expression data. Interestingly, systematic genetic network perturbation (RNAi and overexpression) suggests a crucial role for circadian components involved in dynamic interactions. Systems analysis of a global network revealed that, in the liver, interacting proteins are significantly more often expressed at similar times of day, likely to restrict regulatory interactions to specific circadian phases within cells. Overall, circadian protein–protein interactions are predicted to dynamically connect important cellular processes (signal transduction, cell cycle, etc.) using, very often, protein modules with components co-expressed in time, shedding new light on the daily organization of cellular physiology.
PMCID: PMC3610820  PMID: 23555304
22.  Genome-wide transcriptional plasticity underlies cellular adaptation to novel challenge 
By recruiting the essential HIS3 gene to the GAL regulatory system and switching to a repressing glucose medium, we confronted yeast cells with a novel challenge they had not encountered in their evolutionary history. Adaptation to this challenge involved a global transcriptional response of a sizeable fraction of the genome, which relaxed on the timescale of the population adaptation, on the order of 10 generations. For a large fraction of the responding genes there is no simple biological interpretation connecting them to the specific cellular demands imposed by the novel challenge. Strikingly, repeating the experiment reproduced similar transcription patterns neither in the transient phase nor in the adapted state in glucose. These results suggest that physiological selection operates on the new metabolic configurations generated by the non-specific large-scale transcriptional response to eventually stabilize an adaptive state.
Cells adjust their transcriptional state to accommodate environmental and genetic perturbations. Some common perturbations, such as changes in nutrient composition, elicit well-characterized transcriptional responses that can be understood by simple engineering-like design principles as satisfying specific demands imposed by the perturbation. However, cells also have the ability to adapt to novel and unforeseen challenges. This ability is central in realizing the evolvability potential of cells as they respond to dramatic genetic or environmental changes along evolution. Little is known about the mechanisms underlying such adaptations to novel challenges; in particular, the role of the transcriptional regulatory network in such adaptations has not been characterized. Genome-wide measurements have revealed that, in many cases, perturbations lead to a global transcriptional response involving a sizeable fraction of the genome (Gasch et al, 2000; Jelinsky et al, 2000; Causton et al, 2001; Ideker et al, 2001; Lai et al, 2005). Such global behavior suggests that general collective properties of the genetic network, rather than specific pre-designed pathways, determine an important part of the transcriptional response. It is not known however what fraction of genes within such massive transcriptional responses is essential to the specific cellular demands. It is also unknown whether the non-pre-designed part of the response can have a functional role in adaptation to novel challenges.
To study these questions, we confronted yeast cells with a novel challenge they had not encountered in their evolutionary history. A strain of the yeast Saccharomyces cerevisiae was engineered to recruit HIS3, a gene encoding an essential enzyme of the histidine biosynthesis pathway (Hinnebusch, 1992), to the GAL regulatory system, which is responsible for galactose utilization (Stolovicki et al, 2006). The GAL system is known to be strongly repressed when the cells are exposed to glucose. Therefore, upon switching to a medium containing glucose and lacking histidine, the GAL system and with it HIS3 are highly repressed immediately following the switch, and the cells encounter a severe challenge. We have recently shown that a cell population carrying this rewired genome can adapt to grow competitively in a chemostat in a medium containing pure glucose (Stolovicki et al, 2006). This adaptation occurred on a timescale of ∼10 generations; applying a stronger environmental pressure in the form of a competitive inhibitor of HIS3 (3AT) resulted in a similar adaptation, albeit on a longer timescale. Figure 1 shows the dynamics of the population's cell density (blue lines, measured by OD) following a medium switch from galactose to glucose in the chemostat without (A) and with (B) 3AT. The experiments revealed that adaptation occurs on physiological timescales (much shorter than required by spontaneous random mutations), but the mechanisms underlying this adaptation have remained unclear (Stolovicki et al, 2006).
Yeast cells had not encountered recruitment of HIS3 to the GAL system along their evolutionary history, and their genome could not possibly have been selected to specifically address glucose repression of HIS3. This experiment, therefore, provides a unique opportunity to characterize the spontaneous transcriptional response during adaptation to a novel challenge and to assess the functional role of the regulatory system in this adaptation. We used DNA microarrays to measure the genome-wide expression levels at time points along the adaptation process, with and without 3AT. These measurements revealed that a sizeable fraction of the genome responded by induction or repression to the switch into glucose. Superimposed on the OD traces, Figure 1 shows the results of a clustering analysis of the expression of genes as measured by the arrays along time in the experiments. This analysis revealed two dominant clusters, each containing hundreds of genes in each experiment, which responded to the medium switch to glucose by a strong transient induction or repression followed by relaxation to steady state on the timescale of the adaptation process, ∼ 10 generations. The two clusters in each experiment show similar but opposite dynamics.
A detailed analysis of the gene content in the two clusters revealed that only a small portion of the response was induced by a change in carbon source (15% overlap between the corresponding clusters in the two experiments, with and without 3AT). Moreover, it revealed a very low overlap with the universal stress response observed for a wide range of environmental stresses (Gasch et al, 2000; Causton et al, 2001) and with the typical response to amino-acid starvation (Natarajan et al, 2001). Additionally, all known specific responses to stress in the literature are characterized by transient induction or repression with relaxation to steady state within a generation time (Gasch et al, 2000; Koerkamp et al, 2002; Wu et al, 2004), whereas in our experiments relaxation of the transcriptional response occurs over many generations. Taken together, these results show that the transcriptional response observed here is neither a metabolic response to the change in carbon source nor is it a standard response to stress or amino-acid starvation. This raises the possibility that it is a spontaneous collective response that is largely composed of genes that do not have a specific function. This possibility was tested directly by repeating the experiment with different populations and comparing their responses. This procedure revealed reproducible adaptation dynamics and steady states in terms of population density, but showed significantly different transcriptional transient responses and steady states for the two repeated experiments. Thus, a significant portion of the genes that changed their expression during the adaptation process do not have a well-defined and reproducible function in the challenging environment.
The application of a stronger environmental pressure in the form of 3AT had a dramatic effect on the global characteristics of the transcriptional response: it induced a markedly higher correlation among the hundreds of responding genes. Figure 3A compares the array data in color code for the two experiments. It is seen that the emergent pattern of transcription exhibits a higher degree of order by the introduction of high external pressure. Observation of the transcriptional patterns for specific metabolic pathways illustrates the different contributions to the correlated dynamics (Figure 3B–D). A general energetic module such as glycolysis exhibited similar patterns of induction and relaxation in experiments with and without 3AT (Figure 3B). However, in general, we found that more than one-third of the known metabolic modules (30 out of 88 modules described in KEGG) exhibited high expression correlation among their genes when the environmental pressure was high but not when it was low. As an example, Figure 3C shows the histidine biosynthesis pathway and Figure 3D the purine pathway. Note the highly ordered trajectories in the lower panels (with 3AT) compared to the disordered ones in the upper panels (no 3AT). This order extends also between genes belonging to different and even distant metabolic modules. It indicates that a global transcriptional regulatory mechanism is in operation, rather than a local specific one. Surprisingly, genes belonging to the same metabolic pathway exhibited simultaneous positively and negatively correlated dynamics. Thus, an important conclusion of this work is that the global transcriptional response to a novel challenge cannot be explained by a simple cellular or metabolic logic. This is to be expected if the response had not been specifically selected in evolution and was not pre-designed for the challenge.
Our data clearly reveal that the massive transcriptional response underlies the adaptation process to a novel challenge. The novelty of the challenge presented to the cells excludes the possibility that this response has been specifically selected toward this challenge. Thus, transcriptional regulation has dynamic properties resulting in a general massive nonspecific response to a novel perturbation. Such a response in turn allows for metabolic rearrangements, which by feeding back on transcription lead to adaptation of the cells to the unforeseen situation. The drastic change in the expression state of the cell opens multiple new metabolic pathways. Physiological selection works then on these multiple metabolic pathways to stabilize an adaptive state that causes relaxation of the perturbed expression pattern. This scenario, involving the creation of a library of possibilities and physiological selection over this library, is compatible with our understanding of a broad class of biological systems, placing the cellular metabolic/regulatory networks on the same footing as the neural or the immune systems (Gerhart and Kirschner, 1997).
Cells adjust their transcriptional state to accommodate environmental and genetic perturbations. An open question is to what extent transcriptional response to perturbations has been specifically selected along evolution. To test the possibility that transcriptional reprogramming does not need to be ‘pre-designed’ to lead to an adaptive metabolic state on physiological timescales, we confronted yeast cells with a novel challenge they had not previously encountered. We rewired the genome by recruiting an essential gene, HIS3, from the histidine biosynthesis pathway to a foreign regulatory system, the GAL network responsible for galactose utilization. Switching medium to glucose in a chemostat caused repression of the essential gene and presented the cells with a severe challenge to which they adapted over approximately 10 generations. Using genome-wide expression arrays, we show here that a global transcriptional reprogramming (>1200 genes) underlies the adaptation. A large fraction of the responding genes is nonreproducible in repeated experiments. These results show that a nonspecific transcriptional response reflecting the natural plasticity of the regulatory network supports adaptation of cells to novel challenges.
PMCID: PMC1865588  PMID: 17453047
adaptation; cellular metabolism; expression arrays; plasticity; transcriptional response
23.  Stitching together Multiple Data Dimensions Reveals Interacting Metabolomic and Transcriptomic Networks That Modulate Cell Regulation 
PLoS Biology  2012;10(4):e1001301.
DNA variation can be used as a systematic source of perturbation in segregating populations as a way to infer regulatory networks via the integration of large-scale, high-dimensional molecular profiling data.
Cells employ multiple levels of regulation, including transcriptional and translational regulation, that drive core biological processes and enable cells to respond to genetic and environmental changes. Small-molecule metabolites are one category of critical cellular intermediates that can influence as well as be a target of cellular regulations. Because metabolites represent the direct output of protein-mediated cellular processes, endogenous metabolite concentrations can closely reflect cellular physiological states, especially when integrated with other molecular-profiling data. Here we develop and apply a network reconstruction approach that simultaneously integrates six different types of data: endogenous metabolite concentration, RNA expression, DNA variation, DNA–protein binding, protein–metabolite interaction, and protein–protein interaction data, to construct probabilistic causal networks that elucidate the complexity of cell regulation in a segregating yeast population. Because many of the metabolites are found to be under strong genetic control, we were able to employ a causal regulator detection algorithm to identify causal regulators of the resulting network that elucidated the mechanisms by which variations in their sequence affect gene expression and metabolite concentrations. We examined all four expression quantitative trait loci (eQTL) hot spots with colocalized metabolite QTLs, two of which recapitulated known biological processes, while the other two elucidated novel putative biological mechanisms for the eQTL hot spots.
Author Summary
It is now possible to score variations in DNA across whole genomes, RNA levels and alternative isoforms, metabolite levels, protein levels and protein state information, protein–protein interactions, and protein–DNA interactions, in a comprehensive fashion in populations of individuals. Interactions among these molecular entities define the complex web of biological processes that give rise to all higher order phenotypes, including disease. The development of analytical approaches that simultaneously integrate different dimensions of data is essential if we are to extract the meaning from large-scale data to elucidate the complexity of living systems. Here, we use a novel Bayesian network reconstruction algorithm that simultaneously integrates DNA variation, RNA levels, metabolite levels, protein–protein interaction data, protein–DNA binding data, and protein–small-molecule interaction data to construct molecular networks in yeast. We demonstrate that these networks can be used to infer causal relationships among genes, enabling the identification of novel genes that modulate cellular regulation. We show that our network predictions either recapitulate known biology or can be prospectively validated, demonstrating a high degree of accuracy in the predicted network.
PMCID: PMC3317911  PMID: 22509135
24.  iRegulon: From a Gene List to a Gene Regulatory Network Using Large Motif and Track Collections 
PLoS Computational Biology  2014;10(7):e1003731.
Identifying master regulators of biological processes and mapping their downstream gene networks are key challenges in systems biology. We developed a computational method, called iRegulon, to reverse-engineer the transcriptional regulatory network underlying a co-expressed gene set using cis-regulatory sequence analysis. iRegulon implements a genome-wide ranking-and-recovery approach to detect enriched transcription factor motifs and their optimal sets of direct targets. We increase the accuracy of network inference by using very large motif collections of up to ten thousand position weight matrices collected from various species, and linking these to candidate human TFs via a motif2TF procedure. We validate iRegulon on gene sets derived from ENCODE ChIP-seq data with increasing levels of noise, and we compare iRegulon with existing motif discovery methods. Next, we use iRegulon on more challenging types of gene lists, including microRNA target sets, protein-protein interaction networks, and genetic perturbation data. In particular, we over-activate p53 in breast cancer cells, followed by RNA-seq and ChIP-seq, and could identify an extensive up-regulated network controlled directly by p53. Similarly we map a repressive network with no indication of direct p53 regulation but rather an indirect effect via E2F and NFY. Finally, we generalize our computational framework to include regulatory tracks such as ChIP-seq data and show how motif and track discovery can be combined to map functional regulatory interactions among co-expressed genes. iRegulon is available as a Cytoscape plugin from
Author Summary
Gene regulatory networks control developmental, homeostatic, and disease processes by governing precise levels and spatio-temporal patterns of gene expression. Determining their topology can provide mechanistic insight into these processes. Gene regulatory networks consist of interactions between transcription factors and their direct target genes. Each regulatory interaction represents the binding of the transcription factor to a specific DNA binding site near its target gene. Here we present a computational method, called iRegulon, to identify master regulators and direct target genes in a human gene signature, i.e. a set of co-expressed genes. iRegulon relies on the analysis of the regulatory sequences around each gene in the gene set to detect enriched TF motifs or ChIP-seq peaks, using databases of nearly 10,000 TF motifs and 1000 ChIP-seq data sets or “tracks”. Next, it associates enriched motifs and tracks with candidate transcription factors and determines the optimal subset of direct target genes. We validate iRegulon on ENCODE data, and use it in combination with RNA-seq and ChIP-seq data to map a p53 downstream network with new predicted co-factors and targets. iRegulon is available as a Cytoscape plugin, supporting human, mouse, and Drosophila genes, and provides access to hundreds of cancer-related TF-target subnetworks or “regulons”.
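The general idea behind a ranking-and-recovery enrichment score can be sketched as follows: given a genome-wide ranking of genes for one motif, count how quickly the genes of the input set are "recovered" while walking down the ranking, and summarize this as an area under the recovery curve. This is a minimal sketch under stated assumptions, not iRegulon's actual implementation; the function name, the normalization, and the toy gene identifiers are hypothetical.

```python
def recovery_auc(ranked_genes, gene_set, top_k):
    """Normalized area under the recovery curve over the top_k ranked genes.

    ranked_genes: genes ordered from best to worst motif score (assumed input).
    gene_set:     the co-expressed genes whose enrichment is being tested.
    Returns 1.0 when all recoverable set genes sit at the very top of the ranking.
    """
    gene_set = set(gene_set)
    if not gene_set or top_k <= 0:
        return 0.0
    recovered = 0
    area = 0
    for gene in ranked_genes[:top_k]:
        if gene in gene_set:
            recovered += 1
        area += recovered  # height of the recovery curve at this rank
    # Best possible area: every set gene ranked ahead of all other genes.
    ideal = sum(min(rank, len(gene_set)) for rank in range(1, top_k + 1))
    return area / ideal


ranking = ["g1", "g2", "g3", "g4"]            # toy ranking for one motif
print(recovery_auc(ranking, {"g1", "g2"}, 4))  # set genes at the top
print(recovery_auc(ranking, {"g3", "g4"}, 4))  # set genes at the bottom
```

In this kind of scheme, a motif whose ranking places the input genes near the top receives a high score, and the genes recovered before the curve flattens are natural candidates for direct targets.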
PMCID: PMC4109854  PMID: 25058159
25.  Learning Gene Networks under SNP Perturbations Using eQTL Datasets 
PLoS Computational Biology  2014;10(2):e1003420.
The standard approach for identifying gene networks is based on experimental perturbations of gene regulatory systems such as gene knock-out experiments, followed by a genome-wide profiling of differential gene expressions. However, this approach is significantly limited in that it is not possible to perturb more than one or two genes simultaneously to discover complex gene interactions or to distinguish between direct and indirect downstream regulations of the differentially-expressed genes. As an alternative, genetical genomics study has been proposed to treat naturally-occurring genetic variants as potential perturbants of gene regulatory system and to recover gene networks via analysis of population gene-expression and genotype data. Despite many advantages of genetical genomics data analysis, the computational challenge that the effects of multifactorial genetic perturbations should be decoded simultaneously from data has prevented a widespread application of genetical genomics analysis. In this article, we propose a statistical framework for learning gene networks that overcomes the limitations of experimental perturbation methods and addresses the challenges of genetical genomics analysis. We introduce a new statistical model, called a sparse conditional Gaussian graphical model, and describe an efficient learning algorithm that simultaneously decodes the perturbations of gene regulatory system by a large number of SNPs to identify a gene network along with expression quantitative trait loci (eQTLs) that perturb this network. While our statistical model captures direct genetic perturbations of gene network, by performing inference on the probabilistic graphical model, we obtain detailed characterizations of how the direct SNP perturbation effects propagate through the gene network to perturb other genes indirectly. We demonstrate our statistical method using HapMap-simulated and yeast eQTL datasets. 
In particular, the yeast gene network identified computationally by our method under SNP perturbations is well supported by the results from experimental perturbation studies related to DNA replication stress response.
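To make the distinction between direct and indirect SNP effects concrete, the sketch below uses one common parameterization of a conditional Gaussian graphical model, p(y | x) ∝ exp(-½ yᵀΘ_yy y - xᵀΘ_xy y), for which the conditional mean is E[y | x] = -Θ_yy⁻¹ Θ_xyᵀ x. The parameterization, the function name, and the toy matrices are assumptions for illustration and may differ from the paper's exact formulation; sparsity-inducing estimation of the parameters is not shown.

```python
import numpy as np


def snp_effects_on_expression(theta_yy, theta_xy):
    """Total (direct plus propagated) SNP effects on gene expression.

    theta_yy: (q, q) symmetric positive-definite gene-gene network parameters.
    theta_xy: (p, q) direct SNP-to-gene perturbation parameters.
    Returns B of shape (p, q) such that E[y | x] = B.T @ x, i.e.
    B = -theta_xy @ inv(theta_yy), under the parameterization assumed above.
    """
    # Solve a linear system instead of forming the inverse explicitly.
    return -np.linalg.solve(theta_yy, theta_xy.T).T


# Toy example (hypothetical numbers): two coupled genes, one SNP that
# directly perturbs only the first gene.
theta_yy = np.array([[2.0, -1.0],
                     [-1.0, 2.0]])
theta_xy = np.array([[-1.0, 0.0]])

total_effects = snp_effects_on_expression(theta_yy, theta_xy)
print(total_effects)
```

Although the second entry of `theta_xy` is zero (no direct perturbation of the second gene), the corresponding entry of `total_effects` is non-zero, because the perturbation of the first gene propagates through the gene-gene coupling in `theta_yy`.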
Author Summary
A complete understanding of how gene regulatory networks are wired in a biological system is important in many areas of biology and medicine. The most popular method for investigating a gene network has been based on experimental perturbation studies, where the expression of a gene is experimentally manipulated to observe how this perturbation affects the expression of other genes. Such experimental methods are costly, laborious, and do not scale to a perturbation of more than two genes at a time. As an alternative, the genetical genomics approach uses genetic variants as naturally occurring perturbations of the gene regulatory system and learns gene networks by decoding the perturbation effects of genetic variants, given population gene-expression and genotype data. However, since millions of genetic variants in genomes simultaneously perturb a gene network, it is not obvious how to decode the effects of such multifactorial perturbations from data. Our statistical approach overcomes this computational challenge and recovers gene networks under SNP perturbations using probabilistic graphical models. As population gene-expression and genotype datasets are routinely collected to study genetic architectures of complex diseases and phenotypes, our approach can directly leverage these existing datasets to provide a more effective way of identifying gene networks.
PMCID: PMC3937098  PMID: 24586125
