The availability of high-throughput interaction data has led to the creation of methods for summarizing and exploring networks using node-edge graphs. In these graphs, genes or proteins are represented by nodes (vertices) and interactions by edges (3). The underlying interaction data are diverse and include interactions extracted by manual or automated text mining of the literature (5), genetic interactions obtained from gene deletion sets, and physical interactions identified by large-scale mass spectrometry or two-hybrid analysis (4). Interactions in node-edge graphs can be undirected (denoting an unspecified interaction), directed but unsigned (denoting substrate-product relationships) or directed and signed (denoting both substrate-product and inhibition-activation relationships); the latter are particularly useful because they capture biochemical causality. For protein data, graphs comprising undirected edges are typically called Protein Interaction Networks (PINs) whereas those with signed directed edges are known as Protein Signaling Networks (PSNs). Most work on PINs and PSNs to date has focused on adding as much data as possible, often from more than one organism or type of experiment, so as to construct large networks with the greatest possible scope and the greatest number of interactions per node (increasing the “degree” of the network); the culmination of this effort is a proposed “Human Interactome” covering all known gene products (8).
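The three edge types described above can be sketched as a small data structure. This is only an illustration: the `Edge` class, its fields, and the node names `P1` and `P2` are hypothetical and are not taken from any particular network database or tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    """One interaction between two nodes (hypothetical representation)."""
    source: str
    target: str
    directed: bool = False
    sign: Optional[int] = None  # +1 activation, -1 inhibition, None if unsigned

    def kind(self) -> str:
        # Classify the edge using the three categories named in the text.
        if not self.directed:
            return "undirected"
        if self.sign is None:
            return "directed, unsigned"
        return "directed, signed"


# Placeholder node names, for illustration only.
pin_edge = Edge("P1", "P2")                              # PIN-style edge
flow_edge = Edge("P1", "P2", directed=True)              # substrate-product only
psn_edge = Edge("P1", "P2", directed=True, sign=-1)      # PSN-style inhibition

kinds = [e.kind() for e in (pin_edge, flow_edge, psn_edge)]
```

Only the signed, directed form carries enough information to say that one protein activates or inhibits another, which is why it is the form suited to causal modeling.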
In cancer biology, comparative analysis is the natural focus of “conventional” low-throughput studies of signal transduction, with particular attention paid to differences in cellular responses to ligands or drugs in different cell types. In most cases, these differences reflect changes in the abundance or activity of signaling proteins (or of their substrates), features that could in principle be depicted by the strength of an edge in a network graph. However, existing PSNs and PINs do not encode the activities of proteins in cells that have been exposed to specific activators or inhibitors. A dearth of data on context-specific interactions makes it difficult to compare normal and diseased cells or diseased cells from different tumors. Cell- and state-specific information has been added to network graphs using gene expression data (3), but few attempts have been made to reconstruct comparative networks using biochemical data.
In this paper we attempt to combine concepts from global network discovery and traditional biochemistry by constructing comparative network models of signal transduction in normal and transformed liver cells. Starting with a prototypical network derived from the literature (which we will refer to as a prior knowledge network or PKN), we first constructed a set of all Boolean models compatible with the PKN, used the model “superstructure” to guide the collection of biochemical data on multiple nodes in the network across multiple cell types, and then trained the superstructure against data to uncover underlying differences in signaling logic among cell types. The net result is a computational representation of a signaling network that focuses on activity rather than literature association or physical interaction and that is explicitly comparative.
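The idea of enumerating Boolean models compatible with a PKN and then selecting among them by fit to data can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the node names `A`, `B` and `C`, the brute-force search, the mismatch count used as a score, and the made-up measurements. It is not the training procedure used in the paper.

```python
from itertools import combinations

# Hypothetical prior knowledge: node "C" may be driven by "A" and/or "B".
CANDIDATE_INPUTS = ("A", "B")


def enumerate_models(candidates):
    """List every (inputs, gate) model for the target node: each non-empty
    subset of candidate inputs, with OR and AND variants for multi-input sets."""
    models = []
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            gates = ("OR", "AND") if k > 1 else ("ID",)
            for gate in gates:
                models.append((subset, gate))
    return models


def predict(model, state):
    """Compute the Boolean output of the target node for one input state."""
    inputs, gate = model
    values = [state[name] for name in inputs]
    # A single-input "ID" model is handled by the any() branch.
    return int(all(values)) if gate == "AND" else int(any(values))


def mismatches(model, data):
    """Count conditions in which the model output differs from the observation."""
    return sum(predict(model, state) != observed for state, observed in data)


# Made-up, already discretized measurements (illustration only, not real data).
TOY_DATA = [
    ({"A": 0, "B": 0}, 0),
    ({"A": 1, "B": 0}, 0),
    ({"A": 0, "B": 1}, 0),
    ({"A": 1, "B": 1}, 1),
]

models = enumerate_models(CANDIDATE_INPUTS)
best = min(models, key=lambda m: mismatches(m, TOY_DATA))
```

With these toy measurements the AND model is the only candidate with zero mismatches. In a comparative setting, the same candidate set would be scored separately against data from each cell type, and the selected models would then be compared.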
A first essential step in adding activity data to networks is to convert PKNs into models in which it is possible to compute input-output (I/O) characteristics (1). In this paper we use a two-state (Boolean) logical formalism in which each node can have only two states, 0 or 1, but having a 1 at the output can depend on having a 1 at one of several inputs (an OR gate), at all inputs (an AND gate), or on any combination of 0 and 1 inputs. Boolean models have the advantage that they have no continuous free parameters and their topologies can be trained efficiently using data (1), a task that is harder with large differential equation models (10). However, we recognize that real biological systems exhibit dose-response behavior that is only poorly approximated by Boolean logic. Thus, a major question at the outset of this work was whether the strengths of Boolean modeling with respect to computational simplicity would outweigh its weaknesses. It seemed possible that the crudeness of the Boolean on/off approximation would overwhelm any differences we might measure experimentally from one cell type to the next. Conversely, success in creating comparative models would constitute a proof-of-principle for the approach.
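The gate definitions above can be written out in a few lines of Python. The function names are illustrative, and the AND-NOT gate is shown only as one assumed example of an output that requires a 0 at one input and a 1 at another.

```python
def or_gate(*inputs: int) -> int:
    """Output is 1 if at least one input is 1."""
    return int(any(inputs))


def and_gate(*inputs: int) -> int:
    """Output is 1 only if every input is 1."""
    return int(all(inputs))


def and_not_gate(activator: int, inhibitor: int) -> int:
    """Output is 1 only if the activator is 1 and the inhibitor is 0."""
    return int(bool(activator) and not inhibitor)


# Truth table for the two-input case, as (a, b, OR, AND, AND-NOT) tuples.
truth_table = [
    (a, b, or_gate(a, b), and_gate(a, b), and_not_gate(a, b))
    for a in (0, 1)
    for b in (0, 1)
]
```

Because each gate is fully specified by its type and its inputs, a model of this kind has no rate constants or other continuous parameters to fit; only the choice of gates and connections is trained.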
We therefore applied Boolean modeling to distinguishing patterns of immediate early signaling in normal and transformed cells, represented here by primary human hepatocytes and HepG2, Hep3B, Focus and Huh7 liver cancer cell lines. Liver cancer (which is dominated by hepatocellular carcinoma [HCC]) is the third most common cause of cancer death in humans (11) and is known to involve alterations in the EGF-Ras-MAPK, AKT/mTOR, Jak/Stat and NFκB cascades (12). Thus, we aimed to collect multivariate data on the activities of these pathways in normal and transformed hepatocytes. We show that it is possible to assemble predictive network models that are specific to each cell type, cluster models based on topology and uncover consistent biochemical differences between transformed and normal cells. By identifying an interaction missing from the starting PKN but supported by data, we also uncover a poorly documented off-target effect of a drug being developed for asthma and inflammation (13). Our findings demonstrate that discrete logical modeling can capture cell-type-specific biochemical relationships, raising the possibility of constructing large comparative models of signal transduction in normal and diseased cells.