The exponential accumulation of molecular-biological intracellular data promises to advance the biomedical sciences to a stage at which most of the important events that regulate mammalian cells under normal and pathophysiological conditions can be understood and tracked. This would permit the development of a new generation of personalised therapeutics [1] and open doors in synthetic biology [2]. Physicists, mathematicians and engineers are increasingly engaging in systems biology, bringing tools from different disciplines to model intracellular complexity. Modelling efforts can be divided into three categories: network inference, dynamical modelling and graph analysis [3]. In contrast to graph analysis, network inference and dynamical modelling need quantitative details and/or large data sets to build models [4]. The main challenge with network inference and dynamical modelling methods is that many model realisations can fit the same data. Hence, the question commonly asked is: ‘how do we really know whether the model represents the real system under investigation, since there could be many alternative models that fit the same data?’ Much of the currently available and rapidly accumulating experimental data that attempt to capture intracellular regulation are qualitative, noisy, inaccurate and incomplete, so selecting a valid model from an assemblage of possible models is difficult. Network integration and graph analysis, highlighted in this review, offer a practical and complementary alternative to network inference and dynamical simulation. Some of the challenges within this research domain are: ‘how to project lists of genes or proteins identified in multivariate experiments onto large-scale known intracellular interaction networks?’, ‘how to integrate different networks so they can be used as background knowledge to fill in gaps not captured experimentally?’ and ‘how to develop heuristics to overcome the NP-hardness of the graph search problem?’ Effective data integration, together with filtering, graph querying and visualisation tools, is a key component for success in this subfield of systems biology [5].
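The first of these challenges, projecting a list of genes or proteins onto a known background network, can be illustrated with a minimal sketch. It is not a method taken from any of the cited tools. It assumes the background network is available as a plain adjacency dictionary, and it connects the seed nodes by taking the union of pairwise shortest paths, a simple heuristic that avoids searching for the smallest connecting subnetwork, which is an NP-hard (Steiner-tree-type) problem in general. The function names, the `max_length` cut-off and the toy node labels are all hypothetical.

```python
from collections import deque
from itertools import combinations


def shortest_path(adj, source, target):
    """Breadth-first search; return one shortest path as a list, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parent links to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in adj.get(node, ()):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def project_seeds(adj, seeds, max_length=3):
    """Connect seed nodes through the background network.

    Returns the union of shortest paths (at most max_length edges long)
    between every pair of seeds that are present in the network.
    """
    present = [s for s in seeds if s in adj]  # seeds unknown to the network are dropped
    nodes = set(present)
    edges = set()
    for a, b in combinations(present, 2):
        path = shortest_path(adj, a, b)
        if path is not None and len(path) - 1 <= max_length:
            nodes.update(path)  # intermediates fill gaps between seeds
            edges.update(frozenset(pair) for pair in zip(path, path[1:]))
    return nodes, edges


# Toy undirected background network with made-up node labels.
background = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B"},
    "D": {"B", "E"},
    "E": {"D"},
}

# "Z" stands for an experimentally identified gene absent from the network.
nodes, edges = project_seeds(background, ["A", "C", "Z"])
```

In this toy case the intermediate node `B` is pulled in because it links the two seeds `A` and `C`; this is the sense in which a background network can fill gaps not captured experimentally. The path-length cut-off is one possible filter for keeping the resulting subnetwork small.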
The abstraction of interactions within a cell into networks (formally, graphs) is becoming a conventional approach to deal with the large volume of data collected through emerging high-throughput technologies [6], as well as with low-throughput studies reported in the research literature, which are likewise integrated and abstracted into networks. Different biological networks can be represented by different types of graphs [9]. For example, protein–protein interaction networks can be represented as undirected graphs where nodes are proteins and edges represent direct physical interactions. Gene-regulatory networks can be abstracted to directed graphs where nodes are genes encoding transcription factors (or other types of proteins) and links represent transcriptional regulation. Metabolic networks can be represented as bipartite graphs where nodes are separated into two sets: enzymes and substrates [10]. Although different graphs are used for different networks, the abstraction to networks helps with data integration [11]. For example, Tanay et al. used a bipartite graph to integrate different ‘omics’ data by using yeast genes as anchors. Most efforts to reconstruct in-silico regulatory networks have targeted model organisms, but it is recognised that mammalian cellular networks are critically needed to facilitate biomedical breakthroughs. This review focuses mostly on data sets and tools that deal with mammalian biomolecular intracellular networks.
Figure: Different intracellular biological networks can be represented by different types of graphs.
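The three graph types described above can be made concrete with a short sketch. It assumes each network is supplied as a list of pairs and uses only plain dictionaries and sets; the function names and the node labels (`P1`, `TF1`, `E1`, `S1` and so on) are invented for illustration and do not refer to real molecules or to any particular database format.

```python
from collections import defaultdict


def undirected_graph(edges):
    """Protein-protein interactions: each edge is stored in both directions."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def directed_graph(edges):
    """Gene regulation: each regulator maps to the set of genes it regulates."""
    adj = defaultdict(set)
    for regulator, target in edges:
        adj[regulator].add(target)
        adj.setdefault(target, set())  # targets are nodes even with no outgoing links
    return dict(adj)


def bipartite_graph(edges):
    """Metabolic network: edges only ever join an enzyme to a substrate."""
    enzymes, substrates = set(), set()
    adj = defaultdict(set)
    for enzyme, substrate in edges:
        if enzyme in substrates or substrate in enzymes:
            raise ValueError("a node cannot belong to both node sets")
        enzymes.add(enzyme)
        substrates.add(substrate)
        adj[enzyme].add(substrate)
        adj[substrate].add(enzyme)
    return enzymes, substrates, dict(adj)


# Toy examples with made-up labels.
ppi = undirected_graph([("P1", "P2"), ("P2", "P3")])
grn = directed_graph([("TF1", "G1"), ("TF1", "TF2")])
enzymes, substrates, metabolic = bipartite_graph(
    [("E1", "S1"), ("E1", "S2"), ("E2", "S2")]
)
```

Because all three structures reduce to nodes keyed by an identifier, networks of different types can be joined wherever they share identifiers, which is one way to read the remark that abstraction to networks helps with data integration.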