The concept of networks is ubiquitous in systems biology. In the past decade, high-throughput experimental techniques such as yeast 2-hybrid systems and mass spectrometry-based proteomics led to an influx of biomolecular interaction data in curated databases such as HPRD [1], DIP [2], and BIND [3]. Computational methods to predict protein interactions with domain interaction profiles [4], co-expression patterns [5], and term co-occurrences based on text mining [6] have also led to the development of databases such as OPHID [7], InterNetDB [8], UniHi [9], HAPPI [10], and STRING [11]. These databases support the transformation of biological network studies into essential biological data analysis tasks that include inferring global protein functions [12], assembling protein modules [13], integrating different Omics data sets [14], reconstructing biological pathways [15], predicting disease-relevant genes/proteins [16], and developing panel biomarkers [17].
Many network visualization software tools have been developed recently to help biological researchers visually query, annotate, and analyze biomolecular network data. For example, Cytoscape [18] is one of the most commonly used software platforms and contains all the basic functions for visualizing and annotating a network graph derived from protein-protein or protein-DNA interaction data. It has a robust graph layout engine that allows both automatic layout and manual control of node and edge attributes corresponding to user annotation data. Cytoscape adopts an open and flexible software architecture that supports plug-ins, through which third parties can extend its core functionality. VisANT [19] competes with Cytoscape by offering several built-in statistical functions that help users calculate key network topological parameters and perform global real-time network analysis. WebInterViewer [20] uses an ultra-fast graph-layout algorithm that scales to biomolecular interaction networks of up to tens of thousands of nodes on a desktop computer, while providing several network abstraction and comparison operators. The most recent feature-rich network data analysis software tool, Biological Networks [21], enables advanced bioinformatics users to integrate microarray data analysis with biomolecular interaction network analysis over a diverse set of database choices through powerful template-based query interfaces. Pathway Studio [22], which is available commercially, also uses a powerful visualization engine and query interfaces; it allows its users to manage and access data stored in relational databases and to integrate biomolecular interaction data from its PubMed literature mining engine with other sources. In summary, the current development trend is to equip users with an extended ability to query and interpret existing experimental data, particularly data from "Omics" platforms, in the emerging context of biomolecular interaction networks.
Recent research in network biology has expanded beyond the study of protein-protein interactions or protein-DNA interactions, thereby presenting new challenges and opportunities for biological network visualization and analysis software. These networks are more complex, with heterogeneous types of biological entities spanning a broad range of scales from molecular (e.g., DNA, proteins, metabolites), to super-molecular (e.g., gene ontology categories, protein complexes, pathways), to intercellular (e.g., signaling between different cell types), to tissue and physiological (e.g., individual disorder types) levels. For example, Goh et al. explored all known associations of disease phenotypes by representing disease phenotypes instead of molecular entities as nodes in a network graph [23]. They described two new types of biological networks, the "disease interaction network" and the "disease-gene network". The former represented disease names as nodes and disease associations at the molecular level (sharing > 1 disease genes between associated diseases) as edges, while the latter represented genes as nodes and gene associations through a common disease (sharing > 1 diseases between associated genes) as edges. To characterize the global relationships between protein targets and all chemical drug compounds available today, Yildirim et al. [24] built a drug-target association network representing all known drugs and their targets recorded in the DrugBank database [25]. The network offered an intriguing view with "hot" drug intervention points (popular drug targets) and multi-targeted drugs clearly displayed. Analyzing the data in multi-scale biological networks is inherently more challenging than analyzing biomolecular interaction networks, primarily because the heterogeneous interacting biological entities may differ significantly in size, quality, complexity, and annotation requirements, making it combinatorially more difficult to develop user interfaces that preserve usability and robustness at the same time. Few existing tools can empower users to perform "visual analytics" (discovering novel information through visualization) for multi-scale biological networks.
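The two network types attributed to Goh et al. above can both be read as projections of a single table of disease-gene associations. The following minimal Python sketch illustrates that idea only; it is not the authors' or Goh et al.'s implementation. The disease and gene names are hypothetical placeholders, and the sharing threshold `min_shared` is left as a parameter whose default of one shared item is an assumption.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical disease-to-gene associations (placeholder names, not real data).
disease_genes = {
    "disease_A": {"G1", "G2"},
    "disease_B": {"G2", "G3"},
    "disease_C": {"G4"},
}

def invert(associations):
    """Turn a disease -> genes mapping into a gene -> diseases mapping."""
    inverted = defaultdict(set)
    for key, values in associations.items():
        for value in values:
            inverted[value].add(key)
    return dict(inverted)

def project(associations, min_shared=1):
    """Connect two keys when their value sets share at least `min_shared` items.

    Returns a dict mapping each edge (a, b) to the set of shared items.
    """
    edges = {}
    for a, b in combinations(sorted(associations), 2):
        shared = associations[a] & associations[b]
        if len(shared) >= min_shared:
            edges[(a, b)] = shared
    return edges

# Diseases as nodes, linked when they share disease genes.
disease_network = project(disease_genes)
# Genes as nodes, linked when they are implicated in a common disease.
gene_network = project(invert(disease_genes))

print(disease_network)
print(gene_network)
```

Because both networks derive from the same association table, changing the threshold or the underlying associations regenerates both views consistently.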
To support multi-scale biological network visual analytics studies, new software tools must meet three basic requirements. First, the bulk of the data should be managed by robust backend engines that support rich schemas, such as relational database management systems (e.g., PostgreSQL, Oracle) or XML/RDF data stores (e.g., Jena, Piazza). Flat files quickly become unsustainable beyond one or two spreadsheets of custom user input data, due to the lack of a standard schema and the difficulty of combining information from separate spreadsheets. Second, iterative, exploratory, and bi-directional data analysis capabilities, which let users save temporary results and build visualization sessions on top of one another, should be a prerequisite. Many current software tools support only a one-way information flow from data sheets to visualization, and therefore should be referred to as "visual annotation" or "visual display" tools instead of "visual analytic" tools. Third, visual querying languages, even if borrowed directly from SQL in relational database querying or SPARQL in semantic web based data querying, would be quite beneficial to advanced users, who have to filter different facets of biological networks and carry out complex network analysis tasks, because such languages automate tasks that are otherwise "menu-driven" or "mouse-click intensive". As Suderman et al. found in a recent survey, none of the 35 commonly used biological network visualization tools supported such directly embedded query languages [26].
We developed ProteoLens as a new visual analytic software platform for creating, annotating, and analyzing multi-scale biological networks. Compared with existing biological network visualization tools, ProteoLens introduces a new set of design choices that make it easy for expert bioinformatics data analysts to work on large sets of biological networks and Omics data. Three primary characteristics distinguish it from existing network visualization tools. First, it supports direct database connectivity to Oracle and PostgreSQL databases and accepts SQL statements in both the Data Definition Language (DDL) and the Data Manipulation Language (DML). Users of ProteoLens can iteratively prepare data stored in relational databases without leaving the visual analytic environment. Data from different tables in a complex relational database schema can also be queried on the fly to create networks at the appropriate level for exploration. Second, ProteoLens supports graph/network data expressed in the standard Graph Modeling Language (GML) format. Therefore, visual layouts produced in comparable software tools can interoperate with ProteoLens as long as those tools also support the GML standard. This allows users to perform visual network analysis on data from heterogeneous sources that are represented in a non-relational format. Third, it decouples the complex user interface for network visualization into two separate functional layers: data annotation and data visualization. The concepts of "node association rules" and "edge association rules" give users significant flexibility in choosing which data attributes (e.g., score, rank, description) to map to nodes or edges, and the association visualization display options then allow users to select the visual effects that represent the values of these attributes.
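To make the first two characteristics concrete, the following minimal Python sketch builds a small network from SQL queries and serializes it in GML. It is an illustration under stated assumptions, not the ProteoLens implementation: the standard-library `sqlite3` module stands in for an Oracle or PostgreSQL connection, and the `protein` and `interaction` tables, their columns, the score cutoff, and the `to_gml` helper are all hypothetical.

```python
import sqlite3

# sqlite3 is a stand-in for Oracle/PostgreSQL; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- DDL: define the tables
    CREATE TABLE protein (name TEXT PRIMARY KEY, description TEXT);
    CREATE TABLE interaction (source TEXT, target TEXT, score REAL);
    -- DML: load placeholder rows
    INSERT INTO protein VALUES ('P1', 'kinase'), ('P2', 'receptor'), ('P3', 'ligand');
    INSERT INTO interaction VALUES ('P1', 'P2', 0.9), ('P2', 'P3', 0.4);
""")

# Network query: keep only interactions at or above an (assumed) score cutoff.
edges = conn.execute(
    "SELECT source, target, score FROM interaction "
    "WHERE score >= ? ORDER BY source, target",
    (0.5,),
).fetchall()

# Node annotation query: attach a description attribute to each node,
# loosely analogous to a node association rule.
node_description = dict(
    conn.execute("SELECT name, description FROM protein").fetchall()
)

def to_gml(edges, node_description):
    """Serialize nodes and scored edges as a GML graph (hypothetical helper)."""
    node_id = {name: i for i, name in enumerate(sorted(node_description))}
    lines = ["graph [", "  directed 0"]
    for name, i in node_id.items():
        lines.append(
            f'  node [ id {i} label "{name}" description "{node_description[name]}" ]'
        )
    for source, target, score in edges:
        lines.append(
            f"  edge [ source {node_id[source]} target {node_id[target]} score {score} ]"
        )
    lines.append("]")
    return "\n".join(lines)

gml_text = to_gml(edges, node_description)
print(gml_text)
```

In this sketch the annotation values (description, score) are stored on the graph without any commitment to how they are drawn, which mirrors the separation between data annotation and data visualization described above; a display layer could later map `score` to edge width or color.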
In the next several sections, we first describe the ProteoLens implementation and then demonstrate, through three case studies, how it can be used to enable multi-scale biological network-based research.