|Home | About | Journals | Submit | Contact Us | Français|
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Studies of cellular signaling indicate that signal transduction pathways combine to form large networks of interactions. Viewing protein-protein and ligand-protein interactions as graphs (networks), where biomolecules are represented as nodes and their interactions are represented as links, is a promising approach for integrating experimental results from different sources to achieve a systematic understanding of the molecular mechanisms driving cell phenotype. The emergence of large-scale signaling networks provides an opportunity for topological statistical analysis while visualization of such networks represents a challenge.
SNAVI is Windows-based desktop application that implements standard network analysis methods to compute the clustering, connectivity distribution, and detection of network motifs, as well as provides means to visualize networks and network motifs. SNAVI is capable of generating linked web pages from network datasets loaded in text format. SNAVI can also create networks from lists of gene or protein names.
Interactions between signaling pathways in mammalian cells indicate that a large-scale complex network of interactions is involved in determining and controlling cellular phenotype [1-3]. To visualize and analyze these complex networks, the biochemical networks may be abstracted to directed graphs . To understand the topology of such networks, graph-theory methodologies can be applied to analyze networks' global and local structural properties . Additionally, the value of assembled network datasets is enhanced with network visualization software and web-based information systems. These systems provide summary information, order, and logic for interpretation of sparse experimental results [6,7]. Visualization tools and web-based navigation systems provide an integrative resource that aids in understanding the system under investigation and may lead to the development of new hypotheses.
Graph-theory methods have been used in other scientific fields to analyze complex systems abstracted to networks. For example, Watts and Strogatz  defined a measure called the "clustering coefficient" (CC) for characterizing the level of clustered interactions within networks by measuring the abundance of triangles in networks (three interactions among three components). For instance, if a node has four neighbors and three of the neighbors are directly connected, the CC for that node is 0.5 because the four neighbors can be connected maximally with six links (3/6 = 0.5). The network's CC is the average CC computed for all nodes. Caldarelli et al.  formulated an algorithm to consider rectangles (four interacting nodes) in the clustering calculation, and called it the grid coefficient. Watts and Strogatz also used the characteristic path length to measure the disjointedness between nodes in networks. Characteristic path length is the average shortest path between any two pairs of nodes. It is calculated for all possible pairs of nodes, such that the average minimum number of steps between all pairs of nodes is the characteristic path length. Together, the CC and the characteristic path length measurements have a predictable relationship when computed for most real networks. This observation is called the "small-world" phenomenon .
Barabasi and coworkers  analyzed the connectivity distribution of metabolic networks and other biochemical networks and observed a connectivity distribution termed "scale-free". Scale-free property indicates that the connectivity distribution of nodes follows a long heavy tail that fits a power-law. Such distribution results in few highly connected nodes that serve as hubs whereas most other nodes have few links. Another topological property that is used to statistically analyze biochemical regulatory networks is the identification of network motifs. In biochemical regulatory networks, motifs are subcircuits of molecular interactions involving multiple cellular components. The different possibilities for subcircuit configurations made of several components define different types of network motifs. All the possible combinations for interconnectivity made of few components in directed graphs can be determined  and then used to identify their prevalence by comparing the counts in random topologies. This method was used to characterize motifs in gene regulatory networks from Caenorhabditis elegans and Saccharomyces cerevisiae [11-14]. This type of analysis identified signature patterns of network motifs that can characterize different types of networks, including signal-transduction networks [13,14]. The graph-theory based network analysis methods described above are statistical. Such statistical analysis of signaling networks requires that the size of the network is large enough (requiring an estimated minimum of 200 nodes). SNAVI includes functions to compute the clustering, characteristic path length, and connectivity distribution of networks, and provides the means to identify and visualize network motifs.
Statistical analysis of network topology is complemented by effective network visualization and web-based navigation tools. Maps or diagrams of signaling pathways help summarize many interactions at once. Maps may suggest new interpretations for experiments, because the act of preparing the maps imposes logical interpretation . Additionally, mapping a network is an important initial step for developing models for quantitative simulation . Molecular interaction network maps are constantly changing as new data become available, and manually redrawing signaling maps is not convenient or desirable. The requirements for mapping large-scale biological networks include showing an appropriate level of detail, minimizing overlap of nodes and links, and compatibility with multiple data storage formats .
Existing software tools draw networks automatically from databases, Excel spreadsheets, XML (Extensible Markup Language), or text files where interactions are listed in a structured format. One recommended platform is Cytoscape . General network visualization software are often used by computational biologists, for example the Pajek software project , or AT&T's Graphviz project , where the second is an open-source project used as a library in many applications, i.e., Science Signaling uses GraphViz to display their Connections Maps . When maps expand beyond a certain number of nodes (~40–50) it becomes impossible to follow the links generated using the Pajek (version 1.10) or GraphViz (version 1.0) programs. One solution is implementing zooming and panning functionality using scalable vector graphics (SVG) code  or Flash, or dividing large, complex pathways into sets of smaller interrelated pathways. Another solution is to allow users to specify a portion of the network they want to explore and then construct subnetworks that are easily navigated. SNAVI can be used to construct and visualize such subnetworks to allow investigation of larger networks.
SNAVI is a software tool for statistical analysis and visualization of large-scale cellular signaling networks and other biochemical intracellular networks. Here, we demonstrate how SNAVI can be used for web-based visualization and statistical analysis of biological regulatory networks. As an example, the installation of SNAVI provides a network representing signaling pathways in hippocampal neuronal cells . To create this network, direct interactions were extracted from primary papers into a template stored in a flat file (Table (Table1),1), and then verified through a multistep manual review process by biologists. The network currently contains 594 nodes and 1422 links extracted from 1296 articles. Users may use this dataset or may load their own data. The process of creating, analyzing, and visualizing signaling networks using SNAVI is described in the methods.
SNAVI is organized into seven functional parts: Loading data into SNAVI, file manipulation and validation, exporting data to other programs, network visualization, statistical analysis of network properties, and identification of network motifs. The main SNAVI interface divides the functions of the software into five sections (Figure (Figure1).1). Help is available within the software by clicking the question mark buttons, which either opens a new window with detailed information or pops up a brief description of the function in question. SNAVI installation script automatically places a set of sample data files into the "Data" directory under the directory in which the SNAVI program was installed. These are "SNAVI.sig", "BACKGROUND.sig", "GENESLIST.txt", and "TWOCOL.txt". When analyzing user-supplied datasets, it is recommended that each dataset be located in its own directory.
The native data format for SNAVI conforms to a specified format (Table (Table1)1) and these files are called "sig" files. These are flat text files that contain lists of interactions where each interaction has 13 attributes. Table Table11 also includes four interaction records given as examples, representing the 13 attributes from a "sig" file. There are three ways to load data into SNAVI for analysis: Users may load sample datasets that are included with the software ("sig" format), create a network from lists of human Entrez Gene official symbols using a background network within SNAVI that contain over 7,000 human protein-protein and ligand-protein interactions, or analyze user-supplied interaction data listed in two-column text format. Once loaded, datasets may be saved as "sig" format for later analysis or manual editing.
SNAVI contains several routines to validate the integrity and completeness of loaded network datasets. These routines check for name, type of nodes, type of interactions, and logical or syntactical inconsistencies in accession codes. For example, two different accession numbers associated with the same node name are flagged and reported, as well as interactions involving the same pairs of node-names but having different effects. All validation errors may be ignored, if the errors are not relevant to the user's analysis and visualization requirements. Most functions will still work even when validation errors exist. Validation errors are reported in a window created from a file called "validation.htm" that is automatically saved in the same directory or folder in which the loaded data file is from, which is referred to as the current working directory.
In most networks, not all nodes are reachable from all other nodes. This means that networks are fragmented into isolated but internally connected components. Some statistical measurements and visualization functions can only work for networks where all nodes are reachable from all other nodes. Therefore, SNAVI provides a mechanism to export the biggest component in a network into a new sig file for analysis. This file is created with the "Dump the biggest component" option on the File and Validation section of the main SNAVI interface.
Another acceptable input for SNAVI is a flat text file (".txt") containing a list of human genes or proteins. This type of file may be useful for creating an interaction network from experimental data, such as a list of mammalian genes or proteins produced by microarray or proteomic experiments. The gene or protein names must be listed in a text file and use the human Entrez Gene official symbol term for the corresponding human gene or protein in each row in the text file in a single column. SNAVI uses a background network of human protein-protein and ligand-protein interactions containing over 7,000 interactions to build a subnetwork containing the list of gene names connected through intermediate proteins or signaling components. The user specifies the number of intermediates (1 through 3) that can be used to connect any two genes in the list. Once SNAVI has created a subnetwork from the list of gene names, the resultant subnetwork can be saved as a "sig" file.
SNAVI provides several options for viewing entire networks or network motifs. The viewing options involve html files, SVG files, jpg, and dot files depending on the visualization option selected. The color and shape settings for the graphical displays are stored in a small text file called colors_and_shapes.txt, which can be found in the current working directory. This file may be modified manually to generate user-defined colors and shapes. The file must have three columns in each row. The first column corresponds to the molecule type defined in the sig file, the second column is a color name defined in at http://www.graphviz.org/doc/info/colors.html by GraphViz, and the third column is a shape defined at http://www.graphviz.org/doc/info/shapes.html by GraphViz.
After loading a network or creating a network from a list of gene names, one of the visualization options SNAVI provides is the creation of a set of web pages that are saved in the directory that contains the uploaded data file. When the program is creating the web site, it attempts to connect to PubMed and Swiss-Prot databases through an Internet connection to obtain information about the proteins and details about references used. The web page files include a main page that has links to pages containing the following information in addition to a network diagram organized by the type of component in the network: (i) an alphabetized list of molecules with links to individual pages for each node in the network, (ii) network statistics, (iii) legend to colors and shapes. The web pages created for each node contain an SVG map with all the upstream and downstream links from that node, and a list of references and detailed text descriptions of interactions directly implicated with each node. The pages also contain links to the Swiss-Prot database and statistics for individual nodes.
SNAVI's PathwayGenerator function provides a mechanism to query a network by allowing the user to select a source node and a target node and then finding the paths that connect these two elements in a network (for example, a ligand as source, and a downstream effector, such as a transcription factor, as target). The output is a graphical network map showing the interactions that connect the selected nodes. The program attempts to find the shortest paths from the source to the target while optimizing the maximum number of steps from the source to the target so that the map is of optimized size. This is performed to fit as many nodes and links as possible based on the user's specified parameters. The process of maximizing the number of steps from the source to the target is implemented to generate maps that are optimally sized for easiest navigation and maximal content, and to control execution time. An example of the interface and a generated map from GRIN2A to Syntaxin3 is provided (Figures (Figures22 and Figure Figure33).
There are two ways to create the query that draw subnetworks. The "PathwayGenerator" interface is the "advanced" user interface where the user may set the maximum number of parameters. The "Easy Interface" limits the options for the source node to those classified as ligands or receptors, and then allows the target node to be one of the following: transcription factor, kinase, phosphatase, vesicle-related protein, cytoskeleton, or ion channel. When the radio button next to each option is clicked a drop-down menu with the names of the nodes provides the available choices for each. SNAVI also includes a function that finds, counts, and plots a subset of small-sized network motifs: scaffolds (three-node triad of all neutral links), feedback loops, feed-forward loops and bifans . An example of the user interface dialog box to select different motifs and a sample output of such analysis is provided (Figures (Figures44 and Figure Figure55).
The statistics report in SNAVI shows the number of components, interactions, and references for the network. The components are categorized according to their function and their locations. The interactions are categorized according their effects and the types of interaction. SNAVI statistics reports provide several standard graph-theory measures. For example, the clustering coefficient (CC) measures the level of clustered interactions in networks by measuring the abundance of triangles. The grid coefficient measures the level of clustered interactions by counting rectangles. The connectivity distribution is the number of nodes that share the same degree of connectivity.
To further investigate the topological structure of networks, SNAVI can be used to partition the loaded networks into subnetworks based on two different specified criteria. In the first option, which is accessed by the "Generate and analyze subnetworks based on connectivity", the inclusion of nodes and interactions is based on nodes' degree. In the second option, which is accessed by the "Generate and analyze subnetworks propagating from source nodes", the inclusion of nodes and interactions is based on specifying a group of nodes from which the connectivity is expanded in steps to build subnetworks. For example, subnetworks can be generated and analyzed by setting ligands as the source nodes and expanding the subnetworks in steps in the downstream direction. Starting from a specified group of source nodes, series of subnetworks can be created and analyzed expanding in steps from source nodes. The subnetworks characteristics produced by such analysis are reported in a HTML table. This table can be exported to an Excel spreadsheet.
Comparing real topologies to shuffled networks is important for identifying topological properties that have been selected for in biological regulatory networks. SNAVI provides the user with functions to create and analyze three types of shuffled networks based on the original loaded network: shuffled networks in which pairs of target nodes were repeatedly swapped, shuffled networks in which the connectivity is the same as that in the original network but the signs of effects (positive +, negative -, or neutral 0) are shuffled, or Erdos-Renyi networks . Erdos-Renyi networks have the same number of nodes and links as the original network; however the links are randomly assigned between nodes.
Networks loaded into SNAVI can be exported into the following file formats: MFinder, Cytoscape, DOT and Pajek.
The MFinder program  is a command line program that counts and reports network motifs. The program identifies statistically over- or under-represented network motifs, and reports the motifs that have been found in an output text file. The MFinder program accepts input as text files containing three columns of integers. The first two columns represent the source node and the target node, respectively. The third column is not used (reserved for future implementations).
Cytoscape http://www.cytoscape.org/ is an open source bioinformatics software platform for visualizing molecular interaction networks and integrating these interactions with gene expression profiles and other state data. SIF files created with SNAVI can be loaded into Cytoscape for visualization. This file can be loaded into Cytoscape using the File->open function.
SNAVI users can also export networks into the Pajek software http://vlado.fmf.uni-lj.si/pub/networks/pajek/. Pajek is Windows-based freely available network visualization and analysis software. Pajek accepts formatted text files as input that follow the Pajek language specification. Files are saved with the .net extension. The Pajek files created with SNAVI specify the node colors based on their degree of connectivity (k), as well as by assigning colors to links (green for activation arrows, and red for inhibition arrows) and specifying the direction of the links so that arrows are drawn. The Pajek software is mostly useful for drawing large networks. It contains algorithms that optimize locations of nodes to minimize link lengths and link crossings.
Many of the functionalities that have been implemented for SNAVI are available with the network analysis software system Cytoscape. Although the SNAVI and Cytoscape software systems provide some similar functionality, Cytoscape is much more robust and provides many more features than SNAVI. Hence, SNAVI is not as stable as Cytoscape and as such we do not recommend SNAVI as the first choice for biological network analysis in general. However, SNAVI still has some unique features that are not found in Cytoscape. SNAVI is especially appropriate for analysis and visualization of mammalian cell signaling networks represented as directed graphs. This is because SNAVI's core functions were developed specifically for the analysis applied for a study published in Science in 2005 . A case study of how to use SNAVI to analyze and visualize this signaling network is provided as supporting material to this article [see additional file 1]. The two main features in SNAVI that are not available in Cytoscpae are: a) several specific open source algorithms for performing different types of complex network analyses; b) SNAVI can effectively be used to created complete web-sites from large-sized networks by breaking a large network into pieces presented as linkable web-pages. We summarized the differences and similarities between SNAVI and Cytoscape in Table Table22 as per the peer reviewers' request. Although many features are shared by SNAVI and Cytoscape users should consider that although the same features might be implemented there could be still significant differences in the way such features were implemented. Hence, there are advantages/disadvantages and/or user personal preferences for specific implementations.
Effective analysis and visualization tools for large-scale cellular networks that can draw connectivity diagrams with an unbiased predisposition from large interaction datasets can help cell and molecular biologists identify potential new pathways and sometimes predict missing components and interactions. Such tools can be used to support hypothesis generation and experimental design, improve presentations in seminars and publications, and serve as a valuable educational resource for students. The abstraction of signaling networks into nodes linked through three types of links (activating, inhibiting, or neutral) is simplified, but nevertheless captures key features of information flow in cellular regulatory networks. Others have proposed more detailed descriptive and visualization approaches, which may provide more information about cellular regulatory systems than what is currently provided by SNAVI [15,23]. Although these other visualization schemes can be useful, it is more difficult to annotate interactions to fit into their templates. In order to move from gene and protein lists to network maps and then to predictive models of mammalian cells, signaling networks will indeed have to become more detailed in their annotation. Such details will include kinetic parameters and spatial information . We hope that SNAVI will provide one more resource towards achieving such long term goals.
Project name: SNAVI
Project home page: http://snavi.googlecode.com/
Operating system: Windows XP or Vista
Programming language: Visual C++ .NET version 7.1
Other requirements: Adobe SVG Viewer Version 3.03, WinGraphViz.
License: GNU GPL
Any restrictions to use by non-academics: License is not needed but interested commertial parties should contact ude.mssm@ygolonhcet before reusing the code.
AM designed the system and implemented the code except for the grid coefficient and matrix visualization functions which were implemented by SPP. RLW, SIB, NAH, JMP, TF tested the system. AM and SLJ wrote the manuscript. RI reviewed the manuscript and supervised the project.
Case Study Instructions for SNAVI. The document provides step by step instructions on how to use SNAVI to analyze and visualize a large cell signaling networks provided with the software.
This work was supported by NIH Grant No. 1P50GM071558-01A27398. We would like to give special thanks to Nancy Gough the Editor of Science Signaling for her many useful suggestions and comments.