With the integration of the KEGG and Predictome databases as well as two search engines for coexpressed genes/proteins using data sets obtained from the Stanford Microarray Database (SMD) and Gene Expression Omnibus (GEO) database, VisANT 3.0 supports exploratory pathway analysis, which includes multi-scale visualization of multiple pathways, editing and annotating pathways using a KEGG compatible visual notation and visualization of expression data in the context of pathways. Expression levels are represented either by color intensity or by nodes with an embedded expression profile. Multiple experiments can be navigated or animated. Known KEGG pathways can be enriched by querying either coexpressed components of known pathway members or proteins with known physical interactions. Predicted pathways for genes/proteins with unknown functions can be inferred from coexpression or physical interaction data. Pathways produced in VisANT can be saved as computer-readable XML format (VisML), graphic images or high-resolution Scalable Vector Graphics (SVG). Pathways in the format of VisML can be securely shared within an interested group or published online using a simple Web link. VisANT is freely available at http://visant.bu.edu.
Biological pathways are often represented as pixel images (JPEG, GIF, etc.) or vector graphics (Scalable Vector Graphics (SVG) or PostScript). Typical examples of such static representations include those presented in databases such as KEGG (1), Reactome (2), BioCarta (http://www.biocarta.com) and EcoCyc (3). Although a static representation is intuitive and informative and has been widely used in textbooks and illustrations, it is difficult to edit, or to reuse for analysis, modeling and simulation. As a result, important resources such as the KEGG database cannot be fully exploited. Notable steps toward meeting the challenge of computable representations include the development of BioPAX (Biological Pathways Exchange, http://www.biopax.org/) and KGML (KEGG Markup Language, http://www.genome.jp/kegg/xml/) with BioPAX focusing on detailed ontology while KGML includes layout information.
A number of software tools (4–9) have been developed to visually build computable models of pathways. These tools are usually based on graphical models in which nodes represent genes, proteins or chemical compounds, and edges represent various types of interactions or associations. To date, few tools support the conditional dependencies of molecular and genetic entities and their associations. Thus, pathways encoded with existing tools may lack key information needed for interpreting the pathway's functioning.
To combine multiple pathways in a manner that is useful for modeling cellular behavior, two main challenges must be addressed. First, models must allow a hierarchical visual representation (6,10–12). Second, data representation is complicated when several complexes share some of their proteins, because the role of a common protein generally depends on context (13,14). Methods such as semantic zooming or hierarchical decomposition (10,12,15–20) are needed to aggregate and abstract entire pathways or pathway portions into small units that can be displayed within larger pathway systems. Hierarchical structures are also very common in the computable representation of biological knowledge in BioPAX and KGML formats. A protein complex must often be represented as a node containing a set of nodes, one for each subunit. Each subunit in turn may itself contain a set of nodes representing conserved domains identified in the subunit's 3D structure or primary sequence. Representing a protein complex as a simple, non-hierarchical node often obscures properties of the proteins because attributes of the simple node are aggregated across multiple proteins, each of which may have different attributes. An obvious workaround for this issue is to model protein complexes as ‘compound nodes’ (10,11,15) or ‘metanodes’, which are nodes with recursive internal structure (Figure 1) (20).
While biological systems contain an appreciable amount of hierarchical organization, molecular components are reused across subsystems, making it impossible to capture all of the information in a perfectly nested set of relations. Strict hierarchical representations can capture biological substructure but cannot model overlap between protein complexes. Relatedly, a node that represents only a single protein may not have a unique state but may instead behave in a condition-dependent manner. It is common practice (21) to use multiple nodes to represent different states of the same protein to maintain clarity of control logic and conditional dependency in pathways. However, this can lead to an explosively growing chain of nodes. It also breaks data integrity and introduces data redundancy, as the same protein is represented by multiple nodes. More importantly, the exact condition-dependent state of a given protein can be unclear or unknown in many pathways. A typical example is the protein STE20 in the MAPK signaling pathway for yeast (http://www.genome.ad.jp/dbget-bin/show_pathway?sce04010+YHL007C), which most likely has different activities under different conditions, but the exact nature of the state differences is currently unknown. How can such conditional dependencies be represented and modified when corresponding biological information becomes available?
Protein–protein interaction data sets obtained from either large-scale experiments or computational predictions, as well as coexpressed genes predicted from large-scale expression data, can be used to help fill gaps in incomplete pathways (22–25). Although many tools provide facilities to visualize expression data in the context of pathways (4,5,7,9,26), facilities to enrich pathways in a computationally based visualization system, using both interaction and expression profiles, are missing.
Here we report new developments in VisANT 3.0, a Web-based platform with new modules supporting exploratory pathway analysis: multi-scale visualization of multiple pathways using metagraphs (20); editing and annotating pathways using a KEGG-compatible visual notation; visualization of expression data in the context of pathways; enriching pathways using either coexpressed components of known pathway members, predicted from expression data in the SMD (27) and GEO (28) databases, or proteins with known physical interactions; and assigning genes/proteins of unknown function to known pathways. The new version of VisANT will help users take full advantage of the large number of available resources in the KEGG pathway database when building new pathways.
A metagraph is a data structure for representing nodes, edges and subnetworks in a nested structure. One significant difference between a compound graph and a metagraph is that metagraphs allow one node to have multiple instances and these instances are automatically tracked. This capability allows a metanode in a metagraph to share nodes: each metanode has its own instance of the same node. Metanodes have two semantic states: an expanded state that reveals the associated subgraph inside, and a contracted state that hides the internal structure, rendering the metanode as a simple node. Edges between the nodes in an expanded metanode have the usual meaning (associations based on experimental data or computationally inferred correlations); edges between metanodes either reflect a correlation between standard (hidden) nodes or indicate that the same gene/protein occurs in both metanodes (20).
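The instance tracking and the two metanode states described above can be illustrated with a minimal Python sketch. It is not VisANT's implementation: the class `Metagraph`, its methods, and the node names (`complexA`, `P1`, etc.) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class Metagraph:
    """Toy metagraph: each metanode holds its own instance of a shared node."""

    def __init__(self):
        self.members = defaultdict(set)    # metanode -> names of nodes it contains
        self.instances = defaultdict(set)  # node name -> metanodes holding an instance
        self.expanded = {}                 # metanode -> True when expanded

    def add_instance(self, metanode, node):
        """Place an instance of `node` inside `metanode` and record it."""
        self.members[metanode].add(node)
        self.instances[node].add(metanode)
        self.expanded.setdefault(metanode, True)

    def toggle(self, metanode):
        """Switch a metanode between its expanded and contracted states."""
        self.expanded[metanode] = not self.expanded[metanode]

    def visible_nodes(self):
        """Members of expanded metanodes; a contracted metanode appears as one node."""
        shown = []
        for metanode in sorted(self.members):
            if self.expanded[metanode]:
                shown.extend((metanode, n) for n in sorted(self.members[metanode]))
            else:
                shown.append((metanode, None))
        return shown

    def shared_edges(self):
        """Metanode pairs linked because the same node occurs in both."""
        pairs = set()
        for node, holders in self.instances.items():
            ordered = sorted(holders)
            for i, a in enumerate(ordered):
                for b in ordered[i + 1:]:
                    pairs.add((a, b, node))
        return sorted(pairs)


g = Metagraph()
g.add_instance("complexA", "P1")
g.add_instance("complexA", "P2")
g.add_instance("complexB", "P1")  # second instance of P1, tracked automatically
g.add_instance("complexB", "P3")
g.toggle("complexB")              # contract complexB, hiding its internal structure
print(g.visible_nodes())
print(g.shared_edges())
```

In this sketch the edge between `complexA` and `complexB` arises solely because both contain an instance of `P1`, which corresponds to the second meaning of inter-metanode edges given above.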
KGML is an exchange format for KEGG graph objects, particularly KEGG pathways, which are manually drawn and updated. The KGML files for KEGG metabolic pathways specify how enzymes (boxes) are linked by a relation and how compounds (circles) are linked by a reaction. In contrast, the KGML files for KEGG regulatory pathways contain only the former. KGML files for all supported species in VisANT have been preprocessed to map genes to their KEGG pathways, and a VisANT user can identify pathways for a specified gene either by searching for its interactions or resolving (normalizing) its names or IDs as explained subsequently.
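The gene-to-pathway mapping mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the KGML fragment is hand-written rather than copied from KEGG, the second gene identifier is a placeholder, and the functions `gene_to_pathways` and `relations` are hypothetical helpers, not part of VisANT or KEGG. The element and attribute names (`pathway`, `entry`, `relation`, `entry1`, `entry2`) follow the KGML layout as we understand it.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in the style of KGML; sce:PLACEHOLDER is not a real ID.
KGML_SNIPPET = """
<pathway name="path:sce04010" org="sce" number="04010">
  <entry id="1" name="sce:YHL007C" type="gene"/>
  <entry id="2" name="sce:PLACEHOLDER" type="gene"/>
  <relation entry1="1" entry2="2" type="PPrel">
    <subtype name="activation" value="--&gt;"/>
  </relation>
</pathway>
"""


def gene_to_pathways(kgml_text):
    """Map every gene listed in a KGML document to the pathway it appears in."""
    root = ET.fromstring(kgml_text.strip())
    pathway = root.get("name")
    mapping = {}
    for entry in root.findall("entry"):
        if entry.get("type") != "gene":
            continue
        # One entry may list several space-separated gene identifiers.
        for gene in entry.get("name").split():
            mapping.setdefault(gene, set()).add(pathway)
    return mapping


def relations(kgml_text):
    """Return (source, target, relation type) triples between entries."""
    root = ET.fromstring(kgml_text.strip())
    names = {e.get("id"): e.get("name") for e in root.findall("entry")}
    return [
        (names[r.get("entry1")], names[r.get("entry2")], r.get("type"))
        for r in root.findall("relation")
    ]


print(gene_to_pathways(KGML_SNIPPET))
print(relations(KGML_SNIPPET))
```

Merging the dictionaries produced for many KGML files would give a lookup from a gene to all pathways that contain it, which is the kind of index a preprocessing step needs.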
Two pathway recommendation web services for identifying functionally related genes from transcriptional profiles are integrated in VisANT through its plugin architecture (20). Given a set of query genes, typically the known genes of a pathway, these services recommend additional genes in the same pathway as the query set. Both search engines support five species: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans and Saccharomyces cerevisiae. When VisANT is run as an online applet, connections to the services are mediated by the VisANT server.
GeneRecommender (29) discovers new genes with similar function to a given list of genes (the query) already known to have closely related function. It ranks genes according to how strongly they correlate with a set of query genes in those experiments for which the query genes are most strongly coregulated.
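The two-step idea described above (first select the experiments in which the query genes behave most coherently, then rank all genes by their agreement with the query genes in those experiments) can be sketched in Python. This is a simplified stand-in and not the published GeneRecommender scoring: the coherence measure, the use of Pearson correlation against the mean query profile, the function names and the toy data are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def recommend(expr, query, n_experiments, n_genes):
    """Rank genes by correlation with the query genes in selected experiments."""
    n = len(next(iter(expr.values())))

    def coherence(j):
        # Large when the query genes share a strong, consistent value in experiment j.
        vals = [expr[g][j] for g in query]
        return abs(mean(vals)) / (pstdev(vals) + 1e-9)

    chosen = sorted(range(n), key=coherence, reverse=True)[:n_experiments]
    centroid = [mean(expr[g][j] for g in query) for j in chosen]
    scores = {g: pearson([row[j] for j in chosen], centroid) for g, row in expr.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_genes], chosen


# Toy normalized expression values: gene -> one value per experiment.
expr = {
    "q1": [2.0, -2.0, 0.1, 1.5],
    "q2": [2.2, -1.8, -0.9, 1.4],
    "cand_up": [1.9, -2.1, 3.0, 1.6],
    "cand_flat": [0.0, 0.1, 0.0, 0.1],
    "cand_anti": [-2.0, 2.0, 0.0, -1.5],
}
top, chosen = recommend(expr, ["q1", "q2"], n_experiments=3, n_genes=3)
print(top, chosen)
```

In the toy data the third experiment is discarded because the query genes disagree there, so the outlying value of `cand_up` in that experiment does not affect its rank. An anti-correlated gene such as `cand_anti` falls to the bottom of the ranking under this positive-correlation criterion.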
ClueGene (30) uses the pattern of how genes cluster together in sets of experiments to recommend new genes in a pathway. ClueGene bases its recommendations on the query set and on a cluster compendium. Each set of experiments is clustered independently. The collection of clusters constitutes the cluster compendium. Each gene in the genome is given a co-clustering score. Higher scoring genes are more highly recommended and tend to be found in small clusters in the cluster compendium along with query genes.
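A co-clustering score of this general kind can be sketched as follows. The formula used here (each cluster that contains query genes contributes the fraction of its members that are query genes) is an assumption chosen so that small clusters shared with query genes score highly. It is not the published ClueGene score, and the function name and toy compendium are hypothetical.

```python
def co_clustering_scores(compendium, query):
    """Score non-query genes by how often they share small clusters with query genes."""
    query = set(query)
    scores = {}
    for cluster in compendium:
        hits = len(cluster & query)
        if hits == 0:
            continue  # clusters without query genes carry no evidence
        for gene in cluster - query:
            # Fraction of the cluster made up of query genes: small clusters count more.
            scores[gene] = scores.get(gene, 0.0) + hits / len(cluster)
    return scores


# Toy cluster compendium: each set is one cluster from one independently
# clustered set of experiments.
compendium = [
    {"q1", "q2", "a"},
    {"q1", "a", "b", "c", "d", "e"},
    {"b", "c"},
    {"q2", "b", "u", "v", "w", "x", "y", "z"},
]
scores = co_clustering_scores(compendium, ["q1", "q2"])
for gene in sorted(scores, key=scores.get, reverse=True)[:3]:
    print(gene, round(scores[gene], 3))
```

Gene `a` ranks first because it shares a three-member cluster with both query genes, whereas `b` co-occurs with query genes only in larger clusters.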
The use of VisANT (20,31,32) to mine, integrate and display biological interactions based on KEGG pathways and expression data is facilitated by a name-normalization service which resolves IDs used by different databases. In addition, customized ID mappings, as well as corresponding Web links, can be easily added to the network through a simple tab-delimited format. VisANT is developed using Java technology. In addition to the Web browser applet interface, VisANT can also be run as a stand-alone application which implements an automatic upgrade detection system to keep it up-to-date. Detailed information on VisANT's three-tier structure (31) and plugin framework (20) can be found at http://visant.bu.edu. A new error-reporting system has also been implemented to improve the reliability of integration across distributed systems: users have the option to report critical errors to the plugin authors and the VisANT team.
VisANT automatically recognizes the format of an input file based on its content. Only those formats that relate to the new functions are discussed here. The full list of supported file formats can be found on the VisANT web site.
Pathways can be loaded into VisANT using several different input methods, as detailed in Figure 2. In particular, double-clicking on a contracted pathway node (e.g. the blue boxes in Figure 1) will also load the pathway if the corresponding KGML file is available from KEGG. Expression data are input from a common tab-delimited file. The first column can be an Entrez Gene ID, an accession ID/GI number, a gene name or an ID from an organism-specific database. The file can have a header line to indicate the names of the different experiments; otherwise, VisANT will use sequential numbers to identify the experiments. If the expression data are to be overlaid on an existing pathway, the name-normalization service should be used first so that genes in the network and in the expression data can be matched to each other.
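The tab-delimited expression layout described above can be read with a few lines of Python. This sketch only illustrates the file layout and is not VisANT's parser: the function `read_expression`, the explicit `has_header` flag, the optional `aliases` table standing in for name normalization, and the sample values are all assumptions.

```python
import csv
import io


def read_expression(text, has_header, aliases=None):
    """Parse tab-delimited expression data into experiment names and gene profiles."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    if has_header:
        experiments = rows[0][1:]
        data_rows = rows[1:]
    else:
        # Without a header line, experiments are identified by sequential numbers.
        experiments = [str(i) for i in range(1, len(rows[0]))]
        data_rows = rows
    aliases = aliases or {}
    profiles = {}
    for row in data_rows:
        # Hypothetical stand-in for name normalization: map an ID to a network name.
        gene = aliases.get(row[0], row[0])
        profiles[gene] = [float(v) for v in row[1:]]
    return experiments, profiles


with_header = "gene\texp1\texp2\nPSEN1\t0.5\t-1.2\nPSEN2\t0.7\t-0.9\n"
no_header = "ID_0001\t0.5\t-1.2\n"

print(read_expression(with_header, has_header=True))
print(read_expression(no_header, has_header=False, aliases={"ID_0001": "PSEN1"}))
```

The second call shows why normalization matters: the file identifies the gene by a placeholder ID, and only after mapping it to the name used in the network can its profile be overlaid on the corresponding node.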
All data shown in VisANT can be saved in an XML format using the VisANT Markup Language (VisML). Registered users can save the network on the VisANT server so that it can be accessed wherever the internet is available. VisML uses a version number to facilitate compatibility and extensibility. A description of VisML can be found at http://visant.bu.edu. In addition, pathways can be exported without visual information, as tab-delimited edge and node lists. Pathways can also be saved as pixel images, or as high-quality SVG for publication and illustration. An SVG file can be further polished with an SVG editor.
Each pathway is represented as a metanode which may be nested within other metanodes (Figure 1). If links to other pathways are available in KGML, these pathways are represented as contracted metanodes.
VisANT adopts the KEGG notation for graphics annotation so that users will have consistent views of the KEGG pathways. However, a few changes were necessary. In particular, a single protein/gene is represented as a filled green circle, and a metanode displayed as a green box is used to represent multiple proteins/genes. Additionally, the number of proteins/genes contained in a metanode can be revealed by double-clicking the box. Use of a metanode for a protein complex is also introduced (Figure 3). Multiple instances of the same node can exist even in the same pathway (ARG5,6 in Figure 1). These instances can be tracked by pressing the right mouse button over the corresponding node. Dashed lines will connect all instances of the node. The lines vanish once the mouse button is released.
Pathways can be easily edited in VisANT. Nodes and edges can be modified, added or deleted. Additional components can be added to pathways by a simple drag and drop. Pathways can be easily ungrouped or regrouped as one large pathway, depending on the user's needs.
As with the extension of interactions for a given protein/gene, pathways can be extended by double-clicking on a pathway node. Using this method, a network of pathways can be quickly constructed. Figure 1A shows the network of pathways by first loading pathway MAP00220 and then expanding the pathway MAP00910. It is worth noting that crosstalk between MAP00251 and MAP00910 mediated by the compound C00025 is only visible after MAP00910 is expanded.
Because the state of a metanode can be toggled by mouse-clicking, an overview of the pathway shown in Figure 1B can be easily achieved by contracting the two pathway nodes MAP00910 and MAP00220. Thus, VisANT is capable of easily exploring pathways at different scales: a pathway overview enables users to observe the topology of large sets of pathways, while the detailed internal structure of any particular pathway or set of pathways is easily revealed by mouse-clicking.
VisANT provides two methods to visualize expression data over pathways: either the node color is used to represent the expression value in a particular experiment, or a plot of the expression profile is embedded in the node, as shown in Figure 3. The two methods can be toggled either for individual nodes or for the whole network. Different experiments can be navigated using a sliding bar and the navigation process can be animated. When the expression profile is shown, the corresponding experiment and expression value are indicated by a cursor.
In VisANT it is convenient to determine whether genes in the same pathway are coexpressed, because the expression profiles of all nodes contained in a metanode (pathway) are drawn together in one plot, along with their average profile in black. Figure 3D shows such an example for a node representing a protein complex.
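The average profile of a metanode is the per-experiment mean over its member profiles. A minimal Python sketch is given here, with a hypothetical function name and toy values; how VisANT handles members without expression data is not stated above, so skipping them is an assumption.

```python
def average_profile(profiles, members):
    """Per-experiment mean over the members of a metanode that have expression data."""
    rows = [profiles[g] for g in members if g in profiles]  # skip members without data
    return [sum(column) / len(rows) for column in zip(*rows)]


# Toy profiles: gene -> one value per experiment.
profiles = {"A": [1.0, 3.0], "B": [3.0, 5.0]}
print(average_profile(profiles, ["A", "B", "C"]))  # "C" has no data and is ignored
```

Plotting each member profile together with this average makes it easy to spot members that deviate from the group.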
Sets of genes in the same pathway are often activated together and may have very similar expression profiles; their protein products may also interact, either physically or functionally, to achieve a specific task. VisANT provides functions to assign genes/proteins with unknown function to the known KEGG pathways based on these observations. Predictome (33) can easily be queried for sets of proteins that interact either functionally or physically with a specified protein. VisANT also has editing capabilities that allow any such set to be augmented with a user's own data set.
Genes with similar expression profiles can be identified using the ClueGene and GeneRecommender plugins and the genes so identified can be associated with one or another KEGG pathway in accordance with user specified criteria based on either functional or physical links (Figure 2E) (25,34,35). Query genes can be placed in identified pathways by a simple drag and drop.
We suggest that users test the coexpression of query genes with known genes in the potential pathways and compare scores using either ClueGene or GeneRecommender. In addition, expression profiles can be compared if query genes are searched using GeneRecommender.
New pathways can be created from scratch or from relevant KEGG pathways; the latter is substantially more convenient because of the KEGG documentation. In collaboration with KEGG, the VisANT web site lists all pathways for which KGML is available, allowing easy access and loading into VisANT (Figure 2A). These reference KEGG pathways can also be updated when necessary. When loaded into VisANT, they can be enriched either by querying functionally associated components from experimental and computational results accessible from the VisANT-Predictome system, or by searching for coexpressed genes as indicated above.
We next describe a use-case scenario to illustrate some of the new features of VisANT. Suppose a user is interested in the γ-secretase complex, which acts in the H. sapiens Notch signaling pathway (Figure 3A), and wishes to learn more about related genes or the internal structure of the γ-secretase complex. First, the GeneRecommender plugin can be used to search for potential genes coexpressed with the five component members of the complex: APH1A, NCSTN, PSEN1, PSEN2 and PSENEN. GeneRecommender returns the top 10 coexpressed genes scored in the top 50 experiments. As can be seen from Figure 3B, the scores of the coexpressed genes can be separated into three groups. The top group, APH1A, PSEN1 and PSEN2, has much higher scores than the second group, PSENEN and LRRTM4. The plotter is linked to the network, and selecting a spot in the plotter will select the corresponding node in the network (Figure 3B and C). Note that query gene NCSTN is not included in the top 10 coexpressed genes, indicating that NCSTN is not positively correlated with other members of the complex. Anti-correlations are very common in signaling pathways (Figure 3A); future implementations of the search engines will support identification of anticorrelated genes. Users may select different combinations of query genes to achieve the best results. In addition, the degree of coexpression between members of a given metanode can be viewed by contracting the metanode and turning on the expression plotter option, as shown in Figure 3D. To further test the correlation of the 11 genes shown in Figure 3C, interactions between pairs of genes are queried against the Predictome database, which reveals the interaction between PSENEN and APH1A identified by coimmunoprecipitation (36), as shown in Figure 3E.
In addition, pathways can be updated against the KEGG database so that the latest pathway information can be easily incorporated into existing pathways customized by the users.
Among our goals for further development of VisANT is supporting pathways from other databases, including Reactome (2), BioCarta (http://www.biocarta.com), EcoCyc (3) and INOH (http://www.inoh.org/). Since computable representations of pathways from these databases are available in BioPAX format, one way to proceed would be to increase VisANT's support of BioPAX. Two obstacles must be addressed. First, an automatic layout algorithm must be developed since BioPAX, unlike KGML, does not contain layout information. More importantly, a standard visual notation will also need to be developed for the different types of biological components and for the relations between them. Second, unlike KGML, in which each pathway is usually stored in its own file, pathways in BioPAX format are usually represented in one large file which can exceed 100 MB, making it impractical to load them all at once and also preventing exploratory navigation of pathways. New efforts, such as the latest developments in CPath (http://cbio.mskcc.org/cpath/home.do), have made significant progress in overcoming this problem by providing Application Programming Interfaces (APIs) that can retrieve pathways one by one in BioPAX format. We expect that the obstacles discussed above will be removed in the near future and that pathways from these databases will be ready for use in VisANT.
VisANT, along with the full user manual and tutorials, is available on the VisANT web site, http://visant.bu.edu.
Conflict of interest statement. None declared.