Cytoscape is open-source software for integration, visualization and analysis of biological networks. It can be extended through Cytoscape plugins, enabling a broad community of scientists to contribute useful features. This growth has occurred organically through the independent efforts of diverse authors, yielding a powerful but heterogeneous set of tools. We present a travel guide to the world of plugins, covering the 152 publicly available plugins for Cytoscape 2.5–2.8. We also describe ongoing efforts to distribute, organize and maintain the quality of the collection.
High-throughput technologies allow enormous amounts of data to be collected on biological networks, including protein-protein interactions, protein-DNA interactions, kinase-substrate interactions, genetic interactions, gene coexpression and other functional relationships. One of the major computational platforms for analyzing these networks is Cytoscape, a general-purpose and freely available software platform for integration, visualization and statistical modeling of molecular networks together with other systems-level data1,2. To enable rapid prototyping and release of new methods, Cytoscape is implemented as an open-source software package with an accessible application programming interface (API) using the Java programming language.
One of the most powerful consequences of this design is that, through the Cytoscape API, software developers can write extensions called plugins that link Cytoscape with new code and provide access to new or alternative features. Plugins provide a flexible means by which any researcher can bring new concepts in network and systems biology to a broad user base of life scientists. Although some plugins come installed by default in the standard Cytoscape release, users optionally install most plugins to access the features they require (Box 1).
In the past several years, the number of publicly available Cytoscape plugins has grown dramatically, from a few dozen in 2005 to 152 registered plugins at the beginning of April 2012. This growth greatly increases the power and versatility of network analysis. However, it has occurred organically across a heterogeneous community of researchers and software developers, consequently presenting the user with a diverse and sometimes bewildering array of choices. Although most plugins provide user documentation and many are described in peer-reviewed research papers, a summary evaluation of the entire collection of plugins is needed. That is the purpose of this paper.
The Cytoscape website provides a mechanism for submitting plugins (http://www.cytoscape.org/plugin_submit.html), which keeps a copy of the plugin code and tracks information about each plugin: its authors and author affiliations, a brief description of its functionality, a link to the plugin homepage if one exists and the known compatible versions of Cytoscape. We used the plugin registry as our primary means of identifying plugins; as of April 2012, it contained a total of 152 publicly available plugins for Cytoscape v.2.5 or later (while this work was in review, 20 additional plugins were released, bringing the number up to 172). Laboratories contributing plugins are distributed worldwide, with the largest contributions coming from North America and Europe (Fig. 1a).
We first assessed the rate of use of each plugin by tabulating the number of downloads within the past year as well as the total number of downloads overall (Fig. 1b and Table 1). The former statistic indicates recent popularity and is directly comparable across plugins, whereas the latter statistic indicates all-time popularity but is skewed toward older plugins that have been consistently popular since their initial publication.
Next, we validated each plugin by downloading and testing its basic function. The latest version of each plugin was installed on an appropriate version of Cytoscape as determined from the information in our plugin database. We briefly followed basic manipulations described in tutorials and documents provided by the plugin authors. Eighteen (12%) plugins did not pass the basic validation test and were marked ‘Did not work’. Eleven (7%) plugins passed validation but were missing some of the expected functions and were marked ‘Problem found’. Both types of errors were communicated by email to the plugin authors, nearly all of whom replied and are currently working with us to resolve the apparent difficulties. We expect that by the time this work is published, many of the issues will have been fixed. In the ‘Problem found’ category, many problems have been traced to errors or ambiguities in the user documentation, not errors in the code. The 20 plugins registered after April 2012 were not tested, but they are listed in the Supplementary Data.
The utility of most Cytoscape plugins can be best understood within the larger context of how networks are analyzed (Fig. 2). A typical Cytoscape workflow begins by importing interactions (for example, protein-protein interactions) from a user’s own experiments or from public databases. Whereas experimental data on interactions are loaded directly into Cytoscape through standard file formats, public databases of interactions are accessed using plugins. Typically, the database is queried for interactions involving a list of genes of interest or for interactions among genes that have a certain attribute, such as a common molecular function or phenotype. Alternatively, interactions can be mined directly from the literature or through computational inference from non-interaction data such as expression profiles. Cytoscape has dozens of plugins for literature mining and for network inference.
Following the import of networks and visualization in Cytoscape, a large repertoire of plugins is available for network analysis (Fig. 2). For instance, plugins for network topological analysis enable users to calculate statistics such as the distribution of network connectivity (that is, node degrees), and network clustering plugins allow users to extract densely connected network regions, which often correspond to functional modules such as protein complexes or pathways. Biological functions of these modules can be inferred with plugins that perform functional enrichment: identification of functional terms that are statistically enriched among the set of genes comprising the module. Functional modules can also be identified by integrating the network with expression data to identify regions that are coherently up- or downregulated, or by integrating networks across species to identify regions of the network with conserved interactions. Finally, plugins for scripting and programmatic access allow control over the workflow.
In what follows, we review Cytoscape plugins at each step of this workflow, with special focus on the plugins that are most widely used, that is, those that have the greatest numbers of total downloads. Further descriptive use-cases of plugins are available in previous reviews1,3. To enable users to find suitable plugins at each step, we have developed a plugin classification system based on a broad set of 41 tags and a companion plugin webstore (http://apps.cytoscape.org/) that organizes plugins by tag. As an example, Supplementary Table 1 shows the top ten tags according to the number of plugins annotated to each. This information can also be illustrated by a network (Fig. 3a,b and Supplementary Fig. 1) in which plugins are connected to tags, such that plugins having similar tag assignments fall close to one another in the network, and the overall popularity of a tag can be seen by its total number of connections (Fig. 3c).
Cytoscape imports interaction data in various generic tabular formats including CSV (comma-separated values), TSV (tab-separated values) and Excel, along with network-specific formats such as SIF (simple interaction file, originally developed for Cytoscape), XGMML (Extensible Graph Markup and Modeling Language), GML (Graph Modelling Language), PSI MI (Proteomics Standards Initiative–Molecular Interaction format)4, BioPAX (Biological Pathway Exchange)5, OpenBEL (Open Biological Expression Language) and SBML (Systems Biology Markup Language)6. The generic tabular formats and SIF are especially useful when users wish to import their own experimental interaction data, which often consist of a simple list of gene pairs that have been found to interact. The network-specific formats can represent many additional details about each interaction when known, for example, the type, strength, mathematical details and functional consequence of interaction and, if applicable, the direction of information flow. Increasingly, the scientific community is beginning to use these more expressive formats, such as BioPAX, OpenBEL and SBML, to create and share models of biological networks among researchers.
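As a minimal sketch of how a simple list of interacting gene pairs might be written out in SIF, the Python snippet below emits one tab-delimited line per interaction in the order source, interaction type, target. The gene names, the `pp` interaction label and the helper name `to_sif` are illustrative assumptions and are not part of any Cytoscape API.

```python
def to_sif(pairs, interaction="pp"):
    """Return SIF text: one 'source<TAB>type<TAB>target' line per pair."""
    lines = [f"{a}\t{interaction}\t{b}" for a, b in pairs]
    return "\n".join(lines) + "\n"


# Hypothetical interacting gene pairs from a user's own experiment.
pairs = [("YFG1", "YFG2"), ("YFG2", "YFG3")]
sif_text = to_sif(pairs)

# Saving sif_text to a file with a .sif extension gives a file that
# can then be opened through Cytoscape's network import.
```

The richer formats listed above carry details that this three-column layout cannot express, such as interaction strength or direction.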
Although the ability to recognize interaction data in these formats is provided by the Cytoscape core application, in many cases the user does not have new data but instead seeks to access the large online databases of previously generated interactions. Therefore, to complement the core Cytoscape functionality, several plugins are available to import existing interaction data catalogued in public databases. For example, the BioGridPlugin can be used to import an entire interactome (that is, the full set of interactions mapped for a species to date) from BioGrid7, one of several large databases of molecular and genetic interactions. Alternatively, a user may wish to import interactions involving a defined subset of genes or proteins; many plugins have been developed for this purpose. Among these, MiMI8, ConsensusPathDB9 and APID2NET10 are established and robust examples with useful features. The MiMI plugin retrieves and displays interactions from the Michigan Molecular Interactions (MiMI) database (Fig. 1b and Table 1), which combines data from a variety of established primary-interaction databases. The ConsensusPathDB plugin allows users to computationally validate whether there is previous support for a set of interactions in their own data. APID2NET provides a sophisticated graphical user interface to extract interactions involving a set of genes from the APID server (Agile Protein Interaction DataAnalyzer, Fig. 1b) and to perform analyses including hub identification, protein motif annotation and Gene Ontology (GO) enrichment. For databases that provide a PSICQUIC11 web service (standardized programmatic access to molecular interaction databases over the Web), interactions can be imported into Cytoscape by the PSICQUICUniversalClient plugin.
Some specialized plugins have been designed to import and visualize metabolic networks in particular, which can consist of multiple types of nodes (enzymes, small molecules and cofactors) or edges (reversible or irreversible reactions). The Metscape plugin12 generates metabolic networks based on information in the Kyoto Encyclopedia of Genes and Genomes (KEGG)13 and the Edinburgh Human Metabolic Network database14 (Fig. 1b and Supplementary Fig. 2). This plugin is powerful for superposition of a metabolic network with user-defined data on enzyme expression levels or compound concentrations. As an alternative, the KGMLReader plugin imports KEGG metabolic networks and preserves their hand-drawn intuitive layout. However, some network information in KEGG cannot be imported by KGMLReader because of problems mapping between the KEGG and Cytoscape network representations. Other plugins for importing metabolic networks into Cytoscape include the BioCycPlugin, which provides access to the BioCyc metabolic network database (http://biocyc.org/), and ReConn, which provides access to Reactome (http://reactome.org/).
Specialized plugins have also been developed to import canonical signaling or regulatory networks curated from literature. The GPML (Fig. 1b) and Superpathways plugins import and visualize networks from WikiPathways15, an open platform for curation of biological networks by the scientific community. We also recommend the Pathway Commons16 website (http://www.pathwaycommons.org/), which is able to transfer a network of interest directly to Cytoscape by clicking a hyperlink that appears on the web page for that network.
The large corpus of published papers provides information about interactions that are not yet available in public databases. Thus, extraction of interactions based on computational literature mining has become an important activity. The chief means in Cytoscape (v.2.5 or later) of building networks from the literature is AgilentLiteratureSearch17, a plugin that mines literature abstracts from sources such as Medline, Online Mendelian Inheritance in Man (OMIM)18 and the US patent database to identify putative interactions and use them to automatically construct a network (Fig. 4a). After a user enters search terms, the plugin finds matching records, extracts genes and their associations described within the record and displays them as a network. Although interaction networks based on automatic literature mining usually contain substantial false positives, they allow users to visualize a draft set of protein interactions that may not be present in other databases. The sentences that support each interaction can be manually reviewed to eliminate false positives. Demand for AgilentLiteratureSearch is high: it is the number three plugin by total number of downloads (Fig. 1b and Table 1).
For many species, genome-wide interaction screens have not been conducted, and users thus cannot assemble networks for these species. Even in an organism such as budding yeast, in which large-scale genetic and physical interaction experiments have been performed, complete network coverage has not yet been achieved19. Accordingly, many methods have been developed to predict novel interactions and generate networks from currently available data. GeneMANIA20 is one of the more refined plugins for this purpose. For a defined set of genes or proteins, it integrates data from many sources, including physical interactions, genetic interactions, pairs of coexpressed genes, pairs of genes in the same pathway or pairs of genes with the same subcellular location, and then visualizes the possible molecular associations among the given genes and other genes (Supplementary Fig. 3), thus allowing users to predict functions of uncharacterized proteins on the basis of functions of proteins associated with them. ExpressionCorrelation and MONET21 are plugins that predict functionally interacting pairs of proteins from expression data. MONET also incorporates biological annotations of genes to predict a regulatory network. Finally, for inference of metabolic network models, the CytoSEED plugin interfaces Cytoscape with the Model SEED22 resource for automatic generation of metabolic models from prokaryotic genome sequences.
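To make the idea of predicting functional links from expression data concrete, the sketch below connects two genes whenever the Pearson correlation of their expression profiles exceeds a threshold in absolute value. This is an illustration only: the choice of Pearson correlation, the 0.9 threshold, the gene names and the function names are assumptions made here, and the code is not taken from ExpressionCorrelation or MONET.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_edges(profiles, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression profiles across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.1],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
edges = coexpression_edges(profiles)  # only geneA-geneB passes the threshold
```

In practice the threshold strongly affects network density, so it is worth inspecting the distribution of correlation values before fixing a cutoff.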
Network topology refers to the arrangement or pattern of interactions within a network; several Cytoscape plugins have been developed to calculate topological properties. The NetworkAnalyzer23 plugin is installed in Cytoscape by default and calculates network metrics such as the distribution of node degrees (node degree refers to the number of interactions involving a node; it has been shown to correlate with the essential status of genes24). Users may also try CentiScaPe25,26 for this purpose (Fig. 1b) or the Interference plugin, which evaluates the topological effects of removing single or multiple nodes from a network.
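The node degree distribution mentioned above can be computed directly from an edge list. The following sketch assumes an undirected network without self-loops; the function name and the toy edges are illustrative and do not reflect how NetworkAnalyzer is implemented.

```python
from collections import Counter


def degree_distribution(edges):
    """Return (degree per node, number of nodes having each degree)."""
    degree = Counter()
    for a, b in edges:
        # Each undirected interaction adds one to the degree of both endpoints.
        degree[a] += 1
        degree[b] += 1
    return dict(degree), dict(Counter(degree.values()))


# Toy network: node A is a small hub.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
degrees, distribution = degree_distribution(edges)
```

Here `degrees` maps each node to its number of interactions and `distribution` maps each degree value to the number of nodes that have it.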
A great deal of research has focused on mining networks for interaction clusters or ‘modules’, sets of interacting molecules that tightly associate with one another. Modules in a protein-protein interaction network, for instance, are suggestive of functional protein complexes. Plugins typically extract such modules by identifying densely connected subgraphs. MCL-new and APCluster implement the popular network clustering algorithms developed by Van Dongen27 and Frey et al.28, respectively, for clustering in general. MCODE29, one of the most popular Cytoscape plugins overall (Fig. 1b and Table 1), has been developed to perform network module identification specifically in biology. MCODE weights nodes by local neighborhood density, then performs an outward traversal from a locally dense seed protein node to isolate larger dense regions, and finally graphically displays extracted modules and associated information (Fig. 4b).
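The seed-and-expand strategy described above can be illustrated with a deliberately simplified sketch. Each node is scored by the edge density of its immediate neighborhood multiplied by its degree, the highest-scoring node is taken as the seed, and the module grows outward through neighbors whose score stays within a fraction of the seed score. This scoring is a stand-in chosen for brevity and differs from the published MCODE weighting; all function names, the `cutoff` parameter and the toy network are assumptions.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency map: node -> set of neighbors."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def local_score(adj, node):
    """Density of the node's closed neighborhood, scaled by its degree."""
    members = adj[node] | {node}
    possible = len(members) * (len(members) - 1) / 2
    if possible == 0:
        return 0.0
    present = sum(1 for u, v in combinations(members, 2) if v in adj[u])
    return (present / possible) * len(adj[node])


def grow_module(adj, cutoff=0.2):
    """Expand from the best-scoring seed while neighbor scores stay near it."""
    scores = {n: local_score(adj, n) for n in adj}
    seed = max(sorted(scores), key=scores.get)  # ties broken alphabetically
    floor = scores[seed] * (1 - cutoff)
    module, frontier = {seed}, [seed]
    while frontier:
        current = frontier.pop()
        for neighbor in adj[current]:
            if neighbor not in module and scores[neighbor] >= floor:
                module.add(neighbor)
                frontier.append(neighbor)
    return module


# Toy network: a four-node clique (A-D) with a sparse tail (E-G).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("B", "D"),
         ("C", "D"), ("D", "E"), ("E", "F"), ("F", "G")]
module = grow_module(build_adjacency(edges))
```

On this toy network the clique is recovered and the tail is excluded. Real interaction networks are noisier, which is one reason the choice of scoring function and cutoff matters.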
Several plugins improve on the basic MCODE algorithm or user interface. AllegroMCODE implements the MCODE algorithm using graphics-processing-unit–based parallelization to find clusters efficiently. NeMo30 identifies densely connected and bipartite network modules on the basis of the combination of a unique neighbor-sharing score with hierarchical agglomerative clustering. MINE31 clusters a given network via an agglomerative clustering algorithm similar to MCODE but using a modified vertex-weighting strategy.
Different network clustering plugins can yield quite different network modules from the same data. Plugin developers typically argue that more recently developed algorithms work better than older ones, with performance often measured by the ability to recapitulate known protein complexes or pathways. However, performance may also depend on particular characteristics of the input network: MINE was shown to outperform other algorithms including MCODE and NeMo specifically when analyzing the protein-protein interaction network of Caenorhabditis elegans, which has high interaction density31. Users should therefore test several different approaches to extract network modules and investigate which predicted modules make more biological sense. For this purpose, clusterMaker32 offers access to many different network clustering algorithms in one convenient interface (Fig. 1b). Also, literature comparing the performance of existing module identification algorithms is available33,34, which may help users to select appropriate plugins.
Genes connected in a network are likely to have similar functions; as such, the function of a network module can be inferred by finding the enriched functions of its genes. Methods such as Gene Set Enrichment Analysis (GSEA)35 have been developed to find enriched functions in a given gene list. Cytoscape has several plugins that perform this task for sets of genes in a given network, most notably BiNGO36, Cytoscape’s most popular plugin (Fig. 1b and Table 1). BiNGO extracts enriched functional terms recorded in the GO37 database and visualizes them in a hierarchy (Fig. 4c). A sister plugin called PiNGO38 was recently released and works in the opposite way; it starts with user-defined GO categories of interest and then finds candidate genes in a given network associated with those categories.
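Statistical enrichment of a functional term in a module is often assessed with a one-sided hypergeometric test: given N annotated genes in total, K of which carry the term, how likely is it that a module of n genes contains k or more of them by chance? The sketch below assumes this test and omits multiple-testing correction; the function name and the numbers are illustrative and are not taken from any plugin.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(K, n)
    total = comb(N, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


# Hypothetical example: 20 genes in the background, 5 annotated to the term,
# and a module of 4 genes that contains 3 of the annotated ones.
p_value = hypergeom_enrichment_p(N=20, K=5, n=4, k=3)
```

Because many terms are usually tested at once, the resulting p-values would normally be adjusted for multiple testing before interpretation.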
The ClueGO39 plugin creates a functionally organized network of GO, KEGG and BioCarta pathway terms (Fig. 1b) that represents functional organization within a set of interacting genes or proteins. Similarly, EnrichmentMap40 organizes gene sets into a similarity network in which nodes represent gene sets, edges represent the overlap of member genes, and node color encodes the statistical significance of enrichment. WordCloud41 visually summarizes the gene functional descriptions associated with a set of selected nodes (that is, data attributes; see below) by generating a cloud of words sized by their frequency of occurrence in the selected nodes. WordCloud is useful for visually summarizing gene function annotation of a given set of nodes in a simple way.
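A gene-set similarity network of the kind described above can be sketched by treating each gene set as a node and linking two sets when their member genes overlap sufficiently. The example assumes the Jaccard index as the overlap measure and a cutoff of 0.25; both choices, along with the term and gene names, are illustrative assumptions rather than the settings of any particular plugin.

```python
from itertools import combinations


def gene_set_network(gene_sets, min_jaccard=0.25):
    """Return (set1, set2, jaccard) edges for sufficiently overlapping gene sets."""
    edges = []
    for a, b in combinations(sorted(gene_sets), 2):
        shared = gene_sets[a] & gene_sets[b]
        union = gene_sets[a] | gene_sets[b]
        score = len(shared) / len(union) if union else 0.0
        if score >= min_jaccard:
            edges.append((a, b, score))
    return edges


# Hypothetical enriched terms and their member genes.
gene_sets = {
    "term_1": {"g1", "g2", "g3", "g4"},
    "term_2": {"g3", "g4", "g5"},
    "term_3": {"g6", "g7"},
}
edges = gene_set_network(gene_sets)  # term_1 and term_2 share 2 of 5 genes
```

Node color for enrichment significance would be attached as a separate attribute on each gene-set node.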
Beyond looking for shared node attributes as described, there are also plugins that spatially partition a network layout on the basis of such attributes. BubbleRouter groups nodes having the same attribute by a rectangular box on the main network view window (Fig. 1b). It is useful for visualizing relationships between groups of nodes having similar functions or nodes that are localized in the same cellular compartment. A more sophisticated successor called Mosaic has recently been released. Mosaic will retrieve GO annotations for nodes in any network with standard gene identifiers and then systematically partition, lay out and color the nodes as they relate to each of the three branches of GO. Mosaic thus provides a way to visualize molecular interaction networks in a known biological context.
A powerful feature of the core Cytoscape application is the ability to integrate biological networks with other types of data, including gene and protein sequences, functions, alternative identifiers and gene expression and other omics measurements. These other data sets are handled by what Cytoscape calls ‘Data Attributes’: tables that associate nodes and edges with columns of additional data values (of arbitrary type). As with networks, tables of biological data can be imported into Cytoscape from a user-supplied file or fetched from online sources using plugins. For instance, the BioMartClient and NCBIClient plugins (Fig. 1b) import basic gene and protein information into Cytoscape from the BioMart42 and NCBI databases, respectively. BioMartClient is also useful for retrieving or converting gene identifiers (IDs) so that newly imported information will match the IDs used in the current network. Another plugin that helps with identifier mapping is CyThesaurus, which converts gene, protein or metabolite IDs from one database to another via BridgeDb software43.
Gene and protein expression data can add information about which parts of a network are active in a given condition. The VistaClara plugin44 integrates expression data with network visualization. It provides a heat map view of gene expression data, colors genes in the network according to their expression levels (Fig. 1b and Supplementary Fig. 4) and can play a movie that animates expression changes over multiple conditions. NetAtlas45 and OmicsAnalyzer46 are also available to visually investigate expression patterns of genes in a network. A unique feature of OmicsAnalyzer is that it can overlay a chart of the relevant gene expression data and statistics directly on each node.
Beyond visualization of gene expression data, some plugins enable a user to identify regions of a network that are enriched for highly or lowly expressed genes (network hot or cold spots). A popular plugin of this type is jActiveModules, which identifies and returns subnetworks in which the average gene expression level is significantly high or low in particular conditions47 (Fig. 1b and Table 1). Users may also want to try KeyPathwayMiner48, which tries to find densely connected networks in which genes have similar expression patterns by using a maximal-connected-subnetwork–finding algorithm. Alternatively, clusterMaker32 implements various algorithms for node clustering on the basis of not only graph structure but also gene expression patterns and may also be useful for finding network hot or cold spots (Fig. 1b). Finally, the PinnacleZ plugin49 identifies subnetworks for which the average expression level is diagnostic for clinical cases versus controls.
Other plugins such as BioQualiPlugin50, ExprEssence51 and PerturbationAnalyzer52 can be used to investigate the relationships between gene expression patterns and network structure. BioQualiPlugin checks global consistency between a regulatory network model (linking regulators to targets) and a set of expression data. ExprEssence compares gene expression levels in two experiments and highlights possible regulatory links that cause expression changes. PerturbationAnalyzer investigates the effects of perturbing protein concentration on protein interaction networks. DomainGraph53 allows users to combine full-length mRNA and exon expression data with interaction networks to analyze the effects of alternative splicing on pathways, protein-protein and domain-domain interaction networks.
Finally, one of the most integrative Cytoscape plugins to date (in terms of the number of layers of data being addressed) is the iCTNet plugin54, which was recently developed to integrate genome-wide association data (associations between single-nucleotide polymorphisms and phenotypes) with protein-protein, disease-tissue, tissue-gene and drug-gene interactions. It may assist users in elucidating a new trait classification, pathogenic mechanism or treatment for human disease traits.
Several plugins have been developed to compare or integrate multiple networks. One of the simplest examples is AdvancedNetworkMerge, which comes pre-installed with Cytoscape. This plugin performs defined operations (union, intersection and difference) on the sets of interactions in multiple networks loaded into Cytoscape. The Venndiagrams and VennDiagramGenerator plugins can compare two networks and draw a Venn or Euler diagram showing the overlap of nodes or edges between them. CABIN55 is a more refined plugin that has been used to integrate interaction data sets from different resources and to help explore the integrated network56. A user can also conduct confidence analysis of the interactions in the integrated network.
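The union, intersection and difference operations on interaction sets map directly onto set operations. The sketch below assumes undirected interactions without self-loops, so each edge is stored as an unordered pair; the helper name `edge_set` and the toy networks are illustrative and do not reflect the plugin's internals.

```python
def edge_set(edges):
    """Represent undirected interactions as a set of unordered node pairs."""
    return {frozenset(edge) for edge in edges}


# Two hypothetical networks; ("B", "A") matches ("A", "B") because
# the interactions are treated as undirected.
net1 = edge_set([("A", "B"), ("B", "C"), ("C", "D")])
net2 = edge_set([("B", "A"), ("C", "D"), ("D", "E")])

union = net1 | net2          # interactions present in either network
intersection = net1 & net2   # interactions present in both networks
difference = net1 - net2     # interactions only in the first network
```

Merging real networks additionally requires that node identifiers be mapped to a common namespace first, since the same gene may appear under different IDs in different sources.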
Several plugins with more specialized comparison functions have also been developed. Based on the idea that interactions (known as interologs) are conserved to some extent across multiple species, the plugins NetworkEvolution57 and OrthoNets58 were developed to allow users to integrate interactions from multiple species to build conserved networks. Finally, because high-throughput genetic interaction screens have become feasible, integrating genetic interactions with other types of networks has been an important issue: the PanGIA plugin59 has recently been developed to integrate physical and genetic interactions to create hierarchical module maps (Supplementary Fig. 5).
The remaining Cytoscape plugins do not cluster tightly with others. They do, however, fall under general high-level categories that help convey their functions. We have added the tag ‘Utility’ for plugins that enhance the basic functionality of Cytoscape. This tag covers plugins that deal with selecting multiple nodes and processing them in different ways. For example, NamedSelection assigns a label to selected nodes and, after de-selection, enables users to reselect the nodes according to the label. Other plugins extend the basic definition of a network graph, nominally defined as a set of nodes and a set of edges connecting these nodes. For example, GroupTool enables a user to define groups of nodes and, for each group, to display basic information on the Cytoscape panel. MetaNodePlugin2 enables a user to define a ‘meta-node’ as a set of nodes that can be collapsed into a single node and then expanded back to the original set (Fig. 1b). These two plugins were tagged as ‘Grouping’. Seven plugins were tagged as ‘Layout’ because they are related to layout of nodes in the network. For example, ReOrientPlugin lays out nodes according to positions saved in a user-created Cytoscape session file. Three plugins, TransClust, BLAST2SimilarityGraph and clusterExplorerPlugin61, were tagged as ‘Sequence similarity’. They enable a user to visualize sequence similarity (for example, BLAST) results as networks of edges connecting genes that have high-scoring similar sequences. Another three plugins, ChemViz, structureViz62 (Fig. 1b) and RINalyzer63, were labeled with the ‘Molecular structure’ tag, as they visualize chemical and protein structures as networks on Cytoscape. FERN64 has the ‘Network simulation’ functional tag because it performs stochastic simulation of chemical reaction networks. In the future, we will allow developers and users to suggest tags for plugins to enable the community to maintain and extend our categorization system. 
The number of downloads for all plugins is shown in Supplementary Figure 6.
We are developing a number of community resources and improvements to Cytoscape to help make the plugin development process more fun and efficient. First, we are developing the next version of Cytoscape, version 3.0, to address the problem of maintaining backwards compatibility between Cytoscape and plugin versions. Cytoscape 3.0 uses the modular OSGi (Open Services Gateway initiative) framework (http://www.osgi.org/), which means that plugins will be less sensitive to changes in the software code as Cytoscape evolves and will be fully interoperable with other plugins. In the meantime, all of the plugins we review here will continue to work with v.2.8 and will be migrated to Cytoscape 3.0 soon after its release.
Second, we are developing the Cytoscape AppStore (http://apps.cytoscape.org/), a new online community forum centered on Cytoscape plugins that will promote the development, testing and distribution of plugins. Users can interactively tag, rate, review, document and install plugins via the web or from within Cytoscape.
Third, each year a different group of Cytoscape developers hosts an annual Cytoscape symposium to coordinate the use and development of Cytoscape and its plugins and to facilitate the exchange of ideas and research on network analysis. Information on the next Cytoscape symposium is available at http://www.cytoscape.org/.
Since Cytoscape was released and published a decade ago, a large number of plugins have been developed. This contribution by highly motivated users, developers and organizers has been crucial to the success and utility of the Cytoscape platform. If you are interested in participating in the Cytoscape community, we invite you to attend the symposium, develop a plugin, join our mailing list or simply try out Cytoscape.
Work on this review was funded by the National Resource for Network Biology (P41 GM103504) and the San Diego Center for Systems Biology (P50 GM085764). We thank J. Dutkowski, D. Emig and G. Hannum for advice and critical reading of the manuscript. Finally, the greatest thanks go to all of the plugin developers who have enriched the Cytoscape user experience with their ideas. We apologize to those plugin authors whose excellent work was not covered here because of space limitations.