Pathway diagrams act as a visual representation of known portions of the vast molecular network that underpins all aspects of biological function. Models of pathways, produced either as a graphical representation of known events or as a resource for mathematical modelling, are fundamental to understanding the workings of biological systems. However, the task of assimilating the large amounts of available data and representing this information in an intuitive manner remains a challenge. Accordingly, there has been increasing interest in the biology community in developing approaches for representing biological pathways. The Molecular Interaction Map (MIM) and Process Description Notation (PDN) schemes were proposed by Kurt Kohn [10] and Hiroaki Kitano (Kitano 2005), respectively, and their ideas laid the foundations for much of the work on pathway notation that has followed. The current mEPN scheme is based on ideas from the PDN and original EPN schemes but, importantly, also on the experience of over four years of pathway construction, notation testing and discussions.
The objectives of the EPN as originally proposed remain preserved, as do many of the original concepts of the EPN and PDN schemes [9]. However, substantial modifications have been made to the notation system, from the introduction of new symbols to changes in the aesthetics of the scheme and pathway syntax, in order to achieve our original objectives. Firstly, we wanted a notation system that was flexible enough to allow the detailed representation of diverse biological entities, interactions and pathway concepts. In this respect, we have used the mEPN as described here not only in the construction of the large macrophage pathway diagrams [16], which in their own right cover a diverse range of signalling and effector pathways, but also for the depiction of cholesterol metabolism and the cell cycle (not shown). In all of these endeavours the mEPN scheme has been able to depict the literature-based understanding of these systems, and where it was formerly unable to support a concept, it was modified to allow us to do so.
Secondly, we wanted a system for presenting pathway knowledge in a semantically and visually unambiguous manner. To some degree this comes down to labelling components in a way that is unambiguous. The use of standard gene nomenclature to label gene/protein components, together with a formalized system to describe modifications to them, goes some way to achieving this. In many cases this has meant that we first needed to deconvolute the literature, which describes these systems using numerous different names for the same protein or complex. It means, however, that one component is unlikely ever to be represented more than once under different names. It also facilitates use of the diagrams in the interpretation of experimentally derived data, which is frequently annotated using standard gene nomenclature.
Our third aim, which is related to the second, is that diagrams should be as simple as possible to construct and understandable by a biologist. To help ensure this, all the work in creating our pathway diagrams has been performed by relatively junior biologists (MSc/PhD students). They have been encouraged to discuss their ideas and their pathways with each other so as to iron out areas where the information is not clearly depicted. For this to happen they must be able to communicate complicated biological concepts using the diagrams. The readability of a diagram is dependent not only on the notation system but also on its layout.
Although a variety of automated layout algorithms exist for network graphs, they do not perform as well as a human curator with an artistic eye for the task. Pathway layout is relatively trivial for small diagrams, but a long time has had to be spent on optimizing the layout of all of our large pathways so that they are relatively easy to interpret. However, large integrated pathway diagrams, like the systems they represent, are inevitably complex.
Finally, pathway diagrams are central to efforts to computationally model the observed behaviour of biological systems [40]. Our fourth objective has therefore been to develop the mEPN such that the semantics of the resulting network diagrams are sufficiently well defined that software tools can convert graphical models into formal models suitable for analysis and simulation. Whilst the primary objective behind our efforts has been to create a graphical model of events, we have been mindful to construct pathway diagrams as directional networks that could in principle support studies on the dynamics of these systems. Of the various approaches to pathway modelling that we examined, some are clearly not scalable, such as those using ordinary differential equations (ODEs), which require interaction parameters to be known or computed. Other approaches do not support the modelling of the co-dependencies between components of a pathway or give quantitative outputs (reviewed in [36]). However, the recently published signaling Petri net (SPN) [42] potentially allows us to use diagrams constructed using the mEPN scheme to study the 'flow' of information through pathways. The SPN algorithm uses stochastic flow simulations to distribute 'tokens', representing quantitative estimates of activity, through a network graph over time, using only the network structure to determine outcomes. The technique has the advantage of offering fast computational simulations on large networks (< 1 sec for ~100 node networks), can support concepts of co-dependency between components and requires no kinetic details for interactions. In this way it should be possible to estimate the dynamics of information flow through a network and the effects of perturbations on that flow. Pathways drawn using the mEPN system can easily be converted into the bipartite graph of places (nodes) and transitions connected by arcs (edges) that is required to support this approach. We are currently exploring how SPN modelling might be used to better understand the structure and activity of the signalling systems of interest to us.
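The idea of moving tokens through a bipartite graph of places and transitions using only the network structure can be illustrated with a small Python sketch. This is a deliberately simplified toy and not the published SPN algorithm: the firing rule, the data layout and all names (`TRANSITIONS`, `simulate`, the place names) are assumptions made for illustration only.

```python
import random

# Hypothetical toy pathway as a bipartite graph. Each transition lists its
# input places, output places and inhibitor places (all names illustrative).
TRANSITIONS = {
    "bind": {
        "inputs": ["ligand", "receptor"],
        "outputs": ["complex"],
        "inhibitors": [],
    },
    "activate": {
        "inputs": ["complex"],
        "outputs": ["kinase_active"],
        "inhibitors": ["inhibitor"],
    },
}


def simulate(transitions, marking, steps, rng):
    """Run `steps` rounds of stochastic token flow; return the final marking.

    In each round the transitions are visited in random order. A transition
    is enabled only if every input place holds at least one token
    (co-dependency) and no inhibitor place holds a token. When it fires, a
    random number of tokens between 1 and the smallest input count is moved
    from each input place to each output place. No kinetic parameters are used.
    """
    marking = dict(marking)  # work on a copy, leave the caller's dict intact
    for _ in range(steps):
        order = list(transitions)
        rng.shuffle(order)
        for name in order:
            t = transitions[name]
            if any(marking.get(p, 0) > 0 for p in t["inhibitors"]):
                continue  # blocked by an inhibitor
            available = min((marking.get(p, 0) for p in t["inputs"]), default=0)
            if available == 0:
                continue  # at least one required input is missing
            moved = rng.randint(1, available)
            for p in t["inputs"]:
                marking[p] -= moved
            for p in t["outputs"]:
                marking[p] = marking.get(p, 0) + moved
    return marking


if __name__ == "__main__":
    start = {"ligand": 5, "receptor": 3}
    print(simulate(TRANSITIONS, start, steps=50, rng=random.Random(0)))
    # Perturbation: adding an inhibitor token blocks the "activate" step.
    print(simulate(TRANSITIONS, {**start, "inhibitor": 1}, 50, random.Random(0)))
```

Comparing the two runs gives a crude picture of how a perturbation changes where tokens accumulate; a real analysis would average many stochastic runs and follow the marking over time rather than only at the end.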
One advantage of the simple node and edge based approach to pathway element depiction is that it facilitates the use of mEPN diagrams in other software environments. GraphML files (the main file exchange format used by the yEd editor) are supported by other network programs such as NodeXL (http://www.codeplex.com/NodeXL), Sonivis (http://www.sonivis.org/), GUESS (http://guess.wikispot.org/GraphML), Pajek (http://pajek.imfm.si/doku.php) and NetworkX (http://networkx.lanl.gov/), and the use of standard shaped nodes (glyphs) means that other generic network analysis tools such as Cytoscape [44] could also be used to draw mEPN diagrams. In particular, we have been developing mEPN's compatibility with BioLayout Express3D, a network analysis tool developed by us for the visualization and analysis of networks derived from 'omics data [37]. We have recently implemented a parser that supports the import of .graphml files into BioLayout Express3D. This translates the visual characteristics and layout of mEPN pathway diagrams from yEd, as defined by the original .graphml 2D node co-ordinates, into a series of 3D objects, each representing a different class of component using a combination of shape, size and colour (Figure ). Translating a 2D pathway into a 3D environment arguably offers no advantage for small diagrams. Indeed, arrowheads and polylines are not currently supported in 3D. However, when diagrams become large, pathways can be rotated and viewed from any angle, zoomed in on and generally manipulated in an environment that is quite different to that of any 2D representation. In the 3D environment colour is a powerful device that can be used to overlay further visual information on to nodes (Figures ). Indeed, we have now built into BioLayout Express3D the ability to directly export the analyses of one graph, e.g. clusters from expression data, and to import and overlay this information on to another, in this instance a pathway (Figure ). It is also possible to imagine much larger models of pathway systems where the spatial layout of components in 3D space is based on a component's cellular location (Figure ). With BioLayout Express3D now capable of supporting graphs of up to 30,000-40,000 nodes, there is considerable scope for building ever larger pathway models and further exploring the potential of 3D environments for pathway visualization and analysis. One final use of the 3D environment is as a means to visualize pathway activity. We are now working on a version of BioLayout Express3D that will harness the power of OpenGL 3D graphics to animate analyses of flow through a pathway, again using a node's shape, size and colour to indicate a component's activity during dynamic simulations of pathway activity.
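As a rough illustration of the GraphML import step described above, the following Python sketch reads node labels, shapes and 2D co-ordinates from a yEd-style .graphml document and places each node in 3D. It is not the BioLayout Express3D parser: the embedded sample file, the function name `load_pathway_3d` and the choice of a constant z co-ordinate are assumptions made for this example, and only the standard library XML parser is used.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by GraphML and by yEd's graphics extension.
NS = {
    "g": "http://graphml.graphdrawing.org/xmlns",
    "y": "http://www.yworks.com/xml/graphml",
}

# Hypothetical two-node diagram in the style of a yEd export.
SAMPLE_GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
         xmlns:y="http://www.yworks.com/xml/graphml">
  <key for="node" id="d0" yfiles.type="nodegraphics"/>
  <graph id="G" edgedefault="directed">
    <node id="n0">
      <data key="d0">
        <y:ShapeNode>
          <y:Geometry height="30.0" width="60.0" x="10.0" y="20.0"/>
          <y:NodeLabel>ProteinA</y:NodeLabel>
          <y:Shape type="roundrectangle"/>
        </y:ShapeNode>
      </data>
    </node>
    <node id="n1">
      <data key="d0">
        <y:ShapeNode>
          <y:Geometry height="20.0" width="20.0" x="10.0" y="120.0"/>
          <y:NodeLabel>B</y:NodeLabel>
          <y:Shape type="ellipse"/>
        </y:ShapeNode>
      </data>
    </node>
    <edge id="e0" source="n0" target="n1"/>
  </graph>
</graphml>"""


def load_pathway_3d(graphml_text, z=0.0):
    """Return (nodes, edges) parsed from yEd-style GraphML text.

    Each node keeps its label and shape and gets an (x, y, z) position built
    from the 2D layout plus a constant z (an assumption of this sketch).
    Nodes without geometry information are skipped.
    """
    root = ET.fromstring(graphml_text)
    nodes = {}
    for node in root.iterfind(".//g:node", NS):
        geometry = node.find(".//y:Geometry", NS)
        if geometry is None:
            continue
        label = node.find(".//y:NodeLabel", NS)
        shape = node.find(".//y:Shape", NS)
        nodes[node.get("id")] = {
            "label": (label.text or "").strip() if label is not None else "",
            "shape": shape.get("type") if shape is not None else None,
            "position": (float(geometry.get("x")), float(geometry.get("y")), z),
        }
    edges = [
        (edge.get("source"), edge.get("target"))
        for edge in root.iterfind(".//g:edge", NS)
    ]
    return nodes, edges


if __name__ == "__main__":
    pathway_nodes, pathway_edges = load_pathway_3d(SAMPLE_GRAPHML)
    for node_id, info in pathway_nodes.items():
        print(node_id, info)
    print(pathway_edges)
```

A layout based on cellular location, as suggested in the text, could be approximated by choosing z per compartment instead of using a single constant, although how compartments are encoded in a given file would need to be checked first.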
Figure 5. Pathway representation in a 3D environment. Large macrophage activation pathway rendered in a 3D environment, where node shape, size and colour represent a component's identity. (A) Nodes coloured according to type, e.g. light blue - proteins, yellow - protein ...
Running concurrently with our work has been an ongoing community effort to establish rules for best practice in pathway depiction. The Systems Biology Graphical Notation (SBGN; http://www.sbgn.org/) project has been discussing issues and ideas around this topic, and a manuscript describing the SBGN Process Diagrams Level 1 specification was recently published [43]. The mEPN scheme as described here aspires to many of the same goals as the SBGN, and where possible we have tried to harmonize the mEPN scheme with the emerging SBGN specification. However, our biologist-centric approach to this problem, combined with a lack of flexible pathway editing tools, the scale of our diagrams and the range of biological systems we have attempted to map, have all played their part in determining the design and implementation of the mEPN scheme. As a result, there are a number of important differences between the mEPN as described here and the SBGN scheme for process description language as currently proposed (level 1, version 1.1).
Firstly, in common with the proposed SBGN scheme, the mEPN uses glyphs of a specific shape to define the class of a component, although there are some differences between the two schemes (Figure ). However, under the SBGN scheme the glyph representing a multimeric protein complex depicts each protein in the complex separately, with modifications overlaid on top of these and the whole enclosed by a container node. We have found this a considerable overhead to implement, and it can interfere with the clarity of what is depicted rather than enhancing it (Figure ). Furthermore, the notation scheme is not supported by many of the general purpose network visualization tools in general use (e.g. yEd, Cytoscape, BioLayout Express3D), requiring instead the use of dedicated pathway software. Given the relatively recent publication of the SBGN specification, tools to support its deployment are largely still under development. As a result, the mEPN scheme generally uses a single standard shape to depict a component, even when it is made up of more than one entity or a series of attached entities (Figure &). It relies on a labelling system to define the exact identity and make-up of the component and its state, e.g. the protein subunits that make up a protein complex and their modifications (Figure ).
Secondly, we have avoided the use of different arrowheads to depict the nature of interactions (edges). The meaning of numerous arrowheads can be challenging to remember, and again they are not always supported by general pathway/network editing software packages. Instead, mEPN depicts the meaning of edges using inline annotation nodes, which carry a letter symbolizing the meaning of the edge, e.g. A for activation, I for inhibition, and may also use colour as an additional visual cue (Figure ). In principle this approach could support a wide range of edge meanings, but in practice we have found many of the edge concepts supported by SBGN to be of no use in our mapping efforts, and hence they have not been included in the mEPN scheme. For instance, a consumption arc (edge) as defined by SBGN is 'used to represent the fact that an entity affects a process, but is not affected by the process' and a production arc is 'used to represent the fact that an entity is produced by a process.' The first is the case for many enzymes acting on their substrates, and the second is obvious from the fact that one thing leads on to another. In both cases we see this information as self-evident, with no need for specific notation to depict it. In the case of specific edges that define a 'modulation', the question is what kind of modulation is meant and how one would interpret or model such a vague concept. The mEPN equivalent of the 'stimulation' edge is an activation edge.
Finally, mEPN uses labelled process nodes to explicitly state the nature of interactions between components. In the proposed SBGN scheme process nodes are used, but generally not as a means to convey the nature of interactions, except in the case of protein binding (association) and dissociation (Figure ). Whilst this approach is understandable on the basis that most process nodes would function similarly during computational modelling of such systems, not depicting the nature of the process whereby one component is transformed into another does impair visual interpretation of the diagrams. Therefore the mEPN provides a visual clue as to the nature of the interaction, using a one-to-three letter key to represent the process being depicted. When pathways are large and the distance between interacting species may be great, this can be an important visual aid to reading the diagrams. There are a number of other differences between the two schemes, and a full description of the differences between the SBGN level 1 notation and the mEPN described here can be found in Additional file 4. Whilst the mEPN and SBGN schemes may differ on these and other points, we are fully supportive of the principle of promoting the adoption of a common notation system for pathway depiction and hope that the current work will contribute to this end.
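The inline edge annotation described above, where the meaning of an edge is carried by a lettered node placed on the edge rather than by an arrowhead style, can be sketched as a simple data structure in Python. Only the letters A (activation) and I (inhibition) come from the text; the function name `add_annotated_edge`, the identifier format and the dictionary layout are hypothetical choices for this example and are not part of the mEPN specification.

```python
# Letter codes for edge meanings; only these two are named in the text.
EDGE_CODES = {"activation": "A", "inhibition": "I"}


def add_annotated_edge(nodes, edges, source, target, meaning):
    """Connect `source` to `target` through an inline annotation node.

    Instead of one edge with a special arrowhead, two plain edges are added,
    source -> annotation -> target, and the annotation node carries the
    letter code. Returns the identifier of the new annotation node.
    """
    code = EDGE_CODES[meaning]  # raises KeyError for unsupported meanings
    annotation_id = f"{source}|{code}|{target}"  # hypothetical id scheme
    nodes[annotation_id] = {"kind": "edge_annotation", "label": code}
    edges.append((source, annotation_id))
    edges.append((annotation_id, target))
    return annotation_id


if __name__ == "__main__":
    # Illustrative component names, not taken from any real pathway.
    nodes = {
        "KinaseX": {"kind": "protein", "label": "KinaseX"},
        "ProcessY": {"kind": "process", "label": "ProcessY"},
    }
    edges = []
    add_annotated_edge(nodes, edges, "KinaseX", "ProcessY", "activation")
    print(nodes)
    print(edges)
```

Because the result contains only ordinary nodes and undecorated edges, it can be stored and drawn by generic graph tools that have no notion of arrowhead semantics, which is the practical motivation given in the text.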
Figure 6. Comparison of mEPN to SBGN. Main glyphs used in the mEPN are shown on the left, SBGN glyphs on the right. (A) The main symbols used for depicting biological entities and (B) the different ways the two schemes represent protein complexes. (C) Different ...
There are significant efforts already underway to garner the support and interest of the wider biological community in assembling resources, information and pathway diagrams covering a broad spectrum of biology. Indeed, the need for these resources has never been greater. However, if they do not record pathways in a standardized way, integration of the results of these efforts will continue to be a considerable issue. To this end we are fully supportive of the SBGN's effort to promote the principles of standard notation systems, even if we cannot fully support the proposed SBGN specification for process diagrams. We present this work and the accompanying website (http://www.mepn-pathway.org/) in the hope that it is a positive contribution to the debate about how best to graphically model pathway knowledge.