While there are several well-known metabolic pathway databases (KEGG, HumanCyc, Reactome, BioCarta, etc.), we have generally found that, for the purposes of clinical ‘omics’ studies, their display formats, query options, information content and pathway coverage are often insufficient, sometimes incorrect and occasionally absent. In developing SMPDB, we tried not only to address these shortcomings but also to build on some of the strengths of existing databases. We also endeavored to add content that is not normally found in other pathway databases. In particular, of the 364 pathways in SMPDB, 281 (77% of SMPDB’s content) are unique. More specifically, 11/13 metabolite signaling pathways, 4/70 metabolic pathways, 154/168 drug pathways and 112/113 metabolic disease pathways depicted in SMPDB are not depicted in any form by KEGG, Reactome, EHMN, WikiPathways, HumanCyc, BioCarta, PharmGKB or any other database. Indeed, SMPDB is currently the only pathway database that includes significant numbers of metabolic disease pathways (>110) and drug pathways (>160). In addition to providing a large number of novel pathways, SMPDB adds a significant amount of new information content, including more than 30 000 words of original text in total, with each pathway described in the context of human physiology and human biochemistry. Furthermore, each drug or metabolite in SMPDB is hyperlinked to a detailed description of that molecule, including extensive nomenclature information, comprehensive physico-chemical data, thousands of reference nuclear magnetic resonance (NMR) and mass spectrometry (MS) spectra, and extensive information about tissue or biofluid locations and concentrations (~100 data fields per compound). 
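The per-category counts above can be checked with a few lines of Python. This is only arithmetic on the numbers quoted in the paragraph; the dictionary keys are simply the category labels used in the text.

```python
# Unique (not depicted elsewhere) vs. total SMPDB pathways per category,
# using the counts quoted in the text.
counts = {
    "metabolite signaling": (11, 13),
    "metabolic": (4, 70),
    "drug": (154, 168),
    "metabolic disease": (112, 113),
}

total_unique = sum(unique for unique, _ in counts.values())
total = sum(n for _, n in counts.values())

for category, (unique, n) in counts.items():
    print(f"{category:>20}: {unique}/{n} = {unique / n:.0%}")

# Overall share of unique pathways across all four categories.
print(f"{'all':>20}: {total_unique}/{total} = {total_unique / total:.0%}")
```

The final line reports 281/364, or about 77% of SMPDB's pathways.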
In addition to this textual content, SMPDB also offers a significant amount of new and useful graphical content, including the depiction of the relevant organs, cellular locations, organelles, membrane boundaries, protein quaternary structures, cofactors and other cellular features in its pathway diagrams.
With regard to its interface design, SMPDB also offers a number of new or unique features. In particular, SMPDB uses thumbnail images to facilitate pathway viewing and browsing, it uses a scrollable table to display pathways and pathway synopses, and it employs a unique checkbox Highlight/Analyzer tool that allows users to interactively highlight and color multiple metabolites, drugs and/or proteins on its pathway images. SMPDB also uses a graphical structure-searching applet to enable sophisticated drug or metabolite similarity searches. As with some of the more fully developed pathway databases, SMPDB provides protein/metabolite lists for each pathway, supports advanced text and sequence queries, and allows pathway mapping from protein, gene and metabolite lists. While space does not permit a detailed comparison against all existing pathway databases, it is perhaps instructive to compare SMPDB to six of the larger and more established resources: KEGG, Reactome, HumanCyc, BioCarta, EHMN and WikiPathways/GenMAPP.
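To make the idea of pathway mapping from a protein, gene or metabolite list concrete, here is a minimal sketch that ranks pathways by how many of the user's identifiers they contain. It is not SMPDB's implementation: the pathway names, the tiny membership sets and the `map_to_pathways` function are hypothetical, and identifiers are matched by exact string equality.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, heavily simplified pathway membership sets mixing
# metabolites and proteins; real pathways contain many more components.
PATHWAYS: Dict[str, Set[str]] = {
    "Glycolysis": {"glucose", "pyruvate", "ATP", "NADH", "hexokinase"},
    "Citric Acid Cycle": {"acetyl-CoA", "citrate", "succinate", "NADH"},
    "Urea Cycle": {"ornithine", "citrulline", "arginine", "urea"},
}


def map_to_pathways(
    query: List[str], pathways: Dict[str, Set[str]]
) -> List[Tuple[str, List[str]]]:
    """Return (pathway, matched components) pairs, most matches first."""
    wanted = {q.strip() for q in query if q.strip()}
    hits = []
    for name, members in pathways.items():
        matched = sorted(wanted & members)  # exact identifier matches only
        if matched:
            hits.append((name, matched))
    # Rank by number of matched components; break ties alphabetically.
    hits.sort(key=lambda hit: (-len(hit[1]), hit[0]))
    return hits


for pathway, matched in map_to_pathways(["pyruvate", "NADH", "urea"], PATHWAYS):
    print(pathway, matched)
```

The matched components returned for each pathway are exactly what a highlighting tool would need in order to color those entities on the corresponding pathway image.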
This comparison is summarized in Table 1, where we have used a number of general features or criteria to make our assessment. Some of these criteria need further definition. In particular, ‘Information provided on pathway entities’ means providing hyperlinks to pages that give additional detail on protein/drug/metabolite sequences, functions, properties, reactions, structure, concentrations or spectra. ‘Supports advanced text search’ means supporting field-specific searches, wild-card queries, Boolean searches, synonym searches, tolerance of misspellings, text sorting or other complex textual queries beyond simple text matching. ‘Component lists available’ means providing easily accessed plain-text or hyperlinked lists of all the genes, proteins, drugs and/or metabolites displayed in the pathway. In addition to the information in Table 1, we elaborate on the comparisons for four of the databases (KEGG, Reactome, HumanCyc and BioCarta) in the following paragraphs.
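As an illustration of what the ‘Supports advanced text search’ criterion covers, the sketch below combines wild-card, synonym, misspelling-tolerant and Boolean AND matching using Python's standard `fnmatch` and `difflib` modules. The synonym table, the term index and the function names are hypothetical, and the sketch does not reproduce how any of the compared databases implement search.

```python
import difflib
import fnmatch
from typing import Dict, List, Set

# Hypothetical synonym table; a real database would draw synonyms from
# its own compound and protein annotations.
SYNONYMS: Dict[str, List[str]] = {
    "vitamin c": ["ascorbic acid", "ascorbate"],
    "acetaminophen": ["paracetamol"],
}

# Hypothetical index: pathway name -> lower-case terms found in its record.
INDEX: Dict[str, Set[str]] = {
    "Acetaminophen Metabolism": {"paracetamol", "glucuronide", "liver"},
    "Ascorbate Metabolism": {"ascorbic acid", "dehydroascorbate", "kidney"},
    "Glycolysis": {"glucose", "pyruvate", "liver"},
}


def term_hits(term: str, vocabulary: Set[str]) -> Set[str]:
    """Match one query term by wild-card, synonym or near-miss spelling."""
    hits: Set[str] = set()
    term = term.lower()
    for t in [term] + SYNONYMS.get(term, []):
        if "*" in t or "?" in t:
            hits.update(fnmatch.filter(vocabulary, t))  # wild-card query
        elif t in vocabulary:
            hits.add(t)  # exact or synonym match
        else:
            # Tolerate misspellings using difflib's similarity ratio.
            hits.update(difflib.get_close_matches(t, vocabulary, n=3, cutoff=0.8))
    return hits


def search_all(terms: List[str], index: Dict[str, Set[str]]) -> List[str]:
    """Boolean AND: return pathways in which every term has at least one hit."""
    return sorted(
        name for name, vocab in index.items()
        if all(term_hits(t, vocab) for t in terms)
    )


print(search_all(["acetaminophen", "liver"], INDEX))  # synonym AND exact term
print(search_all(["gluc*"], INDEX))                   # wild-card
print(search_all(["pyruvte"], INDEX))                 # misspelling
```

A query that uses only simple text matching would miss all three of these cases, which is the distinction the criterion is meant to capture.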
Table 1. Comparison of SMPDB to KEGG, HumanCyc, Reactome, BioCarta, EHMN and WikiPathways/GenMAPP
KEGG, with 330 reference pathways, is considered to be the ‘gold standard’ for most pathway databases because of its comprehensiveness and its breadth of organism-specific coverage. Of KEGG’s 158 metabolic pathways, 73 are relevant to humans or other mammals. Interestingly, several key metabolic pathways in mammals are actually missing from KEGG (e.g. the malate-aspartate shuttle and electron transfer). Because the pathway diagrams in KEGG are designed to be very ‘generic’, they display no organ data, no cellular structure information, no protein superstructure data, no chemical structure information (except through hyperlinks) and no gene or protein names (only EC numbers). While KEGG does have 35 disease pathways, most are for cancer, neurological or immune diseases, and only three of these relate to small-molecule metabolites or metabolic diseases. KEGG’s drug pathways show only drug development or drug similarity rather than drug action or drug mechanism. While KEGG does provide very useful annotations (via hyperlinks) for the compounds shown in its pathways, it does not provide descriptive summaries of these pathways, nor does it support the visual display of gene/protein/metabolite concentrations. As yet, KEGG does not provide protein/metabolite lists for each pathway, nor does it support graphical structure queries or chemical structure similarity searches.
Similar to KEGG, the Reactome database provides thousands of metabolic and signaling pathway data sets for many model organisms. Of these, 64 pathways are associated with human metabolism, three with human diseases and three with drug action or drug metabolism. Through its Reaction Map interface, Reactome provides low-resolution pathway maps (similar to the thumbnail images used by the SMPDB browser) that allow users to navigate the database interactively. Like SMPDB, Reactome provides extensive pathway or reaction descriptions along with hyperlinks to several external databases. Unlike SMPDB, Reactome does not display organ data, cellular compartment data, cellular organelle information, protein complex information or protein/gene names in its pathway diagrams. In addition, Reactome has very few disease or drug pathways. On the other hand, Reactome, like SMPDB, has a feature (‘Skypainter’) that allows users to paste in a list of genes or gene identifiers and to ‘paint’ the Reactome reaction map in a variety of ways. Unlike SMPDB, Reactome does not support chemical structure or sequence queries.
The HumanCyc database contains 349 pathways, including 29 superpathways (supersets of many of the other 320 pathways in the database). Two hundred and thirty-eight of these pathways are confirmed, meaning that their ‘evidence glyphs’ indicate that 50% or more of the reactions have some evidence of occurring in humans. As in SMPDB, all pathways in HumanCyc can be ‘zoomed in’ to display chemical structures, EC numbers and protein names. HumanCyc also supports advanced text searches as well as sequence searches. HumanCyc’s metabolic pathways are well referenced and generally well described, although most descriptions are given in the context of bacterial or plant metabolism. The images in HumanCyc do not display organ data, cellular compartment data, cellular organelle information or protein complex information. Currently, HumanCyc has no disease pathways and provides only three drug pathways. While HumanCyc does support the visual display of gene/protein/metabolite concentrations using its own ‘OmicsViewer’, the display is small and sometimes difficult to interpret, especially for metabolomics applications. Unlike SMPDB, HumanCyc does not provide protein/metabolite lists for each pathway, nor does it support chemical structure similarity searches.
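The ‘confirmed’ criterion described for HumanCyc amounts to a simple threshold rule, sketched below under stated assumptions. The `Reaction` record and its `human_evidence` flag are hypothetical stand-ins for HumanCyc's evidence glyphs, whose underlying evidence model is considerably richer than a single boolean per reaction.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reaction:
    ec_number: str
    human_evidence: bool  # hypothetical stand-in for an evidence glyph


def is_confirmed(reactions: List[Reaction], threshold: float = 0.5) -> bool:
    """True when at least `threshold` of a pathway's reactions have human evidence."""
    if not reactions:
        return False  # an empty pathway cannot be confirmed
    supported = sum(r.human_evidence for r in reactions)  # True counts as 1
    return supported / len(reactions) >= threshold


# Illustrative pathway: 2 of 4 reactions supported, i.e. exactly 50%.
pathway = [
    Reaction("1.1.1.1", True),
    Reaction("2.7.1.1", True),
    Reaction("4.1.2.13", False),
    Reaction("5.3.1.1", False),
]
print(is_confirmed(pathway))
```

Because the rule reads "50% or more", a pathway sitting exactly at the threshold counts as confirmed, which is why the comparison uses `>=`.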
The BioCarta database, with 360 pathways, is probably the most visually sophisticated of the four pathway databases described here. Like SMPDB, BioCarta depicts cell, protein and chemical structure information in most of its pathways, and all of its pathways are well annotated with very detailed descriptions. BioCarta also contains a large number (>250) of protein signaling pathways and many other macromolecular interaction processes. However, BioCarta has only a modest number (55) of pathways devoted to small-molecule metabolism and an even smaller number (<35) devoted to disease or drug action. Very few of these disease or drug pathways overlap with those in SMPDB. Because BioCarta is a community-annotated/generated database, its collection of pathways is somewhat haphazard and largely dependent on community interest. Moreover, BioCarta’s querying and display tools are very limited compared with those of most other pathway databases.