The advent of Systems Biology has been accompanied by the blooming of pathway databases. Currently pathways are defined generically with respect to the organ or cell type where a reaction takes place. The cell type specificity of the reactions is the foundation of immunological research, and capturing this specificity is of paramount importance when using pathway-based analyses to decipher complex immunological datasets. Here, we present DC-ATLAS, a novel and versatile resource for the interpretation of high-throughput data generated perturbing the signaling network of dendritic cells (DCs).
Pathways are annotated using a novel data model, the Biological Connection Markup Language (BCML), a SBGN-compliant data format developed to store the large amount of information collected. The application of DC-ATLAS to pathway-based analysis of the transcriptional program of DCs stimulated with agonists of the toll-like receptor family allows an integrated description of the flow of information from the cellular sensors to the functional outcome, capturing the temporal series of activation events by grouping sets of reactions that occur at different time points in well-defined functional modules.
The initiative significantly improves our understanding of DC biology and regulatory networks. Developing a systems biology approach for immune system holds the promise of translating knowledge on the immune system into more successful immunotherapy strategies.
A standard graphical notation is essential to facilitate exchange of network representations of biological processes. Towards this end, the Systems Biology Graphical Notation (SBGN) has been proposed, and it is already supported by a number of tools. However, support for SBGN in Cytoscape, one of the most widely used platforms in biology to visualise and analyse networks, is limited, and in particular it is not possible to import SBGN diagrams.
We have developed CySBGN, a Cytoscape plug-in that extends the use of Cytoscape visualisation and analysis features to SBGN maps. CySBGN adds support for Cytoscape users to visualize any of the three complementary SBGN languages: Process Description, Entity Relationship, and Activity Flow. The interoperability with other tools (CySBML plug-in and Systems Biology Format Converter) was also established allowing an automated generation of SBGN diagrams based on previously imported SBML models. The plug-in was tested using a suite of 53 different test cases that covers almost all possible entities, shapes, and connections. A rendering comparison with other tools that support SBGN was performed. To illustrate the interoperability with other Cytoscape functionalities, we present two analysis examples, shortest path calculation, and motif identification in a metabolic network.
CySBGN imports, modifies and analyzes SBGN diagrams in Cytoscape, and thus allows the application of the large palette of tools and plug-ins in this platform to networks and pathways in SBGN format.
Motivation: LibSBGN is a software library for reading, writing and manipulating Systems Biology Graphical Notation (SBGN) maps stored using the recently developed SBGN-ML file format. The library (available in C++ and Java) makes it easy for developers to add SBGN support to their tools, whereas the file format facilitates the exchange of maps between compatible software applications. The library also supports validation of maps, which simplifies the task of ensuring compliance with the detailed SBGN specifications. With this effort we hope to increase the adoption of SBGN in bioinformatics tools, ultimately enabling more researchers to visualize biological knowledge in a precise and unambiguous manner.
Availability and implementation: Milestone 2 was released in December 2011. Source code, example files and binaries are freely available under the terms of either the LGPL v2.1+ or Apache v2.0 open source licenses from http://libsbgn.sourceforge.net.
Summary: Pathway Processor 2.0 is a web application designed to analyze high-throughput datasets, including but not limited to microarray and next-generation sequencing, using a pathway centric logic. In addition to well-established methods such as the Fisher’s test and impact analysis, Pathway Processor 2.0 offers innovative methods that convert gene expression into pathway expression, leading to the identification of differentially regulated pathways in a dataset of choice.
Availability and implementation: Pathway Processor 2.0 is available as a web service at http://compbiotoolbox.fmach.it/pathwayProcessor/. Sample datasets to test the functionality can be used directly from the application.
Supplementary data are available at Bioinformatics online.
Motivation: Gene expression class comparison studies may identify hundreds or thousands of genes as differentially expressed (DE) between sample groups. Gaining biological insight from the result of such experiments can be approached, for instance, by identifying the signaling pathways impacted by the observed changes. Most of the existing pathway analysis methods focus on either the number of DE genes observed in a given pathway (enrichment analysis methods), or on the correlation between the pathway genes and the class of the samples (functional class scoring methods). Both approaches treat the pathways as simple sets of genes, disregarding the complex gene interactions that these pathways are built to describe.
Results: We describe a novel signaling pathway impact analysis (SPIA) that combines the evidence obtained from the classical enrichment analysis with a novel type of evidence, which measures the actual perturbation on a given pathway under a given condition. A bootstrap procedure is used to assess the significance of the observed total pathway perturbation. Using simulations we show that the evidence derived from perturbations is independent of the pathway enrichment evidence. This allows us to calculate a global pathway significance P-value, which combines the enrichment and perturbation P-values. We illustrate the capabilities of the novel method on four real datasets. The results obtained on these data show that SPIA has better specificity and more sensitivity than several widely used pathway analysis methods.
Availability: SPIA was implemented as an R package available at http://vortex.cs.wayne.edu/ontoexpress/
Supplementary information: Supplementary data are available at Bioinformatics online.
Motivation: The rapid accumulation of knowledge in the field of Systems Biology during the past years requires advanced, but simple-to-use, methods for the visualization of information in a structured and easily comprehensible manner.
Availability: The biographer tool can be used at and downloaded from the web page http://biographer.biologie.hu-berlin.de/. The different software packages, including a server-indepenent version as well as a web server for Windows and Linux based systems, are available at http://code.google.com/p/biographer/ under the open-source license LGPL.
firstname.lastname@example.org or email@example.com
Motivation: The recently proposed Systems Biology Graphical Notation (SBGN) provides a standard for the visual representation of biochemical and cellular processes. It aims to support more efficient and accurate communication of biological knowledge between different research communities in the life sciences. However, to increase the use of SBGN, tools for editing, validating and translating SBGN maps are desirable.
Results: We present SBGN-ED, a tool which allows the creation of all three types of SBGN maps from scratch or the editing of existing maps, the validation of these maps for syntactical and semantical correctness, the translation of networks from the KEGG and MetaCrop databases into SBGN and the export of SBGN maps into several file and image formats.
Availability: SBGN-ED is freely available from http://vanted.ipk-gatersleben.de/addons/sbgn-ed. The web site contains also tutorials and example files.
The Systems Biology Graphical Notation (SBGN) provides standard graphical languages for representing cellular processes, interactions, and biological networks. SBGN consists of three languages: Process Descriptions (PD), Entity Relationships (ER), and Activity Flows (AF). Maps in SBGN PD are often large, detailed, and complex, therefore there is a need for a simplified illustration.
To solve this problem we define translations of SBGN PD maps into the more abstract SBGN AF maps. We present a template-based translation which allows the user to focus on different aspects of the underlying biological system. We also discuss aspects of laying out the AF map and of interactive navigation between both the PD and the AF map. The methods developed here have been implemented as part of SBGN-ED (
SBGN PD maps become much smaller and more manageable when translated into SBGN AF. The flexible translation of PD into AF and related interaction methods are an initial step in translating the different SBGN languages and open the path to future research for translation methods between other SBGN languages.
SBGN; Translation; Process Description; Activity Flow
Summary: CellDesigner provides a user-friendly interface for graphical biochemical pathway description. Many pathway databases are not directly exportable to CellDesigner models. PathwayAccess is an extensible suite of CellDesigner plugins, which connect CellDesigner directly to pathway databases using respective Java application programming interfaces. The process is streamlined for creating new PathwayAccess plugins for specific pathway databases. Three PathwayAccess plugins, MetNetAccess, BioCycAccess and ReactomeAccess, directly connect CellDesigner to the pathway databases MetNetDB, BioCyc and Reactome. PathwayAccess plugins enable CellDesigner users to expose pathway data to analytical CellDesigner functions, curate their pathway databases and visually integrate pathway data from different databases using standard Systems Biology Markup Language and Systems Biology Graphical Notation.
Availability: Implemented in Java, PathwayAccess plugins run with CellDesigner version 4.0.1 and were tested on Ubuntu Linux, Windows XP and 7, and MacOSX. Source code, binaries, documentation and video walkthroughs are freely available at http://vrac.iastate.edu/∼jlv
Supplementary information: Supplementary data are available at Bioinformatics online.
Influenza is a common infectious disease caused by influenza viruses. Annual epidemics cause severe illnesses, deaths, and economic loss around the world. To better defend against influenza viral infection, it is essential to understand its mechanisms and associated host responses. Many studies have been conducted to elucidate these mechanisms, however, the overall picture remains incompletely understood. A systematic understanding of influenza viral infection in host cells is needed to facilitate the identification of influential host response mechanisms and potential drug targets.
We constructed a comprehensive map of the influenza A virus (‘IAV’) life cycle (‘FluMap’) by undertaking a literature-based, manual curation approach. Based on information obtained from publicly available pathway databases, updated with literature-based information and input from expert virologists and immunologists, FluMap is currently composed of 960 factors (i.e., proteins, mRNAs etc.) and 456 reactions, and is annotated with ~500 papers and curation comments. In addition to detailing the type of molecular interactions, isolate/strain specific data are also available. The FluMap was built with the pathway editor CellDesigner in standard SBML (Systems Biology Markup Language) format and visualized as an SBGN (Systems Biology Graphical Notation) diagram. It is also available as a web service (online map) based on the iPathways+ system to enable community discussion by influenza researchers. We also demonstrate computational network analyses to identify targets using the FluMap.
The FluMap is a comprehensive pathway map that can serve as a graphically presented knowledge-base and as a platform to analyze functional interactions between IAV and host factors. Publicly available webtools will allow continuous updating to ensure the most reliable representation of the host-virus interaction network. The FluMap is available at http://www.influenza-x.org/flumap/.
Drug targets; FluMap; Host factors; Influenza virus; Pathways
The development of complex biochemical models has been facilitated through the standardization of machine-readable representations like SBML (Systems Biology Markup Language). This effort is accompanied by the ongoing development of the human-readable diagrammatic representation SBGN (Systems Biology Graphical Notation). The graphical SBML editor CellDesigner allows direct translation of SBGN into SBML, and vice versa. For the assignment of kinetic rate laws, however, this process is not straightforward, as it often requires manual assembly and specific knowledge of kinetic equations.
SBMLsqueezer facilitates exactly this modeling step via automated equation generation, overcoming the highly error-prone and cumbersome process of manually assigning kinetic equations. For each reaction the kinetic equation is derived from the stoichiometry, the participating species (e.g., proteins, mRNA or simple molecules) as well as the regulatory relations (activation, inhibition or other modulations) of the SBGN diagram. Such information allows distinctions between, for example, translation, phosphorylation or state transitions. The types of kinetics considered are numerous, for instance generalized mass-action, Hill, convenience and several Michaelis-Menten-based kinetics, each including activation and inhibition. These kinetics allow SBMLsqueezer to cover metabolic, gene regulatory, signal transduction and mixed networks. Whenever multiple kinetics are applicable to one reaction, parameter settings allow for user-defined specifications. After invoking SBMLsqueezer, the kinetic formulas are generated and assigned to the model, which can then be simulated in CellDesigner or with external ODE solvers. Furthermore, the equations can be exported to SBML, LaTeX or plain text format.
SBMLsqueezer considers the annotation of all participating reactants, products and regulators when generating rate laws for reactions. Thus, for each reaction, only applicable kinetic formulas are considered. This modeling scheme creates kinetics in accordance with the diagrammatic representation. In contrast most previously published tools have relied on the stoichiometry and generic modulators of a reaction, thus ignoring and potentially conflicting with the information expressed through the process diagram. Additional material and the source code can be found at the project homepage (URL found in the Availability and requirements section).
The Pathway Interaction Database (PID, http://pid.nci.nih.gov) is a freely available collection of curated and peer-reviewed pathways composed of human molecular signaling and regulatory events and key cellular processes. Created in a collaboration between the US National Cancer Institute and Nature Publishing Group, the database serves as a research tool for the cancer research community and others interested in cellular pathways, such as neuroscientists, developmental biologists and immunologists. PID offers a range of search features to facilitate pathway exploration. Users can browse the predefined set of pathways or create interaction network maps centered on a single molecule or cellular process of interest. In addition, the batch query tool allows users to upload long list(s) of molecules, such as those derived from microarray experiments, and either overlay these molecules onto predefined pathways or visualize the complete molecular connectivity map. Users can also download molecule lists, citation lists and complete database content in extensible markup language (XML) and Biological Pathways Exchange (BioPAX) Level 2 format. The database is updated with new pathway content every month and supplemented by specially commissioned articles on the practical uses of other relevant online tools.
Motivation: The Systems Biology Markup Language (SBML) is currently supported by >230 software tools, among which 160 support numerical integration of ordinary differential equation (ODE) models. Although SBML is a widely accepted standard within this field, there is still no language-neutral library that supports all features of SBML for simulating ODE models. Therefore, a demand exists for a simple portable implementation of a numerical integrator that supports SBML to enhance the development of a computational platform for systems biology.
Results: We implemented a library called libSBMLSim, which supports all the features of SBML and confirmed that the library passes all tests in the SBML test suite including those for SBML Events, AlgebraicRules, ‘fast’ attribute on Reactions and Delay. LibSBMLSim is implemented in the C programming language and does not depend on any third-party library except libSBML, which is a library to handle SBML documents. For the numerical integrator, both explicit and implicit methods are written from scratch to support all the functionality of SBML features in a straightforward implementation. We succeeded in implementing libSBMLSim as a platform-independent library that can run on most common operating systems (Windows, MacOSX and Linux) and also provides several language bindings (Java, C#, Python and Ruby).
Availability: The source code of libSBMLSim is available from http://fun.bio.keio.ac.jp/software/libsbmlsim/. LibSBMLSim is distributed under the terms of LGPL.
Supplementary data are available at Bioinformatics online.
Summary: The specifications of the Systems Biology Markup Language (SBML) define standards for storing and exchanging computer models of biological processes in text files. In order to perform model simulations, graphical visualizations and other software manipulations, an in-memory representation of SBML is required. We developed JSBML for this purpose. In contrast to prior implementations of SBML APIs, JSBML has been designed from the ground up for the Java™ programming language, and can therefore be used on all platforms supported by a Java Runtime Environment. This offers important benefits for Java users, including the ability to distribute software as Java Web Start applications. JSBML supports all SBML Levels and Versions through Level 3 Version 1, and we have strived to maintain the highest possible degree of compatibility with the popular library libSBML. JSBML also supports modules that can facilitate the development of plugins for end user applications, as well as ease migration from a libSBML-based backend.
Availability: Source code, binaries and documentation for JSBML can be freely obtained under the terms of the LGPL 2.1 from the website http://sbml.org/Software/JSBML.
Supplementary information: Supplementary data are available at Bioinformatics online.
Over the last years, several methods for the phenotype simulation of microorganisms, under specified genetic and environmental conditions have been proposed, in the context of Metabolic Engineering (ME). These methods provided insight on the functioning of microbial metabolism and played a key role in the design of genetic modifications that can lead to strains of industrial interest. On the other hand, in the context of Systems Biology research, biological network visualization has reinforced its role as a core tool in understanding biological processes. However, it has been scarcely used to foster ME related methods, in spite of the acknowledged potential.
In this work, an open-source software that aims to fill the gap between ME and metabolic network visualization is proposed, in the form of a plugin to the OptFlux ME platform. The framework is based on an abstract layer, where the network is represented as a bipartite graph containing minimal information about the underlying entities and their desired relative placement. The framework provides input/output support for networks specified in standard formats, such as XGMML, SBGN or SBML, providing a connection to genome-scale metabolic models. An user-interface makes it possible to edit, manipulate and query nodes in the network, providing tools to visualize diverse effects, including visual filters and aspect changing (e.g. colors, shapes and sizes). These tools are particularly interesting for ME, since they allow overlaying phenotype simulation results or elementary flux modes over the networks.
The framework and its source code are freely available, together with documentation and other resources, being illustrated with well documented case studies.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-014-0420-0) contains supplementary material, which is available to authorized users.
Metabolic network visualization; Metabolic engineering; Open-source software
Summary: The XML-based Systems Biology Markup Language (SBML) has emerged as a standard for storage, communication and interchange of models in systems biology. As a machine-readable format XML is difficult for humans to read and understand. Many tools are available that visualize the reaction pathways stored in SBML files, but many components, e.g. unit declarations, complex kinetic equations or links to MIRIAM resources, are often not made visible in these diagrams. For a broader understanding of the models, support in scientific writing and error detection, a human-readable report of the complete model is needed. We present SBML2LaTEX, a Java-based stand-alone program to fill this gap. A convenient web service allows users to directly convert SBML to various formats, including DVI, LaTEX and PDF, and provides many settings for customization.
Availability: Source code, documentation and a web service are freely available at http://www.ra.cs.uni-tuebingen.de/software/SBML2LaTeX.
Supplementary information:Supplementary data are available at Bioinformatics online.
Motivation: Network diagrams are commonly used to visualize biochemical pathways by displaying the relationships between genes, proteins, mRNAs, microRNAs, metabolites, regulatory DNA elements, diseases, viruses and drugs. While there are several currently available web-based pathway viewers, there is still room for improvement. To this end, we have developed a flash-based network viewer (FNV) for the visualization of small to moderately sized biological networks and pathways.
Summary: Written in Adobe ActionScript 3.0, the viewer accepts simple Extensible Markup Language (XML) formatted input files to display pathways in vector graphics on any web-page providing flexible layout options, interactivity with the user through tool tips, hyperlinks and the ability to rearrange nodes on the screen. FNV was utilized as a component in several web-based systems, namely Genes2Networks, Lists2Networks, KEA, ChEA and PathwayGenerator. In addition, FVN can be used to embed pathways inside pdf files for the communication of pathways in soft publication materials.
Availability: FNV is available for use and download along with the supporting documentation and sample networks at http://www.maayanlab.net/FNV.
Multidisciplinary integrated research requires the ability to couple the
diverse sets of data obtained from a range of complex experiments and
computer simulations. Integrating data requires semantically rich
information. In this paper an end-to-end use of semantically rich data in
computational chemistry is demonstrated utilizing the Chemical Markup
Language (CML) framework. Semantically rich data is generated by the NWChem
computational chemistry software with the FoX library and utilized by the
Avogadro molecular editor for analysis and visualization.
The NWChem computational chemistry software has been modified and coupled to
the FoX library to write CML compliant XML data files. The FoX library was
expanded to represent the lexical input files and molecular orbitals used by
the computational chemistry software. Draft dictionary entries and a format
for molecular orbitals within CML CompChem were developed. The Avogadro
application was extended to read in CML data, and display molecular geometry
and electronic structure in the GUI allowing for an end-to-end solution
where Avogadro can create input structures, generate input files, NWChem can
run the calculation and Avogadro can then read in and analyse the CML output
produced. The developments outlined in this paper will be made available in
future releases of NWChem, FoX, and Avogadro.
The production of CML compliant XML files for computational chemistry
software such as NWChem can be accomplished relatively easily using the FoX
library. The CML data can be read in by a newly developed reader in Avogadro
and analysed or visualized in various ways. A community-based effort is
needed to further develop the CML CompChem convention and dictionary. This
will enable the long-term goal of allowing a researcher to run simple
“Google-style” searches of chemistry and physics and have the
results of computational calculations returned in a comprehensible form
alongside articles from the published literature.
Chemical Markup Language; FoX; NWChem; Avogadro; Computational chemistry
Biologists make frequent use of databases containing large and complex biological networks. One popular database is the Kyoto Encyclopedia of Genes and Genomes (KEGG) which uses its own graphical representation and manual layout for pathways. While some general drawing conventions exist for biological networks, arbitrary graphical representations are very common. Recently, a new standard has been established for displaying biological processes, the Systems Biology Graphical Notation (SBGN), which aims to unify the look of such maps. Ideally, online repositories such as KEGG would automatically provide networks in a variety of notations including SBGN. Unfortunately, this is non‐trivial, since converting between notations may add, remove or otherwise alter map elements so that the existing layout cannot be simply reused.
Here we describe a methodology for automatic translation of KEGG metabolic pathways into the SBGN format. We infer important properties of the KEGG layout and treat these as layout constraints that are maintained during the conversion to SBGN maps.
This allows for the drawing and layout conventions of SBGN to be followed while creating maps that are still recognizably the original KEGG pathways. This article details the steps in this process and provides examples of the final result.
Flow cytometry technology is widely used in both health care and research. The rapid expansion of flow cytometry applications has outpaced the development of data storage and analysis tools. Collaborative efforts being taken to eliminate this gap include building common vocabularies and ontologies, designing generic data models, and defining data exchange formats. The Minimum Information about a Flow Cytometry Experiment (MIFlowCyt) standard was recently adopted by the International Society for Advancement of Cytometry. This standard guides researchers on the information that should be included in peer reviewed publications, but it is insufficient for data exchange and integration between computational systems. The Functional Genomics Experiment (FuGE) formalizes common aspects of comprehensive and high throughput experiments across different biological technologies. We have extended FuGE object model to accommodate flow cytometry data and metadata.
We used the MagicDraw modelling tool to design a UML model (Flow-OM) according to the FuGE extension guidelines and the AndroMDA toolkit to transform the model to a markup language (Flow-ML). We mapped each MIFlowCyt term to either an existing FuGE class or to a new FuGEFlow class. The development environment was validated by comparing the official FuGE XSD to the schema we generated from the FuGE object model using our configuration. After the Flow-OM model was completed, the final version of the Flow-ML was generated and validated against an example MIFlowCyt compliant experiment description.
The extension of FuGE for flow cytometry has resulted in a generic FuGE-compliant data model (FuGEFlow), which accommodates and links together all information required by MIFlowCyt. The FuGEFlow model can be used to build software and databases using FuGE software toolkits to facilitate automated exchange and manipulation of potentially large flow cytometry experimental data sets. Additional project documentation, including reusable design patterns and a guide for setting up a development environment, was contributed back to the FuGE project.
We have shown that an extension of FuGE can be used to transform minimum information requirements in natural language to markup language in XML. Extending FuGE required significant effort, but in our experiences the benefits outweighed the costs. The FuGEFlow is expected to play a central role in describing flow cytometry experiments and ultimately facilitating data exchange including public flow cytometry repositories currently under development.
Pathway models serve as the basis for much of systems biology. They are often built using programs designed for the purpose. Constructing new models generally requires simultaneous access to experimental data of diverse types, to databases of well-characterized biological compounds and molecular intermediates, and to reference model pathways. However, few if any software applications provide all such capabilities within a single user interface.
The Pathway Editor is a program written in the Java programming language that allows de-novo pathway creation and downloading of LIPID MAPS (Lipid Metabolites and Pathways Strategy) and KEGG lipid metabolic pathways, and of measured time-dependent changes to lipid components of metabolism. Accessed through Java Web Start, the program downloads pathways from the LIPID MAPS Pathway database (Pathway) as well as from the LIPID MAPS web server . Data arises from metabolomic (lipidomic), microarray, and protein array experiments performed by the LIPID MAPS consortium of laboratories and is arranged by experiment. Facility is provided to create, connect, and annotate nodes and processes on a drawing panel with reference to database objects and time course data. Node and interaction layout as well as data display may be configured in pathway diagrams as desired. Users may extend diagrams, and may also read and write data and non-lipidomic KEGG pathways to and from files. Pathway diagrams in XML format, containing database identifiers referencing specific compounds and experiments, can be saved to a local file for subsequent use. The program is built upon a library of classes, referred to as the Biopathways Workbench, that convert between different file formats and database objects. An example of this feature is provided in the form of read/construct/write access to models in SBML (Systems Biology Markup Language) contained in the local file system.
Inclusion of access to multiple experimental data types and of pathway diagrams within a single interface, automatic updating through connectivity to an online database, and a focus on annotation, including reference to standardized lipid nomenclature as well as common lipid names, supports the view that the Pathway Editor represents a significant, practicable contribution to current pathway modeling tools.
Summary: We have developed PathBuilder, an open-source web application to annotate biological information pertaining to signaling pathways and to create web-based pathway resources. PathBuilder enables annotation of molecular events including protein–protein interactions, enzyme–substrate relationships and protein translocation events either manually or through automated importing of data from other databases. Salient features of PathBuilder include automatic validation of data formats, built-in modules for visualization of pathways, automated import of data from other pathway resources, export of data in several standard data exchange formats and an application programming interface for retrieving existing pathway datasets.
Availability: PathBuilder is freely available for download at http://pathbuilder.sourceforge.net/ under the terms of GNU lesser general public license (LGPL: http://www.gnu.org/copyleft/lesser.html). The software is platform independent and has been tested on Windows and Linux platforms.
Supplementary information: Supplementary data are available at Bioinformatics online.
The explosion in biological information creates the need for databases that are easy to develop, easy to maintain and can be easily manipulated by annotators who are most likely to be biologists. However, deployment of scalable and extensible databases is not an easy task and generally requires substantial expertise in database development.
BioBuilder is a Zope-based software tool that was developed to facilitate intuitive creation of protein databases. Protein data can be entered and annotated through web forms along with the flexibility to add customized annotation features to protein entries. A built-in review system permits a global team of scientists to coordinate their annotation efforts. We have already used BioBuilder to develop Human Protein Reference Database , a comprehensive annotated repository of the human proteome. The data can be exported in the extensible markup language (XML) format, which is rapidly becoming as the standard format for data exchange.
As the proteomic data for several organisms begins to accumulate, BioBuilder will prove to be an invaluable platform for functional annotation and development of customizable protein centric databases. BioBuilder is open source and is available under the terms of LGPL.
Integrated Pathway Resources, Analysis and Visualization System (iPAVS) is an integrated biological pathway database designed to support pathway discovery in the fields of proteomics, transcriptomics, metabolomics and systems biology. The key goal of IPAVS is to provide biologists access to expert-curated pathways from experimental data belonging to specific biological contexts related to cell types, tissues, organs and diseases. IPAVS currently integrates over 500 human pathways (consisting of 24 574 interactions) that include metabolic-, signaling- and disease-related pathways, drug–action pathways and several large process maps collated from other pathway resources. IPAVS web interface allows biologists to browse and search pathway resources and provides tools for data import, management, visualization and analysis to support the interpretation of biological data in light of cellular processes. Systems Biology Graphical Notations (SBGN) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway notations are used for the visual display of pathway information. The integrated datasets in IPAVS are made available in several standard data formats that can be downloaded. IPAVS is available at: http://ipavs.cidms.org.
Recently, the Extensible Markup Language (XML) has received growing attention as a simple but flexible mechanism to represent medical data. As XML-based markups become more common there will be an increasing need to transform data stored in one XML markup into another markup. The Extensible Stylesheet Language (XSL) is a stylesheet language for XML. Development of a new mammography reporting system created a need to convert XML output from the MEDLee natural language processing system into a format suitable for cross-patient reporting. This paper examines the capability of XSL as a rule specification language that supports the medical XML data transformation. A set of nine relevant transformations was identified: Filtering, Substitution, Specification, Aggregation, Merging, Splitting, Transposition, Push-down and Pull-up. XSL-based methods for implementing these transformations are presented. The strengths and limitations of XSL are discussed in the context of XML medical data transformation.