The advent of Systems Biology has been accompanied by the blooming of pathway databases. Currently pathways are defined generically with respect to the organ or cell type where a reaction takes place. The cell type specificity of the reactions is the foundation of immunological research, and capturing this specificity is of paramount importance when using pathway-based analyses to decipher complex immunological datasets. Here, we present DC-ATLAS, a novel and versatile resource for the interpretation of high-throughput data generated by perturbing the signaling network of dendritic cells (DCs).
Pathways are annotated using a novel data model, the Biological Connection Markup Language (BCML), an SBGN-compliant data format developed to store the large amount of information collected. The application of DC-ATLAS to pathway-based analysis of the transcriptional program of DCs stimulated with agonists of the toll-like receptor family allows an integrated description of the flow of information from the cellular sensors to the functional outcome, capturing the temporal series of activation events by grouping sets of reactions that occur at different time points in well-defined functional modules.
The initiative significantly improves our understanding of DC biology and regulatory networks. Developing a systems biology approach for the immune system holds the promise of translating knowledge on the immune system into more successful immunotherapy strategies.
Motivation: LibSBGN is a software library for reading, writing and manipulating Systems Biology Graphical Notation (SBGN) maps stored using the recently developed SBGN-ML file format. The library (available in C++ and Java) makes it easy for developers to add SBGN support to their tools, whereas the file format facilitates the exchange of maps between compatible software applications. The library also supports validation of maps, which simplifies the task of ensuring compliance with the detailed SBGN specifications. With this effort we hope to increase the adoption of SBGN in bioinformatics tools, ultimately enabling more researchers to visualize biological knowledge in a precise and unambiguous manner.
Availability and implementation: Milestone 2 was released in December 2011. Source code, example files and binaries are freely available under the terms of either the LGPL v2.1+ or Apache v2.0 open source licenses from http://libsbgn.sourceforge.net.
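As a rough illustration of what an SBGN-ML map looks like to a program, the sketch below reads a small hand-written map with Python's standard XML parser rather than with LibSBGN itself. The namespace URI, the simplified glyph and arc attributes, the example map and the helper `summarize_map` are assumptions made for illustration; the library's own C++ and Java API is not shown here.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for Milestone 2 files; adjust if your files declare another.
SBGN_NS = {"sbgn": "http://sbgn.org/libsbgn/0.2"}

# Hypothetical, simplified map: A is consumed and B is produced by one process.
EXAMPLE_MAP = """<sbgn xmlns="http://sbgn.org/libsbgn/0.2">
  <map language="process description">
    <glyph id="g1" class="macromolecule">
      <label text="A"/>
      <bbox x="10" y="10" w="60" h="30"/>
    </glyph>
    <glyph id="p1" class="process">
      <bbox x="110" y="15" w="20" h="20"/>
    </glyph>
    <glyph id="g2" class="macromolecule">
      <label text="B"/>
      <bbox x="170" y="10" w="60" h="30"/>
    </glyph>
    <arc id="a1" class="consumption" source="g1" target="p1">
      <start x="70" y="25"/>
      <end x="110" y="25"/>
    </arc>
    <arc id="a2" class="production" source="p1" target="g2">
      <start x="130" y="25"/>
      <end x="170" y="25"/>
    </arc>
  </map>
</sbgn>"""


def summarize_map(xml_text):
    """Return the map language, a glyph id -> class dict, and a list of arcs."""
    root = ET.fromstring(xml_text)
    sbgn_map = root.find("sbgn:map", SBGN_NS)
    # Only top-level glyphs are collected; nested glyphs are ignored in this sketch.
    glyphs = {g.get("id"): g.get("class")
              for g in sbgn_map.findall("sbgn:glyph", SBGN_NS)}
    arcs = [(a.get("class"), a.get("source"), a.get("target"))
            for a in sbgn_map.findall("sbgn:arc", SBGN_NS)]
    return sbgn_map.get("language"), glyphs, arcs
```

Calling `summarize_map(EXAMPLE_MAP)` returns the language string together with the glyph and arc listings, which is roughly the information a tool needs before it can draw or validate a map.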
Motivation: The recently proposed Systems Biology Graphical Notation (SBGN) provides a standard for the visual representation of biochemical and cellular processes. It aims to support more efficient and accurate communication of biological knowledge between different research communities in the life sciences. However, to increase the use of SBGN, tools for editing, validating and translating SBGN maps are desirable.
Results: We present SBGN-ED, a tool which allows the creation of all three types of SBGN maps from scratch or the editing of existing maps, the validation of these maps for syntactic and semantic correctness, the translation of networks from the KEGG and MetaCrop databases into SBGN and the export of SBGN maps into several file and image formats.
Availability: SBGN-ED is freely available from http://vanted.ipk-gatersleben.de/addons/sbgn-ed. The web site also contains tutorials and example files.
A standard graphical notation is essential to facilitate exchange of network representations of biological processes. Towards this end, the Systems Biology Graphical Notation (SBGN) has been proposed, and it is already supported by a number of tools. However, support for SBGN in Cytoscape, one of the most widely used platforms in biology to visualise and analyse networks, is limited, and in particular it is not possible to import SBGN diagrams.
We have developed CySBGN, a Cytoscape plug-in that extends the use of Cytoscape visualisation and analysis features to SBGN maps. CySBGN adds support for Cytoscape users to visualize any of the three complementary SBGN languages: Process Description, Entity Relationship, and Activity Flow. The interoperability with other tools (CySBML plug-in and Systems Biology Format Converter) was also established allowing an automated generation of SBGN diagrams based on previously imported SBML models. The plug-in was tested using a suite of 53 different test cases that covers almost all possible entities, shapes, and connections. A rendering comparison with other tools that support SBGN was performed. To illustrate the interoperability with other Cytoscape functionalities, we present two analysis examples, shortest path calculation, and motif identification in a metabolic network.
CySBGN imports, modifies and analyzes SBGN diagrams in Cytoscape, and thus allows the application of the large palette of tools and plug-ins in this platform to networks and pathways in SBGN format.
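One of the two analysis examples mentioned above, shortest path calculation, can be sketched independently of Cytoscape. The following is plain breadth-first search on an unweighted directed graph; the toy network, its node names and the function `shortest_path` are hypothetical, and this is not the implementation used by CySBGN or Cytoscape.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Return one shortest path from source to target as a list, or None."""
    previous = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical metabolic fragment: metabolites and reaction nodes alternate.
TOY_NETWORK = {
    "glucose": ["r1"],
    "r1": ["g6p"],
    "g6p": ["r2", "r3"],
    "r2": ["f6p"],
    "r3": ["6pg"],
    "f6p": [],
}
```

Because every edge counts as one step, the first time the target is dequeued the recovered path is guaranteed to be among the shortest; weighted networks would need a different algorithm such as Dijkstra's.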
Motivation: BioPAX is a standard language for representing and exchanging models of biological processes at the molecular and cellular levels. It is widely used by different pathway databases and genomics data analysis software. Currently, the primary source of BioPAX data is direct exports from the curated pathway databases. It is still uncommon for wet-lab biologists to share and exchange pathway knowledge using BioPAX. Instead, pathways are usually represented as informal diagrams in the literature. In order to encourage formal representation of pathways, we describe a software package that allows users to create pathway diagrams using CellDesigner, a user-friendly graphical pathway-editing tool, and save the pathway data in BioPAX Level 3 format.
Availability: The plug-in is freely available and can be downloaded at ftp://ftp.pantherdb.org/CellDesigner/plugins/BioPAX/
Supplementary Information: Supplementary data are available at Bioinformatics online.
Summary: Payao is a community-based, collaborative web service platform for gene-regulatory and biochemical pathway model curation. The system combines Web 2.0 technologies and online model visualization functions to enable a collaborative community to annotate and curate biological models. Payao reads the models in Systems Biology Markup Language format, displays them with CellDesigner, a process diagram editor, which complies with the Systems Biology Graphical Notation, and provides an interface for model enrichment (adding tags and comments to the models) for the access-controlled community members.
Availability and implementation: Freely available for model curation service at http://www.payaologue.org. Web site implemented in Seasar Framework 2.0 with S2Flex2, MySQL 5.0 and Tomcat 5.5, with all major browsers supported.
Motivation: Network diagrams are commonly used to visualize biochemical pathways by displaying the relationships between genes, proteins, mRNAs, microRNAs, metabolites, regulatory DNA elements, diseases, viruses and drugs. While there are several currently available web-based pathway viewers, there is still room for improvement. To this end, we have developed a flash-based network viewer (FNV) for the visualization of small to moderately sized biological networks and pathways.
Summary: Written in Adobe ActionScript 3.0, the viewer accepts simple Extensible Markup Language (XML) formatted input files to display pathways in vector graphics on any web-page providing flexible layout options, interactivity with the user through tool tips, hyperlinks and the ability to rearrange nodes on the screen. FNV was utilized as a component in several web-based systems, namely Genes2Networks, Lists2Networks, KEA, ChEA and PathwayGenerator. In addition, FNV can be used to embed pathways inside PDF files for the communication of pathways in soft publication materials.
Availability: FNV is available for use and download along with the supporting documentation and sample networks at http://www.maayanlab.net/FNV.
The mammalian target of rapamycin (mTOR) is a central regulator of cell growth and proliferation. mTOR signaling is frequently dysregulated in oncogenic cells, and thus an attractive target for anticancer therapy. Using CellDesigner, a modeling support software for graphical notation, we present herein a comprehensive map of the mTOR signaling network, which includes 964 species connected by 777 reactions. The map complies with both the systems biology markup language (SBML) and graphical notation (SBGN) for computational analysis and graphical representation, respectively. As captured in the mTOR map, we review and discuss our current understanding of the mTOR signaling network and highlight the impact of mTOR feedback and crosstalk regulations on drug-based cancer therapy. This map is available on the Payao platform, a Web 2.0 based community-wide interactive process for creating more accurate and information-rich databases. Thus, this comprehensive map of the mTOR network will serve as a tool to facilitate systems-level study of up-to-date mTOR network components and signaling events toward the discovery of novel regulatory processes and therapeutic strategies for cancer.
cancer; CellDesigner; graphical notation; mTOR; regulatory network
LibSBML is an application programming interface library for reading, writing, manipulating and validating content expressed in the Systems Biology Markup Language (SBML) format. It is written in ISO C and C++, provides language bindings for Common Lisp, Java, Python, Perl, MATLAB and Octave, and includes many features that facilitate adoption and use of both SBML and the library. Developers can embed libSBML in their applications, saving themselves the work of implementing their own SBML parsing, manipulation, and validation software.
LibSBML 3 was released in August 2007. Source code, binaries and documentation are freely available under LGPL open-source terms from http://sbml.org/software/libsbml.
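To give a flavour of the kind of consistency checking that a validating library spares developers from writing, the sketch below flags reactions that refer to undeclared species. It uses Python's standard XML parser, not libSBML; the Level 2 Version 3 namespace is assumed, and the toy model and the function `undefined_species` are hypothetical. Real SBML validation covers far more rules than this single check.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for SBML Level 2 Version 3 documents.
SBML_NS = {"sbml": "http://www.sbml.org/sbml/level2/version3"}

# Hypothetical model: reaction R2 produces S3, which is never declared.
EXAMPLE_MODEL = """<sbml xmlns="http://www.sbml.org/sbml/level2/version3" level="2" version="3">
  <model id="toy">
    <listOfCompartments>
      <compartment id="cell"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="S1" compartment="cell"/>
      <species id="S2" compartment="cell"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="R1">
        <listOfReactants><speciesReference species="S1"/></listOfReactants>
        <listOfProducts><speciesReference species="S2"/></listOfProducts>
      </reaction>
      <reaction id="R2">
        <listOfReactants><speciesReference species="S2"/></listOfReactants>
        <listOfProducts><speciesReference species="S3"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def undefined_species(xml_text):
    """Return (reaction id, species id) pairs that point at undeclared species."""
    root = ET.fromstring(xml_text)
    declared = {s.get("id") for s in root.iterfind(".//sbml:species", SBML_NS)}
    problems = []
    for reaction in root.iterfind(".//sbml:reaction", SBML_NS):
        for ref in reaction.iterfind(".//sbml:speciesReference", SBML_NS):
            if ref.get("species") not in declared:
                problems.append((reaction.get("id"), ref.get("species")))
    return problems
```

For the toy model above, the only reported problem is the reference from `R2` to `S3`.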
Thousands of biochemical interactions are available for download from curated databases such as Reactome, Pathway Interaction Database and other sources in the Biological Pathways Exchange (BioPAX) format. However, the BioPAX ontology does not encode the necessary information for kinetic modeling and simulation. The current standard for kinetic modeling is the Systems Biology Markup Language (SBML), but only a small number of models are available in SBML format in public repositories. Additionally, reusing and merging SBML models presents a significant challenge, because often each element has a value only in the context of the given model, and information encoding biological meaning is absent. We describe a software system that enables a variety of operations facilitating the use of BioPAX data to create kinetic models that can be visualized, edited, and simulated using the Virtual Cell (VCell), including improved conversion to SBML (for use with other simulation tools that support this format).
Recently, the Extensible Markup Language (XML) has received growing attention as a simple but flexible mechanism to represent medical data. As XML-based markups become more common there will be an increasing need to transform data stored in one XML markup into another markup. The Extensible Stylesheet Language (XSL) is a stylesheet language for XML. Development of a new mammography reporting system created a need to convert XML output from the MEDLee natural language processing system into a format suitable for cross-patient reporting. This paper examines the capability of XSL as a rule specification language that supports the medical XML data transformation. A set of nine relevant transformations was identified: Filtering, Substitution, Specification, Aggregation, Merging, Splitting, Transposition, Push-down and Pull-up. XSL-based methods for implementing these transformations are presented. The strengths and limitations of XSL are discussed in the context of XML medical data transformation.
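Two of the nine transformations listed above, Filtering and Aggregation, can be sketched as follows. The sketch is written in Python as a stand-in for XSL, so it shows the intent of the transformations rather than the stylesheet rules discussed in the paper. The report structure, element names and attribute names are hypothetical and do not reflect the actual MEDLee output format.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical report; not the real MEDLee markup.
REPORT = """<report>
  <finding type="mass" certainty="high"/>
  <finding type="calcification" certainty="low"/>
  <finding type="mass" certainty="low"/>
</report>"""


def filter_findings(report, certainty):
    """Filtering: copy only the findings that have the given certainty."""
    result = ET.Element("report")
    for finding in report.findall("finding"):
        if finding.get("certainty") == certainty:
            ET.SubElement(result, "finding", dict(finding.attrib))
    return result


def aggregate_by_type(report):
    """Aggregation: replace findings with one <summary> per type and a count."""
    counts = Counter(f.get("type") for f in report.findall("finding"))
    result = ET.Element("report")
    for finding_type, count in sorted(counts.items()):
        ET.SubElement(result, "summary",
                      {"type": finding_type, "count": str(count)})
    return result
```

Both functions build a new tree and leave the input untouched, which mirrors how a stylesheet produces a separate output document from its source.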
Multidisciplinary integrated research requires the ability to couple the
diverse sets of data obtained from a range of complex experiments and
computer simulations. Integrating data requires semantically rich
information. In this paper an end-to-end use of semantically rich data in
computational chemistry is demonstrated utilizing the Chemical Markup
Language (CML) framework. Semantically rich data is generated by the NWChem
computational chemistry software with the FoX library and utilized by the
Avogadro molecular editor for analysis and visualization.
The NWChem computational chemistry software has been modified and coupled to
the FoX library to write CML compliant XML data files. The FoX library was
expanded to represent the lexical input files and molecular orbitals used by
the computational chemistry software. Draft dictionary entries and a format
for molecular orbitals within CML CompChem were developed. The Avogadro
application was extended to read in CML data, and display molecular geometry
and electronic structure in the GUI allowing for an end-to-end solution
where Avogadro can create input structures, generate input files, NWChem can
run the calculation and Avogadro can then read in and analyse the CML output
produced. The developments outlined in this paper will be made available in
future releases of NWChem, FoX, and Avogadro.
The production of CML compliant XML files for computational chemistry
software such as NWChem can be accomplished relatively easily using the FoX
library. The CML data can be read in by a newly developed reader in Avogadro
and analysed or visualized in various ways. A community-based effort is
needed to further develop the CML CompChem convention and dictionary. This
will enable the long-term goal of allowing a researcher to run simple
“Google-style” searches of chemistry and physics and have the
results of computational calculations returned in a comprehensible form
alongside articles from the published literature.
Chemical Markup Language; FoX; NWChem; Avogadro; Computational chemistry
Summary: The XML-based Systems Biology Markup Language (SBML) has emerged as a standard for storage, communication and interchange of models in systems biology. As a machine-readable format XML is difficult for humans to read and understand. Many tools are available that visualize the reaction pathways stored in SBML files, but many components, e.g. unit declarations, complex kinetic equations or links to MIRIAM resources, are often not made visible in these diagrams. For a broader understanding of the models, support in scientific writing and error detection, a human-readable report of the complete model is needed. We present SBML2LaTEX, a Java-based stand-alone program to fill this gap. A convenient web service allows users to directly convert SBML to various formats, including DVI, LaTEX and PDF, and provides many settings for customization.
Availability: Source code, documentation and a web service are freely available at http://www.ra.cs.uni-tuebingen.de/software/SBML2LaTeX.
Supplementary information: Supplementary data are available at Bioinformatics online.
Motivation: The rapid accumulation of knowledge in the field of Systems Biology during the past years requires advanced, but simple-to-use, methods for the visualization of information in a structured and easily comprehensible manner.
Availability: The biographer tool can be used at and downloaded from the web page http://biographer.biologie.hu-berlin.de/. The different software packages, including a server-independent version as well as a web server for Windows and Linux based systems, are available at http://code.google.com/p/biographer/ under the open-source license LGPL.
To develop a dedicated markup language for clinical contents models (CCM) to facilitate the active use of CCM in electronic health record systems.
Based on an analysis of the structure and characteristics of CCM in the clinical domain, we manually designed an extensible markup language (XML)-based CCM markup language (CCML) schema.
CCML faithfully reflects CCM in both the syntactic and semantic aspects. As this language is based on XML, it can be expressed and processed in computer systems and can be used in a technology-neutral way.
CCML has the following strengths: it is machine-readable and highly human-readable, it does not require a dedicated parser, and it can be applied to existing electronic health record systems.
Clinical Information System; XML; Semantics
The aim of this study is to develop a secure, Google-based data-mining tool for radiology reports using free and open source technologies and to explore its use within an academic radiology department. A Health Insurance Portability and Accountability Act (HIPAA)-compliant data repository, search engine and user interface were created to facilitate treatment, operations, and reviews preparatory to research. The Institutional Review Board waived review of the project, and informed consent was not required. Comprising 7.9 GB of disk space, 2.9 million text reports were downloaded from our radiology information system to a fileserver. Extensible markup language (XML) representations of the reports were indexed using Google Desktop Enterprise search engine software. A hypertext markup language (HTML) form allowed users to submit queries to Google Desktop, and Google’s XML response was interpreted by a practical extraction and report language (PERL) script, presenting ranked results in a web browser window. The query, reason for search, results, and documents visited were logged to maintain HIPAA compliance. Indexing averaged approximately 25,000 reports per hour. Keyword search of a common term like “pneumothorax” yielded the first ten most relevant results of 705,550 total results in 1.36 s. Keyword search of a rare term like “hemangioendothelioma” yielded the first ten most relevant results of 167 total results in 0.23 s; retrieval of all 167 results took 0.26 s. Data mining tools for radiology reports will improve the productivity of academic radiologists in clinical, educational, research, and administrative tasks. By leveraging existing knowledge of Google’s interface, radiologists can quickly perform useful searches.
Google; data mining; reports; HIPAA; search engine
Does PubMed Central—a government-run digital archive of biomedical articles—compete with scientific society journals? A longitudinal, retrospective cohort analysis of 13,223 articles (5999 treatment, 7224 control) published in 14 society-run biomedical research journals in nutrition, experimental biology, physiology, and radiology between February 2008 and January 2011 reveals a 21.4% reduction in full-text hypertext markup language (HTML) article downloads and a 13.8% reduction in portable document format (PDF) article downloads from the journals' websites when U.S. National Institutes of Health-sponsored articles (treatment) become freely available from the PubMed Central repository. In addition, the effect of PubMed Central on reducing PDF article downloads is increasing over time, growing at a rate of 1.6% per year. There was no longitudinal effect for full-text HTML downloads. While PubMed Central may be providing complementary access to readers traditionally underserved by scientific journals, the loss of article readership from the journal website may weaken the ability of the journal to build communities of interest around research papers, impede the communication of news and events to scientific society members and journal readers, and reduce the perceived value of the journal to institutional subscribers.—Davis, P. M. Public accessibility of biomedical articles from PubMed Central reduces journal readership—retrospective cohort analysis.
digital repositories; downloads; open access; scientific publishing
The ImmunoDeficiency Resource (IDR), freely available at http://www.uta.fi/imt/bioinfo/idr/, is a comprehensive knowledge base on immunodeficiencies. It is designed for different user groups such as researchers, physicians and nurses as well as patients and their families and the general public. Information on immunodeficiencies is stored as fact files, which are disease- and gene-based information resources. We have developed an inherited disease markup language (IDML) data model, which is designed for storing disease- and gene-specific data in extensible markup language (XML) format. The fact files written by the IDML can be used to present data in different contexts and platforms. All the information in the IDR is validated by expert curators.
Motivation: The biological pathway exchange language (BioPAX) and the systems biology markup language (SBML) belong to the most popular modeling and data exchange languages in systems biology. The focus of SBML is quantitative modeling and dynamic simulation of models, whereas the BioPAX specification concentrates mainly on visualization and qualitative analysis of pathway maps. BioPAX describes reactions and relations. In contrast, SBML core exclusively describes quantitative processes such as reactions. With the SBML qualitative models extension (qual), it has recently also become possible to describe relations in SBML. Before the development of SBML qual, relations could not be properly translated into SBML. Until now, there exists no BioPAX to SBML converter that is fully capable of translating both reactions and relations.
Results: The entire Nature Pathway Interaction Database has been converted from BioPAX (Level 2 and Level 3) into SBML (Level 3 Version 1) including both reactions and relations by using the new qual extension package. Additionally, we present the new webtool BioPAX2SBML for further BioPAX to SBML conversions. Compared with previous conversion tools, BioPAX2SBML is more comprehensive, more robust and more exact.
Availability: BioPAX2SBML is freely available at http://webservices.cs.uni-tuebingen.de/ and the complete collection of the PID models is available at http://www.cogsys.cs.uni-tuebingen.de/downloads/Qualitative-Models/.
Supplementary data are available at Bioinformatics online.
Image Markup Language is an extensible markup language (XML) schema used to describe both image metadata and annotations. It describes both data pertaining to an entire image, and data that are tied to specific regions or features of the image. Developed for a specific domain in Medical Education, this paper describes extensions to take advantage of the Dublin Core metadata standard, and of an XML schema for vector graphics representation. We have developed a prototype system of open source tools implementing an authoring system, a client system, and an image annotation database which can be queried through the Web.
There is general agreement amongst biologists about the need for good pathway diagrams and a need to formalize the way biological pathways are depicted. However, implementing and agreeing how best to do this is currently the subject of some debate.
The modified Edinburgh Pathway Notation (mEPN) scheme is founded on a notation system originally devised a number of years ago and through use has now been refined extensively. This process has been primarily driven by the author's attempts to produce process diagrams for a diverse range of biological pathways, particularly with respect to immune signaling in mammals. Here we provide a specification of the mEPN notation, its symbols, rules for its use and a comparison to the proposed Systems Biology Graphical Notation (SBGN) scheme.
We hope this work will contribute to the on-going community effort to develop a standard for depicting pathways and will provide a coherent guide to those planning to construct pathway diagrams of their biological systems of interest.
Motivation: BioJava is an open-source project for processing of biological data in the Java programming language. We have recently released a new version (3.0.5), which is a major update to the code base that greatly extends its functionality.
Results: BioJava now consists of several independent modules that provide state-of-the-art tools for protein structure comparison, pairwise and multiple sequence alignments, working with DNA and protein sequences, analysis of amino acid properties, detection of protein modifications and prediction of disordered regions in proteins as well as parsers for common file formats using a biologically meaningful data model.
Availability: BioJava is an open-source project distributed under the Lesser GPL (LGPL). BioJava can be downloaded from the BioJava website (http://www.biojava.org). BioJava requires Java 1.6 or higher. All inquiries should be directed to the BioJava mailing lists. Details are available at http://biojava.org/wiki/BioJava:MailingLists
The Gene Expression Omnibus (GEO) repository at the National Center for Biotechnology Information (NCBI) archives and freely disseminates microarray and other forms of high-throughput data generated by the scientific community. The database has a minimum information about a microarray experiment (MIAME)-compliant infrastructure that captures fully annotated raw and processed data. Several data deposit options and formats are supported, including web forms, spreadsheets, XML and Simple Omnibus Format in Text (SOFT). In addition to data storage, a collection of user-friendly web-based interfaces and applications are available to help users effectively explore, visualize and download the thousands of experiments and tens of millions of gene expression patterns stored in GEO. This paper provides a summary of the GEO database structure and user facilities, and describes recent enhancements to database design, performance, submission format options, data query and retrieval utilities. GEO is accessible at
The Olfactory Receptor Database (ORDB; http://senselab.med.yale.edu/senselab/ordb) is a central repository of olfactory receptor (OR) and olfactory receptor-like gene and protein sequences. To deal with the very large OR gene family, we have constructed an algorithm that automatically downloads sequences from web sources such as GenBank and SWISS-PROT into the database. The algorithm uses hypertext markup language (HTML) parsing techniques that extract information relevant to ORDB. The information is then correlated with the metadata in the ORDB knowledge base to encode the unstructured text extracted into the structured format compliant with the database architecture, entity attribute value with classes and relationship (EAV/CR), which supports the SenseLab project as a whole. Three population methods: batch, automatic and semi-automatic population are discussed. The data is imported into the database using extensible markup language (XML).
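The general idea of parsing HTML and encoding the extracted text as entity-attribute-value rows can be sketched as below. This is a minimal illustration using Python's standard `html.parser`, not the ORDB algorithm: the page layout (a definition list), the field names, the entity identifier and the helpers `FieldExtractor` and `to_eav` are all hypothetical, and the correlation with ORDB metadata and the XML import step are omitted.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collect <dt>label</dt><dd>value</dd> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._tag = None    # "dt" or "dd" while inside one of those elements
        self._label = None  # most recent label still waiting for its value

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag in ("dt", "dd"):
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._tag == "dt":
            self._label = text
        elif self._tag == "dd" and self._label is not None:
            self.pairs.append((self._label, text))
            self._label = None


def to_eav(entity_id, html_text):
    """Turn the extracted label/value pairs into (entity, attribute, value) rows."""
    parser = FieldExtractor()
    parser.feed(html_text)
    parser.close()
    return [(entity_id, attribute, value) for attribute, value in parser.pairs]


# Hypothetical sequence record page.
PAGE = """<html><body><dl>
<dt>Organism</dt><dd>Mus musculus</dd>
<dt>Length</dt><dd>312</dd>
</dl></body></html>"""
```

Storing each field as its own row means that a new kind of attribute found on a source page needs no change to the table layout, which is the usual motivation for an entity-attribute-value design.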
Many three-dimensional (3D) images are routinely collected in biomedical research and a number of digital atlases with associated anatomical and other information have been published. A number of tools are available for viewing this data ranging from commercial visualization packages to freely available, typically system architecture dependent, solutions. Here we discuss an atlas viewer implemented to run on any workstation using the architecture neutral Java programming language.
We report the development of a freely available Java based viewer for 3D image data, describe the structure and functionality of the viewer and how automated tools can be developed to manage the Java Native Interface code. The viewer allows arbitrary re-sectioning of the data and interactive browsing through the volume. With appropriately formatted data, for example as provided for the Electronic Atlas of the Developing Human Brain, a 3D surface view and anatomical browsing is available. The interface is developed in Java with Java3D providing the 3D rendering. For efficiency the image data is manipulated using the Woolz image-processing library provided as a dynamically linked module for each machine architecture.
We conclude that Java provides an appropriate environment for efficient development of these tools, and that techniques exist to allow computationally efficient image-processing libraries to be integrated relatively easily.