In this article, we describe SED-ML, a language to encode procedures performed during computational simulation experiments, and its development process. The first version of SED-ML focuses on encoding uniform time-series experiments, since these are the most widely used types of numerical model analysis in systems biology. Such experiments generally require only a model, and no additional resources such as experimental data.
We expect to extend future versions of SED-ML to include references to experimental data, as the standards and availability of relevant data develop. This is an essential first step towards encoding more complicated experiments such as nested simulations, parameter sweeps, parameter estimation, and sensitivity analysis. The limited scope of SED-ML Level 1 Version 1 lays a firm foundation from which to proceed, and any issues arising from its implementation can be dealt with better at an early stage. Moreover, an early release of a subset of the anticipated future functionality, with widespread community support, fosters participation and uptake amongst the modeling communities targeted by SED-ML.
As SED-ML evolves to describe more complex simulation experiments, it will be increasingly useful to link models, simulation descriptions, and experimental data together in a machine-readable way. SED-ML describes the computational steps needed to reproduce particular results of a computational simulation, but it does not encode the simulation results themselves. The latter could be achieved, for instance, by the Numerical Markup Language (NuML, http://code.google.com/p/numl/). NuML was initially part of the Systems Biology Result Markup Language (SBRML, [11]), a format to link a model with simulated and experimental datasets. SBRML used a free text 'Software' element to define the software tool, version and algorithm used to generate results. In addition, SBRML will now make it possible to point to a SED-ML file from its 'Method' element. Both SBRML and SED-ML will use NuML to store lists of numbers, either results or datasets.
SED-ML is agnostic about the underlying model representation formats and the software tool that gave rise to the experiment. The model variables that a SED-ML description needs to be aware of are addressed directly by XPath. SED-ML can thus encode simulation experiments involving models in different formats. Currently, SED-ML is restricted to models encoded in XML-based formats. However, we envision that MIASE-compliant models may not always be XML-based, and SED-ML should endeavor to address those formats in the future. Whilst many applications are tied to a particular modeling language, the increasing provision of simulation tools as web services [12] would enable a computational workflow to execute such a SED-ML description. The goals of SED-ML closely align with those of the earlier RDF-based CellML Simulation and Graphing Metadata specifications [13], and in the interests of developing a common standard, development of those metadata specifications has been migrated to SED-ML.
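To illustrate how a model variable can be addressed by XPath independently of the modeling tool, the following minimal sketch resolves a target inside a small XML model using only the Python standard library. The model fragment, the `resolve_target` helper and the chosen species are hypothetical and serve only as an illustration. SED-ML targets are typically written as absolute XPath expressions; the sketch uses a path relative to the root element because Python's ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed SBML-like model used only for illustration.
MODEL_XML = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4">
  <model id="toy">
    <listOfSpecies>
      <species id="A" initialConcentration="1.0"/>
      <species id="B" initialConcentration="0.0"/>
    </listOfSpecies>
    <listOfParameters>
      <parameter id="k1" value="0.5"/>
    </listOfParameters>
  </model>
</sbml>"""

# Prefix-to-namespace mapping used when evaluating the XPath expression.
NS = {"sbml": "http://www.sbml.org/sbml/level2/version4"}


def resolve_target(root, xpath):
    """Return the single element addressed by xpath; fail if it is ambiguous or missing."""
    matches = root.findall(xpath, NS)
    if len(matches) != 1:
        raise LookupError(f"{xpath!r} matched {len(matches)} elements")
    return matches[0]


root = ET.fromstring(MODEL_XML)
# Path relative to the <sbml> root element (ElementTree limitation, see lead-in).
target = "./sbml:model/sbml:listOfSpecies/sbml:species[@id='A']"
species = resolve_target(root, target)
print(species.get("id"), species.get("initialConcentration"))  # prints: A 1.0
```

Because the lookup depends only on the XML structure and a namespace mapping, the same approach would apply to any XML-based model format, which is the sense in which the addressing is format-agnostic.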
While the contributors to the development of the language are primarily from the systems biology community, there is no reason why SED-ML could not be used in other domains that use computational simulation, such as environmental or agricultural modeling, neuroscience or pharmacometrics. Various communities working on biological model representations have already committed to the use and support of SED-ML, including SBML, CellML, and the Virtual Cell. Promotion of SED-ML in other realms of science and model representation communities (e.g., ISML, NeuroML, NineML, SimileXML ...) is an ongoing focus. Some of these communities have implemented software support for SED-ML in different tools, including SED-ML validators and a SED-ML visual editor. An up-to-date list is available at the SED-ML website.
The model changes specified in a SED-ML file result in implicit new models. These new models are only instantiated by the simulation environment interpreting the SED-ML file. This important feature of SED-ML allows the exploration of many different model structures to be stored in a compact way. Other methods have been proposed in the past, such as XML diff and patch [14]. The SED-ML approach makes it possible not only to change the parametrization of a model by changing the value of an XML attribute, but also to change the structure of the model by adding or removing XML constructs. If users then decide that the result of such changes is a new model, they may choose not to export a simulation description with that set of changes, but to store the modified version as a new model and use it as such in the simulation description. SED-ML is intended to be used by simulation software, as an export/import format. Therefore, the changes that are applicable to a model have to be specifiable within the software tool. As such, the software is responsible for only allowing valid model updates - and also for correctly translating them into SED-ML concepts. SED-ML itself does not restrict the changes that can be applied to the models mentioned in a SED-ML file.
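The idea of implicit new models can be sketched as follows: a change is applied to a copy of the source model when the description is interpreted, so the stored model is never modified. This is a minimal illustration in Python, not the behavior of any particular SED-ML tool. The toy model, its namespace `http://example.org/toy-model`, and the helper names `change_attribute` and `remove_element` are all hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical toy model; the namespace and element names are illustrative only.
BASE_MODEL = """<model xmlns="http://example.org/toy-model">
  <listOfParameters>
    <parameter id="k1" value="0.5"/>
    <parameter id="k2" value="2.0"/>
  </listOfParameters>
</model>"""

NS = {"m": "http://example.org/toy-model"}


def change_attribute(root, target, attribute, new_value):
    """Return a derived model in which one attribute value has been replaced."""
    derived = copy.deepcopy(root)  # the source model stays untouched
    matches = derived.findall(target, NS)
    if len(matches) != 1:
        raise LookupError(f"{target!r} matched {len(matches)} elements")
    matches[0].set(attribute, new_value)
    return derived


def remove_element(root, target):
    """Return a derived model in which all elements matching target are removed."""
    derived = copy.deepcopy(root)
    # ElementTree elements do not know their parent, so build a child -> parent map.
    parents = {child: parent for parent in derived.iter() for child in parent}
    for element in derived.findall(target, NS):
        parents[element].remove(element)
    return derived


base = ET.fromstring(BASE_MODEL)
k1 = "./m:listOfParameters/m:parameter[@id='k1']"
k2 = "./m:listOfParameters/m:parameter[@id='k2']"

faster = change_attribute(base, k1, "value", "5.0")  # parametrization change
reduced = remove_element(base, k2)                   # structural change

print(base.find(k1, NS).get("value"))    # source model value is unchanged
print(faster.find(k1, NS).get("value"))  # derived model carries the new value
```

Only the source model and the list of changes need to be stored; each derived model exists only while the description is being interpreted, which is what keeps the representation compact.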
A number of software libraries have already been made available in C++, Java and .NET. We briefly describe a few of them in the following paragraphs.
libSedML (http://libsedml.sf.net/) is a set of .NET libraries for supporting SED-ML. The core library libSedML supports reading, validating and writing of SED-ML descriptions, along with all necessary utility functions for resolving models and XPath expressions. Two additional libraries are included. The first, libSedML-Runner, schedules and executes simulation experiments encoded in SED-ML files using either RoadRunner (http://roadrunner.sf.net/, [15]) or a variety of simulators exposed through the Systems Biology Workbench (SBW, [16]), such as iBioSim [17] and COPASI [18]. The second, libSedMLScript, provides a script-based language for defining SED-ML experiments.
jlibsedml (http://sourceforge.net/projects/jlibsedml/) is a Java library for creating, manipulating, validating and working with SED-ML documents. It provides support for retrieval and pre-processing of models, by application of XPath expressions, and also post-processing of raw simulation results as specified by SED-ML dataGenerator elements. The jlibsedml application programming interface (API) follows a similar organization to that of libSBML [19], a successful and popular library for manipulation of SBML documents.
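The post-processing step mentioned above can be pictured as applying a mathematical expression pointwise to raw result columns. The sketch below is written in Python and is not the jlibsedml API: the function `apply_data_generator`, the result layout, and the use of a plain Python callable in place of the expression carried by a dataGenerator element are assumptions made for illustration.

```python
def apply_data_generator(raw_results, variables, expression):
    """Evaluate expression pointwise over the named raw result columns.

    raw_results: dict mapping a variable name to its list of simulated values
    variables:   names of the columns passed, in order, to expression
    expression:  callable standing in for the math of a data generator
    """
    columns = [raw_results[name] for name in variables]
    if len({len(column) for column in columns}) != 1:
        raise ValueError("all referenced columns must have the same length")
    return [expression(*values) for values in zip(*columns)]


# Hypothetical raw output of a uniform time-series simulation.
raw = {
    "time": [0.0, 1.0, 2.0],
    "A": [1.0, 0.5, 0.25],
    "B": [0.0, 0.5, 0.75],
}

# Derived quantity: total amount A + B at each output time point.
total = apply_data_generator(raw, ["A", "B"], lambda a, b: a + b)
print(list(zip(raw["time"], total)))
```

Keeping the expression separate from the raw results means the same simulation output can feed several derived curves without re-running the simulation.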
SProS (the SED-ML Processing Service) is an API described in interface definition language (IDL) for creating, reading and manipulating SED-ML documents, and so can be used by multiple software packages. The CellML API [20] provides an implementation of SProS. Future versions of SProS will also provide support for running simulations described in SED-ML and involving CellML models (using the simulation facilities already present in the CellML API).
We see an important role for SED-ML in the publication workflow, and in the enrichment it can bring to manuscripts containing mathematical models. Many journals currently require that models described in a manuscript be made available in electronic form, often in SBML, but software-specific formats are also accepted. Although reviewers would ideally test these models during the review process, this is often not done, perhaps due to time pressure or unfamiliarity with modeling software. As a consequence, many figures that show simulation results cannot be reproduced by the models linked to the manuscript, resulting in a labor-intensive curation step for model repositories, such as BioModels Database [21] and JWS Online model repository [22]. To aid in the reviewing process and prevent discrepancies between manuscript and model, JWS Online, to give one example, has set up a model reviewing workflow with a number of journals. The workflow consists of an initial check by the curators to reproduce simulations in a submitted manuscript. SED-ML will make this workflow significantly easier. Ideally, modelers would provide SED-ML scripts with their manuscript submission; the curator can then run these scripts directly, which greatly simplifies curation. If the SED-ML scripts are not provided upon model submission, they are generated by the curator and made available to the manuscript reviewer. The script loads the respective model and returns the model simulation. A SED-ML script can be linked to each simulation figure in the manuscript. This publication workflow is shown in Figure .
SED-ML Level 1 Version 1 provides a foundation for storing simulation experiment descriptions. It is designed to be easily extensible through the definition of further simulation (and analysis) types. The community is already discussing several such extensions, in particular to cover nested simulation experiments (needed in parameter scans) and steady state experiments. In addition to new simulation types, another important extension is the ability to consume experimental data and directly address previously performed simulation results. This will open the door to further analyses such as parameter fitting and optimization tasks. Eventually, this will make SED-ML the format of choice for a compact but comprehensive description of simulation experiments, allowing for the seamless exchange of model, experimental data and simulation results between software tools. We are also hopeful that SED-ML will be used by Taverna-based workflows such as those presented in [23].