CellML metadata serves two primary purposes: to describe what a model is and where it came from, and to annotate elements within the CellML document with relevant information. Metadata is also essential for distributing CellML models via any web-based repository. For a CellML document to be useful to the scientific community, it needs to meet certain metadata requirements. The minimum information requested in the annotation of biochemical models (MIRIAM, www.ebi.ac.uk/miriam) standard (Le Novère et al. 2005) defines minimum requirements for metadata annotation of biological models and is relevant to CellML models.
The current draft CellML metadata specification leverages several pre-existing metadata standards and creates some new CellML-specific definitions to provide a framework for defining metadata within a CellML document. These pre-existing standards include Dublin Core, vCard (www.imc.org/pdi/) and bibliographic query service (BQS, www.omg.org/cgi-bin/doc?dtc/2001-12-03).
Dublin Core is a set of metadata properties that were identified as common across a wide range of applications within library science and knowledge management, such as ‘creator’, ‘publisher’, ‘date’, ‘subject’, etc. vCard is an RDF definition of metadata about people and is used to annotate a CellML document with comprehensive information about the people who have been involved in all aspects of the model development. BQS provides a framework for defining bibliographic metadata. At the time the original CellML metadata specification was written, BQS was just a draft standard and no other standards existed for defining bibliographic metadata. BQS has since been superseded by frameworks such as the MIRIAM standard (Le Novère et al. 2005).
Modification history metadata can describe who made what changes to any resource contained in a CellML model and at what time, which is particularly important in determining the provenance of a model (see Nickerson & Buist (2009) for further discussion on the application of modification metadata). Currently, the addition of modification metadata is an optional step during the process of uploading a model to the repository, and the nature of this information is free form. Specific requirements for the modification metadata will be addressed in the near future by the new CellML 1.1 repository software (see §8).
In addition to modification history, CellML metadata can be used to define alternative names for CellML elements, to record the species and/or sex of the organism that the model specifically describes, and to attach free-form comments, annotations and descriptions to elements.
The most essential metadata defines where the model came from: if it is a description of a model from the literature, what is the citation? Who encoded the CellML, and at what time? This information can be described using vCard and Dublin Core qualifiers. This is the core compulsory set of metadata, which must be associated with a CellML model for it to be entered into the CellML Model Repository. It links each model to the citation from which it is derived, provides information on provenance, and must be added when a model is uploaded to the repository. A Uniform Resource Identifier (URI, www.w3.org/TR/uri-clarification) is then derived from the author names, the publication date of the citation and the version number of the model. This URI is then associated with the model and converted to a Uniform Resource Locator (URL) where the model is stored.
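The derivation of an identifier from author names, publication date and version number can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `derive_model_id` and `to_url`, the underscore-joined format and the `example.org` base address are all hypothetical and are not taken from the repository software.

```python
import re


def derive_model_id(surnames, year, version):
    """Build an illustrative identifier from author surnames, year and version.

    The format used here is an assumption for illustration only; it is not
    claimed to be the scheme used by the CellML Model Repository.
    """
    # Lower-case each surname and drop anything that is not an ASCII letter.
    cleaned = [re.sub(r"[^a-z]", "", name.lower()) for name in surnames]
    cleaned = [name for name in cleaned if name]
    if not cleaned:
        raise ValueError("at least one author surname is required")
    return "{}_{}_version{:02d}".format("_".join(cleaned), year, version)


def to_url(model_id, base="https://example.org/models/"):
    """Map the identifier to a storage URL under a hypothetical base address."""
    return base + model_id


model_id = derive_model_id(["Hodgkin", "Huxley"], 1952, 1)
print(model_id)
print(to_url(model_id))
```

Keeping the identifier separate from the storage URL mirrors the distinction drawn above: the identifier names the model, while the URL records where a copy of it is stored.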
The CellML metadata specification allows detailed revision histories to be associated with a model. As a model is curated, incremental changes to the model code are made. These changes may be trivial, such as correcting errors made during the translation of the model to CellML, or they may be more substantial, such as incorporating revisions to the model. For a model to be reused, it is essential that these changes are listed and fully documented so that a prospective user knows what has been changed and why, by whom and when; this information is especially salient when non-trivial changes to a model have been made. As model hierarchies defined using CellML 1.1 can be defined across multiple physical locations and components can be imported and reused from any of those locations, it is important to ensure that the modification annotations are associated with the appropriate resources in the model. It makes sense, for example, to have the history of parameter value changes associated with the corresponding variables in the submodel of the hierarchy in which the value is defined.
Other metadata that are relevant to curation of models in the CellML Model Repository include a brief description of the model, a comment on the curation status of the model and keywords about what the model describes. In the model publication 2.0 framework described in Nickerson & Buist (2009), a significantly more detailed description of the complete model is proposed, specified using not only the CellML metadata requirements described above for the CellML Model Repository but also the other metadata described in this paper including the CellML Simulation Metadata (CSM) standard, the CellML Graphing Metadata (CGM, www.cellml.org/specifications/metadata/graphs) standard and new proposals for model biological and biophysical annotation.
(a) Support of the MIRIAM standard in the CellML metadata specification and curation practices
The MIRIAM standard (Le Novère et al. 2005) is a community-agreed framework that defines the minimum requested information for annotation of biological models. The essential tenets of this framework are threefold: reference correspondence; attribution annotation; and external resource annotation.
The ‘reference correspondence’ requirements dictate that the model be encoded in a ‘public, standardized, machine-readable format’ and that the model be fully constrained and solvable to reproduce the results published in the reference description of the model. The validity of these assertions with respect to a particular model cannot currently be represented using the CellML metadata specification, but an effort is underway to use ‘curation flag’ qualifiers to achieve this. Assigning curation flags to a model would be an improvement on the current curation system, which relies on general text comments and assigning the model a number of ‘stars’, which can lead to uncertainty. A more descriptive set of ‘flags’ may be more informative, for example, ‘units/dimensions consistent’, ‘gives same results as referenced publication’, ‘meets the MIRIAM standard requirements’, etc.
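One way to picture the proposed curation flags is as a set of independent, named properties rather than a single star count. The sketch below is purely illustrative: the class `CurationFlag`, its member names and the helper `describe` are hypothetical, paraphrase the example flags quoted above, and do not correspond to any qualifier defined in the CellML metadata specification.

```python
from enum import Flag, auto


class CurationFlag(Flag):
    """Hypothetical curation flags paraphrasing the examples in the text."""
    NONE = 0
    UNITS_CONSISTENT = auto()
    REPRODUCES_PUBLICATION = auto()
    MEETS_MIRIAM = auto()


def describe(flags):
    """Return the names of the individual flags that are set, in a fixed order."""
    members = (
        CurationFlag.UNITS_CONSISTENT,
        CurationFlag.REPRODUCES_PUBLICATION,
        CurationFlag.MEETS_MIRIAM,
    )
    return [member.name for member in members if member in flags]


# A model whose units check out and which reproduces the published results,
# but which has not yet been assessed against MIRIAM.
status = CurationFlag.UNITS_CONSISTENT | CurationFlag.REPRODUCES_PUBLICATION
print(describe(status))
```

Unlike a star rating, each flag here answers one specific question, so a reader can tell which assertions about the model have been checked.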
To satisfy ‘attribution annotation’ under MIRIAM, the full provenance of both the original model and the CellML document that describes the model must be documented. This includes the citation of the reference description of the model, the name and contact information of the model authors and curators and the date the model was created and last modified. A version history is not specifically required by MIRIAM; however, the CellML project considers this information essential. This attribution annotation information can be specified using Dublin Core, vCard and BQS.
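A rough sketch of what such attribution metadata might look like as RDF/XML is given below, built with Python's standard `xml.etree.ElementTree` module. The namespace URIs are the published RDF, Dublin Core and vCard-RDF namespaces; the element layout is a simplified assumption rather than the exact structure prescribed by the CellML metadata specification, and the function `attribution_rdf`, the target `#example_model`, the name and the date are placeholders.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#"

# Register readable prefixes so the serialised XML uses rdf:, dc:, etc.
for prefix, uri in (("rdf", RDF), ("dc", DC), ("dcterms", DCTERMS), ("vCard", VCARD)):
    ET.register_namespace(prefix, uri)


def q(namespace, tag):
    """Return a namespace-qualified name in ElementTree's {uri}tag form."""
    return "{%s}%s" % (namespace, tag)


def attribution_rdf(about, family, given, created):
    """Build a small RDF block naming a creator and a creation date."""
    root = ET.Element(q(RDF, "RDF"))
    desc = ET.SubElement(root, q(RDF, "Description"), {q(RDF, "about"): about})
    creator = ET.SubElement(desc, q(DC, "creator"), {q(RDF, "parseType"): "Resource"})
    name = ET.SubElement(creator, q(VCARD, "N"), {q(RDF, "parseType"): "Resource"})
    ET.SubElement(name, q(VCARD, "Family")).text = family
    ET.SubElement(name, q(VCARD, "Given")).text = given
    ET.SubElement(desc, q(DCTERMS, "created")).text = created
    return root


# Placeholder values, not real attribution data.
root = attribution_rdf("#example_model", "Author", "Example", "2009-01-01")
print(ET.tostring(root, encoding="unicode"))
```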
‘External resource annotation’ mandates that every constituent of a model be unambiguously annotated to a piece of knowledge, such as a database entry or an ontological term. The MIRIAM resources initiative (Laibe & Le Novère 2007) provides a framework for this in the form of MIRIAM URIs. These provide ‘a way to uniquely describe entities with a perennial, stable identifier’, and to ‘link an entity to one or more online resources, where extra knowledge about it is available’. This information can be represented from within the CellML metadata specification using Dublin Core, although work is required to develop a best practice system.
MIRIAM URIs can also refer to citations, such as PubMed identifiers. The CellML project is currently considering the replacement of the BQS with MIRIAM URIs, as they are simpler and represent a much more widely accepted standard. By subscribing to such standards in CellML metadata, the potential for collaborative curation of biological models across the range of formats is greatly increased.
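To make the idea of a MIRIAM URI concrete, the sketch below splits an identifier of the form `urn:miriam:<collection>:<identifier>` into its two parts. That URN layout is assumed here from the MIRIAM Resources convention and should be checked against the current registry; the function name `parse_miriam_urn` is hypothetical and the PubMed number is a placeholder, not a real citation.

```python
def parse_miriam_urn(urn):
    """Split 'urn:miriam:<collection>:<identifier>' into (collection, identifier).

    The URN layout is an assumption for illustration; consult the MIRIAM
    registry for the authoritative form.
    """
    # Split on at most three colons so the identifier may itself contain colons.
    parts = urn.split(":", 3)
    if len(parts) != 4 or parts[0].lower() != "urn" or parts[1].lower() != "miriam":
        raise ValueError("not a MIRIAM URN: %r" % urn)
    collection, identifier = parts[2], parts[3]
    if not collection or not identifier:
        raise ValueError("missing collection or identifier: %r" % urn)
    return collection, identifier


# Placeholder identifier for a citation annotation.
print(parse_miriam_urn("urn:miriam:pubmed:12345678"))
```

Because the collection name and the identifier are separated, the same pattern covers both citations and references to database entries or ontology terms.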