The life of a workflow extends beyond its initial construction, execution and deposition in a repository. Reuse also involves discovering existing, relevant designs, editing the workflow to repurpose it by adding or removing services, trying out the workflow and then re-registering it as a new version (22). A workflow repository such as myExperiment and a workflow construction environment such as Taverna represent two components in the workflow life cycle. Existing workflows can be discovered through myExperiment and then downloaded and edited in their native workflow system. If repurposing a downloaded workflow requires the addition of other web services, their discovery can be aided by service directories such as BioCatalogue (18) and the EMBRACE registry (23). It is also possible to discover which services are used in each workflow to enable searching of workflow content. Current work to provide closer integration of myExperiment and BioCatalogue will increase this functionality, allowing users to find all workflows containing particular services or services with a particular function.
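Searching workflow content by service can be pictured as an inverted index from service names to the workflows that invoke them. The following is a minimal sketch only: the workflow titles, service names and function names are hypothetical and do not reflect the actual myExperiment or BioCatalogue data model or API.

```python
from collections import defaultdict

# Hypothetical records: workflow title -> names of the services it invokes.
# These are illustrative placeholders, not real repository entries.
WORKFLOWS = {
    "blast_and_align": ["blast_search", "multiple_alignment"],
    "align_and_tree": ["multiple_alignment", "tree_builder"],
    "fetch_sequences": ["sequence_fetch"],
}


def build_service_index(workflows):
    """Invert the mapping: service name -> set of workflows that use it."""
    index = defaultdict(set)
    for workflow, services in workflows.items():
        for service in services:
            index[service].add(workflow)
    return dict(index)


def workflows_using(index, service):
    """Return the workflows that contain the given service, sorted by title."""
    return sorted(index.get(service, set()))


index = build_service_index(WORKFLOWS)
print(workflows_using(index, "multiple_alignment"))
```

Searching by service function rather than by service name would need an additional mapping from each service to its functional annotation, which is the kind of information a service registry would be expected to hold.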
Once a workflow is updated, it can be deposited in myExperiment with a link back to the original, allowing the evolution of workflows to be traced. The update does not have to be performed by the uploader of the original workflow: other myExperiment members can contribute new versions, depending on the access permissions of the workflow they have reused.
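Tracing the evolution of a workflow through such back-links amounts to following a chain of "derived from" references until the original is reached. The sketch below illustrates the idea under assumed names: the `WorkflowVersion` record, its fields and the identifiers are hypothetical and are not taken from myExperiment.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class WorkflowVersion:
    """Hypothetical record for one deposited workflow version."""
    workflow_id: str
    uploader: str
    derived_from: Optional[str] = None  # link back to the workflow it reuses


def trace_lineage(versions: Dict[str, WorkflowVersion], workflow_id: str) -> List[str]:
    """Follow derived_from links from a version back to the original.

    Raises KeyError for an unknown identifier and ValueError if the
    links form a cycle.
    """
    lineage: List[str] = []
    seen = set()
    current: Optional[str] = workflow_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        lineage.append(current)
        current = versions[current].derived_from
    return lineage


# Illustrative data: each version may come from a different contributor.
REGISTRY = {
    "wf-1": WorkflowVersion("wf-1", "alice"),
    "wf-2": WorkflowVersion("wf-2", "bob", derived_from="wf-1"),
    "wf-3": WorkflowVersion("wf-3", "carol", derived_from="wf-2"),
}

print(trace_lineage(REGISTRY, "wf-3"))
```

Access permissions are deliberately left out of this sketch; a fuller model would check them before a new version is accepted.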
Together, a workflow repository and a construction tool provide two components for improving the reproducibility of the data-driven research, combining multiple software packages, that is now common in contemporary science (19). Such analyses are often repeated several times with modified parameters until the final results are produced. While these results are reported in scientific papers, the actual process of computation is often neglected, which makes replication of the computational analysis by an independent scientist difficult if not impossible. Mesirov (19) proposes the use of a reproducible research system (RRS) to enable reproducible science. An RRS consists of a reproducible research environment (RRE), which performs the computational analysis, and a reproducible research publisher (RRP), which is responsible for preparing a document describing the results of the computation.
The infrastructure provided by myExperiment and Taverna, together with the BioCatalogue registry of web services, can offer some of the functionality required for an RRS to replicate analyses of data. The analysis is described step by step as a workflow that can be constructed and enacted using Taverna, which also records the execution provenance in a separate repository (24). The published workflow can be deposited in myExperiment and the web services it uses are described in BioCatalogue. While a document preparation system to complete the proposal by Mesirov (19) is not yet offered directly, this type of component could be provided in the future, perhaps by making use of myExperiment packs for packaging workflows with provenance, input data and final results for redistribution with published papers.
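One way to picture such a package is as a bundle that is only redistributable once all four parts are present. The following sketch checks a bundle for completeness; the part names, the dictionary layout and the file names are assumptions made for illustration and do not describe the actual structure of myExperiment packs.

```python
# Hypothetical part names for a redistributable bundle; the real pack
# format is not specified here.
REQUIRED_PARTS = ("workflow", "provenance", "input_data", "results")


def missing_parts(pack):
    """Return the required parts that are absent or empty in the bundle."""
    return [part for part in REQUIRED_PARTS if not pack.get(part)]


# Illustrative bundle with placeholder file names.
draft_pack = {
    "workflow": "analysis_workflow.xml",
    "results": "final_results.csv",
}

print(missing_parts(draft_pack))
```

A check of this kind could run before a bundle is attached to a paper, so that a reader receives everything needed to re-run the analysis.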
myExperiment is a general repository for workflows and related research objects regardless of their format or native platform. Its focus is to enable sharing and reuse of digital experimental protocols and to support reproducible science. Some platform providers support workflow libraries restricted to their own systems. Some of these libraries are public, such as the Pipeline Pilot script and component libraries (http://accelrys.org/pipelinepilot/index.html). Others are restricted to projects, enterprises or platform licence holders, such as InforSense’s Community Hub (http://chub.inforsense.com/). The newly formed GenomeSpace project plans a repository in the style of myExperiment sometime in the future. Other popular workflow platforms such as Kepler (https://kepler-project.org/), Knime (http://www.knime.org) and the LONI pipeline (http://pipeline.loni.ucla.edu/) have community forums but no community repositories.
A different kind of workflow repository focuses on protocol design. The Workflow Patterns repository (http://www.workflowpatterns.com/) records abstract, generic workflow patterns. ProtocolDB (http://bioinformatics.eas.asu.edu/siteProtocolDB/projectProtocolDB.htm) supports ontology-guided workflow designs that are subsequently mapped onto real services, such as BioMOBY (26). Semantic descriptions offer greater scope for workflow comparison, but at the price of a much higher overhead for metadata capture. myExperiment plans to incorporate richer semantics through controlled vocabulary tagging and integration with BioCatalogue. However, the emphasis will remain on a mix of ontology-based tagging, free tagging and community-based reviews, comments and ratings that does not discourage contribution or participation.