Combining Ontologies and Workflows
Our starting point is the EXACT ontology. We advocate its intrinsic value for describing protocols in a precise and unambiguous way, and it seems to be a better choice than OBI. OBI describes the most typical entities of biomedical investigations undertaken by humans, which are not directly relevant to protocols. EXACT is designed specifically to define experimental actions that can be performed by both scientists and machines, and it is therefore more suitable for automation. By relying on ontologies, laboratory domain knowledge can be effectively shared across the scientific community, including scientists, computers and robots. The value of clearly describing protocols is demonstrated by the ability to exchange and compare them [39]. Ontologies provide the human- and machine-understandable universal language for such a shared understanding. Using the EXACT vocabulary, protocols become unambiguous and the key elements of actions are precisely identified. For instance, in an EXACT Move action, "what is moved?", "from where?" and "to where?" can be precisely defined.
On the other hand, defining protocols only by means of formal ontologies has important limitations. Fully formalized protocols usually span many pages of text, and producing such descriptions by hand is labour-intensive, error-prone and uninspiring [39]. The modular structure of laboratory processes is not well identified, which does not facilitate the reuse of well-defined "building blocks". Furthermore, in our case the command actions currently defined by EXACT are minimal and do not allow loops or other complex constructs.
It would be very problematic, and almost impossible, for human users to read and fully understand protocols defined only in formal languages. A synthetic natural-language description of each activity should remain associated with its formal description. This would make each step understandable at first sight, even to non-ontology experts. In general, human scientists would benefit from a tool that gives a visual overview of the whole protocol. This would allow easy identification of the constituent blocks, the flow of execution and the executors of each activity. In addition, the tool would permit the retrieval of detailed information for each single block (e.g. the parameters of an action or the structure of a subprotocol).
Therefore we developed a domain-specific language (DSL) [54], together with software supporting it, that allows laboratory protocols to be expressed more clearly than pre-existing languages allow. While keeping the advantages of ontologies, we adopt a more expressive formalism able to describe different aspects of laboratory protocols (e.g. execution flow and error handling). Among the existing models we opted for workflows, which combine higher expressivity at the flow-control level with higher comprehensibility for human beings. The combination of ontologies and workflows gave us the possibility of defining laboratory protocols in terms of workflows enriched with ontological knowledge.
Among the various standards available in the workflow community, we adopted XPDL, the standard defined to facilitate interoperability between business processes and to support serialization of the graphical BPMN notation.
We chose the EXACT ontology because it defines precise semantics for laboratory activities, while workflows are mainly designed to orchestrate them. We found no evidence in the literature of methods specifically developed for building workflows based on ontologies. Our idea consists of taking elements of the EXACT Action class as the building blocks (i.e. activities) of workflows, and using the EXACT Equipment class as parameters of the actions. Actions are executed using XPDL Applications (ranging from a text editor to custom-built applications) to execute the whole protocol.
To integrate these two formalisms we applied the principles of model-driven engineering (MDE) [55]. In MDE there are three kinds of models: model, metamodel and metametamodel. "A metametamodel (also called M3) is a model that is its own reference model (i.e. it conforms to itself). A metamodel (M2) is a model such that its reference model is a metametamodel. A terminal model (M1) is a model such that its reference model is a metamodel" [56]. The real-world manifestation of a model is also called M0. Our idea was to relate the two metamodels, XPDL and EXACT, to establish semantic correspondences between their respective elements.
XPDL can be considered our first metamodel; it is defined in XSD, which is its metametamodel (Figure ). Therefore the XPDL language constructs, including Application and Activity, belong to the M2 level. In M1 we place Application definitions and invocations and Activity instances. An Activity represents an action which will be performed by a combination of resources and/or computer applications. One of the ways to implement an XPDL Activity is by using an XPDL Tool, defined as a set of Applications. The latter are descriptions of the programming language interfaces which may be invoked to support the Activity. The definition of an Application reflects the interface that should be used to call the specific services that execute the Activity, including any parameters to be passed [57].
Mapping between the two metamodels: EXACT Action and XPDL Application.
Our second metamodel is EXACT, which has OWL as its metametamodel (Figure ). In EXACT, the Action class contains concepts describing what an action can do on the basis of its goal. These concepts represent the effect of an action but give no information about how to obtain this effect. Therefore Action subclasses are abstractions of actions and do not represent real-world actions.
Following this interpretation, the Action individuals belong to the M1 level. Since EXACT and XPDL are both placed at the M2 level, we applied a model-to-model transformation from the EXACT metamodel to the XPDL metamodel that permits their integration.
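The placement of models on the MDE levels can be illustrated with a small sketch. It is only an illustration under stated assumptions: the `CONFORMS_TO` table and the `mde_level` helper are hypothetical names, `paternity_workflow` stands for any protocol written in XPDL, and XSD and OWL are treated as self-conforming because the text uses them as metametamodels.

```python
# Each model names its reference model (the model it conforms to).
# The entries follow the text; the identifiers themselves are illustrative.
CONFORMS_TO = {
    "XSD": "XSD",                         # metametamodel: conforms to itself
    "OWL": "OWL",                         # metametamodel: conforms to itself
    "XPDL": "XSD",                        # metamodel defined in XSD
    "EXACT": "OWL",                       # metamodel defined in OWL
    "paternity_workflow": "XPDL",         # hypothetical terminal model
    "add_reagent_to_container": "EXACT",  # an Action individual
}


def mde_level(model: str) -> str:
    """Return M3, M2 or M1 by applying the quoted definitions literally."""
    reference = CONFORMS_TO[model]
    if reference == model:
        return "M3"  # a model that is its own reference model
    reference_level = mde_level(reference)
    if reference_level == "M3":
        return "M2"  # reference model is a metametamodel
    if reference_level == "M2":
        return "M1"  # reference model is a metamodel
    raise ValueError(f"{model} does not fit the three-level scheme")


for name in CONFORMS_TO:
    print(name, mde_level(name))
```

Because EXACT and XPDL both resolve to M2 in this scheme, a correspondence between them is a metamodel-to-metamodel mapping, and the individuals on each side stay at M1.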
The concept of a function in programming languages can help to understand the relationship between EXACT and XPDL. The function definition and invocation belong to the M1 level, whereas the grammar rules applied for writing the function are at the M2 level. In our case, instances of the Action class can be thought of as functions, and the Action class as a meta-function, since it describes what a function is and does; for the same reason an Application is a meta-function and its instances are functions. Establishing a relationship between the EXACT Action class and XPDL Application allows the semantics of the ontology to be integrated into workflows.
In EXACT each action included in the Action class has a list of ad hoc properties that specify which objects are to be manipulated in the action, and how. In this way the formal parameters required for each Action class are specified at the class level, while the actual parameters are passed at the instance level. An example is the Move action, defined as "an experiment action to change a spatial location of an entity from a start location to an end location". To create a Move instance we need to define the actual parameters: a start location, an end location and an object that is changing location. This formalization has a limitation: for each action reported in a laboratory protocol, we have to create a particular instance of the action specifying the actual parameters to be passed.
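The split between class-level formal parameters and instance-level actual parameters can be sketched in Python as follows. This is an analogy rather than the EXACT encoding itself: the field names `entity`, `start_location` and `end_location` paraphrase the Move definition quoted above, and the values passed to each instance are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    """Formal parameters are fixed once, at the class level."""
    entity: str          # what is moved?
    start_location: str  # from where?
    end_location: str    # to where?


# Actual parameters are supplied per instance. Every protocol step that
# moves something needs its own instance (values here are hypothetical).
move_swab_head = Move("swab_head", "swab", "eppendorf_tube")
move_tube = Move("eppendorf_tube", "rack", "thermoblock")

protocol_steps = [move_swab_head, move_tube]
print(len(protocol_steps), "Move instances for", len(protocol_steps), "steps")
```

The number of instances grows with the number of steps, which is the limitation noted in the paragraph above.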
XPDL uses a well-known standard mechanism for passing parameters, based on IN, OUT and INOUT modes. Formal parameters are specified, and actual parameters are passed, at the instance level. It follows that EXACT cannot be directly integrated into XPDL, because the two handle parameters differently.
In order to permit the integration we extended EXACT in two ways: by adding a new ontology layer named UnGap and by enriching the Equipment class with new subclasses. Starting from the Action class, we developed UnGap, an ontology layer that gives a new structure for parameter definition, so that ontology actions can correspond to XPDL Applications. The resulting mapping is represented in Figure . As a result each action is characterized by a list of input parameters (ParamIN), a list of output parameters (ParamOUT) and a list of parameters that are both input and output (ParamIN_OUT).
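The parameter structure introduced by UnGap can be pictured with a minimal sketch. The Python names `Mode`, `Param` and `ActionIndividual` are hypothetical and not part of UnGap. The three input parameters follow the add_reagent_to_container example given later in the text, whereas the `container` parameter and its `Eppendorf` type are assumptions added only to show the ParamIN_OUT list.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    IN = "ParamIN"
    OUT = "ParamOUT"
    INOUT = "ParamIN_OUT"


@dataclass(frozen=True)
class Param:
    name: str      # hypothetical parameter identifier
    datatype: str  # name of a Datatype subclass
    mode: Mode


@dataclass(frozen=True)
class ActionIndividual:
    name: str
    action_class: str
    params: tuple = ()

    def with_mode(self, mode: Mode) -> list:
        """Names of the parameters declared with the given mode."""
        return [p.name for p in self.params if p.mode is mode]


add_reagent = ActionIndividual(
    name="add_reagent_to_container",
    action_class="Add",
    params=(
        Param("pipette", "Pipette", Mode.IN),
        Param("reagent", "Reagent", Mode.IN),
        Param("volumeQuantity_ul", "owl:DatatypeProperty", Mode.IN),
        Param("container", "Eppendorf", Mode.INOUT),  # assumed for illustration
    ),
)

for mode in Mode:
    print(mode.value, add_reagent.with_mode(mode))
```

Grouping parameters by mode gives each action the same IN, OUT and INOUT shape that an XPDL Application expects.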
In the new layer we also define the new class Datatype, which specifies the available types of parameters. This class contains the EXACT Equipment class and owl:DatatypeProperty. The former contains objects that can be manipulated by actions; the latter represents scalar values that actions can require (e.g. the μl to be added). The instances of these subclasses can be used as actual parameters of XPDL Applications. The purpose of the UnGap layer is precisely that of "closing the gap" between EXACT and XPDL. Thanks to the UnGap definition, EXACT and XPDL elements are now compatible. Therefore, action individuals correspond to XPDL Applications and action datatypes to XPDL datatypes. A relationship can be established between instances of Action and Application. In this way we can transform every instance of Action into the corresponding Application.
This translation mechanism provides a set of XPDL Application constructs that can be invoked by workflow block activities. The invoked Application requires the complete specification of its formal parameters. This is obtained by filling them with actual parameters, defined as variables whose type is a subtype of the Datatype class. In this case we follow the same considerations about models and metamodels that we applied to the EXACT Action-XPDL Application mapping. We obtain a mapping between the UnGap Datatype class and XPDL DataType (Figure ).
Mapping between the two metamodels: UnGap Datatype and XPDL DataType.
To exploit the advantages of ontologies in describing domain knowledge we decided to maintain Datatype instead of translating it into an XPDL construct. We linked each Datatype subclass to a new custom datatype. In XPDL this can be conveniently implemented in the DataType using the xpdl:ExternalReference construct.
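As a rough illustration of this translation, the sketch below builds an XPDL-style Application element whose formal parameters point back to ontology classes through ExternalReference. It is a simplified sketch, not the output of our tool: the XPDL namespace is omitted, the helper `application_element` and the ontology location `http://example.org/dso.owl` are hypothetical, and the parameter identifiers are chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical location of the OWL file holding the Datatype subclasses.
DSO_LOCATION = "http://example.org/dso.owl"


def application_element(action_id, params):
    """Build an Application with one FormalParameter per (name, mode, datatype)."""
    app = ET.Element("Application", {"Id": action_id, "Name": action_id})
    formal = ET.SubElement(app, "FormalParameters")
    for name, mode, datatype in params:
        fp = ET.SubElement(formal, "FormalParameter", {"Id": name, "Mode": mode})
        dt = ET.SubElement(fp, "DataType")
        # The datatype stays in the ontology; XPDL only references it.
        ET.SubElement(
            dt, "ExternalReference", {"location": DSO_LOCATION, "xref": datatype}
        )
    return app


app = application_element(
    "add_reagent_to_container",
    [
        ("pipette", "IN", "Pipette"),
        ("reagent", "IN", "Reagent"),
        ("volumeQuantity_ul", "IN", "volumeQuantity_ul"),
    ],
)
print(ET.tostring(app, encoding="unicode"))
```

Keeping the type definitions in the OWL file means that a change to a Datatype subclass does not require regenerating the workflow, only the reference target changes.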
Following this strategy, the UnGap layer permits us to reap the advantages of both ontologies and workflows and makes EXACT semantically and operationally integrable with XPDL. Now in order to execute an activity we can invoke a specific instance of Action (e.g. add_reagent_to_container) through the corresponding XPDL Application that has been translated with COW (Figure ).
Figure 3 Description of the system proposed for designing a formal laboratory protocol. Add and add_reagent_to_container are respectively an Action subclass and one of its instances. Pipette is an Equipment subclass and is a formal parameter (paramIN) in add_reagent_to_container.
Following the rules defined in UnGap, we can develop a domain-specific ontology (DSO) that defines the equipment and actions specific to a given laboratory. Figure presents the modified DSO structure and its newly included concepts: ActiveEntity, describing the entities able to perform actions, and PassiveEntity, corresponding to the objects of actions. As a result, we enrich the Equipment class with new concepts common in laboratory protocols, such as thermoblock and swab.
Figure 4 The Equipment class. A) In the EXACT ontology the Equipment class does not have any child term. B) In our Domain Specific Ontology the original Equipment class is, together with the owl:DatatypeProperty, a subclass of the newly defined Datatype class.
In this way we defined a mechanism for the automatic transformation between elements of our two base metamodels: EXACT Action corresponds to XPDL Application, and individuals of Action correspond to instances of Application.
The protocols formalized with our method will be translated into workflows written in XPDL and their activities will be formalized by EXACT actions and saved in an OWL file.
A COW protocol for paternity test
In order to discuss the novelty and the advantages of our proposal we present how the methodology and the developed tool have been used for formalising a laboratory protocol developed by our group.
In our laboratory we apply a protocol for paternity testing (Figure ). The objective of the test is to confirm or exclude a paternity relationship between the donors of two DNA samples. The protocol was set up and is followed by the wet-lab staff of our laboratory. It is composed of several detailed steps, each required to perform a series of operations, and involves various instruments to execute these operations. The first three steps of the protocol, written in natural language, are reported in the first row of Figure . These steps describe a series of actions which require specific preconditions and parameters for their execution.
The Paternity Test protocol (a fragment) in textual form.
Figure 6 The Paternity Test protocol represented using COW. The first row represents the first three steps of the textual protocol. The second row represents the ontological concepts retrieved from the textual protocol. The third row displays an "ontologized" ...
In order to represent the protocol as a workflow, the first operation consists in accurately examining each sentence of the textual protocol which can be divided into numerous independent steps (first row of Figure ). Reading step by step, we extract ontology concepts for Action and Datatype classes that should be included in our model (second row of Figure ). In the textual form, the verbs (e.g. Cut, Add, Incubate) describe the nature of the action to be executed, the objects (e.g. Eppendorf, Thermoblock, SwabHead) represent the required equipment, and the parameters of actions (1.5 ml, 30 minutes) become owl:DatatypeProperty.
Each identified verb is a candidate for being included as an ontological action in our DSO. For any action not yet existing in the DSO, a new concept specific for that action is then inserted as an Action subclass. Following the EXACT rules, the newly defined Action subclasses can be classified, according to their nature, among the separation actions, the transformation actions or the combination actions. This solution permits us to define instances of Action subclasses. As defined in the UnGap ontology layer, for each instance of the Action class it is required to specify ad-hoc parameters among the types listed in the Datatype class.
In our ontology all the instances of the Action class share the properties of the class but can be distinguished by their specified parameters, as is common in object-oriented programming (OOP). In Figure , the Add class has two instances: add_labware_to_container and add_reagent_to_container. The add_reagent_to_container instance corresponds to step 2 of our protocol and differs from add_labware_to_container in its paramIN, paramOUT and paramIN_OUT parameters. The paramIN parameters include Pipette and Reagent as Equipment classes and volumeQuantity_ul as an owl:DatatypeProperty. It is worth underlining that add_reagent_to_container is a "template" that is defined just once, at the beginning of the formalization process. As a result it can be re-used to model every step in the protocol that requires the addition of a reagent to a container.
Using the COW tool, all the concepts and properties described in the DSO and UnGap layers can be used to design a workflow (third row of Figure ). The resulting workflow is characterized by a flow of execution in which each activity block refers to the action in the corresponding protocol step. To specify a single block, e.g. the Add activity of step 2, we have to define workflow variables (myPipette, myLysisBuffer) that permit us to instantiate the add_reagent_to_container template. Such variables are instances of DSO classes, Pipette and Reagent respectively, and they are the actual parameters of the XPDL Application corresponding to the template. This method is repeated for each protocol step, yielding a workflow completely defined by XPDL elements. The execution of the protocol formalized using COW can then be delegated to an independent XPDL-compliant run-time environment. The execution can also be distributed concurrently among several computational units interfaced with robotized stations or laboratory operators.
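The instantiation of the template for step 2 can be sketched as a simple binding of actual to formal parameters. This is a minimal sketch and not the COW implementation: `Variable`, `ADD_REAGENT_PARAM_IN` and `bind_activity` are hypothetical names, the formal parameter identifiers `pipette` and `reagent` are chosen for illustration, and the volume value is a placeholder rather than the one used in our protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """A workflow variable typed by a DSO class."""
    name: str
    dso_class: str


# Formal paramIN list of the add_reagent_to_container template.
ADD_REAGENT_PARAM_IN = {
    "pipette": "Pipette",
    "reagent": "Reagent",
    "volumeQuantity_ul": "owl:DatatypeProperty",
}


def bind_activity(formal, actual):
    """Check actual parameters against the formal ones and return the binding."""
    if set(formal) != set(actual):
        raise ValueError("actual parameters do not match the formal parameters")
    binding = {}
    for name, expected in formal.items():
        value = actual[name]
        if expected == "owl:DatatypeProperty":
            # Scalar parameters carry a plain numeric value.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f"{name} must be a scalar value")
            binding[name] = value
        else:
            # Equipment parameters must be variables of the expected DSO class.
            if not isinstance(value, Variable) or value.dso_class != expected:
                raise TypeError(f"{name} must be a variable of class {expected}")
            binding[name] = value.name
    return binding


my_pipette = Variable("myPipette", "Pipette")
my_lysis_buffer = Variable("myLysisBuffer", "Reagent")

step2 = bind_activity(
    ADD_REAGENT_PARAM_IN,
    {"pipette": my_pipette, "reagent": my_lysis_buffer, "volumeQuantity_ul": 100.0},
)
print(step2)
```

The same template can be bound again with different variables for any later step that adds a reagent to a container, which is the reuse described above.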