The driving application for the development of Seedpod is an information management system to help our Human Brain Project collaborator at the University of Washington, Dr. George Ojemann. Dr. Ojemann’s lab studies the functional anatomy of speech and speech memory. In some of his studies a technique called single unit recording (SUR) is used to record a large amount of high temporal resolution neuronal signal data during open brain surgery. Scientists in the lab correlate the electrical signal data with behavioral and other data in order to find the meaning behind neural activities.
The Ojemann lab is a small one, and has used spreadsheets to record data because of their ease of use. The lab stores numeric or string data in spreadsheets, but manages multimedia data, such as the neuronal firing patterns, in a separate file system. The file system uses a complicated naming convention, which is managed manually. Data version control, coordinated data entry, and data sharing are challenging due to the lack of a centralized management system that can be accessed through the Internet. In addition, searching the data requires meticulous hand trimming and picking of datasets from multiple Excel sheets. Such searches are becoming increasingly untenable as the number of SUR studies increases. We have built Seedpod partly in response to these problems.
2.1 Seedpod’s General Architecture
There are two major components in Seedpod: the model and the LIMS application engine. The model is an integrated representation of a LIMS (Figure 1). It includes a domain-specific data model describing the entities and relationships that the scientist wants to manage. It also includes an application model describing properties that allow the scientist to customize the look and feel of the LIMS web-based user interface.
Figure 1 Seedpod architecture: the Protégé model (top) and the web-based LIMS application (bottom).
The LIMS application engine has a server application, a backend relational database, and a web-based graphical user interface (GUI). Seedpod automatically transforms the Protégé model into a relational schema for the relational database. The database stores the experiment data and the model. The server application queries the database regarding the model, retrieves and stores the experiment data, and creates dynamic web pages for users based on the look and feel specified in the application model. The Protégé model and LIMS application are not linked in real time (i.e., they can change and evolve independent of each other).
2.2 LIMS Model
The first step in implementing a Seedpod-based project is to create a LIMS model using Protégé. Protégé is a frame-based knowledge management tool (http://protege.stanford.edu/). We chose Protégé primarily because of its expressivity in model representation [3].
A Protégé model consists of a set of named classes. Figure 2 shows a Protégé screenshot of class Trial_Protocol in the Ojemann SUR model. The class hierarchy of the model is shown on the left. Each class is associated with a set of template-slots, which are properties that are propagated to its instances and child classes. The value of a template-slot can be a primitive type such as String, Integer, or Boolean. Protocol_Description in Figure 2 is an example of a slot with a String type value. A template-slot can also be a relationship type, which has class instances as its values. The template-slot electrodes in Figure 2 is such a relationship slot. This slot describes a one-to-many relationship from a Trial_Protocol instance to multiple Electrode instances. The valid values for the slot are instances of the Electrode class.
Figure 2 Sample screen of a Protégé model for a single unit recording experiment. In the upper left panel, template-slots listed to the right are slots for the highlighted class, Trial_Protocol. The output database schema for this class is shown ...
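The way template-slots propagate from a class to its child classes can be illustrated with a small sketch. This is not Protégé's API: the `Frame` class, its methods, and the `sequence` slot on the parent class are hypothetical names introduced only for illustration, and the slot names Protocol_Description and electrodes are taken from the example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FrameSketch {
    /** Hypothetical stand-in for a named class with its own template-slots (slot name -> value type). */
    static final class Frame {
        final String name;
        final Frame parent; // null for a root class
        final Map<String, String> ownSlots = new LinkedHashMap<>();

        Frame(String name, Frame parent) {
            this.name = name;
            this.parent = parent;
        }

        /** Declares a template-slot directly on this class. */
        Frame slot(String slotName, String valueType) {
            ownSlots.put(slotName, valueType);
            return this;
        }

        /** Template-slots visible on this class: inherited slots first, then its own. */
        Map<String, String> templateSlots() {
            Map<String, String> all = new LinkedHashMap<String, String>();
            if (parent != null) {
                all.putAll(parent.templateSlots());
            }
            all.putAll(ownSlots);
            return all;
        }
    }

    /** Builds a two-level example; the "sequence" slot on the parent is an assumption. */
    static Frame exampleTrialProtocol() {
        Frame parent = new Frame("Order_Set_Element", null).slot("sequence", "Integer");
        return new Frame("Trial_Protocol", parent)
                .slot("Protocol_Description", "String")          // primitive-valued slot
                .slot("electrodes", "Instance of Electrode");    // relationship slot
    }

    public static void main(String[] args) {
        System.out.println(exampleTrialProtocol().templateSlots());
    }
}
```

In this sketch a child class sees its parent's slots without redeclaring them, which is the inheritance behavior that the modeler relies on when describing a class hierarchy.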
This Protégé model differs from an entity-relationship (ER) model because it is object-oriented. It does not require the user to understand normalization in data modeling. It allows users to easily describe common yet complex biological data structures such as inheritance, object relationships, and many-to-many relationships. The model designer does not need to be a relational database expert.
Additionally, the Protégé model in Seedpod includes the application model as well as the data model. For example, we can extend the standard template-slot definition to allow users to customize the look and feel of the LIMS application. The expanded window of the electrodes slot in Figure 2 shows a slot definition that we have extended to include “Database Type,” user interface “View Widget,” “Form Widget,” etc. Protégé makes this extension possible by allowing the modeler to extend the standard slot definition.
2.3 Model Transformation and Data Storage
As described in Section 2.1, each Seedpod application stores its experiment data in a relational database for efficient storage and retrieval. It is important to note that Seedpod does not use a generic database schema to handle all labs. Any given lab will have its own specific Protégé model, which is transformed into a specific relational schema. Instead of transforming Protégé models to schemas manually (ad hoc), we developed a generalized method that performs this transformation automatically [4].
The formal definitions of the relational and frame-based models provided the inspiration for the following rules:
A class, C, is transformed to a relational table or a view (or both):
- T1 If C is a concrete class, then create a table with name C_table and add a primary key attribute ID.
- T2 If C has subclasses (is non-leaf), then create a view, C_union, that is defined by selecting the union of C_table and all of its subclasses’ tables.
A slot, S, of a class, C, is transformed depending on the slot's value type and cardinality.
- T3 If the range of S is primitive (i.e., String, Integer, Float, Boolean or Symbol) and S has cardinality 1, then create an attribute Attr_S for table C_table, and give it the corresponding relational database primitive type.
- T4 If the range of S is instances of class B and S has cardinality 1, then create a foreign key attribute, FK_S (in table C_table), that references the ID attribute in table B_table.
- T5 If S has multiple cardinality, create a new association table, Assoc_S. Add a foreign key, FK_S (in table Assoc_S), that references the ID attribute in table C_table. Create an attribute Attr_S for S in Assoc_S according to the single cardinality rules T3 or T4.
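Rules T1 through T5 can be sketched as a small generator that emits SQL text. This is only an illustration under stated assumptions, not the Seedpod transformation program: the `Slot` class and `transform` method are hypothetical, the mapping from frame primitives to PostgreSQL types is assumed, ID columns are assumed to be integers, the union view selects only the ID column for simplicity, and table and column names follow the rule notation literally (C_table, Attr_S, FK_S, Assoc_S).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SchemaSketch {
    /** Hypothetical stand-in for a template-slot: name, range, and cardinality. */
    static final class Slot {
        final String name;
        final String range;      // a primitive type name or a class name
        final boolean multiple;  // true for multiple cardinality

        Slot(String name, String range, boolean multiple) {
            this.name = name;
            this.range = range;
            this.multiple = multiple;
        }
    }

    // Assumed mapping from frame primitive types to PostgreSQL column types.
    static final Map<String, String> PRIMITIVES = Map.of(
            "String", "TEXT", "Integer", "INTEGER", "Float", "REAL",
            "Boolean", "BOOLEAN", "Symbol", "TEXT");

    /** Column definition for one value of slot s: primitive type (T3) or foreign key (T4). */
    static String column(String columnName, Slot s) {
        String sqlType = PRIMITIVES.get(s.range);
        if (sqlType != null) {
            return columnName + " " + sqlType;
        }
        return columnName + " INTEGER REFERENCES " + s.range + "_table(ID)";
    }

    /** Emits SQL for one class; subclasses lists the names of all classes below it. */
    static List<String> transform(String cls, boolean concrete,
                                  List<String> subclasses, List<Slot> slots) {
        List<String> sql = new ArrayList<>();
        if (concrete) {
            List<String> cols = new ArrayList<>();
            cols.add("ID INTEGER PRIMARY KEY"); // T1
            for (Slot s : slots) {
                if (!s.multiple) {              // T3 and T4
                    String prefix = PRIMITIVES.containsKey(s.range) ? "Attr_" : "FK_";
                    cols.add(column(prefix + s.name, s));
                }
            }
            sql.add("CREATE TABLE " + cls + "_table (" + String.join(", ", cols) + ");");
            for (Slot s : slots) {
                if (s.multiple) {               // T5
                    sql.add("CREATE TABLE Assoc_" + s.name + " (FK_" + s.name
                            + " INTEGER REFERENCES " + cls + "_table(ID), "
                            + column("Attr_" + s.name, s) + ");");
                }
            }
        }
        if (!subclasses.isEmpty()) {            // T2
            List<String> selects = new ArrayList<>();
            if (concrete) {
                selects.add("SELECT ID FROM " + cls + "_table");
            }
            for (String sub : subclasses) {
                selects.add("SELECT ID FROM " + sub + "_table");
            }
            sql.add("CREATE VIEW " + cls + "_union AS " + String.join(" UNION ", selects) + ";");
        }
        return sql;
    }

    public static void main(String[] args) {
        List<Slot> slots = List.of(
                new Slot("Protocol_Description", "String", false),
                new Slot("electrodes", "Electrode", true));
        transform("Trial_Protocol", true, List.of(), slots).forEach(System.out::println);
    }
}
```

The sketch ignores statement ordering across classes (a foreign key may reference a table that has not been created yet) and the inheritance of slots from non-concrete classes, both of which a full transformation would have to handle.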
Additional rules dealing with multiple cardinality slots, relational slots, class inheritance, etc. are beyond the scope of this paper (see [4] and [5]).
The transformation method is implemented as a Java program. It takes a Protégé model file as input and writes the relational schema as SQL statements to a text file. The sample schema output shows the result of transforming class Trial_Protocol from the SUR model (described in Section 2.2 and Figure 2) to its database schema. The LIMS developer loads the SQL command file into a relational database engine to create the database. We use a PostgreSQL database in the Seedpod prototype.
A sample relational schema output shows SQL statements from the transformation for table Trial_Protocol, and a view for its superclass Ordered_Set.
In addition to the data tables (like those in the sample schema output), the database also stores the LIMS model in two tables, one table for classes (Figure 3, top) and the other for slots (Figure 3, bottom). Figure 3 shows screenshots of the tables with our previous examples, class Trial_Protocol and slot electrodes, highlighted. The Seedpod engine queries these two tables for information about the model while dynamically generating the web application.
Figure 3 The model is stored in the relational database as two Seedpod application tables called _class (top) and _attribute (bottom). Table _class (top) shows an example of the class Trial_Protocol. The table _attribute (bottom) shows slot electrodes of Trial_Protocol.
2.4 Seedpod Application
The Seedpod server application is generic; it is not specific to any model. After a LIMS model is completed and loaded into the database, the Seedpod server connects to the database server to deploy the web application and start collecting data. The server application queries the model in the database (Figure 3) to dynamically generate the web-based user interface for the scientists.
The scientist can browse and manage data in the relational database through this user interface. For example, the web page for an instance of a Protégé class in the SUR model, Trial_Protocol, displays the template-slots and their values (Figure 4). The display name of the instance is specified in the model as the slot value of Protocol_Name (also visible in Figure 3 (top) as Slot(Protocol_Name) under the “browser_key” column). The application renders the slot values using information from the model such as slot layout sequences and form field widgets. Figure 3 (bottom) shows that slot electrodes uses widget OBJECT_LINK. In Figure 4, the values for the electrodes slot are rendered as two URL links. Clicking on one of the links takes users to the corresponding web page displaying Electrode instances. See [5] for a live demo of Seedpod.
Figure 4 Screenshots of Seedpod’s web-based graphical user interface. The page on top shows an instance of the Trial_Protocol class. The page displays its template-slots and their values. Users can browse data related to this instance through a URL link.
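The idea of rendering a slot according to the widget recorded in the model can be sketched as follows. Only the widget name OBJECT_LINK and the slot name electrodes come from the text; the `render` method, the `/instance/` URL pattern, the HTML layout, the plain-text fallback for other widgets, and the instance names in `main` are assumptions made for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

class WidgetSketch {
    /**
     * Renders one slot and its values as an HTML fragment, choosing the
     * presentation from the widget name stored with the slot in the model.
     * Values are not HTML-escaped in this sketch.
     */
    static String render(String slot, String widget, List<String> values) {
        String body;
        if ("OBJECT_LINK".equals(widget)) {
            // Each value becomes a link to the page of the referenced instance
            // (hypothetical URL pattern).
            body = values.stream()
                    .map(v -> "<a href=\"/instance/" + v + "\">" + v + "</a>")
                    .collect(Collectors.joining(", "));
        } else {
            // Assumed fallback: show the values as plain text.
            body = String.join(", ", values);
        }
        return "<div class=\"slot\"><b>" + slot + "</b>: " + body + "</div>";
    }

    public static void main(String[] args) {
        // Hypothetical instance names for the two linked Electrode instances.
        System.out.println(render("electrodes", "OBJECT_LINK", List.of("Electrode_1", "Electrode_2")));
    }
}
```

Because the widget name is data read from the model rather than code, changing how a slot is displayed only requires changing the model entry, not the server application.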
Seedpod’s server application is implemented using Tomcat and Java. For each class in the model, the application uses reflection to dispatch the appropriate Java class implementation based on the class’s identity in the model. For example, if a user wants to view an instance of Trial_Protocol, the engine tries to dispatch the Trial_Protocol Java class implementation to handle the request. If no Trial_Protocol Java class exists, the engine looks for the Java class of Trial_Protocol’s parent class, Order_Set_Element. It continues up the class hierarchy until an implementation is found. Most of the classes in the SUR model are handled by the default class. This mechanism allows the LIMS developer to extend the server application by implementing class plug-ins.
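The dispatch-with-fallback mechanism described above might look roughly like the following. This is a minimal sketch, not Seedpod's code: the `InstanceHandler` interface, the `Handler` naming suffix, the nested handler classes, and the in-memory parent map are hypothetical, and the choice of which model classes have a plug-in is made up for the example.

```java
import java.util.Map;

class HandlerDispatch {
    /** Hypothetical plug-in interface for handling requests on instances of a model class. */
    interface InstanceHandler {
        String describe();
    }

    /** Fallback used when no class in the hierarchy has its own implementation. */
    static class DefaultHandler implements InstanceHandler {
        public String describe() { return "default"; }
    }

    /** Example plug-in registered for the model class Order_Set_Element (an assumption). */
    static class Order_Set_ElementHandler implements InstanceHandler {
        public String describe() { return "Order_Set_Element plug-in"; }
    }

    // Child -> parent links of the model; in Seedpod these would come from the stored model.
    static final Map<String, String> PARENT = Map.of("Trial_Protocol", "Order_Set_Element");

    /** Looks up a handler class by name via reflection, walking up the model hierarchy. */
    static InstanceHandler dispatch(String modelClass) {
        for (String c = modelClass; c != null; c = PARENT.get(c)) {
            String implName = HandlerDispatch.class.getName() + "$" + c + "Handler";
            try {
                Class<?> impl = Class.forName(implName);
                return (InstanceHandler) impl.getDeclaredConstructor().newInstance();
            } catch (ClassNotFoundException e) {
                // No plug-in for this model class; try its parent next.
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot instantiate " + implName, e);
            }
        }
        return new DefaultHandler();
    }

    public static void main(String[] args) {
        System.out.println(dispatch("Trial_Protocol").describe());
        System.out.println(dispatch("Electrode").describe());
    }
}
```

With this arrangement, adding a plug-in for a model class only requires placing a suitably named class on the classpath; classes without one fall through to their ancestors and finally to the default handler.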