To describe common concepts used in synthetic biology, we implemented SBOL-semantic, an information model for synthetic biology, using the Web Ontology Language (OWL). The Synthetic Biology Open Language (SBOL) (sbolstandard.org) is a collaborative effort of the Synthetic Biology Data Exchange Group to develop standards and technologies to facilitate information exchange for synthetic biologists. SBOL-semantic is based on the rough consensus of core synthetic biology concepts and their relationships and represents the semantics of synthetic biology theory and practice. We used an open process for the evolution and standardization of data models according to a framework for how data models in synthetic biology should be published. This new work builds on the Provisional BioBrick Language (PoBoL).
We have built SBOL-semantic using OWL so as to be compliant with Semantic Web information technology standards that allow SBOL data records to be read, manipulated, and interpreted using generic tools such as Protégé, RDFlib, and Sesame. These tools were used for management of SBOL model structure, to create a scheme for unique identification of elements, and to reference the Sequence Ontology, a third-party ontology. The choice of W3C-recommended technology was made on the premise that modeling knowledge in a computable, standardized, and community-supported format will provide long-term benefit for the synthetic biology community. (See also our discussion and future work sections.)
The SBOL semantic structure is organized as a hierarchy of classes that refer to distinct categories of common information objects, such as Parts, Cells, Plasmids, and Sequence Features. The most general of the classes constitute the core SBOL concepts. Instances of a class are individual data elements. The accompanying figure shows the specific part known as BBa_B0015, a commonly used transcriptional terminator. In this figure, the part has annotations that divide the part into segments such as BBa_B0010 that are themselves instances of the Part class. In our model, all such annotations are properties that capture relationship information between individuals. Data represented in this form can be conceptualized as a graph in which nodes are individuals, members of SBOL classes, and edges are the properties between them. Here we present results focused on Parts and the description of their nucleotide sequence, Sequence Features. The long-term goal of SBOL is to represent information relevant to all levels of the engineering process in synthetic biology (Tissues, Cells, Plasmids, etc.). Here, we demonstrate the open nature of the framework by extending this class structure to support the needed concepts from the Registry.
Top level Class (bold) and example sub-class (regular face) SBOL semantic terminology with a simplified definition for clarity.
Classes (black rectangles) describe types (open faced arrows, colored by type) of individual data elements (yellow rounded rectangles) and the composition relationships between them (closed faced arrows).
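The graph view described above can be made concrete with a minimal sketch. It uses plain Python tuples rather than an RDF library, and the property names (`sbol:annotation`, `sbol:subPart`) and the annotation identifier `anno_1` are assumptions chosen for illustration; the actual SBOL-semantic vocabulary may differ.

```python
from collections import defaultdict

# Hypothetical property names; the real SBOL-semantic terms may differ.
RDF_TYPE = "rdf:type"
ANNOTATION = "sbol:annotation"  # part -> sequence annotation (assumed name)
SUB_PART = "sbol:subPart"       # annotation -> part it delimits (assumed name)

# Each triple is (individual, property, individual-or-class).
triples = [
    ("BBa_B0015", RDF_TYPE, "sbol:Part"),
    ("BBa_B0010", RDF_TYPE, "sbol:Part"),
    ("anno_1", RDF_TYPE, "sbol:SequenceAnnotation"),
    ("BBa_B0015", ANNOTATION, "anno_1"),
    ("anno_1", SUB_PART, "BBa_B0010"),
]


def build_graph(triples):
    """Index triples as node -> property -> list of target nodes."""
    graph = defaultdict(lambda: defaultdict(list))
    for subject, prop, obj in triples:
        graph[subject][prop].append(obj)
    return graph


def sub_parts(graph, part):
    """Follow part -> annotation -> sub-part edges."""
    return [
        sub
        for anno in graph[part][ANNOTATION]
        for sub in graph[anno][SUB_PART]
    ]


graph = build_graph(triples)
print(sub_parts(graph, "BBa_B0015"))
```

Here the nodes are individuals and class names, and each edge is a property, so walking from a part through its annotations to its sub-parts is a two-step traversal.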
To create the semantic knowledgebase for synthetic biology, we used the information available from the Registry of Standard Biological Parts (partsregistry.org) to create an extension of the SBOL class structure. This extension uses SBOL-semantic in combination with the new terminology acquired from the Registry to describe biological parts. First, we extracted the Registry data and mapped its relational schema (its structure of tables) to SBOL-semantic. This mapping served as our translation table. Using a script, we then converted the 13,444 Registry part records, with their associated Sequence Features, from the Registry format to the SBOL-semantic (OWL/RDF) form. Each Registry part record was also associated with the Registry's Sequence Feature table, a position-based description of the nucleotide sequence (for example, a sequence feature such as a ‘terminator’). We then mapped the Registry Sequence Feature table to the SBOL Sequence Annotation and Feature class structures and performed the analogous translation into OWL/RDF.
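As a rough illustration of this kind of table-to-triple translation, the sketch below converts one part row and its position-based feature rows into subject, property, object tuples. It is not the conversion script used in this work: the namespace, the column names (`part_name`, `sequence`, `feature_id`, `type`, `start`, `end`), the property names, and the sample rows are all hypothetical.

```python
# Hypothetical namespace and property names, for illustration only.
BASE = "http://example.org/sbpkb/"


def part_to_triples(part_row, feature_rows):
    """Translate one part row and its feature rows into triples."""
    part = BASE + "part/" + part_row["part_name"]
    triples = [
        (part, "rdf:type", "sbol:Part"),
        (part, "sbol:sequence", part_row["sequence"]),
    ]
    for feature in feature_rows:
        # One annotation node per feature row, identified under the part.
        anno = f"{part}/annotation/{feature['feature_id']}"
        triples.extend([
            (part, "sbol:annotation", anno),
            (anno, "rdf:type", "sbol:SequenceAnnotation"),
            (anno, "sbol:start", feature["start"]),
            (anno, "sbol:end", feature["end"]),
            (anno, "sbol:feature", BASE + "feature/" + feature["type"]),
        ])
    return triples


# Made-up rows, not real Registry data.
part_row = {"part_name": "BBa_EXAMPLE", "sequence": "atgcatgc"}
feature_rows = [{"feature_id": 1, "type": "terminator", "start": 1, "end": 8}]

triples = part_to_triples(part_row, feature_rows)
for triple in triples:
    print(triple)
```

Deriving each identifier from the part name and the feature row key keeps the output deterministic, so re-running the translation on the same rows yields the same identifiers.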
As part of the transformation of Registry data, we used the categories attribute of the Registry parts table to provide a richer description of parts. The Registry includes a total of 346 categories organized as a hierarchy under 28 top-level categories (e.g., chassis, classic, dna, function, plasmid, plasmidbackbone, primer, promoter, proteindomain, proteintag, rbs, regulation, ribosome, rnap, terminator). For the full listing, see Supporting Information Table S1, which contains the list of terms extracted from the Registry data, and File S1, which contains the generated OWL-encoded semi-structured controlled vocabulary used throughout this work. These categories constitute a rich controlled vocabulary for describing parts; it is created and maintained by the Registry staff, and its use is enforced by the Registry website software. The categories form the basis of organization for the Registry Catalog website. Thus, to provide a good structure for querying the Registry information, we needed to augment our core SBOL-semantic ontology with this terminology. To do so, we auto-generated a class structure within SBOL-semantic that mimics the Registry category structure. An example of this conversion is shown in the accompanying figure. Finally, we loaded the SBOL-semantic data into a framework for querying RDF data, creating the Standard Biological Parts knowledgebase resource (SBPkb) (see Implementation and Availability for details). As we show in our results section, we can use these categories to directly query the SBPkb for specific features of parts.
Example of Registry Categories to SBOL class structure conversion.
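A minimal sketch of how a class structure could be generated from category terms, and then used to retrieve parts, is given below. It assumes the categories are available as slash-separated paths; the root class name `RegistryCategory`, the sub-category names, and the part-to-category table are hypothetical, and plain Python sets stand in for an RDF store and query language.

```python
ROOT = "RegistryCategory"  # hypothetical root class name


def category_classes(paths):
    """Return (child, parent) subclass pairs for slash-separated paths."""
    pairs = set()
    for path in paths:
        segments = [s for s in path.split("/") if s]
        for depth in range(1, len(segments) + 1):
            child = "/".join(segments[:depth])
            parent = "/".join(segments[:depth - 1]) or ROOT
            pairs.add((child, parent))
    return pairs


def descendants(pairs, ancestor):
    """Collect the ancestor and every class below it."""
    found = {ancestor}
    changed = True
    while changed:
        changed = False
        for child, parent in pairs:
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def parts_in_category(part_categories, pairs, category):
    """Find parts tagged with the category or any of its sub-categories."""
    wanted = descendants(pairs, category)
    return sorted(
        part for part, cats in part_categories.items() if wanted & set(cats)
    )


# Illustrative paths; the sub-category names are made up.
paths = ["//terminator/single", "//terminator/double", "//promoter"]
pairs = category_classes(paths)

# Hypothetical part-to-category assignments.
part_categories = {"part_A": ["terminator/double"], "part_B": ["promoter"]}
print(parts_in_category(part_categories, pairs, "terminator"))
```

Because each path prefix becomes a class whose parent is the next shorter prefix, a query for a top-level category also returns parts tagged only with one of its sub-categories.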
The semi-structured controlled vocabulary resulting from this process does not fulfill many of the criteria of formal ontology design. The structure created reflects the organization found in the Registry, and is not a proper class hierarchy. Our effort, directed towards SPARQL query information retrieval, translates the existing Registry information to a Semantic Web technology standard to enhance its potential for re-use. This utilitarian approach provides immediate benefit of data access and lays out the scope of the knowledge engineering challenges which face the synthetic biology community. Challenges of formally structuring information for future use in multiple applications are especially evident in large collections such as the user-driven and community-supported data source for our work, the Registry of Standard Biological Parts. However, the main contribution of this work is to provide a pragmatic solution for synthetic biology users, and establish the need for improvement of information resources in the field.