We present here an ontology of DC types (DC-CL) and the method used to create it. The motivation for developing DC-CL was twofold: to provide a common point of reference for standardized terms and definitions for DC subtypes, and to develop a method for representing cell types that is highly computable and builds on existing resources. DCs have a particularly complicated biology [16-18]; thus, not only are standardized, comprehensive information resources needed, but DCs are also a good model for testing a method for representing cells in an ontology.
We have developed DC-CL using a systematic approach for the ontological representation of cells that:
i) separates classification via the is_a relation from the assertion of structural, functional, and lineage properties by using formally defined, property-specific relations, such as has_function;
ii) systematically includes both species-neutral and species-specific types; and
iii) defines cell types on the basis of specific combinations of surface proteins used for identification of the cells via flow cytometry.
The use of property-specific relations, such as has_function, to incorporate structural, functional, and lineage properties has many benefits. First, this approach eliminates many of the errors that frequently result from multiple uses of the is_a relation [34-36] in what has been called 'is_a overloading' [15]. Second, the is_a relation can only be used between entities of the same ontological category (higher level types, such as those found in the Basic Formal Ontology described below), while specific relations can be used to relate cells to entities in other categories, such as functions (has_function), molecules (has_part), and processes (participates_in), that are represented within their own ontologies. DC-CL is formally connected to the hierarchical structure and relations of these ontologies, as well as to the data annotated in their terms, thereby providing significant additional information and opportunities for data integration. The use of property-specific relations also allows us, without sacrificing expressive power, to maintain a policy of single inheritance (each representational unit in the ontology has at most one asserted is_a parent), which brings benefits such as clearer statement of definitions, easier and more reliable curation, the ability to use more powerful reasoning tools, and the ability to have a unique measure of distance between any two terms on the same branch of an ontology. Finally, the use of property-specific relations enhances ontologies for computational analyses because each relation can be defined with its own inference properties.
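The separation of classification from property assertion, and the distance measure that single inheritance makes possible, can be illustrated with a minimal sketch. The toy hierarchy, the parent term "leukocyte", the example has_part statements, and the helper names `ancestors` and `is_a_distance` are assumptions made for illustration; they are not taken from the DC-CL files.

```python
# Hypothetical toy hierarchy: each term has at most one asserted is_a parent,
# so a plain dict (child -> parent) is enough to hold the classification.
IS_A = {
    "dendritic cell": "leukocyte",
    "plasmacytoid dendritic cell": "dendritic cell",
    "CD11c- plasmacytoid dendritic cell": "plasmacytoid dendritic cell",
    "CD11clow plasmacytoid dendritic cell": "plasmacytoid dendritic cell",
}

# Properties are asserted with their own relations, never with is_a.
# These two statements are illustrative only.
RELATIONS = [
    ("CD11c- plasmacytoid dendritic cell", "has_part", "TLR7"),
    ("CD11c- plasmacytoid dendritic cell", "has_part", "TLR9"),
]


def ancestors(term):
    """Return the chain of is_a ancestors of a term, nearest first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def is_a_distance(descendant, ancestor):
    """Count is_a steps from descendant up to ancestor.

    Returns None when the two terms are not on the same branch.
    """
    if descendant == ancestor:
        return 0
    chain = ancestors(descendant)
    if ancestor in chain:
        return chain.index(ancestor) + 1
    return None
```

Because every term has a single asserted parent, the path from a term to any of its ancestors is unique, so the step count is unambiguous. With multiple inheritance the same pair of terms could be joined by paths of different lengths.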
The inclusion of species-specific cell types allows for the more specific annotation of data and for the incorporation within DC-CL of species-specific properties, many of which have important functional consequences. For example, the plasmacytoid DCs observed in humans (CD11c-) express Toll-like receptors (TLR) TLR7 and TLR9, while the plasmacytoid DCs observed in mice (CD11clow) express all mouse TLRs except for TLR3 and TLR4, with consequent differences in the types of pathogens human and mouse plasmacytoid DCs can detect [37]. We avoid use of species of origin as a basis for defining types, however, and define types based only on the presence or absence of specific surface proteins. Thus, the plasmacytoid DCs observed in humans are instances of the type CD11c- plasmacytoid dendritic cell, while the plasmacytoid DCs observed in mice are instances of the type CD11clow plasmacytoid dendritic cell, where the two types are defined by the patterns of surface protein expression given in the above definitions. Plasmacytoid DCs observed in a third species to have either pattern of surface protein expression would be instances of the corresponding type. In addition, we include only those assertions about a cell type that hold across all species in which the type is observed. In this way, the inclusion of species-specific DC types in DC-CL facilitates understanding of the similarities and differences between mouse and human immunology and improves the capacity for generating hypotheses about the human immune response from the results of mouse experiments. DC-CL thereby also fosters the advance of translational medicine.
To define cell types on the basis of species of origin, or to include assertions that hold for the type in one species but not another, we recommend the creation of species-specific extensions rather than the inclusion of such types in CL or DC-CL. This approach allows for the representation of detailed, species-specific information without using multiple modes of classification (structure and species of origin) or including conflicting assertions in the core ontology. The approach of more specific extensions of a core template ontology has been used successfully in the creation of species-specific anatomy ontologies as extensions of the Common Anatomy Reference Ontology (CARO) [38] and in the creation of ontologies of specific infectious diseases as extensions of the core Infectious Disease Ontology (IDO) [39].
The use of specific combinations of surface proteins to define DC subtypes has advantages both for the creation of DC-CL and for its application to the analysis of cellular data. A primary means by which experimentalists distinguish cell types is by distinguishing patterns of protein expression using flow cytometry. Defining DC subtypes in terms of flow markers allows easy incorporation into DC-CL of new discoveries about DCs deriving from experiments that use flow cytometry to isolate or analyze cell populations. Similarly, defining DC subtypes in terms of flow markers optimizes DC-CL for the annotation, analysis, and integration of flow cytometry data and of data deriving from experiments in which fluorescence-activated cell sorting is used as a source of cells. Just as the Gene Ontology has been shown to offer significant benefits for the computational analysis of high-throughput data in the study of gene expression using hybridization microarrays [8], we anticipate similar benefits from the use of an ontology of cell types to support analysis of high-throughput, multidimensional flow data.
The relations has_high_amount and has_low_amount, defined in terms of the geometric mean, are used in the definition of cell types and are not meant to replace more complicated statistical methods for the analysis of flow cytometry data, such as those described in [40], or of other cellular data. Such statistical methods can be applied to the analysis of individual flow data sets, while ontology definitions need to hold universally, across different experimental designs, protocols, and equipment and across differences in the resulting distributions of fluorescence intensities for reference cells. Indeed, the ontology definitions should hold across different assays for surface protein expression, and should not be tied directly to flow cytometry. We have therefore taken a relatively simple approach to the formulation of cell definitions that hold universally and that are supported by our current understanding of DC biology. It is our hope, however, that our work, taken together with [40], will encourage the use of more objective criteria in the analysis of flow cytometry data and in the description and analysis of cell types in general.
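As a rough illustration of a geometric-mean comparison, the sketch below contrasts the geometric mean fluorescence intensity of a cell population with that of a reference population. The comparison rule, the function name `relative_amount`, and the intensity values are assumptions made for this example; they are not the formal definitions of has_high_amount and has_low_amount used in DC-CL.

```python
from statistics import geometric_mean


def relative_amount(population, reference):
    """Compare two lists of positive fluorescence intensities.

    Returns "high" if the population's geometric mean exceeds that of the
    reference, "low" if it is smaller, and "equal" otherwise.
    This rule is an illustrative assumption, not the DC-CL definition.
    """
    pop_gm = geometric_mean(population)
    ref_gm = geometric_mean(reference)
    if pop_gm > ref_gm:
        return "high"
    if pop_gm < ref_gm:
        return "low"
    return "equal"
```

The geometric mean is a common summary for fluorescence intensities because they are typically examined on a logarithmic scale. A single comparison of this kind does not replace the statistical treatment of an individual data set.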
The classification of DCs is still an area of active research; thus, DC-CL will continue to undergo revisions to keep current with new research results and new technologies for the characterization of cell types [28]. Because ontologies are based on an open world assumption, in contrast to relational databases, they are easily extended to include new subcategories. In addition, the formulation of DC-CL definitions as logically conjoined statements of the form XRY makes it easy to add or remove surface proteins from the definition of any cell type and to use a reasoner to assess the consequences of the revision on the DC-CL hierarchy. Thus, newly discovered surface markers can be easily incorporated into the ontology. Furthermore, the system we have outlined is readily applicable to subcellular localizations other than the cell surface and to other cellular components such as mRNA molecules or cytoplasmic granules. In addition to defining more localization-specific relations like has_plasma_membrane_part, the general RO relations has_part and lacks can be used. For all of these relations, too, cellular components other than proteins and protein complexes can be used as arguments. Morphological characteristics such as size and shape can also be used to define cell types using the has_quality relation to link to the relevant qualities in PATO, the ontology of phenotypic qualities [41]. In this way, the DC-CL framework lends itself quite readily to the incorporation of new information as knowledge of DC biology increases.
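The effect of revising a definition built from conjoined XRY statements can be sketched as follows. Set inclusion is used here as a simplified stand-in for a reasoner, and the type names (`type_1`, `type_2`), the protein names, and the helper `inferred_parents` are hypothetical.

```python
# Each definition is a conjunction of (relation, argument) statements about
# the cell type, stored as a frozenset. All names here are hypothetical.
definitions = {
    "type_1": frozenset({("has_plasma_membrane_part", "protein_A")}),
    "type_2": frozenset({
        ("has_plasma_membrane_part", "protein_A"),
        ("lacks", "protein_B"),
    }),
}


def inferred_parents(defs):
    """Map each type to the types whose definitions it strictly extends.

    A type that satisfies every statement of another definition, plus at
    least one more, is inferred to be a subtype of that other type.
    """
    return {
        name: sorted(
            other
            for other, other_stmts in defs.items()
            if other != name and other_stmts < stmts
        )
        for name, stmts in defs.items()
    }


# Revising a definition: add a newly discovered marker to type_1.
revised = dict(definitions)
revised["type_1"] = definitions["type_1"] | {
    ("has_plasma_membrane_part", "protein_C")
}
```

Before the revision, `type_2` is inferred to fall under `type_1`. After `protein_C` is added to the definition of `type_1`, that inference no longer holds, which is the kind of consequence a reasoner would surface for the curator.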
We have built our representations of cell types in DC-CL by relating terms in the domains covered by the OBO Foundry ontologies using relations from the Foundry's relation ontology (RO) and creating new relations as needed. The OBO Foundry [6] was created in 2006 by a group of developers of OBO ontologies on the basis of an evolving set of principles designed to foster the pursuit of best practice in ontology development [13]. Its ontologies are designed to represent in an interoperable fashion the biomedical reality from which data are sampled. Their development within the framework of a common top-level ontology (Basic Formal Ontology, BFO [42]) and their consistent employment of a common set of relations allow Foundry ontologies to be used together as modules of a larger system.
There are many benefits to building DC-CL from OBO Foundry ontologies. In addition to the formalism underlying Foundry ontologies ensuring their support for sophisticated computation both within and between ontologies, building from Foundry ontologies means extensive use of existing ontology resources, both eliminating redundant effort and providing a significant head-start to ontology development. By building on OBO Foundry ontologies, DC-CL is automatically interoperable with other ontologies that also build from Foundry ontologies and with the large information resources, such as UniProt, that use Foundry ontologies for their annotations, representing a wide base of existing annotations. Finally, as OBO Foundry ontologies, and in particular GO, are widely used, use of Foundry ontologies in constructing DC-CL improves the chances that DC-CL will be accepted by the biological ontology and database communities.
DC-CL will serve as a valuable information resource, not only providing centralized access to existing information about DCs but also providing standardized representations that allow algorithmic processing for data analysis and the testing of hypotheses. The consistent use of formally defined relations means that reasoners, such as those included in ontology editing software like OBO-Edit and Protégé, can be reliably applied to DC-CL. In addition, representing information in DC-CL in the form of XRY statements, rather than in natural language definitions, means that DC-CL can be easily parsed, facilitating the implementation of custom algorithms for querying DC-CL or analyzing data annotated in its terms. For example, DC-CL can be queried for the list of proteins expressed by a certain cell type, for the list of cells that express a particular combination of proteins, or for the types of cells that participate in a particular process or have a particular function. We are currently working to integrate DC-CL into software designed for the analysis of flow cytometry data and to assess the ways in which the use of DC-CL can enhance flow data analysis.
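The kinds of queries described above can be sketched over a list of parsed XRY statements. The tab-separated input format, the placeholder names (`cell_A`, `protein_1`, `process_X`), and the helper functions are assumptions made for illustration; they do not describe the actual DC-CL file format or any existing tool.

```python
from collections import defaultdict

# Hypothetical statements, one "X<TAB>R<TAB>Y" triple per line.
STATEMENTS = """\
cell_A\thas_plasma_membrane_part\tprotein_1
cell_A\thas_plasma_membrane_part\tprotein_2
cell_B\thas_plasma_membrane_part\tprotein_1
cell_B\tparticipates_in\tprocess_X
"""


def parse(text):
    """Split each non-empty line into an (X, R, Y) triple."""
    triples = []
    for line in text.splitlines():
        if line.strip():
            x, r, y = line.split("\t")
            triples.append((x, r, y))
    return triples


def objects_of(triples, subject, relation):
    """List every Y such that (subject, relation, Y) is asserted."""
    return sorted(y for x, r, y in triples if x == subject and r == relation)


def subjects_with_all(triples, relation, objects):
    """List every X related by `relation` to all of the given objects."""
    found = defaultdict(set)
    for x, r, y in triples:
        if r == relation:
            found[x].add(y)
    wanted = set(objects)
    return sorted(x for x, ys in found.items() if wanted <= ys)
```

For instance, `objects_of(parse(STATEMENTS), "cell_A", "has_plasma_membrane_part")` lists the proteins asserted for one cell type, while `subjects_with_all(parse(STATEMENTS), "participates_in", ["process_X"])` lists the cell types asserted to participate in a process.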