Naming conventions for ontology engineering do not necessarily apply to other domains. For example, our recommendation "1.2 Use context independent names" (see Table ) will not make sense in the domain of database schemata or object-oriented programming. Terms from ontologies can be used in annotations outside the ontological context, whereas a Java class is always situated in a class library hierarchy and embedded in code, which provides its full context; its name therefore does not need to be fully explicit. However, general naming conventions such as "1. Be clear and unambiguous" and "2. Be univocal" can be applied in database schema generation, class naming in object-oriented programming, natural language generation, and even Wikipedia article naming. Formulating universally applicable naming conventions in the bio-ontology space is no easy task owing to the multidimensional complexity of the area, which derives not least from its intrinsically interdisciplinary character. Therefore, although we have carried out a comprehensive survey of existing naming convention documents in different domains [10], we have deliberately confined ourselves here to considering the needs of the OBO Foundry community.
Even when conventions have been established, their application may be non-trivial, not least because of the exceptions that different groups will want to make to given rules. In cases where the conventions cannot be strictly applied, common sense should be used. Here we describe some situations of this sort highlighted by our survey.
Positive names (see 2.4 in Table )
The responses to question 4.8.1 showed that most groups already try to avoid negative names and names containing expressions such as 'without' or 'excluding'; yet nearly half of the survey respondents still found examples of negative names in their ontologies. It seems it can be difficult to decide when a term is negative; e.g., 'unhealthy', 'immaterial anatomical entity', 'nonlinear transformation', 'inorganic' and 'rotenone-insensitive'. The difficulty in defining the criteria for 'negative' indicates that the convention cannot be enforced strictly, but we hold that it is nonetheless a valuable guideline. Further, we recommend that explicit exclusions should not be made within names; e.g., as in 'hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in cyclic amides' (GO:0016812).
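The difficulty of deciding what counts as 'negative' can be illustrated with a minimal sketch of an automated check. The function name `flag_negative_name` and both patterns are assumptions made for illustration only; they are not part of the survey or of any OBO Foundry tool. The exclusion words are taken from the examples above, and the 'non' prefix test is a deliberately naive heuristic.

```python
import re
from typing import List

# Hypothetical patterns, chosen only to mirror the examples in the text.
EXCLUSION_PATTERN = re.compile(r"\b(?:without|excluding|but not)\b", re.IGNORECASE)
# Naive prefix test: it will also flag harmless words that merely start with "non".
NEGATIVE_PREFIX_PATTERN = re.compile(r"\bnon-?\w+", re.IGNORECASE)


def flag_negative_name(name: str) -> List[str]:
    """Return warnings for a class name; an empty list means nothing was flagged."""
    warnings = []
    match = EXCLUSION_PATTERN.search(name)
    if match:
        warnings.append(f"explicit exclusion: '{match.group(0)}'")
    match = NEGATIVE_PREFIX_PATTERN.search(name)
    if match:
        warnings.append(f"possible negative prefix: '{match.group(0)}'")
    return warnings


if __name__ == "__main__":
    for example in (
        "non-eye photoreceptor cell development",
        "nonlinear transformation",
        "unhealthy",
        "rotenone-insensitive",
    ):
        print(example, "->", flag_negative_name(example))
```

Under these assumptions the sketch flags 'non-eye photoreceptor cell development' and 'nonlinear transformation' but misses 'unhealthy' and 'rotenone-insensitive', which is consistent with the observation that the convention can serve as a guideline but cannot be enforced strictly by lexical rules alone.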
Word separator (see 3.3 in Table )
We recommend the use of white space as the word separator in editor-preferred names. The default behaviour of the Protégé 3.x Editor encourages the use of the rdf:ID field to capture class names. Since this field cannot contain spaces, developers using Protégé often use the underscore as a word separator. This can be remedied by recording editor-preferred names in the rdfs:label field rather than in the rdf:ID field.
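Where names have already been recorded with underscores, a white-space label can be derived mechanically. The following is a minimal sketch; the function name `label_from_identifier` is hypothetical, and the sketch assumes that underscores are used only as word separators in the existing names.

```python
def label_from_identifier(identifier: str) -> str:
    """Derive a white-space-separated label from an underscore-separated name.

    Empty fragments (from leading, trailing or doubled underscores) are dropped.
    """
    return " ".join(part for part in identifier.split("_") if part)


if __name__ == "__main__":
    print(label_from_identifier("cell_nucleus"))
    print(label_from_identifier("to_be_fixed"))
```

The derived string would be stored as the rdfs:label, while the original underscore form can remain in the rdf:ID field.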
Expand Abbreviations (see 3.4 in Table )
When an abbreviation or acronym becomes more commonly used in everyday language than its full name, for example 'LASER', then it should be used as the name, with its expanded name captured as a synonym. In other words, usage frequency can take precedence over the rule of acronym avoidance.
Special character formatting and symbols (see 3.5 in Table )
The survey revealed that ontologies dealing with chemicals and using the IUPAC nomenclature need to apply character formatting to their names for purposes of semantic disambiguation. In ChEBI for example the full chemical name is represented with unrestricted character formatting, for example: CHEBI 30666: bis [tricarbonyl(η5-cyclopentadienyl)molybdenum](Mo-Mo). Since character formatting is not supported by most ontology editors and languages, the groups involved often develop specific tools to meet their requirements. For this reason ChEBI and the Systems Biology ontology have developed front ends built on top of relational databases to manage their ontologies. Defined character transformation rules can be used to encode special formatting for example as has been done by the Biological Imaging Methods Ontology, which uses  for superscripts and [] for subscripts. In general these should be avoided.
Benefits and applications
The application of common naming guidelines will:
• enhance communication between geographically dispersed developers
• simplify stand-alone ontology development and help in subsequent administration tasks
• simplify ontology networking; e.g., importing and using classes from external ontologies or imported ontology modules
• increase the accessibility and exportability of terms, facilitating re-use and reducing redundant development.
By increasing the robustness of ontology class names, a standard naming convention will:
• support the manual and automated integration (i.e., comparison, orthogonality-checking, alignment and mapping) of terminological artifacts
• facilitate access to ontologies through meta-tools such as the NCBO BioPortal by reducing the diversity with which these tools have to deal, thus reducing the burden on tool and ontology developers alike
• increase the robustness of context-based text mining for automatic term recognition and text annotation.
The proposed set of conventions is currently being applied by the Ontology for Biomedical Investigation (OBI) project [20] and by the Proteomics Standards Initiative (PSI) [21] and MSI ontology working groups. An example that illustrates how syntactic normalization enhances readability and navigability of the OBI ontology class hierarchy can be found on the OBO Foundry wiki [10].
The usefulness of design principles in general, and of naming conventions in particular, increases considerably when they are supported by ontology editing tools [22]. In particular, tools should check for compliance with such conventions and provide the functionality not only to enforce, but also to exploit, convention-based naming patterns. We are pleased to observe that implementations of such functionality have already begun to appear. For example, in the OBO Edit 2 tool [23] redundant class names are indicated, and users can also define their own verification checks by specifying filters and error messages that will be displayed for each name that matches (or fails to match) the conventions defined. This verification system can serve as a framework upon which to build robust checks for conformity to naming conventions, either as a built-in OBO Edit module or as externally provided plug-ins (John Day-Richter, personal communication). In addition, tools such as OBOL that use the lexical information in class names are already being applied to find inconsistencies within and between labels, and to aid ontology integration and ontology engineering in general through the methodology of cross-products [24].
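The idea of user-defined verification checks, each consisting of a filter and an error message, can be sketched as follows. This is an illustrative assumption about how such a framework might look, not the OBO Edit implementation; the names `NameCheck`, `run_checks` and `EXAMPLE_CHECKS`, and the three example rules, are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class NameCheck:
    pattern: str              # regular expression acting as the filter
    message: str              # error message reported for each offending name
    must_match: bool = False  # False: flag names that match; True: flag names that fail to match


# Hypothetical rules loosely based on conventions discussed in this paper.
EXAMPLE_CHECKS = [
    NameCheck(r"_", "use white space, not underscore, as word separator"),
    NameCheck(r"\b(?:and|or)\b", "avoid conjunctions in names"),
    NameCheck(r"^\S(?:.*\S)?$", "no leading or trailing white space", must_match=True),
]


def run_checks(names: Iterable[str], checks: Iterable[NameCheck]) -> List[Tuple[str, str]]:
    """Return (name, message) pairs for every check that a name violates."""
    checks = list(checks)
    problems = []
    for name in names:
        for check in checks:
            matched = re.search(check.pattern, name) is not None
            # A forbidden pattern that matched, or a required pattern that did not.
            if matched != check.must_match:
                problems.append((name, check.message))
    return problems


if __name__ == "__main__":
    names = ["cell nucleus", "to_be_fixed", "actin polymerization and/or depolymerization"]
    for name, message in run_checks(names, EXAMPLE_CHECKS):
        print(f"{name!r}: {message}")
```

Keeping each rule as data rather than code means that a community could share and revise its checks without changing the tool that applies them.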
Some aspects of what we propose here mirror features of so-called Constrained Natural Languages (CNL) [25]. In particular, defined restrictions on the use of grammar and terminology can be found in CNL, and exploiting developments in this field could prove fruitful. However, we must take care not to impose too great a burden on ontology editors by requiring them to learn another full representation language. It is important to stress that having conventions for default names (using the editor-preferred name as display name) does not place restrictions on the use of less formal or colloquial names, which can and should still be captured as synonyms.
Impact on GO
As the longest established ontology in the OBO Foundry, GO has already invested effort in establishing its own naming conventions, having formerly suffered from many of the common pitfalls in naming described in this paper, for example the use of catch-all terms such as "unlocalized" and "molecular function unknown" [26]. Some of the recommendations outlined here have been inherited from the GO community, which in turn will move to incorporate this whole set of naming conventions into the GO style guide. The impact on GO will certainly be positive, especially where it is used in combination with other OBO Foundry ontologies. For example, GO is considering changing to the context-independent name "cell nucleus" (as already used in FMA) instead of "nucleus", to distinguish it from "atomic nuclei" in ChEBI. The avoidance of conjunctions in term names will lead to the decomposition of terms like "actin polymerization and/or depolymerization", and the restriction to positive names will prevent, or lead to the refactoring of, terms like "non-eye photoreceptor cell development" in GO.
The surveying process reported in this paper has been informative, and has provided evidence to support the various conventions presented herein. Furthermore, several respondents explicitly stated that the questionnaire made them aware of issues which they had not thought of previously, and in some cases went on to indicate other areas where they considered that conventions would be helpful, such as:
• A reference terminology that names the various kinds of representational units (e.g., illustrating the differences between 'type', 'class', 'term', 'concept' and 'universal'), thereby supporting unambiguous discussion of particular representational units [19].
• Conventions for other representational units, such as the names of relations, instances and identifiers. For example, OBI uses the identifier convention [group prefix] + [underscore] + [unique number] (e.g., 'OBI_0000016'); whereas BFO simply uses a 'meaningful string' (e.g., 'IndependentContinuant'). In addition, relations do not have numeric identifiers, which should probably be changed as these representational units, like classes, undergo changes and updates.
• A formalism is needed for naming and marking administrative 'helper' classes and metadata bins within ontologies. Until recently, non-ontological classes in OBI, such as 'unclassified' (OBI_200067), 'to_be_fixed' (OBI_334), 'ChEBI_objects' (OBI_336), 'PATO_quality' (OBI_302), 'collected_relations' (OBI_400132) could be found side-by-side with domain-level classes. These are now marked as helper classes by adding an underscore as prefix.
• Branch, module, file and namespace naming conventions should be investigated. This is also indicated by the recurring discussions on ontology naming conflicts on the OBO discussion mailing list.
• It needs to be investigated to what extent certain conventions depend on the degree of formality of the representational artefact at hand. Conventions regulating name compositions [24] may only be applicable to semantically granular ontologies using relations, but not to taxonomies.
• Besides our universal conventions, specialized ones for certain ontological classes of high interest, usage and abundance should be collected and evaluated. Such classes referring to 'processes', 'instruments' or 'organisations' are also called 'Named Entities' in the field of text mining.
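The OBI identifier convention mentioned in the list above, [group prefix] + [underscore] + [unique number], lends itself to a simple mechanical check. The sketch below is illustrative only; the names `ID_PATTERN` and `parse_identifier` are hypothetical, and the sketch assumes that the group prefix consists of letters only, which the text does not state.

```python
import re
from typing import Optional, Tuple

# Assumption: the prefix is alphabetic and the unique number is a run of digits.
ID_PATTERN = re.compile(r"(?P<prefix>[A-Za-z]+)_(?P<number>\d+)")


def parse_identifier(identifier: str) -> Optional[Tuple[str, str]]:
    """Split an identifier into (prefix, number), or return None if it does not fit."""
    match = ID_PATTERN.fullmatch(identifier)
    if match is None:
        return None
    return match.group("prefix"), match.group("number")


if __name__ == "__main__":
    for candidate in ("OBI_0000016", "IndependentContinuant", "to_be_fixed"):
        print(candidate, "->", parse_identifier(candidate))
```

A 'meaningful string' identifier such as 'IndependentContinuant' does not fit this pattern, which makes the difference between the two identifier styles easy to detect automatically.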
Although work on some of the above issues has already started, they remain open and important, and will be tackled in the next round of guideline development by the OBO Foundry coordinators, in collaboration with the OBO Foundry ontology developers.