The concept of the “taxon treatment” is used by the Plazi team to model taxonomic publications and to explore how much of the text tagging can be done by machine, either before or after publication. Following the traditions of taxonomic paper publishing and an initial definition for the electronic form (Sautter et al. 2007), a taxon treatment can include a formal description of a taxon, with sections on nomenclature, morphological characteristics, behavior, ecology, distribution, and specimens examined.
The launch of the electronic taxon treatment concept played a key role in the development of taxonomic tagging methodology. Moreover, it is expected that its influence will increase in the near future. Thus, we consider it necessary to describe the concept here in more detail.
From the text-processing perspective, a taxon treatment is any “block of text” that contains information on a given taxon and can be delimited from other taxon treatments within the same document by specifying the treatment’s start and end tags. From the viewpoint of the publishing tradition in systematics, the treatment is a block of information on a given taxon that may include one or more of the following:
1. New taxon description
2. Change in the nomenclatural status of a taxon (a nomenclatural act)
3. Summary of all previous knowledge on a taxon from literature sources, usually structured in logical pieces, e.g., nomenclature, morphological description, distribution, ecology, biology
4. Summary of all previous knowledge plus newly published data on the same taxon, e.g., localities, ecological/biological observations
5. Summary of newly published data on an already known taxon
6. Summary of treatments of subordinated taxa, for instance a revision or catalog of a genus listing treatments of ALL or SOME of its species is a treatment of that genus
7. Listing of subordinated taxa, e.g., a checklist of a family from a region forms a treatment of that family.
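The text-processing definition given above (a block of text delimited by start and end tags) can be illustrated with a minimal sketch. It assumes a hypothetical, simplified markup in which each treatment is wrapped in a `<treatment>` element carrying a `taxon` attribute. These names, and the placeholder taxa “Aus” and “Aus bus”, are invented for illustration and are not the actual TaxonX or TaxPub vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the element and attribute names are
# illustrative assumptions, not the real TaxonX or TaxPub vocabulary.
DOCUMENT = """
<article>
  <p>Introductory text that belongs to no treatment.</p>
  <treatment taxon="Aus">
    <p>Checklist of the genus.</p>
  </treatment>
  <treatment taxon="Aus bus">
    <p>Description of a new species.</p>
  </treatment>
</article>
"""


def split_treatments(xml_text):
    """Return (taxon, text) pairs, one per delimited treatment block."""
    root = ET.fromstring(xml_text)
    result = []
    for node in root.iter("treatment"):
        # Collect all text inside the block and collapse whitespace.
        text = " ".join("".join(node.itertext()).split())
        result.append((node.get("taxon"), text))
    return result


treatments = split_treatments(DOCUMENT)
for taxon, text in treatments:
    print(taxon, "->", text)
```

In this sketch the introductory paragraph is ignored because it lies outside any start and end tag, whereas each tagged block can be archived or searched on its own.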
Taxon treatments usually have the form of published conventional texts that could be enhanced by a wide array of tags and external links. More importantly, taxon treatments may be archived, searched, harvested, or linked as separate pieces of information directly related to their respective taxa.
A publication may consist of one or many treatments of different taxa of different taxonomic ranks. One taxon may have more than one treatment within a publication, although the tradition of systematics publishing usually assumes one “core” treatment per taxon within a document.
Taxon profiles generated “on the fly” or extracted through web “scrapers” have several features of treatments (e.g., EOL, NCBI, Wikipedia, or ispecies.org taxon profiles). To be called treatments, however, they have to be published in a static and citable form. It seems necessary to distinguish these two types of taxon profiles (published and dynamic, generated on the fly), although the border between them may sometimes seem vague. The essential feature of a treatment is that it encompasses information published in accordance with both present-day publishing standards and the requirements of nomenclatural codes.
What is not a taxon treatment?
1. A citation of a taxon name within a text, although such a citation usually holds information linked to the particular taxon. For instance, listing of a species within a “plain” checklist cannot be a treatment of that species; a sentence within a text paragraph stating that “taxon X is parasitic on taxon Y” is neither a treatment of taxon X nor of taxon Y
2. A key, because in some cases keys are constructed for related taxa that do not form a taxon (they may form a “species-group” or “taxa-group”, but this is not a taxon unless a name is given to that group). Identification keys, even if they are exhaustive for a named taxon, are usually tagged separately from taxon treatments.
3. A single picture or group of pictures of a taxon
4. A single map or group of maps of a taxon
5. Gene sequence(s) of a taxon
6. SDD (Structured Descriptive Data) or any other matrices, raw data, or databases. Treatments can be generated relatively easily from databases; however, information on a taxon becomes a treatment only when (a) it is published, and (b) it corresponds to the aforementioned definition of a taxon treatment.
The TaxonX schema and the TaxPub DTD largely follow the above restrictions, which arise from a community of practice rooted in paper publishing. In the electronic era, broader notions of a treatment can easily be added to the electronic forms by simple extension of the schema or DTD, in ways that do not render useless the publications that use the narrower form.
Why are taxonomic treatments important? What role do they play in various disciplines? Taxonomic treatments are important because they allow “atomising” taxonomic texts; that is, they make it possible to label a piece of information (e.g., a block of text) linked to a taxon and to delimit it from similar pieces of information linked to other taxa within the same document. Taxonomic treatments allow a rapid transition from conventional, article-level publishing in biodiversity science to treatment-level (or content- or data-level) taxonomic publishing. XML-encoded taxonomic treatments facilitate future use, re-use and collation (harvesting and indexing, mashups, linkouts) of data, because computers can recognise data elements within treatments and relate such data to taxon names.
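How a computer might recognise data elements within a treatment and relate them to a taxon name can be sketched as follows. The markup is again a hypothetical simplification: the `<name>` and `<section type="...">` elements and the placeholder taxon “Aus bus” are assumptions made for illustration, not elements of TaxonX or TaxPub.

```python
import xml.etree.ElementTree as ET

# Hypothetical treatment markup; element names and content are invented.
TREATMENT = """
<treatment>
  <name>Aus bus</name>
  <section type="description">Body length 5 mm.</section>
  <section type="distribution">Known only from the type locality.</section>
</treatment>
"""


def atomise(xml_text):
    """Turn one treatment into records that each carry the taxon name."""
    node = ET.fromstring(xml_text)
    taxon = (node.findtext("name") or "").strip()
    return [
        {
            "taxon": taxon,
            "type": section.get("type"),
            "text": (section.text or "").strip(),
        }
        for section in node.findall("section")
    ]


records = atomise(TREATMENT)
for record in records:
    print(record)
```

Each record is a self-contained unit that can be harvested or indexed separately, yet remains tied to its taxon name.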
Taxonomic treatments are important because they allow mobilization, retrieval and re-use of any and all taxonomic data published not only in the present day, but also in historical taxonomic literature. Recent and historical treatments can be interlinked through taxon names.
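Interlinking of recent and historical treatments through taxon names can be pictured as a simple index keyed by name. The sketch below uses invented records and matches names by exact string comparison after whitespace normalisation. This is a deliberate simplification: it does not attempt to reconcile synonyms or spelling variants, which a real name service would need to handle.

```python
from collections import defaultdict

# Hypothetical treatment records, invented for illustration only.
TREATMENTS = [
    {"taxon": "Aus  bus", "source": "Recent revision", "year": 2009},
    {"taxon": "Aus bus", "source": "Historical monograph", "year": 1905},
    {"taxon": "Aus cus", "source": "Recent revision", "year": 2009},
]


def normalise(name):
    """Collapse runs of whitespace so trivially different spellings match."""
    return " ".join(name.split())


def link_by_name(records):
    """Group treatments by taxon name, oldest first within each name."""
    index = defaultdict(list)
    for record in sorted(records, key=lambda r: r["year"]):
        index[normalise(record["taxon"])].append(
            (record["year"], record["source"])
        )
    return dict(index)


links = link_by_name(TREATMENTS)
for name, sources in links.items():
    print(name, sources)
```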
Finally, treatments are important because they relate information on organisms, in a straightforward way, to the oldest and most widely used identifiers in the history of biology: the taxonomic names of organisms. Through names, and especially through the recently developed global index of taxon names (Global Names Architecture, or GNA; Global Names Index, or GNI; Global Names Usage Bank, or GNUB; see http://www.globalnames.org), treatments may be linked to any other information in any other branch of science that uses taxonomic names.
To facilitate the “atomizing” of taxonomic texts into retrievable and machine-readable forms, taxonomic publishing needs a computer language, such as XML (see above for more details), together with sets of rules and protocols. TaxonX is a lightweight XML markup schema developed to encode historical, or legacy, taxonomic literature. It is therefore robust enough to handle the great variety of styles used in such literature. TaxPub was developed as an extension of the general Document Type Definitions (DTD) format of the National Library of Medicine of the US (NLM, http://dtd.nlm.nih.gov) to facilitate markup of prospective taxonomic publishing.