We demonstrate how a classical taxonomic description of a new species can be enhanced by applying new-generation molecular methods and novel computing and imaging technologies. A cave-dwelling centipede, Eupolybothrus cavernicolus Komerički & Stoev sp. n. (Chilopoda: Lithobiomorpha: Lithobiidae), found in a remote karst region in Knin, Croatia, is the first eukaryotic species for which, in addition to the traditional morphological description, we provide a fully sequenced transcriptome, a DNA barcode, detailed anatomical X-ray microtomography (micro-CT) scans, and a movie of the living specimen to document important traits of its ex-situ behaviour. By employing micro-CT scanning in a new species for the first time, we create a high-resolution morphological and anatomical dataset that allows virtual reconstructions of the specimen and subsequent interactive manipulation to test the recently introduced ‘cybertype’ notion. In addition, the transcriptome comprises a total of 67,785 scaffolds, with an average length of 812 bp and an N50 of 1,448 bp (see GigaDB). Subsequent annotation of 22,866 scaffolds was conducted by tracing homologs against currently available databases, including Nr, SwissProt and COG. This pilot project illustrates a workflow of producing, storing, publishing and disseminating large data sets associated with a description of a new taxon. All data have been deposited in publicly accessible repositories, such as GigaScience GigaDB, NCBI, BOLD, Morphbank and Morphosource, and the respective open licenses used ensure their accessibility and re-usability.
Cybertaxonomy; gene sequence data; micro-CT; data integration; molecular systematics; caves; Croatia; biospeleology
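The N50 value quoted for the transcriptome scaffolds can be illustrated with a minimal sketch. N50 is the scaffold length L such that scaffolds of length L or longer together contain at least half of all assembled bases. The function name `n50` and the toy lengths below are assumptions made for illustration only; they are not the authors' pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length of the scaffold at which the running total,
    taken over scaffolds sorted from longest to shortest, first
    reaches half of the total assembled length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in bp (not taken from the GigaDB dataset).
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)
```

Because N50 weights long scaffolds more heavily than a plain mean does, an assembly can have an N50 well above its average scaffold length, as in the figures reported above.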
With the publication of the first eukaryotic species description combining transcriptomic, DNA barcoding and micro-CT imaging data, GigaScience and Pensoft demonstrate how the classical taxonomic description of a new species can be enhanced by applying new-generation molecular methods and novel computing and imaging technologies. This ‘holistic’ approach to the taxonomic description of a new species of cave-dwelling centipede is published in the Biodiversity Data Journal (BDJ), with coordinated data release in the GigaScience GigaDB database.
cybertaxonomy; open access publishing; semantic content; XML markup
The paper describes a pilot project to convert a conventional floristic checklist, written in a standard word processing program, into structured data in the Darwin Core Archive format. After peer-review and editorial acceptance, the final revised version of the checklist was converted into Darwin Core Archive by means of regular expressions and published thereafter in both human-readable form as traditional botanical publication and Darwin Core Archive data files. The data were published and indexed through the Global Biodiversity Information Facility (GBIF) Integrated Publishing Toolkit (IPT) and significant portions of the text of the paper were used to describe the metadata on IPT. After publication, the data will become available through the GBIF infrastructure and can be re-used on their own or collated with other data.
Data mining; taxonomic checklists; Darwin Core Archive
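The conversion of checklist text into structured records by means of regular expressions can be sketched as follows. This is a minimal illustration under stated assumptions: the input line format ("Genus epithet Authorship"), the pattern `ENTRY` and the helpers `parse_entry` and `to_csv` are hypothetical and are not the expressions used in the pilot project. The dictionary keys are standard Darwin Core term names. A full Darwin Core Archive would additionally need a descriptor file and metadata, which are not shown.

```python
import csv
import io
import re

# Assumed (hypothetical) checklist line format: "Genus epithet Authorship".
ENTRY = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)\s+(?P<epithet>[a-z-]+)\s+(?P<author>.+)$"
)

FIELDS = [
    "scientificName",
    "genus",
    "specificEpithet",
    "scientificNameAuthorship",
    "taxonRank",
]


def parse_entry(line):
    """Map one checklist line to Darwin Core terms, or None if it does not match."""
    m = ENTRY.match(line.strip())
    if m is None:
        return None
    return {
        "scientificName": f"{m['genus']} {m['epithet']} {m['author']}",
        "genus": m["genus"],
        "specificEpithet": m["epithet"],
        "scientificNameAuthorship": m["author"],
        "taxonRank": "species",
    }


def to_csv(lines):
    """Write all matching lines as a CSV table suitable for a core data file."""
    rows = [row for row in map(parse_entry, lines) if row is not None]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

A pattern this simple would misread ordinary sentences that happen to begin with a capitalised word, so in practice the expressions have to be tuned to the layout conventions of the specific checklist.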
Data are the evidentiary basis for scientific hypotheses, analyses and publication, for policy formation and for decision-making. They are essential to the evaluation and testing of results by peer scientists both present and future. There is broad consensus in the scientific and conservation communities that data should be freely, openly available in a sustained, persistent and secure way, and thus standards for 'free' and 'open' access to data have become well developed in recent years. Nevertheless, the question of effective access to data remains highly problematic.
Specifically with respect to scientific publishing, the ability to critically evaluate a published scientific hypothesis or scientific report is contingent on the examination, analysis, evaluation - and if feasible - on the re-generation of data on which conclusions are based. It is not coincidental that in the recent 'climategate' controversies, the quality and integrity of data and their analytical treatment were central to the debate. There is recent evidence that even when scientific data are requested for evaluation they may not be available. The history of dissemination of scientific results has been marked by paradigm shifts driven by the emergence of new technologies. In recent decades, the advance of computer-based technology linked to global communications networks has created the potential for broader and more consistent dissemination of scientific information and data. Yet, in this digital era, scientists and conservationists, organizations and institutions have often been slow to make data available. Community studies suggest that the withholding of data can be attributed to a lack of awareness, to a lack of technical capacity, to concerns that data should be withheld for reasons of perceived personal or organizational self interest, or to lack of adequate mechanisms for attribution.
There is a clear need for institutionalization of a 'data publishing framework' that can address sociocultural, technical-infrastructural, policy, political and legal constraints, as well as addressing issues of sustainability and financial support. To address these aspects of a data publishing framework - a systematic, standard approach to the formal definition and public disclosure of data - in the context of biodiversity data, the Global Biodiversity Information Facility (GBIF, the single inter-governmental body most clearly mandated to undertake such an effort) convened a Data Publishing Framework Task Group. We conceive this data publishing framework as an environment conducive to ensuring free and open access to the world's biodiversity data. Here, we present the recommendations of that Task Group, which are intended to encourage free and open access to the world's biodiversity data.
Free and open access to primary biodiversity data is essential for informed decision-making to achieve conservation of biodiversity and sustainable development. However, primary biodiversity data are neither easily accessible nor discoverable. Among several impediments, one is a lack of incentives for data publishers to publish their data resources. One such mechanism currently lacking is recognition through conventional scholarly publication of enriched metadata, which should ensure rapid discovery of 'fit-for-use' biodiversity data resources.
We review the state of the art of data discovery options and the mechanisms in place for incentivizing data publishers' efforts towards easy, efficient and enhanced publishing, dissemination, sharing and re-use of biodiversity data. We propose the establishment of the 'biodiversity data paper' as one possible mechanism to offer scholarly recognition for efforts and investment by data publishers in authoring rich metadata and publishing them as citable academic papers. While detailing the benefits to data publishers, we describe the objectives, work flow and outcomes of the pilot project commissioned by the Global Biodiversity Information Facility in collaboration with scholarly publishers and pioneered by Pensoft Publishers through its journals ZooKeys, PhytoKeys, MycoKeys, BioRisk, NeoBiota, Nature Conservation and the forthcoming Biodiversity Data Journal. We then debate further enhancements of the data paper beyond the pilot project and attempt to forecast the future uptake of data papers as an incentivization mechanism by the stakeholder communities.
We believe that in addition to recognition for those involved in the data publishing enterprise, data papers will also expedite publishing of fit-for-use biodiversity data resources. However, uptake and establishment of the data paper as a potential mechanism of scholarly recognition requires a high degree of commitment and investment by the cross-sectional stakeholder communities.
We review the three XML schemas most widely used to mark up taxonomic texts: TaxonX, TaxPub and taXMLit. These are described from the viewpoint of their development history, current status, implementation, and use cases. The concept of “taxon treatment” from the viewpoint of taxonomy mark-up into XML is discussed. TaxonX and taXMLit are primarily designed for legacy literature, the former being more lightweight and with a focus on recovery of taxon treatments, the latter providing a much more detailed set of tags to facilitate data extraction and analysis. TaxPub is an extension of the National Library of Medicine Document Type Definition (NLM DTD) for taxonomy focussed on layout and recovery and, as such, is best suited for mark-up of new publications and their archiving in PubMedCentral. All three schemas have their advantages and shortcomings and can be used for different purposes.
mark-up; XML schema; taxonomy; TaxonX; TaxPub; taXMLit
The Creative Commons (CC) licenses are a suite of copyright-based licenses defining terms for the distribution and re-use of creative works. CC provides licenses for different use cases and includes open content licenses such as the Attribution license (CC BY, used by many Open Access scientific publishers) and the Attribution Share Alike license (CC BY-SA, used by Wikipedia, for example). However, the license suite also contains non-free and non-open licenses like those containing a “non-commercial” (NC) condition. Although many people identify “non-commercial” with “non-profit”, detailed analysis reveals that significant differences exist and that the license may impose some unexpected re-use limitations on works thus licensed. After providing background information on the concepts of Creative Commons licenses in general, this contribution focuses on the NC condition, its advantages, disadvantages and appropriate scope. Specifically, it contributes material towards a risk analysis for potential re-users of NC-licensed works.
Creative Commons; Open Access; Open Content; Licensing; Non-profit; Open Educational Resources; Data Sharing; Software Licenses; Europeana
This paper discusses the design and implementation of a citizen science pilot project, COMBER (Citizens’ Network for the Observation of Marine BiodivERsity, http://www.comber.hcmr.gr), which has been initiated under the ViBRANT EU e-infrastructure. It is designed and implemented for divers and snorkelers who are interested in participating in marine biodiversity citizen science projects. It shows the necessity of engaging the broader community in marine biodiversity monitoring and research projects, networks and initiatives. It analyses the stakeholders, the industry and the relevant markets involved in diving activities and their potential to sustain these activities. The principles upon which this project is based, including data policy and rewards for the participating divers through their own data, are thoroughly discussed. The results of the user analysis and lessons learned so far are presented. Future plans include promotion, links with citizen science web developments, data publishing tools, and development of new scientific hypotheses to be tested by the data collected so far.
Citizen science; marine biodiversity; SCUBA diving; data collection and publication; sustainability
Scholarly publishing and citation practices have developed largely in the absence of versioned documents. The digital age requires new practices to combine the old and the new. We describe how the original published source and a versioned wiki page based on it can be reconciled and combined into a single citation reference. We illustrate the citation mechanism by way of practical examples focusing on journal and wiki publishing of taxon treatments. Specifically, we discuss mechanisms for permanent cross-linking between the static original publication and the dynamic, versioned wiki, as well as for automated export of journal content to the wiki, to reduce the workload on authors, for combining the journal and the wiki citation and for integrating it with the attribution of wiki contributors.
The paper describes the focus, scope and the rationale of PhytoKeys, a newly established, peer-reviewed, open-access journal in plant systematics. PhytoKeys is launched to respond to four main challenges of our time: (1) Appearance of electronic publications as amendments or even alternatives to paper publications; (2) Open Access (OA) as a new publishing model; (3) Linkage of electronic registers, indices and aggregators that summarize information on biological species through taxonomic names or their persistent identifiers (Globally Unique Identifiers or GUIDs; currently Life Science Identifiers or LSIDs); (4) Web 2.0 technologies that permit the semantic markup of, and semantic enhancements to, published biological texts. The journal will pursue cutting-edge technologies in publication and dissemination of biodiversity information while strictly following the requirements of the current International Code of Botanical Nomenclature (ICBN).
E-publications; open access; semantic tagging; semantic enhancements; plant systematics
The centipede genus Eupolybothrus Verhoeff, 1907 in North Africa is revised. A new cavernicolous species, Eupolybothrus kahfi Stoev & Akkari, sp. n., is described from a cave in Jebel Zaghouan, northeast Tunisia. Morphologically, it is most closely related to Eupolybothrus nudicornis (Gervais, 1837) from North Africa and Southwest Europe but can be readily distinguished by the long antennae and leg-pair 15, a conical dorso-median protuberance emerging from the posterior part of prefemur 15, and the shape of the male first genital sternite. Molecular sequence data from the cytochrome c oxidase I gene (mtDNA–5’ COI-barcoding fragment) exhibit 19.19% divergence between Eupolybothrus kahfi and Eupolybothrus nudicornis, an interspecific value comparable to those observed among four other species of Eupolybothrus which, combined with a low intraspecific divergence (0.3–1.14%), supports the morphological diagnosis of Eupolybothrus kahfi as a separate species. This is the first troglomorphic myriapod to be found in Tunisia, and the second troglomorph lithobiomorph centipede known from North Africa. Eupolybothrus nudicornis is redescribed based on abundant material from Tunisia, and its post-embryonic development, distribution and habitat preferences are recorded. Eupolybothrus cloudsley-thompsoni Turk, 1955, a nominal species based on Tunisian type material, is placed in synonymy with Eupolybothrus nudicornis. To comply with the latest technological developments in publishing of biological information, the paper implements new approaches in cybertaxonomy, such as fine granularity XML tagging validated against the NLM DTD TaxPub for PubMedCentral and dissemination in XML to various aggregators (GBIF, EOL, Wikipedia), visualisation of all taxa mentioned in the text via the dynamically created Pensoft Taxon Profile (PTP) page, data publishing, georeferencing of all localities via Google Earth, and ZooBank, GenBank and MorphBank registration of datasets. An interactive key to all valid species of Eupolybothrus is made with DELTA software.
Eupolybothrus kahfi sp. n.; Eupolybothrus nudicornis; North Africa; barcoding; cytochrome c oxidase I gene; troglomorphism; habitat preferences; interactive key; cybertaxonomy; semantic tagging; semantic enhancements
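How a percentage divergence between two aligned barcode sequences can be computed is sketched below. The abstract does not state which distance model was used, so this example assumes the simplest one, the uncorrected p-distance (the proportion of differing sites among sites where both sequences have an unambiguous base). The function name `p_distance` and the toy sequences are hypothetical and are not the published COI data.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected p-distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are
    skipped, so the result is differing sites / comparable sites.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in valid or b not in valid:
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Hypothetical toy alignment: one difference in eight comparable sites.
toy_divergence_percent = 100 * p_distance("ACGTACGT", "ACGTACGA")
```

Model-corrected distances (for example Kimura two-parameter) give somewhat larger values than the p-distance at high divergence, so the two should not be compared directly.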
We describe a method to publish nomenclatural acts described in taxonomic websites (Scratchpads) that are formally registered through publication in a printed journal (ZooKeys). This method is fully compliant with the zoological nomenclatural code. Our approach supports manuscript creation (via a Scratchpad), electronic act registration (via ZooBank), online and print publication (in the journal ZooKeys) and simultaneous dissemination (ZooKeys and Scratchpads) for nomenclatural acts, including new species descriptions. The workflow supports the generation of manuscripts directly from a database and is illustrated by two sample papers published in the present issue.
Online publishing; taxonomy; nomenclature; ICZN; ICBN
The concept of semantic tagging and its potential for semantic enhancements to taxonomic papers is outlined and illustrated by four exemplar papers published in the present issue of ZooKeys. The four papers were created in different ways: (i) written in Microsoft Word and submitted as a non-tagged manuscript (doi: 10.3897/zookeys.50.504); (ii) generated from Scratchpads and submitted as XML-tagged manuscripts (doi: 10.3897/zookeys.50.505 and doi: 10.3897/zookeys.50.506); (iii) generated from an author’s database (doi: 10.3897/zookeys.50.485) and submitted as an XML-tagged manuscript. XML tagging and semantic enhancements were implemented during the editorial process of ZooKeys using the Pensoft Mark Up Tool (PMT), specially designed for this purpose. The XML schema used was TaxPub, an extension to the Document Type Definitions (DTD) of the US National Library of Medicine Journal Archiving and Interchange Tag Suite (NLM). The following innovative methods of tagging, layout, publishing and disseminating the content were tested and implemented within the ZooKeys editorial workflow: (1) highly automated, fine-grained XML tagging based on TaxPub; (2) final XML output of the paper validated against the NLM DTD for archiving in PubMedCentral; (3) bibliographic metadata embedded in the PDF through XMP (Extensible Metadata Platform); (4) PDF uploaded after publication to the Biodiversity Heritage Library (BHL); (5) taxon treatments supplied through XML to Plazi; (6) semantically enhanced HTML version of the paper encompassing numerous internal and external links and linkouts, such as: (i) visualisation of main tag elements within the text (e.g., taxon names, taxon treatments, localities, etc.); (ii) internal cross-linking between paper sections, citations, references, tables, and figures; (iii) mapping of localities listed in the whole paper or within separate taxon treatments; (v) taxon names autotagged, dynamically mapped and linked through the Pensoft Taxon Profile (PTP) to large international database services and indexers such as Global Biodiversity Information Facility (GBIF), National Center for Biotechnology Information (NCBI), Barcode of Life (BOLD), Encyclopedia of Life (EOL), ZooBank, Wikipedia, Wikispecies, Wikimedia, and others; (vi) GenBank accession numbers autotagged and linked to NCBI; (vii) external links of taxon names to references in PubMed, Google Scholar, Biodiversity Heritage Library and other sources. With the launching of the working example, ZooKeys becomes the first taxonomic journal to provide a complete XML-based editorial, publication and dissemination workflow implemented as a routine and cost-efficient practice. It is anticipated that the XML-based workflow will also soon be implemented in botany through PhytoKeys, a forthcoming partner journal of ZooKeys. The semantic markup and enhancements are expected to greatly extend and accelerate the way taxonomic information is published, disseminated and used.
Semantic tagging; semantic enhancements; systematics; taxonomy