The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers.
Determining the magnitude of climate change patterns across elevational gradients is essential for an improved understanding of broader climate change patterns and for predicting hydrologic and ecosystem changes. We present temperature trends from five long-term weather stations along a 2077-meter elevational transect in the Rocky Mountain Front Range of Colorado, USA. These trends were measured over two time periods: a full 56-year record (1953–2008) and a shorter 20-year record (1989–2008) representing a period of widely reported accelerating change. The rates of change of two biological indicators, season length and accumulated growing-degree days, were also measured over the 56- and 20-year records. Finally, we compared how well interpolated Parameter-elevation Regression on Independent Slopes Model (PRISM) datasets match the quality-controlled weather data from each station. Our results show that warming signals were strongest at mid-elevations over both temporal scales. Over the 56-year record, most sites show warming occurring largely through increases in maximum temperatures, while the 20-year record documents warming associated with increases in maximum temperatures at lower elevations and increases in minimum temperatures at higher elevations. Recent decades have also shown a shift from warming during springtime to warming in July and November. Warming along the gradient has contributed to increases in growing-degree days, although to differing degrees, over both temporal scales. However, the length of the growing season has remained unchanged. Finally, the actual and the PRISM-interpolated yearly rates rarely showed strong correlations, and they suggest different warming and cooling trends at most sites. Interpretation of climate trends and their seasonal biases in the Rocky Mountain Front Range depends on both elevation and the temporal scale of analysis.
Given mismatches between interpolated data and the directly measured station data, we caution against an over-reliance on interpolation methods for documenting local patterns of climatic change.
Anthropogenic effects on wildlife are typically assessed at the local level, but it is often difficult to extrapolate to larger spatial extents. Macro-level occupancy studies are one way to assess the impacts of multiple disturbance factors that might vary over different geographic extents. Here we assess anthropogenic effects on occupancy and distribution for several mammal species within the Appalachian Trail (AT), a forest corridor that extends across a broad section of the eastern United States. Utilizing camera traps and a large volunteer network of citizen scientists, we were able to sample 447 sites along a 1024 km section of the AT to assess the effects of available habitat, hunting, recreation, and roads on eight mammal species. Occupancy modeling revealed the importance of available forest to all species except opossums (Didelphis virginiana) and coyotes (Canis latrans). Hunting on adjoining lands was the second strongest predictor of occupancy for three mammal species, negatively influencing black bears (Ursus americanus) and bobcats (Lynx rufus), while positively influencing raccoons (Procyon lotor). Modeling also indicated avoidance of high trail-use areas by bears and a proclivity for such areas by red foxes (Vulpes vulpes). Roads had the lowest predictive power on species occupancy within the corridor and were significant only for deer. The occupancy models stress the importance of compounding direct and indirect anthropogenic influences operating at the regional level. Scientists and managers should consider these human impacts and their potential combined influence on wildlife persistence when assessing optimal habitat or considering management actions.
Legacy data from natural history collections contain invaluable and irreplaceable information about biodiversity in the recent past, providing a baseline for detecting change and forecasting the future of biodiversity on a human-dominated planet. However, these data are often not available in formats that facilitate use and synthesis. New approaches are needed to enhance the rates of digitization and data quality improvement. Notes from Nature provides one such novel approach by asking citizen scientists to help with transcription tasks. The initial web-based prototype of Notes from Nature will soon be widely available; it was developed collaboratively by biodiversity scientists, natural history collections staff, and experts in citizen science project development, programming, and visualization. This project brings together digital images representing different types of biodiversity records, including ledgers, herbarium sheets, and pinned insects from multiple projects and natural history collections. Experts in developing web-based citizen science applications then designed and built a platform for transcribing textual data and metadata from these images. The end product is a fully open-source web transcription tool built using the latest web technologies. The platform keeps volunteers engaged by initially explaining the scientific importance of the work via a short orientation, and then providing transcription “missions” of well-defined scope, along with dynamic feedback, interactivity, and rewards. Transcribed records, along with record-level and process metadata, are provided back to the institutions. While the tool is being developed with new users in mind, it can serve a broad range of needs, from the novice to the trained museum specialist. Notes from Nature has the potential to increase the rate at which biodiversity data are made available to a broad community of users.
Natural History Museums; Biodiversity; Open Source; Museum Collections; Citizen Science; Digitization; Transcription
Part diary, part scientific record, biological field notebooks often contain details necessary for understanding the locations and environmental conditions present during collecting events. Despite their clear value for (and recent use in) global change studies, the text-mining outputs from field notebooks have been idiosyncratic to specific research projects and impossible to discover or re-use. Best practices and workflows for digitization, transcription, extraction, and integration with other sources are nascent or non-existent. In this paper, we demonstrate a workflow to generate structured outputs while also maintaining links to the original texts. The first step in this workflow was to place already digitized and transcribed field notebooks from the University of Colorado Museum of Natural History founder, Junius Henderson, on Wikisource, an open text transcription platform. Next, we created Wikisource templates to document places, dates, and taxa to facilitate annotation and wiki-linking. We then requested help from the public, through social media tools, to take advantage of volunteer efforts and energy. After three notebooks were fully annotated, content was converted into XML, and annotations were extracted and cross-walked into Darwin Core-compliant record sets. Finally, these record sets were vetted, to provide valid taxon names, via a process we call “taxonomic referencing.” The result is the identification and mobilization of 1,068 observations from three of Henderson’s thirteen notebooks and a publishable Darwin Core record set for use in other analyses. Although challenges remain, this work demonstrates a feasible approach to unlocking observations from field notebooks that enhances their discovery and interoperability without losing the narrative context from which those observations are drawn.
“Compose your notes as if you were writing a letter to someone a century in the future.”
Perrine and Patton (2011)
Field notes; notebooks; crowdsourcing; digitization; biodiversity; transcription; text-mining; Darwin Core; Junius Henderson; annotation; taxonomic referencing; natural history; Wikisource; Colorado; species occurrence records
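The cross-walk step described above, from annotated notebook text to Darwin Core-compliant records, can be sketched as a simple field mapping. This is an illustrative outline only: the input field names (`date`, `place`, `taxon`, `note`) and the helper `to_darwin_core` are invented for this example, while the output keys are genuine Darwin Core terms.

```python
# Hypothetical sketch: cross-walk one wiki-annotated field-notebook
# observation into a flat Darwin Core record. Input field names are
# invented; output keys (eventDate, locality, scientificName, ...)
# are real Darwin Core terms.

def to_darwin_core(annotation):
    """Map one annotated observation to a Simple Darwin Core record."""
    return {
        "basisOfRecord": "HumanObservation",
        "eventDate": annotation["date"],         # e.g. "1905-06-10"
        "locality": annotation["place"],         # verbatim place string
        "scientificName": annotation["taxon"],   # vetted via taxonomic referencing
        "occurrenceRemarks": annotation["note"], # narrative context retained
        "recordedBy": "Junius Henderson",
    }

obs = {
    "date": "1905-06-10",
    "place": "Boulder, Colorado",
    "taxon": "Sciurus aberti",
    "note": "Two seen near the creek.",
}
record = to_darwin_core(obs)
print(record["scientificName"])  # Sciurus aberti
```

Keeping `occurrenceRemarks` alongside the structured fields is one way to preserve the link back to the original narrative while still producing map- and analysis-ready records.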
Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences—the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The ‘environmental packages’ apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere.
Biodiversity data derive from myriad sources stored in various formats on many distinct hardware and software platforms. An essential step towards understanding global patterns of biodiversity is to provide a standardized view of these heterogeneous data sources to improve interoperability. Fundamental to this advance are definitions of common terms. This paper describes the evolution and development of Darwin Core, a data standard for publishing and integrating biodiversity information. We focus on the categories of terms that define the standard, differences between simple and relational Darwin Core, how the standard has been implemented, and the community processes that are essential for maintenance and growth of the standard. We present case-study extensions of the Darwin Core into new research communities, including metagenomics and genetic resources. We close by showing how Darwin Core records are integrated to create new knowledge products documenting species distributions and changes due to environmental perturbations.
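In "simple" Darwin Core, as described above, each occurrence is published as one flat row whose column names are drawn from the standard's term list. The following minimal sketch uses five real Darwin Core terms (the full standard defines many more) and toy record values:

```python
import csv
import io

# A minimal sketch of Simple Darwin Core: one flat CSV row per
# occurrence, with columns drawn from the standard's term list.
# These five terms are a small illustrative subset of the standard.
TERMS = {"occurrenceID", "scientificName", "eventDate",
         "decimalLatitude", "decimalLongitude"}

rows = [{
    "occurrenceID": "urn:catalog:EXAMPLE:1",   # toy identifier
    "scientificName": "Lynx rufus",
    "eventDate": "2008-07-04",
    "decimalLatitude": "40.01",
    "decimalLongitude": "-105.27",
}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=sorted(TERMS))
writer.writeheader()
writer.writerows(rows)

# Every column header should be a recognized Darwin Core term.
header = buf.getvalue().splitlines()[0].split(",")
assert set(header) == TERMS
```

Because every publisher uses the same term names, rows like this can be concatenated across data sources, which is what makes the aggregated knowledge products described above possible.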
The impacts of climate change on phenological responses of species and communities are well-documented; however, many such studies are correlational and so less effective at assessing the causal links between changes in climate and changes in phenology. Using grasshopper communities found along an elevational gradient on the Front Range of Colorado, USA, we present an ideal system that provides a mechanistic link between climate and phenology.
This study utilizes past (1959–1960) and present (2006–2008) surveys of grasshopper communities and daily temperature records to quantify the relationship between amount and timing of warming across years and elevations, and grasshopper timing to adulthood. Grasshopper communities were surveyed at four sites, Chautauqua Mesa (1752 m), A1 (2195 m), B1 (2591 m), and C1 (3048 m), located in prairie, lower montane, upper montane, and subalpine life zones, respectively. Changes to earlier first appearance of adults depended on the degree to which a site warmed. The lowest site showed little warming and little phenological advancement. The next highest site (A1) warmed a small, but significant, amount and grasshopper species there showed inconsistent phenological advancements. The two highest sites warmed the most, and at these sites grasshoppers showed significant phenological advancements. At these sites, late-developing species showed the greatest advancements, a pattern that correlated with an increase in rate of late-season warming. The number of growing degree days (GDDs) associated with the time to adulthood for a species was unchanged across the past and present surveys, suggesting that phenological advancement depended on when a set number of GDDs is reached during a season.
Our analyses provide clear evidence that variation in amount and timing of warming over the growing season explains the vast majority of phenological variation in this system. Our results move past simple correlation and provide a stronger process-oriented and predictive framework for understanding community level phenological responses to climate change.
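The mechanism described above, a fixed growing-degree-day (GDD) requirement reached earlier in warmer seasons, can be sketched numerically. This is a toy illustration: the simple averaging method is a standard way to compute daily GDDs, but the base temperature and the temperature series below are invented, not values from the study.

```python
def daily_gdd(t_min, t_max, t_base=12.0):
    """Degree days accumulated in one day (simple averaging method).
    The 12 degree C base is illustrative, not the study's threshold."""
    return max(0.0, (t_min + t_max) / 2.0 - t_base)

def day_of_gdd_threshold(daily_temps, threshold):
    """Return the 1-based day on which cumulative GDDs first reach
    `threshold`, or None if the season never accumulates enough."""
    total = 0.0
    for day, (t_min, t_max) in enumerate(daily_temps, start=1):
        total += daily_gdd(t_min, t_max)
        if total >= threshold:
            return day
    return None

# A warmer season reaches the same GDD requirement earlier, which is
# the proposed mechanism behind earlier adult appearance.
cool = [(8, 20)] * 60    # daily mean 14 C -> 2 GDD/day
warm = [(10, 24)] * 60   # daily mean 17 C -> 5 GDD/day
assert day_of_gdd_threshold(warm, 100) < day_of_gdd_threshold(cool, 100)
```

Under this model, species with larger GDD requirements (late developers) are most sensitive to late-season warming, consistent with the pattern reported above.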
Responding to the urgent need to make biodiversity records broadly accessible, the natural history community turned to “the cloud.”
Increasing the quantity and quality of data is a key goal of biodiversity informatics, leading to increased fitness for use in scientific research and beyond. This goal is impeded by a legacy of geographic locality descriptions associated with biodiversity records that are often heterogeneous and not in a map-ready format. The biodiversity informatics community has developed best practices and tools that provide the means to do retrospective georeferencing (e.g., the BioGeomancer toolkit), a process that converts heterogeneous descriptions into geographic coordinates and a measurement of spatial uncertainty. Even with these methods and tools, data publishers are faced with the immensely time-consuming task of vetting georeferenced localities. Furthermore, it is likely that overlap in georeferencing effort is occurring across data publishers. Solutions are needed that help publishers more effectively georeference their records, verify their quality, and eliminate the duplication of effort across publishers.
We have developed a tool called BioGeoBIF, which incorporates the high-throughput and standardized georeferencing methods of BioGeomancer into an end-to-end workflow. Custodians who publish their data to the Global Biodiversity Information Facility (GBIF) can use this system to improve the quantity and quality of their georeferences. BioGeoBIF harvests records directly from the publishers' access points, georeferences the records using the BioGeomancer web service, and makes results available to data managers for inclusion at the source. Using a web-based, password-protected, group management system for each data publisher, we leave data ownership, management, and vetting responsibilities with the managers and collaborators of each data set. We also minimize the georeferencing task, by combining and storing unique textual localities from all registered data access points, and dynamically linking that information to the password-protected record information for each publisher.
We have developed one of the first examples of a service that can help publishers create higher-quality data, mediated through the Global Biodiversity Information Facility and its data portal. This service is one step towards solving many problems of data quality in the growing field of biodiversity informatics. We envision future improvements to our service, including faster return of results and the inclusion of more georeferencing engines.
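The de-duplication idea behind minimizing the georeferencing task, resolving each unique textual locality once and fanning the result back out to every record that used it, can be sketched as follows. All names here are invented for illustration; `georef` stands in for a call to a georeferencing service such as BioGeomancer.

```python
# Sketch (invented names): collapse duplicate verbatim locality strings
# across publishers so each unique string is georeferenced only once.

def normalize(loc):
    """Case-fold and collapse whitespace so trivially different
    spellings of the same locality share one cache key."""
    return " ".join(loc.lower().split())

def georeference_unique(records, georef):
    """`georef` is any locality -> (lat, lon, uncertainty_m) function
    supplied by the caller, standing in for a web-service call."""
    cache = {}
    out = []
    for rec in records:
        key = normalize(rec["locality"])
        if key not in cache:
            cache[key] = georef(rec["locality"])  # one lookup per unique string
        out.append({**rec, "georeference": cache[key]})
    return out, len(cache)

records = [
    {"publisher": "A", "locality": "Boulder, Colorado"},
    {"publisher": "B", "locality": "boulder,  colorado"},  # same place, messier text
    {"publisher": "A", "locality": "Gold Hill, Colorado"},
]
results, lookups = georeference_unique(records, lambda s: (40.0, -105.3, 1000))
assert lookups == 2  # three records, but only two unique localities
```

The same caching structure also lets vetting decisions made by one publisher's data managers be surfaced to others who share the locality string.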
With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the ‘transparency’ of the information contained in existing genomic databases.
Ecological niche models (ENMs) provide a means of characterizing the spatial distribution of suitable conditions for species, and have recently been applied to the challenge of locating potential distributional areas at the Last Glacial Maximum (LGM) when unfavorable climate conditions led to range contractions and fragmentation. Here, we compare and contrast ENM-based reconstructions of LGM refugial locations with those resulting from the more traditional molecular genetic and phylogeographic predictions. We examined 20 North American terrestrial vertebrate species from different regions and with different range sizes for which refugia have been identified based on phylogeographic analyses, using ENM tools to make parallel predictions. We then assessed the correspondence between the two approaches based on spatial overlap and areal extent of the predicted refugia. In 14 of the 20 species, the predictions from ENM and predictions based on phylogeographic studies were significantly spatially correlated, suggesting that the two approaches to development of refugial maps are converging on a similar result. Our results confirm that ENM scenario exploration can provide a useful complement to molecular studies, offering a less subjective, spatially explicit hypothesis of past geographic patterns of distribution.
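One simple way to quantify the spatial correspondence described above is to rasterize each refugium prediction to a boolean grid and measure overlap, for instance as the Jaccard index (intersection over union). This is a toy sketch with invented grids, not the study's actual correspondence test:

```python
# Minimal sketch: compare two refugium maps as boolean presence grids
# using the Jaccard index (intersection over union). Grids are toy data.

def jaccard(grid_a, grid_b):
    inter = union = 0
    for row_a, row_b in zip(grid_a, grid_b):
        for a, b in zip(row_a, row_b):
            inter += a and b   # cell predicted refugial by both
            union += a or b    # cell predicted refugial by either
    return inter / union if union else 0.0

enm_refugium = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]
phylo_refugium = [
    [0, 0, 1],
    [0, 1, 1],
    [0, 1, 0],
]
print(round(jaccard(enm_refugium, phylo_refugium), 2))  # 0.6
```

An observed overlap would then be judged against the overlap expected by chance given each prediction's areal extent, which is the spirit of the significance tests reported above.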
The BioGeomancer Project provides a toolkit to georeference data and specimens collected for natural history collections, a crucial task if the potential of these specimens is to be fully realized.
Biodiversity data are rapidly becoming available over the Internet in common formats that promote sharing and exchange. Currently, however, problems with geographic and taxonomic accuracy limit the usefulness of these data for ecological research, natural resources management, and conservation decision-making. Web-based georeferencing tools that utilize best practices and gazetteer databases can be employed to improve geographic data. Taxonomic data quality can be improved through web-enabled databases of valid taxon names and related services, as well as more efficient mechanisms for returning systematic research results and taxonomic misidentification rates to the biodiversity community; both of these are under construction. A separate but related challenge will be developing web-based visualization and analysis tools for tracking biodiversity change. Here we discuss how such tools, combined with data of enhanced quality, will help transform today's portals to raw biodiversity data into nexuses of collaborative creation and sharing of biodiversity knowledge.
BioGeomancer; data visualization; Geographic Information Systems; Global Biodiversity Information Facility; global biodiversity services; Google Earth; species richness estimation; survey gap analysis