Capturing metadata as described for the DDF or the Biocatalogue project is not easy. Our initial DDF metadata for over 220 resources was captured as part of a detailed MRB questionnaire sent to each resource, and active manual curation had to be used to fill in the gaps in responses. This is expensive and time-consuming and, after the first pass, there is a requirement to keep the captured data up to date, and this is not easily met.
To eliminate the cost of a central curation effort, it would be much better if each resource curated their own metadata and made it accessible to the wider scientific community. As an example of this, we produced a DDF extension to the Drupal content management system (http://drupal.org
), which allows curators to log-in and categorize their databases in terms of DDF categories and levels using a simple web form. The resulting metadata is then browsable and searchable either through a web interface or programmatically through RESTful web services. An example deployment is viewable at www.casimir.org.uk/casimir_ddf
() and is currently populated with the metadata for the MRB project. We encourage interested readers to visit our site and for maintainers of resources to curate their metadata using it. The Drupal framework is easily extensible to allow curation of other data associated with each resource, so allowing the production of a customisable community registry. The system is expected to be of great value to communities developing registry resources or individual informaticians wanting to establish quickly which features a database provides (the software is freely available under an open-source license). The REST web services allow a central DDF portal to be established offering the collection and sharing of data from individual database registries as well as avoiding redundancy in curation efforts.
Figure 1. The DDF query and annotation tool. This tool allows any user to browse a set of resources that have been annotated using the DDF categories. Searches for resources by DDF category and level are also possible. In addition, resource maintainers can log-in (more ...)
Biocatalogue have used a combination of central and community curation from the outset to capture data on web services and the large number of services already described is testament to such an approach. Again, the provision of easy to use web tools that suggest particular tags and ontology terms to use in the annotation increases the likelihood of achieving a high level of community engagement and annotation quality.
Community curation requires pro-active participation. Communities need to acknowledge; (i) a central site where they can find relevant resources would be useful, and; (ii) the only practical means of achieving this is for each database to self-curate its entry using a clearly articulated and standardized set of benchmarks and tools such as provided by the DDF and Biocatalogue solutions. Individual resources would also benefit from this small amount of curation effort as the central registry will direct users to them, who might not previously have known about their resource. Although the creators and maintainers of a resource are best placed to describe the associated metadata, a self-curation approach can raise data quality issues, but these should be minimized if the annotation tools are well designed i.e. fast and easy to use, with clear descriptions of what is being asked for, and responses presented as a lists of terms rather than free text. However, even with a well-designed annotation tool, registries are still likely to require some central curation for validating submitted data (e.g. the DDF tool allows administrator level access to check new submissions).
In summary, there is now a clear need for registries to be built that address biological categorization of databases and services, annotate any services provided and capture metadata on database best practises. Considerable progress has been made on standardizing the capture of each of these by such approaches as the DDF and Biocatalogue, but the community would benefit from coordination to produce full registries combining all these approaches. However, the value of a standard is dependent on its uptake by the community as can be seen, for example, in the MIBBI family of minimal information standards.15
Uptake of a standard is, of course, as much a social issue as one of producing the right technologies for the community. Here, support from funding agencies and journals will be vital in establishing the practice of publishing database and services metadata. All curators can enhance the value of their databases by posting a minimal amount of information about their resource on a community site. The task has minimal cost, but will provide considerable value to investigators, database developers, informaticians and funding agencies.