The use of Web Services to enable programmatic access to on-line bioinformatics is becoming increasingly important in the Life Sciences. However, their number, distribution and the variable quality of their documentation can make their discovery and subsequent use difficult. A Web Services registry with information on available services will help to bring together service providers and their users. The BioCatalogue (http://www.biocatalogue.org/) provides the Life Science community with a common interface for registering, browsing and annotating Web Services. Services in the BioCatalogue can be described and searched in multiple ways based upon their technical types, bioinformatics categories, user tags, service providers or data inputs and outputs. They are also subject to constant monitoring, allowing the identification of service problems and changes and the filtering-out of unavailable or unreliable resources. The system is accessible via a human-readable ‘Web 2.0’-style interface and a programmatic Web Service interface. The BioCatalogue follows a community approach in which all services can be registered, browsed and incrementally documented with annotations by any member of the scientific community.
As of 2010, there are more than 1400 publicly available bioinformatics tools and databases on the Web (1,2), with over 100 new Web servers providing interactive analysis tools reported in 2009 alone (3). These published resources are just the tip of a very large iceberg, and many others exist in relative obscurity, advertised only via project or laboratory Web Pages.
Though interactive access to these resources via Web Pages has been of enormous benefit to the community over the years, there is a growing demand for programmatic interfaces that allow these tools and databases to be linked together in automated analysis pipelines (4). Web Services are becoming an increasingly popular way of providing robust remote access (5), and this approach has been adopted by major service providers including the EMBL-EBI (6), KEGG (7), NCBI (8) and the DDBJ (9). Web Services can easily be accessed from most programming languages, or chained together as workflows using free tools [e.g. Taverna (10) or Kepler (11)], or their commercial equivalents [e.g. Pipeline Pilot (http://accelrys.com/products/pipeline-pilot/)].
The resources to which Web Services provide access are distributed across centres, projects, countries and disciplines and, for the most part, are currently likely to be discovered by word-of-mouth, Google searches, or from simple on-line lists such as http://www.xmethods.net/ or http://www.webservicelist.com/. As the number of Web Services has grown, so has the need for gathering information about them into one place. Table 1 gives a short summary of prominent public service registries that are relevant to the Life Sciences. These broadly fall into two categories: those that represent collections of services based on a specific schema and/or technology [e.g. BioMOBY Central (12,13) and the DAS registry (14)] and those that do not [e.g. seekda (http://www.seekda.com/) and the European Model for Bioinformatics Research and Community Education (EMBRACE) registry (15)]. Although some have major commercial or institutional backing, others have grown out of fixed-term projects and hence their long-term future is unclear. Alongside registry building, there have also been ongoing efforts to describe Web Services with rich semantic annotations using ontologies and modern ontology languages. Examples include SSwap, Feta, SADI (http://sadiframework.org/) and BioMOBY.
Drawing together the experience of these existing initiatives, the BioCatalogue provides a universal catalogue of Web Services for the Life Sciences. Launched in June 2009 and hosted at the EMBL-EBI, it allows registration of services that are specific to the Life Sciences (such as those for protein sequences or molecular structures) as well as more generic services that are of direct utility in this domain (e.g. text mining and image analysis). The catalogue does not host these services itself; instead it provides a mechanism to discover services and annotate them. The BioCatalogue has five key properties:
Currently, the BioCatalogue has over 300 registered members. It describes 1627 Web Services (1585 SOAP and 42 REST services) from over 158 different providers in 25 countries. All the services of the major data centres (EMBL-EBI, DDBJ and NCBI) are present.
The BioCatalogue can be accessed via two mechanisms: a human-readable ‘Web 2.0’-style interface which supports browsing, searching and the manual creation and annotation of service entries; and a Web Service API for programmatic access.
The BioCatalogue's Web interface provides faceted browsing, extensive link-based navigation and filtering on multiple criteria including service categories, keywords, providers, location and service type. Displayed information, such as service popularity based on view statistics, comments from other users and the number and quality of annotations, helps users to identify suitable services and to find alternative or similar services.
All available information held on a service, including its annotations, tags and provider documentation is included in the search. Searching is facilitated by term suggestion based on tags, previous user searches and terms from the myGrid ontology (20). The ‘Search by Data’ feature matches a sample of a user's input data against example input data provided in the service annotations, allowing the user to discover services that provide methods for analysing their data. The BioCatalogue is configured so as to be indexable by generic Web search engines (e.g. Google) as well as being explicitly indexed in the specialist EB-eye (21) search engine.
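The idea behind ‘Search by Data’ can be illustrated with a small sketch: detect the format of the user's sample and return the services whose annotated example input has the same format. The format names, the regular expressions and the helpers detect_format and search_by_data below are hypothetical simplifications chosen for illustration; they are not the BioCatalogue's actual matching rules.

```python
import re
from typing import Dict, List, Optional

# Hypothetical, deliberately simplified format detectors. They only illustrate
# comparing a user's sample with annotated example inputs.
FORMAT_PATTERNS = {
    # A single FASTA record: header line, then sequence line(s).
    "fasta": re.compile(r"^>\S+.*\n[A-Za-z*\-\n]+$"),
    # One common UniProt accession shape (not the full UniProt pattern).
    "uniprot_accession": re.compile(r"^[OPQ][0-9][A-Z0-9]{3}[0-9]$"),
    # Enzyme Commission number, e.g. 3.4.21.4.
    "ec_number": re.compile(r"^\d+\.\d+\.\d+\.\d+$"),
}


def detect_format(sample: str) -> Optional[str]:
    """Return the name of the first pattern the sample matches, or None."""
    text = sample.strip()
    for name, pattern in FORMAT_PATTERNS.items():
        if pattern.match(text):
            return name
    return None


def search_by_data(user_input: str, example_inputs: Dict[str, str]) -> List[str]:
    """Return services whose example input has the same detected format."""
    wanted = detect_format(user_input)
    if wanted is None:
        return []
    return sorted(
        service
        for service, example in example_inputs.items()
        if detect_format(example) == wanted
    )
```

For example, with example inputs {"BLAST": ">q\nMKV", "UniProtFetch": "P69905"}, a pasted UniProt accession such as Q9Y261 would select only "UniProtFetch". A production system would need far richer format detection than a few regular expressions.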
Announcements and release notes are posted on Twitter and syndicated on RSS feeds. Registry entries may be bookmarked using social bookmarking systems such as Delicious (http://delicious.com/) or Digg (http://digg.com/). Users may log in using OpenID, Google, Facebook, Twitter, Yahoo! or Verisign accounts, simplifying registration, and limiting username and password proliferation.
The BioCatalogue provides a REST Web Service API, enabling tools such as Taverna and registry aggregation sites such as ONIX (http://www.ncri-onix.org.uk/) to access its contents. The main exchange format is XML, with JSON (http://www.json.org/) output available for the annotations. The API broadly reflects the same functionality that can be accessed via the interactive Web interface. Table 2 outlines the main XML endpoints and their functions. Full documentation, along with code examples, is available from http://apidocs.biocatalogue.org/.
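A client of an XML-over-HTTP API of this kind typically builds a query URL, fetches the document and parses it. The sketch below shows that pattern using only the Python standard library. The "/search.xml" path, the "q" and "page" parameters and the <results><service name="..."/></results> document shape are assumptions made for illustration; the actual endpoints are those listed in Table 2 and at http://apidocs.biocatalogue.org/.

```python
from __future__ import annotations

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Base URL as given in the text; the "/search.xml" path and the "q"/"page"
# parameters below are assumptions, to be checked against the API docs.
BASE_URL = "http://www.biocatalogue.org"


def search_url(query: str, page: int = 1) -> str:
    """Build a (hypothetical) XML search URL for the given query."""
    params = urllib.parse.urlencode({"q": query, "page": page})
    return f"{BASE_URL}/search.xml?{params}"


def fetch(url: str, timeout: float = 30.0) -> str:
    """Download a document over HTTP and return it as text."""
    request = urllib.request.Request(url, headers={"Accept": "application/xml"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_service_names(xml_text: str) -> list[str]:
    """Extract names from a hypothetical <results><service name="..."/></results> shape."""
    root = ET.fromstring(xml_text)
    return [el.get("name", "") for el in root.iter("service")]
```

A call such as parse_service_names(fetch(search_url("blast"))) chains the three steps; the parsing function would need adapting to the real response schema, which is likely to include namespaces and richer elements.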
The descriptions of the services registered in the BioCatalogue are drawn from service providers, the user community and monitoring and usage analysis. Each annotation is associated with a source (automatic analysis, other registries, the providers or named curators) and can take the form of structured data, free text, tags or ontology terms. Annotations are divided into four main categories:
The BioCatalogue currently holds more than 33 000 annotations. Approximately a third of services have all operations described. As much documentation as possible is automatically extracted from the published service interfaces, and additional annotations may be added during or after initial submission by the contributor. These semantic service descriptions can be imported and exported in formats compliant with SAWSDL (18) and SA-REST (19) standards.
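SAWSDL attaches semantic annotations to WSDL and XML Schema components through the sawsdl:modelReference attribute, whose value is a whitespace-separated list of concept URIs. As a rough illustration of what an exported annotation might look like, the sketch below adds such an attribute to a schema element declaration inside a WSDL document. The function annotate_element and the example.org concept URI are hypothetical placeholders; this is not the BioCatalogue's own export code.

```python
import xml.etree.ElementTree as ET
from typing import List

SAWSDL_NS = "http://www.w3.org/ns/sawsdl"
XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Register prefixes so that the serialized output uses readable names.
ET.register_namespace("sawsdl", SAWSDL_NS)
ET.register_namespace("xs", XSD_NS)
ET.register_namespace("wsdl", "http://schemas.xmlsoap.org/wsdl/")


def annotate_element(wsdl_text: str, element_name: str, concept_uris: List[str]) -> str:
    """Attach sawsdl:modelReference to every xs:element declaration with the given name."""
    root = ET.fromstring(wsdl_text)
    found = False
    for el in root.iter(f"{{{XSD_NS}}}element"):
        if el.get("name") == element_name:
            # modelReference holds a whitespace-separated list of concept URIs.
            el.set(f"{{{SAWSDL_NS}}}modelReference", " ".join(concept_uris))
            found = True
    if not found:
        raise KeyError(element_name)
    return ET.tostring(root, encoding="unicode")
```

Annotating schema elements (the data inputs and outputs of operations) keeps the example within the part of SAWSDL that applies uniformly to WSDL 1.1 and 2.0 documents.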
The status, reliability and stability of a Web Service are often the deciding factors for choosing a service. The BioCatalogue has adopted the EMBRACE Registry's system for monitoring service availability, service interface changes and service functionality (15). Availability is indicated using a simple ‘traffic light’ mechanism, whereby green means the service is active, yellow means it has one or more unresolved issues, and red means it is currently unavailable. Service interface changes are managed by periodically re-parsing interface documents and comparing them with the existing entry. Functionality is checked by the submission of scripts that exercise specific aspects of the services, managed by a separate server. By automatically monitoring changes, a history of service versions and performance can be provided and users relying on specific services can be notified of these changes by RSS subscription or Twitter.
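Two of the monitoring steps described above can be sketched in a few lines: mapping monitoring results to the three traffic-light states, and comparing a freshly parsed interface with the stored one. The decision rule in traffic_light and the modelling of an interface as a mapping from operation names to input names are assumptions for illustration, not the EMBRACE or BioCatalogue implementation.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Light(Enum):
    GREEN = "green"    # service active
    YELLOW = "yellow"  # one or more unresolved issues
    RED = "red"        # currently unavailable


def traffic_light(reachable: bool, open_issues: int) -> Light:
    """Map monitoring results to a status colour (hypothetical rule)."""
    if not reachable:
        return Light.RED
    if open_issues > 0:
        return Light.YELLOW
    return Light.GREEN


# An interface is modelled here as {operation name: tuple of input names}.
Interface = Dict[str, Tuple[str, ...]]


def interface_changes(old: Interface, new: Interface) -> Dict[str, List[str]]:
    """Compare two parsed interface snapshots and report the differences."""
    common = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(op for op in common if old[op] != new[op]),
    }
```

Storing each snapshot alongside its comparison result is one simple way to build up the history of service versions mentioned above.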
Usage of the BioCatalogue is monitored to build up a profile of searches and access. This reveals relationships between services, including usage patterns; for example, services that are commonly used together, and/or services that provide similar functionality, which may be used as substitutes if one of these services becomes unavailable.
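One simple way to find services that are commonly used together is to count how often each pair of services appears in the same usage session. The sketch below assumes, hypothetically, that usage logs have already been grouped into sessions, each a list of service names; it addresses only co-usage, not the separate problem of detecting functionally similar substitutes.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def co_usage(sessions: Iterable[Iterable[str]]) -> Counter:
    """Count how often each unordered pair of services appears in the same session."""
    counts: Counter = Counter()
    for session in sessions:
        # Sorting makes each pair canonical, so (A, B) and (B, A) are merged.
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    return counts


def used_with(counts: Counter, service: str, top: int = 3) -> List[Tuple[str, int]]:
    """Return the services most often used together with `service`."""
    partners: Counter = Counter()
    for (a, b), n in counts.items():
        if a == service:
            partners[b] += n
        elif b == service:
            partners[a] += n
    return partners.most_common(top)
```

Raw pair counts favour popular services; normalizing by how often each service is used on its own would be a natural refinement.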
Members can register a Web Service, share their views, make comments or annotations on any service and provide examples of service usage with relevant input and output data. Automatic harvesting of service annotations provides the foundation on which user-provided annotations rest. Submitting services and annotations contributes to a member's reputation, encouraging further contributions. A full-time curator oversees content and coordinates a small pool of curators who help members improve annotations and adopt best practices.
The BioCatalogue team includes several service providers, such as the EMBL-EBI, and other providers are encouraged to contribute. As well as maintaining an active ‘friends’ mailing list, online news feeds and a wiki (http://www.biocatalogue.org/wiki/), the team periodically organizes ‘annotation jamborees’: virtual or face-to-face group efforts to annotate a large set of Web Services and to discuss best practices, new features, directions and general issues. These jamborees serve as a resource review and a team-building forum as well as a source of new annotations.
All descriptions are attributed and open to scrutiny and all monitoring results are available. Documentation is provided at various levels of detail covering guidelines and best practices for service creation and execution. Help pages provide instructions or links on how to test and run services with different tools: GUI tools, such as soapUI (http://www.soapui.org/), SOAP Client (http://ditchnet.org/soapclient/) or workflow execution engines, such as Taverna and Kepler. Pointers to commonly used software libraries that can be used to incorporate Web Services into new programs in different programming and scripting languages, and links for creating new Web Services or writing a Web Service API to an existing tool, are also provided.
The first phase of the BioCatalogue has focused on the design and development of its Web interface and API, on establishing its core content and on the building of a contributing community. Since its launch in 2009, it has had over 14 000 visits and is successfully growing a community of contributors and users. The majority of visitors use the search and browsing features to discover services. Of the 300 or so registered members, a subgroup of around 20 actively contribute high quality manual annotations.
In cooperation with their respective developers, services generated during the EMBRACE and BioSapiens projects and relevant services found by the seekda search engine have already been included, and content from BioMOBY Central and the DAS registry will be added shortly. Thus the bulk of current services have been accumulated from registries, by scavenging, and from the major service providers. We now observe a growing number of more specialist service providers, each adding a small number of domain-specific services to the catalogue.
The next phase of development concentrates on extending functionality and content, improving the quality and coverage of service curation, and integration with other systems. Support for tagging with community-curated ontologies will be extended. The myGrid ontology is already used and the EMBRACE project's EDAM ontology (http://sourceforge.net/projects/edamontology/) is under review.
Contributions will be made easier by the release of a write-API, providing members with the ability to register and update services programmatically. Consequently, profiles derived from other service-using and monitoring software, like the Taverna workflow system and its Web Service workflow library myExperiment (http://www.myexperiment.org/), and the service monitoring systems of BioMOBY, DAS and QBIOS (http://qbios.gforge.inria.fr/) will be integrated to form aggregated profiles.
The BioCatalogue aims to satisfy the needs of service providers, users and experts in the field, bringing them together in a common effort to make Web Services for biology more visible, better documented and easier to use. It is an important ‘one stop shop’ where users can locate Web Services that implement the analysis relevant for their experiments, learn how these services work and, most importantly, learn how to make the most of these valuable resources.
Funding for open access charge: Biotechnology and Biological Sciences Research Council (BB/F01046X/1, BB/F010540/1 to BioCatalogue project); the European Commission via the EMBRACE project (LHSG-CT-2004-512092); EMBO (ASTF 338.00-2009 to Development on Search By Data).
Conflict of interest statement. None declared.
Authors would like to thank all members of the BioCatalogue focus group, Strategy Advisory Board and other people who help us to improve the registry: Duncan Hull, Benjamin Good, Chrysanthi Ainali, Olivier Sallou, Chris Rawlings, Anil Wipat, Jo Dicks, Robert Gill, Steve Kemp, Antoine H.C. van Kampen, Holger Lausen, Terry Payne, Mark Wilkinson, Janusz Bujnicki, Paul Gordon, Khalid Belhajjame, Philip McDermott, Dave De Roure and all participants of annotation jamborees. Special acknowledgments are given to our partners and all projects that cooperate with BioCatalogue to ease and popularize usage of Web Services in Life Sciences, especially to EU EMBRACE network, OMII-UK, BioMOBY Central, seekda, myExperiment, myGrid, EU BioSapiens network and NBIC.