As of 2010, there are more than 1400 publicly available bioinformatics tools and databases on the Web (1), with over 100 new Web servers providing interactive analysis tools reported in 2009 alone (3). These published resources are just the tip of a very large iceberg, and many others exist in relative obscurity, advertised only via project or laboratory Web Pages.
Though interactive access to these resources via Web Pages has been of enormous benefit to the community over the years, there is a growing demand for programmatic interfaces that allow these tools and databases to be linked together in automated analysis pipelines (4). Web Services are becoming an increasingly popular way of providing robust remote access (5), and this approach has been adopted by major service providers including the EMBL-EBI (6), KEGG (7), NCBI (8) and the DDBJ (9). Web Services can easily be accessed from most programming languages, or chained together as workflows using free tools [e.g. Taverna (10) or Kepler (11)], or their commercial equivalents [e.g. PipeLine Pilot (http://accelrys.com/products/pipeline-pilot/)].
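The idea of chaining services into a workflow can be illustrated with a minimal Python sketch. It assumes each step is a plain callable. The three functions below are hypothetical local stand-ins for remote Web Service calls, and the identifier and sequence are made-up demonstration data; none of this is the API of any tool or provider named above.

```python
from functools import reduce
from typing import Callable, Iterable


# Hypothetical stand-ins for remote Web Service calls. A real pipeline
# would replace each body with a SOAP or REST client request.
def fetch_sequence(identifier: str) -> str:
    """Pretend database lookup that returns a FASTA record."""
    demo_records = {"demo-1": "MKTAYIAKQR"}  # made-up demonstration data
    return f">{identifier}\n{demo_records[identifier]}"


def strip_header(fasta: str) -> str:
    """Drop the FASTA header line and join the remaining lines."""
    return "".join(fasta.splitlines()[1:])


def sequence_length(sequence: str) -> int:
    """Pretend analysis service that reports the sequence length."""
    return len(sequence)


def chain(steps: Iterable[Callable]) -> Callable:
    """Compose steps left to right: the output of one feeds the next."""
    ordered = list(steps)
    return lambda value: reduce(lambda acc, step: step(acc), ordered, value)


pipeline = chain([fetch_sequence, strip_header, sequence_length])
result = pipeline("demo-1")
print(result)
```

Workflow systems add much more than this (scheduling, provenance, error handling), but the core pattern is the same: each service consumes the output of the previous one.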
The resources to which Web Services provide access are distributed across centres, projects, countries and disciplines and, for the most part, are currently likely to be discovered by word-of-mouth, Google searches, or from simple on-line lists such as http://www.xmethods.net/. As the number of Web Services has grown, so has the need for gathering information about them into one place. The accompanying table gives a short summary of prominent public service registries that are relevant to the Life Sciences. These broadly fall into two categories: those that represent collections of services based on a specific schema and/or technology [e.g. BioMOBY Central (12), the DAS registry (14)] and those that do not [e.g. seekda (http://www.seekda.com/) and the European Model for Bioinformatics Research and Community Education (EMBRACE) registry (15)]. Although some have major commercial or institutional backing, others have grown out of fixed term projects and hence their long-term future is unclear. Alongside registry building, there have also been ongoing efforts to describe Web Services with rich semantic annotations using ontologies and modern ontology languages. Examples include SSwap (16), Feta (17), SADI (http://sadiframework.org/) and BioMOBY.
A summary of existing on-line collections of Web Services
Drawing together the experience of these existing initiatives, the BioCatalogue provides a universal catalogue of Web Services for the Life Sciences. Launched in June 2009 and hosted at the EMBL-EBI, it allows registration of services that are specific to the Life Sciences (such as those for protein sequences or molecular structures) as well as more generic services that are of direct utility in this domain (e.g. text mining and image analysis). The catalogue does not host these services itself; instead it provides a mechanism to discover services and annotate them. The BioCatalogue has five key properties:
- It provides a single up-to-date port-of-call for finding Life Science Web Services, regardless of their technology or provenance. As well as allowing new registration of services manually and via its own Web Service interface, it aggregates contributions from other registries. For example, the catalogue carries service registrations from the EMBRACE Registry and domain-specific services from seekda.
- It offers a long-term sustained resource for service descriptions that is also a safe haven for securing the contents of registries beyond their originating projects (e.g. EMBRACE and BioSapiens services).
- It adds uniform and rich annotations to the services that harmonize their descriptions regardless of source or type. The annotations explain what the service does and how to use it. The descriptions draw upon existing and emerging work in the Semantic Web Services [e.g. Semantic Annotation for WSDL (SAWSDL) (18) and Semantic Annotation for REpresentational State Transfer (SA-REST) (19)]. Annotations from the EMBRACE and Feta registries have been contributed to the catalogue. Content is monitored by a full-time curator assisted by the registered members.
- It provides a rich range of facilities, adopting the best components of other registries where available, e.g. the EMBRACE service monitoring framework and endpoint validation software, and the use of seekda for service scavenging.
- It addresses the combined needs of service providers, users, annotators and developers alike, enabling the catalogue's content to be readily extended, curated and used by the community.
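As a rough illustration of the first and third properties (aggregating registrations from several registries and harmonizing their annotations), the following Python sketch merges entries from two sources into one collection. The `ServiceEntry` fields, the `aggregate` function, the matching rule (provider plus service name, case-insensitive) and all sample data are assumptions made for this example; they do not describe the BioCatalogue's actual data model or interface.

```python
from collections import Counter
from dataclasses import dataclass, field, replace


@dataclass
class ServiceEntry:
    """Hypothetical catalogue record, not the BioCatalogue schema."""
    name: str
    technology: str  # "SOAP" or "REST"
    provider: str
    source: str      # registry the entry was first seen in
    annotations: dict = field(default_factory=dict)


def aggregate(*registries):
    """Merge entries from several registries into one list.

    Entries with the same provider and name (case-insensitive) are treated
    as one service: the first entry is kept, and annotations from later
    entries are added without overwriting existing ones.
    """
    catalogue = {}
    for registry in registries:
        for entry in registry:
            key = (entry.provider.lower(), entry.name.lower())
            if key not in catalogue:
                # Copy so the input registries are never mutated.
                catalogue[key] = replace(entry, annotations=dict(entry.annotations))
            else:
                for tag, value in entry.annotations.items():
                    catalogue[key].annotations.setdefault(tag, value)
    return list(catalogue.values())


# Made-up sample registries for demonstration only.
registry_a = [
    ServiceEntry("align", "SOAP", "Example Lab", "registry-a", {"task": "sequence alignment"}),
    ServiceEntry("fetch", "REST", "Example Lab", "registry-a"),
]
registry_b = [
    ServiceEntry("Align", "SOAP", "example lab", "registry-b", {"input": "FASTA"}),
    ServiceEntry("mine", "SOAP", "Other Centre", "registry-b", {"task": "text mining"}),
]

catalogue = aggregate(registry_a, registry_b)
by_technology = Counter(entry.technology for entry in catalogue)
print(len(catalogue), dict(by_technology))
```

Here the two "align" records collapse into a single entry carrying both annotations, so the merged collection holds three services. A production registry would need a far more careful notion of service identity than a name match.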
Currently, the BioCatalogue has over 300 registered members. It describes 1627 Web Services (1585 SOAP services and 42 REST services) from more than 158 different providers in 25 countries. All the services of the major data centres (EMBL-EBI, DDBJ and NCBI) are present.