Bioinformatics web services can be categorized into two major functional groups: data access and analysis. Access to public database repositories is fundamental to bioinformatics research, and various systems have been developed for this purpose, such as Entrez at NCBI, Sequence Retrieval System (SRS) and EB-eye at EBI [19], Distributed Annotation System (DAS) [20], All-round Retrieval for Sequence and Annotation (ARSA) and getentry at DDBJ [21], DBGET at KEGG [22], and XML-based Protein Structure Search Service (xPSSS) at PDBj [6]. These services provide programmable means for text-based keyword search and entry retrieval from their backend databases, which mostly consist of static entries written either in semi-structured text or XML. Because each entry has a unique identifier, it can generally be assigned a URI (Uniform Resource Identifier).
The other group of services provides a variety of methods that require a certain amount of computation by implementing various algorithms, and they sometimes have complex input or output data structures. A typical example is a BLAST search, which takes a nucleic or amino acid sequence, together with numerous optional arguments, and finds homologous sequences in a specified database using a heuristic local alignment algorithm. Services in this group sometimes require a large amount of computation time, including those providing certain functionalities of the European Molecular Biology Open Software Suite (EMBOSS) [23], 3D structural analysis of proteins, and data mapping on biochemical pathways.
Historically, the term web services was associated with SOAP (Simple Object Access Protocol), a protocol that transfers messages in a SOAP XML envelope between a server and a client, usually over the Hypertext Transfer Protocol, HTTP [24]. SOAP services have several accessibility advantages, including an open standard that is independent of computer programming languages, and the use of HTTP, which is usually not filtered by firewalls (SOAP services can therefore be accessed even from institutions having very strict security policies for Internet access). Since all SOAP messages are XML documents and the format of the messages is known in advance from the service description (see below), it is possible to use XML binding to seamlessly convert the messages to language-specific objects and thus avoid any custom-programmed parsing. XML binding is often leveraged by SOAP libraries to provide a programmatic interface to a web service similar to an object-oriented API. Operations provided by SOAP services can take several arguments, so a service that requires a number of parameters can easily be used as an API, as if the method were a function call to a local library of a given programming language.
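As a rough illustration of the envelope structure described above, the following sketch builds a SOAP 1.1 request with Python's standard library and reads the named arguments back out of it. The operation name `runBlast`, its parameters, and the `http://example.org/blast` namespace are hypothetical and do not correspond to any service named in this section; a real client would normally let a SOAP library generate this code from the service description.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/blast"  # hypothetical service namespace


def build_soap_request(operation, params):
    """Wrap an operation call and its named arguments in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        arg = ET.SubElement(call, f"{{{SERVICE_NS}}}{name}")
        arg.text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_soap_body(xml_text):
    """Return the children of the first Body entry as a plain dict (a crude XML binding)."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    payload = body[0]
    # Strip the "{namespace}" prefix that ElementTree puts in front of each tag.
    return {child.tag.split("}")[-1]: child.text for child in payload}


# Hypothetical operation and arguments, for illustration only.
request_xml = build_soap_request(
    "runBlast",
    {"program": "blastp", "database": "swissprot", "sequence": "MKTAYIAKQR"},
)
arguments = parse_soap_body(request_xml)
```

The `parse_soap_body` helper is a hand-written stand-in for the XML binding step: it turns the message into a language-level object (here a dict) so that calling code never touches the XML directly.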
For the purpose of service description, SOAP services usually come with a Web Services Description Language (WSDL) [25] file. A WSDL file is an XML formatted document that is consumed by a SOAP/WSDL library to allow automatic construction of a set of functions for the client program. In addition to the list of methods, WSDL contains descriptions for each method, including the types and numbers of input arguments as well as those of output data. WSDL is also capable of describing complex data models that combine basic data types into nested data objects. In this way, SOAP services can accept various kinds of complex biological objects, such as a protein sequence entry accompanied by several annotation properties like the identifier, description, and source organism.
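To make the role of the service description more concrete, the sketch below reads a heavily trimmed WSDL 1.1 document and lists each operation with the names and types of its input and output parts. The embedded document, its `getEntry` operation, and the `http://example.org/seq` namespace are invented for illustration; a production SOAP/WSDL library does considerably more, including resolving the nested types.

```python
import xml.etree.ElementTree as ET

WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}  # WSDL 1.1 namespace

# Hypothetical, minimal WSDL fragment (bindings and service elements omitted).
EXAMPLE_WSDL = """
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:tns="http://example.org/seq"
             targetNamespace="http://example.org/seq">
  <message name="getEntryRequest">
    <part name="database" type="xsd:string"/>
    <part name="id" type="xsd:string"/>
  </message>
  <message name="getEntryResponse">
    <part name="entry" type="tns:ProteinEntry"/>
  </message>
  <portType name="SequencePort">
    <operation name="getEntry">
      <input message="tns:getEntryRequest"/>
      <output message="tns:getEntryResponse"/>
    </operation>
  </portType>
</definitions>
"""


def list_operations(wsdl_text):
    """Map each operation name to the (part name, type) pairs of its input and output."""
    root = ET.fromstring(wsdl_text)
    messages = {
        msg.get("name"): [
            (part.get("name"), part.get("type"))
            for part in msg.findall("wsdl:part", WSDL_NS)
        ]
        for msg in root.findall("wsdl:message", WSDL_NS)
    }
    operations = {}
    for op in root.findall("wsdl:portType/wsdl:operation", WSDL_NS):
        # Message references are prefixed QNames such as "tns:getEntryRequest".
        in_msg = op.find("wsdl:input", WSDL_NS).get("message").split(":")[-1]
        out_msg = op.find("wsdl:output", WSDL_NS).get("message").split(":")[-1]
        operations[op.get("name")] = {
            "inputs": messages[in_msg],
            "outputs": messages[out_msg],
        }
    return operations


operations = list_operations(EXAMPLE_WSDL)
```

From a table like `operations`, a client library has enough information to generate one function per operation with the right number and types of arguments.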
Recently, another kind of web service model named REST (Representational State Transfer) has rapidly gained popularity as an effective alternative approach to SOAP-based web services [1]. REST is an approach whereby an online service is decomposed into uniquely identifiable, stateless resources that can be addressed by a URL and return the relevant data in any format. Typically, many bioinformatics database services return entries in a text-based flatfile format upon REST calls. The strength of REST is in its simplicity. Since REST is built on top of plain HTTP requests, there is no need for dedicated supporting libraries, unlike SOAP/WSDL services. RESTful URLs are also highly suitable for permanent resource mapping, such as that between a database entry and a unique URI; therefore, biological web services that provide data access should ideally be exposed as simple REST services. On the other hand, REST is less appropriate for services that require complex input with many parameters, or for time-consuming and therefore asynchronous and stateful services. For those, SOAP/WSDL-based services are still more suitable.
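The mapping between a database entry and a URL can be sketched in a few lines. The base URL `https://rest.example.org` and the `/<database>/<entry>.<format>` layout below are assumptions made for this example and are not the URL scheme of any service named here; each real provider documents its own.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://rest.example.org"  # hypothetical service root


def entry_url(database, entry_id, fmt=None):
    """Build a stable URL for one database entry, optionally with a format suffix."""
    # Percent-encode both parts so identifiers like "hsa:7157" stay one path segment.
    path = f"{BASE_URL}/{quote(database, safe='')}/{quote(entry_id, safe='')}"
    return f"{path}.{fmt}" if fmt else path


def fetch_entry(database, entry_id, fmt=None, opener=urlopen):
    """Retrieve an entry with a plain HTTP GET; no service-specific library is needed."""
    with opener(entry_url(database, entry_id, fmt)) as response:
        return response.read().decode("utf-8")


# Only the URL is constructed here; calling fetch_entry would need a real server.
example = entry_url("uniprot", "P12345", "fasta")
```

Because the request carries everything the server needs in the URL itself, the same entry is always reachable at the same address, which is what makes this style attractive for data access.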
A WSDL description per se is not enough for the immediate construction of biological workflows as multiple cascading web services, because each service provider defines its own data types, sometimes inconsistently even for essentially identical objects. Therefore, in most cases the output of one service cannot be passed to another service as its input without appropriate conversion of data types or formats. Furthermore, services should also be discoverable by the object models they share so that they can be linked in the construction of workflows. To this end, a centralized registry for discovering appropriate services according to a given set of data types has become essential for web service interoperability. The BioMoby project has pioneered this task by providing MobyCentral, which serves as a central repository for BioMoby compatible web services [9]. Service developers are encouraged to register their own services in the repository with a description of each service using the BioMoby ontologies, which classify the semantic attributes of the method, including the input and output data types. Metadata and ontologies for service description and discovery discussed during the BioHackathon are listed in Table .
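The idea of discovering services by the data types they consume and produce can be illustrated with a toy in-memory registry. This is a simplified sketch and not the MobyCentral API: the class names, service names, and type names are hypothetical, and types are matched by exact string equality rather than through an ontology.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRecord:
    name: str
    input_type: str
    output_type: str


class Registry:
    """Toy central registry indexed by input and output data type."""

    def __init__(self):
        self._services = []

    def register(self, record):
        self._services.append(record)

    def find_by_input(self, data_type):
        """Return every registered service that accepts the given data type."""
        return [s for s in self._services if s.input_type == data_type]

    def find_chain(self, start_type, goal_type):
        """Breadth-first search for the shortest chain of services between two types."""
        queue = deque([(start_type, [])])
        seen = {start_type}
        while queue:
            current, path = queue.popleft()
            if current == goal_type:
                return path
            for service in self.find_by_input(current):
                if service.output_type not in seen:
                    seen.add(service.output_type)
                    queue.append((service.output_type, path + [service.name]))
        return None  # no chain of registered services connects the two types


# Hypothetical services and type names, for illustration only.
registry = Registry()
registry.register(ServiceRecord("getSequence", "GeneID", "AminoAcidSequence"))
registry.register(ServiceRecord("runBlast", "AminoAcidSequence", "BlastReport"))
registry.register(ServiceRecord("predictStructure", "AminoAcidSequence", "Structure3D"))

chain = registry.find_chain("GeneID", "BlastReport")
```

The search only succeeds when providers agree on type names, which is why a shared ontology for inputs and outputs matters for interoperability.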
Required metadata for service description and discovery.
To date, several applications that utilize BioMoby services have been developed, such as Taverna [26], Seahawk [27], MOWserv [8], and G-language Genome Analysis Environment (G-language GAE) [28]. Taverna is a software tool developed under the myGrid project [29], written in Java and equipped with a graphical user interface (GUI) for the construction of workflows by interconnecting existing web services. Users can start from an initial set of data pipelined to a service, where the input data is remotely analyzed, resulting in output of a different data type. This output becomes the input for the subsequent analysis step, for which appropriate services that consume this input data type can be looked up, for example, through MobyCentral. Iteration of this procedure leads to cascading services forming a bioinformatics workflow, which can be repeatedly utilized with different datasets. The strength of Taverna is in its support of many non-BioMoby services that can be utilized in concert with BioMoby-based services, and in its customizability: small Java plug-ins can be written, for example, to connect two services that require data format conversion.
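The cascading pattern described above, including a small converter between two incompatible steps, can be sketched as an ordered list of callables. This is not Taverna code: the three functions are local stand-ins for remote services, and the identifier `demo1` and its sequence are made up for the example.

```python
def fetch_sequence(entry_id):
    """Stand-in for a remote retrieval service; returns FASTA-formatted text."""
    sequences = {"demo1": "MKTAYIAKQR"}  # made-up entry for illustration
    return f">{entry_id}\n{sequences[entry_id]}\n"


def fasta_to_raw(fasta_text):
    """Converter step: drop FASTA header lines so the next step gets a bare sequence."""
    return "".join(
        line.strip() for line in fasta_text.splitlines() if not line.startswith(">")
    )


def count_residues(sequence):
    """Stand-in for a remote analysis service that expects a bare sequence."""
    counts = {}
    for residue in sequence:
        counts[residue] = counts.get(residue, 0) + 1
    return counts


def run_workflow(steps, initial_input):
    """Feed the output of each step into the next, in order."""
    data = initial_input
    for step in steps:
        data = step(data)
    return data


# The converter sits between the two "services" whose formats do not match.
workflow = [fetch_sequence, fasta_to_raw, count_residues]
result = run_workflow(workflow, "demo1")
```

Once defined, the same `workflow` list can be rerun on any other identifier, which mirrors how a saved workflow is reused with different datasets.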
Seahawk is another GUI software tool that invokes BioMoby services in a context-dependent manner, for example, by selecting an amino acid sequence in a website to use as input data, so that users can analyze data as they browse information on web pages.
MOWserv is a web application that provides interactive analysis in a web browser. A web interface is dynamically generated for each BioMoby object and compatible service. MOWserv implements novel functionality for data persistence, user management, task scheduling, and fault tolerance. MOWserv therefore allows long, CPU-intensive tasks to be monitored and the execution of complex workflows to be automated. Invocations of services can be traced in the web interface and kept for later reference. An interesting aspect of MOWserv is that it has extended the BioMoby ontologies for objects and services through manual curation. This keeps the ontologies clean, which greatly simplifies interoperability between services and helps in building workflows. Additionally, each service has been annotated with additional metadata that is used to provide a consistent help system.
G-language GAE [31] is a Perl-based genome analysis workbench that provides an interactive command-line shell environment for analyses. During the BioHackathon 2008, the G-language Project team added support for BioMoby services, which can be seamlessly integrated with BioPerl and G-language GAE functions into genome analysis workflows. It also became evident during the hackathon that a standardized way to retrofit existing web services to BioMoby is needed, and work on this started using the World Wide Web Consortium's new SAWSDL standard [32].
For many tasks custom programming is still needed, for example, to parse the results obtained from web services for further extraction of data, and to integrate with local analysis pipelines. One of the most time-saving ways to accomplish these tasks is to use the Open Bio* libraries, such as BioPerl, BioPython, BioRuby and BioJava. These libraries are developed collaboratively as open source software by developers distributed all over the world, and they can manipulate numerous formats used in bioinformatics databases and applications. The Open Bio Foundation [33] has an important role in supporting these projects by providing the community with hosting services for code repositories, mailing lists, and web sites.
SOAP and REST have improved the accessibility of bioinformatics web services, but standardization of metadata is required to increase their interoperability (Table ). Although BioMoby has been contributing to this, many major services still have not adopted its conventions. As a result, end-users must still write their own code in many cases to construct a workflow. Even though GUI applications and libraries for individual programming languages are available to support this, there is not yet a "total solution" (Table ). Given these circumstances, a web service that converts data formats is needed to ease the end-users' tasks.
Applications for bioinformatics web services.