The distributed annotation system (DAS) defines a communication protocol used to exchange biological annotations. It is motivated by the idea that annotations should not be provided by single centralized databases but instead be spread over multiple sites. Data distribution, performed by DAS servers, is separated from visualization, which is carried out by DAS clients. The original DAS protocol was designed to serve annotation of genomic sequences. We have extended the protocol to be applicable to macromolecular structures. Here we present SPICE, a new DAS client that can be used to visualize protein sequence and structure annotations.
A variety of manual, computational and experimental annotations of biological data, such as genome and protein sequences, are being developed by different groups all around the world. The distributed annotation system (DAS) protocol allows data producers to share their results with the community without requiring aggregation into a central database (Dowell et al., 2001). Resources such as the EnsEMBL genome browser (Hubbard et al., 2005) use the DAS protocol to link new data to the browser and visualize it. Although originally designed to serve annotations of genomes, in the last year DAS has also received some interest from the protein-bioinformatics community, largely because of the BioSapiens (http://www.biosapiens.info/) and eFamily (http://www.efamily.org.uk/) projects.
DAS provides a simple convention to encode a DNA or protein sequence and its annotated features into simple XML documents that are exchanged via the Internet (http://www.biodas.org/). To make the DAS protocol applicable for protein structures we developed two extensions that allow alignments and 3D structure information (http://www.sanger.ac.uk/Users/ap3/DAS/) to be transmitted. Using these extensions a variety of new DAS clients become possible. For example, pairwise or multiple alignments of chromosomes or protein sequences could be visualized. Here we are using these extensions to support a new DAS client, SPICE, that can be used to visualize annotations of protein sequences and protein structures.
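To make the XML encoding concrete, the following minimal sketch parses a small features document with the Python standard library. The sample is hand-written and only modeled on the general shape of a DAS 1.x features response (a SEGMENT containing FEATURE elements with TYPE, METHOD, START and END children). The accession P00000, the example.org address, the feature ids and all coordinates are invented for illustration, and the function name parse_features is hypothetical. It is not taken from SPICE, which is written in Java.

```python
import xml.etree.ElementTree as ET

# Hand-written sample modeled on a DAS 1.x "features" response.
# Accession, URL, feature ids and coordinates are invented for illustration.
SAMPLE_DASGFF = """<DASGFF>
  <GFF version="1.0" href="http://example.org/das/demo/features?segment=P00000">
    <SEGMENT id="P00000" start="1" stop="120" version="1.0">
      <FEATURE id="f1" label="Active site">
        <TYPE id="active_site">active site</TYPE>
        <METHOD id="manual">manual curation</METHOD>
        <START>57</START>
        <END>57</END>
        <NOTE>Hypothetical catalytic residue</NOTE>
      </FEATURE>
      <FEATURE id="f2" label="Helix 1">
        <TYPE id="helix">secondary structure</TYPE>
        <METHOD id="predicted">prediction</METHOD>
        <START>10</START>
        <END>24</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>
"""


def parse_features(xml_text):
    """Return one dict per FEATURE element found in a features document."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.findall("FEATURE"):
            type_el = feature.find("TYPE")
            features.append({
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "label": feature.get("label"),
                "type": type_el.get("id") if type_el is not None else None,
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
                # NOTE is optional; findtext returns None when it is absent.
                "note": feature.findtext("NOTE"),
            })
    return features


features = parse_features(SAMPLE_DASGFF)
for f in features:
    print(f["segment"], f["type"], f["start"], f["end"], f["label"])
```

Because every feature carries only a segment identifier and start and end coordinates, a client can merge features from many independent servers as long as they refer to the same reference sequence.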
SPICE is a Java program that can be started using Java Web Start simply by following a link from a web page. It accepts either a PDB (Berman et al., 2000) or a UniProt code (Apweiler et al., 2004) as an argument. When the application is started for the first time, Web Start will download the program automatically. Once SPICE is running, it connects to the DAS registration service (http://das.sanger.ac.uk/registry/) (manuscript in preparation) to retrieve a list of available DAS servers.
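Once a list of servers is known, a client has to turn it into one request per source. The sketch below illustrates this step under stated assumptions: the list REGISTRY is a hypothetical in-memory stand-in for the answer of the registration service (its real response format is not reproduced here), the example.org addresses and the accession P00000 are invented, and only the standard DAS features command with a segment argument is used.

```python
from urllib.parse import urlencode

# Hypothetical stand-in for the server list returned by the registration
# service. Names, URLs and the dictionary layout are assumptions.
REGISTRY = [
    {"name": "demo_features",
     "url": "http://example.org/das/demo_features/",
     "capabilities": {"features"}},
    {"name": "demo_structure",
     "url": "http://example.org/das/demo_structure",
     "capabilities": {"structure"}},
    {"name": "demo_domains",
     "url": "http://example.org/das/demo_domains",
     "capabilities": {"features"}},
]


def features_url(server_url, segment):
    """Build a DAS features request for one segment (e.g. an accession)."""
    base = server_url.rstrip("/")
    return f"{base}/features?{urlencode({'segment': segment})}"


def feature_requests(registry, segment):
    """Return one features URL per source that advertises 'features'."""
    return [
        features_url(source["url"], segment)
        for source in registry
        if "features" in source["capabilities"]
    ]


urls = feature_requests(REGISTRY, "P00000")
for url in urls:
    print(url)
```

Keeping the server list as data rather than code mirrors the role of the registry: new sources become visible to the client without any change to the client itself.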
Four different types of DAS servers contribute to a complete SPICE display:
The available annotations include active site definitions, domain assignments and secondary structure assignments. One of the DAS sources provided is a mapping of genomic features including SNPs and intron/exon borders onto UniProt sequences. Using SPICE it is therefore possible to visualize the location of DNA features on the protein structure.
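Displaying a sequence feature on a structure requires projecting its sequence coordinates through a sequence-to-structure alignment. The following minimal sketch shows one way this projection could work. It assumes that the alignment is available as a list of ungapped blocks and that structure residues carry plain integer numbers (real PDB residue numbers may include insertion codes). The names Block, map_position and map_feature and all coordinates are hypothetical and do not describe the SPICE implementation.

```python
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    """One ungapped stretch of a sequence-to-structure alignment."""
    seq_start: int     # first aligned sequence position (inclusive)
    seq_end: int       # last aligned sequence position (inclusive)
    struct_start: int  # structure residue aligned to seq_start


def map_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map one sequence position to a structure residue, or None if unaligned."""
    for block in blocks:
        if block.seq_start <= pos <= block.seq_end:
            return block.struct_start + (pos - block.seq_start)
    return None


def map_feature(start: int, end: int, blocks: List[Block]) -> List[int]:
    """Map a feature range to the structure residues it covers."""
    mapped = (map_position(p, blocks) for p in range(start, end + 1))
    return [m for m in mapped if m is not None]


# Invented alignment: sequence 20-60 aligns to residues 1-41,
# sequence 70-110 aligns to residues 45-85.
blocks = [Block(20, 60, 1), Block(70, 110, 45)]

print(map_position(57, blocks))     # single-residue feature, e.g. an active site
print(map_feature(58, 72, blocks))  # range feature spanning an unaligned gap
```

Positions that fall outside every block are dropped rather than guessed, so a feature that spans a region missing from the structure is shown only on the residues that are actually observed.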
The SPICE viewer window consists of three main panels, as illustrated in Figure 1:
It is possible to add new DAS sources to SPICE. The SPICE configuration allows access to local DAS sources that are still under development or have not been registered with the DAS registration server. In this way SPICE can be used to evaluate new methods by comparing new results with the information that can be obtained from other sources. Since features can contain links back to the original data, providing data with DAS can also be used as a way to advertise and draw attention to new methods. Future developments include integrating SPICE into the EnsEMBL website. We will set up DAS sources that provide alignments of UniProt sequences and PDB structures to the EnsEMBL predicted peptides, through which it will be possible to launch SPICE.
SPICE is a tool to visualize protein sequence and protein structure annotations. It utilizes the DAS protocol to retrieve its data from separate sites on the Internet. It can be used to browse and compare the available annotations for a particular protein as well as to compare the results of newly developed methods with the pre-existing data.
The SPICE source code is available under the Lesser General Public Licence (LGPL) from http://www.derkholm.net/svn/repos/spice/. Some modules are being made available through BioJava, which is also available under the LGPL from http://www.biojava.org/.
This work has been supported by the Medical Research Council. Thanks to Rob Finn for suggesting the name SPICE and to Andreas Kähäri for many feature suggestions. Thanks to everybody who is setting up DAS servers—the system would not work without you.