The advancements of proteomics technologies have led to a rapid increase in the number, size and rate at which datasets are generated. Managing and extracting valuable information from such datasets requires the use of data management platforms and computational approaches.
We have developed the MAss SPECTRometry Analysis System (MASPECTRAS), a platform for management and analysis of proteomics LC-MS/MS data. MASPECTRAS is based on the Proteome Experimental Data Repository (PEDRo) relational database schema and follows the guidelines of the Proteomics Standards Initiative (PSI). Analysis modules include: 1) import and parsing of the results from the search engines SEQUEST, Mascot, Spectrum Mill, X! Tandem, and OMSSA; 2) peptide validation; 3) clustering of proteins based on Markov Clustering and multiple alignments; and 4) quantification using the Automated Statistical Analysis of Protein Abundance Ratios algorithm (ASAPRatio). The system provides customizable data retrieval and visualization tools, as well as export to the PRoteomics IDEntifications public repository (PRIDE). MASPECTRAS is freely available at
Given the unique features and the flexibility due to the use of standard software technology, our platform represents a significant advance and could be of great interest to the proteomics community.
Here we present a user-friendly and extremely lightweight tool that can serve as a stand-alone front-end for the Open MS Search Algorithm (OMSSA) search engine, or that can be used directly as part of an informatics processing pipeline for MS-driven proteomics. The OMSSA graphical user interface (OMSSAGUI) tool is written in Java, and is supported on Windows, Linux, and OSX platforms. It is open source under the Apache 2 license and can be downloaded from http://code.google.com/p/mass-spec-gui/.
Bioinformatics; Protein identification
The number, size, and format variation of proteomics data files (both raw and processed), their annotation, and the challenges of designing a robust data repository are some of the major factors inhibiting public dissemination of proteomics data. Sharing large amounts of data and software is a legitimate need in the field of proteomics and other scientific disciplines, as replication of results and the benefits of data reanalysis rely heavily on having access to the original data. Several journals have already published recommendations for providing access to data associated with proteomics manuscripts; however, researchers have been left with the challenge of how to appropriately satisfy the recommendations. Of particular concern is how potentially large datasets (gigabytes to terabytes of raw data) may be efficiently hosted in a publicly accessible fashion.
Described here is Tranche, a secured peer-to-peer system (http://www.proteomecommons.org/dev/dfs/), along with a reference implementation, supported by Proteome-Commons.org, that is capable of hosting virtually unlimited amounts of data and supporting virtually unlimited users. Furthermore, Tranche solves many of the prominent concerns in data dissemination, including hosting raw data associated with a proteomics experiment and maintaining annotation. It is intended as both a reference implementation and a model system for comparison to other proteomics data dissemination efforts. Tranche currently hosts many prominent proteomics datasets and mirrors of other proteomics data resources, including almost all of the publicly available proteomics data.
This paper describes the design of the NeuroLOG middleware data management layer, which provides a platform to share heterogeneous and distributed neuroimaging data using a federated approach. The semantics of shared information is captured through a multi-layer application ontology and a derived Federated Schema used to align the heterogeneous database schemata from different legacy repositories. The system also provides a facility to translate the relational data into a semantic representation that can be queried using a semantic search engine, thus enabling the exploitation of knowledge embedded in the ontology. This work shows the relevance of the distributed approach for neurosciences data management. Although more complex than a centralized approach, it is also more realistic when considering the federation of large data sets, and opens strong perspectives to implement multi-centric neurosciences studies.
In today’s proteomics research, various techniques and instrumentation bioinformatics tools are necessary to manage the large amount of heterogeneous data with an automatic quality control to produce reliable and comparable results. Therefore a data-processing pipeline is mandatory for data validation and comparison in a data-warehousing system. The proteome bioinformatics platform ProteinScape has been proven to cover these needs. The reprocessing of HUPO BPP participants’ MS data was done within ProteinScape. The reprocessed information was transferred into the global data repository PRIDE.
ProteinScape as a data-warehousing system covers two main aspects: archiving relevant data of the proteomics workflow and information extraction functionality (protein identification, quantification and generation of biological knowledge). As a strategy for automatic data validation, different protein search engines are integrated. Result analysis is performed using a decoy database search strategy, which allows the measurement of the false-positive identification rate. Peptide identifications across different workflows, different MS techniques, and different search engines are merged to obtain a quality-controlled protein list.
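The decoy search strategy mentioned above can be illustrated with a minimal sketch. It assumes separate target and decoy searches against databases of equal size, so that the number of decoy hits above a score threshold approximates the number of false target hits. The `PeptideMatch` type, the `estimate_fdr` function, the peptide strings and the scores are all hypothetical and are not taken from ProteinScape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideMatch:
    peptide: str
    score: float    # assumption: a higher score means a better match
    is_decoy: bool  # True if the match came from the decoy database


def estimate_fdr(matches, threshold):
    """Estimate the false-positive rate among target hits scoring >= threshold.

    Assumes equal-sized target and decoy databases, so decoy hits stand in
    for false target hits.
    """
    targets = sum(1 for m in matches if not m.is_decoy and m.score >= threshold)
    decoys = sum(1 for m in matches if m.is_decoy and m.score >= threshold)
    if targets == 0:
        return 0.0
    return min(1.0, decoys / targets)


# Made-up matches, for illustration only.
matches = [
    PeptideMatch("PEPTIDEA", 52.0, False),
    PeptideMatch("PEPTIDEB", 47.5, False),
    PeptideMatch("PEPTIDEC", 41.0, False),
    PeptideMatch("PEPTIDED", 33.0, False),
    PeptideMatch("EDITPEPA", 35.0, True),
    PeptideMatch("EDITPEPB", 12.0, True),
]
print(estimate_fdr(matches, threshold=30.0))
```

Raising the threshold trades sensitivity for a lower estimated rate; a pipeline would typically pick the lowest threshold that keeps the estimate under a chosen limit.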
The proteomics identifications database (PRIDE), as a public data repository, is an archiving system where data are finally stored and no longer changed by further processing steps. Data submission to PRIDE is open to proteomics laboratories generating protein and peptide identifications. An export tool has been developed for transferring all relevant HUPO BPP data from ProteinScape into PRIDE using the PRIDE.xml format.
The EU-funded ProDac project will coordinate the development of software tools covering international standards for the representation of proteomics data. The implementation of data submission pipelines and systematic data collection in public standards–compliant repositories will cover all aspects, from the generation of MS data in each laboratory to the conversion of all the annotating information and identifications to a standardized format. Such datasets can be used in the course of publishing in scientific journals.
One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step by step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center website (http://proteomics.mcw.edu/vipdac).
mass spectrometry; data analysis; search algorithms; software; cloud computing
Centralised resources such as GenBank and UniProt are perfect examples of the major international efforts that have been made to integrate and share biological information. However, additional data that adds value to these resources needs a simple and rapid route to public access. The Distributed Annotation System (DAS) provides an adequate environment to integrate genomic and proteomic information from multiple sources, making this information accessible to the community. DAS offers a way to distribute and access information but it does not provide domain experts with the mechanisms to participate in the curation process of the available biological entities and their annotations.
We designed and developed a Collaborative Annotation System for proteins called DAS Writeback. DAS Writeback is a protocol extension of DAS that provides the functionality to add, edit and delete annotations. We implemented this new specification as extensions of both a DAS server and a DAS client. The architecture was designed with the involvement of the DAS community and was improved after performing usability experiments emulating a real annotation task.
We demonstrate that DAS Writeback is effective, usable and will provide the appropriate environment for the creation and evolution of community protein annotation.
Traditional scientific workflow platforms usually run individual experiments with little of the performance evaluation and analysis required by automated experimentation, in which scientists can access numerous applicable workflows rather than being committed to a single one. Experimental protocols and data under a peer-to-peer environment could potentially be shared freely without any single point of authority to dictate how experiments should be run. In such an environment it is necessary to have mechanisms by which each individual scientist (peer) can assess, locally, how he or she wants to be involved with others in experiments. This study aims to implement and demonstrate simple peer ranking under the OpenKnowledge peer-to-peer infrastructure by both simulated and real-world bioinformatics experiments involving multi-agent interactions.
A simulated experiment environment with a peer ranking capability was specified by the Lightweight Coordination Calculus (LCC) and automatically executed under the OpenKnowledge infrastructure. The peers such as MS/MS protein identification services (including web-enabled and independent programs) were made accessible as OpenKnowledge Components (OKCs) for automated execution as peers in the experiments. The performance of the peers in these automated experiments was monitored and evaluated by simple peer ranking algorithms.
Peer ranking experiments with simulated peers exhibited characteristic behaviours, e.g., power law effect (a few dominant peers dominate), similar to that observed in the traditional Web. Real-world experiments were run using an interaction model in LCC involving two different types of MS/MS protein identification peers, viz., peptide fragment fingerprinting (PFF) and de novo sequencing with another peer ranking algorithm simply based on counting the successful and failed runs. This study demonstrated a novel integration and useful evaluation of specific proteomic peers and found MASCOT to be a dominant peer as judged by peer ranking.
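A ranking based on counting successful and failed runs, as described above, might look like the following sketch. The `PeerRanker` class, the success-fraction score and the peer names are assumptions made for illustration; the study's actual algorithm and the OpenKnowledge API are not reproduced here.

```python
from collections import defaultdict


class PeerRanker:
    """Rank peers by the fraction of their runs that succeeded."""

    def __init__(self):
        self._success = defaultdict(int)
        self._failure = defaultdict(int)

    def record(self, peer, succeeded):
        # Count one run for the peer as either a success or a failure.
        if succeeded:
            self._success[peer] += 1
        else:
            self._failure[peer] += 1

    def score(self, peer):
        ok = self._success.get(peer, 0)
        runs = ok + self._failure.get(peer, 0)
        return ok / runs if runs else 0.0

    def ranking(self):
        # Best score first; ties are broken alphabetically for determinism.
        peers = set(self._success) | set(self._failure)
        return sorted(peers, key=lambda p: (-self.score(p), p))


# Hypothetical run outcomes for three placeholder peers.
ranker = PeerRanker()
for peer, outcome in [
    ("peer_a", True), ("peer_a", True), ("peer_a", True), ("peer_a", False),
    ("peer_b", True), ("peer_b", False),
    ("peer_c", False),
]:
    ranker.record(peer, outcome)
print(ranker.ranking())
```

Because each scientist keeps such counts locally, two peers can legitimately arrive at different rankings of the same services.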
The simulated and real-world experiments in the present study demonstrated that the OpenKnowledge infrastructure with peer ranking capability can serve as an evaluative environment for automated experimentation.
Proteomics is rapidly evolving into a high-throughput technology, in which substantial and systematic studies are conducted on samples from a wide range of physiological, developmental, or pathological conditions. Reference maps from 2D gels are widely circulated. However, there is, as yet, no formally accepted standard representation to support the sharing of proteomics data, and little systematic dissemination of comprehensive proteomic data sets.
This paper describes the design, implementation and use of a Proteome Experimental Data Repository (PEDRo), which makes comprehensive proteomics data sets available for browsing, searching and downloading. It also serves to extend the debate on the level of detail at which proteomics data should be captured, the sorts of facilities that should be provided by proteome data management systems, and the techniques by which such facilities can be made available.
The PEDRo database provides access to a collection of comprehensive descriptions of experimental data sets in proteomics. Not only are these data sets interesting in and of themselves, they also provide a useful early validation of the PEDRo data model, which has served as a starting point for the ongoing standardisation activity through the Proteome Standards Initiative of the Human Proteome Organisation.
Microarray technology allows the analysis of genomic aberrations at an ever increasing resolution, making functional interpretation of these vast amounts of data the main bottleneck in routine implementation of high resolution array platforms, and emphasising the need for a centralised and easy to use CNV data management and interpretation system.
We present CNV-WebStore, an online platform to streamline the processing and downstream interpretation of microarray data in a clinical context, tailored towards but not limited to the Illumina BeadArray platform. Provided analysis tools include CNV analysis, parent of origin and uniparental disomy detection. Interpretation tools include data visualisation, gene prioritisation, automated PubMed searching, linking data to several genome browsers and annotation of CNVs based on several public databases. Finally, a module is provided for uniform reporting of results.
CNV-WebStore is able to present copy number data in an intuitive way to both lab technicians and clinicians, making it a useful tool in daily clinical practice.
Picture Archiving and Communication Systems (PACS) have been widely deployed in healthcare institutions, and they now constitute a normal commodity for practitioners. However, their installation, maintenance, and utilization are still a burden due to their heavy structures, typically supported by centralized computational solutions. In this paper, we present Dicoogle, a PACS archive supported by a document-based indexing system and by peer-to-peer (P2P) protocols. Replacing the traditional database storage (RDBMS) with a document-based organization permits gathering and indexing data from file-based repositories, which allows searching the archive through free text queries. As a direct result of this strategy, more information can be extracted from medical imaging repositories, which clearly increases flexibility when compared with current query and retrieval DICOM services. The inclusion of P2P features allows PACS internetworking without the need for a central management framework. Moreover, Dicoogle is easy to install, manage, and use, and it maintains full interoperability with standard DICOM services.
PACS; Digital Imaging and Communications in Medicine (DICOM); Medical imaging; Peer-to-peer; Computer communication networks; Open source; PACS implementation; Information storage and retrieval
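The idea of document-based indexing with free text queries, as described for Dicoogle above, can be sketched with a small inverted index. This is only an illustration of the general technique: the `MetadataIndex` class, the file paths and the metadata values are hypothetical, and Dicoogle's actual indexing engine is not shown.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lower-case alphanumeric tokens; DICOM name separators such as "^" are dropped.
    return re.findall(r"[a-z0-9]+", str(text).lower())


class MetadataIndex:
    """Inverted index from tokens to the files whose metadata contain them."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, path, metadata):
        for value in metadata.values():
            for token in tokenize(value):
                self._postings[token].add(path)

    def search(self, query):
        # Return files containing every token of the free text query.
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result


# Made-up metadata for two image files.
index = MetadataIndex()
index.add("study1/img001.dcm", {
    "PatientName": "DOE^JANE", "Modality": "CT",
    "StudyDescription": "Chest CT with contrast",
})
index.add("study2/img001.dcm", {
    "PatientName": "ROE^JOHN", "Modality": "MR",
    "StudyDescription": "Brain MR",
})
print(index.search("chest ct"))
```

Because every metadata field is tokenized, a query can match on attributes that a fixed relational schema might never have exposed as searchable columns.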
Today’s biological experiments often involve the collaboration of multidisciplinary researchers utilising several high throughput ‘omics platforms. There is a requirement for the details of the experiment to be adequately described using standardised ontologies to enable data preservation, the analysis of the data and to facilitate the export of the data to public repositories. However there are a bewildering number of ontologies, controlled vocabularies, and minimum standards available for use to describe experiments. There is a need for user-friendly software tools to aid laboratory scientists in capturing the experimental information.
A web application called XperimentR has been developed for use by laboratory scientists, consisting of a browser-based interface and server-side components which provide an intuitive platform for capturing and sharing experimental metadata. Information recorded includes details about the biological samples, procedures, protocols, and experimental technologies, all of which can be easily annotated using the appropriate ontologies. Files and raw data can be imported and associated with the biological samples via the interface, from either users’ computers, or commonly used open-source data repositories. Experiments can be shared with other users, and experiments can be exported in the standard ISA-Tab format for deposition in public databases. XperimentR is freely available and can be installed natively or by using a provided pre-configured Virtual Machine. A guest system is also available for trial purposes.
We present a web based software application to aid the laboratory scientist to capture, describe and share details about their experiments.
Experimental annotation; Ontologies; Biological data management
The Cell Imaging and Analysis Network (CIAN) provides services and tools to researchers in the field of cell biology from within or outside Montreal's McGill University community. CIAN is composed of six scientific platforms: Cell Imaging (confocal and fluorescence microscopy), Proteomics (2-D protein gel electrophoresis and DiGE, fluorescent protein analysis), Automation and High throughput screening (Pinning robot and liquid handler), Protein Expression for Antibody Production, Genomics (real-time PCR), and Data storage and analysis (cluster, server, and workstations). Users submit project proposals, and can obtain training and consultation in any aspect of the facility, or initiate projects with the full-service platforms. CIAN is designed to facilitate training, enhance interactions, as well as share and maintain resources and expertise.
Confident peptide identification is one of the most important components in mass-spectrometry-based proteomics. We propose a method to properly combine the results from different database search methods to enhance the accuracy of peptide identifications. The database search methods included in our analysis are SEQUEST (v27 rev12), ProbID (v1.0), InsPecT (v20060505), Mascot (v2.1), X! Tandem (v2007.07.01.2), OMSSA (v2.0) and RAId_DbS. Using two data sets, one collected in profile mode and one collected in centroid mode, we tested the search performance of all 21 combinations of two search methods as well as all 35 possible combinations of three search methods. The results obtained from our study suggest that properly combining search methods does improve retrieval accuracy. In addition to performance results, we also describe the theoretical framework which in principle allows one to combine many independent scoring methods including de novo sequencing and spectral library searches. The correlations among different methods are also investigated in terms of common true positives, common false positives, and a global analysis. We find that the average correlation strength, between any pairwise combination of the seven methods studied, is usually smaller than the associated standard error. This indicates only weak correlation may be present among different methods and validates our approach in combining the search results. The usefulness of our approach is further confirmed by showing that the average cumulative number of false positive peptides agrees reasonably well with the combined E-value. The data related to this study are freely available upon request.
Properly combining the results from different database search methods will enhance the accuracy of peptide identifications. This nevertheless requires, among different search methods, a common statistical standard, which can be achieved by statistical calibration of E-values. After transforming E-values into database P-values, we use the well-established Fisher’s formula to combine different P-values. Our protocol provides a statistically sound method for integration of search results.
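The combination step can be sketched as follows. The conversion P = 1 - exp(-E) assumes that the number of chance hits follows a Poisson distribution, and `fisher_combine` implements the standard Fisher method for independent P-values using the closed-form chi-square tail for an even number of degrees of freedom. Both function names are hypothetical, and the authors' calibration procedure is not reproduced.

```python
import math


def evalue_to_pvalue(e_value):
    # Assumption: chance hits are Poisson distributed, so P = 1 - exp(-E).
    return -math.expm1(-e_value)


def fisher_combine(p_values):
    """Combine independent P-values with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom, whose tail has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one P-value is required")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i  # builds (X/2)^i / i! incrementally
        total += term
    return min(1.0, math.exp(-half_x) * total)


# Illustrative values only: two search methods reporting E = 0.05 each.
p_values = [evalue_to_pvalue(0.05), evalue_to_pvalue(0.05)]
print(fisher_combine(p_values))

# Pairs and triples that can be formed from seven search methods.
print(math.comb(7, 2), math.comb(7, 3))
```

The method is only valid when the P-values are independent, which is why a check for weak correlation among the search methods matters before results are combined this way.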
With the rapid progress of biological research, there is great demand for integrative knowledge-sharing systems that efficiently support collaboration among biological researchers from various fields. To fulfill such requirements, we have developed WebLab, a data-centric knowledge-sharing platform for biologists to fetch, analyze, manipulate and share data under an intuitive web interface. Dedicated space is provided for users to store their input data and analysis results. Users can upload local data or fetch public data from remote databases, and then perform analysis using more than 260 integrated bioinformatic tools. These tools can be further organized as customized analysis workflows to accomplish complex tasks automatically. In addition to conventional biological data, WebLab also provides rich support for scientific literature, such as full-text search of uploaded literature and export of citations into various well-known citation managers such as EndNote and BibTeX. To facilitate teamwork among colleagues, WebLab provides a powerful and flexible sharing mechanism, which allows users to share input data, analysis results, scientific literature and customized workflows with specified users or groups under sophisticated privilege settings. WebLab is publicly available at http://weblab.cbi.pku.edu.cn, with all source code released as Free Software.
This Technical Note describes a novel modular framework for development and interlaboratory distribution and validation of 3D tractography algorithms based on in vivo diffusion tensor imaging (DTI) measurements. The proposed framework allows individual MRI research centers to benefit from new tractography algorithms developed at other independent centers by “plugging” new tractography modules directly into their own custom DTI software tools, such as existing graphical user interfaces (GUI) for visualizing brain white matter pathways. The proposed framework is based on the Java 3D programming platform, which provides an object-oriented programming (OOP) model and independence of computer hardware configuration and operating system. To demonstrate the utility of the proposed approach, a complete GUI for interactive DTI tractography was developed, along with two separate and interchangeable modules that implement two different tractography algorithms. Although the application discussed here relates to DTI tractography, the programming concepts presented here should be of interest to anyone who wishes to develop platform-independent GUI applications for interactive 3D visualization.
Diffusion tensor imaging; white matter; tractography
A large number of diverse, complex, and distributed data resources are currently available in the Bioinformatics domain. The pace of discovery and the diversity of information means that centralised reference databases like UniProt and Ensembl cannot integrate all potentially relevant information sources. From a user perspective however, centralised access to all relevant information concerning a specific query is essential. The Distributed Annotation System (DAS) defines a communication protocol to exchange annotations on genomic and protein sequences; this standardisation enables clients to retrieve data from a myriad of sources, thus offering centralised access to end-users.
We introduce MyDas, a web server that facilitates the publishing of biological annotations according to the DAS specification. It deals with the common functionality requirements of making data available, while also providing an extension mechanism in order to implement the specifics of data store interaction. MyDas allows the user to define where the required information is located along with its structure, and is then responsible for the communication protocol details.
An integrated data repository (IDR) containing aggregations of clinical, biomedical, economic, administrative, and public health data is a key component of an overall translational research infrastructure. But most available data repositories are designed using standard data warehouse architecture that employs arbitrary data encoding standards, making queries across disparate repositories difficult. In response to these shortcomings we have designed a Health Ontology Mapper (HOM) that translates terminologies into formal data encoding standards without altering the underlying source data. We believe the HOM system promotes inter-institutional data sharing and research collaboration, and will ultimately lower the barrier to developing and using an IDR.
Genomics datasets are increasingly useful for gaining biomedical insights, with adoption in the clinic underway. However, multiple hurdles related to data management stand in the way of their efficient large-scale utilization. The solution proposed is a web-based data storage hub. Having clear focus, flexibility and adaptability, InSilico DB seamlessly connects genomics dataset repositories to state-of-the-art and free GUI and command-line data analysis tools. The InSilico DB platform is a powerful collaborative environment, with advanced capabilities for biocuration, dataset sharing, and dataset subsetting and combination. InSilico DB is available from https://insilicodb.org.
The EMBL Nucleotide Sequence Database is a comprehensive database of DNA and RNA sequences and related information traditionally made available in flat-file format. Queries through tools such as SRS (Sequence Retrieval System) also return data in flat-file format. Flat files have a number of shortcomings, however, and the resources therefore currently lack a flexible environment to meet individual researchers' needs. The Object Management Group's common object request broker architecture (CORBA) is an industry standard that provides platform-independent programming interfaces and models for portable distributed object-oriented computing applications. Its independence from programming languages, computing platforms and network protocols makes it attractive for developing new applications for querying and distributing biological data.
A CORBA infrastructure developed by EMBL-EBI provides an efficient means of accessing and distributing EMBL data. The EMBL object model is defined such that it provides a basis for specifying interfaces in interface definition language (IDL) and thus for developing the CORBA servers. The mapping from the object model to the relational schema in the underlying Oracle database uses the facilities provided by PersistenceTM, an object/relational tool. The techniques of developing loaders and 'live object caching' with persistent objects achieve a smart live object cache where objects are created on demand. The objects are managed by an evictor pattern mechanism.
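The combination of on-demand object creation and an evictor can be illustrated with a small cache sketch. The least-recently-used eviction policy, the `LiveObjectCache` class and the loader function are assumptions chosen for illustration; the abstract does not state which eviction policy the EMBL servers use, and no CORBA machinery is shown.

```python
from collections import OrderedDict


class LiveObjectCache:
    """Create objects on first request and evict the least recently used."""

    def __init__(self, loader, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._loader = loader          # called to build an object on a cache miss
        self._capacity = capacity
        self._objects = OrderedDict()  # oldest entry first

    def get(self, key):
        if key in self._objects:
            self._objects.move_to_end(key)  # mark as most recently used
            return self._objects[key]
        obj = self._loader(key)             # object is created on demand
        self._objects[key] = obj
        if len(self._objects) > self._capacity:
            self._objects.popitem(last=False)  # evict the oldest entry
        return obj

    def cached_keys(self):
        return list(self._objects)


# Hypothetical loader standing in for a database fetch.
loads = []


def load_entry(accession):
    loads.append(accession)
    return {"accession": accession}


cache = LiveObjectCache(load_entry, capacity=2)
for accession in ["A", "B", "A", "C"]:
    cache.get(accession)
print(cache.cached_keys(), loads)
```

Bounding the number of live objects keeps server memory predictable, at the cost of reloading an evicted entry if it is requested again.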
The CORBA interfaces to the EMBL database address some of the problems of traditional flat-file formats and provide an efficient means for accessing and distributing EMBL data. CORBA also provides a flexible environment for users to develop their applications by building clients to our CORBA servers, which can be integrated into existing systems.
The Sequence Read Archive (SRA) is the largest public repository of sequencing data from the next generation of sequencing platforms including Illumina (Genome Analyzer, HiSeq, MiSeq, etc.), Roche 454 GS System, Applied Biosystems SOLiD System, Helicos Heliscope, PacBio RS, and others.
SRAdb is an attempt to make queries of the metadata associated with SRA submission, study, sample, experiment and run more robust and precise, and make access to sequencing data in the SRA easier. We have parsed all the SRA metadata into a SQLite database that is routinely updated and can be easily distributed. The SRAdb R/Bioconductor package then utilizes this SQLite database for querying and accessing metadata. Full text search functionality makes querying metadata very flexible and powerful. Fastq files associated with query results can be downloaded easily for local analysis. The package also includes an interface from R to a popular genome browser, the Integrated Genomics Viewer.
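Querying run metadata held in SQLite can be sketched as below. SRAdb itself is an R/Bioconductor package; this Python sketch only illustrates the underlying idea with an in-memory database. The `run_metadata` table, its columns and the placeholder accessions are hypothetical and do not reflect the real SRAdb schema, and a simple `LIKE` match stands in for the package's full text search.

```python
import sqlite3


def build_demo_db():
    # In-memory database with a made-up, simplified metadata table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE run_metadata ("
        "run_accession TEXT PRIMARY KEY, study_title TEXT, platform TEXT)"
    )
    conn.executemany(
        "INSERT INTO run_metadata VALUES (?, ?, ?)",
        [
            ("DEMO_RUN_1", "Breast cancer transcriptome", "ILLUMINA"),
            ("DEMO_RUN_2", "Soil metagenome survey", "LS454"),
            ("DEMO_RUN_3", "Breast tissue methylation", "ILLUMINA"),
        ],
    )
    return conn


def search_runs(conn, keyword):
    """Return run accessions whose study title contains the keyword."""
    # Note: "%" and "_" inside the keyword act as LIKE wildcards.
    rows = conn.execute(
        "SELECT run_accession FROM run_metadata "
        "WHERE study_title LIKE ? ORDER BY run_accession",
        (f"%{keyword}%",),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
print(search_runs(conn, "breast"))
```

Shipping the metadata as a single SQLite file means queries like this run locally, without a network round trip to the archive for each search.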
The SRAdb Bioconductor package provides a convenient and integrated framework to query and access SRA metadata quickly and powerfully from within R.
The authors developed a prototype Shared Health Research Information Network (SHRINE) to identify the technical, regulatory, and political challenges of creating a federated query tool for clinical data repositories. Separate Institutional Review Boards (IRBs) at Harvard's three largest affiliated health centers approved use of their data, and the Harvard Medical School IRB approved building a Query Aggregator Interface that can simultaneously send queries to each hospital and display aggregate counts of the number of matching patients. Our experience creating three local repositories using the open source Informatics for Integrating Biology and the Bedside (i2b2) platform can be used as a road map for other institutions. The authors are actively working with the IRBs and regulatory groups to develop procedures that will ultimately allow investigators to obtain identified patient data and biomaterials through SHRINE. This will guide us in creating a future technical architecture that is scalable to a national level, compliant with ethical guidelines, and protective of the interests of the participating hospitals.
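The aggregator's role of sending one query to every site and reporting only counts can be sketched as follows. The site names, patient records and function names are hypothetical, and in-memory lists stand in for the hospitals' repositories; the real SHRINE and i2b2 interfaces are not shown.

```python
def count_matches(records, predicate):
    # Runs inside one site; only the count leaves the site, not the records.
    return sum(1 for record in records if predicate(record))


def aggregate_query(sites, predicate):
    """Send the same query to every site and return per-site and total counts."""
    per_site = {
        name: count_matches(records, predicate)
        for name, records in sites.items()
    }
    return per_site, sum(per_site.values())


# Made-up records for three placeholder sites.
sites = {
    "hospital_a": [
        {"age": 64, "diagnosis": "diabetes"},
        {"age": 41, "diagnosis": "asthma"},
    ],
    "hospital_b": [{"age": 70, "diagnosis": "diabetes"}],
    "hospital_c": [
        {"age": 35, "diagnosis": "diabetes"},
        {"age": 58, "diagnosis": "diabetes"},
    ],
}


def older_diabetic(record):
    return record["diagnosis"] == "diabetes" and record["age"] >= 50


per_site, total = aggregate_query(sites, older_diabetic)
print(per_site, total)
```

Returning only counts keeps patient-level data inside each institution, which is the property that makes this kind of query easier to approve than a data transfer.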
High-throughput structural proteomics is expected to generate considerable amounts of data on the progress of structure determination for many proteins. For each protein this includes information about cloning, expression, purification, biophysical characterization and structure determination via NMR spectroscopy or X-ray crystallography. It will be essential to develop specifications and ontologies for standardizing this information to make it amenable to retrospective analysis. To this end we created the SPINE database and analysis system for the Northeast Structural Genomics Consortium. SPINE, which is available at bioinfo.mbb.yale.edu/nesg or nesg.org, is specifically designed to enable distributed scientific collaboration via the Internet. It was designed not just as an information repository but as an active vehicle to standardize proteomics data in a form that would enable systematic data mining. The system features an intuitive user interface for interactive retrieval and modification of expression construct data, query forms designed to track global project progress and external links to many other resources. Currently the database contains experimental data on 985 constructs, of which 740 are drawn from Methanobacterium thermoautotrophicum, 123 from Saccharomyces cerevisiae, 93 from Caenorhabditis elegans and the remainder from other organisms. We developed a comprehensive set of data mining features for each protein, including several related to experimental progress (e.g. expression level, solubility and crystallization) and 42 based on the underlying protein sequence (e.g. amino acid composition, secondary structure and occurrence of low complexity regions). We demonstrate in detail the application of a particular machine learning approach, decision trees, to the tasks of predicting a protein’s solubility and propensity to crystallize based on sequence features. We are able to extract a number of key rules from our trees, in particular that soluble proteins tend to have significantly more acidic residues and fewer hydrophobic stretches than insoluble ones. One of the characteristics of proteomics data sets, currently and in the foreseeable future, is their intermediate size (∼500–5000 data points). This creates a number of issues in relation to error estimation. Initially we estimate the overall error in our trees based on standard cross-validation. However, this leaves out a significant fraction of the data in model construction and does not give error estimates on individual rules. Therefore, we present alternative methods to estimate the error in particular
Technological advances in mass spectrometry and other detection methods are leading to larger and larger proteomics datasets. However, when papers describing such information are published the enormous volume of data can typically only be provided as supplementary data in a tabular form through the journal website. Several journals in the proteomics field, together with the Human Proteome Organization's (HUPO) Proteomics Standards Initiative and institutions such as the Institute for Systems Biology are working towards standardizing the reporting of proteomics data, but just defining standards is only a means towards an end for sharing data. Data repositories such as ProteomeCommons.org and the Open Proteomics Database allow for public access to proteomics data but provide little, if any, interpretation.
Results & conclusion
Here we describe PrestOMIC, an open source application for storing mass spectrometry-based proteomic data in a relational database and for providing a user-friendly, searchable and customizable browser interface to share one's data with the scientific community. The underlying database and all associated applications are built on other existing open source tools, allowing PrestOMIC to be modified as the data standards evolve. We then use PrestOMIC to present a recently published dataset from our group through our website.
Motivation: The work presented here describes the ‘anatomical Gene-Expression Mapping (aGEM)’ Platform, a development conceived to integrate phenotypic information with the spatial and temporal distributions of genes expressed in the mouse. The aGEM Platform has been built by extending the Distributed Annotation System (DAS) protocol, which was originally designed to share genome annotations over the WWW. DAS is a client-server system in which a single client integrates information from multiple distributed servers.
Results: The aGEM Platform provides information to answer three main questions. (i) Which genes are expressed in a given mouse anatomical component? (ii) In which mouse anatomical structures are a given gene or set of genes expressed? And (iii) is there any correlation among these findings? Currently, this Platform includes several well-known mouse resources (EMAGE, GXD and GENSAT), hosting gene-expression data mostly obtained from in situ techniques together with a broad set of image-derived annotations.
Availability: The Platform is optimized for Firefox 3.0 and it is accessed through a friendly and intuitive display: http://agem.cnb.csic.es
Supplementary information: Supplementary data are available at http://bioweb.cnb.csic.es/VisualOmics/aGEM/home.html and http://bioweb.cnb.csic.es/VisualOmics/index_VO.html and Bioinformatics online.