Advances in proteomics technologies have led to a rapid increase in the number, size, and rate at which datasets are generated. Managing and extracting valuable information from such datasets requires the use of data management platforms and computational approaches.
We have developed the MAss SPECTRometry Analysis System (MASPECTRAS), a platform for management and analysis of proteomics LC-MS/MS data. MASPECTRAS is based on the Proteome Experimental Data Repository (PEDRo) relational database schema and follows the guidelines of the Proteomics Standards Initiative (PSI). Analysis modules include: 1) import and parsing of the results from the search engines SEQUEST, Mascot, Spectrum Mill, X! Tandem, and OMSSA; 2) peptide validation; 3) clustering of proteins based on Markov Clustering and multiple alignments; and 4) quantification using the Automated Statistical Analysis of Protein Abundance Ratios (ASAPRatio) algorithm. The system provides customizable data retrieval and visualization tools, as well as export to the PRoteomics IDEntifications public repository (PRIDE). MASPECTRAS is freely available at
Given its unique features and the flexibility afforded by the use of standard software technology, our platform represents a significant advance and could be of great interest to the proteomics community.
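As a rough illustration of the Markov Clustering step mentioned above, the sketch below applies MCL to a hypothetical protein similarity matrix; the matrix values, inflation parameter, and convergence test are illustrative assumptions, not MASPECTRAS code.

```python
import numpy as np

def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-6):
    """Minimal Markov Clustering (MCL) sketch on a symmetric similarity matrix."""
    # Column-normalize to obtain a stochastic matrix.
    m = similarity / similarity.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        expanded = m @ m                      # expansion: random-walk step
        inflated = expanded ** inflation      # inflation: strengthen strong edges
        inflated /= inflated.sum(axis=0, keepdims=True)
        if np.allclose(inflated, m, atol=tol):
            break
        m = inflated
    # Non-zero rows are attractors; their non-zero columns are the cluster members.
    return [set(np.nonzero(row > 1e-6)[0]) for row in m if row.max() > 1e-6]

# Hypothetical similarity matrix for five proteins (values are made up).
sim = np.array([[1.0, 0.9, 0.8, 0.0, 0.0],
                [0.9, 1.0, 0.7, 0.0, 0.0],
                [0.8, 0.7, 1.0, 0.1, 0.0],
                [0.0, 0.0, 0.1, 1.0, 0.9],
                [0.0, 0.0, 0.0, 0.9, 1.0]], dtype=float)
print(mcl(sim))
```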
Here we present a user-friendly and extremely lightweight tool that can serve as a stand-alone front-end for the Open MS Search Algorithm (OMSSA) search engine, or that can be used directly as part of an informatics processing pipeline for MS-driven proteomics. The OMSSA graphical user interface (OMSSAGUI) tool is written in Java and is supported on Windows, Linux, and OSX platforms. It is open source under the Apache 2 license and can be downloaded from http://code.google.com/p/mass-spec-gui/.
Bioinformatics; Protein identification
The number, size, and format variation of proteomics data files (both raw and processed), annotation, as well as challenges in designing a robust data repository are some of the major factors inhibiting public dissemination of proteomics data. Sharing large amounts of data and software is a legitimate need in proteomics and other scientific disciplines, as replication of results and the benefits of data reanalysis rely heavily on having access to the original data. Several journals have already published recommendations for providing access to data associated with proteomics manuscripts; however, researchers have been left with the challenge of how to appropriately satisfy those recommendations. Of particular concern is how potentially large datasets (gigabytes to terabytes of raw data) may be efficiently hosted in a publicly accessible fashion.
Described here is Tranche, a secured peer-to-peer system (http://www.proteomecommons.org/dev/dfs/), along with a reference implementation, supported by ProteomeCommons.org, that is capable of hosting virtually unlimited amounts of data and supporting virtually unlimited users. Furthermore, Tranche solves many of the prominent concerns in data dissemination, including hosting the raw data associated with a proteomics experiment and maintaining annotation. It is intended as both a reference implementation and a model system for comparison with other proteomics data dissemination efforts. Tranche currently hosts many prominent proteomics datasets and mirrors of other proteomics data resources, including most of the publicly available proteomics data.
This paper describes the design of the NeuroLOG middleware data management layer, which provides a platform to share heterogeneous and distributed neuroimaging data using a federated approach. The semantics of shared information is captured through a multi-layer application ontology and a derived Federated Schema used to align the heterogeneous database schemata of different legacy repositories. The system also provides a facility to translate the relational data into a semantic representation that can be queried using a semantic search engine, thus enabling the exploitation of knowledge embedded in the ontology. This work shows the relevance of the distributed approach to neuroscience data management. Although more complex than a centralized approach, it is also more realistic when considering the federation of large datasets, and it opens strong prospects for implementing multi-centric neuroscience studies.
In today’s proteomics research, various techniques, instrumentation, and bioinformatics tools are necessary to manage the large amount of heterogeneous data, with automatic quality control, to produce reliable and comparable results. A data-processing pipeline is therefore mandatory for data validation and comparison in a data-warehousing system. The proteome bioinformatics platform ProteinScape has been proven to cover these needs. The reprocessing of HUPO BPP participants’ MS data was done within ProteinScape. The reprocessed information was transferred into the global data repository PRIDE.
ProteinScape as a data-warehousing system covers two main aspects: archiving relevant data of the proteomics workflow and information extraction functionality (protein identification, quantification and generation of biological knowledge). As a strategy for automatic data validation, different protein search engines are integrated. Result analysis is performed using a decoy database search strategy, which allows the measurement of the false-positive identification rate. Peptide identifications across different workflows, different MS techniques, and different search engines are merged to obtain a quality-controlled protein list.
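A minimal sketch of the decoy-based validation idea described above: given scores for target and decoy database hits, the false-positive identification rate at a score threshold can be estimated from the decoy hit count. The score lists and threshold below are hypothetical and not ProteinScape code.

```python
def decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate the false-positive rate at a score threshold using the
    classic target-decoy approach: FDR ~= decoy hits / target hits."""
    n_targets = sum(s >= threshold for s in target_scores)
    n_decoys = sum(s >= threshold for s in decoy_scores)
    return n_decoys / n_targets if n_targets else 0.0

# Hypothetical peptide-spectrum match scores.
targets = [55.2, 48.1, 44.0, 39.7, 35.5, 22.3]
decoys  = [36.1, 21.9, 18.4]
print(decoy_fdr(targets, decoys, threshold=35.0))  # 1 decoy / 5 targets = 0.2
```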
The proteomics identifications database (PRIDE), as a public data repository, is an archiving system where data are finally stored and no longer changed by further processing steps. Data submission to PRIDE is open to proteomics laboratories generating protein and peptide identifications. An export tool has been developed for transferring all relevant HUPO BPP data from ProteinScape into PRIDE using the PRIDE.xml format.
The EU-funded ProDac project will coordinate the development of software tools covering international standards for the representation of proteomics data. The implementation of data submission pipelines and systematic data collection in public standards-compliant repositories will cover all aspects, from the generation of MS data in each laboratory to the conversion of all the annotating information and identifications to a standardized format. Such datasets can be used in the course of publishing in scientific journals.
One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step-by-step instructions on how to implement the virtual proteomics analysis clusters, as well as a list of currently available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases, on the Medical College of Wisconsin Proteomics Center website (http://proteomics.mcw.edu/vipdac).
mass spectrometry; data analysis; search algorithms; software; cloud computing
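The sketch below illustrates, with boto3, how one might launch an EC2 instance from a preconfigured machine image of the kind described above; the AMI ID, instance type, region, and key name are placeholders, not the actual ViPDAC images or parameters.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch a single instance from a hypothetical preconfigured proteomics AMI.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI ID
    InstanceType="c5.xlarge",          # placeholder instance type
    MinCount=1,
    MaxCount=1,
    KeyName="my-keypair",              # placeholder key pair
)
instance_id = response["Instances"][0]["InstanceId"]
print("Launched analysis node:", instance_id)
```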
Centralised resources such as GenBank and UniProt are prime examples of the major international efforts that have been made to integrate and share biological information. However, additional data that add value to these resources need a simple and rapid route to public access. The Distributed Annotation System (DAS) provides an adequate environment to integrate genomic and proteomic information from multiple sources, making this information accessible to the community. DAS offers a way to distribute and access information, but it does not provide domain experts with the mechanisms to participate in the curation process of the available biological entities and their annotations.
We designed and developed a Collaborative Annotation System for proteins called DAS Writeback. DAS Writeback is a protocol extension of DAS that provides the functionality to add, edit and delete annotations. We implemented this new specification as extensions of both a DAS server and a DAS client. The architecture was designed with the involvement of the DAS community and was improved after usability experiments emulating a real annotation task.
We demonstrate that DAS Writeback is effective, usable and will provide the appropriate environment for the creation and evolution of community protein annotation.
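A hedged illustration of the write operations such an extension enables: the endpoint URL, payload structure, and identifiers below are hypothetical and intended only to show how adding and deleting an annotation over HTTP might look from a client; they do not reproduce the actual DAS Writeback specification.

```python
import requests

WRITEBACK_URL = "http://example.org/das/writeback/features"  # hypothetical endpoint

# Add (or edit) an annotation on a protein segment; the XML payload is illustrative.
new_feature = """<?xml version="1.0"?>
<DASGFF>
  <GFF>
    <SEGMENT id="P05067" start="1" stop="770">
      <FEATURE id="user-note-1" label="Putative binding site">
        <START>120</START>
        <END>135</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""

resp = requests.put(WRITEBACK_URL, data=new_feature,
                    headers={"Content-Type": "application/xml"})
print(resp.status_code)

# Delete the same annotation.
resp = requests.delete(WRITEBACK_URL, params={"feature_id": "user-note-1"})
print(resp.status_code)
```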
Traditional scientific workflow platforms usually run individual experiments with little evaluation or analysis of performance, as is required by automated experimentation, in which scientists can access numerous applicable workflows rather than being committed to a single one. Experimental protocols and data in a peer-to-peer environment could potentially be shared freely, without any single point of authority to dictate how experiments should be run. In such an environment, it is necessary to have mechanisms by which each individual scientist (peer) can assess, locally, how he or she wants to be involved with others in experiments. This study aims to implement and demonstrate simple peer ranking under the OpenKnowledge peer-to-peer infrastructure through both simulated and real-world bioinformatics experiments involving multi-agent interactions.
A simulated experiment environment with a peer ranking capability was specified in the Lightweight Coordination Calculus (LCC) and automatically executed under the OpenKnowledge infrastructure. Peers such as MS/MS protein identification services (including both web-enabled and stand-alone programs) were made accessible as OpenKnowledge Components (OKCs) for automated execution as peers in the experiments. The performance of the peers in these automated experiments was monitored and evaluated by simple peer ranking algorithms.
Peer ranking experiments with simulated peers exhibited characteristic behaviours, e.g., a power-law effect (a few peers dominate), similar to that observed on the traditional Web. Real-world experiments were run using an interaction model in LCC involving two different types of MS/MS protein identification peers, viz., peptide fragment fingerprinting (PFF) and de novo sequencing, with another peer ranking algorithm based simply on counting successful and failed runs. This study demonstrated a novel integration and useful evaluation of specific proteomic peers and found MASCOT to be a dominant peer as judged by peer ranking.
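A minimal sketch of the simple ranking scheme described above, ordering peers by success rate computed from successful and failed runs; the peer names and run outcomes are invented for illustration and are not taken from the study.

```python
from collections import defaultdict

def rank_peers(run_log):
    """Rank peers by success rate computed from (peer, succeeded) run records."""
    counts = defaultdict(lambda: [0, 0])          # peer -> [successes, failures]
    for peer, succeeded in run_log:
        counts[peer][0 if succeeded else 1] += 1
    def success_rate(item):
        successes, failures = item[1]
        return successes / (successes + failures)
    return sorted(counts.items(), key=success_rate, reverse=True)

# Hypothetical runs of MS/MS identification peers.
runs = [("mascot", True), ("mascot", True), ("omssa", True),
        ("omssa", False), ("denovo", False), ("mascot", True)]
for peer, (ok, fail) in rank_peers(runs):
    print(f"{peer}: {ok} succeeded, {fail} failed")
```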
The simulated and real-world experiments in the present study demonstrated that the OpenKnowledge infrastructure with peer ranking capability can serve as an evaluative environment for automated experimentation.
Proteomics is rapidly evolving into a high-throughput technology, in which substantial and systematic studies are conducted on samples from a wide range of physiological, developmental, or pathological conditions. Reference maps from 2D gels are widely circulated. However, there is, as yet, no formally accepted standard representation to support the sharing of proteomics data, and little systematic dissemination of comprehensive proteomic data sets.
This paper describes the design, implementation and use of a Proteome Experimental Data Repository (PEDRo), which makes comprehensive proteomics data sets available for browsing, searching and downloading. It also serves to extend the debate on the level of detail at which proteomics data should be captured, the sorts of facilities that should be provided by proteome data management systems, and the techniques by which such facilities can be made available.
The PEDRo database provides access to a collection of comprehensive descriptions of experimental data sets in proteomics. Not only are these data sets interesting in and of themselves, they also provide a useful early validation of the PEDRo data model, which has served as a starting point for the ongoing standardisation activity through the Proteome Standards Initiative of the Human Proteome Organisation.
Microarray technology allows the analysis of genomic aberrations at an ever-increasing resolution, making functional interpretation of these vast amounts of data the main bottleneck in the routine implementation of high-resolution array platforms and emphasising the need for a centralised and easy-to-use CNV data management and interpretation system.
We present CNV-WebStore, an online platform to streamline the processing and downstream interpretation of microarray data in a clinical context, tailored towards but not limited to the Illumina BeadArray platform. Provided analysis tools include CNV analysis, parent-of-origin and uniparental disomy detection. Interpretation tools include data visualisation, gene prioritisation, automated PubMed searching, linking data to several genome browsers and annotation of CNVs based on several public databases. Finally, a module is provided for uniform reporting of results.
CNV-WebStore is able to present copy number data in an intuitive way to both lab technicians and clinicians, making it a useful tool in daily clinical practice.
Picture Archiving and Communication Systems (PACS) have been widely deployed in healthcare institutions and now constitute a normal commodity for practitioners. However, their installation, maintenance, and utilization are still a burden due to their heavy structures, typically supported by centralized computational solutions. In this paper, we present Dicoogle, a PACS archive supported by a document-based indexing system and by peer-to-peer (P2P) protocols. Replacing traditional relational database storage (RDBMS) with a document-based organization permits gathering and indexing data from file-based repositories, which allows the archive to be searched through free-text queries. As a direct result of this strategy, more information can be extracted from medical imaging repositories, which clearly increases flexibility compared with current DICOM query and retrieval services. The inclusion of P2P features allows PACS internetworking without the need for a central management framework. Moreover, Dicoogle is easy to install, manage, and use, and it maintains full interoperability with standard DICOM services.
PACS; Digital Imaging and Communications in Medicine (DICOM); Medical imaging; Peer-to-peer; Computer communication networks; Open source; PACS implementation; Information storage and retrieval
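To illustrate the document-based, free-text indexing idea described above (not Dicoogle's actual implementation), the sketch below builds a toy inverted index over DICOM-like metadata documents and answers free-text queries against it; the field names and values are fabricated.

```python
from collections import defaultdict

def build_index(documents):
    """Build an inverted index: token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, fields in documents.items():
        for value in fields.values():
            for token in str(value).lower().split():
                index[token].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every token of the free-text query."""
    tokens = query.lower().split()
    hits = [index.get(t, set()) for t in tokens]
    return set.intersection(*hits) if hits else set()

# Fabricated DICOM-like metadata documents.
docs = {
    "img001": {"Modality": "CT", "StudyDescription": "thorax routine"},
    "img002": {"Modality": "MR", "StudyDescription": "brain tumour follow up"},
}
index = build_index(docs)
print(search(index, "brain MR"))   # -> {'img002'}
```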
Today’s biological experiments often involve the collaboration of multidisciplinary researchers utilising several high-throughput ‘omics platforms. The details of the experiment need to be adequately described using standardised ontologies to enable data preservation, data analysis, and export of the data to public repositories. However, there is a bewildering number of ontologies, controlled vocabularies, and minimum standards available for describing experiments, and user-friendly software tools are needed to aid laboratory scientists in capturing the experimental information.
A web application called XperimentR has been developed for use by laboratory scientists, consisting of a browser-based interface and server-side components which provide an intuitive platform for capturing and sharing experimental metadata. Information recorded includes details about the biological samples, procedures, protocols, and experimental technologies, all of which can be easily annotated using the appropriate ontologies. Files and raw data can be imported and associated with the biological samples via the interface, from either users’ computers, or commonly used open-source data repositories. Experiments can be shared with other users, and experiments can be exported in the standard ISA-Tab format for deposition in public databases. XperimentR is freely available and can be installed natively or by using a provided pre-configured Virtual Machine. A guest system is also available for trial purposes.
We present a web-based software application to aid laboratory scientists in capturing, describing and sharing details about their experiments.
Experimental annotation; Ontologies; Biological data management
Confident peptide identification is one of the most important components in mass-spectrometry-based proteomics. We propose a method to properly combine the results from different database search methods to enhance the accuracy of peptide identifications. The database search methods included in our analysis are SEQUEST (v27 rev12), ProbID (v1.0), InsPecT (v20060505), Mascot (v2.1), X! Tandem (v2007.07.01.2), OMSSA (v2.0) and RAId_DbS. Using two data sets, one collected in profile mode and one collected in centroid mode, we tested the search performance of all 21 combinations of two search methods as well as all 35 possible combinations of three search methods. The results obtained from our study suggest that properly combining search methods does improve retrieval accuracy. In addition to performance results, we also describe the theoretical framework which in principle allows one to combine many independent scoring methods including de novo sequencing and spectral library searches. The correlations among different methods are also investigated in terms of common true positives, common false positives, and a global analysis. We find that the average correlation strength, between any pairwise combination of the seven methods studied, is usually smaller than the associated standard error. This indicates only weak correlation may be present among different methods and validates our approach in combining the search results. The usefulness of our approach is further confirmed by showing that the average cumulative number of false positive peptides agrees reasonably well with the combined E-value. The data related to this study are freely available upon request.
Properly combining the results from different database search methods will enhance the accuracy of peptide identifications. This nevertheless requires, among different search methods, a common statistical standard, which can be achieved by statistical calibration of E-values. After transforming E-values into database P-values, we use the well-established Fisher’s formula to combine different P-values. Our protocol provides a statistically sound method for integration of search results.
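A minimal sketch of the combination step described above: once each search method's E-value has been converted to a database P-value, Fisher's formula combines the P-values into a chi-squared statistic with 2k degrees of freedom. The P-values below are placeholders, not results from the study.

```python
import math
from scipy.stats import chi2

def fisher_combine(p_values):
    """Combine independent P-values with Fisher's method:
    X = -2 * sum(ln p_i) follows a chi-squared distribution with 2k d.o.f."""
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    dof = 2 * len(p_values)
    return chi2.sf(statistic, dof)   # combined P-value

# Hypothetical database P-values from three search methods for one peptide.
print(fisher_combine([0.01, 0.20, 0.05]))
```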
The Cell Imaging and Analysis Network (CIAN) provides services and tools to researchers in the field of cell biology from within or outside Montreal's McGill University community. CIAN is composed of six scientific platforms: Cell Imaging (confocal and fluorescence microscopy), Proteomics (2-D protein gel electrophoresis and DiGE, fluorescent protein analysis), Automation and High throughput screening (Pinning robot and liquid handler), Protein Expression for Antibody Production, Genomics (real-time PCR), and Data storage and analysis (cluster, server, and workstations). Users submit project proposals, and can obtain training and consultation in any aspect of the facility, or initiate projects with the full-service platforms. CIAN is designed to facilitate training, enhance interactions, as well as share and maintain resources and expertise.
With the rapid progress of biological research, there is great demand for integrative knowledge-sharing systems that can efficiently support collaboration between biological researchers from various fields. To fulfil such requirements, we have developed WebLab, a data-centric knowledge-sharing platform for biologists to fetch, analyze, manipulate and share data under an intuitive web interface. Dedicated space is provided for users to store their input data and analysis results. Users can upload local data or fetch public data from remote databases, and then perform analyses using more than 260 integrated bioinformatic tools. These tools can be further organized into customized analysis workflows to accomplish complex tasks automatically. In addition to conventional biological data, WebLab also provides rich support for scientific literature, such as searching against the full text of uploaded articles and exporting citations to well-known citation managers such as EndNote and BibTeX. To facilitate teamwork among colleagues, WebLab provides a powerful and flexible sharing mechanism, which allows users to share input data, analysis results, scientific literature and customized workflows with specified users or groups under sophisticated privilege settings. WebLab is publicly available at http://weblab.cbi.pku.edu.cn, with all source code released as Free Software.
Bioinformatics has moved from in-house computing infrastructure to cloud computing to tackle the vast quantity of biological data, enabling a large number of collaborative research efforts to share their work around the world. At the same time, retrieving biological data over the internet has become more and more difficult because of its explosive growth and frequent changes. Various efforts have been made to address the problems of data discovery and delivery in the cloud framework, but most of them are hindered by a MapReduce master server that must track all available data. In this paper, we propose an alternative approach, called PRKad, which exploits a peer-to-peer (P2P) model to achieve efficient data discovery and delivery. PRKad is a Kademlia-based implementation with Round-Trip Time (RTT) as the associated key, and it locates data using a Distributed Hash Table (DHT) and the XOR metric. Simulation results show that PRKad retrieves data with low link latency. As an interdisciplinary application of P2P computing for bioinformatics, PRKad also provides good scalability for serving a greater number of users in dynamic cloud environments.
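A small sketch of the XOR metric underlying Kademlia-style lookup mentioned above; the node identifiers are arbitrary, and PRKad's routing-table logic and RTT-based keying are not reproduced here.

```python
def xor_distance(node_a: int, node_b: int) -> int:
    """Kademlia distance between two node/key identifiers is their bitwise XOR."""
    return node_a ^ node_b

def closest_nodes(key: int, nodes: list[int], k: int = 3) -> list[int]:
    """Return the k nodes closest to a key under the XOR metric."""
    return sorted(nodes, key=lambda n: xor_distance(key, n))[:k]

# Hypothetical 8-bit identifiers.
peers = [0b00010110, 0b10110001, 0b00011000, 0b11100010]
print(closest_nodes(0b00010000, peers, k=2))   # -> [22, 24]
```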
This Technical Note describes a novel modular framework for development and interlaboratory distribution and validation of 3D tractography algorithms based on in vivo diffusion tensor imaging (DTI) measurements. The proposed framework allows individual MRI research centers to benefit from new tractography algorithms developed at other independent centers by “plugging” new tractography modules directly into their own custom DTI software tools, such as existing graphical user interfaces (GUI) for visualizing brain white matter pathways. The proposed framework is based on the Java 3D programming platform, which provides an object-oriented programming (OOP) model and independence of computer hardware configuration and operating system. To demonstrate the utility of the proposed approach, a complete GUI for interactive DTI tractography was developed, along with two separate and interchangeable modules that implement two different tractography algorithms. Although the application discussed here relates to DTI tractography, the programming concepts presented here should be of interest to anyone who wishes to develop platform-independent GUI applications for interactive 3D visualization.
Diffusion tensor imaging; white matter; tractography
An integrated data repository (IDR) containing aggregations of clinical, biomedical, economic, administrative, and public health data is a key component of an overall translational research infrastructure. However, most available data repositories are designed using standard data warehouse architecture that employs arbitrary data encoding standards, making queries across disparate repositories difficult. In response to these shortcomings, we have designed a Health Ontology Mapper (HOM) that translates terminologies into formal data encoding standards without altering the underlying source data. We believe the HOM system promotes inter-institutional data sharing and research collaboration, and it will ultimately lower the barrier to developing and using an IDR.
A large number of diverse, complex, and distributed data resources are currently available in the bioinformatics domain. The pace of discovery and the diversity of information mean that centralised reference databases like UniProt and Ensembl cannot integrate all potentially relevant information sources. From a user perspective, however, centralised access to all relevant information concerning a specific query is essential. The Distributed Annotation System (DAS) defines a communication protocol to exchange annotations on genomic and protein sequences; this standardisation enables clients to retrieve data from a myriad of sources, thus offering centralised access to end-users.
We introduce MyDas, a web server that facilitates the publishing of biological annotations according to the DAS specification. It deals with the common functionality requirements of making data available, while also providing an extension mechanism in order to implement the specifics of data store interaction. MyDas allows the user to define where the required information is located along with its structure, and is then responsible for the communication protocol details.
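For context, a minimal client-side sketch of how annotations published by a DAS server such as MyDas are typically retrieved, i.e. a features request over HTTP for a given segment. The server URL and data source name are placeholders.

```python
import requests

DAS_SERVER = "http://example.org/das"      # placeholder MyDas installation
SOURCE = "my-protein-annotations"          # placeholder data source name

# DAS features command: retrieve annotations for a segment (here, a UniProt accession).
url = f"{DAS_SERVER}/{SOURCE}/features"
response = requests.get(url, params={"segment": "P05067"})
response.raise_for_status()
print(response.text[:500])   # DAS GFF XML describing the annotations
```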
The EMBL Nucleotide Sequence Database is a comprehensive database of DNA and RNA sequences and related information traditionally made available in flat-file format. Queries through tools such as SRS (Sequence Retrieval System) also return data in flat-file format. Flat files have a number of shortcomings, however, and the resources therefore currently lack a flexible environment to meet individual researchers' needs. The Object Management Group's common object request broker architecture (CORBA) is an industry standard that provides platform-independent programming interfaces and models for portable distributed object-oriented computing applications. Its independence from programming languages, computing platforms and network protocols makes it attractive for developing new applications for querying and distributing biological data.
A CORBA infrastructure developed by EMBL-EBI provides an efficient means of accessing and distributing EMBL data. The EMBL object model is defined such that it provides a basis for specifying interfaces in interface definition language (IDL) and thus for developing the CORBA servers. The mapping from the object model to the relational schema in the underlying Oracle database uses the facilities provided by Persistence, an object/relational tool. The techniques of developing loaders and 'live object caching' with persistent objects achieve a smart live object cache where objects are created on demand. The objects are managed by an evictor pattern mechanism.
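The on-demand object cache managed by an evictor can be pictured roughly as below; this is a generic LRU-style sketch in Python, not the EMBL-EBI CORBA/Persistence implementation, and the loader and capacity are assumptions.

```python
from collections import OrderedDict

class LiveObjectCache:
    """Create objects on demand and evict the least recently used ones."""
    def __init__(self, loader, capacity=1000):
        self._loader = loader            # callable that materialises an object by id
        self._capacity = capacity
        self._cache = OrderedDict()

    def get(self, object_id):
        if object_id in self._cache:
            self._cache.move_to_end(object_id)        # mark as recently used
            return self._cache[object_id]
        obj = self._loader(object_id)                 # create on demand
        self._cache[object_id] = obj
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)           # evict least recently used
        return obj

# Hypothetical loader standing in for a database-backed entry fetch.
cache = LiveObjectCache(loader=lambda acc: {"accession": acc}, capacity=2)
print(cache.get("X56734"), cache.get("U49845"), cache.get("AB000263"))
```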
The CORBA interfaces to the EMBL database address some of the problems of traditional flat-file formats and provide an efficient means for accessing and distributing EMBL data. CORBA also provides a flexible environment for users to develop their applications by building clients to our CORBA servers, which can be integrated into existing systems.
The Sequence Read Archive (SRA) is the largest public repository of sequencing data from the next generation of sequencing platforms, including Illumina (Genome Analyzer, HiSeq, MiSeq, etc.), the Roche 454 GS System, the Applied Biosystems SOLiD System, Helicos Heliscope, PacBio RS, and others.
SRAdb is an attempt to make queries of the metadata associated with SRA submissions, studies, samples, experiments and runs more robust and precise, and to make access to sequencing data in the SRA easier. We have parsed all the SRA metadata into a SQLite database that is routinely updated and can be easily distributed. The SRAdb R/Bioconductor package then utilizes this SQLite database for querying and accessing metadata. Full-text search functionality makes querying metadata very flexible and powerful. Fastq files associated with query results can be downloaded easily for local analysis. The package also includes an interface from R to a popular genome browser, the Integrative Genomics Viewer.
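A hedged sketch of querying such a metadata database directly with Python's sqlite3; the file name, table, and column names below are assumptions for illustration and may not match the actual SRAdb schema.

```python
import sqlite3

# Hypothetical local copy of the SRA metadata SQLite file.
conn = sqlite3.connect("SRAmetadb.sqlite")

# Hypothetical table and column names; the real schema may differ.
query = """
SELECT run_accession, study_title
FROM metadata
WHERE study_title LIKE ?
LIMIT 10
"""
for run_accession, study_title in conn.execute(query, ("%glioblastoma%",)):
    print(run_accession, study_title)
conn.close()
```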
The SRAdb Bioconductor package provides a convenient and integrated framework to query and access SRA metadata quickly and powerfully from within R.
Genomics datasets are increasingly useful for gaining biomedical insights, with adoption in the clinic underway. However, multiple hurdles related to data management stand in the way of their efficient large-scale utilization. The solution proposed is a web-based data storage hub. Having clear focus, flexibility and adaptability, InSilico DB seamlessly connects genomics dataset repositories to state-of-the-art and free GUI and command-line data analysis tools. The InSilico DB platform is a powerful collaborative environment, with advanced capabilities for biocuration, dataset sharing, and dataset subsetting and combination. InSilico DB is available from https://insilicodb.org.
High-throughput structural proteomics is expected to generate considerable amounts of data on the progress of structure determination for many proteins. For each protein this includes information about cloning, expression, purification, biophysical characterization and structure determination via NMR spectroscopy or X-ray crystallography. It will be essential to develop specifications and ontologies for standardizing this information to make it amenable to retrospective analysis. To this end we created the SPINE database and analysis system for the Northeast Structural Genomics Consortium. SPINE, which is available at bioinfo.mbb.yale.edu/nesg or nesg.org, is specifically designed to enable distributed scientific collaboration via the Internet. It was designed not just as an information repository but as an active vehicle to standardize proteomics data in a form that would enable systematic data mining. The system features an intuitive user interface for interactive retrieval and modification of expression construct data, query forms designed to track global project progress and external links to many other resources. Currently the database contains experimental data on 985 constructs, of which 740 are drawn from Methanobacterium thermoautotrophicum, 123 from Saccharomyces cerevisiae, 93 from Caenorhabditis elegans and the remainder from other organisms. We developed a comprehensive set of data mining features for each protein, including several related to experimental progress (e.g. expression level, solubility and crystallization) and 42 based on the underlying protein sequence (e.g. amino acid composition, secondary structure and occurrence of low complexity regions). We demonstrate in detail the application of a particular machine learning approach, decision trees, to the tasks of predicting a protein's solubility and propensity to crystallize based on sequence features. We are able to extract a number of key rules from our trees, in particular that soluble proteins tend to have significantly more acidic residues and fewer hydrophobic stretches than insoluble ones. One of the characteristics of proteomics data sets, currently and in the foreseeable future, is their intermediate size (∼500–5000 data points). This creates a number of issues in relation to error estimation. Initially we estimate the overall error in our trees based on standard cross-validation. However, this leaves out a significant fraction of the data in model construction and does not give error estimates on individual rules. Therefore, we present alternative methods to estimate the error in particular rules.
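A minimal sketch, using scikit-learn, of applying a decision tree to sequence-derived features as described above; the feature values and labels are fabricated and far smaller than the dataset discussed, and the feature set is only a stand-in for the 42 sequence features mentioned.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Fabricated sequence-derived features: [fraction of acidic residues, hydrophobic-stretch count]
X = [[0.14, 1], [0.05, 4], [0.12, 0], [0.04, 5], [0.11, 2], [0.03, 6]]
y = ["soluble", "insoluble", "soluble", "insoluble", "soluble", "insoluble"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["acidic_fraction", "hydrophobic_stretches"]))
print(tree.predict([[0.13, 1]]))   # -> likely 'soluble' under these toy data
```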
The authors developed a prototype Shared Health Research Information Network (SHRINE) to identify the technical, regulatory, and political challenges of creating a federated query tool for clinical data repositories. Separate Institutional Review Boards (IRBs) at Harvard's three largest affiliated health centers approved use of their data, and the Harvard Medical School IRB approved building a Query Aggregator Interface that can simultaneously send queries to each hospital and display aggregate counts of the number of matching patients. Our experience creating three local repositories using the open source Informatics for Integrating Biology and the Bedside (i2b2) platform can be used as a road map for other institutions. The authors are actively working with the IRBs and regulatory groups to develop procedures that will ultimately allow investigators to obtain identified patient data and biomaterials through SHRINE. This will guide us in creating a future technical architecture that is scalable to a national level, compliant with ethical guidelines, and protective of the interests of the participating hospitals.
Visualization plays an essential role in genomics research by making it possible to observe correlations and trends in large datasets as well as communicate findings to others. Visual analysis, which combines visualization with analysis tools to enable seamless use of both approaches for scientific investigation, offers a powerful method for performing complex genomic analyses. However, there are numerous challenges that arise when creating rich, interactive Web-based visualizations/visual analysis applications for high-throughput genomics. These challenges include managing data flow from Web server to Web browser, integrating analysis tools and visualizations, and sharing visualizations with colleagues.
We have created a platform that simplifies the creation of Web-based visualization/visual analysis applications for high-throughput genomics. This platform provides components that make it simple to efficiently query very large datasets, draw common representations of genomic data, integrate with analysis tools, and share or publish fully interactive visualizations. Using this platform, we have created a Circos-style genome-wide viewer, a generic scatter plot for correlation analysis, an interactive phylogenetic tree, a scalable genome browser for next-generation sequencing data, and an application for systematically exploring tool parameter spaces to find good parameter values. All visualizations are interactive and fully customizable. The platform is integrated with the Galaxy (http://galaxyproject.org) genomics workbench, making it easy to integrate new visual applications into Galaxy.
Visualization and visual analysis play an important role in high-throughput genomics experiments, and approaches are needed to make it easier to create applications for these activities. Our framework provides a foundation for creating Web-based visualizations and integrating them into Galaxy. Finally, the visualizations we have created using the framework are useful tools for high-throughput genomics experiments.
Visualization; Visual analysis; Galaxy; Genome browser; Circos; Phylogenetic tree