Related Articles

1.  Access Guide to Human Proteinpedia 
Human Proteinpedia is a publicly available proteome repository for sharing human protein data derived from multiple experimental platforms. It incorporates diverse features of the human proteome, including protein-protein interactions, enzyme-substrate relationships, PTMs, subcellular localization, and expression of proteins in various human tissues and cell lines under diverse biological conditions, including diseases. Through a public distributed annotation system developed especially for proteomic data, investigators across the globe can upload, view and edit proteomic data even before they are published. Inclusion of information on the investigators and laboratories that generated the data, together with visualization of tandem mass spectra, stained tissue sections, protein/peptide microarrays, fluorescent micrographs and Western blots, ensures the quality of the proteomic data assimilated in Human Proteinpedia. Many of the protein annotations submitted to Human Proteinpedia have also been made available to the scientific community through the Human Protein Reference Database, another resource developed by our group. In this protocol, we describe how to submit, edit and retrieve proteomic data in Human Proteinpedia.
PMCID: PMC3664228  PMID: 23504933
mass spectrometry; tissue microarrays; biomarkers; disease proteomics; HPRD; proteotypic peptides; multiple reaction monitoring
2.  MASPECTRAS: a platform for management and analysis of proteomics LC-MS/MS data 
BMC Bioinformatics  2007;8:197.
The advancements of proteomics technologies have led to a rapid increase in the number, size and rate at which datasets are generated. Managing and extracting valuable information from such datasets requires the use of data management platforms and computational approaches.
We have developed the MAss SPECTRometry Analysis System (MASPECTRAS), a platform for management and analysis of proteomics LC-MS/MS data. MASPECTRAS is based on the Proteome Experimental Data Repository (PEDRo) relational database schema and follows the guidelines of the Proteomics Standards Initiative (PSI). Analysis modules include: 1) import and parsing of the results from the search engines SEQUEST, Mascot, Spectrum Mill, X! Tandem, and OMSSA; 2) peptide validation; 3) clustering of proteins based on Markov Clustering and multiple alignments; and 4) quantification using the Automated Statistical Analysis of Protein Abundance Ratios algorithm (ASAPRatio). The system provides customizable data retrieval and visualization tools, as well as export to the PRoteomics IDEntifications public repository (PRIDE). MASPECTRAS is freely available at
Given its unique features and the flexibility gained from using standard software technology, our platform represents a significant advance and could be of great interest to the proteomics community.
PMCID: PMC1906842  PMID: 17567892
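MASPECTRAS clusters proteins with Markov Clustering (MCL). As a rough illustration of how MCL works, here is a minimal pure-Python sketch of the generic iteration (expansion as a random-walk step, inflation to sharpen strong flows, column normalization) on a toy adjacency matrix. It is not MASPECTRAS code; the self-loops, inflation value of 2.0 and convergence tolerance are assumptions chosen for the example.

```python
"""Minimal Markov Clustering (MCL) sketch; illustrative only, not MASPECTRAS code."""


def normalize_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adj, inflation=2.0, iterations=50, tol=1e-9):
    """Cluster a symmetric adjacency matrix; returns sorted lists of node indices."""
    n = len(adj)
    # Add self-loops (a common MCL practice, assumed here) and normalize.
    m = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        expanded = matmul(m, m)                                          # expansion
        inflated = [[v ** inflation for v in row] for row in expanded]   # inflation
        new = normalize_columns(inflated)
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Rows that keep mass act as attractors; their non-zero columns form one cluster.
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return [sorted(c) for c in clusters]
```

For two disconnected triangles (nodes 0-1-2 and 3-4-5), `mcl(adj)` separates the two groups into `[[0, 1, 2], [3, 4, 5]]`. In a real protein-clustering setting the matrix entries would be similarity scores rather than 0/1 edges.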
3.  OMSSAGUI: An open-source user interface component to configure and run the OMSSA search engine 
Proteomics  2008;8(12):2376-2378.
Here we present a user-friendly and extremely lightweight tool that can serve as a stand-alone front-end for the Open MS Search Algorithm (OMSSA) search engine, or that can be used directly as part of an informatics processing pipeline for MS-driven proteomics. The OMSSA graphical user interface (OMSSAGUI) tool is written in Java and is supported on Windows, Linux, and OSX platforms. It is open source under the Apache 2 license and can be downloaded from
PMCID: PMC2905866  PMID: 18563730
Bioinformatics; Protein identification
4.  NeuroLOG: sharing neuroimaging data using an ontology-based federated approach 
This paper describes the design of the NeuroLOG middleware data management layer, which provides a platform to share heterogeneous and distributed neuroimaging data using a federated approach. The semantics of shared information is captured through a multi-layer application ontology and a derived Federated Schema used to align the heterogeneous database schemata from different legacy repositories. The system also provides a facility to translate the relational data into a semantic representation that can be queried using a semantic search engine, thus enabling the exploitation of knowledge embedded in the ontology. This work shows the relevance of the distributed approach for neuroscience data management. Although more complex than a centralized approach, it is also more realistic when considering the federation of large data sets, and it opens strong perspectives for implementing multi-centric neuroscience studies.
PMCID: PMC3243145  PMID: 22195101
5.  P6-T Tranche: Secure Decentralized Data Storage for the Proteomics Community 
The number, size, and format variation of proteomics data files (both raw and processed) and their annotation, as well as the challenges in designing a robust data repository, are some of the major factors inhibiting public dissemination of proteomics data. Sharing large amounts of data and software is a legitimate need in the field of proteomics and other scientific disciplines, as replication of results and the benefits of data reanalysis rely heavily on having access to the original data. Several journals have already published recommendations for providing access to data associated with proteomics manuscripts; however, researchers have been left with the challenge of how to appropriately satisfy the recommendations. Of particular concern is how potentially large datasets (gigabytes to terabytes of raw data) may be efficiently hosted in a publicly accessible fashion.
Described here is Tranche, a secure peer-to-peer system, along with a reference implementation, that is capable of hosting virtually unlimited amounts of data and supporting virtually unlimited users. Furthermore, Tranche addresses many of the prominent concerns in data dissemination, including hosting raw data associated with a proteomics experiment and maintaining annotation. It is intended as both a reference implementation and a model system for comparison to other proteomics data dissemination efforts. Tranche currently hosts many prominent proteomics datasets and mirrors of other proteomics data resources, including most of the publicly available proteomics data.
PMCID: PMC2292060
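The abstract does not describe Tranche's internal protocol. To illustrate the general idea behind content-addressed decentralized storage, here is a minimal sketch, assuming that files are split into chunks, each chunk is addressed by its SHA-256 hash, and a manifest of hashes is enough to reassemble and verify the file. `ChunkStore`, `upload`, `download` and the tiny chunk size are hypothetical; a real peer-to-peer system would also replicate chunks across many peers.

```python
import hashlib

CHUNK_SIZE = 4  # tiny for demonstration; real systems use far larger chunks (assumption)


class ChunkStore:
    """In-memory stand-in for a single storage peer (hypothetical)."""

    def __init__(self):
        self._chunks = {}

    def put(self, data: bytes) -> str:
        # The chunk's address is the hash of its content, so duplicates collapse.
        key = hashlib.sha256(data).hexdigest()
        self._chunks[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._chunks[key]
        # Integrity check: stored content must still hash to its address.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError(f"chunk {key} is corrupted")
        return data


def upload(store, payload: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a payload into chunks and return the manifest (list of chunk hashes)."""
    return [store.put(payload[i:i + chunk_size]) for i in range(0, len(payload), chunk_size)]


def download(store, manifest):
    """Reassemble the payload, verifying every chunk on the way."""
    return b"".join(store.get(key) for key in manifest)
```

Because addresses are derived from content, anyone holding the manifest can detect tampering, which is one reason content addressing suits untrusted peers.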
6.  P19-S Managing Proteomics Data from Data Generation and Data Warehousing to Central Data Repository and Journal Reviewing Processes 
In today’s proteomics research, various techniques, instruments and bioinformatics tools are necessary to manage the large amount of heterogeneous data, with automatic quality control to produce reliable and comparable results. A data-processing pipeline is therefore mandatory for data validation and comparison in a data-warehousing system. The proteome bioinformatics platform ProteinScape has been shown to cover these needs. The HUPO BPP participants’ MS data were reprocessed within ProteinScape, and the reprocessed information was transferred into the global data repository PRIDE.
ProteinScape as a data-warehousing system covers two main aspects: archiving relevant data of the proteomics workflow and information extraction functionality (protein identification, quantification and generation of biological knowledge). As a strategy for automatic data validation, different protein search engines are integrated. Result analysis is performed using a decoy database search strategy, which allows the measurement of the false-positive identification rate. Peptide identifications across different workflows, different MS techniques, and different search engines are merged to obtain a quality-controlled protein list.
The proteomics identifications database (PRIDE), as a public data repository, is an archiving system where data are finally stored and no longer changed by further processing steps. Data submission to PRIDE is open to proteomics laboratories generating protein and peptide identifications. An export tool has been developed for transferring all relevant HUPO BPP data from ProteinScape into PRIDE using the PRIDE.xml format.
The EU-funded ProDac project will coordinate the development of software tools supporting international standards for the representation of proteomics data. The implementation of data submission pipelines and systematic data collection in public standards-compliant repositories will cover all aspects, from the generation of MS data in each laboratory to the conversion of all the annotating information and identifications to a standardized format. Such datasets can then be used during the publication and review process of scientific journals.
PMCID: PMC2291891
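ProteinScape's result analysis relies on a decoy database search to measure the false-positive identification rate. A minimal sketch of one common estimator, assuming a separate target and decoy search where each peptide-spectrum match (PSM) is a `(score, is_decoy)` pair, estimates the FDR at a score threshold as accepted decoys divided by accepted targets. The function names and data are hypothetical and this is not ProteinScape's implementation.

```python
def decoy_fdr(psms, threshold):
    """Estimate the FDR among PSMs scoring >= threshold as decoys / targets."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def threshold_for_fdr(psms, max_fdr=0.01):
    """Lowest observed score threshold whose estimated FDR does not exceed max_fdr."""
    for t in sorted({score for score, _ in psms}):
        if decoy_fdr(psms, t) <= max_fdr:
            return t
    return None  # no threshold reaches the requested FDR
```

For example, with `psms = [(50, False), (45, False), (40, True), (38, False), (30, True), (20, True)]`, a threshold of 38 accepts three targets and one decoy (estimated FDR 1/3), while `threshold_for_fdr(psms, 0.0)` returns 45. Production tools usually compute monotone q-values instead of this simple scan.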
7.  Dicoogle - an Open Source Peer-to-Peer PACS 
Journal of Digital Imaging  2010;24(5):848-856.
Picture Archiving and Communication Systems (PACS) have been widely deployed in healthcare institutions, and they now constitute a normal commodity for practitioners. However, their installation, maintenance, and utilization are still a burden due to their heavy structures, typically supported by centralized computational solutions. In this paper, we present Dicoogle, a PACS archive supported by a document-based indexing system and by peer-to-peer (P2P) protocols. Replacing the traditional database storage (RDBMS) with a document-based organization permits gathering and indexing data from file-based repositories, which allows the archive to be searched through free-text queries. As a direct result of this strategy, more information can be extracted from medical imaging repositories, which clearly increases flexibility compared with current DICOM query and retrieval services. The inclusion of P2P features allows PACS internetworking without the need for a central management framework. Moreover, Dicoogle is easy to install, manage, and use, and it maintains full interoperability with standard DICOM services.
PMCID: PMC3180530  PMID: 20981467
PACS; Digital Imaging and Communications in Medicine (DICOM); Medical imaging; Peer-to-peer; Computer communication networks; Open source; PACS implementation; Information storage and retrieval
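Dicoogle replaces relational storage with document-based indexing so that the archive can be searched with free-text queries. The core data structure behind that kind of search is an inverted index, sketched minimally here over metadata dictionaries. The field names follow DICOM attribute keywords, but representing each image as a Python dict, the tokenizer, and the AND-only query semantics are assumptions; Dicoogle's actual indexing engine is not reproduced here.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case alphanumeric tokens (simplistic tokenizer, assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class MetadataIndex:
    """Toy inverted index over DICOM-like metadata dicts (hypothetical)."""

    def __init__(self):
        self._postings = defaultdict(set)  # token -> set of document ids

    def add(self, doc_id, metadata):
        # Index every value of every attribute, regardless of the attribute name.
        for value in metadata.values():
            for token in tokenize(str(value)):
                self._postings[token].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query token."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return sorted(hits)
```

After indexing `{"Modality": "CT", "StudyDescription": "Chest CT with contrast"}` as `img1`, a query such as `"chest ct"` finds it without the user knowing which attribute holds which word, which is the flexibility the abstract contrasts with DICOM query/retrieve.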
8.  Low Cost, Scalable Proteomics Data Analysis Using Amazon's Cloud Computing Services and Open Source Search Algorithms 
Journal of proteome research  2009;8(6):3148-3153.
One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step-by-step instructions on how to implement the virtual proteomics analysis clusters, as well as a list of currently available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases, on the Medical College of Wisconsin Proteomics Center website.
PMCID: PMC2691775  PMID: 19358578
mass spectrometry; data analysis; search algorithms; software; cloud computing
9.  A web-portal for interactive data exploration, visualization, and hypothesis testing 
Clinical research studies generate data that need to be shared and statistically analyzed by their participating institutions. The distributed nature of research and the different domains involved present major challenges to data sharing, exploration, and visualization. The Data Portal infrastructure was developed to support ongoing research in the areas of neurocognition, imaging, and genetics. Researchers benefit from the integration of data sources across domains, the explicit representation of knowledge from domain experts, and user interfaces providing convenient access to project-specific data resources and algorithms. The system provides an interactive approach to statistical analysis, data mining, and hypothesis testing over the lifetime of a study, and it fulfills a mandate of public sharing by integrating data sharing into a system built for active data exploration. The web-based platform removes barriers to research and supports the ongoing exploration of data.
PMCID: PMC3972454  PMID: 24723882
data exploration; data sharing; genetics; data dictionary; imaging; hypothesis testing
10.  DAS Writeback: A Collaborative Annotation System 
BMC Bioinformatics  2011;12:143.
Centralised resources such as GenBank and UniProt are perfect examples of the major international efforts that have been made to integrate and share biological information. However, additional data that adds value to these resources needs a simple and rapid route to public access. The Distributed Annotation System (DAS) provides an adequate environment to integrate genomic and proteomic information from multiple sources, making this information accessible to the community. DAS offers a way to distribute and access information but it does not provide domain experts with the mechanisms to participate in the curation process of the available biological entities and their annotations.
We designed and developed a Collaborative Annotation System for proteins called DAS Writeback. DAS Writeback is a protocol extension of DAS that provides the functionality of adding, editing and deleting annotations. We implemented this new specification as extensions of both a DAS server and a DAS client. The architecture was designed with the involvement of the DAS community and was improved after performing usability experiments emulating a real annotation task.
We demonstrate that DAS Writeback is effective, usable and will provide the appropriate environment for the creation and evolution of community protein annotation.
PMCID: PMC3115852  PMID: 21569281
11.  PrestOMIC, an open source application for dissemination of proteomic datasets by individual laboratories 
Proteome Science  2007;5:8.
Technological advances in mass spectrometry and other detection methods are leading to larger and larger proteomics datasets. However, when papers describing such information are published the enormous volume of data can typically only be provided as supplementary data in a tabular form through the journal website. Several journals in the proteomics field, together with the Human Proteome Organization's (HUPO) Proteomics Standards Initiative and institutions such as the Institute for Systems Biology are working towards standardizing the reporting of proteomics data, but just defining standards is only a means towards an end for sharing data. Data repositories such as and the Open Proteomics Database allow for public access to proteomics data but provide little, if any, interpretation.
Results & conclusion
Here we describe PrestOMIC, an open source application for storing mass spectrometry-based proteomic data in a relational database and for providing a user-friendly, searchable and customizable browser interface to share one's data with the scientific community. The underlying database and all associated applications are built on other existing open source tools, allowing PrestOMIC to be modified as the data standards evolve. We then use PrestOMIC to present a recently published dataset from our group through our website.
PMCID: PMC1892544  PMID: 17553161
12.  Human Proteinpedia: a unified discovery resource for proteomics research 
Nucleic Acids Research  2008;37(Database issue):D773-D781.
Sharing proteomic data with the biomedical community through a unified proteomic resource, especially in the context of individual proteins, is a challenging prospect. We have developed a community portal, designated as Human Proteinpedia, for sharing both unpublished and published human proteomic data through the use of a distributed annotation system designed specifically for this purpose. This system allows laboratories to contribute and maintain protein annotations, which are also mapped to the corresponding proteins through the Human Protein Reference Database (HPRD). Thus, it is possible to visualize data pertaining to experimentally validated posttranslational modifications (PTMs), protein isoforms, protein–protein interactions (PPIs), tissue expression, expression in cell lines, subcellular localization and enzyme substrates in the context of individual proteins. With enthusiastic participation of the proteomics community, the past 15 months have witnessed data contributions from more than 75 labs around the world, including 2710 distinct experiments, >1.9 million peptides, >4.8 million MS/MS spectra, 150 368 protein expression annotations, 17 410 PTMs, 34 624 PPIs and 2906 subcellular localization annotations. Human Proteinpedia should serve as an integrated platform to store, integrate and disseminate such proteomic data and is inching towards evolving into a unified human proteomics resource.
PMCID: PMC2686511  PMID: 18948298
13.  The Shared Health Research Information Network (SHRINE): A Prototype Federated Query Tool for Clinical Data Repositories 
The authors developed a prototype Shared Health Research Information Network (SHRINE) to identify the technical, regulatory, and political challenges of creating a federated query tool for clinical data repositories. Separate Institutional Review Boards (IRBs) at Harvard's three largest affiliated health centers approved use of their data, and the Harvard Medical School IRB approved building a Query Aggregator Interface that can simultaneously send queries to each hospital and display aggregate counts of the number of matching patients. Our experience creating three local repositories using the open source Informatics for Integrating Biology and the Bedside (i2b2) platform can be used as a road map for other institutions. The authors are actively working with the IRBs and regulatory groups to develop procedures that will ultimately allow investigators to obtain identified patient data and biomaterials through SHRINE. This will guide us in creating a future technical architecture that is scalable to a national level, compliant with ethical guidelines, and protective of the interests of the participating hospitals.
PMCID: PMC2744712  PMID: 19567788
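SHRINE's Query Aggregator Interface sends the same query to each hospital simultaneously and displays aggregate counts of matching patients. A minimal sketch of that fan-out/aggregate pattern, assuming each site is exposed as a function that returns only a count (so no patient-level records leave the site), could use a thread pool. `make_site`, `aggregate_counts` and the `dx` field are hypothetical stand-ins; the real system queries i2b2 repositories over a network.

```python
from concurrent.futures import ThreadPoolExecutor


def make_site(name, patients):
    """Build a hypothetical site endpoint that answers with a count only."""
    def count_matching(criteria):
        return sum(1 for p in patients if all(p.get(k) == v for k, v in criteria.items()))
    return name, count_matching


def aggregate_counts(sites, criteria):
    """Send one query to every site in parallel and collect per-site counts."""
    with ThreadPoolExecutor(max_workers=max(1, len(sites))) as pool:
        futures = {name: pool.submit(fn, criteria) for name, fn in sites}
        per_site = {name: fut.result() for name, fut in futures.items()}
    return per_site, sum(per_site.values())
```

Returning counts rather than rows mirrors the abstract's design, in which obtaining identified data would require further IRB and regulatory procedures.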
14.  Accessing and distributing EMBL data using CORBA (common object request broker architecture) 
Genome Biology  2000;1(5):research0010.1-research0010.10.
The EMBL Nucleotide Sequence Database is a comprehensive database of DNA and RNA sequences and related information traditionally made available in flat-file format. Queries through tools such as SRS (Sequence Retrieval System) also return data in flat-file format. Flat files have a number of shortcomings, however, and the resources therefore currently lack a flexible environment to meet individual researchers' needs. The Object Management Group's common object request broker architecture (CORBA) is an industry standard that provides platform-independent programming interfaces and models for portable distributed object-oriented computing applications. Its independence from programming languages, computing platforms and network protocols makes it attractive for developing new applications for querying and distributing biological data.
A CORBA infrastructure developed by EMBL-EBI provides an efficient means of accessing and distributing EMBL data. The EMBL object model is defined such that it provides a basis for specifying interfaces in interface definition language (IDL) and thus for developing the CORBA servers. The mapping from the object model to the relational schema in the underlying Oracle database uses the facilities provided by Persistence™, an object/relational tool. The techniques of developing loaders and 'live object caching' with persistent objects achieve a smart live object cache where objects are created on demand. The objects are managed by an evictor pattern mechanism.
The CORBA interfaces to the EMBL database address some of the problems of traditional flat-file formats and provide an efficient means for accessing and distributing EMBL data. CORBA also provides a flexible environment for users to develop their applications by building clients to our CORBA servers, which can be integrated into existing systems.
PMCID: PMC15028  PMID: 11178259
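The EMBL CORBA servers create objects on demand and manage them with an evictor pattern. A minimal Python sketch of that idea, assuming a least-recently-used eviction policy (the abstract does not say which policy is used) and a hypothetical `loader` callable standing in for the database fetch, looks like this; it involves neither CORBA nor the Persistence tool.

```python
from collections import OrderedDict


class EvictorCache:
    """Create objects on demand via a loader; evict the least recently used."""

    def __init__(self, loader, capacity):
        self._loader = loader          # hypothetical: fetches an object by key
        self._capacity = capacity
        self._live = OrderedDict()     # key -> live object, oldest first
        self.loads = 0

    def get(self, key):
        if key in self._live:
            self._live.move_to_end(key)    # mark as most recently used
            return self._live[key]
        obj = self._loader(key)            # create the object on demand
        self.loads += 1
        self._live[key] = obj
        if len(self._live) > self._capacity:
            self._live.popitem(last=False)  # evict the least recently used object
        return obj

    def __contains__(self, key):
        return key in self._live
```

With a capacity of 2, requesting `acc1`, `acc2`, `acc1` and then `acc3` loads three objects and evicts `acc2`, because `acc1` was touched more recently. Bounding the number of live objects is what lets a server expose a large database without holding every entry in memory.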
15.  PEDRo: A database for storing, searching and disseminating experimental proteomics data 
BMC Genomics  2004;5:68.
Proteomics is rapidly evolving into a high-throughput technology, in which substantial and systematic studies are conducted on samples from a wide range of physiological, developmental, or pathological conditions. Reference maps from 2D gels are widely circulated. However, there is, as yet, no formally accepted standard representation to support the sharing of proteomics data, and little systematic dissemination of comprehensive proteomic data sets.
This paper describes the design, implementation and use of a Proteome Experimental Data Repository (PEDRo), which makes comprehensive proteomics data sets available for browsing, searching and downloading. It also serves to extend the debate on the level of detail at which proteomics data should be captured, the sorts of facilities that should be provided by proteome data management systems, and the techniques by which such facilities can be made available.
The PEDRo database provides access to a collection of comprehensive descriptions of experimental data sets in proteomics. Not only are these data sets interesting in and of themselves, they also provide a useful early validation of the PEDRo data model, which has served as a starting point for the ongoing standardisation activity through the Proteomics Standards Initiative of the Human Proteome Organisation.
PMCID: PMC521486  PMID: 15377392
16.  OpenKnowledge for peer-to-peer experimentation in protein identification by MS/MS 
Traditional scientific workflow platforms usually run individual experiments with little evaluation and analysis of performance, which automated experimentation requires once scientists can access numerous applicable workflows rather than being committed to a single one. In a peer-to-peer environment, experimental protocols and data could potentially be shared freely, without any single point of authority dictating how experiments should be run. In such an environment, each individual scientist (peer) needs mechanisms to assess, locally, how he or she wants to be involved with others in experiments. This study aims to implement and demonstrate simple peer ranking under the OpenKnowledge peer-to-peer infrastructure through both simulated and real-world bioinformatics experiments involving multi-agent interactions.
A simulated experiment environment with a peer ranking capability was specified by the Lightweight Coordination Calculus (LCC) and automatically executed under the OpenKnowledge infrastructure. The peers such as MS/MS protein identification services (including web-enabled and independent programs) were made accessible as OpenKnowledge Components (OKCs) for automated execution as peers in the experiments. The performance of the peers in these automated experiments was monitored and evaluated by simple peer ranking algorithms.
Peer ranking experiments with simulated peers exhibited characteristic behaviours, e.g., a power-law effect (a few peers dominate), similar to that observed on the traditional Web. Real-world experiments were run using an interaction model in LCC involving two different types of MS/MS protein identification peers, viz., peptide fragment fingerprinting (PFF) and de novo sequencing, with another peer ranking algorithm based simply on counting successful and failed runs. This study demonstrated a novel integration and useful evaluation of specific proteomic peers and found MASCOT to be a dominant peer as judged by peer ranking.
The simulated and real-world experiments in the present study demonstrated that the OpenKnowledge infrastructure with peer ranking capability can serve as an evaluative environment for automated experimentation.
PMCID: PMC3377912  PMID: 22192521
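One of the ranking algorithms in this study is based simply on counting successful and failed runs. A minimal sketch of such a counter follows; the Laplace smoothing (adding one to each count, so a peer with no history scores 0.5) and the alphabetical tie-break are assumptions for illustration, not the OpenKnowledge implementation, and the peer names are hypothetical.

```python
from collections import Counter


class PeerRanker:
    """Rank peers by their record of successful and failed runs."""

    def __init__(self):
        self.successes = Counter()
        self.failures = Counter()

    def record(self, peer, succeeded):
        """Log the outcome of one run by a peer."""
        (self.successes if succeeded else self.failures)[peer] += 1

    def score(self, peer):
        # Laplace-smoothed success rate (assumption): unseen peers start at 0.5.
        s, f = self.successes[peer], self.failures[peer]
        return (s + 1) / (s + f + 2)

    def ranking(self):
        """Peers ordered by descending score, ties broken by name."""
        peers = set(self.successes) | set(self.failures)
        return sorted(peers, key=lambda p: (-self.score(p), p))
```

A peer with three successes and one failure scores 4/6 and outranks a peer with one success and two failures (2/5). Feedback loops of this kind, where well-ranked peers get chosen more often and so accumulate more successes, are one plausible route to the dominant-peer behaviour the abstract reports.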
17.  CNV-WebStore: Online CNV Analysis, Storage and Interpretation 
BMC Bioinformatics  2011;12:4.
Microarray technology allows the analysis of genomic aberrations at an ever increasing resolution, making functional interpretation of these vast amounts of data the main bottleneck in routine implementation of high resolution array platforms, and emphasising the need for a centralised and easy to use CNV data management and interpretation system.
We present CNV-WebStore, an online platform to streamline the processing and downstream interpretation of microarray data in a clinical context, tailored towards but not limited to the Illumina BeadArray platform. Provided analysis tools include CNV analysis, parent-of-origin and uniparental disomy detection. Interpretation tools include data visualisation, gene prioritisation, automated PubMed searching, linking data to several genome browsers and annotation of CNVs based on several public databases. Finally, a module is provided for uniform reporting of results.
CNV-WebStore is able to present copy number data in an intuitive way to both lab technicians and clinicians, making it a useful tool in daily clinical practice.
PMCID: PMC3024943  PMID: 21208430
18.  Optimization of a low-cost truly preemptive multitasking PC diagnostic workstation 
Journal of Digital Imaging  1997;10(Suppl 1):171-174.
The Windows 95/NT operating systems (Microsoft Corp, Redmond, WA) currently provide the only low-cost, truly preemptive multitasking environment and as such are an attractive diagnostic workstation platform. The purpose of this project is to test and optimize display station graphical user interface (GUI) actions, previously designed on the pseudomultitasking Macintosh (Apple Computer, Cupertino, CA) platform, and image data transmission using the time-slicing/dynamic priority assignment capabilities of the new Windows platform. A diagnostic workstation in the clinical environment must process two categories of events: user interaction with the GUI through keyboard/mouse input, and transmission of incoming data files. These processes contend for central processing unit (CPU) time, resulting in GUI “lockout” during image transmission or delay in transmission until GUI “quiet time.” WinSockets and the Transmission Control Protocol/Internet Protocol (TCP/IP) communication protocol software (Microsoft) are implemented using dynamic priority time slicing to ensure that GUI delays at the time of Digital Imaging and Communications in Medicine (DICOM) file transfer do not exceed 1/10 second. Assignment of thread priority does not translate into an absolute fixed percentage of CPU time. Therefore, the relationship between dynamic priority assignment by the processor and the GUI and communication application threads will be investigated more fully to optimize CPU resource allocation. These issues will be tested using 10 Mb/s Ethernet and 100 Mb/s fast and wide Ethernet transmission. Preliminary results with typical clinical files (10 to 30 MB) over Ethernet show no visually perceptible interruption of the GUI, suggesting that the new Windows PC platform may be a viable diagnostic workstation option.
PMCID: PMC3452846  PMID: 9268871
preemptive multitasking; diagnostic workstation; optimization; CPU resources
19.  XperimentR: painless annotation of a biological experiment for the laboratory scientist 
BMC Bioinformatics  2013;14:8.
Today’s biological experiments often involve the collaboration of multidisciplinary researchers utilising several high-throughput ‘omics platforms. The details of the experiment need to be adequately described using standardised ontologies to enable data preservation and analysis and to facilitate the export of the data to public repositories. However, there is a bewildering number of ontologies, controlled vocabularies, and minimum standards available for describing experiments. There is a need for user-friendly software tools to aid laboratory scientists in capturing the experimental information.
A web application called XperimentR has been developed for use by laboratory scientists, consisting of a browser-based interface and server-side components which provide an intuitive platform for capturing and sharing experimental metadata. Information recorded includes details about the biological samples, procedures, protocols, and experimental technologies, all of which can be easily annotated using the appropriate ontologies. Files and raw data can be imported and associated with the biological samples via the interface, from either users’ computers, or commonly used open-source data repositories. Experiments can be shared with other users, and experiments can be exported in the standard ISA-Tab format for deposition in public databases. XperimentR is freely available and can be installed natively or by using a provided pre-configured Virtual Machine. A guest system is also available for trial purposes.
We present a web based software application to aid the laboratory scientist to capture, describe and share details about their experiments.
PMCID: PMC3571946  PMID: 23323856
Experimental annotation; Ontologies; Biological data management
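XperimentR exports experiments in the tab-delimited ISA-Tab format. As a rough picture of what such an export involves, here is a minimal sketch that renders a simplified ISA-Tab-style study sample table with the standard `csv` module. The column headers (`Source Name`, `Characteristics[organism]`, `Protocol REF`, `Sample Name`) follow ISA-Tab conventions, but the function, its input dictionaries and the default protocol name are hypothetical, and a complete, valid ISA-Tab submission also needs an investigation file, assay files and ontology term references.

```python
import csv
import io


def study_table(samples, protocol="sample collection"):
    """Render a simplified ISA-Tab-style study sample table as TSV text."""
    header = ["Source Name", "Characteristics[organism]", "Protocol REF", "Sample Name"]
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for s in samples:
        # Each row links a source (e.g. a donor) through a protocol to a sample.
        writer.writerow([s["source"], s["organism"], protocol, s["sample"]])
    return buf.getvalue()
```

Calling `study_table([{"source": "donor1", "organism": "Homo sapiens", "sample": "donor1.blood"}])` yields a header line followed by one tab-separated row. Tools like XperimentR hide this layout behind a form, which is the point the abstract makes about sparing laboratory scientists the details of the standards.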
20.  A virtual repository approach to clinical and utilization studies: application in mammography as alternative to a national database. 
A national mammography database was proposed, based on a centralized architecture for collecting, monitoring, and auditing mammography data. We have developed an alternative architecture relying on Internet-based distributed queries to heterogeneous databases. This architecture creates a "virtual repository", or a federated database that is constructed dynamically for each query and makes use of data available in legacy systems. It allows the construction of custom-tailored databases at individual sites that can serve the dual purposes of providing data (a) to researchers through a common mammography repository and (b) to clinicians and administrators at participating institutions. We implemented this architecture in a prototype system at the Brigham and Women's Hospital to show its feasibility. Common queries are translated dynamically into database-specific queries, and the results are aggregated for immediate display or download by the user. Data reside in two different databases and consist of structured mammography reports, coded according to the BI-RADS Standardized Mammography Lexicon, as well as pathology results. We prospectively collected data on 213 patients and showed that our system can perform distributed queries effectively. We also implemented graphical exploratory analysis tools to allow visualization of results. Our findings indicate that the architecture is not only feasible but also flexible and scalable, constituting a good alternative to a national mammography database.
PMCID: PMC2233552  PMID: 9357650
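The key step in this virtual repository is translating a common query into database-specific queries and aggregating the results. A minimal sketch with two in-memory SQLite databases whose schemas differ is shown here; the table and column names, the `SITE_MAPPINGS` dictionary and the single supported query ("count reports with a given BI-RADS category") are hypothetical illustrations, not the prototype's actual schema or query language.

```python
import sqlite3

# Hypothetical per-site schema mappings for the common field "birads".
SITE_MAPPINGS = {
    "site_a": {"table": "mammo_reports", "birads": "birads_category"},
    "site_b": {"table": "reports", "birads": "assessment"},
}


def make_site_db(table, column, values):
    """Create an in-memory database standing in for one legacy system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, {column} INTEGER)")
    conn.executemany(f"INSERT INTO {table} ({column}) VALUES (?)", [(v,) for v in values])
    return conn


def translate(site, birads):
    """Translate the common query 'count reports with BI-RADS = n' into site SQL."""
    m = SITE_MAPPINGS[site]
    # Identifiers come only from the trusted mapping; the value is a bound parameter.
    return f"SELECT COUNT(*) FROM {m['table']} WHERE {m['birads']} = ?", (birads,)


def federated_count(connections, birads):
    """Run the translated query at every site and aggregate the counts."""
    results = {}
    for site, conn in connections.items():
        sql, params = translate(site, birads)
        results[site] = conn.execute(sql, params).fetchone()[0]
    return results, sum(results.values())
```

Keeping the mapping outside the query logic means a new site only needs a new mapping entry, which is the flexibility the abstract claims over a single centralized database.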
21.  A Computational Approach to Analyze the Mechanism of Action of the Kinase Inhibitor Bafetinib 
PLoS Computational Biology  2010;6(11):e1001001.
Prediction of drug action in human cells is a major challenge in biomedical research. Additionally, there is strong interest in finding new applications for approved drugs and identifying potential side effects. We present a computational strategy to predict mechanisms, risks and potential new domains of drug treatment on the basis of target profiles acquired through chemical proteomics. Functional protein-protein interaction networks that share one biological function are constructed, and their crosstalk with the drug is scored with respect to functional disruption. We apply this procedure to the target profile of the second-generation BCR-ABL inhibitor bafetinib, which is in development for the treatment of imatinib-resistant chronic myeloid leukemia. Besides the well-known effect on apoptosis, we propose potential treatment of lung cancer and IGF1R-expressing blast crisis.
Author Summary
Protein interaction data are accumulating rapidly and, although imperfect and incomplete, provide a valuable global description of the complex interplay of proteins in a human cell. In parallel, modern proteomics technologies make it possible to measure the protein targets of a drug in an unbiased manner. Such data reveal multiple targets, a view that contrasts with the previously prevalent paradigm that drugs had a single target or a very limited number of targets. In this context of newly available systems-level data and more precise and complete information about drug interactions, it is natural to try to determine the global perturbation exerted by a drug on a human cell in order to identify potential side effects and additional indications. We present a computational method that aims at making such predictions and apply it to bafetinib, a recently developed leukemia drug. We show that it yields meaningful predictions of additional applications (to other cancers or resistant cases) and of likely side effects that are not straightforward to determine with existing algorithms. Our method has strong potential to be applicable to other drugs.
PMCID: PMC2987840  PMID: 21124949
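The abstract does not spell out the disruption score, so the sketch below substitutes a much simpler, generic technique: a one-sided hypergeometric test of whether a drug's target profile overlaps a functional network more than expected by chance. It illustrates scoring target/network crosstalk under that assumption only and is not the paper's method; the network names, protein identifiers, universe size and helper functions are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N proteins, K of them in the network, n targets)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def score_networks(networks, targets, universe_size):
    """Return {network: (overlap, enrichment P-value)} for one drug target set."""
    scores = {}
    for name, proteins in networks.items():
        overlap = len(proteins & targets)
        p = hypergeom_sf(overlap, universe_size, len(proteins), len(targets))
        scores[name] = (overlap, p)
    return scores

# Hypothetical example: 20-protein universe, two 4-protein functional networks.
networks = {"net_x": {"P1", "P2", "P3", "P4"}, "net_y": {"P5", "P6", "P7", "P8"}}
targets = {"P1", "P2", "P9"}
scores = score_networks(networks, targets, universe_size=20)
```

Here net_x shares two of the three targets and gets P = 100/1140 (about 0.088), while net_y shares none and gets P = 1. A real analysis would also need a multiple-testing correction across networks, and the paper's own score may weight network topology in ways this overlap test ignores.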
22.  Enhancing Peptide Identification Confidence by Combining Search Methods 
Journal of Proteome Research  2008;7(8):3102-3113.
Confident peptide identification is one of the most important components of mass-spectrometry-based proteomics. We propose a method to properly combine the results from different database search methods to enhance the accuracy of peptide identifications. The database search methods included in our analysis are SEQUEST (v27 rev12), ProbID (v1.0), InsPecT (v20060505), Mascot (v2.1), X! Tandem (v2007.07.01.2), OMSSA (v2.0) and RAId_DbS. Using two data sets, one collected in profile mode and one collected in centroid mode, we tested the search performance of all 21 combinations of two search methods as well as all 35 possible combinations of three search methods. Our results suggest that properly combining search methods does improve retrieval accuracy. In addition to performance results, we also describe the theoretical framework that in principle allows one to combine many independent scoring methods, including de novo sequencing and spectral library searches. The correlations among different methods are also investigated in terms of common true positives, common false positives, and a global analysis. We find that the average correlation strength between any pairwise combination of the seven methods studied is usually smaller than the associated standard error. This indicates that only weak correlation may be present among different methods and validates our approach to combining the search results. The usefulness of our approach is further confirmed by showing that the average cumulative number of false positive peptides agrees reasonably well with the combined E-value. The data related to this study are freely available upon request.
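The counts of 21 pairs and 35 triples follow from choosing 2 or 3 of the seven search methods: C(7,2) = 21 and C(7,3) = 35. A short Python check enumerates them directly from the method list given above:

```python
from itertools import combinations
from math import comb

# The seven database search methods listed in the abstract.
METHODS = ["SEQUEST", "ProbID", "InsPecT", "Mascot", "X! Tandem", "OMSSA", "RAId_DbS"]

pairs = list(combinations(METHODS, 2))    # every unordered pair of methods
triples = list(combinations(METHODS, 3))  # every unordered triple of methods

# The enumerated counts agree with the binomial coefficients.
assert len(pairs) == comb(7, 2) and len(triples) == comb(7, 3)
```

Each tuple in pairs or triples names one combined search configuration of the kind evaluated in the study.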
Properly combining the results from different database search methods enhances the accuracy of peptide identifications. This, however, requires a common statistical standard across search methods, which can be achieved by statistical calibration of E-values. After transforming E-values into database P-values, we use the well-established Fisher's formula to combine the different P-values. Our protocol provides a statistically sound method for integrating search results.
PMCID: PMC2658881  PMID: 18558733
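The calibration-then-combination protocol above can be sketched in Python. Two assumptions are made here and labeled as such: the E-value is converted to a database P-value with the Poisson relation P = 1 - exp(-E), and the P-values given to Fisher's method are treated as independent (the abstract reports only weak correlations among methods). For k independent P-values, Fisher's statistic -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom, whose survival function has the closed form Π Σ_{j=0}^{k-1} (-ln Π)^j / j!, where Π is the product of the P-values. The function names are hypothetical.

```python
import math

def evalue_to_pvalue(e):
    """Database P-value from an E-value, ASSUMING Poisson-distributed random hits."""
    return -math.expm1(-e)  # 1 - exp(-E), numerically accurate for small E

def fisher_combine(pvalues):
    """Fisher's method: combined P-value for k independent P-values in (0, 1]."""
    k = len(pvalues)
    log_prod = sum(math.log(p) for p in pvalues)  # ln(product of P-values) <= 0
    t = -log_prod
    # Chi-square(2k) survival at 2t equals exp(-t) * sum_{j<k} t^j / j!.
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= t / j
        total += term
    return math.exp(log_prod) * total

# Example: one peptide scored by two methods with hypothetical E-values.
p_values = [evalue_to_pvalue(e) for e in (0.05, 0.2)]
combined = fisher_combine(p_values)
```

With a single P-value the function returns it unchanged, and two P-values of 0.1 combine to 0.01 × (1 + ln 100), about 0.056. Working with ln Π instead of the raw product keeps the calculation stable when many small P-values are multiplied together.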
23.  WebLab: a data-centric, knowledge-sharing bioinformatic platform 
Nucleic Acids Research  2009;37(Web Server issue):W33-W39.
With the rapid progress of biological research, there is great demand for integrative knowledge-sharing systems that efficiently support collaboration among biological researchers from various fields. To fulfill these requirements, we have developed WebLab, a data-centric knowledge-sharing platform for biologists to fetch, analyze, manipulate and share data through an intuitive web interface. Dedicated space is provided for users to store their input data and analysis results. Users can upload local data or fetch public data from remote databases, and then perform analyses using more than 260 integrated bioinformatic tools. These tools can be further organized into customized analysis workflows to accomplish complex tasks automatically. In addition to conventional biological data, WebLab provides rich support for scientific literature, such as searching the full text of uploaded papers and exporting citations to well-known citation managers such as EndNote and BibTeX. To facilitate teamwork among colleagues, WebLab provides a powerful and flexible sharing mechanism that allows users to share input data, analysis results, scientific literature and customized workflows with specified users or groups under sophisticated privilege settings. WebLab is publicly available at, with all source code released as Free Software.
PMCID: PMC2703900  PMID: 19465388
24.  CIAN - Cell Imaging and Analysis Network at the Biology Department of McGill University 
The Cell Imaging and Analysis Network (CIAN) provides services and tools to researchers in the field of cell biology from within or outside Montreal's McGill University community. CIAN is composed of six scientific platforms: Cell Imaging (confocal and fluorescence microscopy), Proteomics (2-D protein gel electrophoresis and DiGE, fluorescent protein analysis), Automation and High throughput screening (Pinning robot and liquid handler), Protein Expression for Antibody Production, Genomics (real-time PCR), and Data storage and analysis (cluster, server, and workstations). Users submit project proposals, and can obtain training and consultation in any aspect of the facility, or initiate projects with the full-service platforms. CIAN is designed to facilitate training, enhance interactions, as well as share and maintain resources and expertise.
PMCID: PMC2918154
25.  A guide to the Proteomics Identifications Database proteomics data repository 
Proteomics  2009;9(18):4276-4283.
The Proteomics Identifications Database (PRIDE, is one of the main repositories of MS-derived proteomics data. Here, we outline the main functionality of PRIDE both as a submission repository and as a source of proteomics data. We describe the main features for data retrieval and visualization available through the PRIDE web and BioMart interfaces. We also highlight the mechanism by which tailored BioMart queries can join PRIDE to other resources such as Reactome, Ensembl or UniProt to execute powerful cross-domain queries. We then present the latest improvements in the PRIDE submission process, using the new easy-to-use, platform-independent graphical submission tool PRIDE Converter. Finally, we discuss future plans and the role of PRIDE in the ProteomeXchange consortium.
PMCID: PMC2970915  PMID: 19662629
Bioinformatics; Data repository; Mass spectrometry
