1.  Aptamer Database 
Nucleic Acids Research  2004;32(Database issue):D95-D100.
The aptamer database is designed to contain comprehensive sequence information on aptamers and unnatural ribozymes that have been generated by in vitro selection methods. Such data are not normally collected in ‘natural’ sequence databases, such as GenBank. Besides serving as a storehouse of sequences that may have diagnostic or therapeutic utility, the database serves as a valuable resource for theoretical biologists who describe and explore fitness landscapes. The database is updated monthly and is publicly available at http://aptamer.icmb.utexas.edu/.
doi:10.1093/nar/gkh094
PMCID: PMC308828  PMID: 14681367
2.  AANT: the Amino Acid–Nucleotide Interaction Database 
Nucleic Acids Research  2004;32(Database issue):D174-D181.
We have created an Amino Acid–Nucleotide Interaction Database (AANT; http://aant.icmb.utexas.edu/) that categorizes all amino acid–nucleotide interactions from experimentally determined protein–nucleic acid structures, and provides users with a graphic interface for visualizing these interactions in aggregate. AANT accomplishes this by extracting individual amino acid–nucleotide interactions from structures in the Protein Data Bank, combining and superimposing these interactions into multiple structure files (e.g. 20 amino acids × 5 nucleotides) and grouping structurally similar interactions into more readily identifiable clusters. Using the Chime web browser plug-in, users can view 3D representations of the superimpositions and clusters. The unique collection and representation of data on amino acid–nucleotide interactions facilitates understanding the specificity of protein–nucleic acid interactions at a more fundamental level, and allows comparison of otherwise extremely disparate sets of structures. Moreover, by modularly representing the fundamental interactions that govern binding specificity it may prove possible to better engineer nucleic acid binding proteins.
doi:10.1093/nar/gkh128
PMCID: PMC308862  PMID: 14681388
3.  A dynamic data structure for flexible molecular maintenance and informatics 
Bioinformatics  2010;27(1):55-62.
Motivation: We present the ‘Dynamic Packing Grid’ (DPG), a neighborhood data structure for maintaining and manipulating flexible molecules and assemblies, for efficient computation of binding affinities in drug design or in molecular dynamics calculations.
Results: DPG can efficiently maintain the molecular surface using only linear space and supports quasi-constant time insertion, deletion and movement (i.e. updates) of atoms or groups of atoms. DPG also supports constant time neighborhood queries from arbitrary points. Our results for maintenance of molecular surface and polarization energy computations using DPG exhibit marked improvement in time and space requirements.
Availability: http://www.cs.utexas.edu/~bajaj/cvc/software/DPG.shtml
Contact: bajaj@cs.utexas.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq627
PMCID: PMC3008647  PMID: 21115440
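The operations listed in the DPG abstract (insert, delete, move, neighborhood query from an arbitrary point) can be illustrated with a plain uniform grid, also called a spatial hash. The Python sketch below is not DPG: it does not reproduce DPG's packing layout, its molecular-surface maintenance or its complexity guarantees. The class and method names are hypothetical.

```python
from collections import defaultdict
from math import floor
import itertools

class UniformGrid:
    """Spatial hash of atoms keyed by integer cell coordinates (illustrative only)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)   # cell (i, j, k) -> set of atom ids
        self.positions = {}             # atom id -> (x, y, z)

    def _cell(self, p):
        return tuple(floor(c / self.cell_size) for c in p)

    def insert(self, atom_id, p):
        self.positions[atom_id] = p
        self.cells[self._cell(p)].add(atom_id)

    def delete(self, atom_id):
        p = self.positions.pop(atom_id)
        cell = self._cell(p)
        self.cells[cell].discard(atom_id)
        if not self.cells[cell]:
            del self.cells[cell]        # keep memory proportional to occupied cells

    def move(self, atom_id, new_p):
        # An update is a delete followed by an insert; both touch one cell each.
        self.delete(atom_id)
        self.insert(atom_id, new_p)

    def neighbors(self, q, radius):
        """Atoms within `radius` of point q, scanning only the 27 surrounding cells.

        Requires radius <= cell_size; the cost then depends on local atom
        density rather than on the total number of atoms.
        """
        if radius > self.cell_size:
            raise ValueError("radius must not exceed cell_size")
        cx, cy, cz = self._cell(q)
        found = []
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            for a in self.cells.get((cx + dx, cy + dy, cz + dz), ()):
                p = self.positions[a]
                if sum((pi - qi) ** 2 for pi, qi in zip(p, q)) <= radius ** 2:
                    found.append(a)
        return sorted(found)
```

For example, with `grid = UniformGrid(2.0)` and atoms inserted at `(0, 0, 0)` and `(1, 0, 0)`, a query at the origin with radius 1.5 returns both, while an atom at `(5, 5, 5)` lies in a cell that is never scanned.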
4.  MaXIC-Q Web: a fully automated web service using statistical and computational methods for protein quantitation based on stable isotope labeling and LC–MS 
Nucleic Acids Research  2009;37(Web Server issue):W661-W669.
Isotope labeling combined with liquid chromatography–mass spectrometry (LC–MS) provides a robust platform for analyzing differential protein expression in proteomics research. We present a web service, called MaXIC-Q Web (http://ms.iis.sinica.edu.tw/MaXIC-Q_Web/), for quantitation analysis of large-scale datasets generated from proteomics experiments using various stable isotope-labeling techniques, e.g. SILAC, ICAT and user-developed labeling methods. It accepts spectral files in the standard mzXML format and search results from SEQUEST, Mascot and ProteinProphet as input. MaXIC-Q Web uses statistical and computational methods to construct two kinds of elution profiles for each ion from MS data, namely PIMS (projected ion mass spectrum) and XIC (extracted ion chromatogram). Toward accurate quantitation, a stringent validation procedure is performed on PIMSs to filter out peptide ions affected by interference from co-eluting peptides or noise. The areas of XICs determine ion abundances, which are used to calculate peptide and protein ratios. Because MaXIC-Q Web applies stringent validation to spectral data, it achieves high accuracy, so manual validation effort can be substantially reduced. It also provides various visualization diagrams and comprehensive quantitation reports so that users can conveniently inspect quantitation results. In summary, MaXIC-Q Web is a user-friendly, interactive, robust, generic web service for quantitation based on ICAT and SILAC labeling techniques.
doi:10.1093/nar/gkp476
PMCID: PMC2703943  PMID: 19528069
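The step "the areas of XICs determine ion abundances, which are used to calculate peptide and protein ratios" can be sketched as follows. This is a minimal illustration assuming trapezoidal integration of each XIC and a median to combine peptide ratios into a protein ratio; the abstract does not state which integration or aggregation method MaXIC-Q Web uses, and the function names are hypothetical.

```python
from statistics import median

def xic_area(times, intensities):
    """Area under an extracted ion chromatogram by the trapezoidal rule."""
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        # Each trapezoid: width (delta t) times the mean of the two intensities.
        area += (times[i] - times[i - 1]) * (intensities[i] + intensities[i - 1]) / 2.0
    return area

def peptide_ratio(light, heavy):
    """Light/heavy abundance ratio; each argument is a (times, intensities) pair."""
    return xic_area(*light) / xic_area(*heavy)

def protein_ratio(peptide_ratios):
    """Summarize peptide ratios with the median (an illustrative choice)."""
    return median(peptide_ratios)
```

With retention times `[0.0, 1.0, 2.0]`, a light XIC of `[0, 10, 0]` has area 10 and a heavy XIC of `[0, 5, 0]` has area 5, giving a peptide ratio of 2.0. A median is used here only because it is robust to a single outlying peptide.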
5.  ISPIDER Central: an integrated database web-server for proteomics 
Nucleic Acids Research  2008;36(Web Server issue):W485-W490.
Despite the growing volumes of proteomic data, integration of the underlying results remains problematic owing to differences in formats, data captured, protein accessions and services available from the individual repositories. To address this, we present the ISPIDER Central Proteomic Database search (http://www.ispider.manchester.ac.uk/cgi-bin/ProteomicSearch.pl), an integration service offering novel search capabilities over leading, mature, proteomic repositories including PRoteomics IDEntifications database (PRIDE), PepSeeker, PeptideAtlas and the Global Proteome Machine. It enables users to search for proteins and peptides that have been characterised in mass spectrometry-based proteomics experiments from different groups, stored in different databases, and view the collated results with specialist viewers/clients. In order to overcome limitations imposed by the great variability in protein accessions used by individual laboratories, the European Bioinformatics Institute's Protein Identifier Cross-Reference (PICR) service is used to resolve accessions from different sequence repositories. Custom-built clients allow users to view peptide/protein identifications in different contexts from multiple experiments and repositories, as well as integration with the Dasty2 client supporting any annotations available from Distributed Annotation System servers. Further information on the protein hits may also be added via external web services able to take a protein as input. This web server offers the first truly integrated access to proteomics repositories and provides a unique service to biologists interested in mass spectrometry-based proteomics.
doi:10.1093/nar/gkn196
PMCID: PMC2447780  PMID: 18440977
6.  BioRuby: bioinformatics software for the Ruby programming language 
Bioinformatics  2010;26(20):2617-2619.
Summary: The BioRuby software toolkit contains a comprehensive set of free development tools and libraries for bioinformatics and molecular biology, written in the Ruby programming language. BioRuby has components for sequence analysis, pathway analysis, protein modelling and phylogenetic analysis; it supports many widely used data formats and provides easy access to databases, external programs and public web services, including BLAST, KEGG, GenBank, MEDLINE and GO. BioRuby comes with a tutorial, documentation and an interactive environment, which can be used in the shell, and in the web browser.
Availability: BioRuby is free and open source software, made available under the Ruby license. BioRuby runs on all platforms that support Ruby, including Linux, Mac OS X and Windows. And, with JRuby, BioRuby runs on the Java Virtual Machine. The source code is available from http://www.bioruby.org/.
Contact: katayama@bioruby.org
doi:10.1093/bioinformatics/btq475
PMCID: PMC2951089  PMID: 20739307
7.  PPDB, the Plant Proteomics Database at Cornell 
Nucleic Acids Research  2008;37(Database issue):D969-D974.
The Plant Proteomics Database (PPDB; http://ppdb.tc.cornell.edu), launched in 2004, provides an integrated resource for experimentally identified proteins in Arabidopsis and maize (Zea mays). Internal BLAST alignments link maize and Arabidopsis information. Experimental identification is based on in-house mass spectrometry (MS) of cell type-specific proteomes (maize), or specific subcellular proteomes (e.g. chloroplasts, thylakoids, nucleoids) and total leaf proteome samples (maize and Arabidopsis). So far more than 5000 accessions both in maize and Arabidopsis have been identified. In addition, more than 80 published Arabidopsis proteome datasets from subcellular compartments or organs are stored in PPDB and linked to each locus. Using MS-derived information and literature, more than 1500 Arabidopsis proteins have a manually assigned subcellular location, with a strong emphasis on plastid proteins. Additional new features of PPDB include searchable posttranslational modifications and searchable experimental proteotypic peptides and spectral count information for each identified accession based on in-house experiments. Various search methods are provided to extract more than 40 data types for each accession and to extract accessions for different functional categories or curated subcellular localizations. Protein report pages for each accession provide comprehensive overviews, including predicted protein properties, with hyperlinks to the most relevant databases.
doi:10.1093/nar/gkn654
PMCID: PMC2686560  PMID: 18832363
8.  The Wildcat Toolbox: A Set of Perl Script Utilities for Use in Peptide Mass Spectral Database Searching and Proteomics Experiments 
We describe in this communication a set of functional Perl script utilities for use in peptide mass spectral database searching and proteomics experiments, known as the Wildcat Toolbox. These are all freely available for download from our laboratory Web site (http://proteomics.arizona.edu/toolbox.html) as a combined zip file, and can also be accessed via the Proteome Commons Web site (www.proteomecommons.org) in the tools section. We make them available to other potential users in the spirit of open source software development; we do not have the resources to provide any significant technical support for them, but we hope users will share both bug reports and improvements with the community at large.
PMCID: PMC2291772  PMID: 16741236
Tandem mass spectrometry; protein identification; proteomics; perl; software development; fasta
9.  SynapticDB, Effective Web-based Management and Sharing of Data from Serial Section Electron Microscopy 
Neuroinformatics  2010;9(1):39-57.
Serial section electron microscopy (ssEM) is rapidly expanding as a primary tool to investigate synaptic circuitry and plasticity. The ultrastructural images collected through ssEM are content-rich and their comprehensive analysis is beyond the capacity of an individual laboratory. Hence, sharing ultrastructural data is becoming crucial to visualize, analyze, and discover the structural basis of synaptic circuitry and function in the brain. We devised a web-based management system called SynapticDB (http://synapses.clm.utexas.edu/synapticdb/) that catalogues, extracts, analyzes, and shares experimental data from ssEM. The management strategy involves a library with check-in, checkout and experimental tracking mechanisms. We developed a series of spreadsheet templates (MS Excel, OpenOffice spreadsheet, etc.) that guide users in methods of data collection, structural identification, and quantitative analysis through ssEM. SynapticDB provides flexible access to complete templates, or to individual columns with instructional headers that can be selected to create user-defined templates. New templates can also be generated and uploaded. Research progress is tracked via experimental note management and dynamic PDF forms that allow new investigators to follow standard protocols and experienced researchers to expand the range of data collected and shared. The combined use of templates and tracking notes ensures that the supporting experimental information is populated into the database and associated with the appropriate ssEM images and analyses. We anticipate that SynapticDB will support future meta-analyses leading to new discoveries about the composition and circuitry of neurons and glia, and to a new understanding of structural plasticity during development, behavior, learning, memory, and neuropathology.
doi:10.1007/s12021-010-9088-4
PMCID: PMC3063557  PMID: 21181305
Online database systems; Online data file checkout and check-in; Data and image sharing; Synapse structure and function; Data management; Dynamic PDF forms
10.  FUGOID: functional genomics of organellar introns database 
Nucleic Acids Research  2002;30(1):385-386.
FUGOID is a web-based, taxonomically broad organelle intron database that collects and integrates various functional and structural data on organellar (mitochondrial and chloroplast) introns. The main information provided by FUGOID includes intron sequence, subclass, resident ORF, self-splicing capability, host gene, protein factor(s) involved in splicing, mobility, insertion site, twintron, seminal references and taxonomic position of host organism. It is implemented in a relational database management system, allowing sophisticated, user-friendly searching, data entry and revision. Users can access the database by any common web browser using a variety of operating systems. The main page of the database is available at http://wnt.cc.utexas.edu/~ifmr530/introndata/main.htm.
PMCID: PMC99166  PMID: 11752344
11.  iPiG: Integrating Peptide Spectrum Matches into Genome Browser Visualizations 
PLoS ONE  2012;7(12):e50246.
Proteogenomic approaches have gained increasing popularity; however, it is still difficult to integrate mass spectrometry identifications with genomic data because of differing data formats. To address this difficulty, we introduce iPiG, a tool for integrating peptide identifications from mass spectrometry experiments into existing genome browser visualizations. This simplifies the concurrent analysis of proteomic and genomic data and allows proteomic results to be compared directly with genomic data. iPiG is freely available from https://sourceforge.net/projects/ipig/. It is implemented in Java and can be run as a stand-alone tool with a graphical user interface or integrated into existing workflows. Supplementary data are available at PLOS ONE online.
doi:10.1371/journal.pone.0050246
PMCID: PMC3514238  PMID: 23226516
12.  An XML standard for the dissemination of annotated 2D gel electrophoresis data complemented with mass spectrometry results 
BMC Bioinformatics  2004;5:9.
Background
Many proteomics initiatives require seamless bioinformatics integration of a range of analytical steps, from sample collection to systems modeling, that is immediately accessible to the participants involved in the process. The workflow from proteomics profiling by 2D gel electrophoresis to the putative identification of differentially expressed proteins, by comparison of mass spectrometry results with reference databases, includes many components of sample processing, not just analysis and interpretation, that are regularly revisited and updated. Such updates and the dissemination of data require a suitable data structure. However, no data structure is currently available for storing, in a single XML file, the data for the multiple gels generated by a single proteomic experiment. This paper proposes a data structure based on XML standards to fill the gap between the data generated by proteomics experiments and their storage.
Results
In order to address the resulting procedural fluidity, we have adopted and implemented a data model centered on the concept of the annotated gel (AG) as the format for delivery and management of 2D gel electrophoresis results. An eXtensible Markup Language (XML) schema is proposed to manage, analyze and disseminate annotated 2D gel electrophoresis results. The structure of AG objects is formally represented using XML, resulting in the definition of the AGML syntax presented here.
Conclusion
The proposed schema accommodates data on the electrophoresis results as well as the mass-spectrometry analysis of selected gel spots. A web-based software library is being developed to handle data storage, analysis and graphic representation. Computational tools described will be made available at . Our development of AGML provides a simple data structure for storing 2D gel electrophoresis data.
doi:10.1186/1471-2105-5-9
PMCID: PMC341449  PMID: 15005801
13.  METAL: fast and efficient meta-analysis of genomewide association scans 
Bioinformatics  2010;26(17):2190-2191.
Summary: METAL provides a computationally efficient tool for meta-analysis of genome-wide association scans, a commonly used approach for improving power in gene-mapping studies of complex traits. METAL provides a rich scripting interface and implements efficient memory management to allow analyses of very large data sets and to support a variety of input file formats.
Availability and implementation: METAL, including source code, documentation, examples, and executables, is available at http://www.sph.umich.edu/csg/abecasis/metal/
Contact: goncalo@umich.edu
doi:10.1093/bioinformatics/btq340
PMCID: PMC2922887  PMID: 20616382
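To make the idea of combining association results across scans concrete, the sketch below shows a standard fixed-effects inverse-variance meta-analysis for a single variant. It illustrates one common meta-analysis scheme under stated assumptions (independent studies, effects on the same scale and aligned to the same allele); it is not METAL's source code, and the function name is hypothetical.

```python
import math

def inverse_variance_meta(betas, std_errors):
    """Fixed-effects meta-analysis of one variant across studies.

    betas: per-study effect estimates (aligned to the same allele);
    std_errors: their standard errors.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in std_errors]   # w_i = 1 / SE_i^2
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)                    # SE of the weighted mean
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))           # equals 2 * (1 - Phi(|z|))
    return beta, se, z, p
```

Two studies that each report an effect of 0.2 with standard error 0.1 pool to an effect of 0.2 with a smaller standard error of about 0.071 (z ≈ 2.83), which is the gain in power the summary refers to. Effects of opposite sign and equal precision cancel to a pooled effect of zero and p = 1.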
14.  The University of Texas at Austin — Protein and Metabolite Analysis Facility 
Journal of Biomolecular Techniques : JBT  2011;22(Supplement):S47.
The Protein and Metabolite Analysis Facility at the University of Texas at Austin is a joint effort of the College of Pharmacy, the Center for Research on Environmental Disease (CRED), and the Institute for Cellular and Molecular Biology (ICMB). Services and collaborative research are offered for the detection, characterization and quantification of biomolecules. The Facility's goals are to provide sensitive protein identification and modification analyses, to provide custom peptide syntheses, to offer services for the identification and quantification of metabolites, nutrients and xenobiotics, to implement novel analytical methods, to improve the sensitivity of existing analyses, to provide consultation on the selection and implementation of analytical methods, to offer training in the usage and applications of the instrumentation, and to provide technical expertise in support of individual research goals. The ICMB portion of the Core contains an ABI Procise 492 cLC protein sequencer, a Protein Technologies Inc. Symphony peptide synthesizer, two Bio-Rad DuoFlow and one GE Healthcare ÄKTA protein purification systems, two Beckman System Gold HPLC systems, a Berthold Technologies Mithras luminescence and fluorescence detector, an Invitrogen gel electrophoresis set-up, an Art Robbins Instruments Phoenix crystallography robot and an LC-MALDI-TOF/TOF (an ABI 4700 with an LC Packings Ultimate Nano-LC system with a Probot spotting robot). In the College of Pharmacy, the Core has an Applied Biosystems 4000 Q-trap LC-MS/MS system with ESI, APCI and nanospray sources coupled with a Shimadzu LC-20AD HPLC system, a ThermoFinnigan LCQ ion trap mass spectrometer with ESI, APCI and microspray interfaces combined with a Michrom Magic 2002 HPLC system, a ThermoFinnigan Trace MS GC-quadrupole with EI positive, negative CI and selected ion monitoring (SIM), an ABI Voyager-DE Pro MALDI-TOF and a Bio-Rad Bio-Plex 200 fluorescent microbead array system.
PMCID: PMC3186625
15.  2DB: a Proteomics database for storage, analysis, presentation, and retrieval of information from mass spectrometric experiments 
BMC Bioinformatics  2008;9:302.
Background
The amount of information stemming from proteomics experiments involving (multi-dimensional) separation techniques, mass spectrometric analysis, and computational analysis is ever-increasing. Data from such an experimental workflow need to be captured, related and analyzed. Biological experiments within this scope produce heterogeneous data, ranging from pictures of one- or two-dimensional protein maps and spectra recorded by tandem mass spectrometry to text-based identifications made by algorithms that analyze these spectra. Additionally, peptide and corresponding protein information needs to be displayed.
Results
In order to handle the large amount of data from computational processing of mass spectrometric experiments, automatic import scripts are available and the need for manual input to the database has been minimized. Information is stored in a generic format that abstracts from the specific software tools typically used in such an experimental workflow. The software is therefore capable of storing and cross-analysing results from many algorithms. A novel feature and a focus of this database is to facilitate protein identification by using peptides identified by mass spectrometry and linking this information directly to the respective protein maps. Additionally, our application employs spectral counting for quantitative presentation of the data. All information can be linked to hot spots on images to place the results into an experimental context. A summary of identified proteins, containing all relevant information per hot spot, is generated automatically, usually after a change in the underlying protein models or after new identifications are imported. The supporting information for this report can be accessed in multiple ways using the user interface provided by the application.
Conclusion
We present a proteomics database which aims to greatly reduce evaluation time of results from mass spectrometric experiments and enhance result quality by allowing consistent data handling. Import functionality, automatic protein detection, and summary creation act together to facilitate data analysis. In addition, supporting information for these findings is readily accessible via the graphical user interface provided. The database schema and the implementation, which can easily be installed on virtually any server, can be downloaded in the form of a compressed file from our project webpage.
doi:10.1186/1471-2105-9-302
PMCID: PMC2475538  PMID: 18605993
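Spectral counting, combined with a per-hot-spot summary as described for 2DB, can be sketched as counting the distinct spectra assigned to each protein within each gel spot. This is a minimal illustration, not 2DB's implementation; the input tuple layout `(spot_id, spectrum_id, protein_accession)` and the function name are assumptions.

```python
from collections import Counter, defaultdict

def spot_summary(identifications):
    """Spectral counts per protein for each gel spot.

    identifications: iterable of (spot_id, spectrum_id, protein_accession).
    Returns {spot_id: Counter(protein -> number of distinct spectra)}.
    """
    seen = set()
    summary = defaultdict(Counter)
    for spot, spectrum, protein in identifications:
        key = (spot, spectrum, protein)
        if key in seen:
            continue  # same spectrum reported twice (e.g. by two search engines)
        seen.add(key)
        summary[spot][protein] += 1
    return dict(summary)
```

Deduplicating on the full `(spot, spectrum, protein)` key means an identification imported from two algorithms is not counted twice, which matters because the database cross-analyses results from many algorithms. `summary[spot].most_common(1)` then gives the dominant protein in a spot.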
16.  Proteomics data repositories: Providing a safe haven for your data and acting as a springboard for further research 
Journal of Proteomics  2010;73(11):2136-2146.
Although data deposition is not yet general practice in the field of proteomics, several mass spectrometry (MS) based proteomics repositories are publicly available to the scientific community. The main existing resources are: the Global Proteome Machine Database (GPMDB), PeptideAtlas, the PRoteomics IDEntifications database (PRIDE), Tranche, and NCBI Peptidome. In this review the capabilities of each of these will be described, paying special attention to four key properties: data types stored, applicable data submission strategies, supported formats, and available data mining and visualization tools. Additionally, the data contents from model organisms will be enumerated for each resource. There are other valuable smaller and/or more specialized repositories, but they will not be covered in this review. Finally, the concept behind the ProteomeXchange consortium, a collaborative effort among the main resources in the field, will be introduced.
doi:10.1016/j.jprot.2010.06.008
PMCID: PMC2958306  PMID: 20615486
CV, Controlled Vocabulary; HGNC, HUGO Gene Nomenclature Committee; MCP, Molecular and Cellular Proteomics; MRM, Multiple Reaction Monitoring; NIH, National Institutes of Health; OLS, Ontology Lookup Service; PICR, Protein Identifier Cross-Referencing; PSI, Proteomics Standards Initiative; QC, Quality Control; SRM, Selected Reaction Monitoring; SBEAMS, Systems Biology Experiment Analysis Management System; TPP, Trans-Proteomic Pipeline; Proteomics; Databases; Bioinformatics; Data standards; Repositories
17.  PRIDE: a public repository of protein and peptide identifications for the proteomics community 
Nucleic Acids Research  2005;34(Database issue):D659-D663.
PRIDE, the ‘PRoteomics IDEntifications database’ () is a database of protein and peptide identifications that have been described in the scientific literature. These identifications will typically be from specific species, tissues and sub-cellular locations, perhaps under specific disease conditions. Any post-translational modifications that have been identified on individual peptides can be described. These identifications may be annotated with supporting mass spectra. At the time of writing, PRIDE includes the full set of identifications as submitted by individual laboratories participating in the HUPO Plasma Proteome Project and a profile of the human platelet proteome submitted by the University of Ghent in Belgium. By late 2005 PRIDE is expected to contain the identifications and spectra generated by the HUPO Brain Proteome Project. Proteomics laboratories are encouraged to submit their identifications and spectra to PRIDE to support their manuscript submissions to proteomics journals. Data can be submitted in PRIDE XML format if identifications are included or mzData format if the submitter is depositing mass spectra without identifications. PRIDE is a web application, so submission, searching and data retrieval can all be performed using an internet browser. PRIDE can be searched by experiment accession number, protein accession number, literature reference and sample parameters including species, tissue, sub-cellular location and disease state. Data can be retrieved as machine-readable PRIDE or mzData XML (the latter for mass spectra without identifications), or as human-readable HTML.
doi:10.1093/nar/gkj138
PMCID: PMC1347500  PMID: 16381953
18.  The Ruby UCSC API: accessing the UCSC genome database using Ruby 
BMC Bioinformatics  2012;13:240.
Background
The University of California, Santa Cruz (UCSC) genome database is among the most used sources of genomic annotation in human and other organisms. The database offers an excellent web-based graphical user interface (the UCSC genome browser) and several means for programmatic queries. However, a simple application programming interface (API) in a scripting language aimed at biologists was not yet available. Here, we present the Ruby UCSC API, a library to access the UCSC genome database using Ruby.
Results
The API is designed as a BioRuby plug-in and built on the ActiveRecord 3 framework for the object-relational mapping, making writing SQL statements unnecessary. The current version of the API supports databases of all organisms in the UCSC genome database including human, mammals, vertebrates, deuterostomes, insects, nematodes, and yeast.
The API uses the bin index—if available—when querying for genomic intervals. The API also supports genomic sequence queries using locally downloaded *.2bit files that are not stored in the official MySQL database. The API is implemented in pure Ruby and is therefore available in different environments and with different Ruby interpreters (including JRuby).
Conclusions
Assisted by the straightforward object-oriented design of Ruby and ActiveRecord, the Ruby UCSC API will make it easier for biologists to query the UCSC genome database programmatically. The API is available through the RubyGems system. Source code and documentation are available at https://github.com/misshie/bioruby-ucsc-api/ under the Ruby license. Feedback and help are provided via the website at http://rubyucscapi.userecho.com/.
doi:10.1186/1471-2105-13-240
PMCID: PMC3542311  PMID: 22994508
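The "bin index" that the Ruby UCSC API uses for genomic-interval queries refers to the hierarchical binning scheme of UCSC tables that carry a `bin` column. The Python sketch below follows the standard (non-extended) scheme as we understand it: five levels of bins, the smallest spanning 128 kb (2^17 bases), each level 8 times larger, covering coordinates up to 512 Mb. It is an illustration, not code from the Ruby UCSC API, and the function names are hypothetical.

```python
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]  # smallest bins first
BIN_FIRST_SHIFT = 17   # smallest bins span 2**17 = 128 kb
BIN_NEXT_SHIFT = 3     # each level up is 8x larger

def bin_from_range(start, end):
    """Smallest bin that fully contains the half-open interval [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("interval exceeds the 512 Mb standard binning range")

def overlapping_bins(start, end):
    """Every bin, at every level, that could hold a feature overlapping [start, end)."""
    bins = []
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        bins.extend(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    return bins
```

A query for features overlapping `[start, end)` would restrict `bin` to `overlapping_bins(start, end)` and still compare coordinates (for BED-like tables, e.g. `chromStart < end AND chromEnd > start`), because a bin only bounds where a feature may lie. The payoff is that an indexed `bin` lookup replaces a scan over a whole chromosome.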
19.  PIQA: pipeline for Illumina G1 genome analyzer data quality assessment 
Bioinformatics  2009;25(18):2438-2439.
Summary: PIQA is a quality analysis pipeline designed to examine genomic reads produced by Next Generation Sequencing technology (Illumina G1 Genome Analyzer). A short statistical summary, together with tile-by-tile and cycle-by-cycle graphical representations of cluster density, quality scores and nucleotide frequencies, allows easy identification of various technical problems, including defective tiles, mistakes in sample/library preparation and abnormalities in the frequencies of sequenced genomic reads. PIQA is written in the R statistical programming language and is compatible with the bustard, fastq and scarf Illumina G1 Genome Analyzer data formats.
Availability: The PIQA pipeline, installation instructions and examples are available at the supplementary web site (http://bioinfo.uh.edu/PIQA).
Contact: yfofanov@bioinfo.uh.edu
doi:10.1093/bioinformatics/btp429
PMCID: PMC2735671  PMID: 19602525
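The cycle-by-cycle statistics mentioned for PIQA (nucleotide frequencies and quality scores per sequencing cycle) can be illustrated on FASTQ input. PIQA itself is written in R and also reads bustard and scarf files; the Python sketch below handles only 4-line FASTQ records and assumes a Phred offset of 33 by default. The function name is hypothetical.

```python
from collections import Counter

def per_cycle_stats(fastq_lines, phred_offset=33):
    """Per-cycle base frequencies and mean Phred quality from 4-line FASTQ records.

    phred_offset=33 assumes Sanger-style encoding; older Illumina pipelines
    used an offset of 64, so pass phred_offset=64 for such files.
    """
    base_counts, qual_sums, read_counts = [], [], []
    lines = (line.strip() for line in fastq_lines if line.strip())
    for _header in lines:
        seq = next(lines)
        next(lines)                          # '+' separator line
        qual = next(lines)
        for cycle, (base, q) in enumerate(zip(seq, qual)):
            if cycle == len(base_counts):    # first read reaching this cycle
                base_counts.append(Counter())
                qual_sums.append(0)
                read_counts.append(0)
            base_counts[cycle][base] += 1
            qual_sums[cycle] += ord(q) - phred_offset
            read_counts[cycle] += 1
    freqs = [{b: n / read_counts[c] for b, n in counts.items()}
             for c, counts in enumerate(base_counts)]
    mean_quals = [s / n for s, n in zip(qual_sums, read_counts)]
    return freqs, mean_quals
```

A sudden drop in mean quality at one cycle across all reads, or a nucleotide frequency far from the expected composition, is the kind of abnormality such per-cycle profiles make easy to spot.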
20.  Bioinformatics as a Tool for Assessing the Quality of Sub-Cellular Proteomic Strategies and Inferring Functions of Proteins: Plant Cell Wall Proteomics as a Test Case 
Bioinformatics is used at three different steps of proteomic studies of sub-cellular compartments. The first is protein identification from mass spectrometry data, the second is prediction of sub-cellular localization, and the third is the search for functional domains to predict the functions of identified proteins in order to answer biological questions. The aim of this work was to develop a new tool for improving the quality of proteomic studies of sub-cellular compartments. Starting from an analysis of problems found in databases, we designed a new Arabidopsis database named ProtAnnDB (http://www.polebio.scsv.ups-tlse.fr/ProtAnnDB/). It collects on a single page the predictions of sub-cellular localization and of functional domains made by available software. Using this database improves not only the interpretation of proteomic data (top-down analysis) but also the procedures used to isolate sub-cellular compartments (bottom-up quality control).
PMCID: PMC2808182  PMID: 20140071
bioinformatics; cell wall; plant; proteomics
21.  An informatic pipeline for the data capture and submission of quantitative proteomic data using iTRAQ™ 
Proteome Science  2007;5:4.
Background
Proteomics continues to play a critical role in post-genomic science as continued advances in mass spectrometry and analytical chemistry support the separation and identification of increasing numbers of peptides and proteins from their characteristic mass spectra. To facilitate the sharing of these data, various standard formats have been, and continue to be, developed. These formats are not yet fully mature, however, and cannot yet cope with the increasing number of quantitative proteomic technologies being developed.
Results
We propose an extension to the PRIDE and mzData XML schemas to accommodate the concept of multiple samples per experiment and, in addition, to capture the intensities of the iTRAQ™ reporter ions in the entry. A simple Java client has been developed to capture and convert the raw data from common spectral file formats into a valid PRIDE XML entry; it also uses a third-party open source tool to generate iTRAQ™ reporter-ion intensities from Mascot output.
Conclusion
We describe an extension to the PRIDE and mzData schemas to enable the capture of quantitative data. Currently this is limited to iTRAQ™ data but is readily extensible to other quantitative proteomic technologies. Furthermore, a software tool has been developed that converts various mass spectrum file formats and the corresponding Mascot peptide identifications to PRIDE-formatted XML. The tool represents a simple approach to preparing quantitative and qualitative data for submission to repositories such as PRIDE, which is necessary to facilitate data deposition and sharing in public-domain databases. The software is freely available from .
doi:10.1186/1477-5956-5-4
PMCID: PMC1796855  PMID: 17270041
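Capturing "the intensities of the iTRAQ™ reporter ions" amounts to reading peak intensities at the reporter masses in each MS/MS spectrum. The sketch below assumes a 4-plex experiment and uses rounded, approximate reporter m/z values (about 114.11, 115.11, 116.11 and 117.12) with a ±0.05 window; these values, the window and the function name are assumptions for illustration, not taken from the paper or from the third-party tool it mentions.

```python
# Approximate m/z of the iTRAQ 4-plex reporter ions (assumed, rounded values).
ITRAQ4_REPORTERS = {"114": 114.11, "115": 115.11, "116": 116.11, "117": 117.12}

def reporter_intensities(peaks, tolerance=0.05, reporters=ITRAQ4_REPORTERS):
    """Intensity per reporter channel from an MS/MS peak list.

    peaks: iterable of (mz, intensity) pairs. For each channel, the most intense
    peak within +/- tolerance (in m/z units) of the reporter mass is taken;
    channels with no peak in the window get 0.0.
    """
    peaks = list(peaks)
    return {
        channel: max((inten for mz, inten in peaks if abs(mz - target) <= tolerance),
                     default=0.0)
        for channel, target in reporters.items()
    }
```

Relative quantitation then typically divides each channel by a reference channel, e.g. `{ch: v / r["114"] for ch, v in r.items()}` when channel 114 is non-zero.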
22.  Roundup 2.0: enabling comparative genomics for over 1800 genomes 
Bioinformatics  2012;28(5):715-716.
Summary: Roundup is an online database of gene orthologs for over 1800 genomes, including 226 Eukaryota, 1447 Bacteria, 113 Archaea and 21 Viruses. Orthologs are inferred using the Reciprocal Smallest Distance algorithm. Users may query Roundup for single-linkage clusters of orthologous genes based on any group of genomes. Annotated query results may be viewed in a variety of ways including as clusters of orthologs and as phylogenetic profiles. Genomic results may be downloaded in formats suitable for functional as well as phylogenetic analysis, including the recent OrthoXML standard. In addition, gene IDs can be retrieved using FASTA sequence search. All source code and orthologs are freely available.
Availability: http://roundup.hms.harvard.edu
Contact: dpwall@hms.harvard.edu; todd_deluca@hms.harvard.edu
doi:10.1093/bioinformatics/bts006
PMCID: PMC3289913  PMID: 22247275
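The Roundup summary names two steps: pairing orthologs with the Reciprocal Smallest Distance idea and grouping them into single-linkage clusters. The simplified sketch below assumes the evolutionary distances between candidate gene pairs have already been computed (the full method derives them from sequence alignments and applies filtering thresholds, all omitted here). A pair (a, b) is kept when b is a's closest gene in the other genome and a is b's closest gene in return; the pairs are then merged with union-find. The function names and the dictionary-of-distances input are illustrative assumptions, not Roundup's code.

```python
def rsd_pairs(genome_a, genome_b, dist):
    """Reciprocal smallest-distance pairs between two genomes (simplified sketch).

    genome_a, genome_b: lists of gene IDs.
    dist: dict mapping frozenset({gene_x, gene_y}) -> precomputed distance;
          a missing key means the pair was never a candidate hit.
    """
    def closest(gene, targets):
        # Candidate hits of `gene` in the other genome, scored by distance.
        hits = [(dist[frozenset((gene, t))], t)
                for t in targets if frozenset((gene, t)) in dist]
        return min(hits)[1] if hits else None

    pairs = []
    for a in genome_a:
        b = closest(a, genome_b)
        # Keep (a, b) only if a is also b's smallest-distance gene in genome_a.
        if b is not None and closest(b, genome_a) == a:
            pairs.append((a, b))
    return pairs


def single_linkage_clusters(pairs):
    """Merge ortholog pairs into single-linkage clusters with union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for gene in list(parent):
        clusters.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(c) for c in clusters.values())
```

To cluster "any group of genomes", one would call rsd_pairs for every pair of selected genomes and pass the concatenated pairs to single_linkage_clusters; any gene linked through a chain of pairs ends up in the same cluster.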
23.  jmzReader: A Java parser library to process and visualize multiple text and XML-based mass spectrometry data formats 
Proteomics  2012;12(6):795-798.
Here we present the jmzReader library: a collection of Java application programming interfaces (APIs) to parse the most commonly used peak list and XML-based mass spectrometry (MS) data formats: DTA, MS2, MGF, PKL, mzXML, mzData, and mzML (based on the already existing API jmzML). The library is optimized to be used in conjunction with mzIdentML, the recently released standard data format for reporting protein and peptide identifications, developed by the HUPO Proteomics Standards Initiative (PSI). mzIdentML files do not contain spectral data but contain references to different kinds of external MS data files. As a key functionality, all parsers implement a common interface that supports the various methods used by mzIdentML to reference external spectra. Thus, when developing software for mzIdentML, programmers no longer have to support multiple MS data file formats, only this one interface. The library (which includes a viewer) is open source and, together with detailed documentation, can be downloaded from http://code.google.com/p/jmzreader/.
doi:10.1002/pmic.201100578
PMCID: PMC3472022  PMID: 22539430
Bioinformatics; Data standard; Java; MS data processing; Proteomics standards initiative
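The key design point of jmzReader is that every format-specific parser implements one common interface, so code that follows an mzIdentML spectrum reference never needs to know the underlying file format. The sketch below illustrates that pattern in Python with an abstract base class and a toy MGF reader. The class and method names, and the convention of treating "index=N" as a positional reference and anything else as a spectrum title, are hypothetical; they are not jmzReader's actual Java API or the full set of mzIdentML reference rules.

```python
from abc import ABC, abstractmethod


class SpectrumSource(ABC):
    """Hypothetical common interface that every format-specific parser implements."""

    @abstractmethod
    def spectrum_by_index(self, index):
        """Return the peak list of the spectrum at a zero-based position."""

    @abstractmethod
    def spectrum_by_id(self, spectrum_id):
        """Return the peak list of the spectrum with the given identifier."""


class MgfSource(SpectrumSource):
    """Toy MGF reader that keeps every BEGIN IONS ... END IONS block in memory."""

    def __init__(self, text):
        self._spectra = []  # list of (title, peaks) in file order
        title, peaks = None, None
        for raw in text.splitlines():
            line = raw.strip()
            if line == "BEGIN IONS":
                title, peaks = None, []
            elif line == "END IONS":
                self._spectra.append((title, peaks))
                peaks = None
            elif peaks is not None and line:
                if "=" in line:  # header line such as TITLE= or PEPMASS=
                    key, value = line.split("=", 1)
                    if key == "TITLE":
                        title = value
                else:  # peak line: "m/z intensity"
                    mz, intensity = line.split()[:2]
                    peaks.append((float(mz), float(intensity)))

    def spectrum_by_index(self, index):
        return self._spectra[index][1]

    def spectrum_by_id(self, spectrum_id):
        for title, peaks in self._spectra:
            if title == spectrum_id:
                return peaks
        raise KeyError(spectrum_id)


def resolve(source, reference):
    """Dispatch an external spectrum reference through the common interface.

    Illustrative convention only: "index=N" is positional, anything else is an ID.
    """
    if reference.startswith("index="):
        return source.spectrum_by_index(int(reference[len("index="):]))
    return source.spectrum_by_id(reference)
```

Supporting another format would mean adding one more SpectrumSource subclass; resolve and any code built on it stay unchanged, which is the benefit the abstract describes.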
24.  Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics 
Bioinformatics  2012;28(7):1035-1037.
Summary: Biogem provides a software development environment for the Ruby programming language, which encourages community-based software development for bioinformatics while lowering the barrier to entry and encouraging best practices.
Biogem, with its targeted modular and decentralized approach, software generator, tools and tight web integration, is an improved general model for scaling up collaborative open source software development in bioinformatics.
Availability: Biogem and its modules are free, open-source software (OSS). Biogem runs on all systems that support recent versions of Ruby, including Linux, Mac OS X and Windows. Further information is available at http://www.biogems.info. A tutorial is available at http://www.biogems.info/howto.html
Contact: bonnal@ingm.org
doi:10.1093/bioinformatics/bts080
PMCID: PMC3315718  PMID: 22332238
25.  SoyProDB: A database for the identification of soybean seed proteins 
Bioinformation  2013;9(3):165-167.
Soybean continues to serve as a rich and inexpensive source of protein for humans and animals. A substantial amount of information has been reported on the genotypic variation and beneficial genetic manipulation of soybeans. For a better understanding of the consequences of genetic manipulation, elucidation of soybean protein composition is necessary, because of its direct relationship to phenotype. We have conducted studies to determine the composition of storage, allergen and anti-nutritional proteins in cultivated soybean using a combined proteomics approach. Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) was used to separate the proteins, and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) and liquid chromatography tandem mass spectrometry (LC-MS/MS) were used to identify them. Our analysis resulted in the identification of several proteins, and a web-based database named the soybean protein database (SoyProDB) was subsequently built to house the data and allow scientists to search it. This database will be useful to scientists who wish to genetically alter soybean to obtain higher-quality storage proteins, and will also help consumers gain a better understanding of the proteins that make up soy products available on the market. The database is freely accessible.
Availability
http://bioinformatics.towson.edu/Soybean_Seed_Proteins_2D_Gel_DB/Home.aspx
doi:10.6026/97320630009165
PMCID: PMC3569605  PMID: 23423175
Soybean; 2D-PAGE; MALDI-TOF-MS; LC-MS/MS; Conglycinin; Glycinin; Allergen proteins