Myocardial ischemia-reperfusion induces mitochondrial dysfunction and, depending upon the degree of injury, may lead to cardiac cell death. However, our ability to understand mitochondrial dysfunction has been hindered by an absence of molecular markers defining the various degrees of injury. To address this paucity of knowledge, we sought to characterize the impact of ischemic damage on mitochondrial proteome biology. We hypothesized that ischemic injury induces differential alterations in various mitochondrial sub-compartments, that these proteomic changes are specific to the severity of injury, and that they are important to subsequent cellular adaptations to myocardial ischemic injury. Accordingly, an in vitro model of cardiac mitochondrial injury in mice was established to examine two stress conditions: reversible injury (induced by mild calcium overload) and irreversible injury (induced by hypotonic stimuli). Both forms of injury had a drastic impact on the proteome biology of cardiac mitochondria. Altered mitochondrial function was concomitant with significant protein loss/shedding from the injured organelles. In the setting of mild calcium overload, mitochondria retained functionality despite the release of numerous proteins, and the majority of mitochondria remained intact. In contrast, hypotonic stimuli caused severe damage to mitochondrial structure and function, induced increased oxidative modification of mitochondrial proteins, and brought about detrimental changes to the subproteomes of the inner mitochondrial membrane and matrix. Using an established in vivo murine model of regional myocardial ischemic injury, we validated key observations made in the in vitro model.
This pre-clinical investigation provides functional and sub-organelle location information on a repertoire of cardiac mitochondrial proteins sensitive to ischemia-reperfusion stress and highlights protein clusters potentially involved in mitochondrial dysfunction in the setting of ischemic injury.
proteome biology; ischemia injury; cardiac mitochondria; reversible injury; irreversible injury
Transcriptional control ensures genes are expressed in the right amounts at the correct times and locations. Understanding quantitatively how regulatory systems convert input signals to appropriate outputs remains a challenge. For the first time, we successfully model even skipped (eve) stripes 2 and 3+7 across the entire fly embryo at cellular resolution. A straightforward statistical relationship explains how transcription factor (TF) concentrations define eve’s complex spatial expression, without the need for pairwise interactions or cross-regulatory dynamics. Simulating thousands of TF combinations, we recover known regulators and suggest new candidates. Finally, we accurately predict the intricate effects of perturbations including TF mutations and misexpression. Our approach imposes minimal assumptions about regulatory function; instead we infer underlying mechanisms from models that best fit the data, like the lack of TF-specific thresholds and the positional value of homotypic interactions. Our study provides a general and quantitative method for elucidating the regulation of diverse biological systems.
The transcription of genes into messenger RNA (mRNA) molecules is one of the most important processes in biology, but our present understanding of this process is largely qualitative. Molecules such as transcription factors and regions of DNA other than the region that codes for the mRNA are known to interact with each other to influence the onset of transcription, and also the rate at which it occurs. However, given the cellular concentrations of transcription factors in a developing organism, it is not known if it is possible to accurately predict their effects on transcription. Being able to make such predictions would greatly improve our understanding of how transcription and the development of an organism are controlled.
Ilsley et al. have tackled this problem by analysing a large volume of data called the Virtual Embryo dataset: produced by the Berkeley Drosophila Transcription Network Project, this dataset includes the results of mRNA expression measurements on 95 different genes at six different times during the early development of Drosophila melanogaster, a species of fruit fly. In particular, Ilsley et al. focussed on the expression at one point in time of the even skipped (eve) gene, a widely studied gene that is important for embryo development in these fruit flies. The eve gene is one of the genes responsible for dividing the fly into segments which form part of its body plan.
Without making any assumptions about the biological mechanisms that might be involved, Ilsley et al. built a statistical model that was able to predict the pattern of gene expression for a fruit fly, given the concentrations of the relevant transcription factors in the various cells within the embryo as input. The model was also able to predict the patterns of gene expression observed in other experiments involving mutations and the misexpression of fruit fly genes. Moreover, Ilsley et al. have made various predictions involving the genes Bicoid and Hunchback that can be tested experimentally in future studies.
transcriptional regulation; logistic regression; fly embryo; developmental patterning; positional information; even skipped; D. melanogaster
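The idea of predicting an on/off expression pattern from transcription factor concentrations with a logistic regression can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the two regulators (one hypothetical activator, one hypothetical repressor), the synthetic "cells", and the plain gradient-descent fit are not taken from the study and do not reproduce its model or data.

```python
import math
import random

def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, xi):
    """Predicted probability of expression; w[0] is the bias term."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights by batch gradient descent on the mean log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

# Synthetic cells: (activator, repressor) concentrations in [0, 1].
# The labelling rule is invented purely for this illustration.
random.seed(0)
cells = [(random.random(), random.random()) for _ in range(200)]
expressed = [1 if act - rep > 0.2 else 0 for act, rep in cells]

weights = fit_logistic(cells, expressed)
print("bias, activator, repressor weights:", weights)
print("P(on | high activator, low repressor):", predict(weights, (0.9, 0.1)))
print("P(on | low activator, high repressor):", predict(weights, (0.1, 0.9)))
```

In this toy setting the fitted sign of each weight indicates whether a regulator acts as an activator or a repressor, which is the kind of mechanistic reading a fitted model of this type allows.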
Protein sequence databases are the pillar upon which modern proteomics is supported, representing a stable reference space of predicted and validated proteins. One example of such resources is UniProt, enriched with both expertly curated and automatic annotations. Although resources like UniProt are largely taken for granted, similarly mature resources are not yet available in some other “omics” fields, lipidomics being one of them. While it has a seasoned community of wet lab scientists, lipidomics lags significantly behind proteomics in the adoption of data standards and other core bioinformatics concepts. This work aims to reduce the gap by developing a resource equivalent to UniProt called ‘LipidHome’, providing theoretically generated lipid molecules and useful metadata. Using the ‘FASTLipid’ Java library, a database was populated with theoretical lipids, generated from a set of community-agreed chemical bounds. In parallel, a web application was developed to present the information and provide computational access via a web service. Designed specifically to accommodate high throughput mass spectrometry based approaches, lipids are organised into a hierarchy that reflects the variety in the structural resolution of lipid identifications. Additionally, cross-references to other lipid related resources and papers that cite specific lipids were used to annotate lipid records. The web application encompasses a browser for viewing lipid records and a ‘tools’ section where an MS1 search engine is currently implemented. LipidHome can be accessed at http://www.ebi.ac.uk/apweiler-srv/lipidhome.
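An MS1 search of the kind mentioned above typically matches an observed precursor m/z against theoretical values within a mass tolerance. The following is a minimal sketch of that general idea, not a description of how LipidHome's search engine is implemented; the function names, the candidate names and their m/z values are placeholders invented for illustration.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def ms1_search(observed_mz, candidates, tolerance_ppm=10.0):
    """Return (name, ppm_error) pairs within tolerance, closest match first."""
    hits = []
    for name, theoretical_mz in candidates.items():
        err = ppm_error(observed_mz, theoretical_mz)
        if abs(err) <= tolerance_ppm:
            hits.append((name, err))
    return sorted(hits, key=lambda hit: abs(hit[1]))

# Placeholder entries with made-up m/z values (not real lipid masses).
candidates = {"lipid_A": 760.5850, "lipid_B": 760.5900, "lipid_C": 786.6000}

hits = ms1_search(760.5860, candidates, tolerance_ppm=10.0)
for name, err in hits:
    print(f"{name}: {err:+.2f} ppm")
```

Because several theoretical lipids can fall inside the same tolerance window, a search like this returns a ranked list rather than a single answer, which is one reason a hierarchy reflecting different levels of structural resolution is useful.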
The Gene Ontology (GO) is the de facto standard for the functional description of gene products, providing a consistent, information-rich terminology applicable across species and information repositories. The UniProt Consortium uses both manual and automatic GO annotation approaches to curate UniProt Knowledgebase (UniProtKB) entries. The selection of a protein set prioritized for manual annotation has implications for the characteristics of the information provided to users working in a specific field or interested in particular pathways or processes. In this article, we describe an organelle-focused, manual curation initiative targeting proteins from the human peroxisome. We discuss the steps taken to define the peroxisome proteome and the challenges encountered in defining the boundaries of this protein set. We illustrate with the use of examples how GO annotations now capture cell and tissue type information and the advantages that such an annotation approach provides to users.
http://www.ebi.ac.uk/GOA/ and http://www.uniprot.org
The identification of orthologs (gene pairs descended from a common ancestor through speciation, rather than duplication) has emerged as an essential component of many bioinformatics applications, ranging from the annotation of new genomes to experimental target prioritization. Yet, the development and application of orthology inference methods are hampered by the lack of consensus on source proteomes, file formats and benchmarks. The second ‘Quest for Orthologs’ meeting brought together stakeholders from various communities to address these challenges. We report on achievements and outcomes of this meeting, focusing on topics of particular relevance to the research community at large. The Quest for Orthologs consortium is an open community that welcomes contributions from all researchers interested in orthology research and applications.
The Gene Ontology (GO) resource provides dynamic controlled vocabularies that aid in the consistent description of the functional attributes and subcellular locations of gene products from all taxonomic groups (www.geneontology.org). System-focused projects, such as the Renal and Cardiovascular GO Annotation Initiatives, aim to provide detailed GO data for proteins implicated in specific organ development and function. Such projects support the rapid evaluation of new experimental data and aid in the generation of novel biological insights to help alleviate human disease. This paper describes the improvement of GO data for renal and cardiovascular research communities and demonstrates that the cardiovascular-focused GO annotations, created over the past three years, have led to an evident improvement of microarray interpretation. The reanalysis of cardiovascular microarray datasets confirms the need to continue to improve the annotation of the human proteome.
GO annotation data is freely available from: ftp://ftp.geneontology.org/pub/go/gene-associations/
InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.
The Gene Ontology (GO) resource provides dynamic controlled vocabularies to aid in the description of the functional attributes and subcellular locations of gene products from all taxonomic groups (www.geneontology.org). A renal-focused curation initiative, funded by Kidney Research UK and supported by the GO Consortium, has started at the European Bioinformatics Institute and aims to provide a detailed GO resource for mammalian proteins implicated in renal development and function. This report outlines the aims of this initiative and explains how the renal community can become involved to help improve the availability, quality and quantity of GO terms and their association to specific proteins.
gene ontology; annotation; biocuration; kidney; renal
Policies supporting the rapid and open sharing of genomic data have directly fueled the accelerated pace of discovery in large-scale genomics research. The proteomics community is starting to implement analogous policies and infrastructure for making large-scale proteomics data widely available on a pre-competitive basis. On August 14, 2008, the National Cancer Institute (NCI) convened the “International Summit on Proteomics Data Release and Sharing Policy” in Amsterdam, the Netherlands, to identify and address potential roadblocks to rapid and open access to data.
The six principles agreed upon by key stakeholders at the summit addressed issues surrounding 1) timing, 2) comprehensiveness, 3) format, 4) deposition to repositories, 5) quality metrics, and 6) responsibility for proteomics data release. This summit report explores various approaches to develop a framework of data release and sharing principles that will most effectively fulfill the needs of the funding agencies and the research community.
Mitochondria play essential roles in cardiac pathophysiology, and the murine model has been extensively used to investigate cardiovascular diseases. In the present study, we characterized murine cardiac mitochondria using an LC/MS/MS approach. We extracted and purified cardiac mitochondria; validated their functionality to ensure that the final preparation contained the components necessary to sustain their normal function; and subjected these validated organelles to LC/MS/MS-based protein identification. A total of 940 distinct proteins were identified from murine cardiac mitochondria, of which 480 had not previously been identified by major proteomic profiling studies. The 940 proteins consist of functional clusters known to support oxidative phosphorylation, metabolism and biogenesis. In addition, there are several other clusters, including proteolysis, protein folding, and reduction/oxidation signaling, which ostensibly represent previously under-appreciated tasks of cardiac mitochondria. Moreover, many identified proteins were found to occupy other subcellular locations, including cytoplasm, ER, and Golgi, in addition to their presence in the mitochondria. These results provide a comprehensive picture of the murine cardiac mitochondrial proteome and underscore tissue and species specificity. Moreover, the use of functionally intact mitochondria ensures that the proteomic observations in this organelle are relevant to its normal biology and facilitates decoding the interplay between mitochondria and other organelles.
cardiac mitochondria; mass spectrometry; proteome; sample preparation; target validation
The proteome of human salivary fluid has the potential to open new doors for disease biomarker discovery. A recent study to comprehensively identify and catalog the human ductal salivary proteome led to the compilation of 1166 proteins. The protein complexity of both saliva and plasma is large, suggesting that a comparison of these two proteomes will provide valuable insight into their physiological significance and an understanding of the unique and overlapping disease diagnostic potential that each fluid provides. To create a more comprehensive catalog of human salivary proteins, we have first compiled an extensive list of proteins from whole saliva (WS) identified through MS experiments. The WS list is thereafter combined with the proteins identified from the ductal parotid, and submandibular and sublingual (parotid/SMSL) salivas. In parallel, a core dataset of the human plasma proteome with 3020 protein identifications was recently released. A total of 1939 nonredundant salivary proteins were compiled from a total of 19 474 unique peptide sequences identified from whole and ductal salivas; 740 out of the total 1939 salivary proteins were identified in both whole and ductal saliva. A total of 597 of the salivary proteins have been observed in plasma. Gene ontology (GO) analysis showed similarities in the distributions of the saliva and plasma proteomes with regard to cellular localization, biological processes, and molecular function, but revealed differences which may be related to the different physiological functions of saliva and plasma. The comprehensive catalog of the salivary proteome and its comparison to the plasma proteome provides insights useful for future study, such as exploration of potential biomarkers for disease diagnostics.
Biomarkers; Body fluid; MS; Plasma; Saliva
The Minimum Information for Biological and Biomedical Investigations (MIBBI) project provides a resource for those exploring the range of extant minimum information checklists and fosters coordinated development of such checklists.
The Gene Ontology (GO) has proven to be a valuable resource for functional annotation of gene products. At well over 27 000 terms, the descriptiveness of GO has increased rapidly in line with the biological data it represents. Therefore, it is vital to be able to easily and quickly mine the functional information that has been made available through these GO terms being associated with gene products. QuickGO is a fast, web-based tool for browsing the GO and all associated GO annotations provided by the GOA group. After undergoing a redevelopment, QuickGO is now able to offer many more features beyond simple browsing. Users have responded well to the new tool and given very positive feedback about its usefulness. This tutorial will demonstrate how some of these features could be useful to the researcher wanting to discover more about their dataset, particular areas of biology or to find new ways of directing their research.
Database URL: http://www.ebi.ac.uk/QuickGO
A comparative analysis of the genomes of Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae—and the proteins they are predicted to encode—was undertaken in the context of cellular, developmental, and evolutionary processes. The nonredundant protein sets of flies and worms are similar in size and are only twice that of yeast, but different gene families are expanded in each genome, and the multidomain proteins and signaling pathways of the fly and worm are far more complex than those of yeast. The fly has orthologs to 177 of the 289 human disease genes examined and provides the foundation for rapid analysis of some of the basic processes involved in human disease.
Summary: QuickGO is a web-based tool that allows easy browsing of the Gene Ontology (GO) and all associated electronic and manual GO annotations provided by the GO Consortium annotation groups. QuickGO has been a popular GO browser for many years, but after a recent redevelopment it is now able to offer a greater range of facilities, including GO slim set generation and bulk downloads of GO annotation data, which can be extensively filtered by a range of different parameters.
Contact: firstname.lastname@example.org; email@example.com
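GO slim generation, as mentioned in the summary above, generally means mapping specific GO terms up to a smaller set of broader terms by following parent relationships in the ontology. The sketch below shows one common way to do this on a toy graph. It is an assumption-laden illustration: the term identifiers are invented placeholders (not real GO IDs), and QuickGO's own slimming procedure may differ.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def map_to_slim(term, parents, slim):
    """Map a term to its most specific slim terms (itself or ancestors)."""
    candidates = ({term} | ancestors(term, parents)) & slim
    # Drop any candidate that is an ancestor of another candidate.
    return {
        c for c in candidates
        if not any(c in ancestors(other, parents)
                   for other in candidates if other != c)
    }

# Toy ontology with placeholder identifiers: child -> list of parents.
parents = {
    "T:leaf": ["T:mid"],
    "T:mid": ["T:top"],
    "T:top": ["T:root"],
    "T:other": ["T:root"],
}
slim = {"T:top", "T:other", "T:root"}

print(map_to_slim("T:leaf", parents, slim))   # most specific slim ancestor
print(map_to_slim("T:other", parents, slim))  # term is itself in the slim
```

Keeping only the most specific slim terms avoids counting the same annotation under both a broad term and one of its descendants.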
Our knowledge of proteins has greatly improved in recent years, driven by new technologies in the fields of molecular biology and proteome research. It has become clear that from a single gene not only one single gene product but many different ones - termed protein species - are generated, all of which may be associated with different functions. Nonetheless, an unambiguous nomenclature for describing individual protein species is still lacking. With the present paper we therefore propose a systematic nomenclature for the comprehensive description of protein species. The protein species nomenclature is flexible and adaptable to every level of knowledge and of experimental data in accordance with the exact chemical composition of individual protein species. As a minimum description the entry name (gene name + species according to the UniProt knowledgebase) can be used, if no analytical data about the target protein species are available.
Motivation: There is a strong demand in the genomic community to develop effective algorithms to reliably identify genomic variants. Indel detection using next-gen data is difficult and identification of long structural variations is extremely challenging.
Results: We present Pindel, a pattern growth approach, to detect breakpoints of large deletions and medium-sized insertions from paired-end short reads. We use both simulated reads and real data to demonstrate the efficiency of the computer program and accuracy of the results.
Availability: The binary code and a short user manual can be freely downloaded from http://www.ebi.ac.uk/~kye/pindel/.
Contact: firstname.lastname@example.org; email@example.com
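The notion of locating a deletion breakpoint from a read that spans it can be illustrated with a toy split-read search. This is a deliberately simplified sketch and not Pindel's pattern growth algorithm: it uses exact string matching, ignores the anchoring mate of a read pair, sequencing errors and insertions, and the function name, parameters and sequences are all invented for illustration.

```python
def find_deletion(reference, read, min_flank=5):
    """Return (start, length) of a deletion explaining `read`, or None.

    Coordinates are 0-based; the deleted bases are
    reference[start:start + length]. The longest prefix of the read with a
    unique exact match is tried first, and the remaining suffix must match
    further downstream in the reference.
    """
    for split in range(len(read) - min_flank, min_flank - 1, -1):
        prefix, suffix = read[:split], read[split:]
        left = reference.find(prefix)
        if left == -1:
            continue
        if reference.find(prefix, left + 1) != -1:
            continue  # prefix placement is ambiguous, try a shorter one
        deletion_start = left + len(prefix)
        right = reference.find(suffix, deletion_start + 1)
        if right != -1:
            return deletion_start, right - deletion_start
    return None

# Toy sequences: the read skips the run of T bases in the reference.
left_flank = "ACGTACCTGA"
deleted = "TTTTTTTT"
right_flank = "GGCATCAGTC"
reference = left_flank + deleted + right_flank
read = left_flank + right_flank

result = find_deletion(reference, read)
print(result)
```

On these toy sequences the search reports a deletion starting at position 10 with length 8, matching the run of T bases removed from the read. A read that matches the reference without any gap is not handled specially here, so a real implementation would first exclude reads that map normally.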
The Gene Ontology Annotation (GOA) project at the EBI (http://www.ebi.ac.uk/goa) provides high-quality electronic and manual associations (annotations) of Gene Ontology (GO) terms to UniProt Knowledgebase (UniProtKB) entries. Annotations created by the project are collated with annotations from external databases to provide an extensive, publicly available GO annotation resource. Currently covering over 160 000 taxa, with greater than 32 million annotations, GOA remains the largest and most comprehensive open-source contributor to the GO Consortium (GOC) project. Over the last five years, the group has augmented the number and coverage of their electronic pipelines and a number of new manual annotation projects and collaborations now further enhance this resource. A range of files facilitate the download of annotations for particular species, and GO term information and associated annotations can also be viewed and downloaded from the newly developed GOA QuickGO tool (http://www.ebi.ac.uk/QuickGO), which allows users to precisely tailor their annotation set.
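Downloaded annotation files can be separated into electronic and manual annotations by evidence code. The sketch below assumes the tab-delimited GO Annotation File (GAF) layout, in which lines beginning with "!" are comments, column 2 holds the database object ID, column 5 the GO ID and column 7 the evidence code, with IEA marking electronically inferred annotations. The two sample records are invented placeholders, not real annotations, and the function name is hypothetical.

```python
import csv
import io

def split_by_evidence(gaf_text):
    """Split GAF records into (manual, electronic) lists.

    Each record is reduced to (object_id, go_id, evidence_code).
    """
    manual, electronic = [], []
    data_lines = (line for line in io.StringIO(gaf_text)
                  if not line.startswith("!"))
    reader = csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 7:
            continue  # skip blank or malformed lines
        record = (row[1], row[4], row[6])
        if row[6] == "IEA":
            electronic.append(record)
        else:
            manual.append(record)
    return manual, electronic

# Invented placeholder records; only the first seven columns are filled in.
sample = "\n".join([
    "!gaf-version: 2.0",
    "\t".join(["UniProtKB", "X00001", "GENE1", "", "GO:0005777",
               "PMID:0000000", "IDA"]),
    "\t".join(["UniProtKB", "X00002", "GENE2", "", "GO:0005739",
               "GO_REF:0000000", "IEA"]),
])

manual, electronic = split_by_evidence(sample)
print("manual:", manual)
print("electronic:", electronic)
```

The same filter could be extended to select annotations by taxon or aspect, which are also columns in the file.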
The InterPro database (http://www.ebi.ac.uk/interpro/) integrates predictive models or ‘signatures’ representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually and approximately half of the total ∼58 000 signatures available in the source databases belong to an InterPro entry. Recently, we have also started to display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein–protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed either via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software (http://www.ebi.ac.uk/Tools/InterProScan/).
In the absence of consolidated pipelines to archive biological data electronically, information dispersed in the literature must be captured by manual annotation. Unfortunately, manual annotation is time consuming and the coverage of published interaction data is therefore far from complete. The use of text-mining tools to identify relevant publications and to assist in the initial information extraction could help to improve the efficiency of the curation process and, as a consequence, the database coverage of data available in the literature. The 2006 BioCreative competition was aimed at evaluating text-mining procedures in comparison with manual annotation of protein-protein interactions.
To aid the BioCreative protein-protein interaction task, IntAct and MINT (Molecular INTeraction) provided both the training and the test datasets. Data from both databases are comparable because they were curated according to the same standards. During the manual curation process, the major cause of data loss in mining the articles for information was ambiguity in the mapping of the gene names to stable UniProtKB database identifiers. It was also observed that most of the information about interactions was contained only within the full-text of the publication; hence, text mining of protein-protein interaction data will require the analysis of the full-text of the articles and cannot be restricted to the abstract.
The development of text-mining tools to extract protein-protein interaction information may increase the literature coverage achieved by manual curation. To support the text-mining community, databases will highlight those sentences within the articles that describe the interactions. These will supply data-miners with a high quality dataset for algorithm development. Furthermore, the dictionary of terms created by the BioCreative competitors could enrich the synonym list of the PSI-MI (Proteomics Standards Initiative-Molecular Interactions) controlled vocabulary, which is used by both databases to annotate their data content.
In proteomics, a paradoxical situation has developed in recent years. On the one hand, it is basic knowledge that proteins are post-translationally modified and occur in different isoforms. On the other hand, the protein expression concept disregards post-translational modifications by connecting protein names directly with function.
Optimal proteome coverage is today reached by bottom-up liquid chromatography/mass spectrometry. However, quantification at the peptide level in shotgun or bottom-up approaches completely ignores the fact that a given peptide may exist in an unmodified form and in several differently modified forms. The acceptance of the protein species concept is a basic prerequisite for meaningful quantitative analyses in functional proteomics. In discovery approaches, only top-down analyses, which separate the protein species by two-dimensional gel electrophoresis or protein liquid chromatography before digestion, identification and quantification, allow changes in a biological situation to be correlated with function.
To obtain biologically relevant information, kinetics and systems biology have to be performed at the protein species level, which is the major challenge in proteomics today.
The Ontology Lookup Service (OLS) (http://www.ebi.ac.uk/ols) provides interactive and programmatic interfaces to query, browse and navigate an ever increasing number of biomedical ontologies and controlled vocabularies. The volume of data available for querying has more than quadrupled since it went into production and OLS functionality has been integrated into several high-usage databases and data entry tools. Improvements have been made to both OLS query interfaces, based on user feedback and requirements, to improve usability and service interoperability and provide novel ways to perform queries.
The Ensembl Trace Archive (http://trace.ensembl.org/) and the EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), known together as the European Nucleotide Archive, continue to see growth in data volume and diversity. Selected major developments of 2007 are presented briefly, along with data submission and retrieval information. In the face of increasing requirements for nucleotide trace, sequence and annotation data archiving, data capture priority decisions have been taken at the European Nucleotide Archive. Priorities are discussed in terms of how reliably information can be captured, the long-term benefits of its capture and the ease with which it can be captured.