Human Proteinpedia (http://www.humanproteinpedia.org) is a publicly available proteome repository for sharing human protein data derived from multiple experimental platforms. It incorporates diverse features of the human proteome, including protein-protein interactions, enzyme-substrate relationships, PTMs, subcellular localization, and expression of proteins in various human tissues and cell lines under diverse biological conditions, including diseases. Through a public distributed annotation system developed especially for proteomic data, investigators across the globe can upload, view and edit proteomic data even before they are published. Information on the investigators and laboratories that generated the data, together with visualization of tandem mass spectra, stained tissue sections, protein/peptide microarrays, fluorescent micrographs and Western blots, helps ensure the quality of the proteomic data assimilated in Human Proteinpedia. Many of the protein annotations submitted to Human Proteinpedia have also been made available to the scientific community through the Human Protein Reference Database (http://www.hprd.org), another resource developed by our group. In this protocol, we describe how to submit, edit and retrieve proteomic data in Human Proteinpedia.
Human Proteinpedia (Kandasamy et al., 2009; Mathivanan et al., 2008) is a unified protein resource with a large collection of experimentally derived protein data obtained from multiple types of experiments, including co-immunoprecipitation, mass spectrometry, fluorescence-based experiments, protein and peptide microarrays, Western blotting and yeast two-hybrid assays. Currently, the resource contains data derived from over 2,710 experiments gathered from 249 laboratories worldwide. Human Proteinpedia uses controlled vocabularies from the Gene Ontology (UNIT 7.2; Ashburner et al., 2000) for subcellular localization, eVOC (Kelso et al., 2003) for human tissues, RESID (Garavelli, 2004) for post-translational modifications (PTMs), PSI-MS (Martens et al., 2011) for mass spectrometry data and interpretation of MS/MS spectra, and PSI-MI (Kerrien et al., 2007) for protein-protein interactions (PPIs). Human Proteinpedia incorporates a Public Distributed Annotation System (PDAS), developed by our group for handling protein data, which enables the scientific community to submit and edit their protein data.
Human Proteinpedia contains data derived from several types of experiments, including 354 experiments for tissue expression (332 of them mass spectrometry-derived expression studies), 192 cell line-based expression studies, 19 disease-specific expression studies, 22 experiments for the detection of PPIs and 5 studies on subcellular localization. We have also included mass spectrometry-derived data from HUPO initiatives such as the human plasma proteome project (Omenn et al., 2011), the human brain proteome project (Hamacher et al., 2004) and the human liver proteome project (He, 2005). To ensure easy access for proteomic investigators, we describe the protocols for accessing data and handling protein data submission in this resource, which will facilitate data sharing in the biomedical community.
We describe five basic protocols for utilizing Human Proteinpedia. Basic Protocol 1 describes the query system in Human Proteinpedia. Basic Protocol 2 describes PDAS for annotating data from both high-throughput and targeted investigations. Basic Protocol 3 explains the integration of Human Proteinpedia data with HPRD (Keshava Prasad et al., 2009). Basic Protocol 4 explains the various ways to access and download the entire data available in Human Proteinpedia. Finally, Basic Protocol 5 describes protein data for diverse protein features in this resource.
The query page in Human Proteinpedia (Fig. 1) can be accessed through the URL: http://www.humanproteinpedia/query. The query system includes search by gene symbol or protein name, protein database accession numbers, type of protein feature or the type of experimental platform annotated in Human Proteinpedia. When a query is made through a free text search in a field of interest, the other text boxes are automatically deactivated.
A computer with internet connection.
The query system contains three independent levels of query to fetch data from Human Proteinpedia (Fig. 1). In the first level, one can query Human Proteinpedia using gene symbol/protein name (e.g., BRCA1) or external database identifiers such as RefSeq (e.g., NP_002077.1) (Pruitt et al., 2009) or UniProt (e.g., P01860) (UniProt, 2012).
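The first query level accepts several kinds of identifier in one text box. As a rough illustration of how such input could be told apart on the client side, the following Python sketch classifies a search term using the three examples given above. It is not part of Human Proteinpedia; the function name `classify_query` is hypothetical, the UniProt pattern follows the accession format documented by UniProt, and the RefSeq pattern is an assumption covering protein accessions such as NP_ or XP_.

```python
import re

# Assumed patterns, not taken from Human Proteinpedia itself.
# RefSeq protein accessions: two-letter prefix ending in P, underscore,
# digits, and an optional version suffix (e.g., NP_002077.1).
REFSEQ_PROTEIN = re.compile(r"[A-Z]P_\d+(\.\d+)?")
# UniProt accessions: 6 or 10 characters in the documented format.
UNIPROT = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def classify_query(term: str) -> str:
    """Return the most likely identifier type for a search term."""
    term = term.strip()
    if REFSEQ_PROTEIN.fullmatch(term):
        return "RefSeq"
    if UNIPROT.fullmatch(term):
        return "UniProt"
    # Anything else is treated as a gene symbol or protein name.
    return "gene symbol or protein name"


for example in ("BRCA1", "NP_002077.1", "P01860"):
    print(example, "->", classify_query(example))
```

A pattern check of this kind only suggests the identifier type; a short gene symbol that happens to match the UniProt format would be misclassified, so the database lookup remains the final arbiter.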
The query fetched data from 19 different mass spectrometry-based experiments, which included 16 studies aimed at PTMs and three tissue expression studies. The results displayed include the experimental platform, experiment type, experiment description, name and laboratory details of the Principal Investigators (PIs) and data submitters, publication status of the data, and sample details such as tissue or cell line and source organism. The mass spectrometry-based experiments provide the ionization method, fractionation technique and name of the MS/MS search engine used, along with the peptide and PTM data. These include the peptide sequence, modifications, peptide score/probability, precursor mass, charge state, sequence identifier and MS/MS spectrum. The MS/MS spectrum is created for each peptide by PRIDE viewer (UNIT 13.8; Medina-Aunon et al., 2011) from the MS/MS peak lists. The entire dataset can also be downloaded. Multiple results are displayed one below the other, and a unique Human Proteinpedia identifier is provided at the top of each result.
The search for the protein name ‘Grb2’ fetched 21 immunohistochemistry-based experiments carried out on ‘Grb2.’ The results include the details such as Human Proteinpedia identifier, experiment type, description of the experiment, names and laboratory details of PIs and data providers, status of publication, sample sources and source organisms. It also provides additional protein details such as protein name, tissue name, cell type, antibody reliability score fetched from the Human Protein Atlas (HPA) (Uhlen et al., 2010) and a link to HPA and the images showing protein’s staining pattern.
Query by datasets allows the users to query using protein features such as tissue expression (e.g., saliva), PTMs (e.g., phosphorylation) or subcellular localization (e.g., nucleus). The controlled vocabularies obtained from the standard vocabulary consortia for each category are provided as a popup page near the text box provided for each search.
Fig. 2 shows multiple experiments carried out for tissue expression studies on ‘Brain.’ The query fetched five mass spectrometry-based tissue expression experiments, four of which are from recently submitted ‘unpublished’ studies. The results page provides the Human Proteinpedia accession number hyperlinked to its corresponding experimental details, a brief description of the experiment, status of publication, annotation category (e.g., tissue expression), experimental platform, and names and laboratory details of the contributing group. Similar results are fetched for queries by PTMs and subcellular localization.
Fig. 2. Query results of tissue expression (e.g., brain). The table displays the types of experiments carried out to check expression of proteins in brain. It also provides the number of datasets fetched.
A large fraction of the available data in Human Proteinpedia is contributed by the scientific community. Users can submit protein information obtained from their investigations through the Public Distributed Annotation System (PDAS). PDAS is a set of computational protocols made available for the scientific community to share their protein-related data. PDAS for protein data is a novel feature developed for Human Proteinpedia. Researchers can contribute their data in four different ways: 1) enter protein data using the web interface, 2) upload high-throughput data in batch mode using the web interface, 3) send data through FTP or e-mail, or 4) set up PDAS servers at their own location. Users need to register and log in to annotate, upload or manage their data in PDAS through the web interface.
A computer with internet connection.
The registration page requires users to enter their contact and laboratory details. The required information includes the full name of the contributor, email address, title, name of the organization, address, country, lab URL, and whether the contributor is a PI. If the data provider is not a PI, the PI details need to be submitted. The user also needs to enter a username and password in the lowest panel.
The PDAS annotation page provides access to the users who would like to contribute their experimental data into Human Proteinpedia. Users can enter their high-throughput or low-throughput data using this annotation system. The panel on the left side provides the user details along with an option to edit their registration details. The targeted protein data could be annotated by selecting the database name and typing the protein identifier from that database. High-throughput data can be uploaded by clicking the ‘Batch upload’ button. This page serves as a portal where the user can view/edit their own submitted data.
The data obtained from high-throughput experiments can also be shared through the PDAS annotation system using the portal for ‘Batch upload’ located at http://pdas.hprd.org/Batch_Upload
High-throughput data of PTMs, PPIs, tissue expression and substrates can be annotated through this annotation system. The data can also be sent through e-mail or through FTP in the prescribed format as indicated at the following URLs:
The tab delimited data file in the prescribed format can be uploaded using this portal. If the experiment carried out is to detect a PTM using mass spectrometry, the data providers need to provide the peak lists files obtained from mass spectrometer to link the peptide data with its MS/MS spectrum and also to provide them for data download.
The uploaded data files will be sent to the Human Proteinpedia annotation team to manually verify and upload the data into Human Proteinpedia. The user is permitted to opt for keeping the data private until their manuscript on the submitted dataset(s) is published.
The required fields for the annotation of PTMs are provided at http://pdas.hprd.org/ptm_file_format. They cover protein details such as protein identifiers, name of the protein database used for searches, type of PTM, site (position at which the amino acid gets modified in the protein sequence), modified amino acid, experiment type, upstream enzyme(s), score given by the search algorithm(s), database search engine(s) used (e.g., Mascot), the mass spectrum file name and the status of publication. Peptide score, algorithm used and mass spectrum name need to be annotated only for mass spectrometry-based experiments.
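Before uploading, a contributor may want to check a PTM table locally. The Python sketch below writes tab-delimited rows and reports missing fields, applying the rule that the peptide score, search algorithm and spectrum file name are needed only for mass spectrometry-based experiments. All column names here are hypothetical stand-ins; the authoritative column layout is the one published at the PTM file-format URL above. Treating the upstream enzyme as optional is also an assumption of this sketch.

```python
import csv
import io

# Hypothetical column names chosen for illustration only.
COMMON_FIELDS = [
    "protein_id", "protein_database", "ptm_type", "site",
    "modified_residue", "experiment_type", "publication_status",
]
OPTIONAL_FIELDS = ["upstream_enzyme"]  # assumption: not always known
MS_ONLY_FIELDS = ["peptide_score", "search_engine", "spectrum_file"]
ALL_COLUMNS = COMMON_FIELDS + OPTIONAL_FIELDS + MS_ONLY_FIELDS


def missing_fields(row: dict) -> list:
    """List required fields that are absent or empty in one PTM row."""
    required = list(COMMON_FIELDS)
    # The MS-specific fields are enforced only for MS-based experiments.
    if "mass spectrometry" in row.get("experiment_type", "").lower():
        required += MS_ONLY_FIELDS
    return [name for name in required if not row.get(name, "").strip()]


def to_tsv(rows: list) -> str:
    """Serialize rows as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=ALL_COLUMNS, delimiter="\t",
        restval="", lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

A row whose `experiment_type` is a site-directed mutagenesis experiment passes without the three MS-only columns, whereas a mass spectrometry row lacking them is reported as incomplete.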
The required fields for the annotation of PPIs are provided at the URL: http://pdas.hprd.org/interaction_file_format
Direct PPIs which involve two molecules require the protein database identifiers for protein A and protein B, experiment name and the status of publication. Complex interactions, which involve multiple proteins, require the protein database identifier for each protein in the complex and the experimental method used to isolate the complex. When multiple complexes are annotated at a time, a common identifier for all the proteins in each complex needs to be provided to distinguish them. Multiple complexes can be sent in a single file by providing the different identifiers for each complex.
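The role of the common complex identifier can be sketched in a few lines of Python: rows that share an identifier are collected into one complex, so several complexes can travel in the same file. The tuple layout and the identifiers used in the usage example are hypothetical and do not reproduce the prescribed file format.

```python
from collections import defaultdict


def group_complexes(rows):
    """Group (complex_id, protein_id, method) rows into complexes.

    Returns a dict mapping each complex identifier to its member
    proteins (in file order, without duplicates) and the experimental
    methods reported for it.
    """
    complexes = defaultdict(lambda: {"members": [], "methods": set()})
    for complex_id, protein_id, method in rows:
        entry = complexes[complex_id]
        if protein_id not in entry["members"]:
            entry["members"].append(protein_id)
        entry["methods"].add(method)
    return dict(complexes)


# Usage with made-up identifiers: two complexes in a single file.
rows = [
    ("complex_1", "protein_A", "co-immunoprecipitation"),
    ("complex_1", "protein_B", "co-immunoprecipitation"),
    ("complex_2", "protein_A", "co-immunoprecipitation"),
    ("complex_2", "protein_C", "co-immunoprecipitation"),
    ("complex_2", "protein_D", "co-immunoprecipitation"),
]
for complex_id, entry in group_complexes(rows).items():
    print(complex_id, entry["members"])
```

Without the shared identifier, the protein appearing in both complexes could not be assigned unambiguously, which is why each complex needs its own identifier.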
The required fields for the annotation of tissue expression are provided at http://pdas.hprd.org/expression_file_format. It requires protein database identifier with the mention of database name, site of expression (normal tissue/cancer tissue/cell line), type of the experiment used to detect the expression and a PubMed identifier.
The required fields for the annotation of substrates are provided at http://pdas.hprd.org/substrates_file_format. The annotation of substrates requires the protein database identifier with the mention of database name, modification type, protein sequence position at which the protein is modified, modified amino acid, name of the experiment carried out to identify the substrate, sequence identifier for the substrate, database search algorithm used, peptide score obtained from database search algorithm, name of the mass spectrum file from which the peptide is identified and PubMed identifier.
Annotation of a targeted protein can be carried out by the community through a web-based interface located on the PDAS annotation page. The protein identifier provided by the user is matched to HPRD accession numbers to fetch the protein name and gene symbol from HPRD, which are displayed at the top of the low-throughput data annotation page. The low-throughput protein features include PTMs, PPIs, tissue expression, substrates and subcellular localization. Earlier annotations contributed by the logged-in user can be viewed in the bottom table under the heading ‘Existing annotation using Human Proteinpedia by the <username>’.
You are directed to the URL (http://pdas.hprd.org/annotation) where different types of protein features can be entered using PDAS. The page also provides the protein name and gene symbol of the queried protein, hyperlinked to HPRD. If the query did not fetch the protein the user is looking for, the query can be modified by clicking on the link ‘annotate a different protein’.
Annotation of site directed mutagenesis or mass spectrometry-based detections could be carried out using this link.
Fig. 3 shows the annotation page for a PTM detected using site-directed mutagenesis. The data fields include a detailed description of the experiment, sample source (normal tissue, cancer tissue or cell lines), status of publication, availability of data, a text box to enter the name of the tissue/cell lines, type of PTM from the standard menu provided as a dropdown, protein accession number, position of the amino acid, amino acid and type of experiment. The public data in Human Proteinpedia are accessible to anonymous users, whereas private data can be viewed only by the data contributor and their collaborators. Controlled vocabularies are provided for normal human tissues and can be added in the text box by clicking on ‘Click here to add the normal tissue’.
Fig. 3. Targeted protein annotation page for PTMs.
Along with the list of annotation fields for site-directed mutagenesis-based PTM annotation, the mass spectrometry detection methods have additional data fields which include labeling technique, protease, whether the sample is from SDS-PAGE, name of the mass spectrometer used, instrument vendor, ionization method, activation method, mass tolerance used for database searching (MS and MS/MS), database used for searching, database search algorithm used, type of experiment, peptide sequence, peptide score/probability, precursor mass, charge state and provision to upload the mass spectrum.
Direct as well as the complex PPIs can be submitted using PDAS. Direct interaction means the interaction between two interactors, whereas a complex interaction means multiple proteins have been detected as the part of a complex and the exact topology of interaction between the individual interactors is not known.
The annotation fields include PubMed identifier for the article in which the interaction was described, data category (public or private), brief description about the experiment, sample source (normal tissue, cancer tissue or cell line), interacting molecule and experiment type. To make the process simple, the users can provide any external database protein identifiers for the interactors. The entered PPI dataset would be sent to Human Proteinpedia core annotation team to verify and upload the data into Human Proteinpedia.
The complex interaction annotation fields are the same as that of direct PPI with the exception that multiple text fields were provided to enter multiple interactors. Complex interactions could also be sent to Human Proteinpedia team by email, FTP or through batch upload.
Data derived from experiments such as antibody array detection, Western blot detection, immunohistochemistry, ELISA and mass spectrometry-based detection can be annotated for tissue expression.
The required annotation fields include status of publication, description of experiment, sample type (normal tissue/disease tissue/cell line), site of expression as given in the pull-down menu, type of experiment, species of primary antibody, source of antibody (whether commercial), name of the vendor if commercial, primary antibody dilution and image obtained from the experiment. For all experiment types other than mass spectrometry-based detection, the annotation fields for tissue expression are the same as those for detection by Western blot. For mass spectrometry-based detection, the mass spectrometry-specific parameters need to be provided: labeling technique, protease used, name of the mass spectrometer used (as given in the pull-down menu), instrument vendor (as given in the pull-down menu), mass tolerance used for database searching, database used for searching (e.g., RefSeq), search algorithm used (e.g., Sequest), peptide sequence, peptide score or probability, precursor mass, charge state and image of the mass spectrum.
The annotated information would reach the Human Proteinpedia core annotation team to validate and upload the data into Human Proteinpedia.
Substrates are proteins which are modified by enzymes. The annotation fields are same as that of the PTMs. Enzyme and substrate names need to be added as additional fields.
Subcellular localization can be annotated from different types of experiments that include fluorescence-based experiments, immunohistochemistry and mass spectrometry-based detection.
The required fields for subcellular localization annotation include the PubMed ID if published, description of the experiment, experimental sample (normal tissue, disease tissue or cell line), name of the sample source, name of subcellular localization (as selected from the pull down menu), experiment type, species of the primary antibody (selected from a pull down menu), source of antibody whether commercial or not, name of the vendor if commercial, primary antibody dilution and the image obtained from the experiment. For the immunohistochemistry-based expression studies, the same annotation fields need to be annotated as that of fluorescence-based experiments. For the mass spectrometry-based detection, the additional mass spectrometry specific fields must be provided as described in the previous section.
The annotated information would reach the Human Proteinpedia core annotation team to validate and upload into the database.
Human protein reference database (HPRD) is a repository hosting over 30,000 human proteins with the manually curated protein features such as tissue expression, PPIs, PTMs, subcellular localization and substrates. Protein features submitted to Human Proteinpedia are also made available to the scientific community through HPRD by integrating the data in Human Proteinpedia into HPRD (Kandasamy et al., 2009; Mathivanan et al., 2008). The integrated protein features are PTMs, subcellular localization and tissue expression. The volume of data in HPRD increases by incorporating additional data from Human Proteinpedia as Human Proteinpedia considers all the conditions of human samples.
A computer with internet connection.
The query fetches the vimentin molecule page in HPRD. In the summary tab, under the expression panel, the normal tissue expression manually curated from the literature is provided from HPRD annotation. The Human Proteinpedia tissue expression data for vimentin are provided below the HPRD expression panel under the side heading ‘Human Proteinpedia’. Human Proteinpedia provides disease and cell line expression data, which are not available in HPRD. The names of the tissues or cell lines from Human Proteinpedia are hyperlinked to the annotation pages of vimentin in Human Proteinpedia (Fig. 4). HPRD alone provides only normal tissue expression; incorporating Human Proteinpedia adds cancer and cell line expression as additional details in HPRD.
Fig. 4. Integration of Human Proteinpedia data into HPRD. It describes the incorporation of Human Proteinpedia data for the molecule ‘Vimentin’. Along with the normal tissue expression, Human Proteinpedia data provides cancer and cell line expression ...
The PTM annotations in Human Proteinpedia are made available in HPRD under the tab PTMs with the heading of ‘Human Proteinpedia’. It provides, site (sequence position at which amino acid is modified), amino acid residue, type of PTM, upstream enzymes details and the sequence of the identified peptide. The position of amino acid residue, which is post-translationally modified, is hyperlinked to its corresponding annotation in Human Proteinpedia.
This query fetches the molecule page of the gene ‘ATP13A1’ in HPRD. The summary tab in the molecule page provides the subcellular localization from HPRD as well as from Human Proteinpedia. Subcellular localization of ‘endoplasmic reticulum’ was fetched for ‘ATP13A1’ from Human Proteinpedia, which was not curated in HPRD clarifying the potential use of integrating Human Proteinpedia into HPRD.
Large amount of experimental data are being deposited into Human Proteinpedia, which can be used by the community for many meta-analysis studies. Human Proteinpedia data are made freely available for the community to download for any further analysis. The contributor uploads the mass spectrometry-derived raw files into Tranche server (Smith et al., 2011). Tranche is a distributed data repository to host proteomic data. The link to download the datasets from Tranche is provided in the experiment description page for the mass spectrometry-derived datasets.
A computer with internet connection.
The download tab provides list of publicly available datasets for download. For each of the datasets, this page provides the hyperlinked Human Proteinpedia accession number, which is linked to the detailed description of the dataset. It also provides brief description of the dataset, publication status, annotation category, experimental platform, download link and name of PI and name of the contributing laboratory. The entire dataset available in Human Proteinpedia has been made available for download on the top of this page.
This page provides detailed information about the selected dataset. The fields vary slightly with the type of experiment and the experimental platform used, as illustrated by the example annotation files in Basic Protocol 5.
This link takes us to the page where the raw data files (link to Tranche server), processed protein identification files and meta-annotation data which provide brief description on the experiment could be downloaded for the dataset of interest.
Example annotation files are provided to make the users aware of different kinds of experimental data available in Human Proteinpedia. The display formats for Tissue expression and PTMs are discussed in Basic Protocol 1.
A computer with internet connection
This page provides the example annotation for various annotation features as well as for the types of experimental platforms. The hyperlinked ‘click here’ buttons would take us to the example annotation pages.
Protein name and its subcellular localization ‘Golgi apparatus’ are given in the top table followed by the experimental description. The experimental details include, experiment type, short and detailed descriptions of the experiment, PI and laboratory details, PubMed ID, sample source and source organism. The source organism is provided with the NCBI taxonomy ID. The lower bottom table describes the name of the protein, subcellular localization and external links to HPA. It also provides the immunofluorescence images describing the subcellular localization of KIAA2013 in Golgi apparatus.
This page shows a co-immunoprecipitation experiment carried out to study PPI. This page shows a complex interaction involving 11 molecules fetched from this experiment. Along with the experiment details as provided in the subcellular localization experiments, the interacting proteins and the name of the experiment is provided in the lower bottom table.
This page shows the casein kinase II substrates analysis using peptide microarray. RBM10 is the protein identified as substrate for the enzyme casein kinase II using peptide microarray. The peptide microarray image for the identification of substrates is provided in the lower bottom table along with the contributor’s details and experimental details.
Human Proteinpedia is a portal which integrates and disseminates published and unpublished protein data pertaining to PTMs, PPIs, subcellular localization and tissue expression. The data available in Human Proteinpedia can be accessed either by query system or by downloading files. The query for a protein fetches all the experimental data for the protein by one or more laboratories. The data derived from mass spectrometry-based experiments provides most of the details obtained from the experiment that include one or more peptides identified from that protein, the database search engine score for the confidence of identification and it also provides the mass spectrum for each of the peptide identification allowing the users to check the assigned peaks for each of the identified amino acid. The mass spectra could be viewed by PRIDE viewer using the peak list data of each identified peptide. It is available for those peptides where peak list files were provided by the contributors. Multiple experiments on a single protein by different laboratories provide more evidence for a given reaction. Identification of a PTM in the same site by multiple groups provides the confidence on the site of modification. Also, the annotated mass spectrum and the peptide score derived from database search engines for the mass spectrometry-based experiments could be used as the confidence measure for the identification of protein.
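The idea that a PTM site reported by several groups carries more confidence can be made concrete with a small tally. The Python sketch below counts the number of distinct laboratories reporting each (protein, site, PTM type) combination; the record layout and the names in the usage example are hypothetical, and the count is only one simple proxy for confidence, not a scoring scheme used by Human Proteinpedia.

```python
from collections import defaultdict


def site_support(observations):
    """Count distinct laboratories per (protein, site, ptm_type).

    observations: iterable of (protein, site, ptm_type, laboratory).
    Repeated reports from the same laboratory are counted once.
    """
    labs = defaultdict(set)
    for protein, site, ptm_type, laboratory in observations:
        labs[(protein, site, ptm_type)].add(laboratory)
    return {key: len(value) for key, value in labs.items()}


# Usage with made-up records.
observations = [
    ("protein_A", 15, "phosphorylation", "lab_1"),
    ("protein_A", 15, "phosphorylation", "lab_2"),
    ("protein_A", 15, "phosphorylation", "lab_2"),
    ("protein_A", 42, "phosphorylation", "lab_1"),
]
for key, count in site_support(observations).items():
    print(key, "reported by", count, "laboratories")
```

Sites supported by a single laboratory can then be examined more closely using the annotated spectrum and the peptide score.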
The Human Proteinpedia was developed in 2009 as a unified resource to store and share the protein data derived from various experiments pertaining to human proteins. With the advent of recent proteomics technologies such as mass spectrometry, protein/peptide microarrays, yeast two hybrid systems, a large amount of data are being published every year. The data derived from these experiments take a while before they are disseminated to the biomedical community through publications in the form of either PDF documents or supplementary files. Human Proteinpedia accelerates the sharing of experimental data in different user-friendly formats and thus removes the inconvenience and delay caused by the time window required for publication. We have developed a separate PDAS dedicated to handle proteomic data to facilitate submission of proteomic data to Human Proteinpedia. The data can also be sent through FTP or e-mail. We also curate data from published literature and contact those authors to provide us the data with high resolution images and data from Human Proteinpedia is also made available into HPRD to reach out to a larger user community.
The data derived from multiple experiments are fetched and stored in Human Proteinpedia to make it easy for the scientific community to access the data by simple queries. The data displayed in the Human Proteinpedia website are obtained from the experimentalists and formatted according to the Human Proteinpedia database architecture. Human Proteinpedia provides the data associated with the experiment including images and contact details of the data provider. The users can contact the data provider for further information.
The protein names for each protein in Human Proteinpedia are fetched from HPRD. Although HPRD provides comprehensive information on the names and alternative names for each protein, in order to avoid multiple and non-specific hits, it is advisable to use the gene symbols from the approved gene nomenclature committee.
The images derived from the experiments are resized to fit the Human Proteinpedia display. The original images as sent by the contributors can be obtained from the Human Proteinpedia core team upon request at http://pdas.hprd.org/help. Private data available in Human Proteinpedia are accessible only to the contributor and their collaborators. As soon as the data contributor releases the data to the public, they are made available to all users.
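The access rule for public and private data can be summarized as a small predicate. The following Python sketch is an illustration only: the `Dataset` class, its field names and the `can_view` function are hypothetical and say nothing about how Human Proteinpedia actually stores permissions.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Dataset:
    """Hypothetical record of who contributed a dataset and who may see it."""
    contributor: str
    collaborators: Set[str] = field(default_factory=set)
    is_public: bool = False


def can_view(dataset: Dataset, user: Optional[str] = None) -> bool:
    """Apply the rule: public data for everyone, private data for the
    contributor and their collaborators only. `user` is None for an
    anonymous visitor."""
    if dataset.is_public:
        return True
    if user is None:
        return False
    return user == dataset.contributor or user in dataset.collaborators


dataset = Dataset(contributor="contributor_1", collaborators={"collaborator_1"})
print(can_view(dataset))                    # anonymous visitor, private data
print(can_view(dataset, "collaborator_1"))  # collaborator, private data
dataset.is_public = True                    # contributor releases the data
print(can_view(dataset))                    # anonymous visitor, public data
```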
The protein assignments to the experimental data are mapped to HPRD proteins to have a unified identifier for each protein. HPRD annotation is based on the RefSeq protein database. The proteins which do not match the HPRD proteins are stored with the accession numbers given by the contributors. Those proteins which have accessions that do not match with HPRD proteins can be accessed only through download files.
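The mapping step described here amounts to a lookup with a fallback. In the Python sketch below, records whose accession is found in an HPRD lookup table receive the unified identifier, while the rest keep the contributor's accession and are set aside for the download files. The function name, the record keys and the identifiers in the usage example are hypothetical placeholders, not real HPRD values.

```python
def assign_identifiers(records, hprd_lookup):
    """Split records into those mapped to HPRD and those left unmapped.

    records: list of dicts, each with an 'accession' key supplied by
             the contributor.
    hprd_lookup: dict from contributor accession to HPRD identifier.
    """
    mapped, unmapped = [], []
    for record in records:
        hprd_id = hprd_lookup.get(record["accession"])
        if hprd_id is not None:
            # Keep the original accession and add the unified identifier.
            mapped.append({**record, "hprd_id": hprd_id})
        else:
            # Stored as submitted; reachable only through download files.
            unmapped.append(record)
    return mapped, unmapped


# Usage with placeholder identifiers.
lookup = {"ACC_1": "HPRD_X"}
records = [{"accession": "ACC_1"}, {"accession": "ACC_2"}]
mapped, unmapped = assign_identifiers(records, lookup)
print(len(mapped), "mapped;", len(unmapped), "available via download only")
```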
A.P. was partially funded for this project by a grant from the National Institutes of Health Roadmap initiative U54 RR020839 and a contract N01-HV-28180 from the National Heart Lung and Blood Institute. We thank the Department of Biotechnology (DBT), Government of India for research support to the Institute of Bioinformatics, Bangalore. B.M. is a recipient of a Senior Research Fellowship from the Council of Scientific and Industrial Research (CSIR), Government of India. T.S.K.P. is supported by a research grant on “Development of Infrastructure and a Computational Framework for Analysis of Proteomic Data” from DBT.
The Human Proteinpedia home page.
The Human Proteinpedia download server.
The Human Proteinpedia public distributed annotation server.
The user’s queries and data request are answered through this link.