The availability of complete genome sequences has greatly accelerated the establishment of genomic and proteomic technologies as powerful tools for studying tissue- or cell-specific gene expression at the systems level and for delineating novel pathways involved in metabolic diseases such as diabetes [41, 42]. In particular, mass spectrometry-based proteomics has become an important tool for molecular and cellular biology research and for systems biology studies by providing large-scale measurements of relative protein abundances, including post-translational protein modifications [43]. The importance of studying biological systems at the protein level is further emphasized by recent studies clearly indicating that mRNA levels do not necessarily correlate with protein abundances [44–46].
Mass spectrometry-based proteomic tissue profiling has recently been applied extensively to establish the proteome composition and protein expression patterns of different mouse tissues, organs, and cell lines [7–14]. Such proteome profiling is especially valuable for elucidating the diversity in protein composition and expression patterns among mammalian tissues. The proteome database will also serve as a reference resource for more focused hypothesis-driven biological studies and for more detailed systems biology studies. For example, one or more of the proteins could serve as potential biomarkers for tissue-specific pathologies. The present study represents the first extensive proteomic characterization of mouse pancreatic islets of Langerhans, with the aim of establishing a reference database of the mouse islet proteome for future metabolic research using rodent models. We performed this initial survey experiment using a pooled sample from both normal (control) mice and an insulin-resistant model, with the aim of gaining increased coverage of proteins that are potentially expressed at higher levels in either of the two conditions. This strategy is advantageous compared to analyzing islets from the normal state alone, because many proteins with increased expression in the disease state could fall below the limit of detection if only the normal sample were analyzed. Also, since the database is a qualitative catalog, nearly all proteins identified from the pooled sample will be present in both the normal and disease states, although at different abundance levels. Therefore, the increased coverage achieved for this database should make it a more useful resource for future studies using both normal and disease mouse models [47].
The resulting islet proteome database from this study covers ~4,000 proteins. One use of the database will be mapping canonical pathways and functional processes to identify which islet proteins are linked with specific metabolic and signaling pathways, since traditional pathway knowledgebases are often not tissue-specific. This database will also be a suitable complement to other proteomes that have been characterized for liver, adipocyte, muscle, and brain [7–14]. Furthermore, we have provided estimated protein abundances within the islet proteome based on normalized spectral counts. While it has been reported that spectral counts can provide an estimate of relative protein abundances within the proteome [25, 48], such estimates should only be used as a qualitative measure to judge whether a protein is highly abundant or of relatively low abundance. This is because several other factors can influence the spectral count for a given protein, including protein solubility, protein digestion efficiency and peptide ionization efficiency.
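As a rough illustration of how spectral counts can be turned into a qualitative abundance label, the following minimal sketch scales each protein's count by the total count and assigns a coarse "high" or "low" class. The normalization scheme, the cutoff value, the function names and the example counts are all assumptions made for illustration; they are not the procedure or the data used in this study.

```python
from typing import Dict


def normalize_spectral_counts(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale each protein's spectral count by the total count.

    Assumption: simple total-count normalization, chosen only for illustration.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no spectra were counted")
    return {protein: n / total for protein, n in counts.items()}


def abundance_class(fraction: float, high_cutoff: float = 0.01) -> str:
    """Return a coarse qualitative label; the cutoff is an arbitrary example."""
    return "high" if fraction >= high_cutoff else "low"


# Hypothetical counts, not values from the study.
example_counts = {"Ins2": 900, "Gcg": 95, "Tmem27": 5}
normalized = normalize_spectral_counts(example_counts)
labels = {protein: abundance_class(f) for protein, f in normalized.items()}
```

Reducing the values to two classes reflects the caution above: factors such as digestion and ionization efficiency make finer distinctions unreliable.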
The extensiveness of the islet proteomic datasets enabled us to compare the results with other available datasets from a number of mouse organs and tissues. We were able to identify a set of 133 proteins that were specific to islets, being either undetectable or detected at very low abundance in other tissues (Supplemental Table 5). Indeed, this subset of 133 proteins contains well-known islet-specific secreted hormones, including glucagon, islet amyloid polypeptide, insulin, prodynorphin, pancreatic polypeptide, peptide YY and urocortin 3. Besides secreted peptide hormones, the subset includes proteins known to be specific to islets, for example G6pc2, Reg1 and Sytl4, which are islet-specific glucose-6-phosphatase [49], regenerating islet-derived 1 [50] and synaptotagmin-like 4, also known as granuphilin [51], respectively. Although some proteins are indeed known and expected to be islet-specific, the majority were not previously known to be restricted to islets. In particular, several hypothetical or unknown proteins, i.e., proteins not containing domains with known or reasonably specific functions, were confidently identified as islet-specific. These islet-specific proteins, including the unknown proteins, may be important for islet function and are suitable candidates for future studies. An interesting example is the novel transmembrane protein TMEM27, which was recently demonstrated to stimulate pancreatic beta-cell proliferation [52]. We should note that these 133 proteins are islet-specific only relative to our data, because we used only 8 other organ and tissue types for this comparative analysis: placenta, muscle, heart, kidney, lung, adipocytes, liver and brain. These proteins could potentially be expressed in tissues not yet profiled by LC-MS/MS proteomics.
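The cross-tissue comparison described above can be sketched as a simple filter: a protein is called islet-specific if it is detected in islets and its abundance in every other profiled tissue is at or below a small threshold. This is a minimal sketch under stated assumptions; the function name, the `max_other` threshold and the example abundance values are hypothetical and do not reproduce the criteria used to build Supplemental Table 5.

```python
from typing import Dict, List


def islet_specific(
    islet: Dict[str, float],
    other_tissues: Dict[str, Dict[str, float]],
    max_other: float = 0.0,
) -> List[str]:
    """Return proteins detected in islets whose abundance in every other
    profiled tissue is at most `max_other` (absent means not detected)."""
    specific = []
    for protein, abundance in islet.items():
        if abundance <= 0:
            continue  # not detected in islets
        if all(t.get(protein, 0.0) <= max_other for t in other_tissues.values()):
            specific.append(protein)
    return sorted(specific)


# Hypothetical abundance values for illustration only.
islet_data = {"G6pc2": 12.0, "Tmem27": 3.0, "Gapdh": 50.0}
other_data = {
    "liver": {"Gapdh": 60.0},
    "brain": {"Gapdh": 40.0, "Tmem27": 0.5},
}
strict = islet_specific(islet_data, other_data)
relaxed = islet_specific(islet_data, other_data, max_other=1.0)
```

As in the text, the result depends on which tissues are included: a protein passes only because it was not seen in the tissues that happen to be in `other_data`.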
In addition to protein identification, there is a significant need to identify potential post-translational protein modifications (PTMs) in global proteome profiling studies, since many modifications are known to regulate cell signaling and can also serve as markers of disease progression. Unfortunately, because most modifications are of low abundance, it has been a challenge to identify them in global profiling experiments without enrichment. In this work we explored an alternative informatics strategy for identifying modified peptides from LC-MS/MS analyses of a global, non-enriched sample. We based our analyses on the notion that post-translational modifications are usually substoichiometric; therefore, true modified peptides should be present along with their unmodified forms. By applying this criterion together with an alternative FDR estimation method based on a shifted-mass approach (details described in Methods), we identified a total of 54 modified peptides, including oxidative modifications on cysteine and phosphorylation on serine and threonine, at an FDR < 5%. Our approach for controlling the FDR of peptides with PTMs is important for identifying modified peptides within global profiling data, especially for PTMs that cannot be specifically enriched, such as cysteinyl oxidation.
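The two ideas in this paragraph, requiring an unmodified counterpart and estimating the FDR from searches with a shifted modification mass, can be illustrated with the minimal sketch below. It is one plausible reading only: the actual procedure is described in Methods and may differ. The data structures, function names and example numbers are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# (bare peptide sequence, modification label); a hypothetical representation.
ModifiedHit = Tuple[str, str]


def keep_with_unmodified_form(
    hits: Iterable[ModifiedHit], unmodified: Set[str]
) -> List[ModifiedHit]:
    """Keep modified-peptide hits whose unmodified sequence was also identified."""
    return [hit for hit in hits if hit[0] in unmodified]


def estimated_fdr(n_true_mass_hits: int, n_shifted_mass_hits: int) -> float:
    """Estimate the FDR as the ratio of hits obtained with a deliberately
    shifted (incorrect) modification mass to hits with the true mass.

    Assumption: shifted-mass hits approximate the number of false positives.
    """
    if n_true_mass_hits == 0:
        return 0.0
    return n_shifted_mass_hits / n_true_mass_hits


# Hypothetical example data, not results from the study.
candidate_hits = [("ACDEFGHK", "Cys sulfonic acid"), ("LMNPQR", "phospho-Ser")]
identified_unmodified = {"ACDEFGHK", "STVWYK"}
kept = keep_with_unmodified_form(candidate_hits, identified_unmodified)
fdr = estimated_fdr(n_true_mass_hits=40, n_shifted_mass_hits=1)
```

In this sketch a filtered set would be accepted only if `fdr` falls below the chosen limit, which is 0.05 in the analysis described above.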
While we realize that the number of identified modified peptides is very limited because the global profiling experiment was performed without specific enrichment, the identification of these modifications adds value beyond protein identities and abundances for this initial characterization of the islet proteome. One example is oxidative modifications, since oxidative stress has been linked with diabetes [32, 53]. To our knowledge, specific oxidative modifications in islets have not previously been identified, presumably due to limitations in technology. The oxidative modifications observed in this study could potentially provide a list of novel targets that may play a role in the oxidative stress response and may also serve as markers of disease progression. A specific example is the oxidation of Cys-244 in glyceraldehyde-3-phosphate dehydrogenase (G3PDH). There are reports showing that Cys-244 is one of the strongest nucleophilic residues and is susceptible to modification by 4-hydroxy-2-nonenal [54], a major lipid peroxidation-derived reactive aldehyde, or by normal endogenous metabolites such as acyl-CoA [55] and fumarate [56]. All three modifications result in strong inhibition of G3PDH enzyme activity. The fact that oxidation of Cys-244 was detected in both the sulfonic acid form and the intermediate sulfinic acid form further supports the confidence of the identifications. However, it remains to be proven that oxidation of the Cys-244 residue indeed inhibits the enzymatic activity of G3PDH.
In summary, the resulting mouse islet proteome database contains the identified peptide sequences, the protein identifications, spectral count information for each protein as an indication of relative abundance, and the identified PTMs. The database represents an important reference resource for further data mining and for islet biological studies focused on diabetes. For example, this database will provide a foundation for future quantitative proteomic studies applying the accurate mass and time tag approach, in which both accurately measured masses and elution times are used for peptide identification [57]. The available peptide sequences and islet-specific proteins will also be useful for selecting and devising specific targeted proteomic experiments. The database is included as Supplemental Material and is available at the NCRR Center for Integrative Biology website (http://ncrr.pnl.gov) for access by the research community.