The pancreatic islets of Langerhans, and especially the insulin-producing beta cells, play a central role in the maintenance of glucose homeostasis. Alterations in the expression of multiple proteins in the islets that contribute to the maintenance of islet function are likely to underlie the pathogenesis of type 2 diabetes. To identify proteins that constitute the islet proteome, we provide the first comprehensive proteomic characterization of pancreatic islets for mouse, the most commonly used animal model in diabetes research. Using strong cation exchange fractionation coupled with reversed phase LC-MS/MS we report the confident identification of 17,350 different tryptic peptides covering 2,612 proteins having at least two unique peptides per protein. The dataset also identified ~60 post-translationally modified peptides including oxidative modifications and phosphorylation. While many of the identified phosphorylation sites corroborate those previously known, the oxidative modifications observed on cysteinyl residues reveal potentially novel information suggesting a role for oxidative stress in islet function. Comparative analysis with 15 available proteomic datasets from other mouse tissues and cells revealed a set of 133 proteins predominantly expressed in pancreatic islets. This unique set of proteins, in addition to those with known functions such as peptide hormones secreted from the islets, contains several proteins with as yet unknown functions. The mouse islet protein and peptide database accessible at http://ncrr.pnl.gov, provides an important reference resource for the research community to facilitate research in the diabetes and metabolism fields.
The pancreatic islets of Langerhans play a critical role in the regulation of glucose homeostasis by secreting insulin and several peptide hormones including glucagon, somatostatin, pancreatic polypeptide, amylin, peptide YY, prodynorphin, urocortin 3, and ghrelin. Besides insulin-producing beta-cells (65–80%) 1, other major cell types within the islets of Langerhans include the glucagon-releasing alpha-cells (15–20%) 2, somatostatin-producing delta-cells (3–10%) 3, pancreatic polypeptide-containing PP cells (1%) 4 and ghrelin-containing epsilon cells (<1%) 5. While each of the secreted peptide hormones affects glucose homeostasis either directly or indirectly, their significance is best exemplified by the deficiency of insulin that leads to the development of diabetes mellitus. Type 1 diabetes results from autoimmune destruction of beta-cells and an absolute deficiency of insulin, while type 2 diabetes is commonly characterized by a failure of the beta-cells to produce sufficient amounts of bioactive insulin to compensate for insulin resistance in the liver, muscle or adipose tissue 6.
Due to the importance of islets for metabolism and particularly in the pathogenesis of both types of diabetes, a comprehensive understanding of islet biology is essential for the development of therapeutic strategies to prevent, manage, and/or cure the diseases. Recent advances in proteomic technologies offer approaches to comprehensively characterize the proteome at the global level. Therefore, it is not surprising that comprehensive proteome analyses of a number of mouse tissues related to diabetes have been recently reported 7–14. However, the mouse pancreatic islet proteome to date has only been characterized by two-dimensional electrophoresis (2DE) and only 44 proteins have been identified by this approach 15.
In this work we present the first comprehensive profiling of the mouse islet proteome with the aim of establishing an extensive peptide/protein database for the pancreatic islet proteome of rodent models. The extensive coverage was achieved by analyzing a pooled islet sample from two different states (normal and insulin resistant) applying two-dimensional LC-MS/MS profiling. Due to the limited dynamic range of detection of LC-MS/MS profiling, we chose to use a pooled islet sample from both the normal and disease states for this initial profiling so that a better coverage for those proteins with increased expression in the disease can be achieved compared to analyzing the normal sample alone; thus, a more complete coverage of the proteome can be achieved to facilitate future studies using mouse models.
Specifically, strong cation exchange (SCX) fractionation followed by reversed phase LC-MS/MS was applied for this proteome profiling, resulting in the confident identification of ~4,000 protein groups (2,612 proteins identified with two or more unique peptides). The dataset includes qualitative relative protein abundance information based upon MS/MS spectral counts. Additionally, we explored the use of currently reported proteomic datasets on multiple other mouse tissues and cell types to discover potentially novel tissue-specific proteins by comparative analyses. In this case, 133 proteins were identified as being either specifically expressed or predominantly abundant in the pancreatic islets. Finally, the dataset revealed a number of post-translational modifications (PTMs) present in the islet proteome including phosphorylation and oxidative modifications. These data will be made available as a reference resource for the diabetes research community as well as for bioinformatic data mining to facilitate research in the fields of diabetes and metabolism.
Islets were isolated by the intraductal enzyme injection technique using collagenase16. Briefly, the pancreas was inflated with collagenase following anterograde injection via the common bile duct, dissected, and incubated at 37°C for 22 min. Following density-gradient centrifugation using HISTOPAQUE-1077 (Sigma), islets were washed and hand-picked under a stereomicroscope (Stereozoom GZ7, Leica). All islets were cultured overnight at physiological glucose levels (7 mM glucose, 10% FBS and antibiotics) to allow the islets to recover from the effects of collagenase digestion. Islets were then transferred to nuclease- and pyrogen-free tubes and washed with phosphate buffer. Following removal of the buffer, pellets were frozen at −80°C prior to proteomic analyses. Islets were isolated from four male control mice and four littermates with liver-specific insulin receptor knockout at 6 months of age. The details regarding the creation of animal models have been described elsewhere17. All mice have been back-crossed to the C57BL/6 background for at least 9 generations.
Islet samples from individual mice were homogenized and digested using a 2,2,2-trifluoroethanol (TFE)-based protocol18. Briefly, islets were resuspended in 50 µL of 50% TFE (Sigma-Aldrich, St. Louis, MO) in 50 mM NH4HCO3 (pH 7.8) with 5 mM tributylphosphine (Sigma-Aldrich) and homogenized in a 5510 Branson ultrasonic water bath (Branson Ultrasonics, Danbury, CT), followed by incubation at 60°C for 2 h to reduce disulfide bonds. For tryptic digestion, samples were diluted 5-fold with 50 mM NH4HCO3, supplied with 1 mM CaCl2 and 2 µg of trypsin per sample, and incubated overnight at 37°C with gentle shaking. After lyophilization, samples were re-dissolved in 70 µL of 25 mM NH4HCO3. Peptide concentrations were determined with BCA assay (Pierce, Rockford, IL). On average islets from each mouse yielded 30–60 µg of tryptic peptides. Aliquots of 15 µg peptides were used from each mouse to form a pooled sample for this initial profiling experiment.
The pooled sample (~120 µg of total peptides pooled from 8 mice) was subjected to LC fractionation by strong cation exchange (SCX) chromatography on a 200 mm × 2.1 mm Polysulfoethyl A column (PolyLC, Columbia, MD) preceded by a 10 mm × 2.1 mm guard column, using a flow rate of 0.2 mL/min. LC separations were performed using an Agilent 1100 series HPLC system (Agilent, Palo Alto, CA). Mobile phase solvents consisted of (A) 10 mM ammonium formate, 25% acetonitrile, pH 3.0 and (B) 500 mM ammonium formate, 25% acetonitrile, pH 6.8. Once the sample was loaded, isocratic conditions at 100% A were maintained for 10 min. Peptides were separated using a gradient from 0–50% B over 40 min, followed by a gradient of 50–100% B over 10 min. The gradient was then held at 100% solvent B for another 10 min. Following lyophilization, all thirty fractions were dissolved in 25 mM NH4HCO3 and stored at −80 °C.
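The SCX solvent program described above can be written as a piecewise function of time. The following is a minimal sketch, assuming that each gradient segment is linear and that time is counted from the end of sample loading; neither assumption is stated in the text, and the function name is hypothetical.

```python
def scx_percent_b(t_min: float) -> float:
    """Percent of solvent B at t_min minutes after loading.

    Assumes linear ramps within each segment (not stated in the text).
    """
    if t_min < 0 or t_min > 70:
        raise ValueError("the program runs from 0 to 70 min")
    if t_min <= 10:   # isocratic hold at 100% A
        return 0.0
    if t_min <= 50:   # 0-50% B over 40 min
        return (t_min - 10) * 50.0 / 40.0
    if t_min <= 60:   # 50-100% B over 10 min
        return 50.0 + (t_min - 50) * 50.0 / 10.0
    return 100.0      # hold at 100% B for the final 10 min
```

Under these assumptions the whole program lasts 70 min.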
Each SCX fraction was analyzed with an automated custom-built capillary HPLC system coupled online to an LTQ ion trap mass spectrometer (ThermoElectron, San Jose, CA) by using an electrospray ionization interface. The reversed phase capillary column was prepared by slurry packing 3-µm Jupiter C18 particles (Phenomenex, Torrance, CA) into a 150 µm i.d. × 65 cm fused silica capillary (Polymicro Technologies, Phoenix, AZ). The mobile phase solvents consisted of (A) 0.2% acetic acid and 0.05% TFA in water and (B) 0.1% TFA in 90% acetonitrile. An exponential gradient was used for the separation, which started at 100% A and gradually increased to 60% B over 100 min. The instrument was operated in a data-dependent mode with an m/z range of 400–2000. The ten most abundant ions from each MS scan were selected for further MS/MS analysis using a normalized collision energy setting of 35%. Dynamic exclusion was applied to avoid repeat analyses of the same abundant precursor ion.
The SEQUEST software (ThermoElectron) was used to search the MS/MS data against the mouse International Protein Index (IPI) database (version 3.19 http://www.ebi.ac.uk/IPI). Human keratins and porcine trypsin were added into the database as expected contaminants. No cleavage specificity was defined for database searching. We also considered methionine oxidation as a dynamic modification. The following criteria were used to filter raw SEQUEST results: 1) Xcorr ≥ 1.6 for charge state +1 full tryptic peptides; 2) Xcorr ≥ 2.4 for charge state +2 full tryptic peptides and Xcorr ≥ 4.3 for +2 partial tryptic peptides; and 3) Xcorr ≥ 3.2 for charge state +3 full tryptic peptides and Xcorr ≥ 4.7 for +3 partial tryptic peptides. The delta correlation value (ΔCn) > 0.1 was used in all cases.
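The filtering criteria above amount to a lookup of the minimum Xcorr by charge state and tryptic state, combined with the ΔCn cutoff. The sketch below is illustrative only: the function and variable names are hypothetical, and it assumes that any combination not listed in the criteria (for example +1 partially tryptic peptides) is rejected.

```python
# Minimum Xcorr per (charge state, tryptic state), taken from the criteria above.
XCORR_MIN = {
    (1, "full"): 1.6,
    (2, "full"): 2.4,
    (2, "partial"): 4.3,
    (3, "full"): 3.2,
    (3, "partial"): 4.7,
}
DELTA_CN_MIN = 0.1  # identifications must have delta-Cn strictly above this


def passes_filter(charge: int, tryptic_state: str, xcorr: float, delta_cn: float) -> bool:
    """Return True if a SEQUEST hit satisfies the listed thresholds."""
    threshold = XCORR_MIN.get((charge, tryptic_state))
    if threshold is None:
        # Assumption: unlisted charge/cleavage combinations are discarded.
        return False
    return xcorr >= threshold and delta_cn > DELTA_CN_MIN
```

For instance, a +2 peptide with Xcorr 2.5 and ΔCn 0.15 passes if it is fully tryptic but fails if it is only partially tryptic.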
To estimate the false discovery rate (FDR) of peptide identifications we searched against a reversed database as previously described 19. In an attempt to remove redundant protein entries, the software tool ProteinProphet was applied as a clustering tool to group similar or related protein entries into a “Protein Group” 20. All identified peptides that passed the filtering criteria were assigned an identical probability score of 1.0, and then entered into the ProteinProphet program solely for clustering analyses to generate the final non-redundant list of proteins or protein groups. To further increase the confidence in protein identifications we considered only proteins identified with at least two peptides.
To find peptides with post-translational modifications (PTMs) we used the X!Tandem software 21 and applied the following strategy. In the first pass we searched only for fully tryptic peptides with no dynamic modifications. Proteins identified with a peptide log expectation value (log E) less than −2 were carried over to a second round of database searching, in which we considered only fully tryptic peptides and a set of dynamic PTMs: oxidation of cysteine to sulfinic acid, +31.9898 Da (Csulfinic); oxidation of cysteine to sulfonic acid, +47.9847 Da (Csulfonic); and serine and threonine phosphorylation, +79.966331 Da (STphos). Each modification was searched independently. To estimate the confidence of the PTM peptide identifications we searched for similar modifications with masses shifted by ±10 Da (pseudo-PTMs). The ratio of the number of peptides identified carrying a pseudo-PTM to the number of peptides carrying the real PTM was used as an estimate of the FDR for PTM peptide identifications. In particular, the FDR estimate for a PTM of mass M at amino acid X was calculated as the average of the numbers of peptides with modification masses M−10 and M+10 at the same amino acid type X, divided by the number of peptides carrying the modification with the original, unshifted mass M. To achieve acceptable FDRs, we required log E-values less than −3 for the Csulfonic PTM and less than −4 for the Csulfinic and STphos PTMs, and required the corresponding non-modified peptide to pass the same threshold.
In addition to oxidation of cysteines and phosphorylation of serine and threonine residues we also searched for acetylation, methylation and ubiquitination of lysine, nitration and phosphorylation of tyrosine, carbonylation of arginine and proline and S-(2-succinyl) cysteine. However, we failed to identify a significant number of peptides with acceptable FDRs for these PTMs.
To compare the murine islet proteome with other mouse tissue proteomes, we considered the following mouse tissues and derived cell cultures for which LC-MS/MS proteomic datasets are currently available: mouse brain 7, 8, cortical neurons cell culture 9, heart 8, 22, muscle 22, kidney 9, lung 9, 10, placenta 8, liver 8, 11, 12, adipocyte cell culture 13, and islet alpha-cell culture 14. If the information on individual peptides was available, the peptides were remapped to the mouse International Protein Index database v3.19. The probabilities of correct peptide assignment were then set to 1 and the remapped peptide lists were analyzed by ProteinProphet to derive a likely set of proteins and homologous protein groups. For subsequent steps we considered IPI annotations from the protein groups having a probability of 0.95 or higher. Finally, the IPI identifiers were mapped to Entrez Gene symbols using the mouse IPI v3.19 database. If the information about individual peptides was not readily available, the identifiers (e.g. UniProt 8 or IPI v3.07 annotations 13) were mapped directly to IPI v3.19 to obtain the corresponding Entrez Gene symbols.
As the peptide spectral count information was not readily available for a number of datasets, we used the number of unique peptides per protein, normalized for protein length 23, 24, as a measure of protein abundance. To compare protein abundance levels between the organs and cell types, we used their assigned ranks within the datasets. The most abundant protein was ranked as No. 1. Missing values, in cases where a protein was not detected in a given dataset, were assigned an arbitrarily high rank value, so that the protein was treated as less abundant than the least abundant protein detected in that dataset. For example, if we have a total of 9000 proteins for this tissue expression comparison and the least abundant protein for a given dataset is ranked as 2500, we will have 5500 proteins that are not detected. For consistency, we assigned these missing proteins a rank equal to the rank of the least abundant protein plus half of the total number of missing proteins, i.e., 2500 + 5500/2 = 5250. When two or more proteins had exactly the same abundance value, the tie was resolved by assigning the average rank to those proteins.
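The ranking scheme described above (rank 1 for the most abundant protein, average ranks for ties, and a shared high rank for undetected proteins) can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the function name and the gene abundances in the usage note are hypothetical, and the rank of the least abundant detected protein is taken to be the number of detected proteins.

```python
def abundance_ranks(abundance: dict, all_proteins: list) -> dict:
    """Rank proteins within one dataset; higher abundance gets a lower rank."""
    detected = sorted(abundance, key=abundance.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(detected):
        # Find the block of proteins sharing exactly the same abundance value.
        j = i
        while j + 1 < len(detected) and abundance[detected[j + 1]] == abundance[detected[i]]:
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2  # ties share the average rank
        for k in range(i, j + 1):
            ranks[detected[k]] = average_rank
        i = j + 1
    missing = [p for p in all_proteins if p not in abundance]
    # Assumption: the least abundant detected protein has rank len(detected).
    missing_rank = len(detected) + len(missing) / 2
    for p in missing:
        ranks[p] = missing_rank
    return ranks
```

With made-up abundances {"Ins1": 9.0, "Gcg": 5.0, "Sst": 5.0, "Actb": 1.0} and two further proteins absent from the dataset, Gcg and Sst share rank 2.5 and both missing proteins receive rank 4 + 2/2 = 5. With 2500 detected and 5500 missing proteins the missing rank is 5250, matching the figure in the text.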
The InterPro protein family (http://www.ebi.ac.uk/interpro/), GO gene ontology (http://www.geneontology.org), KEGG pathway (http://www.genome.ad.jp/kegg/pathway.html), PIR Protein Information Resource (http://pir.georgetown.edu/) and SMART Simple Modular Architecture Research Tool (http://smart.embl-heidelberg.de/) annotations for the entire mouse genome were obtained using the DAVID web-based tool and parsed with an ad hoc Python script prior to import into a relational Microsoft Access database. The significance of over- or under-representation of a given annotation term was computed with Fisher's exact test, based on the hypergeometric distribution, using an ad hoc R script built on the phyper() function. The P-values were adjusted for multiple testing using the Benjamini-Hochberg method.
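The enrichment test described above can be illustrated with a short, dependency-free sketch. The original analysis used an R script built on phyper(), which is not reproduced here; the Python functions below are hypothetical stand-ins that compute the upper tail of the hypergeometric distribution (the one-sided over-representation P-value) and the Benjamini-Hochberg adjustment.

```python
from math import comb


def hypergeom_upper_tail(k: int, annotated: int, drawn: int, population: int) -> float:
    """P(X >= k) when `drawn` genes are sampled without replacement from
    `population` genes, of which `annotated` carry the term."""
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, drawn)


def benjamini_hochberg(pvalues: list) -> list:
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

As a toy check, drawing 3 genes from a population of 10 in which 4 are annotated gives P(X >= 2) = 40/120 = 1/3. Under-representation would use the lower tail instead.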
To obtain extensive proteomic characterization of mouse islet tissue, we utilized a bottom-up proteomics approach, which first entails detecting and identifying peptide sequences via tandem mass spectrometry and subsequently linking those peptide sequences to their respective proteins during downstream data analysis. Our aim is to establish an islet proteome database as a reference resource for future diabetes and metabolism research using normal and diseased mouse models. We therefore chose to analyze a pooled pancreatic islet sample isolated from both control mice and mice from an insulin resistance model that exhibits marked islet hyperplasia17. Pooling was expected to give a more complete coverage of the proteome, on the understanding that proteins with increased expression in the insulin resistance model will be more detectable in the pooled sample than in the normal sample alone. In the LC-MS/MS analyses of a total of 30 SCX fractions and 3 replicate analyses of the unfractionated global sample, 519,992 MS/MS spectra were collected. Of these, 43,654 MS/MS spectra were confidently identified as peptides based on the custom SEQUEST filtering criteria. These spectra correspond to 17,350 unique peptide identifications (Supplementary Table 1) with an FDR of 0.9% based on reversed database searching. Following ProteinProphet analysis, this dataset contains 4,024 protein groups overall, with 2,612 protein groups having two or more unique peptide identifications (Supplementary Table 2). Although we report the complete list of identifications, we considered only proteins with two or more peptide identifications for downstream comparative analyses.
The LC-MS/MS profiling can also provide qualitative estimates of the relative protein abundance based on the spectral count information (Supplementary Table 2) 25, 26. To account for the protein length difference, the observed spectral counts were normalized by the number of amino acid residues per protein for estimating the relative abundances within the islet proteome. The dynamic range of estimated abundances spanned approximately four orders of magnitude.
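The length normalization and the dynamic range estimate described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the dynamic range is assumed to be the log10 ratio of the largest to the smallest normalized value, which the text does not specify.

```python
from math import log10


def normalized_spectral_counts(spectral_counts: dict, lengths: dict) -> dict:
    """Spectral count divided by the number of amino acid residues per protein."""
    return {p: spectral_counts[p] / lengths[p] for p in spectral_counts}


def dynamic_range_orders(abundances: dict) -> float:
    """Orders of magnitude between the highest and lowest abundance
    (assumed definition: log10 of max/min)."""
    values = list(abundances.values())
    return log10(max(values) / min(values))
```

For two made-up proteins, one with 1000 spectra over 100 residues and one with a single spectrum over 1000 residues, the normalized abundances are 10 and 0.001, a span of four orders of magnitude.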
The extent of islet proteome coverage was also examined by mapping the dataset to different canonical signaling pathways. A total of 72 KEGG and 62 Ingenuity Pathway Analysis pathways were covered by at least 10 genes in this dataset. The insulin receptor signaling pathway is shown as an example of receptor tyrosine kinase signaling, since this pathway plays a key regulatory role in islet function and in the compensatory islet growth response to insulin resistance (reviewed in 27–31). Figure 1 shows coverage of the canonical insulin receptor signaling pathway. Out of 45 proteins in this pathway, 20 proteins were identified with at least two peptides and 7 more with one peptide, together accounting for 60% of the known proteins in this pathway.
We have also explored the LC-MS/MS datasets for the presence of post-translational protein modifications including oxidative modifications and phosphorylation. Oxidative stress has been suggested to be linked with beta-cell dysfunction and insulin resistance 32. Thus, identification of a list of oxidative protein modifications may be useful in revealing primary hot spots of oxidation and for future quantitative proteomic studies regarding the roles of oxidative stress in islet biology.
Methionine (Met) oxidation is known to be a frequent modification and is commonly included as a dynamic modification in routine peptide identification searches of MS/MS data. Indeed, for this islet dataset the peptides containing oxidized Met constitute approximately 19% of all Met-containing peptides (1093 out of 5784). It has been controversial whether the Met oxidation detected in LC-MS experiments reflects endogenous oxidation events induced by reactive oxygen species or biologically irrelevant artifacts such as oxidation during sample preparation or electrospray ionization. However, we observed that, for a majority of the identified peptides, the forms containing oxidized methionine clearly elute earlier during LC separation than their unmodified counterparts (Figure 2). This suggests that the majority of oxidation does not occur during electrospray ionization, which would otherwise produce identical elution times. The observed difference in elution times is in good agreement with the notion that oxidized methionine is less hydrophobic than normal methionine33. On average, oxidation caused peptides to elute earlier by 4.3% on the normalized elution time scale. Although this observation suggests that the majority of methionine oxidation events occur prior to electrospray ionization, the data do not conclusively prove a biological origin, since oxidation during sample processing remains a potential source of artifact. Regardless of the origin, it may be informative to quantitatively track the abundances of these modified peptides across different biological conditions, which would aid in identifying the major protein targets of oxidative stress.
Unlike oxidized methionine, other PTMs were observed in smaller numbers, and the estimated FDR values were quite high after applying the initial filtering criteria optimized for regular peptide identifications. To improve the FDR, we applied an additional filtering criterion that requires the presence of the unmodified form of the peptide in addition to the modified form for an identification to be considered ‘true’. However, such a filtering criterion is not compatible with the common approach for assessing FDR, which uses reversed or scrambled protein sequences. Peptide identifications from a reversed database are random matches by nature, and it is unlikely that both the modified and unmodified forms of the same sequence will be identified from the reversed database at the same time. Although some reversed-sequence hits may have non-modified counterparts, the proportion of peptides found in both modified and non-modified forms is much smaller among reversed-database hits than among forward-database hits, so this approach would significantly underestimate the FDR. To address this issue, we introduced an alternative strategy to assess the FDR for peptide identifications with PTMs. We propose that the number of false hits for a given PTM can be estimated from a search for non-existing PTMs with similar properties, such as the same amino acid specificity and a similar, but distinct mass. In practice we performed searches for modifications on the same amino acid residue with the intended modification mass shifted by ±10 Da. For example, we found 23 peptides with cysteines oxidized to sulfonic acid (Supplemental Table 3), which corresponds to +47.9847 Da. Searching for dynamic cysteine modifications with masses of +37.9847 Da and +57.9847 Da gave zero and one peptide, respectively. Thus, our FDR estimate for the identification of peptides carrying the sulfonic acid PTM is 2%.
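The pseudo-PTM estimate can be reproduced with a few lines of code. The sketch below uses a hypothetical function name and simply encodes the calculation described in the text: the average number of hits at the two shifted masses divided by the number of hits at the true modification mass.

```python
def pseudo_ptm_fdr(n_true_mass: int, n_minus_shift: int, n_plus_shift: int) -> float:
    """FDR estimate for a PTM from searches at the true mass M and at M-10 / M+10 Da."""
    if n_true_mass <= 0:
        raise ValueError("need at least one peptide at the true modification mass")
    expected_false_hits = (n_minus_shift + n_plus_shift) / 2
    return expected_false_hits / n_true_mass


# Cysteine sulfonic acid: 23 peptides at +47.9847 Da, 0 at +37.9847 Da, 1 at +57.9847 Da.
sulfonic_fdr = pseudo_ptm_fdr(23, 0, 1)
```

For the sulfonic acid counts this gives 0.5/23, or about 2%.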
We also identified 5 peptides bearing sulfinic acid, another oxidative modification of cysteine. No peptides with shifted PTM masses were detected, suggesting a relatively low FDR. Interestingly, all 5 sites identified as cysteine sulfinic acid were also identified as cysteine sulfonic acid, which agrees well with the notion that sulfinic acid is an intermediate product in the oxidation of cysteine residues to sulfonic acid. All of the proteins bearing oxidized cysteine residues appear to be quite abundant: aspartate aminotransferase, actin, glyceraldehyde-3-phosphate dehydrogenase, elongation factor 2 and acetyl-CoA acetyltransferase all ranked within the top 300 of the 4024 proteins.
In addition, we identified 26 peptides carrying phosphorylated serine or threonine, with an estimated FDR of 6%. Interestingly, 11 of these sites have been previously reported or predicted based on homology, and are listed in the Swiss-Prot database. Because no specific enrichment of phosphopeptides was performed, this list of phosphopeptides most likely reflects only the most abundant phosphoproteins. Since an important function of islets is hormone secretion, it is not surprising that we identified relatively abundant secretion-regulatory proteins, including chromogranin A (rank 54) and secretogranin-2 (rank 29) 34, which were detected with four phosphopeptides and two phosphopeptides, respectively.
The relatively extensive coverage of the islet proteome led us to examine what pathways or biologically important entities are enriched in the pancreatic islets. The most common approach for such analyses has been to compare the obtained proteomic dataset to the entire genome as a reference. However, such analysis using the annotations of the entire mouse genome as a reference often captures the biases of the experimental approach. For example, global bottom-up LC-MS/MS proteomic profiling is biased towards the detection of high-abundance proteins. Thus, the comparison of such a dataset against the entire genome will usually indicate over-representation of GO terms that involve mostly high-abundance proteins (e.g. mitochondria, ribosome and/or main metabolic pathways). Such over-representations usually result not from the biology but from the biases of the experimental approach. To overcome this issue, we generated an “average” proteomic database for the mouse based on the available proteomic data from different tissues or cells obtained with the same or a similar LC-MS/MS experimental approach. A number of studies describing the proteomes of different mouse organs, tissues and derived cell cultures, including mouse brain 7, 8, cortical neuron cell culture 9, heart 8, 22, muscle 22, kidney 8, lung 8, 10, placenta 8, liver 8, 11, 12, adipocyte cell culture 13, and islet alpha-cell culture 14, have been included in compiling this reference database. To approximate the protein set of an “average” LC-MS/MS mouse tissue analysis, we assembled the available datasets (including the current one derived from pancreatic islets), but only those derived from adult mouse samples were considered in the final combined database. If a gene was observed in multiple different studies, we retained the redundant entries in the combined dataset.
This approach has the advantage of maintaining the approximate distribution of genes belonging to a given biological annotation between the data obtained from a single profiling study and the combined dataset of multiple profiling studies. As expected, some annotations are no longer statistically significantly over- or under-represented when analyzed against the combined proteomic database rather than the entire genome as a reference. For example, it is typical to achieve significant coverage of the proteins involved in oxidative phosphorylation, so this annotation appeared significantly over-represented when compared against the entire mouse genome. However, because other proteomic datasets also have extensive coverage of oxidative phosphorylation pathways, it is no longer significantly over-represented when compared with the pooled proteomic dataset, as the p-value equals 0.084 even before the correction for multiplicity of testing (Table 1). Overall, when compared to the pooled proteomic dataset instead of the entire genome, only 9, instead of 255, GO “biological process” terms appeared to be significantly over-represented, and none (instead of 76) significantly under-represented (i.e., having adjusted p-values < 0.05). Even so, all 9 over-represented terms (Table 2) relate to protein transport and secretion, thus likely reflecting insulin and other peptides related to hormone secretion as the main constituents of the pancreatic islets. To determine which protein complexes or sub-networks related to protein transport and exocytosis might contribute to the over-representation of the corresponding GO terms (Table 2), we collected evidence for protein-protein interactions from multiple sources for the protein list covered by those GO terms and analyzed them using the Cytoscape35 plug-ins CABIN36 and MCODE37.
We found a number of protein complexes involved in vesicular trafficking and exocytosis, such as the SNARE complex that is involved in fusing the vesicular membrane with endosomes, ARF proteins which are G-proteins responsible for regulation of trafficking, adaptor proteins involved in formation of clathrin-coated vesicles, proteins in the exocytotic complex responsible for fusion of protein-carrying vesicles to the plasma membrane to enable exocytosis, components of oligomeric Golgi complex and others (Figure 3). Notably 29 proteins related to vesicular secretion are regulated by XBP-1 transcription factor, indicating XBP-1 as the dominant regulator. Indeed, it has been shown that XBP-1 is a crucial transcription factor involved in the development and function of exocrine glands 38.
None of the KEGG, InterPro, GO molecular function and SMART annotations appeared to be statistically significantly over- or under-represented. Nonetheless, we detected two GO “cellular component” terms, one PIR superfamily and five PIR keywords as over-represented annotations in the pancreatic islet dataset. The two GO “cellular component” terms and one PIR keyword highlight the over-representation of the Golgi apparatus, which is indeed extensively involved in secretion of insulin and other peptide hormones. Members of the PIR superfamily SF001135:trypsin are typical contaminants from exocrine pancreatic tissue in islet isolates 39, 40. Secreted peptide hormones and their precursors produced by alpha- (Gcg, Pyy), beta- (Chga, Iapp, Ucn3) or PP cells (Ppy) constitute the over-represented PIR keyword annotation “amidation”, referring to C-terminal amidation, which is essential for the biological activity of many peptide hormones. Both hormones from endocrine islet tissue (Gcg, Ins1, Ins2, Ppy) and contaminants from adjacent exocrine pancreatic tissue (Amy2, Cell) carry a PIR keyword annotating them as exclusively expressed in the pancreas. Cleavage on a pair of basic residues is a common post-translational modification in the maturation process of peptide hormones from precursors (Chgb, Gcg, Iapp, Pdyn, Ppy, Ppy, Scg2, Scg3, Sst). Notably, all enriched annotations, except trypsin-like proteases potentially from adjacent exocrine tissue, point toward the intracellular transport and secretion of peptide hormones as the major function of pancreatic islets.
Although analyses of over-representation of annotations expectedly highlighted the biological role of the pancreatic islets in the secretion of peptide hormones, it is unlikely that this information will provide novel information regarding the biological role of the islet and the functions of individual proteins. An alternative approach would be to identify the set of proteins specifically expressed in the islets. To this end, we compared our dataset with all the proteomic datasets obtained from different mouse tissues as described above. As a measure of protein abundance, we used the number of unique peptides or spectral count (if available) belonging to the protein normalized by the protein length. However, due to the qualitative nature of LC-MS/MS profiling, along with different experimental and instrumental setups and different types of search engines used to interpret the MS/MS spectra, direct comparison of the estimated protein abundances among the datasets obtained from different laboratories should be interpreted with caution. We reasoned that the rank of the abundance rather than the estimated abundance itself should be a more robust measure for comparison among such datasets (Figure 4), since highly abundant proteins should always have low rank values while low abundant proteins should always have high rank values, although their absolute abundance estimates may significantly differ between different datasets. The complete dataset used for analysis is available in Supplementary Table 4. With the qualitative abundance rank information available, we focused on identifying relatively islet-specific proteins, i.e., proteins expressed at high levels (with low ranking values) in pancreatic islets but not present or present at very low levels in other tissues. Figure 4 shows results of hierarchical clustering as a heatmap with a color gradient for ranks of protein abundance levels. 
To ensure confidence in the proteins identified, we considered in all datasets only those proteins having at least two peptide hits. To compare the islet proteome with the proteomes of other tissues, we considered all the datasets except the pancreatic islet alpha-cell dataset, because alpha-cells are part of pancreatic islet tissue. We found a cluster of 133 proteins, each with two or more peptides per protein, that are almost exclusively present in the pancreatic islet dataset (Supplementary Table 5).
As shown in Figure 5, most enzymes involved in the citric acid cycle are present in most of the samples at relatively high abundance. Conversely, proteins involved in the regulation of secretion (Chga, Chgb, Scg2, Scg3, Scg5) and islet-specific peptide hormones (Gcg, Iapp, Ins1, Ins2, Pdyn, Ppy, Pyy, Sst, Ucn3) are among the most abundant proteins in the pancreatic islets, but are of very low abundance in other organs and tissues. A notable exception is glucagon, which is highly expressed in the pancreatic islet dataset as well as in the alpha cell dataset, as expected.
This subset of islet-cell specific proteins covers the entire range of abundances with a slight shift towards low-abundance proteins (Figure 6). Of these 133 proteins, 68 were not identified in other measurements even by a single peptide, and thus are highly likely to be specific to pancreatic islets. These 133 proteins were classified into the following annotation groups: (1) secreted protein hormones, (2) proteases and protease inhibitors, (3) proteins involved in transport, secretion and associated with the Golgi apparatus, (4) ribosome and translation, (5) regulation of transcription, (6) proteasome and ubiquitin, (7) glycolysis and oxidative phosphorylation, (8) lipases, (9) lysosome, (10) helicases, (11) proteins with other functional annotations and (12) proteins without any functional annotations. Table 3 lists several novel unannotated islet-specific proteins that do not contain any putative domains with known biological functions. These unknown proteins may play an important role in islet function and are potential candidates for further detailed investigation.
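The strictest form of this selection, proteins confidently identified in islets but not seen elsewhere even by a single peptide, can be sketched as a simple set operation. This is only an illustration and not the hierarchical clustering used to obtain the 133-protein cluster; the function names, dataset labels, protein labels and peptide counts below are all hypothetical.

```python
from typing import Dict, Iterable, Set

PeptideCounts = Dict[str, int]  # protein name -> number of peptide hits


def confident(counts: PeptideCounts, min_peptides: int = 2) -> Set[str]:
    """Proteins supported by at least `min_peptides` peptide hits."""
    return {name for name, n in counts.items() if n >= min_peptides}


def islet_only(
    islet: PeptideCounts,
    others: Dict[str, PeptideCounts],
    exclude: Iterable[str] = ("alpha_cell",),
    min_peptides: int = 2,
) -> Set[str]:
    """Confident islet proteins with no peptide hit in any other dataset.

    Datasets named in `exclude` are skipped, mirroring the exclusion of
    the alpha-cell dataset, which is itself part of islet tissue.
    """
    skipped = set(exclude)
    seen_elsewhere: Set[str] = set()
    for tissue, counts in others.items():
        if tissue in skipped:
            continue
        seen_elsewhere |= {name for name, n in counts.items() if n >= 1}
    return confident(islet, min_peptides) - seen_elsewhere


# Invented toy data for illustration only.
islet = {"HormoneA": 12, "HormoneB": 9, "TcaEnzyme": 7, "SingleHit": 1}
others = {
    "liver": {"TcaEnzyme": 10},
    "brain": {"TcaEnzyme": 5},
    "alpha_cell": {"HormoneB": 15},
}

print(sorted(islet_only(islet, others)))
```

Here "SingleHit" is dropped by the two-peptide threshold, "TcaEnzyme" is dropped because it occurs in other tissues, and "HormoneB" is retained only because the alpha-cell dataset is excluded from the comparison.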
The availability of complete genome sequences has greatly accelerated the establishment of genomic and proteomic technologies as powerful tools for studying tissue- or cell-specific gene expression at the system level and for delineating novel pathways involved in metabolic diseases such as diabetes 41, 42. In particular, mass spectrometry-based proteomics has become an important tool for molecular and cellular biology research and for systems biology studies by providing large-scale measurements of relative protein abundances, including post-translational protein modifications 43. The importance of studying biological systems at the protein level is further emphasized by recent studies that clearly indicate that mRNA levels do not necessarily correlate with protein abundances 44–46.
Mass spectrometry-based proteomic tissue profiling has recently been applied extensively to establish the proteome composition and protein expression patterns in different mouse tissues, organs, and cell lines 7–14. Such proteome profiling is especially valuable for elucidating the diversity in protein composition and expression patterns among mammalian tissues. The proteome database will also serve as a reference resource for more focused hypothesis-driven biological studies and/or for more detailed systems biology studies. For example, one or more of the proteins can serve as potential biomarkers for tissue-specific pathologies. The present study represents the first extensive proteomic characterization of mouse pancreatic islets of Langerhans, with the aim of establishing a reference database of the mouse islet proteome for future metabolic research using rodent models. We performed this initial survey experiment using a pooled sample from both normal (control) mice and an insulin-resistant model, with the aim of gaining increased coverage of those proteins that are potentially expressed at higher levels in either of the two conditions. This strategy is advantageous compared to the analysis of islets from the normal state alone because many detectable proteins with increased expression in the disease state could be below the limit of detection if only the normal sample is analyzed. Also, since the database is a qualitative catalog, nearly all proteins identified from the pooled sample will be present in both the normal and disease states, but at different abundance levels in the two states. Therefore, the increased coverage achievable for this database should make it a more useful resource for future studies using both normal and disease mouse models 47.
The resulting islet proteome database from this study covers ~4,000 proteins. One utility of the database will be mapping different canonical pathways and functional processes to identify which islet proteins are linked with specific metabolic and signaling pathways (Figure 1), since traditional pathway knowledgebases are often not tissue-specific. This database will also be a suitable complement to other proteomes that have been characterized for liver, adipocyte, muscle, and brain 7–14. Furthermore, we have provided estimated protein abundances within the islet proteome based on the normalized spectral counts. While it has been reported that spectral counts can provide an estimate of relative protein abundances within the proteome 25, 48, such estimates should only be used as a qualitative measure to query whether a protein is highly abundant or of relatively low abundance. This is because several other factors can influence the spectral count, including protein solubility, protein digestion efficiency and peptide ionization efficiency for a given protein.
The extensiveness of the islet proteomic datasets enabled us to compare the results with other available datasets from a number of mouse organs and tissues. We were able to identify a set of 133 proteins that were specific to islets but not detectable, or detected at very low abundance, in other tissues (Supplemental Table 5). Indeed, the subset of 133 proteins contains well-known islet-specific secreted hormones, including glucagon, islet amyloid polypeptide, insulin, prodynorphin, pancreatic polypeptide, peptide YY and urocortin 3. Besides secreted peptide hormones, the subset includes proteins known to be specific to islets, for example G6pc2, Reg1 and Sytl4, which are islet-specific glucose-6-phosphatase 49, regenerating islet derived 1 50 and synaptotagmin-like 4, also known as granuphilin 51, respectively. Although some proteins are indeed known and expected to be islet-specific, the majority of the proteins were not previously known to be restricted to islets. In particular, several hypothetical or unknown proteins, i.e., proteins not containing domains with known or reasonably specific functions, were confidently identified as islet-specific. These islet-specific proteins, including the unknown proteins, may be important for islet function and are suitable candidates for future studies. An interesting example is the novel transmembrane protein TMEM27, which was recently demonstrated to stimulate pancreatic beta-cell proliferation 52. We should note that these 133 proteins are only relatively specific to islets based on our data, because we used only 8 other organ and tissue types for this comparative analysis: placenta, muscle, heart, kidney, lung, adipocytes, liver and brain. Potentially, these proteins could be expressed in tissues not yet profiled by LC-MS/MS proteomics.
In addition to protein identification, there is a significant need to identify potential post-translational protein modifications in a global proteome profiling study, since many modifications are known to regulate cell signaling and can also serve as markers of disease progression. Unfortunately, because most modifications are of low abundance, it has been a challenge to identify protein modifications in global profiling experiments without enrichment. In this work we explored the use of an alternative informatics strategy for the identification of modified peptides from LC-MS/MS analyses of a global non-enriched sample. We based our analyses on the notion that post-translational modifications are usually substoichiometric; therefore, true modified peptides should be present along with their unmodified forms. By applying this criterion and an alternative FDR estimation approach using a shifted-mass approach (details described in methods), we identified a total of 54 modified peptides, including oxidative modifications on cysteine and phosphorylation on serine and threonine, with an FDR <5%. Our approach for controlling the FDR of peptides with PTMs is important for identifying modified peptides within global profiling data, especially for those PTMs that cannot be specifically enriched, such as cysteinyl oxidation.
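The substoichiometry criterion can be sketched as a filter that retains a modified peptide only when its unmodified form was also identified, followed by a generic ratio-based FDR estimate. How the null hits are generated (the shifted-mass search described in the methods) is outside the scope of this sketch, and all names, sequences and numbers below are invented for illustration.

```python
from typing import Iterable, List, NamedTuple, Set


class ModifiedHit(NamedTuple):
    sequence: str      # bare peptide sequence, without modification symbols
    modification: str  # free-text label of the modification


def with_unmodified_partner(
    hits: Iterable[ModifiedHit], unmodified_sequences: Set[str]
) -> List[ModifiedHit]:
    """Keep modified peptides whose unmodified form was also identified."""
    return [hit for hit in hits if hit.sequence in unmodified_sequences]


def estimated_fdr(null_hits: int, real_hits: int) -> float:
    """Generic FDR estimate: hits passing in a null search / hits in the real search.

    Assumes the null search (however it is constructed) yields only
    false identifications at the same score thresholds.
    """
    if real_hits == 0:
        return 0.0
    return null_hits / real_hits


# Invented example identifications.
hits = [
    ModifiedHit("ACDEFGHK", "Cys sulfonic acid"),
    ModifiedHit("LMNSPQR", "phospho-Ser"),
]
unmodified = {"ACDEFGHK", "TTTVWYK"}

kept = with_unmodified_partner(hits, unmodified)
print(kept)
print(estimated_fdr(null_hits=1, real_hits=40))
```

The second hit is discarded because its unmodified counterpart is absent from the identified set, which is the behavior the criterion is meant to enforce.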
While we realize that the number of identified modified peptides is very limited owing to the nature of the global profiling experiment without specific enrichment, the identification of these modifications adds value beyond protein identities or abundances for this initial characterization of the islet proteome. One example is oxidative modifications, since oxidative stress has been linked with diabetes 32, 53. To our knowledge, specific oxidative modifications in islets have never been identified, presumably due to limitations in technology. The oxidative modifications observed in this study could potentially provide a list of novel targets that may play a role in the oxidative stress response and may also serve as markers of disease progression. A specific example is the oxidation of Cys-244 in glyceraldehyde-3-phosphate dehydrogenase (G3PDH). There are reports showing that Cys-244 is one of the strongest nucleophilic residues and is susceptible to modification by 4-hydroxy-2-nonenal 54, a major lipid peroxidation-derived reactive aldehyde, or by normal endogenous metabolites like acyl-CoA 55 and fumarate 56. All three modifications result in strong inhibition of G3PDH enzyme activity. The fact that the oxidation of Cys-244 was detected in both the sulfonic and the intermediate sulfinic acid forms further supports the confidence of the identifications. However, it remains to be proven that oxidation of the Cys-244 residue indeed inhibits the enzymatic activity of G3PDH.
In summary, the resulting mouse islet proteome database contains the identified peptide sequences, the protein identifications, the spectral counts for each protein as an indication of relative abundance, and the identified PTMs. The database represents an important reference resource for further data mining and for islet biological studies focused on diabetes. For example, this database will provide a foundation for future quantitative proteomic studies applying the accurate mass and time tag approach, where both accurately measured masses and elution times are utilized for peptide identifications 57. The available peptide sequences and islet-specific proteins will also be useful for selecting and devising specific targeted proteomic experiments. The database is included as Supplemental Material and is available at the NCRR Center for Integrative Biology website (http://ncrr.pnl.gov) for access by the research community.
This work was supported by the NIH NCRR grant (RR018522) to Richard D. Smith, the Pacific Northwest National Laboratory LDRD program (W.-J.Q.), RO1 DK67536 (R.N.K.) and in part by the Harvard Stem Cell Institute (R.N.K.). The authors thank the Environmental Molecular Sciences Laboratory (EMSL) for use of the instrumentation applied in this research. EMSL is a U.S. Department of Energy (DOE) national scientific user facility located at the Pacific Northwest National Laboratory in Richland, Washington. PNNL is a multi-program national laboratory operated by Battelle Memorial Institute for the DOE under Contract DE-AC05-76RL01830.
Supporting Information Available
Supplemental tables (1–5) in Excel format are available. This material is available free of charge via the Internet at http://pubs.acs.org.