Identification of 12,000 proteins and 36,000 phosphorylation sites from 9 mouse tissues
To survey protein and phosphoprotein abundance, 9 organs were harvested from three-week-old male Swiss-Webster mice: brain, brown fat, heart, liver, lung, kidney, pancreas, spleen, and testis. After tissue homogenization, proteins were either digested in solution for subsequent strong cation exchange chromatography and phosphopeptide enrichment via IMAC (10 mg per tissue) or separated via SDS-PAGE (65 μg per tissue) followed by in-gel digestion (
Supplemental Methods and
Fig. S1). Protein and phosphoprotein extraction conditions were selected to minimize protease and phosphatase activity (
Castellanos-Serra and Paz-Lago, 2002;
Roche, 2004). This was most critical for pancreas, due to its high levels of endogenous proteases and phosphatases. Although assessing phosphatase activity is challenging, minimal protein degradation was observed via SDS-PAGE (
Figure S2). Samples enriched with phosphopeptides (24 per tissue) were analyzed in duplicate using a hybrid linear ion trap Orbitrap mass spectrometer, while non-phosphorylated samples (12 per tissue) were analyzed once (
Extended Experimental Procedures). The final dataset contained over 284,000 phosphopeptide identifications (
Table S1), matching nearly 36,000 phosphorylation sites (
Table S2) from 6296 proteins (
Table S3) at peptide and protein-level false discovery rates (FDRs) of 0.15% and 1.7%, respectively.
Following peptide- and protein-level filtering, each site on every phosphopeptide was scored using the AScore algorithm to assess the confidence of phosphorylation site localization (
Beausoleil et al., 2006). Sites scoring above 13 were considered localized (p < 0.05). Overall, 85% of sites were localized to a single amino acid; within individual tissues, this fraction ranged from 89% to 93% (). A minimal list of phosphorylation sites was then assembled. Localized sites were counted once; non-localized sites were grouped when their regions of possible modification overlapped. Groups of non-localized sites were counted only when no localized site could explain the observed phosphorylation.
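The counting rule for the minimal site list can be sketched for a single protein as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and it assumes that a localized site "explains" a group of non-localized sites when it falls inside the group's merged region.

```python
ASCORE_CUTOFF = 13  # from the text: sites scoring above 13 are considered localized


def is_localized(ascore):
    """Return True when an AScore value exceeds the localization cutoff."""
    return ascore > ASCORE_CUTOFF


def minimal_site_count(localized_positions, ambiguous_regions):
    """Count sites on one protein under the minimal-list rule.

    localized_positions: residue positions of localized sites.
    ambiguous_regions: (start, end) inclusive residue ranges in which a
        non-localized phosphate could reside.
    """
    # Group non-localized sites whose possible regions overlap.
    merged = []
    for start, end in sorted(ambiguous_regions):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    # Assumption: a group is "explained" if any localized site lies inside it.
    localized = set(localized_positions)
    unexplained = [
        region for region in merged
        if not any(region[0] <= pos <= region[1] for pos in localized)
    ]
    return len(localized) + len(unexplained)
```

For example, with localized sites at residues 15 and 102 and ambiguous regions (10, 18), (40, 45), (44, 50), and (200, 205), the first region is explained by residue 15, the second and third merge into one group, and the protein contributes four sites in total.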
Importantly, over 50% of observed sites and phosphoproteins were previously unreported, based on comparison with the PhosphoSite database of known phosphorylation sites (
Hornbeck et al., 2004) (
Figure S3A). Similarly, most sites have not been reported in the Phospho.ELM database (
Diella et al., 2008) (
Figure S3B). Several factors contribute to the high proportion of unreported sites. First, we included tissues (brown fat, kidney) that have been less studied. Second, continual improvements in instrumentation and methodology have enhanced the sensitivity of phosphoproteomic analyses. Our lab previously surveyed the mouse liver phosphoproteome using similar techniques and mice of the same age, strain, and sex as in this study. The present work encompasses far more phosphorylation sites, both across all tissues and within the liver itself (
Figure S3C). Interestingly, although the present study encompasses virtually all sites reported by Villen et al., some pTyr sites were not observed. In the earlier study, these missing sites were detected via immunoprecipitation of pTyr-containing peptides from much larger amounts of digested peptides.
Without phosphopeptide enrichment, 894,041 peptide spectral matches corresponding to 10,102 proteins were made at peptide- and protein-level FDRs of 0.11% and 1.25%, respectively (
Tables S3 and
S4). Traditionally, control of peptide and protein FDRs from large datasets has posed significant challenges, risking accumulation of incorrect identifications to unacceptably high FDRs. To reliably estimate protein FDRs, we developed a method based on the target-decoy database search strategy (
Elias and Gygi, 2007). Peptide identifications were filtered using a multivariate approach that used linear discriminant analysis to distinguish valid identifications from random matches, following training with target and decoy peptides as positive and negative training data. Peptides were subsequently assembled into proteins and filtered via several protein quality metrics (
Extended Experimental Procedures). This extensive peptide- and protein-level filtering of both phosphorylated and non-phosphorylated data ensured the highest quality of all identifications.
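As a rough illustration of the target-decoy idea, the sketch below estimates the FDR among matches that pass a score threshold. It assumes one common estimator (decoy hits divided by target hits above the threshold); the exact estimator, the discriminant score, and the protein-level filters used in this study are described in the Extended Experimental Procedures and are not reproduced here. The function name and the example scores are hypothetical.

```python
def target_decoy_fdr(matches, threshold):
    """Estimate the FDR among target matches scoring at or above threshold.

    matches: list of (score, is_decoy) tuples, one per spectral match.
    Assumed estimator: number of decoy hits / number of target hits.
    """
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets


# Hypothetical discriminant scores; True marks a decoy-database match.
example_matches = [
    (9.1, False), (8.7, False), (8.2, False), (7.9, True),
    (7.5, False), (6.0, True), (5.5, False), (5.1, True),
]
```

Raising the threshold trades sensitivity for a lower estimated FDR: in this toy list, a cutoff of 7.0 admits four targets and one decoy, whereas a cutoff of 8.0 admits three targets and no decoys.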
Tissue Distribution of Phosphorylation Sites
We first examined the number of phosphorylated peptide spectral matches, the number of unique sites, and the total number of phosphoproteins identified per tissue. These varied, reflecting differences in complexity and variable intracellular signaling within each tissue (). While their heterogeneity varies, each tissue contains many cell types that together create specialized physiology. The reported phosphorylation profiles are thus weighted averages that reflect signaling within all cell types in each tissue. The highest numbers of phosphopeptides, phosphorylation sites, and phosphoproteins were identified in brain, highlighting its unique cellular diversity and the specialized signaling networks these cells employ. In addition to brain, kidney, spleen, lung, and testis each contained over 30,000 phosphopeptides and 10,000 localized sites. Since spleen contains numerous immune cell populations, many phosphorylation-dependent signaling pathways are constitutively active, priming the young mice for rising immune challenges. Similarly, immune cells in lung contribute to its complex phosphorylation profile. Despite varying phosphopeptide numbers, phosphoprotein counts were similar across tissues, indicating that decreased phosphopeptide counts in tissues such as pancreas and heart reflect true differences in signaling and tissue heterogeneity, rather than varying instrument performance.
The diverse cell populations contributing to each phosphoproteome profile include cells specific to each tissue, as well as red blood cells and proteins within the vasculature of all tissues. To identify proteins and phosphorylation sites whose levels could be influenced by blood contamination, we compared our protein and phosphoprotein profiles with a proteomic survey of murine red blood cells (
Pasini et al., 2008). A small fraction of proteins in each tissue were also seen in red blood cells (see
Figure S3D). Overall, 445 of the approximately 12,000 proteins detected in this study, with or without phosphorylation, were also seen in red blood cells. While some proteins such as hemoglobin are predominantly found in red blood cells, most, including actins and glycolytic enzymes, are found in virtually all cells, including cells within the tissues in question.
We next examined patterns among identified phosphorylation sites. Similar to previous studies (
Olsen et al., 2006), we observed mostly Ser phosphorylation (83%), followed by Thr (15%) and Tyr (2%). This enrichment far exceeds the relative abundance of Ser among residues subject to phosphorylation within the phosphoproteins detected in this study, indicating a strong preference for Ser phosphorylation (). We then analyzed numbers of sites within each phosphoprotein. 80% of phosphoproteins contained multiple sites, while 50% were phosphorylated on four or more residues and 10% carried more than fourteen sites (). Though these multiple modifications do not necessarily occur simultaneously on individual protein molecules, such multiple phosphorylation could reflect regulation of a single protein function via multiple pathways, or could suggest that many of the protein’s cellular activities and interactions are independently regulated via phosphorylation at distinct sites. Indeed, examination of predicted structural elements within proteins using PsiPred (
Jones, 1999) and VSL2 (
Peng et al., 2006) revealed that phosphorylated Ser, Thr, and Tyr residues exhibited marked differences in structural classification compared to their unmodified counterparts (
Figure S3E). Phosphorylated sites were predominantly predicted to reside in coiled and disordered regions rather than in ordered secondary structures. Although phosphorylation usually occurred in disordered regions, sites within kinase activation loops were a notable exception: most of the 120 observed activation loop sites were ordered, with elevated levels classified as strands. Virtually no phosphorylation sites were located in known or predicted α-helices.
While many proteins were multiply phosphorylated, generally only a small portion of Ser, Thr, and Tyr residues within each protein were modified (). Overall, 5% of these residues were modified, with some variability for each residue: Ser, 8%; Thr, 3%; Tyr, 1%. Nevertheless, some proteins bore extensive phosphorylation. Based on fractional modification (the number of observed phosphorylated sites relative to the number of potential sites), the most heavily phosphorylated proteins included hemoglobin β1 (94% phosphorylated) as well as Marcksl1 (also known as MLP; 61% phosphorylated), which spans the protein kinase C and calmodulin signaling networks, and Hmgn1 (60% phosphorylated), which regulates DNA-histone interactions.
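Fractional modification can be computed directly from a protein sequence and its list of observed sites. The sketch below assumes that every Ser, Thr, and Tyr residue counts as a potential site; the function name and the toy sequence are hypothetical.

```python
def fractional_modification(sequence, observed_sites):
    """Fraction of Ser/Thr/Tyr residues observed as phosphorylated.

    sequence: one-letter amino acid string.
    observed_sites: 1-based residue positions of observed phosphosites.
    """
    potential = {i for i, aa in enumerate(sequence, start=1) if aa in "STY"}
    if not potential:
        return 0.0
    # Ignore any reported position that is not actually S, T, or Y.
    observed = set(observed_sites) & potential
    return len(observed) / len(potential)


# Toy peptide with five potential sites (S2, T4, Y5, S7, T9).
toy_sequence = "MSATYKSLT"
```

With sites observed at S2 and S7, the toy peptide is 40% modified.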
To assess overlap, we counted the number of tissues in which each site was observed (). 50% of sites were observed exclusively in single tissues, while 3% were found in all tissues and 18% were present in over half of examined tissues. Although tissue-specific sites were observed in all organs, they were not evenly distributed (). Most tissue-specific sites were found in brain (33%) and testis (17%), while lung contained only 6% and liver contributed 3%. These differences are not due to lower phosphopeptide counts in these tissues, since lung contained 95% as many phosphopeptides as testis. To better assess tissue distributions, tissue enrichment was quantified for each site using Shannon’s entropy (
Experimental Procedures) (
Shannon, 1948). Selected tissue-specific phosphorylation sites are shown in Table 1. These sites come from variably abundant proteins, including Bassoon and Mtap1a, which were highly expressed in brain, as well as Nexilin and the CXC chemokine receptor, which were found in low abundance in heart and spleen, respectively. Many sites are previously unknown, with most of these sites identified in less frequently studied tissues. For comparison, proteins bearing global phosphorylation sites are listed in Table 2. Examples include Huntingtin, the protein implicated in Huntington’s disease, and kinases Mapk3 and Gsk3b. Few global sites are previously uncharacterized, presumably due to their ubiquity. Though some sites are globally modified, extensive tissue-specific phosphorylation underscores the importance of multi-tissue phosphoproteomics. First, even widely expressed proteins display dramatically different phosphorylation profiles across tissues. Even the heavily phosphorylated Srrm2 (310 sites) harbors an abundant testis-specific site (S1434). Second, many proteins are only expressed in a subset of tissues and can only be phosphorylated in tissues where they are expressed. The proteins Speg (heart), calmegin (testis), and B-lymphocyte antigen CD-20 were only found in single tissues. Clearly, comprehensive phosphoproteomics requires analysis of many tissues.
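To make the entropy measure concrete, the sketch below computes Shannon's entropy (in bits) of a site's spectral counts across tissues. A site seen in only one tissue has an entropy of 0, while a site seen equally in all nine tissues reaches the maximum of log2(9), about 3.17 bits. The cutoffs separating "tissue-specific", "moderate", and "global" sites are given in the Experimental Procedures and are not assumed here; the function name is hypothetical.

```python
import math


def tissue_entropy(counts):
    """Shannon entropy (bits) of one site's spectral counts across tissues.

    counts: non-negative spectral counts, one entry per tissue.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("site has no spectral counts")
    entropy = 0.0
    for count in counts:
        if count > 0:  # terms with zero probability contribute nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy
```

Because the counts are converted to proportions first, the measure describes how a site's signal is distributed across tissues rather than how abundant the site is overall.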
Table 1. Abundant Tissue-Specific Sites, as Determined by Spectral Counts
Table 2. Abundant “Global” Phosphorylation Sites
To compare phosphorylation profiles for each tissue, we performed hierarchical clustering (). Total spectral counts (TSC) were used to approximate each site’s abundance within each tissue (
Liu et al., 2004). Clustering of sites based on their tissue distributions highlights tissue-specific phosphorylation, especially in brain and testis. Furthermore, clustering tissues based on their phosphorylation profiles reveals that lung and spleen were most similar, likely reflecting immune cell signaling, whereas brain was most dissimilar.
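The tissue clustering can be illustrated with a small agglomerative procedure over spectral-count profiles. The text does not state the distance metric or linkage rule, so the sketch assumes correlation distance (1 minus Pearson's r) with average linkage; the spectral counts are invented for illustration and the function names are hypothetical.

```python
import math
from itertools import combinations


def correlation_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (sd_x * sd_y)


def average_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order performed.

    profiles: dict mapping tissue name -> list of spectral counts per site.
    """
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        def linkage(pair):
            # Average pairwise distance between members of the two clusters.
            a, b = pair
            dists = [correlation_distance(profiles[i], profiles[j])
                     for i in a for j in b]
            return sum(dists) / len(dists)

        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Invented spectral counts for five sites in four tissues.
toy_profiles = {
    "lung":   [5, 8, 0, 1, 3],
    "spleen": [6, 9, 0, 1, 2],
    "brain":  [0, 1, 12, 9, 0],
    "liver":  [2, 2, 1, 0, 6],
}
toy_merges = average_linkage(toy_profiles)
```

In this toy input, lung and spleen have nearly identical profiles and are joined first, mirroring the pattern reported above.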
Multiple kinases modify most phosphoproteins
To investigate which kinase classes were likely responsible for observed phosphorylation events, we examined the amino acid motifs surrounding each site and broadly classified each as basic, acidic, proline-directed, tyrosine, or other using a decision tree (
Villen et al., 2007). Proline-directed sites were most common (29% of sites) (), while only 2.5% of sites corresponded to tyrosines. Statistically significant variations in frequencies of these classes were found across tissues, suggesting that specific tissues rely on distinct kinases to maintain specialized signaling. Proline-directed sites were elevated in spleen, testis and pancreas, while brown fat exhibited increased basic sites. Furthermore, when sites were divided into tissue-specific, moderate, and globally abundant groups based on entropy filtering, each showed distinct proportions of the 5 site classes (). Proline-directed sites were more frequently classified as either tissue-specific or global, while basic sites were enriched among global events. Both tyrosine sites and those classified as “other” were decreased among tissue-specific and global phosphorylation events.
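A motif-based classifier of this kind can be sketched as an ordered series of checks on the residues flanking each site. The rule order, window size, and residue positions below are assumptions chosen for illustration and should not be read as the published decision tree; consult Villen et al., 2007 for the actual rules. The function name and test windows are hypothetical.

```python
def classify_site(window, center=6):
    """Assign a phosphosite to a broad motif class.

    window: 13-residue string spanning positions -6..+6 around the site.
    center: index of the phosphorylated residue within the window.
    All rules below are illustrative assumptions, applied in order.
    """
    site = window[center]
    upstream = window[:center]          # positions -6..-1
    downstream = window[center + 1:]    # positions +1..+6

    if site == "Y":
        return "tyrosine"
    if downstream[:1] == "P":           # Pro at +1
        return "proline-directed"
    if sum(aa in "DE" for aa in downstream) >= 5:
        return "acidic"
    if len(upstream) >= 3 and upstream[-3] in "RK":   # Arg/Lys at -3
        return "basic"
    if any(aa in "DE" for aa in downstream[:3]):      # Asp/Glu at +1..+3
        return "acidic"
    if sum(aa in "RK" for aa in upstream) >= 2:
        return "basic"
    return "other"
```

Because the checks run in a fixed order, a site that satisfies several rules is assigned to the first matching class, which is what makes the scheme a decision tree rather than a set of independent labels.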
We next examined the distribution of phosphorylation classes within each phosphoprotein (). Hierarchical clustering revealed that Ser/Thr classes were similar, while Tyr sites diverged. 66% of phosphoproteins contained sites from multiple kinase classes and 4% harbored sites from all classes (). Two variably phosphorylated proteins are Mark1, a kinase involved with cytoskeletal dynamics (
Timm et al., 2008b) and Dennd1a, a protein that acts in synaptic endocytosis (
Allaire et al., 2006) (). Each was phosphorylated across its length and contained sites targeted to 4 site classes (neither contained pTyr). Individual sites showed distinct tissue profiles. In some cases, pairs of sites within the same class showed similar phosphorylation patterns; however, even within the same protein, different sites within the same class often showed variable patterns of modification. Overall, the presence of multiple site classes and the distinctive tissue-specific profiles seen across sites within most phosphoproteins suggest that the ‘typical’ phosphoprotein sits at the crossroads of multiple signaling pathways, where its activity depends upon many intracellular and extracellular influences.
A representative protein spanning multiple signaling networks is the kinase GSK3β, which regulates glycogen synthesis, microtubule dynamics, apoptosis, and cell proliferation (
Forde and Dale, 2007). We found 4 sites on GSK3β, from 3 classes: S9 (basic), S25 (other), Y216 (tyrosine), and S219 (other). Multiple kinases catalyze these phosphorylations, allowing multiple networks to modulate GSK3β activity. Specifically, Y216 phosphorylation activates GSK3β, and results from autocatalytic activity or Pyk2 action. In contrast, S9 phosphorylation inhibits GSK3β and results from activity of PKB, PKA, and S6K, as well as through auto-inhibition (
Forde and Dale, 2007). Though sites S25 and S219 have been seen in multiple previous studies (
Hornbeck et al., 2004), the kinase(s) responsible for their phosphorylation are unknown.
Combining protein abundance and phosphorylation measurement identifies true differential protein phosphorylation
Differential phosphorylation can reflect changes in protein abundance, as well as changes in a particular site’s phosphorylation. To distinguish these factors, we also performed a proteomic analysis of the 9 tissues examined in our phosphoproteomic experiments. Altogether we identified 12,039 proteins, 36% of which were identified both with and without phosphopeptide enrichment (), an overlap that was consistent across tissues (). 5,745 proteins were only identified without phosphopeptide enrichment, while 1,937 proteins were detected in the phosphorylation data alone, indicating that normally, these proteins are of low abundance, resisting detection via our shotgun proteomic approach. Phosphopeptide enrichment provides an excellent means to access proteins that are invisible to other fractionation methods.
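The overlap figures quoted above can be checked with simple set arithmetic. The short sketch below uses only numbers stated in the text; the function and variable names are hypothetical.

```python
def overlap_summary(total, only_unenriched, only_phospho):
    """Derive the shared protein count and its share of the total.

    total: all proteins identified in either dataset.
    only_unenriched: proteins seen only without phosphopeptide enrichment.
    only_phospho: proteins seen only in the phosphorylation data.
    """
    both = total - only_unenriched - only_phospho
    return {
        "both": both,
        "percent_both": 100.0 * both / total,
        "all_unenriched": both + only_unenriched,
    }


summary = overlap_summary(total=12039, only_unenriched=5745, only_phospho=1937)
```

The result gives 4,357 proteins identified in both datasets (about 36% of the total), and 4,357 plus 5,745 recovers the 10,102 proteins reported for the analysis without phosphopeptide enrichment.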
To explore their expression and phosphorylation, proteins were clustered based on spectral counts within each tissue, with and without phosphopeptide enrichment and plotted as a heat map (). As with individual sites (), phosphorylated and non-phosphorylated protein profiles ranged from tissue-specific to global expression. Again, most tissue specificity was in brain and testis; however, unmodified proteins were more consistently expressed across tissues, indicating that protein expression is less variable than phosphorylation.
Perhaps most striking are differences among phosphorylated and non-phosphorylated profiles. Though many abundant, ubiquitous proteins were identified in the non-phosphorylated dataset, these proteins showed little phosphorylation. Similarly, the most abundant and globally phosphorylated proteins were sparsely observed without phosphorylation. Generally, there is little correlation between protein abundance and phosphorylation levels, either for the entire dataset, or for individual proteins. After spectral counts were normalized, comparison of each protein’s expression and phosphorylation profiles frequently revealed large differences. For example, the abundances of Nck1 with and without phosphorylation were very distinct (). In contrast, high concordance was observed for phosphorylated and non-phosphorylated Acaca (). Nevertheless, considerable heterogeneity was observed for both proteins’ individual sites across tissues () indicating that these fluctuations are not due to changes in substrate protein abundance and thus reflect true differential tissue-specific phosphorylation. Since this analysis relies upon accurate quantitation of proteins and phosphoproteins via spectral counting, we investigated its reproducibility by comparing duplicate analyses of non-phosphorylated brown fat (
Figure S4A) and found strong agreement. We also confirmed agreement between TSC and protein abundance using Western blots of selected proteins and their phosphorylation sites (;
Figure S4B). Finally, we compared our protein expression profiles with those reported in a previous proteomic survey of several mouse tissues (
Kislinger et al., 2006). Though only a subset of tissues was included in this prior study, excellent agreement was observed for the 3202 proteins shared among these datasets (
Figure S4C–E).
Tissue-Specific Phosphorylation does not Imply Tissue-Specific Expression
To assess the relationship between tissue-specific phosphorylation and protein expression, proteins identified without phosphopeptide enrichment were classified as “tissue-specific”, “moderate”, or “global” based on entropy filtering (, “All Proteins”). Next, those proteins also identified with one or more phosphorylation sites were selected (“All Phosphoproteins”). Phosphoproteins were more likely to be “globally” expressed in their non-phosphorylated forms (24% to 37%; p < 10^-112, χ² test). When this list was further filtered to include only proteins for which one or more tissue-specific sites were observed (“Proteins with Tissue-Specific Sites”), a subtle increase in the fraction of tissue-specific proteins was observed (13% to 15%; p < 10^-6, χ² test). Thus proteins containing tissue-specific sites are only slightly more likely to display tissue-specific expression in non-phosphorylated form. In contrast, the vast majority (85%) of proteins that contained tissue-specific sites were expressed across multiple tissues in non-phosphorylated form, and 36% were globally expressed. Most tissue-specific phosphorylation is not due to tissue-specific protein expression, and instead reflects the independent influence of tissue-specific signaling.
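The kind of comparison behind these p-values can be illustrated with a χ² test on a 2×2 table. The text does not specify the exact test design or the underlying counts, so the sketch below assumes a test of independence with one degree of freedom and uses invented counts that merely echo the 24% versus 37% proportions; it will not reproduce the reported p-values. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the table [[a, b], [c, d]].

    Returns (statistic, p_value). With one degree of freedom, the
    chi-square survival function equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Invented counts: 240 of 1,000 proteins global in one group,
# 370 of 1,000 global in the other.
toy_statistic, toy_p = chi_square_2x2(240, 760, 370, 630)
```

Even with only 1,000 proteins per group, a shift from 24% to 37% yields a statistic near 40; larger sample sizes push the p-value down much further.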
Biological Classification of Global and Tissue-Specific Proteins and Phosphoproteins
To explore their biological roles, proteins and phosphoproteins were classified as “global” or “tissue-enriched” via entropy filtering. Each of these classes was then compared with all identified proteins and phosphoproteins to detect enriched Gene Ontology (GO) categories and Protein Information Resource (PIR) classifications using DAVID (
Dennis et al., 2003). Enriched GO categories (
Ashburner et al., 2000) and PIR classifications (
Wu et al., 2003) were then clustered based on p-values reflecting enrichment in each class, following log transformation and z-transformation (
Figure S5). Global proteins were enriched for protein synthesis and degradation as well as mitochondrial function, nucleotide binding and ligase activity, while ubiquitin ligase activity and phosphoproteins were enriched among global phosphoproteins. GO and PIR enrichments for each tissue generally agreed with expectations. Brain-specific proteins and phosphoproteins were enriched with neuron differentiation and vesicle transport classes, while heart-specific proteins and phosphoproteins were enriched with classes specific to muscle and cardiac tissue. Some tissue-specific proteins and phosphoproteins displayed complementary enrichment patterns. Testis-specific phosphoproteins were enriched in meiosis and cell cycle as well as DNA damage and repair while testis-specific non-phosphorylated proteins were enriched in spermatogenesis and microtubule-based movement. This suggests that distinct regulatory strategies govern these testis-specific functions.
Tissue-Specific Expression of Phospho-Transfer Proteins
To better understand variable phosphorylation across tissues, we examined proteins involved with phospho-transfer: kinases, kinase inhibitory proteins, phosphatases, and phosphatase inhibitory proteins (
Figure S6A, B). Proteins were classified based on GO classifications and clustered. We identified 416 of 556 kinases (
Figure S6A), with 57% detected in both phosphorylated and non-phosphorylated forms, as well as 11 of 21 kinase inhibitory proteins. Though mostly globally expressed, tissue-specific kinases were found in brain, lung, spleen, and testis. In contrast, notwithstanding a few brain-specific inhibitors, most kinase inhibitory proteins were widely expressed. Of 151 phosphatases, we identified 112 (
Figure S6B), with tissue-specific phosphatases observed in brain and testis. A significant fraction (43%) of phosphatases were not detected in phosphorylated form, despite nearly ubiquitous expression. We also identified 17 of 18 phosphatase inhibitory proteins, with most being widely expressed across tissues.
Tissue-Specific Expression and Phosphorylation within Protein Interaction Networks
One effect of phosphorylation is to regulate physical interactions among proteins. Therefore, mapping phosphoproteomic data onto networks of known interacting proteins can reveal tandem phosphorylation that regulates the proteins’ shared biological activities. We created a high-confidence interaction map of the mouse proteome using protein-protein interactions in the STRING database (
Jensen et al., 2009) and superimposed onto this network protein phosphorylation and abundance data from each tissue. The accompanying figure shows 3 networks composed of the nearest-neighbor interactors for the proteins Syk, Vamp1, and Bad.
Each of these interaction networks displays distinct protein expression and phosphorylation patterns. Syk (spleen tyrosine kinase) and its interactors display tissue-specific phosphorylation that mostly correlates with protein expression. Syk is a tyrosine kinase that is active in B and T cells during immune responses and is also expressed in kidney, heart, brain, and lung (
Duta et al., 2006;
Ulanova et al., 2005). Accordingly, the most phosphorylation was found in spleen and lung, which also contain the most expressed proteins from this network; in contrast, liver, pancreas, brown fat, and testis show low network expression and phosphorylation. The high phosphorylation observed for Syk and its interactors in spleen and lung reflects the immune activities of splenic lymphocytes and airway epithelia. Furthermore, many network proteins, including Syk, were expressed and phosphorylated in heart, while kidney showed both expression and phosphorylation of network proteins, but not Syk itself.
In contrast to Syk, Vamp1 and its interactors are expressed in all tissues, though brain shows dramatically increased network phosphorylation. Various Vamp isoforms are expressed in nearly every tissue, where they participate in vesicular trafficking; however, Vamp1 and Vamp2 are specific to brain and participate in neurotransmitter release (
Chen and Scheller, 2001). The extensive phosphorylation of Vamps and interacting proteins in brain suggests that phospho-regulation has enabled adaptation of widely distributed cellular machinery to support neural functions.
While the previous networks display variable and tissue-specific protein expression and phosphorylation, Bad (Bcl2-associated agonist of cell death) and its interactors exhibit remarkably consistent expression and phosphorylation. Bad is a pro-apoptotic protein that regulates mitochondrial metabolism and, when unphosphorylated, can trigger cell death (
Danial, 2009). Because apoptotic machinery is found in essentially every cell type, ubiquitous detection of this network is not surprising. Furthermore, the uniformly high phosphorylation is consistent with healthy, mature tissues whose cells are unlikely to undergo apoptosis.
Combining phosphorylation data with signaling network maps reveals tissue-specific differences
Most cellular signaling networks rely on sequential and coordinated phosphorylation of constituent pathway proteins to relay and amplify the initial signal; these pathways are found in virtually all cells and are required for sensing and responding to environmental cues. We investigated one of the most ubiquitous kinase cascades, the MAP kinase pathway, as it mediates cellular responses to growth factors and other survival and proliferation cues. To survey differences in MAPK signaling among tissues, we overlaid each tissue’s phosphoproteomic profile onto the KEGG database (
Kanehisa et al., 2010) MAPK pathway (). As expected for a central signaling pathway, much of the network was globally utilized; however, tissue-specific patterns were also apparent. Although signaling from Mras to Erk1 was found in almost all tissues, Mek1 was phosphorylated in brain and kidney, while Mek2 was modified in liver, lung, pancreas, and testis. These differences are post-translationally controlled, as unmodified Mek1 and Mek2 were detected in most tissues. These observations suggest avenues for future study that will elucidate how tissue-specific phenotypes are achieved through ubiquitous pathways.