Phosphorylation of proteins is a predominant reversible post-translational modification. It is central to a wide variety of physiological responses and signaling mechanisms. Recent advances have allowed the global scope of phosphorylation to be addressed by mass spectrometry using phosphoproteomic approaches. In this perspective, we discuss four aspects of phosphoproteomics: insights and implications from recently published phosphoproteomic studies, and applications and limitations of current phosphoproteomic strategies. As about 50,000 known phosphorylation sites do not yet have any ascribed function, we present our perspectives on a major function of protein phosphorylation that may be of predictive value in hypothesis-based investigations. Finally, we discuss strategies to measure the stoichiometry of phosphorylation in a proteome-wide manner, which is not provided by current phosphoproteomic approaches.
Phosphorylation of proteins is central to a wide variety of metabolic, hormonal, developmental, and stress responses. It is an extensively employed mode of signal transduction, frequently involving cascades of phosphorylation events among kinases. Importantly, protein phosphorylation has been a cornerstone in accelerating our understanding of health and disease [3,4]. Further, protein kinases have gained increasing focus as therapeutic targets for the treatment of cancer and chronic inflammatory diseases [5,6]. Accordingly, over the last decade and a half, analysis of protein phosphorylation has been of significant interest to practitioners of mass spectrometry. Several approaches have been developed for the identification of phosphorylation sites [8–11]. These approaches take advantage of different types of chemistries for enrichment of phosphorylated peptides/proteins, and different technologies for ionization, fragmentation and mass-analysis of phosphorylated peptides. A general schematic for phosphorylation analysis by mass spectrometry is provided in Figure 1. We point the reader to several excellent reviews on the details of methodological aspects [9,12–15].
Phosphorylation of proteins is one of the most predominant reversible post-translational modifications in eukaryotes . The human genome encodes 518 protein kinases, encompassing 1.7% of the genome . Based on traditional double radiolabeling experiments, it was estimated that nearly a third of all encoded proteins are phosphorylated in mammalian cells [1,16]. The global scope of phosphorylation can now be addressed by mass spectrometry as a result of more recent advancements such as sequential enrichment strategies, assessment of false-discovery rate of identification, and automated assignment of site localization in phosphopeptides [17–21]. This has paved the way to a new sub-discipline called phosphoproteomics. In this article, we summarize general insights gained from the phosphoproteomic studies. We highlight applications of phosphoproteomics in making advances in cell biology and in settings of direct clinical relevance. We also offer our perspectives on the fundamental functions of protein phosphorylation. We present arguments and observations that point to modulation of protein-protein interaction (PPI) as a widespread function of serine/threonine phosphorylation. Finally, we point out that lack of information on stoichiometry of phosphorylation is a key limitation of the current phosphoproteomic approaches and present our thoughts on how to measure it in a proteome-wide manner.
To date, large phosphoproteomic datasets with thousands of novel phosphorylation sites have been compiled for different model organisms and human cell lines [10,19,20,22–28]. Bioinformatic analyses of these phosphoproteomic datasets point to several global trends. We discuss the major global trends consistently observed in multiple datasets or by combining multiple datasets from different model organisms or biological sources. We propose that these trends are general, emergent features of most eukaryotic phosphoproteomes. We have been careful in generalizing the global trends observed in the different phosphoproteomic datasets as several factors influence the final compilation of identified phosphopeptides and phosphorylation sites.
The linear stretch of residues around the site of phosphorylation is a major determinant of kinase specificity towards protein substrates. Consensus linear recognition motifs have been defined for many kinases and kinase sub-families. Phosphoproteomic datasets have been classified based on these kinase-recognition motifs. Such classification has revealed that proline-directed, acidiphilic and basophilic phosphorylation sites are the predominant classes in eukaryotic cells [19,20,24,26–28]. Which among these is the most predominant likely depends on the particular biological state, and thereby on the activity status of kinases in the sample used for analysis. Similarly, further sub-classifications based on kinase recognition motifs do not show consistent trends across multiple datasets. The choice of different methodologies has also contributed to the lack of consistent trends [10,22,26,28]. For example, enrichment of phosphopeptides using titanium dioxide (TiO2) can be influenced by the presence of acidic and basic residues more than other methods [25,31]. Further, immobilized metal affinity chromatography (IMAC) enriches multi-phosphorylated peptides better than TiO2 and phosphoramidate chemistry (PAC) [9,31]. In contrast, PAC and TiO2 enrich mono-phosphorylated peptides better than IMAC [9,31]. Similarly, evidence suggests that the use of endo-LysC and electron-transfer dissociation (ETD) will make more basophilic phosphorylation sites amenable to identification than the use of trypsin and collision activated dissociation (CAD). Thus, different approaches for phosphoproteomic analysis are complementary, and a compilation from multiple approaches is regarded as more comprehensive [25,31–33].
Protein kinases themselves are the most over-represented class of phosphoproteins in multiple model organisms [10,22,24]. Similarly, more transcriptional regulators are targeted for phosphorylation than expected from their representation in the genomes of model organisms [10,22,24]. Both these observations are consistent with the classical signaling paradigm wherein phosphorylation cascades control the activation of multiple kinases culminating in the activation of transcription factors. In HeLa and Jurkat T-cell lines, more nuclear proteins tend to be targeted for phosphorylation than proteins localized in other sub-cellular organelles [19,24].
Both in yeast and in the Jurkat T-cell line, phosphoproteins participate in more PPIs overall and among themselves than expected from randomly chosen proteins irrespective of their phosphorylation status [10,24]. Network analyses of PPI datasets assembled from accumulated biological literature and large-scale yeast two-hybrid and affinity pull-down experiments were used to arrive at these conclusions. The observed correlation between phosphorylation and PPI has implications for signal transduction research and will be discussed further in a later section.
Several research articles and reviews on phosphoproteomics have implied that phosphorylation often occurs on signaling proteins of low abundance [9,34–36]. However, phosphoproteomic datasets from yeast, Drosophila, and the Jurkat T-cell line are not over-represented in proteins of lower abundance [10,22,24]. Instead, the profile of abundance distribution is similar for the phosphoproteome and the proteome. Interestingly, a recent bioinformatic analysis indicates that proteins of lower abundance in human cells tend to be phosphorylated at more sites than those of higher abundance. However, this observation needs to be tested more rigorously in multiple model organisms.
The knowledge that the linear stretch of residues around the site of phosphorylation is a major determinant of kinase specificity has structural implications. It suggests that phosphorylation tends to occur in unstructured regions of proteins. Accordingly, it has been observed that the ends of proteins, the ends of structured domains, loops and hinges in protein structures, and large disordered stretches have more phosphorylation sites [20,38–40]. Also, serine/threonine (S/T) phosphorylation sites are located more often on solvent-accessible surface residues [38,39]. A recent analysis also noted that phosphorylation sites tend to cluster near each other on the surface of multi-phosphorylated proteins [37,40].
Multiple studies have recently shown that S/T phosphorylation sites are conserved to a greater extent than expected for S/T residues irrespective of known phosphorylations [41–43]. These studies have used different sequence alignment approaches and statistical methods to investigate evolutionary conservation of phosphorylation sites. This evidence on conservation of phosphorylation sites further emphasizes the critical role of phosphorylation in physiology and also validates the efforts in the development of technologies for phosphoproteomic analyses.
Yachie et al. found that phosphoproteins in yeast and mammals followed a power-law distribution with respect to the number of phosphorylation sites. A power-law distribution reflects that most proteins have few phosphorylation sites, whereas a few proteins have a disproportionately high number of phosphorylation sites. The observation of a power-law distribution in the number of phosphorylation sites predicts that proteins with more phosphorylation sites tend to accumulate even more phosphorylation sites during evolution.
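The shape of a power-law distribution can be made concrete with a small numerical sketch. The exponent (gamma = 2.0) and the cap of 50 sites per protein are hypothetical values chosen only for illustration; they are not parameters reported by Yachie et al.

```python
def power_law_pmf(max_sites, gamma):
    """Return normalized probabilities P(k) proportional to k**(-gamma), k = 1..max_sites."""
    weights = [k ** -gamma for k in range(1, max_sites + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical parameters, chosen only to show the shape of the distribution.
pmf = power_law_pmf(max_sites=50, gamma=2.0)

# pmf[0] is P(1 site), pmf[1] is P(2 sites), and so on.
share_few = sum(pmf[:3])     # proteins with 1 to 3 sites
share_many = sum(pmf[19:])   # proteins with 20 or more sites

print(f"1-3 sites: {share_few:.1%}, 20+ sites: {share_many:.1%}")
```

Under these assumed parameters, most of the probability mass sits on proteins with very few sites, while a small but non-negligible tail carries many sites each.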
Availability of large-scale data has spawned and driven several sub-disciplines of bioinformatics in the past. On the evidence of investigations into the global trends discussed above, we anticipate that phosphoproteomic datasets will continue to feed bioinformatic analyses leading to new insights into the biology of protein phosphorylation. For example, even as this review was being compiled, important insights into the evolutionary aspects of phosphorylation-based regulation were obtained by novel bioinformatic analyses in recent months [44–47]. It should be noted that the global trends discussed above hold true mostly for S/T phosphorylation, and focused bioinformatic analyses of tyrosine phosphorylation sites are yet to be published.
Quantitative mass spectrometric techniques involving heavy isotope labeling have been adapted to quantify changes in the abundance of phosphopeptides and thereby changes in the extent of phosphorylation on identified phosphopeptides (Figure 1B). This approach has been particularly fruitful in the study of cell signaling systems, identification of substrates for kinases, and comparative analysis of oncogenic signaling. We discuss insights from these quantitative phosphoproteomic studies.
Olsen et al. captured the temporal dynamics of the phosphoproteome in response to Epidermal Growth Factor (EGF) stimulation in HeLa cells . Nearly a thousand phosphopeptides showed more than two-fold change in abundance over the five early time-points represented by different heavy isotope labels. This study revealed that the scope of stimulus-dependent changes in phosphorylation is much more than previously appreciated, both in terms of classes of proteins and number of proteins in each class. Apart from a large number of kinases, phosphatases, and transcription factors, the authors observed changes in phosphorylation on small GTPases, GTPase modulators, actin-binding proteins, RNA binding proteins and ubiquitin ligases. This study, along with others, also points to a prevalence of multisite phosphorylation during signaling [23,24].
We have used quantitative phosphoproteomic strategies to identify novel TCR-responsive phosphorylation sites and gain insights into T Cell Receptor (TCR) signaling. TCR signaling is central to maturation of T cells in the thymus and multiple aspects of the adaptive immune response. Using the Jurkat T cell line as a model system, we identified 10,665 unique phosphorylation sites, of which 696 showed TCR-responsive changes. Among these TCR-responsive sites, 70 were tyrosine phosphorylation sites. Our major finding is that TCR-responsive phosphorylation extensively targets proteins involved in all of the salient T cell activation-associated cellular phenomena: patterning of certain cell-surface proteins, TCR endocytosis, formation of the F-actin cup, activation of integrins by the ‘inside-out’ mechanism, polarization of microtubules, alternative splicing, and cytokine transcription. Although considerable understanding has been gained of the regulation of cytokine transcription, little is known about how all the other associated phenomena are initiated and coordinated upon TCR stimulation. We conducted a detailed analysis of novel S/T TCR-responsive phosphorylation sites with the aim of gaining potential insights into their role in TCR signaling. The results from a) conservation analysis, b) statistical analysis of network parameters, c) experiments on TCR-responsive phosphorylation of tubulins, d) analysis of sequence stretches responsible for PPIs, and e) extrapolation from many functionally characterized phosphorylation sites enabled us to deduce that S/T phosphorylation modulates PPIs in a system-wide fashion to regulate diverse stimulus-dependent processes during TCR signaling.
Dephoure et al. used large-scale quantitative phosphoproteomic approaches to examine cell-cycle dependent phosphorylation changes in HeLa cells. Cells arrested in G1 phase showed little change in phosphorylation, whereas cells arrested in M phase showed phosphorylation changes in over a thousand proteins when compared to asynchronously growing cells. The majority of the phosphorylation sites showing up-regulation at M phase could be grouped into classes of motifs conforming to the minimal substrate requirements of three major mitotic kinases, namely cyclin dependent kinase (Cdk1), Aurora Kinases, and Polo-like Kinase 1. Mitosis is accompanied by drastic changes to the cell. Processes associated with protein synthesis such as transcription, splicing and translation are inhibited. In addition, the nucleolus dissolves and the nuclear envelope breaks down during the entry into mitosis. The chromosomes condense extensively and are captured by microtubules to form the spindle structure. Finally, the sister chromatids separate during anaphase, culminating in cytokinesis. Apart from known mitotic phosphorylations, the authors observed that proteins involved in these processes were statistically over-represented among those phosphorylated extensively during mitosis. Thus, many of the novel mitotic phosphorylations may have a functional and regulatory role in the mitotic phenomena.
The three systems discussed above have been extensively studied over the last two decades. On one hand, the available information on these signaling systems has allowed rigorous assessment of the capabilities of large-scale quantitative phosphoproteomic technologies. Simultaneously the quantitative data point to an extensive role for phosphorylation in regulating a wide variety of associated processes and classes of proteins that was previously unanticipated or under appreciated. Thus, these quantitative datasets allow for the formulation of concrete hypotheses on the true functional significance of the responsive phosphorylation sites. Based on these studies, it is reasonable for us to anticipate that quantitative phosphoproteomics will pave the way towards further characterization of many other signaling systems, including those of cytokines, hormones, morphogens, and those responsible for innate immunity.
Several strategies and reagents exist to screen and identify substrates of protein kinases [16,30]. Advances in quantitative phosphoproteomics have provided hypothesis-free, proteome-wide screens for generating lists of putative in vivo substrates and their target sites of phosphorylation [16,30]. Matsuoka et al. aimed to identify putative substrates and target sites of ATM/ATR kinases that are activated upon DNA damage caused by ionizing radiation. They used a panel of antibodies directed against many known substrate sites of ATM/ATR, as these kinases are known to target SQ/TQ motifs. The antibodies were used to enrich phosphopeptides conforming to this motif, followed by quantitative mass spectrometry. More than 900 putative ATM/ATR substrate sites were identified in the study. The authors used siRNA to reduce the expression level of many of the novel candidates and provided evidence for their role in the DNA damage response by multiple phenotypic assays. Most of the putative substrates identified in the study are known to play a role in processes that are modulated upon DNA damage, such as DNA replication, recombination, repair, and the cell-cycle checkpoint. Stokes et al. refined the substrate screening strategy further by using a phospho SQ/TQ motif-specific antibody instead of a large panel of antibodies. Smolka et al. used kinase-null strains to identify putative substrates of three checkpoint kinases, Mec1, Tel1, and Rad53, in yeast upon DNA damage caused by a DNA alkylating agent.
We have defined putative substrates of Erk kinases during the initial phase of TCR signaling by pharmacological means (Figure 2) . U0126, a widely used inhibitor for Mek kinases was used to prevent activation of Erk and downstream kinases upon TCR stimulation. The frequency plot of the residues surrounding the phosphorylation sites with inhibitor-dependent response indicates that the candidates are likely to be bona fide substrates of Erk. Among the 52 novel putative Erk substrate target sites, eight are on proteins with known roles in TCR signaling. Among them we have validated T260 of Bcl11B to be a substrate of Erk using constitutively active and dominant negative forms of Mek .
Holt et al. used yeast strains expressing analog-sensitive kinases to identify 547 putative substrate sites of Cdk1 . Conservation analysis of these phosphorylation sites revealed that the majority of putative Cdk1 substrate sites are located in rapidly evolving disordered or unstructured regions of proteins and that these have shifted positions as clusters in orthologs of the ascomycete lineage. This observation has important implications for the evolution of signaling by phosphorylation and resultant modulation of PPIs.
Although generating cell lines expressing analog-sensitive alleles is not seamless in mammals, Kevan Shokat’s lab has made available a handful of such reagents [51,52]. Analog-sensitive kinases function as wild-type counterparts, but are rapidly inhibitable by a specifically designed ATP analog . These reagents provide highly specific means of inhibiting the desired kinase at any desired time-point for short periods. Phosphorylation changes observed with this experimental design are likely to be due to direct effects of the specific kinase, thus leading to the delineation of its substrate sites. But the approach may be biased towards substrate sites that are not under tight control of phosphatases .
Receptor Tyrosine Kinases (RTKs) function as oncogenes or ‘drivers’ both in solid tumors and hematologic cancers [4,54–57]. The therapeutic strategy of targeted inhibition of RTKs has been partially successful. However, resistance towards tyrosine kinase inhibitors has been a major problem [56,58]. A better understanding of how tumor cells uniquely rely on oncogenic signaling, and of how sensitivity or resistance towards RTK inhibitors develops in tumors, is important for improvements in treating various malignancies. Here we discuss some of the recent quantitative phosphoproteomic studies that have contributed considerably towards this goal.
In order to better characterize phosphotyrosine signaling and identify putative RTK oncogenes in lung cancer, Rikova et al. conducted extensive profiling of phospho-tyrosine signaling in a large panel of non-small cell lung cancer (NSCLC) cell lines and tumors . Peptides containing phosphotyrosine were enriched by immunoprecipitation and identified by mass spectrometry. The authors rank ordered RTKs based on the accumulated spectral count of all the constituent tyrosine phosphopeptides in each cell line or tumor sample. The rationale behind this approach was that the cumulative spectral count would serve as an index of the activity of RTKs and thus allow the identification of oncogenic RTKs. In known cases, oncogenic RTKs were found to be ranked high, validating their approach. Along with the known oncogenic kinases, ALK, ROS, PDGFRα, and DDR were most frequently found to be highly phosphorylated and active in the NSCLC cell lines and tumor samples. Of these, the authors showed that ROS and PDGFRα can act as ‘driver’ kinases in lung cancer. Another important observation was that NSCLC tumors expressed different combinations of active RTKs suggesting a need for individualized therapies.
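The rank-ordering idea described above, summing the spectral counts of all tyrosine phosphopeptides belonging to each kinase and sorting by the total, can be sketched as follows. This is a minimal illustration and not the authors' pipeline; the peptide labels and counts are invented placeholders.

```python
from collections import defaultdict


def rank_kinases_by_spectral_count(observations):
    """Rank kinases by the summed spectral counts of their phosphopeptides.

    observations: iterable of (kinase, phosphopeptide, spectral_count)
    tuples for a single cell line or tumor sample.
    """
    totals = defaultdict(int)
    for kinase, _peptide, count in observations:
        totals[kinase] += count
    # Highest cumulative count first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical observations for one sample (placeholder peptides and counts).
sample = [
    ("EGFR", "pep_a", 12),
    ("EGFR", "pep_b", 7),
    ("ALK", "pep_c", 30),
    ("MET", "pep_d", 4),
    ("ALK", "pep_e", 5),
]

ranking = rank_kinases_by_spectral_count(sample)
for kinase, total in ranking:
    print(kinase, total)
```

In this toy sample the kinase with the largest cumulative count is ranked first, which is the sense in which the cumulative spectral count serves as an index of kinase activity.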
Among NSCLC patients, only ~15% respond clinically to EGFR inhibitors and these responding patients frequently have tumors with activating mutations or genomic amplifications in EGFR. Guo et al. sought to understand the underlying mechanisms of sensitivity to EGFR inhibitors in tumors with high EGFR activity using NSCLC model cell lines . The authors conducted quantitative phosphoproteomic analysis to identify tyrosine phosphorylations that decrease in response to EGFR inhibitors. In cell lines sensitive to EGFR inhibitors, the oncogenic EGFR activates multiple RTKs, including c-Met. Similar coactivation of other RTKs by the driver RTK has been observed in gastric cancer, glioblastoma, and other adenocarcinoma [58,60–62]. These studies highlight the need for targeting multiple RTKs in cancer therapy . The above results also explain why some of the first generation tyrosine kinase inhibitors that were eventually recognized to inhibit more than one tyrosine kinase were remarkably effective in the clinic .
EGFR is frequently mutated in glioblastoma, and 50% of tumors have a truncated, constitutively active form called EGFRvIII. Huang et al. used a cell line model expressing varying amounts of EGFRvIII to understand the unique signaling properties of the oncogenic form. From quantitative phosphoproteomic analysis of tyrosine phosphorylation, the authors noted that only the mutated form, and not the wild-type form, of EGFR robustly activates PI3K. The authors also observed co-activation of c-Met that correlated with the level of expression of EGFRvIII. Their results from pharmacological experiments indicate that cotreatment strategies based on targeting of c-Met may be a viable approach to treat glioblastoma patients with EGFRvIII deletion mutations.
To date, more than 50,000 phosphorylation sites have been discovered in the human proteome, with the majority being S/T sites (Source: www.phosphosite.org). In stark contrast, only an estimated 3000 of these phosphorylation sites have been functionally characterized to some extent. It is compelling to ask what functions these thousands of newly discovered S/T phosphorylation sites serve. While one may argue that individual researchers focusing on particular proteins are best equipped to address this question rigorously in individual cases, we feel it is worthwhile to consider the above question in general. Often, a commonly observed mechanism serves as a basis for further fruitful hypothesis-based experimentation on more such instances in nature. For example, functional characterization in many signaling systems and cell types has been aided by the knowledge that most tyrosine phosphorylations recruit SH2 domain containing proteins in a sequence-restricted manner. Are there similar rules or themes for signaling by S/T phosphorylations that may be of predictive value? We think that the major function is modulation of PPIs: inducing intra- or inter-molecular phosphorylation-dependent association or disruption. We present our arguments for this speculation below.
First, phosphoproteins participate in more PPIs than expected by chance alone [10,24]. Similarly, the case-by-case analysis as well as network analyses of proteins with TCR-responsive S/T phosphorylations point to system-wide modulation of PPIs during TCR signaling. When tested experimentally, we found that all the known and novel S/T phosphorylation sites on tubulins abrogate their incorporation into microtubules. Based on the available crystal structure data, it appears that the S/T phosphorylation sites do not favor the three modes of PPIs during microtubule assembly. Further, we have observed that proteins belonging to large macromolecular complexes such as the spliceosome and the nuclear pore complex have more S/T phosphorylation sites than expected from an average of about 3 phosphorylation sites per protein. As extensive physical interactions are known to exist among proteins in macromolecular complexes, it is reasonable to speculate that S/T phosphorylations modulate these interactions. In fact, it has recently been proposed that extensive phosphorylations observed on nuclear pore proteins are responsible for the dissolution of the nuclear pore complex during mitosis. Similarly, many proteins that are part of other macromolecular complexes such as the kinetochore, cohesin complex, APC, and the pre-replication complex are also phosphorylated during mitosis. Finally, the majority of S/T phosphorylation sites are found in disordered regions of proteins, which are also the regions mediating the majority of protein-protein interactions [40,46,47,65].
Accumulated knowledge on the role of S/T phosphorylation also points to modulation of PPIs as the predominant function. In our case-by-case analysis of TCR-responsive S/T phosphorylations, we found that the majority of the functionally characterized sites were known to modulate PPIs. Further, several conserved domains have been discovered that bind to S/T phosphorylation motifs to mediate PPIs. While S/T phosphorylation has been documented to regulate protein half-life and intracellular localization in many cases, both are ultimately a consequence of modulation of PPIs. Based on all the reasoning presented above, we propose that a major function of the large majority of S/T phosphorylation sites is modulation of PPIs. One apparent difference between tyrosine and S/T phosphorylation is that S/T phosphorylation seems to influence protein-protein association as well as disruption, while tyrosine phosphorylation has been recognized as promoting association in most cases via SH2 domains [24,63]. As S/T phosphorylation events are much more pervasive in cells, it is reasonable to speculate that protein complexes that are assembled due to phosphorylation-dependent associations (on S/T/Y residues) may also be disrupted by other site-specific S/T phosphorylation events. Thus, during a signal transduction event that initiates from the plasma membrane-associated kinases and permeates to the nucleus, a series of interlinked and dynamic phosphorylation-dependent assembly and disassembly of protein complexes may serve as an underlying mechanism of information transfer.
Stoichiometry of phosphorylation refers to the extent of phosphorylation or the fraction of protein that is phosphorylated at a given site. Traditionally, for most functional studies, a relative-change or fold-change in the stoichiometry of phosphorylation between different biological states is sufficient. While it may seem that measuring the change in stoichiometry is more crucial, knowledge on the actual stoichiometry is also valuable as elaborated below. Also, while biochemical methods based on radioisotope labeling are available [67–69], we present arguments for the need to quantify the stoichiometry of phosphorylation in a site-specific and proteome-wide manner.
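The distinction between actual stoichiometry and fold-change in stoichiometry can be illustrated with a short calculation. The amounts below are hypothetical, in arbitrary units, and the function name is ours; the sketch simply applies the definition given above (phosphorylated fraction of the total protein at a given site).

```python
def stoichiometry(phospho, nonphospho):
    """Fraction of a protein that is phosphorylated at one site (0.0 to 1.0)."""
    total = phospho + nonphospho
    if total <= 0:
        raise ValueError("total amount must be positive")
    return phospho / total


# Hypothetical site 1: occupancy rises from 5% to 20% between two states.
low_before = stoichiometry(5, 95)
low_after = stoichiometry(20, 80)

# Hypothetical site 2: occupancy rises from 20% to 80% between the same states.
high_before = stoichiometry(20, 80)
high_after = stoichiometry(80, 20)

fold_low = low_after / low_before
fold_high = high_after / high_before

print(f"site 1: {low_before:.0%} -> {low_after:.0%}, fold-change {fold_low:.1f}")
print(f"site 2: {high_before:.0%} -> {high_after:.0%}, fold-change {fold_high:.1f}")
```

Both hypothetical sites show the same fold-change, yet one remains mostly unphosphorylated while the other becomes mostly phosphorylated. A fold-change measurement alone cannot distinguish these two situations.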
It is becoming increasingly clear that intracellular signaling involving kinases and phosphatases occurs through networks of interactions and not just linear pathways. Further, components of the classically defined linear pathways have been shown to participate in feedback and feed-forward regulation, and also to bind to scaffolding proteins [71,72]. Mathematical and computational models have shown that these features of complexity allow signaling networks to not only transmit but also process, encode, and integrate extra- and intra-cellular signals [71,72]. However, in order to gain a comprehensive and quantitative understanding of the mechanisms, we need data-driven approaches to model-based analysis of intracellular signaling. Herein, information on the stoichiometry of phosphorylation is probably the most important input for data-driven modeling. Take, for example, the MAPK module comprising three kinases that are activated in a cascade. The MAPK module has been explored extensively by modeling analyses and shown theoretically to possess versatile signal processing capabilities. Yet even fundamental systems-level properties such as amplification and ultrasensitivity, which are essential for the switch-like stimulus response, have not been characterized in cellulo due to the lack of data on stoichiometry of phosphorylation.
Multisite phosphorylation of proteins is a recurrent theme wherein the status of phosphorylation at one site controls that of another site [67,75,76]. Thus multisite phosphorylation enables intricate regulation of protein function and is an important feature of complexity of intracellular signaling. Rules for the regulation of multisite phosphorylation are currently interpreted from the measured fold-changes in phosphorylation on mutant forms of proteins . Information on stoichiometry of phosphorylation allows direct and simultaneous comparison of multiple sites and thereby enables a more comprehensive and conclusive deciphering of rules for multisite phosphorylation.
It has recently been proposed that many of the known phosphorylations are non-functional or silent. Unfortunately, the non-functionality of a specific phosphorylation site cannot be proven definitively by experimentation, as one cannot exclude the possibility that the site is functional in biological contexts other than those tested. For example, S454 of ATP:citrate lyase was generally considered to be a non-functional phosphorylation for 25 years, before a function was ascribed to it in 2000. Despite the difficulties in ascertaining non-functionality, three major arguments have been put forward to support the possible existence of a vast number of non-functional phosphorylations. One of them centers on the stoichiometry of phosphorylation: it can be argued that the majority of the phosphorylations discovered by mass spectrometry have low stoichiometry, and are detected only because of extensive enrichment of phosphopeptides and the improved sensitivity of mass spectrometers. Thus, even a phosphorylation with a deleterious effect will effectively be non-functional if its stoichiometry is low. Such non-functional phosphorylations can persist and accumulate due to the lack of evolutionary pressure to eliminate them. While this argument is likely to be valid in isolation, it presupposes that functional phosphorylations have high stoichiometry. Multiple counter-arguments can be made to suggest that the observed low-stoichiometry phosphorylation sites are functional in vivo. a) Recent studies have demonstrated that even in isogenic populations there is considerable cell-to-cell variability in states and responses [79,80]. Accordingly, one can expect a population of cells to show a wide distribution of individual stoichiometries, the average of which, as obtained during sample preparation, can be low. b) Under many circumstances, as in neuronal and immunological synapses, phosphorylation changes in a small sub-cellular space dictate biological phenomena. Because a large population of cells is disrupted and prepared for mass spectrometric analysis, the measured stoichiometry of phosphorylation will be low. c) If a certain phosphorylation modulates enzymatic activity, then even a low stoichiometry of phosphorylation can have a significant biological consequence. In order to rigorously assess all the above arguments, we urgently need stoichiometry data on hundreds of phosphorylations with known functional consequences. Similarly, we need stoichiometry data on phosphorylation sites that are conserved to different degrees across phyla to establish clarity on issues relating to function, evolution and stoichiometry of phosphorylation. Unfortunately, stoichiometry of phosphorylation is rarely quantified. To our knowledge, quantitative data on in cellulo stoichiometry of phosphorylation are available for only ~10 sites, and sites with ~5% phosphorylation have proven functional consequences [67,68,81].
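Counter-arguments (a) and (b) both rest on simple averaging, which can be illustrated with a short calculation. The sketch below is only an illustration: the function name and all numbers are hypothetical, and it assumes that every sub-population (or sub-cellular compartment) contributes protein in proportion to its fraction of the sample.

```python
def population_average_stoichiometry(fractions, stoichiometries):
    """Abundance-weighted mean site occupancy over sub-populations.

    fractions       -- share of the total protein contributed by each
                       sub-population or compartment (must sum to 1)
    stoichiometries -- fraction of the protein phosphorylated at the site
                       within each sub-population
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(f * s for f, s in zip(fractions, stoichiometries))


# Hypothetical example: 5% of the cells respond and reach 90% occupancy,
# while the remaining 95% stay at a basal 1% occupancy.
bulk = population_average_stoichiometry([0.05, 0.95], [0.90, 0.01])
print(f"bulk stoichiometry measured after lysis: {bulk:.2%}")
```

Under these assumed numbers the lysate reports an occupancy of roughly 5%, even though the site is almost fully phosphorylated in the responding cells.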
Availability of proteome-wide abundance data in yeast has helped benchmark proteomic profiling technologies and acted as a guide for their incremental improvement [82,83]. Having proteome-wide stoichiometry data on phosphorylation in yeast will similarly help benchmark phosphoproteomic technologies and give a better understanding of the chemical mechanisms behind these methods. This is especially crucial because many options are available for protease treatment, phosphopeptide enrichment, and fragmentation of phosphopeptide ions.
As described earlier, current phosphoproteomic methods rely on enriching phosphopeptides and disregarding their non-phosphorylated counterparts. Fold-change in the stoichiometry of phosphorylation on a proteome-wide scale is measured by enriching phosphopeptides and calculating their relative abundance based on incorporated heavy isotope labels (Figure 1B). Here the phosphopeptides are identified from tandem mass spectra acquired in a data-dependent fashion, and the corresponding parent ion traces are used to quantify their relative abundance between different biological states. This approach works well if the phosphopeptide contains a single known phosphorylation site, as the fold-change in phosphopeptide abundance then equates to the fold-change in the stoichiometry of phosphorylation, provided the total abundance of the protein is unchanged. However, the approach has limitations if there are multiple phosphorylations (Figure 3). To assign the fold-change in stoichiometry at each site in a multiphosphorylated peptide, one needs information on the relative abundance of each of its phosphorylated versions in the different biological states, which current approaches do not provide (Figure 3C). Further, quantitative phosphoproteomic studies on cell signaling systems have revealed that the scope of phosphorylation is widespread [19,23,24]. But without information on the actual stoichiometry of phosphorylation on a global scale, it is not possible to understand how a change in the activity status of a few protein species permeates into system-wide changes in phosphorylation. Thus, lack of information on the stoichiometry of phosphorylation is a key limitation of current phosphoproteomic approaches.
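The limitation for multiphosphorylated peptides can be made concrete with a small worked example. The sketch below uses hypothetical molar abundances for a peptide carrying two sites, labeled A and B here for illustration, and assumes the total amount of the peptide is the same in both states. It shows that the fold-change of an individual phosphopeptide version need not match the fold-change in occupancy at any single site.

```python
def site_stoichiometry(forms):
    """Per-site occupancy from the molar abundance of every peptide version.

    forms maps a tuple of phosphorylated site labels to an abundance;
    the empty tuple () stands for the unphosphorylated peptide.
    """
    total = sum(forms.values())
    sites = sorted({site for version in forms for site in version})
    return {
        site: sum(amount for version, amount in forms.items() if site in version) / total
        for site in sites
    }


# Hypothetical abundances (arbitrary molar units) in two biological states.
state_1 = {(): 80.0, ("A",): 10.0, ("B",): 5.0, ("A", "B"): 5.0}
state_2 = {(): 60.0, ("A",): 10.0, ("B",): 5.0, ("A", "B"): 25.0}

occupancy_1 = site_stoichiometry(state_1)
occupancy_2 = site_stoichiometry(state_2)

# Fold-change per phosphopeptide version, as an enrichment-based experiment reports it.
peptide_fold = {v: state_2[v] / state_1[v] for v in state_1 if v}
# Fold-change in actual site occupancy, which needs all versions at once.
site_fold = {s: occupancy_2[s] / occupancy_1[s] for s in occupancy_1}

print("per-version fold-change:", peptide_fold)
print("per-site fold-change:   ", site_fold)
```

With these assumed numbers the singly phosphorylated versions do not change and the doubly phosphorylated version rises five-fold, whereas occupancy at site A rises from 15% to 35% and at site B from 10% to 30%. Neither site-level fold-change can be read off any single version.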
We propose to enrich a panel of endogenous proteins with affinity-tags from a library of cells in order to measure the stoichiometry of phosphorylation in a proteome-wide manner (Figure 4A). Libraries have been developed for baker’s yeast and the H1299 NSCLC cell line wherein affinity-tagged versions of over a thousand proteins are transcribed from their natural chromosomal locations [80,83]. These resources enable the application of high-throughput strategies for simultaneous affinity-based enrichment of over a thousand endogenous proteins and subsequent proteolysis for mass spectrometric analysis. The activity of endogenous phosphatases must be minimized during affinity enrichment and proteolysis to avoid underestimating the stoichiometry of phosphorylation.
There are three established MS-based approaches for measuring the stoichiometry of phosphorylation (Figure 4, B to D). In the most straightforward strategy, an identical sample that is labeled with a heavy isotope is subjected to complete dephosphorylation by alkaline phosphatase. The abundance of the unphosphorylated peptide in the untreated sample, relative to that in the dephosphorylated sample, gives the unphosphorylated fraction and hence the stoichiometry of phosphorylation (Figure 4B). Though straightforward, this approach has three major limitations. Firstly, it is valid only for peptides with a single known phosphorylation site. Secondly, the efficiency of proteolytic cleavage at the desired residues is known to be influenced by neighboring phosphorylated residues, which may lead to inaccurate measurement of stoichiometry. Thirdly, the results for low-stoichiometry phosphorylation will be inaccurate due to the variability in the measurement of ion currents. Irrespective of the type of instrumentation and mass analysis used, we anticipate that measurements will be unreliable for stoichiometries below 10%. Nonetheless, we feel this approach is useful for finding out how many of the phosphorylation sites are of low stoichiometry.
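The arithmetic behind the dephosphorylation strategy, and the reason it degrades at low stoichiometry, can be sketched as follows. This is a minimal illustration rather than a published protocol: the function names are hypothetical, the untreated and phosphatase-treated samples are assumed to be mixed in equal amounts, dephosphorylation is assumed to be complete, and the 10% measurement error used in the example is an arbitrary value.

```python
def stoichiometry_from_dephosphorylation(untreated_unphos, treated_unphos):
    """Single-site stoichiometry from unphosphorylated-peptide ion currents.

    untreated_unphos -- signal of the unphosphorylated peptide in the
                        untreated sample
    treated_unphos   -- signal of the same peptide in the fully
                        dephosphorylated, heavy-labeled sample
    """
    unphosphorylated_fraction = untreated_unphos / treated_unphos
    return 1.0 - unphosphorylated_fraction


def relative_error_in_stoichiometry(true_stoichiometry, ratio_relative_error):
    """First-order relative error of the estimate for a given ratio error.

    The stoichiometry is 1 - ratio, so an absolute error on the ratio carries
    over unchanged and is then divided by a small number when occupancy is low.
    """
    ratio = 1.0 - true_stoichiometry
    return ratio * ratio_relative_error / true_stoichiometry


# Hypothetical signals: 70 units untreated versus 100 units after phosphatase.
print(stoichiometry_from_dephosphorylation(70.0, 100.0))

# Assumed 10% relative error on the measured ratio, at different occupancies.
for s in (0.50, 0.10, 0.05):
    err = relative_error_in_stoichiometry(s, 0.10)
    print(f"true stoichiometry {s:.0%}: relative error of estimate {err:.0%}")
```

Under these assumptions the same 10% uncertainty in the ion-current ratio translates into an error of about 10% of the estimate at 50% occupancy, but about 190% at 5% occupancy, which is consistent with the expectation that values below 10% are unreliable.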
Steen et al. have developed an isotope-free approach to quantify stoichiometry that can be extended to multiphosphorylated peptides (Figure 4C). The crux of the approach lies in calculating the ‘flyability’ ratio of each phosphopeptide version with respect to its unphosphorylated counterpart. The ‘flyability’ ratio can be calculated by subjecting aliquots of a sample to controlled dephosphorylation or by using the corresponding set of synthetic peptide standards. The authors note that ‘flyability’ ratios calculated from the cost-effective approach of controlled dephosphorylation are reliable only if the actual stoichiometry of phosphorylation in the original sample is more than 10%. For the alternative approach, peptides can now be synthesized in a high-throughput fashion at nominal rates of $25 per peptide. An important note of caution on the isotope-free method is that it has not yet been used to measure in vivo stoichiometry of phosphorylation on endogenous proteins. Finally, the use of known amounts of Stable Isotope-labeled Standard (SIS) peptides to quantify the abundance of the different versions of the phosphopeptide is probably the most accurate method available to measure stoichiometry of phosphorylation (Figure 4D) [67,68,87]. But the cost of procuring thousands of SIS peptides that have been purified and quantified accurately is a constraint on employing this strategy.
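One way to read the ‘flyability’ idea for the simplest case of a single site is sketched below. This is an assumption-laden simplification for illustration, not a reproduction of the published procedure: the function names are hypothetical, signals are assumed to scale linearly with molar amount, and the total amount of peptide is assumed to be identical in the original and the partially dephosphorylated aliquot. Under those assumptions, every mole of phosphopeptide lost reappears as unphosphorylated peptide, which fixes the ratio of the two response factors.

```python
def flyability_ratio(unphos_before, phos_before, unphos_after, phos_after):
    """Response of the phosphopeptide relative to the unphosphorylated peptide.

    'before' and 'after' are ion currents measured in the original aliquot and
    in an aliquot after controlled (partial) dephosphorylation.
    """
    gained_unphos = unphos_after - unphos_before
    lost_phos = phos_before - phos_after
    if gained_unphos <= 0 or lost_phos <= 0:
        raise ValueError("dephosphorylation produced no measurable shift")
    return lost_phos / gained_unphos


def stoichiometry_from_flyability(unphos_signal, phos_signal, ratio):
    """Occupancy after correcting the phosphopeptide signal by the ratio."""
    phos_equivalent = phos_signal / ratio
    return phos_equivalent / (unphos_signal + phos_equivalent)


# Hypothetical signals: the phosphopeptide 'flies' half as well as its
# unphosphorylated counterpart, and true occupancy in the original is 30%.
ratio = flyability_ratio(unphos_before=70.0, phos_before=15.0,
                         unphos_after=85.0, phos_after=7.5)
occupancy = stoichiometry_from_flyability(70.0, 15.0, ratio)
print(f"flyability ratio {ratio:.2f}, stoichiometry {occupancy:.0%}")
```

The sketch also suggests why the controlled-dephosphorylation route struggles at low occupancy: the ratio is a quotient of two small signal differences, so measurement noise dominates when little phosphopeptide is present to begin with.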
As mentioned previously, over 50,000 phosphorylation sites have been catalogued in human cells, of which about 3000 may have been functionally characterized. Although pinpointing the role of a phosphorylation can be very hard, we hope there will be a concerted effort to functionally characterize the rest of them. As discussed before, this is crucial to advance our understanding of the issues relating to function, evolution, and stoichiometry of phosphorylation.
In the previous sections, we have highlighted that the lack of information on stoichiometry of phosphorylation is a key limitation of the current phosphoproteomic approaches. There are many other limitations, however, and further advancements are desired in the coming years. One aspect in which considerable improvement is still needed is the amount of sample required for quantitative phosphoproteomic experiments. Currently, around 100 million cells are required if one aims to monitor thousands of phosphorylation sites. Thus, most of the newly discovered phosphorylation sites have come from a few transformed human cell lines [17–19,23,24,33]. In contrast, we are aware of very few reports on the large-scale phosphoproteomic analysis of clinical samples or primary cells and tissues, which are likely to deliver important insights. We also anticipate that such studies will identify thousands of novel phosphorylation sites, as the repertoire of cellular states is going to be much richer in clinical samples and primary cells/tissues. With these studies, we may obtain a better estimate of the total size of mammalian phosphoproteomes.
As mentioned before, multisite phosphorylation is a recurrent theme in signal transduction [67,68,75,76]. Multisite phosphorylation can be employed for combinatorial control of protein function, wherein specific combinations of phosphorylations code for distinct outcomes, as in histone modifications [67,76]. The current shotgun approach to phosphoproteomics precludes the analysis of phosphorylation sites on entire proteins. ‘Top-down’ or ‘middle-down’ schemes are necessary for the discovery of these distinctly phosphorylated subsets of proteins.
Most of the quantitative phosphoproteomic studies published to date have compared two or three biological states. We hope that future studies will seek better temporal resolution to gain an understanding of how a change in the activity status of a few protein species permeates into system-wide changes in phosphorylation. Another related aspect is the measurement of phosphorylation turnover rates in a proteome-wide fashion. Computational modeling studies have established that intracellular information is encoded not only in states, but also in rates [71,72]. Thus, efforts to measure phosphorylation turnover rates are paramount to further our understanding of signaling mechanisms.
Quantitative phosphoproteomics has provided very effective tools to identify putative substrates of kinases. Currently, lists of putative substrates are generated from a single end-point comparison. However, there is enough evidence to suggest that many kinases have different sets of substrates depending on the biological context [51,52]. We anticipate that analog-sensitive alleles will be of immense utility in unraveling such context-dependent activity of kinases over a time course. We also hope that quantitative phosphoproteomics will be applied towards cataloguing putative substrates of phosphatases, which are increasingly being implicated in disease pathology [90–92]. Further, the principles of substrate recognition by protein phosphatases are becoming increasingly clear [90,92], which allows for more rational design of quantitative phosphoproteomic screens for phosphatase substrates.
Particular sets of kinases and phosphatases either sensitize cancer cells or render them resistant to different types of insults [93,94]. Quantitative phosphoproteomics will be useful in understanding how such effects of kinases and phosphatases are manifested in cancer cells [56,93]. Protein kinases are given foremost importance as targets for therapies against cancers [4–6,58]. Many agents that inhibit the function of protein kinases are in clinical practice and many more are in clinical trials [4–6,58]. We anticipate that quantitative phosphoproteomics will help in providing a good understanding of their mode of action and off-target effects, as well as differences among their variants. Similarly, quantitative phosphoproteomics may help reveal fundamental mechanisms of the frequently observed resistance to kinase inhibitors [56,58]. Most importantly, we anticipate that quantitative phosphoproteomic strategies will drive the identification of phosphorylation-dependent, dynamically interacting protein networks such as the lymphoma/leukemia/hematopoiesis protein network that we have deduced. We propose that identification of disease-specific signaling protein networks will serve as a blueprint for rational drug design aimed at specifically inhibiting multiple key signaling nodes [54,93].
The authors’ work discussed here was supported by NIH Grants RO1 HL 67569 and PO1 HL 70694 to DKH. The authors also acknowledge critical input from the anonymous reviewers.
Viveka Mayya, Center for Vascular Biology, Department of Cell Biology, University of Connecticut Health Center, 263 Farmington Ave., Farmington, CT 06030, Tel: 860-679-2775, Fax: 860-679-1201, Email: email@example.com.
David K. Han, Center for Vascular Biology, Department of Cell Biology, University of Connecticut Health Center, 263 Farmington Ave., Farmington, CT 06030, Tel: 860-679-2444, Fax: 860-679-1201, Email: han@nso.uchc.edu.