Protein tyrosine phosphorylation represents a central regulatory mechanism in cell signaling. Here we present an extensive survey of tyrosine phosphorylation sites in a normal-derived human mammary epithelial cell line by applying anti-phosphotyrosine peptide immunoaffinity purification coupled with high sensitivity capillary liquid chromatography tandem mass spectrometry. A total of 481 tyrosine phosphorylation sites (covered by 716 unique peptides) from 285 proteins were confidently identified in HMEC following the analysis of both the basal condition and acute stimulation with epidermal growth factor (EGF). The estimated false discovery rate was 1.0% as determined by searching against a scrambled database. Comparison of these data with existing literature showed significant agreement for previously reported sites. However, we observed 281 sites that were not previously reported for HMEC cultures, 29 of which have not been reported for any human cell or tissue system. The analysis showed that the majority of highly phosphorylated proteins were of relatively low abundance. Large differences in phosphorylation stoichiometry for sites within the same protein were also observed, raising the possibility of more important functional roles for such highly phosphorylated pTyr sites. By mapping to major signaling networks, such as the EGF receptor and insulin-like growth factor-1 receptor signaling pathways, many known proteins involved in these pathways were revealed to be tyrosine phosphorylated, which provides interesting targets for future hypothesis-driven and targeted quantitative studies involving tyrosine phosphorylation in HMEC or other human systems.
Phosphorylation, particularly tyrosine phosphorylation, acts as a central regulatory mechanism for cell signaling. As such, characterizing how individual phosphorylation sites respond to different cell stimuli is critical for understanding the regulation of cellular function1. While recent advances in mass spectrometry (MS)-based phosphoproteomics technologies allow extensive profiling of site-specific phosphorylation on serine and threonine residues 2–7, the characterization of tyrosine phosphorylation is more challenging due to the significantly lower levels of tyrosine phosphorylation (pTyr) compared to phosphoserine (pSer) and phosphothreonine (pThr). For example, a relative abundance of 1800:200:1 has been estimated for pSer:pThr:pTyr in vertebrate cells8. More recently, the coupling of peptide level anti-pTyr immunoaffinity purification (IP) with LC-MS/MS has proven to be a good approach for profiling tyrosine phosphorylation 9–11, but sensitivity remains a significant challenge for obtaining extensive coverage of the phosphotyrosine proteome12. To date, several MS-based studies have investigated site-specific tyrosine phosphorylation in cellular and tissue proteomes 9, 13–16. The extensive characterization of pTyr sites in selected biological model systems will enable more directed studies that focus on the functional role of selected pTyr sites involved in specific pathways or biological processes.
In this work we present an extensive profiling of site-specific tyrosine phosphorylation in a well-studied human mammary epithelial cell line (HMEC, strain 184A1) using anti-pTyr peptide IP coupled with high sensitivity LC-MS/MS. These cells have been extensively used as a model system for studying the mechanisms controlling proliferation and differentiation in normal human cells 17, especially for understanding the role of the epidermal growth factor receptor (EGFR) signaling pathways 18–20. More recently, several studies have been focused on studying the dynamics of tyrosine phosphorylation involved in the EGFR signaling pathways by applying iTRAQ14 labeling and multiple reaction monitoring10, 21 coupled with anti-pTyr IP and LC-MS/MS. While these studies provided significant novel information regarding the dynamics of tyrosine phosphorylation, it is expected that many important tyrosine phosphorylation events within the HMEC proteome still remain to be discovered. In this study both basal (control) and epidermal growth factor (EGF)-stimulated conditions were analyzed separately to maximize coverage of the phosphotyrosine proteome. Nine replicate analyses of each condition were performed using anti-pTyr peptide IP coupled with high sensitivity LC-MS/MS. A total of 481 pTyr sites (covered by 710 unique peptides) were confidently identified, which represents a significant increase in the number of previously reported pTyr sites in HMEC.
HMEC line 184A1 was obtained from Lawrence Berkeley National Laboratory22. These cells were routinely cultured in DFCI-1 medium supplemented with 12.5 ng/mL human EGF (Calbiochem, San Diego, CA) as previously described23,24. HMEC were grown to ~80% confluence in 100 mm dishes, washed with phosphate-buffered saline to facilitate removal of the fetal bovine serum, and incubated overnight (18 h) in serum-free medium consisting of alpha MEM:F12 that contained 0.1% BSA prior to stimulation and sample collections. For EGF stimulation, cells were incubated for 7.5 min with medium that contained 25 ng/mL EGF.
Cells were scraped using freshly prepared 8 M urea in 20 mM ammonium bicarbonate (pH 8.0) with 1 mM sodium orthovanadate and 10 mM NaF. Lysis was performed by passing the solution through an 18-gauge syringe needle and then sonicating the solution at ice temperature for 5 min. Proteins were reduced with 10 mM dithiothreitol (DTT) for 60 min at 37 °C, followed by alkylation with 40 mM iodoacetamide for 90 min at room temperature. Samples were diluted 5-fold and digested overnight at 37 °C, using sequencing grade trypsin (Promega, Madison, WI) at a 1:50 (trypsin:protein, w/w) ratio. Immediately following digestion, acetic acid was added to a final concentration of 0.5% (v/v), and the peptide samples were desalted using SepPak Environmental-Plus C18 solid phase extraction cartridges (Waters, Milford, MA). The reversed phase media was activated using four column volumes of methanol and then equilibrated with four volumes of 0.1% trifluoroacetic acid (TFA). After loading the acidic peptide solution, the cartridge was washed with four volumes of 5% acetonitrile (ACN) with 0.1% TFA. Peptides were eluted with two column volumes of 40% ACN (0.1% TFA) followed by two column volumes of 80% ACN (0.1% TFA). Eluted peptides were lyophilized to dryness and stored at -80 °C until needed for immunoaffinity purification.
Anti-pTyr peptide immunoprecipitation was performed as previously described but with minor modification4 immediately before LC-MS/MS analysis. Lyophilized peptides were dissolved in IP buffer (50 mM 3-(N-morpholino)propanesulfonic acid, pH 7.2, 10 mM sodium phosphate, 50 mM NaCl) with a final peptide concentration ~ 4 mg/mL. The resulting peptide solution was transferred into a microcentrifuge tube that contained pY-100 antibody beads (Cell Signaling Technology, Danvers, MA) and incubated overnight at 4 °C. The beads were then gently washed three times with 1 mL IP buffer and twice with 1 mL of water. Peptides were eluted from the beads by incubating them in 45 μL of 0.15% TFA at room temperature for 10 min, followed by another 55 μL 0.15% TFA wash. Peptide eluates were collected following centrifugation (5 min at 1700 × g), and the enriched peptide samples were purified with micro C18 ZipTips (Millipore, Billerica, MA) prior to LC-MS/MS analysis.
Two 14 μL aliquots of IP eluate were used for each LC-MS/MS analysis, and bound peptides from both aliquots were eluted into the same 40% ACN, 0.1% TFA solution. All wash and elution volumes were 10 μL. The purified sample was dried down and reconstituted to the desired injection volume of 5 – 15 μL with 0.1% TFA.
Samples were injected into a nano-flow, metal-free nanoLC system (developed in house) with a 40-cm long, 30 μm i.d. capillary column packed (in-house) with 3 μm Jupiter (Phenomenex, Torrance, CA) C18 silica. An uncoated fused silica capillary pulled (5-μm at the capillary tip and 20 μm i.d. at the column junction) was joined to the column using a pico-clear union (both from New Objective, Woburn, MA). The gradient ran from 0% − 70% acetonitrile that contained 0.1 M acetic acid. Mass spectra were acquired using an LTQ-Orbitrap (ThermoScientific, Waltham MA) that provided 60,000 resolution Orbitrap MS survey scans, followed by 10 LTQ MS/MS scans of the most abundant parent ions in each scan, with a dynamic exclusion time of 30 s used for parent ion selection.
All DTA files were generated using DeconMSn25 and analyzed using the X!Tandem26 algorithm to search the human IPI protein database (August 22, 2006; 61,225 entries). Only fully and partially tryptic peptides with carboxyamidomethylation of cysteine residues as a fixed modification (57.0215 Da) and phosphorylation of Ser, Thr, and Tyr as a variable modification (79.9663 Da) were considered. The parent ion mass tolerance was set to ±60 ppm for database searching and identifications were filtered using a tighter mass tolerance (8 ppm in this work). ProteinProphet was employed to assign unique peptide sequences to proteins and protein groups derived from the same Human IPI database27. Following ProteinProphet analysis, one IPI accession number (the first listed in an ascending sort for the members of a protein group) was chosen to represent each protein group. The phosphorylation position in the protein selected for each protein group was determined for each peptide match and tagged with a unique identifier.
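The two-stage mass tolerance described above (a wide window for the database search, then a tighter post-search filter) can be illustrated with a minimal sketch. This is not the authors' code: the function names and the example m/z values are hypothetical, and only the ±60 ppm and 8 ppm thresholds come from the text.

```python
SEARCH_TOL_PPM = 60.0  # parent ion tolerance used for database searching (from the text)
FILTER_TOL_PPM = 8.0   # tighter tolerance applied when filtering identifications (from the text)


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of an observed m/z relative to its theoretical m/z, in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float, tol_ppm: float) -> bool:
    """True if the absolute mass error does not exceed tol_ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Hypothetical example: a match that is 25 ppm off is kept by the search
# window but removed by the post-search filter.
theoretical = 800.0
observed = 800.02
kept_by_search = within_tolerance(observed, theoretical, SEARCH_TOL_PPM)
kept_by_filter = within_tolerance(observed, theoretical, FILTER_TOL_PPM)
```

Searching wide and filtering narrow lets the mass error distribution of scrambled-database matches be inspected before the cutoff is chosen.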
The false discovery rate (FDR) was estimated using scrambled protein database searches of the data (where each protein sequence in the database was randomized) and compared against forward searches of the same data sets, applying the same filtering parameters28. To estimate the FDR for unique pTyr-containing peptide sequences, both fully and partially tryptic peptides were constrained to an 8 ppm mass error cutoff. Partially tryptic peptides were further filtered by requiring at least two observations and an X!Tandem log expectation value of < −2 for each peptide. An FDR of ~1.0% (5 false site identifications) was determined. Fragmentation spectra for peptides with only a single observation (for fully tryptic peptides) and those peptides corresponding to previously unknown phosphorylation sites were confirmed by manual inspection.
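As a rough arithmetic check, the scrambled-database estimate can be written as the ratio of scrambled matches to forward matches that pass the same filters. The sketch below assumes that simple ratio and assumes the 481 identified sites as the denominator; the text reports only the 5 false identifications and the ~1.0% result, so both the formula and the denominator are assumptions, and the function name is hypothetical.

```python
def estimate_fdr(scrambled_hits: int, forward_hits: int) -> float:
    """Estimate the FDR as scrambled-database hits divided by forward-database hits.

    Both counts must come from searches filtered with identical parameters.
    """
    if forward_hits <= 0:
        raise ValueError("forward_hits must be positive")
    return scrambled_hits / forward_hits


# Assumed inputs: 5 false identifications against 481 identified sites.
fdr = estimate_fdr(5, 481)
fdr_percent = round(fdr * 100, 1)
```

Under these assumptions the ratio is consistent with the reported value of about 1.0%.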
The total spectral counts29 (a semi-quantitative measure of abundance) for each pTyr site under each biological condition were calculated by summing among the replicates. Unknown sites were determined by comparing our results to the PhosphoELM (http://phospho.elm.eu.org/)30 and Phosphosite (http://www.phosphosite.org)31 databases.
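The roll-up of spectral counts by site and condition amounts to a grouped sum over replicates. A minimal sketch follows; the record layout, site identifiers, and counts are hypothetical and serve only to show the summation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each record: (site_id, condition, replicate_number, spectral_count)
Record = Tuple[str, str, int, int]


def roll_up_counts(records: Iterable[Record]) -> Dict[Tuple[str, str], int]:
    """Sum spectral counts across replicates for each (site, condition) pair."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for site_id, condition, _replicate, count in records:
        totals[(site_id, condition)] += count
    return dict(totals)


# Hypothetical observations for two sites across a few replicates.
example_records = [
    ("SITE_A", "control", 1, 2),
    ("SITE_A", "control", 2, 3),
    ("SITE_A", "EGF", 1, 10),
    ("SITE_A", "EGF", 2, 12),
    ("SITE_B", "EGF", 1, 1),
]
example_totals = roll_up_counts(example_records)
```

A site that is absent from one condition simply has no entry for that pair, which matches the notion of a site being observed only under the other condition.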
Non-redundant seed sequences that consisted of 15 or 31 amino acids were generated from the human IPI database using each identified phosphotyrosine as the central residue. These sequences were submitted in two separate batches to the motif search program Motif-X (http://motif-x.med.harvard.edu/)32. The minimum match significance was set to 1×10⁻⁶, and a minimum of five occurrences of the motif in the data set was required. The IPI Human Proteome provided on the Motif-X website was used as the background comparison data set. The central character and width were set appropriately for each data set being searched.
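Generating fixed-width seed sequences centered on each phosphotyrosine can be sketched as below. The text does not say how sites too close to a protein terminus were handled; this sketch skips them, which is an assumption, and the function names and example sequence are hypothetical.

```python
from typing import Iterable, List, Optional


def seed_window(protein_seq: str, site_pos: int, width: int = 15) -> Optional[str]:
    """Return the window of `width` residues centered on a tyrosine.

    site_pos is 1-based. Returns None when the window would extend past
    either terminus (assumed handling; not stated in the text).
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so the site is central")
    idx = site_pos - 1
    if idx < 0 or idx >= len(protein_seq) or protein_seq[idx] != "Y":
        raise ValueError("site_pos must point at a tyrosine")
    flank = width // 2
    start, end = idx - flank, idx + flank + 1
    if start < 0 or end > len(protein_seq):
        return None
    return protein_seq[start:end]


def non_redundant(windows: Iterable[Optional[str]]) -> List[str]:
    """Drop skipped windows and duplicates while preserving first-seen order."""
    seen = set()
    unique = []
    for w in windows:
        if w is not None and w not in seen:
            seen.add(w)
            unique.append(w)
    return unique


# Hypothetical 17-residue sequence with a tyrosine at position 10.
example_seq = "MKACDEFGHYKLMNPQR"
window_15 = seed_window(example_seq, 10, width=15)
window_31 = seed_window(example_seq, 10, width=31)
```

Running the 15- and 31-residue widths as separate batches, as the text describes, keeps every sequence in a batch the same length.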
Results obtained for both control and EGF conditions were uploaded for Ingenuity Pathway Analysis (IPA). Proteins from our dataset that had known phosphorylation interactions downstream or upstream of EGFR or insulin-like growth factor 1 receptor (IGF1R) were included in the final network. The phosphorylation level for each protein was based on the total number of spectral counts for all identified sites on the protein after EGF stimulation.
The general workflow for analyzing tyrosine phosphorylation is depicted in Figure 1. Basically, peptide anti-pTyr IP was coupled with high sensitivity LC-MS/MS for proteome-wide profiling, similar to what has been previously reported 4, 14. Basal (control) and acutely EGF-stimulated conditions were evaluated separately in multiple IP experiments, with nine replicate LC-MS/MS analyses for each condition to enhance coverage of pTyr sites by minimizing the potential for undersampling33. Due to the extremely low levels of pTyr peptides, a relatively large amount of protein, i.e., ~4 mg of total peptide (which corresponds to ~4 × 10⁷ cells), was used for each IP experiment. Each enriched pTyr peptide solution was typically analyzed twice by LC-MS/MS, and the resulting MS/MS spectra were searched against a human protein database using X!Tandem. Identified peptides were filtered based on mass error (in ppm), the X!Tandem peptide expectation value, cleavage state, and peptide spectral count, and an FDR was estimated from a scrambled protein database search. Identified pTyr peptides were assigned a unique identification designation (SiteID) for pTyr sites. Sites with only a single spectral observation and previously unknown sites were manually examined. A total of 27 sites were removed from the list following manual examination. Spectral counts for each identified site, which provided a semi-quantitative measure of relative abundance of each pTyr site, were then rolled up by biological condition.
56,104 MS/MS spectra (from a total of 570,738 collected in 18 LC-MS/MS analyses, i.e., 9 for each condition) were assigned a sequence by X!Tandem prior to any filtering. Of these 56,104 spectra, 8383 (15% of those identified by X!Tandem) passed the final filtering criteria. In total, 710 unique pTyr peptides with 481 unique pTyr sites were identified (Supplementary Table 1 and 2) from 285 proteins. 98% of these peptides were singly tyrosine phosphorylated. Among the 13 multiply phosphorylated peptides, nine contained both pTyr and pSer/pThr and three contained multiple pTyr residues.
Figure 2 shows the mass error and expectation value distribution for pTyr-containing peptides identified in both forward (black) and scrambled (red) database searches. Figure 2a shows the unfiltered pTyr-containing peptides, where the forward matches clustered within ± 8 ppm. Figure 2b shows only the fully tryptic pTyr-peptides and contains only a few scrambled database matches within ± 8 ppm, which indicates a very low FDR for this category of peptides. Because partially tryptic pTyr-peptides (Figure 2c) yielded a higher percentage of scrambled database matches, we required at least two spectral counts for each peptide identification and a log peptide expectation value cutoff of −2 (green line) in addition to the mass error filter to enhance data confidence. With the expectation value cutoff applied, the estimated FDR values for the filtered partially tryptic and fully tryptic peptides are close (i.e., 1.9% for partially tryptic peptides and 0.7% for tryptic peptides). The overall FDR estimated for the unique pTyr-containing peptides was 1.0%.
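The filtering rules described above can be expressed as a single predicate. This is a sketch under stated assumptions: the record type, field names, and example peptides are hypothetical, and the expectation-value and spectral-count requirements are applied only to partially tryptic peptides, following the description in the text.

```python
from dataclasses import dataclass


@dataclass
class PeptideID:
    sequence: str           # peptide sequence (hypothetical examples below)
    fully_tryptic: bool     # cleavage state
    mass_error_ppm: float   # signed parent ion mass error
    log_expectation: float  # log10 of the X!Tandem peptide expectation value
    spectral_count: int     # number of spectra matched to this peptide


def passes_filter(p: PeptideID,
                  ppm_cutoff: float = 8.0,
                  log_e_cutoff: float = -2.0,
                  min_observations: int = 2) -> bool:
    """Apply the mass error filter to all peptides, plus extra criteria
    for partially tryptic peptides."""
    if abs(p.mass_error_ppm) > ppm_cutoff:
        return False
    if p.fully_tryptic:
        return True
    return (p.spectral_count >= min_observations
            and p.log_expectation < log_e_cutoff)


# Hypothetical identifications.
tryptic_single = PeptideID("AAYK", True, 3.0, -0.5, 1)
partial_single = PeptideID("LAAYK", False, 2.0, -3.0, 1)
partial_repeated = PeptideID("LAAYK", False, -2.0, -3.0, 2)
```

In the text, fully tryptic peptides with a single observation were additionally checked by manual inspection, a step this predicate does not model.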
Of the 481 unique pTyr sites, 160 were observed only in the EGF-stimulated condition, while 85 pTyr sites were observed only in the control (Figure 3a). It is possible that many sites observed only in the EGF-stimulated condition have phosphorylation levels below detection limits under basal conditions. To determine whether new pTyr sites were observed in our HMEC samples, we first compared our dataset with two previously published MS-based tyrosine phosphorylation studies in HMEC 10, 11. Based on this comparison, we identified 281 new sites in HMECs (Figure 3b). Our results overlap with the combined results from previous studies by 200 sites (as marked in Supplementary Table 1). While approximately 30% of these new sites were observed on known tyrosine phosphorylated proteins identified in previous HMEC studies10, 11, the new sites also cover ~170 proteins that have not been reported as being tyrosine phosphorylated in HMEC. Note that 43 of these new phosphotyrosine proteins contain multiple pTyr sites and include several known cell signaling proteins, such as neural precursor cell expressed, developmentally down-regulated 9 (NEDD9), plakophilin 2 (PKP2), insulin receptor substrate 1 (IRS1), and tensin 1 (TNS1), all of which were identified with at least four pTyr sites. Among these cell signaling proteins, insulin receptor substrate 2 (IRS2) is a key protein involved in insulin signaling pathways and plays a critical role for glucose homeostasis along with IRS1 34, 35. IRS1 and IRS2 phosphorylation is regulated by insulin-like growth factor 1 receptor (IGF1R) and insulin receptor activation36. NEDD9 is also a well-characterized phosphoprotein acting as a docking protein for kinases involved in cell adhesion37. We identified nine different pTyr sites for NEDD9, only one of which had not previously been identified in other human tissues and cell lines.
Tyrosine phosphorylation on NEDD9 is known to be regulated by integrin beta 138 and protein tyrosine kinase 2 (PTK2)39. NEDD9 subsequently increases the phosphorylation on mitogen-activated protein kinase 8 (MAPK8)40.
In the cases of known phosphoproteins with new sites identified in this study, the sites with higher spectral counts tended to correlate with previously reported sites, suggesting that the new site identifications were due to the increased coverage achieved by this study. However, there are several instances where the previously reported sites were the sites with lower spectral counts, e.g., AHNAK nucleoprotein (AHNAK). AHNAK is an adapter protein known to associate with several enzymes in the EGFR pathway, such as protein kinase C-alpha 41. The one site that was previously observed (Y2159) was only observed with two spectral counts in the EGF-stimulated condition, while the other five identified sites had spectral counts ranging from 10 to 70. Conversely, there are 153 sites identified in previous MS-based studies10, 11, but not identified here. This difference could be due to several reasons. First, the previous studies10, 11 used a much higher concentration of EGF (100 ng/mL vs. 25 ng/mL) for stimulation, which could lead to phosphorylation of more sites (although they may not be physiologically relevant). Second, the previous studies made measurements at multiple time points following stimulation, while the current study measured phosphorylation present with no stimulation (control or time zero) and at a single time point (7.5 min) after stimulation. Nearly half of the previously observed 153 sites were not detected at early time points, which are similar to the one used in this study. Finally, the 153 sites not detected in this study could possibly be due to MS/MS undersampling; however, nine replicates were performed to minimize this possibility.
By comparing known pTyr sites in the entire human proteome, we identified 29 new tyrosine phosphorylation sites that have not previously been reported in human cells (Table 1). Sites were assigned as novel by searching against the most current version of the Phosphosite and PhosphoELM databases using the detected peptide sequence. Approximately half of the new phosphorylation sites had only one or two spectral counts, so most are likely lower abundance modifications. Moreover, nearly half of the new sites are located on proteins with known multiple tyrosine phosphorylation sites. For example, PTPRK is a receptor tyrosine phosphatase and the new site Y805 is observed with 55 total spectral counts. This specific phosphatase is thought to be involved with cell adhesion because its expression is regulated by cell density 42 and because of its effect on the cellular location of β-catenin complexes 43. Another protein, the transmembrane BAX inhibitor protein 1 (TMBIM1), was recently shown to be phosphorylated at S81 in human platelets44. In our data, four spectra were detected for Y73, all in the control condition. Ribosomal protein L30 (RPL30) is phosphorylated at Y26 and S9 9, 13. We did not observe phosphorylation at these known positions and instead identified phosphorylation at Y73 with 8 total spectral counts (see MS/MS spectrum in Figure 4a).
The study has also led to the discovery of a number of new tyrosine phosphorylated proteins in human cells, including chromosome 3 open reading frame 10 (C3orf10), CD59 molecule, complement regulatory protein (CD59), toll interacting protein (TOLLIP), OTU domain containing 6B (OTUD6B), lactose-binding lectin 1 (LGALS1), LOC653269, LOC57228, and LOC643596. Many of these proteins are kinases and structural proteins likely to be participants in cell signaling. For example, CD59, a membrane protein that inhibits aspects of the complement cascade, has been reported to promote tyrosine phosphorylation of a c-Src immune complex45; however, its tyrosine phosphorylation has not been reported. In this work, we discovered that both residues Y29 and Y86 of CD59 were phosphorylated (see representative MS/MS spectrum in Figure 4b). The structural protein C3orf10 (also called HSPC300) can regulate actin polymerization. Its absence in tumor cells can cause morphological abnormalities, which suggests a critical role in cell development 46. Two other proteins, TOLLIP and OTUD6B, are thought to be involved with ubiquitin attachment 47 and removal 48, respectively.
The coverage of pTyr sites in HMEC achieved in this study allows us to examine potentially significant motifs for tyrosine phosphorylation. Application of the motif-finding algorithm with the constraints described by Schwartz et al. 32 led to the identification of six significant motifs (Table 2). The majority of these motifs were identified more than 10 times in the dataset, and only motifs that met the 1×10⁻⁶ significance threshold were considered. Three of the motifs, i.e., y......R, G.y, and L.......y.......P..W (where “y” is pTyr, and a period represents any amino acid), have not been previously reported49. Although L.......y.......P..W had the fewest matches (i.e., 9), the confidence of this identification is high based on a high score and probability value. Interestingly, the motif y......R is similar to the known SHP2 phosphatase substrate EFyA.[V/I].[R/K/H]S50. The remaining three motifs are associated with tyrosine phosphorylation and have scores consistent with those previously reported32.
We utilized spectral counting as a semi-quantitative measure to evaluate phosphorylation stoichiometry among different proteins. Specifically, we compared the spectral count information for tyrosine phosphorylated proteins identified in this study with total spectral counts obtained for proteins identified in our earlier global profiling study of HMEC51 to evaluate phosphorylation stoichiometry for individual proteins. Low-abundance proteins that show a high level of phosphorylation may represent important signaling proteins that serve a vital role in biological processes3, 52, while low-level phosphorylation in high-abundance proteins may be non-specific and not physiologically relevant. Figure 5a shows the comparison between the total spectral counts from previous global abundance profiling and the total spectral counts of tyrosine phosphorylation for selected proteins (spectral counts for all phosphorylation sites on the same protein are summed). Based on spectral counts, many of the highly phosphorylated proteins are very low-abundance proteins, while the majority of high-abundance proteins have relatively low levels of phosphorylation. For example, EGFR has 389 phosphopeptide counts, while the total spectral count in previous global profiling was 24. In contrast, filamin-B (FLNB) was observed in 532 total spectra in the previous global profiling study, but observed in only six spectra of one phosphorylation site (Y2502) in EGF-stimulated samples. Many key regulatory proteins involved in the EGFR pathways 53, such as EGFR, mitogen activated protein kinase 1 and 3 (MAPK1 and MAPK3), tyrosine protein kinase c-src (SRC), SHC-transforming protein 1 (SHC1), protein-tyrosine kinase fyn (FYN), and cell division cycle 2 protein (CDC2) were observed with high levels of phosphorylation in this study, but with low spectral counts in the global profiling study (detailed results are provided in Supplemental Table 3). 
GRB2-associated binding protein 1 (GAB1), Src homology 2 domain containing adaptor protein B (SHB), and the tyrosine kinase adaptor protein c-cbl (CBL) are known members of the EGFR signaling pathway and are observed to be highly phosphorylated in this study, but were not observed in any spectra in the previous global study. Conversely, low levels of phosphorylation were observed in a number of high-abundance proteins, such as FLNB and myosin-9 (MYH9). These data indicate that tyrosine phosphorylation in many key signaling proteins occurs with a high stoichiometry of phosphorylation. This trend mirrors our observations in an earlier comparison of the global proteome and phosphoproteome of pseudopodia in chemotactic cells 54.
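The comparison between phosphopeptide spectral counts and global-profiling spectral counts can be made concrete with a simple ratio. The sketch below uses the EGFR and FLNB counts quoted in the text; the function name is hypothetical, and the ratio is only a rough, semi-quantitative proxy rather than a measured stoichiometry, since the two counts come from different experiments.

```python
from typing import Optional


def phospho_to_global_ratio(phospho_counts: int, global_counts: int) -> Optional[float]:
    """Ratio of pTyr spectral counts to global-profiling spectral counts.

    Returns None when the protein was not observed in global profiling,
    because the ratio is undefined in that case.
    """
    if global_counts == 0:
        return None
    return phospho_counts / global_counts


# Counts quoted in the text.
egfr_ratio = phospho_to_global_ratio(389, 24)  # low abundance, heavily phosphorylated
flnb_ratio = phospho_to_global_ratio(6, 532)   # high abundance, lightly phosphorylated
```

Proteins such as GAB1, SHB, and CBL, which were phosphorylated here but absent from the global profiling spectra, fall into the undefined case and are best read as the extreme of the same trend.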
Spectral count information also provides a means of examining the differences in phosphorylation levels for multiple sites within a given protein, which can vary widely. Phosphorylation differences among multiple sites provide insight as to possible key signaling sites for protein-protein interactions.
In our data set, 284 proteins had two or more phosphorylation sites. For example, seven pTyr sites were identified in EGFR (Figure 5b), of which highly phosphorylated sites Y1197 (190 counts), Y1172 (96 counts), Y1110 (57 counts), and Y1092 (26 counts) interact with the key adaptor protein SHC1 and growth factor receptor bound protein 255. IRS2 is another example with five identified pTyr sites (Figure 5c). The two dominantly phosphorylated sites Y677 and Y825 bind to phosphoinositide-3-kinase56, a key signaling component of the EGFR and insulin signaling pathway.
Other proteins with one or two dominant phosphorylation sites include AHNAK (Figure 5d), annexin A2 (ANXA2), breast cancer anti-estrogen resistance 1 (BCAR1), caveolin 1 (CAV1), v-crk sarcoma virus CT10 oncogene homolog (CRK), catenin delta 1 (CTNND1), epithelial cell receptor protein tyrosine kinase (EPHA2), FYN, inositol polyphosphate phosphatase-like 1 (INPPL1), keratin 7 (KRT7), NEDD9, plakophilin 4 (PKP4), PTK2, protein tyrosine phosphatase-2 (PTPN11), SHC1, and tensin 3 (TNS3). Proteins such as cortactin (CTTN), hypothetical protein LOC64762 (FAM59A), ephrin-B2 (EFNB2), SHB (Figure 5e), and signal transduction adaptor molecule 2 (STAM2) show little difference in phosphorylation levels among the different sites. In these examples, phosphorylation trends among multiple sites are generally in good agreement between the two conditions. Note that this type of information cannot be obtained from isotopic labeling methods as they only measure relative differences between conditions.
Recently, dynamic changes of tyrosine phosphorylation in the EGF signaling pathways of the HMEC system have been studied in detail by mass spectrometry-based approaches10, 11. While the main purpose of the current effort is to identify potential novel pTyr sites, we find that the semi-quantitative spectral count-based abundance changes determined for sites following EGF stimulation are in good agreement with quantitative results reported in previous studies for the sites that were common. Selected phosphorylation sites along with their increased level of phosphorylation following EGF stimulation are listed in Table 3. Corresponding phosphoproteins with a high number of spectral counts in the EGF-stimulated sample include EGFR(Y1092, Y1172 & Y1197), CAV1(Y6), SHC1(Y428), MAPK3(Y204), MAPK1(Y186), and KRT7(Y39). CDC2(Y15) and sodium-potassium-ATPase (ATP1A1) (Y260) are example proteins with a high degree of phosphorylation that decreases sharply upon application of EGF. Other phosphoproteins, such as AHNAK(Y468, Y717, Y1222, and Y2906), CBL(Y674), and MYO6(Y1114), are new identifications in the current study. In all cases, EGF stimulation clearly had an effect on the level of phosphorylation.
We also observed that phosphorylation levels for a number of proteins appeared to be independent of EGF stimulation. Examples of proteins that exhibited little change between control and EGF conditions include ANXA2(Y42 and Y265), CAV1(Y14), cyclin-dependent kinase 2 (CDK2) (Y15), PKP4(Y478), SHB(Y355), and signal transducer and activator of transcription 3 (STAT3) (Y708). Conversely, phosphorylation levels actually decreased following EGF stimulation in the case of CDC2(Y15), ATP1A1(Y260), and PTPN11(Y62), which is consistent with observations in the other studies10, 11. The consistency among the different studies further demonstrates the reproducibility of phosphorylation levels for particular proteins.
The tyrosine phosphorylated sites identified in the current study can also be mapped to signaling pathways of interest and protein interaction networks based on prior knowledge. Although protein-protein interactions have been extensively reported in the literature 57–59, it is often unclear whether these interacting proteins involve phosphorylation and if so, which residues are modified. Figure 6 shows a simplified phosphoprotein interaction network for proteins downstream of EGFR and IGF1R that was generated with the Ingenuity Pathway Analysis tool60. Although IGF1R was not identified in this work, the figure shows that the data set can be used to map a pathway of interest and identify which proteins in that pathway are phosphorylated. The map consists of 31 proteins in the current data set with total represented spectral counts ranging from 1 to 635 (in the case of EGFR). Ten of the proteins, namely annexin A11 (ANXA11), CBL, IRS2, IRS1, protein tyrosine phosphatase non-receptor type 6 (PTPN6), protein tyrosine phosphatase receptor type K (PTPRK), src-signal transducer and activator of transcription 1 (STAT1), SRC, c-src tyrosine kinase (CSK), and heat shock protein 1 (HSPE1), have not been observed as tyrosine phosphorylated in previous HMEC studies10, 11. Based on the diversity of other proteins that they interact with, many enzymes with multiple and increased phosphorylation upon EGF treatment, such as BCAR1, paxillin (PXN), GAB1, PTK2, PTPN11, and SHC1, are potential “hub” proteins. This information can guide selection of specific target proteins and pTyr sites for follow-up directed quantitative signaling studies.
Extensive characterization of tyrosine phosphorylation sites in human cells provides a confident reference dataset useful for hypothesis-driven, as well as directed studies of dynamic cell signaling. The extensive survey of tyrosine phosphorylation in a normal HMEC resulted in identification of 481 unique pTyr sites from 285 proteins, of which 281 pTyr sites were considered as novel for HMEC and 29 pTyr sites had never been reported for the human proteome. The use of available spectral count information for pTyr sites and HMEC global profiling51 revealed known signaling proteins such as GAB1, SHC1, MAPK1 and MAPK3 that had high levels of phosphorylation, but relatively low total abundances (based on spectral count information). This study also uncovered many proteins with multiple pTyr sites and observed that phosphorylation levels were site specific. For a given protein, sites with dominant levels of phosphorylation are usually known to interact with key binding partners involved in signaling pathways, as illustrated by EGFR and IRS2. As such, the site-specific phosphorylation spectral count data provide unique information for determining potentially important pTyr sites and signaling proteins. The extensive coverage of the pTyr sites allows these phosphoproteins to be mapped to specific signaling pathways and protein interaction networks to guide the selection of signaling proteins for directed studies. In summary, this extensive data set of site-specific tyrosine phosphorylation along with the semi-quantitative phosphorylation information for individual pTyr sites provides a reference resource for the scientific community to facilitate more focused biological studies in cell signaling.
This work was supported in part by the Pacific Northwest National Laboratory Biomolecular Systems Initiative LDRD program, NIH R01 DK074795, the NIH National Center for Research Resources RR018522, and the Environmental Molecular Science Laboratory, a national scientific user facility sponsored by the U.S. Department of Energy (DOE) Office of Biological and Environmental Research and located at Pacific Northwest National Laboratory (PNNL). PNNL is operated by Battelle Memorial Institute for the DOE under Contract No. DE-AC05-76RLO-1830.
SUPPORTING INFORMATION AVAILABLE
Full listing of the identified tyrosine phosphorylation sites, all detected unique peptide sequences with corresponding spectral count information, and a comparison of phosphorylation sites with a previously reported global study on the same cell system are available as a Microsoft Excel worksheet. All files are downloadable free of charge at http://pubs.acs.org.