Physiological and nonphysiological sites of tyrosine phosphorylation
The receptors EGFR, FGFR1 and IGF1R have 20, 15 and 15 intracellular tyrosine residues, respectively. We began by designating each tyrosine as ‘physiological’ or ‘nonphysiological’ based on previous efforts to uncover sites of tyrosine phosphorylation. In the case of EGFR, Lombardo et al. initially used tandem mass spectrometry (MS/MS) to identify eight sites of tyrosine phosphorylation (Y1016, Y1069, Y1092, Y1110, Y1125, Y1138, Y1172 and Y1197) on recombinant receptor tail (residues 1000–1210) that had been incubated with purified EGFR kinase or with c-Src.19 Around the same time, Stover et al. found many of the same sites, as well as two additional sites (Y915 and Y944), by phosphopeptide mapping of endogenous receptor that had been immunoprecipitated from breast and colorectal tumor cell lines.22 More recently, pY978 was found in a phosphoproteomics study of human HepG2 hepatocytes,25 and pY998 was found in a phosphoproteomics study of human 184A1 mammary epithelial cells.26 Finally, years after recruitment data had been reported concerning the activation loop tyrosine (Y869),27 phosphorylation of this residue was observed by MS/MS in U87MG glioblastoma cells stably transfected with a naturally-occurring deletion mutant of EGFR.28
In the case of FGFR1, Mohammadi et al. initially found seven sites of tyrosine phosphorylation (Y463, Y583, Y585, Y653, Y654, Y730 and Y766) using an in vitro approach that combined mutagenesis with microsequencing.20 Later, Hinsby et al. verified most of these sites and identified an additional site (Y605) through an MS/MS analysis of receptor that had been immunoprecipitated from transfected human embryonic kidney cells.18 Although no direct evidence of phosphorylation of Y677 and Y701 has been reported, Foehr et al. showed through mutagenesis experiments that these sites are required for neurite outgrowth in PC12 cells and that changing these tyrosines to phenylalanine results in lower levels of phosphorylated receptor.29 We therefore designated them as ‘physiological’.
In the case of IGF1R, six sites (Y973, Y980, Y1161, Y1165, Y1166 and Y1346) were initially identified by stably expressing human IGF1R in rat fibroblasts and using two-dimensional thin layer chromatography coupled with Edman degradation to observe phosphorylation.21 Although no direct evidence has been reported for the phosphorylation of Y1280 and Y1281, several functional studies have demonstrated their importance in activating the mitogen-activated protein kinase (MAPK) signaling cascade and in suppressing apoptosis in lymphocytic cells.30,31 We therefore designated these sites as ‘physiological’. A summary of our designations is provided in Fig. 1.
Fig. 1 Intracellular domains of EGFR, FGFR1 and IGF1R. Tyrosine residues that have been shown to be physiological sites of phosphorylation are colored red; the other tyrosines have been designated ‘nonphysiological’ and are colored green. Asterisks ...
Quantitative protein interaction maps using protein microarrays
To study the interactions of SH2 and PTB domains with physiological and nonphysiological sites on each of the three receptors, we synthesized pY-containing peptides with sequences that correspond to the sequences surrounding every intracellular tyrosine. Structural studies have shown that recognition can occur as far upstream as the −7 position of a phosphopeptide for some PTB domains32–34 and as far downstream as the +5 position of a peptide for some SH2 domains.35–37 To ensure an accurate analysis of all domains, we synthesized peptides with sequences that include nine residues upstream and seven residues downstream of the pY. To visualize peptides that are bound by SH2 and PTB domains, we labeled each peptide on its amino-terminus with 5-(and-6)-carboxytetramethylrhodamine [5(6)-TAMRA]. 5(6)-TAMRA serves as a chromophore for quantification, as well as a fluorophore for visualization on the microarrays. The labeled peptides were deprotected, cleaved from the resin, and purified by reversed-phase high performance liquid chromatography (HPLC). Fractions containing the desired product were identified by matrix assisted laser desorption/ionization time-of-flight (MALDI–TOF) mass spectrometry and the purified peptides were recovered by lyophilization and quantified by absorption spectroscopy. Following this procedure, we obtained pure product for 46 of the 50 phosphopeptides. Of these, 43 were sufficiently soluble in aqueous buffer to use as probes for our experiments.
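The peptide design described above (nine residues upstream and seven downstream of each tyrosine) can be sketched in a few lines of Python. This is only an illustration: the function name, the `first_residue_number` parameter, the handling of tyrosines near the sequence ends, and the toy sequence are assumptions that do not come from the study.

```python
def tyrosine_windows(sequence, upstream=9, downstream=7, first_residue_number=1):
    """Return (residue_number, peptide) for every tyrosine in `sequence`.

    The central tyrosine is written as "pY". Windows are simply truncated
    when a tyrosine lies close to either end of the sequence (an assumption;
    the source does not say how such cases were handled).
    """
    windows = []
    for i, residue in enumerate(sequence):
        if residue != "Y":
            continue
        start = max(0, i - upstream)
        end = min(len(sequence), i + downstream + 1)
        peptide = sequence[start:i] + "pY" + sequence[i + 1:end]
        windows.append((i + first_residue_number, peptide))
    return windows


# Toy sequence invented for illustration; it is not a receptor sequence.
toy_sequence = "MKTAGSLDEVYENPLQRSATYGG"
for number, peptide in tyrosine_windows(toy_sequence):
    print(number, peptide)
```

For a real receptor, `sequence` would be the intracellular region and `first_residue_number` the residue number of its first amino acid, so that the reported positions match the site names used in the text.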
Phosphopeptides derived from intracellular tyrosines on EGFR, FGFR1 and IGF1R
We have previously reported cloning, expressing and purifying virtually every human SH2 and PTB domain and preparing microarrays of these domains on chemically-derivatized glass surfaces.14,38 Briefly, the coding regions for each domain were cloned from human cDNA and the corresponding proteins were produced recombinantly in Escherichia coli using the T7 expression system. Each domain features an amino-terminal His6-tag, as well as a thioredoxin tag to facilitate the high-level production of soluble protein. After purifying each domain from large-scale bacterial culture, we assessed its purity by SDS-polyacrylamide gel electrophoresis and its aggregation state by size-exclusion column chromatography. In the current version of our arrays, we eliminated domains that were impure or did not contain soluble, monomeric protein. Notably, SH2 domains derived from the STAT and SOCS families of proteins did not behave well. By cloning larger portions of STAT1 and STAT2 that included their entire SH2 domain-containing cores,39 we obtained soluble, monomeric material for these two proteins. In addition, we cloned, expressed, and purified the N-terminal domain of Cbl,40 which contains a noncanonical SH2 domain. In total, 133 domains representing 103 proteins were used in these studies.
To facilitate the rapid and automated processing of microarrays, we spotted our proteins in quadruplicate on aldehyde-displaying glass substrates, cut to a size that spans all the wells of a microtiter plate. Ninety-six separate arrays were prepared on each glass substrate, and the glass was attached to the bottom of a bottomless microtiter plate using an intervening silicone gasket. Two 16 × 17 microarrays were required to accommodate all 133 domains as well as the appropriate controls (His6-tagged thioredoxin and buffer). Proteins were spotted at high concentration (40–200 μM), and a low concentration of Cy5-labeled bovine serum albumin (200 nM) was included in each sample to facilitate image analysis. Scanning for Cy5 fluorescence enables us to identify the location of all the spots on the microarrays and scanning for 5(6)-TAMRA fluorescence enables us to visualize and quantify domain–peptide interactions (Fig. 2).
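The statement that two 16 × 17 arrays are needed can be checked with simple arithmetic. The sketch below assumes that every domain is spotted exactly four times and that every grid position is usable; neither detail is stated explicitly beyond "in quadruplicate", and the variable names are illustrative.

```python
import math

rows, cols = 16, 17          # layout of one microarray
domains = 133                # SH2 and PTB domains on the arrays
replicates = 4               # "in quadruplicate"

spots_per_array = rows * cols                 # 272 positions per array
domain_spots = domains * replicates           # 532 positions for domains
min_arrays = math.ceil(domain_spots / spots_per_array)
spare_spots = min_arrays * spots_per_array - domain_spots

print(spots_per_array, domain_spots, min_arrays, spare_spots)
```

Under these assumptions a single array cannot hold all the domain spots, and two arrays leave a small number of positions for the thioredoxin and buffer controls.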
Fig. 2 Measuring the binding affinity of SH2 and PTB domains for phosphopeptide IGF1R 1346 using protein microarrays. (A) Fluorescence images of SH2 and PTB domain microarrays in separate wells of a 96-well microtiter plate, obtained using a 633 nm laser. The ...
We have previously shown that probing a protein microarray with a single concentration of a labeled peptide can produce very misleading results.15 We therefore probed our arrays with eight concentrations of each peptide and fit the resulting spot intensities, Fobs, to eqn (1):

$$F_{\mathrm{obs}} = F_0 + \frac{F_{\max}\,[\mathrm{peptide}]}{K_D + [\mathrm{peptide}]} \qquad (1)$$

where F0 is the background fluorescence, Fmax is the maximum fluorescence at saturation, [peptide] is the total peptide concentration, and KD is the equilibrium dissociation constant (see, for example, Fig. 2). For each peptide, we fit all 133 curves, one for each domain. Interactions were scored as positive if the data fit well to eqn (1) (R² > 0.9), with KD < 2 μM and Fmax at least two-fold higher than the mean fluorescence of control spots (His6-tagged thioredoxin). The interactions that met these criteria for IGF1R 1346 are shown in Fig. 2. Following this strategy, we performed the same quantitative analysis for all 43 phosphopeptides. One of the peptides (FGFR1 677) exhibited high levels of nonspecific binding to the glass surface and to the control spots. High quality data were obtained for the other 42 peptides (ESI,‡ Table S1).
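The fitting and scoring procedure can be illustrated with a short, dependency-free Python sketch. It assumes that eqn (1) has the standard one-site saturation form, Fobs = F0 + Fmax[peptide]/(KD + [peptide]). The fitting strategy shown here (scanning a grid of KD values and solving for F0 and Fmax by linear least squares at each one) is a stand-in chosen for simplicity and is not necessarily how the original fits were performed. The concentrations, the synthetic signal and the control mean are invented for illustration.

```python
def fit_binding(conc, fobs, kd_grid):
    """Fit F = F0 + Fmax * c / (KD + c) by scanning candidate KD values.

    For a fixed KD the model is linear in F0 and Fmax, so both are obtained
    by ordinary least squares; the KD with the smallest residual wins.
    """
    n = len(conc)
    mean_f = sum(fobs) / n
    ss_tot = sum((f - mean_f) ** 2 for f in fobs)
    best = None
    for kd in kd_grid:
        x = [c / (kd + c) for c in conc]           # fractional saturation
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (fi - mean_f) for xi, fi in zip(x, fobs))
        fmax = sxy / sxx
        f0 = mean_f - fmax * mean_x
        ss_res = sum((fi - (f0 + fmax * xi)) ** 2 for xi, fi in zip(x, fobs))
        if best is None or ss_res < best["ss_res"]:
            best = {"kd": kd, "f0": f0, "fmax": fmax, "ss_res": ss_res}
    best["r2"] = 1.0 - best["ss_res"] / ss_tot if ss_tot > 0 else 0.0
    return best


def is_positive(fit, control_mean, r2_min=0.9, kd_max=2.0, fold_min=2.0):
    """Apply the three scoring criteria from the text (KD in micromolar)."""
    return (fit["r2"] > r2_min
            and fit["kd"] < kd_max
            and fit["fmax"] >= fold_min * control_mean)


# Eight hypothetical peptide concentrations in micromolar (not from the source).
conc = [0.01, 0.03, 0.1, 0.3, 0.5, 1.0, 3.0, 5.0]
# Synthetic, noise-free signal generated with F0 = 100, Fmax = 5000, KD = 0.5 uM.
signal = [100.0 + 5000.0 * c / (0.5 + c) for c in conc]
# Log-spaced KD candidates from 0.01 to 100 uM.
kd_grid = [10 ** (e / 50) for e in range(-100, 101)]

fit = fit_binding(conc, signal, kd_grid)
print(round(fit["kd"], 3), round(fit["r2"], 4), is_positive(fit, control_mean=500.0))
```

In practice a nonlinear least-squares routine would normally replace the grid scan; the scoring function is independent of how the fit is obtained.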
Using the data derived from this large-scale analysis, we constructed two quantitative protein interaction maps for each receptor, one using the physiological peptides and the other using the nonphysiological peptides (Fig. 3). The maps derived from the physiological peptides provide an unbiased, genome-wide view of each receptor, showing biophysical interactions between signaling proteins and sites of tyrosine phosphorylation. Which proteins are actually recruited to each receptor in the context of a cell depends on many additional factors, including the relative concentrations of receptors and proteins. As such, these diagrams should be viewed as maps of the receptors, rather than as a depiction of protein recruitment in any particular cell type or cell state. Just as a city map provides all possible routes from one destination to another, but does not specify the route that a specific individual follows on a given day, so too these maps provide the sum total of possible pY-mediated interactions between receptors and their downstream proteins, but do not represent which interactions are occurring in a given cell. We anticipate that these maps will help guide future investigations into the biology of these receptors and will facilitate efforts to construct quantitative models of RTK signaling.16,17
Fig. 3 Quantitative protein interaction maps for EGFR, FGFR1 and IGF1R. Two maps are shown for each receptor. The maps on the left (A, C, E) are constructed using physiological peptides derived from EGFR, FGFR1 and IGF1R, respectively; the maps on the right ...
What do we learn by comparing the maps based on physiological sites with those based on nonphysiological sites (Fig. 3)? From visual inspection, it is clear that the physiological peptides bind to substantially more SH2 and PTB domains than the nonphysiological peptides. On average, the physiological peptides bind to 16.5 SH2 or PTB domains with a KD < 2 μM, while the nonphysiological peptides bind to 3.5 domains. In addition, many of the interactions with the nonphysiological peptides are with a single peptide: EGFR 813 (Fig. 3). Excluding this peptide, the average number of interactions with nonphysiological peptides drops to 1.5. It is possible that Y813 of EGFR is, in fact, phosphorylated in vivo and that this event has escaped detection to date. Given its location in a well-structured, α-helical segment of the kinase domain,41 however, it is more likely that this tyrosine is not accessible, but is surrounded by residues that, by chance, bind to SH2 and PTB domains.
Histograms showing the number of interactions per peptide (Fig. 4A) clearly highlight the difference between these two types of sequence. Since many of the nonphysiological peptides reside in the kinase domain while many of the physiological peptides reside in the solvent-accessible receptor tail, it is possible that the difference in their binding profiles arises simply from differences in their physicochemical properties. To test this hypothesis, we used three previously reported, quantitative descriptors of the physicochemical properties of amino acids (z-scales) to characterize each peptide sequence: z1 is considered a descriptor of hydrophilicity, z2 a descriptor of molecular weight and surface area, and z3 a descriptor of polarity and charge.42 Each peptide was expressed as an 18-dimensional vector, with three dimensions (z1, z2 and z3) for each of the three amino acids upstream of the pY and three dimensions (z1, z2 and z3) for each of the three amino acids downstream of the pY. The phosphopeptide vectors were then clustered using Euclidean distance as the similarity metric and a dendrogram was prepared using the centroid linkage method. As can be seen in Fig. 4B, the physiological peptides are indistinguishable from the nonphysiological peptides. Similar results were obtained using other linkage methods (minimum, maximum, median) or using different numbers of residues up- and downstream of the pY (four or five; data not shown). This finding rules out the simplistic explanation that differences in the SH2 and PTB domain binding properties of physiological and nonphysiological peptides arise simply from differences in their physicochemical properties.
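The vector encoding and clustering described above can be sketched as follows. Each peptide becomes an 18-dimensional vector (three z-scale values for each of the three residues on either side of the pY), and clusters are merged by centroid linkage with Euclidean distance. The z-scale numbers in `TOY_Z` are placeholders invented for this example and must be replaced by the published values (ref. 42) for any real analysis; the flanking sequences and function names are likewise hypothetical.

```python
import math

# PLACEHOLDER (z1, z2, z3) values, invented for illustration only.
TOY_Z = {
    "A": (0.0, -1.0, 0.0), "D": (3.0, 0.0, 2.0), "E": (3.0, 1.0, 1.0),
    "G": (2.0, -4.0, 1.0), "L": (-4.0, -1.0, -1.0), "N": (3.0, 1.0, 0.0),
    "P": (-1.0, 0.0, 2.0), "V": (-3.0, -2.0, -1.0),
}


def peptide_vector(upstream3, downstream3, z=TOY_Z):
    """Concatenate z1-z3 for the 3 residues before and after the pY (18 values)."""
    vector = []
    for residue in upstream3 + downstream3:
        vector.extend(z[residue])
    return vector


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def centroid_linkage(vectors):
    """Repeatedly merge the two clusters whose centroids are closest.

    Returns the merge history as (members_a, members_b, centroid_distance).
    """
    clusters = [([i], list(v)) for i, v in enumerate(vectors)]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = euclidean(clusters[a][1], clusters[b][1])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        (members_a, cen_a), (members_b, cen_b) = clusters[a], clusters[b]
        na, nb = len(members_a), len(members_b)
        # New centroid is the size-weighted mean of the two old centroids.
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(cen_a, cen_b)]
        merges.append((members_a, members_b, d))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append((members_a + members_b, centroid))
    return merges


# Hypothetical flanking residues (upstream three, downstream three).
toy_peptides = [("DEV", "ENP"), ("DEV", "ENA"), ("LLV", "VLA"), ("LLV", "VLG")]
vectors = [peptide_vector(up, down) for up, down in toy_peptides]
for members_a, members_b, distance in centroid_linkage(vectors):
    print(members_a, members_b, round(distance, 3))
```

The merge history is the information a dendrogram displays; if physiological and nonphysiological peptides differed systematically in these descriptors, they would tend to separate into distinct branches.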
Fig. 4 Comparison of the physiological and nonphysiological phosphopeptides. (A) Histograms showing the frequency of peptides with different numbers of interactions. Physiological peptides are shown in red; nonphysiological peptides are shown in green. (B) Dendrogram ...
The observation that nonphysiological peptides (sequences that are not substrates for tyrosine kinases) bind to fewer domains than physiological peptides (sequences that are substrates) supports the hypothesis, first proposed by Cantley and co-workers,23,24 that kinases and interaction domains co-evolve. It also suggests, at least with respect to these three receptors, that selection favoring desirable interactions plays a larger role in establishing new recruitment sites than selection disfavoring deleterious interactions. Sequences that have not evolved to be phosphorylated tend, by default, not to bind to SH2 or PTB domains when they are phosphorylated artificially, even though the genome features over one hundred domains with diverse binding properties that recognize tyrosine-phosphorylated sequences. Even for a small binding site, there are many possible ligands and apparently only a small fraction of sequence space intersects with the binding preferences of natural SH2 and PTB domains. Incidental intersection cannot be ruled out, as is evident from the binding profile of EGFR 813, but, in general, selective pressure favoring new interactions must be applied in order to establish a physiological recruitment site within an RTK.
The striking preferential ability of the physiological peptides to bind to SH2 and PTB domains has another important implication: It strongly suggests that the sites of tyrosine phosphorylation from which they were derived play a biological role in recruiting SH2 or PTB domain-containing proteins in cells. Our finding is not a surprise for the well-characterized pY sites that have previously been shown to recruit signaling proteins in vivo. At least 35 proteins containing SH2 or PTB domains have been reported to interact with EGFR (ref. 43) and our microarrays detect approximately two-thirds of them.14 The situation is quite different, however, for FGFR1 and IGF1R. Substantially fewer direct interactions have been reported for these receptors and most of their pY sites have not yet been shown to act as recruitment sites. In the case of FGFR1, it is well-established that phospholipase C-γ (PLC-γ) is recruited to pY766 through its SH2 domain44–46 and, as anticipated, we observe interactions with both PLC-γ1 and PLC-γ2 on our arrays. Very few other SH2- or PTB-containing proteins, however, have been shown to bind directly to sites of tyrosine phosphorylation on FGFR1. As a result, it is generally believed that recruitment to this receptor occurs primarily through interactions with the constitutively associated adaptor protein FRS2.45,47 Our data suggest that there is substantially more direct recruitment to FGFR1 than is currently thought, and the same can be surmised for IGF1R. We submit that more interactions are occurring in vivo than have been described to date, but that these interactions are transient and so are not easily detected using standard biochemical approaches. This argues strongly for the development of new technologies that enable us to visualize dynamic, short-lived interactions in real time, in live cells.
From a pragmatic point of view, it is also instructive to know, for each receptor, whether the nonphysiological peptides bind to domains that are not targeted by the physiological peptides. For the receptors studied here, we find that this is rare. In the case of EGFR, all but five of the domains that recognize nonphysiological peptides also recognize physiological peptides. With IGF1R, the number drops to three and with FGFR1, every domain that binds a nonphysiological peptide also binds a physiological peptide. This has important implications for future studies of less well-characterized receptors. Even if the physiological sites of tyrosine phosphorylation are not known, quantitative interaction maps built using phosphopeptides derived from every intracellular tyrosine residue should approximate the ‘true’ maps built using only physiological sites of phosphorylation. Although such maps will contain extraneous, incorrect pY sites, many of these incorrect sites will feature no interactions and the others will feature interactions that largely overlap with correct interactions.
Based on the interaction maps of EGFR, FGFR1 and IGF1R, how do these receptors achieve specificity in signaling? It is likely that, in many cases, cellular context defines biological outcome. The identities and concentrations of signaling proteins vary from one cell type to the next, as do the levels of the transcription factors that carry out their instructions. Nevertheless, it has been shown that different RTKs induce different biological outcomes in the same cellular background. For example, FGFR1 stimulates neurite outgrowth in PC12 cells, while EGFR stimulates mitogenesis.13 When we compare the recruitment profiles of EGFR, FGFR1 and IGF1R at a qualitative level, we find that these receptors are remarkably similar (Fig. 5). Thirty-nine of the SH2- or PTB-containing proteins that recognize physiological sites of tyrosine phosphorylation are common to all three receptors; only 11, 7 and 3 proteins are unique to EGFR, FGFR1 and IGF1R, respectively. It is possible that specificity of signaling can be explained entirely by the proteins that are unique to each receptor. A closer examination of the proteins that are common to all three receptors, however, shows that they are very well-characterized proteins that activate well-studied signaling pathways. Moreover, these proteins have been described to promote disparate and even opposing cellular responses: proliferation (e.g., Shc and RasA1);48 [...] (e.g., Abl2, CrkL and Tenc1);49–51 [...] (e.g., Jak2 and Src);52,53 [...] (e.g., PI3K and PLC-γ);54,55 and apoptosis (e.g., [...]).
We hypothesize, based on these observations, that quantitative differences in the affinities of each receptor for the proteins they recruit play an important role in defining specificity. For example, although all three receptors feature pY sites that bind to the SH2 domains of the three isoforms of phosphoinositide-3 kinase (PI3K), these connections are strongest in FGFR1 and weakest in IGF1R. Similarly, although all three receptors feature sites that bind to the PTB domain of Shc, there are five such connections in EGFR and only one each in FGFR1 and IGF1R. We are currently investigating the hypothesis that quantitative differences in the biophysical recruitment profile of RTKs are predictive of quantitative differences in the signaling pathways they induce.
Fig. 5 Venn diagram showing proteins that interact with phosphopeptides derived from EGFR, FGFR1 and IGF1R. The red, green and blue circles represent all the proteins with at least one SH2 or PTB domain that interacts with at least one physiological phosphopeptide ...
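The three-way comparison summarized by the Venn diagram amounts to a handful of set operations. The sketch below uses placeholder protein names and invented memberships purely to show the computation; it does not reproduce the counts reported in the text, and the function name is an assumption.

```python
def venn3(a, b, c):
    """Return the proteins shared by all three sets and those unique to each."""
    return {
        "common": a & b & c,
        "only_a": a - b - c,
        "only_b": b - a - c,
        "only_c": c - a - b,
    }


# Placeholder names and memberships, invented for illustration.
egfr_binders = {"protA", "protB", "protC", "protD"}
fgfr1_binders = {"protA", "protB", "protE"}
igf1r_binders = {"protA", "protB", "protC"}

regions = venn3(egfr_binders, fgfr1_binders, igf1r_binders)
for name, members in regions.items():
    print(name, sorted(members))
```

Proteins shared by exactly two receptors (here `protC`) fall into none of the four regions returned, which is why the shared and unique counts alone need not sum to the total number of proteins.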