|Home | About | Journals | Submit | Contact Us | Français|
This is an open-access article distributed under the terms of the Creative Commons Attribution Licence, which permits distribution and reproduction in any medium, provided the original author and source are credited. Creation of derivative works is permitted but the resulting work may be distributed only under the same or similar licence to this one. This licence does not permit commercial exploitation without specific permission.
Receptor tyrosine kinases (RTKs) process extracellular cues by activating a broad array of signaling proteins. Paradoxically, they often use the same proteins to elicit diverse and even opposing phenotypic responses. Binary, ‘on–off' wiring diagrams are therefore inadequate to explain their differences. Here, we show that when six diverse RTKs are placed in the same cellular background, they activate many of the same proteins, but to different quantitative degrees. Additionally, we find that the relative phosphorylation levels of upstream signaling proteins can be accurately predicted using linear models that rely on combinations of receptor-docking affinities and that the docking sites for phosphoinositide 3-kinase (PI3K) and Shc1 provide much of the predictive information. In contrast, we find that the phosphorylation levels of downstream proteins cannot be predicted using linear models. Taken together, these results show that information processing by RTKs can be segmented into discrete upstream and downstream steps, suggesting that the challenging task of constructing mathematical models of RTK signaling can be parsed into separate and more manageable layers.
Receptor tyrosine kinases (RTKs) constitute a large family of single-spanning membrane proteins found only in Metazoans (Robinson et al, 2000). Their primary role is to mediate intercellular communication by recognizing extracellular ligands and translating that information into an appropriate cellular response (Schlessinger, 2000). The intracellular region of an RTK contains a tyrosine kinase domain as well as several tyrosine residues that are phosphorylated upon receptor activation. These phosphotyrosines act as relays for information transmission, and the sequences surrounding these sites define signal specificity. Intracellular signaling proteins bind to these sites of tyrosine phosphorylation through Src homology 2 (SH2) or phosphotyrosine-binding (PTB) domains, initiating a variety of signaling cascades within the cell.
RTKs can elicit diverse and even opposing phenotypic responses, ranging from adhesion to migration, differentiation to proliferation, and survival to apoptosis (Schlessinger, 2000; Yarden and Sliwkowski, 2001). Although no two receptors feature identical sequences surrounding their pTyr sites, there is considerable qualitative overlap in the pathways they activate (Fambrough et al, 1999; Simon, 2000). The ability of RTKs to signal through common pathways, yet to induce diverse phenotypic responses, has largely been attributed to differences in cellular context, as signaling proteins are differentially expressed in different cell types (Jordan et al, 2000; Simon, 2000). For example, fibroblast growth factor receptor 1 (FGFR1) induces differentiation in neuronal cells, but induces proliferation in fibroblasts (Marshall, 1995; Lin et al, 1996). When expressed in the same cellular background, however, different RTKs have also been shown to elicit different phenotypic responses. For example, activation of epidermal growth factor receptor (EGFR) induces proliferation in PC12 neuronal cells, whereas activation of FGFR1 induces differentiation (Pollock et al, 1990; Lin et al, 1996). How, then, are intrinsic differences between RTKs manifested within the same cell type? Where does the information reside that defines these differences? How is that information processed?
To address these questions, we expressed six diverse RTKs in the same cellular background and monitored their signaling properties by quantitative immunoblotting. We found that although they activated many of the same signaling proteins, they did so to different degrees. We then used protein microarrays to define a quantitative interaction map for each receptor by measuring the affinity of almost every human SH2 and PTB domain for phosphopeptides representing pTyr sites on the receptors. Using partial least-squares regression (PLSR), we found that the relative phosphorylation levels of upstream signaling proteins could be accurately predicted using linear combinations of receptor-docking affinities, and that much of the predictive information resides in the docking sites for two central signaling proteins: phosphoinositide 3-kinase (PI3K) and Shc1. We also found that the relative phosphorylation levels of downstream proteins could not be predicted using linear models, suggesting that RTK signaling can be segmented into discrete upstream and downstream layers.
To determine at a quantitative level how different RTKs behave when placed in the same cellular background, we selected six well-studied and phylogenetically diverse RTKs: EGFR, FGFR1, insulin-like growth factor 1 receptor (IGF1R), hepatocyte growth factor receptor (MET), neurotrophic tyrosine kinase receptor type 2 (NTRK2), and platelet-derived growth factor receptor β (PDGFRβ). Six stable cell lines were generated by transfecting the full-length coding region for each receptor into human embryonic kidney Flp-In-293 cells, which do not normally express these receptors at appreciable levels (Figure 1A). The resulting cell lines grew normally and, in each case, the receptor was produced at ~105 copies per cell and activated by its cognate ligand in a dose-dependent manner (Supplementary Figure S1).
To obtain a broad and quantitative view of how each receptor activates intracellular signaling proteins, the six cell lines were serum-starved for 24 h and stimulated for 5 min with saturating levels of the appropriate ligand. This early time point was chosen because many signaling proteins peak in their phosphorylation levels within the first 10 min of stimulation and because we wanted to capture immediate, receptor-dependent signaling events without additional complications arising from feedback loops and other forms of network regulation. Quantitative immunoblotting was then used to measure the relative phosphorylation levels of a wide range of proteins that have previously been implicated in RTK signaling (Figure 1B and C). In total, we queried 65 sites of phosphorylation on 57 proteins and observed growth factor-induced phosphorylation of 24 sites on 23 proteins (Supplementary Table SI). To compare the phosphorylation levels of a given protein across the six cell lines, lysate concentrations were normalized, basal phosphorylation was subtracted, and each level was calculated relative to the maximum observed level for that site (Figure 1B; Supplementary Figure S2; Supplementary information). Duplicate experiments were in close agreement (r=0.91; Supplementary Figure S3A). Interestingly, each receptor induced a distinct pattern of phosphorylation. Some proteins, such as SHP-2 and Cbl, were phosphorylated in as few as two of the cell lines, while others, such as CrkL and p90RSK, were phosphorylated in all six (Figure 1B). For every site of phosphorylation, quantitative differences were observed across the six cell lines and the rank order varied depending on the site. Thus, although these six receptors have previously been shown to activate many of the same pathways, they do so to different degrees when placed in the same cellular context. What, then, accounts for these differences? As RTKs initiate signaling by recruiting proteins to sites of tyrosine phosphorylation (Schlessinger, 2000), we asked whether there was information in the recruitment properties of the pTyr sites on these receptors that could explain the observed differences.
Sites of tyrosine phosphorylation are recognized by either SH2 (Sadowski et al, 1986) or PTB domains (Kavanaugh and Williams, 1994). To obtain a genome-wide, unbiased, and quantitative measure of the recruitment potential of each receptor, we prepared protein microarrays comprising nearly every SH2 and PTB domain encoded in the human genome (Figure 1D; Supplementary Table SII) (Jones et al, 2006). We then probed these arrays with fluorescently labeled, 18-residue phosphopeptides with sequences derived from every known site of tyrosine phosphorylation on each of the six receptors (Supplementary Table SIII). Equilibrium dissociation constants (KD values) were obtained by probing the arrays with eight concentrations of each peptide and fitting the resulting fluorescence data (Supplementary information) to an equation that describes saturation binding (Figure 1D) (Jones et al, 2006). In total, we queried 96 SH2 domains and 37 PTB domains with 47 phosphopeptides and observed 652 interactions with KD2 μM (Supplementary Table SIV). Weaker interactions could not be quantified using this approach. When we repeated this process, duplicate KD measurements were in close agreement (r=0.85; Supplementary Figure S3B) and the mean KD was used for subsequent analyses.
Of the 131 domains, 112 domains representing 74 different proteins bound at least one phosphopeptide. As anticipated, there was considerable qualitative overlap between the six receptors: 50 of these proteins recognized peptides from at least three receptors and 21 of them recognized peptides from at least five receptors (Supplementary Figure S4). In general, the domains that recognized the most receptors are those found in well-studied signaling proteins, including the lipid-modifying enzyme PI3K; the transcription factors Stat1 and Stat2; the non-receptor tyrosine kinases Src and Abl1; the guanine nucleotide exchange factor Vav2; the adaptor proteins Crk, CrkL, and Nck; and the scaffold proteins Shc1 and Grb7. Thus, if viewed in strictly binary terms, these phylogenetically diverse receptors differ very little in their recruitment properties with respect to these core signaling proteins. At the quantitative level, however, they differ substantially. For example, although five of the six receptors feature docking sites for the regulatory subunit of PI3K, there is only one low-affinity site (KD=590 nM) on IGF1R, but there are five sites, including one high-affinity site (KD=10 nM), on PDGFRβ. Quantitative differences in both the number of docking sites and the binding affinities at these sites may therefore explain the observed differences in signaling elicited by each receptor.
To test this hypothesis, we represented each phosphopeptide as a row vector of association constants, KA, with each element in the vector corresponding to a different SH2 or PTB domain-containing protein (Figure 1E). For proteins that contained two domains that bound the same peptide, the larger KA was used. In addition, the three isoforms of the regulatory subunit of PI3K were treated as a single protein as their SH2 domains behaved similarly. The binding vector for a given receptor was then defined as the sum of its phosphopeptide vectors to take into account the number of docking sites, as well as the affinities at each site. The implicit assumption in adding the phosphopeptide vectors is that multiple docking sites for the same protein within a given receptor act independently of each other. While this is probably not always true, it is the simplest way to combine the data and is a reasonable approximation. In addition, the phosphopeptide vectors were all weighted equally as the relative stoichiometry of phosphorylation at each pTyr site was not known. Thus, the intrinsic signaling capabilities of the six receptors was captured in the matrix X, which comprises six rows, one for each receptor, and 74 columns, one for each SH2 or PTB domain-containing protein (Figure 1F). In a similar manner, the cellular activity of the RTKs was captured in the matrix Y, which comprises six rows, one for each receptor, and 24 columns (y1…y24), one for each phosphorylation site that was monitored by immunoblotting (Figure 1C).
The simplest connection between the in vitro binding data and the cellular phosphorylation data is a one-to-one relationship in which the degree to which an SH2/PTB-containing protein is phosphorylated correlates linearly with its docking affinities. Of the eight proteins for which both microarray and immunoblotting data were obtained, significant correlations were observed for two: Shc1 (r=0.82, P=0.045) and PI3K (r=0.94, P=0.0059) (Figure 2). These correlations depend heavily on the number of Shc1- and PI3K-docking sites on each receptor. If the number of docking sites is taken into account but the affinities are ignored, the correlation actually improves for Shc1 (r=0.99, P=0.0001), but gets slightly worse for PI3K (r=0.91, P=0.013) (Figure 2). If the quantitative information is ignored and the interactions are treated as binary, correlations become meaningless as each protein recognized five of the six receptors. These results are consistent with a model in which Shc1 and PI3K interact directly with the activated receptors and are not influenced substantially by other docking proteins. For these two proteins, information processing is approximately linear and univariate.
The same is not true, however, for the other SH2/PTB-containing proteins that were monitored by immunoblotting; significant correlations were not observed (Figure 2). For these proteins, the reductionist assumption that they bind directly to the receptor and act independently is too simplistic. Some proteins that contain SH2 or PTB domains have been shown to compete with each other for the same pTyr sites (Zhang et al, 2003), and many have been shown to interact with each other and with components of the cell membrane (Schlessinger and Lemmon, 2003). Thus, it is likely that they are inextricably interconnected. Are their relationships complex and nonlinear, or can they be approximated using relatively simple models that depend on combinations of docking affinities, rather than on single affinities alone?
The simplest multivariate model is one in which the phosphorylation levels of a given protein, yi, can be predicted using a linear combination of docking affinities. As the number of variables (docking affinities) exceeds the number of observations (RTKs), we used PLSR to regress each yi against X. PLSR reduces the dimensionality of X by decomposing it into a small number of orthogonal components that capture most of the covariance between X and yi. Each component is a linear combination of docking affinities, weighted by how much they contribute to predicting each immunoblot (yi). We found that four components were sufficient to capture ~90% of the covariance with each yi. To guard against overfitting and to assess the predictive value of the docking affinities, we built our models using leave-one-out cross-validation: each model was trained using data from five receptors and then used to predict the immunoblotting data for the sixth receptor based on its docking affinities. This procedure was performed in all six combinations and the cross-validated residual between these predictions and the observed data, Q2, was calculated. To assess the significance of these predictions, we repeated our calculations 2000 times for each phosphorylation site using randomized X matrices and then calculated P-values for each model. Models were built using all of the microarray data, as well as subsets of the data that included only the SH2/PTB-containing proteins that bound at least two receptors, at least three receptors, at least four receptors, or at least five receptors. Similar results were obtained in every case, but the significance of the results increased as the number of variables was reduced (Supplementary Figure S5). Most of the information content in X resides in the 21 SH2/PTB-containing proteins that recognize at least five receptors and hence the results presented below are based on these data alone.
Of the 24 phosphorylation sites that we monitored by immunoblotting, nine were accurately predicted using linear combinations of docking affinities (Q20.9; Figure 3A and B; Supplementary Figure S6). Of these, six passed significance testing (P0.05 and false discovery rate 0.1). Interestingly, all nine of these sites are found on proteins that contain SH2 or PTB domains and therefore represent upstream signaling events (Figure 3B; Supplementary Figure S6). Moreover, only two phosphorylation sites that occur on SH2/PTB-containing proteins had a Q2 value less than 0.9: pTyr239/240 of Shc1 and pSer727 of Stat3. As noted earlier, the relative phosphorylation levels of Shc1 can be explained using only the number of Shc1-docking sites on each receptor (Figure 2); combinations of docking affinities are not required. More interesting is pSer727 of Stat3 (Q2=−0.33; P=0.94; Figure 3A). This serine residue is phosphorylated in a protein kinase C-dependent fashion and so represents a downstream signaling event (Aziz et al, 2007). In contrast, Tyr705 of Stat3 can be phosphorylated by the RTK itself and so represents an upstream event (Hwang et al, 2003); its phosphorylation is accurately predicted (Q2=0.99; P=0.03; Figure 3A). Similar to pSer727 of Stat3, the other 13 downstream signaling events could not be predicted using linear models (Figure 3B; Supplementary Figure S6). Thus, the phosphorylation sites that we monitored by immunoblotting naturally segregate into two groups: upstream phosphorylation events that are accurately predicted and downstream phosphorylation events that are not. From this, we submit that information processing by RTK signaling networks can be segmented into an upstream layer comprising proteins that are activated in an approximately linear manner through combinations of receptor-docking affinities and a downstream layer comprising proteins that are activated in a nonlinear manner. We note, however, that this result does not prove that the upstream step is linear mechanistically, but rather that this step can be approximated using relatively simple linear models.
Both the number of docking sites and the docking affinities are important for predicting upstream signaling events. If only the number of docking sites is taken into account, the models perform less well and the results are less significant (Figure 3C). If only binary information is used (proteins are described either to interact or not interact with a receptor), all predictions fail as, at this level, the receptors are very similar (X is close to singular). To determine where most of the predictive information resides, we assessed the contribution of each SH2/PTB-containing protein to each PLSR model by calculating their variable importance in the projection (VIP; see Materials and methods). Reduced models were then prepared using only the most important variables. For all of the upstream signaling events, reduced models that included only the four most important variables performed almost as well as the full PLSR models (Figure 3C). On average, the two most important variables were PI3K and Shc1 (Figure 4A). In other words, much of the information needed to predict the relative phosphorylation levels of upstream signaling proteins resides in the number and affinity of PI3K- and Shc1-docking sites on the RTK.
As these models are statistical in nature, this observation does not necessarily mean that PI3K and Shc1 have a causative function in determining the strength of signaling through other upstream proteins. PI3K- and Shc1-binding sites may have co-evolved with some other feature of RTKs that determines their ability to activate upstream proteins, such as kinase specificity or localization of the receptors to different membrane microdomains. Nevertheless, it is possible that these proteins do have a causative function in determining the extent to which other signaling proteins are activated.
This hypothesis cannot be tested by altering the abundance of PI3K or Shc1, as altering the composition of the cell would change the parameter values in the models. It is possible, however, to alter the catalytic activity of PI3K without altering its abundance using the small molecule inhibitor LY294002. If PI3K activity has a causative function in determining the degree to which upstream proteins are phosphorylated, we would expect LY294002 treatment to have the largest effect on proteins that have high PLSR coefficients for PI3K (Figure 4B), and on receptors with the strongest recruitment potential for PI3K (Figure 4C). We therefore stimulated all six cell lines in the presence or absence of LY294002 and assessed the relative phosphorylation levels of the upstream signaling proteins by immunoblotting (Figure 4D and E; Supplementary Figure S7; Supplementary information). Interestingly, the relative phosphorylation levels of Src, which has a low coefficient for PI3K (Figure 4B), were minimally affected by LY294002 treatment (Figure 4D), whereas the relative phosphorylation levels of Stat3, which has a high positive coefficient for PI3K (Figure 4B), were affected in a manner consistent with the number and affinity of PI3K-docking sites on the six RTKs (Figure 4E). This result suggests that, at least for Stat3, the contribution of PI3K in the PLSR model is, in part, dependent on its kinase activity. The same result was not observed, however, for all of the upstream signaling proteins (Supplementary Figure S7). The Stat3 result is not easily explained based on our current RTK wiring diagrams and a mechanistic understanding of this observation will require further investigation.
The overall importance of PI3K and Shc1 in RTK signaling was recently highlighted in a comprehensive map of the ErbB network, which revealed that a large fraction of information converges on a small number of signaling molecules, all of which can be modulated by PI3K and Shc1 (Oda et al, 2005). Interestingly, when we examined the sequences surrounding all known sites of tyrosine phosphorylation on human RTKs as reported in the Phospho. ELM database (Diella et al, 2008), we observed a distinct and significant (P<0.05) bias for sites that feature the consensus binding sequences for the PTB domain of Shc1 (NPXpY) (Songyang et al, 1995) and the SH2 domains of PI3K (pYXXM) (Songyang et al, 1993; Yaffe et al, 2001) (Figure 5A and B). This bias is not observed in known sites of tyrosine phosphorylation derived from all other human proteins (Figure 5C and D). Thus, we find that, despite activating many of the same proteins, intrinsic differences between RTKs are manifested in the degree to which they activate upstream signaling proteins and that much of this information resides in the number and affinity of docking sites for PI3K and Shc1. As these proteins lie upstream of the Akt and MAP kinase signaling pathways, and as these two pathways have been found repeatedly to have a central function in RTK biology, it is likely that our observations are not specific to HEK Flp-In-293 cells, but extend to more physiological settings as well.
Recently, Miller-Jensen et al (2007) showed that the phenotypic response of cells to external stimuli can be predicted using models that rely on linear combinations of a common set of downstream signaling proteins. Coupled with our results, this suggests that different RTKs may be able to elicit different phenotypic responses in the same cell type by activating a common set of signaling proteins, but to different quantitative degrees. In addition, our study, coupled with that of Miller-Jensen et al, supports a model in which information processing by RTK signaling networks can be segmented into three discrete layers: an upstream layer comprising proteins that are activated in a linear manner through combinations of receptor-docking affinities; an intermediate layer in which these signals are processed in a nonlinear manner; and a downstream layer in which integrators of signaling combine in a linear manner to determine cellular outcome. We submit that the difficult task of constructing mathematical models of RTK signaling can be parsed into discrete problems and that our greatest challenge lies in dissecting the middle layer.
Stable cell lines were generated by co-transfecting Flp-In-293 cells (Invitrogen, Carlsbad, CA) with the plasmid pEF5/FRT/V5-DEST bearing the open reading frame for each RTK and the accessory plasmid pOG44 according to the manufacturer's directions (Invitrogen). Cells were maintained in Dulbecco's modified Eagle's medium supplemented with 10% (v/v) fetal bovine serum, 2 mM glutamine, 100 IU/ml penicillin, 100 μg/ml streptomycin, and 150 μg/ml hygromycin B. All cell culture and immunoblotting experiments were performed using standard procedures. Rabbit-derived primary antibodies were from Cell Signaling Technologies (Beverly, MA; Supplementary Table SI). For quantitative immunoblots, bands were detected with IRDye 680-labeled goat–anti-rabbit IgG (LI-COR Biosciences, Lincoln, NE) and imaged using an Odyssey Infrared Imaging System (LI-COR Biosciences). Expression levels of the RTKs were determined using ELISA kits from Invitrogen for EGFR and MET, and from R&D Systems (Minneapolis, MN) for NTRK2 and PDGFRβ. All protein microarray experiments were performed as described earlier (Jones et al, 2006; Kaushansky et al, 2008).
To define the receptor-docking affinity matrix, X, the matrix of KD values (Supplementary Table SIV) was converted to a matrix of KA values (KA=1/KD). If a protein contained two domains that bound the same peptide, the higher KA value was used. Each phosphopeptide was then expressed as a row vector of KA values:
and each receptor was defined as the sum of its constituent phosphopeptide vectors:
The raw receptor-docking affinity matrix, Xraw, was then assembled from the six receptor vectors:
The raw matrix was adjusted such that every SH2/PTB-containing protein (i.e. every column) was mean-centred and weighted according to its average affinity. This yielded the final receptor-docking affinity matrix, X. For the models presented in Figure 3, proteins that bound fewer than five receptors were removed from the matrix. Models obtained using all of the data or increasingly smaller subsets are shown in Supplementary Figure S5.
Relative phosphorylation levels of signaling proteins, as measured by immunoblotting, were calculated by first subtracting the level observed in the mock-treated, parental Flp-In-293 cell line and then dividing each value by the maximum observed value for that site across the six cell lines. Each phosphorylation site was treated as a separate vector, y, and each y was mean-centred and variance-normalized. A PLSR (Geladi and Kowalski, 1986) was then performed separately on each y. For cross-validation, each receptor (row ‘i') was removed once from both X and y, the regression was performed, and the resulting model was used to predict the value of yi. The residual, Q2, of this prediction was then compared with residuals generated from randomly shuffling X 2000 times. The distribution of these residuals was used to calculate the P-value of the observed Q2.
The weighted sum of squares (also known as the VIP) for each variable, k, was calculated according to equation (5):
where KT is the total number of variables, a is the principal component, and SSa is the sum of squares for that component.
Experimentally determined sites of tyrosine phosphorylation in human proteins were acquired from the Phospho.ELM database (Diella et al, 2008). Of the 1397 identified sites, 196 were in RTKs and 1201 were in proteins other than RTKs. The amino-acid frequencies at positions upstream and downstream of pTyr sites were calculated and then normalized to the expected frequency of each amino acid in all human proteins (Echols et al, 2002). The resulting histograms of observed/expected frequencies were fit to a log-normal distribution from which P-values were calculated. All analyses were performed using MATLAB 7.4. (The MathWorks, Inc., Natick, MA). More detailed protocols are provided in Supplementary information.
Supplementary information: This supplementary file includes: Supplementary Methods; Supplementary References; Supplementary Figures 6, 7, 8, 9, 10, 11, 12; Supplementary Tables 1, 2, 3
Supplementary Table 4: Equilibrium dissociation constants for the binding of phosphopeptides to human SH2 or PTB domains.
We thank Jiunn-Ren Chen for helpful suggestions with the modeling and Jeffrey Knott, Susan Rogers, and Colleen Hunter for peptide synthesis. This study was supported by awards from the WM Keck Foundation, the Arnold and Mabel Beckman Foundation, and the Camille and Henry Dreyfus Foundation and by grants from the National Institutes of Health (R33 CA128726 and R21 CA126720). AG is the recipient of an NSF Graduate Research Fellowship, JAK is the recipient of a Howard Hughes Medical Institute Predoctoral Fellowship in the Biological Sciences, EMB and AK were supported in part by the NIH Molecular, Cellular, and Chemical Biology Training Grant (5 T32 GM07598-25), MS is the recipient of an Alfred and Isabel Bader fellowship and a Jacques-Émile Dubois fellowship, and JR is an employee of Cell Signaling Technology Inc.
Author contributions GM, AG, JAK, and EMB designed the study; AW-Y and MS generated the cell lines; JAK and EMB performed the cell-based experiments; JR supervised the peptide synthesis; AG and AK performed the protein microarray experiments; BHC contributed to the microarray experiments; AG performed the modeling experiments; AG, JAK, EMB, and GM interpreted the data; AG and GM wrote the paper, with contributions from JAK and EMB; and GM supervised the research.
John Rush is an employee of and stockholder in Cell Signaling Technologies Inc. (Danvers, MA) and Gavin MacBeath is an advisor for and stockholder in Merrimack Pharmaceuticals Inc. (Cambridge, MA) and Makoto Life Sciences Inc. (Bedford, MA).