To determine at a quantitative level how different RTKs behave when placed in the same cellular background, we selected six well-studied and phylogenetically diverse RTKs: EGFR, FGFR1, insulin-like growth factor 1 receptor (IGF1R), hepatocyte growth factor receptor (MET), neurotrophic tyrosine kinase receptor type 2 (NTRK2), and platelet-derived growth factor receptor β (PDGFRβ). Six stable cell lines were generated by transfecting the full-length coding region for each receptor into human embryonic kidney Flp-In-293 cells, which do not normally express these receptors at appreciable levels (). The resulting cell lines grew normally and, in each case, the receptor was produced at ~10
5 copies per cell and activated by its cognate ligand in a dose-dependent manner (
Supplementary Figure S1).
To obtain a broad and quantitative view of how each receptor activates intracellular signaling proteins, the six cell lines were serum-starved for 24 h and stimulated for 5 min with saturating levels of the appropriate ligand. This early time point was chosen because many signaling proteins peak in their phosphorylation levels within the first 10 min of stimulation and because we wanted to capture immediate, receptor-dependent signaling events without additional complications arising from feedback loops and other forms of network regulation. Quantitative immunoblotting was then used to measure the relative phosphorylation levels of a wide range of proteins that have previously been implicated in RTK signaling (). In total, we queried 65 sites of phosphorylation on 57 proteins and observed growth factor-induced phosphorylation of 24 sites on 23 proteins (
Supplementary Table SI). To compare the phosphorylation levels of a given protein across the six cell lines, lysate concentrations were normalized, basal phosphorylation was subtracted, and each level was calculated relative to the maximum observed level for that site (;
Supplementary Figure S2;
Supplementary information). Duplicate experiments were in close agreement (
r=0.91;
Supplementary Figure S3A). Interestingly, each receptor induced a distinct pattern of phosphorylation. Some proteins, such as SHP-2 and Cbl, were phosphorylated in as few as two of the cell lines, while others, such as CrkL and p90RSK, were phosphorylated in all six (). For every site of phosphorylation, quantitative differences were observed across the six cell lines and the rank order varied depending on the site. Thus, although these six receptors have previously been shown to activate many of the same pathways, they do so to different degrees when placed in the same cellular context. What, then, accounts for these differences? As RTKs initiate signaling by recruiting proteins to sites of tyrosine phosphorylation (
Schlessinger, 2000), we asked whether there was information in the recruitment properties of the pTyr sites on these receptors that could explain the observed differences.
Sites of tyrosine phosphorylation are recognized by either SH2 (
Sadowski et al, 1986) or PTB domains (
Kavanaugh and Williams, 1994). To obtain a genome-wide, unbiased, and quantitative measure of the recruitment potential of each receptor, we prepared protein microarrays comprising nearly every SH2 and PTB domain encoded in the human genome (;
Supplementary Table SII) (
Jones et al, 2006). We then probed these arrays with fluorescently labeled, 18-residue phosphopeptides with sequences derived from every known site of tyrosine phosphorylation on each of the six receptors (
Supplementary Table SIII). Equilibrium dissociation constants (
KD values) were obtained by probing the arrays with eight concentrations of each peptide and fitting the resulting fluorescence data (
Supplementary information) to an equation that describes saturation binding () (
Jones et al, 2006). In total, we queried 96 SH2 domains and 37 PTB domains with 47 phosphopeptides and observed 652 interactions with
KD![[less-than-or-eq, slant]](/corehtml/pmc/pmcents/les.gif)
2 μM (
Supplementary Table SIV). Weaker interactions could not be quantified using this approach. When we repeated this process, duplicate
KD measurements were in close agreement (
r=0.85;
Supplementary Figure S3B) and the mean
KD was used for subsequent analyses.
Of the 131 domains, 112 domains representing 74 different proteins bound at least one phosphopeptide. As anticipated, there was considerable qualitative overlap between the six receptors: 50 of these proteins recognized peptides from at least three receptors and 21 of them recognized peptides from at least five receptors (
Supplementary Figure S4). In general, the domains that recognized the most receptors are those found in well-studied signaling proteins, including the lipid-modifying enzyme PI3K; the transcription factors Stat1 and Stat2; the non-receptor tyrosine kinases Src and Abl1; the guanine nucleotide exchange factor Vav2; the adaptor proteins Crk, CrkL, and Nck; and the scaffold proteins Shc1 and Grb7. Thus, if viewed in strictly binary terms, these phylogenetically diverse receptors differ very little in their recruitment properties with respect to these core signaling proteins. At the quantitative level, however, they differ substantially. For example, although five of the six receptors feature docking sites for the regulatory subunit of PI3K, there is only one low-affinity site (
KD=590 nM) on IGF1R, but there are five sites, including one high-affinity site (
KD=10 nM), on PDGFRβ. Quantitative differences in both the number of docking sites and the binding affinities at these sites may therefore explain the observed differences in signaling elicited by each receptor.
To test this hypothesis, we represented each phosphopeptide as a row vector of association constants, KA, with each element in the vector corresponding to a different SH2 or PTB domain-containing protein (). For proteins that contained two domains that bound the same peptide, the larger KA was used. In addition, the three isoforms of the regulatory subunit of PI3K were treated as a single protein as their SH2 domains behaved similarly. The binding vector for a given receptor was then defined as the sum of its phosphopeptide vectors to take into account the number of docking sites, as well as the affinities at each site. The implicit assumption in adding the phosphopeptide vectors is that multiple docking sites for the same protein within a given receptor act independently of each other. While this is probably not always true, it is the simplest way to combine the data and is a reasonable approximation. In addition, the phosphopeptide vectors were all weighted equally as the relative stoichiometry of phosphorylation at each pTyr site was not known. Thus, the intrinsic signaling capabilities of the six receptors was captured in the matrix X, which comprises six rows, one for each receptor, and 74 columns, one for each SH2 or PTB domain-containing protein (). In a similar manner, the cellular activity of the RTKs was captured in the matrix Y, which comprises six rows, one for each receptor, and 24 columns (y1…y24), one for each phosphorylation site that was monitored by immunoblotting ().
The simplest connection between the in vitro binding data and the cellular phosphorylation data is a one-to-one relationship in which the degree to which an SH2/PTB-containing protein is phosphorylated correlates linearly with its docking affinities. Of the eight proteins for which both microarray and immunoblotting data were obtained, significant correlations were observed for two: Shc1 (r=0.82, P=0.045) and PI3K (r=0.94, P=0.0059) (). These correlations depend heavily on the number of Shc1- and PI3K-docking sites on each receptor. If the number of docking sites is taken into account but the affinities are ignored, the correlation actually improves for Shc1 (r=0.99, P=0.0001), but gets slightly worse for PI3K (r=0.91, P=0.013) (). If the quantitative information is ignored and the interactions are treated as binary, correlations become meaningless as each protein recognized five of the six receptors. These results are consistent with a model in which Shc1 and PI3K interact directly with the activated receptors and are not influenced substantially by other docking proteins. For these two proteins, information processing is approximately linear and univariate.
The same is not true, however, for the other SH2/PTB-containing proteins that were monitored by immunoblotting; significant correlations were not observed (). For these proteins, the reductionist assumption that they bind directly to the receptor and act independently is too simplistic. Some proteins that contain SH2 or PTB domains have been shown to compete with each other for the same pTyr sites (
Zhang et al, 2003), and many have been shown to interact with each other and with components of the cell membrane (
Schlessinger and Lemmon, 2003). Thus, it is likely that they are inextricably interconnected. Are their relationships complex and nonlinear, or can they be approximated using relatively simple models that depend on combinations of docking affinities, rather than on single affinities alone?
The simplest multivariate model is one in which the phosphorylation levels of a given protein,
yi, can be predicted using a linear combination of docking affinities. As the number of variables (docking affinities) exceeds the number of observations (RTKs), we used PLSR to regress each
yi against
X. PLSR reduces the dimensionality of
X by decomposing it into a small number of orthogonal components that capture most of the covariance between
X and
yi. Each component is a linear combination of docking affinities, weighted by how much they contribute to predicting each immunoblot (
yi). We found that four components were sufficient to capture ~90% of the covariance with each
yi. To guard against overfitting and to assess the predictive value of the docking affinities, we built our models using leave-one-out cross-validation: each model was trained using data from five receptors and then used to predict the immunoblotting data for the sixth receptor based on its docking affinities. This procedure was performed in all six combinations and the cross-validated residual between these predictions and the observed data,
Q2, was calculated. To assess the significance of these predictions, we repeated our calculations 2000 times for each phosphorylation site using randomized
X matrices and then calculated
P-values for each model. Models were built using all of the microarray data, as well as subsets of the data that included only the SH2/PTB-containing proteins that bound at least two receptors, at least three receptors, at least four receptors, or at least five receptors. Similar results were obtained in every case, but the significance of the results increased as the number of variables was reduced (
Supplementary Figure S5). Most of the information content in
X resides in the 21 SH2/PTB-containing proteins that recognize at least five receptors and hence the results presented below are based on these data alone.
Of the 24 phosphorylation sites that we monitored by immunoblotting, nine were accurately predicted using linear combinations of docking affinities (
Q2![[gt-or-equal, slanted]](/corehtml/pmc/pmcents/ges.gif)
0.9; ;
Supplementary Figure S6). Of these, six passed significance testing (
P![[less-than-or-eq, slant]](/corehtml/pmc/pmcents/les.gif)
0.05 and false discovery rate
![[less-than-or-eq, slant]](/corehtml/pmc/pmcents/les.gif)
0.1). Interestingly, all nine of these sites are found on proteins that contain SH2 or PTB domains and therefore represent upstream signaling events (;
Supplementary Figure S6). Moreover, only two phosphorylation sites that occur on SH2/PTB-containing proteins had a
Q2 value less than 0.9: pTyr239/240 of Shc1 and pSer727 of Stat3. As noted earlier, the relative phosphorylation levels of Shc1 can be explained using only the number of Shc1-docking sites on each receptor (); combinations of docking affinities are not required. More interesting is pSer727 of Stat3 (
Q2=−0.33;
P=0.94; ). This serine residue is phosphorylated in a protein kinase C-dependent fashion and so represents a downstream signaling event (
Aziz et al, 2007). In contrast, Tyr705 of Stat3 can be phosphorylated by the RTK itself and so represents an upstream event (
Hwang et al, 2003); its phosphorylation is accurately predicted (
Q2=0.99;
P=0.03; ). Similar to pSer727 of Stat3, the other 13 downstream signaling events could not be predicted using linear models (;
Supplementary Figure S6). Thus, the phosphorylation sites that we monitored by immunoblotting naturally segregate into two groups: upstream phosphorylation events that are accurately predicted and downstream phosphorylation events that are not. From this, we submit that information processing by RTK signaling networks can be segmented into an upstream layer comprising proteins that are activated in an approximately linear manner through combinations of receptor-docking affinities and a downstream layer comprising proteins that are activated in a nonlinear manner. We note, however, that this result does not prove that the upstream step is linear mechanistically, but rather that this step can be approximated using relatively simple linear models.
Both the number of docking sites and the docking affinities are important for predicting upstream signaling events. If only the number of docking sites is taken into account, the models perform less well and the results are less significant (). If only binary information is used (proteins are described either to interact or not interact with a receptor), all predictions fail as, at this level, the receptors are very similar (X is close to singular). To determine where most of the predictive information resides, we assessed the contribution of each SH2/PTB-containing protein to each PLSR model by calculating their variable importance in the projection (VIP; see Materials and methods). Reduced models were then prepared using only the most important variables. For all of the upstream signaling events, reduced models that included only the four most important variables performed almost as well as the full PLSR models (). On average, the two most important variables were PI3K and Shc1 (). In other words, much of the information needed to predict the relative phosphorylation levels of upstream signaling proteins resides in the number and affinity of PI3K- and Shc1-docking sites on the RTK.
As these models are statistical in nature, this observation does not necessarily mean that PI3K and Shc1 have a causative function in determining the strength of signaling through other upstream proteins. PI3K- and Shc1-binding sites may have co-evolved with some other feature of RTKs that determines their ability to activate upstream proteins, such as kinase specificity or localization of the receptors to different membrane microdomains. Nevertheless, it is possible that these proteins do have a causative function in determining the extent to which other signaling proteins are activated.
This hypothesis cannot be tested by altering the abundance of PI3K or Shc1, as altering the composition of the cell would change the parameter values in the models. It is possible, however, to alter the catalytic activity of PI3K without altering its abundance using the small molecule inhibitor LY294002. If PI3K activity has a causative function in determining the degree to which upstream proteins are phosphorylated, we would expect LY294002 treatment to have the largest effect on proteins that have high PLSR coefficients for PI3K (), and on receptors with the strongest recruitment potential for PI3K (). We therefore stimulated all six cell lines in the presence or absence of LY294002 and assessed the relative phosphorylation levels of the upstream signaling proteins by immunoblotting (;
Supplementary Figure S7;
Supplementary information). Interestingly, the relative phosphorylation levels of Src, which has a low coefficient for PI3K (), were minimally affected by LY294002 treatment (), whereas the relative phosphorylation levels of Stat3, which has a high positive coefficient for PI3K (), were affected in a manner consistent with the number and affinity of PI3K-docking sites on the six RTKs (). This result suggests that, at least for Stat3, the contribution of PI3K in the PLSR model is, in part, dependent on its kinase activity. The same result was not observed, however, for all of the upstream signaling proteins (
Supplementary Figure S7). The Stat3 result is not easily explained based on our current RTK wiring diagrams and a mechanistic understanding of this observation will require further investigation.
The overall importance of PI3K and Shc1 in RTK signaling was recently highlighted in a comprehensive map of the ErbB network, which revealed that a large fraction of information converges on a small number of signaling molecules, all of which can be modulated by PI3K and Shc1 (
Oda et al, 2005). Interestingly, when we examined the sequences surrounding all known sites of tyrosine phosphorylation on human RTKs as reported in the Phospho. ELM database (
Diella et al, 2008), we observed a distinct and significant (
P<0.05) bias for sites that feature the consensus binding sequences for the PTB domain of Shc1 (NPXpY) (
Songyang et al, 1995) and the SH2 domains of PI3K (pYXXM) (
Songyang et al, 1993;
Yaffe et al, 2001) (). This bias is not observed in known sites of tyrosine phosphorylation derived from all other human proteins (). Thus, we find that, despite activating many of the same proteins, intrinsic differences between RTKs are manifested in the degree to which they activate upstream signaling proteins and that much of this information resides in the number and affinity of docking sites for PI3K and Shc1. As these proteins lie upstream of the Akt and MAP kinase signaling pathways, and as these two pathways have been found repeatedly to have a central function in RTK biology, it is likely that our observations are not specific to HEK Flp-In-293 cells, but extend to more physiological settings as well.
Recently,
Miller-Jensen et al (2007) showed that the phenotypic response of cells to external stimuli can be predicted using models that rely on linear combinations of a common set of downstream signaling proteins. Coupled with our results, this suggests that different RTKs may be able to elicit different phenotypic responses in the same cell type by activating a common set of signaling proteins, but to different quantitative degrees. In addition, our study, coupled with that of Miller-Jensen
et al, supports a model in which information processing by RTK signaling networks can be segmented into three discrete layers: an upstream layer comprising proteins that are activated in a linear manner through combinations of receptor-docking affinities; an intermediate layer in which these signals are processed in a nonlinear manner; and a downstream layer in which integrators of signaling combine in a linear manner to determine cellular outcome. We submit that the difficult task of constructing mathematical models of RTK signaling can be parsed into discrete problems and that our greatest challenge lies in dissecting the middle layer.