A rapid peptide-based approach for the high-throughput determination of kinase consensus phosphorylation site motifs
To determine phosphorylation motifs for yeast protein kinases, we developed a high-throughput approach using our previously reported positional scanning peptide library (13). This library consisted of 200 distinct peptide mixtures in which each 16-mer peptide contained a central fixed phosphorylation acceptor (phosphoacceptor) site (an equimolar mixture of Ser and Thr) flanked by degenerate positions consisting of equimolar mixtures of the 20 amino acids excluding Ser, Thr, and Cys, and a carboxy-terminal biotin tag (). For each of the nine positions surrounding the phosphoacceptor site, there were 22 peptide mixtures in which each of the 20 unmodified amino acids, as well as phosphothreonine (pT) and phosphotyrosine (pY), were fixed. In addition to these 198 (9 × 22) peptide mixtures, two control peptide mixtures bearing either Ser or Thr alone as the fixed phosphoacceptor residue in the context of a fully degenerate sequence were also included. These control mixtures served as indicators of any preference the kinase had for either Ser or Thr residues at the phosphoacceptor site. Peptides were incubated with the kinase of interest in the presence of radiolabeled ATP. At the end of the incubation period, aliquots of each reaction were spotted simultaneously using a capillary pin-based liquid transfer device onto a streptavidin-coated membrane that captured the peptide substrates through their carboxy-terminal biotin tags. After extensive washing, the membrane was dried and exposed to a phosphor screen, allowing the extent of radiolabel incorporation for each peptide to be visualized and quantified. To enable high-throughput analysis, all steps were performed in a 1536-well format, thereby reducing the amount of kinase and peptide required and enabling simultaneous analysis of four kinases.
Fig. 1 Miniaturized peptide array approach enables high-throughput analysis of kinase consensus phosphorylation motifs. (A) Scheme for kinase peptide screening. Capillary pin-based liquid transfer devices were used to add components to reactions (2 μ…
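The composition of the library described above can be enumerated in a few lines. This is only an illustrative sketch: the position labels (−5 through +4, with 0 as the phosphoacceptor) are an assumption, since the text states only that nine positions surround the phosphoacceptor, and the function name `library_mixtures` is hypothetical.

```python
from itertools import product

# The 20 unmodified amino acids (one-letter codes) plus the two
# phosphorylated residues that were fixed in the library.
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
FIXED_RESIDUES = AMINO_ACIDS + ["pT", "pY"]

# Assumed labels for the nine positions flanking the phosphoacceptor (0).
POSITIONS = [-5, -4, -3, -2, -1, 1, 2, 3, 4]


def library_mixtures():
    """Return one (position, fixed residue) label per peptide mixture."""
    scanning = list(product(POSITIONS, FIXED_RESIDUES))
    # Two control mixtures fix Ser or Thr alone at the phosphoacceptor.
    controls = [(0, "S"), (0, "T")]
    return scanning + controls


mixtures = library_mixtures()
print(len(mixtures))  # 9 * 22 + 2 = 200
```

Each label corresponds to one well of the assay, so four kinases occupy 800 of the wells on a 1536-well plate.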
Three yeast kinases (Tpk1, Tpk2, and Ste20) were assayed with both the miniaturized and large-volume formats, and we performed multiple replicates with one of these kinases, Tpk1. Identical results were observed with the two formats and in replicate assays with the 1536-well format (data for Tpk1 are shown in Fig. S1). These kinases also recapitulated preferences of their mammalian orthologs for basic residues upstream of the phosphorylation site (13). These results confirm that the miniaturized peptide library screening system is reproducible and provides data that are quantitatively equivalent to those from lower-throughput approaches.
Screening yeast kinases for their consensus phosphorylation site motifs
With our peptide array method, we screened 111 of the 122 yeast kinases. Kinases were initially purified from yeast strains that harbor galactose-inducible expression plasmids bearing either a C-terminal tandem affinity purification tag or an N-terminal glutathione S-transferase (GST) tag (15). In a number of instances, it was necessary to perform the assay in the presence of known activating subunits [for example, cyclins for cyclin-dependent kinases (CDKs)], phosphorylate the kinase in vitro or co-express it with an activating kinase, or purify the kinase from yeast grown under activating conditions. For kinases that gave poor yields from yeast, we employed alternative bacterial and mammalian cell expression systems. Each kinase was assayed on the peptide substrates in duplicate on separate days. In total, we generated reproducible phosphorylation motifs for 61 of the 111 yeast kinases screened ( and table S1). Three distinct motifs were generated for the cyclin-dependent kinase Pho85 by analyzing it separately in complex with different cyclin subunits (Pho80, Pcl1, and Pcl2). The remaining kinases were not sufficiently active to phosphorylate the peptides above background levels. Some of these kinases may be highly specific for particular protein substrates and thus may not phosphorylate peptides efficiently. For example, in keeping with previous observations for their mammalian orthologs (17), we did not observe activity on our peptide substrates for the eight kinases in the mitogen-activated protein kinase kinase (MAPKK) and mitogen-activated protein kinase kinase kinase (MAPKKK) families. Other kinases were likely simply inactive under exponential growth conditions or when assayed in the absence of obligate binding partners and may be suitable for analysis once their activation mechanisms are more completely understood.
Approximately half of the phosphorylation site motifs that we determined for yeast kinases were identical to known motifs, as they corresponded to yeast homologs of mammalian kinases that have been previously characterized (11). In contrast, the remaining kinases and their mammalian homologs either have not been previously characterized (table S2 lists mammalian homologs and indicates which kinases have previously known motifs) or, in one instance (Tos3), yielded a different motif from that reported. Representative spot arrays produced by four kinases for which phosphorylation motifs were not previously known (Atg1, Gin4, Mps1, and Prk1) are shown in . Spot intensities from the peptide arrays were quantified, background corrected, and normalized to provide the selectivity values shown in . We verified the consensus phosphorylation motifs for these kinases by performing kinase assays using optimized peptide substrates (named ATGtide, GINtide, MPStide, and PRKtide, respectively) consisting of those residues that were most highly selected at each position. As shown in , each kinase was highly specific for its corresponding peptide substrate, thus providing independent validation of our mixture-based peptide library screening approach.
Table 1 Quantified selectivity values for protein kinases discussed in the text. Peptide array data were quantified and normalized to an average value of 1 within a position. Positively selected residues with values greater than 1.5 are shown. Complete quantified …
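The normalization described in the Table 1 legend (an average value of 1 within each position, with residues above 1.5 reported as positively selected) can be sketched as follows. The intensities are invented toy values, the function names are hypothetical, and clipping background-corrected signals at zero is an assumption rather than a documented step of the analysis.

```python
def normalize_by_position(signal, background=0.0):
    """Background-correct raw spot intensities and scale each position
    (one row per position) so that its mean selectivity value is 1.

    Assumes every position has at least one spot above background;
    otherwise the row mean is zero and the division fails.
    """
    normalized = []
    for row in signal:
        # Assumption: negative values after correction are set to zero.
        corrected = [max(value - background, 0.0) for value in row]
        mean = sum(corrected) / len(corrected)
        normalized.append([value / mean for value in corrected])
    return normalized


def positively_selected(selectivity, residues, threshold=1.5):
    """List the residues whose selectivity exceeds the threshold per position."""
    return [
        [res for res, value in zip(residues, row) if value > threshold]
        for row in selectivity
    ]


# Toy data: two positions, four residues (not measured values).
residues = ["A", "D", "L", "R"]
raw = [[10, 10, 10, 50], [20, 20, 20, 20]]
selectivity = normalize_by_position(raw)
print(selectivity[0])  # [0.5, 0.5, 0.5, 2.5]
print(positively_selected(selectivity, residues))  # [['R'], []]
```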
Notably, the autophagy-linked kinase Atg1 has an atypical motif exhibiting selections for hydrophobic residues at multiple positions. We verified this motif by making targeted substitutions to the ATGtide substrate. As anticipated, substituting a different favorable hydrophobic residue (Met) at the most selective position (P−3) had no significant effect on the rate of ATGtide phosphorylation. Moreover, substituting unfavorable charged residues at any of the three most strongly selective positions dramatically reduced the reaction rate ().
Overall features of kinase phosphorylation signatures
Normalized, background-corrected phosphorylation signals for each kinase were assembled into position weight matrices (PWMs), which are quantitative representations of the phosphorylation motif. We scored each position for its total selectivity, and a specificity heat map of all kinases and positions revealed the wide range of selectivity exhibited by kinases (). At one extreme, Yck1 and Cka1 (yeast casein kinase 1 and casein kinase 2 homologs) were highly sequence specific, with requirements for particular amino acids at multiple positions. At the other extreme, Cak1 and Rad53 were the least selective in that, although the extent of substrate phosphorylation by these kinases was clearly dependent on peptide sequence, there were no residues that were absolutely required at any position surrounding the phosphoacceptor. Most kinases fell between these extremes, with a combination of required residues and more subtle propensities that influence the overall efficiency of phosphorylation. Furthermore, although at least several kinases were highly selective at each position surrounding the phosphorylation site, kinases were most frequently selective at the P−3 position, followed by the P−2 and P+1 positions. By contrast, few kinases were selective at the P−1 position.
Fig. 2 Heat map ranking kinases by their specificity quotients as calculated from their average PWMs. Kinases are ranked from least specific (top) to most specific (bottom). The specificity in each position is defined as the information content in each position, …
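One common way to express per-position information content is the Shannon information of the residue distribution relative to a uniform background. The sketch below uses that definition as an assumption; the exact formula behind the heat map is not given in this section, and the function names and toy PWMs are hypothetical.

```python
import math


def information_content(selectivities):
    """Information content (bits) of one PWM position.

    Assumes a uniform background: log2(N) minus the Shannon entropy of
    the selectivity values rescaled to sum to 1.
    """
    total = sum(selectivities)
    probabilities = [value / total for value in selectivities]
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return math.log2(len(selectivities)) - entropy


def total_specificity(pwm):
    """Sum the per-position information content over a whole PWM."""
    return sum(information_content(row) for row in pwm)


# Toy PWMs with 9 positions and 20 residues each (not measured data).
unselective = [[1.0] * 20 for _ in range(9)]
selective = [[1.0] * 20 for _ in range(9)]
# One position with an absolute requirement for a single residue.
selective[2] = [20.0 if i == 14 else 0.0 for i in range(20)]

print(total_specificity(unselective) < total_specificity(selective))  # True
```

Under this definition a position with no preference scores 0 bits and a position with an absolute requirement for one of 20 residues scores log2(20), about 4.32 bits.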
The 61 yeast kinases were clustered into groups on the basis of phosphorylation site selectivity (). Thirty-five kinases targeted basophilic motifs. Thirty-one of these showed a classic “basophilic” signature (10), with a strong selectivity primarily for an Arg residue at the P−3 position. This was the single most common feature found among all motifs (, table S1). Four other basophilic kinases, Ipl1, Skm1, Ste20, and Cla4, were selective for Arg at the P−2 position, but did not show strong selectivity for Arg at the P−3 position ( and table S1). The basophilic kinases, however, diverged with respect to the residues selected at other positions. For example, basophilic kinases are often reported to be selective primarily for either Leu or Arg at the P−5 position, as well as selective for Arg at P−3 (13). Among the various kinases that selected Arg at the P−3 position, we observed a spectrum of residues selected at the P−5 position, including Leu (Cmk1 and Cmk2) and Arg (Ypk1), but also Met (Vhs1), Val or Ile (Prr1), and His (Psk2) ( and table S1). The seven proline-directed kinases, which primarily selected for Pro at the P+1 position, were also distinguishable on the basis of selectivity at other positions. For example, Kss1, Hog1, and Fus3 all showed a secondary selectivity for Pro at the P−2 position that was not observed for Pho85 or Cdc28. Other motifs were less common and included multiple distinct “acidophilic” motifs in which the strongest selectivity was for Asp, Glu, or pT. Such acidophilic motifs have been previously seen for various mammalian kinases, including GSK3 (selectivity for acidic amino acids at the P+4 position), CK1 (P−5 through P−3), PLK (P−2), and CK2 (P+1 through P+3) (21). All yeast orthologs of these kinases recapitulated the motif found in their mammalian orthologs (table S2), but we also found additional yeast acidophilic kinases that were not anticipated (Mps1, Gcn2, and Cdc7). In addition, three kinases, Atg1, Kin1, and Kin3, exhibited their strongest selectivities for hydrophobic residues. The remaining kinases exhibited multiple strong selectivities and could not easily be categorized.
Fig. 3 Dendrogram of yeast kinases clustered by specificity. Specificity categories are indicated by shading: red, acidophilic; orange, Pro-directed; cyan, P−3 Arg selecting; blue, P−2 Arg selecting; green, other. Because there were multiple …
Connecting phosphorylation site motifs to kinase specificity determining residues
Yeast kinases have been classified into five groups on the basis of sequence homology: AGC (PKA/PKG/PKC), CAMK (calcium/calmodulin-regulated and structurally similar kinases), CMGC (CDKs, MAPK, GSK, and CDK-like kinases), STE11/STE20, and STE7/MEK (MAPKK) (24). These groups have then been classified further into families that share a high degree of sequence similarity within their catalytic domains. Although related kinases generally recognized similar phosphorylation motifs, kinases within the same family occasionally exhibited differences, both subtle and striking. One family that illustrates striking differences is the Snf1 kinase family, which belongs to the CAMK group. In yeast, the Snf1 [also known as AMPK (AMP-activated protein kinase)] family has six members: Gin4, Hsl1, Kcc4, Kin1, Kin2, and Snf1. We identified consensus phosphorylation site motifs for each of these kinases with the exception of Kin2 ( and table S1). All five kinases had common features in their motifs, which are also shared with mammalian AMPKs (25). For example, each one had preferences for a Ser residue as the phosphoacceptor site, a Ser residue at the P−2 position, an Asn residue at the P+3 position, and hydrophobic residues at the P+4 position (Gin4, Snf1, and Kin1 are summarized in ; see Dataset S1 for quantitative data for Hsl1 and Kcc4). Strikingly, however, only four of the five Snf1 family kinases exhibited the hallmark basophilic P−3 Arg selectivity of the CAMK group, with Kin1 lacking this conserved feature. Instead, Kin1 had an additional preference for an Asn residue at the P−2 position. This difference correlated with a single amino acid substitution within the kinase catalytic domain (). Gin4, Hsl1, Kcc4, and Snf1 each have a conserved Glu residue (corresponding to Glu127 in PKA, ). Crystal structures of multiple basophilic kinases in complex with peptide substrates have shown that this residue forms a salt bridge with the guanidino group of the P−3 Arg residue of the bound substrate (27). Unlike the other family members, Kin1 has a Gln residue in place of this conserved Glu. These observations are thus consistent with a role for Glu127 as the critical specificity-determining residue for Arg at the P−3 position in substrates, at least within the Snf1 family.
Fig. 4 Comparison of kinase consensus phosphorylation site motifs to primary sequence reveals specificity-determining residues. (A) Sequence alignment of the regions surrounding residues 127 and 170 (human PKA numbering) in the catalytic domain of representative …
However, crystallographic insight into specificity determinants in protein kinases is limited to a handful of cases in which structures of kinase-peptide complexes have been solved. Although computational approaches have offered additional insight into structural features that control specificity (31), the existence of alternative binding modes, even between kinases with similar specificity (30), makes it difficult to draw general conclusions regarding the relationship of kinase sequence to specificity. Indeed, multiple sequence alignment of the yeast kinome and comparison with our experimentally determined motifs indicated that the presence of an acidic residue at position 127 is neither necessary nor sufficient to direct selectivity for Arg at the P−3 position in substrates. For example, within the CMGC group, members of the MAPK and CDK families (Fus3, Kss1, Hog1, Cdc28, and Pho85), which are proline-directed kinases, have an Asp residue at that position, despite a lack of selectivity for Arg at the P−3 position. Conversely, Yak1 within the same group is basophilic, yet lacks an acidic residue at that position ( and ). Presumably, other residues within the catalytic domain are responsible for dictating a basophilic signature within this group of kinases.
With our large collection of kinase motifs, we identified previously unknown specificity-determining residues, including, but not restricted to, residues that might confer P−3 Arg selectivity for kinases that are not part of the Snf1 family. We used an approach based on the idea of co-variation (33). We identified residues whose variation in the primary sequence of the catalytic domain significantly correlated with the variation in phosphorylation site specificity across kinases. To measure sequence variation, we used a simple pairwise similarity matrix, and to compare specificities, we calculated the Frobenius norm of the differences in PWMs ( and ). This approach reproduced several specificity-determining residues previously known from both structural and mutagenesis studies, including Glu127. In addition, we uncovered many previously unknown candidate specificity-determining residues, seven of which were predicted to be within 10 Å of a bound protein substrate. Among these, a Glu residue at position 170 (PKA numbering) correlated with P−3 Arg selectivity among CMGC kinases. This result contrasts with a previous prediction based on modeling of DYRK1A, the human homolog of Yak1 (34). To test our predictions, we examined the role of residue 170 in substrate selection. Indeed, a Ser to Glu mutation at the analogous position in the MAPK Kss1 (residue 147) conferred a basophilic signature ( and Fig. S2). This result validates our ability to predict new specificity-determining residues on the basis of our large motif dataset.
Table 2 Computationally predicted kinase specificity-determining residues. Correlation values and peptide-kinase distance measurements are defined in the Materials and Methods section.
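The co-variation idea can be illustrated with a minimal sketch. For one alignment column, every pair of kinases gets a sequence dissimilarity and a PWM distance (the Frobenius norm of the PWM difference), and the two sets of pairwise values are then correlated. The identity-based dissimilarity, the use of Pearson correlation, the function names, and the toy data are all assumptions made for illustration; the actual similarity matrix and significance test are those defined in the Materials and Methods.

```python
import math
from itertools import combinations


def frobenius_distance(pwm_a, pwm_b):
    """Frobenius norm of the element-wise difference between two PWMs."""
    return math.sqrt(
        sum(
            (a - b) ** 2
            for row_a, row_b in zip(pwm_a, pwm_b)
            for a, b in zip(row_a, row_b)
        )
    )


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def column_covariation(residues, pwms):
    """Correlate residue differences at one alignment column with PWM
    distances over all kinase pairs.

    residues: dict mapping kinase name to its residue at this column
    pwms:     dict mapping kinase name to its PWM (list of rows)
    """
    seq_diff, pwm_diff = [], []
    for a, b in combinations(sorted(residues), 2):
        # Assumed dissimilarity: 0 for identical residues, 1 otherwise.
        seq_diff.append(0.0 if residues[a] == residues[b] else 1.0)
        pwm_diff.append(frobenius_distance(pwms[a], pwms[b]))
    return pearson(seq_diff, pwm_diff)


# Toy example with four hypothetical kinases and 1 x 2 PWMs.
pwms = {"K1": [[2.0, 0.0]], "K2": [[2.0, 0.0]],
        "K3": [[0.0, 2.0]], "K4": [[0.0, 2.0]]}
informative = {"K1": "E", "K2": "E", "K3": "S", "K4": "S"}
uninformative = {"K1": "A", "K2": "G", "K3": "A", "K4": "G"}
print(column_covariation(informative, pwms))
print(column_covariation(uninformative, pwms))
```

In this toy case the column whose residues track the motif differences gives a correlation of 1, whereas the column that varies independently of the motif gives a negative value.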
Connecting kinases to substrates on the basis of phosphorylation site motifs
Because in vivo phosphorylation sites on protein substrates tend to fall within the context of the phosphorylation site motif for a particular kinase, database scanning has been used to predict new substrates and to pinpoint sites of phosphorylation (14). However, simple sequence matching approaches are prone to false positives, because predicted sites may not be accessible for phosphorylation, and kinases can also depend on docking or scaffolding interactions for substrate recruitment. In addition, false negatives are frequent for kinases with low sequence specificity because their motifs occur in many proteins and are thus present with high frequency in databases (14). To increase the accuracy of such predictions, we generated and used a motif analysis pipeline, MOTIPS (http://motips.gersteinlab.org/). MOTIPS scans sequence databases for sites that most closely match the PWM for a particular kinase using a modified algorithm based on the program Scansite (40). Predicted sites are then scored on the basis of a panel of features (evolutionary conservation, predicted surface accessibility, and disordered structure) that are characteristic of known phosphorylation sites (41).
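The general idea of scanning a sequence with a PWM can be sketched as follows. This is not the MOTIPS or Scansite algorithm: the log2-sum scoring scheme, the treatment of unlisted residues as neutral, the function name `scan_sequence`, and the toy selectivity values (loosely patterned on an M at P−5 and an R at P−3) are all assumptions made for illustration.

```python
import math


def scan_sequence(sequence, pwm, acceptors="ST", floor=0.01):
    """Score every Ser/Thr in a sequence against a PWM.

    pwm maps a position offset (e.g. -3 for P-3) to a dict of
    residue -> selectivity value, where 1.0 means no preference.
    Residues missing from a column are treated as neutral (assumption).
    Returns (score, 1-based site position, residue) tuples, best first.
    """
    hits = []
    for i, residue in enumerate(sequence):
        if residue not in acceptors:
            continue
        score = 0.0
        for offset, column in pwm.items():
            j = i + offset
            if 0 <= j < len(sequence):
                value = column.get(sequence[j], 1.0)
                # Floor avoids log(0) for strongly disfavored residues.
                score += math.log2(max(value, floor))
        hits.append((score, i + 1, residue))
    return sorted(hits, reverse=True)


# Hypothetical PWM with preferences at P-5 and P-3 only.
toy_pwm = {-5: {"M": 4.0}, -3: {"R": 4.0}}
hits = scan_sequence("AAMARAASAAASAA", toy_pwm)
print(hits)  # [(4.0, 8, 'S'), (0.0, 12, 'S')]
```

Ranking sites by such a score gives the motif-match component only; features such as conservation, accessibility, and disorder would be scored separately.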
We first analyzed established kinase substrates for the presence of their respective phosphorylation site motifs with MOTIPS. From a sampling of 174 in vivo kinase-substrate relationships curated from the literature, 99 of the substrates ranked among the top 0.5% of predicted sites for their respective kinase, with 27 substrates falling within the top 200 sites (). We next analyzed predicted substrates for each of the 61 yeast kinases for their associated biological processes and respective localization according to Gene Ontology (GO) assignments in the Saccharomyces Genome Database (44) (; the full list of predicted substrates for each kinase with associated GO terms and MOTIPS features is provided as Dataset S2). We found that predicted substrates were more likely than a randomly chosen set of proteins to be associated with the same biological process and to localize to the same subcellular compartment as their respective kinases. Taken together, these observations suggest that motif scanning using our set of phosphorylation site motifs enriches for authentic kinase-substrate pairs.
Fig. 5 MOTIPS ranking of known and predicted kinase-substrate pairs. (A) Bar graph showing the number of protein substrates reported in the literature (true positives) that have at least one phosphorylation site falling within the indicated rank value of predicted …
To establish directly that our bioinformatics analysis had uncovered authentic substrates, we examined more closely the predicted substrates of the protein kinase Prk1. Prk1 is a member of a small family of kinases conserved throughout eukaryotes that mediates reorganization of the actin cytoskeleton during endocytosis (45). Our peptide array analysis revealed an unusual phosphorylation site motif that included strong preferences for aliphatic residues at the P−5 position, Gly at the P+1 position, and Thr as the phosphoacceptor (, ). We selected 107 Prk1 candidate substrates identified by MOTIPS for further analysis. These substrates contained sites of high, middle, and low rank among the top 2,000 scoring sites. Because all five known Prk1 substrates undergo multisite phosphorylation (45), candidates were also chosen for having at least three predicted Prk1 phosphorylation sites. Of the 107 candidate substrates, we observed phosphorylation of 19 candidates in vitro with wild-type Prk1 but not with a Prk1 inactive mutant (Fig. S3). To identify additional candidates, we used these 19 candidates as positive data points in a training set to train MOTIPS by machine learning. Negative data points in the training set included 81 of the original Prk1 candidates that were unambiguously not substrates in vitro, as well as about 400 proteins identified in the yeast protein database as localizing solely to non-cytosolic compartments (48).
This set of positive and negative data points was used to retrain the Bayesian algorithm in MOTIPS to integrate the motif matching, conservation, surface accessibility, and disorder scores for each site, along with an additional score based on the number of predicted sites. The five known in vivo substrates of Prk1, which were excluded from the training set, all fell within the top seven targets (). Five additional candidates taken from the top 15 putative substrates in the new Prk1 hit list were tested by an in vitro kinase assay that used the purified candidates as substrates. These in vitro assays revealed three additional new substrates for Prk1: Gon7, a protein component of the EKC/KEOPS (Endopeptidase-like Kinase Chromatin-associated/Kinase, putative Endopeptidase and Other Proteins of Small size) complex involved in telomere regulation; Gph1, a protein involved in the mobilization of glycogen; and the key endocytic protein Las17. One of the five additional candidates tested was Ypl150w, which is a putative kinase that autophosphorylated in our assay and thus could not be confirmed or excluded as a substrate of Prk1. This second round of in vitro assays provides additional evidence that retraining our algorithm increased our success rate in predicting authentic kinase substrates. Furthermore, among the 22 in vitro confirmed Prk1 substrates, seven proteins (Bem2, Ede1, Las17, Sac3, Sla2, Syp1, and Yap1801) are reported to have roles in endocytosis or the regulation of the actin cytoskeleton, suggesting that they may be subject to regulation by Prk1 ().
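How a Bayesian classifier can combine several per-site features after training on confirmed and rejected candidates is sketched below with a naive Bayes model over binary features. The binarized features, the Laplace smoothing, the function names, and the toy training set are assumptions; the sketch does not reproduce the MOTIPS model, which integrates the scores listed above.

```python
import math


def train_naive_bayes(examples, labels):
    """Estimate a class prior and per-feature likelihoods.

    examples: list of dicts mapping feature name -> bool
    labels:   list of bool (True = confirmed substrate)
    Assumes both classes occur in the training set.
    """
    features = sorted(examples[0])
    counts = {True: 0, False: 0}
    feature_counts = {label: {f: 0 for f in features} for label in (True, False)}
    for example, label in zip(examples, labels):
        counts[label] += 1
        for f in features:
            if example[f]:
                feature_counts[label][f] += 1
    # Laplace-smoothed estimate of P(feature is True | class).
    likelihood = {
        label: {
            f: (feature_counts[label][f] + 1) / (counts[label] + 2)
            for f in features
        }
        for label in (True, False)
    }
    prior = counts[True] / len(labels)
    return prior, likelihood


def posterior(example, prior, likelihood):
    """Posterior probability that a candidate is a substrate."""
    log_odds = math.log(prior / (1 - prior))
    for f, present in example.items():
        p_pos = likelihood[True][f] if present else 1 - likelihood[True][f]
        p_neg = likelihood[False][f] if present else 1 - likelihood[False][f]
        log_odds += math.log(p_pos / p_neg)
    return 1 / (1 + math.exp(-log_odds))


# Toy training set with two hypothetical binary features.
positives = [{"strong_motif": True, "multisite": True}] * 3
negatives = [{"strong_motif": False, "multisite": False}] * 3
prior, likelihood = train_naive_bayes(
    positives + negatives, [True] * 3 + [False] * 3
)
print(posterior({"strong_motif": True, "multisite": True}, prior, likelihood))
```

Candidates can then be ranked by posterior probability, which is one way a retrained model could reorder a hit list.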
Fig. 6 Prediction and confirmation of kinase-substrate relationships. (A) Top 15 hits from the trained Prk1 MOTIPS output. The Prk1 hit list of candidate substrates was subjected to machine learning using a training set consisting of 19 true positives (experimentally …
Table 3 Proteins phosphorylated by Prk1 in vitro. Proteins functionally associated with actin rearrangement or endocytosis are highlighted. Ub, ubiquitin; RhoGAP, Rho guanosine triphosphatase-activating protein; SCF, Skp1-Cullin-F-box; UPR, unfolded protein response; …
We next investigated whether our predicted Prk1 candidate substrates represented bona fide substrates. Because a closely related kinase, Ark1, has an overlapping biological function and shares a nearly identical phosphorylation site motif with Prk1, we examined the phosphorylation state of candidate substrates in yeast strains deleted for both PRK1 and ARK1. Changes in phosphorylation were monitored by electrophoretic mobility shifts in immunoblots of purified substrates, with phosphatase-treated samples serving as a control for the unphosphorylated species. We observed a change in mobility for two candidate substrates, Bem2 and Ede1, suggesting that they are in vivo targets of Prk1 or Ark1, or both (). Although we did not observe gel shifts for other substrates, it is likely that some are authentic Prk1/Ark1 substrates as well but simply do not change mobility upon phosphorylation. Notably, previous mass spectrometry (MS) phosphoproteomic analysis identified three of the in vitro Prk1 substrates (Ede1, Syp1, and Rpl5) as phosphorylated at Prk1 consensus sites in vivo (49) (the MOTIPS output for all kinases, which is available as Dataset S2, indicates which candidate phosphorylation sites have been identified by MS).
We also validated kinase-substrate pairs through integration with other proteomic datasets. We found that the kinase Vhs1, for which limited functional information is available, exhibited selectivity for the phosphorylation site motif MXRXXS ( and table S1). Fourteen in vitro substrates for the kinase Vhs1 (55) were previously identified by protein microarray analysis (4), and six of these, Mga1, Pfk26, Sef1, Sol1, Sol2, and Utr1, contain the Vhs1 consensus phosphorylation site motif. MS phosphoproteomic analysis (49) revealed that Sef1 was phosphorylated in vivo at a Vhs1 consensus phosphorylation site, and in an immunoprecipitation-MS analysis, Sef1 and Vhs1 physically interacted (56). In addition, MS phosphoproteomic analysis identified Sol1 as phosphorylated at a Vhs1 consensus phosphorylation site in vivo (50), and its homolog Sol2 was the most highly phosphorylated Vhs1 in vitro substrate identified by protein microarray analysis (4). Mobility shift analysis of VHS1 deletion strains using Phos-tag SDS-PAGE (57) was consistent with Sol2 as a substrate for Vhs1 in vivo (). Although the presence of multiple Sol2 species in the presence and absence of Vhs1 indicates phosphorylation at multiple sites, likely by more than one kinase, the mobility shift indicates that in vhs1 mutant cells, Sol2 is phosphorylated at fewer sites. Sol2, which promotes nucleocytoplasmic tRNA transport (58), is the first reported in vivo substrate for Vhs1, which suggests a role for this kinase in regulating this process. These results illustrate how integration of data from multiple proteomic approaches can shed light on the biology of poorly characterized molecules.