Many protein activities are driven by ATP binding and hydrolysis. Here, we explore the ATP binding proteome of the model plant Arabidopsis thaliana using acyl-ATP (AcATP) probes. These probes target ATP binding sites and covalently label lysine residues in the ATP binding pocket. Gel-based profiling using biotinylated AcATP showed that labeling is dependent on pH and divalent ions and can be competed by nucleotides. The vast majority of these AcATP-labeled proteins are known ATP binding proteins. Our search for labeled peptides upon in-gel digest led to the discovery that the biotin moiety of the labeled peptides is oxidized. The in-gel analysis displayed kinase domains of two receptor-like kinases (RLKs) at a lower than expected molecular weight, indicating that these RLKs lost the extracellular domain, possibly as a result of receptor shedding. Analysis of modified peptides using a gel-free platform identified 242 different labeling sites for AcATP in the Arabidopsis proteome. Examination of each individual labeling site revealed a preference of labeling in ATP binding pockets for a broad diversity of ATP binding proteins. Of these, 24 labeled peptides were from a diverse range of protein kinases, including RLKs, mitogen-activated protein kinases, and calcium-dependent kinases. A significant portion of the labeling sites could not be assigned to known nucleotide binding sites. However, the fact that labeling could be competed with ATP indicates that these labeling sites might represent previously uncharacterized nucleotide binding sites. A plot of spectral counts against expression levels illustrates the high specificity of AcATP probes for protein kinases and known ATP binding proteins. This work introduces profiling of ATP binding activities of a large diversity of proteins in plant proteomes. The data have been deposited in ProteomeXchange with the identifier PXD000188.
ATP binding and hydrolysis are the driving processes in all living organisms. Hundreds of cellular proteins are able to bind and hydrolyze ATP to unfold proteins, transport molecules over membranes, or phosphorylate small molecules or proteins. Proteins with very different structures are able to bind ATP. A large and important class of ATP binding proteins is that of the kinases, which transfer the gamma phosphate from ATP to substrates. Kinases, and particularly protein kinases, play pivotal roles in signaling and protein regulation.
The genome of the model plant Arabidopsis thaliana encodes over 1099 protein kinases and hundreds of other ATP binding proteins (1, 2). Protein kinases are involved in nearly all signaling cascades and regulate processes ranging from cell cycle to flowering and from immunity to germination. Many protein kinases in plants are receptor-like kinases (RLKs), often carrying extracellular leucine-rich repeats (LRRs). The RLK class contains at least 610 members (3), including famous examples such as receptors involved in development (e.g. BRI1, ER, CLV1) and immunity (e.g. FLS2, EFR). Other important classes are mitogen-activated protein (MAP) kinases (MPKs) (20 different members), MPK kinase kinases (MAP3Ks) (60 different members (4)), and calcium-dependent protein kinases (CPKs) (34 different members (5)). Because of their diverse and important roles, protein kinases have been intensively studied in plant science. The current approach is to study protein kinases individually, a daunting task considering the remaining hundreds of uncharacterized protein kinases. New approaches are necessary in order to study protein kinases and other ATP binding proteins globally rather than individually.
ATP binding activities of protein kinases and other proteins can be detected globally by acyl-ATP (AcATP) probes (6, 7) (Fig. 1A). AcATP binds to the ATP pocket of ATP binding proteins and places the acyl group in close proximity to conserved lysine residues in the ATP binding pocket. The acyl phosphate moiety serves as an electrophilic warhead that can be nucleophilically attacked by the amino group of the lysine, resulting in a covalent attachment of the acyl reporter of the AcATP probe on the lysine and a concomitant release of ATP. The reporter tag is usually a biotin to capture and identify the labeled proteins. Labeled proteins can be displayed on protein blots using streptavidin-HRP. However, because AcATP labels many ATP binding proteins and protein kinases are of relatively low abundance, mass spectrometry is more often used to identify and quantify labeling with AcATP probes. The analysis is preferably done using Xsite, a procedure that involves trypsination of the entire labeled proteome, followed by analysis of the biotinylated peptides rather than the biotinylated proteins (8). This “KiNativ” approach provides enough depth and resolving power to monitor ~160 protein kinases in a crude mammalian proteome (7). Of the 518 human protein kinases (9), 394 (76%) have been detected via AcATP labeling (6).
KiNativ has mostly been used to validate targets of human drugs that target protein kinases using competitive labeling experiments. This approach has been used to identify selective inhibitors of, for example, Parkinson's disease protein kinase LRRK2 (10), the BMK1 and JNK MAP kinases (11, 12), and the mTOR kinase (13). Importantly, the correlation of the biological activity of protein-kinase-inhibiting drugs with inhibitor affinity detected using KiNativ is better than that achieved when affinities are determined by assays using heterologously expressed protein kinases (7). This improved correlation illustrates that assays in the native environment provide a more realistic measure of protein kinase function.
In addition to characterizing inhibitor selectivity, AcATP probes can also display differential ATP binding activities of protein kinases. For example, labeling with AcATP probes during infection with dengue virus displayed a 2- to 8-fold activation of a DNA-dependent protein kinase (14). Similarly, AcATP labeling revealed an unexpected Raf kinase activation in extracts upon protein kinase inhibitor treatment (7). In conclusion, profiling with AcATP probes is a powerful approach for monitoring protein kinases and offers unprecedented opportunities to identify selective protein kinase inhibitors and discover protein kinases with differential ATP binding activities.
In this work, we introduce AcATP profiling of plant proteomes. In addition to the analysis of labeled peptides, we characterized labeling using gel-based approaches and discovered that biotin is often oxidized in this procedure. We also performed an in-depth analysis of labeling sites in proteins other than protein kinases, which had not been done before. We discuss labeling outside known nucleotide binding pockets and investigate the correlation of labeling sites with protein abundance. We describe 63 labeling sites of known nucleotide binding pockets, of which 24 represent a remarkable diversity of protein kinases, including several LRR-RLKs. This work launches a new approach to study ATP binding proteins in plant science.
Arabidopsis thaliana ecotype Col-0 was grown in a climate cabinet in a 10-h light regime at 22 °C and 60% relative humidity.
Biotin-AcATP was synthesized following the procedures described by Patricelli et al. (6). Briefly, N-(+)-biotinyl-6-aminohexanoic acid (30 mg, 0.085 mmol) was suspended under an argon atmosphere in anhydrous dioxane/N,N-dimethylformamide/dimethyl sulfoxide (1:1:1, 3 ml) and cooled to 0 °C. To this ice-cold solution were then added triethylamine (47 μl, 0.34 mmol) and iso-butyl chloroformate (33 μl, 0.255 mmol), resulting in the appearance of a precipitate. The resulting suspension was stirred at 0 °C for 20 min, allowed to warm up to room temperature, and then stirred for an additional 1.5 h. ATP triethylammonium salt (69 mg, 0.085 mmol) dissolved in anhydrous dimethyl sulfoxide (1 ml) was added to the reaction mixture. After 18 h, the reaction was stopped by the addition of water (4 ml), and the mixture was extracted with ethyl acetate (3 × 4 ml). The aqueous layer was frozen and subsequently lyophilized. The resulting solid was suspended in water (1 ml), transferred to a reverse-phase C18 flash-column (LiChroprep® RP-18, 40–63 μm, Merck) pre-equilibrated with 5% acetonitrile in water and eluted with 30% acetonitrile in water. Product-containing fractions were pooled, frozen, lyophilized, and subsequently stored at −20 °C (yield: 4.3 mg, 5.1 μmol, 6%). The identity of the product was confirmed via liquid chromatography electrospray ionization mass spectrometry analysis on a Thermo Scientific LCQ Fleet™ electrospray ionization spectrometer equipped with an Eclipse XDB-C18 5-μm column from Agilent, Santa Clara, CA and using a linear gradient of solvent B (5 mM NH4OAc in acetonitrile) in solvent A (5 mM NH4OAc in H2O) at a flow rate of 1 ml/min (gradient program: 0 min/10% B → 1 min/10% B → 10 min/100% B → 12 min/100% B → 15 min/10% B). At Rt = 2.16 min, calculated for C26H40N8O16P3S− [M-H]− 845.15, we found 845.13. Desthiobiotin-AcATP is commercially available from Thermo Scientific.
The Arabidopsis proteome was extracted from 4-week-old Arabidopsis rosettes in 50 mM Tris pH 7.5. The lysate was cleared via centrifugation and subjected to gel filtration using DG10 columns (Bio-Rad). Labeling was performed in 50 mM Tris pH 7.5, 10 mM MgCl2, unless specified otherwise. The lysate was incubated with 5 or 10 μM BHAcATP or dimethyl sulfoxide for the no-probe control at room temperature for 15 min. For inhibition experiments, the lysate was preincubated with the different nucleotide-based inhibitors at a final concentration of 10 mM for 30 min before labeling with BHAcATP for 15 min. The labeling reaction was stopped by the addition of gel loading buffer containing SDS and DTT. Proteins were separated on 12% acrylamide gel and transferred onto PVDF membranes. The protein blot was probed with streptavidin-HRP (Ultrasensitive, Sigma) and detected with enhanced chemiluminescence (SuperSignal West Femto Chemiluminescent Substrate, Thermo Scientific).
After labeling of Arabidopsis leaf extract (1 ml, 5 mg/ml protein) for 1 h, the lysate was desalted using a DG10 column (Bio-Rad). The eluate was incubated with 100 μl neutravidin beads (Thermo Scientific) in the presence of PBS and 0.2% SDS for 1 h. The beads were washed twice with 0.2% SDS, twice with 6 M urea, and twice with water and eluted with 50% formic acid. Proteins were separated on protein gels and stained with SYPRO Ruby (Invitrogen). Bands of interest were excised from both samples and subjected to in-gel digestion with trypsin.
LC-MS/MS analysis was performed on an LTQ Velos (Thermo) coupled to an EasyNanoLC (Proxeon/Thermo) using a “Top 20” data-directed acquisition. LC was performed on a C18 column (10 cm × 75 μm) with a gradient of 5%–40% acetonitrile in 0.1% formic acid for 30 min, followed by a wash with 95% acetonitrile. The data-directed acquisition included active exclusion for mass (±40 ppm) and time (60 s). Spectra were searched using Mascot 2.3 using a fixed modification of cysteine (57.02 Da for carbamidomethyl) and variable modifications of methionine (15.99 Da for oxidation) and lysine (339.16 and 355.16 Da for BHAc and OBHAc labeling, respectively). Mass tolerance was set at 0.3 Da for the precursor ions and 0.4 Da for fragment ions. Up to two missed cleavages for trypsin were permitted, as the labeling would prevent cleavage at the labeled lysine. The MS2 spectra were searched against TAIR10 pep 2012 (November 2012), containing 35,386 proteins of Arabidopsis thaliana, supplemented with a small database with 1095 artifact proteins and a reversed decoy database of the same proteins (total: 72,962 entries). Peptides were retained with Mascot scores of >41, which is above the Mascot significance level. Hits to the decoy database within this selection corresponded to a 0.3% false discovery rate for all peptides and a 0.7% false discovery rate for non-redundant peptides. Proteins were selected having a minimum of two different peptides, as shown in supplemental Table S1. This selection for two peptides per protein removed all the false positive hits to the decoy database. Proteins were ranked on their score and manually selected considering the occurrence of the protein in the rest of the gel and its expected molecular weight and overall score and spectral count (supplemental Table S2). Only robustly identified proteins that were highly enriched in the gel slice relative to other gel slices are reported.
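The database size and the decoy-based false discovery rate described above can be reproduced with simple arithmetic. The sketch below is illustrative only: the text does not state which FDR formula was used, so the decoy/target ratio here is an assumption, and the hit counts passed to `decoy_fdr` are hypothetical rather than taken from the data.

```python
def decoy_fdr(n_decoy_hits: int, n_target_hits: int) -> float:
    """Estimate FDR as decoy hits divided by target hits above the score cutoff.

    Assumption: a plain decoy/target ratio; other conventions exist.
    """
    if n_target_hits == 0:
        return 0.0
    return n_decoy_hits / n_target_hits


# Database composition as stated in the text.
n_arabidopsis = 35_386   # TAIR10 protein entries
n_artifacts = 1_095      # artifact (contaminant) entries
n_forward = n_arabidopsis + n_artifacts
n_total = 2 * n_forward  # forward entries plus their reversed decoys

print(n_total)  # 72962, matching the total reported above

# Hypothetical counts, for illustration only: 3 decoy hits among 1000 target hits.
example_fdr = decoy_fdr(3, 1000)
```

With the hypothetical counts shown, `example_fdr` evaluates to 0.003, that is, a 0.3% rate.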
Analysis of desthiobiotin-acyl-ATP (DBAcATP)-labeled Arabidopsis leaf extracts was performed as described elsewhere (6, 7). Briefly, lysates from Arabidopsis leaves were generated via bead-based agitation in kinase buffer (20 mM HEPES pH 7.8, 150 mM NaCl, 0.1% Triton X-100). The resulting lysates were gel filtered into 20 mM HEPES pH 7.8, 150 mM NaCl, 1% v/v phosphatase inhibitor mixture II (Calbiochem) to remove endogenous competing nucleotides prior to labeling with 20 μM DBAcATP probe for 15 min in the presence of 20 mM MnCl2. The labeled extracts were denatured and reduced (6 M urea, 10 mM DTT, 65 °C for 15 min) and then alkylated with iodoacetamide (40 mM, 30 min at 37 °C). Following another gel filtration step (into 10 mM ammonium bicarbonate, 2 M urea) to remove unreacted reagents and lower the urea concentration, the samples were digested with trypsin. DBAcATP-labeled peptides were enriched on streptavidin resin (Thermo) and eluted using 0.1% TFA in 50% acetonitrile. MS analysis was performed on a Thermo-LTQ linear ion trap instrument in a data-dependent mode as described for the initial characterization (7) using a mass range from 500–1800 m/z.
The MS2 spectra were searched using SEQUEST against the TAIR9 pep 20090619 database (June 2009) containing 32,769 proteins of Arabidopsis thaliana. MS2 searches included fixed iodoacetamide modification of cysteine (57 Da for alkylation) and variable modifications of methionine (16 Da for oxidation) and lysine (196 Da for DBAcATP labeling). Up to three missed cleavages with trypsin were allowed, and non-tryptic or half-tryptic peptides were excluded. Mass tolerance was set at 3 Da for the precursor ions and 1 Da for fragment ions. This high mass tolerance for precursor ions had to be used because most of the precursor ions are detected with multiple charges, and the mass tolerance in SEQUEST is not charge state specific. A probability score was calculated for each labeled peptide as described previously (15). This score is based on Xcorr, delta-Cn, and peptide mass/length and compared with the distribution of false positive scores generated by searching with incorrect masses of the modification (+10 and −10 Da from the correct value). The resulting MS2 spectra were assembled in one Excel file, and all spectra with <95% probability were removed. Peptides that were detected in only one of the four MS runs were discarded. The resulting list of 242 peptides included peptides that are unique (u) in the Arabidopsis proteome or that match multiple genes (ambiguous (a)). Some peptides match multiple gene models of a single gene (isoforms (i)). These isoforms are indicated in a supplemental table but were treated as identifiers for specific genes. Supplemental Table S3 contains the lists, with 10,465 spectral counts with individual scores with probabilities of >75%. Supplemental Table S4 summarizes the spectral counts of the 242 labeled peptides with >95% probability.
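The two selection criteria described above (a probability of at least 95% per spectrum, and detection in more than one MS run) can be sketched as a small filter. This is a minimal illustration, not the authors' pipeline: the tuple layout, the function name `select_peptides`, and the toy records are all hypothetical, and whether exactly 95% is kept or dropped is an assumption.

```python
from collections import defaultdict


def select_peptides(spectra, min_prob=0.95, min_runs=2):
    """Return {peptide: spectral_count} for peptides passing both filters.

    spectra: iterable of (peptide, run_id, probability) tuples (assumed layout).
    Spectra below min_prob are dropped first; peptides then need to be seen
    in at least min_runs distinct runs.
    """
    runs_seen = defaultdict(set)
    counts = defaultdict(int)
    for peptide, run_id, prob in spectra:
        if prob < min_prob:
            continue  # spectrum-level probability filter
        runs_seen[peptide].add(run_id)
        counts[peptide] += 1
    return {p: counts[p] for p in counts if len(runs_seen[p]) >= min_runs}


# Toy records, invented for illustration.
toy_spectra = [
    ("PEPK", 1, 0.99),
    ("PEPK", 2, 0.97),
    ("PEPK", 2, 0.80),   # dropped: below 95%
    ("SOLOK", 3, 0.99),  # dropped later: seen in one run only
    ("LOWK", 1, 0.90),
    ("LOWK", 2, 0.91),
]
selected = select_peptides(toy_spectra)
```

Here only `PEPK` survives, with a spectral count of 2, because the low-probability spectrum is removed before runs are counted.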
In order to map the labeling sites onto protein structures, we searched the Protein Data Bank database for sequences that were homologous to the labeled proteins. Sequences for which a co-crystal with a nucleotide was available were aligned with the identified proteins, and the orthologous position of the labeled lysine was indicated in the alignment. This residue was selected in the protein structure using PyMol, and the distance to the nearest phosphate of the bound nucleotide was measured.
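The distance measurement itself is a Euclidean distance between two atoms. The measurements in this work were made in PyMol; the sketch below only illustrates the underlying arithmetic in plain Python. The atom names and coordinates are hypothetical placeholders, not values from any structure.

```python
import math


def nearest_phosphate(lys_nz, phosphates):
    """Return (atom_name, distance) for the phosphate closest to the lysine amine.

    lys_nz: (x, y, z) of the lysine side-chain nitrogen, in angstroms.
    phosphates: dict mapping phosphate atom names to (x, y, z) coordinates.
    """
    name = min(phosphates, key=lambda k: math.dist(lys_nz, phosphates[k]))
    return name, math.dist(lys_nz, phosphates[name])


# Hypothetical coordinates, chosen only to give round numbers.
lys_nz = (0.0, 0.0, 0.0)
phosphates = {"PG": (3.0, 4.0, 0.0), "PB": (6.0, 8.0, 0.0)}
closest_atom, closest_dist = nearest_phosphate(lys_nz, phosphates)
```

For these placeholder coordinates the nearest atom is `PG` at 5.0 Å. Real coordinates would be read from the Protein Data Bank entry of the homologous co-crystal structure.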
For every labeled protein in supplemental Table S4, we retrieved the number of spectral counts from leaves from the AtProteome database (48). For ambiguous peptides, only the protein with the highest number of spectral counts was selected.
To predict the labeled peptides for all the protein kinases in Arabidopsis, we retrieved protein sequences of all protein kinases from the Arabidopsis thaliana kinase database. Sequences for each subfamily in the AthKD database were retrieved and aligned using MultiAlign. The labeled peptide that was experimentally identified was highlighted in the alignment, the orthologous Lys residues were indicated, and Lys and Arg residues were highlighted in the alignment. Tryptic peptides containing the labeling site were selected from the alignments. The orthologous labeled Lys and the Arg/Lys-Pro sites were not considered as trypsin cleavage sites, and miscleavages were ignored. The selected peptides were ranked on sequence, and redundancy for each peptide sequence was counted (n proteins carrying the same labeled peptide). Ile/Leu residues were not discriminated in this analysis, as they have the same mass. Protein kinases with a unique labeled lysine (n = 1) were counted. The other protein kinases contain ambiguous peptides (n > 1). This analysis was done for peptides carrying Lys1, Lys2, Lys3, and Lys4.
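The cleavage rules described above can be written out as a short in silico digest. This is a minimal sketch under the stated rules (cleavage after Lys or Arg, no cleavage at the labeled Lys or before Pro, no missed cleavages, Ile and Leu treated as identical); it is not the authors' procedure, which worked from alignments. The function names and the toy sequences are hypothetical.

```python
from collections import Counter


def labeled_tryptic_peptide(seq: str, label_pos: int) -> str:
    """Return the tryptic peptide that contains the labeled Lys (0-based index)."""
    if seq[label_pos] != "K":
        raise ValueError("label_pos must point at a lysine")

    def cleaves_after(i: int) -> bool:
        if seq[i] not in "KR":
            return False
        if i == label_pos:
            return False  # the acylated Lys is not a trypsin site
        if i + 1 < len(seq) and seq[i + 1] == "P":
            return False  # Lys/Arg-Pro bonds are not cleaved
        return True

    start = label_pos
    while start > 0 and not cleaves_after(start - 1):
        start -= 1
    end = label_pos
    while end < len(seq) - 1 and not cleaves_after(end):
        end += 1
    return seq[start:end + 1]


def count_redundancy(entries):
    """Count proteins sharing each labeled peptide, with Ile collapsed to Leu."""
    return Counter(
        labeled_tryptic_peptide(seq, pos).replace("I", "L") for seq, pos in entries
    )


# Toy sequences, invented for illustration (labeled Lys at index 5 in each).
toy = [
    ("AARVTKDGVTVAKEE", 5),
    ("GGRVTKDGVTVAKAA", 5),
    ("AARLTKDGR", 5),
    ("AARITKDGR", 5),
    ("GRKPAKLLR", 5),
]
redundancy = count_redundancy(toy)
unique_peptides = [p for p, n in redundancy.items() if n == 1]
```

In this toy set, two peptides are shared by two sequences each (n = 2, ambiguous), and one peptide, spanning an uncleaved Lys-Pro bond, occurs once (n = 1, unique).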
In this study, BHAcATP was used to label Arabidopsis leaf extracts. BHAcATP is composed of ATP, an acyl phosphate linker, and a biotin tag (Fig. 1A). Incubation of leaf proteomes with BHAcATP followed by the detection of protein blots with streptavidin-HRP revealed numerous biotinylated proteins when compared with the no-probe control (Fig. 1B). Signals at 55, 45, and 43 kDa were hallmark signals (Fig. 1B; white, black, and gray arrowheads, respectively) later identified as ATPB/RBCL, PGK1, and a mixture of ATP binding proteins, respectively (see below). The endogenously biotinylated proteins 3-methylcrotonyl-CoA carboxylase (MCCA) and biotin carboxyl carrier protein (BCCP) appeared as background signals at 80 kDa and 34 kDa, respectively, in both the labeled samples and the no-probe control. However, both these proteins are also ATP binding proteins, and the MCCA signal is mixed with methionine synthase, an ATP binding protein (see below).
Different labeling conditions were tested in order to further characterize BHAcATP labeling. Labeling with BHAcATP depends on pH. Weak or no labeling occurs at acidic pH (pH 3–5), and strong labeling occurs at neutral to basic pH (pH 7–10; Fig. 2A). We chose pH 7.5 as the standard labeling condition to mimic the conditions of the plant cytoplasm. The addition of MgCl2 or MnCl2 greatly increased labeling, whereas adding the chelating agents EDTA and EGTA decreased labeling (Fig. 2B). Interestingly, additional 60- and 70-kDa signals appeared when MnCl2 was added, and two 65-kDa signals appeared when MgCl2 was added (Fig. 2B, solid circles). The MnCl2-induced signals were suppressed more effectively by the addition of EGTA than of EDTA, whereas the MgCl2-induced signals were suppressed equally by both EDTA and EGTA. These results are in agreement with the fact that EGTA has a higher affinity for Mn2+ than EDTA. The increased labeling upon the addition of divalent metal cations and the corresponding decrease upon treatment with chelating agents are consistent with the role of divalent ions in ATP binding.
We next tested whether labeling would be suppressed by nucleotides. Pre-incubation with ATP or ADP strongly suppressed BHAcATP labeling, whereas AMP did not suppress labeling (Fig. 2C), suggesting that phosphates at the β and γ positions are essential for the suppression of BHAcATP labeling. In addition to ATP, CTP and, to a lesser extent, GTP and TTP also suppress BHAcATP labeling (Fig. 2C). Notably, NADP, but not NAD, also suppresses BHAcATP labeling (Fig. 2C), indicating the importance of a phosphate moiety at the 2′ position in the nucleotide to suppress labeling (supplemental Fig. S1). Interestingly, nucleotide triphosphates (NTPs) were more potent inhibitors than their deoxynucleotide counterparts (dNTPs; Fig. 2D), suggesting that the 2′ hydroxyl group on the sugar moiety of the nucleotide also plays a role in determining the ability to compete for labeling (supplemental Fig. S1). In conclusion, these data indicate that labeling is specific because it can be competed with ATP analogues.
To identify the proteins labeled by BHAcATP, the labeled proteins were affinity purified using streptavidin beads, separated on protein gel, and stained (Fig. 3A). Bands were excised, treated with trypsin, and analyzed via LC-MS/MS. Peptides with high Mascot scores (>41) and proteins with at least two unique peptides were retained. A total of 112 proteins were identified in the BHAcATP-labeled sample, but only 13 of these proteins were also identified in the no-probe control (supplemental Tables S1 and S2). The no-probe control contained proteins such as the endogenously biotinylated protein MCCA and abundant proteins such as RBCL.
We made a selection of the 112 proteins by selecting proteins with the highest spectral counts, supplemented with detected protein kinases and proteins for which a labeling site was identified (see below). The resulting selection consisted of 38 proteins that could be assigned to the 13 different bands (supplemental Table S2 and Fig. 3B). Some of these proteins also were assigned to the bands in Fig. 3A. These data indicate that the abundant 55-kDa signal (band 5) was caused by labeling of a subunit of the chloroplastic ATPase (ATPB) and the large subunit of ribulose-1,5-bisphosphate carboxylase oxygenase (RBCL). This signal also contained peptides from the beta subunit of the mitochondrial ATP synthase, glycinamide ribonucleotide synthetase, and CPK9. The 45-kDa signal (band 7) was caused predominantly by phosphoglycerate kinase (PGK1), though this signal also contained spectra from other proteins. The 40-kDa signal (band 8) contained a mixture of chloroplastic sedoheptulose-1,7-bisphosphatase; chloroplast RNA binding protein; fructose-bisphosphate aldolase-2; subunit A of glyceraldehyde 3-phosphate dehydrogenase; and several other proteins, including two protein kinases (LRR1 and PK), inositol-1,3,4-trisphosphate-5,6-kinase, and an uncharacterized carbohydrate kinase. Weaker signals in the upper 70- to 90-kDa region of the gel contained methionine synthase (band 1), MCCA (band 1), formyltetrahydrofolate synthetase (band 2), transketolase (band 2), heat shock proteins 70 (HSP70, band 2) and 60 (HSP60, band 3), thioglucoside glucohydrolase 2 (band 3), F-box protein GRH1 (band 3), Ser/Thr/Tyr kinase STY8 (band 3), and subunit B of glutamyl-tRNA aminotransferase (band 4). Weak signals at 50 kDa (band 6) contained a subunit of a biotin carboxylase (CAC2), a MAP3K protein kinase (VIK1), a monodehydroascorbate reductase (MDAR6), and CPK11.
Weak signals in the 35- to 40-kDa region contained peroxisomal NAD-malate dehydrogenase-2 (band 9), an LRR-RLK (band 9), and phosphopantothenate-cysteine ligase (COAB, band 10). The signal at 33 kDa (band 11) contained predominantly avidin, but also present were subunit O2 of photosystem II and non-intrinsic ABC protein-7. Finally, the weak signals at the bottom of the gel (27–30 kDa) contained chloroplastic glutamine synthetase (GS2, band 11), chloroplast RNA-binding protein (band 12), carbonic anhydrase (band 12), glutathione transferase (band 12), and a UMP/CMP kinase (PYR6, band 12).
In conclusion, the vast majority of the identified proteins in the AcATP-labeled samples are ATP binding proteins (Fig. 3B). These ATP binding activities include a variety of protein kinases (STY8, CPK9, VIK1, CPK11, RLK-LRR, and LRR1), ATP-based transporters (ATPB, ATP synthase, and non-intrinsic ABC protein-7), metabolite kinases (carbohydrate kinase, PGK1, inositol-1,3,4-trisphosphate-5,6-kinase, and PYR6), and metabolic enzymes and chaperones using ATP (glycinamide ribonucleotide synthetase, MCCA, formyltetrahydrofolate synthetase, HSP70, HSP60, CAC2, MDAR6, peroxisomal NAD-malate dehydrogenase-2, and GS2).
We next examined whether the experimental molecular weight (MW) of the proteins identified from the protein gel (Fig. 3B) corresponded to the theoretical MW calculated from the protein sequence. We noticed a good correlation for nearly all the detected proteins, with the striking exception of two LRR-RLKs that had a calculated MW of 68 kDa but migrated in the region of 40 kDa (Fig. 3B). These were two highly homologous LRR-RLKs carrying six extracellular LRRs, a transmembrane domain, and a cytoplasmic kinase domain (Fig. 4). The extracellular domain also contains multiple putative N-glycosylation sites, which causes LRR-RLK proteins to migrate with an apparent MW that is typically 20 to 30 kDa larger than their calculated MW. Peptide coverage (Fig. 4 and supplemental Table S1) suggests that these labeled proteins consist of the full protein kinase domain but lack the extracellular domain. The detection of half-tryptic peptides of the C-termini of both these LRR-RLK proteins indicates that the C-terminus is intact and that a significant truncation must have occurred from the N-terminus. Interestingly, the calculated MW of the cytoplasmic domain is 37 and 38 kDa for At3g02880 and At5g16590, respectively, which coincides with the observed MW of these proteins. Taken together, these data indicate that these LRR-RLKs exist in leaves as kinase domains lacking the extracellular domain.
The searches mentioned above included a variable modification of lysine residues with the probe and allowed two trypsin miscleavages. Searches with the theoretical BHAc modification on lysine (339.16 Da) did not lead to hits with Mascot scores above the threshold (Fig. 3B). However, during more liberal searches we noticed that BHAc modifications appeared, but only in peptides that included an oxidation associated with the presence of a methionine in the peptide sequence (data not shown). This observation sparked the idea that the modification of the probe is associated with an additional oxygen. Indeed, when we searched the data with an oxygen added to the BHAc modification (OBHAc, 355.16 Da), we were able to identify modified peptides for nine different proteins with relatively high Mascot scores (41–71) (Fig. 3B).
One of those labeled peptides is from HSP60 (At3g23990), a well-described ATP binding protein. To determine the location of the labeling site in the protein relative to ATP, we searched the protein database for proteins homologous to HSP60 and identified GroEL, an HSP60 protein from Escherichia coli, for which an ATP-bound crystal structure is available (1sx3 (16)). HSP60 and GroEL proteins share 56% amino acid identity, including the region where the labeled lysine is located. The amino group of the labeled lysine is located 7.0 Å from the gamma phosphate of the bound ATP in GroEL (Fig. 3C). Thus, labeling in HSP60 occurred at the expected labeling site.
We next studied the fragmentation spectrum of the labeled peptide of HSP60 further to identify the location of the oxygen within the labeled peptide. The labeled peptide in HSP60 has the sequence VTKDGVTVAK and was detected with a Mascot score of 41.4 with a parental ion of 686.93 m/z (z = 2) (Fig. 5). Nearly all b- and y-ions were detected in the fragmentation spectrum, locating the labeled lysine including the oxygen at the third position in the peptide: VTK*DGVTVAK. The spectrum also contained three unexplained peaks in the low-mass range: 243.15, 356.31, and 439.35 Da. Importantly, the 243.15 Da ion corresponds to an oxidized biotin (expected mass: 243.3). An additional 113 Da for the aminohexanoic acid linker predicts an ion of 356.5 Da, which corresponds to the second signal at 356.31 Da. The third ion, observed 83 Da further at 439.35 Da, corresponds to the labeled version of the commonly observed immonium ion variant for lysine. High-energy collision-induced dissociation fragmentation of lysine residues typically results in an unsaturated cyclic 84 Da fragment, known as a piperidine or tetrahydropyridine ion (17). As the lysine has lost a proton during the labeling reaction, the observed 83 Da distance is explained, indicating that such fragmentation and subsequent cyclization of lysine do not cause a loss of the biotin. Taken together, these data demonstrate that the additional oxygen is located at the biotin, presumably on the sulfur. The 243.2, 356.5, and 439.3 Da ions were also found in spectra of the other labeled peptides, indicating that oxygen is consistently present on the biotin moiety.
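The reported precursor mass can be checked against the peptide sequence and the OBHAc modification mass. The sketch below is a rough consistency check, not part of the original analysis: it uses standard monoisotopic residue masses, the 355.16 Da modification mass given in the text, and assumes a doubly protonated precursor. Variable and function names are illustrative.

```python
# Standard monoisotopic residue masses (Da) for the residues in this peptide.
MONO = {
    "V": 99.06841, "T": 101.04768, "K": 128.09496,
    "D": 115.02694, "G": 57.02146, "A": 71.03711,
}
WATER = 18.01056
PROTON = 1.00728
OBHAC = 355.16  # oxidized biotin-hexanoyl modification mass, from the text


def precursor_mz(peptide: str, mod_mass: float, z: int) -> float:
    """m/z of a peptide carrying one modification, at charge state z."""
    neutral = sum(MONO[aa] for aa in peptide) + WATER + mod_mass
    return (neutral + z * PROTON) / z


mz = precursor_mz("VTKDGVTVAK", OBHAC, 2)

# Spacing between the three low-mass reporter ions reported in the text.
observed = [243.15, 356.31, 439.35]
steps = [round(b - a) for a, b in zip(observed, observed[1:])]
```

The computed value is about 686.88 m/z, within the 0.3 Da precursor tolerance of the observed 686.93 m/z, and the reporter ion spacings round to 113 and 83 Da, in line with the linker and lysine-derived fragments discussed above.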
The observation that most of the 112 BHAcATP-labeled proteins are ATP binding reflects the potency of the probe to target ATP binding proteins in a crude proteome. However, only a few of these 112 proteins are protein kinases. Furthermore, some of the BHAcATP-labeled proteins are not annotated as ATP binding proteins, and therefore the labeling sites on these proteins are uncertain. To address both issues, we analyzed the labeled proteome more deeply by analyzing the labeled peptides. For this we used a variant of the AcATP probe, DBAcATP (Fig. 6A) (7). Desthiobiotin cannot be oxidized and has a lower affinity for streptavidin, resulting in a more efficient elution of peptides for identification. After labeling with DBAcATP, the proteomes were digested with trypsin, and the biotinylated peptides were purified using streptavidin beads and analyzed via LC-MS/MS (Fig. 6B). We recorded 10,465 peptide spectra, corresponding to 567 different peptides carrying a modified lysine residue (Fig. 6C). Peptides with probability scores greater than 95% (15) and those that were identified in two or more reactions were selected (Fig. 6C), resulting in 6992 spectra corresponding to 242 peptides (supplemental Table S4).
We first analyzed the data for peptides derived from protein kinases. Annotation through the PFAM database revealed that 24 of the 242 labeled peptides originated from at least 21 different protein kinases (Fig. 7A; supplemental Table S4). This list contains seven RLKs, of which six carry extracellular LRRs. The detection of RLKs is remarkable, because we did not enrich for membranes. In addition, we identified protein kinase peptides from several MPKs, CPKs, PTI-like kinases, and other Ser/Thr protein kinases. We also detected AvrPphB susceptible 1 (PBS1) (18), VH1-interacting kinase (VIK) (19), AT6 protein kinase (20), and proline-rich extension-like receptor kinase-1 (PERK1) (21). Of the 24 protein kinase peptides, 19 were unambiguous and were derived from only one protein kinase (black accession numbers in Fig. 7A). Four protein-kinase-derived peptides were ambiguous, as the peptide sequence was identical in two or more protein kinases in the TAIR10 protein database (gray accession numbers in Fig. 7A). Two protein-kinase-derived peptides were identified for two protein kinases (top in Fig. 7A), indicating that DBAcATP labels these protein kinases at two positions. In the case of PERK1, two overlapping peptides containing the same labeled lysine were detected.
We marked the identified protein kinases in the Arabidopsis thaliana kinase database, which classifies all the 1099 protein kinases of Arabidopsis into classes, groups, and families based on sequence similarities. We identified representatives of 11 different protein kinase families belonging to 10 different groups and the three major protein kinase classes (Fig. 7B). This indicates that AcATP probes are not family specific and can be used to profile ATP binding activities of the majority of Arabidopsis protein kinases.
Alignment of the identified protein-kinase-derived peptides showed that these peptides fall into four groups containing four different labeled lysine residues, Lys1–4 (Fig. 7C). When mapped on the protein kinase domains, Lys1 is positioned in the beginning of the protein kinase domain, Lys2 in the middle, Lys3 in the second quarter, and Lys4 in the third quarter of the protein kinase domain (Fig. 7A). Lys1 and Lys2 are also part of the conserved protein kinase motifs II and VIb, respectively (22). These two lysines are also present in the PTO kinase, a tomato protein kinase involved in immunity, for which a structure is available (2qkw) (23). Both Lys1 and Lys2 reside in the ATP binding pocket of the PTO kinase, at distances of 7.43 and 4.32 Å from the gamma phosphate of the bound ATP (Fig. 7D and supplemental Fig. S2A). The closer proximity of Lys2 seems reflected in the higher spectral count of Lys2-containing peptides relative to Lys1-containing peptides. Modeling of PTI1–1 indicates that Lys3 locates in an extended loop that might fold back on the ATP binding pocket (supplemental Fig. S2B), whereas modeling of VIK1 indicates that Lys4 is located in close proximity to the gamma phosphate of bound ATP (supplemental Fig. S2C).
To determine how common these labeled lysine residues are within the protein kinase family of Arabidopsis, we counted these lysines in the alignments of each protein kinase family. We found that Lys1 and Lys2 were conserved in 70% and 47% of the 1099 protein kinases, respectively. Lys3 in PTI kinases and Lys4 in VIK1 were less conserved among protein kinases (3.5% and 5.0%, respectively). Thus, the frequencies of the theoretically labeled lysine in protein kinases correspond to the frequencies of the different labeling sites detected in protein kinases (supplemental Fig. S6).
When ranked according to peptide frequency, peptides from protein kinases are not among the top 20 most frequent peptides (Fig. 8A). The most frequent peptide from a protein kinase is that from LRR1 (At5g16590) at position 22 with 67 spectral counts (Fig. 8A and supplemental Table S4). The most frequently detected labeled peptide is from PGK1, with 968 spectral counts. PGK1 is also labeled at two additional lysine residues with 196 and 97 spectral counts, respectively. The second most frequent labeled peptide is from chloroplast ATP synthase subunit beta (ATPB), with 256 spectral counts. Other proteins in the top 20 are acetyl coenzyme A carboxylase (CAC2), subunit B of Glu-tRNA aminotransferase, an ATP synthase, ribose-phosphate pyrophosphokinase 4, the small subunit of ribulose bisphosphate carboxylase, cell division cycle 48 (CDC48), a GTP binding protein (At1g30580), and UMP/CMP kinase (PYR6) (Fig. 8A). Most, but not all, of these proteins were also detected abundantly using the in-gel approach (Fig. 3). Although these proteins are unrelated and catalyze very different reactions, ATP is a common substrate, consistent with the reactivity of the probe (supplemental Table S5). Protein structures of homologs of PGK1, ATPB, CAC2, and ribose-phosphate pyrophosphokinase 4 contain the labeled Lys in close proximity to the terminal phosphate of the bound nucleotide (supplemental Fig. S3). Note that CAC2 is a biotin binding protein, but the labeled Lys is close to the bound ADP (supplemental Fig. S3C). This demonstrates that the labeling of these proteins is in accordance with the labeling mechanism of AcATP probes. However, not all proteins in this list are known to bind nucleotides. Most evident is the large subunit of Rubisco (RBCL), which is labeled at 13 different lysine residues that are scattered on the surface of the RBCL protein (supplemental Fig. S4).
We extended the analysis of labeling sites in homologous protein structures and found that 63 of the 242 labeled peptides resulted from labeling of a lysine residue in a known nucleotide binding pocket at ≤10 Å from the phosphate of the bound nucleotide (supplemental Fig. S5; blue in Fig. 8A). To estimate the potential of DBAcATP labeling in the Arabidopsis proteome, we retrieved for each of the 63 labeled peptides the corresponding PFAM domain. A total of 26 different PFAM domains corresponding to 13 different protein clans plus six uncategorized PFAM families were identified (supplemental Table S6). The Arabidopsis genome encodes a total of 1683 proteins carrying these 26 different PFAM domains. The largest PFAM domains are those of protein kinases (PF00069 = 813 members and PF07714 = 388 members), followed by AAA ATPases (147 members). Because all 1683 of these proteins are probably binding ATP and representatives were found to be labeled by DBAcATP, we predict that the Arabidopsis proteome contains at least 1683 putative targets that could be labeled in the ATP binding pocket by DBAcATP.
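The bookkeeping behind this estimate can be made explicit with a short Python sketch. It uses only the member counts stated above and assumes, for illustration, that no protein is counted under more than one of the listed domains; the variable names are ours.

```python
# Member counts as stated in the text.
total_targets = 1683                 # proteins carrying any of the 26 PFAM domains
kinase_domains = {"PF00069": 813, "PF07714": 388}
aaa_atpases = 147

# Assumption: no protein is counted under both kinase domains.
kinase_domain_proteins = sum(kinase_domains.values())
non_kinase_targets = total_targets - kinase_domain_proteins
other_non_kinase = non_kinase_targets - aaa_atpases

print(kinase_domain_proteins)  # 1201
print(non_kinase_targets)      # 482
print(other_non_kinase)        # 335
```

Under this assumption, 482 of the putative targets fall outside the two protein kinase domains, of which 147 are AAA ATPases.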
To investigate whether protein kinases and other ATP binding proteins are preferentially labeled relative to highly abundant proteins, we plotted the spectral counts of the labeled peptides against the spectral counts detected for these proteins in leaf proteomes of the AtProteome database (Fig. 8B). We chose this approach to ensure the inclusion of low-abundance protein kinases, which were detected by Baerenfaller et al. (48) with over 1300 MS runs. The leaf proteomes that they used are virtually the same as the ones we used for our studies. Interestingly, this graph splits the labeled peptides into three groups. Peptides from protein kinases cluster in group 1 at low protein levels and medium labeling. Peptides from other ATP binding proteins are in group 2 and show medium protein levels and medium to high labeling. Finally, the third major group contains proteins with high protein levels and low to medium labeling. Importantly, the vast majority of these proteins were not labeled in ATP binding pockets. This illustrates that DBAcATP shows a high selectivity for labeling ATP binding proteins, as only highly abundant proteins are labeled outside known ATP binding pockets.
Interestingly, protein kinases were detected at a moderate labeling/protein ratio relative to labeling events in ATP binding pockets of non-protein kinases (0.5 ± 0.57 (n = 15) versus 2.86 ± 0.68 (n = 34)) (Fig. 8C). These data suggest that the protein kinases have a lower affinity for ATP or that part of the protein kinase pool cannot be labeled. In order to investigate this in more detail, we extracted spectral counts for 494 protein kinases from leaf proteomes of the AtProteome database (48). Interestingly, of the top 20 protein kinases that are most frequently detected in leaf proteomes, only 8 were detected upon labeling (supplemental Fig. S7), and these data show no correlation between spectral counts in the AtProteome database and spectral count frequency upon AcATP labeling. Furthermore, inspection of the protein kinase sequences revealed that most of the undetected labeled peptides fell in a MW range that should have been detected in our assay. These data reinforce the idea that not all kinases can be labeled in a given extract, possibly because they do not bind ATP. However, such data should be interpreted with extreme caution, because different peptides have different ionization potentials, and the absence of a labeled peptide does not mean that the protein is not labeled.
We have reported the first in-depth analysis of targets of an AcATP probe in a plant proteome. By comparing a gel-based identification platform for labeled proteins with a gel-free platform for labeled peptides, we have demonstrated the advantages and limitations of these complementary approaches. The analysis of labeling sites using the protein database showed that AcATP probes target predominantly, but not exclusively, the ATP binding pockets of a broad range of protein kinases and other unrelated ATP binding proteins. The labeled proteins include a few well-characterized ATP binding proteins, as well as many ATP binding proteins that have not been studied so far.
The data generated by the gel-based and gel-free platforms are complementary, and each platform has its advantages and disadvantages. Although many proteins were detected using both approaches, several were detected by only one approach. HSP60 and HSP70, for example, were detected only using the in-gel approach, whereas CDC48 and MPKs were detected only via the gel-free approach. Gel-based identification resulted in the detection of 112 proteins, of which only 9 were protein kinases. Although the vast majority of the detected proteins were absent in the no-probe control, the labeling site could be determined for only nine proteins, based on the detection of labeled peptides. This number would have been even lower if we had not realized that biotin is somehow oxidized during this procedure. By contrast, sequencing the labeled peptides using Xsite/KiNativ via the gel-free approach identified more protein kinases and also determined labeling sites in each of the labeled proteins. Although this gel-free approach seems more powerful, it cannot discriminate between proteins if the labeled peptide is identical. For example, the labeled peptides do not discriminate between CPK9 and CPK11 because the labeled peptide is identical, whereas both proteins were distinguished via the gel-based approach. The gel-free approach also does not provide the MW of the labeled protein, and this can be very important information. For example, we noticed that two labeled LRR-RLKs migrated not at the expected 67–68 kDa, but at 40 kDa, indicating that they had lost the extracellular LRR domain. At this stage we cannot exclude the possibility that this processing occurred upon protein extraction. It is interesting to point out that it was recently discovered that Xa21, an LRR-RLK from rice, is proteolytically cleaved during signaling to release a cytosolic protein kinase domain that migrates to the nucleus to phosphorylate transcription factors (24). 
A similar receptor-shedding mechanism might be at work for the two LRR-RLKs detected in our study.
Our analysis of biotinylated proteins via in-gel digest and LC-MS/MS revealed no peptides that were labeled with BHAc but several peptides that were labeled with OBHAc, the oxidized version of BHAc. To our knowledge, this type of modified biotin has not been reported before during proteomics, and this discovery might have important implications for the detection of other modifications containing biotin. We speculate that the oxygen is located on the sulfur of biotin, resulting in a biotin sulfoxide (25). This modification was also observed, for example, during the synthesis of oligonucleotides (26), and would not occur on desthiobiotin. At this stage we do not know when during the in-gel procedure the oxidation occurs. Searches on the occurrence of oxidized biotin on proteins labeled with other probes will tell how common this modification is.
We detected 24 labeled peptides from protein kinases, but there are 1099 protein kinases encoded by the Arabidopsis genome (1). There are several reasons why no more protein kinases were detected. First, most of the protein kinases are not expressed in tissues that we used for the labeling experiments. Second, peptides from some protein kinases are too small or too large to be detected with the chosen MS settings. However, in silico analysis indicates that the majority (95%) of the theoretically labeled peptides of Arabidopsis protein kinases are in the range of 5 to 30 amino acids. Third, some protein kinases cannot be discriminated based on the sequence of the labeled peptides. Nonetheless, in silico analysis showed that 64% of the Arabidopsis protein kinases contain a putative labeled lysine in a peptide sequence that is unique in the Arabidopsis proteome (supplemental Fig. S6). Fourth, MS is not sensitive enough to detect more labeled peptides from protein kinases in an unbiased mode. Indeed, inclusion lists coupled to retention times and searches for marker ions in the MS2 mode can tremendously increase the number of identified protein kinases. The added search modes have increased the number of robustly detected protein kinases to 160 protein kinases per cell line (7). Thus, we expect that we will be able to detect labeling of over 100 different protein kinases from a single proteome with the implementation of empirical searches. The wider range of detection will increase the power of protein kinase profiling in plants tremendously.
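The kind of in silico check described here can be sketched in a few lines of Python. This is a minimal illustration and not the analysis pipeline used in this study: it assumes a tryptic digest (cleavage after Lys or Arg unless followed by Pro), treats the acylated lysine as a missed cleavage, and uses a made-up sequence. The function names and the 5 to 30 residue window as function defaults are our own choices based on the range quoted above.

```python
def tryptic_peptide_with_labeled_lys(seq, lys_index):
    """Return the tryptic peptide containing the labeled lysine.

    lys_index is 0-based. The labeled lysine is treated as a missed
    cleavage, because trypsin does not cut after a modified lysine.
    """
    if seq[lys_index] != "K":
        raise ValueError("position is not a lysine")
    cuts = [0]
    for i, aa in enumerate(seq[:-1]):
        # Cleave after K/R unless followed by P or at the labeled lysine.
        if aa in "KR" and seq[i + 1] != "P" and i != lys_index:
            cuts.append(i + 1)
    cuts.append(len(seq))
    for start, end in zip(cuts, cuts[1:]):
        if start <= lys_index < end:
            return seq[start:end]

def in_detectable_range(peptide, min_len=5, max_len=30):
    """True if the peptide length lies within the assumed detection window."""
    return min_len <= len(peptide) <= max_len

# Hypothetical sequence fragment; the labeled lysine is at index 7.
fragment = "AAGRVLGKGSFGKVYR"
peptide = tryptic_peptide_with_labeled_lys(fragment, 7)
print(peptide, in_detectable_range(peptide))  # VLGKGSFGK True
```

Applying the same function to every protein kinase sequence and counting how often each resulting peptide occurs would also give the fraction of peptides that are unique in the proteome.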
We have detected 24 labeled peptides that originate from protein kinases. Analysis of the labeling sites in these protein kinases showed that AcATP probes react with lysines that reside at four distinct positions. Lys1 and Lys2 reside in the ATP binding pocket in motifs II and VIb, respectively, and were previously described to be labeled by AcATP probes (6). Lys3 and Lys4 were detected in PTI1-like kinases and MAP3K VIK, respectively, and had not been characterized before. Although crystal structures are not available for representatives of these protein kinase subfamilies, modeling indicates that regions carrying Lys3 and Lys4 might indeed be able to fold back on the ATP binding pocket. The reactivity of Lys3 and Lys4 indicates that even though these lysines are poorly conserved, they might be in close proximity to ATP and might play important roles in PTI1- and VIK-like protein kinases. The absence of any other labeled peptide from the detected protein kinases illustrates the remarkable selectivity of the AcATP probe to target the ATP binding pocket of these proteins.
The labeled protein kinases are remarkably diverse and contain representatives of all major classes of Arabidopsis protein kinases. The list includes RLKs, Pelle-like kinases, MAPKs, MAP3Ks, CPKs, PTI-like kinases, and PERK1. The diversity of the protein kinases illustrates the wide range of protein kinases that can be labeled with AcATP and is consistent with the fact that all protein kinases (by definition) bind ATP and most of them carry lysine residues to stabilize the phosphate. The data are also consistent with studies of AcATP probes on animal proteomes, where >75% of the human protein kinases have been detected (6). Thus, AcATP probes have the remarkable potential to label nearly all protein kinases not only of mammals, but also of other organisms. The detection of several RLKs with relatively high spectral counts is noteworthy because these proteins are thought to be low in abundance and we did not enrich for membrane proteins.
Functions have been described for several of the detected protein kinases. MPK3, -4, -6, and -11 are involved in immune signaling and control gene expression through WRKY transcription factors (4, 27, 28). PBS1 is involved in the perception of bacterial pathogens secreting the AvrPphB effector (29, 30). AvrPphB cleaves PBS1, and this activates resistance protein RPS5. Calcium-dependent kinase CPK3 is involved in stomatal closure, herbivore defense, and salt stress acclimation and is a positive regulator of sphingolipid-induced cell death (31–33). PTI1–2 interacts with oxidative stress-response protein kinase OXI1 and integrates phosphatidic acid signaling with reactive oxygen signaling (34). VIK is a MAP3K involved in the uptake of glucose into the vacuole (19), and AT6 is a MAP3K that acts as a negative regulator of salt tolerance (20). PERK1 is thought to be involved in sensing cell wall stress during wounding and pathogen infection (21, 35). The detection of these characterized protein kinases via AcATP labeling is consistent with their presence and activities in the Arabidopsis rosette used for our studies. The other two-thirds of the detected protein kinases have been described only by transcript levels and phylogeny. Our data demonstrate that these uncharacterized protein kinases are present and able to bind ATP.
Previous studies with AcATP probes have been mostly focused on the labeling of protein kinases. In this study we also examined the labeling of proteins other than protein kinases. An in-depth analysis of the labeled peptides revealed that AcATP probes label an astonishing diversity of protein families. Most of the labeled proteins are able to bind nucleotides, and analysis of structural data indicates that AcATP probes label these nucleotide binding proteins predominantly inside ATP binding pockets. The structural diversity of the nucleotide binding proteins is remarkable, as illustrated by the fact that they belong to 13 different protein clans in the PFAM database (36).
Several different ATP binding proteins have been identified with high spectral count frequencies. PGK1 is the top hit among the labeled proteins. PGK generates ATP during glycolysis (37). Subunit B of glutamyl-tRNA aminotransferase is an Asn/Gln-tRNA amidotransferase involved in the transamidation of misacylated Asp-tRNA or Glu-tRNA forming a correctly charged Asn-tRNA or Gln-tRNA, a reaction that requires ATP (38, 39). CDC48 encodes cell division cycle protein 48 and is important in cell division, expansion, and differentiation. Arabidopsis plants carrying mutations in CDC48-encoding genes show impaired seedling development and defective pollen and embryo development (40). CDC48 is an AAA-type ATPase and belongs to a large superfamily containing a highly conserved triple-A domain that binds and hydrolyzes ATP (41). CAC2 is the biotin carboxylase subunit of the heterodimeric, biotin-containing enzyme chloroplastic acetyl-coenzyme A carboxylase, which is involved in the synthesis of fatty acids in Arabidopsis (42) and uses ATP for the conjugation of a carboxylate-containing molecule to an amino- or thiol-group-containing molecule (43). CAC2 contains a phosphate-binding loop and a Mg2+-binding site and has two alpha-beta subdomains that grasp the ATP molecule. This “ATP-grasp” domain is common to a superfamily that also contains d-alanine:d-alanine ligase, glutathione synthetase, and carbamoyl phosphate synthase. PYR6 is a uridine 5′-monophosphate (UMP)/cytidine 5′-monophosphate (CMP) kinase that converts UMP/CMPs to uridine and cytidine diphosphates, respectively. PYR6 orthologs in bacteria and yeast are required for cellular proliferation and RNA and protein synthesis (44, 45). PYR6 has also been detected in the mature pollen of Arabidopsis, consistent with its role in cell proliferation and division (46).
Taken together, these five structurally and functionally diverse examples illustrate the broad range and relevance of ATP binding enzymes that we can monitor using AcATP probes.
Our analysis of labeling sites also showed that a large number of labeling sites are not located in known nucleotide binding pockets. It is possible that several of these labeling sites are nucleotide binding sites that have not yet been described. However, the correlation between spectral counts and transcript levels indicates that labeling outside known nucleotide binding pockets is typically detected for highly abundant proteins, such as Rubisco. An obvious explanation for the labeling of abundant proteins outside nucleotide binding sites would be that solvent-exposed lysine residues have a low basal reactivity toward AcATP and that this causes nonspecific labeling of highly abundant proteins. Strangely enough, however, AcATP labeling of abundant proteins nevertheless requires divalent ions and can be competed with nucleotides that do not carry an acyl reactive group. These data support the exciting idea that ATP-competable AcATP labeling sites might identify novel nucleotide binding sites in proteins. Interestingly, a recent crystal structure of rice Rubisco with NADPH revealed that two of the labeled lysines are in close proximity to the phosphate in NADPH (supplemental Fig. S4D) (47). Thus, labeling outside known nucleotide binding sites is an interesting topic for further studies.
AcATP labeling is a powerful way to monitor ATP binding activities in native proteomes. Besides the majority of the 1099 protein kinases of Arabidopsis, AcATP can potentially label another 482 ATP binding proteins, representing another 11 different protein families. Although AcATP probes also label outside known nucleotide binding sites, the preferential labeling of nucleotide binding proteins in ATP binding pockets demonstrates the high selectivity of the AcATP probe. An interesting future prospect will be to study the changes of ATP binding activities of proteins upon an external stimulus. Our data suggest that labeling not only reflects the abundance of the ATP binding proteins, but also is affected by the ATP binding activity of the proteins. ATP binding can be dynamic and dependent on the cellular state. However, further studies on changes in ATP binding activities upon a stimulus must involve pair-wise comparisons of the same labeled peptide under different conditions, and this requires further development of quantitative methods such as stable isotope labeling by amino acids in cell culture and the use of isotope-labeled probes. We aim to use AcATP labeling in the near future to identify selective inhibitors of Arabidopsis protein kinases and to study the activation of protein kinases upon external stimuli.
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD000188.
* This work was financially supported by the Max Planck Society and the Deutsche Forschungsgemeinschaft (DFG projects HO 3983/3-1 and 4-1), and COST action CM 1004. M.K. gratefully acknowledges funding by an ERC Starting Grant (Grant No. 258413).
This article contains supplemental material.
1 The abbreviations used are: