The development of high-performance technology platforms for generating detailed protein expression profiles, or protein atlases, is essential. Recently, we presented a novel platform that we termed global proteome survey, where we combined the best features of affinity proteomics and mass spectrometry, to probe any proteome in a species independent manner while still using a limited set of antibodies. We used so called context-independent-motif-specific antibodies, directed against short amino acid motifs. This enabled enrichment of motif-containing peptides from a digested proteome, which then were detected and identified by mass spectrometry. In this study, we have demonstrated the quantitative capability, reproducibility, sensitivity, and coverage of the global proteome survey technology by targeting stable isotope labeling with amino acids in cell culture-labeled yeast cultures cultivated in glucose or ethanol. The data showed that a wide range of motif-containing peptides (proteins) could be detected, identified, and quantified in a highly reproducible manner. On average, each of six different motif-specific antibodies was found to target about 75 different motif-containing proteins. Furthermore, peptides originating from proteins spanning in abundance from over a million down to less than 50 copies per cell, could be targeted. It is worth noting that a significant set of peptides previously not reported in the PeptideAtlas database was among the profiled targets. The quantitative data corroborated well with the corresponding data generated after conventional strong cation exchange fractionation of the same samples. Finally, several differentially expressed proteins, with both known and unknown functions, many relevant for the central carbon metabolism, could be detected in the glucose- versus ethanol-cultivated yeast. 
Taken together, the study demonstrated the potential of our immunoaffinity-based mass spectrometry platform for reproducible quantitative proteomics targeting classes of motif-containing peptides.
In the quest for disease-associated biomarkers, the deciphering of the human proteome(s) will be central (1). Although powerful mass spectrometry (MS)-based technology platforms have been developed and frequently applied (2–4), the output in terms of validated biomarkers has so far been limited, mainly due to technological issues (5, 6). In recent years, affinity proteomics based on antibody microarrays has become an established proteomic technology for protein expression profiling of complex proteomes (7–11). To date, the technology has been applied in several clinical applications, demonstrating its potential for, e.g. biomarker discovery, improved diagnosis and prognosis, as well as classification (8, 12–15). Despite this success, the possibility of running large-scale and, in particular, discovery-mode projects using conventional antibody array designs has been limited (7, 8, 16). First, only antibodies of known specificities, i.e. directed against preselected targets of a known species, have been included, thereby excluding the possibility of discovering novel targets (across species). Second, the resolution of an antibody array is directly related to the number of antibodies included and their range of specificities, which tends to be a bottleneck. Third, the number of readily available antibodies designed for microarray applications has per se been limited.
To bypass these technological hurdles and advance further, and even to provide quantitative capabilities, the most attractive features of affinity proteomics and MS could be combined (17, 18). The biological sample would then be digested and exposed to peptide-specific antibodies, after which any specifically enriched peptides would be detected, identified, and potentially quantified using MS. This was demonstrated in the stable isotope standard capture with antipeptide antibodies set-up, but, like conventional antibody arrays, this platform relied on the use of one binder per unique peptide/protein (19, 20). To circumvent the need to generate numerous antibodies, we (18) and others (17) have recently presented the novel concept of using antibodies directed against short peptide motifs (epitopes) shared among up to hundreds of different peptides/proteins. This would provide an inherent capability of probing any proteome in a discovery mode, in a species-independent manner, while still using a very limited number of antibodies. Based on this concept, we have recently designed a platform denoted global proteome survey (GPS), based on human recombinant single-chain fragment variable (scFv) antibodies (18, 22), while Joos et al. presented the triple X proteomic (TXP) set-up, relying on polyclonal and/or monoclonal antibodies. In our case, the scFv antibodies, microarray adapted by molecular design (7, 8, 23), were derived from a large phage-display library (24), representing a renewable probe source displaying an extensive range of specificities. In this manner, one hundred such scFv antibodies, denoted context-independent motif-specific (CIMS) antibodies, could theoretically cover almost 50% of the nonredundant human proteome (18, 22), a concept supported by a recent in silico motif study of the human proteome (25).
Recently, we provided experimental proof-of-principle that a limited number of CIMS antibodies could be used to profile crude, digested proteomes when combined with a mass spectrometry-based read-out (22). The GPS platform was in fact shown to provide novel and broad coverage, and to have the potential to reach deep into a proteome in a species-independent manner.
In this study, we have demonstrated the GPS set-up with respect to its quantitative capability, reproducibility, sensitivity, and coverage, by using the stable isotope labeling by amino acids in cell culture (SILAC) approach (26), targeting SILAC-labeled yeast cultivated in either glucose or ethanol. In order to evaluate the quantitative ability in more detail, the same samples were in parallel experiments fractionated using conventional strong cation exchange chromatography (SCX). The data showed the potential and applicability of our immunoaffinity-based mass spectrometry GPS platform for reproducible quantitative proteomics targeting classes of motif-containing peptides.
Six human recombinant CIMS scFv antibodies (clones 1-B03, 15-A06, 17-E02, 32–3A-G03, 33–3D-F06, and 34–3A-D10) directed against six short C-terminal amino acid peptide motifs (denoted M-1, M-15, M-17, M-32, M-33, and M-34) were selected from the n-CoDeR library (24) and kindly provided by BioInvent International AB, Lund, Sweden (supplemental Table S1). The specificity and dissociation constant (sub-μM range) of the CIMS antibodies have recently been evaluated elsewhere (22) (Olsson et al., ms in prep.). Notably, the antibody selection procedure adopted so far was not designed to generate binders with a strict 4 or 6 amino acid motif specificity, which is why the sequence of the bound motif varied (wobbled). In general, only two or three positions appeared to be more critical (“anchor residues”), whereas a higher degree of wobbling was allowed in the other positions. The antibodies were produced in 100 ml E. coli cultures and purified using affinity chromatography on Ni2+-NTA agarose (Qiagen, Hilden, Germany). Bound molecules were eluted with 250 mM imidazole, dialyzed against phosphate-buffered saline (PBS) (pH 7.4) for 72 h, and then stored at +4 °C until further use. The protein concentration was determined by measuring the absorbance at 280 nm using a Nanodrop-1000. The integrity and purity of the scFv antibodies were evaluated by running Protein 80 chips on an Agilent Bioanalyzer (Agilent, Waldbronn, Germany).
Purified scFvs were individually coupled to magnetic beads (M-270 carboxylic acid-activated, Invitrogen Dynal, Oslo) using manufacturer-provided protocols with some minor modifications, as described before (22). Briefly, 180–250 μg purified scFv was covalently coupled (EDC-NHS chemistry) to ~9 mg (300 μl) of magnetic beads, and stored in 0.005% (v/v) Tween-20 in PBS at 4 °C until use.
Saccharomyces cerevisiae was chosen as model organism in order to represent a well-characterized complex proteome. To this end, the yeast strain YAL6B, containing deficient genes for both lysine and arginine synthesis, and kindly provided by Prof. O.N. Jensen (University of Southern Denmark, Odense, Denmark), was applied (27). A preculture was started in yeast nitrogen base medium (YNB) containing 0.055 g/L adenine, 0.055 g/L tyrosine, 0.045 g/L uracil and supplemented with 2% (w/v) glucose and necessary amino acids. Thirty mg/L arginine and 30 mg/L lysine were added to the medium. The culture was incubated overnight at 30 °C on a rotary shaker at 200 rpm, followed by dilution to an OD600 of 1. Then, 1 ml of the overnight culture was transferred to either normal or heavy YNB medium supplemented with 30 mg/L 13C6,15N4-arginine and 30 mg/L 13C6-lysine (Cambridge Isotope Laboratories, Andover, MA). The cultures were incubated at 30 °C and grown overnight to an OD600 of 1 in order to ensure complete incorporation of isotopic amino acids. Finally, two yeast cell cultures were initiated with either 2% (w/v) glucose or 0.05% (w/v) glucose and 3% (v/v) ethanol as the carbon source in YNB medium supplemented with necessary amino acids. Swap experiments were performed as a biological replicate in order to have the isotopically labeled amino acids present in both growth conditions. Hence, in total four cultures were initiated (glucose, glucose (K6, R10), ethanol, and ethanol (K6, R10)). In addition, two separate cultures (glucose and glucose (K6, R10)) were initiated for the spike-in proteome ratio experiments. The cultures were grown at 30 °C on a rotary shaker at 200 rpm and harvested in log phase at OD600 0.7. The cultures were centrifuged at 5000 rpm (10 min at 4 °C). The pellets were dissolved in 10 ml ice-cold Milli-Q water and then centrifuged at 2000 × g (5 min at 4 °C).
In order to preserve the samples from degradation (28), the pellets were resuspended in 1.5 ml cold 10% (v/v) TCA, transferred to 2 ml tubes and finally collected by centrifugation at 13,200 rpm for 1 min at 4 °C. Samples were snap-frozen in liquid nitrogen and stored at −80 °C until further use.
Cell pellets were thawed on ice and resuspended in 500 μl extraction buffer (8 M urea, 30 mM Tris, 5 mM MgAc, and 4% (w/v) CHAPS, pH 8.5). Equal amounts (500 μl) of resuspended cells from the two growth conditions, ethanol and glucose, were pooled (heavy + light), and 1.0 g glass beads (0.55 mm) was added to the pooled cells. For the separately planned spike-in proteome ratio experiments, no pooling of resuspended cells was done, and protein extracts were isolated separately, allowing for downstream mixing of various “heavy” (H) and “light” (L) proteome amounts. Extractions were performed using a BeadBeater with 8 × 60 s mixing incubations at 4 °C (and 60 s intervals on ice between mixing steps) followed by centrifugation at 10,000 × g (10 min at 4 °C). The buffer was exchanged to 0.15 M HEPES, 0.5 M urea using Zeba desalting spin columns (Pierce, Rockford, IL), before the protein concentration was determined using Total Protein Kit, Micro Lowry (Sigma, Saint Louis, MO). The average protein concentrations (triplicates) were determined to be 2.03 μg/μl for the yeast grown in glucose and 1.98 μg/μl for the yeast grown in glucose in the presence of isotopic amino acids. Finally, the samples were aliquoted and stored at −80 °C until further use.
For the quantitative proteome spike-in control GPS experiments, different amounts of protein extracts from glucose-grown yeast (unlabeled and SILAC labeled) were mixed at three H and L ratios (20/80 H/L, 50/50 H/L, and 80/20 H/L) based on the total protein concentrations. The protein extracts were thawed, reduced, alkylated, and trypsin digested. First, 0.05% (w/v) SDS and 5 mM TCEP-HCl (Thermo Scientific, Rockford, IL) were added, and the samples were reduced for 60 min at 56 °C. The samples were cooled to room temperature before iodoacetamide was added to 10 mM, and the samples were then alkylated for 30 min at room temperature. Next, sequencing-grade modified trypsin (Promega, Madison, WI) was added at 20 μg per mg of protein for 16 h at 37 °C. In order to ensure complete digestion, a second aliquot of trypsin (10 μg per mg protein) was added and the tubes were incubated for an additional 3 h at 37 °C. Finally, the digested samples were aliquoted and stored at −80 °C until further use. The addition of a trypsin inhibitor directly after thawing the digested sample did not appear to affect the subsequent GPS analysis (Olsson et al., unpublished observations).
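The mixing and digestion arithmetic described above can be sketched as follows. This is only an illustration: the stock concentrations and the trypsin dose are taken from the text, whereas the total amount of protein per mix (100 μg) and the function names are assumptions made for the example.

```python
def mix_volumes(total_ug, heavy_fraction, heavy_conc, light_conc):
    """Return (heavy_ul, light_ul) giving total_ug of protein at the
    requested heavy fraction; concentrations are in ug/ul."""
    if not 0.0 <= heavy_fraction <= 1.0:
        raise ValueError("heavy_fraction must be between 0 and 1")
    heavy_ug = total_ug * heavy_fraction
    light_ug = total_ug - heavy_ug
    return heavy_ug / heavy_conc, light_ug / light_conc


def trypsin_ug(protein_ug, ug_per_mg=20.0):
    """Trypsin amount (ug) for a dose given in ug trypsin per mg protein."""
    return protein_ug / 1000.0 * ug_per_mg


# Stock concentrations reported in the text (ug/ul).
LIGHT_CONC = 2.03  # glucose, unlabeled
HEAVY_CONC = 1.98  # glucose, SILAC labeled
TOTAL_UG = 100.0   # assumed amount per mix, not stated in the text

for heavy_fraction in (0.2, 0.5, 0.8):
    heavy_ul, light_ul = mix_volumes(TOTAL_UG, heavy_fraction,
                                     HEAVY_CONC, LIGHT_CONC)
    print(f"H fraction {heavy_fraction:.1f}: "
          f"{heavy_ul:.1f} ul heavy + {light_ul:.1f} ul light, "
          f"first trypsin addition {trypsin_ug(TOTAL_UG):.1f} ug")
```

Because the two stocks differ slightly in concentration, equal protein amounts do not correspond to exactly equal volumes, which is why the ratios are defined on protein amount rather than on volume.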
A standardized sample workflow was adopted. For each capture experiment, a 35 μl solution of prewashed (300 μl PBS) CIMS-conjugated beads was used. A tryptic digest (20 μg) was thawed just prior to use, diluted in PBS to a final incubation volume of 30–35 μl, and then incubated with the beads for precisely 15 min with gentle mixing. Next, the tubes were placed on a magnet, the supernatant was removed, and the beads were washed twice, with 62 and 50 μl PBS, respectively (a total of 5 min); the beads were transferred to new tubes between wash steps. Finally, the beads were incubated with 7.5 μl 5% (v/v) acetic acid for 1 min in order to elute captured peptides. Two technical replicates were performed for each sample and binder. The eluate was then used directly for mass spectrometry analysis without any additional cleanup.
For SCX fractionation of peptides, 100 μg of each biological sample (ethanol-glucose mix) was concentrated to a volume of 100 μl by speed-vac, followed by the addition of 500 μl 5 mM KH2PO4, 25% (v/v) acetonitrile (ACN) (pH 2.9). The SCX fractionation was performed on an ICAT cation exchange cartridge (Applied Biosystems, Foster City, CA). The resin was first activated by adding 5 mM KH2PO4, 25% (v/v) ACN, 1 M KCl (pH 2.9) and then rinsed with 5 mM KH2PO4, 25% (v/v) ACN (pH 2.9) before the sample was loaded. An eight-step elution gradient from 30 to 500 mM KCl was used. Each 500 μl fraction (eight in total) was dried down and resuspended in 5% (v/v) formic acid, followed by a C18 cleanup performed with UltraMicroSpinColumns (Vydac C18 Silica from The Nest Group, Southborough, MA). The C18 silica was wetted by adding 2 × 50 μl 70% ACN, 5% (v/v) formic acid, and equilibrated by adding 2 × 50 μl 5% (v/v) formic acid. The sample was loaded and washed with 4 × 50 μl 5% (v/v) formic acid. Finally, the samples were eluted with 50 μl 50% (v/v) ACN, 5% (v/v) formic acid, dried down, and resuspended in 30 μl 0.1% (v/v) formic acid.
An ESI-LTQ-Orbitrap XL (Thermo Electron, Bremen, Germany) coupled to an Eksigent 2D nano HPLC (Eksigent technologies, Dublin, CA) was used for all samples analyzed. The auto-sampler injected 6 μl of GPS-generated eluates and 5 μl of the SCX fractions respectively. The peptides were trapped on a precolumn (Zorbax 300SB-C18 5 × 0.3 mm, 5 μm, Agilent) and separated on a reversed-phase analytical column (Zorbax 300SB-C18 150 × 0.75 mm, 3.5 μm, Agilent). The flow rate was 350 nl/min. Solvent A consisted of 0.1% (v/v) formic acid in water and solvent B of 0.1% (v/v) formic acid in acetonitrile. The total runtime was 70 min starting with a 5 min wash of the GPS generated peptides. For the SCX fractions, the total run time was 80 min starting with a 15 min wash of the peptides. The linear ion trap (LTQ)-Orbitrap was operated in data-dependent mode to automatically switch between Orbitrap-MS and LTQ-MS/MS acquisition. Survey full scan MS spectra (from m/z 400 to 2000) were acquired in the Orbitrap with a resolution of 60,000 at m/z 400 using the lock mass option for internal calibration. The seven most intense ions with charge state 2 and up were sequentially isolated for CID-fragmentation in the LTQ with a normalized collision energy of 35% and the resulting fragment ions were recorded in the LTQ. For the proteome spike-in control experiments (20/80 H/L, 50/50 H/L, and 80/20 H/L), using CIMS-17-E02 and CIMS-33–3D-F06, a total of 12 LC-MS/MS runs were performed (2 capture experiments per binder and proteome mix). The mixed ratio (e.g. 20/80) was based on the total protein concentration (see above). In order to evaluate any nonspecific background binding peptides, blank beads, i.e. beads without any conjugated antibody, were exposed to a 50/50 H/L proteome mixture (supplemental Table S2A-C). Notably, only 13 background peptides, corresponding to 13 proteins, were identified when the eluates from two blank runs were analyzed. 
For the ethanol-glucose mixed proteome profiling experiments, a total of 24 LC-MS/MS runs were performed with the GPS set-up (two capture experiments per binder on two biological samples). In total, 32 LC-MS/MS runs were performed on the ethanol-glucose mixed proteomes of the SCX fractionated samples (two runs (technical replicates) for each of the eight eluted fractions from the two biological samples). The total number of identified peptides and proteins was lower in biosample 2 (isotopic amino acids present in the glucose condition) than in biosample 1, but as this was observed for both the GPS and SCX set-ups, it was attributed to the sample and not to the experimental set-ups.
Raw Orbitrap full scan MS and MS/MS spectra were processed by MaxQuant (v 220.127.116.11) as described (29) using default settings and MASCOT Daemon (v 2.2.2 Matrix Science, London UK) as the database search engine for peptide identifications. Briefly, MS/MS peak lists were filtered to contain at most six peaks per 100 Da interval and searched by Mascot against a combined forward and random database (Saccharomyces cerevisiae Swiss-Prot, 13-Oct 2009 with 6890 forward sequences, and a randomly generated decoy version with the same amino acid distribution and length, resulting in a total of 13,780 sequences). Tandem mass spectra were matched with a mass tolerance of 7 ppm on precursor mass and 0.5 Da for fragment ions. Carbamidomethylation of cysteine was selected as a fixed modification, and oxidation of methionine and acetylation of the protein N terminus were used as variable modifications. Labeled arginine and lysine were specified as fixed or variable. A false discovery rate of 0.01 was used (estimated on the basis of the number of identified random hits). The tool WebLogo was used to generate the motif figures for the sequence motifs enriched by selected binders (30). The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) tool version 8.3 (31) was used for creating protein-protein interaction (PPI) evidence maps with default settings (medium significance) for a subset of proteins identified as differentially expressed.
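Two of the search settings above, the peak-list filter (at most six peaks per 100 Da interval) and the 7 ppm precursor tolerance, can be illustrated with a short sketch. It is not the MaxQuant or Mascot implementation; the function names, the choice to keep the most intense peaks in each interval, and the alignment of intervals to multiples of 100 Da are assumptions made for the example.

```python
from collections import defaultdict


def filter_peaks(peaks, max_per_window=6, window_da=100.0):
    """Keep at most max_per_window of the most intense peaks in each
    m/z window. peaks is an iterable of (mz, intensity) tuples; the
    result is sorted by m/z."""
    windows = defaultdict(list)
    for mz, intensity in peaks:
        # Assumed binning: windows aligned to multiples of window_da.
        windows[int(mz // window_da)].append((mz, intensity))
    kept = []
    for window_peaks in windows.values():
        window_peaks.sort(key=lambda peak: peak[1], reverse=True)
        kept.extend(window_peaks[:max_per_window])
    return sorted(kept)


def within_ppm(observed, theoretical, tol_ppm=7.0):
    """True if observed deviates from theoretical by at most tol_ppm."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


# Hypothetical peak list: ten peaks in one window, one in the next.
example = [(400.0 + i, float(i)) for i in range(10)] + [(550.0, 1.0)]
print(filter_peaks(example))
print(within_ppm(1000.005, 1000.0), within_ppm(1000.010, 1000.0))
```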
The data associated with this study is downloadable from Proteome Commons (http://proteomecommons.org/) Tranche using the following hash: xNBoKsLK7paM+4ZEI3nSPo7TTVaZw4TrVyZtv22cgMiX8fW+km9Bky96GrXoduVjmWCBO1mmVSeEFKWvUZLUNlvuAzYAAAAAAABb6g== and the following passphrase: pnCaYNyzvfltGXtjnQqy. Included are all raw files that were processed by MaxQuant as described above, identified proteins, sequences of identified peptides and peptide evidences (i.e. SILAC ratio measurements), annotated MS/MS spectra for single peptide identifications, and the yeast database file used for Mascot searches.
In this study, we have demonstrated the efficacy of the GPS technology platform for quantitative proteomics. To this end, SILAC-labeled protein extracts from S. cerevisiae, cultivated in glucose and ethanol, were used as model proteomes. An overall workflow outlining all steps of the set-up is schematically shown in supplemental Fig. S1.
To investigate the quantitative accuracy and reproducibility of the GPS measurements, two CIMS antibodies, clones CIMS-17-E02 and CIMS-33–3D-F06, were used to profile glucose-grown yeast SILAC-proteomes mixed at three known ratios of H and L (80/20, 50/50, and 20/80) (Fig. 1 and supplemental Tables S3 and S4). In total, two captures per binder and proteome mix, i.e. 12 runs, were performed. First, the sequences of the enriched peptides were determined and matched to the motif specificity of the CIMS antibodies. The results showed that a total of 100 different peptides, including three background peptides, were enriched by CIMS-17-E02, whereas 145 different peptides (three background peptides) were captured by CIMS-33–3D-F06 (Fig. 1A).
Next, the H/L ratios were determined for all the enriched peptides (Fig. 1B and supplemental Figs. S2A and S2B). In Fig. 1B, representative data for YXR (CIMS-17-E02) and DXR (CIMS-33–3D-F06) motif-containing peptides are plotted using non-normalized raw data. The data showed that the peptides were grouped into three distinct clusters, with the expected differences in peptide ratios between the three groups. Hence, the data outlined the quantitative accuracy of the GPS method. The fact that the observed ratios, 1.6 (20/80), –0.6 (50/50), and –2.8 (80/20) for CIMS-17-E02, and 1.1, –0.9, and –2.9 for CIMS-33–3D-F06 (Fig. 1B), were displaced relative to the expected ratios (–2, 0, and 2) could be explained by the use of non-normalized data. When the data were normalized toward the 50/50 ratio, the 20/80 and 80/20 ratios were clearly distributed around the anticipated values, displaying median ratios around –2.1 and 2.1, respectively (Fig. 1C and supplemental Fig. S2B).
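The expected values of –2, 0, and 2 correspond to the base-2 logarithms of the mixing ratios 20/80, 50/50, and 80/20, and normalizing toward the 50/50 mix amounts to subtracting a constant offset on this log scale. The sketch below illustrates both steps using the median values reported above for CIMS-17-E02. The function names are hypothetical, and the orientation of the reported ratios (H/L or L/H) is not restated here, so only the size of the offset correction is illustrated.

```python
import math


def expected_log2_ratio(heavy_pct, light_pct):
    """Expected log2(H/L) for a mix prepared at heavy_pct/light_pct."""
    return math.log2(heavy_pct / light_pct)


def normalize_to_reference(log2_by_mix, reference="50/50"):
    """Shift all log2 ratios so that the reference mix sits at zero."""
    offset = log2_by_mix[reference]
    return {mix: value - offset for mix, value in log2_by_mix.items()}


# Expected log2(H/L) values for the three mixes.
for heavy, light in ((20, 80), (50, 50), (80, 20)):
    print(f"{heavy}/{light}: {expected_log2_ratio(heavy, light):+.1f}")

# Non-normalized values reported in the text for CIMS-17-E02.
observed = {"20/80": 1.6, "50/50": -0.6, "80/20": -2.8}
normalized = normalize_to_reference(observed)
print({mix: round(value, 1) for mix, value in normalized.items()})
```

After the shift, the two outer mixes sit symmetrically around zero at a distance of about 2.2, close to the anticipated magnitude of 2.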
The reproducibility of the technical replicates, i.e. encompassing the entire GPS set-up including the capture step, was then determined (supplemental Fig. S2C). Representative data, illustrated for YXR (CIMS-17-E02) and DXR (CIMS-33–3D-F06) motif-containing peptides, is plotted in Fig. 1D. The results showed that high r2-values (0.98 and 0.99) were obtained, and the total median percent difference, considering all six captures and analytes, was 5.9% to 6.8% (supplemental Fig. S2C).
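The two reproducibility metrics quoted above can be computed along the following lines. This is a minimal sketch under stated assumptions: r2 is taken as the squared Pearson correlation between replicate ratios, and the percent difference of a replicate pair is taken as the absolute difference divided by the pair mean, computed on positive linear-scale ratios. The exact definitions used for the supplemental figures are not given here, and the replicate values in the example are hypothetical.

```python
from statistics import mean, median


def r_squared(x, y):
    """Squared Pearson correlation between paired measurements."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def median_percent_difference(x, y):
    """Median of |a - b| / mean(a, b) * 100 over paired positive values."""
    return median(abs(a - b) / ((a + b) / 2.0) * 100.0
                  for a, b in zip(x, y))


# Hypothetical linear-scale H/L ratios from two replicate captures.
capture_1 = [0.26, 0.24, 1.02, 0.97, 3.9, 4.2]
capture_2 = [0.25, 0.26, 0.98, 1.03, 4.1, 4.0]
print(f"r2 = {r_squared(capture_1, capture_2):.3f}")
print(f"median % difference = "
      f"{median_percent_difference(capture_1, capture_2):.1f}")
```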
The technical reproducibility was also determined for analytes displaying a significantly larger fold change than the artificial mixtures used above (≤fourfold changes), by targeting SILAC-proteomes generated from S. cerevisiae cultivated in glucose or ethanol. Representative data for CIMS antibody CIMS-33–3D-F06 are shown in supplemental Fig. S3. After removing a single potential outlier peptide, present in both biological replicates, r2-values between 0.96 and 0.98 were observed.
In order to evaluate the dynamic range (and sensitivity) of the GPS platform, glucose-grown yeast SILAC-proteomes were profiled using two CIMS antibodies, and the identified yeast proteins were then mapped to the known absolute protein abundances generated by orthogonal methods (32) (Fig. 2 and supplemental Tables S5A and S5B). For both CIMS antibodies, the results showed that yeast proteins could be detected over a wide range of abundances, from over a million down to less than 50 copies per cell (Fig. 2). Notably, a significant set of the identified peptides (19 and 27%, respectively) had not previously been reported in PeptideAtlas (supplemental Tables S3B and S4B). Hence, the GPS set-up was shown to display a dynamic range of at least three orders of magnitude, high sensitivity, and a complementary peptide coverage.
To further evaluate the quantitative capability of GPS, the data was compared with that obtained for the same samples analyzed in parallel experiments using one of the main orthogonal approaches within conventional quantitative proteomics, namely SCX LC-MS/MS (Fig. 3 and supplemental Tables S6A–S6C). To this end, SILAC-proteomes generated from S. cerevisiae, cultivated in glucose or ethanol, were profiled using six CIMS antibodies and any differentially expressed analytes were identified and quantified. Although two biological replicates and two technical replicates were applied in both cases, the GPS set-up relied on six CIMS antibodies (in total 24 runs) and the SCX set-up on eight fractions (in total 32 runs).
The GPS platform was found to identify 738 peptides (Fig. 3A and supplemental Table S6B) corresponding to 449 proteins (Fig. 3B and supplemental Table S6A), i.e. on average detecting 1.6 peptides/protein. Each CIMS antibody was on average found to target 75 different motif-containing proteins. Notably, the cross-talk was found to be low (supplemental Fig. S4). In more detail, 80% of all peptides were captured by only one of the CIMS antibodies, 12% by two CIMS antibodies, and single-digit percentages by three or more CIMS binders. In comparison, the SCX set-up was shown to detect 3805 peptides (Fig. 3A and supplemental Table S6A) corresponding to 920 proteins (Fig. 3B and supplemental Table S6A), on average detecting 3.8 peptides/protein. Notably, 321 peptides and 67 proteins were uniquely detected with the GPS methodology, and 3388 peptides and 539 proteins with SCX. Hence, the GPS and SCX set-ups were found to display overlapping and complementary protein expression profiling data. This notion was further supported by the observation that GPS and SCX were found to target peptides of overlapping but different average lengths. More specifically, a median peptide length of 11 (GPS) versus 13 (SCX) amino acids was observed based on all peptides (Fig. 3D), and 10 versus 14 amino acids (Fig. 3E) when limited to peptides identified and quantified in both biological replicates (ratio count ≥2).
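The cross-talk figures above describe, for each unique peptide, how many different CIMS antibodies captured it. A tally of this kind could be computed as in the sketch below, where the function name, the dictionary layout, and the peptide sequences are all hypothetical placeholders rather than data from this study.

```python
from collections import Counter


def crosstalk_distribution(captures):
    """captures maps an antibody name to the set of peptide sequences
    it enriched. Returns {n: fraction of unique peptides captured by
    exactly n antibodies}."""
    hits = Counter()
    for peptides in captures.values():
        hits.update(set(peptides))  # count each antibody once per peptide
    total = len(hits)
    tally = Counter(hits.values())
    return {n: count / total for n, count in sorted(tally.items())}


# Hypothetical captures for three binders.
captures = {
    "binder_A": {"AAYLR", "GGYTR", "LLDSR"},
    "binder_B": {"LLDSR", "VVDAR"},
    "binder_C": {"LLDSR"},
}
for n_binders, fraction in crosstalk_distribution(captures).items():
    print(f"captured by {n_binders} binder(s): {fraction:.0%}")
```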
Next, we determined the correlation between the quantitative data generated using GPS and SCX, again targeting SILAC-proteomes generated from S. cerevisiae, cultivated in glucose or ethanol (Fig. 4). The results showed that a high correlation between the quantitative data was obtained on both the peptide (r2-values of 0.86 and 0.82) and protein (r2-values of 0.91 and 0.89) levels covering a large concentration span of various analytes, demonstrating the quantitative capability of GPS.
The reproducibility between the two biological replicates (including swap of isotopically labeled amino acids) was then also determined and compared for GPS and SCX (supplemental Fig. S5). In the case of GPS, the data showed that 319 of 738 (43%) detected peptides were consistently quantified between the biological replicates (supplemental Fig. S5A), whereas only 641 of 3805 (17%) detected peptides were consistently quantified with SCX (supplemental Fig. S5C). After removing any potential outliers (≤3), an r2 correlation of 0.76 was observed for GPS (supplemental Fig. S5B), but only 0.60 for SCX (supplemental Fig. S5D), demonstrating that GPS generated reproducible and stable peptide quantification for biological replicates (including swapping of isotopic amino acids).
Next, the reproducibility of the MS/MS identification was evaluated for both GPS and SCX (Fig. 5). To this end, the identification overlap was determined for all replicate runs. The results showed that the median identification overlap was 68% for GPS (Fig. 5A), but only 43% for SCX (Fig. 5B). Furthermore, the identification overlap between biological samples was also found to be significantly higher for GPS than for SCX, 62% versus 31%, respectively (Fig. 5). It should be noted that the GPS data was based on two separate captures and LC-MS/MS runs, i.e. technical variation for the entire set-up, while the SCX data was based on the same eluate injected twice for LC-MS/MS analyses, i.e. technical variation only for the LC-MS/MS runs.
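An identification overlap of this kind can be computed from the sets of peptides identified in each run. The sketch below assumes that the overlap of two runs is the number of shared identifications divided by the number of identifications in their union; the definition actually used for Fig. 5 is not specified here and may differ, and the run contents are hypothetical.

```python
from itertools import combinations
from statistics import median


def overlap_percent(run_a, run_b):
    """Shared identifications as a percentage of the union of two runs."""
    a, b = set(run_a), set(run_b)
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)


def median_pairwise_overlap(runs):
    """Median overlap across all pairs of runs (needs at least two runs)."""
    return median(overlap_percent(a, b) for a, b in combinations(runs, 2))


# Hypothetical peptide identifications from three replicate runs.
runs = [
    {"pep1", "pep2", "pep3", "pep4"},
    {"pep2", "pep3", "pep4", "pep5"},
    {"pep1", "pep2", "pep3", "pep5"},
]
print(f"median pairwise overlap: {median_pairwise_overlap(runs):.0f}%")
```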
Finally, the biological relevance of proteins pinpointed as differentially expressed in glucose- versus ethanol-cultivated yeast, using GPS and SCX, was examined (Fig. 6 and supplemental Fig. S6). The results showed that 27 and 50 differentially expressed proteins (p < 0.01, biosample 1) were identified using GPS (Fig. 6A) and SCX (Fig. 6B), respectively. Focusing on the central carbon metabolism pathways, including glycolysis, the tricarboxylic acid (TCA) cycle, and the glyoxylate cycle, several key differentially expressed proteins could be mapped (Fig. 6C). Notably, the GPS set-up and the SCX approach showed coherent profiles.
In the case of GPS, the nature of the differentially expressed proteins was further characterized by generating protein-protein interaction maps (Fig. 6D). The results showed that several of the up-regulated proteins were involved in the carbon metabolism pathways and energy production, whereas many of the down-regulated proteins were directly or indirectly involved in transcription and translation, such as ribosomal proteins, ARC1, and TIF3. Taken together, the data demonstrated the applicability of GPS for quantitative proteomics in a reproducible and sensitive manner targeting biologically relevant proteins.
In this study, we have demonstrated the quantitative capability, reproducibility, and sensitivity of the GPS approach for proteomic profiling (discovery). In about 20 min, the crude proteome was reduced to a suitably large pool of motif-containing peptides, which was then subjected to LC-MS/MS. In comparison, reproducible detection and quantification of (low-abundant) proteins constitute a major bottleneck in global shotgun profiling efforts using mass spectrometry (33–35). A common solution to this matter has been extensive conventional prefractionation prior to LC-MS/MS, but this approach, which might also introduce other issues (e.g. logistics, yield, and reproducibility), is not practical when large sample cohorts are to be addressed. In addition, the instrument time is frequently suboptimally used because of repetitive sampling of peptides from high-abundant proteins. In the case of GPS, the latter two issues could, if required, be bypassed by (1) optimizing the capture step by designing the motifs so that few or no high-abundant motif-carrying peptides are targeted, and (2) optimizing the assay step by multiplexing, i.e. using ≥2 CIMS antibodies at the same time (18, 22). In order to increase the throughput, MALDI mass spectrometry could also be adopted in the detection step, depending on the complexity of the sample (21, 22).
The GPS platform was shown to display a dynamic range of at least three orders of magnitude, capable of targeting high- (>1 million copies/cell) as well as low-abundant (<50 copies/cell) proteins (Fig. 2), thereby extending our initial findings (22). Additional experiments will be required in order to determine, in particular, the quantitative dynamic range. Although the abundance levels, adopted from Ghaemmaghami et al. 2003 (32), were based on a different strain of S. cerevisiae, and the proteome was harvested slightly earlier (OD600 0.5 versus 0.7), a comparison of abundance levels was still relevant because the cells were harvested in log phase in both cases. Moreover, the sensitivity (and throughput) could be further optimized by combining GPS with predefined isotopically labeled peptides present in the elution buffer and the multiple reaction monitoring (MRM) technique, paving the way for high-throughput proteomic discovery and validation efforts (19, 34–37).
The GPS technology was found to exhibit a peptide (protein) coverage overlapping and complementary to existing proteomic technologies (32), validating the findings reported in our initial study (22). For example, in the spike-in yeast proteome experiments, 239 peptides, of which 53 (27%) were not previously reported in PeptideAtlas (38), were identified based on only two CIMS antibodies and a few LC-MS/MS runs (Fig. 1). The observation that GPS was found to target peptides of overlapping but different (shorter) average length than SCX could be explained by (1) the nature of the motif-containing peptides, (2) the affinity of the antibodies (assuming that shorter peptides bind more strongly), and/or (3) the nature of the SCX peptide population in the mass spectrometry detection step. Although the GPS approach identified about half the number of proteins as SCX (Fig. 3), it was difficult to draw any direct conclusions, because these numbers depend heavily on the frequency of the CIMS motifs, the affinity of the antibodies (normally in the sub-μM range (22)), and the number of runs. In this case, the SCX set-up was assigned more LC-MS/MS time, and it is well known that the more MS/MS spectra that are acquired, the deeper one may probe the targeted proteome (33, 39).
On average, each CIMS antibody was found to target about 75 different motif-containing yeast proteins. In this context, it should be noted that the CIMS antibodies were originally selected against motifs designed for targeting the human proteome (22), clearly highlighting the cross-species applicability. Although only six of these non-yeast optimized CIMS antibodies were used along with minimal LC-MS/MS time, a significant section of the central carbon metabolism pathway in yeast was still covered using GPS (Figs. 6C and 6D). Notably, more than 60 proteins were uniquely identified with the GPS approach, again showing the complementary coverage provided compared with existing proteomic technologies, here illustrated by SCX. The coverage could be further extended, and directed against other pathways and/or certain protein families, by generating additional CIMS binders against carefully designed CIMS motifs. In this context, our recombinant scFv antibody library not only represents a vast, renewable probe source, but could also provide unique opportunities in generating antibodies against peptides displaying low or even no immunogenicity (24). In addition, the proteome coverage could potentially also be extended by changing the specificity of the digesting enzyme employed and generating a new set of C-terminal motif-specific CIMS antibodies. This would allow for a different set of peptides to be targeted, and it is well known that different portions of the proteome are more or less suited to be viewed by mass spectrometry (40). Some peptides were detected and enriched by more than one CIMS antibody, but the cross talk was low. Some cross-binding is to be anticipated, considering the nature of the experimentally determined binding motif, e.g. the limited number of key residues (often only two or three), as well as the nature and position of these residues.
In addition, some of the overlapping peptides were identified as potential nonspecific background-binding peptides.
The quantitative capability and reproducibility were demonstrated by addressing samples of different complexity (dynamic range). First, the quantitative accuracy was outlined by successfully determining the ratios (≤fourfold changes) of pre-mixed non-SILAC and SILAC-labeled glucose-grown yeast proteomes (Figs. 1B and 1C). The observed groups of log2 ratios were relatively skewed as compared with the expected ratios using non-normalized data, whereas the anticipated ratios were observed when the data were normalized toward the 50/50 ratio. Although no clear trend was observed, the data indicated that the accuracy of the ratio decreased with decreasing signal intensities. We chose to display the raw data intensities from MaxQuant in order to better reflect the true performance of GPS. However, in real comparative experiments, normalized data should be applied. Next, the technical reproducibility, i.e. including both capture and LC-MS/MS, was found to be high (R²-values ≥ 0.99), whether proteomes with low (≤fourfold) or high (≥fourfold) changes were profiled, outlining the applicability of the GPS set-up for quantitative proteomics.
Furthermore, the determined expression ratios conformed very well with those determined for the same samples using SCX on the peptide level (Fig. 4). Notably, this was also the case on the protein level, although the SCX data in several cases were based on multiple peptides per protein, whereas GPS mainly relied on a single peptide per protein, supporting the underlying concept of the GPS approach. In these comparisons, we used normalized data because the isotopically labeled conditions were consistently underrepresented in both GPS and SCX, although somewhat higher R²-values were obtained for non-normalized data in both cases.
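The normalization toward the 50/50 ratio discussed above can be illustrated with a minimal sketch. It assumes that normalization is done by subtracting the median log2 ratio of a nominal 1:1 mix from every log2 ratio; this is one common approach and not necessarily the procedure applied in this study. All intensities and function names below are hypothetical.

```python
import math
from statistics import median

def log2_ratios(heavy, light):
    """Per-peptide log2(heavy/light) from paired intensities."""
    return [math.log2(h / l) for h, l in zip(heavy, light)]

def normalize_to_reference(ratios, reference_ratios):
    """Shift log2 ratios so that the reference (nominal 1:1) mix has median 0.

    Assumption: a single global offset corrects the under-representation
    of the labeled channel.
    """
    offset = median(reference_ratios)
    return [r - offset for r in ratios]

# Hypothetical intensities in which the heavy (labeled) channel is
# under-represented by a constant factor of 0.8.
ref_light = [100.0, 200.0, 400.0]
ref_heavy = [80.0, 160.0, 320.0]     # nominal 1:1 mix, observed 0.8:1
mix_light = [100.0, 200.0, 400.0]
mix_heavy = [320.0, 640.0, 1280.0]   # nominal 4:1 mix, observed 3.2:1

ref = log2_ratios(ref_heavy, ref_light)
raw = log2_ratios(mix_heavy, mix_light)
norm = normalize_to_reference(raw, ref)

print([round(r, 3) for r in raw])    # skewed relative to log2(4) = 2
print([round(r, 3) for r in norm])   # centered on the expected value
```

In this toy example the raw log2 ratios fall below the expected value of 2, and the offset taken from the 1:1 mix restores it, mirroring the skew described for the non-normalized data.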
The poor reproducibility in terms of overlap of MS/MS-identified peptides between two technical LC-MS/MS runs of complex peptide mixtures is well known and is due to the stochastic nature of data-dependent sampling (39). The GPS-generated fractions consistently outperformed the SCX fractions (Fig. 5) in terms of MS/MS-identified peptides in separate LC-MS/MS runs. In more detail, the median identification reproducibility was 68% for GPS, but only 43% for SCX. Perhaps even more importantly, when comparing the overlap between biological replicate samples, the GPS methodology was found to display a median MS/MS identification reproducibility of 62% compared with only 31% for SCX. Notably, the SCX data (Fig. 5B) were based on the same fraction (eluate) injected twice for LC-MS/MS analysis (technical variations in the LC-MS/MS runs), whereas the GPS data (Fig. 5A) were always based on two separate captures and LC-MS/MS runs (i.e. technical variations for the entire assay). Notably, when discussing the reproducibility, the peptide capture step was always included when appropriate, indicating that the variation introduced by the capture step had to be small. Hence, the data outlined the reproducibility of the GPS methodology and a key advantage for consistent and reproducible measurement (profiling) of the same set of analytes (peptides) in various biological samples. In a recent protein expression study addressing technical and biological variation when applying iTRAQ and S. cerevisiae as model system, the highest variation was clearly attributed to the biological variation (41). In other words, when adopting the GPS approach, running one capture per binder and biological sample might frequently be enough, thereby releasing valuable MS instrument time that could instead be used to run more biological replicate samples. In order to keep the GPS assay time at a minimum, multiplexing, i.e. applying several CIMS antibodies at the same time, will be essential.
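As an illustration of how an identification reproducibility figure of this kind can be computed, the sketch below takes the peptides identified in two runs and reports the shared fraction, then the median over several run pairs. The overlap definition used here (shared identifications divided by the union of both runs) is an assumption, since the exact formula is not restated in this section, and the peptide labels are hypothetical.

```python
from statistics import median

def identification_overlap(run_a, run_b):
    """Percentage of peptides identified in both runs, relative to the
    union of identifications (assumed definition)."""
    a, b = set(run_a), set(run_b)
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)

# Hypothetical peptide identifications for three pairs of runs.
run_pairs = [
    (["PEP_A", "PEP_B", "PEP_C"], ["PEP_A", "PEP_B", "PEP_D"]),  # 2 of 4 shared
    (["PEP_A", "PEP_B"], ["PEP_A", "PEP_B"]),                    # 2 of 2 shared
    (["PEP_A"], ["PEP_A", "PEP_B", "PEP_C", "PEP_D"]),           # 1 of 4 shared
]

overlaps = [identification_overlap(a, b) for a, b in run_pairs]
median_overlap = median(overlaps)
print(overlaps, median_overlap)
```

Other definitions, for example normalizing by the smaller run instead of the union, would give higher percentages, so the definition should be stated whenever such values are compared across methods.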
Significant efforts have been made to map the proteome of yeast using various MS-based approaches (32, 34, 42). The present approach detected several differentially expressed proteins of high biological relevance in glucose- versus ethanol-cultivated yeast proteomes, representing the first GPS-based application (Fig. 6). In agreement with earlier findings using conventional proteomic methodologies (43, 44), many of these proteins were pinpointed to the TCA cycle and glyoxylate cycle, and displayed a significant induction upon growth in the presence of ethanol as carbon source. For example, fructose-1,6-bisphosphatase I (FBP1), a key regulatory enzyme in the gluconeogenesis pathway (45, 46), was identified as being massively up-regulated. Furthermore, the expression trends for additional enzymes, such as MLS1, CIT1, ENO2, and SDH2, correlated well with a recent MRM study (47). In fact, only two key enzymatic players, namely glucose-repressible alcohol dehydrogenase II (ADH2) and phosphoenolpyruvate carboxykinase 1 (PCK1), were missed by the GPS set-up, albeit using only six non-yeast optimized CIMS antibodies. It might also be of interest to note that two stress-related proteins, HSP12 and HSP26, were highly up-regulated in the ethanol condition, a feature commonly observed for yeast cultivated under stressed conditions (48, 49). Furthermore, in agreement with other studies (50, 51), a set of several down-regulated ribosome-related proteins and related transcription and translation initiation factors (SFP1 and TIF3), as well as RNA-binding proteins (NOP6 and ARC1), were identified with the GPS set-up. Of note, several uncharacterized proteins (YNR034W-A, YFR017C, YBL081W, YLR413W) were also identified as significantly up- or down-regulated in both biological replicate sets using GPS. Further experiments will be required in order to delineate their functional role, and their use in e.g. the development of ethanol-tolerant yeast strains.
The GPS methodology thus currently displays a promising suitability for addressing proteomes, such as mammalian tissue extracts and eukaryotes (e.g. yeast proteomes), in a discovery mode for identification and quantification of both high- and low-abundant analytes in a reproducible, cross-species manner using a limited set of CIMS binders. In other words, GPS provides a technology and an opportunity not matched by using either affinity proteomics or mass spectrometry (interfaced with classical fractionation methods) separately. The approach is currently limited by i) the motif design, ii) antibody affinity versus proteome target, and iii) performance of the MS/MS set-up. First, the motif design and subsequent antibody selection steps are still a key bottleneck if certain proteins or protein groups are to be specifically targeted or avoided (e.g. high-abundant species). Second, the affinity of the CIMS antibodies (currently in the μM range) versus the target proteome might represent a limitation, because captured peptides could be lost, i.e. washed off, when more complex proteomes are targeted, such as human serum, which require even more stringent washing conditions in order to reduce nonspecific background binding. Third, as for any MS-based approach, the performance of the selected MS instrumentation will be critical for the GPS set-up.
In conclusion, we have studied the applicability of the GPS platform (22), and demonstrated its quantitative capability, reproducibility, sensitivity, and coverage, outlining its potential within proteomic discovery profiling efforts. The GPS technology provides a novel methodological edge when addressing complex samples, and could be suitable for expression profiling studies, ranging from large-scale unbiased discovery studies to focused MRM-type assays, paving the way for the next generation of affinity proteomic efforts.
We thank Dr. Karin Hansson for assistance with the LTQ-Orbitrap instrument, Dr. Fredrik Levander, supported by BioInformatics Infrastructure for Life Science (BILS), for all assistance with the Mascot server and Prof. Ole N. Jensen for kindly donating the yeast strain.
* This work was supported by grants from Swedish National Research Council (VR-NT), the Foundation for Strategic Research (SSF) (Strategic Centre for Translational Cancer Research - CREATE Health), and Vinnova.
This article contains supplemental Figs S1 to S6 and Tables S1 to S6.
1 The abbreviations used are: