In this study, we have demonstrated the quantitative capability, reproducibility, and sensitivity of the GPS approach for proteomic profiling (discovery). In about 20 min, the crude proteome was reduced to a suitably large pool of motif-containing peptides, which was then subjected to LC-MS/MS. By contrast, reproducible detection and quantification of (low-abundant) proteins constitutes a major bottleneck in global shotgun profiling efforts using mass spectrometry (33–35). A common solution has been extensive conventional prefractionation prior to LC-MS/MS, but this approach, which might also introduce other issues (e.g. logistics, yield, and reproducibility), is not practical when large sample cohorts are to be addressed. In addition, the instrument time is frequently used suboptimally because of repetitive sampling of peptides from high-abundant proteins. In the case of GPS, the latter two issues could, if required, be bypassed by (1) optimizing the capture step by designing the motifs so that few or no high-abundant motif-carrying peptides are targeted, and (2) optimizing the assay step by multiplexing, i.e. using ≥2 CIMS antibodies at the same time (18, 22). To increase the throughput, MALDI mass spectrometry could also be adopted in the detection step, depending on the complexity of the sample (21, 22).
The GPS was shown to display a dynamic range of at least three orders of magnitude, capable of targeting high- (>1 million copies/cell) as well as low-abundant (<50 copies/cell) proteins (), thereby extending our initial findings (22). Additional experiments will be required to determine the quantitative dynamic range in particular. Although the abundance levels, adopted from Ghaemmaghami et al. 2003 (32), were based on a different strain of S. cerevisiae, and the proteome was harvested slightly earlier (OD600 0.5 versus 0.7), a comparison of abundance levels was still relevant because the cells were harvested in log-phase in both cases. Moreover, the sensitivity (and throughput) could be further optimized by combining GPS with predefined isotope-labeled peptides present in the elution buffer and the multiple reaction monitoring (MRM) technique, paving the way for high-throughput proteomic discovery and validation efforts (19, 34–37).
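As a rough check of the abundance span quoted above, the two copy-number thresholds can be converted into orders of magnitude. The sketch below is illustrative only: the function name is hypothetical, the inputs are the threshold values from the text rather than measured abundances, and the result describes the span of detected abundances, not the quantitative dynamic range, which remains to be determined.

```python
import math

def orders_of_magnitude(high_copies: float, low_copies: float) -> float:
    """Return log10(high/low), i.e. the abundance span in orders of magnitude."""
    if high_copies <= 0 or low_copies <= 0:
        raise ValueError("copy numbers must be positive")
    return math.log10(high_copies / low_copies)

# Thresholds quoted in the text: >1 million and <50 copies/cell.
span = orders_of_magnitude(1_000_000, 50)
print(f"{span:.1f} orders of magnitude")  # prints "4.3 orders of magnitude"
```

Because both values are thresholds (> and <), the computed span is a lower bound, which is consistent with the stated "at least three orders of magnitude".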
The GPS technology was found to exhibit a peptide (protein) coverage overlapping with and complementary to existing proteomic technologies (32), validating the findings reported in our initial study (22). For example, in the spike-in yeast proteome experiments, 239 peptides, of which 53 (27%) were not previously reported in PeptideAtlas (38), were identified based on only two CIMS antibodies and a few LC-MS/MS runs (). The observation that GPS was found to target peptides of overlapping but different (shorter) average length than SCX could be explained by (1) the nature of the motif-containing peptides, (2) the affinity of the antibodies (assuming that shorter peptides bind more strongly), and/or (3) the nature of the SCX peptide population in the mass spectrometry detection step. Although the GPS approach identified about half as many proteins as SCX (), it was difficult to draw any direct conclusions, because these numbers depend heavily on the frequency of the CIMS motifs, the affinity of the antibodies (normally in the sub-μM range (22)), and the number of runs. In this case, the SCX set-up was assigned more LC-MS/MS time, and it is well known that the more MS/MS spectra that are acquired, the deeper one may probe the targeted proteome (33, 39).
On average, each CIMS antibody was found to target about 75 different motif-containing yeast proteins. In this context, it should be noted that the CIMS antibodies were originally selected against motifs designed for targeting the human proteome (22), clearly highlighting the cross-species applicability. Although only six of these non-yeast-optimized CIMS antibodies were used along with minimal LC-MS/MS time, a significant section of the central carbon metabolism pathway in yeast was still covered using GPS (C and D). Notably, more than 60 proteins were uniquely identified with the GPS approach, again showing the complementary coverage provided compared with existing proteomic technologies, here illustrated by SCX. The coverage could be further extended, and directed against other pathways and/or certain protein families, by generating additional CIMS binders against carefully designed CIMS motifs. In this context, our recombinant scFv antibody library not only represents a vast, renewable probe source, but could also provide unique opportunities for generating antibodies against peptides displaying low or even no immunogenicity (24). In addition, the proteome coverage could potentially also be extended by changing the specificity of the digesting enzyme employed and generating a new set of C-terminal motif-specific CIMS antibodies. This would allow a different set of peptides to be targeted, and it is well known that different portions of the proteome are more or less suited to analysis by mass spectrometry (40). Some peptides were detected and enriched by more than one CIMS antibody, but the cross talk was low. Some cross-binding is to be expected, considering the nature of the experimentally determined binding motifs, e.g. the limited number of key residues (often only two or three), as well as the nature and position of these residues. In addition, some of the overlapping peptides were identified as potential nonspecific background-binding peptides.
The quantitative capability and reproducibility were demonstrated by addressing samples of different complexity (dynamic range). First, the quantitative accuracy was outlined by successfully determining the ratios (≤fourfold changes) of pre-mixed non-SILAC and SILAC-labeled glucose-grown yeast proteomes (B and C). With non-normalized data, the observed groups of log2 ratios were somewhat skewed relative to the expected ratios, whereas the anticipated ratios were observed when the data were normalized toward the 50/50 ratio. Although no clear trend was observed, the data indicated that the accuracy of the ratio decreased with decreasing signal intensities. We chose to display the raw data intensities from MaxQuant in order to better reflect the true performance of GPS. However, in real comparative experiments, normalized data should be applied. Next, the technical reproducibility, i.e. including both capture and LC-MS/MS, was found to be high (R2-values ≤ 0.99), whether proteomes with small (≤fourfold) or large (≫fourfold) changes were profiled, outlining the applicability of the GPS set-up for quantitative proteomics.
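The normalization toward the 50/50 ratio mentioned above can be illustrated with a minimal sketch. It assumes that normalization means subtracting the median log2 ratio observed for the 50/50 mix from every log2 ratio of a sample; this is one plausible reading and not necessarily the exact procedure used here. The function names and all intensity values are hypothetical.

```python
import math
from statistics import median

def log2_ratios(heavy, light):
    """Per-peptide log2(heavy/light) from paired intensity lists."""
    return [math.log2(h / l) for h, l in zip(heavy, light)]

def normalize_to_reference(sample_log2, reference_log2):
    """Subtract the median log2 ratio of the 50/50 reference mix (assumed scheme)."""
    offset = median(reference_log2)
    return [r - offset for r in sample_log2]

# Hypothetical intensities in which the heavy channel reads 20% too low.
ref_light = [100.0, 200.0, 400.0]
ref_heavy = [80.0, 160.0, 320.0]      # true ratio 1:1, observed 0.8:1
mix_light = [100.0, 200.0, 400.0]
mix_heavy = [320.0, 640.0, 1280.0]    # true ratio 4:1, observed 3.2:1

reference = log2_ratios(ref_heavy, ref_light)
raw = log2_ratios(mix_heavy, mix_light)
normalized = normalize_to_reference(raw, reference)

print([round(r, 2) for r in raw])         # skewed below the expected log2(4) = 2
print([round(r, 2) for r in normalized])  # centered on the expected value
```

In this toy case the raw log2 ratios sit near 1.68 rather than 2, and the median shift restores the expected fourfold ratio, mirroring the skew described for the non-normalized data.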
Furthermore, the determined expression ratios agreed very well with those determined for the same samples using SCX on the peptide level (). Notably, this was also the case on the protein level, although the SCX data in several cases were based on multiple peptides per protein, whereas GPS mainly relied on a single peptide per protein, supporting the underlying concept of the GPS approach. In these comparisons, we used normalized data because the isotopically labeled conditions were consistently underrepresented in both GPS and SCX, although somewhat higher R2-values were obtained for non-normalized data in both cases.
The poor reproducibility in terms of overlap of MS/MS-identified peptides between two technical LC-MS/MS runs of complex peptide mixtures is well known and is due to the stochastic nature of data-dependent sampling (39). The GPS-generated fractions consistently outperformed the SCX fractions () in terms of MS/MS-identified peptides in separate LC-MS/MS runs. In more detail, the median identification reproducibility was 68% for GPS, but only 43% for SCX. Perhaps even more importantly, when comparing the overlap between biological replicate samples, the GPS methodology was found to display a median MS/MS identification reproducibility of 62% compared with only 31% for SCX. Notably, the SCX data (B) were based on the same fraction (eluate) injected twice for LC-MS/MS analysis (technical variations in the LC-MS/MS runs), whereas the GPS data (A) were always based on two separate captures and LC-MS/MS runs (i.e. technical variations for the entire assay). Of note, when discussing the reproducibility, the peptide capture step was always included when appropriate, indicating that the variation introduced by the capture step had to be small. Hence, the data outlined the reproducibility of the GPS methodology, a key advantage for consistent and reproducible measurement (profiling) of the same set of analytes (peptides) in various biological samples. In a recent protein expression study addressing technical and biological variation when applying iTRAQ and S. cerevisiae as model system, the highest variation was clearly attributed to the biological variation (41). In other words, when adopting the GPS approach, running one capture per binder and biological sample might frequently be enough, thereby releasing valuable MS instrument time that could instead be used to run more biological replicate samples. In order to keep the GPS assay time at a minimum, multiplexing, i.e. applying several CIMS antibodies at the same time, will be essential.
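The identification reproducibility discussed above can be expressed as the overlap between the peptide sets identified in two runs. The sketch below assumes an intersection-over-union definition, which is not stated in the text; other denominators (e.g. the smaller or the mean set size) would give different percentages. The function names and peptide sequences are hypothetical.

```python
from statistics import median

def identification_overlap(run_a, run_b):
    """Percentage of peptides shared by two runs, relative to their union (assumed definition)."""
    a, b = set(run_a), set(run_b)
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)

def median_overlap(run_pairs):
    """Median pairwise overlap across several (run_a, run_b) pairs."""
    return median([identification_overlap(a, b) for a, b in run_pairs])

# Hypothetical peptide identifications from two replicate captures.
capture_1 = ["AAGLK", "DDVTR", "EELFR", "GGSYK"]
capture_2 = ["DDVTR", "EELFR", "GGSYK", "LLNQR"]

print(identification_overlap(capture_1, capture_2))  # 60.0 (3 shared of 5 in total)
```

Applied per binder and per replicate pair, `median_overlap` would yield a summary value comparable in kind to the medians reported for GPS and SCX, provided the same overlap definition is used for both.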
Significant efforts have been made to map the proteome of yeast using various MS-based approaches (32, 34, 42). The present approach detected several differentially expressed proteins of high biological relevance in glucose versus ethanol cultivated yeast proteomes, representing the first GPS-based application (). In agreement with earlier findings using conventional proteomic methodologies (43, 44), many of these proteins were pinpointed to the TCA cycle and glyoxylate cycle, and displayed a significant induction upon growth in the presence of ethanol as carbon source. For example, fructose-1,6-bisphosphatase I (FBP1), a key regulatory enzyme in the gluconeogenesis pathway (45, 46), was identified as being massively up-regulated. Furthermore, the expression trends for additional enzymes, such as MLS1, CIT1, ENO2, and SDH2, correlated well with a recent MRM study (47). In fact, only two key enzymatic players, glucose-repressible alcohol dehydrogenase II (ADH2) and phosphoenolpyruvate carboxykinase 1 (PCK1), were missed by the GPS set-up, albeit using only six non-yeast-optimized CIMS antibodies. It might also be of interest to note that two stress-related proteins, HSP12 and HSP26, were highly up-regulated in the ethanol condition, a feature commonly observed for yeast cultivated under stressed conditions (48, 49). Furthermore, in agreement with other studies (50, 51), a set of several down-regulated ribosome-related proteins and related transcription and translation initiation factors (SFP1 and TIF3), and RNA-binding proteins (NOP6 and ARC1), was identified with the GPS set-up. Of note, several uncharacterized proteins (YNR034W-A, YFR017C, YBL081W, YLR413W) were also identified to be significantly up- or down-regulated in both biological replicate sets using GPS. Further experiments will be required in order to delineate their functional role, and their use in e.g. the development of ethanol-tolerant yeast strains.
The GPS methodology thus currently displays a promising suitability for addressing proteomes, such as mammalian tissue extracts and eukaryotic proteomes (e.g. yeast), in a discovery mode for identification and quantification of both high- and low-abundant analytes in a reproducible, cross-species manner using a limited set of CIMS binders. In other words, GPS provides a technology and an opportunity not matched by using either affinity proteomics or mass spectrometry (interfaced with classical fractionation methods) separately. The approach is currently limited by (i) the motif design, (ii) antibody affinity versus proteome target, and (iii) performance of the MS/MS set-up. First, the motif design and subsequent antibody selection steps are still a key bottleneck if certain proteins or protein groups are to be specifically targeted or avoided (e.g. high-abundant species). Second, the affinity of the CIMS antibodies (currently in the μM range) versus the target proteome might represent a limitation, because captured peptides could be lost, i.e. washed off, when more complex proteomes are targeted, such as human serum, which require even more stringent washing conditions in order to reduce nonspecific background binding. Third, as for any MS-based approach, the performance of the selected MS instrumentation will be critical for the GPS set-up.
In conclusion, we have studied the applicability of the GPS platform (22), and demonstrated its quantitative capability, reproducibility, sensitivity, and coverage, outlining its potential within proteomic discovery profiling efforts. The GPS technology provides a novel methodological edge when addressing complex samples, and could be suitable for expression profiling studies, ranging from large-scale unbiased discovery studies to focused MRM-type assays, paving the way for the next generation of affinity proteomic efforts.