The addition of off-line high-performance liquid chromatography to matrix-assisted laser desorption/ionization mass spectrometry greatly reduces congestion in the mass spectra, and also provides complete decoupling of the separation process from mass detection and measurement. This removes the time constraints inherent in on-line coupling, and so enables the detailed mass-spectrometric study of samples at later times. We describe here our use of this method to successfully characterize two “unknown” protein mixtures that were set as problems by the ABRF Proteomics Research Group (PRG) in the years 2003 and 2004.
The advent of matrix-assisted laser desorption/ionization (MALDI)1,2 and electrospray ionization (ESI)3 has made mass spectrometry (MS) the method of choice for protein and peptide analysis. Characterizing individual proteins by MS is often considered to be a straightforward exercise because of the high mass accuracy and sensitivity of modern mass spectrometers, but the usual complexity of biological samples still presents a problem. Thus, it is necessary to call on various techniques to separate the protein components in such samples, and then examine them one by one in the mass spectrometer. A standard procedure calls for separation of a protein mixture into its components by one- or two-dimension gel electrophoresis, followed by excision of the individual bands from the gel, proteolytic digestion of each band, measurement of the resulting peptides by mass spectrometry, and search of the databases to provide protein identification. This procedure is very labor-intensive and fails to detect some classes of protein, so more recently one- or two-dimension liquid chromatography (LC) has become increasingly popular as a replacement for the gel separation. Up to now, both methods have been directed mainly towards protein identification. However, biological studies increasingly require more detailed characterization of the protein, including the definition of post-translational modifications. The need for preliminary separation is even more acute in this case, and again LC is becoming the favored method for the initial separation.
For ESI and similar methods, the LC is coupled directly to the ion source, and the fractions are examined sequentially as they elute from the LC. For desorption methods, on the other hand, the most natural form of coupling is off-line;4 MALDI is a good example.5 This has the advantage that sampling and spectrum acquisition can be separated in time and optimized separately, and has the especially important feature that there is no time constraint on the MS measurement. A given LC fraction can be examined briefly, and a more detailed study (e.g., an MS/MS measurement) can be carried out later, taking as much time as the operator wishes.
For such off-line deposition, we constructed a small robot that operates at room temperature and pressure.5 Chromatographic peaks after typical gradient micro-HPLC separation of peptides have an average width at half-height of 20–30 sec, so a convenient duration (30–60 sec) of fraction deposition can be used. This device has been used in our laboratory for more than three years as an interface between nano/micro HPLC systems and the Manitoba/Sciex prototype MALDI QqTOF instrument.6 The combination has provided a useful platform for solution of several complex tasks, including de novo sequencing of SARS viral proteins7 and assignment of disulfide bonding patterns in complex cysteine-rich proteins such as integrins.8 In particular, we have successfully analyzed both 2003 and 2004 “unknown” samples circulated as problems by the ABRF Proteomics Research Group (submission numbers 65212 and 65213, respectively) by HPLC-MALDI QqTOF MS. Since these exercises are designed to simulate realistic conditions, they provide useful tests for any MS method. This paper describes our results on the ABRF samples, including arguments for the assignments submitted.
The lyophilized ABRF PRG03 sample was stated to contain a tryptic digest of 5 pmol “unknown” protein with two of the tryptic peptides phosphorylated at substoichiometric levels. It was said to also contain a tryptic digest of much less than 5 pmol of a nonphosphorylated contaminant.
This sample was resuspended in 50 μL 0.2% formic acid. One-half microliter of this solution (1%) was mixed 1:1 with 2,5-dihydroxybenzoic acid (DHB) matrix (150 mg/mL in 1:1 water:acetonitrile) and deposited on a MALDI target for direct MS analysis. For HPLC-MALDI characterization, one-tenth of the sample (0.5 pmol in 5.0 μL) was used for input into each of the two systems used (see below)—i.e., a total of 21% of the sample supplied was used for the analysis (~1 pmol).
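The aliquot bookkeeping above can be checked with a short sketch. It uses only the amounts and volumes quoted in the text; the variable and function names are illustrative.

```python
# Aliquot bookkeeping for the PRG03 sample, using only figures quoted in the text.
TOTAL_PMOL = 5.0   # stated amount of the major protein digest
TOTAL_UL = 50.0    # resuspension volume in 0.2% formic acid


def aliquot_pmol(volume_ul: float) -> float:
    """Amount of digest (pmol) contained in an aliquot of the given volume."""
    return TOTAL_PMOL * volume_ul / TOTAL_UL


direct_maldi = aliquot_pmol(0.5)       # spot used for direct MALDI MS
hplc_runs = [aliquot_pmol(5.0)] * 2    # one injection for each HPLC system

used_pmol = direct_maldi + sum(hplc_runs)
used_fraction = used_pmol / TOTAL_PMOL

print(f"direct MALDI: {direct_maldi * 1000:.0f} fmol")
print(f"total used:   {used_pmol:.2f} pmol ({used_fraction:.0%})")
```

The result reproduces the 50 fmol spotted for direct MALDI MS and the 21% (about 1 pmol) total consumption stated above.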
The lyophilized ABRF PRG04 sample was said to contain 3 pmol each of three closely related intact proteins, as well as possible contaminants.
This sample was dissolved in 100 mM ammonium bicarbonate. One-sixth of the sample (i.e., 3 × 0.5 pmol protein) was reduced (10 mM DTT, 30 min, 57°C), alkylated (50 mM iodoacetamide, 30 min in the dark at room temperature), dialyzed against 100 mM NH4HCO3 (6 h, 7 kDa MWCO, Pierce), and digested overnight with (sequencing grade) modified trypsin (Promega 1/50 enzyme/substrate weight ratio, 12 h, at 37°C). The amount of trypsin added was calculated for an assumed protein molecular weight of ~50 kDa. The digestion mixture was lyophilized and resuspended in 5.5 μL of 0.5% TFA. One-half microliter was used for direct MALDI analysis, and the rest (5.0 μL) was subjected to micro-LC fractionation followed by MALDI MS or MS/MS.
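As a rough illustration of how the trypsin amount follows from the stated assumptions (1/50 enzyme/substrate weight ratio, assumed molecular weight of about 50 kDa), the calculation can be sketched as below. The helper name is hypothetical, and the numbers are derived here rather than taken from the original protocol.

```python
def protein_mass_ng(amount_pmol: float, mw_kda: float) -> float:
    """Mass in ng; 1 pmol of a 1 kDa species weighs 1 ng."""
    return amount_pmol * mw_kda


protein_pmol = 3 * 0.5         # three proteins, 0.5 pmol each (one-sixth of the sample)
assumed_mw_kda = 50.0          # assumed molecular weight, from the text
enzyme_to_substrate = 1 / 50   # weight ratio, from the text

substrate_ng = protein_mass_ng(protein_pmol, assumed_mw_kda)
trypsin_ng = substrate_ng * enzyme_to_substrate

print(f"substrate: {substrate_ng:.0f} ng, trypsin: {trypsin_ng:.1f} ng")
```

Under these assumptions the digest contains about 75 ng of protein, which calls for about 1.5 ng of trypsin.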
We were able to identify three isoforms of carbonic anhydrase as components in the “unknown” sample (see below), but their intensities turned out to be much smaller than expected, so as a check we analyzed a control sample of bovine carbonic anhydrase (BCA) isoform I, prepared from commercially available protein (Sigma) and treated similarly to the ABRF sample. This control sample was first dissolved in 0.5% TFA, and solution calculated to contain 3 pmol of protein was lyophilized and resuspended in 100 mM ammonium bicarbonate. One-sixth of the latter solution was processed as described for the ABRF PRG04 sample.
Deionized (18 MΩ) water and HPLC-grade acetonitrile were used for the preparation of eluents. Column temperature was maintained at 30°C throughout all experiments. The chromatographic separations were performed using two different HPLC configurations:
The individual digests, as well as the chromatographic fractions, were analyzed both by single mass spectrometry (MS) with m/z range 200–7000 Da, and by tandem mass spectrometry (MS/MS) in the Manitoba/Sciex prototype MALDI quadrupole/TOF (QqTOF) mass spectrometer6 (Figure 1). In this instrument, orthogonal injection of ions from the quadrupole into the TOF section normally produces a mass resolving power of ~10,000 FWHM and accuracy within a few mDa in the TOF spectra, in both MS and MS/MS modes.
The ABRF-PRG03 sample was said to be a tryptic digest of a mixture of two proteins (5 pmol for the major, and much less than this for the minor component), with two of the tryptic peptides from the major protein phosphorylated at substoichiometric levels.
Our first step was a MALDI MS analysis. As our instrument has fmol sensitivity, 0.5 μL (50 fmol) of the ABRF PRG03 sample solution (1%) was sufficient for the measurement. The MALDI MS spectrum is shown in Figure 2, with expanded sections in Figures 3A and 4A. Values of m/z measured manually for the most intense peaks in this spectrum were submitted to the program ProFound,9 which identified bovine protein disulfide isomerase (BPDI, SwissProt accession number P05307) as the major component in the mixture. A similar analysis of the improved spectra after HPLC separation enabled the program to identify a more extensive range of BPDI peptides, as listed in Table 1; most of these (27) were confirmed by MS/MS measurements.
Examination of the Figure 2 spectrum and the HPLC-separated spectra uncovered an additional useful fact: the ABRF PRG03 sample had been overalkylated with iodoacetamide during sample preparation. Most of its peaks exhibited +57 Da adducts characteristic of incomplete removal of the alkylating agent prior to digestion,10 as listed in Table 1; Figure 2 shows an example. This property has good and bad features. The good feature is the increased discrimination the additional peaks lend to the identification (Table 1), once one realizes their cause. For example, we note that all phosphorylation candidates in Table 1 except two (1808 Da and the correct 2027 Da) lack observed +57 Da adducts, suggesting that either these are small peaks, or that their identifications may be mistaken, a result subsequently confirmed by MS/MS measurements.
The bad feature is the additional complication caused by the extra ions in the spectrum. In this case, the +57 Da adduct corresponding to the most prominent peptide in Figure 3A (BPDI 233–249) appeared at an especially unfortunate position (m/z 2022.084), just below m/z 2027.010, which ProFound identified as an ion consistent with phosphorylation of BPDI 257–273 (see below). Our resolution was good enough that we were able to recognize that these two were separate ions, but this would likely not have been possible in a lower-resolution measurement.
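The position of the adduct relative to the phosphopeptide candidate can be illustrated with a few lines of arithmetic. This is a minimal sketch: the two m/z values are those quoted above, and the carbamidomethyl mass is computed from standard monoisotopic element masses rather than taken from the paper.

```python
# Monoisotopic mass of the carbamidomethyl group (C2H3NO) added by iodoacetamide.
CARBAMIDOMETHYL = 2 * 12.0 + 3 * 1.007825 + 14.003074 + 15.994915  # about 57.0215 Da

adduct_mz = 2022.084      # +57 Da adduct of BPDI 233-249 (from the text)
candidate_mz = 2027.010   # phosphopeptide candidate (from the text)

parent_mz = adduct_mz - CARBAMIDOMETHYL   # implied m/z of the unmodified peptide
gap = candidate_mz - adduct_mz            # spacing between the monoisotopic peaks

print(f"implied parent m/z: {parent_mz:.3f}")
print(f"gap to candidate:   {gap:.3f} Da")
```

The implied parent falls at nominal m/z 1965, and the candidate lies only about 5 Da above the adduct, which is why the two ions crowd one another in the unseparated spectrum.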
Apart from BPDI-related fragments in the MALDI MS and the later HPLC-separated runs, we were able to identify a number of peptides from different human keratins in the sample. We did not find the “minor component” in the MALDI-MS run, but we identified 12 peptides from bovine serum albumin (BSA) in the HPLC-separated spectra (~19% sequence coverage). This turned out to be the actual “minor component,” but we did not report this identification to ABRF because a BSA tryptic digest had been routinely used for calibrating/testing purposes in both HPLC-MALDI MS systems, so we thought that the BSA peaks were simply carryover peptides. Of course, such a coincidence is an unlikely event, but it is an argument for using a new column in a problem of this kind!
ProFound discovered four cases where an observed m/z value for a peak in the MALDI MS spectrum corresponded to a possible phosphorylated peptide from BPDI (i.e., 79.966 Da + the calculated peptide mass for a BPDI peptide containing Ser, Thr, or Tyr), as well as 10 such possibilities in the nano-HPLC run (marked with a single asterisk (*) in Table 1). An estimate of the probability of phosphorylation for the various cases was obtained from the program NetPhos.11 In particular, this program predicted both Ser266 and Ser268 to be highly probable phosphorylation sites.
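The matching rule described here (calculated peptide mass + 79.966 Da within some tolerance) can be sketched as follows. This is not ProFound's algorithm; the function name, the 0.03 Da window, and the decoy peak are illustrative assumptions, while the remaining masses are values quoted elsewhere in the text.

```python
HPO3 = 79.966  # mass shift for phosphorylation, as quoted in the text (Da)


def phospho_candidates(calc_masses, observed_mz, tol=0.03):
    """Return (peptide, peak, error) for peaks matching calculated mass + HPO3.

    calc_masses: dict of peptide label -> calculated m/z of the unmodified ion
                 (only peptides containing Ser, Thr, or Tyr should be listed).
    observed_mz: iterable of measured peak positions.
    tol:         matching window in Da (illustrative value, not from the paper).
    """
    hits = []
    for label, mass in calc_masses.items():
        target = mass + HPO3
        for peak in observed_mz:
            error = peak - target
            if abs(error) <= tol:
                hits.append((label, peak, round(error, 3)))
    return hits


# 1947.066, 2022.084 and 2027.010 are quoted in the text; 1500.000 is a made-up decoy.
calc = {"BPDI 257-273": 1947.066}
peaks = [1500.000, 2022.084, 2027.010]
hits = phospho_candidates(calc, peaks)
print(hits)
```

A hit of this kind is only a candidate. A mass coincidence alone does not establish phosphorylation, so each hit still needs MS/MS confirmation.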
One strong candidate identified by ProFound from both the MALDI MS spectrum and the HPLC-separated run was BPDI 257–273, which contained both Ser266 and Ser268, as well as Thr257 and Tyr270, all possible phosphorylation sites. Careful examination of the MALDI MS spectrum (Figure 3A) revealed the m/z 2027.010 candidate ion mentioned above, together with another ion consistent with its loss of phosphoric acid, but both ions were afflicted by poor signal/noise ratios. As expected, HPLC separation gave substantial improvement: isolation of the significant peaks in a given chromatographic fraction, and an increase in the signal/noise ratio by an order of magnitude (Figure 3B). The presence in fraction #36 of both the phosphorylated BPDI 257–273 peptide (m/z 2027.032) and its decay product (m/z 1929.051), together with the removal of the prominent (BPDI 233–249, m/z 1965) ion and its adduct (to another fraction), was especially valuable. Thus the assignment of phosphorylation in BPDI 257–273 could be made with some confidence.
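The mass relationships behind this assignment can be verified directly. The sketch below uses standard monoisotopic element masses (not given in the paper) together with the two m/z values quoted for fraction #36.

```python
# Monoisotopic element masses (Da)
H, O, P = 1.007825, 15.994915, 30.973762

HPO3 = H + P + 3 * O        # about 79.966 Da, phosphorylation mass shift
H3PO4 = 3 * H + P + 4 * O   # about 97.977 Da, neutral loss of phosphoric acid

phospho_mz = 2027.032       # phosphorylated BPDI 257-273 in fraction #36 (text)
decay_mz = 1929.051         # observed decay product (text)

predicted_decay = phospho_mz - H3PO4
unmodified = phospho_mz - HPO3

print(f"predicted loss product:   {predicted_decay:.3f}")
print(f"difference from observed: {(decay_mz - predicted_decay) * 1000:.1f} mDa")
print(f"unphosphorylated peptide: {unmodified:.3f}")
```

The predicted loss product agrees with the observed decay ion to within a few mDa, and subtracting HPO3 instead gives the 1947.066 Da value used for the unphosphorylated peptide.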
Phosphorylation in this peptide was finally confirmed by an MS/MS measurement on the phosphorylated parent ion (Figure 3C). MS/MS gave another order of magnitude improvement in the signal/noise ratio, and yielded a number of phosphorylated daughters. Significantly, the observation of both the phosphorylated y8 ion and the unphosphorylated y7 ion defined Ser266 as the phosphorylation site in BPDI 257–273, ruling out the other three possibilities. Thus, we concluded that BPDI 257–273 was certainly phosphorylated at Ser266.
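The logic of this site assignment can be made explicit with a short sketch. It assumes a single phosphate on the peptide and uses only residue numbers, so no sequence information beyond that in the text is required; the function name is hypothetical.

```python
def localize_site(pep_start, pep_end, candidates, y_ions):
    """Narrow down a single phosphorylation site from y-ion evidence.

    y_ions maps the y-ion index n to True if the fragment carries the
    phosphate and False if it does not. The y_n ion spans the last n
    residues, i.e. residues (pep_end - n + 1) .. pep_end.
    """
    length = pep_end - pep_start + 1
    possible = set(candidates)
    for n, phosphorylated in y_ions.items():
        if not 1 <= n <= length:
            raise ValueError(f"y{n} is not possible for a {length}-residue peptide")
        covered = set(range(pep_end - n + 1, pep_end + 1))
        if phosphorylated:
            possible &= covered   # site must lie inside this fragment
        else:
            possible -= covered   # site cannot lie inside this fragment
    return sorted(possible)


sites = localize_site(
    pep_start=257,
    pep_end=273,
    candidates=[257, 266, 268, 270],   # Thr257, Ser266, Ser268, Tyr270
    y_ions={8: True, 7: False},        # phosphorylated y8, unphosphorylated y7
)
print(sites)
```

The y8 ion spans residues 266–273 and the y7 ion spans 267–273, so a phosphate present in y8 but absent from y7 leaves residue 266 as the only remaining candidate.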
Nevertheless, we were somewhat puzzled by the missed cleavage at Lys265 in the phosphorylated BPDI 257–273. This was presumably a result of the phosphorylation, since the corresponding unphosphorylated peptide (calculated mass 1947.066 Da = 2027.032 Da – HPO3) was not visible in Figure 3B, so it had apparently been cleaved completely. (Both phosphorylated and unphosphorylated ions are likely to elute from the RP column in the same fraction, because introduction of the hydrophilic –PO3H group does not significantly affect peptide hydrophobicity, but in any case we found no trace of the unphosphorylated BPDI 257–273 ion in other fractions.) However, the presence of acidic residues (Asp or Glu) as immediate neighbors has been reported as one of the most frequent causes of missed cleavages,12 and we hypothesized that the –PO3H on Ser266 was sufficiently acidic to inhibit cleavage at Lys265 (as argued also by the organizers of the test16). See later discussion of this point.
But where was the second phosphorylated peptide that was specified in the initial description of the problem? MS/MS measurements on the other candidate ions starred in Table 1 gave negative results; all were mistaken identifications. However, as remarked above, NetPhos11 had identified Ser268 as a highly probable phosphorylation site, and we guessed that it was far enough away from Lys265 so that the cleavage at that residue would not be inhibited. One product of such a cleavage (shared with the unphosphorylated version) was BPDI 257–265 at 1081.7 Da, which appeared in the MALDI-MS spectrum and also in the HPLC-separated spectra (Table 1). However, the more definitive phosphorylated fragment (BPDI 266SVSDYEGK273 at 964 Da) was obscured by the isotopic envelope of the BPDI 341–347 peptide (ITEFCHR, calculated m/z 962.452) in Figure 4A, as well as background ions. Moreover, to our surprise, neither BPDI 266–273 nor its phosphorylated counterpart appeared in any of the nano-HPLC fractions.
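The interference from the ITEFCHR isotopic envelope can be illustrated numerically. This is a rough sketch under stated assumptions: the isotope spacing is approximated by the 13C–12C mass difference, the resolving power is the ~10,000 FWHM figure quoted for the instrument, and the 964.365 value is the m/z reported in the text for the phosphorylated ion after micro-LC separation.

```python
ISOTOPE_SPACING = 1.00335   # approximate 13C-12C mass difference (Da); an assumption
RESOLVING_POWER = 10_000    # approximate FWHM resolving power quoted for the instrument

interferer_mz = 962.452     # BPDI 341-347 (ITEFCHR), calculated m/z from the text
target_mz = 964.365         # ion consistent with phosphorylated BPDI 266-273 (text)

# Find the isotopic peak of the interfering peptide closest to the target ion.
n = round((target_mz - interferer_mz) / ISOTOPE_SPACING)
isotope_mz = interferer_mz + n * ISOTOPE_SPACING
separation = abs(isotope_mz - target_mz)
peak_width = target_mz / RESOLVING_POWER   # FWHM of a peak at this m/z

print(f"nearest isotopic peak: M+{n} at m/z {isotope_mz:.3f}")
print(f"separation {separation:.3f} Da vs. peak width {peak_width:.3f} Da")
```

Under these assumptions the third isotopic peak of ITEFCHR lies less than one peak width from the phosphopeptide ion, consistent with the latter being obscured in the unseparated spectrum.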
At this point we realized that BPDI 266–273 is a small peptide of low hydrophobicity, so it is likely to pass through the trap column used during the sample injection. To avoid such a loss, we carried out a second LC separation, this time with a micro-LC instrument with direct sample injection, the second technique described above. After this separation, the peptide BPDI 266–273 (m/z 884.403) was found in an early fraction (#12, Figure 4B), along with an ion consistent with phosphorylation (m/z 964.365). Again, both peptides were found to elute from the RP column in the same fraction, consistent with the previous argument.
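The two observed m/z values can be compared with the masses expected for the SVSDYEGK sequence quoted in the text. The sketch below uses standard monoisotopic residue masses, which are not given in the paper, and a hypothetical helper `mh_plus`.

```python
# Monoisotopic residue masses (Da) for the amino acids needed here.
RESIDUE = {
    "S": 87.03203, "V": 99.06841, "D": 115.02694, "Y": 163.06333,
    "E": 129.04259, "G": 57.02146, "K": 128.09496,
}
WATER = 18.01056
PROTON = 1.00728
HPO3 = 79.96633


def mh_plus(sequence, n_phospho=0):
    """Monoisotopic m/z of the singly protonated peptide [M+H]+."""
    residues = sum(RESIDUE[aa] for aa in sequence)
    return residues + WATER + PROTON + n_phospho * HPO3


plain = mh_plus("SVSDYEGK")
phospho = mh_plus("SVSDYEGK", n_phospho=1)
print(f"unphosphorylated: {plain:.3f} (observed 884.403)")
print(f"phosphorylated:   {phospho:.3f} (observed 964.365)")
```

Both calculated values agree with the observed ions to within a few mDa, in line with the mass accuracy quoted for the instrument.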
The MS/MS spectra of both unphosphorylated and phosphorylated BPDI 266–273 are shown in Figures 4C and 4D. Figure 4C shows unphosphorylated y1–y4 ions from the m/z 964.365 parent, together with b4–b6 ions showing neutral loss of phosphoric acid, ruling out Tyr270 as a phosphorylation site, but confirming phosphorylation at either Ser266 or Ser268.
The ions observed in Figure 4D were consistent with either possibility. However, inhibition of tryptic cleavage at Lys265 to form the observed peptide (BPDI 266–273) would again be expected by the argument above if the phosphorylation were at Ser266, and the disappearance of the y5 and y6 ions between Figures 4C and 4D suggested significant changes in the molecule in the neighborhood of Ser268. These arguments were not completely definitive, but both favored phosphorylation at Ser268. We therefore reported certain identification of BPDI 266–273 as a second phosphorylated peptide, with phosphorylation most probable at Ser268. It is worth remarking that the doubly charged (BPDI 266–273) ion produced by ESI was found to produce a more definitive result by the organizers of the test16 (a complete set of y ions), so this is a case where ESI measurements would have been more beneficial than MALDI.
Table 2 compares the three different techniques that we used for peptide mapping and detection of post-translational modifications on BPDI. The sequence coverage and the number of matched peptides were found to be very similar for all three procedures, but the separation of both background and BPDI ions into ~40 fractions enabled many more peaks of low abundance to be measured after the HPLC separations, particularly in nano-HPLC. In this case, some ions corresponding to missed cleavages were detected, as well as the BSA ions mentioned above, but the ions observed were predominantly background ions.
The amount of sample used for the HPLC runs was a factor of 10 higher than the amount used for MALDI MS, but it seems unlikely that an increase in sample amount in the MALDI MS measurement would have been helpful, since the primary problem in both Figures 3A and 4A was interference from real ions present in the sample. The unit mass resolution reached in our measurements was clearly important, but it also seems unlikely that an increase in mass resolution (e.g., by the use of FTICR) beyond that level would have been much help.
The comparison of two different HPLC techniques with the same amount of sample injected showed superior sensitivity for the nano version, as expected, since the combination of a smaller fraction size and a smaller amount of MALDI matrix per fraction provided more concentrated samples for the final stage of analysis. This resulted in a larger number of low abundance peptides with one missed cleavage detected (Table 2). On the other hand, both the total number of matched peptides and the sequence coverage are higher for micro-HPLC MALDI MS, because of sample loss during injection through the trap column in the nano-HPLC system.
As clarified at the ABRF meeting13 and in subsequent reports,14–16 the phosphorylated content of the sample was actually supplied by the addition of two synthetic phosphopeptides (1 pmol of each), not from substoichiometric phosphorylation of the protein, as originally stated. Knowledge of this fact would have simplified the problem somewhat. In particular, it would have removed any concerns about missed cleavages in BPDI 257–273, since the spiked phosphopeptides were not actually subjected to tryptic digestion (although our arguments for resolving this difficulty were apparently anticipated by the organizers,16 as mentioned above). More significantly, knowledge of the real situation would have removed the possibility that the inhibition of cleavage at Lys265 in BPDI 257–273 was only partial, in which case the phosphorylated BPDI 266–273 could have been simply a tryptic fragment from the phosphorylated peptide first observed. In the real situation, the spiked peptides were not subject to tryptic digestion, so any phosphorylated peptide observed was necessarily the second example specified.
The ABRF-PRG04 sample was stated to be a lyophilized mixture of three closely related intact proteins from up to two different species, 3 pmol of each. Participants were asked to identify the three most abundant proteins and their common post-translational modification.
Protein identification by peptide mass fingerprinting was the obvious choice for initial protein identification in this mixture. One-sixth of the sample was used for each HPLC-MS run, because our previous experience indicated that 0.5 pmol peptide typically provides intense peaks and reproducible results for a mass fingerprint. However, preliminary investigation of a tryptic digest (about 50 fmol) by MALDI MS did not show any intense peaks in the spectrum. Likewise, subsequent micro-HPLC MALDI MS measurements failed to reveal any peaks corresponding to the expected level of peptide. We first guessed that the distribution of tryptic cleavage sites in the proteins might have led to the production of fragments that were either too small or too large for efficient MS measurement, so we carried out digests with other enzymes (Glu-C, chymotrypsin), which we thought might produce proteolytic fragments of more reasonable size. These attempts were equally unsuccessful, so we became reconciled to searching for proteins of much lower intensity than expected.
Fortunately we were still able to return to the original micro-HPLC MALDI MS tryptic digest for closer examination, a key advantage of the off-line separation. MS/MS analysis of low-intensity peaks in the chromatographic fractions showed fragments from keratin and from trypsin autolysis. More significantly, we identified peptides corresponding to the sequences of three different carbonic anhydrases: bovine carbonic anhydrase (BCA, gi115453, isoforms I and II) and human carbonic anhydrase (HCA, gi229731), although at low levels. It was possible, however, to obtain about 60% sequence coverage for all three proteins, and also to detect their common N-terminal acetylation. This result seemed to correspond to the description given for the PRG04 sample, and we were able to report that result. However, the amount of protein found was much less than expected; we estimated roughly that the peptides separated and detected amounted to a few fmol for one-sixth of the PRG04 sample instead of the expected 500 fmol.
This discrepancy encouraged us to analyze a control sample of 500 fmol commercial BCA (see Materials and Methods). Figure 5 shows the comparison of two corresponding chromatographic fractions, from tryptic digests of one-sixth of the PRG04 sample and of 500 fmol of the commercial BCA isoform I. The corresponding signals obtained for the same experimental conditions were found to differ by at least a factor of 100.
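The control comparison translates into a rough upper bound on the amount of protein actually present. The sketch below assumes that MALDI signal scales linearly with the amount loaded, which is a simplification and not a claim made in the text.

```python
expected_fmol = 3000 / 6    # one-sixth of the stated 3 pmol per protein
control_fmol = 500          # commercial BCA control loaded per run
signal_ratio = 100          # control/sample signal ratio ("at least a factor of 100")

# Assumes signal is proportional to the amount loaded (a simplification).
estimated_fmol = control_fmol / signal_ratio

print(f"expected:  {expected_fmol:.0f} fmol")
print(f"estimated: at most about {estimated_fmol:.0f} fmol")
```

Under that assumption the sample contained no more than about 5 fmol per run, consistent with the rough estimate of a few fmol given above.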
The above experiment provides strong evidence that our sample as analyzed contained much less protein than specified, perhaps because of mishandling during transport. The Proteomics Research Group does not mention any such problem in their report on PRG04,14 although they remarked that “several groups did not observe any proteins in the sample,” which might indicate that others had similar difficulties.
As mentioned earlier, it is commonly believed that the advent of MALDI and ESI has made the task of protein characterization relatively straightforward. Indeed, there are many analyses in the literature that apparently confirm this idea. Nevertheless, it should be emphasized that negative results are not often published. In addition, many of those that are published fall into two categories:
First, analyses of known compounds. These can be rather unrealistic exercises, since any analysis is orders of magnitude easier if the answer is known.
Second, analyses of unknown mixtures. These are indeed real problems, but usually there is no independent verification of the results obtained.
Thus we believe that ABRF performs an important service to the analytical community by circulating test samples. Participants do not know the answer when they carry out the analysis, so for them the problem is a real unknown. However, the answer is subsequently available, so it is possible to check the results obtained.
It is illuminating to look at the overall results of the two exercises described here. For PRG03, 106 samples were sent out, and 54 reports were returned, of which 96% correctly identified BPDI as the major component of the test sample. By contrast, only 10 laboratories identified either phosphorylation site, and only 3 identified both correctly. For PRG04, 106 samples were sent out, and 42 reports were returned; of these, 24 laboratories reported partial success, but only 8 laboratories correctly identified all three carbonic anhydrase isoforms, as well as their common acetylation. As remarked by the organizers, these results suggest that protein identification can now be carried out quite efficiently if sufficiently rigorous criteria are applied. However, even identification is sometimes unreliable, and complete characterization remains elusive. In particular, the definition of post-translational modifications or minor sequence variations still presents a serious problem,14,16 as of course does de novo sequencing.
Our own results confirm these findings. On the one hand, the proteins themselves were correctly identified, including the abnormally low amounts of the carbonic anhydrase proteins in PRG04. On the other hand, 9 out of 10 identifications of the phosphorylations in PRG03 were mistaken, even though the database searched was of moderate size (~1200 peptides), and four of the mistakes agreed with calculation within 11 mDa. This emphasizes the problems inherent in identifications carried out simply by mass fingerprinting. MS/MS measurements, while by no means perfect, are certainly much more reliable, particularly if high mass accuracy is obtained.
As remarked recently,17 such findings arouse considerable skepticism about the accuracy of many results in the literature, particularly large-scale proteomic investigations that use wide mass windows. Most such investigations rely almost entirely on assignments carried out by various search engines, and false-positive identifications are very likely, particularly in mass fingerprinting. Computer identification programs are useful, as shown in our analyses above. However, small or overlapping peaks can often be misinterpreted by automatic peak assignment software, leading to the loss of important information, as well as errors in the results. Thus, we believe that careful manual examination and interpretation of MS (or MS/MS) spectra still play a crucial role in solving problems of any subtlety.
Such arguments also emphasize the benefits of producing a “permanent” record by off-line deposition of the sample. This feature, together with the addition of HPLC separation to the MALDI-TOF instrument, provided consistent ability to detect virtually all the significant peptides in the samples, and to characterize the corresponding proteins.
We thank Vic Spicer and Jim McNabb for technical assistance. This work was supported by grants from the US National Institutes of Health (GM 59240), the Natural Sciences and Engineering Research Council of Canada, and the Canadian Institutes of Health Research.