Complete coverage of protein primary structure is demonstrated for 37 yeast protein forms between 6 and 30 kDa in an improved platform for Top Down mass spectrometry (MS). Tandem mass spectrometry (MS/MS) for protein identification with 100% sequence coverage is achieved in a highly automated fashion using 15- to 300-fold less sample than an initial report of a proteome fractionation approach that employed preparative gel electrophoresis with an acid-labile surfactant to facilitate reversed-phase separation in a second dimension. Using a quadrupole-enhanced Fourier transform ion cyclotron resonance mass spectrometer (FTICRMS) improves the dynamic range for protein detection by ~50-fold and for MS/MS by ~30-fold. The technology development illustrated here typifies an accelerating effort to detect whole proteins in a more general, higher-throughput fashion for improved biomarker identification and detection of diverse post-translational modifications. Capillary RPLC is used in both off-line and on-line modes, with one on-line LC/FTMS sample providing 25 observed protein forms from 11 to 22 kDa.
The “front end” problem in Mass Spectrometry (MS) has long challenged researchers to devise sample processing strategies that convert complexity and contamination into robust analyte presentation. For modern proteomics, the staggering complexity and dynamic range of protein expression create an immense front end and MS measurement challenge. In peptide analysis, on-line multidimensional separations1 now present thousands of peptides per hour to an MS instrument via Electrospray Ionization (ESI). The most common step before MS analysis by either ESI or MALDI is reversed-phase liquid chromatography (RPLC),2 which decontaminates the sample and reduces its complexity, with microcapillary LC/MS serving as an optimal way to introduce low-to-sub-femtomole amounts3 of complex peptide mixtures.
As the field of MS-based proteomics continues to mature, many efforts are underway to increase both the number of proteins identified and the sequence coverage of each individual protein, enabling better detection of mass discrepancies such as post-translational modifications (PTMs). Standard approaches typically provide 5–50% sequence coverage using MALDI-TOF MS,4 with greater coverage possible using either on-line nanobore LC–MS3 or off-line nanospray ESI–MS.5 Recently, Lubman and co-workers reported that nearly 100% sequence coverage can be obtained by combining MALDI- and ESI-based methods.6 Other recent strategies for the PTM measurement challenge involve targeted analysis of specific modifications (e.g., phosphorylation7–9) or maximizing the observation of small proteolysis products produced during enzymatic degradation.10,11 However, PTM detection and localization through 100% sequence coverage can be highly efficient when intact proteins are analyzed directly by high-resolution tandem MS (MS/MS) in a top-down methodology.12 Processing intact proteins presents a more difficult front end challenge for MS but also offers advantages for complete interrogation of the DNA-predicted primary structure.
To realize a more general and robust implementation of the Top Down strategy, good control over undigested proteome samples becomes critical. Two-dimensional separations have been developed that couple various chromatographic techniques, including ion exchange13–15 or size exclusion chromatography (SEC),16 with RPLC. Isoelectric focusing (IEF) followed by RPLC on nonporous silica (NPS) has been reported using ESI and time-of-flight (TOF) MS for protein detection.17,18 One-dimensional IEF19,20 or RPLC21 in capillaries has been coupled to ESI–Fourier transform (FT) MS for protein profiling, but MS/MS for direct protein identification has only been accomplished with standard proteins and has not yet been achieved in a high-throughput setting.22 Typical amounts of total protein range from 2 to 10 mg for solution-phase two-dimensional separations and from mid- to sub-microgram amounts for one-dimensional, on-line capillary LC/MS.
Proteome-wide fractionation strategies for Top Down are labor intensive in both protein separation and MS analysis. A recent platform using preparative gel electrophoresis (PAGE) and RPLC coupled with an ESI/Q-FTMS instrument has made the “top down” approach more systematic through off-line processing.23 An acid-labile surfactant (ALS)24,25 was used instead of SDS in the first dimension to provide strongly denaturing conditions for size-dependent proteome fractionation while avoiding tedious SDS removal prior to MS analysis. Whereas that platform initially required ~1 g of yeast cells, here we describe a smaller ALS-PAGE/RPLC platform that transfers samples more directly from RPLC to FTMS for more efficient sample handling and MS/MS data acquisition. These efforts to decrease sample utilization yield 15- or 300-fold improvements using a nanospray robot off-line or on-line capillary RPLC/FTMS, respectively, with improved sample and data processing augmenting an automated quadrupole-FT MS hybrid instrument.
An overview of the fractionation methods used in this study is shown in Figure 1. Both a 37 mm i.d. (model 491 Prep Cell) and a 7 mm i.d. (Mini Prep Cell) preparative gel were used as the first dimension of fractionation after cell lysis, with 10 mg and 1 mg of total protein loaded, respectively. Cells of Saccharomyces cerevisiae (strain S288C) were grown under aerobic conditions to stationary phase. The wet cell mass was resuspended in 10 mL of lysis buffer containing 0.05 M Tris-HCl, pH 7.2, 2% ALS, 2–5 mM EDTA, 5–10 mM TCEP, 2 μL DNase, and 2 protease inhibitor cocktail tablets. After lysis by French press (15 000 psi), cellular debris was removed by centrifugation. The supernatant was diluted with an equal volume of ALS–PAGE sample buffer (2% ALS, 0.06 M Tris-HCl, pH 6.8, 25% glycerol) and then boiled for 5 min. The sample was stored at −80 °C for later processing.
Cysteines were alkylated by adding a 50 μL aliquot of iodoacetamide stock solution (125 mM in water) to 300 μL of the reduced yeast cell extract and incubating the mixture in the dark for 1 h at room temperature.26 The alkylated yeast cell extract (300 μL) was loaded on a Mini Prep Cell (7 mm i.d.; Bio-Rad, Hercules, CA) following the instructions in the manual, with 0.1% ALS used instead of 0.1% SDS in the running buffer. The whole cell extract was thus fractionated according to protein size by continuous-elution gel electrophoresis. A 12% T resolving gel was used with a 4% T stacking gel, where T represents the acrylamide monomer content, and 50 fractions of 300 μL each were collected over 5 h after elution of the dye front. Fractions from the Mini Prep Cell were first precipitated with cold acetone to remove bulk ALS. After 2–3 fractions were combined, the samples were resuspended in 100 μL of 6 M guanidine at pH 2.0 (adjusted with TFA) to hydrolyze any remaining ALS, which degrades into dodecan-2-one and sodium 3-(2,3-dihydroxypropoxy)propanesulfonate with a half-life of 7.6 min at pH 1.9.25 The 28 mm i.d. gel (model 491 Prep Cell) used the same %T gels as above, with samples processed as previously reported.27
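Two numbers in this protocol follow from simple arithmetic: the iodoacetamide concentration in the alkylation mixture, and how much ALS survives a given time at pH 1.9. The sketch below assumes ideal volume additivity for the dilution and first-order hydrolysis kinetics for ALS based on the quoted 7.6 min half-life; the helper names are hypothetical.

```python
def final_concentration_mM(stock_mM: float, stock_uL: float, sample_uL: float) -> float:
    """Concentration after mixing stock into sample (C1*V1 = C2*V2, volumes assumed additive)."""
    return stock_mM * stock_uL / (stock_uL + sample_uL)


def fraction_remaining(t_min: float, half_life_min: float) -> float:
    """Fraction of intact ALS left after t_min, assuming first-order decay."""
    return 0.5 ** (t_min / half_life_min)


# 50 uL of 125 mM iodoacetamide added to 300 uL of reduced extract.
print(f"iodoacetamide in reaction: {final_concentration_mM(125.0, 50.0, 300.0):.1f} mM")

# ALS half-life of 7.6 min at pH 1.9.
for t in (7.6, 30.0, 60.0):
    print(f"ALS intact after {t:5.1f} min: {fraction_remaining(t, 7.6):.2%}")
```

Under these assumptions, about 0.4% of the surfactant would remain after 1 h, which is why a short acidic hold suffices before RPLC.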
Either a whole fraction from the Mini Prep Cell (7 mm i.d. gel) or half of a fraction from the larger Prep Cell (37 mm i.d. gel) was then injected onto a C4 capillary RPLC column (320 μm i.d. × 25 cm; Vydac, Hesperia, CA). An HP 1100 binary pump (Agilent) and a splitter (1:100 splitting ratio; LC Packings) were used to generate a flow rate of 5 μL/min. Standard solvents (H2O, CH3CN) and 0.5% formic acid were used. The column was washed at 10% CH3CN for 15 min before a 25 min linear gradient from 20% to 55% solvent B (CH3CN). RPLC fractions containing ~0.5–1.0 μg of total protein were collected every 2 min, giving a 10 μL sample volume. Solutions of the separated protein mixtures were analyzed by SDS-PAGE or by ESI-Q-FTMS via a nanospray robot.
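The flow and volume figures above are linked by simple arithmetic. This sketch assumes a “1:N” splitter sends 1/N of the pump flow to the column (conventions vary between vendors), and the function names are hypothetical. It also covers the 1:1000 split used for the on-line 75 μm columns.

```python
def pump_flow_ul_min(column_flow_ul_min: float, split_ratio: float) -> float:
    """Pump flow needed so that column_flow reaches the column.

    Assumption: a "1:N" splitter delivers 1/N of the pump flow to the column.
    """
    return column_flow_ul_min * split_ratio


def fraction_volume_ul(column_flow_ul_min: float, minutes_per_fraction: float) -> float:
    """Volume collected per fraction at constant column flow."""
    return column_flow_ul_min * minutes_per_fraction


def gradient_slope(start_pct_b: float, end_pct_b: float, minutes: float) -> float:
    """Linear gradient slope in %B per minute."""
    return (end_pct_b - start_pct_b) / minutes


# 320 um i.d. column: 5 uL/min via a 1:100 split, fractions every 2 min.
print(pump_flow_ul_min(5.0, 100), "uL/min delivered by the pump")
print(fraction_volume_ul(5.0, 2.0), "uL per fraction")
print(gradient_slope(20.0, 55.0, 25.0), "%B per min")

# 75 um i.d. column (on-line runs): 0.4-0.5 uL/min via a 1:1000 split.
print([pump_flow_ul_min(f, 1000) for f in (0.4, 0.5)], "uL/min delivered by the pump")
```

The 5 μL/min × 2 min product reproduces the 10 μL fraction volume quoted in the text.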
Capillary RPLC effluent (5–10 μL) was collected in a 96-well sample plate for the NanoMate 100 system (Advion BioSciences, Ithaca, NY). Intact protein spectra (10 or 25 scans) were acquired by ESI on a custom Q-FTMS28 with an 8.5 T actively shielded magnet (Bruker Daltonics, Billerica, MA). In general, ions generated by positive-mode ESI were directed through a heated metal capillary, a skimmer, and multiple ion guides into the ion cell (~10−9 Torr) of the FTMS. Transients were stored with a MIDAS data station29 as 512 K data sets. Theoretical isotopic distributions were generated using Isopro v3.0. Relative molecular weight (Mr) values are for the monoisotopic peak unless otherwise indicated by the mass difference (in units of 1.0024 Da30) between the monoisotopic peak and the most abundant isotopic peak, denoted in italics. Spectra were calibrated externally using bovine ubiquitin, 8564.64–5.
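The Mr notation can be decoded mechanically: the italic offset counts isotopic peaks of 1.0024 Da between the monoisotopic and most abundant peaks. The sketch below is a hypothetical parser; it assumes the offset is written after a dash (plain text cannot show the italics) and that a label without a dash is purely monoisotopic.

```python
# Isotope peak spacing used for Mr notation in this work (1.0024 Da).
ISOTOPE_SPACING_DA = 1.0024


def parse_mr_label(label: str) -> tuple[float, int]:
    """Split a label such as "8564.64-5" into (monoisotopic Mr, isotope offset).

    Assumption: the number after the dash is the italic offset; a label
    without a dash is treated as monoisotopic (offset 0).
    """
    # Normalize en dashes and drop digit-group spaces (e.g. "21 519.8").
    text = label.replace("\u2013", "-").replace("\u2009", "").replace(" ", "")
    mono, sep, offset = text.rpartition("-")
    if not sep:
        return float(text), 0
    return float(mono), int(offset)


def most_abundant_mass(monoisotopic_da: float, offset: int) -> float:
    """Mass of the most abundant isotopic peak, `offset` peaks above the monoisotopic one."""
    return monoisotopic_da + offset * ISOTOPE_SPACING_DA


for label in ("8564.64-5", "21 519.8-0"):
    mono, k = parse_mr_label(label)
    print(f"{label}: monoisotopic {mono:.2f} Da, most abundant {most_abundant_mass(mono, k):.2f} Da")
```

For the ubiquitin calibrant, the most abundant isotopic peak therefore sits about 5.01 Da above the monoisotopic mass.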
Two strategies were applied for protein ion isolation and fragmentation. The first began with on-the-fly deconvolution of intact protein spectra,31 followed by automatic isolation with a stored waveform inverse Fourier transform (SWIFT)32 and fragmentation by infrared multiphoton dissociation33 (IRMPD; 75 W, 150–300 ms, 50–100% power, normally 25 to 50 scans). Two species from each sample could be fragmented within 30 min, with 35 MS/MS scans of each (~4–10 s per MS/MS scan). The second strategy was a two-stage process in which selective accumulation of three 40 m/z regions (called “quad marching”)28 was followed either by IRMPD (after SWIFT isolation) in the ICR cell or by collisional fragmentation at the exit of the notch-filtering quadrupole of the Q-FTMS (Patrie, S. M. Int. J. Mass Spectrom. 2004, in press). Both broadband and “quad marching” spectra were processed on-the-fly by automatic THRASH.30
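The 30 min budget for the decon-SWIFT strategy can be checked directly from the scan counts and scan times given above. This is a minimal sketch with a hypothetical helper; it ignores deconvolution, isolation, and other per-scan overhead, so real runs would use more of the budget.

```python
def msms_minutes(n_species: int, scans_per_species: int, seconds_per_scan: float) -> float:
    """Total MS/MS acquisition time in minutes (overhead ignored)."""
    return n_species * scans_per_species * seconds_per_scan / 60.0


BUDGET_MIN = 30.0

# Two species per sample, 35 MS/MS scans each, 4-10 s per scan.
for sec in (4.0, 10.0):
    t = msms_minutes(2, 35, sec)
    print(f"{sec:.0f} s/scan: {t:.1f} min of MS/MS, within budget: {t <= BUDGET_MIN}")
```

Even at 10 s per scan, the fragmentation itself takes under 12 min, leaving headroom for broadband acquisition and isolation.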
For each Mini Prep Cell fraction analyzed, a micro injection valve was used to load 3 μL (~1 μg total protein) onto a 75 μm i.d. × 10 cm capillary column packed with C4 media (New Objective Inc., MA). An HP 1100 pump with a 1:1000 splitting ratio was used to obtain a flow rate of 400–500 nL/min. A Tool Command Language (Tcl) script was used to acquire and store each MS scan throughout the RPLC gradient. The reconstructed total ion chromatogram (TIC) was obtained from the MIDAS data station mentioned above.
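A reconstructed TIC is simply the summed ion intensity of each stored scan plotted against retention time. The sketch below illustrates that idea on hypothetical data with a naive local-maximum picker; it is not the MIDAS implementation, and the function names are placeholders.

```python
from typing import Sequence


def reconstruct_tic(scans: Sequence[tuple[float, Sequence[float]]]) -> list[tuple[float, float]]:
    """Sum the ion intensities of each scan -> [(retention time, total ion current)]."""
    return [(rt, float(sum(intensities))) for rt, intensities in scans]


def tic_apexes(tic: list[tuple[float, float]], min_height: float) -> list[float]:
    """Retention times of simple local maxima at or above min_height."""
    apexes = []
    for i in range(1, len(tic) - 1):
        rt, y = tic[i]
        if y >= min_height and y > tic[i - 1][1] and y >= tic[i + 1][1]:
            apexes.append(rt)
    return apexes


# Hypothetical scans: (retention time in min, peak intensities in that scan).
scans = [(10.0, [1, 1]), (10.5, [5, 5]), (11.0, [2, 1]),
         (11.5, [1, 0]), (12.0, [6, 3]), (12.5, [1, 1])]
tic = reconstruct_tic(scans)
print(tic)
print(tic_apexes(tic, min_height=5.0))
```

A real chromatogram would need smoothing and noise-aware thresholds; the strict/non-strict comparison here only prevents a flat-topped peak from being reported twice.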
Intact protein spectra were analyzed by a deconvolution program,31 with manual validation of Mr values reported at isotopic resolution. MS/MS spectra were analyzed by a modified THRASH algorithm,30 and the resulting peak lists, together with the intact protein Mr values, were submitted to ProSight PTM, a web-based software for identification and characterization of intact proteins (https://prosightptm.scs.uiuc.edu).34 P-scores reported in this study are the negative base-10 logarithm of the probability score defined previously.23 For example, a protein match with a probability score of 5 × 10−2 has a P-score of 1.3, indicating a 5% chance of a spurious hit, a common threshold for protein identification. The P-scores in this study range from 1.9 to 19.2.
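The P-score is a logarithmic transform of the probability score, so converting in either direction takes one line. This sketch reproduces the 5% example above and back-converts the range of P-scores reported in this study; the function names are illustrative only.

```python
import math


def p_score(probability: float) -> float:
    """P-score = -log10(probability that the match is spurious)."""
    return -math.log10(probability)


def probability_from_p_score(score: float) -> float:
    """Inverse transform: probability of a spurious match."""
    return 10.0 ** (-score)


print(round(p_score(5e-2), 1))    # the 5% threshold example in the text
for score in (1.9, 10.0, 19.2):   # P-scores reported in this study
    print(score, f"{probability_from_p_score(score):.1e}")
```

Each unit of P-score corresponds to a tenfold drop in the chance of a spurious identification.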
Processing yeast lysates with a 37 mm vs a 7 mm i.d. preparative gel gave the comparison in Figure 2, with the smaller-diameter gel requiring an order of magnitude less sample volume (0.3 mL vs 3 mL of whole cell lysate). Analysis of the ALS–PAGE fractions by analytical PAGE (Figure 2) indicates a ~5–6 kDa window of proteins for typical fractions from the smaller i.d. gel, comparable to that from the larger-diameter gel. After degradation of ALS in the size-sorted fractions of Figure 2b, RPLC on a 320 μm i.d. capillary column in the second dimension used either 10-fold or 2-fold less sample (Figure 3, with and without asterisks, respectively) than the initial report of ALS-PAGE/RPLC.27 For a typical fraction, 10–20 RPLC peaks were observed in the UV trace (λ = 220 nm, data not shown). These RPLC samples could be presented directly to the FTMS without additional sample processing (e.g., lyophilization). To date, over 200 Mr values from ~20 ALS-PAGE fractions have been observed with this platform. Thus, a robust protein processing procedure can transform ~1 mg of total protein into a large number of Mr values measured at high resolving power. Such size sorting should prove useful for targeting biomarkers observed in MALDI-based experiments, with direct identification ultimately relying on Top Down MS/MS.
Figure 4 shows ESI/Q-FTMS/MS data acquisition for a capillary RPLC fraction sampled directly by off-line nanospray. Two major components were observed, each isolated by SWIFT and fragmented by IRMPD. The 21 519.8-0 Da component was identified as ribosomal protein S7.e.A, with 6 b-type and 2 y-type ions matching, including a four-amino-acid sequence tag (Figure 4b). The P-score was 10.0, indicating a 10−10 probability that this identification was spurious. The observed molecular weight of this protein was 42.01 Da larger than the theoretical value calculated from the database sequence (the Δm value was obtained from fragment ions). These data are consistent with acetylation of the N-terminal serine (formally, a 42.01 Da Δm localized to the seven N-terminal residues), which was also indicated in the ProSight PTM output (Figure 4c). A second protein from the same sample, also processed automatically by the instrument, was identified as a heat shock protein (Figure 4d,e); it likewise showed N-terminal acetylation, with 9 b/y ion matches, a P-score of 4.5, and an Mr error of 16 ppm.
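Assigning a Δm such as 42.01 Da means comparing it against known modification masses, and the mass accuracy decides whether near-isobaric candidates can be separated. The sketch below uses standard monoisotopic shifts for a handful of single modifications; the table, tolerance, and function names are illustrative assumptions (ProSight PTM's own logic is not reproduced), and combinations of modifications are not considered.

```python
# Monoisotopic mass shifts (Da) for a few common single modifications.
PTM_SHIFTS_DA = {
    "acetylation": 42.010565,
    "trimethylation": 42.046950,
    "oxidation": 15.994915,
    "phosphorylation": 79.966331,
    "N-terminal Met removal": -131.040485,
}


def ppm_error(observed_da: float, theoretical_da: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_da - theoretical_da) / theoretical_da * 1e6


def explain_delta_m(delta_m_da: float, tol_da: float = 0.01) -> list[str]:
    """Single modifications whose shift lies within tol_da of delta_m_da."""
    return sorted(name for name, shift in PTM_SHIFTS_DA.items()
                  if abs(delta_m_da - shift) <= tol_da)


print(explain_delta_m(42.01))               # acetyl and trimethyl differ by ~0.036 Da
print(explain_delta_m(42.01, tol_da=0.05))  # a looser tolerance cannot separate them
print(f"{ppm_error(1000.016, 1000.0):.1f} ppm")  # hypothetical masses
```

The second call shows why high-resolution measurement matters: with a 0.05 Da tolerance, acetylation and trimethylation become indistinguishable.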
A protein from a later Mini Prep Cell fraction was observed with an Mr value of 27 460.5-0 Da (Supporting Figure 1a). After quadrupole-enabled selective accumulation of the 29+ and 30+ charge states, collisional dissociation generated the MS/MS data of Supporting Figure 1. After spectral processing by THRASH, 54 ions were used as input to ProSight PTM for database retrieval. This protein was identified as phosphoglycerate mutase, with 6 b- and 17 y-type ions matching and a P-score of 19.2. No modifications other than removal of the N-terminal Met were indicated. A four-amino-acid sequence tag was also found, adding further confidence to the identification.
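Selective accumulation works in m/z, so it helps to see where the charge states of a 27 460.5 Da protein fall. This sketch computes [M + zH]z+ values and tests whether a set of them fits inside one ~40 m/z quadrupole window; the window width comes from the text, but whether both charge states actually shared one window in this experiment is an assumption, and the helper names are hypothetical.

```python
PROTON_DA = 1.007276  # proton mass


def mz(neutral_mass_da: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass_da + charge * PROTON_DA) / charge


def fits_one_window(mz_values: list[float], width: float = 40.0) -> bool:
    """True if all m/z values fall inside a single window of the given width."""
    return max(mz_values) - min(mz_values) <= width


mr = 27460.5
ions = {z: mz(mr, z) for z in (28, 29, 30)}
for z, value in ions.items():
    print(f"{z}+: m/z {value:.2f}")
print(fits_one_window([ions[29], ions[30]]))  # the two states accumulated here
print(fits_one_window([ions[28], ions[30]]))
```

The 29+ and 30+ ions lie about 32 m/z apart, so a single 40 m/z window could in principle capture both, while adding the 28+ ion would not fit.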
Using the automatic decon-SWIFT acquisition method described above together with an exclusion list containing the Mr values of all previously identified protein forms, new information was obtained from each sample. However, excluding the most abundant species leaves those with lower ion signals, which often require difficult or lengthy MS/MS experiments to obtain fragmentation data of sufficient quality for identification. An alternative strategy used the quadrupole enhancement to the FTMS for selective accumulation of ~40 m/z spectral windows. Protein signals increased up to 50-fold (Figure 5, parts a vs b), allowing more protein ions to be observed in the same overall data acquisition time. A good-quality fragmentation spectrum was obtained after quad-SWIFT isolation, as shown by the Figure 5c identification of the 8690.07 Da species with a P-score of 11.7. In general, use of the Q-FTMS increased the dynamic range of successful MS/MS experiments by a factor of ~30. Pulsed gas (~5 × 10−9 Torr) in the cell was also used in some experiments, giving a ~2- to 5-fold increase in ion signals. Of the 35 protein forms identified and characterized, 18 were intact proteins and 17 were protease products (http://kelleher.scs.uiuc.edu/publications/yeast_proteins_miniaturization.xls). These species arise from 27 unique genes.
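The exclusion-list logic can be sketched as a filter over a deconvolved spectrum: drop any species whose Mr matches an already identified form, then target the most intense survivors. This is an illustration on hypothetical data, not the instrument's control code; the 0.5 Da tolerance is an assumption (a real list may need to tolerate ±1 Da monoisotopic assignment errors), and n = 2 mirrors the two species fragmented per sample.

```python
def next_targets(species: list[tuple[float, float]], excluded_mr: list[float],
                 n: int = 2, tol_da: float = 0.5) -> list[float]:
    """Pick the n most intense species whose Mr is not on the exclusion list.

    species: (Mr in Da, intensity) pairs from a deconvolved spectrum.
    """
    remaining = [(mr, inten) for mr, inten in species
                 if not any(abs(mr - x) <= tol_da for x in excluded_mr)]
    remaining.sort(key=lambda s: s[1], reverse=True)
    return [mr for mr, _ in remaining[:n]]


# Hypothetical deconvolved spectrum and exclusion list.
spectrum = [(21519.8, 9e5), (27460.5, 5e5), (12000.0, 1e5), (8690.07, 2e4)]
already_identified = [21519.8, 27460.5]
print(next_targets(spectrum, already_identified))
```

The targets that survive are, by construction, the weaker ions, which is exactly why the text turns to quadrupole-enhanced accumulation for them.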
The >50 μg sample requirement and the complexity of the mixtures presented for off-line analysis make online LC/FTMS a complementary approach, provided that identifications of sufficient confidence can be obtained. Using 75 μm i.d. columns for LC/FTMS at 400–500 nL/min, ~10 ng of protein from Mini Prep Cell fraction #15 (containing ~20 kDa proteins) gave the reconstructed total ion chromatogram (TIC) shown in Figure 6a. This required ~300-fold less sample than the original report of ALS-PAGE/RPLC. Single-scan mass spectra for the designated capillary RPLC peaks are shown in Figure 6b–d. During the LC run, 25 protein forms were observed, six of them over 20 kDa. Searching the yeast database with Mr values alone and a 0.5 Da window gave five possible matches (Table 1). Each of these “intact mass tags”35 matched a single entry in a list of 150 yeast proteins identified by Top Down MS to date (Meng, F. Anal. Chem. 2004, in press). Thus, tentative identifications can be made, although extension of this approach to large-scale mammalian proteome projects incorporating Top Down remains uncertain. Online capillary RPLC/FTMS also gave better separation than off-line sample collection (1–2 min fractions), with each peak in the TIC containing one major species. The lower complexity of each spectrum will enable future fragmentation by threshold dissociation (e.g., using infrared photons22) or even electron capture dissociation36,37 on-the-fly and without isolation of precursor ions.
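Matching an “intact mass tag” amounts to a range query over a mass-sorted list of known proteins. This sketch uses binary search to return every entry within the window; the database entries are hypothetical placeholders, and reading “a 0.5 Da window” as ±0.5 Da is an assumption (set window_da to 0.25 if the total width is meant).

```python
from bisect import bisect_left, bisect_right


def build_index(entries: list[tuple[float, str]]) -> tuple[list[float], list[str]]:
    """Sort (Mr, name) entries once and split them into parallel lists."""
    entries = sorted(entries)
    return [m for m, _ in entries], [name for _, name in entries]


def mass_tag_candidates(observed_mr: float, masses: list[float], names: list[str],
                        window_da: float = 0.5) -> list[str]:
    """Names of entries whose Mr lies within +/- window_da of observed_mr."""
    lo = bisect_left(masses, observed_mr - window_da)
    hi = bisect_right(masses, observed_mr + window_da)
    return names[lo:hi]


# Hypothetical entries standing in for previously identified proteins.
masses, names = build_index([(8690.07, "protein_A"), (8690.40, "protein_B"),
                             (15000.00, "protein_C")])
for mr in (8690.2, 15000.3, 20000.0):
    hits = mass_tag_candidates(mr, masses, names)
    status = "unique" if len(hits) == 1 else f"{len(hits)} candidates"
    print(mr, hits, status)
```

Sorting once and bisecting keeps each lookup logarithmic in the list size, which matters little for 150 proteins but would for a full proteome database; only a unique hit supports a tentative identification.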
Improvements in sample creation and utilization for presenting intact proteins in a more streamlined fashion for Top Down Mass Spectrometry have been described. A 15- to 300-fold improvement over an initial report of ALS-PAGE/RPLC was demonstrated, and applying this miniaturized sample processing platform beyond the 37 proteins detected here has the potential to identify more proteins with ever increasing throughput while decreasing sample requirements. This 2-D proteome processing can present RPLC samples directly (without lyophilization) to ESI/MS in either off- or on-line mode after size-dependent proteome fractionation, which could prove important for biomarker identification by Top Down MS/MS. Further extension of online capillary LC/FTMS/MS will require far faster instruments for acquiring MS/MS data with highly automated Q-FTMS technology. Future efforts will be devoted to improving separation efficiency and extending the methods illustrated here to higher-mass proteins from mammalian proteomes with complex combinations of post-translational modifications.
The acid-labile analogue of SDS was a generous gift from Edward Bouvier and Reb Russell of the Waters Corp. The laboratory of N.L.K. received support from the Searle Scholars Program, the Burroughs Wellcome Fund, the Sloan and Packard Foundations, the Research Corporation (RI 0683), an NSF Career Award (CHE-0134953), and the National Institutes of Health (GM 067193).
Supporting Information Available: ESI/FTMS of a Mini Prep Cell/cap RPLC fraction from S. cerevisiae and its MS/MS spectrum by collisionally activated dissociation. This material is available free of charge via the Internet at http://pubs.acs.org.