The dynamic range of protein expression in complex organisms, coupled with the stochastic nature of discovery-driven MS/MS analysis, continues to impede comprehensive sequence analysis and often yields only limited information for low-abundance proteins. High-performance fractionation of proteins or peptides prior to mass spectrometry analysis can mitigate these effects, although achieving an optimal combination of automation, reproducibility, separation peak capacity, and sample yield remains a significant challenge. Here we demonstrate an automated nanoflow 3D LC-MS/MS platform based on high-pH reversed phase (RP), strong anion exchange (SAX), and low-pH reversed phase (RP) separation stages for analysis of complex proteomes. We observed that RP-SAX-RP outperformed RP-RP for analysis of tryptic peptides derived from Escherichia coli, and enabled identification of proteins present at a level of 50 copies per cell in Saccharomyces cerevisiae, corresponding to an estimated detection limit of 500 amol, from 40 μg total lysate on a low-resolution 3-D ion trap mass spectrometer. A similar study performed on an LTQ-Orbitrap yielded over 4000 unique proteins from 5 μg total yeast lysate analyzed in a single, 101-fraction RP-SAX-RP LC-MS/MS acquisition, providing an estimated detection limit of 65 amol for proteins expressed at 50 copies per cell.
Mass spectrometry-based proteomics is now well established for characterization of proteins across a wide range of experimental contexts: recombinant protein expression systems, analysis of proteins resolved via SDS-PAGE,1 multi-component protein complexes isolated through affinity purification,2 post-translational modifications (phosphorylation,3-4 glycosylation,5,6-8 sulfation,9 sumoylation,10,11 and ubiquitination12,13), protein sub-domains and termini,14-15 sub-cellular compartments,16,17 and finally whole proteomes.18-19 While advances in instrumentation continue to improve the discovery power of mass spectrometry in biomedical applications,20-26 data quality nevertheless often varies inversely with sample complexity. The deleterious effects observed during analysis of peptides present in complex matrices diminish the performance of both MS and MS/MS scans and include signal suppression of low-basicity peptides27 and undersampling of low-abundance peptides during MS/MS analysis.28 In addition, overlapping isotope profiles of co-eluting peptides can lead to errors in mass assignment, quantification, and precursor selection.29, 30 Collectively, these phenomena diminish the reproducibility of discovery-based experiments and inhibit characterization of low-abundance proteins.31-32 Biochemical fractionation of sub-cellular compartments and organelles, or affinity purification based on sequence tags or specific post-translational modifications, are effective strategies for analyte enrichment and reduction of sample complexity prior to LC-MS/MS analysis.16, 33-37 Although these techniques have proven reliable and robust, the protein and peptide concentration dynamic range of enriched samples often exceeds the analytical capabilities of current mass spectrometry platforms. In these cases, chromatographic pre-fractionation of proteins or peptides can significantly improve LC-MS/MS analysis.
This is perhaps best exemplified by the traditional approach of gel-based protein separation, followed by in-gel digestion, peptide extraction, and LC-MS/MS analysis.38-39 Although these “Gel-LC” methods have been refined,18, 40-41 they are typically labor-intensive and low-throughput. These limitations catalyzed efforts to implement orthogonal peptide-based fractionation strategies, beginning with the combination of strong cation exchange (SCX) and reversed phase (RP) chromatography termed “Multidimensional Protein Identification Technology” (MudPIT).42-43 The success of the MudPIT approach has led to a plethora of alternative fractionation geometries, with the overall goal of improving separation power and proteome sequence coverage: offline/online SCX-RP,44-45 offline immobilized pH gradient (IPG) gels coupled with RP-LC-MS/MS,46-47 offline solution-based IEF (OFFGEL) coupled with RP-LC-MS/MS,48-49 offline high-pH RP coupled with low-pH RP-LC-MS/MS,45, 50-52 offline mixed-mode high-pH RP-SAX coupled with low-pH RP-LC-MS/MS,53 and online HILIC-SCX-RP.54 Generally speaking, offline formats can accommodate large sample quantities (>100 μg) but are susceptible to losses resulting from additional sample handling, lyophilization, and non-specific adsorption to tubes or other surfaces.45, 46 Online systems offer the advantage of efficient peptide recovery and transfer between all separation stages, but are typically limited in total sample capacity compared with larger-scale offline formats. For example, Dowell and colleagues reported that online SCX-RP detected twice as many peptides as an equivalent offline SCX-RP format.45 Similarly, work by Slebos et al.46 suggested that although the separation power of gel-based IEF fractionation of peptides as a first dimension is largely independent of sample quantity, the observed recovery and subsequent identification by LC-MS/MS varied inversely with total starting material.
We recently demonstrated an automated, online multidimensional fractionation platform that enabled direct comparison of SCX-RP and RP-RP configurations coupled to true nanoflow LC-MS/MS.55 Although RP-RP exhibited superior analytical figures of merit compared with SCX-RP under all conditions tested, we observed that fraction-to-fraction overlap of peptides began to increase beyond a depth of 40 first-dimension fractions. These results are consistent with fundamental descriptions of peak capacity for multidimensional separations in general56 and with previous reports of RP-RP fractionation in particular.45, 50-51, 57 Collectively, these observations indicate that increasing fraction-to-fraction overlap ultimately limits the separation peak capacity and total proteome sequence coverage that can be obtained from 2-D platforms.
Here we expand our automated nanoflow RP-RP-LC-MS/MS platform55 to include a stage of high-pH SAX separation, providing an integrated 3-D (RP-SAX-RP) fractionation system. Based on analysis of peptides derived from an Escherichia coli standard, we found that RP-SAX-RP significantly improved sequence identification at a fractionation depth beyond which RP-RP previously exhibited diminished peak capacity. Using biochemical data from a recent, large-scale protein expression study in Saccharomyces cerevisiae as a benchmark, we found that RP-SAX-RP enabled MS/MS-based identification of peptides and proteins across a wide abundance range on both a low-resolution 3D ion trap and a high-performance Orbitrap instrument. Interestingly, our data suggest an approximately 100-fold performance difference between previous-generation and state-of-the-art mass spectrometers. Moreover, our results indicate that efficient, online fractionation strategies provide large-scale proteome sequence coverage from only a few micrograms of tryptic peptides. More generally, our data provide compelling evidence that improved capillary-based separation systems can complement the rapid pace of improvement in state-of-the-art mass spectrometry platforms.
Due to space considerations experimental methods related to cell culture, sample preparation, mass spectrometry and data analysis are provided in Supplementary Materials.
Lists of proteins and peptides associated with Figures 2, 3, S1, and S2 are provided in Supplementary File S1. In addition, all native mass spectrometry files and mzResults files, which include annotated MS/MS spectra, are available for download from the ProteomeCommons website based on the following hash: hash link will be provided prior to publication.
The 3D RP-SAX-RP platform (Figure 1) consisted of Waters UHPLC binary and isocratic pumps, an autosampler (Waters Corp., Milford, MA), and an additional 6-port, 2-position valve (Valco Inc., Austin, TX). The 1st dimension reversed phase (RP) column consisted of a 250 μm I.D. capillary packed with 15 cm of 5 μm diameter XBridge C18 resin (Waters Corp., Milford, MA). A strong anion exchange (SAX) column (250 μm I.D. × 15 cm of 10 μm diameter POROS 10HQ resin [AB Sciex, Foster City, CA]) was connected directly to the outlet of the 1st dimension RP column with a union. The 3rd dimension was constructed as described previously.58-59 The isocratic pump delivered either peptide samples or 1st (acetonitrile in 20 mM ammonium formate, pH 10) and 2nd (KCl in 20 mM ammonium formate, pH 10) dimension eluents at 2 μL/min through the sample loop. Discrete eluent concentrations used for all experiments are provided in Supplementary Table S1. The binary pump either delivered 0.1% formic acid at 8 μL/min to dilute the organic content and acidify the 1st/2nd dimension effluent prior to the 3rd dimension pre-column, or provided gradient elution (2-30% B in 45 minutes; A = 0.2 M acetic acid, B = acetonitrile with 0.2 M acetic acid) of peptides from the 3rd dimension reversed phase columns for LC-MS/MS analysis at a flow rate of ≈10 nL/min.60 A Digital PicoView electrospray source platform (New Objective, Woburn, MA; model DPV-550 for the Orbitrap XL) was used to automatically position the emitter tip either at the heated metal capillary inlet during LC-MS/MS acquisition or beneath a gravity-driven drip station during injection of peptide samples or 1st/2nd dimension eluents.
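The make-up flow described above dilutes each 1st/2nd dimension eluent before it reaches the low-pH pre-column. The short Python sketch below illustrates that mixing arithmetic, assuming ideal volumetric mixing at the tee and the stated flow rates (2 μL/min of effluent plus 8 μL/min of 0.1% formic acid). The function name and the 20% acetonitrile step are hypothetical illustrations; the actual step concentrations are listed in Supplementary Table S1.

```python
def diluted_percent(component_pct: float,
                    effluent_flow_ul_min: float = 2.0,
                    makeup_flow_ul_min: float = 8.0) -> float:
    """Percentage of a component after mixing the effluent with a make-up
    stream that contains none of that component.

    Assumes ideal volumetric mixing (no volume contraction), which is only
    an approximation for acetonitrile/water mixtures.
    """
    total_flow = effluent_flow_ul_min + makeup_flow_ul_min
    return component_pct * effluent_flow_ul_min / total_flow


# Hypothetical 1st-dimension step of 20% acetonitrile (real values: Table S1)
acn_at_precolumn = diluted_percent(20.0)
print(f"Acetonitrile at the 3rd-dimension pre-column: {acn_at_precolumn:.1f}%")
```

Under these assumptions every eluent component, organic solvent or KCl, is diluted five-fold (20% acetonitrile becomes 4%) while the stream is acidified by the formic acid, which is the role the text assigns to the make-up flow.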
Safety glasses should be worn at all times during construction and use of fused silica capillaries. In addition, a lab coat and gloves should be worn when handling volatile organic solvents in a chemical fume hood.
We recently performed a detailed comparison of automated, online 2D RP-RP and SCX-RP fractionation platforms for analysis of peptides derived from complex cell lysates and affinity purified multi-component protein complexes.55 Given the high degree of orthogonality that we observed between reversed phase separations performed at high- and low-pH, respectively, we asked whether addition of a third dimension would provide significantly improved fractionation power. Based on the buffer conditions used in the 1st dimension of our RP-RP platform (20 mM ammonium formate, pH 10) we chose strong anion exchange (SAX) as the 2nd dimension in a (now) 3D RP-SAX-RP configuration (Figure 1). Briefly, peptides are loaded through the autosampler (Figure 1, dashed box) in ammonium formate buffer (pH 10) and captured on the 1st dimension RP column. Peptides poorly retained on reversed phase at high pH are trapped on either the 2nd (SAX, pH 10) or 3rd (RP, pH 3.0) dimension columns. The low pH pre- and analytical columns constitute the 3rd dimension separation and are assembled in a vented58-60 configuration to facilitate loading of samples or 1st (acetonitrile, 20 mM ammonium formate, pH 10) and 2nd (KCl in 20 mM ammonium formate, pH 10) dimension eluents, respectively. The system provides complete, online fractionation, meaning that once peptides are injected from the autosampler well onto the 1st dimension column, they are automatically transferred to the 2nd and 3rd dimensions and are only exposed to the fused silica tubing, PEEK LC fittings, and flow paths of the 6-port valves.
Our previous analytical characterization studies55 suggested that for LC-MS/MS analysis of E. coli peptides in a complex background of whole cell lysate, RP-RP performance appeared to peak at a depth of between 30 and 40 fractions. Based on this observation, we asked whether RP-SAX-RP would improve performance at a fractionation depth beyond which RP-RP previously exhibited diminishing returns. Toward this end we performed replicate RP-RP (40-fraction) and RP-SAX-RP (37-fraction) analyses of E. coli tryptic peptides. Figure 2A shows that while each fractionation technique provided reproducible data, the RP-SAX-RP platform identified 44% and 31% more peptides and proteins, respectively, than RP-RP. As a surrogate measure of separation peak capacity, we next asked whether the number of fractions across which discrete peptides were identified varied between the two separation platforms. Indeed, we observed that 75% of all peptide identifications were confined to a single RP-SAX-RP fraction, whereas for RP-RP (Figure 2B) this figure of merit was reduced to 60%. Consistent with these data, 34 of 37 RP-SAX-RP fractions contained between 100 and 300 unique peptide identifications (Figure 2C), suggesting that this 3-D platform provided uniform fractionation of E. coli peptides across the entire separation space.
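The peak-capacity surrogate used here (the share of unique peptides identified in exactly one fraction) and the per-fraction identification counts can both be computed from a peptide-to-fraction table. The following Python sketch assumes identifications are available as (peptide sequence, fraction number) pairs; the function names and toy sequences are hypothetical and are not taken from the authors' software.

```python
from collections import defaultdict


def fractions_per_peptide(ids):
    """Map each unique peptide sequence to the set of fractions it was seen in."""
    spans = defaultdict(set)
    for peptide, fraction in ids:
        spans[peptide].add(fraction)
    return spans


def single_fraction_rate(ids):
    """Share of unique peptides identified in exactly one fraction."""
    spans = fractions_per_peptide(ids)
    single = sum(1 for fracs in spans.values() if len(fracs) == 1)
    return single / len(spans)


def peptides_per_fraction(ids):
    """Number of unique peptide identifications in each fraction."""
    per_fraction = defaultdict(set)
    for peptide, fraction in ids:
        per_fraction[fraction].add(peptide)
    return {frac: len(peps) for frac, peps in sorted(per_fraction.items())}


# Toy data: PEPTIDEA spans two fractions; the others are confined to one.
ids = [("PEPTIDEA", 1), ("PEPTIDEA", 2), ("PEPTIDEB", 2),
       ("PEPTIDEC", 3), ("PEPTIDED", 3), ("PEPTIDEC", 3)]
print(single_fraction_rate(ids))    # 0.75
print(peptides_per_fraction(ids))   # {1: 1, 2: 2, 3: 2}
```

Repeated identifications of the same peptide within one fraction (PEPTIDEC above) are collapsed by the sets, so the metric counts fractions per unique sequence rather than spectra.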
Given the results above, we next asked whether RP-SAX-RP could provide sufficient fractionation depth to identify proteins expressed across a wide concentration dynamic range in the context of a shotgun, data-dependent acquisition. To facilitate comparison of our results with other large-scale proteomic efforts, we switched to yeast as our model system. S. cerevisiae has a well-annotated genome and has been extensively studied at the whole-proteome level using biochemical61 as well as shotgun18,43-44 and targeted62 mass spectrometry-based approaches. With these studies as a reference point, we next sought to establish an extreme limit for automated, online RP-SAX-RP fractionation. Toward this end we loaded 40 μg of peptides derived from yeast whole cell lysate and acquired 236 LC-MS/MS fractions on an LCQ Deca XP ion trap mass spectrometer (Supplementary Figure S1). This experiment required nearly 18 days of uninterrupted acquisition time and yielded 18,359 unique peptides (1% FDR) that mapped to 2902 proteins (see Supplemental Methods for details on protein inference). Although these data are based on somewhat dated mass spectrometry technology, it is nonetheless interesting to note that our results compare very favorably to a previous study that used 80 fractions of offline SCX fractionation coupled to LC-MS/MS on the same mass spectrometry platform (1504 proteins identified from 1 mg protein lysate44). This comparison suggests that continued improvements in fractionation can compensate for limitations arising from the stochastic nature of data-dependent MS/MS acquisition. Consistent with this hypothesis, we observed a surprisingly uniform distribution of unique peptide identifications across all 236 RP-SAX-RP fractions.
Several proteins identified in this analysis have been previously characterized with expression levels at or below 128 copies per cell (Table 1), including YOR093C (41 copies/cell), YBL063W (49 copies/cell), and YJL084C (57 copies/cell).61 Annotated MS/MS spectra for representative peptides from these proteins are provided on our mzServer63 via embedded links in Table 1. For comparison, Table 1 also lists low-abundance proteins (≤ 128 copies per cell) recently characterized by triple quadrupole-based MRM assays.62 Interestingly, we identified 8 of these 15 proteins in our 236-fraction RP-SAX-RP analysis, suggesting that efficient fractionation can offset, in part, the discrepancies in detection between discovery- and targeted-mode MS/MS analysis. Based on these data we estimate a detection limit of 500 amol (≈50 protein copies per cell, 7E6 cells analyzed) and a dynamic range of 3.2E4 (the ratio of the upper bound set by YKL096W-A at 1.59E6 copies/cell to ≈50 copies/cell) for data-dependent analysis of total tryptic peptides on a 3-D ion trap instrument.
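The detection-limit and dynamic-range estimates above follow from converting copies per cell into moles via Avogadro's number. A minimal Python sketch of that arithmetic, using the values quoted in the text; the helper names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def amount_amol(copies_per_cell: float, cells: float) -> float:
    """Total amount of a protein, in attomoles, contributed by `cells` cells."""
    moles = copies_per_cell * cells / AVOGADRO
    return moles * 1e18  # 1 amol = 1e-18 mol


def dynamic_range(high_copies: float, low_copies: float) -> float:
    """Ratio of the most to the least abundant protein detected."""
    return high_copies / low_copies


# Values quoted for the LCQ analysis (40 ug lysate, ~7E6 cells)
lod = amount_amol(copies_per_cell=50, cells=7e6)
dr = dynamic_range(high_copies=1.59e6, low_copies=50)
print(f"Detection limit ~ {lod:.0f} amol; dynamic range ~ {dr:.1e}")
```

For 7E6 cells this gives roughly 580 amol, on the order of the ≈500 amol estimate, and a dynamic range of about 3.2E4. Substituting the 8E5 cells used later for the Orbitrap analysis gives about 66 amol, consistent with the 65 amol estimate reported there.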
As an extension of the studies above, we next coupled our RP-SAX-RP separation system to a hybrid linear ion trap-Orbitrap (LTQ-Orbitrap XL) mass spectrometer. Because we expected significantly improved performance compared with the 3-D ion trap instrument, we first reduced the sample quantity by nearly an order of magnitude (5 μg versus 40 μg) and then sought to establish the number of RP-SAX-RP fractions required to obtain yeast proteome coverage roughly equivalent to that observed with the LCQ platform (236 fractions, 18,359 unique peptides, 2902 proteins). We found that 19 fractions yielded 2809 protein identifications based on 8929 unique peptide sequences (Figure 3A, blue). Acquisition of these data required approximately 2 days of LC-MS/MS time. We next performed a second 19-fraction RP-SAX-RP analysis (Figure 3A, yellow) to establish a baseline for the reproducibility of our separation platform. Here we identified 2856 proteins from 9021 peptides. Across both analyses we identified 2422 proteins in common (≈75% reproducibility) and 3243 proteins in total. We also observed good agreement between signals of individual peptides identified in both runs (Figure 3B), with ≈80% of all intensity ratios within ±2-fold of the mean value. We next tested whether the peptide content of individual fractions was reproduced across these replicate runs by requiring that peptide sequences be identified in the same fraction in both analyses; under these conditions, ≈48% of all peptides were reproducibly identified across discrete fractions in the two RP-SAX-RP analyses. However, the stochastic nature of MS/MS likely makes this result a lower bound on “reproducibility.” Hence we next used a more permissive approach that also counted as “reproduced” those MS-level precursor peaks that matched in mass (±10 ppm) and third-dimension retention time (±0.5 minute) a peptide sequence identified in only one of the two analyses.
Based on these criteria we found that potentially 98% of all identified peptides were reproducibly detected.
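The permissive reproducibility criterion amounts to a tolerance match between an identified precursor and the MS1 features of the other run. The sketch below assumes each run is summarized as (m/z, retention time in minutes) tuples and also computes the two simpler replicate metrics mentioned above: union-based protein overlap, and the share of log2 intensity ratios within 2-fold of their mean. The data structures, function names, and toy values are hypothetical and do not represent the authors' software.

```python
import math


def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Mass error in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def has_matching_feature(peptide, features, ppm_tol=10.0, rt_tol_min=0.5):
    """True if any MS1 feature lies within both the m/z and the RT tolerance.

    `peptide` and each feature are (mz, rt_minutes) tuples.
    """
    mz, rt = peptide
    return any(abs(ppm_error(f_mz, mz)) <= ppm_tol and abs(f_rt - rt) <= rt_tol_min
               for f_mz, f_rt in features)


def overlap_fraction(set_a, set_b):
    """Shared identifications divided by the union of both runs."""
    return len(set_a & set_b) / len(set_a | set_b)


def within_twofold_of_mean(intensity_pairs):
    """Share of log2 intensity ratios within +/-1 (i.e. 2-fold) of their mean."""
    log_ratios = [math.log2(a / b) for a, b in intensity_pairs]
    mean = sum(log_ratios) / len(log_ratios)
    return sum(abs(r - mean) <= 1.0 for r in log_ratios) / len(log_ratios)


# A peptide identified only in run 1, checked against run-2 MS1 features
run2_features = [(652.3401, 31.20), (652.3600, 31.25), (800.1000, 40.00)]
print(has_matching_feature((652.3420, 31.40), run2_features))  # True

# Integer stand-ins reproducing the protein counts in the text:
# 2809 and 2856 proteins per run, 2422 shared, 3243 in total.
run1 = set(range(2809))
run2 = set(range(387, 387 + 2856))
print(round(overlap_fraction(run1, run2), 2))  # 0.75
```

With the counts reported for the two 19-fraction runs, the union-based definition gives 2422/3243 ≈ 0.75, matching the ≈75% reproducibility quoted above. Comparing ratios on a log2 scale is an assumption that treats 2-fold increases and decreases symmetrically.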
We next asked whether additional fractionation depth would provide higher proteome sequence coverage. In independent experiments we loaded 5 μg of yeast tryptic peptides and acquired MS/MS data from 19, 51, and 101 RP-SAX-RP fractions, respectively. Encouragingly, each fractionation depth yielded data that constituted a substantial superset of the previous experiment, with overlaps of at least 70% and 92% at the peptide (Supplementary Figure S2A) and protein (Supplementary Figure S2B) level, respectively. Moreover, at least 62% of unique peptide identifications were confined to a single RP-SAX-RP fraction, even for the most in-depth analysis (101 fractions, Supplementary Figure S2C). As with the experiments performed on the low-resolution instrument (Supplementary Figure S1), we readily detected low-abundance proteins (≈50 copies per cell) at each fractionation depth, albeit with total acquisition times (i.e., numbers of fractions) far smaller than were required on the 3D ion trap (see embedded links in Table 1 for MS/MS spectra of representative peptides). Based on the sample quantity consumed per analysis (5 μg), we estimate a detection limit of 65 amol (≈50 copies per cell, 8E5 cells analyzed) and a dynamic range of 3.2E4 (YKL096W-A, 1.59E6 copies/cell) for data-dependent analysis of total yeast tryptic peptides on an LTQ-Orbitrap XL. Collectively, these results suggest that our 3D RP-SAX-RP platform provides reproducible separation and sufficient peak capacity to enable identification of low-abundance proteins in complex mixtures on both low- and high-performance mass spectrometers.
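The nesting of identifications across fractionation depths can be summarized as the share of the shallower experiment's identifications that reappear in the deeper one. The text does not define its overlap statistic, so this Python sketch assumes this containment-style definition; the function names and toy identifiers are hypothetical.

```python
def containment(shallow: set, deep: set) -> float:
    """Share of identifications from the shallower run also found in the deeper run."""
    return len(shallow & deep) / len(shallow)


def pairwise_containment(runs: dict) -> dict:
    """Containment for consecutive fractionation depths (e.g. 19 -> 51 -> 101).

    `runs` maps the number of fractions to the set of identifications at that depth.
    """
    depths = sorted(runs)
    return {(a, b): containment(runs[a], runs[b]) for a, b in zip(depths, depths[1:])}


runs = {
    19: {"P1", "P2", "P3", "P4"},
    51: {"P1", "P2", "P3", "P5", "P6"},
    101: {"P1", "P2", "P3", "P5", "P6", "P7"},
}
print(pairwise_containment(runs))  # {(19, 51): 0.75, (51, 101): 1.0}
```

Unlike the union-based overlap used for replicate runs, this measure is asymmetric: a deeper run that adds many new identifications still scores 1.0 as long as it recovers everything seen at the shallower depth.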
For the sample quantity analyzed on the LTQ-Orbitrap (5 μg), our 3D separation platform appeared to reach a point of diminishing returns between 50 and 100 fractions, as we identified only an additional 184 unique proteins (1377 unique peptides) by extending the analysis to 101 fractions. To determine whether additional fractionation primarily improved protein sequence coverage (i.e., more peptides identified per protein) or protein dynamic range (i.e., identification of proteins at low expression levels), we plotted our fractionation data from Supplementary Figures S1 and S2 together with data from a recent large-scale biochemical analysis61 of protein expression in yeast (Figure 4). We also included the combined set of unique protein identifications resulting from our 19-, 51-, and 101-fraction experiments. For each expression level bin, we used the quantitative western blot data as a reference and then plotted the subset of these proteins identified in our RP-SAX-RP analyses. Between ≈60 and ≈4,000 protein copies per cell, the number of protein identifications increased systematically as a function of mass spectrometry technology (LCQ vs LTQ-Orbitrap) and fractionation depth (19, 51, and 101 fractions, or their combination). However, at the lowest expression level (Figure 4B, left, ≈30-60 copies per cell) the mass spectrometry data appeared somewhat stochastic, suggesting either that this range represented the practical detection limit under the conditions tested, so that fractionation provided little benefit, or that the number of proteins (n=9) quantified by western blot was too small to serve as a reference. To explore this question further, we plotted two other expression categories from Ghaemmaghami et al.,61 “extremely low signal (<50 copies per cell)” and “no expression detected,” along with the corresponding proteins detected in our RP-SAX-RP analyses (Figure 4C).
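The per-bin comparison in Figure 4 can be sketched as follows: reference proteins are grouped into abundance bins by their western blot copy numbers, and for each bin we count how many of them were identified by MS/MS. The bin edges mentioned in the text (≈30-60 and ≈60 copies per cell upward) suggest doubling bins, so this Python sketch assumes power-of-two bins; the function names, protein names, and copy numbers are hypothetical.

```python
import math
from collections import Counter


def log2_bin(copies_per_cell: float) -> int:
    """Lower edge of the power-of-two bin containing `copies_per_cell`."""
    return 2 ** math.floor(math.log2(copies_per_cell))


def coverage_by_bin(reference: dict, identified: set) -> dict:
    """Per abundance bin: (identified reference proteins, total reference proteins).

    `reference` maps protein name -> western blot copies per cell.
    """
    totals, hits = Counter(), Counter()
    for protein, copies in reference.items():
        b = log2_bin(copies)
        totals[b] += 1
        if protein in identified:
            hits[b] += 1
    return {b: (hits[b], totals[b]) for b in sorted(totals)}


reference = {"ProtA": 45, "ProtB": 50, "ProtC": 90, "ProtD": 3000, "ProtE": 5000}
identified = {"ProtB", "ProtC", "ProtD", "ProtE"}
print(coverage_by_bin(reference, identified))
# {32: (1, 2), 64: (1, 1), 2048: (1, 1), 4096: (1, 1)}
```

Only proteins present in the reference set are counted, which mirrors the approach described above of using the western blot data as the denominator for each expression level bin.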
With larger sets of reference proteins (234 and 1982, respectively), we again observed a systematic increase in the number of identified proteins as a function of mass spectrometry technology and fractionation depth. Interestingly, the number of protein identifications in the “extremely low signal (<50 copies per cell)” category more than doubled (from 62 to 134) when moving from the LCQ (236 fractions) to the LTQ-Orbitrap (101 fractions).
Finally, to identify overall bias in protein identification as a function of expression level, we re-plotted the data in Figure 4A as a relative percentage of all proteins identified by each approach (Figure 5). Not surprisingly, both the LCQ (236 fractions) and Orbitrap (19 fractions) data exhibited a bias toward highly expressed proteins. However, the combined data set (Orbitrap 19, 51, and 101 fractions) correlated very well with the western blot data across the majority of expression bins. Collectively, our results suggest that increased fractionation improved proteome coverage across the entire range of protein expression in yeast.
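Re-expressing per-bin counts as percentages of each data set's total, as in Figure 5, removes the effect of different overall identification numbers and exposes abundance bias. A minimal Python sketch, assuming per-bin counts have already been computed; the bin labels, counts, and function names are hypothetical.

```python
def relative_distribution(counts: dict) -> dict:
    """Convert per-bin counts to percentages of the data set total."""
    total = sum(counts.values())
    return {b: 100.0 * n / total for b, n in counts.items()}


def abundance_bias(ms_counts: dict, reference_counts: dict) -> dict:
    """Percentage-point difference between MS and reference distributions per bin.

    Positive values mean the bin is over-represented among MS identifications.
    Bins absent from the reference are ignored.
    """
    ms_pct = relative_distribution(ms_counts)
    ref_pct = relative_distribution(reference_counts)
    return {b: ms_pct.get(b, 0.0) - ref_pct[b] for b in ref_pct}


reference_counts = {"low": 50, "mid": 30, "high": 20}
ms_counts = {"low": 10, "mid": 20, "high": 20}
print(abundance_bias(ms_counts, reference_counts))
# {'low': -30.0, 'mid': 10.0, 'high': 20.0}
```

In this toy case the MS data over-represent high-abundance bins, the kind of bias described above for the LCQ and 19-fraction Orbitrap data; a data set that tracks the reference closely would give values near zero in every bin.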
While cross-lab comparisons based on protein lists are admittedly fraught with caveats, it is nonetheless informative to compare our results with other analytically focused studies designed to maximize the discovery capacity of LC-MS/MS. For example, one recent report identified 3313 unique proteins from a triplicate, 12-fraction offline SCX-RP experiment using 2 mg yeast lysate.19 Similarly, a combination of chromatographic and SDS-PAGE protein fractionation followed by LC-MS/MS (offline 3D) generated 75 fractions in total and identified 3639 proteins cumulatively across triplicate analyses from 1.5 mg cell lysate.18 Finally, a triplicate, 24-fraction experiment based on offline isoelectric focusing peptide fractionation (OFFGEL) followed by LC-MS/MS (offline IEF-RP) identified 3987 proteins from 0.3 mg cell lysate.18 Compared with our data, in which we detected over 4000 unique yeast proteins from only 10% of equivalent input material, these results strongly suggest that high-performance fractionation can significantly augment the performance of state-of-the-art mass spectrometry instrumentation. This observation is further corroborated by the data in Table 1, which compares the ability of our RP-SAX-RP platform to detect a set of low-abundance proteins (<128 copies per cell) that were recently quantified by targeted MRM-type mass spectrometry assays. Cumulatively, across our three analyses on the Orbitrap (19, 51, and 101 fractions), we detected 12 of the 15 proteins from approximately 2% of the input material used by Picotti et al.62
Although the above comparisons provide useful insight, a recent multi-lab report64 illustrated the utility of considering numerous metrics when evaluating discovery-based proteomic methods. For example, it is informative to consider overall analysis efficiency, or rate of data production, along with the total number of peptide and protein identifications. Toward this end, Table 2 provides detection efficiencies for the 2D/3D experiments on the Orbitrap (1, 19, 51, and 101 fractions) and LCQ (37, 40, and 236 fractions) instruments. These data demonstrate the inevitable compromise between experiment time and depth of proteome coverage, but nonetheless provide useful benchmarks for evaluating overall system performance. It is also worth noting that while the capillary format and online geometry of our nanoflow RP-SAX-RP fractionation platform enable fully automated LC-MS/MS analyses, there is a practical trade-off in terms of limited sample loading capacity. Although offline configurations accommodate larger analyte quantities, recent reports have noted that sample handling, lyophilization, and non-specific adsorption of peptides to tube surfaces can reduce sample recovery in these schemes. In addition, these losses tend to scale inversely with total input, so that the performance of offline systems can be limited for sample quantities below ≈50-100 μg.45-46 In contrast, we recently demonstrated that electrospray ionization efficiency compensates for chromatographic peak broadening at effluent flow rates below empirical Van Deemter minima,60 and suggested that a renewed focus on small-diameter (≤ 25 μm I.D.) LC-MS assemblies operated at low nL/min flow rates would provide significantly improved analytical figures of merit and complement the continued performance improvements of state-of-the-art mass spectrometry platforms.
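The exact definition of the detection efficiencies in Table 2 is not given in this section, so the following Python sketch assumes one plausible version: unique peptides per hour of acquisition and per fraction. It uses the approximate acquisition times quoted earlier in the text ("nearly 18 days" for the 236-fraction LCQ run, "approximately 2 days" for the first 19-fraction Orbitrap run); the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    label: str
    fractions: int
    unique_peptides: int
    acquisition_hours: float


def peptides_per_hour(exp: Experiment) -> float:
    """Unique peptide identifications per hour of LC-MS/MS acquisition."""
    return exp.unique_peptides / exp.acquisition_hours


def mean_peptides_per_fraction(exp: Experiment) -> float:
    """Average unique peptide identifications per fraction."""
    return exp.unique_peptides / exp.fractions


experiments = [
    Experiment("LCQ, 236 fractions", 236, 18_359, 18 * 24),      # "nearly 18 days"
    Experiment("Orbitrap XL, 19 fractions", 19, 8_929, 2 * 24),  # "approximately 2 days"
]
for exp in experiments:
    print(f"{exp.label}: {peptides_per_hour(exp):.0f} peptides/h, "
          f"{mean_peptides_per_fraction(exp):.0f} peptides/fraction")
```

With these rounded times the LCQ run produces roughly 42 unique peptides per hour versus roughly 186 for the Orbitrap run. This illustrates the trade-off between experiment time and depth of coverage discussed above; it is not a reproduction of Table 2.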
Here we successfully coupled our nanoflow 1D columns with capillary-format high-pH RP and SAX to provide an automated, 3-D separation system for analysis of complex proteomes. The capacity of our RP-SAX-RP platform to provide improved detection and dynamic range is illustrated in Figure 4 in which we observed a systematic improvement in detection of low abundance proteins as a function of mass spectrometry technology and fractionation depth.
Consistent with our experience using 25 μm I.D. analytical columns with integrated electrospray emitters in a 1-D nanoflow configuration60 or coupled with high-pH RP in an automated RP-RP configuration,55 our RP-SAX-RP platform has proven to be very robust, providing uninterrupted operation for more than 100 injections spanning several months of analysis time without a major system failure; in fact the majority of data described herein were acquired using one column set over a span of ≈3 months.
Despite the growing use of mass spectrometry in biomedical research, limited dynamic range continues to hinder the characterization of low-abundance proteins present in complex biological matrices. Multidimensional peptide fractionation42-43 improves data quality and facilitates interpretation of proteomic results in terms of biological function, although achieving an optimal combination of analytical figures of merit remains a significant challenge. Herein we build upon our recent work in high-performance 1-D60 and 2-D55 separations and describe an automated platform for 3-D RP-SAX-RP fractionation of tryptic peptides. Our observations are thematically aligned with other recent efforts65-71 and provide compelling evidence that capillary-based fractionation formats offer a powerful combination of automation, reproducibility, separation peak capacity, and sample yield.
The authors thank Eric Smith for assistance with the figures. Generous support for this work was provided by the Dana-Farber Cancer Institute and the National Institutes of Health, NHGRI (P50HG004233), and NINDS (P01NS047572).