We recently performed a detailed comparison of automated, online 2D RP-RP and SCX-RP fractionation platforms for analysis of peptides derived from complex cell lysates and affinity purified multi-component protein complexes.55 Given the high degree of orthogonality that we observed between reversed phase separations performed at high and low pH, we asked whether addition of a third dimension would provide significantly improved fractionation power. Based on the buffer conditions used in the 1st dimension of our RP-RP platform (20 mM ammonium formate, pH 10) we chose strong anion exchange (SAX) as the 2nd dimension in a (now) 3D RP-SAX-RP configuration (). Briefly, peptides are loaded through the autosampler (, dashed box) in ammonium formate buffer (pH 10) and captured on the 1st dimension RP column. Peptides poorly retained on reversed phase at high pH are trapped on either the 2nd (SAX, pH 10) or 3rd (RP, pH 3.0) dimension columns. The low pH pre- and analytical columns constitute the 3rd dimension separation and are assembled in a vented58-60 configuration to facilitate loading of samples or 1st (acetonitrile, 20 mM ammonium formate, pH 10) and 2nd (KCl in 20 mM ammonium formate, pH 10) dimension eluents, respectively. The system provides complete, online fractionation, meaning that once peptides are injected from the autosampler well onto the 1st dimension column, they are automatically transferred to the 2nd and 3rd dimensions and are only exposed to the fused silica tubing, PEEK LC fittings, and flow paths of the 6-port valves.
Our previous analytical characterization studies55 suggested that for LC-MS/MS analysis of E. coli peptides in a complex background of whole cell lysate, RP-RP performance appeared to peak at a depth of between 30 and 40 fractions. Based on this observation we asked whether RP-SAX-RP would provide for improved performance at a fractionation depth beyond which RP-RP previously exhibited diminishing returns. Towards this end we performed replicate RP-RP (40 fraction) and RP-SAX-RP (37 fraction) analyses of E. coli tryptic peptides. shows that while each fractionation technique provided reproducible data, the RP-SAX-RP platform identified 44% and 31% more peptides and proteins, respectively, as compared to RP-RP. As a surrogate analysis for separation peak capacity, we next asked whether the number of fractions across which discrete peptides were identified varied between the two separation platforms. Indeed, we observed that 75% of all peptide identifications spanned a single RP-SAX-RP fraction, while in the case of RP-RP () this figure of merit was reduced to 60%. Consistent with these data we observed that 34 of 37 RP-SAX-RP fractions contained between 100 and 300 unique peptide identifications (), suggesting that this 3-D platform provided for uniform fractionation of E. coli peptides across the entire separation space.
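The single-fraction figure of merit described above can be sketched in a few lines. This is a minimal illustration, not the analysis code used for the study: the function name `single_fraction_share`, the `(peptide, fraction)` input layout, and the toy sequences are all assumptions introduced here.

```python
from collections import defaultdict


def single_fraction_share(identifications):
    """Return the share of unique peptides identified in exactly one fraction.

    identifications: iterable of (peptide_sequence, fraction_index) pairs,
    one pair per peptide-spectrum match (hypothetical input layout).
    """
    fractions_by_peptide = defaultdict(set)
    for peptide, fraction in identifications:
        fractions_by_peptide[peptide].add(fraction)
    if not fractions_by_peptide:
        return 0.0
    # A peptide counts once, no matter how many spectra matched it.
    single = sum(1 for seen in fractions_by_peptide.values() if len(seen) == 1)
    return single / len(fractions_by_peptide)


# Hypothetical toy data: PEPTIDEB is split across fractions 2 and 3.
toy = [
    ("PEPTIDEA", 1), ("PEPTIDEA", 1),
    ("PEPTIDEB", 2), ("PEPTIDEB", 3),
    ("PEPTIDEC", 5),
    ("PEPTIDED", 7),
]
share = single_fraction_share(toy)
print(f"{share:.0%} of unique peptides confined to a single fraction")
```

A higher share indicates that peptides elute in narrower windows of the first two dimensions, which is why it serves as a proxy for peak capacity.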
Given the results above, we next asked whether RP-SAX-RP could provide sufficient fractionation depth to identify proteins expressed across a wide concentration dynamic range in the context of a shotgun, data-dependent type acquisition. In order to facilitate comparison of our results with other large-scale proteomic efforts, we switched to yeast as our model system. S. cerevisiae has a well-annotated genome and has been extensively studied at the whole proteome level using biochemical61 as well as shotgun18,43-44 mass spectrometry-based approaches. With these studies as a reference point, we next sought to establish an extreme limit for automated, online RP-SAX-RP fractionation. Towards this end we loaded 40 μg of peptides derived from yeast whole cell lysate and acquired 236 LC-MS/MS fractions on an LCQ Deca XP ion trap mass spectrometer (Supplementary Figure S1). This experiment required nearly 18 days of uninterrupted acquisition time and yielded 18,359 unique peptides (1% FDR) that mapped to 2902 proteins (see Supplemental Methods for details on protein inference). Although these data are based on somewhat dated mass spectrometry technology, it is nonetheless interesting to note that our results compare very favorably to a previous study that utilized 80 fractions of offline SCX fractionation coupled to LC-MS/MS on the same mass spectrometry platform (1504 proteins identified from 1 mg protein lysate44). This comparison suggests that continued improvements in fractionation can compensate for limitations imposed by the stochastic nature of data-dependent MS/MS acquisition. Consistent with this hypothesis, we observed a surprisingly uniform distribution of unique peptide identifications across all 236 RP-SAX-RP fractions.
Several proteins identified in this analysis have been previously characterized with expression levels at or below 128 copies per cell (), including YOR093C (41 copies/cell), YBL063W (49 copies/cell), and YJL084C (57 copies/cell).61 Annotated MS/MS spectra for representative peptides from these proteins are provided on our mzServer63 via embedded links in . For comparison, also lists low abundance proteins (≤ 128 copies per cell) recently characterized by triple quadrupole-based MRM assays.62 Interestingly, we identified 8 out of these 15 proteins in our 236 fraction RP-SAX-RP analysis, suggesting that efficient fractionation can offset, in part, the discrepancies in detection between discovery- and targeted-mode MS/MS analysis. Based on these data we estimate a detection limit of 500 amol (≈50 protein copies per cell, 7E6 cells analyzed) and a dynamic range of 3.2E4 (YKL096W-a 1.59E6 copies/cell, 7E6 cells analyzed) for data-dependent analysis of total tryptic peptides on a 3-D ion trap instrument.
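The arithmetic behind these estimates can be reproduced as a back-of-the-envelope check. The sketch below assumes complete recovery of every protein copy from every cell and one detectable molecule per copy; the helper names `amol_from_copies` and `dynamic_range` are introduced here for illustration only.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def amol_from_copies(copies_per_cell, cells_analyzed):
    """Convert a per-cell copy number into attomoles loaded on column.

    Assumes (for illustration) that all copies from all cells are recovered.
    """
    molecules = copies_per_cell * cells_analyzed
    return molecules / AVOGADRO * 1e18  # mol -> amol


def dynamic_range(max_copies_per_cell, min_copies_per_cell):
    """Ratio of the most to the least abundant detected protein."""
    return max_copies_per_cell / min_copies_per_cell


# Values quoted in the text: ~50 copies/cell, 7E6 cells, 1.59E6 copies/cell.
lcq_limit_amol = amol_from_copies(50, 7e6)
lcq_range = dynamic_range(1.59e6, 50)

print(f"detection limit ~ {lcq_limit_amol:.0f} amol")  # roughly 581 amol
print(f"dynamic range   ~ {lcq_range:.1e}")            # 3.2e+04
```

With exactly 50 copies per cell the calculation gives roughly 580 amol, which is consistent with the quoted 500 amol given that the copy number is itself approximate.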
Table 1 Detection of low expression level proteins. Each "Y" entry provides an embedded link to an annotated MS/MS spectrum via mzServer 63.
As an extension of the studies above, we next coupled our RP-SAX-RP separation system to a hybrid linear ion trap orbitrap (Orbitrap XL) mass spectrometer. As we expected significantly improved performance as compared to the 3-D ion trap instrument, we first reduced the sample quantity by nearly an order of magnitude (5 μg versus 40 μg), and then sought to establish the number of RP-SAX-RP fractions required to obtain yeast proteome coverage roughly equivalent to that observed with the LCQ platform (236 fractions, 18,359 unique peptides, 2902 proteins). We found that 19 fractions yielded 2809 protein identifications based on 8929 unique peptide sequences (, blue). Acquisition of these data required approximately 2 days of LC-MS/MS time. We next performed a second 19 fraction RP-SAX-RP analysis (, yellow) to establish a baseline for reproducibility of our separation platform. Here we identified 2856 proteins from 9021 peptides. Across both analyses we identified 2422 proteins in common (≈75% reproducibility) and 3243 proteins in total. We also observed good agreement between signals of individual peptides identified in both runs (), with ≈80% of all intensity ratios within ±2-fold of the mean value. We next tested whether the peptide content of individual fractions reproduced across these replicate runs by first requiring that peptide sequences were identified in the same fraction across both analyses; under these conditions, we observed that ≈48% of all peptides were reproducibly identified across discrete fractions in these two RP-SAX-RP analyses. However, the stochastic nature of MS/MS likely makes this result the lower bound of “reproducibility.” Hence we next used a more permissive approach which included as “reproduced” those MS-level precursor peaks that corresponded in mass (±10 ppm) and (third dimension) retention time (±0.5 minute) to peptide sequences identified in only one of the two analyses. Based on these criteria we found that potentially 98% of all identified peptides were reproducibly detected.
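The two reproducibility measures above lend themselves to a short sketch. The protein overlap follows from inclusion-exclusion on the quoted counts, and the permissive criterion is a simple tolerance test on mass and retention time. The function names, the tuple layout, and the toy peak values are hypothetical; only the counts and tolerances come from the text.

```python
def ppm_error(observed_mass, reference_mass):
    """Signed mass error in parts per million."""
    return (observed_mass - reference_mass) / reference_mass * 1e6


def is_reproduced(precursor, identified, ppm_tol=10.0, rt_tol_min=0.5):
    """True if an MS-level precursor peak matches an identified peptide.

    precursor, identified: (mass, retention_time_min) tuples (assumed layout).
    Tolerances default to the +/-10 ppm and +/-0.5 minute windows in the text.
    """
    mass_ok = abs(ppm_error(precursor[0], identified[0])) <= ppm_tol
    rt_ok = abs(precursor[1] - identified[1]) <= rt_tol_min
    return mass_ok and rt_ok


# Protein-level overlap from the counts quoted in the text.
run1, run2, common = 2809, 2856, 2422
total = run1 + run2 - common   # inclusion-exclusion: 3243
overlap = common / total       # about 0.75

# Hypothetical peaks: 8 ppm and 0.3 min away from the identified peptide.
identified_peptide = (1000.0000, 45.2)
print(is_reproduced((1000.0080, 45.5), identified_peptide))  # within both windows
print(is_reproduced((1000.0200, 45.2), identified_peptide))  # 20 ppm, rejected
```

Because the permissive test does not require an MS/MS identification in the second run, it estimates how often a peptide was present and detectable rather than how often it was selected for fragmentation.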
We next asked whether additional fractionation depth would provide higher proteome sequence coverage. In independent experiments we loaded 5 μg of yeast tryptic peptides and acquired MS/MS data from 19, 51, and 101 RP-SAX-RP fractions, respectively. Encouragingly, at each fractionation depth we obtained data that constituted a significant superset of the previous experiment, with overlaps of at least 70% and 92% at the peptide (Supplementary Figure S2A) and protein (Supplementary Figure S2B) level, respectively. Moreover, we observed that at least 62% of unique peptide identifications were confined to a single RP-SAX-RP fraction, even for the most in-depth analysis (101 fractions, Supplementary Figure S2C). As with the experiments performed on the low resolution instrument (Supplementary Figure S1) we readily detected low abundance proteins (≈50 copies per cell) at each fractionation depth, albeit at total acquisition times (e.g., number of fractions) far less than were required on the 3D ion trap (see embedded links in for MS/MS spectra of representative peptides). Based on the sample quantity consumed per analysis (5 μg) we estimate a detection limit of 65 amol and a dynamic range of 3.2E4 (YKL096W-a 1.59E6 copies/cell, 8E5 cells analyzed) for data-dependent analysis of total yeast tryptic peptides on an LTQ-Orbitrap XL. Collectively these results suggest that our 3D RP-SAX-RP platform provides for reproducible separation and sufficient peak capacity to enable identification of low abundance proteins in complex mixtures on low- and high-performance mass spectrometers.
For the sample quantity analyzed on the LTQ-Orbitrap (5 μg) it appeared that our 3D separation platform reached a point of diminishing returns between 50 and 100 fractions, as we only identified an additional 184 unique proteins (1377 unique peptides) by extending the analysis to 101 fractions. To determine whether additional fractionation primarily impacted protein sequence coverage (e.g., more peptides identified per protein) or protein dynamic range (e.g., identification of proteins at low expression levels), we next plotted our fractionation data from Supplementary Figures S1 and S2 along with that from a recent large-scale biochemical analysis61 of protein expression in yeast (). We also included the combined set of unique protein identifications resulting from our 19, 51, and 101 fraction experiments. For each expression level bin, we used the quantitative western blot data as a reference and then plotted the subset of these proteins identified in our RP-SAX-RP analyses. Between a range of ≈60 and ≈4,000 protein copies per cell, we observed a consistent trend in which the number of protein identifications increased systematically as a function of mass spectrometry technology (LCQ vs LTQ-Orbitrap) and fractionation depth (19, 51, and 101 fractions, or the combination thereof). However, at the lowest expression level (, left, ≈30-60 copies per cell) the mass spectrometry data appeared somewhat stochastic, suggesting that either this range represented the practical detection limit under the conditions tested, and hence fractionation provided little benefit, or the number of proteins (n=9) quantified by western blot was too small to serve as a reference. To explore this question further we next plotted two other expression categories from Ghaemmaghami et al.,61 “extremely low signal (<50 copies per cell)” and “no expression detected,” along with the corresponding proteins detected in our RP-SAX-RP analyses (). With larger sets of reference proteins (234 and 1982 for each category, respectively) we again observed a systematic increase in the number of identified proteins as a function of mass spectrometry technology and fractionation depth. Interestingly, we observed that the number of protein identifications more than doubled (from 62 to 134) for the “extremely low signal (<50 copies per cell)” category moving from the LCQ (236 fractions) to the LTQ-Orbitrap (101 fractions).
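The per-bin comparison described above amounts to counting, within each expression bin of the reference set, how many reference proteins were also identified by LC-MS/MS. The following sketch shows one way to do that. The bin edges, the protein labels `P1` to `P6`, and the function name `count_by_bin` are hypothetical placeholders and are not taken from the published figure.

```python
from bisect import bisect_right


def count_by_bin(reference_copies, identified, bin_edges):
    """Count reference and identified proteins in each expression bin.

    reference_copies: dict mapping protein -> copies per cell (reference set).
    identified: set of proteins detected in a given LC-MS/MS experiment.
    bin_edges: ascending edges; bin i covers [bin_edges[i], bin_edges[i + 1]).
    Returns a list of (n_reference, n_identified) tuples, one per bin.
    """
    counts = [[0, 0] for _ in range(len(bin_edges) - 1)]
    for protein, copies in reference_copies.items():
        i = bisect_right(bin_edges, copies) - 1
        if 0 <= i < len(counts):  # ignore proteins outside the binned range
            counts[i][0] += 1
            if protein in identified:
                counts[i][1] += 1
    return [tuple(pair) for pair in counts]


# Hypothetical reference abundances and identifications.
reference = {"P1": 41, "P2": 49, "P3": 57, "P4": 100, "P5": 200, "P6": 5000}
detected = {"P1", "P4", "P5", "P6"}
edges = [32, 64, 128, 256]

for (lo, hi), (n_ref, n_id) in zip(zip(edges, edges[1:]),
                                   count_by_bin(reference, detected, edges)):
    print(f"{lo}-{hi} copies/cell: {n_id} of {n_ref} reference proteins identified")
```

Dividing each identified count by the total number of proteins identified in that experiment, rather than by the reference count, gives the relative-percentage view used to look for abundance bias.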
Figure 4 (A, B) Proteome coverage for S. cerevisiae as a function of fractionation depth (19, 51, 101, 236 fractions) and mass spectrometry platform (LCQ Deca or LTQ-Orbitrap XL) as compared to biochemical-based quantification of protein expression.61 Western blot ...
Finally, in order to identify overall bias in protein identification as a function of expression level, we re-plotted the data in as a relative percentage of all proteins identified by each approach (). Not surprisingly, both the LCQ (236 fractions) and Orbitrap (19 fractions) data exhibited a bias towards highly expressed proteins. However, the combined data set (Orbitrap 19, 51, and 101 fractions) correlated very well with the western blot data across the majority of expression bins. Collectively, our results suggest that increased fractionation provided for improved proteome coverage across the entire range of protein expression in yeast.
Figure 5 Expression histogram as in , plotted as a percentage of the total protein count detected in each experiment. TAP-GFP data (orange) serves as the reference protein set in each expression bin. The x-axis is labeled based on absolute expression ...
While cross-lab comparisons based on protein lists are admittedly fraught with caveats, it is nonetheless informative to compare our results with other analytically-focused studies designed to maximize the discovery capacity of LC-MS/MS. For example, one recent report identified 3313 unique proteins in the context of a triplicate, 12 fraction off-line SCX-RP experiment from 2 mg yeast lysate.19 Similarly, a combination of chromatographic and SDS-PAGE protein fractionation, followed by LC-MS/MS (off-line 3D), generated 75 fractions in total and identified 3639 proteins cumulatively across triplicate analyses from 1.5 mg cell lysate.18 Finally, a triplicate, 24 fraction experiment based on offline isoelectric focusing peptide fractionation (OFFGEL) followed by LC-MS/MS (off-line IEF-RP) identified 3987 proteins from 0.3 mg cell lysate.18 When compared to our data, in which we detected over 4000 unique yeast proteins from only 10% of equivalent input material, these results strongly suggest that high performance fractionation can significantly augment the performance capabilities of state-of-the-art mass spectrometry instrumentation. This observation is further corroborated by the data in , which compares the ability of our RP-SAX-RP platform to detect a set of low abundance proteins (<128 copies per cell) that were recently quantified by targeted MRM-type mass spectrometry assays. Cumulatively across our 3 analyses on the Orbitrap (19, 51, and 101 fractions) we detected 12 out of the 15 proteins, from approximately 2% of the input material as compared to the work of Picotti et al.62
Although the above comparisons provide useful insight, a recent, multi-lab report64 illustrated the utility of considering numerous metrics when evaluating discovery-based proteomic methods. For example, it is informative to consider the overall analysis efficiency or rate of data production along with the total number of peptide and protein identifications. Towards this end, provides detection efficiencies for the 2D/3D experiments on the Orbitrap (1, 19, 51, and 101 fractions) and the LCQ (37, 40, and 236 fractions) instruments, respectively. These data demonstrate the inevitable compromise between experiment time and the depth of proteome coverage, but nonetheless provide useful benchmarks for evaluation of overall system performance. It is also worth noting that while the capillary format and online geometry of our nanoflow RP-SAX-RP fractionation platform provide for fully automated LC-MS/MS analyses, there is a practical trade-off in terms of limited sample loading capacity. While it is true that offline configurations provide for analysis of larger analyte quantities, recent reports have noted that sample handling, lyophilization, and non-specific adsorption of peptides to tube surfaces can often reduce sample recovery in these schemes. In addition, these deleterious effects tend to scale inversely with total input, meaning that the performance of offline systems can be limited for sample quantities below ≈50-100 μg.45-46 In contrast, we recently demonstrated that electrospray ionization efficiency compensates for chromatographic peak broadening at effluent flow rates below empirical Van Deemter minima,60 and suggested that a renewed focus on small diameter (≤ 25 μm I.D.) LC-MS assemblies operated at low nL/min flow rates would provide significantly improved analytical figures of merit and complement the continued performance improvements available on state-of-the-art mass spectrometry platforms. Here we successfully coupled our nanoflow 1D columns with capillary-format high-pH RP and SAX to provide an automated, 3-D separation system for analysis of complex proteomes. The capacity of our RP-SAX-RP platform to provide improved detection and dynamic range is illustrated in in which we observed a systematic improvement in detection of low abundance proteins as a function of mass spectrometry technology and fractionation depth.
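The compromise between experiment time and depth of coverage can be made concrete with a simple rate calculation. The sketch below defines efficiency as unique peptide identifications per hour of acquisition, which is an assumption made here and may differ from the metric tabulated in this work. The inputs are the approximate figures quoted earlier (nearly 18 days for 236 LCQ fractions, approximately 2 days for 19 Orbitrap fractions), so the outputs are rough estimates only.

```python
def ids_per_hour(unique_ids, acquisition_days):
    """Unique identifications per hour of acquisition (assumed definition)."""
    return unique_ids / (acquisition_days * 24.0)


# Approximate acquisition times as quoted in the text.
lcq_rate = ids_per_hour(18359, 18)   # 236 fractions on the LCQ Deca XP
orbi_rate = ids_per_hour(8929, 2)    # 19 fractions on the Orbitrap XL

print(f"LCQ, 236 fractions:     ~{lcq_rate:.0f} unique peptides per hour")
print(f"Orbitrap, 19 fractions: ~{orbi_rate:.0f} unique peptides per hour")
print(f"rate ratio:             ~{orbi_rate / lcq_rate:.1f}x")
```

The same helper applies to protein-level counts, and comparing rates across fractionation depths on a single instrument shows where additional fractions begin to yield diminishing returns.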
Performance metrics for 2D (RP-RP) and 3D (RP-SAX-RP) LC-MS/MS platforms.
Consistent with our experience using 25 μm I.D. analytical columns with integrated electrospray emitters in a 1-D nanoflow configuration60 or coupled with high-pH RP in an automated RP-RP configuration,55 our RP-SAX-RP platform has proven to be very robust, providing uninterrupted operation for more than 100 injections spanning several months of analysis time without a major system failure; in fact the majority of data described herein were acquired using one column set over a span of ≈3 months.