We report the optimization of a common LC/MS/MS platform to maximize the number of proteins identified from a complex biological sample. The platform uses digested yeast lysate on a 75 μm internal diameter × 12 cm reverse-phase column that is combined with an LTQ-Orbitrap mass spectrometer. We first generated a yeast peptide mix that was quantified by multiple methods including the strategy of stable isotope labeling with amino acids in cell culture (SILAC). The peptide mix was analyzed on a highly reproducible, automated nanoLC/MS/MS system with systematic adjustment of loading amount, flow rate, elution gradient range and length. Interestingly, the column was found to be almost saturated by loading ~1 μg of the sample. Whereas the optimal flow rate (~0.2 μl/min) and elution buffer range (13–32% of acetonitrile) appeared to be independent of the loading amount, the best gradient length varied according to the amount of samples: 160 min for 1 μg of the peptide mix, but 40 min for 10 ng of the same sample. The effect of these parameters on elution peptide peak width is evaluated. After full optimization, 1,012 proteins (clustered in 806 groups) with an estimated protein false discovery rate of ~3% were identified in 1 μg of yeast lysate in a single 160-min LC/MS/MS run.
In the last decade, mass spectrometry has emerged as a central proteomics technology in the post-genomic era. Shotgun (bottom-up) proteomics is the most commonly used platform for analyzing proteins and posttranslational modifications1–3. In a typical protocol, simple or complex protein samples are digested by proteases (e.g. trypsin) to generate peptides that are further analyzed by reverse-phase liquid chromatography coupled with tandem mass spectrometry (LC/MS/MS). The MS/MS spectra are then searched against protein databases, resulting in protein identification and determination of post-translational modification sites. Additional strategies such as label-free and stable isotope labeling methods are implemented to obtain quantitative data4. Despite rapid development, current LC/MS/MS platforms still lack the sensitivity and throughput to detect all proteins from mammalian cells in a single experiment. To achieve successful analyses of complex protein samples, it is important to maximize protein/peptide analytic power by optimizing liquid chromatography and mass spectrometry settings.
Liquid chromatography of peptides prior to MS is usually achieved on a reverse-phase column, which offers high-resolution separation capacity and utilizes mobile phase solvents compatible with electrospray ionization. As LC efficiency increases with smaller internal diameters and longer columns, detection sensitivity has been greatly improved by the development of online microcapillary LC with internal diameters of less than 150 μm. A common LC platform includes a standard HPLC, a flow splitter, and a 75 μm I.D. × 12 cm reverse-phase column3. Further decreasing the column I.D. and increasing the column length are possible4, but “ultra-high-pressure” LC systems would be required to overcome the resulting back pressure and deliver solvent at optimum column flow rates5. In addition, chromatography peak capacity is also influenced by LC elution gradients and analysis time6–8.
Recently, the development of the LTQ-Orbitrap hybrid mass spectrometer has offered high-resolution precursor ion scans in the Orbitrap and sensitive, rapid acquisition of MS/MS scans in the LTQ. Compared to a 3D ion trap, the LTQ confines ions in a 2D radiofrequency field, providing higher storage capacity, faster scan rate and better detection efficiency9. The Orbitrap captures ions by orbital trapping, with electrostatic fields generated by central and outer electrodes10, 11. The ions move in spirals around the central electrode and oscillate along the z-axis. The axial oscillation of the ions is independent of initial energy, directions and positions, and is recorded as an image current. The image current is then converted to ion frequencies by Fourier transform, leading to highly accurate measurements of m/z values over a large dynamic range12. By combining the advantages of both the LTQ and the Orbitrap, the hybrid instrument has been demonstrated to be a powerful tool for proteomic studies13–16.
Although microcapillary LC parameters have been extensively studied with respect to peak capacity, there is no detailed report on how to adjust sample loading and LC parameters to optimize protein identification, especially in the context of complex mixtures using the recently developed LTQ-Orbitrap mass spectrometer. Here we used a step-wise protocol to perform a series of optimizations on numerous parameters, which are described in a shotgun proteomic study using a complex biological sample (i.e. yeast lysate) across more than 50 LC/MS/MS runs.
A yeast strain, SUB592 (ref. 17), was grown in YPD medium at 30°C to early log phase (A600 = 1.0) and extracted in lysis buffer (10 mM Tris-HCl, pH 8.0, 0.1 M NaH2PO4, 8 M urea, 0.02% SDS and 10 mM β-mercaptoethanol). The SILAC analysis was performed following a protocol similar to that described18. The isogenic yeast strain JMP025 was generated with lys2 and arg4 gene deletions. The strain was grown in heavy synthetic medium (0.7% Difco yeast nitrogen base, 2% dextrose, supplemented with adenine, uracil, and amino acids plus 12 mg/L [13C6, 15N4] Arg and 18 mg/L [13C6] Lys; Cambridge Isotope Laboratories, Andover, MA) for >8 generations until A600 was ~0.7. The cells were then harvested and lysed in the same lysis buffer.
Protein concentration of yeast lysate was measured by a standard BCA protein assay kit (Thermo Scientific, Rockford, IL) and by a Coomassie stained SDS gel. In the gel analysis, proteins were concentrated on a very short 9% SDS gel (~2 mm long), stained with Coomassie Blue G250, and quantified by Scion Image (http://rsb.info.nih.gov/nih-image/). In both methods, bovine serum albumin (BSA) was used as standard.
The lysate (2 mg protein) was reduced with 10 mM DTT at 37°C for 30 min and alkylated with 50 mM iodoacetamide (IAA) in the dark at room temperature for 30 min. The sample was then diluted to 2 M urea with buffer (5% AcN in 50 mM NH4HCO3), and digested with trypsin (40 μg) at 37°C overnight. The resulting peptide solution was cleaned with a Vydac Bioselect 218 SPE1000 C18 cartridge (Chrom Tech, Apple Valley, MN), dried and dissolved with sample loading buffer (6% acetic acid, 0.005% heptafluorobutyric acid [HFBA], 0.1% TFA, and 5% AcN). The SILAC-labeled sample was processed under the same conditions.
A hybrid LTQ-Orbitrap MS (Thermo Scientific) equipped with an Agilent 1100 binary HPLC (Agilent Technologies, Palo Alto, CA), a Famos autosampler (LC Packings, San Francisco, CA), and a 75 μm I.D. × 12 cm fused-silica capillary column was used for all runs. The column was packed with C18 resins (5 μm magic C18AQ; pore size, 200 Å; Michrom Bioresources, Auburn, CA). Column flow rate was measured at least 3 times with calibrated 5 μl micropipets (VWR, West Chester, PA). Peptide samples were loaded onto the column by the autosampler, and eluted by a designed gradient (buffer A, 0.4% acetic acid, 0.005% HFBA, and 5% AcN; buffer B, 0.4% acetic acid, 0.005% HFBA, and 95% AcN). The eluted peptides were detected in a precursor MS scan by Orbitrap (400–1600 m/z, 60,000 resolution at m/z 400, 1 μscan, and 1 × 10⁶ for automatic gain control), followed by sequential data-dependent MS/MS scans of the ten most abundant ions (minimal ion intensity of 500 counts, isolation width of 2 m/z, 35% normalized collision energy, 1 μscan, target value of 5,000 for automatic gain control, 60 sec dynamic exclusion, preview mode enabled, removal of 1+ ions or ions with unassigned charge state, and selection of 2+, 3+, and 4+ ions). When the LTQ was used as the survey scan MS analyzer, all parameters were the same except that high resolution, the preview mode function, and charge state selection were not used.
The MS/MS spectra were searched by the Sequest-Sorcerer algorithm on a Sorcerer 2 IDA (Sage-N-Research)19 against a composite target/decoy database to estimate false discovery rate20. The target proteins included yeast proteins (from www.standford.edu/saccharomyces) and common contaminants, such as porcine trypsin and human keratins. The decoy proteins were generated from pseudo-reversed sequences of all target proteins21. Searching parameters consisted of semi-tryptic restriction, fixed modification of Cys (+57.0215 Da, alkylation by iodoacetamide), and dynamic modification of oxidized Met (+15.9949 Da). Mass tolerance was set to ±20 ppm. For SILAC analysis, dynamic modifications of Arg (+10.0083 Da) and Lys (+6.0201 Da) were included. Only b and y ions were considered during the database match.
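The ±20 ppm mass tolerance can be made concrete with a short calculation. The following is a minimal sketch, not part of the search software: the function names are hypothetical, and the tolerance is assumed to be applied symmetrically around the theoretical value.

```python
def ppm_window(mz, tol_ppm=20.0):
    """Return the (low, high) bounds of a symmetric ppm window around mz."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def within_tolerance(observed, theoretical, tol_ppm=20.0):
    """True if the observed value lies within tol_ppm of the theoretical one."""
    error_ppm = (observed - theoretical) / theoretical * 1e6
    return abs(error_ppm) <= tol_ppm


if __name__ == "__main__":
    # At m/z 500, a 20 ppm tolerance corresponds to +/- 0.01 m/z units.
    low, high = ppm_window(500.0)
    print(f"{low:.4f} to {high:.4f}")
```

Because the tolerance is relative, the absolute window widens with increasing m/z.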
Peptide matches were filtered by a minimal peptide length of 6 amino acids21, then grouped by trypticity (only accept fully and partially tryptic peptides) and charge states20. In each group, the peptide matches were further filtered by dynamically increasing XCorr and ΔCn cutoffs until the global protein false discovery rate was ~3%7. While effectively removing false matches, the procedure recovered the vast majority (93.2 ± 2.6%) of estimated true MS/MS matches (also named spectral counts, Table 1). The filtering procedure also led to consistent results from technical replicates (Table 1).
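The target/decoy filtering idea can be illustrated with a simplified sketch. It is not the procedure used here: it assumes a single score per match (whereas XCorr and ΔCn were both adjusted, within trypticity and charge-state groups), it works at the match level rather than the protein level, and it uses the decoy/target ratio as one common FDR estimate. All names are hypothetical.

```python
def estimate_fdr(n_target, n_decoy):
    """Estimate FDR as decoy hits divided by target hits (an assumed convention)."""
    if n_target <= 0:
        return 0.0
    return n_decoy / n_target


def loosest_cutoff(matches, max_fdr=0.03):
    """Find the lowest score cutoff whose accepted set keeps the estimated FDR <= max_fdr.

    matches: iterable of (score, is_decoy) pairs.
    Returns the cutoff score, or None if no cutoff satisfies the limit.
    """
    ordered = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    best = None
    for score, is_decoy in ordered:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Accepting everything scoring >= `score` gives this running estimate.
        if targets and estimate_fdr(targets, decoys) <= max_fdr:
            best = score
    return best


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    demo = [(5.0, False), (4.0, False), (3.0, True), (2.0, False)]
    print(loosest_cutoff(demo, max_fdr=0.0))
```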
When matching filtered peptides to proteins, we assigned the proteins sharing the same peptide(s) to one group, in which the protein with the highest number of peptide matches was selected to represent the group. For simplicity, we used the total number of identified proteins for comparison during the optimization of LC settings. After optimization, both the identified proteins and protein groups were reported for the analysis of total yeast cell lysate by the LC/MS/MS run. Some of the accepted peptides and spectra are attached (see supplemental Tables S3 and S4).
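The grouping rule can be sketched as a connected-components problem. The following is a sketch under stated assumptions rather than the actual implementation: it assumes that sharing is transitive (if A shares a peptide with B, and B with C, all three form one group), it counts distinct peptides rather than spectral matches, and it breaks ties alphabetically. The function name and the input layout are hypothetical.

```python
from collections import defaultdict


def group_proteins(peptide_to_proteins):
    """Group proteins linked by shared peptides; return {representative: members}."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    counts = defaultdict(int)  # distinct peptides matched per protein
    for proteins in peptide_to_proteins.values():
        proteins = list(proteins)
        for p in proteins:
            find(p)
            counts[p] += 1
        # Link every protein carrying this peptide to the first one.
        for p in proteins[1:]:
            root_a, root_b = find(proteins[0]), find(p)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(list)
    for p in list(parent):
        groups[find(p)].append(p)

    # Representative: most peptide matches, ties broken by name (assumption).
    result = {}
    for members in groups.values():
        rep = min(members, key=lambda p: (-counts[p], p))
        result[rep] = sorted(members)
    return result


if __name__ == "__main__":
    # Hypothetical peptide and protein identifiers.
    demo = {"PEPA": ["P1", "P2"], "PEPB": ["P1"], "PEPC": ["P3"]}
    print(group_proteins(demo))
```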
To estimate the peptide recovery for C18 cartridge cleanup, 0.5% of the input and of the eluate was taken and each was mixed with an equal amount of SILAC-labeled peptides (derived from 1 μg of total protein). The two samples were analyzed by LC/MS/MS and the SILAC-labeled peptides were used as internal standards to evaluate peptide recovery. The detailed quantification methods are described elsewhere18.
We used a highly complex biological sample from yeast to perform the optimization study (Figure 1A). Total protein was extracted from yeast cells using urea and SDS, and quantified by two independent methods. First, we used a standard BCA assay in which Cu2+ is reduced to Cu1+ by proteins in an alkaline medium and the reduced Cu1+ selectively forms an intense purple complex with bicinchoninic acid to allow colorimetric quantification22. In six repeated analyses, the detected concentration was 4.2 ± 0.1 μg/μl using BSA as standard. To account for possible interference from buffer chemicals, we measured the protein concentration again based on Coomassie-stained SDS gel images (Figure 1B). As the interfering chemicals are removed after gel electrophoresis, the Coomassie dye only interacts with positively charged residues in proteins23. To minimize quantification errors, we ran a short gel to compress all proteins into a 2-mm range. The dye absorbance signal was linear with titrated BSA concentration (R² = 0.986) in all three replicates (Figure 1B), and the protein concentration of the cell lysate was measured to be 4.5 ± 0.1 μg/μl, consistent with the BCA result. Finally, the averaged concentration (4.35 μg/μl) was used for subsequent assays.
Yeast proteins (2.0 mg) were then reduced, alkylated, digested in solution, and desalted by a C18 cartridge. To evaluate peptide recovery during the desalting step, we used SILAC-labeled heavy peptides as internal standards to quantify >100 abundant peptides in the input and eluate24. For instance, one peptide (NVPLYQHLADLSK) had a relative intensity of 1.2 before desalting and 1.1 after desalting when compared to the heavy standard (Figure 1C). Thus, the recovery of this peptide was ~91.7% (1.1/1.2). From the recovery rates of 116 different peptides, we calculated the mean peptide recovery (73.7 ± 15%, Figure 1D) and used it to estimate the total peptide amount for LC loading.
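The recovery arithmetic can be written out as a short sketch. The per-peptide calculation follows the worked example in the text (1.1/1.2). The use of the sample standard deviation for the summary, and the scaling of the input protein amount by the mean recovery, are assumptions, and the function names are hypothetical.

```python
from statistics import mean, stdev


def peptide_recovery(ratio_before, ratio_after):
    """Recovery of one peptide from its light/heavy ratios before and after desalting."""
    return ratio_after / ratio_before


def summarize_recovery(recoveries):
    """Mean and sample standard deviation of per-peptide recoveries."""
    return mean(recoveries), stdev(recoveries)


def estimated_peptide_amount(protein_amount, mean_recovery):
    """Scale the input protein amount by the mean recovery (assumes complete digestion)."""
    return protein_amount * mean_recovery


if __name__ == "__main__":
    # Worked example from the text: NVPLYQHLADLSK, 1.2 before and 1.1 after.
    print(f"{peptide_recovery(1.2, 1.1):.1%}")
    # 2.0 mg of protein at the reported mean recovery of 73.7%.
    print(f"{estimated_peptide_amount(2.0, 0.737):.3f} mg")
```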
Since reproducibility of the LC system is a prerequisite for reliable comparison of different runs with varying parameters, we tested run-to-run variation by repeated LC/MS/MS runs. A peptide mixture (equivalent to 1 μg of yeast lysate) was analyzed four times on a 75 μm I.D. × 12 cm reverse-phase column using the same parameter settings. Base peak profiles for the replicates were almost identical (Figure 2A) and retention time shifts of the same peptide ions were usually less than 1 min. After database search and filtering, the four runs resulted in highly consistent numbers of accepted spectral counts, peptides and proteins, with relative standard deviations of 2.8%, 2.4% and 1.5%, respectively (Figure 2B). The data strongly support the high reproducibility of the automated LC/MS/MS system used in this study. The same reverse-phase column was used for the entire optimization process and column degradation was not observed after more than 200 runs (data not shown).
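The relative standard deviation used to summarize replicate runs can be computed as follows. This is a minimal sketch: the text does not state whether the sample or the population standard deviation was used, so the sample form is assumed here, and the replicate counts in the example are hypothetical rather than taken from Figure 2B.

```python
from statistics import mean, stdev


def relative_std_dev(values):
    """Relative standard deviation in percent (sample standard deviation / mean)."""
    return stdev(values) / mean(values) * 100.0


if __name__ == "__main__":
    # Hypothetical protein counts from four replicate runs.
    replicate_counts = [690, 700, 705, 710]
    print(f"RSD = {relative_std_dev(replicate_counts):.1f}%")
```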
First, we examined the effect of peptide loading amount on protein identification (Figure 3A). When peptide samples were titrated from 10 ng to 1 μg on the column, the number of identified proteins increased from 395 to 699. Further increasing the loading amount to 4 μg resulted in only a 6% increase in identified proteins. The titration curve suggests that the LC/MS/MS system was saturated at around 1 μg. Similar results were obtained by analyzing the accepted spectral counts and peptide numbers (supplemental Table S1). Whereas loading a higher amount of peptides could raise ion intensity, it also led to ion peak broadening that may suppress adjacent co-eluting ions. In the example of an abundant peptide of the TEF2 protein (IGGIGTVPVGR) in the 10 ng run versus the 4 μg run (400-fold more loading), the ion signal increased ~200-fold, and the peak width at half height broadened from 0.22 min to 0.45 min (Figure 3B). The loading saturation of the column was indicated by a retention time shift from 40.0 min in the 10 ng run to 27.7 min in the 4 μg run, because the peptide may be pushed forward on the column by competitive binding of more hydrophobic peptides in the 4 μg run. In addition to this abundant peptide, we analyzed the peak width distribution of all accepted peptides and found that the majority of the data could be roughly fitted to a Gaussian curve (Figure 3C). The mean values of the Gaussian fits clearly indicate a global shift of peak width from 0.12 min (10 ng loading) to 0.18 min (4 μg loading), suggesting the occurrence of peak broadening. It should be mentioned that most peptide peaks were narrower than that of the abundant TEF2 peptide (Figure 3B). In this analysis, 1 μg of peptides on the 75 μm I.D. × 12 cm column represented a reasonable balance between sensitivity and ion suppression, and thus was used for the following analyses unless specified otherwise (Figure 3A). This saturation point is expected to be proportional to the amount of resin in the column and may vary with the properties of the selected reverse-phase resins.
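Peak width at half height can be estimated from an extracted ion chromatogram by locating the two half-maximum crossings. The sketch below is illustrative only and is not the software used in this study. It assumes a single peak whose signal falls below half of the apex intensity on both sides, and it interpolates linearly between sampled points. The function names are hypothetical.

```python
def _crossing(t0, y0, t1, y1, level):
    """Time at which a straight line through (t0, y0) and (t1, y1) reaches `level`."""
    if y1 == y0:
        return t0
    return t0 + (level - y0) * (t1 - t0) / (y1 - y0)


def peak_width_at_half_height(times, intensities):
    """Full width at half maximum of the tallest peak in a sampled trace."""
    apex = max(range(len(intensities)), key=lambda i: intensities[i])
    half = intensities[apex] / 2.0

    # Walk outward from the apex until the signal is at or below half height.
    left = apex
    while left > 0 and intensities[left] > half:
        left -= 1
    right = apex
    while right < len(intensities) - 1 and intensities[right] > half:
        right += 1

    t_left = times[left]
    if left < apex:
        t_left = _crossing(times[left], intensities[left],
                           times[left + 1], intensities[left + 1], half)
    t_right = times[right]
    if right > apex:
        t_right = _crossing(times[right - 1], intensities[right - 1],
                            times[right], intensities[right], half)
    return t_right - t_left


if __name__ == "__main__":
    # Hypothetical triangular peak sampled every minute.
    print(peak_width_at_half_height([0, 1, 2, 3, 4], [0, 50, 100, 50, 0]))
```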
Second, we tested the effect of flow rate on capillary LC column performance (Figure 4). For 1 μg of loaded peptides, when the flow rate was varied from 1.1 μl/min to 0.15 μl/min, the best result was achieved at flow rates between 0.15 μl/min and 0.25 μl/min. A similar optimal flow rate of 0.25 μl/min was found when 50 ng of peptides was loaded (Figure 4). This was not unexpected, as a slower flow rate results in more concentrated eluates and increased sensitivity. A further decrease of the flow rate to 0.1 μl/min, however, worsened the results, which may be due to unstable electrospray. In our setting, the voltage was applied on a four-way tee for buffer splitting located ~20 cm away from the column tip3. The ionization was likely influenced by the flow rate. Furthermore, slower flow rates were associated with longer delays of elution (e.g. 20 min at 0.1 μl/min) because of the dead volume (~2 μl). Therefore, we fixed the flow rate at 0.20 μl/min for this 75 μm I.D. column in subsequent runs.
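The elution delay quoted above follows directly from dividing the dead volume by the flow rate. A minimal sketch of that arithmetic, with a hypothetical function name and assuming that the delay is governed solely by the ~2 μl dead volume:

```python
def elution_delay_min(dead_volume_ul, flow_rate_ul_per_min):
    """Time in minutes for the gradient to traverse the dead volume."""
    if flow_rate_ul_per_min <= 0:
        raise ValueError("flow rate must be positive")
    return dead_volume_ul / flow_rate_ul_per_min


if __name__ == "__main__":
    # Flow rates discussed in the text, with the ~2 ul dead volume.
    for flow in (0.1, 0.2, 1.1):
        print(f"{flow} ul/min -> {elution_delay_min(2.0, flow):.1f} min")
```

At the selected flow rate of 0.20 μl/min, the same dead volume corresponds to a delay of about 10 min.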
Third, we optimized the LC gradient range to fully utilize peptide identification power. As identifiable peptides were not equally distributed during the LC elution, it was desirable to expand the range within which most of the peptides were eluted. We performed a test run with 5–35% of buffer B in 45 min and found that 97% of the identified peptides eluted between 9% and 30% of buffer B, equivalent to 13–32% of acetonitrile (Figure 5). Thus, we set the gradient range to 9–30% of buffer B for this LC system.
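The conversion between percent buffer B and percent acetonitrile follows from the buffer compositions given in the methods (5% AcN in buffer A, 95% AcN in buffer B). The sketch below assumes ideal volumetric mixing. The function name is hypothetical.

```python
def acetonitrile_percent(percent_b, acn_in_a=5.0, acn_in_b=95.0):
    """Acetonitrile content (%) of a mixture containing percent_b of buffer B."""
    frac_b = percent_b / 100.0
    return acn_in_a * (1.0 - frac_b) + acn_in_b * frac_b


if __name__ == "__main__":
    # Gradient limits from the text: 9% and 30% buffer B.
    for b in (9, 30):
        print(f"{b}% B -> {acetonitrile_percent(b):.1f}% AcN")
```

With these compositions, 9% B corresponds to 13.1% acetonitrile and 30% B to 32.0%, matching the 13–32% range stated above.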
Fourth, we varied the elution time from 10 to 320 min to analyze 1 μg of peptides. The titration curve was not linear and started to plateau at the 160-min time point, with 1,012 proteins identified. The 320-min run provided limited benefit, with only 79 more proteins identified (Figure 6). We further tested different loading amounts (200 ng and 50 ng) and found the same plateau at around 160 min of gradient length (Figure 6). This phenomenon could be explained by two effects of increased gradient length: (i) the longer analysis time allows more MS/MS scans, and (ii) ion peaks may be broadened, with lower ion intensity. For example, when 50 ng of peptides was analyzed in a series of runs with elution times from 10 min to 320 min, the peak width of the TEF2 peptide (IGGIGTVPVGR) increased from 0.14 min to 0.90 min, whereas the peak height dropped from 100% to 16% (Figure 7A). The peak broadening caused by long gradient elution was also illustrated by the global peak width distribution (Figure 7B). Consequently, if the sample amount is further limited, a long elution time may even be detrimental to the analysis, because peptide signals may become too weak to be detected. To test this idea, we carried out more analyses with a loading amount of 10 ng (Figure 6). Indeed, the optimal elution time decreased to 40 min. This evaluation is useful for selecting the optimal gradient length based on the sample amount available.
Considering that no current LC/MS/MS system is capable of analyzing all peptides digested from real biological samples, the above optimization will facilitate protein identification as well as the analysis of posttranslational modifications (PTM). As sequencing coverage by MS/MS is critical for PTM analysis, we examined the average sequencing coverage of proteins in one set of six runs (50 ng loading, 10–320 min). Like protein identification, the sequencing coverage increased with elution time up to 80 min, where it reached a plateau before declining slightly at longer gradient lengths (16.3% for 10 min, 23.7% for 20 min, 24.8% for 40 min, 24.9% for 80 min, 23.5% for 160 min and 21.0% for 320 min).
By systematic adjustment of parameters in shotgun proteomics, we optimized common parameters in our LC and MS settings on a 75 μm I.D. reverse-phase column. With the optimum flow rate set at 0.2 μl/min and the gradient range set at 13–32% of acetonitrile, the gradient length should be adjusted according to the sample loading amount (e.g. 40 min for 10 ng of peptides, and 160 min for 1 μg of peptides). Using the optimized settings, we were able to identify 1,012 proteins (clustered in 806 protein groups) from 1 μg of tryptic yeast total cell lysate. Although some of the parameters may need adjustment when applied to different LC/MS/MS systems, the procedure and the data here are expected to be highly instructive for conducting efficient proteomics analysis.
This work was supported in part by NIH grants CA126222, AG025688 and NS055077 to J.P.