We demonstrate the use of capillary zone electrophoresis with an electrokinetically pumped sheath-flow electrospray interface for the analysis of a tryptic digest of a sample of intermediate protein complexity, the secreted protein fraction of Mycobacterium marinum. For electrophoretic analysis, 11 fractions were generated from the sample using reversed phase liquid chromatography; each fraction was analyzed by CZE-ESI-MS/MS, and 334 peptides corresponding to 140 proteins were identified in 165 min of mass spectrometer time at 95% confidence (FDR<0.15%). In comparison, 388 peptides corresponding to 134 proteins were identified in 180 min of mass spectrometer time by triplicate UPLC-ESI-MS/MS analysis each using 250 ng of the unfractionated peptide mixture at 95% confidence (FDR<0.15%). 62% of peptides identified in CZE-ESI-MS/MS and 67% in UPLC-ESI-MS/MS were unique. CZE-ESI-MS/MS favored basic and hydrophilic peptides with low molecular mass. Combining the two data sets increased the number of unique peptides by 53%. Our approach identified more than twice as many proteins as the previous record for CE proteome analysis. CE-ESI-MS/MS is a useful tool for the analysis of proteome samples of intermediate complexity.
Commercial liquid chromatography is routinely used as a separation technique in bottom-up proteomics.1–8 Very long gradient elution separations often produce extraordinary peak capacity and resolving power.9 In two notable examples, Zhou et al. recently developed an automated 3-D (RP-SAX-RP) LC-MS/MS platform and identified over 4000 unique proteins from 5 μg of total yeast lysate in a single 200 hour acquisition.10 Thakur et al. reported a 50 cm column (i.d. 75 μm) packed with 1.8 μm C18 beads and identified 5000 proteins in triplicate 8 hour gradients.11
Despite the impressive performance of liquid chromatographic separation for proteomic analysis, it is desirable to employ a complementary separation method that does not rely on reversed phase separation. Capillary electrophoresis (CE) provides an intriguing alternative. Capillary zone electrophoresis (CZE), the simplest CE mode, separates analytes by their charge-to-size ratio in buffers under a high electric field. CZE-MS was first reported by Smith et al. in 1987,12 and has received attention for MS-based proteomics.
Most examples consider the analysis of standard peptides or the tryptic digest of a few standard proteins, and relatively few describe the use of CZE for the analysis of complex proteomic samples. Lindner et al. recently reported a sheathless CZE-ESI-MS system and compared it to UPLC-ESI-MS by analyzing a rat testis linker histone protein sample digested by endoproteinase Arg-C.13 CZE-ESI-MS required a shorter total analysis time than nano-HPLC-ESI-MS and identified more low molecular mass peptides. Eight non-histone H1 proteins were identified from the sample by capillary electrophoresis, whereas 23 proteins were identified by LC using a 10X larger sample loading.
Yates employed a solid-phase microextraction technique to prefractionate the yeast ribosome digest followed by CE-MS analysis.14 Eleven fractions were analyzed with 30-minute long CE separations. A total of 66 proteins were identified in the 5.5 hour long mass-spectrometry analysis time.
CZE provides two primary advantages compared to HPLC. First, the separation mechanism is complementary to reversed phase liquid chromatography, which will be of particular value in the analysis of basic peptides. Second, it provides a much faster and higher efficiency separation than conventional HPLC; when operated at high voltages, separation windows of 10 minutes or less and plate counts of 100,000 or more are routine.
However, CZE suffers from two primary disadvantages compared to HPLC. First, the fast separation and high efficiency place severe constraints on the data acquisition rate of the mass spectrometer. For example, a mass spectrometer operating at 10 Hz is only able to acquire 6,000 tandem mass spectra in a ten-minute separation window. Samples of high complexity cannot be analyzed in great depth in a single separation. Second, the loading capacity of CZE is one to three orders of magnitude lower than HPLC, which places severe demands on mass spectrometer sensitivity.
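The acquisition-rate constraint described above is simple arithmetic and can be sketched as follows. The function name is illustrative and not part of the original work; the calculation assumes every duty cycle yields one tandem spectrum, so it gives an upper bound rather than a realistic yield.

```python
def max_tandem_spectra(acquisition_rate_hz: float, window_min: float) -> int:
    """Upper bound on the number of MS/MS spectra acquired in a separation window.

    Assumes (for illustration only) that the instrument spends the whole
    window acquiring tandem spectra at a constant rate.
    """
    return int(acquisition_rate_hz * window_min * 60)


# A 10 Hz instrument and a ten-minute separation window, as in the text.
print(max_tandem_spectra(10, 10))
```

Under this bound, a sample containing many more detectable peptides than available spectra cannot be sampled in depth in one run, which motivates the prefractionation strategy.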
In this paper, we consider a strategy that both accommodates the limitations and takes advantage of the strengths of CZE for the analysis of the tryptic digest of a sample of intermediate complexity. Like Yates’ approach, the sample is pre-fractionated,14 in this case using reversed phase liquid chromatography. Unlike Yates’ approach, each fraction is subjected to a rapid CZE-ESI-MS/MS analysis. The total separation time is similar to that required by three replicate UPLC-MS/MS analyses of the mixture, and the number of protein identifications is similar for the two separation methods. Importantly, the overlap in protein identifications for the two methods is relatively low; the complementary separation methods explore different portions of the proteome. CZE-MS/MS tends to favor detection of basic and low molecular weight peptides compared to UPLC-MS/MS.
All reagents were purchased from Sigma Aldrich, unless stated. Formic acid, acetic acid, and trifluoroacetic acid were purchased from Fisher Scientific. Sequencing grade modified porcine trypsin was purchased from Promega. Water was purchased from Honeywell Burdick & Jackson. Fused capillaries were purchased from Polymicro Technologies. ZipTipC18 were purchased from Millipore Co.
The culturing of M. marinum and generation of short-term culture filtrates are described elsewhere.15 A secreted protein fraction containing approximately 260 μg of protein, as determined by BCA assay, was purified by ice-cold acetone precipitation and resuspended in 110 μL of 10 mM, pH 8.2 ammonium bicarbonate buffer, reduced at 95 °C for 5 minutes with 2 mM dithiothreitol. Iodoacetamide was added to a final concentration of 6 mM, and alkylation was performed at room temperature for 20 minutes in the dark. 4 μg of trypsin was added and digestion was performed for 6 hours at 37 °C.
HPLC fractionation of the digest was performed on an Alliance HPLC, using a Waters XBridge™ C18 column (3.0 mm × 50 mm, 5 μm). The gradient profile from 95% solvent A (2% acetonitrile in water) to 95% solvent B (2% water in acetonitrile) was as follows: 0–5 min, 95% A; 5–10 min, 95–90% A; 10–40 min, 90–60% A; 40–42 min, 60–20% A; 42–52 min, 20% A; 52–52.1 min, 20–95% A; 52.1–60 min, 95% A. The flow rate was 0.70 mL/min. The injection amount was 100 μL (approximately 240 μg). Thirty fractions were collected, one every 2 minutes, and fractions 5 to 22 were combined into 11 fractions, which were dried in an Eppendorf vacuum concentrator.
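For readers who want to relate a collected fraction to the mobile-phase composition, the fractionation gradient above can be encoded as a piecewise-linear table. This is only a sketch: the helper names are hypothetical, and the fraction numbering assumes that fraction 1 starts at 0 min, which the text does not state explicitly.

```python
import numpy as np

# (time in min, % solvent A) breakpoints transcribed from the gradient description.
GRADIENT = [(0, 95), (5, 95), (10, 90), (40, 60), (42, 20),
            (52, 20), (52.1, 95), (60, 95)]


def percent_a(t_min: float) -> float:
    """Linearly interpolate % solvent A at time t_min."""
    times, pct = zip(*GRADIENT)
    return float(np.interp(t_min, times, pct))


def fraction_window(fraction_number: int, width_min: float = 2.0):
    """Collection window (start, end) in minutes, assuming fraction 1 starts at t = 0."""
    start = (fraction_number - 1) * width_min
    return start, start + width_min


print(percent_a(25))        # composition midway through the 10-40 min ramp
print(fraction_window(5))   # first fraction that was retained
print(fraction_window(22))  # last fraction that was retained
```

Under the stated numbering assumption, the retained fractions 5 to 22 span 8 to 44 min, which brackets the main 90–60% A ramp.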
5 μL (approximately 12 μg) of the whole digest sample was desalted by ZipTipC18 and dried by Eppendorf Vacuum concentrator.
The desalted whole digest sample was reconstituted in 50 μL loading buffer containing 0.1% formic acid/3% acetonitrile/water and separated by a UPLC BEH 130 C18 column (100 μm × 100 mm, 1.7 μm) in a nanoACQUITY UPLC system. 1 μL (~250 ng) of the sample was loaded for each analysis. The gradient profile from 100% solvent A (0.1% formic acid and 2% acetonitrile in water, B&J grade) to 100% solvent B (0.1% formic acid and 2% water in acetonitrile, B&J grade) was as follows: 0–5 min, 99% A; 5–7 min, 99-90% A; 7–37 min, 90-60% A; 37–38 min, 60-15% A; 38–48 min, 15% A; 48–49 min, 15–99% A; 49–60 min, 99% A. The flow rate was 1.2 μL/min.
The eluent was introduced into an LTQ-Orbitrap-Velos mass spectrometer by positive mode electrospray ionization. The MS1 survey scan was m/z 395–1800. The 20 most abundant ions of each scan above a threshold of 500 ion counts were selected for collision-induced dissociation and subsequent MS2 scans. Dynamic exclusion was enabled such that an MS1 ion observed twice within a 45-second window (with a mass tolerance of −0.5 to +1.50) was ignored for MS2 for the following 45 seconds, with a maximum exclusion list size of 200. Singly charged (+1) MS1 ions were excluded from MS2. The analysis was performed three times.
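The dynamic exclusion rule described above can be illustrated with a small sketch. This is not the instrument's control software; it is a simplified model under stated assumptions: the tolerance is taken to be in m/z units, the class and method names are hypothetical, and the oldest entry is dropped when the 200-entry list is full.

```python
from collections import deque


class DynamicExclusion:
    """Simplified model: exclude an ion for 45 s after it is seen twice within 45 s."""

    def __init__(self, repeat_count=2, repeat_window_s=45.0, exclusion_s=45.0,
                 tol_low=0.5, tol_high=1.5, max_size=200):
        self.repeat_count = repeat_count
        self.repeat_window_s = repeat_window_s
        self.exclusion_s = exclusion_s
        self.tol_low = tol_low      # assumed m/z units
        self.tol_high = tol_high    # assumed m/z units
        self.sightings = []                     # (time_s, mz) of recent observations
        self.excluded = deque(maxlen=max_size)  # (expiry_s, mz); oldest dropped when full

    def _matches(self, ref_mz, mz):
        # Asymmetric window: -tol_low to +tol_high around the reference m/z.
        return ref_mz - self.tol_low <= mz <= ref_mz + self.tol_high

    def is_excluded(self, mz, now_s):
        return any(expiry > now_s and self._matches(ref, mz)
                   for expiry, ref in self.excluded)

    def observe(self, mz, now_s):
        """Return True if the ion may be selected for MS2 at time now_s."""
        if self.is_excluded(mz, now_s):
            return False
        # Forget observations that fall outside the repeat window.
        self.sightings = [(t, m) for t, m in self.sightings
                          if now_s - t <= self.repeat_window_s]
        self.sightings.append((now_s, mz))
        hits = sum(1 for _, m in self.sightings if self._matches(m, mz))
        if hits >= self.repeat_count:
            self.excluded.append((now_s + self.exclusion_s, mz))
        return True


de = DynamicExclusion()
print(de.observe(500.0, 0.0))   # first observation: selectable
print(de.observe(500.2, 10.0))  # second observation: selectable, then excluded
print(de.observe(500.1, 20.0))  # falls inside the exclusion window
```

The same rule, with a different top-N and intensity threshold, applies to the CZE acquisition method.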
The electrophoresis system was assembled from components.16–18 High voltages were provided by two Spellman CZE 1000R high-voltage power supplies. Electrospray was generated using an electrokinetically pumped sheath flow through a nanospray emitter.16 The emitter was a borosilicate glass capillary (o.d. 1.0 mm, i.d. 0.75 mm, 10 cm) pulled with a Sutter Instrument P-1000 Flaming/Brown micropipette puller. The size of the emitter opening was 5–10 μm. Voltage programming was controlled by LabView software.
Each of the dried HPLC fractions was separately reconstituted in 10 μL B&J grade water. The separation buffer was ammonium acetate (10 mM, pH 5.7) and the electrospray sheath flow liquid contained 50% v/v methanol, 50% v/v water, and 10 mM acetic acid.
The separation capillary (i.d. 50 μm, o.d. 149 μm, length 30.0 cm) was uncoated. The injection was done by applying 5.5 kV on the sample reservoir and 1.5 kV on the sheath flow electrospray reservoir for 5 seconds. For separation, 5.5 kV was applied on the injection end of the capillary and 1.5 kV on the sheath flow reservoir for 15 minutes. Each fraction was analyzed once. This injection employed stacking conditions and the injection amount is unknown, but is likely three orders of magnitude smaller than the injection amounts used in the UPLC analysis.
Peptides were introduced into the LTQ-Orbitrap-Velos by positive mode electrospray ionization. The MS1 survey scan was m/z 395–1800. The 12 most abundant ions of each scan above a threshold of 200 ion counts were selected for collision-induced dissociation and subsequent MS2 scans. Dynamic exclusion was enabled such that an MS1 ion observed twice within a 45-second window (with a mass tolerance of −0.5 to +1.50) was ignored for MS2 for the following 45 seconds, with a maximum exclusion list size of 200. Singly charged (+1) MS1 ions were excluded from MS2.
Mascot generic format (mgf) peak list files were generated using RAW2MSM from the Mann lab using the default parameters.19 Peak lists were searched using the Paragon search engine within Protein Pilot 4.0 (ABSciex).15,20 Instrument parameters were set to Orbi (MS) and LTQ (MS/MS), and trypsin was selected as the digestion enzyme. A custom database of M. marinum (marinolist) in FASTA format was combined with a list of approximately 250 contaminant proteins and the E. coli MG1655 FASTA to increase the size of the search space. False positive rates (FPR) and false discovery rates (FDR) were determined by decoy-search strategies.21–23 For the 95% confidence datasets used, no decoy hits were observed; we report a conservative 0.15% FDR.
Cumulative distributions were calculated for peptide m/z, GRAVY index, and pI using Matlab. The respective data were sorted according to the desired parameter, and the Matlab command cumsum was used to calculate the cumulative distribution, which was then normalized to a maximum of one.
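The sort-then-cumsum procedure described above can be sketched in Python with NumPy, whose `cumsum` plays the same role as the Matlab command. The text does not say exactly which quantity was accumulated; this sketch assumes a running count of peptides, and the function name and example values are illustrative only.

```python
import numpy as np


def cumulative_distribution(values):
    """Return sorted values and the cumulative fraction of peptides at or below each."""
    x = np.sort(np.asarray(values, dtype=float))
    running_count = np.cumsum(np.ones_like(x))  # 1, 2, ..., n
    return x, running_count / running_count[-1]  # normalize to a maximum of one


# Hypothetical m/z values, for illustration only.
x, y = cumulative_distribution([700, 500, 900, 600])
print(x)
print(y)
```

Reading the curve at a given threshold gives the fraction of peptides below that threshold, which is how statements such as "half of the peptides had m/z < 700" can be extracted.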
We analyze a sample of moderate complexity, the secreted protein fraction (secretome) from Mycobacterium marinum. M. tuberculosis, the causative agent of the disease tuberculosis, infects approximately 1/3 of the Earth’s population and is responsible for more than two million annual deaths.24–27 M. marinum is a mycobacterial species that is closely related to M. tuberculosis but does not normally cause disease in humans and is used as a model system for some aspects of M. tuberculosis pathogenesis.15, 28 The protein density of the secreted proteome from M. marinum (and M. tuberculosis) is likely similar to other secretomes. Those secretomes typically contain 100–300 proteins,28–31 which creates a natural sample of intermediate complexity. The secreted protein fraction likely has large differences in protein abundance, making it useful for comparing CZE-ESI-MS and UPLC-ESI-MS.
We performed parallel experiments using CZE-ESI-MS/MS and UPLC-ESI-MS/MS for the analysis of the M. marinum secreted proteome. CZE-ESI-MS/MS was used to analyze eleven fractions from the proteome that were generated using reversed phase HPLC. The entire secreted proteome was analyzed in triplicate using UPLC-ESI-MS/MS. These conditions produced a similar total mass spectrometer analysis time (165 min for CZE-ESI-MS/MS and 180 min for UPLC-ESI-MS/MS, Fig 1). An additional 60 min is required for prefractionation of the CE samples. This prefractionation is performed off-line and does not affect the mass spectrometer analysis time.
CZE-ESI-MS/MS analysis identified 334 peptides and 140 proteins, while UPLC-ESI-MS/MS identified 388 peptides and 134 proteins. The two approaches produced similar numbers of peptide and protein identifications.
Information on peptide IDs from duplicate CZE runs of five HPLC fractions is included in the supporting information. Between 39 and 82% of the peptides were shared between duplicates. Similarly, information on peptide IDs from the triplicate UPLC runs is included in the supporting information. Between 64 and 74% of peptide IDs were shared in any two runs. Like many biological samples, mycobacterial culture filtrates have a large dynamic range, with just a few polypeptide species representing the majority of the population. In one set of these data (CZE duplicate), the top 4 proteins, all known secreted substrates, are identified with >138 peptides, which is a substantial fraction of the total population.
Only 127 peptides were shared between the two data sets (Fig 2A). The majority of peptides (62%) identified in the CZE-ESI-MS/MS data were not observed in the UPLC-ESI-MS/MS data. Similarly, only 70 proteins were shared (Fig 2B). Half of the proteins observed in the CZE-ESI-MS/MS data were not observed by UPLC-ESI-MS/MS.
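The overlap percentages follow directly from the identification counts reported above. The sketch below recomputes them; the function and key names are illustrative and not taken from the original analysis.

```python
def overlap_summary(n_a: int, n_b: int, n_shared: int) -> dict:
    """Summarize the overlap between two identification sets A and B."""
    union = n_a + n_b - n_shared
    return {
        "frac_unique_a": (n_a - n_shared) / n_a,  # found only by method A
        "frac_unique_b": (n_b - n_shared) / n_b,  # found only by method B
        "union": union,                           # non-redundant total
        "gain_over_b": union / n_b - 1,           # relative gain from adding A to B
    }


# Peptides: A = CZE-ESI-MS/MS (334), B = UPLC-ESI-MS/MS (388), 127 shared.
peptides = overlap_summary(334, 388, 127)
# Proteins: A = CZE-ESI-MS/MS (140), B = UPLC-ESI-MS/MS (134), 70 shared.
proteins = overlap_summary(140, 134, 70)
print(peptides)
print(proteins)
```

With these counts, the combined peptide list contains 595 non-redundant entries, about 53% more than UPLC-ESI-MS/MS alone.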
Our CZE-ESI-MS/MS analysis was biased towards peptides with low m/z (Fig 3). Half of all the peptides identified in CZE-ESI-MS/MS had m/z < 700, whereas only 25% in UPLC-ESI-MS/MS had m/z < 700 (Fig 3). The mean m/z values were 710 in CZE-ESI-MS/MS and 850 in UPLC-ESI-MS/MS. The two separation methods had similar charge state distributions (80% z = +2, 20% z = +3, <1% z = +4), which indicates that the difference in m/z distribution was caused by the difference between the two separation methods and was not an artifact of electrospray ionization.
This bias toward lower m/z values in the CZE data is explained in part by our use of electrokinetic injection.32 In this mode, the injected amount is proportional to the ion’s electrophoretic mobility, which is related to the ion’s charge to size ratio. As expected, inverse mobility and m/z are roughly correlated (r = 0.66). This biased injection results in lower loading and signal amplitude for the higher m/z ions in CZE. In addition, low molecular weight peptides likely co-elute during UPLC separation, generating a negative bias for UPLC-ESI-MS/MS.
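The mobility dependence of electrokinetic injection can be made explicit with the standard textbook relation below. It is given here as background rather than as a formula from the original work; in particular, the electroosmotic term $\mu_{eo}$ is part of the general expression and is not discussed in the text, and all symbols are introduced here for illustration.

```latex
% Amount of analyte i loaded by electrokinetic injection
% Q_i        : injected amount of analyte i
% \mu_{ep,i} : electrophoretic mobility of analyte i
% \mu_{eo}   : electroosmotic mobility (identical for all analytes)
% V, L       : injection voltage drop and capillary length
% t_{inj}    : injection time;  r : capillary inner radius
% C_i        : concentration of analyte i in the sample
Q_i = \left(\mu_{ep,i} + \mu_{eo}\right) \frac{V}{L}\, t_{inj}\, \pi r^{2}\, C_i
```

For fixed voltage, time, and capillary dimensions, the only analyte-specific factors are $\mu_{ep,i}$ and $C_i$, so ions with higher electrophoretic mobility are loaded preferentially.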
We next analyzed the hydrophobicity of the peptides using the GRAVY index (Fig 4).33 The percentages of hydrophilic (50%) and hydrophobic peptides (50%) were similar in the UPLC-ESI-MS/MS data, while the percentage of hydrophilic peptides (55%) was slightly higher than hydrophobic peptides (45%) in CZE-ESI-MS/MS. CZE-ESI-MS/MS favors hydrophilic peptides more than UPLC-ESI-MS/MS, although not by a large amount.
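As a minimal sketch of how a GRAVY value is obtained, the code below averages Kyte-Doolittle hydropathy values over a peptide sequence. Classifying a peptide as hydrophilic when GRAVY is below zero is an assumption made here for illustration; the text does not state the cutoff used. The example sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    sequence = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def is_hydrophilic(sequence: str) -> bool:
    """Assumed cutoff: negative GRAVY means hydrophilic."""
    return gravy(sequence) < 0


# Hypothetical peptides, for illustration only.
for pep in ("ILVK", "DEGSR"):
    print(pep, round(gravy(pep), 2), is_hydrophilic(pep))
```

Applying `is_hydrophilic` to each identified peptide and taking the mean gives the percentage split reported for each method.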
Peptides identified by CZE-ESI-MS/MS tended to be more basic than those identified by UPLC-ESI-MS/MS (Fig 5): 74% of the peptides identified by CZE-ESI-MS/MS had pI<7, whereas 88% of the peptides identified by UPLC-ESI-MS/MS had pI<7.
HPLC pre-fractionation was involved in the CZE-ESI-MS/MS sample preparation, which may have accounted for the peptides with low m/z, which agreed with the trend in Figure 3. This result suggests that the difference of physico-chemical properties of peptides observed between CZE-ESI-MS/MS and UPLC-ESI-MS/MS is not caused by HPLC pre-fractionation, but by the difference between the two separation methods.
CZE-ESI-MS/MS analysis of a prefractionated peptide sample of intermediate complexity is complementary to UPLC-ESI-MS/MS. Both identified a similar number of peptides and proteins within a similar analysis time. However, the overlap of protein and peptide identification is modest. CZE-ESI-MS/MS analysis of the prefractionated sample tends to identify more peptides that are basic and have lower m/z than UPLC-ESI-MS/MS.
This analysis appears to present the largest number of protein identifications generated using CE-MS/MS. The number of proteins identified by our approach is over twice as large as Yates’ report and over an order of magnitude larger than Lindner’s report.13–14
These results immediately suggest a strategy for the use of CZE-ESI-MS/MS for the analysis of highly complex samples. Those samples can be prefractionated using two-dimensional LC, followed by CE analysis. Assuming a 10-minute CZE separation time, roughly 150 fractions could be analyzed during the 24 hours used for the triplicate gradient in the analysis of the yeast proteome.11 The complementary nature of CZE and UPLC should be of great value for the complete analysis of a complex proteome.
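The throughput estimate above can be checked with a short calculation. The function name is illustrative, and the sketch ignores injection, rinsing, and equilibration time between runs, so it gives an optimistic count.

```python
def fractions_per_session(session_hours: float, separation_min: float) -> int:
    """Number of back-to-back CZE separations that fit in a session (no overhead)."""
    return int(session_hours * 60 // separation_min)


# 24 h of instrument time (three 8 h gradients) with 10-minute separations.
print(fractions_per_session(24, 10))
# The same session with the 15-minute separations used in this work.
print(fractions_per_session(24, 15))
```

With 10-minute separations the count is 144, consistent with the "roughly 150" estimate; with the 15-minute separations used here it drops to 96.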
Finally, other forms of capillary electrophoresis have been coupled with liquid chromatographic prefractionation. In particular, capillary isoelectric focusing (cIEF) and capillary isotachophoresis (cITP) have been used for analysis of complex protein lysates.34–38 Those analyses take advantage of the focusing properties of the separation modes, which allows the use of much larger sample volumes than in CZE. Both cIEF and cITP require a fair amount of manipulation of separation buffers and will be more difficult to automate than CZE.
We would like to thank Dr. Carlos Gartner in the Advanced Diagnostics and Therapeutics Program, and Drs. William Boggess and Michelle V. Joyce in the Notre Dame Mass Spectrometry and Proteomics Facility for their help. This project was supported by a grant from the National Institutes of Health (R01GM096767).