Partially tryptic peptides are often identified in shotgun proteomics using trypsin as the proteolytic enzyme; however, their sources have been controversial. Herein we investigate the impact of in-source fragmentation on shotgun proteomics profiling of three biological samples: a standard protein mixture, a mouse brain tissue homogenate, and mouse plasma. Since the in-source fragments of peptide ions have the same LC elution time as their parental peptides, partially tryptic peptide ions from in-source fragmentation can be distinguished from other partially tryptic peptides based on the difference between their observed elution times and computationally predicted elution times. The percentage of partially tryptic peptide identifications resulting from in-source fragmentation in a standard protein digest was observed to be ~60 %. In more complex mouse brain or plasma samples, in-source fragmentation contributed to a lesser degree, 1–3 % of all identified peptides, due to the limited dynamic range of LC-MS/MS measurements. The other major source of partially tryptic peptides in complex biological samples is presumably proteolytic cleavage by endogenous proteases in the samples. Our work also provides a method to identify such partially tryptic peptides derived from endogenous proteases by removing in-source fragmentation artifacts from the identified peptides.
Trypsin has been commonly used in mass spectrometry (MS)-based bottom-up proteomics for several reasons.1–4 First, this protease has been reported to strictly cleave the C-terminal amide bond of arginine or lysine residues,1 resulting in peptides of an MS-friendly size of about 10 amino acid residues on average. Furthermore, the resulting basic residue located at the C-terminus of the peptides enhances electrospray ionization efficiency and contributes to the generation of an abundant y-ion series in tandem mass spectra to facilitate confident identification. While the bottom-up proteomics strategy predominantly identifies fully tryptic peptides in complex samples when trypsin is used, a significant number of partially tryptic peptides are often confidently identified.2, 3, 5–8 Such observations led to controversy over whether trypsin is strictly specific for cleavage at arginine and lysine residues. For example, Olsen et al. reported that trypsin cleaved exclusively C-terminal to arginine and lysine residues based on their data from mouse liver tissue. On the other hand, Picotti et al. reported that the percentage of partially tryptic peptides identified from a single standard protein, beta-lactoglobulin, was >70%, although most of them were of relatively low abundance compared to fully tryptic peptides.3
There are several potential sources for generating partially tryptic peptides, including possible chymotrypsin contamination in trypsin, pseudotrypsin activity due to partial autolysis of trypsin, proteolytic products due to endogenous proteases, and in-source or “nozzle-skimmer” fragmentation of fully tryptic peptides.1–3, 8 Among these sources, the chymotrypsin contamination and autolysis issues have been largely alleviated by using the sequencing grade of modified trypsin. The modified trypsin is reductively dimethylated on lysine residues and subjected to TPCK treatment to eliminate chymotryptic activity,9 followed by affinity purification. The dimethylated trypsin also has a strong resistance against auto-proteolysis, which can generate pseudotrypsin activity.10 Therefore, there are only two main sources for generating partially tryptic peptides: proteolysis by endogenous proteases present in the sample and in-source fragmentation (ISF), which fragments fully tryptic peptides within the electrospray ionization source. We hypothesize that the observed high percentage of partially tryptic peptides in a single protein sample as reported by Picotti et al.3 was largely due to ISF since ISF products are more identifiable in a less complex peptide mixture. In this work, we aim to examine the impact of ISF on peptide identifications from both standard protein mixtures and complex biological samples.
In-source fragmentation is directly related to the desolvation and activation energy at the interface of the electrospray ionization (ESI) source. The desolvation/activation energy is controlled by the cone voltage, which is also called the nozzle-skimmer voltage or declustering voltage.11–17 Higher desolvation/activation energy can enhance ion yield but simultaneously facilitates ISF due to elevated ion energy,14, 16 which compromises the final ion yield. In practice, researchers typically optimize the ESI-MS desolvation process by balancing these two opposing effects to maximize ion gain. Therefore, some degree of ISF is an inherent phenomenon in ESI-MS, regardless of whether the resulting in-source fragments can be identified or not.
A substantial number of partially tryptic peptides were identified in different LC-MS/MS based shotgun proteomic studies, but they were often considered as less confident identifications than fully tryptic peptides. To clarify the sources of partially tryptic peptides and the specificity of trypsin, we investigated the impact of ISF on peptide identifications in a mixture of six standard proteins, mouse brain tissue, and mouse plasma. Since ISF-generated partially tryptic peptides will have the same LC elution times as their corresponding fully tryptic peptides, these peptides can be differentiated from other proteolytic-derived partially tryptic peptides based on examining their elution times in comparison with their parental tryptic peptides. By removing the ISF-generated artifacts from the identified partially tryptic peptides, it becomes possible to directly survey proteolytic cleavage products resulting from endogenous proteases in biological samples.
Six standard proteins, including bovine carbonic anhydrase II, bovine beta-lactoglobulin, E. coli beta-galactosidase, equine skeletal muscle myoglobin, chicken ovalbumin, and bovine cytochrome c, were obtained from Sigma-Aldrich (St. Louis, MO). Mouse brain tissues dissected from adult C57BL/6J male mice (9 weeks, 21–27 g) were obtained from Jackson Laboratories (Bar Harbor, ME). All tissues were homogenized in 25 mM NH4HCO3 (pH 8.2), and protein concentration was determined by BCA assay (Pierce, Rockford, IL). Mouse plasma from Equitech-Bio (Kerrville, TX) was determined to have a concentration of ~40 mg/mL using the BCA protein assay.
Protein samples were initially denatured and reduced with 8 M urea and 10 mM dithiothreitol in 50 mM NH4HCO3 buffer (pH 8.2) for 1 h at 37 °C, followed by alkylation of cysteine residues with 40 mM iodoacetamide for 1 h at 37 °C in the dark. Following a 10-fold dilution with 50 mM NH4HCO3, the resulting samples were digested using sequencing grade modified porcine trypsin (Promega, Madison, WI) at a trypsin-to-protein ratio of 1:50 (w/w) for 5 h at 37 °C. The digested samples were individually loaded onto a 1-mL solid-phase extraction (SPE) C18 column (Supelco, Bellefonte, PA) and washed with 4 mL of 0.1% trifluoroacetic acid (TFA)/5% acetonitrile (ACN). Peptides were eluted from the SPE column with 1 mL of 0.1% TFA/80% ACN, and then lyophilized. After reconstituting the resulting peptide samples in 25 mM NH4HCO3, the final peptide concentration for each sample was measured with the BCA protein assay.
Peptides were analyzed using a custom-built automated four-column high-pressure capillary LC system coupled on-line to an LTQ-Orbitrap mass spectrometer (Thermo Scientific) via a nanoelectrospray ionization interface manufactured in-house with 500 μL/min flow rate.18 The reversed-phase capillary column was prepared by slurry packing 3-μm Jupiter C18 bonded particles (Phenomenex, Torrance, CA) into a 65-cm-long, 75-μm-inner diameter fused silica capillary (Polymicro Technologies, Phoenix, AZ). After loading 2.5 μg of peptides onto the column, the mobile phase was held at 100% A (0.1% formic acid) for 20 min, followed by a linear gradient from 0 to 70% buffer B (0.1% formic acid in 90% acetonitrile) over 85 min. Each full MS scan (m/z 400–2000) was followed by collision-induced MS/MS spectra (normalized collision energy setting of 35%) for the 6 most abundant ions (standard protein mixture and mouse plasma) or the 10 most abundant ions (mouse brain), resulting in an overall cycle time of ~2.5 s or ~3.6 s, respectively. The automatic gain control setting for the orbitrap was 1,000,000 with a resolution of 60,000. The dynamic exclusion duration was set to 1 min. The heated capillary was maintained at 200 °C. Parameters for ion optics were automatically tuned by Xcalibur Tune Plus to maximize ion gain, and the tuning parameters were: ESI voltage 2.2 kV, heated capillary temperature 200 °C, capillary voltage 30 V, and tube lens voltage 80 V.
LC-MS/MS raw data were converted into .dta files using Extract_MSn (version 3.0) in Bioworks Cluster 3.2 (Thermo Fisher Scientific, Cambridge, MA). All MS/MS spectra were searched with the SEQUEST algorithm, using a parameter file with no enzyme criteria and a static cysteine modification (+57.022 Da), against the collected standard protein sequence list or the mouse Uniprot database (25 338 proteins, released May 5, 2010). A decoy database search was employed, and an extremely low false discovery rate (FDR) of 0.05% was achieved by filtering peptides using an MS generating function score threshold19, 20 in combination with a 10 ppm mass accuracy cut-off for the final identified peptides.
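The two filters described above can be illustrated with a minimal sketch. It assumes the simple estimator FDR = (decoy hits) / (target hits) among identifications passing the score threshold; the text does not state which estimator was used, and the function names are hypothetical.

```python
def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def passes_mass_filter(observed_mass: float, theoretical_mass: float,
                       tol_ppm: float = 10.0) -> bool:
    """True if the absolute mass error is within the tolerance (10 ppm here)."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tol_ppm


def estimate_fdr(n_target: int, n_decoy: int) -> float:
    """Assumed estimator: decoy hits divided by target hits above threshold."""
    if n_target == 0:
        return 0.0
    return n_decoy / n_target
```

Under this assumed estimator, an FDR of 0.05% corresponds to 5 decoy hits per 10,000 target hits passing the filters.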
To investigate how much the ISF-generated ‘artifacts’ can impact peptide identifications, we initially examined all the identified partially tryptic peptides using a digest of six standard proteins to determine if the peptides were in-source fragments of potential parental peptides at the same elution time. Surprisingly, an almost complete series of y-ion type in-source fragments within the detectable m/z range was observed, with the majority having <10% of the intensity of their respective fully tryptic parental peptides in the standard protein digest. It should be noted that the N-terminal b-ion type ISF products are not readily identified in typical SEQUEST database searching because the loss of a water molecule at the C-terminus during ISF is often not accounted for in the search parameters. Nevertheless, b-ion type in-source fragments with significantly lower abundances were also observed. Fig. 1A shows a complete series of extracted ion chromatograms of the in-source fragments from a fully tryptic parental peptide, AVVQDPALKPLALVYGEATSR, from bovine carbonic anhydrase II. All y-ion type in-source fragments, from y19 to y4, within the detectable m/z range of the orbitrap MS (m/z 400–2000) and nine b-ions were observed with the same elution time (35.79 min). Among the sixteen y-ion type in-source fragments, ten ions, y19–y10, were identified by the SEQUEST search as partially tryptic peptides, while none of the b-ion type in-source fragments were identified, which is not surprising given the loss of a water molecule at the C-terminus.
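The mass relationship behind this observation can be sketched as follows. A y-type in-source fragment has the same mass as an intact peptide with the suffix sequence, whereas a b-type fragment is one water lighter than the intact peptide with the prefix sequence. The sketch uses standard monoisotopic residue masses; the function names are illustrative and not taken from any software used in this work.

```python
# Standard monoisotopic residue masses (Da) for the residues in the example.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "D": 115.02694,
    "Q": 128.05858, "K": 128.09496, "E": 129.04259, "R": 156.10111,
    "Y": 163.06333,
}
WATER = 18.010565
PROTON = 1.007276


def residue_sum(seq: str) -> float:
    return sum(MONO[aa] for aa in seq)


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an intact peptide."""
    return residue_sum(seq) + WATER


def y_fragment_mass(parent: str, n: int) -> float:
    """Neutral mass of the y_n fragment: identical to the intact suffix peptide."""
    return peptide_mass(parent[-n:])


def b_fragment_mass(parent: str, n: int) -> float:
    """Neutral mass of the b_n fragment: the prefix residues without the water."""
    return residue_sum(parent[:n])


def mz(neutral_mass: float, charge: int) -> float:
    return (neutral_mass + charge * PROTON) / charge


parent = "AVVQDPALKPLALVYGEATSR"
y4_mz = mz(y_fragment_mass(parent, 4), 1)  # singly charged y4 (ATSR)
water_gap = peptide_mass(parent[:9]) - b_fragment_mass(parent, 9)
```

Because `water_gap` equals the mass of one water molecule, a search that only considers intact peptide masses will match y-type fragments as partially tryptic peptides but will miss b-type fragments unless the water loss is added as a modification.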
When examining much more complex samples such as mouse brain, several in-source fragments were also frequently observed, but with much lower intensities. Fig. 1B shows the extracted ion chromatogram of VIISAPSADAPMFVMGVNHEK from glyceraldehyde-3-phosphate dehydrogenase in mouse brain and its co-eluted in-source fragments: five y-ion and two b-ion type in-source fragments with <2% and <0.1% of the intensity of the fully tryptic parental peptide, respectively. Of these in-source fragments, only the y19 and y16 ions were identified by the SEQUEST search. It is notable that in-source fragments retain lower charge states than their corresponding parental peptides because the charge of the parental peptide is divided between daughter fragments. Similar patterns of ISF were observed in other mouse brain peptides (S. Fig. 1 and S. Table 2).
Partially tryptic peptides can be classified into three groups: the [K/R].P type, the Y-ion type, and the B-ion type (Table 1). The [K/R].P type of partially tryptic peptides retains a -K.P- or -R.P- sequence at either the N- or C-terminus; these sequences are well known as trypsin-inhibitory sequences. The Y-ion type has a C-terminal lysine or arginine residue, while the B-ion type does not. Among the partially tryptic peptides, the [K/R].P type starting with a proline residue and the Y-ion type can originate from in-source fragmentation, whereas the other types of in-source fragments cannot be identified by SEQUEST analysis using fully-tryptic search parameters without considering the water loss at the C-terminus. Because partially tryptic peptides from ISF have the same elution time as their parental peptides, their observed elution times should often differ substantially from their predicted elution times. In other words, peptides whose actual elution times correlate well with their predicted elution times are more likely the result of proteolysis, not ISF. Based on this concept, we plotted the first observed scan number for all identified peptides against the predicted normalized elution times (NET) using an in-house NET prediction tool based on a previously published algorithm from our laboratory21, 22 (publicly available at http://omics.pnl.gov/software). Fig. 2 shows the plots from three different samples: a standard protein mixture, mouse brain, and mouse plasma. Almost all the fully tryptic peptides show a linear correlation (R2 > 0.97) between predicted NET values and the observed scan numbers, as expected. The identified B-ion type partially tryptic peptides also show a correlation similar to that of the fully tryptic peptides in all three cases.
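The alignment idea can be sketched as a linear fit of observed scan number against predicted NET for the fully tryptic peptides, followed by flagging partially tryptic peptides that fall far from the fitted line. This is only an illustration: the three-standard-deviation cut-off and the function names are assumptions, and the sketch does not reproduce the NET prediction tool itself.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual_sd(xs, ys, slope, intercept):
    """Standard deviation of fit residuals (n - 2 degrees of freedom)."""
    ss = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(ss / (len(xs) - 2))


def flag_misaligned(full_tryptic, partial, n_sd=3.0):
    """Return partially tryptic entries lying more than n_sd residual SDs
    from the line fitted to fully tryptic peptides.

    full_tryptic: list of (predicted_net, observed_scan)
    partial:      list of (peptide, predicted_net, observed_scan)
    """
    xs = [net for net, _ in full_tryptic]
    ys = [scan for _, scan in full_tryptic]
    slope, intercept = fit_line(xs, ys)
    cutoff = n_sd * residual_sd(xs, ys, slope, intercept)
    return [
        (pep, net, scan)
        for pep, net, scan in partial
        if abs(scan - (slope * net + intercept)) > cutoff
    ]
```

Flagged peptides are only candidates for ISF; confirming them still requires checking for a co-eluting fully tryptic parental peptide.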
As noted above, since the identified B-ion type partially tryptic peptides cannot be from ISF without considering water loss at the C-terminus, those well-aligned B-ion type partially tryptic peptides can be regarded as products of proteolytic cleavage. Unexpectedly, even most of the [K/R].P type partially tryptic peptides show a well-matched linear correlation (Fig. 2A–C), which suggests that some of the inhibitory sequences might still be cleavable by trypsin.4 Nevertheless, a significant number of misaligned points were observed among the Y-ion type partially tryptic peptides, which are potentially from ISF, especially in the standard protein mixture sample (Fig. 2A).
Manual inspection of the data confirmed our original speculation that the partially tryptic peptides observed at distant LC elution times from their expected elution times are most likely from ISF since these partially tryptic peptides were well aligned with their parental tryptic peptides in elution times. Furthermore, we did not observe any fully tryptic parental peptides for the [K/R].P-type or B-ion type partially tryptic peptides at the same elution time for all three datasets, which indicates that these types of partially tryptic peptides were generated by proteolytic cleavage.
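The co-elution check described above can be approximated programmatically. In this sketch a partially tryptic peptide is treated as a candidate y-type in-source fragment when an identified fully tryptic peptide ends with the same sequence and elutes within a small time window. The 0.1 min tolerance, the function name, and the fragment elution times in the usage example are assumptions for illustration.

```python
def find_isf_parent(fragment_seq, fragment_time, full_tryptic, tol_min=0.1):
    """Return the sequence of a co-eluting fully tryptic parent, or None.

    full_tryptic: list of (sequence, elution_time_min) for identified
    fully tryptic peptides.
    """
    for parent_seq, parent_time in full_tryptic:
        is_proper_suffix = (
            parent_seq.endswith(fragment_seq) and parent_seq != fragment_seq
        )
        if is_proper_suffix and abs(parent_time - fragment_time) <= tol_min:
            return parent_seq
    return None


# Usage: the parent peptide and its 35.79 min elution time come from Fig. 1A;
# the fragment elution times below are made up for the example.
parents = [("AVVQDPALKPLALVYGEATSR", 35.79)]
co_eluting = find_isf_parent("PLALVYGEATSR", 35.80, parents)
not_co_eluting = find_isf_parent("PLALVYGEATSR", 20.00, parents)
```

A peptide with the same suffix sequence that elutes at a different time has no matching parent in this check and is therefore more consistent with proteolysis than with ISF.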
Fig. 3 displays the percentages of each type of peptide in the three samples. At the unique peptide level, in-source fragments represent a large portion (~60 %) of the total peptide identifications from the standard protein mixture, although the in-source fragments are typically one to three orders of magnitude lower in observed intensity than their fully tryptic parental peptides (Fig. 1). The portion of in-source fragments was smaller in the complex samples, where 1.1 % and 3.1 % of in-source fragments were observed from mouse brain and plasma, respectively. The observed significant portion of partially tryptic peptides (71% of 470 unique peptides) in standard proteins is highly consistent with the work of Picotti et al.,3 where it was reported that 78% (87/112) of the unique peptides in a bovine beta-lactoglobulin digest were partially tryptic; however, they concluded that trypsin might generate a number of non-specific cleavage products with much lower abundance by regarding most of the ISF-generated partially tryptic peptides as non-specific trypsin cleavage products. The lower charge state and significantly lower abundance of the ISF partially tryptic peptides as shown in Fig. 1 and S. Fig. 1 are also consistent with the data from Picotti et al.3 Although in principle the level of in-source fragments should depend on ESI conditions and not on sample complexity, the detectability of in-source fragments is largely dependent on the overall sample complexity because of the much lower abundance of in-source fragments compared to fully tryptic parental peptides. We attribute the large number of in-source artifacts identified in the standard protein mixture to the increased likelihood that low-abundance ions are selected for MS/MS fragmentation in a low-complexity sample. However, in the highly complex mouse brain lysate only a small portion (1.1% of 5270 unique peptides) of in-source fragments was observed (Fig. 2B and Fig. 3), while ~96% of identified peptides were fully tryptic. The result from mouse brain is consistent with the work of Olsen et al.,1 where they reported that trypsin cleaved exclusively C-terminal to arginine and lysine residues by demonstrating that 97% (593/607) of peptides were fully tryptic in mouse liver lysate. It is notable that even though the number of identified in-source artifacts was significantly smaller in the complex samples than in the standard protein digest, these in-source fragments still exist, as shown in Fig. 1B, but they were not selected for MS/MS fragmentation in the mouse brain sample due to their low abundance and so-called ‘under sampling’ in shotgun proteomics (S. Fig. 1A).
The identified in-source fragments in the mouse plasma digest were mostly from highly abundant plasma proteins, e.g. serum albumin, transferrin, and apolipoprotein A-1 (S. Table 3). The increased portion (3.1%) of ISF artifacts in plasma vs. mouse brain tissue can be attributed to the nature of the plasma sample, which contains a number of highly abundant proteins. The in-source fragments from tryptic peptides of highly abundant proteins will have abundances comparable to those of tryptic peptides from mid- or low-abundance proteins, thus making these ISF-generated peptides more likely to be detected. Also, a large portion (42%) of unique peptides in mouse plasma corresponds to partially tryptic peptides unrelated to ISF (Fig. 3), for which the observed elution times are well aligned with the predicted ones, similar to the fully tryptic peptides (Fig. 2C). This observation is reasonable for the plasma sample since plasma retains numerous secreted proteases and proteolytic processing events are more common in plasma. It is worth noting that the removal of ISF-derived artifacts from all partially tryptic peptides by aligning predicted elution times with observed ones provides a means to identify in-vivo proteolytic cleavage or degradation products in biological samples. As an example, for serum albumin we identified a number of potential products of truncation by aminopeptidases or carboxypeptidases (S. Table 4).
The cleavage motifs of partially tryptic peptides in mouse brain and plasma were investigated separately. Fig. 4 shows the sequence motifs of the ISF-generated partially tryptic peptides (Fig. 4A and B) and the other partially tryptic peptides, which are regarded as in-vivo or in-vitro proteolytic products (Fig. 4C and D). The ISF artifacts show quite different sequence motifs from proteolysis-generated partially tryptic peptides. In both mouse brain (Fig. 4A) and plasma (Fig. 4B) samples, it appears that a proline residue at the P1′ position (right side of the cleaved peptide bond) and/or aliphatic residues such as valine, leucine, or isoleucine at the P1 position (left side of the cleaved peptide bond) are more favorable for ISF. These residue-specific fragmentations are also well-known patterns for CID in MS/MS spectra,23, 24 providing further evidence that these partially tryptic peptides were from collisional dissociation in the electrospray source region.
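A residue tally of the kind underlying such motif plots can be sketched as below. For a peptide whose N-terminus was created by the cleavage of interest, P1 is the protein residue immediately preceding the peptide and P1′ is the peptide's first residue. The function names and the toy sequences in the usage example are made up for illustration, and the sketch assumes the first occurrence of the peptide in the protein is the relevant one.

```python
from collections import Counter


def n_terminal_site(protein_seq, peptide_seq):
    """Return (P1, P1') for the cleavage that produced the peptide N-terminus.

    Returns None if the peptide is not found in the protein or starts at the
    protein N-terminus (no P1 residue exists).
    """
    start = protein_seq.find(peptide_seq)
    if start <= 0:
        return None
    return protein_seq[start - 1], peptide_seq[0]


def tally_sites(sites):
    """Count residues at P1 and P1' over a collection of cleavage sites."""
    p1_counts, p1_prime_counts = Counter(), Counter()
    for site in sites:
        if site is None:
            continue
        p1, p1_prime = site
        p1_counts[p1] += 1
        p1_prime_counts[p1_prime] += 1
    return p1_counts, p1_prime_counts


# Usage with a toy protein and toy peptides (not real data).
toy_protein = "MKAVLPGSRTEVLPAAK"
sites = [n_terminal_site(toy_protein, pep) for pep in ("PGSR", "PAAK", "MKAV")]
p1_counts, p1_prime_counts = tally_sites(sites)
```

Dividing each count by the number of sites gives per-position residue frequencies, which is the quantity a sequence motif plot visualizes.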
The sequence motifs from the proteolysis-generated partially tryptic peptides in mouse brain show a slightly higher frequency of chymotrypsin-susceptible residues, i.e. tyrosine and phenylalanine, at the P1 position (Fig. 4C), which might indicate the presence of chymotrypsin-like proteases in the tissue sample. In contrast, no outstanding motif was observed in mouse plasma (Fig. 4D), which is expected because the plasma sample contains proteins that have been proteolyzed in vivo by many different proteases. The sequence motif from partially tryptic peptides of trypsin only shows a preference for asparagine or alanine residues at the P1 position (S. Fig. 2), which suggests that pseudotrypsin25, 26 might have a different specificity from the original trypsin due to a partially altered three-dimensional structure. The high frequency of alanine and asparagine residues at the P1 position in Fig. 4C might be partially attributed to the pseudotryptic activity of trypsin.
Interestingly, all of the [K/R].P type partially tryptic peptides in both brain lysate and plasma samples were observed with well-matched elution times as predicted for fully tryptic peptides, and no co-eluted potential fully tryptic parental peptide was observed for them. These observations strongly support that [K/R].P type partially tryptic peptides actually can be considered as fully tryptic peptides; i.e. some of [K/R]-P peptide bonds can be cleaved by trypsin as recently reported by Rodriguez et al.4 Furthermore, it appears that the [K/R].P type peptides contain small amino acids like glycine, alanine, and serine with relatively high frequency at both P2 and P2′ positions (S. Fig. 3A), whereas there is no preference of amino acid at P2 and P2′ positions in non-cleaved peptides. Of interest, however, is that carboxylic acid residues are occasionally present (S. Fig. 3B), and these residues are known to be unfavorable for trypsin digestion.27
At first glance the stringency of trypsin specificity appears to be inconsistent between biological samples if the impact of in-source fragments is not considered, which explains the discordance between several previous reports about trypsin specificity.1, 3, 4, 6 Biological samples with different levels of complexity were used in these reports. The groups making data-dependent shotgun measurements on complex proteome samples such as liver tissue, with significant under-sampling, report good trypsin specificity,1, 4 whereas the groups dealing with much less complex samples, e.g. a standard protein3 or an in-gel digest6, observe significant levels of partially tryptic peptides. Our observation of ISF provides a solid explanation of the observed data and trypsin specificity. As shown in Fig. 3, the predominant source of partially tryptic peptides in standard proteins was ISF (81% of partially tryptic peptides). In mouse brain, since the [K/R].P-cleaved partially tryptic peptides could be regarded as fully tryptic peptides4 as discussed above, only 2% of the peptides identified may arise from potentially unspecific activity of trypsin or in-vivo proteolysis. Therefore, it appears that the impact of trypsin-induced partially tryptic peptides is minimal (< 2% in number of unique peptides) in LC-MS/MS proteomics data for complex biological samples. For this reason, the partially tryptic peptides that remain after ISF-generated artifacts are removed are most likely from in-vivo or in-vitro proteolytic processing or degradation events. Table 2 and S. Table 4 provide some examples of the sources of identified partially tryptic peptides.
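As a quick consistency check on the reported percentages for the standard protein mixture, the fraction of partially tryptic peptides (71% of 470 unique peptides) can be combined with the ISF share of those peptides (81%). This is plain arithmetic on the rounded figures quoted in the text, so the result is approximate, and the variable names are illustrative.

```python
n_unique = 470                      # unique peptides in the standard protein mixture
partial_fraction_of_unique = 0.71   # partially tryptic share of unique peptides
isf_fraction_of_partial = 0.81      # ISF share of partially tryptic peptides

# ISF share of all unique peptides, and the implied approximate peptide count.
isf_fraction_of_unique = partial_fraction_of_unique * isf_fraction_of_partial
approx_isf_peptides = round(isf_fraction_of_unique * n_unique)
```

The product is about 0.58, which agrees with the statement that in-source fragments account for roughly 60 % of the unique peptide identifications in the standard protein mixture.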
In summary, we report ISF is a major source of partially tryptic peptides in LC-MS/MS proteomics. The impact of ISF artifacts is much less significant for data-dependent measurements for more complex mixtures due to the larger extent of under sampling of low-abundance species. The in-source fragments were distinguished from proteolysis-generated partially tryptic peptides based on elution time criteria, in which the majority of the in-source fragments showed much later elution times compared to their predicted elution times. By excluding such ISF artifacts one can much more effectively identify in-vivo or in-vitro proteolytic products, as well as better characterize protease activities.
Portions of this work were supported by the NIH Director’s New Innovator Award Program DP2OD006668, and NIH grants 8P41 GM103493 and 5P41 RR018522. The experimental work described herein was performed in the Environmental Molecular Sciences Laboratory, a national scientific user facility sponsored by the DOE and located at Pacific Northwest National Laboratory, which is operated by Battelle Memorial Institute for the DOE under Contract DE-AC05-76RL0 1830.