We previously reported a protein expression profiling experiment conducted on human pancreatic tissues using 2D gel electrophoresis and mass spectrometry. Here, 18 spots that were identified in the gel at molecular weights more than 10 kDa lower than database values are characterized. The matrix-assisted laser desorption/ionization mass spectrometry coverage is sufficient to identify the protein region present in each spot. Most of the fragments correspond to processed chains and known structural or functional domains, which may result from limited proteolysis.
Two-dimensional (2D) gel electrophoresis for separation of complex protein samples, coupled with mass spectrometry for protein identification, has been used to analyze protein expression patterns for many sample types. This technique inherently provides information not only on full-length protein expression but also on the expression of modified proteins, splice variants, cleavage products, and processed proteins. Any protein modification that changes the overall protein charge and/or molecular weight (MW) will generate a different spot on the 2D gel. Modification-specific staining can identify whether a specific post-translational modification is responsible for the shift, and mass spectrometry can potentially identify the source of isoelectric point (pI) and/or MW differences.1,2 Because neither matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) nor high-performance liquid chromatography tandem mass spectrometry (LC-MS/MS) provides complete coverage of a protein’s amino acid sequence, there has been limited success in using MS to identify isoforms and post-translational modifications. While the theoretical MW is often slightly higher than the MW of the fully processed protein due to cleavage of signal and pro-peptides, there can also be post-translational modifications that increase the protein’s gel MW. Thus, exploring the causes of the difference between the theoretical MW and the MW seen in the gel can yield information about the state of the protein. When the gel MW of a given protein is significantly lower than the calculated MW, the gel spot represents a protein fragment.
The extent to which proteins are present as fragments or variants in tissues and fluids has not been determined, but the combination of 2D gel electrophoresis, Western blotting, and mass spectrometry–based protein identification makes such analyses possible. Two-dimensional gel electrophoresis of human mammary tissue, followed by immunoblotting, resulted in multiple spots at significantly differing molecular weights.3,4 Mattow et al. used 2D gel electrophoresis and MALDI-MS to examine the culture supernatant proteome of Mycobacterium tuberculosis, where 58% of the proteins were detected in more than one spot, including multiple truncations of elongation factor EF-Tu.5,6
The function of protein fragments is dependent on activation processes and localization properties. Proteins that are activated by limited proteolysis can be identified as either precursor protein or fully processed product based on this information. In other cases, there are transcriptional variants of different lengths, such as 54-kDa heat shock cognate protein (HSC54), which serves as a competitive inhibitor of the full-length transcript, and 71-kDa heat shock cognate protein (HSC71), with differing localization properties.7,8 Fragments are secreted into the serum, where they have been identified as antigens or markers for specific diseases.9,10 Several angiogenesis inhibitors are fragments of larger proteins that are themselves not active as angiogenesis inhibitors.11 Vasostatin, the N-terminal domain of calreticulin, is an angiogenesis inhibitor that exerts antitumor effects in vivo.12 In a surface-enhanced laser desorption/ionization (SELDI) identification of three serum biomarkers for the detection of early stage ovarian cancer,13 two of the biomarkers were identified as lower-MW protein fragments. Thus, fragments may be of greater importance as secreted proteins than in the tissue of origin.
Our previously published study used 2D gel electrophoresis and image analysis to identify proteins that were differentially expressed between normal pancreatic tissue, pancreatitis tissue, and pancreatic adenocarcinoma tissue.14 Sixty-eight differentially expressed gel spots were analyzed by MALDI-MS and database searching, and 40 different proteins were identified from these spots. In the course of that analysis, 10 proteins were identified in 18 gel spots at MW values significantly lower than their theoretical MW values. In this study, the protein regions observed in these fragments are identified by detailed analysis of the mass spectral results generated for the previous study. In vitro experiments are described that address the possibility of proteolytic activity as the source of protein fragments.
Twenty-one total pancreatic tissue samples were obtained from the NCI Human Tissue Network as described in Supplemental Table 1 of the previous report.14 Normal tissues included autopsy samples (n = 5) collected within 8 h of death from patients with non-pancreatic diseases. The remaining tissues were surgical samples, including normal tissues (n = 2) adjacent to tumors, normal tissue (n = 1), tissues from pancreatitis patients (n = 7), and pancreatic ductal adenocarcinomas (n = 6). All samples were frozen within 30 min of resection. Selected samples were pooled by tissue type into three sets: normal, pancreatitis, and tumor.
For each individual or pooled sample, 150 μg of protein was precipitated from the radioimmuno-precipitation assay (RIPA) buffer. Two-dimensional gel electrophoresis was performed for protein separation with isoelectric focusing over pI ranges of 3–10, 4–7, or 5–8 used for the first dimension and SDS-PAGE for the second dimension, as previously described.14,15 Gels were stained with SYPRO-Ruby and images captured on a Kodak Image Station 440CF. Integrated signal intensities were analyzed quantitatively by using Kodak 1D or Bio-Rad’s PDQuest 2-D gel image analysis software. Differential expression was then confirmed visually by two independent observers.
The selected spots were manually excised and subjected to in-gel tryptic digestion based on the procedure described by Rosenfeld et al.16 The samples were desalted using a ZipTip μ-C18 pipette tip (Millipore, Billerica, MA) according to the manufacturer’s protocol, using 1.45 μL of matrix to elute the bound peptides directly onto the MALDI target. Ten milligrams of α-cyano-4-hydroxycinnamic acid (Applied Biosystems, Framingham, CA) was dissolved in 1 mL of 50% acetonitrile/0.1% trifluoroacetic acid to produce a saturated solution. The supernatant was then diluted 1:1 with solvent and used as the matrix. Alternating rows on the target were spotted with 0.45 μL of Calibration Mixture 1 (Applied Biosystems) for external calibration.
MALDI-TOF spectra were acquired for reference 14 on an Applied Biosystems Voyager DE-PRO in reflectron mode over the m/z range 700–2500. Automated data acquisition was performed by the Proteomics Solutions 1 Utility V1.0.0, with close external calibration performed for each sample spot. Database searching was repeated for this paper. Spectral processing utilized Data Explorer 220.127.116.11 to do baseline correction and de-isotoping of the mass spectra. The peak list was filtered to remove porcine trypsin and human keratin masses seen in blank gel digests or contaminated samples. The remaining 15 or 20 most intense peaks in the 900–2500 m/z range were entered into Auto-MS-Fit (v. 3.2.1)17 for PMF matching between the experimentally measured digest peptide masses and those generated by theoretical digest of a mammalian subset of the Swiss-Prot database (v. May 9, 2004)18 containing proteins up to 100 kDa (about 25,000 entries). The PMF searches were carried out assuming peptides were the result of tryptic digestion, with no more than two missed cleavages, and carbamidomethylation of the cysteine residues. Variable modifications considered were oxidation of methionines, protein N-terminal acetylation, and cyclization of peptide N-terminal Gln to form pyrrolidone carboxylic acid. The first-pass search in Auto MS-Fit required the experimentally measured masses to be within 80 ppm of the theoretical values, and generated a list of proteins that matched at least 5 out of the 20 masses submitted. A second pass utilized Intellical to recalibrate the spectrum based on the first-pass results as a pseudo-internal calibration curve, and required experimentally measured masses to be within 30 ppm of the calculated values. A list of proteins that matched at least 4 out of the 20 peak masses submitted was generated. False-positive hits typically had a maximum of 4 or 5 peptides matched out of 20 masses submitted under the search conditions described.
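The two-pass mass tolerance criterion described above can be illustrated with a minimal Python sketch. It is not the MS-Fit implementation: the function names and the masses in the usage lines are hypothetical, and only the 80 ppm and 30 ppm tolerances are taken from the text.

```python
def ppm_error(measured: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def count_matches(measured_masses, theoretical_masses, tol_ppm: float) -> int:
    """Count measured masses lying within tol_ppm of any theoretical mass."""
    matched = 0
    for m in measured_masses:
        if any(abs(ppm_error(m, t)) <= tol_ppm for t in theoretical_masses):
            matched += 1
    return matched


# Hypothetical peptide masses, for illustration only.
theoretical = [1000.0, 1500.0, 2000.0]
measured = [1000.05, 1500.03, 2000.2, 1234.5]

first_pass = count_matches(measured, theoretical, tol_ppm=80.0)   # looser first pass
second_pass = count_matches(measured, theoretical, tol_ppm=30.0)  # after recalibration
```

In this toy example the measured masses are not recalibrated between passes, so the second pass only shows the effect of tightening the tolerance. In the procedure described above, recalibration shifts the measured masses before the 30 ppm window is applied.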
Proteins with fewer than 6 peptide masses matched by PMF were analyzed by post-source decay (PSD) fragmentation of the most intense peptide ion(s) to identify the protein by peptide sequencing. For the PSD, an average of 10 segments were collected at different mirror ratios. In the <180 m/z range, ~3 × 10⁻⁶ torr collision gas (air) was used for CID. A list of fragment ions was entered into the MS-Tag search engine manually, using the same parameters as above, and queried against a database of expected peptide fragment ions based on tryptic digest of database proteins. The parent ion mass tolerance was 30 ppm, while the fragment ion mass tolerance was 1500 ppm. The search was performed first using the Swiss-Prot database. For this paper, the search was repeated with a shuffled database to estimate the rate of false positives.19 False-positive hits matched 76 ± 11% of the ions for peptides with fewer than 20 ions submitted, and at most 60 ± 6% of the ions for peptides with more than 20 ions submitted.
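The idea behind a shuffled (decoy) database can be sketched as follows. The text does not describe how the shuffled database used here was constructed, so this is only one common approach, shown under that assumption: each sequence is randomly permuted, which preserves its length and amino acid composition while destroying the real peptide sequences. The function name, the `decoy_` prefix, and the example sequence are hypothetical.

```python
import random


def shuffled_database(proteins: dict, seed: int = 0) -> dict:
    """Return decoy entries whose residues are a random permutation of each input sequence."""
    rng = random.Random(seed)  # fixed seed so the decoy set is reproducible
    decoys = {}
    for name, sequence in proteins.items():
        residues = list(sequence)
        rng.shuffle(residues)
        decoys["decoy_" + name] = "".join(residues)
    return decoys


# Hypothetical one-entry database, for illustration only.
target_db = {"EXAMPLE_PROTEIN": "MKLVAGRTPEWQKDLSNFR"}
decoy_db = shuffled_database(target_db)
```

Any hit returned from a search against the decoy entries is a false positive by construction, so the fraction of ions matched in such hits gives an empirical threshold of the kind quoted above.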
Tissue lysates were analyzed to detect any residual protease activity with the Protease Screening Kit (Genotech, St. Louis, MO) following the manufacturer’s protocol. Twenty tissue lysate samples (one sample was used up prior to this analysis) were analyzed. Five micrograms of protein extract in RIPA buffer from each sample was incubated with protease substrate at 37°C for 3 h. Following precipitation and centrifugation steps, assay buffer was added to the clear supernatant. A pink color developed in the presence of protease activity. The intensity of the color, which is proportional to the protease activity, was measured at λ = 574 nm using a Cary 50 Bio UV-Visible Spectrophotometer (Varian Instruments, Walnut Creek, CA).
Ten proteins identified during the original study appeared in the gel at a MW more than 10 kDa lower than the theoretical MW, and thus represent protein fragments. These ten proteins were seen at lower MW in 18 distinct gel spots, and five of the proteins were also detected at the expected MW for the full-length protein. On average, each protein was identified from two separate gels. Figure 1 is a representative 2D gel image from reference 14 showing the location of the 18 protein fragment and 5 full-length protein spots. The proteins were identified either by PMF of the MALDI-MS or by fragmentation of individual peptides using MALDI-PSD.14 For this paper, the same spectra were subjected to a new search using the Swiss-Prot database. The protein identification data for the 23 spots of interest from reference 14 are detailed in Supplementary Table 1. The full peak lists, peptide sequences, and residue numbers for the MALDI-matched masses for the new searches are given in Supplementary Table 2.
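The selection criterion stated above (gel MW more than 10 kDa below the theoretical MW) amounts to a simple threshold test. The sketch below is illustrative only; the function name is hypothetical and the MW values in the usage lines are made up rather than taken from the data tables.

```python
def is_fragment(theoretical_mw_kda: float, gel_mw_kda: float,
                threshold_kda: float = 10.0) -> bool:
    """True when the gel MW is more than threshold_kda below the theoretical MW."""
    return (theoretical_mw_kda - gel_mw_kda) > threshold_kda


# Hypothetical spots, for illustration only.
candidate_a = is_fragment(theoretical_mw_kda=45.0, gel_mw_kda=27.0)  # 18 kDa lower
candidate_b = is_fragment(theoretical_mw_kda=72.0, gel_mw_kda=70.0)  # 2 kDa lower
```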
For 78-kDa glucose-regulated protein GRP78 (previously reported via its gene name HSPA5), the full-length 72-kDa protein was identified in spot 21. The peptides detected in the MALDI-MS are shown in Figure 2A and are labeled in the spectrum in the upper panel of Figure 2B. Thirty-seven percent of the amino acids from the database sequence for the GRP78 precursor protein were covered by the MALDI-MS. Of the MALDI-detected peptides for spot 21, the residue closest to the N-terminus was V50, while the residue closest to the C-terminus was K633. GRP78 was also identified in three lower-MW gel spots, spots 22–24. For spots 22–23, all the peptides detected were confined to the protein’s N-terminal region, with V50 the residue closest to the N-terminus and R386 the C-terminal extreme residue detected (Figure 2A and middle panel of Figure 2B). In contrast, the peptides detected in spot 24 were confined to the protein’s C-terminal region, between residues K447 and K663 (Figure 2A and lower panel of Figure 2B). Thus, spots 22–23 represent N-terminal protein fragments of GRP78, while spot 24 represents a C-terminal protein fragment of GRP78. The sequence coverage of spots 22 and 23 and that of spot 24 are mutually exclusive, as can be seen by the lack of overlapping peptides in the spectra from spots 23 and 24 in Figure 2B. Cleavage between residues 386 and 447 could have produced the N- and C-terminal fragments. Significantly, even in the spectrum of the full-length protein, no peptides are seen in the 386–447 region, due to a dearth of cleavage sites in that stretch. Similarly, the MALDI-MS for the other protein fragment spots contained peptides that were confined to a particular protein region, either N-terminal, C-terminal, or central. They are listed in Table 1, along with the MALDI-MS N- and C-terminal extreme residues detected.
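The endpoint reasoning used for GRP78 can be sketched in a few lines of Python: take the residue ranges of the matched peptides for each spot, record the extreme residues, and test whether the spans of two spots overlap. The function names are hypothetical, and the individual peptide ranges below are invented for illustration; only the extreme residues (50, 386, 447, 663) come from the text.

```python
def covered_span(peptides):
    """Return (first_residue, last_residue) across all matched peptides."""
    starts = [start for start, _ in peptides]
    ends = [end for _, end in peptides]
    return min(starts), max(ends)


def spans_overlap(span_a, span_b) -> bool:
    """True when two inclusive residue spans share at least one residue."""
    return span_a[0] <= span_b[1] and span_b[0] <= span_a[1]


# Hypothetical peptide ranges; only the outermost residues are from the text.
n_terminal_peptides = [(50, 60), (165, 181), (370, 386)]   # e.g. spots 22-23
c_terminal_peptides = [(447, 464), (525, 540), (650, 663)]  # e.g. spot 24

n_span = covered_span(n_terminal_peptides)
c_span = covered_span(c_terminal_peptides)
mutually_exclusive = not spans_overlap(n_span, c_span)
```

Reporting the span for each spot, rather than only the percent coverage, is what makes the N-terminal and C-terminal assignments visible.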
Coverage values are also listed in Table 1, calculated in the conventional manner as the number of amino acid residues detected in each MALDI-MS divided by the number of amino acids in the database sequence. In addition, a weighted percent coverage is calculated for the protein fragment spots. Here the ratio of the database MW of the full-length protein to the gel MW of the protein fragment is used as a weighting factor to reflect the fact that the spot contains only a fragment of the protein. When this is done, the percentage coverage for protein fragments is on average similar to that of full-length proteins. Using the percentage coverage values for the 18 protein fragments contained in Table 1, 33% coverage is obtained on average. From the data in our previous publication,14 an average of 24% coverage was observed for the 50 protein spots that appeared at expected MW values. In order to improve protein coverage, the MALDI-MS were examined manually to determine whether unmatched peaks could be identified, possibly as non-tryptic cleavage products. In general, the unmatched masses listed in the MS-Fit output represented noise. However, in several cases, an unmatched peptide peak was found to be the protein N-terminal peptide after removal of the signal sequence. Since the Swiss-Prot database lists the precursor protein, the signal sequence is not removed for the theoretical digest. However, removal of the signal sequence generates a new N-terminal tryptic peptide for the protein. This peptide was observed in eight MALDI-MS in this study (see Table 1), and confirmed by fragmentation of the N-terminal peptide by MALDI-PSD in three cases. Incorporation of signal peptide information would help to improve database searching for PMF and MS/MS searches. Several other unmatched ions resulted from sample oxidation during 2D gel electrophoresis: oxidized tryptophan, oxidized carbamidomethylated cysteines, and fragmentation products of oxidized residues.
As seen by the GRP78 spectra in Figure 2B, most of the peptide peaks are identified as tryptic peptides.
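The conventional and weighted coverage calculations can be written out as a short Python sketch. The text states only that the ratio of database MW to gel MW is used; multiplying the conventional coverage by that ratio is an assumption made here, and the function names and numbers are hypothetical.

```python
def percent_coverage(residues_detected: int, sequence_length: int) -> float:
    """Conventional coverage: detected residues over database sequence length."""
    return 100.0 * residues_detected / sequence_length


def weighted_percent_coverage(residues_detected: int, sequence_length: int,
                              database_mw_kda: float, gel_mw_kda: float) -> float:
    """Coverage scaled by database MW / gel MW (assumed form of the weighting)."""
    ratio = database_mw_kda / gel_mw_kda
    return percent_coverage(residues_detected, sequence_length) * ratio


# Hypothetical fragment: 100 of 500 residues detected, 50-kDa protein seen at 25 kDa.
conventional = percent_coverage(100, 500)
weighted = weighted_percent_coverage(100, 500, database_mw_kda=50.0, gel_mw_kda=25.0)
```

With these made-up numbers, a fragment of half the full-length MW has its 20% conventional coverage doubled to 40%, which is the sense in which the weighting makes fragment and full-length coverage comparable.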
Combining 2D gel electrophoresis and protein identification by mass spectrometry is an effective method for establishing the presence of protein fragments in biological samples. The peptides detected for protein fragments are confined to a particular region of the protein sequence, and listing the endpoints of the MS detected peptides helps identify the approximate boundaries, even with the 20–30% coverage typical of MALDI-MS. While region-specific antibodies can also be used for such analyses, the possibility of nonspecific binding necessitates confirmation by another method for identification of any protein not seen at the expected molecular weight. If the antibody is raised for only one region of the protein, it will not detect those fragments that do not contain the epitope sequence, while the MS analysis is largely sequence independent. Reporting of the region covered by the MS protein identification will be a useful tool in establishing the prevalence of fragments and more accurately reflect the information contained by the data.
A number of mechanisms are possible for the generation of the protein fragments. They may be generated at the RNA level as sequence variants and truncations, or post-translationally, either by the processing that activates proteins or as fragments produced for alternative purposes, such as inhibition or secretion. Some proteins are activated by cleavage into polypeptide chains held together by disulfide bridges. In the cell, they remain intact. However, during reduction and alkylation of the cysteines prior to isoelectric focusing, the disulfide bonds are disrupted and the individual chains are separated by 2D gel electrophoresis. This phenomenon is observed for two of the proteins in this study, cathepsin D and chymotrypsinogen. The active form of cathepsin D consists of a light and a heavy chain, linked by disulfide bonds. The heavy chain includes residues 169–412, with a calculated MW of 27 kDa. The experimental MALDI data for spots 8 and 9 include peptides in the region extending from 195–411, with a gel MW of 27 kDa, and these spots can thus be assigned as the heavy chain of a processed cathepsin D (Table 1).
Five spots were identified as chymotrypsinogen (Table 1). Processing of the inactive zymogen chymotrypsinogen A to the fully active protease chymotrypsin proceeds via two pathways.20,21 In the fast activation pathway, trypsin cleaves after Arg-33, producing an active π-chymotrypsin. Chymotrypsin subsequently excises two dipeptides, residues 32–33 and 165–166, to form the fully activated α-chymotrypsin. The three chains are held together by disulfide bridges. An alternate, slow activation pathway involves chymotryptic cleavage of one or more of three chymotryptic cleavage sites to form inactive neo-chymotrypsinogens, as indicated in Figure 3A.22 Neo-chymotrypsinogens can be activated by tryptic cleavage after Arg-15, again progressing to a three-chain enzyme. The location of the dipeptides and final chains is shown in Figure 3A, highlighting the location of peptides detected by MALDI-MS and MALDI-PSD. Spot 16 migrated at the pI and MW expected for the 28-kDa inactive precursor protein chymotrypsinogen, similar to its position in 2D gel separations of pancreatic juice. Spots 17 and 18 (gel MW 17 kDa) were shown to contain peptides from both the A and B chains of chymotrypsin. The N-terminal peptide containing the A chain and first dipeptide (residues 19–33) is detected only in the spectra of spots 17 and 18. This peptide is generated by tryptic digestion only if the precursor signal peptide has been removed prior to 2D gel electrophoresis. The MALDI-PSD of this peptide at m/z 1562 is shown in Figure 3B. The A-B chain is produced via the neo-chymotrypsinogen pathway of activation. The MS of spots 19–20 combined with gel mobility suggest that these spots contain the B chain of the fully activated chymotrypsin enzyme.
The other eight proteins observed as protein fragments do not represent chains resulting from activation pathways required to produce a functional protein. However, most have been observed previously and represent structural or functional domains. A summary of the domains detected is given in Table 2, along with citations for the characterization of the domains in vitro, or previous observations of fragments from in vivo samples. GRP78 fragments have been observed previously in a global proteomics analysis of pancreatic tissue.23 In addition, in vitro studies have identified two domains, an N-terminal ATPase domain and a C-terminal oligomerization domain.24–26 The boundary between these in vitro domains is Q409, and that is consistent with our observed N-terminal domain spots 22 and 23 and the C-terminal domain in spot 24 (Figure 2).
Prolyl 4-hydroxylase β (gene P4HB), also known as protein disulfide isomerase, is a well-characterized protein containing a signal sequence and five structural domains, as shown in Figure 4A.27–29 It was identified in three gel spots in the original study, spots 34–36. Spot 34 is the full-length protein, and spots 35 and 36 are N-terminal protein fragments, with different cleavage sites producing the 37-kDa and 25-kDa protein fragments (Table 1). The MALDI spectra are shown in Figure 4B. Spot 34 contains peptides from the a, b, b′, and a′ domains (upper panel), while spot 35 contains peptides from a, b, and b′ (middle panel) and spot 36 contains peptides only from the a and b domains (lower panel). The cleavage point that generated spot 36 is most likely localized between residues 230 and 247, as the peptide covering residues 231–247 is observed at m/z 1965 (peak labeled b/b′ in Figure 4B) in the spectra of spots 34 and 35, but not spot 36. Additionally, the peptide covering residues 223–230 is observed in spot 36 at m/z 991. The 231–247 region overlaps the boundary between the b and b′ domains. The signal peptide has again been removed, exposing a new N-terminal tryptic peptide that is observed in all spectra at m/z 1521 (peak with bold a label in Figure 4B). The same cleavage site that generated spot 36 was implicated in a 2D gel electrophoresis/mass spectrometry protein profiling experiment on barley seed development that detected both N- and C-terminal domain fragments.30
In several studies, limited proteolysis has identified susceptible cleavage sites that are in agreement with our data. In horse pancreatic lipase–related protein 2 (PLRP2), it has been noted that the Ser262–Thr263 bond is very susceptible to proteolytic cleavage.31 Cleavage at this site is consistent with our data for the protein fragment of PLRP2 in spot 31, with MALDI-MS amino acid residues detected between 44 and 251. In vitro studies have noted that this N-terminal region is the enzyme’s catalytic domain.32,33 Similarly, the linker and coil 2 regions of cytokeratin 8 are contained in the 17-kDa fragment seen in spot 43; comparable 20-kDa fragments have been observed from limited proteolysis of cytokeratin 8 and 18 protofilaments.34 Cytokeratin 8 fragments are found in the serum and in tumors as a component of tissue polypeptide specific antigen (TS1), as reviewed by Sundstrom and Stigbrand.35 Interestingly, the immunodominant region of the cytokeratin 8 fragments in serum was determined to be the conserved residues 340–365, a region present in the 17-kDa fragment at spot 43.
Heat shock cognate protein 71 kDa (HSC71) has a 54-kDa variant (HSC54) containing the N-terminal portion of the protein, with deletions in the C-terminal domain, which has been identified at the mRNA level in various types of human cells and tissues.7 There is evidence that it functions as an endogenous inhibitory regulator of HSC71 by competing for the cochaperones. In addition, the 54-kDa variant is not able to translocate into the nucleus during stress conditions because it lacks a nuclear localization–related sequence.8 The 37-kDa ATPase domain in spot 40 contains N-terminal peptides common to both HSC71 and HSC54, and may represent another variant, or a fragment of either of the known variants.
For two of the proteins detected as protein fragments, the fragments were not associated with known domains. α-Amylase was identified in spots 27 and 28, which correspond to N-terminal and central domain fragments. However, these fragments do not correlate with the known structural domains of the protein.36–38 For 14-3-3 ζ, a single peptide was detected near the protein’s N-terminal, but the coverage is too low to make any connection to protein domains. In summary, 18 protein fragment spots were identified, with 6 spots corresponding to post-translational processing for activation by limited proteolysis, 9 spots representing structural domains, and 3 that could not be correlated with protein domains.
The cleavages identified suggest that limited proteolysis was the mechanism of protein-fragment production. This would be facilitated by activation of endogenous proteases, as high levels of protease precursors are found normally in pancreatic tissue. Activation could occur as part of normal cellular processes, during sample collection, and/or during processing of the lysates prior to 2D gel electrophoresis. After collection, the tissue samples used in this study were frozen at −80°C prior to analysis and then lysed in protease inhibitor containing RIPA buffer. After lysis, they were kept in 8 M urea briefly on ice prior to 2D gel electrophoresis. Thus, it is unlikely that significant proteolytic activity occurred during sample processing prior to 2D gel electrophoresis.
In order to determine whether any active proteases are present in the sample, extended incubation at 37°C was conducted on 20 of the original tissue samples, using a commercial protease screening kit. A color change indicates proteolytic activity, and absorbance values are given in Table 3. A control sample, without tissue, lacked proteolytic activity. Normal pancreatic tissue (n = 8) derived from both autopsy and surgical samples, as well as the pancreatitis patient tissue (n = 6), showed higher levels of proteolytic activity than the tumor tissue on average. The method of sample collection does not explain the difference in proteolytic activity, as surgical and autopsy normal samples gave a similar range of values. Tumor tissue has lower levels of the precursor proteases,14 and it is therefore not surprising that a relatively lower level of proteolytic activity was observed. When protease inhibitor levels were doubled, a slight reduction in activity was noted in the normal and pancreatitis samples (Table 3). This result indicates that some proteases in these samples are not affected by the protease inhibitors used in standard preparations.
While these experiments indicate that high levels of endogenous proteases may increase the abundance of protein fragments, protein fragments have been observed in other sample types. Other investigators using 2D gel electrophoresis combined with Western blotting or mass spectrometry have identified protein fragments in multiple gel spots when analyzing tissue, plant material, and microorganisms.3–5,23,30 Increased use of 2D Western blotting and 2D gel electrophoresis/mass spectrometry identification in a variety of sample types and species will enable determination of the prevalence and variety of protein fragments.
Two-dimensional gel electrophoresis and MALDI-MS together provide an effective strategy for determining the protein domains present in those gel spots that are observed at significantly lower MW values than are given in the database. While average sequence coverage is only 30%, the peptides detected are confined to a specific region of the protein, such as the protein N- or C-terminus. This information could easily be incorporated into protein identification tables. Regional coverage information is not readily available from either LC-MS/MS analysis of digests of cellular lysates or from epitope-specific antibodies. Some of the protein fragments correspond to chains produced by known cellular processing and activation pathways. Others have been detected as functional and structural domains during in vitro experiments or noted in other in vivo studies, indicating they function intra- or extra-cellularly (see Table 2). By using tools that allow both protein identification and measurement of MW, we can assess the abundance and distribution of protein fragments. Correlation of these results with targeted functional studies on specific proteins will elucidate the biological function of protein fragments.
We thank Lisa Schroeder for her excellent technical support, Rong Wang for providing the shuffled database, and Asma Sharif for assistance with the figures. This study was supported by National Institutes of Health (NIH) grants RO1 CA098380, P20 CA101936, P30 ES07784, NIH Cancer Center Core grant CA16672, and a research grant from the Topfer Research Funds.