Offline high performance liquid chromatography combined with matrix-assisted laser desorption/ionization and Fourier transform ion cyclotron resonance mass spectrometry (HPLC-MALDI-FTICR/MS) provides the means to rapidly analyze complex mixtures of peptides, such as those produced by proteolytic digestion of a proteome. This method is particularly useful for making quantitative measurements of changes in protein expression by using 15N-metabolic labeling. Proteolytic digestion of combined labeled and unlabeled proteomes produces complex mixtures with many mass overlaps when analyzed by HPLC-MALDI-FTICR/MS. A significant challenge to data analysis is the matching of pairs of peaks which represent an unlabeled peptide and its labeled counterpart. We have developed an algorithm and incorporated it into a computer program which significantly accelerates the interpretation of 15N metabolic labeling data by automating the process of identifying unlabeled/labeled peak pairs. The algorithm takes advantage of the high resolution and mass accuracy of FTICR mass spectrometry. The algorithm is shown to successfully identify the 15N/14N peptide pairs and calculate peptide relative abundance ratios in highly complex mixtures from the proteolytic digest of a whole organism protein extract.
Various stable isotope labeling strategies are being used today in the field of comparative proteomics. The methods of stable isotope labeling can generally be placed in three classes: metabolic labeling [2-4], chemical reaction [5, 6] and enzyme reaction [7, 8]. The differentially labeled samples are then combined and concurrently processed and analyzed. In the metabolic labeling method, the proteins in cells grown in a particular medium can be uniformly labeled by including the isotope of interest as an inorganic compound [2, 4] or already incorporated into amino acids. The mass difference between unlabeled and labeled peptides depends upon the type of stable isotope labeling that is employed. The chemical labeling methods and enzyme reactions usually produce a fixed mass difference between unlabeled and labeled peptides, which facilitates the matching of the peak pairs in a mass spectrum. On the other hand, metabolic labeling usually results in a variable mass difference, such as in the case of 15N-labeling, in which the mass difference between unlabeled and labeled peptides is very close to 1 u per nitrogen atom in the elemental composition.
Efforts in our laboratory are directed at developing high-throughput proteomic analysis methods that allow protein identification and quantitation in the shortest possible time. Reversed-phase liquid chromatography (LC) coupled with Fourier transform ion cyclotron resonance mass spectrometry (FTICR) has been shown to provide high confidence in peptide identification, increased sensitivity, and higher throughput than methods that rely on repetitive MS/MS identification [9, 10]. To measure differences in relative protein abundances, proteomes with two different stable-isotope compositions are combined and analyzed. In this study, Methanococcus maripaludis was grown in both normal media (14N minimal media) and heavy media (15N minimal media). In the mass spectra, peptides appear as light and heavy pairs of peaks, and the spacing between a heavy/light peak pair corresponds to the number of nitrogen atoms present in the peptide times the mass difference between 14N and 15N, namely 0.9970 u, as shown in Figure 1. Thus, 15N metabolic labeling in conjunction with FTICR not only allows quantitative studies, but also provides the nitrogen stoichiometry of the peptides, which serves as a search constraint to increase the peptide identification rate. For the organism M. maripaludis, with 1722 open reading frames, 15% of the tryptic peptides can be identified by accurate mass measurement with a 10 ppm mass tolerance, while 43% can be identified if nitrogen stoichiometry is used as a search constraint.
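The relationship between nitrogen stoichiometry and peak spacing can be illustrated with a short Python sketch. It assumes an unmodified peptide built from the 20 standard amino acids and uses the 0.9970 u spacing quoted above; the function names and the example sequence are hypothetical and are not part of N15Tool.

```python
# Nitrogen atoms per residue: one backbone N, plus side-chain N for the
# residues listed here. All other standard residues contribute one N.
N_PER_RESIDUE = {"R": 4, "H": 3, "K": 2, "N": 2, "Q": 2, "W": 2}

DELTA_N = 0.9970  # u, 15N/14N mass difference quoted in the text


def nitrogen_count(sequence):
    """Count nitrogen atoms in an unmodified peptide sequence."""
    return sum(N_PER_RESIDUE.get(residue, 1) for residue in sequence)


def pair_spacing(sequence):
    """Expected mass spacing (u) between the 14N and fully 15N peptide."""
    return nitrogen_count(sequence) * DELTA_N


# Hypothetical tryptic peptide used only for illustration.
peptide = "SAMPLER"
print(peptide, nitrogen_count(peptide), round(pair_spacing(peptide), 4))
```

For this illustrative sequence, six residues carry one nitrogen each and arginine carries four, so the light and heavy monoisotopic peaks would be expected about ten times 0.9970 u apart.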
At the outset of the development of HPLC-FTICR/MS for shotgun proteomics, the most significant bottleneck in the analysis was manual peak picking from a 15N metabolic labeling data set. While the most abundant pairs are easily assigned by visual inspection, there are regions in most mass spectra that are quite congested with moderate and low abundance peptides which overlap each other in mass-to-charge. For a typical proteomic analysis of M. maripaludis, 50-100 fractions are collected and each fraction is analyzed by MALDI-FTICR/MS. On average, it takes 2-3 hours to manually pick a single mass spectrum, requiring 150-200 hours for analyzing one data set. Manual interpretation of such complex data is very tedious and time consuming. Several algorithms have been developed to process stable isotope labeling data [11-14]. However, these algorithms are mostly focused on the quantitative aspects of the measurement, and the peptide identification is based on MS/MS. Until now there has been no development of software that might be used to aid the identification of 15N/14N peptide pairs from 15N metabolic labeling experiments. To assist with the study of 15N/14N peptides in mass spectra, we have devised and implemented a new computer algorithm. It has been incorporated into a program we call N15Tool, which is able to automatically identify and quantify 15N/14N peptide pairs from the mass spectra using m/z and peak intensity data directly.
The objective of the work described here is to first consider the challenge presented in analyzing a typical MALDI-FTICR mass spectrum of a tryptic digest of a 15N-metabolically labeled proteome, and then to develop an algorithm that identifies pairs of 15N labeled components and calculates the ratio of stable isotope coded peptides. The algorithm for pair assignment is based on finding peaks that are separated by an integer value times the 15N/14N mass spacing of 0.9970 u, within a narrow range of error. This method will be demonstrated to provide over 99% accuracy in assigning peak pairs, verified against manual peak picking. In addition, light and heavy labeled peaks are compared and normalized to provide relative differential quantitative information at the peptide level. This automated program reduces the time of data analysis from over 100 hours to tens of minutes.
The analyzed proteome was a whole cell lysate extracted from M. maripaludis, which was grown on minimal media with ammonium sulfate as the sole source of nitrogen. M. maripaludis Δlrp mutant (S102) and wild-type (S2) were cultured to midlogarithmic and stationary stages in ammonium sulfate with naturally occurring isotopic composition (99.6% 14N, 0.4% 15N) and with 15N-enriched composition (>99% 15N), respectively. 40 mL of culture grown with 14N was mixed with 40 mL of culture grown with 15N to form the cell mixtures. The cell mixtures were centrifuged at 10,000 × g for 30 min at 4 °C, followed by lysis with a French pressure cell. DNA was digested with DNase during a 15 min incubation at 37 °C. Cell debris and unbroken cells were removed by centrifugation at 8000 × g for 30 min at 4 °C.
The proteome sample was prepared in alkaline solution (10 mM ammonium bicarbonate) at ~1 mg/mL concentration and denatured by heating at 90 °C for 10 min. Disulfide bonds were reduced with tris(2-carboxyethyl)phosphine (Pierce Biotechnology, Rockford, IL). Denatured proteins were digested overnight at 37 °C using trypsin (Promega, Madison, WI) at a 1:50 protease/protein ratio (by mass).
Separation of the peptide mixture was carried out on an UltiMate Plus HPLC system using a 75 μm i.d. × 15 cm C18 nanocolumn with 3 μm particles and 100 Å pore size (Dionex LC Packings, Sunnyvale, CA). The mobile phase was water and acetonitrile. Peptides were eluted with increasing acetonitrile (5-80% in 90 min) at an approximate flow rate of 300 nL/min. Eluate was detected by UV absorption at 214 nm and was collected onto a stainless steel MALDI target at 60-s intervals using a Probot Micro Fraction collector (Dionex LC Packings, Sunnyvale, CA). Samples were allowed to dry before adding 400 nL of 1 M 2,5-dihydroxybenzoic acid (DHB) (Lancaster, Pelham, NH) as the MALDI matrix.
Samples were analyzed on a 7-T FTICR mass spectrometer equipped with an intermediate pressure Scout 100 MALDI source (Bruker Daltonics Inc, Billerica, MA). Conditions for operation of the FTICR/MS were similar to those reported previously, with minor modifications. Ions generated from 6 MALDI laser shots were accumulated for each scan, and 12 scans were co-added for each spectrum. The external mass calibration was established using a peptide mixture generated by tryptic digestion of chicken egg albumin (Sigma, St. Louis, MO).
The algorithm for peak matching has been incorporated into a Java program, N15Tool (available from the authors upon request), which consists of three major steps involving data input, processing and output. The input to the program is a text file with two entries per line: the mass value for the monoisotopic peak (charge deconvolution is required for multiply-charged ions) and the signal intensity. For the work published here, assignment of monoisotopic peaks for the 14N and 15N labeled peptides (S/N > 3) was performed automatically using the SNAP program in the DataAnalysis software package (version 3.4 build 184, Bruker Daltonics Inc, Billerica, MA). The peak list from each spectrum, sorted by mass-to-charge value, was saved in a general text format, and served as input to the N15Tool program.
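A minimal Python sketch of reading such a peak list is given here for illustration. The text specifies only two entries per line (monoisotopic mass and signal intensity); the whitespace delimiter, the skipping of header or malformed lines, and the function name are assumptions, and the code is not taken from N15Tool.

```python
def parse_peak_list(text):
    """Parse 'mass intensity' lines into a list of (mass, intensity) tuples.

    Assumes whitespace-separated columns. Lines that do not contain exactly
    two numeric fields (blank lines, headers) are skipped.
    """
    peaks = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # blank or malformed line
        try:
            mass, intensity = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # non-numeric line such as a column header
        peaks.append((mass, intensity))
    peaks.sort()  # the matching step expects ascending mass order
    return peaks


# Hypothetical peak list for one HPLC fraction.
example = "m/z intensity\n1314.70 2.0e6\n\n1001.55 5e5\n"
print(parse_peak_list(example))
```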
For the analysis, the mass spectrum of each collected fraction is processed separately. The labeled and unlabeled versions of a peptide are not separated by chromatography, and so both will appear in the same fraction. Thus, it is advantageous to the matching procedure that the data from separate fractions are not combined before the assignment of peak pairs is made.
The algorithm separates the mass range (m/z 700-4000) into 100 m/z wide segments. For each segment, the minimum and maximum number of nitrogen atoms per peptide is calculated in advance by analysis of in silico digestion of all entries in the protein database for the organism of interest. The algorithm begins with the lowest m/z value of the first input text file, and proceeds by comparing the mass difference with successively larger m/z values, until a match is found. In order to be considered as a potential heavy isotope companion, the second peak must lie at a mass difference that corresponds to a number of nitrogen atoms between the computed minimum and maximum for the segment containing the mass-to-charge value of the first peak. Furthermore, to be assigned as a light/heavy peptide pair, the mass-to-charge spacing between two peaks must closely match an integer value times the 14N/15N mass difference of 0.9970 u. That is, the value n = Δm/0.9970, where Δm is the mass difference between the peaks and n is the calculated number of nitrogen atoms in the peptide, must lie within ±0.006 of an integer value for peptides of m/z 1000. The tolerance value scales with mass, and is equal to approximately 6 ppm of the mass of the peptide in question. The entry for an identified 15N-labeled peak is then removed from the list so that it will not be selected as a 14N peak. If no suitable corresponding 15N peak is found within the maximum spacing predicted for the segment, the peak is discarded, and processing of the list continues with the next peak, which is then treated as an unlabeled peptide. This procedure is repeated until all peaks in the file for the HPLC fraction have been processed. The procedure is automatically repeated for each input file, corresponding to each separate HPLC fraction.
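The matching procedure described above can be sketched in Python as follows. This is an illustrative reading of the text, not the N15Tool source: the function and variable names are hypothetical, the nitrogen bounds in the example table are invented placeholders rather than values from an in silico digest, and the tolerance is assumed to scale linearly with the mass of the light peak.

```python
DELTA_N = 0.9970  # u, 15N/14N mass difference quoted in the text


def match_pairs(peaks, bounds_by_segment, tol_at_1000=0.006):
    """Assign light/heavy pairs in one fraction.

    peaks: list of (mass, intensity), sorted by ascending mass.
    bounds_by_segment: dict mapping the lower edge of each 100 m/z segment
        to (n_min, n_max) nitrogen counts (hypothetical lookup table).
    Returns a list of (light_mass, n_nitrogen, heavy/light intensity ratio).
    """
    remaining = list(peaks)
    pairs = []
    while remaining:
        # The lowest remaining peak is treated as an unlabeled peptide.
        light_mass, light_int = remaining.pop(0)
        bounds = bounds_by_segment.get(int(light_mass // 100) * 100)
        if bounds is None:
            continue  # no nitrogen bounds for this segment: discard peak
        n_min, n_max = bounds
        tol = tol_at_1000 * light_mass / 1000.0  # tolerance scales with mass
        for i, (heavy_mass, heavy_int) in enumerate(remaining):
            n = (heavy_mass - light_mass) / DELTA_N
            if n > n_max + tol:
                break  # beyond the maximum spacing: light peak is discarded
            n_int = round(n)
            if n_min <= n_int <= n_max and abs(n - n_int) <= tol:
                pairs.append((light_mass, n_int, heavy_int / light_int))
                del remaining[i]  # heavy peak cannot be reused as a light peak
                break
    return pairs


# Hypothetical data: two pairs (n = 12 and n = 14) and one unmatched peak.
bounds = {1000: (9, 20)}  # placeholder bounds, not from a real digest
peaks = [(1000.500, 100.0), (1001.520, 50.0), (1005.000, 30.0),
         (1012.464, 120.0), (1015.478, 40.0)]
for light_mass, n_nitrogen, ratio in match_pairs(peaks, bounds):
    print(light_mass, n_nitrogen, round(ratio, 3))
```

In this toy list the peak at 1005.000 finds no partner inside the allowed nitrogen range, so it is dropped and processing moves on, which mirrors the discard step described in the text.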
As each 15N/14N peptide pair is identified, the relative abundance ratio is calculated by the program and stored with the monoisotopic mass of the unlabeled peptide and its nitrogen stoichiometry value. A text format output file is created in which each line contains the three fields of information (mass, nitrogen stoichiometry, and abundance ratio). 15N metabolic labeling does not cause chromatographic isotope effects, so the peak ratio calculated in each spectrum should preserve the quantitative information to a good approximation. However, the monoisotopic peak intensities of the 15N/14N peptide pair are not equal for a 1:1 mixture because the isotopic peak distribution shapes of the same peptide from a sample grown in 14N media (99.6 atom %) and 15N media (97-99 atom %) are not identical, and depend on the exact isotopic enrichment value and the mass of the peptide. Also, it is not possible to create a mixture in which proteins from both samples are present in exactly equal amounts, and so even a carefully measured ratio will deviate from 1:1. To determine an accurate ratio of abundances, both these issues must be addressed. The former issue (determining the enrichment value) is easily addressed by matching the isotope peak distribution of abundant labeled peptides with calculated isotope distributions, in which 15N enrichment is varied to produce the best fit. The latter issue, determining the exact amount of unlabeled and labeled proteins in the original mixture, is resolved by plotting the distribution of abundances, and taking the central value as the true mixing ratio. This value is used to normalize all the measured abundance ratios, so that a value of 1.0 corresponds to no change in protein expression between the control and stressed proteomes.
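The mixing-ratio normalization can be illustrated with a brief Python sketch. The text states only that the central value of the abundance distribution is taken as the true mixing ratio; using the median of the log2 ratios as that central value is an assumption made here, and the function name is hypothetical.

```python
import math
import statistics


def normalize_ratios(ratios):
    """Rescale heavy/light ratios so the central value becomes 1.0.

    The central value is taken as the median of the log2 ratios, which is
    an assumption; the text does not specify how the center is chosen.
    """
    log_ratios = [math.log2(r) for r in ratios]
    center = statistics.median(log_ratios)
    return [2 ** (value - center) for value in log_ratios]


# Hypothetical ratios from a sample mixed at roughly 1.2:1 (heavy:light).
measured = [1.2, 1.2, 2.4, 0.6, 1.2]
print([round(r, 3) for r in normalize_ratios(measured)])
```

After this rescaling, peptides whose expression is unchanged cluster around 1.0, while the peptide measured at twice the central ratio is reported as 2.0.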
For the most accurate measurement of abundance, the entire isotope envelope should be used. Although summing the entire isotope distribution for a peptide is the most accurate way to measure its abundance, it cannot be accomplished for highly congested mass spectra where there are significant overlaps in mass, as in Figure 2. For the proteome experiments, we use only the monoisotopic peak intensity for determining peptide abundance ratios between labeled and unlabeled pairs. The ratio of monoisotopic peak intensities for peptide pairs (15N/14N) varies systematically with the molecular weight of a peptide, and depends also on the exact enrichment value for 15N-labeling. Thus, normalization is required for accurate relative abundance determination. The normalization value and its mass dependence are determined from calculated isotope distributions using the elemental composition of averagine. Errors can occur when a peptide contains a different number of sulfur atoms than predicted by averagine. More accurate quantitative analysis can be done if the peptide results in a unique protein identification, as the specific elemental composition of the peptide can then be used to fit the isotopic pattern.
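The mass and enrichment dependence of the monoisotopic peak ratio can be illustrated with a simplified Python sketch. It is a stand-in for, not a reproduction of, the calculated isotope distributions described above: it considers only the nitrogen term, on the assumption that the light monoisotopic peak is the all-14N species and the heavy one is the all-15N species, so the contributions of the other elements cancel in the ratio. The averagine constants (1.3577 nitrogen atoms per unit of average mass 111.1254 u) are the commonly cited values, the 99.6% natural 14N abundance is taken from the text, and the function names are hypothetical.

```python
AVERAGINE_MASS = 111.1254  # u, average mass of one averagine unit
AVERAGINE_N = 1.3577       # nitrogen atoms per averagine unit
NATURAL_14N = 0.996        # natural 14N abundance quoted in the text


def estimated_nitrogens(mass):
    """Estimate the nitrogen count of a peptide from its mass via averagine."""
    return round(mass / AVERAGINE_MASS * AVERAGINE_N)


def monoisotopic_ratio_factor(mass, enrichment_15n):
    """Expected heavy/light monoisotopic intensity ratio for a 1:1 mixture.

    Simplified model: only the probability that every nitrogen is 15N
    (heavy) or 14N (light) is considered.
    """
    n = estimated_nitrogens(mass)
    return (enrichment_15n / NATURAL_14N) ** n


def corrected_ratio(heavy_int, light_int, mass, enrichment_15n):
    """Divide the raw monoisotopic ratio by the expected 1:1 factor."""
    return (heavy_int / light_int) / monoisotopic_ratio_factor(mass, enrichment_15n)


# Hypothetical peptide of mass 1500 u at two enrichment levels.
for enrichment in (0.98, 0.99):
    print(enrichment, round(monoisotopic_ratio_factor(1500.0, enrichment), 3))
```

Under this simplified model the factor falls further below 1 as the mass increases or the enrichment decreases, which is the qualitative trend the normalization is meant to remove.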
The performance of the N15Tool algorithm was tested for a proteome analysis using 15N-metabolic labeling in M. maripaludis. Unlabeled and labeled protein extracts were mixed in approximately equal amounts, digested with trypsin, and analyzed by HPLC-MALDI-FTICR/MS. In this experiment, 90 fractions from the HPLC separation were collected directly onto a MALDI target, and then mass analyzed. To test the accuracy of the assignment of 15N/14N peptide pairs, all spectra were interpreted manually, a task that consumed about 10 days. In contrast, the same data were analyzed by the N15Tool program in about 15 minutes. Of the 2321 peptides identified by N15Tool, more than 99% were also identified in the manually interpreted data set.
The robustness of the algorithm under realistic test conditions is remarkable given its simplicity. For example, each time an unlabeled/labeled pair is identified and the peaks are removed from the list, the algorithm assumes that the next peak in the list is an unlabeled peptide. If this assumption is wrong, the procedure will be out of registry, treating labeled peptides as unlabeled peptides. In developing the algorithm, there was concern that such an event would lead to an irrecoverable state in which all subsequent matches in an input file would be invalid. However, in practice, the algorithm recovers from such situations, and quickly finds the next unlabeled peptide to use as the base for finding the matching labeled peptide. We believe that the robustness results from the very precise requirement for the mass spacing between unlabeled and labeled peptides, i.e. for n = Δm/0.9970, n must lie within ±0.006 of an integer value. It appears that when labeled peptides are accidentally treated as unlabeled peptides, no higher mass peak is found to fit the matching constraints. The algorithm discards these unmatched peaks, and thus moves forward to the next peak in the list, and gets back in registry with the unlabeled versus labeled peptides in the list. The ability to recover from situations in which a pair match cannot be established is also useful for other events, for example, when protein expression between the unlabeled and labeled sample changes to such a large extent that only one peak is observed for the unlabeled/labeled peptide pair, or when two peptides overlap in mass sufficiently to be unresolved from each other. Although one can imagine many possible ways in which the algorithm could fail, in practice it recovers gracefully from such events and continues to assign isotopic peak pairs. The robustness of this procedure has been established by its application to hundreds of peak lists and tens of thousands of peptide pairs.
The algorithm is particularly remarkable in being able to match peptide pairs in portions of the mass spectrum that are highly congested, as shown in Figure 2. The three lowest mass peptide pairs shown in the figure inset would present a significant challenge to manual interpretation, given the mass overlaps of the ions. For example, peptides of nominal mass 1314 and 1315 produce a doublet at m/z 1315, with a separation of 0.025 mass-to-charge units. The SNAP algorithm assigns the proper monoisotopic masses to both peaks, and our pair matching algorithm finds a unique set of labeled peaks to assign to these two peptides.
The pair matching algorithm does not rely on differences in the isotopic distribution of labeled versus unlabeled peptides. When the matching process was performed by hand, we used lower 15N-enrichment (97-98 mol %) to help visually distinguish labeled from unlabeled peptides, as shown in Figure 3. While this aids manual interpretation, it detracts from automated interpretation and reduces the accuracy of the quantitative measurement. The software that is used for assigning monoisotopic peaks (SNAP, vide supra) does not perform well on labeled peptides with isotope distributions that vary significantly from those of unlabeled peptides. Labeling at 99% enrichment significantly facilitates the automated assignment of monoisotopic peaks for labeled peptides by the SNAP software.
To test the effectiveness of the normalization of the abundance ratio, non-normalized ratios were compared with the normalized ratios for the same data set. The sample contains equal amounts of wild-type and mutant M. maripaludis, so the overall relative abundance between the light and heavy peptides is expected to be 1:1. Figure 4 shows the distribution of the logarithm (base 2) of the heavy/light ratio for 2321 peptides. In this case, the average logarithm (base 2) ratio for these 2321 peptides is 0.1885 if the peak ratio is obtained by direct calculation from the monoisotopic peak height. This value changes to 0.0846 when normalization is applied by N15Tool. The normalized relative abundance of the majority of peak pairs indicates a 1:1 ratio.
The program produces only one output file even though the number of input files is nearly one hundred. A portion of the results from the output file is shown in Table 1. Each row of the output contains the information for one identified peptide pair. The output file is a tab-delimited text file that includes the input file name (fraction number), the identified peak pairs, and the intensity ratio, and is suitable for import into Excel or other programs. Collectively, these data indicate that the N15Tool program is capable of generating reliable light/heavy peptide assignments and abundance ratios automatically.
We have developed a simple but robust algorithm that aids the processing of 15N metabolic labeling data resulting from LC-MS. This algorithm has been incorporated into a Java program that can process data from a variety of mass spectrometer platforms. It provides an automated method for identifying 14N and 15N peak pairs and calculating the ratios between the light and heavy peptides without prior knowledge of the amino acid composition of the peptides. The program significantly reduces the burden of manually interpreting large scale LC-MS data sets from 15N labeling experiments. The current version of the program has been tested for the proteome from M. maripaludis. Nevertheless, it should be applicable to MS-based experiments on any other organism with 15N metabolic labeling by supplying new theoretical nitrogen numbers for each m/z segment in the program. N15Tool is platform independent and can be used on any operating system that has the Java runtime installed. It is also independent of proprietary vendor formats because the input for the software is text files containing the m/z and signal intensity information. Finally, the flexibility and versatility of N15Tool allow it to be used with any 15N metabolic labeling data from any instrument that can provide accurate mass measurement, including the Orbitrap or high resolution time-of-flight mass spectrometers.
The authors thank Professor William B. Whitman and Yuchen Liu (Department of Microbiology, University of Georgia) for providing the 15N-metabolically labeled M. maripaludis proteome. The authors are grateful for financial support from the National Institutes of Health (grant RR019767-04). IJA would like to acknowledge the influential contributions of Charles Wilkins to the field of MALDI-FTICR/MS and to the applications of computers in Chemistry.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.