|Home | About | Journals | Submit | Contact Us | Français|
We have developed an automated procedure for aligning peaks in multiple TOF spectra that eliminates common timing errors and small variations in spectrometer output. Our method incorporates high resolution peak detection, re-binning and robust linear data fitting in the time domain. This procedure aligns label-free (uncalibrated) peaks to minimize the variation in each peak’s location from one spectrum to the next, while maintaining a high number of degrees of freedom. We apply our method to replicate pooled-serum spectra from multiple laboratories and increase peak precision (t/σt) to values limited only by small random errors (with σt less than one time count in 89 of 91 instances, 13 peaks in 7 data sets). The resulting high precision allowed for an order of magnitude improvement in peak m/z reproducibility. We show that the CV for m/z is 0.01% (100 parts per million) for 12 of the 13 peaks that were observed in all data sets between 2995 and 9297 Da.
Protein expression profiling using MALDI-TOF-MS is a widely used technique for a variety of studies including microbial typing , semi-quantitative comparison , imaging MS , etc. While advances in signal processing and instrumentation improve the ability to resolve spectral features, the reproducibility of such spectra remains a major limitation to the precision of MALDI-TOF-MS. Precise time measurement is critical to both protein identification and pattern recognition in mass spectra. In order to achieve high precision, it is necessary to align (or synchronize) spectra so that characteristic features occur at the same time in all spectra being analyzed.
Instrument precision can be reduced by both systematic and random errors. Although it is hard to avoid random errors, systematic errors can be reduced or eliminated from individual spectra, provided they can be characterized. The most important sources of systematic instrumental error in MALDI spectra are variations in the triggering time from spectrum to spectrum and small variations in the accelerating voltage. Since these errors appear as linear effects in the TOF data, it should be straightforward to remove such errors using corrections to uncalibrated TOF data.
There are various approaches to aligning TOF data including the use of: frequent calibration [4, 5]; clustering or re-binning [6–13]; cross-correlations [14, 15]; minimizing entropy [16, 19]; and others [17–24]. Of these, only three are based in the time domain, either directly  or indirectly by adjusting calibration parameters [16, 23]. Corrections made in the time domain should be inherently more accurate since m/z values are derived from equations obtained by a quadratic fit to a few calibration peaks in time data. (Corrections to data after calibration in effect fit predicted values rather than measured values.)
In this study, we reanalyze the raw TOF data obtained during a multi-lab reproducibility study  using our high resolution peak detection and label-free alignment methods to show an improvement of more than an order of magnitude in precision. In this paper, label-free alignment refers to aligning time domain data by using the most commonly occurring peaks, without regard to the identity of those peaks. While we have used naturally occurring peaks, it would be possible to use markers that were added for calibration. However, even for added calibration markers, the actual m/z values would not be used for this alignment. This method of label-free alignment should be broadly applicable and have significant impact for researchers using TOF data for expression pattern analysis, imaging MS, and improved MS/MS protein ID.
A multi-institution assessment of platform reproducibility has recently characterized the performance of SELDI-TOF instruments at six different locations . In this previous study, using stringent procedures for calibration/synchronization and standardization, six laboratories obtained inter-laboratory reproducibility approaching the intra-laboratory reproducibility. Quality Control (QC) samples were prepared using pooled normal human sera from 360 healthy individuals (197 women and 163 men) and provided to all the sites. Each site then collected TOF mass spectra from these QC samples and data were processed at a central site. Peaks were identified by locating local maxima and aligned by locating clusters of peaks within the ‘window of potential shift’ ( m/z = 0.2% ) for each value of m/z  The synchronization and standardization process included strict acceptance criteria with tolerances (for m/z resolution, intensity values and signal-to-noise ratios) prescribed for three omnipresent peaks ( m/z = 5910, 7773 and 9297 Da). Following synchronization and standardization, 96 QC replicates were run at each site to evaluate the data reproducibility. During this subsequent evaluation, the three target peaks were shown to have CVs for m/z of 0.1%. For this paper, we have reprocessed the raw post-standardization TOF data from this multi-lab reproducibility study using the steps outlined below.
Prior to aligning data sets from different sites, we aligned the individual data sets separately. Figure 1 provides a schematic of our alignment procedure. The steps are described below.
We removed the slowly varying background using a charge accumulation model  and detected peaks using a maximum likelihood method . This required fitting the data to a theoretical line shape (a symmetric Gaussian of a specified width) centered in a sliding window. Since the peak width in time units is nearly constant over a wide range of masses near the delayed extraction mass focused optimum (2–10 kDa, which covered the range of this study), we used a constant window width equal to the peak full width at half maximum (FWHM). (For higher values of m/z, the peak width increases rapidly and the method must be adapted..) We used a hypothesis test to locate windows that could contain peaks, and then maximized the likelihood function to find the optimum window location (and thus peak location). To minimize the effect of line shape errors, we used only data within ±½ FWHM of the center of the line. While this method will fail if there are too many overlapping peaks, it easily detects equal intensity peaks separated by at least one FWHM. All of the peaks reported in this paper were sufficiently separated from adjacent peaks to be resolved. We had the option of using either a Gaussian noise model or a pseudo-Poisson noise model in our likelihood function. For the results presented in this paper, the latter noise model was used. A valuable byproduct of this maximum likelihood approach was the routine estimation of the uncertainty in the peak position, σt. As will be discussed, this is a meaningful estimate of the random error in the peak position and was typically a fraction of a clock count. The ratio provided an estimate of the mass precision of the series of time measurements. While we used our peak detection method, the following alignment procedure (steps 2 – 5) can be applied to peak lists obtained by other methods.
Our peak detection method often found large numbers of peaks, including many that could be quite small. We chose not to use intensity information directly, because we expected that to vary from one patient to another within a clinical data set. However, when the peak density was too high, alignment could become unstable with peaks at almost every time value (across the data set). So, we reduced the peak list by restricting it to commonly occurring isolated peaks. We did this by creating a histogram of peaks for a data set and, using a bin width equal to our largest expected time shift in a data set (±20 time steps in this work), we identified bins that contained at least 80% of the maximum number of counts (total number of spectra) with adjacent bins containing less than 40% of the maximum number. We then created shortened peak lists for each sample by only including those peaks that were counted in the selected bins in the histogram. A sufficient number of peaks were retained to insure correct matching of peaks between spectra. For this study, eight peaks were included in the reduced peak list used to determine the correction factors. (Using eight peaks to determine two correction parameters was sufficient to prevent over-correction of the data.)
With the shortened peak lists, we constructed a master list of common peaks using an iterative process that delimits spectral regions without peaks. We began by choosing a large window size (typically the maximum expected shift which for this study was 8 times the FWHM) and excluding regions larger than that window in which no spectrum had peaks. This usually left some regions with peaks that were larger than the desired window size. We divided any such large regions into two parts at the largest gap between peaks within ±1 window of the average position of the peaks in the bin. We repeated this until we reached the desired widow size (for this study, 3 times FWHM), and then eliminated any bins that had contributions from too few spectra. For this study of nominally identical QC samples, we required bins to have peaks from 80% of the available spectra. Finally, we generated a master peak list of the average position of all the measured values in each remaining bin.
The reduced peak list for each spectrum was assigned a constant shift in time to minimize the average difference between its measured locations and the expected locations of the master list. The alignment procedure of Step 3 was then repeated to construct an improved master list. We then applied a robust linear fit to obtain correction factors for both offset and scale that would minimize the average difference between the reduced peak lists and the expected locations of the new master list.
We then applied the corrections obtained in step 4 to the entire original peak lists, and then used the process of Step 3 with a desired window size of the FWHM to construct a final master list of those bins with contributions from at least 80% of the spectra. Since the alignment step introduced a scale and offset change to each spectrum, re-binning could re-assign peaks to different bins when appropriate. At the completion of this process the bin size had been reduced to the size of a peak width and the residual errors appeared to be random.
For data sets collected on different dates or in different laboratories, preliminary steps were required to “roughly align” data sets prior to merging and aligning as a single data set. First, a master peak list was obtained for each data set (steps 1 and 2 above) independently. (For this preliminary master list construction, the inclusion criterion for peaks was relaxed from occurrence in 80% to 50% of the spectra.) Next, approximate time offsets and scale factors were obtained (using least squares fit) to “roughly align” the master peak lists. After applying these corrections, we merged the data sets and the entire group was simultaneously re-binned and aligned using the procedure above. This method was coded in C and R routines and implemented as a custom R module within a commercially available visual bioinformatics workflow environment (VIBE, http://www.incogen.com). This module as well as the independent routines can be downloaded from http://wecook.people.wm.edu.
Two types of systematic instrumental error are observed in TOF data: variations in the triggering time from spectrum to spectrum and small variations in the accelerating voltage. Triggering time errors, or jitter between spectra, are differences in the measured TOF start times due to variations in the output from the digitizing clock and supporting analog electronics. These timing errors appear as constant time offsets in TOF spectra and are expected to be at least ±1 time count. Since a triggering time error effects all time measurements in a spectrum equally, it can easily be eliminated by subtracting a constant from each time value. We observe offsets as large at 12 clock counts within a single lab and as large as 62 clock counts between laboratories. It is important to note that we are dealing with averaged spectra which are obtained from multiple laser shots. Without access to individual spectra, it is impossible to completely remove error due to jitter. The presence of time jitter between individual shots can produce split peaks in the averaged spectra and consequentially produce errors in peak identification. The Appendix to this paper briefly describes this phenomenon. In addition to the start time jitter, any low frequency variation in the spectrometer acceleration voltage or any thermal expansion (or contraction) of the time-of-flight tube can produce an apparent linear dilation or contraction of the time measurement scale. As with the correction for jitter, a systematic error of this type can be eliminated by simultaneously correcting all the points in a spectrum. This type of error can be corrected with a simple linear scale factor. We observe scale corrections within +/− 0.05%. For data taken during the same day at a given location we typically observe correction factors of no more than 10−3. These correction factors are consistent with the expected differences in power supply outputs in different instruments (1 part in a thousand). Figure 2 illustrates the offset and scale errors in two spectra obtained at one site on two different dates. For this case, there is a 30 count offset and approximately a dilation of 0.007 (25 counts over 3600 counts). While frequent calibration can rectify these errors, it cannot remove the shot to shot voltage variations that occur due to pick-up of 60 Hz (line voltage) noise.
We have analyzed 7 data sets, five containing 96 spectra each from 5 laboratories, the sixth containing 42 spectra from a single laboratory (Lab 2) and a seventh containing 522 spectra from all six laboratories combined. (A number of replicates in the data set from Laboratory 2 data could not be used due to peak splitting revealed by our high resolution detection method. The appendix describes this problem.) For this study, 8 peaks in the range 2–10 kDa occur in at least 80% of the spectra prior to alignment and were used for the alignment procedure. Of the three target peaks used for the previous multi-lab study, only the m/z = 7773 Da peak was included in the restricted peak list used to obtain correction factors for alignment in this analysis. The other two target peaks were not sufficiently isolated (as described in Section 2.2 Step 2) to be included in the reduced peak list.
To illustrate the effectiveness of our alignment procedure, Figure 3 shows histograms of the peak displacements (from the aligned time value) for the m/z = 7773 Da peak before (a) and after (b) alignment. Prior to alignment, jitter between spectra appears as integer jumps in the peak locations. Data from each lab clusters in the vicinity of its corresponding mean displacement (indicated by vertical arrows) revealing systematic differences between the data sets. The mean for Lab 6 does not coincide with the center of a cluster because data from this lab is split between two clusters (at approximately −20 and +32) which were acquired on two occasions, three months apart. (Data from Lab 6 required a preliminary step in which the data obtained on the two different dates were “roughly aligned” as described in Section 2.3.) Table 1 provides a numerical summary of the mean displacements. After alignment, the residual variation in peak location is significantly smaller and appears random. This successful alignment of data from 6 different sources without the application of calibrations demonstrates the power of our time domain alignment method.
This large multi-lab data set allows us to obtain statistics on the performance of our alignment procedure. While comparable data sets from other instruments are not available to us, we have also observed, and corrected, systematic time shift errors in standard spectra obtained using an Ultraflex III MALDI-TOF (Bruker Daltonik) data .
Figure 4 presents standard deviations for peak time locations after alignment for 13 peaks that are detected in all the data sets. The m/z values are obtained using a quadratic fit to the three target peaks identified in reference . Peak location variability is reduced to below 1 time count in 89 of 91 instances. The inter-laboratory variability (solid symbols) is comparable with the intra-lab variability indicating that the alignment procedure has reached the limit of precision given the uncertainty predicted by maximum likelihood peak detection. The higher variability seen in the Lab 6 data set may be the result of the combination of two smaller data sets acquired on different days. The lower variability seen in the Lab 4 data results from a higher sample rate used for data acquisition (twice the rate used by the other labs).
One peak in Figure 4, t = 9663 counts, has a notably higher standard deviation than the others. This is due to a low S/N (peak signal divided by the root-mean-square noise in the window.) Figure 5 presents a portion of a typical spectrum that contains this peak and a neighboring peak with a high S/N (t = 9621 counts, one of the peaks used for alignment process). Histograms for the residual peak displacements for both peaks provide insight into the alignment results. For these peaks, lower S/N is associated with higher uncertainty in the peak detection and greater residual variation after alignment. Studies with simulated data show an inverse relationship between σt and Here n is the number of time points in FWHM and C is a constant of order 1. This agrees with the theory of the peak detection method.
In order to determine if the residual peak displacements are random or might still contain systematic errors, displacement histograms are fit to probability density functions. Figure 6 presents histograms for four peaks with a t probability density function overlaid. The two plots on the left, a) and c), correspond to peaks used for label-free alignment (included in the reduced peak list) and the two plots on the right, b) and d), correspond to peaks to which corrections were applied (included in the original peak lists but not in the reduced peak list used for determination of correction coefficients). All show good agreement with a t -distribution which supports our claims that the residual displacements after alignment are random and systematic timing errors between spectra have been removed. The use of a t -distribution rather than a normal distribution is appropriate for modeling aligned peak values which are effectively mean quantities. 
Low peak position uncertainty leads to remarkably high precision, as shown in Figure 7. For this figure, we translate the values for time precision into coefficients of variation for mass using . (The factor of 2 arises from the fact that m/z values are obtained by applying a quadratic calibration equation.) For reference, Figure 7 shows the previously published CV values (marked as X’s) for the three target peaks used for synchronization in the multi-lab standardization study . A logarithmic scale is used to highlight the order of magnitude improvement is precision. For convenience, Table 2 reproduces the CV values for the 13 peaks detected in all data groups. Published values for the three target peaks from  are included and the peaks used for label-free alignment are indicated.
We have shown significant improvements to TOF precision and, consequently, reproducibility in m/z using linear time domain methods. Peak detection with superior resolution is demonstrated and systematic instrument errors are identified and removed using linear offset and scale corrections. We expect that our post-acquisition data processing will have significant impact for researchers utilizing TOF data for expression pattern analysis, imaging MS, and improved MS/MS protein ID.
The authors gratefully acknowledge D. Manos at the College of William and Mary for stimulating discussions. This work was supported by NIH-National Cancer Institute grants CA101479 (MS), CA126118 (DIM) and CA085067 (OJS).
Triggering time error cannot be completely removed because each spectrum is itself the average of many laser shots, each with its associated jitter. Without access to data from individual laser shots, it is therefore impossible to completely remove all triggering time error. Small timing jitter in averaged spectra usually appears as peak broadening, but sometimes the triggering jitter can jump between two widely separated vales, and this will split each peak into two (or more). Peak splitting due to timing jitter is distinguished from that resulting from the presence of chemical adducts by the fact that each peak is split by the same constant time increment over the entire TOF spectrum (from the low mass sodium peak, through the matrix region and throughout the mass focusing range). Figure A1 illustrates an egregious example of this timing jitter error in averaged spectra obtained at one site. The dotted line represents two peaks from one spectrum that appears to have little jitter, while the solid line represents the corresponding peaks from another spectrum that exhibits significant jitter. The clearly split peaks suggest that there are two main trigger start times associated with the single shot spectra and that the split spectra could be modeled as the average of two subgroups with different start times. A simulated split spectrum similar to the one in the figure is constructed by averaging 110 copies of the unsplit spectrum and 90 copies with a delayed start time of 12 clock ticks. The agreement we see between the simulated peaks (shown as X markers) and the observed split peaks is consistent for all the peaks in the split spectrum. Differences in amplitude are to be expected due to normal peak amplitude variations among replicates. To date, such peak splitting has not challenged the vendor software for peak detection and clustering alignment. However, our new high resolution peak detection method exposes this problem as an obstacle to alignment and improved precision. For this paper, spectra with split peaks are not included in the analysis and as a result, only 42 spectra for Lab 2 are analyzed.