All samples were tested in duplicate and provided spectral profiles in the initial run. In the case of 3 samples, duplicate spectra were removed after initial analysis using CE as the normalization factor (NormF) was greater than average NormF+2σ generated with the batch analysis. These samples were rerun and replicate spectra used in further analysis. In the case of LS analysis, duplicate or replicate spectra were averaged to produce a single spectrum representing each subject prior to peak detection, unlike CE. In the CE analysis, duplicates or replicates peak heights were averaged after peak detection but before t-tests/group analysis.
We estimated the quadratic variance function (QVF) from spaces between the peaks of the QC data. Peak-free regions were selected by visual inspection, with the mean and variance at each point calculated across spectra. A quadratic detector response curve is fit to the mean/variance points using least squares, and results of the QVF estimation procedure are shown in . The QVF is stored for use with LS as part of the modified Antoniadis-Sapatinas (mA-S) algorithm for denoising the QC and clinical spectra.
Extending the modified Antoniadis-Sapatinas algorithm to use cycle spinning required decreasing the computational complexity of the algorithm. For the implementation of the mA-S algorithm used in
[9],
[10], approximately 19.5 minutes of computing time using 10Gb of RAM was required to denoise a single spectrum. Coifman and Donoho showed that it is sufficient to compute

shifted transforms for a signal of length n. In a typical low-laser low-mass focused spectrum we have generated,

, which means it would take almost 5 hours for a cycle-spinning mA-S algorithm to denoise a single spectrum. Five hours is enough time to manually process several spectra, so this result is unacceptable. We made the following observations about the computations:
- The slow part of the computation was the default implementation for calculating the term
from Eq. (9) of [9]. - Many computations were redundant in the 2-D wavelet decomposition used to calculate this term; i.e. the wavelet operator was the same for every spectrum and does not need to be recomputed every time.
- The n × n wavelet operator was very sparse, containing approximately
zeros.
Combining these facts we implemented a sparse matrix-based formulation of the problem to save memory and computation time rather than the standard 2-D filter bank decomposition. The wavelet transform matrix was constructed one time and a sparse-format matrix is stored off line. The sparse-matrix representation uses significantly less memory. Thus, in this implementation, every-time the

component needs to be computed, a sparse-matrix computed offline is read into memory and computation performed with a simple matrix multiply. This implementation resulted in a 640× speed up, as illustrated in . The time required to denoise a dataset of spectra from a group of 60 spectra decreased from ~16 hours to less than 1.5 minutes.
| Table 1Processing time (in seconds) for denoising a single spectrum using different implementations of the modified Antoniadis-Sapatinas algorithm. |
The FIR low-pass filter coefficients were estimated using the Parks-McClellan algorithm in MATLAB (firpm and firpmord functions). The specifications given the algorithm are: normalized frequency transition band between 0.15 and 0.25, pass-band ripple of 0.01, and stop band attenuation of 60 decibels (dB). The non-causal zero-phase implementation of the filter is used to prevent phase delays and preserve the locations of the peaks (filtfilt command in MATLAB)
[15]. The order of the filter is 67. In and , we show the frequency response of the filter and smoothed output, respectively. Our FIR filter can be thought of as an extension of a Savitsky-Golay filter with greater noise suppression properties at high frequencies
[16]. The smoothed spectra are much more visually consistent with that expected from visual observation by clinicians. The FIR smoothing procedure gives rise to ~ 700 local maxima compared to the ~ 150 given by the A-S algorithm, illustrating the tradeoff between peak detection performance and denoising performance between different smoothing approaches.
A feed-forward artificial neural network with one hidden layer was designed that validates peaks with (96.5%, 93.4%, 94.7%) accuracy on the training set, validation set, and test set, respectively. Peaks were detected in the QC data using LS with the most liberal settings possible so as to cast a wide net for all possible peak configurations and shapes in the data for the training process. We manually validated each prediction as a confirmed peak, a noisy/false prediction, or indeterminate. All indeterminate predictions were removed from the model estimation process. After removal of indeterminate cases, we had 4256 expert-annotated predictions, containing 57.07% percent “true” (confirmed peak) examples. We then randomly sampled the data into an approximate 50/25/25% split for the training set, validation set, and test set respectively. To construct the corresponding 62-dimensional feature vector

corresponding to a peak predicted at mass

, we considered the following intuition about how we manually QC peaks in-house.
- For SELDI, all the information needed to make a judgment about peak validity is contained in a window
. - When deciding whether a peak is acceptable or not, both the raw and processed spectra carry important information; a good peak shape in the processed spectra is not sufficient by itself if the corresponding raw spectrum is very noisy and of poor quality.
may affect judgment due to changes in the theoretical peak shape as a function of mass.- Good peaks seem to look like a healthy concave quadratic centered at
, but not always.
These insights are applied to the following procedure used to construct the feature vector. Let

be a linear grid of 30 m/z points evenly spaced over the interval

.
.(peak concavity measure, with parameter
representing the quadratic coefficient of the best fit quadric curve for the raw intensities in the window centered at
).
(mass information, since peak shapes change with mass).
=
the linearly interpolated intensities of the processed spectrum on support
.
=
the linearly interpolated intensities of the raw spectrum on support
.
We used a fully-connected feed-forward neural network with 63 input nodes, 21 hidden layer nodes, and 1 output layer node (including bias nodes). Thus, the parameter matrices had dimensions

,

. Before training, all features were standardized by subtracting the mean value in that dimension and dividing by the standard deviation (calculated across the training set). The normalization parameters are saved for use with the neural net on the validation data, test data, and clinical data. Regularization was used to control the complexity and avoid over-fitting. For each candidate value of

across a grid of points

(between 0.003 and 10), the best-fit

neural network parameters were found by using conjugate-gradient descent to minimize

across the training set. The gradient of

was calculated using the forward/back-propagation algorithm
[13]. The optimization step was observed to converge after 400 iterations. For each

, we calculated the classification error on the validation set,

. We selected our optimal neural network parameters as
where, in Eq. (6) it is understood that the estimate of

is most likely only a local minima. The trained neural network

was evaluated on the independent test set and found to perform with a classification accuracy of 94.7%. By design, the test set was not used at any stage of the parameter-fitting process in order to ensure our test set classification accuracy estimate is an unbiased estimate for how the neural network will perform on peaks it has not “seen” at any stage of parameter fitting.
LibSELDI showed improved sensitivity and specificity for detecting peak cluster mean m/z values corresponding to reproducible peaks in the pooled-sample QC data. shows the operating characteristics for LS at each iteration of improvement discussed in the
Methods section. On QC spectra, LS recovered ~50% percent of the true peak cluster m/z values without a mistake, and 70% at a 5% FDR. Operating points for Ciphergen Express using stringent (S/N 3/2) and non-stringent parameter settings (S/N 1) show a reference method for comparison. While peak cluster mean m/z values were reconstructed successfully, this benchmark did not show the accuracy of individual peak predictions within a cluster, a limitation of this approach. Resolving individual peak predictions within a cluster is critical for clinical group analysis and is discussed next.
LS w/neural network validation found 124 peak clusters and resolved peak predictions accurately within clusters and CE with a SOP were applied to the pilot clinical mucous data. An overview of the LS/neural network strategy for analyzing clinical data is shown in . Since the LS/neural network predictions were able to resolve more accurate peak predictions within clusters, this set the stage for a variety of analyses that would have been more difficult to carry out with Ciphergen Express alone. The results of t-tests, Mann-Whitney U-tests, and Fisher-exact tests with mid-P correction are shown in and . Only 2 p-values come in less than 0.01, and each test conducted tells a different story. There is some consistency in the tests with regards to which peak clusters tend to “look” different with respect to the various tests.
| Table 2CIN0 vs. CIN3 group tests (t-tests and Mann-Whitney U-tests) based on peak height and peak area measurements. |
| Table 3CIN0 vs. CIN3 prevalence differences scored using the Fisher-exact test with mid-P correction. |
With the different parameters used for peak detection, CE detected 106 (stringent) and 168 (non-stringent) peak clusters. Under the stringent parameter (S/N 5/3), 6 peak clusters were found to have differences (p-values less than 0.05, no multiple test correction) () based on peak heights between CIN0 and CIN3 groups as opposed to 10 peak clusters with less stringent parameter (S/N 3/2). However, in several cases visual inspection of the spectra showed that peak detection was not accurate thereby leading to false readings of peak heights, as shown in . This could be attributed to the need to include ‘estimated peaks’ as a peak detection parameter to enable calculation of p-values between sample groups in CE.
| Table 4CIN0 vs. CIN3 group tests (Mann-Whitney) based on peak height measurements under stringent condition (S/N 5/3) using Ciphergen Express. |