Conf Proc IEEE Eng Med Biol Soc. Author manuscript; available in PMC 2010 July 3.

Published in final edited form as:

PMCID: PMC2896560

NIHMSID: NIHMS200072

Getachew K Befekadu, The Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC 20057, USA;

The publisher's final edited version of this article is available at Conf Proc IEEE Eng Med Biol Soc

A Bayesian multilevel functional mixed-effects model with group specific random-effects is presented for analysis of liquid chromatography-mass spectrometry (LC-MS) data. The proposed framework allows alignment of LC-MS spectra with respect to both retention time (RT) and mass-to-charge ratio (*m/z*). Affine transformations are incorporated within the model to account for any variability along the RT and *m/z* dimensions. Simultaneous posterior inference of all unknown parameters is accomplished via Markov chain Monte Carlo method using the Gibbs sampling algorithm. The proposed approach is computationally tractable and allows incorporating prior knowledge in the inference process. We demonstrate the applicability of our approach for alignment of LC-MS spectra based on total ion count profiles derived from two LC-MS datasets.

In proteomic studies, liquid chromatography coupled with mass spectrometry (LC-MS) is a common platform to identify and determine the abundance of various peptides that characterize particular proteins in biological samples [1]. Each LC-MS run generates data comprising thousands of peak intensities for peptides with specific retention time (RT) and mass-to-charge ratio (*m/z*) values. In differential protein expression studies, multiple LC-MS runs are compared to identify differentially abundant peptides between distinct biological groups. This is a challenging task for the following reasons: (1) substantial variation in RT across multiple runs due to the LC instrument conditions and the variable complexity of peptide mixtures, (2) variation in *m/z* values of the peptides due to occasional drift in the calibration of the mass spectrometry instrument, and (3) variation in peak intensities due to spray conditions. Thus, efficient and robust alignment algorithms are needed for qualitative comparison of multiple LC-MS runs.

Various alignment methods have been described in the literature, including dynamic time warping (DTW) [2], correlation optimized warping (COW) [2], vectorized peaks [3], statistical alignment [4], and clustering [5]. Most of these algorithms are either limited to a consensus pair-wise combination of spectra for alignment or rely on reference (template) spectra to find matches among datasets. These limitations may lead to sub-optimal results compared to global alignment techniques. Methods that rely on optimization of global fitting functions provide an alternative solution to the alignment of multiple LC-MS spectra representing distinct biological groups. For example, a recently introduced method called the continuous profile model (CPM) has been applied for alignment of continuous time-series data and for detection of differences in multiple LC-MS data [6]. Although CPM is described as a naïve and computationally intensive method, it has further limitations, such as susceptibility to local minima owing to its sub-optimal problem formulation. The method also creates superfluous signal gaps, leading to nonuniform trace points across multiple LC-MS spectra. Another notable limitation of the CPM algorithm is that its time complexity scales poorly, so that modeling high-resolution data requires substantial computation time. Thus, CPM is more suitable for low-resolution LC-MS data generated from less complex fractionations. Recently, Morris et al. developed a Bayesian method for analysis of matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) proteomics data [7]. Their work extends an earlier Bayesian implementation of the wavelet-based functional mixed-effects models introduced by Morris and Carroll [8]. The approach is similar to the spline-based functional mixed-effects models introduced by Guo [9], which involve a generalized mixed-model equation to handle potentially irregular data. The method specifically deals with the identification of differentially expressed spectral regions across different experimental conditions, assuming that the alignment issue has already been taken care of.

In this paper, we introduce a Bayesian multilevel functional mixed-effects model with group-specific random effects. The method can account for population homogeneous behavior (i.e., fixed systematic changes across the entire LC-MS spectra representing distinct biological groups) while allowing for modeling of heterogeneity within a group (i.e., random effects). This paradigm also allows us to incorporate additional hierarchies, such as affine transformations, within the model to account for any variability along the RT and *m/z* dimensions, while implicitly handling the normalization of peak intensities of peptides from multiple LC-MS spectra. The method is amenable to modeling both low- and high-resolution mass spectra, since it does not introduce superfluous signal gaps across multiple LC-MS spectra. We demonstrate this through two LC-MS datasets obtained from: (1) proteins of lysed *E. coli* cells, and (2) six groups of tryptic digests of non-human proteins spiked at different concentrations into a complex sample background of human peptides.

The remainder of this paper is organized as follows. In Section II, we outline the Bayesian hierarchical model (BHM) that describes the data modeling mechanism, based on the functional mixed-effects model, for alignment of LC-MS spectra. This section also explains the Markov chain Monte Carlo (MCMC) method, using the Gibbs sampling algorithm, for simultaneous posterior inference of all unknown parameters. Results and discussion demonstrating the applicability of the proposed method for alignment of LC-MS spectra are given in Section III. Finally, our findings are summarized in Section IV.

We propose a functional mixed-effects model to align LC-MS spectra from multiple LC-MS runs. The idea behind this approach is two-fold: (1) to model the fixed effects as a realization of partially diffused integrated Gaussian processes, which account for population homogeneous behaviors (i.e., fixed systematic changes in the LC-MS spectra across biological groups), and (2) to model the random effects as random realizations from the same partially integrated Gaussian processes with proper variances, which, in turn, allow the modeling of heterogeneity within biological groups. The estimation procedure is implemented by taking advantage of the connection between B-splines (at the design points) and mixed-effects models. Let the proposed functional mixed-effects model be represented mathematically as follows:

$$\begin{array}{l}\underset{n_i\times 1}{\underbrace{\mathbf{y}_i\mid \{z_{ij}=j\}}}\equiv \underset{n_i\times p}{\underbrace{\mathbf{B}_{1i}}}\,\underset{p\times 1}{\underbrace{\mathbf{\gamma}_j}}+\underset{n_i\times q}{\underbrace{\mathbf{B}_{2i}}}\,\underset{q\times 1}{\underbrace{\mathbf{\eta}_{ij}}}+\underset{n_i\times 1}{\underbrace{\mathbf{\epsilon}_{ij}}}\\ \mathbf{\eta}_{ij}\sim N_q(\mathbf{\mu}_j,\mathbf{\Sigma}_j)\quad\text{and}\quad\mathbf{\epsilon}_{ij}\sim N_{n_i}(\mathbf{0},\sigma^2\mathbf{I})\end{array}$$

(1)

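where $i = 1, 2, \dots, s_j$ indexes the spectra in group $j = 1, 2, \dots, m$, with $s_j$ denoting the sample size of group $j$. As the dimensions in Eq. (1) indicate, $\mathbf{y}_i$ is the $n_i \times 1$ spectrum, $\mathbf{B}_{1i}$ and $\mathbf{B}_{2i}$ are the $n_i \times p$ and $n_i \times q$ basis matrices for the fixed effects $\mathbf{\gamma}_j$ and the random effects $\mathbf{\eta}_{ij}$, and $\mathbf{\epsilon}_{ij}$ is the residual error.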
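To make the structure of Eq. (1) concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes a rescaled RT grid on [0, 1], cubic B-splines with clamped, equally spaced knots, and illustrative values for $\mathbf{\gamma}_j$, $\mathbf{\mu}_j$, $\mathbf{\Sigma}_j$ and $\sigma$; the helper names `bspline_basis` and `clamped_knots` are hypothetical.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator of each half-open knot interval [t_k, t_{k+1}).
    B = np.zeros((x.size, len(t) - 1))
    for k in range(len(t) - 1):
        B[:, k] = (t[k] <= x) & (x < t[k + 1])
    # Assign the right end point to the last non-empty interval.
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])
    B[x == t[-1], last] = 1.0
    for d in range(1, degree + 1):
        B_new = np.zeros((x.size, len(t) - d - 1))
        for k in range(len(t) - d - 1):
            left = t[k + d] - t[k]
            right = t[k + d + 1] - t[k + 1]
            if left > 0:
                B_new[:, k] += (x - t[k]) / left * B[:, k]
            if right > 0:
                B_new[:, k] += (t[k + d + 1] - x) / right * B[:, k + 1]
        B = B_new
    return B

def clamped_knots(n_interior, degree):
    """Equally spaced knots on [0, 1] with repeated boundary knots."""
    inner = np.linspace(0.0, 1.0, n_interior + 2)
    return np.concatenate([np.zeros(degree), inner, np.ones(degree)])

rng = np.random.default_rng(0)
n_i, degree = 200, 3
x = np.linspace(0.0, 1.0, n_i)                            # assumed rescaled RT grid
B1 = bspline_basis(x, clamped_knots(8, degree), degree)   # n_i x p (fixed effects)
B2 = bspline_basis(x, clamped_knots(2, degree), degree)   # n_i x q (random effects)
p, q = B1.shape[1], B2.shape[1]

# Illustrative (assumed) parameter values for one group j.
gamma_j = rng.normal(size=p)
mu_j, Sigma_j, sigma = np.zeros(q), 0.1 * np.eye(q), 0.05

def simulate_spectrum():
    """One replicate: y_i = B1 gamma_j + B2 eta_ij + eps_ij."""
    eta_ij = rng.multivariate_normal(mu_j, Sigma_j)
    eps_ij = rng.normal(scale=sigma, size=n_i)
    return B1 @ gamma_j + B2 @ eta_ij + eps_ij

Y = np.stack([simulate_spectrum() for _ in range(5)])     # 5 replicates of group j
```

The replicates share the group curve `B1 @ gamma_j` and differ only through their own `eta_ij` draw and noise, which is the fixed-effect/random-effect split described above.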

Let **Θ** be a vector consisting of all the unknown parameters in Eq. (1) and the priors. Let $\mathbf{Y}={({\mathbf{y}}_{1}^{T},{\mathbf{y}}_{2}^{T},\dots ,{\mathbf{y}}_{M}^{T})}^{T}$ represent a set of LC-MS spectra. Then, according to Bayes' theorem,

$$p(\mathbf{\Theta}\mid \mathbf{Y})=\frac{p(\mathbf{Y}\mid \mathbf{\Theta})p(\mathbf{\Theta})}{p(\mathbf{Y})}\propto L(\mathbf{\Theta}\mid \mathbf{Y})\times p(\mathbf{\Theta})$$

(2)

Using the functional mixed-effects modeling of Eq. (1), the likelihood function assuming that the group information is known and that the samples are independent is given by

$$L(\mathbf{\Theta}\mid \mathbf{Y},\mathbf{Z})={\prod}_{j=1}^{m}{\prod}_{i=1}^{{s}_{j}}{N}_{n}({\mathbf{y}}_{i},{\mathbf{Z}}_{i};{\mathbf{B}}_{1i}{\mathbf{\gamma}}_{j}+{\mathbf{B}}_{2i}{\mathbf{\eta}}_{ij},{\sigma}^{2}\mathbf{I})$$

(3)

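where **Z** denotes a matrix of indicator vectors $\mathbf{Z}_i = (z_{i1}, \dots, z_{im})$, with $z_{ij} = 1$ when spectrum $i$ belongs to group $j$. Substituting this likelihood into Eq. (2) gives the posterior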

$$p(\mathbf{\Theta}\mid \mathbf{Y})\propto {\prod}_{j=1}^{m}{\prod}_{i=1}^{{s}_{j}}{N}_{n}({\mathbf{y}}_{i},{\mathbf{Z}}_{i};{\mathbf{B}}_{1i}{\mathbf{\gamma}}_{j}+{\mathbf{B}}_{2i}{\mathbf{\eta}}_{ij},{\sigma}^{2}\mathbf{I})\times p(\mathbf{\Theta})$$

(4)

The first step in fitting BHM is to specify all prior distributions. A list of the hierarchical priors assigned to the parameters of the model is given below. The list represents the standard choice of priors for mixture models:

$$\begin{array}{l}\mathbf{\gamma}_j\sim N_p(w,\mathbf{W})\\ \mathbf{\mu}_j\sim N_q(0,\mathbf{V}),\quad \mathbf{\Sigma}_j^{-1}\mid \mathbf{R}\sim W_q(\rho ,(\rho \mathbf{R})^{-1})\\ \mathbf{R}\sim W_q(r,(r\mathbf{R}_0)^{-1}),\quad \sigma^{-2}\sim \Gamma(g,h)\end{array}$$

(5)

where *W*(,), *N*(,) and Γ(,) signify the Wishart, multivariate normal and gamma distributions, respectively. In specifying the prior distribution *p*(**Θ**), a hierarchical structure with independence assumption is considered. Combining this structural information with prior beliefs, we obtain the following joint posterior for the unknown parameters:

$$\begin{array}{l}p(\mathbf{\Theta}\mid \mathbf{Y})\propto \prod_{j=1}^{m}\Big\{\prod_{i=1}^{s_j}N_n(\mathbf{y}_i,\mathbf{Z}_i;\mathbf{B}_{1i}\mathbf{\gamma}_j+\mathbf{B}_{2i}\mathbf{\eta}_{ij},\sigma^2\mathbf{I})\times N_p(\mathbf{\gamma}_j;w,\mathbf{W})\times \\ \qquad N_q(\mathbf{\eta}_{ij};\mathbf{\mu}_{z_{ij}},\mathbf{\Sigma}_{z_{ij}})\times N_q(\mathbf{\mu}_j;0,\mathbf{V})\times W_q(\mathbf{\Sigma}_j^{-1};\rho ,(\rho \mathbf{R})^{-1})\Big\}\times \\ \qquad W_q(\mathbf{R};r,(r\mathbf{R}_0)^{-1})\times \Gamma(\sigma^{-2};g,h)\end{array}$$

(6)

Using all prior and hyperprior distributions in Eq. (5), the full conditional distributions for the parameters are as follows:

$$\begin{array}{l}
p(\mathbf{\eta}_{ij}\mid \text{rest})\propto N_q\Big([\sigma^{-2}\mathbf{B}_{2i}^{T}\mathbf{B}_{2i}+\mathbf{\Sigma}_j^{-1}]^{-1}\big(\sigma^{-2}\mathbf{B}_{2i}^{T}(\mathbf{y}_i-\mathbf{B}_{1i}\mathbf{\gamma}_j)+\mathbf{\Sigma}_j^{-1}\mathbf{\mu}_j\big),\ [\sigma^{-2}\mathbf{B}_{2i}^{T}\mathbf{B}_{2i}+\mathbf{\Sigma}_j^{-1}]^{-1}\Big)\\
p(\mathbf{\gamma}_j\mid \text{rest})\propto N_p\Big([\sigma^{-2}\sum_{i:z_{ij}=1}\mathbf{B}_{1i}^{T}\mathbf{B}_{1i}+\mathbf{W}^{-1}]^{-1}\big(\sigma^{-2}\sum_{i:z_{ij}=1}\mathbf{B}_{1i}^{T}(\mathbf{y}_i-\mathbf{B}_{2i}\mathbf{\eta}_{ij})+\mathbf{W}^{-1}w\big),\ [\sigma^{-2}\sum_{i:z_{ij}=1}\mathbf{B}_{1i}^{T}\mathbf{B}_{1i}+\mathbf{W}^{-1}]^{-1}\Big)\\
p(\mathbf{\mu}_j\mid \text{rest})\propto N_q\Big([s_j\mathbf{\Sigma}_j^{-1}+\mathbf{V}^{-1}]^{-1}s_j\mathbf{\Sigma}_j^{-1}\overline{\mathbf{\eta}}_j,\ [s_j\mathbf{\Sigma}_j^{-1}+\mathbf{V}^{-1}]^{-1}\Big)\quad\text{where}\quad\overline{\mathbf{\eta}}_j=\frac{1}{s_j}\sum_{i:z_{ij}=1}\mathbf{\eta}_{ij}\\
p(\mathbf{\Sigma}_j^{-1}\mid \text{rest})\propto W_q\Big(s_j+\rho ,\ [\rho \mathbf{R}+\sum_{i:z_{ij}=1}(\mathbf{\eta}_{ij}-\mathbf{\mu}_j)(\mathbf{\eta}_{ij}-\mathbf{\mu}_j)^{T}]^{-1}\Big)\\
p(\mathbf{R}\mid \text{rest})\propto W_q\Big(r+m\rho ,\ [r\mathbf{R}_0+\rho \sum_{j=1}^{m}\mathbf{\Sigma}_j^{-1}]^{-1}\Big)\\
p(\sigma^{-2}\mid \text{rest})\propto \Gamma\Big(\frac{\sum_{j=1}^{m}\sum_{i=1}^{s_j}n_i}{2}+g,\ \Big[\frac{1}{h}+\frac{1}{2}\sum_{j=1}^{m}\sum_{i:z_{ij}=1}(\mathbf{y}_i-\mathbf{B}_{1i}\mathbf{\gamma}_j-\mathbf{B}_{2i}\mathbf{\eta}_{ij})^{T}(\mathbf{y}_i-\mathbf{B}_{1i}\mathbf{\gamma}_j-\mathbf{B}_{2i}\mathbf{\eta}_{ij})\Big]^{-1}\Big)
\end{array}$$
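As an illustration of how two of these conjugate updates translate into code, here is a hedged NumPy sketch of single draws for $\mathbf{\eta}_{ij}$ and $\sigma^{-2}$. It assumes that $\Gamma(g,h)$ is parameterized by shape $g$ and scale $h$, and it uses random matrices as stand-ins for the B-spline design matrices; the function names are hypothetical.

```python
import numpy as np

def draw_eta(y, B1, B2, gamma, mu, Sigma_inv, sigma2, rng):
    """One draw from p(eta_ij | rest) for a single spectrum y."""
    prec = B2.T @ B2 / sigma2 + Sigma_inv          # posterior precision
    cov = np.linalg.inv(prec)
    cov = 0.5 * (cov + cov.T)                      # enforce exact symmetry
    mean = cov @ (B2.T @ (y - B1 @ gamma) / sigma2 + Sigma_inv @ mu)
    return rng.multivariate_normal(mean, cov), mean, cov

def draw_noise_precision(residuals, g, h, rng):
    """One draw of sigma^{-2} | rest; residuals is a list of y - B1 gamma - B2 eta."""
    n_total = sum(r.size for r in residuals)
    ss = sum(float(r @ r) for r in residuals)      # residual sum of squares
    return rng.gamma(g + n_total / 2.0, 1.0 / (1.0 / h + ss / 2.0))

# Toy data; B1 and B2 are random stand-ins (assumption), not real B-spline bases.
rng = np.random.default_rng(0)
n, p, q = 50, 4, 3
B1, B2 = rng.normal(size=(n, p)), rng.normal(size=(n, q))
gamma, eta_true = rng.normal(size=p), np.array([0.5, -1.0, 2.0])
y = B1 @ gamma + B2 @ eta_true + 0.1 * rng.normal(size=n)

eta, mean, cov = draw_eta(y, B1, B2, gamma, np.zeros(q), np.eye(q), 0.01, rng)
prec = draw_noise_precision([y - B1 @ gamma - B2 @ eta], g=1.0, h=1.0, rng=rng)
```

With many design points and small noise, the conditional mean is dominated by the least-squares term and the prior mean $\mathbf{\mu}_j$ contributes only mild shrinkage.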

Consider the Bayesian model of Eq. (4). Let the number of groups *m* be fixed and **Θ** denote all of the unknown parameters in the model, i.e.,

$$\mathbf{\Theta}=\left(\left\{\left\{\mathbf{\eta}_{ij}\right\}_{i=1}^{s_j},\mathbf{\gamma}_j,\mathbf{\mu}_j,\mathbf{\Sigma}_j\right\}_{j=1}^{m},\mathbf{R},\sigma^2\right)$$

Then, using $\mathbf{\Theta}^{(0)}$ as the starting value, the Gibbs sampling algorithm [10, 11] proceeds as follows for iterations *t* = 0, 1, …, *N* − 1:

$$\begin{array}{l}\text{Draw } \mathbf{\eta}_{ij}^{(t+1)} \text{ from } p(\mathbf{\eta}_{ij}\mid \mathbf{Y},\mathbf{\gamma}_j^{(t)},\dots ,\sigma^{2(t)}) \text{ for } i=1,2,\dots ,s_j \text{ and } j=1,2,\dots ,m\\ \text{Draw } \mathbf{\gamma}_j^{(t+1)} \text{ from } p(\mathbf{\gamma}_j\mid \mathbf{Y},\mathbf{\eta}_{ij}^{(t+1)},\dots ,\sigma^{2(t)}) \text{ for } j=1,2,\dots ,m\\ \text{Draw } \mathbf{\mu}_j^{(t+1)} \text{ from } p(\mathbf{\mu}_j\mid \mathbf{Y},\mathbf{\eta}_{ij}^{(t+1)},\mathbf{\gamma}_j^{(t+1)},\dots ,\sigma^{2(t)}) \text{ for } j=1,2,\dots ,m\\ \vdots \\ \text{Draw } \sigma^{2(t+1)} \text{ from } p(\sigma^2\mid \mathbf{Y},\mathbf{\eta}_{ij}^{(t+1)},\mathbf{\gamma}_j^{(t+1)},\mathbf{\mu}_j^{(t+1)},\mathbf{\Sigma}_j^{(t+1)},\mathbf{R}^{(t+1)})\end{array}$$

Note that the computations for the conditional probabilities are highly simplified due to the conjugacy of the prior distributions and their conditional independence.
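The sweep structure can be sketched on a deliberately reduced version of the model. The sketch below is an assumption-laden toy, not the paper's sampler: it uses a single group, omits the fixed effects $\mathbf{\gamma}_j$, holds $\mathbf{\Sigma}_j$ fixed (so $\mathbf{R}$ is not needed), shares one random design matrix `B` across replicates, and treats $\Gamma(g,h)$ as shape/scale. Only $\mathbf{\eta}_{i}$, $\mathbf{\mu}$ and $\sigma^{-2}$ are updated, each from its conjugate full conditional.

```python
import numpy as np

rng = np.random.default_rng(1)
s, n, q = 8, 60, 3                          # replicates, points per spectrum, dim of eta
B = rng.normal(size=(n, q))                 # stand-in design matrix (assumption)
mu_true, sigma_true = np.array([1.0, -2.0, 0.5]), 0.3
Sigma_inv = np.eye(q) / 0.25                # Sigma^{-1}, held fixed in this sketch
eta_true = mu_true + 0.5 * rng.normal(size=(s, q))
Y = eta_true @ B.T + sigma_true * rng.normal(size=(s, n))

V_inv = np.eye(q) / 100.0                   # prior mu ~ N(0, V)
g, h = 1.0, 1.0                             # prior sigma^{-2} ~ Gamma(shape g, scale h)

def gibbs(Y, n_iter, rng):
    mu, prec = np.zeros(q), 1.0             # starting values Theta^(0)
    BtB = B.T @ B
    mu_draws = np.empty((n_iter, q))
    sigma2_draws = np.empty(n_iter)
    for t in range(n_iter):
        # eta_i | rest: Gaussian, same covariance for every replicate here.
        cov = np.linalg.inv(prec * BtB + Sigma_inv)
        cov = 0.5 * (cov + cov.T)
        means = (prec * (Y @ B) + Sigma_inv @ mu) @ cov      # row i is the mean of eta_i
        eta = means + rng.multivariate_normal(np.zeros(q), cov, size=s)
        # mu | rest: Gaussian, using the replicate average of eta.
        cov_mu = np.linalg.inv(s * Sigma_inv + V_inv)
        cov_mu = 0.5 * (cov_mu + cov_mu.T)
        mu = rng.multivariate_normal(cov_mu @ (s * Sigma_inv @ eta.mean(axis=0)), cov_mu)
        # sigma^{-2} | rest: Gamma, using the residual sum of squares.
        resid = Y - eta @ B.T
        prec = rng.gamma(g + resid.size / 2.0, 1.0 / (1.0 / h + 0.5 * np.sum(resid ** 2)))
        mu_draws[t], sigma2_draws[t] = mu, 1.0 / prec
    return mu_draws, sigma2_draws

mu_draws, sigma2_draws = gibbs(Y, n_iter=1500, rng=rng)
burn = 500                                  # discard early draws as burn-in
mu_hat = mu_draws[burn:].mean(axis=0)
sigma2_hat = sigma2_draws[burn:].mean()
```

Each block is drawn conditional on the most recent values of the others, which is exactly the ordering in the scheme above; the full model adds the $\mathbf{\gamma}_j$, $\mathbf{\Sigma}_j^{-1}$ and $\mathbf{R}$ updates to the same loop.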

The BHM presented in Eq. (1) can be easily extended to incorporate more detailed modeling. It is important to introduce priors that appropriately apportion the variability among the replicates and separate out the differing locations or scales along the RT or *m/z* dimensions. This provides a distinct interpretation of the LC-MS data. The alignment model and the associated parameters should allow each replicate sample to have its own affine warping transformation in the RT or *m/z* dimension. Let each spectrum $\mathbf{y}_i$ (

Combining the affine warping transformation with our prior beliefs for $a_i$ and $b_i$, we obtain the following joint posterior:

$$\begin{array}{l}p(\mathbf{\Theta}\mid \mathbf{Y})\propto \prod_{d=1}^{D}\prod_{j=1}^{m}\Big\{\prod_{i=1}^{s_j}N_n(\mathbf{y}_i^{(d)},\mathbf{Z}_i;\mathbf{B}_{1i}^{(d)}\mathbf{\gamma}_j^{(d)}+\mathbf{B}_{2i}^{(d)}\mathbf{\eta}_{ij}^{(d)},\sigma^{2(d)}\mathbf{I})\times \\ \quad N_p(\mathbf{\gamma}_j^{(d)};w^{(d)},\mathbf{W}^{(d)})\times N_q(\mathbf{\eta}_{ij}^{(d)};\mathbf{\mu}_{z_{ij}}^{(d)},\mathbf{\Sigma}_{z_{ij}}^{(d)})\times N(b_i;\mu_b,\sigma_b^2)\times \\ \quad N(a_i;\mu_a,\sigma_a^2)\,I\{a_i>1\}\times N_q(\mathbf{\mu}_j^{(d)};0,\mathbf{V}^{(d)})\times W_q(\mathbf{\Sigma}_j^{-1(d)};\rho^{(d)},(\rho^{(d)}\mathbf{R}^{(d)})^{-1})\Big\}\times \\ \quad W_q(\mathbf{R}^{(d)};r,(r\mathbf{R}_0)^{-1})\times \Gamma(\sigma^{-2(d)};g,h)\end{array}$$

(7)

where **Θ** denotes all unknown parameters in this new model.

$$\mathbf{\Theta}=\left(\left\{\left\{\left\{\mathbf{\eta}_{ij}^{(d)}\right\}_{i=1}^{s_j},\mathbf{\gamma}_j^{(d)},\mathbf{\mu}_j^{(d)},\mathbf{\Sigma}_j^{(d)},\mathbf{R}^{(d)},\sigma^{2(d)}\right\}_{d=1}^{D},\mu_a,\mu_b,\sigma_a^2,\sigma_b^2\right\}_{j=1}^{m}\right)$$

(8)

The B-spline basis matrices associated with ${\mathbf{y}}_{i}^{(d)}({a}_{i}{\mathbf{x}}_{ij}-{b}_{i})$ need to be updated at each iteration based on the estimates of the RT transformation parameters $a_i$ and $b_i$. Each Gibbs iteration then updates

$$\begin{array}{l}\mathbf{\eta}_{ij}^{(d)(t+1)},\mathbf{\gamma}_j^{(d)(t+1)},\mathbf{\mu}_j^{(d)(t+1)},\mathbf{\Sigma}_j^{(d)(t+1)},\mathbf{R}^{(d)(t+1)},\sigma^{2(d)(t+1)},\mu_a^{(t+1)},\mu_b^{(t+1)},\\ \sigma_a^{2(t+1)},\sigma_b^{2(t+1)},\quad\text{for } i=1,2,\dots ,s_j,\ j=1,2,\dots ,m \text{ and } d=1,2,\dots ,D\end{array}$$
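One way to picture the per-iteration basis update is to re-evaluate a B-spline design matrix at the warped coordinates $a_i x - b_i$. The sketch below is only an illustration under stated assumptions: the RT axis is rescaled to [0, 1], the basis is cubic with clamped, equally spaced knots, warped points falling outside the knot range receive all-zero rows, and the names `clamped_knots` and `warped_basis` are hypothetical. It relies on `scipy.interpolate.BSpline` with an identity coefficient matrix to obtain all basis functions at once.

```python
import numpy as np
from scipy.interpolate import BSpline

def clamped_knots(n_interior, degree, lo=0.0, hi=1.0):
    """Equally spaced knots on [lo, hi] with repeated boundary knots."""
    inner = np.linspace(lo, hi, n_interior + 2)
    return np.concatenate([np.full(degree, lo), inner, np.full(degree, hi)])

def warped_basis(x, a, b, knots, degree):
    """B-spline design matrix evaluated at the affinely warped points a*x - b."""
    n_basis = len(knots) - degree - 1
    # Identity coefficients: column k of the output is the k-th basis function.
    design = BSpline(knots, np.eye(n_basis), degree, extrapolate=False)
    # Outside the knot range BSpline returns NaN; treat those rows as zero.
    return np.nan_to_num(design(a * x - b), nan=0.0)

x = np.linspace(0.0, 1.0, 101)               # assumed rescaled RT grid
knots = clamped_knots(6, 3)
B_identity = warped_basis(x, 1.0, 0.0, knots, 3)     # no warping
B_warped = warped_basis(x, 1.05, 0.02, knots, 3)     # illustrative a_i, b_i
```

In a sampler, `warped_basis` would be called again whenever a new pair $(a_i, b_i)$ is drawn, so that the subsequent conditional updates use design matrices consistent with the current warp.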

We used BHM to align 11 replicate LC-MS spectra obtained from http://www.cs.toronto.edu/~jenn/LCMS. The spectra were generated from proteins of lysed *E. coli* cells by capillary-scale LC coupled on-line to an ion trap mass spectrometer (see Listgarten [6] for details). Each spectrum was represented by two dimensions after calculating the total ion count (TIC) profile for each RT point across the *m/z* values from the original 400×2400 data matrix, corresponding to 400 RT points (~55 min) and 2400 *m/z* bins spanning 400 to 1600 Dalton (Da). Fig. 2 depicts these 11 two-dimensional replicate spectra. From this figure, we can see that the spectra show significant shifts along RT as well as distortions in the ion abundance measurement space. We applied our BHM method for alignment of LC-MS spectra with respect to RT. Fig. 3 depicts the aligned spectra. BHM reduced the coefficient of variation (CV) of the original TIC profiles from 82% to 66%. The CVs of the spectra aligned by DTW, COW and CPM were 70%, 80% and 57%, respectively.
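The TIC reduction and a CV summary of replicate profiles can be sketched as follows. The text does not spell out how the CV was computed, so this example assumes one plausible definition: the point-wise standard deviation divided by the point-wise mean across replicates, averaged over RT points. The synthetic single-peak profiles and the function names are illustrative only and do not reproduce the reported numbers.

```python
import numpy as np

def tic_profile(run):
    """Sum an (n_rt, n_mz) intensity matrix over m/z to get one TIC value per RT point."""
    return np.asarray(run, dtype=float).sum(axis=1)

def mean_cv_percent(tics):
    """Average point-wise CV (in %) across replicate TIC profiles (assumed definition)."""
    tics = np.asarray(tics, dtype=float)
    mean = tics.mean(axis=0)
    std = tics.std(axis=0, ddof=1)
    keep = mean > 0                          # skip RT points with no signal
    return 100.0 * float(np.mean(std[keep] / mean[keep]))

# Synthetic replicates: one Gaussian peak on a flat baseline, 400 RT points over ~55 min.
rt = np.linspace(0.0, 55.0, 400)

def peak(center):
    return 1.0 + 100.0 * np.exp(-0.5 * (rt - center) ** 2)

scales = [1.0, 1.1, 0.9]                     # replicate-specific intensity scaling
shifted = np.stack([s * peak(c) for s, c in zip(scales, [20.0, 21.0, 22.0])])
aligned = np.stack([s * peak(20.0) for s in scales])

cv_before = mean_cv_percent(shifted)         # RT shifts inflate the CV
cv_after = mean_cv_percent(aligned)          # only intensity scaling remains
```

Under this definition, removing the RT shifts lowers the CV, while residual intensity differences between replicates keep it above zero.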

The second dataset was obtained from http://prottools.ethz.ch/muellelu/web/Latin_Square_Data.php. It consists of 18 LC-MS spectra generated from tryptic digests of six standard non-human proteins (myoglobin, carbonic anhydrase, cytochrome c, lysozyme, alcohol dehydrogenase, and aldolase A) spiked at different concentrations into a complex sample background of human peptides isolated by solid-phase N-glycocapture from serum. The LC-MS spectra for these samples were generated using a Fourier transform-linear trap quadrupole (FT-LTQ) mass spectrometer (see Mueller et al. [5] for details). The 18 spectra represent six groups based on the concentration of the proteins. We processed the raw spectra and obtained for each spectrum a 2000×1300 data matrix corresponding to 2000 RT points (~55 min) and 1300 *m/z* bins between 300 and 1600 Da. We calculated the TIC for each RT point across the *m/z* values and obtained 18 two-dimensional TIC profiles (for each of the six groups). Figs. 4 and 5 depict TIC plots of the original and aligned LC-MS spectra, respectively. Fig. 6 shows the corresponding heat maps for the original and aligned LC-MS spectra. BHM reduced the average CV of the original TIC profiles across the six groups from 18% to 13%. Both DTW and COW yielded a CV of 17%, while CPM resulted in a CV of 13%.

This paper utilizes a Bayesian hierarchical model for alignment of LC-MS spectra. Specifically, it presents a fully Bayesian mixed-effects model that accounts for population homogeneous behavior across biological groups (i.e., fixed systematic changes) and for heterogeneity within groups (random effects). Bayesian inference of unknown parameters is carried out via an MCMC method using the Gibbs sampling technique with conjugate priors. The proposed approach not only allows alignment with respect to the RT and *m/z* dimensions but also implicitly normalizes the peak intensities of peptides. The performance of the approach is assessed on two LC-MS datasets: replicate spectra generated from proteins of lysed *E. coli* cells, and spectra representing six groups in which six proteins are spiked at different concentrations into a complex sample background of human peptides. On these datasets, BHM reduces the coefficient of variation of replicate TIC profiles while preserving the original experimental retention time (i.e., without introducing superfluous signal gaps across multiple LC-MS spectra). A limitation of BHM is that it requires a considerable amount of computation time when aligning LC-MS data with respect to both the RT and *m/z* dimensions. Future work will focus on addressing this limitation through optimization of the algorithm.

This work was supported in part by the National Science Foundation Grant IIS-0812246, the National Cancer Institute (NCI) R21CA130837 Grant, NCI R03CA119313 Grant, NCI Early Detection Research Network Associate Membership Grant, and the Prevent Cancer Foundation Grant awarded to HWR.

Getachew K Befekadu, The Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC 20057, USA.

Mahlet G Tadesse, The Department of Mathematics, Georgetown University, 308 St. Mary’s Hall, Washington, DC 20057, USA.

Habtom W Ressom, The Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC 20057, USA.

1. Aebersold R, Mann M. Mass spectrometry-based proteomics. Nature. 2003;422(6928):198–207.

2. Tomasi G, van den Berg F, Andersson C. Correlation optimized warping and dynamic time warping as preprocessing methods for chromatographic data. Journal of Chemometrics. 2004;18:231–241.

3. Hastings CA, Norton SM, Roy S. New algorithms for processing and peak detection in liquid chromatography/mass spectrometry data. Rapid Commun Mass Spectrom. 2002;16(5):462–7.

4. Wang P, Tang H, Fitzgibbon MP, et al. A statistical method for chromatographic alignment of LC-MS data. Biostatistics. 2007;8(2):357–67.

5. Mueller LN, Rinner O, Schmidt A, et al. SuperHirn - a novel tool for high resolution LC-MS-based peptide/protein profiling. Proteomics. 2007;7(19):3470–80.

6. Listgarten J. Analysis of sibling time series data: alignment and difference detection [Ph.D. thesis]. Toronto: Department of Computer Science, University of Toronto; 2006.

7. Morris JS, Brown PJ, Herrick RC, et al. Bayesian analysis of mass spectrometry proteomic data using wavelet-based functional mixed models. Biometrics. 2008;64(2):479–89.

8. Morris JS, Carroll RJ. Wavelet-based functional mixed models. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2006;68:179–199.

9. Guo W. Functional data analysis in longitudinal settings using smoothing splines. Statistical Methods in Medical Research. 2004;13(1):49–62.

10. Casella G, George EI. Explaining the Gibbs sampler. The American Statistician. 1992;46(3):167–174.

11. Geman S, Geman D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. In: Readings in uncertain reasoning. Morgan Kaufmann Publishers Inc; 1990. pp. 452–472.