Results 1-13 (13)

1.  PeakLink: a new peptide peak linking method in LC-MS/MS using wavelet and SVM 
Bioinformatics  2014;30(17):2464-2470.
Motivation: In liquid chromatography–mass spectrometry/tandem mass spectrometry (LC-MS/MS), it is necessary to link tandem MS-identified peptide peaks so that protein expression changes between two runs can be tracked. However, only a small number of peptides can be identified and linked by tandem MS in two runs, so it becomes necessary to link peptide peaks with tandem identification in one run to their corresponding peaks in another run without identification. In the past, peptide peaks have been linked based on similarities in retention time (rt), mass or peak shape after rt alignment, which corrects mean rt shifts between runs. However, linking accuracy is still limited, especially for complex samples collected under different conditions. Consequently, large-scale proteomics studies that require comparison of protein expression profiles of hundreds of patients cannot be carried out effectively.
Method: In this article, we consider the problem of linking peptides from a pair of LC-MS/MS runs and propose a new method, PeakLink (PL), which uses information in both the time and frequency domain as inputs to a non-linear support vector machine (SVM) classifier. The PL algorithm first uses a threshold on an rt likelihood ratio score to remove candidate corresponding peaks with excessively large elution time shifts, then PL calculates the correlation between a pair of candidate peaks after reducing noise through wavelet transformation. After converting rt and peak shape correlation to statistical scores, an SVM classifier is trained and applied for differentiating corresponding and non-corresponding peptide peaks.
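As a rough illustration of the peak shape step described above, the sketch below denoises two elution profiles with a single-level Haar wavelet transform and hard thresholding, then computes their Pearson correlation. The abstract does not name the wavelet family, the threshold rule, or the correlation measure, so the Haar wavelet, the `threshold` parameter, and all function names are assumptions made for illustration. The rt likelihood ratio filter and the SVM stage are not shown.

```python
import math


def haar_denoise(signal, threshold):
    """One-level Haar transform, hard-threshold the details, then invert.

    Assumes an even-length signal. Haar and hard thresholding are
    illustrative choices, not taken from the PeakLink paper.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Pairwise averages (approximation) and differences (detail).
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # Small detail coefficients are treated as noise and set to zero.
    detail = [d if abs(d) > threshold else 0.0 for d in detail]
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def peak_shape_correlation(profile_a, profile_b, threshold=0.5):
    """Correlation of two equal-length elution profiles after denoising."""
    return pearson(haar_denoise(profile_a, threshold),
                   haar_denoise(profile_b, threshold))
```

In a pipeline of the kind the abstract outlines, a score like this would be one of the features passed to the classifier, alongside an rt-based score.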
Results: PL is tested in multiple challenging cases, in which LC-MS/MS samples are collected from different disease states, different instruments and different laboratories. Testing results show significant improvement in linking accuracy compared with other algorithms.
Availability and implementation: M files for the PL alignment method are available at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC4147882  PMID: 24813213
2.  Suppression correction and characteristic study in liquid chromatography/Fourier transform mass spectrometry measurements 
Analysis of peptide profiles from liquid chromatography/Fourier transform mass spectrometry (LC/FTMS) reveals a nonlinear distortion in intensity. Comparing the measured C13/C12 ratios with theoretical ones shows that the nonlinearity can be attributed to signal suppression of low abundance peptide peaks. We find that the suppression is homogeneous for different isotopes of identical peptides but non-homogeneous for different peptides. We develop an iterative correction algorithm that corrects the intensity distortions for peptides with relatively high abundance. This algorithm can be applied in a wide range of applications using LC/FTMS. We also analyze the distortion characteristics of the instrument for lower abundance peptides, which should be considered when interpreting quantification results of LC/FTMS.
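To illustrate the comparison of measured and theoretical C13/C12 ratios mentioned above, here is a minimal sketch. It assumes that only carbon contributes to the first isotopic peak and that the natural abundance of carbon-13 is about 1.07%. Under a binomial model, the ratio of the one-C13 peak to the monoisotopic peak for a peptide with n carbon atoms is n·p/(1−p). The function names are hypothetical, and this is not the paper's iterative correction algorithm.

```python
C13_ABUNDANCE = 0.0107  # approximate natural abundance of carbon-13


def theoretical_c13_c12_ratio(n_carbons, p=C13_ABUNDANCE):
    """Expected ratio of the one-C13 peak to the all-C12 peak.

    Binomial model over carbon atoms only; other elements are ignored
    (a simplifying assumption for this sketch).
    """
    return n_carbons * p / (1.0 - p)


def relative_deviation(observed_ratio, n_carbons):
    """Signed relative difference between observed and expected ratios."""
    expected = theoretical_c13_c12_ratio(n_carbons)
    return (observed_ratio - expected) / expected
```

A relative deviation far from zero for a given peptide could then be taken as a hint that one of the two isotopic peaks is distorted.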
PMCID: PMC4174462  PMID: 21259364
3.  Accurate LC Peak Boundary Detection for 16O/18O Labeled LC-MS Data 
PLoS ONE  2013;8(10):e72951.
In liquid chromatography-mass spectrometry (LC-MS), parts of LC peaks are often corrupted by co-eluting peptides, which increases quantification variance. In this paper, we propose to apply accurate LC peak boundary detection to remove the corrupted part of LC peaks. Accurate LC peak boundary detection is achieved by checking the consistency of intensity patterns within peptide elution time ranges. In addition, we remove peptides with erroneous mass assignment through a model fitness check, which compares observed intensity patterns to theoretically constructed ones. The proposed algorithm can significantly improve the accuracy and precision of peptide ratio measurements.
PMCID: PMC3792097  PMID: 24115998
4.  BPDA2d—a 2D global optimization-based Bayesian peptide detection algorithm for liquid chromatograph–mass spectrometry 
Bioinformatics  2011;28(4):564-572.
Motivation: Peptide detection is a crucial step in mass spectrometry (MS) based proteomics. Most existing algorithms are based upon greedy isotope template matching and thus may be prone to error propagation and ineffective at detecting overlapping peptides. In addition, existing algorithms usually work at different charge states separately, isolating useful information that can be drawn from other charge states, which may lead to poor detection of low abundance peptides.
Results: BPDA2d models spectra as a mixture of candidate peptide signals and systematically evaluates all possible combinations of peptide candidates to interpret the given spectra. For each candidate, BPDA2d takes into account its elution profile, charge state distribution and isotope pattern, and it combines all evidence to infer the candidate's signal and existence probability. By piecing all evidence together, especially by deriving information across charge states, low abundance peptides can be better identified and peptide detection rates can be improved. Instead of local template matching, BPDA2d performs global optimization for all candidates and systematically optimizes their signals. Since BPDA2d looks for the optimum among all possible interpretations of the given spectra, it can handle complex spectra where features overlap. BPDA2d estimates the posterior existence probability of detected peptides, which can be directly used for probability-based evaluation in subsequent processing steps. Our experiments indicate that BPDA2d outperforms state-of-the-art detection methods on both simulated data and real liquid chromatography–mass spectrometry data, according to sensitivity and detection accuracy.
Availability: The BPDA2d software package is available at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3278754  PMID: 22155863
5.  Mathematical modeling and stability analysis of macrophage activation in left ventricular remodeling post-myocardial infarction 
BMC Genomics  2012;13(Suppl 6):S21.
About 6 million Americans suffer from heart failure and 70% of heart failure cases are caused by myocardial infarction (MI). Following myocardial infarction, increased cytokines induce two major types of macrophages: classically activated macrophages, which contribute to extracellular matrix destruction, and alternatively activated macrophages, which contribute to extracellular matrix construction. Though experimental results have shown the transitions between these two types of macrophages, little is known about the dynamic progression of macrophage activation. Therefore, the objective of this study is to analyze macrophage activation patterns post-MI.
We have collected experimental data from adult C57 mice and built a framework to represent the regulatory relationships among cytokines and macrophages. A set of differential equations was established to characterize the regulatory relationships for macrophage activation in the left ventricle post-MI based on physical chemistry laws. We further validated the mathematical model by comparing our computational results with experimental results reported in the literature. Under Lyapunov stability analysis, the established mathematical model demonstrated global stability in homeostasis and a bounded response to myocardial infarction.
We have established and validated a mathematical model for macrophage activation post-MI. The stability analysis provided a possible strategy to intervene in the balance of classically and alternatively activated macrophages in this study. The results will lay a strong foundation to understand the mechanisms of left ventricular remodeling post-MI.
PMCID: PMC3481436  PMID: 23134700
6.  SCFIA: a statistical corresponding feature identification algorithm for LC/MS 
BMC Bioinformatics  2011;12:439.
Identifying corresponding features (LC peaks registered by identical peptides) in multiple Liquid Chromatography/Mass Spectrometry (LC-MS) datasets plays a crucial role in the analysis of complex peptide or protein mixtures. Warping functions are commonly used to correct the mean of elution time shifts among LC-MS datasets, but they cannot resolve the ambiguity of corresponding feature identification since elution time shifts are random. We propose a Statistical Corresponding Feature Identification Algorithm (SCFIA) based on both elution time shifts and peak shape correlations between corresponding features. SCFIA first trains a set of statistical models, and then all candidate corresponding features are scored by the statistical models to find the maximum likelihood solution.
We test SCFIA on publicly available datasets. We first compare its performance with that of warping function based methods, and the results show significant improvements. The performance of SCFIA on replicate datasets and fractionated datasets is also evaluated. In both cases, the accuracy is above 90%, which is near optimal. Finally, the coverage of SCFIA is evaluated, and it is shown that SCFIA can find corresponding features in multiple datasets for over 90% of peptides identified by tandem MS.
SCFIA can be used for accurate corresponding feature identification in LC-MS. We have shown that peak shape correlation can be used effectively for improving the accuracy. SCFIA provides high coverage in corresponding feature identification in multiple datasets, which serves as the basis for integrating multiple LC-MS measurements for accurate peptide quantification.
PMCID: PMC3233610  PMID: 22078262
7.  Processing methods for signal suppression of FTMS data 
Proteome Science  2011;9(Suppl 1):S2.
Fourier Transform Mass Spectrometry coupled with Liquid Chromatography (LC-FTMS) has been widely used in proteomics. Past investigation has revealed an intensity dependent random suppression in peptide elution profiles in LC-FTMS data. The suppression is homogeneous for the same peptide but non-homogeneous for different peptides. Correcting suppressed profiles and estimating the range of suppression are necessary for accurate and reliable quantification using FTMS data.
A software package, Gcorr, is presented. The software corrects peptide profiles that satisfy correction conditions, and it can predict fold change null distributions at different intensity levels. Subsequently, the significance P-values of measured fold changes can be estimated based on the predicted null distributions. We have used a 1:1 LC-FTMS label-free dataset pair collected from the same sample to verify that our predicted null distributions conform to the observed null distribution.
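One common way to turn a null distribution into a significance value, shown here as a sketch, is an empirical two-sided p-value on log fold changes. The abstract does not say how Gcorr computes its P-values, so the estimator, the add-one correction, and the function name are assumptions. The null sample would come from the predicted null distribution at the matching intensity level.

```python
import math


def empirical_p_value(observed_fold_change, null_fold_changes):
    """Two-sided empirical p-value of a fold change against a null sample.

    Fold changes are compared on the log2 scale so that 2x up and 2x down
    count as equally extreme. The +1 terms keep the estimate above zero
    (an illustrative choice, not taken from the Gcorr paper).
    """
    obs = abs(math.log2(observed_fold_change))
    extreme = sum(1 for fc in null_fold_changes if abs(math.log2(fc)) >= obs)
    return (extreme + 1) / (len(null_fold_changes) + 1)
```

With a hypothetical null sample `[1.0, 1.1, 0.9, 2.0]`, an observed fold change of 2.0 is matched by one null value, giving (1 + 1) / (4 + 1) = 0.4.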
This software is able to provide suppression correction for peptide profiles, suppression distribution analysis and peptide differential expression analysis in terms of its fold change significance. The software is freely available at
PMCID: PMC3289080  PMID: 22166077
8.  Bayesian non-negative factor analysis for reconstructing transcription factor mediated regulatory networks 
Proteome Science  2011;9(Suppl 1):S9.
Transcriptional regulation by transcription factors (TFs) controls the time and abundance of mRNA transcription. Due to the limitations of current proteomics technologies, large scale measurement of protein level activities of TFs is usually infeasible, making computational reconstruction of transcriptional regulatory networks a difficult task.
We propose a novel Bayesian non-negative factor model for TF mediated regulatory networks. In particular, the non-negative TF activities and sample clustering effect are modeled as the factors from a Dirichlet process mixture of rectified Gaussian distributions, and the sparse regulatory coefficients are modeled as the loadings from a sparse distribution that constrains its sparsity using knowledge from databases. A Gibbs sampling solution was developed to infer the underlying network structure and the unknown TF activities simultaneously. The approach has been applied to a simulated system and to breast cancer gene expression data. The results show that the proposed method was able to systematically uncover the TF mediated transcriptional regulatory network structure, the regulatory coefficients, the TF protein level activities and the sample clustering effect. The regulation target predictions are highly consistent with prior knowledge, and the sample clustering results show superior performance over a previous molecular based clustering method.
The results demonstrated the validity and effectiveness of the proposed approach in reconstructing transcriptional networks mediated by TFs through simulated systems and real data.
PMCID: PMC3289087  PMID: 22166063
9.  Bayesian Peptide Peak Detection for High Resolution TOF Mass Spectrometry 
In this paper, we address the issue of peptide ion peak detection for high resolution time-of-flight (TOF) mass spectrometry (MS) data. A novel Bayesian peptide ion peak detection method is proposed for TOF data with resolution of 10 000–15 000 full width at half-maximum (FWHM). MS spectra exhibit distinct characteristics at this resolution, which are captured in a novel parametric model. Based on the proposed parametric model, a Bayesian peak detection algorithm based on Markov chain Monte Carlo (MCMC) sampling is developed. The proposed algorithm is tested on both simulated and real datasets. The results show a significant improvement in detection performance over a commonly employed method. The results also agree with visual inspection by experts. Moreover, better detection consistency is achieved across MS datasets from patients with identical pathological condition.
PMCID: PMC3085289  PMID: 21544266
Bayesian methods; Markov chain Monte Carlo; mass spectrometry; peptide peak detection; time-of-flight
10.  MRCQuant- an accurate LC-MS relative isotopic quantification algorithm on TOF instruments 
BMC Bioinformatics  2011;12:74.
Relative isotope abundance quantification, which can be used for peptide identification and differential peptide quantification, plays an important role in liquid chromatography-mass spectrometry (LC-MS)-based proteomics. However, several major issues exist in the relative isotopic quantification of peptides on time-of-flight (TOF) instruments: LC peak boundary detection, thermal noise suppression, interference removal and mass drift correction. We propose to use the Maximum Ratio Combining (MRC) method to extract MS signal templates for interference detection/removal and LC peak boundary detection. In our method, MRCQuant, MS templates are extracted directly from experimental values, and the mass drift in each LC-MS run is automatically captured and compensated. We compared the quantification accuracy of MRCQuant to that of another representative LC-MS quantification algorithm (msInspect) using datasets downloaded from a public data repository.
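Maximum ratio combining, in its textbook form, weights each observation by its signal amplitude divided by its noise variance before summing. The sketch below applies that idea to estimate an MS signal template from several scans of one LC peak, assuming each scan is the template scaled by a known elution amplitude plus independent noise of known variance. These modelling assumptions and all names are illustrative; the abstract does not describe how MRCQuant obtains amplitudes or noise levels.

```python
def mrc_template(scans, amplitudes, noise_vars):
    """Combine several MS scans into one template estimate.

    scans:       list of equal-length intensity lists, one per MS scan
    amplitudes:  assumed relative elution amplitude of each scan
    noise_vars:  assumed noise variance of each scan
    All three inputs are hypothetical for this sketch.
    """
    if not (len(scans) == len(amplitudes) == len(noise_vars)):
        raise ValueError("inputs must have one entry per scan")
    # MRC weight: amplitude over noise variance, so strong, clean scans dominate.
    weights = [a / v for a, v in zip(amplitudes, noise_vars)]
    # Normalisation makes the estimate unbiased for the unit-amplitude template.
    norm = sum(w * a for w, a in zip(weights, amplitudes))
    if norm == 0:
        raise ValueError("at least one scan must have non-zero amplitude")
    n_points = len(scans[0])
    return [
        sum(w * scan[j] for w, scan in zip(weights, scans)) / norm
        for j in range(n_points)
    ]
```

For noise-free scans that are exact multiples of one template, the function returns that template, which is a quick sanity check on the weighting.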
MRCQuant showed significant improvement in the number of accurately quantified peptides.
MRCQuant effectively addresses major issues in the relative quantification of LC-MS-based proteomics data, and it provides improved performance in the quantification of low abundance peptides.
PMCID: PMC3072341  PMID: 21406110
11.  ICPD-A New Peak Detection Algorithm for LC/MS 
BMC Genomics  2010;11(Suppl 3):S8.
The identification and quantification of proteins using label-free Liquid Chromatography/Mass Spectrometry (LC/MS) play crucial roles in biological and biomedical research. Increasing evidence has shown that biomarkers are often low abundance proteins. However, LC/MS systems are subject to considerable noise and sample variability, whose statistical characteristics are still elusive, making computational identification of low abundance proteins extremely challenging. As a result, the inability to identify low abundance proteins in a proteomic study is the main bottleneck in protein biomarker discovery.
In this paper, we propose a new peak detection method called Information Combining Peak Detection (ICPD) for high resolution LC/MS. In LC/MS, peptides elute during a certain time period and, as a result, peptide isotope patterns are registered in multiple MS scans. The key feature of the new algorithm is that the observed isotope patterns registered in multiple scans are combined for estimating the likelihood of the peptide's existence. An isotope pattern matching score based on the likelihood probability is provided and used for peak detection.
The performance of the new algorithm is evaluated based on protein standards with 48 known proteins. The evaluation shows better peak detection accuracy for low abundance proteins than other LC/MS peak detection methods.
PMCID: PMC2999353  PMID: 21143790
12.  BPDA - A Bayesian peptide detection algorithm for mass spectrometry 
BMC Bioinformatics  2010;11:490.
Mass spectrometry (MS) is an essential analytical tool in proteomics. Many existing algorithms for peptide detection are based on isotope template matching and usually work at different charge states separately, making them ineffective at detecting overlapping peptides and low abundance peptides.
We present BPDA, a Bayesian approach for peptide detection in data produced by MS instruments with high enough resolution to baseline-resolve isotopic peaks, such as MALDI-TOF and LC-MS. We model the spectra as a mixture of candidate peptide signals, and the model is parameterized by MS physical properties. BPDA is based on a rigorous statistical framework and avoids problems, such as voting and ad-hoc thresholding, generally encountered in algorithms based on template matching. It systematically evaluates all possible combinations of peptide candidates to interpret a given spectrum, and iteratively finds the best fitting peptide signal in order to minimize the mean squared error of the inferred spectrum to the observed spectrum. In contrast to previous detection methods, BPDA performs deisotoping and deconvolution of mass spectra simultaneously, which enables better identification of weak peptide signals and produces higher sensitivities and more robust results. Unlike template-matching algorithms, BPDA can handle complex data where features overlap. Our experimental results indicate that BPDA performs well on simulated data and real MS data sets, for various resolutions and signal to noise ratios, and compares very favorably with commonly used commercial and open-source software, such as flexAnalysis, OpenMS, and Decon2LS, according to sensitivity and detection accuracy.
Unlike previous detection methods, which only employ isotopic distributions and work at each single charge state alone, BPDA takes into account the charge state distribution as well, thus lending information to better identify weak peptide signals and produce more robust results. The proposed approach is based on a rigorous statistical framework, which avoids problems generally encountered in algorithms based on template matching. Our experiments indicate that BPDA performs well on both simulated data and real data, and compares very favorably with commonly used commercial and open-source software. The BPDA software can be downloaded from
PMCID: PMC3098078  PMID: 20920238
13.  Review of Peak Detection Algorithms in Liquid-Chromatography-Mass Spectrometry 
Current Genomics  2009;10(6):388-401.
In this review, we will discuss peak detection in Liquid-Chromatography-Mass Spectrometry (LC/MS) from a signal processing perspective. A brief introduction to LC/MS is followed by a description of the major processing steps in LC/MS. Specifically, the problem of peak detection is formulated and various peak detection algorithms are described and compared.
PMCID: PMC2766790  PMID: 20190954
