Microarray technology has produced a huge body of time-course gene expression data and will continue to produce more. Such gene expression data has been proved useful in genomic disease diagnosis and drug design. The challenge is how to uncover useful information from such data by proper analysis methods such as significance analysis and clustering analysis. Many statistic-based significance analysis methods and distance/correlation-based clustering analysis methods have been applied to time-course expression data. However, these techniques are unable to account for the dynamics of such data. It is the dynamics that characterizes such data and that should be considered in analysis of such data. In this paper, we employ a nonlinear model to analyse time-course gene expression data. We firstly develop an efficient method for estimating the parameters in the nonlinear model. Then we utilize this model to perform the significance analysis of individually differentially expressed genes and clustering analysis of a set of gene expression profiles. The verification with two synthetic datasets shows that our developed significance analysis method and cluster analysis method outperform some existing methods. The application to one real-life biological dataset illustrates that the analysis results of our developed methods are in agreement with the existing results.
For high-resolution tandem mass spectra, the determination of monoisotopic masses of fragment ions plays a key role in the subsequent peptide and protein identification. In this paper, we present a new algorithm for deisotoping the bottom-up spectra. Isotopic-cluster graphs are constructed to describe the relationship between all possible isotopic clusters. Based on the relationship in isotopic-cluster graphs, each possible isotopic cluster is assessed with a score function, which is built by combining nonintensity and intensity features of fragment ions. The non-intensity features are used to prevent fragment ions with low intensity from being removed. Dynamic programming is adopted to find the highest score path with the most reliable isotopic clusters. The experimental results have shown that the average Mascot scores and F-scores of identified peptides from spectra processed by our deisotoping method are greater than those by YADA and MS-Deconv software.
Personalized drug design requires the classification of cancer patients as accurate as possible. With advances in genome sequencing and microarray technology, a large amount of gene expression data has been and will continuously be produced from various cancerous patients. Such cancer-alerted gene expression data allows us to classify tumors at the genomewide level. However, cancer-alerted gene expression datasets typically have much more number of genes (features) than that of samples (patients), which imposes a challenge for classification of tumors. In this paper, a new method is proposed for cancer diagnosis using gene expression data by casting the classification problem as finding sparse representations of test samples with respect to training samples. The sparse representation is computed by the l1-regularized least square method. To investigate its performance, the proposed method is applied to six tumor gene expression datasets and compared with various support vector machine (SVM) methods. The experimental results have shown that the performance of the proposed method is comparable with or better than those of SVMs. In addition, the proposed method is more efficient than SVMs as it has no need of model selection.
Many clustering algorithms are unable to solve the clustering problem of protein-protein interaction (PPI) networks effectively. A novel clustering model which combines the optimization mechanism of artificial bee colony (ABC) with the fuzzy membership matrix is proposed in this paper. The proposed ABC-IFC clustering model contains two parts: searching for the optimum cluster centers using ABC mechanism and forming clusters using intuitionistic fuzzy clustering (IFC) method. Firstly, the cluster centers are set randomly and the initial clustering results are obtained by using fuzzy membership matrix. Then the cluster centers are updated through different functions of bees in ABC algorithm; then the clustering result is obtained through IFC method based on the new optimized cluster center. To illustrate its performance, the ABC-IFC method is compared with the traditional fuzzy C-means clustering and IFC method. The experimental results on MIPS dataset show that the proposed ABC-IFC method not only gets improved in terms of several commonly used evaluation criteria such as precision, recall, and P value, but also obtains a better clustering result.
Genetic regulatory networks are dynamic systems which describe the interactions among gene products (mRNAs and proteins). The internal states of a genetic regulatory network consist of the concentrations of mRNA and proteins involved in it, which are very helpful in understanding its dynamic behaviors. However, because of some limitations such as experiment techniques, not all internal states of genetic regulatory network can be effectively measured. Therefore it becomes an important issue to estimate the unmeasured states via the available measurements. In this study, we design a state observer to estimate the states of genetic regulatory networks with time delays from available measurements. Furthermore, based on linear matrix inequality (LMI) approach, a criterion is established to guarantee that the dynamic of estimation error is globally asymptotically stable. A gene repressillatory network is employed to illustrate the effectiveness of our design approach.
Identification of protein complexes from protein-protein interaction networks has become a key problem for understanding cellular life in postgenomic era. Many computational methods have been proposed for identifying protein complexes. Up to now, the existing computational methods are mostly applied on static PPI networks. However, proteins and their interactions are dynamic in reality. Identifying dynamic protein complexes is more meaningful and challenging. In this paper, a novel algorithm, named DPC, is proposed to identify dynamic protein complexes by integrating PPI data and gene expression profiles. According to Core-Attachment assumption, these proteins which are always active in the molecular cycle are regarded as core proteins. The protein-complex cores are identified from these always active proteins by detecting dense subgraphs. Final protein complexes are extended from the protein-complex cores by adding attachments based on a topological character of “closeness” and dynamic meaning. The protein complexes produced by our algorithm DPC contain two parts: static core expressed in all the molecular cycle and dynamic attachments short-lived. The proposed algorithm DPC was applied on the data of Saccharomyces cerevisiae and the experimental results show that DPC outperforms CMC, MCL, SPICi, HC-PIN, COACH, and Core-Attachment based on the validation of matching with known complexes and hF-measures.
We have often observed unexpected state transitions of complex systems. We are thus interested in how to steer a complex system from an unexpected state to a desired state. Here we introduce the concept of transittability of complex networks, and derive a new sufficient and necessary condition for state transittability which can be efficiently verified. We define the steering kernel as a minimal set of steering nodes to which control signals must directly be applied for transition between two specific states of a network, and propose a graph-theoretic algorithm to identify the steering kernel of a network for transition between two specific states. We applied our algorithm to 27 real complex networks, finding that sizes of steering kernels required for transittability are much less than those for complete controllability. Furthermore, applications to regulatory biomolecular networks not only validated our method but also identified the steering kernel for their phenotype transitions.
A real-time peptide-spectrum matching (RT-PSM) algorithm is a database search method to interpret tandem mass spectra (MS/MS) with strict time constraints. Restricted by the hardware and architecture of individual workstation, previous RT-PSM algorithms either are not fast enough to satisfy all real-time system requirements or need to sacrifice the level of inference accuracy to provide the required processing speed.
We develop two parallelized algorithms for MS/MS data analysis: a multi-core RT-PSM (MC RT-PSM) algorithm which works on individual workstations and a distributed computing RT-PSM (DC RT-PSM) algorithm which works on a computer cluster. Two data sets are employed to evaulate the performance of our proposed algorithms. The simulation results show that our proposed algorithms can reach approximately 216.9-fold speedup on a sub-task process (similarity scoring module) and 84.78-fold speedup on the overall process compared with a single-thread process of the RT-PSM algorithm when 240 logical cores are employed.
The improved RT-PSM algorithms can achieve the processing speed requirement without sacrificing the level of inference accuracy. With some configuration adjustments, the proposed algorithm can support many peptide identification programs, such as X!Tandem, CUDA version RT-PSM, etc.
Characterization of unknown proteins through computational approaches is one of the most challenging problems in silico biology, which has attracted world-wide interests and great efforts. There have been some computational methods proposed to address this problem, which are either based on homology mapping or in the context of protein interaction networks.
In this paper, two algorithms are proposed by integrating the protein-protein interaction (PPI) network, proteins’ domain information and protein complexes. The one is domain combination similarity (DCS), which combines the domain compositions of both proteins and their neighbors. The other is domain combination similarity in context of protein complexes (DSCP), which extends the protein functional similarity definition of DCS by combining the domain compositions of both proteins and the complexes including them. The new algorithms are tested on networks of the model species of Saccharomyces cerevisiae to predict functions of unknown proteins using cross validations. Comparing with other several existing algorithms, the results have demonstrated the effectiveness of our proposed methods in protein function prediction. Furthermore, the algorithm DSCP using experimental determined complex data is robust when a large percentage of the proteins in the network is unknown, and it outperforms DCS and other several existing algorithms.
The accuracy of predicting protein function can be improved by integrating the protein-protein interaction (PPI) network, proteins’ domain information and protein complexes.
In dynamic biological processes, genes, transcription factors(TF) and microRNAs(miRNAs) play vital regulation roles. Many researchers have focused on the transcription factors or miRNAs in transcriptional or post transcriptional stage, respectively. However, the transcriptional regulation and post transcriptional regulation is not isolated in the whole dynamic biological processes, there are few reserchers who have tried to consider the network composed by genes, miRNAs and TFs in this dynamic biological processes, especially in the mouse lung development. Moreover, it is widely acknowledged that cancer is a kind of developmental disorders, and some of pathways involved in tissue development might be also implicated in causing cancer. Although it has been found that many genes differentially expressed during mouse lung development are also differentially expressed in lung cancer, very little work has been reported to elucidate the combinational regulatory programs of such kind of associations.
In order to investigate the association of transcriptional and post-transcriptional regulating activities in the mouse lung development, we define the significant triple relations among miRNAs, TFs and mRNAs as circuits. From the lung development time course data GSE21053, we mine 142610 circuit candidates including 96 TFs, 129 miRNAs and 13403 genes. After removing genes with little variation along different time points, we finally find 64760 circuit candidates, containing 8299 genes, 50 TFs, and 118 miRNAs in total. Further analysis on the circuits shows that the circuits vary in different stages of the lung development and play different roles. By investigating the circuits in the context of lung specific genes, we identify out the regulatory combinations for lung specific genes, as well as for those lung non-specific genes. Moreover, we show that the lung non-specific genes involved circuits are functionally related to the lung development. Noticing that some tissue developmental systems may be involved in tumourigenesis, we also check the cancer genes involved circuits, trying to find out their regulatory program, which would be useful for the research of lung cancer.
The relevant transcriptional or post-transcriptional factors and their roles involved in the mouse lung development are both changed greatly in different stages. By investigating the cancer genes involved circuits, we can find miRNAs/TFs playing important roles in tumour progression. Therefore, the miRNA-TF-mRNA circuits can be used in wide translational biomedicine studies, and can provide potential drug targets towards the treatment of lung cancer.
The sequence database searching has been the dominant method for peptide identification, in which a large number of peptide spectra generated from LC/MS/MS experiments are searched using a search engine against theoretical fragmentation spectra derived from a protein sequences database or a spectral library. Selecting trustworthy peptide spectrum matches (PSMs) remains a challenge.
A novel scoring method named FC-Ranker is developed to assign a nonnegative weight to each target PSM based on the possibility of its being correct. Particularly, the scores of PSMs are updated by using a fuzzy SVM classification model and a fuzzy silhouette index iteratively. Trustworthy PSMs will be assigned high scores when the algorithm stops.
Our experimental studies show that FC-Ranker outperforms other post-database search algorithms over a variety of datasets, and it can be extended to solve a general classification problem with uncertain labels.
Peptide identification; Peptide spectrum matches (PSMs); Fuzzy support vector machine (SVM); Fuzzy silhouette
β-turns are secondary structure type that have essential role in molecular recognition, protein folding, and stability. They are found to be the most common type of non-repetitive structures since 25% of amino acids in protein structures are situated on them. Their prediction is considered to be one of the crucial problems in bioinformatics and molecular biology, which can provide valuable insights and inputs for the fold recognition and drug design.
We propose an approach that combines support vector machines (SVMs) and logistic regression (LR) in a hybrid prediction method, which we call (H-SVM-LR) to predict β-turns in proteins. Fractional polynomials are used for LR modeling. We utilize position specific scoring matrices (PSSMs) and predicted secondary structure (PSS) as features. Our simulation studies show that H-SVM-LR achieves Qtotal of 82.87%, 82.84%, and 82.32% on the BT426, BT547, and BT823 datasets respectively. These values are the highest among other β-turns prediction methods that are based on PSSMs and secondary structure information. H-SVM-LR also achieves favorable performance in predicting β-turns as measured by the Matthew's correlation coefficient (MCC) on these datasets. Furthermore, H-SVM-LR shows good performance when considering shape strings as additional features.
In this paper, we present a comprehensive approach for β-turns prediction. Experiments show that our proposed approach achieves better performance compared to other competing prediction methods.
Protein interaction networks (PINs) are known to be useful to detect protein complexes. However, most available PINs are static, which cannot reflect the dynamic changes in real networks. At present, some researchers have tried to construct dynamic networks by incorporating time-course (dynamic) gene expression data with PINs. However, the inevitable background noise exists in the gene expression array, which could degrade the quality of dynamic networkds. Therefore, it is needed to filter out contaminated gene expression data before further data integration and analysis.
Firstly, we adopt a dynamic model-based method to filter noisy data from dynamic expression profiles. Then a new method is proposed for identifying active proteins from dynamic gene expression profiles. An active protein at a time point is defined as the protein the expression level of whose corresponding gene at that time point is higher than a threshold determined by a standard variance involved threshold function. Furthermore, a noise-filtered active protein interaction network (NF-APIN) is constructed. To demonstrate the efficiency of our method, we detect protein complexes from the NF-APIN, compared with those from other dynamic PINs.
A dynamic model based method can effectively filter out noises in dynamic gene expression data. Our method to compute a threshold for determining the active time points of noise-filtered genes can make the dynamic construction more accuracy and provide a high quality framework for network analysis, such as protein complex prediction.
A metabolic system consists of a number of reactions transforming molecules of one kind into another to provide the energy that living cells need. Based on the biochemical reaction principles, dynamic metabolic systems can be modeled by a group of coupled differential equations which consists of parameters, states (concentration of molecules involved), and reaction rates. Reaction rates are typically either polynomials or rational functions in states and constant parameters. As a result, dynamic metabolic systems are a group of differential equations nonlinear and coupled in both parameters and states. Therefore, it is challenging to estimate parameters in complex dynamic metabolic systems. In this paper, we propose a method to analyze the complexity of dynamic metabolic systems for parameter estimation. As a result, the estimation of parameters in dynamic metabolic systems is reduced to the estimation of parameters in a group of decoupled rational functions plus polynomials (which we call improper rational functions) or in polynomials. Furthermore, by taking its special structure of improper rational functions, we develop an efficient algorithm to estimate parameters in improper rational functions. The proposed method is applied to the estimation of parameters in a dynamic metabolic system. The simulation results show the superior performance of the proposed method.
Protein inference is an important computational step in proteomics. There exists a natural nest relationship between protein inference and peptide identification, but these two steps are usually performed separately in existing methods. We believe that both peptide identification and protein inference can be improved by exploring such nest relationship.
In this study, a feedback framework is proposed to process peptide identification reports from search engines, and an iterative method is implemented to exemplify the processing of Sequest peptide identification reports according to the framework. The iterative method is verified on two datasets with known validity of proteins and peptides, and compared with ProteinProphet and PeptideProphet. The results have shown that not only can the iterative method infer more true positive and less false positive proteins than ProteinProphet, but also identify more true positive and less false positive peptides than PeptideProphet.
The proposed iterative method implemented according to the feedback framework can unify and improve the results of peptide identification and protein inference.
Identification of essential proteins plays a significant role in understanding minimal requirements for the cellular survival and development. Many computational methods have been proposed for predicting essential proteins by using the topological features of protein-protein interaction (PPI) networks. However, most of these methods ignored intrinsic biological meaning of proteins. Moreover, PPI data contains many false positives and false negatives. To overcome these limitations, recently many research groups have started to focus on identification of essential proteins by integrating PPI networks with other biological information. However, none of their methods has widely been acknowledged.
By considering the facts that essential proteins are more evolutionarily conserved than nonessential proteins and essential proteins frequently bind each other, we propose an iteration method for predicting essential proteins by integrating the orthology with PPI networks, named by ION. Differently from other methods, ION identifies essential proteins depending on not only the connections between proteins but also their orthologous properties and features of their neighbors. ION is implemented to predict essential proteins in S. cerevisiae. Experimental results show that ION can achieve higher identification accuracy than eight other existing centrality methods in terms of area under the curve (AUC). Moreover, ION identifies a large amount of essential proteins which have been ignored by eight other existing centrality methods because of their low-connectivity. Many proteins ranked in top 100 by ION are both essential and belong to the complexes with certain biological functions. Furthermore, no matter how many reference organisms were selected, ION outperforms all eight other existing centrality methods. While using as many as possible reference organisms can improve the performance of ION. Additionally, ION also shows good prediction performance in E. coli K-12.
The accuracy of predicting essential proteins can be improved by integrating the orthology with PPI networks.
In a single proteomic project, tandem mass spectrometers can produce hundreds of millions of tandem mass spectra. However, majority of tandem mass spectra are of poor quality, it wastes time to search them for peptides. Therefore, the quality assessment (before database search) is very useful in the pipeline of protein identification via tandem mass spectra, especially on the reduction of searching time and the decrease of false identifications. Most existing methods for quality assessment are supervised machine learning methods based on a number of features which describe the quality of tandem mass spectra. These methods need the training datasets with knowing the quality of all spectra, which are usually unavailable for the new datasets.
This study proposes an unsupervised machine learning method for quality assessment of tandem mass spectra without any training dataset. This proposed method estimates the conditional probabilities of spectra being high quality from the quality assessments based on individual features. The probabilities are estimated through a constraint optimization problem. An efficient algorithm is developed to solve the constraint optimization problem and is proved to be convergent. Experimental results on two datasets illustrate that if we search only tandem spectra with the high quality determined by the proposed method, we can save about 56 % and 62% of database searching time while losing only a small amount of high-quality spectra.
Results indicate that the proposed method has a good performance for the quality assessment of tandem mass spectra and the way we estimate the conditional probabilities is effective.
Identifying gene regulatory network (GRN) from time course gene expression data has attracted more and more attentions. Due to the computational complexity, most approaches for GRN reconstruction are limited on a small number of genes and low connectivity of the underlying networks. These approaches can only identify a single network for a given set of genes. However, for a large-scale gene network, there might exist multiple potential sub-networks, in which genes are only functionally related to others in the sub-networks.
We propose the network and community identification (NCI) method for identifying multiple subnetworks from gene expression data by incorporating community structure information into GRN inference. The proposed algorithm iteratively solves two optimization problems, and can promisingly be applied to large-scale GRNs. Furthermore, we present the efficient Block PCA method for searching communities in GRNs.
The NCI method is effective in identifying multiple subnetworks in a large-scale GRN. With the splitting algorithm, the Block PCA method shows a promosing attempt for exploring communities in a large-scale GRN.
Clustering periodically expressed genes from their time-course expression data could help understand the molecular mechanism of those biological processes. In this paper, we propose a nonlinear model-based clustering method for periodically expressed gene profiles. As periodically expressed genes are associated with periodic biological processes, the proposed method naturally assumes that a periodically expressed gene dataset is generated by a number of periodical processes. Each periodical process is modelled by a linear combination of trigonometric sine and cosine functions in time plus a Gaussian noise term. A two stage method is proposed to estimate the model parameter, and a relocation-iteration algorithm is employed to assign each gene to an appropriate cluster. A bootstrapping method and an average adjusted Rand index (AARI) are employed to measure the quality of clustering. One synthetic dataset and two biological datasets were employed to evaluate the performance of the proposed method. The results show that our method allows the better quality clustering than other clustering methods (e.g., k-means) for periodically expressed gene data, and thus it is an effective cluster analysis method for periodically expressed gene data.
Gene expression data; nonlinear model; periodicall expressed genes; clustering; average adjusted Rand index
There is a growing interest in the identification of proteins on the proteome wide scale. Among different kinds of protein structure identification methods, graph-theoretic methods are very sharp ones. Due to their lower costs, higher effectiveness and many other advantages, they have drawn more and more researchers’ attention nowadays. Specifically, graph-theoretic methods have been widely used in homology identification, side-chain cluster identification, peptide sequencing and so on. This paper reviews several methods in solving protein structure identification problems using graph theory. We mainly introduce classical methods and mathematical models including homology modeling based on clique finding, identification of side-chain clusters in protein structures upon graph spectrum, and de novo peptide sequencing via tandem mass spectrometry using the spectrum graph model. In addition, concluding remarks and future priorities of each method are given.
Charge states of tandem mass spectra from low-resolution collision induced dissociation can not be determined by mass spectrometry. As a result, such spectra with multiple charges are usually searched multiple times by assuming each possible charge state. Not only does this strategy increase the overall database search time, but also yields more false positives. Hence, it is advantageous to determine charge states of such spectra before database search.
We propose a new approach capable of determining the charge states of low-resolution tandem mass spectra. Four novel and discriminant features are introduced to describe tandem mass spectra and used in Gaussian mixture model to distinguish doubly and triply charged peptides. By testing on three independent datasets with known validity, the results have shown that this method can assign charge states to low-resolution tandem mass spectra more accurately than existing methods.
The proposed method can be used to improve the speed and reliability of peptide identification.
Computational gene regulation models provide a means for scientists to draw biological inferences from time-course gene expression data. Based on the state-space approach, we developed a new modeling tool for inferring gene regulatory networks, called time-delayed Gene Regulatory Networks (tdGRNs). tdGRN takes time-delayed regulatory relationships into consideration when developing the model. In addition, a priori biological knowledge from genome-wide location analysis is incorporated into the structure of the gene regulatory network. tdGRN is evaluated on both an artificial dataset and a published gene expression data set. It not only determines regulatory relationships that are known to exist but also uncovers potential new ones. The results indicate that the proposed tool is effective in inferring gene regulatory relationships with time delay. tdGRN is complementary to existing methods for inferring gene regulatory networks. The novel part of the proposed tool is that it is able to infer time-delayed regulatory relationships.
Mass spectrometers can produce a large number of tandem mass spectra. They are unfortunately noise-contaminated. Noises can affect the quality of tandem mass spectra and thus increase the false positives and false negatives in the peptide identification. Therefore, it is appealing to develop an approach to denoising tandem mass spectra.
We propose a novel approach to denoising tandem mass spectra. The proposed approach consists of two modules: spectral peak intensity adjustment and intensity local maximum extraction. In the spectral peak intensity adjustment module, we introduce five features to describe the quality of each peak. Based on these features, a score is calculated for each peak and is used to adjust its intensity. As a result, the intensity will be adjusted to a local maximum if a peak is a signal peak, and it will be decreased if the peak is a noisy one. The second module uses a morphological reconstruction filter to remove the peaks whose intensities are not the local maxima of the spectrum. Experiments have been conducted on two ion trap tandem mass spectral datasets: ISB and TOV. Experimental results show that our algorithm can remove about 69% of the peaks of a spectrum. At the same time, the number of spectra that can be identified by Mascot algorithm increases by 31.23% and 14.12% for the two tandem mass spectra datasets, respectively.
The proposed denoising algorithm can be integrated into current popular peptide identification algorithms such as Mascot to improve the reliability of assigning peptides to spectra.
Availability of the software
The software created from this work is available upon request.
Tandem mass spectrometry has become particularly useful for the rapid identification and characterization of protein components of complex biological mixtures. Powerful database search methods have been developed for the peptide identification, such as SEQUEST and MASCOT, which are implemented by comparing the mass spectra obtained from unknown proteins or peptides with theoretically predicted spectra derived from protein databases. However, the majority of spectra generated from a mass spectrometry experiment are of too poor quality to be interpreted while some of spectra with high quality cannot be interpreted by one method but perhaps by others. Hence a filtering algorithm that removes those spectra with poor quality prior to the database search is appealing.
This paper proposes a support vector machine (SVM) based approach to assess the quality of tandem mass spectra. Each mass spectrum is mapping into the 16 proposed features to describe its quality. Based the results from SEQUEST, four SVM classifiers with the input of the 16 features are trained and tested on ISB data and TOV data, respectively. The superior performance of the proposed SVM classifiers is illustrated both by the comparison with the existing classifiers and by the validation in terms of MASCOT search results.
The proposed method can be employed to effectively remove the poor quality spectra before the spectral searching, and also to find the more peptides or post-translational peptides from spectra with high quality using different search engines or de novo method.
Tandem mass spectrometry has emerged as a cornerstone of high throughput proteomic studies owing in part to various high throughput search engines which are used to interpret these tandem mass spectra. However, majority of experimental tandem mass spectra cannot be interpreted by any existing methods. There are many reasons why this happens. However, one of the most important reasons is that majority of experimental spectra are of too poor quality to be interpretable. It wastes time to interpret these "uninterpretable" spectra by any methods. On the other hand, some spectra of high quality are not able to get a score high enough to be interpreted by existing search engines because there are many similar peptides in the searched database. However, such spectra may be good enough to be interpreted by de novo methods or manually verifying methods. Therefore, it is worth in developing a method for assessing spectral quality, which can used for filtering the spectra of poor quality before any interpretation attempts or for finding the most potential candidates for de novo methods or manually verifying methods.
This paper develops a novel method to assess the quality of tandem mass spectra, which can eliminate majority of poor quality spectra while losing very minority of high quality spectra. First, a number of features are proposed to describe the quality of tandem mass spectra. The proposed method maps each tandem spectrum into a feature vector. Then Fisher linear discriminant analysis (FLDA) is employed to construct the classifier (the filter) which discriminates the high quality spectra from the poor quality ones. The proposed method has been tested on two tandem mass spectra datasets acquired by ion trap mass spectrometers.
Computational experiments illustrate that the proposed method outperforms the existing ones. The proposed method is generic, and is expected to be applicable to assessing the quality of spectra acquired by instruments other than ion trap mass spectrometers.