The MassWiz scores were found to be correlated with peptide mass. As mass increased, the decoy scores increased, and this effect depended on charge state (). For higher charge states (~5 or more), the mass dependence may weaken or even become negative. This effect was also observed for data sets from various platforms, a few of which are shown in Figure S1. By regressing decoy scores against peptide mass separately for each charge state, a variable FDR threshold could be calculated for different peptide masses. This method, named FlexiFDR, was applied to a variety of data sets. To evaluate the accuracy of FlexiFDR, we tested it across instruments, data types, MS platforms and search methodologies. For this, we used known standard mixtures of increasing complexity (the 18, 49 and 200 mixes), acquired on disparate instrument types, and calculated the FDR using both separate and concatenated database search strategies. A strict FDR threshold of ≤1% was applied to all search results before comparison. After calculating the FDR with the general and FlexiFDR methods, we compared the results at the unique-spectrum and peptide levels. All results in the main text are from concatenated searches; the separate search results are provided as supplementary figures.
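The per-charge regression idea can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the FlexiFDR implementation: the function names are hypothetical, the decoy trend is assumed to be a straight line fitted with NumPy's `np.polyfit`, and the mass-dependent threshold is realized by subtracting the fitted decoy score at each PSM's mass and charge before applying an ordinary target-decoy cut-off estimated as decoys/targets.

```python
import numpy as np


def fit_decoy_trends(mass, charge, score, is_decoy):
    """Fit score = slope * mass + intercept to decoy PSMs, one line per charge state.

    Hypothetical helper; assumes a linear trend and at least 2 decoys per charge.
    """
    trends = {}
    for z in np.unique(charge[is_decoy]):
        sel = is_decoy & (charge == z)
        # np.polyfit with degree 1 returns [slope, intercept]
        slope, intercept = np.polyfit(mass[sel], score[sel], 1)
        trends[int(z)] = (slope, intercept)
    return trends


def rescore(mass, charge, score, trends):
    """Subtract the decoy score expected at each PSM's mass and charge (assumed rescoring)."""
    expected = np.array([trends[int(z)][0] * m + trends[int(z)][1]
                         for m, z in zip(mass, charge)])
    return score - expected


def threshold_at_fdr(adjusted, is_decoy, max_fdr=0.01):
    """Lowest adjusted score at which the running decoy/target ratio is still <= max_fdr."""
    order = np.argsort(-adjusted)                 # best-scoring PSMs first
    decoys = np.cumsum(is_decoy[order])
    targets = np.cumsum(~is_decoy[order])
    fdr = decoys / np.maximum(targets, 1)         # avoid division by zero at the top
    passing = np.nonzero(fdr <= max_fdr)[0]
    return adjusted[order][passing[-1]] if passing.size else np.inf
```

Accepting PSMs whose adjusted score is at or above the returned cut-off t is equivalent to accepting a raw score at or above slope_z * mass + intercept_z + t, i.e. a threshold that moves with mass and has its own slope for each charge z, whether that slope is positive or negative.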
For comparative evaluation, the related terminology is explained in . Since FlexiFDR is primarily a rescoring method, most PSM and peptide identifications are expected to be shared with general FDR, so comparing the total number of hits (PSMs and peptides) does not provide a true picture. The true positives, false positives, true negatives and false negatives are defined with respect to FlexiFDR (). Several datasets of varying complexity were searched with both separate and concatenated database search approaches, referred to as FDR_s and FDR_c, respectively. The comparisons for the concatenated search are shown as Venn diagrams in . Similar results are observed for the separate database search (Figure S2). The complete results are tabulated in Tables S1 and S2.
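The two search strategies are usually paired with different FDR estimators. The sketch below assumes the commonly used conventions, D/T for separate searches and 2D/(T + D) for concatenated searches; the source does not state which estimators were used here, and D/T is also widely applied to concatenated results. All function names are hypothetical.

```python
def fdr_separate(n_target, n_decoy):
    """FDR_s estimate for separate target and decoy searches: D / T."""
    if n_target == 0:
        # No targets: only an empty list counts as error-free
        return 0.0 if n_decoy == 0 else float("inf")
    return n_decoy / n_target


def fdr_concatenated(n_target, n_decoy):
    """FDR_c estimate for a single concatenated target+decoy search: 2D / (T + D)."""
    total = n_target + n_decoy
    return 2 * n_decoy / total if total else 0.0


def threshold_at(target_scores, decoy_scores, fdr_fn, max_fdr=0.01):
    """Lowest observed score cut-off whose estimated FDR is <= max_fdr, or None.

    Brute force over every observed score (quadratic); adequate for a sketch.
    """
    best = None
    for cut in sorted(set(target_scores) | set(decoy_scores), reverse=True):
        t = sum(s >= cut for s in target_scores)
        d = sum(s >= cut for s in decoy_scores)
        if fdr_fn(t, d) <= max_fdr:
            best = cut  # keep the deepest cut-off that still satisfies the limit
    return best
```

For the same counts the two estimators differ: 10 decoys against 1000 targets gives 1% with D/T, whereas 10 decoys among 1000 concatenated hits gives about 2% with 2D/(T + D).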
Analyzing the identifications unique to general FDR and to FlexiFDR better shows the merit of one method over the other. A comparison of the unique identifications from FDR_c for the standard data sets is shown as bar graphs in . Figure S3 shows similar results for the separate searches. FlexiFDR yields more unique identifications with both search strategies, and the number of true identifications is much higher with FlexiFDR than with general FDR. FlexiFDR also reduces false positives, thereby improving performance ( and ). FlexiFDR achieved a net positive gain of up to 14.81% in spectral identifications and up to 6.2% in peptide identifications (Table S1). On average, FlexiFDR yielded net positive gains of ~4.33% in spectral identifications and 3.55% in peptide identifications in the standard mix datasets (Tables S1.A and S1.B). Among unique identifications, FlexiFDR yielded up to ~13.85 times more true spectral hits and up to ~2.3 times more true peptide hits (Tables S1 and S2).
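The comparison of unique identifications can be sketched as follows. The mapping of TP, FP, TN and FN onto unique identifications and the net-gain formula are assumptions made for illustration (the paper's own definitions are in its terminology table), as is treating a peptide as true when it belongs to the known standard mixture. Spectra are matched by ID, which assumes that rescoring leaves the top-ranked peptide of each spectrum unchanged.

```python
def compare_unique(fdr_ids, flexi_ids, mixture_peptides):
    """Classify identifications unique to each method against a known standard mixture.

    fdr_ids / flexi_ids: dict spectrum ID -> peptide accepted at <= 1% FDR by each method.
    Hypothetical labelling, relative to FlexiFDR:
      TP = unique to FlexiFDR and in the mixture,   FP = unique to FlexiFDR, not in it
      TN = unique to general FDR and not in it,     FN = unique to general FDR and in it
    """
    only_flexi = flexi_ids.keys() - fdr_ids.keys()   # dict key views support set algebra
    only_fdr = fdr_ids.keys() - flexi_ids.keys()
    tp = sum(flexi_ids[s] in mixture_peptides for s in only_flexi)
    fp = len(only_flexi) - tp
    fn = sum(fdr_ids[s] in mixture_peptides for s in only_fdr)
    tn = len(only_fdr) - fn
    # Hypothetical net gain: correct decisions minus wrong ones, relative to general FDR
    net_gain_pct = 100.0 * (tp + tn - fp - fn) / len(fdr_ids) if fdr_ids else 0.0
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn, "net_gain_pct": net_gain_pct}
```

Only the identifications that differ between the two methods contribute, which matches the point above that total hit counts hide where the methods actually disagree.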
It is generally known that lower-mass peptides have a greater chance of being false positives, so lowering the threshold in the low-mass region would be expected to admit more false positives. However, we have shown that a proper threshold learnt from decoys can be very effective in improving results even in the low-mass region. Employing a charge-based threshold allows flexible modeling irrespective of the slope of the linear regression.
For the complex data sets from E. coli and yeast, true and false identifications cannot be easily defined, so we compared the numbers of spectral and peptide identifications ( and Figure S2). The comparisons at a 1% FDR threshold are tabulated in Table S2. FlexiFDR assigned more spectra and peptides for both FDR_s and FDR_c. The average percentage gain was 8.29% in spectral identifications and 7.05% in peptide identifications. Unique identifications more than doubled for both spectra and peptides.
To check whether these trends hold for different kinds of searches, we carried out high-mass-accuracy searches (ppm-level tolerances), semitryptic searches and searches with variable phosphorylation of serine, threonine and tyrosine residues. Similar trends were observed in all these searches, and FlexiFDR gave better performance (). The Venn diagrams () and bar graphs () show that FlexiFDR is applicable across different methods of data analysis.
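The average percentage gains reported for the complex data sets can be computed from per-dataset counts as sketched below. The counts in the usage example are invented for illustration, and the sketch assumes the average is taken over per-dataset gains; pooling all counts before computing a single gain would generally give a different number, and the source does not specify which was done.

```python
from statistics import mean


def percentage_gain(n_flexi, n_fdr):
    """Percentage change in identifications when moving from general FDR to FlexiFDR."""
    return 100.0 * (n_flexi - n_fdr) / n_fdr


def summarize(counts):
    """counts: list of (n_flexi, n_fdr) pairs, one per dataset or search strategy.

    Returns the per-dataset gains and their unweighted mean (an assumed averaging scheme).
    """
    gains = [percentage_gain(f, g) for f, g in counts]
    return gains, mean(gains)
```

For example, `summarize([(1100, 1000), (2160, 2000)])` returns per-dataset gains of 10.0 and 8.0 with a mean of 9.0.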
To further explore the mass dependence, we examined whether it also occurs with other search algorithms. X!Tandem and OMSSA, which rely on calibrated e-values, do not show such a bias. Interestingly, X!Tandem's raw score, the hyperscore, does show such a dependence (Figure S4). The Mascot ion score, in contrast, showed a negative dependence on mass (Figure S5). Because FlexiFDR is driven by the regression slope, it can adapt to any linear relation of score with mass and charge. We applied FlexiFDR to a few standard datasets searched with Mascot. As expected, the results improved (), except for the QTOF dataset, where they were nearly unchanged. These results show that the method is also applicable to other algorithms and is versatile.
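Whether a search engine's score shows this kind of mass dependence can be checked before applying FlexiFDR by fitting decoy score against mass for each charge state. The sketch below is hypothetical: it uses NumPy's `np.polyfit` for the slope and `np.corrcoef` for Pearson's r, and the `r_min` cut-off of 0.1 for calling a trend is an arbitrary assumption. Based on the observations above, e-value-based scores would be expected to come out flat, while a hyperscore-like or Mascot-like score would be expected to give positive or negative slopes, respectively.

```python
import numpy as np


def mass_dependence(mass, charge, decoy_score, r_min=0.1):
    """Per-charge slope, Pearson r and trend label for decoy score vs. peptide mass.

    r_min is a hypothetical cut-off below which no trend is reported.
    """
    report = {}
    for z in np.unique(charge):
        sel = charge == z
        if sel.sum() < 3:
            continue  # too few decoys to say anything about a trend
        x, y = mass[sel], decoy_score[sel]
        if np.std(x) == 0 or np.std(y) == 0:
            report[int(z)] = (0.0, 0.0, "none")  # no spread, so no measurable trend
            continue
        slope = np.polyfit(x, y, 1)[0]
        r = np.corrcoef(x, y)[0, 1]
        if abs(r) < r_min:
            trend = "none"
        else:
            trend = "positive" if slope > 0 else "negative"
        report[int(z)] = (float(slope), float(r), trend)
    return report
```

A mix of "positive" and "negative" labels across charges is not a problem for a per-charge threshold, since each charge state gets its own fitted line.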
Conclusion
This approach has several notable advantages: it adapts to different instruments, data types and MS platforms. Given any dataset, it learns from the decoys and sets a flexible threshold that automatically adjusts to the underlying variables of data quality and size. It recovers many borderline true spectra. By recovering true spectra and eliminating false ones, the method will help improve performance in label-free quantitation studies. It is also easily applicable to other algorithms once the correlated variables have been identified. Although this work demonstrates dependence on charge and mass, other algorithms may depend on different variables.
The slopes of the decoy regression lines shown in this study are positive, but FlexiFDR is not restricted to such data. It also works when different charge states have different slopes, including a mixture of positive and negative slopes across charges. It has been successfully applied to Mascot results, demonstrating its broader utility. For higher charge states (>5), the number of acquired spectra is sometimes low, and the method may not be suitable if the points are too few and skewed towards one side. Large datasets should benefit more from FlexiFDR. A related pitfall is that FlexiFDR might not work on very small datasets, since it needs enough data to learn from. This limitation is shared by any FDR method, so FlexiFDR cannot be used where FDR itself cannot. The method is simple to use and extensible in design. It can be freely downloaded from
https://sourceforge.net/projects/mssuite/files/FlexiFDR/.
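One way to guard against the sparse, skewed high-charge data described above is to fall back to a fit pooled over all charges when a charge state has too few decoys or too narrow a mass range. This fallback is an assumption for illustration, not part of the published method, and the `min_points` and `min_span` values are arbitrary placeholders.

```python
import numpy as np


def fit_with_fallback(mass, charge, decoy_score, min_points=30, min_span=500.0):
    """Per-charge (slope, intercept) of decoy score vs. mass, pooled for sparse charges.

    Hypothetical guard: a charge state needs at least min_points decoys spanning at
    least min_span Da; otherwise the line fitted to all decoys is used instead.
    The span check only partly captures skew towards one end of the mass range.
    """
    pooled = tuple(np.polyfit(mass, decoy_score, 1))
    fits = {}
    for z in np.unique(charge):
        sel = charge == z
        m = mass[sel]
        enough = sel.sum() >= min_points and (m.max() - m.min()) >= min_span
        fits[int(z)] = tuple(np.polyfit(m, decoy_score[sel], 1)) if enough else pooled
    return fits
```

Falling back to a pooled line trades charge-specific accuracy for stability; on large datasets with well-populated charge states the per-charge fits are used throughout.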