Peptides are commonly identified by searching tandem mass spectrometric data against a protein sequence collection. The protein sequences are digested in silico according to the specificity of the enzyme used in the experiment, and the fragment ion masses of each resulting peptide are calculated. With this approach, the information in the tandem mass spectra cannot be fully utilized, because we can neither predict the probability of observing a given peptide nor accurately calculate fragment ion intensities from the sequence. In contrast, searching a spectrum library built from experimental data uses this information more effectively: the search space is minimized because only mass spectra that have previously been observed are searched, and the fragment ion intensities in the query spectrum and the matching library spectrum are similar. Here, we present a method for constructing and using annotated peptide spectrum libraries (ASL).
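The library search idea above can be illustrated with a minimal Python sketch. It assumes a simple scheme that the abstract does not specify: library spectra are sorted by precursor m/z so that only entries inside a precursor tolerance window are scored, and candidates are ranked by the cosine (normalized dot product) of unit-width binned intensity vectors. The class and function names (`Spectrum`, `SpectrumLibrary`, `binned_vector`, `cosine`), the 1.0 m/z bin width, and the 1.5 m/z tolerance are all hypothetical choices for illustration, not the X! Hunter implementation.

```python
import bisect
import math
from dataclasses import dataclass


@dataclass
class Spectrum:
    precursor_mz: float
    peaks: list  # list of (m/z, intensity) pairs
    peptide: str = ""  # empty for query spectra


def binned_vector(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (sparse dict: bin -> intensity)."""
    vec = {}
    for mz, intensity in peaks:
        b = int(mz / bin_width)
        vec[b] = vec.get(b, 0.0) + intensity
    return vec


def cosine(a, b):
    """Normalized dot product of two sparse intensity vectors (0.0 if either is empty)."""
    num = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return num / (na * nb) if na > 0 and nb > 0 else 0.0


class SpectrumLibrary:
    """Hypothetical in-memory library indexed by precursor m/z."""

    def __init__(self, entries):
        # Sort once so each query only touches entries inside its precursor window.
        self.entries = sorted(entries, key=lambda s: s.precursor_mz)
        self.masses = [s.precursor_mz for s in self.entries]
        # Pre-bin library spectra so search time is spent only on scoring.
        self.vectors = [binned_vector(s.peaks) for s in self.entries]

    def search(self, query, tol=1.5):
        """Return (best score, best peptide) within +/- tol of the query precursor m/z."""
        lo = bisect.bisect_left(self.masses, query.precursor_mz - tol)
        hi = bisect.bisect_right(self.masses, query.precursor_mz + tol)
        qv = binned_vector(query.peaks)
        best_score, best_peptide = 0.0, None
        for i in range(lo, hi):
            score = cosine(qv, self.vectors[i])
            if score > best_score:
                best_score, best_peptide = score, self.entries[i].peptide
        return best_score, best_peptide
```

For example, a query whose peaks match a library entry with a nearby precursor m/z scores close to 1.0 against that entry, while entries outside the tolerance window are never scored at all. That pruning, rather than any particular scoring function, is what keeps the search space small.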
The ASL were created from a set of consensus spectra associated with peptide sequences, derived from approximately 13,000,000 confidently assigned experimental tandem mass spectra in the Global Proteome Machine Database, using a four-stage curation pipeline to improve reliability. The current ASL collection contains data for six model eukaryotic species: human, mouse, dog, cow, rat, and budding yeast. The average number of ASL spectra per gene ranged from 3.8 (cow) to 6.9 (human). The peptide sequences in these libraries are mapped by sequence alignment to the corresponding ENSEMBL, SWISS-PROT, IPI, or SGD accession numbers. A high-speed search engine, X! Hunter, was built to search these libraries; it can identify peptides from sets of experimental mass spectra at a rate of 20,000 spectra per second.
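The abstract does not describe how the consensus spectra are computed, so the following Python sketch shows only one generic way to merge replicate spectra of the same peptide into a single library entry; it is an assumption for illustration, not the four-stage GPMDB pipeline. Each replicate is scaled to its own base peak, peaks are grouped into fixed-width m/z bins, bins observed in fewer than a minimum fraction of replicates are discarded as likely noise, and the surviving bins are averaged and rescaled so the base peak is 100. The function name `consensus_spectrum` and the `bin_width` and `min_fraction` parameters are hypothetical.

```python
from collections import defaultdict


def consensus_spectrum(replicates, bin_width=1.0, min_fraction=0.5):
    """Merge replicate spectra of one peptide into a consensus peak list.

    replicates: list of peak lists, each a list of (m/z, intensity) pairs.
    Returns a sorted list of (bin lower-edge m/z, relative intensity), base peak = 100.
    """
    n = len(replicates)
    if n == 0:
        return []
    totals = defaultdict(float)  # bin -> summed base-peak-normalized intensity
    counts = defaultdict(int)    # bin -> number of replicates with a peak in that bin
    for peaks in replicates:
        base = max((i for _, i in peaks), default=0.0)
        if base <= 0:
            continue  # an empty replicate still counts toward n, contributing zeros
        seen = set()
        for mz, intensity in peaks:
            b = int(mz / bin_width)
            # Scale by the replicate's own base peak so bright scans do not dominate.
            totals[b] += intensity / base
            if b not in seen:
                counts[b] += 1
                seen.add(b)
    # Keep bins seen in enough replicates; average over all n (missing counts as 0).
    kept = {b: totals[b] / n for b in totals if counts[b] / n >= min_fraction}
    if not kept:
        return []
    top = max(kept.values())
    return sorted((b * bin_width, 100.0 * v / top) for b, v in kept.items())
```

Requiring a peak to recur across replicates is a simple way to suppress sporadic noise peaks while keeping the reproducible fragment intensity pattern that makes library matching effective.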
The speed and sensitivity of the search engine are compared with those of standard techniques, and the application of ASL to high-speed screening of experimental data and to instrument control is discussed.