Several protein identification methods based on mass spectra are used in proteomic analysis, and almost all of them rely on protein sequence databases by the nature of their algorithms. The basic algorithm is to compare the measured molecular weight (MW) of a peptide, either one obtained by digesting a protein or a fragment obtained by dissociating a peptide, with the weight calculated from sequences stored in the database, and thereby to identify which peptide or protein was measured.
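The mass comparison described above can be sketched as follows. This is a minimal illustration, not the algorithm of any particular search engine: the function names, the tolerance, the toy database, and the simplified trypsin-like cleavage rule (cut after K or R) are assumptions made for the example. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def digest(protein):
    """Simplified digestion (assumption): cleave after every K or R."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def match_mass(observed, database, tol=0.01):
    """Return (protein name, peptide) pairs whose calculated mass
    lies within tol Da of the observed mass."""
    hits = []
    for name, protein in database.items():
        for pep in digest(protein):
            if abs(peptide_mass(pep) - observed) <= tol:
                hits.append((name, pep))
    return hits


# Hypothetical two-entry database, for illustration only.
toy_db = {"P1": "GASKPVTR", "P2": "ACDEK"}
print(match_mass(361.196, toy_db))
```

A measured mass with no counterpart in the database returns an empty list, which is the limitation the next paragraph discusses.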
For instance, the UniProt database (UniProtKB/Swiss-Prot Release 9.0) contains 14,987 human protein entries, fewer than even the number of human genes. In short, the proteins that database-oriented identification methods can identify are limited by the variety of known protein sequences, and many mass spectra cannot be assigned to any peptide sequence.
We therefore propose applying de novo peptide sequence databases to protein identification. Here, de novo peptide sequences are constructed by computation rather than by experiment. Because a peptide sequence is essentially a combination of the 20 kinds of amino acids, a sequence of length n has 20^n possible variants. Naturally, the full set of estimated sequences and their MWs is far too large to store in practical disk capacity as it is, but we developed a data storage scheme specific to peptide sequence fragments using a table-driven approach.
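To give a sense of the scale involved, the short sketch below counts the 20^n sequences of length n and, for comparison, the number of distinct amino acid compositions of that length. The comparison is only an illustration of why a table can be far smaller than an exhaustive list: the mass of an unmodified peptide depends on its composition, not on residue order. It is not a description of the storage scheme proposed here, and the function names are hypothetical.

```python
from math import comb


def n_sequences(n):
    """Number of ordered sequences of length n over 20 amino acids."""
    return 20 ** n


def n_compositions(n):
    """Number of unordered compositions (multisets) of n residues
    drawn from 20 amino acids: C(n + 19, 19)."""
    return comb(n + 19, 19)


for n in (2, 5, 10):
    print(n, n_sequences(n), n_compositions(n))
```

Since leucine and isoleucine have identical masses, the number of distinct MW values is smaller still than the number of compositions.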
The de novo sequencing approach performs localized sequence identification, because it reads the sequence not from a sequence database but only from the intervals between combinations of peaks. It addresses the limitation discussed here, but it has difficulty identifying a sequence when the measured peaks are not clearly distinguishable. The database presented here, on the other hand, is intended for global sequence identification, the approach that almost all protein identification methods apply. In this presentation, we show the contents of the database and its application with several examples.
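The localized reading by peak intervals mentioned above can be illustrated with a minimal sketch: each gap between neighbouring peaks of a fragment ladder is compared with the residue masses. The function name, the tolerance, and the idealized peak list are assumptions for this example; real spectra contain noise and missing peaks, which is exactly where this kind of reading breaks down.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peaks, tol=0.01):
    """Assign a residue to each interval between consecutive peaks.

    Ambiguous intervals list all candidates joined by '/', and
    intervals that match no single residue are reported as '?'.
    """
    peaks = sorted(peaks)
    residues = []
    for low, high in zip(peaks, peaks[1:]):
        delta = high - low
        matches = [aa for aa, m in MONO.items() if abs(m - delta) <= tol]
        residues.append("/".join(matches) if matches else "?")
    return residues


# Idealized ladder (hypothetical values): gaps correspond to G, A, S.
print(read_ladder([100.0, 157.02146, 228.05857, 315.0906]))
```

An interval that spans a missing peak matches no single residue and is returned as "?", so the read stops being contiguous at that point.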