Comparison of the rhesus monkey de novo sequences with sequences in the human protein database enabled validation of the de novo capabilities of the present method. This was accomplished by applying the de novo method described here to tandem mass spectra from Homo sapiens samples and objectively assessing the accuracy of the de novo sequencing with a conventional database search (MASCOT) of the same spectra. Figure shows the preliminary separation of proteins from the nucleus accumbens of Macaca mulatta by two-dimensional gel electrophoresis. The protein spots selected for de novo sequencing analysis ranged from 11 to 70 kDa and from pI 4.5 to 6.5, and were distributed uniformly over the entire 2-D gel with respect to Mr and pI. After the gel plugs were destained, the labeled protein spots were subjected to in-gel trypsin digestion. The peptide fragments extracted from the gel plugs were then analyzed by tandem mass spectrometry on an ABI 4700 Proteomics Analyzer (MALDI-TOF-TOF).
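The text does not state which accuracy measure was used when comparing de novo calls with MASCOT identifications. As one hypothetical way to make such a comparison objective, the sketch below computes the fraction of positions at which a de novo sequence agrees with the database-assigned sequence, treating isoleucine and leucine as identical because they are isobaric. The function names are illustrative, and the comparison assumes both sequences cover the same peptide without gaps.

```python
def normalize(seq: str) -> str:
    """Map isoleucine to leucine: the two residues have identical masses and
    cannot be told apart from precursor or fragment masses alone."""
    return seq.upper().replace("I", "L")


def residue_accuracy(de_novo: str, reference: str) -> float:
    """Hypothetical metric: fraction of aligned positions at which a de novo
    call agrees with the database-assigned (reference) sequence.

    Assumes an ungapped comparison of two sequences of equal length.
    """
    if len(de_novo) != len(reference) or not reference:
        raise ValueError("expected two non-empty sequences of equal length")
    a, b = normalize(de_novo), normalize(reference)
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


if __name__ == "__main__":
    # Illustration with the PEAKS and SPIDER sequences quoted in this section:
    # once I/L is collapsed, only the first two residues (RS vs NE) disagree.
    print(residue_accuracy("RSALQAAHDAVAQEGQCR", "NEAIQAAHDAVAQEGQCR"))
```

A position-by-position score like this is deliberately strict; it does not credit mass-equivalent segment swaps, which is exactly the kind of error discussed later for the SPIDER searches.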
Tandem mass spectra were then submitted for database searching (GPS Explorer: MASCOT) against the limited Macaca mulatta database for protein characterization, with searches run both with and without all known post-translational modifications. Most spectra yielded no positive characterization, and these spectra were then subjected to PEAKS de novo analysis. According to the manufacturer, "the algorithm first computes a y-ion matching score and a b-ion matching score at each mass value according to the peaks around it. If there are no peaks around a mass value, a penalty value is assigned. The algorithm then efficiently computes many amino acid sequences that maximize the total scores at the mass values of b-ions and y-ions. These candidate sequences are further evaluated by a more accurate scoring function, which also considers other ion types such as immonium ions and internal-cleavage ions (Figure ). The problem of ion absence is addressed because the PEAKS model assigns a score (or penalty) for each mass value. The software also computes a 'positional confidence' for each amino acid in the final result by examining the consensus of the top-scoring peptides" [4].
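PEAKS' scoring function is proprietary and is only paraphrased in the quotation above. The toy model below illustrates the idea it describes: each predicted b- or y-ion mass earns a score from nearby peaks, and a penalty applies when no peak lies near that mass, so a missing ion lowers a candidate's score without disqualifying it. Residue masses are standard monoisotopic values; the tolerance, penalty, synthetic spectrum, and function names are assumptions for illustration, not PEAKS parameters.

```python
PROTON = 1.007276
WATER = 18.010565

# Monoisotopic residue masses (Da) for the residues used in the example.
RESIDUE = {"G": 57.021464, "A": 71.037114, "S": 87.032028,
           "P": 97.052764, "V": 99.068414}


def fragment_ladders(seq):
    """Singly charged b1..b(n-1) and y1..y(n-1) m/z values for a peptide."""
    masses = [RESIDUE[aa] for aa in seq]
    b, y, total = [], [], 0.0
    for m in masses[:-1]:
        total += m
        b.append(total + PROTON)
    total = 0.0
    for m in reversed(masses[1:]):
        total += m
        y.append(total + WATER + PROTON)
    return b, y


def matching_score(seq, peaks, tol=0.05, penalty=1.0):
    """Toy score: for every predicted b/y mass, add the intensity of the most
    intense peak within `tol`; subtract `penalty` when no peak is nearby."""
    b, y = fragment_ladders(seq)
    score = 0.0
    for mz in b + y:
        nearby = [inten for p, inten in peaks if abs(p - mz) <= tol]
        score += max(nearby) if nearby else -penalty
    return score


if __name__ == "__main__":
    # Synthetic spectrum built from GASPV with the y2 ion removed to mimic
    # ion absence; GSAPV is an isobaric candidate with two residues swapped.
    b, y = fragment_ladders("GASPV")
    peaks = [(mz, 1.0) for mz in b + y[:1] + y[2:]]
    for candidate in ("GASPV", "GSAPV"):
        print(candidate, matching_score(candidate, peaks))
```

With unit intensities, the true sequence still outscores the swapped candidate despite the missing y2 ion, because the swap also removes the b2 and y3 matches.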
Thirteen targeted protein spots (Figure ) were identified by MALDI-TOF-TOF followed by peptide sequencing with PEAKS Studio 4.0 de novo sequencing software. The generalized schematic of the methodology used in the current study to compile a database for Macaca mulatta is depicted in Figure . Table details the confirmed protein characterizations, listing the precursor mass, the m/z error (ppm), and the PEAKS and SPIDER confidence scores (%) for the PEAKS de novo generated peptide sequences and their corresponding homology searches. This method characterized 13 of the 30 protein spots initially selected for de novo analysis.
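The m/z error in the table is conventionally expressed in parts per million relative to the theoretical value. The text does not give the formula, so the standard definition is assumed here:

$$
\Delta_{\mathrm{ppm}} = \frac{(m/z)_{\mathrm{obs}} - (m/z)_{\mathrm{theo}}}{(m/z)_{\mathrm{theo}}} \times 10^{6}
$$

For a precursor near m/z 2000, a deviation of 0.002 in m/z therefore corresponds to about 1 ppm.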
Table 1. The proteins identified by de novo amino acid sequencing using MALDI-TOF-TOF
The tandem mass spectra were analyzed by PEAKS de novo sequencing software to generate amino acid sequences (Figure ). All tandem mass spectra were deconvoluted to minimize errors in de novo sequencing. Figure shows the fragmentation pattern of a precursor ion with m/z 1967.8951. As documented previously, and as can be seen in the spectrum (Figure ), complementary information is not always available for all b-ions and y-ions, and not all immonium ions are represented in the spectra. Spectral analysis is further complicated by the appearance of some a-ions and of neutral losses of water and ammonia from b-ions and y-ions. These caveats make manual de novo sequencing tedious, if not impossible. As elaborated in the Methods Section, PEAKS de novo sequencing uses the most abundant peptide fragments (b-ions and y-ions), the less abundant peptide fragments (a-ions), the neutral losses of water and ammonia from b-ions and y-ions, and the immonium ions to derive confident and complete peptide sequences de novo from MS/MS spectra [24]. The b-, y-, a-, and immonium ions, as well as the neutral losses of water and ammonia for y-ions, are tabulated in Figure for the amino acid sequence 'RSALQAAHDAVAQEGQCR'. The tandem mass spectrum in Figure is representative of the similar analysis performed on the remaining protein spots from Figure .
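The ion types listed above can be computed directly from a candidate sequence. The sketch below generates theoretical singly charged a-, b-, and y-ion m/z values, their water and ammonia losses, and immonium ions for 'RSALQAAHDAVAQEGQCR' from standard monoisotopic residue masses. It assumes carbamidomethylated cysteine (the modification listed among the SPIDER search criteria) and singly protonated fragments; the function name is hypothetical, and the values are theoretical rather than read from the figure.

```python
PROTON = 1.007276
WATER = 18.010565
AMMONIA = 17.026549
CO = 27.994915

# Monoisotopic residue masses (Da); C assumes carbamidomethylation (+57.021464).
RESIDUE = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "V": 99.068414,
    "L": 113.084064, "I": 113.084064, "N": 114.042927, "D": 115.026943,
    "Q": 128.058578, "E": 129.042593, "H": 137.058912, "R": 156.101111,
    "C": 103.009185 + 57.021464,
}


def fragment_table(seq):
    """Theoretical singly charged fragment m/z values for a peptide.

    Lists are indexed so that entry k-1 holds the a_k, b_k or y_k ion.
    """
    m = [RESIDUE[aa] for aa in seq]
    n = len(m)
    b = [sum(m[:k]) + PROTON for k in range(1, n)]
    y = [sum(m[n - k:]) + WATER + PROTON for k in range(1, n)]
    return {
        "b": b,
        "a": [x - CO for x in b],  # a-ion = b-ion minus CO
        "b-H2O": [x - WATER for x in b],
        "b-NH3": [x - AMMONIA for x in b],
        "y": y,
        "y-H2O": [x - WATER for x in y],
        "y-NH3": [x - AMMONIA for x in y],
        # Immonium ion = residue mass - CO + H+
        "immonium": {aa: RESIDUE[aa] - CO + PROTON for aa in sorted(set(seq))},
    }


if __name__ == "__main__":
    table = fragment_table("RSALQAAHDAVAQEGQCR")
    for k, (a, b, y) in enumerate(zip(table["a"], table["b"], table["y"]), 1):
        print(f"a{k:<2} {a:10.4f}  b{k:<2} {b:10.4f}  y{k:<2} {y:10.4f}")
    for aa, mz in table["immonium"].items():
        print(f"immonium {aa}: {mz:.4f}")
```

A useful sanity check on such a table is that complementary ions sum to the neutral peptide mass plus two protons (b_k + y_(n-k)), which is the "complementary information" that is often incomplete in real spectra.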
Twenty peptide sequences from 13 protein spots were characterized by PEAKS de novo sequence analysis software (Table ). The generated sequences were used to perform homology searches to characterize the proteins. As a standard measure, all de novo generated amino acid sequences were further searched for homologous sequences with the PEAKS homology search engine against the Mammalian database. Of the twenty de novo generated sequences subjected to PEAKS homology search, thirteen yielded positive protein characterization (Table ). All of these peptide sequences exhibited homology to Homo sapiens, with the exception of the sequence from one spot (GST pi enzyme: Macaca mulatta). The inability of the PEAKS homology search to resolve the remaining seven sequences may be attributed to the fact that the software assumes the de novo sequence is 100% correct. Whereas standard BLAST likewise assumes that the de novo sequence is 100% accurate, SPIDER software accounts for possible errors in de novo sequencing. In addition, conventional search engines such as BLAST and FASTA are designed to handle queries longer than 35 amino acids, whereas peptides obtained after trypsin digestion are typically no longer than 10–15 amino acids. SPIDER software was therefore used for homology-based database searches in instances where PEAKS homology searches failed to provide positive protein identification. The underlying de novo errors were characteristically partially correct sequence tags and the replacement of an amino acid segment by another segment of approximately the same mass. The criteria used for the SPIDER-based searches were as follows: non-gapped homology match; mass tolerance of 0.1 Da; NCBInr database; leucine equals isoleucine; lysine equals glutamine; carbamidomethylation and methionine oxidation. This approach yielded positive characterization of the remaining seven peptide sequences.
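The segment-replacement error described above can be illustrated with a simplified, ungapped comparison. The sketch below is not SPIDER's algorithm; it is a hypothetical helper (`replaced_segments`, with `prefix_masses` and `normalize`) that walks the cumulative residue masses of a de novo tag and a database peptide, cuts both sequences wherever those cumulative masses agree within the 0.1 Da tolerance, and reports the segment pairs that differ. Following the search criteria above, leucine is treated as equal to isoleucine and lysine as equal to glutamine; residue masses are standard monoisotopic values, with carbamidomethylated cysteine assumed.

```python
from itertools import accumulate

# Monoisotopic residue masses (Da); C assumes carbamidomethylation.
RESIDUE = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "V": 99.068414,
    "L": 113.084064, "I": 113.084064, "N": 114.042927, "D": 115.026943,
    "Q": 128.058578, "K": 128.094963, "E": 129.042593, "H": 137.058912,
    "R": 156.101111, "C": 160.030649,
}


def normalize(seg):
    """Apply the search equivalences: I = L and K = Q."""
    return seg.replace("I", "L").replace("K", "Q")


def prefix_masses(seq):
    """Cumulative residue masses, starting from 0.0 before the first residue."""
    return [0.0] + list(accumulate(RESIDUE[aa] for aa in seq))


def replaced_segments(tag, peptide, tol=0.1):
    """Return the (tag_segment, peptide_segment) pairs that differ but have
    the same mass within `tol`, or None if the total masses disagree."""
    pa, pb = prefix_masses(tag), prefix_masses(peptide)
    if abs(pa[-1] - pb[-1]) > tol:
        return None
    i = j = last_i = last_j = 0
    replaced = []
    while i < len(tag) or j < len(peptide):
        # Advance whichever sequence has the lighter next prefix mass.
        if j == len(peptide) or (i < len(tag) and pa[i + 1] <= pb[j + 1]):
            i += 1
        else:
            j += 1
        if abs(pa[i] - pb[j]) <= tol:  # shared breakpoint: close a segment
            seg_a, seg_b = tag[last_i:i], peptide[last_j:j]
            if normalize(seg_a) != normalize(seg_b):
                replaced.append((seg_a, seg_b))
            last_i, last_j = i, j
    return replaced


if __name__ == "__main__":
    print(replaced_segments("RSALQAAHDAVAQEGQCR", "NEAIQAAHDAVAQEGQCR"))
    print(replaced_segments("GGK", "NQ"))  # GG and N are isobaric
```

Because breakpoints are found from cumulative masses rather than residue by residue, the helper also handles replacements of unequal length, such as GG for N.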
Of these, five peptide sequences led to the characterization of three new proteins not previously characterized by the PEAKS homology search. The remaining two peptides correspond to previously identified proteins; however, these peptides themselves represent new characterizations.
Table also shows that the peptide sequences generated by the PEAKS de novo sequencing software matched identical sequences when searched for homologous sequences in the database, with the exception of four peptides belonging to three proteins. Thus, the PEAKS de novo sequencing software provided positive protein identification in most instances. An example of the added benefit of coupling PEAKS de novo sequencing with the SPIDER homology search is shown in Figure . The original sequence generated by the PEAKS de novo software was '[RS]A[L]QAAHDAVAQEGQCR', whereas the SPIDER homology search returned the sequence '[NE]A[I]QAAHDAVAQEGQCR' with an unequivocal score of 10 and associated it with the protein ubiquitin carboxyl-terminal esterase L1 (Homo sapiens). At least part of the PEAKS de novo error was due to the I/L ambiguity, since leucine and isoleucine have identical masses.
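The RS/NE replacement in this example can be checked against the precursor mass. The sketch below computes the theoretical monoisotopic [M+H]+ of both candidate sequences and their ppm error relative to the precursor m/z of 1967.8951 given earlier for the spectrum of this peptide. It assumes a singly protonated precursor (as is typical for MALDI), carbamidomethylated cysteine, and standard monoisotopic residue masses; these are assumptions, not values taken from the source's table. Under them, the SPIDER sequence agrees with the observed precursor to within about 2 ppm, whereas the original de novo call deviates by roughly 26 ppm.

```python
PROTON = 1.007276
WATER = 18.010565

# Monoisotopic residue masses (Da); C assumes carbamidomethylation.
RESIDUE = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "V": 99.068414,
    "L": 113.084064, "I": 113.084064, "N": 114.042927, "D": 115.026943,
    "Q": 128.058578, "E": 129.042593, "H": 137.058912, "R": 156.101111,
    "C": 160.030649,
}


def mh_plus(seq):
    """Theoretical monoisotopic m/z of the singly protonated peptide [M+H]+."""
    return sum(RESIDUE[aa] for aa in seq) + WATER + PROTON


def ppm_error(observed, theoretical):
    """Mass error in parts per million relative to the theoretical value."""
    return (observed - theoretical) / theoretical * 1e6


if __name__ == "__main__":
    observed = 1967.8951
    for label, seq in (("PEAKS de novo", "RSALQAAHDAVAQEGQCR"),
                       ("SPIDER match", "NEAIQAAHDAVAQEGQCR")):
        theo = mh_plus(seq)
        print(f"{label:14} {theo:.4f}  {ppm_error(observed, theo):+.1f} ppm")
```

R + S and N + E differ by only about 0.048 Da, which is why the two sequences are hard to separate from fragment ladders alone but can still be distinguished at the precursor level with sufficient mass accuracy.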