Comparison of the rhesus monkey de novo sequences with sequences in the human protein database enabled validation of the de novo capabilities of the present method. This was accomplished by applying the de novo method to tandem mass spectra from Homo sapiens samples and objectively assessing the accuracy of the de novo sequencing by running a conventional database search (MASCOT) on the same spectra. Figure shows the preliminary separation of proteins from the nucleus accumbens of Macaca mulatta by two-dimensional gel electrophoresis. The protein spots selected for de novo sequencing ranged from 11 to 70 kDa in molecular mass (Mr) and from 4.5 to 6.5 in pI, and were uniformly distributed over the 2-D gel with respect to both Mr and pI. The labeled protein spots were subjected to in-gel trypsin digestion after de-staining of the gel plugs. The peptide fragments extracted from the gel plugs were then analyzed by tandem mass spectrometry on the ABI 4700 Proteomics Analyzer (MALDI-TOF-TOF).
Figure 1 Representative 2D gels of Macaca mulatta protein sample stained with SyproRuby™. The polypeptide molecular mass scale in kDa is depicted on the y-axis while the x-axis shows the pI range. The proteins were resolved in a linear pH 4–7 gradient.
Tandem mass spectra were then submitted for database searching (GPS Explorer: MASCOT) against the limited Macaca mulatta database for protein characterization, with searches run both with and without all known post-translational modifications. The majority of spectral analyses yielded no positive characterization, at which point the spectra were subjected to PEAKS de novo analysis. According to the manufacturer, "the algorithm first computes a y-ion matching score and a b-ion matching score at each mass value according to the peaks around it. If there are no peaks around a mass value, a penalty value is assigned. The algorithm then efficiently computes many amino acid sequences that maximize the total scores at the mass values of b-ions and y-ions. These candidate sequences are further evaluated by a more accurate scoring function, which also considers other ion types such as immonium ions and internal-cleavage ions (Figure ). The problem of ion absence is addressed because the PEAKS model assigns a score (or penalty) for each mass value. The software also computes a 'positional confidence' for each amino acid in the final result by examining the consensus of the top-scoring peptides" [4
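The per-mass scoring idea in the quoted description can be illustrated with a small sketch. This is not the PEAKS implementation: the tolerance window, the penalty value, the log-intensity score, and the function names are all assumptions chosen for illustration. Each candidate ion mass receives a score from the peaks found near it, or a fixed penalty when no peak is nearby, and a candidate sequence is ranked by the sum over its predicted ion masses.

```python
import math

def mass_score(mass, peaks, tol=0.1, penalty=-1.0):
    """Score one candidate ion mass against an observed peak list.

    peaks   : list of (m/z, intensity) tuples
    tol     : half-width of the matching window in Da (assumed value)
    penalty : score assigned when no peak lies in the window (assumed value)
    """
    nearby = [intensity for mz, intensity in peaks if abs(mz - mass) <= tol]
    if not nearby:
        # Ion absence is penalized rather than ignored.
        return penalty
    # Reward the strongest nearby peak on a log scale (illustrative choice).
    return math.log1p(max(nearby))

def total_score(ion_masses, peaks, tol=0.1, penalty=-1.0):
    """Sum the per-mass scores over all predicted ion masses of a candidate."""
    return sum(mass_score(m, peaks, tol, penalty) for m in ion_masses)
```

A candidate sequence whose predicted b- and y-ion masses coincide with intense peaks accumulates a high total, while each missing ion lowers the total by the penalty.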
Figure 2 Representative de novo analysis of a MALDI-TOF-TOF spectrum. (A) The x- and y-axes show the mass to charge (m/z) ratio and the % abundance of the precursor ion fragments, respectively. The MS/MS spectrum was analyzed by PEAKS de novo sequencing software.
Thirteen targeted protein spots (Figure ) were identified by MALDI-TOF-TOF followed by peptide sequencing using PEAKS Studio 4.0 de novo sequencing software. The generalized schematic of the methodology used in the current study to compile a database for Macaca mulatta is depicted in Figure . Detailed information on the confirmed protein characterizations is given in Table , including the precursor mass, the m/z error (ppm), and the PEAKS and SPIDER confidence scores (%) for the PEAKS de novo generated peptide sequences and their corresponding homology searches. This method characterized 13 of the 30 protein spots initially selected for de novo analysis.
Schematic of the methodology for compilation of a protein database for Macaca mulatta from de novo analysis of MALDI-TOF-TOF spectra.
The proteins identified by de novo amino acid sequencing using MALDI-TOF-TOF
The tandem mass spectra were analyzed by PEAKS de novo sequencing software to generate amino acid sequences (Figure ). All tandem mass spectra were deconvoluted to minimize the error in de novo sequencing. Figure shows the fragmentation pattern of a precursor ion with m/z of 1967.8951. As has been documented previously and can be noted in the spectrum (Figure ), complementary information is not always available for all b-ions, and not all the immonium ions are represented in the spectra. Spectral analysis is further complicated by the appearance of some a-ions and of neutral losses of water and ammonia for b-ions. These complications make manual de novo sequencing tedious, if not impossible. As elaborated in the Methods Section, PEAKS de novo sequencing uses the most abundant peptide fragments ('b-ions'), the less abundant peptide fragments ('a-ions'), the neutral losses of water and ammonia for b-ions, and the immonium ions to develop confident and complete peptide sequences de novo from MS/MS spectra [24]. The b- and immonium ions, as well as the neutral losses of water and ammonia for y-ions, are tabulated in Figure for the amino acid sequence 'RSALQAAHDAVAQEGQCR'. The tandem mass spectrum in Figure is representative of the similar analysis performed on the remaining protein spots from Figure .
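The kind of ion table described above can be reproduced approximately from first principles. The following sketch assumes standard monoisotopic residue masses, singly charged fragments, and carbamidomethylated cysteine; the function names are illustrative and are not taken from PEAKS. It lists theoretical b- and y-ions, their neutral losses of water and ammonia, and immonium ion masses for a peptide such as 'RSALQAAHDAVAQEGQCR'.

```python
# Standard monoisotopic residue masses in Da.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
AMMONIA = 17.026549
PROTON = 1.007276
CO = 27.994915
CAM = 57.021464  # carbamidomethylation of cysteine (assumed fixed modification)

def residue_masses(seq, cam_cys=True):
    """Residue masses along the peptide, with optional Cys carbamidomethylation."""
    return [MONO[aa] + (CAM if aa == "C" and cam_cys else 0.0) for aa in seq]

def precursor_mh(seq):
    """Singly protonated monoisotopic precursor mass, [M+H]+."""
    return sum(residue_masses(seq)) + WATER + PROTON

def b_ions(seq):
    """Singly charged b-ions b1..b(n-1): N-terminal prefix sums plus a proton."""
    out, running = [], PROTON
    for m in residue_masses(seq)[:-1]:
        running += m
        out.append(running)
    return out

def y_ions(seq):
    """Singly charged y-ions y1..y(n-1): C-terminal suffix sums plus water and a proton."""
    out, running = [], WATER + PROTON
    for m in reversed(residue_masses(seq)[1:]):
        running += m
        out.append(running)
    return out

def neutral_losses(ions):
    """Return (ion - H2O, ion - NH3) pairs for a list of fragment ions."""
    return [(mz - WATER, mz - AMMONIA) for mz in ions]

def immonium(aa):
    """Immonium ion of an unmodified residue: residue mass - CO + proton."""
    return MONO[aa] - CO + PROTON

if __name__ == "__main__":
    seq = "RSALQAAHDAVAQEGQCR"
    for i, (b, y) in enumerate(zip(b_ions(seq), y_ions(seq)), start=1):
        print(f"b{i:<2} {b:10.4f}   y{i:<2} {y:10.4f}")
```

Because every b-ion has a complementary y-ion, the sum of b(i) and y(n-i) equals the precursor [M+H]+ plus one proton, which is a convenient consistency check when only one of the two series is observed.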
Twenty peptide sequences were characterized by PEAKS de novo sequence analysis software from 13 protein spots (Table ). The generated sequences were used to perform homology searches to characterize proteins. As a standard measure, all de novo generated amino acid sequences were searched for homologous sequences using the PEAKS homology search engine against the Mammalian database. Of the twenty de novo generated sequences subjected to the PEAKS homology search, thirteen yielded positive protein characterization (Table ). All peptide sequences exhibited homology to Homo sapiens, with the exception of the sequence from one spot (GST pi enzyme: Macaca mulatta). The inability of the PEAKS homology search to resolve the remaining seven sequences may be attributed to the fact that the software assumes that the de novo sequence is 100% correct. Whereas standard BLAST likewise assumes 100% accuracy of the de novo sequence, SPIDER software accounts for possible errors in de novo sequencing. It should also be noted that conventional search engines such as BLAST and FASTA are designed to handle queries longer than 35 amino acids, whereas the peptide sequences obtained after trypsin digestion are typically no longer than 10–15 amino acids. SPIDER software was therefore used for homology-based database searches in instances where PEAKS homology searches failed to provide positive protein identification. The de novo sequencing errors in these instances were characteristically due to partially correct sequence tags and to replacement of an amino acid segment by another segment with approximately the same mass. The criteria used for the SPIDER-based searches were as follows: non-gapped homology match; mass tolerance of 0.1 Da; NCBInr database; leucine equals isoleucine; lysine equals glutamine; carbamidomethylation and methionine in oxidized form. This approach yielded positive characterization of the remaining seven peptide sequences. Of these, five peptide sequences resulted in the characterization of three new proteins not previously characterized by the PEAKS homology search. The remaining two peptides correspond to previously identified proteins; however, the peptides themselves represent new characterizations.
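Two of the search criteria listed above, the non-gapped match and the residue equivalences, can be illustrated with a short sketch. Leucine and isoleucine are isobaric, and lysine and glutamine differ by only about 0.036 Da, which is why they are treated as equal. The code below is not the SPIDER algorithm, which also models de novo sequencing errors; it is a simplified ungapped comparison, and the function names and the example database string are hypothetical.

```python
# Collapse residues that the search treats as equal: I -> L and K -> Q.
EQUIV = str.maketrans({"I": "L", "K": "Q"})

def normalize(seq):
    """Map isoleucine to leucine and lysine to glutamine."""
    return seq.translate(EQUIV)

def best_ungapped_match(query, target):
    """Slide the query along the target without gaps.

    Returns (identities, offset) for the best-scoring alignment,
    or (0, -1) if the query is longer than the target or nothing matches.
    """
    q, t = normalize(query), normalize(target)
    best = (0, -1)
    for offset in range(len(t) - len(q) + 1):
        window = t[offset:offset + len(q)]
        identities = sum(a == b for a, b in zip(q, window))
        if identities > best[0]:
            best = (identities, offset)
    return best

if __name__ == "__main__":
    # Hypothetical database fragment; the flanking residues are made up.
    target = "GGNEAIQAAHDAVAQEGQCRGG"
    print(best_ungapped_match("RSALQAAHDAVAQEGQCR", target))
```

With the equivalences applied, a de novo sequence containing L aligns to a database entry containing I at the same position without being counted as a mismatch.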
Table also shows that the peptide sequences generated by the PEAKS de novo sequencing software returned identical sequences when searched for homologous sequences in the database, with the exception of four peptides belonging to three proteins. Thus, the PEAKS de novo sequencing software was able to provide positive protein identification in most instances. An example of the added benefit of coupling PEAKS de novo sequencing software with the SPIDER homology search is shown in Figure . The original sequence generated by the PEAKS de novo software was '[RS]A [L]QAAHDAVAQEGQCR', whereas the SPIDER homology search returned the sequence '[NE]A [I]QAAHDAVAQEGQCR' with an unequivocal score of 10, associated with the protein ubiquitin carboxyl-terminal esterase L1 (Homo sapiens). At least part of the error of the PEAKS de novo software was due to the I/L ambiguity.
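The substitution of [RS] by [NE] in this example is an instance of one segment being replaced by another of approximately the same mass. A minimal sketch, assuming standard monoisotopic residue masses, a singly protonated precursor, and carbamidomethylated cysteine (the function names are illustrative and not part of PEAKS or SPIDER), compares the two dipeptide segments and expresses the deviation of each full-length candidate from the reported precursor m/z of 1967.8951 in ppm.

```python
# Standard monoisotopic residue masses in Da (only residues needed here).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "E": 129.04259, "H": 137.05891,
    "R": 156.10111,
}
WATER = 18.010565
PROTON = 1.007276
CAM = 57.021464  # carbamidomethylation of cysteine (assumed)

def segment_mass(segment):
    """Sum of residue masses for a stretch of amino acids."""
    return sum(MONO[aa] + (CAM if aa == "C" else 0.0) for aa in segment)

def precursor_mh(seq):
    """Singly protonated monoisotopic mass, [M+H]+."""
    return segment_mass(seq) + WATER + PROTON

def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

if __name__ == "__main__":
    observed = 1967.8951  # precursor m/z reported in the text
    print("RS - NE (Da):", round(segment_mass("RS") - segment_mass("NE"), 4))
    for seq in ("RSALQAAHDAVAQEGQCR", "NEAIQAAHDAVAQEGQCR"):
        theo = precursor_mh(seq)
        print(seq, round(theo, 4), round(ppm_error(observed, theo), 1), "ppm")
```

Under these assumptions the two segments differ by less than the 0.1 Da tolerance used for the SPIDER searches, and the [NE] variant lies closer to the reported precursor m/z than the [RS] variant.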