J Biomol Tech. 2007 February; 18(1): 50–51.
PMCID: PMC2291934

P145-S Interactive and Automated Top-Down Analysis of Sequencing Data of Intact Proteins


In a Top-Down workflow (ECD, ETD, ISD), the intact protein molecule is fragmented directly rather than being digested first, as it is in the Bottom-Up approach. This allows the detection of signal peptides, modifications, sequence variations, and mutations. Top-Down analyses can be performed on FTMS/ECD/ESI or MALDI-ISD spectra.

The analysis of Top-Down datasets requires flexible algorithms that create sequence tags from peak mass differences and offer several ways to process those tags in a subsequent step. Tag generation can take into account various modifications and a user-defined mass range, can be restricted to selected peaks, and can exclude peaks already matched by accepted tags. The resulting sequence tags are scored and can be used in MS-BLAST homology searches or Mascot sequence tag searches to identify unknown proteins. An alternative workflow that was developed starts from a known protein sequence and identifies protein modifications or mutations. With these algorithms, the signal peptide structure, modifications, or a mutation can be suggested automatically from the mass offset between the experimental tag and the theoretical value derived from the protein sequence database entry.
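Creating sequence tags from peak mass differences can be illustrated with a minimal Python sketch. It is not the BioTools implementation: the function names `match_residues` and `build_tags`, the tolerance of 0.02 Da, and the minimum tag length are assumptions chosen for illustration, and modifications, scoring, and peak inclusion or exclusion lists are left out. The sketch takes a list of singly charged fragment masses from one ion series and chains together peaks whose spacing equals a monoisotopic amino acid residue mass.

```python
from typing import List

# Monoisotopic residue masses in Da. Leucine and isoleucine are isobaric,
# so only "L" is listed and stands for either residue.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
MAX_RESIDUE = max(RESIDUE_MASS.values())


def match_residues(delta: float, tol: float) -> List[str]:
    """Return every residue whose mass equals delta within tol (in Da)."""
    return [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]


def build_tags(peaks: List[float], tol: float = 0.02, min_len: int = 3) -> List[str]:
    """Chain peaks separated by residue masses into tags (hypothetical helper).

    Tags are read in order of increasing mass, i.e. in the direction of the
    ion series the peaks belong to. Longest tags are returned first.
    """
    peaks = sorted(peaks)
    tags: List[str] = []

    def extend(i: int, tag: str) -> None:
        extended = False
        for j in range(i + 1, len(peaks)):
            delta = peaks[j] - peaks[i]
            if delta > MAX_RESIDUE + tol:
                break  # peaks are sorted, so later gaps are only larger
            for aa in match_residues(delta, tol):
                extended = True
                extend(j, tag + aa)
        # Keep a tag only when it cannot be extended further and is long enough.
        if not extended and len(tag) >= min_len:
            tags.append(tag)

    for start in range(len(peaks)):
        extend(start, "")
    return sorted(tags, key=len, reverse=True)
```

Because every peak is tried as a starting point, shorter tags that are suffixes of a longer tag are also reported. Pairs of residues that are isobaric with a single residue (for example G+A and Q) produce alternative tags, which is one reason the tags need to be scored before they are used in a database search.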

We used the algorithms included in the BioTools 3.1 software package to analyze reISD-MALDI-TOF and ECD-FTICR spectra of undigested proteins in the molecular weight range 6–70 kDa and could automatically detect, for example, the length of the pre-pro peptide of bovine serum albumin (67 kDa). Other examples involved proteins with N/D and Q/E ambiguities in their database records, such as carbonic anhydrase. An interesting field of application is the localization of protein phosphorylation sites on the undigested protein, because the suppression of phosphorylated peptides is not observed in top-down analysis.
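How the length of a pre-pro peptide might be inferred from a mass offset can be sketched as follows. This is an illustration under stated assumptions, not the BioTools algorithm: the function `suggest_truncation`, the 0.05 Da tolerance, and the example sequence are hypothetical. The idea is that if an N-terminal fragment series of the mature protein is shifted relative to the series predicted from the full database entry, the shift should equal the summed residue masses of the N-terminal stretch that was removed.

```python
from typing import List

# Monoisotopic residue masses in Da ("I" and "L" are isobaric).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def suggest_truncation(db_sequence: str, mass_offset: float,
                       tol: float = 0.05) -> List[int]:
    """Return N-terminal prefix lengths whose summed residue mass matches
    mass_offset within tol (hypothetical helper, masses in Da)."""
    hits: List[int] = []
    total = 0.0
    for n, aa in enumerate(db_sequence, start=1):
        total += RESIDUE_MASS[aa]
        if abs(total - mass_offset) <= tol:
            hits.append(n)
        elif total > mass_offset + tol:
            break  # the running sum only grows, so no later prefix can match
    return hits


if __name__ == "__main__":
    # Made-up precursor sequence, used only to exercise the function.
    toy_precursor = "MKAVTFLSLLGSEVAHRFK"
    # Simulate an offset equal to the mass of the first five residues.
    offset = sum(RESIDUE_MASS[aa] for aa in toy_precursor[:5])
    print(suggest_truncation(toy_precursor, offset))
```

A real analysis would also have to consider modifications of the new N-terminus and sequence errors in the database entry, which this sketch ignores.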

