Results 1-4 (4)

1.  Prophossi: automating expert validation of phosphopeptide–spectrum matches from tandem mass spectrometry 
Bioinformatics  2010;26(17):2153-2159.
Motivation: Complex patterns of protein phosphorylation mediate many cellular processes. Tandem mass spectrometry (MS/MS) is a powerful tool for identifying these post-translational modifications. In high-throughput experiments, mass spectrometry database search engines, such as MASCOT, provide a ranked list of peptide identifications based on hundreds of thousands of MS/MS spectra obtained in a mass spectrometry experiment. These search results are not in themselves sufficient for confident assignment of phosphorylation sites, because identifying the characteristic mass differences requires time-consuming manual assessment of the spectra by an experienced analyst. The time required for manual assessment has previously made confident high-throughput assignment of phosphorylation sites challenging.
Results: We have developed a knowledge base of criteria, which replicate expert assessment, allowing more than half of cases to be automatically validated and site assignments verified with a high degree of confidence. This was assessed by comparing automated spectral interpretation with careful manual examination of the assignments for 501 peptides above the 1% false discovery rate (FDR) threshold corresponding to 259 putative phosphorylation sites in 74 proteins of the Trypanosoma brucei proteome. Despite this stringent approach, we are able to validate 80 of the 91 phosphorylation sites (88%) positively identified by manual examination of the spectra used for the MASCOT searches with an FDR < 15%.
Conclusions: High-throughput computational analysis can provide a viable second stage validation of primary mass spectrometry database search results. Such validation gives rapid access to a systems level overview of protein phosphorylation in the experiment under investigation.
Availability: A GPL-licensed software implementation in Perl for analysis and spectrum annotation is available in the supplementary material and a web server can be accessed online at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2922888  PMID: 20651112
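The "characteristic mass differences" mentioned in the abstract above can be illustrated with a small sketch. It checks a spectrum for a peak at the position expected after neutral loss of phosphoric acid from the precursor ion. This is not Prophossi's rule set, which is not described in the abstract; the function names, the peak-list layout (a list of `(m/z, intensity)` tuples) and the 0.5 m/z tolerance are all assumptions made for illustration. Only the monoisotopic masses are standard values.

```python
# Illustrative sketch only: NOT the Prophossi criteria.
# Monoisotopic masses in daltons (standard values).
H3PO4 = 97.976896  # neutral loss of phosphoric acid
HPO3 = 79.966331   # mass added to a residue by one phosphate group


def neutral_loss_mz(precursor_mz, charge, loss=H3PO4):
    """Expected m/z of the precursor after losing a neutral fragment.

    The charge is unchanged by a neutral loss, so the m/z shifts by
    loss / charge.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return precursor_mz - loss / charge


def has_peak(peaks, target_mz, tol=0.5):
    """True if any (m/z, intensity) peak lies within tol of target_mz.

    The tolerance is a hypothetical default, not a recommended value.
    """
    return any(abs(mz - target_mz) <= tol for mz, _intensity in peaks)


def shows_phospho_loss(precursor_mz, charge, peaks, tol=0.5):
    """Check for the H3PO4 neutral-loss peak of a precursor ion."""
    return has_peak(peaks, neutral_loss_mz(precursor_mz, charge), tol)
```

For a doubly charged precursor at m/z 700.0, for instance, `neutral_loss_mz(700.0, 2)` places the loss peak about 48.99 m/z units lower. A real validator would combine many such checks with fragment-ion evidence for the specific site.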
2.  The Phosphoproteome of Bloodstream Form Trypanosoma brucei, Causative Agent of African Sleeping Sickness 
The protozoan parasite Trypanosoma brucei is the causative agent of human African sleeping sickness and related animal diseases, and it has over 170 predicted protein kinases. Protein phosphorylation is a key regulatory mechanism for cellular function that, thus far, has been studied in T. brucei principally through putative kinase mRNA knockdown and observation of the resulting phenotype. However, despite the relatively large kinome of this organism and the demonstrated essentiality of several T. brucei kinases, very few specific phosphorylation sites have been determined in this organism. Using a gel-free, phosphopeptide enrichment-based proteomics approach, we performed the first large-scale phosphorylation site analyses for T. brucei. Serine, threonine, and tyrosine phosphorylation sites were determined for a cytosolic protein fraction of the bloodstream form of the parasite, resulting in the identification of 491 phosphoproteins based on the identification of 852 unique phosphopeptides and 1204 phosphorylation sites. The phosphoproteins detected in this study are predicted from their genome annotations to participate in a wide variety of biological processes, including signal transduction, processing of DNA and RNA, protein synthesis, and degradation, and to a minor extent in metabolic pathways. The analysis of phosphopeptides and phosphorylation sites was facilitated by in-house developed software, and this automated approach was validated by manual annotation of spectra of the kinase subset of proteins. Analysis of the cytosolic bloodstream form T. brucei kinome revealed the presence of 44 phosphorylated protein kinases in our data set that could be classified into the major eukaryotic protein kinase groups by applying a multilevel hidden Markov model library of the kinase catalytic domain.
Identification of the kinase phosphorylation sites showed conserved phosphorylation sequence motifs in several kinase activation segments, supporting the view that phosphorylation-based signaling is a general and fundamental regulatory process that extends to this highly divergent lower eukaryote.
PMCID: PMC2716717  PMID: 19346560
3.  The Jpred 3 secondary structure prediction server 
Nucleic Acids Research  2008;36(Web Server issue):W197-W201.
Jpred is a secondary structure prediction server powered by the Jnet algorithm. Jpred performs over 1000 predictions per week for users in more than 50 countries. The recently updated Jnet algorithm provides a three-state (α-helix, β-strand and coil) prediction of secondary structure at an accuracy of 81.5%. Given either a single protein sequence or a multiple sequence alignment, Jpred derives alignment profiles from which predictions of secondary structure and solvent accessibility are made. The predictions are presented as coloured HTML, plain text, PostScript, PDF and via the Jalview alignment editor to allow flexibility in viewing and applying the data. The new Jpred 3 server includes significant usability improvements that include clearer feedback of the progress or failure of submitted requests. Functional improvements include batch submission of sequences, summary results via email and updates to the search databases. A new software pipeline will enable Jnet/Jpred to continue to be updated in sync with major updates to SCOP and UniProt and so ensures that Jpred 3 will maintain high-accuracy predictions.
PMCID: PMC2447793  PMID: 18463136
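The three-state accuracy quoted for Jnet is a per-residue measure: the fraction of residues whose predicted state matches the observed one. The sketch below shows that calculation under stated assumptions. The one-letter state codes (`H` for helix, `E` for strand, `C` for coil) and the function name are choices made here for illustration, and the abstract does not say how the reference states were assigned or averaged over proteins.

```python
# Illustrative sketch of a per-residue three-state accuracy.
# State codes H/E/C are an assumption, not taken from the abstract.
VALID_STATES = set("HEC")


def three_state_accuracy(predicted, observed):
    """Fraction of positions where predicted and observed states agree."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must be the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    if not set(predicted) <= VALID_STATES or not set(observed) <= VALID_STATES:
        raise ValueError("states must be H, E or C")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)
```

Calling `three_state_accuracy("HHHECC", "HHEECC")` compares six residues, five of which agree.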
4.  OXBench: A benchmark for evaluation of protein multiple sequence alignment accuracy 
BMC Bioinformatics  2003;4:47.
The alignment of two or more protein sequences provides a powerful guide in the prediction of the protein structure and in identifying key functional residues; however, the utility of any prediction is completely dependent on the accuracy of the alignment. In this paper we describe a suite of reference alignments derived from the comparison of protein three-dimensional structures together with evaluation measures and software that allow automatically generated alignments to be benchmarked. We test the OXBench benchmark suite on alignments generated by the AMPS multiple alignment method, then apply the suite to compare eight different multiple alignment algorithms. The benchmark shows the current state of the art for alignment accuracy and provides a baseline against which new alignment algorithms may be judged.
The simple hierarchical multiple alignment algorithm, AMPS, performed as well as or better than more modern methods such as CLUSTALW once the PAM250 pair-score matrix was replaced by a BLOSUM series matrix. AMPS gave an accuracy in Structurally Conserved Regions (SCRs) of 89.9% over a set of 672 alignments. The T-COFFEE method on a data set of families with <8 sequences gave 91.4% accuracy, significantly better than CLUSTALW (88.9%) and all other methods considered here. The complete suite is available from .
The OXBench suite of reference alignments, evaluation software and results database provides a convenient method to assess progress in sequence alignment techniques. Evaluation measures that were dependent on comparison to a reference alignment were found to give good discrimination between methods. The STAMP Sc Score, which is independent of a reference alignment, also gave good discrimination. Application of OXBench in this paper shows that, with the exception of T-COFFEE, the majority of the improvement in alignment accuracy seen since 1985 stems from improved pair-score matrices rather than algorithmic refinements. The maximum theoretical alignment accuracy obtained by pooling results over all methods was 94.5%, with 52.5% accuracy for alignments in the 0–10 percentage identity range. This suggests that further improvements in accuracy will be possible in the future.
PMCID: PMC280650  PMID: 14552658
protein; multiple sequence alignment; benchmark; structural alignment
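Evaluation measures that depend on a reference alignment generally ask how much of the reference is reproduced by the test alignment. The sketch below shows one simple form of that idea for a pair of sequences: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. It is not the OXBench software or its exact measure; in particular it scores every aligned pair rather than only those in Structurally Conserved Regions, and the function names and the use of `-` as the gap character are assumptions.

```python
# Illustrative sketch only: NOT the OXBench evaluation code.
# Alignments are given as two gapped rows of equal length, "-" = gap.


def aligned_pairs(row_a, row_b):
    """Return the set of (index_in_a, index_in_b) residue pairs that
    share a column, using ungapped residue indices."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must be the same length")
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def pair_accuracy(reference, test):
    """Fraction of reference-aligned residue pairs found in the test
    alignment. Both arguments are (row_a, row_b) tuples."""
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        raise ValueError("reference alignment has no aligned pairs")
    return len(ref_pairs & aligned_pairs(*test)) / len(ref_pairs)
```

With the reference `("ACDE", "AC-E")` and the test alignment `("ACDE", "A-CE")`, two of the three reference pairs are recovered. Restricting `ref_pairs` to a set of structurally conserved positions would bring the sketch closer to the SCR-based accuracy reported above.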
