RNA editing by deamination of specific adenosine bases to inosines during pre-mRNA processing generates edited isoforms of proteins. Recoding RNA editing is more widespread in Drosophila than in vertebrates. Editing levels rise strongly at metamorphosis, and Adar5G1 null mutant flies lack editing events in hundreds of CNS transcripts; mutant flies have reduced viability, severely defective locomotion and age-dependent neurodegeneration. On the other hand, overexpressing an adult dADAR isoform with high enzymatic activity ubiquitously during larval and pupal stages is lethal. Advantage was taken of this to screen for genetic modifiers; Adar overexpression lethality is rescued by reduced dosage of the Rdl (Resistant to dieldrin), gene encoding a subunit of inhibitory GABA receptors. Reduced dosage of the Gad1 gene encoding the GABA synthetase also rescues Adar overexpression lethality. Drosophila Adar5G1 mutant phenotypes are ameliorated by feeding GABA modulators. We demonstrate that neuronal excitability is linked to dADAR expression levels in individual neurons; Adar-overexpressing larval motor neurons show reduced excitability whereas Adar5G1 null mutant or targeted Adar knockdown motor neurons exhibit increased excitability. GABA inhibitory signalling is impaired in human epileptic and autistic conditions, and vertebrate ADARs may have a relevant evolutionarily conserved control over neuronal excitability.
Tissue microarrays (TMAs) allow multiplexed analysis of tissue samples and are frequently used to estimate biomarker protein expression in tumour biopsies. TMA Navigator (www.tmanavigator.org) is an open access web application for analysis of TMA data and related information, accommodating categorical, semi-continuous and continuous expression scores. Non-biological variation, or batch effects, can hinder data analysis and may be mitigated using the ComBat algorithm, which is incorporated with enhancements for automated application to TMA data. Unsupervised grouping of samples (patients) is provided according to Gaussian mixture modelling of marker scores, with cardinality selected by Bayesian information criterion regularization. Kaplan–Meier survival analysis is available, including comparison of groups identified by mixture modelling using the Mantel-Cox log-rank test. TMA Navigator also supports network inference approaches useful for TMA datasets, which often constitute comparatively few markers. Tissue and cell-type specific networks derived from TMA expression data offer insights into the molecular logic underlying pathophenotypes, towards more effective and personalized medicine. Output is interactive, and results may be exported for use with external programs. Private anonymous access is available, and user accounts may be generated for easier data management.
► Identifies key considerations in target selection and optimisation. ► Approaches to assign useful protein features and structure/function relationships. ► Comparison of latest crystallisation propensity predictors on nonredundant data. ► Discusses single point of reference target selection/optimisation resources. ► Guidance on using the SSPF Target Optimisation Utility (TarO).
Selection of protein targets for study is central to structural biology and may be influenced by numerous factors. A key aim is to maximise returns for effort invested by identifying proteins with the balance of biophysical properties that are conducive to success at all stages (e.g. solubility, crystallisation) in the route towards a high resolution structural model. Selected targets can be optimised through construct design (e.g. to minimise protein disorder), switching to a homologous protein, and selection of experimental methodology (e.g. choice of expression system) to prime for efficient progress through the structural proteomics pipeline.
Here we discuss computational techniques in target selection and optimisation, with more detailed focus on tools developed within the Scottish Structural Proteomics Facility (SSPF); namely XANNpred, ParCrys, OB-Score (target selection) and TarO (target optimisation). TarO runs a large number of algorithms, searching for homologues and annotating the pool of possible alternative targets. This pool of putative homologues is presented in a ranked, tabulated format and results are also visualised as an automatically generated and annotated multiple sequence alignment. The target selection algorithms each predict the propensity of a selected protein target to progress through the experimental stages leading to diffracting crystals. This single predictor approach has advantages for target selection, when compared with an approach using two or more predictors that each predict for success at a single experimental stage. The tools described here helped SSPF achieve a high (21%) success rate in progressing cloned targets to diffraction-quality crystals.
MSA, Multiple Sequence Alignment; PTM, Post Translational Modification; SSPF, Scottish Structural Proteomics Facility; MCC, Matthew’s correlation coefficient; AROC, Area Under the Receiver Operator Characteristic curve; Target selection; Crystallisation; Structural genomics; Structural biology; Bioinformatics; Construct design
Staphylococcus aureus is a major human pathogen and strains resistant to existing treatments continue to emerge. Development of novel treatments is therefore important. Antimicrobial peptides represent a source of potential novel antibiotics to combat resistant bacteria such as Methicillin-Resistant Staphylococcus aureus (MRSA). A promising antimicrobial peptide is ranalexin, which has potent activity against Gram-positive bacteria, and particularly S. aureus. Understanding mode of action is a key component of drug discovery and network biology approaches enable a global, integrated view of microbial physiology, including mechanisms of antibiotic killing. We developed a systems-wide functional association network approach to integrate proteome and transcriptome profiles, enabling study of drug resistance and mode of action.
The functional association network was constructed by Bayesian logistic regression, providing a framework for identification of antimicrobial peptide (ranalexin) response modules from S. aureus MRSA-252 transcriptome and proteome profiling. These signatures of ranalexin treatment revealed multiple killing mechanisms, including cell wall activity. Cell wall effects were supported by gene disruption and osmotic fragility experiments. Furthermore, twenty-two novel virulence factors were inferred, while the VraRS two-component system and PhoU-mediated persister formation were implicated in MRSA tolerance to cationic antimicrobial peptides.
This work demonstrates a powerful integrative approach to study drug resistance and mode of action. Our findings are informative to the development of novel therapeutic strategies against Staphylococcus aureus and particularly MRSA.
The Scottish Structural Proteomics Facility was funded to develop a laboratory scale approach to high throughput structure determination. The effort was successful in that over 40 structures were determined. These structures and the methods harnessed to obtain them are reported here. This report reflects on the value of automation but also on the continued requirement for a high degree of scientific and technical expertise. The efficiency of the process poses challenges to the current paradigm of structural analysis and publication. In the 5 year period we published ten peer-reviewed papers reporting structural data arising from the pipeline. Nevertheless, the number of structures solved exceeded our ability to analyse and publish each new finding. By reporting the experimental details and depositing the structures we hope to maximize the impact of the project by allowing others to follow up the relevant biology.
Electronic supplementary material
The online version of this article (doi:10.1007/s10969-010-9090-y) contains supplementary material, which is available to authorized users.
High-throughput; Protein crystallography; Structural proteomics; SSPF
Sar2676, a pantothenate synthetase with a molecular weight of 31 419 Da from methicillin-resistant Staphylococcus aureus, has been expressed, purified and crystallized at 293 K.
Sar2676, a pantothenate synthetase with a molecular weight of 31 419 Da from methicillin-resistant Staphylococcus aureus, has been expressed, purified and crystallized at 293 K. The protein crystallizes in a primitive triclinic lattice, with unit-cell parameters a = 45.3, b = 60.5, c = 117.6 Å, α = 87.2, β = 81.2, γ = 68.4°. A complete data set has been collected to 2.3 Å resolution at the ESRF. Consideration of the likely solvent content suggested the asymmetric unit to contain four molecules. This has been confirmed by molecular-replacement phasing calculations, which give a solution with four monomers using a monomer of pantothenate synthetase from Escherichia coli (PDB code 1iho), which is 41% identical to Sar2676, as a search model.
Sar2676; pantothenate synthetase; methicillin-resistant Staphylococcus aureus
As part of work on S. aureus, the crystallization of Sar2028, a protein that is upregulated in MRSA, is reported.
Sar2028, an aspartate/tyrosine/phenylalanine pyridoxal-5′-phosphate-dependent aminotransferase with a molecular weight of 48 168 Da, was overexpressed in methicillin-resistant Staphylococcus aureus compared with a methicillin-sensitive strain. The protein was expressed in Escherichia coli, purified and crystallized. The protein crystallized in a primitive orthorhombic Laue group with unit-cell parameters a = 83.6, b = 91.3, c = 106.0 Å, α = β = γ = 90°. Analysis of the systematic absences along the three principal axes indicated the space group to be P212121. A complete data set was collected to 2.5 Å resolution.
Sar2028; Staphylococcus aureus; aminotransferases
TarO (http://www.compbio.dundee.ac.uk/taro) offers a single point of reference for key bioinformatics analyses relevant to selecting proteins or domains for study by structural biology techniques. The protein sequence is analysed by 17 algorithms and compared to 8 databases. TarO gathers putative homologues, including orthologues, and then obtains predictions of properties for these sequences including crystallisation propensity, protein disorder and post-translational modifications. Analyses are run on a high-performance computing cluster, the results integrated, stored in a database and accessed through a web-based user interface. Output is in tabulated format and in the form of an annotated multiple sequence alignment (MSA) that may be edited interactively in the program Jalview. TarO also simplifies the gathering of additional annotations via the Distributed Annotation System, both from the MSA in Jalview and through links to Dasty2. Routes to other information gateways are included, for example to relevant pages from UniProt, COG and the Conserved Domains Database. Open access to TarO is available from a guest account with private accounts for academic use available on request. Future development of TarO will include further analysis steps and integration with the Protein Information Management System (PIMS), a sister project in the BBSRC ‘Structural Proteomics of Rational Targets’ initiative
Production of diffracting crystals is a critical step in determining the three-dimensional structure of a protein by X-ray crystallography. Computational techniques to rank proteins by their propensity to yield diffraction-quality crystals can improve efficiency in obtaining structural data by guiding both protein selection and construct design. XANNpred comprises a pair of artificial neural networks that each predict the propensity of a selected protein sequence to produce diffraction-quality crystals by current structural biology techniques. Blind tests show XANNpred has accuracy and Matthews correlation values ranging from 75% to 81% and 0.50 to 0.63 respectively; values of area under the receiver operator characteristic (ROC) curve range from 0.81 to 0.88. On blind test data XANNpred outperforms the other available algorithms XtalPred, PXS, OB-Score, and ParCrys. XANNpred also guides construct design by presenting graphs of predicted propensity for diffraction-quality crystals against residue sequence position. The XANNpred-SG algorithm is likely to be most useful to target selection in structural genomics consortia, while the XANNpred-PDB algorithm is more suited to the general structural biology community. XANNpred predictions that include sliding window graphs are freely available from http://www.compbio.dundee.ac.uk/xannpred Proteins 2011. © 2010 Wiley-Liss, Inc.
computational biology; bioinformatics; crystallization; software; artificial neural network; predictor
Proteogenomics has the potential to advance genome annotation
through high quality peptide identifications derived from mass spectrometry
experiments, which demonstrate a given gene or isoform is expressed
and translated at the protein level. This can advance our understanding
of genome function, discovering novel genes and gene structure that
have not yet been identified or validated. Because of the high-throughput
shotgun nature of most proteomics experiments, it is essential to
carefully control for false positives and prevent any potential misannotation.
A number of statistical procedures to deal with this are in wide use
in proteomics, calculating false discovery rate (FDR) and posterior
error probability (PEP) values for groups and individual peptide spectrum
matches (PSMs). These methods control for multiple testing and exploit
decoy databases to estimate statistical significance. Here, we show
that database choice has a major effect on these confidence estimates
leading to significant differences in the number of PSMs reported.
We note that standard target:decoy approaches using six-frame translations
of nucleotide sequences, such as assembled transcriptome data, apparently
underestimate the confidence assigned to the PSMs. The source of this
error stems from the inflated and unusual nature of the six-frame
database, where for every target sequence there exists five “incorrect”
targets that are unlikely to code for protein. The attendant FDR and
PEP estimates lead to fewer accepted PSMs at fixed thresholds, and
we show that this effect is a product of the database and statistical
modeling and not the search engine. A variety of approaches to limit
database size and remove noncoding target sequences are examined and
discussed in terms of the altered statistical estimates generated
and PSMs reported. These results are of importance to groups carrying
out proteogenomics, aiming to maximize the validation and discovery
of gene structure in sequenced genomes, while still controlling for
proteogenomics; peptide spectrum match; false
discovery rate; posterior error probability; expressed