► Identifies key considerations in target selection and optimisation. ► Approaches to assign useful protein features and structure/function relationships. ► Comparison of latest crystallisation propensity predictors on nonredundant data. ► Discusses single point of reference target selection/optimisation resources. ► Guidance on using the SSPF Target Optimisation Utility (TarO).
Selection of protein targets for study is central to structural biology and may be influenced by numerous factors. A key aim is to maximise returns for effort invested by identifying proteins with the balance of biophysical properties that are conducive to success at all stages (e.g. solubility, crystallisation) in the route towards a high resolution structural model. Selected targets can be optimised through construct design (e.g. to minimise protein disorder), switching to a homologous protein, and selection of experimental methodology (e.g. choice of expression system) to prime for efficient progress through the structural proteomics pipeline.
Here we discuss computational techniques in target selection and optimisation, with more detailed focus on tools developed within the Scottish Structural Proteomics Facility (SSPF); namely XANNpred, ParCrys, OB-Score (target selection) and TarO (target optimisation). TarO runs a large number of algorithms, searching for homologues and annotating the pool of possible alternative targets. This pool of putative homologues is presented in a ranked, tabulated format and results are also visualised as an automatically generated and annotated multiple sequence alignment. The target selection algorithms each predict the propensity of a selected protein target to progress through the experimental stages leading to diffracting crystals. For target selection, this single-predictor approach has advantages over combining two or more predictors that each predict success at only one experimental stage. The tools described here helped SSPF achieve a high (21%) success rate in progressing cloned targets to diffraction-quality crystals.
MSA, Multiple Sequence Alignment; PTM, Post-Translational Modification; SSPF, Scottish Structural Proteomics Facility; MCC, Matthews correlation coefficient; AROC, Area under the Receiver Operating Characteristic curve; Target selection; Crystallisation; Structural genomics; Structural biology; Bioinformatics; Construct design
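MCC and AROC, listed above, are the usual measures for comparing binary predictors such as crystallisation propensity predictors. The sketch below shows how each can be computed from a predictor's output. It is a generic illustration, not code taken from XANNpred, ParCrys or OB-Score, and the function names are our own.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def auroc(scores: list[float], labels: list[bool]) -> float:
    """Area under the ROC curve, computed as the Mann-Whitney statistic.

    This is the probability that a randomly chosen positive example is
    scored higher than a randomly chosen negative one (ties count 1/2).
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `mcc(tp=40, tn=45, fp=5, fn=10)` gives about 0.70. The pairwise loop in `auroc` is quadratic in the number of examples; for large benchmark sets a rank-based (sorting) formulation of the same statistic is preferable.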
Summary: Jalview Version 2 is a system for interactive WYSIWYG editing, analysis and annotation of multiple sequence alignments. Core features include keyboard and mouse-based editing, multiple views and alignment overviews, and linked structure display with Jmol. Jalview 2 is available in two forms: a lightweight Java applet for use in web applications, and a powerful desktop application that employs web services for sequence alignment, secondary structure prediction and the retrieval of alignments, sequences, annotation and structures from public databases and any DAS 1.53 compliant sequence or annotation server.
Availability: The Jalview 2 Desktop application and JalviewLite applet are made freely available under the GPL, and can be downloaded from www.jalview.org
Jpred (http://www.compbio.dundee.ac.uk/jpred) is a secondary structure prediction server powered by the Jnet algorithm. Jpred performs over 1000 predictions per week for users in more than 50 countries. The recently updated Jnet algorithm provides a three-state (α-helix, β-strand and coil) prediction of secondary structure at an accuracy of 81.5%. Given either a single protein sequence or a multiple sequence alignment, Jpred derives alignment profiles from which predictions of secondary structure and solvent accessibility are made. The predictions are presented as coloured HTML, plain text, PostScript, PDF and via the Jalview alignment editor to allow flexibility in viewing and applying the data. The new Jpred 3 server includes significant usability improvements that include clearer feedback of the progress or failure of submitted requests. Functional improvements include batch submission of sequences, summary results via email and updates to the search databases. A new software pipeline will enable Jnet/Jpred to continue to be updated in sync with major updates to SCOP and UniProt and so ensures that Jpred 3 will maintain high-accuracy predictions.
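Three-state accuracy of the kind quoted for Jnet is usually reported as Q3, the percentage of residues whose predicted state matches the observed one. A minimal sketch, assuming the predicted and observed states are already reduced to the letters H, E and C (the reduction from DSSP's eight states is not shown, and conventions for it vary):

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state (H, E or C) matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


print(q3_accuracy("HHHCCEEE", "HHHCCEEC"))  # 7 of 8 residues agree -> 87.5
```

Benchmark figures are normally averaged over many proteins, either per residue across the whole set or per protein, so the same function would be applied to each chain before aggregating.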
SCANPS performs iterative profile searching similar to PSI-BLAST but with full dynamic programming on each cycle and on-the-fly estimation of significance. This combination gives good sensitivity and selectivity that outperforms PSI-BLAST in domain-searching benchmarks. Although computationally expensive, SCANPS exploits on-chip parallelism (MMX and SSE2 instructions on Intel chips) as well as MPI parallelism to give acceptable turnaround times even for large databases. A web server developed to run SCANPS searches is now available at http://www.compbio.dundee.ac.uk/www-scanps. The server interface allows a range of different protein sequence databases to be searched including the SCOP database of protein domains. The server provides the user with regularly updated versions of the main protein sequence databases and is backed up by significant computing resources which ensure that searches are performed rapidly. For SCOP searches, the results may be viewed in a new tree-based representation that reflects the structure of the SCOP hierarchy; this aids the user in placing each hit in the context of its SCOP classification and understanding its relationship to other domains in SCOP.
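Full dynamic programming here means Smith–Waterman-style local alignment, in which every cell of the score matrix is computed rather than the heuristic subset explored by BLAST-family tools. The sketch below scores two plain sequences with a toy match/mismatch scheme and a linear gap penalty (all parameter values are arbitrary assumptions). SCANPS itself works with profiles and more realistic scoring, so treat this only as an illustration of the recurrence.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    Every cell of the dynamic-programming matrix is evaluated; only the
    previous row is kept in memory because only the score is returned.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap in b
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap in a
            # Clamping at zero lets an alignment start anywhere (local alignment).
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

The cost is proportional to the product of the two sequence lengths, which is why exhaustive searches of large databases benefit from the vector and MPI parallelism described above.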
The PIPs database (http://www.compbio.dundee.ac.uk/www-pips) is a resource for studying protein–protein interactions in human. It contains predictions of >37 000 high probability interactions of which >34 000 are not reported in the interaction databases HPRD, BIND, DIP or OPHID. The interactions in PIPs were calculated by a Bayesian method that combines information from expression, orthology, domain co-occurrence, post-translational modifications and sub-cellular location. The predictions also take account of the topology of the predicted interaction network. The web interface to PIPs ranks predictions according to their likelihood of interaction, broken down by the contribution from each information source, and gives easy access to the evidence that supports each prediction. Where data exist in OPHID, HPRD, DIP or BIND for a protein pair, these are also reported in the output tables returned by a search. A network browser is included to allow convenient browsing of the interaction network for any protein in the database. The PIPs database provides a new resource on protein–protein interactions in human that is straightforward to browse or can be used in its entirety for interaction network modelling.
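In odds form, Bayes' rule combines evidence by multiplying likelihood ratios: posterior odds = prior odds × LR1 × LR2 × … . A minimal naive Bayes sketch, assuming the information sources are conditionally independent. The prior and likelihood ratios in the usage lines are invented placeholders, not values from PIPs, and the network-topology term is not modelled.

```python
from math import prod


def posterior_probability(prior: float, likelihood_ratios: dict[str, float]) -> float:
    """Naive Bayes combination of evidence in odds form.

    Each likelihood ratio is P(evidence | interacting) / P(evidence | not
    interacting); treating sources as conditionally independent lets the
    ratios be multiplied.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if any(lr <= 0 for lr in likelihood_ratios.values()):
        raise ValueError("likelihood ratios must be positive")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios.values())
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical ratios for illustration only.
evidence = {"expression": 3.0, "orthology": 10.0, "localisation": 2.0}
print(round(posterior_probability(1 / 400, evidence), 3))  # 0.131
```

Because the ratios multiply, the logarithm of each one is an additive score, which is one simple way to report the contribution of each information source to a ranked prediction.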
The MyHits web server (http://myhits.isb-sib.ch) is a new integrated service dedicated to the annotation of protein sequences and to the analysis of their domains and signatures. Guest users can use the system anonymously, with full access to (i) standard bioinformatics programs (e.g. PSI-BLAST, ClustalW, T-Coffee, Jalview); (ii) a large number of protein sequence databases, including standard (Swiss-Prot, TrEMBL) and locally developed databases (splice variants); (iii) databases of protein motifs (Prosite, Interpro); (iv) a precomputed list of matches (‘hits’) between the sequence and motif databases. All databases are updated on a weekly basis and the hit list is kept up to date incrementally. The MyHits server also includes a new collection of tools to generate graphical representations of pairwise and multiple sequence alignments including their annotated features. Free registration enables users to upload their own sequences and motifs to private databases. These are then made available through the same web interface and the same set of analytical tools. Registered users can manage their own sequences and annotations using only web tools and freeze their data in their private database for publication purposes.
Rising drug resistance is limiting treatment options for infections caused by methicillin-resistant Staphylococcus aureus (MRSA). Herein we provide new evidence that wall teichoic acid (WTA) biogenesis is a remarkable antibacterial target with the capacity to destabilize the cooperative action of penicillin-binding proteins (PBPs) that underlie β-lactam resistance in MRSA. Deletion of gene tarO, encoding the first step of WTA synthesis, restored the sensitivity of MRSA to a unique profile of β-lactam antibiotics with a known selectivity for penicillin-binding protein 2 (PBP2). Of these, cefuroxime was used as a probe to screen for previously approved drugs with a cryptic capacity to potentiate its activity against MRSA. Ticlopidine, the antiplatelet drug Ticlid, strongly potentiated cefuroxime, and this synergy was abolished in strains lacking tarO. The combination was also effective in a Galleria mellonella model of infection. Using both genetic and biochemical strategies, we determined the molecular target of ticlopidine as the N-acetylglucosamine-1-phosphate transferase encoded in gene tarO and provide evidence that WTA biogenesis represents an Achilles heel supporting the cooperative function of PBP2 and PBP4 in creating highly cross-linked muropeptides in the peptidoglycan of S. aureus. This approach represents a new paradigm to tackle MRSA infection.
Methicillin resistance in Staphylococcus aureus depends on mecA, which encodes penicillin-binding protein 2A (PBP2A), an acquired peptidoglycan transpeptidase (TP) with reduced susceptibility to beta-lactam antibiotics. PBP2A crosslinks nascent peptidoglycan when the native TPs are inhibited by beta-lactams. Although mecA expression is essential for beta-lactam resistance, it is not sufficient. Here we show that blocking the expression of wall teichoic acids (WTAs) by inhibiting the first enzyme in the pathway, TarO, sensitizes MRSA strains to beta-lactams even though the beta-lactam-resistant transpeptidase, PBP2A, is still expressed. The dramatic synergy between TarO inhibitors and beta-lactams is noteworthy not simply because strategies to overcome methicillin-resistant S. aureus (MRSA) are desperately needed, but because neither TarO nor the activities of the native TPs are essential in MRSA strains. The “synthetic lethality” of inhibiting TarO and the native TPs suggests a functional connection between ongoing WTA expression and peptidoglycan assembly in S. aureus. Indeed, transmission electron microscopy shows that S. aureus cells blocked in WTA synthesis have extensive defects in septation and cell separation, indicating dysregulated cell wall assembly and degradation. Our studies imply that WTAs play a fundamental role in S. aureus cell division and raise the possibility that synthetic lethal compound combinations may have therapeutic utility for overcoming antibiotic resistant bacterial infections.
Summary: JABAWS is a web services framework that simplifies the deployment of web services for bioinformatics. JABAWS:MSA provides services for five multiple sequence alignment (MSA) methods (ProbCons, T-Coffee, MUSCLE, MAFFT and ClustalW), and is the system employed by the Jalview multiple sequence analysis workbench since version 2.6. A fully functional, easy to set up server is provided as a Virtual Appliance (VA), which can be run on most operating systems that support a virtualization environment such as VMware or Oracle VirtualBox. JABAWS is also distributed as a Web Application aRchive (WAR) and can be configured to run on a single computer and/or a cluster managed by Grid Engine, LSF or other queuing systems that support DRMAA. JABAWS:MSA provides clients with full access to each application's parameters, and allows administrators to specify named parameter preset combinations and execution limits for each application through simple configuration files. The JABAWS command-line client allows integration of JABAWS services into conventional scripts.
Availability and Implementation: JABAWS is made freely available under the Apache 2 license and can be obtained from: http://www.compbio.dundee.ac.uk/jabaws.
TropGENE-DB is a crop information system created to store genetic, molecular and phenotypic data of the numerous yet poorly documented tropical crop species. The most common data stored in TropGENE-DB are information on genetic resources (agro-morphological data, parentages, allelic diversity), molecular markers, genetic maps, results of quantitative trait loci analyses, data from physical mapping, sequences, genes, as well as the corresponding references. TropGENE-DB is organized on a crop basis with currently three running modules (sugarcane, cocoa and banana), with plans to create additional modules for rice, cotton, oil palm, coconut, rubber tree, pineapple, taro, yam and sorghum. The TropGENE-DB information system is accessible for consultation via the internet at http://tropgenedb.cirad.fr. Specific web consultation interfaces have been designed to allow quick consultations as well as complex queries.
The use of sequence alignments to understand protein families is ubiquitous in molecular biology. High quality alignments are difficult to build and protein alignment remains one of the largest open problems in computational biology. Misalignments can lead to inferential errors about protein structure, folding, function, phylogeny, and residue importance. Identifying alignment errors is difficult because alignments are built and validated on the same primary criterion: sequence conservation. Local covariation identifies systematic misalignments and is independent of conservation. We demonstrate an alignment curation tool, LoCo, that integrates local covariation scores with the Jalview alignment editor. Using LoCo, we illustrate how local covariation is capable of identifying alignment errors due to the reduction of positional independence in the region of misalignment. We highlight three alignments from the benchmark database, BAliBASE 3, that contain regions of high local covariation, and investigate their causes to illustrate these types of scenarios. Two alignments contain sequential and structural shifts that cause elevated local covariation. Realignment of these misaligned segments reduces local covariation; these alternative alignments are supported with structural evidence. We also show that local covariation identifies active site residues in a validated alignment of paralogous structures. LoCo is available at https://sourceforge.net/projects/locoprotein/files/
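Covariation between alignment columns is commonly quantified with mutual information, which is zero when the residues in two columns vary independently and grows as they co-vary. The sketch below computes it for one pair of columns. It illustrates the general idea of positional dependence and is not LoCo's own scoring function; treating gap characters as an ordinary symbol is an assumption of this sketch.

```python
from collections import Counter
from math import log2


def column_mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information, in bits, between two alignment columns.

    Each column is a string with one character per sequence, in the same
    sequence order. Gap characters ('-') are treated as ordinary symbols.
    """
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    counts_i = Counter(col_i)
    counts_j = Counter(col_j)
    counts_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in counts_ij.items():
        p_ab = c / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


print(column_mutual_information("AAKK", "DDEE"))  # fully co-varying pair -> 1.0
print(column_mutual_information("AAKK", "DEDE"))  # independent pair -> 0.0
```

A local score in the spirit of the text could be obtained by summing such values over column pairs within a sliding window, so that a misaligned block, whose columns shift together, stands out against its surroundings.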
Functional annotation is routinely performed for large-scale genomics projects and databases. Researchers working on more specific problems, for instance on an individual pathway or complex, also need to be able to quickly, completely and accurately annotate sequences. The Bioverse sequence annotation server (http://bioverse.compbio.washington.edu) provides a web-based interface to allow users to submit protein sequences to the Bioverse framework. Sequences are functionally and structurally annotated and potential contextual annotations are provided. Researchers can also submit candidate genomes for annotation of all proteins encoded by the genome (proteome).
The physicochemical and pasting properties of taro (Colocasia esculenta L.) flour were investigated and compared with flours from other botanical sources. Proximate composition, color parameters, water and oil absorption, foaming characteristics and pasting properties (measured using a Rapid Visco Analyzer) of the flours were related to each other using Pearson correlation and principal component analysis (PCA). Taro flour was significantly (P < 0.05) different from the other flours, exhibiting the highest carbohydrate content and water absorption, and lower protein content, foaming capacity and setback viscosity. Peak viscosity of taro flour was lower than that of potato flour but higher than those of soya and corn flours. Several significant correlations between functional and pasting properties were revealed by both PCA and Pearson correlation. PCA showed that taro and potato flours were located on the left of the score plot, with negative scores, while soybean and corn flours had large positive scores on the first principal component.
Flour; Taro; Functional; Physicochemical; Pasting; Potato
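The Pearson correlation used above to relate functional and pasting properties is the covariance of two variables divided by the product of their standard deviations. A minimal sketch (significance testing and the PCA step are not shown):

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation coefficient of paired samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)
```

Applied to every pair of measured properties across the flour samples, this yields the correlation matrix that such studies report alongside the PCA.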
The 3D-GENOMICS database (http://www.sbg.bio.ic.ac.uk/3dgenomics/) provides structural annotations for proteins from sequenced genomes. In August 2003 the database included data for 93 proteomes. The annotations stored in the database include homologous sequences from various sequence databases, domains from SCOP and Pfam, patterns from Prosite and other predicted sequence features such as transmembrane regions and coiled coils. In addition to annotations at the sequence level, several precomputed cross-proteome comparative analyses are available based on SCOP domain superfamily composition. Annotations are available to the user via a web interface to the database. Multiple points of entry are available so that a user is able to: (i) directly access annotations for a single protein sequence via keywords or accession codes, (ii) examine a sequence of interest chosen from a summary of annotations for a particular proteome, or (iii) access precomputed frequency-based cross-proteome comparative analyses.
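A frequency-based comparison of SCOP superfamily composition can be sketched by converting each proteome's domain assignments into relative frequencies over a shared set of superfamilies. The input format, proteome names and superfamily labels below are hypothetical and do not reflect the 3D-GENOMICS schema or its actual analyses.

```python
from collections import Counter


def superfamily_frequencies(proteomes: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Relative frequency of each superfamily among a proteome's assigned domains.

    `proteomes` maps a proteome name to the superfamily identifier of every
    domain assigned in it. All proteomes are reported over the same set of
    superfamilies so that their profiles can be compared column by column.
    """
    superfamilies = sorted({sf for domains in proteomes.values() for sf in domains})
    table: dict[str, dict[str, float]] = {}
    for name, domains in proteomes.items():
        counts = Counter(domains)
        total = len(domains)
        table[name] = {sf: counts[sf] / total if total else 0.0 for sf in superfamilies}
    return table


# Hypothetical proteomes and superfamily labels, for illustration only.
freqs = superfamily_frequencies({"proteome_A": ["sf1", "sf1", "sf2"], "proteome_B": ["sf2"]})
print(freqs["proteome_B"])  # {'sf1': 0.0, 'sf2': 1.0}
```

Normalising to frequencies rather than raw counts keeps proteomes of very different sizes comparable.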
Profile–profile methods are well suited to detect remote evolutionary relationships between protein families. Profile Comparer (PRC) is an existing stand-alone program for scoring and aligning hidden Markov models (HMMs), which are based on multiple sequence alignments. Since PRC compares profile HMMs instead of sequences, it can be used to find distant homologues. For this purpose, PRC is used by, for example, the CATH and Pfam-domain databases. As PRC is a profile comparer, it only reports profile HMM alignments and does not produce multiple sequence alignments. We have developed webPRC server, which makes it straightforward to search for distant homologues or similar alignments in a number of domain databases. In addition, it provides the results both as multiple sequence alignments and aligned HMMs. Furthermore, the user can view the domain annotation, evaluate the PRC hits with the Jalview multiple alignment editor and generate logos from the aligned HMMs or the aligned multiple alignments. Thus, this server assists in detecting distant homologues with PRC as well as in evaluating and using the results. The webPRC interface is available at http://www.ibi.vu.nl/programs/prcwww/.
The Gene Ontology (GO) project (http://www.geneontology.org/) provides structured, controlled vocabularies and classifications that cover several domains of molecular and cellular biology and are freely available for community use in the annotation of genes, gene products and sequences. Many model organism databases and genome annotation groups use the GO and contribute their annotation sets to the GO resource. The GO database integrates the vocabularies and contributed annotations and provides full access to this information in several formats. Members of the GO Consortium continually work collectively, involving outside experts as needed, to expand and update the GO vocabularies. The GO Web resource also provides access to extensive documentation about the GO project and links to applications that use GO data for functional analyses.
The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of human, mouse and other genome sequences, available as either an interactive web site or as flat files. Ensembl also integrates manually annotated gene structures from external sources where available. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. These range from sequence analysis to data storage and visualisation, and installations exist around the world at both companies and academic sites. With both human and mouse genome sequences available and more vertebrate sequences to follow, many of the recent developments in Ensembl have focused on automatic comparative genome analysis and visualisation.
The TAR hairpin of the human immunodeficiency virus type 1 (HIV-1) RNA genome is essential for virus replication. TAR forms the binding site for the transcriptional trans-activator protein Tat and multiple additional TAR functions have been proposed. We previously constructed an HIV-1 variant in which the TAR-Tat transcription control mechanism is replaced by the components of the Tet-ON regulatory system. In this context, the surprising finding was that TAR can be truncated or even deleted, but partial TAR deletions that destabilize the stem structure cause a severe replication defect. In this study, we demonstrate that the HIV-1 RNA genome requires a stable hairpin at its 5′-end because unpaired TAR sequences affect the proper folding of the untranslated leader RNA. Consequently, multiple leader-encoded functions are affected by partial TAR deletions. Upon evolution of such mutant viruses, the replication capacity was repaired through the acquisition of additional TAR mutations that restore the local RNA folding, thus preventing the detrimental effect on the leader conformation.
An extensive study of teichoic acid biosynthesis in the model organism Bacillus subtilis has established teichoic acid polymers as essential components of the gram-positive cell wall. However, similar studies pertaining to therapeutically relevant organisms, such as Staphylococcus aureus, are scarce. In this study we have carried out a meticulous examination of the dispensability of teichoic acid biosynthetic enzymes in S. aureus. By use of an allelic replacement methodology, we examined all facets of teichoic acid assembly, including intracellular polymer production and export. Using this approach we confirmed that the first-acting enzyme (TarO) was dispensable for growth, in contrast to dispensability studies in B. subtilis. Upon further characterization, we demonstrated that later-acting gene products (TarB, TarD, TarF, TarIJ, and TarH) responsible for polymer formation and export were essential for viability. We resolved this paradox by demonstrating that all of the apparently indispensable genes became dispensable in a tarO null genetic background. This work suggests a lethal gain-of-function mechanism where lesions beyond the initial step in wall teichoic acid biosynthesis render S. aureus nonviable. This discovery poses questions regarding the conventional understanding of essential gene sets, garnered through single-gene knockout experiments in bacteria and higher organisms, and points to a novel drug development strategy targeting late steps in teichoic acid synthesis for the infectious pathogen S. aureus.
All human immunodeficiency virus mRNAs contain a sequence known as TAR (trans-activating responsive sequence). The TAR element forms a stable RNA stem-loop structure which binds the HIV tat (trans-activator) protein and mediates increased viral gene expression. In principle, molecules which bind to the TAR RNA structure would inhibit trans-activation by perturbing the native RNA secondary structure. We have constructed a series of phosphodiester and phosphorothioate antisense oligonucleotides which specifically bind to the HIV TAR element. Specific binding to the TAR element was demonstrated in vitro with enzymatically synthesized TAR RNA. The TAR-directed phosphorothioates inhibited trans-activation in a sequence-dependent fashion in a cell culture model using an HIV LTR/human placental alkaline phosphatase gene fusion and tat protein supplied in trans. The molecules also inhibited HIV replication in both acute and chronically infected viral assays, but without sequence specificity. We have constructed a series of vectors consisting of the MMTV promoter and 5'-untranslated region of four different mRNAs, including the TAR region, to study the effect of TAR on gene expression in heterologous systems. The results suggest that, in the absence of the HIV LTR, the TAR element has a repressive effect on gene expression, which is relieved by tat.
TAR, a 59 nt 5′-terminal hairpin in human immunodeficiency virus 1 (HIV-1) mRNA, binds viral Tat and several cellular proteins. We report that eukaryotic translation initiation factor 2 (eIF2) recognizes TAR. TAR and the AUG initiation codon domain, located well downstream from TAR, both contribute to the affinity of HIV-1 mRNA for eIF2. The affinity of TAR for eIF2 was insensitive to lower stem mutations that modify sequence and structure or to sequence changes throughout the remainder that leave the TAR secondary structure intact. Hence, eIF2 recognizes structure rather than sequence in TAR. The affinity for eIF2 was severely reduced by a 3 nt change that converts the single A bulge into a 7 nt internal loop. T1 footprinting showed that eIF2 protects nucleotides in the loop as well as in the strand opposite the A bulge. Thus, eIF2 recognizes the TAR loop and lower part of the sub-apical stem. Though not contiguous, these regions are brought into proximity in TAR by a bend in the helical structure induced by the UCU bulge; binding of eIF2 opens up the bulge context and apical stem. The ability to bind eIF2 suggests a function for TAR in HIV-1 mRNA translation. Indeed, the 3 nt change that reduces the affinity of TAR for eIF2 impairs the ability of reporter mRNA to compete in translation. Interaction of TAR with eIF2 thus allows HIV-1 mRNA to compete more effectively during protein synthesis.
The trans-activation response element (TAR) of human immunodeficiency virus type 1 is a structured RNA consisting of the first 60 nucleotides of all human immunodeficiency virus type 1 RNAs. Computer analyses and limited structural analyses indicated that TAR consists of a stem-bulge-loop structure. Mutational analyses showed that sequences in the bulge are required for Tat binding, whereas sequences in both the bulge and the loop are required for trans activation. In this study, we probed the structures of TAR and various mutants of TAR with chemical probes and RNases and used these methods to footprint a Tat peptide on TAR. Our data show that the structure of wild-type TAR is different from previously published models. The bulge, a Tat-binding site, consists of four nucleotides. The loop is structured, rather than simply single stranded, in a fashion reminiscent of the structures of the tetraloop 5'-UUCG-3' and the GNRA loop (C. Cheong, G. Varani, and I. Tinoco, Jr., Nature [London] 346:680-682, 1990; H.A. Heus and A. Pardi, Science 253:191-193, 1991). RNA footprint data indicate that three bases in the bulge are protected and suggest that a conformational change occurs upon Tat binding.
Wall teichoic acids are anionic phosphate-rich polymers that are part of the complex meshwork of carbohydrates that make up the gram-positive cell wall. These polymers are essential to the proper rod-shaped morphology of Bacillus subtilis and have been shown to be an important virulence determinant in the nosocomial opportunistic pathogen Staphylococcus aureus. Together, sequence-based studies, in vitro experiments with biosynthetic proteins, and analyses of the chemical structure of wall teichoic acid have begun to shed considerable light on our understanding of the biogenesis of this polymer. Nevertheless, some paradoxes remain unresolved. One of these involves a putative duplication of genes linked to CDP-ribitol synthesis (tarI′J′ and tarIJ) as well as poly(ribitol phosphate) polymerization (tarK and tarL) in S. aureus. In the work reported here, we performed careful studies of the dispensability of each gene and discovered a functional redundancy in the duplicated gene clusters. We were able to create mutants in either of the putative ribitol phosphate polymerases (encoded by tarK and tarL) without affecting teichoic acid levels in the S. aureus cell wall. Although genes linked to CDP-ribitol synthesis are also duplicated, a null mutant in only one of these (tarI′J′) could be obtained, while tarIJ remained essential. Suppression analysis of the tarIJ null mutant indicated that the mechanism of dysfunction in tarI′J′ is due to poor translation of the TarJ′ enzyme, which catalyzes the rate-limiting step in CDP-ribitol formation. This work provides new insights into understanding the complex synthetic steps of the ribitol phosphate polymer in S. aureus and has implications for specifically targeting enzymes involved in polymer biosynthesis for antimicrobial design.
The RESID Database is a comprehensive collection of annotations and structures for protein pre-, co- and post-translational modifications including amino-terminal, carboxyl-terminal and peptide chain cross-link modifications. The RESID Database includes: systematic and alternate names, atomic formulas and masses, enzyme activities generating the modifications, keywords, literature citations, Gene Ontology cross-references, Protein Information Resource (PIR) and SWISS-PROT protein sequence database feature table annotations, structure diagrams and molecular models. This database is freely accessible on the Internet through the European Bioinformatics Institute at http://srs.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-page+LibInfo+-lib+RESID, through the National Cancer Institute — Frederick Advanced Biomedical Computing Center at http://www.ncifcrf.gov/RESID, or through the Protein Information Resource at http://pir.georgetown.edu/pirwww/dbinfo/resid.html.
Artemis and ACT have become mainstream tools for viewing and annotating sequence data, particularly for microbial genomes. Since its first release, Artemis has been continuously developed and supported with additional functionality for editing and analysing sequences based on feedback from an active user community of laboratory biologists and professional annotators. Nevertheless, its utility has been somewhat restricted by its limitation to reading from and writing to flat files. Therefore a new version of Artemis has been developed, which reads from and writes to a relational database schema, and allows users to annotate more complex, often large and fragmented, genome sequences.
Artemis and ACT have now been extended to read and write directly to the Generic Model Organism Database (GMOD, http://www.gmod.org) Chado relational database schema. In addition, a Gene Builder tool has been developed to provide structured forms and tables to edit coordinates of gene models and edit functional annotation, based on standard ontologies, controlled vocabularies and free text.
Artemis and ACT are freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute web sites: