RNA-binding proteins (RBPs) play an important role in plant host-microbe interactions. In this study, we show that the plant RBP known as FPA, which regulates 3′-end mRNA polyadenylation, negatively regulates basal resistance to the bacterial pathogen Pseudomonas syringae in Arabidopsis. A custom microarray analysis reveals that flg22, a peptide derived from bacterial flagellins, induces expression of alternatively polyadenylated isoforms of mRNA encoding the defence-related transcriptional repressor ETHYLENE RESPONSE FACTOR 4 (ERF4), which is regulated by FPA. Flg22 induces expression of a novel isoform of ERF4 that lacks the ERF-associated amphiphilic repression (EAR) motif, while FPA inhibits this induction. The EAR-lacking isoform of ERF4 acts as a transcriptional activator in vivo and suppresses the flg22-dependent reactive oxygen species burst. We propose that FPA controls use of proximal polyadenylation sites of ERF4, which quantitatively limit the defence response output.
CudA, a nuclear protein required for Dictyostelium prespore-specific gene expression, binds in vivo to the promoter of the cotC prespore gene. A 14 nucleotide region of the cotC promoter binds CudA in vitro and ECudA, an Entamoeba CudA homologue, also binds to this site. The CudA and ECudA DNA-binding sites contain a dyad and, consistent with a symmetrical binding site, CudA forms a homodimer in the yeast two-hybrid system. Mutation of CudA binding sites within the cotC promoter reduces expression from cotC in prespore cells. The CudA and ECudA proteins share a 120 amino acid core of homology, and clustered point mutations introduced into two highly conserved motifs within the ECudA core region decrease its specific DNA binding in vitro. This region, the presumptive DNA-binding domain, is similar in sequence to domains in two Arabidopsis proteins and one Oryza protein. Significantly, these are the only proteins in the two plant species that contain an SH2 domain. Such a structure, with a DNA-binding domain located upstream of an SH2 domain, suggests that the plant proteins are orthologous to metazoan STATs. Consistent with this notion, the DNA sequence of the CudA half site, GAA, is identical to metazoan STAT half sites, although the relative positions of the two halves of the dyad are reversed. These results define a hitherto unrecognised class of transcription factors and suggest a model for the evolution of STATs and their DNA-binding sites.
Dictyostelium; CudA; Amoebozoa; Plant STATs; SH2 domains
It has recently been shown that RNA 3′ end formation plays a more widespread role in controlling gene expression than previously thought. In order to examine the impact of regulated 3′ end formation genome-wide we applied direct RNA sequencing to A. thaliana. Here we show the authentic transcriptome in unprecedented detail and how 3′ end formation impacts genome organization. We reveal extreme heterogeneity in RNA 3′ ends, discover previously unrecognized non-coding RNAs and propose widespread re-annotation of the genome. We explain the origin of most poly(A)+ antisense RNAs and identify cis-elements that control 3′ end formation in different registers. These findings are essential to understand what the genome actually encodes, how it is organized and the impact of regulated 3′ end formation on these processes.
Small nucleolar RNAs (snoRNAs) function mainly as guides for the post-transcriptional modification of ribosomal RNAs (rRNAs). In recent years, several studies have identified a wealth of small fragments (<35 nt) derived from snoRNAs (termed sdRNAs) that stably accumulate in the cell, some of which may regulate splicing or translation. A comparison of human small RNA deep sequencing data sets reveals that box C/D sdRNA accumulation patterns are conserved across multiple cell types although the ratio of the abundance of different sdRNAs from a given snoRNA varies. sdRNA profiles of many snoRNAs are specific and resemble the cleavage profiles of miRNAs. Many do not show characteristics of general RNA degradation, as seen for the accumulation of small fragments derived from snRNA or rRNA. While 53% of the sdRNAs contain an snoRNA box C motif and boxes D and D′ are also common in sdRNAs (54%), relatively few (12%) contain a full snoRNA guide region. One box C/D snoRNA, HBII-180C, was analysed in greater detail, revealing the presence of C′ box-containing sdRNAs complementary to several pre-messenger RNAs (pre-mRNAs) including FGFR3. Functional analyses demonstrated that this region of HBII-180C can influence the alternative splicing of FGFR3 pre-mRNA, supporting a role for some snoRNAs in the regulation of splicing.
► Identifies key considerations in target selection and optimisation. ► Approaches to assign useful protein features and structure/function relationships. ► Comparison of latest crystallisation propensity predictors on nonredundant data. ► Discusses single point of reference target selection/optimisation resources. ► Guidance on using the SSPF Target Optimisation Utility (TarO).
Selection of protein targets for study is central to structural biology and may be influenced by numerous factors. A key aim is to maximise returns for effort invested by identifying proteins with the balance of biophysical properties that are conducive to success at all stages (e.g. solubility, crystallisation) in the route towards a high resolution structural model. Selected targets can be optimised through construct design (e.g. to minimise protein disorder), switching to a homologous protein, and selection of experimental methodology (e.g. choice of expression system) to prime for efficient progress through the structural proteomics pipeline.
Here we discuss computational techniques in target selection and optimisation, with more detailed focus on tools developed within the Scottish Structural Proteomics Facility (SSPF); namely XANNpred, ParCrys, OB-Score (target selection) and TarO (target optimisation). TarO runs a large number of algorithms, searching for homologues and annotating the pool of possible alternative targets. This pool of putative homologues is presented in a ranked, tabulated format and results are also visualised as an automatically generated and annotated multiple sequence alignment. The target selection algorithms each predict the propensity of a selected protein target to progress through the experimental stages leading to diffracting crystals. This single predictor approach has advantages for target selection, when compared with an approach using two or more predictors that each predict for success at a single experimental stage. The tools described here helped SSPF achieve a high (21%) success rate in progressing cloned targets to diffraction-quality crystals.
MSA, Multiple Sequence Alignment; PTM, Post Translational Modification; SSPF, Scottish Structural Proteomics Facility; MCC, Matthews correlation coefficient; AROC, Area Under the Receiver Operator Characteristic curve; Target selection; Crystallisation; Structural genomics; Structural biology; Bioinformatics; Construct design
Nucleolar localization sequences (NoLSs) are short targeting sequences responsible for the localization of proteins to the nucleolus. Given the large number of proteins experimentally detected in the nucleolus and the central role of this subnuclear compartment in the cell, NoLSs are likely to be important regulatory elements controlling cellular traffic. Although many proteins have been reported to contain NoLSs, the systematic characterization of this group of targeting motifs has only recently been carried out.
Here, we describe NoD, a web server and a command line program that predicts the presence of NoLSs in proteins. Using the web server, users can submit protein sequences through the NoD input form and are provided with a graphical output of the NoLS score as a function of protein position. While the web server is most convenient for making predictions for just a few proteins, the command line version of NoD can return predictions for complete proteomes. NoD is based on our recently described human-trained artificial neural network predictor. Through stringent independent testing of the predictor using available experimentally validated NoLS-containing eukaryotic and viral proteins, the NoD sensitivity and positive predictive value were estimated to be 71% and 79%, respectively.
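The sensitivity and positive predictive value quoted above are both ratios of confusion-matrix counts. The following is a minimal sketch of that arithmetic only; the function name and the counts are hypothetical and are not taken from the NoD evaluation.

```python
def sensitivity_and_ppv(true_positives, false_negatives, false_positives):
    """Return (sensitivity, positive predictive value) from raw counts."""
    actual_positive = true_positives + false_negatives      # all real positives
    predicted_positive = true_positives + false_positives   # all predicted positives
    if actual_positive == 0 or predicted_positive == 0:
        raise ValueError("need at least one actual and one predicted positive")
    sensitivity = true_positives / actual_positive
    ppv = true_positives / predicted_positive
    return sensitivity, ppv


# Hypothetical counts, chosen only to illustrate the calculation.
sens, ppv = sensitivity_and_ppv(true_positives=30, false_negatives=10,
                                false_positives=10)
print(f"sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

Sensitivity asks how many real sites were found, whereas positive predictive value asks how many predicted sites are real, so the two can differ substantially when positives are rare.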
NoD is the first tool to provide predictions of nucleolar localization sequences in diverse eukaryotes and viruses. NoD can be run interactively online at http://www.compbio.dundee.ac.uk/nod or downloaded to use locally.
nucleolus; protein targeting signal; protein localization; NoD web server
Summary: JABAWS is a web services framework that simplifies the deployment of web services for bioinformatics. JABAWS:MSA provides services for five multiple sequence alignment (MSA) methods (Probcons, T-coffee, Muscle, Mafft and ClustalW), and is the system employed by the Jalview multiple sequence analysis workbench since version 2.6. A fully functional, easy to set up server is provided as a Virtual Appliance (VA), which can be run on most operating systems that support a virtualization environment such as VMware or Oracle VirtualBox. JABAWS is also distributed as a Web Application aRchive (WAR) and can be configured to run on a single computer and/or a cluster managed by Grid Engine, LSF or other queuing systems that support DRMAA. JABAWS:MSA provides clients with full access to each application's parameters and allows administrators to specify named parameter preset combinations and execution limits for each application through simple configuration files. The JABAWS command-line client allows integration of JABAWS services into conventional scripts.
Availability and Implementation: JABAWS is made freely available under the Apache 2 license and can be obtained from: http://www.compbio.dundee.ac.uk/jabaws.
Staphylococcus aureus is a major human pathogen and strains resistant to existing treatments continue to emerge. Development of novel treatments is therefore important. Antimicrobial peptides represent a source of potential novel antibiotics to combat resistant bacteria such as Methicillin-Resistant Staphylococcus aureus (MRSA). A promising antimicrobial peptide is ranalexin, which has potent activity against Gram-positive bacteria, and particularly S. aureus. Understanding mode of action is a key component of drug discovery and network biology approaches enable a global, integrated view of microbial physiology, including mechanisms of antibiotic killing. We developed a systems-wide functional association network approach to integrate proteome and transcriptome profiles, enabling study of drug resistance and mode of action.
The functional association network was constructed by Bayesian logistic regression, providing a framework for identification of antimicrobial peptide (ranalexin) response modules from S. aureus MRSA-252 transcriptome and proteome profiling. These signatures of ranalexin treatment revealed multiple killing mechanisms, including cell wall activity. Cell wall effects were supported by gene disruption and osmotic fragility experiments. Furthermore, twenty-two novel virulence factors were inferred, while the VraRS two-component system and PhoU-mediated persister formation were implicated in MRSA tolerance to cationic antimicrobial peptides.
This work demonstrates a powerful integrative approach to study drug resistance and mode of action. Our findings are informative to the development of novel therapeutic strategies against Staphylococcus aureus and particularly MRSA.
The SWI/SNF complex acts to constrain distribution of the centromeric histone variant Cse4
The SWI/SNF complex has an important role in regulating chromatin structure during transcriptional activation and DNA repair. Here, we show that the SWI/SNF complex is also involved in the organisation of centromeric chromatin and prevention of the ectopic deposition of centromeric histone variants.
In order to gain insight into the function of the Saccharomyces cerevisiae SWI/SNF complex, we have identified DNA sequences to which it is bound genomewide. One surprising observation is that the complex is enriched at the centromeres of each chromosome. Deletion of the gene encoding the Snf2 subunit of the complex was found to cause partial redistribution of the centromeric histone variant Cse4 to sites on chromosome arms. Cultures of snf2Δ yeast were found to progress through mitosis slowly. This was dependent on the mitotic checkpoint protein Mad2. In the absence of Mad2, defects in chromosome segregation were observed. In the absence of Snf2, chromatin organisation at centromeres is less distinct. In particular, hypersensitive sites flanking the Cse4 containing nucleosomes are less pronounced. Furthermore, SWI/SNF complex was found to be especially effective in the dissociation of Cse4 containing chromatin in vitro. This suggests a role for Snf2 in the maintenance of point centromeres involving the removal of Cse4 from ectopic sites.
centromere; chromatin; Cse4; nucleosome; SWI/SNF
Although primarily known as the site of ribosome subunit production, the nucleolus is involved in numerous and diverse cellular processes. Recent large-scale proteomics projects have identified thousands of human proteins that associate with the nucleolus. However, in most cases, we know neither the fraction of each protein pool that is nucleolus-associated nor whether their association is permanent or conditional.
To describe the dynamic localisation of proteins in the nucleolus, we investigated the extent of nucleolar association of proteins by first collating an extensively curated literature-derived dataset. This dataset then served to train a probabilistic predictor which integrates gene and protein characteristics. Unlike most previous experimental and computational studies of the nucleolar proteome that produce large static lists of nucleolar proteins regardless of their extent of nucleolar association, our predictor models the fluidity of the nucleolus by considering different classes of nucleolar-associated proteins. The new method predicts all human proteins as either nucleolar-enriched, nucleolar-nucleoplasmic, nucleolar-cytoplasmic or non-nucleolar. Leave-one-out cross validation tests reveal sensitivity values for these four classes ranging from 0.72 to 0.90 and positive predictive values ranging from 0.63 to 0.94. The overall accuracy of the classifier was measured to be 0.85 on an independent literature-based test set and 0.74 using a large independent quantitative proteomics dataset. While the three nucleolar-association groups display vastly different Gene Ontology biological process signatures and evolutionary characteristics, they collectively represent the most well characterised nucleolar functions.
Our proteome-wide classification of nucleolar association provides a novel representation of the dynamic content of the nucleolus. This model of nucleolar localisation thus increases the coverage while providing accurate and specific annotations of the nucleolar proteome. It will be instrumental in better understanding the central role of the nucleolus in the cell and its interaction with other subcellular compartments.
There are two main classes of small nucleolar RNAs (snoRNAs): the box C/D snoRNAs and the box H/ACA snoRNAs that function as guide RNAs to direct sequence-specific modification of rRNA precursors and other nucleolar RNA targets. A previous computational and biochemical analysis revealed a possible evolutionary relationship between miRNA precursors and some box H/ACA snoRNAs. Here, we investigate a similar evolutionary relationship between a subset of miRNA precursors and box C/D snoRNAs. Computational analyses identified 84 intronic miRNAs that are encoded within either box C/D snoRNAs, or in precursors showing similarity to box C/D snoRNAs. Predictions of the folded structures of these box C/D snoRNA-like miRNA precursors resemble the structures of known box C/D snoRNAs, with the boxes C and D often in close proximity in the folded molecule. All five box C/D snoRNA-like miRNA precursors tested (miR-27b, miR-16-1, mir-28, miR-31 and let-7g) bind to fibrillarin, a specific protein component of functional box C/D snoRNP complexes. The data suggest that a subset of small regulatory RNAs may have evolved from box C/D snoRNAs.
Although the nucleolar localization of proteins is often believed to be mediated primarily by non-specific retention to core nucleolar components, many examples of short nucleolar targeting sequences have been reported in recent years. In this article, 46 human nucleolar localization sequences (NoLSs) were collated from the literature and subjected to statistical analysis. Of the residues in these NoLSs, 48% are basic, whereas 99% of the residues are predicted to be solvent-accessible with 42% in α-helix and 57% in coil. The sequence and predicted protein secondary structure of the 46 NoLSs were used to train an artificial neural network to identify NoLSs. At a true positive rate of 54%, the predictor’s overall false positive rate (FPR) is estimated to be 1.52%, which can be broken down to FPRs of 0.26% for randomly chosen cytoplasmic sequences, 0.80% for randomly chosen nucleoplasmic sequences and 12% for nuclear localization signals. The predictor was used to predict NoLSs in the complete human proteome and 10 of the highest scoring previously unknown NoLSs were experimentally confirmed. NoLSs are a prevalent type of targeting motif that is distinct from nuclear localization signals and that can be computationally predicted.
In this manuscript we describe the characterisation of human snoRNAs that co-purify with nucleoli and develop a new vector based system for targeted gene knock down. We demonstrate that this novel vector system (snoMEN) can deliver effective, sequence-specific knock down of endogenous cellular genes as well as GFP and GFP-fusion proteins.
Human small nucleolar RNAs (snoRNAs) that copurify with nucleoli isolated from HeLa cells have been characterized. Novel fibrillarin-associated snoRNAs were detected that allowed the creation of a new vector system for the targeted knockdown of one or more genes in mammalian cells. The snoMEN (snoRNA modulator of gene expressioN) vector technology is based on snoRNA HBII-180C, which contains an internal sequence that can be manipulated to make it complementary to RNA targets. Gene-specific knockdowns are demonstrated for endogenous cellular proteins and for G/YFP-fusion proteins. Multiplex snoMEN vectors coexpress multiple snoRNAs in one transcript, targeted either to different genes or to different sites in the same gene. Protein replacement snoMEN vectors can express a single transcript combining cDNA for a tagged protein with introns containing cognate snoRNAs targeted to knockdown the endogenous cellular protein. We foresee applications for snoMEN vectors in basic gene expression research, target validation, and gene therapy.
MicroRNAs (miRNAs) and small nucleolar RNAs (snoRNAs) are two classes of small non-coding regulatory RNAs, which have been much investigated in recent years. While their respective functions in the cell are distinct, they share interesting genomic similarities, and recent sequencing projects have identified processed forms of snoRNAs that resemble miRNAs. Here, we investigate a possible evolutionary relationship between miRNAs and box H/ACA snoRNAs. A comparison of the genomic locations of reported miRNAs and snoRNAs reveals an overlap of specific members of these classes. To test the hypothesis that some miRNAs might have evolved from snoRNA encoding genomic regions, reported miRNA-encoding regions were scanned for the presence of box H/ACA snoRNA features. Twenty miRNA precursors show significant similarity to H/ACA snoRNAs as predicted by snoGPS. These include molecules predicted to target known ribosomal RNA pseudouridylation sites in vivo for which no guide snoRNA has yet been reported. The predicted folded structures of these twenty H/ACA snoRNA-like miRNA precursors reveal molecules which resemble the structures of known box H/ACA snoRNAs. The genomic regions surrounding these predicted snoRNA-like miRNAs are often similar to regions around snoRNA retroposons, including the presence of transposable elements, target site duplications and poly(A) tails. We further show that the precursors of five H/ACA snoRNA-like miRNAs (miR-151, miR-605, mir-664, miR-215 and miR-140) bind to dyskerin, a specific protein component of functional box H/ACA small nucleolar ribonucleoprotein complexes, suggesting that these molecules have retained some H/ACA snoRNA functionality. The detection of small RNA molecules that share features of miRNAs and snoRNAs suggests that these classes of RNA may have an evolutionary relationship.
The major functions known for RNA were long believed to be either messenger RNAs, which function as intermediates between genes and proteins, or ribosomal RNAs and transfer RNAs which carry out the translation process. In recent years, however, newly discovered classes of small RNAs have been shown to play important cellular roles. These include microRNAs (miRNAs), which can regulate the production of specific proteins, and small nucleolar RNAs (snoRNAs), which recognise and chemically modify specific sequences in ribosomal RNA. Although miRNAs and snoRNAs are currently believed to be generated by different cellular pathways and to function in different cellular compartments, members of these two types of small RNAs display numerous genomic similarities, and a small number of snoRNAs have been shown to encode miRNAs in several organisms. Here we systematically investigate a possible evolutionary relationship between snoRNAs and miRNAs. Using computational analysis, we identify twenty genomic regions encoding miRNAs with highly significant similarity to snoRNAs, both on the level of their surrounding genomic context as well as their predicted folded structure. A subset of these miRNAs display functional snoRNA characteristics, strengthening the possibility that these miRNA molecules might have evolved from snoRNAs.
Asparagine-linked glycosylation is catalysed by oligosaccharyltransferase (OTase). In Trypanosoma brucei OTase activity is catalysed by single-subunit enzymes encoded by three paralogous genes of which TbSTT3B and TbSTT3C can complement a yeast Δstt3 mutant. The two enzymes have overlapping but distinct peptide acceptor specificities, with TbSTT3C displaying an enhanced ability to glycosylate sites flanked by acidic residues. TbSTT3A and TbSTT3B, but not TbSTT3C, are transcribed in the bloodstream and procyclic life cycle stages of T. brucei. Selective knockdown and analysis of parasite protein N-glycosylation showed that TbSTT3A selectively transfers biantennary Man5GlcNAc2 to specific glycosylation sites whereas TbSTT3B selectively transfers triantennary Man9GlcNAc2 to others. Analysis of T. brucei glycosylation site occupancy showed that TbSTT3A and TbSTT3B glycosylate sites in acidic to neutral and neutral to basic regions of polypeptide, respectively. This embodiment of distinct specificities in single-subunit OTases may have implications for recombinant glycoprotein engineering. TbSTT3A and TbSTT3B could be knocked down individually, but not collectively, in tissue culture. However, both were independently essential for parasite growth in mice, suggesting that inhibiting protein N-glycosylation could have therapeutic potential against trypanosomiasis.
glycosylation; oligosaccharyltransferase; STT3;
Sar2676, a pantothenate synthetase with a molecular weight of 31 419 Da from methicillin-resistant Staphylococcus aureus, has been expressed, purified and crystallized at 293 K.
Sar2676, a pantothenate synthetase with a molecular weight of 31 419 Da from methicillin-resistant Staphylococcus aureus, has been expressed, purified and crystallized at 293 K. The protein crystallizes in a primitive triclinic lattice, with unit-cell parameters a = 45.3, b = 60.5, c = 117.6 Å, α = 87.2, β = 81.2, γ = 68.4°. A complete data set has been collected to 2.3 Å resolution at the ESRF. Consideration of the likely solvent content suggested the asymmetric unit to contain four molecules. This has been confirmed by molecular-replacement phasing calculations, which give a solution with four monomers using a monomer of pantothenate synthetase from Escherichia coli (PDB code 1iho), which is 41% identical to Sar2676, as a search model.
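The solvent-content reasoning above can be reproduced from the reported cell parameters. The sketch below computes the triclinic cell volume, the Matthews coefficient and an approximate solvent fraction. It assumes space group P1 (one asymmetric unit per cell, so four molecules per cell) and uses the conventional 1.23/V_M approximation for the protein fraction; neither assumption is stated in the abstract.

```python
import math


def triclinic_cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Volume (cubic angstroms) of a general triclinic unit cell."""
    ca, cb, cg = (math.cos(math.radians(x))
                  for x in (alpha_deg, beta_deg, gamma_deg))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg
                                 + 2 * ca * cb * cg)


def matthews_coefficient(cell_volume, molecular_weight, molecules_per_cell):
    """Matthews coefficient V_M in cubic angstroms per dalton."""
    return cell_volume / (molecular_weight * molecules_per_cell)


def solvent_fraction(vm, protein_term=1.23):
    """Approximate solvent fraction; 1.23 is the conventional constant."""
    return 1.0 - protein_term / vm


# Cell parameters and molecular weight as reported in the abstract.
volume = triclinic_cell_volume(45.3, 60.5, 117.6, 87.2, 81.2, 68.4)
# Assumption: P1, so four molecules per asymmetric unit = four per cell.
vm = matthews_coefficient(volume, 31419, 4)
solvent = solvent_fraction(vm)
print(f"V = {volume:.0f} A^3, V_M = {vm:.2f} A^3/Da, solvent = {solvent:.0%}")
```

Under these assumptions V_M comes out between 2.3 and 2.4 Å³ per dalton, which lies in the range commonly observed for protein crystals and is consistent with four molecules in the asymmetric unit.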
Sar2676; pantothenate synthetase; methicillin-resistant Staphylococcus aureus
Summary: Jalview Version 2 is a system for interactive WYSIWYG editing, analysis and annotation of multiple sequence alignments. Core features include keyboard and mouse-based editing, multiple views and alignment overviews, and linked structure display with Jmol. Jalview 2 is available in two forms: a lightweight Java applet for use in web applications, and a powerful desktop application that employs web services for sequence alignment, secondary structure prediction and the retrieval of alignments, sequences, annotation and structures from public databases and any DAS 1.53 compliant sequence or annotation server.
Availability: The Jalview 2 Desktop application and JalviewLite applet are made freely available under the GPL, and can be downloaded from www.jalview.org
The PIPs database (http://www.compbio.dundee.ac.uk/www-pips) is a resource for studying protein–protein interactions in human. It contains predictions of >37 000 high probability interactions of which >34 000 are not reported in the interaction databases HPRD, BIND, DIP or OPHID. The interactions in PIPs were calculated by a Bayesian method that combines information from expression, orthology, domain co-occurrence, post-translational modifications and sub-cellular location. The predictions also take account of the topology of the predicted interaction network. The web interface to PIPs ranks predictions according to their likelihood of interaction broken down by the contribution from each information source and with easy access to the evidence that supports each prediction. Where data exists in OPHID, HPRD, DIP or BIND for a protein pair this is also reported in the output tables returned by a search. A network browser is included to allow convenient browsing of the interaction network for any protein in the database. The PIPs database provides a new resource on protein–protein interactions in human that is straightforward to browse, or can be exploited completely, for interaction network modelling.
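One common way to combine several evidence sources in a Bayesian framework is to multiply prior odds by a likelihood ratio per source. The sketch below illustrates that general idea under a naive independence assumption; it is not the PIPs implementation, and the prior and likelihood ratios are hypothetical values chosen for illustration.

```python
def posterior_probability(prior_probability, likelihood_ratios):
    """Combine a prior with independent likelihood ratios via Bayes' rule."""
    if not 0 < prior_probability < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior_probability / (1 - prior_probability)  # prior odds
    for ratio in likelihood_ratios:
        odds *= ratio                                    # one update per source
    return odds / (1 + odds)                             # back to a probability


# Hypothetical likelihood ratios for three evidence sources.
evidence = {"expression": 5.0, "orthology": 20.0, "domain co-occurrence": 10.0}
probability = posterior_probability(0.001, evidence.values())
print(f"posterior probability of interaction: {probability:.3f}")
```

Keeping the per-source ratios separate, as in the dictionary here, also makes it straightforward to report how much each information source contributed to a prediction.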
The regulation of protein function through reversible phosphorylation by protein kinases and phosphatases is a general mechanism controlling virtually every cellular activity. Eukaryotic protein kinases can be classified into distinct, well-characterized groups based on amino acid sequence similarity and function. We recently reported a highly sensitive and accurate hidden Markov model-based method for the automatic detection and classification of protein kinases into these specific groups. The Kinomer v. 1.0 database presented here contains annotated classifications for the protein kinase complements of 43 eukaryotic genomes. These span the taxonomic range and include fungi (16 species), plants (6), diatoms (1), amoebas (2), protists (1) and animals (17). The kinomes are stored in a relational database and are accessible through a web interface on the basis of species, kinase group or a combination of both. In addition, the Kinomer v. 1.0 HMM library is made available for users to perform classification on arbitrary sequences. The Kinomer v. 1.0 database is a continually updated resource where direct comparison of kinase sequences across kinase groups and across species can give insights into kinase function and evolution. Kinomer v. 1.0 is available at http://www.compbio.dundee.ac.uk/kinomer/.
SCANPS performs iterative profile searching similar to PSI-BLAST but with full dynamic programming on each cycle and on-the-fly estimation of significance. This combination gives good sensitivity and selectivity that outperforms PSI-BLAST in domain-searching benchmarks. Although computationally expensive, SCANPS exploits on-chip parallelism (MMX and SSE2 instructions on Intel chips) as well as MPI parallelism to give acceptable turnround times even for large databases. A web server developed to run SCANPS searches is now available at http://www.compbio.dundee.ac.uk/www-scanps. The server interface allows a range of different protein sequence databases to be searched including the SCOP database of protein domains. The server provides the user with regularly updated versions of the main protein sequence databases and is backed up by significant computing resources which ensure that searches are performed rapidly. For SCOP searches, the results may be viewed in a new tree-based representation that reflects the structure of the SCOP hierarchy; this aids the user in placing each hit in the context of its SCOP classification and understanding its relationship to other domains in SCOP.
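To illustrate what full dynamic programming involves, the sketch below scores a Smith-Waterman local alignment of two sequences. It is a simplified illustration rather than the SCANPS code: it uses a hypothetical match/mismatch scheme with a linear gap penalty instead of a substitution matrix or profile, and it returns only the best score.

```python
def smith_waterman_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score, keeping only two rows of the DP matrix."""
    previous = [0] * (len(seq_b) + 1)
    best = 0
    for i in range(1, len(seq_a) + 1):
        current = [0] * (len(seq_b) + 1)
        for j in range(1, len(seq_b) + 1):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            current[j] = max(
                0,                        # start a new local alignment
                previous[j - 1] + pair,   # align the two residues
                previous[j] + gap,        # gap in seq_b
                current[j - 1] + gap,     # gap in seq_a
            )
            best = max(best, current[j])
        previous = current
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Every cell of the matrix is evaluated, so the cost grows with the product of the two sequence lengths. That cost is why heuristics such as those in BLAST are faster, and why vector instructions and MPI help an exhaustive search.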
Jpred (http://www.compbio.dundee.ac.uk/jpred) is a secondary structure prediction server powered by the Jnet algorithm. Jpred performs over 1000 predictions per week for users in more than 50 countries. The recently updated Jnet algorithm provides a three-state (α-helix, β-strand and coil) prediction of secondary structure at an accuracy of 81.5%. Given either a single protein sequence or a multiple sequence alignment, Jpred derives alignment profiles from which predictions of secondary structure and solvent accessibility are made. The predictions are presented as coloured HTML, plain text, PostScript, PDF and via the Jalview alignment editor to allow flexibility in viewing and applying the data. The new Jpred 3 server includes significant usability improvements that include clearer feedback of the progress or failure of submitted requests. Functional improvements include batch submission of sequences, summary results via email and updates to the search databases. A new software pipeline will enable Jnet/Jpred to continue to be updated in sync with major updates to SCOP and UniProt and so ensures that Jpred 3 will maintain high-accuracy predictions.
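Three-state accuracy of the kind quoted above is usually reported as the percentage of residues whose predicted state matches the observed state. The sketch below shows that per-residue calculation. The two state strings are hypothetical, and the assumption that the quoted figure is this per-residue measure is mine rather than stated in the abstract.

```python
def three_state_accuracy(predicted, observed):
    """Percentage of positions where predicted and observed states agree."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("state strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical strings: H = alpha-helix, E = beta-strand, C = coil.
predicted_states = "CHHHHCCEEEC"
observed_states = "CHHHHHCEEEC"
accuracy = three_state_accuracy(predicted_states, observed_states)
print(f"three-state accuracy = {accuracy:.1f}%")
```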
TarO (http://www.compbio.dundee.ac.uk/taro) offers a single point of reference for key bioinformatics analyses relevant to selecting proteins or domains for study by structural biology techniques. The protein sequence is analysed by 17 algorithms and compared to 8 databases. TarO gathers putative homologues, including orthologues, and then obtains predictions of properties for these sequences including crystallisation propensity, protein disorder and post-translational modifications. Analyses are run on a high-performance computing cluster, the results integrated, stored in a database and accessed through a web-based user interface. Output is in tabulated format and in the form of an annotated multiple sequence alignment (MSA) that may be edited interactively in the program Jalview. TarO also simplifies the gathering of additional annotations via the Distributed Annotation System, both from the MSA in Jalview and through links to Dasty2. Routes to other information gateways are included, for example to relevant pages from UniProt, COG and the Conserved Domains Database. Open access to TarO is available from a guest account with private accounts for academic use available on request. Future development of TarO will include further analysis steps and integration with the Protein Information Management System (PIMS), a sister project in the BBSRC ‘Structural Proteomics of Rational Targets’ initiative.
Amino acids responsible for structure, core function or specificity may be inferred from multiple protein sequence alignments where a limited set of residue types are tolerated. The rise in available protein sequences continues to increase the power of techniques based on this principle.
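The principle that functional positions tolerate only a limited set of residue types can be illustrated with a per-column Shannon entropy. The sketch below is a generic illustration, not the Williamson property entropy or any other measure evaluated in the study. The toy alignment is hypothetical and gaps are simply ignored.

```python
import math
from collections import Counter


def column_entropy(column, gap_chars="-"):
    """Shannon entropy (bits) of residue types in one alignment column."""
    residues = [r for r in column if r not in gap_chars]
    if not residues:
        return math.nan  # an all-gap column has no defined conservation
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropies(sequences):
    """Entropy for every column of an alignment of equal-length rows."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(col) for col in zip(*sequences)]


# Hypothetical toy alignment; low entropy marks a conserved column.
toy_alignment = ["MKTAY", "MRTGY", "MKSAY", "MQT-Y"]
entropies = alignment_entropies(toy_alignment)
print([round(e, 3) for e in entropies])
```

A plain entropy treats every substitution as equally significant. Property-based measures instead group residues by physicochemical similarity, so a conservative substitution is penalised less.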
A new algorithm, SMERFS, for predicting protein functional sites from multiple sequence alignments was compared to 14 conservation measures and to the MINER algorithm. Validation was performed on an automatically generated dataset of 1457 families derived from the protein interactions database SNAPPI-DB, and a smaller manually curated set of 148 families. The best performing measure overall was Williamson property entropy, with ROC0.1 scores of 0.0087 and 0.0114 for domain and small molecule contact prediction, respectively. The Lancet method performed worse than random on protein-protein interaction site prediction (ROC0.1 score of 0.0008). The SMERFS algorithm gave similar accuracy to the phylogenetic tree-based MINER algorithm but was superior to Williamson in prediction of non-catalytic transient complex interfaces. SMERFS predicts sites that are significantly more solvent accessible compared to Williamson.
Williamson property entropy is the best performing of 14 conservation measures examined. The difference in performance of SMERFS relative to Williamson in manually defined complexes was dependent on complex type. The best choice of analysis method is therefore dependent on the system of interest. Additional computation employed by MINER in calculation of phylogenetic trees did not produce improved results over SMERFS. SMERFS performance was improved by use of windows over alignment columns, illustrating the necessity of considering the local environment of positions when assessing their functional significance.
The problems of gaining accurate protein sequence alignments for molecular replacement are discussed, current techniques explained and strategies suggested.
This article focuses on the key step of obtaining the best possible sequence alignment of the Query (the protein you are interested in) to the Target (a protein of known three-dimensional structure) in order to build a molecular model for molecular replacement. Common sequence-alignment methods are discussed, starting from structural alignment and then moving to pairwise, multiple and profile–profile methods. The limitations of sequence-alignment methods and guidelines on how to judge the likely accuracy of alignment are considered. This is not a detailed tutorial on how to use specific programs; rather, the reader is directed to current tools and techniques that are likely to yield good results.
molecular replacement; sequence alignment
We describe the genome sequence of the protist Trichomonas vaginalis, a sexually transmitted human pathogen. Repeats and transposable elements comprise about two-thirds of the ~160-megabase genome, reflecting a recent massive expansion of genetic material. This expansion, in conjunction with the shaping of metabolic pathways that likely transpired through lateral gene transfer from bacteria, and amplification of specific gene families implicated in pathogenesis and phagocytosis of host proteins may exemplify adaptations of the parasite during its transition to a urogenital environment. The genome sequence predicts previously unknown functions for the hydrogenosome, which support a common evolutionary origin of this unusual organelle with mitochondria.