Lattice models are a common abstraction used in the study of protein structure, folding, and refinement. They are advantageous because discretising space can make extensive protein evaluations computationally feasible. Various approaches to the protein chain lattice fitting problem have been suggested, but only a single backbone-only tool is currently available. We introduce LatFit, a new tool to produce high-accuracy lattice protein models. It generates both backbone-only and backbone-side-chain models in any user-defined lattice. LatFit implements a new distance RMSD (dRMSD) optimisation fitting procedure in addition to the established coordinate RMSD method. We tested LatFit's accuracy and speed using a large nonredundant set of high-resolution proteins (SCOP database) on three commonly used lattices: 3D cubic, face-centred cubic (FCC), and knight's walk. Fitting speed compared favourably with other methods, and both backbone-only and backbone-side-chain models show low deviation from the original data (~1.5 Å RMSD in the FCC lattice). To our knowledge, this represents the first comprehensive study of lattice quality for on-lattice protein models including side chains, and LatFit is the only available tool for such models.
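The difference between the two fitting objectives can be seen in how each deviation is computed. The following minimal sketch assumes NumPy and two equal-length arrays of Cα coordinates; the function names are hypothetical and this is not LatFit's own code. Coordinate RMSD requires an optimal superposition (here via the Kabsch algorithm), whereas distance RMSD compares all intramolecular pairwise distances and needs no superposition.

```python
import numpy as np


def crmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Coordinate RMSD after optimal rigid superposition (Kabsch algorithm)."""
    # Centre both point sets on their centroids.
    p = a - a.mean(axis=0)
    q = b - b.mean(axis=0)
    # SVD of the 3x3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def drmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Distance RMSD over all intramolecular pairwise distances (no superposition)."""
    da = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
    db = np.linalg.norm(b[:, None, :] - b[None, :, :], axis=-1)
    iu = np.triu_indices(len(a), k=1)  # each pair once, excluding the diagonal
    return float(np.sqrt(((da[iu] - db[iu]) ** 2).mean()))
```

Because dRMSD depends only on internal distances, it cannot distinguish a chain from its mirror image: a mirrored copy scores zero dRMSD but a non-zero cRMSD.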
Microtubules (MTs), which play crucial roles in normal cell function, are regulated by MT-associated proteins (MAPs). Using a combinatorial approach that includes biochemistry, proteomics and bioinformatics, we recently identified 270 putative MAPs from Drosophila embryos and characterized some of those required for correct progression through mitosis. Here we identify functional groups of these MAPs using a reciprocal-hits sequence alignment technique and assign InterPro functional domains to 28 previously uncharacterized proteins. This approach gives insight into the potential functions of MAPs and how their roles may affect MTs.
Drosophila; domain; microtubule; MAP; alignment
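The reciprocal-hits idea can be illustrated with a short sketch. It assumes, hypothetically, a dictionary of alignment scores keyed by (query, subject) from an all-against-all search, links two proteins only when each scores at or above a threshold against the other, and reports connected components as candidate functional groups. The threshold and data layout are illustrative assumptions, not the published parameters of the technique.

```python
from collections import defaultdict


def reciprocal_groups(scores: dict[tuple[str, str], float], threshold: float) -> list[list[str]]:
    """Group proteins linked by reciprocal hits: A->B and B->A must both score >= threshold."""
    adj: dict[str, set[str]] = defaultdict(set)
    for (a, b), score in scores.items():
        reverse = scores.get((b, a), float("-inf"))
        if a != b and score >= threshold and reverse >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    # Connected components of the reciprocal-hit graph are the candidate groups.
    seen: set[str] = set()
    groups: list[list[str]] = []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(component))
    return groups
```

One-directional hits (for example a short protein matching a domain of a much longer one, but not vice versa) are deliberately excluded, which is the point of requiring reciprocity.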
The variable domains of antibodies and T-cell receptors (TCRs) share similar structures. Both molecules act as sensors for the immune system but recognise their respective antigens in different ways. Antibodies bind to a diverse set of antigenic shapes, whilst TCRs recognise only linear peptides presented by a major histocompatibility complex (MHC). The antigen specificity and affinity of both receptors are determined primarily by the sequence and structure of their complementarity determining regions (CDRs). In antibodies, the binding site is also known to be affected by the relative orientation of the variable domains, VH and VL. Here, the corresponding property for TCRs, the Vβ-Vα orientation, is investigated and compared with that of antibodies. We find that TCR and antibody orientations are distinct. General antibody orientations are found to be incompatible with binding to the MHC in a canonical TCR-like mode. Finally, factors that cause the orientations of TCRs and antibodies to differ are investigated. Packing of the long Vα CDR3 in the domain-domain interface is found to be influential. In antibodies, a similar packing effect can be achieved using a bulky residue at IMGT position 50 on the VH domain. Along with IMGT VH 50, other positions are identified that may help to promote a TCR-like orientation in antibodies. These positions should provide useful considerations in the engineering of therapeutic TCR-like antibodies.
The immune system needs to be able to sense molecules that might be harmful to the organism. Such harmful molecules are known as antigens. Two classes of receptor proteins that mediate antigen recognition are antibodies and T-cell receptors (TCRs). Antibodies are able to bind a diverse range of antigen shapes, whilst TCRs are specialised to recognise a cell-surface protein complex, the peptide-MHC (pMHC). Antibodies that bind the pMHC are rarely created naturally. However, such TCR-like antibodies are of therapeutic importance. The binding regions of the TCR and the antibody have very similar three-dimensional structures. Both consist of two independent units (domains), which associate and form the antigen binding site between them. This work examines how the two domains orient with respect to one another in TCRs and antibodies. Our results show that the conformations found in TCRs and antibodies are distinct. Consequently, it is difficult for an antibody to bind to a pMHC in the same way a TCR would. However, a TCR-like conformation can be achieved in antibodies through the presence of certain amino acids in the domain interface. This knowledge should aid the development of therapeutic TCR-like antibodies.
Motivation: Biological network comparison software largely relies on the concept of alignment, where close matches between the nodes of two or more networks are sought. These node matches are based on sequence similarity and/or interaction patterns. However, because of the incomplete and error-prone datasets currently available, such methods have had limited success. Moreover, the results of network alignment are in general not amenable to distance-based evolutionary analysis of sets of networks. In this article, we describe Netdis, a topology-based distance measure between networks, which offers the possibility of network phylogeny reconstruction.
Results: We first demonstrate that Netdis is able to correctly separate different random graph model types independently of network size and density. The biological applicability of the method is then shown by its ability to build the correct phylogenetic tree of species based solely on the topology of current protein interaction networks. Our results provide new evidence that the topology of protein interaction networks contains information about evolutionary processes, despite the lack of conservation of individual interactions. Because Netdis is fast and simple, it is applicable to any network; we apply it to a large collection of biological and non-biological networks, where it clusters diverse networks by type.
Availability and implementation: The source code of the program is freely available at http://www.stats.ox.ac.uk/research/proteins/resources.
Supplementary data are available at Bioinformatics online.
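Netdis builds its distance from counts of small subgraphs in local neighbourhoods. The sketch below is not Netdis, whose published statistic is more elaborate; it only illustrates the basic ingredient. It counts edges, open two-paths (wedges) and triangles in each node's ego network (a radius of 2 is assumed here), sums them into a profile, and compares two profiles with a simple cosine-based dissimilarity. The adjacency-set representation and all function names are hypothetical.

```python
from itertools import combinations
from math import comb, sqrt

Graph = dict[int, set[int]]  # adjacency sets; undirected, no self-loops


def ego_nodes(g: Graph, centre: int, radius: int = 2) -> set[int]:
    """Nodes within `radius` steps of `centre` (breadth-first expansion)."""
    seen, frontier = {centre}, {centre}
    for _ in range(radius):
        frontier = {nb for v in frontier for nb in g[v]} - seen
        seen |= frontier
    return seen


def subgraph_counts(g: Graph, nodes: set[int]) -> tuple[int, int, int]:
    """(edges, open wedges, triangles) in the subgraph induced by `nodes`."""
    deg = {v: len(g[v] & nodes) for v in nodes}
    edges = sum(deg.values()) // 2
    # Each triangle is found once from each of its three corners.
    triangles = sum(
        1 for v in nodes for a, b in combinations(sorted(g[v] & nodes), 2) if b in g[a]
    ) // 3
    # All two-paths centred on a node, minus those closed into triangles.
    wedges = sum(comb(d, 2) for d in deg.values()) - 3 * triangles
    return edges, wedges, triangles


def count_profile(g: Graph, radius: int = 2) -> list[int]:
    """Sum of ego-network subgraph counts over all nodes."""
    totals = [0, 0, 0]
    for v in g:
        for i, c in enumerate(subgraph_counts(g, ego_nodes(g, v, radius))):
            totals[i] += c
    return totals


def cosine_dissimilarity(p: list[int], q: list[int]) -> float:
    """0 for proportional profiles, up to 0.5 for orthogonal ones (illustrative only)."""
    norm = sqrt(sum(x * x for x in p)) * sqrt(sum(y * y for y in q))
    if norm == 0:
        raise ValueError("empty count profile")
    return 0.5 * (1 - sum(x * y for x, y in zip(p, q)) / norm)
```

Working on local ego networks rather than the whole graph is what lets counts from networks of very different sizes be compared on a similar footing.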
The interplay between T-cell receptors (TCRs) and peptides bound by major histocompatibility complexes (MHCs) is one of the most important interactions in the adaptive immune system. Several previous studies have investigated its structural dynamics computationally. On the basis of these simulations, several structural and dynamical properties have been proposed as determinants of immunogenicity. Here we present the results of a large-scale Molecular Dynamics simulation study consisting of 100 ns simulations of 172 different complexes. These complexes comprised all possible point mutations of the Epstein-Barr virus peptide FLRGRAYGL bound by HLA-B*08:01 and presented to the LC13 TCR. We compare the results of these 172 simulations with experimental immunogenicity data. We found that simulations with more immunogenic peptides and those with less immunogenic peptides are in fact highly similar: on average, only minor differences are present in the hydrogen-bonding footprints, the interface distances, and the relative orientation of the TCR chains. Thus, our large-scale analysis shows that many previously suggested dynamical and structural properties of the TCR/peptide/MHC interface are unlikely to be conserved causal factors for peptide immunogenicity.
Immune cells in the human body screen other cells for possible infections. The binding of T-cell receptors (TCRs) to parts of pathogens bound by major histocompatibility complexes (MHCs) is one of the activation mechanisms of the immune system. There have been many hypotheses as to when such binding will activate the immune system. In this study we performed what is, to our knowledge, the largest set of Molecular Dynamics simulations of TCR-MHC complexes: 172 simulations, each 100 ns in length. By performing a large number of simulations, we gain insight into which structural features are frequently present in immune-system-activating and non-activating TCR-MHC complexes. We show that many previously suggested structural features are unlikely to be causal for the activation of the human immune system.
Motivation: Antibodies are currently the most important class of biopharmaceuticals. Development of such antibody-based drugs depends on costly and time-consuming screening campaigns. Computational techniques such as antibody–antigen docking hold the potential to facilitate the screening process by rapidly providing a list of initial poses that approximate the native complex.
Results: We have developed EpiPred, a new method to identify the epitope region on the antigen given the structures of the antibody and the antigen. The method combines conformational matching of the antibody–antigen structures with a specific antibody–antigen score. We have tested the method both on a large non-redundant set of antibody–antigen complexes and on homology models of the antibodies and/or the unbound antigen structures. On a non-redundant test set, our epitope prediction method achieves 44% recall at 14% precision, against 23% recall at 14% precision for a random background distribution. We use our epitope predictions to rescore the global docking results of two rigid-body docking algorithms: ZDOCK and ClusPro. In both cases, including our epitope prediction increases the number of near-native poses found among the top decoys.
Availability and implementation: Our software is available from http://www.stats.ox.ac.uk/research/proteins/resources.
Supplementary data are available at Bioinformatics online.
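Recall and precision here are residue-level quantities. A minimal sketch, assuming predicted and true epitope residues are given as sets of residue identifiers (the labels are hypothetical), is shown below. It also gives the expected values when k residues are picked uniformly at random from N candidate surface residues, of which E are epitope residues: the expected precision is E/N regardless of k, and the expected recall is k/N.

```python
def precision_recall(predicted: set[str], true_epitope: set[str]) -> tuple[float, float]:
    """Residue-level precision and recall of a predicted epitope."""
    true_positives = len(predicted & true_epitope)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(true_epitope) if true_epitope else 0.0
    return precision, recall


def random_baseline(n_surface: int, n_epitope: int, n_predicted: int) -> tuple[float, float]:
    """Expected (precision, recall) when n_predicted of n_surface residues are picked at random."""
    # Each picked residue is an epitope residue with probability n_epitope / n_surface,
    # so the expected number of hits is n_predicted * n_epitope / n_surface.
    return n_epitope / n_surface, n_predicted / n_surface
```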
Helix kinks are a common feature of α-helical membrane proteins but are thought to be rare in soluble proteins. In this study we find that kinks are a feature of long α-helices in both soluble and membrane proteins, rather than of transmembrane α-helices alone. The apparent rarity of kinks in soluble proteins is due to the relative infrequency of long helices (≥20 residues) in these proteins. We compare length-matched sets of soluble and membrane helices and find that the frequency of kinks, the role of proline, the patterns of other amino acids around kinks (allowing for the expected differences in amino acid distributions between the two types of protein), and the effects of hydrogen bonds are the same for the two types of helix. In both types of protein, helices that contain proline in the second and subsequent turns are very frequently kinked. However, a sizeable proportion of kinked helices contain no proline in either their own sequence or that of a homolog. Moreover, we observe that in soluble proteins, kinked helices have a structural preference: they typically point into the solvent.
membrane protein; protein structure; protein helix; helix kink; helix distortion; soluble protein; helix bend
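One simple way to quantify a kink, shown here only as an illustrative sketch and not necessarily the kink definition used in this study, is to fit an axis to the Cα atoms on each side of a candidate kink and measure the angle between the two axes. The sketch assumes NumPy and takes the principal component of each segment's coordinates as its axis, which is a reasonable approximation only for segments of roughly two helical turns or more.

```python
import numpy as np


def segment_axis(ca: np.ndarray) -> np.ndarray:
    """Unit vector along a segment's principal axis, oriented from N- to C-terminus."""
    centred = ca - ca.mean(axis=0)
    # The first right-singular vector is the direction of greatest spread.
    axis = np.linalg.svd(centred)[2][0]
    if np.dot(axis, ca[-1] - ca[0]) < 0:
        axis = -axis
    return axis


def kink_angle(before: np.ndarray, after: np.ndarray) -> float:
    """Angle in degrees between the axes of the segments before and after a candidate kink."""
    cos = np.clip(np.dot(segment_axis(before), segment_axis(after)), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)))
```

Orienting each axis along the chain direction matters: without it the SVD sign is arbitrary, and a straight helix could be reported as a 180° kink.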
The evolution of proteins is one of the fundamental processes that have delivered the diversity and complexity of life we see around us today. While we tend to define protein evolution in terms of sequence-level mutations, insertions and deletions, it is hard to translate these processes into a more complete picture incorporating a polypeptide's structure and function. By considering how protein structures change over time, we can gain an entirely new appreciation of their long-term evolutionary dynamics. In this work we seek to identify how populations of proteins at different stages of evolution explore their possible structure space. We annotate this space with superfamily ages and explore the relationship between these ages and a diverse set of properties pertaining to a superfamily's sequence, structure and function. We note several marked differences between the populations of newly evolved and ancient structures, such as in their length distributions, secondary structure content and tertiary packing arrangements. In particular, many of these differences suggest a less elaborate structure for newly evolved superfamilies compared with their ancient counterparts. We show that the structural preferences we report are not a residual effect of a more fundamental relationship with function. Furthermore, we demonstrate the robustness of our results by substantially varying the algorithm used to estimate the ages. We present these age estimates as a useful tool for analysing protein populations. In particular, we apply them in a comparison of domains containing Greek key or jelly roll motifs.
Proteins are the molecular workers of the cell. They are formed from a string of amino acids which folds into an elaborate three-dimensional structure. While there is a relationship between a protein's sequence and its structure, this relationship is highly complex and not fully understood. Protein structures evolve differently from their sequences: they are far more conserved and so tend to change more slowly. The aim of this paper was to identify trends in the way that protein structures evolve, rather than to adapt models of sequence evolution. To do this, we have provided a database of ages for structural superfamilies. These ages are robust to drastic differences in the evolutionary assumptions underlying their estimation and can be used to study differences between populations of proteins. For example, we have compared newly evolved structures against those with a long evolutionary history and found that, overall, a shorter evolutionary history corresponds to a less elaborate structure. We have also demonstrated how these ages can be used to compare particular structural motifs present in a large number of protein structures, and have shown that the jelly roll motif is significantly younger than the Greek key.
The Structural Antibody Database (SAbDab; http://opig.stats.ox.ac.uk/webapps/sabdab) is an online resource containing all publicly available antibody structures, annotated and presented in a consistent fashion. The data are annotated with several properties, including experimental information, gene details, correct heavy and light chain pairings, antigen details and, where available, antibody–antigen binding affinity. Users can select structures according to these attributes as well as structural properties such as complementarity determining region loop conformation and variable domain orientation. Individual structures, datasets and the complete database can be downloaded.
Membrane proteins are estimated to be the targets of 50% of drugs that are currently in development, yet we have few membrane protein crystal structures. As a result, for a membrane protein of interest, the much-needed structural information usually comes from a homology model. Current homology modelling software is optimized for globular proteins, and ignores the constraints that the membrane is known to place on protein structure. Our Memoir server produces homology models using alignment and coordinate generation software that has been designed specifically for transmembrane proteins. Memoir is easy to use, with the only inputs being a structural template and the sequence that is to be modelled. We provide a video tutorial and a guide to assessing model quality. Supporting data aid manual refinement of the models. These data include a set of alternative conformations for each modelled loop, and a multiple sequence alignment that incorporates the query and template. Memoir works with both α-helical and β-barrel types of membrane proteins and is freely available at http://opig.stats.ox.ac.uk/webapps/memoir.
Protein-protein interfaces hold the key to understanding protein-protein interactions. In this paper we investigated local interaction network patterns beyond pair-wise contact sites by considering interfaces as contact networks among residues. A contact site was defined as any residue on the surface of one protein that was in contact with a residue on the surface of another protein. We labeled the sub-graphs of these contact networks by their amino acid types. The observed distributions of these labeled sub-graphs were compared with the corresponding background distributions, and the results suggested that there are preferred chemical patterns of closely packed residues at the interface. These preferred patterns point to biological constraints on the physical proximity of residues on one protein that bind to residues lying close together on the interacting partner. Interaction interfaces are far from random and contain information beyond pairs and triangles. To illustrate a possible application of the observed local network patterns, we introduced a signature method, called iScore, based on these patterns to assess interface predictions. On our data sets, iScore achieved 83.6% specificity with 82% sensitivity.
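The first step, building labelled contacts across an interface, might look like the sketch below. It assumes each residue is represented by its amino acid type and a single representative coordinate, and uses a hypothetical 5 Å cutoff; the paper's exact contact definition may differ. Counting label pairs over cross-interface edges gives the simplest labelled subgraphs, which can then be compared with a background distribution; larger patterns such as triangles would additionally need proximity edges between residues on the same protein.

```python
import math
from collections import Counter
from itertools import product

Residue = tuple[str, tuple[float, float, float]]  # (amino acid type, representative coordinate)


def interface_label_pairs(side_a: list[Residue], side_b: list[Residue], cutoff: float = 5.0) -> Counter:
    """Count amino-acid label pairs over cross-interface residue contacts."""
    counts: Counter = Counter()
    for (aa1, xyz1), (aa2, xyz2) in product(side_a, side_b):
        if math.dist(xyz1, xyz2) <= cutoff:
            # Sort labels so that (LYS, GLU) and (GLU, LYS) map to the same edge label.
            counts[tuple(sorted((aa1, aa2)))] += 1
    return counts
```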
Loops are irregular structures which connect two secondary structure elements in proteins. They often play important roles in function, including enzyme reactions and ligand binding. Despite their importance, their structure remains difficult to predict. Most protein loop structure prediction methods sample local loop segments and score them. In particular, protein loop classification and database search methods depend heavily on local properties of loops. Here we examine the distance between a loop’s end points (span). We find that the distribution of loop span appears to be independent of the number of residues in the loop; in other words, the separation between the anchors of a loop does not increase with the number of loop residues. Loop span is also unaffected by the secondary structures at the end points, unless the two anchors are part of an anti-parallel beta sheet. As loop span appears to be independent of global properties of the protein, we suggest that its distribution can be described by a random fluctuation model based on the Maxwell–Boltzmann distribution. It is believed that the primary difficulty in protein loop structure prediction comes from the number of residues in the loop. Following the idea that loop span is an independent local property, we investigate its effect on protein loop structure prediction and show how normalised span (loop stretch) is related to the structural complexity of loops. Highly contracted loops are more difficult to predict than stretched loops.
Protein structure; Protein loop; Protein structure prediction; Protein loop structure; Protein loop structure prediction; Protein; Loop stretch; Loop span
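Loop span is simply the Euclidean distance between the two anchor residues, and the Maxwell–Boltzmann density with scale parameter a is f(x) = √(2/π) x² exp(−x²/(2a²)) / a³. The sketch below assumes Cα anchor coordinates. The normalisation used here for loop stretch (span divided by the maximal span of a fully extended chain, taking about 3.8 Å per Cα–Cα virtual bond) is an illustrative assumption and not necessarily the definition used in the paper.

```python
import math

CA_CA = 3.8  # approximate Cα–Cα virtual bond length in Å


def loop_span(anchor_n: tuple[float, float, float], anchor_c: tuple[float, float, float]) -> float:
    """Euclidean distance (Å) between the Cα atoms of the two loop anchors."""
    return math.dist(anchor_n, anchor_c)


def loop_stretch(anchor_n, anchor_c, n_loop_residues: int) -> float:
    """Span divided by the maximal span of a fully extended chain (assumed normalisation)."""
    # n loop residues between the anchors give n + 1 Cα–Cα virtual bonds.
    return loop_span(anchor_n, anchor_c) / ((n_loop_residues + 1) * CA_CA)


def maxwell_boltzmann_pdf(x: float, a: float) -> float:
    """Maxwell–Boltzmann density with scale parameter a, for x >= 0."""
    return math.sqrt(2 / math.pi) * x ** 2 * math.exp(-x ** 2 / (2 * a ** 2)) / a ** 3
```

Under this normalisation a stretch near 1 means an almost fully extended loop, and small values correspond to the highly contracted loops that the text reports as harder to predict.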
Male factor and idiopathic infertility contribute significantly to global infertility, with abnormal testicular gene expression considered to be a major cause. Certain types of male infertility are caused by failure of the sperm to activate the oocyte, a process normally regulated by calcium oscillations thought to be induced by a sperm-specific phospholipase C, PLCzeta (PLCζ). Previously, we identified a point mutation in an infertile male resulting in a histidine-to-proline substitution at position 398 of the protein sequence (PLCζH398P), leading to abnormal PLCζ function and infertility.
METHODS AND RESULTS
Here, using a combination of direct sequencing and mini-sequencing of the PLCζ gene from the patient and his family, we report the identification of a second PLCζ mutation in the same patient, resulting in a histidine-to-leucine substitution at position 233 (PLCζH233L). Predictive 3D modelling suggested that PLCζH233L disrupts local protein interactions in a manner similar to PLCζH398P, and cRNA injection into mouse oocytes showed that it exhibits abnormal calcium oscillatory ability. We show that PLCζH233L and PLCζH398P exist on distinct parental chromosomes, the former inherited from the patient's mother and the latter from his father. Custom-made single-nucleotide polymorphism assays detected neither mutation in 100 fertile males and females or in 8 infertile males with characterized oocyte activation deficiency.
Collectively, our findings provide further evidence of the importance of PLCζ in oocyte activation and in forms of male infertility in which it is deficient. Additionally, we show that the inheritance patterns underlying male infertility are more complex than previously thought and may involve maternal mechanisms.
infertility; oocyte activation; sperm; phospholipase C zeta (PLCzeta); inheritance
The notion that sequence homology implies functional similarity underlies much of computational biology. In the case of protein-protein interactions, an interaction can be inferred between two proteins on the basis that sequence-similar proteins have been observed to interact. The use of such transferred interactions is common, but their legitimacy is not clear. Here we investigate transferred interactions and whether data incompleteness explains the lack of evidence found for them. Using definitions of homology associated with functional annotation transfer, we estimate that conservation rates of interactions are low even after taking interactome incompleteness into account. For example, at a blastp E-value threshold of …, we estimate the conservation rate to be about … between S. cerevisiae and H. sapiens. Our method also produces estimates of interactome sizes, which are similar to those previously proposed. Using our estimates of interaction conservation, we estimate the rate at which protein-protein interactions are lost across species. To our knowledge, this is the first such study based on large-scale data. Previous work has suggested that interactions transferred within species are more reliable than interactions transferred across species. By controlling for factors that are specific to within-species interaction prediction, we propose that the transfer of interactions within species might be less reliable than transfer between species. Protein-protein interactions appear to be very rarely conserved unless very high sequence similarity is observed. Consequently, inferred interactions should be used with care.
It is widely assumed that knowledge gained in one species can be transferred to another species, even among species that are widely separated on the tree of life. This transfer is often done at the level of proteins under the assumption that if two proteins have similar sequences, they will share similar properties. In this paper, we investigate the validity of this assumption for the case of protein-protein interactions. The transfer of protein interactions across species is a common procedure with known shortcomings, but these are generally ascribed to the incompleteness of protein interaction data. We introduce a framework to take such incomplete information into account and, under its assumptions, show that the procedure is unreliable when using sequence-similarity thresholds typically thought to allow the transfer of functional information. Our results imply that, unless strict definitions of homology are used, interactions rewire at a rate too fast to allow reliable transfer across species. We urge caution in interpreting the results of such transfers.
Predicting protein contacts solely from sequence information remains a challenging problem, despite the huge amount of sequence data at our disposal. Mutual Information (MI), an information-theoretic measure, has been extensively employed and modified to identify residues within a protein (intra-protein) that are in contact. More recently, MI and its variants have also been used to predict contacts between proteins (inter-protein).
Here we assess the predictive power of MI and its variants for domain-domain contact prediction. We test the original MI and three variants, MIp, MIc and ZNMI, on 40 domain-domain test cases containing 10,753 sequences. We also propose and evaluate two new versions of MI that consider, respectively, triangles of residues and the physicochemical properties of the amino acids.
We found that all versions of MI are skewed towards predicting surface residues. Since domain-domain contacts are on the surface of each domain, we considered only surface residues when attempting to predict contacts. Our analysis shows that MIc is the best current MI domain-domain contact predictor. At 20% recall MIc achieved a precision of 44.9% when only surface residues were considered. Our triangle and reduced alphabet variants of MI highlight the delicate trade-off between signal and noise in the use of MI for domain-domain contact prediction. We also examine a specific “successful” case study and demonstrate that here, when considering surface residues, even the most accurate domain-domain contact predictor, MIc, performs no better than random.
All tested variants of MI are skewed towards predicting surface residues. When considering surface residues only, we find MIc to be the best current MI domain-domain contact predictor. Its performance, however, is not as good as a non-MI based contact predictor, i-Patch. Additionally, the intra-protein contact prediction capabilities of MIc outperform its domain-domain contact prediction abilities.
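For reference, the MI between alignment columns i and j is the sum over residue pairs (a, b) of p(a,b) log[p(a,b) / (p(a) p(b))]. MIp subtracts the average product correction (APC), MI(i,·) MI(j,·) / MI(·,·), where MI(i,·) is the mean MI of column i with all other columns and MI(·,·) is the overall mean. The sketch below computes both from a toy alignment of equal-length strings. It uses raw frequencies without the sequence weighting or pseudocounts that practical implementations usually add, and it does not implement MIc, ZNMI or the new variants described above.

```python
import math
from collections import Counter
from itertools import combinations


def column_mi(msa: list[str], i: int, j: int) -> float:
    """Mutual information (nats) between alignment columns i and j."""
    n = len(msa)
    p_i = Counter(seq[i] for seq in msa)
    p_j = Counter(seq[j] for seq in msa)
    p_ij = Counter((seq[i], seq[j]) for seq in msa)
    # p(a,b) / (p(a) p(b)) written with raw counts is c * n / (c_a * c_b).
    return sum((c / n) * math.log(c * n / (p_i[a] * p_j[b])) for (a, b), c in p_ij.items())


def mip_scores(msa: list[str]) -> dict[tuple[int, int], float]:
    """MIp(i, j) = MI(i, j) - MI(i, .) * MI(j, .) / MI(., .) (average product correction)."""
    length = len(msa[0])
    mi = {(i, j): column_mi(msa, i, j) for i, j in combinations(range(length), 2)}
    # Mean MI of each column with every other column.
    col_mean = [sum(v for pair, v in mi.items() if k in pair) / (length - 1) for k in range(length)]
    overall = sum(mi.values()) / len(mi)
    if overall == 0:
        return dict(mi)  # no signal anywhere, so there is no background to remove
    return {(i, j): v - col_mean[i] * col_mean[j] / overall for (i, j), v in mi.items()}
```

The APC term estimates the part of MI(i, j) explained by each column's general tendency to share information with every column, which is one reason raw MI can favour certain residues, such as the highly variable surface positions discussed above.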
Phosphosignalling pathways are an attractive option for the synthetic biologist looking for a wide repertoire of modular components from which to build. We demonstrate that two-component systems can be used in synthetic biology. However, their potential is limited by the fact that host cells contain many of their own phosphosignalling pathways, and these may interact with, and cross-talk to, the introduced synthetic components. In this paper we also demonstrate a simple bioinformatic tool that can help predict whether interspecies cross-talk between introduced and native two-component signalling pathways will occur, and we show both in vitro and in vivo that the predicted interactions do take place. The ability to predict potential cross-talk before designing and constructing novel pathways or choosing a host organism is essential if the promise that phosphosignalling components hold for synthetic biology is to be realised.
Motivation: Membrane proteins are both abundant and important in cells, but the small number of solved structures restricts our understanding of them. Here we consider whether membrane proteins undergo different substitutions from their soluble counterparts and whether these can be used to improve membrane protein alignments, and therefore improve prediction of their structure.
Results: We construct substitution tables for different environments within membrane proteins. As data are scarce, we develop a general metric to assess the quality of these asymmetric tables. Membrane proteins show markedly different substitution preferences from soluble proteins. For example, substitution preferences in lipid tail-contacting parts of membrane proteins are found to be distinct from all environments in soluble proteins, including buried residues. A principal component analysis of the tables identifies the greatest variation in substitution preferences as being due to changes in hydrophobicity; the second largest variation relates to secondary structure. We demonstrate the use of our tables in pairwise sequence-to-structure alignments (also known as ‘threading’) of membrane proteins using the FUGUE alignment program. On average, in the 10–25% sequence identity range, alignments are improved by 28 correctly aligned residues compared with alignments made using FUGUE's default substitution tables. Our alignments also lead to improved structural models.
Availability: Substitution tables are available at: http://www.stats.ox.ac.uk/proteins/resources.
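An environment-specific substitution table records how often a residue of type i in a given structural environment is aligned with residue type j in homologous sequences. Because the environment is defined only on the structure side, such tables are generally asymmetric. A minimal sketch of turning one environment's counts into log-odds scores (in bits) is shown below, assuming a small count matrix, a background distribution and a uniform pseudocount, all hypothetical; the smoothing and scaling used for the published tables may differ.

```python
import math


def log_odds_table(counts: dict[str, dict[str, float]],
                   background: dict[str, float],
                   pseudocount: float = 1.0) -> dict[str, dict[str, float]]:
    """Log-odds scores (bits) for one structural environment.

    counts[i][j]: how often structure residue i (in this environment) is aligned
    with residue j in a homologous sequence. Rows are normalised separately, so
    the resulting table need not be symmetric.
    """
    alphabet = sorted(background)
    table: dict[str, dict[str, float]] = {}
    for i in alphabet:
        row = counts.get(i, {})
        total = sum(row.get(j, 0.0) for j in alphabet) + pseudocount * len(alphabet)
        table[i] = {}
        for j in alphabet:
            p_j_given_i = (row.get(j, 0.0) + pseudocount) / total
            table[i][j] = math.log2(p_j_given_i / background[j])
    return table
```

Positive scores mark substitutions seen more often than the background would predict in that environment; with scarce data the pseudocount dominates sparsely populated rows, which is one reason a quality metric for such tables is useful.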
Motivation: Membrane proteins (MPs) are important drug targets but knowledge of their exact structure is limited to relatively few examples. Existing homology-based structure prediction methods are designed for globular, water-soluble proteins. However, we are now beginning to have enough MP structures to justify the development of a homology-based approach specifically for them.
Results: We present a MP-specific homology-based coordinate generation method, MEDELLER, which is optimized to build highly reliable core models. The method outperforms the popular structure prediction programme Modeller on MPs. The comparison of the two methods was performed on 616 target–template pairs of MPs, which were classified into four test sets by their sequence identity. Across all targets, MEDELLER gave an average backbone root mean square deviation (RMSD) of 2.62 Å versus 3.16 Å for Modeller. On our ‘easy’ test set, MEDELLER achieves an average accuracy of 0.93 Å backbone RMSD versus 1.56 Å for Modeller.
Availability and Implementation: http://medeller.info; Implemented in Python, Bash and Perl CGI for use on Linux systems; Supplementary data are available at http://www.stats.ox.ac.uk/proteins/resources.
Supplementary information: Supplementary data are available at Bioinformatics online.
Motivation: A wealth of protein–protein interaction (PPI) data has recently become available. These data are organized as PPI networks, and an efficient and biologically meaningful method to compare such networks is needed. As a first step, we would like to compare observed networks to established network models in terms of small subgraph counts, as these are conjectured to relate to functional modules in the PPI network. We employ the software tool GraphCrunch with the Graphlet Degree Distribution Agreement (GDDA) score to examine the use of such counts for network comparison.
Results: Our results show that the GDDA score has a pronounced dependency on the number of edges and vertices of the networks being considered. This should be taken into account when testing the fit of models. We provide a method for assessing the statistical significance of the fit between random graph models and biological networks based on non-parametric tests. Using this method, we examine the fit of Erdős–Rényi (ER), ER with fixed degree distribution and geometric (3D) models to PPI networks. Under these rigorous tests, none of these models fits the PPI networks. The GDDA score is not stable in the region of graph density relevant to current PPI networks. We hypothesize that this score instability is due to the networks under consideration having a graph density in the threshold region for the appearance of small subgraphs. This is true for both geometric (3D) and ER random graph models. Such threshold behaviour may be linked to the robustness and efficiency properties of the PPI networks.
Supplementary information: Supplementary data are available at Bioinformatics online.
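The threshold behaviour mentioned above can be made concrete. In an Erdős–Rényi graph with n vertices and edge probability p, the expected number of triangles is C(n,3) p³, so triangles only become common once p is of the order of 1/n. The standard-library sketch below samples a G(n, m) graph with a fixed number of edges, so that size and density can be matched to an observed network, and counts its triangles. It is illustrative only and is neither the GDDA score nor GraphCrunch.

```python
import random
from itertools import combinations
from math import comb


def gnm_random_graph(n: int, m: int, seed=None) -> dict[int, set[int]]:
    """Erdős–Rényi G(n, m): m distinct edges chosen uniformly among all vertex pairs."""
    rng = random.Random(seed)
    # Enumerating all pairs is fine for the small n used in an illustration.
    edges = rng.sample(list(combinations(range(n), 2)), m)
    adj: dict[int, set[int]] = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def count_triangles(adj: dict[int, set[int]]) -> int:
    """Count each triangle once by requiring u < v < w."""
    return sum(1 for u in adj for v in adj[u] if v > u for w in adj[u] & adj[v] if w > v)


def expected_triangles_gnp(n: int, p: float) -> float:
    """Expected triangle count in G(n, p): C(n, 3) * p**3."""
    return comb(n, 3) * p ** 3
```

For example, with n = 1000 the expected triangle count is about 0.17 at p = 1/1000 but about 166 at p = 10/1000, so small changes in density around this regime produce large changes in small-subgraph counts.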
The idea of “date” and “party” hubs has been influential in the study of protein–protein interaction networks. Date hubs display low co-expression with their partners, whilst party hubs have high co-expression. It was proposed that party hubs are local coordinators whereas date hubs are global connectors. Here, we show that the reported importance of date hubs to network connectivity can in fact be attributed to a tiny subset of them. Crucially, these few, extremely central, hubs do not display particularly low expression correlation, undermining the idea of a link between this quantity and hub function. The date/party distinction was originally motivated by an approximately bimodal distribution of hub co-expression; we show that this feature is not always robust to methodological changes. Additionally, topological properties of hubs do not in general correlate with co-expression. However, we find significant correlations between interaction centrality and the functional similarity of the interacting proteins. We suggest that thinking in terms of a date/party dichotomy for hubs in protein interaction networks is not meaningful, and it might be more useful to conceive of roles for protein-protein interactions rather than for individual proteins.
Proteins are key components of cellular machinery, and most cellular functions are executed by groups of proteins acting in concert. The study of networks formed by protein interactions can help reveal how the complex functionality of cells emerges from simple biochemistry. Certain proteins have a particularly large number of interaction partners; some have argued that these “hubs” are essential to biological function. Previous work has suggested that such hubs can be classified into just two varieties: party hubs, which coordinate a specific cellular process or protein complex; and date hubs, which link together and convey information between different function-specific modules or complexes. In this study, we re-examine the ideas of date and party hubs from multiple perspectives. By computationally partitioning protein interaction networks into functionally coherent subnetworks, we show that the roles of hubs are more diverse than a binary classification allows. We also show that the position of an interaction in the network is related to the functional similarity of the two interacting proteins: the most important interactions holding the network together appear to be between the most dissimilar proteins. Thus, examining interaction roles may be relevant to understanding the organisation of protein interaction networks.
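The date/party distinction rests on a hub's average co-expression with its interaction partners. A minimal sketch, assuming expression profiles are stored as equal-length lists of floats keyed by protein name (hypothetical data), computes the mean Pearson correlation coefficient between a hub and its partners; under the original scheme, a low value would indicate a date hub and a high value a party hub.

```python
import math
from statistics import fmean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length, non-constant expression profiles."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_partner_pcc(hub: str, partners: list[str], expression: dict[str, list[float]]) -> float:
    """Average co-expression of a hub with its interaction partners."""
    return fmean(pearson(expression[hub], expression[p]) for p in partners)
```

Note that averaging hides structure: a hub perfectly co-expressed with one partner and anti-correlated with another scores the same as a hub uncorrelated with both, which is one reason a single per-hub value may say little about the roles of individual interactions.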
Translation of mRNA into protein is a unidirectional information flow process. Analysing the input (mRNA) and output (protein) of translation, we find that local protein structure information is encoded in the mRNA nucleotide sequence. The Coding Sequence and Structure (CSandS) database developed in this work provides a detailed mapping between over 4000 solved protein structures and their mRNA. CSandS facilitates a comprehensive analysis of codon usage over many organisms. In assigning translation speed, we find that relative codon usage is less informative than tRNA concentration. For all speed measures, no evidence was found that domain boundaries are enriched with slow codons. In fact, genes seemingly avoid slow codons around structurally defined domain boundaries. Translation speed, however, does decrease at the transition into secondary structure. Codons are identified that have structural preferences significantly different from the amino acid they encode. However, each organism has its own set of ‘significant codons’. Our results support the premise that codons encode more information than merely amino acids and give insight into the role of translation in protein folding.
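"Relative codon usage" is commonly computed as relative synonymous codon usage (RSCU): the observed count of a codon divided by the mean count over all codons encoding the same amino acid. The abstract does not specify its exact formula, so this is an assumed definition. The sketch restricts the standard genetic code to three amino acids for brevity.

```python
from collections import Counter

# Partial standard genetic code (DNA alphabet), restricted to three amino acids
# for brevity; a real analysis would include every synonymous codon family.
SYNONYMS = {
    "K": ["AAA", "AAG"],
    "F": ["TTT", "TTC"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
}

def rscu(cds):
    """Relative synonymous codon usage for an in-frame coding sequence.

    RSCU = observed codon count / mean count of its synonymous family,
    so 1.0 means no bias. Amino acids absent from the sequence are skipped.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(codons)
    values = {}
    for family in SYNONYMS.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # RSCU undefined when the amino acid never occurs
        expected = total / len(family)
        for c in family:
            values[c] = counts[c] / expected
    return values

usage = rscu("AAAAAAAAGTTTGGT")  # codons: AAA AAA AAG TTT GGT
```

RSCU depends only on codon counts within a gene set, whereas tRNA-based speed measures bring in information about the translational machinery, which is one plausible reason the two can differ in how informative they are.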
Ever since the ground-breaking work of Anfinsen et al., in which a denatured protein was found to refold to its native state, the protein fold prediction community has frequently stated that all the information required for protein folding lies in the amino acid sequence. Recent in vitro experiments and in silico computational studies, however, have shown that cotranslation may affect the folding pathway of some proteins, especially those of ancient folds. In this paper, aspects of cotranslational folding have been incorporated into a protein structure prediction algorithm by adapting the Rosetta program to fold proteins as the nascent chain elongates. This makes it possible to conduct a pairwise comparison of folding accuracy by comparing folds created sequentially from each end of the protein.
A single main result emerged: in 94% of proteins analyzed, following the sense of translation, from N-terminus to C-terminus, produced better predictions than following the reverse sense of translation, from the C-terminus to N-terminus. Two secondary results emerged. First, this superiority of N-terminus to C-terminus folding was more marked for proteins showing stronger evidence of cotranslation and second, an algorithm following the sense of translation produced predictions comparable to, and occasionally better than, Rosetta.
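Because each protein is folded in both directions, the comparison is naturally paired. The abstract does not state which statistical test was used. An exact sign test is one simple way to ask whether a fraction like 94% could arise by chance. The sketch assumes an accuracy score where higher is better (for RMSD, negate it first); the scores below are hypothetical.

```python
from math import comb

def sign_test(n_to_c_scores, c_to_n_scores):
    """Exact two-sided sign test on paired accuracy scores (higher = better).

    Tied pairs are discarded. Returns (fraction of untied pairs favouring
    N->C folding, two-sided p-value).
    """
    pairs = list(zip(n_to_c_scores, c_to_n_scores))
    wins = sum(a > b for a, b in pairs)
    losses = sum(a < b for a, b in pairs)
    n = wins + losses
    if n == 0:
        raise ValueError("all pairs tied; test undefined")
    k = max(wins, losses)
    # Probability of a split at least this extreme under a fair coin.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return wins / n, min(1.0, 2 * tail)

# Hypothetical scores for ten proteins: N->C wins in nine of them.
n_to_c = [0.61, 0.55, 0.70, 0.48, 0.66, 0.59, 0.72, 0.50, 0.64, 0.40]
c_to_n = [0.52, 0.50, 0.61, 0.45, 0.60, 0.51, 0.70, 0.47, 0.58, 0.44]
fraction, p_value = sign_test(n_to_c, c_to_n)
```

The sign test ignores the size of each improvement, so it is robust to noisy predictions but says nothing about how much better one direction is.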
There is a directionality effect in protein fold prediction. At present, prediction methods appear to be too noisy to take advantage of this effect; as techniques refine, it may be possible to draw benefit from a sequential approach to protein fold prediction.
Chemotaxis is the process by which motile bacteria sense their chemical environment and move towards more favourable conditions. Escherichia coli utilises a single sensory pathway, but little is known about signalling pathways in species with more complex systems.
To investigate whether chemotaxis pathways in other bacteria follow the E. coli paradigm, we analysed 206 species encoding at least one homologue of each of the five core chemotaxis proteins (CheA, CheB, CheR, CheW and CheY). Of these species, 61 encode more than one homologue of every one of the five proteins, suggesting that they have multiple chemotaxis pathways. Operon information is not available for most bacteria, so we developed a novel statistical approach to cluster che genes into putative operons. Using operon-based models, we reconstructed putative chemotaxis pathways for all 206 species. We show that cheA-cheW and cheR-cheB show a strong preference for occurring in the same operon as two-gene blocks, which may reflect a functional requirement for co-transcription. However, other che genes, most notably cheY, are more dispersed on the genome. Comparison of our operons with shuffled equivalents suggests that specific patterns of genomic location may be a determining factor for the observed in vivo chemotaxis pathways.
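The statistical operon-clustering method described above is not detailed in the abstract. For orientation only, the sketch below shows a much simpler, widely used heuristic: group consecutive genes on the same strand whose intergenic gap is at most a threshold. It is not the authors' method; `max_gap=50` and the gene coordinates are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    start: int   # 1-based start coordinate
    end: int     # 1-based end coordinate (inclusive)
    strand: str  # "+" or "-"

def putative_operons(genes, max_gap=50):
    """Group genes into putative operons: consecutive same-strand genes whose
    intergenic gap (bp) is at most max_gap. A simple distance heuristic."""
    operons = []
    for g in sorted(genes, key=lambda g: g.start):
        prev = operons[-1][-1] if operons else None
        if prev is not None and prev.strand == g.strand and g.start - prev.end - 1 <= max_gap:
            operons[-1].append(g)
        else:
            operons.append([g])
    return [[g.name for g in op] for op in operons]

def same_operon(operons, a, b):
    """True if genes a and b fall in the same putative operon."""
    return any(a in op and b in op for op in operons)

# Hypothetical che locus: cheA-cheW and cheR-cheB blocks, cheY elsewhere.
genes = [
    Gene("cheA", 100, 2000, "+"),
    Gene("cheW", 2021, 2500, "+"),
    Gene("cheR", 3000, 3800, "+"),
    Gene("cheB", 3830, 4900, "+"),
    Gene("cheY", 9000, 9400, "-"),
]
operons = putative_operons(genes)
```

Counting how often pairs such as cheA-cheW satisfy `same_operon` across genomes, and comparing those counts with gene orders shuffled at random, is one way to test for co-location preferences like those reported above.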
We then examined the chemotaxis pathways of Rhodobacter sphaeroides. In this species, the PpfA protein is known to be critical for correct partitioning of proteins in the cytoplasmically localised pathway. We found ppfA in che operons of many species, suggesting that partitioning of cytoplasmic Che protein clusters is common. We also examined the apparently atypical chemotaxis components CheA3, CheA4 and CheY6. We found that, although CheA variants are rare, the CheY6 variant may be a common type of CheY, with a significantly disordered C-terminal region that may be functionally important.
We find that many bacterial species potentially have multiple chemotaxis pathways, with grouping of che genes into operons likely to be a major factor in keeping signalling pathways distinct. Gene order is highly conserved with cheA-cheW and cheR-cheB blocks, perhaps reflecting functional linkage. CheY behaves differently to other Che proteins, both in its genomic location and its putative protein interactions, which should be considered when modelling chemotaxis pathways.
Motivation: Functional module detection within protein interaction networks is a challenging problem due to the sparsity of data and presence of errors. Computational techniques for this task range from purely graph theoretical approaches involving single networks to alignment of multiple networks from several species. Current network alignment methods all rely on protein sequence similarity to map proteins across species.
Results: Here we carry out network alignment using a protein functional similarity measure. We show that using functional similarity to map proteins across species improves network alignment in terms of functional coherence and overlap with experimentally verified protein complexes. Moreover, the results from functional similarity-based network alignment display little overlap (<15%) with sequence similarity-based alignment. Our combined approach integrating sequence and function-based network alignment alongside graph clustering properties offers a 200% increase in coverage of experimental datasets and comparable accuracy to current network alignment methods.
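The abstract does not define its functional similarity measure. The sketch below assumes a simple stand-in, the Jaccard index over sets of functional annotation terms. It maps each protein in species X to its most functionally similar protein in species Y, then keeps interactions whose mapped partners also interact in Y. The annotation labels, threshold and function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two annotation sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def map_by_function(annot_x, annot_y, threshold=0.5):
    """Map each protein in X to its most functionally similar protein in Y,
    keeping only matches with similarity >= threshold."""
    mapping = {}
    for px, terms_x in annot_x.items():
        best, score = max(((py, jaccard(terms_x, terms_y)) for py, terms_y in annot_y.items()),
                          key=lambda t: t[1])
        if score >= threshold:
            mapping[px] = best
    return mapping

def conserved_edges(edges_x, edges_y, mapping):
    """Interactions in X whose mapped partners also interact in Y."""
    ys = {frozenset(e) for e in edges_y}
    return [(a, b) for a, b in edges_x
            if a in mapping and b in mapping and frozenset((mapping[a], mapping[b])) in ys]

# Toy annotations with hypothetical term labels.
annot_x = {"x1": {"t1", "t2"}, "x2": {"t3"}, "x3": {"t9"}}
annot_y = {"y1": {"t1", "t2"}, "y2": {"t3", "t4"}, "y3": {"t5"}}
mapping = map_by_function(annot_x, annot_y)
aligned = conserved_edges([("x1", "x2"), ("x2", "x3")], [("y1", "y2")], mapping)
```

Swapping `jaccard` for a sequence-similarity score would give a sequence-based mapping, which makes the two kinds of alignment straightforward to compare or combine.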
Availability: Program binaries and source code are freely available at http://www.stats.ox.ac.uk/research/bioinfo/resources
Supplementary Information: Supplementary data are available at Bioinformatics online.
Summary: iMembrane is a homology-based method, which predicts a membrane protein's position within a lipid bilayer. It projects the results of coarse-grained molecular dynamics simulations onto any membrane protein structure or sequence provided by the user. iMembrane is simple to use and is currently the only computational method allowing the rapid prediction of a membrane protein's lipid bilayer insertion. Bilayer insertion data are essential in the accurate structural modelling of membrane proteins or the design of drugs that target them.
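As a homology-based method, iMembrane transfers bilayer information from annotated templates to a user's protein. The core idea of carrying per-residue annotations across a pairwise alignment can be sketched as below. This is an illustration under assumptions, not iMembrane's implementation. The label alphabet ("O" outside, "H" headgroup, "T" tail) and the function name are hypothetical.

```python
def transfer_labels(query_aln, template_aln, template_labels):
    """Transfer per-residue labels from a template to a query via a pairwise alignment.

    query_aln and template_aln are gapped, equal-length aligned strings ('-' = gap);
    template_labels holds one label per ungapped template residue.
    Query residues aligned to a template gap receive None.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    labels, t_idx = [], 0
    for q, t in zip(query_aln, template_aln):
        if q != "-":
            labels.append(template_labels[t_idx] if t != "-" else None)
        if t != "-":
            t_idx += 1  # advance through the ungapped template residues
    return labels

# Hypothetical alignment: template MKVLL annotated O, O, H, T, T.
query_labels = transfer_labels("MR-ALL", "MKV-LL", list("OOHTT"))
```

Residues in query insertions get no label here; a fuller tool would need a policy for such gaps, for example inheriting labels from flanking residues.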
Availability: http://imembrane.info. iMembrane is available under a non-commercial open-source licence, upon request.
Supplementary information: Supplementary data are available at Bioinformatics online and at http://www.stats.ox.ac.uk/proteins/resources.