PROFtmb predicts transmembrane beta-barrel (TMB) proteins in Gram-negative bacteria. For each query protein, PROFtmb provides both a Z-value indicating that the protein actually contains a membrane barrel, and a four-state per-residue labeling of upward- and downward-facing strands, periplasmic hairpins and extracellular loops. While most users submit individual proteins known to contain TMBs, some groups submit entire proteomes to screen for potential TMBs. Response time is about 4 min for a 500-residue protein. PROFtmb is a profile-based Hidden Markov Model (HMM) with an architecture mirroring the structure of TMBs. The per-residue accuracy on the 8-fold cross-validated testing set is 86%, while whole-protein discrimination accuracy is 70% at 60% coverage. The PROFtmb web server includes all source code, training data and whole-proteome predictions from 78 Gram-negative bacterial genomes and is available freely and without registration at .
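The idea of a Z-value can be illustrated with a generic standard score: how many standard deviations a query's raw score lies above the mean of a background distribution. The sketch below is only that generic calculation. The function name, the background sample and the raw scores are hypothetical, and the abstract does not say how PROFtmb derives its own Z-value.

```python
from statistics import mean, stdev


def z_value(score, background_scores):
    """Standard score of `score` relative to a background sample.

    Generic illustration only; the background sample is an assumption,
    not PROFtmb's actual reference distribution.
    """
    mu = mean(background_scores)
    sigma = stdev(background_scores)  # sample standard deviation
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    return (score - mu) / sigma


# Hypothetical raw scores for non-barrel proteins and one query protein.
background = [1.0, 2.0, 3.0, 4.0, 5.0]
z = z_value(9.0, background)
print(round(z, 2))
```

A large positive value means the query scores far above typical background proteins, which is the sense in which a Z-value supports a whole-protein barrel call.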
Transmembrane β-barrel (TMB) proteins are embedded in the outer membrane of Gram-negative bacteria, mitochondria and chloroplasts. The cellular location and functional diversity of β-barrel outer membrane proteins make them an important protein class. At present, very few non-homologous TMB structures have been determined by X-ray diffraction because of the experimental difficulty of crystallizing transmembrane (TM) proteins. The transFold web server uses pairwise inter-strand residue statistical potentials derived from globular (non-outer-membrane) proteins to predict the supersecondary structure of TMBs. Unlike all previous approaches, transFold does not use machine learning methods such as hidden Markov models or neural networks; instead, transFold employs multi-tape S-attribute grammars to describe all potential conformations, and then applies dynamic programming to determine the global minimum energy supersecondary structure. The transFold web server not only predicts secondary structure and TMB topology, but is the only method that additionally predicts the side-chain orientation of transmembrane β-strand residues, inter-strand residue contacts and TM β-strand inclination with respect to the membrane. The program transFold currently outperforms all other methods for accuracy of β-barrel structure prediction. Available at .
Prediction of the transmembrane strands and topology of β-barrel outer membrane proteins is of interest in current bioinformatics research. Several methods, based on different algorithmic techniques, have been applied to this task so far, and a number of freely available predictors exist. The methods can be broadly divided into those based on Hidden Markov Models (HMMs), on Neural Networks (NNs) and on Support Vector Machines (SVMs). In this work, we compare the different available methods for topology prediction of β-barrel outer membrane proteins. We evaluate their performance on a non-redundant dataset of 20 β-barrel outer membrane proteins of Gram-negative bacteria, with structures known at atomic resolution. We also describe, for the first time, an effective way to combine the individual predictors, at will, into a single consensus prediction method.
We assess the statistical significance of the performance of each prediction scheme and conclude that Hidden Markov Model based methods, HMM-B2TMR, ProfTMB and PRED-TMBB, are currently the best predictors, according to either the per-residue accuracy, the segments overlap measure (SOV) or the total number of proteins with correctly predicted topologies in the test set. Furthermore, we show that the available predictors perform better when only transmembrane β-barrel domains are used for prediction, rather than the precursor full-length sequences, even though the HMM-based predictors are not influenced significantly. The consensus prediction method performs significantly better than each individual available predictor, since it increases the accuracy up to 4% regarding SOV and up to 15% in correctly predicted topologies.
The consensus prediction method described in this work optimizes the predicted topology with a dynamic programming algorithm and is implemented in a web-based application freely available to non-commercial users at .
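One way to picture a consensus step that uses dynamic programming is sketched below. It is a minimal illustration under stated assumptions, not the published algorithm: each predictor supplies a per-residue string over the labels i (periplasmic side), M (membrane strand) and o (extracellular side), each residue is scored by the number of predictors that vote for a label, and a Viterbi-style recursion picks the labeling with the most votes that still obeys a simple barrel grammar (i, strand, o, strand, i, and so on). The function name, the four-state grammar and the toy inputs are all hypothetical.

```python
def consensus_topology(predictions):
    """Combine equal-length label strings ('i', 'M', 'o') into one topology.

    Hypothetical sketch: maximizes the total number of per-residue votes
    subject to the constraint that i and o are always separated by a strand.
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    # U = strand running from i to o, D = strand running from o back to i.
    states = ["i", "U", "o", "D"]
    label_of = {"i": "i", "U": "M", "o": "o", "D": "M"}
    allowed_prev = {"i": ["i", "D"], "U": ["U", "i"],
                    "o": ["o", "U"], "D": ["D", "o"]}

    def votes(pos, state):
        return sum(1 for p in predictions if p[pos] == label_of[state])

    # best[s]: highest vote total of any valid path ending in state s.
    best = {s: votes(0, s) for s in states}
    back = []
    for pos in range(1, length):
        new_best, pointers = {}, {}
        for s in states:
            prev = max(allowed_prev[s], key=lambda q: best[q])
            new_best[s] = best[prev] + votes(pos, s)
            pointers[s] = prev
        best = new_best
        back.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(label_of[s] for s in reversed(path))


# Three toy predictors that disagree at the third residue.
print(consensus_topology(["iiMoo", "iiioo", "iiooo"]))
```

In the toy input a plain majority vote is tied at the third residue, and two of its three possible outcomes would place i directly next to o; the grammar constraint resolves the tie in favour of a strand residue.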
Beta-barrel transmembrane (bbtm) proteins are a functionally important and diverse group of proteins expressed in the outer membranes of bacteria (both Gram-negative and acid-fast Gram-positive), mitochondria and chloroplasts. Despite recent publications describing reasonable levels of accuracy for discriminating between bbtm proteins and other proteins, screening of entire genomes remains troublesome because these molecules constitute only a small fraction of the sequences screened. Therefore, novel methods capable of detecting new families of bbtm proteins in diverse genomes are still required.
We present TMB-Hunt, a program that uses a k-Nearest Neighbour (k-NN) algorithm to discriminate between bbtm and non-bbtm proteins on the basis of their amino acid composition. By including differentially weighted amino acids, evolutionary information and by calibrating the scoring, an accuracy of 92.5% was achieved, with 91% sensitivity and 93.8% positive predictive value (PPV), using a rigorous cross-validation procedure.
A major advantage of this approach is that because it does not rely on beta-strand detection, it does not require resolved structures and thus larger, more representative, training sets could be used. It is therefore believed that this approach will be invaluable in complementing other, physicochemical and homology based methods. This was demonstrated by the correct reassignment of a number of proteins which other predictors failed to classify. We have used the algorithm to screen several genomes and have discussed our findings.
TMB-Hunt achieves a prediction accuracy level better than other approaches published to date. Results were significantly enhanced by use of evolutionary information and a system for calibrating k-NN scoring. Because the program uses a distinct approach to that of other discriminators and thus suffers different liabilities, we believe it will make a significant contribution to the development of a consensus approach for bbtm protein detection.
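The core idea, classifying a sequence from its amino acid composition with a k-nearest-neighbour vote, can be sketched in a few lines. This is a deliberately plain version: it uses unweighted Euclidean distance and omits the differential amino acid weighting, evolutionary information and score calibration that the abstract credits for TMB-Hunt's accuracy. The function names and the toy training sequences are invented for illustration and are not real proteins.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20-dimensional amino acid frequency vector of a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def knn_classify(query, training, k=3):
    """Label `query` by majority vote among its k nearest training sequences."""
    q = composition(query)
    ranked = sorted(
        (math.dist(q, composition(seq)), label) for seq, label in training
    )
    nearest = [label for _, label in ranked[:k]]
    return Counter(nearest).most_common(1)[0][0]


# Invented toy sequences standing in for a labelled training set.
training = [
    ("GYGYTYNAGA", "TMB"),
    ("AYGTYGNYGS", "TMB"),
    ("TYGNAGYSYG", "TMB"),
    ("LLKKEELLKK", "non-TMB"),
    ("EELKKLLEEL", "non-TMB"),
    ("KKLLEEKLLE", "non-TMB"),
]
print(knn_classify("YGTYNGAYGT", training))
```

Because only whole-sequence composition is used, no strand positions are needed for training, which is the property the abstract highlights as allowing larger training sets.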
Outer membrane proteins (OMPs) are the transmembrane proteins found in the outer membranes of Gram-negative bacteria, mitochondria and plastids. Most prediction methods have focused on analogous features, such as alternating hydrophobicity patterns. Here, we start from the observation that almost all β-barrel OMPs are related by common ancestry. We identify proteins as OMPs by detecting their homologous relationships to known OMPs using sequence similarity. Given an input sequence, HHomp builds a profile hidden Markov model (HMM) and compares it with an OMP database by pairwise HMM comparison, integrating OMP predictions by PROFtmb. A crucial ingredient is the OMP database, which contains profile HMMs for over 20 000 putative OMP sequences. These were collected with the exhaustive, transitive homology detection method HHsenser, starting from 23 representative OMPs in the PDB database. In a benchmark on TransportDB, HHomp detects 63.5% of the true positives before including the first false positive. This is 70% more than PROFtmb, four times more than BOMP and 10 times more than TMB-Hunt. In Escherichia coli, HHomp identifies 57 out of 59 known OMPs and correctly assigns them to their functional subgroups. HHomp can be accessed at http://toolkit.tuebingen.mpg.de/hhomp.
Transmembrane β-barrel proteins are a special class of transmembrane proteins which play several key roles in the human body and in disease. Due to experimental difficulties, the number of transmembrane β-barrel proteins with known structures is very small. Over the years, a number of learning-based methods have been introduced for recognition and structure prediction of transmembrane β-barrel proteins. Most of these methods emphasize homology search rather than any biological or chemical basis.
We present a novel graph-theoretic model for classification and structure prediction of transmembrane β-barrel proteins. This model folds proteins based on energy minimization rather than a homology search, avoiding any assumption on availability of training dataset. The ab initio model presented in this paper is the first method to allow for permutations in the structure of transmembrane proteins and provides more structural information than any known algorithm. The model is also able to recognize β-barrels by assessing the pseudo free energy. We assess the structure prediction on 41 proteins gathered from existing databases on experimentally validated transmembrane β-barrel proteins. We show that our approach is quite accurate with over 90% F-score on strands and over 74% F-score on residues. The results are comparable to other algorithms suggesting that our pseudo-energy model is close to the actual physical model. We test our classification approach and show that it is able to reject α-helical bundles with 100% accuracy and β-barrel lipocalins with 97% accuracy.
We show that it is possible to design models for classification and structure prediction of transmembrane β-barrel proteins which depend not on training sets but on combinatorial properties of the structures to be proved. These models are fairly accurate and robust, and can be run very efficiently on PC-like computers. Such models are useful for genome screening.
TMB-Hunt is a program that uses a modified k-nearest neighbour (k-NN) algorithm to classify protein sequences as transmembrane β-barrel (TMB) or non-TMB on the basis of whole sequence amino acid composition. By including differentially weighted amino acids, evolutionary information and by calibrating the scoring, a discrimination accuracy of 92.5% was achieved, as tested using a rigorous cross-validation procedure. The TMB-Hunt web server, available at , allows screening of up to 10 000 sequences in a single query and provides results and key statistics in a simple colour coded format.
Integral membrane proteins constitute about 20–30% of all proteins in the fully sequenced genomes. They come in two structural classes, the α-helical and the β-barrel membrane proteins, which differ in physicochemical characteristics, structure and localization. While transmembrane segment prediction for the α-helical integral membrane proteins appears to be an easy task nowadays, the same is much more difficult for the β-barrel membrane proteins. We developed a method, based on a Hidden Markov Model, capable of predicting the transmembrane β-strands of the outer membrane proteins of Gram-negative bacteria, and of discriminating those from water-soluble proteins in large datasets. The model is trained in a discriminative manner, aiming at maximizing the probability of correct predictions rather than the likelihood of the sequences.
The training has been performed on a non-redundant database of 14 outer membrane proteins with structures known at atomic resolution; it has been tested with a jackknife procedure, yielding a per-residue accuracy of 84.2% and a correlation coefficient of 0.72, whereas for the self-consistency test the per-residue accuracy was 88.1% and the correlation coefficient 0.824. The total number of correctly predicted topologies is 10 out of 14 in the self-consistency test, and 9 out of 14 in the jackknife. Furthermore, the model is capable of discriminating outer membrane from water-soluble proteins in large-scale applications, with a success rate of 88.8% and 89.2% for the correct classification of outer membrane and water-soluble proteins respectively, the highest rates obtained in the literature. That test has been performed independently on a set of known outer membrane proteins with low sequence identity with each other and also with the proteins of the training set.
Based on the above, we developed a strategy that enabled us to screen the entire proteome of E. coli for outer membrane proteins. The results were satisfactory; thus the method presented here appears to be suitable for screening entire proteomes for the discovery of novel outer membrane proteins. A web interface available for non-commercial users is located at: , and it is the only freely available HMM-based predictor for β-barrel outer membrane protein topology.
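The two per-residue figures quoted in this abstract, accuracy and correlation coefficient, can be computed as in the following sketch. It assumes a two-state reading (strand residue, labelled M, versus everything else) and uses the Matthews correlation coefficient; the abstract does not spell out either choice, so both are assumptions, as are the function name and the toy label strings.

```python
import math


def per_residue_metrics(observed, predicted, positive="M"):
    """Return (accuracy, Matthews correlation) for two equal-length label strings.

    Residues labelled `positive` count as transmembrane strand; every other
    label is treated as non-strand (a two-state simplification).
    """
    if len(observed) != len(predicted):
        raise ValueError("label strings must have the same length")
    tp = tn = fp = fn = 0
    for obs, pred in zip(observed, predicted):
        if pred == positive and obs == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif obs == positive:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(observed)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report 0.0 when the coefficient is undefined.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Toy example: the predicted strand is shifted by one residue.
acc, mcc = per_residue_metrics("iiMMMMoooo", "iiiMMMMooo")
print(acc, round(mcc, 3))
```

Reporting both numbers matters because strand residues are a minority in many sequences, so accuracy alone can look high for a predictor that rarely predicts strands.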
The β-barrel outer membrane proteins constitute one of the two known structural classes of membrane proteins. Whereas there are several different web-based predictors for α-helical membrane proteins, currently there is no freely available prediction method for β-barrel membrane proteins, at least with an acceptable level of accuracy. We present here a web server (PRED-TMBB, http://bioinformatics.biol.uoa.gr/PRED-TMBB) which is capable of predicting the transmembrane strands and the topology of β-barrel outer membrane proteins of Gram-negative bacteria. The method is based on a Hidden Markov Model, trained according to the Conditional Maximum Likelihood criterion. The model was retrained and the training set now includes 16 non-homologous outer membrane proteins with structures known at atomic resolution. The user may submit one sequence at a time and has the option of choosing between three different decoding methods. The server reports the predicted topology of a given protein, a score indicating the probability of the protein being an outer membrane β-barrel protein, posterior probabilities for the transmembrane strand prediction and a graphical representation of the assumed position of the transmembrane strands with respect to the lipid bilayer.
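Decoding turns a trained HMM into a per-residue prediction. The sketch below shows Viterbi decoding, one standard decoding method for HMMs, on a toy two-state model; the abstract does not name the three decoding methods the server offers, so this is a general illustration rather than a description of PRED-TMBB. The states (L for loop, M for strand), the two-letter residue alphabet (h for hydrophobic, p for polar) and all probabilities are invented.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path for a sequence of observations.

    Works in log space; assumes every probability used is strictly positive.
    """
    score = {
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }
    back = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda q: score[q] + math.log(trans_p[q][s]))
            new_score[s] = (
                score[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs])
            )
            pointers[s] = prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


# Invented toy parameters: strands prefer hydrophobic residues, loops polar ones.
states = ["L", "M"]
start_p = {"L": 0.5, "M": 0.5}
trans_p = {"L": {"L": 0.8, "M": 0.2}, "M": {"L": 0.2, "M": 0.8}}
emit_p = {"L": {"h": 0.2, "p": 0.8}, "M": {"h": 0.8, "p": 0.2}}

print(viterbi("ppphhhhppp", states, start_p, trans_p, emit_p))
```

Viterbi returns the single best path through the model, whereas the posterior probabilities the server reports describe, residue by residue, how strongly the model supports a strand assignment.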
Recently we reported a nanocontainer-based, reduction-triggered release system through an engineered transmembrane channel (FhuA Δ1-160; Onaca et al., 2008). Compound fluxes within the FhuA Δ1-160 channel protein are controlled sterically through labeled lysine residues (label: 3-(2-pyridyldithio)propionic-acid-N-hydroxysuccinimide-ester). Quantifying the steric contribution of each labeled lysine would open up an opportunity for designing compound-specific drug release systems.
In total, 12 FhuA Δ1-160 variants were generated to gain insights on sterically controlled compound fluxes: Subset A) six FhuA Δ1-160 variants in which one of the six lysines in the interior of FhuA Δ1-160 was substituted to alanine and Subset B) six FhuA Δ1-160 variants in which only one lysine inside the barrel was not changed to alanine. Translocation efficiencies were quantified with the colorimetric TMB (3,3',5,5'-tetramethylbenzidine) detection system employing horseradish peroxidase (HRP). Investigation of the six subset A variants identified position K556A as sterically important. The K556A substitution increases TMB diffusion from 15 to 97 [nM]/s and reaches nearly the TMB diffusion value of the unlabeled FhuA Δ1-160 (102 [nM]/s). The prominent role of position K556 is confirmed by the corresponding subset B variant which contains only the K556 lysine in the interior of the barrel. Pyridyl labeling of K556 reduces TMB translocation to 16 [nM]/s reaching nearly background levels in liposomes (13 [nM]/s). A first B-factor analysis based on MD simulations confirmed that position K556 is the least fluctuating lysine among the six in the channel interior of FhuA Δ1-160 and therefore well suited for controlling compound fluxes through steric hindrance.
A FhuA Δ1-160-based, reduction-triggered release system has been shown to control the compound flux through a single steric hindrance inside the channel, introduced by 3-(2-pyridyldithio)propionic-acid labeling (amino acid position K556). As a consequence, the release kinetics can be modulated by introducing an appropriate number of hindrances. The FhuA Δ1-160 channel embedded in liposomes can be advanced to a universal and compound-independent release system which allows size-selective compound release through rationally re-engineered channels.
Experimental high-throughput studies of protein–protein interactions are beginning to provide enough data for comprehensive computational studies. Today, about ten large data sets, each with thousands of interacting pairs, coarsely sample the interactions in fly, human, worm, and yeast. Another about 55,000 pairs of interacting proteins have been identified by more careful, detailed biochemical experiments. Most interactions are experimentally observed in prokaryotes and simple eukaryotes; very few interactions are observed in higher eukaryotes such as mammals. It is commonly assumed that pathways in mammals can be inferred through homology to model organisms, e.g. the experimental observation that two yeast proteins interact is transferred to infer that the two corresponding proteins in human also interact. Two pairs for which the interaction is conserved are often described as interologs. The goal of this investigation was a large-scale comprehensive analysis of such inferences, i.e. of the evolutionary conservation of interologs. Here, we introduced a novel score for measuring the overlap between protein–protein interaction data sets. This measure appeared to reflect the overall quality of the data and was the basis for our two surprising results from our large-scale analysis. Firstly, homology-based inferences of physical protein–protein interactions appeared far less successful than expected. In fact, such inferences were accurate only for extremely high levels of sequence similarity. Secondly, and most surprisingly, the identification of interacting partners through sequence similarity was significantly more reliable for protein pairs within the same organism than for pairs between species. Our analysis underlined that the discrepancies between different datasets are large, even when using the same type of experiment on the same organism. This reality considerably constrains the power of homology-based transfer of interactions. 
In particular, the experimental probing of interactions in distant model organisms has to be undertaken with some caution. More comprehensive images of protein–protein networks will require the combination of many high-throughput methods, including in silico inferences and predictions. http://www.rostlab.org/results/2006/ppi_homology/
The IntAct database contains about ten large-scale data sets of protein–protein interactions. Each set contains thousands of experimentally observed pair interactions. Most pairs were observed in yeast (Saccharomyces cerevisiae), fly (Drosophila melanogaster), and worm (Caenorhabditis elegans). These species are often treated as model organisms in the sense that one can infer that two mouse proteins interact if one experimentally observes the two corresponding proteins in worm to interact. Here, the authors analyzed in detail how the sequence signals of physical protein–protein interactions are conserved. It is a common assumption that protein–protein interactions can easily be inferred through homology transfer from one model organism to another organism of interest. Here, the authors demonstrated that such homology transfers are only accurate at unexpectedly high levels of sequence identity. Even more surprisingly, homology transfers of protein–protein interactions are significantly more reliable for protein pairs from the same species than for protein pairs from different organisms. The observation that interactions were much more conserved within than across species was valid for all levels of sequence similarity, i.e. for very similar as well as for more diverged interologs.
Objective. We sought to evaluate the accuracy of transperineal mapping biopsy (TMB) by comparing it to the pathology specimen of patients who underwent radical prostatectomy (RP) for localized prostate cancer. Methods. From March 2007 to September 2009, 78 men at a single center underwent TMB; 17 of 78 subsequently underwent RP. TMB cores were grouped into four quadrants and matched to data from RP whole-mount slides. Gleason score, tumor location and volume, cross-sectional area, and maximal diameter were measured; sensitivity and specificity were assessed. Results. For the 17 patients who underwent RP, TMB revealed 12 (71%) had biopsy Gleason grades ≥ 3 + 4 and 13 (76%) had bilateral disease. RP specimens showed 14 (82%) had Gleason scores ≥ 3 + 4 and 13 (76%) had bilateral disease. Sensitivity and specificity of TMB for prostate cancer detection were 86% (95% confidence interval [CI] 72%–94%) and 83% (95% CI 62%–95%), respectively. Four quadrants negative for cancer on TMB were positive on prostatectomy, and six positive on TMB were negative on prostatectomy. Conclusion. TMB is a highly invasive procedure that can accurately detect and localize prostate cancer. These findings help establish baseline performance characteristics for TMB and its utility for organ-sparing strategies.
Channel proteins like the engineered FhuA Δ1-159 often cannot insert into thick polymeric membranes due to a mismatch between the hydrophobic surface of the protein and the hydrophobic surface of the polymer membrane. To address this problem usually specific block copolymers are synthesized to facilitate protein insertion. Within this study in a reverse approach we match the protein to the polymer instead of matching the polymer to the protein.
To increase the FhuA Δ1-159 hydrophobic surface by 1 nm, the last 5 amino acids of each of the 22 β-sheets, prior to the more regular periplasmic β-turns, were doubled, leading to an extended FhuA Δ1-159 (FhuA Δ1-159 Ext). The secondary structure prediction and CD spectroscopy indicate the β-barrel folding of FhuA Δ1-159 Ext. The insertion and functionality of FhuA Δ1-159 Ext within a nanocontainer polymeric membrane based on the triblock copolymer PIB1000-PEG6000-PIB1000 (PIB = polyisobutylene, PEG = polyethyleneglycol) have been demonstrated by kinetic analysis using the HRP-TMB assay (HRP = horseradish peroxidase, TMB = 3,3',5,5'-tetramethylbenzidine). Identical experiments with the unmodified FhuA Δ1-159 show no kinetics and presumably no insertion into the PIB1000-PEG6000-PIB1000 membrane. Furthermore, labeling of the Lys-NH2 groups present in the FhuA Δ1-159 Ext channel makes it possible to control the in/out flux of substrates and products from the nanocontainer.
Using a simple "semi rational" approach the protein's hydrophobic transmembrane region was increased by 1 nm, leading to a predicted lower hydrophobic mismatch between the protein and polymer membrane, minimizing the insertion energy penalty. The strategy of adding amino acids to the FhuA Δ1-159 Ext hydrophobic part can be further expanded to increase the protein's hydrophobicity, promoting the efficient embedding into thicker/more hydrophobic block copolymer membranes.
Research on therapeutic massage bodywork (TMB) continues to expand, but few studies consider how research or knowledge translation may be affected by the lack of uniformly standardized competencies for most TMB therapies, by practitioner variability from training in different forms of TMB, or from the effects of experience on practice.
This study explores and describes how TMB practitioners practice, for the purpose of improving TMB training, practice, and research.
Participants & Setting
19 TMB practitioners trained in multiple TMB therapies, in Alberta, Canada.
Qualitative descriptive sub-analysis of interviews from a comprehensive project on the training and practice of TMB, focused on the delivery of TMB therapies in practice.
Two broad themes emerged from the data: (1) every treatment is individualized, and (2) each practitioner’s practice of TMB therapies evolves. Individualization involves adapting treatment to the needs of the patient in the moment, based on deliberate and unconscious responses to verbal and nonverbal cues. Individualization starts with initial assessment and continues throughout the treatment encounter. Expertise is depicted as more nuanced and skilful individualization and treatment, evolved through experience, ongoing training, and spontaneous technique exploration. Practitioners consider such individualization and development of experience desirable. Furthermore, ongoing training and experience result in therapy application unique to each practitioner. Most practitioners believed they could not apply a TMB therapy without influence from other TMB therapies they had learned.
There are ramifications for research design, knowledge translation, and education. Few practitioners are likely able to administer treatments in the same way, and most would not like to practice without being able to individualize treatment. TMB clinical studies need to employ research methods that accommodate the complexity of clinical practice. TMB education should facilitate the maturation of practice skills and self-reflection, including the mindful integration of multiple TMB therapies.
complementary therapies/methods; massage; musculoskeletal manipulations; clinical competence; decision-making; qualitative research; clinical practice
Hidden Markov models (HMMs) have been successfully applied to the tasks of transmembrane protein topology prediction and signal peptide prediction. In this paper we expand upon this work by making use of the more powerful class of dynamic Bayesian networks (DBNs). Our model, Philius, is inspired by a previously published HMM, Phobius, and combines a signal peptide submodel with a transmembrane submodel. We introduce a two-stage DBN decoder that combines the power of posterior decoding with the grammar constraints of Viterbi-style decoding. Philius also provides protein type, segment, and topology confidence metrics to aid in the interpretation of the predictions. We report a relative improvement of 13% over Phobius in full-topology prediction accuracy on transmembrane proteins, and a sensitivity and specificity of 0.96 in detecting signal peptides. We also show that our confidence metrics correlate well with the observed precision. In addition, we have made predictions on all 6.3 million proteins in the Yeast Resource Center (YRC) database. This large-scale study provides an overall picture of the relative numbers of proteins that include a signal-peptide and/or one or more transmembrane segments as well as a valuable resource for the scientific community. All DBNs are implemented using the Graphical Models Toolkit. Source code for the models described here is available at http://noble.gs.washington.edu/proj/philius. A Philius Web server is available at http://www.yeastrc.org/philius, and the predictions on the YRC database are available at http://www.yeastrc.org/pdr.
Transmembrane proteins control the flow of information and substances into and out of the cell and are involved in a broad range of biological processes. Their interfacing role makes them rewarding drug targets, and it is estimated that more than 50% of recently launched drugs target membrane proteins. However, experimentally determining the three-dimensional structure of a transmembrane protein is still a difficult task, and few of the currently known tertiary structures are of transmembrane proteins despite the fact that as many as one quarter of the proteins in a given organism are transmembrane proteins. Computational methods for predicting the basic topology of a transmembrane protein are therefore of great interest, and these methods must be able to distinguish between mature, membrane-spanning proteins and proteins that, when first synthesized, contain an N-terminal membrane-spanning signal peptide. In this work, we present Philius, a new computational approach that outperforms previous methods in simultaneously detecting signal peptides and correctly predicting the topology of transmembrane proteins. Philius also supplies a set of confidence scores with each prediction. A Philius Web server is available to the public as well as precomputed predictions for over six million proteins in the Yeast Resource Center database.
Beta-barrel membrane proteins (MP) are found in Gram-negative bacteria, mitochondria and chloroplasts. They play important roles in metabolism of bacteria, where they are involved in transport of solutes in and out of the cell. Beta-barrel proteins may also act as proteases, lipases and may be important for cell-cell adhesion. Currently, there are about 30 non-redundant solved structures of β-barrels. Although the number of β-barrel folds is fairly small, it is possible to expand the amount of available structural information by homology modeling using existing structures as templates. The scope of structure prediction may be widened by finding remote homologues of the existing structures. To improve the sensitivity of the database searches and the quality of sequence alignments, we first study evolutionary history of transmembrane segments of 7 β-barrel membrane proteins by estimating substitution rates with a Bayesian Monte Carlo approach. Next, we calculate amino acid substitution matrices, beta-barrel Transmembrane scoring Matrices (bbTM), specifically tuned for TM regions, which can be used to detect remote homologues. We then test bbTM matrices by comparing their performance with membrane-protein derived scoring matrices PHAT and SLIM. Our results demonstrate that bbTM matrices have higher selectivity towards transmembrane β-barrel proteins and may be used with higher confidence in database searches for remote homologues of this class of proteins.
Substitution rate; scoring matrices; beta barrel membrane proteins; bioinformatics
This work describes the development of a program that predicts whether or not a polypeptide sequence from a Gram-negative bacterium is an integral β-barrel outer membrane protein. The program, called the β-barrel Outer Membrane protein Predictor (BOMP), uses two separate components to recognize integral β-barrel proteins. The first component is a C-terminal pattern typical of many integral β-barrel proteins. The second component calculates an integral β-barrel score of the sequence based on the extent to which the sequence contains stretches of amino acids typical of transmembrane β-strands. The precision of the predictions was found to be 80% with a recall of 88% when tested on the proteins with SwissProt annotated subcellular localization in Escherichia coli K-12 (788 sequences) and Salmonella typhimurium (366 sequences). When tested on the predicted proteome of E. coli, BOMP found 103 of a total of 4346 polypeptide sequences to be possible integral β-barrel proteins. Of these, 36 were found by BLAST to lack similarity (E-value score < 1e−10) to proteins with annotated subcellular localization in SwissProt. BOMP predicted the content of integral β-barrels per predicted proteome of 10 different bacteria to range from 1.8 to 3%. BOMP is available at http://www.bioinfo.no/tools/bomp.
Motivation: We previously reported the development of a highly accurate statistical algorithm for identifying β-barrel outer membrane proteins or transmembrane β-barrels (TMBBs), from genomic sequence data of Gram-negative bacteria (Freeman,T.C. and Wimley,W.C. (2010) Bioinformatics, 26, 1965–1974). We have now applied this identification algorithm to all available Gram-negative bacterial genomes (over 600 chromosomes) and have constructed a publicly available, searchable, up-to-date, database of all proteins in these genomes.
Results: For each protein in the database, there is information on (i) β-barrel membrane protein probability for identification of β-barrels, (ii) β-strand and β-hairpin propensity for structure and topology prediction, (iii) signal sequence score because most TMBBs are secreted through the inner membrane translocon and, thus, have a signal sequence, and (iv) transmembrane α-helix predictions, for reducing false positive predictions. This information is sufficient for the accurate identification of most β-barrel membrane proteins in these genomes. In the database there are nearly 50 000 predicted TMBBs (out of 1.9 million total putative proteins). Of those, more than 15 000 are ‘hypothetical’ or ‘putative’ proteins, not previously identified as TMBBs. This wealth of genomic information is not available anywhere else.
Availability: The TMBB genomic database is available at http://beta-barrel.tulane.edu/.
β-barrel membrane proteins play an important role in controlling the exchange and transport of ions and organic molecules across bacterial and mitochondrial outer membranes. They are also major regulators of apoptosis and are important determinants of bacterial virulence. In contrast to α-helical membrane proteins, their evolutionary pattern of residue substitutions has not been quantified, and there are no scoring matrices appropriate for their detection through sequence alignment. Using a Bayesian Monte Carlo estimator, we have calculated the instantaneous substitution rates of transmembrane domains of bacterial β-barrel membrane proteins. The scoring matrices constructed from the estimated rates, called bbTM for β-barrel Transmembrane Matrices, significantly improve the sensitivity in detecting homologs of β-barrel membrane proteins, while avoiding erroneous selection of both soluble proteins and other membrane proteins of similar composition. The estimated evolutionary patterns are general and can detect β-barrel membrane proteins very remote from those used for substitution rate estimation. Furthermore, despite the separation of 2–3 billion years since the proto-mitochondrion entered the proto-eukaryotic cell, mitochondrial outer membrane proteins in eukaryotes can also be detected accurately using these scoring matrices derived from bacteria. This is consistent with the suggestion that there are no eukaryote-specific signals for translocation. With these matrices, remote homologs of β-barrel membrane proteins with known structures can be reliably detected at genome scale, allowing construction of high quality structural models of their transmembrane domains, at the rate of 131 structures per template protein. The scoring matrices will be useful for identification, classification, and functional inference of membrane proteins from genome and metagenome sequencing projects.
The estimated substitution pattern will also help to identify key elements important for the structural and functional integrity of β-barrel membrane proteins, and will aid in the design of mutagenesis studies.
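One common way to turn instantaneous substitution rates into an alignment scoring matrix is to exponentiate the rate matrix and take log-odds against the background frequencies. The sketch below shows that construction on a toy three-letter alphabet. The abstract does not state the exact formula used for bbTM, so the construction, the function names (`build_rate_matrix`, `mat_exp`, `log_odds`), and all numbers here are illustrative assumptions.

```python
import math


def build_rate_matrix(exch, pi):
    """Reversible rate matrix: Q[i][j] = exch[i][j] * pi[j], rows sum to 0."""
    n = len(pi)
    q = [[exch[i][j] * pi[j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])
    return q


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, terms=30):
    """exp(Q*t) by truncated Taylor series; fine for small toy matrices."""
    n = len(q)
    a = [[q[i][j] * t for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        for i in range(n):
            for j in range(n):
                result[i][j] += term[i][j]
    return result


def log_odds(q, pi, t):
    """Score S[i][j] = log2(P_ij(t) / pi_j) at evolutionary time t."""
    p = mat_exp(q, t)
    n = len(pi)
    return [[math.log2(p[i][j] / pi[j]) for j in range(n)] for i in range(n)]


# Toy data (hypothetical): background frequencies and symmetric exchangeabilities.
pi = [0.5, 0.3, 0.2]
exch = [[0.0, 1.0, 0.2],
        [1.0, 0.0, 0.5],
        [0.2, 0.5, 0.0]]
q = build_rate_matrix(exch, pi)
scores = log_odds(q, pi, t=0.5)
for row in scores:
    print(["%.2f" % s for s in row])
```

Because the toy rate matrix is reversible, the resulting scores are symmetric and the diagonal (identity) scores are positive, which is the shape expected of a substitution scoring matrix.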
The identification of β-barrel membrane proteins out of a genomic/proteomic background is one of the rapidly developing fields in bioinformatics. Our main goal is the prediction of such proteins in genome/proteome wide analyses.
For the prediction of β-barrel membrane proteins within prokaryotic proteomes, a set of parameters was developed. We focused on a procedure with a low false positive rate, alongside a procedure with the lowest overall false prediction rate, to obtain high certainty for the predicted sequences. We demonstrate that the discrimination between β-barrel membrane proteins and other proteins is improved by analyzing a length-limited region. The developed set of parameters is applied to the proteome of E. coli and the results are compared to four other described procedures.
Analyzing the β-barrel membrane proteins revealed the presence of a defined membrane-inserted β-barrel region. This information can now be used to refine other prediction programs as well. So far, all tested programs fail to predict outer membrane proteins in the proteome of the prokaryote E. coli with high reliability. However, the reliability of the prediction is improved significantly by combining several programs. The consequences and usability of the developed scores are discussed.
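Combining several prediction programs can be illustrated with a simple vote count: a protein is accepted only if enough programs independently flag it. This is a minimal sketch of that idea, not the scoring scheme developed in the work above; the program names, protein identifiers, and the `min_votes` threshold are all hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def consensus(predictions: Dict[str, Set[str]], min_votes: int) -> Set[str]:
    """Return protein IDs flagged as beta-barrels by at least min_votes programs."""
    votes = Counter(pid for hits in predictions.values() for pid in hits)
    return {pid for pid, n in votes.items() if n >= min_votes}


# Hypothetical outputs of three programs on the same proteome.
predictions = {
    "program_A": {"P001", "P002", "P003"},
    "program_B": {"P002", "P003", "P004"},
    "program_C": {"P003", "P005"},
}

print(sorted(consensus(predictions, min_votes=2)))
```

Raising `min_votes` trades recall for fewer false positives, which mirrors the emphasis on high certainty described above.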
Transmembrane proteins have important roles in cells, as they are involved in energy production, signal transduction, cell-cell interaction, cell-cell communication and more. In human cells, they are frequently targets for pharmaceuticals; therefore, knowledge about their properties and structure is crucial. The topology of transmembrane proteins provides low-resolution structural information, which can be a starting point for either laboratory experiments or modelling their 3D structures.
Here, we present a database of the human α-helical transmembrane proteome, including the predicted and/or experimentally established topology of each transmembrane protein, together with the reliability of the prediction. In order to distinguish transmembrane proteins in the proteome as well as for topology prediction, we used a newly developed consensus method (CCTOP) that incorporates recent state-of-the-art methods, with tested accuracies on a novel human benchmark protein set. CCTOP utilizes all available structure and topology data as well as bioinformatics evidence for topology prediction in a probabilistic framework provided by the hidden Markov model. This method shows the highest accuracy (98.5% for discriminating between transmembrane and non-transmembrane proteins and 84% for per protein topology prediction) among the dozen tested topology prediction methods. Analysis of the human proteome with CCTOP indicates that it contains 4998 (26%) transmembrane proteins. Besides predicting topology, reliability of the predictions is estimated as well, and it is demonstrated that the per protein prediction accuracies of more than 60% of the predictions are over 98% on the benchmark sets and most probably on the predicted human transmembrane proteome too.
Here, we present the most accurate prediction of the human transmembrane proteome together with the experimental topology data. These data, as well as various statistics about the human transmembrane proteins and their topologies can be downloaded from and can be visualized at the website of the human transmembrane proteome (http://htp.enzim.hu).
This article was reviewed by Dr. Sandor Pongor, Dr. Michael Galperin and Dr. Pascale Gaudet (nominated by Dr Michael Galperin).
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0061-x) contains supplementary material, which is available to authorized users.
Transmembrane protein; Topology prediction; Hidden Markov model; Constrained prediction
The prediction of protein sub-cellular localization is an important step toward elucidating protein function. For each query protein sequence, LocTree2 applies machine learning (profile kernel SVM) to predict the native sub-cellular localization in 18 classes for eukaryotes, in six for bacteria and in three for archaea. The method outputs a score that reflects the reliability of each prediction. LocTree2 has performed on par with or better than any other state-of-the-art method. Here, we report the availability of LocTree3 as a public web server. The server includes the machine learning-based LocTree2 and improves over it through the addition of homology-based inference. Assessed on sequence-unique data, LocTree3 reached an 18-state accuracy Q18 = 80 ± 3% for eukaryotes and a six-state accuracy Q6 = 89 ± 4% for bacteria. The server accepts submissions ranging from single protein sequences to entire proteomes. Response time of the unloaded server is about 90 s for a 300-residue eukaryotic protein and a few hours for an entire eukaryotic proteome not considering the generation of the alignments. For over 1000 entirely sequenced organisms, the predictions are directly available as downloads. The web server is available at http://www.rostlab.org/services/loctree3.
Prediction of transmembrane (TM) helices by statistical methods suffers from lack of sufficient training data. Current best methods use hundreds or even thousands of free parameters in their models which are tuned to fit the little data available for training. Further, they are often restricted to the generally accepted topology "cytoplasmic-transmembrane-extracellular" and cannot adapt to membrane proteins that do not conform to this topology. Recent crystal structures of channel proteins have revealed novel architectures showing that the above topology may not be as universal as previously believed. Thus, there is a need for methods that can better predict TM helices even in novel topologies and families.
Here, we describe a new method "TMpro" to predict TM helices with high accuracy. To avoid overfitting to existing topologies, we have collapsed cytoplasmic and extracellular labels to a single state, non-TM. TMpro is a binary classifier which predicts TM or non-TM using multiple amino acid properties (charge, polarity, aromaticity, size and electronic properties) as features. The features are extracted from sequence information by applying the framework used for latent semantic analysis of text documents and are input to neural networks that learn the distinction between TM and non-TM segments. The model uses only 25 free parameters. In benchmark analysis TMpro achieves 95% segment F-score corresponding to 50% reduction in error rate compared to the best methods not requiring an evolutionary profile of a protein to be known. Performance is also improved when applied to more recent and larger high resolution datasets PDBTM and MPtopo. TMpro predictions in membrane proteins with unusual or disputed TM structure (K+ channel, aquaporin and HIV envelope glycoprotein) are discussed.
TMpro uses very few free parameters in modeling TM segments as opposed to the very large number of free parameters used in state-of-the-art membrane prediction methods, yet achieves very high segment accuracies. This is highly advantageous considering that high resolution transmembrane information is available only for very few proteins. The greatest impact of TMpro is therefore expected in the prediction of TM segments in proteins with novel topologies. Further, the paper introduces a novel method of extracting features from protein sequence, namely latent semantic analysis. The success of this approach in the current context suggests that it can find potential applications in other sequence-based analysis problems.
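The link between the 95% segment F-score and the 50% reduction in error rate reported for TMpro can be made explicit with a short calculation. The sketch assumes the usual harmonic-mean definition of the F-score and treats the error as 1 minus the score. The 90% baseline is inferred from those two figures and is not stated in the abstract.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of segment precision and segment recall."""
    return 2 * precision * recall / (precision + recall)


def relative_error_reduction(old_score: float, new_score: float) -> float:
    """Fractional drop in error, with error defined as 1 - score."""
    old_err = 1.0 - old_score
    new_err = 1.0 - new_score
    return (old_err - new_err) / old_err


# Assumed baseline of 0.90 (inferred, not reported) versus TMpro's 0.95.
print(f"{relative_error_reduction(0.90, 0.95):.0%}")

# Hypothetical precision/recall pair that yields an F-score of 0.95.
print(f"{f_score(0.95, 0.95):.2f}")
```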
Type IV pili (T4Ps) are surface appendages used by Gram-negative and Gram-positive pathogens for motility and attachment to epithelial surfaces. In Gram-negative bacteria, such as the important pediatric pathogen enteropathogenic Escherichia coli (EPEC), during extension and retraction, the pilus passes through an outer membrane (OM) pore formed by the multimeric secretin complex. The secretin is common to Gram-negative assemblies, including the related type 2 secretion (T2S) system and the type 3 secretion (T3S) system. The N termini of the secretin monomers are periplasmic and in some systems have been shown to mediate substrate specificity. In this study, we mapped the topology of BfpB, the T4P secretin from EPEC, using a combination of biochemical and biophysical techniques that allowed selective identification of periplasmic and extracellular residues. We applied rules based on solved atomic structures of outer membrane proteins (OMPs) to generate our topology model, combining the experimental results with secondary structure prediction algorithms and direct inspection of the primary sequence. Surprisingly, the C terminus of BfpB is extracellular, a result confirmed by flow cytometry for BfpB and a distantly related T4P secretin, PilQ, from Pseudomonas aeruginosa. In keeping with prior evidence, the C termini of two T2S secretins and one T3S secretin were not detected on the extracellular surface. On the basis of our data and structural constraints, we propose that BfpB forms a beta barrel with 16 transmembrane beta strands. We propose that the T4P secretins have a C-terminal segment that passes through the center of each monomer.
Secretins are multimeric proteins that allow the passage of secreted toxins and surface structures through the outer membranes (OMs) of Gram-negative bacteria. To date, there have been no atomic structures of the C-terminal region of a secretin, although electron microscopy (EM) structures of the complex are available. This work provides a detailed topology prediction of the membrane-spanning domain of a type IV pilus (T4P) secretin. Our study used innovative techniques to provide new and comprehensive information on secretin topology, highlighting similarities and differences among secretin subfamilies. Additionally, the techniques used in this study may prove useful for the study of other OM proteins.
Molecular beacons (MBs) are hairpin-like fluorescent DNA probes that have single-mismatch detection capability. Although they are extremely useful for many solution-based nucleic acid detections, MBs are expensive probes for applications that require the use of a large number of different DNA probes due to the high cost and tedious procedures associated with probe synthesis and purification. In addition, since both ends of MB probes are covalently modified with chromophores, they do not offer the flexibility for fluorophore change and the capability for surface immobilization through free DNA ends. In this report, we describe an alternative form of MB, denoted tripartite molecular beacon (TMB), that may help overcome these problems. A TMB uses an unmodified oligodeoxyribonucleotide that forms a MB-like structure with two universal single-stranded arms to bring on a universal pair of oligodeoxyribonucleotides modified separately with a fluorophore and a quencher. We found that TMBs are as effective as standard MBs in signaling the presence of matching nucleic acid targets and in precisely discriminating targets that differ by a single nucleotide. TMBs have the necessary flexibility that may make MBs more affordable for various nucleic acid detection applications.