Reliable prediction of stability changes in protein variants is an important aspect of computational protein design. A number of machine learning methods that classify stability changes knowing only the sequence of the protein have emerged. However, their performance on amino acid substitutions of previously unseen non-homologous proteins is rather limited. Moreover, the performance varies for different types of mutations based on the secondary structure or accessible surface area of the mutation site.
We proposed feature-based multiple models with each model designed for a specific type of mutations. The new method is composed of five models trained for mutations in exposed, buried, helical, sheet, and coil residues. The classification of a mutation as stabilising or destabilising is made as a consensus of two models, one selected based on the predicted accessible surface area and the other based on the predicted secondary structure of the mutation site. We refer to our new method as Evolutionary, Amino acid, and Structural Encodings with Multiple Models (EASE-MM). Cross-validation results show that EASE-MM provides a notable improvement to our previous work reaching a Matthews correlation coefficient of 0.44. EASE-MM was able to correctly classify 73% and 75% of stabilising and destabilising protein variants, respectively. Using an independent test set of 238 mutations, we confirmed our results in a comparison with related work.
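The two-model consensus described above can be sketched as follows. This is only an illustration: the dictionary keys, the idea that each model returns a signed score, and the averaging rule are assumptions, since the abstract does not specify how the two predictions are combined. The Matthews correlation coefficient is computed from its standard confusion-matrix definition.

```python
from math import sqrt

def select_models(models, buried, sec_struct):
    """Pick one model by predicted burial and one by predicted secondary structure.

    `models` is a hypothetical dict with keys "exposed", "buried",
    "helix", "sheet" and "coil", each mapping to a callable.
    """
    asa_model = models["buried" if buried else "exposed"]
    ss_model = models[sec_struct]
    return asa_model, ss_model

def consensus(models, features, buried, sec_struct):
    """Average the two selected models' scores (assumed combination rule).

    A positive averaged score is read as stabilising, otherwise destabilising.
    """
    asa_model, ss_model = select_models(models, buried, sec_struct)
    score = (asa_model(features) + ss_model(features)) / 2.0
    return "stabilising" if score > 0 else "destabilising"

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

Any classifier that maps a feature vector to a signed score could stand in for the five models here; the point of the sketch is only the selection by mutation-site type followed by a combination step.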
EASE-MM not only outperformed other related methods but achieved more balanced results for different types of mutations based on the accessible surface area, secondary structure, or magnitude of stability changes. This can be attributed to using multiple models with the most relevant features selected for the given type of mutations. Therefore, our results support the presumption that different interactions govern stability changes in the exposed and buried residues or in residues with a different secondary structure.
Protein structure prediction (PSP) has been one of the most challenging problems in computational biology for several decades. The challenge is largely due to the complexity of the all-atomic details and the unknown nature of the energy function. Researchers have therefore used simplified energy models that consider interaction potentials only between the amino acid monomers in contact on discrete lattices. The restricted nature of the lattices and the energy models poses a twofold concern regarding the assessment of the models. Can a native or a very close structure be obtained when structures are mapped to lattices? Can the contact-based energy models on discrete lattices guide the search towards the native structures? In this paper, we use the protein chain lattice fitting (PCLF) problem to address the first concern; we developed a constraint-based local search algorithm for the PCLF problem for cubic and face-centered cubic lattices and found very close lattice fits for the native structures. For the second concern, we use a number of techniques to sample the conformation space and find correlations between energy functions and the root mean square deviation (RMSD) of the lattice-based structures from the native structures. Our analysis reveals weaknesses in several contact-based energy models that are popular in PSP.
Protein structure prediction is computationally a very challenging problem. A large number of existing search algorithms attempt to solve the problem by exploring possible structures and finding the one with the minimum free energy. However, these algorithms perform poorly on large proteins due to an astronomically large search space. In this paper, we present a multipoint spiral search framework that uses parallel processing techniques to expedite exploration by starting from different points. In our approach, a set of random initial solutions is generated and distributed to different threads. We allow each thread to run for a predefined period of time. The improved solutions are stored threadwise. When the threads finish, the solutions are merged together and the duplicates are removed. A selected distinct set of solutions is then split among different threads again. In our ab initio protein structure prediction method, we use the three-dimensional face-centred-cubic lattice for structure-backbone mapping. We use both the low-resolution hydrophobic-polar energy model and the high-resolution 20 × 20 energy model for search guiding. The experimental results show that our new parallel framework significantly improves the results obtained by the state-of-the-art single-point search approaches for both energy models on the three-dimensional face-centred-cubic lattice. We also experimentally show the effectiveness of mixing energy models within parallel threads.
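The distribute, search, merge, deduplicate and redistribute loop described above can be sketched generically. This is a minimal sketch, not the spiral search itself: the per-thread search is a plain hill climber, each thread runs for a fixed number of steps rather than a fixed time, and the function names and parameters are illustrative assumptions. Solutions must be hashable (for example tuples) so that duplicates can be removed.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def local_search(start, energy, neighbour, steps, seed):
    """Simple hill climber standing in for the per-thread search (assumption)."""
    rng = random.Random(seed)
    best = start
    for _ in range(steps):
        candidate = neighbour(best, rng)
        if energy(candidate) <= energy(best):
            best = candidate
    return best

def multipoint_search(initial, energy, neighbour,
                      n_threads=4, rounds=3, steps=200, seed=0):
    """Run several searches in parallel, then merge, deduplicate and redistribute."""
    population = list(initial)
    for r in range(rounds):
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            futures = [
                pool.submit(local_search, s, energy, neighbour, steps,
                            seed + r * 1000 + i)
                for i, s in enumerate(population)
            ]
            improved = [f.result() for f in futures]
        # Merge the per-thread results and drop duplicates.
        distinct = sorted(set(improved), key=energy)
        # Keep a selected distinct set to hand out in the next round.
        population = distinct[:n_threads]
    return min(population, key=energy)
```

In CPython, threads do not speed up CPU-bound pure-Python code because of the global interpreter lock, so this sketch shows the control flow only; a process pool or a compiled search routine would be needed for real parallel speed-up.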
Reliable prediction of stability changes induced by a single amino acid substitution is an important aspect of computational protein design. Several machine learning methods capable of predicting stability changes from the protein sequence alone have been introduced. Prediction performance of these methods is evaluated on mutations unseen during training. Nevertheless, different mutations of the same protein, and even the same residue, as encountered during training are commonly used for evaluation. We argue that a faithful evaluation can be achieved only when a method is tested on previously unseen proteins with low sequence similarity to the training set.
We provided experimental evidence of the limitations of the evaluation commonly used for assessing the prediction performance. Furthermore, we demonstrated that the prediction of stability changes in previously unseen non-homologous proteins is a challenging task for currently available methods. To improve the prediction performance of our previously proposed method, we identified features which led to over-fitting and further extended the model with new features. The new method employs Evolutionary And Structural Encodings with Amino Acid parameters (EASE-AA). Evaluated with an independent test set of more than 600 mutations, EASE-AA yielded a Matthews correlation coefficient of 0.36 and was able to classify correctly 66% of the stabilising and 74% of the destabilising mutations. For real-value prediction, EASE-AA achieved the correlation of predicted and experimentally measured stability changes of 0.51.
The commonly adopted evaluation, in which mutations in the same protein, and even the same residue, are randomly divided between the training and test sets, leads to an overestimation of prediction performance. Therefore, stability change prediction methods should be evaluated only on mutations in previously unseen non-homologous proteins. Under such an evaluation, EASE-AA predicts stability changes more reliably than currently available methods.
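The difference between a random split and a protein-level split can be illustrated with a short sketch. It is a simplification under stated assumptions: mutations are given as (group, mutation) pairs, and the group identifier is assumed to already represent a protein or a cluster of homologous proteins. Grouping by sequence similarity, which the evaluation above requires, would have to be done beforehand with a separate clustering step.

```python
import random

def split_by_protein(mutations, test_fraction=0.2, seed=0):
    """Split so that no protein (or homology cluster) is in both sets.

    `mutations` is a list of (group_id, mutation) pairs; group_id is a
    hypothetical protein or cluster identifier.
    """
    proteins = sorted({group for group, _ in mutations})
    rng = random.Random(seed)
    rng.shuffle(proteins)
    n_test = max(1, round(len(proteins) * test_fraction))
    test_proteins = set(proteins[:n_test])
    train = [m for m in mutations if m[0] not in test_proteins]
    test = [m for m in mutations if m[0] in test_proteins]
    return train, test
```

With this kind of split, two mutations of the same residue can never end up on opposite sides of the train/test boundary, which is the source of the overestimation discussed above.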
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-S1-S4) contains supplementary material, which is available to authorized users.
protein mutation; stability changes; machine learning
Prediction of the structural classes of proteins can provide important information about their functionalities as well as their major tertiary structures. It is also considered an important step towards solving the protein structure prediction problem. Despite all the efforts made so far, finding a fast and accurate computational approach to the protein structural class prediction problem remains a challenge in bioinformatics and computational biology.
In this study we propose segmented distribution and segmented auto covariance feature extraction methods to capture local and global discriminatory information from evolutionary profiles and predicted secondary structure of the proteins. By applying SVM to our extracted features, for the first time we enhance the protein structural class prediction accuracy to over 90% and 85% for two popular low-homology benchmarks that have been widely used in the literature. We report 92.2% and 86.3% prediction accuracies for 25PDB and 1189 benchmarks which are respectively up to 7.9% and 2.8% better than previously reported results for these two benchmarks.
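One possible reading of the segmented auto covariance idea is sketched below. The abstract does not give the exact definition, so everything here is an assumption: the profile is taken to be a list of per-residue score rows, it is cut into equal-length segments, and within each segment the standard auto covariance is computed per column for a few lags using the segment mean.

```python
def auto_covariance(column, lag):
    """Standard auto covariance of a sequence of numbers at a given lag."""
    n = len(column)
    if n <= lag:
        return 0.0
    mean = sum(column) / n
    total = sum((column[i] - mean) * (column[i + lag] - mean)
                for i in range(n - lag))
    return total / (n - lag)

def segmented_auto_covariance(profile, n_segments=2, max_lag=2):
    """Concatenate per-segment, per-column auto covariances into one vector.

    `profile` is a list of rows (one per residue), each a list of scores,
    for example one row of an evolutionary profile per residue.
    """
    length = len(profile)
    n_cols = len(profile[0])
    features = []
    for s in range(n_segments):
        start = s * length // n_segments
        end = (s + 1) * length // n_segments
        segment = profile[start:end]
        for j in range(n_cols):
            column = [row[j] for row in segment]
            for lag in range(1, max_lag + 1):
                features.append(auto_covariance(column, lag))
    return features
```

The resulting vector has n_segments × columns × max_lag entries regardless of protein length, which is what makes such features usable as fixed-size input to a classifier such as an SVM.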
By proposing segmented distribution and segmented auto covariance feature extraction methods to capture local and global discriminatory information from evolutionary profiles and predicted secondary structure of the proteins, we are able to enhance the protein structural class prediction performance significantly.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-S1-S2) contains supplementary material, which is available to authorized users.
Protein structural class prediction problem; Structural features; Evolutionary features; Segmented auto covariance; Segmented distribution; Support Vector Machine (SVM)
Protein structure prediction (PSP) is computationally a very challenging problem. The challenge largely comes from the fact that the energy function that needs to be minimised in order to obtain the native structure of a given protein is not clearly known. A high-resolution 20 × 20 energy model could better capture the behaviour of the actual energy function than a low-resolution energy model such as hydrophobic-polar (HP). However, the fine-grained details of the high-resolution interaction energy matrix are often not very informative for guiding the search. In contrast, a low-resolution energy model could effectively bias the search towards certain promising directions. In this paper, we develop a genetic algorithm that mainly uses a high-resolution energy model for protein structure evaluation but uses a low-resolution HP energy model to focus the search on exploring structures that have hydrophobic cores. We experimentally show that this mixing of energy models leads to significantly lower-energy structures compared to the state-of-the-art results.
The effect of different environmental factors, i.e., temperature, relative humidity and precipitation, on the population dynamics, density and foraging activities of Microtermes obesi Holmgren and Odontotermes lokanandi Chatarjee and Thakur (Isoptera: Termitidae) was studied from March 2010 to July 2012 in Islamabad. A total of 1200 poplar wooden stakes was used for monitoring the termite activities. The results showed that 65 out of the 1200 stakes were infested by both species, i.e., M. obesi and O. lokanandi. Both species were interacting with each other in the experimental field, and O. lokanandi was found to be significantly dominant. Mean yield per trap ranged from 0.83 ± 0.20 g to 1.12 ± 0.28 g for M. obesi and from 0.35 ± 0.09 g to 0.82 ± 0.19 g for O. lokanandi in the field. The number of individuals in a 1.0 g sample ranged from 539.83 ± 2.21 to 567.83 ± 9.41 for M. obesi and from 407.67 ± 4.75 to 424.5 ± 1.15 for O. lokanandi. The proportion of workers ranged from 93.53 ± 1.73 to 97.68 ± 0.40 percent for M. obesi and from 91.69 ± 1.42 to 98.41 ± 0.50 percent for O. lokanandi.
A positive and significant correlation was found between atmospheric temperature, precipitation and the foraging activities of both subterranean termite species, i.e., M. obesi and O. lokanandi; however, the correlation between relative humidity and the foraging activities of both termite species was negative and non-significant.
Moreover, the correlation between atmospheric temperature and percent workers was positive and significant for M. obesi, but negative and non-significant for O. lokanandi. A negative and significant correlation was noted between relative humidity and percent workers of M. obesi, whereas a positive and significant correlation was recorded between relative humidity and percent workers of O. lokanandi. A positive and non-significant correlation was recorded between precipitation and percent workers of M. obesi, while a positive and significant correlation was observed between precipitation and percent workers of O. lokanandi.
O. lokanandi; M. obesi; NIFA TERMAPs; Temperature; Relative humidity; Rainfall
An ageing population and higher rates of chronic disease increase the demand on health services. The Australian Institute of Health and Welfare reports a 3.6% per year increase in total elective surgery admissions over the past four years.1 The newly introduced National Elective Surgery Target (NEST) stresses the need for efficiency and necessitates the development of improved planning and scheduling systems in hospitals.
To provide an overview of the challenges of elective surgery scheduling and develop a prediction based methodology to drive optimal management of scheduling processes.
Our proposed two stage methodology initially employs historic utilisation data and current waiting list information to manage case mix distribution. A novel algorithm uses current and past perioperative information to accurately predict surgery duration. A NEST-compliance guided optimisation algorithm is then used to drive allocation of patients to the theatre schedule.
It is expected that the resulting improvement in scheduling processes will lead to more efficient use of surgical suites, higher productivity, and lower labour costs, and ultimately improve patient outcomes.
Accurate prediction of workload and surgery duration, retrospective and current waitlist as well as perioperative information, and NEST-compliance driven allocation of patients are employed by our proposed methodology in order to deliver further improvement to hospital operating facilities.
Surgery scheduling; Predictive optimisation; Waiting list
Even a single amino acid substitution in a protein sequence may result in significant changes in protein stability, structure, and therefore in protein function as well. In the post-genomic era, computational methods for predicting stability changes from only the sequence of a protein are of importance. While evolutionary relationships of protein mutations can be extracted from large protein databases holding millions of protein sequences, relevant evolutionary features for the prediction of stability changes have not been proposed. Also, the use of predicted structural features in situations when a protein structure is not available has not been explored.
We proposed a number of evolutionary and predicted structural features for the prediction of stability changes and analysed which of them capture the determinants of protein stability the best. We trained and evaluated our machine learning method on a non-redundant data set of experimentally measured stability changes. When only the direction of the stability change was predicted, we found that the best performance improvement can be achieved by the combination of the evolutionary features mutation likelihood and SIFTscore in conjunction with the predicted structural feature secondary structure. The same two evolutionary features in the combination with the predicted structural feature accessible surface area achieved the lowest error when the prediction of actual values of stability changes was assessed. Compared to similar studies, our method achieved improvements in prediction performance.
Although the strongest feature for the prediction of stability changes appears to be the vector of amino acid identities in the sequential neighbourhood of the mutation, the most relevant combination of evolutionary and predicted structural features further improves prediction performance. Even the predicted structural features, which did not perform well on their own, turn out to be beneficial when appropriately combined with evolutionary features. We conclude that a high prediction accuracy can be achieved knowing only the sequence of a protein when the right combination of both structural and evolutionary features is used.
Given a protein's amino acid sequence, the protein structure prediction problem is to find a three dimensional structure that has the native energy level. For many decades, it has been one of the most challenging problems in computational biology. A simplified version of the problem is to find an on-lattice self-avoiding walk that minimizes the interaction energy among the amino acids. Local search methods have been preferably used in solving the protein structure prediction problem for their efficiency in finding very good solutions quickly. However, they suffer mainly from two problems: re-visitation and stagnancy.
In this paper, we present an efficient local search algorithm that deals with these two problems. During search, we select the best candidate at each iteration, but store the unexplored second best candidates in a set of elite conformations, and explore them whenever the search faces stagnation. Moreover, we propose a new non-isomorphic encoding for the protein conformations to store the conformations and to check similarity when applied with a memory based search. This new encoding helps eliminate conformations that are equivalent under rotation and translation, and thus results in better prevention of re-visitation.
On standard benchmark proteins, our algorithm significantly outperforms the state-of-the-art approaches for the Hydrophobic-Polar energy model and the Face Centered Cubic Lattice.
Protein structure prediction is an important but unsolved problem in biological science. Predicted structures vary considerably with the energy function and the structure-mapping space used. In our simplified ab initio protein structure prediction methods, we use the hydrophobic-polar (HP) energy model for structure evaluation and the 3-dimensional face-centred-cubic lattice for structure mapping. For the HP energy model, developing a compact hydrophobic core (H-core) is essential for the progress of the search. The H-core helps find a stable structure with the lowest possible free energy.
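To make the HP model on the face-centred-cubic lattice concrete, the sketch below evaluates a conformation using the usual convention of this model: every pair of non-consecutive hydrophobic (H) residues sitting on neighbouring lattice points contributes -1. The integer-coordinate representation, in which the 12 lattice neighbours are the vectors with exactly two coordinates equal to ±1, is a common choice and an assumption here, as are the function names.

```python
from itertools import product

# The 12 FCC neighbour vectors: two coordinates are +1 or -1, one is 0.
FCC_NEIGHBOURS = {
    v for v in product((-1, 0, 1), repeat=3)
    if sorted(abs(c) for c in v) == [0, 1, 1]
}

def is_valid_walk(coords):
    """Check self-avoidance and that consecutive residues are lattice neighbours."""
    if len(set(coords)) != len(coords):
        return False
    return all(
        tuple(a - b for a, b in zip(coords[i + 1], coords[i])) in FCC_NEIGHBOURS
        for i in range(len(coords) - 1)
    )

def hp_energy(sequence, coords):
    """HP energy: -1 for every non-consecutive H-H pair in lattice contact."""
    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):
            if sequence[i] == "H" and sequence[j] == "H":
                diff = tuple(a - b for a, b in zip(coords[i], coords[j]))
                if diff in FCC_NEIGHBOURS:
                    energy -= 1
    return energy
```

A compact H-core lowers this energy because it maximises the number of H-H contacts, which is why search heuristics for this model try to pull hydrophobic residues together.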
In order to build H-cores, we present a new Spiral Search algorithm based on tabu-guided local search. Our algorithm uses a novel H-core directed guidance heuristic that squeezes the structure around a dynamic hydrophobic-core centre. We applied random walks to break premature H-cores and thus to avoid early convergence. We also used a novel relay-restart technique to handle stagnation.
We have tested our algorithms on a set of benchmark protein sequences. The experimental results show that our spiral search algorithm outperforms the state-of-the-art local search algorithms for simplified protein structure prediction. We also experimentally show the effectiveness of the relay-restart.
Causal models of physiological systems can be immensely useful in medicine as they may be used for both diagnostic and therapeutic reasoning.
In this paper we investigate how an agent may use the theory of belief change to rectify simple causal models of changing blood sugar levels in diabetes patients.
We employ the semantic approach to belief change together with a popular measure of distance called Dalal distance between different state descriptions in order to implement a simple application that simulates the effectiveness of the proposed method in helping an agent rectify a simple causal model.
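The Dalal distance between two state descriptions is the number of propositional variables on which they disagree. The sketch below shows that distance and a revision-style selection of the candidate states closest to what the agent currently believes. Representing states as dictionaries of Boolean variables and selecting by minimum distance to any believed state are assumptions made for illustration, and the variable names in the usage note are hypothetical rather than taken from the paper's model.

```python
def dalal_distance(state_a, state_b):
    """Number of variables on which two states (dicts of booleans) differ."""
    return sum(1 for var in state_a if state_a[var] != state_b[var])

def closest_states(current_states, candidate_states):
    """Return the candidates at minimal Dalal distance from the current beliefs.

    `current_states` are the states the agent considers possible;
    `candidate_states` are the states consistent with a new observation.
    """
    def distance_to_beliefs(candidate):
        return min(dalal_distance(candidate, s) for s in current_states)

    best = min(distance_to_beliefs(c) for c in candidate_states)
    return [c for c in candidate_states if distance_to_beliefs(c) == best]
```

For example, if the agent believes {"high_sugar": True, "insulin_given": False} and an observation is consistent with either {"high_sugar": False, "insulin_given": False} or {"high_sugar": False, "insulin_given": True}, the first candidate is selected because it differs in one variable rather than two.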
Our simulation results show that distance-based belief change can help in improving the agent’s causal knowledge. However, under the current implementation there is no guarantee that the agent will learn the complete model and the agent may at times get stuck in local optima.
Distance-based belief change can help in refining simple causal models such as the example in this paper. Future work will include larger state-action spaces, better distance measures and strategies for choosing actions.
Belief Change; Belief Update; Belief Revision; Causal Models; Glucose Metabolism; Diabetes
In Medical Informatics, there is an increasing awareness that temporal information plays a crucial role, so that suitable database approaches are needed to store and support it. Specifically, most clinical data are intrinsically temporal, and a relevant part of them are now-relative (i.e., they are valid at the current time). Even if previous studies indicate that the treatment of now-relative data has a crucial impact on efficiency, current approaches have several limitations. In this paper we propose a novel approach, which is based on a new representation of ‘now’, and on query transformations. We also experimentally demonstrate that our approach outperforms its best competitors in the literature to the extent of a factor of more than ten, both in number of disk accesses and of CPU usage.