Multipole expansions offer a natural path to coarse-graining the electrostatic potential. However, the validity of the expansion is restricted to regions outside a sphere enclosing the charge distribution, making it unsuitable for most applications that demand an accurate representation at arbitrary positions around the molecule. We propose and demonstrate a distributed multipole expansion approach that resolves this limitation, and we provide a practical algorithm for its computational implementation. The method partitions the charge distribution into subsystems so that the multipole expansion of each component of the partition, and therefore of their superposition, is valid outside an enclosing surface of the molecule of arbitrary shape. The complexity of the resulting coarse-grained model of the electrostatic potential is dictated by the area of the molecular surface; for a typical three-dimensional molecule it therefore scales as N^(2/3), where N is the number of charges in the system. This makes the method especially useful for coarse-grained studies of biological systems consisting of many large macromolecules, provided that the configuration of the individual molecules can be approximated as fixed.
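The limitation the abstract addresses can be illustrated with a minimal sketch of a conventional (single-center) multipole expansion, here truncated after the dipole term. The charges, positions, and evaluation point below are invented for illustration (Gaussian units, Coulomb constant set to 1); this is not the paper's distributed method, only the standard expansion whose validity is confined to the exterior of a sphere enclosing the charges.

```python
import numpy as np

def exact_potential(r, charges, positions):
    """Direct Coulomb sum: phi(r) = sum_i q_i / |r - r_i|."""
    return sum(q / np.linalg.norm(r - p) for q, p in zip(charges, positions))

def multipole_potential(r, charges, positions):
    """Single-center expansion about the origin, truncated after the
    dipole term: phi(r) ~ Q/|r| + (p . r)/|r|^3."""
    Q = np.sum(charges)                                # monopole (total charge)
    p = np.sum(charges[:, None] * positions, axis=0)   # dipole moment
    rnorm = np.linalg.norm(r)
    return Q / rnorm + np.dot(p, r) / rnorm**3

# Made-up cluster of three charges, all within ~0.25 of the origin.
charges = np.array([1.0, -0.5, 0.3])
positions = np.array([[0.1, 0.0, 0.0],
                      [0.0, 0.2, -0.1],
                      [-0.1, 0.1, 0.0]])

# Far outside the enclosing sphere the truncated expansion is accurate;
# near or inside that sphere it is not, which is the restriction that a
# distributed expansion is designed to lift.
far = np.array([10.0, 0.0, 0.0])
exact = exact_potential(far, charges, positions)
approx = multipole_potential(far, charges, positions)
print(abs(approx - exact) / abs(exact))  # small relative error far away
```

Evaluating the same truncated expansion at a point close to the charge cluster shows the error growing sharply, which is why a single-center expansion cannot represent the potential at arbitrary positions around a molecule.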
Electrostatic potential; Coarse-graining; Molecular modeling; Multipole moments; Algorithms; Distributed multipole analysis
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) develops tools and resources that provide a structural view of biology for research and education. The RCSB PDB web site (http://www.rcsb.org) uses the curated 3D macromolecular data contained in the PDB archive to offer unique methods to access, report and visualize data. Recent activities have focused on improving methods for simple and complex searches of PDB data, creating specialized access to chemical component data and providing domain-based structural alignments. New educational resources are offered at the PDB-101 educational view of the main web site such as Author Profiles that display a researcher’s PDB entries in a timeline. To promote different kinds of access to the RCSB PDB, Web Services have been expanded, and an RCSB PDB Mobile application for the iPhone/iPad has been released. These improvements enable new opportunities for analyzing and understanding structure data.
Motivation: BioJava is an open-source project for processing of biological data in the Java programming language. We have recently released a new version (3.0.5), which is a major update to the code base that greatly extends its functionality.
Results: BioJava now consists of several independent modules that provide state-of-the-art tools for protein structure comparison, pairwise and multiple sequence alignments, working with DNA and protein sequences, analysis of amino acid properties, detection of protein modifications and prediction of disordered regions in proteins as well as parsers for common file formats using a biologically meaningful data model.
Availability: BioJava is an open-source project distributed under the Lesser GPL (LGPL). BioJava can be downloaded from the BioJava website (http://www.biojava.org). BioJava requires Java 1.6 or higher. All inquiries should be directed to the BioJava mailing lists. Details are available at http://biojava.org/wiki/BioJava:MailingLists
Designers have a saying that “the joy of an early release lasts but a short time. The bitterness of an unusable system lasts for years.” It is indeed disappointing to discover that your data resources are not being used to their full potential. Not only have you invested your time, effort, and research grant on the project, but you may face costly redesigns if you want to improve the system later. This scenario would be less likely if the product were designed to provide users with exactly what they need, so that it is fit for purpose before its launch. We work at EMBL-European Bioinformatics Institute (EMBL-EBI), and we consult extensively with life science researchers to find out what they need from biological data resources. We have found that although users believe that the bioinformatics community is providing accurate and valuable data, they often find the interfaces to these resources tricky to use and navigate. We believe that if you can find out what your users want even before you create the first mock-up of a system, the final product will provide a better user experience. This would encourage more people to use the resource and give them greater access to the data, which could ultimately lead to more scientific discoveries. In this paper, we explore the need for a user-centred design (UCD) strategy when designing bioinformatics resources and illustrate this with examples from our work at EMBL-EBI. Our aim is to introduce the reader to how selected UCD techniques may be successfully applied to software design for bioinformatics.
The immune epitope database analysis resource (IEDB-AR: http://tools.iedb.org) is a collection of tools for prediction and analysis of molecular targets of T- and B-cell immune responses (i.e. epitopes). Since its last publication in the NAR webserver issue in 2008, a new generation of peptide:MHC binding and T-cell epitope predictive tools has been added. As validated by different labs and in the first international competition for predicting peptide:MHC-I binding, their predictive performances have improved considerably. In addition, a new B-cell epitope prediction tool was added, and the homology mapping tool was updated to enable mapping of discontinuous epitopes onto 3D structures. Furthermore, to serve a wider range of users, the number of ways in which IEDB-AR can be accessed has been expanded. Specifically, the predictive tools can be programmatically accessed using a web interface and can also be downloaded as software packages.
The field of bioinformatics and computational biology has gone through a number of transformations during the past 15 years, establishing itself as a key component of new biology. This spectacular growth has been challenged by a number of disruptive changes in science and technology. Despite the apparent fatigue of the linguistic use of the term itself, bioinformatics has grown perhaps to a point beyond recognition. We explore both historical aspects and future trends and argue that as the field expands, key questions remain unanswered and acquire new meaning while at the same time the range of applications is widening to cover an ever increasing number of biological disciplines. These trends appear to be pointing to a redefinition of certain objectives, milestones, and possibly the field itself.
Here, off-target binding refers to the binding of a small molecule of therapeutic interest to a protein target other than the primary target for which it was intended. Increasingly, such off-targeting appears to be the norm rather than the exception, rational drug design notwithstanding, and can lead to detrimental side-effects or, conversely, to opportunities to reposition a therapeutic agent to treat a different condition. Not surprisingly, there is significant interest in determining a priori what off-targets exist on a proteome-wide scale. Beyond determining putative off-targets is the need to understand the impact of such binding on the complete biological system, with the ultimate goal of being able to predict the phenotypic outcome. While this is a very ambitious goal, some progress is being made.
Docking scoring functions are notoriously weak predictors of binding affinity. They typically assign a common set of weights to the individual energy terms that contribute to the overall energy score; however, these weights should be gene family-dependent. In addition, they incorrectly assume that individual interactions contribute towards the total binding affinity in an additive manner. In reality, noncovalent interactions often depend on one another in a nonlinear manner. In this paper we show how support vector machines (SVMs), trained by associating sets of individual energy terms retrieved from molecular docking with the known binding affinity of each compound from high-throughput screening experiments, can be used to improve the correlation between known binding affinities and those predicted by the docking program eHiTS. We construct two prediction models: a regression model trained using IC50 values from BindingDB, and a classification model trained using active and decoy compounds from the Directory of Useful Decoys (DUD). Moreover, to address the issue of overrepresentation of negative data in high-throughput screening data sets, we have designed a multiple-planar SVM training procedure for the classification model. The increased performance that both SVMs give when compared with the original eHiTS scoring function highlights the potential for using nonlinear methods when deriving overall energy scores from their individual components. We apply the above methodology to train a new scoring function for direct inhibitors of M. tuberculosis (M.tb) InhA. By combining ligand binding site comparison with the new scoring function, we propose that phosphodiesterase inhibitors can potentially be repurposed to target M.tb InhA. Our methodology may be applied to other gene families for which target structures and activity data are available, as demonstrated in the work presented here.
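The core idea of the abstract can be sketched with a toy example: when the true affinity contains a nonlinear interaction between energy terms, a kernel SVM regressor can outperform a fixed additive weighting of the same terms. Everything below is synthetic and assumed for illustration; the features are stand-ins for per-term docking energies, not actual eHiTS scores or BindingDB data.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n_compounds, n_terms = 200, 6
X = rng.normal(size=(n_compounds, n_terms))   # stand-in per-term energy scores

# Synthetic "true" affinity: a weighted sum plus a nonlinear interaction
# between two terms, mimicking interdependent noncovalent contributions.
weights = np.array([1.0, -0.5, 0.8, 0.0, 0.3, -0.2])
y = X @ weights + 0.5 * X[:, 0] * X[:, 1]

# Baseline: a fixed, additive scoring function (equal weights for all terms).
linear_score = X @ np.ones(n_terms)

# Nonlinear alternative: RBF-kernel support vector regression trained on
# (energy terms -> affinity) pairs, evaluated on held-out compounds.
svr = SVR(kernel="rbf", C=10.0).fit(X[:150], y[:150])
pred = svr.predict(X[150:])

corr_linear = np.corrcoef(linear_score[150:], y[150:])[0, 1]
corr_svm = np.corrcoef(pred, y[150:])[0, 1]
print(corr_linear, corr_svm)  # the SVM should correlate better
```

This mirrors only the regression branch of the paper's two models; the classification model trained on DUD actives and decoys, and the multiple-planar training procedure for imbalanced data, are not reproduced here.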
High-throughput proteomics experiments involving tandem mass spectrometry produce large volumes of complex data that require sophisticated computational analyses. As such, the field offers many challenges for computational biologists. In this article, we briefly introduce some of the core computational and statistical problems in the field and then describe a variety of outstanding problems that readers of PLoS Computational Biology might be able to help solve.