Conjugation of complex post-translational modifications (PTMs)
such as glycosylation and Small Ubiquitin-like Modification (SUMOylation)
to a substrate protein can substantially change the resulting peptide
fragmentation pattern compared to its unmodified counterpart, making
current database search methods inappropriate for the identification
of tandem mass (MS/MS) spectra from such modified peptides. Traditionally
it has been difficult to develop new algorithms to identify these
atypical peptides because of the lack of a large set of annotated
spectra from which to learn the altered fragmentation pattern. Using
SUMOylation as an example, we propose a novel approach to generate
large MS/MS training data from modified peptides and derive an algorithm
that learns properties of PTM-specific fragmentation from such training
data. Benchmark tests on data sets of varying complexity show that
our method is 80–300% more sensitive than current state-of-the-art
approaches. The core concepts of our method are readily applicable
to developing algorithms for the identification of peptides with
other complex PTMs.
small ubiquitin-like modification (SUMOylation); posttranslational
modification (PTM); combinatorial peptide library; peptide fragmentation patterns; algorithms; database
search method; linked peptides
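To make concrete why a modification alters the fragment ions a search engine expects, the following minimal sketch computes singly charged b- and y-ion m/z values for a linear peptide, with an optional mass shift on one residue. It is not the algorithm proposed in this work: it uses standard monoisotopic residue masses and treats the modification as a fixed mass, whereas a branched modification such as a SUMO remnant also fragments itself and produces additional ions that this simple model does not capture. The function name `fragment_ions`, the example peptide, and the 100.0 Da shift are hypothetical placeholders.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O (Da)
PROTON = 1.007276   # mass of a proton (Da)


def fragment_ions(peptide, mod_index=None, mod_mass=0.0):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    mod_index / mod_mass are illustrative: they add a fixed mass to one
    residue (0-based index) to mimic a simple, non-fragmenting modification.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    if mod_index is not None:
        masses[mod_index] += mod_mass

    # b-ions: N-terminal prefixes (the full-length prefix is excluded).
    b_ions, running = [], 0.0
    for m in masses[:-1]:
        running += m
        b_ions.append(running + PROTON)

    # y-ions: C-terminal suffixes, which retain one water.
    y_ions, running = [], 0.0
    for m in reversed(masses[1:]):
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


# Hypothetical example: shift the lysine (index 1) of "AKGR" by 100.0 Da.
plain_b, plain_y = fragment_ions("AKGR")
mod_b, mod_y = fragment_ions("AKGR", mod_index=1, mod_mass=100.0)
for i, (p, m) in enumerate(zip(plain_b, mod_b), start=1):
    print(f"b{i}: {p:.4f} -> {m:.4f}")
for i, (p, m) in enumerate(zip(plain_y, mod_y), start=1):
    print(f"y{i}: {p:.4f} -> {m:.4f}")
```

In this simplified model only the fragments that contain the modified residue shift in mass. Learning the additional, modification-derived fragments from annotated spectra is the part that requires training data of the kind described above.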
Motivation: The field of structural bioinformatics and computational biophysics has undergone a revolution in the last 10 years. These developments are captured annually through the 3DSIG meeting, upon which this article reflects.
Results: An increase in the accessible data, computational resources and methodology has resulted in an increase in the size and resolution of studied systems and the complexity of the questions amenable to research. Concomitantly, the parameterization and efficiency of the methods have markedly improved along with their cross-validation with other computational and experimental results.
Conclusion: The field exhibits an ever-increasing integration with biochemistry, biophysics and other disciplines. In this article, we discuss recent achievements along with current challenges within the field.
Recent advances in structural bioinformatics have enabled the prediction of protein-drug off-targets based on their ligand binding sites. Concurrent developments in systems biology allow for prediction of the functional effects of system perturbations using large-scale network models. Integration of these two capabilities provides a framework for evaluating metabolic drug response phenotypes in silico. This combined approach was applied to investigate the hypertensive side effect of the cholesteryl ester transfer protein inhibitor torcetrapib in the context of human renal function. A metabolic kidney model was generated in which to simulate drug treatment. Causal drug off-targets were predicted that have previously been observed to impact renal function in gene-deficient patients and may play a role in the adverse side effects observed in clinical trials. Genetic risk factors for drug treatment were also predicted that correspond to both characterized and unknown renal metabolic disorders as well as cryptic genetic deficiencies that are not expected to exhibit a renal disorder phenotype except under drug treatment. This study represents a novel integration of structural and systems biology and a first step towards computational systems medicine. The methodology introduced herein has important implications for drug development and personalized medicine.
Pharmaceutical science is only beginning to scratch the surface of the exact mechanisms of drug action that lead to a drug's breadth of patient responses, both intended effects and side effects. Decades of clinical trials, molecular studies, and more recent computational analysis have sought to characterize the interactions between a drug and the cell's molecular machinery. We have devised an integrated computational approach to assess how a drug may affect a particular system, in our study the metabolism of the human kidney and its capacity for filtration of the contents of the blood. We applied this approach to retrospectively investigate potential causal drug targets leading to increased blood pressure in participants of clinical trials for the drug torcetrapib, in an effort to show how our approach could be directly useful in the drug development process. Our results suggest specific metabolic enzymes that may be directly responsible for the side effect. The drug screening framework we have developed could be used to link adverse side effects to particular drug targets, discover new uses for old drugs, identify biomarkers for metabolic disease and drug response, and suggest genetic or dietary risk factors to help guide personalized patient care.
Summary: The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) resource provides tools for query, analysis and visualization of the 3D structures in the PDB archive. As the mobile Web is starting to surpass desktop and laptop usage, scientists and educators are beginning to integrate mobile devices into their research and teaching. In response, we have developed the RCSB PDB Mobile app for the iOS and Android mobile platforms to enable fast and convenient access to RCSB PDB data and services. Using the app, users from the general public to expert researchers can quickly search and visualize biomolecules, and add personal annotations via the RCSB PDB’s integrated MyPDB service.
Availability and implementation: RCSB PDB Mobile is freely available from the Apple App Store and Google Play (http://www.rcsb.org).
Genome-Wide Association Studies (GWAS), whole genome sequencing, and high-throughput omics techniques have generated vast amounts of genotypic and molecular phenotypic data. However, these data have not yet been fully explored to improve the effectiveness and efficiency of drug discovery, which continues along a one-drug-one-target-one-disease paradigm. As a partial consequence, both the cost to launch a new drug and the attrition rate are increasing. Systems pharmacology and pharmacogenomics are emerging to exploit the available data and potentially reverse this trend, but, as we argue here, more is needed. To understand the impact of genetic, epigenetic, and environmental factors on drug action, we must study the structural energetics and dynamics of molecular interactions in the context of the whole human genome and interactome. Such an approach requires an integrative modeling framework for drug action that leverages advances in data-driven statistical modeling and mechanism-based multiscale modeling and transforms heterogeneous data from GWAS, high-throughput sequencing, structural genomics, functional genomics, and chemical genomics into unified knowledge. This is not a small task, but, as reviewed here, progress is being made towards the final goal of personalized medicines for the treatment of complex diseases.
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) develops tools and resources that provide a structural view of biology for research and education. The RCSB PDB web site (http://www.rcsb.org) uses the curated 3D macromolecular data contained in the PDB archive to offer unique methods to access, report and visualize data. Recent activities have focused on improving methods for simple and complex searches of PDB data, creating specialized access to chemical component data and providing domain-based structural alignments. New educational resources are offered at the PDB-101 educational view of the main web site such as Author Profiles that display a researcher’s PDB entries in a timeline. To promote different kinds of access to the RCSB PDB, Web Services have been expanded, and an RCSB PDB Mobile application for the iPhone/iPad has been released. These improvements enable new opportunities for analyzing and understanding structure data.
Multipole expansions offer a natural path to coarse-graining the electrostatic potential. However, the validity of the expansion is restricted to regions outside a spherical enclosure of the distribution of charge, and it is therefore not suitable for most applications that demand accurate representation at arbitrary positions around the molecule. We propose and demonstrate a distributed multipole expansion approach that resolves this limitation. We also provide a practical algorithm for the computational implementation of this approach. The method allows the partitioning of the charge distribution into subsystems so that the multipole expansion of each component of the partition, and therefore of their superposition, is valid outside an enclosing surface of the molecule of arbitrary shape. The complexity of the resulting coarse-grained model of electrostatic potential is dictated by the area of the molecular surface and therefore, for a typical three-dimensional molecule, it scales as $N^{2/3}$, where $N$ is the number of charges in the system. This makes the method especially useful for coarse-grained studies of biological systems consisting of many large macromolecules, provided that the configuration of the individual molecules can be approximated as fixed.
Electrostatic potential; Coarse-graining; Molecular modeling; Multipole moments; Algorithms; Distributed multipole analysis
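The restriction to points outside the enclosing sphere, and the way partitioning relaxes it, can be illustrated with a small numerical sketch. This is not the algorithm of the paper: it truncates every expansion at dipole order, partitions the charges into adjacent pairs by hand, and uses units in which Coulomb's constant is 1. The charge layout, the evaluation point, and the function names `exact_potential` and `monopole_dipole` are assumptions made for illustration.

```python
import math


def exact_potential(charges, point):
    """Sum of q / r over point charges given as (q, (x, y, z)) tuples."""
    return sum(q / math.dist(pos, point) for q, pos in charges)


def monopole_dipole(charges, point):
    """Monopole + dipole approximation of the potential at `point`,
    expanded about the unweighted geometric center of `charges`."""
    n = len(charges)
    center = tuple(sum(pos[k] for _, pos in charges) / n for k in range(3))
    total_q = sum(q for q, _ in charges)
    dipole = tuple(
        sum(q * (pos[k] - center[k]) for q, pos in charges) for k in range(3)
    )
    r_vec = tuple(point[k] - center[k] for k in range(3))
    r = math.sqrt(sum(c * c for c in r_vec))
    return total_q / r + sum(d * c for d, c in zip(dipole, r_vec)) / r**3


# Hypothetical elongated "molecule": ten charges in a row along the x axis.
charges = [
    (1.0 if i % 2 == 0 else -0.5, (-4.5 + i, 0.0, 0.0)) for i in range(10)
]

# Evaluation point close to the row: inside the sphere of radius 4.5 that
# encloses all charges, but well outside the small sphere around each pair.
point = (1.0, 3.0, 0.0)

exact = exact_potential(charges, point)

# Single expansion about the center of the whole distribution.
single = monopole_dipole(charges, point)

# Distributed expansion: one low-order expansion per adjacent pair, summed.
groups = [charges[i:i + 2] for i in range(0, len(charges), 2)]
distributed = sum(monopole_dipole(g, point) for g in groups)

err_single = abs(single - exact)
err_distributed = abs(distributed - exact)
print(f"exact={exact:.5f}  single={single:.5f}  distributed={distributed:.5f}")
print(f"|error| single={err_single:.5f}  distributed={err_distributed:.5f}")
```

At the same truncation order, the summed per-group expansions track the exact potential much more closely at this point than the single-center expansion does, because the point lies outside every group's small enclosing sphere even though it lies inside the sphere enclosing the whole distribution. How the partition should be chosen for a real molecule is the subject of the method described above and is not addressed by this sketch.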
Here off-target binding implies the binding of a small molecule of therapeutic interest to a protein target other than the primary target for which it was intended. Increasingly such off-targeting appears to be the norm rather than the exception, rational drug design notwithstanding, and can lead to detrimental side-effects, or opportunities to reposition a therapeutic agent to treat a different condition. Not surprisingly, there is significant interest in determining a priori what off-targets exist on a proteome-wide scale. Beyond determining putative off-targets is the need to understand the impact of such binding on the complete biological system, with the ultimate goal of being able to predict the phenotypic outcome. While a very ambitious goal, some progress is being made.
How easy is it to reproduce the results found in a typical computational biology paper? Either through experience or intuition the reader will already know that the answer is with difficulty or not at all. In this paper we attempt to quantify this difficulty by reproducing a previously published paper for different classes of users (ranging from users with little expertise to domain experts) and suggest ways in which the situation might be improved. Quantification is achieved by estimating the time required to reproduce each of the steps in the method described in the original paper and make them part of an explicit workflow that reproduces the original results. Reproducing the method took several months of effort, and required using new versions and new software that posed challenges to reconstructing and validating the results. The quantification leads to “reproducibility maps” that reveal that novice researchers would only be able to reproduce a few of the steps in the method, and that only expert researchers with advanced knowledge of the domain would be able to reproduce the method in its entirety. The workflow itself is published as an online resource together with supporting software and data. The paper concludes with a brief discussion of the complexities of requiring reproducibility in terms of cost versus benefit, and a set of desiderata with our observations and guidelines for improving reproducibility. This has implications not only for reproducing the work of others from published papers, but also for reproducing work from one’s own laboratory.