How easy is it to reproduce the results found in a typical computational biology paper? Through experience or intuition, the reader will already know the answer: with difficulty, or not at all. In this paper we attempt to quantify this difficulty by reproducing a previously published paper for different classes of users (ranging from users with little expertise to domain experts), and we suggest ways in which the situation might be improved. Quantification is achieved by estimating the time required to reproduce each of the steps in the method described in the original paper and to make each step part of an explicit workflow that reproduces the original results. Reproducing the method took several months of effort. It also required new versions of existing software as well as new software, which posed challenges to reconstructing and validating the results. The quantification leads to “reproducibility maps”. These maps reveal that novice researchers would be able to reproduce only a few of the steps in the method, and that only expert researchers with advanced knowledge of the domain would be able to reproduce the method in its entirety. The workflow itself is published as an online resource together with supporting software and data. The paper concludes with a brief discussion of the complexities of requiring reproducibility in terms of cost versus benefit, and with a set of desiderata that summarizes our observations and guidelines for improving reproducibility. This has implications not only for reproducing the work of others from published papers, but also for reproducing work from one’s own laboratory.
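A reproducibility map of the kind described above can be represented as a table of estimated effort per workflow step and per user class. The sketch below is hypothetical throughout. The step names, the three user classes, the hour values, and the `budget_hours` cutoff are illustrative placeholders and do not come from the study. It only shows one way such estimates could be turned into a per-class yes/no map and a total effort figure.

```python
from typing import Dict, Optional

# Hypothetical effort estimates (hours) per workflow step and user class.
# None marks a step that a class is assumed unable to reproduce at all.
EFFORT: Dict[str, Dict[str, Optional[float]]] = {
    "step_1": {"novice": 2.0, "intermediate": 1.0, "expert": 0.5},
    "step_2": {"novice": None, "intermediate": 16.0, "expert": 4.0},
    "step_3": {"novice": None, "intermediate": None, "expert": 80.0},
}


def reproducibility_map(effort, budget_hours):
    """For each user class, mark a step True if its estimated effort fits the budget."""
    classes = sorted({cls for row in effort.values() for cls in row})
    rmap = {}
    for cls in classes:
        rmap[cls] = {}
        for step, row in effort.items():
            hours = row.get(cls)
            rmap[cls][step] = hours is not None and hours <= budget_hours
    return rmap


def total_effort(effort, cls):
    """Total hours to reproduce every step, or None if any step is infeasible for the class."""
    hours = [row.get(cls) for row in effort.values()]
    if any(h is None for h in hours):
        return None
    return sum(hours)
```

For example, `reproducibility_map(EFFORT, 100.0)["novice"]` marks only `step_1` as reproducible. A real map would use the per-step estimates reported in the study rather than these placeholders.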
This study applies the growing discipline of structural systems pharmacology prospectively to predict the pharmacological outcomes of antibacterial compounds in Escherichia coli K12. The work builds on previously established methods for the structural prediction of ligand binding pockets on proteins. It also uses and expands the previously developed genome-scale model of metabolism integrated with protein structures (GEM-PRO) for E. coli, so that protein complexes are accounted for structurally. Carefully selected case studies demonstrate the potential of this structural systems pharmacology framework for the discovery and development of antibacterial compounds.
The framework for predicting antibacterial activity was validated on a control set of well-studied compounds. It recapitulated experimentally determined protein binding interactions and the deleterious growth phenotypes that result from these interactions. The antibacterial activity of fosfomycin, sulfathiazole, and trimethoprim was accurately predicted. As a negative control, glucose was found to have no predicted antibacterial activity. Previously uncharacterized mechanisms of action were predicted for compounds with known antibacterial properties, including (1-hydroxyheptane-1,1-diyl)bis(phosphonic acid) and cholesteryl oleate. Five candidate inhibitors were predicted for tryptophan synthase β subunit (TrpB), a desirable target protein with no known inhibitors. In addition to these predictions, this effort significantly expanded the previously developed GEM-PRO to account for the physiological assemblies of protein complex structures whose activities are included in the E. coli K12 metabolic network.
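Whether a compound's predicted binding produces a growth phenotype depends on how protein complexes map onto metabolic reactions. In a genome-scale model, this mapping is usually expressed as gene-protein-reaction (GPR) rules. The sketch below assumes a toy model in which every reaction ID, gene ID, and the `ESSENTIAL` set are hypothetical. It shows why accounting for complexes matters. Inhibiting any single subunit of a complex disables its reaction unless an isozyme can take over. In an actual GEM-based workflow, essentiality would typically come from a constraint-based simulation such as flux balance analysis rather than from a fixed list, so the fixed set here is a simplification.

```python
from typing import Dict, List, Set

# Hypothetical toy GPR rules. Each reaction lists alternative enzymes
# (isozymes, combined with OR); each enzyme is a set of subunit genes that
# must all be functional (a protein complex, combined with AND).
GPR: Dict[str, List[Set[str]]] = {
    "R_A": [{"gA1", "gA2"}],      # heteromeric complex, no isozyme
    "R_B": [{"gB"}, {"gB_alt"}],  # two isozymes
    "R_C": [{"gC"}],              # single-subunit enzyme
}

# Hypothetical reactions whose loss is assumed to block growth.
ESSENTIAL: Set[str] = {"R_A", "R_B"}


def active_reactions(gpr: Dict[str, List[Set[str]]], inhibited: Set[str]) -> Set[str]:
    """A reaction stays active if at least one of its enzymes has no inhibited subunit."""
    return {
        rxn
        for rxn, enzymes in gpr.items()
        if any(not (subunits & inhibited) for subunits in enzymes)
    }


def predicted_growth_defects(gpr, essential: Set[str], inhibited: Set[str]) -> List[str]:
    """Essential reactions that are disabled when the given gene products are inhibited."""
    blocked = set(gpr) - active_reactions(gpr, inhibited)
    return sorted(blocked & essential)
```

For instance, a compound predicted to bind only the `gA2` subunit yields `["R_A"]`. A compound binding only `gB` yields no defect, because the isozyme encoded by `gB_alt` remains available.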
The structural systems pharmacology framework presented in this study was shown to be effective in predicting the molecular mechanisms of antibacterial compounds. The study provides a promising proof of principle for this approach to antibacterial development. It also raises specific molecular and systemic hypotheses about antibacterials that are amenable to experimental testing. The framework, and perhaps also its specific antibacterial predictions, is extensible to the development of antibacterial treatments for pathogenic E. coli and other bacterial pathogens.
Structural systems pharmacology; Antibacterial; Metabolic model; Ligand binding; Escherichia coli
Multipole expansions offer a natural path to coarse-graining the electrostatic potential. However, the expansion is valid only outside a sphere enclosing the charge distribution. It is therefore unsuitable for most applications that demand an accurate representation at arbitrary positions around the molecule. We propose and demonstrate a distributed multipole expansion approach that resolves this limitation, and we provide a practical algorithm for its computational implementation. The method partitions the charge distribution into subsystems such that the multipole expansion of each subsystem, and therefore of their superposition, is valid outside an enclosing surface of arbitrary shape around the molecule. The complexity of the resulting coarse-grained model of the electrostatic potential is dictated by the area of the molecular surface. Therefore, for a typical three-dimensional molecule, it scales as $N^{2/3}$, where $N$ is the number of charges in the system. This makes the method especially useful for coarse-grained studies of biological systems consisting of many large macromolecules, provided that the configuration of each individual molecule can be approximated as fixed.
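The limitation and its remedy can be made concrete with point charges. A single multipole expansion about the center of a charge distribution converges only outside the sphere that encloses all charges. Splitting the charges into subsystems gives each one a much smaller enclosing sphere, so the superposed expansions remain usable close to the molecule. The sketch below makes several assumptions. It truncates each expansion after the quadrupole term, uses units where $1/(4\pi\varepsilon_0) = 1$, and substitutes a simple cubic-grid partition with spherical enclosures for the paper's partitioning algorithm and arbitrary-shape surface. The function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]
Charge = Tuple[float, Vec]  # (charge, position)


def sub(a, b) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def norm(a) -> float:
    return math.sqrt(dot(a, a))


def exact_potential(charges: List[Charge], r: Vec) -> float:
    """Direct Coulomb sum (units with 1/(4*pi*eps0) = 1)."""
    return sum(q / norm(sub(r, p)) for q, p in charges)


def moments(charges: List[Charge], center: Vec):
    """Monopole, dipole vector, and traceless quadrupole tensor about `center`."""
    mono, dip = 0.0, [0.0, 0.0, 0.0]
    quad = [[0.0] * 3 for _ in range(3)]
    for q, p in charges:
        d = sub(p, center)
        d2 = dot(d, d)
        mono += q
        for i in range(3):
            dip[i] += q * d[i]
            for j in range(3):
                quad[i][j] += q * (3.0 * d[i] * d[j] - (d2 if i == j else 0.0))
    return mono, dip, quad


def expansion_potential(m, center: Vec, r: Vec) -> float:
    """Multipole series about `center`, truncated after the quadrupole term."""
    mono, dip, quad = m
    R = sub(r, center)
    Rn = norm(R)
    q_term = sum(quad[i][j] * R[i] * R[j] for i in range(3) for j in range(3))
    return mono / Rn + dot(dip, R) / Rn**3 + 0.5 * q_term / Rn**5


def build_distributed(charges: List[Charge], cell: float):
    """Group charges by cubic grid cell (a stand-in partition) and expand each group about its centroid."""
    groups: Dict[tuple, List[Charge]] = {}
    for q, p in charges:
        key = tuple(math.floor(c / cell) for c in p)
        groups.setdefault(key, []).append((q, p))
    terms = []
    for group in groups.values():
        n = len(group)
        center = tuple(sum(p[k] for _, p in group) / n for k in range(3))
        radius = max(norm(sub(p, center)) for _, p in group)
        terms.append((center, radius, moments(group, center)))
    return terms


def distributed_potential(terms, r: Vec) -> float:
    """Superpose the group expansions; each is valid only outside its own enclosing sphere."""
    total = 0.0
    for center, radius, m in terms:
        if norm(sub(r, center)) <= radius:
            raise ValueError("point lies inside a subsystem's enclosing sphere")
        total += expansion_potential(m, center, r)
    return total
```

Consider a rod-like chain of 20 charges on the x axis and the point `(2.0, 4.0, 0.0)`. That point lies inside the single enclosing sphere of the whole chain, so `build_distributed(rod, cell=100.0)` rejects it. With `cell=2.0`, the chain is split into pairs, and the superposed expansions agree closely with the direct Coulomb sum.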
Electrostatic potential; Coarse-graining; Molecular modeling; Multipole moments; Algorithms; Distributed multipole analysis
Determining the functional roles of non-synonymous Single Nucleotide Polymorphisms (nsSNPs) in complex phenotypes is a major challenge of modern biology. Statistical and machine learning techniques establish correlations between genotype and phenotype, but they may fail to infer the biologically relevant mechanisms. The emerging paradigm of Network-based Association Studies aims to address this limitation of statistical analysis. However, a mechanistic understanding of how individual molecular components work together in a system requires knowledge of molecular structures and their interactions.
To address the challenge of understanding the genetic, molecular, and cellular basis of complex phenotypes, we have, for the first time, developed a structural systems biology approach for genome-wide multiscale modeling of nsSNPs. The modeling ranges from the atomic details of molecular interactions to the emergent properties of biological networks. We apply our approach to determine the functional roles of nsSNPs associated with hypoxia tolerance in Drosophila melanogaster. An integrated view of the functional roles of nsSNPs at both the molecular and network levels allows us to identify driver mutations and their interactions (epistasis) in the H, Rad51D, Ulp1, Wnt5, HDAC4, Sol, Dys, GalNAc-T2, and CG33714 genes. All of these genes are involved in the up-regulation of the Notch and Gurken/EGFR signaling pathways. Moreover, we find that a large fraction of the driver mutations are neither located in conserved functional sites nor responsible for structural stability. Instead, they regulate protein activity through allosteric transitions, protein-protein interactions, or protein-nucleic acid interactions. This finding has implications for future Genome-Wide Association Studies.
Our studies demonstrate that consolidating statistical, structural, and network views of biomolecules and their interactions can provide new insight into the functional roles of nsSNPs in Genome-Wide Association Studies. This insight is one that neither knowledge of molecular structures nor knowledge of biological networks alone could achieve. Thus, multiscale modeling of nsSNPs may prove to be a powerful tool for establishing the functional roles of sequence variants in a wide array of applications.
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) develops tools and resources that provide a structural view of biology for research and education. The RCSB PDB website (http://www.rcsb.org) uses the curated 3D macromolecular data contained in the PDB archive to offer unique methods to access, report, and visualize data. Recent activities have focused on improving methods for simple and complex searches of PDB data, creating specialized access to chemical component data, and providing domain-based structural alignments. New educational resources are offered in the PDB-101 educational view of the main website. These include Author Profiles, which display a researcher’s PDB entries in a timeline. To promote different kinds of access to the RCSB PDB, Web Services have been expanded and an RCSB PDB Mobile application for the iPhone/iPad has been released. These improvements create new opportunities for analyzing and understanding structure data.
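Programmatic access of the kind that Web Services enable can be sketched with the standard library. The sketch below has several assumptions. It targets the RCSB Data API entry endpoint (`https://data.rcsb.org/rest/v1/core/entry/{id}`) and the file download host (`https://files.rcsb.org/download/{id}.cif`). These are current RCSB services and may differ from the Web Services described above. The helper names are hypothetical, and the ID check covers only the classic four-character PDB ID format.

```python
import json
import re
import urllib.request

# Classic four-character PDB ID: a digit 1-9 followed by three alphanumerics.
_PDB_ID = re.compile(r"^[1-9][A-Za-z0-9]{3}$")


def _checked(pdb_id: str) -> str:
    """Validate a classic PDB ID and return it in upper case."""
    if not _PDB_ID.match(pdb_id):
        raise ValueError(f"not a classic PDB ID: {pdb_id!r}")
    return pdb_id.upper()


def entry_url(pdb_id: str) -> str:
    """URL of the entry-level JSON metadata in the RCSB Data API."""
    return f"https://data.rcsb.org/rest/v1/core/entry/{_checked(pdb_id)}"


def mmcif_url(pdb_id: str) -> str:
    """URL of the entry's coordinate file in mmCIF format."""
    return f"https://files.rcsb.org/download/{_checked(pdb_id)}.cif"


def fetch_entry(pdb_id: str, timeout: float = 30.0) -> dict:
    """Download and parse the entry's JSON metadata (requires network access)."""
    with urllib.request.urlopen(entry_url(pdb_id), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With network access, `fetch_entry("4hhb")` returns the parsed JSON document for that entry. Its fields are defined by the Data API schema, so the API documentation should be checked before relying on specific keys.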