The ability to predict RNA secondary structure is fundamental for understanding and manipulating RNA function. The structural information obtained from selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) experiments greatly improves the accuracy of RNA secondary structure prediction. Recently, Das and colleagues [Kladwang et al., Biochemistry 50:8049 (2011)] proposed a “bootstrapping” approach to estimate the variance and helix-by-helix confidence levels of predicted secondary structures based on resampling (randomizing and summing) the measured SHAPE data. We show that the specific resampling approach described by Kladwang et al. introduces systematic errors and underestimates confidence in secondary structure prediction using SHAPE data. Instead, a leave-data-out jackknife approach better estimates the influence of a given experimental dataset on SHAPE-directed secondary structure modeling. Even when 35% of the data were left out in the jackknife approach, the confidence levels of SHAPE-directed secondary structure prediction were significantly higher than those calculated by Das and colleagues using bootstrapping. Helix confidence levels were thus significantly underestimated in the recent study, and the resampling approach implemented by Kladwang et al. is not an appropriate metric for assigning confidences in SHAPE-directed secondary structure modeling.
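The leave-data-out jackknife idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name, the `predict_helices` callable (a stand-in for a real SHAPE-directed folding program), the toy predictor, and the reactivity values are all assumptions made for illustration. Only the 35% leave-out fraction comes from the text.

```python
import random
from collections import Counter

def jackknife_helix_confidence(reactivities, predict_helices, leave_out=0.35,
                               n_replicates=100, seed=0):
    """Fraction of leave-data-out replicates in which each helix is recovered.

    reactivities: per-nucleotide SHAPE values (None means no data).
    predict_helices: hypothetical callable that maps a reactivity list to an
        iterable of hashable helix identifiers.
    """
    rng = random.Random(seed)
    measured = [i for i, r in enumerate(reactivities) if r is not None]
    n_drop = int(round(leave_out * len(measured)))
    counts = Counter()
    for _ in range(n_replicates):
        dropped = set(rng.sample(measured, n_drop))
        # Mask the dropped positions; no value is duplicated or summed.
        subset = [None if i in dropped else r
                  for i, r in enumerate(reactivities)]
        counts.update(set(predict_helices(subset)))
    return {helix: c / n_replicates for helix, c in counts.items()}

def toy_predictor(reactivities):
    """Toy rule, NOT a folding algorithm: a 5-nt window counts as a 'helix'
    when the mean of its measured reactivities is low."""
    helices = []
    for start in range(0, len(reactivities), 5):
        window = [r for r in reactivities[start:start + 5] if r is not None]
        if window and sum(window) / len(window) < 0.3:
            helices.append((start, start + 4))
    return helices

# Made-up reactivities: low, high, low.
data = [0.05, 0.1, 0.0, 0.2, 0.1,
        1.2, 0.9, 1.5, 0.8, 1.1,
        0.1, 0.0, 0.15, 0.05, 0.2]
confidence = jackknife_helix_confidence(data, toy_predictor)
```

Each replicate masks a random subset of positions rather than resampling with replacement, so every retained measurement enters the prediction exactly once.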
Aggregation of Cu, Zn Superoxide Dismutase (SOD1) is often found in Amyotrophic Lateral Sclerosis (ALS) patients. The fibrillar aggregates formed by wildtype and various disease-associated mutants have recently been found to have distinct cores and morphologies. Previous computational and experimental studies of wildtype SOD1 suggest that the apo-monomer, highly aggregation-prone, displays substantial local unfolding dynamics. The residual folded structure of locally unfolded apoSOD1 corresponds to peptide segments forming the aggregation core as identified by a combination of proteolysis and mass spectroscopy. Therefore, we hypothesize that the destabilization of apoSOD1 caused by various mutations leads to distinct local unfolding dynamics. The partially unfolded structure, exposing the hydrophobic core and backbone hydrogen bond donors and acceptors, is prone to aggregate. The peptide segments in the residual folded structures form the “building block” for aggregation, which in turn determines the morphology of the aggregates. To test this hypothesis, we apply a multiscale simulation approach to study the aggregation of three typical SOD1 variants: wildtype, G37R, and I149T. Each of these SOD1 variants has distinct peptide segments forming the core structure and features different aggregate morphologies. We perform atomistic molecular dynamics simulations to study the conformational dynamics of apoSOD1 monomer, and coarse-grained molecular dynamics simulations to study the aggregation of partially unfolded SOD1 monomers. Our computational studies of monomer local unfolding and the aggregation of different SOD1 variants are consistent with experiments, supporting the hypothesis of the formation of aggregation “building blocks” via apo-monomer local unfolding as the mechanism of SOD1 fibrillar aggregation.
SOD1 misfolding and aggregation; fibrillar aggregate; aggregation building block; molecular dynamics; multiscale modeling
Nature has evolved proteins to counteract forces applied to living cells, and has designed proteins that can sense forces. One can appreciate Nature’s ingenuity in evolving these proteins to be highly sensitive to force and to have a high dynamic range of forces over which they operate. To achieve this level of sensitivity, many of these proteins comprise multiple domains and linking peptides connecting these domains, each with its own force response regime. Here, using a simple model of a protein, we address the question of how each individual domain responds to force. We also ask how multi-domain proteins respond to forces. We find that the end-to-end distance of individual domains under force scales linearly with force. In multi-domain proteins, we find that the force response has a rich range: at low force, extension is predominantly governed by “weaker” linking peptides or domain intermediates, while at higher force, the extension is governed by unfolding of individual domains. Overall, the force extension curve comprises multiple sigmoidal transitions governed by unfolding of linking peptides and domains. Our study provides a basic framework for the understanding of protein response to force, and allows for interpretation of experiments in which force is used to study the mechanical properties of multi-domain proteins.
force; mechano-sensing proteins; multi-domain proteins
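A force-extension curve built from several sigmoidal transitions can be sketched as follows. The sketch assumes a generic two-state description of each linker or domain, which is not necessarily the model used in the study. The parameter names and all numerical values are hypothetical.

```python
import math

KT_PN_NM = 4.11  # thermal energy near room temperature, in pN*nm

def unfolded_fraction(force, f_half, dx):
    """Two-state sigmoid: probability that an element is unfolded at a force.

    f_half: midpoint force (pN); dx: distance to the transition state (nm).
    """
    return 1.0 / (1.0 + math.exp(-(force - f_half) * dx / KT_PN_NM))

def extension(force, elements):
    """Total extension gain (nm) as a sum of independent sigmoidal transitions."""
    return sum(dl * unfolded_fraction(force, f_half, dx)
               for f_half, dx, dl in elements)

# Hypothetical elements: (midpoint force in pN, dx in nm, length gain in nm)
elements = [(5.0, 2.0, 4.0),    # a "weaker" linking peptide
            (20.0, 1.0, 25.0)]  # a folded domain
curve = [(f, extension(f, elements)) for f in range(0, 41, 5)]
```

With these values the linker transition dominates the low-force part of the curve and the domain transition dominates the high-force part, which is the qualitative behavior described above.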
Opioids that stimulate the μ-opioid receptor (MOR1) are the most frequently prescribed and effective analgesics. Here we present a structural model of MOR1. Molecular dynamics simulations show a ligand-dependent increase in the conformational flexibility of the third intracellular loop that couples with the G-protein complex. These simulations likewise identified residues that form frequent contacts with ligands. We validated the binding residues using site-directed mutagenesis coupled with radioligand binding and functional assays. The model was used to blindly screen a library of ~1.2 million compounds. From the thirty-four compounds predicted to be strong binders, the top three candidates were examined using biochemical assays. One compound showed high efficacy and potency. Post hoc testing revealed this compound to be nalmefene, a potent clinically used antagonist, thus further validating the model. In summary, the MOR1 model provides a tool for elucidating the structural mechanism of ligand-initiated cell signaling and screening for novel analgesics.
Catechol O-methyltransferase (COMT) metabolizes catechol moieties by methylating a single hydroxyl group at the meta- or para- hydroxyl position. Hydrophobic amino acids near the active site of COMT influence the regioselectivity of this reaction. Our sequence analysis highlights their importance by showing that these residues are highly conserved throughout evolution. Reaction barriers calculated in the gas phase reveal a lower barrier during methylation at the meta- position, suggesting that the observed meta-regioselectivity of COMT can be attributed to the substrate itself, and that COMT has evolved residues to orient the substrate in a manner that increases the rate of catalysis.
Molecular modeling of proteins, including homology modeling, structure determination, and knowledge-based protein design, requires tools to evaluate and refine three-dimensional protein structures. Steric clash is one of the artifacts prevalent in low-resolution structures and homology models. Steric clashes arise due to the unnatural overlap of any two non-bonding atoms in a protein structure. Removal of severe steric clashes is often challenging, since many existing refinement programs do not accept structures with severe steric clashes. Here, we present a quantitative approach for identifying steric clashes in proteins by defining clashes based on the van der Waals repulsion energy of the clashing atoms. We also define a metric for quantitative estimation of the severity of clashes in proteins by performing statistical analysis of clashes in high-resolution protein structures. We describe a rapid, automated and robust protocol, Chiron, which efficiently resolves severe clashes in low-resolution structures and homology models with minimal perturbation in the protein backbone. Benchmark studies highlight the efficiency and robustness of Chiron compared to other widely used methods. We provide Chiron as an automated web server to evaluate and resolve clashes in protein structures that can be further used for more accurate protein design.
Homology modeling; refinement; Chiron; Discrete Molecular Dynamics; Protein Design
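An energy-based clash definition can be sketched in a few lines of Python. This is an illustration, not Chiron's code. The 12-6 Lennard-Jones form, the combination rule (summed radii, geometric-mean well depth), the 0.3 kcal/mol threshold, and the atom parameters are all assumptions. Exclusion of bonded pairs is omitted for brevity.

```python
import math
from itertools import combinations

def lj_energy(r, r_min, eps):
    """12-6 Lennard-Jones energy (kcal/mol); r_min is the minimum-energy distance."""
    x = (r_min / r) ** 6
    return eps * (x * x - 2.0 * x)

def find_clashes(atoms, threshold=0.3):
    """Return (i, j, energy) for atom pairs whose repulsion exceeds threshold.

    atoms: list of ((x, y, z), vdW radius in angstrom, well depth in kcal/mol).
    The threshold value is an illustrative assumption.
    """
    clashes = []
    for (i, (p, ri, ei)), (j, (q, rj, ej)) in combinations(enumerate(atoms), 2):
        r = math.dist(p, q)
        energy = lj_energy(r, ri + rj, math.sqrt(ei * ej))
        if energy > threshold:
            clashes.append((i, j, energy))
    return clashes

# Three hypothetical carbon-like atoms; only the first two overlap severely.
atoms = [((0.0, 0.0, 0.0), 1.9, 0.1),
         ((2.5, 0.0, 0.0), 1.9, 0.1),
         ((6.0, 0.0, 0.0), 1.9, 0.1)]
clashes = find_clashes(atoms)
```

Defining a clash by energy rather than by a fixed distance cutoff means atom pairs with different radii are judged on the same scale.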
Existing flexible docking approaches model the ligand and receptor flexibility either separately or in a loosely-coupled manner, which captures the conformational changes inefficiently. Here, we propose a flexible docking approach, MedusaDock, which models both ligand and receptor flexibility simultaneously with sets of discrete rotamers. We develop an algorithm to build the ligand rotamer library “on-the-fly” during docking simulations. MedusaDock benchmarks demonstrate a rapid sampling efficiency and high prediction accuracy in both self-docking (to the co-crystallized state) and cross-docking (to a state co-crystallized with a different ligand), the latter of which mimics the virtual-screening procedure in computational drug discovery. We also perform a virtual-screening test of four flexible targets: cyclin-dependent kinase 2, vascular endothelial growth factor receptor 2, HIV reverse transcriptase, and HIV protease. We find significant improvements of virtual-screening enrichments when compared to rigid-receptor methods. The predictive power of MedusaDock in cross-docking and preliminary virtual-screening benchmarks highlights the importance of modeling both ligand and receptor flexibility simultaneously in computational docking.
Conformational changes of filamin A under stress have been postulated to play crucial roles in signaling pathways of cell responses. Direct observation of conformational changes under stress is beyond the resolution of current experimental techniques. On the other hand, computational studies are mainly limited to either traditional molecular dynamics simulations of short durations and high forces or simulations of simplified models. Here we perform all-atom discrete molecular dynamics (DMD) simulations to study thermally and force-induced unfolding of filamin A. The high conformational sampling efficiency of DMD allows us to observe force-induced unfolding of filamin A Ig domains under physiological forces. The computationally identified critical unfolding forces agree well with experimental measurements. Despite a large heterogeneity in the population of force-induced intermediate states, we find a common initial unfolding intermediate in all the Ig domains of filamin, where the N-terminal strand unfolds. We also study the thermal unfolding of several filamin Ig-like domains. We find that thermally induced unfolding features an early-stage intermediate state similar to the one observed in force-induced unfolding and characterized by the N-terminal strand being unfurled. We propose that the N-terminal strand may act as a conformational switch that unfolds under physiological forces, leading to exposure of cryptic binding sites, removal of native binding sites, and modulation of the quaternary structure of domains.
Accurate RNA structure modeling is an important, incompletely solved, challenge. Single-nucleotide resolution SHAPE (selective 2'-hydroxyl acylation analyzed by primer extension) yields an experimental measurement of local nucleotide flexibility that can be incorporated as pseudo-free energy change constraints to direct secondary structure predictions. Prior work from our laboratory has emphasized both the overall accuracy of this approach and the need for nuanced interpretation of some apparent discrepancies between modeled and accepted structures. Recent studies by Das and colleagues [Kladwang et al., Biochemistry 50:8049 (2011) and Nat. Chem. 3:954 (2011)], focused on analyzing six small RNAs, yielded poorer RNA secondary structure predictions than expected based on prior benchmarking efforts. To understand the features that led to these divergent results, we re-examined four RNAs yielding the poorest results in this recent work – tRNAPhe, the adenine and cyclic-di-GMP riboswitches, and 5S rRNA. Most of the errors reported by Das and colleagues reflected non-standard experiment and data processing choices, and selective scoring rules. For two RNAs, tRNAPhe and the adenine riboswitch, secondary structure predictions are nearly perfect if no experimental information is included but were rendered inaccurate by the Das and colleagues SHAPE data. When best practices were used, single-sequence SHAPE-directed secondary structure modeling recovered ~93% of individual base pairs and greater than 90% of helices in the four RNAs, essentially indistinguishable from the mutate-and-map approach with the exception of a single helix in the 5S rRNA. The field of experimentally-directed RNA secondary structure prediction is entering a phase focused on the most difficult prediction challenges. We outline five constructive principles for guiding this field forward.
Over the past three decades the protein folding field has undergone monumental changes. Originally a purely academic question, how a protein folds has now become vital in understanding diseases and our abilities to rationally manipulate cellular life by engineering protein folding pathways. We review and contrast past and recent developments in the protein folding field. Specifically, we discuss the progress in our understanding of protein folding thermodynamics and kinetics, the properties of evasive intermediates, and unfolded states. We also discuss how some abnormalities in protein folding lead to protein aggregation and human diseases.
Understanding the role of biomolecular dynamics in cellular processes leading to human diseases and the ability to rationally manipulate these processes is of fundamental importance in scientific research. The last decade has witnessed significant progress in probing the biophysical behavior of proteins. However, we are still limited in understanding how changes in protein dynamics and inter-protein interactions occurring at short length- and time-scales lead to aberrations in their biological function. Bridging this gap using computer simulations marks a challenging frontier in computational biology. Here we examine hypothesis-driven simplified protein models in conjunction with discrete molecular dynamics in the study of protein aggregation, implicated in a series of neurodegenerative diseases, such as Alzheimer's and Huntington's diseases. Discrete molecular dynamics simulations of simplified protein models have emerged as a powerful methodology with the ability to bridge the gap in time and length scales from protein dynamics to aggregation, and provide an indispensable tool for probing protein aggregation.
Protein Aggregation; Protein Misfolding; Simplified Modeling; Aggregation Kinetics; Folding Thermodynamics; Discrete Molecular Dynamics; Molecular Dynamics; Computational Biology; Biophysics; MD; DMD; Misfolding; Review
Most cystic fibrosis is caused by a deletion of a single residue (F508) in CFTR that disrupts the folding and biosynthetic maturation of the ion channel protein. Progress towards understanding the underlying mechanisms and overcoming the defect remains incomplete. Here we show that the thermal instability of human ΔF508 CFTR channel activity evident in both cell-attached membrane patches and planar phospholipid bilayers is not observed in corresponding mutant CFTRs of several non-mammalian species. These more stable orthologs are distinguished from their mammalian counterparts by the substitution of proline residues at several key dynamic locations in the first nucleotide domain (NBD1), including the structurally diverse region (SDR), the gamma phosphate switch loop and the Regulatory Insertion (RI). Molecular dynamics analyses revealed that addition of the prolines could reduce flexibility at these locations and increase the temperatures of unfolding transitions of ΔF508 NBD1 to that of the wild-type. Introduction of these prolines experimentally into full-length human ΔF508 CFTR together with the already recognized I539T suppressor mutation, also in the SDR, restored channel function and thermodynamic stability as well as its trafficking to and lifetime at the cell surface. Thus, while cellular manipulations that circumvent its culling by quality control systems leave ΔF508 CFTR dysfunctional at physiological temperature, restoration of the delicate balance between the dynamic protein’s inherent stability and channel activity returns a near-normal state.
ABC transporters; CFTR; protein thermal stability; ion channel; DMD simulations
Until now it has been impractical to observe protein folding in silico for proteins larger than 50 residues. Limitations of both force field accuracy and computational efficiency make the folding problem very challenging. Here we employ discrete molecular dynamics (DMD) simulations with an all-atom force field to fold fast-folding proteins. We extend the DMD force field by introducing long-range electrostatic interactions to model salt-bridges and a sequence-dependent semi-empirical potential accounting for natural tendencies of certain amino acid sequences to form specific secondary structures. We enhance the computational performance by parallelizing the DMD algorithm. Using a small number of commodity computers, we achieve sampling quality and folding accuracy comparable to the explicit-solvent simulations performed on high-end hardware. We demonstrate that DMD can be used to observe equilibrium folding of villin headpiece and WW domain, study two-state folding kinetics and sample near-native states in ab initio folding of proteins of ~100 residues.
Conformational dynamics; structure prediction; implicit solvent; parallel event-driven simulation
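The event-driven character of DMD can be illustrated by its basic building block: computing when two ballistically moving particles next reach a given distance. The sketch below covers only the hard-core contact case for a single pair. It is not the authors' parallel algorithm or force field, and the function name is an assumption.

```python
import math

def collision_time(pos_a, vel_a, pos_b, vel_b, contact_distance):
    """Time until two ballistic particles reach contact_distance, else None."""
    dr = [b - a for a, b in zip(pos_a, pos_b)]   # relative position
    dv = [b - a for a, b in zip(vel_a, vel_b)]   # relative velocity
    b = sum(x * v for x, v in zip(dr, dv))
    if b >= 0.0:
        return None  # particles are not approaching each other
    v2 = sum(v * v for v in dv)
    r2 = sum(x * x for x in dr)
    disc = b * b - v2 * (r2 - contact_distance ** 2)
    if disc < 0.0:
        return None  # closest approach stays beyond the contact distance
    # Smaller root of |dr + dv * t| = contact_distance
    return (-b - math.sqrt(disc)) / v2

# Head-on approach along x: separation 10, closing speed 2, contact at 2.
t_hit = collision_time((0, 0, 0), (1, 0, 0), (10, 0, 0), (-1, 0, 0), 2.0)
# Same velocities, but offset by 5 in y, so the particles miss.
t_miss = collision_time((0, 0, 0), (1, 0, 0), (10, 5, 0), (-1, 0, 0), 2.0)
```

An event-driven simulation would keep such times for all relevant pairs in a priority queue and advance directly to the earliest event, instead of integrating with a fixed time step.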
Limited proteolysis, accomplished by endopeptidases, is a ubiquitous phenomenon underlying the regulation and activation of many enzymes, receptors and other proteins synthesized as inactive precursors. Serine proteases are one of the largest and most conserved families of endopeptidases involved in diverse cellular activities including wound healing, blood coagulation and immune responses. Heteromeric α,β,γ-epithelial sodium channels (ENaC), associated with diseases like cystic fibrosis and Liddle’s syndrome, are irreversibly stimulated by membrane-anchored proteases (MAPs) and furin-like convertases. Matriptase/Channel activating protease-3 (CAP3) is one of several MAPs that potently activate ENaC. Despite identification of protease cleavage sites, the basis for enhanced susceptibility of α- and γ-ENaC to proteases remains elusive. Here, we elucidate the energetic and structural bases for activation of ENaC by CAP3. We find a region near the γ-ENaC furin site that was previously unidentified as a critical cleavage site for CAP3-mediated stimulation. We also report that CAP3 mediates cleavage of ENaC at basic residues downstream of the furin site. Our results indicate that surface proteases alone are sufficient to fully activate uncleaved ENaC, and explain how ENaC in epithelia expressing surface-active proteases can appear refractory to soluble proteases. Our results support a model in which proteases prime ENaC for activation by cleaving at the furin site, and cleavage at downstream sites is accomplished by membrane surface proteases or extracellular soluble proteases. Based on our results, we propose a dynamics-driven “anglerfish” mechanism that explains less stringent sequence requirements for substrate recognition and cleavage by matriptase compared to furin.
ENaC; serine endopeptidase; Xenopus; voltage clamp; discrete molecular dynamics
Poor performance of scoring functions is a well-known bottleneck in structure-based virtual screening (VS), which is most frequently manifested in the scoring functions’ inability to discriminate between true ligands and known non-binders (therefore designated as binding decoys). This deficiency leads to a large number of false positive hits resulting from virtual screening. We have hypothesized that filtering out or penalizing docking poses recognized as non-native (i.e., pose decoys) should improve the performance of virtual screening in terms of improved identification of true binders. Using several concepts from the field of cheminformatics, we have developed a novel approach to identifying pose decoys from an ensemble of poses generated by computational docking procedures. We demonstrate that the use of a target-specific pose (-scoring) filter in combination with a physical force field-based scoring function (MedusaScore) leads to significant improvement of hit rates in virtual screening studies for 12 of the 13 benchmark sets from the clustered version of the Database of Useful Decoys (DUD). This new hybrid scoring function outperforms several conventional structure-based scoring functions, including XSCORE∷HMSCORE, ChemScore, PLP, and Chemgauss3, in six out of 13 data sets at the early stage of VS (up to 1% decoys of the screening database). We compare our hybrid method with several novel VS methods that were recently reported to have good performances on the same DUD data sets. We find that the retrieved ligands using our method are chemically more diverse in comparison with two ligand-based methods (FieldScreen and FLAP∷LBX). We also compare our method with FLAP∷RBLB, a high-performance VS method that also utilizes both the receptor and the cognate ligand structures. Interestingly, we find that the top ligands retrieved using our method are highly complementary to those retrieved using FLAP∷RBLB, hinting at effective directions for best VS applications.
We suggest that this integrative virtual screening approach combining cheminformatics and molecular mechanics methodologies may be applied to a broad variety of protein targets to improve the outcome of structure-based drug discovery studies.
Protein-peptide interactions play important roles in many cellular processes, including signal transduction, trafficking, and immune recognition. Protein conformational changes upon binding, an ill-defined peptide binding surface, and the large number of peptide degrees of freedom make the prediction of protein-peptide interactions particularly challenging. To address these challenges, we perform rapid molecular dynamics simulations in order to examine the energetic and dynamic aspects of protein-peptide binding. We find that, in most cases, we recapitulate the native binding sites and native-like poses of protein-peptide complexes. Inclusion of electrostatic interactions in simulations significantly improves the prediction accuracy. Our results also highlight the importance of protein conformational flexibility, especially side-chain movement, which allows the peptide to optimize its conformation. Our findings not only demonstrate the importance of sufficient sampling of the protein and peptide conformations, but also reveal the possible effects of electrostatics and conformational flexibility on peptide recognition.
Molecular modeling guided by experimentally-derived structural information is an attractive approach for three-dimensional structure determination of complex RNAs that are not amenable to study by high-resolution methods. Hydroxyl radical probing (HRP), performed routinely in many laboratories, provides a measure of solvent accessibility at individual nucleotides. HRP measurements have, to date, only been used to evaluate RNA models qualitatively. Here, we report development of a quantitative structure refinement approach using HRP measurements to drive discrete molecular dynamics simulations for RNAs ranging in size from 80 to 230 nucleotides. HRP reactivities were first used to identify RNAs that form extensive helical packing interactions. For these RNAs, we achieved highly significant structure predictions, given inputs of RNA sequence and base pairing. This HRP-directed tertiary structure refinement approach generates robust structural hypotheses useful for guiding explorations of structure-function interrelationships in RNA.
Prolyl hydroxylase domain 2 containing protein (PHD2) is a key protein in the regulation of angiogenesis and metastasis. Under normoxic conditions, PHD2 triggers the degradation of hypoxia-inducible factor 1α (HIF-1α), which induces the expression of hypoxia response genes. Therefore, the correct function of PHD2 would inhibit angiogenesis and consequent metastasis of tumor cells under normoxic conditions. PHD2 mutations have been reported in some common cancers. However, high levels of HIF-1α protein were observed even in normoxic metastatic tumors with normal expression of wild type PHD2. PHD2 malfunction due to protein misfolding may be the underlying cause of metastasis and invasion in such cases. In this study, we scrutinize the unfolding pathways of the PHD2 catalytic domain’s possible species and demonstrate the properties of their unfolding states by computational approaches. Our study introduces the possibility of an aggregation disaster for the prominent species of PHD2 during its partial unfolding. This may explain the inability of PHD2 to regulate HIF-1α levels in some normoxic tumor types.
The curated CSAR-NRC benchmark sets provide a valuable opportunity for testing or comparing the performance of both existing and novel scoring functions. We apply two different scoring functions, both independently and in combination, to predict binding affinity of ligands in the CSAR-NRC datasets. The first, reported here for the first time, employs multiple chemical-geometrical descriptors of the protein-ligand interface to develop Quantitative Structure – Binding Affinity Relationships (QSBAR) models; these models are then used to predict binding affinity of ligands in the external dataset. The second is a physical force field-based scoring function, MedusaScore. We show that both individual scoring functions achieve statistically significant prediction accuracies, with the squared correlation coefficient (R²) between actual and predicted binding affinity of 0.44/0.53 (Set1/Set2) with QSBAR models and 0.34/0.47 (Set1/Set2) with MedusaScore. Importantly, we find that the combination of QSBAR models and MedusaScore into a consensus scoring function affords higher prediction accuracy than either of the contributing methods, achieving R² of 0.45/0.58 (Set1/Set2). Furthermore, we identify several chemical features and non-covalent interactions that may be responsible for the inaccurate prediction of binding affinity for several ligands by the scoring functions employed in this study.
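The squared correlation coefficient and one possible consensus rule can be sketched as follows. The abstract does not state how the two scoring functions are combined, so averaging z-scored predictions is an assumption. The affinity values are made up and are not CSAR-NRC data. Both predictors are assumed to be oriented so that higher values mean stronger binding.

```python
import statistics

def r_squared(x, y):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def zscores(values):
    """Standardize values to zero mean and unit (population) variance."""
    mean, sd = statistics.fmean(values), statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def consensus(pred_a, pred_b):
    """Average of z-scored predictions (an assumed combination rule)."""
    return [(a + b) / 2.0 for a, b in zip(zscores(pred_a), zscores(pred_b))]

# Made-up affinities for six ligands (higher = stronger binding).
actual = [4.0, 5.5, 6.1, 7.3, 8.0, 9.2]
model_a = [4.8, 5.0, 6.9, 6.5, 8.4, 8.1]   # stands in for a QSBAR-like model
model_b = [5.2, 4.9, 5.8, 7.9, 7.0, 9.5]   # stands in for a sign-flipped energy score
combined = consensus(model_a, model_b)
scores = {name: r_squared(actual, pred)
          for name, pred in [("a", model_a), ("b", model_b), ("consensus", combined)]}
```

Standardizing before averaging keeps either predictor from dominating only because of its units. Whether a consensus improves R² depends on how correlated the two predictors' errors are.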
Motivation: Increasing use of structural modeling for understanding structure–function relationships in proteins has led to the need to ensure that the protein models being used are of acceptable quality. Quality of a given protein structure can be assessed by comparing various intrinsic structural properties of the protein to those observed in high-resolution protein structures.
Results: In this study, we present tools to compare a given structure to high-resolution crystal structures. We assess packing by calculating the total void volume, the percentage of unsatisfied hydrogen bonds, the number of steric clashes and the scaling of the accessible surface area. We assess covalent geometry by determining bond lengths, angles, dihedrals and rotamers. The statistical parameters for the above measures, obtained from high-resolution crystal structures, enable us to provide a quality-score that points to specific areas where a given protein structural model needs improvement.
Availability and Implementation: We provide these tools that appraise protein structures in the form of a web server Gaia (http://chiron.dokhlab.org). Gaia evaluates the packing and covalent geometry of a given protein structure and provides quantitative comparison of the given structure to high-resolution crystal structures.
Supplementary information: Supplementary data are available at Bioinformatics online.
We developed a new system for light-induced protein dimerization in living cells using a novel photocaged analog of rapamycin (pRap) together with an engineered rapamycin binding domain (iFKBP). Using focal adhesion kinase as a target, we demonstrated successful light-mediated regulation of protein interaction and localization in living cells. Modification of this approach enabled light-triggered activation of a protein kinase and initiation of kinase-induced phenotypic changes in vivo.
Aggregation of Cu, Zn superoxide dismutase (SOD1) is implicated in Amyotrophic Lateral Sclerosis (ALS). Glutathionylation and phosphorylation of SOD1 are omnipresent in the human body, even in healthy individuals, and have been shown to increase SOD1 dimer dissociation, which is the first step on the pathway toward SOD1 aggregation. We find that post-translational modification of SOD1, especially glutathionylation, promotes dimer dissociation. We discover an intermediate state in the pathway to dissociation, a conformational change that involves a “loosening” of the β-barrels and a loss or shift of dimer interface interactions. In modified SOD1, this intermediate state is stabilized as compared to unmodified SOD1. The presence of post-translational modifications could explain the environmental factors involved in the speed of disease progression. Because post-translational modifications such as glutathionylation are often induced by oxidative stress, post-translational modification of SOD1 could be a factor in the occurrence of sporadic cases of ALS, which make up 90% of all cases of the disease.
Motivation: Identifying the location of binding sites on proteins is of fundamental importance for a wide range of applications, including molecular docking, de novo drug design, structure identification and comparison of functional sites. Here we present Erebus, a web server that searches the entire Protein Data Bank for a given substructure defined by a set of atoms of interest, such as the binding scaffolds for small molecules. The identified substructure contains atoms having the same names, belonging to the same amino acids and separated by the same distances (within a given tolerance) as the atoms of the query structure. The accuracy of a match is measured by the root-mean-square deviation or by the normal weight with a given variance. Tests show that our approach can reliably locate rigid binding scaffolds of drugs and metal ions.
Availability and Implementation: We provide this service through a web server at http://erebus.dokhlab.org.
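The matching criterion (same atom names, same residue types, same pairwise distances within a tolerance) can be sketched for a single candidate as follows. This is not the Erebus search algorithm. The function names, the 0.5 Å tolerance, the use of a distance-based RMSD, and the example coordinates are assumptions, and atoms are assumed to be listed in corresponding order.

```python
import math
from itertools import combinations

def pair_distances(atoms):
    """All pairwise distances between atoms given as (name, residue, xyz)."""
    return [math.dist(a[2], b[2]) for a, b in combinations(atoms, 2)]

def match_substructure(query, candidate, tolerance=0.5):
    """Return the RMSD over pairwise distances if candidate matches, else None."""
    if len(query) != len(candidate) or len(query) < 2:
        return None
    # Atom names and residue names must agree position by position.
    if any(q[:2] != c[:2] for q, c in zip(query, candidate)):
        return None
    diffs = [dq - dc for dq, dc in
             zip(pair_distances(query), pair_distances(candidate))]
    if any(abs(d) > tolerance for d in diffs):
        return None
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical three-atom query resembling a metal-binding scaffold.
query = [("NE2", "HIS", (0.0, 0.0, 0.0)),
         ("NE2", "HIS", (3.2, 0.0, 0.0)),
         ("SG", "CYS", (1.6, 3.0, 0.0))]
# Same geometry, translated in space.
shifted = [(n, r, (x + 10.0, y - 4.0, z + 2.0)) for n, r, (x, y, z) in query]
# Third atom displaced far beyond the tolerance.
distorted = query[:2] + [("SG", "CYS", (1.6, 5.5, 0.0))]
rmsd_shifted = match_substructure(query, shifted)
rmsd_distorted = match_substructure(query, distorted)
```

Comparing internal distances rather than coordinates makes the test independent of how the candidate is positioned or rotated, so no superposition is needed.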
We present a computational approach that can quickly search a large protein structural database to identify structures that fit a given electron density, such as one determined by cryo-electron microscopy. We use geometric invariants (fingerprints) constructed using 3D Zernike moments to describe the electron density, and reduce the problem of fitting the structure to the electron density to a simple fingerprint comparison. Using this approach, we are able to screen the entire Protein Data Bank and identify structures that fit two experimental electron densities determined by cryo-electron microscopy.
cryo-EM; density fitting; structural genome; Zernike; geometric invariants
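Once each density is reduced to a fixed-length fingerprint, the search step amounts to ranking database entries by fingerprint distance. The sketch below shows only that comparison step. Computing the 3D Zernike invariants is not shown, and the Euclidean metric, the entry names, and the fingerprint values are assumptions.

```python
import math

def rank_by_fingerprint(query_fp, database, top_k=3):
    """Return the top_k (distance, entry_id) pairs closest to the query."""
    scored = sorted((math.dist(query_fp, fp), entry_id)
                    for entry_id, fp in database.items())
    return scored[:top_k]

# Hypothetical 4-component fingerprints; real descriptors would be longer.
query_fp = (0.9, 0.1, 0.4, 0.2)
database = {
    "entry_A": (0.9, 0.1, 0.5, 0.2),
    "entry_B": (0.1, 0.8, 0.3, 0.9),
    "entry_C": (0.8, 0.2, 0.4, 0.1),
}
hits = rank_by_fingerprint(query_fp, database)
```

Because the fingerprints are rotation-invariant by construction, the comparison needs no alignment step, which is what makes screening a large database tractable.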