Aggregation of Cu, Zn Superoxide Dismutase (SOD1) is often found in Amyotrophic Lateral Sclerosis (ALS) patients. The fibrillar aggregates formed by wildtype and various disease-associated mutants have recently been found to have distinct cores and morphologies. Previous computational and experimental studies of wildtype SOD1 suggest that the apo-monomer, highly aggregation-prone, displays substantial local unfolding dynamics. The residual folded structure of locally unfolded apoSOD1 corresponds to peptide segments forming the aggregation core as identified by a combination of proteolysis and mass spectroscopy. Therefore, we hypothesize that the destabilization of apoSOD1 caused by various mutations leads to distinct local unfolding dynamics. The partially unfolded structure, exposing the hydrophobic core and backbone hydrogen bond donors and acceptors, is prone to aggregate. The peptide segments in the residual folded structures form the “building block” for aggregation, which in turn determines the morphology of the aggregates. To test this hypothesis, we apply a multiscale simulation approach to study the aggregation of three typical SOD1 variants: wildtype, G37R, and I149T. Each of these SOD1 variants has distinct peptide segments forming the core structure and features different aggregate morphologies. We perform atomistic molecular dynamics simulations to study the conformational dynamics of apoSOD1 monomer, and coarse-grained molecular dynamics simulations to study the aggregation of partially unfolded SOD1 monomers. Our computational studies of monomer local unfolding and the aggregation of different SOD1 variants are consistent with experiments, supporting the hypothesis of the formation of aggregation “building blocks” via apo-monomer local unfolding as the mechanism of SOD1 fibrillar aggregation.
SOD1 misfolding and aggregation; fibrillar aggregate; aggregation building block; molecular dynamics; multiscale modeling
Opioids that stimulate the μ-opioid receptor (MOR1) are the most frequently prescribed and effective analgesics. Here we present a structural model of MOR1. Molecular dynamics simulations show a ligand-dependent increase in the conformational flexibility of the third intracellular loop that couples with the G-protein complex. These simulations likewise identified residues that form frequent contacts with ligands. We validated the binding residues using site-directed mutagenesis coupled with radioligand binding and functional assays. The model was used to blindly screen a library of ~1.2 million compounds. From the thirty-four compounds predicted to be strong binders, the top three candidates were examined using biochemical assays. One compound showed high efficacy and potency. Post hoc testing revealed this compound to be nalmefene, a potent clinically used antagonist, thus further validating the model. In summary, the MOR1 model provides a tool for elucidating the structural mechanism of ligand-initiated cell signaling and screening for novel analgesics.
Catechol O-methyltransferase (COMT) metabolizes catechol moieties by methylating a single hydroxyl group at the meta- or para- hydroxyl position. Hydrophobic amino acids near the active site of COMT influence the regioselectivity of this reaction. Our sequence analysis highlights their importance by showing that these residues are highly conserved throughout evolution. Reaction barriers calculated in the gas phase reveal a lower barrier during methylation at the meta- position, suggesting that the observed meta-regioselectivity of COMT can be attributed to the substrate itself, and that COMT has evolved residues to orient the substrate in a manner that increases the rate of catalysis.
Molecular modeling of proteins including homology modeling, structure determination, and knowledge-based protein design requires tools to evaluate and refine three-dimensional protein structures. Steric clash is one of the artifacts prevalent in low-resolution structures and homology models. Steric clashes arise due to the unnatural overlap of any two non-bonding atoms in a protein structure. Usually, removal of severe steric clashes in some structures is challenging since many existing refinement programs do not accept structures with severe steric clashes. Here, we present a quantitative approach of identifying steric clashes in proteins by defining clashes based on the Van der Waals repulsion energy of the clashing atoms. We also define a metric for quantitative estimation of the severity of clashes in proteins by performing statistical analysis of clashes in high-resolution protein structures. We describe a rapid, automated and robust protocol, Chiron, which efficiently resolves severe clashes in low-resolution structures and homology models with minimal perturbation in the protein backbone. Benchmark studies highlight the efficiency and robustness of Chiron compared to other widely used methods. We provide Chiron as an automated web server to evaluate and resolve clashes in protein structures that can be further used for more accurate protein design.
Homology modeling; refinement; Chiron; Discrete Molecular Dynamics; Protein Design
Poor performance of scoring functions is a well-known bottleneck in structure-based virtual screening, which is most frequently manifested in the scoring functions’ inability to discriminate between true ligands versus known non-binders (therefore designated as binding decoys). This deficiency leads to a large number of false positive hits resulting from virtual screening. We have hypothesized that filtering out or penalizing docking poses recognized as non-native (i.e., pose decoys) should improve the performance of virtual screening in terms of improved identification of true binders. Using several concepts from the field of cheminformatics, we have developed a novel approach to identifying pose decoys from an ensemble of poses generated by computational docking procedures. We demonstrate that the use of target-specific pose (-scoring) filter in combination with a physical force field-based scoring function (MedusaScore) leads to significant improvement of hit rates in virtual screening studies for 12 of the 13 benchmark sets from the clustered version of the Database of Useful Decoys (DUD). This new hybrid scoring function outperforms several conventional structure-based scoring functions, including XSCORE∷HMSCORE, ChemScore, PLP, and Chemgauss3, in six out of 13 data sets at early stage of VS (up 1% decoys of the screening database). We compare our hybrid method with several novel VS methods that were recently reported to have good performances on the same DUD data sets. We find that the retrieved ligands using our method are chemically more diverse in comparison with two ligand-based methods (FieldScreen and FLAP∷LBX). We also compare our method with FLAP∷RBLB, a high-performance VS method that also utilizes both the receptor and the cognate ligand structures. Interestingly, we find that the top ligands retrieved using our method are highly complementary to those retrieved using FLAP∷RBLB, hinting effective directions for best VS applications. We suggest that this integrative virtual screening approach combining cheminformatics and molecular mechanics methodologies may be applied to a broad variety of protein targets to improve the outcome of structure-based drug discovery studies.
Protein-peptide interactions play important roles in many cellular processes, including signal transduction, trafficking, and immune recognition. Protein conformational changes upon binding, an ill-defined peptide binding surface, and the large number of peptide degrees of freedom make the prediction of protein-peptide interactions particularly challenging. To address these challenges, we perform rapid molecular dynamics simulations in order to examine the energetic and dynamic aspects of protein-peptide binding. We find that, in most cases, we recapitulate the native binding sites and native-like poses of protein-peptide complexes. Inclusion of electrostatic interactions in simulations significantly improves the prediction accuracy. Our results also highlight the importance of protein conformational flexibility, especially side-chain movement, which allows the peptide to optimize its conformation. Our findings not only demonstrate the importance of sufficient sampling of the protein and peptide conformations, but also reveal the possible effects of electrostatics and conformational flexibility on peptide recognition.
The curated CSAR-NRC benchmark sets provide valuable opportunity for testing or comparing the performance of both existing and novel scoring functions. We apply two different scoring functions, both independently and in combination, to predict binding affinity of ligands in the CSAR-NRC datasets. One, reported here for the first time, employs multiple chemical-geometrical descriptors of the protein-ligand interface to develop Quantitative Structure – Binding Affinity Relationships (QSBAR) models; these models are then used to predict binding affinity of ligands in the external dataset. Second is a physical force field-based scoring function, MedusaScore. We show that both individual scoring functions achieve statistically significant prediction accuracies with the squared correlation coefficient (R2) between actual and predicted binding affinity of 0.44/0.53 (Set1/Set2) with QSBAR models and 0.34/0.47 (Set1/Set2) with MedusaScore. Importantly, we find that the combination of QSBAR models and MedusaScore into consensus scoring function affords higher prediction accuracy than any of the contributing methods achieving R2 of 0.45/0.58 (Set1/Set2). Furthermore, we identify several chemical features and non-covalent interactions that may be responsible for the inaccurate prediction of binding affinity for several ligands by the scoring functions employed in this study.
Motivation: Increasing use of structural modeling for understanding structure–function relationships in proteins has led to the need to ensure that the protein models being used are of acceptable quality. Quality of a given protein structure can be assessed by comparing various intrinsic structural properties of the protein to those observed in high-resolution protein structures.
Results: In this study, we present tools to compare a given structure to high-resolution crystal structures. We assess packing by calculating the total void volume, the percentage of unsatisfied hydrogen bonds, the number of steric clashes and the scaling of the accessible surface area. We assess covalent geometry by determining bond lengths, angles, dihedrals and rotamers. The statistical parameters for the above measures, obtained from high-resolution crystal structures enable us to provide a quality-score that points to specific areas where a given protein structural model needs improvement.
Availability and Implementation: We provide these tools that appraise protein structures in the form of a web server Gaia (http://chiron.dokhlab.org). Gaia evaluates the packing and covalent geometry of a given protein structure and provides quantitative comparison of the given structure to high-resolution crystal structures.
Supplementary information: Supplementary data are available at Bioinformatics online.
Aggregation of Cu, Zn superoxide dismutase (SOD1) is implicated in Amyotrophic Lateral Sclerosis (ALS). Glutathionylation and phosphorylation of SOD1 is omnipresent in the human body, even in healthy individuals, and has been shown to increase SOD1 dimer dissociation, which is the first step on the pathway toward SOD1 aggregation. We find that post-translational modification of SOD1, especially glutathionylation, promotes dimer dissociation. We discover an intermediate state in the pathway to dissociation, a conformational change that involves a “loosening” of the β-barrels and a loss or shift of dimer interface interactions. In modified SOD1, this intermediate state is stabilized as compared to unmodified SOD1. The presence of post-translational modifications could explain the environmental factors involved in the speed of disease progression. Because post-translational modifications such as glutathionylation are often induced by oxidative stress, post-translational modification of SOD1 could be a factor in the occurrence of sporadic cases of ALS, which make up 90% of all cases of the disease.
Motivation: Identifying the location of binding sites on proteins is of fundamental importance for a wide range of applications, including molecular docking, de novo drug design, structure identification and comparison of functional sites. Here we present Erebus, a web server that searches the entire Protein Data Bank for a given substructure defined by a set of atoms of interest, such as the binding scaffolds for small molecules. The identified substructure contains atoms having the same names, belonging to same amino acids and separated by the same distances (within a given tolerance) as the atoms of the query structure. The accuracy of a match is measured by the root-mean-square deviation or by the normal weight with a given variance. Tests show that our approach can reliably locate rigid binding scaffolds of drugs and metal ions.
Availability and Implementation: We provide this service through a web server at http://erebus.dokhlab.org.
We present a computational approach that can quickly search a large protein structural database to identify structures that fit a given electron density, such as determined by cryo-electron microscopy. We use geometric invariants (fingerprints) constructed using 3D Zernike moments to describe the electron density, and reduce the problem of fitting of the structure to the electron density to simple fingerprint comparison. Using this approach, we are able to screen the entire Protein Data Bank and identify structures that fit two experimental electron densities determined by cryo-electron microscopy.
cryo-EM; density fitting; structural genome; Zernike; geometric invariants
Mutation of the ubiquitous cytosolic enzyme Cu/Zn superoxide dismutase (SOD1) is hypothesized to cause familial amyotrophic lateral sclerosis (FALS) through structural destabilization leading to misfolding and aggregation. Considering the late onset of symptoms as well as the phenotypic variability among patients with identical SOD1 mutations, it is clear that nongenetic factor(s) impact ALS etiology and disease progression. Here we examine the effect of Cys-111 glutathionylation, a physiologically prevalent post-translational oxidative modification, on the stabilities of wild type SOD1 and two phenotypically diverse FALS mutants, A4V and I112T. Glutathionylation results in profound destabilization of SOD1WT dimers, increasing the equilibrium dissociation constant Kd to ~10−20 μM, comparable to that of the aggressive A4V mutant. SOD1A4V is further destabilized by glutathionylation, experiencing an ~30-fold increase in Kd. Dissociation kinetics of glutathionylated SOD1WT and SOD1A4V are unchanged, as measured by surface plasmon resonance, indicating that glutathionylation destabilizes these variants by decreasing association rate. In contrast, SOD1I112T has a modestly increased dissociation rate but no change in Kd when glutathionylated. Using computational structural modeling, we show that the distinct effects of glutathionylation on different SOD1 variants correspond to changes in composition of the dimer interface. Our experimental and computational results show that Cys-111 glutathionylation induces structural rearrangements that modulate stability of both wild type and FALS mutant SOD1. The distinct sensitivities of SOD1 variants to glutathionylation, a modification that acts in part as a coping mechanism for oxidative stress, suggest a novel mode by which redox regulation and aggregation propensity interact in ALS.
Methyltransferases possess a homologous domain that requires both a divalent metal cation and S-adenosyl-L-methionine (SAM) to catalyze its reactions. The kinetics of several methyltransferases has been well characterized; however, the details regarding their structural mechanisms have remained unclear to date. Using catechol O-methyltransferase (COMT) as a model, we perform discrete molecular dynamics and computational docking simulations to elucidate the initial stages of cofactor binding. We find that COMT binds SAM via an induced-fit mechanism, where SAM adopts a different docking pose in the absence of metal and substrate in comparison to the holoenzyme. Flexible modeling of the active site side-chains is essential for observing the lowest energy state in the apoenzyme; rigid docking tools are unable to recapitulate the pose unless the appropriate side-chain conformations are given a priori. From our docking results, we hypothesize that the metal reorients SAM in a conformation suitable for donating its methyl substituent to the recipient ligand. The proposed mechanism enables a general understanding of how divalent metal cations contribute to methyltransferase function.
Catechol-O-methyltransferase (COMT) is a major enzyme controlling catecholamine levels that plays a central role in cognition, affective mood and pain perception. There are three common COMT haplotypes in the human population reported to have functional effects, divergent in two synonymous and one nonsynonymous position. We demonstrate that one of the haplotypes, carrying the non-synonymous variation known to code for a less stable protein, exhibits increased protein expression in vitro. This increased protein expression, which would compensate for lower protein stability, is solely produced by a synonymous variation (C166T) situated within the haplotype and located in the 5′ region of the RNA transcript. Based on mRNA secondary structure predictions, we suggest that structural destabilization near the start codon caused by the T allele could be related to the observed increase in COMT expression. Our folding simulations of the tertiary mRNA structures demonstrate that destabilization by the T allele lowers the folding transition barrier, thus decreasing the probability of occupying its native state. These data suggest a novel structural mechanism whereby functional synonymous variations near the translation initiation codon affect the translation efficiency via entropy-driven changes in mRNA dynamics and present another example of stable compensatory genetic variations in the human population.
While various approaches exist to study protein localization, it is still a challenge to predict where proteins localize. Here, we consider a mechanistic viewpoint for membrane localization. Taking into account the steps for the folding pathway of α-helical membrane proteins and relating biophysical parameters to each of these steps, we create a score capable of predicting the propensity for membrane localization and call it FP3mem. This score is driven from the principal component analysis (PCA) of the biophysical parameters related to membrane localization. FP3mem allows us to rationalize the colocalization of a number of channel proteins with the Cav1.2 channel by their fewer propensities for membrane localization.
We present a method to calculate the propensities of regions within a DNA molecule to transition from B-form to Z-form under negative superhelical stresses. We use statistical mechanics to analyze the competition that occurs among all susceptible Z-forming regions at thermodynamic equilibrium in a superhelically stressed DNA of specified sequence. This method, which we call SIBZ, is similar to the SIDD algorithm that was previously developed to analyze superhelical duplex destabilization. A state of the system is determined by assigning to each base pair either the B- or the Z-conformation, accounting for the dinucleotide repeat unit of Z-DNA. The free energy of a state is comprised of the nucleation energy, the sequence-dependent B-Z transition energy, and the energy associated with the residual superhelicity remaining after the change of twist due to transition. Using this information, SIBZ calculates the equilibrium B-Z transition probability of each base pair in the sequence. This can be done at any physiologically reasonable level of negative superhelicity. We use SIBZ to analyze a variety of representative genomic DNA sequences. We show that the dominant Z-DNA forming regions in a sequence can compete in highly complex ways as the superhelicity level changes. Despite having no tunable parameters, the predictions of SIBZ agree precisely with experimental results, both for the onset of transition in plasmids containing introduced Z-forming sequences and for the locations of Z-forming regions in genomic sequences. We calculate the transition profiles of 5 kb regions taken from each of 12,841 mouse genes and centered on the transcription start site (TSS). We find a substantial increase in the frequency of Z-forming regions immediately upstream from the TSS. The approach developed here has the potential to illuminate the occurrence of Z-form regions in vivo, and the possible roles this transition may play in biological processes.
We present the SIBZ algorithm that calculates the equilibrium properties of the transition from right-handed B-form to left-handed Z-form in a DNA sequence that is subjected to imposed stresses. SIBZ calculates the probability of transition of each base pair in a user-defined sequence. By examining illustrative examples, we show that the transition behaviors of all Z-susceptible regions in a sequence are coupled together by the imposed stresses. We show that the results produced by SIBZ agree closely with experimental observations of both the onset of transitions and the locations of Z-form sites in molecules of specified sequence. By analyzing 12,841 mouse genes, we show that sites susceptible to the B-Z transition cluster upstream from gene start sites. As this is where stresses generated by transcription accumulate, these sites may actually experience this transition when the genes involved are being expressed. This suggests that these transitions may serve regulatory functions.
Determining the forces that conserve amino acid positions in proteins across species is a fundamental pursuit of molecular evolution. Evolutionary conservation is driven by either a protein's function or its thermodynamic stability. Highly conserved histone proteins offer a platform to evaluate these driving forces. While the conservation of histone H3 and H4 “tail” domains and surface residues are driven by functional importance, the driving force behind the conservation of buried histone residues has not been examined. Using a computational approach, we determined the thermodynamically preferred amino acids at each buried position in H3 and H4. In agreement with what is normally observed in proteins, we find a significant correlation between thermodynamic stability and evolutionary conservation in the buried residues in H4. In striking contrast, we find that thermodynamic stability of buried H3 residues does not correlate with evolutionary conservation. Given that these H3 residues are not post-translationally modified and only regulate H3-H3 and H3-H4 stabilizing interactions, our data imply an unknown function responsible for driving conservation of these buried H3 residues.
Most proteins fold to a well-defined, three-dimensional structure, which can be delineated into the protein surface and its buried core. When comparing amino acid sequences of the same protein from different organisms, we would expect to find certain residue positions conserved due to the importance of that position in either maintaining the protein's function or its three-dimensional structure. In this study, we looked at residues in the buried core domains of histone proteins H3 and H4, which have no known function other than maintaining the three-dimensional structure of the protein. We find that perturbing protein stability (which is a measure of maintenance of the protein's structure) by mutating these residues compromises survival fitness in yeast. However, the stability conferred by buried amino acids of H3 alone cannot account for their evolutionary conservation, which is in striking contrast to other proteins where stability has been shown to be the driving force for sequence conservation. This conservation of H3 thus points to either new additional functions of H3 that have not been uncovered or a unique conservation mechanism that goes beyond survival pressure. These data therefore reveal a highly conserved domain that is distinct in its evolutionary conservation.
Small globular proteins and peptides commonly exhibit two-state folding kinetics in which the rate limiting step of folding is the surmounting of a single free energy barrier at the transition state (TS) separating the folded and the unfolded states. An intriguing question is whether the polypeptide chain reaches, and leaves, the TS by completely random fluctuations, or whether there is a directed, stepwise process. Here, the folding TS of a 15-residue β-hairpin peptide, Peptide 1, is characterized using independent 2.5 μs-long unbiased atomistic molecular dynamics (MD) simulations (a total of 15 μs). The trajectories were started from fully unfolded structures. Multiple (spontaneous) folding events to the NMR-derived conformation are observed, allowing both structural and dynamical characterization of the folding TS. A common loop-like topology is observed in all the TS structures with native end-to-end and turn contacts, while the central segments of the strands are not in contact. Non-native sidechain contacts are present in the TS between the only tryptophan (W11) and the turn region (P7-G9). Prior to the TS the turn is found to be already locked by the W11 sidechain, while the ends are apart. Once the ends have also come into contact, the TS is reached. Finally, along the reactive folding paths the cooperative loss of the W11 non-native contacts and the formation of the central inter-strand native contacts lead to the peptide rapidly proceeding from the TS to the native state. The present results indicate a directed stepwise process to folding the peptide.
The folding dynamics of many small protein/peptides investigated recently are in terms of simple two-state model in which only two populations exist (folded and unfolded), separated by a single free energy barrier with only one kinetically important transition state (TS). However, dynamical characterization of the folding TS is challenging. We have used independent unbiased atomistic molecular dynamics simulations with clear folding-unfolding transitions to characterize structural and dynamical features of transition state ensemble of Peptide 1. A common loop-like topology is observed in all TS structures extracted from multiple simulations. The trajectories were used to examine the mechanism by which the TS is reached and subsequent events in folding pathways. The folding TS is reached and crossed in a directed stagewise process rather than through random fluctuations. Specific structures are formed before, during, and after the transition state, indicating a clear structured folding pathway.
The differences in efficacy and molecular mechanisms of platinum anti-cancer drugs cisplatin (CP) and oxaliplatin (OX) are thought to be partially due to the differences in the DNA conformations of the CP and OX adducts that form on adjacent guanines on DNA, which in turn influence the binding of damage-recognition proteins that control downstream effects of the adducts. Here we report a comprehensive comparison of the structural distortion of DNA caused by CP and OX adducts in the TGGT sequence context using nuclear magnetic resonance (NMR) spectroscopy and molecular dynamics (MD) simulations. When compared to our previous studies in other sequence contexts, these structural studies help us understand the effect of the sequence context on the conformation of Pt-GG DNA adducts. We find that both the sequence context and the type of Pt-GG DNA adduct (CP vs. OX) play an important role in the conformation and the conformational dynamics of Pt-DNA adducts, possibly explaining their influence on the ability of many damage-recognition proteins to bind to Pt-DNA adducts.