Macromolecular modeling and design are increasingly useful in basic research, biotechnology, and teaching. However, the absence of a user-friendly modeling framework that provides access to a wide range of modeling capabilities is hampering the wider adoption of computational methods by non-experts. RosettaScripts is an XML-like language for specifying modeling tasks in the Rosetta framework. RosettaScripts provides access to protocol-level functionalities, such as rigid-body docking and sequence redesign, and allows fast testing and deployment of complex protocols without need for modifying or recompiling the underlying C++ code. We illustrate these capabilities with RosettaScripts protocols for the stabilization of proteins, the generation of computationally constrained libraries for experimental selection of higher-affinity binding proteins, loop remodeling, small-molecule ligand docking, design of ligand-binding proteins, and specificity redesign in DNA-binding proteins.
Accommodating backbone flexibility continues to be the most difficult challenge in computational docking of protein-protein complexes. Towards that end, we simulate four distinct biophysical models of protein binding in RosettaDock, a multi-scale Monte-Carlo based algorithm that uses a quasi-kinetic search process to emulate the diffusional encounter of two proteins and identify low energy complexes. The four binding models are: 1) key-lock model (KL) using rigid-backbone docking, 2) conformer selection model (CS) using a novel ensemble docking algorithm, 3) induced fit model (IF) using energy gradient-based backbone minimization, and 4) a combined conformer selection/induced fit model (CS/IF). Backbone flexibility was limited to the smaller partner of the complex, structural ensembles were generated using Rosetta refinement methods, and docking consisted of local perturbations around the complexed conformation using unbound component crystal structures for a set of 21 target complexes. The lowest-energy structure contained more than 30% of the native residue-residue contacts for 9, 13, 13, and 14 targets for KL, CS, IF and CS/IF docking respectively. When applied to 15 targets using NMR ensembles of the smaller protein, the lowest-energy structure recovered at least 30% native residue contacts in 3, 8, 4 and 8 targets for KL, CS, IF and CS/IF docking respectively. CS/IF docking of the NMR ensemble performed equally well or better than KL docking with the unbound crystal structure in 10 of 15 cases. The marked success of CS and CS/IF docking shows that ensemble docking can be a versatile and effective method for accommodating conformational plasticity in docking and serves as a demonstration for the conformer selection theory - that binding-competent conformers exist in the unbound ensemble and can be selected based on their favorable binding energies.
protein-protein docking; flexible docking; ensemble docking; conformer selection; NMR ensembles
Few existing protein-protein interface design methods allow for extensive backbone rearrangements during the design process. There is also a dichotomy between redesign methods, which take advantage of the native interface, and de novo methods, which produce novel binders.
Here, we propose a new method for designing novel protein reagents that combines advantages of redesign and de novo methods and allows for extensive backbone motion. This method requires a bound structure of a target and one of its natural binding partners. A key interaction in this interface, the anchor, is computationally grafted out of the partner and into a surface loop on the design scaffold. The design scaffold's surface is then redesigned with backbone flexibility to create a new binding partner for the target. Careful choice of a scaffold will bring experimentally desirable characteristics into the new complex. The use of an anchor both expedites the design process and ensures that binding proceeds against a known location on the target. The use of surface loops on the scaffold allows for flexible-backbone redesign to properly search conformational space.
Conclusions and Significance
This protocol was implemented within the Rosetta3 software suite. To demonstrate and evaluate this protocol, we have developed a benchmarking set of structures from the PDB with loop-mediated interfaces. This protocol can recover the correct loop-mediated interface in 15 out of 16 tested structures, using only a single residue as an anchor.
The de novo design of protein-binding peptides is challenging, because it requires identifying both a sequence and a backbone conformation favorable for binding. We used a computational strategy that iterates between structure and sequence optimization to redesign the C-terminal portion of the RGS14 GoLoco motif peptide so that it adopts a new conformation when bound to Gαi1. An X-ray crystal structure of the redesigned complex closely matches the computational model, with a backbone RMSD of 1.1 Å.
Recent efforts to design de novo or redesign the sequence and structure of proteins using computational techniques have met with significant success. Most, if not all, of these computational methodologies attempt to model atomic-level interactions, and hence high-resolution structural characterization of the designed proteins is critical for evaluating the atomic-level accuracy of the underlying design force-fields. We previously used our computational protein design protocol RosettaDesign to completely redesign the sequence of the activation domain of human procarboxypeptidase A2. With 68% of the wild-type sequence changed, the designed protein, AYEdesign, is over 10 kcal/mol more stable than the wild-type protein. Here, we describe the high-resolution crystal structure and solution NMR structure of AYEdesign, which show that the experimentally determined backbone and side-chains conformations are effectively superimposable with the computational model at atomic resolution. To isolate the origins of the remarkable stabilization, we have designed and characterized a new series of procarboxypeptidase mutants that gain significant thermodynamic stability with a minimal number of mutations; one mutant gains more than 5 kcal/mol of stability over the wild-type protein with only four amino acid changes. We explore the relationship between force-field smoothing and conformational sampling by comparing the experimentally determined free energies of the overall design and these focused subsets of mutations to those predicted using modified force-fields, and both fixed and flexible backbone sampling protocols.
HSQC, heteronuclear single-quantum coherence; NOE, nuclear Overhauser effect; NOESY, NOE spectroscopy; RMSD, root-mean-square deviation; RDF, radial distribution function; Computational protein design; Rosetta; Thermodynamic stabilization; High-resolution protein structure; Procarboxypeptidase A2
Many protein-protein docking protocols are based on a shotgun approach, in which thousands of independent random-start trajectories minimize the rigid-body degrees of freedom. Another strategy is enumerative sampling as used in ZDOCK. Here, we introduce an alternative strategy, ReplicaDock, using a small number of long trajectories of temperature replica exchange. We compare replica exchange sampling as low-resolution stage of RosettaDock with RosettaDock's original shotgun sampling as well as with ZDOCK. A benchmark of 30 complexes starting from structures of the unbound binding partners shows improved performance for ReplicaDock and ZDOCK when compared to shotgun sampling at equal or less computational expense. ReplicaDock and ZDOCK consistently reach lower energies and generate significantly more near-native conformations than shotgun sampling. Accordingly, they both improve typical metrics of prediction quality of complex structures after refinement. Additionally, the refined ReplicaDock ensembles reach significantly lower interface energies and many previously hidden features of the docking energy landscape become visible when ReplicaDock is applied.
Optimizing amino acid conformation and identity is a central problem in computational protein design. Protein design algorithms must allow realistic protein flexibility to occur during this optimization, or they may fail to find the best sequence with the lowest energy. Most design algorithms implement side-chain flexibility by allowing the side chains to move between a small set of discrete, low-energy states, which we call rigid rotamers. In this work we show that allowing continuous side-chain flexibility (which we call continuous rotamers) greatly improves protein flexibility modeling. We present a large-scale study that compares the sequences and best energy conformations in 69 protein-core redesigns using a rigid-rotamer model versus a continuous-rotamer model. We show that in nearly all of our redesigns the sequence found by the continuous-rotamer model is different and has a lower energy than the one found by the rigid-rotamer model. Moreover, the sequences found by the continuous-rotamer model are more similar to the native sequences. We then show that the seemingly easy solution of sampling more rigid rotamers within the continuous region is not a practical alternative to a continuous-rotamer model: at computationally feasible resolutions, using more rigid rotamers was never better than a continuous-rotamer model and almost always resulted in higher energies. Finally, we present a new protein design algorithm based on the dead-end elimination (DEE) algorithm, which we call iMinDEE, that makes the use of continuous rotamers feasible in larger systems. iMinDEE guarantees finding the optimal answer while pruning the search space with close to the same efficiency of DEE. Availability: Software is available under the Lesser GNU Public License v3. Contact the authors for source code.
Computational protein design is a promising field with many biomedical applications, such as drug design, or the redesign of new enzymes to perform nonnatural chemical reactions. An essential feature of any protein design algorithm is the ability to accurately model the flexibility that occurs in real proteins. In enzyme design, for example, an algorithm must predict how the designed protein will change during binding and catalysis. In this work we present a large-scale study of 69 protein redesigns that shows the necessity of modeling more realistic protein flexibility. Specifically, we model the continuous space around low-energy conformations of amino acid side chains, and compare it against the standard rigid approach of modeling only a small discrete set of low-energy conformations. We show that by allowing the side chains to move in the continuous space around low energy conformations during the protein design search, we obtain very different sequences that better match real protein sequences. Moreover, we propose a new protein design algorithm that, contrary to conventional wisdom, shows that we can search the continuous space around side chains with close to the same efficiency as algorithms that model only a discrete set of conformations.
We have developed a suite of protein redesign algorithms that improves realistic in silico modeling of proteins. These algorithms are based on three characteristics that make them unique: (1) improved flexibility of the protein backbone, protein side chains, and ligand to accurately capture the conformational changes that are induced by mutations to the protein sequence; (2) modeling of proteins and ligands as ensembles of low-energy structures to better approximate binding affinity; and (3) a globally-optimal protein design search, guaranteeing that the computational predictions are optimal with respect to the input model. Here, we illustrate the importance of these three characteristics. We then describe OSPREY, a protein redesign suite that implements our protein design algorithms. OSPREY has been used prospectively, with experimental validation, in several biomedically-relevant settings. We show in detail how OSPREY has been used to predict resistance mutations and explain why improved flexibility, ensembles, and provability are essential for this application.
protein design; OSPREY; Dead-end elimination; protein ensembles; protein flexibility; K*; minDEE
One of the main challenges for protein redesign is the efficient evaluation of a combinatorial number of candidate structures. The modeling of protein flexibility, typically by using a rotamer library of commonly-observed low-energy side-chain conformations, further increases the complexity of the redesign problem. A dominant algorithm for protein redesign is Dead-End Elimination (DEE), which prunes the majority of candidate conformations by eliminating rigid rotamers that provably are not part of the Global Minimum Energy Conformation (GMEC). The identified GMEC consists of rigid rotamers (i.e., rotamers that have not been energy-minimized) and is thus referred to as the rigid-GMEC. As a post-processing step, the conformations that survive DEE may be energy-minimized. When energy minimization is performed after pruning with DEE, the combined protein design process becomes heuristic, and is no longer provably accurate: a conformation that is pruned using rigid-rotamer energies may subsequently minimize to a lower energy than the rigid-GMEC. That is, the rigid-GMEC and the conformation with the lowest energy among all energy-minimized conformations (the minimized-GMEC) are likely to be different. While the traditional DEE algorithm succeeds in not pruning rotamers that are part of the rigid-GMEC, it makes no guarantees regarding the identification of the minimized-GMEC. In this paper we derive a novel, provable, and efficient DEE-like algorithm, called minimized-DEE (MinDEE), that guarantees that rotamers belonging to the minimized-GMEC will not be pruned, while still pruning a combinatorial number of conformations. We show that MinDEE is useful not only in identifying the minimized-GMEC, but also as a filter in an ensemble-based scoring and search algorithm for protein redesign that exploits energy-minimized conformations. We compare our results both to our previous computational predictions of protein designs and to biological activity assays of predicted protein mutants. Our provable and efficient minimized-DEE algorithm is applicable in protein redesign, protein-ligand binding prediction, and computer-aided drug design.
Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.
The computer-based design of protein-protein interactions is a rigorous test of our understanding of molecular recognition and an attractive approach for creating novel tools for cell and molecular research. Considerable attention has been placed on redesigning the affinity and specificity of naturally occurring interactions. Several studies have shown that reducing the desolvation costs for binding while preserving shape complimentarity and hydrogen bonding is an effective strategy for improving binding affinities. In favorable cases specificity has been designed by focusing only on interactions with the target protein, while in cases with closely related off-target proteins, it has been necessary to explicitly disfavor unwanted binding partners. The rational design of protein-protein interactions from scratch is still an unsolved problem, but recent developments in flexible backbone design and energy functions hold promise for the future.
Rational protein design; computational protein design; de novo protein design; protein-protein interactions
Protein fold recognition usually relies on a statistical model of each fold; each model is constructed from an ensemble of natural sequences belonging to that fold. A complementary strategy may be to employ sequence ensembles produced by computational protein design. Designed sequences can be more diverse than natural sequences, possibly avoiding some limitations of experimental databases.
We explore this strategy for four SCOP families: Small Kunitz-type inhibitors (SKIs), Interleukin-8 chemokines, PDZ domains, and large Caspase catalytic subunits, represented by 43 structures. An automated procedure is used to redesign the 43 proteins. We use the experimental backbones as fixed templates in the folded state and a molecular mechanics model to compute the interaction energies between sidechain and backbone groups. Calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is used to scan the sequence and conformational space, yielding 200,000–300,000 sequences per backbone template. The results confirm and generalize our earlier study of SH2 and SH3 domains. The designed sequences ressemble moderately-distant, natural homologues of the initial templates; e.g., the SUPERFAMILY, profile Hidden-Markov Model library recognizes 85% of the low-energy sequences as native-like. Conversely, Position Specific Scoring Matrices derived from the sequences can be used to detect natural homologues within the SwissProt database: 60% of known PDZ domains are detected and around 90% of known SKIs and chemokines. Energy components and inter-residue correlations are analyzed and ways to improve the method are discussed.
For some families, designed sequences can be a useful complement to experimental ones for homologue searching. However, improved tools are needed to extract more information from the designed profiles before the method can be of general use.
High resolution structures of antibody-antigen complexes are useful for analyzing the binding interface and to make rational choices for antibody engineering. When a crystallographic structure of a complex is unavailable, the structure must be predicted using computational tools. In this work, we illustrate a novel approach, named SnugDock, to predict high-resolution antibody-antigen complex structures by simultaneously structurally optimizing the antibody-antigen rigid-body positions, the relative orientation of the antibody light and heavy chains, and the conformations of the six complementarity determining region loops. This approach is especially useful when the crystal structure of the antibody is not available, requiring allowances for inaccuracies in an antibody homology model which would otherwise frustrate rigid-backbone docking predictions. Local docking using SnugDock with the lowest-energy RosettaAntibody homology model produced more accurate predictions than standard rigid-body docking. SnugDock can be combined with ensemble docking to mimic conformer selection and induced fit resulting in increased sampling of diverse antibody conformations. The combined algorithm produced four medium (Critical Assessment of PRediction of Interactions-CAPRI rating) and seven acceptable lowest-interface-energy predictions in a test set of fifteen complexes. Structural analysis shows that diverse paratope conformations are sampled, but docked paratope backbones are not necessarily closer to the crystal structure conformations than the starting homology models. The accuracy of SnugDock predictions suggests a new genre of general docking algorithms with flexible binding interfaces targeted towards making homology models useful for further high-resolution predictions.
Antibodies are proteins that are key elements of the immune system and increasingly used as drugs. Antibodies bind tightly and specifically to antigens to block their activity or to mark them for destruction. Three-dimensional structures of the antibody-antigen complexes are useful for understanding their mechanism and for designing improved antibody drugs. Experimental determination of structures is laborious and not always possible, so we have developed tools to predict structures of antibody-antigen complexes computationally. Computer-predicted models of antibodies, or homology models, typically have errors which can frustrate algorithms for prediction of protein-protein interfaces (docking), and result in incorrect predictions. Here, we have created and tested a new docking algorithm which incorporates flexibility to overcome structural errors in the antibody structural model. The algorithm allows both intramolecular and interfacial flexibility in the antibody during docking, resulting in improved accuracy approaching that when using experimentally determined antibody structures. Structural analysis of the predicted binding region of the complex will enable the protein engineer to make rational choices for better antibody drug designs.
Summary: PyRosetta is a stand-alone Python-based implementation of the Rosetta molecular modeling package that allows users to write custom structure prediction and design algorithms using the major Rosetta sampling and scoring functions. PyRosetta contains Python bindings to libraries that define Rosetta functions including those for accessing and manipulating protein structure, calculating energies and running Monte Carlo-based simulations. PyRosetta can be used in two ways: (i) interactively, using iPython and (ii) script-based, using Python scripting. Interactive mode contains a number of help features and is ideal for beginners while script-mode is best suited for algorithm development. PyRosetta has similar computational performance to Rosetta, can be easily scaled up for cluster applications and has been implemented for algorithms demonstrating protein docking, protein folding, loop modeling and design.
Availability: PyRosetta is a stand-alone package available at http://www.pyrosetta.org under the Rosetta license which is free for academic and non-profit users. A tutorial, user's manual and sample scripts demonstrating usage are also available on the web site.
The inhibitory switch (IS) domain of p21-activated kinase 1 (PAK1) stabilizes full-length PAK1 in an inactive conformation by binding to the PAK1 kinase domain. Competitive binding of small GTPases to the IS domain disrupts the autoinhibitory interactions and exposes the IS domain binding site on the surface of the kinase domain. To build an affinity reagent that selectively binds the activated state of PAK1, we used molecular modeling to re-engineer the isolated IS domain so that it was soluble and stable, did not bind to GTPases and bound more tightly to the PAK1 kinase domain. Three design strategies were tested: in the first and second case, extension and redesign of the N-terminus were used to expand the hydrophobic core of the domain and in the third case the termini were redesigned to be adjacent in space so that that the domain could be stabilized by insertion into a loop in a host cyan fluorescent protein (CFP). The best-performing design, called CFP-PAcKer, was based on the third strategy and bound the kinase domain of PAK1 with an affinity of 400 nM. CFP-PAcKer binds more tightly to a full-length variant of PAK1 that is stabilized in the ‘open’ state (Kd = 3.3 µM) than to full-length PAK1 in the ‘closed’ state (undetectable affinity), and binding can be monitored with fluorescence by placing an environmentally sensitive fluorescence dye on CFP-PAcKer adjacent to the binding site.
Computational protein design; Rosetta; merocyanine dye; p21-activated kinase
Computational small molecule docking into comparative models of proteins is widely used to query protein function and in the development of small molecule therapeutics. We benchmark RosettaLigand docking into comparative models for nine proteins built during CASP8 that contain ligands. We supplement the study with 21 additional protein/ligand complexes to cover a wider space of chemotypes. During a full docking run in 21 of the 30 cases, RosettaLigand successfully found a native-like binding mode among the top ten scoring binding modes. From the benchmark cases we find that careful template selection based on ligand occupancy provides the best chance of success while overall sequence identity between template and target do not appear to improve results. We also find that binding energy normalized by atom number is often less than −0.4 in native-like binding modes.
The RosettaBackrub server (http://kortemmelab.ucsf.edu/backrub) implements the Backrub method, derived from observations of alternative conformations in high-resolution protein crystal structures, for flexible backbone protein modeling. Backrub modeling is applied to three related applications using the Rosetta program for structure prediction and design: (I) modeling of structures of point mutations, (II) generating protein conformational ensembles and designing sequences consistent with these conformations and (III) predicting tolerated sequences at protein–protein interfaces. The three protocols have been validated on experimental data. Starting from a user-provided single input protein structure in PDB format, the server generates near-native conformational ensembles. The predicted conformations and sequences can be used for different applications, such as to guide mutagenesis experiments, for ensemble-docking approaches or to generate sequence libraries for protein design.
Some protein design tasks cannot be modeled by the traditional single state design strategy of finding a sequence that is optimal for a single fixed backbone. Such cases require multistate design, where a single sequence is threaded onto multiple backbones (states) and evaluated for its strengths and weaknesses on each backbone. For example, to design a protein that can switch between two specific conformations, it is necessary to to find a sequence that is compatible with both backbone conformations. We present in this paper a generic implementation of multistate design that is suited for a wide range of protein design tasks and demonstrate in silico its capabilities at two design tasks: one of redesigning an obligate homodimer into an obligate heterodimer such that the new monomers would not homodimerize, and one of redesigning a promiscuous interface to bind to only a single partner and to no longer bind the rest of its partners. Both tasks contained negative design in that multistate design was asked to find sequences that would produce high energies for several of the states being modeled. Success at negative design was assessed by computationally redocking the undesired protein-pair interactions; we found that multistate design's accuracy improved as the diversity of conformations for the undesired protein-pair interactions increased. The paper concludes with a discussion of the pitfalls of negative design, which has proven considerably more challenging than positive design.
is one of the prime tools for high resolution protein structure
refinement. While its scoring function can distinguish native-like
from non-native-like conformations in many cases, the method is limited
by conformational sampling for larger proteins, that is, leaving a
local energy minimum in which the search algorithm may get stuck.
Here, we test the hypothesis that iteration of Rosetta with an orthogonal
sampling and scoring strategy might facilitate exploration of conformational
space. Specifically, we run short molecular dynamics (MD) simulations
on models created by de novo folding of large proteins
into cryoEM density maps to enable sampling of conformational space
not directly accessible to Rosetta and thus provide an escape route
from the conformational traps. We present a combined MD–Rosetta
protein structure refinement protocol that can overcome some
of these sampling limitations. Two of four benchmark proteins showed
incremental improvement through all three rounds of the iterative
refinement protocol. Molecular dynamics is most efficient in applying
subtle but important rearrangements within secondary structure elements
and is thus highly complementary to the Rosetta refinement, which
focuses on side chains and loop regions.
Flexible peptides that fold upon binding to another protein molecule mediate a large number of regulatory interactions in the living cell and may provide highly specific recognition modules. We present Rosetta FlexPepDock ab-initio, a protocol for simultaneous docking and de-novo folding of peptides, starting from an approximate specification of the peptide binding site. Using the Rosetta fragments library and a coarse-grained structural representation of the peptide and the receptor, FlexPepDock ab-initio samples efficiently and simultaneously the space of possible peptide backbone conformations and rigid-body orientations over the receptor surface of a given binding site. The subsequent all-atom refinement of the coarse-grained models includes full side-chain modeling of both the receptor and the peptide, resulting in high-resolution models in which key side-chain interactions are recapitulated. The protocol was applied to a benchmark in which peptides were modeled over receptors in either their bound backbone conformations or in their free, unbound form. Near-native peptide conformations were identified in 18/26 of the bound cases and 7/14 of the unbound cases. The protocol performs well on peptides from various classes of secondary structures, including coiled peptides with unusual turns and kinks. The results presented here significantly extend the scope of state-of-the-art methods for high-resolution peptide modeling, which can now be applied to a wide variety of peptide-protein interactions where no prior information about the peptide backbone conformation is available, enabling detailed structure-based studies and manipulation of those interactions.
Virtual compound screening using molecular docking is widely used in the discovery of new lead compounds for drug design. However, the docking scores are not sufficiently precise to represent the protein-ligand binding affinity. Here, we developed an efficient computational method for calculating protein-ligand binding affinity, which is based on molecular mechanics generalized Born/surface area (MM-GBSA) calculations and Jarzynski identity. Jarzynski identity is an exact relation between free energy differences and the work done through non-equilibrium process, and MM-GBSA is a semimacroscopic approach to calculate the potential energy. To calculate the work distribution when a ligand is pulled out of its binding site, multiple protein-ligand conformations are randomly generated as an alternative to performing an explicit single-molecule pulling simulation. We assessed the new method, multiple random conformation/MM-GBSA (MRC-MMGBSA), by evaluating ligand-binding affinities (scores) for four target proteins, and comparing these scores with experimental data. The calculated scores were qualitatively in good agreement with the experimental binding affinities, and the optimal docking structure could be determined by ranking the scores of the multiple docking poses obtained by the molecular docking process. Furthermore, the scores showed a strong linear response to experimental binding free energies, so that the free energy difference of the ligand binding (ΔΔG) could be calculated by linear scaling of the scores. The error of calculated ΔΔG was within ≈±1.5 kcal•mol−1 of the experimental values. Particularly, in the case of flexible target proteins, the MRC-MMGBSA scores were more effective in ranking ligands than those generated by the MM-GBSA method using a single protein-ligand conformation. The results suggest that, owing to its lower computational costs and greater accuracy, the MRC-MMGBSA offers efficient means to rank the ligands, in the post-docking process, according to their binding affinities, and to compare these directly with the experimental values.
The RosettaDesign server identifies low energy amino acid sequences for target protein structures (). The client provides the backbone coordinates of the target structure and specifies which residues to design. The server returns to the client the sequences, coordinates and energies of the designed proteins. The simulations are performed using the design module of the Rosetta program (RosettaDesign). RosettaDesign uses Monte Carlo optimization with simulated annealing to search for amino acids that pack well on the target structure and satisfy hydrogen bonding potential. RosettaDesign has been experimentally validated and has been used previously to stabilize naturally occurring proteins and design a novel protein structure.
The in-silico Site Identification by Ligand Competitive Saturation (SILCS) approach identifies the binding sites of representative chemical entities on the entire protein surface, information that can be applied for computational fragment-based drug design. In this study, we report an efficient computational protocol that uses sampling of the protein-fragment conformational space obtained from the SILCS simulations and performs single step free energy perturbation (SSFEP) calculations to identify site-specific favorable chemical modifications of benzene involving substitutions of ring hydrogens with individual non-hydrogen atoms. The SSFEP method is able to capture the experimental trends in relative hydration free energies of benzene analogues and for two datasets of experimental relative binding free energies of congeneric series of ligands of the proteins α-thrombin and P38 MAP kinase. The approach includes a protocol in which data obtained from SILCS simulations of the proteins is first analyzed to identify favorable benzene binding sites following which an ensemble of benzene-protein conformations for that site is obtained. The SSFEP protocol applied to that ensemble results in good reproduction of experimental free energies of the α-thrombin ligands, but not for P38 MAP kinase ligands. Comparison with results from a P38 full-ligand simulation and analysis of conformations reveals the reason for the poor agreement being the connectivity with the remainder of the ligand, a limitation inherent in fragment-based methods. Since the SSFEP approach can identify favorable benzene modifications as well as identify the most favorable fragment conformations, the obtained information can be of value for fragment linking or structure-based optimization.
Coarse-grained (CG) methods for sampling protein conformational space have the potential to increase computational efficiency by reducing the degrees of freedom. The gain in computational efficiency of CG methods often comes at the expense of non-protein like local conformational features. This could cause problems when transitioning to full atom models in a hierarchical framework. Here, a CG potential energy function was validated by applying it to the problem of loop prediction. A novel method to sample the conformational space of backbone atoms was benchmarked using a standard test set consisting of 351 distinct loops. This method used a sequence-independent CG potential energy function representing the protein using -carbon positions only and sampling conformations with a Monte Carlo simulated annealing based protocol. Backbone atoms were added using a method previously described and then gradient minimised in the Rosetta force field. Despite the CG potential energy function being sequence-independent, the method performed similarly to methods that explicitly use either fragments of known protein backbones with similar sequences or residue-specific /-maps to restrict the search space. The method was also able to predict with sub-Angstrom accuracy two out of seven loops from recently solved crystal structures of proteins with low sequence and structure similarity to previously deposited structures in the PDB. The ability to sample realistic loop conformations directly from a potential energy function enables the incorporation of additional geometric restraints and the use of more advanced sampling methods in a way that is not possible to do easily with fragment replacement methods and also enable multi-scale simulations for protein design and protein structure prediction. These restraints could be derived from experimental data or could be design restraints in the case of computational protein design. C++ source code is available for download from http://www.sbg.bio.ic.ac.uk/phyre2/PD2/.
The reprogramming of DNA-binding specificity is an important challenge for computational protein design that tests current understanding of protein–DNA recognition, and has considerable practical relevance for biotechnology and medicine1–6. Here we describe the computational redesign of the cleavage specificity of the intron-encoded homing endonuclease I-MsoI7 using a physically realistic atomic-level forcefield8,9. Using an in silico screen, we identified single base-pair substitutions predicted to disrupt binding by the wild-type enzyme, and then optimized the identities and conformations of clusters of amino acids around each of these unfavourable substitutions using Monte Carlo sampling10. A redesigned enzyme that was predicted to display altered target site specificity, while maintaining wild-type binding affinity, was experimentally characterized. The redesigned enzyme binds and cleaves the redesigned recognition site ~10,000 times more effectively than does the wild-type enzyme, with a level of target discrimination comparable to the original endonuclease. Determination of the structure of the redesigned nuclease- recognition site complex by X-ray crystallography confirms the accuracy of the computationally predicted interface. These results suggest that computational protein design methods can have an important role in the creation of novel highly specific endonucleases for gene therapy and other applications.