Macromolecular modeling and design are increasingly useful in basic research, biotechnology, and teaching. However, the absence of a user-friendly modeling framework that provides access to a wide range of modeling capabilities is hampering the wider adoption of computational methods by non-experts. RosettaScripts is an XML-like language for specifying modeling tasks in the Rosetta framework. RosettaScripts provides access to protocol-level functionalities, such as rigid-body docking and sequence redesign, and allows fast testing and deployment of complex protocols without the need to modify or recompile the underlying C++ code. We illustrate these capabilities with RosettaScripts protocols for the stabilization of proteins, the generation of computationally constrained libraries for experimental selection of higher-affinity binding proteins, loop remodeling, small-molecule ligand docking, design of ligand-binding proteins, and specificity redesign in DNA-binding proteins.
Accommodating backbone flexibility continues to be the most difficult challenge in computational docking of protein-protein complexes. Towards that end, we simulate four distinct biophysical models of protein binding in RosettaDock, a multi-scale Monte Carlo-based algorithm that uses a quasi-kinetic search process to emulate the diffusional encounter of two proteins and identify low-energy complexes. The four binding models are: 1) key-lock model (KL) using rigid-backbone docking, 2) conformer selection model (CS) using a novel ensemble docking algorithm, 3) induced fit model (IF) using energy gradient-based backbone minimization, and 4) a combined conformer selection/induced fit model (CS/IF). Backbone flexibility was limited to the smaller partner of the complex, structural ensembles were generated using Rosetta refinement methods, and docking consisted of local perturbations around the complexed conformation using unbound component crystal structures for a set of 21 target complexes. The lowest-energy structure contained more than 30% of the native residue-residue contacts for 9, 13, 13, and 14 targets for KL, CS, IF and CS/IF docking, respectively. When applied to 15 targets using NMR ensembles of the smaller protein, the lowest-energy structure recovered at least 30% of native residue contacts in 3, 8, 4 and 8 targets for KL, CS, IF and CS/IF docking, respectively. CS/IF docking of the NMR ensemble performed as well as or better than KL docking with the unbound crystal structure in 10 of 15 cases. The marked success of CS and CS/IF docking shows that ensemble docking can be a versatile and effective method for accommodating conformational plasticity in docking, and it serves as a demonstration of the conformer selection theory: binding-competent conformers exist in the unbound ensemble and can be selected based on their favorable binding energies.
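The success criterion above is the fraction of native residue-residue contacts that the lowest-energy structure recovers. The sketch below shows that metric under two assumptions that do not come from the RosettaDock protocol. First, each residue is a list of heavy-atom coordinates. Second, two residues count as in contact when any pair of their heavy atoms lies within 5 Å. The function names are hypothetical.

```python
import math
from itertools import product


def interface_contacts(receptor, partner, cutoff=5.0):
    """Return (receptor_residue, partner_residue) pairs in contact.

    receptor and partner map residue IDs to lists of (x, y, z) heavy-atom
    coordinates; a pair is a contact if any atom pair is within cutoff (Å).
    The 5 Å default is an assumed, commonly used cutoff.
    """
    contacts = set()
    for (r_id, r_atoms), (p_id, p_atoms) in product(receptor.items(), partner.items()):
        if any(math.dist(a, b) <= cutoff for a in r_atoms for b in p_atoms):
            contacts.add((r_id, p_id))
    return contacts


def fraction_native_contacts(native_contacts, model_contacts):
    """Fraction of native contacts reproduced by a model (0.0 if there are none)."""
    if not native_contacts:
        return 0.0
    return len(native_contacts & model_contacts) / len(native_contacts)


# Toy coordinates: a receptor, the native partner placement, and a docked model.
receptor = {"A1": [(0.0, 0.0, 0.0)], "A2": [(10.0, 0.0, 0.0)]}
native_partner = {"B1": [(3.0, 0.0, 0.0)], "B2": [(13.0, 0.0, 0.0)]}
model_partner = {"B1": [(3.0, 0.0, 0.0)], "B2": [(20.0, 0.0, 0.0)]}

native = interface_contacts(receptor, native_partner)
model = interface_contacts(receptor, model_partner)
fnat = fraction_native_contacts(native, model)
print(fnat, fnat > 0.3)  # 0.5 True
```

A real benchmark would compute the native contact set once from the bound complex. It would then score every decoy against that set.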
protein-protein docking; flexible docking; ensemble docking; conformer selection; NMR ensembles
Few existing protein-protein interface design methods allow for extensive backbone rearrangements during the design process. There is also a dichotomy between redesign methods, which take advantage of the native interface, and de novo methods, which produce novel binders.
Here, we propose a new method for designing novel protein reagents that combines advantages of redesign and de novo methods and allows for extensive backbone motion. This method requires a bound structure of a target and one of its natural binding partners. A key interaction in this interface, the anchor, is computationally grafted out of the partner and into a surface loop on the design scaffold. The design scaffold's surface is then redesigned with backbone flexibility to create a new binding partner for the target. Careful choice of a scaffold will bring experimentally desirable characteristics into the new complex. The use of an anchor both expedites the design process and ensures that binding occurs at a known location on the target. The use of surface loops on the scaffold allows flexible-backbone redesign to search conformational space properly.
Conclusions and Significance
This protocol was implemented within the Rosetta3 software suite. To demonstrate and evaluate this protocol, we have developed a benchmarking set of structures from the PDB with loop-mediated interfaces. This protocol can recover the correct loop-mediated interface in 15 out of 16 tested structures, using only a single residue as an anchor.
The de novo design of protein-binding peptides is challenging, because it requires identifying both a sequence and a backbone conformation favorable for binding. We used a computational strategy that iterates between structure and sequence optimization to redesign the C-terminal portion of the RGS14 GoLoco motif peptide so that it adopts a new conformation when bound to Gαi1. An X-ray crystal structure of the redesigned complex closely matches the computational model, with a backbone RMSD of 1.1 Å.
Optimizing amino acid conformation and identity is a central problem in computational protein design. Protein design algorithms must allow realistic protein flexibility to occur during this optimization, or they may fail to find the best sequence with the lowest energy. Most design algorithms implement side-chain flexibility by allowing the side chains to move between a small set of discrete, low-energy states, which we call rigid rotamers. In this work we show that allowing continuous side-chain flexibility (which we call continuous rotamers) greatly improves protein flexibility modeling. We present a large-scale study that compares the sequences and best energy conformations in 69 protein-core redesigns using a rigid-rotamer model versus a continuous-rotamer model. We show that in nearly all of our redesigns the sequence found by the continuous-rotamer model is different and has a lower energy than the one found by the rigid-rotamer model. Moreover, the sequences found by the continuous-rotamer model are more similar to the native sequences. We then show that the seemingly easy solution of sampling more rigid rotamers within the continuous region is not a practical alternative to a continuous-rotamer model: at computationally feasible resolutions, using more rigid rotamers was never better than a continuous-rotamer model and almost always resulted in higher energies. Finally, we present a new protein design algorithm based on the dead-end elimination (DEE) algorithm, which we call iMinDEE, that makes the use of continuous rotamers feasible in larger systems. iMinDEE guarantees finding the optimal answer while pruning the search space with close to the same efficiency as DEE. Availability: Software is available under the GNU Lesser General Public License v3. Contact the authors for source code.
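iMinDEE builds on the rigid-rotamer dead-end elimination test. The sketch below implements the original DEE criterion on a toy energy table. Rotamer r at position i is pruned when even its best possible energy is worse than the worst possible energy of some competitor t at the same position. The table layout is an assumption made for illustration: self energies per position, and pairwise energies stored once per position pair. The function names are hypothetical. This is not iMinDEE itself.

```python
def pair_energy(pair_e, i, r, j, s):
    """Look up E(i_r, j_s); each position pair is stored once under (min, max)."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def dee_prune(self_e, pair_e):
    """Apply the original DEE criterion repeatedly until nothing more is pruned.

    self_e[i][r] is the self (rotamer-backbone) energy of rotamer r at position i.
    Returns one set of surviving rotamer indices per position.
    """
    n = len(self_e)
    alive = [set(range(len(rots))) for rots in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for r in sorted(alive[i]):
                # Best case for r: each other position takes its most favorable rotamer.
                best_r = self_e[i][r] + sum(
                    min(pair_energy(pair_e, i, r, j, s) for s in alive[j]) for j in others)
                for t in alive[i] - {r}:
                    # Worst case for the competitor t.
                    worst_t = self_e[i][t] + sum(
                        max(pair_energy(pair_e, i, t, j, s) for s in alive[j]) for j in others)
                    if best_r > worst_t:
                        alive[i].discard(r)  # r cannot be part of the rigid-GMEC
                        changed = True
                        break
    return alive


self_e = [[0.0, 5.0], [0.0, 0.0]]
pair_e = {(0, 1): [[0.0, 1.0], [0.0, 1.0]]}
print(dee_prune(self_e, pair_e))  # [{0}, {0}]
```

The test only ever compares rigid energies. This is why the abstract argues for a continuous-rotamer variant.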
Computational protein design is a promising field with many biomedical applications, such as drug design, or the redesign of new enzymes to perform nonnatural chemical reactions. An essential feature of any protein design algorithm is the ability to accurately model the flexibility that occurs in real proteins. In enzyme design, for example, an algorithm must predict how the designed protein will change during binding and catalysis. In this work we present a large-scale study of 69 protein redesigns that shows the necessity of modeling more realistic protein flexibility. Specifically, we model the continuous space around low-energy conformations of amino acid side chains, and compare it against the standard rigid approach of modeling only a small discrete set of low-energy conformations. We show that by allowing the side chains to move in the continuous space around low energy conformations during the protein design search, we obtain very different sequences that better match real protein sequences. Moreover, we propose a new protein design algorithm that, contrary to conventional wisdom, shows that we can search the continuous space around side chains with close to the same efficiency as algorithms that model only a discrete set of conformations.
One of the main challenges for protein redesign is the efficient evaluation of a combinatorial number of candidate structures. The modeling of protein flexibility, typically by using a rotamer library of commonly-observed low-energy side-chain conformations, further increases the complexity of the redesign problem. A dominant algorithm for protein redesign is Dead-End Elimination (DEE), which prunes the majority of candidate conformations by eliminating rigid rotamers that provably are not part of the Global Minimum Energy Conformation (GMEC). The identified GMEC consists of rigid rotamers (i.e., rotamers that have not been energy-minimized) and is thus referred to as the rigid-GMEC. As a post-processing step, the conformations that survive DEE may be energy-minimized. When energy minimization is performed after pruning with DEE, the combined protein design process becomes heuristic, and is no longer provably accurate: a conformation that is pruned using rigid-rotamer energies may subsequently minimize to a lower energy than the rigid-GMEC. That is, the rigid-GMEC and the conformation with the lowest energy among all energy-minimized conformations (the minimized-GMEC) are likely to be different. While the traditional DEE algorithm succeeds in not pruning rotamers that are part of the rigid-GMEC, it makes no guarantees regarding the identification of the minimized-GMEC. In this paper we derive a novel, provable, and efficient DEE-like algorithm, called minimized-DEE (MinDEE), that guarantees that rotamers belonging to the minimized-GMEC will not be pruned, while still pruning a combinatorial number of conformations. We show that MinDEE is useful not only in identifying the minimized-GMEC, but also as a filter in an ensemble-based scoring and search algorithm for protein redesign that exploits energy-minimized conformations. 
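The key issue above is that a rotamer pruned on rigid energies might still minimize into the minimized-GMEC. The sketch below is a simplified, interval-bound version of a minimization-aware DEE test. It is not the exact MinDEE formulation from this paper. Each rotamer is assumed to have a lower and an upper energy bound over its continuous region, meaning the lowest and highest energies that minimization could reach. The candidate's lower bound is then compared against the competitor's upper bound. All names and numbers are hypothetical.

```python
def pair_energy(table, i, r, j, s):
    """Look up a pairwise bound; each position pair is stored once under (min, max)."""
    return table[(i, j)][r][s] if i < j else table[(j, i)][s][r]


def bounds_prune(i, r, t, self_lo, self_hi, pair_lo, pair_hi, alive):
    """True if rotamer r at position i can be safely discarded in favor of t.

    *_lo are lower and *_hi upper bounds on energies over each rotamer's
    continuous region. If the most optimistic energy of r still exceeds the
    most pessimistic energy of t, then swapping r for t lowers the energy of any
    conformation containing r. r therefore cannot be in the minimized-GMEC.
    """
    others = [j for j in range(len(self_lo)) if j != i]
    lower_r = self_lo[i][r] + sum(
        min(pair_energy(pair_lo, i, r, j, s) for s in alive[j]) for j in others)
    upper_t = self_hi[i][t] + sum(
        max(pair_energy(pair_hi, i, t, j, s) for s in alive[j]) for j in others)
    return lower_r > upper_t


zero_pairs = {(0, 1): [[0.0], [0.0]]}
alive = [{0, 1}, {0}]
rigid = [[0.0, 2.0], [0.0]]  # energies at the rotamer-library angles only
# Rigid energies used as both bounds: rotamer 1 looks dead-ending.
print(bounds_prune(0, 1, 0, rigid, rigid, zero_pairs, zero_pairs, alive))  # True
# With minimization, rotamer 1 can relax to -3.0 and rotamer 0 can reach 1.0 at worst.
self_lo = [[-0.5, -3.0], [0.0]]
self_hi = [[1.0, 2.5], [0.0]]
print(bounds_prune(0, 1, 0, self_lo, self_hi, zero_pairs, zero_pairs, alive))  # False
```

The second call keeps rotamer 1. Rigid DEE would have discarded it even though minimization can make it the better choice.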
We compare our results both to our previous computational predictions of protein designs and to biological activity assays of predicted protein mutants. Our provable and efficient minimized-DEE algorithm is applicable in protein redesign, protein-ligand binding prediction, and computer-aided drug design.
Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.
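The merging step combines low-energy sequences from every ensemble member into one estimate of tolerated sequence space. A minimal sketch follows, assuming three things that the protocol may handle differently. First, each backbone contributes its top_k lowest-energy sequences. Second, every backbone gets equal total weight. Third, the result is a per-position amino acid frequency profile. The function name and the toy data are hypothetical.

```python
from collections import Counter


def tolerated_profile(results, top_k=2):
    """Merge per-backbone design results into one position-specific frequency profile.

    results maps a backbone ID to a list of (sequence, energy) pairs. Only the
    top_k lowest-energy sequences per backbone are kept. Each backbone gets equal
    total weight, so the frequencies at every position sum to 1.
    """
    kept = {bb: [s for s, _ in sorted(pairs, key=lambda p: p[1])[:top_k]]
            for bb, pairs in results.items()}
    length = len(next(iter(kept.values()))[0])
    columns = [Counter() for _ in range(length)]
    for seqs in kept.values():
        weight = 1.0 / (len(seqs) * len(kept))
        for seq in seqs:
            for pos, aa in enumerate(seq):
                columns[pos][aa] += weight
    return [dict(col) for col in columns]


results = {
    "bb1": [("AV", -10.0), ("AI", -9.0), ("GG", 5.0)],
    "bb2": [("SV", -8.0), ("AV", -7.5)],
}
print(tolerated_profile(results))  # [{'A': 0.75, 'S': 0.25}, {'V': 0.75, 'I': 0.25}]
```

Weighting backbones equally stops one ensemble member that produced many sequences from dominating the estimate. Boltzmann weighting by energy would be an alternative choice.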
The computer-based design of protein-protein interactions is a rigorous test of our understanding of molecular recognition and an attractive approach for creating novel tools for cell and molecular research. Considerable attention has been placed on redesigning the affinity and specificity of naturally occurring interactions. Several studies have shown that reducing the desolvation costs for binding while preserving shape complementarity and hydrogen bonding is an effective strategy for improving binding affinities. In favorable cases specificity has been designed by focusing only on interactions with the target protein, while in cases with closely related off-target proteins, it has been necessary to explicitly disfavor unwanted binding partners. The rational design of protein-protein interactions from scratch is still an unsolved problem, but recent developments in flexible backbone design and energy functions hold promise for the future.
Rational protein design; computational protein design; de novo protein design; protein-protein interactions
Protein fold recognition usually relies on a statistical model of each fold; each model is constructed from an ensemble of natural sequences belonging to that fold. A complementary strategy may be to employ sequence ensembles produced by computational protein design. Designed sequences can be more diverse than natural sequences, possibly avoiding some limitations of experimental databases.
We explore this strategy for four SCOP families: Small Kunitz-type inhibitors (SKIs), Interleukin-8 chemokines, PDZ domains, and large Caspase catalytic subunits, represented by 43 structures. An automated procedure is used to redesign the 43 proteins. We use the experimental backbones as fixed templates in the folded state and a molecular mechanics model to compute the interaction energies between sidechain and backbone groups. Calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is used to scan the sequence and conformational space, yielding 200,000–300,000 sequences per backbone template. The results confirm and generalize our earlier study of SH2 and SH3 domains. The designed sequences resemble moderately distant natural homologues of the initial templates; e.g., the SUPERFAMILY profile hidden Markov model library recognizes 85% of the low-energy sequences as native-like. Conversely, Position-Specific Scoring Matrices derived from the sequences can be used to detect natural homologues within the SwissProt database: 60% of known PDZ domains and around 90% of known SKIs and chemokines are detected. Energy components and inter-residue correlations are analyzed and ways to improve the method are discussed.
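Position-specific scoring matrices turn a set of designed sequences into a per-position log-odds score that can be scanned against natural sequences. The sketch below makes several simplifying assumptions, and the study's actual profile construction may differ. It uses a uniform 1/20 background, a flat pseudocount, natural-log scores, and an ungapped window scan, whereas a real homologue search would use gapped alignment. The function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(sequences, pseudocount=1.0):
    """Natural-log odds scores per position from equal-length designed sequences.

    Uses a uniform background (1/20) and a flat pseudocount; both are simplifications.
    """
    background = 1.0 / len(AMINO_ACIDS)
    total = len(sequences) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(sequences[0])):
        counts = Counter(seq[pos] for seq in sequences)
        pssm.append({aa: math.log((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def score_window(pssm, segment):
    """Sum of per-position scores for an ungapped segment of the PSSM's length."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


def best_window(pssm, sequence):
    """Ungapped scan: return (start, score) of the best-scoring window."""
    width = len(pssm)
    starts = range(len(sequence) - width + 1)
    return max(((s, score_window(pssm, sequence[s:s + width])) for s in starts),
               key=lambda hit: hit[1])


designed = ["ACD", "ACD", "ACE"]
pssm = build_pssm(designed)
print(best_window(pssm, "WWACDWW")[0])  # 2
```

Only the 20 standard amino acid letters are scored. A query containing other letters, such as X, would raise a KeyError here.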
For some families, designed sequences can be a useful complement to experimental ones for homologue searching. However, improved tools are needed to extract more information from the designed profiles before the method can be of general use.
High-resolution structures of antibody-antigen complexes are useful for analyzing the binding interface and for making rational choices in antibody engineering. When a crystallographic structure of a complex is unavailable, the structure must be predicted using computational tools. In this work, we illustrate a novel approach, named SnugDock, to predict high-resolution antibody-antigen complex structures by simultaneously structurally optimizing the antibody-antigen rigid-body positions, the relative orientation of the antibody light and heavy chains, and the conformations of the six complementarity determining region loops. This approach is especially useful when the crystal structure of the antibody is not available, requiring allowances for inaccuracies in an antibody homology model which would otherwise frustrate rigid-backbone docking predictions. Local docking using SnugDock with the lowest-energy RosettaAntibody homology model produced more accurate predictions than standard rigid-body docking. SnugDock can be combined with ensemble docking to mimic conformer selection and induced fit, resulting in increased sampling of diverse antibody conformations. The combined algorithm produced four medium-quality and seven acceptable-quality lowest-interface-energy predictions, as rated by the Critical Assessment of PRediction of Interactions (CAPRI) criteria, in a test set of fifteen complexes. Structural analysis shows that diverse paratope conformations are sampled, but docked paratope backbones are not necessarily closer to the crystal structure conformations than the starting homology models. The accuracy of SnugDock predictions suggests a new genre of general docking algorithms with flexible binding interfaces targeted towards making homology models useful for further high-resolution predictions.
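The medium and acceptable ratings above are CAPRI quality classes. These combine the fraction of native contacts (fnat), ligand RMSD (L_rms), and interface RMSD (I_rms). The sketch encodes the thresholds as they are commonly tabulated for CAPRI. Treat them as an assumption and check them against the official assessment protocol before relying on them. The function name is hypothetical.

```python
def capri_class(fnat, l_rms, i_rms):
    """Classify a docked model as high / medium / acceptable / incorrect.

    fnat is the fraction of native contacts. l_rms and i_rms are the ligand and
    interface RMSDs in Å. The thresholds follow the commonly cited CAPRI table.
    """
    if fnat >= 0.5 and (l_rms <= 1.0 or i_rms <= 1.0):
        return "high"
    if (0.3 <= fnat < 0.5 and (l_rms <= 5.0 or i_rms <= 2.0)) or \
            (fnat >= 0.5 and l_rms > 1.0 and i_rms > 1.0):
        return "medium"
    if (0.1 <= fnat < 0.3 and (l_rms <= 10.0 or i_rms <= 4.0)) or \
            (fnat >= 0.3 and l_rms > 5.0 and i_rms > 2.0):
        return "acceptable"
    return "incorrect"


models = [(0.6, 0.8, 1.5), (0.4, 3.0, 2.5), (0.2, 8.0, 5.0), (0.05, 1.0, 1.0)]
print([capri_class(*m) for m in models])  # ['high', 'medium', 'acceptable', 'incorrect']
```

In a benchmark like the one above, only the lowest-interface-energy model of each target would be passed to this classifier.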
Antibodies are proteins that are key elements of the immune system and increasingly used as drugs. Antibodies bind tightly and specifically to antigens to block their activity or to mark them for destruction. Three-dimensional structures of the antibody-antigen complexes are useful for understanding their mechanism and for designing improved antibody drugs. Experimental determination of structures is laborious and not always possible, so we have developed tools to predict structures of antibody-antigen complexes computationally. Computer-predicted models of antibodies, or homology models, typically have errors which can frustrate algorithms for prediction of protein-protein interfaces (docking), and result in incorrect predictions. Here, we have created and tested a new docking algorithm which incorporates flexibility to overcome structural errors in the antibody structural model. The algorithm allows both intramolecular and interfacial flexibility in the antibody during docking, resulting in improved accuracy approaching that when using experimentally determined antibody structures. Structural analysis of the predicted binding region of the complex will enable the protein engineer to make rational choices for better antibody drug designs.
Summary: PyRosetta is a stand-alone Python-based implementation of the Rosetta molecular modeling package that allows users to write custom structure prediction and design algorithms using the major Rosetta sampling and scoring functions. PyRosetta contains Python bindings to libraries that define Rosetta functions, including those for accessing and manipulating protein structure, calculating energies and running Monte Carlo-based simulations. PyRosetta can be used in two ways: (i) interactively, using IPython, and (ii) script-based, using Python scripting. Interactive mode contains a number of help features and is ideal for beginners, while script mode is best suited for algorithm development. PyRosetta has similar computational performance to Rosetta, can be easily scaled up for cluster applications and has been implemented for algorithms demonstrating protein docking, protein folding, loop modeling and design.
Availability: PyRosetta is a stand-alone package available at http://www.pyrosetta.org under the Rosetta license which is free for academic and non-profit users. A tutorial, user's manual and sample scripts demonstrating usage are also available on the web site.
The inhibitory switch (IS) domain of p21-activated kinase 1 (PAK1) stabilizes full-length PAK1 in an inactive conformation by binding to the PAK1 kinase domain. Competitive binding of small GTPases to the IS domain disrupts the autoinhibitory interactions and exposes the IS domain binding site on the surface of the kinase domain. To build an affinity reagent that selectively binds the activated state of PAK1, we used molecular modeling to re-engineer the isolated IS domain so that it was soluble and stable, did not bind to GTPases and bound more tightly to the PAK1 kinase domain. Three design strategies were tested: in the first and second cases, extension and redesign of the N-terminus were used to expand the hydrophobic core of the domain, and in the third case the termini were redesigned to be adjacent in space so that the domain could be stabilized by insertion into a loop in a host cyan fluorescent protein (CFP). The best-performing design, called CFP-PAcKer, was based on the third strategy and bound the kinase domain of PAK1 with an affinity of 400 nM. CFP-PAcKer binds more tightly to a full-length variant of PAK1 that is stabilized in the ‘open’ state (Kd = 3.3 µM) than to full-length PAK1 in the ‘closed’ state (undetectable affinity), and binding can be monitored with fluorescence by placing an environmentally sensitive fluorescent dye on CFP-PAcKer adjacent to the binding site.
Computational protein design; Rosetta; merocyanine dye; p21-activated kinase
Computational small-molecule docking into comparative models of proteins is widely used to query protein function and in the development of small-molecule therapeutics. We benchmark RosettaLigand docking into comparative models for nine proteins built during CASP8 that contain ligands. We supplement the study with 21 additional protein/ligand complexes to cover a wider space of chemotypes. In 21 of the 30 cases, a full RosettaLigand docking run found a native-like binding mode among the top ten scoring binding modes. From the benchmark cases we find that careful template selection based on ligand occupancy provides the best chance of success, while overall sequence identity between template and target does not appear to improve results. We also find that binding energy normalized by atom number is often less than −0.4 in native-like binding modes.
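The last observation suggests a simple post-docking check: rank binding modes by score, then flag those whose binding energy per ligand atom falls below −0.4. A minimal sketch follows. It assumes that "atom number" means ligand heavy atoms and that energies are in the scoring function's own units. The class and function names are hypothetical, and the poses are made-up placeholders, not benchmark data.

```python
from dataclasses import dataclass


@dataclass
class BindingMode:
    name: str
    binding_energy: float   # interface score; more negative is more favorable
    ligand_atoms: int       # assumed: number of ligand heavy atoms


def triage(modes, top=10, cutoff=-0.4):
    """Rank modes by binding energy (top N) and flag those below the per-atom cutoff."""
    ranked = sorted(modes, key=lambda m: m.binding_energy)[:top]
    return [(m.name, m.binding_energy / m.ligand_atoms,
             m.binding_energy / m.ligand_atoms < cutoff) for m in ranked]


modes = [BindingMode("pose_a", -12.0, 24),
         BindingMode("pose_b", -6.0, 30),
         BindingMode("pose_c", -15.0, 25)]
for name, per_atom, flagged in triage(modes):
    print(f"{name}: {per_atom:.2f} per atom{'  (candidate native-like)' if flagged else ''}")
```

The abstract describes the threshold as "often" satisfied by native-like modes. It is better used as a heuristic filter than as a hard acceptance rule.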
The RosettaBackrub server (http://kortemmelab.ucsf.edu/backrub) implements the Backrub method, derived from observations of alternative conformations in high-resolution protein crystal structures, for flexible backbone protein modeling. Backrub modeling is applied to three related applications using the Rosetta program for structure prediction and design: (I) modeling of structures of point mutations, (II) generating protein conformational ensembles and designing sequences consistent with these conformations and (III) predicting tolerated sequences at protein–protein interfaces. The three protocols have been validated on experimental data. Starting from a user-provided single input protein structure in PDB format, the server generates near-native conformational ensembles. The predicted conformations and sequences can be used for different applications, such as to guide mutagenesis experiments, for ensemble-docking approaches or to generate sequence libraries for protein design.
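The geometric core of a backrub move is a rigid rotation of a short backbone segment about the axis joining two flanking Cα atoms. The sketch below shows only that rotation, using Rodrigues' formula on plain coordinate tuples. Rosetta's implementation adds further bookkeeping that this sketch omits. The function names are hypothetical.

```python
import math


def rotate_about_axis(point, a, b, angle_deg):
    """Rotate point by angle_deg about the line through a and b (Rodrigues' formula)."""
    axis = [b[k] - a[k] for k in range(3)]
    norm = math.sqrt(sum(c * c for c in axis))
    u = [c / norm for c in axis]                 # unit vector along the axis
    v = [point[k] - a[k] for k in range(3)]      # point relative to the axis origin
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    dot = sum(u[k] * v[k] for k in range(3))
    return tuple(a[k] + v[k] * cos_t + cross[k] * sin_t + u[k] * dot * (1 - cos_t)
                 for k in range(3))


def backrub(segment, ca_before, ca_after, angle_deg):
    """Rigidly rotate the atoms between two flanking C-alpha atoms about their axis."""
    return [rotate_about_axis(p, ca_before, ca_after, angle_deg) for p in segment]


ca_before, ca_after = (0.0, 0.0, 0.0), (6.0, 0.0, 0.0)
moved = backrub([(3.0, 1.0, 0.0)], ca_before, ca_after, 90.0)
print([round(c, 3) for c in moved[0]])  # [3.0, 0.0, 1.0]
```

The flanking Cα atoms lie on the axis and do not move. This is why the move perturbs the backbone locally without disturbing the rest of the chain.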
Some protein design tasks cannot be modeled by the traditional single-state design strategy of finding a sequence that is optimal for a single fixed backbone. Such cases require multistate design, where a single sequence is threaded onto multiple backbones (states) and evaluated for its strengths and weaknesses on each backbone. For example, to design a protein that can switch between two specific conformations, it is necessary to find a sequence that is compatible with both backbone conformations. We present in this paper a generic implementation of multistate design that is suited for a wide range of protein design tasks and demonstrate in silico its capabilities at two design tasks: one of redesigning an obligate homodimer into an obligate heterodimer such that the new monomers would not homodimerize, and one of redesigning a promiscuous interface to bind to only a single partner and to no longer bind the rest of its partners. Both tasks contained negative design in that multistate design was asked to find sequences that would produce high energies for several of the states being modeled. Success at negative design was assessed by computationally redocking the undesired protein-pair interactions; we found that multistate design's accuracy improved as the diversity of conformations for the undesired protein-pair interactions increased. The paper concludes with a discussion of the pitfalls of negative design, which has proven considerably more challenging than positive design.
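The difference between single-state and multistate design is easiest to see with a fitness function that rewards the desired state and penalizes the most competitive undesired state. The sketch below is a hypothetical toy, not the paper's fitness function. It uses additive per-position energy tables for a desired heterodimer state and an undesired homodimer state, with exhaustive search over a two-letter alphabet.

```python
from itertools import product


def state_energy(seq, table):
    """Toy additive energy of a sequence threaded onto one state (backbone)."""
    return sum(table[pos][aa] for pos, aa in enumerate(seq))


def multistate_fitness(seq, positive, negatives, neg_weight=1.0):
    """Lower is better: favor the desired state, penalize the most competitive undesired one."""
    strongest_competitor = min(state_energy(seq, t) for t in negatives)
    return state_energy(seq, positive) - neg_weight * strongest_competitor


def exhaustive_best(alphabet, length, key):
    """Enumerate every sequence (fine only for toy sizes) and return the lowest-key one."""
    return min(("".join(s) for s in product(alphabet, repeat=length)), key=key)


hetero = [{"L": -2.0, "E": -1.0}, {"L": -2.0, "E": -1.0}]  # desired heterodimer state
homo = [{"L": -3.0, "E": 1.0}, {"L": -3.0, "E": 1.0}]      # undesired homodimer state

single_state = exhaustive_best("LE", 2, key=lambda s: state_energy(s, hetero))
multi_state = exhaustive_best("LE", 2, key=lambda s: multistate_fitness(s, hetero, [homo]))
print(single_state, multi_state)  # LL EE
```

Single-state design picks LL, which binds the desired partner best but binds the undesired state even better. Multistate design trades some positive-state energy for specificity.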
The RosettaDesign server identifies low energy amino acid sequences for target protein structures. The client provides the backbone coordinates of the target structure and specifies which residues to design. The server returns to the client the sequences, coordinates and energies of the designed proteins. The simulations are performed using the design module of the Rosetta program (RosettaDesign). RosettaDesign uses Monte Carlo optimization with simulated annealing to search for amino acids that pack well on the target structure and satisfy hydrogen bonding potential. RosettaDesign has been experimentally validated and has been used previously to stabilize naturally occurring proteins and design a novel protein structure.
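Monte Carlo optimization with simulated annealing can be illustrated on a toy sequence-scoring problem. The sketch below uses single point mutations, the Metropolis acceptance rule, and a geometric cooling schedule. It tracks the best sequence seen during the run. The scoring tables and the function names are hypothetical. RosettaDesign itself samples rotamers and scores them with the full Rosetta energy function.

```python
import math
import random


def energy(seq, single, pair):
    """Toy score: per-position amino acid terms plus a term for each adjacent pair."""
    total = sum(single[pos][aa] for pos, aa in enumerate(seq))
    total += sum(pair.get((a, b), 0.0) for a, b in zip(seq, seq[1:]))
    return total


def anneal(start, alphabet, single, pair, steps=2000, t_hot=5.0, t_cold=0.05, seed=0):
    """Metropolis Monte Carlo over point mutations with geometric cooling."""
    rng = random.Random(seed)
    seq = list(start)
    current = energy(seq, single, pair)
    best_seq, best_e = list(seq), current
    for step in range(steps):
        temp = t_hot * (t_cold / t_hot) ** (step / max(steps - 1, 1))
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice(alphabet)
        trial = energy(seq, single, pair)
        delta = trial - current
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = trial            # accept the mutation
            if current < best_e:
                best_seq, best_e = list(seq), current
        else:
            seq[pos] = old             # reject: restore the previous residue
    return "".join(best_seq), best_e


single = [{"A": 0.0, "L": -1.0, "E": 0.5} for _ in range(3)]
pair = {("L", "L"): 2.0}  # hypothetical penalty for two adjacent L residues
print(anneal("EEE", "ALE", single, pair))
```

A high starting temperature lets the search cross unfavorable intermediates early in the run. Cooling later concentrates sampling near low-energy sequences.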
Flexible peptides that fold upon binding to another protein molecule mediate a large number of regulatory interactions in the living cell and may provide highly specific recognition modules. We present Rosetta FlexPepDock ab-initio, a protocol for simultaneous docking and de-novo folding of peptides, starting from an approximate specification of the peptide binding site. Using the Rosetta fragments library and a coarse-grained structural representation of the peptide and the receptor, FlexPepDock ab-initio samples efficiently and simultaneously the space of possible peptide backbone conformations and rigid-body orientations over the receptor surface of a given binding site. The subsequent all-atom refinement of the coarse-grained models includes full side-chain modeling of both the receptor and the peptide, resulting in high-resolution models in which key side-chain interactions are recapitulated. The protocol was applied to a benchmark in which peptides were modeled over receptors in either their bound backbone conformations or in their free, unbound form. Near-native peptide conformations were identified in 18/26 of the bound cases and 7/14 of the unbound cases. The protocol performs well on peptides from various classes of secondary structures, including coiled peptides with unusual turns and kinks. The results presented here significantly extend the scope of state-of-the-art methods for high-resolution peptide modeling, which can now be applied to a wide variety of peptide-protein interactions where no prior information about the peptide backbone conformation is available, enabling detailed structure-based studies and manipulation of those interactions.
Virtual compound screening using molecular docking is widely used in the discovery of new lead compounds for drug design. However, the docking scores are not sufficiently precise to represent the protein-ligand binding affinity. Here, we developed an efficient computational method for calculating protein-ligand binding affinity, which is based on molecular mechanics generalized Born/surface area (MM-GBSA) calculations and the Jarzynski identity. The Jarzynski identity is an exact relation between free-energy differences and the work done through non-equilibrium processes, and MM-GBSA is a semimacroscopic approach to calculate the potential energy. To calculate the work distribution when a ligand is pulled out of its binding site, multiple protein-ligand conformations are randomly generated as an alternative to performing an explicit single-molecule pulling simulation. We assessed the new method, multiple random conformation/MM-GBSA (MRC-MMGBSA), by evaluating ligand-binding affinities (scores) for four target proteins and comparing these scores with experimental data. The calculated scores were qualitatively in good agreement with the experimental binding affinities, and the optimal docking structure could be determined by ranking the scores of the multiple docking poses obtained by the molecular docking process. Furthermore, the scores showed a strong linear response to experimental binding free energies, so that the free energy difference of the ligand binding (ΔΔG) could be calculated by linear scaling of the scores. The error of calculated ΔΔG was within ≈±1.5 kcal•mol−1 of the experimental values. In particular, for flexible target proteins, the MRC-MMGBSA scores were more effective in ranking ligands than those generated by the MM-GBSA method using a single protein-ligand conformation.
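The Jarzynski identity turns a distribution of non-equilibrium work values into a free-energy difference: ΔF = −kT ln⟨exp(−W/kT)⟩. A minimal sketch follows. It evaluates the exponential average with the log-sum-exp trick so that large work values do not underflow. The work values are placeholders, not MRC-MMGBSA results, and the function name is hypothetical.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def jarzynski_delta_f(works, temperature=300.0):
    """Free-energy difference (kcal/mol) from non-equilibrium work values (kcal/mol).

    Implements dF = -kT * ln(mean(exp(-W / kT))) with log-sum-exp for stability.
    """
    kt = K_B * temperature
    scaled = [-w / kt for w in works]
    peak = max(scaled)
    log_mean = peak + math.log(sum(math.exp(x - peak) for x in scaled) / len(scaled))
    return -kt * log_mean


works = [1.0, 3.0, 8.0]  # placeholder work values for pulling a ligand out
print(round(jarzynski_delta_f(works), 3), sum(works) / len(works))
```

The exponential average is dominated by the lowest-work samples. As a result, ΔF always lies between the minimum and the mean of the work values.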
The results suggest that, owing to its lower computational costs and greater accuracy, the MRC-MMGBSA offers efficient means to rank the ligands, in the post-docking process, according to their binding affinities, and to compare these directly with the experimental values.
The reprogramming of DNA-binding specificity is an important challenge for computational protein design that tests current understanding of protein–DNA recognition, and has considerable practical relevance for biotechnology and medicine[1–6]. Here we describe the computational redesign of the cleavage specificity of the intron-encoded homing endonuclease I-MsoI[7] using a physically realistic atomic-level force field[8,9]. Using an in silico screen, we identified single base-pair substitutions predicted to disrupt binding by the wild-type enzyme, and then optimized the identities and conformations of clusters of amino acids around each of these unfavourable substitutions using Monte Carlo sampling[10]. A redesigned enzyme that was predicted to display altered target site specificity, while maintaining wild-type binding affinity, was experimentally characterized. The redesigned enzyme binds and cleaves the redesigned recognition site ~10,000 times more effectively than does the wild-type enzyme, with a level of target discrimination comparable to the original endonuclease. Determination of the structure of the redesigned nuclease–recognition site complex by X-ray crystallography confirms the accuracy of the computationally predicted interface. These results suggest that computational protein design methods can have an important role in the creation of novel highly specific endonucleases for gene therapy and other applications.
The accurate prediction of enzyme-substrate interaction energies is one of the major challenges in computational biology. This study describes the improvement of protein-ligand binding energy prediction by incorporating protein flexibility through the use of molecular dynamics (MD) simulations.
Docking experiments were undertaken using the program AutoDock for twenty-five HIV-1 protease-inhibitor complexes determined by X-ray crystallography. Rigid-protein docking without any dynamics produced a low correlation of 0.38 between the experimental and calculated binding energies. Correlations improved significantly for all time scales of MD simulations of the receptor-ligand complex. The highest correlation coefficient of 0.87 between the experimental and calculated energies was obtained after 0.1 picoseconds of dynamics simulation.
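The agreement reported above is a correlation coefficient between experimental and calculated binding energies. Assuming it is the Pearson coefficient, which the abstract does not state explicitly, it can be computed as below. The input lists are placeholders, not the HIV-1 protease data, and the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


experimental = [-12.1, -10.4, -9.8, -11.0]  # placeholder binding energies
calculated = [-14.0, -11.2, -10.9, -12.5]   # placeholder docking energies
print(round(pearson_r(experimental, calculated), 3))
```

On Python 3.10 and later, `statistics.correlation` returns the same quantity.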
Our results indicate that relaxation of protein complexes by MD simulation is useful and necessary to obtain binding energies that are representative of the experimentally determined values.
Peptide–protein interactions are among the most prevalent and important interactions in the cell, but a large fraction of those interactions lack detailed structural characterization. The Rosetta FlexPepDock web server (http://flexpepdock.furmanlab.cs.huji.ac.il/) provides an interface to a high-resolution peptide docking (refinement) protocol for the modeling of peptide–protein complexes, implemented within the Rosetta framework. Given a protein receptor structure and an approximate, possibly inaccurate model of the peptide within the receptor binding site, the FlexPepDock server refines the peptide to high resolution, allowing full flexibility to the peptide backbone and to all side chains. This protocol was extensively tested and benchmarked on a wide array of non-redundant peptide–protein complexes, and was proven effective when applied to peptide starting conformations within 5.5 Å backbone root mean square deviation from the native conformation. FlexPepDock has been applied to several systems that are mediated and regulated by peptide–protein interactions. This easy to use and general web server interface allows non-expert users to accurately model their specific peptide–protein interaction of interest.
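The 5.5 Å applicability range is a backbone RMSD between the starting peptide and the native peptide. A minimal sketch follows. It assumes the two coordinate lists are already in the same frame, for example after superimposing the receptors, so no further fitting is done. It also assumes both lists give the same backbone atoms in the same order. The function name and the coordinates are hypothetical.

```python
import math


def backbone_rmsd(model, reference):
    """RMSD (Å) between matched backbone atoms, with no further superposition.

    Assumes both coordinate lists are already in the same frame and list the
    same atoms in the same order.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty, equal-length atom lists")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(model, reference))
    return math.sqrt(total / len(model))


native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
start = [(x, y + 2.0, z) for x, y, z in native]  # peptide displaced by 2 Å
rmsd = backbone_rmsd(start, native)
print(rmsd, rmsd <= 5.5)  # 2.0 True
```

The RMSD is computed without superimposing the peptide onto itself. It therefore captures errors in peptide placement relative to the receptor, as well as errors in conformation.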
Quinones play important roles in mitochondrial and photosynthetic energy conversion, acting as intramembrane, mobile electron and proton carriers between catalytic sites in various electron transfer proteins. They display different affinity, selectivity, functionality and exchange dynamics in different binding sites. Computational analysis of quinone binding sheds light on the requirements for quinone affinity and specificity. The affinities of ten oxidized, neutral benzoquinones (BQs) were measured for the high-affinity QA site in the detergent-solubilized Rhodobacter sphaeroides bacterial photosynthetic reaction center. Multi-Conformation Continuum Electrostatics (MCCE) was then used to calculate their relative binding free energies by Grand Canonical Monte Carlo sampling with a rigid protein backbone and flexible ligand positions, side-chain positions and protonation states. Van der Waals and torsion energies, Poisson-Boltzmann continuum electrostatics and accessible-surface-area-dependent ligand-solvent interactions are considered. An initial, single cycle of GROMACS backbone optimization improves the match with experiment, as do coupled ligand and side-chain motions. The calculations match experiment with an RMSD of 2.29 and a slope of 1.28. The affinities are dominated by favorable protein-ligand van der Waals interactions rather than electrostatic interactions. Each quinone appears in a closely clustered set of positions. Methyl and methoxy groups move into the same positions as those found for the native quinone. Difficulties are observed in placing methyl groups into methoxy sites. Calculations using a solvent-accessible-surface (SAS)-dependent implicit van der Waals interaction smoothed out small clashes, providing a better match to experiment, with an RMSD of 0.77 and a slope of 0.97.
Keywords: quinone; bacterial reaction center; binding affinity; docking; QA; photosynthesis
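An ideal calculation would reproduce experiment with an RMSD of 0 and a slope of 1, which is why the SAS-dependent results (0.77 and 0.97) are the closer match than the earlier 2.29 and 1.28. The sketch below computes both statistics. It assumes the slope is the ordinary least-squares slope of calculated against experimental values (fit with an intercept), which the abstract does not specify; the function names are illustrative.

```python
import math
from typing import Sequence


def rmsd(calculated: Sequence[float], experimental: Sequence[float]) -> float:
    """Root mean square deviation between calculated and experimental values."""
    if len(calculated) != len(experimental) or not calculated:
        raise ValueError("need two non-empty, equal-length series")
    return math.sqrt(
        sum((c - e) ** 2 for c, e in zip(calculated, experimental)) / len(calculated)
    )


def fit_slope(calculated: Sequence[float], experimental: Sequence[float]) -> float:
    """Least-squares slope with calculated as y and experimental as x (assumed convention)."""
    n = len(experimental)
    mean_x = sum(experimental) / n
    mean_y = sum(calculated) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(experimental, calculated))
    sxx = sum((x - mean_x) ** 2 for x in experimental)
    return sxy / sxx


# Hypothetical relative binding free energies (same units for both series).
exp_vals = [-8.0, -7.0, -6.5, -5.0]
calc_vals = [-8.4, -6.6, -6.9, -4.7]
print(f"RMSD = {rmsd(calc_vals, exp_vals):.2f}, slope = {fit_slope(calc_vals, exp_vals):.2f}")
```

RMSD penalizes any offset between the series, while a slope above 1 indicates that the calculation exaggerates differences between ligands; the two numbers together describe the match better than either alone.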
RosettaDock has been increasingly used in protein docking and design strategies to predict the structure of protein-protein interfaces. Here we test the capabilities of RosettaDock v3.2, part of the newly developed Rosetta v3.2 modeling suite, against Docking Benchmark 3.0 and compare it with RosettaDock v2.3, the latest version of the previous Rosetta software package. The benchmark contains a diverse set of 116 docking targets, including 22 antibody-antigen complexes, 33 enzyme-inhibitor complexes, and 60 ‘other’ complexes. These targets were further classified by expected docking difficulty into 84 rigid-body targets, 17 medium targets, and 14 difficult targets. We carried out local docking perturbations for each target, using the unbound structures when available, in both RosettaDock v2.3 and v3.2. Overall, the performance of RosettaDock v2.3 and v3.2 was similar: RosettaDock v3.2 achieved 56 docking funnels, compared to 49 in v2.3. A breakdown of docking performance by protein complex type shows that RosettaDock v3.2 achieved docking funnels for 63% of antibody-antigen targets, 62% of enzyme-inhibitor targets, and 35% of ‘other’ targets. In terms of docking difficulty, RosettaDock v3.2 achieved funnels for 58% of rigid-body targets, 30% of medium targets, and 14% of difficult targets. For targets that failed, we carried out additional analyses to identify the cause of failure; these showed that binding-induced backbone conformational changes account for the majority of failures. We also present a bootstrap statistical analysis that quantifies the reliability of the stochastic docking results. Finally, we demonstrate the additional functionality available in RosettaDock v3.2 by incorporating small molecules and non-protein cofactors when docking a smaller set of targets. This study marks the most extensive benchmarking of the RosettaDock module to date and establishes a baseline for future research in protein interface modeling and structure prediction.
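The abstract does not define a docking funnel or describe its bootstrap procedure. The sketch below assumes a funnel criterion of at least 3 near-native models among the 5 best-scoring ones, and estimates how often that criterion still holds when the decoy set is resampled with replacement. The criterion, function names, and decoy data are all assumptions for illustration, not the study's exact method.

```python
import heapq
import random
from typing import List, Tuple

Decoy = Tuple[float, bool]  # (total score, near-native flag); lower score is better


def has_funnel(decoys: List[Decoy], top_n: int = 5, min_hits: int = 3) -> bool:
    """Assumed criterion: at least min_hits near-native models among the top_n best scores."""
    best = heapq.nsmallest(top_n, decoys, key=lambda d: d[0])
    return sum(1 for _, near in best if near) >= min_hits


def bootstrap_funnel_rate(decoys: List[Decoy], n_boot: int = 1000, seed: int = 0) -> float:
    """Fraction of bootstrap resamples (same size, drawn with replacement) showing a funnel."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        sample = rng.choices(decoys, k=len(decoys))
        if has_funnel(sample):
            hits += 1
    return hits / n_boot


# Hypothetical run: 50 near-native decoys that score better than 950 incorrect ones.
good = [(-20.0 - 0.1 * i, True) for i in range(50)] + \
       [(-10.0 + 0.01 * i, False) for i in range(950)]
print(bootstrap_funnel_rate(good, n_boot=200))
```

A rate close to 1 suggests the funnel is a robust feature of the run; a rate near 0.5 would mean that a repeated run could easily report the opposite outcome.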
The RosettaDock server (http://rosettadock.graylab.jhu.edu) identifies low-energy conformations of a protein–protein interaction near a given starting configuration by optimizing rigid-body orientation and side-chain conformations. The server requires two protein structures and a starting location for the search as inputs. RosettaDock generates 1000 independent structures, and the server returns pictures, coordinate files and detailed scoring information for the 10 top-scoring models. A plot of the total energy of each of the 1000 models shows the presence or absence of an energetic binding funnel. RosettaDock has been validated on the docking benchmark set and through the Critical Assessment of PRedicted Interactions (CAPRI) blind prediction challenge.
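Selecting the 10 top-scoring models out of 1000 is a top-k query over total energies. A minimal sketch, assuming each model carries a total score (lower is better) and an RMSD from the starting position, so that the energy/RMSD pairs could feed a funnel plot. The `Model` record, its fields, and the random scores are hypothetical and do not reflect the server's actual output format.

```python
import heapq
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Model:
    name: str
    total_score: float  # Rosetta-style total energy; lower is better
    rmsd: float         # hypothetical field: Å from the starting configuration


def top_models(models: List[Model], k: int = 10) -> List[Model]:
    """Return the k lowest-energy models, best first."""
    return heapq.nsmallest(k, models, key=lambda m: m.total_score)


# 1000 hypothetical models with random scores and RMSDs.
rng = random.Random(42)
models = [Model(f"model_{i:04d}", rng.uniform(-30.0, 10.0), rng.uniform(0.0, 15.0))
          for i in range(1000)]

best = top_models(models)
for m in best:
    print(m.name, round(m.total_score, 2), round(m.rmsd, 2))
```

`heapq.nsmallest` avoids sorting all 1000 models when only the best 10 are needed; for a set this small a full `sorted` would be equally practical.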
Protein-DNA docking is a very challenging problem in structural bioinformatics and has important implications for a number of applications, such as structure-based prediction of transcription factor binding sites and rational drug design. Protein-DNA docking is very computationally demanding because of the high cost of energy calculation and the statistical nature of conformational sampling algorithms. More importantly, experiments show that docking quality depends on how thoroughly the conformational space is sampled. It is therefore desirable to accelerate the docking algorithm, not only to reduce computing time but also to improve docking quality.
In an attempt to accelerate the sampling process and improve docking performance, we developed a graphics processing unit (GPU)-based protein-DNA docking algorithm. The algorithm employs a potential-based energy function to describe the binding affinity of a protein-DNA pair, and integrates Monte Carlo simulation with a simulated annealing method to search the conformational space. Algorithmic techniques were developed to improve computational efficiency and scalability on GPU-based high-performance computing systems.
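The search described above combines Monte Carlo moves with simulated annealing. A minimal single-threaded CPU sketch of that technique, using the Metropolis acceptance rule and a geometric cooling schedule. The six rigid-body parameters, the quadratic `toy_energy` standing in for the potential-based energy function, and all schedule settings are hypothetical; the GPU parallelization in the text is not reproduced here.

```python
import math
import random
from typing import Callable, List, Tuple

State = List[float]  # hypothetical rigid-body parameters: 3 translations + 3 rotations


def anneal(energy: Callable[[State], float], start: State, t_start: float = 10.0,
           t_end: float = 0.01, steps: int = 20000, step_size: float = 0.5,
           seed: int = 0) -> Tuple[State, float]:
    """Monte Carlo search with simulated annealing; returns the lowest-energy state seen."""
    rng = random.Random(seed)
    current = list(start)
    e_current = energy(current)
    best, e_best = list(current), e_current
    cooling = (t_end / t_start) ** (1.0 / (steps - 1))  # geometric schedule
    t = t_start
    for _ in range(steps):
        # Propose a Gaussian perturbation of one randomly chosen degree of freedom.
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] += rng.gauss(0.0, step_size)
        e_trial = energy(trial)
        delta = e_trial - e_current
        # Metropolis rule: accept downhill moves; accept uphill with probability exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
        t *= cooling
    return best, e_best


def toy_energy(x: State) -> float:
    """Hypothetical stand-in energy with its minimum at (1, -2, 0.5, 0, 0, 0)."""
    target = [1.0, -2.0, 0.5, 0.0, 0.0, 0.0]
    return sum((a - b) ** 2 for a, b in zip(x, target))


best_state, best_energy = anneal(toy_energy, [0.0] * 6)
print([round(v, 2) for v in best_state], round(best_energy, 4))
```

High early temperatures let the search cross energy barriers, and cooling gradually turns it into a downhill refinement. Each step is dominated by one energy evaluation, which is the part a GPU implementation would parallelize across many trial poses.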
The effectiveness of our approach was tested on a non-redundant set of 75 transcription factor (TF)-DNA complexes and a newly developed TF-DNA docking benchmark. We demonstrated that the GPU-based docking algorithm can significantly accelerate the simulation process and thereby improve the chance of finding near-native TF-DNA complex structures. This study also suggests that further progress in protein-DNA docking will require work on two complementary fronts: computational efficiency and energy function design.
We present a high-performance computing approach for improving the prediction accuracy of protein-DNA docking. The GPU-based docking algorithm accelerates the search of the conformational space and thus increases the chance of finding more near-native structures. To the best of our knowledge, this is the first dedicated effort to apply GPUs or GPU clusters to the protein-DNA docking problem.