We have recently completed a full rearchitecting of the Rosetta molecular modeling program, generalizing and expanding its existing functionality. The new architecture enables the rapid prototyping of novel protocols by providing easy-to-use interfaces to powerful tools for molecular modeling. The source code of the rearchitected program has been released as Rosetta3 and is freely available for academic use. At the time of its release, it contained 470,000 lines of code. At the time of this writing, counting protocols that are still unpublished, the source includes 1,285,000 lines. Its rapid growth is a testament to its ease of use. This document describes the requirements for our new architecture, justifies the design decisions, sketches out central classes, and highlights a few of the common tasks that the new software can perform.
The prediction of changes in protein stability and structure resulting from single amino acid substitutions is both a fundamental test of macromolecular modeling methodology and an important current problem as high-throughput sequencing reveals sequence polymorphisms at an increasing rate. In principle, given the structure of a wild-type protein and a point mutation whose effects are to be predicted, an accurate method should recapitulate both the structural changes and the change in the folding free energy. Here, we explore the performance of protocols that sample an increasing diversity of conformations. We find that surprisingly similar performances in predicting changes in stability are achieved using protocols that involve very different amounts of conformational sampling, provided that the resolution of the force field is matched to the resolution of the sampling method. Methods involving backbone sampling can in some cases closely recapitulate the structural changes accompanying mutations but, not surprisingly, tend to do more harm than good in cases where structural changes are negligible. Analysis of the outliers in the stability change calculations suggests areas needing particular improvement; these include the balance between desolvation and the formation of favorable buried polar interactions, and unfolded state modeling.
ΔΔG prediction; protein stability; backbone flexibility; free energy change
Accurate energy functions are critical to macromolecular modeling and design. We describe new tools for identifying inaccuracies in energy functions and guiding their improvement, and illustrate the application of these tools to improvement of the Rosetta energy function. The feature analysis tool identifies discrepancies between structures deposited in the PDB and low-energy structures generated by Rosetta; these likely arise from inaccuracies in the energy function. The optE tool optimizes the weights on the different components of the energy function by maximizing the recapitulation of a wide range of experimental observations. We use the tools to examine three proposed modifications to the Rosetta energy function: improving the unfolded state energy model (reference energies), using bicubic spline interpolation to generate knowledge-based torsional potentials, and incorporating the recently developed Dunbrack 2010 rotamer library (Shapovalov and Dunbrack, 2011).
Rosetta; energy function; scientific benchmarking; parameter estimation; decoy discrimination
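The idea of tuning weights on energy components so that experimental observations are better recapitulated can be illustrated with a deliberately small sketch. It is not the optE algorithm: the energy term names, the native/decoy values, the candidate weights, and the grid search are all hypothetical, and the only objective used here is the fraction of native structures that score lower than a paired decoy.

```python
# Toy illustration only: hypothetical term names and made-up energy values.

def total_energy(terms, weights):
    """Weighted sum of per-term energies for one structure."""
    return sum(weights[name] * value for name, value in terms.items())

def discrimination_score(weights, cases):
    """Fraction of (native, decoy) pairs in which the native scores strictly lower."""
    hits = sum(
        1 for native, decoy in cases
        if total_energy(native, weights) < total_energy(decoy, weights)
    )
    return hits / len(cases)

def fit_weight(cases, name, candidates, base_weights):
    """Grid search over one weight, keeping the first value with the best score."""
    best_weight, best_score = None, -1.0
    for w in candidates:
        trial = {**base_weights, name: w}
        score = discrimination_score(trial, cases)
        if score > best_score:
            best_weight, best_score = w, score
    return best_weight, best_score

# Each case pairs a native structure with a decoy (values are invented).
cases = [
    ({"vdw": -5.0, "hbond": -2.0}, {"vdw": -6.0, "hbond": 0.0}),
    ({"vdw": -4.0, "hbond": -3.0}, {"vdw": -5.0, "hbond": -1.0}),
    ({"vdw": -3.0, "hbond": -1.0}, {"vdw": -2.0, "hbond": -1.0}),
]
base_weights = {"vdw": 1.0, "hbond": 0.0}
best_weight, best_score = fit_weight(cases, "hbond", [0.0, 0.25, 0.5, 1.0], base_weights)
print(best_weight, best_score)
```

In this toy data set, two of the three natives are only distinguished from their decoys once the hypothetical hydrogen-bond term carries enough weight, which is the kind of signal a weight-fitting procedure exploits.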
Peptidomimetics are classes of molecules that mimic structural and functional attributes of polypeptides. Peptidomimetic oligomers can frequently be synthesized using efficient solid phase synthesis procedures similar to peptide synthesis. Conformationally ordered peptidomimetic oligomers are finding broad applications for molecular recognition and for inhibiting protein-protein interactions. One critical limitation is the limited set of design tools for identifying oligomer sequences that can adopt desired conformations. Here, we present expansions to the ROSETTA platform that enable structure prediction and design of five non-peptidic oligomer scaffolds (noncanonical backbones): oligooxopiperazines, oligo-peptoids, β-peptides, hydrogen bond surrogate helices and oligosaccharides. This work is complementary to prior additions to model noncanonical protein side chains in ROSETTA. The main purpose of our manuscript is to give a detailed description to current and future developers of how each of these noncanonical backbones was implemented. Furthermore, we provide a general outline for implementation of new backbone types not discussed here. To illustrate the utility of this approach, we describe the first tests of the ROSETTA molecular mechanics energy function in the context of oligooxopiperazines, using quantum mechanical calculations as comparison points, scanning through backbone and side chain torsion angles for a model peptidomimetic. Finally, as an example of a novel design application, we describe the automated design of an oligooxopiperazine that inhibits the p53-MDM2 protein-protein interaction. For the general biological and bioengineering community, several noncanonical backbones have been incorporated into web applications that allow users to freely and rapidly test the presented protocols (http://rosie.rosettacommons.org).
This work helps address the peptidomimetic community's need for an automated and expandable modeling tool for noncanonical backbones.
De novo protein design requires the identification of amino-acid sequences that favor the target folded conformation and are soluble in water. One strategy for promoting solubility is to disallow hydrophobic residues on the protein surface during design. However, naturally occurring proteins often have hydrophobic amino acids on their surface that contribute to protein stability via the partial burial of hydrophobic surface area or play a key role in the formation of protein-protein interactions. A less restrictive approach for surface design that is used by the modeling program Rosetta is to parameterize the energy function so that the number of hydrophobic amino acids designed on the protein surface is similar to what is observed in naturally occurring monomeric proteins. Previous studies with Rosetta have shown that this limits surface hydrophobics to the naturally occurring frequency (~28%) but that it does not prevent the formation of hydrophobic patches that are considerably larger than those observed in naturally occurring proteins. Here, we describe a new score term that explicitly detects and penalizes the formation of hydrophobic patches during computational protein design. With the new term we are able to design protein surfaces that include hydrophobic amino acids at naturally occurring frequencies, but do not have large hydrophobic patches. By adjusting the strength of the new score term the emphasis of surface redesigns can be switched between maintaining solubility and maximizing folding free energy.
computational protein design; protein solubility; protein stability; Rosetta
Accurate modeling of biomolecular systems requires accurate forcefields. Widely used molecular mechanics forcefields obtain parameters from experimental data and quantum chemistry calculations on small molecules, but do not have a clear way to take advantage of the information in high-resolution macromolecular structures. Knowledge-based methods largely ignore the physical chemistry of interatomic interactions and instead derive parameters almost exclusively from macromolecular structures. This can involve considerable double-counting of the same physical interactions. Here we describe a method for forcefield improvement that combines the strengths of the two approaches, and use it to improve the Rosetta all-atom forcefield, where the total energy is expressed as the sum of terms representing different physical interactions as in molecular mechanics forcefields. The values of the parameters, however, are fine-tuned to reproduce the properties of macromolecular structures. To resolve inaccuracies resulting from possible double-counting of interactions, we compare distribution functions from low-energy modeled structures to those from crystal structures. The structural and physical bases of the deviations between the modeled and reference structures are identified and used to guide forcefield improvements. We describe improvements resolving double-counting between backbone hydrogen-bond interactions and Lennard-Jones interactions in helices; between sidechain-backbone hydrogen-bonds and the backbone torsion potential; and between the sidechain torsion potential and Lennard-Jones interactions. Discrepancies between computed and observed distributions are also used to guide the incorporation of an explicit Cα–hydrogen bond in β sheets. The method can be used generally to integrate different sources of information for forcefield improvement.
forcefield optimization; hydrogen bond potential; rotamer library
We describe a computational protocol, called DDMI, for redesigning scaffold proteins to bind to a specified region on a target protein. The DDMI protocol is implemented within the Rosetta molecular modeling program and uses rigid-body docking, sequence design, and gradient-based minimization of backbone and side chain torsion angles to design low-energy interfaces between the scaffold and target protein. Iterative rounds of sequence design and conformational optimization were needed to produce models with calculated binding energies similar to those calculated for native complexes. We also show that additional conformational sampling with molecular dynamics can be iterated with sequence design to further lower the computed energy of the designed complexes. To experimentally test the DDMI protocol, we redesigned the human hyperplastic discs protein to bind to the kinase domain of p21-activated kinase 1 (PAK1). Six designs were experimentally characterized. Two of the designs aggregated and were not characterized further. Of the remaining four designs, three bound to PAK1 with affinities tighter than 350 μM. The tightest binding design, named Spider Roll, bound with an affinity of 100 μM. NMR-based structure prediction of Spider Roll based on backbone and 13Cβ chemical shifts using the program CS-ROSETTA indicated that the architecture of human hyperplastic discs protein is preserved. Mutagenesis studies confirmed that Spider Roll binds the target patch on PAK1. Additionally, Spider Roll binds to full-length PAK1 in its activated state, but does not bind PAK1 when it forms an auto-inhibited conformation that blocks the Spider Roll target site. Subsequent NMR characterization of the binding of Spider Roll to PAK1 revealed a comparably small binding 'on-rate' constant (<< 10⁵ M⁻¹ s⁻¹).
The ability to rationally design the site of novel protein-protein interactions is an important step towards creating new proteins that are useful as therapeutics or molecular probes.
Computational protein design; protein-protein interactions; protein docking; Rosetta molecular modeling program; NMR; CS-ROSETTA
Some protein design tasks cannot be modeled by the traditional single-state design strategy of finding a sequence that is optimal for a single fixed backbone. Such cases require multistate design, where a single sequence is threaded onto multiple backbones (states) and evaluated for its strengths and weaknesses on each backbone. For example, to design a protein that can switch between two specific conformations, it is necessary to find a sequence that is compatible with both backbone conformations. We present in this paper a generic implementation of multistate design that is suited for a wide range of protein design tasks and demonstrate in silico its capabilities at two design tasks: one of redesigning an obligate homodimer into an obligate heterodimer such that the new monomers would not homodimerize, and one of redesigning a promiscuous interface to bind to only a single partner and to no longer bind the rest of its partners. Both tasks contained negative design in that multistate design was asked to find sequences that would produce high energies for several of the states being modeled. Success at negative design was assessed by computationally redocking the undesired protein-pair interactions; we found that multistate design's accuracy improved as the diversity of conformations for the undesired protein-pair interactions increased. The paper concludes with a discussion of the pitfalls of negative design, which has proven considerably more challenging than positive design.
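The core of multistate design, one sequence scored on several states with some states favored and others disfavored, can be sketched in a few lines. This is a toy under stated assumptions, not the implementation described above: the two-letter alphabet, the mismatch-counting `state_energy`, the fitness defined as positive-state energy minus negative-state energy, and the exhaustive search are all hypothetical stand-ins.

```python
from itertools import product

def state_energy(sequence, preferred):
    """Hypothetical energy: one unit for each position that differs from the
    residue type this state prefers (lower is better)."""
    return sum(1.0 for s, p in zip(sequence, preferred) if s != p)

def fitness(sequence, positive_states, negative_states):
    """Lower is better: low energy on desired states (positive design) and
    high energy on undesired states (negative design)."""
    positive = sum(state_energy(sequence, s) for s in positive_states)
    negative = sum(state_energy(sequence, s) for s in negative_states)
    return positive - negative

def best_sequence(alphabet, length, positive_states, negative_states):
    """Exhaustively enumerate sequences; feasible only for toy problem sizes."""
    candidates = ("".join(c) for c in product(alphabet, repeat=length))
    return min(candidates, key=lambda seq: fitness(seq, positive_states, negative_states))

# Two desired states and one undesired state, each described by preferred residues.
positive_states = ["HPH", "HPP"]
negative_states = ["PPP"]
best = best_sequence("HP", 3, positive_states, negative_states)
print(best, fitness(best, positive_states, negative_states))
```

A subtraction is the simplest way to combine positive and negative design; it also hints at the pitfall noted above, since a sequence can improve its fitness by scoring badly on the few modeled undesired states without being unfavorable for conformations that were never modeled.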
Macromolecular modeling and design are increasingly useful in basic research, biotechnology, and teaching. However, the absence of a user-friendly modeling framework that provides access to a wide range of modeling capabilities is hampering the wider adoption of computational methods by non-experts. RosettaScripts is an XML-like language for specifying modeling tasks in the Rosetta framework. RosettaScripts provides access to protocol-level functionalities, such as rigid-body docking and sequence redesign, and allows fast testing and deployment of complex protocols without need for modifying or recompiling the underlying C++ code. We illustrate these capabilities with RosettaScripts protocols for the stabilization of proteins, the generation of computationally constrained libraries for experimental selection of higher-affinity binding proteins, loop remodeling, small-molecule ligand docking, design of ligand-binding proteins, and specificity redesign in DNA-binding proteins.
Symmetric protein assemblies play important roles in many biochemical processes. However, the large size of such systems is challenging for traditional structure modeling methods. This paper describes the implementation of a general framework for modeling arbitrary symmetric systems in Rosetta3. We describe the various types of symmetries relevant to the study of protein structure that may be modeled using Rosetta's symmetric framework. We then describe how this symmetric framework is efficiently implemented within Rosetta, which restricts the conformational search space by sampling only symmetric degrees of freedom, and explicitly simulates only a subset of the interacting monomers. Finally, we describe structure prediction and design applications that utilize the Rosetta3 symmetric modeling capabilities, and provide a guide to running simulations on symmetric systems.
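The saving that comes from modeling only the asymmetric unit can be seen in a minimal sketch for the simplest case, cyclic symmetry about the z axis. This is not Rosetta's symmetry framework: the function names and coordinates are hypothetical, and only the stored subunit would be varied during a search while every other copy is regenerated from it.

```python
import math

def rotate_z(point, angle):
    """Rotate a 3D point about the z axis by angle (radians)."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def build_cyclic_assembly(asymmetric_unit, n_subunits):
    """Generate all copies of a C_n assembly from one explicitly modeled subunit."""
    assembly = []
    for k in range(n_subunits):
        angle = 2.0 * math.pi * k / n_subunits
        assembly.append([rotate_z(p, angle) for p in asymmetric_unit])
    return assembly

# Hypothetical two-atom subunit; only these coordinates are independent.
subunit = [(10.0, 0.0, 0.0), (11.0, 1.0, 2.0)]
c3 = build_cyclic_assembly(subunit, 3)
for copy in c3:
    print(copy)
```

Because each copy is a rigid rotation of the stored subunit, any move applied to the subunit is propagated to the whole assembly, so the search space scales with one monomer rather than with the full complex.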
The Rosetta de novo enzyme design protocol has been used to design enzyme catalysts for a variety of chemical reactions, and in principle can be applied to any arbitrary chemical reaction of interest. The process has four stages: 1) choice of a catalytic mechanism and corresponding minimal model active site, 2) identification of sites in a set of scaffold proteins where this minimal active site can be realized, 3) optimization of the identities of the surrounding residues for stabilizing interactions with the transition state and primary catalytic residues, and 4) evaluation and ranking of the resulting designed sequences. Stages two through four of this process can be carried out with the Rosetta package, while stage one needs to be done externally. Here, we demonstrate in detail how to carry out the Rosetta enzyme design protocol from start to end, using the triosephosphate isomerase reaction for illustration.
People exert significant amounts of problem-solving effort playing computer games. Simple image- and text-recognition tasks have been successfully crowd-sourced through games [i, ii, iii], but it is not clear if more complex scientific problems can be similarly solved with human-directed computing. Protein structure prediction is one such problem: locating the biologically relevant native conformation of a protein is a formidable computational challenge given the very large size of the search space. Here we describe Foldit, a multiplayer online game that engages non-scientists in solving hard prediction problems. Foldit players interact with protein structures using direct manipulation tools and user-friendly versions of algorithms from the Rosetta structure prediction methodology [iv], while they compete and collaborate to optimize the computed energy. We show that top Foldit players excel at solving challenging structure refinement problems in which substantial backbone rearrangements are necessary to achieve burial of hydrophobic residues. Players working collaboratively develop a rich assortment of new strategies and algorithms; unlike computational approaches, they explore not only conformational space but also the space of possible search strategies. The integration of human visual problem-solving and strategy development capabilities with traditional computational algorithms through interactive multiplayer games is a powerful new approach to solving computationally-limited scientific problems.
MolProbity is a general-purpose web server offering quality validation for 3D structures of proteins, nucleic acids and complexes. It provides detailed all-atom contact analysis of any steric problems within the molecules as well as updated dihedral-angle diagnostics, and it can calculate and display the H-bond and van der Waals contacts in the interfaces between components. An integral step in the process is the addition and full optimization of all hydrogen atoms, both polar and nonpolar. New analysis functions have been added for RNA, for interfaces, and for NMR ensembles. Additionally, both the web site and major component programs have been rewritten to improve speed, convenience, clarity and integration with other resources. MolProbity results are reported in multiple forms: as overall numeric scores, as lists or charts of local problems, as downloadable PDB and graphics files, and most notably as informative, manipulable 3D kinemage graphics shown online in the KiNG viewer. This service is available free to all users at http://molprobity.biochem.duke.edu.
Accurate modeling of biomolecular systems requires accurate forcefields. Widely used molecular mechanics (MM) forcefields obtain parameters from experimental data and quantum chemistry calculations on small molecules but do not have a clear way to take advantage of the information in high-resolution macromolecular structures. In contrast, knowledge-based methods largely ignore the physical chemistry of interatomic interactions, and instead derive parameters almost exclusively from macromolecular structures. This can involve considerable double counting of the same physical interactions. Here, we describe a method for forcefield improvement that combines the strengths of the two approaches. We use this method to improve the Rosetta all-atom forcefield, in which the total energy is expressed as the sum of terms representing different physical interactions as in MM forcefields and the parameters are tuned to reproduce the properties of macromolecular structures. To resolve inaccuracies resulting from possible double counting of interactions, we compare distribution functions from low-energy modeled structures to those from crystal structures. The structural and physical bases of the deviations between the modeled and reference structures are identified and used to guide forcefield improvements. We describe improvements resolving double counting between backbone hydrogen bond interactions and Lennard-Jones interactions in helices; between sidechain-backbone hydrogen bonds and the backbone torsion potential; and between the sidechain torsion potential and Lennard-Jones interactions. Discrepancies between computed and observed distributions are also used to guide the incorporation of an explicit Cα-hydrogen bond in β sheets. The method can be used generally to integrate different sources of information for forcefield improvement.
forcefield optimization; hydrogen bond potential; rotamer library