De novo protein design requires the identification of amino-acid sequences that favor the target folded conformation and are soluble in water. One strategy for promoting solubility is to disallow hydrophobic residues on the protein surface during design. However, naturally occurring proteins often have hydrophobic amino acids on their surface that contribute to protein stability via the partial burial of hydrophobic surface area or play a key role in the formation of protein-protein interactions. A less restrictive approach for surface design that is used by the modeling program Rosetta is to parameterize the energy function so that the number of hydrophobic amino acids designed on the protein surface is similar to what is observed in naturally occurring monomeric proteins. Previous studies with Rosetta have shown that this limits surface hydrophobics to the naturally occurring frequency (~28%) but that it does not prevent the formation of hydrophobic patches that are considerably larger than those observed in naturally occurring proteins. Here, we describe a new score term that explicitly detects and penalizes the formation of hydrophobic patches during computational protein design. With the new term we are able to design protein surfaces that include hydrophobic amino acids at naturally occurring frequencies, but do not have large hydrophobic patches. By adjusting the strength of the new score term the emphasis of surface redesigns can be switched between maintaining solubility and maximizing folding free energy.
computational protein design; protein solubility; protein stability; Rosetta
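The idea of a score term that detects and penalizes hydrophobic patches can be illustrated with a small sketch. This is not the Rosetta term itself: the hydrophobic residue set, the neighbor map, the `max_allowed` threshold, and the linear penalty are all assumptions chosen only to show the general shape of such a term.

```python
# Illustrative hydrophobic set; the actual classification used in Rosetta may differ.
HYDROPHOBIC = set("AVILMFWY")


def patch_penalty(sequence, surface_neighbors, max_allowed=3, weight=1.0):
    """Return a penalty that grows with the size of hydrophobic surface patches.

    sequence          -- one-letter amino-acid string
    surface_neighbors -- hypothetical map: surface residue index -> indices of
                         neighboring surface residues (e.g. from a distance cutoff)
    max_allowed       -- hydrophobic residues tolerated per patch (assumed value)
    weight            -- strength of the term; raising it favors solubility
    """
    penalty = 0.0
    for center, neighbors in surface_neighbors.items():
        patch = [center] + list(neighbors)
        n_hydrophobic = sum(1 for i in patch if sequence[i] in HYDROPHOBIC)
        excess = n_hydrophobic - max_allowed
        if excess > 0:
            penalty += weight * excess
    return penalty


if __name__ == "__main__":
    neighbors = {0: [1, 2, 3], 4: [5]}
    print(patch_penalty("LLLLKK", neighbors))  # one patch exceeds the threshold
    print(patch_penalty("LKLKKK", neighbors))  # no patch exceeds the threshold
```

Scaling `weight` mirrors the trade-off described in the abstract: a small weight lets the rest of the energy function dominate, while a large weight suppresses patches.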
Computationally designing protein-protein interactions with high affinity and desired orientation is a challenging task. Incorporating metal-binding sites at the target interface may be one approach for increasing affinity and specifying the binding mode, thereby improving robustness of designed interactions for use as tools in basic research as well as in applications from biotechnology to medicine. Here we describe a Rosetta-based approach for the rational design of a protein monomer to form a zinc-mediated, symmetric homodimer. Our metal interface design, named MID1 (NESG target ID OR37), forms a tight dimer in the presence of zinc (MID1-zinc) with a dissociation constant <30 nM. Without zinc the dissociation constant is 4 μM. The crystal structure of MID1-zinc shows good overall agreement with the computational model, but only three out of four designed histidines coordinate zinc. However, a histidine-to-glutamate point mutation resulted in four-coordination of zinc, and the resulting metal binding site and dimer orientation closely match the computational model (Cα RMSD = 1.4 Å).
computational protein interface design; protein-protein interaction; metal; zinc; cobalt; homodimer; de novo
The inhibitory switch (IS) domain of p21-activated kinase 1 (PAK1) stabilizes full-length PAK1 in an inactive conformation by binding to the PAK1 kinase domain. Competitive binding of small GTPases to the IS domain disrupts the autoinhibitory interactions and exposes the IS domain binding site on the surface of the kinase domain. To build an affinity reagent that selectively binds the activated state of PAK1, we used molecular modeling to re-engineer the isolated IS domain so that it was soluble and stable, did not bind to GTPases and bound more tightly to the PAK1 kinase domain. Three design strategies were tested: in the first and second cases, extension and redesign of the N-terminus were used to expand the hydrophobic core of the domain, and in the third case the termini were redesigned to be adjacent in space so that the domain could be stabilized by insertion into a loop in a host cyan fluorescent protein (CFP). The best-performing design, called CFP-PAcKer, was based on the third strategy and bound the kinase domain of PAK1 with an affinity of 400 nM. CFP-PAcKer binds more tightly to a full-length variant of PAK1 that is stabilized in the ‘open’ state (Kd = 3.3 µM) than to full-length PAK1 in the ‘closed’ state (undetectable affinity), and binding can be monitored with fluorescence by placing an environmentally sensitive fluorescent dye on CFP-PAcKer adjacent to the binding site.
Computational protein design; Rosetta; merocyanine dye; p21-activated kinase
During ubiquitin conjugation, the thioester bond that links ‘donor’ ubiquitin to ubiquitin-conjugating enzyme (E2) undergoes nucleophilic attack by the ε-amino group of an acceptor lysine, resulting in formation of an isopeptide bond. Models of ubiquitination have envisioned the donor ubiquitin to be a passive participant in this process. However, we show here that the I44A mutation in ubiquitin profoundly inhibits its ability to serve as a donor for ubiquitin chain initiation or elongation, but can be rescued by computationally-predicted compensatory mutations in the E2 Cdc34. The donor defect of ubiquitin-I44A can be partially suppressed either by using a low pKa amine (hydroxylamine) as the acceptor or by performing reactions at higher pH, suggesting that the discharge defect arises in part due to inefficient deprotonation of the acceptor lysine. We propose that interaction between Cdc34 and the donor ubiquitin organizes the active site to promote efficient ubiquitination of substrate.
The de novo design of protein-binding peptides is challenging, because it requires identifying both a sequence and a backbone conformation favorable for binding. We used a computational strategy that iterates between structure and sequence optimization to redesign the C-terminal portion of the RGS14 GoLoco motif peptide so that it adopts a new conformation when bound to Gαi1. An X-ray crystal structure of the redesigned complex closely matches the computational model, with a backbone RMSD of 1.1 Å.
Noncanonical amino acids (NCAAs) can be used in a variety of protein design contexts. For example, they can be used in place of the canonical amino acids (CAAs) to improve the biophysical properties of peptides that target protein interfaces. We describe the incorporation of 114 NCAAs into the protein-modeling suite Rosetta. We describe our methods for building backbone-dependent rotamer libraries and the parameterization and construction of a scoring function that can be used to score NCAA-containing peptides and proteins. We validate these additions to Rosetta and our NCAA rotamer libraries by showing that we can improve the binding of a calpastatin-derived peptide to calpain-1 by substituting NCAAs for native amino acids using Rosetta. Rosetta (executables and source), auxiliary scripts and code, and documentation can be found at http://www.rosettacommons.org/.
Fluorescent biosensors for living cells currently require laborious optimization and a unique design for each target. They are limited by the availability of naturally occurring ligands with appropriate target specificity. Here we describe a biosensor based on an engineered fibronectin monobody scaffold that can be tailored to bind different targets via high throughput screening. This Src family kinase (SFK) biosensor was made by derivatizing a monobody specific for activated SFK with a bright dye whose fluorescence increases upon target binding. We identified sites for dye attachment and alterations to eliminate vesiculation in living cells, providing a generalizable scaffold for biosensor production. This approach minimizes cell perturbation because it senses endogenous, unmodified target, and because sensitivity is enhanced by direct dye excitation. Automated correlation of cell velocities and SFK activity revealed that SFK are activated specifically during protrusion. Activity correlates with velocity, and peaks 1–2 microns from the leading edge.
We describe a computational protocol, called DDMI, for redesigning scaffold proteins to bind to a specified region on a target protein. The DDMI protocol is implemented within the Rosetta molecular modeling program and uses rigid-body docking, sequence design, and gradient-based minimization of backbone and side chain torsion angles to design low energy interfaces between the scaffold and target protein. Iterative rounds of sequence design and conformational optimization were needed to produce models that have calculated binding energies that are similar to binding energies calculated for native complexes. We also show that additional conformation sampling with molecular dynamics can be iterated with sequence design to further lower the computed energy of the designed complexes. To experimentally test the DDMI protocol we redesigned the human hyperplastic discs protein to bind to the kinase domain of p21-activated kinase 1 (PAK1). Six designs were experimentally characterized. Two of the designs aggregated and were not characterized further. Of the remaining four designs, three bound to PAK1 with affinities tighter than 350 μM. The tightest binding design, named Spider Roll, bound with an affinity of 100 μM. NMR-based structure prediction of Spider Roll based on backbone and 13Cβ chemical shifts using the program CS-ROSETTA indicated that the architecture of human hyperplastic discs protein is preserved. Mutagenesis studies confirmed that Spider Roll binds the target patch on PAK1. Additionally, Spider Roll binds to full length PAK1 in its activated state, but does not bind PAK1 when it forms an auto-inhibited conformation that blocks the Spider Roll target site. Subsequent NMR characterization of the binding of Spider Roll to PAK1 revealed a comparatively small binding 'on-rate' constant (≪ 10⁵ M⁻¹ s⁻¹).
The ability to rationally design the site of novel protein-protein interactions is an important step towards creating new proteins that are useful as therapeutics or molecular probes.
Computational protein design; protein-protein interactions; protein docking; Rosetta molecular modeling program; NMR; CS-ROSETTA
Some protein design tasks cannot be modeled by the traditional single state design strategy of finding a sequence that is optimal for a single fixed backbone. Such cases require multistate design, where a single sequence is threaded onto multiple backbones (states) and evaluated for its strengths and weaknesses on each backbone. For example, to design a protein that can switch between two specific conformations, it is necessary to find a sequence that is compatible with both backbone conformations. We present in this paper a generic implementation of multistate design that is suited for a wide range of protein design tasks and demonstrate in silico its capabilities at two design tasks: one of redesigning an obligate homodimer into an obligate heterodimer such that the new monomers would not homodimerize, and one of redesigning a promiscuous interface to bind to only a single partner and to no longer bind the rest of its partners. Both tasks contained negative design in that multistate design was asked to find sequences that would produce high energies for several of the states being modeled. Success at negative design was assessed by computationally redocking the undesired protein-pair interactions; we found that multistate design's accuracy improved as the diversity of conformations for the undesired protein-pair interactions increased. The paper concludes with a discussion of the pitfalls of negative design, which has proven considerably more challenging than positive design.
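The multistate idea of scoring one sequence on several backbones, some desired and some undesired, can be sketched as a single fitness value. The function below is a hypothetical illustration, not the implementation described in the paper: the `energy` callable, the state labels, and the choice of "sum of positive-state energies minus the lowest negative-state energy" are all assumptions.

```python
def multistate_fitness(sequence, positive_states, negative_states, energy):
    """Combine per-state energies into one value; lower is better.

    energy -- hypothetical callable (sequence, state) -> float that threads
              the sequence onto a backbone state and returns its energy
    """
    # Positive design: the sequence should have low energy on every desired state.
    positive_total = sum(energy(sequence, state) for state in positive_states)
    # Negative design: even the most favorable undesired state should score badly,
    # so the lowest negative-state energy is subtracted.
    best_negative = min(energy(sequence, state) for state in negative_states)
    return positive_total - best_negative


if __name__ == "__main__":
    # Toy energy table standing in for real threading calculations.
    table = {
        ("AAA", "heterodimer"): -10.0, ("AAA", "homodimer"): -9.0,
        ("KEK", "heterodimer"): -8.0, ("KEK", "homodimer"): 2.0,
    }
    toy_energy = lambda seq, state: table[(seq, state)]
    for seq in ("AAA", "KEK"):
        print(seq, multistate_fitness(seq, ["heterodimer"], ["homodimer"], toy_energy))
```

In the toy table, the second sequence binds slightly less well in the desired state but is strongly disfavored in the undesired one, so it receives the better (lower) fitness.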
Few existing protein-protein interface design methods allow for extensive backbone rearrangements during the design process. There is also a dichotomy between redesign methods, which take advantage of the native interface, and de novo methods, which produce novel binders.
Here, we propose a new method for designing novel protein reagents that combines advantages of redesign and de novo methods and allows for extensive backbone motion. This method requires a bound structure of a target and one of its natural binding partners. A key interaction in this interface, the anchor, is computationally grafted out of the partner and into a surface loop on the design scaffold. The design scaffold's surface is then redesigned with backbone flexibility to create a new binding partner for the target. Careful choice of a scaffold will bring experimentally desirable characteristics into the new complex. The use of an anchor both expedites the design process and ensures that binding proceeds against a known location on the target. The use of surface loops on the scaffold allows for flexible-backbone redesign to properly search conformational space.
Conclusions and Significance
This protocol was implemented within the Rosetta3 software suite. To demonstrate and evaluate this protocol, we have developed a benchmarking set of structures from the PDB with loop-mediated interfaces. This protocol can recover the correct loop-mediated interface in 15 out of 16 tested structures, using only a single residue as an anchor.
The importance of a protein-protein interaction to a signaling pathway can be established by showing that amino acid mutations that weaken the interaction disrupt signaling, and that additional mutations that rescue the interaction recover signaling. Identifying rescue mutations, often referred to as second-site suppressor mutations, controls against scenarios in which the initial deleterious mutation inactivates the protein or disrupts alternative protein-protein interactions. Here, we test a structure-based protocol for identifying second-site suppressor mutations that is based on a strategy previously described by Kortemme and Baker. The molecular modeling software Rosetta is used to scan an interface for point mutations that are predicted to weaken binding but can be rescued by mutations on the partner protein. The protocol typically identifies three types of specificity switches: knob-into-hole redesigns, switching hydrophobic interactions to hydrogen bond interactions, and replacing polar interactions with non-polar interactions. Computational predictions were tested with two separate protein complexes: the G-protein Gαi1 bound to the RGS14 GoLoco motif, and UbcH7 bound to the ubiquitin ligase E6AP. Eight designs were experimentally tested. Swapping a buried hydrophobic residue with a polar residue dramatically weakened binding affinities. In none of these cases were we able to identify compensating mutations that returned binding to wild type affinity, highlighting the challenges inherent in designing buried hydrogen bond networks. The strongest specificity switches were a knob-into-hole design (20-fold) and the replacement of a charge-charge interaction with non-polar interactions (55-fold). In two cases, specificity was further tuned by including mutations distant from the initial design.
Computational Protein Design; Protein-Protein Interactions; Protein Binding Specificity; Rosetta Molecular Modeling Software
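Fold changes in affinity, such as the 20-fold and 55-fold specificity switches reported above, can be related to free energy differences through ΔΔG = RT ln(fold). The short sketch below shows that conversion. The function names are hypothetical, and the temperature of 298.15 K is an assumption, since the abstract does not state the experimental temperature.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_to_ddg(fold_change, temperature=298.15):
    """Free energy difference (kcal/mol) for a given fold change in Kd."""
    return R_KCAL * temperature * math.log(fold_change)


def ddg_to_fold(ddg, temperature=298.15):
    """Fold change in Kd for a given free energy difference (kcal/mol)."""
    return math.exp(ddg / (R_KCAL * temperature))


if __name__ == "__main__":
    for fold in (20.0, 55.0):
        print(f"{fold:.0f}-fold -> {fold_to_ddg(fold):.2f} kcal/mol")
```

Under that temperature assumption, a 20-fold switch corresponds to roughly 1.8 kcal/mol and a 55-fold switch to roughly 2.4 kcal/mol.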
The computer-based design of protein-protein interactions is a rigorous test of our understanding of molecular recognition and an attractive approach for creating novel tools for cell and molecular research. Considerable attention has been placed on redesigning the affinity and specificity of naturally occurring interactions. Several studies have shown that reducing the desolvation costs for binding while preserving shape complementarity and hydrogen bonding is an effective strategy for improving binding affinities. In favorable cases specificity has been designed by focusing only on interactions with the target protein, while in cases with closely related off-target proteins, it has been necessary to explicitly disfavor unwanted binding partners. The rational design of protein-protein interactions from scratch is still an unsolved problem, but recent developments in flexible backbone design and energy functions hold promise for the future.
Rational protein design; computational protein design; de novo protein design; protein-protein interactions
Degradation by the ubiquitin-proteasome system requires assembly of a polyubiquitin chain upon substrate. However, the structural and mechanistic features that enable template-independent processive chain synthesis are unknown. We show that chain assembly by ubiquitin ligase SCF and ubiquitin-conjugating enzyme Cdc34 is facilitated by the unusual nature of Cdc34-SCF transactions: Cdc34 binds SCF with nanomolar affinity, yet the complex is extremely dynamic. These properties are enabled by rapid association driven by electrostatic interactions between the acidic tail of Cdc34 and a basic ‘canyon’ in the Cul1 subunit of SCF. Ab initio docking between Cdc34 and Cul1 predicts intimate contact between the tail and the basic canyon, an arrangement confirmed by cross-linking and kinetic analysis of mutants. Basic canyon residues are conserved in both Cul1 paralogs and orthologs, suggesting that the same mechanism underlies processivity for all cullin-RING ubiquitin ligases. We discuss different strategies by which processive ubiquitin chain synthesis may be achieved.
The precise spatio-temporal dynamics of protein activity are often critical in determining cell behaviour, yet for most proteins they remain poorly understood; it remains difficult to manipulate protein activity at precise times and places within living cells. Protein activity has been controlled by light, through protein derivatization with photocleavable moieties [1] or using photoreactive small molecule ligands [2]. However, this requires use of toxic UV wavelengths, activation is irreversible, and/or cell loading is accomplished via disruption of the cell membrane (i.e. through microinjection). We have developed a new approach to produce genetically-encoded photo-activatable derivatives of Rac1, a key GTPase regulating actin cytoskeletal dynamics [3,4]. Rac1 mutants were fused to the photoreactive LOV (light oxygen voltage) domain from phototropin [5,6], sterically blocking Rac1 interactions until irradiation unwound a helix linking LOV to Rac1. Photoactivatable Rac1 (PA-Rac1) could be reversibly and repeatedly activated using 458 or 473 nm light to generate precisely localized cell protrusions and ruffling. Localized Rac activation or inactivation was sufficient to produce cell motility and control the direction of cell movement. Myosin was involved in Rac control of directionality but not in Rac-induced protrusion, while PAK was required for Rac-induced protrusion. PA-Rac1 was used to elucidate Rac regulation of RhoA in cell motility. Rac and Rho coordinate cytoskeletal behaviours with seconds and submicron precision [7,8]. Their mutual regulation remains controversial [9], with data indicating that Rac inhibits and/or activates Rho [10,11]. Rac was shown to inhibit RhoA in living cells, with inhibition modulated at protrusions and ruffles. A PA-Rac crystal structure and modelling revealed LOV-Rac interactions that will facilitate extension of this photoactivation approach to other proteins.
The de novo design of globular β-sheet proteins remains largely an unsolved problem. It is unclear if most designs are failing because the designed sequences do not have favorable energies in the target conformations or if more emphasis should be placed on negative design, i.e. explicitly identifying sequences that have poor energies when adopting undesired conformations. We tested if we could redesign the sequence of a naturally occurring β-sheet protein, tenascin, with a design algorithm that does not include explicit negative design. Denaturation experiments indicate that the designs are significantly more stable than the wild type protein and the crystal structure of one design closely matches the design model. These results suggest that extensive negative design is not required to create well-folded β-sandwich proteins. However, it is important to note that negative design elements may be encoded in the conformation of the protein backbone which was preserved from the wild type protein.
Computational Protein Design; De Novo Protein Design; β-sheet Design; Negative Design
B. stearothermophilus tryptophanyl-tRNA synthetase catalysis proceeds via high-energy protein conformations. Unliganded MD trajectories of the Pre-Transition-state complex with Mg2+•ATP and the (post) transition-state analog complex with adenosine tetraphosphate relax rapidly in opposite directions, the former regressing, the latter progressing along the structural reaction coordinate. The two crystal structures (RMSD 0.7 Å) therefore lie on opposite sides of a conformational free energy maximum as the chemical transition state forms. SNAPP analysis illustrates the complexity of the associated long-range conformational coupling. Switching interactions in four non-polar core regions are locally isoenergetic throughout the transition. Different configurations, however, propagate their effects to unfavorable, longer-range interactions at the molecular surface. Designed mutation shows that switching interactions enhance the rate, perhaps by destabilizing the ground state immediately before the transition state and limiting non-productive diffusion before and after the chemical transition state, thereby reducing the activation entropy. This paradigm may apply broadly to energy-transducing enzymes.
Molecular Dynamics; Induced fit; domain motion; transition-state stabilization; Delaunay Tessellation; SNAPP analysis; molecular switching; Multi-mutant cycles
The ability to manipulate protein binding affinities is important for the development of proteins as biosensors, industrial reagents, and therapeutics. We have developed a structure-based method to rationally predict single mutations at protein-protein interfaces that enhance binding affinities. The protocol is based on the premise that increasing buried hydrophobic surface area and/or reducing buried hydrophilic surface area will generally lead to enhanced affinity if large steric clashes are not introduced and buried polar groups are not left without a hydrogen bond partner. The procedure selects affinity enhancing point mutations at the protein-protein interface using three criteria: 1) the mutation must be from a polar amino acid to a non-polar amino acid or from a non-polar amino acid to a larger non-polar amino acid, 2) the free energy of binding as calculated with the Rosetta protein modeling program should be more favorable than the free energy of binding calculated for the wild type complex and 3) the mutation should not be predicted to significantly destabilize the monomers. The Rosetta energy function emphasizes short-range interactions: steric repulsion, van der Waals forces, hydrogen bonding, and an implicit solvation model that penalizes placing atoms adjacent to polar groups. The performance of the computational protocol was experimentally tested on two separate protein complexes: Gαi1 from the heterotrimeric G-protein system bound to the RGS14 GoLoco motif, and the E2, UbcH7, bound to the E3, E6AP, from the ubiquitin pathway. Twelve single-site mutations that were predicted to be stabilizing were synthesized and characterized in the laboratory. Nine of the 12 mutations successfully increased binding affinity, with 5 of these increasing binding by over 1.0 kcal/mol. To further assess our approach we searched the literature for point mutations that pass our criteria and have experimentally determined binding affinities. 
Of the 8 mutations identified, 5 were accurately predicted to increase binding affinity, further validating the method as a useful tool to increase protein-protein binding affinities.
Computational Protein Design; Protein-Protein Interactions; Protein Binding Hotspots; Rosetta Molecular Modeling Software; Hydrophobic Effect
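The three selection criteria listed in the abstract above can be read as a simple filter over candidate point mutations. The sketch below is only an illustration under stated assumptions: the non-polar residue set, the use of side-chain heavy-atom counts as a size proxy, the `max_destabilization` tolerance, and the function names are hypothetical, and the energy values are expected to come from an external calculation such as Rosetta.

```python
# Side-chain heavy-atom counts used as a crude size proxy (assumption).
# Residues not listed here are treated as polar for the purpose of this sketch.
NONPOLAR_SIZE = {"A": 1, "V": 3, "L": 4, "I": 4, "M": 4, "F": 7, "W": 10}


def is_allowed_substitution(wild_type, mutant):
    """Criterion 1: polar -> non-polar, or non-polar -> larger non-polar."""
    if mutant not in NONPOLAR_SIZE:
        return False
    if wild_type not in NONPOLAR_SIZE:
        return True
    return NONPOLAR_SIZE[mutant] > NONPOLAR_SIZE[wild_type]


def passes_filters(wild_type, mutant, ddg_bind, ddg_monomer,
                   max_destabilization=1.0):
    """Apply all three criteria to one candidate mutation.

    ddg_bind    -- computed binding energy change vs. wild type (negative = better)
    ddg_monomer -- computed monomer stability change (positive = destabilizing)
    max_destabilization -- hypothetical tolerance in the same energy units
    """
    return (
        is_allowed_substitution(wild_type, mutant)   # criterion 1
        and ddg_bind < 0.0                           # criterion 2
        and ddg_monomer <= max_destabilization       # criterion 3
    )


if __name__ == "__main__":
    print(passes_filters("S", "L", ddg_bind=-0.8, ddg_monomer=0.2))
    print(passes_filters("L", "A", ddg_bind=-0.5, ddg_monomer=0.0))
```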
Amino acid side chains adopt a discrete set of favorable conformations typically referred to as rotamers. The relative energies of rotamers partially determine which side chain conformations are more often observed in protein structures and accurate estimates of these energies are important for predicting protein structure and designing new proteins. Protein modelers typically calculate side chain rotamer energies by using molecular mechanics (MM) potentials or by converting rotamer probabilities from the Protein Data Bank (PDB) into relative free energies. One limitation of the knowledge-based energies is that rotamer preferences observed in the PDB can reflect internal side chain energies as well as longer-range interactions with the rest of the protein. Here, we test an alternative approach for calculating rotamer energies. We use three different quantum mechanics (QM) methods (second order Møller-Plesset (MP2), density functional theory (DFT) energy calculation using the B3LYP functional, and Hartree-Fock) to calculate the energy of amino acid rotamers in a dipeptide model system, and then use these pre-calculated values in side chain placement simulations. Energies were calculated for over 35,000 different conformations of leucine, isoleucine and valine dipeptides with backbone torsion angles from the helical and strand regions of the Ramachandran plot. In a subset of cases these energies differ significantly from those calculated with standard molecular mechanics potentials or those derived from PDB statistics. We find that in these cases the energies from the QM methods result in more accurate placement of amino acid side chains in structure prediction tests.
Computational Protein Design; Rotamers; Torsion Energies; Protein Structure Prediction
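The knowledge-based route mentioned in the abstract above, converting observed rotamer probabilities into relative free energies, is commonly done by Boltzmann inversion, E_i = RT ln(p_max / p_i). The sketch below shows that conversion. The function name, the choice of the most populated rotamer as the zero of energy, and the temperature of 298.15 K are assumptions made for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def rotamer_energies(probabilities, temperature=298.15):
    """Convert rotamer probabilities into energies relative to the most
    populated rotamer (kcal/mol). Unobserved rotamers get infinite energy."""
    p_max = max(probabilities)
    rt = R_KCAL * temperature
    return [
        rt * math.log(p_max / p) if p > 0 else math.inf
        for p in probabilities
    ]


if __name__ == "__main__":
    # Toy probabilities for three rotamers of one side chain.
    for p, e in zip([0.6, 0.3, 0.1], rotamer_energies([0.6, 0.3, 0.1])):
        print(f"p = {p:.1f} -> {e:.2f} kcal/mol")
```

Because the probabilities come from whole protein structures, energies obtained this way fold in the longer-range effects that the abstract identifies as a limitation.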
G-proteins cycle between an inactive GDP-bound state and active GTP-bound state, serving as molecular switches that coordinate cellular signaling. We recently used phage-display to identify a series of peptides that bind Gα subunits in a nucleotide-dependent manner [Johnston, C. A., Willard, F. S., Jezyk, M. R., Fredericks, Z., Bodor, E. T., Jones, M. B., Blaesius, R., Watts, V. J., Harden, T. K., Sondek, J., Ramer, J. K., and Siderovski, D. P. (2005) Structure 13, 1069–1080]. Here we describe the structural features and functions of KB-1753, a peptide that binds selectively to GDP·AlF4−- and GTPγS-bound states of Gαi subunits. KB-1753 blocks interaction of Gαtransducin with its effector, cGMP phosphodiesterase, and inhibits transducin-mediated activation of cGMP degradation. Additionally, KB-1753 interferes with RGS protein binding and resultant GAP activity. A fluorescent KB-1753 variant was found to act as a sensor for activated Gα in vitro. The crystal structure of KB-1753 bound to Gαi1·GDP·AlF4− reveals binding to a conserved hydrophobic groove between switch II and α3 helices, and, along with supporting biochemical data and previous structural analyses, supports the notion that this is the site of effector interactions for Gαi subunits.
The conjugation of ubiquitin to substrates requires a series of enzymatic reactions consisting of an activating enzyme (E1), conjugating enzymes (E2) and ligases (E3). Tagging the appropriate substrate with ubiquitin is achieved by specific E2-E3 and E3-substrate interactions. E6AP, a member of the HECT family of E3s, has been previously shown to bind and function with the E2s UbcH7 and UbcH8. To decipher the sequence determinants of this specificity we have developed a quantitative E2-E3 binding assay based on fluorescence polarization and used this assay to measure the affinity of wild type and mutant E2–E6AP interactions. Alanine scanning of the E6AP–UbcH7 binding interface identified 4 side chains on UbcH7 and 6 side chains on E6AP that contribute more than 1 kcal/mol to the binding free energy. Two of the hot spot residues from UbcH7 (K96 and K100) are conserved in UbcH8 but vary across other E2s. To determine if these are key specificity determining residues, we attempted to induce a tighter association between the E2 UbcH5b and E6AP by mutating the corresponding positions in UbcH5b to lysines. Surprisingly, the mutations had little effect, but rather a mutation at UbcH7 position 4, which is not at a hot spot on the UbcH7–E6AP interface, significantly strengthened UbcH5b's affinity for E6AP. This result indicates that E2-E3 binding specificities are a function of both favorable interactions that promote binding, and unfavorable interactions that prevent binding with unwanted partners.
Ubiquitin; UbcH7; E6AP; HECT; E2-E3 Specificity
The RosettaDesign server identifies low energy amino acid sequences for target protein structures. The client provides the backbone coordinates of the target structure and specifies which residues to design. The server returns to the client the sequences, coordinates and energies of the designed proteins. The simulations are performed using the design module of the Rosetta program (RosettaDesign). RosettaDesign uses Monte Carlo optimization with simulated annealing to search for amino acids that pack well on the target structure and satisfy hydrogen bonding potential. RosettaDesign has been experimentally validated and has been used previously to stabilize naturally occurring proteins and design a novel protein structure.
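Monte Carlo optimization with simulated annealing, as named in the abstract above, can be sketched in a few lines. This is a generic textbook version, not the RosettaDesign implementation: the single-residue move set, the geometric cooling schedule, the temperatures, and the toy energy function are all assumptions, and a real design run would score sequences with a physical energy function and rotamer sampling.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def anneal_sequence(start, energy, n_steps=2000, t_start=2.0, t_end=0.1, seed=0):
    """Search sequence space with Metropolis moves while lowering the temperature.

    energy -- hypothetical callable taking a list of residues and returning a float
    Returns the lowest-energy sequence seen and its energy.
    """
    rng = random.Random(seed)
    current = list(start)
    e_current = energy(current)
    best, e_best = current[:], e_current
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(n_steps - 1, 1))
        pos = rng.randrange(len(current))
        old = current[pos]
        current[pos] = rng.choice(AMINO_ACIDS)  # propose a point substitution
        e_new = energy(current)
        delta = e_new - e_current
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            e_current = e_new
            if e_current < e_best:
                best, e_best = current[:], e_current
        else:
            current[pos] = old  # reject and restore the previous residue
    return "".join(best), e_best


if __name__ == "__main__":
    target = "ACDEF"  # toy stand-in for a physical energy landscape

    def toy_energy(residues):
        return float(sum(a != b for a, b in zip(residues, target)))

    print(anneal_sequence("GGGGG", toy_energy))
```

Accepting some uphill moves at high temperature lets the search escape local minima, while the falling temperature makes the final steps behave like a greedy descent.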