The considerable flexibility of side-chains in folded proteins is important for protein stability and function, and may play a role in mediating pathways of energetic connectivity between allosteric sites. While sampling side-chain degrees of freedom has been an integral part of several successful computational protein design methods, the predictions of these approaches have not been directly compared to experimental measurements of side-chain motional amplitudes. In addition, protein design methods generally keep the backbone fixed, an approximation that may substantially limit the ability to accurately model side-chain flexibility. Here we describe a Monte Carlo approach to modeling side-chain conformational variability and validate our method against a large dataset of methyl relaxation order parameters derived from Nuclear Magnetic Resonance experiments (17 proteins and a total of 530 data points). We also evaluate a model of backbone flexibility based on Backrub motions, a type of conformational change frequently observed in ultra-high resolution X-ray structures that accounts for correlated side-chain and backbone movements. The fixed-backbone model performs reasonably well with an overall rmsd between computed and experimental side-chain order parameters of 0.26. Notably, including backbone flexibility leads to significant improvements in modeling side-chain order parameters for 10 of the 17 proteins in the set. Higher accuracy of the flexible backbone model results from both increases and decreases in side-chain flexibility relative to the fixed-backbone model. This simple flexible-backbone model should be useful for a variety of protein design applications, including improved modeling of protein-protein interactions, design of proteins with desired flexibility or rigidity, and prediction of energetic pathways within proteins.
protein dynamics; side-chain dynamics; NMR order parameters; protein design; flexible backbone
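As a rough illustration of the comparison described in the abstract above, the sketch below computes an order parameter for one bond (or methyl symmetry axis) vector from an ensemble of sampled orientations and then an rmsd against measured values. It assumes the widely used long-time-limit expression S² = (3/2) Σᵢⱼ ⟨μᵢμⱼ⟩² − 1/2 over unit vectors, equally weighted samples, and a common reference frame; the function names are hypothetical and this is not the authors' implementation.

```python
import math


def order_parameter(vectors):
    """Estimate S^2 for one bond vector from sampled orientations.

    Assumes equally weighted samples that share one reference frame.
    """
    units = []
    for x, y, z in vectors:
        norm = math.sqrt(x * x + y * y + z * z)
        units.append((x / norm, y / norm, z / norm))
    count = len(units)
    total = 0.0
    # Sum of squared ensemble averages over all nine products mu_i * mu_j.
    for i in range(3):
        for j in range(3):
            mean_ij = sum(u[i] * u[j] for u in units) / count
            total += mean_ij ** 2
    return 1.5 * total - 0.5


def rmsd(computed, measured):
    """Root-mean-square deviation between two equal-length lists of values."""
    squared = [(c - m) ** 2 for c, m in zip(computed, measured)]
    return math.sqrt(sum(squared) / len(squared))
```

A completely rigid vector gives S² = 1, while orientations spread evenly over the three coordinate axes give S² = 0, which brackets the range of values compared against experiment.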
Human immunodeficiency virus (HIV) has a small genome and therefore relies heavily on the host cellular machinery to replicate. Identifying which host proteins and complexes come into physical contact with the viral proteins is crucial for a comprehensive understanding of how HIV rewires the host’s cellular machinery during the course of infection. Here we report the use of affinity tagging and purification mass spectrometry (refs. 1-3) to determine systematically the physical interactions of all 18 HIV-1 proteins and polyproteins with host proteins in two different human cell lines (HEK293 and Jurkat). Using a quantitative scoring system that we call MiST, we identified with high confidence 497 HIV–human protein–protein interactions involving 435 individual human proteins, with ~40% of the interactions being identified in both cell types. We found that the host proteins hijacked by HIV, especially those found interacting in both cell types, are highly conserved across primates. We uncovered a number of host complexes targeted by viral proteins, including the finding that HIV protease cleaves eIF3d, a subunit of eukaryotic translation initiation factor 3. This host protein is one of eleven identified in this analysis that act to inhibit HIV replication. This data set facilitates a more comprehensive and detailed understanding of how the host machinery is manipulated during the course of HIV infection.
Predicting which mutations proteins tolerate while maintaining their structure and function has important applications for modeling fundamental properties of proteins and their evolution; it also drives progress in protein design. Here we develop a computational model to predict the tolerated sequence space of HIV-1 protease reachable by single mutations. We assess the model by comparison to the observed variability in more than 50,000 HIV-1 protease sequences, one of the most comprehensive datasets on tolerated sequence space. We then extend the model to a second protein, reverse transcriptase. The model integrates multiple structural and functional constraints acting on a protein and uses ensembles of protein conformations. We find the model correctly captures a considerable fraction of protease and reverse-transcriptase mutational tolerance and shows comparable accuracy using either experimentally determined or computationally generated structural ensembles. Predictions of tolerated sequence space afforded by the model provide insights into stability-function tradeoffs in the emergence of resistance mutations and into strengths and limitations of the computational model.
Many related protein sequences can be consistent with the structure and function of a given protein, suggesting that proteins may be quite robust to mutations. This tolerance to mutations is frequently exploited by pathogens. In particular, pathogens can rapidly evolve mutated proteins that have a new function - resistance against a therapeutic inhibitor - without abandoning other functions essential for the pathogen. This principle may also hold more generally: Proteins tolerant to mutational changes can more easily acquire new functions while maintaining their existing properties. The ability to predict the tolerance of proteins to mutation could thus help both to analyze the emergence of resistance mutations in pathogens and to engineer proteins with new functions. Here we develop a computational model to predict protein mutational tolerance towards point mutations accessible by single nucleotide changes, and validate it using two important pathogenic proteins and therapeutic targets: the protease and reverse transcriptase from HIV-1. The model provides insights into how resistance emerges and makes testable predictions on mutations that have not been seen yet. Similar models of mutational tolerance should be useful for characterizing and reengineering the functions of other proteins for which a three-dimensional structure is available.
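To make "point mutations accessible by single nucleotide changes" concrete, the following sketch enumerates the amino acids reachable from one codon by changing a single base, using the standard genetic code. The function name and the choice to exclude stop codons and synonymous changes are assumptions made for illustration; the model described above may define its accessible set differently.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order with the third base varying fastest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_nucleotide_neighbors(codon):
    """Return amino acids reachable from `codon` by exactly one base change.

    Stop codons and synonymous substitutions are left out (an assumption).
    """
    codon = codon.upper()
    wild_type = CODON_TABLE[codon]
    reachable = set()
    for position in range(3):
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            amino_acid = CODON_TABLE[mutant]
            if amino_acid != "*" and amino_acid != wild_type:
                reachable.add(amino_acid)
    return reachable
```

For example, `single_nucleotide_neighbors("ATG")` returns the six residues reachable from a methionine codon, which illustrates how strongly this restriction narrows the 19 possible substitutions at a position.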
Calcium/calmodulin-dependent kinase II (CaMKII) forms a highly conserved dodecameric assembly that is sensitive to the frequency of calcium pulse trains. Neither the structure of the dodecameric assembly nor how it regulates CaMKII is known. We present the crystal structure of an autoinhibited full-length human CaMKII holoenzyme, revealing an unexpected compact arrangement of kinase domains docked against a central hub, with the calmodulin binding sites completely inaccessible. We show that this compact docking is important for the autoinhibition of the kinase domains and for setting the calcium response of the holoenzyme. Comparison of CaMKII isoforms, which differ in the length of the linker between the kinase domain and the hub, demonstrates that these interactions can be strengthened or weakened by changes in linker length. This equilibrium between autoinhibited states provides a simple mechanism for tuning the calcium response without changes in either the hub or the kinase domains.
Many applications require cells to switch between discrete phenotypic states. Here, we harness the FimBE inversion switch to flip a promoter, allowing expression to be toggled between two genes oriented in opposite directions. The response of the switch is measured using two-color cytometry. This switch is used to toggle between orthogonal chemosensory pathways by controlling the expression of CheW and CheW*, which interact with the Tar (aspartate) and Tsr* (serine) chemoreceptors, respectively. CheW* and Tsr* each contain a mutation at their protein-protein interface such that they interact with each other. The complete genetic program containing an arabinose-inducible FimE controlling CheW/CheW* (and constitutively expressed tar/tsr*) is transformed into an E. coli strain lacking all native chemoreceptors. This program enables bacteria to swim towards serine or aspartate in the absence or presence of arabinose, respectively. Thus, the program functions as a multiplexer with arabinose as the selector. This demonstrates the ability of synthetic genetic circuits to connect to a natural signaling network to switch between phenotypes.
genetic memory; recombinase; stochastic switching; synthetic biology; systems biology
Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.
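The merging step described above, in which results from each backbone are combined into a single estimate, could be sketched as pooling the low-energy sequences from every ensemble member and computing per-position amino-acid frequencies. This is a minimal sketch under stated assumptions: equal weighting of all sequences is assumed here, the function name is hypothetical, and the actual Rosetta protocol may weight or filter sequences differently.

```python
from collections import Counter


def merge_tolerated_sequences(sequences_per_backbone):
    """Pool sequences from all backbones into per-position frequencies.

    `sequences_per_backbone` is a list with one list of equal-length
    sequence strings per backbone structure. Every sequence counts equally.
    """
    pooled = [seq for seqs in sequences_per_backbone for seq in seqs]
    if not pooled:
        raise ValueError("no sequences to merge")
    length = len(pooled[0])
    if any(len(seq) != length for seq in pooled):
        raise ValueError("all sequences must have the same length")
    profile = []
    for position in range(length):
        counts = Counter(seq[position] for seq in pooled)
        profile.append({aa: n / len(pooled) for aa, n in counts.items()})
    return profile
```

The result is one frequency dictionary per designed position, which is a convenient form for comparing predicted tolerated sequence space against experimental profiles.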
G protein–coupled receptors rely on the PDZ domain of SNX27 for endosomal recycling.
Postsynaptic density 95/discs large/zonula occludens-1 (PDZ) domain–interacting motifs, in addition to their well-established roles in protein scaffolding at the cell surface, are proposed to act as cis-acting determinants directing the molecular sorting of transmembrane cargo from endosomes to the plasma membrane. This hypothesis requires the existence of a specific trans-acting PDZ protein that mediates the proposed sorting operation in the endosome membrane. Here, we show that sorting nexin 27 (SNX27) is required for efficient PDZ-directed recycling of the β2-adrenoreceptor (β2AR) from early endosomes. SNX27 mediates this sorting function when expressed at endogenous levels, and its recycling activity requires both PDZ domain–dependent recognition of the β2AR cytoplasmic tail and Phox homology (PX) domain–dependent association with the endosome membrane. These results identify a discrete role of SNX27 in PDZ-directed recycling of a physiologically important signaling receptor, and extend the concept of cargo-specific molecular sorting in the recycling pathway.
The RosettaBackrub server (http://kortemmelab.ucsf.edu/backrub) implements the Backrub method, derived from observations of alternative conformations in high-resolution protein crystal structures, for flexible backbone protein modeling. Backrub modeling is applied to three related applications using the Rosetta program for structure prediction and design: (I) modeling of structures of point mutations, (II) generating protein conformational ensembles and designing sequences consistent with these conformations and (III) predicting tolerated sequences at protein–protein interfaces. The three protocols have been validated on experimental data. Starting from a user-provided single input protein structure in PDB format, the server generates near-native conformational ensembles. The predicted conformations and sequences can be used for different applications, such as to guide mutagenesis experiments, for ensemble-docking approaches or to generate sequence libraries for protein design.
High-resolution structures of antibody-antigen complexes are useful for analyzing the binding interface and for making rational choices in antibody engineering. When a crystallographic structure of a complex is unavailable, the structure must be predicted using computational tools. In this work, we illustrate a novel approach, named SnugDock, to predict high-resolution antibody-antigen complex structures by simultaneously optimizing the antibody-antigen rigid-body positions, the relative orientation of the antibody light and heavy chains, and the conformations of the six complementarity determining region loops. This approach is especially useful when the crystal structure of the antibody is not available, requiring allowances for inaccuracies in an antibody homology model, which would otherwise frustrate rigid-backbone docking predictions. Local docking using SnugDock with the lowest-energy RosettaAntibody homology model produced more accurate predictions than standard rigid-body docking. SnugDock can be combined with ensemble docking to mimic conformer selection and induced fit, resulting in increased sampling of diverse antibody conformations. The combined algorithm produced four medium (Critical Assessment of PRediction of Interactions-CAPRI rating) and seven acceptable lowest-interface-energy predictions in a test set of fifteen complexes. Structural analysis shows that diverse paratope conformations are sampled, but docked paratope backbones are not necessarily closer to the crystal structure conformations than the starting homology models. The accuracy of SnugDock predictions suggests a new genre of general docking algorithms with flexible binding interfaces targeted towards making homology models useful for further high-resolution predictions.
Antibodies are proteins that are key elements of the immune system and increasingly used as drugs. Antibodies bind tightly and specifically to antigens to block their activity or to mark them for destruction. Three-dimensional structures of the antibody-antigen complexes are useful for understanding their mechanism and for designing improved antibody drugs. Experimental determination of structures is laborious and not always possible, so we have developed tools to predict structures of antibody-antigen complexes computationally. Computer-predicted models of antibodies, or homology models, typically have errors which can frustrate algorithms for prediction of protein-protein interfaces (docking), and result in incorrect predictions. Here, we have created and tested a new docking algorithm which incorporates flexibility to overcome structural errors in the antibody structural model. The algorithm allows both intramolecular and interfacial flexibility in the antibody during docking, resulting in improved accuracy approaching that when using experimentally determined antibody structures. Structural analysis of the predicted binding region of the complex will enable the protein engineer to make rational choices for better antibody drug designs.
We describe the proceedings and conclusions from a “Workshop on Applications of Protein Models in Biomedical Research” that was held at the University of California at San Francisco on 11 and 12 July 2008. At the workshop, international scientists involved with structure modeling explored (i) how models are currently used in biomedical research, (ii) what the requirements and challenges for different applications are, and (iii) how the interaction between the computational and experimental research communities could be strengthened to advance the field.
Incorporation of effective backbone sampling into protein simulation and design is an important step in increasing the accuracy of computational protein modeling. Recent analysis of high-resolution crystal structures has suggested a new model, termed backrub, to describe localized, hinge-like alternative backbone and side chain conformations observed in the crystal lattice. The model involves internal backbone rotations about axes between Cα atoms. Based on this observation, we have implemented a backrub-inspired sampling method in the Rosetta structure prediction and design program. We evaluate this model of backbone flexibility using three different tests. First, we show that Rosetta backrub simulations recapitulate the correlation between backbone and side-chain conformations in the high-resolution crystal structures upon which the model was based. As a second test of backrub sampling, we show that backbone flexibility improves the accuracy of predicting point-mutant side chain conformations over fixed backbone rotameric sampling alone. Finally, we show that backrub sampling of triosephosphate isomerase loop 6 can capture the ms/µs oscillation between the open and closed states observed in solution. Our results suggest that backrub sampling captures a sizable fraction of localized conformational changes that occur in natural proteins. Application of this simple model of backbone motions may significantly improve both protein design and atomistic simulations of localized protein flexibility.
flexible backbone sampling; backrub motion; point mutation; Monte Carlo; triosephosphate isomerase loop 6
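The core geometric operation named in the abstract above, an internal rotation about an axis between two Cα atoms, can be sketched with Rodrigues' rotation formula. This is only the rigid rotation of one atom about the pivot axis; the function name is hypothetical, and any further adjustments an actual backrub implementation makes to keep local geometry reasonable are omitted.

```python
import math


def rotate_about_axis(point, axis_start, axis_end, angle_deg):
    """Rotate `point` by `angle_deg` about the line through two pivot atoms.

    `axis_start` and `axis_end` stand in for the coordinates of the two
    pivot C-alpha atoms; all arguments are (x, y, z) tuples.
    """
    # Unit vector k along the rotation axis.
    k = [e - s for s, e in zip(axis_start, axis_end)]
    norm = math.sqrt(sum(c * c for c in k))
    k = [c / norm for c in k]
    # Position of the atom relative to a point on the axis.
    v = [p - s for p, s in zip(point, axis_start)]
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cross = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    dot = sum(a * b for a, b in zip(k, v))
    # Rodrigues' formula: v cos(t) + (k x v) sin(t) + k (k . v)(1 - cos(t)).
    rotated = [
        v[i] * cos_t + cross[i] * sin_t + k[i] * dot * (1.0 - cos_t)
        for i in range(3)
    ]
    return tuple(r + s for r, s in zip(rotated, axis_start))
```

Applying the same rotation to every atom between the two pivots moves that backbone segment as a rigid body while the pivot atoms themselves stay fixed, since points on the axis are unchanged by the rotation.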
The Ras-specific nucleotide exchange factor Son of sevenless (Sos) is inactive without Ras bound to a distal allosteric site. In contrast, the catalytic domain of Ras Guanine Nucleotide Releasing Factor 1 (RasGRF1) is active intrinsically. By substituting residues from RasGRF1 into Sos, we have generated mutants of Sos with basal activity, partially relieved of their dependence on allosteric activation. We have performed molecular dynamics simulations showing how Ras binding to the allosteric site leads to a bias toward the active conformation of Sos. The trajectories show that Sos fluctuates between active and inactive conformations in the absence of Ras and that the activating mutations favor conformations of Sos that are more permissive to Ras binding at the catalytic site. In contrast, unliganded RasGRF1 fluctuates primarily among active conformations. Our results support the premise that the catalytic domain of Sos has evolved an allosteric activation mechanism that extends beyond the simple process of membrane recruitment.
Conformational ensembles are increasingly recognized as a useful representation to describe fundamental relationships between protein structure, dynamics and function. Here we present an ensemble of ubiquitin in solution that is created by sampling conformational space without experimental information using “Backrub” motions inspired by alternative conformations observed in sub-Angstrom resolution crystal structures. Backrub-generated structures are then selected to produce an ensemble that optimizes agreement with nuclear magnetic resonance (NMR) Residual Dipolar Couplings (RDCs). Using this ensemble, we probe two proposed relationships between properties of protein ensembles: (i) a link between native-state dynamics and the conformational heterogeneity observed in crystal structures, and (ii) a relation between dynamics of an individual protein and the conformational variability explored by its natural family. We show that the Backrub motional mechanism can simultaneously explore protein native-state dynamics measured by RDCs, encompass the conformational variability present in ubiquitin complex structures and facilitate sampling of conformational and sequence variability matching those occurring in the ubiquitin protein family. Our results thus support an overall relation between protein dynamics and conformational changes enabling sequence changes in evolution. More practically, the presented method can be applied to improve protein design predictions by accounting for intrinsic native-state dynamics.
Knowledge of protein properties is essential for enhancing the understanding and engineering of biological functions. One key property of proteins is their flexibility—their intrinsic ability to adopt different conformations. This flexibility can be measured experimentally but the measurements are indirect and computational models are required to interpret them. Here we develop a new computational method for interpreting these measurements of flexibility and use it to create a model of flexibility of the protein ubiquitin. We apply our results to show relationships between the flexibility of one protein and the diversity of structures and amino acid sequences of the protein's evolutionary family. Thus, our results show that more accurate computational modeling of protein flexibility is useful for improving prediction of a broader range of amino acid sequences compatible with a given protein. Our method will be helpful for advancing methods to rationally engineer protein functions by enabling sampling of conformational and sequence diversity similar to that of a protein's evolutionary family.
The ‘balance hypothesis’ predicts that non-stoichiometric variations in concentrations of proteins participating in complexes should be deleterious. As a corollary, heterozygous deletions and overexpression of protein complex members should have measurable fitness effects. However, genome-wide studies of heterozygous deletions and overexpression in Saccharomyces cerevisiae have been unable to unambiguously relate complex membership to dosage sensitivity. We test the hypothesis that it is not complex membership alone but rather the topology of interactions within a complex that is a predictor of dosage sensitivity. We develop a model that uses the law of mass action to consider how complex formation might be affected by varying protein concentrations given a protein's topological positioning within the complex. Although we find little evidence for combinatorial inhibition of complex formation playing a major role in overexpression phenotypes, consistent with previous results, we show significant correlations between predicted sensitivity of complex formation to protein concentrations and both heterozygous deletion fitness and protein abundance noise levels. Our model suggests a mechanism for dosage sensitivity and provides testable predictions for the effect of alterations in protein abundance noise.
balance hypothesis; dosage sensitivity; heterozygous deletion; protein abundance noise; protein interaction networks
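The simplest instance of the mass-action reasoning described above is a two-protein complex A + B ⇌ AB, for which the equilibrium complex concentration has a closed form. The sketch below is that minimal case only, assuming a single dissociation constant and hypothetical names; it does not reproduce the topology-dependent model of the study.

```python
import math


def dimer_concentration(total_a, total_b, kd):
    """Equilibrium concentration of AB for A + B <-> AB.

    Solves [AB]^2 - (A_t + B_t + Kd)[AB] + A_t * B_t = 0 and returns the
    physically meaningful (smaller) root. All values share one unit.
    """
    s = total_a + total_b + kd
    return (s - math.sqrt(s * s - 4.0 * total_a * total_b)) / 2.0


def fraction_retained_after_halving(total_a, total_b, kd):
    """Fraction of complex remaining when the total amount of A is halved."""
    before = dimer_concentration(total_a, total_b, kd)
    after = dimer_concentration(total_a / 2.0, total_b, kd)
    return after / before
```

Halving one component, as in a heterozygous deletion, reduces the complex by an amount that depends on the dissociation constant and the partner's abundance, which is the kind of concentration sensitivity the model relates to fitness.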
We report crystal structures of a negatively-selected TCR that recognizes two I-Au-restricted myelin basic protein peptides and one of its pMHC ligands. Unusual CDR structural features revealed by our analyses identify a previously unrecognized mechanism by which the highly variable CDR3 regions define ligand specificity. In addition to the pMHC contact residues contributed by CDR3, CDR3 residues buried deep within the Vα/Vβ interface exert indirect effects on recognition by influencing the Vα/Vβ interdomain angle. This phenomenon represents an additional mechanism for increasing the potential diversity of the TCR repertoire. Both the direct and indirect effects exerted by CDR residues can impact global TCR/MHC docking. Analysis of the available TCR structures in light of these results highlights the significance of Vα/Vβ interdomain angle in determining specificity and indicates that TCR/pMHC interface features do not distinguish autoimmune from non-autoimmune class II-restricted TCRs.
Genome-wide studies in Saccharomyces cerevisiae concluded that the dominant determinant of protein evolutionary rates is expression level, where highly-expressed proteins evolve most slowly. To determine how this constraint affects the evolution of protein interactions, we directly measure evolutionary rates of protein interface, surface and core residues by structurally mapping domain interactions to yeast genomes. We find that mRNA level and protein abundance, though correlated, report on pressures affecting regions of proteins differently. Pressures proportional to mRNA level slow evolutionary rates of all structural regions and reduce the variability in rate differences between interfaces and other surfaces. In contrast, the evolutionary rate variation within a domain is less dependent on protein abundance. Distinct pressures may be associated primarily with the cost (mRNA level) and functional benefit (protein abundance) of protein production. Interfaces of proteins with low mRNA levels may have higher evolutionary flexibility, and could constitute the raw material for new functions.
The interaction between integrin lymphocyte function-associated antigen-1 (LFA-1) and its ligand intercellular adhesion molecule-1 (ICAM-1) is critical in immunological and inflammatory reactions but, like other adhesive interactions, is of low affinity. Here, multiple rational design methods were used to engineer ICAM-1 mutants with enhanced affinity for LFA-1. Five amino acid substitutions 1) enhance the hydrophobicity and packing of residues surrounding Glu-34 of ICAM-1, which coordinates to a Mg²⁺ in the LFA-1 I domain, and 2) alter associations at the edges of the binding interface. The affinity of the most improved ICAM-1 mutant for intermediate- and high-affinity LFA-1 I domains was increased by 19-fold and 22-fold, respectively, relative to wild type. Moreover, potency was similarly enhanced for inhibition of LFA-1-dependent ligand binding and cell adhesion. Thus, rational design can be used to engineer novel adhesion molecules with high monomeric affinity; furthermore, the ICAM-1 mutant holds promise for targeting LFA-1-ICAM-1 interaction for biological studies and therapeutic purposes.
Interactions in protein networks may place constraints on protein interface sequences to maintain correct and avoid unwanted interactions. Here we describe a “multi-constraint” protein design protocol to predict sequences optimized for multiple criteria, such as maintaining sets of interactions, and apply it to characterize the mechanism and extent to which 20 multi-specific proteins are constrained by binding to multiple partners. We find that multi-specific binding is accommodated by at least two distinct patterns. In the simplest case, all partners share key interactions, and sequences optimized for binding to either single or multiple partners recover only a subset of native amino acid residues as optimal. More interestingly, for signaling interfaces functioning as network “hubs,” we identify a different, “multi-faceted” mode, where each binding partner prefers its own subset of wild-type residues within the promiscuous binding site. Here, integration of preferences across all partners results in sequences much more “native-like” than seen in optimization for any single binding partner alone, suggesting these interfaces are substantially optimized for multi-specificity. The two strategies make distinct predictions for interface evolution and design. Shared interfaces may be better small molecule targets, whereas multi-faceted interactions may be more “designable” for altered specificity patterns. The computational methodology presented here is generalizable for examining how naturally occurring protein sequences have been selected to satisfy a variety of positive and negative constraints, as well as for rationally designing proteins to have desired patterns of altered specificity.
Computational methods have recently led to remarkable successes in the design of molecules with novel functions. These approaches offer great promise for creating highly selective molecules to accurately control biological processes. However, to reach these goals modeling procedures are needed that are able to define the optimal “fitness” of a protein to function correctly within complex biological networks and in the context of many possible interaction partners. To make progress toward these goals, we describe a computational design procedure that predicts protein sequences optimized to bind not only to a single protein but also to a set of target interaction partners. Application of the method to characterize “hub” proteins in cellular interaction networks gives insights into the mechanisms nature has used to tune protein surfaces to recognize multiple correct partner proteins. Our study also provides a starting point to engineer designer molecules that could modulate or replace naturally occurring protein interaction networks to combat misregulation in disease or to build new sets of protein interactions for synthetic biology.
RNA-binding proteins play many essential roles in the regulation of gene expression in the cell. Despite the significant increase in the number of structures for RNA–protein complexes in the last few years, the molecular basis of specificity remains unclear even for the best-studied protein families. We have developed a distance and orientation-dependent hydrogen-bonding potential based on the statistical analysis of hydrogen-bonding geometries that are observed in high-resolution crystal structures of protein–DNA and protein–RNA complexes. We observe very strong geometrical preferences that reflect significant energetic constraints on the relative placement of hydrogen-bonding atom pairs at protein–nucleic acid interfaces. A scoring function based on the hydrogen-bonding potential discriminates native protein–RNA structures from incorrectly docked decoys with remarkable predictive power. By incorporating the new hydrogen-bonding potential into a physical model of protein–RNA interfaces with full atom representation, we were able to recover native amino acids at protein–RNA interfaces.
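One common way to turn observed geometry statistics into a potential is the inverse-Boltzmann relation, E = −ln(P_observed / P_reference), evaluated per histogram bin. The sketch below shows that general technique for a one-dimensional histogram; the abstract above does not state the exact functional form used, so the reference distribution, the pseudocount, and the function name are all assumptions made for illustration.

```python
import math


def statistical_potential(observed_counts, reference_counts, pseudocount=1.0):
    """Per-bin energies (in units of kT) from observed and reference counts.

    Bins observed more often than expected from the reference receive
    negative (favorable) energies. The pseudocount guards against empty bins.
    """
    n_bins = len(observed_counts)
    observed_total = sum(observed_counts) + pseudocount * n_bins
    reference_total = sum(reference_counts) + pseudocount * n_bins
    energies = []
    for observed, reference in zip(observed_counts, reference_counts):
        p_observed = (observed + pseudocount) / observed_total
        p_reference = (reference + pseudocount) / reference_total
        energies.append(-math.log(p_observed / p_reference))
    return energies
```

A distance- and orientation-dependent potential would apply the same idea to a multidimensional histogram, with one bin per combination of distance and angle ranges.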