Interactions between polar atoms are challenging to model because at very short ranges they form hydrogen bonds (H-bonds) that are partially covalent in character and exhibit strong orientation preferences; at longer ranges the orientation preferences are lost, but significant electrostatic interactions between charged and partially charged atoms remain. To simultaneously model these two types of behavior, we refined an orientation-dependent model of hydrogen bonds [Kortemme et al. 2003] used by the molecular modeling program Rosetta and then combined it with a distance-dependent Coulomb model of electrostatics. The functional form of the H-bond potential is physically motivated and parameters are fit so that H-bond geometries that Rosetta generates closely resemble H-bond geometries in high-resolution crystal structures. The combined potentials improve performance in a variety of scientific benchmarks including decoy discrimination, side chain prediction, and native sequence recovery in protein design simulations, and establish a new standard energy function for Rosetta.
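As a rough illustration of the electrostatic half of such a combined potential, the following is a minimal sketch of a Coulomb term with a distance-dependent dielectric. The abstract does not give the functional form, so the linear dielectric (slope 4), the clamping distance `r_min`, the cutoff `r_max`, and the shift to zero at the cutoff are all assumptions chosen for illustration, not the parameters used in Rosetta.

```python
# Coulomb constant in kcal*Angstrom/(mol*e^2), for charges in units of e
# and distances in Angstroms.
COULOMB_KCAL = 332.0637


def coulomb_energy(q1, q2, r, dielectric_slope=4.0, r_min=1.5, r_max=5.5):
    """Pairwise Coulomb energy with dielectric eps(r) = dielectric_slope * r.

    All default parameters are hypothetical illustration values.
    The energy is held constant below r_min (to avoid a singularity),
    shifted so that it reaches zero at r_max, and is zero beyond r_max.
    """
    if r >= r_max:
        return 0.0

    def raw(d):
        # q1*q2 / (eps(d) * d) with eps(d) = dielectric_slope * d
        return COULOMB_KCAL * q1 * q2 / (dielectric_slope * d * d)

    r_eff = max(r, r_min)
    return raw(r_eff) - raw(r_max)


if __name__ == "__main__":
    for dist in (1.0, 2.0, 3.0, 4.0, 5.0, 6.0):
        print(dist, round(coulomb_energy(0.5, -0.5, dist), 4))
```

Opposite partial charges give a negative (favorable) energy that weakens with distance and vanishes at the cutoff; an orientation-dependent H-bond term would be added separately at short range.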
We have recently completed a full re-architecturing of the Rosetta molecular modeling program, generalizing and expanding its existing functionality. The new architecture enables the rapid prototyping of novel protocols by providing easy-to-use interfaces to powerful tools for molecular modeling. The source code of this re-architecturing has been released as Rosetta3 and is freely available for academic use. At the time of its release, it contained 470,000 lines of code. Counting currently unpublished protocols at the time of this writing, the source includes 1,285,000 lines. Its rapid growth is a testament to its ease of use. This document describes the requirements for our new architecture, justifies the design decisions, sketches out central classes, and highlights a few of the common tasks that the new software can perform.
Natural protein assemblies have many sophisticated architectures and functions, creating nanoscale storage containers, motors and pumps1–3. Inspired by these systems, protein monomers have been engineered to self-assemble into supramolecular architectures4 including symmetrical5,6, metal-templated7,8 and cage-like structures8–10. The complexity of protein machines, however, has made it difficult to create assemblies with both defined structures and controllable functions. Here we report protein assemblies that have been engineered to function as light-controlled nanocontainers. We show that an adenosine-5′-triphosphate (ATP)-driven group II chaperonin11,12, which resembles a barrel with a built-in lid, can be reprogrammed to open and close on illumination with different frequencies of light. By engineering photoswitchable azobenzene-based molecules into the structure, light-triggered changes in interatomic distances in the azobenzene moiety are able to drive large-scale conformational changes of the protein assembly. The different states of the assembly can be visualized with single particle cryo-electron microscopy, and the nanocages can be used to capture and release non-native cargos. Similar strategies switching atomic distances with light could be used to build other controllable nanoscale machines.
The germline antibody repertoire is limited in size, yet it has to recognize a far larger number of potential antigens. The ability of a single antibody to bind multiple ligands due to conformational flexibility in the antigen-binding site can significantly enlarge the repertoire. Among the six hyper-variable complementarity determining regions (CDRs) that comprise the binding site, the CDR H3 loop is particularly flexible. Computational protein design studies showed that predicted low energy sequences compatible with a given backbone structure often have considerable similarity to the corresponding native sequences of naturally occurring proteins, indicating that native protein sequences are close to optimal for their structures. Here, we take a step forward to determine whether conformational flexibility, believed to play a key functional role in germline antibodies, is also central in shaping their native sequence. In particular, we use a multi-constraint computational design strategy, along with the Rosetta energy function, to propose that the native sequences of CDR H3 loops from germline antibodies are nearly optimal for conformational flexibility. Moreover, we find that antibody maturation may lead to sequences with a higher degree of optimization for a single conformation, while disfavoring sequences that are intrinsically flexible. In addition, this computational strategy allows us to predict mutations in the CDR H3 loop to stabilize the antigen-bound conformation, a computational mimic of affinity maturation, that may increase antigen binding affinity by pre-organizing the antigen binding loop. In vivo affinity maturation data are consistent with our predictions. The method described here can be useful to design antibodies with higher selectivity and affinity by reducing conformational diversity.
antibody flexibility; computational structural biology; computational design; multi-constraint design; affinity maturation
Amino acid covariation, where the identities of amino acids at different sequence positions are correlated, is a hallmark of naturally occurring proteins. This covariation can arise from multiple factors, including selective pressures for maintaining protein structure, requirements imposed by a specific function, or from phylogenetic sampling bias. Here we employed flexible backbone computational protein design to quantify the extent to which protein structure has constrained amino acid covariation for 40 diverse protein domains. We find significant similarities between the amino acid covariation in alignments of natural protein sequences and sequences optimized for their structures by computational protein design methods. These results indicate that the structural constraints imposed by protein architecture play a dominant role in shaping amino acid covariation and that computational protein design methods can capture these effects. We also find that the similarity between natural and designed covariation is sensitive to the magnitude and mechanism of backbone flexibility used in computational protein design. Our results thus highlight the necessity of including backbone flexibility to correctly model precise details of correlated amino acid changes and give insights into the pressures underlying these correlations.
Proteins generally fold into specific three-dimensional structures to perform their cellular functions, and the presence of misfolded proteins is often deleterious for cellular and organismal fitness. For these reasons, maintenance of protein structure is thought to be one of the major fitness pressures acting on proteins. Consequently, the sequences of today's naturally occurring proteins contain signatures reflecting the constraints imposed by protein structure. Here we test the ability of computational protein design methods to recapitulate and explain these signatures. We focus on the physical basis of evolutionary pressures that act on interactions between amino acids in folded proteins, which are critical in determining protein structure and function. Such pressures can be observed from the appearance of amino acid covariation, where the amino acids at certain positions in protein sequences are correlated with each other. We find similar patterns of amino acid covariation in natural sequences and sequences optimized for their structures using computational protein design, demonstrating the importance of structural constraints in protein molecular evolution and providing insights into the structural mechanisms leading to covariation. In addition, these results characterize the ability of computational methods to model the precise details of correlated amino acid changes, which is critical for engineering new proteins with useful functions beyond those seen in nature.
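To make the notion of amino acid covariation concrete, here is a minimal sketch that scores how strongly two alignment columns are correlated using mutual information. The abstracts above do not say which covariation statistic was used, so mutual information is an assumption made for illustration, and the toy alignments are invented examples rather than data from the study.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length sequences (strings).
    A value of 0 means the two columns vary independently; larger values
    mean the amino acid at one position helps predict the other.
    """
    n = len(alignment)
    if n == 0:
        raise ValueError("alignment is empty")
    counts_i = Counter(seq[i] for seq in alignment)
    counts_j = Counter(seq[j] for seq in alignment)
    counts_ij = Counter((seq[i], seq[j]) for seq in alignment)

    mi = 0.0
    for (a, b), n_ab in counts_ij.items():
        p_ab = n_ab / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


if __name__ == "__main__":
    coupled = ["AK", "AK", "EL", "EL"]      # columns change together
    independent = ["AK", "AL", "EK", "EL"]  # columns change independently
    print(mutual_information(coupled, 0, 1))
    print(mutual_information(independent, 0, 1))
```

Computing such a score for every column pair in a natural alignment and in an alignment of designed sequences gives two covariation matrices that can then be compared, which is the kind of comparison the abstracts describe.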
Computational protein design attempts to create protein sequences that fold stably into pre-specified structures. Here we compare alignments of designed proteins to alignments of natural proteins and assess how closely designed sequences recapitulate patterns of sequence variation found in natural protein sequences. We design proteins using RosettaDesign, and we evaluate both fixed-backbone designs and variable-backbone designs with different amounts of backbone flexibility. We find that proteins designed with a fixed backbone tend to underestimate the amount of site variability observed in natural proteins while proteins designed with an intermediate amount of backbone flexibility result in more realistic site variability. Further, the correlation between solvent exposure and site variability in designed proteins is lower than that in natural proteins. This finding suggests that site variability is too uniform across different solvent exposure states (i.e., buried residues are too variable or exposed residues too conserved). When comparing the amino acid frequencies in the designed proteins with those in natural proteins we find that in the designed proteins hydrophobic residues are underrepresented in the core. From these results we conclude that intermediate backbone flexibility during design results in more accurate protein design and that either scoring functions or backbone sampling methods require further improvement to accurately replicate structural constraints on site variability.
Protein design; Fixed-backbone design; Flexible-backbone design; Sequence alignments; Relative solvent accessibility; Site variability
There is a growing interest in engineering proteins whose function can be controlled with the spatial and temporal precision of light. Here, we present a novel example of a functional light-triggered switch in the Ca2+-dependent cell–cell adhesion protein E-cadherin, created using a mechanism-based design strategy. We report an 18-fold change in apparent Ca2+ binding affinity upon illumination. Our results include a detailed examination of functional switching via linked changes in Ca2+ binding and cadherin dimerization. This design opens avenues toward controllable tools that could be applied to many long-standing questions about cadherin’s biological function in cell–cell adhesion and downstream signaling.
Sampling alternative conformations is key to understanding how proteins work and engineering them for new functions. However, accurately characterizing and modeling protein conformational ensembles remains experimentally and computationally challenging. These challenges must be met before protein conformational heterogeneity can be exploited in protein engineering and design. Here, as a stepping stone, we describe methods to detect alternative conformations in proteins and strategies to model these near-native conformational changes based on backrub-type Monte Carlo moves in Rosetta. We illustrate how Rosetta simulations that apply backrub moves improve modeling of point mutant side chain conformations, native side chain conformational heterogeneity, functional conformational changes, tolerated sequence space, protein interaction specificity, and amino acid co-variation across protein-protein interfaces. We include relevant Rosetta command lines and RosettaScripts to encourage the application of these types of simulations to other systems. Our work highlights that critical scoring and sampling improvements will be necessary to approximate conformational landscapes. Challenges for the future development of these methods include modeling conformational changes that propagate away from designed mutation sites and modulating backbone flexibility to predictively design functionally important conformational heterogeneity.
Protein design; protein dynamics; conformational heterogeneity; conformational sampling; alternative conformations; Rosetta; Ringer; Backrub
Accurate energy functions are critical to macromolecular modeling and design. We describe new tools for identifying inaccuracies in energy functions and guiding their improvement, and illustrate the application of these tools to improvement of the Rosetta energy function. The feature analysis tool identifies discrepancies between structures deposited in the PDB and low energy structures generated by Rosetta; these likely arise from inaccuracies in the energy function. The optE tool optimizes the weights on the different components of the energy function by maximizing the recapitulation of a wide range of experimental observations. We use the tools to examine three proposed modifications to the Rosetta energy function: improving the unfolded state energy model (reference energies), using bicubic spline interpolation to generate knowledge-based torsional potentials, and incorporating the recently developed Dunbrack 2010 rotamer library (Shapovalov and Dunbrack, 2011).
Rosetta; energy function; scientific benchmarking; parameter estimation; decoy discrimination
The Rosetta molecular modeling software package provides experimentally tested and rapidly evolving tools for the 3D structure prediction and high-resolution design of proteins, nucleic acids, and a growing number of non-natural polymers. Despite its free availability to academic users and improving documentation, use of Rosetta has largely remained confined to developers and their immediate collaborators due to the code’s difficulty of use, the requirement for large computational resources, and the unavailability of servers for most of the Rosetta applications. Here, we present a unified web framework for Rosetta applications called ROSIE (Rosetta Online Server that Includes Everyone). ROSIE provides (a) a common user interface for Rosetta protocols, (b) a stable application programming interface for developers to add additional protocols, (c) a flexible back-end to allow leveraging of computer cluster resources shared by RosettaCommons member institutions, and (d) centralized administration by the RosettaCommons to ensure continuous maintenance. This paper describes the ROSIE server infrastructure, a step-by-step ‘serverification’ protocol for use by Rosetta developers, and the deployment of the first nine ROSIE applications by six separate developer teams: Docking, RNA de novo, ERRASER, Antibody, Sequence Tolerance, Supercharge, Beta peptide design, NCBB design, and VIP redesign. As illustrated by the number and diversity of these applications, ROSIE offers a general and speedy paradigm for serverification of Rosetta applications that incurs negligible cost to developers and lowers barriers to Rosetta use for the broader biological community. ROSIE is available at http://rosie.rosettacommons.org.
To accurately predict protein conformations in atomic detail, a computational method must be capable of sampling models sufficiently close to the native structure. All-atom sampling is difficult because of the vast number of possible conformations and extremely rugged energy landscapes. Here, we test three sampling strategies to address these difficulties: conformational diversification, intensification of torsion and omega-angle sampling and parameter annealing. We evaluate these strategies in the context of the robotics-based kinematic closure (KIC) method for local conformational sampling in Rosetta on an established benchmark set of 45 12-residue protein segments without regular secondary structure. We quantify performance as the fraction of sub-Angstrom models generated. While improvements with individual strategies are only modest, the combination of intensification and annealing strategies into a new “next-generation KIC” method yields a four-fold increase over standard KIC in the median percentage of sub-Angstrom models across the dataset. Such improvements enable progress on more difficult problems, as demonstrated on longer segments, several of which could not be accurately remodeled with previous methods. Given its improved sampling capability, next-generation KIC should allow advances in other applications such as local conformational remodeling of multiple segments simultaneously, flexible backbone sequence design, and development of more accurate energy functions.
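The performance measure described above (fraction of sub-Angstrom models per segment, summarized by the median across the benchmark) can be sketched as follows. The function names, the dictionary layout, and the example RMSD values are hypothetical, and the 1.0 Å cutoff is simply the literal reading of "sub-Angstrom".

```python
from statistics import median


def percent_sub_angstrom(rmsds, cutoff=1.0):
    """Percentage of models whose RMSD to the native segment is below cutoff."""
    if not rmsds:
        raise ValueError("no models supplied")
    hits = sum(1 for r in rmsds if r < cutoff)
    return 100.0 * hits / len(rmsds)


def median_percent_sub_angstrom(rmsds_by_segment, cutoff=1.0):
    """Median over segments of the per-segment sub-Angstrom percentage."""
    return median(
        percent_sub_angstrom(rmsds, cutoff) for rmsds in rmsds_by_segment.values()
    )


if __name__ == "__main__":
    # Invented RMSD values (Angstroms) for three hypothetical segments.
    benchmark = {
        "segment_a": [0.5, 1.5, 2.0, 0.9],
        "segment_b": [2.0, 3.0],
        "segment_c": [0.2, 0.3],
    }
    print(median_percent_sub_angstrom(benchmark))
```

Using the median rather than the mean keeps a few very easy or very hard segments from dominating the summary, so a fold-change between two sampling methods reflects the typical segment.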
The considerable flexibility of side-chains in folded proteins is important for protein stability and function, and may play a role in mediating pathways of energetic connectivity between allosteric sites. While sampling side-chain degrees of freedom has been an integral part of several successful computational protein design methods, the predictions of these approaches have not been directly compared to experimental measurements of side-chain motional amplitudes. In addition, protein design methods generally keep the backbone fixed, an approximation that may substantially limit the ability to accurately model side-chain flexibility. Here we describe a Monte Carlo approach to modeling side-chain conformational variability and validate our method against a large dataset of methyl relaxation order parameters derived from Nuclear Magnetic Resonance experiments (17 proteins and a total of 530 data points). We also evaluate a model of backbone flexibility based on Backrub motions, a type of conformational change frequently observed in ultra-high resolution X-ray structures that accounts for correlated side-chain backbone movements. The fixed-backbone model performs reasonably well with an overall rmsd between computed and experimental side-chain order parameters of 0.26. Notably, including backbone flexibility leads to significant improvements in modeling side-chain order parameters for 10 of the 17 proteins in the set. Higher accuracy of the flexible backbone model results from both increases and decreases in side-chain flexibility relative to the fixed-backbone model. This simple flexible-backbone model should be useful for a variety of protein design applications, including improved modeling of protein-protein interactions, design of proteins with desired flexibility or rigidity, and prediction of energetic pathways within proteins.
protein dynamics; side-chain dynamics; NMR order parameters; protein design; flexible backbone
Human immunodeficiency virus (HIV) has a small genome and therefore relies heavily on the host cellular machinery to replicate. Identifying which host proteins and complexes come into physical contact with the viral proteins is crucial for a comprehensive understanding of how HIV rewires the host’s cellular machinery during the course of infection. Here we report the use of affinity tagging and purification mass spectrometry1-3 to determine systematically the physical interactions of all 18 HIV-1 proteins and polyproteins with host proteins in two different human cell lines (HEK293 and Jurkat). Using a quantitative scoring system that we call MiST, we identified with high confidence 497 HIV–human protein–protein interactions involving 435 individual human proteins, with ~40% of the interactions being identified in both cell types. We found that the host proteins hijacked by HIV, especially those found interacting in both cell types, are highly conserved across primates. We uncovered a number of host complexes targeted by viral proteins, including the finding that HIV protease cleaves eIF3d, a subunit of eukaryotic translation initiation factor 3. This host protein is one of eleven identified in this analysis that act to inhibit HIV replication. This data set facilitates a more comprehensive and detailed understanding of how the host machinery is manipulated during the course of HIV infection.
Predicting which mutations proteins tolerate while maintaining their structure and function has important applications for modeling fundamental properties of proteins and their evolution; it also drives progress in protein design. Here we develop a computational model to predict the tolerated sequence space of HIV-1 protease reachable by single mutations. We assess the model by comparison to the observed variability in more than 50,000 HIV-1 protease sequences, one of the most comprehensive datasets on tolerated sequence space. We then extend the model to a second protein, reverse transcriptase. The model integrates multiple structural and functional constraints acting on a protein and uses ensembles of protein conformations. We find the model correctly captures a considerable fraction of protease and reverse-transcriptase mutational tolerance and shows comparable accuracy using either experimentally determined or computationally generated structural ensembles. Predictions of tolerated sequence space afforded by the model provide insights into stability-function tradeoffs in the emergence of resistance mutations and into strengths and limitations of the computational model.
Many related protein sequences can be consistent with the structure and function of a given protein, suggesting that proteins may be quite robust to mutations. This tolerance to mutations is frequently exploited by pathogens. In particular, pathogens can rapidly evolve mutated proteins that have a new function - resistance against a therapeutic inhibitor - without abandoning other functions essential for the pathogen. This principle may also hold more generally: Proteins tolerant to mutational changes can more easily acquire new functions while maintaining their existing properties. The ability to predict the tolerance of proteins to mutation could thus help both to analyze the emergence of resistance mutations in pathogens and to engineer proteins with new functions. Here we develop a computational model to predict protein mutational tolerance towards point mutations accessible by single nucleotide changes, and validate it using two important pathogenic proteins and therapeutic targets: the protease and reverse transcriptase from HIV-1. The model provides insights into how resistance emerges and makes testable predictions on mutations that have not been seen yet. Similar models of mutational tolerance should be useful for characterizing and reengineering the functions of other proteins for which a three-dimensional structure is available.
Calcium/calmodulin-dependent kinase II (CaMKII) forms a highly conserved dodecameric assembly that is sensitive to the frequency of calcium pulse trains. Neither the structure of the dodecameric assembly nor how it regulates CaMKII is known. We present the crystal structure of an autoinhibited full-length human CaMKII holoenzyme, revealing an unexpected compact arrangement of kinase domains docked against a central hub, with the calmodulin binding sites completely inaccessible. We show that this compact docking is important for the autoinhibition of the kinase domains and for setting the calcium response of the holoenzyme. Comparison of CaMKII isoforms, which differ in the length of the linker between the kinase domain and the hub, demonstrates that these interactions can be strengthened or weakened by changes in linker length. This equilibrium between autoinhibited states provides a simple mechanism for tuning the calcium response without changes in either the hub or the kinase domains.
Many applications require cells to switch between discrete phenotypic states. Here, we harness the FimBE inversion switch to flip a promoter, allowing expression to be toggled between two genes oriented in opposite directions. The response characteristics of the switch are measured using two-color cytometry. This switch is used to toggle between orthogonal chemosensory pathways by controlling the expression of CheW and CheW*, which interact with the Tar (aspartate) and Tsr* (serine) chemoreceptors, respectively. CheW* and Tsr* each contain a mutation at their protein-protein interface such that they interact with each other. The complete genetic program containing an arabinose-inducible FimE controlling CheW/CheW* (and constitutively expressed tar/tsr*) is transformed into an E. coli strain lacking all native chemoreceptors. This program enables bacteria to swim towards serine or aspartate in the absence or presence of arabinose, respectively. Thus, the program functions as a multiplexer with arabinose as the selector. This demonstrates the ability of synthetic genetic circuits to connect to a natural signaling network to switch between phenotypes.
genetic memory; recombinase; stochastic switching; synthetic biology; systems biology
Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.
G protein–coupled receptors rely on the PDZ domain of SNX27 for endosomal recycling.
Postsynaptic density 95/discs large/zonula occludens-1 (PDZ) domain–interacting motifs, in addition to their well-established roles in protein scaffolding at the cell surface, are proposed to act as cis-acting determinants directing the molecular sorting of transmembrane cargo from endosomes to the plasma membrane. This hypothesis requires the existence of a specific trans-acting PDZ protein that mediates the proposed sorting operation in the endosome membrane. Here, we show that sorting nexin 27 (SNX27) is required for efficient PDZ-directed recycling of the β2-adrenoreceptor (β2AR) from early endosomes. SNX27 mediates this sorting function when expressed at endogenous levels, and its recycling activity requires both PDZ domain–dependent recognition of the β2AR cytoplasmic tail and Phox homology (PX) domain–dependent association with the endosome membrane. These results identify a discrete role of SNX27 in PDZ-directed recycling of a physiologically important signaling receptor, and extend the concept of cargo-specific molecular sorting in the recycling pathway.
The RosettaBackrub server (http://kortemmelab.ucsf.edu/backrub) implements the Backrub method, derived from observations of alternative conformations in high-resolution protein crystal structures, for flexible backbone protein modeling. Backrub modeling is applied to three related applications using the Rosetta program for structure prediction and design: (I) modeling of structures of point mutations, (II) generating protein conformational ensembles and designing sequences consistent with these conformations and (III) predicting tolerated sequences at protein–protein interfaces. The three protocols have been validated on experimental data. Starting from a user-provided single input protein structure in PDB format, the server generates near-native conformational ensembles. The predicted conformations and sequences can be used for different applications, such as to guide mutagenesis experiments, for ensemble-docking approaches or to generate sequence libraries for protein design.
We describe the proceedings and conclusions from a “Workshop on Applications of Protein Models in Biomedical Research” that was held at University of California at San Francisco on 11 and 12 July, 2008. At the workshop, international scientists involved with structure modeling explored (i) how models are currently used in biomedical research, (ii) what the requirements and challenges for different applications are, and (iii) how the interaction between the computational and experimental research communities could be strengthened to advance the field.
Incorporation of effective backbone sampling into protein simulation and design is an important step in increasing the accuracy of computational protein modeling. Recent analysis of high-resolution crystal structures has suggested a new model, termed backrub, to describe localized, hinge-like alternative backbone and side chain conformations observed in the crystal lattice. The model involves internal backbone rotations about axes between Cα atoms. Based on this observation, we have implemented a backrub-inspired sampling method in the Rosetta structure prediction and design program. We evaluate this model of backbone flexibility using three different tests. First, we show that Rosetta backrub simulations recapitulate the correlation between backbone and side-chain conformations in the high-resolution crystal structures upon which the model was based. As a second test of backrub sampling, we show that backbone flexibility improves the accuracy of predicting point-mutant side chain conformations over fixed backbone rotameric sampling alone. Finally, we show that backrub sampling of triosephosphate isomerase loop 6 can capture the ms/µs oscillation between the open and closed states observed in solution. Our results suggest that backrub sampling captures a sizable fraction of localized conformational changes that occur in natural proteins. Application of this simple model of backbone motions may significantly improve both protein design and atomistic simulations of localized protein flexibility.
flexible backbone sampling; backrub motion; point mutation; Monte Carlo; triosephosphate isomerase loop 6
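The core geometric operation in the backrub model described above, a rigid rotation of a stretch of atoms about an axis defined by two Cα atoms, can be sketched with Rodrigues' rotation formula. This is only an illustration of that single rotation: the function names and coordinates are hypothetical, and any further adjustments or Monte Carlo acceptance steps performed by the Rosetta implementation are omitted.

```python
import math


def rotate_about_axis(point, axis_start, axis_end, angle_deg):
    """Rotate a 3D point by angle_deg about the line axis_start -> axis_end."""
    axis = [e - s for s, e in zip(axis_start, axis_end)]
    norm = math.sqrt(sum(c * c for c in axis))
    if norm == 0.0:
        raise ValueError("axis endpoints coincide")
    k = [c / norm for c in axis]                      # unit rotation axis
    v = [p - s for p, s in zip(point, axis_start)]    # point relative to axis

    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    k_dot_v = sum(a * b for a, b in zip(k, v))

    # Rodrigues' formula: v*cos + (k x v)*sin + k*(k.v)*(1 - cos)
    rotated = [
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
        for i in range(3)
    ]
    return tuple(r + s for r, s in zip(rotated, axis_start))


def backrub_rotation(atoms, ca_start, ca_end, angle_deg):
    """Apply the same rotation to every atom between the two pivot C-alphas."""
    return [rotate_about_axis(a, ca_start, ca_end, angle_deg) for a in atoms]


if __name__ == "__main__":
    ca_start, ca_end = (0.0, 0.0, 0.0), (0.0, 0.0, 3.8)
    segment = [(1.0, 0.0, 1.0), (1.2, 0.5, 2.0)]      # invented coordinates
    print(backrub_rotation(segment, ca_start, ca_end, 10.0))
```

Because the pivot Cα atoms lie on the rotation axis, they stay fixed while the atoms between them swing together, which is what makes the move local and hinge-like.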
The Ras-specific nucleotide exchange factor Son of sevenless (Sos) is inactive without Ras bound to a distal allosteric site. In contrast, the catalytic domain of Ras Guanine Nucleotide Releasing Factor 1 (RasGRF1) is active intrinsically. By substituting residues from RasGRF1 into Sos, we have generated mutants of Sos with basal activity, partially relieved of their dependence on allosteric activation. We have performed molecular dynamics simulations showing how Ras binding to the allosteric site leads to a bias toward the active conformation of Sos. The trajectories show that Sos fluctuates between active and inactive conformations in the absence of Ras and that the activating mutations favor conformations of Sos that are more permissive to Ras binding at the catalytic site. In contrast, unliganded RasGRF1 fluctuates primarily among active conformations. Our results support the premise that the catalytic domain of Sos has evolved an allosteric activation mechanism that extends beyond the simple process of membrane recruitment.
Conformational ensembles are increasingly recognized as a useful representation to describe fundamental relationships between protein structure, dynamics and function. Here we present an ensemble of ubiquitin in solution that is created by sampling conformational space without experimental information using “Backrub” motions inspired by alternative conformations observed in sub-Angstrom resolution crystal structures. Backrub-generated structures are then selected to produce an ensemble that optimizes agreement with nuclear magnetic resonance (NMR) Residual Dipolar Couplings (RDCs). Using this ensemble, we probe two proposed relationships between properties of protein ensembles: (i) a link between native-state dynamics and the conformational heterogeneity observed in crystal structures, and (ii) a relation between dynamics of an individual protein and the conformational variability explored by its natural family. We show that the Backrub motional mechanism can simultaneously explore protein native-state dynamics measured by RDCs, encompass the conformational variability present in ubiquitin complex structures and facilitate sampling of conformational and sequence variability matching those occurring in the ubiquitin protein family. Our results thus support an overall relation between protein dynamics and conformational changes enabling sequence changes in evolution. More practically, the presented method can be applied to improve protein design predictions by accounting for intrinsic native-state dynamics.
Knowledge of protein properties is essential for enhancing the understanding and engineering of biological functions. One key property of proteins is their flexibility—their intrinsic ability to adopt different conformations. This flexibility can be measured experimentally but the measurements are indirect and computational models are required to interpret them. Here we develop a new computational method for interpreting these measurements of flexibility and use it to create a model of flexibility of the protein ubiquitin. We apply our results to show relationships between the flexibility of one protein and the diversity of structures and amino acid sequences of the protein's evolutionary family. Thus, our results show that more accurate computational modeling of protein flexibility is useful for improving prediction of a broader range of amino acid sequences compatible with a given protein. Our method will be helpful for advancing methods to rationally engineer protein functions by enabling sampling of conformational and sequence diversity similar to that of a protein's evolutionary family.
The ‘balance hypothesis’ predicts that non-stoichiometric variations in concentrations of proteins participating in complexes should be deleterious. As a corollary, heterozygous deletions and overexpression of protein complex members should have measurable fitness effects. However, genome-wide studies of heterozygous deletions and overexpression in Saccharomyces cerevisiae have been unable to unambiguously relate complex membership to dosage sensitivity. We test the hypothesis that it is not complex membership alone but rather the topology of interactions within a complex that is a predictor of dosage sensitivity. We develop a model that uses the law of mass action to consider how complex formation might be affected by varying protein concentrations given a protein's topological positioning within the complex. Although we find little evidence for combinatorial inhibition of complex formation playing a major role in overexpression phenotypes, consistent with previous results, we show significant correlations between predicted sensitivity of complex formation to protein concentrations and both heterozygous deletion fitness and protein abundance noise levels. Our model suggests a mechanism for dosage sensitivity and provides testable predictions for the effect of alterations in protein abundance noise.
balance hypothesis; dosage sensitivity; heterozygous deletion; protein abundance noise; protein interaction networks
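As a toy version of the mass-action reasoning in the abstract above, the sketch below solves the equilibrium of the simplest possible complex, a heterodimer A + B ⇌ AB, and shows how the amount of complex responds when one subunit's concentration is halved (as in a heterozygous deletion). The model in the paper considers interaction topology within larger complexes; this two-subunit case, the function names, and the concentrations and dissociation constant are illustrative assumptions only.

```python
import math


def dimer_concentration(a_total, b_total, kd):
    """Equilibrium concentration of AB for A + B <-> AB.

    Solves Kd = ([A_total] - [AB]) * ([B_total] - [AB]) / [AB]
    and returns the physically meaningful (smaller) root.
    All quantities must be in the same concentration units.
    """
    s = a_total + b_total + kd
    return (s - math.sqrt(s * s - 4.0 * a_total * b_total)) / 2.0


def fold_change_on_halving_a(a_total, b_total, kd):
    """Ratio of complex formed with half the normal amount of A to the normal amount."""
    normal = dimer_concentration(a_total, b_total, kd)
    halved = dimer_concentration(a_total / 2.0, b_total, kd)
    return halved / normal


if __name__ == "__main__":
    # Invented example values: equal subunit levels, tight binding.
    print(dimer_concentration(1.0, 1.0, 0.01))
    print(fold_change_on_halving_a(1.0, 1.0, 0.01))
    # If B is limiting, halving A has a much smaller effect on the complex.
    print(fold_change_on_halving_a(10.0, 1.0, 0.01))
```

Even in this minimal case, sensitivity to dosage depends on which partner is limiting; extending the same equilibrium equations to multi-subunit complexes is where the position of a protein within the interaction topology begins to matter.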