A method involving electron paramagnetic resonance spectroscopy of a site-selectively spin-labeled peripheral membrane protein in the presence and absence of membranes and of a water-soluble spin relaxant (chromium oxalate) has been developed to determine how bee venom phospholipase A2 sits on the membrane. Theory based on the Poisson-Boltzmann equation shows that the rate of spin relaxation of a protein-bound nitroxide by a membrane-impermeant spin relaxant depends on the distance (up to tens of angstroms) from the spin probe to the membrane. The measurements define the interfacial binding surface of this secreted phospholipase A2.
SkyLine, a high-throughput homology modeling pipeline tool, detects and models true sequence homologs to a given protein structure. Structures and models are stored in SkyBase with links to computational function annotation, as calculated by MarkUs. The SkyLine/SkyBase/MarkUs technology represents a novel structure-based approach that is more objective and versatile than other protein classification resources. This structure-centric strategy provides a multidimensional organization and coverage of protein space at the levels of family, function, and genome. The concept of “modelability”, the ability to model sequences on related structures, provides a reliable criterion for membership in a protein family (“leverage”) and underlies the unique success of this approach. The overall procedure is illustrated by its application to START domains, which comprise a Biomedical Theme for the Northeast Structural Genomics Consortium (NESG) as part of the Protein Structure Initiative (PSI). START domains are typically involved in the non-vesicular transport of lipids. While 19 experimentally determined structures are available, the family, whose evolutionary hierarchy is not well determined, is highly sequence diverse, and the ligand-binding potential of many family members is unknown. The SkyLine/SkyBase/MarkUs approach provides significant insights and predicts: 1) many more family members (~4,000) than any other resource; 2) the function for a large number of unannotated proteins; 3) instances of START domains in genomes from which they were thought to be absent; and 4) the existence of two types of novel proteins, those containing dual START domains and those containing N-terminal START domains.
Keywords: Homology modeling; Structural genomics; Bioinformatics; Protein function annotation; START domain; Arabidopsis thaliana
We describe the proceedings and conclusions from a “Workshop on Applications of Protein Models in Biomedical Research” that was held at the University of California at San Francisco on 11 and 12 July 2008. At the workshop, international scientists involved with structure modeling explored (i) how models are currently used in biomedical research, (ii) what the requirements and challenges for different applications are, and (iii) how the interaction between the computational and experimental research communities could be strengthened to advance the field.
The assembly of most retroviruses occurs at the plasma membrane. Membrane association is directed by MA, the N-terminal domain of the Gag structural protein. For human immunodeficiency virus type 1 (HIV-1), this association is mediated in part by a myristate fatty acid modification. Conflicting evidence has been presented on the relative importance of myristoylation, of ionic interactions between protein and membrane, and of Gag multimerization in membrane association in vivo. We addressed these questions biochemically by determining the affinity of purified myristoylated HIV-1 MA for liposomes of defined composition, both for monomeric and for dimeric forms of the protein. Myristoylation increases the barely detectable intrinsic affinity of the apo-protein for liposomes by only 10-fold, and the resulting affinity is still weak, similar to that of the naturally nonmyristoylated MA of Rous sarcoma virus. Membrane binding of HIV-1 MA is absolutely dependent on the presence of negatively charged lipid and is abrogated at high ionic strength. Forced dimerization of MA increases its membrane affinity by several orders of magnitude. When green fluorescent protein fusions of monomeric or dimeric MA are expressed in cells, the dimeric but not the monomeric protein becomes strongly membrane associated. Computational modeling supports these results and suggests a molecular mechanism for the modest effect of myristoylation on binding, wherein the membrane provides a hydrophobic environment for the myristate that is energetically similar to that provided by the protein. Overall, the results imply that the driving force for membrane association stems largely from ionic interactions between multimerized Gag and negatively charged phospholipids.
It has been a long-standing goal in systems biology to find relations between the topological properties and functional features of protein networks. However, most of the focus in network studies has been on highly connected proteins (“hubs”). As a complementary notion, it is possible to define bottlenecks as proteins with a high betweenness centrality (i.e., network nodes that have many “shortest paths” going through them, analogous to major bridges and tunnels on a highway map). Bottlenecks are, in fact, key connector proteins with surprising functional and dynamic properties. In particular, they are more likely to be essential proteins. In fact, in regulatory and other directed networks, betweenness (i.e., “bottleneck-ness”) is a much more significant indicator of essentiality than degree (i.e., “hub-ness”). Furthermore, bottlenecks correspond to the dynamic components of the interaction network—they are significantly less well coexpressed with their neighbors than nonbottlenecks, implying that expression dynamics is wired into the network topology.
A network is a graph consisting of a number of nodes with edges connecting them. Recently, network models have been widely applied to biological systems. Here, we are mainly interested in two types of biological networks: the interaction network, where nodes are proteins and edges connect interacting partners; and the regulatory network, where nodes are proteins and edges connect transcription factors and their targets. Betweenness is one of the most important topological properties of a network. It measures the number of shortest paths going through a certain node. Therefore, nodes with the highest betweenness control most of the information flow in the network, representing the critical points of the network. We thus call these nodes the “bottlenecks” of the network. Here, we focus on bottlenecks in protein networks. We find that, in the regulatory network, where there is a clear concept of information flow, protein bottlenecks indeed have a much higher tendency to be essential genes. In this type of network, betweenness is a good predictor of essentiality. Biological researchers can therefore use the betweenness as one more feature to choose potential targets for detailed analysis.
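The betweenness measure described above can be illustrated with a minimal sketch. It assumes an unweighted network stored as an adjacency dictionary and uses Brandes' algorithm, a standard way to count shortest paths; the function name, the `directed` flag, and the six-node toy network are illustrative assumptions and are not taken from the study.

```python
from collections import deque


def betweenness(adj, directed=False):
    """Shortest-path betweenness for an unweighted graph (Brandes' algorithm).

    adj maps each node to an iterable of its neighbors; every node must
    appear as a key. For undirected graphs each pair is visited from both
    ends, so the totals are halved.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in order of increasing distance
        preds = {v: [] for v in adj}     # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                     # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}    # dependency of s on each node
        for w in reversed(order):        # accumulate from the farthest nodes
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    if not directed:
        bc = {v: x / 2.0 for v, x in bc.items()}
    return bc


# Hypothetical toy network: two triangles joined by a single c-d edge.
toy = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
scores = betweenness(toy)
bottlenecks = sorted(scores, key=scores.get, reverse=True)[:2]
print(bottlenecks, scores)
```

In this toy network the two bridge nodes carry every shortest path between the triangles, so they rank highest by betweenness even though their degree is only slightly larger than that of the other nodes.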
Formation of multiprotein complexes on cellular membranes is critically dependent on the cyclic activation of small GTPases. FRAP-based analyses demonstrate that within protein complexes, some small GTPases cycle nearly three orders of magnitude faster than they would spontaneously cycle in vitro. At the same time, experiments report a concomitant excess of the activated, GTP-bound form of GTPases over their inactive form. Intuitively, high activity and rapid turnover are contradictory requirements. How the cells manage to maximize both remains poorly understood. Here, using GTPases of the Rab and Rho families as a prototype, we introduce a computational model of the GTPase cycle. We quantitatively investigate several plausible layouts of the cycling control module that consist of GEFs, GAPs, and GTPase effectors. We explain the existing experimental data and predict how the cycling of GTPases is controlled by the regulatory proteins in vivo. Our model explains distinct and separable roles that the activating GEFs and deactivating GAPs play in the GTPase cycling control. While the activity of GTPase is mainly defined by GEF, the turnover rate is a sole function of GAP. Maximization of the GTPase activity and turnover rate places conflicting requirements on the concentration of GAP. Therefore, to achieve a high activity and turnover rate at once, cells must carefully maintain concentrations of GEFs and GAPs within the optimal range. The values of these optimal concentrations indicate that efficient cycling can be achieved only within dense protein complexes typically assembled on the membrane surfaces. We show that the concentration requirement for GEF can be dramatically reduced by a GEF-activating GTPase effector that can also significantly boost the cycling efficiency. Interestingly, we find that the cycling regimes are only weakly dependent on the concentration of GTPase itself.
A large variety of cellular processes, such as the formation of filopodia or transport vesicles, require that large protein complexes are precisely positioned on intracellular membranes to execute a specific task and then are promptly disassembled to perform their function elsewhere. Small GTPases play a major role in the spatiotemporal control of these complexes. Their function is based on the unique property of cycling between the active GTP-bound state, in which they enable complex formation, and the inactive GDP-bound state, which promotes complex dissolution. Recent experiments based on fluorescence recovery after photobleaching have found that some small GTPases rapidly cycle within protein complexes, causing continuous release and recruitment of the complex components. The seemingly futile cycling is accompanied by a large excess of the active form. This puzzling behavior challenges one's intuition and calls for the application of quantitative methods. Here, Goryachev and Pokhilko use computational modeling to identify regulatory mechanisms that could enable GTPases to cycle with the experimentally observed frequency and efficiency. They show that to achieve high activity and turnover simultaneously, the concentrations of the regulatory molecules that control GTPase cycling should be tightly maintained within the optimal range.
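The tension between high activity and fast turnover can be seen even in a deliberately simplified two-state scheme. The sketch below is not the authors' model: it assumes a single GTPase pool switching between inactive and active states with effective first-order rates `k_gef` (activation) and `k_gap` (deactivation), both hypothetical parameters in arbitrary units.

```python
def two_state_cycle(k_gef, k_gap):
    """Steady state of a toy two-state GTPase cycle (assumed first-order rates).

    dA/dt = k_gef * (1 - A) - k_gap * A, where A is the active fraction.
    Returns (active fraction, mean lifetime of the active state, cycling flux).
    """
    active_fraction = k_gef / (k_gef + k_gap)
    active_lifetime = 1.0 / k_gap             # depends on the GAP rate only
    cycle_flux = k_gef * k_gap / (k_gef + k_gap)
    return active_fraction, active_lifetime, cycle_flux


# Raising the GAP rate shortens the active-state lifetime (faster turnover)
# but lowers the active fraction unless the GEF rate is raised as well.
for k_gap in (0.1, 1.0, 10.0):
    frac, life, flux = two_state_cycle(k_gef=9.0, k_gap=k_gap)
    print(f"k_gap={k_gap:5.1f}  active={frac:.3f}  lifetime={life:.2f}  flux={flux:.3f}")
```

In this toy scheme the lifetime of the active state is set by the deactivation rate alone, while a high active fraction requires activation to outpace deactivation, which is one way to read the conflicting requirements on GAP described above.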
The protein–protein interaction networks, or interactome networks, have been shown to have dynamic modular structures, yet the functional connections between and among the modules are less well understood. Here, using a new pipeline to integrate the interactome and the transcriptome, we identified a pair of transcriptionally anticorrelated modules, each consisting of hundreds of genes in multicellular interactome networks across different individuals and populations. The two modules are associated with cellular proliferation and differentiation, respectively. The proliferation module is conserved among eukaryotic organisms, whereas the differentiation module is specific to multicellular organisms. Upon differentiation of various tissues and cell lines from different organisms, the expression of the proliferation module is more uniformly suppressed, while the differentiation module is upregulated in a tissue- and species-specific manner. Our results indicate that even at the tissue and organism levels, proliferation and differentiation modules may correspond to two alternative states of the molecular network and may reflect a universal symbiotic relationship in a multicellular organism. Our analyses further predict that the proteins mediating the interactions between these modules may serve as modulators at the proliferation/differentiation switch.
Coordination of proliferation and differentiation is a fundamental process of multicellular organisms. Although at the cellular level proliferation and differentiation seem to correspond to different cellular states that can sometimes be seen separated by the proliferation/differentiation temporal switch, it is unclear whether such switch-like property exists at the tissue or organism level or whether it exists in postmitotic tissues in adult animals. Through integrating protein–protein interaction networks with gene expression profiles, Xia, Xue, Dong, Zhu, and colleagues found that a switch temporally separating proliferation- and differentiation-associated modules can also be detected in the adult human brain and the adult whole fruit fly. The expressions of the two modules are well coordinated at the system level. The evolutionary origins of the proliferation and differentiation modules further implicate a symbiotic relationship between the two modules. Network topologies and gene annotations support a regulatory role of the protein–protein interaction interface between the two modules.
Subcellular protein localization is a universal feature of eukaryotic cells, and the ubiquity of protein localization in prokaryotic species is now acquiring greater appreciation. Though some targeting anchors are known, the origin of polar and division-site localization remains mysterious for a large fraction of bacterial proteins. Ultimately, the molecular components responsible for such symmetry breaking must employ a high degree of self-organization. Here we propose a novel physical mechanism, based on the two-dimensional curvature of the membrane, for spontaneous lipid targeting to the poles and division site of rod-shaped bacterial cells. If one of the membrane components has a large intrinsic curvature, the geometrical constraint of the plasma membrane by the more rigid bacterial cell wall naturally leads to lipid microphase separation. We find that the resulting clusters of high-curvature lipids are large enough to spontaneously and stably localize to the two cell poles. Recent evidence of localization of the phospholipid cardiolipin to the poles of bacterial cells suggests that polar targeting of some proteins may rely on the membrane's differential lipid content. More generally, aggregates of lipids, proteins, or lipid-protein complexes may localize in response to features of cell geometry incapable of localizing individual molecules.
Bacteria contain proteins that localize to different regions of the cell—the poles, the middle of the cell, or along the membrane in specific structures such as helices. This localization is often critical to function, from division-site placement, to cell-cycle progression, to maintaining the shape of the cell. How proteins localize to the poles of cells varies, and the mechanism is unknown for many proteins. A surprising discovery in recent years is that the membrane component cardiolipin localizes to the poles of the rod-shaped bacteria Escherichia coli and Bacillus subtilis. This work of Huang, Mukhopadhyay, and Wingreen presents a model of lipid polar localization in which clusters of cardiolipin are formed naturally due to the constraint of the membrane by the rigid cell wall, and these clusters are large enough to localize to the poles of the cell due to curvature, suggesting that clusters of lipids may serve to recruit proteins to the poles.
The initial coupling between ligand binding and channel gating in the human α7 nicotinic acetylcholine receptor (nAChR) has been investigated with targeted molecular dynamics (TMD) simulation. During the simulation, eight residues at the tip of the C-loop in two alternating subunits were forced to move toward a ligand-bound conformation as captured in the crystallographic structure of acetylcholine binding protein (AChBP) in complex with carbamoylcholine. Comparison of apo- and ligand-bound AChBP structures shows only minor rearrangements distal from the ligand-binding site. In contrast, comparison of apo and TMD simulation structures of the nAChR reveals significant changes toward the bottom of the ligand-binding domain. These structural rearrangements are subsequently translated to the pore domain, leading to a partly open channel within 4 ns of TMD simulation. Furthermore, we confirmed that two highly conserved residue pairs, one located near the ligand-binding pocket (Lys145 and Tyr188), and the other located toward the bottom of the ligand-binding domain (Arg206 and Glu45), are likely to play important roles in coupling agonist binding to channel gating. Overall, our simulations suggest that gating movements of the α7 receptor may involve relatively small structural changes within the ligand-binding domain, implying that the gating transition is energy-efficient and can be easily modulated by agonist binding/unbinding.
Nicotinic acetylcholine receptors are ligand-gated ion channels responsible for neurotransmitter-mediated signal transduction at synapses throughout the central and peripheral nervous systems. Binding of neurotransmitter molecules to subunit interfaces in the N-terminal extracellular domain induces structural rearrangements of the membrane-spanning domain permitting the influx of cations. A full understanding of how the conformational changes propagate from the ligand-binding site to the pore domain is of great interest to biologists, yet remains to be established. Using a special simulation technique known as targeted molecular dynamics, Cheng and colleagues probed the early stages of ligand-induced conformational rearrangements that may lead to channel opening. During the simulation, Cheng et al. observed a sequence of conformational changes that stem from the ligand-binding site to the transmembrane domain resulting in a wider channel. From these results, they suggest that gating movements may entail only small structural changes in the ligand-binding domain, implying that channel gating is energy-efficient and can readily be modulated by the binding/unbinding of agonist molecules.
The study of associations between two biomolecules is the key to understanding molecular function and recognition. Molecular function is often thought to be determined by underlying structures. Here, combining a single-molecule study of protein binding with an energy-landscape–inspired microscopic model, we found strong evidence that biomolecular recognition is determined by flexibilities in addition to structures. Our model is based on coarse-grained molecular dynamics on the residue level with the energy function biased toward the native binding structure (the Go model). With our model, the underlying free-energy landscape of the binding can be explored. There are two distinct conformational states at the free-energy minimum, one with partial folding of CBD itself and significant interface binding of CBD to Cdc42, and the other with native folding of CBD itself and native interface binding of CBD to Cdc42. This shows that the binding process proceeds with a significant interface binding of CBD with Cdc42 first, without a complete folding of CBD itself, and that binding and folding are then coupled to reach the native binding state. The single-molecule experimental finding of dynamic fluctuations among the loosely and closely bound conformational states can be identified with the theoretical, calculated free-energy minimum and explained quantitatively in the model as a result of binding associated with large conformational changes. The theoretical predictions identified certain key residues for binding that were consistent with mutational experiments. The combined study identified fundamental mechanisms and provided insights about designing and further exploring biomolecular recognition with large conformational changes.
Biomolecular function (e.g., binding) is often thought to be determined by the underlying molecular structure. There are more and more findings that molecular binding sometimes involves large conformational changes in various stages of cell function. Addressing this issue will answer the critical questions about how molecular function is determined by conformational flexibility and dynamics in addition to structure. Combining a single-molecule fluorescence study of flexible protein binding with an energy-landscape–inspired microscopic molecular dynamics model, the authors found strong evidence that biomolecular recognition is determined by flexibility and large conformational changes in addition to structure. The single-molecule study shows conformational fluctuations of the protein complex that involve bound and loosely bound states, which can be quantitatively explained in the authors' model as a result of cooperative binding. Theoretical predictions about the key residues are consistent with mutational experiments. Identifying the key residues for binding provides a structural basis for designing drugs that will target those critical residues.
The structure, function, stability, and many other properties of a protein in a fixed environment are fully specified by its sequence, but in a manner that is difficult to discern. We present a general approach for rapidly mapping sequences directly to their energies on a pre-specified rigid backbone, an important sub-problem in computational protein design and in some methods for protein structure prediction. The cluster expansion (CE) method that we employ can, in principle, be extended to model any computable or measurable protein property directly as a function of sequence. Here we show how CE can be applied to the problem of computational protein design, and use it to derive excellent approximations of physical potentials. The approach provides several attractive advantages. First, following a one-time derivation of a CE expansion, the amount of time necessary to evaluate the energy of a sequence adopting a specified backbone conformation is reduced by a factor of 10⁷ compared to standard full-atom methods for the same task. Second, the agreement between two full-atom methods that we tested and their CE sequence-based expressions is very high (root mean square deviation 1.1–4.7 kcal/mol, R² = 0.7–1.0). Third, the functional form of the CE energy expression is such that individual terms of the expansion have clear physical interpretations. We derived expressions for the energies of three classic protein design targets—a coiled coil, a zinc finger, and a WW domain—as functions of sequence, and examined the most significant terms. Single-residue and residue-pair interactions are sufficient to accurately capture the energetics of the dimeric coiled coil, whereas higher-order contributions are important for the two more globular folds. For the task of designing novel zinc-finger sequences, a CE-derived energy function provides significantly better solutions than a standard design protocol, in comparable computation time. Given these advantages, CE is likely to find many uses in computational structural modeling.
Many applications in computational structural biology involve evaluating the energy of a protein adopting a specific structure. A variety of functions are used for this purpose. Statistical potentials are fast to evaluate but do not have a clear biophysical basis, whereas physics-based functions consist of well-defined terms that can be costly to compute. This paper describes how the theory of cluster expansion, originally developed to describe the energies of alloys, can be applied to generate a physical potential for proteins that is extremely fast to evaluate. Cluster expansion is a way of representing a property of a system as a discrete function of its degrees of freedom. In this paper, it is used for the problem of protein design, where the energy is determined by the identities and conformations of amino acids at different sites on a fixed protein backbone. Application of cluster expansion to three small protein folds—the α-helical coiled coil, the zinc finger, and the WW domain—shows that protein sequence can be mapped directly to energy using a surprisingly simple function that maintains high accuracy. Promising results on these small systems suggest that the theory may have utility for macromolecular modeling more generally.
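To make the idea of mapping sequence directly to energy concrete, here is a minimal sketch of an expansion truncated at pair terms, the level the text reports as sufficient for the coiled coil. It uses plain lookup tables rather than the basis functions of a formal cluster expansion, and every coefficient, site index, and name below is a made-up placeholder, not a fitted value from the paper.

```python
from itertools import combinations


def ce_energy(seq, constant, single, pair):
    """Energy of a sequence from a pair-truncated, table-based expansion.

    single maps (site, residue) to a one-body contribution.
    pair maps (site_i, site_j, residue_i, residue_j), with site_i < site_j,
    to a two-body contribution. Missing entries contribute zero.
    """
    energy = constant
    for i, aa in enumerate(seq):
        energy += single.get((i, aa), 0.0)
    for i, j in combinations(range(len(seq)), 2):
        energy += pair.get((i, j, seq[i], seq[j]), 0.0)
    return energy


# Hypothetical coefficients for a three-site toy problem (arbitrary units).
CONSTANT = -1.0
SINGLE = {(0, "L"): -0.5, (1, "K"): 0.25, (2, "E"): 0.25}
PAIR = {(1, 2, "K", "E"): -0.75, (0, 2, "L", "E"): 0.5}

for candidate in ("LKE", "LKA", "AKE"):
    print(candidate, ce_energy(candidate, CONSTANT, SINGLE, PAIR))
```

Once the tables exist, scoring a sequence costs only a handful of dictionary lookups, which is the source of the speed advantage; the expensive full-atom calculations are needed only once, when the coefficients are fitted.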
In the chemotaxis pathway of the bacterium Escherichia coli, signals are carried from a cluster of receptors to the flagellar motors by the diffusion of the protein CheY-phosphate (CheYp) through the cytoplasm. A second protein, CheZ, which promotes dephosphorylation of CheYp, partially colocalizes with receptors in the plasma membrane. CheZ is normally dimeric in solution but has been suggested to associate into highly active oligomers in the presence of CheYp. A model is presented here and supported by Brownian dynamics simulations, which accounts for these and other experimental data: A minority component of the receptor cluster (dimers of CheAshort) nucleates CheZ oligomerization and CheZ molecules move from the cytoplasm to a bound state at the receptor cluster depending on the current level of cellular stimulation. The corresponding simulations suggest that dynamic CheZ localization will sharpen cellular responses to chemoeffectors, increase the range of detectable ligand concentrations, and make adaptation more precise and robust. The localization and activation of CheZ constitute a negative feedback loop that provides a second tier of adaptation to the system. Subtle adjustments of this kind are likely to be found in many other signaling pathways.
In order to function effectively, a living cell must not only synthesize the correct molecules but also put them in the correct place. Understanding how this positioning occurs, and what its consequences are, is a matter of great interest and concern to contemporary biologists. The author here proposes a novel mechanism that will enhance the ability of a bacterial cell to perform chemotaxis—the ability to swim toward sources of food or away from noxious substances. In this hypothesis, a key protein in the chemotaxis pathway moves dynamically between the membrane and the cytoplasm depending on the presence of attractants or repellents. This idea is explored and tested by means of detailed molecular simulations in which all of the relevant molecules are shown in their correct location in the cell. The simulations show that the proposed shift in location of the key molecule will improve the speed, range, and robustness of the cell's response. It seems likely that similar movements of proteins will occur in many other signaling pathways.
Here our goal is to carry out nanotube design using naturally occurring protein building blocks. Inspection of the protein structural database reveals the richness of the conformations of proteins, their parts, and their chemistry. Given a target functional protein nanotube geometry, our strategy involves scanning a library of candidate building blocks, combinatorially assembling them into the shape, and testing its stability. Since self-assembly takes place on time scales not affordable for computations, here we propose a strategy for the very first step in protein nanotube design: we map the candidate building blocks onto a planar sheet and wrap the sheet around a cylinder with the target dimensions. We provide examples of three nanotubes, two peptide and one protein, in atomistic model detail for which there are experimental data. The nanotube models can be used to verify a nanostructure observed by low-resolution experiments, and to study the mechanism of tube formation.
Nanobiology poses a challenge for computational biology. The aim is to predict candidate nanostructures consisting of biopolymers. The idea the authors have followed for some years is to employ naturally occurring protein building blocks for protein and nanostructure design. Drawing on proteins as material is attractive, since proteins and their building blocks have a large repertoire of shapes and surface chemistry. The authors assume that the shape is given: either a target protein scaffold or a functional nanoparticle shape. Ideally, building blocks should self-assemble spontaneously. In practice, self-assembly involves time scales not affordable for computations. Instead, the authors present the first step in knowledge-based nanotube design: creating a nanotube with a specified protein arrangement and tube geometry. Models are constructed by wrapping a planar sheet onto a cylinder surface. The sheet is shaped by a repeating 2-D lattice. This simplification reduces the complexity to the protein arrangement in the lattice and does not prevent construction of all possible nanotubes of repeated units. It allows optimization with an all-atom force field. This is important since local energy minimization may screen whether a specified nanotube is a feasible nanostructure.
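The geometric core of the sheet-wrapping step can be sketched in a few lines. The example assumes each building block is reduced to a single reference point on a rectangular 2-D lattice, with the sheet's x coordinate treated as arc length around the tube and y as position along the tube axis; the function names and the numerical values are illustrative assumptions, and an actual model would place full atomic coordinates and then minimize the energy.

```python
import math


def lattice_sheet(n_around, n_along, radius, rise):
    """Rectangular lattice whose width equals the cylinder circumference.

    Returns (x, y) points: n_around units per ring, n_along rings,
    with rings separated by `rise` along the tube axis.
    """
    spacing = 2.0 * math.pi * radius / n_around
    return [(i * spacing, j * rise)
            for j in range(n_along) for i in range(n_around)]


def wrap_sheet(points, radius):
    """Wrap planar (x, y) points onto a cylinder of the given radius.

    x is converted to an angle (arc length / radius); y becomes the
    coordinate along the cylinder axis (z).
    """
    wrapped = []
    for x, y in points:
        theta = x / radius
        wrapped.append((radius * math.cos(theta), radius * math.sin(theta), y))
    return wrapped


# Hypothetical tube: 6 units per ring, 2 rings, radius 10.0, rise 4.8.
tube = wrap_sheet(lattice_sheet(6, 2, 10.0, 4.8), 10.0)
for x, y, z in tube:
    print(f"{x:8.3f} {y:8.3f} {z:8.3f}")
```

Because the lattice width is chosen to match the circumference, the two edges of the sheet meet without a seam, and the only free choices are the arrangement of units within the lattice and the tube dimensions.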
Peptides often have conformational preferences. We simulated 133 peptide 8-mer fragments from six different proteins, sampled by replica-exchange molecular dynamics using Amber7 with a GB/SA (generalized-Born/solvent-accessible electrostatic approximation to water) implicit solvent. We found that 85 of the peptides have no preferred structure, while 48 of them converge to a preferred structure. In 85% of the converged cases (41 peptides), the structures found by the simulations bear some resemblance to their native structures, based on a coarse-grained backbone description. In particular, all seven of the β hairpins in the native structures contain a fragment in the turn that is highly structured. In the eight cases where the bioinformatics-based I-sites library picks out native-like structures, the present simulations are largely in agreement. Such physics-based modeling may be useful for identifying early nuclei in folding kinetics and for assisting in protein-structure prediction methods that utilize the assembly of peptide fragments.
To carry out specific biochemical reactions, proteins must adopt precise three-dimensional conformations. During the folding of a protein, the protein picks out the right conformation out of billions of other conformations. It is not yet possible to do this computationally. Picking out the native conformation using physics-based atomically detailed models, sampled by molecular dynamics, is presently beyond the reach of computer methods. How can we speed up computational protein-structure prediction? One idea is that proteins start folding at specific parts of a chain that kink up early in the folding process. If we can identify these kinks, we should be able to speed up protein-structure prediction. Previous studies have identified likely kinks through bioinformatic analysis of existing protein structures. The goal of the authors here is to identify these putative folding initiation sites with a physical model instead. In this study, Ho and Dill show that, by chopping a protein chain into peptide pieces, then simulating the pieces in molecular dynamics, they can identify those peptide fragments that have conformational biases. These peptides identify the kinks in the protein chain.
The C2 domain of protein kinase Cα (PKCα) controls the translocation of this kinase from the cytoplasm to the plasma membrane during cytoplasmic Ca2+ signals. The present study uses intracellular coimaging of fluorescent fusion proteins and an in vitro FRET membrane-binding assay to further investigate the nature of this translocation. We find that Ca2+-activated PKCα and its isolated C2 domain localize exclusively to the plasma membrane in vivo and that a plasma membrane lipid, phosphatidylinositol-4,5-bisphosphate (PIP2), dramatically enhances the Ca2+-triggered binding of the C2 domain to membranes in vitro. Similarly, a hybrid construct substituting the PKCα Ca2+-binding loops (CBLs) and PIP2 binding site (β-strands 3–4) into a different C2 domain exhibits native Ca2+-triggered targeting to plasma membrane and recognizes PIP2. Conversely, a hybrid containing the CBLs but lacking the PIP2 site translocates primarily to trans-Golgi network (TGN) and fails to recognize PIP2. Similarly, PKCα C2 domains possessing mutations in the PIP2 site target primarily to TGN and fail to recognize PIP2. Overall, these findings demonstrate that the CBLs are essential for Ca2+-triggered membrane binding but are not sufficient for specific plasma membrane targeting. Instead, targeting specificity is provided by basic residues on β-strands 3–4, which bind to plasma membrane PIP2.
G protein–coupled receptors (GPCRs), encoded by about 5% of human genes, comprise the largest family of integral membrane proteins and act as cell surface receptors responsible for the transduction of endogenous signals into a cellular response. Although tertiary structural information is crucial for function annotation and drug design, there are few experimentally determined GPCR structures. To address this issue, we employ the recently developed threading assembly refinement (TASSER) method to generate structure predictions for all 907 putative GPCRs in the human genome. Unlike traditional homology modeling approaches, TASSER modeling does not require solved homologous template structures; moreover, it often refines the structures closer to native. These features are essential for the comprehensive modeling of all human GPCRs when close homologous templates are absent. Based on a benchmarked confidence score, approximately 820 predicted models should have the correct folds. The majority of GPCR models share the characteristic seven-transmembrane helix topology, but 45 ORFs are predicted to have different structures. This is due to GPCR fragments that are predominantly from extracellular or intracellular domains as well as database annotation errors. Our preliminary validation includes the automated modeling of bovine rhodopsin, the only solved GPCR in the Protein Data Bank. With homologous templates excluded, the final model built by TASSER has a global Cα root-mean-squared deviation from native of 4.6 Å, with a root-mean-squared deviation in the transmembrane helix region of 2.1 Å. Models of several representative GPCRs are compared with mutagenesis and affinity labeling data, and consistent agreement is demonstrated. Structure clustering of the predicted models shows that GPCRs with similar structures tend to belong to a similar functional class even when their sequences are diverse. These results demonstrate the usefulness and robustness of the in silico models for GPCR functional analysis. All predicted GPCR models are freely available for noncommercial users on our Web site (http://www.bioinformatics.buffalo.edu/GPCR).
G protein–coupled receptors (GPCRs) are a large superfamily of integral membrane proteins that transduce signals across the cell membrane. Because of the breadth and importance of the physiological roles undertaken by the GPCR family, many of its members are important pharmacological targets. Although knowledge of a protein's native structure can provide important insight into its function and aid the design of new drugs, the experimental determination of the three-dimensional structure of GPCR membrane proteins has proved to be very difficult. This is demonstrated by the fact that there is only one solved GPCR structure (from bovine rhodopsin) deposited in the Protein Data Bank library. In particular, there are no human GPCR structures in the Protein Data Bank. To address the need for the tertiary structures of human GPCRs, using just sequence information, the authors use a newly developed threading-assembly-refinement method to generate models for all 907 registered GPCRs in the human genome. About 820 GPCRs are anticipated to have correct topology and transmembrane helix arrangement. A subset of the resulting models is validated by comparison with mutagenesis experimental data, and consistent agreement is demonstrated.
Protein-protein interactions, particularly weak and transient ones, are often mediated by peptide recognition domains, such as Src Homology 2 and 3 (SH2 and SH3) domains, which bind to specific sequence and structural motifs. It is important but challenging to determine the binding specificity of these domains accurately and to predict their physiological interacting partners. In this study, the interactions between 35 peptide ligands (15 binders and 20 non-binders) and the Abl SH3 domain were analyzed using molecular dynamics simulation and the Molecular Mechanics/Poisson-Boltzmann Surface Area (MM/PBSA) method. The calculated binding free energies correlated well with the rank order of the binding peptides and clearly distinguished binders from non-binders. Free energy component analysis revealed that the van der Waals interactions dictate the binding strength of peptides, whereas the binding specificity is determined by the electrostatic interaction and the polar contribution of desolvation. The binding motif of the Abl SH3 domain was then determined by a virtual mutagenesis method, which mutates the residue at each position of the template peptide to each of the other 19 amino acids and calculates the binding free energy difference between the template and the mutated peptides using the MM/PBSA method. A single-position mutation free energy profile was thus established and used as a scoring matrix to search for peptides recognized by the Abl SH3 domain in the human genome. Our approach successfully picked ten out of 13 experimentally determined binding partners of the Abl SH3 domain among the top 600 candidates from the 218,540 decapeptides with the PXXP motif in the SWISS-PROT database. We expect that this physical-principles-based method can be applied to other protein domains as well.
One of the central questions of molecular biology is to understand how signals are transduced in the cell. Intracellular signal transduction is mainly achieved through cascades of protein-protein interactions, which are often mediated by peptide-binding modular domains, such as Src Homology 2 and 3 (SH2 and SH3). Each family of these domains binds to peptides with specific sequence and structural characteristics. To reconstruct the protein-protein interaction networks mediated by modular domains, one must identify the peptide motifs recognized by these domains and understand the mechanism of binding specificity. These questions are challenging because the domain-peptide interactions are usually weak and transient. Here, the authors took a physical-principles approach to address these difficult questions for the SH3 domain of human protein Abl, which binds to peptides containing the PXXP motif (where P is proline and X is any amino acid). They generated a position-specific scoring matrix to represent the binding motif of the Abl SH3 domain. Analysis of the binding free energy components provided insights into how the binding specificity is achieved. Most known protein interacting partners of the Abl SH3 domain were correctly identified using the position-specific scoring matrix, and other potential interacting partners were also suggested.
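The scoring-matrix search described in the two summaries above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the profile values are invented toy numbers, the function names are hypothetical, and the sketch assumes that the PXXP motif may occur anywhere inside a decapeptide and that a peptide's score is the plain sum of per-position free energy changes (lower meaning tighter predicted binding).

```python
import re

PXXP = re.compile(r"P..P")   # proline, any two residues, proline
WINDOW = 10                  # decapeptides, as in the text

def decapeptides_with_pxxp(sequence):
    """Yield (start, peptide) for every 10-residue window that contains PXXP."""
    for start in range(len(sequence) - WINDOW + 1):
        peptide = sequence[start:start + WINDOW]
        if PXXP.search(peptide):
            yield start, peptide

def score(peptide, profile, default=0.0):
    """Sum the per-position free energy changes (assumed additive)."""
    return sum(profile[i].get(res, default) for i, res in enumerate(peptide))

def rank(sequence, profile, top=5):
    """Return the `top` lowest-scoring (score, start, peptide) candidates."""
    hits = [(score(p, profile), s, p) for s, p in decapeptides_with_pxxp(sequence)]
    return sorted(hits)[:top]

# Toy profile (hypothetical values in kcal/mol, NOT taken from the study):
# one dict per peptide position, mapping residue -> free energy change
# relative to the template residue. Unlisted residues score `default`.
TOY_PROFILE = [dict() for _ in range(WINDOW)]
TOY_PROFILE[6] = {"P": -1.5, "A": 0.8}
TOY_PROFILE[9] = {"P": -1.0}

if __name__ == "__main__":
    for s, start, pep in rank("MGSAPSYSPPPPPLKE", TOY_PROFILE):
        print(f"{pep}  start={start}  score={s:+.2f}")
```

In a real application the profile would hold one computed free energy difference for each of the 20 residues at each position, and the candidates would come from a sequence database rather than a single toy string.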
We propose a new mechanism to explain autoinhibition of the epidermal growth factor receptor (EGFR/ErbB) family of receptor tyrosine kinases based on a structural model that postulates that both their juxtamembrane and protein tyrosine kinase domains bind electrostatically to acidic lipids in the plasma membrane, restricting access of the kinase domain to substrate tyrosines. Ligand-induced dimerization promotes partial trans autophosphorylation of ErbB1, leading to a rapid rise in intracellular [Ca2+] that can activate calmodulin. We postulate that the Ca2+/calmodulin complex binds rapidly to residues 645–660 of the juxtamembrane domain, reversing its net charge from +8 to −8 and repelling it from the negatively charged inner leaflet of the membrane. The repulsion has two consequences: it releases electrostatically sequestered phosphatidylinositol 4,5-bisphosphate (PIP2), and it disengages the kinase domain from the membrane, allowing it to become fully active and phosphorylate an adjacent ErbB molecule or other substrate. We tested various aspects of the model by measuring ErbB juxtamembrane peptide binding to phospholipid vesicles using both a centrifugation assay and fluorescence correlation spectroscopy; analyzing the kinetics of interactions between ErbB peptides, membranes, and Ca2+/calmodulin using stopped-flow fluorescence; assessing ErbB1 activation in Cos1 cells; measuring fluorescence resonance energy transfer between ErbB peptides and PIP2; and making theoretical electrostatic calculations on atomic models of membranes and ErbB juxtamembrane and kinase domains.
Acetylcholinesterase (AChE) rapidly hydrolyzes acetylcholine in the neuromuscular junctions and other cholinergic synapses to terminate the neuronal signal. In physiological conditions, AChE exists as tetramers associated with the proline-rich attachment domain (PRAD) of either collagen-like Q subunit (ColQ) or proline-rich membrane-anchoring protein. Crystallographic studies have revealed that different tetramer forms may be present, and it is not clear whether one or both are relevant under physiological conditions. Recently, the crystal structure of the tryptophan amphiphilic tetramerization (WAT) domain of AChE associated with PRAD ([WAT]4PRAD), which mimics the interface between ColQ and AChE tetramer, became available. In this study we built a complete tetrameric mouse [AChET]4–ColQ atomic structure model, based on the crystal structure of the [WAT]4PRAD complex. The structure was optimized using energy minimization. Block normal mode analysis was done to investigate the low-frequency motions of the complex and to correlate the structure model with the two known crystal structures of AChE tetramer. Significant low-frequency motions among the catalytic domains of the four AChE subunits were observed, while the [WAT]4PRAD part held the complex together. Normal mode involvement analysis revealed that the two lowest frequency modes were primarily involved in the conformational changes leading to the two crystal structures. The first 30 normal modes can account for more than 75% of the conformational changes in both cases. The evidence further supports the idea of a flexible tetramer model for AChE. This model can be used to study the implications of the association of AChE with ColQ.
Acetylcholinesterase (AChE) breaks down acetylcholine in the neuromuscular junction and other cholinergic synapses to terminate neuronal signals. AChE exists as tetramers anchored by structural subunits to the cell membranes in the brain or the basal lamina in the neuromuscular junction. Based on a crystal structure of the tetramerization domain of AChE with a proline-rich attachment domain of the anchoring proteins, a symmetric model of the complex of AChE tetramer with the anchoring protein tail was constructed. Block normal mode analysis revealed the presence of several low-frequency, low-barrier normal modes corresponding to inter-subunit motions. Previous crystal structures of AChE tetramer could be rationalized using these normal modes. These low-frequency modes are due to the presence of a flexible hinge in the structure of AChE. This study paints a picture of a flexible AChE tetramer with different conformational states interconverting easily under physiological conditions, which has important implications for the function of AChE. In particular, AChE is not trapped in the compact tetramer structure, in which access of substrate to two of the active sites is somewhat limited. Rather, the tetramer fluctuates to expose all four of its active sites to ensure rapid removal of acetylcholine.
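The normal mode involvement analysis mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a commonly used definition of the involvement coefficient (the normalized projection of the displacement between two conformations onto each mode), it assumes the supplied modes are mutually orthogonal, and the function names and toy vectors are hypothetical. The abstract does not give the exact definition used in the study.

```python
from math import sqrt

def dot(u, v):
    """Dot product of two equal-length coordinate vectors."""
    return sum(a * b for a, b in zip(u, v))

def involvement_coefficients(displacement, modes):
    """Return |d . m| / (|d| |m|) for each mode m.

    `displacement` is the flattened coordinate difference between two
    conformations; each mode is a flattened eigenvector of the same length.
    """
    norm = sqrt(dot(displacement, displacement))
    if norm == 0.0:
        raise ValueError("the two conformations are identical")
    return [abs(dot(displacement, m)) / (sqrt(dot(m, m)) * norm) for m in modes]

def cumulative_involvement(displacement, modes):
    """Running sum of squared coefficients.

    For mutually orthogonal modes this is the fraction of the conformational
    change captured by the first k modes (it reaches 1.0 for a complete basis).
    """
    total = 0.0
    fractions = []
    for c in involvement_coefficients(displacement, modes):
        total += c * c
        fractions.append(total)
    return fractions

if __name__ == "__main__":
    # Toy three-coordinate example with an orthonormal basis (illustrative only).
    d = [3.0, 4.0, 0.0]
    basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(cumulative_involvement(d, basis))
```

Read this way, a statement such as "the first 30 modes account for more than 75% of the conformational change" corresponds to the 30th entry of the cumulative list exceeding 0.75.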
Pseudomonas aeruginosa lipase is a 29-kDa protein that, following the determination of its crystal structure, was postulated to have a lid that stretched between residues 125 and 148. In this paper, using molecular dynamics simulations, we propose that there exists, in addition to the above-mentioned lid, a novel second lid in this lipase. We further show that the second lid, covering residues 210–222, acts as a triggering lid for the movement of the first. We also investigate the role of hydrophobicity in the movement of the lids and show that two residues, Phe214 and Ala217, play important roles in lid movement. To our knowledge, this is the first time that a double-lid movement of the type described in our manuscript has been presented to the scientific community. This work also elucidates the interplay of hydrophobic interactions in the dynamics, and hence the function, of an enzyme.
Lipases hydrolyse long-chain fatty acid esters at water-oil interfaces through the mechanism of interfacial activation mediated by the movement of a lid subdomain that covers the active site. Studying lid movement is an area of active research in the field of protein dynamics. The lipase from Pseudomonas aeruginosa is a 29-kDa protein that was previously crystallized in the open conformation, and as expected, an approximately 20-residue lid subdomain was identified. In the present study, the authors report extensive molecular dynamics simulations of the P. aeruginosa lipase. They show that this protein has two lids covering the substrate-binding pocket. The first lid is the one proposed from the known crystal structure. The second lid, a much shorter one, lies over the binding pocket facing the first lid. Furthermore, using position-restrained simulations, the authors show that movement of the second lid may actually be a trigger for the movement of the first, and that this triggering action is driven by hydrophobic contacts between the two lids. This computational study paves the way for experimentalists to study the structure and dynamics of this protein in greater detail in order to understand coupled subdomain movements in a comprehensive fashion.
In prokaryotes, genes belonging to the same operon are transcribed in a single mRNA molecule. Transcription starts as the RNA polymerase binds to the promoter and continues until it reaches a transcriptional terminator. Some terminators rely on the presence of the Rho protein, whereas others function independently of Rho. Such Rho-independent terminators consist of an inverted repeat followed by a stretch of thymine residues, allowing us to predict their presence directly from the DNA sequence. Unlike in Escherichia coli, the Rho protein is dispensable in Bacillus subtilis, suggesting a limited role for Rho-dependent termination in this organism and possibly in other Firmicutes. We analyzed 463 experimentally known terminating sequences in B. subtilis and found a decision rule to distinguish Rho-independent transcriptional terminators from non-terminating sequences. The decision rule allowed us to find the boundaries of operons in B. subtilis with a sensitivity and specificity of about 94%. Using the same decision rule, we found an average sensitivity of 94% for 57 bacteria belonging to the Firmicutes phylum, and a considerably lower sensitivity for other bacteria. Our analysis shows that Rho-independent termination is dominant for Firmicutes in general, and that the properties of the transcriptional terminators are conserved. Terminator prediction can be used to reliably predict the operon structure in these organisms, even in the absence of experimentally known operons. Genome-wide predictions of Rho-independent terminators for the 57 Firmicutes are available in the Supporting Information section.
In prokaryotes, genes belonging to the same operon are transcribed in a single mRNA molecule. Transcription starts as the RNA polymerase binds to the promoter and continues until it reaches a transcriptional terminator. To understand the gene regulatory network of transcription in bacteria, it is important as a first step to determine the operon structure. In this paper, the authors show that (unlike in Escherichia coli) most terminators in Bacillus subtilis function independently of the terminator protein Rho. As these Rho-independent terminators consist of an inverted repeat followed by a stretch of thymine residues, their presence can be predicted directly from the DNA sequence. The authors derived a decision rule by analyzing experimentally known terminating sequences in B. subtilis, and show that the operon boundaries can be found with a high accuracy (about 94%) in B. subtilis and other Firmicutes, even in the absence of experimentally known operons in the given organism. The properties of the transcriptional terminators are shown to be conserved within the Firmicutes phylum. For bacteria other than Firmicutes, the prediction accuracy is considerably lower, suggesting that Rho-dependent or possibly currently unknown termination mechanisms are important in these organisms.
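Because a Rho-independent terminator is an inverted repeat followed by a stretch of thymines, a crude sequence scan can be sketched as follows. This is a toy illustration, not the decision rule derived in the study: it only accepts perfect hairpin stems, and every threshold (stem length, loop size, T-tract window and minimum T count) is a hypothetical parameter chosen for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def find_terminators(dna, stem=6, loop_min=3, loop_max=8, t_window=8, t_min=5):
    """Return (stem_start, t_tract_start) for each candidate terminator.

    A candidate is a perfect inverted repeat (two arms of `stem` bases
    separated by a loop of loop_min..loop_max bases) immediately followed by
    a window of `t_window` bases containing at least `t_min` thymines.
    All thresholds are illustrative assumptions.
    """
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - stem + 1):
        left = dna[i:i + stem]
        for loop in range(loop_min, loop_max + 1):
            j = i + stem + loop                  # start of the right arm
            right = dna[j:j + stem]
            tail_start = j + stem                # first base after the hairpin
            tail = dna[tail_start:tail_start + t_window]
            if len(right) < stem or len(tail) < t_window:
                continue                         # ran off the end of the sequence
            if right == reverse_complement(left) and tail.count("T") >= t_min:
                hits.append((i, tail_start))
    return hits

if __name__ == "__main__":
    # Toy sequence: 6-bp stem, 4-nt loop, matching arm, then eight thymines.
    seq = "GG" + "GCCCGC" + "TTTT" + "GCGGGC" + "T" * 8 + "AA"
    print(find_terminators(seq))
```

A practical predictor would also tolerate mismatches and bulges in the stem and would score hairpin stability, which this sketch deliberately leaves out.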
An important element of the developing field of proteomics is to understand protein-protein interactions and other functional links amongst genes. Across-species correlation methods for detecting functional links work on the premise that functionally linked proteins will tend to show a common pattern of presence and absence across a range of genomes. We describe a maximum likelihood statistical model for predicting functional gene linkages. The method detects independent instances of the correlated gain or loss of pairs of proteins on phylogenetic trees, reducing the high rates of false positives observed in conventional across-species methods that do not explicitly incorporate a phylogeny. We show, in a dataset of 10,551 protein pairs, that the phylogenetic method improves by up to 35% on across-species analyses at identifying known functionally linked proteins. The method shows that protein pairs with at least two to three correlated events of gain or loss are almost certainly functionally linked. Contingent evolution, in which one gene's presence or absence depends upon the presence of another, can also be detected phylogenetically, and may identify genes whose functional significance depends upon their interaction with other genes. Incorporating phylogenetic information improves the prediction of functional linkages. The improvement derives from having a lower rate of false positives and from detecting trends that across-species analyses miss. Phylogenetic methods can easily be incorporated into the screening of large-scale bioinformatics datasets to identify sets of protein links and to characterise gene networks.
A typical fully sequenced genome from a bacterial species contains several thousand genes, and those from multicellular animals may contain many thousands of genes. Understanding the function of these genes is one of the key goals of the developing fields of bioinformatics and proteomics, and the results are of interest to life scientists. The authors describe a computational statistical method that can identify pairs of genes whose functions may be linked, in the sense of participating in a common metabolic pathway or from some physical interaction. The method is applied to phylogenetic trees of related organisms and identifies instances in which a pair of genes is either gained or lost together during evolution. They find that genes that have co-evolved like this on two or more occasions during their evolutionary history are almost certainly functionally linked. These methods can be applied in an automated way to large numbers of species for which fully annotated genomes are available to identify candidate sets of functionally linked genes, and to characterize gene networks.
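For contrast with the phylogenetic method, the conventional across-species correlation that it improves upon can be sketched in Python. This is a minimal illustration of the baseline only, not of the maximum likelihood model: it computes the phi coefficient between two presence/absence profiles and treats every genome as an independent observation, which is the assumption the text identifies as a source of false positives. The function name and the toy profiles are hypothetical.

```python
from math import sqrt

def phi_coefficient(profile_a, profile_b):
    """Correlation between two binary presence/absence profiles.

    Each profile lists 1 (gene present) or 0 (gene absent) for the same
    ordered set of genomes. Returns 0.0 when either profile is constant.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    pairs = list(zip(profile_a, profile_b))
    n11 = sum(1 for a, b in pairs if a and b)          # both present
    n10 = sum(1 for a, b in pairs if a and not b)      # only the first present
    n01 = sum(1 for a, b in pairs if not a and b)      # only the second present
    n00 = sum(1 for a, b in pairs if not a and not b)  # both absent
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0
    return (n11 * n00 - n10 * n01) / denom

if __name__ == "__main__":
    # Toy profiles over six genomes (illustrative only).
    gene_x = [1, 1, 1, 0, 0, 0]
    gene_y = [1, 1, 1, 0, 0, 0]
    print(phi_coefficient(gene_x, gene_y))
```

In the toy example the perfect correlation could arise from a single gain in the common ancestor of the first three genomes, which is one evolutionary event rather than three independent ones. Counting such events on a tree is what the phylogenetic approach adds.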
The MA domain of retroviral Gag proteins mediates association with the host cell membrane during assembly. The biochemical nature of this interaction is not well understood. We have used an in vitro flotation assay to directly measure Rous sarcoma virus (RSV) MA-membrane interaction in the absence of host cell factors. The association of purified MA and MA-containing proteins with liposomes of defined composition was electrostatic in nature and depended upon the presence of a biologically relevant concentration of negatively charged lipids. A mutant MA protein known to be unable to promote Gag membrane association and budding in vivo failed to bind to liposomes. These results were supported by computational modeling. The intrinsic affinity of RSV MA for negatively charged membranes appears insufficient to promote efficient plasma membrane binding during assembly. However, an artificially dimerized form of MA bound to liposomes at least an order of magnitude more tightly than monomeric MA. This result suggests that the clustering of MA domains, via Gag-Gag interactions during virus assembly, drives membrane association in vivo.
Translocation of cytosolic phospholipase A2 (cPLA2) to Golgi and ER in response to intracellular calcium mobilization is regulated by its calcium-dependent lipid-binding, or C2, domain. Although well studied in vitro, the biochemical characteristics of the cPLA2C2 domain offer no predictive value in determining its intracellular targeting. To understand the molecular basis for cPLA2C2 targeting in vivo, the intracellular targets of the synaptotagmin 1 C2A (Syt1C2A) and protein kinase Cα C2 (PKCαC2) domains were identified in Madin-Darby canine kidney cells and compared with that of hybrid C2 domains containing the calcium binding loops from cPLA2C2 on Syt1C2A and PKCαC2 domain backbones. In response to an intracellular calcium increase, PKCαC2 targeted plasma membrane regions rich in phosphatidylinositol-4,5-bisphosphate, and Syt1C2A displayed a biphasic targeting pattern, first targeting phosphatidylinositol-4,5-bisphosphate-rich regions in the plasma membrane and then the trans-Golgi network. In contrast, the Syt1C2A/cPLA2C2 and PKCαC2/cPLA2C2 hybrids targeted Golgi/ER and colocalized with cPLA2C2. The electrostatic properties of these hybrids suggested that the membrane binding mechanism was similar to cPLA2C2, but not PKCαC2 or Syt1C2A. These results suggest that primarily calcium binding loops 1 and 3 encode structural information specifying Golgi/ER targeting of cPLA2C2 and the hybrid domains.
A ring of aligned glutamate residues named the intermediate ring of charge surrounds the intracellular end of the acetylcholine receptor channel and dominates cation conduction (Imoto et al. 1988). Four of the five subunits in mouse-muscle acetylcholine receptor contribute a glutamate to the ring. These glutamates were mutated to glutamine or lysine, and combinations of mutant and native subunits, yielding net ring charges of −1 to −4, were expressed in Xenopus laevis oocytes. In all complexes, the α subunit contained a Cys substituted for αThr244, three residues away from the ring glutamate αGlu241. The rate constants for the reactions of αThr244Cys with the neutral 2-hydroxyethyl-methanethiosulfonate, the positively charged 2-ammonioethyl-methanethiosulfonate, and the doubly positively charged 2-ammonioethyl-2′-ammonioethanethiosulfonate were determined from the rates of irreversible inhibition of the responses to acetylcholine. The reagents were added in the presence and absence of acetylcholine and at various transmembrane potentials, and the rate constants were extrapolated to zero transmembrane potential. The intrinsic electrostatic potential in the channel in the vicinity of the ring of charge was estimated from the ratios of the rate constants of differently charged reagents. In the acetylcholine-induced open state, this potential was −230 mV with four glutamates in the ring and increased linearly towards 0 mV by +57 mV for each negative charge removed from the ring. Thus, the intrinsic electrostatic potential in the narrow, intracellular end of the open channel is almost entirely due to the intermediate ring of charge and is strongly correlated with alkali-metal-ion conductance through the channel. The intrinsic electrostatic potential in the closed state of the channel was more positive than in the open state at all values of the ring charge. These electrostatic properties were simulated by theoretical calculations based on a simplified model of the channel.
nicotinic; mutagenesis; reaction kinetics; conductance; selectivity
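The two quantitative statements in the abstract above can be made concrete with a short Python sketch. The first function encodes the linear relation reported for the open state (−230 mV at a ring charge of −4, rising by 57 mV per negative charge removed). The second is an assumption rather than the authors' stated procedure: it supposes a Boltzmann relation, k_charged / k_neutral = exp(−zFψ/RT), between the rate constants of a charged and a neutral reagent of otherwise equal reactivity, at an assumed temperature of 298.15 K. Function names are hypothetical.

```python
from math import log

R = 8.314462618    # gas constant, J/(mol K)
F = 96485.33212    # Faraday constant, C/mol

def open_state_potential_mv(ring_charge):
    """Open-state potential (mV) from the linear relation in the abstract.

    `ring_charge` is the net ring charge, -4 (native) up to -1 in the study.
    """
    charges_removed = 4 + ring_charge
    return -230.0 + 57.0 * charges_removed

def potential_from_rate_ratio_mv(k_charged, k_neutral, z, temperature_k=298.15):
    """Estimate the local potential (mV) from a ratio of rate constants.

    ASSUMPTION: k_charged / k_neutral = exp(-z F psi / (R T)), i.e. the two
    reagents differ only in charge z. Temperature is an assumed default.
    """
    ratio = k_charged / k_neutral
    return -(R * temperature_k) / (z * F) * log(ratio) * 1000.0

if __name__ == "__main__":
    for q in (-4, -3, -2, -1):
        print(q, open_state_potential_mv(q))
    # Under the assumed relation, a tenfold faster reaction for a +1 reagent
    # implies a potential of roughly -59 mV at 298.15 K.
    print(potential_from_rate_ratio_mv(10.0, 1.0, 1))
```

Under these assumptions a negative potential concentrates cationic reagents near the cysteine and speeds their reaction, so a rate ratio above one maps to a negative estimate.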