Despite significant progress in recent years, ab initio folding is still one of the most challenging problems in structural biology. This paper presents a probabilistic graphical model for ab initio folding, which employs Conditional Random Fields (CRFs) and directional statistics to model the relationship between the primary sequence of a protein and its three-dimensional structure. Unlike the widely used fragment assembly method and the lattice model for protein folding, our graphical model can explore protein conformations in a continuous space according to their probability. The probability of a protein conformation reflects its stability and is estimated from the PSI-BLAST sequence profile and predicted secondary structure. Experimental results indicate that this new method compares favorably with the fragment assembly method and the lattice model.
protein structure prediction; ab initio folding; conditional random fields (CRFs); directional statistics; fragment assembly; lattice model
Naturally photoswitchable proteins offer a means of directly manipulating the formation of protein complexes that drive a diversity of cellular processes. We have developed tunable light-inducible dimerization tags (TULIPs) based on a synthetic interaction between the LOV2 domain of Avena sativa phototropin 1 (AsLOV2) and an engineered PDZ domain (ePDZ). TULIP tags can recruit proteins to diverse structures in living yeast and mammalian cells, either globally or with precise spatial control using a steerable laser. The equilibrium binding and kinetic parameters of the interaction are tunable by mutation, making TULIPs readily adaptable to signaling pathways with varying sensitivities and response times. We demonstrate the utility of TULIPs by conferring light sensitivity to functionally distinct components of the yeast mating pathway and by directing the site of cell polarization.
The hydrogen exchange behavior of native cytochrome c in low concentrations of denaturant reveals a sequence of metastable, partially unfolded forms that occupy free energy levels reaching up to the fully unfolded state. The step from one form to another is accomplished by the unfolding of one or more cooperative units of structure. The cooperative units are entire omega loops or mutually stabilizing pairs of whole helices and loops. The partially unfolded forms detected by hydrogen exchange appear to represent the major intermediates in the reversible, dynamic unfolding reactions that occur even at native conditions and thus may define the major pathway for cytochrome c folding.
Recent work has largely completed our understanding of the hydrogen-exchange chemistry of unstructured proteins and nucleic acids. Some of the high-energy structural fluctuations that determine the hydrogen-exchange behavior of native macromolecules have been explained; others remain elusive. A growing number of applications are exploiting hydrogen-exchange behavior to study difficult molecular systems and elicit otherwise inaccessible information on protein structure, dynamics and energetics.
Crucial to revealing mechanistic details of protein folding is a characterization of the transition state ensemble and its structural dynamics. To probe the transition state of ubiquitin thermal unfolding, we examine unfolding dynamics and kinetics of wildtype and mutant ubiquitin using time-resolved nonlinear infrared spectroscopy after a nanosecond temperature jump. We observe spectral changes on two different timescales. A fast nonexponential microsecond phase is attributed to downhill unfolding from the transition state region, which is induced by a shift of the barrier due to the rapid temperature change. Slow millisecond changes arise from thermally activated folding and unfolding kinetics. Mutants that stabilize or destabilize the β strands III – V lead to decreased or increased amplitude of the μs phase, indicating that the disruption or weakening of these strands occurs in the transition state. Unfolding features from μs to ms can be explained by temperature-dependent changes of a two-dimensional free energy surface constructed by the native contacts between β strands of the protein. In addition, the results support the possibility of an intermediate state in thermal unfolding.
The zinc-specific fluorophore, Zinpyr-1, is used in competition assays to determine the kinetic and thermodynamic parameters of Zn2+ binding to engineered bi-Histidine sites located in Ubiquitin and the B domain of protein A (BdpA). These binding sites are used in ψ-analysis studies to investigate structure formation in the folding transition state identified by the change in folding rate upon the addition of metal ions. For Ubiquitin, the on-rate binding constant and binding affinity for a site located along an α-helix are measured to be ~10⁷ M⁻¹ s⁻¹ and 3 μM, respectively. For a site located across two β-strands, the metal binding affinity was too weak to measure in the dye competition assays (Kd > 55 μM). The equilibrium-determined values for the Zn2+-induced stabilization of Ubiquitin and BdpA match the values derived from changes in the global folding and unfolding rates. Therefore, metal-ion binding is in fast equilibrium during the transit over the free energy barrier. Accordingly, the folding rate must be slower than the product of the fractional population of a high energy intermediate with the metal site formed and the metal binding on-rate constant. The known folding rate of 20 s⁻¹ at 1.5 M guanidinium chloride in 400 μM Zn2+ provides an upper bound for the stability of such intermediates, ΔG_U-I < +4 kcal·mol⁻¹. These results support a view of the apparent two-state protein folding reaction surface as a fast pre-equilibrium between the denatured state and a series of high energy species. The net folding rate is a product of the equilibrium constant of the highest energy species and a transmission rate. For Ubiquitin, we estimate the transmission rate to be ~10⁴ s⁻¹. Implications for the role of unfolded chain diffusion on folding rates and barrier heights are discussed.
Psi-analysis; Ubiquitin; free-energy barrier; kinetics
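The kinetic argument in the abstract above (folding rate slower than the intermediate's fractional population times the metal binding rate) can be followed as a short calculation. The sketch below is only an illustration under stated assumptions: the temperature of 298.15 K, the pseudo-first-order treatment of binding (on-rate constant times Zn2+ concentration), and the function name are not taken from the source, and the α-helix site on-rate of ~10⁷ M⁻¹ s⁻¹ is used as a representative value.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def intermediate_stability_bound(k_fold, k_on, metal_conc, temperature=298.15):
    """Upper bound (kcal/mol) on the free energy of an intermediate relative
    to the unfolded state, assuming k_fold <= fraction * k_on * [metal].

    temperature is an assumed value; it is not stated in the abstract.
    """
    k_bind = k_on * metal_conc        # pseudo-first-order binding rate, s^-1
    min_fraction = k_fold / k_bind    # smallest population that still allows k_fold
    if not 0.0 < min_fraction < 1.0:
        raise ValueError("folding rate must be slower than the binding rate")
    k_eq = min_fraction / (1.0 - min_fraction)  # intermediate : unfolded ratio
    return -R_KCAL * temperature * math.log(k_eq)


# Values quoted in the abstract: 20 s^-1 folding rate, 400 uM Zn2+,
# and an on-rate constant of about 1e7 M^-1 s^-1.
bound = intermediate_stability_bound(k_fold=20.0, k_on=1e7, metal_conc=400e-6)
print(f"Delta G (U to I) upper bound: {bound:.2f} kcal/mol")
```

With these inputs the estimate comes out at roughly 3 kcal/mol, which lies within the < +4 kcal·mol⁻¹ bound quoted in the abstract. The authors' own derivation may use different inputs or conditions.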
We examine the utility of intra-molecular covalent cross-linking to identify the structure present in the folding transition state. In mammalian ubiquitin, cysteine residues located across two β-strands are cross-linked with dichloroacetone. The kinetic effects of these covalent cross-links in ubiquitin, and engineered disulfide bonds in src SH3 (Grantcharova VP, Riddle DS, Baker D. (2000) Proc Natl Acad Sci U S A 97, 7084–7089), are compared to the results of ψ-analysis where strand association is stabilized by metal ion binding to engineered bihistidine sites (Krantz, B. A., Dothager, R. S. & Sosnick, T. R. (2004) J. Mol. Biol. 337, 463–75) at the same positions. The results for the two methods agree at some of the sites. The cross-linking ϕcrosslink-values agree with their corresponding ψ-values when both have values of zero or one, which represent the absence and presence of native structure, respectively. When ϕcrosslink > ψ, the apparent inconsistency is rationalized by the difference in each method’s mode of stabilization; cross-linking reduces the configurational entropy of the unfolded state whereas metal binding directly stabilizes the native state. However, when the cross-linking ϕ-values are smaller than their corresponding ψ-values, the apparent underestimation of structure formation is difficult to rationalize while retaining the assumption that the cross-link exclusively affects the entropy of the unfolded state. The interpretation also is problematic for data on cross-links located across strands which are not hairpins, and hence, these sites are likely to be of limited utility in folding studies. We conclude that cross-linking data for sites on hairpins generally report on the amount of structure formed within the enclosed loop while the metal binding data report on the amount of structure formed at the site itself.
Psi-analysis; dichloroacetone; ubiquitin; bihistidine
We consider the identification of interacting protein-nucleic acid partners using the rigid body docking method FTdock, which is systematic and exhaustive in the exploration of docking conformations. The accuracy of rigid body docking methods is tested using known protein-DNA complexes for which the docked and undocked structures are both available. Additional tests with large decoy sets probe the efficacy of two published statistically derived scoring functions that contain a huge number of parameters. In contrast, we demonstrate that state-of-the-art machine learning techniques can enormously reduce the number of parameters required, thereby identifying the relevant docking features using a minuscule fraction of the number of parameters in the prior works. The present machine learning study considers a 300-dimensional vector (dependent on only 15 parameters), termed the Chemical Context Profile (CCP), where each dimension reflects a specific type of protein amino acid-nucleic acid base interaction. The CCP is designed to capture the chemical complementarities of the interface and is well suited for machine learning techniques. Our objective function is the Chemical Context Discrepancy (CCD), which is defined as the angle between the native system's CCP vector and the decoy's vector and which serves as a substitute for the more commonly used root mean squared deviation (RMSD). We demonstrate that the CCP provides a useful scoring function when certain dimensions are properly weighted. Finally, we explore how the amino acids on a protein's surface can help guide DNA binding, first through long-range interactions, followed by direct contacts, according to specific preferences for either the major or minor grooves of the DNA.
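Since the CCD is defined above as the angle between two CCP vectors, it can be sketched in a few lines. The function name and the toy three-dimensional vectors are illustrative assumptions; the abstract's CCP has 300 dimensions, and how its entries are computed is not reproduced here.

```python
import math


def chemical_context_discrepancy(native_ccp, decoy_ccp):
    """Angle in radians between a native CCP vector and a decoy CCP vector."""
    if len(native_ccp) != len(decoy_ccp):
        raise ValueError("CCP vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(native_ccp, decoy_ccp))
    norm_native = math.sqrt(sum(a * a for a in native_ccp))
    norm_decoy = math.sqrt(sum(b * b for b in decoy_ccp))
    if norm_native == 0.0 or norm_decoy == 0.0:
        raise ValueError("CCP vectors must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_native * norm_decoy)))
    return math.acos(cos_angle)


# Toy vectors standing in for 300-dimensional profiles (hypothetical values).
native = [3.0, 0.0, 1.0]
decoy = [2.0, 1.0, 0.0]
print(chemical_context_discrepancy(native, decoy))
```

A decoy whose interface reproduces the native pattern of contacts gives an angle near zero regardless of overall vector length, which is one reason an angle can be a convenient stand-in for RMSD when ranking decoys.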
Insulin-degrading enzyme (IDE) can degrade insulin and amyloid-β (Aβ), peptides involved in diabetes and Alzheimer's disease, respectively. IDE selects its substrates based on size, charge, and flexibility. From these criteria, we predict that IDE can cleave and inactivate ubiquitin (Ub). Here, we show that IDE cleaves Ub in a biphasic manner, first, by rapidly removing the two C-terminal glycines (kcat = 2 s⁻¹) followed by a slow cleavage between residues 72-73 (kcat = 0.07 s⁻¹), thereby producing the inactive Ub1-74 and Ub1-72. IDE is a ubiquitously expressed cytosolic protein, where monomeric Ub is also present. Thus, Ub degradation by IDE should be regulated. IDE is known to bind the cytoplasmic intermediate filament protein nestin with high affinity. We found that nestin potently inhibits the cleavage of Ub by IDE. In addition, Ub1-72 has a markedly increased affinity for IDE (~90-fold). Thus, the association of IDE with cellular regulators and product inhibition by Ub1-72 can prevent inadvertent proteolysis of cellular Ub by IDE. Ub is a highly stable protein. However, IDE instead prefers to degrade peptides with high intrinsic flexibility. Indeed, we demonstrate that IDE is exquisitely sensitive to Ub stability. Mutations that only mildly destabilize Ub (ΔΔG < 0.6 kcal/mol) render IDE hypersensitive to Ub with rate enhancements greater than 12-fold. The Ub-bound IDE structure and IDE mutants reveal that interaction of the exosite with the N-terminus of Ub guides the unfolding of Ub, allowing its sequential cleavages. Together, our studies link the control of Ub clearance with IDE.
ubiquitin turnover; insulin-degrading enzyme; nestin-mediated cleavage regulation; exosite; substrate flexibility
Rather than stressing the most recent advances in the field, this review highlights the fundamental topics where disagreement remains and where adequate experimental data are lacking. These topics include properties of the denatured state and the role of residual structure, the nature of the fundamental steps and barriers, the extent of pathway heterogeneity and non-native interactions, recent comparisons between theory and experiment, and finally, dynamical properties of the folding reaction.
RNA folding occurs via a series of transitions between metastable intermediate states. It is unknown whether folding intermediates are discrete structures folding along defined pathways or heterogeneous ensembles folding along broad landscapes. We use cryo-electron microscopy and single particle image reconstruction to determine the structure of the major folding intermediate of the specificity domain of a ribonuclease P ribozyme. Our results support the existence of a discrete conformation of this folding intermediate.
One of the major challenges with protein template-free modeling is an efficient sampling algorithm that can explore a huge conformation space quickly. The popular fragment assembly method constructs a conformation by stringing together short fragments extracted from the Protein Data Bank (PDB). The discrete nature of this method may limit generated conformations to a subspace in which the native fold does not belong. Another worry is that a protein with a truly new fold may contain some fragments not in the PDB. This article presents a probabilistic model of protein conformational space to overcome the above two limitations. This probabilistic model employs directional statistics to model the distribution of backbone angles and 2nd-order Conditional Random Fields (CRFs) to describe the sequence-angle relationship. Using this probabilistic model, we can sample protein conformations in a continuous space, as opposed to the widely used fragment assembly and lattice model methods that work in a discrete space. We show that when coupled with a simple energy function, this probabilistic method compares favorably with the fragment assembly method in the blind CASP8 evaluation, especially on alpha or small beta proteins. To our knowledge, this is the first probabilistic method that can search conformations in a continuous space and achieves favorable performance. Our method also generated three-dimensional (3D) models better than template-based methods for a couple of CASP8 hard targets. The method described in this article can also be applied to protein loop modeling, model refinement, and even RNA tertiary structure prediction.
conditional random fields (CRFs); directional statistics; fragment assembly; lattice model; protein structure prediction; template-free modeling
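To make the contrast between continuous and discrete sampling concrete, the sketch below draws backbone angles from a circular distribution instead of picking them from a fixed fragment library. It is a deliberately simplified stand-in, not the model of the abstract: it uses a von Mises distribution from Python's standard library, samples each position independently from a per-residue label, and omits the 2nd-order CRF and the sequence profile entirely. The labels, mean angles and concentration values are hypothetical placeholders.

```python
import math
import random

# Hypothetical (mean angle in radians, concentration kappa) for each
# secondary-structure label. Placeholder numbers, not fitted parameters.
ANGLE_PARAMS = {
    "H": (math.radians(60.0), 30.0),   # tight distribution
    "E": (math.radians(200.0), 10.0),  # moderately tight
    "C": (math.radians(120.0), 1.5),   # broad, coil-like
}


def sample_backbone_angles(labels, rng):
    """Draw one angle per position from that label's von Mises distribution.

    Positions are sampled independently here; a CRF would instead condition
    each position on its neighbours and on the sequence profile.
    """
    angles = []
    for label in labels:
        mu, kappa = ANGLE_PARAMS[label]
        angles.append(rng.vonmisesvariate(mu, kappa))  # value in [0, 2*pi)
    return angles


rng = random.Random(0)  # fixed seed so the run is repeatable
angles = sample_backbone_angles("HHHHCCEEEE", rng)
print([round(math.degrees(a), 1) for a in angles])
```

Every call can return an angle that no library fragment contains, which is the property the abstract relies on when it argues that continuous sampling is not confined to a discrete subspace.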
RNA folding occurs via a series of transitions between metastable intermediate states for Mg2+ concentrations below those needed to fold the native structure. In general, these folding intermediates are considerably less compact than their respective native states. Our previous work demonstrates that the major equilibrium intermediate of the 154 residue specificity domain (S-domain) of the B. subtilis RNase P RNA is more extended than its native structure. We now investigate two models with falsifiable predictions regarding the origins of the extended intermediate structures in the S-domains of the B. subtilis and the E. coli RNase P RNA that belong to different classes of P RNA and have distinct native structures. The first model explores the contribution of electrostatic repulsion, while the second model probes specific interactions in the core of the folding intermediate. Using small-angle X-ray scattering (SAXS) and Langevin Dynamics (LD) simulations, we show that electrostatics plays only a minor role, whereas specific interactions largely account for the extended nature of the intermediate. Structural contacts in the core, including a non-native base-pair, help to stabilize the intermediate conformation. We conclude that RNA folding intermediates adopt extended conformations due to short-range, non-native interactions rather than generic electrostatic repulsion of helical domains. These principles apply to other ribozymes and riboswitches that undergo functionally relevant conformational changes.
Langevin dynamics; P RNA; S-domain
Although most folding intermediates escape detection, their characterization is crucial to the elucidation of folding mechanisms. Here we outline a powerful strategy to populate partially unfolded intermediates: A buried aliphatic residue is substituted with a charged residue (e.g., Leu→Glu−) to destabilize and unfold a specific region of the protein. We apply this strategy to Ubiquitin, reversibly trapping a folding intermediate in which the β5 strand is unfolded. The intermediate refolds to a native-like structure upon charge neutralization under mildly acidic conditions. Characterization of the trapped intermediate using NMR and hydrogen exchange methods identifies a second folding intermediate and reveals the order and free energies of the two major folding events on the native side of the rate-limiting step. This general strategy may be combined with other methods and have broad applications in the study of protein folding and other reactions that require trapping of high energy states.
Psi-analysis; ubiquitin; protein engineering; native-state hydrogen exchange; small angle X-ray scattering; NMR
Genetically-encoded protein photosensors, including the LOV (light, oxygen, voltage) domain, are promising tools for engineering optical control of cellular behavior. We are only beginning to understand how to couple these light detectors to effectors of choice. We report a method that increases the dynamic range of an artificial photoswitch based on the LOV2 domain of A. sativa phototropin1 (AsLOV2). This approach can potentially be used to improve many AsLOV2-based photoswitches.
ψ-analysis has been used to identify inter-residue contacts in the transition state ensemble (TSE) of ubiquitin and other proteins. The magnitude of ψ depends on the degree to which an inserted bi-Histidine (biHis) metal ion binding site is formed in the TSE. A ψ equal to zero or one indicates that the biHis site is absent or fully native-like, respectively, while a fractional ψ implies that in the TSE, the biHis site recovers only part of the binding-induced stabilization of the native state. All-atom Langevin dynamics (LD) simulations of the TSE are performed with restrictions imposed only on the distances between the pairs of residues with experimentally determined ψ of unity. When a site with a fractional ψ lies adjacent to a site with ψ = 1, the fractional ψ generally signifies that the “fractional site” has a distorted geometry in the TS. When a fractional site is distal to the sites with ψ = 1, however, the histidines sample configurations in which the site is absent. The simulations indicate that the ψ = 1 sites by themselves can be used to generate a well-defined TSE having near native topology. φ-values calculated from the TS simulations exhibit mixed agreement with the experimental values. The origin and implication of the disparities are discussed.
Phi-analysis; Langevin dynamics; Psi-analysis; metal binding
The B-domain of protein A (BdpA) is a small 3-helix bundle that has been the subject of considerable experimental and theoretical investigation. Nevertheless, a unified view of the structure of the transition state ensemble (TSE) is still lacking. To characterize the TSE of this surprisingly challenging protein, we apply a combination of ψ-analysis (which probes the role of specific side chain to side chain contacts) and kinetic H/D amide isotope effects (which measure hydrogen bond content), building upon previous studies using mutational φ-analysis (which probes the energetic influence of side chain substitutions). The second helix (H2) is folded in the TSE, while helix formation appears just at the carboxy and amino termini of the first and third helices, respectively. The experimental data suggest a homogeneous, yet plastic TS with a native-like topology. This study generalizes our earlier conclusion, based on two larger α/β proteins, that the TSEs of most small proteins achieve ~70% of their native state’s relative contact order. This high percentage limits the degree of possible TS heterogeneity and requires a re-evaluation of the structural content of the TSE of other proteins, especially when they are characterized as small or polarized.
Protein A; BdpA; Phi-analysis; Langevin dynamics; Psi-analysis; metal binding; protein; isotope effects
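The "~70% of the native state's relative contact order" statement above can be read against the usual definition of relative contact order (RCO): the mean sequence separation of contacting residue pairs divided by the chain length. The sketch below computes it at residue level for a native contact list and for a subset assumed to be formed in the transition state. The contact lists, the chain length and the function name are hypothetical, and the authors' exact contact definition is not given in the abstract.

```python
def relative_contact_order(contacts, chain_length):
    """Mean sequence separation of contacts divided by chain length.

    contacts is a list of (i, j) residue-index pairs; residue-level
    contacts are a simplifying assumption of this sketch.
    """
    if not contacts:
        raise ValueError("at least one contact is required")
    if chain_length <= 0:
        raise ValueError("chain_length must be positive")
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (len(contacts) * chain_length)


# Hypothetical toy data: three native contacts, two of them formed in the TS.
native_contacts = [(1, 5), (2, 10), (3, 20)]
ts_contacts = [(1, 5), (2, 10)]
chain_length = 20

native_rco = relative_contact_order(native_contacts, chain_length)
ts_rco = relative_contact_order(ts_contacts, chain_length)
print(f"TS reaches {100 * ts_rco / native_rco:.0f}% of the native RCO")
```

In this toy case the missing long-range contact lowers the transition-state value well below the native one, which illustrates why a high percentage implies that most long-range native contacts are already present.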
We describe the proceedings and conclusions from a “Workshop on Applications of Protein Models in Biomedical Research” that was held at the University of California at San Francisco on 11 and 12 July 2008. At the workshop, international scientists involved with structure modeling explored (i) how models are currently used in biomedical research, (ii) what the requirements and challenges for different applications are, and (iii) how the interaction between the computational and experimental research communities could be strengthened to advance the field.