Allostery in a protein involves effector binding at an allosteric site that changes the structure and/or dynamics at a distant, functional site. In addition to the chemical equilibrium of ligand binding, allostery involves a conformational equilibrium between one protein substate that binds the effector and a second substate that less strongly binds the effector. We run molecular dynamics simulations using simple, smooth energy landscapes to sample specific ligand-induced conformational transitions, as defined by the effector-bound and unbound protein structures. These simulations can be performed using our web server: http://salilab.org/allosmod/. We then develop a set of features to analyze the simulations and capture the relevant thermodynamic properties of the allosteric conformational equilibrium. These features are based on molecular mechanics energy functions, stereochemical effects, and structural/dynamic coupling between sites. Using a machine-learning algorithm on a dataset of 10 proteins and 179 mutations, we predict both the magnitude and sign of the allosteric conformational equilibrium shift by the mutation; the impact of a large identifiable fraction of the mutations can be predicted with an average unsigned error of 1 kBT. With similar accuracy, we predict the mutation effects for an 11th protein that was omitted from the initial training and testing of the machine-learning algorithm. We also assess which calculated thermodynamic properties contribute most to the accuracy of the prediction.
energy landscape; protein dynamics; machine learning; allostery
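The thermodynamic quantities mentioned in the abstract can be related by a short two-state sketch. It assumes an equilibrium constant K defined as the ratio of the effector-binding substate population to the other substate population, with free energies expressed in units of kBT. The function names are illustrative and are not taken from the AllosMod server.

```python
import math


def free_energy_kt(k_eq):
    """Free energy difference (in kBT) for a two-state equilibrium constant K."""
    return -math.log(k_eq)


def population(delta_g_kt):
    """Boltzmann population of the favored substate for a given free energy (kBT)."""
    return 1.0 / (1.0 + math.exp(delta_g_kt))


def equilibrium_shift_kt(k_wild_type, k_mutant):
    """Signed shift of the conformational equilibrium caused by a mutation (kBT)."""
    return free_energy_kt(k_mutant) - free_energy_kt(k_wild_type)


# Hypothetical values: a mutation that raises K by a factor of e shifts the
# equilibrium by -1 kBT, the size of the average unsigned error quoted above.
shift = equilibrium_shift_kt(1.0, math.e)
print(f"shift = {shift:.2f} kBT, population = {population(shift):.2f}")
```

Under this convention, an error of 1 kBT in the predicted shift corresponds to a factor of e (about 2.7) in the equilibrium constant.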
Of the over 22 million protein sequences in the nonredundant TrEMBL database, fewer than 1% have experimentally confirmed functions. Structure-based methods have been used to predict enzyme activities from experimentally determined structures; however, for the vast majority of proteins, no such structures are available. Here, homology models of a functionally uncharacterized amidohydrolase from Agrobacterium radiobacter K84 (Arad3529) were computed based on a remote template structure. The protein backbone of two loops near the active site was remodeled, resulting in four distinct active site conformations. Substrates of Arad3529 were predicted by docking of 57672 high-energy intermediate (HEI) forms of 6440 metabolites against these four homology models. Based on docking ranks and geometries, a set of modified pterins were suggested as candidate substrates for Arad3529. The predictions were tested by enzymology experiments, and Arad3529 deaminated many pterin metabolites (substrate, kcat/Km [M⁻¹ s⁻¹]): formylpterin, 5.2 × 10⁶; pterin-6-carboxylate, 4.0 × 10⁶; pterin-7-carboxylate, 3.7 × 10⁶; pterin, 3.3 × 10⁶; hydroxymethylpterin, 1.2 × 10⁶; biopterin, 1.0 × 10⁶; D-(+)-neopterin, 3.1 × 10⁵; isoxanthopterin, 2.8 × 10⁵; sepiapterin, 1.3 × 10⁵; folate, 1.3 × 10⁵; xanthopterin, 1.17 × 10⁵; 7,8-dihydrohydroxymethylpterin, 3.3 × 10⁴. While pterin is a ubiquitous oxidative product of folate degradation, genomic analysis suggests that the first step of an undescribed pterin degradation pathway is catalyzed by Arad3529. Homology model-based virtual screening, especially with modeling of protein backbone flexibility, may be broadly useful for enzyme function annotation and discovering new pathways and drug targets.
A statistical method to merge SAXS profiles using Gaussian processes is presented.
Small-angle X-ray scattering (SAXS) is an experimental technique that allows structural information on biomolecules in solution to be gathered. High-quality SAXS profiles have typically been obtained by manual merging of scattering profiles from different concentrations and exposure times. This procedure is very subjective and results vary from user to user. Up to now, no robust automatic procedure has been published to perform this step, preventing the application of SAXS to high-throughput projects. Here, SAXS Merge, a fully automated statistical method for merging SAXS profiles using Gaussian processes, is presented. This method requires only the buffer-subtracted SAXS profiles in a specific order. At the heart of its formulation is non-linear interpolation using Gaussian processes, which provides a statement of the problem that accounts for correlation in the data.
SAXS; SANS; data curation; Gaussian process; merging
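The idea of merging profiles through Gaussian process interpolation can be illustrated with a minimal sketch. It is not the SAXS Merge implementation: it assumes a zero-mean process, a squared-exponential covariance with hand-picked hyperparameters, and toy intensity values, all of which are illustrative choices. Points from two overlapping profiles are pooled with their own error bars, and the posterior mean gives one merged curve.

```python
import math


def sq_exp_kernel(a, b, length, amp):
    """Squared-exponential covariance between two scattering-vector values."""
    return amp ** 2 * math.exp(-0.5 * ((a - b) / length) ** 2)


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def gp_merge(q_obs, i_obs, sigma, q_grid, length=0.05, amp=1.0):
    """Posterior mean of a zero-mean GP fitted to pooled noisy observations."""
    n = len(q_obs)
    cov = [[sq_exp_kernel(q_obs[r], q_obs[c], length, amp)
            + (sigma[r] ** 2 if r == c else 0.0)  # per-point measurement noise
            for c in range(n)] for r in range(n)]
    alpha = solve_linear(cov, i_obs)
    return [sum(sq_exp_kernel(q, q_obs[r], length, amp) * alpha[r]
                for r in range(n)) for q in q_grid]


# Two toy buffer-subtracted profiles that overlap at q = 0.10 (made-up numbers).
profile_a = ([0.00, 0.05, 0.10], [1.00, 0.80, 0.55], [0.02, 0.02, 0.05])
profile_b = ([0.10, 0.15, 0.20], [0.50, 0.30, 0.18], [0.02, 0.02, 0.02])
q_all = profile_a[0] + profile_b[0]
i_all = profile_a[1] + profile_b[1]
s_all = profile_a[2] + profile_b[2]
merged = gp_merge(q_all, i_all, s_all, [0.00, 0.05, 0.10, 0.15, 0.20])
print([round(v, 3) for v in merged])
```

Because the covariance links neighboring points, the overlapping measurements at q = 0.10 are combined according to their error bars instead of being averaged by hand.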
Motivation: Structural characterization of protein interactions is necessary for understanding and modulating biological processes. On one hand, X-ray crystallography or NMR spectroscopy provide atomic resolution structures but the data collection process is typically long and the success rate is low. On the other hand, computational methods for modeling assembly structures from individual components frequently suffer from a high false-positive rate, rarely resulting in a unique solution.
Results: Here, we present a combined approach that computationally integrates data from a variety of fast and accessible experimental techniques for rapid and accurate structure determination of protein–protein complexes. The integrative method uses atomistic models of two interacting proteins and one or more datasets from five accessible experimental techniques: a small-angle X-ray scattering (SAXS) profile, 2D class average images from negative-stain electron microscopy micrographs (EM), a 3D density map from single-particle negative-stain EM, residue type content of the protein–protein interface from NMR spectroscopy and chemical cross-linking detected by mass spectrometry. The method is tested on a docking benchmark consisting of 176 known complex structures and simulated experimental data. The near-native model is the top scoring one for up to 61% of benchmark cases, depending on the included experimental datasets, compared with 10% for standard computational docking. We also collected SAXS, 2D class average images and 3D density map from negative-stain EM to model the PCSK9 antigen–J16 Fab antibody complex, followed by validation of the model by a subsequently available X-ray crystallographic structure.
Supplementary data are available at Bioinformatics online.
Enamel matrix self-assembly has long been suggested as the driving force behind aligned nanofibrous hydroxyapatite formation. We tested if amelogenin, the main enamel matrix protein, can self-assemble into ribbon-like structures in physiologic solutions. Ribbons 17 nm wide were observed to grow several microns in length, requiring calcium, phosphate, and pH 4.0–6.0. The pH range suggests that the formation of ion bridges through protonated histidine residues is essential to self-assembly, supported by a statistical analysis of 212 phosphate-binding proteins predicting twelve phosphate-binding histidines. Thermophoretic analysis verified the importance of calcium and phosphate in self-assembly. X-ray scattering characterized amelogenin dimers with dimensions fitting the cross-section of the amelogenin ribbon, leading to the hypothesis that antiparallel dimers are the building blocks of the ribbons. Over 5–7 days, ribbons self-organized into bundles composed of aligned ribbons mimicking the structure of enamel crystallites in enamel rods. These observations confirm reports of filamentous organic components in developing enamel and provide a new model for matrix-templated enamel mineralization.
Enamel; amelogenin; self-assembly; protonated histidine; biomineralization
Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), infects an estimated two billion people worldwide and is the leading cause of mortality due to infectious disease. The development of new anti-TB therapeutics is required because of the emergence of multidrug-resistant strains as well as co-infection with other pathogens, especially HIV. Recently, the pharmaceutical company GlaxoSmithKline published the results of a high-throughput screen (HTS) of their two million compound library for anti-mycobacterial phenotypes. The screen revealed 776 compounds with significant activity against the M. tuberculosis H37Rv strain, including a subset of 177 prioritized compounds with high potency and low in vitro cytotoxicity. The next major challenge is the identification of the target proteins. Here, we use a computational approach that integrates historical bioassay data, chemical properties and structural comparisons of selected compounds to propose their potential targets in M. tuberculosis. We predicted 139 target–compound links, providing a necessary basis for further studies to characterize the mode of action of these compounds. The results from our analysis, including the predicted structural models, are available to the wider scientific community under an open-source model, to encourage further development of novel TB therapeutics.
Mycobacterium tuberculosis is a major worldwide pathogen infecting millions of individuals every year. Additionally, the number of antibiotic-resistant strains has dramatically increased over the last decades. To address this challenge, the pharmaceutical company GlaxoSmithKline has recently published the results of a large-scale high-throughput screen (HTS) that resulted in the release of 776 chemical compound structures active against tuberculosis. We have used this dataset of compounds as input to our computational approach that integrates historical bioassay data, chemical properties and structural comparisons. We propose 139 targets alongside their respective hit compounds and have made them openly available to the wider scientific community. Our hope is that the availability of the experimental data from GSK and our computational analysis will encourage further research providing validated therapeutic targets against this devastating disease.
Summary: Accurate alignment of protein sequences and/or structures is crucial for many biological analyses, including functional annotation of proteins, classifying protein sequences into families, and comparative protein structure modeling. Described here is a web interface to SALIGN, the versatile protein multiple sequence/structure alignment module of MODELLER. The web server automatically determines the best alignment procedure based on the inputs, while allowing the user to override default parameter values. Multiple alignments are guided by a dendrogram computed from a matrix of all pairwise alignment scores. When aligning sequences to structures, SALIGN uses structural environment information to place gaps optimally. If two multiple sequence alignments of related proteins are input to the server, a profile–profile alignment is performed. All features of the server have been previously optimized for accuracy, especially in the contexts of comparative modeling and identification of interacting protein partners.
Availability: The SALIGN web server is freely accessible to the academic community at http://salilab.org/salign. SALIGN is a module of the MODELLER software, also freely available to academic users (http://salilab.org/modeller).
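The step in which a dendrogram is computed from all pairwise alignment scores can be sketched as a small average-linkage clustering. This is only an illustration: the clustering method used by SALIGN is not stated here, the function name is hypothetical, and the sketch assumes that pairwise scores have already been converted to distances (smaller means more similar).

```python
def guide_tree(names, dist):
    """Average-linkage clustering of a symmetric distance matrix.

    Returns the tree as nested tuples; closely related entries are joined first.
    """
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (tree, size)
    pair_dist = {(i, j): dist[i][j]
                 for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        a, b = min(pair_dist, key=pair_dist.get)  # closest pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        updated = {}
        for c in clusters:
            d_ac = pair_dist[(min(a, c), max(a, c))]
            d_bc = pair_dist[(min(b, c), max(b, c))]
            # Size-weighted average distance from the merged cluster to c.
            updated[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        pair_dist = {k: v for k, v in pair_dist.items() if a not in k and b not in k}
        pair_dist.update(updated)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical distances between three sequences; A and B are the closest pair.
tree = guide_tree(["A", "B", "C"], [[0, 1, 4], [1, 0, 5], [4, 5, 0]])
print(tree)
```

A progressive alignment would then follow the tree from the leaves upward, aligning the most similar entries first.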
The nuclear pore complex (NPC), embedded in the nuclear envelope, is a large, dynamic molecular assembly that facilitates exchange of macromolecules between the nucleus and cytoplasm. The yeast NPC is an eight-fold symmetric annular structure composed of ~456 polypeptide chains contributed by ~30 distinct proteins termed nucleoporins (Nups). Nup116, identified only in fungi, plays a central role in both protein import and mRNA export through the NPC. Nup116 is a modular protein with N-terminal “FG” repeats containing a Gle2p-binding sequence motif (GLEBS motif) and an NPC-targeting domain at its C-terminus. We report the crystal structure of the NPC-targeting domain of Candida glabrata Nup116, consisting of residues 882-1034 [CgNup116(882-1034)], at 1.94 Å resolution. The X-ray structure of CgNup116(882-1034) is consistent with the molecular envelope determined in solution by Small Angle X-ray Scattering (SAXS). Structural similarities of CgNup116(882-1034) with homologous domains from Saccharomyces cerevisiae Nup116, S. cerevisiae Nup145N, and human Nup98 are discussed.
Nuclear Pore Complex; Nup116; Nup98; Nup100; Nup145; mRNA export; structural genomics
Although nearly half of today’s major pharmaceutical drugs target human integral membrane proteins (hIMPs), only 30 hIMP structures are currently available in the Protein Data Bank, largely owing to inefficiencies in protein production. Here we describe a strategy for the rapid structure determination of hIMPs, using solution NMR spectroscopy with systematically labeled proteins produced via cell-free expression. We report new backbone structures of six hIMPs, solved in only 18 months from 15 initial targets. Application of our protocols to an additional 135 hIMPs with molecular weight <30 kDa yielded 38 hIMPs suitable for structural characterization by solution NMR spectroscopy without additional optimization.
Restriction factors, such as the retroviral complementary DNA deaminase APOBEC3G, are cellular proteins that dominantly block virus replication [1-3]. The AIDS virus, human immunodeficiency virus type 1 (HIV-1), produces the accessory factor Vif, which counteracts the host’s antiviral defence by hijacking a ubiquitin ligase complex, containing CUL5, ELOC, ELOB and a RING-box protein, and targeting APOBEC3G for degradation [4-10]. Here we reveal, using an affinity tag/purification mass spectrometry approach, that Vif additionally recruits the transcription cofactor CBF-β to this ubiquitin ligase complex. CBF-β, which normally functions in concert with RUNX DNA binding proteins, allows the reconstitution of a recombinant six-protein assembly that elicits specific polyubiquitination activity with APOBEC3G, but not the related deaminase APOBEC3A. Using RNA knockdown and genetic complementation studies, we also demonstrate that CBF-β is required for Vif-mediated degradation of APOBEC3G and therefore for preserving HIV-1 infectivity. Finally, simian immunodeficiency virus (SIV) Vif also binds to and requires CBF-β to degrade rhesus macaque APOBEC3G, indicating functional conservation. Methods of disrupting the CBF-β–Vif interaction might enable HIV-1 restriction and provide a supplement to current antiviral therapies that primarily target viral proteins.
Human immunodeficiency virus (HIV) has a small genome and therefore relies heavily on the host cellular machinery to replicate. Identifying which host proteins and complexes come into physical contact with the viral proteins is crucial for a comprehensive understanding of how HIV rewires the host’s cellular machinery during the course of infection. Here we report the use of affinity tagging and purification mass spectrometry [1-3] to determine systematically the physical interactions of all 18 HIV-1 proteins and polyproteins with host proteins in two different human cell lines (HEK293 and Jurkat). Using a quantitative scoring system that we call MiST, we identified with high confidence 497 HIV–human protein–protein interactions involving 435 individual human proteins, with ~40% of the interactions being identified in both cell types. We found that the host proteins hijacked by HIV, especially those found interacting in both cell types, are highly conserved across primates. We uncovered a number of host complexes targeted by viral proteins, including the finding that HIV protease cleaves eIF3d, a subunit of eukaryotic translation initiation factor 3. This host protein is one of eleven identified in this analysis that act to inhibit HIV replication. This data set facilitates a more comprehensive and detailed understanding of how the host machinery is manipulated during the course of HIV infection.
Applications in structural biology and medicinal chemistry require protein-ligand scoring functions for two distinct tasks: (i) ranking different poses of a small molecule in a protein binding site; and (ii) ranking different small molecules by their complementarity to a protein site. Using probability theory, we developed two atomic distance-dependent statistical scoring functions: PoseScore was optimized for recognizing native binding geometries of ligands from other poses and RankScore was optimized for distinguishing ligands from nonbinding molecules. Both scores are based on a set of 8,885 crystallographic structures of protein-ligand complexes, but differ in the values of three key parameters. Factors influencing the accuracy of scoring were investigated, including the maximal atomic distance and non-native ligand geometries used for scoring, as well as the use of protein models instead of crystallographic structures for training and testing the scoring function. For the test set of 19 targets, RankScore improved the ligand enrichment (logAUC) and early enrichment (EF1) scores computed by DOCK 3.6 for 13 and 14 targets, respectively. In addition, RankScore performed better at rescoring than each of seven other scoring functions tested. Accepting both the crystal structure and decoy geometries with all-atom root-mean-square errors of up to 2 Å from the crystal structure as correct binding poses, PoseScore gave the best score to a correct binding pose among 100 decoys for 88% of all cases in a benchmark set containing 100 protein-ligand complexes. PoseScore accuracy is comparable to that of DrugScoreCSD and ITScore/SE, and superior to 12 other tested scoring functions. Therefore, RankScore can facilitate ligand discovery, by ranking complexes of the target with different small molecules; PoseScore can be used for protein-ligand complex structure prediction, by ranking different conformations of a given protein-ligand pair. 
The statistical potentials are available through the Integrative Modeling Platform (IMP) software package (http://salilab.org/imp/) and the LigScore web server (http://salilab.org/ligscore/).
statistical potential; reference state; binding pose; ligand enrichment
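How a distance-dependent statistical potential turns observed atom-pair distances into a score can be shown with a minimal sketch. It is not PoseScore or RankScore: the reference state (probability proportional to spherical shell volume), the bin width, the pseudocount, and the toy distances are all assumptions made for illustration, and the function names are hypothetical.

```python
import math


def statistical_potential(distances, max_dist=6.0, bin_width=1.0, pseudocount=1.0):
    """Return per-bin scores -ln(P_obs / P_ref) for one atom-type pair."""
    n_bins = int(round(max_dist / bin_width))
    counts = [pseudocount] * n_bins
    for d in distances:
        if 0.0 <= d < max_dist:
            counts[int(d / bin_width)] += 1
    total = sum(counts)
    # Assumed reference state: pairs spread uniformly in space, so the
    # probability of a bin is proportional to the volume of its shell.
    shells = [((i + 1) * bin_width) ** 3 - (i * bin_width) ** 3 for i in range(n_bins)]
    shell_total = sum(shells)
    return [-math.log((counts[i] / total) / (shells[i] / shell_total))
            for i in range(n_bins)]


def score_pose(potential, distances, bin_width=1.0):
    """Sum the per-bin scores over all protein-ligand atom-pair distances of a pose."""
    score = 0.0
    for d in distances:
        index = int(d / bin_width)
        if d >= 0.0 and index < len(potential):
            score += potential[index]
    return score


# Toy training distances (in Å) for one atom-type pair, clustered near 3.5 Å.
training = [3.2, 3.4, 3.5, 3.6, 3.8, 3.5, 3.3, 5.5]
potential = statistical_potential(training)
print(score_pose(potential, [3.5, 3.4]), score_pose(potential, [5.5, 5.6]))
```

Distances that occur more often in the training complexes than the reference state predicts receive negative (favorable) scores, so a pose that reproduces common contact geometries scores lower than one that does not.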
The Enzyme Function Initiative (EFI) was recently established to address the challenge of assigning reliable functions to enzymes discovered in bacterial genome projects; in this Current Topic we review the structure and operations of the EFI. The EFI includes the Superfamily/Genome, Protein, Structure, Computation, and Data/Dissemination Cores that provide the infrastructure for reliably predicting the in vitro functions of unknown enzymes. The initial targets for functional assignment are selected from five functionally diverse superfamilies (amidohydrolase, enolase, glutathione transferase, haloalkanoic acid dehalogenase, and isoprenoid synthase), with five superfamily-specific Bridging Projects experimentally testing the predicted in vitro enzymatic activities. The EFI also includes the Microbiology Core that evaluates the in vivo context of in vitro enzymatic functions and confirms the functional predictions of the EFI. The deliverables of the EFI to the scientific community include: 1) development of a large-scale, multidisciplinary sequence/structure-based strategy for functional assignment of unknown enzymes discovered in genome projects (target selection, protein production, structure determination, computation, experimental enzymology, microbiology, and structure-based annotation); 2) dissemination of the strategy to the community via publications, collaborations, workshops, and symposia; 3) computational and bioinformatic tools for using the strategy; 4) provision of experimental protocols and/or reagents for enzyme production and characterization; and 5) dissemination of data via the EFI’s website, enzymefunction.org. The realization of multidisciplinary strategies for functional assignment will begin to define the full metabolic diversity that exists in nature and will impact basic biochemical and evolutionary understanding, as well as a wide range of applications of central importance to industrial, medicinal and pharmaceutical efforts.
G protein-coupled receptors (GPCRs) are attractive targets for pharmaceutical research. With the recent determination of several GPCR X-ray structures, the applicability of structure-based computational methods for ligand identification, such as docking, has increased. Yet, as only about 1% of GPCRs have a known structure, receptor homology modeling remains necessary. In order to investigate the usability of homology models and the inherent selectivity of a particular model in relation to close homologs, we constructed multiple homology models for the A1 adenosine receptor (A1AR) and docked ∼2.2 M lead-like compounds. High-ranking molecules were tested on the A1AR as well as the close homologs A2AAR and A3AR. While the screen yielded numerous potent and novel ligands (hit rate 21% and highest affinity of 400 nM), it delivered few selective compounds. Moreover, most compounds appeared in the top ranks of only one model. These findings have implications for future screens.
An enzyme of unknown function within the amidohydrolase superfamily was discovered to catalyze the hydrolysis of N-6-substituted adenine derivatives, several of which are cytokinins. Cytokinins are a common type of plant hormone and N-6-substituted adenines are also found as modifications to tRNA. Patl2390, from Pseudoalteromonas atlantica T6c, was shown to hydrolytically deaminate N-6-isopentenyladenine to hypoxanthine and isopentenylamine with a kcat/Km of 1.2 × 10⁷ M⁻¹ s⁻¹. Additional substrates include N-6-benzyl adenine, cis- and trans-zeatin, kinetin, O-6-methylguanine, N-6-butyladenine, N-6-methyladenine, N,N-dimethyladenine, 6-methoxypurine, 6-chloropurine, and 6-thiomethylpurine. This enzyme does not catalyze the deamination of adenine or adenosine. A comparative model of Patl2390 was computed using the three-dimensional crystal structure of Pa0148 (PDB code: 3PAO) as a structural template and docking was used to refine the model to accommodate experimentally identified substrates. This is the first identification of an enzyme that will hydrolyze an N-6 substituted side chain larger than methylamine from adenine.
Structural modeling of macromolecular complexes greatly benefits from interactive visualization capabilities. Here we present the integration of several modeling tools into UCSF Chimera. These include comparative modeling by MODELLER, simultaneous fitting of multiple components into electron microscopy density maps by IMP MultiFit, computing of small-angle X-ray scattering profiles and fitting of the corresponding experimental profile by IMP FoXS, and assessment of amino acid sidechain conformations based on rotamer probabilities and local interactions by Chimera.
Integrative structural modeling; restraint-based modeling; electron microscopy; small-angle X-ray scattering; interactive molecular visualization
Integration of EM, protein–protein interaction, and phenotypic data reveals novel insights into the structure and function of the nuclear pore complex’s ∼600-kD heptameric Nup84 complex.
The nuclear pore complex (NPC) is a multiprotein assembly that serves as the sole mediator of nucleocytoplasmic exchange in eukaryotic cells. In this paper, we use an integrative approach to determine the structure of an essential component of the yeast NPC, the ∼600-kD heptameric Nup84 complex, to a precision of ∼1.5 nm. The configuration of the subunit structures was determined by satisfaction of spatial restraints derived from a diverse set of negative-stain electron microscopy and protein domain–mapping data. Phenotypic data were mapped onto the complex, allowing us to identify regions that stabilize the NPC’s interaction with the nuclear envelope membrane and connect the complex to the rest of the NPC. Our data allow us to suggest how the Nup84 complex is assembled into the NPC and propose a scenario for the evolution of the Nup84 complex through a series of gene duplication and loss events. This work demonstrates that integrative approaches based on low-resolution data of sufficient quality can generate functionally informative structures at intermediate resolution.
Recent technological advances enabled high-throughput collection of Small Angle X-ray Scattering (SAXS) profiles of biological macromolecules. Thus, computational methods for integrating SAXS profiles into structural modeling are needed more than ever. Here, we review specifically the use of SAXS profiles for the structural modeling of proteins, nucleic acids, and their complexes. First, the approaches for computing theoretical SAXS profiles from structures are presented. Second, computational methods for predicting protein structures, dynamics of proteins in solution, and assembly structures are covered. Third, we discuss the use of SAXS profiles in integrative structure modeling approaches that depend simultaneously on several data types.
Small Angle X-ray Scattering (SAXS); Protein structure prediction; Macromolecular assembly; Integrative modeling
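One standard way to compute a theoretical SAXS profile from a structure is the Debye formula, I(q) = Σ_i Σ_j f_i f_j sin(q r_ij) / (q r_ij). The sketch below is a bare-bones illustration of that formula only: it assumes constant, q-independent form factors and ignores excluded solvent and the hydration layer, which practical methods account for. The function name and coordinates are hypothetical.

```python
import math


def debye_intensity(coords, form_factors, q):
    """Scattering intensity at scattering vector q from the Debye formula."""
    total = 0.0
    for i, ri in enumerate(coords):
        for j, rj in enumerate(coords):
            x = q * math.dist(ri, rj)
            # sin(x)/x tends to 1 as x approaches 0 (same atom or q = 0).
            sinc = 1.0 if x < 1e-12 else math.sin(x) / x
            total += form_factors[i] * form_factors[j] * sinc
    return total


# Toy "structure": three atoms (coordinates in Å) with unit form factors.
atoms = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
factors = [1.0, 1.0, 1.0]
profile = [(q, debye_intensity(atoms, factors, q)) for q in (0.0, 0.1, 0.2, 0.3)]
for q, intensity in profile:
    print(f"q = {q:.1f}  I(q) = {intensity:.3f}")
```

The double sum scales quadratically with the number of atoms, which is one reason faster approximations are used for large assemblies.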
Virtual ligand screening uses computation to discover new ligands of a protein by screening one or more of its structural models against a database of potential ligands. Comparative protein structure modeling extends the applicability of virtual screening beyond the atomic structures determined by X-ray crystallography or NMR spectroscopy. Here, we describe an integrated modeling and docking protocol, combining comparative modeling by MODELLER and virtual ligand screening by DOCK.
comparative modeling; virtual screening; ligand docking
Nuclear pore complexes (NPCs), responsible for the nucleo-cytoplasmic exchange of proteins and nucleic acids, are dynamic macromolecular assemblies forming an eight-fold symmetric co-axial ring structure. Yeast (Saccharomyces cerevisiae) NPCs are made up of at least 456 polypeptide chains of ~30 distinct sequences. Many of these components (nucleoporins, Nups) share similar structural motifs and form stable subcomplexes. We have determined a high-resolution crystal structure of the C-terminal domain of yeast Nup133 (ScNup133), a component of the heptameric Nup84 subcomplex. Expression tests yielded ScNup133(944-1157) that produced crystals diffracting to 1.9 Å resolution.
ScNup133(944-1157) adopts essentially an all α-helical fold, with a short two-stranded β-sheet at the C-terminus. The 11 α-helices of ScNup133(944-1157) form a compact fold. In contrast, the previously determined structure of human Nup133(934-1156) bound to a fragment of human Nup107 has its constituent α-helices arranged in two globular blocks. These differences may reflect structural divergence among homologous nucleoporins.
Nuclear Pore Complex; Nup133; structural genomics
G-Protein coupled receptors (GPCRs) are intensely studied as drug targets and for their role in signaling. With the determination of the first crystal structures, interest in structure-based ligand discovery has increased. Unfortunately, most GPCRs lack experimental structures. The determination of the D3 receptor structure, and a community challenge to predict it, enabled a fully prospective comparison of ligand discovery from a modeled structure versus that of the subsequently released crystal structure. Over 3.3 million molecules were docked against a homology model, and 26 of the highest ranking were tested for binding. Six had affinities from 0.2 to 3.1 μM. Subsequently, the crystal structure was released and the docking screen repeated. Of the 25 compounds selected, five had affinities from 0.3 to 3.0 μM. One of the novel ligands from the homology model screen was optimized for affinity to 81 nM. The feasibility of docking screens against modeled GPCRs more generally is considered.
Adenine deaminase (ADE) catalyzes the conversion of adenine to hypoxanthine and ammonia. The enzyme isolated from Escherichia coli using standard expression conditions had low activity for the deamination of adenine (kcat = 2.0 s⁻¹; kcat/Km = 2.5 × 10³ M⁻¹ s⁻¹). However, when iron was sequestered with a metal chelator and the growth medium was supplemented with Mn²⁺ prior to induction, the purified enzyme was substantially more active for the deamination of adenine, with values of kcat and kcat/Km of 200 s⁻¹ and 5 × 10⁵ M⁻¹ s⁻¹, respectively. The apo-enzyme was prepared and reconstituted with Fe²⁺, Zn²⁺, or Mn²⁺. In each case, two enzyme-equivalents of metal were necessary for reconstitution of the deaminase activity. This work provides the first example of any member within the deaminase sub-family of the amidohydrolase superfamily (AHS) to utilize a binuclear metal center for the catalysis of a deamination reaction. [FeII/FeII]-ADE was oxidized to [FeIII/FeIII]-ADE with ferricyanide, with loss of the deaminase activity. Reducing [FeIII/FeIII]-ADE with dithionite restored the deaminase activity and thus the di-ferrous form of the enzyme is essential for catalytic activity. No evidence for spin-coupling between metal ions was observed by EPR or Mössbauer spectroscopies. The three-dimensional structure of adenine deaminase from Agrobacterium tumefaciens (Atu4426) was determined by X-ray crystallography at 2.2 Å resolution and adenine was modeled into the active site based on homology to other members of the amidohydrolase superfamily. Based on the model of the adenine-ADE complex and subsequent mutagenesis experiments, the roles for each of the highly conserved residues were proposed. Solvent isotope effects, pH rate profiles and solvent viscosity were utilized to propose a chemical reaction mechanism and the identity of the rate limiting steps.
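The reported kinetic constants can be compared with a few lines of arithmetic. The sketch assumes simple Michaelis–Menten kinetics, so that Km is kcat divided by kcat/Km. The Km values and fold changes it prints are derived here for illustration and are not reported in the abstract; the function and variable names are hypothetical.

```python
def michaelis_constant(kcat, kcat_over_km):
    """Km in molar units, from kcat (1/s) and kcat/Km (1/(M s))."""
    return kcat / kcat_over_km


# Values quoted in the abstract for the two enzyme preparations.
standard = {"kcat": 2.0, "kcat_over_km": 2.5e3}
mn_supplemented = {"kcat": 200.0, "kcat_over_km": 5.0e5}

for label, prep in (("standard", standard), ("Mn-supplemented", mn_supplemented)):
    km = michaelis_constant(prep["kcat"], prep["kcat_over_km"])
    print(f"{label}: Km = {km * 1e3:.2f} mM")

kcat_fold = mn_supplemented["kcat"] / standard["kcat"]
efficiency_fold = mn_supplemented["kcat_over_km"] / standard["kcat_over_km"]
print(f"kcat increases {kcat_fold:.0f}-fold, kcat/Km increases {efficiency_fold:.0f}-fold")
```

Under this assumption, most of the gain in catalytic efficiency comes from the higher turnover number, with a smaller contribution from a lower apparent Km.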