The nuclear pore complex, composed of proteins termed nucleoporins (Nups), is responsible for nucleocytoplasmic transport in eukaryotes. Nuclear pore complexes (NPCs) form an annular structure composed of the nuclear ring, cytoplasmic ring, a membrane ring, and two inner rings. Nup192 is a major component of the NPC’s inner ring. We report the crystal structure of Saccharomyces cerevisiae Nup192 residues 2–960 [ScNup192(2–960)], which adopts an α-helical fold with three domains (i.e., D1, D2, and D3). Small-angle X-ray scattering and electron microscopy (EM) studies reveal that ScNup192(2–960) could undergo a long-range transition between “open” and “closed” conformations. We obtained a structural model of full-length ScNup192 based on EM, the structure of ScNup192(2–960), and homology modeling. Evolutionary analyses using the ScNup192(2–960) structure suggest that NPCs and vesicle-coating complexes are descended from a common membrane-coating ancestral complex. We show that suppression of Nup192 expression leads to compromised nuclear transport and hypothesize a role for Nup192 in modulating the permeability of the NPC central channel.
Hemoglobin is a complex system that undergoes conformational changes in response to oxygen, allosteric effectors, mutations, and environmental changes. Here, we study allostery and polymerization of hemoglobin and its variants by application of two previously described methods: (i) AllosMod for simulating allostery dynamics given two allosterically related input structures and (ii) a machine-learning method for dynamics- and structure-based prediction of the mutation impact on allostery (Weinkam et al. J. Mol. Biol. 2013), now applicable to systems with multiple coupled binding sites such as hemoglobin. First, we predict the relative stabilities of substates and microstates of hemoglobin, which are determined primarily by entropy within our model. Next, we predict the impact of 866 annotated mutations on hemoglobin’s oxygen binding equilibrium. We then discuss a subset of 30 mutations that occur in the presence of the sickle cell mutation and whose effects on polymerization have been measured. Seven of these HbS mutations occur in three predicted druggable binding pockets that might be exploited to directly inhibit polymerization; one of these binding pockets is not apparent in the crystal structure but only in structures generated by AllosMod. For the 30 mutations, we predict that mutation-induced conformational changes within a single tetramer tend not to significantly impact polymerization; instead, these mutations more likely impact polymerization by directly perturbing a polymerization interface. Finally, our analysis of allostery allows us to hypothesize why hemoglobin evolved to have multiple subunits and a persistent low frequency sickle cell mutation.
Energy landscape; funnel; Gō model; molecular dynamics; machine-learning
The flexible and heterogeneous nature of carbohydrate chains often renders glycoproteins refractory to traditional structure determination methods. Small-angle X-ray scattering (SAXS) can be a useful tool for obtaining structural information on these systems. All-atom modeling of glycoproteins with flexible glycan chains was applied to interpret the solution SAXS data for a set of glycoproteins. For simpler systems (single glycan, with a well-defined protein structure), all-atom modeling generates models in excellent agreement with the scattering pattern, and reveals the approximate spatial occupancy of the glycan chain in solution. For more complex systems (several glycan chains, or unknown protein substructure), the approach can still provide insightful models, though the orientations of glycans become poorly determined. Ab initio shape reconstructions appear to capture the global morphology of glycoproteins, but in most cases offer little information about glycan spatial occupancy. The all-atom modeling methodology is available as a webserver at http://modbase.compbio.ucsf.edu/allosmod-foxs.
Allostery in a protein involves effector binding at an allosteric site that changes the structure and/or dynamics at a distant, functional site. In addition to the chemical equilibrium of ligand binding, allostery involves a conformational equilibrium between one protein substate that binds the effector and a second substate that less strongly binds the effector. We run molecular dynamics simulations using simple, smooth energy landscapes to sample specific ligand-induced conformational transitions, as defined by the effector-bound and unbound protein structures. These simulations can be performed using our web server: http://salilab.org/allosmod/. We then develop a set of features to analyze the simulations and capture the relevant thermodynamic properties of the allosteric conformational equilibrium. These features are based on molecular mechanics energy functions, stereochemical effects, and structural/dynamic coupling between sites. Using a machine-learning algorithm on a dataset of 10 proteins and 179 mutations, we predict both the magnitude and sign of the allosteric conformational equilibrium shift by the mutation; the impact of a large identifiable fraction of the mutations can be predicted with an average unsigned error of 1 kBT. With similar accuracy, we predict the mutation effects for an 11th protein that was omitted from the initial training and testing of the machine-learning algorithm. We also assess which calculated thermodynamic properties contribute most to the accuracy of the prediction.
energy landscape; protein dynamics; machine learning; allostery
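As a rough illustration of what a shift of 1 kBT in the allosteric conformational equilibrium means, the sketch below converts a free energy difference between two substates into their Boltzmann populations. It assumes a strictly two-state model; the function name and the example values (a wild type with equally stable substates, and a hypothetical mutation that destabilizes one substate by 1 kBT) are illustrative and are not taken from the study.

```python
import math

def substate_populations(delta_g_kbt):
    """Two-state Boltzmann populations.

    delta_g_kbt is G(B) - G(A) in units of kBT (assumed two-state model).
    Returns (population of A, population of B).
    """
    k_eq = math.exp(-delta_g_kbt)   # equilibrium constant [B]/[A]
    p_b = k_eq / (1.0 + k_eq)
    return 1.0 - p_b, p_b

# Hypothetical wild type: both substates equally stable.
p_a_wt, p_b_wt = substate_populations(0.0)
# Hypothetical mutant: substate B destabilized by 1 kBT.
p_a_mut, p_b_mut = substate_populations(1.0)

print(f"wild type: A={p_a_wt:.3f}, B={p_b_wt:.3f}")
print(f"mutant:    A={p_a_mut:.3f}, B={p_b_mut:.3f}")
```

Under these assumptions, an error of 1 kBT in the predicted shift corresponds to a factor of e in the predicted ratio of the two substate populations.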
Of the over 22 million protein sequences in the nonredundant TrEMBL database, fewer than 1% have experimentally confirmed functions. Structure-based methods have been used to predict enzyme activities from experimentally determined structures; however, for the vast majority of proteins, no such structures are available. Here, homology models of a functionally uncharacterized amidohydrolase from Agrobacterium radiobacter K84 (Arad3529) were computed based on a remote template structure. The protein backbone of two loops near the active site was remodeled, resulting in four distinct active site conformations. Substrates of Arad3529 were predicted by docking of 57,672 high-energy intermediate (HEI) forms of 6,440 metabolites against these four homology models. Based on docking ranks and geometries, a set of modified pterins was suggested as candidate substrates for Arad3529. The predictions were tested by enzymology experiments, and Arad3529 deaminated many pterin metabolites (substrate, kcat/Km [M⁻¹ s⁻¹]): formylpterin, 5.2 × 10⁶; pterin-6-carboxylate, 4.0 × 10⁶; pterin-7-carboxylate, 3.7 × 10⁶; pterin, 3.3 × 10⁶; hydroxymethylpterin, 1.2 × 10⁶; biopterin, 1.0 × 10⁶; D-(+)-neopterin, 3.1 × 10⁵; isoxanthopterin, 2.8 × 10⁵; sepiapterin, 1.3 × 10⁵; folate, 1.3 × 10⁵; xanthopterin, 1.17 × 10⁵; 7,8-dihydrohydroxymethylpterin, 3.3 × 10⁴. While pterin is a ubiquitous oxidative product of folate degradation, genomic analysis suggests that the first step of an undescribed pterin degradation pathway is catalyzed by Arad3529. Homology model-based virtual screening, especially with modeling of protein backbone flexibility, may be broadly useful for enzyme function annotation and discovering new pathways and drug targets.
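To make the spread of catalytic efficiencies easier to compare, the sketch below tabulates the kcat/Km values listed in the abstract, reading each value as a power of ten (for example, 5.2 × 10^6 M^-1 s^-1), and ranks the substrates relative to the best one. The variable names are illustrative, and the ranking is only a convenience view of the reported numbers, not an additional analysis from the study.

```python
# kcat/Km values in M^-1 s^-1, copied from the abstract above.
KCAT_OVER_KM = {
    "formylpterin": 5.2e6,
    "pterin-6-carboxylate": 4.0e6,
    "pterin-7-carboxylate": 3.7e6,
    "pterin": 3.3e6,
    "hydroxymethylpterin": 1.2e6,
    "biopterin": 1.0e6,
    "D-(+)-neopterin": 3.1e5,
    "isoxanthopterin": 2.8e5,
    "sepiapterin": 1.3e5,
    "folate": 1.3e5,
    "xanthopterin": 1.17e5,
    "7,8-dihydrohydroxymethylpterin": 3.3e4,
}

# Sort from most to least efficiently deaminated substrate.
ranked = sorted(KCAT_OVER_KM.items(), key=lambda item: item[1], reverse=True)
best_value = ranked[0][1]

for name, value in ranked:
    # Efficiency relative to the best substrate (1.0 = best).
    print(f"{name:32s} {value:10.2e}  {value / best_value:6.3f}")

# Substrates at or above 1e6 M^-1 s^-1.
high_efficiency = [name for name, value in ranked if value >= 1.0e6]
```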
A statistical method to merge SAXS profiles using Gaussian processes is presented.
Small-angle X-ray scattering (SAXS) is an experimental technique that allows structural information on biomolecules in solution to be gathered. High-quality SAXS profiles have typically been obtained by manual merging of scattering profiles from different concentrations and exposure times. This procedure is very subjective and results vary from user to user. Up to now, no robust automatic procedure has been published to perform this step, preventing the application of SAXS to high-throughput projects. Here, SAXS Merge, a fully automated statistical method for merging SAXS profiles using Gaussian processes, is presented. This method requires only the buffer-subtracted SAXS profiles in a specific order. At the heart of its formulation is non-linear interpolation using Gaussian processes, which provides a statement of the problem that accounts for correlation in the data.
SAXS; SANS; data curation; Gaussian process; merging
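The idea of interpolating pooled scattering data with a Gaussian process can be sketched in a few lines. The example below is a toy, not the SAXS Merge model: it assumes a zero-mean process with a squared-exponential covariance and fixed hyperparameters, pools two synthetic buffer-subtracted "profiles" that follow a Guinier-like curve with different q ranges and error bars, and evaluates the posterior mean at new q values. The alternating offsets are a deterministic stand-in for noise, and all function names and numbers are assumptions made for illustration.

```python
import math

def sq_exp_kernel(q1, q2, amplitude, length):
    """Squared-exponential covariance between two q values."""
    return amplitude ** 2 * math.exp(-0.5 * ((q1 - q2) / length) ** 2)

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def gp_posterior_mean(q_obs, i_obs, sigma_obs, q_new, amplitude=1.0, length=0.05):
    """Posterior mean of a zero-mean GP given points with per-point errors."""
    n = len(q_obs)
    # Covariance of the observations: kernel plus measurement variance.
    cov = [[sq_exp_kernel(q_obs[i], q_obs[j], amplitude, length)
            + (sigma_obs[i] ** 2 if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    weights = solve(cov, i_obs)
    return [sum(sq_exp_kernel(q, q_obs[j], amplitude, length) * weights[j]
                for j in range(n)) for q in q_new]

def guinier(q, rg=20.0):
    """Toy 'true' intensity: Guinier curve with I(0) = 1 (assumed Rg)."""
    return math.exp(-(q * rg) ** 2 / 3.0)

# Two synthetic profiles on a 0.01 grid: (grid index, error bar).
points = [(k, 0.005) for k in range(1, 13)] + [(k, 0.02) for k in range(8, 21)]
q_obs, i_obs, sigma_obs = [], [], []
for k, sigma in points:
    q = 0.01 * k
    offset = sigma if k % 2 == 0 else -sigma   # deterministic stand-in for noise
    q_obs.append(q)
    i_obs.append(guinier(q) + offset)
    sigma_obs.append(sigma)

q_new = [0.055, 0.105, 0.155]
merged = gp_posterior_mean(q_obs, i_obs, sigma_obs, q_new)
for q, value in zip(q_new, merged):
    print(f"q={q:.3f}  merged={value:.4f}  true={guinier(q):.4f}")
```

Because the covariance matrix includes each point's own variance, points with larger error bars automatically contribute less to the merged curve, which is the property that makes manual weighting unnecessary in this kind of scheme.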
Motivation: Structural characterization of protein interactions is necessary for understanding and modulating biological processes. On one hand, X-ray crystallography and NMR spectroscopy provide atomic-resolution structures, but the data collection process is typically long and the success rate is low. On the other hand, computational methods for modeling assembly structures from individual components frequently suffer from a high false-positive rate, rarely resulting in a unique solution.
Results: Here, we present a combined approach that computationally integrates data from a variety of fast and accessible experimental techniques for rapid and accurate structure determination of protein–protein complexes. The integrative method uses atomistic models of two interacting proteins and one or more datasets from five accessible experimental techniques: a small-angle X-ray scattering (SAXS) profile, 2D class average images from negative-stain electron microscopy micrographs (EM), a 3D density map from single-particle negative-stain EM, residue type content of the protein–protein interface from NMR spectroscopy and chemical cross-linking detected by mass spectrometry. The method is tested on a docking benchmark consisting of 176 known complex structures and simulated experimental data. The near-native model is the top scoring one for up to 61% of benchmark cases depending on the included experimental datasets; in comparison to 10% for standard computational docking. We also collected SAXS, 2D class average images and 3D density map from negative-stain EM to model the PCSK9 antigen–J16 Fab antibody complex, followed by validation of the model by a subsequently available X-ray crystallographic structure.
Supplementary data are available at Bioinformatics online.
ModBase (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by ModPipe, an automated modeling pipeline that relies primarily on Modeller for fold assignment, sequence-structure alignment, model building and model assessment (http://salilab.org/modeller/). ModBase currently contains almost 30 million reliable models for domains in 4.7 million unique protein sequences. ModBase allows users to compute or update comparative models on demand, through an interface to the ModWeb modeling server (http://salilab.org/modweb). ModBase models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/). Recently developed associated resources include the AllosMod server for modeling ligand-induced protein dynamics (http://salilab.org/allosmod), the AllosMod-FoXS server for predicting a structural ensemble that fits an SAXS profile (http://salilab.org/allosmod-foxs), the FoXSDock server for protein–protein docking filtered by an SAXS profile (http://salilab.org/foxsdock), the SAXS Merge server for automatic merging of SAXS profiles (http://salilab.org/saxsmerge) and the Pose & Rank server for scoring protein–ligand complexes (http://salilab.org/poseandrank). In this update, we also highlight two applications of ModBase: a PSI:Biology initiative to maximize the structural coverage of the human alpha-helical transmembrane proteome and a determination of structural determinants of human immunodeficiency virus-1 protease specificity.
Enamel matrix self-assembly has long been suggested as the driving force behind aligned nanofibrous hydroxyapatite formation. We tested whether amelogenin, the main enamel matrix protein, can self-assemble into ribbon-like structures in physiologic solutions. Ribbons 17 nm wide were observed to grow several microns in length, requiring calcium, phosphate, and pH 4.0–6.0. The pH range suggests that the formation of ion bridges through protonated histidine residues is essential to self-assembly, supported by a statistical analysis of 212 phosphate-binding proteins predicting twelve phosphate-binding histidines. Thermophoretic analysis verified the importance of calcium and phosphate in self-assembly. X-ray scattering characterized amelogenin dimers with dimensions fitting the cross-section of the amelogenin ribbon, leading to the hypothesis that antiparallel dimers are the building blocks of the ribbons. Over 5–7 days, ribbons self-organized into bundles composed of aligned ribbons mimicking the structure of enamel crystallites in enamel rods. These observations confirm reports of filamentous organic components in developing enamel and provide a new model for matrix-templated enamel mineralization.
Enamel; amelogenin; self-assembly; protonated histidine; biomineralization
Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), infects an estimated two billion people worldwide and is the leading cause of mortality due to infectious disease. The development of new anti-TB therapeutics is required because of the emergence of multidrug-resistant strains as well as co-infection with other pathogens, especially HIV. Recently, the pharmaceutical company GlaxoSmithKline published the results of a high-throughput screen (HTS) of its two-million-compound library for anti-mycobacterial phenotypes. The screen revealed 776 compounds with significant activity against the M. tuberculosis H37Rv strain, including a subset of 177 prioritized compounds with high potency and low in vitro cytotoxicity. The next major challenge is the identification of the target proteins. Here, we use a computational approach that integrates historical bioassay data, chemical properties and structural comparisons of selected compounds to propose their potential targets in M. tuberculosis. We predicted 139 target–compound links, providing a necessary basis for further studies to characterize the mode of action of these compounds. The results from our analysis, including the predicted structural models, are available to the wider scientific community in the open source mode, to encourage further development of novel TB therapeutics.
Mycobacterium tuberculosis is a major worldwide pathogen infecting millions of individuals every year. Additionally, the number of antibiotic-resistant strains has increased dramatically over the last decades. To address this challenge, the pharmaceutical company GlaxoSmithKline has recently published the results of a large-scale high-throughput screen (HTS) that resulted in the release of 776 chemical compound structures active against tuberculosis. We have used this dataset of compounds as input to our computational approach that integrates historical bioassay data, chemical properties and structural comparisons. We propose 139 targets alongside their respective hit compounds and have made them open to the wider scientific community. Our hope is that the availability of the experimental data from GSK and our computational analysis will encourage further research providing validated therapeutic targets against this devastating disease.
Summary: Accurate alignment of protein sequences and/or structures is crucial for many biological analyses, including functional annotation of proteins, classifying protein sequences into families, and comparative protein structure modeling. Described here is a web interface to SALIGN, the versatile protein multiple sequence/structure alignment module of MODELLER. The web server automatically determines the best alignment procedure based on the inputs, while allowing the user to override default parameter values. Multiple alignments are guided by a dendrogram computed from a matrix of all pairwise alignment scores. When aligning sequences to structures, SALIGN uses structural environment information to place gaps optimally. If two multiple sequence alignments of related proteins are input to the server, a profile–profile alignment is performed. All features of the server have been previously optimized for accuracy, especially in the contexts of comparative modeling and identification of interacting protein partners.
Availability: The SALIGN web server is freely accessible to the academic community at http://salilab.org/salign. SALIGN is a module of the MODELLER software, also freely available to academic users (http://salilab.org/modeller).
The nuclear pore complex (NPC), embedded in the nuclear envelope, is a large, dynamic molecular assembly that facilitates exchange of macromolecules between the nucleus and cytoplasm. The yeast NPC is an eight-fold symmetric annular structure composed of ~456 polypeptide chains contributed by ~30 distinct proteins termed nucleoporins (Nups). Nup116, identified only in fungi, plays a central role in both protein import and mRNA export through the NPC. Nup116 is a modular protein with N-terminal “FG” repeats containing a Gle2p-binding sequence motif (GLEBS motif) and an NPC targeting domain at its C-terminus. We report the crystal structure of the NPC targeting domain of Candida glabrata Nup116, consisting of residues 882-1034 [CgNup116(882-1034)], at 1.94 Å resolution. The X-ray structure of CgNup116(882-1034) is consistent with the molecular envelope determined in solution by small-angle X-ray scattering (SAXS). Structural similarities of CgNup116(882-1034) with homologous domains from Saccharomyces cerevisiae Nup116, S. cerevisiae Nup145N, and human Nup98 are discussed.
Nuclear Pore Complex; Nup116; Nup98; Nup100; Nup145; mRNA export; structural genomics
Although nearly half of today’s major pharmaceutical drugs target human integral membrane proteins (hIMPs), only 30 hIMP structures are currently available in the Protein Data Bank, largely owing to inefficiencies in protein production. Here we describe a strategy for the rapid structure determination of hIMPs, using solution NMR spectroscopy with systematically labeled proteins produced via cell-free expression. We report new backbone structures of six hIMPs, solved in only 18 months from 15 initial targets. Application of our protocols to an additional 135 hIMPs with molecular weight <30 kDa yielded 38 hIMPs suitable for structural characterization by solution NMR spectroscopy without additional optimization.
Restriction factors, such as the retroviral complementary DNA deaminase APOBEC3G, are cellular proteins that dominantly block virus replication [1–3]. The AIDS virus, human immunodeficiency virus type 1 (HIV-1), produces the accessory factor Vif, which counteracts the host’s antiviral defence by hijacking a ubiquitin ligase complex, containing CUL5, ELOC, ELOB and a RING-box protein, and targeting APOBEC3G for degradation [4–10]. Here we reveal, using an affinity tag/purification mass spectrometry approach, that Vif additionally recruits the transcription cofactor CBF-β to this ubiquitin ligase complex. CBF-β, which normally functions in concert with RUNX DNA binding proteins, allows the reconstitution of a recombinant six-protein assembly that elicits specific polyubiquitination activity with APOBEC3G, but not the related deaminase APOBEC3A. Using RNA knockdown and genetic complementation studies, we also demonstrate that CBF-β is required for Vif-mediated degradation of APOBEC3G and therefore for preserving HIV-1 infectivity. Finally, simian immunodeficiency virus (SIV) Vif also binds to and requires CBF-β to degrade rhesus macaque APOBEC3G, indicating functional conservation. Methods of disrupting the CBF-β–Vif interaction might enable HIV-1 restriction and provide a supplement to current antiviral therapies that primarily target viral proteins.
Human immunodeficiency virus (HIV) has a small genome and therefore relies heavily on the host cellular machinery to replicate. Identifying which host proteins and complexes come into physical contact with the viral proteins is crucial for a comprehensive understanding of how HIV rewires the host’s cellular machinery during the course of infection. Here we report the use of affinity tagging and purification mass spectrometry [1–3] to determine systematically the physical interactions of all 18 HIV-1 proteins and polyproteins with host proteins in two different human cell lines (HEK293 and Jurkat). Using a quantitative scoring system that we call MiST, we identified with high confidence 497 HIV–human protein–protein interactions involving 435 individual human proteins, with ~40% of the interactions being identified in both cell types. We found that the host proteins hijacked by HIV, especially those found interacting in both cell types, are highly conserved across primates. We uncovered a number of host complexes targeted by viral proteins, including the finding that HIV protease cleaves eIF3d, a subunit of eukaryotic translation initiation factor 3. This host protein is one of eleven identified in this analysis that act to inhibit HIV replication. This data set facilitates a more comprehensive and detailed understanding of how the host machinery is manipulated during the course of HIV infection.
Applications in structural biology and medicinal chemistry require protein-ligand scoring functions for two distinct tasks: (i) ranking different poses of a small molecule in a protein binding site; and (ii) ranking different small molecules by their complementarity to a protein site. Using probability theory, we developed two atomic distance-dependent statistical scoring functions: PoseScore was optimized for recognizing native binding geometries of ligands from other poses and RankScore was optimized for distinguishing ligands from nonbinding molecules. Both scores are based on a set of 8,885 crystallographic structures of protein-ligand complexes, but differ in the values of three key parameters. Factors influencing the accuracy of scoring were investigated, including the maximal atomic distance and non-native ligand geometries used for scoring, as well as the use of protein models instead of crystallographic structures for training and testing the scoring function. For the test set of 19 targets, RankScore improved the ligand enrichment (logAUC) and early enrichment (EF1) scores computed by DOCK 3.6 for 13 and 14 targets, respectively. In addition, RankScore performed better at rescoring than each of seven other scoring functions tested. Accepting both the crystal structure and decoy geometries with all-atom root-mean-square errors of up to 2 Å from the crystal structure as correct binding poses, PoseScore gave the best score to a correct binding pose among 100 decoys for 88% of all cases in a benchmark set containing 100 protein-ligand complexes. PoseScore accuracy is comparable to that of DrugScoreCSD and ITScore/SE, and superior to 12 other tested scoring functions. Therefore, RankScore can facilitate ligand discovery, by ranking complexes of the target with different small molecules; PoseScore can be used for protein-ligand complex structure prediction, by ranking different conformations of a given protein-ligand pair. 
The statistical potentials are available through the Integrative Modeling Platform (IMP) software package (http://salilab.org/imp/) and the LigScore web server (http://salilab.org/ligscore/).
statistical potential; reference state; binding pose; ligand enrichment
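The two evaluation tasks described above can be made concrete with a small sketch. It uses one common definition of the enrichment factor at a top fraction of the ranked database (the hit rate in the top fraction divided by the hit rate in the whole database) and the 2 Å RMSD criterion for a correct top-scored pose. The assumption that a lower score is better, the function names, and the toy data are illustrative; the exact EF1 and logAUC definitions used in the study may differ in detail.

```python
def enrichment_factor(scores, is_ligand, fraction=0.01):
    """Enrichment factor at the given top fraction of the ranked database.

    Assumes a lower score means a better-ranked molecule.
    """
    ranked = sorted(zip(scores, is_ligand), key=lambda pair: pair[0])
    n_top = max(1, int(round(fraction * len(ranked))))
    hits_top = sum(1 for _, ligand in ranked[:n_top] if ligand)
    hits_total = sum(1 for ligand in is_ligand if ligand)
    return (hits_top / n_top) / (hits_total / len(ranked))

def top_pose_is_correct(pose_scores, pose_rmsds, cutoff=2.0):
    """True if the best-scoring pose lies within `cutoff` angstroms RMSD
    of the crystal structure (lower score assumed better)."""
    best = min(range(len(pose_scores)), key=lambda i: pose_scores[i])
    return pose_rmsds[best] <= cutoff

# Toy database: 1000 molecules whose score equals their rank, 20 ligands,
# 5 of which sit in the top 10 positions.
ligand_ranks = set(range(5)) | set(range(100, 1000, 60))
scores = [float(i) for i in range(1000)]
is_ligand = [i in ligand_ranks for i in range(1000)]
ef1 = enrichment_factor(scores, is_ligand, fraction=0.01)
print(f"EF at 1%: {ef1:.1f}")

# Toy pose set: the second pose has the best score and a 1.1 A RMSD.
correct = top_pose_is_correct([-3.0, -5.0, -1.0], [4.2, 1.1, 0.0])
print(f"top pose correct: {correct}")
```

In the toy database, half of the top ten molecules are ligands while ligands make up 2% of the database overall, so the ranking is 25 times better than random selection at that cutoff.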
The Enzyme Function Initiative (EFI) was recently established to address the challenge of assigning reliable functions to enzymes discovered in bacterial genome projects; in this Current Topic we review the structure and operations of the EFI. The EFI includes the Superfamily/Genome, Protein, Structure, Computation, and Data/Dissemination Cores that provide the infrastructure for reliably predicting the in vitro functions of unknown enzymes. The initial targets for functional assignment are selected from five functionally diverse superfamilies (amidohydrolase, enolase, glutathione transferase, haloalkanoic acid dehalogenase, and isoprenoid synthase), with five superfamily-specific Bridging Projects experimentally testing the predicted in vitro enzymatic activities. The EFI also includes the Microbiology Core that evaluates the in vivo context of in vitro enzymatic functions and confirms the functional predictions of the EFI. The deliverables of the EFI to the scientific community include: 1) development of a large-scale, multidisciplinary sequence/structure-based strategy for functional assignment of unknown enzymes discovered in genome projects (target selection, protein production, structure determination, computation, experimental enzymology, microbiology, and structure-based annotation); 2) dissemination of the strategy to the community via publications, collaborations, workshops, and symposia; 3) computational and bioinformatic tools for using the strategy; 4) provision of experimental protocols and/or reagents for enzyme production and characterization; and 5) dissemination of data via the EFI’s website, enzymefunction.org. The realization of multidisciplinary strategies for functional assignment will begin to define the full metabolic diversity that exists in nature and will impact basic biochemical and evolutionary understanding, as well as a wide range of applications of central importance to industrial, medicinal and pharmaceutical efforts.
G protein-coupled receptors (GPCRs) are attractive targets for pharmaceutical research. With the recent determination of several GPCR X-ray structures, the applicability of structure-based computational methods for ligand identification, such as docking, has increased. Yet, as only about 1% of GPCRs have a known structure, receptor homology modeling remains necessary. In order to investigate the usability of homology models and the inherent selectivity of a particular model in relation to close homologs, we constructed multiple homology models for the A1 adenosine receptor (A1AR) and docked ∼2.2 M lead-like compounds. High-ranking molecules were tested on the A1AR as well as the close homologs A2AAR and A3AR. While the screen yielded numerous potent and novel ligands (hit rate 21% and highest affinity of 400 nM), it delivered few selective compounds. Moreover, most compounds appeared in the top ranks of only one model. These findings have implications for future screens.
An enzyme of unknown function within the amidohydrolase superfamily was discovered to catalyze the hydrolysis of N-6-substituted adenine derivatives, several of which are cytokinins. Cytokinins are a common type of plant hormone and N-6-substituted adenines are also found as modifications to tRNA. Patl2390, from Pseudoalteromonas atlantica T6c, was shown to hydrolytically deaminate N-6-isopentenyladenine to hypoxanthine and isopentenylamine with a kcat/Km of 1.2 × 10⁷ M⁻¹ s⁻¹. Additional substrates include N-6-benzyl adenine, cis- and trans-zeatin, kinetin, O-6-methylguanine, N-6-butyladenine, N-6-methyladenine, N,N-dimethyladenine, 6-methoxypurine, 6-chloropurine, and 6-thiomethylpurine. This enzyme does not catalyze the deamination of adenine or adenosine. A comparative model of Patl2390 was computed using the three-dimensional crystal structure of Pa0148 (PDB code: 3PAO) as a structural template and docking was used to refine the model to accommodate experimentally identified substrates. This is the first identification of an enzyme that will hydrolyze an N-6-substituted side chain larger than methylamine from adenine.
Structural modeling of macromolecular complexes greatly benefits from interactive visualization capabilities. Here we present the integration of several modeling tools into UCSF Chimera. These include comparative modeling by MODELLER, simultaneous fitting of multiple components into electron microscopy density maps by IMP MultiFit, computing of small-angle X-ray scattering profiles and fitting of the corresponding experimental profile by IMP FoXS, and assessment of amino acid sidechain conformations based on rotamer probabilities and local interactions by Chimera.
Integrative structural modeling; restraint-based modeling; electron microscopy; small-angle X-ray scattering; interactive molecular visualization
Integration of EM, protein–protein interaction, and phenotypic data reveals novel insights into the structure and function of the nuclear pore complex’s ∼600-kD heptameric Nup84 complex.
The nuclear pore complex (NPC) is a multiprotein assembly that serves as the sole mediator of nucleocytoplasmic exchange in eukaryotic cells. In this paper, we use an integrative approach to determine the structure of an essential component of the yeast NPC, the ∼600-kD heptameric Nup84 complex, to a precision of ∼1.5 nm. The configuration of the subunit structures was determined by satisfaction of spatial restraints derived from a diverse set of negative-stain electron microscopy and protein domain–mapping data. Phenotypic data were mapped onto the complex, allowing us to identify regions that stabilize the NPC’s interaction with the nuclear envelope membrane and connect the complex to the rest of the NPC. Our data allow us to suggest how the Nup84 complex is assembled into the NPC and propose a scenario for the evolution of the Nup84 complex through a series of gene duplication and loss events. This work demonstrates that integrative approaches based on low-resolution data of sufficient quality can generate functionally informative structures at intermediate resolution.
Recent technological advances have enabled high-throughput collection of small-angle X-ray scattering (SAXS) profiles of biological macromolecules. Thus, computational methods for integrating SAXS profiles into structural modeling are needed more than ever. Here, we review specifically the use of SAXS profiles for the structural modeling of proteins, nucleic acids, and their complexes. First, the approaches for computing theoretical SAXS profiles from structures are presented. Second, computational methods for predicting protein structures, dynamics of proteins in solution, and assembly structures are covered. Third, we discuss the use of SAXS profiles in integrative structure modeling approaches that depend simultaneously on several data types.
Small Angle X-ray Scattering (SAXS); Protein structure prediction; Macromolecular assembly; Integrative modeling
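One standard starting point for computing a theoretical SAXS profile from a structure is the Debye formula, I(q) = Σ_i Σ_j f_i f_j sin(q r_ij) / (q r_ij), summed over all pairs of scatterers. The sketch below is a minimal version under strong simplifying assumptions: constant, q-independent form factors and no excluded-volume or hydration-layer terms, which practical methods add in various ways. The function name and the two-atom example are illustrative only.

```python
import math

def debye_profile(coords, form_factors, q_values):
    """Scattering intensity from the Debye formula.

    coords: list of (x, y, z) positions in angstroms.
    form_factors: one constant form factor per scatterer (an assumption;
    real form factors depend on q and on the solvent model).
    q_values: momentum transfer values in inverse angstroms.
    """
    n = len(coords)
    profile = []
    for q in q_values:
        total = 0.0
        for i in range(n):
            total += form_factors[i] ** 2          # i == j term, sinc(0) = 1
            for j in range(i + 1, n):
                x = q * math.dist(coords[i], coords[j])
                sinc = 1.0 if x < 1e-12 else math.sin(x) / x
                # Each unordered pair appears twice in the double sum.
                total += 2.0 * form_factors[i] * form_factors[j] * sinc
        profile.append(total)
    return profile

# Two identical point scatterers 10 angstroms apart.
coords = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
form_factors = [1.0, 1.0]
q_values = [0.0, math.pi / 10.0, 0.5]
intensities = debye_profile(coords, form_factors, q_values)
for q, intensity in zip(q_values, intensities):
    print(f"q={q:.4f}  I(q)={intensity:.4f}")
```

The double loop scales with the square of the number of atoms, which is why distance histograms or coarse-grained representations are often used instead for large structures.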