Restriction factors, such as the retroviral complementary DNA deaminase APOBEC3G, are cellular proteins that dominantly block virus replication1-3. The AIDS virus, human immunodeficiency virus type 1 (HIV-1), produces the accessory factor Vif, which counteracts the host’s antiviral defence by hijacking a ubiquitin ligase complex, containing CUL5, ELOC, ELOB and a RING-box protein, and targeting APOBEC3G for degradation4-10. Here we reveal, using an affinity tag/purification mass spectrometry approach, that Vif additionally recruits the transcription cofactor CBF-β to this ubiquitin ligase complex. CBF-β, which normally functions in concert with RUNX DNA binding proteins, allows the reconstitution of a recombinant six-protein assembly that elicits specific polyubiquitination activity with APOBEC3G, but not the related deaminase APOBEC3A. Using RNA knockdown and genetic complementation studies, we also demonstrate that CBF-β is required for Vif-mediated degradation of APOBEC3G and therefore for preserving HIV-1 infectivity. Finally, simian immunodeficiency virus (SIV) Vif also binds to and requires CBF-β to degrade rhesus macaque APOBEC3G, indicating functional conservation. Methods of disrupting the CBF-β–Vif interaction might enable HIV-1 restriction and provide a supplement to current antiviral therapies that primarily target viral proteins.
Human immunodeficiency virus (HIV) has a small genome and therefore relies heavily on the host cellular machinery to replicate. Identifying which host proteins and complexes come into physical contact with the viral proteins is crucial for a comprehensive understanding of how HIV rewires the host’s cellular machinery during the course of infection. Here we report the use of affinity tagging and purification mass spectrometry1-3 to determine systematically the physical interactions of all 18 HIV-1 proteins and polyproteins with host proteins in two different human cell lines (HEK293 and Jurkat). Using a quantitative scoring system that we call MiST, we identified with high confidence 497 HIV–human protein–protein interactions involving 435 individual human proteins, with ~40% of the interactions being identified in both cell types. We found that the host proteins hijacked by HIV, especially those found interacting in both cell types, are highly conserved across primates. We uncovered a number of host complexes targeted by viral proteins, including the finding that HIV protease cleaves eIF3d, a subunit of eukaryotic translation initiation factor 3. This host protein is one of eleven identified in this analysis that act to inhibit HIV replication. This data set facilitates a more comprehensive and detailed understanding of how the host machinery is manipulated during the course of HIV infection.
Applications in structural biology and medicinal chemistry require protein-ligand scoring functions for two distinct tasks: (i) ranking different poses of a small molecule in a protein binding site; and (ii) ranking different small molecules by their complementarity to a protein site. Using probability theory, we developed two atomic distance-dependent statistical scoring functions: PoseScore was optimized for recognizing native binding geometries of ligands from other poses and RankScore was optimized for distinguishing ligands from nonbinding molecules. Both scores are based on a set of 8,885 crystallographic structures of protein-ligand complexes, but differ in the values of three key parameters. Factors influencing the accuracy of scoring were investigated, including the maximal atomic distance and non-native ligand geometries used for scoring, as well as the use of protein models instead of crystallographic structures for training and testing the scoring function. For the test set of 19 targets, RankScore improved the ligand enrichment (logAUC) and early enrichment (EF1) scores computed by DOCK 3.6 for 13 and 14 targets, respectively. In addition, RankScore performed better at rescoring than each of seven other scoring functions tested. Accepting both the crystal structure and decoy geometries with all-atom root-mean-square errors of up to 2 Å from the crystal structure as correct binding poses, PoseScore gave the best score to a correct binding pose among 100 decoys for 88% of all cases in a benchmark set containing 100 protein-ligand complexes. PoseScore accuracy is comparable to that of DrugScoreCSD and ITScore/SE, and superior to 12 other tested scoring functions. Therefore, RankScore can facilitate ligand discovery, by ranking complexes of the target with different small molecules; PoseScore can be used for protein-ligand complex structure prediction, by ranking different conformations of a given protein-ligand pair. 
The statistical potentials are available through the Integrative Modeling Platform (IMP) software package (http://salilab.org/imp/) and the LigScore web server (http://salilab.org/ligscore/).
statistical potential; reference state; binding pose; ligand enrichment
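The early-enrichment measure mentioned above can be illustrated with a short sketch. It assumes a common definition of the enrichment factor (the ligand rate in the top-ranked fraction of the database divided by the ligand rate in the whole database) and assumes that lower scores are better; the exact definitions used for EF1 and logAUC in the paper may differ, and the function name is hypothetical.

```python
def enrichment_factor(scores, is_ligand, fraction=0.01):
    """Enrichment factor at the top `fraction` of a ranked database.

    Assumption: lower score means better predicted binding.
    `is_ligand` holds 1 for known ligands and 0 for decoys.
    """
    n = len(scores)
    total_ligands = sum(is_ligand)
    if n == 0 or total_ligands == 0:
        raise ValueError("need a non-empty database with at least one ligand")
    n_top = max(1, int(round(n * fraction)))
    # Rank molecule indices from best (lowest) to worst (highest) score.
    order = sorted(range(n), key=lambda i: scores[i])
    hits_in_top = sum(is_ligand[i] for i in order[:n_top])
    # Ligand rate in the top subset relative to the rate in the whole database.
    return (hits_in_top / n_top) / (total_ligands / n)
```

A value of 1 corresponds to random ranking; larger values mean that ligands are concentrated near the top of the ranked list.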
The Enzyme Function Initiative (EFI) was recently established to address the challenge of assigning reliable functions to enzymes discovered in bacterial genome projects; in this Current Topic we review the structure and operations of the EFI. The EFI includes the Superfamily/Genome, Protein, Structure, Computation, and Data/Dissemination Cores that provide the infrastructure for reliably predicting the in vitro functions of unknown enzymes. The initial targets for functional assignment are selected from five functionally diverse superfamilies (amidohydrolase, enolase, glutathione transferase, haloalkanoic acid dehalogenase, and isoprenoid synthase), with five superfamily-specific Bridging Projects experimentally testing the predicted in vitro enzymatic activities. The EFI also includes the Microbiology Core that evaluates the in vivo context of in vitro enzymatic functions and confirms the functional predictions of the EFI. The deliverables of the EFI to the scientific community include: 1) development of a large-scale, multidisciplinary sequence/structure-based strategy for functional assignment of unknown enzymes discovered in genome projects (target selection, protein production, structure determination, computation, experimental enzymology, microbiology, and structure-based annotation); 2) dissemination of the strategy to the community via publications, collaborations, workshops, and symposia; 3) computational and bioinformatic tools for using the strategy; 4) provision of experimental protocols and/or reagents for enzyme production and characterization; and 5) dissemination of data via the EFI’s website, enzymefunction.org. The realization of multidisciplinary strategies for functional assignment will begin to define the full metabolic diversity that exists in nature and will impact basic biochemical and evolutionary understanding, as well as a wide range of applications of central importance to industrial, medicinal and pharmaceutical efforts.
G protein-coupled receptors (GPCRs) are attractive targets for pharmaceutical research. With the recent determination of several GPCR X-ray structures, the applicability of structure-based computational methods for ligand identification, such as docking, has increased. Yet, as only about 1% of GPCRs have a known structure, receptor homology modeling remains necessary. In order to investigate the usability of homology models and the inherent selectivity of a particular model in relation to close homologs, we constructed multiple homology models for the A1 adenosine receptor (A1AR) and docked ∼2.2 M lead-like compounds. High-ranking molecules were tested on the A1AR as well as the close homologs A2AAR and A3AR. While the screen yielded numerous potent and novel ligands (hit rate 21% and highest affinity of 400 nM), it delivered few selective compounds. Moreover, most compounds appeared in the top ranks of only one model. These findings have implications for future screens.
An enzyme of unknown function within the amidohydrolase superfamily was discovered to catalyze the hydrolysis of N-6-substituted adenine derivatives, several of which are cytokinins. Cytokinins are a common type of plant hormone and N-6-substituted adenines are also found as modifications to tRNA. Patl2390, from Pseudoalteromonas atlantica T6c, was shown to hydrolytically deaminate N-6-isopentenyladenine to hypoxanthine and isopentenylamine with a kcat/Km of 1.2 × 10⁷ M⁻¹ s⁻¹. Additional substrates include N-6-benzyl adenine, cis- and trans-zeatin, kinetin, O-6-methylguanine, N-6-butyladenine, N-6-methyladenine, N,N-dimethyladenine, 6-methoxypurine, 6-chloropurine, and 6-thiomethylpurine. This enzyme does not catalyze the deamination of adenine or adenosine. A comparative model of Patl2390 was computed using the three-dimensional crystal structure of Pa0148 (PDB code: 3PAO) as a structural template and docking was used to refine the model to accommodate experimentally identified substrates. This is the first identification of an enzyme that will hydrolyze an N-6-substituted side chain larger than methylamine from adenine.
Structural modeling of macromolecular complexes greatly benefits from interactive visualization capabilities. Here we present the integration of several modeling tools into UCSF Chimera. These include comparative modeling by MODELLER, simultaneous fitting of multiple components into electron microscopy density maps by IMP MultiFit, computing of small-angle X-ray scattering profiles and fitting of the corresponding experimental profile by IMP FoXS, and assessment of amino acid sidechain conformations based on rotamer probabilities and local interactions by Chimera.
Integrative structural modeling; restraint-based modeling; electron microscopy; small-angle X-ray scattering; interactive molecular visualization
Integration of EM, protein–protein interaction, and phenotypic data reveals novel insights into the structure and function of the nuclear pore complex’s ∼600-kD heptameric Nup84 complex.
The nuclear pore complex (NPC) is a multiprotein assembly that serves as the sole mediator of nucleocytoplasmic exchange in eukaryotic cells. In this paper, we use an integrative approach to determine the structure of an essential component of the yeast NPC, the ∼600-kD heptameric Nup84 complex, to a precision of ∼1.5 nm. The configuration of the subunit structures was determined by satisfaction of spatial restraints derived from a diverse set of negative-stain electron microscopy and protein domain–mapping data. Phenotypic data were mapped onto the complex, allowing us to identify regions that stabilize the NPC’s interaction with the nuclear envelope membrane and connect the complex to the rest of the NPC. Our data allow us to suggest how the Nup84 complex is assembled into the NPC and propose a scenario for the evolution of the Nup84 complex through a series of gene duplication and loss events. This work demonstrates that integrative approaches based on low-resolution data of sufficient quality can generate functionally informative structures at intermediate resolution.
Recent technological advances enabled high-throughput collection of Small Angle X-ray Scattering (SAXS) profiles of biological macromolecules. Thus, computational methods for integrating SAXS profiles into structural modeling are needed more than ever. Here, we review specifically the use of SAXS profiles for the structural modeling of proteins, nucleic acids, and their complexes. First, the approaches for computing theoretical SAXS profiles from structures are presented. Second, computational methods for predicting protein structures, dynamics of proteins in solution, and assembly structures are covered. Third, we discuss the use of SAXS profiles in integrative structure modeling approaches that depend simultaneously on several data types.
Small Angle X-ray Scattering (SAXS); Protein structure prediction; Macromolecular assembly; Integrative modeling
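As an illustration of the first topic, computing a theoretical profile from a structure, the sketch below evaluates the Debye formula, I(q) = Σ_i Σ_j f_i f_j sin(q r_ij)/(q r_ij). It is a minimal sketch under simplifying assumptions that are not taken from the review: form factors are treated as constants independent of q, and the excluded-volume and hydration-layer contributions that practical methods include are ignored. The function name is hypothetical.

```python
import numpy as np

def debye_profile(coords, form_factors, q_values):
    """Scattering intensity I(q) from the Debye formula.

    Assumptions: q-independent form factors, no solvent terms.
    coords: (N, 3) atomic positions; q_values: (M,) scattering vector values.
    """
    coords = np.asarray(coords, dtype=float)
    f = np.asarray(form_factors, dtype=float)
    q = np.asarray(q_values, dtype=float)
    # All pairwise interatomic distances r_ij, shape (N, N).
    diff = coords[:, None, :] - coords[None, :, :]
    r = np.sqrt((diff ** 2).sum(axis=-1))
    qr = q[:, None, None] * r[None, :, :]
    # np.sinc(x) is sin(pi x)/(pi x), so sinc(qr/pi) equals sin(qr)/(qr)
    # and evaluates to 1 when qr is 0.
    kernel = np.sinc(qr / np.pi)
    return (np.outer(f, f)[None, :, :] * kernel).sum(axis=(1, 2))
```

The double sum scales quadratically with the number of atoms, which is one reason practical implementations rely on approximations such as distance histograms or coarse-grained representations.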
Virtual ligand screening uses computation to discover new ligands of a protein by screening one or more of its structural models against a database of potential ligands. Comparative protein structure modeling extends the applicability of virtual screening beyond the atomic structures determined by X-ray crystallography or NMR spectroscopy. Here, we describe an integrated modeling and docking protocol, combining comparative modeling by MODELLER and virtual ligand screening by DOCK.
comparative modeling; virtual screening; ligand docking
Nuclear pore complexes (NPCs), responsible for the nucleo-cytoplasmic exchange of proteins and nucleic acids, are dynamic macromolecular assemblies forming an eight-fold symmetric co-axial ring structure. Yeast (Saccharomyces cerevisiae) NPCs are made up of at least 456 polypeptide chains of ~30 distinct sequences. Many of these components (nucleoporins, Nups) share similar structural motifs and form stable subcomplexes. We have determined a high-resolution crystal structure of the C-terminal domain of yeast Nup133 (ScNup133), a component of the heptameric Nup84 subcomplex. Expression tests yielded ScNup133(944-1157) that produced crystals diffracting to 1.9Å resolution.
ScNup133(944-1157) adopts an essentially all-α-helical fold, with a short two-stranded β-sheet at the C-terminus. The 11 α-helices of ScNup133(944-1157) form a compact fold. In contrast, in the previously determined structure of human Nup133(934-1156) bound to a fragment of human Nup107, the constituent α-helices are arranged in two globular blocks. These differences may reflect structural divergence among homologous nucleoporins.
Nuclear Pore Complex; Nup133; structural genomics
G-Protein coupled receptors (GPCRs) are intensely studied as drug targets and for their role in signaling. With the determination of the first crystal structures, interest in structure-based ligand discovery has increased. Unfortunately, most GPCRs lack experimental structures. The determination of the D3 receptor structure, and a community challenge to predict it, enabled a fully prospective comparison of ligand discovery from a modeled structure versus that of the subsequently released crystal structure. Over 3.3 million molecules were docked against a homology model, and 26 of the highest ranking were tested for binding. Six had affinities from 0.2 to 3.1μM. Subsequently, the crystal structure was released and the docking screen repeated. Of the 25 compounds selected, five had affinities from 0.3 to 3.0μM. One of the novel ligands from the homology model screen was optimized for affinity to 81nM. The feasibility of docking screens against modeled GPCRs more generally is considered.
Adenine deaminase (ADE) catalyzes the conversion of adenine to hypoxanthine and ammonia. The enzyme isolated from Escherichia coli using standard expression conditions had low activity for the deamination of adenine (kcat = 2.0 s⁻¹; kcat/Km = 2.5 × 10³ M⁻¹ s⁻¹). However, when iron was sequestered with a metal chelator and the growth medium was supplemented with Mn²⁺ prior to induction, the purified enzyme was substantially more active for the deamination of adenine, with values of kcat and kcat/Km of 200 s⁻¹ and 5 × 10⁵ M⁻¹ s⁻¹, respectively. The apo-enzyme was prepared and reconstituted with Fe²⁺, Zn²⁺, or Mn²⁺. In each case, two enzyme-equivalents of metal were necessary for reconstitution of the deaminase activity. This work provides the first example of any member within the deaminase sub-family of the amidohydrolase superfamily (AHS) to utilize a binuclear metal center for the catalysis of a deamination reaction. [FeII/FeII]-ADE was oxidized to [FeIII/FeIII]-ADE with ferricyanide with inactivation of the deaminase activity. Reducing [FeIII/FeIII]-ADE with dithionite restored the deaminase activity and thus the di-ferrous form of the enzyme is essential for catalytic activity. No spin-coupling between the metal ions was evident by EPR or Mössbauer spectroscopy. The three-dimensional structure of adenine deaminase from Agrobacterium tumefaciens (Atu4426) was determined by X-ray crystallography at 2.2 Å resolution and adenine was modeled into the active site based on homology to other members of the amidohydrolase superfamily. Based on the model of the adenine-ADE complex and subsequent mutagenesis experiments, the roles for each of the highly conserved residues were proposed. Solvent isotope effects, pH rate profiles and solvent viscosity were utilized to propose a chemical reaction mechanism and the identity of the rate limiting steps.
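The kinetic constants quoted above can be related by simple arithmetic. The sketch below assumes Michaelis-Menten kinetics, so that Km is the ratio of kcat to kcat/Km; the Km values it produces are derived here for illustration and are not stated in the abstract. The function name is hypothetical.

```python
def km_from_constants(kcat, kcat_over_km):
    """Michaelis constant (M) from kcat (1/s) and kcat/Km (1/(M s)).

    Assumption: simple Michaelis-Menten kinetics.
    """
    return kcat / kcat_over_km

# Values quoted in the abstract.
km_standard = km_from_constants(2.0, 2.5e3)    # standard expression conditions
km_mn_grown = km_from_constants(200.0, 5.0e5)  # Mn-supplemented growth medium
efficiency_gain = 5.0e5 / 2.5e3                # fold change in kcat/Km

print(f"Km (standard): {km_standard * 1e3:.2f} mM")
print(f"Km (Mn-supplemented): {km_mn_grown * 1e3:.2f} mM")
print(f"fold change in kcat/Km: {efficiency_gain:.0f}")
```

Under this assumption, the 200-fold gain in catalytic efficiency comes mostly from the 100-fold increase in kcat, with the remainder from a twofold lower Km.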
While many structures of single protein components are becoming available, structural characterization of their complexes remains challenging. Methods for modeling assembly structures from individual components frequently suffer from large errors, due to protein flexibility and inaccurate scoring functions. However, when additional information is available, it may be possible to reduce the errors and compute near-native complex structures. One such type of information is a small angle X-ray scattering (SAXS) profile that can be collected in a high-throughput fashion from a small amount of sample in solution. Here, we present an efficient method for protein-protein docking with a SAXS profile (FoXSDock): generation of complex models by rigid global docking with PatchDock, filtering of the models based on the SAXS profile, clustering of the models, and refining the interface by flexible docking with FireDock. FoXSDock is benchmarked on 124 protein complexes with simulated SAXS profiles, as well as on 6 complexes with experimentally determined SAXS profiles. When induced fit is less than 1.5 Å interface Cα RMSD and the fraction of residues missing from the component structures is less than 3%, FoXSDock can find a model close to the native structure within the top 10 predictions in 77% of the cases; in comparison, docking alone succeeds in only 34% of the cases. Thus, the integrative approach significantly improves on molecular docking alone. The improvement arises from an increased resolution of rigid docking sampling and more accurate scoring.
Small Angle X-ray Scattering (SAXS); protein-protein docking; macromolecular assembly
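The filtering step of such a pipeline can be sketched as follows. This is a minimal sketch, assuming a commonly used χ score with a single least-squares scale factor between the experimental and computed profiles; the abstract does not specify the score used by FoXSDock, and the function names and the `keep` parameter are hypothetical.

```python
import numpy as np

def saxs_chi(i_exp, sigma, i_model):
    """Chi score between an experimental and a computed SAXS profile.

    Assumption: the model profile is matched to the data with one scale
    factor c, chosen by weighted least squares.
    """
    i_exp = np.asarray(i_exp, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    i_model = np.asarray(i_model, dtype=float)
    w = 1.0 / sigma ** 2
    c = np.sum(w * i_exp * i_model) / np.sum(w * i_model ** 2)
    return float(np.sqrt(np.mean(w * (i_exp - c * i_model) ** 2)))

def filter_models(i_exp, sigma, model_profiles, keep=10):
    """Return the `keep` best (chi, name) pairs, lowest chi first.

    model_profiles: dict mapping a model name to its computed profile.
    """
    scored = sorted(
        (saxs_chi(i_exp, sigma, profile), name)
        for name, profile in model_profiles.items()
    )
    return scored[:keep]
```

Docking models whose computed profiles disagree with the measured one receive a high χ and are discarded before the more expensive clustering and refinement stages.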
Two enzymes of unknown function from the amidohydrolase superfamily were discovered to catalyze the deamination of N-6-methyladenine to hypoxanthine and methylamine. The methylation of adenine in bacterial DNA is a common modification for the protection of host DNA against restriction endonucleases. The enzyme from Bacillus halodurans, Bh0637, catalyzes the deamination of N-6-methyladenine with a kcat of 185 s⁻¹ and a kcat/Km of 2.5 × 10⁶ M⁻¹ s⁻¹. Bh0637 catalyzes the deamination of N-6-methyladenine two orders of magnitude faster than adenine. A comparative model of Bh0637 was computed using the three-dimensional structure of Atu4426 (PDB code: 3NQB) as a structural template and computational docking was used to rationalize the preferential utilization of N-6-methyladenine over adenine. This is the first identification of an N-6-methyladenine deaminase (6-MAD).
This Meeting Review describes the proceedings and conclusions from the inaugural meeting of the Electron Microscopy Validation Task Force organized by the Unified Data Resource for 3DEM (http://www.emdatabank.org) and held at Rutgers University in New Brunswick, NJ on September 28 and 29, 2010. At the workshop, a group of scientists involved in collecting electron microscopy data, using the data to determine three-dimensional electron microscopy (3DEM) density maps, and building molecular models into the maps explored how to assess maps, models, and other data that are deposited into the Electron Microscopy Data Bank and Protein Data Bank public data archives. The specific recommendations resulting from the workshop aim to increase the impact of 3DEM in biology and medicine.
A set of software tools for building and distributing models of macromolecular assemblies uses an integrative structure modeling approach, which casts the building of models as a computational optimization problem where information is encoded into a scoring function used to evaluate candidate models.
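The idea of casting model building as optimization of a scoring function can be illustrated with a toy example. The sketch below is not the API of the software described above; it assumes point-like components, harmonic distance restraints as the only source of information, and plain gradient descent as the optimizer. All names and parameter values are hypothetical.

```python
import numpy as np

def restraint_score(coords, restraints):
    """Sum of squared violations of distance restraints (i, j, target)."""
    total = 0.0
    for i, j, target in restraints:
        d = np.linalg.norm(coords[i] - coords[j])
        total += (d - target) ** 2
    return total

def restraint_gradient(coords, restraints):
    """Gradient of restraint_score with respect to every coordinate.

    Assumption: no two restrained particles coincide (distance > 0).
    """
    grad = np.zeros_like(coords)
    for i, j, target in restraints:
        delta = coords[i] - coords[j]
        d = np.linalg.norm(delta)
        g = 2.0 * (d - target) * delta / d
        grad[i] += g
        grad[j] -= g
    return grad

def optimize(coords, restraints, step=0.05, n_steps=2000):
    """Gradient descent on the scoring function; returns new coordinates."""
    coords = np.array(coords, dtype=float)
    for _ in range(n_steps):
        coords -= step * restraint_gradient(coords, restraints)
    return coords
```

For example, three particles started at `[[0, 0], [2, 0], [0, 3]]` with unit-distance restraints between every pair relax toward an equilateral triangle. Realistic applications combine many restraint types and use stochastic sampling to produce an ensemble of models instead of a single solution.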
Tremendous variety in form and function is displayed among the intracellular membrane systems of different eukaryotes. Until recently, few clues existed as to how these internal membrane systems had originated and diversified. However, proteomic, structural, and comparative genomics studies together have revealed extensive similarities among many of the protein complexes used in controlling the morphology and trafficking of intracellular membranes. These new insights have had a profound impact on our understanding of the evolutionary origins of the internal architecture of the eukaryotic cell.
The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing.
In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy.
We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes.
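The inference of residue pair couplings from an alignment can be sketched as follows. The abstract does not state how the maximum entropy model is solved, so this sketch substitutes the mean-field approximation, in which couplings are read from the inverse of the regularized covariance matrix of the alignment. It omits sequence reweighting and any correction of the final scores, and the function name, pseudocount value, and integer encoding are assumptions.

```python
import numpy as np

def mean_field_couplings(msa, q, pseudocount=0.5):
    """Coupling strength for every column pair of an alignment.

    msa: (n_seq, L) integer array with states 0..q-1.
    Returns an (L, L) matrix of Frobenius norms of the coupling blocks.
    Assumption: mean-field approximation, no sequence reweighting.
    """
    msa = np.asarray(msa)
    n_seq, length = msa.shape
    # One-hot encode; drop the last state of each column to fix the gauge.
    onehot = np.eye(q)[msa][:, :, : q - 1]
    x = onehot.reshape(n_seq, length * (q - 1))
    fi = (1 - pseudocount) * x.mean(axis=0) + pseudocount / q
    fij = (1 - pseudocount) * (x.T @ x) / n_seq + pseudocount / q ** 2
    # Within one column, two different states never co-occur.
    for i in range(length):
        block = slice(i * (q - 1), (i + 1) * (q - 1))
        fij[block, block] = np.diag(fi[block])
    covariance = fij - np.outer(fi, fi)
    couplings = -np.linalg.inv(covariance)
    couplings = couplings.reshape(length, q - 1, length, q - 1)
    strength = np.sqrt((couplings ** 2).sum(axis=(1, 3)))
    np.fill_diagonal(strength, 0.0)
    return strength
```

Column pairs with the largest strengths are the candidates for spatial proximity; in a full pipeline they would be converted into distance restraints for 3D structure calculation.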
Structural models of macromolecular assemblies are instrumental for gaining a mechanistic understanding of cellular processes. Determining these structures is a major challenge for experimental techniques, such as X-ray crystallography, NMR spectroscopy and electron microscopy. Thus, computational modeling techniques, including molecular docking, are required. The development of most molecular docking methods has so far been focused on modeling of binary complexes. We have recently introduced the MultiFit method for modeling the structure of a multi-subunit complex by simultaneously optimizing the fit of the model into an electron microscopy density map of the entire complex and the shape complementarity between interacting subunits. Here, we report algorithmic advances of the MultiFit method that result in an efficient and accurate assembly of the input subunits into their density map. The successful predictions and the increasing number of complexes being characterized by electron microscopy suggest that the CAPRI challenge could be extended to include docking-based modeling of macromolecular assemblies guided by electron microscopy.
symmetry; macromolecules; integrative modeling; electron microscopy; Gaussian mixture model; point alignment; inference
The goals of this study were to determine the role of OCT3 in the pharmacologic action of metformin and to identify and functionally characterize genetic variants of OCT3 with respect to the uptake of metformin and monoamines.
For the pharmacologic studies, we evaluated metformin-induced activation of AMPK, a molecular target of metformin. We used quantitative PCR and immunostaining to localize the transporter and isotopic uptake studies in cells transfected with OCT3 and its nonsynonymous genetic variants for functional analyses.
Quantitative PCR and immunostaining showed that OCT3 was highly expressed on the plasma membrane of skeletal muscle and liver, target tissues for metformin action. Both the OCT inhibitor, cimetidine, and OCT3-specific shRNA significantly reduced the activating effect of metformin on AMPK. To identify genetic variants in OCT3, we used recent data from the 1000 Genomes Project and the Pharmacogenomics of Membrane Transporters project. Six novel missense variants were identified. In functional assays, using various monoamines and metformin, 3 variants, T44M (c.131C>T), T400I (c.1199C>T) and V423F (c.1267G>T), showed altered substrate specificity. Notably, in cells expressing T400I and V423F, the uptakes of metformin and catecholamines were significantly reduced, whereas the uptakes of metformin, MPP+ and histamine by T44M were significantly increased by more than 50%. Structural modeling suggested that two of these variants may be located in the pore-lining (T400) or proximal (V423) membrane-spanning helices.
Our study suggests that OCT3 plays a role in the therapeutic action of metformin and that genetic variants of OCT3 may modulate metformin and catecholamine action.
Organic cation transporters; monoamines; metformin; pharmacogenomics; muscle cells
The X-ray structure of a putative BenF-like (gene name: PFL1329) protein from Pseudomonas fluorescens Pf-5 (PflBenF) has been determined at 2.6Å resolution. X-ray crystallography revealed a canonical 18-stranded β-barrel fold that forms a central pore with a diameter of ∼4.6Å, which is consistent with the size and physicochemical properties of the presumed aromatic acid substrate, benzoate. Detailed comparisons with the previously-determined structure of Pseudomonas aeruginosa OpdK, a vanillate influx channel, revealed an arginine-rich aromatic acid selectivity filter of nearly identical structure composed of seven highly conserved residues Arg∼Asp∼Arg∼Arg∼Ser∼Asp∼Arg (R∼D∼R∼R∼S∼D∼R sequence motif, where ∼ denotes intervening residues) that define the narrowest part of the pore.
BenF-like; substrate specific porin; OprD superfamily; OprD subfamily; OpdK subfamily; benzoate; Pseudomonas; integral membrane protein
The increasing availability of genomic data for pathogens that cause tropical diseases has created new opportunities for drug discovery and development. However, if the potential of such data is to be fully exploited, the data must be effectively integrated and be easy to interrogate. Here, we discuss the development of the TDRtargets.org database (http://tdrtargets.org), which encompasses extensive genetic, biochemical and pharmacological data related to tropical disease pathogens, as well as computationally predicted druggability for potential targets and compound desirability information. By allowing the integration and weighting of this information, this database aims to facilitate the identification and prioritisation of candidate drug targets for pathogens.
Motivation: Granzyme B (GrB) and caspases cleave specific protein substrates to induce apoptosis in virally infected and neoplastic cells. While substrates for both types of proteases have been determined experimentally, there are many more yet to be discovered in humans and other metazoans. Here, we present a bioinformatics method based on support vector machine (SVM) learning that identifies sequence and structural features important for protease recognition of substrate peptides and then uses these features to predict novel substrates. Our approach can act as a convenient hypothesis generator, guiding future experiments by high-confidence identification of peptide-protein partners.
Results: The method is benchmarked on the known substrates of both protease types, including our literature-curated GrB substrate set (GrBah). On these benchmark sets, the method outperforms a number of other methods that consider sequence only, predicting at a 0.87 true positive rate (TPR) and a 0.13 false positive rate (FPR) for caspase substrates, and a 0.79 TPR and a 0.21 FPR for GrB substrates. The method is then applied to ∼25 000 proteins in the human proteome to generate a ranked list of predicted substrates of each protease type. Two of these predictions, AIF-1 and SMN1, were selected for further experimental analysis, and each was validated as a GrB substrate.
Availability: All predictions for both protease types are publicly available at http://salilab.org/peptide. A web server at the same site allows a user to train new SVM models to make predictions for any protein that recognizes specific oligopeptide ligands.
Supplementary information: Supplementary data are available at Bioinformatics online.
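The benchmark figures quoted in the Results are true and false positive rates. The sketch below shows how these two rates are computed from binary predictions and known labels, using the standard definitions; it does not reproduce the SVM itself, and the function name is hypothetical.

```python
def tpr_fpr(predicted, actual):
    """True positive rate and false positive rate for binary labels.

    predicted, actual: equal-length sequences of booleans, where True
    means "cleaved substrate". Assumes both classes occur in `actual`.
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    return tp / (tp + fn), fp / (fp + tn)
```

A TPR of 0.87 at an FPR of 0.13 therefore means that 87% of known substrates are recovered while 13% of the negative examples are incorrectly predicted as substrates.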
Nuclear Pore Complex; Nup145; Nup145N; structural genomics; autoproteolysis