The γ-tubulin ring complex (γTuRC) is the primary microtubule nucleator in cells. γTuRC is assembled from repeating γ-tubulin small complex (γTuSC) subunits and is thought to function as a template by presenting a γ-tubulin ring that mimics microtubule geometry. However, a previous yeast γTuRC structure showed γTuSC in an open conformation that prevents matching to microtubule symmetry. By contrast, we show here that γ-tubulin complexes are in a closed conformation when attached to microtubules. To confirm its functional importance we trapped the closed state and determined its structure, showing that the γ-tubulin ring precisely matches microtubule symmetry and providing detailed insight into γTuRC architecture. Importantly, the closed state is a stronger nucleator, suggesting this conformational switch may allosterically control γTuRC activity. Finally, we demonstrate that γTuRCs have a profound preference for tubulin from the same species.
Although biosynthetic gene clusters (BGCs) have been discovered for hundreds of bacterial metabolites, our knowledge of their diversity remains limited. Here, we used a novel algorithm to systematically identify BGCs in the extensive extant microbial sequencing data. Network analysis of the predicted BGCs revealed large gene cluster families, the vast majority uncharacterized. We experimentally characterized the most prominent family, consisting of two subfamilies of hundreds of BGCs distributed throughout the Proteobacteria; their products are aryl polyenes, lipids with an aryl head group conjugated to a polyene tail. We identified a distant relationship to a third subfamily of aryl polyene BGCs, and together the three subfamilies represent the largest known family of biosynthetic gene clusters, with more than 1,000 members. Although these clusters are widely divergent in sequence, their small molecule products are remarkably conserved, indicating for the first time the important roles these compounds play in Gram-negative cell biology.
Enzymes in the glutathione transferase (GST) superfamily catalyze
the conjugation of glutathione (GSH) to electrophilic substrates.
As a consequence they are involved in a number of key biological processes,
including protection of cells against chemical damage, steroid and
prostaglandin biosynthesis, tyrosine catabolism, and cell apoptosis.
Although virtual screening has been used widely to discover substrates
by docking potential noncovalent ligands into active site clefts of
enzymes, docking has been rarely constrained by a covalent bond between
the enzyme and ligand. In this study, we investigate the accuracy
of docking poses and substrate discovery in the GST superfamily, by
docking 6738 potential ligands from the KEGG and MetaCyc compound
libraries into 14 representative GST enzymes with known structures
and substrates using the PLOP program [Jacobson et al., Proteins 2004, 55, 351].
For X-ray structures as receptors, one of the top 3 ranked models
is within 3 Å all-atom root mean square deviation (RMSD) of the
native complex in 11 of the 14 cases; the enrichment LogAUC value
is better than random in all cases, and better than 25 in 7 of 11
cases. For comparative models as receptors, near-native ligand–enzyme
configurations are often sampled but difficult to rank highly. For
models based on templates with the highest sequence identity, the
enrichment LogAUC is better than 25 in 5 of 11 cases, not significantly
different from the crystal structures. In conclusion, we show that
covalent docking can be a useful tool for substrate discovery and
point out specific challenges for future method improvement.
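The pose-accuracy criterion quoted above (one of the top 3 ranked models within 3 Å all-atom RMSD of the native complex) can be illustrated with a minimal Python sketch. It assumes both poses list the same atoms in the same order and are already in the receptor frame, so no superposition is done. The function names and toy coordinates are hypothetical and are not part of PLOP.

```python
import math

def all_atom_rmsd(pose, native):
    """RMSD in the units of the input coordinates; no superposition is applied."""
    if len(pose) != len(native) or not pose:
        raise ValueError("pose and native must be non-empty and the same length")
    sq = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(pose, native):
        sq += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(sq / len(pose))

def near_native_in_top_n(ranked_poses, native, n=3, cutoff=3.0):
    """True if any of the first n ranked poses is within `cutoff` of the native pose."""
    return any(all_atom_rmsd(p, native) <= cutoff for p in ranked_poses[:n])

# Toy three-atom ligand (hypothetical coordinates, in Å).
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
shifted = [(x + 1.0, y, z) for x, y, z in native]   # rigid 1 Å shift
far = [(x + 5.0, y, z) for x, y, z in native]       # rigid 5 Å shift

print(all_atom_rmsd(shifted, native))
print(near_native_in_top_n([far, shifted], native))
```

A rigid translation moves every atom by the same distance, so the RMSD of the shifted pose equals the shift length. That makes the toy case easy to check by hand.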
Describing, understanding, and modulating the function of the cell require elucidation of the structures of macromolecular assemblies. Here, we describe an integrative method for modeling heteromeric complexes using as a starting point disassembly pathways determined by native mass spectrometry (MS). In this method, the pathway data and other available information are encoded as a scoring function on the positions of the subunits of the complex. The method was assessed on its ability to reproduce the native contacts in five benchmark cases with simulated MS data and two cases with real MS data. To illustrate the power of our method, we purified the yeast initiation factor 3 (eIF3) complex and characterized it by native MS and chemical crosslinking MS. We established substoichiometric binding of eIF5 and derived a model for the five-subunit eIF3 complex, at domain level, consistent with its role as a scaffold for other initiation factors.
• Integrative MS method allows topological characterization of heteromeric complexes
• Intersubunit crosslinks increase the precision of the predicted topologies
• A 3D model of eIF3:eIF5 complex was built using restraints from MS-based methods
• Integrative modeling reveals two submodules within eIF3: eIF3b:i:g and eIF3a:c
Politis et al. develop a method for integrating diverse mass spectrometry-based data into topological models of protein complexes. The method was benchmarked on a number of known complexes and used to reveal the architecture of the eIF3 in complex with eIF5.
Motivation: Statistical potentials have been widely used for modeling whole proteins and their parts (e.g. sidechains and loops) as well as interactions between proteins, nucleic acids and small molecules. Here, we formulate the statistical potentials entirely within a statistical framework, avoiding questionable statistical mechanical assumptions and approximations, including a definition of the reference state.
Results: We derive a general Bayesian framework for inferring statistically optimized atomic potentials (SOAP) in which the reference state is replaced with data-driven ‘recovery’ functions. Moreover, we restrain the relative orientation between two covalent bonds instead of a simple distance between two atoms, in an effort to capture orientation-dependent interactions such as hydrogen bonds. To demonstrate this general approach, we computed statistical potentials for protein–protein docking (SOAP-PP) and loop modeling (SOAP-Loop). For docking, a near-native model is within the top 10 scoring models in 40% of the PatchDock benchmark cases, compared with 23 and 27% for the state-of-the-art ZDOCK and FireDock scoring functions, respectively. Similarly, for modeling 12-residue loops in the PLOP benchmark, the average main-chain root mean square deviation of the best scored conformations by SOAP-Loop is 1.5 Å, close to the average root mean square deviation of the best sampled conformations (1.2 Å) and significantly better than that selected by Rosetta (2.1 Å), DFIRE (2.3 Å), DOPE (2.5 Å) and PLOP scoring functions (3.0 Å). Our Bayesian framework may also result in more accurate statistical potentials for additional modeling applications, thus affording better leverage of the experimentally determined protein structures.
Availability and implementation: SOAP-PP and SOAP-Loop are available as part of MODELLER (http://salilab.org/modeller).
Supplementary data are available at Bioinformatics online.
Bacterial secondary metabolites are widely used as antibiotics, anticancer drugs, insecticides and food additives. Attempts to engineer their biosynthetic gene clusters (BGCs) to produce unnatural metabolites with improved properties are often frustrated by the unpredictability and complexity of the enzymes that synthesize these molecules, suggesting that genetic changes within BGCs are limited by specific constraints. Here, by performing a systematic computational analysis of BGC evolution, we derive evidence for three findings that shed light on the ways in which, despite these constraints, nature successfully invents new molecules: (1) BGCs for complex molecules often evolve through the successive merger of smaller sub-clusters, which function as independent evolutionary entities; (2) an important subset of polyketide synthases and nonribosomal peptide synthetases evolve by concerted evolution, which generates sets of sequence-homogenized domains that may hold promise for engineering efforts since they exhibit a high degree of functional interoperability; and (3) individual BGC families evolve in distinct ways, suggesting that design strategies should take into account family-specific functional constraints. These findings suggest novel strategies for using synthetic biology to rationally engineer biosynthetic pathways.
Bacterial secondary metabolites mediate a broad range of microbe-microbe and microbe-host interactions, and are widely used in human medicine, agriculture and manufacturing. Despite recent advances in synthetic biology, efforts to engineer their biosynthetic genes for the production of unnatural variants are frustrated by a high failure rate. In an effort to better understand what types of genetic changes are most likely to lead to successful improvements, we systematically analyzed the ways in which biosynthetic genes naturally evolve to generate new compounds. We show that large gene clusters appear to evolve through the merger of sub-clusters, which function independently, and are promising units for cluster engineering. Moreover, a subset of gene clusters evolve by concerted evolution, which generates sets of interoperable domains that may enable predictable domain swapping. Finally, many biosynthetic gene clusters evolve in family-specific modes that differ greatly from each other. Overall, this quantitative perspective on the ways in which gene clusters naturally evolve suggests novel strategies for using synthetic biology to engineer the production of unnatural metabolites.
Proteins are not monolithic entities; rather, they can contain multiple domains that mediate distinct interactions, and their functionality can be regulated through post-translational modifications at multiple distinct sites. Traditionally, network biology has ignored such properties of proteins and has instead examined either the physical interactions of whole proteins or the consequences of removing entire genes. In this Review, we discuss experimental and computational methods to increase the resolution of protein–protein, genetic and drug–gene interaction studies to the domain and residue levels. Such work will be crucial for using interaction networks to connect sequence and structural information, and to understand the biological consequences of disease-associated mutations, which will hopefully lead to more effective therapeutic strategies.
Structural analysis of proteins and nucleic acids is complicated by their inherent flexibility, conferred, for example, by linkers between their contiguous domains. Therefore, the macromolecule needs to be represented by an ensemble of conformations instead of a single conformation. Determining this ensemble is challenging because the experimental data are a convoluted average of contributions from multiple conformations. As the number of the ensemble degrees of freedom generally greatly exceeds the number of independent observables, directly deconvolving experimental data into a representative ensemble is an ill-posed problem. Recent developments in sparse approximations and compressive sensing have demonstrated that useful information can be recovered from underdetermined (ill-posed) systems of linear equations by using sparsity regularization. Inspired by these advances, we designed the Sparse Ensemble Selection (SES) method for recovering multiple conformations from a limited number of observations. SES is more general and accurate than previously published minimum-ensemble methods, and we use it to obtain representative conformational ensembles of Lys48-linked di-ubiquitin, characterized by the residual dipolar coupling data measured at several pH conditions. These representative ensembles are validated against NMR chemical shift perturbation data and compared to maximum-entropy results. The SES method reproduced and quantified the previously observed pH dependence of the major conformation of Lys48-linked di-ubiquitin, and revealed lesser-populated conformations that are pre-organized for binding known di-ubiquitin receptors, thus providing insights into possible mechanisms of receptor recognition by polyubiquitin. SES is applicable to any experimental observables that can be expressed as a weighted linear combination of data for individual states.
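The idea of explaining averaged data with a small weighted set of conformers can be made concrete with a toy Python sketch. It is not the SES algorithm: as a stand-in, it exhaustively tests every pair of candidate conformers and solves for the best non-negative weights that sum to one. The function name, the two-state restriction, and the numbers are all assumptions chosen for illustration.

```python
from itertools import combinations

def best_pair(predicted, observed):
    """Find the two-conformer ensemble that best reproduces `observed`.

    predicted: list of per-conformer observable vectors (equal lengths).
    Returns (squared residual, (i, j), (w_i, w_j)).
    """
    best = None
    for i, j in combinations(range(len(predicted)), 2):
        a, b = predicted[i], predicted[j]
        d = [ai - bi for ai, bi in zip(a, b)]
        denom = sum(x * x for x in d)
        if denom == 0.0:
            w = 1.0  # identical predictions: any weight gives the same fit
        else:
            # Least-squares weight on conformer i, clipped to the interval [0, 1].
            w = sum(di * (yi - bi) for di, yi, bi in zip(d, observed, b)) / denom
            w = min(1.0, max(0.0, w))
        resid = sum((w * ai + (1.0 - w) * bi - yi) ** 2
                    for ai, bi, yi in zip(a, b, observed))
        if best is None or resid < best[0]:
            best = (resid, (i, j), (w, 1.0 - w))
    return best

# Hypothetical predicted observables for four candidate conformers.
predicted = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [3.0, 3.0, 3.0], [-1.0, 2.0, 0.0]]
# Synthetic "experiment": a 70/30 mixture of conformers 0 and 1.
observed = [0.7 * a + 0.3 * b for a, b in zip(predicted[0], predicted[1])]

resid, pair, weights = best_pair(predicted, observed)
print(pair, weights, resid)
```

Exhaustive enumeration is only practical for very small ensembles and candidate pools. It is used here because it keeps the weighted linear combination and the sparsity constraint explicit.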
Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.
Modeller; protein structure; comparative modeling; structure prediction; protein fold
A substantial challenge for genomic enzymology is the reliable annotation for proteins of unknown function. Described here is an interrogation of uncharacterized enzymes from the amidohydrolase superfamily using a structure-guided approach that integrates bioinformatics, computational biology and molecular enzymology. Previously, Tm0936 from Thermotoga maritima was shown to catalyze the deamination of S-adenosylhomocysteine (SAH) to S-inosylhomocysteine (SIH). Homologues of Tm0936 were identified, and substrate profiles were proposed by docking metabolites to modeled enzyme structures. These enzymes were predicted to deaminate analogues of adenosine including SAH, 5’-methylthioadenosine (MTA), adenosine (Ado), and 5’-deoxyadenosine (5’-dAdo). Fifteen of these proteins were purified to homogeneity and the three-dimensional structures of three proteins were determined by X-ray diffraction methods. Enzyme assays supported the structure-based predictions and identified subgroups of enzymes with the capacity to deaminate various combinations of the adenosine analogues, including the first enzyme (Dvu1825) capable of deaminating 5’-dAdo. One subgroup of proteins, exemplified by Moth1224 from Moorella thermoacetica, deaminates guanine to xanthine and another subgroup, exemplified by Avi5431 from Agrobacterium vitis S4, deaminates two oxidatively damaged forms of adenine: 2-oxoadenine and 8-oxoadenine. The sequence and structural basis of the observed substrate specificities was proposed and the substrate profiles for 834 protein sequences were provisionally annotated. The results highlight the power of a multidisciplinary approach for annotating enzymes of unknown function.
Proteins of unknown function belonging to cog1816 and cog0402 were characterized. Sav2595 from Streptomyces avermitilis MA-4680, Acel0264 from Acidothermus cellulolyticus 11B, Nis0429 from Nitratiruptor sp. SB155-2 and Dr0824 from Deinococcus radiodurans R1 were cloned, purified, and their substrate profiles determined. These enzymes were previously incorrectly annotated as adenosine deaminases or chlorohydrolases. It was shown here that these enzymes actually deaminate 6-aminodeoxyfutalosine. The deamination of 6-aminodeoxyfutalosine is part of an alternative menaquinone biosynthetic pathway that involves the formation of futalosine. 6-Aminodeoxyfutalosine is deaminated by these enzymes with catalytic efficiencies greater than 10⁵ M⁻¹ s⁻¹, Km values of 0.9 to 6.0 μM and kcat values of 1.2 to 8.6 s⁻¹. Adenosine, 2′-deoxyadenosine, thiomethyladenosine, and S-adenosylhomocysteine are deaminated at least an order of magnitude slower than 6-aminodeoxyfutalosine. The crystal structure of Nis0429 was determined and the substrate, 6-aminodeoxyfutalosine, was positioned in the active site, based on the presence of adventitiously bound benzoic acid. In this model Ser-145 interacts with the carboxylate moiety of the substrate. The structure of Dr0824 was also determined, but a collapsed active site pocket prevented docking of substrates. A computational model of Sav2595 was built based on the crystal structure of adenosine deaminase and substrates were docked. The model predicted a conserved arginine after β-strand 1 to be partially responsible for the substrate specificity of Sav2595.
The New York SGX Research Center for Structural Genomics (NYSGXRC) of the NIGMS Protein Structure Initiative (PSI) has applied its high-throughput X-ray crystallographic structure determination platform to systematic studies of all human protein phosphatases and protein phosphatases from biomedically-relevant pathogens. To date, the NYSGXRC has determined structures of 21 distinct protein phosphatases: 14 from human, 2 from mouse, 2 from the pathogen Toxoplasma gondii, 1 from Trypanosoma brucei, the parasite responsible for African sleeping sickness, and 2 from the principal mosquito vector of malaria in Africa, Anopheles gambiae. These structures provide insights into both normal and pathophysiologic processes, including transcriptional regulation, regulation of major signaling pathways, neural development, and type 1 diabetes. In conjunction with the contributions of other international structural genomics consortia, these efforts promise to provide an unprecedented database and materials repository for structure-guided experimental and computational discovery of inhibitors for all classes of protein phosphatases.
Structural genomics; Phosphatase; NYSGXRC; X-ray crystallography
Eukaryotic translation initiation requires the recruitment of the large, multiprotein eIF3 complex to the 40S ribosomal subunit. We present X-ray structures of all major components of the minimal, six-subunit Saccharomyces cerevisiae eIF3 core. These structures, together with electron microscopy reconstructions, cross-linking coupled to mass spectrometry, and integrative structure modeling, allowed us to position and orient all eIF3 components on the 40S⋅eIF1 complex, revealing an extended, modular arrangement of eIF3 subunits. Yeast eIF3 engages 40S in a clamp-like manner, fully encircling 40S to position key initiation factors on opposite ends of the mRNA channel, providing a platform for the recruitment, assembly, and regulation of the translation initiation machinery. The structures of eIF3 components reported here also have implications for understanding the architecture of the mammalian 43S preinitiation complex and the complex of eIF3, 40S, and the hepatitis C internal ribosomal entry site RNA.
• X-ray structures of major yeast eIF3 components and subcomplexes
• Crosslinking coupled to mass-spectrometry analysis of 40S⋅eIF1⋅eIF3 complex
• Integrative modeling reveals architecture of 40S⋅eIF1⋅eIF3 complex
A hybrid approach drawing on X-ray structures, crosslinking coupled to mass spectrometry, electron microscopy, and integrative modeling yields mechanistic insights into how eIF3 coordinates translation initiation.
The human multidrug and toxin extrusion (MATE) transporter 1 contributes to the tissue distribution and excretion of many drugs. Inhibition of MATE1 may result in potential drug-drug interactions (DDIs) and alterations in drug exposure and accumulation in various tissues. The primary goals of this project were to identify MATE1 inhibitors with clinical importance or in vitro utility and to elucidate the physicochemical properties that differ between MATE1 and OCT2 inhibitors. Using a fluorescence assay of ASP+ uptake in cells stably expressing MATE1, over 900 prescription drugs were screened and 84 potential MATE1 inhibitors were found. We identified several MATE1 selective inhibitors including four FDA-approved medications that may be clinically relevant MATE1 inhibitors and could cause a clinical DDI. In parallel, a QSAR model identified distinct molecular properties of MATE1 versus OCT2 inhibitors and was used to screen the DrugBank in silico library for new hits in a larger chemical space.
MATE1; MATE2-K; OCT2; SLC47A1; SLC47A2; SLC22A2; prescription drug library; HTS; iterative modeling; membrane transporters; QSAR
Solute Carrier (SLC) transporters are membrane proteins that transport solutes, such as ions, metabolites, peptides, and drugs, across biological membranes, using diverse energy coupling mechanisms. In humans, there are 386 SLC transporters, many of which contribute to the absorption, distribution, metabolism, and excretion of drugs and/or can be targeted directly by therapeutics. Recent atomic structures of SLC transporters determined by X-ray crystallography and NMR spectroscopy have significantly expanded the applicability of structure-based prediction of SLC transporter ligands, by enabling both comparative modeling of additional SLC transporters and virtual screening of small-molecule libraries against experimental structures as well as comparative models. In this review, we begin by describing computational tools, including sequence analysis, comparative modeling, and virtual screening, that are used to predict the structures and functions of membrane proteins such as SLC transporters. We then illustrate the applications of these tools to predicting ligand specificities of select SLC transporters, followed by experimental validation using uptake kinetic measurements and other assays. We conclude by discussing future directions in the discovery of SLC transporter ligands.
Membrane transporter; comparative modeling; ligand docking; protein function prediction; structure-based ligand discovery
The nuclear pore complex, composed of proteins termed nucleoporins (Nups), is responsible for nucleocytoplasmic transport in eukaryotes. Nuclear pore complexes (NPCs) form an annular structure composed of the nuclear ring, cytoplasmic ring, a membrane ring, and two inner rings. Nup192 is a major component of the NPC’s inner ring. We report the crystal structure of Saccharomyces cerevisiae Nup192 residues 2–960 [ScNup192(2–960)], which adopts an α-helical fold with three domains (i.e., D1, D2, and D3). Small angle X-ray scattering and electron microscopy (EM) studies reveal that ScNup192(2–960) could undergo long-range transition between “open” and “closed” conformations. We obtained a structural model of full-length ScNup192 based on EM, the structure of ScNup192(2–960), and homology modeling. Evolutionary analyses using the ScNup192(2–960) structure suggest that NPCs and vesicle-coating complexes are descended from a common membrane-coating ancestral complex. We show that suppression of Nup192 expression leads to compromised nuclear transport and hypothesize a role for Nup192 in modulating the permeability of the NPC central channel.
Hemoglobin is a complex system that undergoes conformational changes in response to oxygen, allosteric effectors, mutations, and environmental changes. Here, we study allostery and polymerization of hemoglobin and its variants by application of two previously described methods: (i) AllosMod for simulating allostery dynamics given two allosterically related input structures and (ii) a machine-learning method for dynamics- and structure-based prediction of the mutation impact on allostery (Weinkam et al. J. Mol. Biol. 2013), now applicable to systems with multiple coupled binding sites such as hemoglobin. First, we predict the relative stabilities of substates and microstates of hemoglobin, which are determined primarily by entropy within our model. Next, we predict the impact of 866 annotated mutations on hemoglobin’s oxygen binding equilibrium. We then discuss a subset of 30 mutations that occur in the presence of the sickle cell mutation and whose effects on polymerization have been measured. Seven of these HbS mutations occur in three predicted druggable binding pockets that might be exploited to directly inhibit polymerization; one of these binding pockets is not apparent in the crystal structure but only in structures generated by AllosMod. For the 30 mutations, we predict that mutation-induced conformational changes within a single tetramer tend not to significantly impact polymerization; instead, these mutations more likely impact polymerization by directly perturbing a polymerization interface. Finally, our analysis of allostery allows us to hypothesize why hemoglobin evolved to have multiple subunits and a persistent low frequency sickle cell mutation.
Energy landscape; funnel; Gō model; molecular dynamics; machine-learning
The flexible and heterogeneous nature of carbohydrate chains often renders glycoproteins refractory to traditional structure determination methods. Small Angle X-ray scattering (SAXS) can be a useful tool for obtaining structural information of these systems. All-atom modeling of glycoproteins with flexible glycan chains was applied to interpret the solution SAXS data for a set of glycoproteins. For simpler systems (single glycan, with a well defined protein structure), all-atom modeling generates models in excellent agreement with the scattering pattern, and reveals the approximate spatial occupancy of the glycan chain in solution. For more complex systems (several glycan chains, or unknown protein substructure), the approach can still provide insightful models, though the orientations of glycans become poorly determined. Ab initio shape reconstructions appear to capture the global morphology of glycoproteins, but in most cases offer little information about glycan spatial occupancy. The all-atom modeling methodology is available as a webserver at http://modbase.compbio.ucsf.edu/allosmod-foxs.
Allostery in a protein involves effector binding at an allosteric site that changes the structure and/or dynamics at a distant, functional site. In addition to the chemical equilibrium of ligand binding, allostery involves a conformational equilibrium between one protein substate that binds the effector and a second substate that less strongly binds the effector. We run molecular dynamics simulations using simple, smooth energy landscapes to sample specific ligand-induced conformational transitions, as defined by the effector-bound and unbound protein structures. These simulations can be performed using our web server: http://salilab.org/allosmod/. We then develop a set of features to analyze the simulations and capture the relevant thermodynamic properties of the allosteric conformational equilibrium. These features are based on molecular mechanics energy functions, stereochemical effects, and structural/dynamic coupling between sites. Using a machine-learning algorithm on a dataset of 10 proteins and 179 mutations, we predict both the magnitude and sign of the allosteric conformational equilibrium shift by the mutation; the impact of a large identifiable fraction of the mutations can be predicted with an average unsigned error of 1 kBT. With similar accuracy, we predict the mutation effects for an 11th protein that was omitted from the initial training and testing of the machine-learning algorithm. We also assess which calculated thermodynamic properties contribute most to the accuracy of the prediction.
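To give a feel for what an error of 1 kBT means for a conformational equilibrium, the following Python sketch converts a free energy difference between two substates into a Boltzmann population. The two-state simplification, the sign convention (ΔG = G_B − G_A, in units of kBT), and the function names are assumptions made for illustration; they are not taken from the AllosMod server.

```python
import math

def population_b(delta_g_kt):
    """Fraction of substate B in a two-state equilibrium, with ΔG = G_B - G_A in kBT."""
    return 1.0 / (1.0 + math.exp(delta_g_kt))

def shifted_population(delta_g_kt, ddg_kt):
    """Fraction of substate B after a mutation shifts ΔG by ddg_kt (in kBT)."""
    return population_b(delta_g_kt + ddg_kt)

# Hypothetical wild type with equally stable substates.
wild_type = population_b(0.0)
# Hypothetical mutation that destabilizes substate B by 1 kBT.
mutant = shifted_population(0.0, 1.0)

print(round(wild_type, 3), round(mutant, 3))
```

Under these assumptions, a 1 kBT shift moves an evenly populated equilibrium to roughly a 27/73 split, which suggests the scale of uncertainty implied by a 1 kBT average error.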
energy landscape; protein dynamics; machine learning; allostery
Of the over 22 million protein sequences in the nonredundant TrEMBL database, fewer than 1% have experimentally confirmed functions. Structure-based methods have been used to predict enzyme activities from experimentally determined structures; however, for the vast majority of proteins, no such structures are available. Here, homology models of a functionally uncharacterized amidohydrolase from Agrobacterium radiobacter K84 (Arad3529) were computed based on a remote template structure. The protein backbone of two loops near the active site was remodeled, resulting in four distinct active site conformations. Substrates of Arad3529 were predicted by docking of 57672 high-energy intermediate (HEI) forms of 6440 metabolites against these four homology models. Based on docking ranks and geometries, a set of modified pterins were suggested as candidate substrates for Arad3529. The predictions were tested by enzymology experiments, and Arad3529 deaminated many pterin metabolites (substrate, kcat/Km [M⁻¹ s⁻¹]): formylpterin, 5.2 × 10⁶; pterin-6-carboxylate, 4.0 × 10⁶; pterin-7-carboxylate, 3.7 × 10⁶; pterin, 3.3 × 10⁶; hydroxymethylpterin, 1.2 × 10⁶; biopterin, 1.0 × 10⁶; D-(+)-neopterin, 3.1 × 10⁵; isoxanthopterin, 2.8 × 10⁵; sepiapterin, 1.3 × 10⁵; folate, 1.3 × 10⁵; xanthopterin, 1.17 × 10⁵; 7,8-dihydrohydroxymethylpterin, 3.3 × 10⁴. While pterin is a ubiquitous oxidative product of folate degradation, genomic analysis suggests that the first step of an undescribed pterin degradation pathway is catalyzed by Arad3529. Homology model-based virtual screening, especially with modeling of protein backbone flexibility, may be broadly useful for enzyme function annotation and discovering new pathways and drug targets.
A statistical method to merge SAXS profiles using Gaussian processes is presented.
Small-angle X-ray scattering (SAXS) is an experimental technique that allows structural information on biomolecules in solution to be gathered. High-quality SAXS profiles have typically been obtained by manual merging of scattering profiles from different concentrations and exposure times. This procedure is very subjective and results vary from user to user. Up to now, no robust automatic procedure has been published to perform this step, preventing the application of SAXS to high-throughput projects. Here, SAXS Merge, a fully automated statistical method for merging SAXS profiles using Gaussian processes, is presented. This method requires only the buffer-subtracted SAXS profiles in a specific order. At the heart of its formulation is non-linear interpolation using Gaussian processes, which provides a statement of the problem that accounts for correlation in the data.
SAXS; SANS; data curation; Gaussian process; merging
Motivation: Structural characterization of protein interactions is necessary for understanding and modulating biological processes. On one hand, X-ray crystallography or NMR spectroscopy provide atomic resolution structures but the data collection process is typically long and the success rate is low. On the other hand, computational methods for modeling assembly structures from individual components frequently suffer from a high false-positive rate, rarely resulting in a unique solution.
Results: Here, we present a combined approach that computationally integrates data from a variety of fast and accessible experimental techniques for rapid and accurate structure determination of protein–protein complexes. The integrative method uses atomistic models of two interacting proteins and one or more datasets from five accessible experimental techniques: a small-angle X-ray scattering (SAXS) profile, 2D class average images from negative-stain electron microscopy micrographs (EM), a 3D density map from single-particle negative-stain EM, residue type content of the protein–protein interface from NMR spectroscopy and chemical cross-linking detected by mass spectrometry. The method is tested on a docking benchmark consisting of 176 known complex structures and simulated experimental data. The near-native model is the top scoring one for up to 61% of benchmark cases depending on the included experimental datasets; in comparison to 10% for standard computational docking. We also collected SAXS, 2D class average images and 3D density map from negative-stain EM to model the PCSK9 antigen–J16 Fab antibody complex, followed by validation of the model by a subsequently available X-ray crystallographic structure.
Supplementary data are available at Bioinformatics online.