Results 1-25 (105)

1.  Topological Models of Heteromeric Protein Assemblies from Mass Spectrometry: Application to the Yeast eIF3:eIF5 Complex 
Chemistry & Biology  2015;22(1):117-128.
Describing, understanding, and modulating the function of the cell require elucidation of the structures of macromolecular assemblies. Here, we describe an integrative method for modeling heteromeric complexes using as a starting point disassembly pathways determined by native mass spectrometry (MS). In this method, the pathway data and other available information are encoded as a scoring function on the positions of the subunits of the complex. The method was assessed on its ability to reproduce the native contacts in five benchmark cases with simulated MS data and two cases with real MS data. To illustrate the power of our method, we purified the yeast initiation factor 3 (eIF3) complex and characterized it by native MS and chemical crosslinking MS. We established substoichiometric binding of eIF5 and derived a model for the five-subunit eIF3 complex, at domain level, consistent with its role as a scaffold for other initiation factors.
•Integrative MS method allows topological characterization of heteromeric complexes
•Intersubunit crosslinks increase the precision of the predicted topologies
•A 3D model of eIF3:eIF5 complex was built using restraints from MS-based methods
•Integrative modeling reveals two submodules within eIF3: eIF3b:i:g and eIF3a:c
Politis et al. develop a method for integrating diverse mass spectrometry-based data into topological models of protein complexes. The method was benchmarked on a number of known complexes and used to reveal the architecture of the eIF3 in complex with eIF5.
PMCID: PMC4306531  PMID: 25544043
2.  Optimized atomic statistical potentials: assessment of protein interfaces and loops 
Bioinformatics  2013;29(24):3158-3166.
Motivation: Statistical potentials have been widely used for modeling whole proteins and their parts (e.g. sidechains and loops) as well as interactions between proteins, nucleic acids and small molecules. Here, we formulate the statistical potentials entirely within a statistical framework, avoiding questionable statistical mechanical assumptions and approximations, including a definition of the reference state.
Results: We derive a general Bayesian framework for inferring statistically optimized atomic potentials (SOAP) in which the reference state is replaced with data-driven ‘recovery’ functions. Moreover, we restrain the relative orientation between two covalent bonds instead of a simple distance between two atoms, in an effort to capture orientation-dependent interactions such as hydrogen bonds. To demonstrate this general approach, we computed statistical potentials for protein–protein docking (SOAP-PP) and loop modeling (SOAP-Loop). For docking, a near-native model is within the top 10 scoring models in 40% of the PatchDock benchmark cases, compared with 23 and 27% for the state-of-the-art ZDOCK and FireDock scoring functions, respectively. Similarly, for modeling 12-residue loops in the PLOP benchmark, the average main-chain root mean square deviation of the best scored conformations by SOAP-Loop is 1.5 Å, close to the average root mean square deviation of the best sampled conformations (1.2 Å) and significantly better than that selected by Rosetta (2.1 Å), DFIRE (2.3 Å), DOPE (2.5 Å) and PLOP scoring functions (3.0 Å). Our Bayesian framework may also result in more accurate statistical potentials for additional modeling applications, thus affording better leverage of the experimentally determined protein structures.
Availability and implementation: SOAP-PP and SOAP-Loop are available as part of MODELLER (
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3842762  PMID: 24078704
3.  A Systematic Computational Analysis of Biosynthetic Gene Cluster Evolution: Lessons for Engineering Biosynthesis 
PLoS Computational Biology  2014;10(12):e1004016.
Bacterial secondary metabolites are widely used as antibiotics, anticancer drugs, insecticides and food additives. Attempts to engineer their biosynthetic gene clusters (BGCs) to produce unnatural metabolites with improved properties are often frustrated by the unpredictability and complexity of the enzymes that synthesize these molecules, suggesting that genetic changes within BGCs are limited by specific constraints. Here, by performing a systematic computational analysis of BGC evolution, we derive evidence for three findings that shed light on the ways in which, despite these constraints, nature successfully invents new molecules: 1) BGCs for complex molecules often evolve through the successive merger of smaller sub-clusters, which function as independent evolutionary entities; 2) an important subset of polyketide synthases and nonribosomal peptide synthetases evolve by concerted evolution, which generates sets of sequence-homogenized domains that may hold promise for engineering efforts since they exhibit a high degree of functional interoperability; 3) individual BGC families evolve in distinct ways, suggesting that design strategies should take into account family-specific functional constraints. These findings suggest novel strategies for using synthetic biology to rationally engineer biosynthetic pathways.
Author Summary
Bacterial secondary metabolites mediate a broad range of microbe-microbe and microbe-host interactions, and are widely used in human medicine, agriculture and manufacturing. Despite recent advances in synthetic biology, efforts to engineer their biosynthetic genes for the production of unnatural variants are frustrated by a high failure rate. In an effort to better understand what types of genetic changes are most likely to lead to successful improvements, we systematically analyzed the ways in which biosynthetic genes naturally evolve to generate new compounds. We show that large gene clusters appear to evolve through the merger of sub-clusters, which function independently, and are promising units for cluster engineering. Moreover, a subset of gene clusters evolve by concerted evolution, which generates sets of interoperable domains that may enable predictable domain swapping. Finally, many biosynthetic gene clusters evolve in family-specific modes that differ greatly from each other. Overall, this quantitative perspective on the ways in which gene clusters naturally evolve suggests novel strategies for using synthetic biology to engineer the production of unnatural metabolites.
PMCID: PMC4256081  PMID: 25474254
4.  High-resolution network biology: connecting sequence with function 
Nature reviews. Genetics  2013;14(12):865-879.
Proteins are not monolithic entities; rather, they can contain multiple domains that mediate distinct interactions, and their functionality can be regulated through post-translational modifications at multiple distinct sites. Traditionally, network biology has ignored such properties of proteins and has instead examined either the physical interactions of whole proteins or the consequences of removing entire genes. In this Review, we discuss experimental and computational methods to increase the resolution of protein–protein, genetic and drug–gene interaction studies to the domain and residue levels. Such work will be crucial for using interaction networks to connect sequence and structural information, and to understand the biological consequences of disease-associated mutations, which will hopefully lead to more effective therapeutic strategies.
PMCID: PMC4023809  PMID: 24197012
5.  Recovering a Representative Conformational Ensemble from Underdetermined Macromolecular Structural Data 
Journal of the American Chemical Society  2013;135(44):16595-16609.
Structural analysis of proteins and nucleic acids is complicated by their inherent flexibility, conferred, for example, by linkers between their contiguous domains. Therefore, the macromolecule needs to be represented by an ensemble of conformations instead of a single conformation. Determining this ensemble is challenging because the experimental data are a convoluted average of contributions from multiple conformations. As the number of the ensemble degrees of freedom generally greatly exceeds the number of independent observables, directly deconvolving experimental data into a representative ensemble is an ill-posed problem. Recent developments in sparse approximations and compressive sensing have demonstrated that useful information can be recovered from underdetermined (ill-posed) systems of linear equations by using sparsity regularization. Inspired by these advances, we designed the Sparse Ensemble Selection (SES) method for recovering multiple conformations from a limited number of observations. SES is more general and accurate than previously published minimum-ensemble methods, and we use it to obtain representative conformational ensembles of Lys48-linked di-ubiquitin, characterized by the residual dipolar coupling data measured at several pH conditions. These representative ensembles are validated against NMR chemical shift perturbation data and compared to maximum-entropy results. The SES method reproduced and quantified the previously observed pH dependence of the major conformation of Lys48-linked di-ubiquitin, and revealed lesser-populated conformations that are pre-organized for binding known di-ubiquitin receptors, thus providing insights into possible mechanisms of receptor recognition by polyubiquitin. SES is applicable to any experimental observables that can be expressed as a weighted linear combination of data for individual states.
PMCID: PMC3902174  PMID: 24093873
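The closing sentence of the abstract above describes observables that are a weighted linear combination of per-state data. A minimal Python sketch of that idea follows, assuming NumPy is available. It is not the SES algorithm from the paper: an exhaustive search over small subsets stands in for the sparsity-regularized selection, and the function name `select_sparse_ensemble`, its parameters, and the synthetic data are all hypothetical.

```python
from itertools import combinations

import numpy as np


def select_sparse_ensemble(predicted, observed, max_size=3, tol=1e-8):
    """Find the smallest subset of candidate states that explains `observed`.

    predicted: (n_observables, n_states) array, one column per candidate state.
    observed:  (n_observables,) array, assumed to be a weighted sum of columns.
    Returns (indices, weights), or None if no subset up to max_size fits.
    Illustrative stand-in only; not the published SES procedure.
    """
    n_states = predicted.shape[1]
    for size in range(1, max_size + 1):
        best = None
        for subset in combinations(range(n_states), size):
            cols = predicted[:, list(subset)]
            # Ordinary least squares on this subset of states.
            weights, *_ = np.linalg.lstsq(cols, observed, rcond=None)
            # Population weights must be non-negative; skip subsets that violate this.
            if np.any(weights < 0):
                continue
            residual = np.linalg.norm(cols @ weights - observed)
            if residual <= tol and (best is None or residual < best[0]):
                best = (residual, subset, weights)
        if best is not None:
            # Smallest ensemble size that fits wins (the sparsity preference).
            return best[1], best[2]
    return None


# Synthetic example (hypothetical data): 6 candidate states, 8 observables,
# with the "measurement" built from states 1 and 4 only.
rng = np.random.default_rng(0)
predicted = rng.normal(size=(8, 6))
observed = 0.7 * predicted[:, 1] + 0.3 * predicted[:, 4]

indices, weights = select_sparse_ensemble(predicted, observed)
print("selected states:", indices)
print("weights:", weights)
```

The exhaustive search is only practical for a handful of candidate states. It is used here because it makes the preference for the smallest fitting ensemble explicit. The sketch also does not force the weights to sum to one, which a real population model would likely require.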
6.  Comparative Protein Structure Modeling Using Modeller 
Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software are also described.
PMCID: PMC4186674  PMID: 18428767
Modeller; protein structure; comparative modeling; structure prediction; protein fold
7.  Structure-Guided Discovery of New Deaminase Enzymes 
Journal of the American Chemical Society  2013;135(37):10.1021/ja4066078.
A substantial challenge for genomic enzymology is the reliable annotation for proteins of unknown function. Described here is an interrogation of uncharacterized enzymes from the amidohydrolase superfamily using a structure-guided approach that integrates bioinformatics, computational biology and molecular enzymology. Previously, Tm0936 from Thermotoga maritima was shown to catalyze the deamination of S-adenosylhomocysteine (SAH) to S-inosylhomocysteine (SIH). Homologues of Tm0936 were identified, and substrate profiles were proposed by docking metabolites to modeled enzyme structures. These enzymes were predicted to deaminate analogues of adenosine including SAH, 5’-methylthioadenosine (MTA), adenosine (Ado), and 5’-deoxyadenosine (5’-dAdo). Fifteen of these proteins were purified to homogeneity and the three-dimensional structures of three proteins were determined by X-ray diffraction methods. Enzyme assays supported the structure-based predictions and identified subgroups of enzymes with the capacity to deaminate various combinations of the adenosine analogues, including the first enzyme (Dvu1825) capable of deaminating 5’-dAdo. One subgroup of proteins, exemplified by Moth1224 from Moorella thermoacetica, deaminates guanine to xanthine and another subgroup, exemplified by Avi5431 from Agrobacterium vitis S4, deaminates two oxidatively damaged forms of adenine: 2-oxoadenine and 8-oxoadenine. The sequence and structural basis of the observed substrate specificities was proposed and the substrate profiles for 834 protein sequences were provisionally annotated. The results highlight the power of a multidisciplinary approach for annotating enzymes of unknown function.
PMCID: PMC3827683  PMID: 23968233
8.  Deamination of 6-Aminodeoxyfutalosine in Menaquinone Biosynthesis by Distantly Related Enzymes 
Biochemistry  2013;52(37):10.1021/bi400750a.
Proteins of unknown function belonging to cog1816 and cog0402 were characterized. Sav2595 from Streptomyces avermitilis MA-4680, Acel0264 from Acidothermus cellulolyticus 11B, Nis0429 from Nitratiruptor sp. SB155-2 and Dr0824 from Deinococcus radiodurans R1 were cloned, purified, and their substrate profiles determined. These enzymes were previously incorrectly annotated as adenosine deaminases or chlorohydrolases. It was shown here that these enzymes actually deaminate 6-aminodeoxyfutalosine. The deamination of 6-aminodeoxyfutalosine is part of an alternative menaquinone biosynthetic pathway that involves the formation of futalosine. 6-Aminodeoxyfutalosine is deaminated by these enzymes with catalytic efficiencies greater than 10⁵ M⁻¹ s⁻¹, Km values of 0.9 to 6.0 μM and kcat values of 1.2 to 8.6 s⁻¹. Adenosine, 2′-deoxyadenosine, thiomethyladenosine, and S-adenosylhomocysteine are deaminated at least an order of magnitude slower than 6-aminodeoxyfutalosine. The crystal structure of Nis0429 was determined and the substrate, 6-aminodeoxyfutalosine, was positioned in the active site, based on the presence of adventitiously bound benzoic acid. In this model Ser-145 interacts with the carboxylate moiety of the substrate. The structure of Dr0824 was also determined, but a collapsed active site pocket prevented docking of substrates. A computational model of Sav2595 was built based on the crystal structure of adenosine deaminase and substrates were docked. The model predicted a conserved arginine after β-strand 1 to be partially responsible for the substrate specificity of Sav2595.
PMCID: PMC3813303  PMID: 23972005
9.  Structural genomics of protein phosphatases 
The New York SGX Research Center for Structural Genomics (NYSGXRC) of the NIGMS Protein Structure Initiative (PSI) has applied its high-throughput X-ray crystallographic structure determination platform to systematic studies of all human protein phosphatases and protein phosphatases from biomedically-relevant pathogens. To date, the NYSGXRC has determined structures of 21 distinct protein phosphatases: 14 from human, 2 from mouse, 2 from the pathogen Toxoplasma gondii, 1 from Trypanosoma brucei, the parasite responsible for African sleeping sickness, and 2 from the principal mosquito vector of malaria in Africa, Anopheles gambiae. These structures provide insights into both normal and pathophysiologic processes, including transcriptional regulation, regulation of major signaling pathways, neural development, and type 1 diabetes. In conjunction with the contributions of other international structural genomics consortia, these efforts promise to provide an unprecedented database and materials repository for structure-guided experimental and computational discovery of inhibitors for all classes of protein phosphatases.
PMCID: PMC4163028  PMID: 18058037
Structural genomics; Phosphatase; NYSGXRC; X-ray crystallography
10.  Molecular Architecture of the 40S⋅eIF1⋅eIF3 Translation Initiation Complex 
Cell  2014;158(5):1123-1135.
Eukaryotic translation initiation requires the recruitment of the large, multiprotein eIF3 complex to the 40S ribosomal subunit. We present X-ray structures of all major components of the minimal, six-subunit Saccharomyces cerevisiae eIF3 core. These structures, together with electron microscopy reconstructions, cross-linking coupled to mass spectrometry, and integrative structure modeling, allowed us to position and orient all eIF3 components on the 40S⋅eIF1 complex, revealing an extended, modular arrangement of eIF3 subunits. Yeast eIF3 engages 40S in a clamp-like manner, fully encircling 40S to position key initiation factors on opposite ends of the mRNA channel, providing a platform for the recruitment, assembly, and regulation of the translation initiation machinery. The structures of eIF3 components reported here also have implications for understanding the architecture of the mammalian 43S preinitiation complex and the complex of eIF3, 40S, and the hepatitis C internal ribosomal entry site RNA.
•X-ray structures of major yeast eIF3 components and subcomplexes
•Crosslinking coupled to mass-spectrometry analysis of 40S⋅eIF1⋅eIF3 complex
•Integrative modeling reveals architecture of 40S⋅eIF1⋅eIF3 complex
A hybrid approach drawing on X-ray structures, crosslinking coupled to mass spectrometry, electron microscopy, and integrative modeling yields mechanistic insights into how eIF3 coordinates translation initiation.
PMCID: PMC4151992  PMID: 25171412
11.  Discovery of Potent, Selective Multidrug And Toxin Extrusion Transporter 1 (MATE1, SLC47A1) Inhibitors Through Prescription Drug Profiling and Computational Modeling 
Journal of medicinal chemistry  2013;56(3):781-795.
The human multidrug and toxin extrusion (MATE) transporter 1 contributes to the tissue distribution and excretion of many drugs. Inhibition of MATE1 may result in potential drug-drug interactions (DDIs) and alterations in drug exposure and accumulation in various tissues. The primary goals of this project were to identify MATE1 inhibitors with clinical importance or in vitro utility and to elucidate the physicochemical properties that differ between MATE1 and OCT2 inhibitors. Using a fluorescence assay of ASP+ uptake in cells stably expressing MATE1, over 900 prescription drugs were screened and 84 potential MATE1 inhibitors were found. We identified several MATE1 selective inhibitors including four FDA-approved medications that may be clinically relevant MATE1 inhibitors and could cause a clinical DDI. In parallel, a QSAR model identified distinct molecular properties of MATE1 versus OCT2 inhibitors and was used to screen the DrugBank in silico library for new hits in a larger chemical space.
PMCID: PMC4068829  PMID: 23241029
MATE1; MATE2-K; OCT2; SLC47A1; SLC47A2; SLC22A2; prescription drug library; HTS; iterative modeling; membrane transporters; QSAR
12.  SLC classification: an update 
PMCID: PMC4068830  PMID: 23778706
13.  Molecular modeling and ligand docking for Solute Carrier (SLC) transporters 
Solute Carrier (SLC) transporters are membrane proteins that transport solutes, such as ions, metabolites, peptides, and drugs, across biological membranes, using diverse energy coupling mechanisms. In humans, there are 386 SLC transporters, many of which contribute to the absorption, distribution, metabolism, and excretion of drugs and/or can be targeted directly by therapeutics. Recent atomic structures of SLC transporters determined by X-ray crystallography and NMR spectroscopy have significantly expanded the applicability of structure-based prediction of SLC transporter ligands, by enabling both comparative modeling of additional SLC transporters and virtual screening of small-molecule libraries against experimental structures as well as comparative models. In this review, we begin by describing computational tools, including sequence analysis, comparative modeling, and virtual screening, that are used to predict the structures and functions of membrane proteins such as SLC transporters. We then illustrate the applications of these tools to predicting ligand specificities of select SLC transporters, followed by experimental validation using uptake kinetic measurements and other assays. We conclude by discussing future directions in the discovery of the SLC transporter ligands.
PMCID: PMC4056341  PMID: 23578028
Membrane transporter; comparative modeling; ligand docking; protein function prediction; structure-based ligand discovery
14.  Structure, Dynamics, Evolution, and Function of a Major Scaffold Component in the Nuclear Pore Complex 
The nuclear pore complex, composed of proteins termed nucleoporins (Nups), is responsible for nucleocytoplasmic transport in eukaryotes. Nuclear pore complexes (NPCs) form an annular structure composed of the nuclear ring, cytoplasmic ring, a membrane ring, and two inner rings. Nup192 is a major component of the NPC’s inner ring. We report the crystal structure of Saccharomyces cerevisiae Nup192 residues 2–960 [ScNup192(2–960)], which adopts an α-helical fold with three domains (i.e., D1, D2, and D3). Small angle X-ray scattering and electron microscopy (EM) studies reveal that ScNup192(2–960) could undergo long-range transition between “open” and “closed” conformations. We obtained a structural model of full-length ScNup192 based on EM, the structure of ScNup192(2–960), and homology modeling. Evolutionary analyses using the ScNup192(2–960) structure suggest that NPCs and vesicle-coating complexes are descended from a common membrane-coating ancestral complex. We show that suppression of Nup192 expression leads to compromised nuclear transport and hypothesize a role for Nup192 in modulating the permeability of the NPC central channel.
PMCID: PMC3755625  PMID: 23499021
15.  Mapping Polymerization and Allostery of Hemoglobin S Using Point Mutations 
The journal of physical chemistry. B  2013;117(42):13058-13068.
Hemoglobin is a complex system that undergoes conformational changes in response to oxygen, allosteric effectors, mutations, and environmental changes. Here, we study allostery and polymerization of hemoglobin and its variants by application of two previously described methods: (i) AllosMod for simulating allostery dynamics given two allosterically related input structures and (ii) a machine-learning method for dynamics- and structure-based prediction of the mutation impact on allostery (Weinkam et al. J. Mol. Biol. 2013), now applicable to systems with multiple coupled binding sites such as hemoglobin. First, we predict the relative stabilities of substates and microstates of hemoglobin, which are determined primarily by entropy within our model. Next, we predict the impact of 866 annotated mutations on hemoglobin’s oxygen binding equilibrium. We then discuss a subset of 30 mutations that occur in the presence of the sickle cell mutation and whose effects on polymerization have been measured. Seven of these HbS mutations occur in three predicted druggable binding pockets that might be exploited to directly inhibit polymerization; one of these binding pockets is not apparent in the crystal structure but only in structures generated by AllosMod. For the 30 mutations, we predict that mutation-induced conformational changes within a single tetramer tend not to significantly impact polymerization; instead, these mutations more likely impact polymerization by directly perturbing a polymerization interface. Finally, our analysis of allostery allows us to hypothesize why hemoglobin evolved to have multiple subunits and a persistent low frequency sickle cell mutation.
PMCID: PMC3973026  PMID: 23957820
Energy landscape; funnel; Gō model; molecular dynamics; machine-learning
16.  All-atom ensemble modeling to analyze small angle X-ray scattering of glycosylated proteins 
Structure (London, England : 1993)  2013;21(3):10.1016/j.str.2013.02.004.
The flexible and heterogeneous nature of carbohydrate chains often renders glycoproteins refractory to traditional structure determination methods. Small-angle X-ray scattering (SAXS) can be a useful tool for obtaining structural information on these systems. All-atom modeling of glycoproteins with flexible glycan chains was applied to interpret the solution SAXS data for a set of glycoproteins. For simpler systems (single glycan, with a well-defined protein structure), all-atom modeling generates models in excellent agreement with the scattering pattern, and reveals the approximate spatial occupancy of the glycan chain in solution. For more complex systems (several glycan chains, or unknown protein substructure), the approach can still provide insightful models, though the orientations of glycans become poorly determined. Ab initio shape reconstructions appear to capture the global morphology of glycoproteins, but in most cases offer little information about glycan spatial occupancy. The all-atom modeling methodology is available as a webserver at
PMCID: PMC3840220  PMID: 23473666
17.  Integrative Structural Biology 
Science (New York, N.Y.)  2013;339(6122):913-915.
PMCID: PMC3633482  PMID: 23430643
18.  Impact of mutations on the allosteric conformational equilibrium 
Journal of molecular biology  2012;425(3):647-661.
Allostery in a protein involves effector binding at an allosteric site that changes the structure and/or dynamics at a distant, functional site. In addition to the chemical equilibrium of ligand binding, allostery involves a conformational equilibrium between one protein substate that binds the effector and a second substate that less strongly binds the effector. We run molecular dynamics simulations using simple, smooth energy landscapes to sample specific ligand-induced conformational transitions, as defined by the effector-bound and unbound protein structures. These simulations can be performed using our web server. We then develop a set of features to analyze the simulations and capture the relevant thermodynamic properties of the allosteric conformational equilibrium. These features are based on molecular mechanics energy functions, stereochemical effects, and structural/dynamic coupling between sites. Using a machine-learning algorithm on a dataset of 10 proteins and 179 mutations, we predict both the magnitude and sign of the allosteric conformational equilibrium shift by the mutation; the impact of a large identifiable fraction of the mutations can be predicted with an average unsigned error of 1 kBT. With similar accuracy, we predict the mutation effects for an 11th protein that was omitted from the initial training and testing of the machine-learning algorithm. We also assess which calculated thermodynamic properties contribute most to the accuracy of the prediction.
PMCID: PMC3557769  PMID: 23228330
energy landscape; protein dynamics; machine learning; allostery
19.  The Assignment of Pterin Deaminase Activity to an Enzyme of Unknown Function Guided by Homology Modeling and Docking 
Of the over 22 million protein sequences in the nonredundant TrEMBL database, fewer than 1% have experimentally confirmed functions. Structure-based methods have been used to predict enzyme activities from experimentally determined structures; however, for the vast majority of proteins, no such structures are available. Here, homology models of a functionally uncharacterized amidohydrolase from Agrobacterium radiobacter K84 (Arad3529) were computed based on a remote template structure. The protein backbone of two loops near the active site was remodeled, resulting in four distinct active site conformations. Substrates of Arad3529 were predicted by docking of 57672 high-energy intermediate (HEI) forms of 6440 metabolites against these four homology models. Based on docking ranks and geometries, a set of modified pterins were suggested as candidate substrates for Arad3529. The predictions were tested by enzymology experiments, and Arad3529 deaminated many pterin metabolites (substrate, kcat/Km [M⁻¹ s⁻¹]): formylpterin, 5.2 × 10⁶; pterin-6-carboxylate, 4.0 × 10⁶; pterin-7-carboxylate, 3.7 × 10⁶; pterin, 3.3 × 10⁶; hydroxymethylpterin, 1.2 × 10⁶; biopterin, 1.0 × 10⁶; D-(+)-neopterin, 3.1 × 10⁵; isoxanthopterin, 2.8 × 10⁵; sepiapterin, 1.3 × 10⁵; folate, 1.3 × 10⁵; xanthopterin, 1.17 × 10⁵; 7,8-dihydrohydroxymethylpterin, 3.3 × 10⁴. While pterin is a ubiquitous oxidative product of folate degradation, genomic analysis suggests that the first step of an undescribed pterin degradation pathway is catalyzed by Arad3529. Homology model-based virtual screening, especially with modeling of protein backbone flexibility, may be broadly useful for enzyme function annotation and discovering new pathways and drug targets.
PMCID: PMC3557803  PMID: 23256477
21.  SAXS Merge: an automated statistical method to merge SAXS profiles using Gaussian processes 
Journal of Synchrotron Radiation  2013;21(Pt 1):203-208.
A statistical method to merge SAXS profiles using Gaussian processes is presented.
Small-angle X-ray scattering (SAXS) is an experimental technique that allows structural information on biomolecules in solution to be gathered. High-quality SAXS profiles have typically been obtained by manual merging of scattering profiles from different concentrations and exposure times. This procedure is very subjective and results vary from user to user. Up to now, no robust automatic procedure has been published to perform this step, preventing the application of SAXS to high-throughput projects. Here, SAXS Merge, a fully automated statistical method for merging SAXS profiles using Gaussian processes, is presented. This method requires only the buffer-subtracted SAXS profiles in a specific order. At the heart of its formulation is non-linear interpolation using Gaussian processes, which provides a statement of the problem that accounts for correlation in the data.
PMCID: PMC3874021  PMID: 24365937
SAXS; SANS; data curation; Gaussian process; merging
22.  A method for integrative structure determination of protein-protein complexes 
Bioinformatics  2012;28(24):3282-3289.
Motivation: Structural characterization of protein interactions is necessary for understanding and modulating biological processes. On one hand, X-ray crystallography and NMR spectroscopy provide atomic-resolution structures, but the data collection process is typically long and the success rate is low. On the other hand, computational methods for modeling assembly structures from individual components frequently suffer from a high false-positive rate, rarely resulting in a unique solution.
Results: Here, we present a combined approach that computationally integrates data from a variety of fast and accessible experimental techniques for rapid and accurate structure determination of protein–protein complexes. The integrative method uses atomistic models of two interacting proteins and one or more datasets from five accessible experimental techniques: a small-angle X-ray scattering (SAXS) profile, 2D class average images from negative-stain electron microscopy micrographs (EM), a 3D density map from single-particle negative-stain EM, residue type content of the protein–protein interface from NMR spectroscopy and chemical cross-linking detected by mass spectrometry. The method is tested on a docking benchmark consisting of 176 known complex structures and simulated experimental data. The near-native model is the top scoring one for up to 61% of benchmark cases depending on the included experimental datasets; in comparison to 10% for standard computational docking. We also collected SAXS, 2D class average images and 3D density map from negative-stain EM to model the PCSK9 antigen–J16 Fab antibody complex, followed by validation of the model by a subsequently available X-ray crystallographic structure.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3519461  PMID: 23093611
23.  ModBase, a database of annotated comparative protein structure models and associated resources 
Nucleic Acids Research  2013;42(Database issue):D336-D346.
ModBase is a database of annotated comparative protein structure models. The models are calculated by ModPipe, an automated modeling pipeline that relies primarily on Modeller for fold assignment, sequence-structure alignment, model building and model assessment. ModBase currently contains almost 30 million reliable models for domains in 4.7 million unique protein sequences. ModBase allows users to compute or update comparative models on demand, through an interface to the ModWeb modeling server. ModBase models are also available through the Protein Model Portal. Recently developed associated resources include the AllosMod server for modeling ligand-induced protein dynamics, the AllosMod-FoXS server for predicting a structural ensemble that fits an SAXS profile, the FoXSDock server for protein–protein docking filtered by an SAXS profile, the SAXS Merge server for automatic merging of SAXS profiles and the Pose & Rank server for scoring protein–ligand complexes. In this update, we also highlight two applications of ModBase: a PSI:Biology initiative to maximize the structural coverage of the human alpha-helical transmembrane proteome and a determination of structural determinants of human immunodeficiency virus-1 protease specificity.
PMCID: PMC3965011  PMID: 24271400
24.  Self-assembly of Filamentous Amelogenin Requires Calcium and Phosphate: From Dimers via Nanoribbons to Fibrils 
Biomacromolecules  2012;13(11):3494-3502.
Enamel matrix self-assembly has long been suggested as the driving force behind aligned nanofibrous hydroxyapatite formation. We tested if amelogenin, the main enamel matrix protein, can self-assemble into ribbon-like structures in physiologic solutions. Ribbons 17 nm wide were observed to grow several microns in length, requiring calcium, phosphate, and pH 4.0–6.0. The pH range suggests that the formation of ion bridges through protonated histidine residues is essential to self-assembly, supported by a statistical analysis of 212 phosphate-binding proteins predicting twelve phosphate-binding histidines. Thermophoretic analysis verified the importance of calcium and phosphate in self-assembly. X-ray scattering characterized amelogenin dimers with dimensions fitting the cross-section of the amelogenin ribbon, leading to the hypothesis that antiparallel dimers are the building blocks of the ribbons. Over 5–7 days, ribbons self-organized into bundles composed of aligned ribbons mimicking the structure of enamel crystallites in enamel rods. These observations confirm reports of filamentous organic components in developing enamel and provide a new model for matrix-templated enamel mineralization.
PMCID: PMC3496023  PMID: 22974364
Enamel; amelogenin; self-assembly; protonated histidine; biomineralization
