Homoisocitrate dehydrogenase (HICDH) catalyzes the conversion of homoisocitrate to 2-oxoadipate, the third enzymatic step in the α-aminoadipate pathway by which lysine is synthesized in fungi and certain archaebacteria. This enzyme represents a potential target for anti-fungal drug design. Here, we describe the first crystal structures of a fungal HICDH, including structures of an apoenzyme and a binary complex with a glycine tri-peptide. The structures illustrate the homology of HICDH with other β-hydroxyacid oxidative decarboxylases and reveal key differences with the active site of Thermus thermophilus HICDH that provide insights into the differences in substrate specificity of these enzymes.
X-ray crystallography; amino acid metabolism; lysine biosynthesis; β-hydroxyacid oxidative decarboxylase
A number of methods have been described for identifying pairs of contacting residues in protein three-dimensional structures, but it is unclear how many contacts are required for accurate structure modeling. The CASP10 assisted contact experiment provided a blind test of contact-guided protein structure modeling. We describe the models generated for these contact-guided prediction challenges using the Rosetta structure modeling methodology. For nearly all cases, the submitted models had the correct overall topology, and in some cases, they had near atomic-level accuracy; for example, the model of the 384-residue homo-oligomeric tetramer (Tc680o) had only 2.9 Å root-mean-square deviation (RMSD) from the crystal structure. Our results suggest that experimental and bioinformatic methods for obtaining contact information may need to generate only one correct contact for every 12 residues in the protein to allow accurate topology-level modeling.
protein structure prediction; rosetta; comparative modeling; homology modeling; ab initio prediction; contact prediction
We compare results of the community efforts in modeling protein structures in the tenth CASP experiment, with those in earlier CASPs, particularly in CASP5, a decade ago. There is a substantial improvement in template based model accuracy as reflected in more successful modeling of regions of structure not easily derived from a single experimental structure template, most likely reflecting intensive work within the modeling community in developing methods that make use of multiple templates, as well as the increased number of experimental structures available. Deriving structural information not obvious from a template is the most demanding as well as one of the most useful tasks that modeling can perform. Thus this is gratifying progress. By contrast, overall backbone accuracy of models appears little changed in the last decade. This puzzling result is explained by two factors – increased database size in some ways makes it harder to choose the best available templates, and the increased intrinsic difficulty of CASP targets, as experimental work has progressed to larger and more unusual structures. There is no detectable recent improvement in template free modeling, but again, this may reflect the changing nature of CASP targets.
Protein Structure Prediction; Community Wide Experiment; CASP
We used molecular dynamics (MD) simulations for structure refinement of CASP10 targets. Refinement was achieved by selecting structures from the MD-based ensembles followed by structural averaging. The overall performance of this method in CASP10 is described and specific aspects are analyzed in detail to provide insight into key components. In particular, the use of different restraint types, sampling from multiple short simulations vs. a single long simulation, the success of a quality assessment criterion, the application of scoring vs. averaging, and the impact of a final refinement step are discussed in detail.
CASP; structure prediction; scoring; protein; quality assessment
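The refinement abstract above describes selecting structures from an MD ensemble and then averaging them. A minimal Python sketch of that idea follows. It assumes the frames are already superimposed on a common reference, that a lower score means a better structure, and that a fixed fraction of frames is kept; the function name, the `keep_fraction` parameter, and the scoring convention are illustrative assumptions, not the authors' protocol.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def average_selected(
    structures: Sequence[Sequence[Coord]],
    scores: Sequence[float],
    keep_fraction: float = 0.25,
) -> List[Coord]:
    """Average the coordinates of the best-scoring subset of an ensemble.

    Assumptions (hypothetical, not taken from the abstract): frames are
    pre-superimposed, lower score is better, and every frame lists the
    same atoms in the same order.
    """
    if not structures or len(structures) != len(scores):
        raise ValueError("need one score per structure")
    n_keep = max(1, int(len(structures) * keep_fraction))
    # Rank frame indices from best (lowest) to worst score.
    ranked = sorted(range(len(structures)), key=lambda i: scores[i])
    chosen = [structures[i] for i in ranked[:n_keep]]
    averaged: List[Coord] = []
    for atom in range(len(chosen[0])):
        x = sum(frame[atom][0] for frame in chosen) / n_keep
        y = sum(frame[atom][1] for frame in chosen) / n_keep
        z = sum(frame[atom][2] for frame in chosen) / n_keep
        averaged.append((x, y, z))
    return averaged
```

Averaged coordinates can have distorted local geometry, which is consistent with the abstract's mention of a final refinement step after averaging.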
Cellular Retinoic Acid Binding Protein II (CRABPII) has been re-engineered to specifically bind and react with all-trans-retinal to form a protonated Schiff base. Each step of this process has been dissected and four residues (Lys132, Tyr134, Arg111, Glu121) within the CRABPII binding site have been identified as crucial for imine formation and/or protonation. The precise role of each residue has been examined through site-directed mutagenesis and crystallographic studies. The crystal structure of the R132K:L121E-CRABPII double mutant suggests a direct interaction between engineered Glu121 and the native Arg111, which is critical for both Schiff base formation and protonation.
Schiff base; Bürgi-Dunitz angle; Cellular retinoic acid binding protein; ordered water network; rhodopsin protein mimic
The N-terminal region of the chemokine RANTES is critical for its function. A synthesized N-terminally modified analog of RANTES, P2-RANTES, was discovered using a phage display selection against living CCR5-expressing cells, and has been reported to inhibit HIV-1 env-mediated cell-cell fusion at subnanomolar levels [Hartley et al., J. Virol. 77, 6637–44 (2003)]. In the present study, we produced this protein using E. coli overexpression and extensively studied its structure and function. The X-ray crystal structure of P2-RANTES was solved and refined at 1.7 Å resolution. This protein was found to be predominantly a monomer in solution by analytical ultracentrifugation, but a tetramer in the crystal. In studies of glycosaminoglycan binding, P2-RANTES was found to be significantly less able to bind heparin than wild-type RANTES. We also tested this protein in receptor internalization assays, where it was shown to be functional; in cell-cell fusion assays, where recombinant P2-RANTES was a potent fusion inhibitor (IC50 = 2.4 ± 0.8 nM); and in single-round infection assays, where P2-RANTES inhibited at subnanomolar levels. Further, in a modified fusion assay designed to test specificity of inhibition, P2-RANTES was also highly effective, with a 65-fold improvement over the fusion inhibitor C37, which is closely related to the clinically approved inhibitor T-20. These studies provide detailed structural and functional information for this novel N-terminally modified chemokine mutant, information that should be useful in the development of more potent anti-HIV agents.
HIV fusion inhibitor; chemokine; GAG binding; quaternary state; competition fusion assay
DNA topology; topoisomerase; gyrase; antibiotic resistance; Mycobacterium tuberculosis; breakage-reunion domain
PF10014 is a novel family of 2-oxoglutarate-Fe2+-dependent dioxygenases that are involved in biosynthesis of antibiotics and regulation of biofilm formation, likely by catalyzing hydroxylation of free amino acids or other related ligands. The crystal structure of a PF10014 member from Methylibium petroleiphilum at 1.9 Å resolution shows strong structural similarity to cupin dioxygenases in overall fold and active site, despite very remote homology. However, one of the β-strands of the cupin catalytic core is replaced by a loop that displays conformational isomerism that likely regulates the active site.
PF10014/BsmA; cupin dioxygenase; free amino acids; 2-oxoglutarate; ferrous iron
We developed a method called Residue Contact Frequency (RCF), which uses the complex structures generated by the protein-protein docking algorithm ZDOCK to predict interface residues. Unlike interface prediction algorithms that are based on monomers alone, RCF is binding partner specific. We evaluated the performance of RCF using the area under the precision-recall (PR) curve (AUC) on a large protein docking benchmark. RCF (AUC=0.44) performed as well as meta-PPISP (AUC=0.43), which is one of the best monomer-based interface prediction methods. In addition, we tested a Support Vector Machine (SVM) that combines RCF with meta-PPISP and with another monomer-based interface prediction algorithm, Evolutionary Trace, to further improve the performance. We found that the SVM that combined RCF and meta-PPISP achieved the best performance (AUC=0.47).
We used RCF to predict the binding interfaces of proteins that can bind to multiple partners and RCF was able to correctly predict interface residues that are unique for the respective binding partners. Furthermore, we found that residues that contributed greatly to binding affinity (hotspot residues) had significantly higher RCF than other residues.
Protein-Protein Docking; Protein Interface Prediction; Machine Learning; Support Vector Machine; Hotspot Prediction
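The core idea of RCF in the abstract above, scoring each residue by how often it appears at the interface across a set of docking poses, can be sketched in a few lines of Python. The sketch assumes each pose has already been reduced to a set of interface residue identifiers and that every pose is weighted equally; the published method may define contacts and weights differently, and the function name and identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def residue_contact_frequency(
    pose_interfaces: Sequence[Iterable[str]],
) -> Dict[str, float]:
    """Fraction of docking poses in which each residue is at the interface.

    `pose_interfaces` holds one collection of residue IDs per docking pose
    (for example, poses produced by a docking program). Equal weighting of
    poses is an assumption of this sketch.
    """
    n_poses = len(pose_interfaces)
    if n_poses == 0:
        return {}
    counts: Counter = Counter()
    for interface in pose_interfaces:
        # set() guards against a residue being listed twice in one pose.
        counts.update(set(interface))
    return {residue: count / n_poses for residue, count in counts.items()}
```

Because the poses come from docking against a specific partner, the resulting frequencies depend on that partner, which is what makes this kind of score partner specific rather than monomer based.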
EccA1 is an important component of the type VII secretion system (T7SS) that is responsible for transport of virulence factors in pathogenic mycobacteria. EccA1 has an N-terminal domain of unknown function and a C-terminal AAA+ (ATPases associated with various cellular activities) domain. Here we report the crystal structure of the N-terminal domain of EccA1 from Mycobacterium tuberculosis, which shows an arrangement of six tetratricopeptide repeats that may mediate interactions of EccA1 with secreted substrates. Furthermore, the size and shape of the N-terminal domain suggest its orientation in the context of a hexamer model of full-length EccA1.
Rv3868; tetratricopeptide repeat; TPR domain; AAA+ ATPase; type VII secretion system
Phosphorylation is a crucial step in many cellular processes, ranging from metabolic reactions involved in energy transformation to signaling cascades. In many instances, protein domains specifically recognize the phosphogroup. Knowledge of the binding site provides insights into the interaction, and it can also be exploited for therapeutic purposes. Previous studies have shown that proteins interacting with phosphogroups are highly heterogeneous, and no single property can be used to reliably identify the binding site. Here we present an energy-based computational procedure that exploits the protein 3D structure to identify binding sites involved in the recognition of phosphogroups. The procedure is validated on three datasets containing over 200 proteins binding to ATP, phosphopeptides, and phosphosugars. A comparison against three other generic binding site identification approaches shows higher accuracy for our method, with a correct identification rate in the 80–90% range for the top three predicted sites. Addition of conservation information further improves the performance. The method presented here can be used as a first step in functional annotation, or to guide mutagenesis experiments and further studies such as molecular docking.
Predictions of protein-protein binders and binding affinities have traditionally focused on features pertaining to the native complexes. In developing a computational method for predicting protein-protein association rate constants, we introduced the concept of the transient complex after mapping the interaction energy surface. The transient complex is located at the outer boundary of the bound-state energy well, having near-native separation and relative orientation between the subunits but not yet having formed most of the short-range native interactions. We found that the width of the binding funnel and the electrostatic interaction energy of the transient complex are among the features predictive of binders and binding affinities. These ideas proved promising for the five affinity-related targets (T43–45, 55, and 56) of CAPRI rounds 20–27. For T43, we ranked the single crystallographic complex as number 1 and were one of only two groups that clearly identified that complex as a true binder; for T44, we ranked the only design with measurable binding affinity as number 4. For the nine docking targets, continuing our success in previous CAPRI rounds, we produced 10 medium-quality models for T47 and acceptable models for T48 and T49. We conclude that the interaction energy landscape, and the transient complex in particular, will complement existing features in leading to better prediction of binding affinities.
transient complex; interaction energy landscape; binding affinity; protein docking; protein association
Inclusion of entropy is important and challenging for protein-protein binding prediction. Here, we present a statistical mechanics-based approach to empirically consider the effect of orientational entropy. Specifically, we globally sample the possible binding orientations based on a simple shape-complementarity scoring function using an FFT-type docking method. Then, for each generated orientation we calculate the probability through the partition function of the ensemble of accessible states, which are assumed to be represented by the set of nearby binding modes. For each mode, the interaction energy is calculated from our ITScorePP scoring function that was developed in our laboratory based on principles of statistical mechanics. Using the above protocol, we present the results of our participation in Rounds 22–27 of the CAPRI (Critical Assessment of PRedicted Interactions) experiment for ten targets (T46-T58). Additional experimental information, such as low-resolution SAXS data, was used when available. In the prediction (or docking) experiments of the ten target complexes, we achieved correct binding modes for six targets: one with high accuracy (T47), two with medium accuracy (T48 and T57), and three with acceptable accuracy (T49, T50, and T58). In the scoring experiments of seven target complexes, we obtained correct binding modes for six targets: one with high accuracy (T47), two with medium accuracy (T49 and T50), and three with acceptable accuracy (T46, T51, and T53).
protein-protein interaction; CAPRI experiments; scoring function; entropy; molecular docking
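The orientational-entropy abstract above assigns each docked orientation a probability through the partition function of its accessible states, taken to be the nearby binding modes. One possible reading is sketched below in Python: the Boltzmann weights of the modes near orientation i are summed and divided by the total over all sampled modes. The exact expression used by the authors is not given in the abstract, so the formula, the `neighbors` representation, and the energy units (in units of kT) are assumptions of this sketch.

```python
import math
from typing import List, Sequence


def mode_probabilities(
    energies: Sequence[float],
    neighbors: Sequence[Sequence[int]],
    kT: float = 1.0,
) -> List[float]:
    """Probability of each orientation from Boltzmann weights of nearby modes.

    energies[j]  : interaction energy of binding mode j (same units as kT).
    neighbors[i] : indices of modes considered accessible from orientation i,
                   including i itself (how "nearby" is defined is left open).
    """
    e_min = min(energies)
    # Shift by the minimum energy for numerical stability; ratios are unchanged.
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z_total = sum(weights)
    return [sum(weights[j] for j in nbrs) / z_total for nbrs in neighbors]
```

Under this reading, an orientation surrounded by many low-energy neighbors scores higher than an isolated mode of similar energy, which is how the ensemble term rewards broad energy wells. Because neighbor sets can overlap, the returned values need not sum to one.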
The protein docking server ClusPro has been participating in CAPRI since its introduction in 2004. This paper evaluates the performance of ClusPro 2.0 for targets 46–58 in rounds 22–27 of CAPRI. The analysis leads to a number of important observations. First, ClusPro reliably yields acceptable or medium accuracy models for targets of moderate difficulty that have also been successfully predicted by other groups, and fails only for targets that have few acceptable models submitted. Second, the quality of automated docking by ClusPro is very close to that of the best human predictor groups, including our own submissions. This is very important, because servers have to submit results within 48 hours and the predictions should be reproducible, whereas human predictors have several weeks and can use any type of information. Third, while we refined the ClusPro results for manual submission by running computationally costly Monte Carlo minimization simulations, we observed significant improvement in accuracy only for two of the six complexes correctly predicted by ClusPro. Fourth, new developments, not seen in previous rounds of CAPRI, are that the top ranked model provided by ClusPro was acceptable or better quality for all these six targets, and that the top ranked model was also the highest quality for five of the six, confirming that ranking models based on cluster size can reliably identify the best near-native conformations.
protein-protein docking; structure refinement; method development; CAPRI docking experiment; web based server; user community
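The ClusPro abstract above concludes that ranking models by cluster size can identify near-native conformations. A minimal Python sketch of greedy, radius-based clustering with ranking by cluster size follows. It assumes a precomputed symmetric matrix of pairwise distances between poses (for instance, pairwise RMSD values) and a user-chosen clustering radius; both inputs and the function name are hypothetical, and the server's actual procedure may differ in detail.

```python
from typing import List, Sequence, Tuple


def rank_by_cluster_size(
    dist: Sequence[Sequence[float]], radius: float
) -> List[Tuple[int, List[int]]]:
    """Greedy clustering: repeatedly take the pose with the most neighbors.

    Returns (center_index, member_indices) pairs, largest cluster first.
    `dist` is assumed symmetric with zeros on the diagonal.
    """
    remaining = set(range(len(dist)))
    clusters: List[Tuple[int, List[int]]] = []
    while remaining:
        best_center, best_members = -1, []
        for i in sorted(remaining):
            members = [j for j in sorted(remaining) if dist[i][j] <= radius]
            if len(members) > len(best_members):
                best_center, best_members = i, members
        clusters.append((best_center, best_members))
        # Remove the whole cluster before searching for the next center.
        remaining -= set(best_members)
    return clusters
```

The center of the first cluster returned would play the role of the top-ranked model in this sketch.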
Protein-DNA interactions are essential for many biological processes. X-ray crystallography can provide high-resolution structures, but protein-DNA complexes are difficult to crystallize and typically contain only small DNA fragments. Thus, there is a need for computational methods that can provide useful predictions to give insights into mechanisms and guide the design of new experiments. We used the program DOT, which performs an exhaustive, rigid-body search between two macromolecules, to investigate four diverse protein-DNA interactions. Here, we compare our computational results with subsequent experimental data on related systems. In all cases, the experimental data strongly supported our structural hypotheses from the docking calculations: a mechanism for weak, non-sequence-specific DNA binding by a transcription factor, a large DNA-binding footprint on the surface of the DNA-repair enzyme uracil-DNA-glycosylase, viral and host DNA-binding sites on the catalytic domain of HIV integrase, and a three-DNA-contact model of the linker histone bound to the nucleosome. In the case of uracil-DNA-glycosylase, the experimental design was based on the DNA-binding surface found by docking, rather than the much smaller surface observed in the crystallographic structure. These comparisons demonstrate that the DOT electrostatic energy gives a good representation of the distinctive electrostatic properties of DNA and DNA-binding proteins. The large, favorably ranked clusters resulting from the dockings identify active sites, map out large DNA-binding sites, and reveal multiple DNA contacts with a protein. Thus, computational docking can not only help to identify protein-DNA interactions in the absence of a crystal structure, but also expand structural understanding beyond known crystallographic structures.
protein-DNA structure; HIV integrase; uracil DNA-glycosylase; linker histone; transcription factor; Poisson-Boltzmann electrostatics; hydrogen/deuterium exchange
Peptide-mediated interactions, in which a short linear motif binds to a globular domain, play major roles in cellular regulation. An accurate structural model of this type of interaction is an excellent starting point for the characterization of the binding specificity of a given peptide-binding domain. A number of different protocols have recently been proposed for the accurate modeling of peptide-protein complex structures, given the structure of the protein receptor and the binding site on its surface. When no information about the peptide binding site(s) is a priori available, there is a need for new approaches to locate peptide-binding sites on the protein surface. While several approaches have been proposed for the general identification of ligand binding sites, peptides show very specific binding characteristics, and therefore, there is a need for robust and accurate approaches that are optimized for the prediction of peptide-binding sites.
Here we present PeptiMap, a protocol for the accurate mapping of peptide binding sites on protein structures. Our method is based on experimental evidence that peptide-binding sites also bind small organic molecules of various shapes and polarity. Using an adaptation of ab initio ligand binding site prediction based on fragment mapping (FTMap), we optimize a protocol that specifically takes into account peptide binding site characteristics. For most complexes in a high-quality curated set of peptide-protein complex structures, PeptiMap identifies the correct peptide-binding site among the top-ranked predictions. We anticipate that this protocol will significantly increase the number of accurate structural models of peptide-mediated interactions.
protein peptide interactions; FFT sampling; binding site detection; mapping; PeptiDB
Although it is now possible to fold peptides and miniproteins in molecular dynamics simulations, it is well appreciated that force fields are not all transferable to different proteins. Here, we investigate the influence of the protein force field and the solvent model on the folding energy landscape of a prototypical two-state folder, the GB1 hairpin. We use extensive replica-exchange molecular dynamics simulations to characterize the free-energy surface as a function of temperature. Most of these force fields appear similar at a global level, giving a fraction folded at 300 K between 0.2 and 0.8 in all cases, which corresponds to a difference in stability of 2.8 kT, and are generally consistent with experimental data at this temperature. The most significant differences appear in the unfolded state: in the residual secondary structures that are populated, and in the overall dimensions of the unfolded state, which in most of the force fields is too collapsed relative to experimental Förster Resonance Energy Transfer (FRET) data.
protein folding; molecular simulations; protein force field; Free-energy landscape
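The 2.8 kT figure quoted in the abstract above follows from the standard two-state relation between the fraction folded f and the folding free energy, ΔG/kT = -ln(f / (1 - f)). The short Python check below reproduces the arithmetic; the function name is illustrative, and the two-state assumption is the one stated in the abstract for the GB1 hairpin.

```python
import math


def folding_free_energy_kT(fraction_folded: float) -> float:
    """Folding free energy in units of kT for a two-state folder.

    Uses dG/kT = -ln(f / (1 - f)); negative values mean the folded
    state is favored.
    """
    if not 0.0 < fraction_folded < 1.0:
        raise ValueError("fraction folded must lie strictly between 0 and 1")
    return -math.log(fraction_folded / (1.0 - fraction_folded))


# Stability spread between the least and most stable cases quoted above.
spread = folding_free_energy_kT(0.2) - folding_free_energy_kT(0.8)
print(f"{spread:.2f} kT")  # 2 * ln(4), about 2.77 kT
```

A fraction folded of 0.2 gives ΔG = +ln 4 kT and 0.8 gives ΔG = -ln 4 kT, so the spread is 2 ln 4 ≈ 2.77 kT, which rounds to the 2.8 kT stated in the abstract.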