A software system for automated protein–ligand crystallography has been implemented in the Phenix suite. This significantly reduces the manual effort required in high-throughput crystallographic studies.
High-throughput drug-discovery and mechanistic studies often require the determination of multiple related crystal structures that only differ in the bound ligands, point mutations in the protein sequence and minor conformational changes. If performed manually, solution and refinement requires extensive repetition of the same tasks for each structure. To accelerate this process and minimize manual effort, a pipeline encompassing all stages of ligand building and refinement, starting from integrated and scaled diffraction intensities, has been implemented in Phenix. The resulting system is able to successfully solve and refine large collections of structures in parallel without extensive user intervention prior to the final stages of model completion and validation.
protein–ligand complexes; automation; crystallographic structure solution and refinement
X-ray crystallography is a critical tool in the study of biological systems. It is able to provide information that has been a prerequisite to understanding the fundamentals of life. It is also a method that is central to the development of new therapeutics for human disease. Significant time and effort are required to determine and optimize many macromolecular structures because of the need for manual interpretation of complex numerical data, often using many different software packages, and the repeated use of interactive three-dimensional graphics. The Phenix software package has been developed to provide a comprehensive system for macromolecular crystallographic structure solution with an emphasis on automation. This has required the development of new algorithms that minimize or eliminate subjective input in favour of built-in expert-systems knowledge, the automation of procedures that are traditionally performed by hand, and the development of a computational framework that allows a tight integration between the algorithms. The application of automated methods is particularly appropriate in the field of structural proteomics, where high throughput is desired. Features in Phenix for the automation of experimental phasing with subsequent model building, molecular replacement, structure refinement and validation are described and examples given of running Phenix from both the command line and graphical user interface.
Macromolecular Crystallography; Automation; Phenix; X-ray; Diffraction; Python
The foundations and current features of a widely used graphical user interface for macromolecular crystallography are described.
A new Python-based graphical user interface for the PHENIX suite of crystallography software is described. This interface unifies the command-line programs and their graphical displays, simplifying the development of new interfaces and avoiding duplication of function. With careful design, graphical interfaces can be displayed automatically, instead of being manually constructed. The resulting package is easily maintained and extended as new programs are added or modified.
macromolecular crystallography; graphical user interfaces; PHENIX
phenix.refine is a program within the PHENIX package that supports crystallographic structure refinement against experimental data with a wide range of upper resolution limits using a large repertoire of model parameterizations. This paper presents an overview of the major phenix.refine features, with extensive literature references for readers interested in more detailed discussions of the methods.
phenix.refine is a program within the PHENIX package that supports crystallographic structure refinement against experimental data with a wide range of upper resolution limits using a large repertoire of model parameterizations. It has several automation features and is also highly flexible. Several hundred parameters enable extensive customizations for complex use cases. Multiple user-defined refinement strategies can be applied to specific parts of the model in a single refinement run. An intuitive graphical user interface is available to guide novice users and to assist advanced users in managing refinement projects. X-ray or neutron diffraction data can be used separately or jointly in refinement. phenix.refine is tightly integrated into the PHENIX suite, where it serves as a critical component in automated model building, final structure refinement, structure validation and deposition to the wwPDB. This paper presents an overview of the major phenix.refine features, with extensive literature references for readers interested in more detailed discussions of the methods.
structure refinement; PHENIX; joint X-ray/neutron refinement; maximum likelihood; TLS; simulated annealing; subatomic resolution; real-space refinement; twinning; NCS
Recent developments in PHENIX are reported that allow the use of reference-model torsion restraints, secondary-structure hydrogen-bond restraints and Ramachandran restraints for improved macromolecular refinement in phenix.refine at low resolution.
Traditional methods for macromolecular refinement often have limited success at low resolution (3.0–3.5 Å or worse), producing models that score poorly on crystallographic and geometric validation criteria. To improve low-resolution refinement, knowledge from macromolecular chemistry and homology was used to add three new coordinate-restraint functions to the refinement program phenix.refine. Firstly, a ‘reference-model’ method uses an identical or homologous higher resolution model to add restraints on torsion angles to the geometric target function. Secondly, automatic restraints for common secondary-structure elements in proteins and nucleic acids were implemented that can help to preserve the secondary-structure geometry, which is often distorted at low resolution. Lastly, we have implemented Ramachandran-based restraints on the backbone torsion angles. In this method, a ϕ,ψ term is added to the geometric target function to minimize a modified Ramachandran landscape that smoothly combines favorable peaks identified from nonredundant high-quality data with unfavorable peaks calculated using a clash-based pseudo-energy function. All three methods show improved MolProbity validation statistics, typically complemented by a lowered R
free and a decreased gap between R
work and R
macromolecular crystallography; low resolution; refinement; automation
For template-based modeling in the CASP8 Critical Assessment of Techniques for Protein Structure Prediction, this work develops and applies six new full-model metrics. They are designed to complement and add value to the traditional template-based assessment by GDT (Global Distance Test) and related scores (based on multiple superpositions of Cα atoms between target structure and predictions labeled “model 1”). The new metrics evaluate each predictor group on each target, using all atoms of their best model with above-average GDT. Two metrics evaluate how “protein-like” the predicted model is: the MolProbity score used for validating experimental structures, and a mainchain reality score using all-atom steric clashes, bond length and angle outliers, and backbone dihedrals. Four other new metrics evaluate match of model to target for mainchain and sidechain hydrogen bonds, sidechain end positioning, and sidechain rotamers. Group-average Z-score across the six full-model measures is averaged with group-average GDT Z-score to produce the overall ranking for full-model, high-accuracy performance.
Separate assessments are reported for specific aspects of predictor-group performance, such as robustness of approximately correct template or fold identification, and self-scoring ability at identifying the best of their models. Fold identification is distinct from but correlated with group-average GDT Z-score if target difficulty is taken into account, while self-scoring is done best by servers and is uncorrelated with GDT performance. Outstanding individual models on specific targets are identified and discussed. Predictor groups excelled at different aspects, highlighting the diversity of current methodologies. However, good full-model scores correlate robustly with high Cα accuracy.
homology modeling; protein structure prediction; all-atom contacts; full-model assessment
Application of phenix.model_vs_data to the contents of the Protein Data Bank shows that the vast majority of deposited structures can be automatically analyzed to reproduce the reported quality statistics. However, the small fraction of structures that elude automated re-analysis highlight areas where new software developments can help retain valuable information for future analysis.
phenix.model_vs_data is a high-level command-line tool for the computation of crystallographic model and data statistics, and the evaluation of the fit of the model to data. Analysis of all Protein Data Bank structures that have experimental data available shows that in most cases the reported statistics, in particular R factors, can be reproduced within a few percentage points. However, there are a number of outliers where the recomputed R values are significantly different from those originally reported. The reasons for these discrepancies are discussed.
PHENIX; Protein Data Bank; data quality; model quality; structure validation; R factors
The PHENIX software for macromolecular structure determination is described.
Macromolecular X-ray crystallography is routinely applied to understand biological processes at a molecular level. However, significant time and effort are still required to solve and complete many of these structures because of the need for manual interpretation of complex numerical data using many software packages and the repeated use of interactive three-dimensional graphics. PHENIX has been developed to provide a comprehensive system for macromolecular crystallographic structure solution with an emphasis on the automation of all procedures. This has relied on the development of algorithms that minimize or eliminate subjective input, the development of algorithms that automate procedures that are traditionally performed by hand and, finally, the development of a framework that allows a tight integration between the algorithms.
PHENIX; Python; macromolecular crystallography; algorithms
MolProbity structure validation will diagnose most local errors in macromolecular crystal structures and help to guide their correction.
MolProbity is a structure-validation web service that provides broad-spectrum solidly based evaluation of model quality at both the global and local levels for both proteins and nucleic acids. It relies heavily on the power and sensitivity provided by optimized hydrogen placement and all-atom contact analysis, complemented by updated versions of covalent-geometry and torsion-angle criteria. Some of the local corrections can be performed automatically in MolProbity and all of the diagnostics are presented in chart and graphical forms that help guide manual rebuilding. X-ray crystallography provides a wealth of biologically important molecular data in the form of atomic three-dimensional structures of proteins, nucleic acids and increasingly large complexes in multiple forms and states. Advances in automation, in everything from crystallization to data collection to phasing to model building to refinement, have made solving a structure using crystallography easier than ever. However, despite these improvements, local errors that can affect biological interpretation are widespread at low resolution and even high-resolution structures nearly all contain at least a few local errors such as Ramachandran outliers, flipped branched protein side chains and incorrect sugar puckers. It is critical both for the crystallographer and for the end user that there are easy and reliable methods to diagnose and correct these sorts of errors in structures. MolProbity is the authors’ contribution to helping solve this problem and this article reviews its general capabilities, reports on recent enhancements and usage, and presents evidence that the resulting improvements are now beneficially affecting the global database.
all-atom contacts; clashscore; automated correction; KiNG; ribose pucker; Ramachandran plots; side-chain rotamers; model quality; systematic errors; database improvement
Misfit sidechains in protein crystal structures are a stumbling block in using those structures to direct further scientific inference. Problems due to surface disorder and poor electron density are very difficult to address, but a large class of systematic errors are quite common even in well-ordered regions, resulting in sidechains fit backwards into local density in predictable ways. The MolProbity web site is effective at diagnosing such errors, and can perform reliable automated correction of a few special cases such as 180° flips of Asn or Gln sidechain amides, using all-atom contacts and H-bond networks. However, most at-risk residues involve tetrahedral geometry, and their valid correction requires rigorous evaluation of sidechain movement and sometimes backbone shift. The current work extends the benefits of robust automated correction to more sidechain types. The Autofix method identifies candidate systematic, flipped-over errors in Leu, Thr, Val, and Arg using MolProbity quality statistics, proposes a corrected position using real-space refinement with rotamer selection in Coot, and accepts or rejects the correction based on improvement in MolProbity criteria and on χ angle change. Criteria are chosen conservatively, after examining many individual results, to ensure valid correction. To test this method, Autofix was run and analyzed for 945 representative PDB files and on the 50S ribosomal subunit of file 1YHQ. Over 40% of Leu, Val, and Thr outliers and 15% of Arg outliers were successfully corrected, resulting in a total of 3,679 corrected sidechains, or 4 per structure on average. Summary Sentences: A common class of misfit sidechains in protein crystal structures is due to systematic errors that place the sidechain backwards into the local electron density. A fully automated method called “Autofix” identifies such errors for Leu, Val, Thr, and Arg and corrects over one third of them, using MolProbity validation criteria and Coot real-space refinement of rotamers.
Electronic supplementary material
The online version of this article (doi:10.1007/s10969-008-9045-8) contains supplementary material, which is available to authorized users.
Automation; Structure improvement; Crystallography; Sidechain rotamers; Protein/RNA interactions
Cyclic GMP-dependent protein kinases (PKGs) are central mediators of the NO-cGMP signaling pathway and phosphorylate downstream substrates that are crucial for regulating smooth muscle tone, platelet activation, nociception and memory formation. As one of the main receptors for cGMP, PKGs mediate most of the effects of cGMP elevating drugs, such as nitric oxide-releasing agents and phosphodiesterase inhibitors which are used for the treatment of angina pectoris and erectile dysfunction, respectively.
We have investigated the mechanism of cyclic nucleotide binding to PKG by determining crystal structures of the amino-terminal cyclic nucleotide-binding domain (CNBD-A) of human PKG I bound to either cGMP or cAMP. We also determined the structure of CNBD-A in the absence of bound nucleotide. The crystal structures of CNBD-A with bound cAMP or cGMP reveal that cAMP binds in either syn or anti configurations whereas cGMP binds only in a syn configuration, with a conserved threonine residue anchoring both cyclic phosphate and guanine moieties. The structure of CNBD-A in the absence of bound cyclic nucleotide was similar to that of the cyclic nucleotide bound structures. Surprisingly, isothermal titration calorimetry experiments demonstrated that CNBD-A binds both cGMP and cAMP with a relatively high affinity, showing an approximately two-fold preference for cGMP.
Our findings suggest that CNBD-A binds cGMP in the syn conformation through its interaction with Thr193 and an unusual cis-peptide forming residues Leu172 and Cys173. Although these studies provide the first structural insights into cyclic nucleotide binding to PKG, our ITC results show only a two-fold preference for cGMP, indicating that other domains are required for the previously reported cyclic nucleotide selectivity.