The new scaling program AIMLESS is described and tests of refinements at different resolutions are compared with analyses from the scaling step.
Following integration of the observed diffraction spots, the process of ‘data reduction’ initially aims to determine the point-group symmetry of the data and the likely space group. This can be performed with the program POINTLESS. The scaling program then puts all the measurements on a common scale, averages measurements of symmetry-related reflections (using the symmetry determined previously) and produces many statistics that provide the first important measures of data quality. A new scaling program, AIMLESS, implements scaling models similar to those in SCALA but adds several further analyses. From these analyses, a number of decisions can be made about the quality of the data and whether some measurements should be discarded. The effective ‘resolution’ of a data set is a difficult and possibly contentious question (particularly with referees of papers). It is discussed here in the light of tests that compare the data-processing statistics with trials of refinement against observed and simulated data, and with automated model building and comparison of maps calculated with different resolution limits. These trials show that adding weak high-resolution data beyond the commonly used limits may make some improvement and does no harm.
data reduction; data scaling; software; data statistics
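The averaging step described in the AIMLESS abstract above can be illustrated with a minimal Python sketch. It is not the AIMLESS implementation: it assumes the observations are already scaled and indexed in the asymmetric unit, uses a simple inverse-variance weighted mean, and computes a basic Rmerge over multiply measured reflections only. The function names and the data layout are hypothetical.

```python
from collections import defaultdict


def merge_observations(observations):
    """Merge scaled observations of symmetry-equivalent reflections.

    observations: iterable of (hkl, intensity, sigma), where hkl is a tuple
    already reduced to the asymmetric unit (an assumption of this sketch).
    Returns {hkl: (weighted_mean_intensity, sigma_of_mean, n_observations)}.
    """
    groups = defaultdict(list)
    for hkl, intensity, sigma in observations:
        groups[hkl].append((intensity, sigma))

    merged = {}
    for hkl, obs in groups.items():
        # Inverse-variance weights: w = 1 / sigma^2
        weights = [1.0 / (sigma * sigma) for _, sigma in obs]
        wsum = sum(weights)
        mean = sum(w * i for w, (i, _) in zip(weights, obs)) / wsum
        merged[hkl] = (mean, wsum ** -0.5, len(obs))
    return merged


def r_merge(observations, merged):
    """Rmerge = sum |I_i - <I>| / sum I_i over reflections measured more than once."""
    num = 0.0
    den = 0.0
    for hkl, intensity, _ in observations:
        mean, _, n_obs = merged[hkl]
        if n_obs < 2:
            continue  # singly measured reflections carry no agreement information
        num += abs(intensity - mean)
        den += intensity
    return num / den
```

For example, two observations of one reflection with intensities 100 and 110 and equal sigmas merge to a mean of 105, and each deviates from it by 5, so Rmerge is 10/210. Real scaling programs report further statistics beyond this one.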
The CCP4 template-restraint library defines restraints for biopolymers, their modifications and ligands that are used in macromolecular structure refinement. JLigand is a graphical editor for generating descriptions of new ligands and covalent linkages.
Biological macromolecules are polymers and therefore the restraints for macromolecular refinement can be subdivided into two sets: restraints that are applied to atoms that all belong to the same monomer and restraints that are associated with the covalent bonds between monomers. The CCP4 template-restraint library contains three types of data entries defining template restraints: descriptions of monomers and their modifications, both used for intramonomer restraints, and descriptions of links for intermonomer restraints. The library provides generic descriptions of modifications and links for protein, DNA and RNA chains, and for some post-translational modifications including glycosylation. Structure-specific template restraints can be defined in a user’s additional restraint library. Here, JLigand is described: a new CCP4 graphical interface to LibCheck and REFMAC that has been developed to manage the user’s library and to generate new monomer entries, as well as new entries for links and associated modifications.
macromolecular refinement; restraint library; molecular graphics
Low-resolution refinement tools implemented in REFMAC5 are described, including the use of external structural restraints, helical restraints and regularized anisotropic map sharpening.
Two aspects of low-resolution macromolecular crystal structure analysis are considered: (i) the use of reference structures and structural units for provision of structural prior information and (ii) map sharpening in the presence of noise and the effects of Fourier series termination. The generation of interatomic distance restraints by ProSMART and their subsequent application in REFMAC5 is described. It is shown that the use of such external structural information can enhance the reliability of derived atomic models and stabilize refinement. The problem of map sharpening is considered as an inverse deblurring problem and is solved using Tikhonov regularizers. It is demonstrated that this type of map sharpening can automatically produce a map with more structural features whilst maintaining connectivity. Tests show that both of these directions are promising, although more work needs to be performed in order to further exploit structural information and to address the problem of reliable electron-density calculation.
low-resolution refinement; REFMAC5
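The idea of treating map sharpening as a regularized deblurring problem, as described in the abstract above, can be sketched in Python. This is an illustration only and not the REFMAC5 algorithm (which the abstract describes as anisotropic): it assumes an isotropic Gaussian blur k(s) = exp(-B s^2 / 4) acting on Fourier coefficients, with s = 1/d, and applies the simplest zeroth-order Tikhonov filter k / (k^2 + alpha). The function name and the parameters `b_blur` and `alpha` are hypothetical.

```python
import math


def tikhonov_sharpen(coeffs, b_blur, alpha):
    """Sharpen Fourier amplitudes with a zeroth-order Tikhonov filter.

    coeffs: list of (s, amplitude), with s = 1/d in reciprocal angstroms.
    b_blur: assumed isotropic blurring B factor (square angstroms).
    alpha:  regularization strength (alpha = 0 gives naive sharpening,
            i.e. division by the blur, which amplifies noise without limit).
    """
    sharpened = []
    for s, amplitude in coeffs:
        # Gaussian blur in reciprocal space: exp(-B s^2 / 4)
        k = math.exp(-b_blur * s * s / 4.0)
        # Tikhonov solution of min |k*x - f|^2 + alpha*|x|^2 is x = k*f / (k^2 + alpha)
        gain = k / (k * k + alpha)
        sharpened.append((s, gain * amplitude))
    return sharpened
```

With alpha > 0 the gain never exceeds 1 / (2 sqrt(alpha)), so weak, noisy high-resolution terms are damped instead of being boosted exponentially as they would be by unregularized sharpening.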
The decision-making algorithms and software used in PDB_REDO to re-refine and rebuild crystallographic protein structures in the PDB are presented and discussed.
Developments of the PDB_REDO procedure that combine re-refinement and rebuilding within a unique decision-making framework to improve structures in the PDB are presented. PDB_REDO uses a variety of existing and custom-built software modules to choose an optimal refinement protocol (e.g. anisotropic, isotropic or overall B-factor refinement, TLS model) and to optimize the geometry versus data-refinement weights. Next, it proceeds to rebuild side chains and peptide planes before a final optimization round. PDB_REDO works fully automatically without the need for intervention by a crystallographic expert. The pipeline was tested on 12 000 PDB entries and the great majority of the test cases improved both in terms of crystallographic criteria such as Rfree and in terms of widely accepted geometric validation criteria. It is concluded that PDB_REDO is useful to update the otherwise ‘static’ structures in the PDB to modern crystallographic standards. The publicly available PDB_REDO database provides better model statistics and contributes to better refinement and validation targets.
validation; refinement; model building; automation; PDB
An overview of the CCP4 software suite for macromolecular crystallography is given.
The CCP4 (Collaborative Computational Project, Number 4) software suite is a collection of programs and associated data and software libraries which can be used for macromolecular structure determination by X-ray crystallography. The suite is designed to be flexible, allowing users a number of methods of achieving their aims. The programs are from a wide variety of sources but are connected by a common infrastructure provided by standard file formats, data objects and graphical interfaces. Structure solution by macromolecular crystallography is becoming increasingly automated and the CCP4 suite includes several automation pipelines. After giving a brief description of the evolution of CCP4 over the last 30 years, an overview of the current suite is given. While detailed descriptions are given in the accompanying articles, here it is shown how the individual programs contribute to a complete software package.
CCP4; macromolecular crystallography; software; collaboration; automation; macromolecular structure determination
The automated pipelines for molecular replacement MrBUMP and BALBES are reviewed, with an emphasis on understanding their output. Conclusions are drawn from their performance in extensive trials.
Molecular replacement is one of the key methods used to solve the problem of determining the phases of structure factors in protein structure solution from X-ray image diffraction data. Its success rate has been steadily improving with the development of improved software methods and the increasing number of structures available in the PDB for use as search models. Despite this, in cases where there is low sequence identity between the target-structure sequence and that of its set of possible homologues it can be a difficult and time-consuming chore to isolate and prepare the best search model for molecular replacement. MrBUMP and BALBES are two recent developments from CCP4 that have been designed to automate and speed up the process of determining and preparing the best search models and putting them through molecular replacement. Their intention is to provide the user with a broad set of results using many search models and to highlight the best of these for further processing. An overview of both programs is presented along with a description of how best to use them, citing case studies and the results of large-scale testing of the software.
MrBUMP; BALBES; molecular replacement
The general principles behind the macromolecular crystal structure refinement program REFMAC5 are described.
This paper describes various components of the macromolecular crystallographic refinement program REFMAC5, which is distributed as part of the CCP4 suite. REFMAC5 utilizes different likelihood functions depending on the diffraction data employed (amplitudes or intensities), the presence of twinning and the availability of SAD/SIRAS experimental diffraction data. To ensure chemical and structural integrity of the refined model, REFMAC5 offers several classes of restraints and choices of model parameterization. Reliable models at resolutions at least as low as 4 Å can be achieved thanks to low-resolution refinement tools such as secondary-structure restraints, restraints to known homologous structures, automatic global and local NCS restraints, ‘jelly-body’ restraints and the use of novel long-range restraints on atomic displacement parameters (ADPs) based on the Kullback–Leibler divergence. REFMAC5 additionally offers TLS parameterization and, when high-resolution data are available, fast refinement of anisotropic ADPs. Refinement in the presence of twinning is performed in a fully automated fashion. REFMAC5 is a flexible and highly optimized refinement package that is ideally suited for refinement across the entire resolution spectrum encountered in macromolecular crystallography.
The default model-preparation scheme of MOLREP is described. Two examples are presented of model improvement using X-ray data.
The success of molecular replacement is critically dependent on the quality of the search model. Several model-preparation procedures are integrated in the molecular-replacement program MOLREP. These include model modification on the basis of amino-acid sequence alignment and model correction based on analysis of the solvent-accessibility of the atoms. The packing function used in MOLREP for the translational search is explained in the context of model preparation. In difficult cases, bioinformatics-based modifications are not sufficient for successful molecular replacement. An approach implemented in MOLREP for solving cases with translational noncrystallographic symmetry is an example of model preparation in which analysis of X-ray data plays an essential role. In addition, two examples are presented in which the X-ray data were used to refine partial models for subsequent use in molecular replacement.
MOLREP; model preparation; molecular replacement
A systematic test shows how ARP/wARP deals with automated model building for structures that have been solved by molecular replacement. A description of protocols in the flex-wARP control system and studies of two specific cases are also presented.
Automatic iterative model (re-)building, as implemented in ARP/wARP and its new control system flex-wARP, is particularly well suited to follow structure solution by molecular replacement. More than 100 molecular-replacement solutions automatically solved by the BALBES software were submitted to three standard protocols in flex-wARP and the results were compared with final models from the PDB. Standard metrics were gathered in a systematic way and enabled the drawing of statistical conclusions on the advantages of each protocol. Based on this analysis, an empirical estimator was proposed that predicts how good the final model produced by flex-wARP is likely to be based on the experimental data and the quality of the molecular-replacement solution. To introduce the differences between the three flex-wARP protocols (keeping the complete search model, converting it to atomic coordinates but ignoring atom identities or using the electron-density map calculated from the molecular-replacement solution), two examples are also discussed in detail, focusing on the evolution of the models during iterative rebuilding. This highlights the diversity of paths that the flex-wARP control system can employ to reach a nearly complete and accurate model while actually starting from the same initial information.
model building; refinement; molecular replacement
The fully automated pipeline, BALBES, integrates a redesigned hierarchical database of protein structures with their domains and multimeric organization, and solves molecular-replacement problems using only input X-ray and sequence data.
The number of macromolecular structures solved and deposited in the Protein Data Bank (PDB) is higher than 40 000. Using this information in macromolecular crystallography (MX) should in principle increase the efficiency of MX structure solution. This paper describes a molecular-replacement pipeline, BALBES, that makes extensive use of this repository. It uses a reorganized database taken from the PDB with multimeric as well as domain organization. A system manager written in Python controls the workflow of the process. Testing the current version of the pipeline using entries from the PDB has shown that this approach has huge potential and that around 75% of structures can be solved automatically without user intervention.
BALBES; molecular replacement
The presence of pseudosymmetry can cause problems in structure determination and refinement. The relevant background and representative examples are presented.
It is not uncommon for proteins to crystallize with more than one molecule per asymmetric unit. When more than one molecule is present in the asymmetric unit, various pathological situations such as twinning, modulated crystals and pseudo-translational or pseudo-rotational symmetry can arise. The presence of pseudosymmetry can lead to uncertainties about the correct space group, especially in the presence of twinning. The background to certain common pathologies is presented and a new notation for space groups in unusual settings is introduced. The main concepts are illustrated with several examples from the literature and the Protein Data Bank.
pathology; twinning; pseudosymmetry