Photoactive yellow protein (PYP) from Halorhodospira halophila is a soluble 14 kDa blue-light photoreceptor. It absorbs light via its para-coumaric acid chromophore (pCA), which is covalently attached to Cys69, and is believed to be involved in the negative phototactic response of the organism to blue light. The complete structure (including H atoms) of PYP has been determined in D2O-soaked crystals through the application of joint X-ray (1.1 Å) and neutron (2.5 Å) structure refinement in combination with cross-validated maximum-likelihood simulated annealing. The resulting XN structure reveals that the phenolate O atom of pCA accepts deuterons from Glu46 Oε2 and Tyr42 Oη in two unusually short hydrogen bonds. This arrangement is stabilized by the donation of a deuteron from Thr50 Oγ1 to Tyr42 Oη. However, the deuteron position between pCA and Tyr42 is only partially occupied. Thus, this atom may also interact with Thr50, possibly being disordered or fluctuating between the two bonds.
Streptococcus agalactiae, a prokaryote that causes infections in neonates and immunocompromised adults, has a serine/threonine protein kinase (STK) signalling cascade. The structure of one of the targets, a family II inorganic pyrophosphatase, has been solved by molecular replacement and refined at 2.80 Å resolution to an R factor of 19.2% (Rfree = 26.7%). The two monomers in the asymmetric unit are related by a noncrystallographic twofold axis, but the biological dimer is formed by a crystallographic twofold. Each monomer contains the pyrophosphate analogue imidodiphosphate (PNP) and three metal ions per active site: two Mn2+ ions in sites M1 and M2 and an Mg2+ ion in site M3. The enzyme is in the closed conformation. Like other family II enzymes, the structure consists of two domains (residues 1–191 and 198–311), with the active site located between them. The conformation of Lys298 in the active site is different from those observed previously and it coordinates to the conserved DHH motif in a unique way. The structure suggests that Ser150, Ser194, Ser195 and Ser296 are the most likely targets for the Ser/Thr kinase and phosphatase because they are surface-accessible and either in the active site or in the hinge region between the two domains.
preface; molecular replacement; CCP4 study weekend 2007
This review outlines questions to consider when attempting to solve crystal structures by molecular replacement.
This review addresses the essential questions to consider when attempting to phase a new crystal structure using molecular replacement. Sequence matching can suggest whether there is a suitable three-dimensional model available, but it is also important to analyse the model in order to find its likely oligomeric state and to establish whether there are likely to be domain movements. Once a solution has been found it must be refined, which can be challenging for low-homology models. There is a detailed discussion of structures used as examples for CCP4 tutorials.
bioinformatics; molecular replacement; validation
The possibility of taking into account large-amplitude collective movements of a given model by using a subset of low-frequency normal modes is evaluated for molecular replacement and refinement using X-ray data or cryo-EM maps.
Normal-mode analysis (NMA) can be used to generate multiple structural variants of a given template model, thereby increasing the chance of finding the molecular-replacement solution. Here, it is shown that it is also possible to directly refine the amplitudes of the normal modes against experimental data (X-ray or cryo-EM), generalizing rigid-body refinement methods by adding just a few additional degrees of freedom that sample collective and large-amplitude movements. It is also argued that the situation where several (conformations of) models are present simultaneously in the crystal can be studied with adjustable occupancies using techniques derived from statistical thermodynamics and already used in molecular modelling.
normal-mode analysis; molecular replacement; refinement
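As a minimal illustration of the parametrization described in the abstract above, the sketch below displaces a model along a few normal modes, x = x0 + sum_k a_k u_k, so that the amplitudes a_k are the only additional parameters beyond rigid-body ones. The function name, the toy coordinates and the mode vector are hypothetical; in practice the modes would come from a normal-mode calculation on the template, which is not shown here.

```python
from typing import List, Sequence

Coord = List[float]  # [x, y, z] for one atom


def displace_along_modes(
    coords: Sequence[Coord],
    modes: Sequence[Sequence[Coord]],
    amplitudes: Sequence[float],
) -> List[Coord]:
    """Return coords + sum_k amplitudes[k] * modes[k], atom by atom."""
    if len(modes) != len(amplitudes):
        raise ValueError("one amplitude is required per mode")
    displaced = [list(atom) for atom in coords]  # copy; input is left untouched
    for mode, amplitude in zip(modes, amplitudes):
        if len(mode) != len(coords):
            raise ValueError("each mode needs one displacement vector per atom")
        for atom, shift in zip(displaced, mode):
            for axis in range(3):
                atom[axis] += amplitude * shift[axis]
    return displaced


# Toy two-atom model and a single made-up mode (assumption, not real data).
toy_coords = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
toy_modes = [[[0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]]
toy_variant = displace_along_modes(toy_coords, toy_modes, [0.5])
```

Scanning or refining a handful of amplitudes in this way yields structural variants of the template without touching individual atomic parameters.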
An outline is given of the basic features of the molecular-replacement method for solving crystal structures.
Molecular replacement is fundamentally a simple trial-and-error method of solving crystal structures when a suitable related model is available. The underlying simplicity of the method is often obscured by the mathematical trickery required to make the searches computationally tractable. This introduction sketches the essential issues in molecular replacement without going into technical details. General search strategies are discussed and the alternative Patterson and likelihood approaches are outlined.
molecular replacement; rotation function; translation function; maximum likelihood
A number of techniques for the location of small and medium-sized model fragments in experimentally phased electron-density maps are explored. The application of one of these techniques to automated model building is discussed.
Molecular replacement is a powerful tool for the location of large models using structure-factor magnitudes alone. When phase information is available, it becomes possible to locate smaller fragments of the structure ranging in size from a few atoms to a single domain. The calculation is demanding, requiring a six-dimensional rotation and translation search. A number of approaches to this problem have been developed and a selection of these is reviewed in this paper. The application of one of these techniques to the problem of automated model building is explored in more detail, with particular reference to the problem of sequencing a protein main-chain trace.
model fragments; electron-density maps; model building
Test studies have been conducted on five crystal structures of large molecular assemblies, in which EM maps are used as models for structure solution by molecular replacement using various standard MR packages such as AMoRe, MOLREP and Phaser.
Multi-component molecular complexes are increasingly being tackled by structural biology, bringing X-ray crystallography into the purview of electron-microscopy (EM) studies. X-ray crystallography can utilize a low-resolution EM map for structure determination followed by phase extension to high resolution. Test studies have been conducted on five crystal structures of large molecular assemblies, in which EM maps are used as models for structure solution by molecular replacement (MR) using various standard MR packages such as AMoRe, MOLREP and Phaser. The results demonstrate that EM maps are viable models for molecular replacement. Possible difficulties in data analysis, such as the effects of the EM magnification error, and the effect of MR positional/rotational errors on phase extension are discussed.
electron microscopy; molecular replacement
The default model-preparation scheme of MOLREP is described. Two examples are presented of model improvement using X-ray data.
The success of molecular replacement is critically dependent on the quality of the search model. Several model-preparation procedures are integrated in the molecular-replacement program MOLREP. These include model modification on the basis of amino-acid sequence alignment and model correction based on analysis of the solvent-accessibility of the atoms. The packing function used in MOLREP for the translational search is explained in the context of model preparation. In difficult cases, bioinformatics-based modifications are not sufficient for successful molecular replacement. An approach implemented in MOLREP for solving cases with translational noncrystallographic symmetry is an example of model preparation in which analysis of X-ray data plays an essential role. In addition, two examples are presented in which the X-ray data were used to refine partial models for subsequent use in molecular replacement.
MOLREP; model preparation; molecular replacement
An automation pipeline for macromolecular structure solution by molecular replacement with a special emphasis on the discovery and preparation of a large number of search models is described.
A novel automation pipeline for macromolecular structure solution by molecular replacement is described. There is a special emphasis on the discovery and preparation of a large number of search models, all of which can be passed to the core molecular-replacement programs. For routine molecular-replacement problems, the pipeline automates what a crystallographer might do and its value is simply one of convenience. For more difficult cases, the pipeline aims to discover the particular template structure and model edits required to produce a viable search model and may succeed in finding an efficacious combination that would be missed otherwise. An overview of MrBUMP is given and some recent additions to its functionality are highlighted.
molecular replacement; search-model generation; automation; protein structure
The practical limits of molecular replacement can be extended by using several specifically designed protein models based on fold-recognition methods and by exhaustive searches performed in a parallelized pipeline. Updated results from the JCSG MR pipeline, which to date has solved 33 molecular-replacement structures with less than 35% sequence identity to the closest homologue of known structure, are presented.
The success rate of molecular replacement (MR) falls considerably when search models share less than 35% sequence identity with their templates, but can be improved significantly by using fold-recognition methods combined with exhaustive MR searches. Models based on alignments calculated with fold-recognition algorithms are more accurate than models based on conventional alignment methods such as FASTA or BLAST, which are still widely used for MR. In addition, by designing MR pipelines that integrate phasing and automated refinement and allow parallel processing of such calculations, one can effectively increase the success rate of MR. Here, updated results from the JCSG MR pipeline are presented, which to date has solved 33 MR structures with less than 35% sequence identity to the closest homologue of known structure. By using difficult MR problems as examples, it is demonstrated that successful MR phasing is possible even in cases where the similarity between the model and the template can only be detected with fold-recognition algorithms. In the first step, several search models are built based on all homologues found in the PDB by fold-recognition algorithms. The models resulting from this process are used in parallel MR searches with different combinations of input parameters of the MR phasing algorithm. The putative solutions are subjected to rigid-body and restrained crystallographic refinement and ranked based on the final values of free R factor, figure of merit and deviations from ideal geometry. Finally, crystal packing and electron-density maps are checked to identify the correct solution. If this procedure does not yield a solution with interpretable electron-density maps, then even more alternative models are prepared. The structurally variable regions of a protein family are identified based on alignments of sequences and known structures from that family and appropriate trimmings of the models are proposed. All combinations of these trimmings are applied to the search models and the resulting set of models is used in the MR pipeline. It is estimated that with the improvements in model building and exhaustive parallel searches with existing phasing algorithms, MR can be successful for more than 50% of recognizable homologues of known structures below the threshold of 35% sequence identity. This implies that about one-third of the proteins in a typical bacterial proteome are potential MR targets.
molecular replacement; sequence-alignment accuracy; homology modeling; parameter-space screening; structural genomics
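The enumeration of model trimmings mentioned in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: the trimming labels and the function name are hypothetical, and the sketch does not reproduce how the JCSG pipeline itself is implemented.

```python
from itertools import combinations
from typing import Iterator, Sequence, Tuple


def trimming_combinations(trimmings: Sequence[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every subset of the proposed trimmings, from none to all."""
    for size in range(len(trimmings) + 1):
        yield from combinations(trimmings, size)


# Hypothetical labels for structurally variable regions of one search model.
proposed = ["N-terminal tail", "loop 45-52", "C-terminal helix"]
variants = list(trimming_combinations(proposed))
```

With n proposed trimmings this produces 2^n variants per search model, which is one reason why running the resulting searches in parallel is attractive.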
A systematic test shows how ARP/wARP deals with automated model building for structures that have been solved by molecular replacement. A description of protocols in the flex-wARP control system and studies of two specific cases are also presented.
Automatic iterative model (re-)building, as implemented in ARP/wARP and its new control system flex-wARP, is particularly well suited to follow structure solution by molecular replacement. More than 100 molecular-replacement solutions automatically solved by the BALBES software were submitted to three standard protocols in flex-wARP and the results were compared with final models from the PDB. Standard metrics were gathered in a systematic way and enabled the drawing of statistical conclusions on the advantages of each protocol. Based on this analysis, an empirical estimator was proposed that predicts how good the final model produced by flex-wARP is likely to be based on the experimental data and the quality of the molecular-replacement solution. To introduce the differences between the three flex-wARP protocols (keeping the complete search model, converting it to atomic coordinates but ignoring atom identities or using the electron-density map calculated from the molecular-replacement solution), two examples are also discussed in detail, focusing on the evolution of the models during iterative rebuilding. This highlights the diversity of paths that the flex-wARP control system can employ to reach a nearly complete and accurate model while actually starting from the same initial information.
model building; refinement; molecular replacement
Overview and examples of combined use of X-ray and electron-microscopy data.
Low-resolution electron-microscopy reconstructions can be used as search models in molecular replacement or may be combined with existing monomeric structures in order to produce multimeric models suitable for molecular replacement. The technique is described in the case of viral and subviral particles as well as in the case of oligomeric proteins.
electron microscopy; molecular replacement
The problems of gaining accurate protein sequence alignments for molecular replacement are discussed, current techniques explained and strategies suggested.
This article focuses on the key step of obtaining the best possible sequence alignment of the Query (the protein you are interested in) to the Target (a protein of known three-dimensional structure) in order to build a molecular model for molecular replacement. Common sequence-alignment methods are discussed, starting from structural alignment and then moving to pairwise, multiple and profile–profile methods. The limitations of sequence-alignment methods and guidelines on how to judge the likely accuracy of alignment are considered. This is not a detailed tutorial on how to use specific programs; rather, the reader is directed to current tools and techniques that are likely to yield good results.
molecular replacement; sequence alignment
The fully automated pipeline, BALBES, integrates a redesigned hierarchical database of protein structures with their domains and multimeric organization, and solves molecular-replacement problems using only input X-ray and sequence data.
The number of macromolecular structures solved and deposited in the Protein Data Bank (PDB) is higher than 40 000. Using this information in macromolecular crystallography (MX) should in principle increase the efficiency of MX structure solution. This paper describes a molecular-replacement pipeline, BALBES, that makes extensive use of this repository. It uses a reorganized database taken from the PDB with multimeric as well as domain organization. A system manager written in Python controls the workflow of the process. Testing the current version of the pipeline using entries from the PDB has shown that this approach has huge potential and that around 75% of structures can be solved automatically without user intervention.
BALBES; molecular replacement
An account is given of the latest developments of the AMoRe package.
An account is given of the latest developments of the AMoRe package: new rotational search algorithms, exploitation of noncrystallographic symmetry, generation and use of ensemble models and interactive graphical molecular replacement.
AMoRe; molecular replacement
The type II dehydroquinase enzyme is a symmetrical dodecameric protein which crystallizes in either high-symmetry cubic space groups or low-symmetry crystal systems with multiple copies in the asymmetric unit. Both systems have provided challenging examples for molecular replacement; for example, a triclinic crystal form has 16 dodecamers (192 monomers) in the unit cell. Three difficult examples are discussed and two are used as test cases to compare the performance of four commonly used molecular-replacement packages.
Type II dehydroquinase is a small (150-amino-acid) protein which in solution packs together to form a dodecamer with 23 cubic symmetry. In crystals of this protein the symmetry of the biological unit can be coincident with the crystallographic symmetry, giving rise to cubic crystal forms with a single monomer in the asymmetric unit. In crystals where this is not the case, multiple copies of the monomer are present, giving rise to significant and often confusing noncrystallographic symmetry in low-symmetry crystal systems. These different crystal forms pose a variety of challenges for solution by molecular replacement. Three examples of structure solutions, including a highly unusual triclinic crystal form with 16 dodecamers (192 monomers) in the unit cell, are described. Four commonly used molecular-replacement packages are assessed against two of these examples, one of high symmetry and the other of low symmetry; this study highlights how program performance can vary significantly depending on the given problem. In addition, the final refined structure of the 16-dodecamer triclinic crystal form is analysed and shown not to be a superlattice structure, but rather an F-centred cubic crystal with frustrated crystallographic symmetry.
multi-copy molecular replacement; superlattice structure; pseudo-cubic symmetry; type II dehydroquinases
The highly automated PHENIX AutoBuild wizard is described. The procedure can be applied equally well to phases derived from isomorphous/anomalous and molecular-replacement methods.
The PHENIX AutoBuild wizard is a highly automated tool for iterative model building, structure refinement and density modification using RESOLVE model building, RESOLVE statistical density modification and phenix.refine structure refinement. Recent advances in the AutoBuild wizard and phenix.refine include automated detection and application of NCS from models as they are built, extensive model-completion algorithms and automated solvent-molecule picking. Model-completion algorithms in the AutoBuild wizard include loop building, crossovers between chains in different models of a structure and side-chain optimization. The AutoBuild wizard has been applied to a set of 48 structures at resolutions ranging from 1.1 to 3.2 Å, resulting in a mean R factor of 0.24 and a mean free R factor of 0.29. The R factor of the final model is dependent on the quality of the starting electron density and is relatively independent of resolution.
model building; model completion; macromolecular models; Protein Data Bank; structure refinement; PHENIX
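For readers less familiar with the statistics quoted in the abstract above, the following sketch computes the conventional crystallographic R factor, R = sum(||Fobs| - |Fcalc||) / sum(|Fobs|), separately for working and free reflections. The amplitudes and free-set flags are made-up numbers, the function name is hypothetical, and the calculated amplitudes are assumed to be already on the same scale as the observed ones.

```python
from typing import Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum(||Fobs| - |Fcalc||) / sum(|Fobs|) over the given reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have equal length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator


# Made-up amplitudes; the last two reflections play the role of the free set.
f_obs = [100.0, 80.0, 60.0, 40.0, 20.0]
f_calc = [90.0, 85.0, 50.0, 50.0, 15.0]
free_flags = [False, False, False, True, True]

r_work = r_factor(
    [fo for fo, free in zip(f_obs, free_flags) if not free],
    [fc for fc, free in zip(f_calc, free_flags) if not free],
)
r_free = r_factor(
    [fo for fo, free in zip(f_obs, free_flags) if free],
    [fc for fc, free in zip(f_calc, free_flags) if free],
)
```

The free R factor uses the same formula but only reflections that were excluded from refinement, so it is typically somewhat higher than the working R factor.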
The presence of pseudosymmetry can cause problems in structure determination and refinement. The relevant background and representative examples are presented.
It is not uncommon for proteins to crystallize with more than a single molecule per asymmetric unit. When more than a single molecule is present in the asymmetric unit, various pathological situations such as twinning, modulated crystals and pseudo translational or rotational symmetry can arise. The presence of pseudosymmetry can lead to uncertainties about the correct space group, especially in the presence of twinning. The background to certain common pathologies is presented and a new notation for space groups in unusual settings is introduced. The main concepts are illustrated with several examples from the literature and the Protein Data Bank.
pathology; twinning; pseudosymmetry
Three difficult MR cases are reported in which the orientation of a search oligomer or its internal parameters were determined and the oligomer was positioned according to the maximal value of the correlation coefficient in a series of translation searches. Such an exhaustive search was feasible because of constraints on the model parameters derived from the self-rotation function.
The efficiency of the cross-rotation function step of molecular replacement (MR) is intrinsically limited as it uses only a fraction of the Patterson vectors. Along with general techniques extending the boundaries of the method, there are approaches that utilize specific features of a given structure. In special cases, where the directions of noncrystallographic symmetry axes can be unambiguously derived from the self-rotation function and the structure of the homologue protein is available in a related oligomeric state, the cross-rotation function step of MR can be omitted. In such cases, a small number of yet unknown parameters defining the orientation of the oligomer and/or its internal organization can be optimized using an exhaustive search. Three difficult MR cases are reported in which these parameters were determined and the oligomer was positioned according to the maximal value of the correlation coefficient in a series of translation searches.
molecular replacement; exhaustive search
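The selection step described in the abstract above, choosing the trial parameters that give the highest correlation coefficient, can be illustrated as follows. This is a toy sketch, not the authors' procedure: the candidate labels and amplitudes are invented, the function names are hypothetical, and the calculation of amplitudes for each trial oligomer (including the translation search) is assumed to have been done elsewhere.

```python
from math import sqrt
from typing import Dict, Sequence, Tuple


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_a * var_b)


def best_candidate(
    f_obs: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> Tuple[str, float]:
    """Return (label, cc) of the candidate that correlates best with f_obs."""
    scored = {label: correlation(f_obs, f_calc) for label, f_calc in candidates.items()}
    label = max(scored, key=scored.get)
    return label, scored[label]


# Invented amplitudes for three trial values of one oligomer parameter.
f_obs = [10.0, 20.0, 30.0, 40.0]
trial_models = {
    "tilt=0": [40.0, 30.0, 20.0, 10.0],
    "tilt=10": [12.0, 19.0, 33.0, 38.0],
    "tilt=20": [20.0, 20.0, 25.0, 20.0],
}
best_label, best_cc = best_candidate(f_obs, trial_models)
```

Because constraints from the self-rotation function leave only a few free parameters, every trial value can be scored in this way instead of relying on a cross-rotation search.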
β-Ketoacyl-ACP synthase is a key target for the treatment of infectious diseases. A structure-based biophysical screening approach identified for the first time a synthetic small molecule, 2-phenylamino-4-methyl-5-acetylthiazole, that binds to the active site of the enzyme. Implications for the use of this information in drug discovery are discussed.
Fatty-acid synthesis in bacteria is of great interest as a target for the discovery of antibacterial compounds. The addition of a new acetyl moiety to the growing fatty-acid chain, an essential step in this process, is catalyzed by β-ketoacyl-ACP synthase (KAS). It is inhibited by natural antibiotics such as cerulenin and thiolactomycin; however, these lack the requirements for optimal drug development. Structure-based biophysical screening revealed a novel synthetic small molecule, 2-phenylamino-4-methyl-5-acetylthiazole, that binds to Escherichia coli KAS I with a binding constant of 25 µM as determined by fluorescence titration. A 1.35 Å crystal structure of its complex with its target reveals noncovalent interactions with the active-site Cys163 and hydrophobic residues of the fatty-acid binding pocket. The active site is accessible through an open conformation of the Phe392 side chain and no conformational changes are induced at the active site upon ligand binding. This represents a novel binding mode that differs from that of thiolactomycin or cerulenin. The structural information on the protein–ligand interaction offers strategies for further optimization of this low-molecular-weight compound.
β-ketoacyl-ACP synthase; drug design; fatty-acid synthesis; antibiotics
Modelling deformation electron density using interatomic scatterers is simpler than multipolar methods, produces comparable results at subatomic resolution and can easily be applied to macromolecules.
A study of the accurate electron-density distribution in molecular crystals at subatomic resolution (better than ∼1.0 Å) requires more detailed models than those based on independent spherical atoms. A tool that is conventionally used in small-molecule crystallography is the multipolar model. Even at upper resolution limits of 0.8–1.0 Å, the number of experimental data is insufficient for full multipolar model refinement. As an alternative, a simpler model composed of conventional independent spherical atoms augmented by additional scatterers to model bonding effects has been proposed. Refinement of these mixed models for several benchmark data sets gave results that were comparable in quality with the results of multipolar refinement and superior to those for conventional models. Applications to several data sets of both small molecules and macromolecules are shown. These refinements were performed using the general-purpose macromolecular refinement module phenix.refine of the PHENIX package.
structure refinement; subatomic resolution; deformation density; interatomic scatterers; PHENIX
Insertion of a dangling 5′-uracil and incorporation of synthetic linkers at the domain interface of a minimal hairpin ribozyme have been investigated as means of favorably influencing crystal packing. These modifications lead to changes in the ribozyme’s structural elements that mimic packing within a natural four-way helical junction, thereby providing an example of how knowledge-based design can be used to enhance the diffraction properties of a tertiarily folded RNA.
The hairpin ribozyme is a small catalytic RNA comprising two helix–loop–helix domains linked by a four-way helical junction (4WJ). In its most basic form, each domain can be formed independently and reconstituted without a 4WJ to yield an active enzyme. The production of such minimal junctionless hairpin ribozymes is achievable by chemical synthesis, which has allowed structures to be determined for numerous nucleotide variants. However, abasic and other destabilizing core modifications hinder crystallization. This investigation describes the use of a dangling 5′-U to form an intermolecular U·U mismatch, as well as the use of synthetic linkers to tether the loop A and B domains, including (i) a three-carbon propyl linker (C3L) and (ii) a nine-atom triethylene glycol linker (S9L). Both linker constructs demonstrated similar enzymatic activity, but S9L constructs yielded crystals that diffracted to 2.65 Å resolution or better. In contrast, C3L variants diffracted to 3.35 Å and exhibited a 15 Å expansion of the c axis. Crystal packing of the C3L construct showed a paucity of 6₁ contacts, which comprise numerous backbone to 2′-OH hydrogen bonds in junctionless and S9L complexes. Significantly, the crystal packing in minimal structures mimics stabilizing features observed in the 4WJ hairpin ribozyme structure. The results demonstrate how knowledge-based design can be used to improve diffraction and overcome otherwise destabilizing defects.
RNA; ribozyme; four-way helical junction; synthetic linker; crystal packing; structure-based design
Heterogeneity in ensembles generated by independent model rebuilding principally reflects the limitations of the data and of the model-building process rather than the diversity of structures in the crystal.
Automation of iterative model building, density modification and refinement in macromolecular crystallography has made it feasible to carry out this entire process multiple times. By using different random seeds in the process, a number of different models compatible with experimental data can be created. Sets of models were generated in this way using real data for ten protein structures from the Protein Data Bank and using synthetic data generated at various resolutions. Most of the heterogeneity among models produced in this way is in the side chains and loops on the protein surface. Possible interpretations of the variation among models created by repetitive rebuilding were investigated. Synthetic data were created in which a crystal structure was modelled as the average of a set of ‘perfect’ structures and the range of models obtained by rebuilding a single starting model was examined. The standard deviations of coordinates in models obtained by repetitive rebuilding at high resolution are small, while those obtained for the same synthetic crystal structure at low resolution are large, so that the diversity within a group of models cannot generally be a quantitative reflection of the actual structures in a crystal. Instead, the group of structures obtained by repetitive rebuilding reflects the precision of the models, and the standard deviation of coordinates of these structures is a lower bound estimate of the uncertainty in coordinates of the individual models.
model building; model completion; coordinate errors; models; Protein Data Bank; convergence; reproducibility; heterogeneity; precision; accuracy
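The spread of coordinates among repeatedly rebuilt models, discussed in the abstract above, can be quantified per atom as sketched below. This is a minimal sketch under stated assumptions: all models are taken to list the same atoms in the same order and in the same crystal frame, the population form of the standard deviation (division by the number of models) is an arbitrary choice, and the function name and toy coordinates are hypothetical.

```python
from math import sqrt
from typing import List, Sequence

Coord = Sequence[float]  # (x, y, z) for one atom


def per_atom_spread(models: Sequence[Sequence[Coord]]) -> List[float]:
    """RMS distance of each atom from its mean position across the models."""
    if not models:
        raise ValueError("at least one model is required")
    n_models = len(models)
    n_atoms = len(models[0])
    if any(len(model) != n_atoms for model in models):
        raise ValueError("all models must contain the same number of atoms")
    spreads = []
    for i in range(n_atoms):
        # Mean position of atom i over all models.
        mean = [sum(model[i][axis] for model in models) / n_models for axis in range(3)]
        # Sum of squared distances from that mean position.
        squared = sum(
            sum((model[i][axis] - mean[axis]) ** 2 for axis in range(3))
            for model in models
        )
        spreads.append(sqrt(squared / n_models))
    return spreads


# Three toy 'rebuilt' models of two atoms; only the second atom varies.
toy_models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.3, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, -0.3, 0.0)],
]
toy_spread = per_atom_spread(toy_models)
```

In line with the abstract, such values are best read as a lower-bound estimate of coordinate uncertainty rather than as a measure of the true structural diversity in the crystal.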
The crystal structures of T. gondii and P. falciparum ENR in complex with NAD+ and triclosan and of T. gondii ENR in an apo form have been solved to 2.6, 2.2 and 2.8 Å, respectively.
Recent studies have demonstrated that submicromolar concentrations of the biocide triclosan arrest the growth of the apicomplexan parasites Plasmodium falciparum and Toxoplasma gondii and inhibit the activity of the apicomplexan enoyl acyl carrier protein reductase (ENR). The crystal structures of T. gondii and P. falciparum ENR in complex with NAD+ and triclosan and of T. gondii ENR in an apo form have been solved to 2.6, 2.2 and 2.8 Å, respectively. The structures of T. gondii ENR have revealed that, as in its bacterial and plant homologues, a loop region which flanks the active site becomes ordered upon inhibitor binding, resulting in the slow tight binding of triclosan. In addition, the T. gondii ENR–triclosan complex reveals the folding of a hydrophilic insert common to the apicomplexan family that flanks the substrate-binding domain and is disordered in all other reported apicomplexan ENR structures. Structural comparison of the apicomplexan ENR structures with their bacterial and plant counterparts has revealed that although the active sites of the parasite enzymes are broadly similar to those of their bacterial counterparts, there are a number of important differences within the drug-binding pocket that reduce the packing interactions formed with several inhibitors in the apicomplexan ENR enzymes. Together with other significant structural differences, this provides a possible explanation of the lower affinity of the parasite ENR enzyme family for aminopyridine-based inhibitors, suggesting that an effective antiparasitic agent may well be distinct from equivalent antimicrobials.
enoyl acyl carrier protein reductases; triclosan; apicomplexan parasites; Plasmodium falciparum; Toxoplasma gondii