Beta-lactam antibiotics target penicillin-binding proteins including several enzyme classes essential for bacterial cell-wall homeostasis. To better understand the functional and inhibitor-binding specificities of penicillin-binding proteins from the pathogen Mycobacterium tuberculosis, we carried out structural and phylogenetic analysis of two predicted D,D-carboxypeptidases, Rv2911 and Rv3330. Optimization of Rv2911 for crystallization using directed evolution and the GFP folding reporter method yielded a soluble quadruple mutant. Structures of optimized Rv2911 bound to phenylmethylsulfonyl fluoride and Rv3330 bound to meropenem show that, in contrast to the nonspecific inhibitor, meropenem forms an extended interaction with the enzyme along a conserved surface. Phylogenetic analysis shows that Rv2911 and Rv3330 belong to different clades that emerged in Actinobacteria and are not represented in model organisms such as Escherichia coli and Bacillus subtilis. Clade-specific adaptations allow these enzymes to fulfill distinct physiological roles despite strict conservation of core catalytic residues. The characteristic differences include potential protein-protein interaction surfaces and specificity-determining residues surrounding the catalytic site. Overall, these structural insights lay the groundwork to develop improved beta-lactam therapeutics for tuberculosis.
Identifying errors and alternate conformers, and modeling multiple main-chain conformers in poorly ordered regions are overarching problems in crystallographic structure determination that have limited automation efforts and structure quality. Here, we show that implementation of a full factorial designed set of standard refinement approaches, which we call ExCoR (Extensive Combinatorial Refinement), significantly improves structural models compared to the traditional linear tree approach, in which individual algorithms are tested linearly and only incorporated if the model improves. ExCoR markedly improved maps and models, and revealed building errors and alternate conformations that were masked by traditional refinement approaches. Surprisingly, an individual algorithm that renders a model worse in isolation could still be necessary to produce the best overall model, suggesting that model distortion allows escape from local minima of the optimization target function, here shown to be a hallmark limitation of the traditional approach. ExCoR thus provides a simple approach to improving structure determination.
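The contrast between a full factorial design and a linear tree can be illustrated with a minimal sketch. The option names and the scoring function below are hypothetical stand-ins chosen for illustration; they are not the ExCoR option set, and `toy_score` uses made-up numbers in place of a real refinement run.

```python
from itertools import product

# Hypothetical on/off refinement options; names are illustrative only.
OPTIONS = ["optimize_weights", "tls", "simulated_annealing", "ordered_solvent"]

def full_factorial(options):
    """Return every on/off combination of the options as a list of dicts."""
    return [dict(zip(options, flags))
            for flags in product([False, True], repeat=len(options))]

def best_protocol(options, score):
    """Score every combination (lower is better) and return the best one."""
    return min(full_factorial(options), key=score)

def toy_score(settings):
    """Stand-in for a refinement run returning a free R value (made-up numbers)."""
    r = 0.30
    if settings["tls"]:
        r -= 0.01
    if settings["simulated_annealing"]:
        r += 0.005          # worse when switched on by itself ...
        if settings["optimize_weights"]:
            r -= 0.02       # ... but beneficial in combination
    return r

if __name__ == "__main__":
    print(len(full_factorial(OPTIONS)), "combinations")
    print(best_protocol(OPTIONS, toy_score))
```

In this toy example a linear tree would reject simulated annealing because it worsens the score in isolation, whereas the exhaustive search retains it as part of the best combination. The cost is 2^n refinement runs for n binary options.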
Refinement of macromolecular structures against low-resolution crystallographic data is limited by the ability of current methods to converge on a structure with realistic geometry. We developed a low-resolution crystallographic refinement method that combines the Rosetta sampling methodology and energy function with reciprocal-space X-ray refinement in Phenix. On a set of difficult low-resolution cases, the method yielded improved model geometry and lower free R factors than alternate refinement methods.
The 1.55 Å resolution X-ray crystal structure of Rv3902c from M. tuberculosis reveals a novel fold.
The crystallographic structure of the Mycobacterium tuberculosis (TB) protein Rv3902c (176 residues; molecular mass of 19.8 kDa) was determined at 1.55 Å resolution. The function of Rv3902c is unknown, although several TB genes involved in bacterial pathogenesis are expressed from the operon containing the Rv3902c gene. The unique structural fold of Rv3902c contains two domains, each consisting of antiparallel β-sheets and α-helices, creating a hand-like binding motif with a small binding pocket in the palm. Structural homology searches reveal that Rv3902c has an overall structure similar to that of the Salmonella virulence-factor chaperone InvB, with an r.m.s.d. for main-chain atoms of 2.3 Å along an aligned domain.
Mycobacterium tuberculosis; Rv3902c; virulence-factor chaperone; novel fold
Macromolecular structures deposited in the PDB can and should be continually reinterpreted and improved on the basis of their accompanying experimental X-ray data, exploiting the steady progress in methods and software that the deposition of such data into the PDB on a massive scale has made possible.
Accurate crystal structures of macromolecules are of high importance in the biological and biomedical fields. Models of crystal structures in the Protein Data Bank (PDB) are in general of very high quality as deposited. However, methods for obtaining the best model of a macromolecular structure from a given set of experimental X-ray data continue to progress at a rapid pace, making it possible to improve most PDB entries after their deposition by re-analyzing the original deposited data with more recent software. This possibility represents a very significant departure from the situation that prevailed when the PDB was created, when it was envisioned as a cumulative repository of static contents. A radical paradigm shift for the PDB is therefore proposed, away from the static archive model towards a much more dynamic body of continuously improving results in symbiosis with continuously improving methods and software. These simultaneous improvements in methods and final results are made possible by the current deposition of processed crystallographic data (structure-factor amplitudes) and will be supported further by the deposition of raw data (diffraction images). It is argued that it is both desirable and feasible to carry out small-scale and large-scale efforts to make this paradigm shift a reality. Small-scale efforts would focus on optimizing structures that are of interest to specific investigators. Large-scale efforts would undertake a systematic re-optimization of all of the structures in the PDB, or alternatively the redetermination of groups of structures that are either related to or focused on specific questions. All of the resulting structures should be made generally available, along with the precursor entries, with various views of the structures being made available depending on the types of questions that users are interested in answering.
structure determination; model quality; data analysis; software development
Rank scaling of Fourier syntheses leads to new tools for the comparison of crystallographic contour maps. The new metrics are in better agreement with a visual map analysis than the conventional map correlation coefficient.
Numerical comparison of crystallographic contour maps is used extensively in structure solution and model refinement, analysis and validation. However, traditional metrics such as the map correlation coefficient (map CC, real-space CC or RSCC) sometimes contradict the results of visual assessment of the corresponding maps. This article explains such apparent contradictions and suggests new metrics and tools to compare crystallographic contour maps. The key to the new methods is rank scaling of the Fourier syntheses. The new metrics are complementary to the usual map CC and can be more helpful in map comparison, in particular when only some of their aspects, such as regions of high density, are of interest.
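A minimal sketch of the rank-scaling idea, assuming maps are available as NumPy arrays on a common grid. The function names and the overlap measure are illustrative assumptions and do not reproduce the specific metrics proposed in the article.

```python
import numpy as np

def rank_scale(density):
    """Replace each grid value by its rank, scaled to the range [0, 1]."""
    flat = np.asarray(density, dtype=float).ravel()
    ranks = np.empty(flat.size, dtype=float)
    ranks[np.argsort(flat, kind="stable")] = np.arange(flat.size)
    return (ranks / max(flat.size - 1, 1)).reshape(np.shape(density))

def correlation(map_a, map_b):
    """Conventional correlation coefficient between two maps."""
    return float(np.corrcoef(np.ravel(map_a), np.ravel(map_b))[0, 1])

def high_density_overlap(map_a, map_b, fraction=0.2):
    """Share of the top `fraction` grid points of map_a that are also
    among the top `fraction` grid points of map_b."""
    cutoff = 1.0 - fraction
    top_a = rank_scale(map_a) >= cutoff
    top_b = rank_scale(map_b) >= cutoff
    return float(np.logical_and(top_a, top_b).sum() / top_a.sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.random((8, 8, 8))
    b = np.exp(5.0 * a)  # same contours at matching ranks, different scale
    print("map CC:", correlation(a, b))
    print("rank-scaled CC:", correlation(rank_scale(a), rank_scale(b)))
    print("top-20% overlap:", high_density_overlap(a, b))
```

Two maps related by a monotonic rescaling enclose the same regions at matching rank cutoffs, so their rank-scaled comparison is perfect even though the conventional correlation is not.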
Fourier syntheses; crystallographic contour maps; map comparison; sigma scale; rank scaling; correlation coefficients
The TB Structural Genomics Consortium is a worldwide organization of collaborators whose mission is the comprehensive structural determination and analyses of Mycobacterium tuberculosis proteins to ultimately aid in tuberculosis diagnosis and treatment. In keeping with the overall vision, Consortium members have additionally established an integrated facilities core to streamline M. tuberculosis structural biology and developed bioinformatics resources for data mining. This review aims to share the latest Consortium developments with the TB community, including recent structures of proteins that play significant roles within M. tuberculosis. Atomic resolution details may unravel mechanistic insights and reveal unique and novel protein features, as well as important protein-protein and protein-ligand interactions, which ultimately lead to a better understanding of M. tuberculosis biology and may be exploited for rational, structure-based therapeutics design.
Mycobacterium tuberculosis; Protein structure; X-ray crystallography; Structural genomics; Drug discovery
The solvent-picking procedure in phenix.refine has been extended and combined with Phaser anomalous substructure completion and analysis of coordination geometry to identify and place elemental ions.
Many macromolecular model-building and refinement programs can automatically place solvent atoms in electron density at moderate-to-high resolution. This process frequently builds water molecules in place of elemental ions, the identification of which must be performed manually. The solvent-picking algorithms in phenix.refine have been extended to build common ions based on an analysis of the chemical environment as well as physical properties such as occupancy, B factor and anomalous scattering. The method is most effective for heavier elements such as calcium and zinc, for which a majority of sites can be placed with few false positives in a diverse test set of structures. At atomic resolution, it is observed that it can also be possible to identify tightly bound sodium and magnesium ions. A number of challenges that contribute to the difficulty of completely automating the process of structure completion are discussed.
refinement; ions; PHENIX
A new module, Guided Ligand Replacement (GLR), has been developed in Phenix to increase the ease and success rate of ligand placement when prior protein-ligand complexes are available.
The process of iterative structure-based drug design involves the X-ray crystal structure determination of upwards of 100 ligands with the same general scaffold (i.e. chemotype) complexed with very similar, if not identical, protein targets. In conjunction with insights from computational models and assays, this collection of crystal structures is analyzed to improve potency, to achieve better selectivity and to reduce liabilities such as absorption, distribution, metabolism, excretion and toxicology. Current methods for modeling ligands into electron-density maps typically do not utilize information on how similar ligands bound in related structures. Even if the electron density is of sufficient quality and resolution to allow de novo placement, the process can take considerable time as the size, complexity and torsional degrees of freedom of the ligands increase. A new module, Guided Ligand Replacement (GLR), was developed in Phenix to increase the ease and success rate of ligand placement when prior protein–ligand complexes are available. At the heart of GLR is an algorithm based on graph theory that associates atoms in the target ligand with analogous atoms in the reference ligand. Based on this correspondence, a set of coordinates is generated for the target ligand. GLR is especially useful in two situations: (i) modeling a series of large, flexible, complicated or macrocyclic ligands in successive structures and (ii) modeling ligands as part of a refinement pipeline that can automatically select a reference structure. Even in those cases for which no reference structure is available, if there are multiple copies of the bound ligand per asymmetric unit GLR offers an efficient way to complete the model after the first ligand has been placed. In all of these applications, GLR leverages prior knowledge from earlier structures to facilitate ligand placement in the current structure.
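The atom-correspondence step can be sketched as a small graph-matching problem. The sketch below assumes ligands are described by element lists and bond pairs and uses plain backtracking to embed the reference ligand in the target; the function names are hypothetical and this is not the GLR implementation, which may use a different graph algorithm.

```python
def _adjacency(n_atoms, bonds):
    """Build a neighbour table from a list of (i, j) bond pairs."""
    adj = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def match_atoms(ref_elems, ref_bonds, tgt_elems, tgt_bonds):
    """Map every reference atom to a distinct target atom of the same element
    such that every reference bond is also a target bond.
    Returns {ref_index: tgt_index} or None if no such mapping exists."""
    ref_adj = _adjacency(len(ref_elems), ref_bonds)
    tgt_adj = _adjacency(len(tgt_elems), tgt_bonds)

    def extend(mapping):
        i = len(mapping)  # reference atoms are assigned in index order
        if i == len(ref_elems):
            return dict(mapping)
        for j in range(len(tgt_elems)):
            if j in mapping.values() or tgt_elems[j] != ref_elems[i]:
                continue
            # bonds to already-mapped reference atoms must exist in the target
            if all(mapping[k] in tgt_adj[j] for k in ref_adj[i] if k in mapping):
                mapping[i] = j
                found = extend(mapping)
                if found is not None:
                    return found
                del mapping[i]
        return None

    return extend({})

def transfer_coordinates(mapping, ref_coords):
    """Give each matched target atom the coordinates of its reference atom."""
    return {tgt: ref_coords[ref] for ref, tgt in mapping.items()}

if __name__ == "__main__":
    mapping = match_atoms(["C", "C", "O"], [(0, 1), (1, 2)],
                          ["O", "C", "C", "N"], [(0, 1), (1, 2), (2, 3)])
    print(mapping)
```

Target atoms without a counterpart in the reference (the nitrogen in the example) receive no coordinates from the transfer and would have to be placed by other means.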
ligand placement; guided ligand-replacement method; GLR
A software system for automated protein–ligand crystallography has been implemented in the Phenix suite. This significantly reduces the manual effort required in high-throughput crystallographic studies.
High-throughput drug-discovery and mechanistic studies often require the determination of multiple related crystal structures that only differ in the bound ligands, point mutations in the protein sequence and minor conformational changes. If performed manually, solution and refinement require extensive repetition of the same tasks for each structure. To accelerate this process and minimize manual effort, a pipeline encompassing all stages of ligand building and refinement, starting from integrated and scaled diffraction intensities, has been implemented in Phenix. The resulting system is able to successfully solve and refine large collections of structures in parallel without extensive user intervention prior to the final stages of model completion and validation.
protein–ligand complexes; automation; crystallographic structure solution and refinement
A strategy using a new split green fluorescent protein (GFP) as a modular binding partner to form stable protein complexes with a target protein is presented. The modular split GFP may open the way to rapidly creating crystallization variants.
A modular strategy for protein crystallization using split green fluorescent protein (GFP) as a crystallization partner is demonstrated. Insertion of a hairpin containing GFP β-strands 10 and 11 into a surface loop of a target protein provides two chain crossings between the target and the reconstituted GFP compared with the single connection afforded by terminal GFP fusions. This strategy was tested by inserting this hairpin into a loop of another fluorescent protein, sfCherry. The crystal structure of the sfCherry-GFP(10–11) hairpin in complex with GFP(1–9) was determined at a resolution of 2.6 Å. Analysis of the complex shows that the reconstituted GFP is attached to the target protein (sfCherry) in a structurally ordered way. This work opens the way to rapidly creating crystallization variants by reconstituting a target protein bearing the GFP(10–11) hairpin with a variety of GFP(1–9) mutants engineered for favorable crystallization.
protein crystallization; synthetic symmetrization; protein tagging; split GFP; split protein; green fluorescent protein; protein expression; protein-fragment complementation; crystallization reagents
A procedure for model building is described that combines morphing a model to match a density map, trimming the morphed model and aligning the model to a sequence.
A procedure termed ‘morphing’ for improving a model after it has been placed in the crystallographic cell by molecular replacement has recently been developed. Morphing consists of applying a smooth deformation to a model to make it match an electron-density map more closely. Morphing does not change the identities of the residues in the chain, only their coordinates. Consequently, if the true structure differs from the working model by containing different residues, these differences cannot be corrected by morphing. Here, a procedure that helps to address this limitation is described. The goal of the procedure is to obtain a relatively complete model that has accurate main-chain atomic positions and residues that are correctly assigned to the sequence. Residues in a morphed model that do not match the electron-density map are removed. Each segment of the resulting trimmed morphed model is then assigned to the sequence of the molecule using information about the connectivity of the chains from the working model and from connections that can be identified from the electron-density map. The procedure was tested by application to a recently determined structure at a resolution of 3.2 Å and was found to increase the number of correctly identified residues in this structure from the 88 obtained using phenix.resolve sequence assignment alone (Terwilliger, 2003) to 247 of a possible 359. Additionally, the procedure was tested by application to a series of templates with sequence identities to a target structure ranging between 7 and 36%. The mean fraction of correctly identified residues in these cases was increased from 33% using phenix.resolve sequence assignment to 47% using the current procedure. The procedure is simple to apply and is available in the Phenix software package.
morphing; model building; sequence assignment; model–map correlation; loop-building
Monitoring protein-protein interactions in living cells is key to unraveling their roles in numerous cellular processes and various diseases. Previously described split-GFP based sensors suffer from poor folding and/or self-assembly background fluorescence. Here, we have engineered a micro-tagging system to monitor protein-protein interactions in vivo and in vitro. The assay is based on tripartite association between two 20-amino-acid GFP tags, GFP10 and GFP11, fused to interacting protein partners, and the complementary GFP1-9 detector. When proteins interact, GFP10 and GFP11 self-associate with GFP1-9 to reconstitute a functional GFP. Using coiled-coils and FRB/FKBP12 model systems we characterize the sensor in vitro and in Escherichia coli. We extend the studies to mammalian cells and examine the FK-506 inhibition of the rapamycin-induced association of FRB/FKBP12. The small size of these tags and their minimal effect on fusion protein behavior and solubility should enable new experiments for monitoring protein-protein association by fluorescence.
A genetic algorithm has been developed to optimize the phases of the strongest reflections in SIR/SAD data. This is shown to facilitate density modification and model building in several test cases.
Experimental phasing of diffraction data from macromolecular crystals involves deriving phase probability distributions. These distributions are often bimodal, making their weighted average, the centroid phase, improbable, so that electron-density maps computed using centroid phases are often non-interpretable. Density modification brings in information about the characteristics of electron density in protein crystals. In successful cases, this allows a choice between the modes in the phase probability distributions, and the maps can cross the borderline between non-interpretable and interpretable. Based on the suggestions by Vekhter [Vekhter (2005), Acta Cryst. D61, 899–902], the impact of identifying optimized phases for a small number of strong reflections prior to the density-modification process was investigated while using the centroid phase as a starting point for the remaining reflections. A genetic algorithm was developed that optimizes the quality of such phases using the skewness of the density map as a target function. Phases optimized in this way are then used in density modification. In most of the tests, the resulting maps were of higher quality than maps generated from the original centroid phases. In one of the test cases, the new method sufficiently improved a marginal set of experimental SAD phases to enable successful map interpretation. A computer program, SISA, has been developed to apply this method for phase improvement in macromolecular crystallography.
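The use of map skewness as a target for a genetic algorithm can be illustrated on a one-dimensional toy problem. The sketch below is not SISA: it assumes each strong reflection has exactly two candidate phases (`mode_a`, `mode_b`), encodes an individual as one binary choice per strong reflection, and uses generic crossover, mutation and elitism. All function and parameter names are hypothetical.

```python
import numpy as np

def skewness(density):
    """Third standardized moment of the density values."""
    d = np.ravel(density) - np.mean(density)
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)

def synthesize(amplitudes, phases, n_grid):
    """One-dimensional Fourier synthesis from amplitudes and phases."""
    return np.fft.irfft(amplitudes * np.exp(1j * phases), n=n_grid)

def optimize_phase_choices(amplitudes, mode_a, mode_b, strong, n_grid,
                           generations=40, pop_size=30, seed=0):
    """Genetic algorithm over binary choices (False: mode_a, True: mode_b)
    for the reflections listed in `strong`; fitness is map skewness."""
    rng = np.random.default_rng(seed)

    def fitness(choice):
        phases = mode_a.copy()
        phases[strong] = np.where(choice, mode_b[strong], mode_a[strong])
        return skewness(synthesize(amplitudes, phases, n_grid))

    pop = rng.random((pop_size, len(strong))) < 0.5
    pop[0] = False  # keep the starting phase set in the population
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]
        children = [parents[0].copy()]  # elitism: the best individual survives
        while len(children) < pop_size:
            p, q = parents[rng.integers(len(parents), size=2)]
            child = np.where(rng.random(len(strong)) < 0.5, p, q)  # crossover
            child ^= rng.random(len(strong)) < 0.05                # mutation
            children.append(child)
        pop = np.array(children)
    scores = np.array([fitness(ind) for ind in pop])
    return pop[int(np.argmax(scores))], float(scores.max())
```

Because the best individual is carried over unchanged in every generation, the returned skewness can never be lower than that of the starting phase set. Whether a higher skewness corresponds to a more interpretable map in a real case is the hypothesis tested in the work described above, not something this toy guarantees.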
experimental phasing; density modification; genetic algorithms
The internal symmetry of a macromolecule is both an important aspect of its function and a useful feature in obtaining a structure by X-ray crystallography and other techniques. A method is presented for finding internal symmetry and other non-crystallographic symmetry in a structure based on patterns of density in a density map for that structure. Regions in the map that are similar are identified by cutting out a sphere of density from a region that has high local variation and using an FFT-based correlation search to find other regions that match. The relationships among correlated regions are then refined to maximize their correlations and are found to accurately represent non-crystallographic symmetry in the map.
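The FFT-based search step can be sketched for the translation-only case. The sketch assumes a periodic map stored as a NumPy array and a sphere centre chosen by hand; it computes a locally normalized correlation coefficient for every shift. Rotational sampling, automatic selection of high-variation regions and refinement of the symmetry operators are omitted, and the function names are hypothetical.

```python
import numpy as np

def fft_correlate(probe, density):
    """Circular cross-correlation: result[t] = sum_x probe[x] * density[x + t]."""
    return np.fft.ifftn(np.conj(np.fft.fftn(probe)) * np.fft.fftn(density)).real

def sphere_correlation_map(density, center, radius):
    """Correlation coefficient between the sphere of density at `center`
    and the equally shaped region at every translation of the grid."""
    grids = np.indices(density.shape)
    dist2 = sum((g - c) ** 2 for g, c in zip(grids, center))
    mask = (dist2 <= radius ** 2).astype(float)
    n = mask.sum()
    probe = mask * (density - (mask * density).sum() / n)  # mean-removed sphere
    local_sum = fft_correlate(mask, density)
    local_sq = fft_correlate(mask, density ** 2)
    local_var = np.maximum(local_sq - local_sum ** 2 / n, 0.0)
    norm = np.sqrt((probe ** 2).sum() * local_var)
    numer = fft_correlate(probe, density)
    return np.divide(numer, norm, out=np.zeros_like(numer), where=norm > 1e-9)

def best_other_match(density, center, radius):
    """Grid translation (other than zero) with the highest correlation."""
    cc = sphere_correlation_map(density, center, radius)
    cc[(0,) * density.ndim] = -np.inf  # ignore the trivial self-match
    return np.unravel_index(np.argmax(cc), cc.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    density = rng.random((16, 16, 16))
    motif = rng.random((5, 5, 5))
    density[2:7, 2:7, 2:7] = motif      # first copy
    density[8:13, 5:10, 2:7] = motif    # second copy, shifted by (6, 3, 0)
    print(best_other_match(density, center=(4, 4, 4), radius=2))
```

Three FFT correlations (with the probe, the mask and the squared density) give the normalized coefficient at every grid shift at once, which is what makes an exhaustive search affordable.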
Symmetry; Macromolecule; Crystal structure; Density map; Automation; Macromolecular crystallography; Phenix
AcrB is an inner membrane resistance-nodulation-cell division efflux pump and is part of the AcrAB–TolC tripartite efflux system. We have determined the crystal structure of AcrB with bound Linezolid at a resolution of 3.5 Å. The structure shows that Linezolid binds to the A385/F386 loops of the symmetric trimer of AcrB. A conformational change of a loop in the bottom of the periplasmic cleft is also observed.
Multidrug resistance; AcrB; RND efflux pumps; Linezolid; Membrane protein; Protein–drug complex; X-ray crystal structure
X-ray crystallography is a critical tool in the study of biological systems. It is able to provide information that has been a prerequisite to understanding the fundamentals of life. It is also a method that is central to the development of new therapeutics for human disease. Significant time and effort are required to determine and optimize many macromolecular structures because of the need for manual interpretation of complex numerical data, often using many different software packages, and the repeated use of interactive three-dimensional graphics. The Phenix software package has been developed to provide a comprehensive system for macromolecular crystallographic structure solution with an emphasis on automation. This has required the development of new algorithms that minimize or eliminate subjective input in favour of built-in expert-systems knowledge, the automation of procedures that are traditionally performed by hand, and the development of a computational framework that allows a tight integration between the algorithms. The application of automated methods is particularly appropriate in the field of structural proteomics, where high throughput is desired. Features in Phenix for the automation of experimental phasing with subsequent model building, molecular replacement, structure refinement and validation are described and examples given of running Phenix from both the command line and graphical user interface.
Macromolecular Crystallography; Automation; Phenix; X-ray; Diffraction; Python
A density-based procedure is described for improving a homology model that is locally accurate but differs globally. The model is deformed to match the map and refined, yielding an improved starting point for density modification and further model-building.
An approach is presented for addressing the challenge of model rebuilding after molecular replacement in cases where the placed template is very different from the structure to be determined. The approach takes advantage of the observation that a template and target structure may have local structures that can be superimposed much more closely than can their complete structures. A density-guided procedure for deformation of a properly placed template is introduced. A shift in the coordinates of each residue in the structure is calculated based on optimizing the match of model density within a 6 Å radius of the center of that residue with a prime-and-switch electron-density map. The shifts are smoothed and applied to the atoms in each residue, leading to local deformation of the template that improves the match of map and model. The model is then refined to improve the geometry and the fit of model to the structure-factor data. A new map is then calculated and the process is repeated until convergence. The procedure can extend the routine applicability of automated molecular replacement, model building and refinement to search models with over 2 Å r.m.s.d. representing 65–100% of the structure.
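The smoothing and application of per-residue shifts can be sketched as follows. The abstract states only that the shifts are smoothed; the moving-average window used here is an assumption for illustration, and the density-based calculation of the raw shifts is not shown.

```python
import numpy as np

def smooth_shifts(shifts, window=5):
    """Moving average of per-residue shift vectors along the chain.
    The window is truncated at the chain ends."""
    shifts = np.asarray(shifts, dtype=float)
    half = window // 2
    smoothed = np.empty_like(shifts)
    for i in range(len(shifts)):
        lo, hi = max(0, i - half), min(len(shifts), i + half + 1)
        smoothed[i] = shifts[lo:hi].mean(axis=0)
    return smoothed

def apply_shifts(coords, residue_index, shifts):
    """Translate every atom by the shift assigned to its residue."""
    return np.asarray(coords, dtype=float) + shifts[np.asarray(residue_index)]

if __name__ == "__main__":
    raw = np.zeros((7, 3))
    raw[3] = [7.0, 0.0, 0.0]          # one residue with a large raw shift
    smoothed = smooth_shifts(raw, window=7)
    atoms = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
    print(apply_shifts(atoms, [3, 3], smoothed))
```

Smoothing spreads an isolated large shift over neighbouring residues, so that the deformation varies gradually along the chain and local geometry is distorted less than it would be by independent per-residue moves.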
molecular replacement; automation; macromolecular crystallography; structure similarity; modeling; Phenix; morphing
Approximately one-third of mankind has been exposed to Mycobacterium tuberculosis, the etiological agent responsible for tuberculosis (TB). As part of an effort to develop a new generation of anti-TB agents, the chemical shifts for the 261-residue, virulence-associated protein Rv0577 from M. tuberculosis have been extensively assigned.
With over 60,000 protein structures available in the Protein Data Bank, it is frequently possible to use one of them to obtain starting phase information and to solve new crystal structures. Molecular replacement [1–4] procedures, which search for placements of a starting model within the crystallographic unit cell that best account for the measured diffraction amplitudes, followed by automatic chain tracing methods [5–8], have allowed the rapid solution of large numbers of protein structures. Despite extensive work [9–14], molecular replacement or the subsequent rebuilding usually fails with more divergent starting models based on remote homologues with less than 30% sequence identity. Here we show that this limitation can be substantially reduced by combining algorithms for protein structure modeling with those developed for crystallographic structure determination. An approach integrating Rosetta structure modeling with Autobuild chain tracing yielded high-resolution structures for 8 of 13 X-ray diffraction datasets that could not be solved in the laboratories of expert crystallographers and that remained unsolved after application of an extensive array of alternative approaches. We estimate the new method should allow rapid structure determination without experimental phase information for over half the cases where current methods fail, given diffraction datasets of better than 3.2 Å resolution, four or fewer copies in the asymmetric unit, and the availability of structures of homologous proteins with >20% sequence identity.
In scientific computing, Fortran was the dominant implementation language throughout most of the second part of the 20th century. The many tools accumulated during this time have been difficult to integrate with modern software, which is now dominated by object-oriented languages.
Driven by the requirements of a large-scale scientific software project, we have developed a Fortran to C++ source-to-source conversion tool named FABLE. This enables the continued development of new methods even while switching languages. We report the application of FABLE in three major projects and present detailed comparisons of Fortran and C++ runtime performances.
Our experience suggests that most Fortran 77 codes can be converted with an effort that is minor (measured in days) compared to the original development time (often measured in years). With FABLE it is possible to reuse and evolve legacy work in modern object-oriented environments, in a portable and maintainable way. FABLE is available under a nonrestrictive open source license. In FABLE the analysis of the Fortran sources is separated from the generation of the C++ sources. Therefore parts of FABLE could be reused for other target languages.
Fortran; C++; Source-to-source conversion; Python; Test-driven development
The foundations and current features of a widely used graphical user interface for macromolecular crystallography are described.
A new Python-based graphical user interface for the PHENIX suite of crystallography software is described. This interface unifies the command-line programs and their graphical displays, simplifying the development of new interfaces and avoiding duplication of function. With careful design, graphical interfaces can be displayed automatically, instead of being manually constructed. The resulting package is easily maintained and extended as new programs are added or modified.
macromolecular crystallography; graphical user interfaces; PHENIX
phenix.refine is a program within the PHENIX package that supports crystallographic structure refinement against experimental data with a wide range of upper resolution limits using a large repertoire of model parameterizations. This paper presents an overview of the major phenix.refine features, with extensive literature references for readers interested in more detailed discussions of the methods.
phenix.refine is a program within the PHENIX package that supports crystallographic structure refinement against experimental data with a wide range of upper resolution limits using a large repertoire of model parameterizations. It has several automation features and is also highly flexible. Several hundred parameters enable extensive customizations for complex use cases. Multiple user-defined refinement strategies can be applied to specific parts of the model in a single refinement run. An intuitive graphical user interface is available to guide novice users and to assist advanced users in managing refinement projects. X-ray or neutron diffraction data can be used separately or jointly in refinement. phenix.refine is tightly integrated into the PHENIX suite, where it serves as a critical component in automated model building, final structure refinement, structure validation and deposition to the wwPDB. This paper presents an overview of the major phenix.refine features, with extensive literature references for readers interested in more detailed discussions of the methods.
structure refinement; PHENIX; joint X-ray/neutron refinement; maximum likelihood; TLS; simulated annealing; subatomic resolution; real-space refinement; twinning; NCS
DEN refinement and automated model building with AutoBuild were used to determine the structure of a putative succinyl-diaminopimelate desuccinylase from C. glutamicum. This difficult case of molecular-replacement phasing shows that the synergism between DEN refinement and AutoBuild outperforms standard refinement protocols.
Phasing by molecular replacement remains difficult for targets that are far from the search model or in situations where the crystal diffracts only weakly or to low resolution. Here, the process of determining and refining the structure of Cgl1109, a putative succinyl-diaminopimelate desuccinylase from Corynebacterium glutamicum, at ∼3 Å resolution is described using a combination of homology modeling with MODELLER, molecular-replacement phasing with Phaser, deformable elastic network (DEN) refinement and automated model building using AutoBuild in a semi-automated fashion, followed by final refinement cycles with phenix.refine and Coot. This difficult molecular-replacement case illustrates the power of including DEN restraints derived from a starting model to guide the movements of the model during refinement. The resulting improved model phases provide better starting points for automated model building and produce more significant difference peaks in anomalous difference Fourier maps to locate anomalous scatterers than does standard refinement. This example also illustrates a current limitation of automated procedures that require manual adjustment of local sequence misalignments between the homology model and the target sequence.
reciprocal-space refinement; DEN refinement; real-space refinement; automated model building; succinyl-diaminopimelate desuccinylase
The combination of algorithms from the structure-modeling field with those of crystallographic structure determination can broaden the range of templates that are useful for structure determination by the method of molecular replacement. Automated tools in phenix.mr_rosetta simplify the application of these combined approaches by integrating Phenix crystallographic algorithms and Rosetta structure-modeling algorithms and by systematically generating and evaluating models with a combination of these methods. The phenix.mr_rosetta algorithms can be used to automatically determine challenging structures. The approaches used in phenix.mr_rosetta are described along with examples that show roles that structure-modeling can play in molecular replacement.
Molecular replacement; Automation; Macromolecular crystallography; Rosetta; Phenix