Special methods are required to interpret sparse diffraction patterns collected from peptide crystals at X-ray free-electron lasers. Bragg spots can be indexed from composite-image powder rings, with crystal orientations then deduced from a very limited number of spot positions.
Still diffraction patterns from peptide nanocrystals with small unit cells are challenging to index using conventional methods owing to the limited number of spots and the lack of crystal orientation information for individual images. New indexing algorithms have been developed as part of the Computational Crystallography Toolbox (cctbx) to overcome these challenges. Accurate unit-cell information derived from an aggregate data set from thousands of diffraction patterns can be used to determine a crystal orientation matrix for individual images with as few as five reflections. These algorithms are potentially applicable not only to amyloid peptides but also to any set of diffraction patterns with sparse properties, such as low-resolution virus structures or high-throughput screening of still images captured by raster-scanning at synchrotron sources. As a proof of concept for this technique, successful integration of X-ray free-electron laser (XFEL) data to 2.5 Å resolution for the amyloid segment GNNQQNY from the Sup35 yeast prion is presented.
XFEL; Sup35 yeast prion; indexing methods; crystallography
The dioxygen we breathe is formed from water by its light-induced oxidation in photosystem II. O2 formation takes place at a catalytic manganese cluster within milliseconds after the photosystem II reaction center is excited by three single-turnover flashes. Here we present combined X-ray emission spectra and diffraction data of 2 flash (2F) and 3 flash (3F) photosystem II samples, and of a transient 3F′ state (250 μs after the third flash), collected under functional conditions using an X-ray free electron laser. The spectra show that the initial O-O bond formation, coupled to Mn-reduction, does not yet occur within 250 μs after the third flash. Diffraction data of all states studied exhibit an anomalous scattering signal from Mn but show no significant structural changes at the present resolution of 4.5 Å. This study represents the initial frames in a molecular movie of the structural changes during the catalytic reaction in photosystem II.
X-ray free-electron laser crystallography relies on the collection of still-shot diffraction patterns. New methods are developed for optimal modeling of the crystals’ orientations and mosaic block properties.
X-ray diffraction patterns from still crystals are inherently difficult to process because the crystal orientation is not uniquely determined by measuring the Bragg spot positions. Only one of the three rotational degrees of freedom is directly coupled to spot positions; the other two rotations move Bragg spots in and out of the reflecting condition but do not change the direction of the diffracted rays. This hinders the ability to recover accurate structure factors from experiments that are dependent on single-shot exposures, such as femtosecond diffract-and-destroy protocols at X-ray free-electron lasers (XFELs). Here, additional methods are introduced to optimally model the diffraction. The best orientation is obtained by requiring, for the brightest observed spots, that each reciprocal-lattice point be placed into the exact reflecting condition implied by Bragg’s law with a minimal rotation. This approach reduces the experimental uncertainties in noisy XFEL data, improving the crystallographic R factors and sharpening anomalous differences that are near the level of the noise.
X-ray free-electron lasers; single-shot exposures
Refinement of macromolecular structures against low-resolution crystallographic data is limited by the ability of current methods to converge on a structure with realistic geometry. We developed a low-resolution crystallographic refinement method that combines the Rosetta sampling methodology and energy function with reciprocal-space X-ray refinement in Phenix. On a set of difficult low-resolution cases, the method yielded improved model geometry and lower free R factors than alternate refinement methods.
A Python/C++ library for reading image data and experimental geometry for X-ray diffraction experiments from arbitrary data sources is presented.
Data formats for recording X-ray diffraction data continue to evolve rapidly to accommodate new detector technologies developed in response to more intense light sources. Processing the data from single-crystal X-ray diffraction experiments therefore requires the ability to read, and correctly interpret, image data and metadata from a variety of instruments employing different experimental representations. Tools that have previously been developed to address this problem have been limited either by a lack of extensibility or by inconsistent treatment of image metadata. The dxtbx software package provides a consistent interface to both image data and experimental models, while supporting a completely generic user-extensible approach to reading the data files. The library is written in a mixture of C++ and Python and is distributed as part of the cctbx under an open-source licence at http://cctbx.sourceforge.net.
single-crystal X-ray diffraction; data processing; computer programs
X-ray free-electron laser (XFEL) sources enable the use of crystallography to
solve three-dimensional macromolecular structures under native conditions and free from
radiation damage. Results to date, however, have been limited by the challenge of deriving
accurate Bragg intensities from a heterogeneous population of microcrystals, while at the
same time modeling the X-ray spectrum and detector geometry. Here we present a
computational approach designed to extract statistically significant high-resolution
signals from fewer diffraction measurements.
Flexible torsion angle-based NCS restraints have been implemented in phenix.refine, allowing improved model refinement at all resolutions. Rotamer correction and rotamer consistency checks between NCS-related amino-acid side chains further improve the final model quality.
One of the great challenges in refining macromolecular crystal structures is a low data-to-parameter ratio. Historically, knowledge from chemistry has been used to help to improve this ratio. When a macromolecule crystallizes with more than one copy in the asymmetric unit, the noncrystallographic symmetry relationships can be exploited to provide additional restraints when refining the working model. However, although globally similar, NCS-related chains often have local differences. To allow for local differences between NCS-related molecules, flexible torsion-based NCS restraints have been introduced, coupled with intelligent rotamer handling for protein chains, and are available in phenix.refine for refinement of models at all resolutions.
macromolecular crystallography; noncrystallographic symmetry; NCS; refinement; automation
The solvent-picking procedure in phenix.refine has been extended and combined with Phaser anomalous substructure completion and analysis of coordination geometry to identify and place elemental ions.
Many macromolecular model-building and refinement programs can automatically place solvent atoms in electron density at moderate-to-high resolution. This process frequently builds water molecules in place of elemental ions, the identification of which must be performed manually. The solvent-picking algorithms in phenix.refine have been extended to build common ions based on an analysis of the chemical environment as well as physical properties such as occupancy, B factor and anomalous scattering. The method is most effective for heavier elements such as calcium and zinc, for which a majority of sites can be placed with few false positives in a diverse test set of structures. At atomic resolution, it is observed that it can also be possible to identify tightly bound sodium and magnesium ions. A number of challenges that contribute to the difficulty of completely automating the process of structure completion are discussed.
refinement; ions; PHENIX
A new module, Guided Ligand Replacement (GLR), has been developed in Phenix to increase the ease and success rate of ligand placement when prior protein-ligand complexes are available.
The process of iterative structure-based drug design involves the X-ray crystal structure determination of upwards of 100 ligands with the same general scaffold (i.e. chemotype) complexed with very similar, if not identical, protein targets. In conjunction with insights from computational models and assays, this collection of crystal structures is analyzed to improve potency, to achieve better selectivity and to reduce liabilities such as absorption, distribution, metabolism, excretion and toxicology. Current methods for modeling ligands into electron-density maps typically do not utilize information on how similar ligands bound in related structures. Even if the electron density is of sufficient quality and resolution to allow de novo placement, the process can take considerable time as the size, complexity and torsional degrees of freedom of the ligands increase. A new module, Guided Ligand Replacement (GLR), was developed in Phenix to increase the ease and success rate of ligand placement when prior protein–ligand complexes are available. At the heart of GLR is an algorithm based on graph theory that associates atoms in the target ligand with analogous atoms in the reference ligand. Based on this correspondence, a set of coordinates is generated for the target ligand. GLR is especially useful in two situations: (i) modeling a series of large, flexible, complicated or macrocyclic ligands in successive structures and (ii) modeling ligands as part of a refinement pipeline that can automatically select a reference structure. Even in those cases for which no reference structure is available, if there are multiple copies of the bound ligand per asymmetric unit GLR offers an efficient way to complete the model after the first ligand has been placed. In all of these applications, GLR leverages prior knowledge from earlier structures to facilitate ligand placement in the current structure.
ligand placement; guided ligand-replacement method; GLR
A software system for automated protein–ligand crystallography has been implemented in the Phenix suite. This significantly reduces the manual effort required in high-throughput crystallographic studies.
High-throughput drug-discovery and mechanistic studies often require the determination of multiple related crystal structures that only differ in the bound ligands, point mutations in the protein sequence and minor conformational changes. If performed manually, solution and refinement requires extensive repetition of the same tasks for each structure. To accelerate this process and minimize manual effort, a pipeline encompassing all stages of ligand building and refinement, starting from integrated and scaled diffraction intensities, has been implemented in Phenix. The resulting system is able to successfully solve and refine large collections of structures in parallel without extensive user intervention prior to the final stages of model completion and validation.
protein–ligand complexes; automation; crystallographic structure solution and refinement
A low flow rate liquid microjet method for delivery of hydrated protein crystals to X-ray lasers is presented. Linac Coherent Light Source data demonstrates serial femtosecond protein crystallography with micrograms, a reduction of sample consumption by orders of magnitude.
An electrospun liquid microjet has been developed that delivers protein microcrystal suspensions at flow rates of 0.14–3.1 µl min−1 to perform serial femtosecond crystallography (SFX) studies with X-ray lasers. Thermolysin microcrystals flowed at 0.17 µl min−1 and diffracted to beyond 4 Å resolution, producing 14 000 indexable diffraction patterns, or four per second, from 140 µg of protein. Nanoflow electrospinning extends SFX to biological samples that necessitate minimal sample consumption.
serial femtosecond crystallography; nanoflow electrospinning
The functionality of the molecular-replacement pipeline phaser.MRage is introduced and illustrated with examples.
Phaser.MRage is a molecular-replacement automation framework that implements a full model-generation workflow and provides several layers of model exploration to the user. It is designed to handle a large number of models and can distribute calculations efficiently onto parallel hardware. In addition, phaser.MRage can identify correct solutions and use this information to accelerate the search. Firstly, it can quickly score all alternative models of a component once a correct solution has been found. Secondly, it can perform extensive analysis of identified solutions to find protein assemblies and can employ assembled models for subsequent searches. Thirdly, it is able to use a priori assembly information (derived from, for example, homologues) to speculatively place and score molecules, thereby customizing the search procedure to a certain class of protein molecule (for example, antibodies) and incorporating additional biological information into molecular replacement.
molecular replacement; pipeline; automation; phaser.MRage
Intense femtosecond X-ray pulses produced at the Linac Coherent Light Source (LCLS) were used for simultaneous X-ray diffraction (XRD) and X-ray emission spectroscopy (XES) of microcrystals of Photosystem II (PS II) at room temperature. This method probes the overall protein structure and the electronic structure of the Mn4CaO5 cluster in the oxygen-evolving complex of PS II. XRD data are presented from both the dark state (S1) and the first illuminated state (S2) of PS II. Our simultaneous XRD/XES study shows that the PS II crystals are intact during our measurements at the LCLS, not only with respect to the structure of PS II, but also with regard to the electronic structure of the highly radiation sensitive Mn4CaO5 cluster, opening new directions for future dynamics studies.
The Computational Crystallography Toolbox (cctbx) is a flexible software platform that has been used to develop high-throughput crystal-screening tools for both synchrotron sources and X-ray free-electron lasers. Plans for data-processing and visualization applications are discussed, and the benefits and limitations of using graphics-processing units are evaluated.
Current pixel-array detectors produce diffraction images at extreme data rates (of up to 2 TB h−1) that make severe demands on computational resources. New multiprocessing frameworks are required to achieve rapid data analysis, as it is important to be able to inspect the data quickly in order to guide the experiment in real time. By utilizing readily available web-serving tools that interact with the Python scripting language, it was possible to implement a high-throughput Bragg-spot analyzer (cctbx.spotfinder) that is presently in use at numerous synchrotron-radiation beamlines. Similarly, Python interoperability enabled the production of a new data-reduction package (cctbx.xfel) for serial femtosecond crystallography experiments at the Linac Coherent Light Source (LCLS). Future data-reduction efforts will need to focus on specialized problems such as the treatment of diffraction spots on interleaved lattices arising from multi-crystal specimens. In these challenging cases, accurate modeling of close-lying Bragg spots could benefit from the high-performance computing capabilities of graphics-processing units.
data processing; reusable code; multiprocessing; cctbx
X-ray crystallography is a critical tool in the study of biological systems. It is able to provide information that has been a prerequisite to understanding the fundamentals of life. It is also a method that is central to the development of new therapeutics for human disease. Significant time and effort are required to determine and optimize many macromolecular structures because of the need for manual interpretation of complex numerical data, often using many different software packages, and the repeated use of interactive three-dimensional graphics. The Phenix software package has been developed to provide a comprehensive system for macromolecular crystallographic structure solution with an emphasis on automation. This has required the development of new algorithms that minimize or eliminate subjective input in favour of built-in expert-systems knowledge, the automation of procedures that are traditionally performed by hand, and the development of a computational framework that allows a tight integration between the algorithms. The application of automated methods is particularly appropriate in the field of structural proteomics, where high throughput is desired. Features in Phenix for the automation of experimental phasing with subsequent model building, molecular replacement, structure refinement and validation are described and examples given of running Phenix from both the command line and graphical user interface.
Macromolecular Crystallography; Automation; Phenix; X-ray; Diffraction; Python
The foundations and current features of a widely used graphical user interface for macromolecular crystallography are described.
A new Python-based graphical user interface for the PHENIX suite of crystallography software is described. This interface unifies the command-line programs and their graphical displays, simplifying the development of new interfaces and avoiding duplication of function. With careful design, graphical interfaces can be displayed automatically, instead of being manually constructed. The resulting package is easily maintained and extended as new programs are added or modified.
macromolecular crystallography; graphical user interfaces; PHENIX
phenix.refine is a program within the PHENIX package that supports crystallographic structure refinement against experimental data with a wide range of upper resolution limits using a large repertoire of model parameterizations. This paper presents an overview of the major phenix.refine features, with extensive literature references for readers interested in more detailed discussions of the methods.
phenix.refine is a program within the PHENIX package that supports crystallographic structure refinement against experimental data with a wide range of upper resolution limits using a large repertoire of model parameterizations. It has several automation features and is also highly flexible. Several hundred parameters enable extensive customizations for complex use cases. Multiple user-defined refinement strategies can be applied to specific parts of the model in a single refinement run. An intuitive graphical user interface is available to guide novice users and to assist advanced users in managing refinement projects. X-ray or neutron diffraction data can be used separately or jointly in refinement. phenix.refine is tightly integrated into the PHENIX suite, where it serves as a critical component in automated model building, final structure refinement, structure validation and deposition to the wwPDB. This paper presents an overview of the major phenix.refine features, with extensive literature references for readers interested in more detailed discussions of the methods.
structure refinement; PHENIX; joint X-ray/neutron refinement; maximum likelihood; TLS; simulated annealing; subatomic resolution; real-space refinement; twinning; NCS
Recent developments in PHENIX are reported that allow the use of reference-model torsion restraints, secondary-structure hydrogen-bond restraints and Ramachandran restraints for improved macromolecular refinement in phenix.refine at low resolution.
Traditional methods for macromolecular refinement often have limited success at low resolution (3.0–3.5 Å or worse), producing models that score poorly on crystallographic and geometric validation criteria. To improve low-resolution refinement, knowledge from macromolecular chemistry and homology was used to add three new coordinate-restraint functions to the refinement program phenix.refine. Firstly, a ‘reference-model’ method uses an identical or homologous higher resolution model to add restraints on torsion angles to the geometric target function. Secondly, automatic restraints for common secondary-structure elements in proteins and nucleic acids were implemented that can help to preserve the secondary-structure geometry, which is often distorted at low resolution. Lastly, we have implemented Ramachandran-based restraints on the backbone torsion angles. In this method, a ϕ,ψ term is added to the geometric target function to minimize a modified Ramachandran landscape that smoothly combines favorable peaks identified from nonredundant high-quality data with unfavorable peaks calculated using a clash-based pseudo-energy function. All three methods show improved MolProbity validation statistics, typically complemented by a lowered R
free and a decreased gap between R
work and R
macromolecular crystallography; low resolution; refinement; automation
The combination of algorithms from the structure-modeling field with those of crystallographic structure determination can broaden the range of templates that are useful for structure determination by the method of molecular replacement. Automated tools in phenix.mr_rosetta simplify the application of these combined approaches by integrating Phenix crystallographic algorithms and Rosetta structure-modeling algorithms and by systematically generating and evaluating models with a combination of these methods. The phenix.mr_rosetta algorithms can be used to automatically determine challenging structures. The approaches used in phenix.mr_rosetta are described along with examples that show roles that structure-modeling can play in molecular replacement.
Molecular replacement; Automation; Macromolecular crystallography; Rosetta; Phenix
Structural biology and structural genomics projects routinely rely on recombinantly expressed proteins, but many proteins and complexes are difficult to obtain by this approach. We investigated native source proteins for high-throughput protein crystallography applications. The Escherichia coli proteome was fractionated, purified, crystallized, and structurally characterized. Macro-scale fermentation and fractionation were used to subdivide the soluble proteome into 408 unique fractions of which 295 fractions yielded crystals in microfluidic crystallization chips. Of the 295 crystals, 152 were selected for optimization, diffraction screening, and data collection. Twenty-three structures were determined, four of which were novel. This study demonstrates the utility of native source proteins for high-throughput crystallography.
The essential Mycobacterium tuberculosis Ser/Thr protein kinase (STPK), PknB, plays a key role in regulating growth and division, but the structural basis of activation has not been defined. Here we provide biochemical and structural evidence that dimerization through the kinase-domain (KD) N-lobe activates PknB by an allosteric mechanism. Promoting KD pairing using a small-molecule dimerizer stimulates the unphosphorylated kinase, and substitutions that disrupt N-lobe pairing decrease phosphorylation activity in vitro and in vivo. Multiple crystal structures of two monomeric PknB KD mutants in complex with nucleotide reveal diverse inactive conformations that contain large active-site distortions that propagate >30 Å from the mutation site. These results define flexible, inactive structures of a monomeric bacterial receptor KD and show how “back-to-back” N-lobe dimerization stabilizes the active KD conformation. This general mechanism of bacterial receptor STPK activation affords insights into the regulation of homologous eukaryotic kinases that form structurally similar dimers.
The PHENIX software for macromolecular structure determination is described.
Macromolecular X-ray crystallography is routinely applied to understand biological processes at a molecular level. However, significant time and effort are still required to solve and complete many of these structures because of the need for manual interpretation of complex numerical data using many software packages and the repeated use of interactive three-dimensional graphics. PHENIX has been developed to provide a comprehensive system for macromolecular crystallographic structure solution with an emphasis on the automation of all procedures. This has relied on the development of algorithms that minimize or eliminate subjective input, the development of algorithms that automate procedures that are traditionally performed by hand and, finally, the development of a framework that allows a tight integration between the algorithms.
PHENIX; Python; macromolecular crystallography; algorithms
The Myxococcus xanthus FrzS protein transits from pole-to-pole within the cell, accumulating at the pole that defines the direction of movement in social (S) motility. Here we show using atomic-resolution crystallography and NMR that the FrzS receiver domain (RD) displays the conserved switch Tyr102 in an unusual conformation, lacks the conserved Asp phosphorylation site, and fails to bind Mg2+ or the phosphoryl analogue, Mg2+·BeF3. Mutation of Asp55, closest to the canonical site of RD phosphorylation, showed no motility phenotype in vivo, demonstrating that phosphorylation at this site is not necessary for domain function. In contrast, the Tyr102Ala and His92Phe substitutions on the canonical output face of the FrzS RD abolished S-motility in vivo. Single-cell fluorescence microscopy measurements revealed a striking mislocalization of these mutant FrzS proteins to the trailing cell pole in vivo. The crystal structures of the mutants suggested that the observed conformation of Tyr102 in the wild-type FrzS RD is not sufficient for function. These results support the model that FrzS contains a novel ‘pseudo-receiver domain’ whose function requires recognition of the RD output face but not Asp phosphorylation.
The database of molecular motions, MolMovDB (), has been in existence for the past decade. It classifies macromolecular motions and provides tools to interpolate between two conformations (the Morph Server) and predict possible motions in a single structure. In 2005, we expanded the services offered on MolMovDB. In particular, we further developed the Morph Server to produce improved interpolations between two submitted structures. We added support for multiple chains to the original adiabatic mapping interpolation, allowing the analysis of subunit motions. We also added the option of using FRODA interpolation, which allows for more complex pathways, potentially overcoming steric barriers. We added an interface to a hinge prediction service, which acts on single structures and predicts likely residue points for flexibility. We developed tools to relate such points of flexibility in a structure to particular key residue positions, i.e. active sites or highly conserved positions. Lastly, we began relating our motion classification scheme to function using descriptions from the Gene Ontology Consortium.
DNA microarrays are widely used in biological research; by analyzing differential hybridization on a single microarray slide, one can detect changes in mRNA expression levels, increases in DNA copy numbers and the location of transcription factor binding sites on a genomic scale. Having performed the experiments, the major challenge is to process large, noisy datasets in order to identify the specific array elements that are significantly differentially hybridized. This normally requires aggregating different, often incompatible programs into a multi-step pipeline. Here we present ExpressYourself, a fully integrated platform for processing microarray data. In completely automated fashion, it will correct the background array signal, normalize the Cy5 and Cy3 signals, score levels of differential hybridization, combine the results of replicate experiments, filter problematic regions of the array and assess the quality of individual and replicate experiments. ExpressYourself is designed with a highly modular architecture so various types of microarray analysis algorithms can readily be incorporated as they are developed; for example, the system currently implements several normalization methods, including those that simultaneously consider signal intensity and slide location. The processed data are presented using a web-based graphical interface to facilitate comparison with the original images of the array slides. In particular, Express Yourself is able to regenerate images of the original microarray after applying various steps of processing, which greatly facilities identification of position-specific artifacts. The program is freely available for use at http://bioinfo.mbb.yale.edu/expressyourself.