A set of quantitative techniques is suggested for assessing SAXS data quality. These are applied in the form of a script, SAXStats, to a test set of 27 proteins, showing that these techniques are more sensitive than manual assessment of data quality.
Small-angle X-ray scattering (SAXS) has grown in popularity in recent years with the advent of bright synchrotron X-ray sources, powerful computational resources and algorithms that enable the calculation of increasingly complex models. However, the lack of standardized data-quality metrics makes it difficult for the growing user community to assess the quality of experimental SAXS data accurately. Here, we define a series of metrics that use statistical evaluations to describe SAXS data quantitatively and objectively. These metrics are applied to identify the effects of radiation damage, concentration dependence and interparticle interactions on SAXS data from a set of 27 previously described targets whose high-resolution structures have been determined by X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. The studies show that, on a small sample set, these metrics are sufficient to characterize SAXS data quality with statistical rigor and with sensitivity similar to or better than that of manual analysis. Data-quality analysis strategies such as these initial efforts are needed to enable accurate and unbiased assessment of SAXS data quality.
SAXS data quality; SAXStats
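The abstract does not spell out the SAXStats metrics. As one hedged illustration of a statistical data-quality check, the sketch below compares successive exposures of the same sample point by point: a reduced χ² near 1 means two curves agree within their errors, whereas a clearly larger value is consistent with a change such as radiation damage. The function names, the assumption that all curves share one q grid, the illustrative intensities and the threshold of 2.0 are assumptions for this example, not SAXStats parameters.

```python
def reduced_chi_square(i1, i2, sigma1, sigma2):
    """Reduced chi-square between two SAXS intensity curves on the same q grid.

    Each point contributes (I1 - I2)^2 / (sigma1^2 + sigma2^2); the sum is
    divided by the number of points, since no parameters are fitted.
    """
    n = len(i1)
    if n == 0 or not (n == len(i2) == len(sigma1) == len(sigma2)):
        raise ValueError("curves must be non-empty and share the same q points")
    total = 0.0
    for a, b, s1, s2 in zip(i1, i2, sigma1, sigma2):
        total += (a - b) ** 2 / (s1 ** 2 + s2 ** 2)
    return total / n


def flag_changed_frames(frames, sigmas, threshold=2.0):
    """Compare each later exposure with the first one and return the indices
    whose reduced chi-square exceeds the (hypothetical) threshold."""
    ref, ref_sigma = frames[0], sigmas[0]
    return [
        k
        for k in range(1, len(frames))
        if reduced_chi_square(ref, frames[k], ref_sigma, sigmas[k]) > threshold
    ]


# Illustrative numbers only: the third exposure drifts upward.
frames = [[100.0, 50.0, 20.0], [100.5, 49.8, 20.1], [110.0, 56.0, 24.0]]
sigmas = [[1.0, 1.0, 1.0]] * 3
print(flag_changed_frames(frames, sigmas))  # [2]
```

Comparing every frame with the first, rather than with its neighbour, makes slow cumulative drift easier to see; a real pipeline would choose its reference and threshold from the data.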
Observations of crystallization experiments are classified as specific outcomes and integrated through a phase diagram to visualize solubility and thereby direct subsequent experiments. Specific examples are taken from our high-throughput crystallization laboratory, which has provided a broad scope of data from 20 million crystallization experiments on 12,500 different biological macromolecules. The methods and rationale are broadly applicable in any crystallization laboratory. Through a combination of incomplete factorial sampling of crystallization cocktails, standard outcome classifications, visualization of outcomes according to their chemical relationships and application of a simple phase-diagram approach, we demonstrate how to design subsequent crystallization experiments logically.
As technology advances, the crystal volume that can be used to collect useful X-ray diffraction data decreases. The technologies available to detect and study growing crystals beyond the optical resolution limit and methods to successfully place the crystal into the X-ray beam are discussed.
Structural biology has contributed tremendous knowledge to the understanding of life on the molecular scale. The Protein Data Bank, a repository of this structural knowledge, currently contains over 100 000 protein structures, the majority stemming from X-ray crystallography. As the name might suggest, crystallography requires crystals. As detectors become more sensitive and X-ray sources more intense, the notion of a crystal is gradually changing from one large enough to embellish expensive jewellery to objects that have external dimensions of the order of the wavelength of visible light. Identifying these crystals is a prerequisite to their study. This paper discusses developments in identifying these crystals during crystallization screening and distinguishing them from other potential outcomes. The practical aspects of ensuring that, once identified, a crystal can be positioned in the X-ray beam for data collection are also addressed.
crystal detection; crystal growth; crystal positioning
Eukaryotic glutaminyl-tRNA synthetase (GlnRS) contains an appended N-terminal domain (NTD) whose precise function is unknown. Although GlnRS structures from two prokaryotic species are known, no eukaryotic GlnRS structure has been reported. Here we present the first crystal structure of yeast GlnRS, finding that the C-terminal domain is highly similar in structure to Escherichia coli GlnRS but that 214 residues, including the NTD, are crystallographically disordered. We present a model of the full-length enzyme in solution that combines the structures of the C-terminal domain and the isolated NTD with small-angle X-ray scattering data of the full-length molecule. We then model the enzyme bound to tRNA, using the bacterial crystal structures of GatCAB and of the GlnRS–tRNA complex. We contrast the tRNA-bound model with the tRNA-free solution state and perform molecular dynamics simulations on the full-length GlnRS–tRNA complex, which suggest that tRNA binding involves the motion of a conserved hinge in the NTD.
eukaryotic glutaminyl-tRNA synthetase; structure; C-terminal domain; small-angle X-ray scattering; molecular dynamics
X-ray crystallography is the predominant method for obtaining atomic-scale information about biological macromolecules. Despite the success of the technique, obtaining well-diffracting crystals remains a critical bottleneck in going from protein to structure. In practice, the crystallization process proceeds through knowledge-informed empiricism. Better physico-chemical understanding remains elusive because of the large number of variables involved, so little guidance is available to identify systematically the solution conditions that promote crystallization. To help determine relationships between macromolecular properties and their crystallization propensity, we trained statistical models on samples of 182 proteins supplied by the Northeast Structural Genomics Consortium. Gaussian processes, which capture trends beyond the reach of linear statistical models, distinguish between two main physico-chemical mechanisms driving crystallization. One is characterized by low levels of side-chain entropy and has been extensively reported in the literature. The other involves specific electrostatic interactions not previously described in the crystallization context. Because evidence for two distinct mechanisms can be gleaned both from crystal contacts and from the solution conditions that lead to successful crystallization, the model offers future avenues for optimizing crystallization screens based on partial structural information. The availability of crystallization data coupled with structural outcomes, analyzed through state-of-the-art statistical models, may thus guide macromolecular crystallization toward a more rational basis.
Many bioscience fields employ high-throughput methods to screen multiple biochemical conditions, and the analysis of the results becomes tedious without a degree of automation. Crystallization, a rate-limiting step in biological X-ray crystallography, is one of these fields. Screening many potential crystallization conditions (cocktails) is the most effective way to probe a protein's phase diagram and guide crystallization, but interpreting the results can be time-consuming. To aid this empirical approach, a cocktail distance coefficient was developed to compare macromolecule crystallization conditions and outcomes quantitatively. These coefficients were evaluated against an existing similarity metric developed for crystallization, the C6 metric, using both virtual crystallization screens and a comparison of two related 1,536-cocktail high-throughput crystallization screens. Hierarchical clustering was used to visualize one of these screens, and the crystallization results for an exopolyphosphatase-related protein from Bacteroides fragilis (BfR192) were overlaid on this clustering. This revealed a strong correlation between certain chemically related clusters and crystal lead conditions. Although this analysis was not used to guide the initial crystallization optimization, it led to the re-evaluation of unexplained peaks in the electron density map of the protein and to the insertion and correct placement of sodium, potassium and phosphate ions in the structure. With these in place, the putative active site of the resulting structure showed features consistent with those of other phosphatase active sites involved in binding the phosphoryl moieties of nucleotide triphosphates. The new distance coefficient, CDcoeff, appears to be robust in this application and, coupled with hierarchical clustering and the overlay of crystallization outcomes, reveals information of biological relevance.
Although it has been tested with only a single example, the approach appears promising for crystallography, and the distance coefficient, clustering and hierarchical visualization of results likely have applications in wider fields.
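The CDcoeff formula is not given in the abstract, so the following is only a sketch of the general workflow under stated assumptions: a hypothetical `toy_cocktail_distance` (not CDcoeff and not the C6 metric) compares cocktails component by component, and single-linkage agglomerative clustering then groups chemically related cocktails so that crystallization outcomes could be overlaid on the groups. The four cocktail compositions are invented examples.

```python
from itertools import combinations


def toy_cocktail_distance(a, b):
    """Hypothetical distance between two cocktails given as
    {component: concentration}. Each component in either cocktail
    contributes |ca - cb| / (ca + cb), with absent components treated as 0,
    so a component found in only one cocktail contributes 1. The result is
    the mean contribution, between 0 (identical) and 1 (nothing shared).
    """
    components = set(a) | set(b)
    if not components:
        return 0.0
    total = 0.0
    for c in components:
        ca, cb = a.get(c, 0.0), b.get(c, 0.0)
        if ca + cb > 0:
            total += abs(ca - cb) / (ca + cb)
    return total / len(components)


def single_linkage(cocktails, n_clusters):
    """Merge the two clusters whose closest members are nearest until
    n_clusters remain; returns lists of cocktail names."""
    clusters = [[name] for name in sorted(cocktails)]

    def linkage(c1, c2):
        return min(toy_cocktail_distance(cocktails[x], cocktails[y])
                   for x in c1 for y in c2)

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Invented four-cocktail screen; each component keeps its own units,
# which is safe here because components are only compared with themselves.
screen = {
    "A": {"PEG 3350": 20.0, "NaCl": 0.2},
    "B": {"PEG 3350": 25.0, "NaCl": 0.2},
    "C": {"ammonium sulfate": 2.0, "Tris pH 8.5": 0.1},
    "D": {"ammonium sulfate": 1.8, "Tris pH 8.5": 0.1},
}
print(single_linkage(screen, 2))  # [['A', 'B'], ['C', 'D']]
```

Single linkage is used only because it is short to write; average or complete linkage would be equally reasonable choices for grouping a real 1,536-cocktail screen.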
The rich history of crystallization and how that history influences current practices is described. The tremendous impact of crystallization screens on the field is discussed.
While crystallization historically predates crystallography, it is a critical step for the crystallographic process. The rich history of crystallization and how that history influences current practices is described. The tremendous impact of crystallization screens on the field is discussed.
The heterotrimeric protein complex containing the integrin linked kinase (ILK), parvin, and PINCH proteins, termed the IPP complex, is an essential component of focal adhesions, where it interacts with many proteins to mediate signaling from integrin adhesion receptors. Here we conduct a biochemical and structural analysis of the minimal IPP complex, comprising full-length human ILK, the LIM1 domain of PINCH1, and the CH2 domain of α-parvin. We provide a detailed purification protocol for IPP and show that the purified IPP complex is stable and monodisperse in solution. Using small-angle X-ray scattering (SAXS), we also conduct the first structural characterization of IPP, which reveals an elongated shape with dimensions 120×60×40 Å. Flexibility analysis using the ensemble optimization method (EOM) is consistent with an IPP complex structure with limited flexibility, raising the possibility that inter-domain interactions exist. However, our studies suggest that the inter-domain linker in ILK is accessible, and we detect no inter-domain contacts by gel filtration analysis. This study provides a structural foundation to understand the conformational restraints that govern the IPP complex.
Structural crystallography and nuclear magnetic resonance (NMR) spectroscopy are the predominant techniques for understanding the biological world at the molecular level. Crystallography is constrained by the need for a well-diffracting crystal, and NMR is constrained to smaller proteins. Although powerful, these techniques leave many soluble, purified protein samples structurally uncharacterized. Small-angle X-ray scattering (SAXS) is a solution technique that provides data on the size and multiple conformations of a sample and can be used to reconstruct a low-resolution molecular envelope of a macromolecule. In this study, SAXS was used in a high-throughput manner on a subset of 28 proteins for which structural information is available from crystallographic and/or NMR techniques. These crystallographic and NMR structures were used to validate, on a statistical level, the accuracy of the molecular envelopes reconstructed from SAXS data, to highlight the complementary structural information that SAXS provides, and to leverage the biological information derived by crystallographers and spectroscopists from their structures. All of the ab initio molecular envelopes calculated from the SAXS data agree well with the available structural information. SAXS is a powerful, albeit low-resolution, technique that can provide additional structural information in a high-throughput manner, complementing high-resolution structures and improving their functional interpretation.
Development of an ontology for the description of crystallization experiments and results is proposed.
When crystallization screening is conducted, many outcomes are observed, but typically the only trial recorded in the literature is the condition that yielded the crystal(s) used for subsequent diffraction studies. The initial hit that was optimized and the results of all the other trials are lost. These missing results contain information that would improve the general understanding of crystallization. This paper reports on a crystallization data exchange (XDX) workshop organized by several international large-scale crystallization screening laboratories to discuss how this information may be captured and used. A group that administers a significant fraction of the world’s crystallization screening results was convened, together with chemical and structural data informaticians and computational scientists who specialize in creating and analysing large, disparate data sets. The development of a crystallization ontology for the crystallization community was proposed. This paper, written by the workshop attendees, presents the thoughts and rationale leading to this conclusion. It is brought to the attention of the wider community of crystallographers so that they are aware of these early efforts and can contribute to the process going forward.
crystallization screening data; crystallization ontology
In all organisms, aminoacyl-tRNA synthetases covalently attach amino acids to their cognate tRNAs. Many eukaryotic tRNA synthetases have acquired appended domains whose origin, structure and function are poorly understood. The N-terminal appended domain (NTD) of glutaminyl-tRNA synthetase (GlnRS) is intriguing because GlnRS is primarily a eukaryotic enzyme, whereas in other kingdoms Gln-tRNA(Gln) is primarily synthesized by first forming Glu-tRNA(Gln), which is then converted to Gln-tRNA(Gln) by a tRNA-dependent amidotransferase. We report a functional and structural analysis of the NTD of Saccharomyces cerevisiae GlnRS, Gln4. Yeast mutants lacking the NTD exhibit growth defects, and Gln4 lacking the NTD has reduced complementarity for tRNA(Gln) and glutamine. The 187-amino-acid Gln4 NTD, whose crystal structure was solved at 2.3 Å resolution, consists of two subdomains, each bearing an extraordinary structural resemblance to adjacent tRNA specificity-determining domains in the GatB subunit of the GatCAB amidotransferase, which forms Gln-tRNA(Gln). These subdomains are connected by an apparent hinge composed of conserved residues. Mutation of these residues produces Gln4 variants with reduced affinity for tRNA(Gln), consistent with a hinge-closing mechanism proposed for GatB recognition of tRNA. Our results suggest a possible origin and function of the NTD that would link the phylogenetically diverse mechanisms of Gln-tRNA(Gln) synthesis.
Using high-throughput crystallization screening technologies and data analysis, an educational program has been developed that teaches the scientific method through crystallization and can reach any student with access to a grocery store, a post office and the internet.
Crystallography is a multidisciplinary field that links divergent areas of mathematics, science and engineering to provide knowledge of life on an atomic scale. Crystal growth, a key component of the field, is an ideal vehicle for education. Crystallization has been combined with a ‘grocery store chemistry’ approach and linked to high-throughput remote-access screening technologies. This approach provides an educational opportunity that can effectively teach the scientific method, readily accommodate different levels of educational experience, and reach any student with access to a grocery store, a post office and the internet. This paper describes how the program was formed through the students who helped develop and prototype the procedures. A summary of the analysis and preliminary results is presented, together with a description of how the program could be linked with other aspects of crystallography. This approach has the potential to connect students in remote locations or with limited funding to scientific resources, providing them with an international-level research experience.
crystallographic education; high throughput
Nucleotide biosynthesis pathways have been reported to be essential in some protozoan pathogens. We therefore evaluated, through gene knockdown studies, the essentiality of one enzyme in the pyrimidine biosynthetic pathway, dihydroorotate dehydrogenase (DHODH), in the eukaryotic parasite Trypanosoma brucei. RNAi knockdown of DHODH expression in bloodstream-form T. brucei did not inhibit growth in normal medium, but profoundly retarded growth in pyrimidine-depleted media or in the presence of the known pyrimidine uptake antagonist 5-fluorouracil (5-FU). These results have significant implications for the development of therapeutics to combat T. brucei infection. Specifically, a combination therapy including a T. brucei-specific DHODH inhibitor plus 5-FU may prove to be an effective therapeutic strategy. We also show that this trypanosomal enzyme is inhibited by known inhibitors of bacterial Class 1A DHODH, in contrast to DHODH from humans and other higher eukaryotes. This selectivity is supported by the crystal structure of the T. brucei enzyme, reported here at 1.95 Å resolution. Additional research, guided by this crystal structure, is needed to identify potent inhibitors of T. brucei DHODH.
flavoprotein; pyrimidine biosynthesis; gene knockdown; kinetoplastid; RNAi
Crystallization has proven to be the most significant bottleneck to high-throughput protein structure determination using diffraction methods. We have used the large-scale, systematically generated experimental results of the Northeast Structural Genomics Consortium to characterize the biophysical properties that control protein crystallization. Data mining of crystallization results combined with explicit folding studies leads to the conclusion that crystallization propensity is controlled primarily by the prevalence of well-ordered surface epitopes capable of mediating interprotein interactions and is not strongly influenced by overall thermodynamic stability. These analyses identify specific sequence features correlating with crystallization propensity that can be used to estimate the crystallization probability of a given construct. Analyses of entire predicted proteomes demonstrate substantial differences in the bulk amino acid sequence properties of human versus eubacterial proteins that likely reflect differences in their biophysical properties, including crystallization propensity. Finally, our thermodynamic measurements enable critical evaluation of previous claims of correlations between protein stability and bulk sequence properties, which are generally not supported by our dataset.
protein crystallization; protein thermodynamics; crystallization mechanism; surface entropy; datamining; structural genomics
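The abstract above does not list the sequence features that were found to correlate with crystallization propensity. Purely to illustrate what a "bulk amino acid sequence property" computed per protein and averaged over a proteome looks like, this sketch computes two generic descriptors: the GRAVY score (mean Kyte–Doolittle hydropathy) and the fraction of Lys, Glu and Gln, residues with high side-chain entropy. These descriptors and the helper names are assumptions for the example and are not claimed to be the features used in the study.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Lys, Glu and Gln: long, flexible side chains (high side-chain entropy).
HIGH_ENTROPY = frozenset("KEQ")


def bulk_properties(sequence):
    """Return GRAVY (mean hydropathy) and the fraction of K/E/Q residues."""
    seq = sequence.upper()
    if not seq or set(seq) - set(KYTE_DOOLITTLE):
        raise ValueError("expected a non-empty sequence of standard residues")
    n = len(seq)
    return {
        "gravy": sum(KYTE_DOOLITTLE[aa] for aa in seq) / n,
        "keq_fraction": sum(aa in HIGH_ENTROPY for aa in seq) / n,
    }


def proteome_means(sequences):
    """Average each bulk property over a collection of sequences."""
    rows = [bulk_properties(s) for s in sequences]
    return {key: sum(r[key] for r in rows) / len(rows) for key in rows[0]}


print(bulk_properties("MKEQLLA"))
```

Comparing `proteome_means` for two sets of predicted sequences is the simplest form of the human-versus-eubacterial comparison described above; any real analysis would need the study's own feature definitions.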
AutoSherlock is a program that visually represents results from the Hauptman–Woodward High-Throughput Crystallization Screening Service in chemical space. It thereby aids in the determination and further optimization of crystallization conditions.
A program, AutoSherlock, has been developed to present crystallization screening results in terms of chemical space. This facilitates identification of lead conditions, rational interpretation of results and directions for the optimization of crystallization conditions.
AutoSherlock; computer programs; crystallization; data analysis
Mapping crystallization results in chemical space helps uncover relationships between seemingly distant crystallization conditions, points to possible optimization strategies and reveals promising unsampled areas of crystallization space.
Macromolecular crystallization screening is an empirical process. It often begins by setting up experiments with a number of chemically diverse cocktails designed to sample chemical space known to promote crystallization. Where a potential crystal is seen, a refined screen is set up to optimize around that condition. By using an incomplete factorial sampling of chemical space to formulate the cocktails and presenting the results graphically, it is possible to identify trends relevant to crystallization readily, coarsely sample the phase diagram and help guide the optimization process. In this paper, chemical space mapping is applied both to single macromolecules and to a diverse set of macromolecules to illustrate how visual information is more readily understood and assimilated than the same information presented textually.
chemical space mapping; crystallization screening
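Incomplete factorial sampling is named above but not detailed. The following is a simplified sketch, not the laboratory's screen-design procedure: it draws a subset of a full factorial while keeping each factor's levels evenly represented, then tabulates the crystal hit rate for each level of one factor as a minimal text version of mapping outcomes onto chemical space. The factors, levels and helper names are hypothetical.

```python
import random
from collections import Counter


def incomplete_factorial(factors, n, seed=0):
    """Draw n cocktails from the full factorial of `factors` ({name: levels}).

    Simplified balancing: each factor's levels are repeated to length n and
    shuffled independently, so every level of every factor appears either
    floor(n/L) or ceil(n/L) times. Pairwise combinations are not balanced,
    and a combination can occasionally repeat.
    """
    rng = random.Random(seed)
    columns = {}
    for name, levels in factors.items():
        column = [levels[i % len(levels)] for i in range(n)]
        rng.shuffle(column)
        columns[name] = column
    return [{name: columns[name][i] for name in factors} for i in range(n)]


def hit_rate_by_level(cocktails, crystal_hits, factor):
    """Fraction of cocktails at each level of `factor` that gave a crystal."""
    totals, hits = Counter(), Counter()
    for cocktail, hit in zip(cocktails, crystal_hits):
        totals[cocktail[factor]] += 1
        hits[cocktail[factor]] += bool(hit)
    return {level: hits[level] / totals[level] for level in totals}


# Hypothetical factors: 3 x 3 x 3 = 27 combinations, of which 9 are sampled.
factors = {
    "pH": [4.5, 6.5, 8.5],
    "precipitant": ["PEG 3350", "ammonium sulfate", "MPD"],
    "salt": ["NaCl", "MgCl2", "none"],
}
design = incomplete_factorial(factors, n=9)
print(Counter(c["pH"] for c in design))  # each pH level appears 3 times
```

Balancing the marginal counts is what lets a per-level hit rate be compared fairly across levels; a full incomplete factorial design would also try to balance pairs of factors.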
As part of a training set for automated image analysis, ∼150 000 images of crystallization experiments from 96 diverse macromolecules have been visually classified within seven categories. Outcomes and trends are analyzed.
Structural crystallography aims to provide a three-dimensional representation of macromolecules. Many parts of the multistep process that produces the three-dimensional structural model have been automated, especially through various structural genomics projects. A key step is the production of crystals for diffraction. The target macromolecule is combined with a large and chemically diverse set of cocktails, some of which ideally, but infrequently, lead to crystallization. A variety of outcomes are observed during these screening experiments, and their classification typically requires human interpretation. Human interpretation is neither scalable nor objective, which highlights the need for automated, computer-based image classification. As a first step towards automated image classification, 147 456 images representing crystallization experiments from 96 different macromolecular samples were manually classified. Each image was classified by three experts into seven predefined categories or their combinations. The images on which all three observers agree provide one component of a truth set for the development and rigorous testing of automated image-classification systems, and also provide information about the chemical cocktails used for crystallization. The details of this study are presented here.
crystallization; image classification
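The truth-set rule described above (keep an image only when all three observers agree) can be sketched as follows. Because each observer may assign a category or a combination of categories, each classification is represented as a set, and an image is kept only when the three sets are identical. The category names and image identifiers are placeholders; the seven categories are not listed in this abstract.

```python
def consensus(labels_by_image):
    """Return {image_id: categories} for images on which all three
    observers assigned exactly the same set of categories."""
    truth = {}
    for image_id, observer_labels in labels_by_image.items():
        if len(observer_labels) != 3:
            raise ValueError(f"{image_id}: expected three classifications")
        first = frozenset(observer_labels[0])
        if all(frozenset(labels) == first for labels in observer_labels[1:]):
            truth[image_id] = first
    return truth


def agreement_rate(labels_by_image):
    """Fraction of images that enter the consensus (truth) set."""
    return len(consensus(labels_by_image)) / len(labels_by_image)


# Placeholder categories and image IDs.
observations = {
    "img1": [{"crystal"}, {"crystal"}, {"crystal"}],
    "img2": [{"precipitate"}, {"precipitate", "crystal"}, {"precipitate"}],
    "img3": [{"clear"}, {"clear"}, {"clear"}],
}
print(sorted(consensus(observations)))  # ['img1', 'img3']
```

Requiring exact set equality is the strictest reading of "all three observers are in agreement"; a looser rule (for example, majority vote per category) would yield a larger but noisier training set.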
As part of a training set for automated image analysis, crystallization screening experiments for 269 different macromolecules were visually analyzed and a set of crystal images extracted. Outcomes and trends are analyzed.
In the automated image analysis of crystallization experiments, representative examples of outcomes can be obtained rapidly. However, although the outcomes appear diverse, the number of crystalline outcomes can be small. To complement a training set from the visual observation of 147 456 crystallization outcomes, a set of crystal images was produced from 106 and 163 macromolecules under study by the North East Structural Genomics Consortium (NESG) and Structural Genomics of Pathogenic Protozoa (SGPP) groups, respectively. These crystal images have been combined with the initial training set. The crystal-enriched data set and a preliminary analysis of its outcomes are described.
crystallization; image analysis; crystal images