Observations of crystallization experiments are classified as specific outcomes and integrated through a phase diagram to visualize solubility and thereby direct subsequent experiments. Specific examples are taken from our high-throughput crystallization laboratory, which provided a broad scope of data from 20 million crystallization experiments on 12,500 different biological macromolecules. The methods and rationale are broadly applicable in any crystallization laboratory. Through a combination of incomplete factorial sampling of crystallization cocktails, standard outcome classifications, visualization of outcomes as they relate chemically, and application of a simple phase diagram approach, we demonstrate how to logically design subsequent crystallization experiments.
The heterotrimeric protein complex containing the integrin-linked kinase (ILK), parvin, and PINCH proteins, termed the IPP complex, is an essential component of focal adhesions, where it interacts with many proteins to mediate signaling from integrin adhesion receptors. Here we conduct a biochemical and structural analysis of the minimal IPP complex, comprising full-length human ILK, the LIM1 domain of PINCH1, and the CH2 domain of α-parvin. We provide a detailed purification protocol for IPP and show that the purified IPP complex is stable and monodisperse in solution. Using small-angle X-ray scattering (SAXS), we also conduct the first structural characterization of IPP, which reveals an elongated shape with dimensions 120×60×40 Å. Flexibility analysis using the ensemble optimization method (EOM) is consistent with an IPP complex structure with limited flexibility, raising the possibility that inter-domain interactions exist. However, our studies suggest that the inter-domain linker in ILK is accessible, and we detect no inter-domain contacts by gel filtration analysis. This study provides a structural foundation to understand the conformational restraints that govern the IPP complex.
Structural crystallography and Nuclear Magnetic Resonance (NMR) spectroscopy are the predominant techniques for understanding the biological world on a molecular level. Crystallography is constrained by the ability to form a crystal that diffracts well, and NMR is constrained to smaller proteins. Although both are powerful techniques, they leave many soluble, purified protein samples structurally uncharacterized. Small Angle X-ray Scattering (SAXS) is a solution technique that provides data on the size and multiple conformations of a sample, and can be used to reconstruct a low-resolution molecular envelope of a macromolecule. In this study, SAXS has been used in a high-throughput manner on a subset of 28 proteins where structural information is available from crystallographic and/or NMR techniques. These crystallographic and NMR structures were used to validate the accuracy of molecular envelopes reconstructed from SAXS data on a statistical level, to compare and highlight complementary structural information that SAXS provides, and to leverage biological information derived by crystallographers and spectroscopists from their structures. All of the ab initio molecular envelopes calculated from the SAXS data agree well with the available structural information. SAXS is a powerful, albeit low-resolution, technique that can provide additional structural information in a high-throughput and complementary manner to improve the functional interpretation of high-resolution structures.
Development of an ontology for the description of crystallization experiments and results is proposed.
When crystallization screening is conducted many outcomes are observed but typically the only trial recorded in the literature is the condition that yielded the crystal(s) used for subsequent diffraction studies. The initial hit that was optimized and the results of all the other trials are lost. These missing results contain information that would be useful for an improved general understanding of crystallization. This paper provides a report of a crystallization data exchange (XDX) workshop organized by several international large-scale crystallization screening laboratories to discuss how this information may be captured and utilized. A group that administers a significant fraction of the world’s crystallization screening results was convened, together with chemical and structural data informaticians and computational scientists who specialize in creating and analysing large disparate data sets. The development of a crystallization ontology for the crystallization community was proposed. This paper (by the attendees of the workshop) provides the thoughts and rationale leading to this conclusion. This is brought to the attention of the wider audience of crystallographers so that they are aware of these early efforts and can contribute to the process going forward.
crystallization screening data; crystallization ontology
In all organisms, aminoacyl tRNA synthetases covalently attach amino acids to their cognate tRNAs. Many eukaryotic tRNA synthetases have acquired appended domains, whose origin, structure and function are poorly understood. The N-terminal appended domain (NTD) of glutaminyl-tRNA synthetase (GlnRS) is intriguing since GlnRS is primarily a eukaryotic enzyme, whereas in other kingdoms Gln-tRNAGln is primarily synthesized by first forming Glu-tRNAGln, followed by conversion to Gln-tRNAGln by a tRNA-dependent amidotransferase. We report a functional and structural analysis of the NTD of Saccharomyces cerevisiae GlnRS, Gln4. Yeast mutants lacking the NTD exhibit growth defects, and Gln4 lacking the NTD has reduced complementarity for tRNAGln and glutamine. The 187-amino acid Gln4 NTD, crystallized and solved at 2.3 Å resolution, consists of two subdomains, each exhibiting an extraordinary structural resemblance to adjacent tRNA specificity-determining domains in the GatB subunit of the GatCAB amidotransferase, which forms Gln-tRNAGln. These subdomains are connected by an apparent hinge comprised of conserved residues. Mutation of these amino acids produces Gln4 variants with reduced affinity for tRNAGln, consistent with a hinge-closing mechanism proposed for GatB recognition of tRNA. Our results suggest a possible origin and function of the NTD that would link the phylogenetically diverse mechanisms of Gln-tRNAGln synthesis.
Using high-throughput crystallization screening technologies and data analysis, an educational program has been developed to teach the scientific method through crystallization and access to a grocery store, a post office and the internet.
Crystallography is a multidisciplinary field that links divergent areas of mathematics, science and engineering to provide knowledge of life on an atomic scale. Crystal growth, a key component of the field, is an ideal vehicle for education. Crystallization has been used with a ‘grocery store chemistry’ approach and linked to high-throughput remote-access screening technologies. This approach provides an educational opportunity that can effectively teach the scientific method, readily accommodate different levels of educational experience, and reach any student with access to a grocery store, a post office and the internet. This paper describes the formation of the program through the students who helped develop and prototype the procedures. A summary is presented of the analysis and preliminary results, and a description is given of how the program could be linked with other aspects of crystallography. This approach has the potential to bridge the gap between students in remote locations or with limited funding and access to scientific resources, providing these students with an international-level research experience.
crystallographic education; high throughput
Nucleotide biosynthesis pathways have been reported to be essential in some protozoan pathogens. Hence, we evaluated the essentiality of one enzyme in the pyrimidine biosynthetic pathway, dihydroorotate dehydrogenase (DHODH) from the eukaryotic parasite Trypanosoma brucei, through gene knockdown studies. RNAi knockdown of DHODH expression in bloodstream-form T. brucei did not inhibit growth in normal medium, but profoundly retarded growth in pyrimidine-depleted media or in the presence of the known pyrimidine uptake antagonist 5-fluorouracil (5-FU). These results have significant implications for the development of therapeutics to combat T. brucei infection. Specifically, a combination therapy including a T. brucei-specific DHODH inhibitor plus 5-FU may prove to be an effective therapeutic strategy. We also show that this trypanosomal enzyme is inhibited by known inhibitors of bacterial Class 1A DHODH, in distinction to the sensitivity of DHODH from human and other higher eukaryotes. This selectivity is supported by the crystal structure of the T. brucei enzyme, which is reported here at a resolution of 1.95 Å. Additional research, guided by the crystal structure described herein, is needed to identify potent inhibitors of T. brucei DHODH.
flavoprotein; pyrimidine biosynthesis; gene knockdown; kinetoplastid; RNAi
Crystallization has proven to be the most significant bottleneck to high-throughput protein structure determination using diffraction methods. We have used the large-scale, systematically generated experimental results of the Northeast Structural Genomics Consortium to characterize the biophysical properties that control protein crystallization. Datamining of crystallization results combined with explicit folding studies leads to the conclusion that crystallization propensity is controlled primarily by the prevalence of well-ordered surface epitopes capable of mediating interprotein interactions and is not strongly influenced by overall thermodynamic stability. These analyses identify specific sequence features correlating with crystallization propensity that can be used to estimate the crystallization probability of a given construct. Analyses of entire predicted proteomes demonstrate substantial differences in the bulk amino acid sequence properties of human versus eubacterial proteins that reflect likely differences in their biophysical properties, including crystallization propensity. Finally, our thermodynamic measurements enable critical evaluation of previous claims regarding correlations between protein stability and bulk sequence properties, which generally are not supported by our dataset.
protein crystallization; protein thermodynamics; crystallization mechanism; surface entropy; datamining; structural genomics
AutoSherlock is a program that visually represents results from the Hauptman–Woodward High-Throughput Crystallization Screening Service in chemical space. It thereby aids in the determination and further optimization of crystallization conditions.
A program, AutoSherlock, has been developed to present crystallization screening results in terms of chemical space. This facilitates identification of lead conditions, rational interpretation of results and directions for the optimization of crystallization conditions.
AutoSherlock; computer programs; crystallization; data analysis
Mapping crystallization results in chemical space helps to correlate seemingly distant relationships between crystallization conditions, points to possible optimization strategies and reveals promising unsampled areas of crystallization space.
Macromolecular crystallization screening is an empirical process. It often begins by setting up experiments with a number of chemically diverse cocktails designed to sample chemical space known to promote crystallization. Where a potential crystal is seen a refined screen is set up, optimizing around that condition. By using an incomplete factorial sampling of chemical space to formulate the cocktails and presenting the results graphically, it is possible to readily identify trends relevant to crystallization, coarsely sample the phase diagram and help guide the optimization process. In this paper, chemical space mapping is applied to both single macromolecules and to a diverse set of macromolecules in order to illustrate how visual information is more readily understood and assimilated than the same information presented textually.
chemical space mapping; crystallization screening
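The incomplete factorial sampling mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the screen-design procedure used by the authors: the factor names, their levels, the function `incomplete_factorial` and the choice of 12 cocktails are all hypothetical, and the sketch balances only how often each level of each factor is used (it does not balance pairwise combinations or reject duplicate cocktails).

```python
import random


def incomplete_factorial(factors, n_cocktails, seed=0):
    """Pick n_cocktails combinations so that each level of each factor
    is used as evenly as possible (first-order balance only)."""
    rng = random.Random(seed)
    columns = {}
    for name, levels in factors.items():
        # Repeat the levels enough times to fill the column, then shuffle
        # so that levels of different factors are paired at random.
        reps = -(-n_cocktails // len(levels))  # ceiling division
        column = (list(levels) * reps)[:n_cocktails]
        rng.shuffle(column)
        columns[name] = column
    return [
        {name: columns[name][i] for name in factors}
        for i in range(n_cocktails)
    ]


# Hypothetical factors and levels, chosen only for illustration.
factors = {
    "precipitant": ["PEG 3350", "PEG 8000", "ammonium sulfate", "MPD"],
    "pH": [5.0, 6.0, 7.0, 8.0],
    "salt": ["none", "NaCl", "MgCl2"],
}

# The full factorial would be 4 * 4 * 3 = 48 cocktails; sample 12 of them.
cocktails = incomplete_factorial(factors, n_cocktails=12)
for cocktail in cocktails:
    print(cocktail)
```

Because every level appears equally often, outcomes plotted against any single factor are not biased by uneven sampling, which is the property that makes graphical trend-spotting across chemical space meaningful.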
As part of a training set for automated image analysis, ∼150 000 images of crystallization experiments from 96 diverse macromolecules have been visually classified within seven categories. Outcomes and trends are analyzed.
Structural crystallography aims to provide a three-dimensional representation of macromolecules. Many parts of the multistep process to produce the three-dimensional structural model have been automated, especially through various structural genomics projects. A key step is the production of crystals for diffraction. The target macromolecule is combined with a large and chemically diverse set of cocktails with some leading ideally, but infrequently, to crystallization. A variety of outcomes will be observed during these screening experiments that typically require human interpretation for classification. Human interpretation is neither scalable nor objective, highlighting the need to develop an automatic computer-based image classification. As a first step towards automated image classification, 147 456 images representing crystallization experiments from 96 different macromolecular samples were manually classified. Each image was classified by three experts into seven predefined categories or their combinations. The resulting data where all three observers are in agreement provides one component of a truth set for the development and rigorous testing of automated image-classification systems and provides information about the chemical cocktails used for crystallization. In this paper, the details of this study are presented.
crystallization; image classification
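The truth-set construction described in the abstract above, keeping only images on which all three observers agree, can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code: the function name `unanimous_truth_set`, the image identifiers and the category labels are hypothetical, and each observer's score is modelled as a set so that combinations of categories compare equal regardless of order.

```python
def unanimous_truth_set(scores):
    """Return {image_id: labels} for images where every observer
    assigned exactly the same set of categories."""
    truth = {}
    for image_id, observer_labels in scores.items():
        first = frozenset(observer_labels[0])
        if all(frozenset(labels) == first for labels in observer_labels[1:]):
            truth[image_id] = first
    return truth


# Hypothetical scores: three observers per image, each giving a set of
# categories (a single category or a combination).
scores = {
    "img_0001": [{"clear"}, {"clear"}, {"clear"}],
    "img_0002": [{"precipitate"}, {"precipitate", "crystal"}, {"precipitate"}],
    "img_0003": [
        {"precipitate", "crystal"},
        {"crystal", "precipitate"},
        {"precipitate", "crystal"},
    ],
}

truth = unanimous_truth_set(scores)
print(sorted(truth))
```

Requiring exact agreement is the strictest possible rule; a majority vote would retain more images at the cost of a less certain label.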
As part of a training set for automated image analysis, crystallization screening experiments for 269 different macromolecules were visually analyzed and a set of crystal images extracted. Outcomes and trends are analyzed.
In the automated image analysis of crystallization experiments, representative examples of outcomes can be obtained rapidly. However, while the outcomes appear to be diverse, the number of crystalline outcomes can be small. To complement a training set from the visual observation of 147 456 crystallization outcomes, a set of crystal images was produced from 106 and 163 macromolecules under study for the North East Structural Genomics Consortium (NESG) and Structural Genomics of Pathogenic Protozoa (SGPP) groups, respectively. These crystal images have been combined with the initial training set. A description of the crystal-enriched data set and a preliminary analysis of outcomes from the data are presented.
crystallization; image analysis; crystal images