1.  Avoidable errors in deposited macromolecular structures: an impediment to efficient data mining 
IUCrJ  2014;1(Pt 3):179-193.
The dual role of the Protein Data Bank as a repository of all macromolecular structures and as the major source of structural metadata for further analysis is discussed, and suggestions are made on how to identify models that contain errors and could degrade the quality of meta-analyses.
Whereas the vast majority of the more than 85 000 crystal structures of macromolecules currently deposited in the Protein Data Bank are of high quality, some suffer from a variety of imperfections. Although this fact has been pointed out in the past, it still merits periodic re-examination so that the metadata obtained by global analysis of the available crystal structures, as well as the use of individual structures for tasks such as drug design, are based on only the most reliable data. Here, selected abnormal deposited structures have been analysed based on the Bayesian reasoning that the correctness of a model must be judged against both the primary evidence and prior knowledge. These structures, together with information gained from the corresponding publications (if available), highlight some of the most prevalent types of problems. The errors are often perfect illustrations of the nature of human cognition, which is frequently influenced by preconceptions that may lead to fanciful results in the absence of proper validation. Common errors can be traced to negligence and a lack of rigorous verification of the models against electron density, creation of non-parsimonious models, generation of improbable numbers, application of incorrect symmetry, illogical presentation of the results, or violation of the rules of chemistry and physics. Paying more attention to such problems, not only in the final validation stages but also during the structure-determination process, is necessary to maintain the highest possible quality of the structural repositories and databases and, most of all, to provide a solid basis for subsequent studies, including large-scale data-mining projects.
For many scientists PDB deposition is a rather infrequent event, so the need for proper training and supervision is emphasized, as well as the need for constant alertness of reason and critical judgment as absolutely necessary safeguarding measures against such problems. Ways of identifying more problematic structures are suggested so that their users may be properly alerted to their possible shortcomings.
PMCID: PMC4086436
macromolecular crystallography; model validation; Protein Data Bank
2.  Detection and analysis of unusual features in the structural model and structure-factor data of a birch pollen allergen 
The structure factors deposited with PDB entry 3k78 show properties inconsistent with experimentally observed diffraction data and undoubtedly represent calculated structure factors. The refinement of the 3k78 model against these structure factors leads to an isomorphous structure different from the deposited model with an implausibly small R value (0.019).
Physically improbable features in the model of the birch pollen structure Bet v 1d (PDB entry 3k78) are faithfully reproduced in electron density generated with the deposited structure factors, but these structure factors themselves exhibit properties that are characteristic of data calculated from a simple model and are inconsistent with the data and error model obtained through experimental measurements. The refinement of the 3k78 model against these structure factors leads to an isomorphous structure different from the deposited model with an implausibly small R value (0.019). The abnormal refinement is compared with normal refinement of an isomorphous variant structure of Bet v 1l (PDB entry 1fm4). A variety of analytical tools, including Diederichs plots, Rσ plots and bulk-solvent analysis, are discussed as promising aids in validation. The examination of the Bet v 1d structure also cautions against the practice of indicating poorly defined protein chain residues through zero occupancies. The recommendation to preserve diffraction images is amplified.
PMCID: PMC3325800  PMID: 22505400
protein structure; Bet v 1 birch pollen allergen; Diederichs plot; validation; bulk-solvent correction; refinement statistics; intensity statistics
3.  Model building, refinement and validation 
An introduction to the proceedings of the CCP4 Study Weekend held at the University of Warwick on 6–7 January 2011.
PMCID: PMC3322591  PMID: 22505252
CCP4 Study Weekend
4.  Structure of Rv1848 (UreA), the Mycobacterium tuberculosis urease γ subunit 
Crystal and solution structures of the Rv1848 protein and their implications for the biological assembly of Mtb urease are presented.
The crystal structure of the urease γ subunit (UreA) from Mycobacterium tuberculosis, Rv1848, has been determined at 1.8 Å resolution. The asymmetric unit contains three copies of Rv1848 arranged into a homotrimer that is similar to the UreA trimer in the structure of urease from Klebsiella aerogenes. Small-angle X-ray scattering experiments indicate that the Rv1848 protein also forms trimers in solution. The observed homotrimer and the organization of urease genes within the M. tuberculosis genome suggest that M. tuberculosis urease has the (αβγ)3 composition observed for other bacterial ureases. The γ subunit may be of primary importance for the formation of the urease quaternary structure.
PMCID: PMC2898460  PMID: 20606272
Mycobacterium tuberculosis; urease; structural genomics
5.  Operator-assisted harvesting of protein crystals using a universal micromanipulation robot 
Journal of Applied Crystallography  2007;40(Pt 3):539-545.
The prototype of a universal micromanipulation robot for crystal harvesting is presented, and a robotically harvested trypsin crystal yields a high-resolution structure demonstrating the feasibility of robotic protein crystal harvesting.
High-throughput crystallography has reached a level of automation where complete computer-assisted robotic crystallization pipelines are capable of cocktail preparation, crystallization plate setup, and inspection and interpretation of results. While mounting of crystal pins, data collection and structure solution are highly automated, crystal harvesting and cryocooling remain formidable challenges towards full automation. To address the final frontier in achieving fully automated high-throughput crystallography, the prototype of an anthropomorphic six-axis universal micromanipulation robot (UMR) has been designed and tested; this UMR is capable of operator-assisted harvesting and cryoquenching of protein crystals as small as 10 µm from a variety of 96-well plates. The UMR is equipped with a versatile tool exchanger providing full operational flexibility. Trypsin crystals harvested and cryoquenched using the UMR have yielded a 1.5 Å structure demonstrating the feasibility of robotic protein crystal harvesting.
PMCID: PMC2483483  PMID: 19461845
automated crystal harvesting; crystal mounting; cryoprotection; trypsin; protease; benzamidine complex; protamine; intermolecular contacts; crystallization additives

