1.  Improved Crystallographic Structures using Extensive Combinatorial Refinement 
Structure (London, England : 1993)  2013;21(11):1923-1930.
Identifying errors and alternate conformers, and modeling multiple main-chain conformers in poorly ordered regions, are overarching problems in crystallographic structure determination that have limited automation efforts and structure quality. Here, we show that implementation of a full factorial designed set of standard refinement approaches, which we call ExCoR (Extensive Combinatorial Refinement), significantly improves structural models compared to the traditional linear tree approach, in which individual algorithms are tested linearly and only incorporated if the model improves. ExCoR markedly improved maps and models, and revealed building errors and alternate conformations that were masked by traditional refinement approaches. Surprisingly, an individual algorithm that renders a model worse in isolation could still be necessary to produce the best overall model, suggesting that model distortion allows escape from local minima of the optimization target function, here shown to be a hallmark limitation of the traditional approach. ExCoR thus provides a simple approach to improving structure determination.
PMCID: PMC4070946  PMID: 24076406
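The contrast between the linear tree approach and a full factorial design can be illustrated with a small sketch. This is not the ExCoR implementation: the option names and the `toy_score` function are hypothetical, and the scores are invented solely so that one option looks harmful alone yet is needed for the best combination.

```python
from itertools import product

# Hypothetical on/off refinement options; the names are illustrative only.
OPTIONS = ["option_a", "option_b", "option_c"]


def toy_score(enabled):
    """Toy stand-in for a refinement quality metric (lower is better).

    Assumption: invented numbers, built so that option_a worsens the
    score on its own but gives the best result together with option_b.
    """
    score = 0.30
    if "option_a" in enabled:
        score += 0.02
    if "option_b" in enabled:
        score -= 0.01
    if "option_a" in enabled and "option_b" in enabled:
        score -= 0.05
    if "option_c" in enabled:
        score += 0.01
    return round(score, 4)


def linear_search(options, score):
    """Test options one at a time; keep each only if the score improves."""
    kept = frozenset()
    best = score(kept)
    for opt in options:
        trial = kept | {opt}
        trial_score = score(trial)
        if trial_score < best:
            kept, best = trial, trial_score
    return kept, best


def full_factorial_search(options, score):
    """Score every on/off combination (2**n runs) and return the best."""
    best_set, best = None, float("inf")
    for flags in product([False, True], repeat=len(options)):
        enabled = frozenset(o for o, on in zip(options, flags) if on)
        s = score(enabled)
        if s < best:
            best_set, best = enabled, s
    return best_set, best
```

With these toy numbers, `linear_search(OPTIONS, toy_score)` rejects `option_a` at the first step and ends with only `option_b` (score 0.29), while `full_factorial_search(OPTIONS, toy_score)` finds the `option_a` plus `option_b` combination (score 0.26). The cost is 2**n scored runs instead of n.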
2.  PubChem promiscuity: a web resource for gathering compound promiscuity data from PubChem 
Bioinformatics  2011;28(1):140-141.
Summary: Promiscuity counts allow for a better understanding of a compound's assay activity profile and drug potential. Although PubChem contains a vast amount of compound and assay data, it currently does not have a convenient or efficient method to obtain in-depth promiscuity counts for compounds. PubChem promiscuity fills this gap. It is a Java servlet that uses NCBI Entrez (eUtils) web services to interact with PubChem and provide promiscuity counts in a variety of categories along with compound descriptors, including PAINS-based functional group detection.
PMCID: PMC3276228  PMID: 22084255
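As a rough illustration of what a promiscuity count summarizes, the sketch below tallies assay outcomes for a single compound. It is an assumption-laden example, not the servlet's logic: the function name, the outcome labels, and the input layout are hypothetical, and no PubChem or eUtils call is made.

```python
from collections import Counter


def promiscuity_counts(outcomes):
    """Summarize (assay_id, outcome) pairs for one compound.

    Assumption: each assay appears once and outcomes are plain strings
    such as "active", "inactive", or "inconclusive".
    """
    tally = Counter(outcome for _, outcome in outcomes)
    tested = sum(tally.values())
    active = tally.get("active", 0)
    return {
        "tested": tested,
        "active": active,
        # Fraction of tested assays in which the compound was active.
        "active_fraction": active / tested if tested else 0.0,
    }


# Hypothetical assay results for one compound (IDs are made up).
example = [(1, "active"), (2, "inactive"), (3, "active"), (4, "inconclusive")]
summary = promiscuity_counts(example)
```

A compound that is active in a large fraction of unrelated assays is a candidate for non-specific interference, which is why such counts are usually read alongside assay categories rather than as a single number.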
3.  BioAssay Ontology Annotations Facilitate Cross-Analysis of Diverse High-throughput Screening Data Sets 
Journal of biomolecular screening  2011;16(4):415-426.
High-throughput screening data repositories, such as PubChem, represent valuable resources for the development of small molecule chemical probes and can serve as entry points for drug discovery programs. While the loose data format offered by PubChem allows for great flexibility, important annotations, such as the assay format and technologies employed, are not explicitly indexed. We have previously developed a BioAssay Ontology (BAO) and curated over 350 assays with standardized BAO terms. Here we describe the use of BAO annotations to analyze a large set of assays that employ luciferase- and β-lactamase-based technologies. We identified promiscuous chemotypes pertaining to different sub-categories of assays and specific mechanisms by which these chemotypes interfere in reporter gene assays. Our results show that the data in PubChem can be used to identify promiscuous compounds that interfere non-specifically with particular technologies. Furthermore, we show that BAO is a valuable toolset for the identification of related assays and for the systematic generation of insights that are beyond the scope of individual assays or screening campaigns.
PMCID: PMC3167204  PMID: 21471461
compound promiscuity; assay ontology; reporter gene assays; high-throughput screening data analysis; cheminformatics
4.  A Java API for working with PubChem datasets 
Bioinformatics  2011;27(5):741-742.
Summary: PubChem is a public repository of chemical structures and associated biological activities. The PubChem BioAssay database contains assay descriptions, conditions and readouts, and biological screening results that have been submitted by the biomedical research community. The PubChem web site and Power User Gateway (PUG) web service allow users to interact with the data, and raw files are available via FTP.
These resources are helpful to many, but there can also be great benefit in using a software API to manipulate the data. Here, we describe a Java API with entity objects mapped to the PubChem Schema and with wrapper functions for calling the NCBI eUtilities and PubChem PUG web services. PubChem BioAssays and associated chemical compounds can then be queried and manipulated in a local relational database. Features include chemical structure searching and the generation and display of curve fits from stored dose–response experiments, something that is not yet available within PubChem itself. The aim is to provide researchers with a fast, consistent, queryable local resource from which to manipulate PubChem BioAssays in a database-agnostic manner. It is not intended as an end-user tool but as a platform for further automation and tools development.
PMCID: PMC3105478  PMID: 21216779
5.  A Two-Stage Differential Hydrogen Deuterium Exchange Method for the Rapid Characterization of Protein/Ligand Interactions 
The peroxisome proliferator-activated receptor is a member of the nuclear receptor superfamily of transcriptional regulators. Regulation of the nuclear receptors occurs through changes to the structure and dynamics of the ligand-binding domain. Therefore, the need has arisen for a rapid method capable of detecting changes in the dynamics of nuclear receptors following ligand binding. We recently described how solution-phase amide hydrogen/deuterium exchange (HDX) provides a biophysical technique for probing changes in protein dynamics induced by ligand interaction. Building from this platform, we have optimized the robustness of the differential HDX experiment by minimizing systematic errors, and have increased the efficiency of the chromatographic separation through the use of high-pressure liquid chromatography. Using knowledge gained previously from comprehensive HDX experiments of PPARγ, a modest-throughput method to probe changes in the dynamics of key regions of the receptor was developed. A collection of ten synthetic and endogenous PPARγ ligands was characterized with this new method, requiring approximately 24 h of analysis. This is a dramatic improvement over the 10 d of analysis that would have been required with our previous approach for comprehensive differential HDX analysis. In addition to demonstrating the utility of this approach, the study presented here is the first to measure changes to the dynamics of PPARγ upon the binding of putative endogenous ligands.
PMCID: PMC2062560  PMID: 17916792
PPAR; HDX; mass spectrometry; nuclear receptor
6.  Rapid Analysis of Protein Structure and Dynamics by Hydrogen/Deuterium Exchange Mass Spectrometry 
An automated approach for the rapid analysis of protein structure has been developed and used to study acid-induced conformational changes in human growth hormone. The labeling approach involves hydrogen/deuterium exchange (H/D-Ex) of protein backbone amide hydrogens with rapid and sensitive detection by mass spectrometry (MS). Briefly, the protein is incubated for defined intervals in a deuterated environment. After rapid quenching of the exchange reaction, the partially deuterated protein is enzymatically digested and the resulting peptide fragments are analyzed by liquid chromatography mass spectrometry (LC-MS). The deuterium buildup curve measured for each fragment yields an average amide exchange rate that reflects the environment of the peptide in the intact protein. Additional analyses allow mapping of the free energy of folding on localized segments along the protein sequence affording unique dynamic and structural information. While amide H/D-Ex coupled with MS is recognized as a powerful technique for studying protein structure and protein–ligand interactions, it has remained a labor-intensive task. The improvements in the amide H/D-Ex methodology described here include solid phase proteolysis, automated liquid handling and sample preparation, and integrated data reduction software that together improve sequence coverage and resolution, while achieving a sample throughput nearly 10-fold higher than the commonly used manual methods.
PMCID: PMC2279949  PMID: 13678147
hydrogen/deuterium exchange; mass spectrometry; protein structure; protein dynamics; laboratory automation; therapeutic proteins; human growth hormone
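The step from a deuterium buildup curve to an average amide exchange rate can be sketched as follows. This is a simplified illustration, not the data reduction software described in the abstract: it assumes a single-exponential model D(t) = N(1 - exp(-kt)) with a known number of exchangeable amides N, whereas real peptides contain amides that exchange at different rates. The function names are hypothetical.

```python
import math


def buildup(t, n_amides, k):
    """Deuterium uptake at time t under a single-exponential model."""
    return n_amides * (1.0 - math.exp(-k * t))


def fit_average_rate(times, uptake, n_amides):
    """Estimate one average rate constant k from a buildup curve.

    Linearizes the model as -ln(1 - D/N) = k * t and fits a line
    through the origin by least squares. Assumption: N is known.
    """
    num = den = 0.0
    for t, d in zip(times, uptake):
        frac_remaining = 1.0 - d / n_amides
        if frac_remaining <= 0.0:
            continue  # fully exchanged points cannot be linearized
        y = -math.log(frac_remaining)
        num += t * y
        den += t * t
    if den == 0.0:
        raise ValueError("no usable time points for the fit")
    return num / den


# Synthetic, noise-free data generated from the same model (not measured).
times = [10.0, 30.0, 100.0, 300.0]
uptake = [buildup(t, 8, 0.01) for t in times]
k_est = fit_average_rate(times, uptake, 8)
```

On noise-free synthetic data the fit recovers the rate used to generate it. With measured data, the log transform amplifies noise at late time points, so a nonlinear fit of the untransformed curve would usually be preferable.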

