1.  HTS navigator: freely accessible cheminformatics software for analyzing high-throughput screening data 
Bioinformatics  2013;30(4):588-589.
Summary: We report on the development of the high-throughput screening (HTS) Navigator software to analyze and visualize the results of HTS of chemical libraries. The HTS Navigator processes output files from different plate readers' formats, computes the overall HTS matrix, automatically detects hits and has different types of baseline navigation and correction features. The software incorporates advanced cheminformatics capabilities such as chemical structure storage and visualization, fast similarity search and chemical neighborhood analysis for retrieved hits. The software is freely available for academic laboratories.
Availability and implementation:
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3928525  PMID: 24376084
2.  Dataset Modelability by QSAR 
We introduce a simple MODelability Index (MODI) that estimates the feasibility of obtaining predictive QSAR models (Correct Classification Rate above 0.7) for a binary dataset of bioactive compounds. MODI is defined as an activity class-weighted ratio of the number of the nearest neighbor pairs of compounds with the same activity class versus the total number of pairs. The MODI values were calculated for more than 100 datasets and the threshold of 0.65 was found to separate non-modelable from the modelable datasets.
PMCID: PMC3984298  PMID: 24251851
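The MODI definition in the abstract above can be illustrated with a short sketch. It assumes Euclidean distance on numeric descriptors and an unweighted mean of the per-class ratios; the function name `modi` and the toy data are hypothetical and are not taken from the paper.

```python
from math import dist


def modi(descriptors, labels):
    """Sketch of a modelability index for a labeled dataset.

    For each activity class, compute the fraction of its compounds whose
    first nearest neighbor (Euclidean distance, an assumption) belongs to
    the same class, then average those fractions over the classes.
    """
    classes = sorted(set(labels))
    same = {c: 0 for c in classes}
    total = {c: 0 for c in classes}
    n = len(descriptors)
    for i in range(n):
        # Nearest neighbor of compound i, excluding the compound itself.
        j = min((k for k in range(n) if k != i),
                key=lambda k: dist(descriptors[i], descriptors[k]))
        total[labels[i]] += 1
        same[labels[i]] += labels[j] == labels[i]
    return sum(same[c] / total[c] for c in classes) / len(classes)


# Toy data: two well-separated clusters, one per activity class.
X = [(0, 0), (0, 1), (5, 5), (5, 6)]
y = [1, 1, 0, 0]
score = modi(X, y)
print(score, "modelable" if score > 0.65 else "non-modelable")
```

The 0.65 cutoff in the usage line is the threshold quoted in the abstract; everything else is illustrative.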
3.  Application of Quantitative Structure–Activity Relationship Models of 5-HT1A Receptor Binding to Virtual Screening Identifies Novel and Potent 5-HT1A Ligands 
The 5-hydroxytryptamine 1A (5-HT1A) serotonin receptor has been an attractive target for treating mood and anxiety disorders such as schizophrenia. We have developed binary classification quantitative structure–activity relationship (QSAR) models of 5-HT1A receptor binding activity using data retrieved from the PDSP Ki database. The prediction accuracy of these models was estimated by external 5-fold cross-validation as well as using an additional validation set comprising 66 structurally distinct compounds from the World of Molecular Bioactivity database. These validated models were then used to mine three major types of chemical screening libraries, i.e., drug-like libraries, GPCR targeted libraries, and diversity libraries, to identify novel computational hits. The five best hits from each class of libraries were chosen for further experimental testing in radioligand binding assays, and nine of the 15 hits were confirmed to be active experimentally with binding affinity better than 10 μM. The most active compound, Lysergol, from the diversity library showed very high binding affinity (Ki) of 2.3 nM against 5-HT1A receptor. The novel 5-HT1A actives identified with the QSAR-based virtual screening approach could be potentially developed as novel anxiolytics or potential antischizophrenic drugs.
PMCID: PMC3985444  PMID: 24410373
4.  Computer-aided design of liposomal drugs: in silico prediction and experimental validation of drug candidates for liposomal remote loading 
Previously we have developed and statistically validated Quantitative Structure Property Relationship (QSPR) models that correlate drugs’ structural, physical and chemical properties as well as experimental conditions with the relative efficiency of remote loading of drugs into liposomes (Cern et al, Journal of Controlled Release, 160(2012) 14–157). Herein, these models have been used to virtually screen a large drug database to identify novel candidate molecules for liposomal drug delivery. Computational hits were considered for experimental validation based on their predicted remote loading efficiency as well as additional considerations such as availability, recommended dose and relevance to the disease. Three compounds were selected for experimental testing which were confirmed to be correctly classified by our previously reported QSPR models developed with Iterative Stochastic Elimination (ISE) and k-nearest neighbors (kNN) approaches. In addition, 10 new molecules with known liposome remote loading efficiency that were not used in QSPR model development were identified in the published literature and employed as an additional model validation set. The external accuracy of the models was found to be as high as 82% or 92%, depending on the model. This study presents the first successful application of QSPR models for the computer-model-driven design of liposomal drugs.
PMCID: PMC4165646  PMID: 24184343
Liposomes; Remote loading; QSPR; Virtual screening; Iterative Stochastic Elimination, k-nearest neighbors
5.  Predicting Drug-induced Hepatotoxicity Using QSAR and Toxicogenomics Approaches 
Chemical research in toxicology  2011;24(8):1251-1262.
Quantitative Structure-Activity Relationship (QSAR) modeling and toxicogenomics are used independently as predictive tools in toxicology. In this study, we evaluated the power of several statistical models for predicting drug hepatotoxicity in rats using different descriptors of drug molecules, namely their chemical descriptors and toxicogenomic profiles. The records were taken from the Toxicogenomics Project rat liver microarray database containing information on 127 drugs. The model endpoint was hepatotoxicity in the rat following 28 days of exposure, established by liver histopathology and serum chemistry. First, we developed multiple conventional QSAR classification models using a comprehensive set of chemical descriptors and several classification methods (k nearest neighbor, support vector machines, random forests, and distance weighted discrimination). With chemical descriptors alone, external predictivity (Correct Classification Rate, CCR) from 5-fold external cross-validation was 61%. Next, the same classification methods were employed to build models using only toxicogenomic data (24h after a single exposure) treated as biological descriptors. The optimized models used only 85 selected toxicogenomic descriptors and had CCR as high as 76%. Finally, hybrid models combining both chemical descriptors and transcripts were developed; their CCRs were between 68 and 77%. Although the accuracy of hybrid models did not exceed that of the models based on toxicogenomic data alone, the use of both chemical and biological descriptors enriched the interpretation of the models. In addition to finding 85 transcripts that were predictive and highly relevant to the mechanisms of drug-induced liver injury, chemical structural alerts for hepatotoxicity were also identified.
These results suggest that concurrent exploration of the chemical features and acute treatment-induced changes in transcript levels will both enrich the mechanistic understanding of sub-chronic liver injury and afford models capable of accurate prediction of hepatotoxicity from chemical structure and short-term assay results.
PMCID: PMC4281093  PMID: 21699217
Quantitative Structure Activity Relationship (QSAR) modeling; toxicogenomics; biological descriptors; hepatotoxicity
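The Correct Classification Rate reported for this study can be illustrated with a small sketch. It assumes CCR is computed as the mean of sensitivity and specificity (the usual balanced-accuracy definition) and uses a simple interleaved 5-fold split; the helper names and toy labels are hypothetical.

```python
def correct_classification_rate(y_true, y_pred):
    """CCR as the mean of sensitivity and specificity (assumed definition)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists; each item lands in exactly one test fold."""
    for fold in range(k):
        test = list(range(fold, n, k))
        train = [i for i in range(n) if i % k != fold]
        yield train, test


# Toy labels: 1 = hepatotoxic, 0 = non-toxic (illustrative values only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(round(correct_classification_rate(y_true, y_pred), 3))
```

Unlike plain accuracy, this measure is not inflated when one class dominates the dataset, because each class contributes equally.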
6.  A Systems Chemical Biology Study of Malate Synthase and Isocitrate Lyase Inhibition in Mycobacterium tuberculosis During Active and NRP Growth 
The ability of Mycobacterium tuberculosis (Mtb) to survive in low oxygen environments enables the bacterium to persist in a latent state within host tissues. In vitro studies of Mtb growth have identified changes in isocitrate lyase (ICL) and malate synthase (MS) that enable bacterial persistence under low oxygen and other environmentally limiting conditions. Systems chemical biology (SCB) enables us to evaluate the effects of small molecule inhibitors not only on the reactions catalyzed by malate synthase and isocitrate lyase, but also on the complete tricarboxylic acid (TCA) cycle, by taking into account complex network relationships within that system.
To study the kinetic consequences of inhibition on persistent bacilli, we implement a systems-chemical biology (SCB) platform and perform a chemistry-centric analysis of key metabolic pathways believed to impact Mtb latency. We explore consequences of disrupting the function of malate synthase (MS) and isocitrate lyase (ICL) during aerobic and hypoxic non-replicating persistence (NRP) growth by using the SCB method to identify small molecules that inhibit the function of MS and ICL, and simulating the metabolic consequence of the disruption.
Results indicate variations in target and non-target reaction steps, clear differences in the normal and low oxygen models, as well as dosage dependent response. Simulation results from singular and combined enzyme inhibition strategies suggest ICL may be the more effective target for chemotherapeutic treatment against Mtb growing in a microenvironment where oxygen is slowly depleted, which may favor persistence.
PMCID: PMC4010430  PMID: 24121675
Biological networks; cheminformatics; biochemical network simulations; systems biology; chemical biology; Mycobacterium tuberculosis
Identification of Endocrine Disrupting Chemicals is one of the important goals of environmental chemical hazard screening. We report on the development of validated in silico predictors of chemicals likely to cause Estrogen Receptor (ER)-mediated endocrine disruption to facilitate their prioritization for future screening. A database of relative binding affinity of a large number of ERα and/or ERβ ligands was assembled (546 for ERα and 137 for ERβ). Both single-task learning (STL) and multi-task learning (MTL) continuous Quantitative Structure-Activity Relationships (QSAR) models were developed for predicting ligand binding affinity to ERα or ERβ. High predictive accuracy was achieved for ERα binding affinity (MTL R2=0.71, STL R2=0.73). For ERβ binding affinity, MTL models were significantly more predictive (R2=0.53, p<0.05) than STL models. In addition, docking studies were performed on a set of ER agonists/antagonists (67 agonists and 39 antagonists for ERα, 48 agonists and 32 antagonists for ERβ, supplemented by putative decoys/non-binders) using the following ER structures (in complexes with respective ligands) retrieved from the Protein Data Bank: ERα agonist (PDB ID: 1L2I), ERα antagonist (PDB ID: 3DT3), ERβ agonist (PDB ID: 2NV7), ERβ antagonist (PDB ID: 1L2J). We found that all four ER conformations discriminated their corresponding ligands from presumed non-binders. Finally, both QSAR models and ER structures were employed in parallel to virtually screen several large libraries of environmental chemicals to derive a ligand- and structure-based prioritized list of putative estrogenic compounds to be used for in vitro and in vivo experimental validation.
PMCID: PMC3775906  PMID: 23707773
Endocrine Disrupting Chemicals; Estrogen Receptor; Quantitative Structure-Activity Relationships modeling; Multi-Task Learning; Docking; Virtual Screening
8.  Predicting binding affinity of CSAR ligands using both structure-based and ligand-based approaches 
We report on the prediction accuracy of ligand-based (2D QSAR) and structure-based (MedusaDock) methods used both independently and in consensus for ranking the congeneric series of ligands binding to three protein targets (UK, ERK2, and CHK1) from the CSAR 2011 benchmark exercise. An ensemble of predictive QSAR models was developed using known binders of these three targets extracted from the publicly available ChEMBL database. Selected models were used to predict the binding affinity of CSAR compounds towards the corresponding targets and rank them accordingly; the overall ranking accuracy evaluated by Spearman correlation was as high as 0.78 for UK, 0.60 for ERK2, and 0.56 for CHK1, placing our predictions in the top 10% among all the participants. In parallel, MedusaDock, designed to predict reliable docking poses, was also used for ranking the CSAR ligands according to their docking scores; the resulting accuracies (Spearman correlation) for UK, ERK2, and CHK1 were 0.76, 0.31, and 0.26, respectively. In addition, the performance of several consensus approaches combining MedusaDock and QSAR predicted ranks was explored; the best approach yielded Spearman correlation coefficients for UK, ERK2, and CHK1 of 0.82, 0.50, and 0.45, respectively. This study shows that (i) externally validated 2D QSAR models were capable of ranking CSAR ligands at least as accurately as more computationally intensive structure-based approaches used both by us and by other groups and (ii) ligand-based QSAR models can complement structure-based approaches by boosting the prediction performances when used in consensus.
PMCID: PMC3779696  PMID: 23809015
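The ranking evaluation described in this abstract can be sketched as follows. The Spearman coefficient is computed as the Pearson correlation of average ranks; the consensus step, which simply averages the two methods' ranks, is an assumption, since the abstract does not say how the ranks were combined. All names and numbers are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))


def consensus_ranks(scores_a, scores_b):
    """Average the per-compound ranks from two methods (assumed scheme)."""
    return [(x + y) / 2 for x, y in
            zip(average_ranks(scores_a), average_ranks(scores_b))]


# Illustrative values only; both score lists are oriented so that a
# higher value means stronger predicted binding.
experimental = [6.1, 7.4, 5.2, 8.0]
qsar_pred = [6.0, 7.0, 5.5, 7.9]
docking_pred = [5.0, 6.5, 6.0, 7.0]
print(spearman(experimental, consensus_ranks(qsar_pred, docking_pred)))
```

Docking energies are usually "lower is better", so they would need a sign flip before being combined with predicted affinities in a scheme like this.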
9.  Integrative Chemical-Biological Read-Across Approach for Chemical Hazard Classification 
Chemical research in toxicology  2013;26(8):10.1021/tx400110f.
Traditional read-across approaches typically rely on the chemical similarity principle to predict chemical toxicity; however, the accuracy of such predictions is often inadequate due to the underlying complex mechanisms of toxicity. Here we report on the development of a hazard classification and visualization method that draws upon both chemical structural similarity and comparisons of biological responses to chemicals measured in multiple short-term assays ("biological" similarity). The Chemical-Biological Read-Across (CBRA) approach infers each compound's toxicity from those of both chemical and biological analogs whose similarities are determined by the Tanimoto coefficient. Classification accuracy of CBRA was compared to that of classical RA and other methods using chemical descriptors alone, or in combination with biological data. Different types of adverse effects (hepatotoxicity, hepatocarcinogenicity, mutagenicity, and acute lethality) were classified using several biological data types (gene expression profiling and cytotoxicity screening). CBRA-based hazard classification exhibited consistently high external classification accuracy and applicability to diverse chemicals. Transparency of the CBRA approach is aided by the use of radial plots that show the relative contribution of analogous chemical and biological neighbors. Identification of both chemical and biological features that give rise to the high accuracy of CBRA-based toxicity prediction facilitates mechanistic interpretation of the models.
PMCID: PMC3818153  PMID: 23848138
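The read-across idea in this abstract can be sketched in a few lines. The sketch uses the Tanimoto coefficient in its vector form, which reduces to the familiar intersection-over-union for binary fingerprints, and pools the k most similar chemical and k most similar biological neighbors into a similarity-weighted average. The pooling rule, the value of k, and all names and data are assumptions made for illustration, not the paper's exact procedure.

```python
def tanimoto(x, y):
    """Tanimoto coefficient for numeric vectors (binary or continuous)."""
    xy = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - xy
    return xy / denom if denom else 0.0


def read_across_score(query_chem, query_bio, training, k=2):
    """Similarity-weighted mean activity of chemical and biological neighbors.

    training: list of (chem_vector, bio_vector, activity) with activity 0 or 1.
    Returns a value in [0, 1]; above 0.5 suggests the toxic class.
    """
    chem_nb = sorted(((tanimoto(query_chem, c), act) for c, _, act in training),
                     reverse=True)[:k]
    bio_nb = sorted(((tanimoto(query_bio, b), act) for _, b, act in training),
                    reverse=True)[:k]
    pooled = chem_nb + bio_nb
    weight = sum(sim for sim, _ in pooled)
    if weight == 0:
        return 0.5  # no informative neighbors
    return sum(sim * act for sim, act in pooled) / weight


# Hypothetical training set: binary fingerprints and assay profiles.
training = [
    ((1, 1, 0, 0), (1.0, 0.0), 1),
    ((0, 0, 1, 1), (0.0, 1.0), 0),
]
print(read_across_score((1, 1, 0, 0), (1.0, 0.1), training, k=1))
```

Keeping the chemical and biological neighbor lists separate until the final pooling step makes it easy to report how much each kind of analog contributed to a prediction.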
10.  Do crystal structures obviate the need for theoretical models of GPCRs for structure based virtual screening 
Proteins  2012;80(6):1503-1521.
Recent highly expected structural characterizations of agonist-bound and antagonist-bound beta-2 adrenoreceptor (β2AR) by X-ray crystallography have been widely regarded as critical advances to enable more effective structure-based discovery of GPCRs ligands. It appears that this very important development may have undermined many previous efforts to develop 3D theoretical models of GPCRs. To address this question directly we have compared several historical β2AR models versus the inactive state and nanobody-stabilized active state of β2AR crystal structures in terms of their structural similarity and effectiveness of use in virtual screening for β2AR specific agonists and antagonists. Theoretical models, including both homology and de novo types, were collected from five different groups who have published extensively in the field of GPCRs modeling; all models were built before X-ray structures became available. In general, β2AR theoretical models differ significantly from the crystal structure in terms of TMH definition and the global packing. Nevertheless, surprisingly, several models afforded hit rates resulting from virtual screening of large chemical library enriched by known β2AR ligands that exceeded those using X-ray structures; the hit rates were particularly higher for agonists. Furthermore, the screening performance of models is associated with local structural quality such as the RMSDs for binding pocket residues and the ability to capture accurately most if not all critical protein/ligand interactions. These results suggest that carefully built models of GPCRs could capture critical chemical and structural features of the binding pocket and thus may be even more useful for practical structure-based drug discovery than X-ray structures.
PMCID: PMC4133977  PMID: 22275072
GPCRs modeling; crystallography; beta-2 adrenoreceptor; agonist-bound; antagonist-bound; molecular docking; enrichment factor
11.  PITPs as Targets for Selectively Interfering With Phosphoinositide Signaling in Cells 
Nature chemical biology  2013;10(1):76-84.
Sec14-like phosphatidylinositol transfer proteins (PITPs) integrate diverse territories of intracellular lipid metabolism with stimulated phosphatidylinositol-4-phosphate production, and are discriminating portals for interrogating phosphoinositide signaling. Yet, neither Sec14-like PITPs, nor PITPs in general, have been exploited as targets for chemical inhibition for such purposes. Herein, we validate the first small molecule inhibitors (SMIs) of the yeast PITP Sec14. These SMIs are nitrophenyl(4-(2-methoxyphenyl)piperazin-1-yl)methanones (NPPMs), and are effective inhibitors in vitro and in vivo. We further establish Sec14 is the sole essential NPPM target in yeast, that NPPMs exhibit exquisite targeting specificities for Sec14 (relative to related Sec14-like PITPs), propose a mechanism for how NPPMs exert their inhibitory effects, and demonstrate NPPMs exhibit exquisite pathway selectivity in inhibiting phosphoinositide signaling in cells. These data deliver proof-of-concept that PITP-directed SMIs offer new and generally applicable avenues for intervening with phosphoinositide signaling pathways with selectivities superior to those afforded by contemporary lipid kinase-directed strategies.
PMCID: PMC4059020  PMID: 24292071
12.  Human intestinal transporter database: QSAR modeling and virtual profiling of drug uptake, efflux and interactions 
Pharmaceutical research  2012;30(4):996-1007.
Membrane transporters mediate many biological effects of chemicals and play a major role in pharmacokinetics and drug resistance. The selection of viable drug candidates among biologically active compounds requires the assessment of their transporter interaction profiles.
Using public sources, we have assembled and curated the largest, to our knowledge, human intestinal transporter database (>5,000 interaction entries for >3,700 molecules). This data was used to develop thoroughly validated classification Quantitative Structure-Activity Relationship (QSAR) models of transport and/or inhibition of several major transporters including MDR1, BCRP, MRP1-4, PEPT1, ASBT, OATP2B1, OCT1, and MCT1.
Results & Conclusions
QSAR models have been developed with advanced machine learning techniques such as Support Vector Machines, Random Forest, and k Nearest Neighbors using Dragon and MOE chemical descriptors. These models afforded high external prediction accuracies of 71–100% estimated by 5-fold external validation, and showed hit retrieval rates with up to 20-fold enrichment in the virtual screening of DrugBank compounds. The compendium of predictive QSAR models developed in this study can be used for virtual profiling of drug candidates and/or environmental agents with the optimal transporter profiles.
PMCID: PMC3596480  PMID: 23269503
membrane transport proteins; ADMET; drug transport; permeability; efflux
17.  The Discovery of Novel Antimalarial Compounds Enabled by QSAR-based Virtual Screening 
Quantitative structure–activity relationship (QSAR) models have been developed for a dataset of 3133 compounds defined as either active or inactive against P. falciparum. Since the dataset was strongly biased towards inactive compounds, different sampling approaches were employed to balance the ratio of actives vs. inactives, and models were rigorously validated using both internal and external validation approaches. The balanced accuracy for assessing the antimalarial activities of 70 external compounds was between 87% and 100% depending on the approach used to balance the dataset. Virtual screening of the ChemBridge database using QSAR models identified 176 putative antimalarial compounds that were submitted for experimental validation, along with 42 putative inactives as negative controls. Twenty five (14.2%) computational hits were found to have antimalarial activities with minimal cytotoxicity to mammalian cells, while all 42 putative inactives were confirmed experimentally. Structural inspection of confirmed active hits revealed novel chemical scaffolds, which could be employed as starting points to discover novel antimalarial agents.
PMCID: PMC3644566  PMID: 23252936
Antimalarial activity; quantitative structure–activity relationships; virtual screening; experimental confirmation
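One way to balance a dataset dominated by inactives is random under-sampling of the majority class, sketched below. The abstract mentions "different sampling approaches" without naming them, so this particular scheme is an assumption; the function name, seed, and toy labels are hypothetical. The last line reproduces the experimental hit rate quoted in the abstract.

```python
import random


def undersample_indices(labels, seed=0):
    """Return sorted indices of a balanced subset.

    Keeps every compound of the smaller class and an equally sized random
    sample of the larger class (random under-sampling, an assumed scheme).
    """
    rng = random.Random(seed)
    actives = [i for i, y in enumerate(labels) if y == 1]
    inactives = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = sorted((actives, inactives), key=len)
    kept = rng.sample(majority, len(minority))
    return sorted(minority + kept)


# Toy labels: 3 actives among 13 compounds.
labels = [1] * 3 + [0] * 10
subset = undersample_indices(labels)
print(len(subset), sum(labels[i] for i in subset))

# Hit rate from the abstract: 25 confirmed actives out of 176 tested hits.
print(round(100 * 25 / 176, 1))
```

Repeating the sampling with different seeds and averaging the resulting models is a common way to avoid discarding most of the majority class.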
18.  Community-wide assessment of protein-interface modeling suggests improvements to design methodology 
Fleishman, Sarel J | Whitehead, Timothy A | Strauch, Eva-Maria | Corn, Jacob E | Qin, Sanbo | Zhou, Huan-Xiang | Mitchell, Julie C. | Demerdash, Omar N.A | Takeda-Shitaka, Mayuko | Terashi, Genki | Moal, Iain H. | Li, Xiaofan | Bates, Paul A. | Zacharias, Martin | Park, Hahnbeom | Ko, Jun-su | Lee, Hasup | Seok, Chaok | Bourquard, Thomas | Bernauer, Julie | Poupon, Anne | Azé, Jérôme | Soner, Seren | Ovali, Şefik Kerem | Ozbek, Pemra | Ben Tal, Nir | Haliloglu, Türkan | Hwang, Howook | Vreven, Thom | Pierce, Brian G. | Weng, Zhiping | Pérez-Cano, Laura | Pons, Carles | Fernández-Recio, Juan | Jiang, Fan | Yang, Feng | Gong, Xinqi | Cao, Libin | Xu, Xianjin | Liu, Bin | Wang, Panwen | Li, Chunhua | Wang, Cunxin | Robert, Charles H. | Guharoy, Mainak | Liu, Shiyong | Huang, Yangyu | Li, Lin | Guo, Dachuan | Chen, Ying | Xiao, Yi | London, Nir | Itzhaki, Zohar | Schueler-Furman, Ora | Inbar, Yuval | Patapov, Vladimir | Cohen, Mati | Schreiber, Gideon | Tsuchiya, Yuko | Kanamori, Eiji | Standley, Daron M. | Nakamura, Haruki | Kinoshita, Kengo | Driggers, Camden M. | Hall, Robert G. | Morgan, Jessica L. | Hsu, Victor L. | Zhan, Jian | Yang, Yuedong | Zhou, Yaoqi | Kastritis, Panagiotis L. | Bonvin, Alexandre M.J.J. | Zhang, Weiyi | Camacho, Carlos J. | Kilambi, Krishna P. | Sircar, Aroop | Gray, Jeffrey J. | Ohue, Masahito | Uchikoga, Nobuyuki | Matsuzaki, Yuri | Ishida, Takashi | Akiyama, Yutaka | Khashan, Raed | Bush, Stephen | Fouches, Denis | Tropsha, Alexander | Esquivel-Rodríguez, Juan | Kihara, Daisuke | Stranges, P Benjamin | Jacak, Ron | Kuhlman, Brian | Huang, Sheng-You | Zou, Xiaoqin | Wodak, Shoshana J | Janin, Joel | Baker, David
Journal of molecular biology  2011;414(2):10.1016/j.jmb.2011.09.031.
The CAPRI and CASP prediction experiments have demonstrated the power of community wide tests of methodology in assessing the current state of the art and spurring progress in the very challenging areas of protein docking and structure prediction. We sought to bring the power of community wide experiments to bear on a very challenging protein design problem that provides a complementary but equally fundamental test of current understanding of protein-binding thermodynamics. We have generated a number of designed protein-protein interfaces with very favorable computed binding energies but which do not appear to be formed in experiments, suggesting there may be important physical chemistry missing in the energy calculations. 28 research groups took up the challenge of determining what is missing: we provided structures of 87 designed complexes and 120 naturally occurring complexes and asked participants to identify energetic contributions and/or structural features that distinguish between the two sets. The community found that electrostatics and solvation terms partially distinguish the designs from the natural complexes, largely due to the non-polar character of the designed interactions. Beyond this polarity difference, the community found that the designed binding surfaces were on average structurally less embedded in the designed monomers, suggesting that backbone conformational rigidity at the designed surface is important for realization of the designed function. These results can be used to improve computational design strategies, but there is still much to be learned; for example, one designed complex, which does form in experiments, was classified by all metrics as a non-binder.
PMCID: PMC3839241  PMID: 22001016
19.  Scoring Protein Interaction Decoys using Exposed Residues (SPIDER): A Novel Multi-Body Interaction Scoring Function based on Frequent Geometric Patterns of Interfacial Residues 
Proteins  2012;80(9):2207-2217.
Accurate prediction of the structure of protein-protein complexes in computational docking experiments remains a formidable challenge. It has been recognized that identifying native or native-like poses among multiple decoys is the major bottleneck of the current scoring functions used in docking. We have developed a novel multi-body pose-scoring function that has no theoretical limit on the number of residues contributing to the individual interaction terms. We use a coarse-grain representation of a protein-protein complex where each residue is represented by its side chain centroid. We apply a computational geometry approach called Almost-Delaunay tessellation that transforms protein-protein complexes into a residue contact network, or an un-directional graph where vertex-residues are nodes connected by edges. This treatment forms a family of interfacial graphs representing a dataset of protein-protein complexes. We then employ frequent subgraph mining approach to identify common interfacial residue patterns that appear in at least a subset of native protein-protein interfaces. The geometrical parameters and frequency of occurrence of each “native” pattern in the training set are used to develop the new SPIDER scoring function. SPIDER was validated using standard “ZDOCK” benchmark dataset that was not used in the development of SPIDER. We demonstrate that SPIDER scoring function ranks native and native-like poses above geometrical decoys and that it exceeds in performance a popular ZRANK scoring function. SPIDER was ranked among the top scoring functions in a recent round of CAPRI (Critical Assessment of PRedicted Interactions) blind test of protein–protein docking methods.
PMCID: PMC3409293  PMID: 22581643
Bioinformatics; Amino acids; Centroids; Statistical potential; Delaunay tessellation; Subgraph mining; Motifs; Coarse-grained; ZDOCK; CAPRI
20.  A Chemocentric Informatics Approach to Drug Discovery: Identification and Experimental Validation of Selective Estrogen Receptor Modulators as ligands of 5-Hydroxytryptamine-6 Receptors and as Potential Cognition Enhancers 
Journal of Medicinal Chemistry  2012;55(12):5704-5719.
We have devised a chemocentric informatics methodology for drug discovery integrating independent approaches to mining biomolecular databases. As a proof of concept, we have searched for novel putative cognition enhancers. First, we generated Quantitative Structure-Activity Relationship (QSAR) models of compounds binding to 5-hydroxytryptamine-6 receptor (5-HT6R), a known target for cognition enhancers, and employed these models for virtual screening to identify putative 5-HT6R actives. Second, we queried chemogenomics data from the Connectivity Map with the gene expression profile signatures of Alzheimer’s disease patients to identify compounds putatively linked to the disease. Thirteen common hits were tested in 5-HT6R radioligand binding assays and ten were confirmed as actives. Four of them were known selective estrogen receptor modulators that were never reported as 5-HT6R ligands. Furthermore, nine of the confirmed actives were reported elsewhere to have memory-enhancing effects. The approaches discussed herein can be used broadly to identify novel drug-target-disease associations.
PMCID: PMC3401608  PMID: 22537153
21.  Predictive Modeling of Chemical Hazard by Integrating Numerical Descriptors of Chemical Structures and Short-term Toxicity Assay Data 
Toxicological Sciences  2012;127(1):1-9.
Quantitative structure-activity relationship (QSAR) models are widely used for in silico prediction of in vivo toxicity of drug candidates or environmental chemicals, adding value to candidate selection in drug development or in a search for less hazardous and more sustainable alternatives for chemicals in commerce. The development of traditional QSAR models is enabled by numerical descriptors representing the inherent chemical properties that can be easily defined for any number of molecules; however, traditional QSAR models often have limited predictive power due to the lack of data and complexity of in vivo endpoints. Although it has been indeed difficult to obtain experimentally derived toxicity data on a large number of chemicals in the past, the results of quantitative in vitro screening of thousands of environmental chemicals in hundreds of experimental systems are now available and continue to accumulate. In addition, publicly accessible toxicogenomics data collected on hundreds of chemicals provide another dimension of molecular information that is potentially useful for predictive toxicity modeling. These new characteristics of molecular bioactivity arising from short-term biological assays, i.e., in vitro screening and/or in vivo toxicogenomics data can now be exploited in combination with chemical structural information to generate hybrid QSAR–like quantitative models to predict human toxicity and carcinogenicity. Using several case studies, we illustrate the benefits of a hybrid modeling approach, namely improvements in the accuracy of models, enhanced interpretation of the most predictive features, and expanded applicability domain for wider chemical space coverage.
PMCID: PMC3327873  PMID: 22387746
QSAR; toxicity screening; hybrid modeling
22.  Quantitative High-Throughput Screening for Chemical Toxicity in a Population-Based In Vitro Model 
Toxicological Sciences  2012;126(2):578-588.
A shift in toxicity testing from in vivo to in vitro may efficiently prioritize compounds, reveal new mechanisms, and enable predictive modeling. Quantitative high-throughput screening (qHTS) is a major source of data for computational toxicology, and our goal in this study was to aid in the development of predictive in vitro models of chemical-induced toxicity, anchored on interindividual genetic variability. Eighty-one human lymphoblast cell lines from 27 Centre d’Etude du Polymorphisme Humain trios were exposed to 240 chemical substances (12 concentrations, 0.26nM–46.0μM) and evaluated for cytotoxicity and apoptosis. qHTS screening in the genetically defined population produced robust and reproducible results, which allowed for cross-compound, cross-assay, and cross-individual comparisons. Some compounds were cytotoxic to all cell types at similar concentrations, whereas others exhibited interindividual differences in cytotoxicity. Specifically, the qHTS in a population-based human in vitro model system has several unique aspects that are of utility for toxicity testing, chemical prioritization, and high-throughput risk assessment. First, standardized and high-quality concentration-response profiling, with reproducibility confirmed by comparison with previous experiments, enables prioritization of chemicals for variability in interindividual range in cytotoxicity. Second, genome-wide association analysis of cytotoxicity phenotypes allows exploration of the potential genetic determinants of interindividual variability in toxicity. Furthermore, highly significant associations identified through the analysis of population-level correlations between basal gene expression variability and chemical-induced toxicity suggest plausible mode of action hypotheses for follow-up analyses. 
We conclude that, as the improved resolution of genetic profiling can now be matched with high-quality in vitro screening data, the evaluation of toxicity pathways and of the effects of genetic diversity is now feasible through the use of human lymphoblast cell lines.
PMCID: PMC3307611  PMID: 22268004
chemical cytotoxicity; apoptosis; HapMap; lymphoblasts; qHTS
23.  Cheminformatics Meets Molecular Mechanics: A Combined Application of Knowledge-based Pose Scoring and Physical Force Field-based Hit Scoring Functions Improves the Accuracy of Structure-Based Virtual Screening 
Poor performance of scoring functions is a well-known bottleneck in structure-based virtual screening (VS), most frequently manifested in the scoring functions’ inability to discriminate between true ligands and known non-binders (therefore designated as binding decoys). This deficiency leads to a large number of false positive hits resulting from virtual screening. We have hypothesized that filtering out or penalizing docking poses recognized as non-native (i.e., pose decoys) should improve the performance of virtual screening in terms of improved identification of true binders. Using several concepts from the field of cheminformatics, we have developed a novel approach to identifying pose decoys from an ensemble of poses generated by computational docking procedures. We demonstrate that the use of a target-specific pose (-scoring) filter in combination with a physical force field-based scoring function (MedusaScore) leads to significant improvement of hit rates in virtual screening studies for 12 of the 13 benchmark sets from the clustered version of the Database of Useful Decoys (DUD). This new hybrid scoring function outperforms several conventional structure-based scoring functions, including XSCORE::HMSCORE, ChemScore, PLP, and Chemgauss3, in six out of 13 data sets at the early stage of VS (up to 1% decoys of the screening database). We compare our hybrid method with several novel VS methods that were recently reported to perform well on the same DUD data sets. We find that the ligands retrieved using our method are chemically more diverse than those retrieved by two ligand-based methods (FieldScreen and FLAP::LBX). We also compare our method with FLAP::RBLB, a high-performance VS method that also utilizes both the receptor and the cognate ligand structures. Interestingly, we find that the top ligands retrieved using our method are highly complementary to those retrieved using FLAP::RBLB, hinting at effective directions for best VS applications.
We suggest that this integrative virtual screening approach combining cheminformatics and molecular mechanics methodologies may be applied to a broad variety of protein targets to improve the outcome of structure-based drug discovery studies.
PMCID: PMC3264743  PMID: 22017385
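The filter-then-score idea in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not describe the implementation, so the function names (`compound_score`, `early_hit_rate`), the pose representation, the optional penalty, and the simple top-fraction hit rate (rather than the decoy-based early-stage measure the abstract mentions) are all hypothetical.

```python
def compound_score(poses, penalty=None):
    """Best (lowest) energy among a compound's docking poses.

    poses: list of (energy, flagged_as_pose_decoy) tuples; lower energy
    is assumed to be better. Flagged poses are dropped when penalty is
    None, otherwise their energy is worsened by `penalty`.
    """
    kept = []
    for energy, flagged in poses:
        if flagged:
            if penalty is None:
                continue  # filter the pose out entirely
            energy += penalty  # or keep it, but penalized
        kept.append(energy)
    # A compound with no surviving pose gets the worst possible score.
    return min(kept) if kept else float("inf")


def early_hit_rate(scores, is_active, fraction=0.01):
    """Fraction of known actives among the top-ranked compounds.

    scores: dict compound -> score (lower is better)
    is_active: dict compound -> bool
    """
    ranked = sorted(scores, key=scores.get)
    n_top = max(1, int(len(ranked) * fraction))
    top = ranked[:n_top]
    return sum(is_active[c] for c in top) / n_top
```

With filtering, a compound whose only low-energy pose is flagged as a pose decoy drops down the ranking, which is the mechanism the abstract credits for reducing false positives.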
24.  Computational Systems Chemical Biology 
There is a critical need for improving the level of chemistry awareness in systems biology. The data and information related to modulation of genes and proteins by small molecules continue to accumulate at the same time as simulation tools in systems biology and whole body physiologically-based pharmacokinetics (PBPK) continue to evolve. We called this emerging area, at the interface between chemical biology and systems biology, systems chemical biology (SCB) (Oprea et al., 2007).
The overarching goal of computational SCB is to develop tools for integrated chemical-biological data acquisition, filtering and processing, by taking into account relevant information related to interactions between proteins and small molecules, possible metabolic transformations of small molecules, as well as associated information related to genes, networks, small molecules and, where applicable, mutants and variants of those proteins. There is yet an unmet need to develop an integrated in silico pharmacology / systems biology continuum that embeds drug-target-clinical outcome (DTCO) triplets, a capability that is vital to the future of chemical biology, pharmacology and systems biology. Through the development of the SCB approach, scientists will be able to start addressing, in an integrated simulation environment, questions that make the best use of our ever-growing chemical and biological data repositories at the system-wide level. This chapter reviews some of the major research concepts and describes key components that constitute the emerging area of computational systems chemical biology.
PMCID: PMC3547368  PMID: 20838980
Physiologically-based pharmacokinetics (PBPK); biological networks; cheminformatics; QSAR modeling; biochemical network simulations; systems biology
25.  Combined application of cheminformatics- and physical force field-based scoring functions improves binding affinity prediction for CSAR datasets 
The curated CSAR-NRC benchmark sets provide a valuable opportunity for testing and comparing the performance of both existing and novel scoring functions. We apply two different scoring functions, both independently and in combination, to predict the binding affinity of ligands in the CSAR-NRC datasets. The first, reported here for the first time, employs multiple chemical-geometrical descriptors of the protein-ligand interface to develop Quantitative Structure – Binding Affinity Relationships (QSBAR) models; these models are then used to predict the binding affinity of ligands in the external dataset. The second is a physical force field-based scoring function, MedusaScore. We show that both individual scoring functions achieve statistically significant prediction accuracies, with a squared correlation coefficient (R²) between actual and predicted binding affinity of 0.44/0.53 (Set1/Set2) with QSBAR models and 0.34/0.47 (Set1/Set2) with MedusaScore. Importantly, we find that combining QSBAR models and MedusaScore into a consensus scoring function affords higher prediction accuracy than either of the contributing methods, achieving R² of 0.45/0.58 (Set1/Set2). Furthermore, we identify several chemical features and non-covalent interactions that may be responsible for the inaccurate prediction of binding affinity for several ligands by the scoring functions employed in this study.
PMCID: PMC3183266  PMID: 21780807
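As a rough illustration of the consensus idea in the abstract above, the sketch below averages two standardized score lists and evaluates them with a squared Pearson correlation. The abstract does not say how the two scores were combined, so the z-score averaging, the `flip_b` sign switch (for a score where lower means stronger binding, as with an energy), and all function names are assumptions made for this example.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize values to zero mean and unit (population) variance."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize constant values")
    return [(v - m) / s for v in values]


def r_squared(actual, predicted):
    """Squared Pearson correlation between two equal-length lists."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    ma, mp = mean(actual), mean(predicted)
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    var_a = sum((a - ma) ** 2 for a in actual)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_a * var_p)


def consensus(scores_a, scores_b, flip_b=False):
    """Average of two z-scored score lists (an assumed combination rule).

    Set flip_b=True when scores_b is oriented opposite to scores_a,
    e.g. an energy where lower means stronger binding.
    """
    za, zb = zscore(scores_a), zscore(scores_b)
    if flip_b:
        zb = [-z for z in zb]
    return [(x + y) / 2 for x, y in zip(za, zb)]
```

Standardizing first keeps either score from dominating the average purely because of its units or scale; a weighted or rank-based combination would be an equally plausible choice.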
