Related Articles

1.  AMMOS: Automated Molecular Mechanics Optimization tool for in silico Screening 
BMC Bioinformatics  2008;9:438.
Virtual or in silico ligand screening combined with other computational methods is one of the most promising ways to search for new lead compounds, thereby greatly assisting the drug discovery process. Despite considerable progress in virtual screening methodologies, available computer programs do not easily address problems such as structural optimization of compounds in a screening library, receptor flexibility/induced fit, and accurate prediction of protein-ligand interactions. It has been shown that structural optimization of chemical compounds and post-docking optimization in multi-step structure-based virtual screening approaches help to further improve the overall efficiency of the methods. To address some of these points, we developed the program AMMOS for refining both the 3D structures of the small molecules present in chemical libraries and the predicted receptor-ligand complexes, allowing partial to full atom flexibility through molecular mechanics optimization.
The program AMMOS carries out an automatic procedure that allows for the structural refinement of compound collections and energy minimization of protein-ligand complexes using the open source program AMMP. The performance of our package was evaluated by comparing the structures of small chemical entities minimized by AMMOS with those minimized with the Tripos and MMFF94s force fields. Next, AMMOS was used for fully flexible minimization of protein-ligand complexes obtained from a multi-step virtual screening. Enrichment studies of the selected pre-docked complexes containing 60% of the initially added inhibitors were carried out with or without final AMMOS minimization on two protein targets having different binding pocket properties. AMMOS was able to improve the enrichment after the pre-docking stage, with 40 to 60% of the initially added active compounds found in the top 3% to 5% of the entire compound collection.
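As a rough illustration of the enrichment measure quoted above (the share of initially added actives recovered in the top few percent of a ranked collection), the following Python sketch computes that share. The function name, the compound identifiers and the toy ranking are hypothetical and are not part of AMMOS.

```python
def actives_recovered(ranked_ids, active_ids, top_fraction):
    """Return the fraction of known actives found in the top part of a ranking.

    ranked_ids   -- compound identifiers, best score first (hypothetical data)
    active_ids   -- identifiers of the actives seeded into the collection
    top_fraction -- e.g. 0.05 for the top 5% of the collection
    """
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    actives = set(active_ids)
    if not actives:
        raise ValueError("at least one active compound is required")
    # Always inspect at least one compound, even for tiny collections.
    cutoff = max(1, round(len(ranked_ids) * top_fraction))
    found = sum(1 for cid in ranked_ids[:cutoff] if cid in actives)
    return found / len(actives)


# Toy ranking of 100 compounds with 5 seeded actives (made-up identifiers).
ranking = ["c%d" % i for i in range(100)]
seeded = ["c0", "c2", "c4", "c50", "c90"]
share_top5 = actives_recovered(ranking, seeded, 0.05)
```

Comparing this share for rankings obtained with and without a final minimization step is one simple way to express the kind of improvement described in the abstract.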
The open source AMMOS program can be helpful in a broad range of in silico drug design studies such as optimization of small molecules or energy minimization of pre-docked protein-ligand complexes. Our enrichment study suggests that AMMOS, designed to minimize a large number of ligands pre-docked in a protein target, can successfully be applied in a final post-processing step and that it can take into account some receptor flexibility within the binding site area.
PMCID: PMC2588602  PMID: 18925937
2.  FAF-Drugs: free ADME/tox filtering of compound collections 
Nucleic Acids Research  2006;34(Web Server issue):W738-W744.
In silico screening based on the structures of the ligands or of the receptors has become an essential tool to facilitate the drug discovery process, but compound collections are needed to carry out such in silico experiments. It has been recognized that absorption, distribution, metabolism, excretion and toxicity (ADME/tox) are key properties that need to be considered early on, even during the database preparation stage. FAF-Drugs is an online service based on Frowns (a chemoinformatics toolkit) that allows users to process their own compound collections via simple ADME/tox filtering rules such as molecular weight, polar surface area, logP or number of rotatable bonds. SMILES (Simplified Molecular Input Line Entry System), CANSMILES (canonical SMILES) or SDF (structure data file) files are required as input, and molecules that pass or do not pass the filters are sent back in CANSMILES format. This service should thus help scientists engaging in drug discovery campaigns. Other utilities and several compound collections suitable for in silico screening are available at our site. FAF-Drugs can be accessed at .
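The kind of range-based filtering described above can be sketched in a few lines of Python. This is a minimal sketch that assumes the descriptors have already been computed elsewhere; the property names, the threshold values and the example entries are illustrative assumptions, not the defaults or the code of FAF-Drugs.

```python
# Illustrative property ranges only; these are assumptions, not FAF-Drugs defaults.
RULES = {
    "mol_weight": (0.0, 500.0),
    "polar_surface_area": (0.0, 140.0),
    "logp": (-5.0, 5.0),
    "rotatable_bonds": (0, 10),
}


def passes_filters(descriptors, rules=RULES):
    """Return True when every descriptor lies inside its allowed range."""
    return all(low <= descriptors[name] <= high
               for name, (low, high) in rules.items())


def split_collection(collection, rules=RULES):
    """Split {identifier: descriptors} into accepted and rejected identifiers."""
    accepted, rejected = [], []
    for identifier, descriptors in collection.items():
        target = accepted if passes_filters(descriptors, rules) else rejected
        target.append(identifier)
    return accepted, rejected


# Hypothetical, hand-entered descriptor values for two made-up entries.
collection = {
    "small_example": {"mol_weight": 46.1, "polar_surface_area": 20.2,
                      "logp": -0.3, "rotatable_bonds": 0},
    "large_example": {"mol_weight": 812.0, "polar_surface_area": 210.0,
                      "logp": 6.4, "rotatable_bonds": 17},
}
accepted, rejected = split_collection(collection)
```

Keeping the rules in a plain dictionary makes it easy to tighten or relax a single range without touching the filtering logic.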
PMCID: PMC1538885  PMID: 16845110
3.  Analysis of multiple compound–protein interactions reveals novel bioactive molecules 
The authors use machine learning of compound-protein interactions to explore drug polypharmacology and to efficiently identify bioactive ligands, including novel scaffold-hopping compounds for two pharmaceutically important protein families: G-protein coupled receptors and protein kinases.
We have demonstrated that machine learning of multiple compound–protein interactions is useful for efficient ligand screening and for assessing drug polypharmacology. This approach successfully identified novel scaffold-hopping compounds for two pharmaceutically important protein families: G-protein-coupled receptors and protein kinases. These bioactive compounds were not detected by existing computational ligand-screening methods in comparative studies. The results of this study indicate that data derived from chemical genomics can be highly useful for exploring chemical space, and this systems biology perspective could accelerate drug discovery processes.
The discovery of novel bioactive molecules advances our systems-level understanding of biological processes and is crucial for innovation in drug development. Perturbations of biological systems by chemical probes provide broader applications not only for analysis of complex systems but also for intentional manipulations of these systems. Nevertheless, the lack of well-characterized chemical modulators has limited their use. Recently, chemical genomics has emerged as a promising area of research applicable to the exploration of novel bioactive molecules, and researchers are currently striving toward the identification of all possible ligands for all target protein families (Wang et al, 2009). Chemical genomics studies have shown that patterns of compound–protein interactions (CPIs) are too diverse to be understood as simple one-to-one events. There is an urgent need to develop appropriate data mining methods for characterizing and visualizing the full complexity of interactions between chemical space and biological systems. However, no existing screening approach has so far succeeded in identifying novel bioactive compounds using multiple interactions among compounds and target proteins.
High-throughput screening (HTS) and computational screening have greatly aided in the identification of early lead compounds for drug discovery. However, the large number of assays required for HTS to identify drugs that target multiple proteins renders this process very costly and time-consuming. Therefore, interest in using in silico strategies for screening has increased. The most common computational approaches, ligand-based virtual screening (LBVS) and structure-based virtual screening (SBVS; Oprea and Matter, 2004; Muegge and Oloff, 2006; McInnes, 2007; Figure 1A), have been used for practical drug development. LBVS aims to identify molecules that are very similar to known active molecules and generally has difficulty identifying compounds with novel structural scaffolds that differ from reference molecules. The other popular strategy, SBVS, is constrained by the number of three-dimensional crystallographic structures available. To circumvent these limitations, we have shown that a new computational screening strategy, chemical genomics-based virtual screening (CGBVS), has the potential to identify novel, scaffold-hopping compounds and assess their polypharmacology by using a machine-learning method to recognize conserved molecular patterns in comprehensive CPI data sets.
The CGBVS strategy used in this study was made up of five steps: CPI data collection, descriptor calculation, representation of interaction vectors, predictive model construction using training data sets, and predictions from test data (Figure 1A). Importantly, step 1, the construction of a data set of chemical structures and protein sequences for known CPIs, did not require the three-dimensional protein structures needed for SBVS. In step 2, compound structures and protein sequences were converted into numerical descriptors. These descriptors were used to construct chemical or biological spaces in which decreasing distance between vectors corresponded to increasing similarity of compound structures or protein sequences. In step 3, we represented multiple CPI patterns by concatenating these chemical and protein descriptors. Using these interaction vectors, we could quantify the similarity of molecular interactions for compound–protein pairs, despite the fact that the ligand and protein similarity maps differed substantially. In step 4, concatenated vectors for CPI pairs (positive samples) and non-interacting pairs (negative samples) were input into an established machine-learning method. In the final step, the classifier constructed using training sets was applied to test data.
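Steps 3 to 5 above (concatenating compound and protein descriptors into one interaction vector, training on interacting and non-interacting pairs, then scoring new pairs) can be sketched as follows. This is a toy sketch: the descriptor values are made up, and a 1-nearest-neighbour rule is used purely as a stand-in for the unspecified "established machine-learning method" of the study.

```python
import math


def interaction_vector(compound_desc, protein_desc):
    """Concatenate compound and protein descriptors into one vector (step 3)."""
    return list(compound_desc) + list(protein_desc)


def euclidean(a, b):
    """Euclidean distance; a smaller distance means more similar pairs."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_1nn(training, query):
    """Label a query pair with the label of its nearest training pair.

    training -- list of (interaction_vector, label), label 1 = interacting,
                label 0 = non-interacting (steps 4 and 5, stand-in classifier)
    """
    nearest = min(training, key=lambda item: euclidean(item[0], query))
    return nearest[1]


# Made-up two-dimensional descriptors for compounds and proteins.
training = [
    (interaction_vector([1.0, 0.0], [0.9, 0.1]), 1),  # known interacting pair
    (interaction_vector([0.0, 1.0], [0.1, 0.9]), 0),  # non-interacting pair
]
query = interaction_vector([0.9, 0.1], [0.8, 0.2])
label = predict_1nn(training, query)
```

Because each vector describes a compound-protein pair rather than a compound alone, pairs involving different proteins can contribute to the same model, which is the property the text relies on for targets with few known ligands.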
To evaluate the predictive value of CGBVS, we first compared its performance with that of LBVS by fivefold cross-validation. CGBVS performed with considerably higher accuracy (91.9%) than did LBVS (84.4%; Figure 1B). We next compared CGBVS and SBVS in a retrospective virtual screening based on the human β2-adrenergic receptor (ADRB2). Figure 1C shows that CGBVS provided higher hit rates than did SBVS. These results suggest that CGBVS is more successful than conventional approaches for prediction of CPIs.
We then evaluated the ability of the CGBVS method to predict the polypharmacology of ADRB2 by attempting to identify novel ADRB2 ligands from a group of G-protein-coupled receptor (GPCR) ligands. We ranked the prediction scores for the interactions of 826 reported GPCR ligands with ADRB2 and then analyzed the 50 highest-ranked compounds in greater detail. Of 21 commercially available compounds, 11 showed ADRB2-binding activity and were not previously reported to be ADRB2 ligands. These compounds included ligands not only for aminergic receptors but also for neuropeptide Y-type 1 receptors (NPY1R), which have low protein homology to ADRB2. Most ligands we identified were not detected by LBVS and SBVS, which suggests that only CGBVS could identify this unexpected cross-reaction for a ligand developed as a target to a peptidergic receptor.
The true value of CGBVS in drug discovery must be tested by assessing whether this method can identify scaffold-hopping lead compounds from a set of compounds that is structurally more diverse. To assess this ability, we analyzed 11 500 commercially available compounds to predict compounds likely to bind to two GPCRs and two protein kinases. Functional assays revealed that nine ADRB2 ligands, three NPY1R ligands, five epidermal growth factor receptor (EGFR) inhibitors, and two cyclin-dependent kinase 2 (CDK2) inhibitors were concentrated in the top-ranked compounds (hit rate=30, 15, 25, and 10%, respectively). We also evaluated the extent of scaffold hopping achieved in the identification of these novel ligands. One ADRB2 ligand, two NPY1R ligands, and one CDK2 inhibitor exhibited scaffold hopping (Figure 4), indicating that CGBVS can use this characteristic to rationally predict novel lead compounds, a crucial and very difficult step in drug discovery. This feature of CGBVS is critically different from existing predictive methods, such as LBVS, which depend on similarities between test and reference ligands, and focus on a single protein or highly homologous proteins. In particular, CGBVS is useful for targets with undefined ligands because this method can use CPIs with target proteins that exhibit lower levels of homology.
In summary, we have demonstrated that data mining of multiple CPIs is of great practical value for exploration of chemical space. As a predictive model, CGBVS could provide an important step in the discovery of such multi-target drugs by identifying the group of proteins targeted by a particular ligand, leading to innovation in pharmaceutical research.
The discovery of novel bioactive molecules advances our systems-level understanding of biological processes and is crucial for innovation in drug development. For this purpose, the emerging field of chemical genomics is currently focused on accumulating large assay data sets describing compound–protein interactions (CPIs). Although new target proteins for known drugs have recently been identified through mining of CPI databases, using these resources to identify novel ligands remains unexplored. Herein, we demonstrate that machine learning of multiple CPIs can not only assess drug polypharmacology but can also efficiently identify novel bioactive scaffold-hopping compounds. Through a machine-learning technique that uses multiple CPIs, we have successfully identified novel lead compounds for two pharmaceutically important protein families, G-protein-coupled receptors and protein kinases. These novel compounds were not identified by existing computational ligand-screening methods in comparative studies. The results of this study indicate that data derived from chemical genomics can be highly useful for exploring chemical space, and this systems biology perspective could accelerate drug discovery processes.
PMCID: PMC3094066  PMID: 21364574
chemical genomics; data mining; drug discovery; ligand screening; systems chemical biology
4.  Frog: a FRee Online druG 3D conformation generator 
Nucleic Acids Research  2007;35(Web Server issue):W568-W572.
In silico screening methods based on the 3D structures of the ligands or of the proteins have become an essential tool to facilitate the drug discovery process. To achieve such process, the 3D structures of the small chemical compounds have to be generated. In addition, for ligand-based screening computations or hierarchical structure-based screening projects involving a rigid-body docking step, it is necessary to generate multi-conformer 3D models for each input ligand to increase the efficiency of the search. However, most academic or commercial compound collections are delivered in 1D SMILES (simplified molecular input line entry system) format or in 2D SDF (structure data file), highlighting the need for free 1D/2D to 3D structure generators. Frog is an on-line service aimed at generating 3D conformations for drug-like compounds starting from their 1D or 2D descriptions. Given the atomic constitution of the molecules and connectivity information, Frog can identify the different unambiguous isomers corresponding to each compound, and generate single or multiple low-to-medium energy 3D conformations, using an assembly process that does not presently consider ring flexibility. Tests show that Frog is able to generate bioactive conformations close to those observed in crystallographic complexes. Frog can be accessed at
PMCID: PMC1933180  PMID: 17485475
5.  AfroDb: A Select Highly Potent and Diverse Natural Product Library from African Medicinal Plants 
PLoS ONE  2013;8(10):e78085.
Computer-aided drug design (CADD) often involves virtual screening (VS) of large compound datasets, and the availability of such datasets is vital for drug discovery protocols. We assess the bioactivity and “drug-likeness” of a relatively small but structurally diverse dataset (containing >1,000 compounds) from African medicinal plants, which have been tested and shown to have a wide range of biological activities. The geographical regions of collection of the medicinal plants cover the entire continent of Africa, based on data from literature sources and information from traditional healers. For each isolated compound, the three-dimensional (3D) structure has been used to calculate physico-chemical properties used in the prediction of oral bioavailability on the basis of Lipinski’s “Rule of Five”. A comparative analysis has been carried out with the “drug-like”, “lead-like”, and “fragment-like” subsets, as well as with the Dictionary of Natural Products. A diversity analysis has been carried out in comparison with the ChemBridge diverse database. Furthermore, descriptors related to absorption, distribution, metabolism, excretion and toxicity (ADMET) have been used to predict the pharmacokinetic profile of the compounds within the dataset. Our results indicate that drug discovery beginning with natural products from the African flora could be highly promising. The 3D structures are available and could be useful for virtual screening and natural product lead generation programs.
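For readers unfamiliar with Lipinski's "Rule of Five", the check can be written as a short Python function. The four limits below are the standard ones (molecular weight at most 500, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). The tolerance of one violation is a common convention assumed here, and the function names and example values are hypothetical rather than taken from the AfroDb study.

```python
def rule_of_five_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count how many of Lipinski's four criteria a compound violates."""
    checks = [
        mol_weight > 500,   # molecular weight above 500 Da
        logp > 5,           # octanol-water partition coefficient above 5
        h_donors > 5,       # more than 5 hydrogen-bond donors
        h_acceptors > 10,   # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


def is_drug_like(mol_weight, logp, h_donors, h_acceptors, max_violations=1):
    """Treat a compound as drug-like when it has few enough violations.

    max_violations=1 is an assumed convention, not a value from the source.
    """
    count = rule_of_five_violations(mol_weight, logp, h_donors, h_acceptors)
    return count <= max_violations


# Illustrative descriptor values for two made-up compounds.
small_ok = is_drug_like(180.2, 1.2, 1, 4)
large_bad = is_drug_like(900.0, 7.5, 6, 12)
```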
PMCID: PMC3813505  PMID: 24205103
6.  PDTD: a web-accessible protein database for drug target identification 
BMC Bioinformatics  2008;9:104.
Target identification is important for modern drug discovery. With the advances in the development of molecular docking, potential binding proteins may be discovered by docking a small molecule to a repository of proteins with three-dimensional (3D) structures. To complete this task, a reverse docking program and a drug target database with 3D structures are necessary. To this end, we have developed a web server tool, TarFisDock (Target Fishing Docking), which has been used widely by others. Recently, we have constructed a protein target database, Potential Drug Target Database (PDTD), and have integrated PDTD with TarFisDock. This combination aims to assist target identification and validation.
PDTD is a web-accessible protein database for in silico target identification. It currently contains >1100 protein entries with 3D structures present in the Protein Data Bank. The data are extracted from the literature and several online databases such as TTD, DrugBank and Thomson Pharma. The database covers diverse information on >830 known or potential drug targets, including protein and active site structures in both PDB and mol2 formats, related diseases, biological functions and associated regulating (signaling) pathways. Each target is categorized by both nosology and biochemical function. PDTD supports keyword searches by, for example, PDB ID, target name, and disease name. Data sets generated by PDTD can be viewed with the plug-in of molecular visualization tools and can also be downloaded freely. Notably, PDTD is specially designed for target identification. In conjunction with TarFisDock, PDTD can be used to identify binding proteins for small molecules. The results can be downloaded as a mol2 file with the binding pose of the probe compound and a list of potential binding targets ordered by their ranking scores.
PDTD serves as a comprehensive and unique repository of drug targets. Integrated with TarFisDock, PDTD is a useful resource to identify binding proteins for active compounds or existing drugs. Its potential applications include in silico drug target identification, virtual screening, and the discovery of the secondary effects of an old drug (i.e. new pharmacological usage) or an existing target (i.e. new pharmacological or toxic relevance), thus it may be a valuable platform for the pharmaceutical researchers. PDTD is available online at .
PMCID: PMC2265675  PMID: 18282303
7.  Machine Learning Models and Pathway Genome Data Base for Trypanosoma cruzi Drug Discovery 
PLoS Neglected Tropical Diseases  2015;9(6):e0003878.
Chagas disease is a neglected tropical disease (NTD) caused by the eukaryotic parasite Trypanosoma cruzi. The current clinical and preclinical pipeline for T. cruzi is extremely sparse and lacks drug target diversity.
Methodology/Principal Findings
In the present study we developed a computational approach that utilized data from several public whole-cell, phenotypic high throughput screens that have been completed for T. cruzi by the Broad Institute, including a single screen of over 300,000 molecules in the search for chemical probes as part of the NIH Molecular Libraries program. We have also compiled and curated relevant biological and chemical compound screening data including (i) compounds and biological activity data from the literature, (ii) high throughput screening datasets, and (iii) predicted metabolites of T. cruzi metabolic pathways. This information was used to help us identify compounds and their potential targets. We have constructed a Pathway Genome Data Base for T. cruzi. In addition, we have developed Bayesian machine learning models that were used to virtually screen libraries of compounds. Ninety-seven compounds were selected for in vitro testing, and 11 of these were found to have EC50 < 10μM. We progressed five compounds to an in vivo mouse efficacy model of Chagas disease and validated that the machine learning model could identify in vitro active compounds not in the training set, as well as known positive controls. The antimalarial pyronaridine possessed 85.2% efficacy in the acute Chagas mouse model. We have also proposed potential targets (for future verification) for this compound based on structural similarity to known compounds with targets in T. cruzi.
Conclusions/ Significance
We have demonstrated how combining chemoinformatics and bioinformatics for T. cruzi drug discovery can bring interesting in vivo active molecules to light that may have been overlooked. The approach we have taken is broadly applicable to other NTDs.
Author Summary
Chagas disease is a neglected tropical disease (NTD) caused by the eukaryotic parasite Trypanosoma cruzi. The disease is endemic to Latin America but is increasingly found in North America and Europe, primarily through immigration, and the spread of this disease is bringing new attention to the need for novel, safe, and effective therapeutics to treat T. cruzi infection. We have used data from a phenotypic screen to build Bayesian models to predict anti-parasitic activity against T. cruzi in vitro. These models were used to score various small libraries of molecules. We selected fewer than 100 compounds for testing and found in vitro actives, some of which were tested in an in vivo efficacy model. We identified the antimalarial pyronaridine as having in vivo efficacy, which provides us with a new starting point for further investigation and optimization.
PMCID: PMC4482694  PMID: 26114876
8.  Learning a peptide-protein binding affinity predictor with kernel ridge regression 
BMC Bioinformatics  2013;14:82.
The cellular function of a vast majority of proteins is performed through physical interactions with other biomolecules, which, most of the time, are other proteins. Peptides represent templates of choice for mimicking a secondary structure in order to modulate protein-protein interactions. They are thus an interesting class of therapeutics since they also display strong activity, high selectivity, low toxicity and few drug-drug interactions. Furthermore, predicting peptides that would bind to specific MHC alleles would be of tremendous benefit to improve vaccine-based therapy and possibly generate antibodies with greater affinity. Modern computational methods have the potential to accelerate and lower the cost of drug and vaccine discovery by selecting potential compounds for testing in silico prior to biological validation.
We propose a specialized string kernel for small bio-molecules, peptides and pseudo-sequences of binding interfaces. The kernel incorporates physico-chemical properties of amino acids and elegantly generalizes eight kernels, including the Oligo, the Weighted Degree, the Blended Spectrum, and the Radial Basis Function. We provide a low complexity dynamic programming algorithm for the exact computation of the kernel and a linear time algorithm for its approximation. Combined with kernel ridge regression and SupCK, a novel binding pocket kernel, the proposed kernel yields biologically relevant and good prediction accuracy on the PepX database. For the first time, a machine learning predictor is capable of predicting the binding affinity of any peptide to any protein with reasonable accuracy. The method was also applied to both single-target and pan-specific Major Histocompatibility Complex class II benchmark datasets and three Quantitative Structure Affinity Model benchmark datasets.
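Kernel ridge regression itself is compact enough to sketch. The Python below fits the dual coefficients by solving (K + λI)α = y and predicts with a kernel-weighted sum over the training points. It is a minimal sketch under stated assumptions: a plain radial basis function kernel on numeric vectors stands in for the specialized string kernel of the paper, and all function names, parameters and data are hypothetical.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Radial basis function kernel between two numeric vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_krr(xs, ys, lam=1e-6, gamma=1.0):
    """Return dual coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(xs)
    gram = [[rbf_kernel(xs[i], xs[j], gamma) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    return solve_linear(gram, ys)


def predict_krr(xs, alphas, query, gamma=1.0):
    """Predict a value as the kernel-weighted sum over training points."""
    return sum(a * rbf_kernel(x, query, gamma) for a, x in zip(alphas, xs))


# Toy one-dimensional data standing in for (descriptor, affinity) pairs.
train_x = [[0.0], [1.0], [2.0]]
train_y = [0.0, 1.0, 4.0]
alphas = fit_krr(train_x, train_y)
estimate = predict_krr(train_x, alphas, [1.0])
```

Swapping `rbf_kernel` for any other positive semi-definite kernel, such as a string kernel over peptide sequences, leaves the fitting and prediction code unchanged.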
On all benchmarks, our method significantly (p-value ≤ 0.057) outperforms the current state-of-the-art methods at predicting peptide-protein binding affinities. The proposed approach is flexible and can be applied to predict any quantitative biological activity. Moreover, generating reliable peptide-protein binding affinities will also improve system biology modelling of interaction pathways. Lastly, the method should be of value to a large segment of the research community with the potential to accelerate the discovery of peptide-based drugs and facilitate vaccine development. The proposed kernel is freely available at
PMCID: PMC3651388  PMID: 23497081
9.  CamMedNP: Building the Cameroonian 3D structural natural products database for virtual screening 
Computer-aided drug design (CADD) often involves virtual screening (VS) of large compound datasets and the availability of such is vital for drug discovery protocols. We present CamMedNP - a new database beginning with more than 2,500 compounds of natural origin, along with some of their derivatives which were obtained through hemisynthesis. These are pure compounds which have been previously isolated and characterized using modern spectroscopic methods and published by several research teams spread across Cameroon.
In the present study, 224 distinct medicinal plant species belonging to 55 plant families from the Cameroonian flora have been considered. About 80 % of these have been previously published and/or referenced in internationally recognized journals. For each compound, the optimized 3D structure, drug-like properties, plant source, collection site and currently known biological activities are given, as well as literature references. We have evaluated the “drug-likeness” of this database using Lipinski’s “Rule of Five”. A diversity analysis has been carried out in comparison with the ChemBridge diverse database.
CamMedNP could be highly useful for database screening and natural product lead generation programs.
PMCID: PMC3637470  PMID: 23590173
3D structures; Database collection; Natural products; Medicinal plants; Virtual screening
10.  Cyndi: a multi-objective evolution algorithm based method for bioactive molecular conformational generation 
BMC Bioinformatics  2009;10:101.
Conformation generation is a ubiquitous problem in molecular modelling. Many applications require sampling the broad molecular conformational space or perceiving the bioactive conformers to ensure success. Numerous in silico methods have been proposed in an attempt to resolve the problem, ranging from deterministic to non-deterministic and from systematic to stochastic ones. In this work, we describe an efficient conformation sampling method named Cyndi, which is based on a multi-objective evolution algorithm.
The conformational perturbation is subjected to evolutionary operation on the genome encoded with dihedral torsions. Various objectives are designated to render the generated Pareto optimal conformers energy-favoured as well as evenly scattered across the conformational space. An optional objective concerning the degree of molecular extension is added to achieve geometrically extended or compact conformations, which have been observed to impact molecular bioactivity (J Comput-Aided Mol Des 2002, 16: 105–112). Testing the performance of Cyndi against a test set consisting of 329 small molecules reveals an average minimum RMSD of 0.864 Å to the corresponding bioactive conformations, indicating that Cyndi is highly competitive against other conformation generation methods. Meanwhile, the high-speed performance (0.49 ± 0.18 seconds per molecule) makes Cyndi a practical toolkit for conformational database preparation and facilitates subsequent pharmacophore mapping or rigid docking. A copy of the precompiled executable of Cyndi and the test set molecules in mol2 format are accessible in Additional file 1.
On the basis of the MOEA algorithm, we present a new, highly efficient conformation generation method, Cyndi, and report the results of validation and performance studies comparing it with four other methods. The results reveal that Cyndi is capable of generating geometrically diverse conformers and outperforms the four other multiple conformer generators in reproducing the bioactive conformations of 329 structures. The speed advantage indicates that Cyndi is a powerful alternative method for extensive conformational sampling and large-scale conformer database preparation.
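The "average minimum RMSD" figure reported above can be made concrete with a short Python sketch: for each molecule, take the smallest RMSD between any generated conformer and the bioactive reference, then average over molecules. The sketch assumes the coordinates are already superimposed and listed in the same atom order, which real comparisons would have to ensure first. All names and coordinates are hypothetical and unrelated to the Cyndi code.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
                for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def min_rmsd(conformers, reference):
    """Smallest RMSD between any generated conformer and the reference."""
    return min(rmsd(conformer, reference) for conformer in conformers)


def average_min_rmsd(dataset):
    """Average the per-molecule minimum RMSD over (conformers, reference) pairs."""
    values = [min_rmsd(conformers, reference) for conformers, reference in dataset]
    return sum(values) / len(values)


# Made-up two-atom "molecule": one reference and two generated conformers.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conformers = [
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],   # shifted by 1.0 along z
    [(0.0, 0.5, 0.0), (1.0, 0.5, 0.0)],   # shifted by 0.5 along y
]
best = min_rmsd(conformers, reference)
```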
PMCID: PMC2678094  PMID: 19335906
11.  Discovery of Potent Small-Molecule Inhibitors of Multidrug-Resistant Plasmodium falciparum Using a Novel Miniaturized High-Throughput Luciferase-Based Assay
Malaria is a global health problem that causes significant mortality and morbidity, with more than 1 million deaths per year caused by Plasmodium falciparum. Most antimalarial drugs face decreased efficacy due to the emergence of resistant parasites, which necessitates the discovery of new drugs. To identify new antimalarials, we developed an automated 384-well plate screening assay using P. falciparum parasites that stably express cytoplasmic firefly luciferase. After initial optimization, we tested two different types of compound libraries: known bioactive collections (Library of Pharmacologically Active Compounds [LOPAC] and the library from the National Institute of Neurological Disorders and Stroke [NINDS]) and a library of uncharacterized compounds (ChemBridge). A total of 12,320 compounds were screened at 5.5 μM. Selecting only compounds that reduced parasite growth by 85% resulted in 33 hits from the combined bioactive collection and 130 hits from the ChemBridge library. Fifteen novel drug-like compounds from the bioactive collection were found to be active against P. falciparum. Twelve new chemical scaffolds were found from the ChemBridge hits, the most potent of which was a series based on the 1,4-naphthoquinone scaffold, which is structurally similar to the FDA-approved antimalarial atovaquone. However, in contrast to atovaquone, which acts to inhibit the bc1 complex and block the electron transport chain in parasite mitochondria, we have determined that our new 1,4-naphthoquinones act through a novel, non-bc1-dependent mechanism and remain potent against atovaquone- and chloroquine-resistant parasites. Ultimately, this study may provide new probes to understand the molecular details of the malaria life cycle and to identify new antimalarials.
PMCID: PMC2934977  PMID: 20547797
12.  VoteDock: Consensus Docking Method for Prediction of Protein–Ligand Interactions 
Molecular recognition plays a fundamental role in all biological processes, which is why great efforts have been made to understand and predict protein–ligand interactions. Finding a molecule that can potentially bind to a target protein is particularly essential in drug discovery and remains an expensive and time-consuming task. In silico tools are frequently used to screen molecular libraries to identify new lead compounds, and if the protein structure is known, various protein–ligand docking programs can be used. The aim of a docking procedure is to predict correct poses of the ligand in the binding site of the protein and to score them according to the strength of interaction in a reasonable time frame. The purpose of our study was to present a novel consensus approach to predict both the protein–ligand complex structure and its corresponding binding affinity. Our method used as input the results from seven docking programs (Surflex, LigandFit, Glide, GOLD, FlexX, eHiTS, and AutoDock) that are widely used for docking of ligands. We evaluated it on an extensive benchmark dataset of 1300 protein–ligand pairs from the refined PDBbind database for which structural and affinity data were available. We independently compared its scoring and posing ability with those of previously proposed methods. In most cases, our method properly docks approximately 20% more pairs than the docking methods do on average, and over 10% more pairs than the best single program. The RMSD value of the predicted complex conformation versus its native one is reduced by 0.5 Å. Finally, we were able to increase the Pearson correlation of the predicted binding affinity in comparison with the experimental value up to 0.5.
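The Pearson correlation used above to compare predicted and experimental binding affinities can be computed as in the following Python sketch. The function name and the affinity values are hypothetical; the code only illustrates the statistic and is not part of VoteDock.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_x * var_y)


# Made-up predicted and experimental affinities for four complexes.
predicted = [5.1, 6.3, 7.0, 8.2]
experimental = [4.8, 6.9, 6.5, 8.0]
r = pearson(predicted, experimental)
```

A value near 1 means the predicted ranking and spacing of affinities follow the experimental ones closely, while a value near 0 means there is no linear relationship.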
PMCID: PMC4510457  PMID: 20812324
drug discovery; PDBbind database; docking; consensus; molecular recognition
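To make the idea of consensus posing concrete, here is a minimal sketch of one possible voting scheme. It is an assumption for illustration and is not claimed to be the VoteDock algorithm: each program contributes one pose, a pose receives a vote from every other pose within an RMSD cutoff, and the pose with the most votes wins. The program labels, coordinates and the 2.0 Å cutoff are hypothetical.

```python
import math


def rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two poses with matched atom order."""
    sq = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b)
    )
    return math.sqrt(sq / len(pose_a))


def consensus_pose(poses, cutoff=2.0):
    """Pick the program whose pose agrees with the most other poses.

    poses: dict mapping a program label to a list of (x, y, z) coordinates,
    all expressed in the same receptor frame. A pose earns one vote from
    every other pose lying within `cutoff` angstroms RMSD of it.
    """
    votes = {
        name: sum(
            1 for other, coords in poses.items()
            if other != name and rmsd(poses[name], coords) <= cutoff
        )
        for name in poses
    }
    # Ties are broken by insertion order of the dictionary.
    winner = max(votes, key=votes.get)
    return winner, votes


# Hypothetical two-atom poses from four unnamed programs.
poses = {
    "prog_a": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "prog_b": [(1.5, 0.0, 0.0), (3.0, 0.0, 0.0)],
    "prog_c": [(3.0, 0.0, 0.0), (4.5, 0.0, 0.0)],
    "prog_d": [(10.0, 0.0, 0.0), (11.5, 0.0, 0.0)],
}
winner, votes = consensus_pose(poses)
print(winner, votes)
```

In this toy case prog_b sits between prog_a and prog_c and therefore collects two votes, while the outlier prog_d collects none.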
Neuro-Oncology  2014;16(Suppl 3):iii17.
BACKGROUND: Transcription factors (TFs) are a major class of protein signaling molecules that play a critical role in most cancers. OLIG2 is a bHLH TF essential for survival and expansion of the highly aggressive brain cancer, glioblastoma (GBM), and represents an attractive therapeutic target. TFs including OLIG2 are typically activated by dimerization, and TF inhibition has proved problematic owing to expansive protein–protein interfaces and the absence of hydrophobic pockets. In silico modeling is increasingly being used in attempts to design TF dimerization inhibitors, but these efforts have met with very limited success. METHODS: Previous in silico based drug design approaches for TFs focused on single residues or small foci, called binding hotspots, on TF dimerization surfaces. This approach has largely failed because hotspots do not adequately represent the total active TF dimerization interface. In our modeling approach, which we used to identify candidate small molecule scaffolds for OLIG2 inhibition, we represent the dimerization surface as a comparatively extensive parental pharmacophore. This active surface is comprised of multiple, specific subregions we term daughter pharmacophores or subpharmacophores. We hypothesized that a small molecule capable of binding each subpharmacophore would sufficiently populate the active dimerization surface to interfere with OLIG2 dimerization, thereby suppressing pathway activation. RESULTS: Computational screens, guided by parameters defined by our multiple pharmacophore algorithm, identified potential OLIG2 inhibitors from comprehensive in silico compound libraries. A subset of these candidates convincingly demonstrated OLIG2 pathway inhibition and anti-GBM activity in an array of biochemical, cell-based, and reporter assays.
Further, when we tested in mice a leading representative from a chemically tractable structural class we found, (1) attenuation of GBM xenograft growth, and (2) a favorable CNS pharmacokinetic profile. Both observations prompted us to nominate and pursue the representative for further structural optimization in the context of in vivo efficacy and pharmacokinetics. CONCLUSIONS: We have developed a novel computational modeling approach for designing TF inhibitors, using the concept of multiple pharmacophores. We have identified a small molecule compound with significant in vitro/vivo anti-GBM activity and favorable pharmacokinetics. The initial compound is now being pursued as a development candidate for GBM, and if successful its final derivative may ultimately represent the first truly GBM-specific drug. Moreover, our study in a broader context presents a new pharmacologic paradigm and may pave the way for the development of TF-targeted therapeutics in general. SECONDARY CATEGORY: n/a.
PMCID: PMC4144530
14.  Novel Chemical Suppressors of Long QT Syndrome Identified by an in vivo Functional Screen 
Circulation  2010;123(1):23-30.
Genetic long QT (LQT) syndrome is a life-threatening disorder caused by mutations that result in prolongation of cardiac repolarization. Recent work has demonstrated that a zebrafish model of LQT syndrome faithfully recapitulates several features of human disease including prolongation of ventricular action potential duration (APD), spontaneous early after-depolarizations, and 2:1 atrioventricular (AV) block in early stages of development. Due to their transparency, small size, and absorption of small molecules from their environment, zebrafish are amenable to high throughput chemical screens. We describe a small molecule screen using the zebrafish KCNH2 mutant breakdance to identify compounds that can rescue the LQT type 2 phenotype.
Methods and Results
Zebrafish breakdance embryos were exposed to test compounds at 48 hours of development and scored for rescue of 2:1 AV block at 72 hours in a 96-well format. Only compounds that suppressed the LQT phenotype in three of three fish were considered hits. Screen compounds were obtained from commercially available small molecule libraries (Prestwick and Chembridge). Initial hits were confirmed with dose response testing and time course studies. Optical mapping using the voltage sensitive dye di-4 ANEPPS was performed to measure compound effects on cardiac APDs. Screening of 1200 small molecules resulted in the identification of flurandrenolide and 2-methoxy-N-(4-methylphenyl) benzamide (2-MMB) as compounds that reproducibly suppressed the LQT phenotype. Optical mapping confirmed that treatment with each compound caused shortening of ventricular APDs. Structure activity studies and steroid receptor knockdown suggest that flurandrenolide functions via the glucocorticoid signaling pathway.
Using a zebrafish model of LQT type 2 syndrome in a high throughput chemical screen, we have identified two compounds, flurandrenolide and the novel compound, 2-MMB, as small molecules that rescue the zebrafish LQTS 2 by shortening the ventricular action potential duration. We provide evidence that flurandrenolide functions via the glucocorticoid receptor mediated pathway. These two molecules, and future discoveries from this screen, should yield novel tools for the study of cardiac electrophysiology and may lead to novel therapeutics for human LQT patients.
PMCID: PMC3015011  PMID: 21098441
long QT syndrome; animal models of human disease; ion channels; chemical screening
15.  Integrative genome-scale metabolic analysis of Vibrio vulnificus for drug targeting and discovery 
Chromosome 1 of Vibrio vulnificus tends to contain larger portion of essential or housekeeping genes on the basis of the genomic analysis and gene knockout experiments performed in this study, while its chromosome 2 seems to have originated and evolved from a plasmid. The genome-scale metabolic network model of V. vulnificus was reconstructed based on databases and literature, and was used to identify 193 essential metabolites. Five essential metabolites finally selected after the filtering process are 2-amino-4-hydroxy-6-hydroxymethyl-7,8-dihydropteridine (AHHMP), D-glutamate (DGLU), 2,3-dihydrodipicolinate (DHDP), 1-deoxy-D-xylulose 5-phosphate (DX5P), and 4-aminobenzoate (PABA), which were predicted to be essential in V. vulnificus, absent in human, and are consumed by multiple reactions. Chemical analogs of the five essential metabolites were screened and a hit compound showing the minimal inhibitory concentration (MIC) of 2 μg/ml and the minimal bactericidal concentration (MBC) of 4 μg/ml against V. vulnificus was identified.
Discovering new antimicrobial targets and consequently new antimicrobials is important as drug resistance of pathogenic microorganisms is becoming an increasingly serious problem in human healthcare management (Fischbach and Walsh, 2009). There clearly exists a gap between genomic studies and drug discovery as the accumulation of knowledge on pathogens at genome level has not successfully transformed into the development of effective drugs (Mills, 2006; Payne et al, 2007). In this study, we dissected the genome of a microbial pathogen in detail, and subsequently developed a systems biological strategy of employing genome-scale metabolic modeling and simulation together with metabolite essentiality analysis for effective drug targeting and discovery. This strategy was used for identifying new drug targets in an opportunistic pathogen Vibrio vulnificus CMCP6 as a model.
V. vulnificus is a Gram-negative halophilic bacterium that is found in estuarine waters, brackish ponds, or coastal areas, and its Biotype 1 is an opportunistic human pathogen that can attack immune-compromised patients, and causes primary septicemia, necrotized wound infections, and gastroenteritis. We previously found that many metabolic genes were specifically induced in vivo, suggesting that specific metabolic pathways are essential for in vivo survival and virulence of this pathogen (Kim et al, 2003; Lee et al, 2007). These results motivated us to carry out systems biological analysis of the genome and the metabolic network for new drug target discovery.
V. vulnificus CMCP6 has two chromosomes. We first re-sequenced genomic regions assembled in low quality and low depth, and subsequently re-annotated the whole genome of V. vulnificus. Horizontal gene transfer was suspected to be responsible for the diversification of each chromosome of V. vulnificus, and the presence of metabolic genes was more biased to chromosome 1 than chromosome 2. Further studies on the V. vulnificus genome revealed that chromosome 2 is more prone to diversification for better adaptation to the environment than chromosome 1, while chromosome 1 tends to expand its genetic repertoire while maintaining the core genes at a constant level.
Next, a genome-scale metabolic network VvuMBEL943 was reconstructed based on literature, databases and experiments for systematic studies on the metabolism of this pathogen and prediction of drug targets. The VvuMBEL943 model is composed of 943 reactions and 765 metabolites, and covers 673 genes. The model was validated by comparing its simulated cell growth phenotype obtained by constraints-based flux analysis with the V. vulnificus-specific experimental data previously reported in the literature. In this study, constraints-based flux analysis is an optimization-based simulation method that calculates intracellular fluxes under the specific genetic and environmental condition (Kim et al, 2008). As a result, 17 growth phenotypes were correctly predicted out of 18 cases, which demonstrate the validity of VvuMBEL943.
The main objective of constructing VvuMBEL943 in this study is to predict potential drug targets by system-wide analysis of the metabolic network for the effective treatment of V. vulnificus. To achieve this goal, a set of drug target candidates was predicted by taking a metabolite-centric approach. Metabolite essentiality analysis is a concept recently introduced for the study of cellular robustness to complement conventional reaction or gene-centric approach (Kim et al, 2007b). Metabolite essentiality analysis observes changes in flux distribution by removing each metabolite from the in silico metabolic network. Hence, metabolite essentiality predicts essential metabolites whose absence causes cell death. By selecting essential metabolites, it is possible to directly screen only their structural analogs, which substantially reduces the number of chemical compounds to screen from the chemical compound library. As a result of implementing this approach, 193 metabolites were initially identified to be essential to the cell. These essential metabolites were then further filtered based on the predetermined criteria, mainly organism specificity and multiple connectivity associated with each metabolite, in order to reduce the number of initial target candidates towards identifying the most effective ones.
Five essential metabolites finally selected are 2-amino-4-hydroxy-6-hydroxymethyl-7,8-dihydropteridine (AHHMP), D-glutamate (DGLU), 2,3-dihydrodipicolinate (DHDP), 1-deoxy-D-xylulose 5-phosphate (DX5P), and 4-aminobenzoate (PABA). Enzymes that consume these essential metabolites were experimentally verified to be essential, which indeed demonstrates the essentiality of these five metabolites. On the basis of the structural information of these five essential metabolites, whole-cell screening assay was performed using their analogs for possible antibacterial discovery. We screened 352 chemical analogs of the essential metabolites selected from the chemical compound library, and found a hit compound 24837, which shows the minimal inhibitory concentration (MIC) of 2 μg/ml and minimal bactericidal concentration (MBC) of 4 μg/ml, showing good antibacterial activity without further structural modification. Although this study demonstrates a proof-of-concept, the approaches and their rationale taken here should serve as a general strategy for discovering novel antibiotics and drugs based on systems-level analysis of metabolic networks.
Although the genomes of many microbial pathogens have been studied to help identify effective drug targets and novel drugs, such efforts have not yet reached full fruition. In this study, we report a systems biological approach that efficiently utilizes genomic information for drug targeting and discovery, and apply this approach to the opportunistic pathogen Vibrio vulnificus CMCP6. First, we partially re-sequenced and fully re-annotated the V. vulnificus CMCP6 genome, and accordingly reconstructed its genome-scale metabolic network, VvuMBEL943. The validated network model was employed to systematically predict drug targets using the concept of metabolite essentiality, along with additional filtering criteria. Target genes encoding enzymes that interact with the five essential metabolites finally selected were experimentally validated. These five essential metabolites are critical to the survival of the cell, and hence were used to guide the cost-effective selection of chemical analogs, which were then screened for antimicrobial activity in a whole-cell assay. This approach is expected to help fill the existing gap between genomics and drug discovery.
PMCID: PMC3049409  PMID: 21245845
drug discovery; drug targeting; genome analysis; metabolic network; Vibrio vulnificus
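The metabolite essentiality idea summarized above (remove one metabolite at a time and check whether the cell can still grow) can be illustrated with a toy sketch. Two simplifications are assumptions of this example and are not taken from the study: "removing" a metabolite is modelled by blocking every reaction that consumes it, and growth is tested with a simple network-expansion (reachability) check instead of constraints-based flux analysis. The six-reaction network is hypothetical.

```python
def can_produce(reactions, nutrients, target):
    """Network expansion: fire any reaction whose substrates are all available."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return target in available


def essential_metabolites(reactions, nutrients, target):
    """Metabolites whose consuming reactions, once blocked, prevent growth."""
    metabolites = {m for subs, prods in reactions for m in subs + prods}
    essential = []
    for met in sorted(metabolites - {target}):
        # Keep only the reactions that do not consume this metabolite.
        kept = [(s, p) for s, p in reactions if met not in s]
        if not can_produce(kept, nutrients, target):
            essential.append(met)
    return essential


# Hypothetical toy network; each reaction is (substrates, products).
reactions = [
    (("glc",), ("A",)),
    (("A",), ("B",)),
    (("A",), ("C",)),
    (("B",), ("D",)),
    (("C",), ("D",)),
    (("D",), ("biomass",)),
]
essential = essential_metabolites(reactions, nutrients={"glc"}, target="biomass")
print(essential)
```

Here B and C are not essential because they sit on parallel routes to D, whereas A, D and the nutrient glc are each indispensable. A real analysis would replace the reachability test with flux simulation on the full genome-scale model.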
16.  Using iterative cluster merging with improved gap statistics to perform online phenotype discovery in the context of high-throughput RNAi screens 
BMC Bioinformatics  2008;9:264.
The recent emergence of high-throughput automated image acquisition technologies has forever changed how cell biologists collect and analyze data. Historically, the interpretation of cellular phenotypes in different experimental conditions has been dependent upon the expert opinions of well-trained biologists. Such qualitative analysis is particularly effective in detecting subtle, but important, deviations in phenotypes. However, while the rapid and continuing development of automated microscope-based technologies now facilitates the acquisition of trillions of cells in thousands of diverse experimental conditions, such as in the context of RNA interference (RNAi) or small-molecule screens, the massive size of these datasets precludes human analysis. Thus, the development of automated methods which aim to identify novel and biological relevant phenotypes online is one of the major challenges in high-throughput image-based screening. Ideally, phenotype discovery methods should be designed to utilize prior/existing information and tackle three challenging tasks, i.e. restoring pre-defined biological meaningful phenotypes, differentiating novel phenotypes from known ones and clarifying novel phenotypes from each other. Arbitrarily extracted information causes biased analysis, while combining the complete existing datasets with each new image is intractable in high-throughput screens.
Here we present the design and implementation of a novel and robust online phenotype discovery method with broad applicability that can be used in diverse experimental contexts, especially high-throughput RNAi screens. This method features phenotype modelling and iterative cluster merging using improved gap statistics. A Gaussian Mixture Model (GMM) is employed to estimate the distribution of each existing phenotype, and is then used as the reference distribution in gap statistics. This method is broadly applicable to a number of different types of image-based datasets derived from a wide spectrum of experimental conditions and is suitable to adaptively process new images which are continuously added to existing datasets. Validations were carried out on different datasets, including a published RNAi screen using Drosophila embryos [Additional files 1, 2], a dataset for cell cycle phase identification using HeLa cells [Additional files 1, 3, 4] and a synthetic dataset using polygons. Our method tackled the three aforementioned tasks effectively, with an accuracy range of 85%–90%. When our method is implemented in the context of a Drosophila genome-scale RNAi image-based screening of cultured cells aimed at identifying the contribution of individual genes towards the regulation of cell shape, it efficiently discovers meaningful new phenotypes and provides novel biological insight. We also propose a two-step procedure to modify the novelty detection method based on one-class SVM, so that it can be used for online phenotype discovery. Under different conditions, we compared the SVM-based method with our method using various datasets, and our method consistently outperformed the SVM-based method in at least two of three tasks by 2% to 5%. These results demonstrate that our method can be used to better identify novel phenotypes in image-based datasets from a wide range of conditions and organisms.
We demonstrate that our method can detect various novel phenotypes effectively in complex datasets. Experiment results also validate that our method performs consistently under different order of image input, variation of starting conditions including the number and composition of existing phenotypes, and dataset from different screens. In our findings, the proposed method is suitable for online phenotype discovery in diverse high-throughput image-based genetic and chemical screens.
PMCID: PMC2443381  PMID: 18534020
17.  Potent bace-1 inhibitor design using pharmacophore modeling, in silico screening and molecular docking studies 
BMC Bioinformatics  2011;12(Suppl 1):S28.
Beta-site amyloid precursor protein cleaving enzyme (BACE-1) is a single-membrane protein that belongs to the aspartyl protease class of catabolic enzymes. This enzyme is involved in the processing of the amyloid precursor protein (APP). The cleavage of APP by BACE-1 is the rate-limiting step in the amyloid cascade leading to the production of two peptide fragments, Aβ40 and Aβ42. Of the two peptide fragments, Aβ42 is the primary species thought to be responsible for the neurotoxicity and amyloid plaque formation that lead to memory and cognitive defects in Alzheimer’s disease (AD). AD is a ravaging neurodegenerative disorder for which no disease-modifying treatment is currently available. Inhibition of BACE-1 is expected to stop amyloid plaque formation, and the enzyme has emerged as an interesting and attractive therapeutic target for AD.
A ligand-based computational approach was used to identify the molecular chemical features required for the inhibition of the BACE-1 enzyme. A training set of 20 compounds with known experimental activity was used to generate pharmacophore hypotheses using the 3D QSAR Pharmacophore Generation module available in Discovery Studio. The hypothesis was validated by four different methods and the best hypothesis was utilized in database screening of four chemical databases (Maybridge, Chembridge, NCI and Asinex). The retrieved hit compounds were subjected to a molecular docking study using the GOLD 4.1 program.
Among ten generated pharmacophore hypotheses, Hypo 1 was chosen as best pharmacophore hypothesis. Hypo 1 consists of one hydrogen bond donor, one positive ionizable, one ring aromatic and two hydrophobic features with high correlation coefficient of 0.977, highest cost difference of 121.98 bits and lowest RMSD value of 0.804. Hypo 1 was validated using Fischer randomization method, test set with a correlation coefficient of 0.917, leave-one-out method and decoy set with a goodness of hit score of 0.76. The validated Hypo 1 was used as a 3D query in database screening and retrieved 773 compounds with the estimated activity value <100 nM. These hits were docked into the active site of BACE-1 and further refined based on molecular interactions with the essential amino acids and good GOLD fitness score.
The best pharmacophore hypothesis, Hypo 1, with high predictive ability contains chemical features required for the effective inhibition of BACE-1. Using Hypo 1, we have identified two compounds with diverse chemical scaffolds as potential virtual leads which, as such or upon further optimization, can be used in the designing of new BACE-1 inhibitors.
PMCID: PMC3044283  PMID: 21342558
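The abstract reports a goodness of hit (GH) score from a decoy-set validation without stating the formula. The sketch below assumes the commonly used Güner-Henry definition, GH = [Ha(3A + Ht) / (4·Ht·A)] × [1 − (Ht − Ha)/(D − A)], where D is the database size, A the number of actives in it, Ht the number of hits retrieved and Ha the number of actives among those hits. The counts in the example are hypothetical and are not the values from this study.

```python
def goodness_of_hit(total, actives, hits, active_hits):
    """Güner-Henry goodness-of-hit (GH) score for a pharmacophore screen.

    total:       number of compounds in the decoy database (D)
    actives:     number of known actives in the database (A)
    hits:        number of compounds retrieved by the query (Ht)
    active_hits: number of known actives among the hits (Ha)
    """
    yield_of_actives = active_hits / hits      # Ha / Ht
    recall = active_hits / actives             # Ha / A
    # Penalty for retrieving decoys: 1 - (Ht - Ha) / (D - A)
    false_positive_penalty = 1.0 - (hits - active_hits) / (total - actives)
    # Ha(3A + Ht) / (4 Ht A) equals 0.75 * Ha/Ht + 0.25 * Ha/A
    return (0.75 * yield_of_actives + 0.25 * recall) * false_positive_penalty


# Hypothetical decoy-set counts, for illustration only.
gh = goodness_of_hit(total=500, actives=10, hits=12, active_hits=9)
print(round(gh, 3))
```

A score of 1 corresponds to retrieving all actives and nothing else, and a score near 0 to a query that is no better than random selection.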
18.  Investigating the correlations among the chemical structures, bioactivity profiles and molecular targets of small molecules 
Bioinformatics  2010;26(22):2881-2888.
Motivation: Most of the previous data mining studies based on the NCI-60 dataset, due to its intrinsic cell-based nature, can hardly provide insights into the molecular targets for screened compounds. On the other hand, the abundant information of the compound–target associations in PubChem can offer extensive experimental evidence of molecular targets for tested compounds. Therefore, by taking advantages of the data from both public repositories, one may investigate the correlations between the bioactivity profiles of small molecules from the NCI-60 dataset (cellular level) and their patterns of interactions with relevant protein targets from PubChem (molecular level) simultaneously.
Results: We investigated a set of 37 small molecules by providing links among their bioactivity profiles, protein targets and chemical structures. Hierarchical clustering of compounds was carried out based on their bioactivity profiles. We found that compounds were clustered into groups with similar mode of actions, which strongly correlated with chemical structures. Furthermore, we observed that compounds similar in bioactivity profiles also shared similar patterns of interactions with relevant protein targets, especially when chemical structures were related. The current work presents a new strategy for combining and data mining the NCI-60 dataset and PubChem. This analysis shows that bioactivity profile comparison can provide insights into the mode of actions at the molecular level, thus will facilitate the knowledge-based discovery of novel compounds with desired pharmacological properties.
Availability: The bioactivity profiling data and the target annotation information are publicly available in the PubChem BioAssay database (
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2971579  PMID: 20947527
19.  Discovery of Selective Probes and Antagonists for G Protein-Coupled Receptors FPR/FPRL1 and GPR30 
Recent technological advances in flow cytometry provide a versatile platform for high throughput screening of compound libraries coupled with high-content biological testing and drug discovery. The G protein-coupled receptors (GPCRs) constitute the largest class of signaling molecules in the human genome with frequent roles in disease pathogenesis, yet many examples of orphan receptors with unknown ligands remain. The complex biology and potential for drug discovery within this class provide strong incentives for chemical biology approaches seeking to develop small molecule probes to facilitate elucidation of mechanistic pathways and enable specific manipulation of the activity of individual receptors. We have initiated small molecule probe development projects targeting two distinct families of GPCRs: the formylpeptide receptors (FPR/FPRL1) and G protein-coupled estrogen receptor (GPR30). In each case the assay for compound screening involved the development of an appropriate small molecule fluorescent probe, and the flow cytometry platform provided inherently biological rich assays that enhanced the process of identification and optimization of novel antagonists. The contributions of cheminformatics analysis tools, virtual screening, and synthetic chemistry in synergy with the biomolecular screening program have yielded valuable new chemical probes with high binding affinity, selectivity for the targeted receptor, and potent antagonist activity. This review describes the discovery of novel small molecule antagonists of FPR and FPRL1, and GPR30, and the associated characterization process involving secondary assays, cell based and in vivo studies to define the selectivity and activity of the resulting chemical probes
PMCID: PMC2885834  PMID: 19807662
flow cytometry; fluorescent; GPCR; formylpeptide receptor; inflammation; GPR30; GPER; estrogen; nongenomic; cancer; antidepressant
20.  Target-based vs. phenotypic screenings in Leishmania drug discovery: A marriage of convenience or a dialogue of the deaf? 
Graphical abstract
•Target-based vs. target-free screenings are in the pipeline of drug discovery.
•High Content Screenings have greater acceptance for drug discovery studies.
•In vivo imaging of transgenic parasites is suitable for pre-clinical trials.
Drug discovery programs sponsored by public or private initiatives pursue the same ambitious goal: a crushing defeat of major Neglected Tropical Diseases (NTDs) during this decade. Both target-based and target-free screenings have pros and cons when it comes to finding potential small-molecule leads among chemical libraries consisting of myriads of compounds. Within the target-based strategy, crystals of pathogen recombinant-proteins are being used to obtain three-dimensional (3D) structures in silico for the discovery of structure-based inhibitors. On the other hand, genetically modified parasites expressing easily detectable reporters are in the pipeline of target-free (phenotypic) screenings. Furthermore, lead compounds can be scaled up to in vivo preclinical trials using rodent models of infection monitoring parasite loads by means of cutting-edge bioimaging devices. As such, those preferred are fluorescent and bioluminescent readouts due to their reproducibility and rapidity, which reduces the number of animals used in the trials and allows for an earlier stage detection of the infective process as compared with classical methods. In this review, we focus on the current differences between target-based and phenotypic screenings in Leishmania, as an approach that leads to the discovery of new potential drugs against leishmaniasis.
PMCID: PMC4266804  PMID: 25516847
NTD, Neglected Tropical Diseases; TDR, Special Programme for Research and Training in Tropical Diseases; HTS, high throughput screening; HCS, High Content Screening; Leishmania; Phenotypic HTS; Target HTS; Drug discovery
21.  MS-DOCK: Accurate multiple conformation generator and rigid docking protocol for multi-step virtual ligand screening 
BMC Bioinformatics  2008;9:184.
The number of protein targets with a known or predicted three-dimensional structure and of drug-like chemical compounds is growing rapidly, and so is the need for new therapeutic compounds or chemical probes. Performing flexible structure-based virtual screening computations on thousands of targets with millions of molecules is neither tractable for most laboratories nor indeed desirable. Since shape complementarity is of primary importance for most protein-ligand interactions, we have developed a tool/protocol based on rigid-body docking to select compounds that fit well into binding sites.
Here we present an efficient multiple conformation rigid-body docking approach, MS-DOCK, which is based on the program DOCK. This approach can be used as the first step of a multi-stage docking/scoring protocol. First, we developed and validated the Multiconf-DOCK tool that generates several conformers per input ligand. Then, each generated conformer (bioactives and 37970 decoys) was docked rigidly using DOCK6 with our optimized protocol into seven different receptor-binding sites. MS-DOCK was able to significantly reduce the size of the initial input library for all seven targets, thereby facilitating subsequent more CPU demanding flexible docking procedures.
MS-DOCK can be easily used for the generation of multi-conformer libraries and for shape-based filtering within a multi-step structure-based screening protocol in order to shorten computation times.
PMCID: PMC2373571  PMID: 18402678
22.  Ligand scaffold hopping combining 3D maximal substructure search and molecular similarity 
BMC Bioinformatics  2009;10:245.
Virtual screening methods are now well established as effective to identify hit and lead candidates and are fully integrated in most drug discovery programs. Ligand-based approaches make use of physico-chemical, structural and energetics properties of known active compounds to search large chemical libraries for related and novel chemotypes. While 2D-similarity search tools are known to be fast and efficient, the use of 3D-similarity search methods can be very valuable to many research projects as integration of "3D knowledge" can facilitate the identification of not only related molecules but also of chemicals possessing distant scaffolds as compared to the query and therefore be more inclined to scaffolds hopping. To date, very few methods performing this task are easily available to the scientific community.
We introduce a new approach (LigCSRre) to the 3D ligand similarity search of drug candidates. It combines a 3D maximum common substructure search algorithm independent of atom order with a tunable description of atomic compatibilities to prune the search and increase its physico-chemical relevance. We show, on 47 experimentally validated active compounds across five protein targets having different specificities, that for single compound search, the approach is able to recover on average 52% of the co-actives in the top 1% of the ranked list, which is better than gold standards of the field. Moreover, the combination of several runs on a single protein target using different query active compounds shows a remarkable improvement in enrichment. Such results demonstrate LigCSRre as a valuable tool for ligand-based screening.
LigCSRre constitutes a new efficient and generic approach to the 3D similarity screening of small compounds, whose flexible design opens the door to many enhancements. The program is freely available to the academics for non-profit research at: .
PMCID: PMC2739202  PMID: 19671127
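The enrichment figure quoted in this abstract (share of co-actives recovered in the top 1% of a ranked list) is a simple quantity to compute. The sketch below shows one way to do it. The function name, the rounding-up of the cutoff and the compound identifiers are assumptions of this example, not details of LigCSRre.

```python
import math


def recovery_at(ranked_ids, active_ids, fraction=0.01):
    """Fraction of known actives found in the top `fraction` of a ranked list.

    ranked_ids: compound IDs ordered from best to worst score.
    active_ids: IDs of the known actives hidden in the library.
    """
    # Round the cutoff up so that very small libraries still keep one compound.
    n_top = max(1, math.ceil(fraction * len(ranked_ids)))
    found = set(ranked_ids[:n_top]) & set(active_ids)
    return len(found) / len(active_ids)


# Hypothetical ranked library of 200 compounds with four known actives.
ranked = [f"c{i}" for i in range(200)]
actives = {"c0", "c1", "c7", "c150"}
recovery = recovery_at(ranked, actives, fraction=0.01)
print(recovery)
```

In this made-up ranking the top 1% holds two compounds, both of them actives, so half of the four actives are recovered.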
23.  FAF-Drugs2: Free ADME/tox filtering tool to assist drug discovery and chemical biology projects 
BMC Bioinformatics  2008;9:396.
Drug discovery and chemical biology are exceedingly complex and demanding enterprises. In recent years there has been increasing awareness about the importance of predicting/optimizing the absorption, distribution, metabolism, excretion and toxicity (ADMET) properties of small chemical compounds along the search process rather than at the final stages. Fast methods for evaluating ADMET properties of small molecules often involve applying a set of simple empirical rules (educated guesses) and as such, compound collections' property profiling can be performed in silico. Clearly, these rules cannot assess the full complexity of the human body but can provide valuable information and assist decision-making.
This paper presents FAF-Drugs2, a free adaptable tool for ADMET filtering of electronic compound collections. FAF-Drugs2 is a command line utility program (written in Python) based on the open source chemistry toolkit OpenBabel, which performs various physicochemical calculations and identifies key functional groups as well as some toxic and unstable molecules/functional groups. In addition to filtered collections, FAF-Drugs2 can provide, via Gnuplot, several distribution diagrams of major physicochemical properties of the screened compound libraries.
We have developed FAF-Drugs2 to facilitate compound collection preparation, prior to (or after) experimental screening or virtual screening computations. Users can select to apply various filtering thresholds and add rules as needed for a given project. As it stands, FAF-Drugs2 implements numerous filtering rules (23 physicochemical rules and 204 substructure searching rules) that can be easily tuned.
PMCID: PMC2561050  PMID: 18816385
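Rule-based property filtering of the kind described here can be sketched in a few lines. This example is not FAF-Drugs2 code and does not call OpenBabel: it assumes the physicochemical properties have already been computed, and the rule table, thresholds (chosen in the spirit of Lipinski-type rules) and compound records are all hypothetical.

```python
# Each rule maps a property name to an allowed (minimum, maximum) range;
# None means "no bound". Thresholds are illustrative, not tool defaults.
RULES = {
    "mol_weight": (None, 500.0),
    "logp": (None, 5.0),
    "h_bond_donors": (None, 5),
    "h_bond_acceptors": (None, 10),
}


def violations(props, rules=RULES):
    """Return the names of the rules that a compound breaks."""
    failed = []
    for name, (low, high) in rules.items():
        value = props[name]
        if (low is not None and value < low) or (high is not None and value > high):
            failed.append(name)
    return failed


def filter_library(library, rules=RULES, max_violations=0):
    """Keep compounds that break at most `max_violations` rules."""
    return [
        cid for cid, props in library.items()
        if len(violations(props, rules)) <= max_violations
    ]


# Hypothetical precomputed properties for three compounds.
library = {
    "cmpd_1": {"mol_weight": 320.4, "logp": 2.1, "h_bond_donors": 2, "h_bond_acceptors": 5},
    "cmpd_2": {"mol_weight": 612.7, "logp": 6.3, "h_bond_donors": 1, "h_bond_acceptors": 8},
    "cmpd_3": {"mol_weight": 480.0, "logp": 5.4, "h_bond_donors": 3, "h_bond_acceptors": 9},
}
kept = filter_library(library)
print(kept)
```

Keeping the rules in a plain table mirrors the tunable thresholds the abstract mentions: relaxing `max_violations` to 1 would also admit cmpd_3 in this toy library.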
24.  A practical Java tool for small-molecule compound appraisal 
The increased use of small-molecule compound screening by new users from a variety of different academic backgrounds calls for adequate software to administer, appraise, analyse and exchange information obtained from screening experiments. While software and spreadsheet solutions exist, there is a need for software that can be easily deployed and is convenient to use.
The Java application cApp addresses this need and aids in the handling and storage of information on small-molecule compounds. The software is intended for the appraisal of compounds with respect to their physico-chemical properties, analysis in relation to adherence to likeness rules as well as recognition of pan-assay interference components and cross-linking with identical entries in the PubChem Compound Database. Results are displayed in a tabular form in a graphical interface, but can also be written in an HTML or PDF format. The output of data in ASCII format allows for further processing of data using other suitable programs. Other features include similarity searches against user-provided compound libraries and the PubChem Compound Database, as well as compound clustering based on a MaxMin algorithm.
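The abstract does not describe how cApp implements its MaxMin-based clustering, so the following is only a generic sketch of the MaxMin idea: greedily pick the compound whose minimum distance to the already picked compounds is largest, then (as an assumption made here) assign every compound to its most similar pick. Fingerprints are represented as Python sets of on-bit indices, and the function names `tanimoto`, `maxmin_pick` and `assign_to_seeds` are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def maxmin_pick(fingerprints, n_picks, first=0):
    """Greedy MaxMin selection: return indices of up to n_picks diverse compounds."""
    if not fingerprints or n_picks <= 0:
        return []
    picked = [first]
    # Distance from every compound to its nearest picked compound so far.
    min_dist = [1.0 - tanimoto(fp, fingerprints[first]) for fp in fingerprints]
    while len(picked) < min(n_picks, len(fingerprints)):
        remaining = [i for i in range(len(fingerprints)) if i not in picked]
        # Choose the unpicked compound farthest from the current selection.
        candidate = max(remaining, key=lambda i: min_dist[i])
        picked.append(candidate)
        for i, fp in enumerate(fingerprints):
            d = 1.0 - tanimoto(fp, fingerprints[candidate])
            if d < min_dist[i]:
                min_dist[i] = d
    return picked


def assign_to_seeds(fingerprints, seeds):
    """Group every compound with its most similar seed (assumed clustering step)."""
    clusters = {s: [] for s in seeds}
    for i, fp in enumerate(fingerprints):
        nearest = max(seeds, key=lambda s: tanimoto(fp, fingerprints[s]))
        clusters[nearest].append(i)
    return clusters


# Toy fingerprints: two pairs of similar compounds.
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10}]
seeds = maxmin_pick(fps, 2)
clusters = assign_to_seeds(fps, seeds)
print(seeds, clusters)
```

The greedy update keeps only one running minimum distance per compound, so each new pick costs a single pass over the library rather than a full pairwise comparison.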
cApp is a personal database solution for small-molecule compounds which can handle all major chemical formats. Being a standalone software, it has no other dependency than the Java virtual machine and is thus conveniently deployed. It streamlines the analysis of molecules with respect to physico-chemical properties and drug discovery criteria; cApp is distributed under the GNU Affero General Public License version 3 and available from To download cApp, users will be asked for their name, institution and email address. A detailed manual can also be downloaded from this site, and online tutorials are available at
Electronic supplementary material
The online version of this article (doi:10.1186/s13321-015-0079-1) contains supplementary material, which is available to authorized users.
PMCID: PMC4469258  PMID: 26082805
Compound appraisal; Molecular properties; Personal database
