Related Articles

1.  AMMOS: Automated Molecular Mechanics Optimization tool for in silico Screening 
BMC Bioinformatics  2008;9:438.
Background
Virtual or in silico ligand screening combined with other computational methods is one of the most promising methods to search for new lead compounds, thereby greatly assisting the drug discovery process. Despite considerable progress in virtual screening methodologies, available computer programs do not easily address problems such as structural optimization of compounds in a screening library, receptor flexibility/induced fit, and accurate prediction of protein-ligand interactions. It has been shown that structural optimization of chemical compounds and post-docking optimization in multi-step structure-based virtual screening approaches help to further improve the overall efficiency of the methods. To address some of these points, we developed the program AMMOS for refining both the 3D structures of the small molecules present in chemical libraries and the predicted receptor-ligand complexes by allowing partial to full atom flexibility through molecular mechanics optimization.
Results
The program AMMOS carries out an automatic procedure that allows for the structural refinement of compound collections and energy minimization of protein-ligand complexes using the open source program AMMP. The performance of our package was evaluated by comparing the structures of small chemical entities minimized by AMMOS with those minimized with the Tripos and MMFF94s force fields. Next, AMMOS was used for full flexible minimization of protein-ligand complexes obtained from a multi-step virtual screening. Enrichment studies of the selected pre-docked complexes containing 60% of the initially added inhibitors were carried out with or without final AMMOS minimization on two protein targets having different binding pocket properties. AMMOS was able to improve the enrichment after the pre-docking stage with 40 to 60% of the initially added active compounds found in the top 3% to 5% of the entire compound collection.
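The enrichment figures quoted above (the share of seeded actives recovered in the top few percent of a ranked library) can be illustrated with a minimal sketch. The function name, compound identifiers and toy ranking below are hypothetical and are not taken from AMMOS; the code only shows how such a percentage might be computed from a best-first ranking.

```python
def fraction_of_actives_recovered(ranked_ids, active_ids, top_fraction):
    """Share of known actives that appear in the top `top_fraction` of a ranking."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    actives = set(active_ids)
    if not actives:
        raise ValueError("at least one active compound is required")
    # Number of compounds kept from the top of the list (at least one).
    cutoff = max(1, round(len(ranked_ids) * top_fraction))
    found = sum(1 for cid in ranked_ids[:cutoff] if cid in actives)
    return found / len(actives)


# Toy library (hypothetical): 100 compounds ranked best-first, 5 seeded actives.
ranked = [f"cmpd{i:03d}" for i in range(100)]
actives = ["cmpd001", "cmpd003", "cmpd010", "cmpd040", "cmpd090"]

recovered = fraction_of_actives_recovered(ranked, actives, 0.05)
print(f"{recovered:.0%} of the seeded actives are in the top 5%")
```

Running the same function on rankings produced with and without a final minimization step would give the kind of before/after comparison the abstract describes.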
Conclusion
The open source AMMOS program can be helpful in a broad range of in silico drug design studies such as optimization of small molecules or energy minimization of pre-docked protein-ligand complexes. Our enrichment study suggests that AMMOS, designed to minimize a large number of ligands pre-docked in a protein target, can successfully be applied in a final post-processing step and that it can take into account some receptor flexibility within the binding site area.
doi:10.1186/1471-2105-9-438
PMCID: PMC2588602  PMID: 18925937
2.  Analysis of multiple compound–protein interactions reveals novel bioactive molecules 
The authors use machine learning of compound-protein interactions to explore drug polypharmacology and to efficiently identify bioactive ligands, including novel scaffold-hopping compounds for two pharmaceutically important protein families: G-protein coupled receptors and protein kinases.
We have demonstrated that machine learning of multiple compound–protein interactions is useful for efficient ligand screening and for assessing drug polypharmacology. This approach successfully identified novel scaffold-hopping compounds for two pharmaceutically important protein families: G-protein-coupled receptors and protein kinases. These bioactive compounds were not detected by existing computational ligand-screening methods in comparative studies. The results of this study indicate that data derived from chemical genomics can be highly useful for exploring chemical space, and this systems biology perspective could accelerate drug discovery processes.
The discovery of novel bioactive molecules advances our systems-level understanding of biological processes and is crucial for innovation in drug development. Perturbations of biological systems by chemical probes provide broader applications not only for analysis of complex systems but also for intentional manipulations of these systems. Nevertheless, the lack of well-characterized chemical modulators has limited their use. Recently, chemical genomics has emerged as a promising area of research applicable to the exploration of novel bioactive molecules, and researchers are currently striving toward the identification of all possible ligands for all target protein families (Wang et al, 2009). Chemical genomics studies have shown that patterns of compound–protein interactions (CPIs) are too diverse to be understood as simple one-to-one events. There is an urgent need to develop appropriate data mining methods for characterizing and visualizing the full complexity of interactions between chemical space and biological systems. However, no existing screening approach has so far succeeded in identifying novel bioactive compounds using multiple interactions among compounds and target proteins.
High-throughput screening (HTS) and computational screening have greatly aided in the identification of early lead compounds for drug discovery. However, the large number of assays required for HTS to identify drugs that target multiple proteins renders this process very costly and time-consuming. Therefore, interest in using in silico strategies for screening has increased. The most common computational approaches, ligand-based virtual screening (LBVS) and structure-based virtual screening (SBVS; Oprea and Matter, 2004; Muegge and Oloff, 2006; McInnes, 2007; Figure 1A), have been used for practical drug development. LBVS aims to identify molecules that are very similar to known active molecules and generally has difficulty identifying compounds with novel structural scaffolds that differ from reference molecules. The other popular strategy, SBVS, is constrained by the number of three-dimensional crystallographic structures available. To circumvent these limitations, we have shown that a new computational screening strategy, chemical genomics-based virtual screening (CGBVS), has the potential to identify novel, scaffold-hopping compounds and assess their polypharmacology by using a machine-learning method to recognize conserved molecular patterns in comprehensive CPI data sets.
The CGBVS strategy used in this study was made up of five steps: CPI data collection, descriptor calculation, representation of interaction vectors, predictive model construction using training data sets, and predictions from test data (Figure 1A). Importantly, step 1, the construction of a data set of chemical structures and protein sequences for known CPIs, did not require the three-dimensional protein structures needed for SBVS. In step 2, compound structures and protein sequences were converted into numerical descriptors. These descriptors were used to construct chemical or biological spaces in which decreasing distance between vectors corresponded to increasing similarity of compound structures or protein sequences. In step 3, we represented multiple CPI patterns by concatenating these chemical and protein descriptors. Using these interaction vectors, we could quantify the similarity of molecular interactions for compound–protein pairs, despite the fact that the ligand and protein similarity maps differed substantially. In step 4, concatenated vectors for CPI pairs (positive samples) and non-interacting pairs (negative samples) were input into an established machine-learning method. In the final step, the classifier constructed using training sets was applied to test data.
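Steps 3 to 5 above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the descriptor values are invented toy numbers, all function names are hypothetical, and a 1-nearest-neighbour rule stands in for the "established machine-learning method", which the text does not name.

```python
import math


def interaction_vector(compound_desc, protein_desc):
    """Step 3: concatenate compound and protein descriptors into one vector."""
    return list(compound_desc) + list(protein_desc)


def train(positive_pairs, negative_pairs):
    """Step 4 (stand-in): store labelled interaction vectors for a 1-NN rule."""
    model = [(interaction_vector(c, p), True) for c, p in positive_pairs]
    model += [(interaction_vector(c, p), False) for c, p in negative_pairs]
    return model


def predict(model, compound_desc, protein_desc):
    """Step 5: label a new pair with the label of its nearest training vector."""
    query = interaction_vector(compound_desc, protein_desc)
    _, label = min(model, key=lambda item: math.dist(query, item[0]))
    return label


# Toy descriptors (hypothetical): two compound classes and two protein classes.
compound_a, compound_b = [1.0, 0.0], [0.0, 1.0]
protein_x, protein_y = [1.0, 0.0], [0.0, 1.0]

model = train(
    positive_pairs=[(compound_a, protein_x), (compound_b, protein_y)],
    negative_pairs=[(compound_a, protein_y), (compound_b, protein_x)],
)

likely_binder = predict(model, [0.9, 0.1], [0.9, 0.1])   # resembles (a, x)
likely_nonbinder = predict(model, [0.9, 0.1], [0.1, 0.9])  # resembles (a, y)
print(likely_binder, likely_nonbinder)
```

In this toy set neither the compound nor the protein alone determines the label; only the concatenated pair does, which is the point of representing interactions rather than ligands or targets in isolation.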
To evaluate the predictive value of CGBVS, we first compared its performance with that of LBVS by fivefold cross-validation. CGBVS performed with considerably higher accuracy (91.9%) than did LBVS (84.4%; Figure 1B). We next compared CGBVS and SBVS in a retrospective virtual screening based on the human β2-adrenergic receptor (ADRB2). Figure 1C shows that CGBVS provided higher hit rates than did SBVS. These results suggest that CGBVS is more successful than conventional approaches for prediction of CPIs.
We then evaluated the ability of the CGBVS method to predict the polypharmacology of ADRB2 by attempting to identify novel ADRB2 ligands from a group of G-protein-coupled receptor (GPCR) ligands. We ranked the prediction scores for the interactions of 826 reported GPCR ligands with ADRB2 and then analyzed the 50 highest-ranked compounds in greater detail. Of 21 commercially available compounds, 11 showed ADRB2-binding activity and were not previously reported to be ADRB2 ligands. These compounds included ligands not only for aminergic receptors but also for neuropeptide Y-type 1 receptors (NPY1R), which have low protein homology to ADRB2. Most ligands we identified were not detected by LBVS and SBVS, which suggests that only CGBVS could identify this unexpected cross-reaction for a ligand developed as a target to a peptidergic receptor.
The true value of CGBVS in drug discovery must be tested by assessing whether this method can identify scaffold-hopping lead compounds from a set of compounds that is structurally more diverse. To assess this ability, we analyzed 11 500 commercially available compounds to predict compounds likely to bind to two GPCRs and two protein kinases. Functional assays revealed that nine ADRB2 ligands, three NPY1R ligands, five epidermal growth factor receptor (EGFR) inhibitors, and two cyclin-dependent kinase 2 (CDK2) inhibitors were concentrated in the top-ranked compounds (hit rate=30, 15, 25, and 10%, respectively). We also evaluated the extent of scaffold hopping achieved in the identification of these novel ligands. One ADRB2 ligand, two NPY1R ligands, and one CDK2 inhibitor exhibited scaffold hopping (Figure 4), indicating that CGBVS can use this characteristic to rationally predict novel lead compounds, a crucial and very difficult step in drug discovery. This feature of CGBVS is critically different from existing predictive methods, such as LBVS, which depend on similarities between test and reference ligands, and focus on a single protein or highly homologous proteins. In particular, CGBVS is useful for targets with undefined ligands because this method can use CPIs with target proteins that exhibit lower levels of homology.
In summary, we have demonstrated that data mining of multiple CPIs is of great practical value for exploration of chemical space. As a predictive model, CGBVS could provide an important step in the discovery of such multi-target drugs by identifying the group of proteins targeted by a particular ligand, leading to innovation in pharmaceutical research.
The discovery of novel bioactive molecules advances our systems-level understanding of biological processes and is crucial for innovation in drug development. For this purpose, the emerging field of chemical genomics is currently focused on accumulating large assay data sets describing compound–protein interactions (CPIs). Although new target proteins for known drugs have recently been identified through mining of CPI databases, using these resources to identify novel ligands remains unexplored. Herein, we demonstrate that machine learning of multiple CPIs can not only assess drug polypharmacology but can also efficiently identify novel bioactive scaffold-hopping compounds. Through a machine-learning technique that uses multiple CPIs, we have successfully identified novel lead compounds for two pharmaceutically important protein families, G-protein-coupled receptors and protein kinases. These novel compounds were not identified by existing computational ligand-screening methods in comparative studies. The results of this study indicate that data derived from chemical genomics can be highly useful for exploring chemical space, and this systems biology perspective could accelerate drug discovery processes.
doi:10.1038/msb.2011.5
PMCID: PMC3094066  PMID: 21364574
chemical genomics; data mining; drug discovery; ligand screening; systems chemical biology
3.  FAF-Drugs: free ADME/tox filtering of compound collections 
Nucleic Acids Research  2006;34(Web Server issue):W738-W744.
In silico screening based on the structures of the ligands or of the receptors has become an essential tool to facilitate the drug discovery process but compound collections are needed to carry out such in silico experiments. It has been recognized that absorption, distribution, metabolism, excretion and toxicity (ADME/tox) are key properties that need to be considered early on, even during the database preparation stage. FAF-Drugs is an online service based on Frowns (a chemoinformatics toolkit) that allows users to process their own compound collections via simple ADME/Tox filtering rules such as molecular weight, polar surface area, logP or number of rotatable bonds. SMILES (Simplified Molecular Input Line Entry System), CANSMILES (canonical smiles) or SDF (structure data file) files are required as input and molecules that pass or do not pass the filters are sent back in CANSMILES format. This service should thus help scientists engaging in drug discovery campaigns. Other utilities and several compound collections suitable for in silico screening are available at our site. FAF-Drugs can be accessed at .
doi:10.1093/nar/gkl065
PMCID: PMC1538885  PMID: 16845110
4.  Frog: a FRee Online druG 3D conformation generator 
Nucleic Acids Research  2007;35(Web Server issue):W568-W572.
In silico screening methods based on the 3D structures of the ligands or of the proteins have become an essential tool to facilitate the drug discovery process. To achieve such process, the 3D structures of the small chemical compounds have to be generated. In addition, for ligand-based screening computations or hierarchical structure-based screening projects involving a rigid-body docking step, it is necessary to generate multi-conformer 3D models for each input ligand to increase the efficiency of the search. However, most academic or commercial compound collections are delivered in 1D SMILES (simplified molecular input line entry system) format or in 2D SDF (structure data file), highlighting the need for free 1D/2D to 3D structure generators. Frog is an on-line service aimed at generating 3D conformations for drug-like compounds starting from their 1D or 2D descriptions. Given the atomic constitution of the molecules and connectivity information, Frog can identify the different unambiguous isomers corresponding to each compound, and generate single or multiple low-to-medium energy 3D conformations, using an assembly process that does not presently consider ring flexibility. Tests show that Frog is able to generate bioactive conformations close to those observed in crystallographic complexes. Frog can be accessed at http://bioserv.rpbs.jussieu.fr/Frog.html.
doi:10.1093/nar/gkm289
PMCID: PMC1933180  PMID: 17485475
5.  PDTD: a web-accessible protein database for drug target identification 
BMC Bioinformatics  2008;9:104.
Background
Target identification is important for modern drug discovery. With the advances in the development of molecular docking, potential binding proteins may be discovered by docking a small molecule to a repository of proteins with three-dimensional (3D) structures. To complete this task, a reverse docking program and a drug target database with 3D structures are necessary. To this end, we have developed a web server tool, TarFisDock (Target Fishing Docking), which has been used widely by others. Recently, we have constructed a protein target database, Potential Drug Target Database (PDTD), and have integrated PDTD with TarFisDock. This combination aims to assist target identification and validation.
Description
PDTD is a web-accessible protein database for in silico target identification. It currently contains >1100 protein entries with 3D structures presented in the Protein Data Bank. The data are extracted from the literature and several online databases such as TTD, DrugBank and Thomson Pharma. The database covers diverse information on >830 known or potential drug targets, including protein and active-site structures in both PDB and mol2 formats, related diseases, biological functions, and associated regulating (signaling) pathways. Each target is categorized by both nosology and biochemical function. PDTD supports keyword searches, for example by PDB ID, target name, or disease name. Data sets generated by PDTD can be viewed with plug-ins of molecular visualization tools and can be downloaded freely. Notably, PDTD is specially designed for target identification. In conjunction with TarFisDock, PDTD can be used to identify binding proteins for small molecules. The results can be downloaded as a mol2 file with the binding pose of the probe compound and a list of potential binding targets ordered by their ranking scores.
Conclusion
PDTD serves as a comprehensive and unique repository of drug targets. Integrated with TarFisDock, PDTD is a useful resource to identify binding proteins for active compounds or existing drugs. Its potential applications include in silico drug target identification, virtual screening, and the discovery of the secondary effects of an old drug (i.e. new pharmacological usage) or an existing target (i.e. new pharmacological or toxic relevance), thus it may be a valuable platform for the pharmaceutical researchers. PDTD is available online at .
doi:10.1186/1471-2105-9-104
PMCID: PMC2265675  PMID: 18282303
6.  Learning a peptide-protein binding affinity predictor with kernel ridge regression 
BMC Bioinformatics  2013;14:82.
Background
The cellular function of a vast majority of proteins is performed through physical interactions with other biomolecules, which, most of the time, are other proteins. Peptides represent templates of choice for mimicking a secondary structure in order to modulate protein-protein interaction. They are thus an interesting class of therapeutics since they also display strong activity, high selectivity, low toxicity and few drug-drug interactions. Furthermore, predicting peptides that would bind to specific MHC alleles would be of tremendous benefit to improve vaccine-based therapy and possibly generate antibodies with greater affinity. Modern computational methods have the potential to accelerate and lower the cost of drug and vaccine discovery by selecting potential compounds for testing in silico prior to biological validation.
Results
We propose a specialized string kernel for small bio-molecules, peptides and pseudo-sequences of binding interfaces. The kernel incorporates physico-chemical properties of amino acids and elegantly generalizes eight kernels, including the Oligo, the Weighted Degree, the Blended Spectrum, and the Radial Basis Function kernels. We provide a low complexity dynamic programming algorithm for the exact computation of the kernel and a linear time algorithm for its approximation. Combined with kernel ridge regression and SupCK, a novel binding pocket kernel, the proposed kernel yields biologically relevant and good prediction accuracy on the PepX database. For the first time, a machine learning predictor is capable of predicting the binding affinity of any peptide to any protein with reasonable accuracy. The method was also applied to both single-target and pan-specific Major Histocompatibility Complex class II benchmark datasets and three Quantitative Structure Affinity Model benchmark datasets.
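For readers unfamiliar with kernel ridge regression, the following minimal sketch shows the generic technique: solve (K + λI)α = y on the training set, then predict with a kernel-weighted sum. It is not the proposed string kernel or SupCK. As labelled assumptions, a plain Radial Basis Function kernel on numeric feature vectors is swapped in, and the data, function names and the values of `lam` and `gamma` are invented for illustration.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Radial Basis Function kernel between two numeric feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit(X, y, lam=1e-6, gamma=1.0):
    """Return dual coefficients alpha solving (K + lam * I) alpha = y."""
    K = [[rbf_kernel(a, b, gamma) + (lam if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    return solve(K, y)


def predict(X_train, alpha, x, gamma=1.0):
    """Predicted value: kernel-weighted sum over the training examples."""
    return sum(a * rbf_kernel(xi, x, gamma) for a, xi in zip(alpha, X_train))


# Toy data (hypothetical): one feature per example, one affinity-like target.
X_train = [[0.0], [1.0], [2.0]]
y_train = [0.0, 1.0, 4.0]
alpha = fit(X_train, y_train)
fitted = [predict(X_train, alpha, x) for x in X_train]
print([round(v, 3) for v in fitted])
```

Swapping `rbf_kernel` for any other positive semi-definite kernel, such as a string kernel over peptide sequences, leaves `fit` and `predict` unchanged, which is what makes the kernel the main design choice in this family of models.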
Conclusion
On all benchmarks, our method significantly (p-value ≤ 0.057) outperforms the current state-of-the-art methods at predicting peptide-protein binding affinities. The proposed approach is flexible and can be applied to predict any quantitative biological activity. Moreover, generating reliable peptide-protein binding affinities will also improve systems biology modelling of interaction pathways. Lastly, the method should be of value to a large segment of the research community with the potential to accelerate the discovery of peptide-based drugs and facilitate vaccine development. The proposed kernel is freely available at http://graal.ift.ulaval.ca/downloads/gs-kernel/.
doi:10.1186/1471-2105-14-82
PMCID: PMC3651388  PMID: 23497081
7.  Cyndi: a multi-objective evolution algorithm based method for bioactive molecular conformational generation 
BMC Bioinformatics  2009;10:101.
Background
Conformation generation is a ubiquitous problem in molecular modelling. Many applications require sampling the broad molecular conformational space or perceiving the bioactive conformers to ensure success. Numerous in silico methods have been proposed in an attempt to resolve the problem, ranging from deterministic to non-deterministic and systematic to stochastic ones. In this work, we describe an efficient conformation sampling method named Cyndi, which is based on a multi-objective evolution algorithm.
Results
The conformational perturbation is subjected to evolutionary operation on the genome encoded with dihedral torsions. Various objectives are designated to render the generated Pareto optimal conformers energy-favoured as well as evenly scattered across the conformational space. An optional objective concerning the degree of molecular extension is added to achieve geometrically extended or compact conformations, which have been observed to impact the molecular bioactivity (J Comput-Aided Mol Des 2002, 16: 105–112). Testing the performance of Cyndi against a test set consisting of 329 small molecules reveals an average minimum RMSD of 0.864 Å to the corresponding bioactive conformations, indicating that Cyndi is highly competitive against other conformation generation methods. Meanwhile, the high-speed performance (0.49 ± 0.18 seconds per molecule) makes Cyndi a practical toolkit for conformational database preparation and facilitates subsequent pharmacophore mapping or rigid docking. A copy of the precompiled executable of Cyndi and the test set molecules in mol2 format are accessible in Additional file 1.
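The notion of Pareto optimal conformers can be made concrete with a small sketch of non-dominated filtering. This is not Cyndi's algorithm: the two objectives (an energy and a "crowding" score, both to be minimized), the conformer names and all numbers are hypothetical, and the evolutionary operators are omitted entirely.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and better on one.

    All objectives are assumed to be minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates):
    """Names of candidates that no other candidate dominates."""
    front = []
    for name, objectives in candidates.items():
        dominated = any(
            dominates(other, objectives)
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical conformers: (energy, crowding); lower is better for both.
conformers = {
    "conf_a": (1.0, 0.9),
    "conf_b": (2.0, 0.4),
    "conf_c": (3.0, 0.1),
    "conf_d": (2.5, 0.8),  # worse than conf_b on both objectives
}

front = pareto_front(conformers)
print(front)
```

A multi-objective evolutionary method would repeat a filtering step of this kind over successive generations, keeping trade-off solutions rather than a single lowest-energy conformer.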
Conclusion
On the basis of the MOEA algorithm, we present a new, highly efficient conformation generation method, Cyndi, and report the results of validation and performance studies comparing it with four other methods. The results reveal that Cyndi is capable of generating geometrically diverse conformers and outperforms four other multiple conformer generators in reproducing the bioactive conformations of 329 structures. The speed advantage indicates that Cyndi is a powerful alternative method for extensive conformational sampling and large-scale conformer database preparation.
doi:10.1186/1471-2105-10-101
PMCID: PMC2678094  PMID: 19335906
8.  AfroDb: A Select Highly Potent and Diverse Natural Product Library from African Medicinal Plants 
PLoS ONE  2013;8(10):e78085.
Computer-aided drug design (CADD) often involves virtual screening (VS) of large compound datasets, and the availability of such datasets is vital for drug discovery protocols. We assess the bioactivity and “drug-likeness” of a relatively small but structurally diverse dataset (containing >1,000 compounds) from African medicinal plants, which have been tested and shown to have a wide range of biological activities. The geographical regions of collection of the medicinal plants cover the entire continent of Africa, based on data from literature sources and information from traditional healers. For each isolated compound, the three dimensional (3D) structure has been used to calculate physico-chemical properties used in the prediction of oral bioavailability on the basis of Lipinski’s “Rule of Five”. A comparative analysis has been carried out with the “drug-like”, “lead-like”, and “fragment-like” subsets, as well as with the Dictionary of Natural Products. A diversity analysis has been carried out in comparison with the ChemBridge diverse database. Furthermore, descriptors related to absorption, distribution, metabolism, excretion and toxicity (ADMET) have been used to predict the pharmacokinetic profile of the compounds within the dataset. Our results suggest that drug discovery beginning with natural products from the African flora could be highly promising. The 3D structures are available and could be useful for virtual screening and natural product lead generation programs.
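Lipinski's "Rule of Five" flags compounds with a molecular weight above 500 Da, a logP above 5, more than 5 hydrogen-bond donors or more than 10 hydrogen-bond acceptors. The sketch below counts such violations for precomputed properties. The class, the field names and the two example compounds are hypothetical, and in practice the properties would come from a cheminformatics toolkit rather than being typed in by hand.

```python
from dataclasses import dataclass


@dataclass
class CompoundProperties:
    name: str
    mol_weight: float       # molecular weight in Da
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(c):
    """Count how many of the four Lipinski criteria the compound breaks."""
    return sum([
        c.mol_weight > 500,
        c.log_p > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ])


# Hypothetical property values, for illustration only.
library = [
    CompoundProperties("example_small", 312.4, 2.1, 2, 5),
    CompoundProperties("example_large", 742.9, 6.3, 7, 12),
]

report = {c.name: rule_of_five_violations(c) for c in library}
print(report)
```

A common reading of the rule treats compounds with at most one violation as candidates for good oral bioavailability, so a screening pipeline would typically keep entries whose count is 0 or 1.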
doi:10.1371/journal.pone.0078085
PMCID: PMC3813505  PMID: 24205103
9.  Integrative genome-scale metabolic analysis of Vibrio vulnificus for drug targeting and discovery 
Chromosome 1 of Vibrio vulnificus tends to contain a larger portion of essential or housekeeping genes on the basis of the genomic analysis and gene knockout experiments performed in this study, while its chromosome 2 seems to have originated and evolved from a plasmid. The genome-scale metabolic network model of V. vulnificus was reconstructed based on databases and literature, and was used to identify 193 essential metabolites. Five essential metabolites finally selected after the filtering process are 2-amino-4-hydroxy-6-hydroxymethyl-7,8-dihydropteridine (AHHMP), D-glutamate (DGLU), 2,3-dihydrodipicolinate (DHDP), 1-deoxy-D-xylulose 5-phosphate (DX5P), and 4-aminobenzoate (PABA), which were predicted to be essential in V. vulnificus, absent in human, and are consumed by multiple reactions. Chemical analogs of the five essential metabolites were screened and a hit compound showing the minimal inhibitory concentration (MIC) of 2 μg/ml and the minimal bactericidal concentration (MBC) of 4 μg/ml against V. vulnificus was identified.
Discovering new antimicrobial targets and consequently new antimicrobials is important as drug resistance of pathogenic microorganisms is becoming an increasingly serious problem in human healthcare management (Fischbach and Walsh, 2009). There clearly exists a gap between genomic studies and drug discovery as the accumulation of knowledge on pathogens at genome level has not successfully transformed into the development of effective drugs (Mills, 2006; Payne et al, 2007). In this study, we dissected the genome of a microbial pathogen in detail, and subsequently developed a systems biological strategy of employing genome-scale metabolic modeling and simulation together with metabolite essentiality analysis for effective drug targeting and discovery. This strategy was used for identifying new drug targets in an opportunistic pathogen Vibrio vulnificus CMCP6 as a model.
V. vulnificus is a Gram-negative halophilic bacterium that is found in estuarine waters, brackish ponds, or coastal areas, and its Biotype 1 is an opportunistic human pathogen that can attack immune-compromised patients, and causes primary septicemia, necrotized wound infections, and gastroenteritis. We previously found that many metabolic genes were specifically induced in vivo, suggesting that specific metabolic pathways are essential for in vivo survival and virulence of this pathogen (Kim et al, 2003; Lee et al, 2007). These results motivated us to carry out systems biological analysis of the genome and the metabolic network for new drug target discovery.
V. vulnificus CMCP6 has two chromosomes. We first re-sequenced genomic regions assembled in low quality and low depth, and subsequently re-annotated the whole genome of V. vulnificus. Horizontal gene transfer was suspected to be responsible for the diversification of each chromosome of V. vulnificus, and the presence of metabolic genes was more biased to chromosome 1 than chromosome 2. Further studies on the V. vulnificus genome revealed that chromosome 2 is more prone to diversification for better adaptation to the environment than chromosome 1, while chromosome 1 tends to expand its genetic repertoire while maintaining the core genes at a constant level.
Next, a genome-scale metabolic network VvuMBEL943 was reconstructed based on literature, databases and experiments for systematic studies on the metabolism of this pathogen and prediction of drug targets. The VvuMBEL943 model is composed of 943 reactions and 765 metabolites, and covers 673 genes. The model was validated by comparing its simulated cell growth phenotype obtained by constraints-based flux analysis with the V. vulnificus-specific experimental data previously reported in the literature. Constraints-based flux analysis, as used in this study, is an optimization-based simulation method that calculates intracellular fluxes under a specific genetic and environmental condition (Kim et al, 2008). As a result, 17 growth phenotypes were correctly predicted out of 18 cases, which demonstrates the validity of VvuMBEL943.
The main objective of constructing VvuMBEL943 in this study is to predict potential drug targets by system-wide analysis of the metabolic network for the effective treatment of V. vulnificus. To achieve this goal, a set of drug target candidates was predicted by taking a metabolite-centric approach. Metabolite essentiality analysis is a concept recently introduced for the study of cellular robustness to complement conventional reaction or gene-centric approach (Kim et al, 2007b). Metabolite essentiality analysis observes changes in flux distribution by removing each metabolite from the in silico metabolic network. Hence, metabolite essentiality predicts essential metabolites whose absence causes cell death. By selecting essential metabolites, it is possible to directly screen only their structural analogs, which substantially reduces the number of chemical compounds to screen from the chemical compound library. As a result of implementing this approach, 193 metabolites were initially identified to be essential to the cell. These essential metabolites were then further filtered based on the predetermined criteria, mainly organism specificity and multiple connectivity associated with each metabolite, in order to reduce the number of initial target candidates towards identifying the most effective ones.
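The idea of testing each metabolite for essentiality can be sketched on a toy network. This is a deliberately simplified stand-in and not the constraints-based flux analysis used in the study: growth is approximated by whether a "biomass" metabolite remains reachable from the nutrients, and removing a metabolite is modelled, as an assumption, by disabling every reaction that consumes it. The network, the metabolite names and the function names are all hypothetical.

```python
def producible(reactions, seeds):
    """Expand from seed metabolites: a reaction fires once all substrates exist."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def essential_metabolites(reactions, seeds, target):
    """Metabolites whose removal makes `target` unreachable from `seeds`."""
    candidates = {m for subs, prods in reactions.values() for m in subs + prods}
    candidates -= set(seeds) | {target}
    essential = []
    for metabolite in sorted(candidates):
        # Assumption: removing a metabolite blocks all reactions that consume it.
        kept = {rid: rxn for rid, rxn in reactions.items()
                if metabolite not in rxn[0]}
        if target not in producible(kept, seeds):
            essential.append(metabolite)
    return essential


# Hypothetical toy network: reaction id -> (substrates, products).
toy_network = {
    "R1": (["nutrient"], ["A"]),
    "R2": (["A"], ["B"]),
    "R3": (["A"], ["C"]),
    "R4": (["B"], ["D"]),
    "R5": (["C"], ["D"]),
    "R6": (["D"], ["biomass"]),
}

essential = essential_metabolites(toy_network, seeds=["nutrient"], target="biomass")
print(essential)
```

In the toy network B and C sit on parallel routes, so removing either one leaves a path to biomass, while A and D are bottlenecks. A genome-scale analysis would replace the reachability test with a flux optimization over the full stoichiometric model.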
Five essential metabolites finally selected are 2-amino-4-hydroxy-6-hydroxymethyl-7,8-dihydropteridine (AHHMP), D-glutamate (DGLU), 2,3-dihydrodipicolinate (DHDP), 1-deoxy-D-xylulose 5-phosphate (DX5P), and 4-aminobenzoate (PABA). Enzymes that consume these essential metabolites were experimentally verified to be essential, which indeed demonstrates the essentiality of these five metabolites. On the basis of the structural information of these five essential metabolites, whole-cell screening assay was performed using their analogs for possible antibacterial discovery. We screened 352 chemical analogs of the essential metabolites selected from the chemical compound library, and found a hit compound 24837, which shows the minimal inhibitory concentration (MIC) of 2 μg/ml and minimal bactericidal concentration (MBC) of 4 μg/ml, showing good antibacterial activity without further structural modification. Although this study demonstrates a proof-of-concept, the approaches and their rationale taken here should serve as a general strategy for discovering novel antibiotics and drugs based on systems-level analysis of metabolic networks.
Although the genomes of many microbial pathogens have been studied to help identify effective drug targets and novel drugs, such efforts have not yet reached full fruition. In this study, we report a systems biological approach that efficiently utilizes genomic information for drug targeting and discovery, and apply this approach to the opportunistic pathogen Vibrio vulnificus CMCP6. First, we partially re-sequenced and fully re-annotated the V. vulnificus CMCP6 genome, and accordingly reconstructed its genome-scale metabolic network, VvuMBEL943. The validated network model was employed to systematically predict drug targets using the concept of metabolite essentiality, along with additional filtering criteria. Target genes encoding enzymes that interact with the five essential metabolites finally selected were experimentally validated. These five essential metabolites are critical to the survival of the cell, and hence were used to guide the cost-effective selection of chemical analogs, which were then screened for antimicrobial activity in a whole-cell assay. This approach is expected to help fill the existing gap between genomics and drug discovery.
doi:10.1038/msb.2010.115
PMCID: PMC3049409  PMID: 21245845
drug discovery; drug targeting; genome analysis; metabolic network; Vibrio vulnificus
10.  Using iterative cluster merging with improved gap statistics to perform online phenotype discovery in the context of high-throughput RNAi screens 
BMC Bioinformatics  2008;9:264.
Background
The recent emergence of high-throughput automated image acquisition technologies has forever changed how cell biologists collect and analyze data. Historically, the interpretation of cellular phenotypes in different experimental conditions has been dependent upon the expert opinions of well-trained biologists. Such qualitative analysis is particularly effective in detecting subtle, but important, deviations in phenotypes. However, while the rapid and continuing development of automated microscope-based technologies now facilitates the acquisition of trillions of cells in thousands of diverse experimental conditions, such as in the context of RNA interference (RNAi) or small-molecule screens, the massive size of these datasets precludes human analysis. Thus, the development of automated methods which aim to identify novel and biologically relevant phenotypes online is one of the major challenges in high-throughput image-based screening. Ideally, phenotype discovery methods should be designed to utilize prior/existing information and tackle three challenging tasks, i.e. restoring pre-defined biologically meaningful phenotypes, differentiating novel phenotypes from known ones and distinguishing novel phenotypes from each other. Arbitrarily extracted information causes biased analysis, while combining the complete existing datasets with each new image is intractable in high-throughput screens.
Results
Here we present the design and implementation of a novel and robust online phenotype discovery method with broad applicability that can be used in diverse experimental contexts, especially high-throughput RNAi screens. This method features phenotype modelling and iterative cluster merging using improved gap statistics. A Gaussian Mixture Model (GMM) is employed to estimate the distribution of each existing phenotype, and is then used as the reference distribution in gap statistics. This method is broadly applicable to a number of different types of image-based datasets derived from a wide spectrum of experimental conditions and is suitable to adaptively process new images which are continuously added to existing datasets. Validations were carried out on different datasets, including a published RNAi screen using Drosophila embryos [Additional files 1, 2], a dataset for cell cycle phase identification using HeLa cells [Additional files 1, 3, 4] and a synthetic dataset using polygons. Our methods tackled the three aforementioned tasks effectively, with an accuracy range of 85%–90%. When our method is implemented in the context of a Drosophila genome-scale RNAi image-based screening of cultured cells aimed at identifying the contribution of individual genes towards the regulation of cell shape, it efficiently discovers meaningful new phenotypes and provides novel biological insight. We also propose a two-step procedure to modify the novelty detection method based on one-class SVM, so that it can be used for online phenotype discovery. Under different conditions, we compared the SVM-based method with our method using various datasets, and our methods consistently outperformed the SVM-based method in at least two of the three tasks by 2% to 5%. These results demonstrate that our methods can be used to better identify novel phenotypes in image-based datasets from a wide range of conditions and organisms.
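For orientation, the standard gap statistic that this kind of cluster merging builds on can be written as follows. This is the textbook form, not the authors' "improved" variant, whose details are not given in the abstract; the only element taken from the abstract is that the reference distribution behind the expectation is estimated with a GMM of the existing phenotypes.

```latex
% C_r: cluster r with n_r members; d_{ii'}: pairwise distance between samples i and i'.
% E^*_n: expectation over samples of size n drawn from the reference distribution
% (per the abstract, a GMM fitted to each existing phenotype).
W_k = \sum_{r=1}^{k} \frac{1}{2 n_r} \sum_{i,\, i' \in C_r} d_{ii'},
\qquad
\mathrm{Gap}_n(k) = \mathbb{E}^{*}_{n}\!\left[\log W_k\right] - \log W_k
```

A larger gap for a given k indicates that the observed within-cluster dispersion is tighter than expected under the reference, which is the usual basis for deciding whether clusters should remain separate or be merged.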
Conclusion
We demonstrate that our method can detect various novel phenotypes effectively in complex datasets. Experimental results also validate that our method performs consistently under different orders of image input, variation of starting conditions including the number and composition of existing phenotypes, and datasets from different screens. Our findings indicate that the proposed method is suitable for online phenotype discovery in diverse high-throughput image-based genetic and chemical screens.
doi:10.1186/1471-2105-9-264
PMCID: PMC2443381  PMID: 18534020
11.  Potent bace-1 inhibitor design using pharmacophore modeling, in silico screening and molecular docking studies 
BMC Bioinformatics  2011;12(Suppl 1):S28.
Background
Beta-site amyloid precursor protein cleaving enzyme (BACE-1) is a single-membrane protein that belongs to the aspartyl protease class of catabolic enzymes. This enzyme is involved in the processing of the amyloid precursor protein (APP). The cleavage of APP by BACE-1 is the rate-limiting step in the amyloid cascade leading to the production of two peptide fragments, Aβ40 and Aβ42. Of the two peptide fragments, Aβ42 is the primary species thought to be responsible for the neurotoxicity and amyloid plaque formation that lead to memory and cognitive defects in Alzheimer’s disease (AD). AD is a ravaging neurodegenerative disorder for which no disease-modifying treatment is currently available. Inhibition of BACE-1 is expected to stop amyloid plaque formation, and BACE-1 has emerged as an interesting and attractive therapeutic target for AD.
Methods
A ligand-based computational approach was used to identify the molecular chemical features required for the inhibition of the BACE-1 enzyme. A training set of 20 compounds with known experimental activity was used to generate pharmacophore hypotheses using the 3D QSAR Pharmacophore Generation module available in Discovery Studio. The hypothesis was validated by four different methods and the best hypothesis was utilized in database screening of four chemical databases: Maybridge, Chembridge, NCI and Asinex. The retrieved hit compounds were subjected to a molecular docking study using the GOLD 4.1 program.
Results
Among ten generated pharmacophore hypotheses, Hypo 1 was chosen as the best pharmacophore hypothesis. Hypo 1 consists of one hydrogen bond donor, one positive ionizable, one ring aromatic and two hydrophobic features, with a high correlation coefficient of 0.977, the highest cost difference of 121.98 bits and the lowest RMSD value of 0.804. Hypo 1 was validated using the Fischer randomization method, a test set with a correlation coefficient of 0.917, the leave-one-out method and a decoy set with a goodness of hit score of 0.76. The validated Hypo 1 was used as a 3D query in database screening and retrieved 773 compounds with an estimated activity value <100 nM. These hits were docked into the active site of BACE-1 and further refined based on molecular interactions with the essential amino acids and a good GOLD fitness score.
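The goodness of hit score quoted for the decoy-set validation is commonly computed with the Güner-Henry formula. The sketch below assumes that formula is the one meant here; the abstract does not give the underlying counts, so the numbers in the usage note are purely hypothetical.

```python
def goodness_of_hit(total_db, total_actives, hits, active_hits):
    """Güner-Henry goodness-of-hit (GH) score for a decoy-set screen.

    total_db      -- D, number of compounds in the decoy database
    total_actives -- A, number of known actives seeded in the database
    hits          -- Ht, number of compounds retrieved by the pharmacophore
    active_hits   -- Ha, number of retrieved compounds that are known actives
    """
    if not 0 < total_actives < total_db:
        raise ValueError("need 0 < total_actives < total_db")
    if hits <= 0 or not 0 <= active_hits <= min(hits, total_actives):
        raise ValueError("inconsistent hit counts")
    # Weighted combination of yield of actives and recall of actives.
    yield_term = active_hits * (3 * total_actives + hits) / (4 * hits * total_actives)
    # Penalty for retrieving inactive (decoy) compounds.
    penalty = 1 - (hits - active_hits) / (total_db - total_actives)
    return yield_term * penalty
```

With hypothetical counts of 1000 database compounds, 20 seeded actives, 25 retrieved hits and 18 retrieved actives, `goodness_of_hit(1000, 20, 25, 18)` gives roughly 0.76; a score of 1.0 corresponds to retrieving all actives and nothing else.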
Conclusion
The best pharmacophore hypothesis, Hypo 1, with high predictive ability contains chemical features required for the effective inhibition of BACE-1. Using Hypo 1, we have identified two compounds with diverse chemical scaffolds as potential virtual leads which, as such or upon further optimization, can be used in the designing of new BACE-1 inhibitors.
doi:10.1186/1471-2105-12-S1-S28
PMCID: PMC3044283  PMID: 21342558
12.  Ligand scaffold hopping combining 3D maximal substructure search and molecular similarity 
BMC Bioinformatics  2009;10:245.
Background
Virtual screening methods are now well established as effective to identify hit and lead candidates and are fully integrated in most drug discovery programs. Ligand-based approaches make use of physico-chemical, structural and energetics properties of known active compounds to search large chemical libraries for related and novel chemotypes. While 2D-similarity search tools are known to be fast and efficient, the use of 3D-similarity search methods can be very valuable to many research projects, as integration of "3D knowledge" can facilitate the identification of not only related molecules but also of chemicals possessing distant scaffolds as compared to the query, and is therefore more inclined to scaffold hopping. To date, very few methods performing this task are easily available to the scientific community.
Results
We introduce a new approach (LigCSRre) to the 3D ligand similarity search of drug candidates. It combines a 3D maximum common substructure search algorithm independent of atom order with a tunable description of atomic compatibilities to prune the search and increase its physico-chemical relevance. We show, on 47 experimentally validated active compounds across five protein targets having different specificities, that for single compound search, the approach is able to recover on average 52% of the co-actives in the top 1% of the ranked list, which is better than gold standards of the field. Moreover, the combination of several runs on a single protein target using different query active compounds shows a remarkable improvement in enrichment. Such results demonstrate LigCSRre as a valuable tool for ligand-based screening.
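The "co-actives recovered in the top 1% of the ranked list" figure can be computed as in the following minimal sketch. The function name and the rounding rule for the cutoff are assumptions made for illustration; the abstract does not state how the authors handled fractional cutoffs.

```python
import math


def recovery_in_top(ranked_is_active, top_fraction=0.01):
    """Fraction of all actives found in the top `top_fraction` of a ranked list.

    ranked_is_active -- list of booleans, best-scored compound first,
                        True where the compound is a known co-active.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list containing at least one active")
    # Assumed convention: round the cutoff up and always inspect at least one compound.
    cutoff = max(1, math.ceil(top_fraction * n_total))
    return sum(ranked_is_active[:cutoff]) / n_actives
```

For a hypothetical ranked list of 200 compounds with four actives, one of which is ranked in the top two positions, `recovery_in_top(ranked, 0.01)` returns 0.25.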
Conclusion
LigCSRre constitutes a new efficient and generic approach to the 3D similarity screening of small compounds, whose flexible design opens the door to many enhancements. The program is freely available to the academics for non-profit research at: .
doi:10.1186/1471-2105-10-245
PMCID: PMC2739202  PMID: 19671127
13.  MS-DOCK: Accurate multiple conformation generator and rigid docking protocol for multi-step virtual ligand screening 
BMC Bioinformatics  2008;9:184.
Background
The number of protein targets with a known or predicted three-dimensional structure and of drug-like chemical compounds is growing rapidly, and so is the need for new therapeutic compounds or chemical probes. Performing flexible structure-based virtual screening computations on thousands of targets with millions of molecules is neither tractable for most laboratories nor indeed desirable. Since shape complementarity is of primary importance for most protein-ligand interactions, we have developed a tool/protocol based on rigid-body docking to select compounds that fit well into binding sites.
Results
Here we present an efficient multiple conformation rigid-body docking approach, MS-DOCK, which is based on the program DOCK. This approach can be used as the first step of a multi-stage docking/scoring protocol. First, we developed and validated the Multiconf-DOCK tool that generates several conformers per input ligand. Then, each generated conformer (bioactives and 37970 decoys) was docked rigidly using DOCK6 with our optimized protocol into seven different receptor-binding sites. MS-DOCK was able to significantly reduce the size of the initial input library for all seven targets, thereby facilitating subsequent more CPU demanding flexible docking procedures.
Conclusion
MS-DOCK can be easily used for the generation of multi-conformer libraries and for shape-based filtering within a multi-step structure-based screening protocol in order to shorten computation times.
doi:10.1186/1471-2105-9-184
PMCID: PMC2373571  PMID: 18402678
14.  ChemVassa: A New Method for Identifying Small Molecule Hits in Drug Discovery 
ChemVassa, a new chemical structure search technology, was developed to allow rapid in silico screening of compounds for hit and hit-to-lead identification in drug development. It functions by using a novel type of molecular descriptor that examines, in part, the structure of the small molecule undergoing analysis, yielding its “information signature.” This descriptor takes into account the atoms, bonds, and their positions in 3-dimensional space.
For the present study, a database of ChemVassa molecular descriptors was generated for nearly 16 million compounds (from the ZINC database and other compound sources), then an algorithm was developed that allows rapid similarity searching of the database using a query molecular descriptor (e.g., the signature of atorvastatin, below). A scoring metric then allowed ranking of the search results.
We used these tools to search a subset of drug-like molecules using the signature of a commercially successful statin, atorvastatin (Lipitor™). The search identified ten novel compounds, two of which have been demonstrated to interact with HMG-CoA reductase, the macromolecular target of atorvastatin. In particular, one compound discussed in the results section tested successfully with an IC50 of less than 100 μM and a completely novel structure relative to known inhibitors. Interactions were validated using computational molecular docking and an HMG-CoA reductase activity assay. The rapidity and low cost of the methodology, and the novel structure of the interactors, suggest this is a highly favorable new method for hit generation.
doi:10.2174/1874104501206010029
PMCID: PMC3601345  PMID: 23525139
Drug discovery; cheminformatics; small molecule
15.  Discovery of Potent Small-Molecule Inhibitors of Multidrug-Resistant Plasmodium falciparum Using a Novel Miniaturized High-Throughput Luciferase-Based Assay
Malaria is a global health problem that causes significant mortality and morbidity, with more than 1 million deaths per year caused by Plasmodium falciparum. Most antimalarial drugs face decreased efficacy due to the emergence of resistant parasites, which necessitates the discovery of new drugs. To identify new antimalarials, we developed an automated 384-well plate screening assay using P. falciparum parasites that stably express cytoplasmic firefly luciferase. After initial optimization, we tested two different types of compound libraries: known bioactive collections (Library of Pharmacologically Active Compounds [LOPAC] and the library from the National Institute of Neurological Disorders and Stroke [NINDS]) and a library of uncharacterized compounds (ChemBridge). A total of 12,320 compounds were screened at 5.5 μM. Selecting only compounds that reduced parasite growth by 85% resulted in 33 hits from the combined bioactive collection and 130 hits from the ChemBridge library. Fifteen novel drug-like compounds from the bioactive collection were found to be active against P. falciparum. Twelve new chemical scaffolds were found from the ChemBridge hits, the most potent of which was a series based on the 1,4-naphthoquinone scaffold, which is structurally similar to the FDA-approved antimalarial atovaquone. However, in contrast to atovaquone, which acts to inhibit the bc1 complex and block the electron transport chain in parasite mitochondria, we have determined that our new 1,4-naphthoquinones act in a novel, non-bc1-dependent mechanism and remain potent against atovaquone- and chloroquine-resistant parasites. Ultimately, this study may provide new probes to understand the molecular details of the malaria life cycle and to identify new antimalarials.
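Hit selection at an 85% growth-reduction cutoff can be illustrated with a short sketch. It assumes a common plate normalisation (luminescence scaled between untreated parasites and a background control); the abstract does not describe the authors' exact normalisation, and all names and values here are hypothetical.

```python
def percent_inhibition(signal, untreated, background):
    """Growth inhibition (%) from a luciferase readout.

    Assumed normalisation: `untreated` is the mean signal of untreated
    parasite wells (0% inhibition) and `background` is the mean signal of
    wells without viable parasites (100% inhibition).
    """
    window = untreated - background
    if window <= 0:
        raise ValueError("untreated control must exceed background")
    return 100.0 * (1.0 - (signal - background) / window)


def select_hits(well_signals, untreated, background, threshold=85.0):
    """Return compound IDs whose inhibition meets or exceeds `threshold` percent."""
    return sorted(
        compound_id
        for compound_id, signal in well_signals.items()
        if percent_inhibition(signal, untreated, background) >= threshold
    )
```

For example, with an untreated control of 10000 counts and a background of 1000 counts, a well reading 1200 counts corresponds to about 98% inhibition and would be selected, whereas a well reading 9000 counts (about 11% inhibition) would not.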
doi:10.1128/AAC.00431-10
PMCID: PMC2934977  PMID: 20547797
16.  Novel Chemical Suppressors of Long QT Syndrome Identified by an in vivo Functional Screen 
Circulation  2010;123(1):23-30.
Background
Genetic long QT (LQT) syndrome is a life-threatening disorder caused by mutations that result in prolongation of cardiac repolarization. Recent work has demonstrated that a zebrafish model of LQT syndrome faithfully recapitulates several features of human disease including prolongation of ventricular action potential duration (APD), spontaneous early after-depolarizations, and 2:1 atrioventricular (AV) block in early stages of development. Due to their transparency, small size, and absorption of small molecules from their environment, zebrafish are amenable to high throughput chemical screens. We describe a small molecule screen using the zebrafish KCNH2 mutant breakdance to identify compounds that can rescue the LQT type 2 phenotype.
Methods and Results
Zebrafish breakdance embryos were exposed to test compounds at 48 hours of development and scored for rescue of 2:1 AV block at 72 hours in a 96-well format. Only compounds that suppressed the LQT phenotype in three of three fish were considered hits. Screen compounds were obtained from commercially available small molecule libraries (Prestwick and Chembridge). Initial hits were confirmed with dose response testing and time course studies. Optical mapping using the voltage sensitive dye di-4 ANEPPS was performed to measure compound effects on cardiac APDs. Screening of 1200 small molecules resulted in the identification of flurandrenolide and 2-methoxy-N-(4-methylphenyl) benzamide (2-MMB) as compounds that reproducibly suppressed the LQT phenotype. Optical mapping confirmed that treatment with each compound caused shortening of ventricular APDs. Structure activity studies and steroid receptor knockdown suggest that flurandrenolide functions via the glucocorticoid signaling pathway.
Conclusions
Using a zebrafish model of LQT type 2 syndrome in a high throughput chemical screen, we have identified two compounds, flurandrenolide and the novel compound, 2-MMB, as small molecules that rescue the zebrafish LQTS 2 by shortening the ventricular action potential duration. We provide evidence that flurandrenolide functions via the glucocorticoid receptor mediated pathway. These two molecules, and future discoveries from this screen, should yield novel tools for the study of cardiac electrophysiology and may lead to novel therapeutics for human LQT patients.
doi:10.1161/CIRCULATIONAHA.110.003731
PMCID: PMC3015011  PMID: 21098441
long QT syndrome; animal models of human disease; ion channels; chemical screening
17.  Investigating the correlations among the chemical structures, bioactivity profiles and molecular targets of small molecules 
Bioinformatics  2010;26(22):2881-2888.
Motivation: Most of the previous data mining studies based on the NCI-60 dataset, due to its intrinsic cell-based nature, can hardly provide insights into the molecular targets for screened compounds. On the other hand, the abundant information of the compound–target associations in PubChem can offer extensive experimental evidence of molecular targets for tested compounds. Therefore, by taking advantage of the data from both public repositories, one may investigate the correlations between the bioactivity profiles of small molecules from the NCI-60 dataset (cellular level) and their patterns of interactions with relevant protein targets from PubChem (molecular level) simultaneously.
Results: We investigated a set of 37 small molecules by providing links among their bioactivity profiles, protein targets and chemical structures. Hierarchical clustering of compounds was carried out based on their bioactivity profiles. We found that compounds were clustered into groups with similar modes of action, which strongly correlated with chemical structures. Furthermore, we observed that compounds similar in bioactivity profiles also shared similar patterns of interactions with relevant protein targets, especially when chemical structures were related. The current work presents a new strategy for combining and data mining the NCI-60 dataset and PubChem. This analysis shows that bioactivity profile comparison can provide insights into the modes of action at the molecular level, and thus will facilitate the knowledge-based discovery of novel compounds with desired pharmacological properties.
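Hierarchical clustering of compounds by bioactivity profile can be sketched with the standard library alone. The choices below (one minus Pearson correlation as the distance, average linkage, stopping at a fixed number of clusters) are assumptions for illustration; the abstract does not say which metric or linkage the authors used, and the compound names and values are invented.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def cluster_profiles(profiles, n_clusters):
    """Average-linkage agglomerative clustering on 1 - Pearson distance.

    profiles -- dict mapping compound ID to a list of activity values
                measured across the same ordered panel of cell lines.
    """
    names = list(profiles)
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        return mean(dist[frozenset((a, b))] for a in c1 for b in c2)

    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)
```

Compounds whose activity rises and falls across the cell-line panel in the same pattern end up in the same cluster, regardless of absolute potency, which is why a correlation-based distance is a natural fit for profile comparison.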
Availability: The bioactivity profiling data and the target annotation information are publicly available in the PubChem BioAssay database (ftp://ftp.ncbi.nlm.nih.gov/pubchem/Bioassay/).
Contact: ywang@ncbi.nlm.nih.gov; bryant@ncbi.nlm.nih.gov
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq550
PMCID: PMC2971579  PMID: 20947527
18.  A Discovery Funnel for Nucleic Acid Binding Drug Candidates 
Drug development research  2011;72(2):178-186.
Computational approaches are becoming increasingly popular for the discovery of drug candidates against a target of interest. Proteins have historically been the primary targets of many virtual screening efforts. While in silico screens targeting proteins have proven successful, other classes of targets, in particular DNA, remain largely unexplored using virtual screening methods. With the realization of the functional importance of many non-canonical DNA structures such as G-quadruplexes, increased efforts are underway to discover new small molecules that can bind selectively to DNA structures. Here, we describe efforts to build an integrated in silico and in vitro platform for discovering compounds that may bind to a chosen DNA target. Millions of compounds are initially screened in silico for selective binding to a particular structure and ranked to identify several hundred best hits. An important element of our strategy is the inclusion of an array of possible competing structures in the in silico screen. The best hundred or so hits are validated experimentally for binding to the actual target structure by a high-throughput 96-well thermal denaturation assay to yield the top ten candidates. Finally, these most promising candidates are thoroughly characterized for binding to their DNA target by rigorous biophysical methods, including isothermal titration calorimetry, differential scanning calorimetry, spectroscopy and competition dialysis. This platform was validated using quadruplex DNA as a target, and a new quadruplex-binding compound with possible anti-cancer activity was discovered. Some considerations when embarking on virtual screening and in silico experiments are also discussed.
doi:10.1002/ddr.20414
PMCID: PMC3090163  PMID: 21566705
drug discovery; in silico screening; SURFLEX-DOCK; DNA; G-quadruplex; high-throughput screening
19.  Discovery of Selective Probes and Antagonists for G Protein-Coupled Receptors FPR/FPRL1 and GPR30 
Recent technological advances in flow cytometry provide a versatile platform for high throughput screening of compound libraries coupled with high-content biological testing and drug discovery. The G protein-coupled receptors (GPCRs) constitute the largest class of signaling molecules in the human genome with frequent roles in disease pathogenesis, yet many examples of orphan receptors with unknown ligands remain. The complex biology and potential for drug discovery within this class provide strong incentives for chemical biology approaches seeking to develop small molecule probes to facilitate elucidation of mechanistic pathways and enable specific manipulation of the activity of individual receptors. We have initiated small molecule probe development projects targeting two distinct families of GPCRs: the formylpeptide receptors (FPR/FPRL1) and G protein-coupled estrogen receptor (GPR30). In each case the assay for compound screening involved the development of an appropriate small molecule fluorescent probe, and the flow cytometry platform provided inherently biologically rich assays that enhanced the process of identification and optimization of novel antagonists. The contributions of cheminformatics analysis tools, virtual screening, and synthetic chemistry in synergy with the biomolecular screening program have yielded valuable new chemical probes with high binding affinity, selectivity for the targeted receptor, and potent antagonist activity. This review describes the discovery of novel small molecule antagonists of FPR and FPRL1, and GPR30, and the associated characterization process involving secondary assays and cell-based and in vivo studies to define the selectivity and activity of the resulting chemical probes.
PMCID: PMC2885834  PMID: 19807662
flow cytometry; fluorescent; GPCR; formylpeptide receptor; inflammation; GPR30; GPER; estrogen; nongenomic; cancer; antidepressant
20.  Antileishmanial High-Throughput Drug Screening Reveals Drug Candidates with New Scaffolds 
Drugs currently available for leishmaniasis treatment often show parasite resistance, highly toxic side effects and prohibitive costs commonly incompatible with patients from the tropical endemic countries. In this sense, there is an urgent need for new drugs as a treatment solution for this neglected disease. Here we show the development and implementation of an automated high-throughput viability screening assay for the discovery of new drugs against Leishmania. Assay validation was done with Leishmania promastigote forms, including the screening of 4,000 compounds with known pharmacological properties. In an attempt to find new compounds with leishmanicidal properties, 26,500 structurally diverse chemical compounds were screened. A cut-off of 70% growth inhibition in the primary screening led to the identification of 567 active compounds. Cellular toxicity and selectivity were responsible for the exclusion of 78% of the pre-selected compounds. The activity of the remaining 124 compounds was confirmed against the intramacrophagic amastigote form of the parasite. In vitro microsomal stability and cytochrome P450 (CYP) inhibition of the two most active compounds from this screening effort were assessed to obtain preliminary information on their metabolism in the host. The HTS approach employed here resulted in the discovery of two new antileishmanial compounds, bringing promising candidates to the leishmaniasis drug discovery pipeline.
Author Summary
Every year, more than 2 million people worldwide suffer from leishmaniasis, a neglected tropical disease present in 88 countries. The disease is caused by the single-celled protozoan parasite species of the genus Leishmania, which is transmitted to humans by the bite of the sandfly. The disease manifests itself in a broad range of symptoms, and its most virulent form, named visceral leishmaniasis, is lethal if not treated. Most of the few available treatments for leishmaniasis were developed decades ago and are often toxic, sometimes even leading to the patient's death. Furthermore, the parasite is developing resistance to available drugs, making the discovery and development of new antileishmanials an urgent need. To tackle this problem, the authors of this study employed the use of high-throughput technologies to screen a large library of small, synthetic molecules for their ability to interfere with the viability of Leishmania parasites. This study resulted in the discovery of two novel compounds with leishmanicidal properties and promising drug-like properties, bringing new candidates to the leishmaniasis drug discovery pipeline.
doi:10.1371/journal.pntd.0000675
PMCID: PMC2864270  PMID: 20454559
21.  FAF-Drugs2: Free ADME/tox filtering tool to assist drug discovery and chemical biology projects 
BMC Bioinformatics  2008;9:396.
Background
Drug discovery and chemical biology are exceedingly complex and demanding enterprises. In recent years there has been increasing awareness of the importance of predicting/optimizing the absorption, distribution, metabolism, excretion and toxicity (ADMET) properties of small chemical compounds along the search process rather than at the final stages. Fast methods for evaluating ADMET properties of small molecules often involve applying a set of simple empirical rules (educated guesses) and as such, compound collections' property profiling can be performed in silico. Clearly, these rules cannot assess the full complexity of the human body but can provide valuable information and assist decision-making.
Results
This paper presents FAF-Drugs2, a free adaptable tool for ADMET filtering of electronic compound collections. FAF-Drugs2 is a command line utility program (e.g., written in Python) based on the open source chemistry toolkit OpenBabel, which performs various physicochemical calculations and identifies key functional groups as well as some toxic and unstable molecules/functional groups. In addition to filtered collections, FAF-Drugs2 can provide, via Gnuplot, several distribution diagrams of major physicochemical properties of the screened compound libraries.
Conclusion
We have developed FAF-Drugs2 to facilitate compound collection preparation, prior to (or after) experimental screening or virtual screening computations. Users can select to apply various filtering thresholds and add rules as needed for a given project. As it stands, FAF-Drugs2 implements numerous filtering rules (23 physicochemical rules and 204 substructure searching rules) that can be easily tuned.
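The idea of tunable physicochemical filtering rules can be illustrated with a small sketch. This is not FAF-Drugs2 code and does not use its configuration format or the OpenBabel API; the descriptor names and threshold values are hypothetical, and the descriptors are assumed to have been computed beforehand by a chemistry toolkit.

```python
# Hypothetical default rules: descriptor name -> (minimum, maximum) allowed value.
DEFAULT_RULES = {
    "mol_weight": (100.0, 600.0),
    "logp": (-3.0, 5.0),
    "h_bond_donors": (0, 5),
    "h_bond_acceptors": (0, 10),
}


def failed_rules(descriptors, rules=DEFAULT_RULES):
    """Return the names of the rules that a compound's descriptors violate."""
    return [
        name
        for name, (low, high) in rules.items()
        if not low <= descriptors[name] <= high
    ]


def filter_library(library, rules=DEFAULT_RULES):
    """Split a library into passing compound IDs and a rejection report.

    library -- dict mapping compound ID to a dict of precomputed descriptors.
    Returns (passed_ids, rejected), where rejected maps ID -> violated rules.
    """
    passed, rejected = [], {}
    for compound_id, descriptors in library.items():
        failures = failed_rules(descriptors, rules)
        if failures:
            rejected[compound_id] = failures
        else:
            passed.append(compound_id)
    return passed, rejected
```

Tuning a threshold for a given project then amounts to passing a modified rule set, for example `filter_library(library, {**DEFAULT_RULES, "logp": (-3.0, 6.0)})`, while the rejection report records which rule removed each compound.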
doi:10.1186/1471-2105-9-396
PMCID: PMC2561050  PMID: 18816385
22.  Hierarchical virtual screening for the discovery of new molecular scaffolds in antibacterial hit identification 
One of the initial steps of modern drug discovery is the identification of small organic molecules able to inhibit a target macromolecule of therapeutic interest. A small proportion of these hits are further developed into lead compounds, which in turn may ultimately lead to a marketed drug. A commonly used screening protocol used for this task is high-throughput screening (HTS). However, the performance of HTS against antibacterial targets has generally been unsatisfactory, with high costs and low rates of hit identification. Here, we present a novel computational methodology that is able to identify a high proportion of structurally diverse inhibitors by searching unusually large molecular databases in a time-, cost- and resource-efficient manner. This virtual screening methodology was tested prospectively on two versions of an antibacterial target (type II dehydroquinase from Mycobacterium tuberculosis and Streptomyces coelicolor), for which HTS has not provided satisfactory results and consequently practically all known inhibitors are derivatives of the same core scaffold. Overall, our protocols identified 100 new inhibitors, with calculated Ki ranging from 4 to 250 μM (confirmed hit rates are 60% and 62% against each version of the target). Most importantly, over 50 new active molecular scaffolds were discovered that underscore the benefits that a wide application of prospectively validated in silico screening tools is likely to bring to antibacterial hit identification.
doi:10.1098/rsif.2012.0569
PMCID: PMC3481598  PMID: 22933186
virtual screening; antibacterial hit identification; chemoinformatics; bioinformatics; machine learning; high-throughput screening
23.  Automation of AMOEBA polarizable force field parameterization for small molecules 
Theoretical chemistry accounts  2012;131(3):1138-.
A protocol to generate parameters for the AMOEBA polarizable force field for small organic molecules has been established, and a polarizable atomic typing utility, Poltype, which fully automates this process, has been implemented. For validation, we have compared with quantum mechanical calculations of molecular dipole moments, optimized geometry, electrostatic potential, and conformational energy for a variety of neutral and charged organic molecules, as well as dimer interaction energies of a set of amino acid side chain model compounds. Furthermore, parameters obtained in gas phase are substantiated in liquid-phase simulations. The hydration free energies (HFE) of neutral and charged molecules have been calculated and compared with experimental values. The RMS error for the HFE of neutral molecules is less than 1 kcal/mol. Meanwhile, the relative error in the predicted HFE of salts (cations and anions) is less than 3% with a correlation coefficient of 0.95. Overall, the performance of Poltype is satisfactory and provides a convenient utility for applications such as drug discovery. Further improvement can be achieved by the systematic study of various organic compounds, particularly ionic molecules, and refinement and expansion of the parameter database.
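The two error measures quoted for the hydration free energies can be computed as in this minimal sketch. The function names are illustrative, the relative error is assumed to be a mean absolute relative error (the abstract does not define it), and the values in the usage note are invented rather than taken from the paper.

```python
from math import sqrt


def rms_error(predicted, reference):
    """Root-mean-square error between predicted and reference values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return sqrt(sum(squared) / len(squared))


def mean_relative_error(predicted, reference):
    """Mean absolute relative error; reference values must be non-zero."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    ratios = [abs(p - r) / abs(r) for p, r in zip(predicted, reference)]
    return sum(ratios) / len(ratios)
```

For hypothetical predicted values of -6.1, -4.9 and -2.0 kcal/mol against reference values of -6.3, -5.1 and -2.4 kcal/mol, the RMS error is about 0.28 kcal/mol and the mean relative error is about 7.9%.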
doi:10.1007/s00214-012-1138-6
PMCID: PMC3322661  PMID: 22505837
AMOEBA; Polarizable force field; Small molecule modeling; Poltype; Atomic typer; Molecular dynamics
24.  Mining collections of compounds with Screening Assistant 2 
Background
High-throughput screening assays have become the starting point of many drug discovery programs for large pharmaceutical companies as well as academic organisations. Despite the increasing throughput of screening technologies, the almost infinite chemical space remains out of reach, calling for tools dedicated to the analysis and selection of the compound collections intended to be screened.
Results
We present Screening Assistant 2 (SA2), an open-source Java software dedicated to the storage and analysis of small to very large chemical libraries. SA2 stores unique molecules in a MySQL database, and encapsulates several chemoinformatics methods, including provider management, interactive visualisation, scaffold analysis, diverse subset creation, descriptor calculation, sub-structure / SMARTS search, similarity search and filtering. We illustrate the use of SA2 by analysing the composition of a database of 15 million compounds collected from 73 providers, in terms of scaffolds, frameworks, and undesired properties as defined by recently proposed HTS SMARTS filters. We also show how the software can be used to create diverse libraries based on existing ones.
Conclusions
Screening Assistant 2 is a user-friendly, open-source software that can be used to manage collections of compounds and perform simple to advanced chemoinformatics analyses. Its modular design and growing documentation facilitate the addition of new functionalities, calling for contributions from the community. The software can be downloaded at http://sa2.sourceforge.net/.
doi:10.1186/1758-2946-4-20
PMCID: PMC3547782  PMID: 23327565
Chemical libraries; Molecular diversity; DRCS
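The "diverse subset creation" mentioned in the abstract above can be illustrated with a greedy MaxMin selection over Tanimoto distances. This is a generic sketch and an assumption on our part: the abstract does not say which algorithm SA2 uses. Fingerprints are modelled as plain Python sets of "on" bit indices, and all names and toy data are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def maxmin_pick(fingerprints, k, seed_index=0):
    """Greedy MaxMin: repeatedly add the item whose closest
    already-picked neighbour is farthest away."""
    n = len(fingerprints)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the library size")
    picked = [seed_index]
    # min_dist[i] = distance from item i to its nearest picked item
    min_dist = [1.0 - tanimoto(fp, fingerprints[seed_index])
                for fp in fingerprints]
    while len(picked) < k:
        candidates = [i for i in range(n) if i not in picked]
        nxt = max(candidates, key=lambda i: min_dist[i])
        picked.append(nxt)
        # Update nearest-picked distances with the newly added item.
        for i in range(n):
            d = 1.0 - tanimoto(fingerprints[i], fingerprints[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return picked


# Toy library: items 0, 1, 3 share bits; items 2 and 4 form a second cluster.
library = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}, {7, 8}]
print(maxmin_pick(library, 3))
```

Starting from item 0, the second pick comes from the other cluster, which is the behaviour a diversity selection is meant to show. For real libraries a chemoinformatics toolkit would supply the fingerprints; the set-based version here only demonstrates the selection logic.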
25.  Non-peptidic Cruzain Inhibitors with Trypanocidal Activity Discovered by Virtual Screening and In Vitro Assay 
A multi-step cascade strategy using integrated ligand- and target-based virtual screening methods was developed to select a small number of compounds from the ZINC database to be evaluated for trypanocidal activity. After the database was winnowed to 23 selected compounds, 12 non-covalent binding cruzain inhibitors with affinity values (Ki) in the low micromolar range (3–60 µM) acting through a competitive inhibition mechanism were identified. This mechanism has been confirmed by determining the binding mode of the cruzain inhibitor Nequimed176 through X-ray crystallographic studies. Cruzain, a validated therapeutic target for new chemotherapy for Chagas disease, also shares high similarity with the mammalian homolog cathepsin L. Because increased activity of cathepsin L is related to invasive properties and has been linked to metastatic cancer cells, cruzain inhibitors from the same library were assayed against it. Affinity values were in a similar range (4–80 µM), yielding poor selectivity towards cruzain but raising the possibility of investigating such inhibitors for their effect on cell proliferation. In order to select the most promising enzyme inhibitors retaining trypanocidal activity for structure-activity relationship (SAR) studies, the most potent cruzain inhibitors were assayed against T. cruzi-infected cells. Two compounds were found to have trypanocidal activity. Using compound Nequimed42 as precursor, an SAR was established in which the 2-acetamidothiophene-3-carboxamide group was identified as essential for enzyme and parasite inhibition activities. The IC50 value for compound Nequimed42 acting against the trypomastigote form of the Tulahuen lacZ strain was found to be 10.6±0.1 µM, tenfold lower than that obtained for benznidazole, which was taken as positive control.
In addition, by employing the strategy of molecular simplification, a smaller compound derived from Nequimed42 with a ligand efficiency (LE) of 0.33 kcal mol−1 atom−1 (compound Nequimed176) is highlighted as a novel non-peptidic, non-covalent cruzain inhibitor and a trypanocidal agent candidate for optimization.
Author Summary
Chagas disease (American trypanosomiasis) is a parasitic infection that kills millions of mostly poverty-stricken people in Latin America. In recent years it has also spread to nonendemic countries – the United States, Canada, Europe, Australia and Japan – as a result of immigration. The only available drugs for its treatment were introduced more than forty years ago, have low efficacy, and cause various severe side effects. This dire public health situation has prompted us to search for new small molecules to act as drug candidates to treat Chagas disease. The T. cruzi enzyme cruzain, a key biological catalyst used by the protozoan to digest host proteins, is a validated drug target for Chagas disease. By combining in silico molecular design, X-ray crystallography and biological screening, we found a new class of non-covalent small molecules that inhibit cruzain in low micromolar concentrations.
doi:10.1371/journal.pntd.0002370
PMCID: PMC3750009  PMID: 23991231
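The ligand efficiency (LE) quoted in the abstract above has units of kcal mol−1 per heavy atom. A common way to estimate it from an inhibition constant is LE = −RT ln(Ki) / N_heavy. The sketch below assumes Ki approximates the dissociation constant, a 1 M standard state, and a temperature of 298.15 K. The Ki and heavy-atom count in the example are hypothetical and are not the values for Nequimed176, which the abstract does not give.

```python
import math

# Gas constant in kcal mol^-1 K^-1
R_KCAL = 1.987204e-3


def ligand_efficiency(ki_molar, heavy_atoms, temperature_k=298.15):
    """Estimate LE = -RT ln(Ki) / N_heavy in kcal/mol per heavy atom.

    Assumes Ki (in mol/L) approximates the dissociation constant.
    """
    if ki_molar <= 0 or heavy_atoms <= 0 or temperature_k <= 0:
        raise ValueError("Ki, heavy_atoms and temperature must be positive")
    delta_g = R_KCAL * temperature_k * math.log(ki_molar)  # binding free energy
    return -delta_g / heavy_atoms


# Hypothetical inhibitor: Ki = 10 µM, 20 heavy (non-hydrogen) atoms.
le = ligand_efficiency(10e-6, 20)
print(f"LE = {le:.2f} kcal/mol per heavy atom")
```

Because LE normalizes affinity by size, a smaller molecule with the same Ki scores higher, which is the rationale for the molecular simplification strategy the abstract describes.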
