It is generally recognized that drug discovery and development are time- and resource-intensive processes. There is an ever-growing effort to apply computational power to the combined chemical and biological space in order to streamline drug discovery, design, development and optimization. In the biomedical arena, computer-aided or in silico design is being used to expedite and facilitate hit identification and hit-to-lead selection, to optimize the absorption, distribution, metabolism, excretion and toxicity profile, and to avoid safety issues. Commonly used computational approaches include ligand-based drug design (pharmacophore, a 3-D spatial arrangement of chemical features essential for biological activity), structure-based drug design (drug-target docking), and quantitative structure-activity and quantitative structure-property relationships. Regulatory agencies as well as the pharmaceutical industry are actively involved in the development of computational tools that will improve the effectiveness and efficiency of the drug discovery and development process, decrease the use of animals, and increase predictability. It is expected that the power of computer-aided drug discovery and development (CADDD) will grow as the technology continues to evolve.
The use of computational techniques in the drug discovery and development process is rapidly gaining in popularity, implementation and appreciation. Different terms are applied to this area, including computer-aided drug design (CADD), computational drug design, computer-aided molecular design (CAMD), computer-aided molecular modeling (CAMM), rational drug design, in silico drug design, and computer-aided rational drug design. The term Computer-Aided Drug Discovery and Development (CADDD) is employed in this overview to cover the entire process. Both computational and experimental techniques have important roles in drug discovery and development and represent complementary approaches. CADDD entails:
Fast expansion in this area has been made possible by advances in the power and sophistication of computational software and hardware, identification of molecular targets, and a growing database of publicly available target protein structures. CADDD is being used to identify hits (active drug candidates), select leads (the most likely candidates for further evaluation), and optimize leads, i.e. transform biologically active compounds into suitable drugs by improving their physicochemical, pharmaceutical, and ADMET/PK (pharmacokinetic) properties. Virtual screening is used to discover new drug candidates from different chemical scaffolds by searching commercial, public, or private 3-dimensional chemical structure databases. It is intended to reduce the size of the chemical space and thereby allow focus on more promising candidates for lead discovery and optimization. The goal is to enrich the set of molecules with desirable properties (active, drug-like, lead-like) and eliminate compounds with undesirable properties (inactive, reactive, toxic, poor ADMET/PK). In other words, in silico modeling is used to significantly reduce the time and resource requirements of chemical synthesis and biological testing (Fig. 1). The rapid growth of virtual screening is evidenced by the increase in the number of citations matching the keywords “virtual screening” from 4 in 1997 to 302 in 2004. In his 2003 review article, Green of GlaxoSmithKline concluded that: “The future is bright. The future is virtual”.
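The enrichment goal described above is often quantified with an enrichment factor: the fraction of actives among the top-ranked compounds divided by the fraction of actives in the whole library. The following is a minimal Python sketch under stated assumptions; the metric itself, the function name, and the 0/1 activity labels are illustrative and are not taken from the studies cited in this overview.

```python
def enrichment_factor(ranked_labels, top_fraction):
    """Enrichment factor for a ranked screening list.

    ranked_labels: 1 (active) or 0 (inactive), ordered from best to worst
                   predicted score (hypothetical input format).
    top_fraction:  fraction of the list selected for testing, in (0, 1].
    """
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")

    n_top = max(1, round(n_total * top_fraction))
    actives_in_top = sum(ranked_labels[:n_top])

    # Ratio of the hit rate in the selected subset to the hit rate overall.
    return (actives_in_top / n_top) / (n_actives / n_total)


# Hypothetical ranked library: 3 actives among 10 compounds.
example_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(enrichment_factor(example_labels, 0.2))
```

A value above 1 indicates that the ranking concentrates actives near the top of the list, whereas a value near 1 is no better than random selection.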
The PriceWaterhouseCoopers report Pharma 2005: An Industrial Revolution in R&D stressed that the pharmaceutical industry needs to find means of improving the efficiency and effectiveness of drug discovery and development in order to sustain itself. This was recently echoed at the 2006 Drug Discovery Technology Conference in Boston, MA by Dr. Steven Paul, head of science and technology at Eli Lilly & Co., who stated that the current business model will become fundamentally untenable unless there is a significant improvement in the efficiency and effectiveness of the process. The PriceWaterhouseCoopers report emphasized the growth and value of in silico approaches to address this issue and projected that in silico methods will become dominant from drug discovery through marketing. It was suggested that we are in a transitional period in which the roles of primary (laboratory and clinical studies) and secondary (computational) science are in the process of reversal.
Estimates of the time and cost of bringing a new drug to market vary, but 7–12 years and $1.2 billion are often cited. Furthermore, only five out of 40,000 compounds tested in animals reach human testing, and only one of five compounds reaching clinical studies is approved. This represents an enormous investment in terms of time, money, and human and other resources. It includes chemical synthesis, purchase, curation, and biological screening of hundreds of thousands of compounds to identify hits, followed by their optimization to generate leads, which requires further synthesis. In addition, the predictability of animal studies in terms of both efficacy and toxicity is frequently suboptimal. Therefore, new approaches are needed to facilitate, expedite and streamline drug discovery and development, to save time, money and resources, and, as per the pharma mantra, to “fail fast, fail early”. It is estimated that computer modeling and simulations account for ~10% of pharmaceutical R&D expenditure and that this share will rise to 20% by 2016.
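The attrition figures quoted above can be combined into a single overall success rate. The short Python sketch below does only that arithmetic; the variable names are illustrative, and it assumes the two stages are simply multiplied, which is a simplification of the real pipeline.

```python
from fractions import Fraction

# Figures quoted in the text.
animal_to_human = Fraction(5, 40_000)   # compounds tested in animals that reach human testing
human_to_approval = Fraction(1, 5)      # compounds in clinical studies that are approved

# Assumption: the stages are treated as independent and multiplied.
overall = animal_to_human * human_to_approval
compounds_per_approval = 1 / overall

print(f"Overall approval probability: {float(overall):.6f}")
print(f"Compounds tested in animals per approved drug: {compounds_per_approval}")
```

Under these figures, roughly one approved drug results from every 40,000 compounds tested in animals.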
The role of computational models is to improve prediction based on existing knowledge. Computational methods are playing an increasingly large and important role in drug discovery and development [7–15] (Fig. 2) and are believed to offer a means of improving efficiency for the industry. They are expected to limit and focus chemical synthesis and biological testing and thereby greatly decrease traditional resource requirements.
The growing presence, prominence and importance of CADDD is reflected in the multiple scientific sessions dedicated to it at major scientific conferences, e.g. the 45th Annual Meeting of SOT in San Diego in 2006 (http://www.toxicology.org/ai/meet/am2006/index.asp), the 97th Annual Meeting of AACR in DC (http://www.aacr.org/page6029.aspx), and PharmaDiscovery2006 in Bethesda, MD (http://www.pharmadiscoveryevent.com/app/homepage.cfm?appname=100304&moduleid=451&campaignid=32872&iUserCampaignID=23386619), as well as in conferences dedicated to the subject, such as the Drug Design conference in London, UK (http://www.smi-online.co.uk/event_media/overview.asp?is=4&ref=2062) and the bi-annual Gordon Research Conference on Computer-Aided Drug Design, which dates back to the 1970s, when it was known as Quantitative Structure Activity Relationships (QSAR) (http://www.grc.org/programs/2005/cadd.htm). At the 2005 London Drug Design conference, the aspiration and expectation were expressed that computational methods will achieve a role and utility in the pharmaceutical industry similar to those they already have in the automotive and airplane industries.
This is a brief overview, rather than an exhaustive review, of CADDD. The following commonly used computational approaches are discussed: ligand-based design (e.g. pharmacophore), structure (target)-based design (e.g. docking), and quantitative structure-activity/property relationships (QSAR/QSPR) (e.g. computational predictive toxicology).
IUPAC defines pharmacophore as: “the ensemble of steric and electronic features that is necessary to ensure the optimal supramolecular interactions with a specific biological target structure and to trigger (or to block) its biological response. A pharmacophore does not represent a real molecule or a real association of functional groups, but a purely abstract concept that accounts for the common molecular interaction capacities of a group of compounds towards their target structure. The pharmacophore can be considered as the largest common denominator shared by a set of active molecules” (http://www.chem.qmul.ac.uk/iupac/medchem/ix.html#p7). Pharmacophoric descriptors include H-bond donors, H-bond acceptors, and hydrophobic, aromatic, positive ionizable, and negative ionizable groups. They represent chemical features complementary to the receptor in 3-dimensional space. A pharmacophore can be further enhanced by combining it with shape and exclusion volume (steric) constraints [19, 20]. These enhancements decrease the likelihood of finding molecules that have a suitable 3-dimensional arrangement of functional groups but the wrong shape, which could prevent them from fitting into the receptor binding site. Building a pharmacophore requires knowledge of active ligands and/or the target receptor. There are a number of ways to build a pharmacophore. It can be based on the chemical structures of 3 or 4 known active compounds from different chemical scaffolds (http://www.accelrys.com/products/catalyst/catalystproducts/cathypo.html#hiphop) [21, 22]. Alternatively, diverse chemical structures for about 15 compounds, along with the corresponding IC50 or Ki values ranging over more than 3 orders of magnitude, can be used (http://www.accelrys.com/products/catalyst/catalystproducts/cathypo.html) [21, 22]. Statistical validation of the pharmacophore model may be done using Fisher’s randomization test, which is based on a random reassignment of activity values among the molecules of the training set.
Further validation may be performed with a set of known active ligands, or with a separate set of test compounds with known properties that were not used for model training, and ultimately by biological testing. Using published IC50 data [23–27] with values ranging over more than 3 orders of magnitude from several different chemical scaffolds, including phytoestrogens, we have derived an ERβ agonist pharmacophore (Fig. 3). The thirty-compound training set and the twenty-two-compound test set yielded correlations of 0.94 and 0.82, respectively. Resveratrol, a compound with cancer chemopreventive activities, showed a reasonable fit to this pharmacophore (Fig. 3), although not quite as good as that of genistein, another naturally occurring chemopreventive compound. In addition, a pharmacophore can be designed de novo based on complementarity to a known ligand binding site. The most commonly used pharmacophore software includes Catalyst (http://www.accelrys.com/products/catalyst/), Phase (http://www.schrodinger.com/ProductDescription.php?mID=6&sID=16&cID=0), Sybyl including Galahad, GASP, DISCOtech, and UNITY 3D (http://www.tripos.com/index.php?family=Modules,SimplePage,discovery_info), and MOE (http://www.chemcomp.com/software.htm).
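The randomization test mentioned above for statistical validation can be illustrated with a simplified Python sketch. It is not the procedure implemented in any of the listed packages, which rebuild the model on each scrambled data set. Here, as an assumption, the model output is held fixed and only the correlation with shuffled activities is recomputed. The function names and the example values are hypothetical.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumes neither list is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def randomization_p_value(estimated, measured, n_shuffles=999, seed=0):
    """Fraction of random activity reassignments that correlate with the
    model estimates at least as well as the real assignment does."""
    rng = random.Random(seed)
    observed = pearson_r(estimated, measured)
    shuffled = list(measured)
    at_least_as_good = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # randomly reassign activities to molecules
        if pearson_r(estimated, shuffled) >= observed:
            at_least_as_good += 1
    # The +1 terms count the real assignment itself as one arrangement.
    return (at_least_as_good + 1) / (n_shuffles + 1)


# Hypothetical estimated vs. measured activities (e.g. pIC50) for 10 compounds.
estimated = [5.1, 5.9, 6.4, 7.0, 7.8, 8.3, 8.9, 9.4, 9.9, 10.5]
measured = [5.0, 6.1, 6.2, 7.3, 7.5, 8.6, 8.8, 9.7, 9.8, 10.4]
print(randomization_p_value(estimated, measured))
```

A small value suggests that the observed correlation is unlikely to arise from a chance pairing of structures and activities.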
Structure (target)-based drug design relies on docking, i.e. modeling the binding of a ligand to its receptor, the target protein. Docking is used to identify and optimize drug candidates by examining and modeling molecular interactions between ligands and target macromolecules. An example of ligand binding and the associated van der Waals, hydrogen bonding and electrostatic energies as a function of the interatomic distance is shown in Fig. 4. Based on the X-ray structure of the ERβ receptor co-crystallized with various ligands, the ERα Met421 → ERβ Ile373 and ERα Leu384 → ERβ Met336 substitutions in the ligand binding pocket, and computational modeling, the Wyeth group designed a selective ERβ agonist, ERB-041, with affinity similar to, but selectivity for ERβ more than 200-fold greater than, that of 17β-estradiol (Fig. 5). Structure (target)-based design requires structural information for the receptor, which can be obtained from X-ray crystallography, NMR or homology modeling. The latter is another computational technique, used to predict an unknown protein structure from its sequence similarity to known protein structure(s). In the process of docking, multiple ligand conformations and orientations are generated and the most appropriate ones are selected. Scoring functions are applied to evaluate the tightness of the interaction, i.e. to estimate the binding free energy. The general observation is that consensus scoring (a combination of different scoring algorithms) yields better results than individual scoring functions. Validation may be performed with known active and inactive ligands, by comparison to crystallographic data, and by prediction of rank-ordering and binding affinities. Several recent publications have compared different docking methods [29–33]. One of these studies evaluated 10 docking programs and 37 scoring functions against eight proteins of seven protein types in terms of binding mode prediction, virtual screening for lead identification, and rank-ordering by affinity for lead optimization.
While these programs were able to identify active ligands, none of them performed well for all the targets. In general, the programs were also less effective in rank-ordering ligands and predicting ligand binding affinity. The most commonly used docking software includes Autodock (http://www.scripps.edu/mb/olson/doc/autodock/), Gold (http://www.ccdc.cam.ac.uk/products/life_sciences/gold/), Dock (http://dock.compbio.ucsf.edu/), Insight II Affinity and Cerius2 LigandFit (http://www.accelrys.com/), Sybyl including FlexE and FlexX (http://tripos.com/index.php?family=modules,SimplePage,,,&page=comp_informatics), Glide (http://www.schrodinger.com/ProductDescription.php?mID=6&sID=6), and MOE (http://www.chemcomp.com/software.htm).
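Consensus scoring, mentioned above as a combination of different scoring algorithms, can be realized in several ways. The Python sketch below uses one simple variant, averaging the rank each ligand receives from each scoring function. The choice of rank averaging, the function names, and the scores are assumptions made for illustration and do not reproduce the scheme of any program listed here.

```python
def ranks_from_scores(scores):
    """Rank ligands within one scoring function (1 = best).
    Assumption: lower (more negative) scores mean tighter predicted binding."""
    ordered = sorted(scores, key=lambda ligand: (scores[ligand], ligand))
    return {ligand: position + 1 for position, ligand in enumerate(ordered)}


def consensus_ranking(score_tables):
    """Order ligands by their mean rank across all scoring functions."""
    tables = list(score_tables.values())
    if not tables:
        raise ValueError("at least one scoring function is required")
    ligands = set(tables[0])
    if any(set(table) != ligands for table in tables):
        raise ValueError("every scoring function must score the same ligands")

    rank_tables = [ranks_from_scores(table) for table in tables]
    mean_rank = {
        ligand: sum(ranks[ligand] for ranks in rank_tables) / len(rank_tables)
        for ligand in ligands
    }
    # Ties in mean rank are broken alphabetically to keep the output stable.
    return sorted(ligands, key=lambda ligand: (mean_rank[ligand], ligand))


# Hypothetical scores from three scoring functions on different scales.
example_scores = {
    "function_a": {"lig1": -9.1, "lig2": -7.4, "lig3": -8.0},
    "function_b": {"lig1": -6.0, "lig2": -5.0, "lig3": -6.5},
    "function_c": {"lig1": -30.0, "lig2": -22.0, "lig3": -25.0},
}
print(consensus_ranking(example_scores))
```

Working with ranks rather than raw scores avoids having to put scoring functions with different units and ranges on a common scale.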
The applications and benefits of CADDD have been reviewed and demonstrated in a growing number of publications and are supported by examples of drugs derived from the in silico approach [9, 34–39]. Virtual screening has been shown to be more efficient than the commonly used empirical screening. Shoichet reported that the hit rate in ligand discovery (the number of compounds binding to a target divided by the number of compounds tested) is 2 to 3 orders of magnitude greater in virtual screening than in empirical screening. Others have reported similar results [41–43]. Receiver operating characteristic (ROC) curves have also been used as a metric to evaluate the ability of virtual screening to discriminate between active and inactive compounds [44, 45]. Evaluations using ROC curves have shown that virtual screening can exhibit reasonable sensitivity and specificity, i.e. it can keep false negatives and false positives, respectively, low [44, 45].
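The metrics named above can be computed directly from a scored list of compounds with known activity. The following Python sketch shows sensitivity and specificity at a chosen score cutoff and the area under the ROC curve. The function names, the convention that higher scores mean "predicted active", and the example data are assumptions for illustration only.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when compounds scoring at or above
    the threshold are predicted active (labels: 1 = active, 0 = inactive)."""
    pairs = list(zip(scores, labels))
    true_pos = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    false_neg = sum(1 for s, y in pairs if y == 1 and s < threshold)
    true_neg = sum(1 for s, y in pairs if y == 0 and s < threshold)
    false_pos = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen active outscores a randomly chosen inactive (ties count 0.5)."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    inactives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Hypothetical screening scores for three actives and three inactives.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_labels = [1, 1, 0, 1, 0, 0]
print(sensitivity_specificity(example_scores, example_labels, threshold=0.65))
print(roc_auc(example_scores, example_labels))
```

An area of 0.5 corresponds to random ranking and 1.0 to perfect separation of actives from inactives.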
The number of reports citing the successful application of CADDD in developing specific drugs in different therapeutic areas is expanding rapidly. A very interesting example, which can also serve as a proof of principle of the in silico approach, involves a type I TGF-β receptor kinase inhibitor. The same molecule (HTS-466284/LY-364947), a 27 nM inhibitor, was discovered independently by Biogen IDEC using virtual screening and by Eli Lilly using traditional enzyme and cell-based high-throughput screening. Another drug development program based on in silico modeling led to clinical trials of a novel, potent, and selective 5-HT1A agonist with anti-anxiety and anti-depression activity in less than 2 years from the start; it required less than 6 months of lead optimization and the synthesis of only 31 compounds.
Pharmacophore library screening and subsequent docking are complementary screening methods, and their combination provides optimum results. Commonly, this screening approach is preceded by filtering of the virtual databases (e.g. by physicochemical, ADMET/PK, stability, reactivity, toxicity, and drug-like properties) [9, 50–54]. This combination of screening methods has been successfully employed in designing new hits and leads, e.g. a PPARγ ligand, a dopamine D3 receptor agonist, antibiotics, c-Src/Abl kinase inhibitors, a checkpoint-1 kinase inhibitor, an MDM2-p53 inhibitor, and an integrin αvβ3 antagonist. Typically, this approach involves virtual screening (pharmacophore plus docking) of virtual chemical structure libraries containing hundreds of thousands of compounds and requires chemical synthesis and biological screening of fewer than 100 compounds to yield a handful of drug candidates with good receptor affinities. Recently, the application and utility of this virtual screening approach in combination with activity-guided fractionation of medicinal plants was also demonstrated, and the combination was termed “in combo screening” [60–64].
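The staged approach described above (property filtering, then pharmacophore screening, then docking) can be pictured as a funnel in which each stage passes a smaller set to the next, more expensive one. The Python sketch below illustrates only that control flow. The `Compound` record, its fields, and the cutoffs are hypothetical stand-ins for the output of real filtering, pharmacophore, and docking software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    passes_property_filter: bool  # stand-in for physicochemical/ADMET prefiltering
    pharmacophore_fit: float      # hypothetical fit value, higher is better
    docking_score: float          # hypothetical docking score, lower is better


def screening_funnel(library, min_fit, max_docking_score):
    """Apply the three stages in order and report how many compounds survive each."""
    prefiltered = [c for c in library if c.passes_property_filter]
    pharmacophore_hits = [c for c in prefiltered if c.pharmacophore_fit >= min_fit]
    docked_hits = [c for c in pharmacophore_hits if c.docking_score <= max_docking_score]
    counts = {
        "library": len(library),
        "prefiltered": len(prefiltered),
        "pharmacophore": len(pharmacophore_hits),
        "docked": len(docked_hits),
    }
    return counts, docked_hits


# Hypothetical five-compound library.
example_library = [
    Compound("A", True, 0.9, -9.0),
    Compound("B", False, 0.8, -8.5),
    Compound("C", True, 0.3, -9.5),
    Compound("D", True, 0.8, -4.0),
    Compound("E", True, 0.7, -8.0),
]
counts, hits = screening_funnel(example_library, min_fit=0.5, max_docking_score=-7.0)
print(counts, [c.name for c in hits])
```

Ordering the stages from cheapest to most expensive means that the costly docking step is run only on compounds that have already passed the earlier filters.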
While computational techniques have already provided significant benefits, they also hold great promise for future progress in drug discovery and development. However, CADDD is still an evolving technology and has a number of limitations [9, 28, 65]. Like high-throughput screening, virtual screening accepts limited accuracy (false positives and false negatives) in exchange for a short list of promising candidates for further evaluation. A number of issues are currently not adequately addressed in pharmacophore modeling. For example, a receptor may have more than a single active site, a receptor may adapt to different ligands, and multiple pharmacophores may be possible for a single site. Docking also has its limitations [28, 30, 65, 66]. Sampling molecular conformations to account for both ligand and receptor flexibility and selecting appropriate force fields are not straightforward. The number of possible conformations mushrooms with increasing molecular mass and number of rotatable bonds. This places severe demands on computational hardware and software. Assuming structural rigidity as an approximation may have severe entropy repercussions. In addition, binding may induce protein adaptation and additional conformational changes that are not normally considered. Another issue raised is the importance of the crystallization process and how representative a single crystal structure is. The role of solvent molecules is difficult to ascertain. Solvent molecules can play an important role in binding by forming bridging hydrogen bonds between a ligand and its binding site or via entropy effects. Appropriate treatment of ionization and tautomerization of the ligand and protein is also very important. There may be multiple binding sites, and the calculation of ligand-receptor affinities (scoring functions) leaves a lot to be desired. It is estimated that docking programs currently dock 70–80% of ligands correctly.
One recent study proposed that false positives, a significant issue in structure-based virtual screening, stem from the inability of the current docking and scoring algorithms to identify key interactions and treat them appropriately. Recognizing the drawbacks of the present state of the art in docking and scoring, several workshops, including representatives from academia, industry and government, were held to address these issues. The outcome of the workshops and the resulting proposed plan of action can be found in that reference and the references cited therein.
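The growth in the number of conformations with the number of rotatable bonds, noted above as a limitation of docking, can be made concrete with a back-of-the-envelope estimate. The Python sketch below assumes, purely for illustration, that each rotatable bond is sampled at three torsion states and that all combinations are allowed. Real conformer counts are lower because of steric clashes and symmetry, and the function name is hypothetical.

```python
def conformer_count(rotatable_bonds, states_per_bond=3):
    """Upper-bound estimate: every bond independently takes each sampled state."""
    if rotatable_bonds < 0 or states_per_bond < 1:
        raise ValueError("bond count must be >= 0 and states per bond >= 1")
    return states_per_bond ** rotatable_bonds


for n_bonds in (2, 5, 10, 15):
    print(n_bonds, conformer_count(n_bonds))
```

Because the count grows exponentially, each additional rotatable bond multiplies the number of ligand conformations to evaluate, before receptor flexibility is even considered.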
Extensive computational power is required to screen millions of compounds and to take into consideration their flexibility and that of the receptors. To address this need, several major virtual screening efforts using grid or distributed computing have been initiated. One of these involves the National Foundation for Cancer Research (NFCR) Center for Computational Drug Discovery, established in conjunction with Oxford University as an effort of “finding the right key to open a special lock--from billions molecular keys” (http://www.nfcr.org/Default.aspx?tabid=399). It uses over 2 million personal computers connected worldwide via the Internet and is capable of screening a library of 3.5 billion molecules against a dozen targets in months. It generated over 100,000 drug candidate molecules, or “hits”, in the first phase of the project. Another similar project allowed scientists to screen over 35 million compounds against several smallpox proteins and returned over 100,000 hits in the first 72 hours (http://www.worldcommunitygrid.org/projects_showcase/viewProjectArchives.do). Overall, 44 strong candidates were identified. A similar approach is also being used in the fight against AIDS by the Olson laboratory at the Scripps Research Institute (http://fightaidsathome.scripps.edu/discovery.html).
In addition to efficiently identifying drug hits and leads, it is also important to avoid drug attrition. Toxicity almost doubled as a cause of drug attrition from 1991 to 2000 and is one of the main causes of drug failure. Preclinical safety studies using animals are lengthy, expensive and frequently of limited predictability for human outcomes. In addition, there is a strong push to develop alternative in silico methods of predictive evaluation of drug toxicity in order to minimize animal testing. The expectation is that predictive toxicology will help avoid wasting resources, reduce the regulatory review burden and expedite review, reduce animal use, avoid the need for interspecies uncertainty factors, increase accuracy, sensitivity, and specificity, and predict adverse effects not detectable in animals (e.g. nausea, dizziness, headache, and cognitive impairment). The European policy for the evaluation of chemicals (REACH: Registration, Evaluation, and Authorization of Chemicals) has been a strong advocate of alternative in silico methods of predictive evaluation of chemical toxicity in order to minimize animal testing and conserve time and resources. QSAR and QSPR are commonly used computational methods in predictive toxicology. In a strict sense, these two terms are not synonymous, even though the term QSAR tends to be used for both. The principle behind them is the same, but they differ in the dependent variable: biological activity (QSAR) vs. bio-physico-chemical property (QSPR). The independent variables are molecular descriptors, e.g. electronic, spatial, topological, conformational, thermodynamic, or quantum mechanical. The idea of a structure-activity relationship dates back to 1868, when Crum Brown and Fraser reported on the correlation of paralyzing activity with the nature of the quaternary group in a collection of strychnine-like compounds.
More recently, the studies of Corwin Hansch in the 1960s demonstrated the applicability and usefulness of the QSAR/QSPR approach and led to its growing use [73, 74]. Interest in the use of QSAR in the regulatory arena has been growing, and its use is being evaluated. The Informatics and Computational Safety Analysis Staff (ICSAS) within CDER at the FDA is actively evaluating the potential of predictive toxicology (http://www.fda.gov/cder/offices/ops_io/default.htm). They have constructed databases of toxicological and clinical endpoints and, in collaboration with software companies, are developing and evaluating data mining and QSAR computational techniques for predictive toxicology. Applying QSAR algorithms to toxicity data and the corresponding chemical structures, they have developed tools for predicting toxicity response (mutagenicity, carcinogenicity) and toxicity dosing (No Observed Effect Level, NOEL; Maximum Recommended Starting Dose, MRSD). Their carcinogenicity QSAR model, built with 53 descriptors and data from the FDA database of 2-year rodent studies, exhibited 76% sensitivity and 84% specificity. The anticipation is that it could avoid unexpected rodent carcinogenicity results from very costly and lengthy studies ($2 to $5 million and over 2 years), which come on top of the cost and time associated with a successful Phase 1 study. A positive predictability of 92.5% and a false positive rate of 4.8% were achieved using a QSAR model to estimate the No Observed Effect Level (NOEL) of chemicals in man based on a database of Maximum Recommended Therapeutic Doses (MRTD) of marketed pharmaceuticals. This represents a marked improvement over the poor correlation (R² = 0.2005) between human MRTD and rodent Maximum Tolerated Dose (MTD) reported by the same authors. ICSAS has also developed a predictive model to estimate the Maximum Recommended Starting Dose (MRSD) for Phase 1 clinical trials based on the human Maximum Recommended Daily Dose (MRDD) (Fig. 6).
QSAR validation is commonly done internally (leaving a group of compounds out of the training set for testing) or externally (using test compounds not present in the training set). Criteria for validation include accuracy and proper rank ordering. However, concerns remain about the predictability of these QSAR models for new chemical entities, and these need to be addressed in the future. Nevertheless, the present results are encouraging, and it is hoped that predictions based on human data will decrease reliance on lengthy and expensive animal studies and on interspecies extrapolations of limited predictability.
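The internal (leave-group-out) validation described above can be sketched with the simplest possible QSAR model, a straight line relating one molecular descriptor to activity. The Python example below is a toy illustration under stated assumptions: a single descriptor, ordinary least squares, groups formed by index, and a cross-validated q² statistic. The function names and data are hypothetical, and none of this is the ICSAS methodology.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of activity = slope * descriptor + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("descriptor values must not all be identical")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def leave_group_out_q2(descriptor, activity, n_groups):
    """Cross-validated q2: each group is predicted by a model trained on the rest."""
    n = len(descriptor)
    if not 2 <= n_groups <= n:
        raise ValueError("n_groups must be between 2 and the number of compounds")
    press = 0.0  # predictive residual sum of squares
    for group in range(n_groups):
        train = [i for i in range(n) if i % n_groups != group]
        test = [i for i in range(n) if i % n_groups == group]
        slope, intercept = fit_line(
            [descriptor[i] for i in train], [activity[i] for i in train]
        )
        press += sum((activity[i] - (slope * descriptor[i] + intercept)) ** 2 for i in test)
    mean_activity = sum(activity) / n
    total = sum((y - mean_activity) ** 2 for y in activity)
    return 1.0 - press / total


# Hypothetical descriptor (e.g. a lipophilicity value) and measured activity.
descriptor = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
activity = [4.9, 5.6, 5.9, 6.6, 7.1, 7.4, 8.1, 8.4]
print(leave_group_out_q2(descriptor, activity, n_groups=4))
```

External validation follows the same pattern, except that the held-out compounds are never part of any training set.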
A number of open-source and free molecular modeling resources are available (http://www.chemoinf.com/), including databases such as PubChem (http://pubchem.ncbi.nlm.nih.gov/) and ZINC (http://blaster.docking.org/zinc/). There is even simple, web-based software that allows the drawing of chemical structures and the estimation of some physicochemical, biological, and drug-like properties (http://www.organic-chemistry.org/prog/peo/index.html and http://www.molinspiration.com/cgi-bin/properties).
Other computational approaches are also being used but are not discussed in this review. One of these, the de novo or fragment-based technique [81–83], is based on local optimization and provides a means of identifying new chemotypes and chemical scaffolds. In addition, emerging fields such as systems biology are expected to play an important role in drug discovery and development [84–93]. Systems biology employs in silico techniques to integrate and analyze disparate chemical and biochemical data in a parallel, as opposed to sequential, fashion. Fueled by ‘omic’ technologies, it applies the principles and mathematical tools of electrical engineering and networks to dynamic modeling and simulation of complex biological systems in a holistic manner.
Therefore, while it should be apparent that CADDD has great potential, one should not rely on computational techniques in a black-box manner and should beware of the Garbage In-Garbage Out (GIGO) phenomenon. The in cerebro element is an essential and critical part of the process [7, 94]. CADDD should be based on the in cerebro-in silico-chemico-biological approach.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.