Search tips
Search criteria 


Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
Acc Chem Res. Author manuscript; available in PMC 2010 June 16.
Published in final edited form as:
PMCID: PMC2727934

Efficient Drug Lead Discovery and Optimization


During the 1980s, advances in the abilities to perform computer simulations of chemical and biomolecular systems and to calculate free energy changes led to the expectation that such methodology would soon show great utility for guiding molecular design. Important potential applications included design of selective receptors, catalysts, and regulators of biological function including enzyme inhibitors. This time also saw the rise of high-throughput screening and combinatorial chemistry along with complementary computational methods for de novo design and virtual screening including docking. These technologies appeared poised to deliver diverse lead compounds for any biological target. As with many technological advances, realization of the expectations required significant additional effort and time. However, as summarized here, striking success has now been achieved for computer-aided drug lead generation and optimization.

An external file that holds a picture, illustration, etc.
Object name is nihms105169f9.jpg

Both de novo design using molecular growing and docking are illustrated for lead generation, and lead optimization features free energy perturbation calculations in conjunction with Monte Carlo statistical mechanics simulations for protein-inhibitor complexes in aqueous solution. The specific applications are to the discovery of non-nucleoside inhibitors of HIV reverse transcriptase (HIV-RT) and inhibitors of the binding of the proinflammatory cytokine MIF to its receptor, CD74. A standard protocol is presented that includes scans for possible additions of small substituents to a molecular core, interchange of heterocycles, and focused optimization of substituents at one site. Initial leads with activities at low-µM concentrations have been advanced rapidly to low-nM inhibitors.


This Account highlights recent advances in a core activity for drug discovery, structure-based design.1 The design is typically for small molecules that bind to a biomolecular target and inhibit its function, and the design process features building three-dimensional structures of complexes of the small molecules with the target. Structure-based design can be carried out with nothing more than the target structure, which most often comes from X-ray crystallography, and graphics tools for placing small molecules in the proposed binding site. However, additional insights provided by evaluation of the molecular energetics for the binding process are central to most current structure-based design activities. Some experiences and issues that have been addressed in the development and application of improved computational methodology for structure-based design are summarized here. The principal activities are the discovery of initial lead compounds, which show some activity in an assay measuring biological response, and their subsequent optimization to obtain greater potency and pharmacologically acceptable properties.

Lead Generation

Lead generation and optimization can be pursued through joint computational and experimental studies. As summarized in Figure 1, our approach features two pathways for lead generation, de novo design with the ligand-growing program BOMB (Biochemical and Organic Model Builder)2 and virtual screening using the docking-program Glide.3 Fragment-based design, which involves the docking and linking together of multiple small molecules in a binding site, is another popular alternative.4 Desirable compounds from de novo design normally have to be synthesized, while compounds from virtual screening of commercial catalogs are typically purchased. In both cases, it is preferred to begin with a high-resolution crystal structure for a complex of the target protein with a ligand. Though the ligand is removed, it is not advisable to start from an apo structure, which may have side chains repositioned to fill partially the vacant binding site.

Figure 1
Schematic outline for structure-based lead discovery and optimization.

De Novo Design

BOMB is used to grow molecules by adding layers of substituents to a core that is isolated or that has been placed in a binding site.2 In one run, up to four hydrogen atoms can be replaced by new groups L1 to L4. Alternative “topologies” are used such that L1-L4 can replace hydrogens in the core C or they may be linked together in different patterns, e.g., L2-L1-C-L3-L4 or C-L1-L2-L3-L4. BOMB includes a library of ca. 700 possible substituents Li including most common heterocycles and substituted phenyl groups. The substituents are organized in groupings Gi such as 5Het (5-membered ring heterocycles), 6Het, biHet, Ψ (hydrophobic), 3PhX (meta-phenyl-X), OR, etc. The core C may be as simple as, e.g., ammonia or benzene, or it may represent a polycyclic framework of a lead series. For a typical BOMB run, the user specifies the core, the topology, and the Gi. These define a “template”, which is equivalent to a combinatorial library, and all molecules corresponding to the template are grown. The user generally picks a template because it conforms to the geometry of the target binding site and because the molecules are expected to be amenable to synthesis. For each molecule that is grown, a thorough conformational search is performed. The dihedral angles for the conformer are optimized along with its position and orientation in the binding site using the OPLS-AA force field for the protein and OPLS/CM1A for the analogue.5 The resultant lowest-energy conformer is evaluated with a docking-like scoring function to predict activity.

In our search for non-nucleoside inhibitors of HIV reverse transcriptase (NNRTIs), dozens of templates have been considered and ca. 105 molecules have been grown and evaluated using BOMB. The NNRTI binding site is illustrated in Figure 2; a striking feature is the array of aromatic residues, Tyr181, Tyr188, Trp229, and Phe227, which form a “π-box”. A secondary consideration is the polar feature of the carbonyl group of Lys101, which is directed into the binding site. All templates were designed to deliver an unsaturated hydrophobic group U into the π-box, and many also incorporate an NH group to hydrogen bond with Lys101. A template that yielded promising results is Het-NH-34Ph-U where Het represents a monocyclic heterocycle, and 34Ph is a 3- or 4-substituted phenyl group. The library was grown starting with ammonia as the core, which was positioned to form a hydrogen bond with the carbonyl group of Lys101. In this topology, the U group is delivered from the right in Figure 2. Alternatively it could be delivered from the left, for example, in a U-Het-NH-PhX or U-Het-OCH2CH2-Het series (vide infra). In the example, there were ca. 50 Het and 50 U options, so the program built ca. 2500 Het-NH-3-Ph-U and 2500 Het-NH-4-Ph-U possibilities. This exercise resulted in identification of Het = 2-thiazolyl and U = dimethylallyloxy as a promising pair. It had a very low BOMB-score, and upon full optimization it yielded a very favorable protein-ligand interaction energy and no obvious liabilities. Subsequent synthesis of the thiazole 1 did provide a 10-µM lead using an MT-2 cell-based assay for anti-HIV activity. As described below, this lead was optimized to multiple highly potent NNRTIs including the triazine derivative in Figure 2 (31 nM) and the 2-nM inhibitor 2.69

Figure 2
Complex of HIV-RT with a non-nucleoside inhibitor (NNRTI) built using BOMB. The hydrogen bond with the oxygen atom of Lys101 is dashed.
An external file that holds a picture, illustration, etc.
Object name is nihms105169f10.jpg

Some additional details should be noted. The host, typically a protein, is rigid in the BOMB optimizations except for variation of terminal dihedral angles for side chains with hydrogen-bonding groups. The current scoring function has been trained to reproduce experimental activity data for more than 300 complexes of HIV-RT, COX-2, FK506 binding protein, and p38 kinase.2 It yields a correlation coefficient r2 of 0.58 for the computed vs. observed log(activities). The scoring function only contains five descriptors, the most significant of which is an estimate of the analogue’s octanol/water partition coefficient from QikProp (QPlogP).10 This supports the adage that increased hydrophobicity often leads to increased binding. However, refinement for quality of fit is needed using the host-ligand interaction energy or an index of mismatched contacts. The results from BOMB include the structures of the complexes and a spreadsheet with one row for each analogue summarizing computed quantities including host-analogue energy components as well as predicted properties from QikProp. The processing time per analogue is typically 5–30 seconds depending on the number of conformers. BOMB is also used to generate all input files for the subsequent, more rigorous free-energy perturbation (FEP) calculations that are used for lead optimization.

Virtual Screening

The common alternative to de novo design is virtual screening of available compound collections using docking software.1b Promising compounds are purchased and assayed (Figure 1). Many reviews and comparisons for alternative software and scoring functions are available.3,1113 There have been numerous success stories, though it is accepted that correct rank-ordering of compounds for activity is beyond the current capabilities. This is not surprising in view of the thermodynamic complexity of host-ligand binding including potential conformational changes for both host and guest upon binding.14 A beneficial ancillary feature of docking large compound collections is that interesting structural motifs often emerge as potential cores.

Our earliest docking effort did not yield active compounds, though it suggested a lead series that produced potent anti-HIV agents.2,15 70,000 compounds from the Maybridge catalog plus 20 known NNRTIs were processed. After initial filters and docking using Glide 3.5 with standard precision,3 the top 500 compounds were re-docked in extra-precision (XP) mode.16 The top 100 of these were post-scored with an MM-GB/SA method that provided high correlation between predicted and observed activities.15 Though known NNRTIs were retrieved well, assaying of ca. 20 high-scoring compounds failed to yield any actives. Persisting, the highest-ranked library compound, the inactive oxadiazole 3, seemed to have a viable core and was pursued. The ring substituents were removed and a set of small substituents was reintroduced in place of each hydrogen using BOMB; scoring with BOMB, followed by FEP-guided optimization led to synthesis and assaying of several polychloro-analogues with EC50 values as low as 310 nM in the MT-2 cell assay.2 Further cycles of FEP-guided optimization led to novel, very potent NNRTIs including the oxazole derivative 4, as described more below.17

An external file that holds a picture, illustration, etc.
Object name is nihms105169f11.jpg

A more recent virtual screening exercise was strikingly successful.18 New protocols had evolved including use of the much larger ZINC database of ca. 2.1 million commercial compounds.19 The goal was to disrupt the binding of MIF (macrophage migration inhibitory factor) to its receptor CD74. MIF is a cytokine, which is viewed to play a key role in inflammatory diseases and cancer.20,21 Curiously, MIF is also a ketoenol isomerase and the interaction of MIF with CD74 seems to occur in the vicinity of the tautomerase active site. The docking was performed using Glide 4.0 and the 1gcz crystal structure of a complex of MIF. In addition to the ZINC collection, the Maybridge HitFinder library was screened, which provided an additional 24,000 compounds. After processing with SP Glide, the top-ranked 40,000 compounds from ZINC and 1,000 from Maybridge were re-docked in XP mode.1 The large ZINC collection yielded hundreds of compounds with XP scores lower than for any Maybridge compounds. The average molecular weights for the top 1,000 ZINC and Maybridge compounds were both near 310, so the improved performance with ZINC results from greater structural variety.

Finally, the Glide poses for ca. 1200 of the top-ranked compounds were displayed and 34 compounds were selected. Only 24 were actually available for purchase, and 23 compounds were ultimately submitted to a MIF-CD74 binding assay. Remarkably, 11 of the compounds inhibited the protein-protein association in the µM regime including four compounds with IC50 values below 5 µM. Inhibition of MIF tautomerase activity was also established for many of the compounds with IC50 values as low as 0.5 µM (Figure 3). Optimization of several of the lead series is well along. It is expected that contributors to the success with the virtual screening in this case were improvements with the XP scoring, use of the large ZINC library, and the relatively small binding site and consequently small number of rotatable bonds for potential inhibitors. In view of the sensitivity of activity to structure, e.g., 14, it is unlikely that active compounds can be found in small libraries unless the assays can be run at high concentration. Even with a viable core, the chance is low that a small library will contain a derivative with a viable substituent pattern.

Figure 3
(Left) Diverse inhibitors of MIF discovered by docking, purchase, and assaying. (Right) Computed image of the benzoisothiazolone, a 4-µM tautomerase inhibitor, bound to MIF. Binding features aryl-aryl interactions and hydrogen bonding.

ADME Analyses

As leads are pursued, consideration of potential pharmacological liabilities is important. The issue became increasingly salient in the 1990s owing to high failure rates for compounds in clinical trials that could be ascribed to ADME (absorption, distribution, metabolism, excretion) and toxicity problems.22 Recognition arose that compounds developed in the post-HTS era were tending to be larger and more hydrophobic, which is accompanied by solubility and bioavailability deficiencies.23 Consequently, more effort was placed on quantitative prediction of molecular properties beyond log Po/w using statistical procedures, which are trained on experimental data.24,25

In Figure 1, the choice for ADME analyses is QikProp, which was among the earliest programs that predicted a substantial array of pharmacologically relevant properties. Version 1.0 from March 2000 provided predictions for intrinsic aqueous solubility, Caco-2 cell permeability and several partition coefficients including octanol/water. The input for QikProp is a three-dimensional molecular structure, and it mostly uses linear regression equations with descriptors such as surface areas and hydrogen-bond donor and acceptor counts. By version 3.0 from 2006, the output covered 18 quantities including log BB for brain/blood partitioning, log Khsa for serum albumin binding, and primary metabolites.10 Execution time is negligible since the most timeconsuming computation is for the molecule’s surface area.

In order to gauge acceptable ranges of predicted properties, QikProp 3.0 was used to process ca. 1700 known, neutral oral drugs.8,26 Consistent with the log Po/w limit of 5 in Lipinski’s rules,23 91% of oral drugs are found to have QPlogP values below 5.0. For aqueous solubility, 90% of the QPlogS values are above −5.7, i.e., S is greater than 1 µM. The QikProp results also state that 90% of oral drugs have cell permeabilities PCaco above 22 nm/s and no more than 6 primary metabolites. These quantities address important components of bioavailablility, namely, solubility, cell permeability, and metabolism. For the purposes of Figure 1, a compound is viewed as potentially problematic if it does not satisfy a ‘rule-of-three’: predicted log S > −6, PCaco > 30 nm/s, and maximum number of primary metabolites of 6. For activity requiring blood-brain barrier penetration, the predicted log BB should also be positive. There are exceptions to such rules; however, it would be imprudent to ignore property distributions for known drugs.

Lead Optimization

It is assumed that inhibitory potency increases with increasing biomolecule-inhibitor binding. So, on the computational side, the key for lead optimization is accurate prediction of biomolecule-ligand binding affinities.11 There are many approaches, but the potentially most accurate ones are the most rigorous.1,11 Currently, the best that is done is to model the complexes in the presence of hundreds or thousands of explicit water molecules using Monte Carlo statistical mechanics (MC)27 or molecular dynamics (Figure 4). Classical force fields5 are used, and extensive sampling is performed for key external and internal degrees of freedom for the complexes, solvent, and any counterions. FEP and thermodynamic integration (TI) calculations then provide formally rigorous means to compute free-energy changes.28,29 For biomolecule-ligand affinities, perturbations are made to convert one ligand to another using the thermodynamic cycle in Figure 4. The conversions involve a coupling parameter that causes one molecule to be smoothly mutated to the other.30 The difference in free energies of binding for the ligands X and Y then comes from ΔΔGb = ΔGX - ΔGY = ΔGF - ΔGC. Two series of mutations are performed to convert X to Y unbound in water and complexed to the biomolecule, which yield ΔGF and ΔGC.

Figure 4
(Left) A protein-ligand complex in a water droplet; typically, the ligand, 200–300 nearby residues, and 1000 water molecules are modeled. (Right) Thermodynamic cycle for relative free energies of binding. P is the receptor and X and Y are two ...

Absolute free energies of binding are not obtained, but for lead optimization it is sufficient to assess the effects of making changes or additions to a core structure in the same spirit as synthetic modifications. Though the FEP or TI calculations are rigorous, their accuracy is affected by many issues including the quality of the force fields, missing polarization effects, and possibly inadequate configurational sampling, associated with infrequent conformational changes. The idea of using such calculations for molecular design goes back at least to the mid-1980s in reports of the first FEP calculation for conversion of a molecule X to molecule Y30 and of the first FEP calculations for protein-ligand binding.31 A final comment from McCammon’s review on “Computer-aided molecular design” in Science in 1987 was perspicacious: “The attentive reader will have noticed that no molecules were actually designed in the work described here.”32 The situation remained basically unchanged for almost 20 years. As the convergence of FEP calculations was investigated, it was apparent that they were too computationally intensive for routine use given the computer resources available before ca. 2000.

Thus, until recently, FEP and TI calculations on protein-ligand systems predominantly addressed reproduction of known experimental data for small numbers of inhibitors. Kollman was a strong advocate of the potential of free-energy calculations for molecular design, and Merz and he did report a rare, prospective FEP result on the binding of an inhibitor to thermolysin.33 Pearlman also advanced the technology, though recent publications were still retrospective and confined to a simple congeneric series of 16 kinase inhibitors.34 In addition, Reddy and Erion have used FEP calculations to evaluate contributions of heteroatoms and small groups to binding.35 Our own computations on protein-ligand binding began to appear in 1997 using MC/FEP methodology.36 Many topics were subsequently addressed including substituent optimization,37 COX-2/COX-1 selectivity,38 heterocycle optimization,39 and the effects of HIV-RT mutations on the activities of NNRTIs.4043 It should be noted that we have tried more approximate procedures such as linear response and MM/GBSA, but they have not proven to be accurate enough to direct lead optimization.1a,15

FEP-Guided Optimization of Azines as NNRTIs

With this preparation, large increases in computer resources, the hiring of synthetic chemists, and collaboration with biologists, FEP-guided lead optimization projects were initiated. Early success in the optimization of NNRTIs is reflected by the progression from 1 to 2 in the Het-NH-3-Ph-U series.68 MC/FEP calculations were used initially to optimize the heterocycle and the substituent X in the phenyl ring. This quickly led to selection of 2-pyrimidinyl and 2-(1,3,5)-triazinyl for the heterocycle and chlorine or a cyano group for X. More than 10 alternatives for the U group, which scored well with BOMB, were also considered. In this case, synthesis of the alternatives was relatively straightforward and dimethylallyloxy emerged as optimal.6,7 These combinations yielded NNRTIs with EC50 values near 200 nM.

An external file that holds a picture, illustration, etc.
Object name is nihms105169f12.jpg

The next step was optimization of substituents R for the heterocycle.8 For the 2-substituted-pyrimidines, the immediate question was would 4,6-disubstitution be favorable or would mono-substitution at either position be preferred. In complexes with HIV-RT, the 4- and 6-positions are not equivalent; e.g., in Figure 2, the methoxy group could be directed towards the viewer (“out”) or away (“in”), as shown. From display of modeled complexes, the preferences were not obvious. This was clarified by MC/FEP results, which strongly favored a single small substituent on the pyrimidine ring oriented “in”. Synthesis of such substituted pyrimidines and triazines yielded many NNRTIs with EC50’s below 20 nM.68 There was good correlation between the FEP results and the observed activities.6,8 The methoxy-substituted pyrimidine 2 is the most potent, though it is also relatively cytotoxic (CC50 = 230 nM). The corresponding 1,3,5-triazine derivative is also potent (11 nM) and has a large safety margin (CC50 = 42 µM).8 Its p-chloro analogue (31 nM) is depicted in Figure 2.

Heterocycle Scans

In the initial optimization of the heterocycle, both 5- and 6-membered rings were considered and FEP calculations provided ΔΔGb for 6 alternatives in both cases.6 This process can be referred to as a heterocycle scan. It pointed to 2-pyrimidinyl as optimal in the 6-membered series and 2-thiazolyl, 4-1,2,3-(1H)triazolyl, and 2-oxazolyl as the best 5-membered choices. The latter two alternatives were less synthetically accessible and were not pursued; however, synthesis and assaying of several of the 6-membered options confirmed the predicted preference for 2-pyrimidinyl.6

FEP results also established the orientation of the methoxy methyl group in the pyrimidine and triazine derivatives to be as shown in Figure 2, i.e., pointing towards Phe227 rather than Tyr181. This knowledge then suggested cyclization of the methoxy group back into the azine ring to form fused heterocycles: Analogues with 6-5 fused rings were pursued driven by the prospective FEP results in Figure 5. Synthesis and assaying of 6 of the compounds showed close parallel between the predicted and observed activities.9 The illustrated furanopyrimidine derivative was predicted and observed to be the most potent; it is a highly novel, 5-nM NNRTI. The results highlight the accuracy of the FEP predictions and again the sensitivity of activity to structure. The pyrrolopyrimidine (130 nM) and pyrrolopyrazine (19 nM) pair is particularly striking. After the fact, analyses revealed more negative charge on the pyrazinyl nitrogen leading to stronger hydrogen bonding with the backbone of Lys101.9 Clearly, a computational heterocycle scan can be a powerful, time-saving, leadoptimization tool.39 There are many options and the synthetic challenges can be great. In the example, heteroaryl halides were needed for reaction with substituted anilines; several were not previously reported and required considerable synthetic effort.9

Figure 5
Heterocycle scan in the biHet-NH-3-Ph-U series: FEP results for relative ΔGb (kcal/mol) and experimental anti-HIV activities (nM).
An external file that holds a picture, illustration, etc.
Object name is nihms105169f13.jpg

Changing heterocycles in the center of a structure can be particularly difficult. For example, synthesis of 3 and 4 requires fundamentally different procedures for the ring construction.17 Such changes in chemotype can lead to a significant delay as a viable synthetic route is found for the new target. In the case of this U-5Het-NH-4PhX series, an FEP scan was carried out for 11 alternative 5-membered-ring heterocycles by perturbation from the corresponding thiophene.17 Remarkably, the only one that was predicted to be more active than the oxadiazole was the 2,5-disubstituted-oxazole. The prediction was confirmed and provided a major step forward (Figure 6). It is noted that the ca. 8-fold activity improvement, which corresponds to a ΔΔG of 1.2 kcal/mol, is less than the computed ΔΔG of 2.5 kcal/mol. This is a common pattern that likely stems from the use of a cell-based assay and the lack of explicit polarization effects in the FEP calculations.

Figure 6
Heterocycle scan in the U-5Het-NH-4PhX series; FEP results for ΔGb (kcal/mol) relative to the thiophene analogue and experimental anti-HIV activity.

In view of the synthetic challenges, only two alternatives were prepared, the thiadiazole and thiazole analogues, which were both predicted and found to be inactive (Figure 6). Graphical display of modeled complexes is inadequate to gauge relative potency. In retrospect, the results indicate that the longer C-S bonds in the 2,5-disubstituted sulfur-containing heterocycles cause crowding of the dichlorobenzyl group and Tyr181, and the nitrogen in the 4-position has an electrostatically unfavorable interaction with Glu138.

Small Group Scans

Small group scans are also highly informative. A standard protocol with BOMB is to replace each hydrogen of a core, especially aryl hydrogens, with 10 small groups that have been selected for difference in size, electronic character, and hydrogen-bonding patterns: Cl, CH3, OCH3, OH, CH2NH2, CH2OH, CHO, CN, NH2, and NHCH3. This is generally adequate to define likely places for beneficial substitution of hydrogen by the first three groups. The situation with the polar groups is less clear owing to the competition for the ligand between hydrogen bonding in the complex versus unbound in water. As long as some substitutions appear promising, a chlorine and/or methyl scan using FEP calculations is then desirable to obtain quantitatively reliable predictions. The value of using both a chlorine and methyl scan is well illustrated in Figure 7; knowing the optimal position for the two groups provides an activity boost from 30 µM to 39 nM in this case.68

Figure 7
The power of chlorine and methyl scans; experimental EC50 values for anti-HIV activity.

A chlorine scan was also particularly helpful in evolving the inactive oxadiazole 3 into potent anti-HIV agents. The oxadiazole emerged in third place after the docking exercise, embedded among known, potent NNRTIs. The docking pose looked reasonable, though the score from BOMB was modest owing to poor accommodation of the methoxy groups in the vicinity of Tyr181 and Tyr188. Assuming that the tricyclic core might be viable, the substituents were removed and a chlorine scan was performed using MC/FEP simulations.2,17 The predicted changes in free energy of binding for replacing each hydrogen by chlorine are summarized in Figure 8; again formally equivalent positions become non-equivalent in the complexes. The scan indicated that the most favorable positions for chlorines were at C3 and C4 in the phenyl ring and at C2 and C6 in the benzyl ring. A series of polychloro analogues were then synthesized and the activities were found to closely parallel the predictions. The core and, for example, the 4,4’-dichloro analogue 5 were inactive; however, the trichloro and tetrachloro analogues 68 followed the FEP expectations and yielded sub-µM NNRTIs. Thus, with the aid of the FEP chlorine scan it was possible to evolve the false positive from the docking calculations into true positives.2,17

Figure 8
(Left) FEP-computed changes in ΔGb (kcal/mol) for replacement of the indicted hydrogens by chlorine. (Right) Snapshot of the complex of 4 bound to HIV-RT from MC/FEP simulations.
An external file that holds a picture, illustration, etc.
Object name is nihms105169f14.jpg

Small Group and Linker Refinement

Given the positive outcome of chlorine and/or methyl scans, it is natural to consider further optimization at the replacement sites. This has been FEP-guided several times, for example, in the optimization of the substituent in the pyrimidine ring and at the 4-position in the phenyl ring for the Het-NH-3-Ph-U compounds as in 2.68,17 More recent examples occurred with the azoles. For 9, FEP calculations were performed and predicted ΔΔGb values in kcal/mol for X = H (0.0), CH2CH3 (−0.3), CH3 (−1.6), CH2OCH3 (−1.7), OCH3 (−1.8), CF3 (−2.2), F (−2.3), Cl (−4.0), and CN (−5.2). The X = CH3, CH2OCH3, Cl, and CN analogues were synthesized and the assay results with EC50 values of 4, 4, 0.8, and 0.1 µM, respectively, conformed well to the expectations.17 The enhanced activity of the cyano derivatives appears to result from favorable ion-dipole interactions with a proximal protonated histidine residue and from strengthening the hydrogen bond between the para amino group and the backbone carbonyl group of Lys101.

An external file that holds a picture, illustration, etc.
Object name is nihms105169f15.jpg

FEP-guided optimization of the linker Y between the oxadiazole and dichlorophenyl rings in 10 was also pursued. The options considered were Y = CH2, (R)-CHCH3, (S)-CHCH3, NH, NCH3, O, and S. Though display of the corresponding complexes appear reasonable, the FEP predictions for modification of the methylene group were all unfavorable except for minor improvement for the methylamino (−1.6) and thio (−1.4) alternatives. The Y = NH and racemic CHCH3 analogues were synthesized and indeed found to be less active than the methylene compound; the methylamino compound turned out to have similar activity (0.2 µM) as the methylene analogue (0.1 µM) with X = CN, and the oxo and thio options were not pursued.

As a last thrust, FEP calculations were performed for possible replacement of the oxazole C4 hydrogen in 11 by R = F, Et, Me, CF3, and CH2OH. The five analogs were predicted to be less well bound than the unsubstituted compound (4) by 0.8, 1.5, 1.8, 2.2, and 3.9 kcal/mol, respectively. Visual inspections of modeled structures were, once more, ambiguous. The qualitative FEP result was confirmed experimentally for the C4-methyl derivative, which was found to be 7-fold less potent than 4. The other options were not further pursued.

In summary, starting from the inactive “lead” 3, combination of the FEP-based chlorine scan, heterocycle scan, and small substituent and linker optimizations delivered 4, a novel 13-nM NNRTI (Figure 8).17 This case illustrates how a molecular template can be thoroughly scrutinized with FEP calculations. A general protocol is summarized in Figure 1. De novo design or virtual screening can usually provide one or more lead compounds with low-µM activity. The substituents in the lead are likely not optimal. Consequently, removal of any small substituents from the core followed by chlorine and methyl FEP scans are desirable. Synthesis and assaying of the most promising di- or tri-substituted compounds from the scans can provide significant activity improvements, as in Figure 7. FEP-guided refinement of the small substituents, linkers, and heterocycles is the logical next step.

As an on-going example, the MIF lead featured in Figure 3 is being optimized. Chlorine, methyl and small group scans have been performed to optimize the substituents for the N-phenyl-benzoisothiazolone, synthesis of an initial round of analogues based on these results is near completion, and perturbation of the 5-ring heterocycle to 11 alternatives is underway. Concerning computer time, for the typical perturbations considered here, one ΔΔGb result can be obtained in one week using one 3-GHz Pentium processor, or by running the FEP windows in parallel, one ΔΔGb result can be obtained in one day using 12 processors.29 With moderate computer resources, the bottleneck in lead optimization is the synthetic chemistry.


Great progress has been made in the development and application of methodology to facilitate both drug lead generation and lead optimization. Computational chemistry has contributed significantly through advances in de novo design, virtual screening, prediction of pharmacologically important properties, and the estimation of protein-ligand binding affinities. Docking of large commercial and in-house libraries is now an essential approach for structure-based lead generation. Furthermore, as summarized here, the long-standing promise of the utility of free-energy calculations for molecular design has been fulfilled. The methodology allows broad exploration of the effects of potential modifications to a compound without immediate need for synthesis and without conceptual constraints associated with ease of synthesis. Depending on the outcome, synthetic and biological resources can be focused in the most promising directions. In view of the ever pressing needs for efficiency, free-energy guided molecular design can be expected to become a mainstream activity in many contexts.


Gratitude is expressed to the National Institute of General Medical Sciences (GM32136), National Institute of Allergy and Infectious Diseases (AI44616), National Foundation for Cancer Research, and Alliance for Lupus Research for support of this research, and to the many stalwart co-workers at Yale.




Bill Jorgensen is a graduate of Princeton and Harvard; he has been the Whitehead Professor of Chemistry at Yale since 1990. His research spans physical, organic, and biological chemistry.


1. (a) Jorgensen WL. The Many Roles of Computation in Drug Discovery. Science. 2004;303:1813–1818. [PubMed] (b) Klebe G. Virtual ligand screening: strategies, perspectives and limitations. Drug Disc. Today. 2006;11:580–594. [PubMed]
2. Barreiro G, Kim JT, Guimarães CRW, Bailey CM, Domaoal RA, Wang L, Anderson KS, Jorgensen WL. From Docking False-Positive to Active Anti-HIV Agent. J. Med. Chem. 2007;50:5324–5329. [PMC free article] [PubMed]
3. Friesner RA, Banks JL, Murphy RB, Halgren TA, Klicic JJ, Mainz DT, Repasky MP, Knoll EH, Shelley M, Perry JK, Shaw DE, Francis P, Shenkin PS. Glide: A new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 2004;47:1739–1749. [PubMed]
4. (a) Leach AR, Hann MM, Burrows JN, Griffen EJ. Fragment screening: an introduction. Mol. Biosyst. 2006;2:429–446. [PubMed] (b) Congreve M, Chessari G, Tisi D, Woodhead AJ. Recent Developments in Fragment-Based Drug Discovery. J. Med. Chem. 2008;51:3661–3680. [PubMed]
5. Jorgensen WL, Tirado-Rives J. Potential energy functions for atomic-level simulations of water and organic and biomolecular systems. Proc. Nat. Acad. Sci USA. 2005;102:6665–6670. [PubMed]
6. Jorgensen WL, Ruiz-Caro J, Tirado-Rives J, Basavapathruni A, Anderson KS, Hamilton AD. Computer-aided design of non-nucleoside inhibitors of HIV-1 reverse transcriptase. Bioorg. Med. Chem. Lett. 2006;16:663–667. [PubMed]
7. Ruiz-Caro J, Basavapathruni A, Kim JT, Wang L, Bailey CM, Anderson KS, Hamilton AD, Jorgensen WL. Optimization of diarylamines as non-nucleoside inhibitors of HIV-1 reverse transcriptase. Bioorg. Med. Chem. Lett. 2006;16:668–671. [PubMed]
8. Thakur VV, Kim JT, Hamilton AD, Bailey CM, Domaoal RA, Wang L, Anderson KS, Jorgensen WL. Optimization of pyrimidinyl-and triazinyl-amines as non-nucleoside inhibitors of HIV-1 reverse transcriptase. Bioorg. Med. Chem. Lett. 2006;16:5664–5667. [PubMed]
9. Kim JT, Hamilton AD, Bailey CM, Domaoal RA, Wang L, Anderson KS, Jorgensen WL. FEP-guided selection of bicyclic heterocycles in lead optimization for non-nucleoside inhibitors of HIV-1 reverse transcriptase. J. Am. Chem. Soc. 2006;128:15372–15373. [PubMed]
10. Jorgensen WL. QikProp, v 3.0; New York: Schrödinger LLC; 2006.
11. Gohlke H, Klebe G. Approaches to the description and prediction of the binding affinity of small-molecule ligands to macromolecular receptors. Angew. Chem. Int. Ed. Engl. 2002;41:2644–2676. [PubMed]
12. Leach AR, Shoichet BK, Peishoff CE. Prediction of protein-ligand interactions. Docking and scoring: successes and gaps. J. Med. Chem. 2006;49:5851–5855. [PubMed]
13. Zhou Z, Felts AK, Friesner RA, Levy RM. Comparative Performance of Several Flexible Docking Programs and Scoring Functions: Enrichment Studies for a Diverse Set of Pharmaceutically Relevant Targets. J. Chem. Inf. Model. 2007;47:1599–1608. [PMC free article] [PubMed]
14. Tirado-Rives J, Jorgensen WL. Contribution of conformer focusing to the uncertainty in predicting free energies for protein-ligand binding. J. Med. Chem. 2006;49:5880–5884. [PubMed]
15. Barreiro G, Guimarães CRW, Tubert-Brohman I, Lyons TM, Tirado-Rives J, Jorgensen WL. Search for non-nucleoside inhibitors of HIV-1 reverse transcriptase using chemical similarity, molecular docking, and MM-GB/SA scoring. J. Chem. Info. Model. 2007;47:2416–2428. [PMC free article] [PubMed]
16. Friesner RA, Murphy RB, Repasky MP, Frye LL, Greenwood JR, Halgren TA, Sanschagrin PC, Mainz DT. Extra precision Glide: Docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes. J. Med. Chem. 2006;49:6177–6196. [PubMed]
17. Zeevaart JG, Wang L, Thakur VV, Leung CS, Tirado-Rives J, Bailey CM, Domaoal RA, Anderson KS, Jorgensen WL. Optimization of Azoles as Anti-HIV Agents Guided by Free-Energy Calculations. J. Am. Chem. Soc. 2008;130:9492–9499. [PMC free article] [PubMed]
18. Cournia Z, Leng L, Gandavadi S, Du X, Bucala R, Jorgensen WL. Discovery of Human Macrophage Migration Inhibitory Factor (MIF)-CD74 Antagonists via Virtual Screening. J. Med. Chem. 2009;52:416–424. [PMC free article] [PubMed]
19. Irwin JJ, Shoichet BK. ZINC-A free database of commercially available compounds for virtual screening. J. Chem. Inf. Model. 2005;45:177–182. [PMC free article] [PubMed]
20. Morand EF, Leech M, Bernhagen J. MIF: a new cytokine link between rheumatoid arthritis and atherosclerosis. Nat. Rev. Drug Discov. 2006;5:399–411. [PubMed]
21. Hagemann T, Robinson SC, Thompson RG, Charles K, Kulbe H, Balkwill FR. Ovarian cancer cell-derived migration inhibitory factor enhances tumor growth, progression, and angiogenesis. Mol. Cancer Ther. 2007;6:1993–2002. [PubMed]
22. Egan WJ, Merz KM, Jr, Baldwin JJ. Prediction of Drug Absorption Using Multivariate Statistics. J. Med. Chem. 2000;43:3867–3877. [PubMed]
23. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 2001;46:3–26. [PubMed]
24. Jorgensen WL, Duffy EM. Prediction of Solubility from Structure. Adv. Drug Deliv. Rev. 2002;54:355–365. [PubMed]
25. Norinder U, Bergström CAS. Prediction of ADMET properties. ChemMedChem. 2006;1:920–937. [PubMed]
26. Proudfoot JR. The evolution of synthetic oral drug properties. Bioorg. Med. Chem. Lett. 2005;15:1087–1090. [PubMed]
27. Jorgensen WL, Tirado-Rives J. Molecular modeling of organic and biomolecular systems using BOSS and MCPRO. J. Comput. Chem. 2005;26:1689–1700. [PubMed]
28. Chipot C, Pohorille A. In: Springer Series in Chemical Physics, Vol 86: Free Energy Calculations: Theory and Applications in Chemistry and Biology; Chipot C, Pohorille A, editors. Berlin: Springer-Verlag; 2007. pp. 33–75.
29. Jorgensen WL, Thomas LT. Perspective on Free-Energy Perturbation Calculations for Chemical Equilibria. J. Chem. Theory Comput. 2008;4:869–876. [PMC free article] [PubMed]
30. Jorgensen WL, Ravimohan C. Monte Carlo simulation of differences in free energies of hydration. J. Chem. Phys. 1985;83:3050–3054.
31. Wong CF, McCammon JA. Dynamics and design of enzymes and inhibitors. J. Am. Chem. Soc. 1986;108:3830–3832.
32. McCammon JA. Computer-aided molecular design. Science. 1987;238:486–491. [PubMed]
33. (a) Kollman PA. Free Energy Calculations: Applications to Chemical and Biochemical Phenomena. Chem. Rev. 1993;93:2395–2417. (b) Merz KM, Kollman PA. Free Energy Perturbation Simulations of the Inhibition of Thermolysin: Prediction of the Free Energy of Binding of a New Inhibitor. J. Am. Chem. Soc. 1989;111:5649–5658.
34. (a) Pearlman DA, Charifson PS. Are Free Energy Calculations Useful in Practice? A Comparison with Rapid Scoring Functions for the p38 MAP Kinase Protein System. J. Med. Chem. 2001;44:3417–3423. [PubMed] (b) Pearlman DA. Evaluating the Molecular Mechanics Poisson-Boltzmann Surface Area Free Energy Method Using a Congeneric Series of Ligands to p38 MAP kinase. J. Med. Chem. 2005;48:7796–7807. [PubMed]
35. (a) Reddy MR, Erion MD. Calculation of Relative Binding Free Energy Differences for Fructose 1,6-Biphosphatase Inhibitors Using Thermodynamic Cycle Perturbation Approach. J. Am. Chem. Soc. 2001;123:6246–6252. [PubMed] (b) Erion MD, Dang Q, Reddy MR, Kasibhatla SR, Huang J, Lipscomb WN, van Poelje PD. Structure-Guided Design of AMP Mimics That Inhibit Fructose-1,6-bisphosphatase with High Affinity and Specificity. J. Am. Chem. Soc. 2007;129:15480–15490. [PubMed]
36. (a) Pierce AC, Jorgensen WL. Computational Binding Studies of Orthogonal Cyclosporin-Cyclophilin Pairs. Angew. Chem. Int. Ed. Engl. 1997;36:1466–1469. (b) Essex JW, Severance DL, Tirado-Rives J, Jorgensen WL. Monte Carlo Simulations for Proteins: Binding Affinities for Trypsin-Benzamidine Complexes via Free Energy Perturbations. J. Phys. Chem. 1997;101:9663–9669.
37. Plount-Price ML, Jorgensen WL. Analysis of Binding Affinities for Celecoxib Analogs with COX-1 and COX-2 from Docking and Monte Carlo Simulations and Insight into COX-2/COX-1 Selectivity. J. Am. Chem. Soc. 2000;122:9455–9466.
38. Plount-Price ML, Jorgensen WL. Rationale for the Observed COX-2/COX-1 Selectivity of Celecoxib from Monte Carlo Simulations. Bioorg. Med. Chem. Lett. 2001;11:1541–1544. [PubMed]
39. Guimarães CRW, Boger DL, Jorgensen WL. Elucidation of Fatty Acid Amide Hydrolase Inhibition by Potent α-Ketoheterocycle Derivatives from Monte Carlo Simulations. J. Am. Chem. Soc. 2005;127:17377–17384. [PubMed]
40. Rizzo RC, Wang D-P, Tirado-Rives J, Jorgensen WL. Validation of a Model for the Complex of HIV-1 Reverse Transcriptase with Sustiva through Computation of Resistance Profiles. J. Am. Chem. Soc. 2000;122:12898–12900.
41. Wang D-P, Rizzo RC, Tirado-Rives J, Jorgensen WL. Antiviral Drug Design: Computational Analyses of the Effects of the L100I Mutation for HIV-RT on the Binding of NNRTIs. Bioorg. Med. Chem. Lett. 2001;11:2799–2802. [PubMed]
42. Udier-Blagović M, Tirado-Rives J, Jorgensen WL. Validation of a Model for the Complex of HIV-1 Reverse Transcriptase with the Novel Non-nucleoside Inhibitor TMC125. J. Am. Chem. Soc. 2003;125:6016–6017. [PubMed]
43. Blagović MU, Tirado-Rives J, Jorgensen WL. Structural and Energetic Analyses for the Effects of the K103N Mutation of HIV-1 Reverse Transcriptase on Efavirenz Analogs. J. Med. Chem. 2004;46:2389–2392. [PubMed]