The history of fragment-based drug discovery, with an emphasis on crystallographic methods, is sketched, illuminating various contributions, including our own, which preceded the industrial development of the method. Subsequently, the creation of the BMSC fragment cocktails library is described. The BMSC collection currently comprises 68 cocktails of 10 compounds that are shape-wise diverse. The utility of these cocktails for initiating lead discovery in structure-based drug design has been explored by soaking numerous protein crystals obtained by our MSGPP (Medical Structural Genomics of Pathogenic Protozoa) consortium. Details of the fragment selection and cocktail design procedures, as well as examples of the successes obtained are given. The BMSC Fragment Cocktail recipes are available free of charge and are in use in over 20 academic labs.
Over the course of the past decade, fragment-based drug discovery has matured into a powerful tool for arriving at drug leads. It has been the subject of many reviews [1–12], and refers to the screening of a drug target (almost always a protein) with a small number of small molecules (typically of the order of a thousand) with small molecular masses (often an Mr of less than 250). The screening is primarily done by crystallography or NMR, though some groups use surface plasmon resonance or mass spectrometry analysis of covalent capture of thiolated fragments by strategically placed cysteine tags in the target. The name “fragment” was coined to emphasize that the screened molecules could be considered substructures of the “normal”-size small molecules utilized in high-throughput screening (HTS). In 2008 no fewer than 10 candidates derived from fragment-based screening were in clinical trials and another 6 were in the stage of preclinical research.
Many researchers pinpoint the start of fragment-based drug discovery to the appearance of a seminal paper by the Fesik group at Abbott Laboratories entitled “Discovering high-affinity ligands for proteins: SAR by NMR”. This paper described how the binding of a weakly bound ligand out of a mixture of small molecules – called a cocktail – was mapped to shifts in the 15N-HSQC signals of binding site residues of the target, and it was further demonstrated that high affinity ligands could be obtained by linking the small ligands or “fragments”. Later this idea was transplanted to the world of crystallography, again by researchers at Abbott. In brief, assuming one has crystals that allow for access to the binding site of interest, one can soak the crystal with a cocktail of small but shape-wise diverse “fragments” at high concentrations. With luck, one of the fragments will bind to the site of interest, and because of the shape diversity it should be possible to identify which fragment in the cocktail fits the electron density, avoiding lengthy deconvolution experiments. At the beginning of the millennium the method of fragment-based lead discovery using X-ray crystallography quickly became so appealing that specialized biotechnology companies were created that relied on it for their drug discovery efforts, such as SGX Pharmaceuticals (recently absorbed by Lilly), Plexxikon in California, Vertex Pharmaceuticals in Massachusetts, and Astex Therapeutics in the UK.
Fragment-based drug discovery did not emanate from a vacuum. The idea that leads can be elaborated from fragments that home in on hot spots in the targeted binding site was turned into a variety of tools by computational chemists. In 1985, Peter Goodford crafted the program GRID to predict such hot spots in the target for small molecular probes, most the size of just a functional group and hence the ultimate “fragment”. The program MCSS, which makes hot-spot predictions for fragment-sized molecules, was reported in 1991. The first fragment libraries to be explored in a binding site were created by Boehm to “feed” his ligand design program LUDI in 1992. In the same year two of the current authors, then at the University of Groningen in The Netherlands, published an article with the title “In search of new lead compounds for African trypanosomiasis drug design: a linked-fragment approach”. Their computational approach was guided by force-field calculations and entailed four stages: (1) a design pathway is defined based on the three-dimensional structure of a target protein; (2) this pathway is divided into subregions; (3) complementary building blocks, also called fragments, are designed in each subregion; complementarity is defined in terms of shape, hydrophobicity, hydrogen bond properties and electrostatics; and (4) fragments from different subregions are linked into potential lead compounds. Meanwhile experimentalists developed a crystallographic equivalent of GRID. Dagmar Ringe and colleagues replaced the water in their crystals with a pure organic solvent such as acetonitrile to determine the hot spots of the solvent molecules on the protein surface, using a single-compound solvent per experiment.
The first crystallographic experiments using cocktails of fragments were carried out in the group of Wim Hol, at the University of Groningen, The Netherlands. Driven by a life-long interest in the use of structural biology for drug design and inspired by an article in Scientific American about the revolution in RNA and peptide combinatorial chemistry for discovering high affinity ligands, he conceived the idea of fragment cocktail crystallography on Oct 7 1990 (Fig. 1). He convinced Christophe Verlinde, a post-doc at the time, to test his approach with triose-phosphate isomerase of Trypanosoma brucei, the African sleeping sickness parasite. Soon thereafter compounds were selected and purchased, and three cocktails of 128 compounds each were made by undergraduate student Gabby Rudenko (now Assistant Professor at U. of Michigan at Ann Arbor) and technician Tjaard Pijning. Of the three cocktail soaks with 128 compounds each, one resulted in a convincing difference Fourier peak in the binding site. However, because of the limited resolution of 2.8 Å, the identification of the ligand from the shape of the electron density was not possible and we had to resort to deconvolution. Soaks with one of the sub-cocktails of 64, 32 and 16 different compounds revealed the same difference density but neither of the two sub-cocktails of 8 did – perhaps a chemical reaction in the 16-compound cocktail produced the ligand. It deserves emphasis that all these experiments were carried out prior to the advent of fast data collection methods and equipment, hence requiring a substantial effort – up to several months for data collection and refinement, rather than the one day it would take in 2009. We were disappointed, but described these experiments in a book chapter in 1997.
In hindsight, although the cocktail crystallography idea was novel, there were several shortcomings in our approach. First, the selection of compounds was haphazard and mainly governed by budget constraints – we had no access to electronic compound catalogs. No effort was made to make sure that the compounds were sufficiently shape diverse for direct recognition from the electron density. Second, we underestimated the importance of solubility of the compounds, and therefore obtained at best a 1 mM concentration per individual compound. Third, the number of compounds per cocktail was too high, causing solubility and compatibility issues, and problems with compound identification from the electron density. Each of these three practical problems was properly addressed nearly a decade later by researchers from Abbott Laboratories, undoubtedly benefiting from decades of experience in dealing with chemical screening in industry. In 2001 Vicki Nienaber, Jonathan Greer, Celerino Abad-Zapatero and Daniel Norbeck from Abbott obtained a patent, “Ligand screening and design by X-ray crystallography”.
Upon seeing fragment-based ligand discovery blossom in the biotech industry around the year 2000, we decided to revive the method at the Biomolecular Structure Center (BMSC) of the University of Washington in Seattle, within the framework of our Medical Structural Genomics of Pathogenic Protozoa (MSGPP) program project.
The design and preparation of the BMSC fragment cocktails library can be briefly described as follows.
A starting list of 9,486 commercially available compounds was selected from the commercial ACD database version 2004.1 (Elsevier Molecular Design, San Leandro, CA) through use of the included ISIS program; the free ZINC database by the Shoichet group at UCSF only became available a year later.
We used the following criteria:
Subsequently, after removal of counter ions and solvent molecules, the compounds were analyzed in terms of diversity of ring frameworks by fragmentation with the program MOE (CCG, Montreal, Canada). At the level of connectivity, 23 frameworks as shown in Fig. (2) could be distinguished, and representatives of each category, including saturated and aromatic rings, were further visually selected. Mutagens, poisons, and highly functionalized compounds were eliminated, though we disclaim responsibility for potential failure to recognize all mutagens and poisons. A substantial number of Br-containing compounds were deliberately retained for the purpose of easier detection in the electron density. The final number of selected compounds was 680.
In the next step the compounds were subdivided into ‘cocktails’ of 10 compounds by maximizing the shape diversity. Because the total number of ways to partition 680 compounds into 68 cocktails of 10 compounds is astronomical, we refrained from carrying out the process manually. Instead, for each compound a shape fingerprint with 4 descriptors was calculated with MOE, consisting of the first kappa shape index of Hall and Kier, $^{1}\kappa = N_{\mathrm{atoms}}(N_{\mathrm{atoms}}-1)/N_{\mathrm{bonds}}^{2}$, where $N_{\mathrm{atoms}}$ is the atom count and $N_{\mathrm{bonds}}$ is the bond count, and the three standard dimensions, defined as the square roots of the three largest eigenvalues of the covariance matrix of the atomic coordinates. The shape fingerprints were used to assess the similarity between compounds, and 68 maximally diverse subsets of 10 compounds each were selected by non-parametric ranking in MOE.
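The four-descriptor shape fingerprint described above can be illustrated with a minimal, dependency-free sketch. It is not the MOE implementation: the function names are hypothetical, the kappa expression is transcribed exactly as quoted in the text (a cheminformatics toolkit may define the index differently), and the covariance is assumed to be the population covariance of unweighted atomic coordinates.

```python
import math

def kappa1_as_quoted(n_atoms, n_bonds):
    """First kappa shape index, using the expression as quoted in the text."""
    return n_atoms * (n_atoms - 1) / n_bonds ** 2

def covariance_matrix(coords):
    """Population covariance (divide by N; an assumption) of (x, y, z) tuples."""
    n = len(coords)
    mean = [sum(c[i] for c in coords) / n for i in range(3)]
    return [[sum((c[i] - mean[i]) * (c[j] - mean[j]) for c in coords) / n
             for j in range(3)] for i in range(3)]

def symmetric_eigenvalues_3x3(a):
    """Eigenvalues of a symmetric 3x3 matrix, descending (trigonometric closed form)."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:
        # Matrix is already diagonal.
        return sorted((a[0][0], a[1][1], a[2][2]), reverse=True)
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    e2 = 3.0 * q - e1 - e3
    return [e1, e2, e3]

def standard_dimensions(coords):
    """Square roots of the covariance eigenvalues, largest first."""
    eigenvalues = symmetric_eigenvalues_3x3(covariance_matrix(coords))
    return [math.sqrt(max(0.0, e)) for e in eigenvalues]

def shape_fingerprint(coords, n_bonds):
    """[kappa1, std_dim1, std_dim2, std_dim3] for one compound."""
    return [kappa1_as_quoted(len(coords), n_bonds)] + standard_dimensions(coords)

# Hypothetical example: a flat six-membered ring of heavy atoms, radius 1.39 A.
ring = [(1.39 * math.cos(math.radians(60 * k)),
         1.39 * math.sin(math.radians(60 * k)), 0.0) for k in range(6)]
print(shape_fingerprint(ring, n_bonds=6))
```

For the flat ring the third standard dimension is essentially zero, which is the kind of signal that separates planar from globular fragments. Fingerprints of this form could then be compared with any distance measure to spread similar shapes across different cocktails.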
The 680 cocktail compounds were purchased from the following vendors: Acros, ABCR, Aldrich, Alfa Aesar, Bachem, Fluka, ICN, Maybridge, Lancaster, Oakwood, Pfaltz-Bauer, Sigma, Specs, TCI, and Toronto Research Chemicals. The cost of goods was about $30,000 in 2004, and the administrative effort of placing and tracking orders should not be underestimated. No guarantee can be given that all chemicals are still available from these vendors. The purity of all compounds was verified by LCMS and NMR. Stock solutions were prepared with a 100 mM final concentration of the cocktail compounds in 100% DMSO. These stock solutions were diluted either into a specific cryo-buffer or into a fraction of the mother liquor, followed by crystal soaking. The final concentration of the cocktail chemicals was 5 to 10 mM, leading also to a residual 5 to 10% DMSO concentration. Soaking times as short as 10 s were shown to reveal bound ligands.
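The link between the final compound concentration and the residual DMSO follows from simple dilution arithmetic, sketched below. The function name is hypothetical, and the sketch assumes that the diluent (cryo-buffer or mother liquor) contains no DMSO of its own and that volumes are additive.

```python
def dilute_stock(stock_mM, stock_dmso_pct, target_mM):
    """Return (dilution factor, residual DMSO %) for diluting a DMSO stock.

    Assumes a DMSO-free diluent and additive volumes.
    """
    if not 0 < target_mM <= stock_mM:
        raise ValueError("target concentration must be positive and not exceed the stock")
    factor = stock_mM / target_mM
    return factor, stock_dmso_pct / factor

# 100 mM stock in 100% DMSO, diluted to the two ends of the soaking range.
for target in (5.0, 10.0):
    factor, dmso = dilute_stock(100.0, 100.0, target)
    print(f"{target:.0f} mM final: 1:{factor:.0f} dilution, {dmso:.0f}% DMSO")
```

A 1:20 dilution gives 5 mM and 5% DMSO, and a 1:10 dilution gives 10 mM and 10% DMSO, which matches the ranges quoted above.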
Over the last five years we have explored 26 MSGPP targets by cocktail crystallography. Seven targets gave hits while 19 targets did not yield results, for a variety of reasons. Some crystals were unable to tolerate the presence of even a low concentration of DMSO, while other crystals cracked, presumably because of conformational changes in the protein induced by ligand binding. In other cases the diffraction resolution deteriorated beyond usefulness (at least about 2.8 Å is required for ligands that do not contain Br). Finally, some targets produced robust crystals but none of the cocktail soaks led to an interesting difference Fourier density. Here we summarize all of our successes except for the cocktail fragment hit MSGPP recently obtained for T. cruzi histidyl-tRNA synthetase (GeneDB identifier Tc00.1047053507019.40), solved at 1.7 Å resolution, as screening of all the cocktails has not been completed yet.
P. falciparum phosphoglycerate mutase, a glycolytic enzyme with PlasmoDB identifier PF11_0208, has so far not been shown to be essential for the parasite, but it was one of the first proteins we used for trying out our BMSC cocktails. The native structure was deposited in the PDB as 1XQ9, but is so far unpublished. We obtained a significant electron density in the difference Fourier after soaking in cocktail #68 (Fig. (3)). The observed density is consistent only with 5-phenyl-2-furoic acid.
It must be noted that the PDB contains another unpublished phosphoglycerate mutase structure from P. falciparum with code 3EOZ. It corresponds to a different PlasmoDB identifier, PFD0660w. Neither of the two proteins has been characterized enzymatically to the best of our knowledge.
This target with GeneDB identifier Tb927.5.1360 is critical for purine salvage and recycling in trypanosomatid parasites because of the absence of enzymes for de novo purine biosynthesis. Since there is no NDRT homologue in humans, the T. brucei enzyme seems a particularly attractive target for drug development; however, due to overlapping salvage pathways in the parasite, simultaneous blocking of more than one target may be necessary.
Initially, the crystal structure of the dimeric enzyme was solved at 1.8 Å resolution (PDB: 2A0K). The two active sites, lined by 9 hydrophobic, 4 charged, and 6 neutral side chains, were shown to be across the dimer interface and to possess a volume of around 1180 Å³ each. The crystals proved robust and were soaked with 31 of the 68 BMSC cocktails. From synchrotron data with a resolution slightly better than the native dataset, ultimately four ligands were identified in the active site: benzo[cd]indol-2(1H)-one (PDB: 2F67); 5-aminoisoquinoline (PDB: 2F2T); 2-ethylbenzyl alcohol (PDB: 2F62); and 1-methylquinoline-2(1H)-one (PDB: 2F64). Remarkably, in each case a glycerol molecule was observed in the immediate vicinity of the bound compound, with oxygen atoms occupying positions that are equivalent to the O3′ and O5′ atoms of the substrate in the homologous Lactobacillus helveticus enzyme. As an example, the electron density for benzo[cd]indol-2(1H)-one is shown along with the cocktail composition in Fig. (4). It is clear that no other compound in the cocktail is compatible with the observations.
Leishmania major coproporphyrinogen III oxidase, with GeneDB identifier LmjF06.1270, is required for porphyrin biosynthesis, though there is no evidence so far that it is a valid drug target in trypanosomatids. In all, 66 different cocktails were soaked into 147 crystals, ultimately leading to 42 useful datasets. Fragments from two different cocktails were found to bind: 5-fluoroindole-2-carboxylic acid (PDB: 3DWS) and cyclopentyl acetic acid (PDB: 3DWR). Interestingly, as can be seen in Fig. (5), the dataset collected for the cocktail containing cyclopentyl acetic acid shows that three copies of cyclopentyl acetate occupy the active site, as well as one acetate molecule, which is a “gift” from E. coli. We hypothesize that the 4 carboxylate groups of the fragments occupy the same positions as the 4 carboxylate moieties of the enzyme substrate coproporphyrinogen-III.
This enzyme of the shikimate pathway, with PlasmoDB identifier PFL0960w, is of particular interest for therapeutic targeting because of its absence in mammals. MSGPP solved the apo structure at 2.0 Å resolution (PDB: 1TQX). Again, the crystals were used to hone our skills in cocktail crystallography. Soaking with cocktail 68 revealed electron density for 3-aminopyrrolidine at 2.0 Å (Fig. (6)).
GeneDB LmjF30.1890 is a putative L. major adenylate kinase. Its significance as a drug target is unknown, but we used the crystals to test our cocktails during the development phase. We found one hit (Fig. (7)).
This L. braziliensis homolog of UDG, with GeneDB identifier LbrM18_V2.0540, is responsible for excising uracil in DNA repair. The enzyme has been shown to be very important in the related parasite T. cruzi, whereas higher eukaryotes possess redundant pathways to repair the same DNA damage. The 1.5 Å structure has been deposited in complex with 5-bromouracil with PDB accession code 3CXM. All 68 cocktails were screened, 42 of which led to 95 usable datasets, and no fewer than 4 fragments as well as DMSO were shown to bind to this target, as shown in Fig. (8). Subsequently, we designed a derivative of the hit 5-chloro-2-methoxybenzoic acid, shown in Fig. (9), and thereby improved the IC50 from about 15 mM to 15±3 μM. The new inhibitor consists of 15 non-hydrogen atoms and exhibits a ligand efficiency (LE) of 0.44. This compares favorably with the 1 μM inhibitor of human UDG2 designed by Stivers, MA1, which possesses 25 atoms and an LE of only 0.34.
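The ligand efficiency values quoted above can be approximately reproduced with the common definition LE = −ΔG / N, where N is the number of non-hydrogen atoms and ΔG = RT ln(IC50). The sketch below is an illustration under stated assumptions, not necessarily the calculation used for the published numbers: it substitutes the IC50 for a dissociation constant, assumes a temperature of 298.15 K, and uses a hypothetical function name.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def ligand_efficiency(ic50_molar, heavy_atoms, temperature_k=298.15):
    """Ligand efficiency in kcal/mol per non-hydrogen atom.

    Uses IC50 as a stand-in for the dissociation constant (an assumption).
    """
    delta_g = R_KCAL * temperature_k * math.log(ic50_molar)  # negative for IC50 < 1 M
    return -delta_g / heavy_atoms

# Designed inhibitor: IC50 of 15 uM, 15 non-hydrogen atoms.
print(round(ligand_efficiency(15e-6, 15), 2))
# Comparison compound: 1 uM inhibitor with 25 non-hydrogen atoms.
print(round(ligand_efficiency(1e-6, 25), 2))
```

Under these assumptions the designed inhibitor comes out at 0.44 kcal/mol per atom, in agreement with the text. The 25-atom comparison compound comes out near 0.33, slightly below the quoted 0.34, a difference that may reflect rounding of the potency or a different temperature convention.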
The enormous advantage of screening smaller molecules than in HTS is that it allows for much better sampling of chemical space. The assay signal for a hit in HTS requires at least 1–10 μM affinity, which results in hits with an average molecular mass (Mr) of around 270, which is only slightly smaller than the average size of a drug. This leaves little leeway for optimization by adding substituents because it has been established that compounds with an Mr above 500 have in general very poor bioavailability. In addition, in HTS up to several million compounds are screened against a target but the virtual space of drug-size molecules has been estimated to be of the order of 10⁶⁰ to 10²⁰⁰, hence the sampling coverage is dismal. In contrast, the technologies used in fragment-based screening allow for the detection of hits with affinities as low as millimolar. There is a larger chance that a fragment is complementary to a binding site than is the case for a drug-size compound, and since the hits are small in size there is considerable margin for optimization within the bounds of the bioavailability molecular mass limit. Moreover, virtual chemical space for fragment molecules is much smaller than for drug-size molecules. A systematic study of chemical space below an Mr of 160, corresponding to 11 atoms, yielded an estimated 14 million possible compounds of which 36,000 are known (25% are acyclic). At least, meaningful sampling of the chemical space of the available fragment-size chemicals appears feasible, in contrast to HTS.
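The contrast in sampling coverage can be made concrete with a few lines of arithmetic. The sketch below works in log10 units to avoid underflow; the figure of 3 million screened compounds is an assumed stand-in for "several million", and the function name is hypothetical.

```python
import math

def log10_coverage(n_screened, log10_space_size):
    """log10 of the fraction of a chemical space covered by a screen."""
    return math.log10(n_screened) - log10_space_size

# HTS: assumed 3 million compounds against the lower estimate of 10^60
# drug-size molecules.
hts_log_coverage = log10_coverage(3e6, 60)
print(f"HTS coverage is about 10^{hts_log_coverage:.1f} of drug-size space")

# Fragment space below Mr 160: 36,000 known out of ~14 million possible.
known_fraction = 36_000 / 14_000_000
print(f"Known fragment-size compounds: {known_fraction:.2%} of the enumerated space")
```

Even against the smallest estimate of drug-size space, an HTS campaign covers a fraction of roughly 10⁻⁵³ to 10⁻⁵⁴, whereas the known compounds below an Mr of 160 already represent about a quarter of a percent of the enumerated fragment space.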
While fragments are small and consequently have poor affinity, they provide for much better hit rates than HTS. Astex reports that typical hit rates range from 0.5 to 10.0% depending on the protein, with cocktails covering over 300 compounds, and 4–8 compounds per cocktail. We have similar results. Overall, seven out of 26 targets gave at least one hit, which is about 27%, and for datasets with useful resolution the hit rate, defined as the number of fragments seen in the difference maps divided by the total number of soaked molecules, is 2%. It should be mentioned that MSGPP has not yet tried extensively to obtain robust crystal forms for the 19 targets that were not successful. Robust crystals with accessible binding sites that diffract to at least 2.8 Å when soaked with cocktails are a key requirement for success with fragment-based cocktail crystallography, and could have significantly increased our success rate.
MSGPP hopes to transition in the near future into a combination of projects focusing on ~10 to 20 well-validated drug targets from protozoan parasites causing global diseases such as malaria, African trypanosomiasis, Chagas disease, leishmaniasis and cryptosporidiosis. These projects will still be of limited scale and in an academic setting, but will contain a significant chemical synthesis component. However, the full power of all available fragment screening methods cannot be applied and, hence, fragment cocktail crystallography, supplemented by ultra-filtration methods to find low-affinity binders in solution, will be the main method used for discovering initial binders and for finding opportunities for adding substituents to existing binders.
In a broader perspective, a lesson from the MSGPP efforts is that numerous crystals did not survive the fragment-cocktail soak procedure. As mentioned above, this is in certain cases most likely due to interference of certain fragments with crystal contacts. In other cases, conformational changes upon ligand binding destroy the crystalline order of pre-grown crystals, as Max Perutz already observed when adding oxygen to deoxyhemoglobin crystals more than half a century ago. Hence, for enhanced exploration of the intersection of chemical space with a potential drug target, it would be a major advantage if a substantial number of compounds could be set up in a large number of co-crystallization experiments. Since numerous variants of a particular drug target also need to be tried, the number of crystallization trials needed quickly becomes very large. For instance, for ~100 different drug target variants [37–38], ~1000 crystallization conditions, and ~1000 different small molecule compounds, about 10⁸ wells need to be explored. This number is quickly increased by a factor of 10 if different temperatures and protein-to-reservoir-solution volume ratios are also taken into account. Yet, with volumes of tens to a few hundreds of picoliters per well this is in principle possible, since such small volumes with ~10 mg/ml protein can generate a small number of crystals of sufficient size, i.e. 5 to 10 μm dimensions, for detecting diffraction patterns or even solving protein structures. This requires 10 mg of each of the protein variants of the target under study, which is well possible with current techniques for many proteins. It is exciting that the following technological developments are currently taking place:
By combining such technologies, and others, it might be possible to achieve the very high-throughput crystal growth screening (VHTXS) methods required for reaching the full power of fragment cocktail crystallography in future medical macromolecular crystallography.
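The scale of such a very high-throughput screen, and the protein it would consume, can be checked with a short calculation. This is a sketch under stated assumptions: a well volume of 100 pL is picked from within the quoted range of tens to a few hundreds of picoliters, each well is assumed to be filled entirely with protein solution at 10 mg/ml, and the function names are hypothetical.

```python
def total_wells(n_variants, n_conditions, n_compounds, extra_factor=1):
    """Number of crystallization wells for a full combinatorial screen."""
    return n_variants * n_conditions * n_compounds * extra_factor

def protein_mg_per_variant(wells_per_variant, well_volume_pl, protein_mg_per_ml):
    """Protein consumed per target variant, assuming wells hold only protein solution."""
    volume_ml = wells_per_variant * well_volume_pl * 1e-9  # 1 pL = 1e-9 mL
    return volume_ml * protein_mg_per_ml

base = total_wells(100, 1000, 1000)                    # variants x conditions x compounds
expanded = total_wells(100, 1000, 1000, extra_factor=10)  # temperatures, volume ratios
wells_per_variant = expanded // 100

print(f"{base:.0e} wells, {expanded:.0e} with the additional factor of 10")
print(f"{protein_mg_per_variant(wells_per_variant, 100, 10):.0f} mg protein per variant")
```

With these assumptions the screen comprises 10⁸ wells, or 10⁹ with the extra factor of 10, and each protein variant is spread over 10⁷ wells that together consume about 10 mg of protein, in line with the estimate given in the text.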
We thank the staff of the SSRL and ALS synchrotron beam lines for their assistance. We also wish to acknowledge contributions from other members of the initial SGPP consortium including Lori Anderson, Helen Neely, Margaret A. Holmes, Mark A. Robien, Lori W. Schoenfeld, Chris Mehlin, Mike Soltis, Thomas Earnest, George DeTitta and Joe Luft, to mention only a few. This project was supported in part by NIH grants P50GM64655 (SGPP) and P01AI067921 (MSGPP) to W.G.J.H.