In its first 25 years JCAMD has disseminated a large number of techniques aimed at finding better medicines faster. These include genetic algorithms, CoMFA, QSAR, structure-based techniques, homology modelling, high-throughput screening, combichem, and dozens more that were hyped in their time and that now are just a useful addition to the drug designer's toolbox. Despite massive efforts throughout academic and industrial drug design research departments, the number of FDA-approved new molecular entities per year stagnates, and the pharmaceutical industry is reorganising accordingly. The recent spate of industrial consolidations and the concomitant move towards outsourcing of research activities require better integration of all activities along the chain from bench to bedside. The next 25 years will undoubtedly show a series of translational science activities that are aimed at better communication between all parties involved, from quantum chemistry to bedside and from academia to industry. This will above all include understanding the underlying biological problem and optimal use of all available data.
The online version of this article (doi:10.1007/s10822-011-9519-9) contains supplementary material, which is available to authorized users.
Life expectancy of man, and especially man in the western world, increased by more than 2 days per week throughout the previous century. Much of this dramatic increase is to the credit of hygiene, but medicines, and especially antibiotics and vaccines, have contributed significantly too. In the First World War, for example, almost as many soldiers died of disease as of bullets. During the Second World War this unfortunate situation got ‘remedied’ by the introduction of sulphonamides and penicillin.
At this moment medical doctors around the world can write prescriptions for tens of thousands of medicines, and an even larger number is available of herbal medicines, homeopathic wonder-cures, and other preparations for which the medicinal value has not been proven. Most medicines function by interacting with proteins in the body. Of the more than twenty thousand protein types in our body, fewer than five hundred are targeted by all these medicines. This, of course, gives hope for the future of drug design because most proteins are still available as a target for which a blockbuster drug can be designed.
Despite massive, world-wide efforts, the number of new molecular entities (NMEs) that the FDA approves per year for use as medicines certainly isn’t growing, while the amount of money involved goes up much faster than inflation, even when we include Obama’s Troubled Asset Relief Program.
This journal (JCAMD) has published many, many articles on techniques that according to the authors of these articles were the holy grail for drug design, and that in today’s reality are just good tools used in this process. Following a path familiar in science, someone has a good idea, gives it a name and publishes it. Others follow suit and publish improvement after improvement, after which yet others start testing all similar methods. An example is the use of support vector machines for ligand selection. This was introduced in 2000, and only 3 years of improvements were needed before the first comparisons of methods were published. Figure 1 illustrates the desperation of pharmaceutical industries. The ever increasing costs mainly result from development and marketing and, unfortunately for us, not from research. This might explain why, each time a new drug design research tool gets published, pharmaceutical industries immediately jump on it and give it hype status.
The first hype in drug design was born out of the famous article by Hol, in which he coined the name ‘rational drug design’ for all protein-structure based techniques, thereby implicitly calling all methods that actually worked, such as screening or luck, irrational; see Fig. 2.
It is not by eye that we can determine either the fitness of a ligand for a pocket, or the safety or efficacy of a drug. It does not seem illogical to assume that the founders of JCAMD were at least subconsciously dealing with the oversimplification implied by Fig. 2 when they started this journal. And we believe that most articles published in JCAMD have dealt with aspects of drug design ‘left out’ of Fig. 2. The advent of faster computers, first the VAX/VMS, then supercomputers such as the CRAY, and finally the PC, has allowed scientists to numerically solve chemical problems of ever increasing size and complexity. Semi-empirical quantum calculation methods have been devised to calculate the chemically relevant aspects of the electronic wave-functions associated with small organic molecules and thus compute their 3D structures as well as the energy of their conformers [15–19]. All the techniques derived in this domain are referred to as ligand-based drug design. In parallel, the development of molecular mechanics force fields, combined with the fact that Newton’s equations of motion could be solved for entire proteins in their aqueous environments, brought true innovation to the investigation of structure–function relationships [20–28]. Thus, not only the geometry and the potential energy surface of macromolecular assemblies could be calculated but also their dynamic and thermodynamic properties [29, 30]. For the early computational chemists this opened the perspective of testing at will the energy of interactions between protein targets and large collections of small molecule ligands [29, 31, 32]. The original thought that this would replace experimental validation processes, though, has long been shown to be a nice dream at best.
The perception that the underlying mechanism of protein–ligand recognition would be unravelled and would thus allow what ever since has been called structure-based drug design has never looked so clear and promising as at that particular moment in 1986.
With the exception of a very small fraction of ligands that are purely rigid, most bioactive ligands have a number of rotatable bonds that make them flexible. The values of the torsion angles in ligands are determined by the valence electrons of the atoms. The development of empirical molecular mechanics force fields in the late 1970s [33, 34] has allowed for the in silico determination of the geometries (low energy conformers) of ligands in vacuo. Application of these methods relies on two underlying assumptions: (1) that the conformation of the dissolved ligand corresponds closely to its gas-phase conformation; and (2) that the biologically active conformation of the ligand is likely to be found among the set of low energy conformers of the isolated ligand [36, 37]. The combined knowledge of the ligand structure (determined by NMR or X-ray), the measured binding affinities, and the spatial overlay of the low energy conformations should then be sufficient to establish a structure–activity relationship and pinpoint the spatial organization of the recurrent chemical features correlated with activity (pharmacophore). This paved the way for a series of successes for ligand-based drug design [e.g. 39, 40, 16]. However, although they seem fairly reasonable at first sight, both assumptions proved in practice to be incomplete and/or insufficient [41–50].
The computational process by which the complementary aspects between a ligand and a receptor binding site can be ascertained has been explored with the design of specifically dedicated docking programmes. Early docking methods were based solely on assessing the shape complementarity between a pocket in the 3D structure of a protein and low energy conformers of a ligand. The approach was computationally cumbersome due to the need to systematically search all possible ligand orientations within the pocket and to score each of these poses by its steric hindrance. Subsequent developments have taken place in several directions: improved scoring functions [52–63], different ways to deal with ligand flexibility [60, 64–72], and most recently also ways to deal with receptor flexibility [73–78]. Fundamental research has been performed in directions such as desolvation energies [79–83], or other aspects of the force fields used for scoring docking poses [66, 78, 84–96].
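The exhaustive search described above can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the algorithm of any particular docking programme: it works in 2D, treats the ligand as a rigid body, and uses a made-up score (+1 per contact, -10 per clash). The function names, the distance thresholds `clash` and `contact`, and the search grid are all assumptions introduced for the example.

```python
import math
from itertools import product

def rotate(points, angle):
    """Rotate 2D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def score_pose(ligand, pocket, clash=1.0, contact=2.0):
    """Toy shape score: +1 per ligand-pocket contact, -10 per steric clash."""
    score = 0
    for (lx, ly), (px, py) in product(ligand, pocket):
        d = math.hypot(lx - px, ly - py)
        if d < clash:
            score -= 10          # atoms overlap: steric hindrance
        elif d <= contact:
            score += 1           # atoms touch: favourable contact
    return score

def dock(ligand, pocket, n_angles=12, shifts=(-1.0, 0.0, 1.0)):
    """Try every rotation/translation on a coarse grid; return (best_score, best_pose)."""
    best = (-math.inf, None)
    for k, dx, dy in product(range(n_angles), shifts, shifts):
        rotated = rotate(ligand, 2 * math.pi * k / n_angles)
        pose = [(x + dx, y + dy) for x, y in rotated]
        s = score_pose(pose, pocket)
        if s > best[0]:
            best = (s, pose)
    return best

# Hypothetical pocket (three wall atoms) and two-atom rigid ligand.
pocket = [(-2.0, 0.0), (2.0, 0.0), (0.0, -2.0)]
ligand = [(0.0, 0.0), (0.0, 1.0)]
best_score, best_pose = dock(ligand, pocket)
```

Even this toy search evaluates 12 × 3 × 3 poses for a two-atom ligand; in 3D, with finer grids and flexible ligands, the number of poses grows very quickly, which illustrates why the early approach was considered cumbersome.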
The idea to calculate from first principles all atomic motions occurring in an active enzyme in its aqueous environment has attracted many scientists to computer aided molecular design. Starting with the atomic loci obtained from the X-ray structure of an enzyme, one can envisage integrating Newton’s equations [29, 31]. A series of snapshots describing the trajectory of the enzyme over time could thus be produced and ensemble average properties calculated based on Boltzmann’s ergodic hypothesis. The near infinite computer time needed for such experiments muted this field till concepts from alchemy could be embraced. In silico, one is not bound by the sequential order of events that govern paths between states, and hence so-called thermodynamic cycle methods could be developed that replaced chemical steps with alchemical steps that in principle should lead to the same outcome [29, 30, 97, 98].
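To make the idea of integrating Newton’s equations concrete, the following is a minimal sketch using the velocity Verlet scheme, a standard integrator in molecular dynamics. It is applied here to a single particle on a harmonic spring rather than to an enzyme; the force function, mass, time step, and all names are illustrative assumptions, and a real simulation would use a molecular mechanics force field over thousands of atoms.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate m * a = F(x) and return the list of (position, velocity) snapshots."""
    trajectory = [(x, v)]
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt      # advance position
        a_new = force(x) / mass                 # force at the new position
        v = v + 0.5 * (a + a_new) * dt          # advance velocity with averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory

def harmonic_force(x, k=1.0):
    """Toy 'force field': a single spring with force constant k."""
    return -k * x

def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the toy system."""
    return 0.5 * mass * v * v + 0.5 * k * x * x

# Snapshots over time, followed by a time average standing in for an ensemble average.
snapshots = velocity_verlet(x=1.0, v=0.0, force=harmonic_force, mass=1.0, dt=0.01, n_steps=1000)
mean_square_displacement = sum(x * x for x, _ in snapshots) / len(snapshots)
```

The last line averages a property over the snapshots, which is the toy counterpart of computing ensemble averages from a trajectory under the ergodic hypothesis.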
Comparative Molecular Field Analysis (CoMFA) is based on the overlay of active ligands [99, 100–102]. Initially, the technique was more a concept than an effective tool as computer power was very limited and molecular descriptors as well as dedicated algorithms needed to be developed. The underlying idea, that the 3D steric/non-bonded (Van der Waals) and electrostatic potential fields generated by the spatial organization of the chemical features around the scaffold of a ligand (Fig. 3) play a fundamental role in biomolecular receptor recognition, was so intuitively right that the technique made a breakthrough in 1988. Examples of the application of the method are plentiful. About 15% of all articles in JCAMD refer to the use of this technique, refined and applied in all sorts of ways to produce the overly famous quantitative structure–activity relationship (QSAR) equations. However, CoMFA suffers from three drawbacks: (1) the alignment of the ligands in the pocket must be either known or guessed correctly; (2) the method has been established for rigid or quasi-rigid classes of molecules (e.g. steroids); and (3) the detailed influence of the protein pocket is not known, which means that any feature that is not implicitly present in the training set will be missed [104–109]. These nearly fatal drawbacks prevented the generalization of the method as a standalone solution to rational drug design. Certainly, the best way to apply CoMFA is to combine it with a pharmacophore model and a carefully conducted conformational study of the ligands.
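The field idea behind CoMFA can be illustrated with a short sketch that samples a steric (Lennard-Jones 12-6) and an electrostatic (Coulomb) interaction energy of a probe at the points of a regular grid around a ligand. This is not the published CoMFA implementation: the probe charge, the Lennard-Jones parameters `epsilon` and `r_min`, the 30 kcal/mol truncation, and the function names are assumptions chosen for illustration only. In an actual analysis the resulting field values for a set of aligned ligands would be fed into a regression method to derive the QSAR equation.

```python
import math
from itertools import product

def make_grid(lo, hi, step):
    """Regular cubic grid of probe positions between lo and hi (inclusive) on each axis."""
    n = int(round((hi - lo) / step)) + 1
    axis = [lo + i * step for i in range(n)]
    return list(product(axis, axis, axis))

def comfa_fields(atoms, grid_points, probe_charge=1.0, epsilon=0.1, r_min=3.4, cutoff=30.0):
    """atoms: list of (x, y, z, partial_charge).
    Returns one (steric, electrostatic) pair per grid point, in kcal/mol."""
    fields = []
    for point in grid_points:
        steric = 0.0
        electro = 0.0
        for ax, ay, az, q in atoms:
            r = max(math.dist(point, (ax, ay, az)), 1e-3)   # avoid division by zero
            ratio6 = (r_min / r) ** 6
            steric += epsilon * (ratio6 * ratio6 - 2.0 * ratio6)   # 12-6 Lennard-Jones
            electro += 332.0 * probe_charge * q / r                # Coulomb; charges in e, r in Angstrom
        # Truncate large values near atomic nuclei (assumed cutoff).
        fields.append((min(steric, cutoff), max(-cutoff, min(electro, cutoff))))
    return fields

# Hypothetical two-atom 'ligand' with made-up partial charges.
ligand = [(0.0, 0.0, 0.0, -0.4), (1.5, 0.0, 0.0, 0.4)]
fields = comfa_fields(ligand, make_grid(-4.0, 4.0, 2.0))
```

Because every ligand is described by its field values at the same grid points, the result depends directly on how the ligands are superimposed, which is drawback (1) listed above.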
Many drug design projects include at some stage knowledge of the 3D structure of the target protein, and homology modelling is normally used when neither X-ray nor NMR derived coordinates are available [111–118]. Many computer programs were written for this purpose [119–123], and the CASP competition illustrates every 2 years where the field stands. Presently, YASARA seems to be performing very well, but many labs are working hard so this situation might change again in the future. For example, methods are under development that use PLIM to provide a first fix on the ligand docking site, after which steered molecular dynamics is used to continue the trajectory to convergence.
Similar to the CASP competitions, the GPCR-DOCK [126, 127] competitions have evaluated the quality of docking software, but with the additional complexity that the target structures needed to be modelled before docking could be attempted. In recent years a whole series of studies have been published in which homology modelling, combined with other tools, proved a viable replacement for the cumbersome experimental determination of target structures [111, 114, 128, 129]. The good performance of two Dutch teams in the recent GPCR-DOCK competition beautifully illustrates the often mentioned fact that even the best tools only perform well in the hands of good scientists. In this latter article we find the interesting quote “Interestingly enough, it is the model built with most human intervention which proves to be the best”.
In the early 1990s the radical new idea emerged that instead of the virtual and/or real screening of large libraries of already existing molecules to identify new bioactive hits, one could rather attempt to construct entirely new synthesizable molecular entities solely based on the knowledge of the active site of the pharmaceutical target enzyme [132–134]. To do so, small organic fragments composed of only a few atoms must be assembled in silico inside the binding sites of enzymes in such a way that optimal protein–ligand steric and electronic complementarity is achieved [84, 125, 135–141]. The major problem of this approach arises from the complexity of the active site landscape and the combinatorial vastness of all possible arrangements of fragments in the volume delineated by an enzyme active site [66, 142–144]. How should the first fragment be chosen, and where should it be positioned and oriented with respect to the inner surface of the binding pocket or cleft [143, 145, 146]? Which fragment should be attached to it next? Genetic algorithms [66, 142, 148–150] have been invented that allow this concept to be realized within a tractable amount of computer time by performing random transformations on a ligand collection. These transformations are selection, mutation, and crossover, and are reminiscent of the corresponding evolutionary processes in biology underlying the optimisation of genes, hence their name ‘genetic algorithms’. Experience shows that these algorithms provide solutions that nicely fit the objective function, although it often is difficult to understand exactly why.
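The three transformations named above (selection, mutation, and crossover) can be shown in a compact sketch. The ligand is represented simply as a list of fragment labels, one per attachment site, and the objective function is a made-up stand-in for a docking or complementarity score. The fragment list, the `toy_fitness` function, and all parameter values are hypothetical and serve only to make the example runnable.

```python
import random

FRAGMENTS = ["CH3", "OH", "NH2", "Ph", "Cl"]      # hypothetical fragment library
TARGET = ["Ph", "OH", "NH2", "Cl"]                # hypothetical 'ideal' ligand for the toy score

def toy_fitness(ligand):
    """Stand-in objective function: number of sites carrying the preferred fragment."""
    return sum(a == b for a, b in zip(ligand, TARGET))

def random_ligand(n_sites, rng):
    return [rng.choice(FRAGMENTS) for _ in range(n_sites)]

def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(ligand, rng, rate=0.1):
    """Replace each fragment with a random one with probability `rate`."""
    return [rng.choice(FRAGMENTS) if rng.random() < rate else f for f in ligand]

def evolve(fitness, n_sites=4, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    population = [random_ligand(n_sites, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]                    # selection: keep the fitter half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)

best = evolve(toy_fitness)
```

Because the fitter half of each generation is carried over unchanged, the best score never decreases from one generation to the next. The sketch also shows why the result can be hard to interpret: the algorithm only ever sees the value of the objective function, never the reason behind it.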
Randomly screening very large libraries containing up to 10⁵ or even 10⁶ chemical entities in in vitro enzymatic assays to produce leads was the central paradigm of the pharmaceutical industry throughout the 1990s. However, after years of operating very expensive screening facilities, it has been realized that the hits produced were not of the expected quality. For example, a bias is often observed toward overly lipophilic compounds that are impossible to optimize. Compared to the actual number of chemical entities (practically infinite), any number of compounds that can be screened via this process is essentially zero [151, 152]. In parallel, computational chemists had inferred that screening could be successfully performed virtually, on computers, at all stages in the drug design process from hit identification via hit optimization to lead optimization [153–162]. In each of these three stages virtual libraries can be created and filtered either using chemometrics to exclude molecules that obviously aren’t drug-like because of their predicted solubility or ADME/Tox properties, or using 3D chemical molecular descriptors (pharmacophores), or using docking results. Thus, libraries of compounds that do not actually exist can be screened and a much smaller, manageable number of compounds selected. This is of particular advantage at the stage of lead optimization, when only a few compounds are left. Scaffolds of lead compounds usually carry a number of branching points where chemical variation is allowed. The in silico creation of combinatorial libraries of all the variant compounds is a dramatically faster process than its in vitro counterpart [163–165].
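The in silico enumeration and filtering of a combinatorial library can be sketched as follows. A scaffold with three branching points is decorated with every combination of substituents, and a crude additive lipophilicity estimate is used as a filter. The substituent table, its numerical contributions, the scaffold value, and the threshold are invented for illustration and are not real logP data; a real workflow would use a cheminformatics toolkit and validated property predictors.

```python
from itertools import product

# Hypothetical substituents per branching point, with made-up lipophilicity contributions.
R_GROUPS = {
    "R1": {"H": 0.0, "CH3": 0.5, "Ph": 2.0},
    "R2": {"OH": -1.0, "Cl": 0.7, "CF3": 1.0},
    "R3": {"NH2": -1.2, "OCH3": 0.0},
}

def enumerate_library(r_groups):
    """Yield every variant as a dict mapping branching point to substituent."""
    sites = sorted(r_groups)
    for combo in product(*(r_groups[s] for s in sites)):
        yield dict(zip(sites, combo))

def lipophilicity(compound, r_groups, scaffold_value=1.5):
    """Additive toy estimate: scaffold value plus substituent contributions."""
    return scaffold_value + sum(r_groups[site][group] for site, group in compound.items())

def filter_library(r_groups, max_value=3.0):
    """Keep only the variants whose toy lipophilicity does not exceed max_value."""
    return [c for c in enumerate_library(r_groups) if lipophilicity(c, r_groups) <= max_value]

library = list(enumerate_library(R_GROUPS))
selected = filter_library(R_GROUPS)
```

The library size is the product of the number of substituents at each branching point (here 3 × 3 × 2), so a handful of additional substituents per site already yields thousands of virtual compounds, none of which need to be synthesized before filtering.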
One of the main difficulties in establishing reliable and/or transferable QSAR equations is that, even within a class of chemical analogues, ligand affinities may not respond linearly to the variation of one or several of the molecular descriptors that have been identified as related to activity. For instance, across a series of chemical substituents sorted by increasing polarity the measured affinity may respond linearly only for a restricted number of them because steric hindrance or global effects such as desolvation may penalize the binding of slightly larger groups. The modification of a branched group at another point around the scaffold may however allow some of the previously excluded ligands to become highly active. Indeed the mere addition of one methyl group may result in a sudden tenfold leap in potency, dramatically increasing ligand efficiency [166, 167]. It was demonstrated that these problems could be circumvented using artificial intelligence methods (neural networks, support vector machines, etc.) that are insensitive to the spatial alignment of the ligand scaffolds and that are able to recognize particular combinations of properties distributed around the scaffold of a set of active ligands [168–170]. Artificial intelligence can be applied at various stages in the rational drug design process to improve results that classical methods deliver with more uncertainty, especially when assessing general properties, such as drug-likeness, that result from the subtle combination of many different factors. Various examples of artificial intelligence applications and their limitations have been published in JCAMD [172–175]. Notwithstanding the utility of artificial intelligence, normal intelligence remains useful in avoiding some of the all too common pitfalls in the derivation and application of QSAR models.
We apologise to the many authors of methods that didn’t make it into the above list (see ESM Table 1). Much good work has been done that the editors certainly wouldn’t allow us to include because citing all 1,200 articles published in JCAMD in the first 25 years would perhaps be a bit excessive. We could have mentioned the work by Che on privileged structures, or by Lotta et al. on multivariate PLS modelling of data sets, or the recent work by Zhou et al. on the use of DFT calculations to accurately assess the existence of intermolecular H-bonds in docking instances. Sarmah et al.’s work on solvent effects also added significantly to the drug designer’s toolbox, but the methods described in these articles didn’t achieve hype status.
The rapid increase in costs of developing and marketing new medicines is not leaving the pharmaceutical industry untouched. Recent years have seen a strong concentration of activities in terms of mergers, buy-outs, and closures. It may simply be that a research-intensive industry like the pharmaceutical industry does not lend itself to the type of management that is common in consumer goods, fashion and footwear. It seems a paradox, though, that the high costs associated with drug design are caused by development, marketing, and legal fees, but when it comes to cost reduction it is the research departments that are, as it is euphemistically called, consolidated. The past years have also seen a consolidation of methods. JCAMD has published a large series of articles in which multiple methods have been combined [22, 128, 129, 181–185]. All these pipelines and otherwise combined methods speed up the use of the existing tools, and allow them to be applied to ever larger numbers of small molecules in ever shorter times.
Actually, there is a new hype raging at the moment, and it is called ‘translational science’. In Wikipedia we find under translational research: “In the field of medicine, for example, it is used to translate the findings in basic research more quickly and efficiently into medical practice and, thus, meaningful health outcomes, whether those are physical, mental, or social outcomes”. In a sense, the recent spate of articles on combining existing techniques into more easily applicable super-tools fits nicely into this translational paradigm. It must be stated, though, that the translational science hype is facing stiff competition from systems biology and from the modelling of pharmacokinetics and pharmacodynamics. Between the lines we read in translational science that the pharmaceutical industry has finally realized that our deep lack of understanding of all aspects of the interaction of a medicine with a human being is the main reason why luck is still the most determining factor in the drug design process. Consequently, we see the outsourcing budgets of the large pharmaceutical industries go up, and more and more fundamental research performed in academia is finding its way to small and medium size enterprises (SMEs) where it can be incorporated in their lean and mean research machines. Big pharma will at some time buy either their products or the whole SMEs and convert validated targets and leads or even Phase I products into new medicines.
This new paradigm will probably also be proven a hype soon; only time can tell whether translational research will rescue the pharmaceutical industry, or whether it will only better illustrate how much we don’t know yet. It remains a fact that better understanding of the underlying biology, better treatment of all available data, and more intelligent combinations of data, information, and knowledge must be beneficial for the drug design process and thus, in the long run, for all of us.
If the pharmaceutical industry wants academia more involved in the drug design process, it could itself make a giant first step by making available all (or at least very many) X-ray structures of protein–ligand complexes. We estimate that the number of PDB [189, 190] files collecting computer dust in the pharmaceutical industry is considerably bigger than the 75,000 structures now in the PDB. We have discussed this possibility with industrial crystallographers who realized that they were sometimes sitting on thousands of structure files for which secrecy was no longer an issue. They remained nevertheless hesitant to even consider discussing the release of these data with their management, for fear of paranoia-based rejection. Another often heard objection is that they are a bit ashamed of these data because often these files have not been refined any further than was needed to answer the biological or pharmaceutical question at hand. We offer to set up a database for these files, and we offer to re-refine all industrial structures of protein–ligand complexes. We will then only release those coordinates to the wider public that pass certain minimal validation criteria. Obviously, the files in this system will remain the property of the depositors. If one day deposition of coordinate files into the PDB becomes significantly easier, we can consider depositing all files in the PDB on behalf of the original depositors. It might seem a bold promise to re-refine perhaps even 100,000 structures, but the PDB_REDO experiment [192–196] shows that today this can be done. In PDB_REDO we significantly improved 85% of all presently available PDB files that were solved by X-ray. It seems likely that structures that often have been minimally refined can be improved even more easily.
One can even envisage that industries would like to look back at their own coordinates after we went through the elaborate and time consuming refinement process for them; in management speak that would be the ultimate win–win situation.
Despite massive efforts in the design of tools, databases, robotic techniques, and management innovations, luck seems to be at the basis of the discovery of most new medicines. The blockbuster Viagra is probably the best illustration of the opportunism that we tend to call serendipity.
In 1997, i.e., long before the first GPCR structure became available, Kuipers et al. performed a massive literature search for aryloxypropanolamines and similar compounds binding to the serotonin 5HT-1a receptor and a series of sequence-similar amine receptors. A correlation analysis revealed that only one residue’s presence/absence showed a perfect correlation with binding/non-binding of a series of compounds. A mutational study validated the hypothesis that this correlation indicated a direct hydrogen bond between an alcohol group in the aminergic ligand and asparagine 719. When the structure of the human β2 adrenoceptor bound to carazolol was solved by X-ray [PDBid 2RH1; 202], it indeed showed two hydrogen bonds between Asn-719 and this similar ligand (see Fig. 4). By the way, in none of the GPCR homology models available in 199× did Asn-719 interact with a ligand.
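The kind of correlation analysis described above can be illustrated with a small sketch that scans the columns of a multiple sequence alignment for positions where one residue type is present in all binding receptors and absent from all non-binding ones. This is a simplified illustration, not the published method: the receptor names, the four-column toy alignment, and the binding labels are invented, and a real analysis would have to handle gaps, partial correlations, and many ligands.

```python
def perfectly_correlated_positions(alignment, binds):
    """alignment: receptor name -> aligned sequence (equal lengths).
    binds: receptor name -> True if the compound binds, False otherwise.
    Returns (column_index, residue) pairs where the residue is present in every
    binder and absent from every non-binder."""
    receptors = sorted(alignment)
    length = len(alignment[receptors[0]])
    hits = []
    for pos in range(length):
        in_binders = {alignment[r][pos] for r in receptors if binds[r]}
        in_nonbinders = {alignment[r][pos] for r in receptors if not binds[r]}
        # Perfect correlation: a single residue type among binders, never seen in non-binders.
        if len(in_binders) == 1 and not (in_binders & in_nonbinders):
            hits.append((pos, next(iter(in_binders))))
    return hits

# Hypothetical aligned receptor fragments and binding data.
alignment = {
    "receptor_A": "DSNW",
    "receptor_B": "DTNW",
    "receptor_C": "DSVW",
    "receptor_D": "DTVW",
}
binds = {"receptor_A": True, "receptor_B": True, "receptor_C": False, "receptor_D": False}
hits = perfectly_correlated_positions(alignment, binds)
```

In this toy data set only the third column separates binders (Asn) from non-binders (Val); fully conserved columns and columns that vary independently of binding are ignored. A hit of this kind is a hypothesis only, which is why the original study followed it up with a mutational experiment.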
In another GPCR-related project aimed at using as much heterogeneous data as can possibly be combined, Oliveira et al. predicted the role of all ‘active site’ residues in GPCRs, the pivotal role of Arg-340, and even a series of residue interactions involved in the activation process, and the presence and location of helix VIII. The recent flurry of articles on GPCR X-ray structures [206–209], and especially the structure with a covalently agonist-bound G protein, showed all these predictions to be conceptually right.
These two GPCR-related examples make clear that there is a lot to be gained from using experimental data. But these examples also taught us how hard it is to actually get access to those data. With the GPCRDB [211–213] we have started a trend to make Molecular Class Specific Information Systems (MCSIS). A small company, Bio-Prodict (www.bio-prodict.nl), recently caught on and is now making MCSISes for a wide variety of commercially interesting molecules [214–218]. Their systems (some of which are freely accessible from their website) revolve around a structure-based, and thus very accurate, multiple sequence alignment (MSA) for a whole protein super-family. This MSA then functions as the anchor on which to position all kinds of data that can range from 3D structures to genome related data, from mutation studies to ligand binding constants, or from sequence correlation patterns to the prediction of mutations that enhance the protein’s stability. As the most powerful information tends to be carefully hidden in the literature, an extensive set of literature-mining scripts aids with the extraction of, for example, mutation information. In fact, it was shown that the suite of mutation data extracting scripts reaches a much better coverage than can be obtained by human experts [214–218].
A recent development that will aid the drug hunters of the future is the Utopia PDF reader [213, 219]. Vroling et al. showed how this programmable PDF reader could be used to directly couple data in articles on GPCRs to the GPCRDB. This intelligent hyperlinking has a series of benefits. First, the residue numbering problem gets solved because the reader can ask the GPCRDB for the position in the GPCR MSA of any residue mentioned in the article, and it can even modify or correct the sequence numbers in the article if needed. Much good GPCR mutation data was published in the pre-GPCR-structure era that ended with the opsin structure article, and often these data were misinterpreted because of the poor quality of the available homology models. The Utopia-GPCR PDF reader can correct those interpretations, thereby salvaging old, high quality experimental data for future use. Figure 5 shows an image from an old mutation study in which the authors describe several ground-breaking mutations in the guinea pig histamine H1 receptor, building and validating a homology model using these data, and arguing, for example, that residue Trp161 plays an important role in receptor-ligand binding. This assumption was based on the effect of the mutation on receptor function, leading to a model in which Trp161 was modelled in the ligand-binding site. By contrast, the GPCRDB generated annotation listed in the sidebar of the reader indicates that this residue, located in TM IV, points towards the membrane and possibly interacts with cholesterol. This is a completely different situation from that proposed by the authors. Looking at the model provided by the GPCRDB, based on the latest crystal structures, it can be seen that a direct role of Trp161 in receptor-ligand binding is highly unlikely.
Folkertsma et al. (2005) analyzed nearly 100 nuclear receptor (NR) ligand binding domains. Combined with manually curated multiple sequence alignments, key positions in the ligand binding pocket were identified that had specific interactions with functionally diverse compounds. For example, residues at position 26 in Fig. 6 were shown to only have interactions with antagonists. This analysis required a substantial amount of work: categorizing structures and compounds, creating multiple sequence alignments, analyzing ligand contacts, and transferring the results into a homogeneous residue numbering scheme (the so-called 3D numbers). With the 3DM information system [223; see the help movie], these analyses can today be performed in a matter of minutes [215, 217, 224].
More than 100 articles were found that discuss the effects of mutating this residue on the ligand binding of the receptor. In all these articles this same residue has 14 different residue numbers ranging from 52 to 709. The use of a common 3D numbering scheme enables transfer of heterogeneous information between protein family members. Figure 7 shows 40 antagonists in red and 70 agonists in blue. In this example, a hundred articles had to be ‘read’ to extract all available mutation information for this single position mutated in 22 different receptor–species combinations. That these 100 articles had to be found among 100,000 PubMed entries that contain NR information is a whole different story in itself.
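How a common numbering scheme lets mutation data from different receptors be pooled can be sketched as follows. Each receptor’s own residue numbers are mapped onto the columns of a shared alignment, after which observations that fall in the same column are grouped. This is an illustrative assumption of how such a mapping might be built, not the 3DM implementation; the receptor names, aligned sequences, start numbers, and mutation records are all invented.

```python
from collections import defaultdict

def build_numbering(aligned_seq, first_residue_number=1):
    """Map a receptor's own residue numbers to 1-based alignment columns, skipping gaps."""
    mapping = {}
    number = first_residue_number
    for column, symbol in enumerate(aligned_seq, start=1):
        if symbol != "-":
            mapping[number] = column
            number += 1
    return mapping

def pool_by_common_number(records, alignment):
    """records: (receptor, residue_number, observation) tuples.
    alignment: receptor -> (aligned sequence, number of its first residue).
    Returns common column number -> list of records at that position."""
    numbering = {name: build_numbering(seq, first) for name, (seq, first) in alignment.items()}
    pooled = defaultdict(list)
    for receptor, residue_number, observation in records:
        common = numbering[receptor][residue_number]
        pooled[common].append((receptor, residue_number, observation))
    return dict(pooled)

# Hypothetical aligned fragments; the two receptors start at very different residue numbers.
alignment = {
    "NR_a": ("--MKLWQ", 50),
    "NR_b": ("GSMKLWQ", 700),
}
records = [
    ("NR_a", 53, "mutation abolishes ligand binding"),
    ("NR_b", 705, "mutation reduces affinity"),
]
pooled = pool_by_common_number(records, alignment)
```

In this toy example residue 53 of one receptor and residue 705 of the other land in the same alignment column, so the two literature observations can be compared directly even though their published residue numbers differ.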
If, one day, all structures of NR-ligand complexes that now are scattered over inaccessible industrial hard disks could be concentrated in one system, then we could consider asking much more elaborate questions. We could consider correlating aspects of ligands with protein atom characteristics, or we could analyse if residues not contacting the ligand have an influence on binding or activation, etcetera.
It is not only important to get as much information as possible stored in systems amenable to scrutiny, but it is also important to realize that for every one bioinformatician or drug hunter there are one hundred scientists who do not use molecular software regularly. Project Hope aims to predict the molecular phenotype of point mutations that were shown to be causally related to human disease states. This system attempts in all stages of user interaction to cater for human geneticists who typically do not use molecular software at all. Hope only asks the user to cut and paste the sequence, and to click the mutated residue and the residue type it is mutated to. It then builds a homology model if needed, calls dozens of servers and services in seven countries, combines all possible information and writes a final report that can be directly used in publications, but, more importantly, that is written without using any bioinformatics jargon and even has a built-in dictionary that explains terms such as ‘active site’, ‘salt-bridge’, or ‘torsion angle’ in terms understandable to human geneticists. Hope thus is the ultimate translation machine because in doing translational research it even translates between the researchers.
We believe that the recent spate of consolidations in the pharmaceutical industry is not a problem but an opportunity. Mankind needs medicines, and now that pushing one’s luck is slowly becoming a less successful technique, only research can find them. This research can progress rapidly if the thousands and thousands of X-ray structures of protein–ligand complexes find their way from hard disks behind pharmaceutical industry firewalls to the public domain. Drug design research in the next 25 years will revolve around ever broader collaborations, ever more holistic understanding of drug–human interactions, and ever better use of the available data, information, and knowledge.
VL and GV acknowledge financial support from NBIC and TIPharma; TvdB appreciates the support from Bio-Prodict (www.bio-prodict.com). The authors thank Jan Kelder for critically reviewing the manuscript. Elmar Krieger helped with YASARA; Maarten Hekkelman, Coos Baakman, Bas Vroling, Wilmar Teunissen, and Barbara van Kampen provided technical support. The authors mention with pleasure the many stimulating discussions with Sander Nabuurs, Daniel Gironés, Gijs Schaftenaar, Friedrich Rippmann, Ad IJzerman, Margot Beukers, Isabel Duarte, Christof Francke, Henk-Jan Joosten, and Jacob de Vlieg.
Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.