The clinical efficacy and safety of a drug is determined by its activity profile across multiple proteins in the proteome. However, designing drugs with a specific multi-target profile is both complex and difficult. Therefore methods to rationally design drugs a priori against profiles of multiple proteins would have immense value in drug discovery. We describe a new approach for the automated design of ligands against profiles of multiple drug targets. The method is demonstrated by the evolution of an approved acetylcholinesterase inhibitor drug into brain penetrable ligands with either specific polypharmacology or exquisite selectivity profiles for G-protein coupled receptors. Overall, 800 ligand-target predictions of prospectively designed ligands were tested experimentally, of which 75% were confirmed correct. We also demonstrate target engagement in vivo. The approach can be a useful source of drug leads where multi-target profiles are required to achieve either selectivity over other drug targets or a desired polypharmacology.
The safety and efficacy of a drug is determined not only by its action on an individual protein but also by its interactions with multiple proteins in the proteome. The promiscuous interaction of a drug with undesired proteins frequently causes toxicity1 and adverse effects2,3. Conversely, the modulation of a single drug target can be therapeutically insufficient, particularly in complex neuropsychiatric conditions, infectious diseases and cancer4-6. Instead, it is frequently necessary for a drug to simultaneously engage two or more targets for therapeutic efficacy7. Psychiatric drugs in particular require multiple activities against several targets to therapeutically modulate complex neuropsychiatric domains including perception, cognition and emotion4. However, designing drugs with a specific multi-target profile – to achieve either exquisite selectivity over other drug targets or a desired polypharmacology – is a complex and exceedingly difficult task for medicinal chemistry8. Accordingly, methods are needed to enable drugs to be designed a priori against several molecular targets simultaneously. Here we describe a solution to the complex problem of designing ligands against multiple drug target profiles by automated design.
The problem of designing ligands against a multi-target profile involves the parallel optimisation of multiple structure-activity relationships (SAR) within a desired range of physico-chemical properties. The prospect of multi-target drug design has been recently aided by the development of computational methods that show success in predicting the molecular targets of drugs3,9-13 (Supplementary Fig. 1) although such approaches are not intrinsically design methods.
Drug design can be modelled as an evolutionary process of iterative cycles of exploration and analysis14,15. Adaptive design processes are efficient at solving complex, multi-objective problems. Accordingly, we developed an automated, adaptive design approach to optimise ligands against polypharmacological profiles.
Several de novo drug design methods have been proposed previously16-21. However, of those that have been experimentally tested22-26, high-affinity ligands have only rarely been described, and these are all against a single molecular-target objective24,26. In contrast to previous de novo approaches, we mimicked the creative process by automated learning of medicinal chemistry design tactics, applying these to the generation of analogues, and then prioritizing them relative to a set of objectives (Fig. 1a). The development of this approach is described below, starting from ‘off-target’ predictions, progressing through ligand design and, finally, the discovery of novel compounds with pre-defined multi-target profiles.
Sir James Black proposed that “the most fruitful basis for the discovery of a new drug is to start with an old drug”27. Accordingly, we tested whether the algorithm could automate the evolution of new biological activities, starting from a known drug. Donepezil (compound 1) is an acetylcholinesterase inhibitor approved for cognitive enhancement in Alzheimer’s disease. Bayesian probabilistic activity models9 for 784 molecular targets, built from the ChEMBL database28, predicted a moderate likelihood that donepezil possessed D4 dopamine receptor activity and a low chance of D2 dopamine receptor activity (Supplementary Table 1). We found donepezil was a moderately potent D4 inverse agonist (ki=614nM) with minimal D2 activity (Supplementary Fig. 2 and 3 and Supplementary Tables 2 and 3). Donepezil’s D4 inverse agonist activity is intriguing given analyses demonstrating a significant improvement in memory in the Trail Making Test with this drug29 and findings that D4 antagonists can prevent stress-induced cognitive dysfunction in primates30.
We tested our method by evolving the structure of donepezil with the dual objectives of improving D2 activity and achieving blood-brain barrier (BBB) penetration. In our approach the desired multi-objective profile is defined a priori and then expressed as a point in multi-dimensional space termed ‘the ideal achievement point’. In this first example the objectives were simply defined as two target properties and, therefore, the space has two dimensions. The dimensions are defined by a Bayesian score for the predicted activity and a combined score that describes the absorption, distribution, metabolism and excretion (ADME) properties suitable for BBB penetration (D2 score=100, ADME score=50). We next generated alternative chemical structures by applying a set of structural transformations, using donepezil as the starting structure. In each subsequent generation, the population was enumerated by applying the set of transformations to the parent compound(s). In contrast to rules-based or synthetic reaction-based approaches for generating chemical structures16,31-34, we used a knowledge-based approach by mining the medicinal chemistry literature28,35. By deriving structural transformations from medicinal chemistry, we attempted to mimic the creative design process (Supplementary Fig. 4)36. Activity predictions were calculated for each of the enumerated compounds from all Bayesian models. Scores representing the likelihood of CNS penetration and good ADME properties were calculated using the program StarDrop (Optibrium Ltd.) and combined into a single value. The predicted properties of the enumerated structures were then expressed as points in multi-dimensional space. The generated structures were subsequently ranked by the distance (in multi-dimensional space) between the predicted properties for each structure and the ideal achievement point37 (Fig. 1b). Compounds were filtered for novelty, Lipinski’s rule-of-five compliance38 and synthetic accessibility39. 
The top 10,000 prioritized structures were selected for the next iterative cycle along with 500 random structures from the remaining population. The process was iterated until either a structure close to the objectives was discovered or no further improvements were achieved.
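As an illustration of the ranking step described above, the following minimal sketch orders hypothetical structures by their Euclidean distance to the ideal achievement point (D2 score=100, ADME score=50). The compound names and predicted scores are invented for illustration, and the sketch assumes that the two scores are used unscaled; the weighting or normalisation actually applied to each dimension is not specified here.

```python
import math

def distance_to_ideal(scores, ideal):
    """Euclidean distance between a compound's predicted scores and the ideal point."""
    return math.sqrt(sum((ideal[k] - scores[k]) ** 2 for k in ideal))

def rank_by_ideal(candidates, ideal):
    """Return candidate names sorted from closest to farthest from the ideal point."""
    return sorted(candidates, key=lambda name: distance_to_ideal(candidates[name], ideal))

# Ideal achievement point taken from the text: D2 score = 100, ADME score = 50.
IDEAL = {"d2": 100.0, "adme": 50.0}

# Hypothetical predicted scores for three made-up structures (not real data).
CANDIDATES = {
    "cpd_a": {"d2": 40.0, "adme": 45.0},
    "cpd_b": {"d2": 90.0, "adme": 20.0},
    "cpd_c": {"d2": 85.0, "adme": 48.0},
}

ranking = rank_by_ideal(CANDIDATES, IDEAL)
print(ranking)
```

In this toy case the structure with balanced scores in both dimensions ranks ahead of the one with the highest D2 score alone, which is the behaviour the achievement-point formulation is intended to produce.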
Initially, we evolved a series of isoindoles and prioritized them using our achievement objectives as criteria (Fig. 1c, Supplementary Table 4 and Supplementary Fig. 5). Eight analogues were then synthesized and tested (Fig. 2, Supplementary Figs. 2 and 6 and Supplementary Table 5) with all showing significant D2 affinities (ki’s= 156-1,700 nM; Supplementary Table 3).
The second highest ranking compound (3) – chosen from the final population of evolved structures – exhibited the highest D2 receptor affinity (ki=156nM). Thus, we successfully evolved donepezil’s negligible D2 activity into a series of ligands with higher D2 affinities (Fig. 2). Functionally, 3 was a dual D2 inverse agonist/D4 agonist (Supplementary Fig. 3). CNS penetration studies showed that 3 penetrates the brain as predicted, with an in vivo brain/blood ratio (BBR) of 0.5.
Although the evolved compounds were selected for the D2 receptor objective other predicted activities were not selected against. Accordingly, each of the generated compounds had a predicted polypharmacology profile. In general, this set of isoindole analogues was predicted to exhibit promiscuous profiles, with variable activities predicted for multiple serotonergic, adrenergic and dopaminergic receptor subtypes (Supplementary Tables 1 and 5). These predicted promiscuous profiles were subsequently confirmed (Fig. 2; Supplementary Tables 2 and 3), and the predicted multi-target profiles displayed excellent agreement with experimentally determined profiles, thereby implying that the approach can be applied to the de novo design of multi-target agents.
The isoindoles exhibited moderately potent affinities for the α1 adrenoceptors (ki’s = 0.9 nM-3,577 nM; mean ki=277 nM; Supplementary Table 3 and Supplementary Fig. 2). Since α1 adrenoceptor antagonists can induce low blood pressure as a side effect, they are considered ‘anti-targets’ to be avoided in drug design40. A common drug design optimisation problem is to reduce such anti-target activity whilst maintaining desired on-target activities. We accordingly evolved the 8 newly-synthesized isoindoles towards a polypharmacological profile (5-HT1A, D2, D3, D4) with selectivity over the three α1 adrenoceptor anti-targets (α1A, α1B and α1D) whilst maintaining CNS penetration.
To compare evolutionary strategies, the isoindoles were optimised towards polypharmacology objectives with and without highly predicted α1 activities filtered out at each generation (Fig. 3a, Supplementary Fig. 7, Supplementary Tables 6 and 7). In both optimisations, benzolactams were ranked the highest (compounds 11a and 10a respectively) (Fig. 3b and Supplementary Tables 6 and 7). Six benzolactam analogues were then synthesized based on both sets of objectives (Supplementary Table 8 and Supplementary Fig. 5). Analogues of the benzo-δ-lactam (3,4-dihydroisoquinolin-1(2H)-one) (9a, 10a and 11a) and benzo-ε-lactam (2,3,4,5-tetrahydro-1H-benzo[c]azepin-1-one) (9b, 10b and 11b) were synthesized for comparison since both ring systems were highly prioritized.
Both predicted and observed receptor activity profiles for the synthesized benzolactams are shown in Fig. 3b (Supplementary Tables 1 and 3; Supplementary Fig. 2). The 2-pyridine-piperazine analogues (11a and 11b) have the lowest α1 predictions and, indeed, exhibited the lowest α1 affinities (mean ki=1,131nM). In agreement with the models, 11a and 11b also have the lowest affinity for all dopamine D2-like receptors of the benzolactams tested. The dichloro-phenylpiperazine analogues (9a and 9b) exhibited slightly higher α1 and D2 predictions, which were also confirmed experimentally. In contrast, the 2-methoxy phenylpiperazine analogues (10a and 10b) exhibited potent affinities against the polypharmacology profile 5-HT1A, D2, D3 and D4 receptors but also had the highest α1 predictions and were confirmed as the most potent against the α1 receptors (mean ki=45.3nM) (Fig. 3b). Receptor profiling of the benzolactam series revealed that, compared to the isoindoles from which the compounds were evolved, the benzolactams achieved the objective of an increased polypharmacology profile for 5-HT1A, D2, D3 and D4 over the α1 adrenoceptors (Fig. 3b). The benzolactams are 11-fold more selective with respect to D2 and 25-fold more selective with respect to the polypharmacology profile, compared to the isoindoles. Importantly, the benzolactam 9a penetrated the brain (BBR=5.9), as predicted.
As the benzolactam series was not present in the ChEMBL database (release 1) used to build the Bayesian models, they constituted a novel chemical series for the system to discover41. Intriguingly, benzolactam derivatives have recently been independently synthesized and tested as potent D2/D3 ligands42, however the broader receptor profiles of the compounds were not evaluated. This observation provides additional confidence that the algorithm is capable of generating and prioritising novel chemical structures equivalent to those devised by medicinal chemists.
We next explored selectivity in the context of our multi-target objectives and asked whether we could optimise potent, selective, CNS penetrant D4 receptor ligands starting from the chemical structure of donepezil. We executed the optimisation in two stages: we optimised (i) for D4 potency and brain penetrability and (ii) for D4 selectivity. A series of 2-methylindoline derivatives with high predicted D4 activity was evolved from donepezil after six generations (Fig. 4a, Supplementary Fig. 9). Notably, compounds belonging to a 2,3-dihydro-indol-1-yl chemotype were dominant in the prioritized set (Supplementary Table 9). Compounds 12 and 13, which both belong to the 2,3-dihydro-indol-1-yl class, were then selected for testing (Supplementary Figs. 5 and 8). The highest ranking compound, 12, was inactive, while 13, the third highest-ranking design out of the final population, was the most potent D4 ligand amongst all tested compounds (D4 ki=8.9nM). Via optimisation, 13 represents a 69-fold increase in affinity over donepezil. In contrast to the isoindole and benzolactam analogues, 13 is predicted to be a highly selective D4 ligand with 95-fold selectivity over 5-HT2B and weak affinities greater than 1 μM for only 5 other receptors in our panel of GPCRs (Fig. 2). Importantly, and as predicted, 13 is highly CNS penetrant (BBR=7.5).
To verify that the predicted properties of selectivity, potency, and CNS penetration resulted in D4 receptor activity in vivo, we evaluated 13 on behaviour in wild-type (WT) and D4 receptor knockout (D4R-KO) mice (Fig. 4c-f), as well as in proprotein convertase 7 (PC7) KO mice (Supplementary Fig. 10a-d) that display a similar D4R-KO phenotype. Although open field locomotor activity declined in vehicle-treated D4R-WT animals (Fig. 4c), it remained high and showed little habituation in D4R-KO mice (Supplementary Information). While 0.7 mg/kg 13 was without significant effect in either genotype, the 1 mg/kg dose reduced locomotion at 0-20 min in D4R-WT animals. The same dose exerted no effect in D4R-KO mice. In the centre zone, vehicle-treated D4R-KO mice spent more time in this area at 21-60 min than D4R-WT controls (Fig. 4d). Centre time in D4R-WT animals was enhanced at 0-20 min with both doses of 13; in D4R-KO mice it was attenuated at 41-60 min with the 1 mg/kg dose. In the hole-board test, head-poking was increased in vehicle-treated D4R-KO animals compared to D4R-WT controls (Fig. 4e). In D4R-WT mice 13 augmented head-pokes in a dose-dependent fashion; only the 1 mg/kg dose attenuated head-poking in mutants. In the zero maze open-area time was increased in vehicle-treated D4R-KO mice relative to D4R-WT controls (Fig. 4f). One mg/kg compound 13 selectively increased D4R-WT open-area times to levels of the D4R-KO mice. With regards to PC7 mice, PC7-WT and D4R-WT responses were similar (Supplementary Fig. 8a-d). Although behaviors in PC7-KO vehicle controls essentially phenocopied those in vehicle-treated D4R-KO mice, 13 normalized PC7-KO responses to those of PC7-WT controls. By comparison, D4R-KO mice were largely unresponsive to 13 – demonstrating high D4R selectivity. 
Nonetheless, 13 did not appear to be absolutely selective since in D4R-KO animals the 1mg/kg dose attenuated head-poking in the hole-board test, implying some possible off-target actions at increased doses.
Using 13, we further expanded our objective to evolve ligands that (i) were highly selective for dopamine D4 (ii) were CNS penetrant and (iii) were a novel chemotype. The evolution of 13 against these objectives resulted in the design of novel morpholino compounds (Fig. 4b, Supplementary Fig. 11) (Supplementary Tables 10 and 11). Compounds with the novel isoindol-1-yl-ethyl-morpholino backbone were prominent in the prioritised final generation population of 10,000 structures (ranked 5th, 6th and 9th of the top 10 compounds in Supplementary Table 10), while most known D4 ligands are 1,4-disubstituted aromatic piperidines and piperazines (1,4-DAPs). However 1,4-DAPs are rather promiscuous substructures common in ligands for many biogenic amine GPCRs43. Therefore the isoindol-1-yl-ethyl-morpholino analogues represent a novel D4 chemotype.
A library of 24 morpholino analogues (compounds 14-29) was then synthesized and profiled against our GPCR panel (Supplementary Table 11; Supplementary Fig 5). Individual R and S morpholino enantiomers were synthesized and assayed separately (chirality designated by suffix). To further reduce complexity, direct analogues with and without the carbonyl oxygen were synthesized (e.g. 20 and 21), as this atom was predicted not to be essential to the overall D4 selectivity profile (Fig. 2).
The assays confirmed the predictions that the new morpholino compounds are generally highly selective for the D4 receptor over the tested receptors (Fig 2). Seventeen of the compounds had affinities for the dopamine D4 receptor, ranging from ki=90nM (Compound 27s) to ki=5,526nM (18s) with eight exhibiting affinities less than 1 μM for D4 (compounds 15r, 19s, 21s, 22r, 23s, 26s, 27s and 28s). Compounds containing the ethanone linker-group were generally less active compared to those with the ethyl linker. For compounds with the ethyl linker, the S enantiomer was more potent than the R enantiomer. Functional assays of an exemplar compound (22r) indicated inverse agonism at D4 (Supplementary Figure 3).
In agreement with the design objectives, the morpholinos displayed exquisite selectivity for the dopamine D4 receptor. Excluding the dopamine receptors, low positive Bayesian scores were observed for 8 of the compounds against 5-HT1A, 16 of the compounds against the 5-HT2A/B/C receptors, and almost all had very low scores for 5-HT7 serotonin receptors. The off-target trends were confirmed when the compounds were profiled (Fig. 2). The morpholino compounds, on average, bound to 3.4 targets (including D4 at ki’s <10μM) compared to 15.8 targets for the isoindole and benzolactam compounds. Seven of the active compounds had off-target activities for only 1 of the 20 receptors tested. Compound 26s is the most selective compound with no measured affinity for any other screened receptor. Four compounds possessed both relatively high affinity (D4 ki<1μM) and two or fewer off-target activities out of the 20 GPCRs profiled (21s, 26s, 27s, 28s).
The morpholino series thus represents a new class of highly selective, brain-penetrant, D4-dopamine receptor ligands. Compounds 27s (D4 ki=90nM; D1 ki=5,852nM; BBR=2.0) and 21s (D4 ki=182nM, 5-HT2A ki=3,545nM) qualified as lead compounds that fulfilled all of our design objectives of novelty, high affinity for the dopamine D4 receptor, exquisite selectivity and CNS penetration. Clearly, the automated design of a novel class of ligands with a desired multi-target profile demonstrates that the method is able to generate novel, drug-like lead compounds directly by automated design.
We focused on the polypharmacology of bioaminergic GPCRs as a convenient test case, due to the importance of multi-target profiles at these receptors for a variety of neuropsychiatric indications4. In principle the approach is applicable to all drug target classes limited only by the requirement for sufficient structure-activity (SAR) data to create useful models9-12. To generally extend polypharmacology profiling and hence de novo design it will be necessary to develop inferencing methods to build predictive bioactivity models that integrate all available SAR, protein structure and protein sequence information together and fuse data from diverse scoring functions into predictive frameworks44,45.
De novo, automated compound design against multi-target profiles provides a powerful new approach for discovering new ligands and drug leads and for discovering ligands that satisfy specific multi-target objectives. The method is particularly useful as a new source of leads for polypharmacology profiles.
All machine learning and data mining of the medicinal chemistry structure-activity data were conducted on the ChEMBL database (release 1) and a pre-release (StARLite version 31)28. ChEMBL release 1 contains over 440,000 compounds abstracted from J. Med. Chem. and Bioorg. Med. Chem. Lett. from 1980 to May 2008. The ChEMBL database is available for download from the EBI.
A database of chemical transformations was derived from systematically comparing sets of analogue compounds in ChEMBL28. Sets of analogues with defined structure-activity relationships (SAR) were identified in ChEMBL28, usually from individual journal articles. The transformations database was seeded with a set of common chemical transformations derived from medicinal chemistry knowledge. It was then expanded by identifying novel transformations: all the transformations in the database were systematically applied to each compound in each journal article in ChEMBL, and the resulting set of transformed compounds was compared to the published analogues. Analogues that were not present in the transformed set highlighted potential transformations that were missing from the transformation database, and these were subsequently added to the database. This iterative mining method attempts to regenerate all the reported structures of every medicinal chemistry publication reported in ChEMBL. The chemical transformations were encoded in RXN format. The procedure was implemented in Pipeline Pilot. The current database contains over 700 unique structural transformations.
Predictive polypharmacology profiling was undertaken using Bayesian activity models, based on our previously published approach9. The Bayesian method for polypharmacology profiling was chosen as it provided both good performance on noisy datasets and a high speed of calculation51. High confidence models were built using ChEMBL (release 1). Activity data were filtered to keep only activity endpoints that had either IC50, ki or EC50 values and where the ChEMBL confidence score was at least 7 (protein assignment was direct or homologue). A compound was considered active when the mean activity value was below 10μM. All inactive compounds were assigned to the target ‘none’. Following this procedure 133,061 compounds remained with 215,967 activity endpoints, which were used for model building. Multiple category Laplacian-modified naïve Bayesian models were built with ECFP6 representations52 for 784 targets. For each model the data were split in two for the validation step: compounds were clustered and assigned a cluster number. Clusters with an odd number were assigned to the test set, and clusters with an even number were assigned to the training set. Models were built with the training set, and the test set was scored. The training set was scored using its own model as comparison. Finally a model was built with all data and scored against itself; the training set and the whole set should provide similar validation statistics. Statistics on the performance of the models are described in Supplementary Table 12. The results for the model containing all 785 targets were very similar to those of the models for the receptor subsets. Two analyses were used to assess the performance of the different models. The first analysis provides an overall score and does not need to specify a cut-off for distinguishing active from inactive compounds. 
The area under the Receiver Operating Characteristic (ROC) Curve (AUC) provides an indication of the ability of the model to prioritize active compounds over inactive compounds. The ROC curve is the plot of true positive rate versus false positive rate. However it did not provide information on early enrichment, which was important in studies such as the present one where only the top ranking compounds were considered. Therefore the Boltzmann-Enhanced Discrimination of ROC (BEDROC)53 was used, which solves the early enrichment issue by adding a weight to compounds recognized early. BEDROC was derived from the Robust Initial Enhancement (RIE), and the Sum of log of ranks test (SLR)54 which provided a statistical test to assess which method performs better than random. The percent of active compound retrieved in the top 5% is also calculated (Recall =5%). The second analysis required a cut-off to make the distinction between active and inactive as they varied with the rank of the compounds. For each model, the specificity (true negative rate), sensitivity (true positive rate), false positive rate, false negative rate, precision, F-measure and Matthews Correlation Coefficient (MCC) were calculated at different cut-off values. The cut-off providing the best MCC score was used, as it was shown to provide better performance55 (Supplementary Fig. 12). The quality of the models was assessed using an internal leave-one-out validation: one compound was part of the test set, and was scored using the remaining data as the training set. Then the area under the ROC curve was calculated (Supplementary Fig. 13). A cut-off score to minimize the sum of the percent misclassified for category members and for category non-members was calculated and used to classify compounds in the contingency table.
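The two kinds of analysis described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the implementation used for the models: the scores and activity labels are invented, the AUC is computed through its rank interpretation (the probability that a randomly chosen active outscores a randomly chosen inactive), and the cut-off search simply tries every observed score and keeps the one with the highest MCC.

```python
import math

def roc_auc(scores, labels):
    """AUC as the fraction of (active, inactive) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; defined as 0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def best_mcc_cutoff(scores, labels):
    """Try each observed score as cut-off (score >= cut-off means predicted active)."""
    best_value, best_cut = float("-inf"), None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < cut and y)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and not y)
        value = mcc(tp, tn, fp, fn)
        if value > best_value:
            best_value, best_cut = value, cut
    return best_cut, best_value

# Hypothetical Bayesian scores and activity labels (1 = active), not real data.
SCORES = [9.1, 7.4, 6.0, 3.2, 2.5, -1.0, -4.3, -6.8]
LABELS = [1, 1, 0, 1, 0, 0, 0, 0]

auc = roc_auc(SCORES, LABELS)
cutoff, best = best_mcc_cutoff(SCORES, LABELS)
print(auc, cutoff, best)
```

The AUC summarises the whole ranking without a threshold, whereas the MCC-based search yields the single cut-off needed to populate a contingency table.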
An All Data Model for dopamine receptors only was built using data from a pre-release of ChEMBL (StARLite version 31), with similar numbers of compounds and endpoints. The model was built without considering the confidence level of target assignment to gather as much data as possible. This model was used for initial calculations on the evolution of the isoindole series and the 2,3-dihydro-indol-1-yl series. The quality of the models was assessed using the same procedures as described above. The results from the All Data Models and the High Confidence Models were very similar (e.g. D2 model R2=0.998, D4 models R2=0.984).
The cut-off for a good prediction came from the validation steps for the model. From the test set, the cut-off value providing the best MCC value was used. For 5-HT1E, a cut-off of zero was selected. For each ligand-target association, the probability of success was 0.5 (active or inactive where activity was defined as ki<10μM). To test if the profile predictions were better than random, an exact binomial test was performed using R (version 2.8.1), and the cumulative binomial probability was calculated.
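The exact binomial calculation can be illustrated with the standard library alone. The sketch below is not the R analysis itself: it computes a one-sided cumulative probability of observing at least a given number of correct predictions under a success probability of 0.5, and it assumes, for illustration, that 75% of 800 predictions corresponds to exactly 600 correct calls.

```python
from math import comb

def binomial_upper_tail(successes, trials, p=0.5):
    """Exact P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Assumed counts: 800 ligand-target predictions, 600 (75%) confirmed correct.
p_value = binomial_upper_tail(600, 800, 0.5)
print(p_value)
```

A probability this far below conventional significance thresholds indicates that the observed success rate is very unlikely to arise from random active/inactive assignment.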
The adaptive optimization procedure involves defining a set of x achievement objectives (O1-x), where x is at least one, and an initial parent compound population (PP(G=n) where n=0 for the initial population) of at least one molecule. All of the members of the parent population (PP) are subjected to all possible transformations, so as to maximize the pool of molecules in the transformed population (PT). Each member of the transformed population is scored and ranked using the achievement objectives (i.e. Bayesian predictions, molecular properties).
If a stop condition is not satisfied, prioritized individuals with calculated objective parameters that satisfy the defined thresholds are assigned to an elite population (PE) and those that fail are assigned to a non-elite population (PN), of which n random members form a Random Population (PR). For all the calculations performed PE has a maximum size of 10,000 and PR a maximum size of 500. The new parent population (PP(G=n+1)) is created by merging PE and PR (PP(G=n+1) = PE(G=n) + PR(G=n)). This new population is subjected to another transformation process. This process is repeated until a stop condition is satisfied. The new parent population PP from each generation is also added to a combined pool of all observed parents (PPall).
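The population bookkeeping described above (PP, PT, PE, PN, PR and PPall) can be sketched as a generic loop. This is a toy sketch under stated assumptions rather than the actual implementation: "molecules" are integers, the "transformations" are arithmetic operations, the objective is to reach a target number, and the function name `evolve` and all of its parameters are hypothetical. The only stop condition modelled is reaching the objective, with a generation cap as a safeguard.

```python
import random

def evolve(parents, transforms, score, is_elite, stop,
           max_elite=10_000, max_random=500, max_generations=50, seed=0):
    """Toy adaptive loop; a lower score means closer to the objectives."""
    rng = random.Random(seed)
    parents = set(parents)        # PP(G=0)
    all_parents = set(parents)    # PPall
    transformed = set()           # PT
    for generation in range(max_generations):
        # PT: every transformation applied to every member of PP.
        transformed = {t(m) for m in parents for t in transforms}
        ranked = sorted(transformed, key=score)
        if stop(ranked, generation):
            break
        # PE: members that satisfy the thresholds, capped at max_elite.
        elite = [m for m in ranked if is_elite(m)][:max_elite]
        # PN: members that fail the thresholds; PR: a random sample of PN.
        non_elite = [m for m in ranked if not is_elite(m)]
        randoms = rng.sample(non_elite, min(max_random, len(non_elite)))
        parents = set(elite) | set(randoms)   # PP(G=n+1) = PE + PR
        all_parents |= parents
    merged = all_parents | transformed        # PM = PPall + PT, duplicates removed
    return sorted(merged, key=score)

TARGET = 37  # hypothetical objective for the toy problem

def score(m):
    return abs(m - TARGET)

result = evolve(
    parents=[1],
    transforms=[lambda m: m + 1, lambda m: m * 2, lambda m: m - 3],
    score=score,
    is_elite=lambda m: score(m) <= 45,
    stop=lambda ranked, generation: score(ranked[0]) == 0,
    max_elite=5,
    max_random=2,
)
print(result[0])
```

Carrying a small random sample of non-elite members into the next generation keeps some diversity in the parent pool, at the cost of enumerating structures that are unlikely to be prioritised.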
The population of transformed compounds from the last iteration and the pooled parent population are combined and all duplicates and compounds failing structure valency rules removed to produce a merged population (PM) of unique members (PM= PPall + PT). Properties and parameters (e.g. Bayesian activity model values, physico-chemical properties and predicted ADME properties) are calculated for each individual in PM. Each of the n members of the initial population (PM) is evaluated against the achievement objectives (O1-x). Multi-objective prioritization is performed by describing the calculated parameters for a compound of interest as multi-dimensional coordinates. Pareto ranking is a common method for prioritising multiple criteria17. The Pareto frontier maps a surface where all solutions are considered equivalent (non-dominated), where an increase in one objective leads to a decrease in at least one other objective. However, finding a Pareto optimal solution becomes difficult when many objectives are considered56. Instead a vector scalarisation procedure is employed37.
The results are ranked by the magnitude of the vector ||a|| between the multi-dimensional co-ordinates of predicted values of the chemical structure of interest (A) and the defined achievement objective point (O), with the shortest vector length closest to the ideal in multi-dimensional space:
$$\lVert a \rVert = \sqrt{\sum_{j=1}^{n} \left( x_{Oj} - x_{Aj} \right)^{2}}$$
where the achievement objective point has the coordinates (xO1,…,xOn), and predicted values of the molecule i form the coordinates (xA1,…, xAn). Novelty is assessed by comparing the generated compound with compounds in ChEMBL, either as an exact match or by comparison of the Murcko framework41, depending on whether the objectives are defined in terms of novel compounds or novel chemotypes. Novelty is filtered depending on the goals.
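The difference between Pareto ranking and the vector scalarisation used here can be illustrated with a small sketch. The points and the ideal point below are hypothetical, higher values are assumed to be better in both objectives, and the function names are illustrative. Several points on the Pareto front are mutually non-dominated and therefore cannot be ordered against each other, whereas the distance to the achievement objective point gives every structure a single rank.

```python
import math

def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Points that no other point dominates (higher is better in every objective)."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

def rank_by_distance(points, ideal):
    """Total ordering by Euclidean distance to the achievement objective point."""
    return sorted(points, key=lambda p: math.dist(p, ideal))

# Hypothetical (activity score, ADME score) pairs, not real data.
POINTS = [(95.0, 10.0), (80.0, 45.0), (60.0, 50.0), (50.0, 30.0)]
IDEAL = (100.0, 50.0)

front = pareto_front(POINTS)
ranked = rank_by_distance(POINTS, IDEAL)
print(front)
print(ranked)
```

Three of the four points lie on the Pareto front and are equivalent under Pareto ranking, while the scalarised distance still distinguishes between them.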
ADME properties and CNS penetration are calculated using previously published Gaussian Process models57,58 as implemented in StarDrop (Optibrium Ltd). A synthetic accessibility score, representing historical synthetic knowledge, is calculated using a previously published algorithm39. The synthetic accessibility score combines the observation of fragments in ChEMBL and a complexity penalty. A limitation of the ECFP6-Bayesian prediction method is that fingerprints cannot distinguish stereochemistry if the stereochemistry is not encoded in the training data. In the ChEMBL database only 43% of the compounds with chiral centres have their stereochemistry fully defined, thus it is not possible to distinguish between R and S enantiomers. To reduce the complexity of synthesis, compounds with two or more chiral centres are filtered from the final population.
The detailed experimental protocols for the radioligand and functional receptor assays are available on the NIMH PDSP website at http://pdsp.med.unc.edu/UNC-CH%20Protocol%20Book.pdf.
hERG activity was assayed by the patch clamp method on a PatchXpress platform and by FluxOR Tl+ assays. Assays were performed as previously described47.
Metabolic stability was assessed, generating the in vitro intrinsic clearance (Cli) following incubation of test compound with mouse hepatic microsomes. The assay was performed as previously described48.
Mice were housed under standard conditions: 12 h light/dark cycle and food and water available ad libitum throughout the study. The Drug Discovery Unit at the University of Dundee is dedicated to the humane care, maintenance and use of research animals and maintains compliance with UK Home Office regulations. All experiments were approved by the local ethical review committee. The ratio of test compound between brain and blood (B:B ratio) was assessed following intravenous administration to the female NMRI mouse.
Adult male and female C57BL/6J, D4R-KO mice (Jackson Labs, Bar Harbor, ME), and PC7-KO mice (D. Comstam, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland) were used. PC7 mice were crossed with C57BL/6J mice for more than 10 generations (N.G. Seidah and A. Prat, IRCM, Montreal, QC, Canada). Animals were maintained in a humidity- and temperature-controlled room under a 14:10 hr light:dark cycle (lights on 0800 hr). All studies were conducted during the light cycle, between 1000 and 1600 hrs in the following order: zero maze, open field, and hole-board, where tests were separated by at least 7 days. Animals were assigned to vehicle- [0.1% DMA with 15% 2-hydroxypropyl-β-cyclodextrin (Sigma Aldrich, St. Louis, MO)] and compound 13-treated groups and were maintained in these groups throughout the experiments. Water and laboratory chow were supplied ad libitum. All experiments were conducted with an approved protocol from the Duke University Institutional Animal Care and Use Committee and according to the NIH Guide for the Care and Use of Laboratory Animals. To overcome the high intrinsic clearance in mouse hepatic microsomes of compound 13 (Cli=13.8mL/min/g), mice were injected (i.p.) with vehicle, or 0.7 or 1 mg/kg compound 13 and placed immediately into the open field for 60 min as described49. Activity was monitored as distance traveled and time spent in the center zone. In the hole-board and zero maze tests57,50 mice were given (i.p.) vehicle or compound 13 and tested 30 min later. Hole-board test responses were video-taped for 10 min using high-resolution low-light cameras (Panasonic NA, Secaucus, NJ) and were scored using the TopScan program (Clever Sys Inc., Reston, VA) for the rate of head-pokes into the 16 holes. 
Zero maze behaviors were video-taped over 5 min and were scored by trained observers blinded to the genotype, sex and treatment condition of the animals using the Observer XT10 program (Noldus Information Technology, Leesburg, VA) for the percent time in the open areas. All behavioral data are presented as means ± SEM and were analyzed by ANOVA and repeated measures ANOVA followed by Bonferroni-corrected pair-wise comparisons (IBM SPSS 20, Chicago, IL). A value of p < 0.05 was considered significant.
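As an illustration only, the summary statistics named above can be sketched in a few lines of Python. This is not the SPSS procedure used in the study: the function names are hypothetical, only a one-way ANOVA F statistic is computed (no repeated measures model and no p-value from the F distribution), and the Bonferroni step assumes the simplest form of the correction, in which each raw p-value is multiplied by the number of comparisons.

```python
import math
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) for one group."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-value, significant?) for each raw p-value."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(p_adj, p_adj < alpha) for p_adj in adjusted]


if __name__ == "__main__":
    # Hypothetical scores for three groups; these are not data from the study.
    groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
    for g in groups:
        m, sem = mean_sem(g)
        print(f"{m:.2f} ± {sem:.2f}")
    print(one_way_anova_f(groups))
    # Hypothetical raw p-values from three pair-wise comparisons.
    print(bonferroni([0.01, 0.04, 0.5]))
```

With three comparisons, a raw p-value of 0.04 no longer passes the 0.05 threshold after correction, which is the conservative behavior the Bonferroni adjustment is meant to provide.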
This work is supported by SULSA (HR07019), the BBSRC Doctoral Training Programme, the BBSRC Pathfinder (BB/FOF/PF/15/09) and the BBSRC Follow On Fund schemes (BB/J010510/1) (A.L.H.), the University of Dundee’s Pump Priming Fund for Translational Medical Research (I.H.G. and A.L.H.), grants and contracts from the National Institutes of Health supporting drug discovery and receptor pharmacology (B.L.R.) and the NIH grant MH082441 (W.C.W.). The chemical synthesis benefits from the infrastructure investments from the Wellcome Trust Strategic Award (WT 083481). We thank J. Overington of the European Bioinformatics Institute (EMBL-EBI) for StARlite and ChEMBL. We wish to thank C. Means and T. Rhodes for helping with the open field, hole-board and zero maze tests. We also wish to thank C. Elms and J. Zhou for their support in the husbandry and generation of the mice used for behavioral testing. We also wish to thank F.Y. Li of Clever Sys Inc. for customizing the software configuration for the hole-board tests. Some of the equipment used in the behavioral testing was purchased with a grant from the North Carolina Biotechnology Center. B.L.R. also received support from the Michael Hooker Chair of Pharmacology.
Full Methods and any associated references are available in the online version of the paper at www.nature.com/nature.
A.L.H. devised the method, developed the algorithm and designed the study. J.B. coded the algorithm and undertook the calculations. G.R.B. developed the databases. A.L.H. and J.B. selected the compounds for synthesis. I.H.G., G.F.R. and K.A. designed the synthetic routes and G.F.R. and K.A. undertook the chemical synthesis. L.A.W. purified and analyzed several of the compounds. B.L.R. and V.S. designed the empirical tests for the synthesized compound predictions, analyzed and interpreted the results and performed the experiments. X.P.H. performed the 5-HT2B functional assays and the hERG assays. M.F.S. conducted the dopamine D2 and D4 functional assays. K.D.R. designed the DMPK studies and analyzed the results. S.N., L.S. and F.R.C.S. carried out the DMPK experiments. For the behavioral experiments, D.B.C. disrupted the PC7 gene in mice. A.P. and N.G.S. verified the PC7 deletion in many tissues including brain, and then re-derived the line into a pure C57BL/6 background. W.C.W. designed the studies; R.M.R. and A.I.S. conducted the experiments and analyzed the results; W.C.W., R.M.R., A.I.S., D.B.C., A.P. and N.G.S. interpreted the findings; W.C.W. and R.M.R. prepared the figures. A.L.H. and B.L.R. wrote the manuscript; I.H.G. wrote the synthetic methods; W.C.W. and R.M.R. wrote the behavioral section of the manuscript. J.B., V.S., W.C.W. and R.M.R. prepared the figures. All the authors discussed the results and commented on the manuscript.
Competing Financial Interest
A.L.H., J.B. and G.R.B. are shareholders in Ex Scientia Ltd, a University of Dundee spin out company that has licensed the technology.