PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
 
J Chem Inf Model. Author manuscript; available in PMC 2010 October 1.
Published in final edited form as:
PMCID: PMC2767445
NIHMSID: NIHMS150913

A Novel Approach for Efficient Pharmacophore-based Virtual Screening: Method and Applications

Abstract

Virtual screening is emerging as a productive and cost-effective technology in rational drug design for the identification of novel lead compounds. An important model for virtual screening is the pharmacophore. Pharmacophore is the spatial configuration of essential features that enable a ligand molecule to interact with a specific target receptor. In the absence of a known receptor structure, a pharmacophore can be identified from a set of ligands that have been observed to interact with the target receptor. Here, we present a novel computational method for pharmacophore detection and virtual screening. The pharmacophore detection module is able to: (i) align multiple flexible ligands in a deterministic manner without exhaustive enumeration of the conformational space, (ii) detect subsets of input ligands that may bind to different binding sites or have different binding modes, (iii) address cases where the input ligands have different affinities by defining weighted pharmacophores based on the number of ligands that share them, and (iv) automatically select the most appropriate pharmacophore candidates for virtual screening. The algorithm is highly efficient, allowing a fast exploration of the chemical space by virtual screening of huge compound databases. The performance of PharmaGist was successfully evaluated on a commonly used dataset of G-Protein Coupled Receptor alpha1A. Additionally, a large-scale evaluation using the DUD (directory of useful decoys) dataset was performed. DUD contains 2950 active ligands for 40 different receptors, with 36 decoy compounds for each active ligand. PharmaGist enrichment rates are comparable with other state-of-the-art tools for virtual screening.

Availability

The software is available for download. A user-friendly web interface for pharmacophore detection is available at http://bioinfo3d.cs.tau.ac.il/PharmaGist.

Introduction

Virtual Screening (VS) is a computational approach to drug discovery that successfully complements High Throughput Screening (HTS) for hit detection.1 The objective is to use a computational approach for rapid cost-effective evaluation of large virtual databases of chemical compounds in order to identify a set of candidates to be synthesized and examined experimentally for their biological activity. Unlike HTS, virtual screening is not based on a brute-force search, but on some starting information on the receptor under inspection or on its active ligands.

Virtual screening methods can be divided into two broad categories: structure-based and ligand-based. In case the three-dimensional (3D) structure of the target receptor or of its binding site is available, docking is a highly effective technique for virtual screening.2,3 In the absence of structural information on the receptor, virtual screening approaches are mainly based on structural similarity between known and potential active ligands. The rationale is that molecules that share some structural similarity may have a similar activity. Some methods use a known active ligand as a query to extract structurally similar compounds from large databases.4 Among these are the FlexS5,6 and fFlash7 methods. These methods perform a 3D structural alignment between a pair of compounds, a query ligand assumed to be rigid and each database compound treated as flexible.

When a set of active ligands is available, it is possible to compute their shared pharmacophore. A pharmacophore is defined as the 3D arrangement of features that is crucial for a ligand molecule in order to interact with a target receptor in a specific binding site. Once identified, a pharmacophore can serve as an important model for virtual screening, especially in case where the 3D structure of the receptor is unknown and docking techniques are not applicable. The strength of pharmacophore-based screening compared to other ligand similarity screening approaches lies in the ability to detect a diverse set of putative active compounds with totally different chemical scaffolds. This increases the chances that some of the detected compounds will pass all the stages of drug development. Besides screening, pharmacophore is a powerful model also in other applications of drug development, like de novo design, lead optimization, ADME/Tox studies and Chemogenomics.8,9

Many in silico methods are available for identifying pharmacophore models from a set of ligands that have been observed to interact with the same receptor.1012 Generally, these methods search for the largest 3D pattern of features responsible for binding that is shared by all or most input ligands. From the computational standpoint, this task is challenging with respect to both the number of input ligands and their flexibility. The various approaches mainly differ in three aspects: (i) the chosen feature descriptors and structure representation, (ii) the technique for addressing the flexibility of the ligands, and (iii) the definition of the searched common pattern and the algorithm employed for identifying it.10,13

All ligand-based methods for pharmacophore detection represent the input ligands by their features. The selection of feature descriptions is mainly based on the desired level of resolution. At the highest level of resolution, a feature is the 3D position of an atom associated with the atom type.1416 At a coarser resolution level, adjacent atoms in space are grouped into topological features, such as phenyl ring and carbonyl group.17 Finally, at the lowest level of resolution, atoms are grouped into functional features based on physico-chemical properties that are important for ligand-receptor binding, like charge, aromaticity, hydrogen bonding and hydrophobicity.1821 Using the feature descriptors, the structures of the input ligands are represented mainly as 3D point sets,16,22 distance matrices,23,24 graphs,24,25 or trees.26

Drug-like molecules may adopt many possible conformations. The specific conformations that the input ligands adopt in the active site of the receptor are usually unknown. Additionally, they cannot be assumed to be the ones with the lowest energy.27 Therefore, all the feasible conformations of the ligands should be considered during the search for the common pharmacophore. Most methods perform the conformational search in a separate initial stage. Examples for such methods include RAPID, 16 MPHIL,14 DISCO,28,29 Phase,30 Catalyst/HipHop,1820,31 Catalyst/HypoGen18,21,31 and others.3234 These methods generate a discrete set of conformations for each ligand with the goal of covering its whole conformational space. The main limitation of this approach is that many conformations (about 100) might be required for each ligand.27 An alternative strategy is to incorporate the conformational search within the pattern detection process. This way the search space is not restricted to a predefined discrete number of conformations. However, currently the methods that adopt this approach are non-deterministic, predominantly based on a random search.15,17,35,36 Among them are SCAMPI,17 GAMMA,15,37,38 GASP,35,39 GALA-HAD40,41 and the MOGA-based method of Cottrell et al.36 Another approach for treating the flexibility of the ligands within the pattern detection process is employed by the MTree method.26 This method aligns feature-trees derived from the 2D structures of the input ligands.

Given a set of ligands, there are different ways to define the common pattern that we are looking for. The traditional approach is to search for the Maximal Common Substructure (MCS), namely the largest set of features shared by all input ligands.24,42 However, since some features are less frequent than others, the largest common pattern can be sometimes misleading as a pharmacophore. A better approach is to assign weights to the features based on their infrequency of appearance and to look for the highest weighted sum of common features. Another drawback of the MCS approach is that it is based on the assumption that there is a single pharmacophore and all ligands possess all its features. This is not always the case. The input ligands can bind to different binding sites, they may have different binding affinities, and in a noisy input some of them might be outliers. Several methods overcome this limitation by relaxing the requirement that all ligands should possess all the pharmacophoric features. In MPHIL, a pharmacophore model is the smallest set of features with which each ligand has at least a predefined number of features in common. In Catalyst/HipHop the user can define how many ligands must map completely or partially to the pharmacophore hypothesis.11 Besides adopting different approaches for defining the pharmacophore pattern, the methods also differ in the algorithm employed for detecting it. The most common algorithms are clique-detection,14,28,34,43 Gibbs sampling,32 exhaustive search17,18 and genetic algorithms.14,15,35,36

Here, we present an application of PharmaGist, a new ligand-based method for pharmacophore detection,44,45 for virtual screening. The main innovation of the method lies in the deterministic consideration of the flexibility of the input ligands in the pattern detection process. PharmaGist is robust, capable of handling a diverse and noisy input. The input ligands can bind to different binding sites, they may have different binding affinities, and some of them might be outliers. In this work we address the problem of selecting the most appropriate pharmacophore candidate for virtual screening. In addition, to address cases where the input ligands have different affinities, we suggest a scheme for weighting of pharmacophore candidates. In a weighted pharmacophore each feature is weighted based on the number of ligands that possess it. This means that some of the pharmacophoric features do not necessarily appear in all active ligands.

PharmaGist was tested successfully on the FlexS dataset. 45 This benchmark dataset consists of 74 ligands that are classified into 12 cases according to the protein receptor they bind to.6 The results, which were achieved using a set of standard default parameters, were consistent with reference pharmacophores derived from the bound ligand-receptor complexes. Here, we examine the applicability of PharmaGist for virtual screening. We estimate its performance on a commonly used dataset of G-Protein Coupled Receptor alpha1A. Additionally, we perform a large-scale evaluation using the DUD (directory of useful decoys) dataset.46 This dataset contains 2950 active ligands and 95316 decoys for 40 different receptors. The results show the merit of the method in comparison to other methods for pharmacophore-based screening, Catalyst1820,31 and MTree,26,47 as well as to receptor-based screening with the DOCK method.48,49

Method

PharmaGist consists of two modules. The first module is for detecting pharmacophore candidates from a set of input ligands. The other module is for pharmacophore-based screening.

Pharmacophore Detection

The input is a collection of ligands. The goal is to detect pharmacophore candidates, namely the highest scoring 3D configurations of features responsible for binding that are common to a significant number of input ligands. Below, we give an outline of the algorithm (Figure 1).

Figure 1
Pharmacophore detection flow.

The input ligands can be highly flexible. They might have many possible conformations distributed over the entire range of reachable energy and conformational space.27 A common approach to deal with the flexibility of the ligands is to generate in an initial stage a large number of possible conformations for each one of them (about 100 for a sufficient coverage27). Instead, our approach assumes that one of the ligands, the pivot, is given in its binding conformation and the flexibility of the other ligands is fully and explicitly considered within the search for the common pharmacophore. The advantage of this approach lies in the fact that in the worst case, when there are no clues on the binding conformation of any of the ligands, a set of conformations for only one of them (the pivot) needs to be computed. The pivot can be selected as the most rigid molecule or as the molecule with the highest affinity towards the target receptor. Unless a pivot is specified by the user, the algorithm iteratively tries each input ligand as a pivot.

The algorithm identifies pharmacophores by computing multiple flexible alignments between the input ligands. The multiple alignments are generated by combining pairwise alignments between the pivot and the other ligands. However, the optimal multiple alignment does not necessarily consist of the optimal pairwise alignments. Therefore, to ensure that we detect the optimal solution, a large number of pairwise alignments are generated and used by the multiple alignment stage. The resulting multiple alignments reveal spatial arrangements of consensus features shared by different subsets of input ligands. The highest-scoring ones are potential pharmacophores.

Here we briefly describe PharmaGist ligand representation, as well as pairwise and multiple alignment methods. A more detailed technical description accompanied by a complexity analysis can be found elsewhere.44,45 Next, we present our novel contribution in generation of weighted pharmacophores and automated selection of the most appropriate pharmacophores for the virtual screening stage.

Ligand Representation

In this stage, each ligand is partitioned into rigid groups and is assigned with a set of physico-chemical features.

A rigid group of a ligand is defined as a set of atoms between rotatable bonds, where a bond is considered rotatable if it is not (i) double, (ii) a bond on a ring, (iii) a bond connecting a single atom, or (iv) a peptide bond (a bond connecting two amino acids and is formed between the C atom in the carboxyl group of the first amino acid and the N atom in the amino group of the second one).

A feature is defined as a set of atoms in the same rigid group with a physico-chemical property important for binding. Namely, a feature is (i–ii) a hydrogen-bond acceptor/donor atom; (iii–iv) an anion/cation atom; (v) a set of atoms of an aromatic ring; or (vi) a pair of adjacent hydrophobic atoms (Table 3). In addition, users can define other feature types of their own.

Since a drug-like molecule can be represented by a graph, linear time techniques from graph theory are applied to find the rigid groups and the aromatic rings of each ligand.

Pairwise Alignment

The input for this stage is a pair of ligands, a pivot ligand and a single target ligand. The pivot is considered to be rigid, whereas the target ligand is treated as flexible. The goal is to generate the K highest-scoring pairwise alignments, where a pairwise alignment is a superposition of a feasible conformation of the target ligand onto the pivot. We score a pairwise alignment based on the pivot features matched by the superimposed conformation of the target ligand. Two features, one from the pivot and one from a conformation of the target ligand, can be matched if they are of the same type and their distance is below a predefined threshold. By default, the distance threshold is defined as 1.8Å for centers of aromatic rings and 1.4 Å for the centers of the other feature types. Overall, the score of a pairwise alignment with a subset {fi} of matched pivot features is Σi s(fi), where s(fi) is the score for matching the pivot feature fi (by default, it is 0.3 for hydrophobic groups, 3.0 for aromatic rings and 1 for the rest).

The task is tackled by a two-tier procedure. First, for each rigid group of the target ligand, a set of rigid transformations for superimposing it on the pivot is computed. Then, the resulting candidate new poses of the target rigid groups are assembled to form the K highest-scoring pairwise alignments. The procedure is highly efficient. A hybrid technique of Pose-Clustering50 and Geometric Hashing51 is employed for generating candidate new poses for the target rigid groups. Their best assemblies are computed by dynamic programming.

Multiple Alignment

The input is a pivot ligand, a collection of M target ligands and a large number (K) of pairwise alignments between the pivot and each target ligand. The goal is to compute the highest scoring multiple alignments. A multiple alignment is defined by a subset of target ligands and exactly one pairwise alignment for each one of them. The score of a multiple alignment is a function of the number of participating target ligands and the consensus subset of pivot features matched by them. Our aim is to find multiple alignments with significant subsets of pivot features matched by as many target ligands as possible. However, there is an inherent tradeoff between the two requirements. The larger the number of participating ligands (m), the smaller is their consensus subset of pivot features ({fi}). We have found that this tradeoff is well balanced by defining the score of a multiple alignment as m+1·featurescore, where feature_score is Σi s(fi).

Given K pairwise alignments for each of the M target ligands, there is an exponential number of (K + 1)M combinations for a multiple alignment. An enumeration over all these combinations is impractical since K is a large number (1500 by default) and M is unbounded. Instead, we apply an exhaustive search on the possible subsets of matched pivot features. If n is the number of pivot features, then there are 2n subsets to enumerate in the worst case. This enumeration is practical since the number of atoms, and thus the number of features (n), in a typical drug-like molecule is small. The algorithm works in a bottom-up manner. It starts with subsets of one matched pivot feature and incrementally extends them until no larger subsets are matched. The implementation is efficient using bitwise representation.

Weighted Pharmacophore Generation

The input is a list of multiple alignments between the pivot and different subsets of target ligands. In principle, pharmacophore models can be derived from the highest scoring multiple alignments. However, due to the tradeoff between the number of ligands participating in a multiple alignment and its consensus set of features, the input multiple alignments are not disjoint and some of them can be combined to gain more information on the candidate pharmacophores. Specifically, we are interested in combining a parent multiple alignment with other children multiple alignments for which the consensus is partial to the consensus of the parent due to additional target ligands. Figure 2 presents a parent multiple alignment with eight features shared by three ligands and children multiple alignments with less common features compared to their parent but with more input molecules.

Figure 2
The figure presents a multiple alignment ensemble of aldose reductase (ALR2) ligands as a path with three nodes. The parent multiple alignment (left node) consists of eight features shared by three ligands: four aromatic rings (cyan), three acceptors ...

The pharmacophore derived from such an ensemble of multiple alignments is called a weighted pharmacophore (Figure 2). In a weighted pharmacophore, the features {fi} are the consensus pivot features of the parent multiple alignment assigned with a weight w(fi) in addition to the feature score s(fi). The assigned weight w(fi) is a function of the number of molecules m(fi) in the ensemble that match fi, that is: w(fi)=m(fi). The overall score of a weighted pharmacophore is Σi w(fi) · s(fi).

The weighted pharmacophores are computed as follows. A directed acyclic graph (DAG) is constructed. The nodes of the graph stand for the input multiple alignments. A node u is connected to node v by a direct edge if u is a parent of v. Additionally, the nodes of the graph are weighted, where the weight of a node is the score of the corresponding multiple alignment. Paths in this graph stand for parent-children ensembles of multiple alignments and our aim is to find the highest-scoring ones. The highest-scoring paths are computed by dynamic programming.

The result of this stage is a set of multiple alignment ensembles, where each one of them is assigned with the corresponding weighted pharmacophore. Figure 2 presents a multiple alignment ensemble as a path with three nodes. The parent node stands for a multiple alignment between three molecules, which share eight common features. Its direct child node stands for a multiple alignment with two additional molecules and with only five common features. Finally, the last child node corresponds to a multiple alignment with an additional molecule (six in total) sharing only three common features.

Pharmacophore Clustering

The three stages described above (Pairwise Alignment, Multiple Alignment and Weighted Pharmacophore Generation) are repeated for each selected pivot. Different pivots may lead to the same or similar weighted pharmacophores. Therefore, after iterating over all the possible pivots, we cluster weighted pharmacophores with a similar 3D pattern of features.

Pharmacophore Candidates Selection

The algorithm produces a large set of weighted pharmacophores and the task of selecting the most appropriate ones for virtual screening is not trivial. The input molecules may bind different binding sites or belong to different classes of molecules. Based on this observation, we have developed a strategy for selecting a small set of weighted pharmacophores that cover as much as possible input molecules. The weighted pharmacophores are selected iteratively as follows. The first selected weighted pharmacophore is the top-scoring one. The ligands of the parent multiple alignment of this weighted pharmacophore, and these only, are marked as covered. Next, we iterate over the remaining weighted pharmacophores and select the next best scoring weighted pharmacophore that has no more than two molecules in common with the set of already covered ones. The process is repeated until all molecules are covered or no more weighted pharmacophores can be selected. This results in a ranked list of weighted pharmacophores associated with different classes of binding ligands.

Virtual Screening

The input to this module is a cover set of weighted pharmacophores and a 3D database (DB) of drug-like compounds. The goal is to rank the database compounds by their similarity to the weighted pharmacophores. The similarity is computed by a pairwise alignment between the DB molecule and the pivot molecule of each pharmacophore model. In addition to the pivot, it is possible to align the DB molecule with other molecules in the pharmacophore model. This can improve the alignment quality but the running time increases.

The pairwise alignments are scored by Σi w(fi) · s(fi), where w(fi) and s(fi) are the weight and score of the matched feature in the weighted pharmacophore respectively. In case of several pharmacophore models, the maximal score can be selected.

This stage employs the same pairwise alignment algorithm as in the pharmacophore detection module. The difference is that the number of pairwise alignments (K) generated in this stage is much lower compared to the pharmacophore detection module, since here we are only interested in the highest-scoring alignment. This leads to a highly-efficient pairwise alignment, which is suitable for screening large compound databases with explicit consideration of full compound flexibility.

Results

The capability of PharmaGist to identify correct pharmacophores was successfully tested on the FlexS dataset.45 This benchmark dataset consists of 74 ligands that are classified into 12 cases according to the protein receptor they bind to.6 Detailed description of the results is available elsewhere.45 Generally, the results, which were achieved using a set of standard default parameters, were consistent with reference pharmacophores derived from the bound ligand-receptor complexes. Below, we provide a description of additional tests for examining the applicability of PharmaGist not only for pharmacophore detection but mainly for virtual screening.

A key criterion for measuring the success of screening is the enrichment of annotated ligands among top ranking compounds. The higher the percentage of annotated ligands among the top ranking compounds, the better the performance of the screening. The enrichment criterion is evaluated by a numerical factor defined as

EFsubset={ligandssubset/compoundssubset}/{ligandstotal/compoundstotal}

where ligandstotal and compoundstotal are the total number of ligands and compounds in the database respectively, and ligandssubset and compoundssubset are the number of ligands and compounds found in the top-ranking subset of compounds respectively. This enrichment factor reflects the capability of a screening application to detect active ligands (true positives) compared to random selection. Thus, its value should always be greater than 1 and the higher it is, the better the enrichment performance of the virtual screening.

Running times of the pharmacophore detection stage range from several seconds to several minutes.45 In the screening application the running time of the method is 0.5–1.0 seconds (Intel(R) Xeon(TM) CPU 3.20GHz) per database compound, depending on the compound’s size.

Virtual Screening for G-Protein Coupled Receptors

G-Protein Coupled Receptors (GPCRs) is an important family of drug targets. GPCRs are membrane proteins that are hard to crystallize. Since many of them lack crystal structures, GPCRs have long been the ultimate targets for pharmacophore-based virtual screening.47 Here, we describe the performance of PharmaGist in identifying known antagonists of the alpha1A target receptor in comparison to two other pharmacophore-based tools, Catalyst1820,31 and MTree.26

Data and Experiment Description

We used a set of eighteen known alpha1A antagonists for pharmacophore detection. The compounds were divided into two classes based on their structures: twelve class I molecules and six class II molecules.52 A cover set of weighted pharmacophores were computed for each class of ligands separately. Then, each of these weighted pharmacophore candidates was used to screen a large dataset consisting of 50 active compounds and 950 decoys from the MDL Drug Data Report (MDDR) database.52

Class I Pharmacophore Candidates

PharmaGist automatically selected three pharmacophore candidates (Figure 3). Two aromatic rings and one acceptor (close to the left aromatic ring) are shared by all the three pharmacophore models and by all the twelve input ligands. Other features as well as the pivot molecule vary between the three pharmacophore models. The first pharmacophore candidate is the largest one. It consists of 13 features: three aromatic rings, two hydrophobic groups, two donors, five acceptors and one charged group (Figure 3a–b). These 13 features are shared by only three input molecules. The second pharmacophore candidate consists of six features: two aromatic rings, one donor, two acceptors and one charged group (Figure 3c–d). The third pharmacophore candidate also consists of six features: two aromatic rings, hydrophobic group, donor, acceptor and charged group (Figure 3e–f). The three pharmacophore candidates were found to be similar to a pharmacophore hypothesis computed by Catalyst.52 This pharmacophore hypothesis consists of hydrophobic groups instead of the common aromatic rings detected by PharmaGist, the acceptor that is shared by all molecules and the charge feature in the middle.

Figure 3
AlphaA1 Class I Pharmacophore Candidates

Class I Virtual Screening

The screening results with each pharmacophore candidate are summarized in Table 1 and Figure 5a. The enrichment factor at the top 1% of the ranked database (EF1) is 6.0, 14.0 and 10.0 for the first, second and third pharmacophore candidates respectively. The second pharmacophore candidate is much better in differentiating between actives and decoys compared to the first (and larger) pharmacophore candidate. The reason is probably due to the fact that the 13 features in the largest pharmacophore are shared by only three molecules. This artifact can be handled by an a-priori clustering of the input molecules. The enrichment factor EF1 for this pharmacophore candidate (14.0) is comparable with the enrichment factors of MTree (12.0) and Catalyst (14.0).47 It is also notable that the maximal enrichment is achieved for the top 1% of the ranked database.

Figure 5
Enrichment plots and ROC curves for the GPCR dataset
Table 1
Enrichment factors for the GPCR dataset

Class II Pharmacophore Candidates

Two pharmacophore candidates were computed as suitable for virtual screening by PharmaGist for the six class II ligands (Figure 4). Similarly to the class I pharmacophore candidates, two aromatic rings and one acceptor are shared by the two pharmacophore candidates and the six input ligands. However, the common acceptor for this class is between the aromatic rings. In addition to these three common features, the first pharmacophore candidate includes an additional acceptor and a hydrophobic group (Figure 4a–b), which are shared by only three input molecules. The second pharmacophore candidate has an additional acceptor, namely it consists of four features overall: two aromatic rings and two acceptors (Figure 4c–d)). The additional acceptor is shared by five out of six input molecules. The pharmacophore models suggested by Catalyst and MTree47 are similar to the pharmacophore candidates of PharmaGist. They all include the same two aromatic rings and there is a small variance in their other features. Additionally, the additional acceptor of the second selected pharmacophore candidate of PharmaGist is also observed in the pharmacophore model of MTree.

Figure 4
AlphaA1 Class II Pharmacophore Candidates

Class II Virtual Screening

Table 1 and Figure 5b summarize the enrichment results obtained for screening with each of the two pharmacophore candidates. The enrichment factor at the top 1% of the ranked database (EF1) is 4.0 and 14.0 for the first and second selected pharmacophore candidates respectively (Table 1). The enrichment factor for the first pharmacophore increases to 6.8 at the top 5% of the ranked database. Similarly to the results for class I, the second pharmacophore candidate is much better in differentiating between actives and decoys than the first (and larger) pharmacophore candidates. The EF1 enrichment factor for this pharmacophore candidate (14.0) is comparable with the enrichment factors of the pharmacophore models suggested by MTree (16.0) and Catalyst (12.0).47

Large-Scale Evaluation

We evaluated the performance of PharmaGist also on the DUD (directory of useful decoys) benchmark dataset.46 DUD contains 2950 annotated active ligands for 40 different protein receptors. In addition, each ligand has about 36 decoy molecules that are topologically distinct from the particular ligand, but physically similar to it in attributes, such as molecular weight and number of hydrogen bonds. The generation of DUD is based on the observation that the enrichment factor greatly depends on the physical properties of the background set of decoys. Specifically, misleading good enrichment might result if the physical properties of the decoys can easily be distinguished from those of the active ligands. Overall, the DUD database contains 2950 annotated ligands and 95316 decoys for 40 different receptors.

Experiment Description

The experiment for measuring the performances of PharmaGist for virtual screening on DUD was carried out as follows. For each of the 40 target receptors in DUD, we computed a set of weighted pharmacophores. This was performed by five runs of PharmaGist on random subsets (i.e. training sets) with five native ligands and additional five runs on random subsets with ten native ligands. PharmaGist selected only one weighted pharmacophore for each input with five ligands and at most three weighted pharmacophores for each input with ten ligands. Then, the selected weighted pharmacophores were used to screen a dataset consisting of all the native ligands of the receptor without the ligands in the training set (i.e. test set) and their corresponding decoys (called the ‘own decoys’ database in DUD46). For runs on ten input ligands and for which more than one weighted pharmacophore were selected by the method, we ranked the compounds in the database in three ways: (i) based on the score obtained for their similarity with only the first selected weighted pharmacophore (101), (ii) based on the highest score obtained for their similarity with the first and second selected weighted pharmacophore (101–2), and (iii) same as in (ii) but with all the three selected weighted pharmacophores (101–3).

Overall Enrichment Results

The screening results were evaluated by an enrichment factor at the top 1% of the ranked database (EF1) and are summarized in Table 2. Significant enrichment was observed for most receptors. Additionally, the results are competitive with the ones obtained by screening with the docking method, DOCK.48,49 Screening with DOCK was performed both against the ‘own decoys’ database as well as against the ‘entire’ DUD database.46 The ‘own decoys’ database is more challenging for screening. Thus, as reported by Huang et al., the results of DOCK are poorer against this database than against the ‘entire’ dataset and dramatic differences in enrichment were observed for some receptors.46 In contrast, the enrichment results of PharmaGist against the ‘own decoys’ database were far above random. On average, about 31% of the annotated ligands were found at the top 1% of the ranked database (corresponding to the average EF1), compared to 17.3% obtained for DOCK against the ‘entire database’, which is the less challenging database (Table 2). For most of the 40 receptors, PharmaGist achieved a more significant enrichment than DOCK, where for only eight receptors the situation was reversed. However, it should be noted that for at least three out of these eight cases (TK, PNP and SAHH), the reason for the differences in the enrichment obtained for the two methods result from the fact that the ‘own decoys’ database is more difficult for screening than the ‘entire’ database.46 It is also notable that the maximum enrichment factor of PharmaGist was obtained for 93% of the screens at the top 1% of the ranked database.

Table 2
DUD Screening Benchmark

Representative Targets

We examined the six representative receptors of Huang et al.46 in greater details. For these receptors, screening with PharmaGist was performed also on the ‘entire’ DUD database.46 As expected, the obtained enrichment factors for these receptors were much higher than the ones obtained against the ‘own decoys’ database, where on average, about 39% of the annotated ligands were found at the top 1% of the ranked database (corresponding to the average EF1) (highlighted in Table 2 and Figure 7). Pharmacophore candidates and the corresponding multiple alignments of ligands for the six representative receptors are presented in Figure 6. These pharmacophore candidates were computed by a run of PharmaGist on a random set of ten annotated ligands. Note that each feature in these pharmacophore candidates is weighted by they number of aligned ligands that possess it.

Figure 6
Pharmacophore candidates and the corresponding multiple alignments of ligands for the six representative receptors of DUD.46 These were computed by a run of PharmaGist on a random set of ten annotated ligands. Note that each feature in these pharmacophore ...
Figure 7
Enrichment plots for the six selected cases from DUD dataset

Conclusions and Future Work

We have described PharmaGist, a fully automated indirect method for both pharmacophore elucidation and virtual screening. PharmaGist has two modules. Given a collection of drug-like molecules, the first module of PharmaGist computes a set of pharmacophore candidates by multiple flexible alignments of input molecules. The second module uses the pharmacophore candidates for virtual screening in order to identify new compounds that bind to the target receptor.

From the computational standpoint, PharmaGist consists of two main algorithms for pairwise and multiple alignment of flexible drug-like molecules. The computational problems addressed by the two algorithms are NP-hard. Nevertheless, PharmaGist is highly efficient. Specifically, the pairwise alignment algorithm provides a fast heuristic solution with polynomial time complexity. The multiple alignment algorithm is also fast. Its time complexity is exponential only in the number of features (which is relatively low) and polynomial in the number of input ligands and their degrees of freedom.

The capability of PharmaGist to identify correct pharmacophores was successfully tested on the FlexS dataset.45 Here, we have performed additional tests for examining the applicability of PharmaGist not only for pharmacophore detection but mainly for virtual screening. We have evaluated its performance in identifying known antagonists of the alpha1A G-Protein Coupled Receptor from a large virtual library. In addition, we have performed a large-scale evaluation using the DUD dataset, which contains 2950 annotated ligands and 95316 decoys for 40 different receptors. The results show the success of the method in comparison to other methods for pharmacophore-based screening, Catalyst1820,31 and MTree,26,47 as well as to receptor-based screening with the DOCK method.48,49

The results also demonstrate the advantages of the method. PharmaGist is able to deal with different types of target receptors. The flexibility of the ligands is fully considered in a deterministic manner, an attribute that is unique to PharmaGist. This allows the user to screen large databases without a-priori conformational search of the compounds in the database. Another important attribute of the method is the capability of detecting pharmacophore candidates that are shared by non-predefined subsets of input ligands. This makes PharmaGist tolerant to the presence of outlier ligands and may aid in distinguishing between pharmacophores of different binding sites. There is a tradeoff between the number of aligned molecules and the number of their common features. This tradeoff is addressed in PharmaGist by introducing the notion of a weighted pharmacophore. In such a model for a pharmacophore, each feature is weighted based on the number of ligands that possess it. This means that some of the features do not necessarily appear in all active ligands. In addition, we have proposed a strategy for selecting a small cover set of weighted pharmacophores for virtual screening.

The software for pharmacophore detection and virtual screening is available for download. In addition, a user-friendly web interface for pharmacophore detection is available at http://bioinfo3d.cs.tau.ac.il/PharmaGist. The software allows fully automated application of the method. In addition, it allows full user control of the program parameters through a configuration file. In the pharmacophore detection stage, the user can set a pivot molecule, change feature weights, define a new feature type, as well as change more advanced method parameters. The user can also manually select which of the pharmacophores produced by the method should be used in the virtual screening stage.

In future work we intend to improve the method so it will be able to detect pharmacophores that are present on non-adjacent rigid groups of the ligands. PharmaGist can also be strengthened by considering ring conformations, deriving a pharmacophore shape as the negative image of the receptor binding site, and introducing the notion of excluded volumes.

Acknowledgments

The authors thank D. Fishlovitch, A. Oron and H. Senderowitz for their useful advice and A. Evers for providing the GPCR dataset. The research of O. Dror and Y. Inbar has been supported by the Eshkol Fellowship funded by the Israeli Ministry of Science. The research of H.J. Wolfson has been supported in part by the Israel Science Foundation (grant no. 281/05) and by the Hermann Minkowski-Minerva Center for Geometry at TAU. The research of H.J. Wolfson and R. Nussinov has been supported by the NIAID, NIH (grant No. 1UC1AI067231), and by the Binational US-Israel Science Foundation (BSF). This project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under contract number N01-CO-12400. The content of this publication does not necessarily reflect the view of the policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organization imply endorsement by the U.S. Government. This research was supported [in part] by the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research.

Appendix: Feature Definition

Table 3

Feature Definitions

The different features are listed in italics in the 1st column. Hydrogen Bond Donor/Acceptor, Acid and Base are single-atom features. Their definitions follow the ones of fFLASH7 and appear in the format: atom types: bonds. The atom types field is a list of SYBYL atom types separated by the | OR operator. The bonds field can specify the number of bonds of the atom in <>, the bonded atoms in () and atoms that should not be bonded in [] (* stands for any atom type). For example, O.3 | S.3: <2> (1 H) (*) defines an O.3 or S.3 atom attached to exactly one hydrogen and to additional atom. Each feature is represented in 3D space by its center (appears in the 2st column). Note that for a carboxyl group, C.2: <3> (2 O.co2) (*), two feature points are defined. The 3rd column specifies the direction vector for some of the features.

FeatureCenterDirection

Hydrogen Bond Donor
O.3 | S.3: <2> (1 H) (*)atom(*) → atom
N.4 | N.3 | N.2 | N.am | N.pl3: (H)atomatom → (H)
N.ar: <3> (H)atomatom → (H)
Hydrogen Bond Acceptor
N.3 | P.3: <3> (3 *)atomcentroid of (3 *) → atom
O.3 | S.3: <2> (2 *) [H]atomcentroid of (2 *) → atom
N.2 | N.ar: <2> (2 *)atomcentroid of (2 *) → atom
O.2 | S.2: <1> (*: <3>)atom(*: <3>) → atom
O.2: <1> (P | S: <4>)atom(P | S: <4>) → atom
N.1: <1> (*)atom(*) → atom
O.3 | S.3: <2> (1 H) (*)atom(*) → atom
C.2: <3> (2 O.co2) (*)2 × O.co2(*) → atom
O.3: <1> (N.4: <4>)atom(N.4: <4>) → atom
Acid
C.2: <3> (2 O.co2)atom-
P.3 | S.3: (>2 O.co2)atom-
O.3: (N.4)atom-
Base-
C.cat | N.4 | N.ar: <3>atom-
Aromatic Ring
A ring satisfying Definition 1ring centerring normal
Hydrophobic
An atom pair satisfying Definition 2pair center-

Definition 1. Aromatic Ring. A planar ring of five or six atoms is aromatic if one of the following holds: (i) it has 4n+2 Pi electrons, where n is a non-negative integer (Hückel rule); (ii) it has five sp2 atoms and one sp3 atom; or (iii) it has four sp2 atoms and two sp3 atoms.

Definition 2. Hydrophobic Pair. A pair of non-polar carbon atoms that are at least two bonds away from a hetero-atom or a charged atom.

References

1. Sun H. Pharmacophore-based Virtual Screening. Current Medicinal Chemistry. 2008;15:1018–24. [PubMed]
2. Shoichet BK. Virtual Screening of Chemical Libraries. Nature. 2004;432:862–865. [PMC free article] [PubMed]
3. Schneidman-Duhovny D, Nussinov R, Wolfson HJ. Predicting Molecular Interactions in silico II: Protein-protein and Protein-drug Docking. Frontiers in Medicinal Chemistry. 2006;3:585–614. [PubMed]
4. Lengauer T, Lemmen C, Rarey M, Zimmermann M. Novel Technologies for Virtual Screening. Drug Discovery Today. 2004;9:27–34. [PubMed]
5. Lemmen C, Lengauer T. Time-efficient Flexible Superposition of Medium-sized Molecules. J Comput-Aided Mol Des. 1997;11:357–368. [PubMed]
6. Lemmen C, Lengauer T, Klebe G. FlexS: A Method for Fast Flexible Ligand Superposition. J Med Chem. 1998;41:4502–4520. [PubMed]
7. Krämer A, Horn H, Rice J. Fast 3D Molecular Superposition and Similarity Search in Databases of Flexible Molecules. J Comput-Aided Mol Des. 2003;17:13–38. [PubMed]
8. Rognan D. Chemogenomic Approaches to Rational Drug Design. Br J Pharmacol. 2007;152:38–52. [PMC free article] [PubMed]
9. Klabunde T. Chemogenomic Approaches to Drug Discovery: Similar Receptors Bind Similar Ligands. Br J Pharmacol. 2007;152:5–7. [PMC free article] [PubMed]
10. Dror O, Shulman-Peleg A, Nussinov R, Wolfson HJ. Predicting Molecular Interactions In Silico: I. An Updated Guide to Pharmacophore Identification and its Applications to Drug Design. Frontiers in Medicinal Chemistry. 2006;3:551–584. [PubMed]
11. Güner OF, editor. Pharmacophore Perception, Development, and Use in Drug Design. International University Line; La Jolla, CA, USA: 2000.
12. Wolber G, Seidel T, Bendix F, Langer T. Molecule-pharmacophore Superpositioning and Pattern Matching in Computational Drug Design. Drug Discovery Today. 2008;13:23–9. [PubMed]
13. Dror O, Shulman-Peleg A, Nussinov R, Wolfson HJ. Predicting Molecular Interactions in silico: I A Guide to Pharmacophore Identification and its Applications to Drug Design. Current Medicinal Chemistry. 2004;11:71–90. [PubMed]
14. Holliday J, Willet P. Using a Genetic Algorithm to Identify Common Structural Features in Sets of Ligands. J Mol Graph Modell. 1997;15:203–253. [PubMed]
15. Handschuh S, Wagener M, Gasteiger J. The Search for the Spatial and Electronic Requirements of a Drug. J Mol Model. 2000;6:358–378.
16. Finn PW, Kavraki LE, Latombe JC, Motwani R, Shelton C, Venkatasubramanian S, Yao A. RAPID: Randomized Pharmocophore Indentification for Drug Design. Computational Geometry: Theory and Applications. 1998;10:263–272.
17. Chen X, Rusinko A, III, Tropsha A, Young S. Automated Pharmacophore Identification for Large Chemical Data Sets. J Chem Inf Comput Sci. 1999;39:887–896. [PubMed]
18. Güner OF, Clement O, Kurogi Y. Pharmacophore Modeling and Three Dimensional Database Searching for Drug Design Using Catalyst: Recent Advances. Current Medicinal Chemistry. 2004;11:2991–3005. [PubMed]
19. Clement OA, Mehl AT. Chapter HipHop: Pharmacophores Based on Multiple Common-Feature Alignments. International University Line; La Jolla, CA, USA: 2000. Pharmacophore Perception, Development, and Use in Drug Design; pp. 69–84.
20. Barnum D, Greene J, Smellie A, Sprague P. Identification of Common Functional Configurations among Molecules. J Chem Inf Comput Sci. 1996;36:563–571. [PubMed]
21. Li H, Sutter J, Hoffmann R. Chapter HypGen: An Automated System for Generating 3D Predictive Pharmacophore Models. International University Line; La Jolla, CA, USA: 2000. Pharmacophore Perception, Development, and Use in Drug Design; pp. 171–189.
22. Wolber G, Dornhofer A, Langer T. Efficient Overlay of Small Organic Molecules Using 3D Pharmacophores. J Comput-Aided Mol Des. 2006;20:773–88. [PubMed]
23. Crandell C, Smith D. Computer-assisted examination of compounds for common three-dimensional substructures. J Chem Inf Comput Sci. 1983;23:186–197.
24. Brint A, Willett P. Algorithms for the identification of three-dimensional maximal common substructures. J Chem Inf Comput Sci. 1987;27:152–158.
25. Takahashi Y, Satoh Y, Suzuki H, Sasaki S. Recognition of Largest Common Structural Fragment among a Variety of Chemical Structures. Analytical Sciences. 1987;3:23–28.
26. Hessler G, Zimmermann M, Matter H, Evers A, Naumann T, Lengauer T, Rarey M. Multiple-Ligand-Based Virtual Screening: Methods and Applications of the MTree Approach. J Med Chem. 2005;48:6575–6584. [PubMed]
27. Günther S, Senger C, Michalsky E, Goede A, Preissner R. Representation of target-bound drugs by computed conformers: implications for conformational libraries. BMC Bioinformatics. 2006;7:293. [PMC free article] [PubMed]
28. Martin Y, Bures M, Dahaner E, DeLazzer J, Lico I, Pavlik P. A fast new approach to pharmacophore mapping and its Application to Dopaminergic and Benzodiazepine agonists. J Comput-Aided Mol Des. 1993;7:83–102. [PubMed]
29. Martin YC. Chapter DISCO: What We Did Right and What We Missed. International University Line; La Jolla, CA, USA: 2000. Pharmacophore Perception, Development, and Use in Drug Design; pp. 49–68.
30. Dixon S, Smondyrev A, Knoll E, Rao S, Shaw D, Friesner R. PHASE: a new engine for pharmacophore perception, 3D QSAR model development, and 3D database screening: 1. Methodology and preliminary results. J Comput-Aided Mol Des. 2006;20:647–71. [PubMed]
31. Kurogi Y, Güner OF. Pharmacophore Modeling and Three-Dimensional Database Searching for Drug Design Using Catalyst. Current Medicinal Chemistry. 2001;8:1035–1055. [PubMed]
32. Feng J, Sanil A, Young S. PharmID: Pharmacophore Identification Using Gibbs Sampling. J Chem Inf Model. 2006;46:1352–1359. [PubMed]
33. Zhu F, Agrafiotis D. Recursive Distance Partitioning Algorithm for Common Pharmacophore Identification. J Chem Inf Model. 2007;47:1619–1625. [PubMed]
34. Podolyan Y, Karypis G. Common Pharmacophore Identification Using Frequent Clique Detection Algorithm. J Chem Inf Model. 2009;49:13–21. [PMC free article] [PubMed]
35. Jones G, Willett P, Glen R. A Genetic Algorithm for Flexible Molecular Overlay and Pharmacophore Elucidation. J Comput-Aided Mol Des. 1995;9:532–49. [PubMed]
36. Cottrell SJ, Gillet VJ, Taylor R, Wilton DJ. Generation of Multiple Pharmacophore Hypotheses Using Multiobjective Optimisation Techniques. J Comput-Aided Mol Des. 2004;18:665–682. [PubMed]
37. Handschuh S, Wagener M, Gasteiger J. Superimposition of Three-Dimensional Chemical Structures Allowing for Conformationa Flexibility by a Hybrid Method. J Chem Inf Comput Sci. 1997;38:220–232. [PubMed]
38. Handshuh S, Gasteiger J. Chapter Pharmacophores Derived from the 3D Substructure Perception. International University Line; La Jolla, CA, USA: 2000. Pharmacophore Perception, Development, and Use in Drug Design; pp. 429–454.
39. Jones G, Willett P, Glen R. Chapter GASP:Genetic Algorithm Superposition Program. International University Line; La Jolla, CA, USA: 2000. Pharmacophore Perception, Development, and Use in Drug Design; pp. 85–106.
40. Richmond N, Abrams C, Wolohan P, Abrahamian E, Willett P, Clark R. GALAHAD: 1. Pharmacophore Identification by Hypermolecular Alignment of Ligands in 3D. J Comput-Aided Mol Des. 2006;20:567–87. [PubMed]
41. Shepphird JK, Clark RD. A Marriage Made in Torsional Space: Using GALAHAD Models to Drive Pharmacophore Multiplet Searches. J Comput-Aided Mol Des. 2006;20:763–771. [PubMed]
42. Shatsky M, Shulman-Peleg A, Nussinov R, Wolfson HJ. The Multiple Common Point Set Problem and Its Application to Molecule Binding Pattern Detection. J Comp Biol. 2006;13:407–42. [PubMed]
43. Baum D. Multiple Semi-flexible 3D Superposition of Drug-sized Molecules. Computational Life Sciences: First International Symposium, CompLife 2005; Konstanz, Germany. 2005. pp. 198–207.
44. Inbar Y, Schneidman-Duhovny D, Dror O, Nussinov R, Wolfson HJ. Deterministic Pharmacophore Detection Via Multiple Flexible Alignment of Drug-Like Molecules. Research in Computational Molecular Biology (RECOMB), 11th Annual International Conference; Oakland, CA, USA. 2007. pp. 412–429.
45. Schneidman-Duhovny D, Dror O, Inbar Y, Nussinov R, Wolfson HJ. Deterministic Pharmacophore Detection via Multiple Flexible Alignment of Drug-Like Molecules. J Comp Biol. 2008;15:737–54. [PMC free article] [PubMed]
46. Huang N, Shoichet B, Irwin JJ. Benchmarking Sets for Molecular Docking. J Med Chem. 2006;49:6789–6801. [PMC free article] [PubMed]
47. Evers A, Hessler G, Matter H, Klabunde T. Virtual Screening of Biogenic Amine-Binding G-Protein Coupled Receptors: Comparative Evaluation of Protein-and Ligand-Based Virtual Screening Protocols. J Med Chem. 2005;48:5448–5465. [PubMed]
48. Kuntz I, Blaney J, Oatley S, Langridge R, Ferrin T. A Geometric Approach to Macromolecule-Ligand Interactions. J Mol Biol. 1982;161:269–288. [PubMed]
49. Ewing T, Makino S, Skillman A, Kuntz I. DOCK 4.0: Search Strategies for Automated Molecular Docking of Flexible Molecule Databases. J Comput-Aided Mol Des. 2001;15:411–428. [PubMed]
50. Stockman G. Object Recognition and Localization via Pose Clustering. J of Computer Vision, Graphics, and Image Processing. 1987;40:361–387.
51. Lamdan Y, Wolfson HJ. Geometric Hashing: A General and Efficient Model-Based Recognition Scheme. Proceedings of the IEEE Int. Conf. on Computer Vision; Tampa, Florida, USA. 1988. pp. 238–249.
52. Klabunde T, Evers A. GPCR Antitarget Modeling: Pharmacophore Models for Biogenic Amine Binding GPCRs to Avoid GPCR-mediated Side Effects. Chembiochem. 2005;6:876–889. [PubMed]