|Home | About | Journals | Submit | Contact Us | Français|
We present a novel algorithm called CrystalDock that analyzes a molecular pocket of interest and identifies potential binding fragments. The program first identifies groups of pocket-lining receptor residues (i.e., microenvironments) and then searches for geometrically similar microenvironments present in publically available databases of ligand-bound experimental structures. Germane fragments from the crystallographic or NMR ligands are subsequently placed within the novel binding pocket. These positioned fragments can be linked together to produce ligands that are likely to be potent; alternatively, they can be joined to an inhibitor with a known or suspected binding pose to potentially improve binding affinity.
To demonstrate the utility of the algorithm, CrystalDock is used to analyze the principal binding pockets of influenza neuraminidase and Trypanosoma brucei RNA-editing ligands 1, validated drug targets in the fight against pandemic influenza and African sleeping sickness, respectively. In both cases, CrystalDock suggests modifications to known inhibitors that may improve binding affinity.
The computational prediction of molecular recognition is important in modern drug discovery. Computational medicinal chemists seek to answer two classes of questions. First, given an NMR, crystallographic, or homology-modeled receptor structure, can novel ligands be identified in silico? Second, given a ligand with a known or suspected receptor-binding pose, what chemical modifications can improve potency?
A number of techniques have been developed to answer these questions. When a project requires that thousands of potential ligands be evaluated, techniques such as computational docking that favor speed over accuracy are often employed. These programs typically sample multiple ligand conformations and attempt to fit, or dock, each conformation into a known binding-pocket structure. Receptor-ligand interactions are subsequently evaluated using a fast scoring function to estimate binding affinity. Unfortunately, because these algorithms are optimized for speed, they are far less accurate than most experimental techniques.1 Any single docking prediction is untrustworthy; the aim of a docking study is rather to produce an enriched pool of potential binders by docking many ligands (e.g., compounds with diverse scaffolds or analogues of a known inhibitor) and considering only the top predicted binders for subsequent experimental or computational evaluation.
Other computational techniques similarly designed to confirm or refute molecular recognition have been optimized for accuracy rather than speed. These techniques utilize molecular dynamics simulations to probe not only the possible conformations of the ligand, but also the conformations of the protein, water molecules, and other molecular elements that may contribute to binding.2–6 As an accurate prediction of the binding free energy depends on adequate sampling of these many possible conformations, molecular-dynamics-based techniques can easily require thousands or even tens of thousands of computer hours. If sampling is adequate, however, these methods are often more accurate than the scoring functions used by computer-docking programs.
Here, we present a computer program called CrystalDock that takes a different approach to the computational identification of molecular recognition. Our technique is somewhat unique7 in that it makes direct use of crystallographic and NMR structures from the Protein Data Bank (PDB)8 to generate a molecular-recognition database that is used to place molecular fragments into binding pockets of interest. In this paper, we describe the CrystalDock algorithm and use the program to generate novel potential inhibitors of influenza neuraminidase and Trypanosoma brucei RNA editing ligase 1 (TbREL1), enzymes critical to the etiological agents of pandemic influenza and African sleeping sickness, respectively.
CrystalDock is open source and python implemented, making it easily editable, customizable, and platform independent. The program has been tested on Ubuntu Linux 10.04.1 LTS, Mac OS X 10.6.6, and Windows XP using Python versions 2.6.5, 2.6.1, and 2.6, respectively, together with NumPy/SciPy versions 1.3.0/0.7.0, 2.0.0.dev-3071eab/0.10.0.dev, and 1.6.1rc1/0.9.0, respectively. A copy can be obtained free of charge from http://www.nbcr.net/crystaldock.
A search of the Protein Data Bank in February 2011 revealed 50,424 structures with bound ligands. To generate a database of molecular fragments, models of all these receptor-ligand complexes were downloaded. Several classes of undesirable ligands were subsequently identified: ligands that contained multiple rotamers; ligands that were ribonucleic, deoxyribonucleic, or amino acids; and small molecules that were not sufficiently close to any potential receptor. After the receptor-ligand complexes containing only these undesirable ligands were removed, 43,327 complexes containing 202,584 ligands remained for subsequent analysis. A representative example of such a ligand positioned in a receptor binding pocket is shown in Figure 1A.
Each of these 202,584 ligands was subsequently fragmented into its constituent molecular parts. All bonds between heavy atoms not belonging to the same ring were identified; “cutting” along these bonds produced multiple molecular fragments from each ligand model (Figure 1B). Any fragment with fewer than three heavy atoms was merged with the neighboring fragment that had the fewest atoms.
While long-range electrostatic interactions certainly can influence fragment binding, for many fragments the predominant interactions required for molecular recognition are with receptor atoms that immediately line the fragment-binding pocket. Consequently, receptor microenvironments (i.e., groups of adjacent, pocket-lining receptor residues) were identified by extending geometric rays, separated by 10° in all directions, from each fragment atom out into space (Figure 1C). Whenever a ray encountered a receptor residue, ray extension was terminated, and the residue was recorded. Additionally, if a ray grew to 4 Å without encountering any receptor residue, it was similarly terminated. Testing confirmed that this process could effectively identify the receptor residues that line a fragment-binding pocket (Figure 1D).
Often the number of residues lining a binding pocket was too large. To make the number of microenvironments more manageable for future search, a ligand-receptor distance cutoff was implemented. The cutoff was gradually scaled back from 4 Å to 0 Å, and receptor residues beyond the cutoff were discarded at every step. In this way, multiple microenvironments were identified for each molecular fragment. Only those microenvironments with 3, 4, and 5 receptor residues, 823,460 in total, were subsequently considered (Figure 1E).
Characterization of a new binding pocket begins when the user provides three-dimensional coordinates identifying the pocket location. CrystalDock then sends out rays as described above to identify the pocket-lining receptor residues. All combinations of 3, 4, and 5 active-site residues are subsequently considered (Figure 2A).
CrystalDock then searches through the database of predefined microenvironments in an attempt to find geometric matches (Figure 2B). An RMSD alignment is used to judge microenvironment similarity. Those aligned microenvironments from the database that are judged to be geometrically similar to the identified microenvironments of the binding pocket are saved for further analysis. Rather than requiring exact amino-acid matches, the user can also instruct the program to consider chemically similar amino acids to be equivalent according to a predefined similarity matrix (Table S1) based on BLOSUM62.9 See the supporting information for more details.
Though the RMSD alignment considers only receptor residues (i.e., the residues of the microenvironment), the structures include models of the original ligand fragments as well; RMSD alignment positions these molecular fragments within the binding pocket of interest. Once those positioned fragments that have steric clashes with receptor residues have been discarded, a final set of molecular fragments ideally positioned within the pocket of interest remains (Figure 2C).
Independent-trajectories thermodynamic integration (IT-TI)10 was used to predict the binding energies of both a known and a predicted Trypanosoma brucei RNA editing ligase 1 (TbREL1) inhibitor. The initial model-creation, minimization, and equilibration steps of the receptor-bound V211 simulation have been described previously.12 An identical protocol was used to set up the simulation of a receptor-bound CrystalDock inhibitor as well.
For the solvated-ligand simulations, the same ligand parameters were used. The ligands were immersed in a TIP313 water box extending 10 Å beyond the ligand atoms in all three dimensions. Na+ counterions were added as needed to ensure the electrical neutrality of the system. NAMD 2.7b114 was used to subject the system to 15,000 steps of conjugate-gradient energy minimization.
Following this initial preparation, six independent thermodynamic-integration (TI) runs,15 three with bound-ligand annihilation and three with solvated-ligand annihilation, were performed for each ligand. Twenty-one lambda points were used for both the receptor-bound and solvated simulations; as λ decreased from 1.00 to 0.00 in decrements of 0.05, the electrostatic and van der Waals interactions between the ligand and protein were slowly turned off (tiVdwLambdaEnd = tiElecLambdaStart = 0.5). Each simulation ran for 2 ns. Average dE/dλ values over the last 1 ns of each simulation were plotted against λ; the area under the curve, calculated using the composite trapezoidal rule as implemented in SciPy,16 was taken to be the free energy of the alchemical transformation. The free energies of the transformations in which the protein-bound and solvated ligands vanish are denoted ΔGprotein and ΔGwater, respectively.
When necessary, a restraint was applied to the protein and ligand to maintain the correct positional orientation. This was especially necessary at low λ values, where the interactions between the ligand and protein were nearly absent. As the protein-bound ligand was often confined to an artificially limited volume by these imposed constraints, it was necessary to correct the free energy accordingly. This was done using the formula employed by Lawrenz et al.17 Specifically,
where ΔGcorrected is the corrected free energy, ΔGprotein is the free energy prior to correction, R is the gas constant, T is the temperature, and Vpocket is the volume sampled by the ligand during the simulations. The free energy of binding, ΔGbind, was ultimately calculated by simply subtracting ΔGcorrected from ΔGwater.
As each TI run was performed in triplicate, there were three ΔGcorrected and three ΔGwater values, yielding nine possible estimates of ΔGbind. Histograms of these nine values were generated by simple binning; average predicted binding energies are also reported.
Here, we present a novel algorithm called CrystalDock that identifies molecular fragments likely to bind pockets of interest. First, CrystalDock identifies relevant microenvironments (i.e., groups of adjacent pocket-lining amino-acid residues) within the user-specified binding pocket (Figure 2A). The program then searches through a database of microenvironments derived from ligand-bound crystallographic and NMR structures deposited in the Protein Data Bank (Figure 2B)8 for similar microenvironments, and positions the associated small-molecule fragments from the database within the binding pocket of interest (Figure 2C). The identified fragments can be joined to create novel ligands or can be fused to ligands with known binding poses to enhance potency.
To demonstrate the utility of CrystalDock, we first used the program to analyze the principal binding pocket of influenza neuraminidase (Figure 3). Neuraminidase is a useful initial validation system because it is thoroughly represented among PDB structures; a search of the PDB for the terms “neuraminidase” and “sialidase” returned 222 structures with ~70 unique ligands that bind in the sialic-acid pocket. Neuraminidase is an important drug target in the fight against influenza, including virulent strains like those that have recently caused the H1N1 and H5N1 pandemics. A number of neuraminidase inhibitors have been approved by the FDA or are otherwise progressing through clinical trials.18
CrystalDock-identified binding fragments came from 95 distinct PDB structures representing 39 unique ligands. As expected, most of the identified ringed fragments were derived from known neuraminidase ligands, including peramivir (e.g., PDB ID 3K3719), oseltamivir (e.g., PDB ID 3K3A19), zanamivir (e.g., PDB ID 3B7E20), sialic acid (e.g., PDB ID 2C4A), DANA (e.g., PDB ID 2F25), and other experimental inhibitors. However, positioned fragments were not derived exclusively from neuraminidase structures; a single fragment was also obtained from pentaethylene glycol bound to D-lactate dehydrogenase (PDB ID: 3KB621), confirming that CrystalDock is able to identify binding fragments from even distantly related proteins.
Interestingly, CrystalDock placed a sulfate ion near the location of the charged oseltamivir carboxylate group (Figure 3B). The idea of substituting this carboxylate group with a sulfonate is interesting, as at least one known neuraminidase inhibitor has a sulfonate group so positioned (2-[N-cyclohexylamino]ethane sulfonic acid, PDB ID: 2VW222), and the binding pose of another inhibitor is also thought to position a sulfonate at this same location.23 Additionally, a number of researchers have demonstrated that inhibitors with comparable phosphonate groups are also potent.24–27
As many neuraminidase structures with bound ligands have been deposited in the PDB, the example above, while useful as a proof of concept and for method validation, does not demonstrate the full utility of CrystalDock. Ideally, the program should be able to extract molecular fragments from multiple structurally and even functionally diverse receptors. As a further demonstration, we used CrystalDock to analyze the binding pocket of the adenylation domain of Trypanosoma brucei RNA editing ligase 1 (TbREL1), a protein with only one PDB-deposited crystal structure bound to a single ligand (ATP).28
The results of the TbREL1 CrystalDock run are shown in Figure 4 together with V2, a known low-μM naphthalene-based inhibitor11 docked into the TbREL1 crystal structure using AutoDock Vina.29 When only the lower, buried portion of the TbREL1 active site was targeted, predicted binding fragments from 55 structures representing 45 unique ligands were identified. Interestingly, the CrystalDock-positioned fragments can be generally clustered into three groups. The first group contains a single sulfate that CrystalDock positioned near the predicted pose of a V2 sulfonate group, explaining in part the potency of the naphthalene-based inhibitor. The second group is comprised principally of aromatic fragments that are generally in the same region and plane as the V2 naphthalene group. The third group, represented by several mostly hydrophobic fragments, does not correspond to any V2 substructure, suggesting a possible route for improving potency.
Serendipitously, the CrystalDock poses of two of these group-three hydrophobic fragments, toluene fragments derived from two unique inhibitors of P38 mitogen-activated protein kinases (PDB IDs 2ZB130 and 3LHJ31), was ideal for fragment addition to the Vina-docked V2 via an intermediary methyl linker (Figure 4C). These toluene fragments occupied a small pocket at the buried end of the TbREL1 active site that, to our knowledge, has not been previously exploited for drug design.
To test if a CrystalDock-inspired V2 + toluene composite compound would have improved binding affinity over V2, we employed a computational technique known as independent-trajectories thermodynamic integration (IT-TI)10 to predict ligand binding energies.15 IT-TI is far more computationally demanding than high-throughput methods for estimating binding affinity (e.g., computer docking programs); using the protocol described in the Materials and Methods, calculating the binding energy of a single TbREL1 ligand required approximately 25,000 computer-hours. In contrast, a simple docking run using AutoDock Vina29 takes only a few minutes on a single processor, a speed up of roughly five orders of magnitude. However, if conformational sampling is adequate, IT-TI is often more accurate than docking scoring functions.
Six TI runs were executed for each system, three in which the protein-bound ligand was annihilated, and three in which the solvated ligand was annihilated. From these six TI runs, nine binding-energy estimates were calculated. Histograms of these nine values were generated by simple binning and are shown in Figure 5. The predicted binding affinity of V2 was −8.4 ± 0.5 kcal/mol, which correlates well with the experimentally measured IC50 value of 1.53 μM.11 The predicted binding energy of the new composite compound was −10.7 ± 0.9 kcal/mol, representing a 2.3 kcal/mol improvement.
A one-tailed, homoscedastic T test was subsequently used to determine if the difference in the average predicted binding energies of V2 and the composite compound was statistically significant. The associated p value was 2.7 × 10−6, suggesting that the CrystalDock-inspired ligand represents a genuine improvement, possibly potent in the nM range.
If these predictions are confirmed experimentally, this optimized compound may represent a significant contribution in the fight against African sleeping sickness, a deadly parasitic infection that currently afflicts as many as 70,000 Sub-Saharan Africans and threatens an additional 50 million.32 In a previous study, we demonstrated that V2 fails to kill whole-cell parasites despite its confirmed TbREL1 inhibition,11 perhaps because the compound is too negatively charged to easily cross biological membranes. However, the hydrophobic toluene group may make membrane crossing more feasible. The calculated LogP values of the electrically neutral forms of V2 and the new composite compound are 0.305 and 2.940, respectively, supporting this theory.
The new composite compound is a promising lead for other reasons as well. It is only one hydrogen-bond acceptor away from satisfying Lipinski’s Rule of Five,33 a common measurement of druglikeness; the hydroxyl group connected to the aminonaphthalene forms an important hydrogen bond with the receptor, but the other hydroxyl group may be a good candidate for elimination. The CrystalDock-predicted toluene pose is also ideally positioned to react with the V2 amine, assuming the Vina-docked pose of V2 is correct; we expect that the composite compound can be easily synthesized by reacting V2 with 1-(bromomethyl)-2-methylbenzene via an amino-de-halogenation reaction.34
This work was carried out with funding from NIH 5T32GM007752-32 to AJF and NIH GM31749, NSF MCB-1020765, and MCA93S013 to JAM. Additional support from the Howard Hughes Medical Institute, the NSF Supercomputer Centers, the San Diego Supercomputer Center, the W.M. Keck Foundation, the National Biomedical Computational Resource, and the Center for Theoretical Biological Physics is gratefully acknowledged.