|Home | About | Journals | Submit | Contact Us | Français|
Nuclear receptors (NRs) are ligand dependent transcriptional factors and play a key role in reproduction, development, and homeostasis of organism. NRs are potential targets for treatment of cancer and other diseases such as inflammatory diseases, and diabetes. In this study, we present a comprehensive library of pocket conformational ensembles of thirteen human nuclear receptors (NRs), and test the ability of these ensembles to recognize their ligands in virtual screening, as well as predict their binding geometry, functional type, and relative binding affinity. 157 known NR modulators and 66 structures were used as a benchmark. Our pocket ensemble library correctly predicted the ligand binding poses in 94% of the cases. The models were also highly selective for the active ligands in virtual screening, with the areas under the ROC curves ranging from 82 to a remarkable 99%. Using the computationally determined receptor-specific binding energy offsets, we showed that the ensembles can be used for predicting selectivity profiles of NR ligands. Our results evaluate and demonstrate the advantages of using receptor ensembles for compound docking, screening, and profiling.
The online version of this article (doi:10.1007/s10822-010-9362-4) contains supplementary material, which is available to authorized users.
The primary function of nuclear receptors (NR) is to act as transcriptional switches responding to lipophilic hormones, vitamins, dietary lipids, or other intracellular signals [1, 2]. The common modular structure of an NR includes a conserved N-terminal DNA-binding domain (DBD), and a C-terminal ligand-binding domain (LBD) . The DBD directly interacts with DNA response element on the target genes to modulate transcription. The role of the LBD is to recognize ligands and coregulators . Ligand binding to NRs LBD induces changes in the conformational equilibrium and dynamic behavior of receptors and regulates the recruitment of coregulators and chromatin-modifying machineries [5, 6].
Nuclear receptors can be classified into the three groups. The first is the classic and the most extensively characterized group of receptors that respond to steroid hormones, thyroid hormone (thyroid hormone receptor; TR), and vitamins A (retinoic acid receptors; RAR) and D (vitamin D receptor; VDR). These receptors are all essential for homeostatic control of endocrine system. The second class is the orphan nuclear receptors, which are structurally related to nuclear hormone receptors but lack known natural ligands. In this group, nerve growth factor-induced clone B (Nur77), nuclear receptor related-1 (Nurr1), and neuronderived orphan receptor-1 (NOR1) are involved in the inflammatory processes . The third class of nuclear receptors includes adopted orphan nuclear receptors, which function as heterodimers with retinoid X receptor (RXR). These include receptors for fatty acids (peroxisome proliferator activated receptors; PPARα, δ, γ), oxysterols (liver X receptors: LXRα, β), bile acids (farnesoid X receptor; FXR), xenobiotics (steroid and xenobiotic receptor/pregnane X receptor; SXR/PXR ) and constitutive androstane receptor; CAR) .
NRs are transcription factors naturally switched on and off by small-molecule hormones. These hormones often present physicochemical properties very similar to therapeutic chemical entities. As such, NRs represent an important class of drug targets that have potential applications in cancer, inflammatory diseases, and diabetes.
Several NRs are already targeted by known drugs, such as tamoxifen, a partial antagonist of estrogen receptor (ER), or bicalutamide, an antagonist of androgen receptor (AR) used in treatment of breast and prostate cancer, respectively [10, 11]. Progesterone receptor (PR) agonists have several important applications in women’s health, such as in oral contraception and post-menopausal hormone replacement therapy . Spirolactones are mineralocorticoid receptor (MR) antagonists that have been used in the treatment of sodium-retaining states and as antihypertensive agents . The therapeutic applications of vitamin D metabolites are treatments for renal osteodystrophy, osteoporosis, psoriasis, cancer, autoimmune disease, or prevention of graft rejection .
The rapid growth of the number of high quality crystal structures of ligand binding domains of a number of important nuclear receptors [15–18] creates an opportunity to build a foundation for structure-based identification of potential nuclear receptor agonists, antagonists, or tissue selective modulators . The incorporation of multiple conformers of the same ligand binding pocket, a.k.a. ensemble docking  results in a higher quality algorithms for ligand binding pose prediction, virtual compound screening, and selectivity profiling. The recently developed 4D docking technology  further improves this process by dramatically speeding up all computational steps.
We have previously demonstrated the use of multiple receptor model-based Virtual Ligand Screening for the structure-based discovery of selective small molecule antagonists of the androgen receptor  and the thyroid hormone receptor , as well as antagonists  and agonists  of the retinoic acid receptor RARα. In the present paper we provide comprehensive benchmarking of the approach using a large library of the conformational ensembles of NR ligand binding pockets with determined three-dimensional structures. We demonstrate the exceptional performance of the ensemble docking approach on the tasks of predicting the ligand poses, test the ability of the ensembles to recognize their ligands in virtual screening, and propose a method to evaluate ligand selectivity profiles.
Co-crystal structures of ligand binding domains (LBD) of the progesterone receptor (PR), the mineralocorticoid receptor (MR), vitamin D receptor (VDR), the androgen receptor (AR), the peroxisome proliferator activated receptor alpha, beta, gamma (PPARα, β, γ), the estrogen receptor alpha, beta (ER α,β), the retinoic acid receptor gamma (RARγ), the glucocorticoid receptor (GR), the thyroid receptor (TR β), the retinoid X receptor alpha (RXRα) were used for this study. PDB entries with X-ray resolution >3 Å were excluded. When several crystal structures were available for a specific receptor/compound complex, the better resolution entry was taken.
PDB files were converted into the internal coordinate representation using the Internal Coordinate Mechanics (ICM) method . In the process of conversion, hydrogen atoms and missing heavy atom were built; polar hydrogens were sampled to optimize the hydrogen-bonding networks. Low occupancy side chains were optimized and assigned the lowest energy conformation. Orientations of ambiguously placed amino acids such as histidine, glutamine and asparagine were optimized to improve the hydrogen bonding patterns. For all NRs except ER only active (agonist-bound) conformations were tested in this study. In case MR and AR, wild type ligand binding domains were studied along with therapeutically relevant point mutants.
The database included 157 known ligands including agonists, antagonists, and selective nuclear receptor modulators. The molecules were taken from recent publications. For each receptor screened, between three and twenty-three known active ligands were included in the database. Most ligands targeted one of the library receptors, but several ligands had annotated activity against several receptors. Compounds were converted into 3D representation by the ICM conversion procedure. Bond order, tautomeric form, stereochemistry, hydrogen atoms and protonation state were assigned automatically. Each ligand was assigned the MMFF  force field atom types and charges and was then subject to Cartesian minimization. The free molecules were optimized with global energy in the internal coordinate space by the ICM method [26, 27]. This procedure guaranteed that the docking procedure did not have any memory of the bound conformation of the ligands.
Ligand docking simulations were carried out using ICM molecular modeling software (Molsoft LLC, La Jolla, CA) [26, 27] based on the Biased Probability Monte Carlo (BPMC) simulations. ICM docking procedure performs optimization of the ligand internal coordinates in the set of grid potential maps of the receptors . The ligand binding pocket of each receptor was represented by precalculated potential grid maps with grid step of 0.5 Å. The potentials including three van der Waals grid potential (for a carbon probe, large atom probe, or hydrogen probe), a hydrogen bonding grid potential, an electrostatic grid potential, and a hydrophobic grid potential. Ligands were sampled in the obtained maps. The sampling time ranged from 20 s to 3 min per ligand per processor. All calculations were carried out on Intel Core 2 Duo 2.40 GHz.
In the single receptor docking mode, ligands were docked independently into each receptor conformation and score according to ICM full-atom scoring scheme . In the ensemble mode, docking poses of each ligand in all ensemble structures were merged in a single hit list, and the ligand was represented by the top-scoring pose in that list.
where αi are ligand- and receptor independent constants. ΔEIntFF includes the van der Waals interaction of the ligand with the receptor as well as internal force-field energy of the ligand. TΔSTor is the change in free energy due to the conformational entropy loss for the ligand upon binding. ΔEHBond, ΔESolEl, ΔEHBDesol, ΔEHPhob and QSize are the hydrogen bonding, the solvation electrostatic energy, the hydrogen bond donor/acceptor desolvation correction, the hydrophobic terms, respectively, and QSize is the number of ligand non-hydrogen atoms.
The ICM virtual ligand screening module was used. The ligand database was docked into each protein structure. The top scoring pose per ligand was refined and rescored. The accuracy of VLS was tested as the area under the ROC curves obtained plotting the number of false positives vs. the number of true positives in the hit list.
For each nuclear receptor NRk in the benchmark (k = 1, 2, …, 17), the initial binding score offset was determined as the negative 5% percentile threshold of the database compounds binding scores. In other words, all compounds in the database were scored and rank-ordered the ensembles of NRk, producing the set or scores S (Cmpdi, NRk) (i = 1, 2, …,157, k = 1, 2, …,17). The threshold Tk was defined so that 5% of the database compounds had scores below this threshold. The new scores for all compounds were calculated as Soff (Cmpdi, NRk) = S (Cmpdi, NRk) − Tk.
The optimized NR-specific offsets were obtained by the Nelder-Mead simplex  optimization of the normalized ROC AUC on the matrix of scores Soff (Cmpdi, NRk) as a function of the vector of offsets (T1, T2, …, T17). The available experimental data on compounds activity was used to define true positives, false positives, true negatives, and false negatives. The initial offsets were scattered by addition of random vector of numbers in the ranges of [−10, 10] and used as multiple starting point for optimization. The obtained optimal combination of offsets provided discrimination of binders from non-binders with ROC AUC as high as 90%.
We collected a library of high-quality crystallographic structures of ligand binding domains of various human nuclear receptors to be used in virtual screening and selectivity profiling of NR ligands. Each receptor was represented by an ensemble of structures rather than a single structure in order to capture possible conformational variations of the binding pocket. For four receptors, estrogen receptors ERα and ERβ, mineralocorticoid receptor MR, and androgen receptor AR, we built more than one ensemble.
In case of the estrogen receptors ERα and ERβ, separate ensembles represented active and inactive conformations. Depending on the nature of the bound ligand (agonist or antagonist), the carboxy-terminal helix H12 was found in one of two orientations. In the agonist-bound conformation, it serves as a “lid” to close the ligand-binding pocket, whereas in the antagonist-bound conformation, this helix is positioned in a different orientation opening the entrance to the LBP. The classification was performed based on the conformational analysis rather than on the nature of the crystallized ligand. For example, even though genistein, is an agonist of ERβ, the co-crystal structure (PDB ID: 1qkm ) was classified as inactive because of the observed position of H12.
In case of mineralocorticoid receptor MR, all wild-type structures were merged into one ensemble, while all structures carrying a clinically relevant S810L point mutation formed the second ensemble. This mutation is known to cause severe hypertension, increase the stability of ligand-MR complexes, and change the mode of action of existing antagonists converting them into agonists. For example, progesterone and spironolactone (Fig. 1), antagonists of the wild-type mineralocorticoid receptor [32, 33], act as agonists of S810L MR . Similar situation was observed for the androgen receptor AR, as T877A point mutation allows antagonists of the wild-type receptors to invoke the agonist-bound conformation of the ligand binding domain [35, 36].
For other receptors, the identified point mutations did not change the mode of binding and action of the existing ligands. By this reason, the mutant structures were included in the same ensemble as the wild-type structures.
The structures were further analyzed and filtered to retain only high-resolution (<2.9 Å) X-ray structures that reproduced the binding geometry of their cognate ligands in docking. All protein structures in the ensemble were superimposed using all atoms in the immediate vicinity of the ligands. The superimposition algorithm was based on an iterative procedure that, through an unbiased weight assignment to different atomic subsets, gradually found the better superimposable core of atom pairs between the template and the other structures. The energy-based preprocessing and refinement of the structures was performed that included selection of optimal His tautomers, prediction of orientation of Asn, Gln and His residues, and optimization of polar hydrogen atoms in order to reproduce correct hydrogen bonding networks. The resulting library of NR structure ensembles is summarized in Table 1. It includes 17 ensembles of 13 nuclear receptors, with three to six structures per ensemble (65 structures total).
We first studied the ability of the collected pocket structures to correctly reproduce the binding geometry of their cognate ligands and other ligands in their ensembles. The flexible compound docking and scoring was performed using the Internal Coordinate Mechanics (ICM) algorithm described in Methods. As the criterion of correct binding pose prediction, the standard ligand heavy-atom RMSD cutoff of 2 Å was used. As we expected, all ligands were reproduced correctly within 2Å RMSD in the self-docking (Fig. 2 and Table 2).
In cross docking, 63 compounds were docked into the structures of their target receptors, with the cognate (co-crystallized) structure excluded. That made the total of 256 structure-ligand pairs. As shown in Fig. 2, the near-native ligand conformation scored at the top in 202 cases (78%), and was found within three top-scoring poses in 220 cases (85%). The success rate of ligand binding pose prediction for individual receptors is shown in the Table 2.
Most importantly, the predicted ligand poses correctly captured most or all essential ligand-receptor contacts predicted by crystallography, mutagenesis, and quantum mechanical studies [37–39]. This makes the predicted poses good starting points for compound optimization or understanding effects of pocket residue modifications.
In multiple receptor conformation, or ensemble docking, alternative docking poses of a single ligand from different receptor structures were combined in a single hit list, and the ligand was represented by the best scoring pose in this list. The cognate receptor structure was excluded from the ensemble. As shown in Fig. 2, ensemble docking with excluded cognate receptor structure reproduced the near-native ligand binding geometry as the top-scoring pose in 89% of the cases (56 out of 63 receptor-ligand pairs). 36 of 63 best scoring poses of each ligand in its receptor ensemble are shown in Fig. 3 (the rest of poses can be found in supporting information Figure S2). For three more cases (PPARδ/GNI, PPARγ/EHA and ERα-/E4D), the correct geometry was found within three top scoring poses in the hit list. The result of this section confirms that using experimentally determined structure ensembles with small number of conformers improves docking accuracy as compared to single receptor mode (89% success rate in ensemble docking vs. 78% in single receptor docking).
We tested the ability of the NR pockets and their conformational ensembles to select the active ligands of their respective receptors from a large and challenging compound database. Docking and scoring the database compounds in a single pocket or a pocket ensemble produced an ordered hit list. The screening selectivity of the models is represented by their ability to bring the active compounds to the top of the list, and is evaluated numerically as the Area Under normalized Receiver Operating Characteristic (ROC) curve, or AUC.
The ligand database was collected as described in Methods. It consisted of 157 various nuclear receptor ligands described in literature. 124 of these compounds had annotated agonist or antagonist activity against at least one of the library receptors (supplementary information Table 1). For each receptor, between three and twenty-three known ligands were included in the database. Most compounds in the database targeted exactly one receptor, but for some of them the activity against two or more receptors was described in literature.
In the single pocket mode, 29 out of 66 NR structures (43%) showed high selectivity towards their respective ligands with ROC-AUC exceeding 90%. For 53 structures (80%) ROC-AUC exceeded 80%. The results of VLS with single pocket conformation are summarized in Table 3.
When used in the ensemble mode, the ligand hit lists from different structures in the ensemble were combined and each ligand was represented by its best score in the ensemble. This remarkably improved the screening performance of our NR structure library. For 11 of 17 ensembles, ROC-AUC values of more than 90% were achieved, and for others they exceeded 80%. The most significant improvement was obtained for ERα−, ERβ− and PR+ which showed poor selectivity in the single receptor mode, but achieved the ROC AUC values of 86, 87 and 92% respectively, in the ensemble mode (Fig. 4). This result suggests that ensemble screening is the most efficient approach to VLS.
Variations in the screening selectivity of the individual ensembles can be in part explained by the incompleteness of our ligand benchmark set. Because in the absence of information on binding of a particular ligand to a receptor of interest we assumed that the ligand is inactive, low docking score for that ligand was automatically classified as a false positive. Further examination of several false positives confirmed their activity against the target thus qualifying them as true positives instead. That was the case, for example, for diethylstilbestrol for which ERα agonist activity was confirmed after docking by literature search . Additionally, a number of false positives in the benchmark were due to the fact that antagonist (inactive) binding pockets are often sterically compatible with the agonist molecules (but not vice versa) and share significant subsets of interactions. The use of pharmacophore-like atomic property fields in the spirit of [41, 42] may help to distinguish and eliminate this ambiguity.
False negatives, i.e. active compounds with poor docking scores may also affect the evaluation of the overall selectivity of the ensembles. Our structural ensembles are able to capture some, but not all aspects of the receptor equilibrium, especially for flexible pockets like ERα. In such cases, addition of new experimental conformations as they become available may help improve the quality of the compound recognition.
Ligand selectivity profiling is a very challenging task of computational structural biology that requires an accurate prediction of relative binding energy estimate of a single compound to different receptors. It is orthogonal to the task of screening, which is based on prediction of binding energies of a single receptor to multiple database compounds. It has been previously shown that intrinsic properties of the proteins, such as protein entropy loss upon ligand binding, structure quality fluctuations, and inevitable errors in energy functions can be, to a certain level, absorbed into a single protein-dependent binding energy offset which can be pre-calculated using experimentally observed binding energies or statistical considerations . Such offsets must be applied in compound selectivity profiling.
There are several ways to determine the offsets. The easiest, naïve way is based on the distribution of calculated binding scores of all database compounds in a given ensemble, and does not depend on any experimental data about active compounds. A single offset is calculated for each ensemble and applied in such a way that exactly 5% of all database compounds have binding scores better than a single fixed threshold. A better, more precise way to determine the offset is based on analysis of existing experimental compound activity data. The offsets (one number for each ensemble) are obtained by a machine learning algorithm that optimizes the recognition of correct targets for all database compounds.
In this work, we derived the protein-specific binding energy offsets in both ways, applied them to the predicted ligand binding energies, and evaluated the ligand specificity profiling accuracy of the obtained models. Figure 5 presents the comparison of the naïve offsets with those derived by simplex minimization of the total number of errors in the receptor-ligand binding matrix. It is clear that the naïve offsets are sufficiently close to the trained offsets (R2 = 0.64) and, therefore, represent a reasonable approximation of the true offsets to be used in compound profiling.
We found that even raw ICM docking scores were predictive of compound selectivity profiles. This is demonstrated by Fig. 6a on which the ROC curve for recognition of experimentally confirmed high-affinity ligand receptor pairs in the set of all 157 × 17 = 2,669 possible pairs is shown in grey. This type of analysis relies on the ability of the model to deliver comparable values of the docking scores of a single ligand to multiple receptors. Complementing the ligand docking scores with the pre-determined NR-specific binding energy offsets further improved ligand selectivity prediction. The histogram of per-ligand AUC values with and without the offset correction is shown in Fig. 6b. The AUC of 100 for a selected compound is an evidence of perfect compound selectivity profile recognition, i.e. the situation in all compound/target-receptor pairs are predicted with better scores than compound/non-target-receptor pairs. As shown in Fig. 6b, the prefect profile recognition was achieved for 53 of 124 compounds without the offsets, and for 66/124 compounds with the offsets. Figure 7 presents a good agreement of the activities predicted by docking with annotated experimental compounds activities. For example, cortisol (Ligand No 46 in the Fig. 8) is known as to be an agonist of GR and T877A AR, which does not bind to wtAR. This profile is correctly predicted by our ensemble docking. Another example is asoprisnil (Ligand No 55), agonist of PR, which does not bind to ER and MR. Biological activity of asoprisnil was well characterized by predicting binding energy in our MRC model. The patterns observed in Fig. 7 may also reveal potentially complex cross-reactivity profiles of the NR ligands and provide explanation for the off-target activities or side effects of the studies ligands. Experimental validation of these predictions would allow further refinement of the method.
The obtained results confirm the applicability of the NT binding pocket ensembles in the task of compound selectivity profiling.
Proteins are flexible machines that exist in a dynamic conformational equilibrium in solution. They tend to change their conformation, often significantly, to accommodate binding of ligands (the so-called induced fit effect). Multiple receptor conformation (MRC) docking , with its faster 4D incarnation , represent one possible approach to incorporation of receptor flexibility in ligand docking. With the rapid growth of structural information in the form of X-ray crystal structures and NMR, it becomes possible to design multi-conformational pocket libraries that provide the unique possibilities for structure-based drug design.
Nuclear receptors form a major class of drug targets. They are involved in a variety of human diseases including cancer, hormonal disorders, and diabetes. In order to understand the structural basis of selectivity of NR-targeting existing drugs, and to develop novel selective nuclear receptor modulators, it is important to build a comprehensive panel of 3D pockets with their conformational variations. In this work we present the first such panel consisting of 17 ensembles of binding pockets of 13 different receptors. We tested ability of the designed ensembles to correctly reproduce the compounds binding poses, draw a distinction between the active compounds and other NR ligands in virtual screening, and predict compound selectivity profiles.
The collected NR pocket ensembles reproduced the near-native ligand binding geometry as the top-scoring pose in 78% of the cases in the single receptor docking mode, and in 89% of the cases in the ensemble mode. The high success rate ensemble docking suggests that the obtained models may be good starting points for optimization of the existing compounds or virtual ligand screening.
Most NR models demonstrated high screening selectivity even in single rigid receptor mode. The selectivity was further improved by combining them in the ensemble mode. The high selectivity of all 17 ensembles towards their active ligand is exemplified by the ROC-AUC values above 80%; for 11 ensembles, it exceeded 90%. All receptors recognized most of their active ligands in top 2–10% of the hit list.
Selectivity between different isoforms of the same receptor can represent an important parameter of understanding the binding mode of agonists and antagonists. Here, we built pocket ensembles of PPAR α, δ, γ and ER α, β, and evaluated their ability to discriminate between isotype-specific ligands in virtual screening. All three pocket ensembles of PPAR (α, δ, and γ), recognized their known agonist and antagonists with high selectivity (91, 98 and 88% respectively). Ensembles of ER α, β also showed high selectivity even though the structures of their ligand binding domains are very similar. Further on, the conformation-specific ensembles of ERα, and ERβ were also used to test the selectivity between different conformational states of the same protein. The active and inactive ensembles successfully distinguished agonists from antagonists and vice versa. The obtained results suggest that our proposed pocket ensemble library can potentially be used as a powerful device for structure-based discovery of novel specific nuclear receptor modulators with the desired mechanism of action.
Furthermore, by combining the compound docking scores with pre-determined receptor-specific binding energy offsets, we showed that our proposed pocket ensembles can be used for in silico determination of compound selectivity profiles. This aspect is extremely important as it can open avenues for rational prediction of possible off-target effects of the experimental compounds, or for repurposing of the existing drugs [22, 43, 44].
In this work, we presented a comprehensive library of three dimensional models of the nuclear receptor binding pockets in different functional states. Using the library, we validate and demonstrate the advantage of using the ensemble docking approach in various structure-based drug discovery applications, including compounds pose prediction, virtual screening, and selectivity profiling.
Below is the link to the electronic supplementary material.
This work was supported by the Korea Research Foundation Grant funded by the Korean Government (KRF-2007-357- E00038 38) and NIH/NIGMS grants 5-R01-GM071872 and 1-R01 -GM074832.
Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.