It has been suggested that PXR forms a heterotetramer and exhibits a range of motions that are key to its function and to preparing for coactivator binding at the activation function 2 (AF-2) site. The large and promiscuous ligand binding pocket of PXR accepts molecules of widely varying sizes (Table S1) and is likely capable of binding small molecules in multiple orientations. Furthermore, movement of regions of this pocket may be translated elsewhere in the protein to influence protein-protein interactions. Thus, identifying the bioactive conformation of a ligand binding to PXR (and the effect it might have as an agonist, antagonist or allosteric antagonist) and developing a ligand alignment based on these conformations is a challenge for any computational technique. A realistic ligand alignment, however, is the basis for a reliable 3D-QSAR model. Computational methods including QSAR (3D, 4D and 5D), pharmacophores and machine learning classification models for PXR can assist in rapid prediction of whether a compound is likely to be an agonist (activator); however, each method has its limitations and advantages (). For example, a previous study used human PXR activation data for 30 steroidal compounds (including 9 bile acids) to create a pharmacophore with four hydrophobic features and one hydrogen bond acceptor. This pharmacophore contained 5α-androstan-3β-ol (EC50 0.8 µM), which contains one hydrogen bond acceptor, indicating that, in contrast to the crystal structure of 17β-estradiol (published EC50 20 µM) bound to human PXR with two hydrogen bonding interactions, hydrophobic interactions may be more important for increased affinity. This and other pharmacophores have been used to predict PXR interactions for antibiotics, which were verified in vitro, suggesting one use for computational approaches in combination with experimental methods.
Summary of the different methods used in this study.
To our knowledge there has been no comparative analysis of the steroidal classes with respect to their activity as PXR agonists. Bayesian classification with 2D fingerprints is a low computational cost approach that has been used frequently with large molecule datasets. The 2D molecular fingerprint descriptors identified regions in the training set molecules that were predominantly hydrophobic and important for PXR activation. Substructures with free hydroxyls as hydrogen bonding features were associated with compounds that were not activators. This is in general agreement with other studies that have used docking to help design out PXR activation. This model successfully ranked a large test set (Table S3) of non-steroidal molecules, indicating that the molecular descriptors adequately captured the global properties of PXR agonists and suggesting some utility.
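As a rough illustration of how a Bayesian classifier over 2D fingerprint bits can rank compounds, the sketch below implements one common Laplacian-corrected naive Bayes scoring scheme in plain Python. It is not the model built in this study: the fingerprint bits are hypothetical integer identifiers, the toy training set is invented for illustration, and the function names `train_bayes` and `score` are assumptions.

```python
import math
from collections import Counter


def train_bayes(fingerprints, labels):
    """Learn one log weight per fingerprint bit from labelled molecules.

    fingerprints: list of sets of (hypothetical) fingerprint bit identifiers.
    labels: list of booleans, True for an activator.
    """
    base_rate = sum(labels) / len(labels)
    seen = Counter()    # molecules containing each bit
    active = Counter()  # activators containing each bit
    for bits, is_active in zip(fingerprints, labels):
        for bit in bits:
            seen[bit] += 1
            if is_active:
                active[bit] += 1
    # Laplacian-style correction: rarely seen bits are pulled toward weight 0.
    return {
        bit: math.log((active[bit] + 1) / (seen[bit] * base_rate + 1))
        for bit in seen
    }


def score(weights, bits):
    """Sum the weights of known bits; unseen bits contribute nothing."""
    return sum(weights.get(bit, 0.0) for bit in bits)


# Invented toy data: bit 1 stands in for a hydrophobic fragment,
# bit 2 for a free hydroxyl; bits 3 and 4 are uninformative.
train_fps = [{1, 3}, {1, 4}, {2, 3}, {2, 4}]
train_labels = [True, True, False, False]
weights = train_bayes(train_fps, train_labels)

# Higher scores suggest a compound resembles the training activators.
ranked = sorted([{1, 3}, {2, 3}], key=lambda b: score(weights, b), reverse=True)
```

In this toy set the bit enriched among activators receives a positive weight and the bit enriched among non-activators a negative one, which mirrors the qualitative pattern described above without reproducing the actual descriptors.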
The current study suggests that while it is generally possible to create 3D-QSAR (CoMFA, CoMSIA, Catalyst) and 4D-QSAR models that can be cross-validated, these models perform poorly when used to predict external molecules. Only the 5D-QSAR model displays some success in predicting external test set steroidal compounds. Three main differences between the 5D-QSAR and the 3D-QSAR studies that might contribute to the difference in performance are the less rigid alignment using Symposar, the possibility to present a ligand in more than one binding pose, and the better treatment of weak or non-binding compounds.
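The gap between cross-validated and external performance can be made concrete with a small numerical sketch. The code below computes a leave-one-out cross-validated q² and an external predictive r² for a one-descriptor linear model. The single descriptor, the toy activity values and the function names are all assumptions chosen for illustration; the QSAR models in this study use far richer descriptors and different software.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(xs, ys):
    """Leave-one-out q2 = 1 - PRESS / SS, with SS taken about the mean of ys."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit without molecule i, then predict it.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss = sum((y - mean_y) ** 2 for y in ys)
    return 1 - press / ss


def r2_pred(train_x, train_y, test_x, test_y):
    """External predictive r2, referenced to the training set mean."""
    slope, intercept = fit_line(train_x, train_y)
    mean_train = sum(train_y) / len(train_y)
    press = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(test_x, test_y))
    ss = sum((y - mean_train) ** 2 for y in test_y)
    return 1 - press / ss


# Invented toy data: the training set follows one trend, while the external
# set lies outside the training range and follows another.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.1, 1.9, 3.2, 3.9, 5.1]
test_x = [6.0, 7.0, 8.0]
test_y = [3.0, 2.5, 2.0]

q2 = q2_loo(train_x, train_y)
r2_ext = r2_pred(train_x, train_y, test_x, test_y)
```

With these toy values the model cross-validates well yet gives a negative external predictive r², which is the kind of behaviour described above for models that fit their own training class but extrapolate poorly.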
Pharmacophore models for the four classes of steroidal compounds possessed some of the features in the published human PXR crystal structures; however, the models contained two or three hydrophobic regions (rather than four as shown previously) and one to two hydrogen bond acceptors, or a hydrogen bond acceptor and a hydrogen bond donor (compared to one hydrogen bond acceptor as shown previously). This might suggest that the steroids evaluated occupy just a part of the ligand binding pocket, while larger molecules like rifampicin occupy most of the binding pocket and consequently have many more interactions with the protein. The addition of the excluded volumes to the pharmacophores was shown to improve the correlation for the training sets and likely acts in a similar manner to using the crystal structures in 5D-QSAR.
Consistent with the QSAR findings, the docking studies were modest in success overall but fared much better with individual classes of compounds. The classification was performed using two similarity weighted scoring schemes: one based on a highly potent compound, 5α-androstan-3β-ol, and the other based on a structurally relevant compound, 17β-estradiol. The goal was to test the utility of biasing the scoring scheme with either a structurally relevant compound or a functionally significant compound.
However, in this case 17β-estradiol and 5α-androstan-3β-ol share nearly 75% structural similarity (using MDL Keys and the Tanimoto similarity coefficient). The results from the classification studies showed that biasing the scoring scheme with a structurally relevant compound (17β-estradiol) produced classification rates with sensitivity and specificity values averaging 52% and 50%, respectively, with slightly better prediction accuracy (). These results unfortunately cannot be compared with our recent docking study, as a different co-crystal ligand was used for the scoring scheme. Although the structure biased scoring scheme performed better across all the compounds, both scoring schemes performed equally well when individual classes were considered. In the case of androstanes, 6 out of 11 compounds were correctly predicted as activators in docking studies. 5α-Androstan-3β-ol, which had the lowest EC50 value (described earlier), was predicted to be an activator in all structures. 5α-Androstan-3β-ol binds with very high docking scores and has a hydrogen bond interaction with His407, a key interaction of PXR (). This interaction was consistent among all the androstane activators. However, epitestosterone sulfate has an EC50 of 3.39 µM and was misclassified as a non-activator in the combined model using predictions from all structures. Docking studies show that epitestosterone sulfate has a consistently reversed docking pose (when compared with 5α-androstan-3β-ol) in all the models, and the sulfate group is predicted to make a hydrogen bond interaction with His407, as opposed to the steroid ester in the 1M13 structure (). A few other misclassified activators were docked in reversed poses and often had favorable hydrogen bonding partners, such as sulfates, that probably influence the binding mode of these steroids. This is a surprising and novel finding of this study, and other researchers should be aware of it when docking similar compounds with this functional group.
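For readers less familiar with the classification metrics quoted above, the following sketch shows how sensitivity and specificity follow from a list of experimental labels and docking-based predictions. The labels here are invented toy values, not the study's data, and the function name is an assumption.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for boolean activator labels.

    sensitivity = TP / (TP + FN): fraction of true activators recovered.
    specificity = TN / (TN + FP): fraction of non-activators rejected.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Invented toy labels: four activators followed by four non-activators.
actual = [True, True, True, True, False, False, False, False]
predicted = [True, True, False, False, True, False, False, True]
sens, spec = sensitivity_specificity(actual, predicted)
```

Values near 50% on a balanced set, as in this toy case, mean the classifier recovers about half of each class, so the overall result is only modest even when individual compound classes are handled well.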
Among the bile salts, all four activators were correctly predicted, and the ligands bind in a conserved mode with the steroid esters participating in favorable interactions with the side chains of His407 and Arg410, and the steroid rings with hydrophobic residues such as Leu411, Leu239 and Phe281 (). The pregnanes had similar activation patterns to the bile salts, and docking studies could predict 4 out of the 9 compounds correctly. Among the misclassified compounds, levonorgestrel was predicted to be an activator in three models and a non-activator in three models, and hence could not be classified with high confidence. Levonorgestrel has an EC50 of 4.30 µM and is predicted to have favorable interactions with hPXR as shown in . Despite this, the similarity weighted scoring functions generally performed well in classifying activators as described in the examples above and by the sensitivity values in . The paucity of available PXR binding data may limit some of the insights from docking experiments performed to date.
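The situation where a compound is called an activator by half of the structural models and a non-activator by the other half can be expressed as a simple vote. The sketch below assumes a plain majority rule with ties left unclassified; the exact rule used to combine predictions from the individual structures is not spelled out here, so both the rule and the function name `consensus_call` are assumptions.

```python
def consensus_call(votes):
    """Combine per-model activator predictions by simple majority.

    votes: list of booleans, one per docking model (True = activator).
    Returns "activator", "non-activator", or "unclassified" on a tie.
    """
    yes = sum(1 for v in votes if v)
    no = len(votes) - yes
    if yes > no:
        return "activator"
    if no > yes:
        return "non-activator"
    return "unclassified"


# A compound predicted active by every model.
unanimous = consensus_call([True] * 6)
# A three versus three split, as described for one pregnane above.
split = consensus_call([True, True, True, False, False, False])
```

Reporting ties as unclassified, rather than forcing a call, keeps low-confidence predictions visible instead of counting them silently as errors or successes.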
It is not surprising that CoMFA and CoMSIA do not perform well, as they use rigid alignments of the molecules. This is potentially a serious limitation given that the binding pocket of PXR may accommodate multiple orientations of the steroids ( vs. ). Theoretically, 4D- and 5D-QSAR should perform better by considering an ensemble of ligand conformations, and in fact 4D-QSAR does well within subsets (especially androstanes) but, like all methods, extrapolates poorly. 5D-QSAR appears to perform the best with the test set. Alignment independent methods like Catalyst, which can deal with structurally diverse molecules, can generate pharmacophores for the individual classes of compounds, but their inter-class predictivity is limited. Another alignment independent approach, 2D fingerprints and descriptors with Bayesian classification, may represent a fast way to screen for potential PXR agonists, but, like all methods, its applicability domain is dependent on the training set. In this case the set of steroids would be expected to limit the utility of such models to a relatively narrow class of compounds, although the model may be picking up key features in more diverse molecules (Table S3), suggesting overlap in the chemical space.
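One simple way to reason about an applicability domain is to ask how similar a query molecule is to its nearest neighbour in the training set. The sketch below uses the Tanimoto coefficient on sets of fingerprint bits for this purpose. The bit sets, the 0.5 threshold and the function names are hypothetical; a real analysis would use structural keys or fingerprints generated by a cheminformatics toolkit.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared bits divided by bits present in either set."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / union


def in_domain(query_bits, training_fps, threshold=0.5):
    """True if the query's nearest training neighbour reaches the threshold.

    The threshold is a hypothetical cut-off, not a value from this study.
    """
    nearest = max(tanimoto(query_bits, fp) for fp in training_fps)
    return nearest >= threshold


# Invented toy fingerprints for a small steroid-like training set.
training_fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 6, 7}]
close_analog = {1, 2, 3, 8}    # shares most bits with the training set
distant_query = {20, 21, 22}   # shares no bits with the training set

pair_similarity = tanimoto({1, 2, 3, 4}, {1, 2, 3, 5})
```

A query that falls below the threshold is not necessarily inactive; the model simply has little basis for a prediction, which is one way to interpret why a steroid-trained model may still rank some diverse molecules sensibly when their features overlap with the training set.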
This study shows the inherent difficulty of producing predictive ligand or structure-based computational models for PXR. Some of the methods used are ligand alignment dependent while others are alignment independent, and each has limitations when used with flexible proteins. These computational models also confirm some of the molecular features (hydrophobicity and hydrogen bond acceptors) identified in previous models and structures, while using a large quantitative dataset to create new QSAR, classification and pharmacophore models and to test docking and scoring. The study represents an initial step comparing multiple methods focused on steroidal compounds rather than a more diverse series of drug-like molecules. A more diverse series of molecules would be expected to present even more difficulty for the alignment dependent methods such as CoMFA and CoMSIA. There are also many more commercial computational methods that could be evaluated and compared, although we have used several 3D-, 4D- and 5D-QSAR methods, machine learning with 2D descriptors, pharmacophore methods, and GOLD docking and scoring in this study. The results from these methods could be used in combination as part of a consensus approach or Pareto optimization. The provision of the 115 molecule human PXR dataset is potentially useful as a benchmark PXR set for testing further methods in future. For example, flexible docking methods could be used, as well as algorithms that could differentiate multiple binding mechanisms.
In conclusion, there are many promiscuous proteins where the modeling of ligand-protein interactions is complicated by a large binding site, multiple binding pockets, protein flexibility or all of the preceding. We have applied several different computational approaches which could also be applied to other proteins like CYPs, transporters and ion channels. This work is therefore more broadly applicable as an attempt to predict whether molecules bind to such flexible proteins, and to determine which methods perform best. Depending on the desired use of such information, different modeling methods may be appropriate and required. While 2D methods do not encode 3D information like shape, they are fast and can highlight important features likely interacting with the protein. 3D-5D methods provide more shape based information but they are fragile, with a narrow applicability domain, and may not be able to differentiate close analogs. Docking is also limited unless key interactions with the protein are already known. Our results suggest that even in the presence of multiple crystal structures, the full range of protein motions may not be captured. As we have previously shown, when docking classification predictions are correct the binding conformation information alone may be instructive. This current analysis indicates that using many different computational approaches (both alignment dependent and alignment independent) may be necessary, and expectations should be scaled accordingly if some do not work with such promiscuous proteins. Even with their respective limitations, these methods have provided some useful information that could be applicable beyond PXR.