Hsp90 continues to be an important
target for pharmaceutical discovery.
In this project, virtual screening (VS) for novel Hsp90 inhibitors
was performed using a combination of Autodock and Surflex-Sim (LB)
scoring functions with the predictive ability of 3-D QSAR models,
previously generated with the 3-D QSAutogrid/R procedure. Extensive
validation of both structure-based (SB) and ligand-based (LB) approaches, through
realignments and cross-alignments, allowed the definition of LB and
SB alignment rules. The mixed LB/SB protocol was applied to virtually
screen potential Hsp90 inhibitors from the NCI Diversity Set composed
of 1785 compounds. A selected ensemble of 80 compounds was biologically
tested. Among these molecules, preliminary data yielded four derivatives
exhibiting IC50 values ranging between 18 and 63 μM
as hits for a subsequent medicinal chemistry optimization procedure.
Cyclophilin D (CypD) is a peptidyl
prolyl isomerase F that resides
in the mitochondrial matrix and associates with the inner mitochondrial
membrane during the mitochondrial membrane permeability transition.
CypD plays a central role in opening the mitochondrial membrane permeability
transition pore (mPTP) leading to cell death and has been linked to
Alzheimer’s disease (AD). Because CypD interacts with amyloid
beta (Aβ) to exacerbate mitochondrial and neuronal stress, it
is a potential target for drugs to treat AD. Since appropriately designed
small organic molecules might bind to CypD and block its interaction
with Aβ, 20 trial compounds were designed using known procedures
that started with fundamental pyrimidine and sulfonamide scaffolds
known to have useful therapeutic effects. Two-dimensional (2D) quantitative
structure–activity relationship (QSAR) methods were applied
to 40 compounds with known IC50 values. These formed a
training set and were followed by a trial set of 20 designed compounds.
A correlation analysis was carried out comparing the statistics of
the measured IC50 with predicted values for both sets.
Selectivity-determining descriptors were interpreted graphically in
terms of principal component analyses. These descriptors can be very
useful for predicting activity enhancement for lead compounds. A 3D
pharmacophore model was also created. Molecular dynamics simulations
were carried out for the 20 trial compounds with known IC50 values, and molecular descriptors were determined by 2D QSAR studies
using the Lipinski rule-of-five. Fifteen of the 20 molecules satisfied
all 5 Lipinski rules, and the remaining 5 satisfied 4 of the 5 Lipinski
criteria and nearly satisfied the fifth. Our previous use of 2D QSAR,
3D pharmacophore models, and molecular docking experiments to successfully
predict activity indicates that this can be a very powerful technique
for screening large numbers of new compounds as active drug candidates.
These studies will hopefully provide a basis for efficiently designing
and screening large numbers of more potent and selective CypD inhibitors
for the treatment of AD.
of the structures in PubChem are annotated with activities
determined in high-throughput screening (HTS) assays. Because of the
nature of these assays, the activity data are typically strongly imbalanced,
with a small number of active compounds contrasting with a very large
number of inactive compounds. We have used several such imbalanced
PubChem HTS assays to test and develop strategies to efficiently build
robust QSAR models from imbalanced data sets. Different descriptor
types [Quantitative Neighborhoods of Atoms (QNA) and “biological”
descriptors] were used to generate a variety of QSAR models in the
program GUSAR. The models obtained were compared using external test
and validation sets. We also report on our efforts to incorporate
the most predictive of our models in the publicly available NCI/CADD
Group Web services (http://cactus.nci.nih.gov/chemical/apps/cap).
The serotonin (5-hydroxytryptamine, 5-HT) transporter (SERT) plays
an essential role in the termination of serotonergic neurotransmission
by removing 5-HT from the synaptic cleft into the presynaptic neuron.
It is also of pharmacological importance, being targeted by antidepressants
and psychostimulant drugs. Here, five commercial databases containing
approximately 3.24 million drug-like compounds have been screened
using a combination of two-dimensional (2D) fingerprint-based and
three-dimensional (3D) pharmacophore-based screening and flexible
docking into multiple conformations of the binding pocket detected
in an outward-open SERT homology model. Following virtual screening
(VS), selected compounds were evaluated using in vitro screening and
full binding assays and an in silico hit-to-lead (H2L) screening was
performed to obtain analogues of the identified compounds. Using this
multistep VS/H2L approach, 74 active compounds, 46 of which had Ki values of ≤1000 nM, belonging to 16
structural classes, have been identified, and multiple compounds share
no structural resemblance with known SERT binders.
We introduce a simple MODelability Index (MODI) that estimates the feasibility of obtaining predictive QSAR models (Correct Classification Rate above 0.7) for a binary dataset of bioactive compounds. MODI is defined as an activity class-weighted ratio of the number of nearest-neighbor pairs of compounds with the same activity class versus the total number of pairs. The MODI values were calculated for more than 100 datasets, and a threshold of 0.65 was found to separate non-modelable from modelable datasets.
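The definition above can be illustrated with a minimal sketch. It assumes that each compound is represented by a numeric descriptor vector, that nearest neighbors are found by Euclidean distance, and that "class-weighted" means averaging the per-class fraction of compounds whose nearest neighbor shares their class. The function name `modi` and the toy data are hypothetical and are not taken from the original work.

```python
import math

def modi(descriptors, labels):
    """Average, over activity classes, of the fraction of compounds whose
    first nearest neighbor belongs to the same class (assumed definition)."""
    classes = sorted(set(labels))
    same = {c: 0 for c in classes}   # compounds whose nearest neighbor shares the class
    total = {c: 0 for c in classes}  # all compounds in the class
    for i, (x, y) in enumerate(zip(descriptors, labels)):
        best_j, best_d = None, math.inf
        for j, z in enumerate(descriptors):
            if j == i:
                continue
            d = math.dist(x, z)  # Euclidean distance (an assumption)
            if d < best_d:
                best_j, best_d = j, d
        total[y] += 1
        if labels[best_j] == y:
            same[y] += 1
    return sum(same[c] / total[c] for c in classes) / len(classes)

# Toy data: two well-separated clusters versus fully interleaved classes.
separated = modi([(0, 0), (0, 1), (5, 5), (5, 6)], [0, 0, 1, 1])
interleaved = modi([(0,), (1,), (2,), (3,)], [0, 1, 0, 1])
print(separated, interleaved)
```

Under this reading, a dataset would be flagged as modelable when the returned value exceeds the 0.65 threshold reported above.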
The interaction that occurs between molecules is a dynamic process that impacts both structural and conformational properties of the ligand and the ligand binding site. Herein, we investigate the dynamic cross-talk between a protein and the ligand as a source for new opportunities in ligand design. Analysis of the formation/disappearance of protein pockets produced in response to a first-generation inhibitor assisted in the identification of functional groups that could be introduced onto scaffolds to facilitate optimal binding, which allowed for increased binding with previously uncharacterized regions. MD simulations were used to elucidate primary changes that occur in the Hsp90 C-terminal binding pocket in the presence of first-generation ligands. These data were then used to design ligands that adapt to these receptor conformations, providing access to an energy landscape that is not visible in a static model. The newly synthesized compounds demonstrated anti-proliferative activity at ~150 nanomolar concentration. The method identified herein may be used to design chemical probes that provide additional information on structural variations of the Hsp90 C-terminal binding site.
Drug-Design; Flexibility; Allostery; MD simulations; Dynamics-Based Design; Hsp90
Molecular similarity has been effectively applied to many problems in cheminformatics and computational drug discovery, but modern methods can be prohibitively expensive for large-scale applications. The SCISSORS method rapidly approximates measures of pairwise molecular similarity such as ROCS and LINGO Tanimotos, acting as a filter to quickly reduce the size of a problem. We report an in-depth analysis of SCISSORS performance, including a mapping of the SCISSORS error distribution, benchmarking, and investigation of several algorithmic modifications. We show that SCISSORS can accurately predict multiconformer similarity, and suggest a method for estimating optimal SCISSORS parameters in a dataset-specific manner. These results are a useful resource for researchers seeking to incorporate SCISSORS into molecular similarity applications.
drugs were investigated to elucidate their
mechanisms of action (MOAs) and clinical functions by pathway analysis
based on retrieved drug targets interacting with or affected by the
investigated drugs. Protein and gene targets and associated pathways
were obtained by data-mining of public databases including the MMDB,
PubChem BioAssay, GEO DataSets, and the BioSystems databases. Entrez
E-Utilities were applied, and in-house Ruby scripts were developed
for data retrieval and pathway analysis to identify and evaluate relevant
pathways common to the retrieved drug targets. Pathways pertinent
to clinical uses or MOAs were obtained for most drugs. Interestingly,
for some drugs the analysis identified pathways associated with diseases other than
their current therapeutic uses, and these pathways were verified retrospectively
by in vitro tests, in vivo tests, or clinical trials. The pathway
enrichment analysis based on drug target information from public databases
could provide a novel approach for elucidating drug MOAs and repositioning,
therefore benefiting the discovery of new therapeutic treatments for
describe a novel approach to RBF approximation, which combines
two new elements: (1) linear radial basis functions and (2) weighting
the model by each descriptor’s contribution. Linear radial
basis functions allow one to achieve more accurate predictions for
diverse data sets. Taking into account the contribution of each descriptor
produces more accurate similarity values used for model development.
The method was validated on 14 public data sets comprising nine physicochemical
properties and five toxicity endpoints. We also compared the new method
with five different QSAR methods implemented in the EPA T.E.S.T. program.
Our approach, implemented in the program GUSAR, showed a reasonable
accuracy of prediction and high coverage for all external test sets,
providing more accurate prediction results than the comparison methods
and even the consensus of these methods. Using our new method, we
have created models for physicochemical and toxicity endpoints, which
we have made freely available in the form of an online service at http://cactus.nci.nih.gov/chemical/apps/cap.
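As a rough illustration of the two ideas described above, the sketch below interpolates a property with the linear radial basis function φ(r) = r and measures r with a descriptor-weighted Euclidean distance. How GUSAR derives the weights or assembles the model is not stated in the text, so the weights here are simply supplied by the caller. All function names and data are hypothetical.

```python
import math

def weighted_distance(a, b, weights):
    """Euclidean distance in which each descriptor is scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def solve(matrix, rhs):
    """Solve a dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x

def fit_linear_rbf(points, values, weights):
    """Find coefficients c so that sum_j c_j * r(x_i, x_j) = y_i for every training point."""
    kernel = [[weighted_distance(a, b, weights) for b in points] for a in points]
    return solve(kernel, values)

def predict(points, coeffs, weights, query):
    """Evaluate the fitted interpolant at a new descriptor vector."""
    return sum(c * weighted_distance(query, p, weights) for c, p in zip(coeffs, points))

# Illustrative data: three compounds, two descriptors, made-up property values.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
train_y = [1.0, 2.0, 3.0]
descriptor_weights = [1.0, 0.5]  # assumed contributions of each descriptor
coefficients = fit_linear_rbf(train_x, train_y, descriptor_weights)
print(predict(train_x, coefficients, descriptor_weights, (0.5, 1.0)))
```

The sketch requires distinct training points and positive weights, for which the distance matrix is invertible. A production model would also need regularization and an applicability-domain check, which are omitted here.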
Incorporation of receptor flexibility
into computational drug discovery
through the relaxed complex scheme is well suited for screening against
a single binding site. In the absence of a known pocket or if there
are multiple potential binding sites, it may be necessary to do docking
against the entire surface of the target (global docking). However,
no suitable and easy-to-use tool is currently available to rank global
docking results based on the preference of a ligand for a given binding
site. We have developed a protocol, termed LIBSA for LIgand Binding
Specificity Analysis, that analyzes multiple docked poses against
a single receptor conformation or an ensemble of conformations and returns a metric
for the relative binding to a specific region of interest. By using
novel filtering algorithms and the signal-to-noise ratio (SNR), the
relative ligand-binding frequency at different pockets can be calculated
and compared quantitatively. Ligands can then be triaged by their
tendency to bind to a site instead of ranking by affinity alone. The
method thus facilitates screening libraries of ligand cores against
a large library of receptor conformations without prior knowledge
of specific pockets, which is especially useful to search for hits
that selectively target a particular site. We demonstrate the utility
of LIBSA by showing that it correctly identifies known ligand binding
sites and predicts the relative preference of a set of related ligands
for different pockets on the same receptor.
determinant(s) of protein thermostability is key for
rational and data-driven protein engineering. By analyzing more than
130 pairs of mesophilic/(hyper)thermophilic proteins, we identified
the quality (residue-wise energy) of hydrophobic interactions as a
key factor for protein thermostability. This distinguishes our study
from previous ones that investigated predominantly structural determinants.
Considering this key factor, we successfully discriminated between
pairs of mesophilic/(hyper)thermophilic proteins (discrimination accuracy: ∼80%)
and searched for structural weak spots in E. coli dihydrofolate reductase (classification accuracy: 70%).
negative docking outcomes for highly symmetric molecules
are a barrier to the accurate evaluation of docking programs, scoring
functions, and protocols. This work describes an implementation of
a symmetry-corrected root-mean-square deviation (RMSD) method into
the program DOCK based on the Hungarian algorithm for solving the
minimum assignment problem, which dynamically assigns atom correspondence
in molecules with symmetry. The algorithm adds only a trivial amount
of computation time to the RMSD calculations and is shown to increase
the reported overall docking success rate by approximately 5% when
tested over 1043 receptor–ligand systems. For some families
of protein systems the results are even more dramatic, with success
rate increases up to 16.7%. Several additional applications of the
method are also presented including as a pairwise similarity metric
to compare molecules during de novo design, as a scoring function
to rank-order virtual screening results, and for the analysis of trajectories
from molecular dynamics simulation. The new method, including source
code, is available to registered users of DOCK6 (http://dock.compbio.ucsf.edu).
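The idea of a symmetry-corrected RMSD can be sketched as follows. The example builds a cost matrix of squared distances between reference and pose atoms, forbids pairings between different elements with a large penalty, and solves the minimum assignment problem with a compact Hungarian algorithm. This is an independent illustration, not the DOCK implementation: matching on element symbols (rather than DOCK atom types), the `MISMATCH` penalty, and all function names are assumptions.

```python
import math

MISMATCH = 1.0e6  # large finite penalty that forbids pairing different elements

def hungarian(cost):
    """Return, for each row of a square cost matrix, the column of a minimum-cost assignment."""
    n = len(cost)
    u = [0.0] * (n + 1)      # row potentials
    v = [0.0] * (n + 1)      # column potentials
    p = [0] * (n + 1)        # p[j] = row currently matched to column j (1-based, 0 = none)
    way = [0] * (n + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [math.inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:       # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [0] * n
    for j in range(1, n + 1):
        match[p[j] - 1] = j - 1
    return match

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def naive_rmsd(ref, pose):
    """RMSD using the atom order as given (no symmetry correction)."""
    total = sum(squared_distance(a[1], b[1]) for a, b in zip(ref, pose))
    return math.sqrt(total / len(ref))

def symmetry_corrected_rmsd(ref, pose):
    """RMSD after reassigning atom correspondence within each element."""
    cost = [[squared_distance(a[1], b[1]) if a[0] == b[0] else MISMATCH
             for b in pose] for a in ref]
    match = hungarian(cost)
    total = sum(cost[i][match[i]] for i in range(len(ref)))
    return math.sqrt(total / len(ref))

# A linear O=C=O-like molecule whose pose lists the two oxygens in swapped order.
reference = [("C", (0.0, 0.0, 0.0)), ("O", (1.0, 0.0, 0.0)), ("O", (-1.0, 0.0, 0.0))]
docked = [("C", (0.0, 0.0, 0.0)), ("O", (-1.0, 0.0, 0.0)), ("O", (1.0, 0.0, 0.0))]
print(naive_rmsd(reference, docked), symmetry_corrected_rmsd(reference, docked))
```

In this toy case the pose overlays the reference perfectly, yet the naive RMSD is large because equivalent atoms are listed in a different order. This is the kind of false docking failure that the assignment step removes.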
The 5-hydroxytryptamine 1A (5-HT1A) serotonin receptor
has been an attractive target for treating mood and anxiety disorders
such as schizophrenia. We have developed binary classification quantitative
structure–activity relationship (QSAR) models of 5-HT1A receptor binding activity using data retrieved from the PDSP Ki database. The prediction accuracy of these
models was estimated by external 5-fold cross-validation as well as
using an additional validation set comprising 66 structurally distinct
compounds from the World of Molecular Bioactivity database. These
validated models were then used to mine three major types of chemical
screening libraries, i.e., drug-like libraries, GPCR targeted libraries,
and diversity libraries, to identify novel computational hits. The
five best hits from each class of libraries were chosen for further
experimental testing in radioligand binding assays, and nine of the
15 hits were confirmed to be active experimentally with binding affinity
better than 10 μM. The most active compound, Lysergol, from
the diversity library showed a very high binding affinity (Ki) of 2.3 nM against the 5-HT1A receptor. The novel
5-HT1A actives identified with the QSAR-based virtual screening
approach could be potentially developed as novel anxiolytics or potential
challenge in using computational methods for protein
structure prediction is the refinement of low-resolution structural
models derived from comparative modeling methods into highly accurate
atomistic models useful for detailed structural studies. Previously,
we have developed and demonstrated the utility of the internal coordinate
molecular dynamics (MD) technique, generalized Newton–Euler
inverse mass operator (GNEIMO), for refinement of small proteins.
Using GNEIMO, the high-frequency degrees of freedom are frozen and
the protein is modeled as a collection of rigid clusters connected
by torsional hinges. This physical model allows larger integration
time steps and focuses the conformational search in the low frequency
torsional degrees of freedom. Here, we have applied GNEIMO with temperature
replica exchange to refine low-resolution protein models of 30 proteins
taken from the continuous assessment of structure prediction (CASP)
competition. We have shown that the GNEIMO torsional MD method leads to
refinement of up to 1.3 Å in the root-mean-square deviation in
coordinates for 30 CASP target proteins without using any experimental
data as restraints in performing the GNEIMO simulations. This is in
contrast with the unconstrained all-atom Cartesian MD method performed
under the same conditions, where refinement requires the use of restraints
during the simulations.
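For readers unfamiliar with temperature replica exchange, the textbook Metropolis criterion for swapping two replicas can be sketched as below. This is the standard acceptance rule, not code from GNEIMO. The function names, the energy units (kcal/mol), and the example numbers are assumptions made for illustration.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def exchange_probability(energy_i, temp_i, energy_j, temp_j):
    """Acceptance probability min(1, exp[(beta_i - beta_j) * (E_i - E_j)])."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)

def attempt_exchange(energy_i, temp_i, energy_j, temp_j, rng=random.random):
    """Return True if the two replicas should swap configurations."""
    return rng() < exchange_probability(energy_i, temp_i, energy_j, temp_j)

# The hotter replica (310 K) holds the lower-energy structure: always accepted.
downhill = exchange_probability(-100.0, 300.0, -105.0, 310.0)
# The colder replica already holds the lower-energy structure: accepted only sometimes.
uphill = exchange_probability(-105.0, 300.0, -100.0, 310.0)
print(downhill, uphill)
```

The rule moves low-energy conformations toward the low-temperature replicas while still allowing occasional uphill swaps, which is what lets the cold replicas escape local minima during refinement.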
Predicting the 3D structures of small molecules is a common problem in chemoinformatics. Even the best methods are inaccurate for complex molecules, and there is a large gap in accuracy between proprietary and free algorithms. Previous work presented COSMOS, a novel, data-driven algorithm that uses knowledge of known structures from the Cambridge Structural Database, and demonstrated performance that was competitive with proprietary algorithms. However, dependence on the Cambridge Structural Database prevented its widespread use. Here we present an updated version of the COSMOS structure predictor, complete with a free structure library derived from open data sources. We demonstrate that COSMOS performs better than other freely available methods, with mean RMSDs of 1.16 Å and 1.68 Å for organic and metal-organic structures, respectively, and a mean prediction time of 60 ms per molecule. These values represent 17% and 20% reductions in RMSD compared to the free predictor provided by Open Babel, and COSMOS is ten times faster. The ChemDB web portal provides a COSMOS prediction web server, as well as downloadable copies of the COSMOS executable and the library of molecular substructures.
The Site Identification by Ligand Competitive Saturation (SILCS) method identifies the location and approximate affinities of small molecular fragments on a target macromolecular surface by performing Molecular Dynamics (MD) simulations of the target in an aqueous solution of small molecules representative of different chemical functional groups. In this study, we introduce a set of small molecules to map potential interactions made by neutral hydrogen bond donors and acceptors, and charged donor and acceptor fragments, in addition to nonpolar fragments. The affinity pattern is obtained in the form of discretized probability or, equivalently, free energy maps, called FragMaps, which can be visualized with the target surface. We performed SILCS simulations for four proteins for which structural and thermodynamic data are available for multiple, diverse ligands. Good overlap is shown between high affinity regions identified by the FragMaps and the crystallographic positions of ligand functional groups with similar chemical functionality, thus demonstrating the validity of the qualitative information obtained from the simulations. To test the ability of FragMaps to provide quantitative predictions, we calculate the previously introduced Ligand Grid Free Energy (LGFE) metric and observe its correspondence with experimentally measured binding affinity. LGFE is computed for different conformational ensembles, and improvement in prediction is shown with increasing ligand conformational sampling. Ensemble generation includes a Monte Carlo sampling approach that uses the GFE FragMaps directly as the energy function. The results show that some, but not all, experimental trends are predicted, which warrants improvements in the scoring methodology. In addition, the potential utility of atom-based free energy contributions to the LGFE scores and the use of multiple ligands in SILCS to identify displaceable water molecules during ligand design are discussed.
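The equivalence between probability maps and free energy maps mentioned above follows from a Boltzmann inversion, which the sketch below illustrates. It assumes that a grid free energy is computed as GFE = −RT ln(P/P_bulk) per voxel and that a ligand score is the sum of the GFE values at the voxels occupied by its atoms. The dictionary-based map layout, the `cap` applied to unsampled voxels, and every name and number are hypothetical.

```python
import math

R = 0.0019872041  # gas constant in kcal/(mol*K)

def grid_free_energy(occupancy, bulk_occupancy, temperature=298.15):
    """Boltzmann-invert a voxel occupancy relative to bulk into a free energy (kcal/mol)."""
    return -R * temperature * math.log(occupancy / bulk_occupancy)

def ligand_grid_free_energy(atom_voxels, fragmap, bulk_occupancy, cap=3.0):
    """Sum voxel free energies over ligand atoms.

    atom_voxels: list of voxel indices, one per classified ligand atom.
    fragmap: dict mapping voxel index -> occupancy for the matching fragment type.
    cap: value assigned to voxels never visited in the simulation (an assumption).
    """
    score = 0.0
    for voxel in atom_voxels:
        occupancy = fragmap.get(voxel, 0.0)
        if occupancy > 0.0:
            score += min(grid_free_energy(occupancy, bulk_occupancy), cap)
        else:
            score += cap
    return score

# Illustrative occupancies: voxel (1, 2, 3) is enriched fourfold over bulk,
# voxel (1, 2, 4) equals bulk, and voxel (9, 9, 9) was never sampled.
toy_map = {(1, 2, 3): 0.04, (1, 2, 4): 0.01}
favorable = ligand_grid_free_energy([(1, 2, 3), (1, 2, 4)], toy_map, bulk_occupancy=0.01)
clashing = ligand_grid_free_energy([(1, 2, 3), (9, 9, 9)], toy_map, bulk_occupancy=0.01)
print(favorable, clashing)
```

Voxels enriched relative to bulk contribute negative (favorable) terms, so a ligand pose that places matching functional groups in high-occupancy regions receives a lower score.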
Hydroxyurea (HU) is the only FDA-approved medication for treating sickle cell disease in adults. The primary mechanism of action is pharmacological elevation of nitric oxide (NO) levels, which induces propagation of fetal hemoglobin. HU is known to undergo redox reactions with heme-based enzymes like hemoglobin and catalase to produce NO. However, specific details about the HU-based NO release remain unknown. Experimental studies indicate that interaction of HU with human catalase compound I produces NO. Presently, we combine flexible receptor-flexible substrate induced fit docking (IFD) with energy decomposition analyses to examine the atomic-level details of a possible key step in the clinical conversion of HU to NO. Substrate binding modes of nine HU analogs with catalase compound I were investigated to determine the essential properties necessary for effective NO release. Three major binding orientations were found that provide insight into the possible reaction mechanisms for producing NO. Further results show that anion/radical intermediates produced as part of these mechanisms would be stabilized by hydrogen bonding interactions from distal residues His75, Asn148, Gln168, and oxoferryl-heme. These details will ideally both contribute to a clearer mechanistic picture and provide insights for future structure-based drug design efforts.
The search for new tuberculosis treatments continues as we need to find
molecules that can act more quickly, be accommodated in multi-drug regimens, and
overcome ever increasing levels of drug resistance. Multiple large scale
phenotypic high-throughput screens against Mycobacterium
tuberculosis (Mtb) have generated dose response
data, enabling the generation of machine learning models. These models also
incorporated cytotoxicity data and were recently validated with a large external
A cheminformatics data-fusion approach followed by Bayesian machine
learning, Support Vector Machine or Recursive Partitioning model development
(based on publicly available Mtb screening data) was used to
compare individual datasets and subsequent combined models. A set of 1924
commercially available molecules with promising antitubercular activity (and
lack of relative cytotoxicity to Vero cells) were used to evaluate the
predictive nature of the models. We demonstrate that combining three datasets
incorporating antitubercular and cytotoxicity data in Vero cells from our
previous screens results in an external validation receiver operator characteristic (ROC) value of
0.83 (Bayesian or RP Forest). Models that do not have the highest five-fold
cross validation ROC scores can outperform other models in a test set dependent manner.
We demonstrate with predictions for a recently published set of
Mtb leads from GlaxoSmithKline that no single machine
learning model may be enough to identify compounds of interest. Dataset fusion
represents a further useful strategy for machine learning construction as
illustrated with Mtb. Coverage of chemistry and
Mtb target spaces may also be limiting factors for the
whole-cell screening data generated to date.
Bayesian models; Collaborative Drug Discovery Tuberculosis database; Dual-event models; Function class fingerprints; Lead optimization; Mycobacterium tuberculosis; Recursive partitioning; Support vector machine; Tuberculosis
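The ROC values quoted above summarize how well a model ranks active compounds ahead of inactive ones. A minimal sketch of that statistic is given below, assuming the reported number is the area under the ROC curve and computing it through its rank interpretation (the fraction of active/inactive pairs ordered correctly, with ties counted as one half). The function name and the toy scores are hypothetical and unrelated to the published models.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of actives and inactives."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    inactives = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not inactives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for a in actives:
        for b in inactives:
            if a > b:
                wins += 1.0   # active ranked above inactive
            elif a == b:
                wins += 0.5   # tie counts as half
    return wins / (len(actives) * len(inactives))

# Made-up model scores for four compounds (1 = active and non-cytotoxic, 0 = otherwise).
example_auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(example_auc)  # 0.75
```

The pairwise form is quadratic in the number of compounds, which is acceptable for a test set of a few thousand molecules. A sort-based implementation would be preferable for larger screens.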
A protocol was developed for the computational determination of the contribution of interfacial amino acid residues to the free energy of protein-protein binding. Thermodynamic integration, based on molecular dynamics simulation in CHARMM, was used to determine the free energy associated with single point mutations to glycine in a protein-protein interface. The hot spot amino acids found in this way were then correlated to structural similarity scores detected by the ProBiS algorithm for local structural alignment. We find that amino acids with high structural similarity scores contribute on average −3.19 kcal/mol to the free energy of protein-protein binding and are thus correlated with hot spot residues, while residues with low similarity scores contribute on average only −0.43 kcal/mol. This suggests that the local structural alignment method provides a good approximation of the contribution of a residue to the free energy of binding and is particularly useful for detection of hot spots in proteins with known structures but undetermined protein-protein complexes.
hot spot prediction; protein-protein binding; thermodynamic integration
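The thermodynamic integration step can be illustrated with a short numerical sketch. It assumes that the ensemble average ⟨∂U/∂λ⟩ has already been obtained from simulations at a set of λ windows, integrates it with the trapezoidal rule, and forms the change in binding free energy from a thermodynamic cycle (mutation in the complex minus mutation in the unbound protein). The window spacing, the numbers, and the function names are invented for illustration and do not come from the CHARMM protocol described above.

```python
def integrate_ti(lambdas, dudl_means):
    """Trapezoidal estimate of the integral of <dU/dlambda> over the lambda windows."""
    if len(lambdas) != len(dudl_means) or len(lambdas) < 2:
        raise ValueError("need matching lists with at least two windows")
    total = 0.0
    for k in range(len(lambdas) - 1):
        width = lambdas[k + 1] - lambdas[k]
        total += 0.5 * (dudl_means[k] + dudl_means[k + 1]) * width
    return total

def binding_ddg(dg_mutation_complex, dg_mutation_unbound):
    """Change in binding free energy caused by the mutation, from the thermodynamic cycle."""
    return dg_mutation_complex - dg_mutation_unbound

# Made-up window averages (kcal/mol) for a residue-to-glycine mutation.
windows = [0.0, 0.25, 0.5, 0.75, 1.0]
dg_complex = integrate_ti(windows, [12.0, 9.5, 7.0, 5.5, 4.0])
dg_unbound = integrate_ti(windows, [9.0, 7.0, 5.0, 3.5, 2.0])
print(binding_ddg(dg_complex, dg_unbound))
```

The sign convention for reporting a residue's contribution (mutation cost versus contribution of the wild-type side chain) varies between studies, so the sign of the result should be checked against the convention used when comparing with the values quoted above.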
End-point free energy calculations using MM-GBSA and MM-PBSA provide a detailed understanding of molecular recognition in protein-ligand interactions. The binding free energy can be used to rank-order protein-ligand structures in virtual screening for compound or target identification. Here, we carry out free energy calculations for a diverse set of 11 proteins bound to 14 small molecules using extensive explicit-solvent MD simulations. The structures of these complexes were previously solved by crystallography and their binding was studied with isothermal titration calorimetry (ITC), enabling direct comparison to the MM-GBSA and MM-PBSA calculations. Four MM-GBSA and three MM-PBSA calculations reproduced the ITC free energy within 1 kcal·mol⁻¹, highlighting the challenges in reproducing the absolute free energy from end-point free energy calculations. MM-GBSA exhibited better rank-ordering, with a Spearman ρ of 0.68 compared to 0.40 for MM-PBSA with a dielectric constant of ε = 1. An increase in ε resulted in significantly better rank-ordering for MM-PBSA (ρ = 0.91 for ε = 10). However, a larger ε significantly reduced the contributions of electrostatics, suggesting that the improvement is due to the non-polar and entropy components rather than a better representation of the electrostatics. The SVRKB scoring function applied to MD snapshots resulted in excellent rank-ordering (ρ = 0.81). Calculations of the configurational entropy using normal mode analysis led to free energies that correlated significantly better to the ITC free energy than the MD-based quasi-harmonic approach, but the computed entropies showed no correlation with the ITC entropy. When the adaptation energy is taken into consideration by running separate simulations for complex, apo and ligand (MM-PBSA_ADAPT), there is less agreement with the ITC data for the individual free energies, but remarkably good rank-ordering is observed (ρ = 0.89).
Interestingly, filtering MD snapshots by pre-scoring protein-ligand complexes with a machine learning-based approach (SVMSP) resulted in a significant improvement in the MM-PBSA results (ε = 1) from ρ = 0.40 to ρ = 0.81. Finally, the non-polar components of MM-GBSA and MM-PBSA, but not the electrostatic components, showed strong correlation to the ITC free energy; the computed entropies did not correlate with the ITC entropy.
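Because the comparison above rests on Spearman's ρ, a short sketch of how that rank-order statistic is computed may be useful. It assigns average ranks to ties and then takes the Pearson correlation of the two rank lists. The function names and the five example free energies are made up and are not data from this study.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical computed versus measured binding free energies (kcal/mol).
computed = [-45.2, -38.9, -30.1, -27.4, -20.0]
measured = [-9.8, -10.5, -7.9, -8.3, -6.1]
print(spearman_rho(computed, measured))
```

Only the ordering matters for ρ, which is why a method can rank ligands well even when its absolute free energies are far from the calorimetric values.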
Structure-property relationships and structure-activity relationships play an important role in many research areas, such as medicinal chemistry and drug discovery. Such methods, however, have focused on providing post-hoc descriptions of such relationships based on known data. The ability of these descriptions to remain relevant when considering compounds of unknown activity, and thus the prediction of activity and property landscapes using existing data, remain little explored. In this study, we present a novel method of evaluating the ability of a compound comparison methodology to provide accurate information about a set of unknown compounds, and also explore the ability of these predicted activity landscapes to prioritize active compounds over inactive ones. These methods are applied to three distinct and diverse sets of compounds, each with activity data for multiple targets, for a total of eight target-compound set pairs. Six methodologically distinct compound comparison methods were evaluated. We show that overall, all compound comparison methods provided an improvement in structure-activity relationship prediction over random and were able to prioritize compounds in a superior manner to random sampling, but the degree of success and therefore applicability varied markedly.
conditional probability; molecular representation; property landscapes; structure-activity relationships
Current methods of structure identification in mass spectrometry based non-targeted metabolomics rely on matching experimentally determined features of an unknown compound to those of candidate compounds contained in biochemical databases. A major limitation of this approach is the relatively small number of compounds currently included in these databases. If the correct structure is not present in a database it cannot be identified, and if it cannot be identified it cannot be included in a database. Thus, there is an urgent need to augment metabolomics databases with rationally designed biochemical structures using alternative means. In this study, we present a database of in silico enzymatically synthesized metabolites (IIMDB) to partially address this problem. The database, which is available from http://metabolomics.pharm.uconn.edu/iimdb/, includes ~23,000 known compounds (mammalian metabolites, drugs, secondary plant metabolites and glycerophospholipids) collected from existing biochemical databases plus more than 400,000 computationally generated human phase I and phase II metabolites of these known compounds. The IIMDB database features a user-friendly web interface and a programmer-friendly RESTful web service. Ninety-five percent of the computationally generated metabolites in IIMDB were not found in any existing database. However, 21,640 were identical to compounds already listed in PubChem, HMDB, KEGG or HumanCyc. Furthermore, a vast majority of these in silico metabolites were scored as biological using BioSM, a software program that identifies biochemical structures in chemical structure space. These results suggest that in silico biochemical synthesis represents a viable approach for significantly augmenting biochemical databases for non-targeted metabolomics applications.
metabolomics; mass spectrometry; in silico structure generation; biochemical databases
Accurate determination of potential ligand binding sites (BS) is a key step for protein function characterization and structure-based drug design. Despite promising results of template-based BS prediction methods using global structure alignment (GSA), there is room to improve performance by properly incorporating local structure alignment (LSA), because BS are local structures and are often similar for proteins with dissimilar global folds. We present a template-based ligand BS prediction method using G-LoSA, our LSA tool. A large benchmark set validation shows that G-LoSA predicts drug-like ligands’ positions in single-chain protein targets more precisely than TM-align, a GSA-based method, while the overall success rate of TM-align is better. G-LoSA is particularly efficient for accurate detection of local structures conserved across proteins with diverse global topologies. Recognizing the performance complementarity of G-LoSA to TM-align and a non-template geometry-based method, fpocket, a robust consensus scoring method, CMCS-BSP (Complementary Methods and Consensus Scoring for ligand Binding Site Prediction), is developed and shows improved prediction accuracy. The G-LoSA source code is freely available at http://im.bioinformatics.ku.edu/GLoSA.
template-based method; G-LoSA; global structure alignment; pocket shape; computer-aided drug design
CYP19A1, also known as aromatase or estrogen synthetase, is the rate-limiting enzyme in the biosynthesis of estrogens from their corresponding androgens. Several clinically used breast cancer therapies target aromatase. In this work, explicitly solvated all-atom molecular dynamics simulations of aromatase with a model of the lipid bilayer and the transmembrane helix are performed. The dynamics of aromatase and the role of titration of an important amino acid residue involved in aromatization of androgens are investigated via two 250-ns long simulations. One simulation treats the protonated form of the catalytic aspartate 309, which appears more consistent with crystallographic data for the active site, while the simulation of the deprotonated form shows some notable conformational shifts. Ensemble-based computational solvent mapping experiments indicate possible novel druggable binding sites that could be utilized by next-generation inhibitors. In addition, the effects of protonation on the ligand positioning and channel dynamics are investigated using geometrical models that estimate the opening width of critical channels. Significant differences in channel dynamics between the protonated and deprotonated trajectories are exhibited, suggesting that the mechanism for substrate and product entry and the aromatization process may be coupled to a “locking” mechanism and channel opening. Our results may be particularly relevant in the design of novel drugs, which may be useful therapeutic treatments of cancers such as those of the breast and prostate.
In this study, we use the recently released 2012 Community Structure-Activity Resource (CSAR) Dataset to evaluate two knowledge-based scoring functions, ITScore and STScore, and a simple force-field-based potential (VDWScore). The CSAR Dataset contains 757 compounds, most with known affinities, and 57 crystal structures. With the help of the script files for docking preparation, we use the full CSAR Dataset to evaluate the performances of the scoring functions on binding affinity prediction and active/inactive compound discrimination. The CSAR subset that includes crystal structures is used as well, to evaluate the performances of the scoring functions on binding mode and affinity predictions. Within this structure subset, we investigate the importance of accurate ligand and protein conformational sampling and find that the binding affinity predictions are less sensitive to non-native ligand and protein conformations than the binding mode predictions. We also find the full CSAR Dataset to be more challenging in making binding mode predictions than the subset with structures. The script files used for preparing the CSAR Dataset for docking, including scripts for canonicalization of the ligand atoms, are offered freely to the academic community.
CSAR; community structure-activity resource; protein-ligand docking; knowledge-based scoring functions