PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
 
J Chem Inf Model. Author manuscript; available in PMC 2017 July 31.
Published in final edited form as:
PMCID: PMC5537004
NIHMSID: NIHMS879525

Toward High-Throughput Predictive Modeling of Protein Binding/Unbinding Kinetics

See Hong Chiu and Lei Xie*

Abstract

One of the unaddressed challenges in drug discovery is that drug potency determined in vitro is not a reliable indicator of drug activity in vivo. Accumulated evidence suggests that in vivo activity is more strongly correlated with the binding/unbinding kinetics than the equilibrium thermodynamics of protein–ligand interactions (PLIs). However, existing experimental and computational techniques are insufficient in studying the molecular details of kinetics processes of PLIs on a large scale. Consequently, we not only have limited mechanistic understanding of the kinetic processes but also lack a practical platform for high-throughput screening and optimization of drug leads on the basis of their kinetic properties. For the first time, we address this unmet need by integrating coarse-grained normal mode analysis with multitarget machine learning (MTML). To test our method, HIV-1 protease is used as a model system. We find that computational models based on the residue normal mode directionality displacement of PLIs can not only recapitulate the results from all-atom molecular dynamics simulations but also predict protein–ligand binding/unbinding kinetics accurately. When this is combined with energetic features, the accuracy of combined kon and koff prediction reaches 74.35%. Furthermore, our integrated model provides us with new insights into the molecular determinants of the kinetics of PLIs. We propose that the coherent coupling of conformational dynamics and thermodynamic interactions between the receptor and the ligand may play a critical role in determining the kinetic rate constants of PLIs. In conclusion, we demonstrate that residue normal mode directionality displacement can serve as a kinetic fingerprint to capture long-time-scale conformational dynamics of the binding/unbinding kinetics. When this is coupled with MTML, it is possible to screen and optimize compounds on the basis of their binding/unbinding kinetics in a high-throughput fashion. The further development of such computational tools will bridge one of the critical missing links between in vitro compound screening and in vivo drug activity.

Graphical abstract

An external file that holds a picture, illustration, etc.
Object name is nihms879525u1.jpg

1. INTRODUCTION

Target-based and cell-based screenings are the two major approaches in the early stage of drug discovery. In both of these technologies, one of the unaddressed fundamental challenges is that drug potency measured in vitro may not be a reliable indicator of drug efficacy and toxicity in vivo. In compound screening and lead optimization, equilibrium thermodynamic constants such as half-maximal inhibitory concentration (IC50) or dissociation constant (Kd) have been used as the measures of drug potency for years. As molecules in the human body are in a nonequilibrium condition, the activity of a drug depends on not only how strongly it interacts with the protein but also how easily it hits the target and how long it resides in the target. An increasing body of evidence suggests that drug activity in vivo is not defined by equilibrium conditions measured in vitro but rather depends on the residence time (τ = 1/koff) of the receptor–ligand complex in vivo in a number of cases.1 The longer residence time increases the efficacy of the drug. For example, geldenamycin has a low affinity for heat shock protein (Hsp90) in vitro, with IC50 ~ 1 μM, in contrast to its nanomolar effects in vivo.1,2 Copeland et al.3 analyzed the results of experiments on mutation-based resistance to inhibitors of HIV-1 protease and concluded that the essential factor for sustained drug efficacy in vivo is the residence time and not the binding affinity. Pan et al.4 reported that the residence time is highly correlated with the functional efficacy of a series of agonists of the A2A adenosine receptor (r2 = 0.95) but that there is little correlation with binding affinity (r2 = 0.15). Furthermore, Dahl and Akerud5 disclosed that on-target side effects could be reduced by reducing the drug residence time. Thus, a drug with optimal efficacy and toxicity profiles should have a balanced kon and koff. Since IC50 and Kd depend on the measurement of the combined effect of kon and koff, they are actually insufficient to explain the impact of binding/unbinding kinetics on drug action, as the same value of Kd can come from an infinite number of combinations of kon and koff. Additionally, since Kd is dependent on the free energy difference between the bound and unbound states but is independent of the transition state of protein–ligand interaction (PLI), it is inadequate to explain the binding/unbinding process of PLI.4,6

Experimental techniques for the study of PLI kinetics are not only expensive and time-consuming but also insufficient to provide detailed molecular characterization of the PLI kinetics process.79 Computational modeling plays an increasing role in elucidating the binding/unbinding process of PLIs. Molecular dynamics (MD) simulations have been reported to be capable of capturing the binding process from beginning to end in full atomic detail.10 Unfortunately, the power of MD simulations is limited by the fact that protein–ligand binding events take place on time scales ranging from microseconds up to hours and days. Thus, MD simulations are infeasible for the majority of binding processes. Therefore, metadynamics and other conformational sampling techniques have been developed to improve sampling in MD simulations of systems where ergodicity is hindered by the form of the system’s energy landscape.1113 For example, Gervasio et al.11 successfully applied a metadynamics method to the docking of ligands on flexible receptors in water solution. The method is able not only to find the docked geometry and to predict the binding affinity (ΔGbinding) but also to explore the entire docking process from the solution to the docking cavity, including barriers and intermediate minima. Even though these advances are remarkable, existing techniques for MD simulations to study the whole binding/unbinding process of a PLI on a large scale are not yet feasible. In addition, the choice of collative variables in the metadynamics simulation is not a trivial task.

With the increasing availability of protein binding kinetics data,14,15 data-driven modeling provides an alternative and efficient solution for studying the PLI kinetics. Several predictive models for kinetic constants of protein–protein interactions (PPIs) have been developed.16,17 However, the molecular attributes in these models cover only static structural characteristics such as the percentage of residues in α-helixes, the buried surface area of the protein, the proportions of charged residues and polar atoms at the interface, and the energetic features such as hydrogen-bonding potential and the interfacial electrostatic interaction energy between interfacial residues. These features may not sufficiently capture long-timescale conformational dynamics of the PLI kinetic processes. In addition, existing methods predict kon and koff independently. As a matter of fact, however, they could be dependent in nature. To our knowledge, few methods are available for large-scale modeling of the binding/unbinding kinetics of PLIs with explicit conformational dynamics features while predicting kon and koff simultaneously.

To tackle the above problems, we integrate energetic and conformational dynamics features derived from efficient molecular modeling with a state-of-the-art multitarget machine learning (MTML) approach. In this study, ligand-bound HIV-1 proteases are used as an example to build models. In addition to energetic features derived from pairwise decomposition of the residue interaction energy18,19 and environment-dependent electrostatic potential energy,20 relative movement of ligand–residue (RMLR) and relative movement of residue–residue (RMRR), which represent the dynamics impact of ligand binding on the amino acid residues, are derived from coarse-grained normal mode analysis (NMA) and used to train MTML models. RMLR is the dot product of the ligand displacement vector after normalization and the residue displacement vector, and RMRR is the dot product of two displacement vectors of a residue upon ligand binding. Our multifacet statistical analysis consistently shows that conformational dynamics features such as RMLR are as important as energetic features in recapitulating the results from all-atom MD simulations as well as predicting kon and koff accurately. On the basis of these findings, we propose that coherent coupling of conformational dynamics and thermodynamic interactions between the protein and the ligand may play a critical role in determining the kinetic rate constants of PLIs. Furthermore, we demonstrate that the relative movement of normal modes of amino acids upon ligand binding can serve as a kinetic fingerprint to capture long-time-scale conformational dynamics features of the protein binding/unbinding process. In combination with MTML, we can use the kinetic fingerprint to predict combined kon and koff with an accuracy of 74.35% and to screen compounds with in vivo drug activity with an area under the receiver operating characteristic curve of 0.8668. Thus, it is possible for us to screen and optimize compounds on the basis of the binding/unbinding kinetics of PLIs in a high-throughput fashion. The further development of such computational tools will bridge one of the critical missing links between in vitro drug potency and in vivo drug efficacy and safety, thereby accelerating the drug discovery process.

2. MATERIALS AND METHODS

Figure 1 depicts the workflow of the computational procedure in this study, which includes four phases: Phase 1 concerns building the structure of the three-dimensional (3D) ligand-bound HIV-1 protease complex. Phase 2 addresses the identification of ligand-binding-site residues. Phase 3 targets the construction of the five principal data sets. Phase 4 is the MTML computation.

Figure 1
Schema of the methodology.

In brief, when no cocrystallized structures existed, the 3D conformation of the ligand was docked in the HIV protease using the protein–ligand docking software eHiTS.21 NMA was performed for both apo and ligand-bound structures (from the Protein Data Bank (PDB) complex structure or protein–ligand docking) without further energy minimization using iMod.22 RMLR and RMRR, which represent the conformational dynamics impact of ligand binding on the binding-site residues, were derived from NMA analysis. RMLR is the dot product of the ligand displacement vector after normalization and the residue displacement vector, and RMRR is the dot product of the displacement vectors of a residue for the apo and ligand-bound structures. Both RMLR and RMRR use the first 10 lowest-frequency modes.22

Pairwise decomposition of the residue interaction energy was performed to compare the relative significance of conformational dynamics features with energetic features in determining the protein binding/unbinding kinetics. To compute the pairwise decomposition of the residue interaction energy, 39 ligand–HIV-1 models were first minimized by MD simulations. The MD simulations were carried out in the canonical NVT ensemble using Nanoscale Molecular Dynamics (NAMD)19 program with the CHARMM27 force field for HIV-1 protein and the CHARMM general force field for the ligands. All of the models proceeded through a minimization process of 4 ps, an equilibrium process of 150 ps, and a production period of 50 ps. The trajectories in the production phase were stable (Figure S1 in the Supporting Information). During the simulations, the temperature and time step were set at 310 K and 2 fs, respectively, and the benchmarked NAMD generalized Born implicit solvent (GBIS) method reported by Tanner et al.23 was adopted to reduce the simulation time. The capability of the NAMD GBIS model was illustrated by simulation of the ratcheting Escherichia coli ribosome involving ~250 000 atoms. The conformation in the final production phase was used for the calculation of the residue-decomposed pairwise interaction energy (PIE) as well as its two components, the van der Waals energy (VDWE) and the electrostatic energy (EE) between the ligand and amino acid residues. VDWE and EE were calculated using CHARMM force fields and an environment-dependent electrostatic potential energy.20

Energetic features (PIE, VDWE, and EE) and conformational dynamics features (RMRR and RMLR), individually or combined, along with experimentally determined kon and koff data24 (Table S1 in the Supporting Information) were used to train MTML models. We evenly discretized the log10(kon) and log10(koff) values of the 39 HIV-1 protease inhibitors into four different binary classes, with the labels (0,0), (0,1), (1,0), and (1,1) on the two-dimensional space of log10(kon) and log10(koff), using the criteria of log10(koff) = −2 and log10(kon) = 5.6 (Figure 2 and Table S2), where class (0,1) was enriched with five FDA-approved drugs.

Figure 2
Discretization of kon and koff values for HIV protease inhibitors. The discretization is based on the criteria of log10(koff) = −2 (x axis) and log10(kon) = 5.6 (y axis). Data for five FDA-approved drugs and 34 other inhibitors are shown as blue ...

To evaluate whether the proposed conformational dynamics features can capture essential information on the protein binding/unbinding kinetics, we first evaluated their accuracies in predicting the kinetic rate constants kon and koff simultaneously. The accuracy was calculated as follows:

accuracy=i=1nAi/N

where Ai is the prediction accuracy for each case and N is the total number of cases. If both kon and koff are predicted correctly, Ai = 100%. If one of them is correct, Ai = 50%. If neither is correct, Ai = 0.0%.

We also evaluated the performance of the method in high-throughput screening of molecules with in vivo drug activity based on kon and koff using the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC). As all of the FDA-approved drugs fell into the class (0,1), the cases in this class were considered as positives. All of the other cases in the remaining three classes were considered as negatives.

More details on the computational workflow are given in Materials and Methods in the Supporting Information.

3. RESULTS

3.1. Characteristics of the Data Set

In this study, we use HIV-1 protease complex structures to investigate the conformational dynamics and develop a predictive model of the ligand binding/unbinding kinetics. HIV-1 protease is an excellent model system for our purpose. First, 39 HIV-1 protease inhibitors have experimentally determined kon and koff values under the same conditions,24,25 providing a reasonable number of high-quality data points for data-driven modeling. Second, abundant data on HIV-1 protease inhibitor resistance mutations (PIRMs) are available. PIRMs are mutations that facilitate resistance of HIV to protease inhibitors.26 They can be used to validate the predictive model. Third, both unbound and complexed structures of HIV-1 protease have been released in the PDB.27 The apo and holo conformations are the basis for our analysis. Table S3 shows the resolution and the status of apo/holo conformation for the 10 HIV-1 complexes used in the experiment.

The ultimate goal of this work is to develop a new method for high-throughput screening and optimization of chemical compounds with in vivo drug activity on the basis of their kinetic properties. Therefore, we format the problem as a multitarget classification problem to predict whether a chemical is drug-like. In this regard, we group FDA-approved drugs in a single class. In addition, the machine learning algorithms are in general sensitive to imbalances in the training data. In order to avoid the bias of machine learning algorithms in handling an imbalanced small training set, we evenly discretize the log10(koff) and log10(kon) values of 39 HIV-1 protease inhibitors into four different binary classes, with the labels (0,0), (0,1), (1,0), and (1,1) on the two-dimensional space of log10(kon) and log10(koff), using the criteria of log10(koff) = −2 and log10(kon) = 5.6 (Figure 2 and Table S2), where class (0,1) is enriched with five FDA-approved drugs.

Ten inhibitors have solved HIV-1 complex structures in the PDB with resolution varying from 1.7 to 2.29 Å (Table S3). For the remaining 29 inhibitors whose complex structures have not been experimentally determined, the protein–ligand docking software eHiTS21 is applied to predict their binding poses. The receptor is chosen from one of the ligand-bound HIV-1 complexes with a cocrystallized ligand structure similar to the docked ligand structure, as determined by the superimposed ligand substructures. Whenever possible, the common fragment of the cocrystallized ligand and the docked ligand is used as a constraint to select the final binding pose of the docked ligand, such that the root-mean-square deviation (RMSD) of the superimposed common fragments is minimal. An example is shown in Figure S2.

Binding-site amino acid residues that are involved in the HIV-1 protease inhibitor interactions are determined using the change in solvent-accessible surface area (SASA) upon ligand binding (see section SI 1.2 in the Supporting Information). As depicted in Figure 3, there are 22 amino acids on each of the two chains of the HIV-1 dimer, giving a total of 44 amino acids. The 22 amino acids include one nonpolar alanine (A28), one basic polar arginine (R8), two nonpolar leucines (L10 and L23), four nonpolar glycines (G27, G48, G49, and G52), three acidic polar aspartic acids (D25, D29, and D30), five nonpolar isoleucines (I47, I50, I54, I76, and I84), two nonpolar valines (V32 and V82), one basic polar lysine (K45), one nonpolar phenylalanine (F53), one polar threonine (T80), and one nonpolar proline (P81).

Figure 3
The 44 HIV-1 residues selected by the SASA program and the 26 drug-resistant mutation residues. Chains A and B of HIV-1 are shown as transparent gray and yellow ribbons, respectively, with their flap regions (residues 43–58) in pink cartoon and ...

3.2. Characterization of Long-Time-Scale Conformational Dynamics of Protein–Ligand Interactions Using Directionality Displacement of Normal Modes

NMA is a powerful computational method to identify and characterize the slowest molecular deformational motions with large amplitude, which are widely involved in biological functions of macromolecules but are inaccessible by other methods. Protein binding/unbinding events often occur on long-time-scales ranging from milliseconds to days, far beyond the current capability of MD simulations. Coarse-grained NMA may allow us to extract important information on the dynamics of protein–ligand binding/unbinding processes. Since the presence of solvent damping dramatically slows the large-amplitude motions of biomolecules, the time scales of molecular motions in reality are much longer than what can be estimated from the eigenvalues of NMA that are calculated in vacuum. In other words, solvent damping causes a time-scale discrepancy between NMA and real molecular motions. However, the study conducted by Ma revealed that the presence of solvent has a minor impact on eigenvectors, which are determined by the potential surface only.28,29 Thus, the information provided by the eigenvectors for the directionality of conformational transitions can be used to study dynamic processes on the time scale of real situations.

In this study, NMA was conducted using iMod. The directionality of normal modes of the residues in the binding site is used to characterize the conformational dynamics features of binding and unbinding events. Specifically, two data sets, one for RMLR (DS-RMLR) and one for RMRR (DS-RMRR), were derived from NMA analysis. As an example, Figure 4A depicts the superposition of the 44 residue eigenvectors of the aligned apo structure (PDB code 3IXO) and the A045-bound structure for the first normal mode. It illustrates the shift of the eigenvectors of the 44 residues of HIV-1 protease upon ligand binding. Figure 4B illustrates the relative displacements of the 44 ligand–residue pairs in the DMP-bound HIV-1 complex. Each of these two data sets comprises 1716 values for 39 ligands and 44 amino acids. Figure 5 depicts the RMLR and RMRR values for L10 (leucine) and D25 (aspartic acid) of chain A induced by the 39 ligands.

Figure 4
Directionality of normal modes. (A) Superposition of the first normal modes of 44 residues of the aligned apo HIV-1 structure (blue) (PDB code 3IXO) and the A045 ligand-bound HIV-1 structure (green) (PDB code 1AJV). (B) First normal mode displacements ...
Figure 5
Examples of RMLR and RMRR. The RMLR and RMRR of D25 are shown in magenta and light blue, respectively, and the RMLR and RMRR of L10 are shown in dark blue and orange, respectively. Both of these amino acids are in chain A.

3.3. Characterization of Ligand–Residue Interaction Energy

PIE and its two constituent components EE and VDWE between the ligand and the binding-site residue of HIV-1 protease are calculated from CHARMM force fields and an environment-dependent electrostatic potential energy. The values of PIE, EE, and VDWE, which characterize various energetic aspects of the ligand–residue interaction, are used to build three data sets: DS-PIE, DS-EE, and DS-VDWE. Each of these three data sets comprises 1716 values for 39 ligands and 44 amino acids. Figure 6A,B depicts the PIE, EE and VDWE values for D25 and L10 in chain A, respectively.

Figure 6
Example of the residue-decomposed interaction energies PIE, EE, and VDWE. (A) PIE, EE, and VDWE values for D25 of chain A. (B) PIE, EE, and VDWE values for L10 of chain A.

3.4. Structural Determinants of Protein–Ligand Binding/Unbinding Kinetics

We use the energetic and conformational dynamics attributes derived from the residue-decomposed binding energies and NMA to train an MTML model for the classification prediction of kinetic rate constants. In total there are five principal training data sets: DS-PIE, DS-EE, DS-VDWE, DS-RMLR, and DS-RMRR. Each of them comprises 39 cases, with each case comprising 44 attributes.

MTML is defined as follows: Given a set of learning examples D of the form (X, Y), where X = (x1, x2, …, xk) is a vector of k training attributes and Y = (y1, y2, …, yt) is a vector of t target attributes, learn a model that, given a new unlabeled example X, can predict the values of all of the target attributes Y simultaneously. When yi is categorical, the problem is known as classification. In this study, the yi are binarized values of kon and koff as shown in Figure 2.

Random forest predictive clustering (RF-Clus) is applied for the task of MTML. RF-Clus outperforms other MTML algorithms in the benchmark studies.30 In addition, it can handle high-dimensional features, e.g., in the situation where the number of attributes is much larger than the number of cases, and can select the importance of attributes (amino acid residues) that contribute to the accuracy of kon and koff prediction. The model was run on the iteration numbers of 100, 200, 250, and 500 in the leave-one-out (LOO) cross-validation experiment. Except for the value of FTest, which was set at 0.1, default values were chosen for all of the other parameters, including PruneSet = None, MinimalWeight = 1.0, EnsembleMethod = RForest, and VotingType = Majority.

Table 1 shows the selected residues in descending order of score of importance. Residues with frequencies of occurrence greater than 25% in the LOO cross-validation experiment of the binary-target random forest classification algorithm with iteration number = 500 were chosen. Consequently, 16, 15, 13, and 14 residues were selected from DS-RMLR, DS-RMRR, DS-EE, and DS-PIE, respectively. These identified key residues consist of three motifs: an N-terminal motif (R8 and L10), a charged motif (L23, D25, G27, A28, D29, D30, and V32), and a motif corresponding to the flap region (residues 43–58), as shown in Figure 7. Both the N-terminal motif and the charged motif are common to DS-PIE, DS-RMLR, DS-RMRR, and DS-EE. The flap region is identified only by DS-RMLR and DS-RMRR.

Figure 7
The 21 key residues inferred from the MTML feature selection. Chains A and B of HIV-1 protease are shown as transparent gray and transparent yellow ribbons, respectively. The three residues located on chain B are labeled with asterisks (*). Residues in ...
Table 1
Key Residues Identified from the Four Data Sets DS-RMLR, DS-RMRR, DS-EE, and DS-PIE; The Feature Selection Criterion Is a Frequency of Attribute Occurrence ≥25%

All-atom MD simulations have shown that the conformational dynamics of the flap region plays a key role in the ligand binding process of HIV-1 protease.3133 Consistent with this observation, the residues in the flap region are identified as key kinetic features by DS-RMLR and DS-RMRR. The recapitulation of the findings from the all-atom MD simulations supports the conclusion that the directionality displacement of normal modes upon ligand binding can serve as a kinetic fingerprint to capture the long-time-scale conformational dynamics of protein binding/unbinding events. The charged motif in the active site participates in substrate peptide recognition. Specifically, D25 and D29 form hydrogen bonds with the substrate peptide. Additionally, R8 and D30 can interact with polar side chains or distal main-chain groups in longer substrate peptides. Moreover, the mutations of L10, L23, and V32 lead to drug resistance of HIV protease inhibitors.

3.5. Combined Electrostatic and Conformational Dynamics Features Can Predict kon/koff Accurately

We examine the impacts of the energetic and conformational dynamics features on the accuracy of kon/koff prediction by building different MTML models. First, we use energetic features (the DS-EE, DS-PIE, and DS-VDWE data sets) to build the MTML model. Second, in order to evaluate whether the normal mode directionality features can be used to predict kon and koff, we apply DS-RMRR, DS-RMLR, and DS-RMLR + DS-RMRR, comprising 88 training attributes in the feature vectors to build the MTML model. Third, we integrate the properties of conformational dynamics and energetics by adding the RMLR features to DS-EE (data set DS-EE + DS-RMLR) to build the MTML model.

For the models trained by DS-EE, DS-PIE, DS-VDWE, DS-RMLR, DS-RMRR, DS-RMLR + DS-RMRR, and DS-EE + DS-RMLR, the highest prediction accuracies and standard deviations (SDs) of the combined four-class log10(kon)/log10(koff) are 71.79% (SD = 34.01%), 66.66% (SD = 36.87%), 47.43% (SD = 36.18%), 69.23% (SD = 37.37%), 57.69% (SD = 35.42%), 70.51% (SD = 31.68%), and 74.35% (SD = 27.79%), respectively (Figure 8). Moreover, among the three models trained by the energetic features, the prediction accuracies of the combined four-class log10(kon)/log10(koff) given by the DS-EE and DS-PIE models are significantly higher than that given by a random guess (50%) by 21.79% (t test p value = 2.0 × 10−5) and 16.66% (t test p value = 6.0 × 10−4), respectively, whereas the accuracy given by the DS-VDWE model is lower than that for the random guess by 2.57% (t test p value >0.05). These results suggest that in the case of HIV-1 protease, the EE features are more accurate in predicting kon and koff than the features of VDWE and PIE. This implies that EE plays a more important role than VDWE but may be not the only factor in driving the protein–ligand binding/unbinding kinetics, as it cannot capture conformational dynamics information, as shown by the feature selection results.

Figure 8
Prediction accuracies of the MTML models for the combined four-class log10(kon)/log10(koff). The numbers in parentheses are the numbers of iterations in the random forest algorithm used in the experiment.

For all three models trained by the normal mode directionality features, the prediction accuracy of the combined four-class log10(kon)/log10(koff) is higher than that for the random guess. Although the accuracy given by the DS-RMRR model is only slightly higher than the random guess value by 7.69%, the accuracies given by the DS-RMLR and DS-RMLR + DS-RMRR models are significantly higher than for the random guess by 19.23% (t test p value = 3.0 × 10−4) and 10.51% (t test p value = 2.5 × 10−3), respectively. These results suggest that the normal mode directionality can capture information on the ligand binding and unbinding processes, further supporting the conclusion that they can be used as a kinetic fingerprint. Comparison of the prediction accuracies of the combined four-class log10(kon)/log10(koff) given by the DS-EE and DS-EE + DS-RMLR models shows that integrating the conformational dynamics features with the energetic features increases the accuracy by 2.56%, from 71.79% to 74.35% (t test p value = 0.02) (Figure 8). This implies that the electrostatic interactions and conformational dynamics are jointly responsible for the binding/unbinding kinetics of HIV protease. The overall prediction accuracy comes from three situations: correct prediction of both kon and koff (100%), one of them (50%), or neither of them (0%). Figure 9 shows the breakdown of these three situations. The numbers of 100% correctly predicted cases are the same for DS-EE (κ = 0.38), DS-RMLR (κ = 0.37), and DS-EE+DS-RMLR (κ = 0.34). The performance gain of DS-EE + DS-RMLR mainly comes from an increase in 50% accuracy.

Figure 9
Contributions of the three accuracy categories (0%, 50%, and 100%) in the combined four-class prediction. The value in each component of the stacked columns is the number of predictions. The numbers in parentheses are the numbers of iterations in the ...

For the multitarget classification model with the highest accuracy, which is based on DS-EE + DS-RMLR, Table 2 shows the four-class confusion matrix. Class (0,0) has the highest classification error. This may be due to the fact that the log10(koff)/log10(kon) values of several drugs in class (0,0) are close to those in classes (0,1) and (1,0). Figure 10 shows ROC curves for prediction of in vivo active compounds that fall into class (0,1), i.e., log10(koff) < −2.0 and log10(kon) > 5.6. The predictions are ranked by the number of votes for kon minus the number of votes for koff from the random forest model. The five FDA approved drugs saquinavir, indinavir, nelfinavir, amprenavir, and ritonavir are ranked at top 1, 2, 3, 8, and 10, respectively. The corresponding AUC value is 0.8668. It is noted that conventional affinity-based virtual screening may not distinguish compounds in class (0,1) from those in classes (0,0) and (1,1), as they may have similar Kd values. On the other hand, predicting kon or koff alone is not sufficient, as a compound with in vivo drug activity should have a relatively low koff and medium kon in the case of HIV protease inhibitors.24

Figure 10
ROC curve for the MTML model to predict in vivo drug activity. Class (0,1), which is enriched with FDA-approved drugs, is considered as positive, and the remaining three classes are treated as negative.
Table 2
Four-Class Confusion Matrix for the DS-EE + DS-RMLR(500) Predictive Model

The performance of the proposed predictive model may depend on how to discretize the log10(kon) and log10(koff) values. We performed LOO cross-validation for another discretization using the cutoffs to log10(koff) = −1.5 and log10(kon) = 5.0 (model of −1.5/5.0 cutoffs). The set with the −1.5/5.0 cutoffs includes 4, 18, 3, and 14 ligands in the four imbalanced binary classes (0,0), (0,1), (1,0), and (1,1), respectively (Table S4). As expected, the imbalanced class distribution slightly deteriorates the performance of the multitarget classifier. The accuracy of the combined four-class log10(kon)/log10(koff) prediction drops to 71.79%, which is lower than the accuracy of 74.35% based on the −2.0/5.6 cutoffs with DS-EE + DS-RMLR. However, patterns similar to those obtained using the −2.0/5.6 cutoffs are observed: the prediction accuracy of DS-EE + DS-RMLR is the best (70.15% for DS-EE only and 69.23% for DS-RMLR only), and most of residues in the flap region are identified only by DS-RMLR. Thus, the directionality displacement of normal modes upon ligand binding/unbinding is a robust fingerprint to capture long-time-scale conformational dynamics. From the machine learning point of view, the unbalanced data can be handled using under/oversampling. This strategy could be used to improve the model’s performance.

4. DISCUSSION

4.1. High-Throughput Predictive Modeling of Ligand Binding and Unbinding Kinetics

In spite of the recognized importance of protein–ligand binding and unbinding kinetics in drug discovery, few efficient computational tools are available to screen and optimize chemical compounds with in vivo drug activity on the basis of binding and unbinding kinetics in a high-throughput fashion. With the increasing availability of experimentally determined kon/koff data,16,17 a data-driven approach is an appropriate choice for the development of a high-throughput predictive model of ligand binding and unbinding kinetics.34 Moreover, recently developed multitarget classification algorithms such as RF-Clus can be adopted to train machine learning models that can predict coupled kon/koff simultaneously. To train a high-quality machine learning model with minimum impacts of overfitting and false correlation, it is important to select relevant features; in this case, these are the molecular determinants of the ligand binding and unbinding kinetics. For the first time, we have shown that the changes in the directionality of normal modes that are efficiently derived from coarse-grained NMA can capture the long-time-scale conformational dynamics of the ligand binding and unbinding kinetics. Thus, they can serve as a kinetic fingerprint to identify the molecular determinants of the binding/unbinding kinetics and enhance the performance of the machine learning model. It is expected that the residue normal mode directionality replacement fingerprint can be applied to other areas in structure-based drug design, such as protein–ligand docking.

4.2. Coherent Receptor–Ligand Movement Is One of the Structural Determinants of Protein Binding/Unbinding Kinetics

It is known that electrostatic interactions between a charged drug and a charged receptor impact the kinetic rate constants.35,36 Specifically, kon is sensitive to long-range electrostatic interactions, and koff tends to be influenced more by short-range interactions such as hydrogen bonds, salt bridges, and van der Waals contacts.37 The majority of the residues involved in the HIV protease–ligand interaction are hydrophobic; the exceptions are the catalytic D25 and D29, which are able to form hydrogen bonds with the main-chain groups of substrate peptides, and R8, D30, and K45 which can interact with polar side chains or distal main-chain groups in longer substrate peptides. The MTML model identifies these charged residues. In addition, the MTML model can achieve higher prediction accuracy using the electrostatic energy than the van der Waals interaction. These results indicate that electrostatic interactions play a more important role than van der Waals interactions in determining the binding/unbinding kinetics of HIV protease.

Interestingly, in addition to the electrostatic interactions, the directionality of ligand-binding-site residue movement also has strong correlations with the kinetic constants. Consistent with the all-atom MD simulation results, the MTML model trained with the relative directionality of normal modes between residue and ligand recapitulates the role of the flap region in the binding kinetics of HIV protease, which cannot be fully recovered by the energetic features. Moreover, not only are similar residues selected by DS-RMLR and DS-EE, but also, the best-performing MTML model is obtained from the combination of DS-RMLR and DS-EE. On the basis of these observations, we propose that coherent movement between the ligand and the receptor may play a critical role in determining the ligand binding and unbinding kinetics. As shown in Figure 11, even if two protein–ligand complexes have the same non-covalent interactions with equal intensities, they may have different kinetic constants because of the different relative movements between the ligand atom and the receptor atom. This is not surprising, as the non-covalent interactions, especially hydrogen bonding, depend on the relative directionality of atomic pairs. The relative movement may change the directionality of the interaction, thus weakening (even breaking) or strengthening the interaction. Thus, the coherent conformational dynamics coupling could be one of the key structural determinants of protein binding/unbinding kinetics. This has not been observed before.

Figure 11
Influence of ligand–residue conformational coupling on protein binding/unbinding kinetics. (A) Coherent conformational coupling. The relative movement between the ligand and receptor atoms does not change the distance and directionality of the ...

4.3. Future Works

Although this proof-of-concept study demonstrates the potential of integrating coarse-grained normal mode analysis with multitarget machine learning in understanding the molecular determinants and developing a high-throughput predictive model of ligand binding and unbinding kinetics, there is plenty of space to improve the methodology.

There are three different models of the conformational ensemble of a protein–ligand complex. They are the model of the induced-fit mechanism, which is adopted by HIV-1 protein–ligand complexes,1,38 the model of the selected-fit mechanism,39,40 and the model of the three-step mechanism.41,42 The mechanism of the model determines the on-rate and off-rate equations. For example, the induced-fit on rate is limited by the diffusional rate of encounter complex formation of the proteins in their unbound conformational ensemble, but the off rate is dependent on the equilibrium between the ground-state complex and the excited-state complex.38 Since all of the training data sets in this study cover only the characteristics of the ground-state HIV-1 complex, the ignorance of the characteristics of the excited-state HIV-1 complex could induce deficiencies in the predictive model.

Since the solvation effect causes a time-scale discrepancy between real molecular motion and NMA, which is calculated in vacuum, it is expected that NMA coupled with an implicit or explicit solvation model may provide more information on the conformational dynamics of the ligand binding process. As water plays a critical role in ligand binding, explicit incorporation of water molecules in the binding site may improve the accuracy of NMA. The global and local geometries of the binding pocket could be another important feature.43,44 In the current study, the ligand was treated as a single rigid body. As a matter of fact, the flexibility of the ligand may have an impact on the kinetic rate constants. A survey of over 2000 drugs binding to G-protein-coupled receptors, protein kinases, and other enzymes found that higher-molecular-weight drugs tend to have lower off rates and that a greater number of rotatable bonds tend to have longer residence times.4,45 This suggests that the performance of the MTML model could be further improved by incorporating ligand properties. We group the kon/koff value into four classes and use the classification model to predict the class and to select features. In practice, it could be more useful to predict the real values of kon and koff simultaneously. This would require a multitarget regression model, which is an active area of research in machine learning.

CONCLUSION

For the first time, we have demonstrated that residue normal mode directionality displacement derived from coarse-grained NMA of protein–ligand complexes can capture the long-timescale conformational dynamics of protein–ligand binding/unbinding events. In certain molecular systems, the kinetic rate constants kon and koff may be determined by coherent coupling of the conformational dynamics and thermodynamic interactions between the receptor and the ligand. Thus, the residue normal mode directionality displacement may serve as a kinetic fingerprint to study the kinetics of protein–ligand interactions. Combining the kinetic fingerprint with state-of-the-art MTML enables us to screen chemical compounds with in vivo drug activity accurately and efficiently. The further development of such computational tools will bridge one of the critical missing links between in vitro compound screening and in vivo drug activity.

Supplementary Material

suppl

Acknowledgments

We sincerely thank the reviewers for their constructive criticisms and suggestions. This research was supported by the National Library of Medicine of the National Institutes of Health (Grant R01LM011986), the National Science Foundation (Grants CNS-0958379, CNS-0855217, and ACI 1126113), and the City University of New York High Performance Computing Center at the College of Staten Island.

Footnotes

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.5b00632.

Materials and Methods; association and dissociation rate constants (kon, koff) for the interactions between 39 inhibitors and HIV-1 protease (Table S1); results of discretization given by the model of −2/5.6 cutoffs (Table S2); PDB crystal structures used in the experiment (Table S3); results of discretization given by the model of −1.5/5 cutoffs (Table S4); PDB structures and corresponding HIV-1 protease inhibitors used for eHiTS docking (Table S5); equilibrium trajectory and the RMSD of the backbone atoms of HIV-1 bound with A045 during the 204 ps MD simulation (Figure S1); docking of ligands into HIV-1 protease (Figure S2); SASA TCL script with a probe radius of 1.4 Å for the A045–1AJV complex (Figure S3); average dielectric constants of different types of amino acids (Figure S4); NAMDEnergy graphical user interface (Figure S5) (PDF)

The authors declare no competing financial interest.

References

1. Copeland RA. Conformational Adaptation in Drug-Target Interactions and Residence Time. Future Med Chem. 2011;3:1491–1501. [PubMed]
2. Gooljarsingh LT, Fernandes C, Yan K, Zhang H, Grooms M, Johanson K, Sinnamon RH, Kirkpatrick RB, Kerrigan J, Lewis T, Arnone M, King AJ, Lai Z, Copeland RA, Tummino PJ. A Biochemical Rational for the Anticancer Effects of Hsp90 Inhibitors: Slow, Tight Binding Inhibition by Geldanamycin and its Analogues. Proc Natl Acad Sci USA. 2006;103:7625–7630. [PubMed]
3. Copeland RA, Pompliano DL, Meek TD. Drug-Target Residence Time and its Implications for Lead Optimization. Nat Rev Drug Discovery. 2006;5:730–739. [PubMed]
4. Pan AC, Borhani DW, Dror RO, Shaw DE. Molecular Determinants of Drug-Receptor Binding Kinetics. Drug Discovery Today. 2013;18:667–673. [PubMed]
5. Dahl G, Akerud T. Pharmacokinetics and the Drug-Target Residence Time Concept. Drug Discovery Today. 2013;18:697–707. [PubMed]
6. Xie L, Ge X, Tan H, Xie L, Zhang Y, Hart T, Yang X, Bourne PE. Towards Structural Systems Pharmacology to Study Complex Diseases and Personalized Medicine. PLoS Comput Biol. 2014;10:e1003554. [PMC free article] [PubMed]
7. Myszka DG, Rich RL. SPR’s High Impact on Drug Discovery: Resolution, Throughput and Versatility. Drug Discovery World. 2003 Spring; [Online] http://www.ddw-online.com/enabling-technologies/p148421-spr-s-high-impact-on-drug-discovery:-resolutionthroughput-and-versatility-spring-2003.html (accessed April, 12, 2016).
8. Rossi AM, Taylor CW. Analysis of Protein-Ligand Interactions by Fluorescence Polarization. Nat Protoc. 2011;6:365–387. [PMC free article] [PubMed]
9. Jecklin MC, Schauer S, Dumelin CE, Zenobi R. Label-Free Determination of Protein-Ligand Binding Constants Using Mass Spectrometry and Validation Using Surface Plasmon Resonance and Isothermal Titration Calorimetry. J Mol Recognit. 2009;22:319–329. [PubMed]
10. Buch I, Giorgino T, De Fabritiis G. Complete Reconstruction of an Enzyme-Inhibitor Binding Process by Molecular Dynamics Simulations. Proc Natl Acad Sci USA. 2011;108:10184–10189. [PubMed]
11. Gervasio FL, Laio A, Parrinello M. Flexible Docking in Solution Using Metadynamics. J Am Chem Soc. 2005;127:2600–2607. [PubMed]
12. Mollica L, Decherchi S, Zia SR, Gaspari R, Cavalli A, Rocchia W. Kinetics of Protein-Ligand Unbinding via Smoothed Potential Molecular Dynamics Simulations. Sci Rep. 2015;5:11539. [PMC free article] [PubMed]
13. Tiwary P, Limongelli V, Salvalaglio M, Parrinello M. Kinetics of Protein–Ligand Unbinding: Predicting Pathways, Rates, and Rate-Limiting Steps. Proc Natl Acad Sci USA. 2015;112:E386. [PubMed]
14. Hämäläinen MD, Markgren P-O, Schaal W, Karlén A, Classon B, Vrang L, Samuelsson B, Hallberg A, Danielson UH. Characterization of a Set of HIV-1 Protease Inhibitors Using Binding Kinetics Data from a Biosensor-Based Screen. J Biomol Screening. 2000;5:353–359. [PubMed]
15. Wienkers LC, Heath TG. Predicting in Vivo Drug Interactions from in Vitro Drug Discovery Data. Nat Rev Drug Discovery. 2005;4:825–833. [PubMed]
16. Bai H, Yang K, Yu D, Zhang C, Chen F, Lai L. Predicting Kinetic Constants of Protein-Protein Interactions Based on Structural Properties. Proteins: Struct, Funct Genet. 2011;79:720–734. [PubMed]
17. Moal IH, Bates PA. Kinetic Rate Constant Prediction Supports the Conformational Selection Mechanism of Protein Binding. PLoS Comput Biol. 2012;8:e1002351. [PMC free article] [PubMed]
18. Bhandarkar M, Bhatele A, Bohm E, Brunner R, Buelens F, Chipot C, Dalke A, Dixit S, Fiorin G, Freddolino P, Grayson P, Gullingsrud J, Gursoy A, Hardy D, Harrison C, Hénin J, Humphrey W, Hurwitz D, Krawetz N, Kumar S, Kunzman D, Lai J, Lee C, McGreevy R, Mei C, Nelson M, Phillips J, Sarood O, Shinozaki A, Tanner D, Wells D, Zheng G, Zhu F. NAMD User’s Guide, version 2.10b1; Theoretical Biophysics Group. University of Illinois and Beckman Institute; Urbana, IL: 2014. www.ks.uiuc.edu/Research/namd/2.10b1/ug.pdf (accessed April 12, 2016)
19. Kerrigan JE. Linear Interaction Energy Tutorial for NAMD 2.8. cinjweb.umdnj.edu/~kerrigje/pdf_files/NAMD-LIE-tutorial.pdf (accessed April 12, 2016)
20. Li L, Li C, Zhang Z, Alexov E. On the Dielectric “Constant” of Proteins: Smooth Dielectric Function for Macromolecular Modeling and Its Implementation in DelPhi. J Chem Theory Comput. 2013;9:2126–2136. [PMC free article] [PubMed]
21. Zsoldos Z, Reid D, Simon A, Sadjad SB, Johnson AP. eHits: a New Fast, Exhaustive Flexible Ligand Docking System. J Mol Graphics Modell. 2007;26:198–212. [PubMed]
22. Lopez-Blanco JR, Garzon JI, Chacon P. iMod: Multipurpose Normal Mode Analysis in Internal Coordinates. Bioinformatics. 2011;27:2843–2850. [PubMed]
23. Tanner DE, Chan KY, Phillips JC, Schulten K. Parallel Generalized Born Implicit Solvent Calculations with NAMD. J Chem Theory Comput. 2011;7:3635–3642. [PMC free article] [PubMed]
24. Markgren PO, Schaal W, Hämäläinen M, Karlén A, Hallberg A, Samuelsson B, Danielson UH. Relationships between Structure and Interaction Kinetics for HIV-1 Protease Inhibitors. J Med Chem. 2002;45:5430–5439. [PubMed]
25. Markgren PO, Lindgren MT, Gertow K, Karlsson R, Hämäläinen M, Danielson UH. Determination of Interaction Kinetic Constants for HIV-1 Protease Inhibitors Using Optical Biosensor Technology. Anal Biochem. 2001;291:207–218. [PubMed]
26. Stanford HIV Drug Resistance Database. Major HIV-1Drug Resistance Mutations. 2015 http://hivdb.stanford.edu/pages/download/resistanceMutations_handout.pdf (accessed April 12, 2016).
27. Dutta S, Burkhardt K, Young J, Swaminathan GJ, Matsuura T, Henrick K, Nakamura H, Berman HM. Data Deposition and Annotation at the Worldwide Protein Data Bank. Mol Biotechnol. 2009;42:1–13. [PubMed]
28. Ma J. Usefulness and Limitations of Normal Mode Analysis in Modeling Dynamics of Biomolecular Complexes. Structure. 2005;13:373–380. [PubMed]
29. Hinsen K, Petrescu AJ, Dellerue S, Bellissent-Funel MC, Kneller GR. Harmonicity in Slow Protein Dynamics. Chem Phys. 2000;261:25–37.
30. Madjarov G, Kocev D, Gjorgjevikj D, Džeroski S. An Extensive Experimental Comparison of Methods for Multi-Label Learning. Pattern Recognit. 2012;45:3084–3104.
31. Rose RB, Craik CS, Stroud RM. Domain Flexibility in Retroviral Proteases: Structural Implications for Drug Resistant Mutations. Biochemistry. 1998;37:2607–2621. [PubMed]
32. Hornak V, Okur A, Rizzo RC, Simmerling C. HIV-1 Protease Flaps Spontaneously Open and Reclose in Molecular Dynamics Simulations. Proc Natl Acad Sci USA. 2006;103:915–920. [PubMed]
33. Scott WR, Schiffer CA. Curling of Flap Tips in HIV-1 Protease as a Mechanism for Substrate Entry and Tolerance of Drug Resistance. Structure. 2000;8:1259–1265. [PubMed]
34. Murphy RF. An Active Role for Machine Learning in Drug Development. Nat Chem Biol. 2011;7:327–330. [PMC free article] [PubMed]
35. Robinson RA, Stokes RH. Electrolyte Solutions. 2nd. Dover Publications; Mineola, NY: 2002.
36. Fedosova NU, Champeil P, Esmann M. Nucleotide Binding to Na, K-ATPase: The Role of Electrostatic Interactions. Biochemistry. 2002;41:1267–1273. [PubMed]
37. Schreiber G, Fersht AR. Rapid, Electrostatically Assisted Association of Proteins. Nat Struct Biol. 1996;3:427–431. [PubMed]
38. Weikl TR, von Deuster C. Selected-Fit versus Induced-Fit Protein Binding: Kinetic Differences and Mutational Analysis. Proteins: Struct, Funct Genet. 2009;75:104–110. [PubMed]
39. Tsai CJ, Ma B, Nussinov R. Folding and Binding Cascades: Shifts in Energy Landscapes. Proc Natl Acad Sci USA. 1999;96:9970–9972. [PubMed]
40. Gsponer J, Christodoulou J, Cavalli A, Bui JM, Richter B, Dobson CM, Vendruscolo M. A Coupled Equilibrium Shift Mechanism in Calmodulin-Mediated Signal Transduction. Structure. 2008;16:736–746. [PMC free article] [PubMed]
41. Grunberg R, Leckner J, Nilges M. Complementarity of Structure Ensembles in Protein-Protein Binding. Structure. 2004;12:2125–2136. [PubMed]
42. Wlodarski T, Zagrovic B. Conformational Selection and Induced Fit Mechanism Underlie Specificity in Noncovalent Interactions with Ubiquitin. Proc Natl Acad Sci USA. 2009;106:19346–19351. [PubMed]
43. Di Cera E, Wyman J. Global and Local Metric Geometry of Ligand Binding Thermodynamics. Proc Natl Acad Sci USA. 1991;88:3494–3497. [PubMed]
44. Roy A, Zhang Y. Recognizing Protein-Ligand Binding Sites by Global Structural Alignment and Local Geometry Refinement. Structure. 2012;20:987–997. [PMC free article] [PubMed]
45. Miller DC, Lunn G, Jones P, Sabnis Y, Davies NL, Driscoll P. Investigation of the Effect of Molecular Properties on the Binding Kinetics of a Ligand to its Biological Target. MedChemComm. 2012;3:449–452.