|Home | About | Journals | Submit | Contact Us | Français|
The antiandrogens, such as bicalutamide, targeting the androgen receptor (AR), are the main endocrine therapies for prostate cancer (PCa). But as drug resistance to antiandrogens emerges in advanced PCa, there presents a high medical need for exploitation of novel AR antagonists. In this work, the relationships between the molecular structures and antiandrogenic activities of a series of 7α-substituted dihydrotestosterone derivatives were investigated. The proposed MLR model obtained high predictive ability. The thoroughly validated QSAR model was used to virtually screen new dihydrotestosterones derivatives taken from PubChem, resulting in the finding of novel compounds CID_70128824, CID_70127147, and CID_70126881, whose in silico bioactivities are much higher than the published best one, even higher than bicalutamide. In addition, molecular docking, molecular dynamics (MD) simulations, and MM/GBSA have been employed to analyze and compare the binding modes between the novel compounds and AR. Through the analysis of the binding free energy and residue energy decomposition, we concluded that the newly discovered chemicals can in silico bind to AR with similar position and mechanism to the reported active compound and the van der Waals interaction is the main driving force during the binding process.
According to the latest World Cancer Report 2014 , prostate cancer (PCa) has become the second most common cancer among men in the world. The morbidity rate of PCa has reached 15%, which is merely 1.7% lower than the leading lung cancer. It is reported that about 1100,000 people were diagnosed as new PCa patients in 2012 . Additionally researchers pointed out that prostate cancer is not the privilege of men; women have similar prostate tissue, which also has the risk of cancer .
The androgen receptor (AR), a ligand inducible transcription factor in the nuclear hormone receptor super family , plays a critical role in the development and progress of PCa. Natural hormone testosterone (T) and dihydrotestosterone (DHT), known as androgens, are the endogenous ligands of AR. When bound to AR, androgens play significant roles in the sexual development, function, and musculoskeletal growth of male. The main mechanism of androgen action is to regulate the gene expression by means of binding to AR, changing the level of specific proteins in cells, and controlling cell behavior . Therefore, a rational approach to cure PCa is the use of antiandrogens to prevent the interaction of T or DHT with AR.
At present, androgen receptor antagonists, such as bicalutamide and flutamide, are used as main hormone therapies for prostate cancer . Although these antiandrogens exhibit good efficacy in many cases and comprise an important part of effective therapeutics [7–10], the emergence of recurrent and metastatic forms of castration-resistant PCa (CRPC) becomes a major challenge, with a median survival of only 1~2 years . A possible reason is that these antiandrogens have partial agonistic activities at high concentration in vitro . Therefore, the discovery of new AR antagonists with high antiandrogen activities is highly expected.
Here, in this study, to aid the research and development of steroidal antiandrogens, we investigated the relationships between a series of 7α-substituted dihydrotestosterone derivatives and corresponding antiandrogen activities. The vital features related to the bioactivities were explored, and a linear quantitative structure-activity relationship (QSAR) model was established according to OECD principles , using the QSARINS program [14, 15]. Then the QSAR model, thoroughly and strictly validated by various internal and external validation techniques, is used to virtually screen new dihydrotestosterones, without experimental bioactivities, downloaded from PubChem database . Besides, molecular docking and molecular dynamics (MD) simulations are used to study the possible binding mode of compounds owning high in silico activities with androgen receptor. At last, the most active compounds with good binding affinities to AR, as highlighted by the Insubria graph , are proposed for experimental research group to test the antiandrogen activities in the future.
The success of any QSAR model depends on accurate and clean training data, proper representative descriptor selection methods, suitable statistical methods, and, most critically, both internal and external validation of resulting methods [18, 19]. Here, in this work, a set of 36 7α-substituted dihydrotestosterones derivatives were taken from literatures [20, 21]. The skeleton structure of these derivatives is shown in Figure 1, in which R group represents amine, carboxylic acids, and halogens, and so forth.
These molecules were divided into a training set and a prediction set according to the structure diversity in QSARINS. Finally, 29 compounds were included in the training set and 7 compounds were in the prediction set (prediction set a). The experimental values, half maximal inhibitory concentration (IC50) expressed in nM, were converted into negative logarithmic units marked as pIC50, which was used as dependent variables in the QSAR analyses. The studied molecular structures and corresponding antiandrogen activities were listed in Table 1.
To describe a molecule, the molecular structures were firstly sketched in HyperChem program  and then were optimized to the minimum energy conformation by using AM1 method. After minimization, we submit the structures to DRAGON 5.5 software  to calculate 2914 descriptors including zero-, one-, two-, and three-dimensional (0D, 1D, 2D, and 3D), charge descriptors, and molecular properties. The related theories of the molecular descriptors are provided by DRAGON software, and the calculation procedure is clarified in detail, in the Handbook of Molecular Descriptors .
In order to facilitate the successive feature selection process, the constant and near constant descriptors were removed. Besides, if pairwise correlation of two descriptors is larger than 0.98, the one showing the highest pairwise correlation with others will be excluded. Finally, 358 descriptors remained for the next variable selection process.
After descriptor calculation, genetic algorithm (GA) implemented in QSARINS software was used to select descriptors. The final model was built by using MLR method based on the selected descriptors, named GA-MLR. The first step of GA is to produce a set of solutions randomly which is called initial population. Each solution, a model based on the contained descriptors by using multiple linear regressions method, is called a chromosome. Subsequently, the fitness function, Friedman LOF, is used to evaluate the fitness of these individuals:
Here, SSE represents the sum of squares of errors, c is the number of basis function, d is the smoothness factor (default 0.5), p is the number of features in the model, and n is the number of samples for model construction. In the successful selection stage, the fittest individuals evolve to the next generation. Then crossover and mutation operators were performed to generate new individuals. Finally a new population is formed consisting of the fittest chromosomes. The above evolution continues until the stop conditions are satisfied. The related parameters that control the GA performance are list as follows: population size (200), maximum generations (10000), and mutation probability (0.05).
QSARINS is based on GA-MLR method and performed various tools to a rigorous internal and external validation, based on the different validation criteria, as well as for the check of model applicability domain.
The robustness and stability of the built model were validated by several statistical parameters, such as determination coefficient (R2), leave-one-out (LOO) cross-validation QLOO2, and root mean squared error (RMSE). Besides, leave-many-out (LMO) cross-validation method was also performed and QLMO2 was reported. Randomization technology, by reordering the independent variable, was used to exclude the possibility of the chance correlation. Generally, the correlation coefficient of the built QSAR model should exceed Y randomized generated model. After Y scrambling was carried out with iterations of 5,000, the average value of squared correlation coefficient of the randomized models R2 and QLOO2 was reported.
A good QSAR model should also have satisfactory predictive ability. The best way to evaluate the predictive ability of a model is its validation by new compounds, called prediction set, which do not participate in the process of model building. After the activities of the prediction set samples were predicted, the agreement between the experimental and predicted values was calculated as a measure of a QSAR model quality. Here we adopt several ways to calculate this agreement, QF12, QF22 , QF32 , and CCC [27–29].
All the above external validation parameters were calculated in the software QSARINS and were combined to evaluate the predictive ability of the proposed model.
To validate the practical applicability of a model to a new chemical, the applicability domain (AD), a theoretical domain which is defined by means of the selected descriptors in the process of modeling, should be defined properly. In this research, the AD was quantitatively assessed by the leverage approach [30, 31]. The leverage (hat, h) was calculated by hi = xi(XTX)xiT(i = 1,…, m), where xi was the descriptor row-vector of the query compound i and X was the np matrix of the training set (p is the number of model descriptors). The limit of model domain was quantitatively defined by the leverage cutoff (h), set as 3(p + 1)/n. A leverage greater than h means that the query was outside of the model structural AD, so the predictions were extrapolations of the model and could be less reliable. The AD for chemicals with experimental data was verified by the Williams plot, where the hat values versus the standardized residuals were plotted, while the AD for chemicals without experimental data, which were analyzed in the virtual screening, was verified by the Insubria graph where the hat values were plotted versus the predicted responses [14, 18].
To explore more 7α-substituted dihydrotestosterones and to find similar derivatives with high antiandrogen activities, the studied skeleton structure was used as a query to search PubChem database for new dihydrotestosterones, without experimental bioactivities. Then the established MLR model, after thoroughly being validated internally and externally, was used to predict the antiandrogenic activities of these new dihydrotestosterones, verifying the AD.
Besides, molecular docking was employed to investigate the possible interaction mechanisms of the samples owning high in silico antiandrogenic activities with AR. Particularly, comprehensively considering the docking speed and accuracy, LigandFit, which is commonly used as a flexible docking method executed in the commercial software Discovery Studio 2.5 , was applied into the progress of structure-based virtual screening. The protein structure of AR was firstly downloaded from RCSB Protein Data Bank  (PDB entry code: 1T65) and imported in docking process. All ligands and water molecules were removed at first, the charge and polar hydrogen atoms were added, and the incomplete residues were corrected.
The molecular dynamics (MD) simulations were carried out using the Amber 14 software package . MD is a commonly used methodology in exploring the interaction between ligand and protein. We have investigated the interaction mechanisms of R-bicalutamide/S-1 with WT/W741L AR using molecular dynamics simulations . The docked structures of AR (PDB ID: 1T65) with the reported most active compound number 4 and novel chemicals with high in silico activities were used as the initial structures for MD simulations. During the process of docking, taking into consideration the fact that these residues collide significantly with the compounds, Helix 12 of AR was removed in the model as executed in the literature .
All missing hydrogen atoms of the AR were added by the LEaP module of the Amber 14 package. To maintain the electroneutrality of all the studied complexes, the appropriate number of chloride counterions was added. Then each complex was immersed into a cubic periodic box of TIP3P water model  with at least 10Å distance around the complex.
Amber 14 package and the Amberff03 force field were used for all molecular dynamics simulations. Sander program was carried out for the energy minimization and equilibration protocol. First, energy minimization of four complexes was done through three stages, using the steepest descent method switched to a conjugate gradient every 2500 steps for a total of 5000 steps with a nonbonded cutoff of 10Å. In the first stage, to enable the added TIP3P water molecules to adjust to their proper orientations, the AR and ligand were restrained with 5.0kcalmol−1Å−2. In the second stage, to enable the AR to find a better way of accommodating ligand, the protein backbone was restrained with 3.0kcalmol−1Å−2. In the third stage, the entire solvated system was minimized without any restraint. Additionally, gradual heating, density, and equilibration protocols were performed. During the 100ps heating procedure, the system was gradually heated from 0 to 310K, and then the density was at 310K for 400ps, and at last the equilibration was at 310K for 400ps.
Afterwards, four 20ns production MD simulations were carried out with the PMEMD program without any restraints in the isothermal isobaric ensemble (NPT, P = 1atm, T = 310K) MD. The time step was set at 2fs. 10Å cutoff was applied to treat nonbonding interactions. During the simulations periodic boundary conditions were employed and all electrostatic interactions were calculated using the particle mesh Ewald (PME) method. The SHAKE algorithm was used to restrain all bond lengths containing hydrogen atoms. All of the coordinate trajectories were recorded every 2ps throughout all MD runs. To analyze the energy and structure, a total of 500 snapshots of the simulated structures stripped in the last 2ns stable MD production trajectory at 4ps intervals were extracted.
For each protein-ligand complex, the binding free energy was analyzed by the MM/GBSA method . To compare the AR binding free energies with different ligands, MM/GBSA calculation was applied to the 500 snapshots extracted from the final 2ns of the MD trajectories. The total free energy of binding free energy was composed of the following molecular species (complex):
where Gcomplex, Gprotein, and Gligand are the free energy of complex, receptor, and ligand, respectively. The free energy for each species (complex, ligand, or receptor) can be decomposed into a gas phase energy (ΔEMM), a solvation-free energy (ΔGsol), and an entropy term (TΔS).
where the ΔEMM is the sum of the internal energy of bonds, angle, and torsion (ΔEval), electrostatic interaction energy (ΔEele), and van der Waals interaction energy (ΔEvdw). ΔGsol is solvation-free energy and can be divided into two parts, the polar solvation-free energy (ΔGp) and the nonpolar solvation-free energy (ΔGnp). The polar solvation-free energy ΔGp is determined by generalized Born (GB) equation. The values of the dielectric constant for solute and solvent were set as 1 and 80. ΔGnp is the nonpolar solvation contribution and was calculated with constants 0.0072kcalmol−1Å−2 for surface tension proportionality constant γ and 0.92kcalmol−1Å−2 for the nonpolar free energy for a point solute β. SASA is the solvent accessible surface area and is determined by recursively approximating a sphere around each atom, starting from icosahedra (ICOSA method). TΔS is the entropy term, including the translational, rotational, and vibrational terms of the solute molecules.
Furthermore, to obtain the contribution of each residue to the binding process, we performed binding free energy decomposition. The MM/GBSA approach was used to calculate the per-residue free energy decomposition, which is based on the same 500 snapshots we have extracted from the last 2ns of the stable MD trajectory.
Entropy was analyzed by normal mode with AMBER14 NMODE module. Due to the high computational cost in the entropy calculation, 50 snapshots were extracted from the last 2ns trajectory of the simulation with 40ps time intervals.
The training set samples, 29 compounds as listed in Table 1, were used to build QSAR model by using GA-MLR methods, and the remaining compounds were used to evaluate the predictive ability of the built model. GA provided a series of linear equations containing different descriptor combinations with different performance, but similarly satisfactory. An excellent QSAR model should have high fitting ability, high cross-validated QLOO2, high external predictive ability, and little difference between internal and external predictive ability.
Based on the above principles, a four-descriptor model was selected as the final model. The involved descriptors and corresponding physical-chemical meanings were listed in Table 2. The corresponding model equation and statistic parameters are listed as follows:
From the linear equation and statistic parameters, we could see that the fitting ability of the final model was relatively high with R2 of 0.760 and the final model was stable with QLOO2 of 0.656 and QLMO2 of 0.662. About the predictive ability of the final model, we could find that QF12 and QF22 have similar high values. Compared with QF12 and QF22, the value of QF32 was higher. Besides, the value of CCC was as high as 0.891, surpassing the threshold value of 0.85 as suggested in literature  for predictive model. Additionally, the RMSE values for the training set and prediction set were similarly very low. All these parameters indicated the higher external prediction ability of the final model. The interrelation coefficients of the selected descriptors were presented in the Table 3. It could be seen that the highest intercorrelation coefficient K was −0.61 between IC5 and HATS3e, which indicated that the used variables were independent. All results proved that the selected model was reliable, stable, and predictive.
Y randomization technique was carried out with iterations of 5000 in QSARINS. Figure 2 showed the plot of R2 and Q2 values versus Kxy, automatically obtained in QSARINS. From Figure 2, we could find that the R2 and Q2 values of the final model were much higher than the models from scrambled Y-column, because the relationship between molecular structure and response was broken. This result indicated that the relationships between structures of 7α-substituted dihydrotestosterones and corresponding pIC50 values did exist in the proposed model, and it was really not obtained by chance.
The predicted pIC50 values by MLR model were listed in Table 1. Figure 3 was the scatter plot of the experimental versus the predicted pIC50 values. It was obvious that, in Figure 3, all predicted pIC50 values were close to the line y = x, which indicated that the linear model can accurately predict the antiandrogenic values of these derivatives.
The model applicability domain was evaluated by means of leverage analysis, namely, Williams plot, shown in Figure 4, in which the standardized residuals (σ) and leverage values (h) were plotted. In Figure 4, we could see that all compounds were inside the model structural applicability domain (h = 0.517) and reasonably well predicted with standard residue smaller than 2.5σ.
After the MLR model was built, we luckily found two new 7α-substituted dihydrotestosterones, showed in Table 1 (marked as “b,” prediction set b), with experimental antiandrogen activities from another published literature . These two compounds were additionally used to validate our model. Both of them (numbered in the Williams plot of Figure 4) were located in the applicability domain of the MLR model with h value of 0.332 for compound 37 and 0.237 for compound 38. The predictions on them were quite near to their experimental values. In Figure 3, these two samples (in red) were very near to the line y = x. These results further indicated the high predictive ability of the proposed MLR model.
By interpreting the meaning of the descriptors used in the model, we could extract vital structural features, to some extent, responsible for the antiandrogenic activities of these steroidal derivatives. IC5 was calculated as the mean information content as follows: IC5 = −∑g−1Ag(Ag/nAT) · log2(Ag/nAT), where g runs over the equivalence classes, Ag was the cardinality of the gth equivalence class, and nAT was the total number of atoms. This index represented a measure of structural complexity per vertex. GATS5e belonged to 2D autocorrelations and was Geary autocorrelation, lag 5/weighted by atomic Sanderson electronegativities. This descriptor was favorable to the antiandrogen activities of these steroidal derivatives. DISPp, geometrical descriptors, indicated the displacement between the geometric center and the center of the polarizability, calculated with respect to the molecular principal axes. HATS3u is a GETAWAY descriptor , representing the leverage-weighted autocorrelation of lag 5/weighted by atomic polarizabilities. With the increase of these two descriptors, the bioactivities of the studied compounds decreased.
From PubChem database, we found 110 new 7α-substituted dihydrotestosterone derivatives, without experimental data. By exploring the leverage h values, 77.27% of them were located in the structural applicability domain of the proposed MLR model. Figure 5 is the Insubria graph of these dihydrotestosterones, the plot of leverage values versus predicted pIC50, which was proposed especially for exploring the unknown samples. In Figure 5, most chemicals are in the range of the hat cutoff (h = 0.517). Inside the model AD, the most active compound is CID_70128824, which has in silico PIC50 of 7.37, higher than the reported most active compound number 4 (pIC50 = 6.77). Outside the model AD, we luckily obtained several samples with higher in silico activities, especially CID_70126881 and CID_70127147, which showed excellent in silico antiandrogen activities as high as 7.90 and 7.76, respectively, even higher than bicalutamide and hydroxyflutamide. The ID of these compounds and corresponding predicted pIC50 values are listed in Table 4. Though only these three compounds were highlighted here, other samples with high in silico activities, especially those located in the model structural AD, were also worthy of our attention.
To further explore the possible binding mode of the screened compounds, molecular docking was employed to study the interaction between compounds owning high in silico activities (especially CID_70128824, CID_70126881, and CID_70127147), together with the reported most active compound 4 as a comparison, and androgen receptor (PDB ID: 1T65) by using LigandFit module in Discovery Studio 2.5. Firstly, DHT was extracted from crystal structure and redocked into ligand binding pocket to obtain the optimal docking parameters. Secondly, the ligand binding site was defined with the same parameters as DHT. At this point, the radius of SBD_Site_Sphere was set to 10Å. The other parameters were set by default.
To obtain reasonable conformations of different complex, the top-ranked compounds with lowest RMSD values were extracted. The binding mode of compound 4, CID_70128824, CID_70126881, and CID_70127147 with AR were presented in Figure 6. From this Figure, it could be seen that the docked pose of CID_70128824, CID_70126881, and CID_70127147 located in the same position with similar orientation in the AR ligand binding site to compound 4. All these results indicated that though two of them were outside of the model AD, these three compounds CID_70128824, CID_70126881, and CID_70127147 might have good performance to antagonize androgen receptor and have the potency for further research and development for PCa therapy.
In order to verify whether the studied systems reach equilibrium, the root mean square deviations (RMSDs) of all the backbone atoms of the protein, the Cα atoms for the residues of the active site (residues within 5Å around ligand), and the heavy atoms of ligand from the initial structure were monitored to examine the dynamic stability of the systems and plotted against time, as shown in Figure 7. The three RMSDs have small fluctuations after 15ns, implying that the studied systems have reached stability. We used the last 2ns to analyze the energy and binding modes for the four complexes.
We calculated the binding free energy by MM/GBSA method between the four ligands and AR to validate the reliability of the MD simulation. Table 5 lists the binding free energy and all of the energy terms for the four compounds. From Table 5, the four complexes had different binding free energy; the ranking order is CID_70126881 (−41.62kcalmol−1), CID_70127147 (−33.06kcalmol−1), CID_70128824 (−31.86kcalmol−1), and compound 4 (−22.69kcalmol−1). Different binding free energy means different binding affinity between the four complexes. We have proved that the antiandrogen activity of compounds CID_70126881, CID_70127147, and CID_70128824 are higher than the published best one (compound 4) by MLR model. In addition, the ranking of the calculated binding free energy was consistent with their in silico bioactivities order.
According to the calculated binding free energy, CID_70126881 holds the strongest binding affinity; on the contrary, compound 4 has the lowest binding affinity. As can be seen from Table 5, the nonpolar interactions (ΔGnonpolar) including van der Waals (Evdw) and nonpolar solvation (ΔGnp) terms are the driving force for the binding of the four ligands to AR, and the total polar contributions (ΔGp) are unfavorable for their binding. In addition, CID_70126881, CID_70127147, and CID_70128824 have almost the same van der Waals interactions toward AR (−69.32kcalmol−1, −71.34kcalmol−1, and −70.56kcalmol−1 for CID_70126881-AR, CID_70127147-AR, and CID_70128824-AR), while compound 4 has a low van der Waals value, which may partly explain the reduced binding affinity of compound 4 and prove that the newly discovered chemicals could possess higher antiandrogen activities.
To obtain the detailed interaction between four ligands and AR, the decomposition of binding free energy, which is calculated by MM/GBSA method, was executed to identify key residues during the binding process. The result of energy decomposition contains van der Waals, electrostatic, solvation-free energy, and total energy contribution terms, respectively, for four systems, shown in Figure 8. All the residues with great energy contributions were almost more than 1.5kcalmol−1. As shown in Figure 8(a), residues L701, L704, M780, L873, T877, L880, and L881 of AR make a significant contribution to the CID_70126881-AR binding, as well as those for the CID_70127147-AR which are L704, N705, L712, W741, M745, and F876 (Figure 8(b)). In Figure 8(c), residues L704, N705, G708, L712, F764, and F891 of AR make a substantial contribution to the CID_70128824-AR binding. However, only two key residues (L704 and N705) were the major energy contributions to compound 4-AR binding as shown in Figure 8(d). As mentioned previously, the vast majority of key residues of AR were nonpolar; it was reasonable to speculate that these residues can form greater van der Waals interactions with hydrophobic ligand and exhibit more favorable nonpolar interaction contribution to the binding free energy.
The MD simulation, together with the docking results, confirmed that the newly discovered chemicals CID_70126881, CID_70127147, and CID_70128824 share similar binding mode with the reported compound 4, and the in silico antiandrogen activities of them are higher through the calculated binding free energy and decomposition of binding free energy.
In this study, the relationships between a series of 7α-substituted dihydrotestosterone derivatives and corresponding antiandrogen activities were explored. A reliable, stable, and robust linear MLR model with four descriptors was built and validated in QSARINS. The predictive ability of the final model, fully evaluated by using two different prediction sets, is excellent enough to be used to virtually screen novel 7α-substituted dihydrotestosterones from PubChem database. After antiandrogenic activity prediction, molecular docking, and molecular dynamic simulations, CID_70126881, CID_70127147, and CID_70128824, as the most potent chemicals with good binding affinities to androgen receptor, were proposed. Of course, bioassay experimental researches are needed to evaluate the virtual screening results. This study provides the theoretical basis and specific chemicals for AR antagonists, which can help the experimental research groups to search for potential antiandrogens.
This research is sponsored by the National Natural Science Foundation of China (Grant no. 21205055) and Fundamental Research Funds for the Central Universities (lzujbky-2015-310).
The authors declare that there is no conflict of interests regarding the publication of this paper.