Bioinorg Chem Appl. 2017; 2017: 4914272.

Published online 2017 July 3. doi: 10.1155/2017/4914272

PMCID: PMC5512106

*Yong-Ming Cai: Email: cym@gdpu.edu.cn

Academic Editor: Konstantinos Tsipis

Received 2017 February 20; Accepted 2017 May 10.

Copyright © 2017 Li Wen et al.

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Hydroxyl benzoic esters are preservatives widely used in food, medicine, and cosmetics. To explore the relationship between the molecular structure and the antibacterial activity of these compounds and to predict the activity of compounds with similar structures, Quantitative Structure-Activity Relationship (QSAR) models of 25 hydroxyl benzoic esters were built from quantum chemical parameters and molecular connectivity indexes using a support vector machine (SVM) in the R language. The External Standard Deviation Error of Prediction (SDEP_{ext}), the fitting correlation coefficient (*R*^{2}), and the leave-one-out cross-validation coefficient (*Q*^{2}_{LOO}) are used to evaluate the reliability, stability, and predictive ability of the models. The results show that *R*^{2} and *Q*^{2}_{LOO} of the 4 nonlinear models all exceed 0.6 and that SDEP_{ext} is 0.213, 0.222, 0.189, and 0.218, respectively. Both the correlation coefficient and the standard deviation are better than those of the multiple linear regression (MLR) model (*R*^{2} = 0.421, RSD = 0.260). The reliability, stability, robustness, and external predictive ability of the models are good, particularly for the model with a linear kernel function and eps-regression type. This model can predict the antimicrobial activity of compounds with similar structures within the applicability domain.

QSAR [1, 2] is used to study the relationship between molecular structure and biological activity or physicochemical properties, to quantify that relationship, to predict the activity of unknown compounds, and to guide the synthesis of new materials [3–5]. QSAR is considered a promising technology and is widely used at present because it compensates for missing experimental data, reduces the cost of testing, and enables high-throughput prediction and screening [6]. Many international organizations and regulatory agencies support and promote the use of QSAR and regard it as a possible alternative to animal experiments. Health Canada, the United States Food and Drug Administration (FDA), the Environmental Protection Agency (EPA), the European Union, and the Organization for Economic Cooperation and Development (OECD) apply QSAR to identify potential health hazards and to screen and prioritize compounds [7]. In recent years, QSAR has become a frontier topic in medicinal chemistry, environmental chemistry, life science, analytical chemistry, computational chemistry, and even pesticide science [8–11].

Hydroxyl benzoic esters are an important class of preservatives, widely used in medicine, food, cosmetics, pesticides, and other fields [12]. At present, there are about 60 kinds of food preservatives in the world [13]. Benzoic acid and sorbic acid are produced in large quantities in China, but they are little used because of the high toxicity of benzoic acid and the high price of sorbic acid. Hydroxyl benzoic esters offer high efficiency, low toxicity, good compatibility, and other advantages; their antibacterial performance is stronger than that of benzoic acid and sorbic acid because they contain a phenolic hydroxyl group [14]. It is therefore of great significance to study the antibacterial activity of hydroxyl benzoic esters and to apply it.

SVM is a machine learning algorithm based on statistical learning theory proposed by Cortes et al. [15–17]. SVM can be used for pattern recognition, regression analysis, function fitting, and so forth because it possesses favorable mathematical properties, such as the uniqueness of the solution and independence from the dimension of the input space. The optimal solution of SVM is superior to that of traditional learning methods. In recent years, SVM has been applied to QSAR studies of compounds. Hou et al. [18] investigated the QSAR of the antimalarial activity of PfDHODH inhibitors by generating four computational models using multiple linear regression (MLR) and SVM on a dataset of 255 PfDHODH inhibitors. Sharma et al. [19] used SVM and MLR to study the activity of HIV-1 capsid inhibitors and found the SVM model more efficient in prediction. Khuntwal et al. [20] used MLR and SVM to develop QSAR models for a dataset of 34 tetrahydrobenzothiophene derivatives. Zhiming et al. [21] used ridge regression (RR) and SVM to build QSAR models of bitter tasting thresholds (BTT) and cytotoxic T lymphocyte (CTL) and predicted independent test data; the fitting, LOOCV, and external prediction accuracies were superior to those previously reported in the literature. Zhang et al. [22] took benzene compounds as the research object and combined quantitative descriptions of the molecular structure with MLR or the nonlinear regression method SVM to build QSAR models of the acute toxicity and mutagenicity of benzene compounds. By comparing the linear and nonlinear QSAR models, Zhang et al. found that the stability and prediction ability of the nonlinear QSAR models are better than those of the multiple linear QSAR models.

In the literature, there are very few QSAR studies of hydroxyl benzoic esters. Jiang et al. [23] used MLR to build a QSAR model that predicts the MIC and *t*_{0.5} well within a limited range of ester chain lengths (1–4 carbon atoms on the ester chain for MIC and 1–3 for *t*_{0.5}). Qiu et al. [24] optimized the molecular structures of eleven p-hydroxyl benzoic esters with the density functional theory (DFT) B3LYP method of quantum chemistry and then used stepwise multiple linear regression to select the descriptors and to generate the best prediction model relating the structural features to inhibitory activity. The QSAR results showed that the energy of the lowest unoccupied molecular orbital, *E*_{LUMO}, and the increase of the dipole moment *μ* were the main independent factors contributing to the antifungal activity of the compounds. SVM has shown obvious advantages in QSAR research, but QSAR studies of hydroxyl benzoic esters are at present confined to linear models; there is no literature on the nonlinear QSAR analysis of this system.

In this paper, we use quantum chemical parameters and molecular connectivity indexes to analyze the antibacterial activity of hydroxyl benzoic esters. The QSAR model is established with the SVM algorithm in the R software. We obtain the structure-activity relationship between the molecular structural parameters, computed for the most stable configuration, and the antibacterial activity against *Escherichia coli*, which provides a basis for predicting the antibacterial activity of similar compounds.

This study examines 25 hydroxyl benzoate compounds: 10 o-hydroxyl benzoic esters, 2 m-hydroxyl benzoic esters, and 13 p-hydroxyl benzoic esters. Their details are shown in Table 1.

The antimicrobial half-life (*t*_{1/2}, in h) of the 25 hydroxyl benzoic esters at the minimum inhibitory concentration was collected from the literature [23] and expressed as its logarithm (lgt_{1/2}) to represent antibacterial activity. The values are shown in Table 2.

The quantum chemical parameters [25] and molecular connectivity indexes [26] can explain the antibacterial activity of compounds well and correlate well with it; therefore, this paper selects them as descriptors with a clear physical meaning.

In this paper, the quantum chemical parameters are calculated with Gaussian 09 [27], a quantum chemistry package from Gaussian, Inc. (United States) that supports semiempirical and ab initio calculations. Molecular structures are built directly in GaussView 5, which also creates the input files. During the calculation, Gaussian 09 reads the input file and automatically converts it into redundant internal coordinates. The results are written to a text output file. Before each calculation, a suitable model chemistry (computational method) should be chosen for the system in order to balance computational cost and accuracy [27, 28]. The method used in this paper is DFT at the B3LYP/6-31G(d) level. All molecular configurations are optimized, the geometry optimizations converged, and the frequency analysis shows no imaginary frequencies, so the structures correspond to true minima and the data are reliable. The useful quantum chemical parameters are extracted from the output files; the values are shown in Table 3.

Molecular connectivity indexes are constants calculated from the molecular structure; they mainly reflect the number of atoms in the molecule, the valence bonds, the branching, and so forth. Each order of index has a different meaning. Many studies show that ^{5}**X**^{v}_{P} captures a large amount of structural information and is of great significance in explaining the influence of structure on biological activity [29, 30]. This study therefore selects 8 molecular connectivity indexes: ^{0}**X**^{v}_{P}, ^{1}**X**^{v}_{P}, ^{2}**X**^{v}_{P}, ^{3}**X**^{v}_{P}, ^{4}**X**^{v}_{P}, ^{5}**X**^{v}_{P}, ^{3}**X**^{v}_{C}, and ^{4}**X**^{v}_{PC}. The results are shown in Table 4.
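For orientation, the general Kier-Hall form of a valence molecular connectivity index can be written as follows. This is the textbook definition, given here as an assumption about how the indexes in Table 4 were computed, since the paper does not state the formula; the symbol χ corresponds to **X** in the text.

```latex
{}^{m}\chi^{v}_{t} \;=\; \sum_{s=1}^{N_s} \; \prod_{i \in s} \left(\delta^{v}_{i}\right)^{-1/2},
\qquad
\delta^{v}_{i} \;=\; \frac{Z^{v}_{i} - h_{i}}{Z_{i} - Z^{v}_{i} - 1}
```

Here the sum runs over all N_s subgraphs of order m and type t (P for path, C for cluster, PC for path-cluster), the product runs over the non-hydrogen atoms of each subgraph, Z^v_i is the number of valence electrons of atom i, h_i is the number of hydrogens attached to it, and Z_i is its atomic number.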

The rational division of datasets is an active research topic in QSAR, and a variety of methods exist. In this paper, Random Sampling (RS) [31] is used to divide the raw data into a training set (22 compounds) and a test set (3 compounds: one o-hydroxyl benzoic ester, one m-hydroxyl benzoic ester, and one p-hydroxyl benzoic ester). The training set is used to establish the nonlinear SVM models, and the test set is used to assess their external predictive ability.
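A minimal sketch of such a split is given below. It assumes that one compound is drawn at random from each isomer group, which is one reading of the description above; the compound names, the function `split_by_isomer`, and the seed are hypothetical and not taken from the paper.

```python
import random

def split_by_isomer(compounds, seed=0):
    """Draw one test compound at random from each isomer group (o, m, p).

    compounds: list of (name, isomer) tuples with unique names.
    Returns (train, test) lists.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    test = []
    for isomer in ("o", "m", "p"):
        group = [c for c in compounds if c[1] == isomer]
        test.append(rng.choice(group))
    train = [c for c in compounds if c not in test]
    return train, test

# Placeholder names mirroring the group sizes in the text: 10 ortho, 2 meta, 13 para.
compounds = ([(f"o-ester-{i}", "o") for i in range(1, 11)]
             + [(f"m-ester-{i}", "m") for i in range(1, 3)]
             + [(f"p-ester-{i}", "p") for i in range(1, 14)])

train, test = split_by_isomer(compounds, seed=42)
print(len(train), len(test))  # 22 3
```

With only 2 meta compounds available, any such split leaves a single meta ester in the training set, which is worth keeping in mind when judging the applicability domain.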

In the R software, the training set of 22 compounds is used to build the nonlinear models with the SVM algorithm based on the selected descriptors. We first standardize the data and then establish 4 models by combining two kernel functions (radial and linear) with two regression types (eps-regression and nu-regression).
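The preprocessing and the model grid can be sketched as follows. This is an illustration in Python rather than the authors' R program; the kernel and type names are those used in the text, the function `standardize` and the demo numbers are hypothetical, and no SVM is fitted here.

```python
from itertools import product
from statistics import mean, stdev

def standardize(columns):
    """Z-score each descriptor column: (x - mean) / sample standard deviation.

    A constant column has zero standard deviation and must be dropped beforehand.
    """
    scaled = []
    for col in columns:
        m, s = mean(col), stdev(col)
        scaled.append([(x - m) / s for x in col])
    return scaled

# The four models are the combinations of two kernels and two regression types.
KERNELS = ("radial", "linear")
TYPES = ("eps-regression", "nu-regression")
model_grid = list(product(KERNELS, TYPES))

# Invented descriptor columns, for demonstration only.
demo = standardize([[1.0, 2.0, 3.0], [10.0, 20.0, 60.0]])
for kernel, svm_type in model_grid:
    print(kernel, svm_type)
```

Standardizing matters for SVM regression because descriptors on very different scales (for example total energy versus a connectivity index) would otherwise dominate the kernel computation.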

Model validation is very important in QSAR research. It consists of two aspects: internal validation, which tests the fitting ability and robustness of the models, and external validation, which tests their predictive ability. Internal and external validation are equally important [32].

There are many methods to estimate a model's stability, robustness, and internal predictive ability, such as the fitting correlation coefficient, cross-validation, the random model test, Y-randomization, and various residual errors (such as the Root Mean Squared Error (RMSE) and the standard residual error) [33]. In this paper, the fitting correlation coefficient (*R*^{2}) between the experimental and predicted values of the training set and the leave-one-out cross-validation coefficient (*Q*^{2}_{LOO}) are used to test the reliability, robustness, and stability of the models and to check whether they are overfitted.
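The two internal metrics can be sketched as below, assuming the common definitions *R*^{2} = 1 − SS_res/SS_tot and *Q*^{2}_{LOO} = 1 − PRESS/SS_tot; the paper does not print its formulas, so these definitions are an assumption. A one-descriptor least-squares line stands in for the SVM model, and all numbers are invented.

```python
from statistics import mean

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

def fit_line(xs, ys):
    """Placeholder model: ordinary least squares on a single descriptor."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def q2_loo(xs, ys, fit=fit_line):
    """Leave-one-out Q^2: refit without compound i, then predict compound i."""
    press = 0.0
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - model(xs[i])) ** 2
    y_bar = mean(ys)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - press / ss_tot

# Invented data, for demonstration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1]
model = fit_line(xs, ys)
r2 = r_squared(ys, [model(x) for x in xs])
q2 = q2_loo(xs, ys)
print(round(r2, 3), round(q2, 3))
```

Because each left-out compound is predicted by a model that never saw it, *Q*^{2}_{LOO} is normally lower than *R*^{2}; a large gap between the two is the usual sign of overfitting.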

A very important purpose of QSAR models is to predict the activity of new or even not-yet-synthesized compounds, in order to guide the design and synthesis of compounds with desirable activity or to screen compounds. This requires that the model has good predictive and generalization ability. However, cross-validation only describes the internal predictive ability of a model, and good internal predictive ability does not guarantee excellent external predictive ability [34–36]; that is, a good cross-validation *Q*^{2}_{cv} is a necessary but not sufficient condition for high external predictive ability [35]. The only way to evaluate the external predictive ability of a model is to test it on new compounds (namely, an external test set that is not involved in descriptor selection or model building). The parameters for evaluating a model's external predictive ability include *R*^{2}_{ext}, *Q*^{2}_{ext}, and SDEP_{ext}. In this paper, the models are used to predict lgt_{1/2} of the test set, and their external predictive ability is evaluated by SDEP_{ext}.
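As a sketch, SDEP_{ext} can be computed as the root of the mean squared prediction error over the external test set. This is the usual definition and is assumed here, since the paper does not print the formula; the numbers below are invented and are not the values of Table 7.

```python
from math import sqrt

def sdep_ext(y_exp, y_pred):
    """External standard deviation error of prediction: sqrt(sum((y - y_hat)^2) / n_ext)."""
    n_ext = len(y_exp)
    return sqrt(sum((e - p) ** 2 for e, p in zip(y_exp, y_pred)) / n_ext)

# Invented experimental and predicted lgt_1/2 values for a 3-compound test set.
y_exp = [1.0, 2.0, 3.0]
y_pred = [1.1, 1.8, 3.2]
value = sdep_ext(y_exp, y_pred)
print(round(value, 4))  # 0.1732
```

A smaller SDEP_{ext} means smaller errors on compounds the model has never seen; with only 3 test compounds, a single large residual can change the value considerably.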

We use principal component analysis to extract the most critical molecular descriptors of the hydroxyl benzoic esters for antibacterial half-life.

Four nonlinear SVM models based on the selected descriptors are established using the training set. The experimental values and the internal prediction results of lgt_{1/2} are shown in Table 5, and the corresponding scatter plot is shown in Figure 1.

Figure 1: Scatter plot of the experimental values and the 4 SVM models' internal prediction results of lgt_{1/2}. *Note*. The horizontal axes represent the values of lgt_{1/2} predicted by the 4 SVM models, and the vertical axes represent the experimental **...**

lgt_{1/2} of the test set is predicted by each of the 4 SVM models, and the results are shown in Table 7. SDEP_{ext} of the models and the residuals between the experimental and predicted values of lgt_{1/2} are listed in Table 8. Scatter plots of the experimental values of lgt_{1/2} and the values predicted by the 4 SVM models for all 25 compounds are shown in Figure 2.

Figure 2: Scatter plot of the experimental values and the 4 SVM models' prediction results of lgt_{1/2}. *Note*. The horizontal axes represent the values of lgt_{1/2} predicted by the 4 SVM models, and the vertical axes represent the experimental results. **...**

See Tables 10 and 11.

The degree of freedom and the speed of the preservative molecule determine the effective collisions between its reactive central atom and the active groups or atoms of the microbial molecules. As a result, the antimicrobial property of a preservative is essentially determined by the electronic behavior of the preservative and the microorganism, that is, by the quantum biochemical characteristics of the preservative. Studying the relationship between the structure and the properties of the compounds from the perspective of quantum chemistry can therefore explain the effective antimicrobial groups of a preservative at a fundamental level [37]. Jiang et al. [23] used multiple linear regression to establish a linear model of the 25 hydroxyl benzoic esters. The parameters are shown in Table 9. *R*^{2} was only 0.421, although the equation showed a good linear relationship when the number of C atoms was less than 4. When the number of C atoms in the ester group is more than 4, the influencing factors become more complex; the relationship can no longer be described by a simple linear equation and may be nonlinear or mixed. We therefore wrote a program in the R language, established 4 nonlinear models of the 25 hydroxyl benzoic esters with the SVM algorithm, and predicted lgt_{1/2}. The predicted results for the training set are shown in Table 5. The scatter plot of experimental and predicted lgt_{1/2} was drawn with the R software. Figure 1 shows that the predicted and experimental values are in good agreement and that the linear trend is clear. According to the literature, a model is good if *R*^{2} is greater than 0.6 [35, 38] and *Q*^{2} is greater than 0.5, and it is excellent when the values exceed 0.9 [39]. Tropsha et al. [6] recommend that both *R*^{2} and *Q*^{2} be greater than 0.6.

Table 6 shows that both *R*^{2} and *Q*^{2}_{LOO} are greater than 0.6 for all 4 models and that *R*^{2} and *Q*^{2}_{LOO} of the two models with a linear kernel function are close to 0.75. We therefore consider the stability, robustness, and internal predictive ability of the 4 models to be good, and the models are not overfitted because *R*^{2} exceeds *Q*^{2}_{LOO} by no more than 25%. The para-, ortho-, and meta-compounds extracted by RS from the 25 hydroxyl benzoic esters make up the external test set used to test the models, and the prediction results are shown in Table 7. Table 8 shows that the residuals of lgt_{1/2} for the test set range from −0.037244 to 0.322733 and that SDEP_{ext} is 0.213, 0.222, 0.189, and 0.218, respectively. The results indicate that all 4 models have high external predictive ability; in particular, the model with a linear kernel function and eps-regression type is better than the other 3 models. Scatter plots of the experimental values of lgt_{1/2} and the values predicted by the 4 SVM models for all 25 compounds are shown in Figure 2. They show that the overall prediction of the 4 SVM models is good and that the linear relationship between predicted and experimental values is best for the model with a linear kernel function and eps-regression type.
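The acceptance criteria used in this discussion can be collected in a small check. The thresholds (0.6 for *R*^{2}, 0.5 for *Q*^{2}) come from the text; reading the 25% overfitting limit as an absolute difference of 0.25 is an assumption, and the function name and the example values are hypothetical rather than taken from Table 6.

```python
def passes_internal_checks(r2, q2, r2_min=0.6, q2_min=0.5, max_gap=0.25):
    """Return True when R^2 and Q^2 clear their thresholds and R^2 does not outrun Q^2."""
    return r2 > r2_min and q2 > q2_min and (r2 - q2) <= max_gap

# Hypothetical values, for demonstration only.
print(passes_internal_checks(0.75, 0.70))  # True
print(passes_internal_checks(0.95, 0.55))  # False: gap of 0.40 suggests overfitting
```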

In Table 10, the principal component analysis shows that the first principal component accounts for 96.03% of the variance; therefore, only the first principal component is retained. Table 11 shows that the first principal component includes *E* (total energy), ZPE (zero-point vibrational energy), and *p* (polarizability). We consider *E*, ZPE, and *p* to be the key factors for the antibacterial half-life of hydroxyl benzoic esters. *p* is a structural parameter that characterizes how the molecule deforms under an external electric field. Most importantly, *p* is related to the molecular volume and carries information about molecular interactions, so it can characterize the behavior of the molecule as an electron acceptor. Since the coefficients of *p* and ZPE are negative, larger values of *p* and ZPE correspond to a shorter antibacterial half-life of hydroxyl benzoic esters, whereas *E* has the opposite effect because its coefficient is positive.
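For reference, the proportion of variance of a principal component and the score on the first component can be written as below. This is the standard PCA formulation, assumed here because the paper reports only the resulting numbers; the symbols are introduced for illustration.

```latex
\mathrm{PVE}_k \;=\; \frac{\lambda_k}{\sum_{j=1}^{d} \lambda_j},
\qquad
t_1 \;=\; \sum_{j=1}^{d} w_{1j}\, z_j
```

Here λ_k is the k-th eigenvalue of the covariance (or correlation) matrix of the d descriptors, z_j is the j-th standardized descriptor, and w_{1j} is its loading on the first component. In this notation the reported 96.03% corresponds to PVE_1 = 0.9603, and the sign of each w_{1j} is what the discussion of *E*, ZPE, and *p* refers to as the coefficient.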

In summary, the nonlinear QSAR models built from quantum chemical parameters and molecular connectivity indexes predict the antibacterial activity of hydroxyl benzoic esters better than the linear model. Introducing the SVM algorithm addresses the poor correlation of the QSAR and the complex nonlinear relationship between the molecular descriptors when the formula weight is large, which provides a basis for predicting the antibacterial activity of compounds with similar structures.

Therefore, the main conclusions of this paper are as follows:

- Four nonlinear models of 25 hydroxyl benzoic acid esters are established by the SVM method. Internal and external validation show that the stability, robustness, and internal and external predictive ability of the 4 models are good; that is, the models are usable and may predict new compounds within the applicability domain.
- Among the 4 SVM models, the model with a linear kernel function and eps-regression type has the largest *R*^{2} and *Q*^{2}_{LOO}, the smallest SDEP_{ext}, and the best linear relationship between the predicted and experimental values of lgt_{1/2}; it is the optimal model.
- The SVM algorithm is a good method for handling multicollinearity and complex nonlinear relationships between molecular descriptors in QSAR modeling.
- *E*, ZPE, and *p* are the key factors for the antibacterial half-life of hydroxyl benzoic esters.

This study was supported by the Natural Science Foundation of Guangdong Province (Grant no. 2014A030313585), the Natural Sciences Funds, China (Grant no. 81473588, 2014), and the Guangdong Province Science and Technology New Drug R&D Key Project (Grant no. 2013A022100041).

The authors confirm that this article's content has no conflicts of interest.

1. Duchowicz P. R., Castro E. A., Fernández F. M. Alternative algorithm for the search of an optimal set of descriptors in QSAR-QSPR studies. *Communications in Mathematical and in Computer Chemistry*. 2006;55(1):179–192.

2. Mu G., Liu H., Wen Y., Luan F. Quantitative structure-property relationship study for the prediction of characteristic infrared absorption of carbonyl group of commonly used carbonyl compounds. *Vibrational Spectroscopy*. 2011;55(1):49–57. doi: 10.1016/j.vibspec.2010.07.007.

3. Zhu H., Rusyn I., Richard A., Tropsha A. Use of cell viability assay data improves the prediction accuracy of conventional quantitative structure-activity relationships models of animal carcinogenicity. *Environmental Health Perspectives*. 2008;116(4):506–513. doi: 10.1289/ehp.10573.

4. Drosos J. C., Viola-Rhenals M., Vivas-Reyes R. Quantitative structure-retention relationships of polycyclic aromatic hydrocarbons gas-chromatographic retention indices. *Journal of Chromatography A*. 2010;1217(26):4411–4421. doi: 10.1016/j.chroma.2010.04.038.

5. D'Archivio A. A., Maggi M. A., Mazzeo P., Ruggieri F. Quantitative structure-retention relationships of pesticides in reversed-phase high-performance liquid chromatography based on WHIM and GETAWAY molecular descriptors. *Analytica Chimica Acta*. 2008;628(2):162–172. doi: 10.1016/j.aca.2008.09.018.

6. Tropsha A. Best practices for QSAR model development, validation, and exploitation. *Molecular Informatics*. 2010;29(6-7):476–488. doi: 10.1002/minf.201000061.

7. Steger-Hartmann T., Boyer S. *Computer-Based Prediction Models in Regulatory Toxicology*. Vol. 1. Berlin, Germany: Springer-Verlag; 2014.

8. Ruusmann V., Sild S., Maran U. QSAR DataBank - an approach for the digital organization and archiving of QSAR model information. *Journal of Cheminformatics*. 2014;6(1):1–17. doi: 10.1186/1758-2946-6-25.

9. Schultz T. W., Cronin M. T. D., Walker J. D., Aptula A. O. Quantitative structure-activity relationships (QSARS) in toxicology: a historical perspective. *Journal of Molecular Structure: THEOCHEM*. 2003;622(1-2):1–22. doi: 10.1016/S0166-1280(02)00614-0.

10. Ma B., Chen H., Xu M., Hayat T., He Y., Xu J. Quantitative structure-activity relationship (QSAR) models for polycyclic aromatic hydrocarbons (PAHs) dissipation in rhizosphere based on molecular structure and effect size. *Environmental Pollution*. 2010;158(8):2773–2777. doi: 10.1016/j.envpol.2010.04.011.

11. Lee P. Y., Chen C. Y. Toxicity and quantitative structure–activity relationships of benzoic acids to *Pseudokirchneriella subcapitata*. *Journal of Hazardous Materials*. 2009;165(1–3):156–161. doi: 10.1016/j.jhazmat.2008.09.086.

12. Charnock C., Finsrud T. Combining esters of para-hydroxy benzoic acid (parabens) to achieve increased antimicrobial activity. *Journal of Clinical Pharmacy and Therapeutics*. 2007;32(6):567–572. doi: 10.1111/j.1365-2710.2007.00854.x.

13. Hoyt A. L., Bushman D., Lewis N., Faber R. Developing a modified preservative efficacy testing approach as a predictive tool for the evaluation of preservative systems in liquid home care products under variable test conditions. *Journal of AOAC International*. 2012;95(1):203–205. doi: 10.5740/jaoacint.10-513.

14. Mathammal R., Sangeetha K., Sangeetha M., Mekala R., Gadheeja S. Molecular structure, vibrational, UV, NMR, HOMO-LUMO, MEP, NLO, NBO analysis of 3,5 di tert butyl 4 hydroxy benzoic acid. *Journal of Molecular Structure*. 2016;1120:1–14. doi: 10.1016/j.molstruc.2016.05.008.

15. Cortes C., Vapnik V. Support-vector networks. *Machine Learning*. 1995;20(3):273–297. doi: 10.1007/BF00994018.

16. Zhang C., Tian Y., Deng N. The new interpretation of support vector machines on statistical learning theory. *Science China. Mathematics*. 2010;53(1):151–164. doi: 10.1007/s11425-010-0018-6.

17. Ha M., Wang C., Chen J. The support vector machine based on intuitionistic fuzzy number and kernel function. *Soft Computing*. 2013;17(4):635–641. doi: 10.1007/s00500-012-0937-y.

18. Hou X., Chen X., Zhang M., Yan A. QSAR study on the antimalarial activity of Plasmodium falciparum dihydroorotate dehydrogenase (PfDHODH) inhibitors. *SAR and QSAR in Environmental Research*. 2016;27(2):101–124. doi: 10.1080/1062936X.2015.1134652.

19. Sharma N., Ethiraj K. R., Yadav M., et al. Identification of LOGP values and electronegativities as structural insights to model inhibitory activity of HIV-1 capsid inhibitors - a SVM and MLR aided QSAR studies. *Current Topics in Medicinal Chemistry*. 2012;12(16):1763–1774. doi: 10.2174/156802612803989309.

20. Khuntwal K., Yadav M., Nayarisseri A., Joshi S., Sharma D., Suhane S. Credential role of van der Waal volumes and atomic masses in modeling hepatitis C virus NS5B polymerase inhibition by Tetrahydrobenzo-thiophenes using SVM and MLR aided QSAR studies. *Current Bioinformatics*. 2013;8(4):465–471. doi: 10.2174/1574893611308040008.

21. Zhiming W., Han N., Zheming Y. Feature selection for high-dimensional data based on ridge regression and SVM and its application in peptide QSAR modeling. *Journal of Physical Chemistry*. 2013;03:498–507.

22. Zhang X. L., Zhou Z. X., Liu Y. H., et al. Predicting acute toxicity of aromatic amines by linear and nonlinear regression methods. *Chinese Journal of Structural Chemistry*. 2014;33(2):244–252.

23. Jiang C.-C., Zhou R.-J., Wang D.-J., et al. A QSAR study on antibacterial activity of p-hydroxybenzoate esters. *Modern Food Science and Technology*. 2014;30(6):98–102.

24. Qiu S., Jiang C., Zhou R., Wang D. A QSAR study on antibacterial activity of p-hydroxybenzoate esters. *Modern Food Science and Technology*. 2014;30(6):98–102.

25. Musa A. Y., Ahmoda W., Al-Amiery A. A., Kadhum A. A. H., Mohamad A. B. Quantum chemical calculation for the inhibitory effect of compounds. *Journal of Structural Chemistry*. 2013;54(2):301–308. doi: 10.1134/S0022476613020042.

26. Mozrzymas A. On the spacer group effect on critical micelle concentration of cationic gemini surfactants using molecular connectivity indices. *Combinatorial Chemistry and High Throughput Screening*. 2016;19(6):481–488. doi: 10.2174/1386207319666160504095717.

27. Frisch A., Frisch M. J., Clemente F. R., et al. *Gaussian 09 User’s Reference*. Gaussian Company Press; 2011.

28. Anonymous. Gaussian 09 Software Ported to 64-bit MAC OS Using PGI Compilers. *Computer Workstations*. 2010;23(6, article 17).

29. Atabati M., Emamalizadeh R. The hydrogen perturbation in molecular connectivity indices and their application to a QSPR study. *Journal of Solution Chemistry*. 2012;41(11):1922–1936. doi: 10.1007/s10953-012-9919-z.

30. Mozrzymas A. Modelling of the critical micelle concentration of cationic gemini surfactants using molecular connectivity indices. *Journal of Solution Chemistry*. 2013;42(11):2187–2199. doi: 10.1007/s10953-013-0095-6.

31. Yasri A., Hartsough D. Toward an optimal procedure for variable selection and QSAR model building. *Journal of Chemical Information & Computer Sciences*. 2001;41:1218–1227.

32. Gramatica P., Sangion A. A Historical excursus on the statistical validation parameters for qsar models: a clarification concerning metrics and terminology. *Journal of Chemical Information and Modeling*. 2016;56(6):1127–1131. doi: 10.1021/acs.jcim.6b00088.

33. de Haas E. M., Eikelboom T., Bouwman T. Internal and external validation of the long-term QSARs for neutral organics to fish from ECOSAR™. *SAR and QSAR in Environmental Research*. 2011;22(5-6):545–559. doi: 10.1080/1062936X.2011.569949.

34. Gramatica P. Principles of QSAR models validation: internal and external. *QSAR and Combinatorial Science*. 2007;26(5):694–701. doi: 10.1002/qsar.200610151.

35. Golbraikh A., Tropsha A. Beware of q2! *Journal of Molecular Graphics and Modelling*. 2002;20(4):269–276. doi: 10.1016/S1093-3263(01)00123-1.

36. Polanski J., Gieleciak R., Bak A. Probability issues in molecular design: Predictive and modeling ability in 3D-QSAR schemes. *Combinatorial Chemistry and High Throughput Screening*. 2004;7(8):793–807. doi: 10.2174/1386207043328292.

37. Kocjan B., Sliwiok J. Chromatographic and spectroscopic comparison of the hydrophobicity of vitamins D_{2} and D_{3}. *Journal of Planar Chromatography. Modern TLC*. 1994;7(4):327–328.

38. Tropsha A., Gramatica P., Gombar V. K. The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR models. *QSAR and Combinatorial Science*. 2003;22(1):69–77. doi: 10.1002/qsar.200390007.

39. Eriksson L., Jaworska J., Worth A. P., Cronin M. T. D., McDowell R. M., Gramatica P. Methods for reliability and uncertainty assessment and for applicability evaluations of classification- and regression-based QSARs. *Environmental Health Perspectives*. 2003;111(10):1361–1375. doi: 10.1289/ehp.5758.

Articles from Bioinorganic Chemistry and Applications are provided here courtesy of **Hindawi**
