Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2670071

Formats

Article sections

- Abstract
- I. Introduction
- II. Computational Details
- III. Results and Discussion
- IV. Conclusions
- References

Authors

Related links

J Phys Chem A. Author manuscript; available in PMC 2010 April 16.

Published in final edited form as:

PMCID: PMC2670071

NIHMSID: NIHMS105183

The publisher's final edited version of this article is available at J Phys Chem A

See "Molecular acidity: A quantitative conceptual density functional theory description." in *J Chem Phys*, volume 131 on page 164107.

See other articles in PMC that cite the published article.

An effective approach of estimating molecular pKa values from simple density functional calculations is proposed in this work. Both the molecular electrostatic potential (MEP) at the nucleus of the acidic atom and the sum of valence natural atomic orbitals are employed for three categories of compounds, amines and anilines, carbonyl acids and alcohols, and sulfonic acids and thiols. A strong correlation between experimental pKa values and each of these two quantities for each of the three categories has been discovered. Moreover, if the MEP is subtracted by the isolated atomic MEP for each category of compounds, we observe a single unique linear relationship between the resultant MEP difference and experimental pKa data of amines, anilines, carbonyl acids, alcohols, sulfonic acids, thiols, and their substituents. These results can generally be utilized to simultaneously estimate pKa values at multiple sites with a single calculation for either relatively small molecules in drug design or amino acids in proteins and macromolecules.

Knowledge of p*K*_{a} values, the acid-base dissociation constant, as a measure of the strength of an acid or a base, is essential for the understanding and quantitative treatment of acid-base processes in solution, and is relevant in chemical synthesis, pharmacokinetics, drug design and metabolism, toxicology and environmental protection. There has been an immense interest in the literature to develop new and reliable models to predict and estimate pKa values with approaches using ab initio, density functional theory, molecular modeling and statistical methods.^{1}^{-}^{12}

To compute accurate pKa values according to the thermodynamic cycle (Scheme 1) using ab initio and DFT methods is a challenging task for large systems such as proteins and DNA because the simulations must be carried out in solution. According to the cycle, a number of free energy changes must be simulated:^{1}^{,}^{13}

$$2.303\phantom{\rule{thinmathspace}{0ex}}RT\cdot pKa=\Delta {G}_{aq}^{p}=\Delta {G}_{sol}^{dp}+\Delta {G}_{sol}^{{H}^{+}}-\Delta {G}_{sol}^{p}+\Delta {G}_{gas}^{p}$$

(1)

where *R* is the Rydberg gas constant and *T* is the temperature. $\Delta {G}_{aq}^{p}$ is the sum of the free energy of deprotonation of the gas-phase species $\Delta {G}_{gas}^{p}$, the free energies of desolvation of the protonated form $-\Delta {G}_{sol}^{p}$ and solvation of the deprotonated form $\Delta {G}_{sol}^{dp}$, and the free energy of solvation for the proton $\Delta {G}_{sol}^{{H}^{+}}$. For large systems, *ab initio* simulations are still difficult even with the fastest software and hardware.

Much recent attention has been devoted to seeking statistical correlations of pKa values with quantum descriptors such as highest occupied molecular orbital (HOMO) energies,^{14} localized reactive orbital, frontier effective-for-reaction MOs (FERMO),^{10} electrophilicity or group philicity,^{15}^{,}^{16} etc. These relationships originated from the idea that proton or electron donor-acceptor reactions are driven by frontier molecular orbitals such as HOMO. However, the relations found were often only applicable within the same family of compounds like phenols, anilines, and azines.

It is our belief that molecular acidity is a property localized to the particular acidic atom and that the impact of the environment is reflected through the changes to that atom. The localized quantities that are relevant to the acidity of the given non-hydrogen acidic atom should be of either electrostatic or quantum nature, or both. In this work, we use two inter-dependent quantum descriptors to effectively and simultaneously estimate molecular pKa values for amines, anilines, carbonyl acids, alcohols, sulfonic acids, thiols, and their substituents. The two quantum descriptors are molecular electrostatic potential (MEP) on the acidic atom, MEP at N, O or S nucleus, and the sum of the valence p natural atomic orbitals, NAO, of the atom. Using MEP, or closely related quantities, to estimate pKa values^{3}^{,}^{17}^{-}^{20} and other properties ^{21}^{,}^{22} has a long history in the literature, and frontier orbitals such as FERMO have also been employed in predicting acidity.^{10} To the best knowledge of the authors, however, this is the first time that quantum descriptors such as MEP at the acidic nucleus and NAO are introduced generally in pKa estimation, and that the interdependence of these two quantities is revealed. In addition, these descriptors are applied to simultaneously estimate pKa values for more than one category of compounds at more than one atom type site.

A total of 228 molecular systems (154 primary, secondary and tertiary amines and anilines, 59 carboxylic acids and alcohols, and 15 sulfonic acids and thiols) have been investigated. A full structure optimization was first carried out at the DFT B3LYP/6−311+G(2d,2p) level. When a molecule has more than one stable conformation, all conformers will be examined and the one with the lowest energy will be employed in the subsequent calculations. After structure optimization, single point calculations are performed to obtain the molecular electrostatic potential on each of the nuclei followed by a full NBO ^{23} analysis. We obtained the initial structure and experimental pKa values are from the literature.^{24}^{-}^{31} To test the validity and applicability of the relationships presented in the text to other approaches, we also performed the same calculations with the Hartree-Fock method. We examined the results with the inclusion of the solvent effect in terms of the implicit PCM (Polarizable Continuum) Model. All calculations are performed with the Gaussian 03 package ^{32} with tight self-consistent field convergence and ultra-fine integration grids.

Figure 1 exhibits linear relationships between experimental pKa values and each of the two quantities for three categories of compounds, amines and anilines (N, blue color), carboxylic acids and alcohols (O, red color), and sulfonic acids and thiols (S, green color) as well as their derivatives. Their respective data of MEP and NAO are shown in Tables 1 to to3.3. This is seen that reasonable linear relationship is obtained for each category of compounds for each of the two quantum descriptors, giving the correlation coefficient R^{2} of 0.881, 0.878 and 0.926 for N, O, and S containing compounds, respectively, from the MEP vs. pKa plot, and R^{2} = 0.905, 0.924, 0.913 for N, O, and S containing compounds, respectively, from the NAO vs. pKa plot. An average correlation coefficient of 0.904 is observed from these correlations.

Linear relationships between molecular electrostatic potential on acidic nucleus and experimental pKa values for amines (N), carboxylic acids and alcohols (O), and sulfonic acids and thiols (S) (upper panel); and linear relationships between the sum of **...**

Molecular electrostatic potential on the acidic atom nucleus, experimental pKa data and valence NAO energies for amines and anilines (N-containing) calculated at the B3LYP/6−311+G(2d,2p) level of theory. Atomic units.

Molecular electrostatic potential on the acidic atom nucleus, experimental pKa data and valence NAO energies for sulfonic acids and thiols (S-containing) calculated at the B3LYP/6−311+G(2d,2p) level of theory. Atomic units.

Moreover, if one given number, the MEP evaluated for the isolated neutral acidic atom, is subtracted from the MEP value on the acidic nucleus for each of the three categories of compounds and then all MEP differences of the three categories are plotted together against the experimental pKa data, one single linear relationship, as shown Fig. 2, is obtained with the correlation coefficient R^{2}=0.896. The aforementioned reference MEP value (isolated atoms of N, O, and S) employed in this work is −18.28 a.u. for amine and aniline compounds, −22.20 a.u. for carboxylic acids and alcohols, and −59.12 a.u. for sulfonic acids and thiols.

Linear relationship between the MEP difference and experimental pKa values for all 228 data points. The MEP reference value for N, O, and S compounds is −18.28, −22.20, and −59.12 a.u., respectively. Symbols: N: open blue cycle; **...**

The universality of the above linear relationship between the MEP difference [MEP (in molecule) – MEP (neutral isolated atom)] and pKa values for different kinds of compounds can be understood in this manner. The molecular electrostatic potential on a nuclear R_{A} can be expressed as follows:

$${V}_{{\mathbf{R}}_{A}}=\sum _{i\ne A}\frac{{Z}_{i}}{\mid {\mathbf{R}}_{i}-{\mathbf{R}}_{\mathbf{A}}\mid}-\int \frac{\rho \left(\mathbf{r}\right)}{\mid \mathbf{r}-{\mathbf{R}}_{\mathbf{A}}\mid}d\tau $$

(2)

This quantity is system dependent because it is a function of {Z_{i}}. However, if one uses the sum of atomic electron densities as the zeroth-order approximation for the total molecular electron density, plus a local environment dependent correction,

$$\rho \left(\mathbf{r}\right)=\sum _{i}{\rho}_{i}^{0}(\mathbf{r}-{\mathbf{R}}_{i})+\sum _{i}{f}_{i}(\mid \mathbf{r}-{\mathbf{R}}_{i}\mid ,NA{O}_{v,i})$$

(3)

and inserts it into the MEP formula, the first term of the MEP can be arranged to cancel approximately, leaving the correction term dependent only on the local environment of the nucleus. To demonstrate, let us rewrite Eq. (3) as

$$\rho \left(\mathbf{r}\right)={\rho}_{A}^{0}(\mathbf{r}-{\mathbf{R}}_{\mathbf{A}})+\sum _{i\ne A}{\rho}_{i}^{0}(\mathbf{r}-{\mathbf{R}}_{i})+\sum _{i}{f}_{i}(\mid \mathbf{r}-{\mathbf{R}}_{i}\mid ,NA{O}_{v,i}).$$

(4)

With Eq. (4), we have

$$\int \frac{\rho \left(\mathbf{r}\right)}{\mid \mathbf{r}-{\mathbf{R}}_{\mathbf{A}}\mid}d\tau =\int \frac{{\rho}_{A}^{0}\left(\mathbf{r}\right)}{\mid \mathbf{r}-{\mathbf{R}}_{\mathbf{A}}\mid}d\tau +\sum _{i\ne A}\frac{{Z}_{i}}{\mid {\mathbf{R}}_{i}-{\mathbf{R}}_{\mathbf{A}}\mid}+\sum _{i}\int \frac{{g}_{i}(\mid \mathbf{r}-{\mathbf{R}}_{i}\mid ,NA{O}_{v,i})}{\mid \mathbf{r}-{\mathbf{R}}_{\mathbf{A}}\mid}d\tau .$$

(5)

To obtain the second term at the right-hand side of Eq. (5), we employed the approximation that **R**_{i} and **R**_{A} are separated (i.e., Atoms **A** and **i** are not overlapped) so when calculating MEP at **R**_{A} from contributions of atoms **R**_{i}, we assume **r** ≈**R _{i}** or |

$$\int \frac{\sum _{i\ne A}{\rho}_{i}^{0}(\mathbf{r}-{\mathbf{R}}_{i})}{\mid \mathbf{r}-{\mathbf{R}}_{\mathbf{A}}\mid}d\tau =\sum _{i\ne A}\int \frac{{\rho}_{i}^{0}(\mathbf{r}-{\mathbf{R}}_{i})}{\mid \mathbf{r}-{\mathbf{R}}_{\mathbf{A}}\mid}d\tau \approx \sum _{i\ne A}\int \frac{{\rho}_{i}^{0}(\mathbf{r}-{\mathbf{R}}_{i})}{\mid {\mathbf{R}}_{i}-{\mathbf{R}}_{\mathbf{A}}\mid}d\tau \approx \sum _{i\ne A}\frac{\int {\rho}_{i}^{0}(\mathbf{r}-{\mathbf{R}}_{i})d\tau}{\mid {\mathbf{R}}_{i}-{\mathbf{R}}_{\mathbf{A}}\mid}\approx \sum _{i\ne A}\frac{{Z}_{i}}{\mid {\mathbf{R}}_{i}-{\mathbf{R}}_{\mathbf{A}}\mid}$$

(6)

The physical meaning of the above approximation is that the electrostatic potential at points **A** outside a spherical charge distribution ρ_{i}(**r**) is equal to the electrostatic potential generated by the point charge Z_{i} from the center of the spherical atom **i** (Scheme 2). To get the last equality of Eq. (6), we used

$$\int {\rho}_{i}^{0}(\mathbf{r}-{\mathbf{R}}_{i})d\tau ={Z}_{i}.$$

(7)

The last term of Eq. (6) absorbed approximations from Eq. (7). Since

$${V}_{{\mathbf{R}}_{A}}^{0}=\int \frac{{\rho}_{A}^{0}\left(\mathbf{r}\right)}{\mid \mathbf{r}-{\mathbf{R}}_{\mathbf{A}}\mid}d\tau ,$$

(8)

with Eqs. (2), (5), and (8), there arrives

$${V}_{{\mathbf{R}}_{\mathbf{A}}}-{V}_{{\mathbf{R}}_{\mathbf{A}}}^{0}=\sum _{i}\int \frac{{g}_{i}(\mid \mathbf{r}-{\mathbf{R}}_{i}\mid ,NA{O}_{v,i})}{\mid \mathbf{r}-{\mathbf{R}}_{\mathbf{A}}\mid}d\tau $$

(9)

From the model density, Eq. (3), we know that the correction terms, g_{i}(|**r-R _{i}**|,NAO

A strong correlation between the MEP on the acidic nucleus and the sum of the atom's valence natural atomic orbitals is observed. As an illustrative example, Figure 3 exhibits the relationship for amines and anilines. A similar correlation is seen for O and S containing compounds as well (not shown). Notice that the valence natural atomic orbitals employed in this study are 2p orbitals for nitrogen and oxygen and 3p orbitals for sulfur. We considered to add 2s/3s atomic orbitals in the summation but no significantly different results were obtained. The strong correlation between the MEP on a nucleus and the valence NAO indicates that the correction term in Eq. (3), f_{i}(|**r-R _{i}**|), is dominated by the contribution from the valence part of NAOs of the atom.

Strong linear relationship between MEP on N and the sum of nitrogen 2Px/2Py/2Pz NAO for N-containing compounds (amines and anilines) at the level of B3LYP/6−311+G(2d,2p). Atomic units.

The MEP data are from DFT gas phase calculations at the B3LYP/6−311+G(2d,2p) level. Taking the solvent effect into account does not destroy the correlation between MEP on the nucleus and experimental pKa data. An example is illustrated in Fig. 4 for the N-containing compounds, where one can see that the correlation coefficient is similar to that of the gas phase results. Also, we performed MEP calculations at other levels of theory, such as Hartree-Fock theory (Fig. 5) or with different density functionals; no significantly difference in the correlation was seen. In addition, for amines we also considered the protonated, conjugate species, but no statistically significant correlation between MEP at N and pKa data is observed (results not shown).

The impact of the solvent effect on the correlation between MEP on N and experimental pKa data for N-containing compounds (amines and anilines). The implicit PCM (Polarizable Continuum Model) and 6−311+G(2d,2p) basis set were used.

The strong linear relationship between MEP on N nucleus and experimental pKa data for amines and anilines using the Hartree-Fock method and 6−311+G(2d,2p) basis set.

One possible application of these results is to estimate pKa values with a single DFT calculation for amino acids and peptides where different pKa values at different atom sites are possible. As an illustrative example, we estimated pKa values of cysteinylcysteine, which has four acidic sites, O, S1, S2, and N. Using the relationships in Fig. 1, we obtained the pKa values to be 3.5 (O), 6.9 (S1), 8.2 (S2) and 9.8, respectively, whereas experimental data give 2.7, 7.3, 9.4, and 10.9, respectively. Similar results are obtained when the relationship in Fig. 2 is employed. In both cases, reasonable pKa values are obtained and the order of acidity of the four atoms is correctly predicted.

An effective approach of estimating molecular pKa values from simple gas-phase density functional calculations is proposed in this work, using either the molecular electrostatic potential on the nucleus of the acidic atom or the sum of valence natural atomic orbitals. A strong correlation between experimental pKa values and each of these two quantities has been discovered. Moreover, if the MEP is subtracted by a given reference value for each category of compounds, we observe a single unique linear relationship between the MEP difference and experimental pKa data of amines, anilines, carbonyl acids, alcohols, sulfonic acids, thiols, and their substituents. With a single DFT calculation these results can conveniently be utilized to simultaneously estimate pKa values at multiple sites of small molecules in drug design and of amino acids in proteins and macromolecules.

This work was supported in part by the National Institute of Health (HL-06350), NSF (FRG DMR-0804549) and the Intramural Research Program of NIH, NIEHS. We acknowledge the use of the computational resources provided by Research Computing Center at University of North Carolina at Chapel Hill and the Biomedical Unit of the Pittsburgh Supercomputer Center.

1. Jorgensen WL, Briggs JM, Gao JJ. Am. Chem. Soc. 1987;109:6857.

2. Potter MJ, Gilson MK, McCammon JA. J. Am. Chem. Soc. 1994;116:10298.

3. Rajasekaran E, Jayaram B, Honig BJ. Am. Chem. Soc. 1994;116:8238.

4. Alagona G, Ghio C, Kollman PA. J. Am. Chem. Soc. 1995;117:9855.

5. Jorgensen WL, Briggs JM. J. Am. Chem. Soc. 1989;111:4190.

6. Lim C, Bashford D, Karplus MJ. Phys. Chem. 1991;95:5610.

7. Namazian M, Heidary H. Theochem-J. Mol. Struct. 2003;620:257.

8. Nielsen JE, Mccammon JA. Protein Sci. 2003;12:1894. [PubMed]

9. Fitch CA, Garcia-Moreno B. Biophys. J. 2004;86:86A.

10. da Silva G, Kennedy EM, Dlugogorski BZ. J Phys Chem A. 2006;110:11371. [PubMed]

11. Ohno K, Sakurai MJ. Comput. Chem. 2006;27:906. [PubMed]

12. MacDermaid CM, Kaminski GA. J. Phys. Chem. B. 2007;111:9036. [PubMed]

13. Pliego JR. Chem. Phys. Lett. 2003;367:145.

14. De Proft F, Amira S, Choho K, Geerlings PJ. Phys. Chem. B. 1995;98:5227.Machado HJS, Hinchliffe A. Theochem-J. Mol. Struct. 1995;339:255.

15. Deka RC, Roy RK, Hirao K. Chem. Phys. Lett. 2004;389:186.

16. Gupta K, Roy DR, Subramanian V, Chattaraj PK. Theochem-J. Mol. Struct. 2007;812:13.

17. Nagy P, Novak K, Szasz G. Theochem-J. Mol. Struct. 1989;60:257.

18. Brinck T, Murray JS, Politzer P, Carter RE. J. Org. Chem. 1991;56:2934.

19. Gross KC, Seybold PG, Peralta-Inga Z, Murray JS, Politzer PJ. Org. Chem. 2001;66:6919. [PubMed]

20. Ma Y, Gross KC, Hollingsworth CA, Seybold PG, Murray JS. J. Mol. Model. 2004;10:235.

21. Politzer P. Theor. Chem Acc. 2004;111:395.

22. Politzer P, Ma YG, Jalbout AF, Murray JS. Mol. Phys. 2005;103:15.

23. Glendening ED, Badenhoop J,K, Reed AE, Carpenter JE, Bohmann JA, Morales CM, Weinhold F. NBO 5.0. Theoretical Chemistry Institute, University of Wisconsin; Madison: 2001.

24. Haiting Lu Xi Chen, Chang-Guo Zhan. J. Phys. Chem. B. 2007;111:10599. [PMC free article] [PubMed]

25. Liptak Matthew D., Shields George C. J. Am. Chem. Soc. 2001;123:7314. [PubMed]

26. Liptak Matthew D., Gross Kevin C., Seybold Paul G., Feldgus Steven, Shields George C. J. Am. Chem. Soc. 2002;124:6421. [PubMed]

27. Schruumann G, Cossi M, Barone V, Tomasi J. J. Phys. Chem. A. 1998;102:6706.

28. Gross Kevin C., Seybold Paul G. J. Org. Chem. 2001;66:6919. [PubMed]

29. da Silva Rodrigo R., Ramalho Teodorico C., Santos Joana M., Figueroa-Villar J. Daniel. J. Phys. Chem. A. 2006;110:1031. [PubMed]

30. Chaudry UA, Popelier PLA. J. Org. Chem. 2004;69:233. [PubMed]

31. Lide DR. Handbook of Chemistry and Physics. 88th ed. CRC Press; Boca Raton, New York: 2007.

32. Frisch MJ, Trucks GW, Schlegel HB, Scuseria GE, Robb MA, Cheeseman JR, Montgomery JA, Jr., Vreven T, Kudin KN, Burant JC, Millam JM, Iyengar SS, Tomasi J, Barone V, Mennucci B, Cossi M, Scalmani G, Rega N, Petersson GA, Nakatsuji H, Hada M, Ehara M, Toyota K, Fukuda R, Hasegawa J, Ishida M, Nakajima T, Honda Y, Kitao O, Nakai H, Klene M, Li X, Knox JE, Hratchian HP, Cross JB, Adamo C, Jaramillo J, Gomperts R, Stratmann RE, Yazyev O, Austin AJ, Cammi R, Pomelli C, Ochterski JW, Ayala PY, Morokuma K, Voth GA, Salvador P, Dannenberg JJ, Zakrzewski VG, Dapprich S, Daniels AD, Strain MC, Farkas O, Malick DK, Rabuck AD, Raghavachari K, Foresman JB, Ortiz JV, Cui Q, Baboul AG, Clifford S, Cioslowski J, Stefanov BB, Liu G, Liashenko A, Piskorz P, Komaromi I, Martin RL, Fox DJ, Keith T, Al-Laham MA, Peng CY, Nanayakkara A, Challacombe M, Gill PMW, Johnson B, Chen W, Wong MW, Gonzalez C, Pople JA. Gaussian 03. Gaussian, Inc.; Pittsburgh, PA: 2003. revision E.01.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |