|Home | About | Journals | Submit | Contact Us | Français|
A new implicit solvation model was developed for calculating free energies of transfer of molecules from water to any solvent with defined bulk properties. The transfer energy was calculated as a sum of the first solvation shell energy and the long-range electrostatic contribution. The first term was proportional to solvent accessible surface area and solvation parameters (σi) for different atom types. The electrostatic term was computed as a product of group dipole moments and dipolar solvation parameter (η) for neutral molecules, or using a modified Born equation for ions. The regression coefficients in linear dependencies of solvation parameters σi and η on dielectric constant, solvatochromic polarizability parameter π*, and hydrogen-bonding donor and acceptor capacities of solvents were optimized using 1269 experimental transfer energies from 19 organic solvents to water. The root-mean-square errors for neutral compounds and ions were 0.82 and 1.61 kcal/mol, respectively. Quantification of energy components demonstrates the dominant roles of hydrophobic effect for non-polar atoms and of hydrogen-bonding for polar atoms. The estimated first solvation shell energy outweighs the long-range electrostatics for most compounds including ions. The simplicity and computational efficiency of the model allows its application for modeling of macromolecules in anisotropic environments, such as biological membranes.
Development of reliable, accurate and efficient methods for modeling of biologically active compounds, peptides and proteins in phospholipid membranes is an important problem in computational chemistry1–3. During the association with membranes some parts of a large molecule may remain in water, while other parts enter into the milieu of different polarity, such as the water-rich interfacial lipid head group region or the hydrophobic acyl chain region. To model this process, the highly anisotropic environment of the lipid bilayer can be approximated by one or several slabs with different dielectric properties2, 4. In a more realistic approach, the lipid bilayer may be described by continuous polarity profiles that can be obtained experimentally using different spectroscopic probes4–9 or theoretically.
In order to reproduce the behavior and energetics of complex biological systems, it is important to have a solvation model that satisfies at least three basic requirements. It should be universal to allow calculating transfer energy from water or vapor to any liquid phase10–12. It should be physics-based to describe all essential components of free energy, including hydrophobic interactions, solute-solvent hydrogen bonds and long-range electrostatics. Finally, it should be computationally efficient to be applicable for macromolecular complexes and high-throughput screening. Though a great variety of methods have been developed for the theoretical assessment of solute-solvent interactions13–15, none of them fully satisfy these requirements.
Full-atomic molecular dynamics (MD) simulations with explicit solvent provide the highest level of structural details, but at the expense of computational efficiency. Despite the algorithmic advances and significant progress in technology which allow running the MD simulations on nanosecond or even millisecond time scales16, an ab initio folding and insertion of macromolecules into biomembranes still remains a significant challenge for this approach. This led to the use of coarse-grained (GC) simulations of membrane-protein complexes, which strongly simplify molecular structure of proteins and lipids17. However, neither CG nor even more rigorous all-atom MD simulations are sufficiently accurate in predicting the free energy of solvation, because the underlying force fields neglect solvent and solute polarization18, 19 and the environment-dependence of van der Waals (vdW) forces20. Furthermore, these methods operate with the potential energy of molecules21. The calculation of free energy requires an extensive conformational sampling and estimation of the entropy of the solute-solvent system22. Small errors in the underlying force fields may accumulate during the calculations of large potential energies. Therefore, the estimations of transfer free energies by these methods are less reliable than those performed by advanced empirical continuum models 4, 23, 24.
On the other hand, solvation models based on empirical parameterization, such as Quantitative Structure Activity Relationship (QSAR) and Linear Solvation Energy Relationship (LSER) models, represent a more straightforward approach to quantitative evaluation of the solvation energy, because all parameters of these models are derived from the experimental partition coefficients that are directly related to the transfer free energy25. In these models the solute is usually considered at all-atom level or using various molecular and fragmental descriptors26, while solvent is treated implicitly as an isotropic liquid phase. However, these models are usually employed for small molecules that are fully exposed to the solvent, but not to macromolecules that have a significant fraction of solvent-inaccessible atoms.
A similar approach can be applied to larger molecules with a significant portion of buried atoms by assuming that energy of the solute-solvent interactions is proportional to the solvent-accessible surface areas (ASA) for different atom types of the macromolecular solute. The corresponding ASA are multiplied by atomic solvation parameters, σi, which are defined as solvation free energy changes per surface unit area27 A number of ASA-based implicit solvation models have been developed and successfully applied for simulation of peptides and proteins in the lipid bilayers3, 28–31. The computational efficiency of such an approach allowed us to use it for large-scale calculations of the spatial arrangement in membranes of integral proteins from the entire Protein Data Bank 3, 32, 33. However, the accuracy of these models is limited because they use a simplified “hydrophobic slab” representation of the lipid bilayer, lack the adequate representation of the complex bilayer interfaces, and do not properly define the electrostatic energy, which is included as a part of the ASA-term rather than a function of atomic charges or dipoles.
More advanced continuum solvation models have been developed to account for energy of electrostatic interactions, using, for example, the finite difference solution of the Poisson equation34 or Generalized Born method35. Some of these models were modified to deal with the heterogeneous dielectric environment of membranes by including a position-dependent scaling factor1, 36, 37. These models assume that the solvent polarity can be characterized solely by static dielectric constant (e). However, it has been well recognized that solvent strength depends on many other factors38. In particular, the importance of solute-solvent H-bonds led to development of empirical hydrogen-bonding acidity (α) and basicity (β) parameters39–42 that were proven to be important for quantification of solubility data. Moreover, the electrostatic interactions of dipoles with media can be described by the solvent solvatochromic polarity/polarizability parameter (π*) better than by macroscopic dielectric constant43, 44. While focusing on the bulk-dielectric treatment of the solvent, the continuum electrostatic models do not accurately quantify hydrogen-bonding and other interactions in the first solvation shell14, 45. Instead, the adjustment of effective atomic radii is frequently used as a simplified approach to deal with this issue 24, 35.
These problems can be resolved by combining the continuum electrostatic models with ASA-based methods that allow incorporation of the first solvation shell effects, as in a series of SMx implicit solvation models23, 46. SMx models represent the universal approach to solvation modeling, which permits prediction of transfer energies between any gaseous and condensed fluid phases23. However, due to the quantum mechanical calculations, this approach is mostly useful for small molecules in isotropic phases rather than for proteins in anisotropic environments.
Here we present a new implicit solvation model for calculating transfer energy of molecules from water to a fluid medium with defined polarity parameters. The proposed model is empirically parameterized to account for both long-range and short-range contributions to the transfer free energy. The model is simple and computationally efficient enough to be used for large macromolecular systems, while remaining universal, sufficiently accurate and physically realistic.
The free energy of transferring a compound from solvent S to water is decomposed into a sum of first solvation shell and long-range electrostatic contributions:
The first solvation shell effects include solute-solvent vdW forces, hydrogen-bonding, and the hydrophobic interactions. Such effects are expected to be proportional to ASA of different atom types in neutral or charged solutes:
where is an atomic solvation parameter of atom type (expressed in cal mol−1Å−2), i ASAi is the solvent accessible surface area of atom i, Ntypes is the number of different atom types and Ji is the number of atoms of type i in the solute molecule. The direction of the transfer is chosen as vapor→non-aqueous solvent→water to be consistent with other publications.
Electrostatic contribution to transfer energy originates from long-range dipole-dipole interactions and polarization of the solvent and solute46. For a neutral solute, this contribution was assumed to be proportional to a sum of group dipole moments in the molecule:
where μl is a dipole moment of a group l, ηS→wat is a dipolar solvation parameter that represents transfer energy of 1 Debye (D) from the solute to water (expressed in cal mol−1 D−1), and r =1 or 2, depending on the model (“μ1“ or “μ2”). We used dipole moments rather than partial atomic charges, similar to that in theories of intermolecular forces20.
Judging from LSER studies, σi, and η can be described by linear functions of certain solvent properties. Here we investigated the relationships between the atomic solvation parameters and different bulk properties of the solvent, such as dielectric constant (ε), solvatochromic parameter (π*), hydrogen bonding acidity ( a) and basicity (β), refraction index (n), and macroscopic surface tension coefficient (γ). Parameters a and β were used to describe the hydrogen bonding properties of solvents rather than solutes, similar to that in SMx models10, but unlike that in LSER models39, 40. We found that solvation parameters σi of neutral atoms and ions can be described by the same general equation:
where εS, αS, and βS are the macroscopic dielectric constant and hydrogen bonding donor (acidity) and acceptor (basicity) parameters of solvent S, respectively; εwat, αwat and βwat are the same parameters for water, and represents transfer energy of atom i from water to a solvent with the same dielectric constant and hydrogen bonding properties as water. We found that dipolar transfer energy can be well described by two alternative models:
where edip is the weighting coefficient; π* is the solvatochromic parameter, and FBW is the Block-Walker (BW) dielectric function of the solvent47.
The electrostatic contribution to the transfer energy of an ion was calculated as Born energy:
where eBorn is a dimensionless weighting factor, and EBorn was calculated using a modified Born equation48:
where r is ionic radius, q is number of charges, e is charge of the electron equal to 1.602E-19 (C), and NA is Avogadro’s number equal to 6.02E23.
The computational procedure included four steps: (1) generation of low energy conformations of solute molecules; (2) calculation of ASA for different atom types in the molecules; (3) assignment of dipole moments to molecular fragments and calculation of their scalar sum for each solute, and (4) least square fitting of calculated and experimental transfer energies to determine parameters of the model.
The structures of compounds were generated using Molecular Editor of QUANTA and optimized using the CHARMm force field (Accelrys Software Inc). Most of the solute molecules are small and rigid, with only a few rotation bonds. A simple conformational search was performed for flexible molecules. Only the lowest energy conformation of each molecule was used for calculating ASA, since the averaging of several conformations produced only minor changes in the final parameters.
ASA were calculated using the subroutine SOLVA from NACCESS49 with the solvent probe radius of water (1.4 Å) and standard vdW and ionic radii50–52. The fitting was performed by LSQR program from LAPACK library.
Dipole moments of molecular groups were defined as follows. Any polar groups separated by at least one aliphatic carbon, such as two adjacent peptide groups, were treated as independent dipoles. Any aromatic or conjugated system with several covalently attached polar substituents, like nucleotides or methoxyphenol, was treated as a single dipole. Changes in the energy of intramolecular dipole-dipole interactions during transfer of molecules from a solvent to water were neglected. These approximations resulted in relatively small errors (less than 1 kcal/mol) except for aromatic push-and-pull electronic systems, such as phenyl rings with several polar substituents, where errors were larger (~1–2 kcal/mol).
Two different least square fitting procedures were used. During the preliminary analysis of data, the fitting was performed for individual solvent-water systems. The values of parameters σi,S→wat and ηS→wat were determined by solving the following system of linear equations:
where is the experimental transfer energy of compound k from solvent S to water (k=1, 2..K), K is the number of compounds in the set; and other terms are as defined in equations (2) and (4). Similar fitting was performed using free energy or enthalpy of transfer from vapor to different solvents.
Final parameters of the universal model were determined by fitting simultaneously for 19 solvent-water systems, but separately for transfer energies of neutral molecules and ions. The following system of linear equations was solved with respect to variables , ei, ai, bi, edip and eBorn (equations 4–6 and 8):
where cim and pm denote coefficients ei, ai and bi and the corresponding solvent polarity parameters from (3) and is experimental transfer energy of compound k from solvent n (n=1,2…Nsolv) to water. The number of variables was kept to a minimum in order to avoid overfitting, which was defined as the situation in which incorporation of an additional linear regression coefficient leads to only a negligible root-mean-square error (rmse) improvement (by 0.002 kcal/mol or less). In this case the additional coefficient was assigned to zero and excluded from the fitting.
We analyzed the most common solvents for which a significant sample of experimental data were made available. They include water, 8 polar solvents that mix with water, 10 non-polar solvents, and an “aliphatic hydrocarbon” dataset53. Empirical polarity parameters of 20 solvents (Table S1) included solvatochromic dipolarity/polarizability parameter ( π*)43, 44, dielectric constants (ε)54, 55, hydrogen bonding acidity (a) and basicity (β) parameters (Σa2 and Σβ2, respectively, in notation of Abraham)39, 40, concentrations of water under saturating conditions (Cw, M)43. The dielectric constant of 1,9-decadiene was estimated using a linear extrapolation for a series of ε values observed in 1,n-dienes, n=1,2,…756. Parameters π*, α and β of 1,9-decadiene were taken as for cyclohexa-1,4-diene.
Partition coefficients of 223 neutral compounds between organic solvents and water (1164 values) or vapor (545 values) were taken from published compilations rather than from commercial databases (Tables S2-S5). The compounds encompassed organic molecules from different classes, such as alkanes, alkenes, alkines, alcohols, ketones, esters, ethers, organic amines, amides and acids, aromatic compounds, heterocycles, molecules containing sulfur-, halogen-, nitrile- or nitro-groups, including drugs and peptide analogues.
We used experimental partition coefficients that have been compiled for specific classes of compounds, such as nucleotides57 and amino acid analogues58; for specific solvents, such as dichloroethane59–61, decadiene62, 63, diethyl and dibutyl ethers64, 65, butyl acetate66, chloroform67, acetonitrile68, methanol69, 70, ethanol71, propanol72, butanol73 and N,N-dimethylformamide68, 70, 74, and compilations for several solvents12, 24, 75–80. Partition coefficients of 11 ions between water and 10 organic solvents were as reported by Abraham and Zhao81. Transfer energies from vapor were taken mostly from the compilations by Li et al.77 and Katritzky et al.75. Enthalpies of transfer from vapor to water were taken from Minz et al.82.
Transfer energies were calculated from experimental partition coefficients in accordance with the equation:
where P = CB/CA is a partition coefficient of a solute between solvents A and B defined as the ratio of molar solute concentrations at equilibrium. At T=298K, ΔGsol = −1.3635 logP (kcal/mol). For the gas-to-solvent transfer, P was replaced by the corresponding dimensionless gas-to-vapor partition coefficient66, 83. The importance of using standard molar concentrations was justified previously84. Transfer energies from polar solvents that mix with water (such as acetone and dimethyl sulfoxide) were derived from a simple thermodynamic cycle as the difference of transfer energies measured from water and the solvent to a third medium.
Experimental dipole moments were taken from compilations for molecules85 and molecular groups86, 87. We used dipole moments measured in benzene at 298K, when available. Dipole moments of imidazole and nucleotides were taken from original experimental studies88–94. Theoretically calculated dipole moments were used only for several nucleotide derivatives57. The sum of group dipole moments varied from zero to 9.2 D in our dataset.
The model was developed in two steps. In the first step, we completed a preliminary study of data for individual solvent-water systems in order to: (1) explore the possibility of empirical separation of electrostatic and non-electrostatic components described by dipolar (η) and atomic (σi) solvation parameters; (2) analyze dependencies of the electrostatic and non-electrostatic components on the solvent properties, and (3) to determine the minimal set of different atom types (Table 1). In the second step we determined the final set of parameters and equations of the universal solvation model using multiple regression analysis of transfer energies for 19 solvents-water systems combined, which was performed separately for neutral molecules and ions.
The energetics of transfer from each individual solvent to water was described by a unique set of empirical parameters σi and η that defined the ASA-dependent and the dipole moment–dependent components of the transfer free energy, respectively. The values of solvation parameters were determined for nine individual solvent-water systems (Figure 1, Table 2). Transfer energies were calculated with equations (1–3) and (10). Experimental values (Tables S2) were reproduced with rmse of 0.6 to 0.8 kcal/mol.
During the parameterization, the set of atom types was kept to a minimum to avoid overfitting. Hydrogen atoms were considered as part of the corresponding non-hydrogen group (e.g. OH, NH of CHn), similar to that in implicit solvation models for proteins. The set of atom types was gradually increased, starting from seven atom types for carbon, nitrogen and oxygen. Each additional atom type was incorporated only if two conditions were simultaneously met: (a) the rmse was reduced by at least 0.02 kcal/mol, and (b) the difference of solvation parameters σi for the additional and the original atom types exceeded the standard error in determination of their σi. Based on such criteria, it was possible to identify 14 atom types: Csp3, Csp2, Csp1, Csp3pol, N/HN, N, N=O, OH, O, S, F, Cl, Br, I (Table 1).
These atom types can be grouped into three distinct classes based on the values and behavior of their solvation parameters: (1) hydrophobic atoms with large, positive σi that are only weakly modulated by polarity of the non-aqueous solvent (Csp3, Csp2, S, F, Cl, Br, I and N=O); (2) environment-insensitive atoms whose σi are small for all solvent-water pairs (Csp3pol, Csp1 and N); and (3) polar atoms with large and negative σi that strongly depend on the solvent polarity (N/NH, OH and O) (Figure 1, Table 2). The hydrophobic character of S, N=O and halogen-containing groups becomes evident only after separating electrostatic and ASA-dependent contributions that have opposite signs.
The dependencies of solvation parameters σi on polarity of the solvent are illustrated by Figures 1, ,22 and Table 3. The σi parameters of polar atoms are negative for transfer from aliphatic solvents to water but closer to zero for transfer from more polar solvents to water. They correlate well with the hydrogen-bonding acidity (α), basicity (β) and dielectric functions of the solvent. However, the correlations are relatively poor for hydrophobic and environment-insensitive types of atoms. No significant correlations were found between σi of any atom types and other solvent properties, such as index of refraction (n) or (n2−1)/(n2+1), solubility parameter ET(30), and macroscopic surface tension (γ) at the liquid-air interface.
The long-range electrostatic component of transfer energy of polar groups with permanent dipole moments is also highly environment-dependent (Figure 3, Table 4). Most significant correlations are observed with solvatocromic dipolarity/polarizability parameter π* (R2=0.88) and with BW dielectric function (R2=0.85), but not with Kirkwood (R2=0.47) or 1/ε (R2=0.15) dielectric functions.
The electrostatic component was calculated using either linear or quadratic dependence on the solute dipole moments (“μ1 model” or “μ2 model”, respectively, eq. 3). The “μ1 model” provides the lower rmse values for individual water-solvent systems (Tables 2 and S6) and a better correlation with dielectric functions of the solvent (Table 4). Values of parameters σi are only weakly affected by the choice of the “μ model”.
To verify the general character of our model and the validity of the separation of dipolar and ASA-dependent components, we applied equation (10) to vapor-solvent transfer. The values of σi, and η were determined by fitting 545 transfer energies from vapor to seven non-polar solvents and to water for a set of 108 neutral compounds (Table S4). In addition, the enthalpic contributions to η and σi were evaluated by fitting enthalpies of transfer from vapor to water for a set of 80 neutral compounds 82. The results are presented in Figures 4, ,55 and Tables S7, S8, S9.
The values of atomic solvation parameters σi obtained by fitting transfer free energies and transfer enthalpies are significantly different, which reveals the presence of a large entropic component (Figure 5). The entropic and enthalpic components of σi parameters have opposite signs. This reflects the balance between the attractive intermolecular forces of enthalpic origin in water and the repulsive entropic component originating from the decreased mobility of water in the first hydration shell 95. Values of σi parameters of polar atoms are more negative for transfer from vapor to water (up to −80 cal mol−1 −2) (Table S7) than for transfer from solvents to water (Table 2). This reflects the gain of stabilizing solute-solvent dispersion attractions during transfer from vapor to condensed media, as was previously found 58. The entropic component is larger for polar groups (Figure 5), which may be due to the stronger orientational restrictions imposed by the solute-solvent H-bonds.
Unlike the ASA-dependent component, the long-range electrostatic energy determined from transfer free energies and enthalpies are nearly identical: η=1274±145 and −1249±116 cal mol−1 D−1, respectively. This confirms the enthalpic origin of the electrostatic energy term. The values of the parameter η for vapor-water transfer and cyclohexane-water transfer are relatively close (the two corresponding points are shown as “vap” and “chx” in Figure 3B). Consistent with this observation, the dipolar energy is close to zero for transfer from vapor to cyclohexane or other non-polar solvents including “average hydrocarbon”, benzol, dibutylether and diethylether (Figure 4). The dipolar energy was detectable only for transfer from vapor to more polar solvents (Figure 4). Such results are consistent with previous analyses of enthalpies of transfer from vapor to non-polar solvents measured for a large set of neutral compounds, where the electrostatic component of transfer energy was found to be negligible96.
The initially obtained sets of parameters σi and η (Table 2) can be used for calculations of transfer energies from each of the nine individual solvents to water. To make the model “universal”, we determined the regression coefficients in linear dependencies of σi and η on bulk solvent properties (equations 4–6) by multiple regression analysis simultaneously for 19 solvent-water systems. The systems of linear equations (11) were solved for neutral molecules and ions separately. The dataset for neutral molecules included 1164 experimental transfer free energies of 223 compounds from 18 solvents to water (Tables S2 and S3). The dataset for ions included 105 transfer energies of 11 ions from 10 polar solvents to water (Table S5).
The final set of parameters is shown in Table 5. The set includes 22 coefficients for 15 atom types, 11 coefficients for 11 ions, and parameters of the long-range electrostatic contribution, edip and eBorn from equation (5–6) and (8), respectively. During the fitting procedure N and NH atom types were separated, and a distinct type was assigned to each ion (Li+, Na+, K+, Rb+, Cs+, NH4+, F−, Cl−, Br−, I− and COO−). The set of parameters was not extended any further, because this did not improve the fitting. Table 1 shows the final set of 26 atom types and ions with their vdW and ionic radii. This parameter set can be employed for calculating transfer energies of any molecule with defined atom types and group dipole moments from water to any fluid phase with known polarity descriptors.
Consistent with the initial analysis, the hydrophobic, environment-insensitive and polar atom types behave differently in different solvents. The environment-insensitive atoms (Csp3pol, Csp1 and N) have very small values of and other regression coefficients equal to zero. Transfer energies of these atoms from any solvent to water are very small and do not depend on the solvent polarity. This may point to the absence of the hydrophobic effect and to the relatively weak hydrogen-bonding capacity that was shown for these atoms 41, or to the mutual cancellation of these two factors. Non-polar atoms (Csp3, Csp2, halogens and N=O) have significant positive values and small values of σi ai, and bi coefficients. This indicates the presence of a significant hydrophobic effect, which is only weakly modulated by polarity of the non-aqueous solvent. Sulfur occupies a borderline position between the non-polar and environment-insensitive atoms. The polar atoms (N, NH, O and OH) have equal to zero but significant values of ei, ai and bi coefficients. The zeroed indicates the lack of energetic penalty for transferring a polar atom from water to a solvent of equal polarity (with the same α, β and ε).
Multiple regression analysis demonstrates that atomic solvation parameters σi can be better described as linear functions of α, β, and 1/ε (Figure 2). Such behavior can be explained by a predominant role of hydrogen bonding in the transfer energy of polar atoms. There is a reciprocal relationship between the hydrogen bonding donor and acceptor capacities of solutes and solvents: σi of H-bond acceptors (N, O and OH) have non-zero coefficients ai, while σi of H-bond donors (NH and OH) have non-zero coefficients bi. The large absolute values of coefficient ei in 1/ε-dependent term for polar atoms indicate a significant dependence of this term on dielectric constant of the solvent.
Consistent with the initial results, the long-range electrostatic component can be described by a linear dependence on the solvent parameter π* or BW function (equations 5 and 6). Furthermore, we found that parameter π* performs consistently better than BW function, and that “μ1-model” provides a better fitting than “ μ2-model”: the rmse values with “μ1 model” are 0.82 and 0.87 kcal/mol, respectively, and with “μ2-model” are 0.92 and 0.94 kcal/mol, respectively. Therefore, the linear dependence η(π*), in combination with “μ1-model” (equations 3, 5) was selected as the best performing dielectric model for neutral molecules.
The independent fitting for ions indicates that their transfer energies can be adequately described as a combination of a short-range hydrogen bonding energy (equation 4), and the long-range electrostatic contribution defined by the modified Born equation (9). The weighting factor of Born energy (eBorn) was determined as −0.198 ±0.016 (Table 5). Judging from the obtained coefficients, acetate and fluoride anions are the strongest H-bond acceptors (aCOO−= −221 and aF−= −143 cal mol−1 A2, respectively), which is consistent with the published observations 81. The H-bond acceptor capacity decreases in the row F−>Cl−>Br−>I−. The H-bond donor capacity decreases in the row Li+>Na+>K+>Rb+>Cs+>NH4+. The hydrogen bonding contributions are significantly larger for charged oxygen in acetate ion than for the neutral O atom, and slightly larger for NH4+ ion than for NH nitrogen, as expected.
The model was validated by three different methods. First, the results of final fitting for all solvent-water systems (1164 transfer energies from 18 solvents to water) were compared with results of fitting for nine individual water-solvent pairs (38 to 139 data points for individual solvent-water pairs). The rmse obtained for the full dataset (0.82 kcal/mol) was only slightly higher than for individual systems (from 0.59 to 0.78 kcal/mol, Table 2), and the values of σi and η were close, within the error of determination, when calculated from regression coefficients ( , ei, ai, bi and edip) for a specific solvent-water system or derived by direct data fitting for this system (Table 6).
Second, the model was cross-validated using the “10% out” test: five training datasets with 90% of data were randomly selected, leaving out the remaining 10% of data as the test set. The average rmse value for the five test sets increased by ~5%, from 0.82 to 0.86 kcal/mol.
Finally, we tested whether regression coefficients derived from data for non-polar solvents can be used to predict transfer energies from polar solvents to water. Thus, a training set was created by selecting only transfer energies from ten non-polar solvents to water (863 points or 74% of total dataset). The regression coefficients obtained by fitting for non-polar solvents were used to calculate 301 transfer energies from eight polar solvents to water. In this case the rmse of the test set increased by 8% (to 0.89 kcal/mol).
The validation demonstrates that our model can be used to predict transfer energies for molecules outside the training sets and for different solvents. Importantly, values of solvation parameters σi and η obtained with different datasets were nearly identical (Table 6).
We developed a new implicit solvent model that provides empirical separation of the first solvation shell and long-range electrostatic contributions to the transfer free energy from an arbitrary solvent to water (equation 1). The contributions were quantified by considering them as the ASA-dependent and dipole moment-dependent components, respectively (equations 2 and 3). Accordingly, two different types of empirical solvation parameters were used to quantify transfer energy of neutral compounds from a specific solvent to water: parameters that describe first-shell transfer energy for different atom σI types per Å2; and η parameter that defines electrostatic transfer energy per 1 Debye. The long-range electrostatic energy of ions was included using a modified Born equation (9). During the initial data analysis and subsequent development of the universal model, we found that all proposed solvation parameters, σi and η, are linearly dependent on several solvent polarity descriptors (α, β, π*, ε). These dependencies (Table 5) and the values of solvation parameters (Tables 2, ,6)6) are physically meaningful.
While considering atomic solvation parameters σi for neutral atoms, we found that they can be described by equation (4), which expresses a linear dependence of σi on hydrogen-bonding donor and acceptor capacities (α, β) and dielectric function (1/ε) of the solvent. Such a result is consistent with the established importance of short-range hydrogen-bonding interactions between solute and solvent42. The presence of a 1/ε-dependent component of the ASA-term may be attributed to electrostatic interactions between surface-distributed charges around polar atoms and the surrounding solvent97. The dependencies of parameters σi, on the solvent polarity reflect the nature of solute-solvent interactions for different atom types. Fifteen atom types were identified here for neutral compounds. They can be divided into three distinct categories based on the values and behavior of their solvation parameters: hydrophobic, environment-insensitive, and polar. Transfer energy of hydrophobic atoms (Csp3, Csp2, halogens, and N=O) originates from the hydrophobic interactions that are weakly modulated by polarity of the non-aqueous solvent. Transfer energy of environmental-insensitive atoms (Csp3pol, Csp1, and N) is very small, and its dependence on solvent polarity could not be reliably established. In contrast, transfer energy of polar atoms (N, NH, O, OH) is driven by hydrogen-bonding and other interactions in the first solvation shell and strongly depends on solvent polarity. The dependence of ASA-dependent water-solvent transfer energies of polar atoms on solvent polarity has been previously noticed98.
While considering the long-range electrostatic component of the transfer energy of polar atoms, we discovered that dipolar solvation parameter η can be described by the linear dependence on solvent dipolarity/polarizability parameter π* (equation 5) or the BW dielectric function (equation 6), but not by Kirkwood or 1/e functions. Similar results have been previously obtained in studies of dielectric models describing electrostatic interactions of molecular dipoles with media6, 99, 100. The π* parameter and BW function may be partly interchangeable because they correlate for “select” solvents100. We compared the dependencies of the electrostatic component on the sum of group dipole moments and on the sum of squared dipole moments (“μ1-model” and “μ2-model”, respectively) and selected “μ1-model” because it provided smaller errors during the fitting procedure. This is consistent with the results of Wolfenden and coworkers who found a linear rather than quadratic dependence of logP values on molecular dipole moments for nucleotide derivatives57.
The multiple regression analysis of data for ions demonstrated that their transfer free energies can be represented as a combination of hydrogen-bonding interactions in the first solvation shell described by equation (4) and the long-range electrostatic energy described by equations (8,9). The independent determination of the both contributions for neutral molecules and ions demonstrates that the first-shell solvation energy dominates over the long-range electrostatics in both cases (Figure 7). The obtained weighting factor of Born energy (eBorn in Table 5) indicates that long-range electrostatic energy for ions constitutes only ~20% of the energy expected in a purely electrostatic (Born) model, because the main part of the transfer energy comes from hydrogen-bonding and other interactions in the first solvation shell. This is in agreement with other analyses of transfer energies for ions45, 81. This result is also consistent with previous QSAR and LSER studies that found an important but not a predominant contribution of the dipolar energy to the partition coefficients of organic compounds between different phases100–104.
The proposed model is reasonably accurate (Figure 6). R2 values for neutral compounds and ions are 0.92 and 0.87, respectively. The rmse for predicting transfer energy of neutral molecules is 0.82 kcal/mol. This is only slightly higher than the errors in predictions by the most advanced universal model, SM8 (0.59 kcal/mol for neutral solutes) and smaller than show other universal solvation models (from 1.86 to 5.66 kcal/mol, see23. The largest errors occur for transfer energies of molecules with aromatic rings and multiple dipolar groups, such as chlorophenol, hydroxyphenol or nucleotides, due to the simplified treatment of electrostatic interactions in our model (see Methodology).
Our model is less accurate for ions, with rmse of 1.61 kcal/mol, although a more rigorous comparison with other models would require a bigger dataset. Our model is universal because it allows calculation of parameters σI and η for any solvent-water system using solvent polarity descriptors and regression coefficients determined here (Table 5, equation 11). Hence, the model can be applied for the evaluation of transfer energy of molecules between any liquid media. Several validation tests demonstrated the predictive power of the method for molecules of different chemical structure and size and for large set of solvents. Importantly, values of empirical solvation parameters calculated for each solvent-water pair do not depend on the training set used for parameterization. Interestingly, though our previous implicit solvent model developed for simulations of proteins in membranes3 was oversimplified and parameterized on a much smaller dataset, the underlying atomic solvation parameters for five most common atom types (Csp3, Csp2, O, N/HN, OH) are close in both current and previous models (see 2nd and 10th columns in Table 6).
All parameters of the model are transferrable from small organic compounds to biological macromolecules composed of the same types of atoms. However, the simulation of proteins in membranes requires the prior knowledge of polarity descriptors that change along the bilayer normal (z). The required polarity profiles, ε(z), π*(z), a(z) and β(z), can be obtained either experimentally using different spectroscopic probes4–9 or theoretically from the lipid fragment distributions along the bilayer normal105 and group constants of the lipid fragments106, as described in the following paper.
We have developed a new implicit solvation model that can be applied for calculating transfer energies of neutral molecules and ions from water to environments with defined bulk properties, such as dielectric constant (ε), dipolar/polarizability parameter (π*) and hydrogen and hydrogen bonding acidity (α) and basicity (β) parameters of Abraham. The model is relatively simple, as it includes a small set of empirical parameters and several linear equations. The accuracy of the model in calculation of transfer free energies of neutral molecules is comparable with the most advanced solvation models. This model satisfies three essential criteria: it is universal, physics-based and computationally efficient. The computational efficiency of the method allows application of the model to large-scale computational analyses of small organic compounds or biological macromolecules. It can be readily used for prediction of ADMET properties of drugs by calculating partition coefficients of molecules between water and any isotropic solvent with defined bulk properties. More importantly, the universal model can describe solvation of molecules in anisotropic environment, such as artificial phospholipid bilayers or biological membranes. The following paper represents the validated application of this model for prediction of membrane binding affinities and spatial positions in membranes of small molecule and membrane-associated peptides and proteins. The model can be further applied for prediction of membrane permeation of structurally diverse molecules.
This research was supported by grant 0849713 (A.L., I.P.) from the National Science Foundation (Division of Biological Infrastructure), Upjohn award from the College of Pharmacy of the University of Michigan (A.L.) and in part by the grant 5R01DA003910-23 (H.I.M.) from the National Institute of Health (National Institute Of Drug Abuse). We are thankful to Dr. Hubbard for the kindly provided NACCESS program.
Supporting Information Available: Empirical polarity parameters of 20 solvents (Table S1), experimental transfer energies of neutral compounds from solvents to water (Tables S2, S3) and from vapor to solvents (Table S4), transfer free energies of ions from solvents to water (Table S5), atomic solvation parameters obtained for transfer from solvents to water and from vapor to water using alternative dielectric models (Tables S6-S9), and linear regression coefficients of atomic solvation parameters obtained using an alternative dataset (Table S10) or an alternative dielectric model (Table S11) are available as Supporting Information. This material is available free of charge via the Internet at http://pubs.acs.org.