|Home | About | Journals | Submit | Contact Us | Français|
In this work semi-experimental and theoretical equilibrium geometries of 10 sulfur-containing organic molecules, as well as 4 oxygenated ones, are determined by means of a computational protocol based on density functional theory. The results collected in the present paper further enhance our online database of accurate semi-experimental equilibrium molecular geometries, adding 13 new molecules containing up to 8 atoms, for 12 of which the first semi-experimental equilibrium structure is reported, to the best of our knowledge. We focus in particular on sulfur-containing compounds, aiming both to provide new accurate data on some rather important chemical moieties, only marginally represented in the literature of the field, and to examine the structural features of carbon-sulfur bonds in the light of the previously presented linear regression approach. The structural changes issuing from substitution of oxygen by sulfur are discussed to get deeper insights on how modifications in electronic structure and nuclear potential can affect equilibrium geometries. With respect to our previous works, we perform non-linear constrained optimizations of equilibrium SE structures with a new general and user-friendly software under development in our group with updated definition of useful statistical indicators.
In a recent perspective paper1, George M. Whitesides, one of the most cited chemists2 of the last decade, pointed out that one of the main concerns that chemistry has still to solve is the so-called “rational drug design”.3–5 Among the various experimental and theoretical steps one should overcome to reach such an ambitious target, those involving the prediction of the tertiary structure of a protein or a drug from their chemical formulae are still extremely challenging. Despite the large number of qualitative techniques developed nowadays, an accurate prediction of the three-dimensional structure continues to be a difficult task, even for medium-sized molecules. In order to overcome such limitations and go beyond our actual knowledge about the most peculiar features determining molecules’ tertiary structures, a systematic and accurate study of a large collection of molecules containing the relevant functional groups is mandatory. Another fascinating challenge of modern chemistry is the search for additional experimental evidences about the origin of life and, as a strictly connected topic, extraterrestrial chemistry. 1,6,7 In this kind of research, computational microwave and infrared spectroscopies play a leading role in driving the interpretation of experimental data and elucidating the chemical composition of both planets’ atmospheres and interstellar medium8–10. On the other hand, the theoretical prediction of reliable spectroscopic properties relies on the computation of accurate equilibrium geometries of isolated molecular systems,11,12 that present the advantages of having a unique definition in the Born-Oppenheimer (B-O) picture and being independent from isotopic substitution and vibrational effects.11–26
It is evident from the above discussion that a large database of accurate equilibrium molecular structures is of primary importance as it provides the required reference data for: (i) testing the precision of various quantum-mechanical (QM) approaches;14,27–31 (ii) parametrizing new density-functionals and force fields for bio-chemical applications;32–41 (iii) developing and validating new computational protocols for supporting and driving experimental observations11,42.
Detailed knowledge of the equilibrium structures of isolated molecular systems, especially those of biological interest, is then a prerequisite for a deeper understanding and a more accurate modelling of paramount aspects of current chemical research. Unfortunately, from both experimental and theoretical points of view, the effort needed to obtain accurate equilibrium geometries becomes prohibitive except for very small systems.43–48 An important breakthrough, first introduced by Pulay and co-workers49, has been the adoption of the so-called semi-experimental (SE) equilibrium geometry 50,51, which can be obtained by a least-squares fit (LSF) of experimental rotational constants for the ground vibrational state of different isotopologues corrected by vibrational contributions computed theoretically. 13,14,49 This interplay between experimental measurements and theoretical calculations paved the route toward the extension of accurate structural studies to systems larger than those treatable by experimental and QM methods separately. 19,51–53
Thanks to the (never too) large set of available experimental data issuing from microwave spectroscopy, we employed DFT-based model chemistries to calculate and afterwards for molecules containing up to 15 atoms, mostly including molecular building blocks of chemical, biological and technological interest. 54,55 The B3LYP functional56,57 in conjunction with a polarized double-ζ basis set has been shown to be an optimal compromise between accuracy and computational cost, thus paving the route to the extension of the SE approach to the structural analysis of larger systems.54 In the few cases where the B3LYP-based scheme did not deliver high accuracy, the use of the B2PLYP functional58–60 coupled with a doubly polarized triple-ζ basis set was shown to represent a very accurate approach in the calculation of , without any significant loss of accuracy with respect to higher level post-Hartree-Fock calculations.55,61 The results we have so far obtained have been made available in a graphical interactive form on our web site, 62 where of various molecules obtained using either B3LYP and B2PLYP functionals are collected in two databases (B3se and B2se set, respectively), together with other kinds of structures that may be useful, such as their fully theoretical counterparts (B3th and B2th sets) and, when feasible, obtained using coupled cluster (CC) techniques for the calculation of (CCse set).62 These SE equilibrium structures not only represent a high quality benchmark for structural and spectroscopic studies, but, as already pointed out elsewhere54,55, they can be seen as a set of “structural synthons” to be used, in the framework of the previously presented templating molecule and linear regression approaches (TMA and LRA, respectively), in order to build reliable equilibrium structures of larger molecules, such as amino acids, polypeptides and nucleobases.55,61
The main aim of the present work is to introduce and discuss 10 new SE equilibrium structures of interest for modeling molecules containing one sulfur atom and hence of bio- and astro-chemical relevance. In addition, in the case of 4 molecules, we have also studied the SE equilibrium structure of the oxygenated analogues, in order to (i) investigate the structural effects of sulfur-oxygen substitution, thus gaining some insight on how the changes in electronic structure and nuclear potential could affect , and (ii) enlarge further our data set. The new structures add some important moieties to our collection, in which sulfur-containing functional groups were underrepresented until now, in spite of their important role in biological molecules63,64 as well as in prebiotic65 and interstellar chemistry. 66,67
In order to obtain the complete of a molecule, one has to perform a LSF on a set of molecular coordinates to the semi-experimental equilibrium moments of inertia, hereafter denoted as of as many isotopologues as possible, where β = a, b or c is one of the principal inertial axes in the molecule-fixed reference frame. The can be obtain starting from the ground state experimental rotational constants, using the following equality
where the equilibrium SE rotational constants are given by
In the previous equation represents a correction computed from a QM approach and explicitly given by
is an electronic contribution, which derives from the rotational g tensor and the ratio between electron (me) and proton (Mp) masses68–70. This term has been systematically included in our computations, even if often negligible. Within B-O approximation and Eckart-Sayvetz conditions71–74, the expression of the vibrational contribution can be found with the aid of second-order vibrational perturbation theory (VPT2) applied to the molecular ro-vibrational Hamiltonian expressed in normal coordinates75–77. In the summation, are the vibration-rotation interaction constants and di is the degeneracy of the i-th vibrational normal mode68,77,78.
As can be inferred by looking at the explicit form of -terms for asymmetric tops68,
the computational bottleneck of this procedure is the calculation of the elements Kiij of the cubic force field. The other terms in the above equation are the inertial derivatives (ai,βγ), the Coriolis coupling constants (ζij,β = −ζji,β) and the harmonic frequencies (ωi); all the remaining symbols take their usual meaning. It is worth underlining how all possible terms involving accidental resonances (ωi ≈ ωj) in equation 4 cancel once the summation in equation 3 is performed, so that the correction is devoid of the problems related to resonances. On the basis of previous validations54,55 all anharmonic calculations have been performed employing DFT techniques, using either the hybrid B3LYP56,57,79 and the double hybrid B2PLYP58–60 functionals. While the latter functional has always been coupled with a standard correlation-consistent polarized triple-ζ cc-pVTZ basis set80–83 (shortly denoted as VTZ), the B3LYP functional has been used in conjunction with the polarized double-ζ SNSD basis set84,85, which was developed in our group starting from a standard 6-31G basis set and adding, with a subsequent optimization of orbital exponents, one contracted and one diffuse s-type function on each atom, together with diffuse p and d-types functions on second- and third-row atoms.86 Both basis sets have been chosen because several studies have shown them to be an excellent compromise between accuracy and computational cost for the calculation of several spectroscopic properties, in particular for anharmonic computational models.31,48,60,84,85,87–90 The contributions have been evaluated using the B3LYP functional in conjunction with the aug-cc-pVTZ (hereafter AVTZ) basis set81,91 in the calculation of the gββ quantities. All the calculations have been performed with the G09-D01 package92, computing harmonic contributions from analytical second derivatives and cubic force fields by numerical differentiation of analytical second derivatives of the electronic energy with respect to geometrical parameters, with the default step of 0.01 Å in the normal-coordinate representation93–98. Very tight convergence criteria have been used for the SCF procedure and geometry optimization, together with an ultra-fine grid for the numerical integration connected to the exchange-correlation functional and its derivatives.
Concerning the fitting procedure, all SE equilibrium geometries presented in this work have been obtained using a new software under development in our group, which allows one to perform non-linear constrained optimizations. One of the main features of the new tool is the possibility of computing the analytical Hessian matrix of the residual function (the squared norm of the residual vector) at the optimized geometry, thus allowing to characterize the critical point reached at the end of the minimization procedure. Its robustness has been proved by studying a representative set of different molecular systems, while the reliability of the results has been confirmed by comparison with the structural parameters computed by the xrefit module of the CFOUR program package99. Although the full implementation of the new software will be presented in a forthcoming paper, we discuss here some details about the error estimate in order to avoid possible ambiguities arising from slightly different approximations employed in the available software, especially with respect to the xrefit code used in our previous work. As it is well known, the aim of an optimization is the determination of the vector x* minimizing the residual function Rf(x):
where ySE and yguess are N-dimensional vectors containing the semi-experimental and the guessed data, respectively, R represents the corresponding residual vector (with elements ri), while x = (x1, … , xM) is the set of variables characterizing the optimization procedure. Due to the non-linearity of the residual function with respect to the variables x, the optimization has to be performed through an iterative procedure, where at each iteration k the correction vector for the geometry can be found by solving a linear system of the type:
where B(xk) is a general symmetric and positive definite matrix, Δxk is the correction vector, g(xk) is the gradient of the residual function, and αk is a scaling factor which ensures that the value of the function decreases at each iteration100. Once the optimized set of structural parameters x* has been computed, the error-related quantities can be defined.
In the present work this has been replaced by the residual standard deviation s defined by Demaison101 as
in order to take into account also the number of degrees of freedom in discussing our results.
The estimates of the standard errors on the optimized variables are equal to the square roots of the diagonal entries of the variance-covariance matrix C(x*):
where H(x*) is the Hessian matrix of the residual function evaluated at the optimum geometry. The standard deviation s(xj) of each parameter xj can be defined as follows,
which is the strategy adopted, for instance, in the xrefit code.
At variance, the uncertainties presented in the following have been defined through the evaluation of the confidence interval for each parameter xj :
In solving equations 6, it may happen that some parameters turn out to be very sensitive to small changes in the data and they suffer from large uncertainties so that they have poor significance. This is often due to the correlation of parameters that leads to an ill-conditioned optimization problem101. A reliable yardstick for assessing whether this kind of problems are present is provided by the condition number (κ) 102:
in which λmax and λmin are the largest and the smallest eigenvalue of the square matrix JJ, J being the Jacobian matrix whose columns have been normalized. As a rule of thumb, a large condition number (≥ 103 in the present case) has to be considered diagnostic for the presence of potentially harmful ill-conditioning101,102.
The results for the molecular structure determination of the 14 molecules considered in this work are summarized in Tables 1, ,22 and and3,3, with atoms labelled according to Figures 1 and and2,2, in the case of ambiguities. To the best of our knowledge, this is the first determination for thiirane, thiazole, 1,2,3-thiadiazole, 1,3,4-thiadiazole, 1,2,4-thiadiazole, 1,2,5-thiadiazole, isothiocyanic acid, thioformaldehyde, thioketene, propadienethione, isocyanic acid and propadienone. For all the molecules under study, the theoretical structure fully optimized at B3LYP/SNSD and B2PLYP/VTZ levels is also reported. Concerning re structures, the new results substantially reproduce our previous findings about the good performances of the B2PLYP/VTZ model chemistry55, that shows a mean absolute error (MAE) with respect to SE equilibrium bond lengths lower than 0.005 Å (about half the value of 0.01 Å issuing from B3LYP/SNSD computations). The good accuracy of B2PLYP in predicting equilibrium structures has been recently extended by Adamo and co-workers to a larger panel of double-hybrid functionals32. In all cases our QM calculations systematically overestimate both re(C − S) and re(N − S), showing a mean error (ME), with respect to their counterparts, that is much larger than the previously mentioned averages, namely 0.0248 Å and 0.0623 Å for B3LYP/SNSD and 0.0097 Å and 0.0247 Å for B2PLYP/VTZ models. These values, about twice (for re(C − S)) and six times (for re(N − S)) larger than the average MAE on bond lengths involving only second row atoms, evidence the significant reduction in the accuracy of DFT-based methods for sulfur. We tried to optimize geometries with larger basis sets, obtained by adding various combinations of contracted d-functions on the sulfur atom, but the marginal improvement achieved suggests that the problem of reduced accuracy of structural parameters involving S atoms may be related to intrinsic limitations of the chosen functionals. In this respect, we optimized the geometries of selected molecules using M06-2X103 and BHandH,104 two global hybrid functionals with a larger amount of Hartree-Fock exchange than B3LYP, using the VTZ basis set. In both cases (see Table 1 in Supporting Information) there is a shortening of almost all chemical bonds with respect to B3LYP/SNSD and B2PLYP/VTZ calculations, that in some instances provides a better description of the molecular structure around sulfur atoms, but, in general, does not improve the overall structural predictions. Therefore, we think that B3LYP and B2PLYP remain the methods of choice for the determination of semi-experimental equilibrium structures. For the sake of completeness we have also reported the fully empirical r0 structure obtained by a least squares fit of bare experimental moments of inertia14,105. Although this completely empirical structure is expected to be significantly less accurate than the SE one106, its differences with respect to SE equilibrium parameters can be roughly interpreted as a measure of molecular flexibility concerning both the whole structure and the individual bonds. In general terms our results discourage the use of the r0 structure even for inferring qualitative considerations. The residual standard deviation (s), in terms of differences between observed and fitted moments of inertia, is always reported as an indicator of the quality of the fit and of the mutual coherence between experimental and theoretical data. Finally, it is worth to underline that the fits took an average number of 7 iterations in order to achieve convergence.
The chemical proprieties of heterocyclic organosulfur compounds are of considerable interest in modern research. In particular thiirane, thiazole and thiadiazoles have been found to have important biological107, pharmacological108–110 and technological111–113 applications; furthermore they are believed to play a significant role in astrochemical processes114,115 and prebiotic chemistry.116,117 With the exception of thiirane, the planarity of these molecules, highlighted by an inertial defect (Δ = Ic − Ib − Ia) close to zero, leads to a dependence among the inertia moments, so it is necessary to select one of their possible couples in the fitting procedure14. For all the cases taken into account, the specific choice did not significantly alter the SE results, which means that we have dealt with problems without strong ill-conditioning. In the following we consider fits on Ia and Ib, because choosing the two smallest inertia moments usually gives the best results for the r0 structure106.
Thiirane, the simplest S-containing heterocyclic compound, is the sulfur analogue of oxirane (ethylene oxide) and it belongs to the C2v symmetry point group. Its rotational and vibrational spectra are well known118, as well as those of its major isotopomers119–121. The use of all the experimental moments of inertia measured till now permitted to fit the complete structure reported in Table 1, where the excellent agreement between B3LYP/SNSD and B2PLYP/VTZ SE schemes can be appreciated. The small magnitude of both condition numbers and uncertainties on resulting parameters confirm the good quality of both calculated and measured data.
Thiazole, or 1,3-thiazole, is a planar molecule (Cs symmetry) for which all mono-substituted isotopologues have been investigated122, thus allowing us to obtain its first complete SE equilibrium structure. As apparent from Table 1, this system suffers from larger uncertainties on the fitted parameters in comparison to the other molecules studied in this paper. In our opinion, this can be due to both possible inconsistencies in the experimental data, as suggested by a condition number about 50 times that of thiirane, and to the restricted number of degrees of freedom (=5) in the regression problem, the latter implicating a value of Student’s t distribution of 2.57. In any case, it is worth noting how the inertial defect, that should vanish for planar systems, considerably decreases when experimental rotational constants are corrected by vibrational and electronic contributions, confirming the consistency of our computational schemes in the calculation of
1,2,3-, 123 1,3,4-, 124 1,2,4-, 125 and 1,2,5-thiadiazole,126 with Cs, C2,v, Cs, and C2,v symmetry, respectively, represent all possible thiadiazoles and all of them have been extensively studied. For these molecules inspection of Table 1 shows satisfactory results, with low statistical uncertainties and differences between B3LYP/SNSD and B2PLYP/VTZ SE parameters that rarely exceed 0.001 Å for bond lengths or 0.1 degrees for bond angles. The only noteworthy discrepancy can be found in the r(C3-H) bond length of 1,2,4-thiadiazole (1.0829Å and 1.0784 Å for B3LYP/SNSD and B2PLYP/VTZ vibrational corrections respectively), which shows large uncertainty together with a(N2-C3-H). This is probably due to the absence of experimental data concerning H→D isotopic substitution causing a scarce sensitivity of the fit to the H3 coordinate, that in turn affects both r(C3-H) and a(N2-C3-H). Despite this, the corresponding values obtained from B3LYP and B2PLYP computations are in agreement within their relatively large confidence intervals.
Sulfur and oxygen have a quite similar chemistry and can be found in analogous compounds. This parallelism can be exploited in order to understand, in a B-O picture, how changes in electronic structure and nuclear potential arising from the substitution of O with S can affect, qualitatively and quantitatively, the structural features of a molecule in its ground state equilibrium configuration. As an example of this kind of comparative analysis, we parallel cumulene-thiones up to the third member of the series (see Table 2) to their cumulenone counterparts (see Table 3), both families being of potential astrochemical and astrobiological interest.127–129 Also isothiocyanic and isocyanic acids are reported in the same tables. All the molecules discussed in this section are near-prolate asymmetric tops because of the linearity, or quasilinearity, of their heavy atom chains, which causes one eigenvalue of their inertia tensors (Ia) to be quite small, so that the corresponding rotational constant is particularly difficult to measure or even estimate. For this reason the inertial defects of these molecules have only marginal significance and are not reported in Tables 2 and and3;3; as another consequence all the and r0 structures have been obtained by fitting the Ib and Ic inertia moments.
Both isothiocyanic and isocyanic acids have been studied exploiting all the most common single isotopic substitutions130,131, obtaining very satisfactory results from the points of view of uncertainties (s values) and agreement between B3LYP/SNSD and B2PLYP/VTZ These two molecules are topologically very similar in their equilibrium structure and the only significant difference introduced by the replacement of oxygen with sulfur is an obvious lengthening of the r(C-S) parameter with respect to its r(C-O) counterpart, i.e. a very localized effect.
Almost the same occurs for thioformaldehyde132 and thioketene133 in comparison to their oxygen analogues, formaldehyde54,55 (already present in our previous data sets) and ketene134, whose new DFT-based structures are in excellent agreement with the most accurate equilibrium geometry previously reported.134
Moving to the third members of cumulene-thiones and cumulenes series, larger uncertainties and unexpected structural differences are observed in both equilibrium and r0 structures of propadienethione and propadienone. In fact the heavy atoms of propadienethione are collinear135, whereas the oxygenated counterpart presents a strongly kinked structure136–138. The uncertainties and the residual standard deviation for the B3LYP/SNSD of propadienethione are considerably larger than those at the B2PLYP/VTZ level. This behaviour can be explained by noting that in this case the contributions are very small (see Table 2 in Supporting Information) and the relative error of the B3LYP/SNSD method with respect to more accurate calculations becomes significant, leading to relevant changes in the result, also possibly related to the presence of ill-conditioning effects. For propadienone the situation is even worst, with condition numbers of the fits larger than 3500, a clear signal of severe ill-conditioning that, in this case, increases the variances of estimated parameters and can be responsible for important rounding errors101.
Moreover, propadienone is the only molecule, among those studied in this work, for which the conditioning of the problem worsens moving from experimental to SE inertia moments, as indicated by the increase of κ by almost 400 units when are fitted instead of The poor results obtained for the structures of propadienone may be due to the failure of VPT2 in accurately computing the vibrational contribution to the rotational constants of this rather flexible molecular system. In particular we believe that the problem concerns the calculation of the vibration-rotation interaction constant of the lowest frequency normal mode, along which the bending potential of propadienone presents a double minimum with a strong anharmonicity139. For those reasons our results on propadienone have poor quantitative significance, but we decided to report their values in Table 3 in the light of their qualitative interest discussed in the following paragraph.
The bent structure of propadienone is of particular interest not only because it is unexpected on the basis of both chemical intuition and comparison with shorter cumulenes, but also because Hartree-Fock calculations predict it to have a wrong C2v structure analogous to that of propadienethione138. In fact, it has been shown how in a B-O framework the non-linearity of the CCCO chain is a surprisingly strong effect of electron correlation, that affects the structural features of the molecular system, correctly predicted only if electron correlation is properly taken into account140. In particular MC-SCF calculations pinpointed the importance of the double in-plane π − π* electronic excitation in kinking the molecule at the β-carbon 141,142, with a resulting wave function compatible with a valence bond based hypothesis about the importance of the H2C = C− − CO+ resonance structure for the stabilization of the system, thus justifying distortion from linearity. Electron correlation has been found to have such a decisive role in shaping, and hence in predicting, atom spatial arrangement also in longer cumulenes143,144 and it is certainly worth noting how DFT methods succeed in qualitatively reproducing the propadienone re structure. Regarding propadienethione, even though QM calculations at any level correctly reproduce the experimentally confirmed C2v geometry, one would like to explain, in terms of reasonable chemical arguments, the driving force that straightens this kind of system upon oxygen-sulfur exchange. Resorting to Lewis resonance structures does not help, because a-priori there are not important arguments that exclude a significant contribution of H2C = C− − CS+. On the one hand, this is reinforced by the similar chemistry usually involving O and S atoms, on the other hand by CASSCF calculations that we performed with qualitative purposes at various levels, obtaining C2v absolute minima with a multiconfigurational wave function very similar to that of the kinked propadienone in terms of the relative weights of corresponding excited determinants; therefore other important electronic and energetic aspects could affect these systems.
At this point it is useful to resort to the very elegant and effective model proposed in 1987 by Trinquier and Malrieu145 that relates the occurrence of distortions from linearity in cumulene-like molecules to the singlet-triplet energy separation (ΔES→T) of the interacting fragments forming the multiple bond involved in the bending. Without entering the details of the model, well explained in the original work145, one can consider propadienone (H2C = C = C = O) as formed by the interaction between one vinylidene (H2C = C:) and one carbonyl (:C = O) fragments, both being diradical species with a singlet ground state and a ΔES→T of 46 and 139 kcal/mol, respectively.146 As it is well known,147–149 the formation of a classical linear double bond (σ + π) requires the reaction between the triplet states of diradical fragments, so the higher ∑ΔES→T of the fragments (185 kcal/mol in the present case) with respect to the energy gain related to their linear double bond (estimated to be 146 kcal/mol for propadienone), the more the fragments tend to keep their singlet nature in a resulting bent structure, that anyway gives rise to some extent of conjugation, and thus represents the best compromise from an energetic point of view. The same considerations can be applied to propadienethione, for which the reduced ΔES→T of the :C = S fragment (79 kcal/mol), together with a quite similar energy gain for the linear double bond (144 kcal/mol), makes the linear C2v structure the preferred one.
In the light of the above model, that works very well also in predicting the C2v symmetry of shorter cumulenes and cumulene-thiones, it is clear that the substitution of the oxygen atom with a sulfur one causes the energy difference between singlet and triplet states of the involved diradical fragment to decrease sufficiently to stabilize the linear structure instead of the kinked one, thus justifying also the importance of electron correlation for a reliable ab − initio prediction of structural deformations.
As a concluding remark we would like to stress the fact the this example confirms the reasonableness of thinking at molecular systems as composed by fragments, or structural synthons, that tend to preserve their individualities and main structural features. This consideration further justifies the applicability and the successful results of our previously introduced TMA and LRA,54,55,61 both exploiting such common characteristics of the same moieties in different molecules.
As previously noted, B3LYP/SNSD and B2PLYP/VTZ predictions of r(C-S) and r(N-S) bond lengths suffer from large inaccuracy with respect to the reference SE parameters. Following the same approach (LRA) adopted in a previous work55, we looked for a linear relation between theoretical calculated parameters and SE ones by performing a linear regression of B2PLYP/VTZ r(C-S) bond lengths versus the corresponding re predictions (see Figure 3). In this way, defining B as the intercept for the linear regression and A as (slope−1), we can find out an empirical correction to improve the accuracy of re parameters:
Even if the number of items is not very large, Figure 3 and Table 4 suggest the good quality of the linear regression for both B3LYP/SNSD and B2PLYP/VTZ, with an R2 of 0.9991 and 0.9996 respectively. Further, it is remarkable that for the empirically corrected parameters the MAE is reduced to 0.0018 Å for B3LYP/SNSD and to 0.0011 Å for B2PLYP/VTZ, with a root-mean-square error (RMSE) of 0.0024 Å and 0.0015 Å, respectively. In order to check the reliability of LRA empirical correction for r(C-S), we can exploit 2-thiouracil and thiophene as test cases, both of them being already present in our database. The best equilibrium value for r(C2-S) in 2-thiouracil is 1.65020(3) Å90, that was obtained by employing highly accurate coupled cluster ab-initio parameters in a partial SE equilibrium structure optimization; B3LYP/SNSD and B2PLYP/VTZ predict bond lengths of 1.6693 Å and 1.6567 Å, respectively55. Applying eq. 13 to these values, we obtain a LRA empirical estimation for r(C2-S) in 2-thiouracil of 1.6464 Å and 1.6477 Å for B3LYP/SNSD and B2PLYP/VTZ, thus reducing the absolute error to 0.0038 Å and 0.0025Å, respectively. Concerning thiophene, our DFT calculations predict the r(C-S) distance to be quite longer than the best SE value of 1.7126 Å, with errors of 0.0278 Å (B3LYP/SNSD) and 0.0088 Å (B2PLYP/VTZ)54,55. Again eq. 13 significantly improves the results, reducing the absolute error to 0.0015 Å for B3LYP/SNSD and to 0.0026 Å in the case of B2PLYP/VTZ These two examples clarify how LRA empirical corrections allow one to significantly improve the DFT theoretical predictions for r(C-S) distances, and thus can be useful in order to have a trustworthy guess of the equilibrium structure employing only non expensive computational calculations. Looking at interpolating straight lines in Figure 3, it is clear that the overestimation error of B3LYP/SNSD rapidly increases with the lengthening of SE equilibrium value of r(C-S), whereas B2PLYP/VTZ suffers of a small quasi-systematic error, the slope of the linear regression line being 0.96.
In the light of the large number of new SE equilibrium carbon-hydrogen bond lengths determined in this work, we report in Table 4 also the results for the linear regression of all B2PLYP/VTZ r(C-H) bond lengths versus both B3LYP/SNSD and B2PLYP/VTZ re predictions present in our online database. The very small changes in A and B parameters with respect to those previously found55 are very satisfactory and, together with low MAE and RMSE of the LRA predictions, confirm the reliability of LRA approach for obtaining accurate estimates of r(C-H) bond lengths comparable with SE ones. It is worth to underline that, in the case of B3LYP/SNSD, Δr corrections reduce the MAE of 0.0062 Å and the RMSE of 0.0064 Å to 0.0008 Å and 0.0011 Å,respectively, thus significantly improving the accuracy of the final results.
Another interesting remark concerns the fact that in all the four cases reported in Table 4 the error distribution of LRA parameters is well centred on zero, confirming the good accuracy of the LRA, while the small values of the standard deviation (SD), together with the significant size of the samples analysed, allows the definition of a quite narrow confidence interval for searching the optimal SE equilibrium value, for instance through the so called predicate method14, starting from a not too expensive QM calculation.
Beyond the significant enlargement of our online datasets of molecular geometries, this work vouches for the achieved ripeness of our computational protocol in dealing with the calculation of accurate SE equilibrium molecular structures of semi-rigid organic molecules containing the most common atoms and moieties of interest in astrochemistry as well as in life- and material-sciences. In fact, we have improved and further tested the accuracy and robustness of our fitting procedure by means of a new software able to perform non-linear constrained optimizations, which will be fully described in a forthcoming paper, once extensive testing and implementation of some additional features will be accomplished. The 68 molecules collected till now represent a valuable source of benchmark data to be used in parametrization and validation studies, as well as input information in many kind of analyses and computations. It is remarkable that the SE technique coupled with DFT anharmonic calculations allows one to obtain, with a very high level of accuracy, the equilibrium structure of molecules containing a third row atom like sulfur, which considerably increases the computational cost of high level post-Hartree-Fock methods because of their unfavourable scaling with the number of electrons.
Next these results can be exploited to find the best compromise between accuracy and computational cost in deriving equilibrium structure of molecular systems. In particular, in the context of the linear regression approach applied to carbon-hydrogen and carbon-sulfur bond lengths, it has been possible to define trustworthy empirical corrections, that permit one to make use of cost-effective, fully quantum mechanical, B3LYP/SNSD and B2PLYP/VTZ results to reach highly accurate estimates of equilibrium values. In this context, the large number of accurate SE parameters permits a profitable use of statistical estimates in order to extract reliable information about molecules’ structural features and their evaluations by different approaches.
The reliable and robust performances offered by our protocol in dealing with medium-sized semi-rigid molecules, together with the dimension of the gathered dataset and the optimized tools developed till now, make us confident that it will be possible to obtain satisfactory results also for larger systems, which will be the focus of our future efforts because of their great interest in many areas of current fundamental and applied research.
Experimental rotational constants, calculated at B3LYP/SNSD and B2PLYP/VTZ levels of theory, and calculated at B3LYP/AVTZ level are reported for all the species discussed in the text, together with geometry optimizations of molecules reported in Table 2 at M06-2X/VTZ and BHandH/VTZ levels of theory.
The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013)/ERC Grant Agreement n. .