Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2853773

Formats

Article sections

Authors

Related links

J Chem Theory Comput. Author manuscript; available in PMC 2011 January 7.

Published in final edited form as:

J Chem Theory Comput. 2010 January 7; 6(2): 405–411.

doi: 10.1021/ct9006635PMCID: PMC2853773

NIHMSID: NIHMS169372

Department of Chemistry and the Quantum Theory Project, 2328 New Physics Building, P.O. Box 118435, University of Florida, Gainesville, Florida 32611-8435

See other articles in PMC that cite the published article.

The ability to perform *ab initio* electronic structure calculations that scales linearly with the system size is one of the central aims in theoretical chemistry. In this study, the implementation of the divide-and-conquer (DC) algorithm, an algorithm with the potential to aid the achievement of true linear scaling within Hartree-Fock (HF) theory is revisited. Standard HF calculations solve the Roothaan-Hall equations for the whole system; in the DC-HF approach, the diagonalization of the Fock matrix is carried out on smaller subsystems. The DC algorithm for HF calculations was validated on polyglycines, polyalanines and eleven real three-dimensional proteins of up to 608 atoms in this work. We also found that a fragment-based initial guess using molecular fractionation with conjugated caps (MFCC) method significantly reduces the number of SCF cycles and even is capable of achieving convergence for some globular proteins where the simple superposition of atomic densities (SAD) initial guess fails.

*Ab initio* quantum mechanical methods have been developed over the past several decades and successfully applied to the study of the chemical properties for small to moderate-sized molecules. The routine application of these full quantum mechanical calculations on macromolecules (molecules containing greater than 500 atoms) continues to pose a great challenge for theoretical chemists. The major limitation of *ab initio* methods is the scaling problem, since the computational cost of *ab initio* methods increases considerably as the size of the molecule increases. For instance, Hartree-Fock (HF)^{1} and Density Functional Theory (DFT)^{2} scale as O(*N*^{4}), post-Hartree-Fock MP2^{3} scales as O(*N*^{5}) and the coupled cluster(CC)^{4}^{-}^{9} method that includes single and double excitations (CCSD) scales as O(*N*^{6}). In modern HF calculations, the computational cost for the 2-electron integrals can be reduced from O(*N*^{4}) to O(*N*^{2}) using a simple screening technique^{10}. Hence, the dominant step for large molecules becomes the matrix diagonalization, which scales as O(*N*^{3}). In this study, our goal was to reduce the computational cost of the diagonalization step in HF calculations to linear with system size.

The state-of-the-art linear-scaling algorithms, which make the computational cost scale linearly *O*(*N*) with the system size, have attracted the focus of many theorists during the past decade.^{11}^{-}^{21} Much effort has been devoted to the development of linear-scaling methods in order to compute the total energy of large molecular systems at the Hartree-Fock (HF) or density functional theory (DFT) level.^{12}^{,}^{15}^{,}^{18}^{,}^{22}^{-}^{26} One of the challenges is to speed up the calculation of long-range Coulomb interactions when assembling the Fock matrix elements. Fast multipole based approaches have successfully reduced the scaling in system size to linear^{14}^{,}^{16}^{-}^{18}^{,}^{25} and made HF and DFT calculations affordable for larger systems when small to moderate sized basis sets are utilized. The more recently developed Fourier Transform Coulomb method of Fusti and Pulay^{27}^{,}^{28} reduced the steep O(N^{4}) scaling in basis set size to quadratic and makes the calculations much more affordable with larger basis sets.^{29} There is also a class of fragment-based methods for quantum calculation of protein systems including the divide and conquer (D&C) method of Yang^{22}, Yang and Lee,^{23} Dixon and Merz,^{30} Gogonea *et al.*,^{31} Shaw and St-Amant,^{32} and Nakai and co-workers,^{33}^{-}^{36} the adjustable density matrix assembler (ADMA) approach method of Exner and Mezey,^{26}^{,}^{37}^{-}^{39} the fragment molecular orbital (FMO) method of Kitaura and co-workers,^{13}^{,}^{40}^{,}^{41} and the molecular fractionation with conjugate caps (MFCC) approach developed by Zhang and co-workers.^{42}^{,}^{43} Most applications of these methods to protein systems have been largely limited to semiempirical, HF and DFT calculations. Among these approaches, FMO has been applied to higher level *ab initio* calculations such as second-order Møller-Plesset perturbation theory (MP2)^{44} and coupled cluster theory (CC).^{45} Nakai and co-workers have recently proposed DC-MP2^{33}^{,}^{36}^{,}^{46} and DC-CCSD^{47} approaches; however, only systems of linear chains or near-linear chains have been tested so far for the divide-and-conquer algorithm for *ab initio* calculations.

In the DC algorithm, the total system is divided into small fragments. Atoms within adjustable buffer regions surrounding each fragment are included in the calculations to preserve the chemical environment of the divided subsystem. A set of local Roothaan-Hall equations is then solved for each subsystem and an approximate full density matrix of the entire molecular system is built up from subsystem contributions. By solving the HF self-consistent field (SCF) equation iteratively, the final converged full density matrix is used to obtain the total energy of the entire system. In this manner, linear scaling of the Fock matrix diagonalization step is achieved as a result of the fact that a set of smaller subsystem Fock matrices is diagonalized in the DC-HF approach rather than the global Fock matrix diagonalization for traditional HF calculations. Furthermore, divide-and-conquer calculations may be efficiently parallelized because the individual subsystem calculations are solved separately. In the DC-HF approach, the memory usage will increase linearly as the size of the system increases, which is also an important feature of this approach.

The aim of our current research is to further develop and validate the divide-and-conquer (DC)^{22}^{,}^{23}^{,}^{30}^{,}^{32}^{,}^{46}^{-}^{48} methodology to aid in the application of *ab initio* methods to biomacromolecules. In this study, our goal is to validate divide-and-conquer algorithm for Hartree-Fock calculations on globular proteins. Moreover, we propose a fragment-based initial guess using molecular fractionation with conjugated caps (MFCC) method to reduce the number of SCF cycles, and different division schemes are compared.

In protein systems, the divide-conquer approach is based on the chemical locality; this assumes that local regions of a protein are only weakly influenced by the atoms that are far away from the region of interest. The whole system is divided into fragments called core regions (*Core*^{α}). A buffer region (*Buffer*^{α}) is assigned for each core region to account for the environmental effects. The combination of every core region and its buffer region constitutes each individual subsystem (*R*^{α}) as illustrated in Figure 1. Local MOs of each subsystem are solved by the Roothaan-Hall equation

$${F}^{\alpha}{C}^{\alpha}={S}^{\alpha}{C}^{\alpha}{E}^{\alpha}$$

(1)

where *F*^{α} and *S*^{α} are local Fock matrix and local overlap matrix, respectively.

$${F}_{\mu \nu}^{\alpha}=\{\begin{array}{cc}{F}_{\mu \nu}\hfill & \phantom{\rule{1em}{0ex}}\text{if}\phantom{\rule{thickmathspace}{0ex}}{\chi}_{\mu}\in {R}^{\alpha}\phantom{\rule{thickmathspace}{0ex}}\text{and}\phantom{\rule{thickmathspace}{0ex}}{\chi}_{\nu}\in {R}^{\alpha}\hfill \\ 0\hfill & \phantom{\rule{1em}{0ex}}\text{elsewhere}\hfill \end{array}\phantom{\}}$$

(2)

After the local MO coefficient matrices *C*^{α} are obtained, the total density matrix of the whole system is given by

$${P}_{\mu \nu}=\sum _{\alpha =1}^{{N}_{sub}}{P}_{\mu \nu}^{\alpha}=\sum _{\alpha =1}^{{N}_{sub}}{D}_{\mu \nu}^{\alpha}{p}_{\mu \nu}^{\alpha}$$

(3)

where ${D}_{\mu \nu}^{\alpha}$ is the partition matrix,

$${D}_{\mu \nu}^{\alpha}=\{\begin{array}{cc}1\hfill & {\varphi}_{\mu}\in Cor{e}^{\alpha}\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{thickmathspace}{0ex}}{\varphi}_{\nu}\in Cor{e}^{\alpha}\hfill \\ 1\u22152\hfill & {\varphi}_{\mu}\in Cor{e}^{\alpha}\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{thickmathspace}{0ex}}{\varphi}_{\nu}\in Buffe{r}^{\alpha}\phantom{\rule{thickmathspace}{0ex}}\text{or}\phantom{\rule{thickmathspace}{0ex}}{\varphi}_{\mu}\in Buffe{r}^{\alpha}\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{thickmathspace}{0ex}}{\varphi}_{\nu}\in Cor{e}^{\alpha}\hfill \\ 0\hfill & {\varphi}_{\mu}\notin Cor{e}^{\alpha}\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{thickmathspace}{0ex}}{\varphi}_{\nu}\notin Cor{e}^{\alpha}\hfill \end{array}\phantom{\}}$$

(4)

and ${p}_{\mu \nu}^{\alpha}$ is the local density matrix defined by

$${p}_{\mu \nu}^{\alpha}=\sum _{i}^{LMOs}{n}_{i}^{\alpha}{C}_{\mu i}^{\alpha}{C}_{\nu i}^{\alpha \ast}$$

(5)

where ${n}_{i}^{\alpha}$ is a smooth approximation to the Heaviside step function:

$${n}_{i}^{\alpha}=\frac{2}{1+\text{exp}[({\epsilon}_{i}^{\alpha}-{\epsilon}_{F})\u2215kT]}$$

(6)

*ε*_{F} is determined through the normalization of the total number of electrons of the whole system.

$${N}_{elec}=\sum _{\alpha}\sum _{\mu}{\left({P}^{\alpha}{S}^{\alpha}\right)}_{\mu \mu}$$

(7)

After the density matrix is converged, the total HF energy is given as

$${E}_{HF}^{DC}=\frac{1}{2}\sum _{\alpha}\sum _{\mu \nu}{P}_{\mu \nu}^{\alpha}({H}_{\mu \nu}^{\alpha}+{F}_{\mu \nu}^{\alpha})$$

(8)

where ${H}_{\mu \nu}^{\alpha}$ is the local one-electron core Hamiltonian matrix similar to the definition of local Fock matrix in equation 2.

For HF calculations on large systems, the construction of the Coulomb matrix and exchange matrix along with the diagonalization of the Fock matrix constitute the three most time-consuming steps. The Hamiltonian matrix diagonalization intrinsically scales as O(*N*^{3}). In the divide-and-conquer scheme the diagonalization calculation is performed on each submatrix, which will naturally make the SCF diagonalization step scale linearly with the number of subsystems. However, it is important to realize that the divide-and-conquer algorithm does not help to reduce the scale of computation of the Coulomb matrix and exchange matrix. The continuous fast multipole method (CFMM)^{14}^{,}^{16}^{-}^{18}^{,}^{25}^{,}^{49}^{-}^{51} and the linear exchange K approach (LinK)^{52}^{,}^{53} provide ways in which the Coulomb matrix and exchange matrix can be made linear-scaling, respectively.

Here we introduce a fragment-based initial guess for *ab initio* calculations using the molecular fractionation with conjugate caps (MFCC) algorithm as described elsewhere^{42}^{,}^{54}^{,}^{55}. In the spirit of the MFCC approach, the full density matrix of the protein can be assembled by a linear combination of fragment density matrices

$${P}_{\mu \nu}=\sum _{i=1}^{{N}_{f}}{P}_{\mu \nu}^{f}\left(i\right)-\sum _{j=1}^{{N}_{c}}{P}_{\mu \nu}^{cc}\left(j\right)$$

(9)

where ${P}_{\mu \nu}^{f}\left(i\right)$ is the density matrix element of the *i*th protein fragment, ${P}_{\mu \nu}^{cc}\left(j\right)$ is the density matrix element of the *j*th conjugate cap. *N*_{f} and *N _{c}* are the total number of the protein fragments and conjugate caps, respectively. The MFCC partition scheme to cut a protein into amino-acid fragments and conjugate caps is shown in Figure 2. First, a series of single point HF calculations on all the fragments and conjugate caps are performed, then the full density matrix of the protein obtained using the converged fragment density matrices based on equation 9 is taken as the initial guess for the subsequent divide-and-conquer HF calculations. All the

In this section we assess the DC-HF approach performance by performing calculations on two types of simple systems: extended polyglycine (gly)_{n} and an alpha-helix of polyalanine (*α* – (ala)_{n} see Figure 3). All the calculations discussed here use the 6-31G* basis set. A buffer radius of *R _{b}* = 5.0 Å was adopted for all DC-HF calculations. Within this definition we include all the residues that contain any atoms within 5Å from the core region as part of the buffer region. A comparison of the CPU time required to solve the SCF equations on the extended polyglycine (gly)

The subsetting schemes for divide-and-conquer calculations on the extended polyglycine (Gly)_{n} (upper) and polyalanine in an *α* – helical structure (*α* – (Ala)_{n}, bottom).

The average computational time to diagonalize the Fock matrix in each SCF cycle using traditional HF and DC-HF for a series of extended polyglycines at the HF/6-31G* level.

The accuracy of the total energy calculated by the DC-HF approach on extended polyglycine systems compared to full system calculations.

In the current DC-HF approach, the scale for the computation of the Coulomb matrix is still *O*(*N*^{2}) after prescreening the two-electron integrals^{10}. When we apply equation 2 to construct the subsystem Fock matrix, the long-range Coulomb interactions between the local subsystem and distant atoms cannot be circumvented, thus, it should be emphasized that the D&C algorithm itself does not reduce the scale of Coulomb and exchange matrix evaluations and other approaches are necessary to achieve this result (*e.g.*, CFMM)^{14}^{,}^{16}^{,}^{17}^{,}^{49}.

Next we compare the number of SCF cycles necessary to reach convergence when the SAD (superposition of atomic densities) and MFCC initial guesses are used in the divide and conquer scheme using the 6-31G* basis set (see Table 1). The convergence criterion in all examples was set to 10^{−6} a.u. on the root-mean-squared change of the density matrix elements and 10^{−4} a.u. for the maximum change of the density matrix elements. Nakai and co-workers^{35} and Shaw and St-Amant^{32} have noted that DIIS causes SCF calculations to oscillate at the final stage of the SCF convergence process due to the slight errors introduced by the DC approximation for assembling the density matrix (see equation 3). In our HF DC calculations, the DIIS technique was turned off when the root-mean-squared change of the density matrix elements reaches 10^{−4} a.u.. We also found that although DIIS works in the early stages of the SCF procedure, we get the best performance when only two previous Fock matrices were stored in the DIIS calculations. One can see from Table 1 that the HF DC calculations usually requires more SCF cycles than the non-DC runs, however, for the polyglycine and polyalanine systems, the MFCC initial guess helps to reduce the number of SCF cycles in both DC and non-DC cases.

Previously, all the calculations used a residue based definition for the core region. We have also examined an atom based subsetting strategy for the core region in polyglycines and polyalanines. One can see from Table 2, the converged total energies using atom-centric core region were almost identical to those using a residue-based cutoff. Indeed, the overall mean unsigned deviation is as low as 0.054 kcal mol^{−1}. This is an attractive aspect of the divide and conquer approach since it allows for parallelization at the atom level rather than at the much larger reside based cutoff scheme.

No previous studies have utilized the divide-and-conquer HF approach on three-dimensional globular proteins. In order to address this point, we have validated the accuracy of divide-and-conquer HF/6-31G* calculations on eleven real proteins. The systems ranged from 304 atoms to 608 atoms and are listed in Table 3. The proteins consisted of *α* – helical structures (see Figure 8a) or are mixed *α* – *β* – structures (see Figure 8b). As shown in Table 3, the largest deviation is 2.25 kcal mol^{−1} and the overall mean unsigned deviation is only 0.97 kcal mol^{−1} compared to standard full system calculations. Importantly, the observed error is large than what was observed for the 0ne-dimensional examples, but is still within acceptable limits. This study sets the stage for the wide application of divide-and-conquer calculations on real protein systems. Furthermore, we have found that for five proteins, the divide-and-conquer HF calculations are not able to reach convergence using the simple SAD initial guess, while all the cases converged using the MFCC initial guess. Therefore, we conclude that the quality of initial guess plays an important role in insuring the convergence of divide-and-conquer calculations. Fragment-based electron density provides a much better quality initial guess with linear-scaling computational cost and, ultimately, much less computational time when compared to full system calculations.

In this study, the divide-and-conquer HF theory was revisited in order to examine its potential to study three-dimensional constructs and a new and effective initial guess scheme was introduced. We first validated the accuracy of the divide-and-conquer HF/6-31G* calculations on eleven three-dimensional globular proteins. The overall mean unsigned error was within 1 kcal mol^{−1} when compared to standard full system calculations. Furthermore, we found that the fragment-based initial guess using the MFCC approach reduces the number of SCF cycles for polyglycine and polyalanine systems. Moreover, the MFCC initial guess facilitated SCF convergence for several of the globular proteins, where the SAD initial guess was unable to yield a converged wavefunction.

We thank the NIH (GM GM044974) for financial support of this research. Computing support from the University of Florida High Performance Computing Center is gratefully acknowledged.

1. Szabo A, Ostlund NS. Modern quantum chemistry : introduction to advanced electronic structure theory. 1st. ed. McGraw-Hill; New York: 1989.

2. Parr RG, Yang WT. Annual Review of Physical Chemistry. Vol. 46. 1995. p. 701. [PubMed]

3. Møller C, Plesset MS. Physical Review. Vol. 46. 1934. p. 0618.

4. Bartlett RJ, Musial M. Reviews of Modern Physics. 2007;79:291.

5. Čížek J. Journal of Chemical Physics. 1966;45:4256.

6. Crawford TD, Schaefer HF. Reviews in Computational Chemistry, Vol 14. 2000;14:33.

7. Kállay M, Gauss J. Journal of Chemical Physics. 2005;123:214105. [PubMed]

8. Kállay M, Surján PR. Journal of Chemical Physics. 2001;115:2945.

9. Bomble YJ, Stanton JF, Kállay M, Gauss J. Journal of Chemical Physics. 2005;123:054101. [PubMed]

10. Strout DL, Scuseria GE. Journal of Chemical Physics. 1995;102:8448.

11. Schwegler E, Challacombe M. Journal of Chemical Physics. 1996;105:2726.

12. Goedecker S. Reviews of Modern Physics. 1999;71:1085.

13. Fedorov DG, Kitaura K. Journal of Physical Chemistry A. 2007;111:6904. [PubMed]

14. Challacombe M, Schwegler E. Journal of Chemical Physics. 1997;106:5526.

15. Friesner RA, Murphy RB, Beachy MD, Ringnalda MN, Pollard WT, Dunietz BD, Cao YX. Journal of Physical Chemistry A. 1999;103:1913.

16. White CA, Johnson BG, Gill PMW, Head-Gordon M. Chemical Physics Letters. 1994;230:8.

17. White CA, Johnson BG, Gill PMW, Head-Gordon M. Chemical Physics Letters. 1996;253:268.

18. Scuseria GE. Journal of Physical Chemistry A. 1999;103:4782.

19. Korchowiec J, Lewandowski J, Makowski M, Gu FL, Aoki Y. Journal of Computational Chemistry. 2009;30:2515. [PubMed]

20. Jiang N, Ma J, Jiang YS. Journal of Chemical Physics. 2006;124:114112. [PubMed]

21. Daniels AD, Scuseria GE. Journal of Chemical Physics. 1999;110:1321.

22. Yang WT. Physical Review Letters. 1991;66:1438. [PubMed]

23. Yang WT, Lee TS. Journal of Chemical Physics. 1995;103:5674.

24. Kohn W. Physical Review Letters. 1996;76:3168. [PubMed]

25. Strain MC, Scuseria GE, Frisch MJ. Science. 1996;271:51.

26. Exner TE, Mezey PG. Journal of Physical Chemistry A. 2002;106:11791.

27. Fusti-Molnar L. Journal of Chemical Physics. 2003;119:11080.

28. Fusti-Molnar L, Pulay P. Journal of Chemical Physics. 2002;117:7827.

29. Shao Y, Molnar LF, Jung Y, Kussmann J, Ochsenfeld C, Brown ST, Gilbert ATB, Slipchenko LV, Levchenko SV, O'Neill DP, DiStasio RA, Lochan RC, Wang T, Beran GJO, Besley NA, Herbert JM, Lin CY, Van Voorhis T, Chien SH, Sodt A, Steele RP, Rassolov VA, Maslen PE, Korambath PP, Adamson RD, Austin B, Baker J, Byrd EFC, Dachsel H, Doerksen RJ, Dreuw A, Dunietz BD, Dutoi AD, Furlani TR, Gwaltney SR, Heyden A, Hirata S, Hsu CP, Kedziora G, Khalliulin RZ, Klunzinger P, Lee AM, Lee MS, Liang W, Lotan I, Nair N, Peters B, Proynov EI, Pieniazek PA, Rhee YM, Ritchie J, Rosta E, Sherrill CD, Simmonett AC, Subotnik JE, Woodcock HL, Zhang W, Bell AT, Chakraborty AK, Chipman DM, Keil FJ, Warshel A, Hehre WJ, Schaefer HF, Kong J, Krylov AI, Gill PMW, Head-Gordon M. Physical Chemistry Chemical Physics. 2006;8:3172. [PubMed]

30. Dixon SL, Merz KM. Journal of Chemical Physics. 1996;104:6643.

31. Gogonea V, Westerhoff LM, Merz KM. Journal of Chemical Physics. 2000;113:5604.

32. Shaw DM, St-Amant A. Journal of Theoretical & Computational Chemistry. 2004;3:419.

33. Kobayashi M, Nakai H. International Journal of Quantum Chemistry. 2009;109:2227.

34. Akama T, Fujii A, Kobayashi M, Nakai H. Molecular Physics. 2007;105:2799.

35. Akama T, Kobayashi M, Nakai H. Journal of Computational Chemistry. 2007;28:2003. [PubMed]

36. Kobayashi M, Akama T, Nakai H. Journal of Chemical Physics. 2006;125:204106. [PubMed]

37. Exner TE, Mezey PG. Journal of Computational Chemistry. 2003;24:1980. [PubMed]

38. Exner TE, Mezey PG. Journal of Physical Chemistry A. 2004;108:4301.

39. Exner TE, Mezey PG. Physical Chemistry Chemical Physics. 2005;7:4061. [PubMed]

40. Nakano T, Kaminuma T, Sato T, Fukuzawa K, Akiyama Y, Uebayasi M, Kitaura K. Chemical Physics Letters. 2002;351:475.

41. Fedorov DG, Kitaura K. Chemical Physics Letters. 2006;433:182.

42. Zhang DW, Zhang JZH. Journal of Chemical Physics. 2003;119:3599.

43. He X, Zhang JZH. Journal of Chemical Physics. 2005;122:031103.

44. Fedorov DG, Ishimura K, Ishida T, Kitaura K, Pulay P, Nagase S. Journal of Computational Chemistry. 2007;28:1476. [PubMed]

45. Fedorov DG, Kitaura K. Journal of Chemical Physics. 2005;123:134103. [PubMed]

46. Kobayashi M, Imamura Y, Nakai H. Journal of Chemical Physics. 2007;127:074103. [PubMed]

47. Kobayashi M, Nakai H. Journal of Chemical Physics. 2008;129:044103. [PubMed]

48. Dixon SL, Merz KM. Journal of Chemical Physics. 1997;107:879.

49. Schwegler E, Challacombe M. Journal of Chemical Physics. 1999;111:6223.

50. Burant JC, Strain MC, Scuseria GE, Frisch MJ. Chemical Physics Letters. 1996;248:43.

51. Shao YH, White CA, Head-Gordon M. Journal of Chemical Physics. 2001;114:6572.

52. Ochsenfeld C. Chemical Physics Letters. 2000;327:216.

53. Ochsenfeld C, White CA, Head-Gordon M. Journal of Chemical Physics. 1998;109:1663.

54. Chen XH, Zhang JZH. Journal of Chemical Physics. 2006;125:044903. [PubMed]

55. Chen XH, Zhang YK, Zhang JZH. Journal of Chemical Physics. 2005;122:184105. [PubMed]

56. He X, Ayers K, Brothers E, Merz KM. QUICK. University of Florida; Gainesville; FL: 2008.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |