Binding energies of molecular solids from fragment and periodic approaches

We calculate binding energies of four molecular solids using the Hartree-Fock (HF) and second-order M{\o}ller-Plesset perturbation theory (MP2). We obtain the energies within many-body expansion (MBE) as well as using periodic boundary conditions (PBC) to compare both approaches. The systems we study are methane, carbon dioxide, ammonia, and methanol. We use tight convergence settings to obtain the binding energies with a high precision, we estimate the uncertainties to be only few tenths of percent. We discuss several issues that affect the quality of the results and which need to be considered to reach high precision for both MBE and within PBC. For example, HF as well as MP2 energies within PBC benefited from the use of real-space Coulomb cut-off technique, the convergence of energies within MBE was improved by modifying the order of summation. Finally, numerical noise made the evaluation of some of the MBE contributions difficult and the effect was reduced by using smaller basis sets for the less critical terms.


Introduction
Molecular solids are materials important in many areas of science and industry, for example as pharmaceuticals. However, it is often difficult to obtain their binding energy reliably from theoretical approaches due to the need to describe accurately both interand intra-molecular interactions [1]. For example, the electron density of the molecule and its response to the environment as well as electron dispersion interactions (van der Waals forces) need to be accounted for reliably.
The natural way to treat molecular solids is by using periodic boundary conditions (PBC). Currently, density functional theory (DFT) approximations are widely used within PBC to study molecular solids [2]. However, due to issues related to, e.g., charge delocalization, the reliability of semi-local DFT approximations can be insufficient for properties such as energy differences between polymorphs or solid phases even when dispersion corrections are employed [3,4]. Hybrid DFT functionals reduce the delocalization errors and lead to more reliable energy differences [3,4]. Despite the higher accuracy, errors in binding energies can still have considerable spread for systems with similar binding character [5]. More consistent results can be obtained from correlated methods, even from simpler ones such as second-order Møller-Plesset perturbation theory (MP2) [6] or the random-phase approximation (RPA) [5,[7][8][9]. The use of the reference quality coupled clusters scheme within PBC is currently limited by its computational cost [10][11][12]. Finally, quantum Monte Carlo (QMC) was shown to offer a very high accuracy [13]. However, QMC as well as the other correlated methods usually require a careful set-up by an experienced user.
As an alternative, the many-body expansion (MBE) can be used to obtain binding energies of molecular solids [14,15] as well as other properties [16]. Its main advantage is the possibility to use computational methods which are too demanding within PBC, such as coupled clusters at the reference CCSD(T) level. Within MBE, the total energy is obtained as a sum of two body energies with non-additive three-, four-, and higherorder corrections. The total computational cost can be affordable if the MBE converges quickly with the size of the fragment (dimer, trimer, . . . ) and with the number of included fragments for each fragment size. The convergence can be problematic for systems with important long-range electrostatic interactions [17]. This issue can be rectified by, e.g., embedding the fragments [14,18,19] or by using a force-field to account for more distant fragments [20]. However, it's not clear what precision can be ultimately reached with these approaches [21]. Finally, one can reduce the slow convergence of MBE by starting from a less accurate but affordable calculation within PBC and use MBE with a higher-level scheme to improve the result [22][23][24].
In this work we consider the binding energies at the HF and MP2 levels. For these methods one can find published results that vary by more than 20% [9,25]. A similar situation can be observed for hybrid functionals as they include the HF-like energy. The origin of such deviations comes likely from the fact that the calculations are computationally demanding and the total or binding energies can strongly depend on several numerical parameters. To analyze some of the issues we employ in this work both MBE and PBC to obtain HF and MP2 binding energies of four crystals. For all the systems and methods we try to reach a very high precision of the results. This allows to analyze the effect of the different parameters and finally reach a very good agreement between the data.

Systems
We selected four systems for our study: methane, carbon dioxide, ammonia, and methanol. We made this choice to include systems with different binding properties. Specifically, the long-range electrostatic interactions are negligible for methane, but are more important for the other systems. Moreover, hydrogen bonds can be expected to dominate the binding of methanol. We expect that the different roles of the various interactions will affect the convergence behaviour of the calculations. Finally, the molecules are small, with up to 20 electrons, which simplifies the calculations.
The structures of carbon dioxide and methanol were obtained from the Crystallography Open Database [26,27]. For methane, we used a high symmetry structure with fcc packing and the structure of ammonia is the model N obtained by Boese et al. [28] based on the structure presented in Ref. [29]. All the structures were used without further optimization and are available in the data repository. We list their properties in Table 1. Table 1. The crystals selected for this study, the volumes of the unit cell V , the number of molecules in the unit cell Z, and the dipole moment of the isolated monomer (experimental value from [30]

Methods
Within the periodic boundary conditions (PBC) approach, the binding energy of a molecular crystal for method M, E M bind , is given by: where E M cell is the total energy of a unit cell, Z is the number of molecules per unit cell, and E M mol is the total energy of an isolated molecule in the gas phase. The calculations within periodic boundary conditions were performed using the Vienna ab-initio simulation package (VASP) [33,34]. The HF and MP2 energies were obtained using the algorithms described in Refs. [35][36][37][38]. Moreover, the so-called direct MP2 energy was evaluated using the random-phase approximation (RPA) routines in VASP [39][40][41]. We used the "hard" projector-augmented wavefunction (PAW) datasets as supplied with VASP to minimize the error related to using pseudo-states. The used basis-set sizes are discussed in the Results section but in general we used orbital cutoffs of at least 600 eV. The cut-off values for overlap densities and related properties (ENCUTGW) were one half of the orbital cut-offs. The real-space cut-off technique [42] was used for HF calculations (HFRCUT), leading to faster convergence with k-points and cell volumes, as discussed in Sections 4.1 and 4.2. The input files are provided in the repository.
The evaluation of the MP2 correlation energy is memory demanding due to the need to store all the overlap densities between occupied and virtual states for all the k-point pairs. For large number of k-points and basis-set sizes the calculation needs to be run over a large number of compute nodes so that the total memory is sufficient. However, this reduces the performance of the code as our systems are rather small and do not scale well. To bypass this issue, we modified the code to evaluate the MP2 energy only for a single combination of k-points at each run which substantially reduces the memory requirements per calculation.
Within the MBE (or fragment) approach, the binding energy of a molecular crystal is assembled from a series of dimer interaction energies of a single arbitrary "reference" molecule with all other molecules in the crystal and higher-order n-body (n = 3, 4, . . .) non-additive interaction energies: where the index 0 denotes the reference molecule and indices i, j, k, . . . other molecules in the crystal. The summation thus runs over all non-equivalent dimers, trimers, etc., in a crystal, and Here, E i , E 0,i , E 0,i,j , and E 0,i,j,k is a monomer, dimer, trimer, and tetramer energy, respectively. As the summations in Equation 2 are infinite, a cut-off distance r cut needs to be introduced above which the contributions are neglected. We define distance between molecules as the average distance between all the pairs of atoms of the two respective molecules. For trimers and tetramers, the total intermolecular distance is a sum of distances of all the pairs of molecules in the fragment. The n-body energy for a given cut-off distance r cut is then the sum of the all the contributions with total intermolecular distance below r cut .
Apart from distance criteria we also define "shells" of unit cells around the unit cell with the reference molecule. The unit cell with the reference molecule is "shell 0", the 26 unit cells around it are "shell 1", the next 98 cells belong to "shell 2" and so on. This is schematically shown for two dimensions in Figure 1 with green, blue, and yellow used for shell 0, shell 1, and shell 2. We use then summation based on index of the shell, i.e., adding all contributions from molecules in a given shell, as it leads to improved convergence compared to the standard distance-based criterion. The set-up of the MBE fragments was done using an in-house library written in Python. This library also contains functions for extraction and analysis of the results. Only the energies of symmetry inequivalent fragments were computed. The symmetry equivalent fragments were obtained by grouping the fragments according to distance and comparing the eigenvalues of the Coulomb matrix [43] for fragments with the same distance, similar to the scheme presented by Borca and co-workers [44]. Note that to sum the energies according to shells we need the full list with all the fragments, even symmetry equivalent. The necessary scripts are in the data repository.
Finite cluster HF and MP2 energies were obtained using the Molpro package [45]. We used the Dunning's correlation consistent basis sets with augmentation functions [46]. The basis-set convergence of interaction energies to the complete basis-set limit was accelerated by using the explicitly correlated (F12) variant of MP2 [47][48][49][50]. The convergence of HF energies with the basis-set size was improved by the complete auxiliary basis-set (CABS) correction [49,51]. The canonical HF and MP2 calculations were run without density fitting. The MP2-F12 uses density fitting and we have tested both the use of a fitting basis-set with the same cardinal number as used for orbital basis and a higher cardinal number [52][53][54]. All energies contributing to an n-body term are computed using the basis set of the corresponding n-body cluster in order to avoid the basis set superposition error (BSSE) [55][56][57].
To increase the precision of the results we set the convergence settings in Molpro as tight as possible. Specifically, we used energy=1.e-16, orbital=1.e-11, lowered the screening thresholds for the electron repulsion integrals, and set a tight convergence for F12 "forb=1.e-15". It was shown that weak convergence criteria can lead to a numerical noise that can dominate over the actual energy contributions, especially for three-body or four-body terms [58]. Indeed, for loose settings we observed that the errors of different fragments do not cancel but accumulate so that the binding energy diverges.

Hartree-Fock energy
The evaluation of the HF energy in PBC is the simplest one from the four types of calculations performed in this work (HF and MP2 in PBC or MBE). However, it is also a calculation where considerable errors can be introduced. The reason for this is the Coulomb singularity in the exchange energy which leads to an error proportional to the volume of the Γ-point in the reciprocal space [36,59]. The HF energies then converge as 1/N k , where N k is the number of k-points or as 1/V where V is the volume of the simulation cell, e.g., for an isolated molecule. This error can be essentially mitigated when the Coulomb interaction is cut in the real space outside of a specified radius [42].
In Figure 2 we show the HF energies (printed out by VASP) of methane molecule and solid for different unit cells and k-points, respectively, without and with the use of the real-space cut-off scheme. As is evident, the cut-off improves the convergence substantially. For solid the energy differs from the converged value by around 0.01 kJ/mol already for 2×2×2 k-points and for molecule a similar convergence is achieved for a cell with 12Å side. For the same k-point set and cell volume the standard HF energies for solid and molecule have errors of 31.7 and 8.4 kJ/mol, respectively. Clearly, the standard HF energies need to be extrapolated to infinite volume to remove the error.
We note that this issue also occurs for hybrid DFT functionals and needs to be taken into account when energies obtained in different cells are compared, such as for adsorption energies or different phases of solids. The problem can be difficult to notice due to the typically smaller amount of Fock exchange in hybrid functionals. Moreover, if one uses k-point set and cell volume for which DFT energies typically converge for insulators or semiconductors (supercell side around 20Å), the errors partially cancel, even though in an uncontrolled manner. Also the problem is not present for functionals which screen the long-range part, such as HSE [36].
We observe a similar improvement of the convergence with k-points and volume for the other systems as well when the Coulomb cut-off is used. A 2×2×2 k-point sampling is sufficient to obtain the energies of ammonia, carbon dioxide, and methanol solids to within 0.1 kJ/mol per molecule. For a 3×3×3 k-point grid the energies differ by around 0.001 kJ/mol per molecule from a denser k-point grid and can be thus considered converged. Note, however, that this small difference is observed for high basis-set cutoffs (above approx. 1200 eV), the energy differences can be larger for smaller basis sets. Finally, the energies of molecules that have a non-zero dipole moment need to be extrapolated to infinite cell volume. This extrapolation corrects the energy of ammonia molecule by only 0.02 kJ/mol compared to a value in a cell with a 21Å side, for methanol the correction is 0.2 kJ/mol for the same settings. For methane and carbon dioxide we simply used a large enough cell to obtain E HF mol . The final binding energies were calculated from data obtained with a very high basis-set cut-off, 2100 eV for methane and methanol and 2000 eV for ammonia and carbon dioxide. While the binding energies are converged to within 0.1 kJ/mol for a basis-set cut-off of 1000 eV, we opted for a higher cut-off to reduce the uncertainty to around 0.01 kJ/mol. The final values of the HF energies are summarized in Table 2. As expected, the HF contribution is repulsive for methane and the binding increases in magnitude with the increased importance of electrostatic interactions in the systems.

MP2 energy
As with the evaluation of the HF energy, one encounters several possible issues when calculating the MP2 energy within periodic-boundary conditions and plane-wave basisset [37]. First, the MP2 energy formula contains a term with zero denominator. In VASP, this contribution is evaluated approximately using the derivative of the wavefunction with respect to k-points (WAVEDER file) [60]. Second, additional error occurs if the HF states used in the MP2 calculation were not obtained with the Coulomb cut-off technique. Moreover, the MP2 energy still possesses dependence on k-points or cell volume, this dependence can be reduced by the scheme developed by Liao and Grüneis [61]. Finally, the MP2 energy depends strongly on the basis set used to expand the orbitals and the overlap densities [37] so that extrapolation to the CBS limit is needed. We shortly discuss this last point before presenting the results. The leading term of the MP2 energy errors due to incomplete basis set is proportional to M −1 , where M is the number of basis functions. For a plane-wave basis set this can be rewritten as E −3/2 cut [37,39,62,63]. This dependence holds for molecule as well as solid and to obtain a converged binding energy one thus needs to extrapolate the energies to the complete basis-set limit. There are, in principle, two ways how to do this. First, one can obtain the MP2 energies of solid and molecule for different cut-offs, extrapolate each of them to the complete basis set (CBS) limit, and subtract them to obtain the binding energy. Second, one can calculate the binding energies for different cut-offs and extrapolate the binding energy to the CBS limit. It was shown before that for molecular solids the dependence of the binding energy on the basis-set size can differ from the one derived for the total energy and that higher-order errors, the first being proportional to E −5/2 cut , can dominate [5]. We therefore use the second approach where one fits the binding energy directly.
We now turn to the calculation of MP2 correlation energies of isolated molecules. We evaluate the energies at several basis-set cut-offs to enable extrapolation to the CBS limit, using 600 eV as the lowest value and increasing the cut-off in steps of 100 eV up to 1300 or 1400 eV for methane. Moreover, for each cut-off, we obtain the energies for different cell sizes starting from 7Å and increasing in steps of 1Å to enable extrapolation to the infinite cell volume. Since MP2 memory requirements grow quickly with increasing basis-set or simulation cell size, we were only able to use simulation cells up to 12Å with an orbital cut-off of 1000 or 1100 eV. Smaller cells were used for higher cut-offs due to memory limits. We find that increasing the orbital cut-off by 100 eV reduces the computationally tractable cell-size by 1Å.
The extrapolation to the infinite cell volume can be done in several ways. It's important to note that for larger cut-offs (above 1200 eV) we have less datapoints available, limiting the accuracy of the fit. For smaller cut-offs (600 to 900 eV), the data contain numerical noise reducing the accuracy of the result as well. After some testing we used to following strategy: First, we fit the data obtained for the cut-off energy of 1000 eV in cells sized between 8 and 12Å using a function E(x) = E ∞ + Cx −6 [39]. Second, we fix the coefficient C and fit the data for the other orbital cut-offs, again for cells with a side of at least 8Å. This procedure reduces the effect of numerical noise for small cut-offs as all the relevant values are used.
We now discuss the evaluation of MP2 energy for solids which is the most computationally demanding part of the calculations within PBC. The MP2 energy can be divided into two contributions: the direct and exchange MP2, dMP2 and xMP2, respectively. It is possible to evaluate the first efficiently using the RPA algorithm which scales linearly with the number of k-points N k [40,41]. This reduces the computational time significantly so that the dMP2 term can be evaluated for dense k-meshes for which the energy is converged to few hundredths of kJ/mol, as discussed below. Fortunately, the xMP2 energy, for which the compute time scales as O(N 3 k ), converges faster with N k and similar precision is achieved for coarser meshes.
The convergence of the dMP2 energy with the number of k-points for ammonia for three different basis sets is shown in Figure 3. The MP2 correlation energy converges as 1/N 2 k which can be used to extrapolate the values to infinite k-point grid, such extrapolations obtained from two adjacent k-points are shown with the dotted lines.
All the values in the graph were shifted so that the values obtained by extrapolation of the N k = 3 3 and N k = 4 3 energies are zero. One can see that already for the 2×2×2 k-point set the energy is converged to within 0.1 kJ/mol, which might be sufficient for many applications. Evaluation using the 3×3×3 k-point mesh reduces the error by almost an order of magnitude, at least for the 800 and 1000 eV cut-offs. The errors are larger for the 600 eV cut-off which we attribute to numerical noise. The extrapolation works very well for ammonia, even when only the Γ-only and 2×2×2 energies are used. Note that the value for the Γ-only calculation is at around 5 kJ/mol. The extrapolation is less efficient for the other systems, we only use it as an additional tool to assess the convergence with respect to the k-point set. Also note that all the values are based on HF calculation for which the Coulomb cut-off method was used. Without it, the errors are 1.4, 0.3, and 0.1 kJ/mol for the 2×2×2, 3×3×3, and 4×4×4 k-point grids, respectively.
The computational and memory requirements limit again the basis-set cut-off which can be used with dense k-point meshes. However, the similar dependence of the MP2 error for the 800 and 1000 eV grid, visible in Figure 3, suggests a correction scheme to estimate the energies for higher basis-set cut-offs. Specifically, to approximate the energy at a high cut-off and dense k-mesh E dense high , we take the energy obtained for the same cut-off and coarse grid E coarse high and correct it with energy difference ∆ between the k-points obtained for a lower cut-off In practice, we obtain the ∆ correction with as high cut-off as possible and for as many k-points as tractable. The cut-offs and k-point sets for which we performed the largest xMP2 and dMP2 calculations are summarized in Table 3 together with the value of ∆. While the value of ∆ depends on the basis-set cut-off, the variations are usually around or below 0.01 kJ/mol when a cut-off of at least 900 eV is used. As mentioned previously, the MP2 correlation energy has an error that is proportional to 1/N 2 k . One can then verify that the error decreases approximately by an order of magnitude when going from n 3 to (n + 1) 3 k-point grid. As the values of ∆ are few hundredths of kJ/mol at most, it's highly probable that going to denser k-point meshes would change the energy by less than 0.01 kJ/mol.   The procedure above gives the energies of molecules and solids for a set of basisset cut-offs and we use them to obtain the binding energy, as a function of the cut-off energy. The binding energy then needs to be extrapolated to the infinite basisset limit. As discussed above, the leading order error of the total energies depends as E −3/2 cut on the plane-wave cut-off, which we observe. However, the convergence behaviour of the binding energies is different. Fitting the binding energies with a function f (x) = E ∞ + ax b gives values of exponent b between −2.2 (carbon dioxide) over −3 (methanol and ammonia) to −4 (methane). A fit which assumes b = 1.5 gives E ∞ that differ by around 0.1 kJ/mol for methanol and carbon dioxide and by around 0.05 kJ/mol for ammonia and methane from E ∞ obtained with varying b. This difference is significant for our purposes. Moreover, visual inspection of the fit shows that b = 1.5 is not appropriate, especially for methanol where the data have little noise.
The values of exponent b suggest that the next order error, decaying as E −5/2 cut , governs the convergence of the binding energy, as for some systems studied in Ref. [9]. Setting b = 2.5 improves the quality of the fit. Moreover, the values of E ∞ are within 0.02 kJ/mol of those obtained with fit where b was a free parameter. Interestingly, the methane and carbon dioxide data are the most noisy, this is probably due to higher symmetry of the crystals compared to methanol or ammonia.
The final values obtained with b = 2.5 are in Tab. 4 together with the binding energies found for a fixed orbital cut-off of 1300 eV. One can see that the extrapolation procedure is necessary for carbon dioxide and methanol if one wants to reach a precision of 0.1 kJ/mol or better.
We estimate that the binding energies have overall uncertainties of around 0.1 kJ/mol, this value does not include the error of the PAW approximation. The main sources are extrapolation of energy of isolated molecule to the infinite cell limit, errors stemming from the correction scheme for solids and errors of the basis-set extrapolation for the binding energy.

Two-body contributions
The two-body energies are the dominant contribution to the MBE of the binding energy. For Hartree-Fock one observes Pauli repulsion for short distances and electrostatic interactions for all distances. The MP2 correlation falls off as r −6 with the distance r between the molecules which leads to a r −3 decay in solid. To reach high precision of the two-body energy the long-range contributions need to be accounted for. To accomplish this, we include all contributions from up to sixth shell. Moreover, the basis-set incompleteness errors need to be reduced as well, which is an issue mainly for MP2. We start the discussion with results obtained for methane which is the simplest molecule from our set. Methane has negligible long-range electrostatic contributions and its HF two-body term is thus dominated by Pauli repulsion. In fact, the twelve nearest molecules with intermolecular distance of around 4.2Å from the reference molecule contribute around 99% of the HF two-body energy. The basis-set convergence is fast and improved by using the CABS corrections. For example, while the difference of the HF two-body energies is around 0.04 kJ/mol between the AVTZ and AV5Z basis sets, it's reduced down to 0.01 kJ/mol when the CABS corrections are added. We use the HF(CABS) energies obtained with the AV5Z basis set from all the molecules up to sixth shell to obtain the final two-body energy E 2b HF = 5.265 ± 0.002 kJ/mol. The uncertainty was estimated based on the difference between the data obtained with different basis sets and the convergence of the energy on the distance cut-off.
We observed one potential issue when evaluating the HF(CABS) energies of methane. For the AVDZ basis set, the smallest basis-set used, the CABS corrections critically depend on the fitting basis set. The long-range contributions do not go to zero when a double-ζ fitting basis-sets are used. This issue is reduced when a larger fitting basis sets are used, as shown in Figure 4.  The MP2 two-body energies of methane obtained for all molecules within six shells using different basis sets are listed in Table 5. As observed before [19], the errors of MP2-F12 using a basis-set with a cardinal number N are similar to errors of MP2 using a basis set with a cardinal number N + 1 or N + 2. When the MP2 energies are extrapolated using AVNZ and AV(N − 1)Z basis sets, they have similar errors to the MP2-F12 in the AVNZ basis. The F12 correction removes the leading order error which decays as N −3 , the correction is less efficient for the higher-order errors. We have tried if the MP2-F12 energy could be extrapolated as well, assuming that the residual error decays as N −5 . This seems to work rather well, the extrapolated MP2-F12 value obtained with the AVDZ and AVTZ basis sets is within 0.01 kJ/mol from the reference value obtained by extrapolating the AVQZ and AV5Z data. To estimate the MP2 contributions of dimers in higher shells we fitted the MP2 energy as a function of r cut with E ∞ + Cr −3 cut . We used only interval between 12 and 20Å where the function is sufficiently smooth. The difference between E ∞ and the value obtained by summing contributions from within the sixth shell is below 0.02 kJ/mol. Setting a finite distance cut-off r cut for the MP2 interactions, the error would be 0.1 kJ/mol for r cut = 13Å and 0.2 kJ/mol for r cut = 11Å.
We calculate our final value for MP2 two-body energy of methane crystal by taking the MP2-F12 energy extrapolated using the AVQZ and AV5Z data and adding the correction to infinite r cut . The final value is then −15.254 ± 0.005 kJ/mol where we the estimated uncertainty is based on errors of extrapolation to the CBS limit and to infinite r cut .
We now turn to the evaluation of the HF energy for carbon dioxide, ammonia, and methanol. For these systems the electrostatic interactions are important and the energies do not converge as quickly with the cut-off distance as for methane. In fact, the energy as a function of the cut-off distance oscillates as shown for methanol in Figure 5. Note, that we use only the dimers up to sixth shell which means that dimers from larger shells are not included in the graph, first such dimer occurs at a distance of around 30Å. One way to deal with the convergence is to parametrize a force-field and use it to include the contributions above the cut-off distance [64]. However, the structure of methanol suggests that the oscillations can be caused by the fact that for some cut-off distance the reference molecule interacts more with molecules with an aligned dipole while for a different cut-off there are more interactions with molecules with a dipole oriented in the opposite way. To test this we reordered the summation so that all the molecules from the same unit cell contribute at the same cut-off distance. The cut-off is given by the smallest distance to the reference molecule for these molecules. The modified summation clearly improves the convergence of the HF two-body energy, as illustrated by the green line in Figure 5.  Figure 5. Convergence of the Hartree-Fock two-body energy for methanol in the AVQZ basis set using standard distance based cut-off scheme (Distance cut-off) and its modification where all molecules from a single unit cell are added at the same time (Cell summation). Note that contributions only from molecules within sixth shell around the unit cell containing the reference molecule are included, not all contributions above cut-off distance of 30Å are considered.
While the summation over unit cells improves the convergence, for numerical analysis it's beneficial to simplify the summation further. To this end, we separate the dimers according to the shell of the second molecule and sum all contributions within each shell. The data show a fast convergence with the index of the shell, see Table 6 for carbon dioxide as an example. For methanol and ammonia, taking 499 two-body contributions from up to second shell leads to an error of around 0.1 kJ/mol. In the case of methanol the standard distance based summation still shows errors three times larger for the same number of dimers.
The long-range contributions are needed to reach high precision but their large number leads to increased computational effort when large basis sets are used. One would therefore hope that for large distances a small basis set (such as AVDZ or AVTZ) would be sufficient to obtain the shell contributions with an error well below 0.01 kJ/mol so that the use of a large basis set could be avoided. However, it turns out that the contributions of more distant shells have actually larger errors when larger basis sets (AVQZ or AV5Z) are used than with the small basis sets. This is illustrated by the shell contributions obtained for carbon dioxide in Table 6. One can see that for the third and higher shells the contributions are well below 0.01 kJ/mol in the AVDZ and AVTZ basis sets. In the AVQZ and AV5Z bases the contributions are positive and sum to almost 0.05 kJ/mol for the AV5Z basis set. Clearly adding more dimers would further reduce the precision of the result. This behaviour is a consequence of the numerical errors: to reach a precision of 0.001 kJ/mol for the total contribution of the sixth shell, each of the term needs to be evaluated with a precision of around 10 −7 kJ/mol or 10 −10 Ha. Given that the total energies are in the order of 10 6 kJ/mol or 10 3 Ha, this is approaching the limit of double precision arithmetic. Moreover, reaching the high precision becomes more difficult with the increasing size of matrices that need to be transformed or diagonalized during the HF calculation. That is, the issue is more severe for larger basis sets and also for larger molecules. Given the numerical issues of the large basis sets we have used them only for the shells with small indices when obtaining the final results. The contributions of larger shells were obtained with the AVTZ basis set. Our final HF two-body energies are summarized in Table 7 and show expected behaviour, the interaction is repulsive for methane and attractive for the other systems. We now discuss the MP2 contributions to the two-body MBE energies of carbon dioxide, ammonia, and methanol. Contrary to the methane crystal, the MP2 twobody energy does not converge smoothly enough with the distance cut-off to enable extrapolation to the infinite cut-off. The situation is improved by reordering the summation, as with the HF energy, but we have decided to base the analysis again on the shell contributions.
As expected, the largest two-body contributions come from the zeroth and first shell, that is from molecules either in contact with the reference molecule or in a close distance. For all three crystals the molecules in the second shell, in which the molecules are at least 8Å away from the reference molecule, still have a significant contribution to the binding: The MP2 two-body energies are −0.35, −0.46, and −0.58 kJ/mol for ammonia, carbon dioxide, and methanol, respectively. For higher shells, the contributions have a similar magnitude for the three systems and sum to approximately −0.1 kJ/mol.
We now turn to the basis-set dependence of the MP2 energies as a function of distance. For simplicity, we compare the energies obtained for different shells. As with HF, we want to see whether smaller basis-sets are sufficient to reproduce the values obtained with large basis sets and for which shells this happens.
We use ammonia as an example and show the errors of AVDZ and AVTZ basis sets for different shells in Table 8. The reference data are the MP2-F12/AV5Z two-body contributions listed in the last column. One can see that using MP2 with AVDZ basis set for the proximate dimers (in zeroth and first shells) leads to an error of around 5 kJ/mol, close to 20% of the total two-body energy. In contrast, using AVDZ for dimers in the second and higher shells leads to a small error of around 0.02 kJ/mol. The errors in all shells decrease approximately by a factor of three when the AVTZ basis set is used or when the F12 corrections are included in the AVDZ calculation. For MP2-F12 in the AVTZ basis set the total error is close to 0.13 kJ/mol and predominantly comes from the first two shells.
The errors in the MP2 two-body terms can be reduced by extrapolation. In the case of canonical MP2, the error is reduced to around 0.2 kJ/mol when the AVDZ and AVTZ values are used. The result of extrapolation using the AVDZ and AVTZ data is not completely satisfactory, in agreement with observations in the literature [65]. However, the error becomes essentially marginal, close to 0.01 kJ/mol, when the MP2-F12 energies in the same bases are extrapolated to the CBS limit. We again assumed the N −5 convergence with the cardinal number of the basis set. We observe a good performance of the MP2-F12 extrapolation also for methanol, where the errors for zeroth and first shells are around 0.01 kJ/mol. In contrast, when the canonical MP2 energies are extrapolated (still using AVDZ and AVTZ basis sets), the contributions of the first two shells have each an error of over 0.3 kJ/mol. Finally, the numerical errors in the HF two-body contributions of higher shells also affect the subsequent MP2 energies. For example, the sixth shell contribution of carbon dioxide becomes positive in the AV5Z basis set. The error is, however, below 0.01 kJ/mol and thus barely noticeable. In any case, we use the large basis set results only from shells with small indices and use AVQZ or AVTZ results for the distant dimers. Our final values are summarized in Table 9, the uncertainties come from estimates of basisset incompleteness errors and estimated contributions beyond the sixth shell (below 0.01 kJ/mol).

Three-body contributions
The three-body contributions can be sizable and the number of trimers required to reach convergence can be also high [66]. Moreover, the accurate summation can be problematic Table 8. Basis-set errors of MP2 (∆ MP2) and MP2-F12 (∆MP2-F12) for twobody contributions from different shells of ammonia crystal with respect to the MP2-F12/AV5Z reference two-body energies (last column). The extrapolated data use the AVDZ and AVTZ values and assume N −3 and N −5 decay of the error with the cardinal number of the basis set N for the canonical and F12-corrected MP2, respectively. The MP2-F12/AVDZ calculations were performed with quadruple-ζ density-fitting basis sets. All energies are in kJ/mol. due to the numerical issues [58]. These issues make the three-body contributions critical for the evaluation of the binding energy within MBE. As with dimers methane shows a different convergence than the other systems and we discuss it first. It was previously shown that the three-body terms are less sensitive to the choice of the basis set compared with the two-body energies [55,57]. For methane, we observe a difference of only 0.002 kJ/mol between CABS corrected HF energies obtained with the AVDZ and AVQZ basis sets. The difference is approximately 0.005 kJ/mol for standard HF. The MP2-F12 values also differ by around 0.002 kJ/mol between the AVDZ and AVQZ basis sets and the AVTZ is less than 0.001 kJ/mol away from the AVQZ energies. We use the AVTZ basis set also for the other systems. One could expect that the stronger electrostatic interactions in the other systems would lead to larger basis-set errors but we generally observe only small changes of the binding energy due to the F12 and CABS corrections.
The convergence of the HF three-body energy with the cut-off distance is fast only for methane. Here a distance cut-off of 18Å is sufficient to essentially converge the three-body energy to a value of −0.199 kJ/mol. Adding contributions above this cut-off only varies the energy by around ±0.003 kJ/mol. The MP2 energy converges as r −4 cut , in agreement with the expected behaviour. However, there is little need to perform extrapolation. The MP2 energy obtained for a cut-off distance 20Å is within 0.01 kJ/mol of the extrapolated value. We note that the F12 corrections show some noise which depends on the basis-set size. However, the noise is only few thousandths of kJ/mol for the small AVDZ basis set (with triple-ζ fitting basis sets in F12).
The stronger electrostatic interactions present in the other systems lead to less systematic convergence, similar to the situation observed for dimers. And as with dimers, the convergence can be improved when the summation is reordered. When a trimer contribution E int 0,i,j is added for some r cut , we also include contributions of trimers E int 0,i,k , where the molecules k share the unit cell with the molecule j. In standard summation, the E int 0,i,k contributions would be included at higher cut-off distances. We observe improved convergence not only for methanol and ammonia, but also for carbon dioxide and both for HF and MP2.
We demonstrate the improved convergence of three-body HF(CABS) and MP2-F12 energies for ammonia in Figure 6. The standard cut-off based summation is shown with the "dist" data (red and black line), the modified summation with the "cell" values (green and blue line). For the HF energies, the standard cut-off summation has oscillations of few tenths of kJ/mol even above a cut-off distance of 40Å. The cell summation reduces the range of the values to approximately one half. Similar improvement can be observed for the MP2-F12 energies, but here the spread of values is in the order of hundredths of kJ/mol already for the standard summation.
We were able to obtain the three-body energies in the AVTZ basis set up to a cut-off distance of 40Å for methanol, 46Å for ammonia, and 45Å for carbon dioxide using the modified summation. Note that the energies contain also contributions from some trimers with higher total intermolecular distance due to the use of the modified summation. We estimate the three-body energy by its average value over the last 5Å. We estimate the uncertainty as one half of the difference of the largest and smallest value of the three-body energy on the last 10Å.
We list the HF three-body energies and their uncertainties in Table 7. Even with the modified summation, the data for ammonia and methanol are not tightly converged and have uncertainties of 0.10 and 0.3 kJ/mol. The convergence is much faster for carbon dioxide, which has no dipole. The three-body HF energy converges at a cut-off of around 25Å and increasing the cut-off further mainly leads to a reduced uncertainty. We note that the stated uncertainties might be rather conservative. We have fitted a simple forcefield that accounts only for polarization to the methanol three-body energies and found that increasing r cut to 70Å, while using the cell summation, changes the three-body energy by only +0.04 kJ/mol.
The MP2 energies show a fast convergence, averages taken over intervals of 5Å are essentially converged to within few hundredths of kJ/mol when the lower limit is at least 15Å. However, to converge the uncertainty or spread of the values to a similar range requires a cut-off of around 30Å. Importantly, the MP2 data of carbon dioxide and ammonia start to show effect of numerical noise above a cut-off of 30Å. We therefore use the average from interval between 25 and 30Å as the final value. The values of methanol do not exhibit such behaviour and we use all the values up to r cut = 40Å.
Overall, we find that all the HF contributions are attractive and all MP2 three-body  . HF(CABS) and MP2-F12 three-body energies of ammonia as a function of the total intermolecular cut-off distance using the standard cut-off ("dist" data, red and black line) and using the summation where all possible trimers sharing the first two molecules are added at the same time ("cell" data, green and blue line). energies are repulsive. All the three-body energies are below 1 kJ/mol, apart from the HF value for methanol which is almost as large as the two-body term. This is due to strengthening of hydrogen bonds in trimers.

Four-body contributions
The four-body terms are smaller compared to the three-body contributions yet they can't be completely neglected [55,67], at least for Hartree-Fock. Their magnitude and importance is expected to be smaller for MP2 compared to HF. However, the precise evaluation of four-body terms is difficult due to numerical issues. We observed for the two-body terms that the numerical issues grow with the basis-set size. We therefore used the AVDZ basis-set together with quadruple-ζ fitting basis sets for CABS and F12 to evaluate the four-body terms. Fortunately, small basis sets do not introduce significant errors in the four-body contributions [55].
We again evaluate the four-body terms using the cell summation. Here all tetramers sharing the first three monomers and with the last monomer in the same unit cell are added to the energy at the same r cut . The cut-off distance at which is the contribution added is given by the shortest total intermolecular distance of all such fragments.
We were able to evaluate the four-body contributions using a cut-off distance of 40Å for methane, ammonia, and methanol and 45Å for carbon dioxide. These cut-offs required around 10 000 individual four-body terms. For the estimates of HF energies we take the average of HF energies on the last 5Å, the uncertainty is again given by the spread of the values.
The final HF values and their uncertainties due to cut-off are summarized in Table 7. All the contributions are destabilizing and reach up to 0.49 kJ/mol for ammonia.
Interestingly, while the four-body contributions are smaller than the three-body energies, the uncertainties do not decrease proportionally, apart from ammonia. While increasing the cut-off further could reduce the uncertainties, the values are close to the limits given by the numerical precision. For example, we see that increasing the cut-off for methane deteriorates the precision of the result and we use only data up to a distance of 30Å. Figure 7 shows the convergence of four-body terms for methanol and illustrates some of the mentioned points. For example, there is only a small difference between HF(CABS) and HF values, as well as between MP2 and MP2-F12. This shows that the AVDZ basis-set is sufficient to evaluate the energy.
One can also see in Figure 7 that it's not clear if the MP2 energy is converged or if it's accumulating numerical noise. We observe a similar pattern also for the other systems. Fortunately, the magnitude of the values is small for the other systems, close to 0.03 kJ/mol for ammonia and carbon dioxide and even smaller for methane. We therefore use the average value from data obtained on the last 5Å as the estimate of the MP2 four-body term and set the uncertainty to the absolute value of the average. Overall, we see that the HF four-body terms contribute between two and five per cent of the total HF energy for systems with attractive electrostatic interactions (all apart from methane). The HF four-body terms are positive for all the systems, partially quenching the three-body terms which are all negative. Many-body HF contributions beyond fourth-order are likely to be smaller but probably comparable or smaller than the uncertainties of the three-and four-body contributions.
Even with the large relative uncertainties in the MP2 four-body terms, they represent only a fraction of per cent of the total MP2 binding energy. This is not surprising as MP2 accounts only for two electron excitations. For this reason we expect the MP2 five-and higher-body interactions to be negligible. We collected the final HF and MP2 binding energies obtained using MBE and within PBC in Table 10. One can see that the agreement is excellent for all the studied systems. The largest deviations, around 0.1 kJ/mol, occur for the HF energy of ammonia and for the MP2 energy of carbon dioxide, but the differences are within the estimated uncertainties.

Discussion
Considering the HF binding energy, the PBC approach is superior to MBE for precise calculations. The main reason is the simplicity of setting-up the calculations and the fact that polarization is treated up to infinite order within PBC. In contrast, MBE requires the evaluation of a large number of fragments and the non-additive terms converge slowly. The reordered summation that we used improves the convergence as it effectively accounts for interaction with the whole unit cell at once and the electrostatic moments of unit cells are typically smaller than the moments of the monomers. The interaction or non-additive energy is subsequently smaller and the oscillations observed for the binding energy as a function of the cut-off distance are reduced as well.
The main contributions to the non-additive terms in HF likely come from polarization and can be accounted for by polarizable force-fields [20]. Our preliminary test with a model that accounts only for polarization basically confirms this for methanol. It also shows that our three-body energy is likely converged to within 0.05 kJ/mol, much better than our estimated uncertainty of 0.3 kJ/mol.
We found that the main caveat in the HF calculations within PBC is the Coulomb singularity in the exchange term [59]. Fortunately, it can be easily avoided with the Coulomb cut-off scheme [42]. If the cut-off scheme is not used, the energies of solids and molecules need to be extrapolated to infinite volume. While this is reasonably simple to perform, the extrapolation adds unnecessary uncertainty to the energies. Moreover, we used hard PAW data sets which offer high precision for reproducing all-electron results. Using standard PAW data sets would introduce an error, especially for systems containing oxygen [5].
Turning now to MP2 energies, we again find an excellent agreement between the MBE and PBC values with the differences being only a fraction of percent (see Table 10). The largest difference is around 0.5% for carbon dioxide. One reason for this is the larger number of electrons in carbon dioxide compared to the other molecules. This leads to more demanding calculations, especially within PBC. For example, we could only use up to 2×2×2 k-points for to obtain E MP2 cell of carbon dioxide while for ammonia with a unit cell of similar size a calculation with 3×3×3 k-points was possible.
Another issue with the MP2 calculations within PBC is the need to perform several extrapolations. Specifically, the extrapolation to infinite volume for the isolated molecule and the extrapolation with the basis-set size were problematic. However, there is one subtle positive point about the PBC calculations based on plane-waves. Unlike the gaussian basis sets which are defined for a specific small set of cardinal numbers, the size of the plane-wave basis-set can be chosen at will. For example, we found that the dependence of the binding energy on the basis-set is noisy for methane and carbon dioxide. In such cases, one can compute additional points between the upper and lower limits to try to improve the quality of the fit.
The MBE calculations require more human time to set-up the computer scripts compared to PBC. However, once the set-up is automated the small size and embarrassing parallelizability of the calculations become benefits. The MP2 energy converges quickly with the order of MBE and the cutoff distance, even though the correlations are considered "long-range". Compared to the PBC results, the uncertainties are smaller for MBE-based MP2, apart from methanol. Due to these reasons, the MBE scheme is preferable for MP2. One possible issue are the numerical errors which can reach similar magnitude as the data [58]. This could be avoided by using an accurate model for the interactions and evaluating it with sufficient numerical precision.
The use of explicit correlation (F12) for MP2 was one of the main benefits of the MBE calculations. This avoids the need to perform extrapolations which is a substantial simplification, especially for the user. Interestingly, the F12 was the most important for short-range dimers, canonical MP2 was sufficient for long-distance dimers and nonadditive terms. One can expect that the F12 scheme will offer similar simplifications of the computational workflow also within PBC [68]. Another simplification would come from reduction of the memory or compute time requirements of the periodic MP2 calculations. This could be accomplished by, e.g., using natural orbitals [69] or by using different MP2 algorithms [70,71].
We now compare our results to other MP2 data in the literature. We find a very good agreement with the binding energies of carbon dioxide obtained using periodic LMP2 [72] and MP2 within the hybrid many-body interaction (HMBI) scheme [20], see Table 11. For ammonia, the agreement is again very satisfactory with the LMP2 value but we find a larger discrepancy with the HMBI result. We observe a difference of several kJ/mol for methanol [73], but note that our values do not contain monomer relaxation energies which would reduce the difference. Further differences, of several tenths of kJ/mol, can occur due to the use of different structures for all the systems.
Finally, we compare our MP2 values to data obtained with other correlated methods. Specifically to binding energies obtained using RPA with GW singles corrections [5], CCSD(T) [20,21,73], and QMC [13], see Table 11. MP2 is in a very good agreement with the reference values for carbon dioxide, the accuracy of RPA+GWSE is lower in this case. The agreement between MP2 and CCSD(T) is worse for ammonia and methanol. However, the CCSD(T) value for ammonia was obtained with the HBMI scheme and we see that the MP2 binding energy obtained using this approach differs from our value by around 4 kJ/mol. Therefore, the MP2 error for ammonia is likely smaller and, in fact, our MP2 binding energy differs by around 2 kJ/mol from the QMC value of Zen et al. [13]. We expect that the disagreement between our MP2 value and the CCSD(T) data for methanol can be due to similar reasons. Table 11. The total MP2 binding energies obtained using PBC and MBE methods. We also list data obtained previously for ammonia and carbon dioxide using periodic local MP2 [72] and by hybrid many-body interaction (HMBI) scheme at MP2 level [20]. The HMBI value for methanol is from Ref. [74]. The results for RPA+GWSE calculations are from Ref. [5]. The CCSD(T) values are from Ref. [73] for methane, from Ref. [20] for carbon dioxide and ammonia and from Ref. [21] for methanol. The QMC values are from Ref. [13].

Conclusions
We presented a detailed evaluation of HF and MP2 binding energies of four molecular solids within periodic boundary conditions and using many-body expansion. We tried to obtain the contributions with high precision to understand the significant sources of errors. The values obtained with the two approaches agree well even though some of them have higher uncertainties. Within PBC, the use of the Coulomb cut-off technique was crucial to speed up the convergence of both HF and MP2 energies with respect to volume and k-points. Otherwise the computational and memory requirements are currently limiting, especially for MP2.
The convergence of MBE with real-space cut-off deteriorated with increasing importance of long-range electrostatic interactions, as expected. We reduced this issue by reordering the summation, adding some terms that would otherwise contribute at higher cut-offs. Numerical noise appeared already for two-body terms, especially for large basis sets. This should make one cautious when fitting force-fields on energies obtained for large distances.
Finally, it's simpler to evaluate the HF energy within PBC while the situation is less clear cut for MP2. Currently, the combination of lower level energies obtained for the whole system with MBE corrections seems to be the most efficient approach for obtaining binding energies for post-HF methods. This is supported by examples in the literature using it both for solids and finite clusters [21,24,57,75].

Data availability statement
The data that support the findings of this study are openly available at the following DOI: 10.5281/zenodo.5248078