Benchmark Phaseless Auxiliary-Field Quantum Monte Carlo Method for Small Molecules

We report a scalable Fortran implementation of the phaseless auxiliary-field quantum Monte Carlo (ph-AFQMC) and demonstrate its excellent performance and beneficial scaling with respect to system size. Furthermore, we investigate modifications of the phaseless approximation that can help to reduce the overcorrelation problems common to the ph-AFQMC. We apply the method to the 26 molecules in the HEAT set, the benzene molecule, and water clusters. We observe a mean absolute deviation of the total energy of 1.15 kcal/mol for the molecules in the HEAT set, close to chemical accuracy. For the benzene molecule, the modified algorithm despite using a single-Slater-determinant trial wavefunction yields the same accuracy as the original phaseless scheme with 400 Slater determinants. Despite these improvements, we find systematic errors for the CN, CO2, and O2 molecules that need to be addressed with more accurate trial wavefunctions. For water clusters, we find that the ph-AFQMC yields excellent binding energies that differ from CCSD(T) by typically less than 0.5 kcal/mol.


I. INTRODUCTION
2][3] Despite its simplicity and the fact that it can describe systems with hundreds of atoms, DFT is not always accurate enough to serve as a general-purpose solution.The stretched H 2 molecule is a well-known example of the limitations of DFT, as its results are qualitatively wrong. 4The key problem is the treatment of the electron-electron correlation effects via approximate exchange-correlation (xc) energy functionals. 5Perdew established the analogy of climbing Jacob's ladder for the effort of developing more advanced xc functionals to reach chemical accuracy. 6Higher rungs offer higher accuracy at the price of higher computational costs and lower transferability.Novel xc potentials based on machine learning 7,8 attempt to address the accuracy issue but still struggle with lower transferability.It remains to be seen whether machine learning functionals will replace traditional ones in the coming years.Machine-learned xc potentials as well as machine-learned force fields 9,10 require highly accurate reference data.In addition, more accurate methods are also important to calibrate new methods or to access strongly correlated materials where DFT generally performs poorly.
For this reason, a whole spectrum of correlation-consistent methods has been developed over the last 50 years. 11,12Full configuration interaction (FCI) provides an exact solution within a given basis set 13 but suffers from exponential scaling.Therefore, it is only usable for modest system sizes (up to 10 10 Slater determinants).8][19] These methods enabled the treatment of the Fock space with up to 10 30 Slater determi-nants while maintaining the accuracy of the traditional FCI.However, they still scale weakly exponentially making them difficult to apply to more realistic systems.
Besides CI methods, the most prominent methods are based on the coupled-cluster expansion.The most popular among them, the "gold-standard" coupled-cluster singles, doubles, and perturbative triples (CCSD(T)), 20,21 is extensively used to treat molecules and is known to give excellent results for systems with small static correlation effects.High-accuracy benchmark data for small molecules are usually calculated using the CC expansion.3][24] For the molecules in the HEAT set, these results are considered to be as good as the FCI results.More importantly, the HEAT studies also show that CCSD(T) alone is not accurate enough to achieve chemical accuracy (< 1kcal/mol) but approaches it in many relevant cases.Furthermore, CCSD(T) scales adversely with the 7th power of the system size and performs poorly in the presence of strong static correlation effects, e.g. bond dissociation.These drawbacks emphasize the need for alternative methods.
6][27] It uses matrix-product states to encode the locality in one of the spatial dimensions and to reduce the exponential scaling of the CI expansion.For lowdimensional quantum systems (quantum dots, molecules extended in one dimension), DMRG is the method of choice due to its efficiency and accuracy.Although DMRG is applicable to arbitrary systems, its limitation of about 30 active orbitals prevents it from being a general-purpose tool for more generic and compact molecules or solids.
Quantum Monte Carlo methods, such as variational Monte Carlo 28 (VMC) and diffusion Monte Carlo [29][30][31] (DMC), ex-hibit low polynomial scaling with system size and are nowadays routinely applied to systems with hundreds of electrons.However, Nemec and co-workers 32 showed that achieving chemical accuracy is difficult with DMC.They reported a mean absolute deviation of atomization energies of 3.2kcal/mol for 55 molecules in the G2 set. 33Additionally, VMC and DMC require accurate models for multi-electron wavefunctions including linear combinations of Slater determinants, Jastrow factors, backflow wavefunctions, and many others.][36] In this work, we use the auxiliary-field quantum Monte Carlo (AFQMC) method.7][48][49] It behaved well for systems without severe fermionic sign problem.Since almost all general fermionic systems suffer from the sign problem, the reformulation into an open-ended random walk led to the constrained path (cp)-AFQMC, which was remarkably successful for model Hamiltonians. 50,51In order to handle general many-body Hamiltonians, the phaseless (ph)-AFQMC was proposed by Zhang and co-workers. 52The ph-AFQMC was successfully applied to isolated molecules using Gaussian-type orbitals (GTOs) and to solids using a plane-wave basis (PWs). 538][39][40][41][42][43][44][45] One reason for its popularity is that the ph-AFQMC scales quartic with the system size.This scaling stems from the local energy evaluation and can be further reduced using tensor contraction, 54,55 density-fitting techniques 56,57 or plane-wave basis. 53In contrast to DMC, AFQMC yields energies fully consistent with other traditional quantum chemistry approaches, allowing correlation energies to be directly compared.
Controlling the fermionic phase problem is crucial for the accuracy of the AFQMC method.In the ph-AFQMC, the phaseless approximation imposes a constraint on the walker weights, 52,58,59 analogous to the fixed-node approximation in DMC. 29The phaseless approximation can introduce somewhat uncontrolled errors that often lead to overestimated correlation energies.In the ph-AFQMC, improving the trial wavefunctions systematically reduces the phaseless approximation error.As the trial wavefunction approaches the exact ground-state wavefunction, the ph-AFQMC energy approaches the exact ground-state energy and the phaseless error disappears.Usually, the trial wavefunction consists of the single Slater determinant formed from Hartree-Fock (HF) or Kohn-Sham (KS) orbitals.They are particularly popular because they are easily accessible from all quantum chemistry / solid-state physics codes and avoid additional computational costs introduced by more complex trial wavefunctions.Recent research focused on multi-determinant trial wavefunctions [60][61][62][63][64] .For example, Mahajan et.al. used 10 4 Slater determinants while increasing the total cost of the ph-AFQMC computation by only a factor of 3 in comparison to the single Slater determinant case.
In this study, we adopt a simpler approach.We will carefully scrutinize whether modifications to the weight update reduce the need to go beyond single-determinant trial wavefunctions.With these modifications, the ph-AFQMC can provide close to chemical accuracy for the HEAT set.Bomble et.al. 23 provided highly accurate CCSDTQP molecular energies at the double-zeta basis set (cc-pVDZ).In this work, we benchmark the accuracy of the ph-AFQMC energies against this reference data set of small molecules.Borda et.al. recently performed a similar study on the G1 test set. 65he rest of the paper is structured as follows: In Sec.II, the ph-AFQMC method is briefly reviewed along with the proposed modifications, followed by details on the implementation in Sec.III.In Sec.IV, we present and discuss different applications of ph-AFQMC.Finally, Sec.V concludes our findings and possibilities for future developments.

II. AFQMC FORMALISM
In this section, we briefly introduce the AFQMC formalism.For a more detailed overview of the theory, we refer the interested reader to one of the reviews. 52,58,59Consider the full many-body Hamiltonian written in any orthonormal oneparticle basis given by where â † p and âq are fermionic creation and annihilation operators, respectively.The single-particle Hamiltonian matrix elements h pq include all one-body terms.The two-body Hamiltonian is written as a sum of squares of one-body operators Lg .In the Gaussian basis, we can obtain these operators by iterative Cholesky decomposition 66,67 of electron repulsion integrals (ERIs).The indices p, q, r, and s go over N basis functions and the index g goes over N g Cholesky vectors.Typically, N g ≈ 10N.We neglect the spin indices for simplicity.
The exact many-body ground-state wavefunction |Φ 0 is extracted from the long-time imaginary propagation of an initial state where τ is the imaginary time step and E 0 is the estimated ground-state energy.This equation is exact in the limit of infinitesimally small time steps.In this work, the Hartree-Fock orbitals form the initial wavefunction.
To treat the electron-electron interaction efficiently, the Hubbard-Stratanovich (HS) transformation 68,69 is used where p(x g ) is the standard normal distribution and x g are random numbers drawn from this distribution.In other words, the Hubbard-Stratanovich transformation maps the actual system of interacting particles onto a system of non-interacting particles coupled to random fields.Since ERIs are positive definite, random fields are purely imaginary, i.e., one can define a complex-valued effective Hamiltonian h given by hw where the superscript w emphasizes that unique random fields are drawn for each walker.The ensemble of N w walkers represents the exact many-body ground state, where each walker is represented by the following elements: real-valued weight W , phase θ , and single Slater determinant |Ψ .Therefore, the exact many-body ground-state wavefunction can be approximated as where the index k stands for time steps and the index w for walkers.|Ψ T denotes the trial wavefunction.The walkers are initialized as follows: The equations of motion for the walkers are W w k+1 e iθ w k+1 = W w k e iθ w k During the AFQMC propagation, this approximate groundstate wavefunction grants access to the observables of the system.The simplest and probably the most important observable-the total energy-is defined as the mixed expectation value of the Hamiltonian where the local energy estimator E loc (Ψ w ) is computed using the generalized Wick's theorem 70 The one-body reduced density matrix G w pq is defined as Here, Ψ T and Ψ w represent the N e occupied orbitals in the N basis functions.Equations (7, 8, 9, 10) define the freepropagation (fp)-AFQMC.

A. Mean-Field Subtraction
A decrease in the magnitude of the diagonal components of the Cholesky vectors L g,pq leads to smaller fluctuations in AFQMC and thus better statistics.To this end, one introduces a shift to the two-body Hamiltonian Here, we identify the second term as the classical Hartree potential corresponding to the trial wavefunction and add it to the one-body Hamiltonian.The last term is the Hartree energy corresponding to the trial wavefunction.The mean-field subtraction incurs no computational cost for the AFQMC propagation.

B. Importance Sampling
Introducing an importance function in AFQMC shifts the Gaussian probability density in Eq. ( 3).This shift is referred to as the "force bias".Choosing the force bias as minimizes the statistical fluctuations in the random fields. 58fter applying the mean-field subtraction and importance sampling, the effective Hamiltonian (Eq.( 4)) changes to The force bias is a walker-dependent quantity and must be calculated in each time step for each walker.The force bias is crucial to successful AFQMC simulations, however, computing it entails considerable computational effort for the AFQMC procedure and may even be the computational bottleneck. 59

C. Phaseless Approximation
The fp-AFQMC is formally exact but suffers from the fermionic phase problem.The problem is observed in the expression for the energy evaluation (Eq.( 9)).After the equilibration time, during which excited states contained in |Ψ I decay exponentially, walkers will populate the entire complex plane uniformly.Therefore, the denominator in Eq. ( 9) vanishes and the whole expression becomes ill-defined.To circumvent the fermionic phase problem, the phaseless approximation is introduced.
Since the effective Hamiltonian given in Eq. ( 15) is complex-valued, the orbitals become complex-valued as well.In DMC, |Φ and − |Φ are both valid solutions of the Schrödinger equation.Similarly in AFQMC, if |Φ is the valid solution, then e iθ |Φ is a valid solution for any θ ∈ [0, 2π).To prevent the exponential growth of the noise in AFQMC simulation, Zhang et.al. 52 introduced a phaseless approximation where the importance sampling factor I w and the phase change ∆θ are defined as The update equation ( 16) can be also rewritten as Zhang et.al. 52,58 state that different approximations yield the same expectation values and slightly different standard deviations.
In this work, we investigate this statement more quantitatively.To this end, we modify the expression for the total energy evaluation (Eq.( 9)) and the update procedure for the walker weights We refer to this new approach as ph * -AFQMC throughout this paper.In contrast to the original approach, there are two key differences: First, we retain the complex nature of the walker weights and use the real part for the population control and evaluation of physical observables, such as the energy.Walkers are killed explicitly if Secondly, we include the real part of the importance weight instead of the absolute value.We suggest that only the real part is meaningful and using the absolute value increases the weights with the imaginary part.This may contribute to the systematic overcorrelation of ph-AFQMC.We will show in Sec.IV that ph * -AFQMC systematically yields correlation energies smaller in absolute value.

III. IMPLEMENTATION DETAILS
In this section, we present our implementation of AFQMC in QMCFort. 71QMCFort is written in Fortran and utilizes (threaded) BLAS/LAPACK for fast linear algebra and MPI and OpenMP for parallelization.In addition to AFQMC, QMCFort offers implementations of restricted, restricted open-shell, and unrestricted Hartree-Fock (RHF, ROHF and UHF) and MP2.The electron repulsion integrals (ERIs) are calculated using the McMuchie-Davidson scheme 12 and decomposed into Cholesky vectors.Therefore, QMCFort can act as a standalone tool to setup and run AFQMC simulations.However, only the AFQMC part is highly optimized for largescale calculations.For this reason, we also interface QMCFort with PySCF 72 for isolated systems and with VASP 2 for extended systems.The implementation of the computationally intensive parts of the AFQMC procedure and the interface between QMCFort and VASP are detailed in the remainder of this section.

A. Effective Hamiltonian
At each time step and for each walker, we build the effective Hamiltonian according to Eq. ( 15).The computationally demanding part is the convolution of the shifted random fields with Cholesky vectors ∑ g (x w g − f w g ) • L g,pq .Treating the orbital indices p and q as a combined index, we map this contraction to matrix-matrix multiplication with a cubic scaling (∼ N 2 N g N w ) in system size.Typically, we treat in the range of 10 to 100 walkers per MPI rank and compute this contraction for all of them simultaneously.Although the exchange energy evaluation (Sec.III D) scales worse, the creation of the effective Hamiltonian is often the most expensive operation in AFQMC because it is required at each time step.

B. AFQMC Propagation
The effective Hamiltonian propagates the orbitals according to Eq. (7).The matrix exponentials in the Trotter-Suzuki propagator 73,74 are approximated by the sixth-order Taylor expansion.This requires several applications of the effective Hamiltonian to the orbitals, i.e. matrix-matrix multiplications of the form These scale as N 2 N e .

C. Force Bias
The force bias evaluation (see Eq. ( 14)) is equivalent to the evaluation of the Hartree potential and energy.It scales cubically with respect to the system size, i.e.NN g N e N w .Similar to the effective Hamiltonian (Sec.III A), the force bias is computed for all walkers simultaneously.To make it more transparent, we will first define the contracted Cholesky vectors and the force bias is then Here, we can treat {p, i} as a single index and evaluate the contraction with a matrix-matrix multiplication.We use the force bias to compute the Hartree energy for the walker

D. Exchange Energy
Of all individual contributions to AFQMC, the computation of the exchange energy incurs the largest computational cost.Although the naive implementation of Eq. ( 10) scales as N 3 N g N w , we simplify it by computing intermediate arrays The exchange energy of the walker Ψ w is then given by The quartic-scaling (NN g N 2 e N w ) construction of the intermediate arrays α w g,i j is also mapped to a matrix-matrix multiplication.Multiple walkers are treated simultaneously in a block of approximately 5-10 walkers for optimal performance.One can evaluate the exchange energy only say every 10th time step.This has no significant impact on the statistics because one needs to sample independent local energies so the alternative is block averaging.Computing the exchange energy from the α tensors scales as N g N 2 e but the computational effort is negligible in comparison to other tasks discussed in this section.

E. Other tasks
Besides the operations mentioned above, there are a few additional parts of the AFQMC procedure that should be mentioned.Since the AFQMC propagator is non-unitary, we periodically reorthogonalize the walkers by a QR decomposition of the walker matrices Ψ w .The walker population is controlled to avoid too small or too large weights.We opted for the comb method 75 , because it keeps the total number of walkers constant.

F. Performance
In Table I, we summarize the number of operations corresponding to the computationally demanding tasks.Assuming that the number of walkers is independent of the system size, we list the resulting asymptotic large system-size scaling for these tasks.We gauge the asymptotic behavior for water clusters of various sizes containing up to 5 water molecules.The calculations were performed on a dual-socket Intel Skylake Platinum 8174 with 24 cores each and a nominal base frequency of 3.1 GHz.We average the timings over 200 AFQMC steps, with each step processing 4 800 walkers on 48 MPI processes.Fig. 1 shows the elapsed times per AFQMC step of the computationally demanding tasks.To estimate the effective scaling exponent for each task, we fit the power function f (x) = Cx β through the data points.As predicted, the construction of the effective Hamiltonian, AFQMC propagation, and force bias calculation scale cubically.The measured exponent β = 3.1 for the exchange energy does not exactly match the expected exponent β = 4.With increasing system size, we observe an increasing per-core performance of the exchange evaluation (Fig. 2).This increase effectively reduces the scaling exponent.Still, the exchange evaluation is the least performant of the relevant computational tasks.As depicted in Fig. 2, most other tasks run efficiently with the performance of 20-35 GFLOPS/core.A notable exception is the AFQMC propagation that reaches a performance above 50 GFLOPS/core for large systems, very close to peak performance.

G. Interface with VASP
VASP provides a set of one-electron mean-field orbitals {φ p (r)}.From these, we compute the single-particle Hamiltonian matrix elements h pq and Cholesky vectors L g,pq (see Appendix for more details).We export the matrix elements and Cholesky vectors to a file and read them with QMCFort.Then, the AFQMC calculation for extended systems is equivalent to calculations on isolated molecules using the Gaussian basis set.
As an example, we compute the 2x2x2 supercell of diamond.The convergence of the AFQMC and CCSD(T) correlation energies with respect to the number of canonical orbitals is studied.CCSD(T) energies are computed using Cc4s code. 76A lattice constant of a = 3.567 Å is used with an energy cutoff of E cut = 700 eV.In addition, we use an energy cutoff E χ cut = 500 eV for the truncation of the Coulomb kernel.VASP relies on PAW potentials 3,77 and the potential referred to as C_GW with valence 2s2p was used.The core radius of this potential is r core = 1.5 a.u.; the local potential is equivalent to the d-pseudopotential and 4 projectors are used (two s-projectors with r cut = 1.2 a.u., and two p-projectors with r cut = 1.5 a.u.).
Figure 3 shows the convergence of the correlation energy as a function of the number of canonical orbitals for ph-AFQMC and CCSD(T).The number of orbitals is varied from the small basis set with 64 electrons in 192 orbitals (64e, 192o) to a relatively large basis of 64 electrons in 1024 orbitals (64e, 1024o).A 1/N behavior is fitted through the data points.The ph-AFQMC and CCSD(T) lines are almost indistinguishable sug- 3. Convergence behavior of the CCSD(T) and ph-AFQMC correlation energies as a function of the number of canonical orbitals for diamond.Linear extrapolation to the complete basis set is performed using the first three points (1024 −1 , 512 −1 , and 320 −1 ).The calculations were performed using a 2x2x2 diamond supercell with an energy cutoff of 700eV.
gesting that the methods yield consistent energies for periodic systems.Further details on AFQMC calculations for extended systems will be reported in a forthcoming publication. 78o compare the computational cost of the CCSD(T) and ph-AFQMC, we conducted calculations using 256 and 1024 bands.For the 256-band calculation, the CCSD(T) calculation took 20 minutes on a single node (2x AMD Epyc (Milan)), while the ph-AFQMC calculation took 11 hours.However, for the 1024-band calculation, CCSD(T) required 24 hours on 4 nodes (2x AMD Epyc (Milan)), while ph-AFQMC calculation took 48 hours to complete.The timings we obtained for the CCSD(T) and ph-AFQMC calculations demonstrate the favorable scaling of the AFQMC method, which becomes increasingly advantageous as the number of orbitals and the complexity of the system increase.

IV. RESULTS AND DISCUSSION
In the following four subsections, we report (i) the total molecular energies for 26 molecules from the HEAT set, (ii) the AFQMC energy of the benzene molecule using the cc-pVDZ basis set, and (iii) the AFQMC binding energies of the water clusters.

A. Heat Molecules
We calculated total energies for 26 molecules from the HEAT set using the frozen-core approximation and geometries from Ref. 23 (HEAT).All calculations employed the double-zeta basis set (cc-pVDZ).We used the QMCFort code to calculate Cholesky vectors, the reference Hartree-Fock (RHF/UHF) wavefunctions, and all AFQMC energies.We TABLE II.Total molecular energies (in Hartree units) for the HEAT set 22 molecules at different levels of theory: ph-AFQMC 52 , ph * -AFQMC, CCSD(T), CCSDTQP 23 and HCI 14,15 .All energies are calculated using the cc-pVDZ basis set.The last three rows of the table depict the root-mean-square deviation (RMSD), mean signed deviation (MSD), and mean absolute deviation (MAD).used the PySCF package 72 to calculate the CCSD(T) energies.The Dice code 14,15 was used to calculate the heat-bath CI energies.To eliminate possible sources of systematic errors in the AFQMC, we (i) truncated Cholesky vectors at a conservative threshold of 10 −8 to ensure a reasonably accurate representation of the molecular ERIs, (ii) chose a relatively small time step of 0.002 E h −1 to exclude significant time-step errors, and (iii) equilibrated the system for 40 000 time steps.To ensure good statistics, we propagated 6 000 walkers until the standard error of the mean of the predicted molecular energies dropped below 0.2 mE h .

Molecule
Table II compares the ph-AFQMC and ph * -AFQMC molecular energies to the "gold-standard" CCSD(T) and the more accurate coupled-cluster expansion including variational triple, quadruple and pentuple excitations (CCSDTQP).We use the latter as reference values since they are practically converged against the FCI limit.The CCSDTQP value for the CO 2 molecule is missing in Ref. 23, so we use the result of the heat-bath configuration interaction (HCI). 14To verify the accuracy of the HCI method, we selected a few other molecules (H 2 , H 2 O, CN, N 2 ) and found a maximal difference to the CCSDTQP values of 0.01 mE h , well below the statistical AFQMC errors.Comparing to these reference values, we find a similar precision for the ph-AFQMC and ph * -AFQMC method with a root-mean-square deviation (RMSD) of 2.89 mE h and 2.73 mE h , respectively.For context, the CCSD(T) exhibits a RMSD of 1.65 mE h for these molecules.The mean absolute deviations (MAD) of 1.39 mE h (CCSD(T)), 1.85 mE h (ph-AFQMC) and 1.84 mE h (ph * -AFQMC) show a smaller difference between the methods indicating that some outliers taint the overall performance of AFQMC.The mean signed deviation (MSD) of 1.39 mE h (CCSD(T)) and 1.17 mE h (ph * -AFQMC) indicate under-correlation in contrast to ph-AFQMC with an average over-correlation of −0.38 mE h .The modified approach (ph * -AFMQC) is also closer to CCSD(T) with an RMSD of 2.66 mE h compared to ph-AFQMC with an RMSD of 3.67 mE h .
Next, we present a more detailed analysis based on the graphical representation of the results in Fig. 4. Let us focus on the more-widely-used CCSD(T) method first.Fig. 4 illustrates that CCSD(T) systematically under-correlates compared to the reference values.The most frequent deviation lies around 1 kcal/mol (i.e.1.59 mE h ).The worst case is the CN molecule with a deviation of 4.1 mE h .The reason for the larger deviation is probably the large spin contamination in the UHF wavefunction. 23e ph-AFQMC and ph * -AFQMC results show similar trends.The O 2 molecule is the worst case for AFQMC with an undercorrelation of 8.6 mE h .Solving this problem requires a multi-determinant trial wavefunction, for example from a complete active space self-consistent field (CASSCF) calculation in the open-shell subspace.A similar problem in open-shell atoms is solved by using multi-determinant CASSCF trial wavefunctions or by using symmetry restoration techniques. 59Excluding O 2 from the statistics reduces the RMSD to 2.41 and 2.11 mE h for the ph-AFQMC and ph * -AFQMC method, respectively.
In contrast to the O 2 molecule, the ph-AFQMC values for CN, CO, and CO 2 molecules show considerable overcorrelation (up to 7 mE h for the CN molecule).The origin of these deviations is the fixed-node error.Our modified approach (ph * -AFQMC) systematically increases the energies compared to ph-AFQMC.It is noteworthy that the differences are larger in the cases where the ph-AFQMC shows larger over-correlation.The fixed-node errors are therefore significantly reduced for the CN, CO, and CO 2 molecules but the residual errors are still sizable.Better trial wavefunctions could further reduce the fixed-node errors.Borda et.al. 65 performed similar benchmarks using the G1 test set that partially overlaps with the HEAT set.They also used cc-pVDZ basis sets with the frozen-core approximation and compared AFQMC total energies calculated with the QMCPACK package to the CCSDTQ energies.While we used UHF trial wavefunctions, they used ROHF ones for the AFQMC calculations.Fig. 5 shows the deviation of ph-AFQMC from the reference energies.Overall, most energies agree within 1 mE h between the two ph-AFQMC calculations.For the remaining cases, our results appear to be closer to the reference result except for the CN molecule.Possible reasons for these differences include: (i) smaller statistical and systematic errors in the present work, (ii) different molecular geometries, and (iii) different trial wavefunctions.The latter matters particularly for the CN molecule because the large spin contamination in the UHF wavefunction makes the ROHF wavefunction a better choice for the trial wavefunction.
Next, we study the basis-set impact by comparing the total energy differences between ph * -AFQMC and CCSD(T) for the cc-pVDZ basis set and the roughly six-times larger augcc-pVQZ one.The latter includes core electrons overcoming the frozen-core approximation of the smaller basis.Fig. 6 shows remarkably similar energy differences for both basis sets.Three notable exceptions are F 2 , O 2 , and OH where the deviation is slightly larger than 1 kcal/mol.This result is important for two reasons: First, the comparison between ph * -AFQMC and CCSD(T) is sufficiently accurate using a cc-pVDZ basis set.Hardly any new conclusion could be drawn from the complete basis-set limit.Secondly, the ph * -AFQMC precisely describes core-electron effects at the accuracy of the CCSD(T) method.

B. Benzene Molecule
The ground state of the benzene molecule with double-zeta basis set (cc-pVDZ) is an interesting system to benchmark state-of-the-art correlation-consistent methods because it is among the largest systems that can be treated directly using FCI diagonalization.The benzene molecule in the cc-pVDZ basis set contains 108 orbitals and 30 electrons (in the frozencore approximation).
Eriksen et.al. 79 performed a blind test on the benzene molecule comparing ten different methods.They include the coupled-cluster expansion methods, different selected CI methods, and quantum Monte Carlo methods.The authors agreed that the full coupled-cluster reduction (FCCR) and many-body FCI expansion (MBE-FCI) essentially yield the exact correlation energy of −863.0 mE h .Lee et.al. 43 supplemented the study with ph-AFQMC results using two different trial wavefunctions: RHF wavefunction and CASSCF (6,6)  multiconfigurational wavefunction.The ph-AFQMC+RHF overestimates the correlation energy by 3.1 mE h , while the ph-AFQMC+CAS (6,6) over-correlates by 1.2 mE h .For the sake of completeness, we augmented the study with CCSD(T) results which are under-correlated by 3.5 mE h .Our ph-AFQMC+RHF result is equivalent to the ph-AFQMC+RHF result calculated using the QMCPACK code. 80,81This serves as an additional validation of our AFQMC implementation in QMCFort.Similar to CN, CO, and CO 2 , ph * -AFQMC+RHF reduces the absolute value of the correlation energy considerably and leads to an energy under-correlated by 1.3 mE h .In this case the quality of the ph * -AFQMC+RHF is seemingly similar to the quality of ph-AFQMC+CAS (6,6).However, better trial wavefunctions promise improved accuracy and, most importantly, more controlled results.All methods and respective correlation energies are visualized in Fig. 7.

C. Water Clusters
Great effort has been put into developing empirical models that faithfully represent the properties of bulk water.It turns out that a good model must also adequately describe the small water clusters present in the Earth's atmosphere.Temeslo et.al. collected representative structures for water clusters of various sizes. 82They investigated which contributions are important to obtain accurate formation energies.Motta et al. reported deviations of AFQMC and CCSD(T) larger than the statistical fluctuations. 54Here, we revisit these clusters to scrutinize the accuracy of AFQMC for water.
We estimate the binding energy of the most stable water cluster with n ≤ 5 H 2 O molecules Here, E(nH 2 O) is the total energy of the cluster, and E(H 2 O) is the ground-state energy of the water molecule.The cluster geometries are taken from Ref. 82.We relaxed the single water molecule using MP2 at aug-cc-pVDZ basis set and obtain a bond length of 0.96593 Å and a bond angle of 103.866 • .We used the heavy-augmented basis set (aug-cc-pVDZ for O atom and cc-pVDZ for H atom) and all-electron wavefunctions to compare our results with Ref. 54.The PySCF package 72 produced the CCSD(T) correlation energies and QMCFort the RHF, MP2, and AFQMC ones.We truncated the Cholesky decomposition of the ERIs with a threshold of 10 −6 E h .We propagated 4 800 walkers for 140 000 steps with a time-step of 0.01 E h −1 .The exchange energy was evaluated after every 100 steps.For the largest cluster, we used half as many walkers.This setup keeps the systematic errors in the binding energy within 0.05 kcal/mol.While the RHF binding energies differ significantly from the correlation-consistent methods, all other methods agree within chemical accuracy.Our AFQMC values are in better agreement with MP2 and CCSD(T) values than the previous AFQMC values.The large statistical errors in the reference data might partially explain the discrepancy in AFQMC binding energies.However, the systematic under-correlation of the reference AFQMC binding energies indicates the existence of systematic errors, too.One possible source of the systematic errors could be the looser threshold in the Cholesky decomposition of the ERIs (10 −4 E h ) in Ref. 54.
To test ph-AFQMC and ph * -AFQMC for size consistency we dissociated the cluster into individual molecules by shifting the second and third molecule in the 3UUD cluster by 8 Å in x and y direction, respectively.We computed the total energy of the stretched cluster and of the individual water molecules individually.We summarize our findings in Table IV.Within statistical fluctuations, all fragments exhibit the same total energy which is equal to a third of the stretched cluster.This demonstrates that ph-AFQMC and ph

V. CONCLUSION
The phaseless auxiliary-field quantum Monte Carlo is increasingly popular due to its high accuracy, the low polynomial scaling (N 3 −N 4 ), and its applicability to quantum chemistry and condensed matter physics.Using the DFT or HF solutions as starting point, it can be considered as a natural extension of these methods with similar scaling but higher accuracy.
We have presented a Fortran implementation of the AFQMC, QMCFort, that enables efficient large-scale calculations on CPUs.The code is parallelized using MPI, OpenMP, and parallel BLAS, and runs near peak performance for typical systems (see Fig. 2).QMCFort can run AFQMC simulations independently or obtain Cholesky vectors of the twoelectron intermediates via interfaces to VASP and PySCF.
Using QMCFort, we compared the accuracy of the ph-AFQMC method to the "gold-standard" CCSD(T) and the more accurate CCSDTQP method.For this purpose, we calculated the ph-AFQMC energies of the 26 molecules in the HEAT set, the benzene molecule, and water clusters.For the HEAT set, the ph-AFQMC yields a mean absolute deviation (MAD) of 1.85 mE h (1.15  nize in the absence of reference calculations.Furthermore, CCSD(T) is certainly more robust for the HEAT set and is highly accurate even if the groundstate wavefunction involves significant double excitations.One possible explanation for the presence of outliers and the failure of the AFQMC is that the method fails to account accurately for strong double excitations.This suggests that the AFQMC with a single determinantal trial wavefunction is only capable to treat weakly correlated materials that lack strong static correlation effects even in the space of double excitations.
For the remaining molecules in the HEAT set, ph-AFQMC performs similarly to or better than the CCSD(T).We modified the phaseless approximation-ph * -AFQMC-to overcome at least partly the over-correlation problems often encountered with ph-AFQMC.For the CN, CO, and CO 2 molecules, where the over-correlation effects are particularly pronounced, the modified approach indeed improves the energies significantly.In the case of the benzene molecule, ph * -AFQMC yields a correlation energy 1.3 mE h higher than the FCI reference value.This is noticeably better than ph-AFQMC and comparable to the accuracy of ph-AFQMC with a multi-determinant CAS(6,6) trial wavefunction.Finally, the AFQMC binding energies of water clusters agree with the CCSD(T) and MP2 binding energies to within 0.5 kcal/mol.For the four water clusters, our present results are generally closer to CCSD(T) results than previous AFQMC results, substantiating our claim that the AFQMC method with a single Slater determinant is particularly suitable for weakly correlated molecules and materials.
Both the ph * -AFQMC, as well as the ph-AFQMC, are rather ad hoc approaches to deal with the fermionic sign problem and the exponential increase of the noise.We feel that the present work does not conclusively show that the ph * approximation is superior to the ph approximation.However, the present work clearly shows that even minor changes to the phaseless approximation can affect the final results.Although we could not find an approximation that more generally improves the results, we believe that further research in this direction is well warranted.
In summary, the ph-AFQMC is a promising and reliable method with high accuracy and reasonable computational cost.It provides nearly chemical accuracy for the small molecules studied in this work if the molecules are weakly correlated.We plan to further improve the accuracy of QMCFort through algorithmic improvements and the use of more accurate trial wavefunctions.

VI. ACKNOWLEDGMENTS
Funding by the Austrian Science Foundation (FWF) within the project P 33440 is gratefully acknowledged.All calculations were performed on the VSC4 / VSC5 (Vienna scientific cluster).

FIG. 2 .
FIG.2.Performance per CPU core of the computationally intensive operations in the AFQMC procedure measured on water clusters of different sizes.Tuples on the x-axis denote the number of electrons and orbitals.Executing the AFQMC code on one CPU node (dual-socket Intel Skylake Platinum 8174) delivers 1-1.5 TFLOPS (30 GFLOPS per core, green line) on average.

FIG. 4 .
FIG.4.CCSD(T), ph-AFQMC and ph * -AFQMC total-energy differences ∆E relative to the reference CCSDTQP energies for the HEAT set.Calculations use the cc-pVDZ basis set and a frozen-core approximation including only the last occupied shell.Trial wavefunctions are single RHF/UHF Slater determinants.The solid black line represents the chemical accuracy range (∆E < 1 kcal/mol)

TABLE I .
Number of operations per walker and time step and asymptotic scaling of the computationally intensive parts of the AFQMC procedure.N represents the number of basis functions, N g is the number of Cholesky vectors, and N e is the number of occupied states in the system.
FIG. 1. Elapsed time per AFQMC step of the computationally intensive operations measured on water clusters of different sizes.AFQMC calculations were performed on a dual-socket Intel Skylake Platinum 8174 with 4 800 walkers, and 48 MPI processes.Timings are averaged over 200 AFQMC steps.

TABLE III .
Binding energies of the four most stable water clusters containing up to 5 H 2 O molecules calculated using heavy-augmented cc-pVDZ basis set and all-electron wavefunctions.RHF, MP2, CCSD(T), and different ph-AFQMC values in kcal/mol are reported.
* -AFQMC are size-consistent.FIG. 6. Total-energy difference ∆E between ph * -AFQMC and CCSD(T) for the HEAT set.The cc-pVDZ calculations employ the frozen-core approximation; the aug-cc-pVQZ ones include all electrons.Trial wavefunctions are single RHF and UHF Slater determinants for closed-and open-shell molecules, respectively.The solid black line represents chemical accuracy (∆E < 1 kcal/mol).

TABLE IV .
The AFQMC energies of the stretched 3UUD cluster and its fragments, and the binding energies demonstrate that ph-AFQMC and ph * -AFQMC are both size-consistent (all values are given in E h ).
72al/mol) compared to CCSDTQP values.The poor performance for four molecules (CN, CO, CO 2 , O 2 ) is responsible for a large part of this deviation.Such outliers are clearly troublesome as they are difficult to recog-FIG.7.Correlation energies for the benzene molecule using the cc-pVDZ basis set and the frozen-core approximation.ph-AFQMC(RHF),ph* -AFQMC(RHF) correlation energies are obtained using QMCFort code, while the CCSD(T) energy is calculated using PySCF code.72Othervalues are taken from the Refs.43 and 79.The red line represents the reference correlation energy of −863.0 mE h , while the dashed black lines represent the range of the chemical accuracy (1 kcal/mol).All energy values are given in mE h .
, we introduced a truncated singular value decomposition to obtain the Cholesky vectors L g,pq = SVD