Charge self-consistent many-body corrections using optimized projected localized orbitals

In order for methods combining ab initio density-functional theory and many-body techniques to become routinely used, a flexible, fast, and easy-to-use implementation is crucial. We present an implementation of a general charge self-consistent scheme based on projected localized orbitals in the projector augmented wave framework in the Vienna Ab Initio Simulation Package (VASP). We give a detailed description on how the projectors are optimally chosen and how the total energy is calculated. We benchmark our implementation in combination with dynamical mean-field theory: first we study the charge-transfer insulator NiO using a Hartree-Fock approach to solve the many-body Hamiltonian. We address the advantages of the optimized against non-optimized projectors and furthermore find that charge self-consistency decreases the dependence of the spectral function - especially the gap - on the double counting. Second, using continuous-time quantum Monte Carlo we study a monolayer of SrVO$_3$, where strong orbital polarization occurs due to the reduced dimensionality. Using total-energy calculation for structure determination, we find that electronic correlations have a non-negligible influence on the position of the apical oxygens, and therefore on the thickness of the single SrVO$_3$ layer.


I. INTRODUCTION
The advances in the field of nanostructures, where control is now experimentally possible on an atom-by-atom scale, have given rise to a strong demand for theoretical simulation tools that are capable of simulating complex correlated electron systems such as heterostructures, clusters or adatom arrays on surfaces. There are several successful approaches that can be used for that. Among the most widely used are ab-initio approaches, notably density functional theory (DFT) and many-body model Hamiltonians.
Modern Kohn-Sham DFT, with the local density approximation (LDA) 1 or a generalized gradient approximation (GGA) 2 functional, yields various ground-state properties including crystal structures quite reliably for many materials, essentially without any ambiguous input parameters. The auxiliary single-particle energies can be seen as an estimate of the electronic quasi-particle energies, 3 sometimes giving good qualitative agreement. However, (semi)local functionals like LDA or GGA are well known to fail to capture the physics of stronglycorrelated materials such as Mott insulators, unconventional superconductors, heavy Fermion systems, or Kondo systems. This is why, for treating these systems, model Hamiltonians such as the Hubbard model are usually employed.
Generally speaking, these approaches focus on the description of electron correlation phenomena in a minimal low-energy Hilbert space. They are accessible by a broad set of many-body techniques including dynamical meanfield theory (DMFT) 4 and generalizations thereof. These models naturally depend on model parameters that are unknown a priori. Keeping the number of parameters low can lead to over-simplified models that fail to explain sufficiently well experimental findings. On the other hand, keeping a large number of unknown parameters makes results ambiguous already from the outset. Therefore, to obtain the best of both worlds, it appears natural to combine the complementary strengths of both approaches to model correlated electron systems in a realistic manner. To achieve this goal, there has been an ongoing effort in the development of approaches like DFT+DMFT 5,6 for more than 20 years now. The class of materials addressed by DFT+DMFT is very wide and includes Mott insulators, correlated metals, superconductors, and magnetic materials.
On the DFT side, simulations using projector augmented wave (PAW) 7 basis sets turn out to provide a good compromise between accuracy and computational demand in these systems. Thus, several DFT+DMFT approaches have been implemented with PAW basis sets on the DFT side. 8,9 In its most simple formulation, the DMFT is performed on top of a converged DFT cal-arXiv:1804.02055v1 [cond-mat.str-el] 5 Apr 2018 culation (so-called one-shot DFT+DMFT). For many systems, especially those where the strong correlations cause a rearrangement of charges, the success of the method can be significantly improved by performing a fully charge self-consistent calculation. [10][11][12][13][14] There, the electron density obtained from the DMFT calculation is fed back to the DFT code and the entire loop is selfconsistently solved. There already exist implementations of DFT+DMFT using the PAW formalism which include charge self-consistency. [15][16][17] Here we describe a fully charge self-consistent implementation based on optimized projected local orbitals in the Vienna Ab Initio Simulation Package (vasp). [18][19][20] On the DFT side, it is flexible and easy to use; vasp provides the data necessary to construct the low-energy model of the system and offers an interface so that the updated charge density can be handed back. This makes it possible to combine it, e.g., with any kind of many-body correction beyond DFT in some correlated subspace defined via local projection operators without requiring any changes to the DFT code. On the DMFT side, we present two codes making use of this extension of vasp.
The paper is organized as follows. In section II we explain the details of our approach, particularly, the definition of the optimized local projectors and the way the charge feedback from the many-body to the DFT part is implemented. We then present an application to the testbed material of NiO in section III, where we demonstrate how full charge self-consistency affects simulations of the electronic structure particularly in relation with the so-called double-counting problem. In section IV, we report an application of our scheme to monolayers of SrVO 3 , where we compare our implementation, in the triqs 21 framework, to the already existing interface between triqs and the DFT code wien2k. 22,23 Furthermore, for that material, we demonstrate total energy calculations and find structural changes induced by correlations.

II. DETAILS OF THE IMPLEMENTATION
A. Correlated subspace from optimally projected local orbitals To perform DFT+DMFT calculations one needs a way to transform between the basis of the Kohn-Sham (KS) states and the localized basis of the subspace used to define lattice models, e.g. a Hubbard model. The most obvious way to do this is to perform a unitary transformation of a subset of KS states to construct a set of local states. This method is at the heart of various types of Wannier function based methods, such as the maximallylocalized Wannier functions. 24 In many cases, when KS bands with different characters are strongly entangled, it becomes however difficult to construct a well-defined unitary transformation. In this case it is more advantageous to use projector oper-ators which, acting on the KS Hilbert space, project out only KS states with a desired character. This is the basis of projected localized orbitals (PLO). 8 In the PLO formalism one starts by defining an orthonormal localized basis set |χ L associated with each correlated site, which is typically indexed by local quantum numbers, e.g. L = {l, m, σ, ...} with (l, m) and σ being orbital and spin angular-momentum quantumnumbers, respectively. {|χ L } spans a correlated subspace C at each correlated site. Assuming χ L |χ L = δ LL , any operator acting on this space can be constructed by projecting onto C using a projector operator P C , i.e.,Â An arbitrary vector |Ψ of the Hilbert space can be decomposed in terms of local states by writingP C |Ψ = L |χ L χ L |Ψ . If we now consider a complete basis |Ψ µ , the projector operator is completely defined by specifying PLO functions P L,µ ≡ χ L |Ψ µ .
As described in detail in Refs. 8 and 9, the PLO projector in the PAW framework 7,18 can be written as where |χ R L are localized basis functions associated with each correlated site R, |φ i are all-electron partial waves, |p i are the standard PAW projectors. In the following, we will omit the site index R unless it leads to a confusion. The index i stands for the PAW channel n, the angular momentum quantum number l, and its magnetic quantum number m, and |Ψ νk are pseudo-KS states. For the PAW potentials distributed with vasp, generally a minimum of two channels n for each angular quantum number are used. For instance for 3d elements, the first 3d channel is placed at the energy of the bound 3d state in the atom, and a second channel is added a few eV above the bound atomic 3d state. Inclusion of the second channel improves the transferability of the PAW potentials and the description of the scattering properties greatly. For s and p states in, e.g., 3d transition metals, the first channel usually describes 3s and 3p semi-core states and the second channel is placed somewhere in the valence regime (4s and 4p states) or even above the vacuum level. Although a description of how the projectors have been obtained is stored in more recent PAW potential files, it is not always straightforward for the user to identify the best suitable PLO projectors. Specifically, various choices are possible for |χ L . 9 Conveniently, one might simply use the all-electron partial waves for a particular PAW channel n, |φ Ln , as the local basis. However, doing so in vasp can lead to a nonoptimal projection. For instance, for transition metals to the left of the periodic table, the d states in the solid might be more or less contracted than those in the atom, so that the "best" |χ L is a linear combination of the two available 3d channels. Likewise, projection onto the TM valence s states is often difficult, because of the presence of semi-core s states in the PAW potential file. Ideally, the user should not need to make a manual choice of the local |χ L functions.
In the following we explain a protocol that largely resolves these issues. The first step is not strictly required, but simplifies the subsequent coding somewhat. We first construct a set of PAW projectors and partial waves that are orthogonalized inside the PAW sphere. This can be achieved by diagonalizing the all-electron one-center overlap matrix O nn (for each angular and magnetic quantum number L), which gives the eigenvalues λ n and the eigenvector matrix U , The new partial waves and the corresponding PAW projectors can now be defined as linear combinations of the original ones: This new set of projectors and partial waves have the important property that the all-electron partial waves |ξ Ln are orthogonal to each other and that they are normalized to one. Thus, we will now refer to them as "orthonormal" partial waves. The choice above is not unique, though, for instance one might also adopt a Gram-Schmidt orthogonalization procedure.
To construct a unique projector, we seek a unitary transformation that maximizes the overlap between the projector and KS valence states within a chosen (user supplied) energy window [ε P min , ε P max ]. Specifically, we diagonalize a matrix where the sum is restricted to states with energies ε νk ∈ [ε P min , ε P max ]. We then pick the eigenvector υ n with the largest absolute eigenvalue to construct local basis functions and corresponding PAW projectors, with projector functions given by 25 We note in passing that both steps could be combined into a single computational step, by diagonalization of a generalized eigenvalue problem involving the overlap matrix O and the matrix M evaluated using the original set of projectors and partial waves. Note that since the all-electron partial waves do not form an orthonormal basis set between different PAW spheres and because projection is performed for a subset of KS states, the above procedure will usually produce non-normalized local states. However, normalized local states can be constructed if we orthonormalize the projector functions as follows, On a technical level, the optimal projectors are constructed internally in vasp according to Eqs. (4-12) for each sites R given an energy window [ε P min , ε P max ]. The last orthonormalization step Eq. (14) is done externally as post-processing. This allows one to choose a different energy window for the summation in Eq. (13) in order to fine-tune the degree of localization of the impurity Wannier functions.

B. Total energy and charge self-consistency
A general extension of DFT Kohn-Sham equations to a many-body problem based on the Hubbard model can be formulated using the total-energy functional 10,26 whereĜ(k, iω n ) is the Green's function containing manybody effects, andĤ KS (k) = ν |Ψ νk ε νk Ψ νk | is the KS Hamiltonian. The charge density, n(r), and Green's function in the correlated subspace,Ĝ C (k, iω n ), are both related toĜ(k, iω n ) by whereP C k projects onto correlated localized states, see Eq. (2). E corr is the energy contribution of the interaction term (Hubbard U -term) and E dc is the double-counting correction.
The first four terms of the above expression form the usual DFT total energy calculated for the density matrix and charge density of the interacting system, which is non-diagonal in this case. It is thus clear that to get the correct value of the total energy the density matrix must be calculated in a self-consistent manner.
In the framework of DFT+DMFT 4,6 the interacting Green's function is defined as follows: where the matrix elements of the self-energy in the basis of localized impurity states, |χ lm , are which consist of the purely local impurity self-energŷ Σ imp (iω n ), obtained from an impurity solver, and the double countingΣ dc .
Using the PLO functions we can write this Green's function in terms of its KS matrix elements where the elements ofΣ KS are obtained by up-folding the local self-energy, If we define an energy window [ε C min , ε C max ] which selects a subset W C of KS states affected by correlations, 27 the interacting charge density can be written as where we use a non-diagonal density matrix and a diagonal density matrix formed by the KS occupation numbers f νk . Note that the chemical potential µ here is generally different from the KS chemical potential µ KS . From a practical point of view it is convenient to split the charge density into a DFT part and a correlationinduced part, n(r) =n DFT (r) + ∆n(r), where ∆n(r) is the correlation correction, with the last quantity having the important property Such a definition of the new charge density is convenient because it ensures charge neutrality between DFT iterations.
In a similar way, the total energy can be split into a DFT part calculated using the correlated charge density given by Eq. (24), and a correlation part, Thus, in order to calculate the total energy and to obtain the charge density for the next KS iteration, only two quantities need to be calculated after a DMFT iteration: the density-matrix correction ∆N νν (k) and the interaction energy (including the double-counting term) E corr − E dc .
In this particular vasp implementation, we use ∆N νν (k) to obtain the natural orbitals by a transformation V given by diagonalizing the total correlated density matrix, and the charge density and the one-electron energy are, then, obtained in the same way as in the normal KS cycle, Note that the DFT band energy is calculated using the original occupancies f νk , and the second term in Eq. (31) represents essentially a band-energy correction which takes into account the change in the density matrix induced by correlations. The occupancies f νk will deviate from the usual Fermi-Dirac statistics as a result of particle fluctuations from the occupied non-interacting KS states into unoccupied states.

III. ELECTRONIC STRUCTURE OF NIO: LOCAL HARTREE-FOCK APPROXIMATION
NiO is a prototypical charge-transfer insulator where the electronic band gap on the order of 4 eV arises from a combination of strong local Coulomb repulsion U within the Ni 3d shell and the charge-transfer energy ∆ = d − p , i.e., the difference in on-site energies of O-2p and Ni-3d orbitals. 28 The uppermost valence states are of hybrid Ni-3d and O-2p character, where the O-2p contribution is dominant.
DFT in (semi)local approximations like LDA or GGA fail to describe the insulating nature of NiO. Non-spinpolarized LDA and GGA yield a metal, while spinpolarized calculations of the antiferromagnetically ordered phase of NiO yield an energy gap of ∼ 0.7 eV, which is much smaller than the experimentally established gap. 29 The reason for this failure of semilocal DFT to describe the insulating state of NiO is well known to be the insufficient treatment of the strong local Coulomb interaction in the Ni 3d shell. NiO has thus become a testbed material for realistic correlated electron approaches such as DFT+U 30 or DFT+DMFT. 5,31,32 There are two ways to formulate DFT+U: First, the traditional one, where the Kohn-Sham potential is directly augmented with a potential arising from the Hartree-Fock decoupling of the local Hubbard interaction. Second, in terms of a charge self-consistent DFT+DMFT scheme, where the DMFT impurity problem is solved in the Hartree-Fock approximation called DFT+DMFT(HF) in the following. Both schemes are fully equivalent if the augmentation spaces are chosen to be strictly the same and the way spin-polarization is accounted for is the same.
However, many DFT+U implementations, including the one in vasp, work with angular-momentumdecomposed charge densities, which in general do not relate to proper projectors in the definitions of the "+U" potentials. 33 Additionally, DFT+DMFT and DFT+U for magnetic materials can work with or without spin polarization in the DFT part.
In the following, we compare DFT+U as implemented in vasp to DFT+DMFT(HF) with and without charge self-consistency for the testbed material NiO, and investigate the dependence of the electronic spectra on the double-counting.
At first, we fix the double counting in the vasp DFT+U approach according to a prescription known as "fully localized limit" (FLL) 34 which leads to an orbital independent and diagonal version of Σ dc mm in Eq. (21), in short µ dc . Afterwards, however, we choose the double counting in the DFT+DMFT(HF) approach such that we reproduce the size of the gap in vasp DFT+U, leading to µ dc = 61.5 eV and compare the respective results. We sample the Brillouin zone using 8×8×8 k-points. For the interaction parameters we use U = 8 eV and J = 1 eV as obtained from constrained LDA calculations. 30 We perform spin polarized calculations considering antiferromagnetic ordering. For DFT+DMFT(HF) we only consider spin polarization in the Hartree-Fock part but not in the DFT part. For vasp DFT+U we do consider as usual spin-polarized densities in the DFT part. The respective density of states for spin up are presented in Fig. 1. In general, the DOS from DFT+U and DFT+DMFT(HF) are very similar: a gap of about 4 eV is opened between heavily spin-polarized Ni states in the The double-counting problem poses a serious problem when using many-body corrections in DFT+DMFT approaches predictively, since fundamental properties such as the single particle gap depend on the double counting. Here, we perform fully charge self-consistent (fcsc) and one-shot DFT+DMFT(HF) calculations for various fixed double-counting potentials µ dc to investigate its influence on the spectra and especially on the gap: The resulting total DOS for the fcsc and one-shot calculations are presented color coded in Fig. 2. In both cases the Ni states in the conduction band (dark blue around 5 eV) are shifted linearly towards the Fermi energy as µ dc increases. The O/Ni states at the valence band edge are not affected strongly by the double counting. This is because, in gen-eral, the higher the Ni character of a band, the more its energy is shifted by the double-counting correction. In the conduction band we also find states with low spectral weight in the PAW spheres (i.e., they are neither located around the Ni nor the O atoms but in the interstitial region); their energies shift to lower values with decreasing µ dc , i.e., in the opposite direction than the Ni state in the conduction band. This leads to the situation, for µ dc 58 eV in case of fcsc and µ dc 57 eV in the case of one-shot calculations, that these states cross the Ni states. Then, the conduction band edge does not consist of Ni states, which is not in agreement with experiment. Note that the "around-mean-field" AMF prescription lies in this regime both for one-shot and fcsc calculations and is thus not suitable for NiO. Similarly, the FLL prescription for fcsc calculations is very close to this regime.
For values of the double counting that give the correct conduction band character, the gap shrinks with increasing µ dc . This dependence differs strongly for the one-shot and charge self-consistent calculations: the slope of the gap dE g /dµ dc in the physical regime differs by a factor of nearly 2. Thus, the charge self-consistency does not cure but alleviates the double-counting problem by decreasing the influence of µ dc on the spectrum.

IV. BENCHMARK FOR SRVO3 MONOLAYER
SrVO 3 presents an example of a correlated transitionmetal oxide experiencing a metal-insulator transition (MIT) driven by a dimensional crossover. 35 In particular, a monolayer of SrVO 3 grown on a SrTiO 3 substrate is an insulator with a Mott gap of around 2 eV. In DFT, the compound is found to be metallic and one has to combine DFT with a many-body technique to achieve the correct insulating solution.
Here, we use a monolayer of SrVO 3 to benchmark the fully charge-self-consistent (fcsc) DFT+DMFT implementation of triqs/dfttools 23 within the vasp PLO formalism. The motivation for choosing this particular system is that electronic correlations induce an appreciable charge redistribution, making, therefore, the use of a fcsc DFT+DMFT scheme imperative. Importantly, using the triqs/dfttools framework, we can benchmark the presented vasp interface to the one that is based on the wien2k DFT package, 22 also implemented in triqs/dfttools. Furthermore, we compare our results to previously published fcsc data, also based on the wien2k code, but using MLWF as correlated basis set. 13 The system is modeled by a free-standing monolayer of SrVO 3 with the in-plane lattice constant equal to that of SrTiO 3 (3.92Å), simulating thus the epitaxial geometry. The effect of the substrate is neglected. To isolate the individual layers in a periodic unit cell, a vacuum layer of about 16Å is used in the vasp calculations.
Based on geometry relaxations on the DFT level, in the out-of-plane direction, the V-O distance is reduced from 1.96Å to 1.93Å and the Sr-Sr distance from 3.92Å to 3.52Å. The Brillouin zone is sampled using a 15 × 15 × 1 Γ-centered Monkhorst-Pack k-grid. The energy cutoff of the plane wave basis set is 400 eV, in accordance with the default value for the PAW potentials used. Using the procedure described above, the projectors onto the V d-states are calculated according to Eq. (12) and orthonormalized according to Eq. (14) in an energy window from −2 to 1.1 eV around the Fermi level. In the t 2g subspace, a Hubbard-Kanamori interaction with U = 5.5 eV and J = 0.75 eV (in agreement with Ref. 13) is added; the resulting double counting is estimated in the FLL scheme. 36 The impurity problem is solved using the triqs/CTHYB Quantum Monte Carlo 37 solver at an inverse temperature β = 40 eV −1 .
The one-shot DFT+DMFT calculation results in a nearly complete polarization of the orbitals (see fillings in Tab. I), which is found to be equal in vasp and wien2k, and which is also in accordance with published data. 13 When using the charge feedback in the fcsc framework, the empty orbitals get partly repopulated. This effect happens slightly stronger in vasp but the agreement of the two calculations based on the two different DFT codes is within the expected difference between the two implementations. This re-population can also be seen in the spectral function (Fig. 4), which is obtained using analytic continuation of the local lattice Green's function using the maximum entropy method. 38 Unlike the oneshot calculation (top panel of Fig. 4), the fcsc scheme produces a lower Hubbard band with non-negligible spectral weight also for the degenerate d xz and d yz orbitals (bottom panel of Fig. 4). The same Mott gap (whose value is in good agreement with experiment 35 ) is found starting from both DFT codes, with the peak positions being basically identical. The difference in peak height is compatible with the difference of filling between the two methodologies. The calculated spectra are in excellent agreement with results presented in Ref. 13. Full charge self-consistency allows us to calculate reliably the total energy as a function of structural parameters and to determine the lowest-energy structure in DFT+DMFT. Here, as a proof of principle, we calculate the total energy of the compound as a function of the distance between Sr-O and V-O planes. We consider deviations ∆z from the DFT-optimized structure. Positive ∆z means that the upper Sr-O plane gets shifted upwards and the lower plane gets shifted downwards, thus increasing the thickness of the slab in z direction. This changes the splitting between the d xy , which is close to half-filling, and the degenerate d xz and d yz orbitals, which are nearly empty. For more negative ∆z (i.e., thinner slabs), the d xy orbital gets even closer to half-filling, while the d xz and d yz orbitals are progressively emptied out completely (see Fig. 5, bottom). Additionally, the bandwidth decreases, enhancing thus the correlations and the size of the Mott gap (not shown here). The main result of this total-energy calculation is that the minimum calculated with DFT+DMFT is shifted significantly towards lower ∆z (Fig. 5, top), indicating that the structure with the lowest energy has a smaller slab width than obtained by DFT.
The trend in the structural change can be roughly explained in terms of an additional energy gain by removing the degeneracy between the in-plane d xy and out-of-plane d xz , d yz orbitals for smaller values of ∆z. Indeed, having both types of orbitals occupied results in an additional energy cost proportional to interorbital coupling U − 3J. Once the apical oxygen is moved sufficiently close to the V ion, the anti-bonding orbitals d xz , d yz are pushed up in energy and only d xy remains occupied (half-filled). This removes the interorbital energy cost, lowering thus the total energy.

V. CONCLUSION
We have presented a charge self-consistent implementation to combine DFT with many-body techniques in the vasp package, based on optimized projector localized orbitals (PLO) in the PAW framework. The implemented optimization, seeking the partial wave with the largest overlap with the relevant correlated subspace, is crucial for concise projections and leads to a straight-forward connection between delocalized Kohn-Sham states and localized basis functions. As usual, in the localized subspace, Hubbard-like Hamiltonians can be used straightforwardly. In contrast to a maximally-localized Wannier projection, the projector formalism is very simple, easy to implement, preserves symmetry, and does not require any special precautions for strongly entangled bands. Therefore, the projector formalism is also well suited for the simulation of correlation effects in supercells with a large number of bands.
We benchmarked our fcsc implementation for two cases. First, we compared a standard vasp DFT+U calculation with a charge self-consistent mean-field treatment of the DFT+DMFT Hamiltonian for the case of NiO. We find only small deviations for the DOS, which we relate to a subtly different treatment of the spin polarization and the different projections used in the standard vasp DFT+U and the DFT+DMFT scheme. For NiO an important finding is that the double-counting problem is alleviated by the charge self-consistency. With charge self-consistency the influence of the double-counting parameter on the band gap is reduced by about a factor 2 compared to one-shot calculations.
Second, we simulated a SrVO 3 monolayer and found strong orbital polarization, which is decreased in charge self-consistency.
This agrees with previously published results using FLAPW+DMFT. Additionally, as a proof of concept, we calculated the total energies from DFT+DMFT and found that correlation effects lead to structural changes in SrVO 3 monolayers, reducing the apical oxygen height in the single layer.
The presented projector and fcsc scheme can be used to interface basically any many-body method with the vasp package. It offers a robust and concise interface for materials studies as well as future developments of tools for strongly-correlated electron systems.