Systematic construction of density functionals based on matrix product state computations

We propose a systematic procedure for the approximation of density functionals in density functional theory that consists of two parts. First, for the efficient approximation of a general density functional, we introduce an efficient ansatz whose non-locality can be increased systematically. Second, we present a fitting strategy that is based on systematically increasing a reasonably chosen set of training densities. We investigate our procedure in the context of strongly correlated fermions on a one-dimensional lattice in which we compute accurate training densities with the help of matrix product states. Focusing on the exchange-correlation energy, we demonstrate how an efficient approximation can be found that includes and systematically improves beyond the local density approximation. Importantly, this systematic improvement is shown for target densities that are quite different from the training densities.


Introduction
The formulation of quantum mechanics in terms of density functionals instead of wave functions, following the ground-breaking works of Hohenberg, Kohn, and Sham [1,2], made numerical simulations of quantum mechanical systems ranging from the microscopic to the macroscopic world feasible [3,4,5]. The usefulness of density functional theory (DFT) is certified by the number of works based on the original publications [1,2] and on later improvements of the exchange-correlation (xc) energy density functional E xc [6,7,8,9,10,11,12].
DFT in its most widely used form, namely Kohn-Sham (KS) DFT [2], requires the xc density functional in order to be able to compute ground state energies and densities. By virtue of the Hohenberg-Kohn theorem [1], all ground state observables are functionals of the ground state density n, and so E xc = E xc [n]. The ground state energy E = E[n] of a system can be decomposed into a kinetic, an interaction and a potential part. By means of a fictitious non-interacting system, namely the KS system, the noninteracting part of the kinetic energy, T s = T s [n], can be obtained efficiently, which represents a large contribution to the full interacting kinetic energy T . Further, part of the interaction energy is accounted for by the Hartree energy E H [n]. The potential part E V [n] can be exactly computed efficiently for any ground state density n. Finally, the remaining part of the total ground state energy defines the xc density functional, . DFT is in principle exact, but in practice determining the precise form of the xc density functional is QMA-hard [13]. Therefore, KS DFT can only make use of approximations of E xc . The enormous success of DFT is thus deeply connected to the successful construction of good approximations for the xc energy density functional.
In the history of DFT and quest for a universally applicable approximate E xc [14], mainly two different paths have been followed: one is the non-empirical approach pioneered by Perdew [15] and the other is the semi-empirical approach initiated by Becke [8]. The non-empirical approach makes use of exact conditions, that a physical system must fulfill, to find approximations for the xc density functional. Within this approach a "Jacob's ladder" of functionals was built where each functional on a higher rung of the ladder is supposed to improve upon the ones on the lower rungs [16,17]. On the lowest rung of the "Jacob's ladder" resides the local density approximation (LDA), which was already introduced by Kohn and Sham in [2]. The higher rungs are supposed to systematically improve upon the LDA, which, in practice, does not always happen [17]. Additionally, at the moment, the more precise functionals on the higher rungs are so much more difficult to compute that further improvements of DFT following this non-empirical approach seem very hard to achieve. In the semi-empirical approach, an ansatz for the functional form of E xc is fitted using experimental data, accurate theoretical reference data, or other constraints. However, often relatively small training sets are used in these fits and then the resulting functionals can be biased towards their training [14].
Alternatively, we might obtain further improvements of E xc away from but using concepts of both the semi-empirical and the non-empirical approach, e.g. by using a large set of accurate training densities and corresponding values of E xc , and by fitting an efficient ansatz to these data that includes some exact conditions. Obviously, a difficulty of this alternative scheme is that it requires a possibly large number of accurate solutions for the quantum many-body problem. However, nowadays, tensor network states provide precise results for quantum many-body systems, e.g. [18,19], in particular with respect to ground state properties. We remark that tensor network methods are currently limited to low-dimensional, i.e. one-and some two-dimensional, quantum lattice problems while DFT usually handles three-dimensional continuous quantum systems. Since DFT can be applied to a wide range of realistic quantum systems, it is a useful algorithm for a large community and thus worth improving.
In this article, we want to analyze the feasibility of constructing an approximate xc density functional of a specific form, when large training sets of ground state densities and corresponding values of E xc are available. The specific form for the ansatz of our approximation is inspired by the non-empirical approach [17]: it includes the LDA [2,20] and allows a systematic improvement beyond it. For this feasibility study, we focus on discrete lattice problems and the one-dimensional case, and we use matrix product states (MPS) for the computation of accurate ground state energies and densities [21,22]. The specific discrete lattice problem considered here can be derived from discretization of continuous space, i.e. the usual scenario of DFT. Then our ansatz can be seen as the discretized version of a continuous function. Although we could approach the continuum solution by successively decreasing the discretization, taking the continuum limit is beyond the scope of this work.
The structure of this article is as follows. In section 2 we introduce the considered Hamiltonian and observables. The corresponding exact LDA is presented in section 3. We then propose, fit, and assess our ansatz in section 4. Finally, in section 5 we conclude this work and give an outlook.

Model
In the following, we consider two species of fermions with long-ranged soft-Coulomb interaction on a finite one-dimensional lattice of length L with hard-wall boundary conditions, as represented by the Hamiltonian: withT Here, c † l,σ creates and c l,σ annihilates a fermion of species σ = ↑, ↓ on lattice site l, n l,σ := c † l,σ c l,σ is the corresponding occupation number operator andn l :=n l,↑ +n l,↓ . The total particle number is denoted by N := L l=1n l . We obtain ground states with different total particle number by choosing different values for the chemical potential µ, which plays the role of a Lagrange multiplier fixing N. Such a Hamiltonian can also describe the discretized continuous problem with lattice spacing ∆ when in (2a) t is replaced by 1/(2∆ 2 ) and in (2b) the denominator (m − l) 2 + 1 is replaced by (m − l) 2 ∆ 2 + 1. The solution for different discretizations can be very precisely computed with MPS [23,24] and so our approach should yield highly accurate training densities. If we would like to obtain the solution for continuous space, we would have to run our computations repeatedly with decreasing lattice spacing ∆ and extrapolate our results to ∆ = 0. Here we see (1) as the Hamiltonian of the problem, and not a discrete version of a more fundamental one, thus we set t = 1/2 and U = 1 and fix the number of lattice sites to L = 21 from now on.
On a finite lattice, densities n l := n l = n l,↑ +n l,↓ can be written into a vector n := (n 1 , n 2 , . . . , n L ) T -where T denotes the transpose -such that every density functional F can be written as a function of such density vectors F = F (n). In the following, we will consider the universal Hohenberg-Kohn functional F HK (n), the Hartree-energy E H (n), and the non-interacting kinetic energy T s (n), e.g. [20]. For the above Hamiltonian (1) these functionals read: where E(n) denotes the ground state energy of an interacting density n, i.e. corresponding to (1) withŴ , and E s (n) denotes the ground state energy of a noninteracting density n, i.e. corresponding to (1) withoutŴ . Knowing the values of these functionals (3a), (3b), and (3c) for a particular density n allows to calculate the xc energy E xc for that density: However, given an arbitrary density vector n, only E H (n) is trivial to compute: F HK (n) requires the knowledge of the external potential as a function of the density, v ext l = v ext l (n), and T s (n) requires the knowledge of the effective non-interacting Kohn-Sham potential v s l = v s l (n). This process of calculating the external potential v l (n), in which the ground state has the given density n, is called inversion and can be performed efficiently only in the non-interacting case or for two fermions [25]. In general, there exists no efficient inversion procedure for the interacting case and we use a slight modification of the iteration proposed in [26,27]: Aiming at the target density n tar , we iterate v l (i + 1) = v l (i) + γ(i)(n l (i) − n tar l ) until ||n(i) − n tar ||where || . . . || denotes Euclidean norm -is below a desired precision threshold. Here, n(i) is the ground state density in the external potential v(i) at the iteration step i, and γ(i) > 0 is adjusted during the iterations to speed up the convergence. Since interacting inversion necessitates several ground state computations to attain an approximate solution, it is not efficient. Even more sophisticated iteration schemes cannot circumvent that some densities require incredibly many iterations, i.e. ground state computations, until convergence [28]. Therefore, in general, interacting inversion represents a computationally demanding task.

Exact LDA
As the exact form of E xc from (4) is not known, in practice, approximations are used. One of the simplest and most successful approximations is the LDA [2,20].
The exact LDA e LDA xc is defined via the homogeneous electron gas, i.e. via exactly homogeneous densities n = (n 1 , n 2 , . . . , n L ) T = (n, n, . . . , n) T in the thermodynamic limit L → ∞ [2,20]: This quantity is then used to approximate the xc energy of a finite system by Because E LDA xc (n) is the exact xc energy for exactly homogeneous densities in the thermodynamic limit, it represents a good approximation for relatively homogeneous densities on large lattices L >> 1.
Our feasibility study here assumes a relatively small lattice of size L = 21 with hard-wall boundary conditions, such that finite size and boundary effects play a role. We therefore derive our own LDA for this system and do not make use of the existing results in [29]. Figure 1 shows our e xc (n) obtained from numerically exactly homogeneous densities computed by means of non-interacting (for T s (n)) and interacting (for F HK (n)) inversions on L = 21 lattice sites for all possible total particle numbers N = 0, 1, . . . , 42. To allow for an efficient evaluation of E LDA xc , we parametrize the function e xc (n) using a finite number of parameters. A simple way to achieve this is to assume a polynomial form p d (n) of certain degree d and to fit our results using different values of d. In the fit of each polynomial p d (n), we impose the physically reasonable constraint p d (0) = 0 = p d (2), which trivially holds for the exact E xc , as can be seen in (4): Obviously e xc (0) = E xc (0, 0, . . . , 0)/L = 0 because every term in (4) vanishes independently for zero total particle number, and e xc (2) = E xc (2, 2, . . . , 2)/L = 0 because F HK (2, 2, . . . , 2) = E H (2, 2, . . . , 2) and T s (2, 2, . . . , 2) = 0 due to impossible tunneling. Apparently, our exact LDA e xc is well approximated by polynomials of low degree d since, on the scale of figure 1, the d = 8 fit seems indistinguishable from the d = 4 fit.
The LDA resides on the lowest rung of "Jacob's ladder" [17] and the most successful approximations of E xc beyond the LDA were built on top of it [6,7,8,9,10,11,12]. Analogously, we will use the LDA computed above as our reference, and we will try to improve upon it with a more general ansatz for the functional.

Our ansatz
Our approach for the construction of an improved xc energy approximation consists of two parts. Firstly, it requires an efficient variational density functional ansatz, denoted by G, to approximate E xc . Secondly, a set of M external potentials has to be specified, which will be called training scenario, such that the corresponding M exact ground state densities n t , called training densities, and exact values E xc (n t ) are used to determine G by minimizing a cost function over the variational parameters of G.
We are interested in an ansatz that, firstly, includes the LDA and, secondly, allows for a systematic improvement over it by including non-local terms. In this spirit, we propose a two-site ansatz of the following form: with For X = 0, we have G(n) = G X=0 (n) = G k=0 (n), which is completely analogous to the LDA (6). And for X > 0, the k > 0 terms allow for a more general dependence on the density with two-site functions over a range limited by X. In this way, increasing X allows us to systematically include more non-local information and to go beyond the local LDA.
In order to have a practical functional, we want to write it in terms of a discrete set of variational parameters. Thus we need to restrict the form of the functions g k . For simplicity we choose here a polynomial form for each term, as we did in the previous section 3 for the reference LDA. Additionally, in all following numerical experiments, we simply fix the degree of the polynomial to d = 4.
When G(n) is assumed to be a polynomial of the n l , the variational parameters of G are the polynomial coefficients. Then the desired argmin G d(G), i.e. the argument of d that minimizes the cost function, results from the solution of linear equations Ac = E xc where the polynomial coefficients of G are vectorized in c, the exact values are vectorized in E xc , and the elements in the matrix A establish the correct connection to the cost function (7): We want to emphasize that this approach does not need any computationally demanding interacting inversion. Because the training densities n t follow from the training potentials, i.e. from M different choices of v ext in (3a), we know F HK (n t ). While the calculation of E H (n t ) is trivial, T s (n t ) is computed via efficient non-interacting inversion. Thus, all further ingredients for E xc (n t ) of (4) are then efficiently computable.
The first step in our approach is to consider the ansatz G = G 0 , with g 0 (n l ) := d s=0 c 0 s n s l a polynomial in n l of degree d with coefficients c 0 s (and d = 4 in the following). The simplest possible, "homogeneous", training scenario amounts to setting v ext l = 0, in which case we have one training density for each possible total particle number N = 1, 2, . . . , 2L, i.e. at most M = 42 for L = 21. Figure 2 demonstrates how training our local term g 0 with such ground states reproduces the exact LDA. Remarkably, a very good match between our g 0 and the exact LDA is achieved already with M = 30 ground state computations. This has to be compared to the several thousands of ground state computations that were required for figure 1 due to the interacting inversion iteration.
We can understand that the "homogeneous" training scenario leads to the exact LDA because this scenario contains relatively homogeneous training densities and because the exact LDA is constructed from exactly homogeneous densities. Now we want to consider more inhomogeneous densities. For that purpose we propose the simplest possible extension of the "homogeneous" training scenario: the "step" training scenario shown in figure 3 (a). This scenario contains the simplest external potentials that give rise to inhomogeneous ground state densities, see figure 3 (b) for some example densities. The "step" training scenario allows us to generate much larger training sets since we define it by free choice of: a) the step position (from l = 2, 3, 4, . . . , 21), b) the step height (from h = 0, 0.1, 0.2, . . . , 2.0), c) the step orientation (left or right), and d) the total particle number (from N = 1, 2, 3, . . . , 30). We do not include total particle numbers N larger than 30 in this training set because we want it to contain sufficiently inhomogeneous densities that become more inhomogeneous when the step height increases; clearly, for large total particle numbers such as more than 30 fermions on 21 lattice sites, an increasing step height quickly creates large homogeneous regions of maximum filling in the density, i.e. having 2 fermions per lattice site.
We have investigated two different ways of converging G with this "step" training scenario. In the first way, we pick M ground states randomly and study the convergence of G as a function of M. In the second way, we fix the total particle numbers considered to N = 1, 2, . . . , 12, take all possible step positions and orientations, and increase M systematically together with the step height. In both schemes, convergence is quantified by comparison of the solution for M with the solution for the largest considered M max , which we fixed to 12800 for the random and to 9612 for the systematic densities. We can then look at the quantity where C i (M) denotes the ith parameter of G after training with M densities and . . . := 1/P P i=1 . . . denotes taking the mean value over all P possible values of i. Figure 4 shows our results for random densities. Interestingly, this training scenario gives rise to local terms g 0 that are very similar to the exact LDA, although many training densities of this scenario are very inhomogeneous. Furthermore, we can read off from the inset of figure 4 that convergence occurs rapidly.
To go beyond the LDA, we now include the longer-range two-site terms with k > 0 i.e. these terms are general degree d polynomials of the density values on 2 lattice sites separated by distance k (and, again, we simply fix d = 4 in the following). While, as discussed above, for X = 0, our ansatz G 0 (n) = L l=1 g 0 (n l ) is completely analogous to the LDA of (6), for X > 0, it contains additional non-local terms, such that, by systematically increasing X in our ansatz G X (n), we can systematically increase its non-locality beyond the local LDA-like term.
We enforce in our desired solution G X that the terms g k for increasing k are obtained one after another such that each additional non-local term (i.e. corresponding to the next larger value of k) is a correction to the previous solution. This means that, for given X, we first minimize (7) only via the parameters c 0 which gives G 0 . Then we minimize (7) for the remainder E 1 xc (n t ) := E xc (n t ) − G 0 (n t ) only via the parameters c 1 which together with the previous solution c 0 gives G 1 . Then we minimize (7) for the remainder E 2 xc (n t ) := E xc (n t ) − G 1 (n t ) only via the parameters c 2 which together with the previous solutions c 0 and c 1 gives G 1 . We continue the scheme until we have reached c X and thus G X . This procedure ensures that each longer-range term is built on top of all previous shorter-ranged ones, in the same way as the functionals on higher rungs of "Jacob's ladder" are more non-local and are built on top of the more local functionals on the lower rungs [17]. In order to analyze the performance of the ansatz, we adopt now a different strategy. We will now always fit our ansatz G X with the "step" training scenario of figure 3 and then we will apply it to completely different target densities. For the latter, we choose ground states of the H 2 dissociation problem, i.e. Hamiltonian (1) with total particle number N = 2 and external potential where R denotes the separation between the two H atoms placed in the middle of the lattice such that we set l 0 = 11 for L = 21. Because this problem represents a realistic physical application that is significantly different from our training scenario, we consider it to be a very good benchmark for our approach. From now on, the local terms g 0 (n) in our two-site polynomial ansatz (8) are fixed to be the exact LDA from figure 1. And the non-local terms g k>0 (n 1 , n 2 ) are enforced to fulfill g k>0 (0, 0) = 0 = g k>0 (2, 2) as well as g k>0 (n 1 , n 2 ) = g k>0 (n 2 , n 1 ). These properties are physically reasonable and they reduce the number of variational parameters, which turned out to be beneficial for the convergence of our fit. In particular, this helped us to avoid the effect known as overfitting. We distinguish two different versions of our ansatz, namely constrained and unconstrained. In our constrained two-site polynomial ansatz we determine g k>0 (n 1 , n 2 ) under the constraint g k>0 (n, n) = 0: then G X (n) is exact for exactly homogeneous densities n = (n, n, . . . , n) T . In our unconstrained twosite polynomial ansatz we do not impose this constraint on the polynomial coefficients: the unconstrained ansatz has thus more variational parameters than the constrained ansatz.  Figure 5 shows our results after convergence with M systematic densities. As we can see in (a), the constrained ansatz leads to a visible improvement over LDA close to R = 0, but not for larger R. A convergence of the non-local terms can be concluded from the inset. In (b), the unconstrained ansatz leads to an improvement over LDA for almost all values of R, but it produces too low energy values close to R = 0. The inset of (b) demonstrates that convergence of the non-local terms occurs, however, for k > 1 this convergence is slower than in (a). With increasing X, our ansatz systematically improves the LDA result at specific values of R: around R = 0 when the constrained version is used, and at larger R when the unconstrained version is used. Both versions of our ansatz show a systematic improvement over LDA with increasing X when the mean energy for all values of R is considered. In fact, such a mean value is the correct figure of merit because the cost function (7), minimized for "step" training densities by our ansatz, is also a mean value of many xc energies.
Clearly, we would like to use G X (n) to compute densities self-consistently via the KS cycle. A first step in this direction is the calculation of the xc potential v xc l (n) := −∂E xc /∂n l | n for the exact ground state density n. In the H 2 dissociation problem, the xc potential for larger values of R is particularly interesting, since its exact form exhibits a characteristic peak that cannot be reproduced by LDA alone [30]. Figure 6 shows our results for R = 5. While our constrained ansatz leads to a potential that basically coincides with the one from LDA, our unconstrained ansatz leads to a small systematic improvement with increasing X.

Conclusions
We have analyzed the feasibility of constructing semi-empirical approximations for the xc density functional in the context of a long-range interacting many-electron system on a one-dimensional lattice. Using numerically exact ground states from MPS simulations, we proposed to fit an ansatz that includes an LDA-like part plus additional terms of increasing non-locality, by means of reasonably chosen training densities. We observed that our ansatz converges systematically within the training scenario. Additionally, when applied to completely different target densities, namely of the H 2 dissociation problem, our fitted ansatz improved upon the LDA systematically. This systematic improvement was demonstrated for the ground state energies of the H 2 problem and for a xc potential corresponding to a stretched H 2 molecule.
In this work, we have tested the effect of a systematic inclusion of non-local ingredients in the functional using very simple ansaetze for the non-local terms, namely only two-site dependences, and for the functional form, namely only polynomials, and we considered only one training scenario. Our results show that by systematically including non-local terms, the approximation can without doubt be improved beyond LDA. Our quantitative results are nevertheless limited to the specific form of the ansatz and training set used. For instance, the fact that the dissociation curve does not significantly improve by including terms of longer range after some point seems to indicate that other non-local contributions may be more relevant. Likewise, considering functional forms for each term that go beyond a polynomial may improve the power of the ansatz. We have already run initial tests to study the effect of terms that depend on three variable densities, but we observe no clear convergence with as many as ≈ 20000 densities from the "step" training scenario. This clearly indicates the necessity for a different training scenario and possibly additional physical constraints that effectively reduce the number of variational parameters. The question itself of how to choose the training densities optimally is, in general, a very important one that should be further explored, as a better training scenario would always further improve our results. All in all, although such improved ansaetze and training scenarios must definitely achieve better results than the ones reported here, a careful analysis is beyond the scope of this proof-of-principle work.
It would be very interesting to combine our procedure with concepts from recent works on machine learning of density functionals [31,32,33,34,35,36]. On the one hand, these works typically required less training densities than our approach. On the other hand, our work constructs systematic corrections to a standard approximation, namely the LDA, that can be applied in general, i.e. to other types of systems beyond those that were originally used for the fit. Thus, a combination of the good aspects of our procedure with the good aspects of the previous machine learning concepts could be the ultimate solution to some of the problems that both approaches currently have independently from each other.
In this article, we focused exclusively on approximations of the density functional for the xc energy. However, in principle, any ground state observable can be written as a functional of the ground state density [1]. Our proposed scheme allows in principle to also construct systematically density functionals for observables other than the ground state energy. This fact might now be useful for ultracold atoms in optical lattices since the densities of these systems have become experimentally accessible with remarkable resolution [37,38,39,40,41,42,43,44]. Measurements of a certain observable which are difficult to perform with the current techniques might be easier to carry out now when a density functional would be provided for that observable. Particularly interesting measurements regard two-dimensional systems with special observables, such as e.g. special order parameters. The corresponding density functionals could be constructed with the help of projected entangled pair states (PEPS) [45]. Because experiments are restricted to finite system sizes, PEPS algorithms are perfectly suited for the ground state simulation of such finite two-dimensional systems, for which they can produce accurate numerical results [46,47,48,49,50,51].