Towards the construction of an accurate kinetic energy density functional and its functional derivative through physics-informed neural networks

One of the primary obstacles in the development of orbital-free density functional theory is the lack of an accurate functional for the Kohn-Sham non-interacting kinetic energy which, in addition to being accurate itself, must also render a good approximation to its functional derivative. To address this critical issue, we propose the construction of a kinetic energy density functional through a physics-informed neural network, whose loss function is designed to simultaneously reproduce the shell structure of atoms and an analytically calculated functional derivative. As a proof of concept, we have tested the accuracy of the kinetic energy potential by optimizing electron densities for atoms from Li to Xe.

The development of an accurate kinetic energy functional is perhaps the most challenging problem in the orbital-free version of density functional theory (OF-DFT) [1]. Since Hohenberg and Kohn demonstrated that the total energy of an electron system can be written in terms of the electron density [2], $n(\mathbf{r})$, a considerable number of works have been devoted to seeking kinetic and exchange-correlation energy functionals in terms of the electron density and its derivatives [3]. In the Kohn-Sham version of density functional theory (KS-DFT), an auxiliary set of one-electron orbitals is introduced, which provides an accurate kinetic energy at a computational cost comparable to that of a Hartree-Fock calculation [4]. In this version, an unknown exchange-correlation functional is introduced, whose exact form is still elusive. Even though KS-DFT is very appealing for many problems, from a computational point of view, finding the eigenvalues of the KS-DFT equations typically scales as $N^3$, where $N$ is the number of basis orbitals (which increases linearly with the number of atoms). This scaling limits the applicability of the KS-DFT approach to large systems. In contrast, since the OF-DFT equations exhibit quasi-linear computational scaling, it would be possible to model systems with millions of atoms [5,6]. Although many reasonable kinetic energy density approximations have been proposed, most of them produce errors in the predicted functional derivative large enough that qualitatively incorrect results are observed in iterative OF-DFT calculations [1,7]. In this work, we address this critical issue by means of a physics-informed neural network, a natural tool for obtaining the functional derivative.
This work is focused on the Kohn-Sham kinetic energy functional, whose general form is

$$T_s[n] = \int \tau_{KS}(\mathbf{r})\, d\mathbf{r}. \qquad (1)$$

Here $\tau_{KS}(\mathbf{r})$ in equation (1) is the positive Kohn-Sham kinetic energy density, $\tau_{KS}(\mathbf{r}) > 0$, defined as

$$\tau_{KS}(\mathbf{r}) = \frac{1}{2} \sum_i n_i \left| \nabla \phi_i(\mathbf{r}) \right|^2, \qquad (2)$$

where $\{\phi_i\}$ are the Kohn-Sham orbitals, $n_i$ are the occupation numbers, and the summation runs over all orbitals. It is important to point out that adding any amount of the Laplacian of the density to $\tau_{KS}$ will not change the overall numerical value of the total kinetic energy; therefore, there is no unique definition of the local kinetic energy density [8]. By following the approach employed in the generalized gradient approximation [9], we assume that $\tau_{KS}(\mathbf{r})$ has the form

$$\tau_{KS}(\mathbf{r}) = \tau_{TF}(\mathbf{r})\, F_s, \qquad (3)$$

where $\tau_{TF}$ is the Thomas-Fermi kinetic energy density, $\tau_{TF} = c_{TF}\, n^{5/3}$, with $c_{TF} = \tfrac{3}{10}(3\pi^2)^{2/3}$, and $F_s$ is the enhancement factor. It is important to consider that, for this factor to be invariant under a uniform density scaling [10], it has to be a function of the dimensionless reduced gradient

$$s^2 = \frac{|\nabla n|^2}{4 (3\pi^2)^{2/3}\, n^{8/3}}, \qquad (4)$$

and of the reduced Laplacian

$$p = \frac{\nabla^2 n}{4 (3\pi^2)^{2/3}\, n^{5/3}}. \qquad (5)$$

The kinetic enhancement factor is approximated in the form

$$F_s(s^2, p) = 1 + \frac{5}{27} s^2 + \Delta(s^2, p). \qquad (6)$$

In this equation, the first two terms on the right-hand side correspond to the second-order gradient expansion approximation (GEA2) of the kinetic energy density, that is, the Thomas-Fermi term plus 1/9 of the von Weizsäcker energy [11,12]. The third term, $\Delta$, is approximated by a deep neural network that employs $s^2$ and $p$ as input variables.
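To make these working equations concrete, the following is a minimal NumPy sketch of equations (3)-(6), assuming the density, the magnitude of its gradient, and its Laplacian are available as arrays on a grid; the callable delta_model stands in for the neural-network term $\Delta$ and is a placeholder of this illustration, not part of the original work.

```python
import numpy as np

C_TF = (3.0 / 10.0) * (3.0 * np.pi**2) ** (2.0 / 3.0)  # Thomas-Fermi constant

def descriptors(n, grad_n, lap_n):
    """Pointwise descriptors of eqs. (4) and (5) from n, |grad n| and lap n."""
    k2 = 4.0 * (3.0 * np.pi**2) ** (2.0 / 3.0)
    s2 = grad_n**2 / (k2 * n ** (8.0 / 3.0))   # reduced gradient squared, eq. (4)
    p = lap_n / (k2 * n ** (5.0 / 3.0))        # reduced Laplacian, eq. (5)
    return s2, p

def tau_ks(n, grad_n, lap_n, delta_model):
    """Kinetic energy density of eqs. (3) and (6): tau_TF * F_s(s^2, p)."""
    s2, p = descriptors(n, grad_n, lap_n)
    tau_tf = C_TF * n ** (5.0 / 3.0)           # Thomas-Fermi kinetic energy density
    return tau_tf * (1.0 + (5.0 / 27.0) * s2 + delta_model(s2, p))
```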
The use of equation (6) was motivated by three features of $\Delta$. First, GEA2 recovers a significant part of the full kinetic energy; consequently, $\Delta$ contributes only a small fraction of the kinetic energy. In an atomic and molecular benchmark comparison of different functionals, the error of GEA2 was 0.73% and 0.41%, respectively (see Supplementary Material for details). The second reason to use $\Delta$ is its capability to reproduce the shell structure. After a suitable normalization that uses a hyperbolic tangent function, $\Delta$ reveals the shell structure of the atoms, much like the Pauli enhancement factor used in the electron localization function [13]. To illustrate this point, figure 1(a) plots (in black), for the Kr atom, the normalized value of $\Delta$ as a function of the distance to the nucleus, where the shell structure correctly emerges. The shell structure of atoms is an important ingredient in the design of kinetic energy functionals [13][14][15]; in fact, many of the kinetic energy functionals published in the literature fail to describe this important feature correctly. The third reason is related to the shapes of $s^2$ and $p$ after normalization. To clarify this, figure 1(a) also displays the normalized $s^2$ (in blue) and $p$ (in red), where it is clear that these two descriptors qualitatively have the same shape as $\Delta$. This last property encourages us to use only the reduced gradient and the Laplacian as inputs to the neural network.
Two neural network models were employed for the construction of $\Delta$. The first one is a purely semi-local neural network (SLNN) that uses only $s^2$ and $p$ as input variables (figure 1(b)). The first step is the normalization of the input and target variables using the hyperbolic tangent. The SLNN consists of a number of fully connected hidden layers with rectified linear unit (ReLU) activation functions and a single output neuron with a sigmoid activation function that produces the normalized $\Delta$. Finally, an inverse normalization renders the desired $\Delta$. In addition to this semi-local network, a non-local neural network (NLNN) is employed (figure 1(c)). The NLNN starts with two neural networks: one similar to the SLNN, which accounts for all the local features, and a second one that acts as a filter. For atoms, the second network depends on the number of electrons and on the distance to the nucleus. The last layers of these two networks contain the same number of neurons, and the convolution of the local and non-local networks is achieved by an element-wise multiplication of these layers. The result of this convolution is followed by a final fully connected network. A detailed explanation and technical information on the SLNN and NLNN are provided in the supplementary information.
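The architectural description above can be summarized in code. The following is a minimal TensorFlow/Keras sketch of the two architectures; the layer widths, depths, and tanh scale parameter are illustrative placeholders (the values actually used are given in the supplementary information), and only the structural choices stated in the text (ReLU hidden layers, a sigmoid output neuron, and the element-wise product of the two branches in the NLNN) are taken from this work.

```python
import tensorflow as tf

def normalize(x, a=1.0):
    # tanh normalization of inputs/targets; the scale 'a' is illustrative.
    return tf.tanh(a * x)

def build_slnn(width=64, depth=3):
    """Semi-local network: normalized (s^2, p) -> normalized Delta."""
    inp = tf.keras.Input(shape=(2,))                 # normalized (s^2, p)
    x = inp
    for _ in range(depth):
        x = tf.keras.layers.Dense(width, activation="relu")(x)
    out = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # normalized Delta
    return tf.keras.Model(inp, out)

def build_nlnn(width=64, depth=3):
    """Non-local network: a local branch on (s^2, p) and a filter branch on
    (number of electrons, distance to the nucleus), joined element-wise."""
    loc_in = tf.keras.Input(shape=(2,))              # normalized (s^2, p)
    flt_in = tf.keras.Input(shape=(2,))              # (N_electrons, r)
    x, y = loc_in, flt_in
    for _ in range(depth):
        x = tf.keras.layers.Dense(width, activation="relu")(x)
        y = tf.keras.layers.Dense(width, activation="relu")(y)
    z = tf.keras.layers.Multiply()([x, y])           # element-wise "convolution"
    z = tf.keras.layers.Dense(width, activation="relu")(z)   # final dense head
    out = tf.keras.layers.Dense(1, activation="sigmoid")(z)
    return tf.keras.Model([loc_in, flt_in], out)
```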
Previous works have shown that the kinetic energy can be accurately obtained using machine learning [16][17][18] or neural networks [19][20][21][22][23][24]. This may be expected from the well-known capability of neural networks as universal function approximators [25]; consequently, neural networks are good candidates to provide a simple way to identify a nonlinear map from $(s^2, p)$ to the Kohn-Sham kinetic energy density. In this work, we propose an approach that introduces the shell structure of the atoms into the network training through $\Delta$, together with a strategy for incorporating non-local information through the NLNN. The next step is to address the calculation of the functional derivative; to this end, a physics-informed neural network (PINN) is introduced.
For training purposes, the functional derivative of the kinetic energy can be evaluated from the exact orbital-dependent expression [26],

$$\upsilon_s(\mathbf{r}) \equiv \frac{\delta T_s}{\delta n(\mathbf{r})} = \mu - \frac{1}{n(\mathbf{r})} \sum_i n_i \left( \epsilon_i \left| \phi_i(\mathbf{r}) \right|^2 + \frac{1}{2}\, \phi_i^*(\mathbf{r}) \nabla^2 \phi_i(\mathbf{r}) \right), \qquad (8)$$

where $\phi_i$ and $\epsilon_i$ are, respectively, the occupied KS orbitals and their eigenvalues, and $\mu$ is the chemical potential, approximated by the HOMO eigenvalue. Notice that $\upsilon_s$ in equation (8) is related to the effective-potential-dependent term of Manzhos and Golub [18]. Alternatively, the functional derivative of $T_s$ can be obtained by functional differentiation of equation (1) [27],

$$\upsilon_s(\mathbf{r}) = \frac{\partial \tau_{KS}}{\partial n} - \nabla \cdot \frac{\partial \tau_{KS}}{\partial (\nabla n)} + \nabla^2 \frac{\partial \tau_{KS}}{\partial (\nabla^2 n)}. \qquad (9)$$

This result shows that $\upsilon_s$ depends on the reduced gradient and Laplacian, on derivatives of $\Delta$ up to third order, and on the fourth- and sixth-order reduced derivatives introduced in the supplementary information. Thus, our aim is not only to train a network to reproduce the values of $\Delta$, but also, by employing a PINN, to reproduce the values of $\upsilon_s$ in equation (9).
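For training-data generation, equation (8) reduces to simple array operations once the occupied orbitals and their Laplacians are tabulated on a grid (both are analytic for a Gaussian basis). A minimal NumPy sketch is given below, assuming real orbitals; the array layout is a convention of this illustration.

```python
import numpy as np

def upsilon_s_orbital(phi, lap_phi, occ, eps, mu, n):
    """Orbital-based functional derivative, eq. (8), on a real-space grid.

    phi, lap_phi : arrays of shape (n_orb, n_grid) holding the occupied KS
                   orbitals and their Laplacians (assumed real).
    occ, eps     : occupation numbers and orbital eigenvalues.
    mu           : chemical potential (HOMO eigenvalue).
    n            : electron density on the same grid.
    """
    num = (occ[:, None] * (eps[:, None] * phi**2
                           + 0.5 * phi * lap_phi)).sum(axis=0)
    return mu - num / n
```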
Physics-informed neural networks (PINNs) constitute a scientific machine learning technique used to solve problems involving differentiation of the neural network target function, as is the case for partial differential equations [28,29]. To obtain the functional derivatives, the PINN exploits the automatic differentiation capabilities of Python neural network interfaces, such as TensorFlow, to differentiate neural networks with respect to their inputs and model parameters. Automatic differentiation, or algorithmic differentiation, is one of the most useful, but perhaps underutilized, techniques in machine learning. The objective is to integrate into the loss function of a neural network the information from both the value of $\Delta$ and the functional derivative. Figure 2 summarizes the building blocks of the physics-informed neural networks used in this work. The networks are composed of three components: a neural network, a physics-informed network, and a feedback mechanism. The first block is the neural network introduced in figures 1(b) and (c). The second block computes the derivatives needed to determine the losses. Finally, the feedback mechanism minimizes the loss according to some learning rate.
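As a concrete illustration of the second block, the following sketch uses TensorFlow's GradientTape to differentiate a trained $\Delta$ network with respect to its inputs; these first-order derivatives (and higher ones, obtained by nesting tapes) are the ingredients that enter equation (9). The function assumes the SLNN-style model of the earlier sketch.

```python
import tensorflow as tf

def delta_and_input_derivatives(model, s2, p):
    """Automatic differentiation of the network output Delta with respect
    to its inputs s^2 and p (first order; nest tapes for higher orders)."""
    s2 = tf.convert_to_tensor(s2, dtype=tf.float32)
    p = tf.convert_to_tensor(p, dtype=tf.float32)
    with tf.GradientTape(persistent=True) as tape:
        tape.watch([s2, p])
        delta = model(tf.stack([s2, p], axis=-1))
    d_delta_ds2 = tape.gradient(delta, s2)   # d(Delta)/d(s^2)
    d_delta_dp = tape.gradient(delta, p)     # d(Delta)/dp
    del tape                                 # release the persistent tape
    return delta, d_delta_ds2, d_delta_dp
```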
The PINN methodology determines the parameters $\{\theta\}$ of the neural network block by minimizing a loss function defined as

$$\mathcal{L}(\theta) = \omega_\Delta \mathcal{L}_\Delta + \omega_\upsilon \mathcal{L}_\upsilon. \qquad (10)$$

The two terms of $\mathcal{L}$ are the mean square error over the training data of $\Delta$, denoted $\mathcal{L}_\Delta$, and the mean square error of the functional derivative $\upsilon_s$, denoted $\mathcal{L}_\upsilon$. Numerical tests were performed using an atomic dataset of 18 atoms from Li to Xe. In a first step, we trained the neural network to obtain an approximation of $\Delta$, excluding the PINN block ($\omega_\upsilon = 0$). For comparison purposes, we tested several functionals (see Supplementary Material) and computed the mean absolute percent error (MAPE) of the total kinetic energy. The two functionals with the smallest MAPE are the PBE-like approximations of the kinetic energy density due to Tran and Wesolowski (PBE-TW) [30] and to Constantin, Fabiano, Laricchia, and Della Sala (APBEK) [31]. Figures 3(a) and (b) compare the performance of the SLNN and NLNN with that of GEA2, PBE-TW, and APBEK. Figure 3(a) shows that the percentage error of the SLNN is one order of magnitude lower than those of APBEK and PBE-TW, whereas the inclusion of the distance-dependent part in the NLNN reduces the error by an additional order of magnitude. The MAPE for $\Delta$ (about 1.6 million points) is 0.455% for the SLNN and 0.098% for the NLNN. Clearly, the non-local information is very important for a correct description of the shell structure of the atoms. Figure 3(b) shows the error in the kinetic energy of the 18 atoms in mHa. Since the kinetic energy increases with atomic number, it is reasonable to expect the error to increase as well. Again, for all atoms, the NLNN outperforms the SLNN and every tested functional. For the second-row series, the average error is 0.84 mHa, which increases to 14.00, 48.12, and 308.8 mHa for the third-, fourth-, and fifth-row series.
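In code, the composite loss of equation (10) is a weighted sum of two mean-square-error terms. A minimal sketch follows, assuming the $\upsilon_s$ predictions have already been assembled from the network derivatives via equation (9):

```python
import tensorflow as tf

mse = tf.keras.losses.MeanSquaredError()

def pinn_loss(delta_ref, delta_pred, ups_ref, ups_pred,
              w_delta=0.5, w_ups=0.5):
    """Composite loss of eq. (10): a data term on Delta plus a physics term
    on the functional derivative. Setting w_ups=0 recovers plain training."""
    return w_delta * mse(delta_ref, delta_pred) + w_ups * mse(ups_ref, ups_pred)
```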
Next, as a starting point, we used the PINN with $\omega_\Delta = \omega_\upsilon = 0.5$. Figures 3(a) and (b) show the errors when the SLNN (SLPINN) and the NLNN (NLPINN) are used. In this case, the performance in the kinetic energy calculation is slightly worse: the SLPINN error is 1.40 times larger than that of the SLNN, while the NLPINN error is 1.47 times larger than that of the NLNN. Nevertheless, the results are still much better than those of the compared functionals. The following step was to investigate to what extent the PINN can match the functional derivative. Thus, for the Kr atom, figure 3(c) displays the results for the orbital potential of equation (8) (black curve), the GEA2 potential (blue curve), and the PINN potentials, SLPINN (orange curve) and NLPINN (green curve), from equation (9). First, let us note the absence of nuclear singularities in $\upsilon_s$, which may be attributed to the fact that the Gaussian basis set does not have the proper nuclear-site cusp. The figure also shows that for distances larger than approximately $10^{-2}$ Å all the calculations predict similar results; however, closer to the nucleus they differ. For GEA2, the potential values are significantly smaller than those corresponding to the orbital potential. This is not a surprising result, since this functional is based on the uniform electron gas limit. The PINN-based neural networks, on the other hand, show better performance near the nucleus, with the NLPINN outperforming the SLPINN. From this study it is clear that non-local information is crucial in two key ways: (i) the information on the nuclear position used in the computation of the kinetic energy produces a more detailed description of the regions between two atomic shells; (ii) the potential obtained from the NLPINN is significantly closer to the reference (RHF) one.
Let us make some final remarks. In this work, we have used Hartree-Fock Gaussian atomic densities to compute the kinetic energy density and the kinetic energy functional derivative. Since our PINN functionals have been obtained by fitting the orbital potential, it is expected that they will also perform well in iterative OF-DFT calculations, delivering reasonable results. Additionally, some preliminary tests on atoms have shown that, starting from an ansatz density that automatically guarantees integration to the number of electrons, the target density can be obtained by minimizing an energy functional with a gradient descent algorithm, and the result displays the proper shell structure (see Supplementary Material). We point out that the deviations of the electron density distribution with respect to the reference are less than 0.1%. Despite all this, some questions remain about how well these PINN-based functional derivatives can yield the correct ground-state densities in molecular bonding regions, and about the accuracy of the PINN-based functionals and their derivatives in that case, all of which is important for actually carrying out KEDF calculations of the density and its derivative. Although this study is based on an empirical method, the results show that, compared with conventional KEDFs, the tested networks with the smallest energy error also display remarkably small density deviations. However, owing to the use of the ReLU transfer function, whose derivative is discontinuous, several challenges remain in the training of the models and the PINN that we intend to address in future applications. Finally, we would like to note that the transferability of the constructed functionals is important in machine learning studies; hence, we are testing the SLNN and NLNN networks using densities from DFT calculations instead of the HF training set, and we observe no deterioration in accuracy.
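As an illustration of the density optimization mentioned above, the following sketch minimizes an energy functional by gradient descent over a density that is positive and normalized to the number of electrons by construction; the squared-amplitude parameterization and the optimizer settings are choices of this illustration, not the ansatz of the supplementary material.

```python
import tensorflow as tf

def optimize_density(energy_fn, g0, w, n_elec, steps=500, lr=1e-2):
    """Gradient-descent density optimization.

    energy_fn : differentiable map from a density on the grid to the energy.
    g0        : initial guess for the auxiliary amplitude g(r).
    w         : quadrature weights, so that sum(w * n) = integral of n.
    """
    g = tf.Variable(g0, dtype=tf.float32)
    w = tf.convert_to_tensor(w, dtype=tf.float32)
    opt = tf.keras.optimizers.SGD(learning_rate=lr)

    def density(g):
        # n = N * g^2 / integral(g^2): positive and normalized by construction.
        return n_elec * g**2 / tf.reduce_sum(w * g**2)

    for _ in range(steps):
        with tf.GradientTape() as tape:
            e = energy_fn(density(g))
        opt.apply_gradients([(tape.gradient(e, g), g)])
    return density(g)
```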