Inverting the Kohn–Sham equations with physics-informed machine learning

Vincent Martinetto; Karan Shah; Attila Cangi; Aurora Pribram-Jones

doi:10.1088/2632-2153/ad3159

1. Introduction

Modern, high-performance computational resources have enabled large-scale electronic structure simulations of molecules, materials, and other systems of interest across biology, chemistry, physics, and beyond. Kohn–Sham (KS) density functional theory (DFT) [1, 2] is the most widely used method due to its accuracy and computational efficiency. KS-DFT has helped solve major scientific and technological problems such as simulating chemical reactions, computing material properties, finding new catalysts, discovering drugs, and modeling microscopic environmental processes.

The electronic structure problem is typically tackled by solving the non-relativistic Schrödinger equation for the molecular Hamiltonian within the Born-Oppenheimer approximation. KS-DFT simplifies this process by transforming the many-body problem into an effective single-particle problem, utilizing the one-to-one correspondence of the external potential and the electronic density [1]. The crux of KS-DFT is producing an electronic density that matches the one obtained from solving the interacting many-body problem [2]. This is accomplished through an effective single-particle potential known as the KS potential, which is comprised of the external potential, the Hartree potential that accounts for electrostatic repulsion, and an exchange-correlation (XC) potential compensating for all interactions beyond the Hartree potential.

While an exact expression for the XC potential ( $v_\mathrm{{\scriptscriptstyle XC}} = \delta E_\mathrm{{\scriptscriptstyle XC}}/\delta n$ ) is not known, useful approximations exist. These approximations are categorized based on increasing complexity and accuracy [3]. They include, for example, the local density approximation [2, 4, 5], which is derived from the interacting uniform electron gas [6]; generalized gradient approximations (GGAs) [4, 7–9], relying on density gradients; meta-GGAs [10, 11], which consider the kinetic energy density; and hybrid functionals, [4, 12, 13] incorporating a dependence on Hartree–Fock orbitals. More sophisticated approximations beyond hybrid functionals include the random phase approximation [14–19] and double hybrid [20–23] methods.

Despite the usefulness of existing approximations for various applications in chemistry and materials science, developing more accurate XC approximations is an active area of research. [24] New paths for finding and optimizing such approximations are required for describing complex molecular systems of upcoming interest [25]. Some advances in functional construction leverage machine learning (ML) techniques [26, 27]. These methods share the common goal of learning the XC energy functional from accurate data on molecular systems [28–51].

Another promising approach to constructing approximations to the XC functional is by reverse-engineering the XC potential from systems that can be solved exactly. This procedure takes the electronic density as input and finds the corresponding XC potential by inverting the KS equations. Though straightfoward to describe, this process is an example of an inverse problem. Inverse problems are encountered in various scientific fields, such as image reconstruction, a fundamental method in MRI and CT scans; inverse scattering, commonly used in seismology and radar imaging; and tomography, frequently employed in CT scans. In each case, as in our case, observed data is used to reconstruct the structure of an object that gives rise to the initial data [52]. Inverse problems are often ill-posed, meaning they are highly sensitive to small changes in the observed data and noise. This sensitivity makes them challenging to solve [53].

Various algorithmic and numerical schemes for inverting the KS equations exist in the literature [54–60]. These have provided valuable insights [61–66] into the properties of the XC potential, such as the appearance of the derivative discontinuity or the dynamical step structures in time-dependent DFT, and have guided the construction of XC approximations [67]. However, traditional inversion methods are often plagued by the numerical errors and instabilities [55] common in the solution of inverse problems, as well as from those specific to the density-to-potential problem.

Recent research has demonstrated numerous examples of utilizing ML, particularly neural networks, to address inverse problems, with a focus on medical image processing applications. The primary objective is to learn stable mappings that can be effectively applied to noisy data in a general context [68, 69]. The fusion of physics with ML offers innovative solutions to complex problems, leading to the evolution of physics-informed machine learning (PIML) techniques [70]. Instead of solely relying on data, PIML incorporates governing physical laws, typically differential equations, into the learning process. This results in data-efficient learning and equips the model with the capability to generalize within physically consistent bounds. This is especially relevant in cases with sparse or noisy data where conventional ML methods might not perform well due to overfitting or by producing non-physical outcomes.

In this work, we employ two recent advances in PIML, namely physics-informed neural networks (PINNs) [71] and Fourier neural operators (FNOs) [72], to tackle the inverse problem in KS-DFT. PINNs have the unique ability to learn from the underlying physics described by partial differential equations, leading to improved generalizability with less data and training time compared to conventional methods. This characteristic proves highly advantageous in computational sciences where data availability is limited and generating extensive training sets is costly. PINNs have demonstrated their effectiveness and versatility in various scientific domains for inversions, including materials and fluid dynamics models [73–75].

Neural operators (NOs) are a class of models that map function-to-function spaces, as opposed to the finite-dimensional vector space mappings of neural networks. This is especially useful for inverse PDE problems where experimental or simulation data is available. FNOs are a type of NO which represent operator weights in Fourier space. The main advantages of FNOs are that they generalize well across function spaces, and being resolution-invariant, they can be used for inference on higher resolution grids than the training set grids. FNOs have also been used for forward and inverse PDE problems in various domains [72, 76].

In the following, we construct both a PINN and an FNO to predict the XC potential based on input electronic densities. We demonstrate the implementation of these computational workflows and compare both approaches in terms of efficiency and accuracy for their applicability as KS inverters. We observe higher levels of accuracy and ease of use for FNOs and more predictable errors and greater extrapolation power for PINNs.

2. Methods

2.1. KS DFT

In the framework of KS-DFT, we solve a set of effective single-particle equations known as the KS equations (which we present here using atomic units [77] for clarity and simplicity of notation),

$\begin{align} \left[ -\frac{\nabla^2}{2} + v_{s}\left(\boldsymbol{r}\right) \right] \phi_i\left(\boldsymbol{r}\right) = \epsilon_i ~ \phi_i\left(\boldsymbol{r}\right)\,, \end{align} \tag{ 1 }$

where $\phi_i(\boldsymbol{r})$ represents the KS eigenfunctions and ε_i denotes the KS eigenvalues. These KS equations constitute an auxiliary system designed to replicate the electron density of an interacting many-body system, expressed as

$\begin{align} n\left(\boldsymbol{r}\right) = \sum_{i = 1}^{N} \vert \phi_i\left(\boldsymbol{r}\right) \vert^2\,, \end{align} \tag{ 2 }$

where the summation index corresponds to the total number of electrons, denoted as N.

The correspondence of the density defined in equation (2) with the density of an interacting many-body system is established through the KS potential which mimics the inter-electronic interaction and is defined as

$\begin{align} v_\mathrm{{\scriptscriptstyle S}}\left(\boldsymbol{r}\right) = v_\mathrm{{\scriptscriptstyle ext}}\left(\boldsymbol{r}\right) + v_\mathrm{{\scriptscriptstyle H}}\left(\boldsymbol{r}\right) + v_\mathrm{{\scriptscriptstyle XC}}\left(\boldsymbol{r}\right)\ . \end{align} \tag{ 3 }$

Here, $v_\mathrm{{\scriptscriptstyle ext}}(\boldsymbol{r})$ is the external potential, $v_\mathrm{{\scriptscriptstyle H}}(\boldsymbol{r})$ is the Hartree potential (i.e. the electrostatic potential due to an electronic density interacting with itself) and $v_\mathrm{{\scriptscriptstyle XC}}(\boldsymbol{r})$ is the XC potential. Both the external and Hartree potentials are known exactly in terms of the density. The only unknown is the XC potential. While solving the set of KS equations with the exact XC potential yields the exact density of the many-body system, in practice, approximations to the XC energy need to be used. A plethora of approximations is available in the literature [2, 4–13] and more accurate approximations are continuously being developed.

Usually, the KS equations are solved iteratively. This means that a system is defined and an XC approximation is selected. An initial guess of the ground-state density is made, which in turn provides guesses for the Hartree and XC potentials. Equations (1) and (2) are solved, generating a new density, which can be used to calculate new Hartree and XC potentials. This cycle will continue until the density converges to within a chosen criterion. In the end, the resulting density is an accurate approximation of the interacting many-body system, which can be used to calculate electronic properties.

Conversely, equations (1) and (2) can be solved to find the exact KS potential that results in a given target density. In this case, no approximation is made to the XC potential, only an initial guess. Then, by solving equations (1) and (2), the guess for the XC potential can be updated. This process is much less stable than the 'direct' one described above because the potential is sensitive to small changes in the density.

We use the iDEA code [78, 79] for generating data and as benchmark for inversion. In iDEA, the density-to-potential inversion is done iteratively:

$\begin{align} v_{s} \rightarrow v_{s} + \mu \left[n\left(\boldsymbol{r}\right)^p - n_0\left(\boldsymbol{r}\right)^p\right]\,, \end{align} \tag{ 4 }$

where $n_0(r)$ represents the target density, while µ and p serve as convergence parameters.

2.2. Model system

Model systems in one dimension are a useful tool for the development of DFT approximations [59] for two reasons. First, it is simpler to obtain highly accurate or exact data from a one-dimensional system. Second, these model systems can be finely tuned to mimic a wide range of well-understood physical phenomena. Therefore, we leverage one-dimensional models to demonstrate the utility of our inversion method. We define two model systems: a one-dimensional atom and a one-dimensional diatomic molecule, as illustrated in figure 1. A one-dimensional atom is defined by the external potential

$\begin{align} v_\mathrm{{\scriptscriptstyle ext}}\left(\boldsymbol{r}\right) = -\frac{Z}{|\boldsymbol{r}|+a}\,, \end{align} \tag{ 5 }$

where Z is the charge of the atomic well and a is softening parameter to ensure that the potential is defined at all points in space. This simple model can readily be extended to a one-dimensional model of a diatomic molecule by incorporating two potential wells through an additional parameter, d, which represents the bond length in terms of the separation distance between the two wells:

$\begin{align} v_\mathrm{{\scriptscriptstyle ext}}\left(\boldsymbol{r}\right) = -\frac{Z_1}{|\boldsymbol{r}-\frac{d}{2}|+a} - \frac{Z_2}{|\boldsymbol{r}+\frac{d}{2}|+a}. \end{align} \tag{ 6 }$

This idea can be extended by defining additional wells to move from an atomistic view to a material or crystalline structure in one dimension.

**Figure 1.** Illustration of the model systems: one-dimensional atom (blue) and a one-dimensional diatomic molecule (orange) with variable parameters charge Z, Z₁, and Z₂ and bond length d.
Download figure:
Standard image High-resolution image

All data used to train the presented models were generated using external potentials defined in this manner. The electronic densities and KS eigenfunctions were generated using the iDEA code [78, 79]. For the one-dimensional atom of varying charge, Z was varied between the values of two and six in steps of 0.1, resulting in a dataset with 41 individual data points which was split into a train, test, and validation set. These sets had 32, 5, and 4 individual data points each, respectively. For the one-dimensional diatomic molecule, Z₁, Z₂, and d were all varied within the range of one to five, in steps of 0.5. All combinations of the three variables were generated, resulting in a data set with 729 data points. The dataset was split such that the training set had 590 points, the test set had 73, and the validation set had the remaining 66. The dataset can be downloaded from the HZDR Data Repository Rodare via the link https://rodare.hzdr.de/record/2720 [80].

2.3. Physics-informed neural networks

As a subset of neural networks, PINNs use equations and conditions of the underlying physics to enable a neural network to predict physically relevant information. Consider a general partial differential equation of the form

$\begin{align} u\left(x,t\right) + \mathcal{N}\left[u;\lambda\right] = 0\,, \end{align} \tag{ 7 }$

where u is the solution, $x \in \Omega$ a D-dimensional spatial coordinate with Ω a subset of $\mathbb{R}^D$ , $t \in [0,T]$ the time variable, $\mathcal{N}$ a nonlinear operator, and λ the set of parameters that describe a physical system. If a neural network is used to represent the solution u, as $\tilde{u}$ , this gives rise to

$\begin{align} \tilde{u}\left(x,t\right) + \mathcal{N}\left[\tilde{u};\lambda\right] : = f\left(x,t\right)\ . \end{align} \tag{ 8 }$

The function f must only equal zero when the network accurately represents the solution to the differential equation for the given parameters λ. If trained on a wide set of physical data, it should be able to predict solutions to the differential equation within that set more generally.

Based on equation (1), we derive the following model and loss functions within the framework of PINNs:

$\begin{align} f\left(\boldsymbol{r}\right)&=\epsilon_i ~ \phi_i\left(\boldsymbol{r}\right)+\frac{\nabla^2}{2}~\phi_i\left(\boldsymbol{r}\right)- v_{s}\left(\boldsymbol{r}\right)~\phi_i\left(\boldsymbol{r}\right) \end{align} \tag{ 9 }$

$\begin{align} L_\mathrm{{\scriptscriptstyle PDE}} &= \mathrm{}{MSE}\left(f,0\right) = \frac{1}{N_f} \sum_{i = 1}^{N_f}\left| f\left(x_f^i\right) \right|^2\,, \end{align} \tag{ 10 }$

where the loss function $L_\mathrm{{\scriptscriptstyle PDE}}$ is computed in terms of a mean squared error (MSE) $\mathrm{}{MSE}(f,0)$ , with $x_f^i$ denoting one of the N_f collocation points where the solution is provided.

Here, our focus is to invert the KS equation, wherein we have the set of solutions, $\{\phi_i\}$ , and our goal is to get the network to predict the parameters, $v_\mathrm{{\scriptscriptstyle S}}$ , that lead to an appropriate set of solutions. If the network can correctly predict $v_\mathrm{{\scriptscriptstyle S}}$ , then the function f should return a value of zero. What distinguishes this approach from conventional data-driven ML is that the network does not receive the solution set as input for prediction. Instead, it receives only the weighted sum of the solutions in the form of the electronic density calculated using equation (2).

In figure 2, we illustrate our implementation of a PINN that predicts the KS potential for an input density. The network takes the density as an input and processes it through a series of convolutions. The output of each convolution as well as the initial density is used as an input into a dense network. The network is then asked to predict the KS potential on each point in space given this set of features, as well as the first and second derivative of the density.

Convolutional neural networks are a common ML technique for image recognition [81–84]. They observe both local and non-local features in input data to predict what is present in an image. The convolution of two functions f and g is defined as

$\begin{align} \left(f*g\right)\left(t\right) = \int f\left(\tau\right)~g\left(t-\tau\right)~d\tau \end{align} \tag{ 11 }$

and can be described as sum of the overlap of two functions at each point in t. In our convolutional PINN, this is accomplished by using a set of convolutions in series and in parallel. Back propagation is used to learn functions g that emphasize input features crucial for predicting a class. Increasing the number of convolutions in series results in the passing of additional non-local information to the features.

It is widely known [1, 4] that the XC potential is a nonlocal functional of the electronic density. Consequently, training a network to correctly predict an XC potential based only on the electronic density at a single point in space is ill-advised. Our PINN model uses convolutions to account for some of the non-locality of the XC potential. Each step through the convolutions indicates a larger regime of non-locality that has been included in its output features.

2.4. FNOs

Neural operators (NOs) [85, 86] are an extension of neural networks that take functions as inputs and return functions as outputs. While traditional neural networks map finite-dimensional vectors to finite-dimensional vectors, neural operators work on infinite-dimensional function spaces. A neural network can be defined as a mapping $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$ , whereas a neural operator maps functions to functions, i.e. $G: \mathcal{A} \rightarrow \mathcal{U}$ , where $\mathcal{A}$ and $\mathcal{U}$ are function spaces. Given that there exists a non-linear map $\mathcal{G}^{\dagger}: \mathcal{A} \rightarrow \mathcal{U}$ , our goal is to construct a neural operator that approximates this map, i.e. $\mathcal{G}_{\theta} \approx \mathcal{G}^{\dagger}$ , where $\mathcal{G}_\theta: \mathcal{A} \rightarrow \mathcal{U}$ with parameters θ in finite dimensional space $\mathbb{R}^p$ .

The neural operator is trained on point-wise (in function space) observations of the form $\{(a_i, u_i) \}_{i = 1}^N$ , where $\left( a_i \in \mathcal{A}; a_i(x) \in \mathbb{R}, x \in \mathbb{R} \right)$ , $\left( u_i \in \mathcal{U}; u_i(x) \in \mathbb{R}, x \in \mathbb{R} \right)$ and $u_i = \mathcal{G^{\dagger}} (a_i)$ . The goal is to find the parameters $\theta^*$ that minimize the loss

$\begin{align} \theta^* = \min _{\theta \in \mathbb{R}^p} \frac{1}{N} \sum_{i = 1}^N\left\|u^{\left(i\right)}-\mathcal{G}_\theta\left(a_i\right)\right\|_{\mathcal{U}}^2. \end{align} \tag{ 12 }$

Analogous to the neural networks being defined as multi-layer compositions of linear and non-linear operations, we can define a neural operator $\mathcal{G}_\theta(a)$ as multi-layer compositions of linear and non-linear operators acting directly in function space:

$\begin{align} \mathcal{G}_\theta\left(a\right) = Q\left(v_L\left(v_{L-1}\left(\dots v_1\left(P\left(a\right)\right)\right)\right)\right)\,, \end{align} \tag{ 13 }$

where the layer $v_{l+1}$ is defined as

$\begin{align} v_{l+1}\left(x\right) = \sigma_{\left(l+1\right)}\left(W_l v_l\left(x\right)+\left(\mathcal{K}_l\left(a ; \lambda\right) v_l\right)\left(x\right)\right)\ . \end{align} \tag{ 14 }$

Here, the non-local kernel integral operator $\mathcal{K}_l$ is defined as

$\begin{align} \left(\mathcal{K}_l\left(a ; \lambda\right) v_l\right)\left(x\right) = \int_{\Omega_{l}} \kappa_{l}\left(x, y,a\left(x\right), a\left(y\right)\right) v_l\left(y\right) d y \end{align} \tag{ 15 }$

over the domain $\Omega_l$ .

The kernel function $\kappa_l(x,y,a(x), a(y))$ is a function of $x,y$ , and the input functions a(x) and a(y). The kernel function, parametrized by λ, and the local linear operator W_l , a matrix of weights, are learned during training. $\sigma_{(l+1)}$ is a local non-linear activation function. The local operator P(a) is a pre-processing operator that transforms the input function a into a higher dimensional representation v₀, which is the input to the first layer. The local operator $Q(v_L)$ is a post-processing operator that projects the output of the last layer v_L back into $\mathcal{U}$ .

Several methods can be used to approximate the global integral calculation in each layer, such as graph neural networks with Monte-Carlo sampling [86] and multipole graph neural operators [87]. These approximations reduce the computational complexity and allow for scalable solutions [76].

FNOs [72] leverage the Fourier transform to work in the frequency domain. Notice that by using a kernel $\kappa(x-y)$ , equation (15) is a convolution operator, and by using the convolution theorem, equation (15) becomes

$\begin{align} \left(\mathcal{K}_l\left(a ; \lambda\right) v_l\right)\left(x\right) &= \mathcal{F}^{-1}\left(\mathcal{F}\left(\kappa_l\right) \cdot \mathcal{F}\left(v_l\right)\right)\left(x\right) \end{align} \tag{ 16 }$

$\begin{align} \mathcal{F}\left(x\right) &= \sum_{n = 0}^{N-1}~x_n \cdot e^{-\frac{i2\pi}{N}kn} = X_k \end{align} \tag{ 17 }$

$\begin{align} \mathcal{F}^{-1}\left(X\right) &= \frac{1}{N}\sum_{n = 0}^{N-1}~X_k \cdot e^{\frac{i2\pi}{N}kn} = x_n, \end{align} \tag{ 18 }$

where $\mathcal{F}$ is the Fourier transform and $\mathcal{F}^{-1}$ is its inverse. The kernel $\kappa_l$ is parametrized in Fourier space. Here the parameterization is finite-dimensional determined by the number of Fourier modes $k_\textrm{max}$ . By working in frequency domain, the kernels can be computed efficiently using the fast-Fourier tranform and the model can capture global patterns more effectively.

A distinction has to be made between the solving approach (PINNs) and the learning approach (NOs) for PDE problems. PINNs are useful for obtaining numerical solutions for specific initial and boundary conditions and PDE parametrizations. In contrast, learning a PDE via neural operators aims to learn the underlying operator itself, enabling the prediction of solutions for various conditions without re-solving the PDE. PINNs can be used to solve PDEs if the PDEs are defined, even in the absence of training data, while NOs can be used to approximate undefined PDEs with training data. NOs are resolution-invariant and can also be used for zero-shot super-resolution [72]. A rigorous mathematical definition and analysis for NOs is provided in [76]. The strengths of FNOs in this context lie in their inherent capability to capture non-local dependencies and their resolution invariance.

Given the non-local dependence of the XC potential on the electronic density, the non-locality of FNO layers make them well-suited for the inverse problem in KS-DFT. The input in this case is a grid of electronic density values and the output is the XC potential at the corresponding spatial grid points. The loss function is simply the MSE between true and predicted XC potential:

$\begin{align} L_\textrm{FNO} = \mathrm{}\textrm{MSE}\left(v_\mathrm{{\scriptscriptstyle XC}}, \tilde{v}_\mathrm{{\scriptscriptstyle XC}}\right) = \frac{1}{N} \sum_{i = 1}^{N}\left| v^{\left(i\right)}_\mathrm{{\scriptscriptstyle XC}} - \tilde{v}^{\left(i\right)}_\mathrm{{\scriptscriptstyle XC}}\right|^2\, \end{align} \tag{ 19 }$

where $\tilde{v}_\mathrm{{\scriptscriptstyle XC}}$ denotes the predicted XC potential for system i. The architecture we implemented is shown in figure 3.

2.5. Similarity measures

We evaluate the complexity of the datasets using two different similarity measures, namely the cosine similarity and Euclidean distance. The cosine similarity between two vectors X and Y in $\mathbb{R}^D$ is defined as

$\begin{align} \text{cos}\left(\theta\right) = \frac{\mathbf{X} \cdot \mathbf{Y}}{\Vert \mathbf{X} \Vert \Vert \mathbf{Y} \Vert}, \end{align} \tag{ 20 }$

which returns values in the range $[-1,1]$ , namely −1 if the vectors point in opposite directions, 1 if they point in the same direction, and 0 if they are orthogonal to each other. The Euclidean distance between the same vectors is defined as

$\begin{align} D\left( \mathbf{X}, \mathbf{Y} \right) = \Vert \mathbf{X} - \mathbf{Y} \Vert. \end{align} \tag{ 21 }$

This measure has a lower bound of zero but no upper limit. The value returned is 0 only if the vectors X and Y are the same vector. Then, as two vectors continue to move further apart, the value moves further away from zero. The Euclidean distance includes some details related to the cosine similarity, but it also describes the difference in magnitude of the values found in the two vectors. This is a significant piece of information that the cosine similarity measure misses. Both of these measures provide insight into the complexity of each dataset and the network architecture's capacity to work on each of them.

We illustrate the similarity for both the atomic and the molecular model systems in figure 4. As expected, the diatomic system is more complex than the atomic system, which is evident from the broader range of values in the diatomic case, versus the atomic case, for both cosine similarity and Euclidean distance measures. In the cosine similarity plots for the diatomic system, a block pattern emerges. The largest blocks shift with a change in the bond length. Descending to the bottom or moving to the right of the plot corresponds to an increase in the separation between two atomic centers. The greatest dissimilarity is noted at the two extremes of separation. Interestingly, at larger separations, the vectors show slightly less self-similarity compared to those at lower separations, as indicated by the darker shading of the bottom right block compared to the top left. The Euclidean distance plot yields similar trends, although it does not show this distinction within varying separation distances.

3. Results

We next present our findings for the two PIML approaches, comparing results using the PINN model as well as the FNO model. We trained both models on the same datasets and evaluate the model performance via the accuracy of the returned eigenvalues when solving the KS equations with the predicted XC potentials. We assess the ability of both PIML approaches as inverters for the KS equations in terms of the two aforemorentioned model systems: the one-dimensional atom with varying charge and the diatomic molecule with varying charges and bond length. Both the PINN model and the FNO model were trained on model systems with a grid resolution of 301 in real space. The performance was measured for systems with resolutions of 301 and 501, as well as for 301-grid systems with parameters outside the training range (extrapolation).

In all model systems, we consider the first six eigenstates, where each model is occupied with two same-spin fermions. Additionally, we consider four virtual KS orbitals. The set of occupied and virtual eigenstates defines the shape of the KS potential. The lowest-energy eigenvalue indicates how well networks predict the cusp of the atoms, while the highest value signifies how well they predict the tail of the potential.

We evaluate each model for ten neural networks with different random seeds. The error is quantified using the mean absolute error (MAE) and the mean absolute percentage error (MAPE) of the eigenvalues defined as

$\begin{align} \text{MAE}_{i,j} & = \frac{1}{N} \sum_{n = 1}^{N} \left| \tilde{\epsilon}_{i,j,n} - \epsilon_{i,j} \right| \end{align} \tag{ 22 }$

$\begin{align} \text{MAPE}_{i,j} & = \frac{1}{N} \sum_{n = 1}^{N} \left| \frac{\tilde{\epsilon}_{i,j,n} - \epsilon_{i,j}}{\epsilon_{i,j}} \right| \times 100\%\,, \end{align} \tag{ 23 }$

where $\tilde{\epsilon}_{i,j,n}$ denotes the predicted eigenvalues and each predicted eigenvalue is indexed by $i,j,n$ , with i denoting the level index, j the system index, and n the network index, with each neural network having a different random seed.

3.1. Inversions with physics-informed neural networks

3.1.1. Atom with varying charge

Figure 5 illustrates the performance of the PINN model for a one-dimensional atom with a variable parameter Z corresponding to its charge, plotting the MAE for a set of fixed charges. Each data point represents the mean of ten neural network predictions that were generated using varying network initializations and random seeds. The data points are color-coded to indicate ground and excited states in the order of blue, orange, red, green, purple, and brown. This order is maintained in the following discussion of the results.

The performance of the PINN model is best when the well charge Z is small. The error systematically increases with the charge, particularly for the lowest-lying eigenvalue. The overall MAE averaged over the test systems is 1.77 $\times\,10^{-3}$ Ha (denoted by a gray horizontal line in figure 5). The PINN model is competitive with the chemical accuracy criterion of being within 1 kcal mol⁻¹ or 0.0016 Ha, as marked by the red horizontal line in our figures.

We have also displayed the the MAPE of the individual eigenvalues in figure 6. The greatest error is in the first two eigenvalues of the systems, with both lying above the total average MAPE of around 0.13%, while the next four lie below this threshold.

We have also explored the relationship between the performance of the PINN model and the resolution of the spatial grid. We suspect that the source of the errors observed in figure 5 is related to the resolution of the grid used to train the PINN model, which depends on the accuracy of the KS orbitals and their second derivatives. We anticipate that as the grid resolution improves, the accuracy of the second derivatives will also improve, particularly at the sharp cusp-like feature in the KS potential. Our findings indicate that errors diminish with the enhancement of grid resolution. This correlation presents a methodical approach for reducing the error in the PINN model. The improvements are reported in table 1, which showcases the systematic enhancement in performance with increasing grid resolution (see Atom 501).

Table 1. Performance comparison of PINN and FNO models on both model systems, showing the MAE (in Ha) and MAPE (%) and denoting the maximum absolute errors and the maximum percentage errors in braces. The labels 301 and 501 indicate the grid resolutions and E denotes extrapolation.

Model	Atom (301)	Atom (501)	Molecule (301)	Molecule (301 E)
PINN	0.0018 Ha	0.0007 Ha	0.0125 Ha	0.0248 Ha
	(0.0110 Ha)	(0.0034 Ha)	(0.1000 Ha)	(0.1200 Ha)
	0.13%	0.06%	0.58%	0.73%
	(0.30 %)	(0.15%)	(2.37%)	(3.25%)
FNO	0.0002 Ha	0.0002 Ha	0.0002 Ha	0.0054 Ha
	(0.0009 Ha)	(0.0010 Ha)	(0.0051 Ha)	(0.0584 Ha)
	0.03%	0.03%	0.02%	0.25%
	(0.15%)	(0.19%)	(0.46%)	(3.51%)

3.1.2. Diatomic molecule with varying charges and bond length

When switching to the diatomic molecule, the learning task becomes much more complex, as both the charges (Z₁ and Z₂) and the bond length (d) are variable parameters. This is exemplified in the roughly 10-fold increase in errors within the predicted eigenvalues. The MAE is shown in figure 7. Compared to the atomic test system, which had an overall MAE averaging around 1.77 $\times\,10^{-3}$ Ha, the overall MAE has increased to 1.25 $\times\,10^{-2}$ Ha (refer to the gray horizontal line in figure 7). In the atomic test system, the largest errors were observed in the ground, first, and second states of the KS spectrum. All eigenvalues exhibit some errors above the limit of chemical accuracy. This suggests that the PINN model has difficulty predicting the tails of the XC potential, but not to the same extent as placing and accurately estimating the well depth. The relative ordering of errors on the states remains consistent.

**Figure 7.** MAE of eigenvalues predicted by the PINN model on a logarithmic scale for the one-dimensional diatomic molecule. The MAE is averaged over 10 network initializations. The red horizontal line denotes chemical accuracy, while the gray horizontal line depicts the MAE averaged over all test systems. The colors label the eigenstates in increasing order of energy as described in figure 5.
Download figure:
Standard image High-resolution image

We have further broken down the error analysis into individual eigenvalues in figure 8. We display the MAPE for the first six eigenvalues, which exhibits the larger errors for the ground and first excited states observed in the MAEs plot. The overall MAPE averages around 0.575% (denoted by a gray horizontal line in figure 8).

3.1.3. Extrapolation for diatomic molecule

Purely data-driven neural network approaches rapidly degrade in predictive performance when provided with data outside the training set. However, using physics-based approaches such as PINNs and also FNOs, higher extrapolation performance can be expected. Compared to data-driven learning, the neural network in the PINN model represents the solution to a given PDE, allowing accurate predictions within the specific scope of that PDE. We evaluate the PINN model's extrapolation ability by asking it to predict on a set of densities outside the training set.

The outcome of this analysis is illustrated in figure 9, which shows the MAE. The errors have increased as anticipated, resulting in an overall MAE of 2.48 $\times\,10^{-2}$ Ha indicated by the gray horizontal line. The error has doubled in comparison to the interpolation task featured in figure 5. While this is not a catastrophic failure, the higher error suggests the intricacy of extrapolation. The challenge of extrapolation is evident in figure 10, where we illustrate the MAPE resolved over the first six eigenvalues with an overall MAPE of 0.73%, denoted by a gray line. Although no significant or sudden rise in error is observed with increasing eigenvalues, the overall MAPE is noticeably higher, as is the standard deviation shown by the red error bars.

**Figure 10.** Extrapolation with the PINN model for the one-dimensional diatomic molecule. The MAPE is resolved for the first six eigenvalues. The red error bars denote three standard deviations away from the mean, while the gray horizontal line depicts the MAPE averaged over all eigenvalues.
Download figure:
Standard image High-resolution image

3.2. Data-driven inversions with FNOs

In contrast to the PINN model, the density-to-potential mapping data is explicitly used in the loss function to train the data-driven FNOs, as outlined in section 2.4. The network takes the density grid as input and produces the corresponding potential grid as output. FNOs have several benefits, including learning the mapping in the functional space, which enables them to generalize effectively over varying input densities. Additionally, the network is resolution-invariant, allowing it to assess higher resolution grids than those present in the training dataset.

3.2.1. Atom with varying charge

In the case of the one-dimensional atom with varying charge Z, the FNO model demonstrates exceptional performance over the range of charge values. The error plots for the eigenvalues, as shown in figure 11, indicate that the predicted eigenvalues closely align with the true values. We resolve the error analysis over the spectrum of eigenvalues by showing the MAPE for the first six eigenvalues in figure 12. The red error bars denote three standard deviations away from the mean, demonstrating that the MAPE is within 0.15% in this range of eigenvalues. The range of MAPEs is lowest for the ground state and shows a slight increasing trend with higher energy levels. This increase can be attributed to the magnitude of the eigenvalues decreasing as the energy level increase. Higher energy states, with their inherently lower magnitudes, offer a smaller margin for absolute error, thus demanding greater accuracy. The model achieves this to a lesser degree, although the level of error is still well within the range of chemical accuracy. The overall MAE averaged over all model systems is 2.22 $\times\,10^{-4}$ Ha and the corresponding averaged MAPE over all eigenvalues is $3.18{\times10^{-2}}\%$ denoted by gray horizontal lines in figures 11 and 12. This underscores the FNO model's ability to accurately capture the potential-density mapping.

**Figure 11.** MAE of eigenvalues predicted by the FNO model on a logarithmic scale for the one-dimensional atom. The MAE is averaged over 10 network initializations. The red horizontal line denotes chemical accuracy, while the gray horizontal line depicts the MAE averaged over all test systems. The colors label the eigenstates in increasing order of energy as described in figure 5.
Download figure:
Standard image High-resolution image

**Figure 12.** The MAPE resolved for the first six eigenvalues of the one-dimensional atom with varying charge predicted by the FNO model. The red error bars denote three standard deviations away from the mean, while the gray horizontal line depicts the MAPE averaged over all eigenvalues.
Download figure:
Standard image High-resolution image

3.2.2. Diatomic molecule with varying charges and bond length

After demonstrating improved performance over the PINNs model for the one-dimensional atom, we investigate the FNO model's abilities for the one-dimensional diatomic molecule. Variations in both charges (Z₁ and Z₂) and bond length d pose a different set of challenges for the FNO model. As depicted in figure 13, the plots for eigenvalue errors indicate a slightly higher margin of error in comparison to the atomic model system. The eigenvalue errors consistently remain within chemical accuracy, except for a particular outlier. Figure 14 shows the MAPE range for each eigenvalue. There is a clear trend of increasing MAPE range from the ground state to excited states. Nevertheless, the overall MAE of 2.27 $\times\,10^{-4}$ Ha and MAPE of $1.90{\times10^{-2}}\%$ (denoted by gray horizontal lines) align with the results of the atomic model system, confirming the FNO model's adaptability and accuracy across this more challenging set of system parameters.

**Figure 13.** MAE of eigenvalues predicted by the FNO model on a logarithmic scale for the one-dimensional diatomic molecule. The MAE is averaged over 10 network initializations. The red horizontal line denotes chemical accuracy, while the gray horizontal line depicts the MAE averaged over all test systems. The colors label the eigenstates in increasing order of energy, as described in figure 5.
Download figure:
Standard image High-resolution image

**Figure 14.** The MAPE resolved for the first six eigenvalues of the one-dimensional diatomic molecule with varying charges and bond length predicted by the FNO model. The red error bars denote three standard deviations away from the mean, while the gray horizontal line depicts the MAPE averaged over all eigenvalues.
Download figure:
Standard image High-resolution image

3.2.3. Extrapolation for diatomic molecule

The true test of a predictive model lies in its ability to generalize beyond the parameters of the systems in the training dataset. Figure 15 showcases the eigenvalue error plots for this dataset, which include systems with charges (Z₁ and Z₂) and bond lengths (d) that are outside the previous datasets. The errors recorded were marginally higher than those for the training dataset, and about half the eigenvalues are predicted within chemical accuracy. The percentage errors are within 2%, as depicted in figure 16. The FNO model yields overall an MAE of 5.4 $\times\,10^{-3}$ Ha and MAPE of $0.25 \%$ (denoted by gray horizontal lines). This demonstrates the FNO model's robustness and potential for broader applicability in solving the KS inversion problem.

**Figure 16.** Extrapolation with the FNO model for the one-dimensional diatomic molecule. The MAPE is resolved for the first six eigenvalues. The red error bars denote three standard deviations away from the mean, while the gray horizontal line depicts the MAPE averaged over all eigenvalues.
Download figure:
Standard image High-resolution image

4. Discussion

We summarize and contrast our results for the PINN and FNO models in table 1 for both atomic and molecular model systems. Our performance metrics are the MAE expressed in Hartree (Ha) and the MAPE. Overall, the FNO outperforms the PINN model, providing chemical accuracy (MAE $\unicode{x2A7D} 0.0016$ Ha) for almost all model systems.

For atoms, the PINN model shows an MAE of 0.0018 Ha, corresponding to a relative error of 0.13%, while the FNO model shows an MAE of 0.0002 Ha, corresponding to a relative error of 0.03%. The PINN model significantly improves its accuracy when the grid resolution is increased to 501, while the FNO model shows consistent accuracy over different grid resolutions.

When applied to molecules, the error of the PINN model increases significantly, especially for the extrapolation task, with an MAE reaching 0.0248 Ha and a relative error up to 0.73%. The FNO model maintains a lower error rate and remains within the chemical accuracy except for the extrapolation task, where its MAE increases to 0.0054 Ha and a relative error of 0.25%. Despite the increase in error, the FNO model is still highly accurate. The values in parentheses represent additional metrics, namely the maximum absolute error and the maximum percentage error, which indicate the maximum spread of errors to evaluate the robustness of the models. Of additional note is the fact that FNOs are resolution invariant, indicated by the N/A in table 2. Here we trained the FNO on data from a set with a grid resolution of 301 points, which could be used to infer on a density generated on a grid with 501 points. The convolutional PINNs presented in this paper are not able to do this.

Table 2. Computational timings for inversions (in seconds), comparing a conventional method (iDEA) with the PINN and FNO models.

Model	Atom (301)	Molecule (301)	Molecule (501)
iDEA	$135 \pm 29$	$305 \pm 262$	$806 \pm 447$
PINN	$0.0016 \pm 0.0011$	$0.0026 \pm 0.0013$	N/A
FNO	$0.0022 \pm 0.0009$	$0.0020 \pm 0.0003$	$0.0037 \pm 0.0004$

Having established the accuracy of the PIML methods, we turn to analyze the computational efficiency. To this end, table 2 shows the computational timings of the PINN and FNO models against a conventional inversion method (iDEA code) across the different model systems. Timing results are given in seconds, with standard deviations to indicate variability in computational performance. For atoms, the conventional inversion method shows a mean computation time of 135 ${\pm}$ 29 seconds, which is significantly higher than both PINN and FNO models, which show mean computation times of 0.0016 ${\pm}$ 0.0011 seconds and 0.0022 ${\pm}$ 0.0009 seconds, respectively. The difference is even more pronounced for the molecular systems, where the conventional method requires 305 ${\pm}$ 262 seconds, while PINN and FNO remain in the millisecond range. In particular, for the molecular system with larger grid resolution (501 grid points), the computational time of the conventional method continues to increase, while FNO maintains an efficient computational evaluation in the millisecond range. These results highlight the substantial improvements in computational speed offered by the PINN and FNO models over conventional methods, with FNO showing slightly faster times and lower variability than PINN in the scenarios tested, suggesting a robustness that could be particularly beneficial for larger and more complex systems.

Looking beyond the main scope of this work, the PIML inversion scheme provides an alternative route to forward-direction self-consistent DFT calculations. In figure 17, our FNO models for the one-dimensional molecular system have been implemented inside of a self-consistent field cycle. This implementation exhibits rapid convergence to reference densities in six iterations, even starting from an initial density missing all the characteristic features found in the reference. Similar results are found for other example systems, shown in the Supplementary Material. This application of our results illustrates a pragmatic approach to using this or other reliable inversion methods in practical calculations.

**Figure 17.** Example of self-consistent calculation for a molecular system with $d = 4.5,z_1 = 2.0,z_2 = 4.5$ . Starting with a constant initial density, the calculations are iteratively carried out using the FNO model to calculate the $v_\textrm{xc}$ at each iteration. For (a) and (b), the dotted light blue line is the initial density/potential, solid light blue line represents intermediate stages and the solid dark blue line is the converged density/potential. The dashed orange line is the exact density/potential obtained from iDEA. The bottom plot shows the absolute difference between the converged SC quantity and reference quantity.
Download figure:
Standard image High-resolution image

**Figure 17.** Example of self-consistent calculation for a molecular system with $d = 4.5,z_1 = 2.0,z_2 = 4.5$ . Starting with a constant initial density, the calculations are iteratively carried out using the FNO model to calculate the $v_\textrm{xc}$ at each iteration. For (a) and (b), the dotted light blue line is the initial density/potential, solid light blue line represents intermediate stages and the solid dark blue line is the converged density/potential. The dashed orange line is the exact density/potential obtained from iDEA. The bottom plot shows the absolute difference between the converged SC quantity and reference quantity.
Download figure:
Standard image High-resolution image

5. Conclusion

In summary, this work has demonstrated the potential of PIML techniques, specifically PINNs and FNOs, to address the inverse problem in KS-DFT. The PINN model uses the underlying KS differential equation to predict the KS potential from given electron densities. FNOs, on the other hand, use data from known density-to-potential mappings. They exhibit superior performance in both accuracy and computational efficiency as PINNs across a range of grid resolutions and complexity of model systems. In particular, FNOs maintain chemical accuracy in most cases, underscoring their ability to handle the density-to-potential mapping effectively. The computational times for both PIML approaches significantly outperform conventional methods, in both direct and self-consistent calculations, offering a promising alternative for large-scale electronic structure inversions.

A promising avenue for future work is to combine the strengths of both PINNs and FNOs. This could be realized by a hybrid model with a two-component loss term: one part based on the density-to-potential mapping characteristic of the FNO model, and another part derived from the underlying KS differential equation represented by the PINN model. During training, both the FNO and PINN components would be optimized simultaneously. This approach is expected to yield an efficient PIML model that not only exploits the data-driven robustness of the FNO component, but also requires less data and exhibits broader generalizability due to the PINN-derived loss. Such models, referred to as physics-informed neural operators (PINOs), have been tested on various PDE benchmarks [88, 89]. A test of the robustness of the PINO approach would be applying it to KS inversion for real systems.

Our results indicate that the integration of machine learning, in particular PIML techniques, can significantly improve the efficiency and accuracy of electronic structure inversions. These advances offer promising prospects for elucidating the XC potential of materials from experimental electron density data and for advancing the development of XC functionals within DFT, providing a reliable computational tool for accurate density-to-potential inversions.

Acknowledgments

This work was partially supported by the Center for Advanced Systems Understanding (CASUS) which is financed by Germany's Federal Ministry of Education and Research (BMBF) and by the Saxon state government out of the State budget approved by the Saxon State Parliament. Computations were performed on a Bull Cluster at the Center for Information Services and High-Performance Computing (ZIH) at Technische Universität Dresden and on the cluster Hemera of the Helmholtz-Zentrum Dresden-Rossendorf (HZDR). A C and K S acknowledge funding from the Helmholtz Association's Initiative and Networking Fund through Helmholtz AI.

Data availability statement

The data that support the findings of this study are openly available at the following URL/DOI: https://rodare.hzdr.de/record/2720.

Inverting the Kohn–Sham equations with physics-informed machine learning

Article metrics

Submit

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. Methods

2.1. KS DFT

2.2. Model system

2.3. Physics-informed neural networks

2.4. FNOs

2.5. Similarity measures

3. Results

3.1. Inversions with physics-informed neural networks

3.1.1. Atom with varying charge

3.1.2. Diatomic molecule with varying charges and bond length

3.1.3. Extrapolation for diatomic molecule

3.2. Data-driven inversions with FNOs

3.2.1. Atom with varying charge

3.2.2. Diatomic molecule with varying charges and bond length

3.2.3. Extrapolation for diatomic molecule

4. Discussion

5. Conclusion

Acknowledgments

Data availability statement

Inverting the Kohn–Sham equations with physics-informed machine learning

Article metrics

Submit

Share this article

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. Methods

2.1. KS DFT

2.2. Model system

2.3. Physics-informed neural networks

2.4. FNOs

2.5. Similarity measures

3. Results

3.1. Inversions with physics-informed neural networks

3.1.1. Atom with varying charge

3.1.2. Diatomic molecule with varying charges and bond length

3.1.3. Extrapolation for diatomic molecule

3.2. Data-driven inversions with FNOs

3.2.1. Atom with varying charge

3.2.2. Diatomic molecule with varying charges and bond length

3.2.3. Extrapolation for diatomic molecule

4. Discussion

5. Conclusion

Acknowledgments

Data availability statement