Chemical Properties from Graph Neural Network-Predicted Electron Densities

According to density functional theory, any chemical property can be inferred from the electron density, making it the most informative attribute of an atomic structure. In this work, we demonstrate the use of established physical methods to obtain important chemical properties from model-predicted electron densities. We introduce graph neural network architectural choices that provide physically relevant and useful electron density predictions. Despite not training to predict atomic charges, the model is able to predict atomic charges with an order of magnitude lower error than a sum of atomic charge densities. Similarly, the model predicts dipole moments with half the error of the sum of atomic charge densities method. We demonstrate that larger data sets lead to more useful predictions in these tasks. These results pave the way for an alternative path in atomistic machine learning, where data-driven approaches and existing physical methods are used in tandem to obtain a variety of chemical properties in an explainable and self-consistent manner.


Introduction
Global implementation of sustainable energy technologies requires the discovery of efficient and economical catalysts for chemical reactions such as water oxidation, carbon dioxide reduction, and ammonia synthesis.In these reactions, it is common to use a solid surface as a heterogeneous catalyst.Because the design space of material surfaces is extremely vast, computational modeling techniques are employed to identify promising catalysts for further experimental study.Broadly speaking, these methods can be divided into two categories: physics-based and data-driven.
First, there are physics-based methods.Of these, density functional theory (DFT) is the most popular choice for heterogeneous catalysis.In DFT, the valence electrons in a chemical structure are treated as a density field rather than individual particles described by wavefunctions.This method is based on the two Hohenberg-Kohn Theorems, which state that 1) All chemical properties are a function of electron density, and 2) the true electron density minimizes the energy function. 1,2Thus, in a typical DFT calculation, an initial electron density is guessed.This density is updated iteratively until energy is minimized.
Many properties can then be easily calculated from the converged electron density, such as atomic charges, dipole moment, and atomic forces.
More recently, machine learning methods have been developed as a computationally cheaper alternative to DFT.One popular machine learning method for chemistry is the graph neural network (GNN).In this paradigm, atoms are represented as nodes in a graph, and information is exchanged along the edges between them in a scheme called "message passing".Assuming a constant volume, the computational cost of GNNs scales with the square of the number of atoms, while the cost of DFT generally scales with the cube.When assuming a constant atomic density, the comparison is even more favorable for GNNs: the cost scales linearly with increasing volume for GNNs but cubically for DFT.
However, in comparison to DFT, GNNs have yet to demonstrate competitive accuracy in heterogeneous catalysis.The largest and most diverse data set for heterogeneous catalysis is Open Catalyst 2020 (OC20). 3Each structure in this data set is an adsorbate placed at a site on a catalyst surface, and the target is the associated adsorption energy.However, as of yet, no machine learning model achieves an adsorption energy mean absolute error (MAE) below 0.3 eV (29 kJ/mol) on the test data. 4By propagating this uncertainty through a simple Arrhenius model at a temperature of 300K, one can see an uncertainty of more than five orders of magnitude in kinetic rate constants.Thus, there is a pressing need for more accurate models in this field.
To bridge the gap between the accurate physics-based models and machine learning models, machine learning can be used to predict electron density directly.An accurate prediction of electron density can be used in tandem with established methods, including DFT, to predict many physical quantities.For example, the Hellmann-Feynman Theorem asserts that electron density alone is sufficient to determine atomic forces without the need to assume any particular functional. 5,6By integrating these physical methods, machine learning models can become more useful, explainable, and transferable.
Because the electron density exists in three dimensions throughout the atomic structure, a significant amount of information is required to represent it accurately.It can be projected onto a basis set, but more often in heterogeneous catalysis, electron density data is available at a grid of points.The density is sampled at a large number of points (in our data, 5 × 10 6 on average) throughout the atomic structure.This huge amount of data is largely underutilized by machine learning methods to date.In fact, most machine learned potentials do not use electron density data at all.Several machine learning methods do exist for electron density prediction.][9][10] This low-dimensional representation of electron density is natural for molecules.On the other hand, it is not trivial to project electron density onto atom-centered basis functions in periodic systems such as surfaces, where periodic basis functions such as planewaves are common.
More recently, Rackers et al. introduced an equivariant GNN to predict electron density. 11,12They also demonstrate that increasing the order of the atom-centered basis function set is necessary for accurate short-range force prediction.Increasing basis set order rapidly increases the computational cost of a GNN as well as the difficulty in fitting an appropriate atom-centered basis to a planewave-based electron density.Considering these factors, this method is not well-suited for heterogeneous catalysis at present.On the other hand, pointwise methods do not rely on atom-centered basis function representations of the electron density.Instead, the model predicts electron density at predetermined points in space called probes.This allows electron density prediction at arbitrary resolutions, and the cost increases only linearly with the number of probes.This method is particularly attractive for periodic systems because the electron density does not have to be represented by atom-centered basis functions.Furthermore, popular DFT codes for heterogeneous catalysis, such as Vienna Ab Initio Simulation Package (VASP), 13-15 readily provide point-wise samplings of electron density.In the method introduced by Jrgensen and Bhowmik, 16 the probes are added to the graph in the GNN as information receivers (but not senders).Similarly, Achar et al. 17 introduced a pointwise approach to electron density predictions for solid materials, although this method is not a GNN.We decided to use a pointwise approach in this work due to its natural integration with planewave DFT codes.

Methods Modeling
Our model is based on that of Jrgensen and Bhowmik.We make two key enhancements to the model that enable the use of its predictions with physical methods such as DFT.
First, many probes are sufficiently far from any atoms such that they receive no information in the message-passing procedure.Our model assumes the electron density at these points to be zero.The electron density decays slowly as the distance to the atoms increases, so probe points outside the cutoff radius often still carry some electron density.Since the model has no information for these points, it is forced to predict the same density for all of them.
After training, this results in a non-physical uniform negative charge in the void space, which can have undesired consequences when these predictions are used in downstream applications such as charge partitioning and DFT calculations.Instead, our model constrains this density to be zero.Second, our model enforces a charge balance.Since our systems are periodic, a prediction where the total charge is not balanced exactly is unsuitable for many physical models.Thus, our model takes as an input the total number of valence electrons in the atomic structure.
We implement the following assumption: a random sampling of probes should have a proportional share of the valence electron density.That is, if the probes are randomly divided into ten batches, each batch should contain one tenth of the valence electron density.This assumption is necessary because treating all probes in a single batch is computationally infeasible.It also means that after predictions are made on all batches, the total electron density in the atomic structure is guaranteed to sum to the correct number of valence electrons exactly.
Outside of the model differences, several additional features differentiate this work from previous work.Our implementation is built as an extension of the Open Catalyst Project (OCP) modeling software.This enables the integration of existing message-passing models, including the re-use of models trained on other tasks.For example, model weights from a machine-learned potential can be used.We anticipate further integration with this software will allow us to leverage future developments of the OCP code repository.
Another notable feature is the probe edge-finding algorithm.Previous implementations had a computational cost of O((A+P ) 2 ), where A is the number of atoms and P is the number of probes.This is because no distinction was drawn between atom and probe nodes, and a general neighbor-finding algorithm is used.Because we are not interested in edges between probes, the cost of this algorithm can be reduced to O(A(A+P )), which is significantly better because A is several orders of magnitude smaller than P .To achieve this, we modified the neighbor-finding algorithm available in the OCP modeling software.The standard procedure is to consider every possible edge between two nodes, determine the distance, and remove any which are longer than some specified cutoff radius.Because we are not interested in probeprobe edges, we can save a significant amount of memory by not computing these distances.
We partition the graph into two sets, the set of atoms and the set of probes.Distances are only computed from nodes in the set of atoms.This procedure is written in Pytorch code that runs efficiently on both CPU and GPU.It also respects periodic boundary conditions in one, two, or three directions.Under previous implementations of the probe graph training procedure, the number of probes used in training was limited by graph construction speed.
In contrast, our implementation is limited by the memory requirements of message passing on large graphs.This also enables rapid inference with significant speedup compared to previous implementations.

Data Set
We prepared a large and diverse data set of charge density in heterogeneous catalysis.The atomic structures in this data set are from OC20.32794 random structures from the OC20 train split comprise the training set, and 897 random structures from the OC20 "both out of distribution" split comprise the validation set.One hundred random structures from the OC20 "both out of distribution" split comprise the test data set.No adsorbate or surface that is present in the training set is in the validation or test set.Self-consistent electronic structure optimization was performed in VASP using the Revised Perdew-Burke-Ernzerhof (RPBE) functional.
The electron density in this data set is obtained directly from the CHGCAR output file and represents a pseudo-valence electron density.The density is sampled on a uniform, threedimensional grid of points.The number of electron density probe points varies depending on the size of the unit cell.On average, each atomic structure in this data set contains about 5 million probe points.

Training Procedure
The models are trained to minimize the normalized mean absolute error introduced by Jrgensen and Bhowmik.We note that there is no universally accepted error metric for electron density.We acknowledge that this metric may not translate well across data sets, and is biased towards high-density probe points.The metric is defined as: where ρ is a ground-truth electron density obtained from a converged self-consistent DFT calculation, ρ is a model-predicted electron density, and V is the simulation volume.This integral is evaluated numerically by assuming probe points represent equal-volume voxels, each of uniform density which matches the density at the probe point.In this work, we will focus on one model, a Schnet-based model 19 which is akin to Jrgensen and Bhowmik's "Invariant DeepDFT". 19Our implementation is built from the OCP implementations of Schnet.The main architectural differences between this models and DeepDFT are charge conservation and the long-range zero density constraint as described in the modeling section of this work.The hyperparameters of each of the model are available in the supporting information of this work.

Interface with DFT Calculator
To make predictions of physical properties such as forces, it is advantageous to use an existing quantum mechanical calculator.For instance, one may wish to perform a self-consistent field (SCF) calculation using model-predicted electron densities as an initial guess.Thus, we have established methods to use predictions from our electron density models as inputs to VASP.
To do this, it is necessary to have electron density predictions on the same grid that is sampled in the CHGCAR file.Obtaining these predictions on a single GPU takes about 1-2 minutes for a typical structure in our data set.
The electron density prediction is written to a CHGCAR file.To use this file as a VASP input, one must also write projector-augmented-wave (PAW) occupancies.In this work, we reuse PAW occupancies obtained from a DFT calculation with zero electronic minimization steps.This non-SCF calculation is relatively fast and is only necessary to obtain a reasonable guess of the augmentation occupancies.In principle one may obtain reasonable augmentation occupancies by another method, such as reading the pseudopotential information directly.Once a CHGCAR is written with the atomic structure, electron density, and PAW occupancies, VASP can read the file with the INCAR flag ICHARG=1.This will cause VASP to use the predicted density as an initial guess for the electronic minimization loop instead of the default method.
The default method VASP uses to obtain an initial guess of electron density is superposition of atomic charge densities (SACD).This is an easy and inexpensive way to obtain an electron density by adding the electron density that would be obtained from each atom if it were isolated.It ignores the effects of inter-atomic interactions on electronic structure.
When discussing our results, we sometimes refer to the SACD method as a baseline.

Electron Density Predictions
Across the 100-structure test data set with 4.94 million probe points per structure, the model achieves an MAE of 2.54%.In comparison, the SACD method only achieves an MAE of 8.30%.We show parity between the model-predicted and ground-truth electron densities on a parity heat map.This allows visualization of model performance across different regimes in the test data set.Figure 1 shows the performance achieved by our best model.We see that our model achieves tighter parity in the high-density regime when compared with the SACD method.This high-density regime includes many points in the data set, as evidenced by the thin warm-colored line in this portion of the plot.Examining the low-density regime, it is difficult to claim that the machine learning model has significantly better or worse performance than the SACD method.It seems that the model is unable to effectively distinguish between points with electron densities below 10 −4 electrons per cubic Angstrom.This is to be expected because such points have little impact on the error metric by which the model is trained.Furthermore, many points in this regime seem to have disappeared from the plot.This is because these points are beyond the cutoff radius of any atoms, so the model is forced to predict zero electron density.Thus these points are not shown on the logarithmic plot.Ultimately, precision in this low-density regime is likely unnecessary for many applications.

Learning Curve
To investigate the limitations of the model, we trained three additional models on smaller

DFT Initialization
One potential use of these model-predicted electron densities is to initialize self-consistent DFT calculations.Because these calculations gradually converge an electron density guess to its ground state, it stands to reason that an initial guess which is close to the actual ground state electron density will result in faster convergence.In reality, this is only one piece of the self-consistency puzzle.In a PAW calculation, it is necessary to converge the wavefunctions and augmentation occupancies in addition to the electron density.Thus, even a perfect guess of electron density will require a significant amount of electronic minimization steps to obtain self-consistency if the initial guesses of wavefunctions and augmentations are poor.Despite this limitation, we find that improving the error on electron density improves the overall convergence speed of a DFT calculation.This result is demonstrated in Figure 3.
Figure 3: VASP performance is improved as the training set size increases.This is attributed to more accurate electron densities.The number of Hamiltonian evaluations is normalized by the number required when using the ground-state (i.e.perfectly accurate) electron density as an initial guess.Thus, the blue line represents the "best case scenario" for a perfect electron density model.Significant progress is still needed to approach this limit.Interestingly, the machine learning models do not consistently outperform the baseline SACD method in this task.Boxes represent 50% of the test data and whiskers represent 95% of the test data.
This was measured by running a DFT self-consistency calculation (single point) on each structure in the test data set.This was repeated with electron densities predicted by each model.Additionally, we ran the same experiment using for initialization the actual groundstate electron densities obtained from DFT.To measure performance we take the total number of evaluations of the Hamiltonian acting on a wavefunction, which is reported in VASP's output as "ncg".The performance is then normalized relative to the case where the ground-state initialization is used as the initial guess.
The figure shows a clear correlation between the training set size and resulting DFT performance.However, it is also clear that this scheme offers no practical value at this time.
Even the best model presented in this work is unable to consistently yield electron density guesses that lead to faster convergence than the SACD method.It appears that simply increasing the training set size is not a practical path towards optimal performance on this task.

Atomic Charge Predictions
Another common use of the electron density is to determine atomic charges.This is done by a charge partitioning scheme.These methods divide the structure into volumes, each of which contains one atom.Then, the total charge within that volume is summed up, including the negative charge from the electron density and the positive charge from the nucleus.These total charges are assigned to each atom.Quantitatively, atomic charges can be used to investigate electrostatic interactions between atoms. 20Qualitatively, they can help explain the movement of electrons between atoms and investigate chemical reaction mechanisms.
Furthermore, after dividing the space into these volumes, additional topological analysis can be performed to study chemical bonds. 213][24] We performed this analysis on the electron densities predicted by each of our models on the test data set, as well as the ground-state and SACD electron densities. 25 evaluate the performance of each model by calculating the absolute error between each predicted atomic charge in the test data set, as compared to the charges obtained from the ground-state electron density.The results are summarized in Figure 4.
The model performs significantly better than the SACD method, and performance is improved with the size of the training data set.7][28] However, it is worth considering that the models presented today were not explicitly trained on atomic charge data.In this work we are able to achieve this performance simply by training on electron density data and using existing methods to determine atomic charges.This enforces a conservation of total charge, which is not guaranteed by all existing methods.Moreover, it demonstrates that accurate electron density predictions can be used to obtain other properties by established, physics-based methods.
Figure 4: Error in Bader charges and electrostatic dipole moments.In each case, the machine learning models offer marked improvement over the SACD method.Boxes represent 50% of the test data and whiskers represent 95% of the test data.Training on larger data sets appears to offer slight improvements.Note that the models are not trained to predict either of these properties directly.Rather, existing methods are used to calculate them from the predicted electron densities.

Dipole Moment Predictions
Electrostatic dipole moments were calculated from the electron densities and pseudo-ion charges.Like the Bader analysis, this calculation was done with the SACD density, groundstate density, and each model-predicted density for each structure in the test data set.The result is shown in Figure 4.The results show a similar trend to that of the Bader analysis.
There is a marked improvement in the performance of the machine learning models compared to the SACD method, and training on larger data sets results in further decreases in error.

Current Limitations
With any machine learning method, the resultant model is greatly limited by the training data which is used in its generation.As such, although the model can be used as-is to obtain predictions for out-of-domain systems, these predictions have no guaranteed degree of accuracy Predicting the full electron density of a structure in this data set with this model takes about 1-2 GPU-minutes.This is because the number of probe points is too large for the GPU to handle in one batch.Instead, the probe points are divided into many batches which are transferred to the GPU in succession.In comparison, machine-learned potentials can be evaluated in a fraction of a second.Bridging this speed gap will require optimizations, such as predicting on fewer probe points and efficiently up-sampling to a full grid. 29

Paths Towards Improved Performance
As shown in Figure 2, it appears that model performance can still be improved by training on larger data sets.It is relatively easy to generate an electron density data set that is one or two orders of magnitude larger using DFT.Based on the learning curve, it seems that this would lead to models with an MAE near or below 2%.However, training on such a data set remains a challenge.Practically speaking, the current size of the data set is well over 500 GB.This is comparable in size to the full OC20 data set despite containing four orders of magnitude fewer structures.In fact, due to the batching procedure, the models presented today have not truly been trained on every data point.An electron density data set with the same number of structures as OC20 would contain petabytes of data, placing it among the largest atomistic data sets created to date.Training on such large data sets requires extensive GPU resources and specialized parallelization techniques.Thus there is a need for efficient implementations for storing and training on such large data sets.This path shows the most promise to improve on the accuracy of the models presented today.
On the other hand, one might consider training with a more advanced GNN model.The SchNet model used in this work is not considered state-of-the-art in OC20. 4,30PaiNN-based architectures have already been used for electron density prediction. 16This could also become very compute-expensive as more and more advanced architectures are used.Furthermore, not all GNN architectures are suitable for this task.For instance, a model such as GemNet that enumerates three-and four-body interactions 31,32 is highly impractical since a typical structure will have millions of "bodies": the probe points.It is critical to choose or create GNN message-passing architectures with cost that scales linearly (or better) with the amount of probe points in the batch.
Finally, we discuss some practical paths towards improved DFT performance.As mentioned previously, a self-consistent PAW calculation requires wavefunction and augmentation occupancies in addition to the electron density.Although it is possible to use the default methods for these quantities, improving the initial guesses of these properties will amplify any performance gains from an accurate electron density guess.9][10][11][12] Predicting the periodic wavefunctions promises further SCF convergence speed improve-ments, but it is not immediately obvious how this can be efficiently integrated with current GNN models.Further increases in training set size are not likely to result in large performance improvements on this task.Further investigation is necessary to determine why the SACD method outperforms the machine learning models on this task, despite significantly worse error metrics.

Conclusions
In this work, we implemented a message-passing GNN for electron density prediction.It includes several useful features that enable the predicted electron density to be with physical models such as DFT and Bader charge partitioning.Although fast and accurate machine learning models exist for many individual properties, this work demonstrates a new paradigm, in which models can be trained on electron density only, and existing methods can be used to determine other quantities.We believe that training on larger data sets and with more advanced models will enable even more accurate models in this domain.
At each training step, one atomic structure is chosen from the training data set and converted to a geometric graph structure.Then, a subsample of probes is randomly chosen from a uniform distribution.The number of probes chosen at this step can be thought of as the "batch size" and is generally limited by the graphics processing unit (GPU) memory capacity.For these models, the batch size used for training is 10 4 or more.The geometric graph is augmented with the probe nodes and their respective edges.Then, the model makes a prediction of electron density at each probe point.The error metric is computed, and backpropagation is used to obtain gradients for each model parameter.Model parameters are adjusted according to the Adam algorithm.18This procedure is repeated until each structure in the training data set has been chosen once, which comprises one epoch.The model is trained until no further improvements are seen on the validation set, which typically takes about 20 epochs.

Figure 1 :
Figure 1: Parity heat maps comparing the results of the machine learning model and SACD method.The model achieves tighter parity in the high-density region but not necessarily in the low-density region.This plot does not show points with zero or negative pseudo-valence electron density.

2 .
subsets of the training data set.The performance of each trained model is shown in Figure From this learning curve, it appears that we are still in a regime where the accuracy of the model is limited not by the model architecture itself, but by the size of the training data set.

Figure 2 :
Figure 2: Learning curve with respect to training set size.Boxes represent 50% of the test data and whiskers represent 90%.Further increases to data set size are likely to result in even lower error measurements.
. Such out-of-domain systems include but are not limited to oxide materials, zeolite materials, metal-organic frameworks, amorphous materials, and liquid systems.It is possible to use the tools and techniques presented in this work to create an equivalent model for each of these domains, provided sufficient training data is available.Similarly, we have only trained on the pseudo-valence electron density (available in the CHGCAR) and make no claims about model performance with all-electron density or similar quantities.This model is not accurate enough to directly use the predicted electron density to obtain an estimate of forces, as has been done byRackers et al. 11We attribute this to the diversity of our data set in comparison to that used in other work, and we hope that future training on larger but equally diverse data sets will achieve sufficient accuracy for this task.