Abstract
Despite the successes of machine learning methods in physical sciences, the prediction of the Hamiltonian, and thus the electronic properties, is still unsatisfactory. Based on graph neural network (NN) architecture, we present an extendable NN model to determine the Hamiltonian from ab initio data, with only local atomic structures as inputs. The rotational equivariance of the Hamiltonian is achieved by our complete local coordinates (LCs). The LC information, encoded using a convolutional NN and designed to preserve Hermitian symmetry, is used to map hopping parameters onto local structures. We demonstrate the performance of our model using graphene and SiGe random alloys as examples. We show that our NN model, although trained using small-size systems, can predict the Hamiltonian, as well as electronic properties such as band structures and densities of states for large-size systems within the ab initio accuracy, justifying its extensibility. In combination with the high efficiency of our model, which takes only seconds to get the Hamiltonian of a 1728-atom system, the present work provides a general framework to predict electronic properties efficiently and accurately, which provides new insights into computational physics and will accelerate the research for large-scale materials.
Export citation and abstract BibTeX RIS
Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
1. Introduction
The Hamiltonian lies at the heart of solid-state physics as it determines the electronic properties, the eigenstates of a Hamiltonian can then be used to obtain other physical properties. For periodic systems, the Hamiltonian needs to be solved in reciprocal space, which then gives band structure information. The past half-century has witnessed the development of solving Hamiltonian problems based on density functional theory (DFT) [1], which has achieved great success in understanding electronic properties. However, with increasing system size, the cost of solving Hamiltonian using DFT increases dramatically and has become one of the greatest challenges in solid-state physics and materials science.
Machine learning can provide a possibility to solve such problems. However, despite past decades having witnessed many successes of machine learning methods in predicting physical properties [2, 3] and interatomic potentials [4–6], prediction of the Hamiltonian, and thus electronic properties, is still unsatisfactory. Current models generally yield low accuracy and poor extensibility to predict electronic properties such as bandgaps, defects, and optical properties. Mapping the Hamiltonian onto some locality is crucial to design an extendable model [7, 8]. Inspired by the tight-binding method, the Hamiltonian can be represented by hopping parameters connecting the localized basis functions. Using numerical pseudo-atomic orbitals (PAOs) as the basis [9], the hopping parameter between atoms i and j is defined by
where and denote the orbitals, is the atomic position, and is the lattice vector. Note that hopping parameters have locality and can be determined by local structures because hopping can be negligible when atomic distance is larger than a certain cutoff radius [7]. Therefore, the Hamiltonian matrix, which is constructed by hopping parameters, can be determined from local structures, and thus an extendable model for predicting the Hamiltonian is theoretically feasible. In previous studies, Hegde and Bowen proposed a model for predicting Hamiltonians in a small basis set and studied the s-orbital Cu and sp3-orbital diamond systems [10]. Schütt et al developed the SchNOrb model for small, individual molecules with data augmentation [11]. Gu et al introduced a neural network (NN) representation of tight-binding Hamiltonians for one-dimensional carbon chains [12]. The main disadvantage of the above methods is that they did not treat the equivariance appropriately, and thus the applications are limited.
Because the Hamiltonian matrix is equivariant under rotations of coordinates, the prediction of hopping parameters is more complicated than predicting measurable properties such as bandgaps, which are independent of the choice of coordinate system. Although the data augmentation technique can reflect rotations [11], it is obviously inefficient. To fix the rotation freedoms, Li et al recently introduced the deepH model [13], in which the local coordinate (LC) for each atom pair is defined based on the local structures. The hopping parameters can be trained in the LC and then transformed back to the global coordinate to construct the Hamiltonian matrix. Using the atomic-like basis functions, the rotational transformations can be well described by the Wigner-D matrix. Nevertheless, more efficient methods for predicting the Hamiltonian are still being explored. For example, in the work of Li et al, the hermiticity of the Hamiltonian is not properly considered. Moreover, the commonly defined LC is an incomplete descriptor, and this will be clarified in the following.
In this work, based on the graph NN architecture and LC transformation, we have developed an extendable model, called graph LC-Net, to predict the hopping parameters and hence obtain DFT Hamiltonian. In addition to the rotational covariance, the key advantages of our model over similar works are as follows: the hermiticity of the Hamiltonian is utilized, a complete LC for each atom pair is defined by our new algorithm to avoid the degeneration problem for highly symmetric structures and the convolutional neural network (CNN) is used to handle the LC information. The node and edge features are updated by message passing with an attention mechanism. The output of the model is the triangular portion of the hopping parameter matrix, and then all the hopping parameters are assembled into the Hamiltonian matrix. Using graphene and SiGe random alloy systems as examples, we demonstrate that our graph LC-Net, trained using small-size systems, can predict the Hamiltonian as well as electronic properties such as band structures and densities of states (DOSs) for large-size systems within the DFT accuracy, but with very little cost, i.e. it takes only about 10 s to get the Hamiltonian of a 1728-atom system using only one GPU card. In addition, the linear scaling with the number of atoms of graph LC-Net makes it very promising for studying the electronic properties of large-scale systems.
2. Results
2.1. Preliminary
Graph is a data structure composed of sets of vertices and edges. The atomic structure is represented by a graph, in which the nodes represent the atoms, and the edges represent the bonds between atoms. In the tight-binding model, the hopping parameters are determined by the local structures of atom pairs. Therefore, the task in this work is to predict the properties of edges. Message passing NNs [14] is a universal graph neural network (GNN) framework proposed for graph, where the node features are updated by gathering the features of all the neighbors. The most straightforward way to update the edge features is to aggregate the features of the two nodes on the edge. GNN is widely used to model interatomic potentials [15], predict materials properties [16], and drug design [17].
2.2. Embedding layer
As shown in figure 1, the graph LC-Net model consists of an embedding layer, an LC layer, several interaction layers, and an output layer. The inputs include the atomic number , the edge distances , and the LC information . In the embedding layer, each node is encoded as a one-hot vector according to the atomic number. The connectivity of the graph is determined by the distances between nodes, that is, there is an edge if the distance between the two nodes is less than the cutoff radius . The initial node features are calculated through an embedding network [15] by . The initial edge features are the Gaussian radial basis functions , where the centers are placed linearly between 0 and , and is the variance.
2.3. LC layer
The LC layer is used to obtain the LC information for each edge which is uniquely determined by the local structures of atoms i and j. Note that, to utilize the hermiticity of the Hamiltonian, the edges ij and ji must be in the same LC. The details of calculating LC are described in the methods section. The LC information can be represented by real spherical harmonics and distances. Here we use the edge ij as an example to illustrate how LC information is obtained. First, the neighbors of atom i, denoted by k, are sorted by distances . Then we calculate the real spherical harmonics of bond ik, denoted by , where and are the polar and azimuthal angles, respectively, of bond ik relative to an LC defined for edge ij. Each bond ik can be represented by a list of real spherical harmonics, yielding the orientational vector calculated by , where is the number of neighbors of atom i and is the number of spherical harmonic functions. Similarly, for atom j we can obtain the orientational vector .
To extract the LC features from the orientational vectors and , Li et al introduce the LC message passing (LCMP) layer [13]. However, the order of the neighbors is not considered in the LCMP layer. To preserve the information of neighbor order, we utilize the CNN architecture for natural language processing (text-CNN) [18, 19] as shown in figure 1(c). The convolution operator with a filter is applied to a window of h neighbors to produce a feature. Using the convolution operator on the input generates a new feature map . Then max pooling is performed to extract a scalar from each feature map. These features are then passed to a dense layer to generate the output vector . The neighbors of atom j are processed similarly, and finally, the outputs are concatenated to give the LC feature vector .
We use an example to further illustrate how to utilize the text-CNN to extract LC information. Consider a local structure with respect to atom i in figure 2(a), the LC is defined on edge ij, and atoms 1–4 are neighbors of atom i. The input of the LC layer is the spherical harmonics . Each is a vector of elements as and are used. In analogy with the text-CNN framework, each neighbor is treated as a word in our model. Since the neighbors are sorted, we expect that the NN is able to learn the information of order, which is not considered by the LCMP [13]. Nevertheless, the text-CNN layer could handle the serialized information appropriately. When a different LC is chosen, the input of spherical harmonics will be different, and thus the information of LC is learned. We have used three windows of sizes 2, 3 and 4, each of which has 64 filters. Thus, for each LC, 192 new features are extracted by the convolution operator. The output is a vector of elements representing the LC information from the neighbors of atom i. The neighbors of atoms j are processed similarly. Finally, the outputs of atom i and atom j are concatenated to give LC feature as shown in figure 2(b).
Download figure:
Standard image High-resolution image2.4. Interaction layer
After the processing of the embedding layer and the LC layer, information is passed to the multiple-stacked interaction layers. In the interaction layer, the attention mechanism [20, 21] is used for updating node features. To alleviate the over-smoothing problem, we have used skip connections inspired by ResNet [22]. Note that, our model is a universal framework, the interaction layers could be replaced by other advanced graph-based models for potentially better performance. To utilize the LC features, we just concatenate the LC feature and the edge feature , since they are in one-to-one correspondence. As shown in figure 1(d), the node features are updated by
where k denotes the neighbor of i, l refers to the l-th interaction layer, and denotes a dense layer. The attention coefficients are computed from
where is a learnable weight variable, LeakyReLU [23] is an activation function and denotes concatenation. After the node features are updated, the edge features are updated from
where h-swish is the hard version of the swish activation function [24].
2.5. Output layer
The output layer consists of two stacked dense layers with h-swish,
where is the upper triangular portion of the hopping matrix of ij. Since the Hamiltonian is Hermitian, the model does not need to output all matrix elements. However, the hopping matrix in general is not Hermitian, that is, unless . Considering the hermiticity of the Hamiltonian, it can be shown that the hopping matrix satisfies . Therefore, only the upper (or lower) triangular portion of hopping parameters are required as the output of the model. There is a problem that, in general, the upper triangular portion of and are different, that is, . The hermiticity can be utilized only if and are in the same LC. In this case, the input information for predicting and are the same, that is, , and then the training will not converge. To overcome this problem, the hopping matrices are divided into two groups according to the distances of the neighbors, as described in the method section. The two groups are trained separately. Then we obtain two models for the hopping parameters, called the forward model and the backward model, respectively. Finally, the Hamiltonian matrix is constructed by all the hopping matrices, as shown in figure 3.
Download figure:
Standard image High-resolution image2.6. Loss function
The graph LC-Net is trained by minimizing the difference between the predicted data and the DFT calculated data . We note that the hopping parameters are highly imbalanced, that is, most targets are close to zero while a few are larger than 10 eV. To better handle the imbalanced data, a regression version of the focal loss [25, 26] is used with
where and are adjustable hyper-parameters, and denotes the output hopping parameters.
2.7. Model evaluation
With the above architecture for learning the Hamiltonian in hand, we turn to real systems to demonstrate the feasibility of learning electronic properties using the NN methods. The goal is to map the local atomic structures to the hopping parameters using graph LC-Net. In this work, s2p2d1 PAOs are used as basis functions and the hopping parameters for each atom pair ij are represented by a 13 × 13 matrix. The mean absolute error (MAE) between DFT calculated and NN hopping parameters is used to evaluate our method. The electronic properties including band structures and DOSs are then calculated for comparison.
2.8. Degeneration problem with local symmetry
The first step of training a Hamiltonian model is to define the LC for each atom pair. In the previous study [13], the LC is determined by the nearest neighbor and the second nearest neighbor . However, this method only works on perturbated structures, that is, the nearest and second-nearest neighbors are unique. In the case of highly symmetric local structures, for instance, defect-free crystals, the above method fails to determine a unique LC since the nearest neighbor cannot be determined uniquely by distances, which then results in data conflict. To be specific, for the neighbors of the atom , if , the atoms and are indistinguishable. This is also known as the degeneration problem [27]. Similar problems can happen for the off-site hopping parameters. As a result, the errors of highly symmetric structures will be significantly larger, even in the training set, than those of randomly perturbated structures due to data conflict. We find that this problem can be solved by utilizing the information of the global coordinate to define a complete LC for each edge, and the algorithm is provided in the method section.
To illustrate the feasibility of the new algorithm, we test the performance of different methods on a perfect graphene structure and a zincblende SiGe alloy. We first calculate LC for each edge using the original method of deepH [13] and then perform calculations with our new algorithm. The learning curves for graphene and SiGe are shown in figures 4(a) and (b), respectively. It is shown that the training loss of the model with the original LC is significantly larger than that with our complete LC.
Download figure:
Standard image High-resolution image2.9. Graphene dataset
We create a dataset of graphene to compare the performance of our graph LC-Net with the DeepH model proposed in [13]. We perform molecular dynamics simulation of a 6 × 6 × 1 graphene structure for 5 ps to generate the dataset, and the computational details are consistent with the case of DeepH. Then we sample 500 structures as the training set and the other 500 structures as the validation set. We find that, on the validation set, the MAEs of the forward model and the backward model are 3.5 meV and 1.9 meV, respectively, as shown in figures 5(a) and (b). Our result is comparable with that of DeepH (0.4 meV–8.5 meV). The element-wise error distributions are shown in figure 5(c). The band structures and DOS are shown in figure 5(d). Note that, the authors of DeepH train 169 models for the hopping parameters to reconstruct the Hamiltonian of the graphene dataset, while we train only 2 models to obtain all the hopping parameters, demonstrating the efficiency of our method.
Download figure:
Standard image High-resolution image2.10. SiGe random alloy dataset
We demonstrate the feasibility of our method with the three-dimensional SiGe random alloy. The SiGe random alloy dataset is generated by randomly occupying the zinc-blende lattice sites with the Si or Ge atoms. The number of possible combinations in a supercell with N sites is given by the combinatorial number , which could be incredibly large as the total atom number increases. Note that, most of the structures are identical under certain symmetry operations. Therefore, we develop a scheme to filter the identical structures to construct a training set of SiGe random alloy.
For each atom, we construct a unique local structure information by its neighbors in the following way. As shown in figure 6(a), we first calculate the dot product of and (atomic positions with respect to the target atom 0), and then obtain a list of dot products for the target atom:
Download figure:
Standard image High-resolution imageThen we sort the elements in the list in ascending order to represent the local structure information. It is easy to check that the two local structures in figure 6(a) are different by comparing the lists.
We calculate the local structure encodings by equation (7) for all atoms, remove duplicated ones and add the encodings into a database. When a new structure is generated, we compare the new encodings with those in the database. If new local structures are found, the generated structure is added to the training set and the database is updated accordingly. New structures are generated repeatedly until no more new local structures are found. In this work, we collect 66 samples of 2 × 2 × 2 supercells and 1000 samples of 3 × 3 × 3 supercells for training. A snapshot of a 3 × 3 × 3 SiGe random alloy in the training set is shown in figure 6(b). The samples in the validation set are randomly generated with supercell sizes ranging from 2 × 2 × 2 to 6 × 6 × 6. A snapshot of a 6 × 6 × 6 SiGe random alloy in the validation set is shown in figure 6(c). All the structures are relaxed before calculating the Hamiltonians. The structures in the training set contain up to 216 atoms (3 × 3 × 3 supercell). For the validation set, the structures contain up to 1728 atoms (6 × 6 × 6 supercell) and 25 406 784 hopping parameters, demonstrating the extensibility of graph LC-Net. For the training set, the MAEs of the forward model and the backward model are 0.50 meV and 0.52 meV, respectively. For the validation set, the MAEs range from 0.5 to 0.8 meV as shown in figure 7(a). The element-wise error distributions are shown in figure 7(b). The band structures and DOS are calculated using the Hamiltonian predicted by graph LC-Net, and comparisons with DFT calculations are given in figures 7(c) and (d).
Download figure:
Standard image High-resolution image3. Discussion
The major advantage of graph LC-Net over DFT calculation is computational efficiency. In DFT calculations, the Kohn–Sham equation needs to be solved iteratively to reach a certain accuracy. For each iteration, eigenvalue problem of a matrix is solved, and the time cost scales cubically. As for the NN model, the iterative calculation is not needed, and the inference time scales linearly with the number of atoms. For the 6 × 6 × 6 supercell with 1728 atoms, graph LC-Net only costs 10.5 s to predict the Hamiltonian. For comparison, the computational time for the DFT self-consistent calculation is about 4800 s using 96 CPU cores, depending on the iteration steps. A comparison regarding the time is summarized in table 1. In this work, graph LC-Net is trained using one GeForce RTX 3090 card, and the DFT calculations are performed using two Intel Xeon Platinum 9242 CPUs (96 cores).
Table 1. Inference time cost by DFT and graph LC-Net. The size denotes the number of atoms in a structure. The unit of time is in second of CPU time.
Size | 64 | 216 | 512 | 1000 | 1728 |
---|---|---|---|---|---|
DFT | 43.2 | 282.4 | 1317.2 | 1271.7 | 4600.5 |
Graph LC-Net | 0.6 | 1.3 | 3.2 | 6.8 | 10.5 |
With DFT accuracy and very high computational efficiency, graph LC-Net can be applied to study electronic properties of very large systems, such as high-entropy alloys [28] or defect structures [29]. Graph LC-Net is also expected to accelerate the inverse design of materials with targeted electronic properties [30, 31] such as band gaps, where DFT calculations are performed thousands of times for structure relaxation and property evaluation. The structure relaxations can be efficiently performed using the NN potential models [32], and our graph LC-Net for Hamiltonian can be used to evaluate the electronic properties to avoid the computationally expensive DFT calculations. Note that, our method is not affected by the types of materials directly if only DFT calculation can be performed accurately [33]. Besides the DFT calculation issue about the training set, the accuracy of the graph LC-Net is possible to improve. For example, the embedding layer could involve more sophisticated initializations [34], and the interaction layers could be replaced by other advanced graph-based models [35].
In conclusion, based on the graph NN architecture and using the local atomic structure information, we have developed the graph LC-Net to predict the hopping parameters and hence obtain DFT Hamiltonian. We have designed the complete LC algorithm for each edge to fix the rotational freedom of the Hamiltonian matrix and reserve the hermiticity of the Hamiltonian to improve the efficiency. By employing graphene and SiGe random alloys as examples, we have demonstrated the high accuracy, extendibility, and efficiency of graph LC-Net in predicting Hamiltonian and electronic properties. This work thus opens doors for studying the electronic properties of large-scale systems using machine learning methods.
4. Methods
4.1. DFT calculation
Before calculating the hopping parameters, all the structures are relaxed into fixed cells using the Vienna ab initio Simulation Package [36]. The projector augmented wave [37] type pseudopotential is used and the plane wave energy cut-off is set to 450 eV. The generalized gradient approximation parameterized by Perdew et al [38] and the local spin density approximation of Ceperley–Alder [39] are used for the exchange-correlation functional for graphene and SiGe alloy, respectively. The relaxation is stopped when the change of the total energy is smaller than 0.001 eV between two ionic steps.
Then we employ the numerical PAOs as implemented in OpenMX software [40] to perform DFT calculations for the hopping parameters. The PAO is given by a product of a radial function R and a real spherical harmonic function Y by
where R is defined numerically and is finite within a cutoff radius. The Hamiltonian and the overlap matrices are given by
and
where i and denote atom and orbital, respectively, is the atomic position, and is the lattice vector. According to the principle of locality, the hopping parameters are determined by the local structures of atoms i and j within a certain cutoff radius. The overlap parameters are calculated by the PAOs without DFT calculations. Then the eigenvalues as well as eigenstates of a system are obtained by solving the generalized eigenvalue problem:
For graphene, the C6.0-s2p2d1 PAOs are used with a cut-off radius of 6.0 Bohr as the basis functions. The Monkhorst–Pack k-meshes of 5 × 5 × 1 are used. For SiGe random alloy, the Si7.0-s2p2d1 and Ge7.0-s2p2d1 PAOs are used as the basis functions and the cut-off radius is set as 7.0 Bohr. For the supercells up to 4 × 4 × 4, the k-points are generated using the automatic scheme to ensure the spacing between k-points is smaller than 0.16 Å−1. For the 5 × 5 × 5 and larger supercells, a single gamma point is used.
4.2. LC
First, consider the case that the nearest neighbor and the second nearest neighbor are different atoms. We use two noncollinear vectors and to define the LC as
For onsite hopping (), the two vectors are determined by
where and denote the nearest neighbor and the second nearest neighbor of atom i, respectively.
For offsite hopping (), we first calculate , which is the distance from the nearest neighbor to the target atom . To utilize the hermiticity, the LCs are defined in two different ways according to the distances. If , we define
where denotes the nearest neighbor of atom i. If , we define
where denotes the nearest neighbor of atom j. In this way, the and are ensured to be in the same LC. Since the input information for predicting and are the same, the hopping parameters defined by equations (14) and (15) are trained separately. The two models are called the forward model and the backward model, respectively.
As for the highly symmetric structures where degeneration problems happen, the nearest neighbor and the second nearest neighbor are the same atom, and therefore the above method is not applicable. In this case, we propose algorithm 1 with Function find_v2() to determine the basis vectors and . We show a non-degenerate structure and a degenerate structure in figures 8(a) and (b), respectively. The degeneration problem rarely happens in the graphene dataset as all the atoms are perturbed during the molecular dynamics simulation. For the SiGe dataset, since all the structures are relaxed, the degeneration problem is more likely to arise when the atomic arrangement exhibits some symmetry. Although algorithm 1 contains quite a lot of computational logic, most structures are non-degenerate and will not use algorithm 1. Moreover, algorithm 1 is only used for preprocessing, that is, transforming an atomic structure into a graph. The training and testing process do not involve algorithm 1.
Download figure:
Standard image High-resolution imageAlgorithm 1. Local coordinate |
---|
Input: Atomic positions , neighbors' distances (sorted) |
/* Find two vectors v1 and v2 to define the LC */ |
if i = j then |
if then /* the nearest and second nearest neighbors are unique */ |
The v1 and v2 are determined by the 1st and 2nd neighbors of atom i, respectively |
else |
Get candidates for v1: and v2: |
if the size of > 1 then |
Find the vector with the largest x component as v1 |
If more than one vectors share the same largest x, compare their y and z components in sequence |
end if |
if the size of > 1 then |
find_v2(, v1) |
end if |
end if |
else |
/* , v1 could be either or */ |
Calculate and
|
if size of = 1 then |
else if size of = 1 then |
else |
find_v2(, ) |
find_v2(, ) |
|
end if |
a : candidates for if . b The function find_v2() calculates the angle between and an auxiliary vector vaux, chose the one with a smaller angle.
Function find_v2(, v1) |
---|
Inputs: |
: candidates for v2 |
vaux: an auxiliary vector |
Procedure: |
Calculate the rotation matrix R to transform v1 to (1, 0, 0) |
for v in do |
Calculate the vector v in the rotated coordinate |
Calculate the angle between v and vaux |
if a smaller is found then |
end if |
end for |
a The auxiliary vector should be chosen to avoid ambiguities, i.e. there are not two or more vectors in the v2 list sharing the same . In this work, vaux = (3.4, 4.5, 5.6) is used.
4.3. Rotation of Hamiltonian
The rotation transformation with real spherical harmonics basis is rather complicated [41]. To be convenient, we first convert real spherical harmonics basis into complex spherical harmonics basis, and then the rotation transformation R can be done with the Wigner D-matrix , where is the angular quantum number and is the magnetic quantum number. The rotation of complex spherical harmonics is given by
Note that the Wigner D-matrix is unitary, that is, . The rotation transformation of the hopping parameter matrix is given by
where denote atoms, and denote orbitals. Finally, the hopping parameters are converted back to real spherical harmonics basis.
4.4. Training details
The graph LC-Net model is developed using the Pytorch-Geometric [42] Python library. The variance of the Gaussian radial basis function is set to 0.0044 Å−2. The parameters and are used in the loss function. Adam optimizer [43] was used with an initial learning rate of 0.001. The reduce-on-plateau strategy is used to schedule the learning rate, and the lower bound of the learning rate is 1 × 10−5. The number of trainable parameters is 524 731. In this work, the graph LC-Net is trained on a single GeForce RTX 3090 card. The model is trained for 16 h (1336 epochs) on the graphene dataset with 500 samples. The model is trained for 12 days (3752 epochs) on the SiGe dataset with 1066 samples.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 12188101, 61904035, 11974078, 11991061). Computations are performed at the Supercomputer Center of Fudan University.
Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: https://github.com/satori555/Graph-LC-Net.
Conflict of interest
The authors declare no conflict of interest.
Author contributions
M S, J Y, and X G conceived the idea and designed the research. M S performed the DFT calculations and implemented the codes. X G supervised the work. All authors discussed the results and were involved in the writing of the manuscript.