Deep-learning electronic-structure calculation of magnetic superstructures

Ab initio studies of magnetic superstructures are indispensable to research on emergent quantum materials, but are currently bottlenecked by the formidable computational cost. Here, to break this bottleneck, we have developed a deep equivariant neural network framework to represent the density functional theory Hamiltonian of magnetic materials for efficient electronic-structure calculation. A neural network architecture incorporating a priori knowledge of fundamental physical principles, especially the nearsightedness principle and the equivariance requirements of Euclidean and time-reversal symmetries (E(3) × {I, 𝒯}), is designed, which is critical for capturing the subtle magnetic effects. Systematic experiments on spin-spiral, nanotube and moiré magnets were performed, making the challenging study of magnetic skyrmions feasible.

The subject of magnetic superstructures, such as magnetic skyrmions, moiré magnetism and spin-spiral magnets, has attracted intensive research interest, opening opportunities to explore emergent physics in quantum materials, including the skyrmion Hall effect, the topological Hall effect, flat-band physics and so on 1 . Ab initio calculations based on density functional theory (DFT) have become an indispensable research tool, but are only applicable to small-scale superstructures owing to their high computational cost. The recent development of deep learning ab initio methods [2][3][4][5][6][7][8][9][10][11][12][13][14][15] has shed light on this bottleneck problem; these methods use artificial neural networks to learn from ab initio data and then apply the trained networks for material simulation without invoking ab initio codes, enabling the study of large-scale material systems. However, current methods are usually designed to treat systems without magnetism and neglect the dependence of material properties on the magnetic structure; thus, they are not suitable for this purpose.
A key challenge of deep learning DFT calculations is to design deep neural network models to represent the DFT Hamiltonian H DFT for efficient electronic-structure simulation 8,9 . This problem has recently been investigated for non-magnetic systems [7][8][9][10][11][12][13] . The counterpart problem for magnetic systems is of great importance but faces several critical challenges. First, an extra dependence on the magnetic structure {ℳ} is introduced into H DFT (Fig. 1a), which is physically distinct from the dependence on the atomic structure {ℛ}. Second, the spin degrees of freedom are usually negligible in the non-magnetic case, but become essential for magnetic systems; consequently, H DFT becomes non-diagonal in spin space, leading to a larger number of non-zero matrix elements (Fig. 1a). Third, satisfying fundamental symmetry conditions is a prerequisite for achieving good performance in deep learning problems [8][9][10][11] . Generalized symmetry requirements for neural network models of H DFT are imposed by symmetry operations on both {ℳ} and {ℛ}; this important issue has not been addressed before. Last but not least, high-precision calculations are required to describe the subtle magnetic effects, setting a high standard for method development. In this context, a substantial generalization of deep learning DFT methods is urgently demanded by the research field.
In this work, we develop an extended deep learning DFT Hamiltonian (xDeepH) method, including theoretical framework, numerical algorithm and computational code, which learns the dependence of the spin-orbital DFT Hamiltonian on atomic and magnetic structures with deep equivariant neural network (ENN) models, enabling efficient electronic-structure calculation of large-scale magnetic materials. As a critical innovation, we design an ENN architecture to incorporate physical insights and respect the fundamental symmetry group E(3) × {I, 𝒯}.

Brief Communication
https://doi.org/10.1038/s43588-023-00424-3

The workflow of xDeepH is illustrated in Fig. 1a. First, small-size materials (simulated by small supercells) with diverse atomic and magnetic configurations are calculated by constrained DFT to prepare datasets. Then, deep neural networks representing H DFT ({ℛ}, {ℳ}) are trained on the datasets. Next, the neural networks are applied to predict H DFT for materials with varying atomic and magnetic structures. Based on H DFT , any electronic property of a material in the single-particle picture can be computed. By replacing the DFT self-consistent field (SCF) calculation with deep neural networks, the method greatly reduces the computational cost of electronic-structure calculation and enables the study of magnetic superstructures (for example, magnetic skyrmions). The critical issue, however, is the design of intelligent neural networks for modeling the mapping function ({ℛ}, {ℳ}) ↦ H DFT , using as much a priori knowledge as possible to optimize neural network performance.
Two physical principles are essential to the deep learning problem here: the nearsightedness (or locality) principle of electronic matter 18 and the symmetry principle. Local physical properties satisfying the nearsightedness principle are insensitive to distant changes in the chemical environment 18 . For instance, charge density is a local property, whereas the Kohn-Sham eigenstates are non-local; the latter depend sensitively on the global material structure, which is complicated from the point of view of machine learning. In general, quantities with local properties are more favorable for deep learning than non-local ones [19][20][21] . Moreover, the fundamental physical equations are equivariant under symmetry operations (for example, translation and rotation). Symmetry is an inherent property of physical data; hence, exploiting symmetry properties can substantially facilitate deep learning. In short, the principles of locality and symmetry are a priori knowledge of pivotal importance for artificial intelligence.
Let us first examine the locality of the deep learning target H DFT ({ℛ}, {ℳ}). In DFT calculations, plane waves and localized orbitals are usually employed as basis functions. The localized basis is preferred here, as it is compatible with the locality principle. Under the localized basis, the DFT Hamiltonian can be viewed as an ab initio tight-binding Hamiltonian (see 'DFT Hamiltonian under localized basis' in Methods). The Hamiltonian matrix block H ij , which describes hopping between atoms i and j, vanishes for atom pairs with distance d ij > R C . The cut-off radius R C is determined by the spread of the orbital functions and is usually on the order of a few ångströms. Importantly, H ij ({ℛ}, {ℳ}) obeys the nearsightedness principle: it is influenced only by changes in the chemical environment within a finite range R N .
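The sparsity implied by the cut-off radius R C can be made concrete with a short sketch: only atom pairs closer than R C carry a non-vanishing Hamiltonian block, so a material maps onto a sparse graph of directed pairs. The function name and cutoff value below are our own illustrative choices, not from the xDeepH code.

```python
import numpy as np

def hopping_pairs(coords, r_cut):
    """Return directed atom pairs (i, j), i != j, with |r_i - r_j| <= r_cut.

    Illustrates the locality of the tight-binding-like DFT Hamiltonian:
    blocks H_ij vanish for d_ij > R_C, so only nearby pairs are kept.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]   # pairwise displacement vectors
    dist = np.linalg.norm(diff, axis=-1)
    i_idx, j_idx = np.nonzero((dist <= r_cut) & (dist > 0.0))
    return list(zip(i_idx.tolist(), j_idx.tolist()))

# A 1D chain with spacing 1.0 and cutoff 1.5 keeps only nearest-neighbor pairs.
chain = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
pairs = hopping_pairs(chain, r_cut=1.5)
```

In a message-passing network, these directed pairs become the graph edges on which the blocks H ij are predicted.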
Noticeably, the influences induced by changes in {ℛ} and {ℳ} are associated with two different nearsightedness length scales, denoted R N1 and R N2 , respectively. Formally, varying {ℛ} alters the strong external potential in H DFT , whereas varying {ℳ} mainly modifies the relatively weak constraining fields, leading to a minor influence on H DFT . It is thus expected that the latter influence is weaker in magnitude and shorter in length scale (R N2 < R N1 ). Our results suggest that R N2 ≈ R C (Supplementary Fig. 1), whereas R N1 is typically several times larger than R C . The two distinct dependence behaviors of H DFT ({ℛ}, {ℳ}) must both be described accurately by the deep learning method; this issue is addressed in the dataset preparation and neural network design.
The symmetry principle is another piece of a priori knowledge of essential importance for optimizing deep learning performance. In the symmetry group E(3) × {I, 𝒯}, E(3) is the Euclidean group in three-dimensional space, I is the identity operator and 𝒯 is the time-reversal operator. The method is systematically tested and shows high precision (sub-millielectronvolt error) and good transferability in example studies of magnetic superstructures ranging from spin-spiral, nanotube and moiré magnets to magnetic skyrmions. Benefiting from its extended capability and state-of-the-art performance, xDeepH could find promising applications in future materials research and stimulate the development of deep learning ab initio methods.

Theoretical framework of xDeepH
The deep learning DFT Hamiltonian (DeepH) method has been developed to improve the efficiency of electronic-structure calculation, showing great potential to address the accuracy-efficiency dilemma of DFT 8,9 . A substantial generalization of the method is required to study the broad class of magnetic materials. For non-magnetic systems, H DFT as a function of atomic structure {ℛ} is calculated by self-consistent field (SCF) iterations in DFT. The function H DFT ({ℛ}) is the learning target of DeepH. In contrast, for magnetic systems, H DFT depends not only on the atomic structure but also on the magnetic structure {ℳ}. To compute H DFT for a given {ℳ}, one needs to apply constrained DFT, which employs the Lagrangian approach to constrain the magnetic configuration and introduces constraining fields into the Kohn-Sham potential 16 . In general, the mapping from {ℛ} and {ℳ} to the spin-orbital H DFT is unique in constrained DFT 17 . H DFT for magnetic systems is also calculated by SCF iterations but requires much more computational resources than the non-magnetic counterpart, because the additional constraining fields must be determined self-consistently and an enlarged Hamiltonian matrix that is non-diagonal in spin space must be considered.

The Euclidean symmetry requirements can be preserved using the framework of ENNs 22,23 . All of the feature vectors in ENNs have the equivariant form x l m , which carries the irreducible representation of the SO(3) group of dimension 2l + 1, where l is an integer or half-integer and m varies between −l and l in integer steps. Feature vectors can be converted to equivariant tensors (or vice versa) to construct the DFT Hamiltonian via the Wigner-Eckart theorem: l 1 ⊗ l 2 = |l 1 − l 2 | ⊕ ⋯ ⊕ (l 1 + l 2 ), which changes the tensor product '⊗' into the direct sum '⊕'. How Euclidean equivariance is realized is described in 'Realization of Euclidean equivariance' in Methods.
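The Wigner-Eckart bookkeeping above can be sanity-checked with a few lines: the direct sum |l 1 − l 2 | ⊕ ⋯ ⊕ (l 1 + l 2 ) must have the same total dimension as the tensor product, (2l 1 + 1)(2l 2 + 1). This is a minimal illustrative sketch (function names are ours), not the xDeepH implementation, which uses e3nn for these operations.

```python
def coupled_ls(l1, l2):
    """Angular momenta appearing in l1 ⊗ l2 = |l1 - l2| ⊕ ... ⊕ (l1 + l2).

    Works for integer and half-integer l, which differ by integer steps.
    """
    lo, hi = abs(l1 - l2), l1 + l2
    return [lo + k for k in range(int(round(hi - lo)) + 1)]

def rep_dim(l):
    # dimension of the irreducible representation of degree l
    return int(round(2 * l + 1))

# Example: 2 ⊗ 1 = 1 ⊕ 2 ⊕ 3, and the dimensions match: 5 * 3 = 3 + 5 + 7.
terms = coupled_ls(2, 1)
```

The same counting holds for half-integer (spin) representations, for example coupled_ls(1, 0.5) gives [0.5, 1.5] with total dimension 2 + 4 = 6 = 3 × 2.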
However, it seems difficult to handle time-reversal equivariance within the original ENN framework, because time reversal introduces a non-trivial transformation of the DFT Hamiltonian in spin space (equation (2) in Methods), and complex-valued quantities are prevalent in the problem. We find that this problem can be solved by applying a transformation that introduces an additional time-reversal parity index. This additional parity must be tracked for all the input, output and internal vectors throughout the ENN, but it can be treated in exactly the same way as spatial inversion. Spatial-inversion symmetry is already implemented in our ENN framework, so introducing another parity index is a straightforward generalization. Therefore, all the symmetry requirements of the E(3) × {I, 𝒯} group can be obeyed by the generalized ENN framework.
Here we describe the ENN architecture of xDeepH. xDeepH is based on a message-passing neural network 4 , using graphs of vertices and edges to represent materials. Each atom, associated with nuclear charge Z i and magnetic moment m i , is denoted by a vertex. Atomic pairs with non-zero H ij are connected by directional edges associated with the interatomic distance vector r ij . These features of vertices and edges are input to an embedding layer, which constructs initial equivariant feature vectors of different l. Each vector is also labeled by its spatial-inversion and time-reversal parities (p, t); for instance, r ij has (p = 1, t = 0), and m i has (p = 0, t = 1). The ENN iteratively updates the equivariant features of vertices v i and edges e ij using features of their neighborhoods, as proposed in DeepH-E3 11 . Note that non-linear activation functions are allowed for l = 0 vectors only, and only even or odd activation functions are allowed to act on vectors with p = 1 or t = 1; otherwise, the equivariance of the ENN is broken. Importantly, interaction between vectors of different l is implemented using a tensor product layer 22 . An overview of the neural network architecture of xDeepH is presented in Fig. 2. xDeepH embeds the atomic and magnetic structures as initial vertex and edge features, followed by successive vertex layers and edge layers that update the corresponding features. Distant information about the atomic structure {ℛ} is aggregated into the features upon successive stacking of layers. The influence of the local magnetic moments on the DFT Hamiltonian is more localized; regarding this locality, we introduce the magnetic information {ℳ} into the message-passing neural network with strict locality, as illustrated by the magnetic moment layer in Fig. 2b. We find that this approach makes training more efficient and accurate compared with treating {ℛ} and {ℳ} on an equal footing (see ablation studies in Supplementary Table 1).
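The parity rule stated above — only even or odd activation functions may act on sign-flipping (p = 1 or t = 1) scalar features — can be verified numerically: an odd function such as tanh commutes with the sign flip, while the sigmoid linear unit does not. This is an illustrative check of the constraint, not code from xDeepH.

```python
import math

def silu(x):
    # sigmoid linear unit, suitable only for parity-even scalar features
    return x / (1.0 + math.exp(-x))

def odd_violation(f, x):
    """For a scalar feature that flips sign under spatial inversion or
    time reversal (p = 1 or t = 1), equivariance requires an odd
    activation, f(-x) = -f(x). Returns the violation |f(-x) + f(x)|."""
    return abs(f(-x) + f(x))

tanh_err = odd_violation(math.tanh, 0.7)   # tanh is odd: zero violation
silu_err = odd_violation(silu, 0.7)        # SiLU is not odd: finite violation
```

This is why the Methods use SiLU only for parity-even l = 0 features and tanh for the others.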
Finally, the equivariant features on edges e ij are used to construct H ij . The xDeepH model is trained on DFT datasets by minimizing a loss function defined as the mean-squared error of the DFT Hamiltonian matrix elements. Constrained DFT calculations are performed using the OpenMX package 24 . DFT datasets are prepared by calculating many magnetic configurations for each given atomic structure, which increases the learning weight on the subtle magnetic effects. More details are described in 'Dataset preparation' in Methods. A previous study developed the deep neural network SpookyNet to learn the total energy and atomic forces of molecules; it not only considers the nuclear degrees of freedom as input but also takes the electronic degrees of freedom (including the total charge and spin) into account 6 , and it is also based on the ENN method. A comprehensive comparison between xDeepH and SpookyNet is presented in Supplementary Section 7.
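The training loss described above — mean-squared error over Hamiltonian matrix elements — can be sketched in a few lines. This is a minimal numpy illustration of the loss definition only; the actual xDeepH loss is evaluated over the sparse blocks H ij and implemented in PyTorch.

```python
import numpy as np

def hamiltonian_mse(h_pred, h_ref):
    """Mean-squared error over DFT Hamiltonian matrix elements,
    the training loss defined in the text (illustrative sketch)."""
    h_pred = np.asarray(h_pred, dtype=float)
    h_ref = np.asarray(h_ref, dtype=float)
    return float(np.mean((h_pred - h_ref) ** 2))

# Two off-diagonal elements match exactly; two diagonal elements err by 0.1.
h_ref = np.array([[0.0, 1.0], [1.0, 0.0]])
h_pred = np.array([[0.1, 1.0], [1.0, -0.1]])
loss = hamiltonian_mse(h_pred, h_ref)  # (0.01 + 0 + 0 + 0.01) / 4 = 0.005
```

The reported test metric (MAE per matrix element) is the analogous mean of absolute differences.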

Capability of xDeepH
The performance of xDeepH is tested by example studies of increasing complexity. The first case study is on monolayer NiBr 2 , in which spin-spiral magnetism exists 25 . We prepare DFT datasets by calculating supercells of monolayer NiBr 2 with the equilibrium lattice structure and random magnetic moment orientations at the Ni sites (Extended Data Fig. 1a), and use them for training, validation and testing. The distribution of the mean absolute error (MAE; defined for the DFT Hamiltonian matrix elements, unless otherwise specified) on the test set is shown in Supplementary Fig. 2. The average MAE is as low as 0.56 meV, which ensures reliable predictions of band structures for changing magnetic configurations (Supplementary Fig. 3). xDeepH is then applied to study the band structure of the 19 × 1 spiral magnetism and obtains results in good agreement with DFT (Extended Data Fig. 1b), demonstrating the good generalization ability of the method.
The second case study is on nanotubes of monolayer CrI 3 , one of the most investigated two-dimensional magnetic materials 26 . Recent research on CrI 3 nanotubes has revealed an intriguing magnetic transition from ferromagnetism to curved magnetism with increasing nanotube diameter 27 , but the corresponding electronic-structure study is lacking. We prepare DFT datasets by calculating flat monolayer sheets of CrI 3 with randomly perturbed atomic and magnetic configurations, and apply the trained xDeepH for this research (Extended Data Fig. 1c). On the test sets of monolayer sheets, the average MAE is down to 0.36 meV, and the band structures predicted by xDeepH match well with the DFT results, showing quite high prediction accuracy (Supplementary Figs. 5 and 6). Test calculations comparing the computational cost of xDeepH and DFT indicate that the efficiency advantage of xDeepH is substantial and becomes more considerable as the system size increases (Supplementary Section 5). We further use xDeepH to study (16, 16) CrI 3 nanotubes for the two magnetic configurations proposed by ref. 27. As the curved lattice and magnetism are unseen during the training process, this is a strict test of xDeepH, especially of its equivariance performance. Remarkably, the predicted band structure (Extended Data Fig. 1d) and electric susceptibility (Supplementary Fig. 7) are in excellent agreement with the DFT benchmark data. The variation in magnetism has a subtle influence on the band structure (Supplementary Fig. 8), and these subtle magnetic effects are well captured by xDeepH. Therefore, the method is promising for studying magnetic superstructures.

Finally, we try a challenging study on quantum materials with magnetic skyrmions. For this, we choose moiré-twisted bilayer CrI 3 (Extended Data Fig. 2a), which has recently been predicted to host magnetic skyrmions originating from the stacking-dependent interlayer magnetic coupling of CrI 3 (ref. 28). The influence of magnetic skyrmions on electronic properties has rarely been studied before. The neural network models are trained on DFT datasets obtained for sample structures with a fixed twist angle θ = 60°, and then applied to study moiré-twisted structures of varying θ. Band structures and densities of states obtained from xDeepH match well with the DFT results for a new twist angle θ = 81.79° (Supplementary Fig. 9), verifying the reliability of the approach. Next, we consider a twist angle of θ = 63.48°, which hosts magnetic skyrmions in a large supercell 28 , as illustrated in Extended Data Fig. 2a. For the ferromagnetic configuration, an extremely flat valence band emerges in the system (Extended Data Fig. 2b), which originates from the moiré twist. Noticeably, the flat band disappears in the presence of magnetic skyrmions (Extended Data Fig. 2c), indicating strong coupling between the two entities. This intriguing physics relevant to flat bands and magnetic skyrmions will be further explored in future work.

Discussion
The xDeepH method enables efficient electronic-structure calculations of large-scale magnetic materials, opening opportunities to explore emergent quantum physics in diverse magnetic systems, such as magnetic alloys, collinear or non-collinear magnets, and periodic or non-periodic magnets. Moreover, the theoretical framework of xDeepH could find general applications in the future development of deep learning ab initio methods, including the investigation of many-body interactions. For instance, xDeepH can be adapted to design neural networks that fully respect fundamental physical principles when representing other physical quantities, which is useful for exploring physical effects relevant to spin dynamics, electron-magnon coupling, phonon-magnon coupling and so on. A more detailed discussion is included in Supplementary Section 9. However, the current xDeepH approach is not without limitations. For instance, considerable time is required to perform spin-constrained DFT calculations and to train the neural network models, which slows dataset preparation and model training, respectively. In-depth research is thus needed to develop advanced spin-constrained DFT algorithms and to optimize the neural network architecture.

DFT Hamiltonian under localized basis
A localized basis is compatible with the locality principle. The orbital functions have the form ϕ iplm (r) = R ipl (r)Y lm (r̂), where the radial function R ipl (r) is centered at the ith atom, p is the multiplicity index, and the angular part Y lm (r̂) is the spherical harmonic of degree l and order m. Here the spin degree of freedom must also be considered, labeled by σ = ±1/2. The matrix element is then written as [H ij ] plmσ,p′l′m′σ′ = ⟨ϕ iplmσ |Ĥ DFT |ϕ jp′l′m′σ′ ⟩.
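The angular part of the localized basis can be illustrated for the l = 1 shell, whose real spherical harmonics are proportional to the Cartesian components of the unit direction vector. This is a hand-written sketch under the standard real-harmonic convention (the ordering m = −1, 0, 1 ↔ y, z, x is our assumption), not the OpenMX basis routines.

```python
import numpy as np

def real_sph_harm_l1(r_vec):
    """Real spherical harmonics of degree l = 1 for the direction of r_vec,
    ordered as (m = -1, 0, 1) ~ (y, z, x), each with prefactor sqrt(3/4π).
    A minimal sketch of the angular part Y_lm of the localized basis."""
    r_vec = np.asarray(r_vec, dtype=float)
    x, y, z = r_vec / np.linalg.norm(r_vec)   # unit direction r̂
    c = np.sqrt(3.0 / (4.0 * np.pi))
    return np.array([c * y, c * z, c * x])

# Along +z only the m = 0 component survives, with value sqrt(3/4π).
y1 = real_sph_harm_l1([0.0, 0.0, 2.0])
```

A full basis function then multiplies such an angular factor by a radial function R ipl (r) confined within the cut-off radius.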

Equivariance transformations of DFT Hamiltonian
To study magnetic systems, one must consider the magnetic structure {ℳ} as an additional input alongside the atomic structure {ℛ}; under symmetry operations, both these inputs and the output H DFT transform accordingly.


where the superscript * denotes the complex conjugate, and D l (R) ⊗ D 1/2 (R) is the direct product of the Wigner D-matrices acting on the orbital and spin spaces, respectively. Under 𝒯, {ℳ} changes sign, {ℛ} is unchanged and H DFT is transformed by an antiunitary operator. As real-valued orbital functions are employed here, the time-reversal transformation gives equation (2). On spatial inversion, {ℛ} and {ℳ} have odd and even parities, respectively. The tensor h l 1 l 2 also has a well-defined parity, (−1)^(l 1 +l 2 ).

Realization of Euclidean equivariance
To maintain Euclidean equivariance, all the input, internal and output feature vectors of ENNs are required to be equivariant. If a rotation R is applied to the input coordinates of the ENN, the equivariant feature vectors transform accordingly, where the equivariant feature vector x l m carries the irreducible representation of the SO(3) group of dimension 2l + 1, l is an integer or half-integer, and m varies between −l and l in integer steps. Moreover, feature vectors can be converted to equivariant tensors (or vice versa) via the Wigner-Eckart theorem: l 1 ⊗ l 2 = |l 1 − l 2 | ⊕ ⋯ ⊕ (l 1 + l 2 ), which changes the tensor product '⊗' into the direct sum '⊕'. For example, a representation of l 1 ⊗ l 2 is constructed from the tensor X l 1 l 2 (equation (1)). Thus, the spin-unpolarized DFT Hamiltonian can be represented by this tensor, preserving the rotation-equivariant property. In contrast, using ENNs to construct the spin-orbital DFT Hamiltonian is more complicated for the following reasons. First, the DFT Hamiltonian matrix sub-block corresponds to a special representation involving l*, where l* denotes a representation whose rotation transformation matrix is the complex conjugate of that for the representation l. Second, complex-valued quantities are generally involved in half-integer representations, but only real-valued ENNs are currently available for practical computation. This problem is addressed by the DeepH-E3 method 11 , which converts the representation l* into l by a unitary transformation and reduces half-integer representations to integer ones by applying the Wigner-Eckart theorem.

Neural network model
On the notation of the building blocks in Fig. 2: 'E3Linear' is defined by equation (3), where l is the angular momentum quantum number, m is the magnetic quantum number, c denotes the channel index, and W l cc′ and b l c are learnable weights. To preserve the equivariance requirements, the biases b l c are non-zero only for equivariant features x with l = 0, even spatial-inversion index and even time-reversal index. '(Ux) ⊗ (Vy)' denotes the tensor product operation between features x and y, where U and V are learnable parameters. 'E3LayerNorm' is used to normalize the features while preserving equivariance 11 , where μ l m and σ l are the mean and the standard deviation of the features, respectively, g l c and h l c are learnable weights, and ϵ is a small constant added to the denominator for numerical stability. h l c has the same equivariance requirements as b l c in equation (3). The 'Non-linearity' layer applies a non-linear activation function to features with l = 0 and a scalar gate to features with l > 0. For l = 0 features with even spatial-inversion index and even time-reversal index, the sigmoid linear unit activation function is used; for other l = 0 features, the odd activation function tanh is used to preserve equivariance with respect to spatial inversion and time reversal.
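The constraint on the E3Linear biases can be sketched as follows: channel mixing acts identically on all 2l + 1 components of each feature block, while a bias is added only to l = 0 blocks with even spatial-inversion and time-reversal indices. This is our own minimal numpy reimplementation of the stated rule, not the xDeepH code (which builds on e3nn).

```python
import numpy as np

def e3linear(features, weights, biases):
    """Sketch of an 'E3Linear' layer.

    features: dict mapping (l, p, t) -> array of shape (channels, 2l + 1),
    where p and t are the spatial-inversion and time-reversal parity
    indices (0 = even, 1 = odd). Channels are mixed by W; a bias is added
    only for l = 0 blocks with p = t = 0, as equivariance requires.
    """
    out = {}
    for (l, p, t), x in features.items():
        y = weights[(l, p, t)] @ x               # same mixing for every m
        if l == 0 and p == 0 and t == 0:         # bias allowed only here
            y = y + biases[(l, p, t)][:, None]
        out[(l, p, t)] = y
    return out

# One even scalar channel and one parity-odd l = 1 block.
feats = {(0, 0, 0): np.array([[2.0]]), (1, 1, 0): np.ones((1, 3))}
w = {k: np.eye(v.shape[0]) for k, v in feats.items()}
b = {(0, 0, 0): np.array([0.5])}
out = e3linear(feats, w, b)
```

With identity weights, the even scalar picks up the bias (2.0 → 2.5) while the l = 1 block passes through unchanged.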
'Embedding' is used to construct initial features, transforming the interatomic distance vector r ij and the magnetic moment m i into equivariant vectors. Initial vertex features v (0) i are vector embeddings of the nuclear charge Z i . Initial edge features e (0) ij are the interatomic distances |r ij | expanded in Gaussian functions. Real spherical harmonics Y lm acting on the vector inputs r ij and m i are used to introduce equivariant features of arbitrary l and to update vertex and edge features. Inputs Y lm (m i ) for non-magnetic atoms i are set to zero.
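The Gaussian expansion of |r ij | mentioned above can be sketched directly; the center spacing and width below are illustrative choices, not the values used in xDeepH.

```python
import numpy as np

def gaussian_expansion(d, centers, width):
    """Expand an interatomic distance |r_ij| on a set of Gaussian basis
    functions, producing the initial scalar edge features described in
    the text (center grid and width are illustrative)."""
    return np.exp(-((d - centers) ** 2) / (2.0 * width ** 2))

centers = np.linspace(0.0, 6.0, 13)   # centers every 0.5, for example
e0 = gaussian_expansion(2.0, centers, width=0.5)
```

The resulting smooth, fixed-length vector varies continuously with the distance, which is what makes it a suitable rotation-invariant edge embedding.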
In the magnetic moment layer, 'MLP' denotes a multilayer perceptron, and 'Aggregation' aggregates the magnetic moment information within the R C range into the edge feature e l ij (equation (7)), where 'scalar' means a spatial-inversion-even and time-reversal-even equivariant feature with l = 0, the summation runs over the set of neighbors of atom i, S and W are the outputs of equations (5) and (6), r̂ and m̂ are the unit vectors of r and m, respectively, and '∥' denotes vector concatenation. Equation (7) introduces strict locality in the same spirit as ref. 15.

Details of neural network training
The xDeepH model was implemented with PyTorch, PyTorch Geometric and e3nn 23 , with the output edge features passed through a final E3Linear layer to obtain the Hamiltonian matrix block. Angular momenta l = 0 to l = 4 were used for Y lm (r̂ ij ) and Y lm (m̂ i ). The neural network includes three vertex layers, three edge layers and three magnetic moment layers. Training was performed on an NVIDIA RTX 3090 graphics processing unit with a batch size of 1. The initial learning rate was set to 0.002 and decreased by a factor of 0.5 whenever the loss did not decrease for 120 epochs. Supplementary Table 3 summarizes the number of parameters of the neural network models and the data splitting used for training, validation and testing in each dataset. For NiBr 2 , we found that the deep learning model without E3LayerNorm achieved slightly lower loss; we therefore used the model without E3LayerNorm for this dataset, which leads to a small difference in the number of parameters.

Computational details
DFT calculations were performed with the OpenMX software package (version 3.9) 24 using pseudo-atomic localized basis functions, norm-conserving pseudopotentials and the Perdew-Burke-Ernzerhof exchange-correlation functional. Constrained DFT as implemented in OpenMX is applied to study systems with specified magnetic configurations. In all calculations, we constrain the moment orientations of the magnetic atoms (Ni and Cr) with 0.5 eV as the prefactor of the spin constraint. Spin-orbit coupling is included in the DFT calculations. For NiBr 2 , the basis sets Ni6.0H-s3p2d1 and Br7.0-s3p2d2 are used, comprising 14 basis functions for Ni with cut-off radius r c = 6.0 bohr and 19 basis functions for Br with r c = 7.0 bohr. The DFT + U method with a Hubbard correction of U = 4.0 eV is applied to describe the 3d orbitals of Ni. For CrI 3 , the basis sets Cr6.0-s3p2d1 and I7.0-s3p2d2 are used, comprising 14 basis functions for Cr with r c = 6.0 bohr and 19 basis functions for I with r c = 7.0 bohr. The convergence of the basis sets in the DFT calculations is confirmed by test calculations (Supplementary Section 6.1), and the influence of basis-set size on the training efficiency of xDeepH is also checked (Supplementary Section 6.2). The energy cutoff is set to 300 rydberg. To generate datasets, a Monkhorst-Pack k-mesh of 7 × 7 × 1 is used for supercell calculations of monolayer NiBr 2 with 27 atoms, monolayer CrI 3 with 32 atoms and bilayer CrI 3 with 64 atoms. Meanwhile, a Monkhorst-Pack k-mesh of 3 × 15 × 1 is used for monolayer NiBr 2 with 19 × 1 spiral magnetism (114 atoms), 1 × 1 × 13 for (16, 16) CrI 3 nanotubes (256 atoms) and 3 × 5 × 1 for the 2 × 1 supercell of twisted bilayer CrI 3 with twist angle 81.79° (224 atoms).

Dataset preparation
The datasets in this work involve systems with randomly perturbed atomic and magnetic configurations. Magnetic configurations are constrained by specifying the orientations of the local magnetic moments at the magnetic sites (Ni or Cr). For uniform sampling of magnetic moment orientations, we first generate M x , M y , M z ∼ N(0, 1) for each magnetic site, where N(0, 1) denotes the standard normal distribution, and then normalize by dividing by √(M x 2 + M y 2 + M z 2 ). The resulting distribution of magnetic orientation vectors is uniform on the unit sphere.
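The sampling procedure above — normalizing standard normal triples — takes only a few lines; the function name and seed are our own illustrative choices.

```python
import numpy as np

def random_moment_orientations(n, seed=0):
    """Sample n unit vectors uniformly on the sphere by normalizing
    standard normal triples, as used to generate the random magnetic
    configurations (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    m = rng.standard_normal((n, 3))
    return m / np.linalg.norm(m, axis=1, keepdims=True)

moments = random_moment_orientations(1000)
```

Uniformity on the sphere follows from the rotational invariance of the isotropic Gaussian: the direction of a normal triple has no preferred orientation.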
For monolayer NiBr 2 , we prepare DFT datasets by calculating 500 configurations of 3 × 3 supercells (27 atoms) at the equilibrium lattice structure with random magnetic moment orientations. For monolayer CrI 3 , 100 different atomic structures of 2 × 2 supercells (32 atoms) are prepared by introducing random atomic displacements (up to 0.1 Å) to the equilibrium lattice structure. For each atomic structure, 10 magnetic configurations are generated with random magnetic moment orientations, giving 1,000 supercell structures with random atomic and magnetic configurations in total for monolayer CrI 3 . Training sets of bilayer CrI 3 are composed of 2 × 2 supercells containing 64 atoms. To sample local environments with different interlayer stacking patterns, the second layer is rigidly shifted with respect to the first layer; the shift is sampled on a uniform 16 × 16 grid of the supercell, yielding 256 atomic configurations. In addition, random displacements of up to 0.1 Å are applied to each atom about its equilibrium position. For each atomic configuration, 10 random magnetic configurations are generated with the aforementioned method. In total, 2,560 bilayer CrI 3 supercells with 256 unique atomic configurations and 2,560 unique magnetic configurations are prepared. All datasets are randomly divided into training, validation and test sets with a ratio of 6:2:2.
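The 6:2:2 random split used for all datasets can be sketched as follows (function name and seed are illustrative, not from the xDeepH code).

```python
import numpy as np

def split_dataset(n, ratios=(0.6, 0.2, 0.2), seed=0):
    """Randomly split n sample indices into train/validation/test sets
    with the 6:2:2 ratio used for all datasets in this work."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(round(ratios[0] * n))
    n_val = int(round(ratios[1] * n))
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

# For the 1,000 monolayer CrI3 structures: 600 / 200 / 200 samples.
train, val, test = split_dataset(1000)
```

Shuffling before splitting ensures that atomic and magnetic configurations are distributed evenly across the three subsets.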

Data availability
Source data for Extended Data Figs. 1 and 2 is available with this paper. The dataset used to train the deep learning model is available at Zenodo 29 .