Deep learning methods for Hamiltonian parameter estimation and magnetic domain image generation in twisted van der Waals magnets

The application of twist engineering in van der Waals magnets has opened new frontiers in the field of two-dimensional magnetism, yielding distinctive magnetic domain structures. Despite the introduction of numerous theoretical methods, limitations persist in terms of accuracy or efficiency due to the complex nature of the magnetic Hamiltonians pertinent to these systems. In this study, we introduce a deep-learning approach to tackle these challenges. Utilizing customized, fully connected networks, we develop two deep-neural-network kernels that facilitate efficient and reliable analysis of twisted van der Waals magnets. Our regression model is adept at estimating the magnetic Hamiltonian parameters of twisted bilayer CrI3 from its magnetic domain images generated through atomistic spin simulations. The 'generative model' excels in producing precise magnetic domain images from the provided magnetic parameters. The trained networks for these models undergo thorough validation, including statistical error analysis and assessment of robustness against noise injections. These advancements not only extend the applicability of deep-learning methods to twisted van der Waals magnets but also streamline future investigations into these captivating yet poorly understood systems.


I. INTRODUCTION
Recent discoveries in the field of twist engineering of two-dimensional (2D) magnetism have unveiled unique magnetic domain structures, including nanoscale magnetic domain arrays [1-5]. The qualitative features of these structures find their explanation in the context of exotic interlayer magnetic coupling induced by moiré patterns [6-10]. However, due to the complex nature of the magnetic coupling in these systems [9], the quantitative description and analysis of these structures pose formidable challenges within conventional theoretical frameworks. Specifically, effective model approximations, such as the continuum model [6,11,12] and the free-layer-substrate model [7,13-18], exhibit substantial limitations in accuracy. Alternative atomistic spin simulations may offer improved accuracy, but they are hindered by their resource-intensive nature [8-10,19-21]. Consequently, it is crucial to develop theoretical frameworks that are both efficient and reliable to overcome these constraints and propel advancements in this field.
In this study, we introduce a deep-learning approach that facilitates efficient and reliable analysis of twisted van der Waals magnets. Deep learning has showcased remarkable success in tackling scientific challenges across various physical systems [22-36]. Specifically, the application of deep neural network (DNN) techniques to 2D magnetic systems has proven highly effective in extracting magnetic Hamiltonian parameters from magnetic domain images [22]. Furthermore, these techniques have demonstrated applicability to experimentally observed domain images in the same study. These results validate DNN techniques as valuable tools for comprehending 2D magnetic domain structures. We develop two specialized DNN kernels designed to analyze 2D magnetic domain structures within twisted van der Waals magnets, such as the nanoscale magnetic domain arrays found in chromium triiodide (CrI3) [1-10, 12, 16-20]. Our regression model, illustrated in Fig. 1(a), accurately estimates the parameters of the magnetic Hamiltonian from magnetic domain images generated through atomistic spin simulations. This model resolves an inverse problem, inferring the parameters from the observed data, a feat beyond the reach of established theoretical methods [6-20]. Our "generative model," depicted in Fig. 1(b), excels at producing magnetic domain images based on predefined magnetic parameters. This model, designed with deterministic characteristics, differs from typical generative frameworks by consistently producing a unique ground-state structure from given parameters. It effectively handles the complexity of the magnetic Hamiltonian without compromising accuracy, thereby reducing the need for resource-intensive atomistic spin simulations [8-10,19-21]. These advancements not only extend the applicability of DNN techniques to twisted van der Waals magnets but also streamline future investigations into these captivating systems.

II. METHODS

A. Dataset generation
To generate magnetic domain images, we employ twisted bilayer CrI3 [6-10, 12, 16-18] in commensurate moiré superlattice structures (Fig. 2(a)). The magnetic behavior of this system is effectively described using a Heisenberg spin model [8,9,16,18]:

H = -J Σ_{⟨i,j⟩,l} S_i^l · S_j^l - A Σ_{i,l} (S_i^{l,z})^2 - Σ_{i,j} J^⊥_ij S_i^t · S_j^b. (1)

Here, S_i^l represents the spin at site i on the top layer (l = t) and the bottom layer (l = b). J > 0 represents intralayer ferromagnetic (FM) exchange interactions. A > 0 represents single-ion anisotropy favoring out-of-plane magnetization. J^⊥_ij represents interlayer exchange interactions derived from ab-initio calculations on bilayer CrI3 [9].
The coupling type of J ⊥ ij varies between antiferromagnetic and FM depending on the local stacking pattern in the moiré superlattice (Fig. 2(b)).
Figure 2(c) illustrates three magnetic phases inherent in this spin model [8,9]: (i) the FM phase, (ii) the noncollinear domain (NCD) phase, and (iii) the magnetic domain (MD) phase. The FM phase exhibits a perpendicular, uniform spin configuration. In the NCD phase, magnetic domains form in both layers, with spins tilted horizontally in opposite directions (Fig. 2(d)). On the other hand, in the MD phase, domains form in one layer with opposite out-of-plane polarization compared to the background FM region (Fig. 2(e)). In the other layer, the magnetization largely maintains an FM order, but slight spin tilting occurs around localized domain wall regions. Both NCD and MD phases exhibit moiré periodicity, leading to the periodic domain patterns (Fig. 2(d-e)), which we refer to as "nanoscale magnetic domain arrays" [20].
We generate magnetic domain images through atomistic spin simulations [9] using Eq. (1). These simulations incorporate randomly generated parameter sets (θ, J, A) spanning from 1.01° to 3.89° for θ, 1 to 10 meV for J, and 0.01 to 0.3 meV for A, encompassing (J, A) ≈ (2.0 meV, 0.2 meV) pertinent to CrI3 [37]. The values of J^⊥_ij are fixed, determining the overall energy scale of the magnetic parameters. For each parameter set, we obtain a magnetic ground state and generate two images for each state, one for the top and the other for the bottom layer. These images depict out-of-plane magnetization within a fixed length dimension of 100 nm. Additionally, we consistently omit the FM phase from the dataset to prevent potential overfitting issues [38], albeit resulting in biased parameter distributions. Consequently, our dataset comprises 17,292 paired images, capturing morphological modifications of nanoscale magnetic domain arrays within the NCD and MD phases corresponding to the parameter variations. Further details about the dataset generation process can be found in Appendix A.
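The parameter sampling described above can be sketched as follows. Uniform sampling and the helper name are our assumptions; the text states only the parameter ranges, not the distribution.

```python
import random

def sample_parameter_set(rng=random):
    """Draw one (theta, J, A) parameter set from the stated ranges.

    theta in degrees, J and A in meV. Uniform sampling is an assumption:
    the text specifies the ranges but not how they were sampled.
    """
    theta = rng.uniform(1.01, 3.89)  # twist angle range (degrees)
    J = rng.uniform(1.0, 10.0)       # intralayer exchange range (meV)
    A = rng.uniform(0.01, 0.3)       # single-ion anisotropy range (meV)
    return theta, J, A
```

Each sampled set would then seed one atomistic spin simulation producing a top-layer and a bottom-layer image.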

Regression model
We implement our regression model using a customized, fully connected network. Three networks are employed, each predicting one assigned parameter, collectively yielding the entire parameter set (θ, J, A). Each network consists of four hidden layers, employing the Gaussian error linear unit (GELU) activation function [39]. The input layer of the network receives two images, each sized at (200, 200), illustrating magnetic domains at the top and bottom layers. These inputs separately undergo dimensional reduction to 512 features within the first two hidden layers, and the results are then concatenated into a single list with a length of 1024. This list is subsequently processed through the two remaining hidden layers. The final output layer yields a predicted parameter using a linear activation function. For a visual presentation of the network architecture, refer to Appendix B.
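A minimal NumPy sketch of this forward pass, with untrained random weights, may clarify the branch-concatenate-trunk layout. The width of the first hidden layer in each branch (`n_mid`) is our assumption; the text fixes only the 512-feature reduction per branch and the 1024-feature concatenation.

```python
import numpy as np

def gelu(x):
    # Tanh approximation of the Gaussian error linear unit (GELU)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def init_layer(rng, n_in, n_out):
    # Random (untrained) weights, scaled to keep activations bounded
    W = rng.standard_normal((n_in, n_out)).astype(np.float32) / np.sqrt(n_in)
    return W, np.zeros(n_out, dtype=np.float32)

def build_params(rng, n_pix, n_mid=1024):
    # n_mid is a guess; only the 512/1024 widths are stated in the text
    return [
        init_layer(rng, n_pix, n_mid), init_layer(rng, n_mid, 512),  # top branch
        init_layer(rng, n_pix, n_mid), init_layer(rng, n_mid, 512),  # bottom branch
        init_layer(rng, 1024, 512), init_layer(rng, 512, 512),       # shared trunk
        init_layer(rng, 512, 1),                                     # linear output
    ]

def regression_forward(top, bottom, params):
    """Two image branches reduced to 512 features each, concatenated to 1024,
    passed through two further hidden layers, then a linear scalar output."""
    (W1t, b1t), (W2t, b2t), (W1b, b1b), (W2b, b2b), (W3, b3), (W4, b4), (Wo, bo) = params
    ht = gelu(gelu(top.ravel() @ W1t + b1t) @ W2t + b2t)
    hb = gelu(gelu(bottom.ravel() @ W1b + b1b) @ W2b + b2b)
    h = np.concatenate([ht, hb])             # 512 + 512 = 1024 features
    h = gelu(gelu(h @ W3 + b3) @ W4 + b4)
    return float(h @ Wo + bo)                # linear activation: one parameter
```

Three such networks, trained separately, would each return one of θ, J, or A.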
We train the networks to minimize the discrepancy between predicted and actual Hamiltonian parameters, employing the mean squared error (MSE) as a loss function. The Adam optimizer is employed with a dynamic learning rate ranging from 10^-4 to 10^-3, and a batch size of 128 is chosen to facilitate efficient gradient updates. Additionally, we incorporate early stopping as a regularization technique to prevent overfitting and enhance generalization performance during training [38]. These carefully considered choices in training parameters are made to ensure optimal performance. Two randomly partitioned sets (13,836 samples and 3,459 samples) from the full dataset are utilized as training and test sets.

Generative model
Our generative model is implemented using a customized, fully connected network. The network consists of two hidden layers, employing the GELU activation function. In the input layer, the network receives three parameters (θ, J, A), which are processed through the two hidden layers, culminating in a list of 77,618 features in the output layer. This list is subsequently reshaped into two pixel arrays with dimensions (197, 197), representing magnetic domain images that delineate the out-of-plane components of spins in both the top and bottom layers. For a detailed visual representation of the network architecture, please refer to Appendix B.
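The output bookkeeping can be checked with a short sketch: 2 × 197 × 197 = 77,618, so the flat output list splits exactly into the two predicted images. The helper name is ours.

```python
import numpy as np

def split_output(flat):
    """Reshape the generative network's flat output into the two predicted
    domain images; 2 * 197 * 197 = 77,618 matches the stated output size."""
    assert flat.shape == (2 * 197 * 197,)
    top, bottom = flat.reshape(2, 197, 197)
    return top, bottom
```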
We train the network to minimize the discrepancy in pixel values between true and predicted magnetic domain images, utilizing the mean absolute error (MAE) as our chosen loss function. The training process otherwise mirrors that of the regression model, involving the Adam optimizer with a dynamic learning rate ranging from 10^-4 to 10^-3, a batch size of 128, and the implementation of early stopping as a regularization technique. The generative model is trained and tested using the same training and test datasets that are used in the regression model.

C. Statistical error analysis
Our statistical error analysis in both regression and generative models involves normalizing θ, J, and A as follows:

θ_N = (θ - θ_min)/(θ_max - θ_min), J_N = (J - J_min)/(J_max - J_min), A_N = (A - A_min)/(A_max - A_min),

where the subscripts min and max denote the bounds of each parameter range in the dataset. This normalization ensures that all parameters fall within the [0, 1] range, facilitating consistent treatment across different parameters [22]. Furthermore, the Euclidean distance from each data point to the phase boundary is calculated using the following expression:

d = [(θ_N - θ_N^*)^2 + (J_N - J_N^*)^2 + (A_N - A_N^*)^2]^{1/2}.

Here, (θ_N, J_N, A_N) and (θ_N^*, J_N^*, A_N^*) represent a data point in the test set and its corresponding nearest point on the FM-NCD or NCD-MD phase boundary.
In the regression model, the MAE values for each normalized parameter are calculated as:

∆θ_N = (1/N) Σ_n |θ_N^True - θ_N^Pred.|_n,

and analogously for ∆J_N and ∆A_N, where 'True' and 'Pred.' indicate the true and predicted values in the test set, respectively, and N is the number of test samples. Furthermore, our principal component analysis [22,40] incorporates error vectors, defined as:

X = (X_1, X_2, X_3) = (θ_N^Pred. - θ_N^True, J_N^Pred. - J_N^True, A_N^Pred. - A_N^True).

The covariance matrix of the error vectors is computed using the following expression:

K_ij = E[X_i X_j] - E[X_i] E[X_j].

Here, X_i for i = 1, 2, 3 represents each component in the error vector X, and E[X_i] indicates the average value of X_i among different samples in the test set. The eigenvectors and eigenvalues of K_ij are computed, and the principal components are selected based on the magnitude of their eigenvalues. Variance ratios are computed for each component as λ_i / (Σ_{i=1}^3 λ_i) × 100, where λ_i denotes the eigenvalue for each component.
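The normalization and the variance-ratio computation can be sketched as follows. The min-max form of the normalization is our reading of "fall within the [0, 1] range"; the ranges are those of the dataset section.

```python
import numpy as np

# Dataset parameter ranges from the dataset-generation section;
# min-max normalization to [0, 1] is an assumption consistent with the text
RANGES = {"theta": (1.01, 3.89), "J": (1.0, 10.0), "A": (0.01, 0.3)}

def normalize(value, name):
    lo, hi = RANGES[name]
    return (value - lo) / (hi - lo)

def variance_ratios(errors):
    """PCA on error vectors X with shape (n_samples, 3): build the covariance
    matrix K_ij and return eigenvalue-based variance ratios in percent,
    sorted from largest to smallest."""
    X = np.asarray(errors, dtype=float)
    K = np.cov(X, rowvar=False)             # K_ij = E[X_i X_j] - E[X_i]E[X_j]
    eigvals = np.linalg.eigvalsh(K)[::-1]   # eigenvalues, descending
    return 100.0 * eigvals / eigvals.sum()
```

For an error cloud concentrated along one axis, the first ratio approaches 100%, mirroring the dominant-component analysis reported in the results.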

D. Generation of noisy test sets
We generate noisy test sets from the original test set by introducing Gaussian white noise according to the equation:

z_i = z_i^0 + δz_i.

Here, z_i^0 and z_i represent the pixel values at the i-th pixel in the original and noise-injected images, respectively. The injected noise δz_i follows a Gaussian distribution:

P(δz_i) = (2πσ^2)^{-1/2} exp(-δz_i^2 / 2σ^2).

The variance σ^2 of the distribution is effectively controlled by adjusting a signal-to-noise ratio (SNR), defined as R_SNR = P_signal/P_noise. Here, P_signal = (1/N_pixel) Σ_i (z_i^0)^2 and P_noise = (1/N_pixel) Σ_i δz_i^2 represent the signal and noise power, respectively. The adjustment using R_SNR ensures consistency across different images with various P_signal levels [22]. Given P_signal obtained from a noiseless image and a specified value of R_SNR, we determine σ^2 through the relationship:

σ^2 = P_signal / R_SNR,

which is derived from σ^2 ≈ P_noise, valid for a sufficiently large N_pixel, the number of pixels in the image. In the generation process of noisy datasets, the SNR is systematically varied over a wide range of R_SNR using a decibel scale:

R_SNR^dB = 10 log_10(R_SNR),

where R_SNR^dB is tuned from -5 dB to 60 dB.
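The noise-injection procedure above amounts to a few lines; this sketch follows σ^2 = P_signal / R_SNR with the SNR given on a decibel scale.

```python
import numpy as np

def inject_noise(image, snr_db, rng):
    """Add Gaussian white noise to a noiseless image at a given SNR in dB,
    using sigma^2 = P_signal / R_SNR with P_signal the mean squared pixel value."""
    r_snr = 10.0 ** (snr_db / 10.0)    # decibel scale -> linear ratio
    p_signal = np.mean(image ** 2)
    sigma = np.sqrt(p_signal / r_snr)
    return image + rng.normal(0.0, sigma, size=image.shape)
```

Sweeping `snr_db` from -5 to 60 reproduces the range of noise levels used for the robustness tests.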

III. RESULTS
A. Regression model: parameter estimation from domain images

Validation of trained networks
Figure 3 presents the parameter estimation results from our trained networks on the test set. The θ estimation demonstrates precise agreement between predicted and true values, evidenced by high scores in the evaluation metrics (Fig. 3(a)). Further analysis reveals consistently low mean absolute percentage errors (MAPEs) across the entire θ range (Fig. 3(d)). Particularly noteworthy is the observation that MAPE values remain below 0.1 in the small twist angle regime (θ ≲ 1.5°). This capability indicates the network's proficiency in estimating θ, underscoring its promising applications to small twist angle systems, which are frequently involved in experimental investigations [1-5].
Estimations for J and A also exhibit consistent linear relationships between predicted and true values, showing small MAPE values of 1.22 and 2.23 for J and A, respectively (Fig. 3(b-c)). Further examination reveals that the MAPE values for J and A tend to increase as their true values decrease, while high accuracy is maintained for sufficiently large J and A values (Fig. 3(e-f)). Specifically, in the vicinity of (J, A) ≈ (2.0 meV, 0.2 meV) pertinent to CrI3 [37], the MAPE values remain small, at 1.9 and 0.7 for J and A, respectively. These results showcase the networks' reliable predictive capabilities, specifically in magnetic systems with moderate values of J and A, including CrI3.

Statistical error analysis: error correlation between J and A
To delve deeper into the errors in parameter estimation, we conduct a statistical error analysis on these errors. Figure 4 illustrates the profiles of the MAE values (∆θ_N, ∆J_N, ∆A_N) and the sample count (N_training) in the training set with respect to the variation of the normalized parameters (θ_N, J_N, A_N). Our findings reveal that the MAE values tend to increase as their corresponding values of N_training decrease (Fig. 4(a,e,i)). This tendency demonstrates that sample scarcity is one of the primary factors contributing to these errors, aligning with common expectations in machine learning modeling. In addition, we notice an unusual rise in ∆A_N within a range characterized by a small A_N value, despite an increase in N_training (Fig. 4(i)). One possible explanation is that the rapid alterations in magnetic domain structures within this range [9] make it challenging for the networks to learn these patterns from the limited dataset.
We observe no discernible correlations between ∆θ_N-J_N, ∆θ_N-A_N, ∆J_N-θ_N, and ∆A_N-θ_N (Fig. 4(b,c,d,g)), suggesting that variations in J_N and A_N do not significantly affect the estimation of θ_N, and vice versa. In contrast, noticeable correlations exist between ∆J_N-A_N and ∆A_N-J_N (Fig. 4(f,h)), indicating mutual impacts on their parameter estimation. These contrasting tendencies may stem from the distinct roles of the different parameters in determining nanoscale magnetic domain arrays; while θ_N solely dictates the arrays' periodicity, J_N and A_N collectively influence the detailed morphology of individual domains [9]. Consequently, these results suggest that the networks recognize these two factors, the periodicity and the domain shape of the arrays, and process them independently, despite neither being explicitly incorporated into the algorithm. In other words, our model effectively exploits the networks' capability to learn from data without explicit manual programming, which is one of the primary benefits of deep learning techniques.
To gain deeper insights into the correlation among errors, we conduct a principal component analysis on the error vectors (∆θ_N, ∆J_N, ∆A_N), as depicted in Fig. 5. Notably, the error distribution appears highly concentrated within a two-dimensional subspace delineated by ∆J_N and ∆A_N, reflecting the relatively large values of ∆J_N and ∆A_N compared to ∆θ_N (Fig. 3(d-f)). Moreover, within this subspace, the distribution of ∆J_N and ∆A_N manifests notable asymmetry, as demonstrated by the principal component (−0.017, −0.227, 0.974), which accounts for 77.86% of the observed variance. This predominant axis, intertwined with ∆J_N and ∆A_N, signals a significant relationship between the two errors [22], succinctly expressed as:

∆A_N ≈ −(0.974/0.227) ∆J_N ≈ −4.3 ∆J_N.

This relationship suggests a tendency for underestimation in ∆J_N to coincide with overestimation in ∆A_N, and vice versa. Moreover, this correlation may elucidate the observed error magnification in ∆J_N and ∆A_N (Fig. 3(e-f)), where errors in one variable propagate and exacerbate those in another, resulting in more pronounced deviations [41].
The observed correlation between ∆J_N and ∆A_N may offer valuable physical insights into the system. Specifically, it suggests an intricate interplay between J and A in shaping the detailed morphology of nanoscale magnetic domain arrays. One conceivable scenario is that J and A operate interchangeably, i.e., an increase in one parameter compensates for a decrease in the other, thereby preserving the overall structure. This possibility aligns with previous findings on the interaction between magnetic dipolar interactions and single-ion anisotropy [22]. While elucidating the precise mechanism lies beyond the scope of this study, the specific relationship uncovered here may provide valuable insights into this pursuit.
For a thorough understanding of the errors, we evaluate the network's performance near phase boundaries, where sharp variations in magnetic domain structures can significantly affect parameter estimation. Figure 6 illustrates the profiles of MAE values as functions of the Euclidean distance from each data point to the FM-NCD or NCD-MD phase boundary, which quantifies the degree of separation from these boundaries. We observe a notable increase in MAE values as the Euclidean distance to the FM-NCD phase boundary decreases (Fig. 6(a-c)), indicating amplified errors near this boundary. This trend contrasts with the growing number of samples within this region, suggesting that the errors are influenced more by the physical aspects of this transition than by sample scarcity. However, no discernible correlation is found between the MAE values and the Euclidean distance to the NCD-MD phase boundary (Fig. 6(d-f)). Consequently, we infer that the FM-NCD transition plays a pivotal role in magnifying errors in parameter estimation.
From these results, we deduce that errors in parameter estimation within the regression model may stem from three potential sources of magnification: (i) limited sample availability in the training set, (ii) error correlation, and (iii) the FM-NCD transition. Although the hierarchy among these factors and their potential interplay lie beyond the scope of this study, these identifications offer valuable insights for implementing meticulous treatments aimed at enhancing predictive accuracy.

Robustness against noise injections
For real-world applications, including noisy experimental data, we evaluate the regression model's performance under noisy conditions. Figure 7 presents the parameter estimation results on noisy test sets obtained from the networks trained on the noiseless training set. The θ estimation maintains precise agreement between predicted and true values even at substantial noise levels (Fig. 7(a,d)). Consequently, this result supports that the network retains its proficiency in accurately extracting θ from magnetic domain images containing substantial noise.
Estimations for J and A also display consistent linear relationships between predicted and true values (Fig. 7(b-c)), although their precision degrades considerably as the noise level increases. Specifically, within a weak noise regime (R_SNR^dB ≥ 20 dB), the MAPE values are sustained at relatively small values, less than 4% and 6% for J and A, respectively (Fig. 7(e-f)). However, upon entering a strong noise regime (R_SNR^dB < 20 dB), the MAPE values increase steeply, exceeding 20% for both J and A at R_SNR^dB = 10 dB. Consequently, the networks' capability to make meaningful predictions for J and A is practically restricted to R_SNR^dB ≥ 20 dB. We further validate the regression model under thermal fluctuations by employing noisy test sets generated through the solution of the stochastic Landau-Lifshitz-Gilbert equation [42]. Our analysis reveals that the trained network maintains its accurate predictive capabilities for estimating θ and J up to the highest investigated temperature of 5 K. However, estimating A from these noisy test sets has proven highly challenging due to the dominant energy scale of the random thermal field compared to A. Consequently, the intricate morphological features associated with A can be overshadowed by the overwhelming random thermal field, leading the network to struggle to accurately estimate A from these features.
These findings suggest that, despite being trained on a noiseless training set, the network can be applied to data subject to thermal noise, such as experimental data, at least for estimating θ and J. It is worth mentioning that one can expect higher performance in parameter estimation in the MD phase due to the robustness of magnetic structures against thermal noise within this phase. For a more detailed analysis, please refer to Appendix C.

B. Generative model: domain image generation from parameters

Validation of trained networks
Figure 8 illustrates the magnetic domain image prediction results from our trained network on the test set. The predicted images precisely replicate the intricate pattern of the nanoscale magnetic domain arrays in the true images, capturing details such as the domains' location, shape, and size (the first and second columns in Fig. 8(a-b)). Additionally, the magnitude of local magnetization exhibits precise agreement between the two images over the majority of the region (the third column in Fig. 8(a-b)), as evidenced by small average MAE values of approximately ∼ 0.002 and ∼ 0.007 for the NCD and MD phases, respectively. This precise agreement persists consistently across the entire parameter ranges, with average MAE values remaining below 0.03 (Fig. 8(c-e)). These observations affirm the proficient performance of the trained network in accurately predicting magnetic domain images from given parameters.
Despite its overall decent performance, the network shows a notable discrepancy in predicting magnetic domain images between the NCD and MD phases. In the NCD phase, the MAE value between true and predicted images remains consistently low, less than 0.02 (the third column in Fig. 8(a)). Conversely, in the MD phase, relatively large MAE values exceeding 0.3 appear in the layer where magnetic domains form (the third column in Fig. 8(b)). A closer examination reveals a concentration of these high MAE values within the local domain-wall regions, where sharp alterations in magnetization occur. This is in contrast to the NCD phase, which lacks domain walls and exhibits relatively smooth variations across the entire region. This observation leads us to posit that the pronounced discrepancy in the MD phase arises from the network's challenge in learning sharp variations within domain wall regions from the limited dataset.

Statistical error analysis: error magnification near NCD-MD transitions
To delve deeper into the errors in domain image generation, we evaluate the network's performance near phase boundaries. Figure 9 illustrates the profiles of average MAE values between true and predicted images as functions of the Euclidean distance from each data point to the phase boundaries. We observe no discernible correlation between the average MAE value and the distance to the FM-NCD phase boundary (Fig. 9(a)). However, the average MAE value demonstrates a striking correlation with the distance to the NCD-MD phase boundary, showing a steep increase near the phase boundary (Fig. 9(b)). This trend is particularly compelling, given the rising number of samples within this region, and indicates that the nature of the errors is deeply rooted in the physical aspects of the transition rather than a mere statistical origin, such as sample scarcity. These observations strongly suggest that the NCD-MD transition serves as a primary factor magnifying the errors in domain image generation, warranting further investigation.
One conceivable scenario is that the discontinuous nature of the NCD-MD transition exacerbates errors near this boundary. During this transition, a magnetic domain array undergoes a discontinuous transformation [6,8,9]. The degree of change, quantified by its net magnetization, tends to increase with a higher value of A [9]. In a large A regime, this change may surpass the network's capacity to capture it effectively, leading to the magnification of errors. This scenario gains support from the observed magnification of errors with increasing A (Fig. 8(e)). Consequently, it becomes evident that meticulous attention is required to mitigate the impact of the abrupt change across the NCD-MD transition and uphold the accuracy of the generative model in this scenario.

Prediction performance in averaged magnetization
To assess the versatility of the generative model, we evaluate its performance in predicting averaged magnetization, a practical indicator for discerning nanoscale magnetic domain arrays in experimental investigations [1-5]. Figure 10(a-c) presents the profiles of MAPE values between true and predicted averaged magnetization (M) with respect to the parameter variations. Our findings reveal consistently low MAPE values across all parameter ranges. Particularly noteworthy is the observation that MAPE values remain below 0.3 in the small twist angle regime (θ ≲ 1.5°) (Fig. 10(a)) and below 0.5 near (J, A) ≈ (2.0 meV, 0.2 meV) pertinent to CrI3 (Fig. 10(b-c)). This capability indicates the network's proficiency in estimating averaged magnetization, underscoring its promising applications to small twist angle systems and the pivotal material CrI3, which are frequently investigated experimentally [1-5].
To delve deeper into the errors, we evaluate the network's performance near the phase boundaries. Figure 10(d-e) illustrates the profiles of MAE values between true and predicted averaged magnetization (∆M) with respect to the Euclidean distance to the phase boundaries. We observe no discernible correlation between ∆M and the distance to the FM-NCD phase boundary (Fig. 10(d)). On the other hand, a striking increasing tendency of ∆M emerges near the NCD-MD phase boundary (Fig. 10(e)). These observations align completely with the previous findings in domain image prediction of the generative model. Consequently, the considerations regarding error-magnifying factors in that context apply similarly to the case of averaged magnetization, suggesting that the discontinuous nature of the NCD-MD transition serves as a primary factor in magnifying errors in predicting averaged magnetization.

IV. DISCUSSION
We have demonstrated the efficacy of our trained networks in accurately predicting magnetic Hamiltonian parameters from magnetic domain images within twisted bilayer magnets and vice versa. The reliable performance of our DNN methods positions them as efficient tools for analyzing the intricate magnetic behaviors inherent in these systems. Specifically, these methods have direct applications in the study of twisted bilayer CrI3 [1,2,6-10,15-18] and other related systems incorporating different van der Waals magnetic materials [6,7,9,18,21]. Furthermore, our methods are well suited to the efficient analysis of twisted multilayer superlattice structures, which typically require increased computing resources for handling such large-sized superlattices [3-5,19]. Additionally, the networks' robust performance against noise injections suggests promising applications to experimental data [1-5].
Future research prospects encompass expanding our methods to different moiré-induced magnetic orders in various twisted magnetic systems, including the exploration of the coexisting multiple zigzag antiferromagnetic orders in twisted bilayer α-RuCl3 [21]. Additionally, the incorporation of topological defects such as skyrmions [7-9, 12-17, 21] and merons [20] into the magnetic domain images holds promise for gaining valuable insights into the generation of topological defects within twisted van der Waals magnets, a captivating aspect that remains unexplored in this study.
We highlight that the pioneering deep-learning approach established in this study opens avenues for the application of deep-learning techniques in twisted van der Waals magnets.This significant breakthrough not only deepens our understanding of the intricate magnetic behaviors within these captivating systems but also holds great promise for accelerating advancements in this swiftly evolving field.

Appendix B: Neural network architectures
Figure 12 provides a detailed illustration of the neural network structure employed in the regression model. The model incorporates three neural networks (labeled 'network 1', 'network 2', and 'network 3') for regressing the three parameters (θ, J, A) from two given pixel arrays ('Top True' and 'Bottom True'). These arrays are derived from the first quadrant of full images sized at (200, 200), obtained in the simulations. This preprocessing step ensures efficiency in processing while maintaining consistency due to the four-fold rotational symmetry inherent in each image. Each network accepts the two arrays through dedicated input layers ('input 1' and 'input 2') and flattens them into two lists. Subsequently, the two lists undergo independent processing through two consecutive hidden layers ('h1 1', 'h2 1' and 'h1 2', 'h2 2' for each list), ultimately reducing to 512 features each, which are combined into a single list sized at 1024 ('concatenate'). This list then undergoes processing through two further consecutive hidden layers ('h3', 'h4'), resulting in a further reduction to 512 features. The output layer ('out') of each neural network regresses these features to predict a single parameter from the set (θ, J, A).
Figure 13 provides a detailed illustration of the neural network structure employed in the generative model. The input layer (labeled 'input 1') processes a list of three parameters (θ, J, A). This input undergoes further processing through two hidden layers ('dense' and 'dense 1') and the output layer ('dense 2'). The result is a list of 77,618 floating-point numbers representing as many features. Subsequently, this output list is reshaped into two pixel arrays with dimensions (197, 197), serving as the predicted domain images ('Top Pred.' and 'Bottom Pred.').

Appendix C: Validation of the regression model against thermal noises
We generated noisy test sets with thermal fluctuations at finite temperatures by solving the stochastic Landau-Lifshitz-Gilbert (s-LLG) equation:

dm_i/dt = −γ m_i × H_eff,i(t) + α m_i × dm_i/dt. (C1)

In this equation, α is the dimensionless Gilbert damping coefficient, representing energy dissipation due to environmental interactions, and γ = 1.76 × 10^11 rad/(s·T) is the gyromagnetic ratio. The magnetization vector m_i is defined by:

m_i = μ_i / |μ_i|, (C2)

where μ_i = −γS_i is the magnetic moment of the spin S_i. The effective magnetic field H^α_eff,i(t) is defined as:

H^α_eff,i(t) = (1/|μ_i|) [Σ_{j,β} J^{αβ}_ij m^β_j + 2A m^z_i δ^{α,z}] + H^α_ran,i(t), (C3)

where the magnetic interaction term J^{αβ}_ij is given by:

J^{αβ}_ij = δ^{αβ} [J δ_{⟨i,j⟩} + J^⊥_ij δ_{l(i),t} δ_{l(j),b}]. (C4)

Here, δ_{⟨i,j⟩} indicates that i and j are nearest neighbors in the same layer, while δ_{l(i),t} δ_{l(j),b} signifies that i and j belong to the top (t) and bottom (b) layers, respectively. The random thermal field H_ran,i(t) represents the effect of thermal noise, with a white-noise autocorrelation function:

⟨H_ran,i(t) H_ran,j(t′)⟩ = 2D δ_i,j δ(t − t′), (C5)

where the fluctuation amplitude D is given by:

D = α k_B T / (γ |μ_i|). (C6)

The random field H_ran,i(t) is generated using a Gaussian random number generator with a variance of 2D/dt, with temperatures ranging from 1 to 5 kelvins. For the simulations, we used the original test sets as initial configurations and then computed their time evolution based on the s-LLG equation. In these calculations, we set α = 0.01 and used a time step of 10 femtoseconds, integrating the s-LLG equation for a duration of 100 picoseconds to ensure that the spin configurations reached equilibrium. Figure 14 shows examples of the noisy magnetic configurations obtained from these calculations. Figures 15 and 16 present the parameter estimation results on noisy test sets obtained from the networks trained on the noiseless training set. Our analysis reveals that in both the NCD and MD phases, the trained network maintains its highly accurate predictive capabilities for estimating θ, showing an MAPE value of less than 3% up to the highest investigated temperature of 5 K (Fig.
15(d) and Fig. 16(d)).The network shows robust performance in estimating J, with MAPE values of approximately 7% and 20% at the highest temperature in the MD and NCD phases, respectively (Fig. 15(e) and Fig. 16(e)).The relatively poor performance in the NCD phase might be attributed to the vulnerable  domain structure against thermal noises observed in our simulations (see Fig. 14(a,c,e) and Fig. 14(b,d,f)), making it challenging for the network to extract this parameter accurately.Our analysis reveals significant challenges in estimating A from the noisy test sets.For instance, the MAPE values reach approximately 50% and 30% at an intermediate temperature of 3 K in the NCD and MD phases, respectively (Fig. 15(f) and Fig. 16(f)).This poor predictive capability for A may stem from the substantial impact of the random thermal field, with an estimated energy scale of approximately 0.88 meV at 3 K, significantly stronger than the parameter value of A ≲ 0.3 meV.This suggests that the network struggles to accurately estimate A due to the overwhelming influence of thermal fluctuations at this temperature in determining the detailed morphology of magnetic structures.
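The noisy-test-set generation described above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the Heun predictor-corrector scheme and the explicit Landau-Lifshitz form of the equation are our choices, and the effective field is held fixed within a step for brevity; only the noise variance 2D/dt, the value of γ, and α = 0.01 follow the text.

```python
import numpy as np

GAMMA = 1.76e11          # gyromagnetic ratio, rad/(s * tesla), as in the text
K_B = 1.380649e-23       # Boltzmann constant, J/K

def thermal_field(shape, alpha, temperature, mu, dt, rng):
    """Gaussian random field H_ran with per-component variance 2D/dt,
    where D = alpha * k_B * T / (gamma * mu), as in Eqs. (C5)-(C6)."""
    D = alpha * K_B * temperature / (GAMMA * mu)
    return rng.normal(0.0, np.sqrt(2.0 * D / dt), size=shape)

def sllg_step(m, h_eff, alpha, temperature, mu, dt, rng):
    """One Heun step of the s-LLG equation for unit vectors m of shape (N, 3).
    Uses the Landau-Lifshitz form, equivalent to the implicit Gilbert form
    up to the 1/(1 + alpha^2) prefactor. h_eff is assumed constant over dt."""
    h_ran = thermal_field(m.shape, alpha, temperature, mu, dt, rng)

    def rhs(m_):
        h = h_eff + h_ran          # same noise realization in both stages
        mxh = np.cross(m_, h)
        return -GAMMA / (1.0 + alpha**2) * (mxh + alpha * np.cross(m_, mxh))

    # predictor step, renormalized to keep |m_i| = 1
    m_pred = m + dt * rhs(m)
    m_pred /= np.linalg.norm(m_pred, axis=1, keepdims=True)
    # corrector step (trapezoidal average of the two slopes)
    m_new = m + 0.5 * dt * (rhs(m) + rhs(m_pred))
    m_new /= np.linalg.norm(m_new, axis=1, keepdims=True)
    return m_new
```

Iterating `sllg_step` with dt = 10 fs for 100 ps (10,000 steps) reproduces the protocol described above; renormalizing after each stage keeps the spins on the unit sphere.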

FIG. 1. Schematic diagrams illustrating the two deep-neural-network models developed in this study. (a) The regression model for estimating magnetic Hamiltonian parameters from input magnetic domain images. (b) The generative model for producing predicted magnetic domain images based on input parameters.

FIG. 2. Twisted bilayer CrI3 and its magnetic domain images employed in this study. (a) Moiré superlattice for a twist angle θ = 5.08°. The colored circles denote local stacking patterns, including AA (green), AB (blue), BA (cyan), and monoclinic (red) [9]. (b) Modulation of the local interlayer exchange energy (J^⊥_i = Σ_j J^⊥_ij) calculated for θ = 1.02°. Red and blue colors indicate antiferromagnetic and ferromagnetic (FM) interlayer exchange couplings, respectively. (c) Zero-temperature magnetic phase diagram for A = 0.2 meV as a function of θ and intralayer exchange (J), showing the FM phase (white), noncollinear domain (NCD) phase (yellow), and magnetic domain (MD) phase (green). (d-e) Magnetic configurations illustrating nanoscale magnetic domain arrays in the NCD (d) and MD (e) phases, respectively. Left and right panels in each plot correspond to the top and bottom layers, respectively. The color scale depicts the out-of-plane magnetization directions, and the arrows indicate the in-plane magnetization directions. In (d), J = 2 meV, A = 0.2 meV, and θ = 1.89° are used. In (e), J = 2 meV, A = 0.2 meV, and θ = 1.02° are used.

FIG. 4. Profiles of parameter-estimation errors with respect to parameter variations. The x-axis denotes the normalized parameters (θ_N, J_N, and A_N). Blue bars represent the sample count (N_training) in the training set. Red bars depict the mean absolute error (MAE) values (∆θ_N, ∆J_N, and ∆A_N). Error bars indicate the standard errors of the MAE values.

FIG. 6. Profiles of parameter-estimation errors near phase boundaries. The x-axis denotes the Euclidean distance from each data point to the FM-NCD or NCD-MD phase boundary (R_FM-NCD or R_NCD-MD). Blue bars depict the sample count (N_training) in the training set. Red bars depict the MAE values (∆θ_N, ∆J_N, and ∆A_N). Error bars signify the standard errors of the MAE values.

FIG. 10. Performance of the trained network of the generative model in predicting the averaged magnetization. (a-c) Profiles of MAPE values of the averaged magnetization (M). The x-axis denotes the true values used in the simulation. Red bars represent the MAPE values for the averaged magnetization. Error bars signify the standard errors of the MAPE values. (d-e) Profiles of MAE values of the averaged magnetization near phase boundaries. The x-axis indicates the Euclidean distance from each data point to the phase boundary. Red bars represent the MAE values for the averaged magnetization (∆M). Error bars signify the standard errors of the MAE values.

FIG. 13. Neural-network structure of the generative model. The generative model takes three parameters (θ, J, A) as input and produces two pixel arrays (labeled 'Top' and 'Bottom') as output. The network consists of an input layer (denoted 'input 1'), two hidden layers ('dense' and 'dense 1'), and an output layer ('dense 2').
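The layer layout in this caption (input of three parameters, two hidden dense layers, one dense output producing the top and bottom images) can be sketched as a plain fully connected forward pass. This is an illustrative sketch only: the layer widths (256), the tanh activations, the 64 × 64 image size, and the random weight initialization are our assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_dense(n_in, n_out):
    """He-style random initialization for one dense layer (illustrative)."""
    return rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out)

N_PIX = 64 * 64                        # assumed pixels per layer image
W1, b1 = init_dense(3, 256)            # 'dense'
W2, b2 = init_dense(256, 256)          # 'dense 1'
W3, b3 = init_dense(256, 2 * N_PIX)    # 'dense 2': top + bottom images

def generate(theta, J, A):
    """Map the three Hamiltonian parameters to two predicted pixel arrays."""
    x = np.array([theta, J, A])
    h = np.tanh(x @ W1 + b1)
    h = np.tanh(h @ W2 + b2)
    out = np.tanh(h @ W3 + b3)         # bounded like out-of-plane magnetization
    top, bottom = out[:N_PIX], out[N_PIX:]
    return top.reshape(64, 64), bottom.reshape(64, 64)
```

With trained weights in place of the random ones, `generate(1.02, 2.0, 0.2)` would return the predicted top- and bottom-layer magnetization images for those parameters.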

FIG. 14. Magnetic configurations at various temperatures, simulated using the s-LLG equation. (a,c,e) NCD configurations corresponding to Fig. 2(d) at T = 1 K, T = 3 K, and T = 5 K, respectively. (b,d,f) MD configurations corresponding to Fig. 2(e) at T = 1 K, T = 3 K, and T = 5 K, respectively. Left and right panels in each plot correspond to the top and bottom layers, respectively. The color scale depicts the out-of-plane magnetization directions, and the arrows indicate the in-plane magnetization directions.

FIG. 15. Performance of the trained networks of the regression model under thermal noise, within the NCD phase. (a-c) Parameter-estimation results derived from the noisy test sets in comparison to the noiseless original test set (denoted by T = 0). Here, T denotes the temperature in kelvin. The x-axis denotes the true values used in the simulation, and the y-axis represents the model's estimated values. The pink lines (y = x, denoted 'True') represent ideal predictions. (d-f) MAPE values for the estimated parameters at different temperatures (T). The markers denote the MAPE values for each parameter.

FIG. 16. Performance of the trained networks of the regression model under thermal noise, within the MD phase. (a-c) Parameter-estimation results derived from the noisy test sets in comparison to the noiseless original test set (denoted by T = 0). Here, T denotes the temperature in kelvin. The x-axis denotes the true values used in the simulation, and the y-axis represents the model's estimated values. The pink lines (y = x, denoted 'True') represent ideal predictions. (d-f) MAPE values for the estimated parameters at different temperatures (T). The markers denote the MAPE values for each parameter.
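The MAPE and MAE statistics quoted throughout these captions follow their standard definitions. A minimal sketch (ours, not the authors' evaluation code) of how such values and their standard errors can be computed over a set of true and estimated parameters:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error (in percent) and its standard error."""
    err = 100.0 * np.abs((y_pred - y_true) / y_true)
    return err.mean(), err.std(ddof=1) / np.sqrt(err.size)

def mae(y_true, y_pred):
    """Mean absolute error and its standard error."""
    err = np.abs(y_pred - y_true)
    return err.mean(), err.std(ddof=1) / np.sqrt(err.size)
```

The standard error (sample standard deviation divided by the square root of the sample count) corresponds to the error bars shown in the figures.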