A regression-based model evaluation of the Curie temperature of transition-metal rare-earth compounds

The Curie temperature (TC) of RT binary compounds consisting of 3d transition-metal (T ) and 4f rare-earth elements (R) is analyzed systematically by a developed machine learning technique called kernel regression-based model evaluation. Twenty-one descriptive variables were designed assuming completely obtained information of the TC. Multiple kernel regression analyses with different kernel types: cosine, linear, Gaussian, polynomial, and Laplacian kernels were implemented and examined. All possible descriptive variable combinations were generated to construct the corresponding prediction models. As a result, by appropriate combinations between descriptive variable sets and kernel formulations, we demonstrate that a number of kernel regression models can accurately reproduce the TC of the RT compounds. The relevance of descriptive variables for predicting TC are systematically investigated. The results indicate that the rare-earth concentration is the most relevant variable in the TC phenomenon. We demonstrate that the regression-based model selection technique can be applied to learn the relationship between the descriptive variables and the actuation mechanism of the corresponding physical phenomenon, i.e., TC in the present case.


Introduction
The development of strong permanent magnets is an urgent technological issue as well as a fundamental challenge in materials science. Most strong permanent magnets are rare-earth magnets that mainly consist of transition-metal and rare-earth elements. To date, various rareearth transition-metal compounds have been synthesized. The strongest magnetic compound, Nd 2 Fe 14 B, which is the main phase of neodymium magnets, was developed by Sagawa et al. [1]. Because of its relatively low Curie temperature (T C ) of 586 K, its coercivity at high temperatures is not sufficient for technological usage; therefore, dysprosium (Dy) is added to ensure sufficient coercivity. In fact, Sagawa  element). However, the T C was much lower than that of Sm 2 Co 17 , which was the strongest magnet before the invention of neodymium magnets. By adding boron to Nd 2 Fe 17 (T C = 335 K), they obtained a new compound with a higher T C , Nd 2 Fe 14 B. However, the T C values are still much lower than those of Sm 2 Co 17 (1,193 K) and α-Fe (1,043 K). Owing to this complexity, it is highly desirable to establish a technique to predict accurate T C values and to clarify the controlling parameters (variables) that influence this property for the development of a new strong magnet.
From a theoretical point of view, an accurate description of T C is a demanding task. The spin magnetic moment at each atomic site and the magnetic exchange coupling between them can be evaluated within the framework of density functional theory [2]. T C can be obtained by solving a derived classical spin model [3,4]. This approach works reasonably well for α-Fe, whereas it gives a value that deviates significantly from the experimental observation for Ni, because the longitudinal spin fluctuation is not captured in the classical spin model [5]. In the case of rareearth compounds, the 4f electrons add further complications. Both the electron correlation and spin-orbit interaction are strong in these compounds; hence, an advanced theoretical treatment is needed for a reliable description of the magnetic properties.
The recent rapid development of data-driven approaches in materials research offers another possibility [6,7,8,9,10]. At present, T C data can be obtained for many compounds from databases or literature. Machine learning may be utilized to predict the T C of a new compound from existing data. In the present work, we focus on the T C of transition-metal rare-earth bimetals. We explain the Kernel ridge regression (KRR) technique with various forms of the kernel functions and how to select variables to represent the collected data. We demonstrate how the T C value of a compound can be reproduced from the existing data by the KRR. We also analyze the role of variables for obtaining a high prediction accuracy. Next, a comparison between the T C value estimated by KRR and the experimentally observed T C value, for the new compounds is studied. Through a systematic analysis, we show that insights into the actuating mechanisms of physical phenomena can be obtained through the determination of a set of physically meaningful parameters by employing machine learning algorithms.

Data collection and representation
We collected experimental data of 101 binary compounds consisting of transition metals and rare-earth metals from the Atomwork database of NIMS [11,12], including the crystal structure of the compounds and their observed T C values. Our task was to develop a model for estimating the T C of a new compound from the training data of known compounds. For this purpose, one of the most important steps is the choice of an appropriate data representation that reflects the knowledge of the application domain, i.e., a model of the underlying physics. On the other hand, a data representation that derives a good estimation model may imply the discovery of the underlying physics. To represent the structural and physical properties of each binary compound, we use a combination of 21 variables. We divide all the 21 variables into three groups. The first and second categories contain descriptive variables that describe the atomic properties of the transition-metal (T ) and rare-earth (R) constituents. The properties were as follows: (1) atomic number (Z R , Z T ); (2) covalent radius (r covR , r covT ); (3) first ionization (IP R , IP T ); and (4) electronegativity (χ R , χ T ). In addition, descriptive variables related to the magnetic properties were included: the (5) total spin quantum number (S 3d , S 4f ); (6) total orbital angular momentum quantum number (L 3d , L 4f ); and (7) total angular momentum (J 3d , J 4f ).
The selection of these variables originates from the physical consideration that the intrinsic electronic and magnetic properties will determine the 3d orbital splitting at transition-metal sites. To capture the effect of the 4f electrons better, especially the strong spin-orbit coupling 3 effect, we add three additional variables to describe the properties of the constituent rareearth metal ions, (15) the projection of the total magnetic moment onto the total angular moment (J 4f g j ), and (16) the projection of the spin magnetic moment onto the total angular moment (J 4f (1 − g j )) of the 4f electrons. The selection of these features is based on the physical consideration that the magnitude of the magnetic moment will determine the T C . It has been well established that information related to the crystal structure is very valuable for understanding the physics of binary compounds with transition metals and rare-earth metals. Therefore, we design a third group with structural variables that roughly represent the structural information at the transition metal and rare-earth metal sites, which are (17) the concentration of the transition metal (C T ), and (18) the concentration of the rare-earth metal (C R ). Note that if we use the atomic percentage for the concentration, the two quantities are not independent. Therefore, in this work, we measured the concentrations in units of atoms/Å 3 . This unit is more informative than the atomic percentage, as it contains information on the constituent atomic size. As a consequence, (C T ) and (C R ) were not completely dependent on each other. Other structure variables were also added: the mean radius of the unit cell (19) between two rare-earth elements r RR , (20) between two transition metal elements r T T , and (21) between transition and rare-earth elements r T R . We set the experimentally observed T C as the target variable.

Kernel regression analysis
To learn a function for predicting the values of T C of compounds from data represented by using vectors of the variables, we apply the KRR technique [13], which has recently been applied successfully in many materials science studies [14,15,16]. In our study, we apply the KRR with five kernel functions, the forms of which are defined as follows: Cosine kernel: Linear kernel: Polynomial kernel: Gaussian kernel: Laplacian kernel: For a given new data instance x * , the predicted propertyf (x * ) is expressed as the weighted sum of the kernel functions:f where N is the number of training data points, and the weighting coefficients c i for the corresponding data point x i are determined by minimizing The regularization parameters λ and hyper parameters (σ, α) are chosen with the help of crossvalidation [17,18], i.e., by excluding some of the data instances during the training process and maximizing the coefficient of determination R 2 : where p vld is the number of validation data points used to evaluate how the predicted values agree with the actual observed values, andȳ is the average of the validation set used to compare the values predicted for the excluded compounds with the known observed values. By employing the cross-validation framework, the evaluation of the prediction accuracy is carried out by using data that are not involved in the training process to avoid overfitting the training data. In this study, we use the maximized R 2 as a measure of prediction accuracy. To obtain a good estimation of the prediction accuracy, we carried out KRR with leave-one-out cross-validation using the collected data, of which T C is considered as a response variable, and the other variables are considered as descriptive variables. It should be noted that in KRR, the kernel metric represents a formulation to measure the similarity between data instances in an abstract description space [19]. Therefore, KRR analyses with different kernel metrics will provide a better understanding of the data set.

Kernel regression-based variable evaluation
To find the most appropriate set of variables for the prediction of T C and the hidden knowledge regarding the determination of the values of T C of the compounds, we utilize an exhaustive search to evaluate all possible variable combinations [20,21,22] to identify and remove irrelevant and redundant variables [23,24,25]. KRR models with various kernel types described previously are applied as the predictors. For a given variable combination, we determine the regularization parameters λ and other hyperparameters corresponding to the specific kernel setting in each experiment (detail in Section 2.2) to maximize the prediction accuracy for T C of the KRR model. By training the KRR models for all combinations of the variables, we can obtain new data for all possible sets of variables. By using the variables in that set, the KRR model can achieve the best prediction accuracy. To analyze this kind of data, we focus on two research objectives: (1) to find the set of variables that yield the best prediction accuracy for the prediction of T C , (2) to find the variables that are relevant for the prediction of T C . One can easily presume that incorporating irrelevant variables into the KRR model may impair its prediction accuracy. Therefore, we define the prediction ability P A(S) of a set S by the maximum prediction accuracy that the KRR model can achieve by using the variables in a subset s of S as follows: where R 2 s is the value of the coefficient of determination R 2 [26] achieved by the KRR using a variable set s as the independent variables. s P A is the variable subset of S that yields the prediction model having the maximum prediction accuracy.

Variable relevance analysis
On the basis of Eq.(9), we can evaluate the relevance [27,28] of a variable for the prediction of T C using the expected reduction in the prediction ability resulting from the removal of this variable from the full set of variables. Let D be a full set of variables, d i a variable, and D i = D − {d i } the full set of variables after removing the variable d i . The degree of the relevance of variables can be formalized as follows: Among the strongly relevant variables, a variable that causes a larger reduction in the prediction ability when it is removed can be considered to be a strong variable. The degree of relevance of a strongly relevant variable can be computationally estimated by using the leave-one-out approach, i.e., by leaving out a variable in the currently considered variable set for the KRR analysis and evaluating the extent to which the prediction accuracy is impaired.

Weak relevance: a variable is weakly relevant if and only if
It is clearly seen from Eq. (11) that estimation of the degree of relevance for the weakly relevant variables cannot be carried out in a straightforward manner, as with the strongly relevant variables. Weakly relevant variables are variables that are relevant for prediction, but they can be replaced by other variables.

Ensemble learning for prediction
In this step, we select a set of variable combinations, that yields top highest R 2 score KRR models, and consider as valuable variable combinations for capturing the T C phenomenon. We then apply a technique from ensemble learning [29,30,31] to integrate all these KRR models to build a prediction model for predicting the T C of new compounds.
In statistical theory, the prediction error of a model in the prediction of the target variable y for a new data instance x can be decomposed into three factors: bias, variance, and irreducible factors [29], as follows: with Bias and V ar are : where f (x) is an actual function associated with the new data points y = f (x) + , the noise has zero mean and variance σ irr ;f (x) is the estimated function (or predicted values of discrete points x). The irreducible factor σ irr depends on the information of the target variable stored in the prediction variables, which were used for describing the training data instances. It should be noted that increasing the number of training data points does not guarantee that the prediction ability of the learning model will improve. This error can only be reduced by improving the similarity with the training data or by improving the designed descriptive variable. On the other hand, the reducible factors, bias and variance, depend on the nature of the modeling method, i.e., the KRR in these experiments. In particular, the variance factor only measured the movement of the KRR around its mean. In this research, we empirically investigate the probability distribution of the predicted values of the target variable for new data instances by utilizing an ensemble learning and bootstrap aggregating-bagging [32] algorithm. In more detail, we vary both the combination of descriptive variables and the training set by sampling with replacement from the original data uniformly. By doing so, we can indirectly observe the behavior of both the bias and the irreducible factors. An prediction model for predicting the T C of new compounds can be constructed from the obtained distribution of prediction values.

Kernel regression-based variable evaluation
To investigate the scientific connection [20] between the variables and the actuation mechanisms of the physical phenomenon of T C in these bimetal systems, we examine whether T C can be predicted by using the designed variables of the compounds. A screening was conducted for all possible variable combinations, 2 21 − 1 = 2, 097, 151 that respectively derive the same number of prediction models. As described in section 2.2, a given kernel metric formula associates with a method to measure the similarity between compounds. Therefore, five kernel metrics-cosine, Leave-one-out cross-validations [17,18] is performed to evaluate the prediction accuracy of these models.
The prediction abilities P A (equation 9) of the designed descriptive variables set for different kernel-type predictors are summarized in the table 1. The highest P A -R 2 score 0.982 is achieved by two models deriving from variable combinations s P A = {Z R , χ T , J 3d , r T R , C T , C R } with the Gaussian kernel and s P A = {Z R , S 4f , L 3d , J 3d , r RR , C T , C R } with the polynomial kernel. The P A value with the Laplacian kernel experiment achieves an R 2 score of 0.973, with its s P A = {χ R , χ T , J 4f (1 − g j ) , Z T , r covT , IP T , S 3d , L 3d , J 3d , C R }. In this screening result, there are 892,612 variable combinations associated with the Gaussian kernel; 284,649 variable combinations associated with the polynomial kernel and 1,317,193 variable combinations associated with the Laplacian kernel that yield regression models with R 2 scores exceeding 0.90. It is noted that, even with the same kernel metric, several regression models achieved a similar excellent prediction accuracy because the designed variables are not independent variables.
On the other hand, the P A values with linear and cosine kernels were lower than 0.8, i.e., the T C variable cannot be predicted by our designed descriptive variable set with these kernel regression models.
Finally, the P A analysis indicates that with all the designed variables, it is possible to accurately predict the values of T C of the rare-earth transition bimetal alloy compounds by using Gaussian, polynomial, and Laplacian KRR models with the designed variables. In the subsequent sections, we will discuss the method to improve the maximum prediction accuracy of the models as well as the physical meaning of strongly relevant variables. Figure 2 shows the dependence of the best prediction accuracy P A-red lines on the number of variables recruited in the Gaussian-polynomial-Laplacian kernel regression models. In general, the prediction accuracy in all the experiments reaches the highest value with the number of descriptive variables from 6 to 8, and then gradually decreases when the number of recruited variables increases. The small subset of the designed descriptive variables and the large number of high-accuracy models described above originate from the fact that the overuse of many weakly relevant variables [23,24,25] weakens the correlation between the similarity of the compounds, which is measured using the kernel of the variables, and the differences in their T C values.

Strongly relevant and weakly relevant features
Next, we evaluate the relevance of each variable for the prediction of T C . We compare P A(D) of the full set of variables D and P A(D − {d i }) for all the variables d i . We found that most of the variables are weakly relevant, and the prediction accuracy does not significantly change, except in one case-when removing the variables of the concentration of the rare-earth metal C R . It is clearly seen in Figure 2 that the absence of C R in the Gaussian and polynomial kernel model results in a dramatic decrease in the accuracy: P A(D) < P A(D − {C R }); therefore, C R is surely assigned as a strongly relevant variable in terms of the prediction of T C (see Section 2.4).
This result is consistent with the understanding so far that the values of T C of binary alloys consisting of 3d transition-metal and 4f rare-earth metals are mainly determined by the magnetic interaction in the transition-metal sublattice. In terms of molecular field theory [33], we have where M T and n T T are the magnetization and molecular field coefficients of the transition-metal sublattice, respectively. Both M T and n T T strongly depend on C R . The dependencies of M T and n T T on C R are different in compounds with different combinations of rare-earth metal (R) and transition metal (T). In Co-based and Ni-based compounds, M T and n T T tend to decrease when C R increases. This leads to a rapid decrease in T C with an increase in C R . For example, in  Gd-Co compounds, T C decreases from 1,404 K to 143 K with the increase in the concentration of Gd from 0% to 75%. In contrast, in R-Fe compounds, M T and n T T tend to increase with increasing C R . This leads to an increase in T C with increasing C R . Indeed, in Gd-Fe compounds, T C rapidly increases from 0 K to 827 K with the increase in the concentration of Gd from 0% to 33%. Detailed correlation between C R and T C in different transition metal-based compound is shown in Figure 1c.

Prediction of T C for new compounds
In this section, we discuss the ability of ensemble learning using the obtained KRR models in the prediction of T C for new compounds. The test set of new compounds includes five Febased compounds. The T C of two of these compounds has been determined experimentally: SmFe 12 − 555(K) [34], YFe 12 − 483(K) [35]. No T C information is available for NdFe 12 , GdFe 12 , and DyFe 12 . From the discussion in Section 3.1, we know that C R is the most strongly relevant to T C . The C R values of the five compounds in the test set are approximately 0.006 atoms/Å 3 , which is much lower than the smallest C R value for all the other Fe-based compounds (which is 0.0075 atoms/Å 3 of Fe 17 compounds-based) (see Figure 1c). The only compound having a similar C R value is LaCo 13 with C R of 0.0055 atoms/Å 3 . Further, the compounds in the test set are recently synthesized and have a crystal structure significantly different from that of all the other compounds in the data set. Therefore, the prediction for the T C of these compounds could be seem closer to an extrapolative than an interpolative prediction problem. Figure 3 shows the predicted value distributions for all test compounds (black line histogram) and the observed values (red dash line). It is easy to recognize that the distributions of the predicted T C for these test compounds can be approximated by a mixture of three separate Gaussian distributions. We represent these distributions using red, blue and green, in increasing order of their mean values. From this Gaussian distribution decomposition, we can suggest that at least three distinguish functionsf (x) can be regressed from the observed data to model the T C phenomenon. Further, since the functions are learned from sub-groups of all the data set  by the bagging method, we can suggest that there are at least three sub-groups of compounds corresponding to these three models. Next, we analyze the relation of the experimentally observed T C of the two test compounds SmFe 12 and YFe 12 to their predicted T C distribution. It is obvious that both the experimentally observed values correspond to the green Gaussian distribution, i.e., the largest T C group (see Figure 3a, 3b). The mean value of these green Gaussian distributions is close to the observed value of 535.1 K, as compared with 555.0 K of SmFe 12 , and 443.7 K as compared with 483.0 K of YFe 12 . From the fact that all the five test compounds are Fe 12 -based compounds with the same crystal structure, we can infer that the experimental values of the remaining three compounds also correspond to the green distribution. Therefore, we predict that the T C of NdFe 12 , GdFe 12 , and DyFe 12 are 488.5 K, 568.5 K, and 444.7 K, respectively. There is also the potential for further experimental study on the T C of these compounds.
Lastly, all the above investigations show that further research is required to extract hidden functions in data as well as an alternative method of integrating predicted results from ensemble

Conclusion
In this study, we analyzed the T C target variable of binary alloys consisting of 3d transitionmetal and 4f rare-earth elements by a machine learning technique called kernel regressionbased variable evaluation. Twenty-one descriptive variables were designed assuming completely obtained information of the target variable. Multiple kernel regression with different kernel types: cosine, linear, Gaussian, polynomial, and Laplacian kernels were implemented as predictor models. All possible descriptive variable combinations were generated to construct the corresponding prediction models. The prediction accuracy of T C was maximized with different combinations of the designed descriptive variables with Gaussian, polynomial, and Laplacian kernel types. Further investigation the result of the screening prediction ability process shows that the rare-earth concentration is the most strongly relevant descriptive variable. Finally, we discuss the utilization of an ensemble model constructed by top highest prediction accuracy and a bagging algorithm to predict a test set of compounds. The final predicted T C values for NdFe 12 , GdFe 12 , and DyFe 12 are 488.5 K, 568.5 K, and 444.7 K, respectively. This research also shows that there are at least three mechanisms that drive the T C phenomenon. It also highlights the need to extract these mechanisms in future work. To conclude, in this research, we demonstrated how the kernel regression-based model evaluation can be utilized for ascertaining the scientific connection between the variable and the actuation mechanisms of a physical phenomenon by using both descriptive and predictive analyses.