Important descriptors and descriptor groups of Curie temperatures of rare-earth transition-metal binary alloys

We analyze the Curie temperatures of rare-earth transition-metal binary alloys using a machine learning method. To select important descriptors and descriptor groups, we introduce a newly developed subgroup relevance analysis and adopt hierarchical clustering in the representation. We execute an exhaustive search and illustrate that our approach indeed leads to the successful selection of important descriptors and descriptor groups. It helps us to choose the combination of descriptors and to understand the meaning of the selected combination.

One of the basic approaches is to solve an (extended) Hubbard model by using various low-energy solvers. In principle, this method is expected to be accurate. However, Anisimov et al. showed that the results are sensitive to the effective parameters and the details of the low-energy solver. 4-6) Therefore, this approach is still at the level of testing the formalism for simple systems such as pure transition-metal magnets.
The atomistic spin model is the most common choice for practical application to more complex systems. 3) The spin model is constructed from the magnetic moment at each atomic site and the intersite magnetic exchange couplings, based on the assumption of a fixed magnitude of the spin moments. The parameters are evaluated using first-principles calculations. 3) This method can be applied to rare-earth magnets. Usually, the model is simplified further and restricted to the TM-3d and RE-4f spins. Then, $T_C$ is evaluated, usually in the mean-field approximation. The mean-field approximation, however, usually overestimates $T_C$. Thus, there exist many sources of error in the evaluation of $T_C$ using the atomistic spin model. The development of theoretical methods for estimating $T_C$ is still underway.
In contrast to the deductive approaches described so far, there is now a movement toward utilizing inductive approaches, i.e., data-driven methods for estimating $T_C$, and there have been many reports of successful prediction of physical quantities using such methods. 7-12) The data-driven approach accumulates data, prepares descriptors, makes a model with the descriptors, and finally predicts the values of physical quantities of new materials.
One of the key points for successful prediction is the choice of descriptors. A typical example of descriptor selection can be seen in the work by Ghiringhelli et al., where a regression model is used to predict the energy difference between the zinc blende or wurtzite structure and the rocksalt structure. 13) They used a linear regression model and first prepared basic descriptors. However, a linear regression model with only the basic descriptors has low descriptive power. They therefore performed various operations on the basic descriptors and produced a number of nonlinear combinations of them. This increased the predictive power. They reduced the number of descriptors using LASSO and finally employed an exhaustive search to find the best linear regression model. Their work shows that the combination of descriptors is important for increasing the accuracy of the regression model.
[Table I. Transition metal, rare-earth, and structural descriptors. See also the supporting information. 18) Recoverable entries: $Z_T$, $r_T$, $r^{cv}_T$, $IP_T$, $\chi_T$, $S_{3d}$, $L_{3d}$, $J_{3d}$; row label "Atomic properties of rare-earth metals (R)".]

Usually, we select the best regression model and discard all the others (performance-optimized model). However, we know that there exist many regression models in which the combination of descriptors differs from the one with the best score, but whose scores are as good as the best one found by the exhaustive search. (The best score means, for example, the largest $R^2$ value in the regression model.) There exists another strategy, in which we choose a regression model whose score is not the best but is still high.
For example, we can choose low-cost descriptors, where "low cost" means easy or literally inexpensive to evaluate through experiments or calculations. This model is usually referred to as an operation-optimized model. Okada et al. devoted considerable effort to the latter problem.
They showed the scores of the regression models as a density of states to understand the overall structure, and plotted the best scores as a function of the combinations, as in the indicator diagram, to select the best combinations depending on the purpose of the analysis. 14-16) Yet, it is not easy to understand the relationships and structures among descriptors from a huge list of scores and descriptors. Informatics treatments usually ignore the meaning of the descriptors, even though they are physical parameters that physicists regard as important. We hope, however, that we can extract more information from the huge amount of data. In the present work, we introduce a well-defined subgroup concept to clarify the relationships among descriptors. Our method can also elucidate how to choose combinations of descriptors systematically as well as how to understand the meaning of descriptors.
Our target variable is the experimental $T_C$ of the stoichiometric rare-earth transition-metal binary alloys considered in this study. 17) We select the descriptors from the element-dependent categories (R for rare-earth elements and T for transition-metal elements), and utilize the knowledge of the conventional theory-driven method. The key parameters of the effective theory-driven models are related to the properties of the constituent elements and/or structural parameters. For example, the orbital energy level increases (becomes deeper) as the atomic number Z increases. The electron interaction becomes stronger as the atomic orbital becomes more localized. The magnetic exchange couplings are associated with the strength of the electron interaction and the transfer integrals. The coupling strength between TM-3d and RE-4f (through RE-5d) is crucial for discussing the RE dependence of magnetism. This strength is proportional to the 3d-4f effective exchange coupling and the 4f total spin projected onto the 4f total angular moment $J_{4f}$. The latter quantity is given by $J_{4f}(1 − g_J)$, with $g_J$ being the Landé g-factor. We also add descriptors from the structure-related category (S) to describe the ratio of the elements as well as simple variables that depend on the real volume or spatial arrangement, which distinguish, e.g., the Th$_2$Zn$_{17}$ and Th$_2$Ni$_{17}$ polytypes. We list the descriptors in Table I, and give their detailed explanations in the supporting information. 18)
As a regression model, we employ kernel ridge regression with the radial basis function kernel. Kernel ridge regression can include the nonlinear effects of the descriptors and has much stronger power to fit the target functions with the descriptors, although it has the drawback of taking much more time to fit and predict than linear regression does. We used Python scripts with mpi4py, scipy, and scikit-learn. 19-21) Our scores in the regression models are the $R^2$ values, which we evaluate by leave-one-out cross validation.
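The scoring step described above can be sketched as follows. This is a minimal illustration rather than the scripts used in this study: the data are synthetic, and the standardization step, the function name `loo_r2`, and the hyperparameter values `alpha` and `gamma` are assumptions introduced only for this example.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def loo_r2(X, y, alpha=1e-2, gamma=0.5):
    """Leave-one-out R^2 of an RBF kernel ridge model (hyperparameters assumed)."""
    model = make_pipeline(
        StandardScaler(),  # normalization of the descriptors (assumed form)
        KernelRidge(kernel="rbf", alpha=alpha, gamma=gamma),
    )
    # Each sample is predicted by a model trained on all the other samples.
    y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    return r2_score(y, y_pred)


# Synthetic stand-in data: 60 "materials" with two descriptors.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(60, 2))
y = np.sin(2.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2
score = loo_r2(X, y)
print(f"LOO R^2 = {score:.3f}")
```

In practice the kernel parameters would be tuned for each descriptor set; they are held fixed here only to keep the sketch short.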
First, we analyze the descriptors. We compute Pearson's correlation coefficient between the descriptors. For the T category, the absolute values of Pearson's correlation coefficient among the three descriptors $Z_T$, $r_T$, and $S_{3d}$ are all equal to 1, which means that their contributions to the regression model are identical after the normalization procedure. Therefore, the number of independent descriptors is reduced from 27 to 25. Then, we perform an exhaustive search over $2^{25} − 1 \approx 3.4 \times 10^7$ regression models, in which the combinations of descriptors are different, and evaluate their accuracy values (scores).
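The reduction of perfectly correlated descriptors and the size of the exhaustive search can be illustrated with a short sketch. The columns below are synthetic stand-ins constructed to be exactly linearly related, not the tabulated element properties, and the helper name `drop_perfectly_correlated` and the tolerance `tol` are assumptions made for this example.

```python
import numpy as np


def drop_perfectly_correlated(X, names, tol=1e-9):
    """Keep one representative of each set of descriptors with |r| = 1."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        # Keep column j only if it is not perfectly correlated with a kept one.
        if all(corr[j, k] < 1.0 - tol for k in keep):
            keep.append(j)
    return [names[j] for j in keep]


# Synthetic stand-ins: r_T and S_3d are exact linear functions of Z_T.
z_t = np.tile([26.0, 27.0, 28.0], 10)
r_t = 2.0 - 0.03 * z_t
s_3d = 15.0 - 0.5 * z_t
c_r = np.linspace(0.1, 0.5, 30)
X = np.column_stack([z_t, r_t, s_3d, c_r])
kept = drop_perfectly_correlated(X, ["Z_T", "r_T", "S_3d", "C_R"])

# Number of non-empty descriptor combinations for 25 independent descriptors.
n_models = 2**25 - 1
print(kept, n_models)
```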
Usually, we evaluate the score of the regression model; however, we want to evaluate the importance of the descriptors. Therefore, we change the viewpoint from the regression model to the descriptor in order to discuss the importance of the latter. We use relevance analysis, 22, 23) which roughly corresponds to linear response theory with respect to the descriptors. (We explain the scores and the relevance analysis in the supporting information. 18)) It originally utilizes the change in the score when we remove or add a descriptor. The former corresponds to the leave-one-out experiment, while the latter corresponds to the add-one-in experiment. A descriptor is strongly or weakly relevant when the accuracy score changes meaningfully in the leave-one-out or the add-one-in experiment, respectively.
Our first relevance analysis is based on strong relevance. We found that only the descriptor $C_R$ is strongly relevant. We can verify the importance of $C_R$ when we plot $C_R$ vs $T_C$.
Almost all the points are placed in the bottom-left side of the right panel of Fig. 1. It is clear that $C_R$ has a considerable influence on $T_C$. It should be noted that we would not be able to find such a relationship if we simply executed the regressions.
We notice that relevance analysis can be done not only for a descriptor, but also for a subgroup of descriptors. The second relevance analysis is based on weak relevance, where, in the original prescription, we add another descriptor to a set of descriptors, which we must define. We define the groups and subgroups here, and make use of them in the relevance analysis. We utilize hierarchical clustering analysis, where the distance between descriptors is one minus the absolute value of Pearson's correlation coefficient. We can define the groups or subgroups of descriptors as clusters whose members are within a distance d of each other. For example, we can define four groups at d = 0.5. Two of them have the same descriptors as those of the T and R categories, while the other two together contain those of the original S category. (We call an original cluster a category and a cluster obtained by the hierarchical analysis a group.) The descriptor $d_{TR}$ constitutes one group by itself, while the other S-category descriptors constitute the other. It is not surprising that the grouping at d = 0.5 is almost the same as the categories defined a priori as T, R, and S when we remember the definition of the descriptors of the materials. Here, we successfully defined the groups and subgroups, where the groups are almost the same as the original categories but are clustered from the data themselves. (We redefine the group S as a result of this clustering. The group S, which does not include $d_{TR}$, is different from the category S.) We can make further advances in this grouping. We notice that specifying the value of d is unnecessary; we only have to specify a vertical line of the decomposition tree to define a subgroup, because the child nodes below the vertical line are the same. (See also Fig. 2. The vertical axis corresponds to d.) Thus, we are able to define many subgroups of the descriptors as sets of the child nodes of the dendrogram.
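The grouping procedure can be sketched with scipy as follows. The linkage method is not specified in the text, so average linkage is an assumption here, as are the function names and the synthetic descriptor columns.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def descriptor_linkage(X, method="average"):
    """Hierarchical clustering of descriptors with distance 1 - |Pearson r|."""
    dist = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
    dist = np.clip((dist + dist.T) / 2.0, 0.0, None)  # remove rounding noise
    np.fill_diagonal(dist, 0.0)
    return linkage(squareform(dist, checks=False), method=method)


def dendrogram_subgroups(Z, n_leaves):
    """Leaf sets of every dendrogram node; each set is one candidate subgroup."""
    members = {i: (i,) for i in range(n_leaves)}
    for i, row in enumerate(Z):
        left, right = int(row[0]), int(row[1])
        members[n_leaves + i] = tuple(sorted(members[left] + members[right]))
    return list(members.values())


# Synthetic descriptors: columns 0 and 1 are similar, columns 2 and 3 are similar.
t = np.linspace(0.0, 1.0, 50)
a, b = np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)
X = np.column_stack([a, a + 0.1 * t**2, b, b + 0.1 * t])

Z = descriptor_linkage(X)
groups = fcluster(Z, t=0.5, criterion="distance")  # groups at d = 0.5
subgroups = dendrogram_subgroups(Z, X.shape[1])
print(groups, subgroups)
```

Cutting at d = 0.5 gives the groups, while the list of node leaf sets gives all subgroups without choosing any particular value of d.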
We apply the relevance analysis not to a descriptor but to a subgroup or group. We call this method subgroup relevance analysis. We plot the result in Fig. 2. The horizontal scores are evaluated in the leave-one-out experiment and are related to strong relevance, while the vertical scores are evaluated in the add-one-in experiment and are related to weak relevance.
Note that the score of a subgroup belonging to a group is evaluated under the condition that we must use at least one descriptor in the subgroup, and any descriptors belonging to the other groups can be added in the weak relevance analysis.
In Fig. 2, the weak relevance values, or add-one-in values, are written as vertical values.
The subgroup containing only $r_R$ has the score 0.89467, which is the highest score under the condition that we must take the subgroup $r_R$ in the group R and may take any descriptors in the other groups. (A subgroup consisting of a single descriptor is also a subgroup.) The subgroup containing $r_R$, $Z_R$, and $r^{cv}_R$ has the score 0.95445, which is the highest score under the condition that we must take at least one descriptor in the subgroup $r_R$, $Z_R$, and $r^{cv}_R$ of the group R and may take any descriptors in the other groups, as explained in the previous paragraph.
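The constrained maximum described above can be written compactly once the scores of all descriptor combinations are available. The sketch below is one reading of the rule: it assumes that descriptors of the same group lying outside the subgroup are excluded, and the score table contains made-up numbers for three descriptors, not values from this study.

```python
def subgroup_weak_relevance(scores, subgroup, group):
    """Best score among combinations that use at least one descriptor of the
    subgroup, none of the rest of its group, and anything from other groups."""
    subgroup, group = frozenset(subgroup), frozenset(group)
    excluded = group - subgroup  # same group, outside the subgroup (assumed rule)
    allowed = [
        score
        for combo, score in scores.items()
        if (combo & subgroup) and not (combo & excluded)
    ]
    return max(allowed)


# Toy score table (illustrative numbers only): group R = {Z_R, r_R}, group S = {C_R}.
scores = {
    frozenset({"Z_R"}): 0.30,
    frozenset({"r_R"}): 0.25,
    frozenset({"C_R"}): 0.40,
    frozenset({"Z_R", "r_R"}): 0.32,
    frozenset({"Z_R", "C_R"}): 0.90,
    frozenset({"r_R", "C_R"}): 0.80,
    frozenset({"Z_R", "r_R", "C_R"}): 0.88,
}
group_r = {"Z_R", "r_R"}
score_r_only = subgroup_weak_relevance(scores, {"r_R"}, group_r)
score_z_only = subgroup_weak_relevance(scores, {"Z_R"}, group_r)
score_whole = subgroup_weak_relevance(scores, group_r, group_r)
print(score_r_only, score_z_only, score_whole)
```

In this toy table the parent subgroup inherits the score of its child Z_R, which is the situation in which a single descriptor can represent the whole group.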
The sole descriptor $Z_R$ in the group R has the highest score (0.95445). This means that $Z_R$ alone can represent the group R. This is also the case for the $C_R$ subgroup in the group S. However, the structure of the group T is different from those of the groups R and S. The subgroup made of $J_{3d}$, $\chi_T$, $r^{cv}_T$, and $Z_T$ (and $r_T$ and $S_{3d}$) has the highest score (0.94876), but its child subgroups have smaller scores (0.92427 and 0.94650). This means that there exists no single descriptor that can represent the overall nature of the group T. When we examine all the combinations made of $J_{3d}$, $\chi_T$, $r^{cv}_T$, and $Z_T$, we find that $Z_T$ takes the best score (0.95450) if we choose only one of the descriptors among them, the set of $Z_T$ and $J_{3d}$ is the best (0.95339) for two descriptors, and the set of $Z_T$, $J_{3d}$, and $L_{3d}$ is the best (0.95445) for three descriptors. We note that the descriptor $Z_T$ has the same effect as $S_{3d}$. We discuss the interpretation of the result later.

We can also obtain the importance of the groups from the horizontal values above the yellow solid line in Fig. 2. They are the strong relevance values, or leave-one-out values, of the groups T, R, and S. For example, the group R has the value 0.87587, which is the best score when we remove all the descriptors of the group R. The better the score is, the less important the group is. The value 0.50682 is the smallest among them, which means that the group S is the most important among the groups. On the other hand, the least important group is R, whose value is 0.87587. This means that the score remains high even if we exclude all the descriptors in the group R. Therefore, the importance of group R is the lowest among T, S, and R.
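The leave-one-group-out reading of these values can be sketched as follows. The score table is again a made-up toy example with one descriptor per group, chosen only so that the ordering of the groups mirrors the discussion above; the function name is an assumption.

```python
def group_strong_relevance(scores, group):
    """Best score that remains when every descriptor of the group is removed."""
    group = frozenset(group)
    return max(score for combo, score in scores.items() if not (combo & group))


# Toy score table (illustrative numbers only), one descriptor per group.
scores = {
    frozenset({"Z_T"}): 0.20,
    frozenset({"Z_R"}): 0.10,
    frozenset({"C_R"}): 0.45,
    frozenset({"Z_T", "Z_R"}): 0.50,
    frozenset({"Z_T", "C_R"}): 0.85,
    frozenset({"Z_R", "C_R"}): 0.60,
    frozenset({"Z_T", "Z_R", "C_R"}): 0.95,
}
groups = {"T": {"Z_T"}, "R": {"Z_R"}, "S": {"C_R"}}
leave_out = {name: group_strong_relevance(scores, g) for name, g in groups.items()}

# A lower remaining score means a more important group.
ranking = sorted(leave_out, key=leave_out.get)
print(leave_out, ranking)
```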
We have added further explanations in Fig. 2. The descriptor $J_{4f}(1 − g_J)$ can represent the subgroup containing $g_J$, ..., $J_{4f} g_J$, but its score is 0.93296, which is lower than the score 0.95445 of $Z_R$. We have also added a comment on the group $d_{TR}$. Its strong relevance value is 0.95445 and its weak relevance value is 0.95382. The facts that their difference is small and that the weak relevance value is smaller than the strong relevance value mean that including the group $d_{TR}$ makes the regression model worse.
Here, we compare the result of the subgroup relevance analysis shown in Fig. 2 with the best scores for n descriptors obtained without the subgroup relevance analysis, which are shown in Table II. The set of $C_R$, $Z_R$, and $Z_T$ has the best score (0.94222) for n = 3. The set of $C_R$, $Z_R$, $Z_T$, and $J_R$ has the best score (0.95339) for n = 4. The set of $C_R$, $Z_R$, $Z_T$, $J_R$, and $L_{3d}$ has the best score (0.95429) for n = 5. The descriptor sets are made of the most important descriptors in group R ($Z_R$), group S ($C_R$), and group T ($Z_T$ when we choose one descriptor; $J_{3d}$ and $Z_T$ when we choose two descriptors; and $J_{3d}$, $L_{3d}$, and $Z_T$ when we choose three descriptors). These combinations are the same as those found in the analysis in the previous paragraph. Thus, the subgroup relevance analysis successfully illustrates the structure among the descriptors and their importance.
One may think that the differences in the scores are quite tiny. For example, 99.0% of the global best score is 0.944, which roughly corresponds to the best score with 12 descriptors (see also Table I in the supporting information). 18) However, the predictive ability changes drastically. We plot the "RMSE" between the best models with n descriptors in Fig. 2 in the supporting information. 18) It can be clearly seen that the predictions for n = 3 to 8 are qualitatively different from those for n ≥ 9, although the difference between the score of the best model with 9 (10) descriptors and that of the global best model is only 0.1% (0.4%). The difference in the score looks tiny at a glance, but is meaningful for this data set and regression model. (One must also examine the total density of states of the scores to discuss whether the differences in the scores are meaningful, but this is beyond the scope of this study. 14-16)) The ordering of the scores of the models (combinations of descriptors) can change according to the details of the regression scheme and the noise in the data, because the differences in the scores are quite small (Table II in the main body and Table I in the supporting information). 18) Thus, just showing the best models with n descriptors may give us wrong information. However, the relevance analysis can give us more significant differences. The dendrogram, or grouping, does not depend on the scores of the models because it is made only of the distances between the descriptors. Even if there exists noise in the data, which may affect the scores of the models, we can expect that similar descriptors will give similar scores. The subgroup relevance analysis can illustrate how the distances, or similarities, between the descriptors affect the models.
Here, we further explain the advantage of the expression with the dendrogram. For example, when the importance is expressed as in Fig. 2, we can easily choose $r^{cv}_R$ if we do not want to use $Z_R$. The dendrogram enables us to find the next best route, that is, to go upward and try a new branch downward in the tree structure. We believe that this expression is much better than simply providing a list, and that it makes it much easier to find operation-optimized regression models.
We can conclude that the descriptor $C_R$ is strongly relevant when we define the subgroups at $d \sim 0$ and execute the leave-one-out experiment. The original relevance analysis is thus a special case of the subgroup relevance analysis, and the subgroup relevance analysis is a natural extension of it.
Here, we note a possible interpretation of the regression model in the context of condensed matter physics, where we know that the physics should depend not on $J_{4f}$ but on $J_{4f}(1 − g_J)$ in the effective model Hamiltonian. We, however, found more important descriptors, e.g., $Z_R$ and $r^{cv}_R$ in the group R and $J_{3d}$ in the group T. It is more plausible that the regression model found a relationship similar to the generalized Slater-Pauling curve for the Curie temperature as a function of $C_R$, $Z_T$, and $Z_R$, and that the other effects are only marginal. 24) We introduced many descriptors that cannot appear in the atomic-scale effective model Hamiltonian, and, because their relationships are more apparent, the regression simply selected an inter-scale model that includes the macroscale parameter $C_R$ first and $Z_T$ and $Z_R$ next, none of which appear directly in the effective model Hamiltonian. It should be noted that the number of data points, only about a hundred, is too small to discuss the details, because it can easily change the prediction accuracy, as discussed in the supporting information. 18)
We cannot avoid errors in the $T_C$ values because of experimental errors and human errors. The latter occur mainly because AtomWork does not allow web scraping. We examine the possibility of outlier detection using machine learning. We show a plot of the experimental $T_C$ values versus the predicted ones in the supporting information. 18) The overall agreement is good from 0 K to ∼1300 K, but there exist a few outliers. We mainly check the outliers in $T_C$ and repeatedly fix any errors we find. We found three major errors and a minor error. After fixing these errors, we evaluated the cross-validation test scores again for the best n descriptors of the original regression model. The best $R^2$ was 0.96688. By using machine learning, it may be possible to find data errors efficiently; however, this approach cannot detect erroneous data whose predictions happen to agree with the experimental values.
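A simple way to shortlist candidates for such a manual check is to rank the materials by the absolute difference between the experimental and cross-validated values. The sketch below is only a heuristic illustration with made-up temperatures; the function name and the cutoff `k` are assumptions, and it is not the checking procedure of this study.

```python
import numpy as np


def largest_residuals(t_exp, t_pred, k=3):
    """Indices and sizes of the k largest |experimental - predicted| values."""
    resid = np.abs(np.asarray(t_exp, dtype=float) - np.asarray(t_pred, dtype=float))
    order = np.argsort(-resid)[:k]  # largest residual first
    return [(int(i), float(resid[i])) for i in order]


# Made-up example values in kelvin; the fourth entry mimics a data-entry error.
t_exp = [300.0, 520.0, 710.0, 1000.0, 95.0]
t_pred = [310.0, 515.0, 705.0, 650.0, 100.0]
candidates = largest_residuals(t_exp, t_pred, k=2)
print(candidates)
```

Entries flagged in this way still have to be compared with the original literature, since a large residual may also reflect a genuine limitation of the model.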
We employed Pearson's correlation coefficient to define the distance in this study. However, there are many possible choices for the distance. Which representation is the most appropriate in the unsupervised learning part depends on the problem. One usually uses the similarity, or distance, between materials to find the regression model, but discards the similarity between descriptors. We, however, utilized the latter similarity as well, and therefore took full advantage of the similarity in the data in this prescription.
We showed that the distances between the descriptors are useful to illustrate the importance of descriptors and descriptor groups. This result is not strange when the descriptors have some physical meaning. There exist, however, minor discrepancies in the subgroup containing $Z_R$, $J_{3d}$, and $L_{3d}$ in the dendrogram. This is a limitation of this theory; however, it is possible to overcome this difficulty. We used the distance between the descriptors to explain the scores of the relevance analysis, but the inverse problem is also possible. We can set the values of the distances between the descriptors, or the structure of the dendrogram, to be more consistent with the scores of the relevance analysis.
We can consider many variants of the subgroup relevance analysis. We took the best descriptor from the subgroup shown in yellow in Fig. 2. Thus, we were able to show the best descriptors in the subgroup. Another method is to take the best subgroup downstream of a specified subgroup. Then, we will be able to understand the relationships among subgroups, and we can easily change them depending on the purpose.
Note that the Monte-Carlo tree search also utilizes the same nature of tree structures.
There may be a route to finding a nearly best regression model by utilizing the subgroup decomposition without performing an expensive exhaustive search.
In summary, we studied a data-driven approach to the Curie temperature of rare-earth transition-metal stoichiometric alloys. We successfully made regression models that achieved high scores from our descriptors. We developed the subgroup relevance analysis and successfully illustrated the importance, relationships, and structures among the descriptors from a huge list of exhaustive search results. In addition, it should be noted that our method makes full use of the similarity of the given data.
Important Descriptors and Descriptor Groups of Curie Temperatures of Rare-earth Transition-metal Binary Alloys: Supporting Information

Descriptors
We collected experimental data for 101 binary compounds consisting of transition metals and rare-earth metals from the AtomWork database of NIMS [1], including the crystal structures of the compounds and their observed $T_C$. To represent the structural and physical properties of each binary compound, we use a combination of 28 descriptors. We divide all 28 descriptors into three categories.
The first category pertains to the descriptors describing the atomic properties of the transition-metal constituent, including the (1) atomic number ($Z_T$), (2) atomic radius ($r_T$), (3) covalent radius ($r^{cv}_T$), (4) ionization potential ($IP_T$), (5) electronegativity ($\chi_T$), (6) spin angular moment ($S_{3d}$), (7) orbital angular moment ($L_{3d}$), and (8) total angular moment ($J_{3d}$) of the 3d electrons. The selection of these descriptors originates from the physical consideration that the intrinsic electronic and magnetic properties will determine the 3d orbital splitting at transition-metal sites.
In the same manner, we design the second category pertaining to the descriptors for describing the properties of the rare-earth metal constituent, including the (9) atomic number ($Z_R$), (10) atomic radius ($r_R$), (11) covalent radius ($r^{cv}_R$), (12) ionization potential ($IP_R$), (13) electronegativity ($\chi_R$), (14) spin angular moment ($S_{4f}$), (15) orbital angular moment ($L_{4f}$), and (16) total angular moment ($J_{4f}$) of the 4f electrons. To capture the effect of the 4f electrons better, we add three additional descriptors for describing the properties of the constituent rare-earth metal ions, including (17) the Landé factor ($g_J$), (18) the projection of the total magnetic moment onto the total angular moment ($J_{4f} g_J$), and (19) the projection of the spin magnetic moment onto the total angular moment ($J_{4f}(1 − g_J)$) of the 4f electrons. The selection of these descriptors originates from the physical consideration that the magnitude of the magnetic moment will determine $T_C$.
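As a worked illustration of descriptors (17) to (19), the sketch below evaluates them from the standard Landé formula $g_J = 3/2 + [S(S+1) − L(L+1)] / [2J(J+1)]$. The use of the Hund's-rule ground states of the trivalent ions Nd$^{3+}$ and Gd$^{3+}$ as inputs is an assumption made for this example, as are the function and key names; the quantum numbers actually tabulated in this study are listed in the descriptor tables.

```python
from fractions import Fraction


def lande_g(S, L, J):
    """Landé g-factor from the standard formula (exact rational arithmetic)."""
    S, L, J = Fraction(S), Fraction(L), Fraction(J)
    if J == 0:
        raise ValueError("g_J is undefined for J = 0")
    return Fraction(3, 2) + (S * (S + 1) - L * (L + 1)) / (2 * J * (J + 1))


def rare_earth_4f_descriptors(S, L, J):
    """Descriptors g_J, J g_J, and J (1 - g_J) for given 4f quantum numbers."""
    g = lande_g(S, L, J)
    J = Fraction(J)
    return {"g_J": g, "J_gJ": J * g, "J_1_minus_gJ": J * (1 - g)}


# Assumed Hund's-rule ground states: Nd3+ (S=3/2, L=6, J=9/2), Gd3+ (S=7/2, L=0, J=7/2).
nd = rare_earth_4f_descriptors(Fraction(3, 2), 6, Fraction(9, 2))
gd = rare_earth_4f_descriptors(Fraction(7, 2), 0, Fraction(7, 2))
print(nd, gd)
```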
It has been well established that information related to the crystal structure is very valuable for understanding the physics of binary compounds of transition metals and rare-earth metals. Therefore, we design the third category with structural descriptors that roughly represent the structural information at the transition-metal and rare-earth-metal sites, which are (20) the concentration of the transition metal ($C_T$), (21) the concentration of the rare-earth metal ($C_R$), (22) the average distance between a transition-metal site and the nearest transition-metal site ($d_{T−T}$), (23) the average distance between a transition-metal site and the nearest rare-earth-metal site ($d_{T−R}$), (24) the average distance between a rare-earth-metal site and the nearest rare-earth-metal site ($d_{R−R}$), (25) the average number of rare-earth-metal sites surrounding a transition-metal site within a distance of less than 5.0 Å ($N_{T−R}$), (26) the average number of rare-earth-metal sites surrounding a rare-earth-metal site within a distance of less than 10.0 Å ($N_{R−R}$), and (27) the average number of transition-metal sites surrounding a rare-earth-metal site within a distance of less than 5.0 Å ($N_{R−T}$). The values of these descriptors are calculated from the crystal structures of the compounds from the literature.
arXiv:1809.04750v2 [cond-mat.mtrl-sci] 15 Oct 2018

Strong Relevance and Weak Relevance
We define the prediction ability $PA(S)$ of a set $S$ of descriptors as the maximum prediction accuracy that the model can achieve by using the descriptors in a subset $s$ of $S$:

$$PA(S) = \max_{s \subseteq S} R^2_s, \tag{1}$$

where $R^2_s$ is the value of the coefficient of determination $R^2$ achieved by the model using a descriptor set $s$. (The coefficient of determination is $R^2 = 1 − \sum_i (y_i − y^{pred.}_i)^2 / \sum_i (y_i − \bar{y})^2$, where $y_i$, $y^{pred.}_i$, and $\bar{y}$ are the target value, the predicted value, and the mean target value, respectively.) On the basis of Eq. (1), we can evaluate the relevance [2,3] of a descriptor for the prediction of $T_C$ by using the expected reduction in the prediction ability caused by removing this descriptor from the full set of descriptors. Let $D$ be the full set of descriptors, $d_i$ a descriptor, and $D_i = D − \{d_i\}$ the full set of descriptors after removing the descriptor $d_i$. The degree of relevance of the descriptors can be formalized as follows.

Strong relevance: a descriptor $d_i$ is strongly relevant if and only if

$$PA(D) − PA(D_i) > 0. \tag{2}$$

Among the strongly relevant descriptors, a descriptor that causes a larger reduction in the prediction ability when it is removed can be considered a stronger one. The degree of relevance of a strongly relevant descriptor can be computationally estimated by using the leave-one-out approach, i.e., by leaving out a descriptor from the currently considered descriptor set and testing how much the prediction accuracy is impaired.

Weak relevance: a descriptor $d_i$ is weakly relevant if and only if

$$PA(D) − PA(D_i) = 0 \quad \text{and} \quad \exists\, D'_i \subset D_i \ \text{such that} \ PA(D'_i \cup \{d_i\}) − PA(D'_i) > 0. \tag{3}$$

It is clearly seen from Eq. (3) that the degree of relevance of weakly relevant descriptors cannot be estimated in as straightforward a manner as for strongly relevant descriptors. Weakly relevant descriptors are descriptors that are relevant for prediction, but they can be substituted by other descriptors. We can only estimate the degree of relevance for this type of descriptor in specified contexts. For example, in terms of the prediction of $T_C$, the relevance of a descriptor for an atomic property of the transition metal can be examined in the context that all of the descriptors for the atomic properties of the rare-earth metals are included in the descriptor set. We define the following additional rule for comparing two weakly relevant descriptors.

Comparison between weakly relevant descriptors: a weakly relevant descriptor $d_i$ is said to be more relevant than the descriptor $d_j$ in the context of having a set of descriptors $M$ ($d_i, d_j \notin M$) if and only if

$$PA(M \cup \{d_i\}) > PA(M \cup \{d_j\}). \tag{4}$$

A comparison of two weakly relevant descriptors can be computationally carried out by using the add-one-in approach, i.e., by exclusively adding the two descriptors to the currently considered descriptor set and testing how much the prediction accuracy is improved.

[Table 1: The number of descriptors vs the best $R^2$ score and descriptors. Columns: n, score, descriptor(s). Recoverable entry: n = 1, score 0.32518.]
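The prediction ability and the leave-one-out estimate of strong relevance can be sketched on synthetic data as follows. The sketch assumes kernel ridge regression with fixed hyperparameters and uses the closed-form leave-one-out residual of a linear smoother, $e_i = (y_i − \hat{y}_i)/(1 − H_{ii})$, in place of explicit refitting; the data, the function names, and the values of `alpha` and `gamma` are illustrative assumptions.

```python
from itertools import combinations

import numpy as np


def loo_r2_krr(X, y, alpha=1e-3, gamma=1.0):
    """Leave-one-out R^2 of RBF kernel ridge regression (fixed hyperparameters)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * sq)
    H = np.linalg.solve(K + alpha * np.eye(len(y)), K)  # smoother ("hat") matrix
    loo_resid = (y - H @ y) / (1.0 - np.diag(H))
    return 1.0 - np.sum(loo_resid**2) / np.sum((y - y.mean()) ** 2)


def prediction_ability(X, y, columns):
    """PA(S): the best score over all non-empty subsets s of the columns S."""
    return max(
        loo_r2_krr(X[:, list(s)], y)
        for r in range(1, len(columns) + 1)
        for s in combinations(columns, r)
    )


def strong_relevance(X, y, i):
    """PA(D) - PA(D_i): loss of prediction ability when descriptor i is removed."""
    full = list(range(X.shape[1]))
    reduced = [j for j in full if j != i]
    return prediction_ability(X, y, full) - prediction_ability(X, y, reduced)


# Synthetic data: only descriptor 0 carries information about the target.
rng = np.random.default_rng(0)
x0 = np.linspace(-1.0, 1.0, 40)
X = np.column_stack([x0, rng.normal(size=40), rng.normal(size=40)])
y = np.sin(3.0 * x0)

rel_informative = strong_relevance(X, y, 0)
rel_noise = strong_relevance(X, y, 1)
print(rel_informative, rel_noise)
```

Because every subset of $D_i$ is also a subset of $D$, the difference is never negative; it is large only for a descriptor that no combination of the others can replace.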

Best R 2 Scores and Descriptors
We present a list of the best $R^2$ scores and descriptors in Table 1. It may appear that the differences in the scores are very small. We originally used ten times ten-fold cross validation (10×10 CV). [4] The best scores of the 10×10 CV agree to two digits, i.e., they are 0.95X and 0.960 for n = 5 to 10, where X varies. Consequently, the plot of the scores versus n shows a plateau. We recognize that there exist non-negligible statistical errors, which affect the relevance analysis. Next, we employ leave-one-out cross validation because it involves no statistical errors and because we can obtain the most accurate scores from the data. Then, the best scores agree to three digits, i.e., they are 0.954X for n = 5 to 8 in the leave-one-out cross validation, where there is a plateau in the plot of the score versus n. The differences between the scores become 10 times smaller in the latter.
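The difference between the two validation schemes can be illustrated with a small numpy sketch: repeated ten-fold cross validation depends on the random partition and therefore scatters from repeat to repeat, whereas leave-one-out cross validation has a single, deterministic partition. The data, the function names, and the fixed kernel parameters are assumptions made for this example.

```python
import numpy as np


def rbf(A, B, gamma):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq)


def cv_r2(X, y, folds, alpha=1e-2, gamma=1.0):
    """R^2 of out-of-fold kernel ridge predictions for a given list of test folds."""
    y_pred = np.empty_like(y)
    all_idx = np.arange(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(all_idx, test_idx)
        K = rbf(X[train_idx], X[train_idx], gamma)
        coef = np.linalg.solve(K + alpha * np.eye(len(train_idx)), y[train_idx])
        y_pred[test_idx] = rbf(X[test_idx], X[train_idx], gamma) @ coef
    return 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)


# Synthetic stand-in data with a small amount of noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))
y = np.sin(2.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=50)

# Ten repeats of ten-fold CV: each repeat uses a different random partition.
kfold_scores = [
    cv_r2(X, y, np.array_split(rng.permutation(len(y)), 10)) for _ in range(10)
]
# Leave-one-out CV: one fold per sample, no randomness.
loo_folds = [np.array([i]) for i in range(len(y))]
loo_score = cv_r2(X, y, loo_folds)
print(np.mean(kfold_scores), np.std(kfold_scores), loo_score)
```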

Prediction among the Best n Models
We show the best scores of RMSE and MAE as a function of the number of descriptors (n) in the models in Fig. 1. The score changes gradually as a function of n. One may expect that their predictions are almost the same. We also evaluate the "RMSE" between the leave-one-out cross-validated test predictions of the best models with n descriptors in Fig. 2. We can see that the predictions are almost the same for n = 4 to 8; however, the deviations are larger in the other cases. Only the best models for n = 4 to 8 give almost the same predictions. We can also see this trend from the kernel parameters. Note that these figures are the results before fixing the errors in the data.
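The "RMSE" between models used here is simply the root-mean-square difference between two sets of predictions for the same materials. A minimal sketch, with made-up prediction vectors and an assumed function name, is:

```python
import numpy as np


def pairwise_rmse(predictions):
    """Matrix of RMSE values between every pair of prediction vectors."""
    P = np.asarray(predictions, dtype=float)  # shape: (n_models, n_materials)
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt(np.mean(diff**2, axis=-1))


# Made-up predictions of three models for three materials.
preds = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
]
rmse_matrix = pairwise_rmse(preds)
print(rmse_matrix)
```

A block of small entries in such a matrix indicates models that give nearly the same predictions, even when their descriptor sets differ.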

Prediction among the Best n Models after Fixing Errors
We show the scores for RMSE, MAE, and $R^2$ for the models in Table 1 in Figs. 3 and 4. The models for 4 ≤ n ≤ 8 have high scores.

Experimental T C versus CV-predicted T C
We plot the experimental $T_C$ versus the CV-predicted $T_C$ before and after fixing the errors in Figs. 5 and 6. They show the mean and standard deviation of the predictions. The standard deviations are shown as bars, but almost all of them are smaller than the sizes of the symbols. The overall agreement is good from 0 K to approximately 1300 K, but we find a few outliers in Fig. 5. For example, the experimental $T_C$ values of SmCo$_5$ and PrNi$_5$ are much higher than the predicted ones, whereas the experimental $T_C$ values are much smaller for NdCo$_5$ and NdNi$_5$. We find three major errors and a minor error in the experimental $T_C$ values, including those for SmCo$_5$ and PrNi$_5$.
A new plot obtained after fixing the errors is shown in Fig. 6. The predicted values of NdCo$_5$ and NdNi$_5$ are now almost the same as the experimental values. We find other outliers, such as the data for Ce$_2$Co$_7$ and RCo$_5$. However, it appears that these are not due to errors in the data.

Predicted T C s for (RE)Fe 12
We examine the prediction ability of the best regression model. We apply the best regression model to (RE)Fe$_{12}$ compounds, which were recently synthesized and attract much attention. The existing experimental $T_C$ values are 508 K for NdFe$_{12}$, [5] 586 K for SmFe$_{12}$, [6] and 483 K for YFe$_{12}$. [7] On the other hand, the corresponding predicted $T_C$ values are 490(19) K, 581(15) K, and 396(10) K, where the crystal structures are obtained from first-principles calculations and we substituted the Z and quantum-number-related descriptors of La for those of Y. [8] The agreement for NdFe$_{12}$ and SmFe$_{12}$ is fairly good considering that the training set does not contain data for this structure. The predicted values for DyFe$_{12}$ and GdFe$_{12}$ are 470(11) K and 600(13) K, respectively. However, these predicted values decrease by 120-180 K after fixing the errors in the data. The predicted values depend on the value of the L2 penalty term. We add this information for reference.

List of Descriptors
We list the original descriptors and $T_C$ values before fixing the errors in Tables 2-10. We list the final descriptors and $T_C$ values after fixing the errors in Tables 11-19. The number of original materials was 101, but we found one non-stoichiometric material, which was deleted.
Table 3: Descriptors from the 1st to the 40th material (cont.).
Table 5: Descriptors from the 41st to the 80th material.

Fig. 1. (Color online) Top panel: The blue line shows the best score for each number of descriptors. The orange dotted line shows the score when $C_R$ is removed. Bottom panel: $C_R$ (Å$^{−3}$) vs $T_C$ (°C).

Fig. 2. (Color online) $R^2$ scores of the subgroup relevance analysis on the hierarchical clustering of the descriptors. We include $T_C$ in the dendrogram. The group R (green) is from $L_{4f}$ to $r^{cv}_R$. The group T (red) is from $IP_T$ to $r_T$. The group S (cyan) is from $d_{TT}$ to $C_T$. The group $d_{TR}$ is made of the descriptor $d_{TR}$. The horizontal values are strong relevance values and the tilted values are weak relevance values. The vertical axis shows the distance, d, and the values are one minus the absolute values of Pearson's correlation coefficient. The paths of the highest value (0.95445) are colored in yellow dashed lines. See also the details in the main body.

Figure 1: The best RMSE and MAE as functions of the number of descriptors (°C).

Figure 2: Heatmap of "RMSE" between the best models with n descriptors.

Figure 4: The best $R^2$ as a function of the number of descriptors.

Figure 5: Experimental $T_C$ versus CV-predicted $T_C$ before fixing the errors.

Figure 6: Experimental $T_C$ versus CV-predicted $T_C$ after fixing the errors.

Table 6: Descriptors from the 41st to the 80th material (cont.).

Table 9: Descriptors from the 81st to the 101st material (cont.).

Table II. The best $R^2$ score and descriptors as a function of the number of descriptors n.

Table 2: Descriptors from the 1st to the 40th material.

Table 8: Descriptors from the 81st to the 101st material.

Table 11: Descriptors from the 1st to the 40th material.

Table 15: Descriptors from the 41st to the 80th material (cont.).

Table 17: Descriptors from the 81st to the 100th material.